The 3rd FSFE System Hackers hackathon

Fortunately not what the FSFE’s infrastructure looks like

On 10 and 11 October, the FSFE System Hackers met in person to tackle problems and new features regarding the servers and services the FSFE is running. The team consists of dedicated volunteers who ensure that the community and staff can work effectively. The recent meeting built on the great work of the past two years, which have been shaped by major personnel and technical changes.

The System Hackers are responsible for the maintenance and development of a large number of services. From the fsfe.org website’s deployment to the mail servers and blogs, from Git to internal services like DNS and monitoring, all these services, virtual machines and physical servers are handled by this friendly group that is always looking forward to welcoming new members.

Overview of the FSFE’s services and servers

So in October, six of us met in Cologne. Fittingly, according to a saying in this region, if you do something for the third time, it’s already tradition. We reached that point after successful meetings in Berlin (April 2018) and Vienna (March 2019). And although it took place on workdays, it was the meeting with the highest participation so far!

Getting. Things. Done!

While the first and second meetings were mostly about getting an overview of the historically grown and sparsely documented infrastructure and bringing it into a stable state, we were able to deal with a few more general topics this time. At the same time, we shared our knowledge with newly joined team members. These are the areas we worked on:

  • Florian migrated the FSFE Blogs to a new server and, in the process, updated the underlying WordPress to the latest version. The outdated installation had been a major blocker for several other tasks and our largest security risk. There are still a few things left to do, e.g. creating a theme in line with the FSFE design and announcing the change to the community. However, the most complicated part is done!
  • Altogether, we upgraded a lot of machines to Debian 10, just after we lifted most servers to Debian 9 in March. Some are still missing, but since the migration is rather painless, we can do that during the next months.
  • We confirmed that the new decentralised backup system, which I set up based on Borg, works fine. This gives us more confidence in our infrastructure.
  • Thanks to Florian and Albert, we finally got rid of the last 2 services that were not using Let’s Encrypt’s self-renewing certificates.
  • Vincent and Francesco took care of finishing the migration of all our Docker containers to use the Docker-in-Docker deployment instead of the hacky Ansible playbooks we used initially. This has a few security advantages and enables the next developments for a more resilient Docker infrastructure.
  • At the moment, all our Docker containers run on one single virtual machine. Although this runs on a Proxmox/Ceph cluster, it’s obviously a single point of failure. However, we lack the hardware resources to distribute the containers across multiple servers. Nonetheless, we already have concrete plans for making the Docker setup more resilient as soon as we have more hardware available. Vincent documented this on a wiki page.
  • On the human side, we made sure that all of us know what’s on the plate for the next weeks and months. We have quite a few open issues collected in our Kanban board, and we quickly went through all of them to sketch the possible next steps and distribute responsibilities.

Started projects in the making

Two days are quite some time and we worked hard to use them as effectively as possible. Still, some tasks were started but could not be completed, partly because we simply did not have enough time, partly because they require more coordination and in-depth discussion:

  • As a follow-up to a few unpleasant surprises with Mailman’s default values, we figured that it is important to have an automatic overview of the most sensitive settings of the 127 (!) mailing lists we host. Vincent started to work on a way to extract this information in a human- and machine-readable format and merge/compare it with the more verbose documentation on the mailing lists we have internally.
  • Francesco tackled a different weak point we have: monitoring. We lack a tool that informs us immediately about problems in our infrastructure, e.g. defunct core services, full disk drives or expired certificates. Since this is not trivial at all, it requires some more time.
  • Thomas, maintainer of the FSFE wiki, researched a way to better organise and distribute SSH access in our team. Right now, we have no comfortable way to add or remove SSH keys on our more than 20 machines. His idea is to use an Ansible playbook to manage these, and in doing so to create a shared Ansible inventory which can be used as a submodule by the other playbooks we use in the team, so we don’t have to maintain all of them individually if a machine is added, changed or removed.
  • One of the most ancient physical machines we still run is hosting the SVN service which is only used by one service now: DNS. We started to work on migrating that over to Git and simultaneously improving the error-checking of the DNS configuration. Albert and I will continue with that gradually.
  • Not at the System Hackers meeting itself but two days later, Björn, Albert and I worked on getting a Nextcloud instance running. Because of our rather special LDAP setup, we had to debug a lot of strange behaviour, but we finally figured everything out. Now, the last remaining blocker is a user/permission setting within our LDAP. As soon as this is resolved, we can shut down one more historically grown, heavily customised and user-unfriendly service.
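
To give an idea of what the mailing list overview mentioned above could look like, here is a minimal sketch in Python. It is not Vincent’s actual tool: the setting names, the expected values and the example lists are all made up for illustration, and it assumes the settings have already been exported into a simple dictionary per list. The idea is only to show how deviations from an agreed baseline can be reported both for humans and for machines.

```python
import json

# Hypothetical baseline for the settings we consider sensitive.
# These names and values are illustrative, not real Mailman options.
EXPECTED = {
    "archive_private": True,
    "subscribe_policy": "confirm",
    "moderate_nonmembers": True,
}


def audit_lists(lists, expected=EXPECTED):
    """Return {list_name: {setting: (actual, expected)}} for every deviation."""
    report = {}
    for name, settings in sorted(lists.items()):
        deviations = {
            key: (settings.get(key), wanted)
            for key, wanted in expected.items()
            if settings.get(key) != wanted
        }
        if deviations:
            report[name] = deviations
    return report


def format_report(report):
    """Render the deviations as plain text for human readers."""
    lines = []
    for name, deviations in report.items():
        lines.append(name)
        for key, (actual, wanted) in deviations.items():
            lines.append(f"  {key}: {actual!r} (expected {wanted!r})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up data standing in for an export of the real lists.
    example = {
        "team-list": {
            "archive_private": True,
            "subscribe_policy": "confirm",
            "moderate_nonmembers": True,
        },
        "old-list": {
            "archive_private": False,
            "subscribe_policy": "confirm",
        },
    }
    report = audit_lists(example)
    print(format_report(report))          # human-readable
    print(json.dumps(report, indent=2))   # machine-readable
```

A missing setting is treated like a wrong one, so lists that were never configured explicitly show up in the report as well.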

Overall, the perspective for the System Hackers is better than ever. We are a growing team carried by motivated and skilled volunteers with a shared vision of how the systems should develop. At the same time, we have a lot of public and internal documentation available to make it easy for new people to join us.

I would like to thank Albert, Florian, Francesco, Thomas and Vincent for their participation in this meeting, and them and all other System Hackers for their dedication to keeping the FSFE running!


