Fortunately not what the FSFE’s infrastructure looks like
On 10 and 11 October, the FSFE System Hackers met in person to tackle
problems and new features regarding the servers and services the FSFE
is running. The team consists of dedicated volunteers who ensure that
the community and staff can work effectively. The recent meeting built
on the great work of the past two years, which were shaped by major
changes in both personnel and technology.
The System Hackers are responsible for the maintenance and development
of a large number of services. From the fsfe.org website’s deployment
to the mail servers and blogs, from Git to internal services like DNS
and monitoring, all these services, virtual machines and physical
servers are handled by this friendly group that is always looking
forward to welcoming new members.
So in October, six of us met in Cologne. Fittingly, according to a
saying in this region, anything you do for the third time is already a
tradition, and this meeting followed successful ones in Berlin (April
2018) and Vienna (March 2019). And although it took place on workdays,
it has been the best-attended meeting so far!
Getting. Things. Done!
While the first and second meetings were mostly about getting an
overview of historically grown and sparsely documented infrastructure
and bringing it into a stable state, this time we were able to deal
with a few more general topics. At the same time, we shared our
knowledge with newly joined team members. Here are the areas we worked
on:
Florian migrated the FSFE Blogs to a new server and, in the process,
updated the underlying WordPress to the latest version. This had been
a major blocker for several other tasks and our largest security risk.
A few things are still left to do, e.g. creating a theme in line with
the FSFE design and an announcement to the community. However, the
most complicated part is done!
Altogether, we upgraded many machines to Debian 10, shortly after we
had lifted most servers to Debian 9 in March. Some are still missing,
but since the upgrade is rather painless, we can do that over the next
months.
We confirmed that the new decentralised backup system, which I set up
based on Borg, works fine. This gives us more confidence in our
infrastructure.
Thanks to Florian and Albert, we finally got rid of the last 2
services that were not using Let’s Encrypt’s self-renewing
certificates.
Vincent and Francesco finished migrating all our Docker containers to
a Docker-in-Docker deployment, replacing the hacky Ansible playbooks we
used initially. This brings a few security advantages and paves the way
for a more resilient Docker infrastructure.
At the moment, all our Docker containers run on a single virtual
machine. Although this VM runs on a Proxmox/Ceph cluster, it is
obviously a single point of failure. However, we lack the hardware
resources to distribute the containers across multiple servers.
Nonetheless, we already have concrete plans for making the Docker setup
more resilient as soon as more hardware is available, which Vincent
documented on a wiki page.
On the human side, we made sure that all of us know what is on the
plate for the coming weeks and months. We have quite a few open issues
collected on our Kanban board, and we quickly went through all of them
to sketch possible next steps and distribute responsibilities.
Projects in the making
Two days are quite some time, and we worked hard to use them as
effectively as possible. Still, some tasks were started but could not
be completed, partly because we simply did not have enough time and
partly because they require more coordination and in-depth discussion:
As a follow-up to a few unpleasant surprises with Mailman’s default
values, we realised that we need an automatic overview of the most
sensitive settings of the 127 (!) mailing lists we host. Vincent
started working on a way to extract this information in a human- and
machine-readable format and to merge or compare it with the more
detailed internal documentation we keep on the mailing lists.
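To illustrate the compare step, here is a minimal Python sketch. It assumes the settings of each list have already been extracted, by whatever means, into a plain dictionary. The list names, setting names and expected values are hypothetical placeholders, not our actual configuration and not necessarily Mailman’s exact option names:

```python
import json

# Hypothetical extracted settings: list name -> {setting: value}.
# Names and values are illustrative placeholders only.
extracted = {
    "announce": {"archive_private": True, "default_member_moderation": True},
    "discussion": {"archive_private": False, "default_member_moderation": False},
}

# Hypothetical expectations taken from the internal documentation.
documented = {
    "announce": {"archive_private": False, "default_member_moderation": True},
    "discussion": {"archive_private": False, "default_member_moderation": False},
}


def compare_settings(actual, expected):
    """Return {list: {setting: (expected, actual)}} for every mismatch."""
    report = {}
    for name in sorted(set(actual) | set(expected)):
        have = actual.get(name, {})
        want = expected.get(name, {})
        diffs = {
            key: (want.get(key), have.get(key))
            for key in sorted(set(have) | set(want))
            if have.get(key) != want.get(key)
        }
        if diffs:  # only lists that deviate end up in the report
            report[name] = diffs
    return report


if __name__ == "__main__":
    mismatches = compare_settings(extracted, documented)
    # JSON is both human- and machine-readable; tuples become lists.
    print(json.dumps(mismatches, indent=2))
```

A missing setting on either side shows up as `None` in the report, so undocumented or unexpectedly absent options are flagged as well.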
Francesco tackled another weak point: monitoring. We lack a tool that
immediately informs us about problems in our infrastructure, e.g.
defunct core services, full disk drives or expired certificates. Since
this is far from trivial, it requires some more time.
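As a small illustration of one such check, here is a hypothetical Python sketch that classifies a certificate by the time left until its notAfter date. It relies on the standard library’s ssl.cert_time_to_seconds; the 14-day warning threshold and the status strings are assumptions for illustration, not part of any tool we run:

```python
import ssl
import time

# Hypothetical alert threshold, not a value from the FSFE setup.
WARN_DAYS = 14


def days_until_expiry(not_after, now=None):
    """Days left until a certificate's notAfter timestamp.

    `not_after` uses the format found in certificate dicts,
    e.g. "Jan  1 00:00:00 2030 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since epoch, UTC
    if now is None:
        now = time.time()
    return (expires - now) / 86400


def check(not_after, now=None):
    """Return a monitoring-style status string for one certificate."""
    days = days_until_expiry(not_after, now)
    if days < 0:
        return "CRITICAL: certificate expired"
    if days < WARN_DAYS:
        return f"WARNING: certificate expires in {days:.0f} days"
    return f"OK: {days:.0f} days left"
```

In a real check, the notAfter string could come from the dictionary returned by ssl.SSLSocket.getpeercert() after connecting to the service; a full monitoring tool would also need scheduling and alert delivery.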
Thomas, maintainer of the FSFE wiki, researched ways to better
organise and distribute SSH access within our team. Right now, we have
no convenient way to add or remove SSH keys on our more than 20
machines. His idea is to manage these with an Ansible playbook and
thereby also create a shared Ansible inventory that can be used as a
submodule by the other playbooks we use in the team, so we don’t have
to update each of them individually whenever a machine is added,
changed or removed.
One of the oldest physical machines we still run hosts the SVN
service, which is now used by only one service: DNS. We started
migrating that to Git while improving the error checking of the DNS
configuration. Albert and I will continue with that gradually.
Not during the System Hackers meeting itself but two days later,
Björn, Albert and I worked on getting a Nextcloud instance running.
Because of our rather special LDAP setup, we had to debug a lot of
strange behaviour, but we finally figured everything out. Now, the last
remaining blocker is a user/permission setting within our LDAP. As soon
as this is done, we can shut down one more historically grown,
custom-hacked and user-unfriendly service.
Overall, the outlook for the System Hackers is better than ever. We
are a growing team carried by motivated and skilled volunteers with a
shared vision of how the systems should develop. At the same time, we
have a lot of public and internal documentation available to make it
easy for new people to join us.
I would like to thank Albert, Florian, Francesco, Thomas and Vincent for
their participation in this meeting, and them and all other System
Hackers for their dedication to keeping the FSFE running!