---
title: Tales from the 3rd FSFE System Hackers hackathon
date: 2019-10-22
categories:
  - english
tags:
  - fsfe
  - report
  - server
draft: true
headerimage: /blog/system-chaos.jpg
headercredits: Fortunately not what the FSFE's infrastructure looks like

---

On 10 and 11 October, the FSFE System Hackers met in person to tackle
problems and new features regarding the servers and services the FSFE
is running. The team consists of dedicated volunteers who ensure that
the community and staff can work effectively. The recent meeting built
on the great work of the past two years, which have been shaped by major
personnel and technical changes.

The System Hackers are responsible for the maintenance and development
of a [large number of
services](https://wiki.fsfe.org/TechDocs/Services). From the fsfe.org
website's deployment to the mail servers and blogs, from Git to
internal services like DNS and monitoring, all these services, virtual
machines and physical servers are handled by [this friendly
group](https://wiki.fsfe.org/Teams/System-Hackers/) that is always
looking forward to welcoming new members.

{{< figure src="/img/blog/system-servers.png" caption="Overview of the FSFE's services and servers" >}}

So in October, six of us met in Cologne. Fittingly, according to a
saying in this region, if you do something for the third time, it's
already a tradition. We have now reached that mark after successful
meetings in Berlin (April 2018) and Vienna (March 2019). And although it
took place on workdays, it was the meeting with the highest
participation so far!

## Getting. Things. Done!

While the first and second meetings were mostly about getting an
overview of historically grown and sparsely documented infrastructure
and bringing it into a stable state, we were able to deal with a few
more general topics this time. At the same time, we shared our
knowledge with newly joined team members. These are the areas we
worked on:

* Florian migrated the FSFE Blogs to a new server and, in the process,
  updated the underlying WordPress to the latest version. The outdated
  installation had been a major blocker for several other tasks and our
  largest security risk. There are still a few things left to do, e.g.
  creating a theme in line with the FSFE design and announcing the
  change to the community. However, the most complicated part is done!
* Together, we upgraded a lot of machines to Debian 10, shortly after
  we had brought most servers to Debian 9 in March. Some are still
  missing, but since the migration is rather painless, we can do that
  over the next few months.
* We confirmed that the new decentralised backup system, which I set up
  and which is based on Borg, works fine. This gives us more confidence
  in our infrastructure.
* Thanks to Florian and Albert, we finally got rid of the last 2
  services that were not using Let's Encrypt's self-renewing
  certificates.
* Vincent and Francesco took care of finishing the migration of all our Docker containers
  to use the Docker-in-Docker deployment instead of the hacky Ansible
  playbooks we used initially. This has a few security advantages and
  enables the next developments for a more resilient Docker
  infrastructure.
* At the moment, all our Docker containers run on one single virtual
  machine. Although this runs on a Proxmox/Ceph cluster, it's obviously
  a single point of failure. However, we lack the hardware resources to
  distribute the containers across multiple servers. Nonetheless, we
  already have concrete plans for making the Docker setup more resilient
  as soon as we have more hardware available. Vincent documented this on
  [a wiki page](https://wiki.fsfe.org/TechDocs/Docker/docker-machine).
* On the human side, we made sure that all of us know what's on the
  plate for the next weeks and months. We have quite a few open issues
  collected in our Kanban board, and we quickly went through all of them
  to sketch the possible next steps and distribute responsibilities.


## Projects in the making

Two days are quite some time and we worked hard to use them as
effectively as possible, but some tasks were started and could not
be completed – partly because we just did not have enough time, partly
because they require more coordination and in-depth discussion:

* As a follow-up to a few unpleasant surprises with Mailman's default
  values, we figured that it is important to have an automatic overview
  of the most sensitive settings of the 127 (!) mailing lists we host.
  Vincent started to work on a way to extract this information in a
  human- and machine-readable format and merge/compare it with the more
  verbose documentation on the mailing lists we have internally.
* Francesco tackled a different weak point we have: monitoring. We lack
  a tool that informs us immediately about problems in our
  infrastructure, e.g. defunct core services, full disk drives or
  expired certificates. Since this is not trivial at all, it requires
  some more time.
* Thomas, maintainer of the FSFE wiki, researched a way to better
  organise and distribute SSH access in our team. Right now, we have no
  convenient way to add or remove SSH keys on our more than 20 machines.
  His idea is to use an Ansible playbook to manage these, and thereby
  also create a shared Ansible inventory which can be used as a
  submodule by the other playbooks we use in the team, so we don't have
  to update each of them individually when a machine is added, changed
  or removed.
* One of the most ancient physical machines we still run is hosting the
  SVN service which is only used by one service now: DNS. We started to work
  on migrating that over to Git and simultaneously improving the
  error-checking of the DNS configuration. Albert and I will continue
  with that gradually.
* Not at the System Hackers meeting itself but two days later, Björn,
  Albert and I worked on getting a Nextcloud instance running. Because
  of our rather special LDAP setup, we had to debug a lot of strange
  behaviour but finally figured everything out. Now, the last remaining
  blocker is a user/permission setting within our LDAP. As soon as
  this is resolved, we can shut down one more historically grown,
  custom-hacked and user-unfriendly service.
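
To make the monitoring item above a bit more tangible, here is a minimal Python sketch of two of the checks mentioned there: full disk drives and soon-to-expire certificates. It is only an illustration and not the tool Francesco is working on. The function names, the thresholds (90% disk usage, 14 days until expiry) and the host name `example.org` are assumptions made up for this example.

```python
import shutil
import ssl
from datetime import datetime, timezone


def disk_usage_percent(path="/"):
    """Return how full the filesystem containing `path` is, in percent."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def days_until_expiry(not_after, now=None):
    """Return the whole days left until a certificate's notAfter date.

    `not_after` uses the format found in the dict returned by
    ssl.SSLSocket.getpeercert(), e.g. "Oct 22 12:00:00 2019 GMT".
    """
    now = now or datetime.now(timezone.utc)
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), timezone.utc
    )
    return (expires - now).days


def collect_warnings(disk_percent, cert_days,
                     max_disk_percent=90.0, min_cert_days=14):
    """Turn raw measurements into human-readable warnings.

    `cert_days` maps a host name to the days left on its certificate.
    The default thresholds are arbitrary example values.
    """
    warnings = []
    if disk_percent >= max_disk_percent:
        warnings.append(f"disk is {disk_percent:.0f}% full")
    for host, days in sorted(cert_days.items()):
        if days < min_cert_days:
            warnings.append(f"certificate for {host} expires in {days} days")
    return warnings


if __name__ == "__main__":
    # Hypothetical input: in a real setup the notAfter string would come
    # from the certificate actually served by the host.
    cert_days = {"example.org": days_until_expiry("Oct 22 12:00:00 2030 GMT")}
    for warning in collect_warnings(disk_usage_percent("/"), cert_days):
        print(warning)
```

A real monitoring setup additionally has to run such checks on a schedule, on every machine, and notify someone when a check fails, which is where most of the actual work lies.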


Overall, the perspective for the System Hackers is better than ever. We
are a growing team carried by motivated and skilled volunteers with a
shared vision of how the systems should develop. At the same time, we
have a lot of public and internal documentation available to make it
easy for new people to join us.

I would like to thank Albert, Florian, Francesco, Thomas and Vincent for
their participation in this meeting, and them and all other System
Hackers for their dedication to keeping the FSFE running!