---
title: The 3rd FSFE System Hackers hackathon
date: 2019-10-22
categories:
  - english
tags:
  - fsfe
  - report
  - server
headerimage:
  src: /blog/system-chaos.jpg
  text: Fortunately not what the FSFE's infrastructure looks like
---
On 10 and 11 October, the FSFE System Hackers met in person to tackle
problems and new features regarding the servers and services the FSFE
is running. The team consists of dedicated volunteers who ensure that
the community and staff can work effectively. The recent meeting built
on the great work of the past two years, which have been shaped by large
personnel and technical changes.

<!--more-->
The System Hackers are responsible for the maintenance and development
of a [large number of
services](https://wiki.fsfe.org/TechDocs/Services). From the fsfe.org
website's deployment to the mail servers and blogs, from Git to
internal services like DNS and monitoring, all these services, virtual
machines and physical servers are handled by [this friendly
group](https://wiki.fsfe.org/Teams/System-Hackers/) that is always
looking forward to welcoming new members.

{{< figure src="/img/blog/system-servers.png" caption="Overview of the FSFE's services and servers" >}}

So in October, six of us met in Cologne. Fittingly, according to a
saying in this region, if you do something for the third time, it's
already a tradition. We reached that milestone after successful meetings
in Berlin (April 2018) and Vienna (March 2019). And although it took
place on workdays, it's been the meeting with the highest participation
so far!
## Getting. Things. Done!

While the first and second meetings were mostly about getting an
overview of the historically grown and sparsely documented
infrastructure and bringing it into a stable state, we were able to deal
with a few more general topics this time. At the same time, we shared
our knowledge with newly joined team members. These are the areas we
worked on:
* Florian migrated the FSFE Blogs to a new server and, in the process,
  updated the underlying WordPress to the latest version. The outdated
  installation had been a major blocker for several other tasks and our
  largest security risk. There are still a few things left to do, e.g.
  creating a theme in line with the FSFE design and announcing the
  change to the community. However, the most complicated part is done!
* Altogether, we upgraded a lot of machines to Debian 10, shortly after
  we had lifted most servers to Debian 9 in March. Some are still
  pending, but since the migration is rather painless, we can handle
  them over the coming months.
* We confirmed that the new decentralised backup system, which I set up
  and which is based on Borg, works fine. This gives us more confidence
  in our infrastructure.
* Thanks to Florian and Albert, we finally got rid of the last two
  services that were not using Let's Encrypt's self-renewing
  certificates.
* Vincent and Francesco finished migrating all our Docker containers to
  the Docker-in-Docker deployment, replacing the hacky Ansible playbooks
  we used initially. This has a few security advantages and enables the
  next developments towards a more resilient Docker infrastructure.
* At the moment, all our Docker containers run on one single virtual
  machine. Although this VM runs on a Proxmox/Ceph cluster, it's
  obviously a single point of failure. However, we lack the hardware
  resources to distribute the containers across multiple servers.
  Nonetheless, we already have concrete plans for making the Docker
  setup more resilient as soon as we have more hardware available.
  Vincent documented this on [a wiki
  page](https://wiki.fsfe.org/TechDocs/Docker/docker-machine).
* On the human side, we made sure that all of us know what's on our
  plates for the next weeks and months. We have quite a few open issues
  collected in our Kanban board, and we quickly went through all of them
  to sketch the possible next steps and distribute responsibilities.
## Started projects in the making

Two days are quite some time, and we worked hard to use them as
effectively as possible. Still, some tasks were started but could not be
completed, partly because we simply did not have enough time, partly
because they require more coordination and in-depth discussion:
* As a follow-up to a few unpleasant surprises with Mailman's default
  values, we figured that it is important to have an automatic overview
  of the most sensitive settings of the 127 (!) mailing lists we host.
  Vincent started to work on a way to extract this information in a
  human- and machine-readable format and merge/compare it with the more
  verbose documentation on the mailing lists we have internally.
* Francesco tackled a different weak point of ours: monitoring. We lack
  a tool that informs us immediately about problems in our
  infrastructure, e.g. defunct core services, full disk drives or
  expired certificates. Since this is not trivial at all, it requires
  some more time.
* Thomas, maintainer of the FSFE wiki, researched a way to better
  organise and distribute SSH access in our team. Right now, we have no
  comfortable way to add or remove SSH keys on our more than 20
  machines. His idea is to use an Ansible playbook to manage these, and
  thereby also create a shared Ansible inventory which can be used as a
  submodule for the other playbooks we use in the team, so we don't have
  to maintain all of them individually if a machine is added, changed or
  removed.
* One of the most ancient physical machines we still run hosts the SVN
  service, which is now only used by one service: DNS. We started to
  work on migrating that over to Git and simultaneously improving the
  error-checking of the DNS configuration. Albert and I will continue
  with that gradually.
* Not at the System Hackers meeting itself, but two days later, Björn,
  Albert and I worked on getting a Nextcloud instance running. Because
  of our rather special LDAP setup, we had to debug a lot of strange
  behaviour but finally figured everything out. Now, the last remaining
  blocker is some user/permission setting within our LDAP. As soon as
  this is finished, we can shut down one more historically grown,
  custom-hacked and user-unfriendly service.
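The Mailman settings overview mentioned in the list above could take roughly the following shape. This is a minimal sketch, not Vincent's actual tool: the setting names, the baseline values and the sample lists are hypothetical, and it assumes the per-list settings have already been exported into plain dictionaries. It compares each list against a baseline and reports only the deviations as JSON, which stays readable for humans and machines alike.

```python
import json

# Hypothetical baseline of settings considered sensitive; the names and
# values are illustrative and not taken from the FSFE's configuration.
BASELINE = {
    "archive_private": True,
    "subscribe_policy": "confirm_and_approve",
    "generic_nonmember_action": "hold",
}


def find_deviations(lists, baseline=BASELINE):
    """Return {list_name: {setting: {"expected": ..., "actual": ...}}}.

    Only lists with at least one deviating setting appear in the result.
    A setting missing from a list is reported with an actual value of None.
    """
    report = {}
    for name, settings in sorted(lists.items()):
        diff = {
            key: {"expected": expected, "actual": settings.get(key)}
            for key, expected in baseline.items()
            if settings.get(key) != expected
        }
        if diff:
            report[name] = diff
    return report


if __name__ == "__main__":
    # Made-up sample data standing in for an export of the real lists.
    sample = {
        "discussion": {
            "archive_private": False,
            "subscribe_policy": "confirm_and_approve",
            "generic_nonmember_action": "hold",
        },
        "team": dict(BASELINE),
    }
    print(json.dumps(find_deviations(sample), indent=2, sort_keys=True))
```

Reporting only the differences keeps the output short even with well over a hundred lists, and the same JSON could be compared against the internal documentation.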
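Two of the checks named in the monitoring item above, full disks and expiring certificates, can be sketched with Python's standard library alone. This is an illustration under stated assumptions rather than the tool Francesco is working on: the thresholds and function names are placeholders, the disk check only covers the machine the script runs on, and a real setup would also need alerting and checks for defunct services.

```python
import datetime
import shutil
import socket
import ssl

# Illustrative thresholds; not values used by the FSFE.
DISK_WARN_PERCENT = 90.0
CERT_WARN_DAYS = 14


def disk_usage_percent(path="/"):
    """Return the used share of the filesystem containing `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def days_until_expiry(not_after, now=None):
    """Return the days (rounded down) until a certificate's notAfter time.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT". Negative values mean it has expired.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    expires = datetime.datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), datetime.timezone.utc
    )
    return (expires - now).days


def fetch_not_after(host, port=443, timeout=5.0):
    """Fetch the notAfter field of the certificate presented by host:port.

    The default context verifies the certificate, so an already expired or
    otherwise invalid one raises ssl.SSLCertVerificationError instead.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def run_checks(cert_host, disk_path="/"):
    """Return a list of human-readable problems; empty if all checks pass."""
    problems = []
    used = disk_usage_percent(disk_path)
    if used >= DISK_WARN_PERCENT:
        problems.append(f"disk {disk_path} is {used:.0f}% full")
    try:
        days_left = days_until_expiry(fetch_not_after(cert_host))
        if days_left < CERT_WARN_DAYS:
            problems.append(
                f"certificate for {cert_host} expires in {days_left} days"
            )
    except (OSError, ssl.SSLError) as error:
        # Covers unreachable hosts as well as failed certificate validation.
        problems.append(f"TLS check for {cert_host} failed: {error}")
    return problems
```

Calling `run_checks("example.org")` with a placeholder host returns a list of messages that could be sent by mail or chat. Warning some days before expiry matters even with self-renewing certificates, because it reveals a renewal job that has silently stopped working.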
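The benefit of a single shared inventory for SSH access, as described in the list above, can be illustrated without Ansible. The following sketch deliberately swaps in plain Python for the planned playbook: the host names, user names and key strings are made-up placeholders, and the function only renders the text of an `authorized_keys` file instead of deploying it.

```python
# Hypothetical inventory: which people may log in to which machine.
ACCESS = {
    "web1": ["alice", "bob"],
    "mail1": ["alice"],
}

# Hypothetical public keys per person (placeholder strings, not real keys).
KEYS = {
    "alice": ["ssh-ed25519 AAAA-placeholder-alice alice@example.org"],
    "bob": ["ssh-ed25519 AAAA-placeholder-bob bob@example.org"],
}


def render_authorized_keys(host, access=ACCESS, keys=KEYS):
    """Return the authorized_keys content for `host`, one key per line.

    Raises KeyError if the host or one of its users is unknown, so typos
    in the inventory surface early instead of silently dropping a key.
    """
    lines = []
    for user in sorted(access[host]):
        lines.extend(keys[user])
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    for host in sorted(ACCESS):
        print(f"# {host}")
        print(render_authorized_keys(host), end="")
```

Because every machine's file is derived from the same two tables, removing a person or adding a machine is a single change that propagates everywhere on the next run, which is the property the shared Ansible inventory is meant to provide.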

Overall, the outlook for the System Hackers is better than ever. We
are a growing team carried by motivated and skilled volunteers with a
shared vision of how the systems should develop. At the same time, we
have a lot of public and internal documentation available to make it
easy for new people to join us.

I would like to thank Albert, Florian, Francesco, Thomas and Vincent for
their participation in this meeting, and them and all other System
Hackers for their dedication to keeping the FSFE running!