---
title: Tales from the 3rd FSFE System Hackers hackathon
date: 2019-10-22
categories:
- english
tags:
- fsfe
- report
- server
draft: true
headerimage: /blog/system-chaos.jpg
headercredits: Fortunately not what the FSFE's infrastructure looks like
---

On 10 and 11 October, the FSFE System Hackers met in person to tackle
problems and new features regarding the servers and services the FSFE
is running. The team consists of dedicated volunteers who ensure that
the community and staff can work effectively. The recent meeting built
on the great work of the past 2 years, which have been shaped by major
personnel and technical changes.

The System Hackers are responsible for the maintenance and development
of a [large number of
services](https://wiki.fsfe.org/TechDocs/Services). From the fsfe.org
website's deployment to the mail servers and blogs, from Git to
internal services like DNS and monitoring, all these services, virtual
machines and physical servers are handled by [this friendly
group](https://wiki.fsfe.org/Teams/System-Hackers/), which is always
looking forward to welcoming new members.

{{< figure src="/img/blog/system-servers.png" caption="Overview of the FSFE's services and servers" >}}

So in October, six of us met in Cologne. Fittingly, according to a
saying in this region, if you do something for the third time, it's
already a tradition. We reached that mark after successful meetings in
Berlin (April 2018) and Vienna (March 2019). And although it took place
on workdays, it's been the meeting with the highest participation so
far!

## Getting. Things. Done!

While the first and second meetings were mostly about getting an
overview of the historically grown and sparsely documented
infrastructure and bringing it into a stable state, this time we were
able to deal with a few more general topics. At the same time, we
shared our knowledge with newly joined team members. These are the
areas we worked on:

* Florian migrated the FSFE Blogs to a new server and thereby also
  updated the underlying WordPress to the latest version. The outdated
  installation had been a major blocker for several other tasks and our
  largest security risk. There are still a few things left to do, e.g.
  creating a theme in line with the FSFE design and an announcement to
  the community. However, the most complicated part is done!
* Altogether, we upgraded a lot of machines to Debian 10, shortly after
  we had lifted most servers to Debian 9 in March. Some are still
  pending, but since the migration is rather painless, we can do that
  over the next months.
* We confirmed that the new decentralised backup system, which I set up
  based on Borg, works fine. This gives us more confidence in our
  infrastructure.
* Thanks to Florian and Albert, we finally got rid of the last 2
  services that were not using Let's Encrypt's self-renewing
  certificates.
* Vincent and Francesco finished migrating all our Docker containers to
  the Docker-in-Docker deployment, replacing the hacky Ansible
  playbooks we used initially. This has a few security advantages and
  enables the next developments for a more resilient Docker
  infrastructure.
* At the moment, all our Docker containers run on one single virtual
  machine. Although this runs on a Proxmox/Ceph cluster, it's obviously
  a single point of failure. However, we lack the hardware resources to
  distribute the containers across multiple servers. Nonetheless, we
  already have concrete plans for how to make the Docker setup more
  resilient as soon as we have more hardware available. Vincent
  documented this on [a wiki
  page](https://wiki.fsfe.org/TechDocs/Docker/docker-machine).
* On the human side, we made sure that all of us know what's on our
  plate for the next weeks and months. We have quite a few open issues
  collected on our Kanban board, and we quickly went through all of
  them to sketch the possible next steps and distribute
  responsibilities.

## Started projects in the making

Two days are quite some time, and we worked hard to use them as
effectively as possible. Still, some tasks were started but could not
be completed, partly because we just did not have enough time, partly
because they require more coordination and in-depth discussion:

* As a follow-up to a few unpleasant surprises with Mailman's default
  values, we figured that it is important to have an automatically
  generated overview of the most sensitive settings of the 127 (!)
  mailing lists we host. Vincent started to work on a way to extract
  this information in a human- and machine-readable format and
  merge/compare it with the more verbose documentation on the mailing
  lists we have internally.
* Francesco tackled a different weak point we have: monitoring. We lack
  a tool that informs us immediately about problems in our
  infrastructure, e.g. defunct core services, full disk drives or
  expired certificates. Since this is not trivial at all, it requires
  some more time.
* Thomas, maintainer of the FSFE wiki, looked into a way to better
  organise and distribute SSH access in our team. Right now, we have no
  convenient way to add or remove SSH keys on our more than 20
  machines. His idea is to use an Ansible playbook to manage these, and
  thereby also create a shared Ansible inventory which can be used as a
  submodule for the other playbooks we use in the team, so we don't
  have to update each of them individually whenever a machine is added,
  changed or removed.
* One of the oldest physical machines we still run hosts the SVN
  service, which is now used for only one thing: DNS. We started to
  work on migrating that over to Git and simultaneously improving the
  error-checking of the DNS configuration. Albert and I will continue
  with that gradually.
* Not at the System Hackers meeting itself but two days later, Björn,
  Albert and I worked on getting a Nextcloud instance running. Because
  of our rather special LDAP setup, we had to debug a lot of strange
  behaviour but finally figured everything out. Now, the last remaining
  blocker is a user/permission setting within our LDAP. As soon as this
  is resolved, we can shut down one more historically grown,
  custom-hacked and user-unfriendly service.
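
To give an idea of the kind of checks the missing monitoring tool would
have to run, here is a minimal Python sketch covering two of the
problems mentioned above: full disks and expiring certificates. It is
not what Francesco is building. The function names and the thresholds
(90% disk usage, 14 days until expiry) are assumptions made up for this
illustration.

```python
import shutil
import ssl
import time


def disk_usage_percent(path="/"):
    """Return how full the filesystem containing `path` is, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def days_until_expiry(not_after, now=None):
    """Return the days left until a certificate's notAfter date.

    `not_after` uses the date format found in certificates,
    e.g. "Jan  5 09:34:43 2030 GMT".
    """
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def find_problems(disk_percent, cert_days, max_disk=90.0, min_cert_days=14):
    """Collect human-readable warnings. The thresholds are assumed defaults."""
    problems = []
    if disk_percent >= max_disk:
        problems.append(f"disk is {disk_percent:.0f}% full")
    if cert_days < min_cert_days:
        problems.append(f"certificate expires in {cert_days:.0f} days")
    return problems
```

In practice, the `notAfter` value could be read from the dictionary that
`ssl.SSLSocket.getpeercert()` returns for a live connection. The hard
part, and the reason this needs more time, is everything around such
checks: running them reliably on all machines and alerting the right
people.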
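
The comparison step of the Mailman overview can be sketched in a
similar way. Assuming the settings of each list have already been
exported into a dictionary, the following Python snippet reports every
list that deviates from an expected baseline. This is only an
illustration and not Vincent's implementation. The list names, setting
names and values are invented for the example.

```python
def find_deviations(lists, baseline):
    """Compare each list's settings with a baseline.

    `lists` maps a list name to its exported settings, `baseline` maps a
    setting name to the expected value. The result maps each deviating
    list to {setting: (actual, expected)}. A setting that is missing
    from an export is reported with an actual value of None.
    """
    report = {}
    for name, settings in sorted(lists.items()):
        diff = {
            key: (settings.get(key), expected)
            for key, expected in baseline.items()
            if settings.get(key) != expected
        }
        if diff:
            report[name] = diff
    return report


# Hypothetical data: list names, setting names and values are invented.
baseline = {"archive_private": True, "subscribe_policy": "confirm"}
lists = {
    "team-list": {"archive_private": True, "subscribe_policy": "confirm"},
    "old-list": {"archive_private": False, "subscribe_policy": "confirm"},
}

print(find_deviations(lists, baseline))
# prints {'old-list': {'archive_private': (False, True)}}
```

With 127 lists, a report that only contains the deviations is much
easier to review than a full dump of every setting.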

Overall, the outlook for the System Hackers is better than ever. We
are a growing team carried by motivated and skilled volunteers with a
shared vision of how the systems should develop. At the same time, we
have a lot of public and internal documentation available to make it
easy for new people to join us.

I would like to thank Albert, Florian, Francesco, Thomas and Vincent for
their participation in this meeting, and them and all other System
Hackers for their dedication to keeping the FSFE running!