diff --git a/content/blog/2023-09-seafile-mirror.md b/content/blog/2023-09-seafile-mirror.md
new file mode 100644
index 0000000..4ebd198
--- /dev/null
+++ b/content/blog/2023-09-seafile-mirror.md
@@ -0,0 +1,106 @@
+---
+title: "Seafile Mirror - Simple automatic backup of your Seafile libraries"
+date: 2023-09-22
+categories:
+  - english
+tags:
+  - python
+  - server
+  - tools
+headerimage: /blog/library.jpg
+headercredits: Wouldn't it be a shame if your library were to be destroyed?
+---
+
+I have been using [Seafile](https://www.seafile.com/) for years to host and
+synchronise files on my own server. It's fast and reliable, especially when
+dealing with many files or very large ones. But making reliable backups of
+all its files isn't so trivial. This is because the files are stored in a layout
+similar to bare Git repositories, and Seafile's headless tool, seafile-cli,
+is... suboptimal. So I created what started out as a wrapper for it and ended up
+as a full-blown tool for automatically synchronising your libraries to a backup
+location: [**Seafile Mirror**](https://src.mehl.mx/mxmehl/seafile-mirror).
+
+## My requirements
+
+Of course, you could just take snapshots of the whole server, or copy the raw
+Seafile data files and import them into a newly created Seafile instance for
+disaster recovery, but I want to be able to **directly access the current
+state of the files** whenever I need them in case of an emergency.
+
+It was also important for me to have a **snapshot**, not just another real-time
+sync of a library. This is because I also want to have a backup in case I (or an
+attacker) mess up a Seafile library. A real-time sync would immediately fetch
+that broken state.
+
+I also want to take a snapshot at a **configurable interval**. Some libraries
+should be synchronised more often than others. For example, my picture albums do
+not change as often as my miscellaneous documents, but they use at least 20
+times the disk space and therefore network traffic when running a full sync.
+
+Also, the backup service must have **read-only access** to the files.
+
+A version-controlled backup of the backup (i.e. the plain files) wasn't in
+scope. I handle this separately by backing up my backup location, which also
+contains similar backups of other services and machines. For this reason, my
+current solution does not do incremental backups, even though this may be
+relevant for other use cases.
+
+## The problems
+
+Actually, [seafile-cli](https://help.seafile.com/syncing_client/linux-cli/)
+should have been everything you'd need to fulfill the requirements. But no. It
+turned out that this tool has a number of fundamental issues:
+
+* You can make the host the tool is running on a sync peer. However, this easily
+  leads to sync errors if the user just has read-only permissions to the
+  library.
+* You can also just download a library, but that may again lead to strange sync
+  errors.
+* It requires a running daemon, which crashes irregularly during larger sync
+  tasks or has other issues.
+* Download/sync intervals cannot be set manually.
+
+## The solution
+
+[seafile-mirror](https://src.mehl.mx/mxmehl/seafile-mirror) takes care of all
+these stumbling blocks:
+
+* It downloads/syncs defined libraries in customisable intervals
+* It de-syncs libraries immediately after they have been downloaded to avoid
+  sync errors
+* You can force-re-sync a library even if its re-sync interval hasn't been
+  reached yet
+* Extensive informational and error logging is provided
+* It is built with automation in mind, so you can run it in cron jobs or
+  systemd triggers
+* And as explained, it deals with the numerous caveats of `seaf-cli` and Seafile
+  in general
+
+Full installation and usage documentation can be found in the project
+repository. Installation is as simple as running `pip3 install seafile-mirror`,
+and a sample configuration is provided.
+
+In my setup, I run this application on a headless server with systemd under a
+separate user account. Therefore, the systemd service needs to be set up first.
+This is also covered in the tool's documentation. And as an Ansible power user,
+I also provide an [Ansible
+role](https://src.mehl.mx/mxmehl/seafile-mirror-ansible) that does all the setup
+and configuration.
+
+
+## Possible next steps
+
+The tool has been running every day for a couple of months without any issues.
+However, I could imagine a few more features being helpful for more people:
+
+* Support for login tokens: Currently, only user/password auth is supported,
+  which is fine for my use case as it's just a read-only user. This wouldn't be
+  hard to add either, since seafile-cli supports it (at least in theory).
+  ([#2](https://src.mehl.mx/mxmehl/seafile-mirror/issues/2))
+* Support for encrypted libraries: This shouldn't be a big issue, as it would
+  require passing the password to the underlying seafile-cli command.
+  ([#3](https://src.mehl.mx/mxmehl/seafile-mirror/issues/3))
+
+If you have encountered problems or would like to point out the need for
+specific features, please feel free to contact me or comment on the Mastodon
+post. I'd also love to hear if you've become a happy user of the tool 😊.
diff --git a/static/img/blog/library.jpg b/static/img/blog/library.jpg
new file mode 100644
index 0000000..5aa03b3
Binary files /dev/null and b/static/img/blog/library.jpg differ