blog.polynom.me/_posts/2020-01-03-Selfhosting-Lessons.md


title: Lessons Learned From Self-Hosting
hashtag: selfhostlessons

Roughly eight months ago, according to my hosting provider, I spun up the VM that I still use to self-host my chat, my mail, my git and so on. At the beginning, I thought that it would allow me both to get away from proprietary software and to learn Linux administration. While I met the first goal without any problems, I achieved the second in ways I did not anticipate.

During these eight months, I learned quite a lot. Not by reading documentation, but by messing up deployments. So this post is an account of how I messed up and what lessons I learned from it.

Lesson 1: Document everything

I always tell people that you should document your code. When asked why, I answer that you won't remember what a line does after not looking at your codebase for weeks or months.

What I did not realise is that this also applies to administration. I only wrote basic documentation like a howto for certificate generation or a small troubleshooting guide. This, however, missed the most important thing to document: the entire infrastructure.

Whenever I needed to look up my port mappings, what did I do? I opened my Docker Compose configuration and searched for the port mappings. What did I do when I wanted to know which services I had? I opened my nginx configuration and searched for server directives.

This is a very slow process, since I have to remember which services sit behind the reverse proxy and which ones I have simply exposed. In the end, this led me to create a folder, called docs, in which I document everything: which certificates are used by what and where they are stored, port mappings, a graph showing the dependencies between my services, and so on. While it may be tedious to create at first, it really helps.
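Some of that documentation can be generated rather than written by hand, which keeps it from drifting out of date. The following is a minimal Python sketch that scans nginx configuration text for server_name and listen directives and prints a Markdown table of virtual hosts. It assumes a well-formed configuration with one directive per line and no braces inside comments or strings. It does not follow include directives, so you would pass it every site file explicitly. Treat it as a starting point rather than a real nginx parser.

```python
import re
from pathlib import Path


def server_blocks(config: str) -> list[str]:
    """Return the bodies of all "server { ... }" blocks in nginx config text."""
    blocks = []
    for match in re.finditer(r"\bserver\s*\{", config):
        depth, i = 1, match.end()
        # Walk forward, counting braces so nested "location { }" blocks are kept.
        while i < len(config) and depth:
            if config[i] == "{":
                depth += 1
            elif config[i] == "}":
                depth -= 1
            i += 1
        blocks.append(config[match.end():i - 1])
    return blocks


def describe(config: str) -> list[tuple[str, str]]:
    """Return (server_name, listen) pairs, one per server block."""
    rows = []
    for body in server_blocks(config):
        names = re.findall(r"^\s*server_name\s+([^;]+);", body, re.MULTILINE)
        listens = re.findall(r"^\s*listen\s+([^;]+);", body, re.MULTILINE)
        rows.append((" ".join(names), ", ".join(listens)))
    return rows


def to_markdown(rows: list[tuple[str, str]]) -> str:
    """Format the rows as a Markdown table for the docs folder."""
    lines = ["| server_name | listen |", "|---|---|"]
    lines += [f"| {name} | {listen} |" for name, listen in rows]
    return "\n".join(lines)


def describe_files(paths: list[str]) -> str:
    """Concatenate several config files (e.g. one per site) and tabulate them."""
    return to_markdown(describe("\n".join(Path(p).read_text() for p in paths)))
```

Running describe_files over the site configuration files (the exact paths depend on your distribution) and writing the result into docs answers "which services do I have?" without having to grep by hand.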

[World]
+
|
+-[443]-[nginx]-+-(blog.polynom.me)
                +-(git.polynom.me)-[gitea]

Above, you can see an excerpt from my "network graph".
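The graph does not have to be drawn by hand either. The Python sketch below renders a nested dictionary of services as a tree in roughly the style of the excerpt above. The SERVICES structure is a hypothetical description of the setup, and the layout is deliberately simpler than what a dedicated tool such as Graphviz would produce.

```python
# Hypothetical description of the setup: each node maps to its children.
SERVICES = {
    "[World]": {
        "[443]-[nginx]": {
            "(blog.polynom.me)": {},
            "(git.polynom.me)-[gitea]": {},
        },
    },
}


def render_tree(tree: dict, prefix: str = "") -> list[str]:
    """Render a nested dict as an ASCII tree, one node per line."""
    lines = []
    items = list(tree.items())
    for index, (name, children) in enumerate(items):
        last = index == len(items) - 1
        lines.append(f"{prefix}+-{name}")
        # Children of a non-last node get a "|" guide so later siblings stay connected.
        lines += render_tree(children, prefix + ("  " if last else "| "))
    return lines


if __name__ == "__main__":
    print("\n".join(render_tree(SERVICES)))
```

Because the dictionary is plain data, it can live in the same repository as the rest of the configuration and be updated together with it.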

Lesson 2: Version Control everything

Version Control Systems are a great thing. Want to try something out? Branch, try out and then either merge back or roll back. Want to find out what changes broke something? Diff the last revisions and narrow down your "search space". Want to know what you did? View the log.
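Narrowing down the "search space" is what git bisect automates. Given one known-good and one known-bad revision, it repeatedly tests the revision halfway in between. The following is a minimal Python sketch of that idea. It assumes a linear history and a hypothetical is_broken check, which in practice would be whatever test tells you that the service still works.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_broken: Callable[[str], bool]) -> str:
    """Find the first broken revision in a linear history by bisection.

    Assumes revisions run oldest to newest, revisions[0] is known good,
    revisions[-1] is known broken, and nothing gets fixed in between.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_broken(revisions[mid]):
            bad = mid  # the culprit is at mid or earlier
        else:
            good = mid  # the culprit comes after mid
    return revisions[bad]
```

Each check halves the remaining range, so finding the culprit takes roughly log2(n) checks instead of testing every revision. In a real repository, git bisect start, git bisect good and git bisect bad do this bookkeeping for you, and git bisect run can call a test script automatically.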

While it might seem unnecessary, it helps me keep my cool, knowing that if I ever mess up my configuration, I can simply roll it back with git.

Lesson 3: Have a test environment

While I was out once, I connected to a public Wifi. There, however, I could not connect to my VPN; it simply did not work. A bit later, my Jabber client Conversations told me that it could not find my server. After some thinking, I came to the conclusion that the provider of said public Wifi was probably blocking port 5222 (XMPP client-to-server) as well as whatever port my VPN uses.

As such, I wanted to change the port my Jabber server listens on. Since I do not have a failover server, I tried testing things out locally, but gave up after some time and simply "tested in production". Needless to say, this was a bad idea. At first, Conversations did not do a DNS lookup to discover the changed XMPP port, which led me to remove the DNS entry. However, after some time (probably once the DNS change had propagated far enough), Conversations said that it could not find the server, even though the server was listening on port 5222. Testing with the new port yielded success.

This experience was terrible for me. Not only was it possible that I had broken my Jabber server, but it would also annoy everyone I had convinced to install a Jabber client to talk to me, as their clients would display "Cannot connect to...". If I had tested this locally, I probably would have been much calmer. In the end, I nervously watched as everyone gradually reconnected...
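Part of that diagnosis can be scripted before changing any configuration. The Python sketch below tries plain TCP connections to a list of ports and reports which ones get through. Running it from the affected network would show whether port 5222 is being blocked. A successful TCP handshake only means the port is not filtered. It does not mean the XMPP service behind it is healthy.

```python
import socket


def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and DNS resolution failures (socket.gaierror).
        return False


def probe(host: str, ports: list[int]) -> dict[int, bool]:
    """Check several ports on one host, e.g. the default and a candidate."""
    return {port: reachable(host, port) for port in ports}
```

For example, probe("xmpp.example.org", [5222, 443]) checks the standard XMPP client port against 443, which public networks rarely block. The host name here is a placeholder. Running the same probe against a local test instance before touching the production server is a cheap way to get the test environment this lesson is about.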

Lesson 4: Use tools and write scripts

I provisioned the first server I ever got manually. Back then, it made sense: it was a one-time provisioning, and nothing was supposed to change after the initial deployment. But now that I have a continually evolving server, I need some way to document every step in case I ever need to provision the same server again.

In my case, that tool is Ansible. In my playbook, I keep all the roles (e.g. nginx, matterbridge, prosody) separate and apply them to my single server. I also make heavy use of templates there. The reason is that before I started my "Road to FOSS", I used a different domain that I had lying around. Changing the domain name manually would have been a very tedious process, so I decided to use templates from the get-go. To make my life easier in case I ever change domains again, I defined all my domain names based on my domain variable: the domain for git is defined as git.{{ domain }}, the blog one as blog.{{ domain }}. Additionally, I make use of Ansible Vault, which allows me to keep encrypted secrets in my playbook.
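Ansible's templates use Jinja2, so the following is not Ansible itself. It is a small Python sketch of the same idea using the standard library's string.Template. Every host name is derived from one domain variable, and a template that references an undefined variable fails loudly instead of silently producing a broken file. The service names and the nginx fragment are illustrative assumptions.

```python
from string import Template


def service_domains(domain: str) -> dict[str, str]:
    """Derive every service host name from the single base domain variable."""
    return {
        "domain": domain,
        "git_domain": f"git.{domain}",
        "blog_domain": f"blog.{domain}",
    }


# Hypothetical nginx fragment. With string.Template, nginx's own variables
# (e.g. $host) would have to be escaped as $$host.
GITEA_VHOST = Template(
    "server {\n"
    "    listen 443 ssl;\n"
    "    server_name $git_domain;\n"
    "}\n"
)


def render_config(template: Template, domain: str) -> str:
    """Fill a template; substitute() raises KeyError on unknown placeholders."""
    return template.substitute(service_domains(domain))
```

Changing domains then comes down to calling render_config with a new value. A placeholder without a value is an error rather than leftover text in the output, which is exactly where a sed-based replacement tends to go wrong.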

During another project, I also set up an Ansible playbook. There, however, I did not use Ansible's templates. Instead, I templated the configuration files with a Makefile that called sed to replace the placeholders. Not only was that a fragile method, it was also unnecessary, as Ansible already provided this functionality. I was just wasting my own time.

What I also learned was that one Ansible playbook is not enough. While it is nice to provision a server automatically with Ansible, there are other things that need to be done; certificates, for example, don't rotate themselves. From that, I derived a rule: if a task needs to be done more than once, it is time to write a script for it.
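Checking how long the current certificates remain valid is a typical candidate for such a script. Below is a minimal Python sketch that uses only the standard library's ssl module. The host list and the 30-day threshold are assumptions to adapt. Because the default context verifies certificates, a certificate that has already expired raises a verification error instead of returning a negative number. That error is also worth reporting.

```python
import socket
import ssl
import time
from typing import Optional


def days_left(not_after: str, now: Optional[float] = None) -> float:
    """Days until a notAfter timestamp such as 'Jan  5 09:34:43 2018 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a TLS connection and return the peer certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def expiring(hosts: list[str], threshold_days: float = 30) -> list[str]:
    """Hosts whose certificate expires within threshold_days (needs network access)."""
    return [h for h in hosts if days_left(fetch_not_after(h)) < threshold_days]
```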

Lesson 4.1: Automate

Closely tied to the last point: if a task needs to be performed regularly, consider creating a cronjob (or a systemd timer, if that is more your thing) to run it automatically. You don't want to be enjoying your day, only for it to be ruined by an expired certificate causing issues.

Since automated cronjobs can cause trouble as well, I decided to run all automated tasks on days and at times when I am likely to be able to react. It is also very important to notify yourself of those automated actions. My certificate rotation, for example, sends me an email at the end, telling me whether the certificates were successfully rotated and, if not, which ones failed. For those cases, I also keep a log of the rotation process elsewhere so that I can review it.
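Both habits described here (only acting when you can react, and always reporting the outcome) can be built into the job itself. The Python sketch below is one way to do this under several assumptions, all of them hypothetical:

- the reaction window is weekdays between 9:00 and 17:00;
- the rotate callback does the actual work;
- the mail addresses are placeholders;
- an SMTP server is listening on localhost.

The log is written to a file, so the details survive even if the mail gets lost.

```python
import logging
import smtplib
from datetime import datetime
from email.message import EmailMessage
from typing import Callable

log = logging.getLogger("cert-rotation")


def in_reaction_window(now: datetime) -> bool:
    """Hypothetical policy: only run on weekdays between 09:00 and 17:00."""
    return now.weekday() < 5 and 9 <= now.hour < 17


def rotate_all(certs: list[str], rotate: Callable[[str], None]) -> dict[str, str]:
    """Call the (hypothetical) rotate callback per certificate and collect errors."""
    failures = {}
    for cert in certs:
        try:
            rotate(cert)
            log.info("rotated %s", cert)
        except Exception as exc:  # keep going so one failure doesn't hide others
            log.exception("rotating %s failed", cert)
            failures[cert] = str(exc)
    return failures


def summary(certs: list[str], failures: dict[str, str]) -> EmailMessage:
    """Build the notification mail: success count plus each failure."""
    msg = EmailMessage()
    ok = len(certs) - len(failures)
    msg["Subject"] = f"Certificate rotation: {ok}/{len(certs)} succeeded"
    msg["From"] = "cron@example.org"  # placeholder addresses
    msg["To"] = "admin@example.org"
    lines = [f"FAILED {c}: {err}" for c, err in failures.items()]
    msg.set_content("\n".join(lines) or "All certificates rotated.")
    return msg


def main(certs: list[str], rotate: Callable[[str], None]) -> None:
    logging.basicConfig(filename="rotation.log", level=logging.INFO)
    if not in_reaction_window(datetime.now()):
        log.warning("outside the reaction window, skipping")
        return
    failures = rotate_all(certs, rotate)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(summary(certs, failures))
```

The cron schedule or systemd timer would normally keep the job inside the window on its own. The check in main is only a safety net for manual or misconfigured runs.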

Lesson 5: Unexpected things happen

After my shiny server had been running for some time, I was happy. It was basically running itself, until Conversations was unable to contact my server while I was connected to a public Wifi. This is something that I did not anticipate, but it happened nevertheless.

This showed me that my deployment is not a run-and-forget solution but a constantly evolving system, to which small improvements are periodically added.

Conclusion

I thought I would just write down my thoughts on all the things that went wrong over the course of my self-hosting adventure. These may not be best practices, but they really helped me a lot.

Was the entire process difficult? At first. Was the experience an opportunity to learn? Absolutely! Was it fun? Definitely.