<!-- title: Lessons Learned From Self-Hosting -->
<!-- render: yes -->
Roughly eight months ago, according to my hosting provider, I spun up the VM that I use to this
day to self-host my chat, my mail, my git and so on. At the beginning, I thought that it would
allow me both to get away from proprietary software and to learn Linux administration. While my
first goal was met without any problems, the second one I achieved in ways I did not anticipate.

During these eight months, I learned quite a lot. Not by reading documentation, but by messing up
deployments. So this post is my account of how I messed up and what lessons I learned from it.

# Lesson 1: Document everything
I always tell people that you should document your code. When asked why, I answer that you won't
remember what that line does when you have not looked at your codebase for weeks or months.

What I did not realise is that this also applies to administration. I only wrote basic
documentation, like a howto for certificate generation or a small troubleshooting guide. This,
however, missed the most important thing to document: the entire infrastructure.

Whenever I needed to look up my port mapping, what did I do? I opened up my *Docker compose*
configuration and searched for the port mappings. What did I do when I wanted to know what services
I have? I opened my *nginx* configuration and searched for `server` directives.

This is a very slow process since I have to remember which services I have behind a reverse proxy
and which ones I have simply exposed. In the end, this led me to create a folder, called `docs`, in
which I document everything: which certificates are used by what and where they are, port mappings,
a graph showing the dependencies of my services, and so on. While it may be tedious to create at
first, it will really help.

```
[World]
+
|
+-[443]-[nginx]-+-(blog.polynom.me)
                +-(git.polynom.me)-[gitea]
```

Above, you can see an excerpt from my *"network graph"*.

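
Part of such an inventory can be generated instead of being written by hand. The following is only a
minimal sketch of that idea, not the tooling I actually use: it pulls every `server_name` out of an
*nginx* configuration given as text. The function name, the example configuration and the upstream
port `3000` are assumptions made for illustration.

```python
import re

# Matches the arguments of a `server_name` directive up to the terminating semicolon.
# Commented-out directives are skipped because the line must start with the directive.
SERVER_NAME_RE = re.compile(r"^\s*server_name\s+([^;]+);", re.MULTILINE)


def list_server_names(nginx_config: str) -> list[str]:
    """Return every host name declared via `server_name`, sorted and de-duplicated."""
    names = set()
    for match in SERVER_NAME_RE.finditer(nginx_config):
        # A single directive may list several names separated by whitespace.
        names.update(match.group(1).split())
    return sorted(names)


# Hypothetical configuration, loosely modelled on the graph above.
EXAMPLE_CONFIG = """
server {
    listen 443 ssl;
    server_name blog.polynom.me;
}
server {
    listen 443 ssl;
    server_name git.polynom.me;
    location / { proxy_pass http://127.0.0.1:3000; }
}
"""

if __name__ == "__main__":
    for name in list_server_names(EXAMPLE_CONFIG):
        print(name)
```

The output of something like this could be written into the `docs` folder, so that the list of
services does not drift away from the real configuration.
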
# Lesson 2: Version Control everything
Version Control Systems are a great thing. Want to try something out? Branch, try it out and then
either merge back or roll back. Want to find out what changes broke something? Diff the last
revisions and narrow down your "search space". Want to know what you did? View the log.

While it might seem unnecessary, it helps me keep my cool, knowing that if I ever mess up my
configuration, I can just roll it back from within git.

# Lesson 3: Have a test environment
While I was out once, I connected to a public Wifi. There, however, I could not connect to my VPN.
It simply did not work. A bit later, my Jabber client *Conversations* told me that it could not
find my server. After some thinking, I came to the conclusion that the provider of said public Wifi
was probably blocking port `5222` *(XMPP Client-to-Server)* and whatever port the VPN is using. As
such, I wanted to change the port my Jabber server uses. Since I do not have a failover server, I
tried testing things out locally, but gave up after some time and just went and "tested in
production". Needless to say, this was a bad idea. At first, *Conversations* did not do a DNS
lookup to see the changed XMPP port, which led me to remove the DNS entry. However, after some
time - probably after the DNS change propagated far enough - *Conversations* said that it could not
find the server, even though it was listening on port `5222`. Testing with the new port yielded
success.

This experience was terrible for me. Not only was it possible that I had broken my Jabber server,
but it would also annoy everyone I had gotten to install a Jabber client to talk to me, as it would
display *"Cannot connect to..."*. If I had tested this locally, I probably would have been much
calmer. In the end, I nervously watched as everyone gradually reconnected...

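
A suspicion like "this network blocks port `5222`" can be checked before touching any server
configuration. The snippet below is a minimal sketch of such a check and not something I used back
then; the function name is made up for illustration. It only tells you whether a TCP connection can
be opened, not whether an XMPP server is actually answering behind it.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed DNS lookups alike.
        return False
```

Called as `port_is_reachable("xmpp.example.org", 5222)` from the restrictive network, where
`xmpp.example.org` is a placeholder host name, it shows whether the old port is blocked there, and
the same call with a candidate replacement port shows whether that one would get through.
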
# Lesson 4: Use tools and write scripts
The first server I ever got I provisioned manually. I mean, back then it made sense: it was a
one-time provisioning and nothing should change after the initial deployment. But now that I have a
continually evolving server, I somehow need to document every step in case I ever need to provision
the same server again.

In my case, the tool for this is *Ansible*. In my playbook I keep all the roles, e.g. *nginx*,
*matterbridge*, *prosody*, separate and apply them to my one server. In there I also made **heavy**
use of templates. The reason for it is that before I started my
[*"Road to FOSS"*](https://blog.polynom.me/Road-to-Foss.html) I used a different domain that I had
lying around. Changing the domain name manually would have been a very tedious process, so I decided
to use templates from the get-go. To make my life easier in case I ever change domains again, I
defined all my domain names based on my `domain` variable. The domain for git is defined as
{% raw %}`git.{{ domain }}`{% endraw %}, the blog one as {% raw %}`blog.{{ domain }}`{% endraw %}.
Additionally, I make use of *Ansible Vault*, allowing me to have encrypted secrets in my playbook.

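
To illustrate why deriving everything from one `domain` variable pays off, here is a minimal sketch
of the same idea outside of *Ansible*. *Ansible* renders its templates with Jinja2; this example
swaps in `string.Template` from the Python standard library so that it runs without extra
dependencies. The template text and the function name are hypothetical and not taken from my
playbook.

```python
from string import Template

# Hypothetical nginx snippet; `$domain` plays the role of the `domain` variable.
SERVER_TEMPLATE = Template(
    "server {\n"
    "    listen 443 ssl;\n"
    "    server_name $subdomain.$domain;\n"
    "}\n"
)


def render_server_block(subdomain: str, domain: str) -> str:
    """Fill both placeholders of the template and return the rendered text."""
    return SERVER_TEMPLATE.substitute(subdomain=subdomain, domain=domain)


if __name__ == "__main__":
    # Switching to another domain means changing this one value.
    domain = "polynom.me"
    for subdomain in ("git", "blog"):
        print(render_server_block(subdomain, domain))
```
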
During another project, I also set up an *Ansible* playbook. There, however, I did not use
*Ansible*'s templates. Instead, I templated the configuration files using a Makefile that called
`sed` to replace the patterns. Not only was that a fragile method, it was also unneeded, as
*Ansible* already provided this functionality for me. I was just wasting my own time.

What I also learned was that one *Ansible* playbook is not enough. While it is nice to automatically
provision a server using *Ansible*, there are other things that need to be done. Certificates don't
rotate themselves. From that, I derived a rule stating that if a task needs to be done more than
once, then it is time to write a script for it.

# Lesson 4.1: Automate
Closely tied to the last point: if a task needs to be performed regularly, then you should consider
creating a cronjob, or a systemd timer if that is more your thing, to automatically run it. You
don't want to enjoy your day, only for it to be ruined by an expired certificate causing issues.

Since automated cronjobs can cause trouble as well, I decided to run all automated tasks on days and
at times during which I am likely to be able to react. As such, it is very important to notify
yourself of those automated actions. My certificate rotation, for example, sends me an email at the
end, telling me whether the certificates were successfully rotated and, if not, which ones failed.
For those cases, I also keep a log of the rotation process somewhere else so that I can review it.

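
The notification part does not need much code. The following is a minimal sketch under my own
assumptions, not my actual rotation script: `rotation_report` builds the text of such a mail from
a mapping of certificate names to success flags, and `days_until_expiry` computes how many days are
left from a certificate's `notAfter` date. Both function names are made up for illustration, and
actually sending the mail is left out.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left until a certificate's notAfter date, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    if now is None:
        now = time.time()
    # cert_time_to_seconds converts the notAfter string into seconds since the epoch.
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def rotation_report(results: dict[str, bool]) -> str:
    """Build the mail body from a mapping of certificate name to 'rotated successfully?'."""
    failed = sorted(name for name, ok in results.items() if not ok)
    if not failed:
        return f"All {len(results)} certificates were rotated successfully."
    return "Rotation failed for: " + ", ".join(failed)


if __name__ == "__main__":
    # Hypothetical results of one rotation run.
    print(rotation_report({"git": True, "blog": True, "xmpp": False}))
```
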
# Lesson 5: Unexpected things happen
After having my shiny server run for some time, I was happy. It was basically running itself. Until
*Conversations* was unable to contact my server while I was connected to a public Wifi. This is
something that I did not anticipate, but it happened nevertheless.

This means that my deployment was not a run-and-forget solution but a constantly evolving system,
where small improvements are periodically added.

# Conclusion
I thought I would just write down my thoughts on all the things that went wrong over the course of
my self-hosting adventure. They may not be best practices, but they are things that really helped
me a lot.

Was the entire process difficult? At first. Was the experience an opportunity to learn? Absolutely! Was it fun? Definitely.