blog.polynom.me/content/blog/2023-07-15-prosody-traefik-2.md

Hello everyone. Long time, no read.

In 2020, I published a post titled "Running Prosody on Port 443 Behind traefik", where I described how I run my XMPP server behind the "application proxy" traefik. I did this because I wanted to run my XMPP server prosody on port 443, so that the clients connected to my server can bypass firewalls that only allow web traffic. While that approach worked, over the last three years I changed my setup dramatically.

While migrating my old server from Debian to NixOS, I decided that I wanted a website hosted at the same domain I host my XMPP server at. This, however, was not possible with traefik back then because it only allowed the HostSNI rule, which differentiates TLS connections using the sent Server Name Indication. This is a problem, because a connection to polynom.me the website and polynom.me the XMPP server both result in the same SNI being sent by a connecting client.

Some time later, I stumbled upon sslh, which is a tool similar to traefik in that it allows hosting multiple services on the same port, all differentiated by the SNI and the ALPN set by the connecting client. ALPN, or Application-Layer Protocol Negotiation, is an extension to TLS which allows a connecting client to advertise the protocol(s) it would like to use inside the encrypted session (source). As such, I put sslh in front of my traefik and told it to route XMPP traffic (identified with an ALPN of xmpp-client) to my prosody server and everything else to my traefik server. While this worked well, there were two issues:

  1. I was not running sslh in its "transparent mode", which uses some fancy iptables rules to let the services behind it see a connecting client's real IP address instead of just 127.0.0.1, but which requires more setup to work. Without it, services that enforce rate limits, like Nextcloud and Akkoma, have a problem: if one of these services gets hit by many requests, all it sees are requests from 127.0.0.1, so it may rate limit (or ban) 127.0.0.1, meaning that all requests, even legitimate ones, are rate limited. Additionally, I was not sure whether transparent mode could route an incoming IPv6 request to 127.0.0.1, which is an IPv4 address.
  2. One day, as I was updating my server, I noticed that all my web services were responding very slowly. After some looking around, it turned out that sslh took about 5 seconds to route IPv6 requests, but not IPv4 requests. As I did not change anything (besides updating the server), to this day I am not sure what happened.
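
To make the rate-limiting problem from the first point concrete, here is a minimal Python sketch. The limiter class, the limit of two requests, and the example addresses are all made up for illustration; this is not how Nextcloud or Akkoma actually implement their limits. It only shows what happens when every request appears to come from 127.0.0.1.

```python
from collections import Counter


class PerAddressLimiter:
    """Toy limiter (hypothetical): at most `limit` requests per source address."""

    def __init__(self, limit):
        self.limit = limit
        self.seen = Counter()

    def allow(self, source_address):
        # Count the request, then check whether the address is over budget.
        self.seen[source_address] += 1
        return self.seen[source_address] <= self.limit


def simulate(limit, client_ips, behind_proxy):
    """Return (client_ip, allowed) pairs for a sequence of requests.

    With behind_proxy=True the service only ever sees 127.0.0.1,
    as it would behind a non-transparent proxy.
    """
    limiter = PerAddressLimiter(limit)
    results = []
    for client_ip in client_ips:
        visible = "127.0.0.1" if behind_proxy else client_ip
        results.append((client_ip, limiter.allow(visible)))
    return results


# One noisy client followed by one well-behaved client (documentation addresses).
requests = ["203.0.113.7"] * 3 + ["198.51.100.20"]
direct = simulate(2, requests, behind_proxy=False)
proxied = simulate(2, requests, behind_proxy=True)
```

In the direct case only the noisy client's third request is rejected. In the proxied case the well-behaved client is rejected as well, because both clients share the budget of 127.0.0.1.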

Due to these two issues, I decided to revisit the idea I described in my old post.

The Prosody Setup

On the prosody side of things, I did not change a lot compared to the old post. I did, however, migrate from the legacy_ssl_* options to the newer c2s_direct_tls_* options that replace them.

Thus, my prosody configuration regarding direct TLS connections now looks like this:

c2s_direct_tls_ports = { 5223 }
c2s_direct_tls_ssl = {
    [5223] = {
        key = "/etc/prosody/certs/polynom.me.key";
        certificate = "/etc/prosody/certs/polynom.me.crt";
    };
}

The Traefik Setup

On the traefik side of things, only one thing really changed: instead of just having a rule using HostSNI, I now also require that a connection to the XMPP server advertises an ALPN of xmpp-client, which is specified in the appropriate XMPP spec (XEP-0368). In my experience, all clients I tested (Conversations, Blabber, Gajim, Dino, Monal, Moxxy) correctly set the ALPN when connecting via a direct TLS connection.
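
To illustrate what a proxy has to work with when it makes this decision, here is a rough Python sketch that builds a TLS ClientHello in memory (no network connection is made) and reads the SNI and ALPN values back out of the unencrypted bytes. The function names are my own, and the parser is deliberately minimal: it assumes the whole ClientHello fits into a single TLS record and does no error handling. It is not how traefik parses the handshake.

```python
import ssl
import struct


def build_client_hello(hostname, alpn_protocols):
    """Return the raw ClientHello bytes a TLS client would send first."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.set_alpn_protocols(alpn_protocols)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the client is now waiting for the server's reply
    return outgoing.read()


def parse_sni_and_alpn(hello):
    """Extract (sni, [alpn, ...]) from a single-record ClientHello."""
    # Skip record header (5), handshake header (4), version (2), random (32).
    pos = 5 + 4 + 2 + 32
    pos += 1 + hello[pos]                      # session id
    (suites_len,) = struct.unpack_from("!H", hello, pos)
    pos += 2 + suites_len                      # cipher suites
    pos += 1 + hello[pos]                      # compression methods
    (ext_total,) = struct.unpack_from("!H", hello, pos)
    pos += 2
    end = pos + ext_total

    sni, alpn = None, []
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", hello, pos)
        pos += 4
        data = hello[pos:pos + ext_len]
        pos += ext_len
        if ext_type == 0:      # server_name extension
            # Layout: list length (2), name type (1), name length (2), name.
            (name_len,) = struct.unpack_from("!H", data, 3)
            sni = data[5:5 + name_len].decode("ascii")
        elif ext_type == 16:   # application_layer_protocol_negotiation
            i = 2              # skip the 2-byte list length
            while i < len(data):
                length = data[i]
                alpn.append(data[i + 1:i + 1 + length].decode("ascii"))
                i += 1 + length
    return sni, alpn


hello = build_client_hello("polynom.me", ["xmpp-client"])
sni, alpn = parse_sni_and_alpn(hello)
```

Both values travel in the clear in the first message of the handshake, which is why a proxy can route on them without terminating TLS.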

So my traefik configuration now looks something like this (not exactly, because I let NixOS generate the actual config, but it is very similar):

tcp:
    routers:
        xmpps:
            entrypoints:
                - "https"
            rule: "HostSNI(`polynom.me`) && ALPN(`xmpp-client`)"
            service: prosody
            tls:
                passthrough: true
        # [...]
    services:
        prosody:
            loadBalancer:
                servers:
                    - address: "127.0.0.1:5223"

http:
    routers:
        web-secure:
            entrypoints:
                - "https"
            rule: "Host(`polynom.me`)"
            service: webserver
            # An empty mapping is enough to enable TLS on this router
            tls: {}

The entrypoint https is just set to listen on :443. This way, I can route both IPv4 and IPv6 requests. Also note the passthrough: true in the XMPP router's tls settings. If this is not set to true, traefik terminates the connection's TLS session itself before passing the data on to the XMPP server.

However, this config has one really big issue: In order to have the website hosted at polynom.me be served using TLS, I have to set the router's tls attribute. The traefik documentation says that "If both HTTP routers and TCP routers listen to the same entry points, the TCP routers will apply before the HTTP routers. If no matching route is found for the TCP routers, then the HTTP routers will take over." (source).
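
As a toy model of that documented precedence, applied to the two routers from the config above, consider the following Python sketch. The `route` function is purely illustrative and has nothing to do with traefik's actual implementation; I am also assuming that an ALPN rule matches if any of the protocols offered by the client matches.

```python
def route(sni, alpn_protocols):
    """Pick a service the way the documentation describes it (toy model).

    TCP routers are tried first; if none of them matches, the HTTP
    routers take over.
    """
    # TCP router "xmpps": HostSNI(`polynom.me`) && ALPN(`xmpp-client`)
    if sni == "polynom.me" and "xmpp-client" in alpn_protocols:
        return "prosody"

    # HTTP router "web-secure": Host(`polynom.me`), with TLS enabled
    if sni == "polynom.me":
        return "webserver"

    # No router matched at all.
    return None


xmpp_target = route("polynom.me", ["xmpp-client"])
web_target = route("polynom.me", ["h2", "http/1.1"])
```

In this model an XMPP client ends up at prosody and a browser ends up at the web server, even though both send the same SNI. That is the behaviour the documentation promises.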

This, however, does not seem to be the case if an HTTP router (in my example with Host(`polynom.me`)) and a TCP router (in my example with HostSNI(`polynom.me`)) respond to the same SNI and the HTTP router has its tls attribute set. In that case, the HTTP router appears to be checked first and will complain if the sent ALPN is not one of the HTTP ALPNs, for example when connecting using XMPP. As such, we can connect to the HTTP server but not to the XMPP server.

I do not appear to be alone with this issue, but it is also not fixed. So I dug around in traefik's code and tried a couple of things. For my setup to work, I have to apply this patch to traefik. With that, the issue appears to be gone, and I can access both my website and my XMPP server on the same domain and on the same port. Do note that this patch is not upstreamed and may break things. For me, it works, but I have not run extensive tests, nor traefik's integration and unit tests.

Conclusion

This approach solves problem 2 fully and problem 1 partially. Unlike sslh, traefik is able to route the connections correctly with no delay. It also provides my web services with the connecting clients' IP addresses using HTTP headers. It does not, however, provide my XMPP server with a connecting client's IP address. This could be solved with some clever trickery, like telling traefik to use the PROXY protocol when connecting to prosody, and enabling the net_proxy module. However, I have not yet tried such a setup, though I am very curious and may try that out.
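
For readers who have not come across the PROXY protocol: the idea is that the proxy prepends a small header containing the client's real address to the connection it opens to the backend. The Python sketch below builds and parses the human-readable version 1 header. It is only meant to show what information would reach prosody; the function names and addresses are made up, and I have not checked which protocol version a traefik and prosody setup would actually end up using.

```python
import ipaddress


def proxy_v1_header(src_ip, src_port, dst_ip, dst_port):
    """Build a PROXY protocol version 1 header line as bytes."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must use the same IP version")
    family = "TCP4" if src.version == 4 else "TCP6"
    return f"PROXY {family} {src} {dst} {src_port} {dst_port}\r\n".encode("ascii")


def parse_proxy_v1_header(line):
    """Return (client_ip, client_port) from a version 1 header line."""
    parts = line.decode("ascii").rstrip("\r\n").split(" ")
    if len(parts) != 6 or parts[0] != "PROXY":
        raise ValueError("not a PROXY protocol version 1 header")
    _, _family, src_ip, _dst_ip, src_port, _dst_port = parts
    return src_ip, int(src_port)


# What the proxy would send to the backend before the client's TLS bytes.
header = proxy_v1_header("203.0.113.7", 51234, "127.0.0.1", 5223)
client = parse_proxy_v1_header(header)
```

The backend reads this one line first and then treats the rest of the stream as the client's own data, so it learns the real address even though the TCP connection itself comes from the proxy.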