+++
title = "About Logging"
date = "2021-04-16"
template = "post.html"
aliases = [ "/About-Logging.html" ]
+++
*TL;DR*: This post talks about the problems I faced while working on my logging. To log to
syslog from within containers whose software does not support configuring a remote syslog server, I had
*syslog-ng* expose a unix domain socket and mounted it into the containers at `/dev/log`.

<!-- more -->

## Introduction

I have written a lot of blog posts about the lessons I have learned while setting up and
maintaining my server. But now that I have started to rework my infrastructure a bit, I inevitably
had to look at something I may have overlooked in the past: logging!

Previously, I had *Docker* *kind of* manage my logs: If I needed something, I would just
call `docker-compose logs <service>` and it would spit out logs. Then, I started to
configure my services to log to files in various locations: my *prosody* server would
log to `/etc/prosody/logs/info.log`, my *nginx* to `/etc/nginx/logs/error.log`, etc.
This, however, turned out to be problematic
as, in my case, *prosody* stopped logging into the file if I rotated it with *logrotate*. It was
also a bit impractical, as the logs were not all in the same place, but distributed across multiple
directories.
Moreover, *prosody* was logging things that I did not want in my logs but could not turn off,
like when a client connected or authenticated itself. For me, this is a problem from two perspectives:
On the one hand, it is metadata that does not help me debug a hypothetical issue I have with my
*prosody* installation; on the other hand, it is metadata I straight-up do not want to store.

My solution was using a syslog daemon to process the logs, so that I could remove logs that I do not
want or need, and store the rest in `/var/log`. However, there was a problem that I faced almost
immediately: Not all software that I can configure to log to syslog can also be configured to log to a
specific syslog server. Why is this a problem? Well, syslog does not work inside a *Docker* container out of the
box, so my syslog daemon has to expose a socket that logs can be sent to: a TCP or UDP socket for software
that can talk to a remote syslog server, or a unix domain socket for software that cannot. To
see this issue, you can try to run `logger -t SomeTag Hello World` inside one of your containers
and then look for the message, e.g. in your host's journal.

Today, I found my solution to both syslog logging within the containers and filtering out unneeded logs.

## Syslog inside Containers

The first step was getting the logs out of my containers without using files. To this end, I configured
my syslog daemon - *syslog-ng* - to expose a unix domain socket at, for example, `/var/run/syslog` and
bind-mounted that socket into all containers as `/dev/log`:

```
source s_src {
       system();
       internal();
       unix-dgram("/var/run/syslog");
};
```

If you now try and run `logger -t SomeTag Hello World` inside the container, you should be able
to find "Hello World" inside the host's logs or journals.

## Ignoring Certain Logs

The next step was ignoring logs that I do not need or care about. For this, I set up two log rules within
*syslog-ng*: one that writes to my actual log file and one that drops the matching lines:

```
destination d_prosody {
    file("/var/log/prosody.log");
};
filter f_prosody {
    program("prosody");
};
filter f_prosody_drop {
    program("prosody")
        and message("(Client connected|Client disconnected|Authenticated as .*|Stream encrypted .*)$");
};

# Drop
log {
    source(s_src);
    filter(f_prosody_drop);
    flags(final);
};
# Log
log {
    source(s_src);
    filter(f_prosody);
    destination(d_prosody);
    flags(final);
};

```

This example writes everything that *prosody* logs to the destination `d_prosody`, except for the
lines that match the given regular expression, which are dropped. In my case, the expression matches all lines
that relate to a client connecting, disconnecting or authenticating.

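The important part is the `flags(final);` in the drop rule, which indicates that a log line that matches the rule
should not be processed any further. That rule also defines no destination, which, in combination with
the `final` flag, tells *syslog-ng* that the line should be dropped. This is also why the drop rule has to come
before the log rule: otherwise, the log rule would claim the line first.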

Additionally, I moved the log rule that matches everything sent to the configured source to the bottom
of the configuration to prevent any of these logs from *also* landing in the "everything" log.
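
A minimal sketch of what such an "everything" rule might look like. The destination name `d_everything` and the
file path are placeholders made up for this illustration:

```
# Hypothetical catch-all destination; name and path are assumptions
destination d_everything {
    file("/var/log/everything.log");
};

# Placed after all log rules that use flags(final), so that it only
# receives messages that none of those rules has claimed
log {
    source(s_src);
    destination(d_everything);
};
```

Because this rule has no filter, its position in the file is the only thing that keeps already handled
lines out of it.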

Since I also host a *Nextcloud* server, I was interested in getting rid of HTTP access logs as well. But I would
still like to know when someone is trying to scan my webserver for vulnerable *WordPress* installations.

So I again defined rules similar to those above, but added a twist:

```
filter f_nextcloud_drop {
    program("nextcloud")
        and match("200" value(".nextcloud.response"));
};
log {
    source(s_src);
    parser { apache-accesslog-parser(prefix(".nextcloud.")); };
    filter(f_nextcloud_drop);
    flags(final);
};

```

As you can see, the rule for my *Nextcloud* is quite similar, except that I added a parser. With this, I can
make *syslog-ng* understand the HTTP access log and expose its parts as variables to my filter rule. There,
I say that my drop rule should match all access log lines that indicate an HTTP response code of 200, since
those are locations on my server that I expect to be accessed and thus do not care about.
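
The rule that stores the remaining requests can follow the same pattern as the *prosody* "Log" rule. A minimal
sketch, assuming the placeholder names `f_nextcloud` and `d_nextcloud` and an example file path, none of which
come from the configuration shown so far:

```
# Hypothetical names and path, mirroring the prosody rules
destination d_nextcloud {
    file("/var/log/nextcloud.log");
};
filter f_nextcloud {
    program("nextcloud");
};

# Must come after the drop rule, so that it only sees
# requests that did not result in a 200
log {
    source(s_src);
    filter(f_nextcloud);
    destination(d_nextcloud);
    flags(final);
};
```

With this ordering, a scan for a non-existent *WordPress* login page, which would typically end in a 404, would
still show up in the log file.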

## Conclusion

With this setup, I feel much better about the logs I produce. I have also done other things not mentioned here, like
configuring *logrotate* to rotate my logs daily, so that they don't grow too large and get removed after a day.

Please note that I am not an expert in *syslog-ng*. It just happened to be the first tool that I got to do what I
want. The example rules I showed are also just the first ones I wrote that filtered out what I wanted.