blog.polynom.me/content/2021-04-16-About-Logging.md

135 lines
5.6 KiB
Markdown
Raw Normal View History

2024-01-05 17:10:44 +00:00
+++
title = "About Logging"
date = "2021-04-16"
template = "post.html"
aliases = [ "/About-Logging.html" ]
+++
*TL;DR*: This post also talks about the problems I faced while working on my logging. To log to
syslog from within my containers that do not support configuring a remote syslog server, I had
*syslog-ng* expose a unix domain socket and mounted it into the container to `/dev/log`.
<!-- more -->
## Introduction
I have written a lot of blog posts about the lessons I have learned while setting up and
maintaining my server. But now that I started to rework my infrastructure a bit, I had to
inevitably look at something I may have overlooked in the past: logging!
Previously, I had *Docker* *kind of* manage my logs: If I needed something, I would just
call `docker-compose logs <service>` and it would spit out logs. Then, I started to
configure my services to log to files in various locations: my *prosody* server would
log to `/etc/prosody/logs/info.log`, my *nginx* to `/etc/nginx/logs/error.log`, etc.
This, however, turned out to be problematic
as, in my case, *prosody* stopped logging into the file if I rotated it with *logrotate*. It was
also a bit impractical, as the logs were not all in the same place, but distributed across multiple
directories.
Moreover, *prosody* was logging things that I did not want in my logs but I could not turn off,
like when a client connected or authenticated itself. For me, this is a problem from two perspectives:
On the one hand, it is metadata that does not help me debug an hypothetical issue I have with my
*prosody* installation, on the other hand, it is metadata I straight-up do not want to store.
My solution was using a syslog daemon to process the logs, so that I could remove logs that I do not
want or need, and drop them all off at `/var/log`. However, there was a problem that I faced almost
immediately: Not all software I can configure to log to syslog, I can configure to log to a specific
syslog server. Why is this a problem? Well, syslog does not work inside a *Docker* container out of the
box, so I would have to have my syslog daemon expose a TCP/UDP (unix domain) socket that logs can be sent to. To
see this issue you can try to run `logger -t SomeTag Hello World` inside one of your containers
and try to find it, e.g. in your host's journal.
Today, I found my solution to both syslog logging within the containers and filtering out unneeded logs.
## Syslog inside Containers
The first step was getting the logs out of my containers without using files. To this end, I configured
my syslog daemon - *syslog-ng* - to expose a unix domain socket to, for example, `/var/run/syslog` and
mount it into all containers to `/dev/log`:
```
source s_src {
system();
internal();
unix-dgram("/var/run/syslog");
};
```
If you now try and run `logger -t SomeTag Hello World` inside the container, you should be able
to find "Hello World" inside the host's logs or journals.
## Ignoring Certain Logs
The next step was ignoring logs that I do not need or care about. For this, I set up two logs within
*syslog-ng*: One that was going into my actual log file and one that was dropped:
```
destination d_prosody {
file("/var/log/prosody.log");
};
filter f_prosody {
program("prosody");
};
filter f_prosody_drop {
program("prosody")
and message("(Client connected|Client disconnected|Authenticated as .*|Stream encrypted .*)$");
};
# Drop
log {
source(s_src);
filter(f_prosody_drop);
flags(final);
};
# Log
log {
source(s_src);
filter(f_prosody);
destination(d_prosody);
flags(final);
};
```
This example would log all things that *prosody* logs to the *prosody* location `d_prosody` and drop all
lines that match the given regular expression, which, in my case, matches all lines that relate to a client
connecting, disconnecting or authenticating.
Important is the `flags(final);` in the drop rule to indicate that a log line that matches the rule should
not be processed any further. That log also defines no destination, which tells *syslog-ng* in combination with
the `final` flag that the log
should be dropped.
Additionally, I moved the log rule that matches everything sent to the configured source to the bottom
of the configuration to prevent any of the logs to *also* land in the "everything" log.
Since I also host a *Nextcloud* server, I was also interested in getting rid of HTTP access logs. But I would
also like to know when someone is trying to scan my webserver for vulnerable *wordpress* installations.
So I again defined rules similar to those above, but added a twist:
```
filter f_nextcloud_drop {
program("nextcloud")
and match("200" value(".nextcloud.response"));
};
log {
source(s_src);
parser { apache-accesslog-parser(prefix(".nextcloud.")); };
filter(f_nextcloud_drop);
flags(final);
};
```
As you can see, the rule for my *Nextcloud* is quite similar, except that I added a parser. With this, I can
make *syslog-ng* understand the HTTP access log and expose its parts as variables to my filter rule. There,
I say that my drop rule should match all access log lines that indicate a HTTP response code of 200, since
those are locations on my server that I expect to be accessed and thus do not care about.
## Conclusion
With this setup, I feel much better about the logs I produce. I also have done other things not mentioned, like
configure *logrotate* to rotate my logs daily so that my logs don't grow too large and get removed after a day.
Please note that I am not an expert in *syslog-ng*. It just happend to be what I first got to do what I want. And
the example rules I showed are also the first thing that I wrote and filtered out what I wanted.