Initial commit

PapaTutuWawa 2024-01-05 18:10:44 +01:00
commit caef031d48
35 changed files with 2049 additions and 0 deletions

.gitignore vendored Normal file

@@ -0,0 +1,2 @@
# Generated using tailwindcss
static/css/index.css

.woodpecker.yml Normal file

@@ -0,0 +1,16 @@
steps:
  build:
    image: alpine:3.19
    commands:
      - apk add --no-cache zola npm
      - npm install -D tailwindcss @tailwindcss/typography
      - npx tailwindcss -i input.css -o static/css/index.css --minify
      - zola build
  # deploy:
  #   image: codeberg.org/xfix/plugin-codeberg-pages-deploy:1
  #   settings:
  #     folder: public
  #     branch: pages
  #     ssh_key:
  #       from_secret: ssh_key
  #     git_config_name: "polynom.me CI system"

config.toml Normal file

@@ -0,0 +1,19 @@
base_url = "https://blog.polynom.me"
compile_sass = false
build_search_index = false
generate_feed = true
feed_filename = "atom.xml"
title = "PapaTutuWawa's Blog"
description = "PapaTutuWawa's blog. Mainly tech stuff..."
[markdown]
highlight_code = true
[extra]
[extra.email]
user = "papatutuwawa"
domain = "polynom.me"
[extra.fedi]
url = "social.polynom.me/papatutuwawa"
handle = "@papatutuwawa@social.polynom.me"


@@ -0,0 +1,113 @@
+++
title = "How I Play Games on My Linux PC"
date = "2019-06-08"
template = "post.html"
# Compat with the old SSG
aliases = [ "/How-I-Play-Games.html" ]
+++
I love Linux. In fact, I love it so much that it runs on every computer I use, except for my phone, but that
can be changed. It always amazes me how much control Linux gives me over my computer and how easy it is
to create a script that just does everything I was doing manually before.
<!-- more -->
In September 2018, I decided to stop dual-booting Windows and Linux and to only use Linux. I mean, I could
play my most played games under Linux: *CS:GO, Split/Second Velocity (Wine), NieR: Automata (Wine).* But there
were still some games that I could not play, as they either have no Linux port or refuse to run with Wine. I love
playing *Tom Clancy's The Division* and *The Division 2*. I really enjoyed playing *Tom Clancy's Rainbow Six Siege*, and
*Wildlands* was much fun. Except for *The Division*, none of these games runs under Wine. So what do?
# GPU Passthrough
Before even having the thought of switching to Linux "full-time", I stumbled across [this video](https://invidio.us/watch?v=16dbAUrtMX4) by Level1Linux.
It introduced me to the concept of hardware passthrough, and I have wanted to do it ever since. Since my mainboard
has an IOMMU and my CPU supports all the needed virtualization extensions, I was ready.
At that time I was using an AMD Ryzen 2400G and an Nvidia GeForce GTX 1060. I chose this particular CPU
as it contains an iGPU, allowing me to have video output on my host even when I pass the 1060 through
to my VM.
<!-- There are many great tutorials out there that teach you to do this thing but I was amazed at how well -->
<!-- the games run. It should have come to no suprise but it still did. -->
The only thing that I did not like was the fact that the Nvidia driver refuses to run in a virtual machine, so
I had to configure my VM via libvirt in a way that hides from the driver the fact that it is running inside a VM.
# Dynamic GPU Passthrough
While this allowed me to play *The Division*, it was tedious to reboot just to get the GPU unbound from the
vfio-pci module so that I could use it on my host again. Most guides expect you to have a second powerful GPU
so that you don't have to worry about the unavailable GPU, but to me that seemed like a waste.
So I wrote myself a script which...
- unloaded all Nvidia kernel modules;
- started libvirt and loaded the vfio-pci module;
- bound the GPU to the vfio-pci module;
- started the VM.
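A minimal sketch of what such a script can look like; the PCI addresses, the module list and the VM name are placeholders that will differ per system, and it assumes it is run outside of a running X session so that the modules can actually be unloaded:
```
#!/bin/sh
# Hypothetical sketch of the passthrough script described above.
set -e

GPU="0000:01:00.0"        # PCI address of the dedicated GPU (placeholder)
GPU_AUDIO="0000:01:00.1"  # its HDMI audio function (placeholder)

# Unload the Nvidia kernel modules
modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia

# Start libvirt and load the vfio-pci module
systemctl start libvirtd
modprobe vfio-pci

# Bind the GPU and its audio function to vfio-pci
for dev in "$GPU" "$GPU_AUDIO"; do
    echo vfio-pci > "/sys/bus/pci/devices/$dev/driver_override"
    echo "$dev" > /sys/bus/pci/drivers_probe
done

# Start the VM
virsh start win10
```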
The only problem with this was that the Nvidia modules kept being loaded by the X server. This was annoying
since I had to blacklist the modules, which in turn prevented me from using the GPU on my host. The solution, albeit
very hacky, was a custom package which installed the kernel modules into a separate folder, from where another
script inserted them manually using `insmod`.
My host's video output comes from my Ryzen's iGPU. It is not powerful enough to run games like *Split/Second Velocity*
or *CS:GO* at an acceptable framerate, so what do?
Since the Nvidia driver for Linux is proprietary, [PRIME offloading](https://wiki.archlinux.org/index.php/PRIME#PRIME_GPU_offloading) was not an option. I did, however, discover
a library which allows offloading an application's rendering - if it uses GLX - onto another GPU: [primus](https://github.com/amonakov/primus).
It worked well enough for games that used OpenGL, like *CS:GO*. But when I tried launching *Split/Second Velocity*
using Wine, it crashed. Vulkan offloading was not possible with primus, but it is with [primus_vk](https://github.com/felixdoerre/primus_vk). I never got that library to work, though, so I cannot say anything about it.
The only solution to that, from my point of view, was to create another script which launched a second X server
on the Nvidia GPU, start Openbox as a WM on that X server and create a seamless transition from my iGPU X server to my
Nvidia X server using [barrier](https://github.com/debauchee/barrier). I could then start applications like
Steam on the Nvidia X server and use the GPU's full potential.
Since I was already using barrier for the second X server, I tried doing the same with barrier inside my VM, and all I can
say is that it works very well. It made the entire "workflow" with the VM much less painful, as I could just take
control of the host whenever I needed to, without needing a second keyboard.
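A rough sketch of how this can be wired up; the screen names, the config file and the host address (192.168.122.1 is libvirt's default NAT gateway) are assumptions:
```
# On the host: run the barrier server in the foreground with a prepared config
barriers -f --name host -c barrier.conf

# Inside the VM (or on the second X server): connect the client to the host
barrierc -f --name vm 192.168.122.1
```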
# GPU Changes
Today, my PC runs the same AMD CPU. However, the Nvidia GPU got replaced with an AMD RX 590. This allowed me to
use the open-source amdgpu driver, which was and still is a huge plus for me. It did complicate some things for me,
though.
While I can now use PRIME offloading on any application I want, I cannot simply unbind the RX 590 from the amdgpu
driver while in X for use in my VM. While the driver exposes this functionality, it crashes the kernel as soon
as I try to suspend or shut down my computer.
The only solution for this is to blacklist the amdgpu module when starting the kernel, bind the GPU to the vfio-pci
driver and pass it through. Then I can load the amdgpu module again and have it attach itself to my iGPU. When I am
done using the VM, I can re-attach the GPU to the amdgpu driver and use it there.
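Re-attaching boils down to the reverse of the earlier vfio-pci dance; again, the PCI address is a placeholder:
```
GPU="0000:0a:00.0"  # PCI address of the RX 590 (placeholder)

# Release the card from vfio-pci...
echo "$GPU" > /sys/bus/pci/drivers/vfio-pci/unbind
# ...and let amdgpu claim it again
echo amdgpu > "/sys/bus/pci/devices/$GPU/driver_override"
echo "$GPU" > /sys/bus/pci/drivers_probe
```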
There are some issues with this entire setup though:
- sometimes after re-attaching, the GPU does not run at full speed. While I can normally play *CS:GO* at ~80 FPS, it can be as low as ~55 FPS after re-attachment.
- the GPU cannot be reset by the Linux kernel. This means that the GPU has to be disabled inside Windows before shutting down the VM. Otherwise, the amdgpu module cannot bind to the GPU, which has even crashed my kernel.
# Some Freezes
Ignoring the GPU issue, since around Linux kernel 4.1x I experienced another issue: my computer would sometimes freeze
up when opening *Steam*. With even newer versions, it even froze my PC when I gave my VM 10GB of RAM, but did not when
I gave my VM only 8GB.
By running htop with a really small refresh interval, I was lucky enough to observe the probable cause of these freezes: the
kernel tried to swap as much as it could, thus making everything grind to a halt. The solution to this, even though
it *feels* hacky, is to just tell the kernel to swap less aggressively by setting `vm.swappiness` either to a much
lower value to swap later, or to 0 to stop swapping altogether.
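On a typical system that amounts to something like the following; the value 10 and the file name are just examples:
```
# Lower the swappiness of the running kernel
sysctl -w vm.swappiness=10

# Make the change persistent across reboots
echo "vm.swappiness = 10" > /etc/sysctl.d/99-swappiness.conf
```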
# Audio
QEMU, which I used as libvirt's backend, allows you to "pass through" audio from inside the VM to your PulseAudio socket
on the host. This worked okay-ish at first, but now - presumably because something got updated inside QEMU - it
works well enough to play games. I get the occasional crackling but it is not distracting at all.
I also tried a piece of software called [scream](https://github.com/duncanthrax/scream) which streams the audio from a
virtual audio device inside the VM to the network. As the only network interface attached to my VM was going directly
to my host, I just set up the receiver application to listen only on this specific interface. This worked remarkably
well, as I never heard any crackling.
The only issue that I had with scream was that, for some reason, *Tom Clancy's The Division 2* would crash every 5
minutes when I was using scream. Without it, *The Division 2* never crashed.
# Conclusion
My solutions are probably not the most elegant or the most practical but
![](/img/as-long-as-it-works.jpg)


@@ -0,0 +1,320 @@
+++
title = "Mainline Hero Part 0 - Modern Linux For My Galaxy S7"
date = "2019-07-01"
template = "post.html"
aliases = [ "/Mainline-Hero.html" ]
[extra]
mathjax = true
+++
Ever heard of [PostmarketOS](https://postmarketos.org/)? If not, then here's a short summary:
PostmarketOS aims to bring *"[a] real Linux distribution for phones and other mobile devices [...]"* to,
well, phones and other mobile devices.
<!-- more -->
Ever since reading about it, I've been intrigued by the idea of running a real Linux distro
with my UI of choice, be it *Plasma* or *Unity*, on my phone. Perhaps even running the device
without any proprietary firmware blobs. So, I tried my best at contributing to PostmarketOS, which
resulted in 3 MRs that have been accepted into master (Sorry for forgetting to bump the pkgver...).
With this series - if I manage to not break my phone - I want to document what I, someone
who has absolutely no idea what he is doing, learned about all this stuff, how I went about it
and what the results are.
## Mainline Hero #0 - Preparations
Before I can even think about trying to make mainline Linux run on my *Galaxy S7*, we should think
about how we can diagnose any issues that the kernel or the bootloader might have. And how do
professionals debug? Exactly! With **a lot** of `printf()` statements. But how can we retrieve those
from the device?
### Getting Output
While preparing myself for this task, I learned that there are a couple of ways.
One is called [*RAM console*](https://wiki.postmarketos.org/wiki/Mainlining_FAQ#Writing_dmesg_to_RAM_and_reading_it_out_after_reboot). What it does is just dump everything that the kernel prints into a
reserved region of memory, which can later be retrieved by reading from `/proc/last_kmsg` with a
downstream kernel.
The other one is via a [serial cable](https://wiki.postmarketos.org/wiki/Serial_debugging). This sounded
pretty difficult at first, the reason being that I have no idea about hardware, besides the occasional
**PC** hardware talk. I imagined a cable coming out of a box, packed to the brim with electronics
doing some black magic.
The reality is - thankfully - much simpler. It is, basically, just a normal USB cable. I mean: *USB* literally
stands for [*Universal Serial Bus*](https://en.wikipedia.org/wiki/USB). But how come my PC does not
read those kernel logs when I plug in my phone?
As it turns out, there is a component built into my phone which decides exactly what data flows from my
phone to the PC. Reading the [XDA post](https://forum.xda-developers.com/galaxy-s7/how-to/guide-samsung-galaxy-s7-uart-t3743895) which the PostmarketOS Wiki linked helped me understand that my
device contains a *MUIC*, a chip which multiplexes the data lines of the USB cable towards different
"subsystems". As I later learned, the USB standard for connectors of type Micro Type B requires 5 pins:
power, ground, RX, TX and ID. Power and ground should be self-explanatory if you know anything
about electronics (I don't). RX and TX are the two data lines that USB uses. As USB is just a serial
connection, only **one** line is used for sending and one for receiving data. The ID line is the interesting
one: it tells the MUIC what subsystem it should multiplex the data lines to.
[Pinout diagram](https://web.archive.org/web/20190120234321/https://pinouts.ru/PortableDevices/micro_usb_pinout.shtml) of the Micro Type B connector:
```
 _________________
/                 \
|  1  2  3  4  5  |
+--|--|--|--|--|--+
   |  |  |  |  +---o Ground
   |  |  |  +------o ID
   |  |  +---------o D+ (Data)
   |  +------------o D- (Data)
   +---------------o VCC (Power)
```
According to the XDA post, the MUIC switches to serial - used for dumping output of the bootloader and the
kernel - if it measures a resistance of 619kOhm attached to the ID pin. So, according to the diagram in the
post, I built a serial cable.
But how did the author of the XDA post know of the exact resistance that would tell the MUIC to switch to
serial? If you `grep` the
[*S7*'s defconfig](https://raw.githubusercontent.com/ivanmeler/android_kernel_samsung_herolte/lineage-15.1/arch/arm64/configs/exynos8890-herolte_defconfig)
for `MUIC`, then one of the results is the KConfig flag `CONFIG_MUIC_UNIVERSAL_MAX77854`.
If we then search the kernel tree for the keyword `max77854`, we find multiple files; one being
`drivers/mfd/max77854.c`. This file's copyright header tells us that we deal with a *Maxim 77854* chip. Judging
from the different files we find, it seems as if this chip is not only responsible for switching between serial
and regular USB, but also for e.g. charging (`drivers/battery_v2/include/charger/max77854_charger.h`).
However, the really interesting file is `drivers/muic/max77854.c`, since there we can find an array of structs
that contain strings. Sounds pretty normal until you look at the strings more closely: One of the strings is
the value `"Jig UART On"`:
```
[...]
#if defined(CONFIG_SEC_FACTORY)
    {
        .adc1k        = 0x00,
        .adcerr       = 0x00,
        .adc          = ADC_JIG_UART_ON,
        .vbvolt       = VB_LOW,
        .chgdetrun    = CHGDETRUN_FALSE,
        .chgtyp       = CHGTYP_NO_VOLTAGE,
        .control1     = CTRL1_UART,
        .vps_name     = "Jig UART On",
        .attached_dev = ATTACHED_DEV_JIG_UART_ON_MUIC,
    },
#endif /* CONFIG_SEC_FACTORY */
[...]
```
The keyword `ADC_JIG_UART_ON` seems especially interesting. Why? Well, the driver has to know what to do
with each measured resistance, so it would make sense to name the constant holding that resistance
something like this. Additionally, it is the only constant name that does not immediately hint at its
value or function.
So we search the kernel source for this keyword. Most occurrences are just
drivers using this constant. But one hit shows its definition: `include/linux/muic/muic.h`. There we
find on [line 106](https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/b51cf88008606ebac535785ff549b9f55e5660b4/include/linux/muic/muic.h#L106)
a comment which states that this constant represents a resistance of 619kOhm.
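For reference, the searches described above boil down to something like this when run from the root of the downstream kernel tree:
```
# Which MUIC-related options does the herolte defconfig enable?
grep -i muic arch/arm64/configs/exynos8890-herolte_defconfig

# Where is ADC_JIG_UART_ON (the 619kOhm constant) defined?
grep -rn "ADC_JIG_UART_ON" include/linux/muic/
```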
To actually build the serial cable, we need to have a USB Type B male connector that we can solder our cables to.
My first thought was to buy a simple and cheap USB Type B cable, cut it, remove the isolation and solder my
connectors to it. I, however, failed to notice that the Type A part of the cable - the one you plug into e.g.
your PC - only has 4 pins, while the Type B part has 5. After stumbling upon some random diagram, I learned that
for regular USB connectivity, such as connecting your phone to your PC, the ID pin is not needed, so it is left
disconnected. As this plan failed, I proceeded to buy a USB Type B male connector. Since I bought it on the
Internet and the seller did not provide a diagram of what pad on the connector connects to what pin, I also
ordered a USB Type B female breakout board.
After all parts arrived, I used a digital multimeter to measure the resistance between each pad on the connector
and on the breakout board. Since I have no idea about electronics, let me explain: Resistance is defined as
$R = \frac{U}{I}$, where $R$ is the resistance, $U$ the voltage and $I$ the current. This means that we should
measure - practically speaking - infinite resistance when no current is flowing and some resistance $R \gt 0$
when we have a flowing current, meaning that we can test for continuity by attempting to measure resistance.
After some poking around, I got the following diagram:
```
+---------o VCC
| +-----o D+
| | +-o GND
___|___|___|___
/ ? ? ? \
| ? ? |
+------|---|------+
| +---o ID
+-------o D-
```
![The "Serial Cable"](/img/serial-cable.jpg)
Since the data coming out of the serial port inside the phone uses a certain protocol, which also covers
timing, bit order and error-correcting codes, we need something to convert this data into something that is
usable on the host. Since the USB specification for data may differ from what we actually receive, we can't just
connect the phone's D- and D+ lines to the host USB's D- and D+. Hence the need for a device which does this
conversion for us and also deals with the timing of the data: the tiny board to which all cables lead
basically just contains an *FT232RL* chip from *FTDI*. It is what does all the conversion and timing magic.
Since I don't want to accidentally brick my phone by frying it with 3.3V or 5V - though I think that damaging
the hardware with 5V is pretty difficult - I did not connect the USB's 5V to the *FT232*'s VCC port.
Booting up the device, we start to see data being sent via serial!
```
[...]
CP Mailbox Debug
0x10540180 : 0xdca7b414 0x 804f99f
0x10540184 : 0xdeb36080 0x8112566f
0x10540188 : 0xf4bf0800 0x2534862d
0x1054018C : 0x61ff350e 0x1208fd27
0x10540190 : 0x17e60624 0x18121baf
0x105C0038 : 0x3bd58404 0x5674fb39
CP BL flow
0x10920014 : 0x79dab841 0x9b01b3fd
0x10800028 : 0xffbd34b1 0x9fd118cc
Resume el3 flow
EL3_VAL : 0xdcfee785 0xfbb6b0a2 0xccf99641
muic_register_max77854_apis
muic_is_max77854 chip_id:0x54 muic_id:0xb5 -> matched.
[MUIC] print_init_regs
INT:01 00 00 ST:1d 00 00 IM:00 00 00 CDET:2d 0c CTRL:1b 3b 09 b2 HVCT:00 00 LDO0:47
MUIC rev = MAX77854(181)
init_multi_microusb_ic Active MUIC 0xb5
[...]
```
Nice! We can see what *SBOOT*, the bootloader that *Samsung* uses, tells us. But for some reason, I wasn't
able to get into the *SBOOT* prompt to tell the kernel to dump everything via serial. While the XDA post
used the program `minicom`, which I could use to get *SBOOT* output, it never seemed to send the carriage
returns while I was pressing the return key like crazy. So what I did was try to use a different tool to
interact with the serial converter: `picocom`. And it worked!
Although I set the kernel parameters to output to the TTY device `ttySAC4`, just like the XDA post said,
I did not receive any data.
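For completeness, talking to the serial adapter from the host boiled down to a single command; the device node and the baud rate (115200 is just the usual value for such consoles) are assumptions that depend on the adapter:
```
picocom -b 115200 /dev/ttyUSB0
```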
### Device Tree
So we can just try and boot mainline on the phone then, yes? With a very high probability: no. The reason being
that the kernel has no idea about the actual hardware inside the phone.
This may seem weird as you don't have to tell your kernel about your shiny new GPU or about your RAM. The reason
is that your PC is designed to be modular: You can swap the CPU, the RAM and even the attached devices, like
your GPU. This means that on x86, the CPU is able to discover its hardware since there is only one bus for
attaching devices (ignoring RAM and the CPU): the PCI bus. How does the CPU know about its RAM?
The RAM modules are swappable, which means that the CPU cannot anticipate just how much RAM you
have in your system. This information gets relayed, perhaps via the MMU, to the CPU.
Can't we just probe the available memory in an ARM SoC? Technically yes, but it would take a lot
of time if we have a modern 64 bit CPU. Moreover, how do you know that a probed memory location
is not a memory mapped device? Wouldn't it make sense to bake this data into the SoC then? Here
again: not really. The reason is that SoCs are vendor-specific. This means that the vendor
basically just buys the rights to put the CPU into their SoC; the rest is up to the vendor. They
can add as much RAM as they want, without the CPU designer having much input. This means that this
data cannot be **hardcoded** into the CPU.
On ARM, and probably most other microprocessors, devices can be memory mapped, which means that they respond to
a certain region of memory being written to or read from. This makes auto-discovering devices quite difficult
as you would have to probe **a lot** of memory regions.
As an example: Imagine we can access 4 different locations in memory, each holding 1 byte of data. These regions
are at the memory addresses `0x1` to `0x4`. This means that we would have to probe 4 memory locations. Easy,
right?
Not exactly. We would have to probe 4 times to discover 4 possible memory mapped areas with a width of 1 byte.
If we allow a width of 2 bytes, then we would have to probe 3 different regions: `0x1`-`0x2`, `0x2`-`0x3` and
`0x3`-`0x4`.
This assumes that memory maps need to be directly next to each other. Otherwise we would need to use the
binomial coefficient.
This results in 10 (4x 1 byte, 3x 2 bytes, 2x 3 bytes and 1x 4 bytes) different probing attempts to discover
possible memory mapped devices. This does not seem much when we only have a 2 bit CPU, but in the case of the
*S7*, we have a 64 bit CPU; so we would have to probe about $\sum_{n=1}^{2^{64}} n$ times. This finite sum
is equal ([German Wikipedia](https://de.wikipedia.org/wiki/Gau%C3%9Fsche_Summenformel)) to
$\frac{1}{2} 2^{64} {(2^{64} + 1)} = 1.7014 \cdot 10^{38}$. Quite a lot! Keep in mind that this
calculation does not factor in any other busses that the SoC might use; they can, probably, use their own
address space.
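Written out, the number of contiguous regions in an address space of $N$ cells is just the triangular number, which reproduces both the toy example and the estimate above:
$$ \sum_{n=1}^{N} n = \frac{N(N+1)}{2}, \qquad N = 4 \Rightarrow 10, \qquad N = 2^{64} \Rightarrow \frac{2^{64}\,(2^{64}+1)}{2} \approx 1.7014 \cdot 10^{38} $$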
So, long story short: We need to tell the kernel about all the hardware beforehand. This is where the so-called
Device Tree comes into play. It is a structured way of describing the attached hardware. You can find examples
in the kernel tree under `arch/arm{,64}/boot/dts/`. The problem that arises for my phone is that it
uses the Exynos SoC from Samsung. While Exynos 7 or older would just require an addition to the already existing
Device Tree files, the *S7* uses the Exynos 8890 SoC. This one is not in mainline, which means that it has to be
ported from the [downstream kernel](https://github.com/ivanmeler/android_kernel_samsung_universal8890/) into mainline.
### Device Support
The challenge that follows, provided I don't brick my phone, is the kernel support for the SoC's hardware.
#### GPU
The GPU of the Exynos 8890 SoC is a Mali-T880 from ARM. While there is no "official" FOSS-driver for it, one
is in development: [Panfrost](https://gitlab.freedesktop.org/panfrost/linux). One of the developers once
mentioned in PostmarketOS' Matrix channel that the driver is not ready for day-to-day use. But hopefully it
will be in the foreseeable future.
#### Wifi
While I found no data on the Exynos 8890's Wifi-chip, I managed to allow the downstream kernel to use it, albeit
with its proprietary firmware ([MR](https://gitlab.com/postmarketOS/pmaports/merge_requests/309)).
This required a patch which changes the path of the firmware in the file `drivers/net/wireless/bcmdhd4359/dhd.h`.
The license header of [said file](https://github.com/ivanmeler/android_kernel_samsung_universal8890/blob/lineage-15.0/drivers/net/wireless/bcmdhd4359/dhd.h)
hints at a chip from Broadcom. The model of the chip appears to be 4359. What does *dhd* stand for? I don't know.
Looking at the compatibility of the [kernel modules](https://wireless.wiki.kernel.org/en/users/drivers/brcm80211) for Broadcom wireless chips, we can find
that the *BCM4359* chip is compatible. But is that the same chip as the one the module folder's name refers to? Again, I don't know.
Hopefully it is...
#### Other Components
At the time of writing this post, it has been a "long time" since I last flashed PostmarketOS on
my phone to look at what the kernel is saying. All of this device data I gathered by looking at
spec sheets by Samsung or the kernel. So I don't really know what other hardware is inside my
*S7*.
## Next Steps
The next steps are actually testing things out and playing around with values and settings and all kinds of things.
## Other Devices I Have Lying Around
This may be off-topic for the "*Mainline Hero*" series but I recently tried to find out whether another device
I have lying around - a *Samsung Galaxy Note 8.0* - also uses such a MUIC to multiplex its USB port. While
at first I somehow concluded - incorrectly, as I now know - that the *Note 8.0* uses the same *Maxim 77854* as my
*S7*, I then discovered that the *Note 8.0* does use a MUIC, just not the *77854*. Since I found no other links
talking about this, I cannot be sure until I test it, but I will tell you how I reached this
conclusion!
If you `grep` the [defconfig for the herolte](https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/lineage-15.1/arch/arm64/configs/exynos8890-herolte_defconfig) for
"*77854*", then one of the results is the flag `CONFIG_MUIC_UNIVERSAL_MAX77854`. The prefix `CONFIG_MUIC` makes
sense since this enables kernel support for the *Maxim 77854* **MUIC**. As such, we should be able to find
an enabled MUIC in the *Note 8.0*'s [defconfig](https://github.com/LineageOS/android_kernel_samsung_smdk4412/blob/lineage-16.0/arch/arm/configs/lineageos_n5110_defconfig).
If we grep for `CONFIG_MUIC`, then we indeed get results. While the results do not look like the one for
the *77854*, we get ones like `CONFIG_MUIC_MAX77693_SUPPORT_OTG_AUDIO_DOCK`. This indicates that the *Note 8.0*
has a *Maxim 77693* MUIC built in. But it's not a very strong indicator. Since the [kernel source](https://github.com/LineageOS/android_kernel_samsung_smdk4412/) is available
on Github, we can just search the repo for the keyword "*MAX77693*". One of the results hints at the file
`drivers/misc/max77693-muic.c`. Looking at the Makefile of the `drivers/misc` directory, we find that this
source file is only compiled with the KConfig flag `CONFIG_MFD_MAX77693`. Grepping the *Note 8.0*'s defconfig
for this flag yields the result that this kernel module is enabled, hence hinting at the existence of a MUIC
in the *Note 8.0*.
If we take a closer look at the source file at `drivers/misc/max77693-muic.c`, we can find an interesting part
at [line 102](https://github.com/LineageOS/android_kernel_samsung_smdk4412/blob/b7ffe7f2aea2391737cdeac2a33217ee0ea4f2ba/drivers/misc/max77693-muic.c#L102):
```
[...]
ADC_JIG_UART_ON = 0x1d, /* 0x11101 619K ohm */
[...]
```
This means that the *Maxim 77693*, just like the *77854*, switches to UART with a 619kOhm resistor, so we can debug
the *Note 8.0* with the same serial cable as the *S7*.
Plugging it into the DIY serial cable and booting it up, we also get some output:
```
[...]
BUCK1OUT(vdd_mif) = 0x05
BUCK3DVS1(vdd_int) = 0x20
cardtype: 0x00000007
SB_MMC_HS_52MHZ_1_8V_3V_IO
mmc->card_caps: 0x00000311
mmc->host_caps: 0x00000311
[mmc] capacity = 30777344
```
Theory proven! We **can** also serial debug the *Note 8.0* using the same cable.
## Some Closing Words
I want to emphasize that just very few of the things I mentioned were discovered or implemented by me. I just collected
all this information to tell you about what I learned. The only thing that I can truly say I discovered is the MR for
the Wifi firmware...
Additionally, I want to make it clear that I have no idea about microelectronics, electronics or ARM in general. All the
things I wrote about ARM or electronics - especially everything in the *Device Tree* section - are pure speculation
on my part. I never really looked into these things, but all the statements I made make sense to me. You can't just probe
$2^{64}$ different memory addresses just to figure out how much RAM you have, can you?


@@ -0,0 +1,169 @@
+++
title = "Mainline Hero Part 1 - First Attempts At Porting"
date = "2019-08-21"
template = "post.html"
aliases = [ "/Mainline-Hero-1.html" ]
+++
In the first post of the series, I showed what information I gathered and what tricks can be used
to debug our mainline port of the *herolte* kernel. While I learned a lot just by preparing for
the actual porting, I was not able to actually get as far as booting the kernel. I would have
liked to write about what I did to *actually* boot a *5.X.X* kernel on the device, but instead I will tell you
about the journey I completed thus far.
<!-- more -->
If you are curious about the progress I made, you can find the patches [here]({{ site.social.git_url}}/herolte-mainline). The first patches I produced are in the `patches/` directory, while the ones I created with lower
expectations are in the `patches_v2/` directory. Both "patchsets" are based on the `linux-next` source.
## Starting Out
My initial expectations about mainlining were simple: *The kernel should at least boot and then perhaps
crash in some way I can debug*.
This, however, was my first mistake: Nothing is that easy! Ignoring this, I immediately began writing
up a *Device Tree* based on the original downstream source. This was the first big challenge as the amount of
downstream *Device Tree* files is overwhelming:
```
$ wc -l exynos* | awk -F\ '{print $1}' | awk '{sum += $1} END {print sum}'
54952
```
But I chewed through most of them by just looking for interesting nodes like `cpu` or `memory`, after which
I transferred them into a new, simple *Device Tree*. At this point I learned that the *Github* search does not
work as well as I thought it does. It **does** find what I searched for. But only sometimes. So how do we find
what we are looking for? By *grep*-ping through the files. Using `grep -i -r cpu .` we are able to search
a directory tree for the keyword `cpu`. But while *grep* does a wonderful job, it is kind of slow. So at that
point I switched over to a tool called `ripgrep`, which does these searches a lot faster than plain old grep.
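As an illustration, the two searches look like this when run against the downstream tree (`rg` being the ripgrep binary):
```
# Recursively and case-insensitively search the downstream DTS files for "cpu"
grep -ri cpu arch/arm64/boot/dts/

# The same search with ripgrep
rg -i cpu arch/arm64/boot/dts/
```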
At some point, I found it very tiring to search for nodes; the reason being that I had to search for specific
nodes without knowing their names or locations. This led to the creation of a script which parses a *Device Tree*
while following includes of other *Device Tree* files, allowing me to search for nodes which have, for example, a
certain attribute set. This script is also included in the "patch repository"; however, it does not work perfectly.
It finds most of the nodes, but not all of them; still, it was sufficient for my searches.
After finally having the basic nodes in my *Device Tree*, I started to port over all of the required nodes
to enable the serial interface on the SoC. This was the next big mistake I made: I tried to do too much
without verifying that the kernel even boots. This was also the point where I learned that the *Device Tree*
by itself doesn't really do anything. It just tells the kernel what the SoC looks like so that the correct
drivers can be loaded and initialized. So I knew that I had to port drivers from the downstream kernel into the
mainline kernel. The kernel identifies the corresponding driver by looking at the data that the drivers
expose.
```
[...]
static struct of_device_id ext_clk_match[] __initdata = {
    { .compatible = "samsung,exynos8890-oscclk", .data = (void *)0, },
};
[...]
```
This is an example from the [clock driver](https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/lineage-15.1/drivers/clk/samsung/clk-exynos8890.c#L122) of the downstream kernel.
When the kernel is processing a node of the *Device Tree* it looks for a driver that exposes the same
compatible attribute. In this case, it would be the *Samsung* clock driver.
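This also gives a handy way of going from a *Device Tree* node to its downstream driver: just search for the node's `compatible` string, for example:
```
# Which downstream source files claim the clock controller's compatible string?
rg -l '"samsung,exynos8890-oscclk"' drivers/
```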
So at this point I was wildly copying over driver code into the mainline kernel. As I forgot this during the
porting attempt, I am
mentioning my mistake again: I never thought about the possibility that the kernel would not boot at all.
After having "ported" the driver code for the clock and some other devices I decided to try and boot the
kernel. Having my phone plugged into the serial adapter made my terminal show nothing. So I went into the
*S-Boot* console to poke around. There I tried some commands in the hope that the bootloader would initialize
the hardware for me so that it magically makes the kernel boot and give me serial output. One was especially
interesting at that time: The name made it look like it would test whether the processor can do **SMP** -
**S**ymmetric **M**ulti**p**rocessing, i.e. running the system across all of its cores.
By continuing to boot, I got some output via the serial interface! It was garbage data, but it was data. This
gave me some hope. However, it was just some data that was pushed by something other than the kernel. I checked
this hypothesis by installing the downstream kernel, issuing the same commands and booting the kernel.
## Back To The Drawing Board
At this point I was kind of frustrated. I knew that this endeavour was going to be difficult, but I immensely
underestimated it.
After taking a break, I went back to my computer with a new tactic: Port as few things as possible, confirm that
it boots and then port the rest. This was inspired by the way the *Galaxy Nexus* was mainlined in
[this](https://postmarketos.org/blog/2019/06/23/two-years/) blog post.
What did I do this time? The first step was a minimal *Device Tree*. No clock nodes. No serial nodes. No
GPIO nodes. Just the CPU, the memory and a *chosen* node. Setting the `CONFIG_PANIC_TIMEOUT`
[option](https://cateee.net/lkddb/web-lkddb/PANIC_TIMEOUT.html) to 5, waiting at least 15 seconds and seeing
no reboot, I thought that the phone had booted the mainline kernel. But before getting too excited, as I
kept in mind that it was a hugely difficult endeavour, I asked in *postmarketOS*' mainline Matrix channel whether it could happen that the phone panics and still does not reboot. The answer I got
was that it could, indeed, happen. It seems like the CPU does not know how to shut itself off. On the x86 platform, this
is the task of *ACPI*, while on *ARM* [*PSCI*](https://linux-sunxi.org/PSCI), the **P**ower **S**tate
**C**oordination **I**nterface, is responsible for it. Since the mainline kernel knows about *PSCI*, I wondered
why my phone did not reboot. After some thinking, I came up with 3 possibilities:
1. The kernel boots just fine and does not panic. Hence no reboot.
2. The kernel panics and wants to reboot but the *PSCI* implementation in the downstream kernel differs from the mainline code.
3. The kernel just does not boot.
The first possibility I threw out of the window immediately. It was just too easy. As such, I began
investigating the *PSCI* code. Out of curiosity, I looked at the implementation of the `emergency_restart`
function of the kernel and discovered that the function `arm_pm_restart` is used on *arm64*. Looking deeper, I
found out that this function is only set when the *Device Tree* contains a *PSCI* node of a supported version.
The downstream node is compatible with version `0.1`, which does not support the `SYSTEM_RESET` functionality
of *PSCI*. Since I could just turn off or restart the phone when using *Android* or *postmarketOS*, I knew
that there had to be something that just works around the old firmware.
The downstream [*PSCI* node](https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/lineage-15.1/arch/arm64/boot/dts/exynos8890.dtsi#L316) just specifies that it is compatible with `arm,psci`, so
how do I know that it is only firmware version `0.1` and how do I know of this `SYSTEM_RESET`?
If we grep for the compatible attribute `arm,psci` we find it as the value of the `compatible` field in the
source file `arch/arm64/kernel/psci.c`. It [specifies](https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/lineage-15.1/arch/arm64/kernel/psci.c#L381) that the exact attribute of `arm,psci`
results in a call to the function `psci_0_1_init`. This indicates a version of *PSCI*. If we take a look
at *ARM*'s [*PSCI* documentation](http://infocenter.arm.com/help/topic/com.arm.doc.den0022d/Power_State_Coordination_Interface_PDD_v1_1_DEN0022D.pdf)
we find a section called *"Changes in PSCIv0.2 from first proposal"* which contains the information that
the call `SYSTEM_RESET` was only added in version 0.2. Hence we can guess that the *Exynos8890* SoC
comes with firmware which only supports version 0.1 of *PSCI*.
After a lot of searching, I found a node called `reboot` in the [downstream source](https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/lineage-15.1/arch/arm64/boot/dts/exynos8890.dtsi#L116).
The compatible driver for it is within the [*Samsung* SoC](https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/lineage-15.1/drivers/soc/samsung/exynos-reboot.c) driver code.
Effectively, the way this code reboots the SoC is by mapping the address of the PMU, which I guess stands for
*Power Management Unit*, into memory and writing some value
to it. This value is probably the command which tells the PMU to reset the SoC.
In my "patchset" *patches_v2* I have ported this code. Testing it with the downstream kernel, it
made the device do something. Although it crashed the kernel, it was enough to debug.
To test the mainline kernel, I added an `emergency_restart` at the beginning of the `start_kernel` function.
The result was that the device did not do anything. The only option I had left was 3; the kernel does not even
boot.
At this point I began investigating the `arch/arm64/` code of the downstream kernel more closely. However, I
noticed something unrelated during a kernel build: The downstream kernel logs something with *FIPS* at the
end of the build. Grepping for it resulted in some code at [the end](https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/lineage-15.1/scripts/link-vmlinux.sh#L253) of the `link-vmlinux.sh` script. I thought
that it was signing the kernel with a key in the repo, but it probably is doing something else. I tested
whether the downstream kernel boots without these crypto scripts and it did.
The only thing I did not test was whether the kernel boots without
["double-checking [the] jopp magic"](https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/lineage-15.1/scripts/link-vmlinux.sh#L270). But by looking at this script, I noticed another interesting thing:
`CONFIG_RELOCATABLE_KERNEL`. Having just a rough idea of what this config option enables, I removed it
from the downstream kernel and tried to boot. But the kernel did not boot. This meant that this option
was required for booting the kernel. This was the only success I can report.
By grepping for this config option I found the file `arch/arm64/kernel/head.S`. I did not know what it was
for so I searched the internet and found a [thread](https://unix.stackexchange.com/questions/139297/what-are-the-two-head-s-files-in-linux-source)
on *Stack Exchange* that explained that the file
is prepended onto the kernel and executed before `start_kernel`. I mainly investigated this file, but in
hindsight I should have also looked more at the other occurrences of the `CONFIG_RELOCATABLE_KERNEL` option.
So what I did was try and port over code from the downstream `head.S` into the mainline `head.S`. This is
the point where I am at now. I did not progress any further as I am not used to assembly code or *ARM*
assembly, but I still got some more hypotheses as to why the kernel does not boot.
1. For some reason the CPU never reaches the instruction to jump to `start_kernel`.
2. The CPU fails to initialize the MMU or some other low-level component and thus cannot jump into `start_kernel`.
At the moment, option 2 seems the most likely as the code from the downstream kernel and the mainline kernel
do differ somewhat, and I expect that *Samsung* added some code as their MMU might have some quirks that the
mainline kernel does not address. However, I did not have the chance to either confirm or deny any of these
assumptions.
As a bottom line, I can say that the most useful, but in my case most ignored, thing I learned is patience.
During the entire porting process I tried to do as much as I could in the shortest amount of time possible.
However, I quickly realized that I got the best ideas when I was doing something completely different. As
such, I also learned that it is incredibly useful to always have a piece of paper or a text editor handy
to write down any ideas you might have. You never know what might be useful and what not.
I also want to mention that I used the [*Bootlin Elixir Cross Referencer*](https://elixir.bootlin.com/linux/latest/source)
a lot. It is a very useful tool to use when exploring the kernel source tree. However, I would still
recommend having a local copy so that you can very easily grep through the code and find things that
neither *Github* nor *Elixir* can find.


@@ -0,0 +1,163 @@
+++
title = "Road2FOSS - My Journey to Privacy by Self-Hosting"
date = "2019-10-06"
template = "post.html"
aliases = [ "/Road-to-Foss.html" ]
+++
About one year ago, I made plans to ditch many of the proprietary services that I used
on a daily basis and replace them with FOSS alternatives. Now, a year later,
my project is not done, but I have already accomplished quite a lot.
<!-- more -->
## History
But why do all this?
The answer consists of three main points, though they are weighed differently:
1. Privacy: The inspiration for this project came from the fact that I did not trust my messaging application back then. It was proprietary and probably collecting all the data it could, thus I wanted to get away from it.
2. Learning: I really enjoy tinkering with computer hardware and software, and am quite interested in server administration. Hence, I thought it would be a great learning opportunity for me.
3. Fun: I do enjoy this kind of work, so I thought it would be a fun, but quite major, side project.
I knew that it would be a major undertaking but I still wanted to give it a try.
## Instant Messaging
Judging by the amount of personal data I leak when texting people I know, I wanted to switch IM services
as quickly as possible.
At this stage, there were three candidates for me:
- *Signal*
- *Matrix* with Riot
- *Jabber/XMPP*
Originally, *Signal* was my preferred choice since I really liked its interface. But the problem with Signal,
and I do not blame the developers for this one, is that the service only works with a mobile device running
the app. If I wanted to run *Signal* on my computer because, for example, my phone is broken or the battery
is empty, then I just could not, since it requires my phone to be online. Also, as I learned only recently,
*Signal*'s *Android* app has a bug which [drains the phone's battery](https://github.com/signalapp/Signal-Android/issues/8658)
when one does not have *Google services* installed on their phone.
*Matrix* in combination with Riot was another idea of mine. But here the problem was the mobile app. Its interface
seemed to me more like that of messengers like *Slack* and *Discord*, which I personally do not like
for mobile Instant Messaging. When I last looked at the entire *Matrix* ecosystem, there was only one
well-working client for mobile, which was Riot. Additionally, the homeserver was difficult to set up; at least much more so than
*Prosody*, to which I will come in the next paragraph. Moreover, I read in the [*Disroot blog*](https://web.archive.org/web/20190921180013/https://disroot.org/en/blog/donating_floss) that they have
quite some problems with their *Matrix* homeserver as *"[...] [k]eeping room history and all metadata connected to them forever
is a terrible idea, in our opinion, and not sustainable at all. One year of history is way too much already [...]"*. This
was the end for the idea of self-hosting a *Matrix* server.
*Jabber/XMPP* being something I had seen only once way back when browsing a Linux forum, I became interested. It
checked all my requirements: It is cross-platform, as it is only a protocol, allows self-hosting with FOSS
software and, the most important factor, includes End-to-End-Encryption using *OMEMO*. I had also started to
appreciate federated software solutions, which made *Jabber* the clear winner for me. The *Jabber* clients
that I now use on a daily basis are also very fine pieces of open-source software: *Conversations*' interface
is simple, it does not drain my battery, and it just works. *Gajim*, after some configuration and tweaking,
works really well, looks clean and simple, and I would really love to replace *Discord* on the desktop with
*Gajim*.
Recently, I also started to use *Profanity*, which seems a bit rough around the edges and sometimes does not
work, but maybe I am just doing something wrong.
In terms of server software I initially wanted to go with *ejabberd*. But after seeing its amount of
documentation, I just chose *Prosody*. It is the software that was the least painful to set up with all
requirements for modern messaging being covered by its internal or external modules. It also never crashed;
only when I messed the configuration up with syntax errors.
Since I use *Discord* and it is more difficult to bring people over from there, I went with a compromise
and started to bridge the channels I use the most to a *Jabber MUC* using [*matterbridge*](https://github.com/42wim/matterbridge).
Thus I can use those channels without having to have the *Discord* app installed on my devices.
Another use I got out of *Jabber* is the fact that I can create as many bot accounts on my server as I want. While this
sounds like I use those bots for bad things, it is the opposite: I use them to tell me when something is wrong
using *netdata* or for the already mentioned bridge between *Discord* and *Jabber*.
## VoIP
VoIP is something that I use even more than plain Instant Messaging, which is why I wanted to self-host
a FOSS VoIP-solution. The most commonly used one is *Mumble*, which was a run-and-forget experience. Especially
when not using the full server but a smaller one like *umurmur*.
## Code
At first, I used *Github*. But after *Microsoft* bought it, I was a bit sceptical and switched to *Gitlab*, which
worked really well. It was even open source, so I started using it. But after some time, I found that
there are some things that annoy me about *Gitlab*. This includes it automatically enabling "Pipelines" when I
just created a repository, even though I never enabled those.
That was when I came across *gogs* and *gitea*, the latter being my current solution. I wanted a simple piece of
software that I can just run and that has a somewhat nice interface. Why the nice interface? I want people who
look at my code to feel at home browsing it in the browser. Also, I can invite friends to use it if
they also want to get away from proprietary services and software.
My instance has registrations disabled as I do not have the time to moderate it, but I have seen that federation
of some sorts, in the context of *ForgeFed*, is being discussed on the issue tracker, though you should not quote
me on this one.
*Gitea* was mostly a run-and-forget experience for me and is working very well.
## Personal Information Management
Since I've started to use calendars more, I wanted a solution to sync those across my devices. Before this entire
project I was using *Google*'s own calendar service. Then I started using *Disroot*'s NextCloud to synchronize
calendar data. However, it not being encrypted at rest was a concern for me as my calendar does contain some
events that I would not like an attacker to know about, as this would put the attacker in a position where sensitive
information about me could be deduced.
After some looking around, I found [*EteSync*](https://github.com/etesync). This software works really great, given that the server is just
a simple django app that stores data and does user management and authentication. The *Android* app, in my case,
does most of the work and works really well. The only problem I had was the fact that *EteSync* has no desktop
client. They provide a web app and a server that bridges between regular DAV and *EteSync* but nothing like
a regular client.
Since I used regular WebDAV services, like the *Disroot* one I mentioned earlier, I have [*vdirsyncer*](https://github.com/pimutils/vdirsyncer)
installed and configured, only to find out that they dropped support for *EteSync* in the last version.
Wanting a tool like *vdirsyncer* but for *EteSync* I went to work and created [*etesyncer*](https://git.polynom.me/PapaTutuWawa/etesyncer).
## EMail
For most of my online life I used proprietary EMail services. Most of that time I used *GMail*. Since I bought a
domain for this project and have a server running, I thought: *"Why not self-host EMail?"*. This is exactly
what I did!
I use the "traditional" combination of *postfix* and *dovecot* to handle incoming, outgoing EMail and IMAP
access. Since I use [*mu4e*](https://web.archive.org/web/20190921054652/http://www.djcbsoftware.nl/code/mu/mu4e.html) in combination with *msmtp* and *mbsync* for working with email, I did not
install a webmail client.
This was the most difficult part to get working, as the configuration sometimes worked and sometimes did not.
The main culprit here was *DKIM*, because it changed the permissions of its files at startup to something else,
which made *openDKIM* crash. It has since stopped doing this, but I am not sure why.
What also made the EMail server so difficult was the fact that so much goes into hosting an EMail server that I had never
thought about, like *DKIM*, *SPF* or having an *FQDN*.
At this point, it pretty much runs itself. It works, it receives EMails, it sends EMails and it allows
me to view my EMails via IMAP.
Coming from *Protonmail*, the only thing that I am missing is encryption of my EMails. Since not every person
I contact using EMail uses or knows *PGP*, I would like to encrypt incoming EMails. While there are solutions
to do this, they all involve encrypting the EMails after they are put into the queue by *postfix*, which puts
them on disk. Hence, the mail is written to disk in plaintext at least once. While I would like to avoid this, I have not
found a way of doing so without digging into *postfix*'s code and adding support for it.
## Blog
I wanted a blog for a long time and since I had a spare domain lying around, I decided to create one. While
I could have gone with a solution like *Wordpress* and the like, they were too complicated for my needs.
So I just went with the simplest solution which is using a static site generator: *jekyll* in my case.
This is one of the points where decentralization was a huge factor directly from the start, as this is exactly
what the web was made for, so I was actively avoiding any non-self-hosted solutions. While I could have gone with
a federated solution like *write freely*, I chose the static page generator as it was much simpler. And because
I love writing in Markdown.
## Webserver
Since I now use *GPG* to sign any emails that I send, I needed a way of exposing these keys to the public. While
I could have gone with a keyserver, I decided against it. Admittedly, I did not look into self-hosting a
keyserver but this was not my plan. I want to keep everything simple and prevent myself from installing too many
services on my server. This led me to just putting my public keys on the server and pointing my
webserver to them.
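A hedged sketch of what that amounts to; the key's email address and the webroot path are placeholders:
```
# Export the public key in ASCII-armored form into the webserver's document root
gpg --armor --export papatutuwawa@polynom.me > /var/www/keys/pubkey.asc
```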
Since I run multiple services that are accessible via the browser, I needed the webserver as a reverse proxy,
pointing my different domain names to the correct services. This way, all services can run on their own ports while
the reverse proxy "unifies" them on port 443.
## Conclusion
All in all I am very happy with my setup. It allows me to host my own instances of privacy-respecting software the way I like
to. It gives me something to do and allows me to learn about system administration and different tools like *Docker*
or *Ansible*. So all in all, although the project has no real end, I would say that it was and is a huge success for me.
During the course of this project, I also switched services like my search engine or the software with which I watch videos
but as I do not self-host these, I did not mention them.


@@ -0,0 +1,108 @@
+++
title = "Lessons Learned From Self-Hosting"
date = "2020-01-03"
template = "post.html"
aliases = [ "/Selfhosting-Lessons.html" ]
+++
Roughly eight months ago, according to my hosting provider, I spun up my VM which
I use to this day to self-host my chat, my mail, my git and so on. At the beginning, I thought that
it would allow me both to get away from proprietary software and to learn Linux administration. While
my first goal was met without any problems, the second one I achieved in ways I did not anticipate.
<!-- more -->
During these eight months, I learned quite a lot. Not by reading documentation, but by messing up
deployments. So this post is my telling of how I messed up and what lessons I learned from it.
# Lesson 1: Document everything
I always tell people that you should document your code. When asked why, I answer that you won't
remember what that line does when you have not looked at your codebase for weeks or months.
What I did not realise is that this also applies to administration. I only wrote basic documentation
like a howto for certificate generation or a small troubleshooting guide. This, however, missed the most
important thing to document: the entire infrastructure.
Whenever I needed to look up my port mapping, what did I do? I opened up my *Docker compose* configuration
and searched for the port mappings. What did I do when I wanted to know what services I have? Open my
*nginx* configuration and search for `server` directives.
This is a very slow process since I have to remember what services I have behind a reverse proxy and which
ones I have simply exposed. This led me in the end to creating a folder - called `docs` - in which
I document everything. What certificates are used by what and where they are, port mappings, a graph
showing the dependencies of my services, ... While it may be tedious to create at first, it will really
help.
```
[World]
   +
   |
   +-[443]-[nginx]-+-(blog.polynom.me)
                   +-(git.polynom.me)-[gitea]
```
Above, you can see an excerpt from my *"network graph"*.
# Lesson 2: Version Control everything
Version Control Systems are a great thing. Want to try something out? Branch, try out and then either
merge back or roll back. Want to find out what changes broke something? Diff the last revisions and narrow
down your "search space". Want to know what you did? View the log.
While it might seem unnecessary, it helps me keep my cool, knowing that if I ever mess up my configuration, I
can just roll back the configuration from within git.
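A minimal sketch of what that workflow looks like for a configuration repository; the branch name and file path are hypothetical:
```
# Try a risky change on a branch...
git checkout -b change-xmpp-port
$EDITOR roles/prosody/templates/prosody.cfg.lua.j2
git commit -am "Move c2s to port 443"

# ...and roll back if it turns out to be a bad idea
git checkout master
git branch -D change-xmpp-port
```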
# Lesson 3: Have a test environment
While I was out once, I connected to a public Wifi. There, however, I could not connect to my VPN. It simply
did not work. A bit later, my Jabber client *Conversations* told me that it could not find my server. After
some thinking, I came to the conclusion that the provider of said public Wifi was probably blocking port `5222`
*(XMPP Client-to-Server)* and whatever port the VPN is using. As such, I wanted to change the port my
Jabber server uses. Since I do not have a failover server I tried testing things out locally, but gave up
after some time and just went and "tested in production". Needless to say, this was a bad idea. At first,
*Conversations* did not do a DNS lookup to see the changed XMPP port, which led me to remove the DNS entry.
However, after some time - probably after the DNS change propagated far enough - *Conversations* said that it
could not find the server, even though it was listening on port `5222`. Testing with the new port yielded
success.
This experience was terrible for me. Not only was it possible that I had broken my Jabber server, but it would also
annoy everyone I had convinced to install a Jabber client to talk to me, as their client would display *"Cannot connect to..."*.
If I had tested this locally, I probably would have been much calmer. In the end, I nervously watched as everyone
gradually reconnected...
# Lesson 4: Use tools and write scripts
The first server I ever got I provisioned manually. I mean, back then it made sense: It was a one-time provisioning and nothing should
change after the initial deployment. But now that I have a continually evolving server, I somehow need to document every step in case
I ever need to provision the same server again.
In my case it is *Ansible*. In my playbook I keep all the roles, e.g. *nginx*, *matterbridge*, *prosody*, separate and apply them to my one
server. In there I also made **heavy** use of templates. The reason for it is that before I started my [*"Road to FOSS"*](https://blog.polynom.me/Road-to-Foss.html)
I used a different domain that I had lying around. Changing the domain name manually would have been a very tedious process, so I decided to use
templates from the get-go. To make my life easier in case I ever change domains again, I defined all my domain names based on my `domain` variable.
The domain for git is defined as {% raw %}`git.{{ domain }}`{% endraw %}, the blog one as {% raw %}`blog.{{ domain }}`{% endraw %}.
Additionally, I make use of *Ansible Vaults*, allowing me to have encrypted secrets in my playbook.
During another project, I also set up an *Ansible* playbook. There, however, I did not use *Ansible*'s templates; instead, I templated the configuration files using a Makefile
that called `sed` to replace the patterns. Not only was that a fragile method, it was also unnecessary, as *Ansible* already provided
this functionality for me. I was just wasting my own time.
What I also learned was that one *Ansible* playbook is not enough. While it is nice to automatically provision a server using *Ansible*, there are other things
that need to be done. Certificates don't rotate themselves. From that, I derived a rule stating that if a task needs to be done more than once, then it is
time to write a script for it.
# Lesson 4.1: Automate
Closely tied to the last point: if a task needs to be performed regularly, then you should consider creating a cronjob - or a systemd timer, if that is more your thing -
to run it automatically. You don't want to enjoy your day, only for it to be ruined by an expired certificate causing issues.
Since automated cronjobs can cause trouble as well, I decided to run all automated tasks at times during which I am likely to be able to react. As such, it is very
important to notify yourself of those automated actions. My certificate rotation, for example, sends me an email at the end, telling me whether the certificates
were successfully rotated and, if not, which ones failed. For those cases, I also keep a log of the rotation process somewhere else so that I can review it.
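A sketch of what such a cronjob could look like (the script path, schedule and address are made up):
```
# Rotate certificates on Mondays at 10:00, when I am likely to be around to react,
# and mail myself the outcome.
0 10 * * 1  /usr/local/bin/rotate-certs.sh 2>&1 | mail -s "Certificate rotation" me@example.org
```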
# Lesson 5: Unexpected things happen
After having my shiny server run for some time, I was happy. It was basically running itself. Until *Conversations*, while
connected to a public Wifi, was unable to contact my server. This is something that I did not anticipate, but it happened nevertheless.
This means that my deployment was not a run-and-forget solution but a constantly evolving system, where small improvements are periodically added.
# Conclusion
I thought I would just write down my thoughts on all the things that went wrong over the course of my self-hosting adventure. They may not
be best practices, but they are things that really helped me a lot.
Was the entire process difficult? At first. Was the experience an opportunity to learn? Absolutely! Was it fun? Definitely.

View File

@ -0,0 +1,218 @@
+++
title = "Running Prosody on Port 443 Behind traefik"
date = "2020-02-13"
template = "post.html"
aliases = [ "/Running-Prosody-traefik.html" ]
+++
*TL;DR: This post is about running prosody with HTTPS services both on port 443. If you only care about the how, then jump to*
**Considerations** *and read from there.*
<!-- more -->
# Introduction
As part of my [*"road to FOSS"*](https://blog.polynom.me/Road-to-Foss.html) I
set up my own XMPP server using *prosody*. While it has been running fine for
quite some time, I noticed, while connected to a public Wifi, that my
server was unreachable. At that time I was panicking because I thought prosody
kept crashing for some reason. After switching to my mobile data, however, I saw
that I **could** connect to my server. The only explanation I came
up with was that the provider of the public Wifi blocks anything that
is not on port 53, 80 or 443. *(I did not try other ports.)*
My solution: Move *prosody*'s C2S - *Client to Server* - port from 5222 to
either port 53, 80 or 443. Port 53 did not seem like a good choice as I
want to keep open the possibility of hosting a DNS server. So the only
choice was between 80 and 443.
# Considerations
Initially I went with port 80 because it would be the safest bet: You cannot
block port 80 while still allowing customers to access the web. This would
have probably worked out, but I changed it to port 443 later on. The reason
being that I need port 80 for Let's Encrypt challenges. Since I use nginx
as a reverse proxy for most of my services, I thought that I could multiplex
port 80 between Let's Encrypt and *prosody*. This was not possible with nginx.
So I discovered traefik, since it allows such a feat. The only problem is that
it can only route TCP connections based on the
[SNI](https://github.com/containous/traefik/blob/master/docs/content/routing/routers/index.md#rule-1). This requires the
XMPP connection to be encrypted entirely from the start, not only after STARTTLS negotiation,
which means that I would have to configure *prosody* to allow such a
connection and not offer STARTTLS.
# Prosody
Prosody's documentation makes no mention of *direct TLS*, which made me
assume that there is no support for it in *prosody*. After asking
in the support group, however, I was told that this feature is called *legacy_ssl*.
As such, one only has to add
```lua
-- [...]
legacy_ssl_ports = { 5223 }
legacy_ssl_ssl = {
    [5223] = {
        key = "/path/to/keyfile";
        certificate = "/path/to/certificate";
    }
}
-- [...]
```
*Note:* In my testing, *prosody* would not enable *legacy_ssl* unless I
explicitly set `legacy_ssl_ports`.
Once *prosody* tells you that it enabled `legacy_ssl` on the specified
ports, you can test the connection by using OpenSSL to connect to it:
`openssl s_client -connect your.domain.example:5223`. OpenSSL should show
you the data it can read from your certificate.
# traefik
In my configuration, I run *prosody* in an internal *Docker* network. In
order to connect it - in my case port 5223 - to the world via port 443, I
configured my traefik to distinguish between HTTPS and XMPPS connections
based on the SNI set by the connecting client.
To do so, I first declared port 443 as an entrypoint in the
static configuration:
```yaml
# [...]
entrypoints:
  https:
    address: ":443"
# [...]
```
For the dynamic configuration, I add two routers - one for TCP, one for
HTTPS - that both listen on the entrypoint `https`. As the documentation
[says](https://github.com/containous/traefik/blob/master/docs/content/routing/routers/index.md#general-1),
*"If both HTTP routers and TCP routers listen to the same entry points, the TCP routers will apply before the HTTP routers."*. This means that traefik has
to distinguish the two somehow.
We do this by using the `Host` rule for the HTTP router and `HostSNI` for
the TCP router.
As such, the dynamic configuration looks like this:
```yaml
tcp:
  routers:
    xmpps:
      entrypoints:
        - "https"
      rule: "HostSNI(`xmpps.your.domain.example`)"
      service: prosody-dtls
      tls:
        passthrough: true
  # [...]
  services:
    prosody-dtls:
      loadBalancer:
        servers:
          - address: "<IP>:5223"
http:
  routers:
    web-secure:
      entrypoints:
        - "https"
      rule: "Host(`web.your.domain.example`)"
      service: webserver
```
It is important to note here that the option `passthrough` has to be `true`
for the TCP router, as otherwise the TLS connection would be terminated by
traefik.
Of course, you can instruct prosody to use port 443 directly, but I prefer
to keep it like this so I can easily see which connection goes where.
# HTTP Upload
HTTP Upload was very simple to implement this way. Just add another HTTPS
route in the dynamic traefik configuration to either the HTTP port of
prosody, which would terminate the TLS connection at traefik, or
the HTTPS port, which - if running traefik and prosody on the same host -
would lead to a possibly unnecessary re-encryption of the data.
This means that prosody's configuration looks like this:
```lua
-- [...]
-- Perhaps just one is enough
http_ports = { 5280 }
https_ports = { 5281 }

Component "your.domain"
    -- Perhaps just one is required, but I prefer to play it safe
    http_external_url = "https://http.xmpp.your.domain"
    http_host = "http.xmpp.your.domain"
-- [...]
```
And traefik's like this:
```yaml
# [...]
http:
  routers:
    prosody-https:
      entrypoints:
        - "https"
      rule: "Host(`http.xmpp.your.domain`)"
      service: prosody-http
  services:
    prosody-http:
      loadBalancer:
        servers:
          - url: "http://prosody-ip:5280"
# [...]
```
# DNS
In order for clients to pick this change up, one has to create a DNS SRV
record conforming to [XEP-0368](https://xmpp.org/extensions/xep-0368.html).
This change takes some time until it reaches the clients, so it would be wise
to keep the regular STARTTLS port 5222 open and connected to prosody until
the DNS entry has propagated to all DNS servers.
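Such a record could look roughly like this (a sketch reusing the hostnames from the traefik example above; adjust priority, weight and TTL to your liking):
```
_xmpps-client._tcp.your.domain.example. 18000 IN SRV 0 5 443 xmpps.your.domain.example.
```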
# Caveats
Of course, there is nothing without some caveats, and some do apply here.
This change does not necessarily get applied to all clients automatically.
Clients like *Conversations* and its derivatives, however, pick it up when they
reconnect. Note that there may be clients that do not support XEP-0368
and thus will not apply this change automatically, like - at least in my
testing - *profanity*.
There may also be some clients that do not support *direct TLS* and thus
cannot connect to the server. In my case, *matterbridge* was unable to
connect as it - without further investigation - seems to only connect with either
no TLS or with STARTTLS.
# Conclusion
In my case, I run my *prosody* server like this:
```
<<WORLD>>-------------+
    |                 |
[traefik]-------------/|/--------------+
    |                 |                |
{xmpp.your.domain}  [5269]   {other.your.domain}
[443 -> 5223]         |          [443 -> 80]
{http.xmpp.your.domain} |              |
[443 -> 5280]         |                |
    |                 |                |
[prosody]-------------+             [nginx]
```
As I had initially used a different port for *prosody* (80), I had to wait until
the DNS records were no longer cached by other DNS servers or clients. This
meant waiting for the TTL of the record, which in my case was 18000 seconds,
or 5 hours.
Port 5222 is, in my case, not reachable from the outside world but only via my
internal *Docker* compose network, so that my *matterbridge* bridges still work.

View File

@ -0,0 +1,167 @@
+++
title = "Jekyll Is Cool, But..."
date = "2020-09-29"
template = "post.html"
aliases = [ "/Static-Site-Generator.html" ]
+++
I love static site generators. They are really cool pieces of software.
Give them some configuration files, maybe a bit of text and you receive
a blog or a homepage. Neat!
<!-- more -->
For a long time, I have been using [*Jekyll*](https://github.com/jekyll/jekyll)
as my static site generator of choice, mostly because it is one of the
most famous ones out there and thus there are tons of plugins, documentation
and templates to get started. It was nice, until I wished it would do
a bit more...
During some time off, I wanted to do an overhaul of my infrastructure. Make
things cleaner, document more things and finally do those tasks that I have
been pushing aside for quite some time. One of those things is to make all
my webpages, which today only include [this blog](https://git.polynom.me/PapaTutuWawa/blog.polynom.me)
and my [XMPP invite page](https://git.polynom.me/polynom.me/xmpp-invite-web),
share common assets. This got started after I wrote the invitation page
and thought that it looked pretty good.
So off I went to create a subdomain for my own "CDN", generate a TLS
certificate and... I got stuck. I wanted to have *Jekyll* generate two
separate versions of my pages for me depending on what I wanted to do:
One with local assets for local previewing and testing and one with my
"CDN" attached. As such I would have liked to have three files: `_config.dev.yml`,
`_config.deploy.yml` and `_config.common.yml`, where `_config.common.yml`
contained data shared between both the deployed and the locally developed
version and the other two just contain a variable that either points to a local
folder or my "CDN". However, I have not found a way to do this. Looking back, I perhaps
would have been able to just specify the common config first and then specify another
config file to accomplish this, but now I am in love with another piece of software.
Additionally, I would have liked to integrate the entire webpage building process
more with my favourite build system, *GNU Make*. But *Jekyll* feels like it attempts
to do everything by itself. And this may be true: *Jekyll* tries to do as much as
possible to cater to as many people as possible. As such, *Jekyll* is pretty powerful, until
you want to change things.
## Introducing makesite
While casually browsing the Internet, I came across a small
[*Github* repository](https://github.com/sunainapai/makesite) for a
static page generator. But this one was different. The entire repository was just
a Python script and some files to demonstrate how it works. The script itself was just
232 lines of code. The Readme stated that it did one thing and that the author was not
going to just add features. If someone wanted a new feature, they were free to just
add it themselves. Why am I telling you all this? Because this is the - in my opinion -
best static site generator I have ever used.
### Simplicity
*makesite* is very simple. In its upstream version, it just generates user defined pages,
renders a blog from Markdown to HTML and generates an RSS feed. It does templating, but without
using heavy and fancy frameworks like *Jinja2*. The "*Getting Started*" section of *makesite* is
shorter than the ones of other static site generators, like *Jekyll* and *Hugo*.
This may seem like a bad thing. If it does not do thing X, then I cannot use it. But that is where
*makesite*'s beauty comes in. You can just add it. The code is very short, well documented and
extensible. It follows the ["*suckless philosophy*"](https://suckless.org/philosophy/). In my case,
I added support for loading different variables based on the file *makesite* is currently compiling,
copying and merging different asset folders - and ignoring certain files - and specifying variables
on the command line. Would I upstream those changes? Probably not, as they are pretty much unique to
my own needs and my own use case. And that is why *makesite* is so nice: because it is not *a* static
site generator, it is **your** static site generator.
### Speed
*makesite* is fast... Really fast. In the time my Makefile has compiled my page and tar-balled it,
ready for deployment, *Jekyll* is still building it. And that makes sense: *Jekyll* is pretty powerful
and does a lot. But I do not need all this power. This blog is not difficult to generate,
my invite page is not difficult to generate, so why would I need all this power?
```
# Jekyll version
> time make build
# [...]
make build 1.45s user 0.32s system 96% cpu 1.835 total
# makesite version
> time make build
# [...]
make build 0.35s user 0.06s system 100% cpu 0.406 total
```
### Buildsystem Integration
In the case of *Jekyll*, *Jekyll* pretty much *is* your buildsystem. This is not so great if you already
have a favourite buildsystem that you would prefer to use, since it does not integrate well. *makesite*, on
the other hand, does just the bare minimum and thus gives the buildsystem much more to work with. In my case,
*makesite* just builds my blog or my other pages. If I want to preview them, then my Makefile just starts a
local webserver with `python -m http.server 8080`. If I want to deploy, then my Makefile tar-balls the resulting
directory.
```makefile
# [...]

serve: ${OPTIMIZED_IMAGES}
	python ../shared-assets/makesite.py \
		-p params.json \
		-v page_assets=/assets \
		-v build_time="${BUILD_DATE}" \
		--assets ../shared-assets/assets \
		--assets ./assets \
		--copy-assets \
		--ignore ../shared-assets/assets/img \
		--ignore assets/img/raw \
		--include robots.txt \
		--blog \
		--rss
	cd _site/ && python -m http.server 8080

build: ${OPTIMIZED_IMAGES}
	python ../shared-assets/makesite.py \
		-p params.json \
		-v page_assets=https://cdn.polynom.me \
		-v build_time="${BUILD_DATE}" \
		--assets ./assets \
		--copy-assets \
		--ignore assets/img/raw \
		--include robots.txt \
		--blog \
		--rss
	tar -czf blog.tar.gz _site
```
This is an excerpt from the Makefile of this blog. It may seem verbose when *Jekyll* does all this
for you, but it gives me quite a lot of power. For example:
- `-v page_assets=...` (only in my version) gives me the ability to either use local assets or my "CDN" for deploying;
- `--copy-assets --assets ./assets` (only in my version) allows me to copy my static assets over, so that everything is ready for deployment. If I want to use all assets, including the shared ones, then I just add another `--assets ../shared-assets/assets` and change the `page_assets` variable;
- conditionally decide if I want a blog and/or an RSS feed with `--blog` and `--rss`
- `-v` allows me to pass variables directly from the commandline so that I can inject build-time data, like e.g. the build date
If I wanted to, I could now also add a minifier on the build target or page signing with [Signed Pages](https://github.com/tasn/webext-signed-pages).
With *Jekyll* this would be more difficult, while here it is just a matter of adding a command to my Makefile.
Another great thing here is the usage of `${OPTIMIZED_IMAGES}`: In my blog I sometimes use images. Those images have to be loaded and, especially if
they are large, take some time until you can fully see them. I could implement something using JavaScript and make the browser load the images
lazily, but this comes with three drawbacks:
1. It requires JavaScript for loading an image, which is a task that the browser is already good at;
2. Implementing it with JavaScript may lead to content moving around as the images are loaded in, which results in a terrible user experience;
3. Some people may block JavaScript for security and privacy, which would break the site if I were to, for example, write a post that is filled with images for explanations.
The target `${OPTIMIZED_IMAGES}` in my Makefile automatically converts my raw images into progressive JPEGs. This conversion
does not happen on every build; it only happens when images are changed or added. Progressive JPEGs are a kind of JPEG where the data can be
continuously loaded in from the server, first showing the user a low-quality version which progressively gets higher quality. With *Jekyll* I probably
would have had to install a plugin that I can only use with *Jekyll*, while now I can use *imagemagick*, which I have already installed for other
use cases.
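Such a rule can be as small as this sketch (the paths are assumptions; `-interlace Plane` is what makes *imagemagick* produce progressive JPEGs):
```makefile
RAW_IMAGES := $(wildcard assets/img/raw/*.jpg)
OPTIMIZED_IMAGES := $(patsubst assets/img/raw/%.jpg,assets/img/%.jpg,$(RAW_IMAGES))

# Only rebuilds an image when its raw version is newer than the optimized one
assets/img/%.jpg: assets/img/raw/%.jpg
	convert $< -strip -interlace Plane -quality 85 $@
```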
## Conclusion
Is *makesite* easy? It depends. If you want to generate a website with a blog
that fits exactly the way upstream wrote the script, yes. If you want to do
something different, it becomes more difficult, as you then have to patch
*makesite* yourself.
By default, *makesite* is more limited than other static site generators out
there, but that is, in my opinion, where *makesite*'s versatility and
customizability comes from. From now on, I will only use *makesite* for my
static pages.

View File

@ -0,0 +1,134 @@
+++
title = "About Logging"
date = "2021-04-16"
template = "post.html"
aliases = [ "/About-Logging.html" ]
+++
*TL;DR*: This post also talks about the problems I faced while working on my logging. To log to
syslog from within containers whose software does not support configuring a remote syslog server, I had
*syslog-ng* expose a unix domain socket and mounted it into the containers at `/dev/log`.
<!-- more -->
## Introduction
I have written a lot of blog posts about the lessons I have learned while setting up and
maintaining my server. But now that I started to rework my infrastructure a bit, I had to
inevitably look at something I may have overlooked in the past: logging!
Previously, I had *Docker* *kind of* manage my logs: If I needed something, I would just
call `docker-compose logs <service>` and it would spit out logs. Then, I started to
configure my services to log to files in various locations: my *prosody* server would
log to `/etc/prosody/logs/info.log`, my *nginx* to `/etc/nginx/logs/error.log`, etc.
This, however, turned out to be problematic,
as, in my case, *prosody* stopped logging to the file once I rotated it with *logrotate*. It was
also a bit impractical, as the logs were not all in the same place but distributed across multiple
directories.
Moreover, *prosody* was logging things that I did not want in my logs but I could not turn off,
like when a client connected or authenticated itself. For me, this is a problem from two perspectives:
On the one hand, it is metadata that does not help me debug a hypothetical issue with my
*prosody* installation; on the other hand, it is metadata I straight-up do not want to store.
My solution was using a syslog daemon to process the logs, so that I could remove logs that I do not
want or need, and drop them all off at `/var/log`. However, there was a problem that I faced almost
immediately: not all software that I can configure to log to syslog can also be configured to log to a specific
syslog server. Why is this a problem? Well, syslog does not work inside a *Docker* container out of the
box, so I would have to have my syslog daemon expose a TCP/UDP or unix domain socket that logs can be sent to. To
see this issue, you can try to run `logger -t SomeTag Hello World` inside one of your containers
and try to find it, e.g. in your host's journal.
Today, I found my solution to both syslog logging within the containers and filtering out unneeded logs.
## Syslog inside Containers
The first step was getting the logs out of my containers without using files. To this end, I configured
my syslog daemon - *syslog-ng* - to expose a unix domain socket at, for example, `/var/run/syslog` and
mounted it into all containers at `/dev/log`:
```
source s_src {
    system();
    internal();
    unix-dgram("/var/run/syslog");
};
```
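On the *Docker* side, this boils down to a bind mount in the compose file (a sketch; the service name and image are placeholders):
```yaml
services:
  prosody:
    image: prosody/prosody
    volumes:
      # The socket exposed by syslog-ng on the host shows up as /dev/log in the container
      - /var/run/syslog:/dev/log
```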
If you now try and run `logger -t SomeTag Hello World` inside the container, you should be able
to find "Hello World" inside the host's logs or journals.
## Ignoring Certain Logs
The next step was ignoring logs that I do not need or care about. For this, I set up two logs within
*syslog-ng*: One that was going into my actual log file and one that was dropped:
```
destination d_prosody {
    file("/var/log/prosody.log");
};

filter f_prosody {
    program("prosody");
};

filter f_prosody_drop {
    program("prosody")
    and message("(Client connected|Client disconnected|Authenticated as .*|Stream encrypted .*)$");
};

# Drop
log {
    source(s_src);
    filter(f_prosody_drop);
    flags(final);
};

# Log
log {
    source(s_src);
    filter(f_prosody);
    destination(d_prosody);
    flags(final);
};
```
This example logs everything that *prosody* logs to the destination `d_prosody` and drops all
lines that match the given regular expression, which, in my case, matches all lines that relate to a client
connecting, disconnecting or authenticating.
The `flags(final);` in the drop rule is important: it indicates that a log line matching the rule should
not be processed any further. That log statement also defines no destination, which, in combination
with the `final` flag, tells *syslog-ng* that the
message should be dropped.
Additionally, I moved the log rule that matches everything sent to the configured source to the bottom
of the configuration to prevent any of these logs from *also* landing in the "everything" log.
Since I also host a *Nextcloud* server, I was interested in getting rid of HTTP access logs as well. But I would
still like to know when someone is trying to scan my webserver for vulnerable *wordpress* installations.
So I again defined rules similar to those above, but added a twist:
```
filter f_nextcloud_drop {
    program("nextcloud")
    and match("200" value(".nextcloud.response"));
};

log {
    source(s_src);
    parser { apache-accesslog-parser(prefix(".nextcloud.")); };
    filter(f_nextcloud_drop);
    flags(final);
};
```
As you can see, the rule for my *Nextcloud* is quite similar, except that I added a parser. With this, I can
make *syslog-ng* understand the HTTP access log and expose its parts as variables to my filter rule. There,
I say that my drop rule should match all access log lines that indicate an HTTP response code of 200, since
those are requests to locations on my server that I expect to be accessed and thus do not care about.
## Conclusion
With this setup, I feel much better about the logs I produce. I have also done other things not mentioned here, like
configuring *logrotate* to rotate my logs daily so that they don't grow too large and get removed after a day.
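For completeness, the matching *logrotate* rule is tiny (a sketch, assuming the log path from above):
```
/var/log/prosody.log {
    daily
    rotate 1
    missingok
    notifempty
}
```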
Please note that I am not an expert in *syslog-ng*; it just happened to be the first tool I got to do what I want. And
the example rules I showed are simply the first ones I wrote that filtered out what I wanted.

View File

@ -0,0 +1,131 @@
+++
title = "Running Prosody on Port 443 Behind traefik 2: Electric ALPN"
date = "2023-07-15"
template = "post.html"
aliases = [ "/prosody-traefik-2.html" ]
# <!-- description: In this blog post, I tell you how I changed my setup for proxying my XMPP server using traefik -->
+++
Hello everyone. Long time, no read.
In 2020, I published a post titled "[Running Prosody on Port 443 Behind traefik](https://blog.polynom.me/Running-Prosody-traefik.html)", where I described how I run my XMPP server
behind the "application proxy" [*traefik*](https://github.com/traefik/traefik).
I did this because I wanted to run my XMPP server *prosody* on port 443, so that clients connecting
to my server can bypass firewalls that only allow web traffic. While that approach worked,
over the last three years my setup changed dramatically.
<!-- more -->
While migrating my old server from *Debian* to *NixOS*, I decided that I wanted a website
hosted at the same domain I host my XMPP server at. This, however, was not possible with
*traefik* back then because it only allowed the `HostSNI` rule, which differentiates TLS
connections using the sent *Server Name Indication*. This is a problem, because a connection
to `polynom.me` the website and `polynom.me` the XMPP server both result in the same SNI being
sent by a connecting client.
Some time later, I stumbled upon [*sslh*](https://github.com/yrutschle/sslh), which is a
tool similar to *traefik* in that it allows hosting multiple services on the same port, all
differentiated by the SNI **and** the ALPN set by the connecting client. ALPN, or *Application-Layer Protocol Negotiation*, is an extension
to TLS which allows a connecting client to advertise the protocol(s) it would like to use
inside the encrypted session [(source)](https://en.wikipedia.org/wiki/Application-Layer_Protocol_Negotiation). As such, I put
*sslh* in front of my *traefik* and told it to route XMPP traffic (identified with an ALPN
of `xmpp-client`) to my prosody server and everything else to my *traefik* server. While this
worked well, there were two issues:
1. I was not running *sslh* in its ["transparent mode"](https://github.com/yrutschle/sslh/blob/master/doc/config.md#transparent-proxy-support), which uses some fancy iptable rules to allow the services behind it to see a connecting client's real IP address instead of just `127.0.0.1`. However, this requires more setup to work. This is an issue for services which enforce rate limits, like *NextCloud* and *Akkoma*. If one of these services gets hit by many requests, all the services see are requests from `127.0.0.1` and may thus rate limit (or ban) `127.0.0.1`, meaning that all - even legitimate - requests are rate limited. Additionally, I was not sure if I could just use this to route an incoming IPv6 request to `127.0.0.1`, which is an IPv4 address.
2. One day, as I was updating my server, I noticed that all my web services were responding very slowly. After some looking around, it turned out that *sslh* took about 5 seconds to route IPv6 requests, but not IPv4 requests. As I did not change anything (besides updating the server), to this day I am not sure what happened.
Due to these two issues, I decided to revisit the idea I described in my old post.
## The Prosody Setup
On the prosody-side of things, I did not change a lot compared to the old post. I did, however,
migrate from the `legacy_ssl_*` options to the newer `c2s_direct_tls_*` options, which
[replace the former](https://hg.prosody.im/trunk/file/tip/doc/doap.xml#l758).
Thus, my prosody configuration regarding direct TLS connections now looks like this:
```lua
c2s_direct_tls_ports = { 5223 }
c2s_direct_tls_ssl = {
    [5223] = {
        key = "/etc/prosody/certs/polynom.me.key";
        certificate = "/etc/prosody/certs/polynom.me.crt";
    };
}
```
## The *Traefik* Setup
On the *traefik* side of things, only one thing really changed: instead of just having a rule using
`HostSNI`, I now also require that the connection with the XMPP server advertises an ALPN
of `xmpp-client`, which is specified in the
[appropriate XMPP spec](https://xmpp.org/extensions/xep-0368.html). From my deployment
experience, all clients I tested (*Conversations*, *Blabber*, *Gajim*, *Dino*, *Monal*, [Moxxy](https://moxxy.org))
correctly set the ALPN when connecting via a direct TLS connection.
So my *traefik* configuration now looks something like this (Not really, because I let NixOS
generate the actual config, but it is very similar):
```yaml
tcp:
  routers:
    xmpps:
      entrypoints:
        - "https"
      rule: "HostSNI(`polynom.me`) && ALPN(`xmpp-client`)"
      service: prosody
      tls:
        passthrough: true
  # [...]
  services:
    prosody:
      loadBalancer:
        servers:
          - address: "127.0.0.1:5223"
http:
  routers:
    web-secure:
      entrypoints:
        - "https"
      rule: "Host(`polynom.me`)"
      service: webserver
      tls:
```
The entrypoint `https` is just set to listen on `:443`. This way, I can route IPv4 and IPv6
requests. Also note the `passthrough: true` in the XMPP router's `tls` settings. If this is
not set to `true`, then *traefik* would terminate the connection's TLS session before passing
the data to the XMPP server.
However, this config has one really big issue: In order
to have the website hosted at `polynom.me` be served using TLS, I have to set the
router's `tls` attribute. The *traefik*
documentation says that "*If both HTTP routers and TCP routers listen to the
same entry points, the TCP routers will apply before the HTTP routers. If no matching route
is found for the TCP routers, then the HTTP routers will take over.*"
[(source)](https://doc.traefik.io/traefik/routing/routers/#general_1).
This, however, does not seem to be the case if an HTTP router (in my example with ```Host(`polynom.me`)```) and a TCP router (in my example with ```HostSNI(`polynom.me`)```) respond to the same
SNI **and** the HTTP router has its `tls` attribute set. In that case, the HTTP router appears
to be checked first and will complain if the sent ALPN is not one of the
[HTTP ALPNs](https://developer.mozilla.org/en-US/docs/Glossary/ALPN), for example when
connecting using XMPP. As such, we can connect to the HTTP server but not to the
XMPP server.
It appears to be an issue that [I am not alone with](https://github.com/traefik/traefik/issues/9922), but also
one that is not fixed. So I dug around in *traefik*'s code and tried a couple of
things. For my setup to work, I have to apply [this patch](https://github.com/PapaTutuWawa/traefik/commit/36f0e3c805ca4e645f3313f667a6b3ff5e2fe4a9) to *traefik*. With that, the issue *appears*
to be gone, and I can access both my website and my XMPP server on the same domain and on the
same port. Do note that this patch is not upstreamed and may break things. For me, it
works. But I haven't run extensive tests or *traefik*'s integration and unit tests.
## Conclusion
This approach solves problem 2 fully and problem 1 partially. *Traefik* is able to route
the connections correctly with no delay, compared to *sslh*. It also provides my web services
with the connecting clients' IP addresses using HTTP headers. It does not, however, provide
my XMPP server with a connecting client's IP address. This could be solved with some clever
trickery, like telling *traefik* to use the [*PROXY* protocol](https://doc.traefik.io/traefik/routing/services/#proxy-protocol) when connecting to prosody,
and enabling the [`net_proxy`](https://modules.prosody.im/mod_net_proxy.html) module. However,
I have not yet tried such a setup, though I am very curious and may try that out.
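If I do, the *traefik* side would presumably be a small addition to the TCP service from above (a sketch I have not tested; the prosody side additionally needs `mod_net_proxy` set up):
```yaml
tcp:
  services:
    prosody:
      loadBalancer:
        # Wrap the forwarded connection in the PROXY protocol so prosody
        # can learn the client's real IP address.
        proxyProtocol:
          version: 2
        servers:
          - address: "127.0.0.1:5223"
```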

View File

@ -0,0 +1,108 @@
+++
title = "Signing Android Apps Using a YubiKey (on NixOS)"
date = "2023-07-24"
template = "post.html"
aliases = [ "/Android-Yubikey-Signing.html" ]
+++
In my spare time, I currently develop two Android apps using *Flutter*: [AniTrack](https://codeberg.org/PapaTutuWawa/anitrack), a
simple anime and manga tracker based on my own needs, and [Moxxy](https://moxxy.org), a modern XMPP
client. While I don't provide release builds for AniTrack, I do for Moxxy. Those
are signed using the key-pair that Flutter generates. I thought to myself: "Wouldn't it be cool if I could keep
the key-pair on a separate device which does the signing for me?". The consequence
of this thought is that I bought a *YubiKey 5c*. However, as always, using it for my
purposes did not go without issues.
<!-- more -->
The first issue is that the official [*Android* documentation](https://developer.android.com/build/building-cmdline#deploy_from_bundle)
says to use the `apksigner` tool for creating the signature. [The *YubiKey* documentation](https://developers.yubico.com/PIV/Guides/Android_code_signing.html), however,
uses `jarsigner`. While I, at first, did not think much of it, *Android* has
[different versions of the signature algorithm](https://source.android.com/docs/security/features/apksigning/): `v1` (what `jarsigner` does), `v2`, `v3`, `v3.1` and
`v4`. While it seems like it would be no problem to just use `v1` signatures, *Flutter*, by default,
generates `v1` and `v2` signatures, so I thought that I should keep it like that.
So, the solution is to just use `apksigner` instead of `jarsigner`, like [another person on the Internet](https://geoffreymetais.github.io/code/key-signing/) did.
But that did not work for me. Running `apksigner` like that makes it complain that `apksigner` cannot
access the required `sun.security.pkcs11.SunPKCS11` Java class.
```
> /nix/store/ib27l0593bi4ybff06ndhpb8gyhx5zfv-android-sdk-env/share/android-sdk/build-tools/34.0.0/apksigner sign \
--ks NONE \
--ks-pass "pass:<YubiKey PIN>" \
--provider-class sun.security.pkcs11.SunPKCS11 \
--provider-arg ./provider.cfg \
--ks-type PKCS11 \
--min-sdk-version 24 \
--max-sdk-version 34 \
--in unsigned.apk \
--out signed.apk
Exception in thread "main" java.lang.IllegalAccessException: class com.android.apksigner.ApkSignerTool$ProviderInstallSpec cannot access class sun.security.pkcs11.SunPKCS11 (in module jdk.crypto.cryptoki) because module jdk.crypto.cryptoki does not export sun.security.pkcs11 to unnamed module @75640fdb
at java.base/jdk.internal.reflect.Reflection.newIllegalAccessException(Reflection.java:392)
at java.base/java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:674)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:489)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
at com.android.apksigner.ApkSignerTool$ProviderInstallSpec.installProvider(ApkSignerTool.java:1233)
at com.android.apksigner.ApkSignerTool$ProviderInstallSpec.access$200(ApkSignerTool.java:1201)
at com.android.apksigner.ApkSignerTool.sign(ApkSignerTool.java:343)
at com.android.apksigner.ApkSignerTool.main(ApkSignerTool.java:92)
```
It may only be an issue because I use NixOS, as I
cannot find another instance of someone else having this issue. But I still want my APK signed using the key-pair
on my *YubiKey*. After a lot of trial and error, I found out that I can force Java to export certain classes
using the `--add-exports` flag. Since `apksigner` complained that the security classes are not exported to its
unnamed module, I had to specify `--add-exports sun.security.pkcs11.SunPKCS11=ALL-UNNAMED`.
## My Setup
TL;DR: I wrapped this entire setup (minus the Gradle config as that's a per-project thing) into a fancy [script](https://codeberg.org/PapaTutuWawa/bits-and-bytes/src/branch/master/src/flutter/build.sh).
My provider configuration for the signature is exactly like the one provided in [previously mentioned blog post](https://geoffreymetais.github.io/code/key-signing/#set-up-your-own-management-key),
with the difference that I cannot use the specified path to the `opensc-pkcs11.so` as I am on NixOS, where such
paths are not used. So in my setup, I either use the Nix REPL to build the derivation for `opensc` and then
use its `lib/opensc-pkcs11.so` path (`/nix/store/h2bn9iz4zqzmkmmjw9b43v30vhgillw4-opensc-0.22.0` in this case) for testing or, as
used in [AniTrack](https://codeberg.org/PapaTutuWawa/anitrack/src/branch/master/flake.nix), let Nix figure out the path by building
the config file from within my Nix Flake:
```nix
{
  # ...
  providerArg = pkgs.writeText "provider-arg.cfg" ''
    name = OpenSC-PKCS11
    description = SunPKCS11 via OpenSC
    library = ${pkgs.opensc}/lib/opensc-pkcs11.so
    slotListIndex = 0
  '';
  # ...
}
```
Next, to force Java to export the `sun.security.pkcs11.SunPKCS11` class to `apksigner`'s unnamed module, I added `--add-exports sun.security.pkcs11.SunPKCS11`
to the Java command line. There are two ways of doing this:
1. Since `apksigner` is just a wrapper script around calling `apksigner.jar`, we could patch the wrapper script to include this parameter.
2. Use the wrapper script's built-in mechanism to pass arguments to the `java` command.
While option 1 would work, it would require, in my case, overriding the derivation that builds my Android SDK environment, which I am not that fond of.
Using `apksigner`'s way of specifying Java arguments (`-J`) is much easier. However, there is a little trick to it: When you pass `-Jsomething` to `apksigner`,
the wrapper script transforms it to `java -something`. As such, we cannot pass `-Jadd-exports sun.security.pkcs11.SunPKCS11` because it would get transformed
to `java -add-exports sun.security.[...]`, which is not what we want. To work around this, I quote the entire parameter to trick Bash into thinking that I'm
passing a single argument: `-J"-add-exports sun.security.pkcs11.SunPKCS11"`. This makes the wrapper append `--add-exports sun.security.pkcs11.SunPKCS11` to the
Java command line, ultimately allowing me to sign unsigned Android APKs with the key-pair on my *YubiKey*.
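Putting it all together, the signing invocation from the beginning then looks roughly like this (a sketch; the long Nix store path is shortened to just `apksigner`):
```
apksigner sign \
    --ks NONE \
    --ks-pass "pass:<YubiKey PIN>" \
    --provider-class sun.security.pkcs11.SunPKCS11 \
    --provider-arg ./provider.cfg \
    --ks-type PKCS11 \
    --min-sdk-version 24 \
    --max-sdk-version 34 \
    -J"-add-exports sun.security.pkcs11.SunPKCS11" \
    --in unsigned.apk \
    --out signed.apk
```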
Since signing a signed APK makes little sense, we also need to tell Gradle to *not* sign the APK. In the case of Flutter apps, I modified the `android/app/build.gradle`
file to use a null signing config:
```gradle
android {
    // ...
    buildTypes {
        release {
            // This prevents Gradle from signing release builds.
            // I don't care what happens to debug builds as I'm not distributing them.
            signingConfig null
        }
    }
}
```

5
content/_index.md Normal file
View File

@ -0,0 +1,5 @@
+++
title = "test"
sort_by = "date"
template = "index.html"
+++

61
flake.lock Normal file
View File

@ -0,0 +1,61 @@
{
"nodes": {
"flake-utils": {
"inputs": {
"systems": "systems"
},
"locked": {
"lastModified": 1701680307,
"narHash": "sha256-kAuep2h5ajznlPMD9rnQyffWG8EM/C73lejGofXvdM8=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "4022d587cbbfd70fe950c1e2083a02621806a725",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"nixpkgs": {
"locked": {
"lastModified": 1704161960,
"narHash": "sha256-QGua89Pmq+FBAro8NriTuoO/wNaUtugt29/qqA8zeeM=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "63143ac2c9186be6d9da6035fa22620018c85932",
"type": "github"
},
"original": {
"owner": "NixOS",
"ref": "nixpkgs-unstable",
"repo": "nixpkgs",
"type": "github"
}
},
"root": {
"inputs": {
"flake-utils": "flake-utils",
"nixpkgs": "nixpkgs"
}
},
"systems": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
}
},
"root": "root",
"version": 7
}

27
flake.nix Normal file
View File

@ -0,0 +1,27 @@
{
  description = "blog.polynom.me";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, flake-utils }: flake-utils.lib.eachDefaultSystem (system: let
    pkgs = import nixpkgs {
      inherit system;
    };
    tailwindWithTypography = (pkgs.nodePackages.tailwindcss.overrideAttrs (old: {
      plugins = with pkgs.nodePackages; [
        "@tailwindcss/typography"
      ];
    }));
  in {
    devShell = pkgs.mkShell {
      buildInputs = with pkgs; [
        tailwindWithTypography
        zola
        imagemagick
      ];
    };
  });
}

28
input.css Normal file
View File

@ -0,0 +1,28 @@
@tailwind base;
@tailwind components;
@tailwind utilities;
@layer base {
  article > p > a, h1, h2, h3, h4, h5, h6 {
    @apply text-indigo-400 !important;
  }

  article > p > strong, code {
    @apply text-white !important;
  }

  article > h1, h2, h3, h4, h5, h6 {
    @apply text-indigo-400 !important;
  }

  body {
    background-color: #212121;
  }

  html {
    @apply text-white;
  }

  a {
    @apply text-indigo-400 !important;
  }
}

[Binary image files added but not shown (37 KiB, 137 KiB and 252 KiB), among them static/img/avatar.jpg and static/img/serial-cable.jpg. Diffs for the vendored files under static/js/MathJax/ are suppressed.]
10
tailwind.config.js Normal file
View File

@ -0,0 +1,10 @@
module.exports = {
  content: [
    "./templates/*.html",
  ],
  theme: {
  },
  plugins: [
    require('@tailwindcss/typography'),
  ],
}

49
templates/base.html Normal file
View File

@ -0,0 +1,49 @@
<!doctype html>
<html lang="en-gb">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link href="{{ get_url(path="css/index.css") }}" rel="stylesheet" integrity="sha384-{{ get_hash(path="css/index.css", sha_type=384, base64=true) | safe }}" />
{% block rss %}
<link rel="alternate" type="application/rss+xml" title="blog.polynom.me Atom feed" href="{{ get_url(path="atom.xml", trailing_slash=false) }}">
{% endblock %}
{% if page %}
<meta property="og:description" content="{{ page.description }}" />
<meta property="og:title" content="{{ page.title }}" />
<title>{{ page.title }}</title>
{% else %}
<meta property="og:description" content="{{ config.description }}" />
<meta property="og:title" content="{{ config.title }}" />
<title>{{ config.title }}</title>
{% endif %}
{% if page %}
{% if page.extra.mathjax %}
<script type='text/javascript' async src='{{ get_url(path="js/MathJax/MathJax.js") }}?config=TeX-AMS_CHTML' integrity="sha384-{{ get_hash(path="js/MathJax/MathJax.js", sha_type=384, base64=true) | safe }}"></script>
<script type='text/x-mathjax-config'>MathJax.Hub.Config({'CommonHTML': {scale: 100}, tex2jax: {inlineMath: [['$','$']]}});</script>
{% endif %}
{% endif %}
</head>
<body>
<div class="flex flex-col p-2 md:p-8 items-start md:w-4/5 mx-auto">
<!-- Header -->
<div class="flex flex-row self-center">
<img class="w-12 h-12 md:w-24 md:h-24 rounded-lg" src="{{ get_url(path="img/avatar.jpg") }}" integrity="sha384-{{ get_hash(path="img/avatar.jpg", sha_type=384, base64=true) | safe }}" alt="Profile picture"/>
<div class="ml-4 self-center">
<a class="self-center text-2xl font-bold" href="/">PapaTutuWawa's Blog</a>
<ul class="list-none">
<li class="inline mr-8"><a href="/">Posts</a></li>
<li class="inline mr-8"><a href="{{ get_url(path="atom.xml", trailing_slash=false) }}">RSS</a></li>
<li class="inline mr-8"><a href="https://polynom.me">About</a></li>
</ul>
</div>
</div>
{% block content %}{% endblock %}
</div>
</body>
</html>

19
templates/index.html Normal file
View File

@ -0,0 +1,19 @@
{% extends "base.html" %}
{% block content %}
<!-- Container for posts -->
<div class="mx-auto">
{% for page in section.pages %}
<!-- Post item -->
<div class="flex flex-col pt-4">
<a href="{{ page.permalink | safe }}"><h1 class="text-indigo-400 prose prose-lg text-xl">{{ page.title }}</h1></a>
<span class="text-md mt-2">Posted on {{ page.date }}</span>
<!-- Blurp -->
<span class="prose text-white mt-4">
{{ page.summary | safe }}
</span>
</div>
{% endfor %}
</div>
{% endblock content %}

30
templates/post.html Normal file
View File

@ -0,0 +1,30 @@
{% extends "base.html" %}
{% block content %}
<!-- Container for posts -->
<div class="mx-auto mt-4 w-full md:max-w-prose">
<h1 class="text-indigo-400 text-3xl">{{ page.title }}</h1>
<span class="text-md mt-2">Posted on {{ page.date }}</span>
{% if page.extra.mathjax %}
<div class="mt-6">
<div class="prose lg:prose-lg text-md text-white">NOTE: This post uses the JavaScript library MathJax to render math equations</div>
</div>
{% endif %}
<!-- Actual article -->
<article class="prose lg:prose-lg text-white mt-4">
{{ page.content | safe }}
</article>
<!-- Common post footer -->
<div class="mt-6">
<span class="prose lg:prose-lg text-md text-white">
If you have any questions or comments, then feel free to send me an email (Preferably with GPG encryption)
to {{ config.extra.email.user }} [at] {{ config.extra.email.domain }} or reach out to me on the Fediverse at <a href="https://{{ config.extra.fedi.url }}">{{ config.extra.fedi.handle }}</a>.
</span>
</div>
</div>
{% endblock content %}