<!doctype html>
<html lang="en-gb">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link href="https://blog.polynom.me/css/index.css" rel="stylesheet" integrity="sha384-R7KUcezOBiIXJ95JUBiXFdX0mMReehb8omi2xIGyZ6mbgXtQ3spxTx4c9BfffIA8" />
<link rel="alternate" type="application/rss+xml" title="blog.polynom.me Atom feed" href="https://blog.polynom.me/atom.xml">
<meta property="og:description" content="" />
<meta property="og:title" content="Mainline Hero Part 1 - First Attempts At Porting" />
<title>Mainline Hero Part 1 - First Attempts At Porting</title>
</head>
<body>
<div class="flex flex-col p-2 md:p-8 items-start md:w-4/5 mx-auto">
<!-- Header -->
<div class="flex flex-row self-center">
<img class="w-12 h-12 md:w-24 md:h-24 rounded-lg" src="https://blog.polynom.me/img/avatar.jpg" integrity="sha384-uiNteVXosQ2+o/izp41L1G9VwuwYDYCOPxzFWks058DMUhW7KfQXcipM7WqgSgEZ" alt="Profile picture"/>
<div class="ml-4 self-center">
<a class="self-center text-2xl font-bold" href="/">PapaTutuWawa's Blog</a>
<ul class="list-none">
<li class="inline mr-8"><a href="/">Posts</a></li>
<li class="inline mr-8"><a href="https://blog.polynom.me/atom.xml">RSS</a></li>
<li class="inline mr-8"><a href="https://polynom.me">About</a></li>
</ul>
</div>
</div>
<!-- Container for posts -->
<div class="mx-auto mt-4 w-full md:max-w-prose">
<h1 class="text-indigo-400 text-3xl">Mainline Hero Part 1 - First Attempts At Porting</h1>
<span class="text-md mt-2">Posted on 2019-08-21</span>
<!-- Actual article -->
<article class="prose lg:prose-lg text-white mt-4">
<p>In the first post of the series, I showed what information I gathered and what tricks can be used
to debug our mainline port of the <em>herolte</em> kernel. While I learned a lot just by preparing for
the actual porting, I did not get as far as actually booting the kernel. I would have
liked to write about what I did to <em>actually</em> boot a <em>5.X.X</em> kernel on the device, but instead I will tell you
about the journey I have completed thus far.</p>
<span id="continue-reading"></span>
<p>If you are curious about the progress I made, you can find the patches [here]({{ site.social.git_url}}/herolte-mainline). The first patches I produced are in the <code>patches/</code> directory, while the ones I created with lower
expectations are in the <code>patches_v2/</code> directory. Both "patchsets" are based on the <code>linux-next</code> source.</p>
<h2 id="starting-out">Starting Out</h2>
<p>My initial expectations about mainlining were simple: <em>The kernel should at least boot and then perhaps
crash in some way I can debug</em>.</p>
<p>This, however, was my first mistake: Nothing is that easy! Ignoring this, I immediately began writing
up a <em>Device Tree</em> based on the original downstream source. This was the first big challenge as the sheer amount of
downstream <em>Device Tree</em> source is overwhelming:</p>
<pre style="background-color:#2b303b;color:#c0c5ce;"><code><span>$ wc -l exynos* | awk -F\ '{print $1}' | awk '{sum += $1} END {print sum}'
</span><span>54952
</span></code></pre>
<p>But I chewed through most of them by just looking for interesting nodes like <code>cpu</code> or <code>memory</code>, after which
I transferred them into a new, simple <em>Device Tree</em>. At this point I learned that the <em>GitHub</em> search does not
work as well as I thought it did. It <strong>does</strong> find what I searched for, but only sometimes. So how do we find
what we are looking for? By <em>grep</em>-ping through the files. Using <code>grep -i -r cpu .</code> we are able to search
a directory tree for the keyword <code>cpu</code>. But while <em>grep</em> does a wonderful job, it is kind of slow. So at that
point I switched over to a tool called <code>ripgrep</code>, which does these searches a lot faster than plain old grep.</p>
<p>At some point, I found it very tiring to search for nodes, the reason being that I had to search for specific
nodes without knowing their names or locations. This led to the creation of a script which parses a <em>Device Tree</em>
while following includes of other <em>Device Tree</em> files, allowing me to search for nodes which have, for example, a
certain attribute set. This script is also included in the "patch repository". It does not work perfectly, though:
it finds most of the nodes but not all of them, which was still sufficient for my searches.</p>
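<p>To illustrate the idea of searching while following includes, here is a minimal sketch in C. It is not the script from the repository. The function name <code>dts_search</code> is made up for this example, and it makes simplifying assumptions: all files live in one directory, includes are written as <code>#include "file"</code> or <code>/include/ "file"</code>, and a plain substring match on each line stands in for real node parsing.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: prints and counts the lines containing `needle`
 * in the file `name` inside directory `dir`, descending into every file
 * pulled in via `#include "file"` or `/include/ "file"`.
 * Assumption: all includes are relative to the same directory.
 */
static int dts_search(const char *dir, const char *name,
                      const char *needle, int depth)
{
    char path[512], line[1024], inc[256];
    int hits = 0;
    FILE *f;

    assert(dir && name && needle);
    if (depth > 16)                 /* crude guard against include cycles */
        return 0;

    snprintf(path, sizeof(path), "%s/%s", dir, name);
    f = fopen(path, "r");
    if (!f)
        return 0;                   /* unreadable include: skip it */

    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, " #include \"%255[^\"]\"", inc) == 1 ||
            sscanf(line, " /include/ \"%255[^\"]\"", inc) == 1) {
            /* Follow the include before continuing with this file. */
            hits += dts_search(dir, inc, needle, depth + 1);
        } else if (strstr(line, needle)) {
            printf("%s: %s", path, line);
            hits++;
        }
    }
    fclose(f);
    return hits;
}
```

<p>Calling <code>dts_search(".", "top.dts", "cpu", 0)</code> (with <code>top.dts</code> being a placeholder file name) would print every matching line together with the file it came from. A real tool would track node boundaries instead of single lines.</p>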
<p>After finally having the basic nodes in my <em>Device Tree</em>, I started to port over all of the nodes required
to enable the serial interface on the SoC. This was the next big mistake I made: I tried to do too much
without verifying that the kernel even boots. This was also the point where I learned that the <em>Device Tree</em>
by itself doesn't really do anything. It just tells the kernel what the SoC looks like so that the correct
drivers can be loaded and initialized. So I knew that I had to port drivers from the downstream kernel into the
mainline kernel. The kernel identifies the driver responsible for a node by looking at the match data that the drivers
expose.</p>
<pre style="background-color:#2b303b;color:#c0c5ce;"><code><span>[...]
</span><span>static struct of_device_id ext_clk_match[] __initdata = {
</span><span> { .compatible = "samsung,exynos8890-oscclk", .data = (void *)0, },
</span><span>};
</span><span>[...]
</span></code></pre>
<p>This is an example from the <a href="https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/lineage-15.1/drivers/clk/samsung/clk-exynos8890.c#L122">clock driver</a> of the downstream kernel.
When the kernel is processing a node of the <em>Device Tree</em> it looks for a driver that exposes the same
compatible attribute. In this case, it would be the <em>Samsung</em> clock driver.</p>
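<p>The matching idea can be sketched in plain userspace C. This is only an illustration and not the kernel's actual matching code: the <code>match_entry</code> struct, the <code>find_driver</code> function and the driver labels are made up for this example, and only the two compatible strings are taken from this post.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's match tables. */
struct match_entry {
    const char *compatible;  /* string a driver claims to support */
    const char *driver;      /* illustrative label, not a real symbol */
};

static const struct match_entry match_table[] = {
    { "samsung,exynos8890-oscclk", "exynos8890 clock driver" },
    { "arm,psci",                  "PSCI firmware driver" },
    { NULL, NULL },          /* sentinel marking the end of the table */
};

/*
 * Returns the label of the driver whose compatible string equals the one
 * found in a Device Tree node, or NULL if no driver claims the node.
 */
static const char *find_driver(const char *node_compatible)
{
    const struct match_entry *e;

    assert(node_compatible);
    for (e = match_table; e->compatible; e++) {
        if (strcmp(e->compatible, node_compatible) == 0)
            return e->driver;
    }
    return NULL;
}
```

<p>A node whose compatible string appears in no table simply gets no driver, which is why copying nodes into a <em>Device Tree</em> achieves nothing without the matching driver code.</p>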
<p>So at this point I was wildly copying over driver code into the mainline kernel. As I forgot this during the
porting attempt, I am mentioning my mistake again: I never thought about the possibility that the kernel would not boot at all.</p>
<p>After having "ported" the driver code for the clock and some other devices, I decided to try and boot the
kernel. With my phone plugged into the serial adapter, my terminal showed nothing. So I went into the
<em>S-Boot</em> console to poke around. There I tried some commands in the hope that the bootloader would initialize
the hardware for me so that the kernel magically boots and gives me serial output. One was especially
interesting at that time: The name made it look like it would test whether the processor can do <strong>SMP</strong> -
<strong>S</strong>ymmetric <strong>M</strong>ulti<strong>p</strong>rocessing, which means running one system on several identical CPU cores that share memory. This is not the same as <em>Intel</em>'s <em>Hyper Threading</em> or <em>AMD</em>'s <em>SMT</em>, which run several hardware threads on a single core.
When I continued to boot afterwards, I got some output via the serial interface! It was garbage data, but it was data. This
gave me some hope. However, it was just some data that was pushed by something other than the kernel. I checked
this hypothesis by installing the downstream kernel, issuing the same commands and booting the kernel.</p>
<h2 id="back-to-the-drawing-board">Back To The Drawing Board</h2>
<p>At this point I was kind of frustrated. I knew that this endeavour was going to be difficult, but I immensely
underestimated it.</p>
<p>After taking a break, I went back to my computer with a new tactic: Port as few things as possible, confirm that
it boots and then port the rest. This was inspired by the way the <em>Galaxy Nexus</em> was mainlined, as described in
<a href="https://postmarketos.org/blog/2019/06/23/two-years/">this</a> blog post.</p>
<p>What did I do this time? The first step was a minimal <em>Device Tree</em>. No clock nodes. No serial nodes. No
GPIO nodes. Just the CPU, the memory and a <em>chosen</em> node. After setting the <code>CONFIG_PANIC_TIMEOUT</code>
<a href="https://cateee.net/lkddb/web-lkddb/PANIC_TIMEOUT.html">option</a> to 5, waiting at least 15 seconds and seeing
no reboot, I thought that the phone had booted the mainline kernel. But before getting too excited, as I
kept in mind that it was a hugely difficult endeavour, I asked in <em>postmarketOS</em>' mainline Matrix channel whether it could happen that the phone panics and still does not reboot. The answer I got
was that it could, indeed, happen. It seems like the CPU does not know how to shut itself off. On the x86 platform, this
is the task of <em>ACPI</em>, while on <em>ARM</em> <a href="https://linux-sunxi.org/PSCI"><em>PSCI</em></a>, the <strong>P</strong>ower <strong>S</strong>tate
<strong>C</strong>oordination <strong>I</strong>nterface, is responsible for it. Since the mainline kernel knows about <em>PSCI</em>, I wondered
why my phone did not reboot. After some thinking, I came up with 3 possibilities:</p>
<ol>
<li>The kernel boots just fine and does not panic. Hence no reboot.</li>
<li>The kernel panics and wants to reboot but the <em>PSCI</em> implementation in the downstream kernel differs from the mainline code.</li>
<li>The kernel just does not boot.</li>
</ol>
<p>The first possibility I threw out of the window immediately. It was just too easy. As such, I began
investigating the <em>PSCI</em> code. Out of curiosity, I looked at the implementation of the <code>emergency_restart</code>
function of the kernel and discovered that the function <code>arm_pm_restart</code> is used on <em>arm64</em>. Looking deeper, I
found out that this function is only set when the <em>Device Tree</em> contains a <em>PSCI</em> node of a supported version.
The downstream node is compatible with version <code>0.1</code>, which does not support the <code>SYSTEM_RESET</code> functionality
of <em>PSCI</em>. Since I could just turn off or restart the phone when using <em>Android</em> or <em>postmarketOS</em>, I knew
that there had to be something that works around the old firmware.</p>
<p>The downstream <a href="https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/lineage-15.1/arch/arm64/boot/dts/exynos8890.dtsi#L316"><em>PSCI</em> node</a> just specifies that it is compatible with <code>arm,psci</code>, so
how do I know that it is only firmware version <code>0.1</code> and how do I know of this <code>SYSTEM_RESET</code>?</p>
<p>If we grep for the compatible attribute <code>arm,psci</code> we find it as the value of the <code>compatible</code> field in the
source file <code>arch/arm64/kernel/psci.c</code>. It <a href="https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/lineage-15.1/arch/arm64/kernel/psci.c#L381">specifies</a> that the exact attribute of <code>arm,psci</code>
results in a call to the function <code>psci_0_1_init</code>. This indicates a version of <em>PSCI</em>. If we take a look
at <em>ARM</em>'s <a href="http://infocenter.arm.com/help/topic/com.arm.doc.den0022d/Power_State_Coordination_Interface_PDD_v1_1_DEN0022D.pdf"><em>PSCI</em> documentation</a>
we find a section called <em>"Changes in PSCIv0.2 from first proposal"</em> which contains the information that,
compared to the first proposal, the call <code>SYSTEM_RESET</code> was added in version 0.2. Hence we can guess that the <em>Exynos8890</em> SoC
comes with firmware which only supports version 0.1 of <em>PSCI</em>.</p>
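<p>The reasoning above can be condensed into a small userspace C sketch. It is not the kernel's code: the enum and both function names are made up for this example. The compatible strings <code>arm,psci-0.2</code> and <code>arm,psci-1.0</code> are the ones I would expect from the mainline binding, so treat them as an assumption and check the kernel's binding documentation.</p>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names; ordered so that later versions compare greater. */
enum psci_version { PSCI_UNKNOWN = 0, PSCI_0_1, PSCI_0_2, PSCI_1_0 };

/* Maps a Device Tree compatible string to the PSCI version it implies. */
static enum psci_version psci_version_for(const char *compatible)
{
    assert(compatible);
    if (strcmp(compatible, "arm,psci") == 0)
        return PSCI_0_1;        /* the string the downstream node uses */
    if (strcmp(compatible, "arm,psci-0.2") == 0)
        return PSCI_0_2;        /* assumed mainline binding string */
    if (strcmp(compatible, "arm,psci-1.0") == 0)
        return PSCI_1_0;        /* assumed mainline binding string */
    return PSCI_UNKNOWN;
}

/* SYSTEM_RESET only exists from PSCI 0.2 onwards. */
static int psci_has_system_reset(enum psci_version v)
{
    return v >= PSCI_0_2;
}
```

<p>With the downstream node's <code>arm,psci</code>, <code>psci_has_system_reset</code> returns 0, which matches the observation that a restart has to go through some other mechanism on this SoC.</p>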
<p>After a lot of searching, I found a node called <code>reboot</code> in the <a href="https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/lineage-15.1/arch/arm64/boot/dts/exynos8890.dtsi#L116">downstream source</a>.
The compatible driver for it is within the <a href="https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/lineage-15.1/drivers/soc/samsung/exynos-reboot.c"><em>Samsung</em> SoC</a> driver code.</p>
<p>Effectively, this code reboots the SoC by mapping the address of the PMU, which I guess stands for
<em>Power Management Unit</em>, into memory and writing some value
to it. This value is probably the command which tells the PMU to reset the SoC.
In my "patchset" <em>patches_v2</em> I have ported this code. Testing it with the downstream kernel, it
made the device do something. Although it crashed the kernel, it was enough to debug.</p>
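<p>As a rough sketch of that mechanism, the following C snippet writes a "reset" value to a register at a fixed offset from a mapped base address. The offset, the value and the function name are placeholders chosen for illustration and are not taken from the downstream driver. In the kernel, the base would come from mapping the PMU's physical address, and the write would go through the kernel's I/O accessors rather than a plain pointer.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration only, not the real register layout. */
#define PMU_SWRESET_OFFSET 0x400u
#define PMU_SWRESET_VALUE  0x1u

/*
 * Hypothetical helper: requests a reset by writing the command value to
 * the software reset register relative to the mapped PMU base.
 */
static void pmu_request_reset(volatile uint32_t *pmu_base)
{
    assert(pmu_base);
    /* Offsets are in bytes, the pointer indexes 32-bit registers. */
    pmu_base[PMU_SWRESET_OFFSET / sizeof(uint32_t)] = PMU_SWRESET_VALUE;
}
```

<p>For experimenting on a desktop machine, <code>pmu_base</code> can point at an ordinary array standing in for the register block, so the write can be inspected afterwards.</p>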
<p>To test the mainline kernel, I added a call to <code>emergency_restart</code> at the beginning of the <code>start_kernel</code> function.
The result was that the device did not do anything. The only option I had left was 3: the kernel does not even
boot.</p>
<p>At this point I began investigating the <code>arch/arm64/</code> code of the downstream kernel more closely. However, I
noticed something unrelated during a kernel build: The downstream kernel logs something with <em>FIPS</em> at the
end of the build. Grepping for it resulted in some code at <a href="https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/lineage-15.1/scripts/link-vmlinux.sh#L253">the end</a> of the <code>link-vmlinux.sh</code> script. I thought
that it was signing the kernel with a key in the repo, but it probably is doing something else. I tested
whether the downstream kernel boots without these crypto scripts and it did.</p>
<p>The only thing I did not test was whether the kernel boots without
<a href="https://github.com/ivanmeler/android_kernel_samsung_herolte/blob/lineage-15.1/scripts/link-vmlinux.sh#L270">"double-checking [the] jopp magic"</a>. But by looking at this script, I noticed another interesting thing:
<code>CONFIG_RELOCATABLE_KERNEL</code>. Having just a rough idea of what this config option enables, I removed it
from the downstream kernel and tried to boot. But the kernel did not boot. This meant that this option
was required for booting the kernel. This was the only success I can report.</p>
<p>By grepping for this config option I found the file <code>arch/arm64/kernel/head.S</code>. I did not know what it was
for so I searched the internet and found a <a href="https://unix.stackexchange.com/questions/139297/what-are-the-two-head-s-files-in-linux-source">thread</a>
on <em>Unix &amp; Linux Stack Exchange</em> that explained that the file
is prepended onto the kernel and executed before <code>start_kernel</code>. I mainly investigated this file, but in
hindsight I should have also looked more at the other occurrences of the <code>CONFIG_RELOCATABLE_KERNEL</code> option.</p>
<p>So what I did was try and port over code from the downstream <code>head.S</code> into the mainline <code>head.S</code>. This is
the point where I am at now. I did not progress any further as I am not used to assembly code in general or <em>ARM</em>
assembly in particular, but I still have some more hypotheses as to why the kernel does not boot.</p>
<ol>
<li>For some reason the CPU never reaches the instruction to jump to <code>start_kernel</code>.</li>
<li>The CPU fails to initialize the MMU or some other low-level component and thus cannot jump into <code>start_kernel</code>.</li>
</ol>
<p>At the moment, option 2 seems the most likely as the code from the downstream kernel and the mainline kernel
differ somewhat and I expect that <em>Samsung</em> added some code as their MMU might have some quirks that the
mainline kernel does not address. However, I did not have the chance to either confirm or deny any of these
assumptions.</p>
<p>As a bottom line, I can say that the most useful, but in my case most ignored, thing I learned is patience.
During the entire porting process I tried to do as much as I could in the shortest amount of time possible.
However, I quickly realized that I got the best ideas when I was doing something completely different. As
such, I also learned that it is incredibly useful to always have a piece of paper or a text editor handy
to write down any ideas you might have. You never know what might be useful and what not.</p>
<p>I also want to mention that I used the <a href="https://elixir.bootlin.com/linux/latest/source"><em>Bootlin Elixir Cross Referencer</em></a>
a lot. It is a very useful tool when exploring the kernel source tree. However, I would still
recommend keeping a local copy so that you can very easily grep through the code and find things that
neither <em>GitHub</em> nor <em>Elixir</em> can find.</p>
</article>
<!-- Common post footer -->
<div class="mt-6">
<span class="prose lg:prose-lg text-md text-white">
If you have any questions or comments, then feel free to send me an email (Preferably with GPG encryption)
to papatutuwawa [at] polynom.me or reach out to me on the Fediverse at <a href="https://social.polynom.me/papatutuwawa">@papatutuwawa@social.polynom.me</a>.
</span>
</div>
</div>
</div>
</body>
</html>