In this article, we’ll talk about building up a tiny (micro) Linux “distribution” from scratch. This distribution really won’t do much, but it will be built from scratch.
We will build the Linux kernel on our own, and write some software to package our micro-distro.
Lastly, we are doing this example on the RISC-V architecture, specifically QEMU’s riscv64 virt machine. There’s very little in this article that is specific to this architecture, so you might as well do an almost identical exercise for other architectures like x86. We recently went through the RISC-V boot process with SBI and bare metal programming for RISC-V, so this is just a continuation up the software stack.
Warning: This article is a very simplified view of a Linux distribution. There are things written below that are not 100% accurate, but more like 99.9%. This article is meant for beginners and helping them form a basic mental framework for understanding Linux systems. More advanced users may be triggered by over-simplification in some parts.
What is an OS kernel?
Let’s assume we’re working on a single-core machine. They’re still around us, maybe not in our laptops and phones, but in some smaller devices, and historically they were widely used even in our “big” personal devices like desktops. Those have been capable of running multiple programs simultaneously for many years, even as single cores. We’ll get into what simultaneously really means in a bit, but for now let’s just note that one of the operating system kernel’s big tasks is to make that happen.
If you go back to the articles about bare metal programming and SBI on RISC-V, you can see how at the lowest layers of software we interact with our I/O devices. It usually (most often, but not necessarily always) boils down to the CPU writing some data at the appropriate address. Imagine if application developers had to keep all these addresses in mind and know exactly which values to send to them! That would mean we’d have far fewer applications today, but we don’t, and that’s owing to operating system kernels, which abstract away these details and provide some simple high-level interfaces instead. In the RISC-V SBI article, we looked at an example of such an interface for Linux on x86: instead of knowing which addresses to write to and what values to send there, we focused on the logic and basically just told the OS kernel that “we want message so and so written to the standard output”, and then the OS kernel dealt with the details of interacting with the hardware. So that’s another big task for the OS kernel: managing the hardware on the machine and making interaction with it easier.
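To make this concrete, here is roughly what that looks like from the application developer’s side. A minimal sketch in C, using the POSIX write() call that Linux provides; note that no UART address appears anywhere:

#include <unistd.h>

int main(void) {
    // No device addresses, no hardware protocol. We hand the kernel a
    // buffer and a file descriptor (1 = standard output), and the kernel
    // deals with whatever hardware sits behind it.
    const char msg[] = "message so and so\n";
    write(1, msg, sizeof(msg) - 1);
    return 0;
}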
Going further, the OS kernel offers some really high-level programming interfaces, like filesystems. These may or may not be about managing some hardware and abstracting operations over it. The most common case for a filesystem, of course, is to store some data on disk and retrieve it later, and this has to do with the OS kernel managing the disk hardware on the machine (i.e. sending some data to certain addresses, which makes those disk devices respond in some way). However, this is not always the case: files are not always data stored on disk, and so a filesystem is an interface exposed to us, meaning it’s a way of talking to the OS kernel, not necessarily a way of getting at data on a disk. We’ll cover filesystems in great detail in some other article, but let’s keep this in mind for now: the OS kernel needs to provide a straightforward way of doing high-level things through multiple interfaces.
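As a quick illustration of the “files are not always data on disk” point, here is a sketch that assumes a typical Linux with procfs mounted at /proc (standard on mainstream distros): reading /proc/version looks exactly like reading a regular file, yet no disk holds those bytes; the kernel synthesizes them when asked.

#include <stdio.h>

int main(void) {
    // Opened and read like any other file, but backed by no storage at all;
    // the kernel generates the content on the fly.
    FILE *f = fopen("/proc/version", "r");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    char buf[256];
    while (fgets(buf, sizeof(buf), f) != NULL)
        fputs(buf, stdout);
    fclose(f);
    return 0;
}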
Finally, the last thing I wanted to cover about kernels is that they provide a programming model. Remember how we mentioned (as I’m sure you already know) that multiple programs can run simultaneously even on a single-core device? The OS enables running applications to be programmed without even knowing about each other; in other words, an application can live its lifecycle acting like it is the only application running on the computer and no one else is touching its memory. Imagine a world where your Python Django server needed to know about the texting app on your device in order to work: we’d have far fewer Django apps and texting apps, for sure, as coding them would quickly get gnarly. However, apps can also know about each other’s existence on the same machine. The operating system kernel facilitates both. It gives you a programming model in which you can insulate applications from each other, or join a few apps in isolation from other apps, and so on.
Basically, the OS kernel does a lot of heavy lifting to enable you to run your code easily on very generic and complicated machinery such as your smartphone. What is written above probably doesn’t do full justice to kernels, as they do a whole lot more, but the paragraphs above should give a fairly good idea of the kernel’s main tasks.
Linux is an extremely popular operating system kernel. It can be built to run on many architectures (really, a lot of them), and it is open source and free to use. A lot of people are “Linux users”, but what exactly does it mean that someone “uses Linux”? Those users typically install something like Debian or Ubuntu on their machines and use Linux that way. What does that actually mean?
What is a Linux distribution?
We talked above about what kernels do, i.e. what their tasks are, and we said Linux is an OS kernel. But can we, as end users who just want to watch YouTube, really take bare Linux and do something with it? The answer is likely no; we need a lot more layers on top of Linux to get to firing up a Chrome browser and watching YouTube.
How do we get all the way to the top of the software stack, where we can use those super simple and intuitive apps like graphical web browsers? We have previously discussed the boot process, and we went all the way from the very first operations on the machine after power-on to the moment we land in the operating system kernel. We did not cover bootloaders in any detail; we just briefly mentioned them because we were able to get QEMU to directly load our fake kernel into memory in one go, which is typically not possible with full-blown systems like desktop Linux (there is an intermediate boot stage where the bootloader fetches the OS image from something like a disk, or maybe even the network, and loads it into memory). The kernel we wrote was a fake little stub that does effectively nothing, and so we ended our last article at the point where the OS kernel is in memory and ready to go; we just had no real kernel to run.
Based on what we see above, I think the right mental model for the kernel right now is that it is the infrastructure for running user applications on a complex machine, but it doesn’t implement any of the user’s business logic itself. This is what I meant when I said bare Linux on its own cannot fire up Chrome and let you watch YouTube: it is merely the infrastructure that the application developer uses to implement Chrome and its streaming capabilities.
However, the kernel alone is not enough infrastructure for Chrome to run. We need a sort of “infrastructure on top of infrastructure” to get the full stack that Chrome needs. Again, much like in the SBI article, we’re just layering abstractions on top of each other; the idea is nothing new, only the way we do it.
For example, in order for a machine to connect to the Internet, the OS kernel first needs to be able to drive the network device on the machine to send the signals out of the machine (to the switch, router, another machine or whatever it is connected to). However, in Linux, this is more or less where the kernel stops. Which networks you connect to, whether you use a VPN, how IPs get assigned to your machine (statically or dynamically) and that kind of business happens in the upper layers of the infrastructure.
You may now guess where this is going — a Linux distribution is really the Linux kernel plus the infrastructure on top of the kernel infrastructure. Let’s dig into it.
How does “infrastructure on top of infrastructure” run?
Again, the kernel does a whole bunch of things, a million times more than what we can cover in a single article, but it definitely has its limits and it doesn’t do all the heavy lifting on your everyday personal device — and this is where something outside of the kernel gets into the picture.
Disclaimer: You can get really creative with Linux in a million different ways, and from this point on we’re going with a very basic, textbook-like, simple view of what happens in the mainstream distributions. There are many super complex things we can do, and there are lots of details we’re leaving out, but my hope here is that you get a general idea and enough knowledge to be able to understand more advanced material on this topic; there is plenty of it on the Internet.
The reason I wrote the disclaimer above is mainly that we’re going to assume going forward that your Linux has a filesystem, as this is the most common path. How many times have you seen a Linux deployment without a filesystem? It certainly seems possible, but it may be borderline useless except for some super edge/advanced cases, and we’ll disregard those in this article. Check out this page to get more of an idea of what I’m talking about.
So what is the stuff outside the kernel? It’s what we call user code! It’s just normal code that runs within the Linux environment, just like basically anything you run on your Linux machine. Sure, some code is more privileged than other code, and there are a million more details that can get involved, but let’s just focus on the main distinction here: when you are running Linux on a machine, there is kernel code running as well as user code. Everything that is part of the kernel itself runs in the kernel space, everything else running on the machine runs in the user space, and the two are fairly isolated from each other.
So this “infrastructure on top of infrastructure” that we have talked about runs in the user space. Sure, it needs to bubble down to the kernel for many primitives, and we’ve already seen how that happens: Linux has a well-defined ABI that exposes a set of services that user space code can invoke in the kernel space.
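Just to sketch what invoking one of those services looks like at that boundary, here is the earlier “write to the standard output” spelled as an explicit system call through glibc’s syscall() wrapper (a sketch; normally you’d just call write() or printf() and let the C library do this under the hood):

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
    // SYS_write is the number of the "write" service in the kernel's ABI.
    // On RISC-V, the actual crossing from user space into the kernel
    // happens via the ecall instruction, just like with SBI one level down.
    const char msg[] = "hello through the Linux ABI\n";
    syscall(SYS_write, 1, msg, sizeof(msg) - 1);
    return 0;
}

So where does this user space code come into the picture?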
The init process (and its “children”)
Once the kernel is done loading and making itself comfortable on the machine, it kicks off the first bit of code in user space: the init process. This is a piece of user space code that lives in a binary somewhere on your filesystem, and the kernel will look for it in a few locations, beginning with /init (if it doesn’t find it there, it gives a few more locations a shot before throwing its hands up). Let’s say the kernel found a binary at /init: it will start it and assign it the process ID 1. This is basically the only user process the kernel starts; the init process is then the ancestor of all other user space processes. This means that init starts some other processes, these in turn start other processes, and so on. Very shortly you have a bunch of processes running on your machine, hopefully each of them useful for the machine’s desired operations.

At this point the machine should start actively interacting with the world around it, whether that’s a smartphone presenting a UI to its user, or an embedded device collecting data off sensors and sending it into the cloud. Additionally, the machine will often have various tools available that are not actively running, but can be invoked in certain situations for some high-level operation (e.g. a Python script can invoke a couple of tools like ls or cat to get a snapshot of what’s going on with the machine and then send the data somewhere). A quick note: even these periodically-started or ad-hoc tools are in some way descendants of init; it’s not too important to know now, but it’s good to keep in mind.
The collection of the kernel, the processes that get launched right after it, and the tools available at your disposal represents the Linux distribution. It’s essentially a packaging of the kernel alongside all these useful tools that do more around the machine than the kernel alone does (while the kernel still provides the infrastructure for everything outside of it to run; nothing bypasses the kernel).
Even a distribution minimally useful for everyday use can get crufty pretty quickly. If you go down the path of building your own custom little distro, as we actually will now, you will almost inevitably hit a lot of roadblocks where something you expect to work just doesn’t, and the solution is either to write some software of your own that talks to the kernel to get something done on the system, or to use off-the-shelf software that does it. The latter is the path of least resistance, and you’ll likely keep adding stuff until you end up with a deployment that can do something remotely useful for you. At that point, you will likely have accumulated a significant number of software packages.
On the other hand, you have probably heard people criticizing certain distributions as being “bloated”, probably meaning they have accumulated a lot of complexity in their packaging, they waste hardware resources doing things that are not useful, and so on. Without discipline, I can easily see distribution developers just throwing different tools at the system to get that one missing thing going, without cleaning up the excess afterwards, and just moving on to the next feature where they do the same: a (sadly) common pattern in software engineering.
Some distributions draw the line in different places between making a decision for the user and doing something on the system, versus letting the user make the full decision and be more hands-on. For example, you can install Arch Linux in a minimal way where it’s little more than the kernel booted up with a shell. All the subsequent decisions are yours, and you have to be very hands-on to get it to a point where it’s graphical and highly interactive. Or you can decide it’s just not worth your time to set all that up, and install a very user-friendly Ubuntu distribution instead, which may be “bloated” for someone’s taste, but it gets you up and running very fast (I personally like it).
Building our almost useless Linux micro distribution
Let’s get our hands dirty and build something that’s basically useless but we’ll actually end up booting it for real. You may want to refresh your memory on the RISC-V boot process, I think it will be rewarding here.
First things first, let’s build the kernel.
Building the Linux kernel for RISC-V
I’m on an x86 platform here, so I will depend heavily on a cross-compilation toolchain to build things for RISC-V. You will likely do something similar (I’m not sure I have yet seen someone build the RISC-V kernel on RISC-V itself).
Let’s get the source code for Linux. Linux development is done on top of the Git version control system, but we’ll take a shortcut here and just download a tarball with the sources for one release; we won’t be syncing the whole Linux codebase with all its Git branches, experimental stuff and so on. We’ll be downloading the tarball from kernel.org for version 6.5.2 (here). You can also just download a tarball for whatever the latest stable version is from the kernel.org homepage. Once it’s downloaded, go ahead and unpack it, and cd into the resulting directory.
Now is the time to configure the build. The first step is to generate the defconfig, which basically initializes your configuration file.
Note: Here and below, you may want to use a different CROSS_COMPILE prefix, depending on how the cross compilation tool is identified on your machine
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- defconfig
This was hopefully quick, and the .config file should now be generated. The config file contains a lot of IDs for individual configuration options along with their values, very often in yes/no form (e.g. CONFIG_FOO=y, while disabled options show up as # CONFIG_FOO is not set). You could edit the file manually, but I personally wouldn’t recommend it, especially as a beginner (I don’t consider myself an expert at this either). A better way to edit it is through the curses-based interface. You can get there by running
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- menuconfig
This interface has a few benefits.
- You have a more readable, folder-like overview of the configs.
- There are insights into dependencies between the configs, i.e. it may only make sense to be able to enable config foo if bar and baz are also enabled.
- This interface has a search feature, activated by pressing the / key (I don’t think you’ll get far searching there in natural language; my way of getting around is searching on Google to find exactly which config key I’m looking for, for example CONFIG_TTY_PRINTK). When you find what you’re looking for, hit the button you see in the parentheses.
We won’t be tweaking anything here for now, let’s just exit and move on.
It’s time to build the kernel! A quick note here: make famously has the -j flag, which basically sets the concurrency of the build process, meaning it allows the build to run several things simultaneously. If you want to build faster but aren’t sure what to pass, count the number of cores, and if it’s something like 8, just pass the flag -j8 below. I will run the command like this (I’m on a 16-core machine):
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- -j16
This can take some time, though the RISC-V build shouldn’t take awfully long; I would expect at least a few minutes.
Once this is done, you will probably see something like this near the very bottom:
OBJCOPY arch/riscv/boot/Image
and this is the file we will be feeding to QEMU.
Great, let’s fire up QEMU!
qemu-system-riscv64 -machine virt -kernel arch/riscv/boot/Image
Switching to the UART view, we see that OpenSBI tidily started and Linux took over! Great! We even see some references to the SBI layer that we discussed before:
[ 0.000000] Linux version 6.5.2 (uros@uros-debian-desktop) (riscv64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Mon Sep 11 00:45:40 PDT 2023
[ 0.000000] Machine model: riscv-virtio,qemu
[ 0.000000] SBI specification v0.2 detected
[ 0.000000] SBI implementation ID=0x1 Version=0x8
[ 0.000000] SBI TIME extension detected
[ 0.000000] SBI IPI extension detected
[ 0.000000] SBI RFENCE extension detected
After reading about the boot process, we should now have a full understanding of what is going on here. This happened super early in the boot phase. There is a lot happening in these logs, and I’ll highlight a few things:
[ 0.000000] riscv: base ISA extensions acdfim
It seems Linux is capable of dynamically figuring out the capabilities of the underlying RISC-V hardware. I’m not sure what exactly the mechanism behind it is: it could be passed through the device tree that we mentioned in the previous article (the device tree does carry an ISA string describing each CPU), or perhaps something in the ISA itself reports this to the kernel.
[ 0.000000] Kernel command line:
This is interesting: the kernel has a command line? It turns out that the kernel, much like your everyday binaries, has startup flags. The bootloader usually sets those up; after all, it knows how to fire up the kernel, and this can simply be part of the starting procedure. With QEMU, remember, we’re sort of short-circuiting the whole bootloader business: by passing the -kernel flag, we let QEMU also wear the bootloader hat here, loading the kernel image into memory and starting it up. QEMU actually has a flag called -append with which you can append to this kernel command line. The default command line itself is baked into the config file, under Boot options somewhere (I leave it to the reader to search for it), and the QEMU flag basically lets you adjust it per VM launch instead of rebuilding the kernel to tweak it. In this case, the command line is just blank by default.
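For example, if you wanted more verbose boot logs, you could tweak the standard loglevel kernel parameter at launch time, without rebuilding anything (8 is the most verbose level):

qemu-system-riscv64 -machine virt -kernel arch/riscv/boot/Image -append "loglevel=8"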
[ 0.003376] printk: console [tty0] enabled
I guess this means that printk will now write to tty0? printk is basically the way to write out messages from the kernel space. Remember, your typical printf from C’s stdio.h is meant for the user space, not the kernel space, so the kernel needs its own solution, and that is printk.
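For illustration only, this is roughly what printk usage looks like in kernel-space code. A minimal kernel module sketch (modules are out of scope for this article, and building one requires the kernel’s build system, not plain gcc):

// Kernel-space "hello world": printk instead of printf.
#include <linux/init.h>
#include <linux/module.h>
#include <linux/printk.h>

static int __init hello_init(void)
{
    printk(KERN_INFO "hello from the kernel space\n");
    return 0;
}

static void __exit hello_exit(void)
{
    printk(KERN_INFO "goodbye from the kernel space\n");
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");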
[ 0.211634] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 0.221544] 10000000.uart: ttyS0 at MMIO 0x10000000 (irq = 12, base_baud = 230400) is a 16550A
[ 0.222659] printk: console [ttyS0] enabled
Great, Linux knows there is a UART at 0x10000000, just like we established before. Linux can now choose whether to use the SBI interface to drive the UART, or to talk to it directly (if S-mode is allowed such access on that machine, that is). On many platforms, the OS can disregard whatever interfaces lower-level software like the BIOS may offer for interacting with the hardware, and from what I hear, this indeed happens a lot.
There’s also a lot of other stuff in the kernel logs:
[ 0.250030] SuperH (H)SCI(F) driver initialized
I don’t think we need this? I guess we can go back to the kernel config and not bake this driver into the kernel and thus slim the kernel down. What we’re building here is a generic build, really. We didn’t customize anything and presumably the authors of the default config thought this is a reasonable default that should just run on a lot of different setups, so they probably included a lot of things to be on the safe side. If you’re working on smaller hardware, with less generous memory, CPU, etc. you do have to carefully choose what gets baked into the kernel and what doesn’t.
Additionally, this generic build is smart enough to figure out that the console should go to the right UART device, which is really handy for us. Otherwise, we’d probably have to do a bunch of configs like making sure TTY (let’s not overfocus on what this is now) is enabled, we want to enable printing to UART as the kernel boots, etc. All this is basically configurable in the menuconfig interface.
We’ll keep it simple in this article, and we won’t customize anything in the kernel unless we have to.
First obstacles
Scrolling down closer to the bottom of the output, we see this:
[ 0.330411] /dev/root: Can't open blockdev
[ 0.330743] VFS: Cannot open root device "" or unknown-block(0,0): error -6
[ 0.330984] Please append a correct "root=" boot option; here are the available partitions:
[ 0.331648] List of all bdev filesystems:
[ 0.331785] ext3
[ 0.331803] ext2
[ 0.331882] ext4
[ 0.331950] vfat
[ 0.332028] msdos
[ 0.332098] iso9660
[ 0.332181]
[ 0.332405] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 0.332756] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.2 #1
[ 0.333018] Hardware name: riscv-virtio,qemu (DT)
[ 0.333248] Call Trace:
[ 0.333442] [] dump_backtrace+0x1c/0x24
[ 0.333940] [] show_stack+0x2c/0x38
[ 0.334138] [] dump_stack_lvl+0x3c/0x54
[ 0.334318] [] dump_stack+0x14/0x1c
[ 0.334493] [] panic+0x102/0x29e
[ 0.334683] [] mount_root_generic+0x1e8/0x29c
[ 0.334891] [] mount_root+0x1f2/0x224
[ 0.335108] [] prepare_namespace+0x1ca/0x222
[ 0.335320] [] kernel_init_freeable+0x23e/0x262
[ 0.335539] [] kernel_init+0x1e/0x10a
[ 0.335714] [] ret_from_fork+0xa/0x1c
[ 0.336208] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---
Whoops, we crashed! The kernel has fallen into a panic.
Remember how we talked about Linux pretty much always needing a filesystem to be useful, and how all the “infrastructure on top of infrastructure” lives in the user space? Well, we didn’t really pass anything filesystem-related explicitly, and we surely didn’t pass any user space code to serve as the init, though we didn’t even get to the latter.
You might imagine that the filesystem needs to be on a disk, but that’s not necessarily the case. We’ll talk about filesystems in great detail some other time, but you can really have a filesystem backed by RAM too. This is actually used by Linux very often, most notably during the boot-up phase. When the kernel gets to the point where it just crashed for us, in a normal, typical situation it finds a whole, fully functional filesystem already loaded into RAM. If this confuses you, just think about it this way: a disk is just a bunch of bytes, and so is RAM (RAM is faster but much smaller); conceptually they’re basically the same. So who loads this filesystem into memory, and how?
One way is to bake the filesystem directly into the kernel image. In that case, as the kernel loads, so does the initial, memory-backed filesystem, and our system would have been ready to go had we done that. If you don’t want to bulk up your kernel image and you want your initial filesystem to be loaded by some other means, like a bootloader, then you package it separately. In QEMU’s case, we can short-circuit things a little bit again and make it wear a few more hats: we’ll have it load the initial filesystem into memory as well. If you’re interested in building the filesystem into the kernel, read the discussion here and try it as an exercise after you’re done with this guide.
This initial filesystem has a name: initramfs. You’ll often hear it called initrd too (rd being short for ramdisk). The latter is how QEMU takes in the filesystem for loading (the -initrd flag).
The filesystem is packaged as a cpio archive, which is conceptually similar to tar, but it’s not the same binary format. Short discussion can be read here.
Building the initramfs
The kernel’s only real requirement for the initramfs is that it contains a binary the kernel can start up as the init process, and the first place the kernel looks for it is the filesystem root, so the path is /init. If you have absolutely nothing else on your filesystem, it’s of questionable use, but this is the bare requirement. Let’s start by writing the init process in C. This process can really be anything; Linux won’t stop you from writing a useless init, it will happily just execute it. So why don’t we go with a ‘hello world’?
#include <stdio.h>

int main(int argc, char *argv[]) {
    printf("Hello world\n");
    return 0;
}
Great, now let’s package it up into a cpio archive.
riscv64-linux-gnu-gcc -static -o init init.c
cpio -o -H newc < file_list.txt > initramfs.cpio
The file_list.txt has a single line:
init
- We’re building a static binary because we do not want to dynamically depend on the standard C library. The filesystem won’t have it; we’re making a filesystem with init alone.
- Linux expects the initramfs archive to be built with the -H newc flag.
Let’s run QEMU.
qemu-system-riscv64 -machine virt -kernel arch/riscv/boot/Image -initrd /PATH/TO/NEWLY_BUILT/initramfs.cpio
The kernel still falls into a panic, but a different one!
[ 0.351894] Run /init as init process
Hello world
[ 0.379006] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000
[ 0.379360] CPU: 0 PID: 1 Comm: init Not tainted 6.5.2 #1
[ 0.379597] Hardware name: riscv-virtio,qemu (DT)
[ 0.379812] Call Trace:
[ 0.380005] [] dump_backtrace+0x1c/0x24
[ 0.380724] [] show_stack+0x2c/0x38
[ 0.380906] [] dump_stack_lvl+0x3c/0x54
[ 0.381095] [] dump_stack+0x14/0x1c
[ 0.381283] [] panic+0x102/0x29e
[ 0.381447] [] do_exit+0x760/0x766
[ 0.381623] [] do_group_exit+0x24/0x70
[ 0.381806] [] __wake_up_parent+0x0/0x20
[ 0.382009] [] do_trap_ecall_u+0xe6/0xfa
[ 0.382218] [] ret_from_exception+0x0/0x64
[ 0.382808] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000 ]---
I guess this just means init shouldn’t finish, so it should be easy to fix? Let’s just make it print something every 10 seconds and never stop. Important to note: our output worked, we can see the “Hello world” string!
We’ll write a new init, but let’s also make our initramfs a little more complex. Remember how we said that init starts up all the other processes on the machine? Wouldn’t it be nice if we actually had some sort of a shell? After all, that’s what we typically have with Linux; shells go well with Linux. We’ll build a useless shell, one that just tells us what we asked it to do (it echoes back the input).
Let’s first write the init process. Before it begins looping and printing something every 10 seconds, it has an important job: spawning our “little shell”. The way a process spawns another process on Linux is through two operations: fork and exec. fork starts a new process by literally cloning the current process at the moment of the fork. The way the code can differentiate the “parent” and “child” processes afterwards is by checking fork’s return value: in the child it is 0, while in the parent it is the child’s process ID (and -1 is returned on error).
Next, it’s not useful for us to just keep executing the init program in 2 different processes. That’s where one of the many exec operations comes into the picture. When I say there are many exec operations available on Linux, I mean there are execl, execlp, execle, etc.; please take a look at the documentation here. We’re going with execl, and its first parameter is the binary we want to launch. We’ll package our fake shell as the little_shell binary at the root. The rest of the parameters don’t really matter here (as evidenced by the value of the second parameter). More important is the mechanism of this operation: we’re calling into the kernel to take whatever is running in the current process and replace it with the program loaded for execution from the binary given as the first parameter. This is how programs get launched on Linux. When you’re working in your Bash shell and you end up launching a program, a sequence of fork and exec-style calls is exactly what happens.
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    pid_t pid = fork();
    if (pid == -1) {
        printf("Unable to fork!");
        return -1;
    }
    if (pid == 0) {
        // This is a child process.
        int status = execl("/little_shell", "irrelevant", NULL);
        if (status == -1) {
            printf("Forked process cannot start the little_shell");
            return -2;
        }
    }
    int count = 1;
    while (1) {
        printf("Hello from the original init! %d\n", count);
        count++;
        sleep(10);
    }
    return 0;
}
We build the init the same way as we did before:
riscv64-linux-gnu-gcc -static -o init init.c
For the “shell” we’re building, I want to get a little more creative. Why don’t we write this one in Go instead of old-school C?
package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	fmt.Println("Hello world from Go!")
	reader := bufio.NewReader(os.Stdin)
	for {
		fmt.Print("Enter your command: ")
		line, _ := reader.ReadString('\n')
		fmt.Printf("Your command is: %s", line)
	}
}
I am able to cross-compile this to RISC-V out of the box with my Go compiler.
GOOS=linux GOARCH=riscv64 go build little_shell.go
A thing I really like about Go is that it’s very easy to reference remote repositories on GitHub to pull in libraries, and everything gets neatly packaged statically. I’m not going to lie, the little_shell Go binary is pretty thick, weighing in at 1.9M on my machine, compared to only 454K for the statically-linked simple init. But in the days of desktops/laptops/phones with hundreds of GB of storage, if you’re building a distro for those kinds of devices, the tradeoff may be worth it.
Note: there are situations where you may not be able to simply run your Go binary on top of a bare kernel just like that; it could start throwing Go panics all over the place. In order to run Go, you need to build your kernel with the right features in it, futex support being one of them (I think I’ve identified only 2 in my past experience). If you encounter problems running Go applications and you suspect missing kernel support, carefully read through the panics and you will be able to identify what is missing. The good news is that the default config for the RISC-V kernel is good enough for running Go.
Let’s update our file_list.txt:
init
little_shell
Pack it all up again:
cpio -o -H newc < file_list.txt > initramfs.cpio
Let’s run it!
qemu-system-riscv64 -machine virt -kernel arch/riscv/boot/Image -initrd /PATH/TO/NEWLY_BUILT/initramfs.cpio
[ 0.356314] Run /init as init process
Hello from the original init! 1
Hello world from Go!
Enter your command: [[[mkdir hello]]]
Your command is: mkdir hello
Enter your command: [[[ls]]]
Your command is: ls
Enter your command: Hello from the original init! 2
[[[echo 123]]]
Your command is: echo 123
Enter your command: [[[exit]]]
Your command is: exit
Enter your command: Hello from the original init! 3
[[[I give up!]]]
Your command is: I give up!
The bits in this console excerpt enclosed in triple square brackets are my user-provided input over UART. You can see 3 things interleaved on the UART:

- The original init’s periodic output every 10 seconds.
- Output from the little_shell.
- Input from the user.
We are using the sole UART device on the virtual machine for all of this, but that is not the only reason everything is mixed up here. The init process prints to the standard output, just like little_shell does, and (you may not be aware of this) any sort of print on Linux is a print to an open file. Standard output, as far as Linux is concerned, is a file that a process has open, and you print to the standard output by writing to that file. When we fork-ed little_shell from init, little_shell inherited the open files from init, so they are literally sharing the standard input and output streams. Even if we had multiple I/O devices in use on this machine, both processes would still be sending output to the same stream. When init was started, its standard output was set to go to the UART, and this was simply inherited by little_shell.
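Here is a tiny standalone sketch of that inheritance (a hypothetical example, not part of our initramfs): after fork(), both processes keep writing to file descriptor 1, and both streams land wherever that descriptor pointed when the parent started, which in our case is the UART.

#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    // The child receives a copy of the parent's open file descriptors,
    // so both of these writes end up in the same output stream.
    const char child_msg[] = "child writing to fd 1\n";
    const char parent_msg[] = "parent writing to fd 1\n";
    if (fork() == 0) {
        write(1, child_msg, sizeof(child_msg) - 1);
        return 0;
    }
    write(1, parent_msg, sizeof(parent_msg) - 1);
    wait(NULL);  // reap the child
    return 0;
}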
And there we have it, we have a pretty useless, but home-made Linux distribution! Go ahead and send it over to your friends! 🙂
Jokes aside, you can make an exercise out of this and grow little_shell into some sort of a mini shell. Instead of just echoing back the commands given to it, you could make it actually understand what mkdir is. You can even have it fork off a process to execute the command. Sky is the limit, you’re in the Linux user space!
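If you take that exercise on, the skeleton is the same fork and exec dance our init did, just in a loop. A rough C sketch, under the assumption that the binaries you launch (mkdir, ls and so on) actually exist somewhere on your initramfs (the mini> prompt is made up):

#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char line[256];
    for (;;) {
        printf("mini> ");
        fflush(stdout);
        if (fgets(line, sizeof(line), stdin) == NULL)
            break;
        line[strcspn(line, "\n")] = '\0';

        // Naive whitespace tokenization; a real shell does far more.
        char *argv[16];
        int argc = 0;
        for (char *tok = strtok(line, " "); tok != NULL && argc < 15;
             tok = strtok(NULL, " "))
            argv[argc++] = tok;
        argv[argc] = NULL;
        if (argc == 0)
            continue;

        // The fork + exec pair again: the child becomes the command,
        // the parent waits for it to finish.
        pid_t pid = fork();
        if (pid == 0) {
            execvp(argv[0], argv);
            perror("execvp");  // only reached if exec failed
            return 1;
        }
        waitpid(pid, NULL, 0);
    }
    return 0;
}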
Let’s just step back a little and see whether the Linux kernel delivered its initial few promises for us:

- It’s abstracting away the hardware. Our init and our shell didn’t know anything about the UART. All they knew was that they were writing to some Linux file handle. It happens to be mapped to something abstract in the Linux kernel that invokes the UART driver, which may or may not use the SBI under the hood (I have honestly not verified whether the kernel drops its dependence on SBI after it boots).
- It offers some high-level programming paradigms, like filesystems. Our init process located the other binary through the filesystem (the path was trivial, the binary was right in the root, but still, the paradigm is there).
- There is a pretty clean isolation between the running processes. Once the shell was forked off from init, the two were basically running independently. The memory was not shared between them and they didn’t have to worry about each other’s memory layout. They did share some things, like the file handles, but that is a consequence of how they were launched. Linux lets you change some of this behavior, e.g. you can explicitly set up shared memory between processes if you want to, as sketched below.
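A minimal sketch of that explicit opt-in, using an anonymous shared mmap mapping set up before the fork (one of several mechanisms Linux offers; nothing in our little distro needs this):

#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    // MAP_SHARED | MAP_ANONYMOUS: this page is explicitly shared between
    // the parent and any child forked after this point.
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return 1;
    *shared = 0;

    if (fork() == 0) {
        *shared = 42;                     // the child writes...
        return 0;
    }
    wait(NULL);
    printf("parent sees %d\n", *shared);  // ...and the parent sees 42
    munmap(shared, sizeof(int));
    return 0;
}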
There are many other things the kernel does for us, but let’s just stop here for now and appreciate this. It may not look like a lot, but the kernel gives us a pretty solid, portable infrastructure with which we can develop high level software while often disregarding the complexities of the underlying machine.
So what is an operating system?
This is now a game of words, in my opinion. What matters is that the reader now has an understanding of what Linux the kernel is, what “infrastructure” it offers, and what runs in the user space versus the kernel space.
Some people may call the kernel itself an operating system, some will refer to the whole distribution as the operating system, or they may come up with something completely different. I hope that at this point you have a good understanding of what happens on a machine once Linux is started and where the responsibilities of each component end (or that you can at least imagine the boundaries on a more complex system).
I hope this was useful!
Bonus section: making an actually useful micro distribution with u-root
I thought about wrapping up here, but that wouldn’t make for a flashy demo. Why don’t we instead boot into something that’s actually useful, meaning you can do the things you would typically do on a Linux-based system, like run your ls, mkdir, echo and whatnot. Let’s stick with the kernel we previously built, and add some useful “infrastructure on top of infrastructure” in the user space to make the whole machine more useful.
I really like the u-root project for this.
Note: The title of their project mentions Go bootloaders, and this may stump you because, as a careful reader, you know that Go programs are not really something you can run on bare metal. These bootloaders are somewhat exotic user space bootloaders, meaning they actually run on top of a live Linux kernel and then use an amazing Linux mechanism called kexec to load a different kernel into memory from user space. We won’t be using these bootloaders for now, we’ll just focus on the other user space goodies u-root has available, but I thought a quick paragraph here would help the confused readers.
The reason I like the u-root project is that it’s so insanely easy to use. Its usage is a bit creative though, so there are really 2 steps here:
- Install u-root per their instructions. You should end up with a u-root binary in your PATH.
- To actually generate a functional initramfs with u-root, the easiest way is to clone their Git repo and cd your way into the directory you just cloned. From there, you can cross-compile a fully functional set of user space tools with a single command.
git clone https://github.com/u-root/u-root.git
cd u-root
GOOS=linux GOARCH=riscv64 u-root
I get a few lines of output, the last being:
18:31:31 Successfully built "/tmp/initramfs.linux_riscv64.cpio" (size 14827284).
And that’s really it: this cpio file can now just be passed to QEMU and you’ll boot right into a shell! Go through the u-root documentation to understand how you can customize the initramfs image you get, including what sort of changes you can make to the init process behavior, but the default setup alone is amazing to explore with.
qemu-system-riscv64 -machine virt -kernel arch/riscv/boot/Image -initrd /tmp/initramfs.linux_riscv64.cpio
Wow, this booted really smoothly! Here is the bottom of the UART output.
[ 0.400269] Run /init as init process
2023/09/12 01:34:33 Welcome to u-root!
_
_ _ _ __ ___ ___ | |_
| | | |____| '__/ _ \ / _ \| __|
| |_| |____| | | (_) | (_) | |_
\__,_| |_| \___/ \___/ \__|
And as you can see by the little /# prompt, you’re actually in a shell! u-root’s init forked off a shell process and gave it the control over the UART.
/# ls
bbin
bin
buildbin
dev
env
etc
go
init
lib
lib64
proc
root
sys
tcz
tmp
ubin
usr
var
/# pwd
/
/# echo "Hello world!"
Hello world!
This little shell that u-root gives you even supports Tab completion! I will say I have encountered some occasional hiccups with it; it’s definitely not your full-blown Bash, but it’s more than just a toy.
The standard tools like ls seem to take the standard flags:
/# ls -lah
dtrwxrwxrwx root 0 420 B Sep 12 01:35 .
drwxr-xr-x root 0 2.1 kB Jan 1 00:00 bbin
drwxr-xr-x root 0 80 B Jan 1 00:00 bin
drwxrwxrwx root 0 40 B Sep 12 01:34 buildbin
drwxr-xr-x root 0 12 kB Sep 12 01:34 dev
drwxr-xr-x root 0 40 B Sep 12 01:35 directory
drwxr-xr-x root 0 40 B Jan 1 00:00 env
drwxr-xr-x root 0 80 B Sep 12 01:34 etc
drwxrwxrwx root 0 60 B Sep 12 01:34 go
Lrwxrwxrwx root 0 9 B Jan 1 00:00 init -> bbin/init
drwxrwxrwx root 0 40 B Sep 12 01:34 lib
drwxr-xr-x root 0 40 B Jan 1 00:00 lib64
dr-xr-xr-x root 0 0 B Sep 12 01:34 proc
drwx------ root 0 40 B Sep 11 07:43 root
dr-xr-xr-x root 0 0 B Sep 12 01:34 sys
drwxr-xr-x root 0 40 B Jan 1 00:00 tcz
dtrwxrwxrwx root 0 60 B Sep 12 01:34 tmp
drwxr-xr-x root 0 40 B Jan 1 00:00 ubin
drwxr-xr-x root 0 60 B Jan 1 00:00 usr
drwxr-xr-x root 0 60 B Jan 1 00:00 var
Visit google.com from this!
One last flashy thing — let’s connect to google.com from this VM with our custom user-land!
First, we need to attach a network device. We add -device virtio-net-device,netdev=usernet -netdev user,id=usernet,hostfwd=tcp::10000-:22 to our QEMU CLI. I think the last 2 numbers don’t really matter here, as we won’t be SSH’ing into this machine (maybe you can do that as an exercise yourself, but I’m afraid it won’t be easy). The default kernel build should already bake in the virtio network device drivers, so this should more or less just work.
We’ll need a working IP address, and we’ll use something from u-root to obtain it. That something requires 3 things to be present in the kernel config: CONFIG_VIRTIO_PCI, CONFIG_HW_RANDOM_VIRTIO and CONFIG_CRYPTO_DEV_VIRTIO. My default kernel settings have all of these flipped to y, so I’m good to go and you should be too, but double-check just in case. If you have changed any kernel settings, rebuild the kernel image.
Finally, we need to attach an RNG device (it doesn’t matter what kind) to our QEMU machine so we can obtain our IP address. We simply add -device virtio-rng-pci to our QEMU CLI.
qemu-system-riscv64 -machine virt -kernel arch/riscv/boot/Image -initrd /tmp/initramfs.linux_riscv64.cpio -device virtio-net-device,netdev=usernet -netdev user,id=usernet,hostfwd=tcp::10000-:22 -device virtio-rng-pci
Once we’re in, we can run ip addr to see what our IP address is.
/# ip addr
1: lo: mtu 65536 state UNKNOWN
link/loopback
inet 127.0.0.1 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1 scope host
valid_lft forever preferred_lft forever
2: eth0: mtu 1500 state DOWN
link/ether 52:54:00:12:34:56
3: sit0: <0> mtu 1480 state DOWN
link/sit
Our Ethernet is not set up. Let’s enable IPv4 networking (we don’t need v6). In this little setup, QEMU runs a virtualized network and embeds a little DHCP server which can dynamically assign IPs (the documentation is here). So let’s run u-root’s DHCP helper:
dhclient -ipv6=false
The output I got was the following:
2023/09/12 03:46:59 Bringing up interface eth0...
2023/09/12 03:47:00 Attempting to get DHCPv4 lease on eth0
2023/09/12 03:47:00 Got DHCPv4 lease on eth0: DHCPv4 Message
opcode: BootReply
hwtype: Ethernet
hopcount: 0
transaction ID: 0x05f008e1
num seconds: 0
flags: Unicast (0x00)
client IP: 0.0.0.0
your IP: 10.0.2.15
server IP: 10.0.2.2
gateway IP: 0.0.0.0
client MAC: 52:54:00:12:34:56
server hostname:
bootfile name:
options:
Subnet Mask: ffffff00
Router: 10.0.2.2
Domain Name Server: 10.0.2.3
IP Addresses Lease Time: 24h0m0s
DHCP Message Type: ACK
Server Identifier: 10.0.2.2
2023/09/12 03:47:00 Configured eth0 with IPv4 DHCP Lease IP 10.0.2.15/24
2023/09/12 03:47:00 Finished trying to configure all interfaces.
The QEMU documentation will tell you why pinging won’t work, so let’s not bother with pinging. Let’s just “visit” google.com!
wget http://google.com
You can now read the downloaded index.html file!
cat index.html
You’ll get a lot of obfuscated JavaScript, but this is great! It means we have successfully visited google.com through wget. I hope this sparks your imagination to do some other cool things with u-root.
Package managers
You might intuitively understand at this point that some of the most important software in a Linux distro is the package manager. It’s really the gateway to getting the functionality you need onto your machine. What we went through here is more of an embedded flow: we generated somewhat monolithic software images, and if we want to update something, we rebuild the whole image and re-image the device. That doesn’t work for desktops, phones, etc. Package managers are there to update, add or remove software on our machines. We won’t talk about them here beyond this brief shoutout, but you can hopefully imagine from a high level how they work and what they do.
The monster of init
The init we created is definitely just a toy, and in the end it just started some sort of a shell. However, make no mistake about it: init is an incredibly important piece of a Linux system, and getting it right is a science. You’ll see a lot of strong opinions online about different init systems for Linux. init doesn’t usually just spawn one process and call it a day; it can set up a whole bunch of things, different devices for example. As an exercise, run ls /dev in your u-root-based build and see all those devices set up. A lot of them come from init’s setup, and many are extremely useful. You can then read some of the u-root source code to see what’s going on in its init.
GitHub repo
The code for this guide is available here, where you can just sync and build the initramfs images.
