Step 2 – Extracting the kernel source tree
In the previous section, in step 1, you learned how exactly you can obtain a Linux kernel source tree. One way – and the one we follow in this book – is to simply download a compressed source file from the kernel.org website (or one of its mirror sites). Another way is to use Git to clone a recent kernel source tree.
So, I’ll assume that by now you have obtained the 6.1.25 (LTS) kernel source tree in compressed form onto your Linux box. With it in place, let’s proceed with step 2, a simple step, where we learn how to extract it.
As mentioned earlier, this section is meant for those of you who have downloaded a particular compressed Linux kernel source tree from the repository, https://www.kernel.org, and aim to build it. In this book, we work primarily on the 6.1 longterm kernel series, particularly, on the 6.1.25 LTS kernel.
On the other hand, if you have performed git clone on the mainline Linux Git tree, as shown in the immediately preceding section, you can safely skip this section and move on to the next one, Step 3 – Configuring the Linux kernel.
Right; now that the download is done, let’s proceed further. The next step is to extract the kernel source tree – remember, it’s a tar-ed and compressed (typically .tar.xz) file. At the risk of repetition, we assume that by now you have downloaded the Linux kernel version 6.1.25 code base as a compressed file into the ~/Downloads directory:
$ cd ~/Downloads ; ls -lh linux-6.1.25.tar.xz
-rw-rw-r-- 1 c2kp c2kp 129M Apr 20 16:13 linux-6.1.25.tar.xz
The simple way to uncompress and extract this file is to use the ubiquitous tar utility:
tar xf ~/Downloads/linux-6.1.25.tar.xz
This will extract the kernel source tree into a directory named linux-6.1.25 within the current directory (here, ~/Downloads). But what if we would like to extract it into another folder, say, ~/kernels? Then, do it like so:
mkdir -p ~/kernels
# The shell does not expand ~ inside --directory=~/..., so use ${HOME} here
tar xf ~/Downloads/linux-6.1.25.tar.xz \
    --directory="${HOME}/kernels/"
This will extract the kernel source into the ~/kernels/linux-6.1.25/ folder. As a convenience and good practice, let’s set up an environment variable to point to the location of the root of our shiny new kernel source tree:
export LKP_KSRC=~/kernels/linux-6.1.25
Note that, going forward, we will assume that this variable, LKP_KSRC, holds the location of our 6.1.25 LTS kernel source tree.
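The extraction step can also be wrapped in a small shell helper. What follows is only a minimal sketch under stated assumptions: the function name extract_ksrc and the variable LKP_KSRC_DEMO are hypothetical (they are not part of the book's code), and the demo builds a tiny stand-in tarball in a temporary directory instead of touching your real download.

```shell
# extract_ksrc <tarball> <dest-dir>   (hypothetical helper name)
# Extracts the tarball into dest-dir and prints the path of the
# top-level directory that the archive created there.
extract_ksrc() {
    local tarball="$1" dest="$2" top
    [ -f "$tarball" ] || { echo "no such file: $tarball" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # The first archive member names the top-level directory
    top=$(tar tf "$tarball" | head -n 1 | cut -d/ -f1)
    tar xf "$tarball" --directory="$dest" || return 1
    echo "$dest/$top"
}

# Demo on a stand-in archive (assumption: any small tarball will do)
demo=$(mktemp -d)
mkdir -p "$demo/src/linux-demo"
echo "VERSION = 6" > "$demo/src/linux-demo/Makefile"
tar cf "$demo/linux-demo.tar" -C "$demo/src" linux-demo
LKP_KSRC_DEMO=$(extract_ksrc "$demo/linux-demo.tar" "$demo/kernels")
echo "extracted to: $LKP_KSRC_DEMO"
```

With the real download, the same helper could presumably be used as: export LKP_KSRC=$(extract_ksrc ~/Downloads/linux-6.1.25.tar.xz ~/kernels).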
While you could always use a GUI file manager application, such as Nautilus, to extract the compressed file, I strongly urge you to get familiar with using the Linux CLI to perform these operations.
Don’t forget tldr when you need to quickly look up the most frequently used options to common commands! Take tar, for example: simply do tldr tar to look at common tar commands, or look it up here: https://tldr.inbrowser.app/pages/common/tar.
Did you notice? We can extract the kernel source tree into any directory under our home directory, or elsewhere. This is unlike in the old days, when the tree was always extracted into a root-writeable location, often /usr/src/.
If all you wish to do now is proceed with the kernel build recipe, skip the following section and move along. If you’re interested (I certainly hope so!), the next section is a brief but important digression into looking at the structure and layout of the kernel source tree.
A brief tour of the kernel source tree
Imagine! The entire Linux kernel source code is now available on your system! Awesome – let’s take a quick look at it:
Figure 2.7: The root of the pristine 6.1.25 Linux kernel source tree
Great! How big is it? A quick du -h . issued within the root of the uncompressed kernel source tree reveals that this kernel source tree (recall, its version is 6.1.25) is approximately 1.5 gigabytes in size!
FYI, the Linux kernel has grown to be big and is getting bigger in terms of Source Lines of Code (SLOCs). Current estimates are close to 30 million SLOCs. Of course, do realize that not all this code will get compiled when building a kernel.
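To get a feel for these numbers on your own system, here is a rough sketch. It assumes that LKP_KSRC points at the extracted tree (falling back to the current directory if it is unset); the helper name count_c_lines is hypothetical. Note that it counts raw lines in .c and .h files, comments and blank lines included, so it overstates true SLOC and ignores non-C sources.

```shell
# count_c_lines <dir>   (hypothetical helper name)
# Prints the total number of raw lines in all *.c and *.h files under dir.
count_c_lines() {
    find "$1" -type f \( -name '*.c' -o -name '*.h' \) -print0 |
        xargs -0 cat 2>/dev/null | wc -l | tr -d ' '
}

# Assumption: LKP_KSRC is set as shown earlier; otherwise use "."
ksrc="${LKP_KSRC:-.}"

# -s prints a single summary line instead of one line per subdirectory
du -sh "$ksrc"

echo "raw C source/header lines: $(count_c_lines "$ksrc")"
```

Dedicated tools that separate code from comments will give a more faithful SLOC figure; this sketch is only meant as a quick sanity check.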
How do we know exactly which version of the Linux kernel this code is by just looking at the source? That’s easy: one quick way is to check the first few lines of the project’s Makefile. Incidentally, the kernel uses Makefiles all over the place; most directories have one. We will refer to this Makefile, the one at the root of the kernel source tree, as the top-level Makefile:
$ head Makefile
# SPDX-License-Identifier: GPL-2.0
VERSION = 6
PATCHLEVEL = 1
SUBLEVEL = 25
EXTRAVERSION =
NAME = Hurr durr I'ma ninja sloth
# *DOCUMENTATION*
# To see a list of typical targets execute "make help"
# More info can be located in ./README
Clearly, it’s the source of the 6.1.25 kernel. We covered the meaning of the VERSION, PATCHLEVEL, SUBLEVEL, and EXTRAVERSION tags – corresponding directly to the w.x.y.z nomenclature – in the Understanding the Linux kernel release nomenclature section. The NAME tag is simply a nickname given to the release (looking at it here – well, what can I say: that’s kernel humor for you. I personally preferred the NAME for the 5.x kernels – it’s “Dare mighty things”!).
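The version tags shown above can also be stitched back into the familiar dotted string with a few lines of shell. This is a minimal sketch: the helper name kver_from_makefile is hypothetical, and the demo parses a stand-in file that holds the same lines as the head Makefile output above, so it runs even without a kernel tree at hand.

```shell
# kver_from_makefile <path-to-top-level-Makefile>   (hypothetical helper)
# Prints VERSION.PATCHLEVEL.SUBLEVEL followed by EXTRAVERSION (if any).
kver_from_makefile() {
    awk -F' = ' '
        /^VERSION =/      { v = $2 }
        /^PATCHLEVEL =/   { p = $2 }
        /^SUBLEVEL =/     { s = $2 }
        /^EXTRAVERSION =/ { e = $2 }
        END { printf "%s.%s.%s%s\n", v, p, s, e }
    ' "$1"
}

# Stand-in file with the lines seen in the output above
mk=$(mktemp)
cat > "$mk" <<'EOF'
# SPDX-License-Identifier: GPL-2.0
VERSION = 6
PATCHLEVEL = 1
SUBLEVEL = 25
EXTRAVERSION =
NAME = Hurr durr I'ma ninja sloth
EOF

kver_from_makefile "$mk"    # prints 6.1.25
```

Within a real kernel source tree, running make kernelversion reports the same string directly, so the helper is mainly useful for illustrating how the tags map to the w.x.y.z form.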
Right, let’s now get for ourselves a zoomed-out 10,000-foot view of this kernel source tree. The following table summarizes the broad categorization and purpose of the more important files and directories within the root of the Linux kernel source tree. Cross-reference it with Figure 2.7:
File or directory name |
Purpose |
Top-level files |
|
|
The project’s The documentation is really important; it’s the authentic thing, written by the kernel developers themselves. Do read this short |
|
This file details the license terms under which the kernel source is released. The vast majority of kernel source files are released under the well-known GNU GPL v2 (written as GPL-2.0) license. The modern trend is to use easily grep-pable industry-aligned SPDX license identifiers. Here’s the full list: https://spdx.org/licenses/. See more below, point [2]. |
|
FAQ: something’s wrong in kernel component (or file) XYZ – who do I contact to get some support? That is precisely what this file provides – the list of all kernel subsystems along with its maintainer(s). This goes all the way down to the level of individual components, such as a particular driver or file, as well as its status, who is currently maintaining it, the mailing list, website, and so on. Very helpful! There’s even a helper script to find the person or team to talk to: |
|
This is the kernel’s top-level Makefile; the kernel’s Kbuild build system as well as kernel modules use this |
Major subsystem directories |
|
|
Core kernel subsystem: the code here deals with a large number of core kernel features including stuff like process/thread life cycle management, CPU task scheduling, locking, cgroups, timers, interrupts, signaling, modules, tracing, RCU primitives, [e]BPF, and more. |
|
The bulk of the memory management (mm) code lives here. We will cover a little of this in Chapter 6, Kernel Internals Essentials – Processes and Threads, and some related coverage in Chapter 7, Memory Management Internals – Essentials, and Chapter 8, Kernel Memory Allocation for Module Authors – Part 1, as well. |
|
The code here implements two key filesystem features: the abstraction layer – the kernel Virtual Filesystem Switch (VFS) – and the individual filesystem drivers (for example, |
|
The underlying block I/O code path to the VFS/FS. It includes the code implementing the page cache, a generic block IO layer, IO schedulers, the new-ish blk-mq features, and so on. |
|
Complete implementation of the network protocol stack, to the letter of the Request For Comments (RFCs) – https://whatis.techtarget.com/definition/Request-for-Comments-RFC. Includes high-quality implementations of TCP, UDP, IP, and many, many more networking protocols. Want to see the code-level implementation of TCP/IP for IPv4? It’s here: net/ipv4/, see the |
|
The Inter-Process Communication (IPC) subsystem code; the implementation of IPC mechanisms such as SysV and POSIX message queues, shared memory, semaphores, and so on. |
|
The audio subsystem code, aka the Advanced Linux Sound Architecture (ALSA) layer. |
|
The virtualization (hypervisor) code; the popular and powerful Kernel Virtual Machine (KVM) is implemented here. |
Arch/Infrastructure/Drivers/Miscellaneous |
|
|
The official kernel documentation resides right here; it’s important to get familiar with it. The |
|
The text of all licenses, categorized under different heads. See point [2]. |
|
The arch-specific code lives here (by the word arch, we mean CPU). Linux started as a small hobby project for the i386. It is now very probably the most ported OS ever. See the arch ports in point [4] of the list that follows this table. |
|
Support code for generating signed modules; this is a powerful security feature, which when correctly employed ensures that even malicious rootkits cannot simply load any kernel module they desire. |
|
This directory contains the kernel-level implementation of ciphers (as in encryption/decryption algorithms, or transformations) and kernel APIs to serve consumers that require cryptographic services. |
|
The kernel-level device drivers code lives here. This is considered a non-core region; it’s classified into many types of drivers. This tends to be the region that’s most often being contributed to; as well, this code accounts for the most disk space within the source tree. |
|
This directory contains the arch-independent kernel headers. There are also some arch-specific ones under |
|
The arch-independent kernel initialization code; perhaps the closest we get to the kernel’s |
|
Kernel infrastructure for implementing the new-ish |
|
The closest equivalent to a library for the kernel. It’s important to understand that the kernel does not support shared libraries as user space apps do. Some of the code here is auto-linked into the kernel image file and hence is available to the kernel at runtime. Various useful components exist within |
|
Kernel infrastructure for supporting the Rust programming language; see point [6]. |
|
Sample code for various kernel features and mechanisms; useful to learn from! |
|
Various scripts are housed here, some of which are used during kernel build, many for other purposes like static/dynamic analysis, debugging, and so on. They’re mostly Bash and Perl scripts. (FYI, and especially for debugging purposes, I have covered many of these scripts in Linux Kernel Debugging, 2022.) |
|
Houses the kernel’s Linux Security Module (LSM), a Mandatory Access Control (MAC) framework that aims at imposing stricter access control of user apps to kernel space than the default kernel does. The default model is called Discretionary Access Control (DAC). Currently, Linux supports several LSMs; well-known ones are SELinux, AppArmor, Smack, Tomoyo, Integrity, and Yama. Note that LSMs are “off” by default. |
|
The source code of various user mode tools is housed here, mostly applications or scripts that have a “tight coupling” with the kernel, and thus require to be within the particular kernel codebase. Perf, a modern CPU profiling tool, eBPF tooling, and some tracing tools, serve as excellent examples. |
|
Support code to generate and load the initramfs image; this allows the kernel to execute user space code during kernel init. This is often required; we cover initramfs in Chapter 3, Building the 6.x Linux Kernel from Source – Part 2, section Understanding the initramfs framework. |
Table 2.2: Layout of the Linux kernel source tree
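The table notes that SPDX license identifiers are easy to grep for. As a hedged illustration of that point, the sketch below tallies how many times each identifier occurs under a directory. The helper name spdx_tally is hypothetical, and the demo runs on three stand-in files created in a temporary directory; the identifiers in them are merely examples.

```shell
# spdx_tally <dir>   (hypothetical helper name)
# Prints a count of occurrences for each SPDX identifier found under dir.
spdx_tally() {
    grep -rhoE 'SPDX-License-Identifier: [^*]+' "$1" 2>/dev/null |
        sed -e 's/^SPDX-License-Identifier: //' -e 's/[[:space:]]*$//' |
        sort | uniq -c | sort -rn
}

# Demo on stand-in files (assumption: contents are illustrative only)
d=$(mktemp -d)
echo '// SPDX-License-Identifier: GPL-2.0'                 > "$d/a.c"
echo '/* SPDX-License-Identifier: GPL-2.0 */'              > "$d/b.h"
echo '// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause' > "$d/c.c"

spdx_tally "$d"
```

On a real tree, something like spdx_tally "${LKP_KSRC}/kernel" should give a quick picture of which licenses dominate in a given subsystem.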
The following are some important explanations from the table:
- README: This file also mentions the document to refer to for info on the minimal acceptable versions of software to build and run the kernel: Documentation/process/changes.rst. Interestingly, the kernel provides an Awk script (scripts/ver_linux) that prints the versions of current software on the system it’s run upon, helping you to check whether the versions you have installed are acceptable.
- Kernel licensing: Without getting stuck in the legal details (needless to say, I am not a lawyer), here’s the pragmatic essence of the thing. As the kernel is released under the GNU GPL-2.0 license (GNU GPL is the GNU General Public License), any project that directly uses the kernel code base automatically falls under this license. This is the “derivative work” property of the GPL-2.0. Legally, these projects or products must now release their kernel software under the same license terms. Practically speaking, the situation on the ground is a good deal hazier; many commercial products that run on the Linux kernel do have proprietary user- and/or kernel-space code within them. They typically do so by refactoring kernel (most often, device driver) work in Loadable Kernel Module (LKM) format. It is possible to release the kernel module (LKM) under a dual-license model. The LKM is the subject matter of Chapter 4, Writing Your First Kernel Module – Part 1, and Chapter 5, Writing Your First Kernel Module – Part 2, and we cover some information on the licensing of kernel modules there.
Some folks, preferring proprietary licenses, manage to release their kernel code within a kernel module that is not licensed under GPL-2.0 terms; technically, this is perhaps possible, but is at the very least considered as being terribly anti-social and can even cross the line to being illegal. The interested among you can find more links on licensing in the Further reading document for this chapter.
- MAINTAINERS: Just peek at this file in the root of your kernel source tree! Interesting stuff... To illustrate how it’s useful, let’s run a helper Perl script: scripts/get_maintainer.pl. Do note that, pedantically, it’s meant to be run on a Git tree only. Here, we ask the script to show the maintainers of the kernel CPU task scheduling code base by specifying a file or directory via the -f switch:
$ scripts/get_maintainer.pl --nogit -f kernel/sched
Ingo Molnar <[email protected]> (maintainer:SCHEDULER)
Peter Zijlstra <[email protected]> (maintainer:SCHEDULER)
Juri Lelli <[email protected]> (maintainer:SCHEDULER)
Vincent Guittot <[email protected]> (maintainer:SCHEDULER)
Dietmar Eggemann <[email protected]> (reviewer:SCHEDULER)
Steven Rostedt <[email protected]> (reviewer:SCHEDULER)
Ben Segall <[email protected]> (reviewer:SCHEDULER)
Mel Gorman <[email protected]> (reviewer:SCHEDULER)
Daniel Bristot de Oliveira <[email protected]> (reviewer:SCHEDULER)
Valentin Schneider <[email protected]> (reviewer:SCHEDULER)
[email protected] (open list:SCHEDULER)
- Linux arch (CPU) ports: As of 6.1, the Linux OS has been ported to all these processors. Most have MMUs. You can see the arch-specific code under the arch/ folder, each directory representing a particular CPU architecture:
$ cd ${LKP_KSRC} ; ls arch/
alpha/  arm64/    ia64/       m68k/        nios2/     powerpc/  sh/     x86/
arc/    csky/     Kconfig     microblaze/  openrisc/  riscv/    sparc/  x86_64/
arm/    hexagon/  loongarch/  mips/        parisc/    s390/     um/     xtensa/
In fact, when cross-compiling, the ARCH environment variable is set to the name of one of these folders, in order to compile the kernel for that architecture. For example, when building target “foo” for the AArch64, we’d typically do something like make ARCH=arm64 CROSS_COMPILE=<...> foo.
As a kernel or driver developer, browsing the kernel source tree is something you will have to get quite used to (and even grow to enjoy!). Searching for a particular function or variable can be a daunting task when the code is in the ballpark of 30 million SLOCs though! Do learn to use efficient code browser tools. I suggest the ctags and cscope Free and Open Source Software (FOSS) tools. In fact, the kernel’s top-level Makefile has targets for precisely these:
make [ARCH=<cpu>] tags ; make [ARCH=<cpu>] cscope
A must-do! (With cscope, when ARCH is left unset (the default), it builds the index for the x86[_64]. To generate the index relevant to, for example, the AArch64, run make ARCH=arm64 cscope.) Also, FYI, several other code-browsing tools exist of course; another good one is opengrok.
- io_uring: It’s not an exaggeration to say that io_uring and eBPF are considered to be two of the new(-ish) “magic features” that a modern Linux system provides (the io_uring folder here is the kernel support for this feature)! The reason database/network-like folks are going ga-ga over io_uring is simple: performance. This framework dramatically improves performance numbers in real-world high I/O situations, for both disk and network workloads. Its shared (between user and kernel-space) ring buffer architecture, zero-copy scheme, and ability to use far fewer system calls compared to typical older AIO frameworks, including a polled mode operation, make it an enviable feature. So, for your user space apps to get on that really fast I/O path, check out io_uring. The Further reading section for this chapter carries useful links.
- Rust in the kernel: Yes, indeed, there was a lot of hoopla about the fact that basic support for the Rust programming language has made it into the Linux kernel (first in 6.1). Why? Rust does have a well-advertised advantage over even our venerable C language: memory-safety. The fact is, even today, many of the biggest programming-related security headaches for code written in C/C++ – for both OS/drivers as well as user space apps – have at their root memory-safety issues (like the well-known BoF (Buffer Overflow) defect). These can occur when developers introduce memory corruption defects (bugs!) in their C/C++ code. This leads to vulnerabilities in software that clever hackers are always on the lookout for and exploit! Having said all that, at least as of now, Rust has made a very minimal entry into the kernel – no core code uses it. The current Rust support within the kernel is to support writing modules in Rust in the future. (There is a bit of sample Rust code, of course, here: samples/rust/.) Rust usage in the kernel will certainly increase in time.... The Further reading section has some links on this topic – do check it out, if interested.
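Returning briefly to the arch ports point above: since the ARCH variable must name one of the directories under arch/, a quick check before kicking off a cross-build can save a confusing failure. The following is a small sketch only; the helper names list_arches and is_valid_arch are hypothetical, and the demo uses a stand-in directory tree rather than a real kernel source tree.

```shell
# list_arches <kernel-src-dir>   (hypothetical helper name)
# Prints the names of the directories under arch/, one per line.
list_arches() {
    local d
    for d in "$1"/arch/*/; do
        if [ -d "$d" ]; then
            basename "$d"
        fi
    done
}

# is_valid_arch <kernel-src-dir> <arch>   (hypothetical helper name)
# Succeeds only if arch/<arch> exists as a directory.
is_valid_arch() {
    [ -n "$2" ] && [ -d "$1/arch/$2" ]
}

# Demo on a stand-in tree (assumption: mimics a few entries of arch/)
t=$(mktemp -d)
mkdir -p "$t/arch/arm64" "$t/arch/riscv" "$t/arch/x86"
touch "$t/arch/Kconfig"

list_arches "$t"
if is_valid_arch "$t" arm64; then
    echo "arm64: OK to pass as ARCH="
fi
```

Against the real tree, you would pass "${LKP_KSRC}" as the first argument; plain files such as arch/Kconfig are skipped because only directories are considered.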
We have now completed step 2, the extraction of the kernel source tree! As a bonus, you also learned the basics regarding the layout of the kernel source. Let’s now move on to step 3 of the process and learn how to configure the Linux kernel prior to building it.