Inspecting Linux Source Code

Objective

After completion of this assignment students are expected to acquire the following knowledge of the Linux kernel:

Explain the boot sequence to allow Linux to manage a computer
Identify its various building blocks
Show how these building blocks interact with each other
Relate actual implementation of various kernel subsystem to concepts explained in lecture since the beginning of the semester

Introduction

Linux source is available for public download from Linus Torvalds' GitHub page.

TIP

Software developers use the following convention of semantic versioning: MAJOR.MINOR.PATCH-REV. For instance 5.9.11 can be parsed into the following version numbers:

Major version 5
Minor version 9
Patch version 11
Optional revision number

Specifically for Linux kernel, the optional revision numbers indicate the release candidate (such as 5.10-rc5)

Linux Kernel Cross Reference

Reading plain C files for a gargantuan project linux the Linux kernel directly from Torvald's GitHub is definitely overwhelming. To help you navigate into different sections of the kernel, use the annotated kernel source code provided by Bootlin. It provides the following nice navigational features:

Search box: for searching function name, variable name, C structure, or any plain text
Search results that show where the name is declared and used
Kernel version selection as far back as version 0.01 (the first release of Linux)

Elixir Cross Ref

The following graph shows the growth of the kernel source since its inception (version 0.01).

Linux Kernel Size

The dark blue bars (the first/bottommost) show the growth of the core kernel logic which indicates the it has stayed stable for a while
The light gray bars (the third from bottom) show the growth of the device drivers which seem to outgrow the other components.

Quick Review of Intel Assembly

The Intel architecture provides the following basic registers.

Four general purpose registers: A, B, C, D with four different variant depending on their length:

On the older 16-bit models these registers are symbolically referred as: AH (higher 8-bit), AL (lower 8-bit), AX (the entire 16-bit)
In 32-bit models, additional mnemonics EAX, EBX, ECX, EDX refer to the entire 32-bit value
Likewise, 64-bit models use RAX, RBX, RCX, and RDX to refer to the entire 64-bit value

+-------------------------------------------------------------+
|                      64-bit RAX                             |
+-------------------------------------------------------------+
                            +---------------------------------+
                            |           32-bit EAX            |
                            +---------------------------------+
                                            +-----------------+
                                            |     16-bit AX   |
                                            +-----------------+
                                            +-----------------+
                                            |8-bit AH|8-bit AL|
                                            +-----------------+

Four segment registers: CS (Code Segment), DS (Data Segment), ES (Extra Segment), and SS (Segment Register).
Stack pointer register: SP (top of stack) and BP (bottom of stack)
Several control registers: CR0, CR1, CR2, and so on

Newer models may provide more registers beyond those listed above.

Parameter passing among assembly routines generally uses the following convention:

Parameter passing between assembly and C functions are done via the stack and return value from a function is usually stored in the register AX. Hence, the following code pattern is quite common:

asm

# Push the actual function arguments
push ___
push ___
push ___
call _some_function_with_three_args
# check the value in AX

Fundamental Building Blocks

Navigating through a gargantuan amount of code ( $>$ 27 million lines of code as of 2020) would be a challenging task. To understand the major building blocks of the Linux kernel, it would be easier to browse the code of the first release (version 0.01) written in 1991 when kernel is organized into a small number of components:

Directory	Description	Number of files
`boot`	The boot code	2
`fs`	File system	20
`include`	Header files	30
`init`	The kernel "main" function	1
`kernel`	Various elements of the kernel	17
`lib`	Basic system calls	11
`mm`	Memory Management	2

The Boot Sequence

The comment at the top of boot/boot.s explains what happens at boot time. The code in this file is the first to execute at boot time. Be aware that at this time the boot code interacts with a bare hardware. Hence, any interrupt calls (INT 0x___) will invoke one of the BIOS functions.

Several important initialization steps performed by boot.s:

Load the actual boot code and kernel code from a hard drive or floppy drive
Initialize the Programmable Interrupt Controller chip (8259)

How does the hardware know where to run the code in boot.s?
The first version of Linux was designed to run only on Intel CPU and the comment in boot.s refers to "protected mode". Do your own research to about Intel "protected mode". What is the counterpart of "protected mode"?
How are the two CPU modes above differ from "user mode"/"kernel mode"?
The actual assembly instruction of the boot code begins at the label start. Subsequent instructions refers to INT 0x10 and two CALL instructions.
- Which BIOS routine was invoked by INT 0x10?
- Which tasks are invoked by the two CALLs?
At what memory address is the Linux Kernel (not the boot code) reside in RAM?

Browse through the rest of the assembly code in boot.s until you find a section about configuring the interrupts. This particular section refers to a particular Intel chip 8259.

What is the main role of this chip?
In lecture it was mentioned that an operating system replaces (basic) functionalities provided by BIOS. How does Linux achieve this objective?

Thereafter, the code in boot/head.s will get loaded to RAM and perform the following tasks:

Initialize the Global Descriptor Table (GDT) akin to Segment Table described in lecture
Initialize the Interrupt Descriptor Table (IDT), also known as Interrupt Vector Table
Initialize paging, page tables (inside the assembly routine setup_paging)
Call the main function (defined in init/main.c). GNU assembler automatically adds a _ prefix to C function names. So main in a C program is known as _main in assembly.

In lecture it was mentioned that the address of the page table of each process is stored in the Page Table Base Register (PTBR). Which control register in the Intel architecture plays the role of PTBR?

The kernel `main()` function

TIP

A basic understanding of CPU exceptions will help in answering some of the questions in this section.

The kernel main function is possibly easier to understand, because it is all written in C. The main function calls a number of functions but the following ones are of special interest to us:

Function	Defined in
trap_init	kernel/traps.c
sched_init	kernel/sched.c
move_to_user_mode	include/asm/system.h
fork	kernel/fork.c
init	init/main.c

Inspect the implementation of trap_init where you will find the code to initialize traps for 32 different "gates".
- Are these interrupts generated internally by the CPU or initiated externally outside the CPU?
- Which descriptor table is affected by each trap setting?
Inspect the implementation of sched_init where you will find the following function calls:
c
```
set_intr_gate(0x20, &timer_interrupt);

set_system_gate(0x80, &system_call);
```
1
2
3
Explain the importance of these two calls and the number 0x80 in the second call. Hint: recall the implementation of Linux system call discussed in Chapter 2.
TIP
To provide you with more context, both timer_interrupt and system_call are defined as assembly routines in kernel/system_call.s.

Inspect the assembly routine timer_interrupt. In addition to saving the CPU registers and communicating with the 8259 chip, the routine makes two interesting calls (or jump):

asm

call _do_timer

jmp ret_from_sys_call

The function do_timer is invoked from the above assembly routine with the value CPL passed in the EAX register. What are the two possible values of CPL and what each value means?
Now inspect the C implementation of do_timer in kernel/sched.c. You will found the following if-statement
c
```
if (cpl)
    current->utime++;
else
    current->stime++;
```
1
2
3
4
Speculate the meaning of utime and stime. (Recall that in C non-zero value of any types is equivalent to boolean true, and consequently zero is treated as boolean false).
At the end of the function you will find the following snippet:
c
```
if (!cpl) return;
schedule();
```
1
2
What value of cpl will cause the schedule() function not called? Based on your answer to 11 and 12 above, explain the intent of the above if-statement in English (not in C)
Based on the observations you made in the last few questions above, and without looking at the actual implementation of the scheduling algorithm implemented by Linux, is the scheduling algorithm pre-emptive or non-preemptive? Justify your answer!

TIP

The Intel architecture has two assembly instructions for "return from a function call"

ret: return from a regular function
iret: return from an interrupt handling function

System Calls

Entry addresses to system calls are compiled into a "dispatch table" (an array) sys_call_table defined in include/linux/sys.h. This table is referenced in an assembly routine _system_call defined in kernel/system_call.s. The value in the EAX register determines which system call was invoked (out of 67 system calls)

It was mentioned in class that the CPU scheduler logic is invoked at the end of interrupt handling or system calls. Show the snippet of code in _system_call assembly routine which can be used as a proof for the latter case?
Inspect the assembly code at the entry point ret_from_system_call in system_call.s. What task is being handled by the assembly block between lines 83-100?

Important system calls related to process management are fork(), wait(), exit(), and execve(), Each of these system calls are handled by different C or assembly routines:

System Call	Handle By	Defined in	Additional Files
`fork()`	`_sys_fork`	`kernel/system_call.s`	`kernel/fork.c`
`wait()` or `waitpid()`	`sys_waitpid()`	`kernel/exit.c`
`exit()`	`sys_exit()` and `do_exit()`	`kernel/exit.c`
`execve()`	`_sys_execve` and `do_execve()`	`kernel/system_call.s`	`fs/exec.c`

In lecture it was mentioned that fork() creates a "twin" of a process. Which snippet of code supports this statement?
Why does the function do_exit() calls schedule()?
Why do you think the function do_execve() is defined as part of the Filesystem (under the directory "fs")?

Scheduler and Context Switching

The CPU scheduler is implemented as a C function schedule (in kernel/sched.c).

At the end of schedule(), the Linux scheduler calls switch_to to dispatch the "next" task. Based on the code at lines 87-103, what criterial used by the algorithm to pick the "next" task?

Objective ​

Introduction ​

Linux Kernel Cross Reference ​

Quick Review of Intel Assembly ​

Fundamental Building Blocks ​

The Boot Sequence ​

The kernel main() function ​

System Calls ​

Scheduler and Context Switching ​