Objective
After completion of this assignment students are expected to acquire the following knowledge of the Linux kernel:
- Explain the boot sequence to allow Linux to manage a computer
- Identify its various building blocks
- Show how these building blocks interact with each other
- Relate actual implementation of various kernel subsystem to concepts explained in lecture since the beginning of the semester
Introduction
Linux source is available for public download from Linus Torvalds' GitHub page.
TIP
Software developers use the following convention of semantic versioning: MAJOR.MINOR.PATCH-REV
. For instance 5.9.11 can be parsed into the following version numbers:
- Major version 5
- Minor version 9
- Patch version 11
- Optional revision number
Specifically for Linux kernel, the optional revision numbers indicate the release candidate (such as 5.10-rc5)
Linux Kernel Cross Reference
Reading plain C files for a gargantuan project linux the Linux kernel directly from Torvald's GitHub is definitely overwhelming. To help you navigate into different sections of the kernel, use the annotated kernel source code provided by Bootlin. It provides the following nice navigational features:
- Search box: for searching function name, variable name, C structure, or any plain text
- Search results that show where the name is declared and used
- Kernel version selection as far back as version 0.01 (the first release of Linux)
The following graph shows the growth of the kernel source since its inception (version 0.01).
- The dark blue bars (the first/bottommost) show the growth of the core kernel logic which indicates the it has stayed stable for a while
- The light gray bars (the third from bottom) show the growth of the device drivers which seem to outgrow the other components.
Quick Review of Intel Assembly
The Intel architecture provides the following basic registers.
Four general purpose registers: A, B, C, D with four different variant depending on their length:
- On the older 16-bit models these registers are symbolically referred as: AH (higher 8-bit), AL (lower 8-bit), AX (the entire 16-bit)
- In 32-bit models, additional mnemonics EAX, EBX, ECX, EDX refer to the entire 32-bit value
- Likewise, 64-bit models use RAX, RBX, RCX, and RDX to refer to the entire 64-bit value
+-------------------------------------------------------------+ | 64-bit RAX | +-------------------------------------------------------------+ +---------------------------------+ | 32-bit EAX | +---------------------------------+ +-----------------+ | 16-bit AX | +-----------------+ +-----------------+ |8-bit AH|8-bit AL| +-----------------+
Four segment registers: CS (Code Segment), DS (Data Segment), ES (Extra Segment), and SS (Segment Register).
Stack pointer register: SP (top of stack) and BP (bottom of stack)
Several control registers: CR0, CR1, CR2, and so on
Newer models may provide more registers beyond those listed above.
Parameter passing among assembly routines generally uses the following convention:
- Register AX holds the value (or address) of the first arg
- Register BX holds the value (or address) of the second arg
- Register CX holds the value (or address) of the third arg
- Register DX holds the value (or address) of the fourth arg
Parameter passing between assembly and C functions are done via the stack and return value from a function is usually stored in the register AX. Hence, the following code pattern is quite common:
# Push the actual function arguments
push ___
push ___
push ___
call _some_function_with_three_args
# check the value in AX
Fundamental Building Blocks
Navigating through a gargantuan amount of code (
Directory | Description | Number of files |
---|---|---|
boot | The boot code | 2 |
fs | File system | 20 |
include | Header files | 30 |
init | The kernel "main" function | 1 |
kernel | Various elements of the kernel | 17 |
lib | Basic system calls | 11 |
mm | Memory Management | 2 |
The Boot Sequence
The comment at the top of boot/boot.s
explains what happens at boot time. The code in this file is the first to execute at boot time. Be aware that at this time the boot code interacts with a bare hardware. Hence, any interrupt calls (INT 0x___
) will invoke one of the BIOS functions.
Several important initialization steps performed by boot.s
:
- Load the actual boot code and kernel code from a hard drive or floppy drive
- Initialize the Programmable Interrupt Controller chip (8259)
How does the hardware know where to run the code in
boot.s
?The first version of Linux was designed to run only on Intel CPU and the comment in
boot.s
refers to "protected mode". Do your own research to about Intel "protected mode". What is the counterpart of "protected mode"?How are the two CPU modes above differ from "user mode"/"kernel mode"?
The actual assembly instruction of the boot code begins at the label
start
. Subsequent instructions refers toINT 0x10
and twoCALL
instructions.- Which BIOS routine was invoked by
INT 0x10
? - Which tasks are invoked by the two
CALL
s?
- Which BIOS routine was invoked by
At what memory address is the Linux Kernel (not the boot code) reside in RAM?
Browse through the rest of the assembly code in boot.s
until you find a section about configuring the interrupts. This particular section refers to a particular Intel chip 8259.
- What is the main role of this chip?
- In lecture it was mentioned that an operating system replaces (basic) functionalities provided by BIOS. How does Linux achieve this objective?
Thereafter, the code in boot/head.s
will get loaded to RAM and perform the following tasks:
- Initialize the Global Descriptor Table (GDT) akin to Segment Table described in lecture
- Initialize the Interrupt Descriptor Table (IDT), also known as Interrupt Vector Table
- Initialize paging, page tables (inside the assembly routine
setup_paging
) - Call the
main
function (defined ininit/main.c
). GNU assembler automatically adds a_
prefix to C function names. Somain
in a C program is known as_main
in assembly.
- In lecture it was mentioned that the address of the page table of each process is stored in the
Page Table Base Register
(PTBR). Which control register in the Intel architecture plays the role of PTBR?
The kernel main()
function
TIP
A basic understanding of CPU exceptions will help in answering some of the questions in this section.
The kernel main function is possibly easier to understand, because it is all written in C. The main function calls a number of functions but the following ones are of special interest to us:
Function | Defined in |
---|---|
trap_init | kernel/traps.c |
sched_init | kernel/sched.c |
move_to_user_mode | include/asm/system.h |
fork | kernel/fork.c |
init | init/main.c |
Inspect the implementation of
trap_init
where you will find the code to initialize traps for 32 different "gates".- Are these interrupts generated internally by the CPU or initiated externally outside the CPU?
- Which descriptor table is affected by each trap setting?
Inspect the implementation of
sched_init
where you will find the following function calls:cset_intr_gate(0x20, &timer_interrupt); set_system_gate(0x80, &system_call);
Explain the importance of these two calls and the number
0x80
in the second call. Hint: recall the implementation of Linux system call discussed in Chapter 2.TIP
To provide you with more context, both
timer_interrupt
andsystem_call
are defined as assembly routines inkernel/system_call.s
.
Inspect the assembly routine timer_interrupt
. In addition to saving the CPU registers and communicating with the 8259 chip, the routine makes two interesting calls (or jump):
call _do_timer
jmp ret_from_sys_call
The function
do_timer
is invoked from the above assembly routine with the value CPL passed in the EAX register. What are the two possible values ofCPL
and what each value means?Now inspect the C implementation of
do_timer
inkernel/sched.c
. You will found the following if-statementcif (cpl) current->utime++; else current->stime++;
Speculate the meaning of
utime
andstime
. (Recall that in C non-zero value of any types is equivalent to boolean true, and consequently zero is treated as boolean false).At the end of the function you will find the following snippet:
cif (!cpl) return; schedule();
What value of
cpl
will cause theschedule()
function not called? Based on your answer to 11 and 12 above, explain the intent of the above if-statement in English (not in C)Based on the observations you made in the last few questions above, and without looking at the actual implementation of the scheduling algorithm implemented by Linux, is the scheduling algorithm pre-emptive or non-preemptive? Justify your answer!
TIP
The Intel architecture has two assembly instructions for "return from a function call"
ret
: return from a regular functioniret
: return from an interrupt handling function
System Calls
Entry addresses to system calls are compiled into a "dispatch table" (an array) sys_call_table
defined in include/linux/sys.h
. This table is referenced in an assembly routine _system_call
defined in kernel/system_call.s
. The value in the EAX register determines which system call was invoked (out of 67 system calls)
- It was mentioned in class that the CPU scheduler logic is invoked at the end of interrupt handling or system calls. Show the snippet of code in
_system_call
assembly routine which can be used as a proof for the latter case? - Inspect the assembly code at the entry point
ret_from_system_call
insystem_call.s
. What task is being handled by the assembly block between lines 83-100?
Important system calls related to process management are fork()
, wait()
, exit()
, and execve()
, Each of these system calls are handled by different C or assembly routines:
System Call | Handle By | Defined in | Additional Files |
---|---|---|---|
fork() | _sys_fork | kernel/system_call.s | kernel/fork.c |
wait() or waitpid() | sys_waitpid() | kernel/exit.c | |
exit() | sys_exit() and do_exit() | kernel/exit.c | |
execve() | _sys_execve and do_execve() | kernel/system_call.s | fs/exec.c |
- In lecture it was mentioned that
fork()
creates a "twin" of a process. Which snippet of code supports this statement? - Why does the function
do_exit()
callsschedule()
? - Why do you think the function
do_execve()
is defined as part of the Filesystem (under the directory "fs")?
Scheduler and Context Switching
The CPU scheduler is implemented as a C function schedule
(in kernel/sched.c
).
- At the end of
schedule()
, the Linux scheduler callsswitch_to
to dispatch the "next" task. Based on the code at lines 87-103, what criterial used by the algorithm to pick the "next" task?