Monday, October 25, 2004

Overview of the Linux kernel


This article discusses the basic features of the Linux kernel. It starts with the history of the Linux kernel, moves on to the basic operating system design concepts a kernel must implement and how Linux implements them in a nutshell, and concludes by promising future articles on the different layers of the Linux kernel, as well as articles on Linux kernel 2.6 features and the future of Linux, which will be contributed by different members of the lkg_india group.


Being a kernel newbie myself, I have tried to write this article so that it can be understood by anyone who has used a computer. It can also be read by experienced kernel developers to refresh their thoughts, and to point out and criticise its mistakes. This article is dedicated, free of charge, to the benefit and education of all: a person seeking knowledge should have the opportunity to find it. Thanks to every other document written in the same vein which has made this article possible.


To start off immediately: what does "kernel" mean? Well, that is what this blog is for: to help us understand what the kernel is. Hopefully future articles in this blog will make us understand the kernel completely and clearly. Here is the literal meaning of the word "kernel", straight from dict.org:

"The central, substantial or essential part of anything;the gist; the core;"

This is how the Online Dictionary of Computing defines the word kernel:

"The essential part of Unix or other operating systems, responsible for resource allocation,low-level hardware interfaces, security etc"

Operating System

Any computer system includes a basic set of programs called the operating system. The most important program in the set is called the kernel. The other programs are less crucial utilities; they can provide a wide variety of interactive experiences for the user, as well as doing all the jobs the user bought the computer for, but the essential shape and capabilities of the system are determined by the kernel.

Is kernel the entire operating system?

No, as discussed above, the kernel is only the core component of the operating system. The operating system contains the kernel plus other system utilities which use the kernel to provide higher-level housekeeping tasks. Technically speaking, Linux is only the kernel: it does not provide system utilities such as filesystem utilities, compilers, editors or a graphical user interface, which are provided with any other operating system. So Linux users typically rely on commercial distributions like SuSE, Red Hat etc. for the entire operating system. Having seen what an operating system and a kernel are, let us learn something about the history of the Linux kernel.

Linux, the revolutionary open source kernel

Linux was initially developed by Linus Torvalds in August 1991. As a source of inspiration, listed below is his mail on the famous comp.os.minix newsgroup:

"----- Message from "Linus Benedict Torvalds"
on Mon, 26 Aug 1991 02:27:08 +0530 -----

Subject: What would you like to see most in minix?

Hello everybody out there using minix-

I'm doing a (free) operating system (just a hobby, won't be big
and professional like gnu) for 386(486) AT clones. This has
been brewing since april, and is starting to get ready. I'd like
any feedback on things people like/dislike in minix; as my OS
resembles it somewhat (same physical layout of the file-system
due to practical reasons) among other things.

I've currently ported bash (1.08) and gcc (1.40), and things seem to work.
This implies that i'll get something practical within a few months, and I'd
like to know what features most people want. Any suggestions are welcome,
but I won't promise I'll implement them :-)

Linus Torvalds torvalds@kruuna.helsinki.fi"

As seen from the mail, there was already an effort by the FSF (Free Software Foundation), headed by Richard Stallman, to build a complete free professional operating system. So let us talk about the FSF before proceeding further.


The FSF was founded by Richard Stallman in 1984. It is also known for the GNU software project, which was launched to build a complete free operating system. When Richard started working on a free OS, he felt that he should use a free editor to write its programs, so he wrote the GNU Emacs editor. Then a free compiler was needed to compile C programs, and thereby was born the GNU Compiler Collection (gcc). Later, when Linus released the Linux kernel freely and it became popular, it was adopted as the kernel that completed the GNU system. So GNU utilities are used with the Linux kernel to make a complete operating system. Today Linux remains one of the most popular open source operating systems in the world.

Features of the Linux Kernel

I.Monolithic kernel with module support

The Linux kernel is monolithic with module support. So now let us try to decipher the meaning of the previous sentence, and why Linux adopted the monolithic-kernel-with-modules strategy.

Monolithic kernel
A monolithic kernel is a single large, complex "do-it-all" kernel program which is composed of several different logical entities (kernel layers). Each kernel layer is integrated into the large kernel program and runs in kernel mode (more on kernel mode later) on behalf of the current process.


Microkernel
A microkernel consists of a small set of essential functions in the kernel, generally a simple scheduler and synchronisation primitives. Several system processes run on top of the kernel to implement the other OS-layer functions such as memory allocators, device drivers, system calls, filesystems etc. These layers coordinate by passing messages between them, and because of this message passing a microkernel is slower than a monolithic kernel.

Monolithic Vs Microkernel

A monolithic kernel is faster than a microkernel, as stated above. However, a microkernel has some theoretical advantages over the monolithic kernel. They are as follows:

1.The microkernel occupies less RAM, since system processes (as discussed above) that are not performing their functions are swapped out or destroyed.

2.The architecture of the microkernel forces programmers to adopt a modularised approach, as the different layers of the kernel are independent of each other. Moreover, the different layers interact with each other through clear, well-defined software interfaces.

3.An existing microkernel operating system can be fairly easily ported to other architectures, since all hardware-dependent components are generally encapsulated in the microkernel code.

Modules, the Linux way

Modules are a kernel feature that effectively achieves many of the theoretical advantages of microkernels without introducing performance penalties. A module is an object file whose code can be linked to (and unlinked from) the kernel at runtime. The object code usually consists of a set of functions that implements a filesystem, a device driver, or other features at the kernel's upper layer. The module, unlike the external layers of microkernel operating systems, does not run as a specific process. Instead, it is executed in Kernel Mode on behalf of the current process, like any other statically linked kernel function.

Advantages provided by a monolithic kernel with modules

1.Less main memory usage
A module is linked into the kernel when its functionality is needed and unlinked when it is no longer used. This mechanism is handled automatically by the kernel and is transparent to the user.

2.Modularised approach
Modules force programmers to introduce well-defined software interfaces for interaction.

3.Platform independence
A module does not depend on a fixed hardware platform. A module such as a device driver is in fact specific to the device, but not to the hardware platform (x86, SPARC etc.).

Since a module is linked into the kernel, it runs as fast as the rest of the monolithic kernel: there is no message passing as in a microkernel. In fact a small amount of time is lost in linking and unlinking modules, but this is less than the time required for message passing in microkernels.

II.Linux Filesystem

Files are a basic abstraction provided by the operating system, along with processes (more about processes later). A file is an information container structured as a sequence of bytes. From the user's point of view, files are organised in a tree-structured name space starting from /, which is called the root directory.

Different file types in Linux

The beauty of Linux lies in the fact that it treats almost everything as a file, including devices. There are the following types of files:

1. Regular files
2. Directory files
3. Symbolic links
4. Device files(character,block).
5. Pipes
6. Sockets

The first three types are constituents of the Linux filesystem. Device files are related to I/O devices and the device drivers integrated into the kernel. Pipes and sockets are special files used for interprocess communication.


All the information needed by the filesystem to handle a file is included in a data structure called the inode. Each file has its own inode, which the filesystem uses to identify the file. The following information is kept in the inode:
1.File type (as discussed above)
2.Number of hard links associated with the file (see next section)
3.File length in bytes
4.Inode number that identifies the file within the filesystem
5.User ID of the file owner (discussed below)
6.Group ID of the file (discussed below)
7.The last modification time
8.Access rights and file mode (discussed below)
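The inode fields above can be inspected from user space through the stat() system call. Below is a minimal sketch in Python, whose os.stat() wraps that system call; the helper name inode_info is our own invention, not a kernel interface:

```python
import os
import stat
import tempfile

def inode_info(path):
    """Collect the inode fields discussed above via the stat() system call."""
    st = os.stat(path)
    return {
        "is_regular": stat.S_ISREG(st.st_mode),   # 1. file type
        "hard_links": st.st_nlink,                # 2. number of hard links
        "size_bytes": st.st_size,                 # 3. file length in bytes
        "inode":      st.st_ino,                  # 4. inode number
        "uid":        st.st_uid,                  # 5. user ID of the owner
        "gid":        st.st_gid,                  # 6. group ID
        "mtime":      st.st_mtime,                # 7. last modification time
        "mode":       stat.S_IMODE(st.st_mode),   # 8. access rights and mode
    }

# Demonstrate on a freshly created 5-byte file.
fd, name = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)
info = inode_info(name)
os.unlink(name)
print(info)
```

Every field here comes straight out of the kernel's copy of the inode for that file.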

Hard links and symbolic links

The same file may have several links, included in the same directory or in different ones, and thus several filenames.


A hard link is just a different filename that points to the same inode on the disk. So the file information of both the file and the hard link will coincide, as they point to the same inode. The Unix command used to create a hard link is
$ln f1 f2

f2 is a hard link to the file f1

Limitations of hard links

1.Hard links cannot be made to directories, as that might transform the tree structure into a graph with cycles, thus making it impossible to locate a file according to its name.

2.Hard links can be created only for files in the same filesystem, as the inode is the same. This is a serious limitation, as Linux supports various other filesystems.


Symbolic links are short files that contain an arbitrary pathname of another file. The pathname may refer to a file located in any filesystem; it may even refer to a nonexistent file. Since a symbolic link merely stores the pathname of the file it links to, it has a separate inode of its own. The Unix command to create a symbolic link is
$ln -s f1 f2

f2 is a symbolic link to f1. The Linux filesystem will create the symlink f2 and write into it the pathname of f1; in this way f2 can refer to f1.
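The difference can be observed directly: a hard link shares the original file's inode, while a symbolic link gets an inode of its own and merely stores the target's pathname. A small sketch using Python's wrappers around the link(), symlink() and lstat() system calls (the temporary filenames are arbitrary):

```python
import os
import tempfile

d = tempfile.mkdtemp()
f1 = os.path.join(d, "f1")
with open(f1, "w") as f:
    f.write("data")

hard = os.path.join(d, "f2_hard")
soft = os.path.join(d, "f2_soft")
os.link(f1, hard)       # like $ln f1 f2_hard
os.symlink(f1, soft)    # like $ln -s f1 f2_soft

same_inode  = os.stat(f1).st_ino == os.stat(hard).st_ino   # hard link: same inode
own_inode   = os.lstat(soft).st_ino != os.stat(f1).st_ino  # symlink: its own inode
stored_path = os.readlink(soft)                            # the symlink's body is a pathname
link_count  = os.stat(f1).st_nlink                         # creating the hard link raised this to 2
print(same_inode, own_inode, stored_path, link_count)
```

Note that lstat() must be used on the symlink itself, since stat() follows the link to its target.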

Users and Groups

A multiuser system is a computer that is able to concurrently and independently execute several applications belonging to two or more users. "Concurrently" means that applications can be active at the same time and contend for the various resources such as CPU, memory, hard disks, and so on. "Independently" means that each application can perform its task with no concern for what the applications of the other users are doing.

In a multiuser system, each user has a private space on the machine: typically, he owns some quota of the disk space to store files, receives private mail messages, and so on. The operating system must ensure that the private portion of a user space is visible only to its owner. In particular, it must ensure that no user can exploit a system application for the purpose of violating the private space of another user.

All users are identified by a unique number called the User ID, or UID. Usually only a restricted number of persons are allowed to make use of a computer system. When one of these users starts a working session, the operating system asks for a login name and a password. If the user does not input a valid pair, the system denies access. Since the password is assumed to be private, the user's privacy is maintained. In order to selectively share material with other users, each user is a member of one or more groups, which are identified by a unique number called a Group ID, or GID. Each file is associated with a UID and a GID.

Access rights and mode

The users of a file fall into one of these three classes:

1.The owner of the file
2.The users who belong to the same group as the file
3.All other users (others) in the system

There are three types of access rights for each of these classes, namely read, write and execute. Thus the set of access rights associated with a file in Linux consists of nine flags (3 classes of users * 3 access rights). There are three additional flags which define the file mode and have a meaning when applied to executable files. They are:
1.SUID flag

If the executable file has the SUID flag set, the process gets the UID of the file owner.

2.SGID flag

If the executable file has the SGID flag set, the process gets the GID of the file group.

SUID and SGID programs are important programs to protect in a secure system, as they change the process's IDs during execution. An intruder might use such programs to execute something malicious with the privileges of the SUID or SGID owner.

3.Sticky flag

An executable file with the sticky flag set corresponds to a request to the kernel to keep the program in memory after its execution terminates. This flag was set when many processes shared the same program: for example, vi might be used by many processes, so vi could have its sticky bit set. This flag has become obsolete, as other approaches like copy-on-write are now used for sharing code pages between processes.

chmod is the Unix command used to set the access and mode flags of a file.

$chmod 4777 f1

Here 4777 is an octal number; each octal digit expands to three bits, so 4777 corresponds to the 12 bits 100 111 111 111. These bits map to the mode and the access permissions for the three user classes as follows:

100---sets the SUID flag for f1 (the three mode bits are SUID, SGID and sticky)
111---gives read, write and execute access to the owner of f1
111---gives read, write and execute access to the other users of the file's group
111---gives read, write and execute access to all remaining users in the system
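The same decomposition can be checked programmatically. Python's os.chmod() wraps the chmod() system call; 0o4777 below is simply the octal literal for the 4777 above (a sketch; on some restrictive mounts the SUID bit may be treated specially, but plain chmod by the owner sets it as described):

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

os.chmod(path, 0o4777)                  # like $chmod 4777 f1
mode = stat.S_IMODE(os.stat(path).st_mode)

suid_set   = bool(mode & stat.S_ISUID)  # the leading 4 (binary 100)
owner_rwx  = (mode >> 6) & 0o7          # first 7: owner's read/write/execute bits
group_rwx  = (mode >> 3) & 0o7          # second 7: group's bits
others_rwx = mode & 0o7                 # third 7: everyone else's bits
os.unlink(path)
print(oct(mode), suid_set, owner_rwx, group_rwx, others_rwx)
```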

Virtual File System technology

Linux follows an object-oriented Virtual File System (VFS) technology inspired by SVR4 and Solaris. Through it, Linux supports most common filesystems, such as the MS-DOS FAT filesystem, ext3, ReiserFS, JFS etc. Porting a filesystem to Linux is also a fairly easy task because of the VFS layer.

III.Processes in Linux

A process is another fundamental abstraction (apart from the file) provided by the OS. Having seen an overview of the filesystem in Linux, let us see how Linux handles processes.

A process can be defined either as "an instance of a program in execution" or as the "execution context" of a running program. In traditional operating systems, a process executes a single sequence of instructions in an address space; the address space is the set of memory addresses that the process is allowed to reference. Operating systems like Linux allow multiple execution flows, that is, multiple sequences of instructions executed in the same address space.

Multiuser systems must allow different processes to be active concurrently and thereby contend for the resources, mainly the CPU. Such systems are said to be multitasking systems. The scheduler is the part of the kernel which decides which process will run on the CPU at a given instant of time. It is done in such a manner that every process feels it is the only process running on the CPU; this concept is called virtualisation of the processor. Another virtualisation provided by Linux (or any modern OS) to processes is virtual memory, wherein a process feels that it has the entire memory of the system available to it. Processes in a multiuser system must be preemptable, i.e., the scheduler of the kernel decides how long each process can hold the CPU. Thereby, if a higher-priority process becomes runnable, the scheduler will preempt the lower-priority running process.

Linux is a multiprocessing system with preemptive processes.

Process management in Linux

The fork( ) and exit( ) system calls are used respectively to create a new process and to terminate it. Linux maintains a clear distinction between the process and the program by using an exec( )-like system call to load a new program. After the exec( ) has been done, the process resumes execution with a brand new address space containing the loaded program.

Process Creation
Processes in Linux follow a parent-child relationship. The process which invokes fork( ) is the parent, and the process that is created is the child. The init process (created by the init code of the kernel) is the root parent of all the other processes in Linux. The task_struct is the data structure in the kernel that describes a process. Parents and children can easily find each other, as this data structure contains information about the relationship. Linux implements fork( ) using the copy-on-write approach, which defers the duplication of the parent's address space into the child. A page is copied only when a write is made into it; until a write happens, the parent and the child share the same physical page frames, each through its own page tables. Copy-on-write has the following advantage:

1.Most instances of fork( ) are followed by an exec( ) of a new program, as the child typically performs a different function from the parent. It would therefore be a waste (an overhead) to copy the whole address space of the parent to the child only to immediately overwrite it. This overhead is avoided by the copy-on-write approach.
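The fork()/exec()/wait() cycle can be sketched from user space with Python's thin wrappers over the same system calls (the exit code 42 is an arbitrary choice for illustration):

```python
import os

pid = os.fork()                  # duplicate the current process (copy-on-write)
if pid == 0:
    # Child: this is where an exec()-like call would normally load a new
    # program into a brand new address space, e.g. os.execvp("ls", ["ls"]).
    os._exit(42)                 # terminate immediately with status 42

# Parent: wait() blocks until a child terminates and returns its PID.
child_pid, status = os.waitpid(pid, 0)
child_status = os.WEXITSTATUS(status)
print(child_pid == pid, child_status)
```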

Process Termination
The exit( ) system call is used to terminate a process. The kernel handles this system call by releasing the resources owned by the process and sending the parent process a SIGCHLD signal, which is ignored by default.

Zombie processes
The parent enquires about the termination of a child process through the wait( ) system call, which allows a process to wait until one of its children terminates; it returns the process ID (PID) of the terminated child. A special zombie process state is introduced to represent terminated processes on which the parent has not yet issued a wait( ). The process descriptor of the child is released only after the parent executes wait( ), at which point the zombie finally disappears. Now the question arises: if the parent terminates without ever issuing a wait( ), what happens to the children left in the zombie state? There would be many zombies occupying useful memory. The solution lies in the init process, which takes over as the parent of any child whose parent has terminated. It routinely issues wait( ) calls, thereby getting rid of the zombies.
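On Linux the zombie state is directly visible in /proc: between the child's exit and the parent's wait(), the child's state field reads "Z", and after the wait() its /proc entry disappears. A small sketch (the short sleep is a crude way to let the child exit first; a real program would react to SIGCHLD instead):

```python
import os
import time

def proc_state(pid):
    """Return the one-letter state field from /proc/<pid>/stat."""
    with open("/proc/%d/stat" % pid) as f:
        data = f.read()
    # The state is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]

pid = os.fork()
if pid == 0:
    os._exit(0)                      # child terminates immediately

time.sleep(0.2)                      # crude: give the child time to exit
state_before_wait = proc_state(pid)  # "Z": terminated but not yet waited on

os.waitpid(pid, 0)                   # parent reaps the child...
gone_after_wait = not os.path.exists("/proc/%d" % pid)  # ...and the zombie is gone
print(state_before_wait, gone_after_wait)
```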

IV.The process/Kernel model

Having learnt about the kernel architecture and the basic abstractions (file, process), let us now see how the Linux kernel is modelled. On multiuser/multitasking systems, the operating system must hide all low-level details concerning the physical organisation of the computer from the applications run by the user. So when a user application wants to access some hardware, it requests the kernel, which evaluates the request and, if it chooses to grant access, interacts with the hardware on behalf of the process. This enhances security on multiuser systems, since it prevents user applications from damaging the system's hardware resources. This is probably why DOS does not require such a model: it is a single-user system which allows the user to do anything with the system's hardware.

Well, how does Linux implement the above-mentioned feature? All modern operating systems implement it with the help of hardware-specific features which forbid user programs from interacting directly with the hardware. For example, the x86 CPU provides four rings of execution, from ring 0 to ring 3, in order of descending privilege. Linux uses only two of these rings, ring 0 and ring 3: user-mode applications run in ring 3, and the kernel runs in ring 0. These rings therefore correspond to the User Mode and Kernel Mode concepts of the Linux operating system.

Note: Intel introduced 32-bit protected mode with paging starting from the 80386, so Linux was developed for the 80386 and above (commonly referred to as the i386/i686 architectures).

When a program is executed in User Mode, it cannot directly access the kernel data structures or the kernel programs. When an application executes in Kernel Mode, however, these restrictions no longer apply. Each CPU model provides special instructions to switch from User Mode to Kernel Mode and vice versa. A program executes most of the time in User Mode and switches to Kernel Mode only when requesting a service provided by the kernel. When the kernel has satisfied the program's request, it puts the program back in User Mode.

The kernel is a process manager, not a process by itself. The creation, deletion and handling of processes are done by a group of routines in the kernel. The process/kernel model in Linux allows a process to request a kernel service through system calls. A system call takes the process from user mode to kernel mode, performs the requested service and puts the process back in user mode.

A switch from User Mode to Kernel Mode happens in any one of these ways:

1.Through a system call issued by a process, as discussed above.
2.The CPU raises an exception. The exception has to be handled by the kernel on behalf of the process that caused it; for example, the kernel must handle the page fault exception.
3.A device issues an interrupt to the CPU to notify it of an event. In this case the kernel executes the corresponding interrupt handler for the device.
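For instance, an ordinary library call like getpid() is just a thin user-mode wrapper that switches into kernel mode through the system call mechanism and back. Calling the C library directly and calling Python's own wrapper reach the same kernel service (a sketch; ctypes.CDLL(None) gives a handle to the libc already loaded into the process):

```python
import ctypes
import os

libc = ctypes.CDLL(None)      # the C library already linked into this process

# Both of these end up executing the getpid system call in kernel mode
# on behalf of the current process, then return to user mode.
pid_via_libc   = libc.getpid()
pid_via_python = os.getpid()
print(pid_via_libc, pid_via_python)
```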

Linux also has some privileged kernel-level processes, called kernel threads. Linux uses kernel threads in a limited way for certain functionalities; they execute only in kernel mode. keventd is an example of a kernel thread.

V.Kernel Synchronisation

On multiuser/multitasking systems, many processes must be handled by the kernel at a single instant of time, as several processes may be executing in the kernel at the same time. For example, a process executing on the CPU of a uniprocessor system might be waiting on some I/O. During this time the process in kernel mode is interleaved, and another process executes in the kernel. When the I/O completes, the process that was waiting for it is executed again. So the Linux kernel must be reentrant, i.e., multiple processes must be handled by the kernel while keeping the global data structures they use in a consistent state.

Before proceeding further, we will see the definition of a kernel control path: a kernel control path is a sequence of instructions executed in kernel mode on behalf of a process or on behalf of an interrupt.

How can Kernel reentrancy be achieved?

The following are the methods to achieve kernel reentrancy

1.Reentrant functions
2.Atomic operations
3.Non-preemptive kernel
4.Interrupt disabling
5.Locking mechanisms (semaphores and spin locks)

Reentrant functions
Reentrant functions are those functions which operate only on local variables and not on global data structures. But the kernel cannot be limited to reentrant functions only. Also, the kernel has a small, fixed-size stack, so local variables must be kept few.

Before looking at the other mechanisms for achieving reentrancy, let us see what a race condition is and what a critical region is.

Assume that there is a global data structure (a resource counter) R1. If R1 is 1, the resource is free. Now a kernel control path KCP1 reads the value of R1, which is 1, i.e. the resource is free. If KCP1 is now interleaved and KCP2 reads the value of R1, it is still 1, so KCP2 takes the resource and decrements the value to 0. When KCP1 resumes execution, it still believes the resource is free and decrements R1 to -1, thereby also taking the resource. So both KCPs end up using the resource, leading to dangerous effects. This is a race condition.

Atomic operations
It is always safe to access a global variable with a single atomic, uninterruptible instruction. For example, in the previous scenario, if each kernel control path had read and decremented R1 in a single atomic operation, there would have been no race condition.

But it is not always possible to access data structures in a single atomic operation. Any section of code that must be finished by a process before another process can enter it is called a critical region. The further mechanisms below show how to protect critical regions.

Non-preemptive kernel
A simple solution to synchronisation problems is to make the kernel non-preemptive, i.e., when a process is in kernel mode it cannot be interleaved by another process until it voluntarily relinquishes the CPU (in which case it makes sure that the data structures are left in a consistent state). Therefore, in a non-preemptive kernel, all global data structures except those used by interrupt and exception handlers are safe (since interleaving of a process in kernel mode happens only when an interrupt or exception occurs). Non-preemptability is ineffective on multiprocessor systems, as two kernel control paths executing on different CPUs can access the same data structure. The Linux kernel was non-preemptive until the 2.5 development series; from the 2.6 stable series onwards, the kernel is preemptible. Preemptible kernels are better suited for time-critical, real-time processes.

Interrupt disabling
Another method to achieve synchronisation is to disable interrupts before entering a critical region and enable them after leaving it. This is a simple solution but is not optimal: a large critical region will keep hardware interrupts frozen for a long time, potentially freezing the system. It is also ineffective on multiprocessor systems, as the data structure might be accessed by another process running on a different CPU.

Locking mechanisms
The locking mechanisms lock the global data structure in question, thereby making sure that while the lock is held only a single process can access the data structure. They are effective on both uniprocessor and multiprocessor systems. The locking mechanisms used by the Linux kernel are

1.Semaphores
2.Spin locks

A semaphore is simply a counter associated with a data structure; the semaphore is checked by all kernel control paths before they try to access the data structure. Each semaphore may be viewed as an object composed of:

a)An integer variable
b)A list of waiting processes
c)Two atomic methods: down() and up()

The down() method decrements the value of the semaphore. If the new value is less than 0, the method adds the running process to the semaphore list and then blocks (i.e., invokes the scheduler). The up() method increments the value of the semaphore and, if its new value is greater than or equal to 0, reactivates one or more processes in the semaphore list. Each data structure to be protected has its own semaphore, which is initialized to 1. When a kernel control path wishes to access the data structure, it executes the down() method on the proper semaphore. If the value of the new semaphore isn't negative, access to the data structure is granted. Otherwise, the process that is executing the kernel control path is added to the semaphore list and blocked. When another process executes the up() method on that semaphore, one of the processes in the semaphore list is allowed to proceed.
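The down()/up() discipline above maps directly onto a counting semaphore initialised to 1. A user-space sketch with Python's threading.Semaphore, whose acquire() and release() behave like down() and up(); the read-yield-write pattern deliberately invites the race described earlier, and the semaphore prevents it:

```python
import threading
import time

sem = threading.Semaphore(1)     # initialised to 1: one control path inside at a time
shared = {"value": 0}            # the "global data structure" being protected

def control_path(iterations):
    for _ in range(iterations):
        sem.acquire()            # down(): block if another path holds the semaphore
        v = shared["value"]      # read the protected structure...
        time.sleep(0)            # ...yield the CPU, inviting interleaving...
        shared["value"] = v + 1  # ...write back: safe, since the semaphore is held
        sem.release()            # up(): wake one waiter, if any

paths = [threading.Thread(target=control_path, args=(1000,)) for _ in range(4)]
for t in paths:
    t.start()
for t in paths:
    t.join()
print(shared["value"])
```

Without the acquire()/release() pair, interleaved read-modify-write sequences could lose updates exactly as in the R1 example; with it, the final count is always 4 * 1000.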

In multiprocessor systems, semaphores are not always the best solution to the synchronization problems. Some kernel data structures should be protected from being concurrently accessed by kernel control paths that run on different CPUs. In this case, if the time required to update the data structure is short, a semaphore could be very inefficient. To check a semaphore, the kernel must insert a process in the semaphore list and then suspend it. Since both operations are relatively expensive, in the time it takes to complete them, the other kernel control path could have already released the semaphore. In these cases, multiprocessor operating systems make use of spin locks. A spin lock is very similar to a semaphore, but it has no process list: when a process finds the lock closed by another process, it "spins" around repeatedly, executing a tight instruction loop until the lock becomes open. Of course, spin locks are useless in a uniprocessor environment. When a kernel control path tries to access a locked data structure, it starts an endless loop. Therefore, the kernel control path that is updating the protected data structure would not have a chance to continue the execution and release the spin lock. The final result is that the system hangs.

Signals and interprocess communication
The Linux kernel implements signals for communicating events to processes. For example, SIGKILL is sent to a process to force it to terminate. The Linux kernel implements the 32 POSIX signals. User processes can communicate with each other with the help of System V IPC mechanisms such as shared memory, semaphores and message queues, as well as pipes and FIFOs.

VI.Memory Management in Linux

Memory management is the most complex (as it is architecture-dependent) and important activity in the kernel. Linux supports virtual memory management and uses paging to implement the virtual memory concept; it makes essentially no use of segmentation. Memory management will be described in detail in a later article. For now, the advantages virtual memory offers are:

1.Several processes can be executed concurrently.
2.It is possible to run applications whose memory needs are larger than the available physical memory.
3.Processes can execute a program whose code is only partially loaded in memory.
4.Each process is allowed to access a subset of the available physical memory.
5.Processes can share a single memory image of a library or program
6.Programs can be relocatable, that is, they can be placed anywhere in physical memory.
7.Programmers can write machine-independent code, since they do not need to be concerned about physical memory organization.


The usage of memory is one very important thing that has to be taken care of by the kernel. Linux clearly distinguishes between memory that is dedicated to the kernel and memory that can be used by processes. The static kernel image is loaded starting from the first megabyte of RAM and is pinned (it is never swapped or paged out). The remaining part of the RAM is used for:

1.Kernel dynamic data structures
2.Memory for processes
3.Disk caches etc., for better performance

How memory is allocated for the above three uses is very important and hence requires a separate article. In a few words, Linux takes care of memory allocation problems like internal fragmentation and external fragmentation by making use of the buddy system algorithm and the slab allocation mechanism: Linux uses a slab allocator on top of a buddy system.

Virtual address space of a process
Every process in Linux has a virtual address space (ranging from 0 to 4GB on a 32-bit Intel CPU). The address space of a process contains all the virtual memory addresses that the process can reference. The kernel usually stores a process's virtual address space as a list of memory area descriptors (for example, descriptors for the code segment, stack, heap etc.). Linux uses demand paging, i.e., a page frame is allocated only after a page fault happens.
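The memory area descriptors mentioned above grow, for example, whenever a process maps a new region. The mmap() system call (here via Python's mmap module) asks the kernel to add such a region; with demand paging, the page frames behind it are only allocated when the pages are first touched:

```python
import mmap

PAGE = mmap.PAGESIZE                 # typically 4096 bytes on x86

# Ask the kernel for an anonymous 4-page memory area. No physical frames
# need to be allocated yet: they are faulted in on first access.
region = mmap.mmap(-1, 4 * PAGE)

region[0] = 0x41                     # first touch: page fault, frame allocated
region[PAGE:PAGE + 5] = b"hello"     # touches the second page the same way

first_byte  = region[0]
second_page = bytes(region[PAGE:PAGE + 5])
region.close()
print(first_byte, second_page)
```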

VII.Device drivers

Having discussed most of the kernel subsystems, namely the filesystem, processes, synchronisation, interrupts, system calls and memory management, let us see device drivers in a nutshell. The kernel interacts with I/O devices with the help of device drivers. Device drivers are included in the kernel and consist of data structures and functions that control one or more devices, such as hard disks, keyboards, mice etc. Each driver interacts with the remaining part of the kernel (even with other drivers) through a specific interface. The device driver layer can thus be seen as the lowest layer in the kernel, interacting with the other layers through well-defined interfaces. This approach helps programmers to write device-specific code in a separate module without knowing the whole kernel source code or its internal architecture.

VIII.Linux implementation of threads

Finally, it is mandatory to discuss how Linux supports multithreaded application programming, since it does so in a unique manner. In Linux, threads are also processes, ones which share the address space of their parent process. Linux implements threads as processes because process creation time in Linux is much faster compared to other OSes. A thread is therefore a process which can be scheduled independently of the main process and which shares its address space.
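The contrast with fork() makes this concrete: a thread's writes are visible to its creator because they share one address space, while a forked child writes only to its own copy-on-write copy. A sketch with Python's threading and os.fork wrappers:

```python
import os
import threading

data = {"value": 0}

def worker():
    data["value"] = 42               # runs in the same address space as its creator

t = threading.Thread(target=worker)
t.start()
t.join()
thread_write_visible = data["value"] # 42: one shared address space

data["value"] = 0
pid = os.fork()
if pid == 0:
    data["value"] = 42               # child writes to its private copy-on-write page
    os._exit(0)
os.waitpid(pid, 0)
fork_write_visible = data["value"]   # still 0: the child had a separate address space
print(thread_write_visible, fork_write_visible)
```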

In summary, the Linux kernel is monolithic with module support and consists of the following subsystems:
1.Filesystem
2.Process management
3.Memory management
4.Synchronisation
5.System calls
6.Device drivers
7.Interrupts and exceptions
The explanation and deciphering of the different kernel layers will come in future articles contributed by the various members of lkg_india. I also promise two more articles which will be extensions of this article, namely
1.Linux kernel 2.6 features
2.Linux vs other operating systems and the future of Linux


Thanks for the time you have devoted to reading this article. Any suggestions/criticisms are welcome, and I kindly request readers to post a comment after reading. Negative feedback is very much welcome, as it stabilises the article.

Saravanan S(sar.kernel@gmail.com)