Sunday, January 09, 2005

Some fun with Linux Netfilter Hooks

|=------------=[ Some Tricks with Linux netfilter hooks ]=---------------=|

This document is based on my understanding of bioforge's "Hacking the Linux Kernel Network Stack" article in Phrack issue 61. Please do correct me if I am wrong somewhere. The source code provided here was compiled and tested on a 2.4.28 kernel.

Netfilter is a subsystem in the Linux 2.4 kernel. It provides a generic and abstract interface to the standard routing code. It is currently used in the Linux kernel for packet filtering, mangling, NAT (network address translation) and queuing packets to userspace. Netfilter makes connection tracking possible through the use of various hooks in the kernel's network code. These hooks are places where kernel code, either statically built or in the form of a loadable module, can register functions to be called for specific network events. An example of such an event is the reception of a packet.

Although Linux 2.4 supports hooks for IPv4, IPv6 and DECnet, only IPv4 will be discussed in this document.

Netfilter defines five hooks for IPv4. The declaration of the symbols for these can be found in "linux/netfilter_ipv4.h". These hooks are displayed in the table below:

Table 1: Available IPv4 hooks

Hook                 Called
NF_IP_PRE_ROUTING    After sanity checks, before routing decisions.
NF_IP_LOCAL_IN       After routing decisions if packet is for this host.
NF_IP_FORWARD        If the packet is destined for another interface.
NF_IP_LOCAL_OUT      For packets coming from local processes on their way out.
NF_IP_POST_ROUTING   Just before outbound packets "hit the wire".

The NF_IP_PRE_ROUTING hook is the first hook called after a packet has been received. This is the hook that the modules presented later will use. Yes, the other hooks are very useful as well, but for now we will focus only on NF_IP_PRE_ROUTING.

After hook functions have done whatever processing they need to do with a packet, they must return one of the predefined Netfilter return codes. These codes are:

Table 2: Netfilter return codes

Return Code   Meaning
NF_DROP       Discard the packet.
NF_ACCEPT     Keep the packet.
NF_STOLEN     Forget about the packet.
NF_QUEUE      Queue packet for userspace.
NF_REPEAT     Call this hook function again.

The NF_DROP return code means that this packet should be dropped completely and any resources allocated for it should be released. NF_ACCEPT tells Netfilter that so far the packet is still acceptable and that it should move to the next stage of the network stack. NF_STOLEN is an interesting one because it tells Netfilter to "forget" about the packet. What this tells Netfilter is that the hook function will take over processing of this packet from here and that Netfilter should drop all processing of it. This does not mean, however, that resources for the packet are released. The packet and its respective sk_buff structure are still valid; it's just that the hook function has taken ownership of the packet away from Netfilter. NF_QUEUE hands the packet over to be queued for userspace. NF_REPEAT requests that Netfilter call the hook function again.

Registration of a hook function is a very simple process that revolves around the nf_hook_ops structure, defined in "linux/netfilter.h". The definition of this structure is as follows:

struct nf_hook_ops {
        struct list_head list;

        /* User fills in from here down. */
        nf_hookfn *hook;
        int pf;
        int hooknum;
        /* Hooks are ordered in ascending priority. */
        int priority;
};

The list member of this structure is used to maintain the lists of Netfilter hooks and has no importance for hook registration as far as users are concerned. hook is a pointer to a nf_hookfn function. This is the function that will be called for the hook. nf_hookfn is defined in "linux/netfilter.h" as well. The pf field specifies a protocol family. Valid protocol families are available from "linux/socket.h", but for IPv4 we want to use PF_INET. The hooknum field specifies the particular hook to install this function for and is one of the values listed in Table 1. Finally, the priority field specifies where in the order of execution this hook function should be placed. For IPv4, acceptable values are defined in "linux/netfilter_ipv4.h" in the "nf_ip_hook_priorities" enumeration. For the purposes of the demonstration modules we will be using NF_IP_PRI_FIRST.
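To make the priority ordering and the return codes a bit more concrete, here is a small user-space toy model in C. It is only a sketch of the idea, not kernel code: every name in it (toy_register_hook, toy_run_hooks, the TOY_* verdicts) is made up for illustration, and it leaves out NF_QUEUE and NF_REPEAT. It assumes what the comment in the structure says: hooks are kept in ascending priority order, and a packet only reaches the next hook if the previous one accepted it.

```c
#include <stddef.h>

/* Toy verdicts, named after the Netfilter return codes (values are arbitrary). */
enum toy_verdict { TOY_DROP, TOY_ACCEPT, TOY_STOLEN };

typedef enum toy_verdict (*toy_hookfn)(const unsigned char *pkt, size_t len);

struct toy_hook {
    toy_hookfn fn;
    int priority;               /* lower value runs earlier */
};

#define TOY_MAX_HOOKS 8
static struct toy_hook toy_chain[TOY_MAX_HOOKS];
static size_t toy_count;

/* Insert a hook so that the chain stays sorted by ascending priority.
 * Returns 0 on success, -1 if the chain is full. */
int toy_register_hook(toy_hookfn fn, int priority)
{
    size_t i;

    if (toy_count == TOY_MAX_HOOKS)
        return -1;
    i = toy_count;
    while (i > 0 && toy_chain[i - 1].priority > priority) {
        toy_chain[i] = toy_chain[i - 1];    /* shift later hooks up */
        i--;
    }
    toy_chain[i].fn = fn;
    toy_chain[i].priority = priority;
    toy_count++;
    return 0;
}

/* Call the hooks in order; stop at the first verdict other than ACCEPT. */
enum toy_verdict toy_run_hooks(const unsigned char *pkt, size_t len)
{
    size_t i;

    for (i = 0; i < toy_count; i++) {
        enum toy_verdict v = toy_chain[i].fn(pkt, len);
        if (v != TOY_ACCEPT)
            return v;
    }
    return TOY_ACCEPT;
}

/* Three trivial sample hooks. */
enum toy_verdict toy_accept_all(const unsigned char *pkt, size_t len)
{
    (void)pkt; (void)len;
    return TOY_ACCEPT;
}

enum toy_verdict toy_drop_all(const unsigned char *pkt, size_t len)
{
    (void)pkt; (void)len;
    return TOY_DROP;
}

enum toy_verdict toy_steal_all(const unsigned char *pkt, size_t len)
{
    (void)pkt; (void)len;
    return TOY_STOLEN;
}
```

If toy_drop_all is registered with priority 10 and toy_steal_all with priority -5, toy_run_hooks() reports TOY_STOLEN, because the hook with the lower priority value sees the packet first. That is the reason the modules here ask for NF_IP_PRI_FIRST.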

Registration of a Netfilter hook requires using a nf_hook_ops structure with the nf_register_hook() function. nf_register_hook() takes the address of an "nf_hook_ops" structure and returns an integer value. However, if you actually look at the code for the nf_register_hook() function in "net/core/netfilter.c", you will notice that it only ever returns a value of zero. Provided below is example code that simply registers a function that will drop all packets that come in. This code will also show how the Netfilter return values are interpreted.

Listing 1. Registration of a Netfilter hook
/* Sample code to install a Netfilter hook function that will
 * drop all incoming packets. */

#define __KERNEL__
#define MODULE

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>

/* This is the structure we shall use to register our function */
static struct nf_hook_ops nfho;

/* This is the hook function itself */
unsigned int hook_func(unsigned int hooknum,
                       struct sk_buff **skb,
                       const struct net_device *in,
                       const struct net_device *out,
                       int (*okfn)(struct sk_buff *))
{
    return NF_DROP;           /* Drop ALL packets */
}

/* Initialisation routine */
int init_module()
{
    /* Fill in our hook structure */
    nfho.hook     = hook_func;         /* Handler function */
    nfho.hooknum  = NF_IP_PRE_ROUTING; /* First hook for IPv4 */
    nfho.pf       = PF_INET;
    nfho.priority = NF_IP_PRI_FIRST;   /* Make our function first */

    nf_register_hook(&nfho);           /* Install the hook */

    return 0;
}

/* Cleanup routine */
void cleanup_module()
{
    nf_unregister_hook(&nfho);         /* Remove the hook on unload */
}

Now it's time to start looking at what data gets passed into hook functions and how that data can be used to make filtering decisions. So let's look more closely at the prototype for nf_hookfn functions. The prototype is given in linux/netfilter.h as follows:

typedef unsigned int nf_hookfn(unsigned int hooknum,
struct sk_buff **skb,
const struct net_device *in,
const struct net_device *out,
int (*okfn)(struct sk_buff *));

The first argument to nf_hookfn functions is a value specifying one of the hook types given in Table 1. The second argument is more interesting. It is a pointer to a pointer to a sk_buff structure, the structure used by the network stack to describe packets. This structure is defined in "linux/skbuff.h".

Possibly the most useful fields of the sk_buff structure are the three unions that describe the transport header (e.g. UDP, TCP, ICMP, SPX), the network header (e.g. IPv4/6, IPX, RAW) and the link layer header (Ethernet or RAW). The names of these unions are h, nh and mac respectively. These unions contain pointers to several header structures, depending on what protocols are in use in a particular packet. One should note that the transport header and network header may very well point to the same location in memory. This is the case for TCP packets, where h and nh are both treated as pointers to IP header structures. This means that reading a value through skb->h.th, thinking it points to the TCP header, will give incorrect results, because skb->h.th will actually be pointing to the IP header, just like skb->nh.iph.

The two arguments that come after skb are pointers to net_device structures. net_device structures are what the Linux kernel uses to describe network interfaces of all sorts. The structure is defined in "linux/netdevice.h". The first of these structures, in, describes the interface the packet arrived on. Not surprisingly, the out structure describes the interface the packet is leaving on. It is important to realise that usually only one of these structures will be provided. For instance, in will only be provided for the NF_IP_PRE_ROUTING and NF_IP_LOCAL_IN hooks, and out will only be provided for the NF_IP_LOCAL_OUT and NF_IP_POST_ROUTING hooks. At this stage I haven't tested which of these structures are available for the NF_IP_FORWARD hook, but if you make sure the pointers are non-NULL before attempting to dereference them you should be fine.

Finally, the last item passed into a hook function is a function pointer called okfn that takes a pointer to a sk_buff structure as its only argument and returns an integer. I'm not too sure what this function does. Looking in "net/core/netfilter.c", there are two places where okfn is called: in the functions nf_hook_slow() and nf_reinject(), where it is called when a Netfilter hook returns NF_ACCEPT. If anybody has more information on okfn please let me know.

Now that we've looked at the most interesting and useful bits of information that our hook functions receive, it's time to look at how we can use that information to filter packets.

We will build a module which filters packets based on their TCP destination port. This is only a bit more fiddly than checking IP addresses because we need to create a pointer to the TCP header ourselves. Remember what was discussed earlier about transport headers and network headers? Getting a pointer to the TCP header is a simple matter of declaring a pointer to a struct tcphdr (defined in linux/tcp.h) and pointing it just after the IP header in our packet data. Perhaps an example would help. Listing 2 presents code to check if the destination TCP port of a packet matches some port we want to drop all packets for.

Listing 2. Checking the TCP destination port of a received packet
/* Sample code to install a Netfilter hook function that will
 * drop all incoming packets to a particular TCP destination port. */

#define __KERNEL__
#define MODULE

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/skbuff.h>
#include <linux/ip.h>                  /* For IP header */
#include <linux/tcp.h>                 /* For TCP header */
#include <linux/in.h>                  /* For IPPROTO_TCP */
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>

/* This is the structure we shall use to register our function */
static struct nf_hook_ops nfho;

/* Port number we want to drop packets going to, in network byte order */
unsigned char *deny_port = "\x00\x19";   /* port 25 */

/* This is the hook function itself */
unsigned int hook_func(unsigned int hooknum,
                       struct sk_buff **skb,
                       const struct net_device *in,
                       const struct net_device *out,
                       int (*okfn)(struct sk_buff *))
{
    struct sk_buff *sb = *skb;
    struct tcphdr *thead;

    /* We don't want any NULL pointers in the chain
     * to the IP header. */
    if (!sb) return NF_ACCEPT;
    if (!(sb->nh.iph)) return NF_ACCEPT;

    /* Be sure this is a TCP packet first */
    if (sb->nh.iph->protocol != IPPROTO_TCP) {
        return NF_ACCEPT;
    }

    /* The TCP header starts right after the IP header,
     * whose length is ihl 32-bit words. */
    thead = (struct tcphdr *)(sb->data +
                              (sb->nh.iph->ihl * 4));

    /* Now check the destination port */
    if ((thead->dest) == *(unsigned short *)deny_port) {
        return NF_DROP;
    }

    return NF_ACCEPT;
}

/* Initialisation routine */
int init_module()
{
    /* Fill in our hook structure */
    nfho.hook     = hook_func;         /* Handler function */
    nfho.hooknum  = NF_IP_PRE_ROUTING; /* First for IPv4 */
    nfho.pf       = PF_INET;
    nfho.priority = NF_IP_PRI_FIRST;   /* Make our func first */

    nf_register_hook(&nfho);           /* Install the hook */

    return 0;
}

/* Cleanup routine */
void cleanup_module()
{
    nf_unregister_hook(&nfho);         /* Remove the hook on unload */
}
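The header arithmetic in Listing 2 can be tried out in user space without loading a module. The sketch below is my own illustration, not part of the module: it works on a plain byte buffer instead of an sk_buff, and the function names (tcp_dest_port, dest_port_is) are hypothetical. It assumes the buffer starts at the IPv4 header, uses the header length field (ihl, counted in 32-bit words) to find the TCP header, and converts the port to host byte order with ntohs() before comparing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>          /* ntohs() */

#define PROTO_TCP 6             /* same value as IPPROTO_TCP */

/* If buf holds an IPv4 packet carrying TCP, store the TCP destination
 * port (host byte order) in *port and return 1. Otherwise return 0. */
int tcp_dest_port(const uint8_t *buf, size_t len, uint16_t *port)
{
    size_t ihl_bytes;
    uint16_t dest_nbo;          /* port as it sits on the wire */

    if (len < 20 || (buf[0] >> 4) != 4)
        return 0;               /* too short, or not IPv4 */

    ihl_bytes = (size_t)(buf[0] & 0x0F) * 4;
    if (ihl_bytes < 20 || len < ihl_bytes + 4)
        return 0;               /* bad header length or truncated TCP header */

    if (buf[9] != PROTO_TCP)
        return 0;               /* protocol field says this is not TCP */

    /* The destination port is bytes 2..3 of the TCP header. */
    memcpy(&dest_nbo, buf + ihl_bytes + 2, sizeof dest_nbo);
    *port = ntohs(dest_nbo);
    return 1;
}

/* Return 1 if the packet is TCP and its destination port equals
 * port_host_order (given in host byte order, e.g. 25). */
int dest_port_is(const uint8_t *buf, size_t len, uint16_t port_host_order)
{
    uint16_t p;

    return tcp_dest_port(buf, len, &p) && p == port_host_order;
}
```

Listing 2 does the comparison the other way round: it keeps the port it wants to block in network byte order ("\x00\x19") so that it can be compared directly with the raw field in the header. Either way works as long as both sides of the comparison use the same byte order.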

So, as you can see, using Netfilter hooks we can create our own custom filters. There are a lot of interesting things you can do with these hooks, for example:

1. Kernel level firewall (for an implementation, please check bioforge's article).
2. Kernel level sniffer.
3. Module to hide packets from libpcap so that user level sniffers can't see them (for more details, please check bioforge's article).
4. Kernel level backdoor daemon which will allow users to upload files and execute commands remotely. (Sounds like a cool trojan, doesn't it?)

The list is only limited by your imagination :). With the powers made available to a kernel level programmer, you can do whatever you want (just beware of those pretty nasty kernel faults :-P).

Happy experimenting with the Linux network stack.

Monday, January 03, 2005

Memory Management in Linux Kernel

Linux Memory Management: Chapter 02 : Segmentation
WRITTEN BY : Karthikeyan Raghuraman
REFERENCE : Understanding the Linux Kernel, O'Reilly Publications
DISCLAIMER : The content below is my understanding of Chapter 2 of the above mentioned book. Being a newbie myself, I do not guarantee the accuracy of the following information.
CONTACT : karthikeyan dot raghuraman at gmail dot com

The OS is not forced to keep track of physical memory all by itself. Today's microprocessors include hardware that makes memory management efficient as well as robust against programming errors.

**Memory Addresses
++This is specific to the more common x86 processors.++

There are three kinds of addresses in the x86 processors.
1. Logical Address : This is the SEGMENT:OFFSET representation of the actual address.
2. Linear Address : A single 32 bit integer that can address up to 4GB of space (2^32 bytes).
3. Physical Address : Used to address the memory cells in the memory chips. This is the address sent on the address bus.

The flow of address values when an address is given in Logical Address format:

 Logical  +-------------------+  Linear   +-------------+  Physical
 Address  |                   |  Address  |             |  Address
--------->| Segmentation Unit |---------->| Paging Unit |----------> Address Bus
          |                   |           |             |
          +-------------------+           +-------------+

**Address Translation in the x86 processors [1]
The address translation occurs in two different ways
1. REAL Mode.
2. PROTECTED Mode.
REAL Mode:
This is used mainly for backward compatibility of the processors.
The BIOS uses real mode addressing. A real mode address consists of a segment and an offset. The corresponding address is given by

address = segment * 16 + offset

Say you have the address FFFF:0001 (all values in hex),
then the address is FFFF1.
FFFF * 16 = FFFF0
FFFF0 + 0001 = FFFF1
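The same calculation as a tiny C function, just to make the arithmetic concrete. The function name real_mode_address is my own. Multiplying by 16 is the same as shifting left by 4 bits, which is why the segment value simply moves over by one hex digit.

```c
#include <stdint.h>

/* Real mode translation: segment * 16 + offset.
 * The result can be up to 21 bits wide, so return it in 32 bits. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

real_mode_address(0xFFFF, 0x0001) gives 0xFFFF1, matching the worked example. Note that different segment:offset pairs can name the same address: 07C0:0000 and 0000:7C00 both come out as 0x7C00.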

If you recall the x86 architecture, from the 80386 onwards the Global Descriptor Table/Local Descriptor Tables are used for translating addresses. But these tables and their registers are initialized when the computer is booted, and this is done in REAL mode.

**PROTECTED MODE Address translation
As most of the operations in the processor occur in this mode, we need to touch base with the basics of the processor a little bit before proceeding with the address translation.

* Segmentation Registers

We know that logical address consists of two parts. SEGMENT identifier:OFFSET

SEGMENT identifier : 16 bit field for SEGMENT SELECTOR.
OFFSET : 32 bit field.

To retrieve segment selectors quickly, the processor provides SEGMENT REGISTERS. There are 6 of these registers in total.
CS :Code Segment
DS :Data Segment
SS :oh no not the Reich army of Himmler;) it is Stack Segment
ES :Extra Segment
FS :General purpose segment register
GS :General purpose segment register

The CS register has a 2 bit field which specifies the Current Privilege Level (CPL) of the CPU. 0 is the highest privilege level and 3 is the lowest.

++ Linux uses only two privilege levels: 0 and 3, for Kernel and User mode respectively.

* Segment Descriptor:
Each segment is represented by an 8 byte segment descriptor. They are stored either in the Global Descriptor Table (GDT) or the Local Descriptor Table (LDT).

Generally only one GDT is defined and each process has its own LDT.

<-> Address of GDT in Main memory: This is in the _gdtr_ processor register.
<-> Address of LDT in Main memory: This is in the _ldtr_ processor register.

What is in a Segment Descriptor
1. BASE Field : A 32 bit field that contains the linear address of the first byte in the segment.
2. Granularity(G) bit : If set, the segment size is expressed in multiples of 4KB, else in bytes.
3. LIMIT field : 20 bit field. If the G bit is 0 the size may vary from 1 byte to 1MB, else from 4KB to 4GB.
4. System(S) bit : If cleared, this is a system segment that stores kernel data structures (such as an LDT), else it is a normal code or data segment.
5. Type Field : 4 bit field indicating the segment type and its access rights.
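The way the G bit and the LIMIT field combine can be written down in a few lines of C. This is only a sketch of the size rule given above. The function name segment_size_bytes is my own, and it assumes the usual convention that a limit of N means N + 1 units (bytes or 4KB pages).

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of a segment with the given 20 bit LIMIT and G bit.
 * G = 0: the limit counts bytes.  G = 1: it counts 4KB units. */
uint64_t segment_size_bytes(uint32_t limit, int g)
{
    uint64_t units;

    assert(limit <= 0xFFFFF);           /* LIMIT is only 20 bits wide */
    units = (uint64_t)limit + 1;        /* a limit of 0 still covers one unit */
    return g ? units * 4096 : units;
}
```

With G = 0 the result runs from 1 byte (limit 0) to 1MB (limit 0xFFFFF). With G = 1 it runs from 4KB to 4GB, which is where the ranges in item 3 come from.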

* List of Segment Descriptors
-Code Segment Descriptor : It can be present in the GDT/LDT and has the S flag set.
-Data Segment Descriptor : It can be present in the GDT/LDT and has the S flag set. Stack segments are implemented using generic data segments.
-TSS Descriptor: Task State Segment Descriptor. This segment is used to save the contents of the processor registers.
It can appear only in the GDT. The TYPE field has the value 11 or 9 <1011, 1001>. S flag is cleared.
-LDTD : Local Descriptor Table Descriptor.
Indicates that the Segment Descriptor refers to a segment containing an LDT.
Present in the GDT only.
TYPE Flag : 2
S flag : 0
+Other fields of a Segment Descriptor
1. DPL : Descriptor Privilege Level. A 2 bit field. Represents the minimum CPU privilege level required to access the segment.
2. Segment Present Flag: This is set when the segment is in main memory, else it is cleared.
3. D/B Flag : Called D or B depending on whether the segment holds code or data. Roughly, it is set when the offsets used with the segment are 32 bits long.
4. AVL flag : May be used by the OS, but is ignored by Linux.

** SEGMENT Selectors
- Every segment register is paired with a non-programmable register in the processor which holds the Segment Descriptor.
- Every time a segment register is loaded with a segment selector, the corresponding Segment Descriptor is copied from memory into the matching non-programmable register.

What does a Segment Selector contain ?
1. A 13 bit index, which identifies the Segment Descriptor entry in the GDT/LDT.
2. A Table Indicator (TI) flag, which says whether the entry is in the GDT (0) or the LDT (1).
3. A 2 bit Requestor Privilege Level (RPL). When the selector is loaded into the cs register, this is the CPL of the CPU.

So privilege levels show up in three places: the CPL in the cs register, the DPL in a Segment Descriptor and the RPL in a Segment Selector.
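Pulling the three fields out of a 16 bit selector is just shifting and masking. The sketch below is my own illustration (the struct and function names are made up): the index sits in bits 15..3, the TI flag in bit 2 and the RPL in bits 1..0.

```c
#include <stdint.h>

/* The three fields packed into a 16 bit Segment Selector. */
struct selector_fields {
    unsigned index;     /* bits 15..3: entry number in the GDT/LDT */
    unsigned ti;        /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;       /* bits 1..0: Requestor Privilege Level */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1;
    f.rpl   = sel & 3;
    return f;
}
```

For example, 0x10 decodes to index 2, GDT, RPL 0, and 0x23 decodes to index 4, GDT, RPL 3. As far as I remember these are the values of __KERNEL_CS and __USER_CS on a 2.4 i386 kernel, but treat that as an assumption and check "asm/segment.h" on your own tree.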

How is a Logical Address converted to a Linear Address

              GDT or LDT
            +------------+
            |            |
      .---->| descriptor |---- Base ---> + ----> Linear Address
      |     |            |               ^
      |     +------------+               |
      |                                  |
      + <---- gdtr/ldtr                  |
      ^                                  |
      |                                  |
     x 8                                 |
      |                                  |
 +---------+----+                   +----------+
 |  Index  | TI |         :         |  Offset  |
 +---------+----+                   +----------+
 Segment Selector
                  Logical Address

1. The Segment Selector has the fields Index <13 bit> and the Table Indicator flag <1 bit>.
2. Each Segment Descriptor is 8 bytes long.
3. If the Descriptor X is stored at address MM in memory, the next Descriptor will be at MM+8.
4. Hence, to obtain the offset of the descriptor within its table, the Index is MULTIPLIED BY 8.
5. The Table Indicator flag is used to determine if the descriptor is in the LDT or the GDT.
6. If GDT, then the gdtr value is added to the result of step 4, else the ldtr value is added.
7. Now the value in step 6 represents the address of the descriptor in the GDT/LDT.
8. Access the Descriptor.
9. Now from the Descriptor's Base field <32 bit field> we get the address of the first byte of the segment.
10. Now to this value add the offset in the Logical Address.
11. The value in step 10 is the Linear Address.
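The steps above can be sketched as a small user-space simulation in C. This is a toy model under stated assumptions, not what the hardware literally does: the tables are reduced to arrays that hold only the Base field of each descriptor, the limit and privilege checks are left out, and all names (toy_tables, descriptor_offset, logical_to_linear) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DESC_SIZE 8     /* each Segment Descriptor is 8 bytes long */

/* Toy descriptor tables: only the Base field of each entry is modelled. */
struct toy_tables {
    const uint32_t *gdt_bases;
    size_t gdt_entries;
    const uint32_t *ldt_bases;
    size_t ldt_entries;
};

/* Steps 2 to 4: byte offset of the descriptor inside its table. */
uint32_t descriptor_offset(uint16_t selector)
{
    return (uint32_t)(selector >> 3) * DESC_SIZE;
}

/* Steps 5 to 11: pick the table from the TI flag, fetch the Base and add
 * the offset. Returns 0 on success, -1 for a null or out-of-range selector. */
int logical_to_linear(const struct toy_tables *t, uint16_t selector,
                      uint32_t offset, uint32_t *linear)
{
    unsigned index = selector >> 3;
    int use_ldt = (selector >> 2) & 1;
    const uint32_t *bases = use_ldt ? t->ldt_bases : t->gdt_bases;
    size_t entries = use_ldt ? t->ldt_entries : t->gdt_entries;

    if (!use_ldt && index == 0)
        return -1;              /* null selector: GDT entry 0 is never used */
    if (index >= entries)
        return -1;              /* no such descriptor in this toy table */

    *linear = bases[index] + offset;
    return 0;
}
```

With a toy GDT whose entry 2 has Base 0x00100000, the logical address 0x10:0x1234 (index 2, GDT) comes out as linear address 0x00101234, and the descriptor itself would sit 16 bytes into the table.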

The first entry of the GDT is always set to zero. This ensures that a Logical Address with a null Segment Selector is considered invalid.
The maximum number of Segment Descriptors that can be stored in the GDT is therefore 2^13 - 1 = 8191.

The above topics covered the basics of Segmentation in an x86 processor.
Now for some Linux stuff.

**Segmentation in Linux
Does Linux use Segmentation? The fact is that it uses it only in the most limited way.
Linux relies mainly on Paging rather than on Segmentation. This is because
- Memory Management is easier when all the processes use the same segment register values, which boils down to
the processes using the same set of linear addresses.
- Linux is ported to many systems. RISC architectures do not support segmentation, or support it only in a limited way.

Since all processes use the same segments, only a few segments are needed, so on this architecture it is possible to store all the Segment Descriptors in a single GDT.

LDTs are not used by the KERNEL, but an interface is exposed which allows a process to create its own LDT.

Segments Used by LINUX
* The kernel code segment. The GDT entries for this are
- Base : 0x00000000
- Limit: 0xFFFFF
- G : 1
- S : 1
- Type : 0xA
- DPL : 0
- D/B : 1

The kernel code segment selector is given by the macro __KERNEL_CS. To address this segment the kernel loads
the value of this macro into the CS register.

* The Kernel Data Segment The GDT entries are
- Base : 0x00000000
- Limit: 0xFFFFF
- G : 1
- S : 1
- Type : 2 ** this is the only variation with the above
- DPL : 0
- D/B : 1

The KERNEL mode segment selector is given by the macro __KERNEL_DS.

* The User Code Segment. The GDT entries are
- Base : 0x00000000
- Limit: 0xFFFFF
- G : 1
- S : 1
- Type : 0xA
- DPL : 3
- D/B : 1

This segment can be accessed in both the Kernel and the User modes.
The segment selector is defined by the Macro __USER_CS.

* The User Data Segment. The GDT entries are
- Base : 0x00000000
- Limit: 0xFFFFF
- G : 1
- S : 1
- Type : 2
- DPL : 3
- D/B : 1

The segment selector is given by the macro __USER_DS.

Segmentation was introduced to keep different kinds of code and data logically separate. What we find from the above
descriptions of the GDT entries is that all four segments overlap each other completely, which kind of defeats the
purpose of the segmentation introduced in the x86 processors.

So we could say that LINUX uses segmentation in such a limited way that it could actually do away with it.
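To see the overlap for yourself, you can decode raw 8 byte descriptors with a few shifts and masks. The sketch below follows the standard x86 descriptor layout. The struct and function names are my own, and the sample quadwords mentioned after the code are the values I believe the 2.4 i386 GDT uses, so treat them as assumptions rather than gospel.

```c
#include <stdint.h>

/* Fields of an 8 byte x86 Segment Descriptor. */
struct desc_fields {
    uint32_t base;      /* bits 16..39 and 56..63 */
    uint32_t limit;     /* bits 0..15 and 48..51 */
    unsigned type;      /* bits 40..43 */
    unsigned s;         /* bit 44 */
    unsigned dpl;       /* bits 45..46 */
    unsigned present;   /* bit 47 */
    unsigned avl;       /* bit 52 */
    unsigned db;        /* bit 54 */
    unsigned g;         /* bit 55 */
};

struct desc_fields decode_descriptor(uint64_t d)
{
    struct desc_fields f;

    f.limit   = (uint32_t)(d & 0xFFFF) |
                ((uint32_t)((d >> 48) & 0xF) << 16);
    f.base    = (uint32_t)((d >> 16) & 0xFFFFFF) |
                ((uint32_t)((d >> 56) & 0xFF) << 24);
    f.type    = (unsigned)((d >> 40) & 0xF);
    f.s       = (unsigned)((d >> 44) & 1);
    f.dpl     = (unsigned)((d >> 45) & 3);
    f.present = (unsigned)((d >> 47) & 1);
    f.avl     = (unsigned)((d >> 52) & 1);
    f.db      = (unsigned)((d >> 54) & 1);
    f.g       = (unsigned)((d >> 55) & 1);
    return f;
}
```

Decoding 0x00CF9A000000FFFF gives Base 0, Limit 0xFFFFF, G 1, S 1, Type 0xA, DPL 0, D/B 1, which is exactly the kernel code segment listed above. 0x00CFF2000000FFFF gives the same Base and Limit with Type 2 and DPL 3, i.e. the user data segment. Same base, same limit: all of them cover the whole 4GB linear address space.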

* Task State Segment.
The descriptors of these segments are stored in the GDT.
Base : this is associated with the tss field of each process
G : this is cleared.
Limit: 0xEB
Type : 9/11
DPL : 0

* A Default LDT Segment

This segment is shared by all processes.

Stored in the default_ldt variable.
Includes a single null Segment Descriptor.
Each process has its own LDT segment descriptor.

Generally the values are set as
Base Field : address of default_ldt.
Limit : 7

If a process requires a real LDT then a new 4096 byte segment is created. The default LDT segment descriptor
for this process in the GDT is replaced with a descriptor for the new segment.

Now for each process the system has to maintain two descriptors in the GDT, one for the TSS and one for the LDT.
On top of these, 12 entries are used for descriptors that are common to all processes.
Number of entries that can be made in the GDT is 2^13 - 1 = 8191
Number of processes that can exist in the system at any moment is (8191 - 12) / 2 ~ 4090