atomicity and alignment of data in memory

A data item is aligned in memory when its address is a multiple of its size in bytes. For instance, the address of an aligned short integer must be a multiple of two while the address of an aligned integer must be a multiple of four.
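The definition above can be turned into a one-line check. The following is a minimal sketch, not taken from any kernel source: the helper name is_aligned is made up for illustration, and it assumes the size is a power of two and that converting a pointer to uintptr_t yields its numeric address (true on mainstream platforms).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if addr is a multiple of size, else 0.
 * ASSUMPTION: size is a non-zero power of two (2, 4, 8, ...). */
static int is_aligned(const void *addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    /* For a power of two, "multiple of size" means the low bits are zero. */
    return ((uintptr_t)addr & (size - 1)) == 0;
}
```

With this check, address 0x1002 counts as aligned for a two-byte short integer but not for a four-byte integer.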

Why is it important to know about alignment ?

Assembly language instructions that make zero or one aligned memory access are atomic. Generally, an unaligned memory access is not atomic, because the processor may have to split it into more than one access.

fork and vfork

quick question: what’s the difference between fork() and vfork() system calls ?

quick answer: the vfork() system call creates a process that shares the memory address space of its parent, and the parent stays blocked until the child exits or executes a new program.

details:

fork() is implemented by Linux as a clone() system call whose flags parameter specifies the SIGCHLD signal with all the clone flags cleared, and whose child_stack parameter is 0.

vfork() is implemented by Linux as a clone() system call whose flags parameter specifies the SIGCHLD signal together with the flags CLONE_VM and CLONE_VFORK, and whose child_stack parameter is 0.
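Since a vfork() child borrows the parent's address space, the only safe things it can do are call _exit() or one of the exec functions. A minimal user-space sketch of that pattern follows; the function name vfork_child_exit_status is made up for this example.

```c
#define _DEFAULT_SOURCE   /* make sure glibc declares vfork() */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: creates a child with vfork(); the child exits at
 * once with `code`. Returns the child's exit status, or -1 on error. */
static int vfork_child_exit_status(int code)
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);    /* child: only _exit() or exec*() are safe here */

    /* parent: resumes only after the child has exited (or exec'd) */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling vfork_child_exit_status(7) should hand back 7 to the parent once the child is gone.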

[ discussion: copy on write ]

Copy on write makes process creation with fork() efficient. Instead of copying the parent's address space when the process is created, parent and child share the pages. As soon as either of them writes to a page, the kernel allocates a new page and assigns it to the writing process.

Most of the time, forking is required just to run a new program, in which case it's a waste to copy the whole parent address space.
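From user space only the effect of copy on write is visible, not the page sharing itself: a write in the child never shows up in the parent. Here is a small sketch of that behaviour; the function name value_seen_by_parent is made up for this example.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks, lets the child overwrite a variable, and
 * returns the value the parent sees afterwards (or -1 on error). */
static int value_seen_by_parent(void)
{
    volatile int value = 1;   /* volatile so the child's store is not elided */
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 2;            /* the write lands in the child's own copy */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) != pid)
        return -1;
    return value;             /* the parent's copy is untouched */
}
```

The parent still reads 1 here, whether the kernel copied the page eagerly or only when the child wrote to it.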

verifying user space addresses in kernel

We can verify a user space address while executing in the kernel by using the following function:

int access_ok(int type, const void *addr, unsigned long size);

Defined in <asm/uaccess.h>, this function returns a nonzero value if the range starting at addr lies in user space and 0 if it does not, for example when addr is a kernel space address (talking of virtual addresses, of course). The type argument is VERIFY_READ if you intend to read from addr and VERIFY_WRITE if you intend to write to it. VERIFY_WRITE is a superset of VERIFY_READ, so if you need to read as well as write, use VERIFY_WRITE. The size argument is the size in bytes of the data to be read or written.

This comes in handy in drivers, e.g. in the implementation of ioctls: if you expect a user space address and access_ok() fails, the driver should return -EFAULT.
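Conceptually the check is a range test against the top of user space. The following is a user-space model only, not the kernel's implementation: the name model_access_ok is made up, the limit 0xC0000000 assumes a 32-bit kernel with a 3G/1G split, and the type argument is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: user space ends at 0xC0000000 (32-bit, 3G/1G split). */
#define MODEL_USER_LIMIT 0xC0000000u

/* Hypothetical model: returns 1 if [addr, addr + size) lies entirely
 * below the user space limit, 0 otherwise. */
static int model_access_ok(uint32_t addr, uint32_t size)
{
    /* Compare against limit - size so that addr + size cannot overflow. */
    return size <= MODEL_USER_LIMIT && addr <= MODEL_USER_LIMIT - size;
}
```

A check of this kind only says that the range is in user space. It says nothing about whether the memory is actually mapped, so the subsequent copy to or from user space can still fail.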

Slab Poisoning

Slab poisoning is a term popular among Linux kernel hackers. It refers to filling slab memory with known byte patterns so that the use of uninitialized or freed dynamically allocated memory shows up quickly, mostly as a panic (or oops).

How to find out if you have hit slab poison ?

If you have an offending address 0xa5a5a5a5 somewhere in the kernel oops message, you can be almost sure that you have used uninitialized dynamically allocated memory somewhere. Similarly, if you see the address 0x6b6b6b6b somewhere, you can be fairly sure that you are using a freed variable.

Note: This help from the kernel comes only when it is compiled with the CONFIG_DEBUG_SLAB configuration option. In this case, each byte of allocated memory is set to 0xa5 before being handed over to the caller and set to 0x6b when it is freed. Why not 0x00 ? Because that hides more bugs than it can help find (See Writing Solid Code and my review on that book).
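The same idea can be tried out in user space. This is only a sketch on top of malloc() and free(), not how the kernel allocator is written; poisoned_alloc, poisoned_free and struct node are names made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POISON_ALLOC 0xa5   /* fill for fresh, uninitialized memory */
#define POISON_FREE  0x6b   /* fill for memory that has been freed  */

/* Hypothetical helper: malloc() plus the "uninitialized" pattern. */
static void *poisoned_alloc(size_t size)
{
    void *p = malloc(size);
    if (p)
        memset(p, POISON_ALLOC, size);
    return p;
}

/* Hypothetical helper: stamp the "freed" pattern, then release.
 * Note: an optimizing compiler is allowed to drop this memset. */
static void poisoned_free(void *p, size_t size)
{
    if (p)
        memset(p, POISON_FREE, size);
    free(p);
}

struct node {
    uint32_t next_addr;   /* stands in for a pointer field */
    uint32_t value;
};

/* Reads a field that the caller "forgot" to initialize. */
static uint32_t read_uninitialized_field(void)
{
    struct node *n = poisoned_alloc(sizeof *n);
    uint32_t seen;

    if (!n)
        return 0;
    seen = n->next_addr;          /* every byte is still 0xa5 */
    poisoned_free(n, sizeof *n);
    return seen;
}
```

The forgotten field reads back as 0xa5a5a5a5, which is exactly the kind of value that ends up as the offending address in an oops.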

By placing memory tags before and after the allocated memory, it is possible to detect a memory overrun or buffer overflow. When kernel debugging is enabled, the Linux kernel does exactly that.
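A rough user-space sketch of such tags follows. The names guarded_alloc, guards_intact and guarded_free are made up, and the tag size and fill byte are arbitrary choices for this example, not the values the kernel uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define REDZONE_BYTES 8      /* ASSUMPTION: arbitrary tag size   */
#define REDZONE_FILL  0xbb   /* ASSUMPTION: arbitrary tag marker */

/* Allocates size bytes with a tag in front of and behind them. */
static unsigned char *guarded_alloc(size_t size)
{
    unsigned char *raw = malloc(size + 2 * REDZONE_BYTES);

    if (!raw)
        return NULL;
    memset(raw, REDZONE_FILL, REDZONE_BYTES);
    memset(raw + REDZONE_BYTES + size, REDZONE_FILL, REDZONE_BYTES);
    return raw + REDZONE_BYTES;   /* the caller sees only the middle part */
}

/* Returns 1 if both tags are untouched, 0 if something overran. */
static int guards_intact(const unsigned char *user, size_t size)
{
    const unsigned char *front = user - REDZONE_BYTES;
    const unsigned char *back = user + size;
    size_t i;

    for (i = 0; i < REDZONE_BYTES; i++)
        if (front[i] != REDZONE_FILL || back[i] != REDZONE_FILL)
            return 0;
    return 1;
}

static void guarded_free(unsigned char *user)
{
    if (user)
        free(user - REDZONE_BYTES);
}
```

Writing even one byte past the end of the buffer changes the trailing tag, so a call to guards_intact() at free time would catch the overflow.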

Sleep/Wakeup and Linux kernel threads

As a part of understanding the scheduling of kernel threads in Linux, I wrote the following module code.

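#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/sched.h>    /* task_struct, current, schedule(), wake_up_process() */

MODULE_LICENSE("GPL");

#define DBG_FN_ENTRY() \
do { \
    printk(KERN_INFO "Inside function [ %s ]\n", \
           __FUNCTION__); \
} while (0)

struct task_struct *sleeping_task = NULL;
int k = 0;

int func(void *s)
{
    int i;

    for (i = 0; i < 20; i++) {
        printk("[%d][%s]\n", i, (char *)s);
        /* wake up whichever thread went to sleep last, if any */
        if (sleeping_task)
            wake_up_process(sleeping_task);
        if (i == 10) {
            /* Change the state before publishing ourselves and calling
             * schedule(); a task left in TASK_RUNNING is just scheduled
             * again, and a wake-up arriving in between is not lost. */
            set_current_state(TASK_INTERRUPTIBLE);
            sleeping_task = current;
            schedule();
        }
    }
    return 0;
}

int init_module(void)
{
    DBG_FN_ENTRY();

    /* kernel_thread() as available to modules on this 2.6.17 kernel */
    kernel_thread(func, (void *)"first", 0);
    kernel_thread(func, (void *)"second", 0);

    return 0;
}

void cleanup_module(void)
{
    DBG_FN_ENTRY();
}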

This module on compilation and insertion (2.6.17-10-generic) using

insmod hello.ko

produces the following output (/var/log/messages) :

Inside function [ init_module ]
[0][first]
[1][first]
[2][first]
[3][first]
[4][first]
[5][first]
[6][first]
[7][first]
[8][first]
[9][first]
[10][first]
[0][second]
[1][second]
[2][second]
[3][second]
[4][second]
[5][second]
[6][second]
[7][second]
[8][second]
[9][second]
[10][second]
[11][first]
[12][first]
[13][first]
[14][first]
[15][first]
[16][first]
[17][first]
[18][first]
[19][first]
[11][second]
[12][second]
[13][second]
[14][second]
[15][second]
[16][second]
[17][second]
[18][second]
[19][second]

Mistakes that I made and rectified to make it work (..arrgh confessions are painful !)

> used schedule() without changing the task state. The task state remained TASK_RUNNING, so the thread simply got scheduled again instead of sleeping.

> did not wake up the sleeping process, so the threads never got rescheduled and the output stopped at counter 10 for both threads.

I got hold of an article (slightly late..) on LWN about sleeping/wakeup. Worth reading.