Anatomy of a Linux kernel bug

Posted on November 13, 2013 by rkrishnan

Compiling a C program on a GNU/Linux system involves a lot of magic under the hood. One piece of that magic, which is usually taken for granted, is that the kernel version running on a system can differ from the version of the kernel header files used to compile a program. The Linux kernel developers work really hard to give this guarantee to userspace programs. Read on for a case where that guarantee was broken.

ioctl

ioctl(2) is the standard Unix way of controlling a device file from userspace. For example, suppose that, for debugging, we want to read and write some registers of an I2C device. One way to do this is to provide an experimental ioctl command that reads and writes the registers.

The ioctl call in the userspace has the following prototype:

int ioctl(int fd, int cmd, ...);

The driver API is usually implemented using a table of function pointers. The signature of the driver’s ioctl function pointer is a little different from that of the userspace call, but that doesn’t matter for this discussion. The key point is that the second parameter, cmd, is passed unchanged into the kernel’s ioctl function.

What is cmd?

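cmd is the ioctl command code. It can be thought of as a 32-bit bit-field built from a few pieces of information so that each command is unique: the direction of the data transfer, a "type" (a magic number that identifies the driver or subsystem), a command number within that type, and the size of the argument passed to the call.

These four pieces of information are combined into the bit-field by the macro _IOC.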

As an aside, LXR is a great tool to browse through code quickly.

The bug

I am writing a Video4Linux driver for an HDMI input device. Unfortunately, it is supposed to work with a two-year-old kernel (v3.0) shipped with the Android Jelly Bean release running on a TI OMAP4 device. For some reason, the kernel headers shipped with AOSP are a bit different from those in kernel version 3.0.

The particular control code of interest to me is the VIDIOC_DQEVENT, which is defined as follows:

  #define VIDIOC_DQEVENT           _IOR('V', 89, struct v4l2_event)

I have the following code snippet in a simple userspace application (not showing the entire code here):

...
res = select(fd + 1, NULL, NULL, &fds, NULL);
if (res <= 0)
        fprintf(stderr, "%s: %s\n", argv[0], strerror(errno));

res = ioctl(fd, VIDIOC_DQEVENT, &ev);
...

I observed that the select call was succeeding, but the ioctl call with the command VIDIOC_DQEVENT was failing with errno set to ENOTTY. A bit of grepping in the driver source revealed that the ENOTTY was coming from my own driver’s default handler. This meant that none of the cases in the switch statement matched the command code we passed. That was strange! It clearly showed that VIDIOC_DQEVENT had different values in the kernel and in userspace. Printing its value on both sides confirmed that this was indeed the case.

A bit more printing revealed that struct v4l2_event, whose size is used to calculate the control code VIDIOC_DQEVENT, was exactly 8 bytes larger in userspace than in the kernel. This was very strange, because it means that the kernel ABI guarantee was broken.

The kernel header file include/linux/videodev2.h has the struct v4l2_event defined as follows:

...
struct v4l2_event_vsync {
        /* Can be V4L2_FIELD_ANY, _NONE, _TOP or _BOTTOM */
        __u8 field;
} __attribute__ ((packed));

struct v4l2_event {
        __u32                           type;
        union {
                struct v4l2_event_vsync vsync;
                __u8                    data[64];
        } u;
        __u32                           pending;
        __u32                           sequence;
        struct timespec                 timestamp;
        __u32                           reserved[9];
};
...

… and the kernel headers shipped with the userspace had this version for the same structure:

...
struct v4l2_event_ctrl {
        __u32 changes;
        __u32 type;
        union {
                __s32 value;
                __s64 value64;
        };
        __u32 flags;
        __s32 minimum;
        __s32 maximum;
        __s32 step;
        __s32 default_value;
};

struct v4l2_event_frame_sync {
        __u32 frame_sequence;
};

struct v4l2_event {
        __u32                           type;
        union {
                struct v4l2_event_vsync         vsync;
                struct v4l2_event_ctrl          ctrl;
                struct v4l2_event_frame_sync    frame_sync;
                __u8                            data[64];
        } u;
        __u32                           pending;
        __u32                           sequence;
        struct timespec                 timestamp;
        __u32                           id;
        __u32                           reserved[8];
};
...

Now comes the interesting part. Notice the union u in struct v4l2_event? The largest member of the union is a 64-byte array. If you do the math, you can see that no other member of the union exceeds this size, so even though the userspace version has some extra structures in the union, in theory we are not going to exceed 64 bytes. But struct v4l2_event_ctrl contains another union, which holds a 64-bit value.

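On ARM, the compiler aligns this 64-bit value on an 8-byte boundary, and that alignment requirement propagates outwards to struct v4l2_event_ctrl, to the union u and finally to struct v4l2_event itself. struct v4l2_event_ctrl is only 40 bytes, so the union still occupies 64 bytes, but it can no longer start at offset 4: the compiler inserts 4 bytes of padding after type so that u begins on an 8-byte boundary, and adds another 4 bytes at the end of the structure to round its total size up to a multiple of 8. The result is a struct v4l2_event that is 8 bytes larger in userspace than in the kernel.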

Here is some quick and dirty test code to verify that this is indeed the case: https://gist.github.com/vu3rdd/7445863

The fix

I fixed it on my system by copying the relevant portion of the userspace header into the kernel header so that the two struct v4l2_event definitions match. I could do that because I know that there is no other user of Video4Linux events on my system.