Bow here again. It has been a while since we posted a binary analysis on our blog, so I figured we would post one for a vuln that has been getting a lot of hoopla the past few weeks :)
Whenever there is a critical vulnerability in a product that is used frequently, we perform an internal binary analysis in order to get a complete picture of the vulnerability and write reliable countermeasures for it. In some cases, we post the analysis to our threat intelligence portal; in others, it simply stays internal to SecureWorks. Enough of that, on to the fun part:
The vuln we will be talking about is CVE-2009-0658, which is a code execution vulnerability in Adobe Acrobat. There are a few PoCs hosted over at milw0rm for this vulnerability:
http://www.milw0rm.com/exploits/8099
http://www.milw0rm.com/exploits/8090
The interesting thing about these two PoCs is that they are both supposedly for the same vulnerability; however, they crash in two different locations (which will be explained later). I started debugging the vulnerability using the second exploit (8090). If you attach a debugger to Adobe Reader, you'll see that the crash occurs here:
.text:009ADAD7 loc_9ADAD7:
.text:009ADAD7 mov ecx, [ebx+edi*4] ; breakpoint #1
.text:009ADADA mov eax, [ecx+1Ch]
.text:009ADADD mov edx, [esi+10h]
.text:009ADAE0 lea eax, [eax+eax*4]
.text:009ADAE3 lea eax, [edx+eax*4-14h]
.text:009ADAE7 mov edx, [eax+4]
.text:009ADAEA test edx, edx
.text:009ADAEC jz short loc_9A
.text:009ADAEE mov ebp, [eax+10h]
.text:009ADAF1 mov [edx+ebp*4], ecx ; Crash #2 occurs here
.text:009ADAF4 add dword ptr [eax+10
An access violation is triggered by a failed write to 0x41414141, which is obviously an attacker-controlled value.
In order to go any further in the code, we must have an understanding of the JBIG2 file format and the stream within the PDF that we are dealing with. First, the documentation is available here:
http://www.jpeg.org/public/fcd14492.pdf
The irony is that you have to open it in a PDF reader ;) Now to look at the file. If you search the file for "JBIG2", you'll find it in 4 places:
0000ad0: 792f 4669 6c74 6572 2f4a 4249 4732 4465 y/Filter/JBIG2De
00029e0: 7465 722f 4a42 4947 3244 6563 6f64 652f ter/JBIG2Decode/
0009750: 792f 4669 6c74 6572 2f4a 4249 4732 4465 y/Filter/JBIG2De
0010aa0: 4669 6c74 6572 2f4a 4249 4732 4465 636f Filter/JBIG2Deco
These are 4 separate JBIG2 streams within the PDF, and any of them could trigger the vulnerability; however, debugging reveals that it is the first stream that does. The first 64 bytes or so around the start of that stream are shown here:
0000af0: 3e3e 7374 7265 616d 0d0a 0000 0001 4000 >>stream......@.
0000b00: 0033 3333 1300 000a 0000 000c e400 0001 .333............
0000b10: 2c00 0001 2c50 0000 0000 0001 0001 0100 ,...,P..........
0000b20: 0009 0400 0003 fffd ff02 fefe fe00 0000 ................
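If you want to repeat that search programmatically, a naive byte scanner like the following sketch is enough for a PoC like this one. It assumes the filter name appears literally, not inside a compressed object stream. It also assumes the "stream" keyword is followed by CRLF or LF, as the PDF spec requires. `find_jbig2_streams` is a hypothetical helper, not part of any tool used in this analysis.

```python
import re
from typing import List, Tuple

# Literal filter name: a naive byte search, not a real PDF object parser.
FILTER_RE = re.compile(rb"/JBIG2Decode")
# The 'stream' keyword is followed by CRLF or LF before the raw data.
STREAM_RE = re.compile(rb"stream(?:\r\n|\n)")


def find_jbig2_streams(pdf: bytes) -> List[Tuple[int, int]]:
    """Return (filter_offset, data_offset) for each /JBIG2Decode occurrence.

    data_offset is the first byte after the 'stream' keyword's end-of-line,
    i.e. where the raw JBIG2 segment data should begin.
    """
    results = []
    for m in FILTER_RE.finditer(pdf):
        # The first 'stream' keyword after the filter name belongs to this object.
        s = STREAM_RE.search(pdf, m.end())
        if s is not None:
            results.append((m.start(), s.end()))
    return results
```

On the PoC, you would expect the first data offset to land right after the `stream\r\n` shown in the dump above. That is where the `00 00 00 01` bytes begin.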
So now we need to determine which part of this stream causes the crash. I started by tracing back the values used in the pointer calculation that ends up being 0x41414141 in the crash. Before continuing, it is worth noting that the destination address at 0x009ADAF1, EDX + EBP*4, comes out as 0x41414141 because both registers contain the same value, 0x0D0D0D0D (which is 0x41414141/5). That value is again obviously user controlled. Both of the registers used in the calculation, EBP and EDX, obtain their values shortly before the crash, so set a breakpoint a few instructions before the crash, at 0x009ADAD7. The next few instructions are a fairly annoying sequence of arithmetic operations, so we'll walk through them step by step. Here is another annotated version of the disassembly:
.text:009ADAD7 loc_9ADAD7:
.text:009ADAD7 mov ecx, [ebx+edi*4] ; breakpoint #1, EDI = 0, EBX = pointer [1]
.text:009ADADA mov eax, [ecx+1Ch] ; EAX = 0x00333333 [2]
.text:009ADADD mov edx, [esi+10h]
.text:009ADAE0 lea eax, [eax+eax*4] ; EAX = 0x00FFFFFF [3]
.text:009ADAE3 lea eax, [edx+eax*4-14h] ; Final object calculation prior to crash [4]
.text:009ADAE7 mov edx, [eax+4] ;
.text:009ADAEA test edx, edx
.text:009ADAEC jz short loc_9A
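As a quick sanity check on the faulting address, the value falls straight out of the addressing mode of the store at 0x009ADAF1. This tiny sketch redoes that arithmetic with 32-bit wraparound. SPRAY is simply the value observed in both EDX and EBP at the crash.

```python
MASK32 = 0xFFFFFFFF


def store_address(edx: int, ebp: int) -> int:
    """Effective address of 'mov [edx+ebp*4], ecx', truncated to 32 bits."""
    return (edx + ebp * 4) & MASK32


SPRAY = 0x0D0D0D0D  # value observed in both EDX and EBP at the crash
addr = store_address(SPRAY, SPRAY)
print(hex(addr))  # 0x41414141

# With EDX == EBP the target is 5 * value; 0x0D * 5 = 0x41 per byte, no carries.
assert addr == 5 * SPRAY
```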
Before moving on, it is important to note that the values in both EBP and EDX at the time of the crash are loaded from the object pointed to by EAX: EDX from [eax+4] and EBP from [eax+10h]. EAX itself is calculated at [4]. So we first need to focus on what the value in EAX is and how we can control it.
The first instruction ([1]) is part of an array-processing loop where EDI is the counter and EBX is the base pointer to the array. The array holds a series of objects; the loop is fairly small and seems to just iterate through every available object and move some pointers around. The second instruction ([2]) moves an integer into EAX. Its value during exploitation is 0x00333333, which should look familiar. If you look at the bytes from the file, 0x00333333 is very clearly near the start of the stream. We can verify this statically or by modifying that value in the file and debugging again. For the sake of time, I will just state that it is indeed from the file and is completely user controlled. Moving down, that value is used again in a calculation that places 0x00FFFFFF (0x00333333 * 5) into EAX ([3]). Right after this, the controlled value is used as an index into an array whose base pointer comes from [esi+10h]. The address of the selected element is put into EAX, where it is used later on as a pointer to some object ([4]). This is where the vulnerability lies: nothing in the code above checks the index against the size of the array. As a result, we are able to read a value from outside the array, and that value is then used as a pointer for a memory write. It is worth noting that while debugging the exploit, the memory surrounding the out-of-bounds access is filled with 0x0D0D0D0D, which seems to be sprayed across a large portion of memory.
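Put differently, [3] and [4] turn the value from the file into a 1-based index into an array of 0x14-byte elements. The element size follows from the scaling: 0x14 = 5 * 4. The -14h displacement makes a value of 1 select the first element. The sketch below replays that arithmetic. The element size and the "1-based index" reading are our interpretation of the disassembly, and both function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
ELEM_SIZE = 0x14  # inferred from the '*5*4' scaling and the '-14h' displacement


def element_address(base: int, page_assoc: int) -> int:
    """Replays [3] and [4]; base is the array pointer loaded from [esi+10h]."""
    eax = (page_assoc + page_assoc * 4) & MASK32   # [3] lea eax, [eax+eax*4]
    return (base + eax * 4 - 0x14) & MASK32        # [4] lea eax, [edx+eax*4-14h]


def element_address_indexed(base: int, page_assoc: int) -> int:
    """The same calculation read as a 1-based index into 0x14-byte elements."""
    return (base + (page_assoc - 1) * ELEM_SIZE) & MASK32


# With the PoC value, EAX is 0x00FFFFFF after [3], and the selected element
# sits 0x3FFFFE8 bytes (roughly 64 MB) past the array base.
print(hex(element_address(0, 0x00333333)))  # 0x3ffffe8
```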
So, to summarize: we have an array of objects that is accessed via a user-controlled index, and the member variables of the selected element are used to calculate an address to write to. With some trickery, this lets us write anywhere in Adobe Reader's address space. From here, we need to see what the values used in the calculation represent in the file. We'll start by looking at the stream documentation...
We know that we are dealing with the first few bytes of the JBIG2 stream, so it is likely to be a header of some sort. The JBIG2 documentation says that encoded JBIG2 streams are broken down into segments, each made up of two parts: the segment header and the segment data (see page 71). The format of the segment data depends on the segment type, but the segment header format is the same for all segment types. Knowing this, we can start looking at the segment header format. Here are the first few bytes of the stream again:
[00 00 00 01] [40] [00] [00 33 33 33] 13 00 00 0a ...
[Segment#] [FL] [RT] [SPAssociat] ...
According to the documentation, the segment header always begins with a DWORD segment number (pg 72, 7.2.2). The next byte is a flags field ([FL]). It holds the segment type in the lower 6 bits (mask 0x3F), plus two flags describing the segment: page association field size (bit 7, mask 0x40) and deferred non-retain (bit 8, mask 0x80). In the proof of concept, the 7th bit (page association field size) is the only bit set in the flags, so the segment type is 0. The byte following the flags field is the start of a variable-length field called the referred-to segment count [RT]. In this case it is only a byte long and its value is zero, indicating that this segment does not refer to any other segments. If this value were non-zero, the field could be longer. It would also be followed by another variable-length field containing the numbers of the segments that the current segment refers to. Since the referred-to segment count is zero in the PoC, we don't need to worry about this too much. Finally, the next field is the segment page association ([SPAssociat]). It is a single byte if the page association field size bit is clear, or a DWORD if it is set. In this case the bit is set, so this field is a DWORD (0x00333333). The segment page association is the value responsible for the access violation.
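To double-check our reading of the spec, here is a small parser for the segment header fields described above. It is written from the JBIG2 draft linked earlier, not from Adobe's code. The class and field names are our own, and it only rejects the reserved referred-to counts (5 and 6) rather than validating everything. Feeding it the first 14 bytes of the PoC stream gives segment number 1, segment type 0, a four-byte page association of 0x00333333, and a data length of 0x1300000A.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentHeader:
    number: int
    seg_type: int
    page_assoc_4byte: bool      # flags & 0x40
    deferred_non_retain: bool   # flags & 0x80
    referred_to: List[int]
    page_association: int
    data_length: int
    header_size: int


def parse_segment_header(buf: bytes, off: int = 0) -> SegmentHeader:
    start = off
    (number,) = struct.unpack_from(">I", buf, off)   # segment number, big-endian DWORD
    flags = buf[off + 4]
    off += 5

    # Referred-to segment count and retention flags: count in the top 3 bits.
    count = buf[off] >> 5
    if count <= 4:                                   # short form: one byte
        off += 1
    elif count == 7:                                 # long form: 29-bit count + retention bytes
        (long_form,) = struct.unpack_from(">I", buf, off)
        count = long_form & 0x1FFFFFFF
        off += 4 + (count + 8) // 8                  # ceil((count + 1) / 8) retention bytes
    else:
        raise ValueError("reserved referred-to segment count")

    # Width of each referred-to segment number depends on this segment's own number.
    fmt = ">B" if number <= 256 else ">H" if number <= 65536 else ">I"
    width = struct.calcsize(fmt)
    referred = [struct.unpack_from(fmt, buf, off + i * width)[0] for i in range(count)]
    off += count * width

    # Segment page association: 4 bytes if flag 0x40 is set, otherwise 1 byte.
    if flags & 0x40:
        (page,) = struct.unpack_from(">I", buf, off)
        off += 4
    else:
        page = buf[off]
        off += 1

    (data_length,) = struct.unpack_from(">I", buf, off)
    off += 4
    return SegmentHeader(number, flags & 0x3F, bool(flags & 0x40), bool(flags & 0x80),
                         referred, page, data_length, off - start)


# First bytes of the PoC stream (from the hex dump earlier).
poc = bytes.fromhex("00 00 00 01 40 00 00 33 33 33 13 00 00 0a")
print(parse_segment_header(poc))
```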
I like to always verify that whatever I'm reversing actually lives up to the specification, so we'll take a look at some of the code that processes this header structure. Unfortunately, the Adobe code that processes JBIG2 streams is significantly more complicated than other code I've seen dealing with JBIG2, so the only thing I am going to deal with here is whether or not the page association bit needs to be set for exploitation. We know from the documentation this is supposed to be true and we could just fiddle with it in a hex editor, but where is the fun in that ;) So first things first, we need to find where our file is in memory and look for references to it (specifically the 5th byte).
The first routine we will look at is sub_9A8D50 (boo, no symbols), where var_2C holds a pointer to the buffer that contains the stream itself. The pointer in var_2C is the return value of another function, sub_30D220, which obtains it from a typical malloc-type function (with some other code wrapped around it). This happens during the following virtual function call:
.text:009A8FE8 loc_9A8FE8:
.text:009A8FE8 mov ecx, [esi+0D0h]
.text:009A8FEE mov edx, [ecx]
.text:009A8FF0 push eax
.text:009A8FF1 mov eax, [edx+14h]
.text:009A8FF4 push 0
.text:009A8FF6 push ecx
.text:009A8FF7 call eax ; 0x0030D220
.text:009A8FF9 add esp, 0Ch
.text:009A8FFC test eax, eax
.text:009A8FFE mov [esp+44h+var_30], eax
.text:009A9002 jz short loc_9A8F
.text:009A9004 xor edi, edi
.text:009A9006 cmp [esi+78h], edi
.text:009A9009 mov [esp+44h+var_2C], eax ; stream buf pointer moved into var_2C
.text:009A900D mov [esp+44h+var_28], eax
The allocation looks like this:
.text:0030D26B mov ecx, [esp+0Ch+arg_4]
.text:0030D26F cmp eax, ecx
.text:0030D271 mov ebx, eax
.text:0030D273 jb short loc_30
.text:0030D277
.text:0030D277 loc_30D277:
.text:0030D277 push ecx ; allocation size - stream length
.text:0030D278 mov ecx, esi
.text:0030D27A call malloc_wrapper_sub2
.text:0030D27F test edi, edi
.text:0030D281 mov esi, eax
There are two important things to note here. First, the stream length is used as the allocation size. Second, the stream data is not copied in sub_30D220, but in another routine we'll look at a little later. Once sub_30D220 returns, the return value is stored into var_30, var_2C, and var_28, so we have three variables holding onto the stream buffer pointer. We then run into another virtual function call (yay):
.text:009A9019 mov edx, [esi+78h] ; one iteration
.text:009A901C mov ecx, [ebp+10h]
.text:009A901F mov eax, [esp+44h+var_28]
.text:009A9023 mov ecx, [ecx+8]
.text:009A9026 push ebp
.text:009A9027 sub edx, edi
.text:009A9029 push edx
.text:009A902A push 1
.text:009A902C push eax ; stream buf ptr
.text:009A902D call ecx ; sub_317A80
.text:009A902F add [esp+54h+var_28], eax
.text:009A9033 add edi, eax
.text:009A9035 mov eax, [esp+54h+var_30]
.text:009A9039 add esp, 10h
This virtual function call ends up resolving to sub_317B30, which is a routine with a loop and a number of function calls inside of it. During the second iteration of the loop, the following is executed:
.text:00317BFE loc_317BFE: ;
.text:00317BFE mov eax, [esi+0Ch] ; pointer to the stream buffer (with the data in it)
.text:00317C01 lea ecx, [edi+ebp]
.text:00317C04 cmp ecx, eax
.text:00317C06 jbe short loc_317C22
.text:00317C22
.text:00317C22 loc_317C22:
.text:00317C22 push ebp ; size_t (stream size)
.text:00317C23 push eax ; source pointer (obtained from ESI+0Ch)
.text:00317C24 push edi ; destination pointer (pointer allocated earlier)
.text:00317C25 call memcpy
.text:00317C2A add [esi+0Ch], ebp
.text:00317C2D add esp, 0Ch
.text:00317C30 sub [esi+8], ebp
.text:00317C33 jmp short loc_317
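The bookkeeping around that memcpy is simple: after each copy, the source pointer at [esi+0Ch] is advanced and [esi+8] is decremented by the copied size. The following is a minimal Python model, purely to illustrate how the data flows into the buffer allocated in sub_30D220. The class name is hypothetical. Treating [esi+8] as a "bytes remaining" counter is our assumption, based only on it moving in step with the source pointer.

```python
class SourceCursor:
    """Hypothetical model of the object at ESI in the copy loop above."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0                  # plays the role of [esi+0Ch] (source pointer)
        self.remaining = len(data)    # plays the role of [esi+8] (assumed byte count)

    def copy_to(self, dest: bytearray, dest_off: int, size: int) -> None:
        if size > self.remaining:
            raise ValueError("this model only covers in-bounds copies")
        # memcpy(edi, eax, ebp): destination, source, size
        dest[dest_off:dest_off + size] = self.data[self.pos:self.pos + size]
        self.pos += size              # add [esi+0Ch], ebp
        self.remaining -= size        # sub [esi+8], ebp


stream = bytes.fromhex("00 00 00 01 40 00 00 33 33 33")
buf = bytearray(len(stream))          # stands in for the buffer from sub_30D220
cur = SourceCursor(stream)
cur.copy_to(buf, 0, 4)
cur.copy_to(buf, 4, len(stream) - 4)
assert bytes(buf) == stream and cur.remaining == 0
```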
Now that we know where the buffer we are dealing with is allocated and filled with the file contents, we can set hardware breakpoints on the flags byte and see where it is read. The first trigger is here:
.text:009AD509 mov bl, [edx+4] ; Flags byte
.text:009AD50C and bl, 3Fh ; Obtain segment type
.text:009AD50F mov byte ptr [esp+30h+var_8], bl
.text:009AD513 mov edx, [esp+30h+var_8]
This simply obtains the segment type by ANDing the flags byte with 0x3F, keeping only the lower 6 bits. Moving on, the next breakpoint trigger should occur here, inside sub_9BB160:
.text:009BB176 movzx edx, byte ptr [ecx] ; segment flags, hw bp trigger #2
.text:009BB179 mov [eax+0Ch], dl ; segment flags copied
.text:009BB17C add ecx, 1
.text:009BB17F mov [eax], ecx
.text:009BB181 mov al, dl
.text:009BB183 mov cl, al
.text:009BB185 and cl, 3Fh
.text:009BB188 test al, 40h ; Page association size
.text:009BB18A setnbe dl ; is the 7th bit set? (set dl = 1)
.text:009BB18D mov [esi+4], cl
.text:009BB190 test al, 80h
.text:009BB192 movzx cx, dl ; store value of dl into cx
.text:009BB196 setnbe dl
.text:009BB199 movzx ax, dl
.text:009BB19D mov [esi+8], ax
.text:009BB1A1 mov eax, [esi+2Ch]
.text:009BB1A4 mov [esi+6], cx ; move value of cx (result from the page association test) into memory
You should notice a couple of things here. First, the next hardware breakpoint triggers where indicated above, and the segment flags are then copied to another location. Moving down a little, the flags byte in AL is TESTed against 0x40. The setnbe instruction sets its destination operand (DL, the lower byte of EDX, in this case) to 1 if both CF and ZF are zero, and to 0 otherwise. TEST performs a bitwise AND, always clears CF, and sets ZF only if the result is zero. After a TEST, therefore, setnbe simply means "set if any tested bit was set". Since the 7th bit of the flags is set, ZF is clear and DL is set to 1. Moving down, that result is passed through CX and eventually stored back into an object in memory at ESI+6. The segment type lands at ESI+4, and the result of the equivalent test against 0x80 (deferred non-retain) is stored at ESI+8.
Moving a little further down the routine, you'll see this:
.text:009BB388
.text:009BB388 loc_9BB388:
.text:009BB388 cmp word ptr [esi+6], 0 ; Was page association size field set?
.text:009BB38D jnz short loc_9BB3A3
.text:009BB38F mov eax, [esi+2Ch] ; if the page association size field is 0, this block is executed (only read one byte)
.text:009BB392 mov ecx, [eax]
.text:009BB394 mov dl, [ecx]
.text:009BB396 add ecx, 1
.text:009BB399 mov [eax+0Ch], dl
.text:009BB39C mov [eax], ecx
.text:009BB39E movzx eax, dl
.text:009BB3A1 jmp short loc_9BB3AD
.text:009BB3A3
.text:009BB3A3 loc_9BB3A3:
.text:009BB3A3 mov ecx, [esi+2Ch] ; if page association is set to 1
.text:009BB3A6 push 4
.text:009BB3A8 call sub_9BA6B0 ; read DWORD
.text:009BB3AD loc_9BB3AD:
.text:009BB3AD mov ecx, [esi+2Ch]
.text:009BB3B0 push 4
.text:009BB3B2 mov [esi+1Ch], eax ; store segment page association field in memory
This checks the result of our earlier test against 0x40 and makes a function call if the bit was set. A quick analysis of sub_9BA6B0 indicates that it loops through the segment page association field byte by byte and returns the full DWORD. If the bit is clear, only a single byte is read. Either way, the value is then stored into memory at ESI+1Ch. This is the location from which the vulnerable routine reads the page association value that results in the access violation. Now, moving back into sub_9AD380, we see that the value stored at ESI+1Ch is read and tested to see whether it is non-zero:
.text:009AD887 loc_9AD887:
.text:009AD887 mov ecx, [edx]
.text:009AD889 mov eax, [ecx+1Ch] ; segment page association
.text:009AD88C test eax, eax
.text:009AD88E jz loc_9ADB40
.text:009AD894 mov ecx, [esi+10h]
.text:009AD897 lea eax, [eax+eax*4]
.text:009AD89A add dword ptr [ecx+eax*4-14h], 1 ; Crash #1 (8099)
.text:009AD89F lea eax, [ecx+eax*4-14h]
.text:009AD8A3 mov eax, [edx]
.text:009AD8A5 mov ecx, [eax+10h]
.text:009AD8A8 test ecx, ecx
.text:009AD8AA mov [esp+30h+var_4],
It is worth noting that this block, just above the final crash location, is where the other PoC (8099) crashes. If you are debugging, it is important to understand why one PoC crashes here while the other (8090) does not. In 8099, the address calculated for the add instruction does not resolve to valid memory, resulting in the access violation. In 8090, a large amount of memory has been allocated and filled with 0x0D bytes by a JavaScript heap spray. This increases the likelihood that the address calculation resolves to a valid memory address. Moving down, we see that the value is accessed once again prior to the final crash:
.text:009ADAD7 loc_9ADAD7:
.text:009ADAD7 mov ecx, [ebx+edi*4]
.text:009ADADA mov eax, [ecx+1Ch] ; segment page association
.text:009ADADD mov edx, [esi+10h]
.text:009ADAE0 lea eax, [eax+eax*4]
.text:009ADAE3 lea eax, [edx+eax*4-14h]
.text:009ADAE7 mov edx, [eax+4]
.text:009ADAEA test edx, edx
.text:009ADAEC jz short loc_9
So, looking back now, we can see where the page association field size is checked and where memory is written accordingly. In addition, we can also see where the value used in the array access is written into memory.
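Putting the pieces together, the page association read can be modelled as below. The byte order is inferred rather than taken from a detailed read of sub_9BA6B0. The bytes 00 33 33 33 end up as 0x00333333 in EAX, which only works if the most significant byte comes first; a little-endian read would give 0x33333300. Both function names are hypothetical, and we have not checked how sub_9BA6B0 behaves near the end of the buffer.

```python
from typing import Tuple


def read_be(buf: bytes, off: int, n: int) -> int:
    """Accumulate n bytes, most significant first (our model of sub_9BA6B0)."""
    value = 0
    for i in range(n):
        value = (value << 8) | buf[off + i]
    return value


def read_page_association(buf: bytes, off: int, four_byte: bool) -> Tuple[int, int]:
    """Return (value, next_offset), mirroring the branch on word ptr [esi+6]."""
    if four_byte:                      # loc_9BB3A3: push 4 / call sub_9BA6B0
        return read_be(buf, off, 4), off + 4
    return buf[off], off + 1           # one-byte path: mov dl, [ecx] / movzx eax, dl


# Bytes 6..9 of the PoC stream, read with the 0x40 flag set:
value, nxt = read_page_association(bytes.fromhex("00 33 33 33"), 0, True)
print(hex(value), nxt)  # 0x333333 4
```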
Several sources have indicated that JavaScript is not necessary to exploit this vulnerability. It is certainly not needed to trigger it; however, I have yet to see any exploit that is reasonably reliable, with or without JavaScript. The sample we debugged from "the wild" used JavaScript for the heap spray and had a fairly low success rate. There are potential ways of increasing the reliability of the exploit without JavaScript, but we have not yet captured anything of that nature in the wild.
This issue was patched by Adobe on March 10th, 2009. A link to the Adobe advisory is available here:
http://www.adobe.com/support/security/bulletins/apsb09-03.html