Thursday, October 15, 2009

A couple of unicode issues on PHP and Firefox

Well, here I am developing ACS, finding that this project resembles at some degree the creation of a browser.. but anyway, it's close to a working beta (yay!).

In any case, a couple of bugs came to my attention, some of them are public, some of them are not.

First of all, I want to describe the PHP vulnerability I made public on my presentation with David Lindsay, at Blackhat USA 2009, that apparently only Chris Weber, Giorgio Maone (creator of NoScript), Mario Heiderich (creator of PHP-IDS) and the Acunetix security team have realized the danger of it.

It has been reported, well, more than enough times to the PHP team (I made another attempt today, hoping this will get fixed in some time soon.. if at all). This issue affects all PHP versions Mario Heiderich and me could test, and endangers practically all PHP programs that use the utf8_decode() function for decoding (as recommended by OWASP guidelines).

The disclosure timeline follows:
* Reported by May 11 2009
* Discovered by June 19 2009
* Discovered by Giorgio Maone / Eduardo Vela: July 14 2009
* Reported and Fixed on PHPIDS: July 14 2009
* Microsoft notified of a XSS Filter bypass: July 14 2009
* Fixed XSS Filter bypass on NoScript 1.9.6:  July 20 2009
* Vulnerability disclosed on BlackHat USA 2009: July 29 2009
* Added signature to Acunetix WVS: August 14 2009
* Re-reported by September 27 2009
* Vendor claims it was fixed on 5.2.11: September 29 2009
* Re-re-reported by after checking 5.2.11: October 16 2009
* Published October 16 2009

You can check the bug here:

In reality there are several vulns in just a couple of lines, so I'll describe them here:
1.- Overlong UTF-8:
As REQUIRED by UNICODE 3.1, and noted in the Unicode Technical Report #36, UTF-8 is forbidden to interpretate a character's non-shortest form.

VULN: PHP makes no checks whatsoever on this matter.

Why is this a vulnerability?

A filter (such as addslashes, htmlentities, escapeshellarg, etc.) will NOT be able to detect&escape such byte sequences, and so an application that relies on them for security checks wont be protected at all. Because it allows an attacker to encode "dangerous" chars, such as ', ", <, ;, &, \0 in different ways:

' = %27 = %c0%a7 = %e0%80%a7 = %f0%80%80%a7
" = %22 = %c0%a2 = %e0%80%a2 = %f0%80%80%a2
< = %3c = %c0%bc = %e0%80%bc = %f0%80%80%bc
; = %3b = %c0%bb = %e0%80%bb = %f0%80%80%bb
& = %26 = %c0%a6 = %e0%80%a6 = %f0%80%80%a6
\0= % 00 = %c0%80 = %e0%80%80 = %f0%80%80%80

Use hackvertor to generate them.

Enabling attacks on systems that use addslashes for example (but almost all encoding functions would be vulnerable):

// add slashes!
foreach($_GET as $k=>$v)$_GET[$k]=addslashes("$v");

//  .... some code ...

// $name is encoded in utf8
mysql_query("SELECT * FROM table WHERE name='$name';");


2.- Ill formed sequences:
As REQUIRED by UNICODE 3.0, and noted in the Unicode Technical Report #36, if a leading byte is followed by an invalid successor byte, then it should NOT consume it.

VULN: PHP will consume invalid bytes.

Why is this a vulnerability?

It will allow an attacker to "eat" controll chars. For example:

// htmlentities
foreach($_GET as $k=>$v)$_GET[$k]=htmlentities("$v",ENT_QUOTES);

//  ... some code ...


//  ... some code ...

$profileImage="<img alt=\"Photo of $name\" src=\"http://$url\" />";

// ... some code ...
echo utf8_decode($profileImage);

A request such as:


Will execute the code "alert(1)" when the page loads.

Note that htmlpurifier does a utf8_decode function call at the end of the decoding, BUT they are safe because of a pre-encoding made by htmlpurifier.. other codes that do the same wont be so lucky.

Bogdan Calin from Acunetix WVS described a couple of other potential attack scenarios:

Where an attacker could fool the filter by doing a request like:


Where an attacker could fool the filter by doing a request like:

3.- Integer overflow:
Unsigned short has a size of 16 bits (2 bytes), that is UNCAPABLE of storing unicode characters of 21 bits, and represented on UTF with 4 bytes (1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx). PHP attempts to sum a 21 bits value to a 16 bits-size variable, and then makes no checks on the value.

The affected code follows:

//  php/ext/xml/xml.c#558
PHPAPI char *xml_utf8_decode(    //  ...
    int pos = len;
    char *newbuf = emallo    //  ...
    unsigned short c;          // sizeof(unsigned short)==16 bits
    char (*decoder)(unsig    //  ...
    xml_encoding *enc = x    //  ...
//  ...
//  #580
    c = (unsigned char)(*s);
    if (c >= 0xf0) {         /* four bytes encoded, 21 bits */
        if(pos-4 >= 0) {
            c = ((s[0]&7)<<18) | ((s[1]&63)<<12) | ((s[2]&63)<<6) | (s[3]&63);
        } else {
            c = '?';   
        s += 4;
        pos -= 4;
//  ...

The relevant part of the code is of course, the declaration of c as an unsigned int, the comment specifing that the char is 21 bits, and this:
x= ((s[0]&7)<<18) | ...

s[0]&7<<18 means it will move 3 bits, 18 bits to the right. As we noted before.. c's size is only 16 bits.
(xxxx xxxx & 0000 0111) << 18

Also, this part:
...  ((s[1]&63)<<12) | ...

s[1]&63<<12 means it will move 6 bits, 12 bits to the right. So, 2 bits are going to be lost.
(xxxx xxxx & 0011 1111) << 12

This allows us to make something even more interesting.

Code like this:

%FF%F0%40%FC that is invalid unicode, overlong, and all you want (definatelly NOT valid), will be casted as a "lower than" simbol (<).

This besides the already mentioned problems, and the possibility of bypassing quite a lot of WAFs and Filters.. demonstrate the problem of a bad unicode implementation on PHP.

I hope the PHP development team acknowledges all this issues that have been reported before, and were explained some months ago on Blackhat USA (and the developers were noticed to check the ppt more than once), and now are explained yet another time.

This was fixed on 5.2.11 :) on my birthday!! Sept 17

Anyway.. that's not all, now to finish this post I want to publish a overlong utf-8 exception on Firefox (actually, Mozilla's).

The firefox one

Firefox is supposed to consider the non-shortest form exception (point #1 in the PHP vulnerabilities), and section 3.1 of the Unicode Technical Report #36 but apparently there's a flaw on it. This is specially problematic for the reasons that an overlong unicode sequence not taken into consideration may allow several types of filter bypasses.

Anyway, the severity of this vulnerability is not as high as the PHP ones, but is worth mentioning. The following non-shortest form for the char U+1000:
0xF0 0x81 0x80 0x80

is allowed, as well as the correct shortest form:
0xE1 0x80 0x80

Note that this problem is only present on the 4 bytes representation.

You can track this bug at:

Anyway, that's all! Thanks for your time :)