Thursday, October 15, 2009

A couple of unicode issues on PHP and Firefox

Well, here I am developing ACS, finding that this project resembles at some degree the creation of a browser.. but anyway, it's close to a working beta (yay!).

In any case, a couple of bugs came to my attention, some of them are public, some of them are not.

First of all, I want to describe the PHP vulnerability I made public on my presentation with David Lindsay, at Blackhat USA 2009, that apparently only Chris Weber, Giorgio Maone (creator of NoScript), Mario Heiderich (creator of PHP-IDS) and the Acunetix security team have realized the danger of it.

It has been reported, well, more than enough times to the PHP team (I made another attempt today, hoping this will get fixed in some time soon.. if at all). This issue affects all PHP versions Mario Heiderich and me could test, and endangers practically all PHP programs that use the utf8_decode() function for decoding (as recommended by OWASP guidelines).

The disclosure timeline follows:
* Reported by root@80sec.com: May 11 2009
* Discovered by webmaster@lapstore.de: June 19 2009
* Discovered by Giorgio Maone / Eduardo Vela: July 14 2009
* Reported and Fixed on PHPIDS: July 14 2009
* Microsoft notified of a XSS Filter bypass: July 14 2009
* Fixed XSS Filter bypass on NoScript 1.9.6:  July 20 2009
* Vulnerability disclosed on BlackHat USA 2009: July 29 2009
* Added signature to Acunetix WVS: August 14 2009
* Re-reported by sird@rckc.at: September 27 2009
* Vendor claims it was fixed on 5.2.11: September 29 2009
* Re-re-reported by sird@rckc.at after checking 5.2.11: October 16 2009
* Published sirdarckcat.blogspot.com: October 16 2009

You can check the bug here:
http://bugs.php.net/bug.php?id=49687

In reality there are several vulns in just a couple of lines, so I'll describe them here:
1.- Overlong UTF-8:
As REQUIRED by UNICODE 3.1, and noted in the Unicode Technical Report #36, UTF-8 is forbidden to interpretate a character's non-shortest form.
    http://www.unicode.org/reports/tr36/#UTF-8_Exploit

VULN: PHP makes no checks whatsoever on this matter.

Why is this a vulnerability?

A filter (such as addslashes, htmlentities, escapeshellarg, etc.) will NOT be able to detect&escape such byte sequences, and so an application that relies on them for security checks wont be protected at all. Because it allows an attacker to encode "dangerous" chars, such as ', ", <, ;, &, \0 in different ways:

' = %27 = %c0%a7 = %e0%80%a7 = %f0%80%80%a7
" = %22 = %c0%a2 = %e0%80%a2 = %f0%80%80%a2
< = %3c = %c0%bc = %e0%80%bc = %f0%80%80%bc
; = %3b = %c0%bb = %e0%80%bb = %f0%80%80%bb
& = %26 = %c0%a6 = %e0%80%a6 = %f0%80%80%a6
\0= % 00 = %c0%80 = %e0%80%80 = %f0%80%80%80

Use hackvertor to generate them.


Enabling attacks on systems that use addslashes for example (but almost all encoding functions would be vulnerable):

// add slashes!
foreach($_GET as $k=>$v)$_GET[$k]=addslashes("$v");

//  .... some code ...

// $name is encoded in utf8
$name=utf8_decode($_GET['name']);
mysql_query("SELECT * FROM table WHERE name='$name';");

?>


2.- Ill formed sequences:
As REQUIRED by UNICODE 3.0, and noted in the Unicode Technical Report #36, if a leading byte is followed by an invalid successor byte, then it should NOT consume it.
    http://www.unicode.org/reports/tr36/#Ill-Formed_Subsequences

VULN: PHP will consume invalid bytes.

Why is this a vulnerability?

It will allow an attacker to "eat" controll chars. For example:

// htmlentities
foreach($_GET as $k=>$v)$_GET[$k]=htmlentities("$v",ENT_QUOTES);

//  ... some code ...

$name=$_GET['name'];
$url=$_GET['url'];

//  ... some code ...

$profileImage="<img alt=\"Photo of $name\" src=\"http://$url\" />";

// ... some code ...
echo utf8_decode($profileImage);
?>

A request such as:

?name=%90&src=%20onerror=alert(1)%20

Will execute the code "alert(1)" when the page loads.

Note that htmlpurifier does a utf8_decode function call at the end of the decoding, BUT they are safe because of a pre-encoding made by htmlpurifier.. other codes that do the same wont be so lucky.

Bogdan Calin from Acunetix WVS described a couple of other potential attack scenarios:







Where an attacker could fool the filter by doing a request like:
vuln.php?input=%F6%3Cimg+onmouseover=prompt(/xss/)//%F6%3E

And:



Where an attacker could fool the filter by doing a request like:
index.php?username=test%FC%27%27+or+1=1+–+&password=a

3.- Integer overflow:
Unsigned short has a size of 16 bits (2 bytes), that is UNCAPABLE of storing unicode characters of 21 bits, and represented on UTF with 4 bytes (1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx). PHP attempts to sum a 21 bits value to a 16 bits-size variable, and then makes no checks on the value.

The affected code follows:

//  php/ext/xml/xml.c#558
PHPAPI char *xml_utf8_decode(    //  ...
{
    int pos = len;
    char *newbuf = emallo    //  ...
    unsigned short c;          // sizeof(unsigned short)==16 bits
    char (*decoder)(unsig    //  ...
    xml_encoding *enc = x    //  ...
//  ...
//  #580
    c = (unsigned char)(*s);
    if (c >= 0xf0) {         /* four bytes encoded, 21 bits */
        if(pos-4 >= 0) {
            c = ((s[0]&7)<<18) | ((s[1]&63)<<12) | ((s[2]&63)<<6) | (s[3]&63);
        } else {
            c = '?';   
        }
        s += 4;
        pos -= 4;
//  ...

The relevant part of the code is of course, the declaration of c as an unsigned int, the comment specifing that the char is 21 bits, and this:
x= ((s[0]&7)<<18) | ...

s[0]&7<<18 means it will move 3 bits, 18 bits to the right. As we noted before.. c's size is only 16 bits.
(xxxx xxxx & 0000 0111) << 18

Also, this part:
...  ((s[1]&63)<<12) | ...

s[1]&63<<12 means it will move 6 bits, 12 bits to the right. So, 2 bits are going to be lost.
(xxxx xxxx & 0011 1111) << 12

This allows us to make something even more interesting.

Code like this:

%FF%F0%40%FC that is invalid unicode, overlong, and all you want (definatelly NOT valid), will be casted as a "lower than" simbol (<).

http://eaea.sirdarckcat.net/xss.php?unicode&html_xss=%FF%F0%40%FC

This besides the already mentioned problems, and the possibility of bypassing quite a lot of WAFs and Filters.. demonstrate the problem of a bad unicode implementation on PHP.

I hope the PHP development team acknowledges all this issues that have been reported before, and were explained some months ago on Blackhat USA (and the developers were noticed to check the ppt more than once), and now are explained yet another time.

This was fixed on 5.2.11 :) on my birthday!! Sept 17

Anyway.. that's not all, now to finish this post I want to publish a overlong utf-8 exception on Firefox (actually, Mozilla's).

The firefox one


Firefox is supposed to consider the non-shortest form exception (point #1 in the PHP vulnerabilities), and section 3.1 of the Unicode Technical Report #36 but apparently there's a flaw on it. This is specially problematic for the reasons that an overlong unicode sequence not taken into consideration may allow several types of filter bypasses.

Anyway, the severity of this vulnerability is not as high as the PHP ones, but is worth mentioning. The following non-shortest form for the char U+1000:
0xF0 0x81 0x80 0x80

is allowed, as well as the correct shortest form:
0xE1 0x80 0x80

Note that this problem is only present on the 4 bytes representation.

You can track this bug at:
https://bugzilla.mozilla.org/show_bug.cgi?id=522634

Anyway, that's all! Thanks for your time :)

Greetings!!

7 comments:

  1. Those are interesting issues, but tbh, I wouldn't blame PHP for introducing vulnerabilities, doing decoding *after* escaping (or anything security related) is just asking for worlds of security pain, see half-width characters, etc.

    ReplyDelete
  2. yeah, I agree that this is not exactly a PHP-core vuln.. but small things like this in webapps make stuff easy to break.. just to name one, the xss filters (like webkit, and ie) wont catch this.. at least noscript will.. haha

    ReplyDelete
  3. Firefox's bug 522634 is already fixed on Trunk (see also: bug 511859).

    ReplyDelete
  4. It is not good to use decodeURI() to test Firefox's internal UTF-8 decoder.

    Because UTF-8 decoder of decodeURI and decodeURIComponent is dependent from Firefox's internal UTF-8 decoder.

    ReplyDelete
  5. >Because UTF-8 docoder of ... is dependent from ...
    Oops , what I want to say, is
    Decoder used in decodeURI and decodeURIComponent is INDEPENDENT from ...

    ReplyDelete
  6. Hi!

    1. On the second example, on the query string, what does %90 is for? to masquerade as UTF-8?
    2. In the same example, the query string shouldn't be url= instead of src=
    3. I'm using PHP 5.2.8 and the function htmlentities() converts %22 into " entity.
    4.

    ReplyDelete
  7. Firefox3.6 is released , and it contains fix of this problem.

    ReplyDelete