Sunday, October 04, 2015

Range Responses: Mix, Match & Leak

Hey!

The videos from AppSec 2015 are now online, including the Service Workers talk. This post covers another slide from that presentation: Range Requests / Responses (more commonly known as Byte Serving) combined with Service Workers.

As it turns out, you can read content cross-domain with this attack, and this post explains how that works. Range Responses are essentially normal HTTP responses that only contain a subset of the body. It works like this:
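As a rough illustration of byte serving, here is a minimal sketch of a server-side helper that answers a `Range` request from an in-memory body. The name `serveRange` and the plain-object return shape are my own assumptions for the example (nothing here comes from the talk), and it only handles the simple `bytes=start-end` and `bytes=start-` forms.

```javascript
// Hypothetical helper: serve one byte range of an in-memory body.
// `body` is a string of single-byte (ASCII) characters, so length == byte count.
function serveRange(body, rangeHeader) {
  const total = body.length;
  const match = /^bytes=(\d+)-(\d*)$/.exec(rangeHeader || '');
  const start = match ? Number(match[1]) : 0;
  const hasEnd = Boolean(match) && match[2] !== '';

  // No Range header, or one we can't use: send the whole body as a normal 200.
  if (!match || (hasEnd && Number(match[2]) < start)) {
    return {status: 200, headers: {'content-length': String(total)}, body};
  }
  // The range starts past the end of the body: 416 Range Not Satisfiable.
  if (start >= total) {
    return {status: 416, headers: {'content-range': `bytes */${total}`}, body: ''};
  }
  // An open-ended range ("bytes=3-") runs to the last byte.
  const end = hasEnd ? Math.min(Number(match[2]), total - 1) : total - 1;
  return {
    status: 206, // Partial Content
    // Content-Range carries both the slice served and the TOTAL length.
    headers: {'content-range': `bytes ${start}-${end}/${total}`},
    body: body.slice(start, end + 1),
  };
}

const partial = serveRange('FooBar', 'bytes=0-2');
console.log(partial.status, partial.headers['content-range'], partial.body);
```

The part that matters for the rest of this post is the number after the slash in `Content-Range`: it is the sender's claim about the total size of the resource.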

It's relatively straightforward! With service workers in the mix, you can do similar things:

What happens there is that the Service Worker intercepts the first request and gives you LoL instead of Foo. This works cross-origin because Service Workers apply to the page that created the request (say, if evil.com embeds a video from apple.com, the service worker from evil.com can intercept the request to apple.com). Note also that the Service Worker controls the total length of the response: even if the actual response is 2MB, if the Service Worker says it's 100KB, the browser will believe the Service Worker (it seems browsers respect the first response size they see, in this case the one from the service worker).

This all started when I noticed that you could split a response in two: one part generated by a Service Worker and one that isn't. To clarify, the code below intercepts a request to a song. What's interesting is that it serves part of the response (the first 3 bytes) from the Service Worker.

Another interesting aspect is that the size of the file is truncated to 5,000 bytes, because the browser remembers the first length it was given.

The code for this (before the bug was patched by Chrome) was:
  // Basic truncation example
  if (url.pathname == '/wikipedia/ru/c/c6/Glorious.ogg') {
    if (e.request.headers.get('range') == 'bytes=0-') {
      console.log('responding with fake response');
      e.respondWith(
        new Response(
          'Ogg', {status: 206, headers: {'content-range': 'bytes 0-3/5000'}}))
    } else {
      console.log('ignoring request');
    }
    return;
  }
So, to reiterate, the Service Worker responds with "Ogg" as the first three bytes and truncates the response to 5,000 bytes. Alright, so what can you do with this? Essentially, you can partially control audio/video responses, but it's unlikely this can cause a lot of problems, right? I mean, so what if you can make a bad song last 1 second rather than 30 seconds?

To exploit this we need to take a step back and see the data as the video decoder sees it. In essence, the video decoder doesn't know anything about range responses; it just sees a stream of bytes. Imagine if the video that the decoder sees had the following format:

  • [Bytes 0-100] Video Header
  • [Bytes 101-200] Video Frame
  • [Bytes 201-202] Checksum of Video Frame
But the header and the frame (in ORANGE) come from the Service Worker, and the checksum (in RED) comes from the target/victim site. A clever attacker would send up to 65,536 different video frames (one per possible 2-byte checksum) until one of them doesn't error out, and then we would know the value of bytes 201 and 202 of the target/victim site.
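To make the idea concrete, here is a small simulation of that hypothetical format. Everything in it is an assumption for illustration: the checksum (`toyChecksum`, a sum of 16-bit words) is made up, `decodes` is a stand-in for a real decoder that only reports success or failure, and no real container is known to behave this way.

```javascript
// Toy layout from the list above: bytes 0-100 header, 101-200 frame,
// 201-202 checksum of the frame. NOT a real video format.
const HEADER_LEN = 101;
const FRAME_LEN = 100;

// Hypothetical checksum: sum of the frame's 16-bit big-endian words, mod 2^16.
function toyChecksum(frame) {
  let sum = 0;
  for (let i = 0; i < frame.length; i += 2) {
    sum = (sum + ((frame[i] << 8) | frame[i + 1])) & 0xffff;
  }
  return sum;
}

// Stand-in for the decoder: it sees one byte stream and reports only ok/error.
function decodes(stream) {
  const frame = stream.slice(HEADER_LEN, HEADER_LEN + FRAME_LEN);
  const stored = (stream[HEADER_LEN + FRAME_LEN] << 8) | stream[HEADER_LEN + FRAME_LEN + 1];
  return toyChecksum(frame) === stored;
}

// Attacker side: header and frame are ours, the last two bytes are the victim's.
function leakTwoBytes(victimBytes) {
  for (let guess = 0; guess <= 0xffff; guess++) {
    const stream = new Uint8Array(HEADER_LEN + FRAME_LEN + 2);
    // The frame's first word carries the guess and the rest is zero,
    // so toyChecksum(frame) === guess.
    stream[HEADER_LEN] = guess >> 8;
    stream[HEADER_LEN + 1] = guess & 0xff;
    // Bytes 201-202 are never seen by the attacker, only by the decoder.
    stream.set(victimBytes, HEADER_LEN + FRAME_LEN);
    if (decodes(stream)) return guess; // the one frame that did not error out
  }
  return null;
}

console.log(leakTwoBytes([0x0a, 0x44]).toString(16));
```

The attacker never reads the victim's bytes directly; the only signal is which of the 65,536 frames decodes without an error.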

I searched for a while for a video format or container with this property, but unfortunately didn't find one. To be honest, I didn't look too hard as it's really confusing to read these specs but essentially after like 1 hour of searching and scratching my head I gave up, and decided to do some fuzzing instead.

The fuzzing was essentially like this:
  1. Get a sample of video and audio files.
  2. Load them in audio/video tags.
  3. Listen to all events.
  4. Fetch the video from length 0-X, and claim a length of X+1.
  5. Have the last byte be 0xFF.
  6. Repeat
Whenever it found a difference between the response on 0-X and 0-X+1, that meant there was an info leak! I did this, and after a few hours (and after wasting my time looking at MP3 files, which seem to give out an approximate duration at random) I found a candidate. Turns out it wasn't a checksum, it was something even better! :)
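The loop above could be sketched roughly like this. This is my reading of the steps, not the original fuzzer: `observe` is a hypothetical callback that stands for "load these bytes in an audio/video tag and summarize the events that fire", and the comparison is between the genuine truncated file and the same prefix with its last byte forced to 0xFF.

```javascript
// Returns the offsets X where changing only the last byte of a file truncated
// to X+1 bytes changes what the media element reports.
function findSensitiveOffsets(sample, observe) {
  const hits = [];
  for (let x = 1; x < sample.length; x++) {
    const genuine = sample.slice(0, x + 1); // bytes 0..X of the real file
    const probe = Uint8Array.from(genuine);
    probe[x] = 0xff;                        // same prefix, last byte forced to 0xFF
    // A different outcome means the element's behavior depends on the value
    // of byte X, which is the kind of info leak we are looking for.
    if (observe(genuine) !== observe(probe)) hits.push(x);
  }
  return hits;
}

// Fake "media element" for demonstration: errors when byte 5 is above 0x29.
const fakeObserve = (bytes) => (bytes.length > 5 && bytes[5] > 0x29 ? 'error' : 'ok');
console.log(findSensitiveOffsets(new Uint8Array(10), fakeObserve));
```

In a real run `observe` would have to be asynchronous and listen to every media event, which is where most of the hours go.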

So, I found that a specific Encrypted WebM video errors out when truncated to 306 bytes if the last byte is greater than 0x29, but triggers an onencrypted event when it's less than that. This seems super weird, but that's a good start, I guess. I created a proof of concept, where I tested whether the byte in a cross-origin resource is less than 0x29 or not. If you have a vulnerable browser (Chrome <=44) you can see this proof of concept here:
If you compare both pages you will see that one says PASS and the other says FAIL. The reason is that position 306 of the robots.txt of www.bing.com holds a newline, while position 306 of the robots.txt of www.yahoo.com holds a letter.

Good enough you might think! I can now know whether position 306 is a space or not. Not really very serious, but serious enough to be cool, right? No. Of course not.

First of all, I wanted to know how to make this apply to any offset, not just 306. It was easy enough to just create another EBML object of arbitrary size. Then that's about it! You just make the size longer and longer. So first problem solved. Now you can change the byte offset you are testing.
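One way to build such a filler is sketched below. The post doesn't say which EBML element was used, so this is an assumption on my part: it uses a Void element (ID 0xEC), which EBML defines as padding, and always encodes the size as a 2-byte variable-length integer to keep the arithmetic simple. The function name `voidElement` is hypothetical.

```javascript
// Build an EBML Void element that occupies exactly `totalLength` bytes:
// 1 byte of ID + 2 bytes of size + (totalLength - 3) bytes of zero padding.
function voidElement(totalLength) {
  const payload = totalLength - 3;
  // A 2-byte size holds 14 bits; the all-ones value is reserved.
  if (payload < 0 || payload > 0x3ffe) {
    throw new RangeError('this sketch supports total lengths from 3 to 16385');
  }
  const bytes = new Uint8Array(totalLength); // zero-filled padding
  bytes[0] = 0xec;                           // Void element ID
  bytes[1] = 0x40 | (payload >> 8);          // 2-byte size marker (01xxxxxx) + high bits
  bytes[2] = payload & 0xff;                 // low 8 bits of the size
  return bytes;
}

// Growing the filler by N bytes moves the tested offset N bytes further along.
console.log(voidElement(303).length);
```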

But still, the only thing I can know is whether the character at that position is a space or not. So I started looking into why that errored out, and it turns out the dimensions of the video were too big. There is a limit on the pixel size of a video for some reason, and the byte at position 306 happened to be read as the video height, which was larger than the valid maximum.

So, well.. now that we know that, can we learn the exact value? What if we tried to load the video 256 times, each time with a different width value, so that it overflows whenever the size is too big? The formula the video decoder was using for calculating the maximum dimensions is:


So! Seems easy enough then. I get the minimum value at which it errors out and calculate its complement to INT_MAX/8 (minus 128), and that's the value of the byte at that position. Since this is a "less than" comparison, you can find this minimum value with a simple binary search.
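The binary search part can be sketched against a stand-in oracle. To be clear about the assumptions: `TOY_LIMIT` and the check `width + secretByte > TOY_LIMIT` are invented for this example and are not the decoder's real formula; the only property borrowed from the real bug is that the error is monotonic in the attacker-controlled width.

```javascript
const TOY_LIMIT = 1000; // hypothetical limit, not the decoder's real one

// Stand-in for "load the video with this width and see whether it errors out".
function makeToyOracle(secretByte) {
  return (width) => width + secretByte > TOY_LIMIT;
}

// Find the minimum width that errors out, then derive the hidden byte from it.
function recoverByte(errorsOut) {
  let lo = TOY_LIMIT - 254; // smallest possible threshold (secret byte = 255)
  let hi = TOY_LIMIT + 1;   // largest possible threshold (secret byte = 0)
  let probes = 0;
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    probes++;
    if (errorsOut(mid)) {
      hi = mid;     // mid already errors: the threshold is at or below mid
    } else {
      lo = mid + 1; // mid is fine: the threshold is above mid
    }
  }
  // In the toy model the threshold is TOY_LIMIT - secret + 1.
  return {value: TOY_LIMIT + 1 - lo, probes};
}

console.log(recoverByte(makeToyOracle(0x0a)));
```

With 256 candidate values the search always takes 8 probes per byte, compared with up to 256 loads for the naive approach.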

And that's it. I made a slightly nicer exploit here, although the code is quite unreadable. The PoC will try to steal the robots.txt of www.bing.com byte by byte.

Severity-wise, fortunately, this isn't an end-of-the-world situation. Range Responses (or Byte Serving) don't actually work on any dynamically generated content I found online. This is most likely because dynamically generated content can change between requests, so requesting it in chunks doesn't make sense.

And that's it for today! It's worth noting that this may not be specific to Service Workers, as it seems that HTTP redirects have a similar effect (you make two requests to a same-origin video; the first time you respond with a truncated video, the second time you redirect to another domain).

This matters especially since Service Workers aren't widely supported yet. Feel free to write a PoC for that (you can use this for inspiration). And once the bug is public, you can follow the bug discovery here and the spec discussion here.

The bug is fixed in Chrome, but I didn't test other browsers as thoroughly for lack of time (and this blog post is already delayed by about 3 months because of Chrome, so I didn't want to embargo it further). If you do find a way to exploit this elsewhere, please let me know the results, and I'm happy to help if you get stuck.

Have a nice day!