What the Google Wireless Transcoder (GWT) does is pretty simple: it gets HTML code and translates it into XHTML Mobile compliant code.
The way it works is a little mysterious (it's made in Java, b.t.w.). It could be something similar to HTML Purifier (this was pointed out by .mario), but I would say that it works as a server-side browser that generates valid XHTML code by reconstructing the DOM (which in fact is not very difficult to do). I think this because some of its errors are very similar to those of other Java browsers, like jrex, or of jakarta (the GWT supports ftp://, gopher:// and http://, among others). For example:
- http://www.google.com/gwt/n?u=a:@
- Host name may not be null
http://jakarta.apache.org/commons/httpclient/xref/org/apache/commons/httpclient/HttpHost.html
So this makes me believe that they use (at least) HttpHost.java.
They also use BASE64DecoderStream.java.
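To show why I say that rebuilding valid markup from tag soup is not very difficult, here is a minimal sketch in Python. It is only my guess at the general idea, not how GWT actually does it: the class name XhtmlRebuilder, the helper rebuild() and the small VOID set are all made up for this example, and a real transcoder would handle far more cases.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Hypothetical, incomplete list of elements that never take a closing tag.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class XhtmlRebuilder(HTMLParser):
    """Re-emits sloppy HTML as well-formed markup (illustration only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the rebuilt document
        self.stack = []  # elements that are currently open

    def handle_starttag(self, tag, attrs):
        # Quote every attribute; valueless ones (e.g. "checked") repeat their name.
        attr_text = "".join(
            " %s=%s" % (name, quoteattr(value if value is not None else name))
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append("<%s%s />" % (tag, attr_text))
        else:
            self.out.append("<%s%s>" % (tag, attr_text))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Drop closing tags for void elements or for elements never opened.
        if tag in VOID or tag not in self.stack:
            return
        # Close everything opened after the matching element, then the element.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def close(self):
        super().close()  # flushes any buffered text
        # Close whatever the page author forgot to close.
        while self.stack:
            self.out.append("</%s>" % self.stack.pop())


def rebuild(html):
    parser = XhtmlRebuilder()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling rebuild("<p>Hi <b>there<br>friend") closes the forgotten tags and self-closes the br, which is roughly the kind of cleanup a transcoder has to do before it can serve XHTML.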
Anyway, there are some other errors, like this one:
- http://www.googlr.com/gwt/n?u=javascript:123
- XML Parsing Error: no element found
(This one uses googlr.com, which is a mirror of google.com, to avoid the session generated at google.com.)
We can also see that GWT can be used as a "redirector", like:
http://www.google.com/gwt/n?u
note the _gwt_pg
We can also temporarily host images: we just need to open, through GWT, any website that contains images (like google.com).
http://www.google.com/gwt/n?u=www.google.com
and the logo will have a URL similar to:
http://www.google.com/gwt/i?i=01F8441E4_F9610322_4DB7F91D
Another interesting thing that RSnake pointed out is that these "internal proxies" are "logically separated from their internal addresses."
Anyway, I found it very interesting that:
http://www.google.com/gwt/n?u
http://www.google.com/gwt/n?u=gopher://unexistent
return something different from:
http://www.google.com/gwt/n?u
http://www.google.com/gwt/n?u
http://www.google.com/gwt/n?u
This happens even though local.sirdarckcat.net and localhost (supposedly) point to 127.0.0.1, while localhostABCD doesn't. Why is gopher://unexistent treated differently from gopher://localhostABCD? Maybe it's a way to prevent an attacker from reaching 127.0.0.1.
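If the different responses really come from a loopback filter, the check on the proxy side could look something like the sketch below. This is pure speculation on my part: the function name resolves_to_loopback and the injectable resolve parameter are made up for the example, and I have no idea what GWT actually runs.

```python
import ipaddress
import socket


def resolves_to_loopback(host, resolve=socket.gethostbyname):
    """Return True if host resolves to a loopback address such as 127.0.0.1.

    resolve is injectable (hypothetical design choice) so the check can be
    tried without touching real DNS.
    """
    try:
        address = ipaddress.ip_address(resolve(host))
    except (OSError, ValueError):
        # Unresolvable name or malformed answer: not loopback as far as we can tell.
        return False
    return address.is_loopback
```

With a check like this, localhost and local.sirdarckcat.net would land in one bucket (loopback) while localhostABCD and unexistent would land in another (unresolvable), so it would not by itself explain every difference between the responses.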
We could try to enumerate the "alive" hosts with local.sirdarckcat.net:port#, but as far as I tested, all ports return the same thing.
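The comparison I did by hand can be sketched like this. The helper names probe_url and group_ports_by_response are invented for the example, percent-encoding the u parameter is my own assumption (the URLs above pass it raw), and fetch is whatever function you supply to download a URL and return its body as bytes.

```python
import hashlib
from collections import defaultdict
from urllib.parse import quote

# Endpoint taken from the URLs above.
GWT_ENDPOINT = "http://www.google.com/gwt/n?u="


def probe_url(host, port, scheme="http"):
    """Build the GWT URL that asks the transcoder to fetch host:port."""
    target = "%s://%s:%d/" % (scheme, host, port)
    return GWT_ENDPOINT + quote(target, safe="")


def group_ports_by_response(host, ports, fetch):
    """Group ports whose responses are byte-for-byte identical.

    fetch(url) -> bytes is supplied by the caller (hypothetical interface).
    """
    groups = defaultdict(list)
    for port in ports:
        body = fetch(probe_url(host, port))
        groups[hashlib.sha256(body).hexdigest()].append(port)
    return list(groups.values())
```

If every port ends up in a single group, as happened in my tests, the responses tell you nothing about which ports are open.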
Something else that was discovered is that GWT parses data URIs:
http://www.google.com/gwt/n?u
Pretty amazing: it's the first web proxy (that I've seen) that actually parses them.
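For anyone who has not played with data URIs, this is more or less what "parsing them" means. The sketch follows the data:[<mediatype>][;base64],<data> layout from RFC 2397; the function name parse_data_uri is mine, and it says nothing about how GWT does it internally (although a base64 decoder like the BASE64DecoderStream mentioned earlier would fit).

```python
import base64
from urllib.parse import unquote_to_bytes


def parse_data_uri(uri):
    """Split a data: URI into (media_type, decoded_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("missing ',' between header and data")
    is_base64 = header.endswith(";base64")
    if is_base64:
        header = header[:-len(";base64")]
    # RFC 2397 default when no media type is given.
    media_type = header or "text/plain;charset=US-ASCII"
    raw = unquote_to_bytes(payload)  # undo %XX escapes
    data = base64.b64decode(raw) if is_base64 else raw
    return media_type, data
```

For example, parse_data_uri("data:text/plain;base64,aGVsbG8=") gives back the media type text/plain and the bytes for "hello".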
To finish, I think that GWT is a great tool with a lot of features (some of them hidden from naive eyes). I think it should be investigated more deeply (for example, the impact of using GWT as an SEO technique, using GWT's PageRank as an inbound link to your site).
Greetz!!