After some recent discussion concerning the use of document.write() in XHTML documents served with the doctype “application/xhtml+xml” I decided to revisit the problem. An issue with the solutions proposed by Sam and Ajaxian is that they aren’t really solutions – just a lot of hand waving (not that that’s bad, it’s just that the problem is a lot harder than what they propose).
So I sat down and decided to write a semi-complete document.write() replacement for Firefox 1.5+, Opera 9, and Safari 2+ – all handling straight XHTML documents served with an “application/xhtml+xml” content-type.
Note: I completely ignore Internet Explorer. Since IE doesn’t even know how to render XHTML pages (served with the correct mimetype), I’m assuming that you’re doing some form of browser sniffing in your code (on the server). If that’s the case, then you may already be serving a different version of the page, one that doesn’t include the (at this point) unnecessary document.write() hack. If you want to serve only one version of the code, then I suggest that you use conditional comments, or do some client-side browser sniffing to serve the hack to those that need it. (This is mostly because I have yet to find a way to reliably detect a broken document.write() implementation.)
I had a couple of goals for my solution:
- It should be as faithful to the normal document.write() as possible. (This means arbitrary injection of XHTML into the DOM)
- It should inject the XHTML into the document at the current DOM position.
- It should correct for basic weird things that people do (like using write to add invalid XHTML to a document – and writing out closing tags). Stuff like this:
document.write("<iframe src='test.html'>");
// ... some code ...
document.write("</iframe>");
- It should make Google Adsense work, with no code modification.
I’ll start by saying that solving this problem in Mozilla “isn’t that bad”, nor is it in Opera. Safari is a royal PITA, which I’ll talk about more later.
The vast majority of the cross-browser issues that occur relate to how innerHTML works in XHTML documents. In order to make document.write() work as you would expect it to, you need to write out straight (X)HTML. This topic has been discussed extensively by some of the great JavaScript and standards developers in the industry.
A Solution
So I’ve developed a basic solution to the document.write()/XHTML problem. The full code can be found below, along with a demo of it in action here:
https://johnresig.com/apps/write.xhtml
document.write = function(str){
    var moz = !window.opera && !/Apple/.test(navigator.vendor);
    // Watch for writing out closing tags, we just
    // ignore these (as we auto-generate our own)
    if ( str.match(/^<\//) ) return;
    // Make sure & are formatted properly, but Opera
    // messes this up and just ignores it
    if ( !window.opera )
        str = str.replace(/&(?![#a-z0-9]+;)/g, "&amp;");
    // Watch for when no closing tag is provided
    // (Only does one element, quite weak)
    str = str.replace(/<([a-z]+)(.*[^\/])>$/, "<$1$2></$1>");
    // Mozilla assumes that everything in XHTML innerHTML
    // is actually XHTML – Opera and Safari assume that it’s XML
    if ( !moz )
        str = str.replace(/(<[a-z]+)/g, "$1 xmlns='http://www.w3.org/1999/xhtml'");
    // The HTML needs to be within a XHTML element
    var div = document.createElementNS("http://www.w3.org/1999/xhtml","div");
    div.innerHTML = str;
    // Find the last element in the document
    var pos;
    // Opera and Safari treat getElementsByTagName("*") accurately
    // always including the last element on the page
    if ( !moz ) {
        pos = document.getElementsByTagName("*");
        pos = pos[pos.length - 1];
    // Mozilla does not, we have to traverse manually
    } else {
        pos = document;
        while ( pos.lastChild && pos.lastChild.nodeType == 1 )
            pos = pos.lastChild;
    }
    // Add all the nodes in that position
    var nodes = div.childNodes;
    while ( nodes.length )
        pos.parentNode.appendChild( nodes[0] );
};
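As a rough sketch (not part of the code above), the string clean-up steps can be pulled out into a pure function so they can be poked at outside a browser, for example under Node. The name `prepareMarkup` and the `env` flags are made up for illustration, and the sketch assumes that the ampersand step emits the `&amp;` entity and that the closing-tag step appends `</$1>`.

```javascript
// Sketch only: the string clean-up steps from the replacement, with no DOM access.
// `prepareMarkup`, `XHTML_NS` and the `env` flags are hypothetical names.
var XHTML_NS = "http://www.w3.org/1999/xhtml";

function prepareMarkup(str, env) {
  // A closing tag written on its own is dropped entirely.
  if (/^<\//.test(str)) return null;

  // Escape bare ampersands, leaving existing entities alone (skipped for Opera).
  if (!env.opera)
    str = str.replace(/&(?![#a-z0-9]+;)/g, "&amp;");

  // Append a closing tag when the string ends in a tag that is not
  // self-closed (as noted in the original, this check is quite weak).
  str = str.replace(/<([a-z]+)(.*[^\/])>$/, "<$1$2></$1>");

  // Opera and Safari need the XHTML namespace spelled out on every element.
  if (!env.moz)
    str = str.replace(/(<[a-z]+)/g, "$1 xmlns='" + XHTML_NS + "'");

  return str;
}

// Usage: a Mozilla-style run on an unclosed iframe.
var out = prepareMarkup("<iframe src='test.html'>", { moz: true, opera: false });
// out is "<iframe src='test.html'></iframe>"
```

Keeping these steps separate from the innerHTML and node-insertion code makes it easier to see which string each browser branch actually hands to innerHTML.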
It's important to note what this solution does - and does not - work for.
- The code will work perfectly for well-formed XHTML markup. This code only does basic “crappy HTML” checks. For example, if you do: document.write("<img src='foo.jpg'>") it’ll correct it to become XHTML compliant (by appending a closing </img> tag at the end). However, doing document.write("<img src='foo.jpg'> <img src='bar.jpg'>"); will break – as only the last element in the document.write() is “fixed”. (And even then, the fixing isn’t very smart – it just adds a closing tag, which may not always be correct.) Much of this can be fixed with some smarter regular expressions. I took a stab at it, but cross-browser support for variable negative lookaheads seems to be shaky, at best.
- When using innerHTML in an XML document in Opera and Safari, it assumes that all elements are just XML elements. For this reason the code forcefully puts all elements in the XHTML namespace. Again, this is pretty crude and may break some of your markup, but it’s worked well for me so far.
- The only extra purification that’s performed is the conversion of ampersands (&) to their entity code (&amp;) – where appropriate. If you have other symbols (like < or >), then I can’t make any guarantees.
- It’s also interesting to note that two completely different methods of traversing the document had to be used. Mozilla-based browsers start acting really strange when you do getElementsByTagName("*") inline in an XHTML document. It will always work fine for the first document.write(), but all subsequent calls will revert back to the position of the last inline <script/>.
- In the end, this is still not as good as document.write() since with .write() you can write out stuff like table rows, options, partial HTML, script elements, all without blinking an eye. The code to handle all of this is quite significant (having written the code to do it for jQuery, you can take my word for it). I don’t plan on re-writing all of that special-case code, so please only use this solution for simple fixes.
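One way the “only the last element is fixed” limitation could be eased, at least for elements that never have content, is a global pass that self-closes them. This is only a sketch under assumptions: `closeVoidElements` and the tag list are made up for illustration, and the pattern will trip over attribute values that contain a literal `>`.

```javascript
// Sketch only: self-close every listed empty element in a string,
// not just the last tag. The tag list here is an illustrative assumption.
var VOID_TAGS = /<(img|br|hr|input|meta|link)\b([^>]*?)\s*\/?>/g;

function closeVoidElements(str) {
  // "<img src='a'>" and "<img src='a'/>" both become "<img src='a' />".
  return str.replace(VOID_TAGS, "<$1$2 />");
}

var fixed = closeVoidElements("<img src='foo.jpg'> <img src='bar.jpg'>");
// fixed is "<img src='foo.jpg' /> <img src='bar.jpg' />"
```

Run before the closing-tag check, a string that ends in one of these tags already ends in `/>`, so that check leaves it alone. It does nothing for the harder cases (table rows, options, partial HTML) mentioned above.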
Ok, so now that that’s out of the way – let’s see how well this works in the different browsers.
Test | Firefox 1.5+ | Opera 9 | Safari 2 | Webkit (Safari 3) |
---|---|---|---|---|
Simple Text Insertion | Pass | Pass | Fail | Pass |
Simple HTML Insertion | Pass | Pass | Fail | Pass |
Google Adsense | Pass | Pass | Fail | Sort-of Fail |
So here’s the dirt on Safari. I spent many hours banging my head against the keyboard and finally admitted defeat in Webkit for Adsense and anything in Safari 2.0. Here are the issues:
- Safari 2.0 completely rejects any attempts to use innerHTML in an XHTML document. It throws exceptions and simply will not let you do it. For this reason, Safari (as it is currently available) is a lost cause.
- Webkit Nightlies (Safari 3.0), on the other hand, fixed the innerHTML problems – allowing it to work nearly flawlessly. You can see on the demo page (in a Webkit Nightly) that the Google Adsense IFrame is inserted into the page – and a URL is even requested – however the Adsense script seems to be fundamentally flawed. Looking at the URL generated for Webkit vs. the URL generated for Firefox or Opera, it is apparent that the Adsense script simply isn’t working correctly. So while, technically, Adsense does not (currently) work in the Webkit Nightly with this hack, it doesn’t seem to be through any fault of mine.
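Since Safari 2.0 throws on the innerHTML assignment, one defensive option is to wrap that single step so a failure can be reported rather than aborting the script. This is a minimal sketch, not something the code above does; `tryInnerHTML` is a hypothetical helper, and what to do when it returns false (skip the write, fall back to something else) is left open.

```javascript
// Sketch only: attempt the innerHTML assignment and report whether it worked.
function tryInnerHTML(element, markup) {
  try {
    element.innerHTML = markup;
    return true;
  } catch (e) {
    // The browser rejected the assignment (as Safari 2.0 is described as doing).
    return false;
  }
}

// Stand-in objects so the sketch can run without a browser.
var okElement = {};
var strictElement = {};
Object.defineProperty(strictElement, "innerHTML", {
  set: function () { throw new Error("innerHTML not allowed here"); }
});

var worked = tryInnerHTML(okElement, "<b>hi</b>");
var rejected = tryInnerHTML(strictElement, "<b>hi</b>");
```

This only detects the exception; it does not make innerHTML work where the browser refuses it.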
In all, this hack was an interesting experience – considering that every browser seems to behave in some sort of nonsensical fashion (in one way or another). I’m glad that there’s, at least, a solution now for two of the major browsers (and possibly the next version of Safari too, after some more tinkering). I was, perhaps, most pleasantly surprised by Firefox’s innerHTML/XHTML implementation. You feed it valid XHTML, it inserts it into the document. Any other value throws an exception. Very simple and logical.
As a side note: I’m going to try and feed some of this code back into jQuery, so that stuff like $(…).append(““) will work as you might expect it to in the major browsers.
It’s pretty obvious that writing XHTML documents with the preferred mimetype is still a ways off from real-world usage, however I’m more hopeful now than I was before – which is good, to say the least.
Sam Ruby (November 12, 2006 at 4:17 pm)
1) If you look at the AdSense code, there is at least one place where it writes out a script element including both the start and closing tag. This will be completely ignored by the implementation of document.write above.
2) For IE, you don’t want to override document.write at all.
3) For Safari, you could always try a DOM parser: http://pagead2.googlesyndication.com/pagead/show_ads.js
Sam Ruby (November 12, 2006 at 4:18 pm)
Oops, ignore #1… didn’t notice the ^.
Andi (November 12, 2006 at 5:58 pm)
http://www.intertwingly.net/blog/2006/11/10/Thats-Not-Write#comments
My solution works in Safari 2 – and includes a decision – overwrite document.write or not?
John Resig (November 12, 2006 at 7:13 pm)
@Sam: Thanks for your comments. I didn’t include IE in this document.write() implementation simply because some form of browser sniffing is going to occur in order to serve the page properly to IE anyway. Therefore, I sort of assumed that either a different version of the site was going to be served – or that conditional comments were going to be used. I guess I could state that more clearly in the original post, as that’s a fairly large assumption.
I also tried messing around with some of the DOMParser stuff, but generally didn’t have much luck. It really seems like the current version of Safari is a dead-end.
@Andi: I’m not sure how you tested your script, but it doesn’t work at all for me:
http://ejohn.org/apps/write2.xhtml
Adsense completely fails in all browsers and nothing works in Safari – in fact, it even crashes the Webkit nightly! I originally did the same try/catch document.write() sniffing that you’ve done, but in the end it didn’t fail as often as it should have to be a reliable test.
Mark Rowe (November 12, 2006 at 8:47 pm)
John,
Can I ask you to please file bug reports () on the issues you have noticed in Safari and WebKit? The crash you mentioned in the nightly build is of particular interest, but bug reports on the other areas that behave incorrectly would be very much appreciated.
Thanks!
John Resig (November 12, 2006 at 9:34 pm)
@Mark: Sure thing. I’m going to see if I can localize the issues. I have a haunting suspicion that some of the issues are derived from how Webkit handles ampersands, in URLs, in innerHTML, in XHTML documents. I’ll try to pump out a couple test cases and post them to the tracker.
Mark Rowe (November 13, 2006 at 2:32 am)
Hrmph, I didn’t notice when I posted my comment but the blog software ate the URL… http://webkit.org/quality/reporting.html has information about the process of filing bugs on WebKit, if you’re not already familiar with it.
Richard (November 21, 2006 at 2:43 pm)
Thanks
Lindsey Simon (December 18, 2006 at 8:49 pm)
Wow, this is great, thanks John!
For anyone curious, this trick also makes it work to load the Google Maps API on the fly, which otherwise fails because of bad tags / document.write. Right now, I have a page with a map and a lot of functionality is waiting on DOMLoad to be ready, but when you have a map, that can take a while – so my page was blocking all other JS functionality on that map api coming down the wire, which kind of sucked. Now, I can lazy load it and then poll for the functionality before initializing the map, meanwhile the rest of the page is “alive”.
Maestro (January 10, 2007 at 5:07 pm)
the use of document.write() in XHTML documents served with the doctype “application/xhtml+xml”
Sorry, didn’t have a chance to sniff HTTP traffic on my server. Are HTTP headers part of XHTML 1.0 Strict specifications, indeed?
I don’t see any problem with AdSense and XHTML 1.0 Strict.
‘document.write’ is not a part of XHTML specs, and not mentioned here tag is not a part of XHTML 1.0 Strict. However, it does not prevent a browser to do some kind of XML Transformation, such as transform XHTML 1.0 Strict-compliant XML to another XML with additional element, and then execute SGML (whatever).
Google does not use ‘document.write’ embedded into XHTML:
All this staff is executed during/after DOM is constructed (and validated against schema), and JavaScript simply modifies DOM (it is possible with any kind of other technology too, such as XSLT, C++, Java, …)
I run successfully AdSense with XHTML 1.0 Strict during a month, sorry for not having a chance to check Mime at HTTP-levels.
Yesterday I had a problem with my new site, IE-6 didn’t show a whole page after I enabled AdSense. However, I didn’t have any problems with Mozilla/Firefox. I am going to enable AdSense again, after some fixes (mostly in CSS).
Common solution with AdSense: I put it inside element.
Maestro (January 10, 2007 at 5:14 pm)
Please put [tag]iframe[…] in some sentences, were removed by your BLOG software in previous comment…
XHTML 1.0 Strict does not allow iFrame tag. However a) Google uses iFrame b) my website is XHTML 1.0 Strict based c) everything works just fine!
(I disabled AdSense temporary, due to CSS fixes; should be done tomorrow)
Fuad Efendi (January 10, 2007 at 10:48 pm)
I even added W3C XHTML 1.0 banner at bottom.
Tokenizer dot Org – faceted browsing with automated categorization (experimental).
XHTML 1.0 Strict works fine with AdSense (and it worked fine in September for another site, bambarbia dot com). (BTW, 1.0 strict does not allow <iframe> element)
All discussions around XHTML and IFRAME (and AdSense, and document.write, …) do not use correct wording, so we are in a loop. HTTP Headers and XHTML DTDs – from different operas.
JavaScript can transform XML, and IFRAME will be added to DOM during XHTML (correct word: XML) events, after XML validation, etc.
Thanks
bindon (February 11, 2007 at 6:12 am)
Thanks, John. That’s really helpful.
vkaryl (February 19, 2007 at 12:17 am)
Wow. Thanks, I just spent all day digging through abstruse documents and articles trying to figure out what to do about this (the problem with content negotiation, the xml prolog, and AdSense that is) and I finally found this page (from google, after I got through reading tons of stuff that didn’t seem to help!)
Johnny (February 28, 2007 at 3:17 pm)
This works with XHTML, but it still doesn’t work with XSLT in Mozilla. Does it?
Obi (March 1, 2007 at 3:24 pm)
Hello:
I am really in a bind. I need to AJAXify a web page that contains a charting component (ILOG). Unfortunately, charting component uses document.write() to write some divs and tags so I am having problems loading the charting component (mainly javascript) via AJAX.
Any ideas, please!!!!
Regards,
–Obi
PS: Site is an internal one in a company where 99% of users have IE
Bowker (March 16, 2007 at 12:14 am)
Your code is pretty slick for replacing document.write. I’m having trouble with absolutely positioned items. My guess is these objects are not placed in the same area in the DOM (but that’s totally a guess.)
Any clue how to get that to work?
Brandon (April 16, 2007 at 6:17 pm)
Thanks for this. Just changed over to application/xhtml+xml today and wondered what happened to my ads. Fortunately you turned up high on the search results and saved the day.
zhao-zhuxi (June 24, 2007 at 11:37 am)
Thank You! Works perfect! I have been looking for the solution for so long… Thanks again!
Bambarbia Kirkudu (July 17, 2007 at 4:31 pm)
I repeat again and again…
No any problems with IFRAME-based AdSense and XHTML 1.0 Strict, all these articles and workarounds are just… copywritings from wrong places.
Browser parses XHTML and creates a DOM model. At this stage it does not know yet anything about IFRAME and AdSence. Then it executes JavaScript. It creates a widget associated with IFRAME, inserts it into DOM, and runs it.
That’s it. Simple. Without any JavaScript expertise.
1. XHTML 1.0 Strict does not allow iframe
2. DOM is not the same as XHTML!
All such writings were caused because so many people think about plain XML as of smth which can be seen in a browser window, but it is not true. XML contains definitions of controls/widgets, and JavaScript can add additional control even if it is iframe.
Brandon: “Just changed over to application/xhtml+xml”
application/xhtml+xml is a MIME type, it is Transport-related. I have text/html as a MIME type. And XHTML 1.0 Strict as a content type.
People, please, do not mix things from different operas! Read all comments, something is not copywrited ;)
Thanks.
Chris Phillips (September 12, 2007 at 11:39 am)
# str = str.replace(/&(?![#a-z0-9]+;)/g, "&amp;");
# works in Opera 9.23 now
Maximos (October 20, 2007 at 2:01 am)
Nice