A fascinating thing has happened in the world of JavaScript DOM traversal: over the course of a couple of months in 2007, three of the major JavaScript libraries (Prototype, Dojo, and MooTools) all switched their CSS selector engines to use the browser’s native XPath functionality, rather than doing traditional DOM traversal. What’s interesting about this is that the burden of functionality and performance has flipped, practically overnight, from pure DOM implementations onto the browser’s XPath engine.
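The underlying API here is document.evaluate(), from DOM Level 3 XPath. As a rough illustration of the technique (a sketch, not any particular library’s code), a simple class selector might be translated and run like so:

// Sketch: run a CSS-style class query through the native XPath engine.
function getByClass(className, context) {
  context = context || document;
  // ".foo" in CSS roughly maps to this XPath expression:
  var expr = ".//*[contains(concat(' ', @class, ' '), ' " + className + " ')]";
  var result = document.evaluate(expr, context, null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  var nodes = [];
  for (var i = 0; i < result.snapshotLength; i++)
    nodes.push(result.snapshotItem(i));
  return nodes;
}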
There are some really interesting things about this switch:
- Native XPath is blazing fast. For a majority of CSS selectors it completely trumps using native DOM methods (like getElementsByTagName, for example). Sometimes it pays to special-case your code for selectors like #id, but overwhelmingly XPath is the direction in which JavaScript libraries are heading.
- Since a large percentage of JavaScript users use JavaScript libraries (and, thus, use the behind-the-scenes XPath as well), browsers are now spending significantly more time processing XPath queries than they ever were before. This means that the performance field is now, effectively, split between two areas: traditional DOM querying and XPath.
- No one is analyzing the performance of browser XPath queries. Or, if they are, it’s certainly not public. I’m working on some new XPath performance tests to bring them some more visibility, and I hope to have them released this week.
- XPath, while incredibly useful, is a black box. The developer has no control over how fast the results come back – or whether they are even correct. Contrast this with traditional DOM scripting, where you can fine-tune your queries to perfection. Browsers will always be bound to have some bugs in their implementations. For example, Safari 3 isn’t capable of doing “-of-type” or “:empty” style CSS selectors, nor is any browser able to access the ‘checked’ property or namespaced attributes via XPath (all noted in Prototype’s implementation), which means that libraries have to fall back to a traditional DOM scripting model for those cases.
- Internet Explorer is a dead end. Since most users want a CSS selector implementation that will work against HTML documents – and IE is unable to provide one – all CSS selector implementations must provide two side-by-side selector engines in order to handle these cases (not to mention the aforementioned cases where browsers provide unexpected behavior); a rough sketch of this dual-engine dispatch follows this list.
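In rough outline, the dual-engine dispatch looks something like this – toXPath() and domTraversalQuery() are hypothetical stand-ins for the two engines, not any library’s actual code:

function query(selector, context) {
  context = context || document;
  // Special-case "#id" - getElementById beats everything:
  var idMatch = /^#([\w-]+)$/.exec(selector);
  if (idMatch) {
    var el = document.getElementById(idMatch[1]);
    return el ? [el] : [];
  }
  if (document.evaluate) {
    // Browsers with native XPath on HTML documents (Firefox, Safari, Opera):
    var result = document.evaluate(toXPath(selector), context, null,
      XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    var nodes = [];
    for (var i = 0; i < result.snapshotLength; i++)
      nodes.push(result.snapshotItem(i));
    return nodes;
  }
  // Internet Explorer (no document.evaluate) - and any selector the XPath
  // engine mishandles - falls through to the traditional DOM engine:
  return domTraversalQuery(selector, context);
}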
A couple of things to take away from all of this:
- XPath (along with new methods like querySelector) is the way of the future for a lot of JavaScript libraries – and the next frontier for browser optimization.
- These implementations are black boxes that cannot be modified by the developer (leaving them vulnerable to browser bugs).
- Libraries must provide a DOM-only CSS selector engine alongside the XPath one, well into the foreseeable future, in order to account for browser mis-implementations.
I should also probably answer the inevitable question: “Why doesn’t jQuery have an XPath CSS selector implementation?” For now, my answer is: I don’t want two selector implementations – it makes the code base significantly harder to maintain, increases the number of possible cross-browser bugs, and drastically increases the filesize of the resulting download. That being said, I’m strongly evaluating XPath for some troublesome selectors that could, potentially, provide some big performance wins for the end user. In the meantime, we’ve focused on optimizing the actual selectors that most people use – which are poorly represented in speed tests like SlickSpeed, something we hope to rectify in the future.
Mark (February 10, 2008 at 11:30 pm)
I’d be excited if the promise of this were fully realized, but the code-forking issues you’ve cited, as well as potential browser-to-browser inconsistencies or shortcomings, sound unruly.
funtomas (February 11, 2008 at 2:28 am)
Checked out the SlickSpeed test and found that jQuery is beaten on child selectors. Those are quite frequent ones.
Bob (February 11, 2008 at 2:29 am)
Hi John,
I think jQuery is currently so fast that if you run into performance issues, it’s time to re-think your own code. I mean, I can’t think of a real-world case where jQuery’s selector speed would be the bottleneck.
That said, I agree with you that it’s highly preferable to stick to a code base that’s maintainable and contains fewer weird cross-browser issues.
Andrew Dupont (February 11, 2008 at 3:09 am)
I agree with all your points except for “drastically increases the filesize of the resulting download.” I’d been tossing around the XPath idea ever since Joe Hewitt did it back in 2006, but only coded it once I realized I could do so with high levels of code reuse alongside a traditional DOM approach. In our source code, selector.js is ~700 LOC; the parts relating to XPath represent ~125 LOC.
Daniel (February 11, 2008 at 3:33 am)
“The developer has no control over how fast the results come back – or if they are even correct.”
That’s not entirely true. When Opera first supported XPath, it had a bug, which I reported. It was fixed in the next release, a couple of weeks later. They might have already known about it – they’re not very open about those things – but I was very impressed regardless.
Even if they hadn’t fixed it, I managed to find a workaround, just as most JavaScript libraries are full of workarounds. Luckily, I didn’t have to use it for long. I’ve found that, on the whole, XPath implementations are more reliable than JavaScript ones – it’s a simpler, better-specified language.
You’re either dependent on the JavaScript implementation or the XPath implementation; it’s the same story. I’m not saying you should immediately jump on the XPath bandwagon, though – JavaScript-based selectors are good enough most of the time.
Still, fine-tuning your JavaScript is an odd advantage if it’s still slower than XPath.
Robert Nyman (February 11, 2008 at 4:01 am)
XPath rocks. Just wanted to mention that, from my testing, implementing things like :empty ([count(child::*) = 0 and string-length(text()) = 0]) and :checked ([@checked='checked']) has been working out just fine.
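For instance, a quick check along these lines (just a sketch of my testing):

var checked = document.evaluate(".//input[@checked='checked']",
  document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
var empty = document.evaluate(
  ".//*[count(child::*) = 0 and string-length(text()) = 0]",
  document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
alert(checked.snapshotLength + " checked, " + empty.snapshotLength + " empty");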
Robert Nyman (February 11, 2008 at 4:04 am)
I should mention that the checked solution doesn’t work in Opera until version 9.5 (currently in beta).
Daniel Pihlström (February 11, 2008 at 4:45 am)
I think you may be overstating the issues with browser inconsistencies. They do obviously exist though.
The “:checked” and “:empty” pseudo-classes not working was news to me, however, and in quick testing of my own implementation I can’t find a browser where they don’t work (I haven’t tested Safari 2). Just running “.//*[@checked]” turns up any element that has a checked attribute in Safari 3, FF 2.0.0.12 and Opera 9.25. So I don’t know what I’m missing there – Safari 3 on Mac, maybe.
I completely agree with “of-type” not working – there’s just no way I can find to get it actually working with any browser’s XPath implementation, unfortunately. Prototype’s implementation is flawed in that it uses the position() function to pick out element indexes. position() gives an element’s index within the node-set matched so far, not its index among its siblings of that type. This means it works fine for simple selectors – “div:nth-of-type(1)” will yield the first div in any element – but “div.test:nth-of-type(1)” will return the first div with class test in any element (i.e. there could be 10 divs without the test class before it, and it’ll still match that div.test).
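To illustrate with a pair of expressions (just a sketch):

// Given markup like: <div>a</div> <div>b</div> <div class="test">c</div>
// A positional predicate applied after the class filter counts within the
// filtered node-set, so this still matches the "c" div:
var flawed = "//div[contains(concat(' ', @class, ' '), ' test ')][1]";
// Applying the position to the name test first counts among the div
// siblings themselves - which is what nth-of-type means - so this one
// correctly matches nothing in that markup:
var correct = "//div[1][contains(concat(' ', @class, ' '), ' test ')]";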
I can totally understand not supplying an XPath selection method, in that there really aren’t any situations where traditional DOM traversal is unbearably slow. And in the cases where it is, well, it’s going to be slow in IE anyway.
I’m not entirely convinced that the file size has to increase so drastically when adding XPath support, either. It depends, I suppose, on how extensively you intend to embed it. In my own selection routines, adding XPath support was actually surprisingly fast and light-weight.
Anup Shah (February 11, 2008 at 8:32 am)
I will be really interested to see how this goes in terms of performance. I really like XPath (XSLT, etc.), but it’s easy to write XPath that performs badly.
E.g. //div[@class='section'] will result in a tree walk of every single element in the document, which I have seen time and time again as a performance killer. (I am mostly experienced using XPath in MSXML, System.Xml (i.e. .NET), and PHP 5 – all on the server side, not the client side – but from what I have read, these performance concerns will likely be the same elsewhere.)
If you are able to optimize the code (somehow) so that the above can at least become /html/body//div[@class='section'] (or something even more granular if you are passed a context), then I guess that will be quite helpful.
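For instance, something along these lines (a rough sketch; "content" is just a hypothetical container id):

// Unanchored query - walks every element in the document:
var slow = document.evaluate("//div[@class='section']", document, null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
// Anchored to a context node - only that subtree gets walked:
var container = document.getElementById("content");
var fast = document.evaluate(".//div[@class='section']", container, null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);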
That being said, your reasons for not wanting to do too much in this area at this time seem sensible…
Dean Edwards (February 11, 2008 at 8:37 am)
It is definitely possible to write slow XPath queries. base2 has optimised XPath queries. That’s why it is faster than the “major” libraries.
David Smith (February 11, 2008 at 9:28 am)
Perhaps I’m biased here by having worked on WebKit’s querySelector implementation, but it seems to me that XPath is merely an interesting stopover point on the road to real selector lookup. I wouldn’t be at all surprised to see that XPath performance actually *decreases* in importance over the next few years.
Dave Savage (February 11, 2008 at 10:38 am)
This would be very helpful info. I’m a fan of whatever will get the job done quickest with least impact on performance.
Dean Edwards (February 11, 2008 at 10:53 am)
@David – I’m not sure that querySelector is such a “real selector lookup”. XPath is far more expressive. It’s just that web developers are more familiar with CSS style selectors than they are with XPath.
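A quick sketch of what I mean – there’s no CSS selector equivalent of matching on text content, for example:

// Find every list item containing a link whose text mentions "XPath" -
// not something CSS selectors can express:
var result = document.evaluate("//li[a[contains(., 'XPath')]]", document,
  null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);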
Wade Harrell (February 11, 2008 at 11:14 am)
Personally, I cannot disconnect XPath and XSLT in my head – I read about one and think of the other. IE5 had client-side XSLT ages ago, but since Safari did not support it until version 3, one could not build a public-facing site with it. In the meantime, JSON has gained favor over XML, and CSS selectors over XPath. It is a funny dance all these technologies make.
I am curious whether it is possible to wean developers off the CSS selector abstraction and onto the built-in technologies.
p.s. What happened to the jQuery XPath plugin that was released when XPath was removed from the core?
David Smith (February 11, 2008 at 11:30 am)
@Dean – I suppose I should clarify what I said to read “…it seems to me that XPath *as discussed in this post* is merely an interesting stopover point on the road to real selector lookup”.
XPath is certainly a very powerful technology, but using it to reimplement CSS selector matching systems that browsers have already fine-tuned is slightly pointless when browser-hosted implementations are both faster and (in my experience) easier to write. I’m sure that XPath as XPath will remain useful.
Diego Perini (February 12, 2008 at 1:38 pm)
@John,
I completely agree about “XPath” not being much faster than DOM traversal – at least not in the frameworks’ actual implementations – and you have already seen my NWMatcher beating them all.
http://javascript.nwbox.com/NWMatcher/
I still believe that a well-done implementation of “XPath” should be much faster; I don’t understand where these OO frameworks lose their time, and sometimes they also return incorrect results.
From what I understand, they implemented “XPath” to be faster with complex selectors like the “nth” queries and the “of-type” variants, but in the end they simply return wrong results on many occasions.
So if “XPath” is still so buggy, or nonexistent in some browsers, I wouldn’t bother implementing it, since, as you already said, it would introduce more bugs, endless workarounds for each browser, and a considerable amount of code.
Could someone explain why some of the “nth” and “of-type” queries return wrong or inconsistent results in the different frameworks using XPath (MooTools/Prototype)?
Diego Perini (February 12, 2008 at 1:49 pm)
@Dean,
I just realized you said above that you have optimized “XPath” selectors.
Where can I pick up a copy of your new selectors to try?
Shog9 (February 25, 2008 at 1:36 pm)
Just wanted to toss in my own experience here. There’s a Greasemonkey script of mine that uses both jQuery and XPath heavily, often in combination. There are two reasons for this:
1) A GM script has to load very quickly, and operate on pages that I have no control over and that aren’t necessarily optimized for selectors. In situations where I’m delving deeply into nested tables looking for a specific pattern, using XPath can be the difference between sluggish and imperceptible.
2) I frequently request pages asynchronously, looking to extract a few key bits of information from the response. The response text is used to construct a DOM tree that’s queried but never actually gets inserted into the DOM of the displayed page. For some reason, using jQuery (or any technique for traversing the DOM other than XPath) adds a huge initial delay – presumably, Mozilla holds off on some initialization until it’s actually needed. Again, the choice is between horribly slow and blazingly fast – and here, both DOM scripting and XPath are black boxes, with normally fast operations taking potentially ridiculous amounts of time to complete.
I’m not saying either of these are good arguments for building an XPath-based engine into jQuery. Just throwing in my experiences with their trade-offs.
Henrik Lindqvist (March 15, 2008 at 4:17 pm)
An XPath implementation doesn’t have to “drastically increase the filesize”. Our implementation is only 13K. See http://llamalab.com/js/xpath/.