John Resig - XPath and CSS Selectors

XPath and CSS Selectors

Lately, I’ve been doing a lot of work building a parser for both XPath and CSS 3, and I was amazed at just how similar they are in some respects, yet wholly different in others. For example, CSS is tuned for use with HTML, with #id (to get something by ID) and .class (to get something by its class). XPath, on the other hand, has the ability to traverse back up the DOM tree with .. and to test for existence with foo[bar] (foo has a bar element child). The biggest thing to realize is that CSS Selectors are typically very short, but woefully underpowered when compared to XPath.

I thought it would be worthwhile to do a side-by-side comparison of the syntax of the two selector languages.

Goal	CSS 3	XPath
All Elements	*	//*
All P Elements	p	//p
All Child Elements	p > *	//p/*
Element By ID	#foo	//*[@id='foo']
Element By Class	.foo	//*[contains(@class,'foo')] ¹
Element With Attribute	*[title]	//*[@title]
First Child of All P	p > *:first-child	//p/*[1]
All P with an A child	Not possible	//p[a]
Next Element	p + *	//p/following-sibling::*[1]
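
To make a few of the XPath rows concrete, here is a minimal sketch using Python’s standard-library ElementTree, which supports only a limited subset of XPath. The sample document is invented for illustration, and every path starts with ‘.’ because ElementTree evaluates paths relative to an element. The contains() and following-sibling rows are left out, since that subset does not cover them. The position test is written as a[1] rather than *[1] to stay on ground ElementTree handles predictably.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented purely for this illustration.
DOC = """
<body>
  <p id="foo" title="intro"><a>one</a><a>two</a><b>bold</b></p>
  <p id="second"><i>plain</i></p>
</body>
"""

root = ET.fromstring(DOC)


def tags(path):
    """Return the tag names matched by an ElementTree path, in document order."""
    return [el.tag for el in root.findall(path)]


# ElementTree paths are relative to an element, hence the leading '.'.
all_elements = tags(".//*")                  # table row: //*
all_p = tags(".//p")                         # table row: //p
p_children = tags(".//p/*")                  # table row: //p/*
by_id = root.findall(".//*[@id='foo']")      # table row: //*[@id='foo']
with_title = root.findall(".//*[@title]")    # table row: //*[@title]
p_with_a = root.findall(".//p[a]")           # table row: //p[a]

# Positions are 1-based: a[1] is the first <a> child of each <p>.
first_a = root.findall(".//p/a[1]")

print(all_elements)
print(p_children)
print([el.get("id") for el in p_with_a])
print([el.text for el in first_a])
```

The point worth noticing is the last query: XPath counts positions from 1, so a predicate of [1] is what picks out the first match.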

Syntactically, I was surprised by how similar the two selector languages are in some cases, especially the ‘>’ and ‘/’ tokens. While they don’t always mean the same thing (it depends on which axis you’re using in XPath), both are generally taken to mean ‘a child element of the parent’. Likewise, the ‘ ‘ (space) and ‘//’ both mean ‘all descendants of the current element’. Finally, ‘*’ means ‘all elements’, regardless of name, in both.
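
The token correspondence described above is regular enough to sketch as a tiny translator. This is only an illustration under heavy assumptions, not the parser mentioned in this post: the function name css_to_xpath is made up, and it understands nothing beyond element names, ‘*’, ‘>’ and the descendant space.

```python
import re


def css_to_xpath(selector: str) -> str:
    """Translate a tiny CSS subset (names, '*', '>' and spaces) to XPath.

    Hypothetical helper for illustration only; it does not handle
    #id, .class, attributes, pseudo-classes or the '+' combinator.
    """
    # Put spaces around '>' so that splitting on whitespace yields clean tokens.
    tokens = re.sub(r"\s*>\s*", " > ", selector.strip()).split()
    xpath = ""
    axis = "//"  # the first step searches all descendants of the root
    for token in tokens:
        if token == ">":
            axis = "/"  # child combinator: the next step is a direct child
            continue
        xpath += axis + token
        axis = "//"  # a plain space means 'any descendant'
    return xpath


print(css_to_xpath("p > *"))   # //p/*
print(css_to_xpath("div p"))   # //div//p
```

Each name or ‘*’ becomes one XPath step, and the combinator seen just before it decides whether that step is joined with ‘/’ or ‘//’.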

Even though I already knew all of this ahead of time, it’s certainly nice being able to rediscover the similarities when it comes down to actually having to program an implementation of them.

¹ This isn’t quite right, because it would match both ‘foobar’ and ‘foo bar’, when only the second should match. The actual syntax would be far more complex and would probably require multiple expressions to get the job done.
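
The difference is easy to see by mimicking the two tests on plain strings. The sketch below is Python, not XPath, and both helper names are invented for illustration: the first behaves like contains(@class, 'foo'), the second like the common idiom of padding the attribute with spaces and searching for ' foo '. It assumes class names are separated by single spaces only; tabs and newlines are not handled.

```python
def naive_contains(class_attr: str, name: str) -> bool:
    """Mimic contains(@class, 'name'): a plain substring test."""
    return name in class_attr


def padded_contains(class_attr: str, name: str) -> bool:
    """Mimic contains(concat(' ', @class, ' '), ' name ').

    Padding both sides with a space means only whole,
    space-separated class names can match.
    """
    return f" {name} " in f" {class_attr} "


for value in ("foo", "foo bar", "foobar"):
    print(repr(value), naive_contains(value, "foo"), padded_contains(value, "foo"))
```

Only the padded version rejects ‘foobar’ while still accepting ‘foo’ and ‘foo bar’.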

Posted: December 13th, 2005

10 Comments

karl (December 29, 2005 at 9:54 pm)

This row appears twice:

All Child Elements p > * //p/*
John Resig (December 30, 2005 at 2:42 am)

Good catch! I must’ve been copying that line to create other lines, and it slipped through.
Christopher Sahnwaldt (March 30, 2006 at 8:10 am)

Regarding footnote 1: Unless I’m missing something, you can use the XPath “//*[@class='foo']” to match all elements with class='foo'. This should do the same as the CSS selector “.foo”.
John Resig (March 30, 2006 at 12:33 pm)

Christopher: It’s possible to have multiple classes on a single element, for example: class="foo bar baz"
daddydave (May 31, 2006 at 11:02 pm)

Try //*[contains(concat(" ", @class, " "),concat(" ", "foo", " "))] for the Footnote 1 problem.
Steven (September 7, 2006 at 7:56 am)

Make that

//*[contains(concat(" ", @class, " ")," foo ")]
Steven (September 7, 2006 at 7:59 am)

Strictly speaking the id and class examples in CSS are only for HTML.

The general case is:

*[id="foo"]
and
*[class~="foo"]
PS (February 9, 2007 at 11:23 am)

Shouldn’t the XPath equivalent of #foo be id('foo')?
Aydın (February 15, 2007 at 2:03 pm)

A CSS and XSL comparison (XSLT & XSL-FO) should be added to this article. To me it is futile to spend energy on CSS. There is a powerful technology sitting on the shelf, waiting for browsers to support it…
andy (April 25, 2007 at 10:08 am)

IMHO xsl-fo is half-baked too – it also has a rigidly defined set of visual properties, and a lot fewer than CSS at that.

As an old-time coder, I find all these lame technologies eventually frustrating. They all seem to be half-baked attempts to “simplify” things for non-coders, but they just end up making things more and more complex because they are not programmatically complete: specific functions have to be added to the standards to do things. The standards bloat until it’s impossible to make or even understand a fully compliant implementation, and eventually people get so frustrated they create a new set of simpler standards that suffer the same fundamental flaws.
