John Resig - Unimpressed by NodeIterator

I just posted a run down of some of the new DOM Traversal APIs in Firefox 3.5. The first half of the post is mostly a recap of my old Element Traversal API post.

The second half of the post is all about the new NodeIterator API that was just implemented. For those that are familiar with some of the DOM TreeWalker APIs this will look quite familiar.

It’s my opinion, though, that this API is, at best, bloated, and at worst incredibly misguided and impractical for day-to-day use.

Observe the method signature of createNodeIterator:

var nodeIterator = document.createNodeIterator(
  root, // root node for the traversal
  whatToShow, // a set of constants to filter against
  filter, // an object with a function for advanced filtering
  entityReferenceExpansion // whether entity reference children should be expanded
);

This is excessive for what should be, at most, a simple way to traverse DOM nodes.

To start, you must create a NodeIterator using the createNodeIterator method. This is fine except this method only exists on the Document node – which is especially strange since the first argument is the node which should be used as the root of the traversal. The first argument shouldn’t exist and you should be able to call the method on any DOM element, document, or fragment.

Second, in order to specify which types of nodes you wish to see you need to provide a number (which is the result of the addition of various constants) that the results will be filtered against. This is pretty insane so let me break this down. The NodeFilter object contains a number of properties representing the different types of nodes that exist. Each property has a number associated with it (which makes sense: this way the method can uniquely identify which type of node to look for). But then the crazy comes in: in order to select multiple, different types of nodes you must OR together the properties to create a resulting number that’ll be passed in.

For example if you wanted to find all elements, comments, and text nodes you would do:

NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT | NodeFilter.SHOW_TEXT

I’m not sure if you can get a much more counter-intuitive JavaScript API than that (you can certainly expect little to no common developer adoption, that’s for sure).
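For readers who have not met bit masks before, the mechanics can be sketched in a few lines. The constants are redefined locally here, using the values the DOM specification assigns to the NodeFilter properties, so that the snippet runs outside a browser; in a page you would read them from NodeFilter directly.

```javascript
// Local copies of three NodeFilter constants; each one is a single distinct bit.
const SHOW_ELEMENT = 0x1;  // binary 00000001
const SHOW_TEXT    = 0x4;  // binary 00000100
const SHOW_COMMENT = 0x80; // binary 10000000

// OR-ing sets every bit that is set in any operand.
const whatToShow = SHOW_ELEMENT | SHOW_COMMENT | SHOW_TEXT;

// AND-ing against a single flag tests whether that flag is in the mask.
const showsText = (whatToShow & SHOW_TEXT) !== 0;
const showsAttributes = (whatToShow & 0x2) !== 0; // 0x2 is SHOW_ATTRIBUTE

console.log(whatToShow, showsText, showsAttributes); // 133 true false
```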

Next, the filter argument accepts an object that has a method (called acceptNode) which is capable of further filtering the node results before being returned from the iterator. This means that the function will be called on every applicable node (as specified by the previous whatToShow argument).

Two points to consider:

~~The filter argument must be an object with a property named ‘acceptNode’ that has a function as a value. It can’t just be a function for filtering, it must be enclosed in a wrapper object.~~ Update: Actually, this isn’t true – at least with Mozilla’s implementation you can pass in just a function. Thanks for the tip, Neil!
The argument is required (even though you can pass in null, making it equivalent to accepting all nodes).
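Since the filter can arrive as null, as a bare function (at least in Mozilla's implementation), or as an object with an acceptNode method, any code that wants to run a filter itself has to cope with all three shapes. A minimal sketch follows; applyFilter is a hypothetical helper, not part of any DOM API, and the two constants are local copies of the NodeFilter values.

```javascript
const FILTER_ACCEPT = 1; // same value as NodeFilter.FILTER_ACCEPT
const FILTER_SKIP = 3;   // same value as NodeFilter.FILTER_SKIP

// Hypothetical helper: run whichever form of filter was supplied.
function applyFilter(filter, node) {
  if (filter == null) return FILTER_ACCEPT;              // null: accept everything
  if (typeof filter === "function") return filter(node); // bare function
  return filter.acceptNode(node);                        // wrapper object
}

// A filter that accepts only nodes named "A".
const isLink = (node) =>
  node.nodeName.toUpperCase() === "A" ? FILTER_ACCEPT : FILTER_SKIP;

const fakeNode = { nodeName: "a" }; // stand-in for a DOM element

console.log(applyFilter(null, fakeNode));                                // 1
console.log(applyFilter(isLink, fakeNode));                              // 1
console.log(applyFilter({ acceptNode: isLink }, { nodeName: "DIV" }));   // 3
```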

The last argument, entityReferenceExpansion, comes into play when dealing with XML entities that also contain sub-nodes (such as elements). For example, with XML entities, it’s perfectly valid to have a declaration like <!ENTITY aname "<elem>test</elem>"> and then later in your document have &aname; (which is expanded to represent the element). While this may be useful for XML documents it is way out of the scope of most web content (thus the argument will likely always be false).

So, in summary, createNodeIterator has four arguments:

The first of which can be removed (by making the method available on elements, fragments, and documents).
The second of which is obtuse and should be optional (especially in the case where all nodes are to be matched).
The third of which requires a superfluous object wrapping and should be optional.
The fourth of which should be optional.

None of this actually takes into account the actual iteration process. If you look at the specification you can see that all the examples are in Java – and when seeing this a lot of the API decisions start to make more sense (not that it really applies to the world of web-based development, though). In JavaScript one doesn’t really use iterators, more typically an array is used instead. (In fact a number of helpers have been added in ECMAScript 5 which make the iteration and filtering process that much simpler.)

I’d like to propose the following, new, API that would exist in place of the NodeIterator API (dramatically simplifying most common interactions, especially on the web).

// Get all nodes in the document
document.getNodes();

// Get all comment nodes in the document
document.getNodes( Node.COMMENT_NODE );

// Get all element, comment, and text nodes in the document
document.getNodes( Node.ELEMENT_NODE, Node.COMMENT_NODE, Node.TEXT_NODE );
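No browser ships getNodes, so the following is only a rough sketch of how it might behave, written as a free function over anything that has nodeType and childNodes. Two assumptions are mine rather than part of the proposal: the root itself is left out of the result, and nodes come back in document order. The numeric constants mirror Node.ELEMENT_NODE, Node.TEXT_NODE and Node.COMMENT_NODE.

```javascript
const ELEMENT_NODE = 1, TEXT_NODE = 3, COMMENT_NODE = 8; // values of the Node.* constants

// Hypothetical free-function version of the proposed getNodes():
// returns the descendants of root, optionally limited to the given node types.
function getNodes(root, ...types) {
  const found = [];
  (function walk(node) {
    for (const child of node.childNodes || []) {
      if (types.length === 0 || types.includes(child.nodeType)) found.push(child);
      walk(child); // depth-first, so results stay in document order
    }
  })(root);
  return found;
}

// Tiny stand-in tree for: <p>hi<!--note--></p>
const tree = {
  nodeType: 9,
  childNodes: [{
    nodeType: ELEMENT_NODE,
    childNodes: [
      { nodeType: TEXT_NODE, nodeValue: "hi" },
      { nodeType: COMMENT_NODE, nodeValue: "note" },
    ],
  }],
};

console.log(getNodes(tree).length);                     // 3
console.log(getNodes(tree, COMMENT_NODE)[0].nodeValue); // note
```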

I’d also like to propose the following helper methods:

// Get all comment nodes in the document
document.getCommentNodes();

// Get all text nodes in a document
document.getTextNodes();

Beyond finding elements, finding comments and text nodes are the two most popular query types that I see requested.

Consider the code that would be required to recreate the above using NodeIterator:

// Get all nodes in the document
document.createNodeIterator(document, NodeFilter.SHOW_ALL, null, false);

// Get all comment nodes in the document
document.createNodeIterator(document, NodeFilter.SHOW_COMMENT, null, false);

// Get all element, comment, and text nodes in the document
document.createNodeIterator(document, 
    NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT | NodeFilter.SHOW_TEXT,
    null, false
);

This proposed API would return an array of DOM nodes as a result (instead of a NodeIterator object). You can compare the difference in results between the two APIs:

NodeIterator API

var nodeIterator = document.createNodeIterator(
    document,
    NodeFilter.SHOW_COMMENT,
    null,
    false
);

var node;

while ( (node = nodeIterator.nextNode()) ) {
    node.parentNode.removeChild( node );
}

Proposed API

document.getCommentNodes().forEach(function(node){
    node.parentNode.removeChild( node );
});
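Something close to the array-returning style is reachable today by draining an iterator into an array first. The sketch below assumes only that the iterator has a nextNode() method returning null when exhausted; toArray and fakeIterator are hypothetical names, and the fake iterator exists purely so the snippet runs without a DOM.

```javascript
// Hypothetical helper: drain anything with a nextNode() method into an array.
function toArray(iterator) {
  const nodes = [];
  let node;
  while ((node = iterator.nextNode())) nodes.push(node);
  return nodes;
}

// Stand-in iterator so the sketch runs outside a browser.
function fakeIterator(items) {
  let i = 0;
  return { nextNode: () => (i < items.length ? items[i++] : null) };
}

const comments = toArray(fakeIterator([{ nodeValue: "a" }, { nodeValue: "b" }]));
comments.forEach((node) => console.log(node.nodeValue)); // a, then b
```

In a page, the same helper could be handed the result of document.createNodeIterator(document, NodeFilter.SHOW_COMMENT, null, false).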

Another example: finding all elements with a node name of ‘A’.

NodeIterator API

var nodeIterator = document.createNodeIterator(
    document,
    NodeFilter.SHOW_ELEMENT,
    {
        acceptNode: function(node){
          return node.nodeName.toUpperCase() === "A";
        }
    },
    false
);

var node;

while ( (node = nodeIterator.nextNode()) ) {
    node.className = "found";
}

Proposed API

document.getNodes( Node.ELEMENT_NODE ).forEach(function(node){
    if ( node.nodeName.toUpperCase() === "A" )
        node.className = "found";
});

Almost always, when finding some of the crazy intricacies of the DOM or CSS, you’ll find a legacy of XML documents and Java applications – neither of which have a strong application to the web as we know it or to the web as it’s progressing. It’s time to divorce ourselves from these decrepit APIs and build ones that are better-suited to web developers.

Update: An even better alternative (rather than using constants representing node types) would be something like the following:

 document.getNodes( Element, Comment, Text );

Just refer back to the base objects representing each of the types that you want.
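One way such a call could decide which nodes match is an instanceof check against each constructor passed in. This is a sketch under that assumption only; the Fake* classes stand in for the browser's Element, Comment and Text so the snippet runs without a DOM, and isAnyOf is a hypothetical name.

```javascript
// Stand-ins for the browser's global Element, Comment and Text constructors.
class FakeNode {}
class FakeElement extends FakeNode {}
class FakeComment extends FakeNode {}
class FakeText extends FakeNode {}

// Hypothetical matcher: is node an instance of any of the given constructors?
function isAnyOf(node, ...ctors) {
  return ctors.some((ctor) => node instanceof ctor);
}

const candidates = [new FakeElement(), new FakeComment(), new FakeText()];
const wanted = candidates.filter((node) => isAnyOf(node, FakeComment, FakeText));

console.log(wanted.length); // 2
```

A side effect of matching this way is that subclasses match too: in the DOM, a CDATASection node would count as Text.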

Posted: June 19th, 2009

54 Comments

Remy Sharp (June 19, 2009 at 11:07 am)

On the getNodes returning an array – that would be awesome, but currently querySelectorAll doesn’t return an array, it’s an HTML collection (can’t remember the exact var name) – so to do a forEach off it, I have to augment the object to an array and donkey on the forEach method.

So, if you’re on a crusade to get this API sorted, any chances you could hint at making a consistent return type for these queries? :-)
Ian McKellar (June 19, 2009 at 11:09 am)

NodeIterator is what I refer to as a “decoy API”. When you glance at what it’s supposed to provide it seems like a perfect fit for many tasks, but as soon as you get into the nitty gritty, you hit walls of confusion and dumb. FWIW, the DOM (or at least Mozilla’s) XPath API feels a lot like NodeIterator so I suspect the same people are to blame. They don’t feel DOM-y…
Aaron (June 19, 2009 at 11:29 am)

Are there any advantages to using an iterator over an array (maybe memory usage)?
Jonathan Fingland (June 19, 2009 at 11:31 am)

@john I’m still amazed at some of the w3c recommendations. XqueryX (http://www.w3.org/TR/xqueryx/) comes to mind. On the whole I support the idea of w3c as a standards body. It’s just that the signal to noise ratio doesn’t look so good.

@IanMcKellar firefox’s xpath api, while being a little off the standard xpath 1.0, has in my experience been a big step up compared to horrible document.getElementsByTagName("div")[0].firstChild.nextSibling.firstChild (and so on) style selections. That said, the option of returning an array (or array-like node list) just seems like such an obvious extension to the standard, I have to wonder why it was left out.
Ulrich Petri (June 19, 2009 at 11:48 am)

This “crazy” method of specifying the node types is a quite ordinary use case of binary flag values.
What you really should be doing here is bitwise OR-ing the flags together. That it is possible to just add them lies in the fact that adding integers with _non_ _overlapping_ 1 bits is an equivalent operation to bitwise OR-ing.
Marius Gundersen (June 19, 2009 at 12:14 pm)

It looks like this API was created by someone more familiar with C programming than JavaScript. It reminds me of the clunky Windows API, where you have hundreds of arguments, and different combinations of those will make the function do different things.
Sevenspade (June 19, 2009 at 12:36 pm)

@Remy Sharp, use Function.prototype.call, e.g., Array.prototype.forEach.call(collection, callback).
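A sketch of that suggestion, using a plain array-like object in place of a real node collection so it runs anywhere; the same calls apply to whatever querySelectorAll returns.

```javascript
// An array-like object: indexed entries plus length, but no forEach of its own.
const collection = { 0: "first", 1: "second", length: 2 };

// Borrow Array's forEach and run it with the collection as `this`.
const seen = [];
Array.prototype.forEach.call(collection, (item, index) => {
  seen.push(index + ":" + item);
});
console.log(seen.join(", ")); // 0:first, 1:second

// Copying into a real array is the other common route.
const copy = Array.prototype.slice.call(collection);
console.log(Array.isArray(copy)); // true
```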
John Resig (June 19, 2009 at 12:46 pm)

@Ulrich: Where is this a conventional practice? It certainly isn’t one in the world of JavaScript or the DOM. Maybe it is for Java (for which this API was designed) or C but it doesn’t translate well to the web.
Kuba Bogaczewicz (June 19, 2009 at 12:51 pm)

While I have to agree with most arguments here, I am a bit surprised with this proposition of selecting many types and a list of arguments:
document.getNodes( Node.ELEMENT_NODE, Node.COMMENT_NODE, Node.TEXT_NODE );
At least for me a better way of saying I want to get one type or another is to use “or” – bitwise, but still “or”
document.getNodes( Node.ELEMENT_NODE | Node.COMMENT_NODE | Node.TEXT_NODE );
John Resig (June 19, 2009 at 12:58 pm)

@Kuba Bogaczewicz: I disagree. Nowhere else in common JavaScript or DOM APIs is the practice of ORing (|) arguments together a common practice. For most web developers the practice of using a bitwise OR means nothing to them (in fact, I would be willing to bet that most web developers would assume that | was just a mis-typed ||). I proposed a second solution that is even simpler:

document.getNodes( Element, Comment, Text );
Julien Oster (June 19, 2009 at 12:58 pm)

Yes. And it’s easy to pass those flags around:

flags = Node.ELEMENT_NODE | Node.COMMENT_NODE

and to add flags to the passed around value:

flags |= Node.TEXT_NODE

or to remove them:

flags &= ~Node.TEXT_NODE

… and to answer the other question: it’s only the most commonly found form to handle binary flags in virtually every environment ever :-/
Erik Harrison (June 19, 2009 at 1:05 pm)

Good changes, John. Personally, I’ve got no problem adding constants together – I think it scans well, and I’m not unfamiliar with bitwise ops. But your proposal is more generally useful.

But, can we get a reasonable lazy loaded list type in JavaScript, please? As nice as $.each() and the array forEach methods are, I don’t want to worry about accidentally trapping off some memory in a closure that the browser can’t reclaim, or thunking various collection types into arrays, or allocating potentially large amounts of memory or blah blah blah.

Or is this already there and I’m not enough of a guru to know?
Jeff (June 19, 2009 at 1:11 pm)

@John
> Where is this a conventional practice?

What you are experiencing is culture shock. Instead of reacting by attacking this ubiquitous convention, you should instead learn it. It’s efficient. It’s not going away.
josh (June 19, 2009 at 1:11 pm)

John, I like your first alternative but not your second. If I understand it correctly it would look like this:

var Element, Comment, Text;
document.getNodes( Element, Comment, Text );
for (var node in Element){
//do something with node of type element.
}

This is rather cumbersome! And if you wanted just text nodes you’d be forced to put dummy arguments in to get access to that variable.

And of course, the real question is why doesn’t Mozilla just start bundling jQuery’s selector API with the browser? There is precedent for successful open source projects being codified as standards: Gavin King’s Hibernate was the basis for EJB3.

(And to everyone else who seems to be intentionally obtuse about John’s point about bit fields in JavaScript being completely un-idiomatic: get a life.)
Julien Oster (June 19, 2009 at 1:21 pm)

@josh: I don’t think that’s what he meant. I think he meant specifying the (readily available) DOM objects to specify that you would like to include nodes descending of that object.

Which, by the way, I think is quite a neat and convenient idea! As long as it’s just an addition, however. For example, passing a closure to the filter argument of the original API seems insanely useful.
John Resig (June 19, 2009 at 1:22 pm)

@Jeff: I’m already familiar with using constants and bitwise-OR-ing them to create an argument for a function. That’s not the point. I’m saying that 1) MOST web developers are not familiar with this technique and 2) This is poor API design for JavaScript. JavaScript has the ability to pass around multiple arguments, or object literals holding multiple properties – any of those would be easier to use and be more JavaScript-like than what is proposed.

@josh: Sorry, you misunderstood my proposal. Element, Comment, and Text are already global variables – they are the base DOM Element, the base DOM Comment, and the base DOM Text Node. You would use the API like so:

document.getNodes( Element, Comment, Text ).forEach(... etc. ... );
Wesley Walser (June 19, 2009 at 1:25 pm)

Binary flags are common in C.

It may be worth mentioning that in Javascript (unlike c) using the bitwise or operator is going to be slower than addition since numbers are always stored as floating point. In order to do the or the number is converted into a signed int, the or is computed, then the int is converted back to floating point. All of that being said addition is more error prone as you must know what bits are already set.
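The integer conversion itself is easy to observe, whatever its cost turns out to be. The sketch below uses a local copy of the NodeFilter.SHOW_ALL value and says nothing about speed; it only shows that bitwise operators work on signed 32-bit integers.

```javascript
const SHOW_ALL = 0xFFFFFFFF; // value of NodeFilter.SHOW_ALL

// Bitwise operators first convert their operands to signed 32-bit integers,
// so a mask with the top bit set comes back negative.
const signed = SHOW_ALL | 0;
console.log(signed); // -1

// The unsigned right shift converts back to an unsigned 32-bit value.
const unsigned = signed >>> 0;
console.log(unsigned); // 4294967295

// Every flag is still present in the signed result.
const stillShowsComments = (signed & 0x80) !== 0; // 0x80 is SHOW_COMMENT
console.log(stillShowsComments); // true
```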

Not sure how common binary flags are in Java.
Wesley Walser (June 19, 2009 at 1:29 pm)

So I started writing my response before people started flipping out. I think the proposed changes are good ideas, and really like that fact that John is keeping the average web developer in mind when making API design decisions.
GregV (June 19, 2009 at 1:43 pm)

The JavaScript API may be obtuse, but it’s not Java’s fault. The entire W3C DOM API is a pain, even by Java standards. Bitwise operations are standard in C, but Java (1.4 and before) has BitSet, and now has the even-more-convenient EnumSet. (So the “all” case that you show as a pain would simply be EnumSet.allOf(NodeFilter.class). That is, it could be, if W3C had been smart enough to take advantage of the platform.)

The problem is that these APIs are designed by committee for some perceived lowest-common-denominator, so that “any” language can implement them. I think it’s a bad idea.
Boris (June 19, 2009 at 1:48 pm)

John, returning an array means having to allocate memory for that array, making it easy for the web app to leak that memory for the page lifetime by entraining it.

On the other hand, returning an array is the only way to go if you want a non-live snapshot of the DOM. So really, the question here is how you want your API to behave if the DOM is mutated by that forEach function of yours. That decision needs to be made first, before anything else.
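The difference between the two behaviours can be illustrated without a DOM at all. In this sketch a plain array plays the part of a live list that shrinks as items are removed; it is an analogy under that assumption, not a model of any particular browser collection.

```javascript
// A plain array stands in for a "live" list that shrinks as nodes are removed.
const live = ["a", "b", "c", "d"];
const visitedLive = [];
for (let i = 0; i < live.length; i++) {
  visitedLive.push(live[i]);
  live.splice(i, 1); // removing the current item shifts the rest down
}
console.log(visitedLive.join("")); // ac (every other item is skipped)

// Iterating a snapshot visits every item, whatever happens to the source.
const source = ["a", "b", "c", "d"];
const visitedSnapshot = [];
source.slice().forEach((item) => {
  visitedSnapshot.push(item);
  source.splice(source.indexOf(item), 1);
});
console.log(visitedSnapshot.join("")); // abcd
```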
John Resig (June 19, 2009 at 1:52 pm)

@Boris: Yeah, I’m undecided as to if it should be static (like querySelectorAll) or dynamic (like getElementsByTagName/ClassName/Name). I think for the sake of performance (and simplicity when iterating – as you mentioned) I would have to go with a static set.
Luke Andrews (June 19, 2009 at 2:03 pm)

Help! The comments on this page have all become auto-scrolling boxes of Courier.

Speaking as a person who used to be and probably still is the sort of “average web developer” John has in mind, I strongly agree with him that it’s silly to propose non-idiomatic solutions to common problems. Bitwise operators are almost never used in JavaScript. But the larger point was that it’s silly to force people to specify “thingA | thingB | thingC” when they really want… anything, and more generally, it’s silly to require arguments that will be the same in the vast majority of cases.
Christian Romney (June 19, 2009 at 2:04 pm)

Why not getElementNodes, getCommentNodes, getTextNodes for consistency?
John Resig (June 19, 2009 at 2:10 pm)

@Christian: Yeah – I guess if I have .getNodes and .getTextNodes I should have .getCommentNodes (instead of .getComments). Good call.
josh (June 19, 2009 at 2:27 pm)

I wanted to learn more about these globals and couldn’t find them in Firebug’s DOM inspector. But an alert(Element) (typed in at the console) doesn’t error out, so I know it exists! I looked “Element” up in Flanagan’s JavaScript book, and he mentions it only as a “sub-interface” to Node.

The objects that come back from methods like getElementsByClassName are of type Element. However, in what sense is it a global?
Sean Catchpole (June 19, 2009 at 2:54 pm)

Iterators are bad design in a parallel world. MapReduce is far more efficient and is embarrassingly easy to create for filter situations such as this. I agree with John’s proposed API.
John Resig (June 19, 2009 at 2:57 pm)

@josh: Actually, the results that come back from getElementsByTagName/ClassName/Name are NodeLists. NodeList is a global variable as well – for example if you were to type in the location bar: javascript:alert(window.NodeList) you’d get back “[object NodeList]” – same if you do javascript:alert(window.Element). In the case of Element it has the Element.prototype from which other DOM elements inherit their methods and properties. Since a DOM implementation is going to have an implementation of Element it makes sense to reference it in this context.
Neil (June 19, 2009 at 3:38 pm)

“The filter argument must be an object with a property named ‘acceptNode’ that has a function as a value. It can’t just be a function for filtering, it must be enclosed in a wrapper object.”

Actually, my reading of the code suggests that using a function is indeed possible, and in fact, the spec specifically says that for ECMAScript using a function and not a separate NodeFilter object is the expected implementation.
John Resig (June 19, 2009 at 3:42 pm)

@Neil: You are correct, I was mistaken. I just did some testing on my end and it does appear to work that way (at least in Mozilla’s implementation). I’ve updated the post.
Braden (June 19, 2009 at 4:06 pm)

PHP also uses bitwise flag options, notably in error_reporting(), but also in preg_split(), for example. I actually find your arbitrary argument count example more unusual.
Sean Hogan (June 19, 2009 at 5:28 pm)

DOM APIs aren’t just for javascript processing of HTML in the browser. They are usually designed with consideration for a variety of languages and needs.

But no-one’s complaining about the lack of standardization in JS frameworks, so go ahead and implement whatever API makes sense to you.
Jörn Zaefferer (June 19, 2009 at 6:19 pm)

Most Java APIs, like the IO stuff, are written for handling each and every edge case, e.g. very very large files. While it may be nice to be able to handle very very large files, or XML documents, I don’t get why it’s so hard to add a few simple methods for handling the most common cases.

And just transferring those APIs to a browser makes even less sense. I wouldn’t want to use those in Java in most cases, why should I use it in JavaScript?

Your proposal looks good, though I don’t like the varargs for getNodes(). That would make it mostly impossible to change the method arguments in the future.
Dan (June 19, 2009 at 10:12 pm)

The reason IT is verbose is the W3C DOM rec is language AGNOSTIC, so how does your example work in languages where functions are not first-class objects, such as Java? The use of binary flags is done because some languages don’t support enums.

The list goes on and on.
jor (June 20, 2009 at 5:40 am)

@John

Unfortunately, sometimes W3C build great stuff (thinking about XPATH, the DOM although it could be simpler, …), but sometimes they just create bloated stuff that just cannot be used.

Two examples come to mind: XSLT, that is just not worth its complexity.

And the DOM Load and Save specification. Honestly, how come that’s the best thing they can come up with for the simple purpose of parsing a string to an xml document ? Come on, document.parseXML( string ), and that’s all !

</angry>

I agree with the modifications you suggest, except that maybe I would prefer this kind of method signature, at least for multiple choice:

document.getNodes( [ Element, Comment, Text ] );

That way, the node type(s) are always defined via the first parameter, and it may be easier for some later specification to add parameters (if they are ever needed).

Also I am not sure the other variations are really needed, if you use the global objects Comment, Element and Text:

document.getCommentNodes();
document.getNodes(Comment);

Both are the same number of characters, so I am not sure adding those getSomethingNodes methods to the document namespaces is really needed at all.

Just my two cents.
John Resig (June 20, 2009 at 8:50 am)

@Dan: Naturally what the W3C proposes is language agnostic – I’m simply saying that we can (and should) expect better APIs in JavaScript and for the DOM. What we have now are generally quite mediocre (heavily Java and C inspired) when we could have something much better
Thomas Hansen (June 20, 2009 at 7:32 pm)

Your first argument towards having this method on DOM elements instead of only the document object I agree with, and this could be considered a “bummer” by Mozilla. However your second argument is embarrassing for you to be honest. Sure it would have been better with e.g. a predicate or something (function to method returning true/false something) but your proposal here with changing the ORing of arguments to comma separated arguments is weird at the least. I guess others would be willing to use stronger words…

When that’s said, aren’t you working for Mozilla? Shouldn’t you be like *promoting* their stuff…?
^love*encounter~flow (June 20, 2009 at 9:32 pm)

wow, now that i actually read the discussion i realize there is quite a bit of abuse about using bit flags going on here. i especially liked the way one guy wants to punish john for being 1ll1tera7e in the wonderful ubiquitous and efficient world of myflag &= ~yourflag when that other guy comes in to tell us that javascript must do two int/float conversions to accomplish a single bit flag operation. that was quite dramatic! hey my commodore used to do that right on the cpu, long ago. i would like to re-inforce that NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT | NodeFilter.SHOW_TEXT equals 133, and that is the problem. you pass around a piece of data that is utterly silent about its intended purpose. 133 could be anything, right? in a world running on magnetic core memory, that kind of efficiency is obligatory; in a world shuffling around terabytes, you quickly lose track if you don’t label your things. this consideration may or may not be valid for the case discussed here: the arguments to that iterator method will typically be produced right within the call parentheses and die microseconds later when the call terminates. the moment you store or transmit such stuff, however, things start to look different. then, a dump of failed data transport will only call for ‘that 133-ish selection y’know’ and you gotta reach for the manual to find out. like they had only numbers for error messages back then, y’know. i find the technique acceptable for specialized purposes, and preferable for size/speed challenged small systems, but outmoded and downright dangerous in the general case.

and please stop enticing others to get even more abusive—i’m looking at you, mr hansen. instead pls go ahead and detail what part is so weird to you, thx.
AndersH (June 21, 2009 at 5:24 am)

Funny how the “name of language” can determine the API design so dramatically. That is, even though java and javascript syntactically are so similar, the feel is so different. Also funny that java developers put up with those clunky APIs. (And sad that the Pascal-ish variable declaration was proposed for javascript)

I don’t really like your proposed APIs. Both have the problem that representing sets via variable argument lists just seems wrong. Something about the ordering being very important among arguments, but undefined in a set. Also it seems wrong to “waste” all those argument positions. What if I wanted to add arguments (e.g. a callback, although you probably would want to use a “grep” function on the returned nodelist rather than adding a special functionality to this function)? As you already mentioned the use of enums, which are not even really enums in the first place, seems really java-ish and those capitals really draw a lot of attention. But I don’t like the use of types either, because the meaning was not immediately clear to me (what “Element” is he talking about?). It seems as if you were referring to something undefined or something I would expect to find somewhere in the code.

How about this to get the element, comment and text nodes from node and down two levels deep:
node.getNodes({element:true, comment:true, text:true}, 2);
It is not that much more code (when “true” is written as “1”).
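An options object of this kind could be layered over the existing API by translating it into a whatToShow mask. The sketch below is one possible adapter, not anything a browser provides: maskFromOptions and the FLAGS table are hypothetical, the flag values are local copies of the NodeFilter constants, and the depth argument from the example above is left out.

```javascript
// Local copies of the relevant NodeFilter constants.
const FLAGS = { element: 0x1, text: 0x4, comment: 0x80 };

// Hypothetical adapter: turn {element: true, ...} into a whatToShow mask.
function maskFromOptions(options) {
  let mask = 0;
  for (const name of Object.keys(options)) {
    if (!Object.prototype.hasOwnProperty.call(FLAGS, name)) {
      throw new Error("Unknown node kind: " + name);
    }
    if (options[name]) mask |= FLAGS[name]; // any truthy value switches the flag on
  }
  return mask;
}

console.log(maskFromOptions({ element: true, comment: true, text: true })); // 133
console.log(maskFromOptions({ comment: 1, text: 0 }));                      // 128
```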
Topper (June 21, 2009 at 10:25 am)

1. DOM is designed for many languages and environments. Not for browser only!

2. For cross-browser reasons:
We can implement it like this:

if (!document.createNodeIterator) {
document.createNodeIterator = function(root,whatToShow,filter,entityReferenceExpansion) {…};
}

If this function is defined in Element, You have to implement it like this:

if (!Element.prototype.createNodeIterator) {
Element.prototype.createNodeIterator = function(…) {};
}

How about IE6, IE7 ?
Rick (June 21, 2009 at 6:56 pm)

@John:

document.getNodes( Element, Comment, Text );

This solution is perhaps logically the best, though I just had a thought about alternative implementations…

A single array argument…

var nodeTypes = [ Element, Comment, Text ];
document.getNodes( nodeTypes );

-or-

document.getNodes( [ Element ] );
(which i don’t actually like the look or feel of, but i felt it nec. to post along with the other example)
Eduard Bespalov (June 22, 2009 at 2:42 am)

In cases like the above bit flags are independent. Thus the expression (NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT | NodeFilter.SHOW_TEXT) is equivalent to (NodeFilter.SHOW_ELEMENT + NodeFilter.SHOW_COMMENT + NodeFilter.SHOW_TEXT) which is more intuitive.
Dmitrii 'Mamut' Dimandt (June 22, 2009 at 3:24 am)

What’s the point of this API when we have querySelectorAll (if it’s implemented correctly, a la jQuery’s $)?

Does it only come down to static data vs. live data?
boen_robot (June 22, 2009 at 7:38 am)

@John
“Naturally what the W3C proposes is language agnostic – I’m simply saying that we can (and should) expect better APIs in JavaScript and for the DOM.”

The problem is a language agnostic API will work for more than JavaScript, and any “better APIs in JavaScript” would not. The W3C answers to more than JavaScript developers. It answers to anyone who does anything for the web, and thus to any language that is used to create and/or manipulate web pages (JAVA, C#, you name it).

I like the change suggested by Rick, only instead of using an array of objects, one could simply use an array of the node type constants, like
var nodeTypes = [Node.ELEMENT_NODE, Node.COMMENT_NODE, Node.TEXT_NODE]; document.getNodes(nodeTypes);

Coupled with being able to use this method on any “Node” object, having NodeFilter object (or a function) as a second argument, and perhaps having the entityReferenceExpansion as a third argument (with default to false), this would be a great and simple enough function… all while being language agnostic.

One problem though. This does assume the language supports arrays. You’ll notice the DOM never uses or returns arrays, as it seems some languages don’t have them (or something…). The only possible workaround is what the WG currently has – a special class (NodeFilter) that defines constants for what to show, so that a single number, representing the nodes to show, can be passed.

With even that change back to the original, it seems the only really needed change from the original is being able to use a node iterator over any node (without a first argument) and actually providing some defaults for the parameters, so that one can write:
(all nodes in the document)
var iterator = document.createNodeIterator();
(all element, comment and text nodes in the document)
var iterator = document.createNodeIterator(NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT | NodeFilter.SHOW_TEXT);
(all element, text and comment nodes with the contents “A”)
var iterator = document.createNodeIterator(NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT | NodeFilter.SHOW_TEXT, function(node) { if (node.nodeValue === "A") { return NodeFilter.FILTER_ACCEPT; } });
(all nodes inside the element with ID of “nav”)
var iterator = document.getElementById("nav").createNodeIterator();
(all “A” elements inside the element with ID of “nav”)
var iterator = document.getElementById("nav").createNodeIterator(NodeFilter.SHOW_ELEMENT, function(node) { if (node.nodeName.toUpperCase() === "A") { return NodeFilter.FILTER_ACCEPT; } });
TNO (June 22, 2009 at 7:39 am)

@Eduard Bespalov
NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_ELEMENT

is NOT equivalent to:

NodeFilter.SHOW_ELEMENT + NodeFilter.SHOW_ELEMENT

2 | 2 => 2
2 + 2 => 4
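With the actual constant values the effect is arguably worse than a wrong number, because the sum lands on a different, valid flag. A small sketch, using local copies of two NodeFilter values:

```javascript
const SHOW_ELEMENT = 0x1;   // value of NodeFilter.SHOW_ELEMENT
const SHOW_ATTRIBUTE = 0x2; // value of NodeFilter.SHOW_ATTRIBUTE

// OR is idempotent: repeating a flag changes nothing.
const viaOr = SHOW_ELEMENT | SHOW_ELEMENT;   // 1, still SHOW_ELEMENT

// Addition carries into the next bit, which happens to be a different flag.
const viaPlus = SHOW_ELEMENT + SHOW_ELEMENT; // 2, i.e. SHOW_ATTRIBUTE

console.log(viaOr === SHOW_ELEMENT, viaPlus === SHOW_ATTRIBUTE); // true true
```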
John Resig (June 22, 2009 at 12:25 pm)

@Dmitrii ‘Mamut’ Dimandt: querySelectorAll is only capable of selecting DOM elements. It can’t select any DOM node (e.g. text nodes, comments, etc.).
^love*encounter~flow (June 22, 2009 at 12:25 pm)

2 | 2 => 2 2 + 2 => 4

see, that makes the practice of passing bitfields so dangerous. those values behave like integers, but work for you only as long you OR and AND NOT them. ADD them only for funny effects, AYOR.

i am adverse to the idea of passing a list of binary values to the method in question. because in effect, you build a specialized data structure so you can conveniently pass one type of argument. how about building a container around your string arguments and another for your numerical arguments? guess you would be against that. sometimes this kind of solution may be right, but here, it is not.

funny: you build a specialized data structure to house your arguments, a list; in effect, you build a named non-standard sub-namespace with indexed semaphores. the indexes are not important, so you ignore them. the values you feed in there are distinct enough so they can live together in a list. but you do not want to let them mingle with the hoipolloi that are the other common arguments. like, i mean, there is an `arguments` global in javascript that you can use to fish out any numbers of argument values. the arguments are, in themselves, already a list-like object that you can iterate over.

passing bitfields for arguments requires that, in absence of names, values are distinct enough in themselves so they can take care of themselves inside of a set—unordered, unnamed. what John suggests is to pass a value that is by extension (actual data type) a list, but by intention a set whose ordering you cannot and don’t have to predict. the values, as such, are ultimately non-informative: they print out as 4, 8, 16, and so on. i suggest to keep the idea of unordered, distinctive keys, and suggest that you go one step beyond and give those 4, 8, 16 a voice in the general assembly of values with a name and rid yourselves of the data prison this list of integers is.

that prison is one more tier for your argument processing. you see, argument processing in javascript is hard. it is even hard in python. now you have at your hands a passed-in variable, `f(showme)`, that you presume to be an integer number and that you will test like `node=(showme&4)?this_node:null;`. the test will be run against an ANDed integer. this integer number you don’t want to show up in the source, so you reference it by name: `node=(showme&SHOWME_COMMENTS)?this_node:null;`. the name of the reference is stored somewhere so both the producer and the consumer can access it symbolically (by name). you have just sunken one piece of interface definition into the source. this feels much like sinking stored procedures into a database, in the hope that will help integration and performance. it will both not, and i argue against it: just look into the hideous syntax that procedural programming in SQL is and you know you can trust that platform for procedural purposes as far as you can throw it (just learned that idiom).

the code in a web application (the client-side javascript and the server-side favourite snake) reports to a (possibly unwritten) interface specification. that is a huge conundrum of age-old conventions, best practices, feasibilities and, lastly, side-notes on your `getNodeIterator` implementation. i suggest to keep that last part of the docs short, without surprises, and to shun any very special data-mangling requirement. like, ‘pass in `4` if you want text nodes; OR that with `8` if you want comments as well; AND NOT it with `16` to exclude any ordinary nodes’. what kind of read is that? better, then, to say: ‘you can pass in `show_nodes:true` to show nodes, `show_text:true` to show text nodes, and `show_comments:true` to show comments’. this pushes the honor/onus of naming to where it belongs: into the interface code, and into the documentation. it does not belong into the deeper sinks of code (part of the interface, but only a tiny part, being administered by a namespace-look-alike module), unless it is considered cached object data (so-called implicit assumptions, stretches of data that belong to your model, but that have to be too available, are too involatile, and are too few to be queried from a database).

someone from a ruby list once complained that the moment they’d get named values in ruby function calls would be the moment when his holy function parameter names were to be set in the stone of the documentation, and used to identify arguments in calls, and he didn’t like that.

it’s the way to go.

i say the function call parameters constitute a data type and namespace like any other, with the provision that you want to have positional and obligatory members; these features make the conventional function signature namespace type different from an ordinary dictionary and list namespace. people have answered to the calamity of not having named arguments in javascript by writing functions that accept a single `options` argument which is a `{}` (POD, plain old dictionary). `{}` is not perfect in that it requires extraneous punctuation, `f({'bar':42})` instead of `f(bar=42)`, plus you have to complain yourself if a parameter did not get passed in; in python this is a post-syntax but pre-i-call-thee error: you cannot call a function with a missing required argument, it is not a syntax error though (it would be in java, which won’t compile any `f()` where `f(a)` was called for). in javascript, it is not even a post-i-call-thee error: the method gets called alright even with missing requirements, and either implements its own sanity checks upfront in the function body, or depends on the moral of, weah, ‘shit passed in, shit happens, f*’em’ (99% of all javascript code is like this; even jQuery fails with mysteriousest error messages pointing to undocumented miraculous lines of incantations written for arcane efficiency, not human consumption: assembler of the machine age).

in all, in javascript failures (1) happen too late, (2) cause too strange effects, (3) die too mysteriously when they finally do die. it’s like having an obnoxious locust under your bed in the dead of night: kan’t kill it, and when you do, it’s too late, and you wonder why. you always want to make javascript fail early, with helpful hints (unlike canvas API calls that fail with garble, no traceback, nuthin). of course, there is a performance penalty. but some of it, maybe not in an efficiency-aspiring library like jQuery, is justified (the other day i realized that in my box2d-js-based physics simulation, 40% of all cycles got burnt in protoclass.js, a purely OOP-shim-whatnot module, there to ease out the bumps on automated transition from C to AS to JS. bummer. as a consumer, all i get is the benefits of an utterly inscrutable classy-OOP API. you get to show like flash inside an html5 canvas, but your money rests with the bureaucracy).

i counsel against using protoclass.js or any other such tool, and i counsel for burning framework cycles in frameworks that produce visible output for clients. of course, one must be precautious: you don’t use optimized helper A but roll your own methodology (in an informal and distributed manner), then of course your doings are not going to show up on the firebug profiler data dump. that is because they don’t have a specific address to the machine so whatever cycles you burn to cook up your own argument processing gets subsumed under all your function calls; doesn’t mean you did all well only coz’ those ugly figures have been wiped for good.

i can build quite capable javascript libraries by now, things that are reasonably organized and do not use any OOP shim, outrageously deficient as javascript’s proto-typee-bypee model of code-organization may be. i don’t care. i never touch `x.prototype`, and wouldn’t with a long pole, for i have heard people talk bad on them. hey, how many projects out there promise to give you OOP Kool Aid for JS and get it right only in version 3.1? those experts are smart, but it is so difficult, even crockford sometimes gives stoopid pieces of advice. you ain’t wanna talk bout the kidneys when you’s not an hepatologist. and when they get it right, they do so with a 40% fallout that rains on my precious browser performance. not good.

the good news is not that we get class-based, real OOPs into JS, some day; the good news is that we have native (para-) PODs in javascript, right now. those `{}` and `new Object()` thingies are worth almost a python `dict`; they are the most versatile and capable data type of the seven sisters of JSON (null, false, true, number, text, list, pod), even excelling general objects (ie. your typical class instance), whose un-arcane mishmash of names that denote state and names that denote function makes them misfits for consumption over the wire, or printouts.

i therefore argue that taking a suitably named parameter, assigned to the minimal value of `true` (or `1` for people that like to live on the edge), is, generally speaking, the right solution; your method calls can then most of the time be reduced to `f()`, `f(a,b,c)`, `f({})`, `f(a,b,c,{})`, thereby reducing the seemingly endless variety of distinct flavors of doing the one same thing to a predictable set of choices: pure calls, calls with a few positional parameters, calls with a sole `options` POD, lastly, calls with few positional parameters plus an option POD (ie, summing up, ‘at most a few unnamed parameters, where called for by convenience, efficiency, convention; all complexity into one rather flat options dictionary’). the requirements for the POD arguments, sadly, can not be written into the function signature, but must somehow be done in the function body. while ‘primitive’, this is not an altogether bad thing, since even the most strictest bind-and-conquer-est languages for drama programmers (like java) only give you so much of sanity checking in their function signatures—to be fair, you MUST ALWAYS specify parameters with a type, but you don’t get a chance to specify that only numbers between 3 and 12 will be allowed. you also do not get the freedom to pass a float where an integer was required. that is sooo helpful. java’s datatype-prescriptionistic ways will rule 120% of your way to build an application. they wholly-own you. will you get a reasonable call signature data type or any kind of declarative assertive infrastructure? no. wanna pass a float for an int? signature with templating required. say what?

so while `x.SHOW_ME_THIS|x.SHOW_ME_THAT` might make sense for a seasoned C programmer or someone who has signed up to live out their life inside the w3c, it is neither suitable, nor efficient, nor without more promising alternatives in javascript, borked as that language may be; it is not yet warped to the level of complete IDL idiocy. heck, the computer industry can’t even agree on keycodes for virtual keyboards (means: my google pinyin chinese input method always gets to see my keyboard like it was a US layout. an input method that gets the input terminal wrong, great), what do you expect a universal cross-language cross-platform cross-culture API definition language designed by committee to be like? like in, restricted, convoluted? you would be right. we should take IDL specifications as a jargon with its own specifics, a general instruction at best what is there to do and a suggestion how to call stuff. if they want more power over the way we implement their suggestions, API spec-wrights must needs become better at using a language anyone in programming wants to speak. you would not actually want to program in IDL, so why defer to it? to let IDL take the reins of implementation is not wise. people rightly say that the DOM specification is not only there for javascript, it is there for all the languages. i say, have the gods of DOM now become our new overlords and are we the ones to hail them? i take it we’re not.

the two solutions i prefer in a situation like this are either to allow passing, to the function call, strings that spell out their purpose, like `f('show-nodes','show-comments')`, or to use the general options: `f({'show-nodes':true,'show-comments':true})`, or to allow an optional mixture of both ways. this approach has the following benefits:

(1) users can call the function with arguments as needed. counterexample: i’m expected to end 99% of all `arc` calls to the html canvas context with the incantation `ctx.arc(…,0,2*Math.PI,true)` (meaning: draw the arc from 0 degrees all the way round, and yes, in counterclockwise direction; otherwise angles go the other way, heck knows why not here).

(2) users do have to learn the `show-comments`, `show-texts` little language, it is true, but they were expected to learn `foobarbatz.SHOW_NODES` before anyhow, so no extra burden here (of course there will be those who learn the unofficial numbers behind those names and take the `iterNodes(42)` highway. it is a much faster crash y’know). users don’t need a potentially unobvious library call to claim their desires, they can just go along and write out the text.

(3) these arguments don’t have to be written in any specific order, and that is fine. they can combine as they like. the bitfield used before wasn’t ordered, and that was the OK part about it. it is not like the bitfield as a data type was inappropriate, it is strictly the implementation of the bitfield as an integer, which it is not, that is problematic. add to that the fan-meets-ape-factor that javascript has no integer datatype at all and see that bitfields are no member of parliament in this browser: they are but simulated.

(4) argument collections, aka configurations, can be serialized: if the `options` argument is just another POD, filled with nothing but strings, lists, PODs, true, false, none, numbers, it can go to about anywhere in the wide world of IT. you know, good girls go to heaven, bad girls go to sandville, trueville, everywhere. good data go to memory, but bad data go to the database, to the file, over the wire, and into the browser, and back. you gain the ability to persist your choice of arguments as-is, JSONified, to a flat file configuration, allowing for editing, and you don’t have to unpack anything in the simple case to feed those choices to a function. you don’t always want to do this, but using this methodology you can always do it. well not in browser javascript, except inside a cookie or reading from http.
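
A rough sketch of how benefits (1) to (4) could look in practice, assuming a thin adapter sits in front of the existing bitmask API. Everything here is hypothetical: the function name `toWhatToShow`, the flag names, and the `FLAGS` table, whose numbers are stand-ins mirroring NodeFilter.SHOW_ELEMENT, SHOW_TEXT and SHOW_COMMENT so the snippet runs without a browser.

```javascript
// Assumed flag table: names from the comment above, values mirroring NodeFilter.SHOW_*.
const FLAGS = { "show-nodes": 0x1, "show-text": 0x4, "show-comments": 0x80 };

// Hypothetical adapter: accepts flag names as strings, an options object, or a mix of both,
// and fails early on anything it does not recognise.
function toWhatToShow(...args) {
  let mask = 0;
  for (const arg of args) {
    // A bare string means "this flag is on"; an object maps flag names to booleans.
    const entries = typeof arg === "string" ? [[arg, true]] : Object.entries(arg);
    for (const [name, enabled] of entries) {
      if (!Object.prototype.hasOwnProperty.call(FLAGS, name)) {
        throw new TypeError("unknown flag: " + name);
      }
      if (enabled) mask |= FLAGS[name];
    }
  }
  return mask;
}

// Order does not matter, and the strings say what they mean.
const fromStrings = toWhatToShow("show-comments", "show-text");

// The same choice survives a round trip through JSON, e.g. a config file.
const saved = JSON.stringify({ "show-text": true, "show-comments": true });
const fromJson = toWhatToShow(JSON.parse(saved));

console.log(fromStrings === fromJson); // true
```

The bitmask is still there underneath; the point of the sketch is only that callers never have to see it, and that a typo such as `"show-coments"` throws immediately instead of silently selecting nothing.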
^love*encounter~flow (June 22, 2009 at 12:27 pm)

“querySelectorAll is only capable of selecting DOM elements. It can’t select any DOM node (e.g. text nodes, comments, etc.).” can APIs be any more needlessly specific and offer worse naming? querySelectorAll? say what? that does what?
eyelidlessness (June 22, 2009 at 2:58 pm)

^love*encounter~flow,

“querySelectorAll? say what? that does what?”

The way I read it, you query the [CSS] selector engine for all matches (as opposed to querySelector, which returns a single result). Naturally, the selector engine only deals with elements. If this is “needlessly specific”, I’d like to know what CSS selectors you are using to select text and comment nodes, or how you would propose doing so?
^love*encounter~flow (June 23, 2009 at 4:00 am)

i always use `$('.foo')`, `node.find('.foo')`, and, of late,

$.fn.text_nodes = function() {
  var R = [];
  this.each( function() {
    // keep a reference to this callback so it can recurse into child elements
    var fn = arguments.callee;
    $(this).contents().each( function() {
      if ( this.nodeType == 3 /* || $.nodeName(this, "br") */ )
        R.push( this );
      else
        fn.apply( $(this) );
    });
  });
  return $(R);
};

to get text nodes (`$('.foo').text_nodes()`). i find these methods infinitely easier to use than the standard ones. so `querySelector` really means `query_for_one_element_using_css_selector` if i understand you? well that name would be a bit too long indeed, so some shortening had to be done. “needlessly specific”, because many things you want to do with the dom consist in finding nodes and walking over them in a certain way, and that is not dom-specific, it is a general data processing task. yet these methods will not work for general data, nor are they refinements of more general methods. dom elements are not implemented as refinements of existing data types; likewise, `arguments` is not a plain array or a refinement of an array, with the same problems. i think we should much more use standard data types wherever possible and try to write methods that are more generic. more generic methods are also more amenable for general consumption, and broader usage will help to sort out issues with interfaces, practices, and performance.
IAmAnIdiot (June 23, 2009 at 7:20 pm)

Sorry John, but regarding the “return an array” thing you are probably missing the point.
The “thing” is called NodeIterator because it is, well, you know, an iterator. It allows you to “iterate” over a list of elements without actually allocating all of them in memory. I mean, it’s cool to have an array, but c’mon, at least you should’ve proposed to add a “.toArray()” method to the iterator object itself.

If the intention was to have an “iterator”, simply proposing to use an array is not an answer ;) !

p.s.: oh, and btw: i don’t find it so ugly to write things like “Comment + TextNode” as a filter, it seems intuitive at least; and you can even add/remove single filters from a composed filter. Very useful! Using multiple arguments instead seems limited, counter-intuitive and just plain wrong. What do you think?
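
The iterator-versus-array point can be sketched without any DOM at all. This is an illustration under assumptions, not how NodeIterator is implemented: the `tree` of plain objects and the names `iterateNodes` and `isText` are hypothetical, and a generator function stands in for the iterator.

```javascript
// Hypothetical stand-in for a DOM tree: plain objects with a `type` and optional `children`.
const tree = {
  type: "element",
  children: [
    { type: "text" },
    { type: "element", children: [{ type: "comment" }, { type: "text" }] },
  ],
};

// Lazily yields matching nodes in document order; nothing is collected up front.
function* iterateNodes(root, accept = () => true) {
  if (accept(root)) yield root;
  for (const child of root.children || []) {
    yield* iterateNodes(child, accept);
  }
}

const isText = (node) => node.type === "text";

// Pull a single result: the walk goes no further than the first match.
const first = iterateNodes(tree, isText).next().value;

// Or opt in to an array when one is wanted (the ".toArray()" idea).
const allText = Array.from(iterateNodes(tree, isText));

console.log(first === tree.children[0]); // true
console.log(allText.length);             // 2
```

With this shape the array is one call away for people who want it, while code that only needs the first few nodes never pays for the rest.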
James Pearce (June 23, 2009 at 11:10 pm)

John, are you involved in any of the relevant W3C groups?

If not, you should be – bring some fresh contemporary thinking in at the source, rather than via the blogosphere.

(Conversely, you can hear and question the arguments for ‘crazy’ decisions first hand.)

I know that sometimes W3C groups invite ‘subject matter experts’. I guess you are one ;-)
Kevin C (June 25, 2009 at 11:19 am)

Looking at the NodeIterator API, the immediate response is: “let’s create a wrapper API of some kind”. Then lots of developers and library builders will do that in a variety of ways.

But really we want APIs that are so simple, intuitive and powerful that wrappers are unnecessary. Then the shared understanding can be reused among developers, with no need to learn the ‘wrapper’ that a particular project has chosen.
Przemek Klosowski (June 25, 2009 at 3:51 pm)

Re. different way to specify node filtering, your proposal to use variable arguments is flawed. Besides being syntactically ugly (subjective, that), it makes it impossible to parametrize the call: if the filter was determined elsewhere in the code how would you pass it to the iterator??? I think you’d need an eval or something like that.

Think about it–the filter really is a logical expression (I want nodes that are text nodes OR comment nodes), so why not use a logical expression to specify one? In this case the properties are exclusive, but in general they don’t have to be, and then you may want to AND them (I want nodes that are text AND non-empty). What if you needed an arbitrary logical combination of criteria (x OR y AND (z OR t))?

If you like specifying multiple arguments, use explicit logical functions: document.getNodes(OR(Node.COMMENT_NODE, Node.TEXT_NODE))
Just four extra characters, and you don’t paint yourself into a corner.
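
One way to picture the "explicit logical functions" idea is with small predicate combinators. This is only a sketch: `or`, `and`, `not` and the three test functions are hypothetical names, and the nodes are plain objects carrying the standard `nodeType` numbers (3 for text, 8 for comment) so that it runs without a browser.

```javascript
// Hypothetical combinators: each takes predicates and returns a new predicate.
const or = (...tests) => (node) => tests.some((test) => test(node));
const and = (...tests) => (node) => tests.every((test) => test(node));
const not = (test) => (node) => !test(node);

const isText = (node) => node.nodeType === 3;    // Node.TEXT_NODE
const isComment = (node) => node.nodeType === 8; // Node.COMMENT_NODE
const isEmpty = (node) => (node.nodeValue || "").trim() === "";

// "comments, or text that is non-empty": built once, then passed around like any other value.
const wanted = or(isComment, and(isText, not(isEmpty)));

// Plain-object stand-ins for DOM nodes.
const nodes = [
  { nodeType: 3, nodeValue: "hello" },
  { nodeType: 3, nodeValue: "   " },
  { nodeType: 8, nodeValue: "" },
  { nodeType: 1, nodeValue: null },
];

const kept = nodes.filter(wanted);
console.log(kept.length); // 2
```

On the parametrization worry: a list of node types computed elsewhere can also be handed to a variadic function with `Function.prototype.apply`, so no eval is needed there either.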
Fred P. (June 29, 2009 at 1:54 am)

document.getNodes(Node.ELEMENT_NODE,Node.COMMENT_NODE,Node.TEXT_NODE);

Why not simply add this functionality to the simpler querySelector() API?
http://www.w3.org/TR/selectors-api/

document.querySelector(":elementNode,:commentNode,:textNode");

Since :elementNode is really *, then we get:

document.querySelector("*,:commentNode,:textNode");

then we can do things more easily:

document.querySelector("script :commentNode");
document.querySelector("span :textNode");

For instance, this would be very useful for the IE opacity bug:

$('#container :textNode').parent().css('background-color','white');

Also, a grouping OR ‘|’ operator shortcut would be very useful:

$('#container (td|div|span).cls a img')

would be equivalent to the more verbose:
$('#container td.cls a img, #container div.cls a img, #container span.cls a img')
or this:
$('#container').find('td,div,span').filter('.cls').find('a img')

The reason is that iterating over each node to find .cls on IE sometimes creates problems for Flash elements and the like.

$('#container .cls a img')

or a :safe grouping for iteration (at least to sidestep the IE enumeration bugs) that would exclude embed, object, iframe, applet, script and style:

$(':not(script|style|embed|object|iframe|applet).cls')
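
The grouping shortcut does not exist in any selector engine, but the expansion it implies is mechanical. A minimal sketch, assuming only a single standalone `(a|b|c)` group per selector; the function name `expandGroup` is hypothetical, and groups used as an argument, as in the `:not(...)` example, are deliberately left untouched.

```javascript
// Hypothetical helper: expands one standalone "(a|b|c)" group into a comma-separated selector list.
function expandGroup(selector) {
  // The lookbehind skips parentheses that follow a name, e.g. ":not(".
  const match = selector.match(/(?<![\w-])\(([^()|]+(?:\|[^()|]+)+)\)/);
  if (!match) return selector;
  return match[1]
    .split("|")
    .map((alternative) => selector.replace(match[0], () => alternative))
    .join(", ");
}

const expanded = expandGroup("#container (td|div|span).cls a img");
console.log(expanded);
// #container td.cls a img, #container div.cls a img, #container span.cls a img
```

The result is an ordinary selector list, so it could be passed straight to `querySelectorAll` or `$()` as they exist today.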

Comments are closed.
Comments are automatically turned off two weeks after the original post. If you have a question concerning the content of this post, please feel free to contact me.

