A new feature being introduced in HTML 5 is the addition of custom data attributes. This is a seemingly bizarre addition to the specification – but it actually provides a number of useful benefits.
Simply, the specification for custom data attributes states that any attribute that starts with “data-” will be treated as a storage area for private data (private in the sense that the end user can’t see it – it doesn’t affect layout or presentation).
This allows you to write valid HTML markup (passing an HTML 5 validator) while, simultaneously, embedding data within your page. A quick example:
<li class="user" data-name="John Resig" data-city="Boston" data-lang="js" data-food="Bacon">
  <b>John says:</b>
  <span>Hello, how are you?</span>
</li>
The above will be perfectly valid HTML 5. This should be a welcome addition to nearly every JavaScript developer. The question of the best means of attaching raw data to HTML elements – in a valid manner – has been a long-lingering question. Frameworks have tried to deal with this in different manners, two solutions being:
- Using HTML, but with a custom DTD.
- Using XHTML, with a specific namespace.
The addition of this prefix completely routes around both issues (including any extra markup for validation or needing to be valid XHTML).
On top of this a simple JavaScript API is presented to access these attribute values (in addition to the normal get/setAttribute):
var user = document.getElementsByTagName("li")[0];
var pos = 0, span = user.getElementsByTagName("span")[0];
var phrases = [
  {name: "city", prefix: "I am from "},
  {name: "food", prefix: "I like to eat "},
  {name: "lang", prefix: "I like to program in "}
];
user.addEventListener( "click", function(){
  // Cycle through the phrases so extra clicks don't run past the end of the array
  var phrase = phrases[ pos++ % phrases.length ];
  // Use the .dataset property
  span.innerHTML = phrase.prefix + user.dataset[ phrase.name ];
}, false);
The .dataset property behaves very similarly to the .attributes property (but it only works as a map of key-value pairs). While no browsers have implemented this exact DOM property, it’s not hugely needed – the above code could be done with the critical line replaced with:
span.innerHTML = phrase.prefix + user.getAttribute("data-" + phrase.name );
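The two access styles can also be combined. The following is only a sketch under stated assumptions: readData is a hypothetical helper name (it is not part of the specification), and it assumes simple one-word names such as city or food, since .dataset exposes hyphenated names in camelCase form. It uses .dataset where the element exposes it and falls back to getAttribute otherwise.

```javascript
// Hypothetical helper (not from the spec): read a data- attribute by name.
function readData(el, name) {
  // Use the .dataset map when the element exposes one and it has the key.
  if (el.dataset && name in el.dataset) {
    return el.dataset[name];
  }
  // Otherwise fall back to plain attribute access.
  // getAttribute returns null when the attribute is absent.
  return el.getAttribute("data-" + name);
}

// Possible use inside the click handler shown earlier (browser only):
//   span.innerHTML = phrase.prefix + readData(user, phrase.name);
```

Written this way, the calling code does not need to change as browsers add the .dataset property.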
I think what is most enticing about this whole specification is that you don’t have to wait for any browser to implement anything in order to begin using it. By starting to use data- prefixes on your HTML metadata today you’ll be safe in knowing that it’ll continue to work well into the future. By the time the HTML 5 validator is integrated into the full W3C validator, your site will already be compliant (assuming, of course, you’re already valid HTML 5 and using the HTML 5 Doctype).
Andrew Dupont (July 13, 2008 at 1:22 am)
Reading that last paragraph made me realize that I know of no sites which are using HTML5 already, even though it’d be easy to start. There’d be almost no friction to move to HTML5 from either XHTML1 or HTML4, on top of which the HTML5 DOCTYPE is already known and ensures all browsers still serve up the document in standards mode.
Why don’t you do it, John? You’d be a trend-setter.
Anne van Kesteren (July 13, 2008 at 1:43 am)
The DOM attribute is called dataset, not userdata, FWIW.
Pete Forde (July 13, 2008 at 2:44 am)
*rubs nipples*
carefulweb (July 13, 2008 at 4:17 am)
Excellent post. This is a really useful feature that avoids crippling hacks. Thanks for sharing the info, John!
Jake Archibald (July 13, 2008 at 6:14 am)
This part of the HTML5 spec is fantastic and scary.
It’s great that there’s a place to put meta data in any HTML element. It will see an increase in microformat-type patterns, and I’m sure JavaScript libraries will use it for UI widgets similar to how dojo does now.
However, I can see it being frequently abused. When a developer’s learning a new standard they’re more likely to latch on to the “you can put anything you want here” things, rather than the “this is for a specific type of data” things. I think you’d see people use it for storing time data, forgetting about the time element.
This isn’t a new problem to HTML. You don’t have to look far to see a div being used where another element is more appropriate.
Would the HTML5 validator throw a warning if it saw time data in a dataset attribute? Or does that smell of a clippy-style “I see you’re trying to mark up a datetime”?
Jake.
Jostein Kjønigsen (July 13, 2008 at 6:45 am)
Call me a XHTML zealot, but while all this seems nice and fancy, how would you embed hierarchical data or nested data into a system like this?
Compared to using Micro-formats in XHTML with custom namespaces, this solution just seems like a half-assed job to me.
an0n1 m0us (July 13, 2008 at 8:19 am)
John are you suggesting we use markup that breaks existing standards now, for who knows how many years until HTML 5 is ratified, just because we’ll be safe when HTML 5 finally arrives?
John Resig (July 13, 2008 at 8:32 am)
@Andrew Dupont: You know, I’ve been considering this for quite a while now. I’ve been contemplating a redesign of my site and if I did it, going the HTML 5 route would make a ton of sense. Now I just need to think of excuses to sneak some of these features in!
@Anne: Sorry about that, fixed – that’s what I get for writing at 2am in the morning.
@Jake: While I will agree on the div element, I think I will disagree on attributes – I think we see far more abuse of “you can only put specific things in here” attributes. Take the title attribute for example (in relation to Microformats). They abuse that one to high-heaven, constantly overloading its intended meaning (which is now causing problems for it).
I think there needs to be a clear benefit, to the developer, for using a specific attribute or element. For example there’s really no benefit to them for using an address element instead of a div. However, that is not so in many of the new HTML 5 elements and attributes – they outline very specific new behaviors that wouldn’t have been possible otherwise.
I don’t think it would throw a warning for putting time information in a data attribute. As a hypothetical example – let’s pretend someone used the time element to embed when a blog post was published. Then they also used a data- attribute to embed the time the blog post was last updated. Since the last updated information isn’t important or useful to the user (as deemed by the author) then there’s no reason to expose it.
@Jostein: Nope, this system doesn’t support hierarchical data or nested data any more than any other attribute. That doesn’t mean that you can’t use these attributes to create that situation, though. For example, do you want namespace-like behavior in your HTML? Here you go!
<div data-ns-foo="http://foo.com/" data-foo-name="John"/>
Nothing precludes you from using this system to implement and enhance others (such as Microformats or RDFa). One major advantage that data- attributes have over XHTML and namespaces is its simplicity. They’re far more understandable and yielding to new users – and I fully expect that they will flourish where namespaces have fallen behind.
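As a rough illustration of the namespace-like convention in the div example above, a script could resolve the prefixes itself. This is a sketch only: readNamespaced is a hypothetical function, and the rule that data-ns-<prefix> declares a prefix while data-<prefix>-<name> uses it is an assumption drawn from that one example, not something the specification defines.

```javascript
// Hypothetical helper: group data-<prefix>-<name> attributes by the URI
// declared in a matching data-ns-<prefix> attribute on the same element.
function readNamespaced(el) {
  var attrs = el.attributes, prefixes = {}, result = {}, i, m;

  // First pass: collect prefix declarations such as data-ns-foo="http://foo.com/".
  for (i = 0; i < attrs.length; i++) {
    m = /^data-ns-([a-z0-9]+)$/.exec(attrs[i].name);
    if (m) {
      prefixes[m[1]] = attrs[i].value;
      result[attrs[i].value] = {};
    }
  }

  // Second pass: file data-foo-name="John" under the URI declared for "foo".
  for (i = 0; i < attrs.length; i++) {
    m = /^data-([a-z0-9]+)-(.+)$/.exec(attrs[i].name);
    if (m && m[1] !== "ns" && Object.prototype.hasOwnProperty.call(prefixes, m[1])) {
      result[prefixes[m[1]]][m[2]] = attrs[i].value;
    }
  }
  return result;
}
```

Attributes whose prefix was never declared are ignored, which is one possible policy among several.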
@an0n1m0us: Who said anything about breaking existing standards? I’m just suggesting that it’s perfectly possible to begin using HTML 5 today with no ramifications. You can convert a site to use HTML 5 – and have it validate – right now. The markup proposed by the specification is largely complete now and already built into many parsers and validators. There’s really no reason not to use it.
Flavio (July 13, 2008 at 8:33 am)
This sucks. And Jostein is right. I’m somewhat concerned about this disregard for XML features. I don’t really like HTML5 reintroducing tag soup.
John Resig (July 13, 2008 at 8:51 am)
@Flavio: Are you recommending that HTML 5 become proper XML with namespace support, instead? Because that certainly doesn’t work in the web as we know it. We’ve been down this road before (with XHTML) and it was a failure by almost all counts (HTML is still the dominant markup). The learning curve and failure rate are too high for purely XML-based markup, which is why the data- attribute exists as a means to implementing this solution. It is very important to realize that data- attributes don’t try to replace namespaces, RDFa, Microformats – or any of that. They provide the tools necessary to *implement* the above functionality. With data- attributes you can implement RDFa in HTML, advanced Microformats in HTML, and even some form of namespacing in HTML. None of this was previously possible – and now it is.
Adam Bergmark (July 13, 2008 at 9:03 am)
I don’t like this at all. Usage of this will tightly couple DOM structures with model data. Using this 1-to-1 mapping seems crippling. Instead I’d suggest separating Model and View so that multiple views of the same data structure won’t be a problem. Conventions can be used to make the bindings between the two pretty much seamless ( http://www.cactusjs.com/browser/trunk/MVC/View/Template.js ).
Jostein Kjønigsen (July 13, 2008 at 9:35 am)
@John: Not debating your argument about poor XHTML adoption, but I just can’t help but feel that this is replacing a good, concise and flexible standard with a messier, more limited standard just for the sake of standardisation.
Basically solving parts of the problems we have today and pushing the remaining complex bits ahead to cause new, perhaps messier problems and hacks in the future.
To the W3C and anyone involved: Sorry but I’m definitely not on the HTML5 bandwagon yet and can’t help but think it’s nothing but a camouflaged step backwards.
Rutherford (July 13, 2008 at 10:09 am)
I agree with Adam, separating View and Model is crucial. Markup has a very specific purpose and I never liked microformats for bending markup to their needs.
Style, Script and Data should be separated on their own.
Craig Buchek (July 13, 2008 at 10:09 am)
I’m on the “why don’t we just use XML namespaces?” side.
To put it in a different way, if the spec can say “any attribute starting with ‘data-’ is valid HTML5”, why couldn’t it say “any attribute in a namespace is ignored when checking HTML5 validity”, or something similar?
Many JavaScript libraries are already (ab)using such namespaced attributes (in HTML or XHTML), and just saying that the document is valid HTML, except for those added attributes.
Rutherford (July 13, 2008 at 10:16 am)
Now imagine rendering a table with this kind of hybrid monster: every piece of td-text will have a cloned data-attribute, unnecessarily duplicating data.
On the other hand, the “datasource” attribute may be the solution we are looking for, putting all the data in a separate layer, accessible from the dom and code.
Bob Marchman (July 13, 2008 at 11:56 am)
I feel that XML namespaces are certainly a more elegant and abstracted approach, but I welcome HTML5 for the same reasons John mentioned above (learning curve, adoption rate). I think that the current implementation of XML namespaces is dangerous in the wrong hands, and if the developer uses it incorrectly then it’s basically useless.
We developers bear the responsibility of evangelizing the best suitable spec or standard for the web, so regardless of your opinion, if you want developers/designers to use the method you think is best then get the word out. HTML5 may not be XHTML2, but I still think it’s a step forward. A wonderful thing about the web is how ever-evolving it is. If you don’t like it, then change it. :-)
Sergey (July 13, 2008 at 12:12 pm)
They should have used namespaces instead. This is ugly and smells like JavaBeans property naming convention: arbitrary and inconvenient.
hat (July 13, 2008 at 12:32 pm)
I wonder if we could have external data attached to a page like we can do with stylesheets and javascript, and have static pages that can use that external data without themselves having to change…
Rutherford (July 13, 2008 at 2:10 pm)
I believe xml data islands mixed with some E4X and a new DATA tag would be a good alternative.
This runs fine in Firefox:
<html>
<head>
<style>xml{display:none;}</style>
</head>
<body>
<xml id="mydata">
<person>
<name>John</name>
<city>Miami</city>
</person>
</xml>
<div>Name: <data id="person.name"/></div>
<div>City: <data id="person.city"/></div>
<script>
cxml = document.getElementById("mydata").innerHTML;
person = new XML(cxml);
document.getElementById("person.name").innerHTML = person.name;
document.getElementById("person.city").innerHTML = person.city;
//alert(person);
</script>
</body>
</html>
Forgive me if this screws up the comment system, please format or delete at will.
maht (July 13, 2008 at 2:20 pm)
This is awful, truly terrible.
If you want a dataset for the page, jolly well define one out of band!
markus (July 13, 2008 at 3:11 pm)
To be honest, I don’t like it :(
First, nice post and explanations. I also dislike XML as such and Javascript does not help much to solve the complexities.
I can’t help but feel this creates more problems than it solves.
I will personally rather use meaningful div tags with proper id’s – which I can use as a “namespace” anyway, somewhat – than introduce data-foo* elements which I am sure will have a very narrow and limited use case anyway.
mario (July 13, 2008 at 4:30 pm)
I’ve lately tried to preserve some meta data in my web pages. Using XML namespaces with jQuery to access them wasn’t working at all. (I’ve tried Opera, although Firefox might work better here.)
So this HTML5 proposal seems like a nice solution. XML support isn’t yet where it should be. While HTML5 isn’t either, these data-attributes are perfectly fine from a SGML point of view. XML purists might be offended by open DTDs, but I see this as just another notation.
You could always fix it later by s/data-/data:/ and a proper namespace URN.
Ben Hoyt (July 13, 2008 at 7:24 pm)
Huh, interesting that they’re now standardising this with the data- prefix. But we’ve been using custom attributes on many of our tags for a while now. The W3C validator doesn’t like it much, but all the browsers support it already.
I’m just not sure I see what’s so cool about something that already works fine. But I guess I can see sense in using the data- prefix on custom attributes from now on.
ismailis (July 13, 2008 at 9:26 pm)
safdfadsa adsfasdf sf
Jake Archibald (July 14, 2008 at 1:33 am)
@Ben: The whole point of HTML5 is to make something that works fine now. Even the new features have an acceptable fallback in current browsers (such as the new form elements).
Flavio (July 14, 2008 at 4:09 am)
@John, thank you for your reply. The whole “learning curve” argument is flawed IMHO. HTML was never meant to be created by non techy users. That’s what (visual) tools and CMSs are for. HTML is for developers. And developers must get things right. If they don’t get XML, maybe they should reposition their careers.
What do you mean by “failure rate”? Failure to author a valid document? Solution: automate and create higher level application specific languages that help devs *generate* proper XML. Or maybe you mean failure on the browser side? In that case, everybody knows that user agents have to be fault tolerant. Either way I don’t get your point.
Jay Smith (July 14, 2008 at 10:16 am)
The “data-” prefix came out of the whole idea of microformats, with a strong push from the engineer who championed this feature through the committee. Folks, this problem of microformats has been solved. Look:
<li class="user" name="John Resig" city="Boston"/>
Simple. Clean. Intuitive and Consistent.
Tom (July 14, 2008 at 10:18 am)
I think Craig Buchek wins the Best Idea award.
Jostein Kjønigsen (July 14, 2008 at 11:56 am)
@Tom: Are you saying that HTML should still be the way forward, while we ditch XHTML and rely on XML features in our HTML?
That really doesn’t make sense on any level.
Brad (July 14, 2008 at 12:51 pm)
Part of what was so brilliant about web standards was that we got to raise our page-level interaction to a node-level view. With Microformats and HTML5, I fear we are re-entering the attribute soup of old, even if the attributes are actually meaningful this time around.
That’s my knee-jerk reaction. Part of me can’t see past the “font-face=” aspect, and as I read about all the progress on HTML5, I can’t help but think about the XSLT’s that it’ll take to produce this invisible code.
Philip Dorrell (July 14, 2008 at 3:59 pm)
When this proposal gets implemented, I can add it to my list of HTML annotation methods (with working examples) at http://www.1729.com/blog/HtmlAnnotations.html .
Adam Nemeth (July 14, 2008 at 6:43 pm)
The problem with this attribute is that it has no semantic meaning.
HTML is a document. It’s not a script. A span could contain anything, you say, but in fact that span would likely be given a CSS class, and from then on it has some meaning.
Another problem – which is not a cause, but a symptom – is that if the document is auto-generated, with data coming from multiple sources (multiple contexts), they could be easily mixed. For example: we have a “city” attribute of a person, meaning his/her home location, and a ‘city’ attribute from something else – if we merge the two, what would happen?
The problem becomes familiar in this age of cloud computing when we think of what happens if we try to mix two ajax libraries, both trying to overload the “$” name. For some time, it was practically impossible to let two such pieces of code work together.
That’s why I’d be in favour of some namespace-based solution, either somens:attribute, or the microformat-ish xmlns:….
If you argue that contexts shouldn’t be mixed at all, it still stands that “data” has no meaning. Everything is data. You wouldn’t remove the “class” attribute from a div, so why are you trying to have unspecified attributes in a markup language?
Craig Buchek (July 14, 2008 at 11:09 pm)
@Jostein: Why would XML enter into the picture? I said that if you can say in the standard that all attributes starting with “data-” are valid, why can’t we instead say that all attributes with a “:” are valid? This would apply in the HTML version of the spec. If the document is XHTML, then the “:” can have some additional meaning, as per the XML Namespace specs.
Jostein Kjønigsen (July 15, 2008 at 2:40 am)
@Craig: If that’s what you meant, I got you wrong. Sorry about that.
I still think putting data in attributes is a horrible idea, as it limits the kind of data that can be inserted into the document, not to mention that it means data will be tightly coupled to display elements and can’t be separated out into elements of its own.
Maybe I’m just old-fashioned, but I think this solution is way less elegant than the XML approach.
Tom (July 15, 2008 at 10:13 am)
The point of Craig Buchek’s recommendation is that it is instantly compatible with both world views.
Al Toman (July 16, 2008 at 8:27 am)
Your web page at http://ejohn.org/blog/html5-doctype/ states “A lot has changed in HTML5 in an attempt to make it even easier to develop a standards-based web page, and it should really pay off in the end.”
If one is a professional web developer, it is relatively easy to develop a standards-based web page disregarding html5. You’re saying that html5 lowers the W.W.W. so that amateurs (who are selling themselves as professionals and the common Joe client knows not the difference) can skim through code.
PHP is simple, learned by most anyone who is interested, however, that simplicity is PHP’s downfall. These simple-amateurs bypass the security needed in back end script, hence, compromising security for quick and easy. Most javascript experts, as well, script poorly. Their script has to be combed to W3C validate it.
Standards need to be toughened, not dumbed-down to satisfy the masses who are just too lazy to read what is good code.
Does html5 introduce a spam-free “mailto” tag? Spam costs us all $29 billion a year in the United States alone. The html mailto tag, though W3C valid, should have been deprecated and browser banned years ago. That is, the standards need to be tough not easy!
The reason I mention the mailto tag here is that I see a potentially simple way to get rid of the mailto tag with this html5 data attribute. Do you? An interesting way to save a few billion bucks, hey?
Al Toman (July 16, 2008 at 9:08 am)
Rutherford,
Your script, as written, does not run “fine” in firefox. It is not W3C compliant. As written, your xml, person, name, and city tags are not recognized and therefore considered erroneous. Therefore, I fail to see the point that you’re trying to make, here.
Clay McIlrath (July 16, 2008 at 10:25 am)
@Al Toman – I agree that standards are way too loose. DT’s need to be way more strict in what an element can or can’t do, and follow a more logical structure like a programming language would.
With that, why do so many look down upon XHTML? Because it’s harder to work with or because you see a valid flaw in the DT definition/browser support for the standard?
Phil (July 16, 2008 at 8:33 pm)
@Adam Bergmark – Yeah it could be used in a bad way, but then again maybe not. Assuming that there is server scripting generating the markup, that data could indeed be generated through a properly separated MVC codebase. If it was just flat HTML, then you’re right that the model would mix with the view. But you could say the same about some data wrapped in an H1 tag if you wanted (if it was data hard-coded into a view).
I think it all depends on how it’s implemented, and exactly what type of data is being put into these tags. The view is always going to be pulling data from the model. These are just a different way to wrap that data in the view. The actual source would not necessarily be corrupted by using these tags.
Again, it’s all in how it’s used, but I don’t think there is anything inherently “evil” about it.
Flavio (July 17, 2008 at 3:30 am)
@Phil: everything in an HTML should already be “data”. There’s no need for data-* attribute, well, unless your web design is completely flawed.
Kari (August 6, 2008 at 2:15 pm)
First I thought that this is great – there is finally a way to get RDFa into HTML (not XHTML). However, the approach is perhaps a bit too simple.
Mashups will suffer from such a simple system. Say, I use a markup, which uses data-city for marking up my home city and another markup, which uses data-city for marking up the city, where the comment was written. There will be a nameclash.
Of course, it is possible to differentiate the attributes using data-xmlns (as mentioned in a comment at http://ejohn.org/blog/bbc-removing-microformat-support/ ), but why don’t you make it a standard?
Another way to do it is by having the namespace after the data-prefix: data-www.example.com-city but this is hardly a nice solution…
Andrey Shchekin (September 21, 2008 at 5:30 am)
@Jostein
Where can I vote you into the HTML5 standardization group? Maybe the spec would be more sane then.
@Flavio
I absolutely agree about the learning curve. Also, the actual problem is not in the learning curve as-is, it is in the fact that xml namespaces do not work consistently in browsers+are not advertised enough.
90% of developers I know find XML easier to understand than HTML. Just because XML has strict rules, and HTML has backwards compatibility with tag soup.
si (October 8, 2008 at 6:52 pm)
Thanks for the tip John, I’ve just had need to use a custom attribute, and renamed it to match the spec.
Our use case was this: we wanted client-side sorting on date fields in a table, but because of culture differences we can’t easily parse the date rendered to the client, so instead we set a machine-friendly attribute data-sort=’yyyyMMddHHmmss’ along with the user-friendly date.
In an ideal world I could’ve hidden it inside the table row, but because of server-side limitations to the rendering, it would have been a lot more work and more changes to our web framework, and this is an easy, quick solution that is obvious to anyone reading the code.
I agree with the purists that this attribute should have its own namespace, and chances are data- will get abused, but it is useful.
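A rough sketch of the client-side sort described in this comment might look like the following. It is an illustration only: sortByDataSort is a hypothetical name, and it assumes each element carries a fixed-width data-sort value in the yyyyMMddHHmmss form mentioned above, so that plain string comparison matches chronological order.

```javascript
// Hypothetical helper: return a sorted copy of the elements, ordered by data-sort.
function sortByDataSort(elements) {
  // Copy into a real array so an array-like input (e.g. a NodeList) is not mutated.
  var list = Array.prototype.slice.call(elements);
  return list.sort(function (a, b) {
    // Elements without the attribute are treated as "" and sort first.
    var x = a.getAttribute("data-sort") || "";
    var y = b.getAttribute("data-sort") || "";
    // Fixed-width yyyyMMddHHmmss strings compare chronologically as plain text.
    return x < y ? -1 : x > y ? 1 : 0;
  });
}
```

The sorted elements would still have to be re-appended to the table in the new order; that step is left out here.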
TomCarnell (March 5, 2009 at 6:30 am)
Yes, it will be very useful. But seems to me to demonstrate that there is scope for a new concept:
- HTML => semantic markup and structure
- CSS => style and presentation
- Javascript => behaviour
- Data content => ???
The next question will be that I want to store JSON data in a “data-XXX” field – will the browser give me back a JS object if the contents ‘appear’ to be JSON? or will there be “data-plain-XXX” (for plain text data values) and “data-json-XXX”.
This seems like a quick solution to the bigger problem of separating data from structure.
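On the JSON question: attribute values come back from the DOM as strings, so any conversion is up to the page's own script. The sketch below shows one way that could be done; readJsonData is a hypothetical helper, and the policy of falling back to the raw string when parsing fails is an assumption, not something the specification prescribes.

```javascript
// Hypothetical helper: read a data- attribute and parse it as JSON if possible.
function readJsonData(el, name) {
  var raw = el.getAttribute("data-" + name);
  // getAttribute returns null when the attribute is absent.
  if (raw === null) {
    return undefined;
  }
  try {
    return JSON.parse(raw);
  } catch (e) {
    // Not valid JSON: hand back the plain text value unchanged.
    return raw;
  }
}
```

One consequence of this policy is that plain values which happen to be valid JSON, such as 42 or true, are converted too, which may be an argument for the explicit naming suggested above.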
Eric Garside (April 16, 2009 at 1:14 pm)
This method of storing data attributes on HTML markup could be integrated seamlessly with jQuery. On line 1296 in jQuery-1.3.2, where it returns expando data:
...
// Return the named cache data, or the ID for the element
return name ?
jQuery.cache[ id ][ name ] :
id;
Could be easily expanded to something like
...
// Return the named cache data, or the ID for the element
return name ?
jQuery.cache[ id ][ name ] ||
elem.getAttribute("data-" + name ) :
id;
This way, you can access data through a convenient, already-used interface in jQuery.
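The same idea can be sketched outside of jQuery. What follows is a standalone illustration rather than jQuery's actual internals: cache and getData are hypothetical names. One difference from the snippet above is that it checks whether a key exists in the cache instead of using ||, since || would skip over cached values such as 0 or an empty string.

```javascript
// Hypothetical stand-in for a library's per-element data cache, keyed by element ID.
var cache = {};

// Hypothetical lookup: cached value first, then the data- attribute.
function getData(elem, id, name) {
  var store = cache[id];
  // Prefer a value explicitly stored in the cache, even a falsy one like 0 or "".
  if (store && Object.prototype.hasOwnProperty.call(store, name)) {
    return store[name];
  }
  // Nothing cached under this name: fall back to the markup.
  return elem.getAttribute("data-" + name);
}
```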