When doing DOM-based performance testing you frequently need to pick a sample HTML document to work against. This raises the question: what is a good, representative HTML document?
For many people a good document seems to fall into one of two categories:
- A large web page with a lot of content. When we did our initial performance testing with jQuery we used Shakespeare’s As You Like It (lots of elements, but a very flat structure). MooTools uses an old draft of the W3C CSS3 Selectors recommendation, which has a lot of content as well: thousands of elements with a medium-depth structure.
- A popular web page. Common recommendations include ‘yahoo.com’ and ‘microsoft.com’.
What’s troubling is that there doesn’t really seem to be any way to determine what a representative web page actually is. There are a few things that I’d like to propose as good indicators:
- Standards-based semantic markup (including strong use of attributes: id, class, etc.).
- Non-trivial file size and element count (testing the scalability of the performance).
- Some use of tables and form elements (frequent inclusions in most web pages).
- Strong use of CSS (frequently implies a deep element structure, in order to accommodate complex layouts).
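The indicators above can be turned into rough numbers for any candidate page. The following is a minimal sketch in Python, assuming the standard library's `html.parser` as a stand-in for a browser DOM; the names `DocumentProfiler` and `profile_html`, and the choice of which tags count as table or form elements, are hypothetical and only for illustration.

```python
from html.parser import HTMLParser

# Elements with no closing tag in HTML; they must not deepen the open-element stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}
# Assumption: which tags count as "table" and "form" markup is a judgment call.
TABLE_TAGS = {"table", "thead", "tbody", "tfoot", "tr", "td", "th"}
FORM_TAGS = {"form", "input", "select", "option", "textarea", "button", "label"}


class DocumentProfiler(HTMLParser):
    """Collects element count, nesting depth, and attribute usage while parsing."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open (non-void) elements
        self.profile = {"elements": 0, "max_depth": 0, "with_id": 0,
                        "with_class": 0, "table_elements": 0, "form_elements": 0}

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        p = self.profile
        p["elements"] += 1
        p["with_id"] += int("id" in names)
        p["with_class"] += int("class" in names)
        p["table_elements"] += int(tag in TABLE_TAGS)
        p["form_elements"] += int(tag in FORM_TAGS)
        p["max_depth"] = max(p["max_depth"], len(self.stack) + 1)
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Tolerate unclosed tags: pop back to the nearest matching open element.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def profile_html(html):
    """Return a dict of rough structural metrics for an HTML string."""
    profiler = DocumentProfiler()
    profiler.feed(html)
    profiler.close()
    result = dict(profiler.profile)
    result["bytes"] = len(html.encode("utf-8"))
    return result


if __name__ == "__main__":
    page = '<html><body><div id="a" class="b"><p>Hi<br>there</p></div></body></html>'
    print(profile_html(page))
```

A browser's own parser will repair malformed markup differently from this one, so the depth figure in particular should be read as an approximation rather than the depth of the real DOM tree.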
I think, out of all of these aspects, one page stands out: CNN.com. Here’s why:
- It uses semantic HTML 4 markup with a lot of classnames and ids.
- It’s about 92KB in size and has 1,648 elements in it.
- It has some tables (seemingly for legacy material) and some forms (search forms, drop-downs).
Of course, analysis could always be done against multiple pages and viewed in aggregate, but we need a place to start. So, what do you think: is CNN a good, representative page for doing performance analysis against?
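For the multi-page option, a harness might time the same operation against each document and then summarize the results. The sketch below is only illustrative: it uses Python's `html.parser` rather than a real browser DOM, the two generated documents (`make_flat`, `make_nested`) are hypothetical stand-ins for saved pages, and counting elements by class name stands in for a selector query.

```python
import statistics
import time
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags carrying a given class name (a stand-in for a selector query)."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value and self.class_name in value.split():
                self.matches += 1


def count_class(html, class_name):
    counter = ClassCounter(class_name)
    counter.feed(html)
    counter.close()
    return counter.matches


def make_flat(n):
    """Hypothetical flat document: n sibling paragraphs."""
    return "<html><body>" + '<p class="line">text</p>' * n + "</body></html>"


def make_nested(depth):
    """Hypothetical deep document: divs nested `depth` levels."""
    return ("<html><body>" + '<div class="line">' * depth + "text"
            + "</div>" * depth + "</body></html>")


def benchmark(documents, class_name="line", runs=5):
    """Time the query on each document; return per-document medians and their mean."""
    results = {}
    for label, html in documents.items():
        timings = []
        matches = 0
        for _ in range(runs):
            start = time.perf_counter()
            matches = count_class(html, class_name)
            timings.append(time.perf_counter() - start)
        results[label] = {"matches": matches,
                          "median_seconds": statistics.median(timings)}
    aggregate = statistics.mean([r["median_seconds"] for r in results.values()])
    return results, aggregate


if __name__ == "__main__":
    docs = {"flat": make_flat(2000), "nested": make_nested(200)}
    per_document, overall = benchmark(docs)
    for label, row in per_document.items():
        print(label, row["matches"], f"{row['median_seconds']:.6f}s")
    print("mean of medians:", f"{overall:.6f}s")
```

Reporting the per-document medians alongside the aggregate keeps a single slow document from hiding inside an average.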
kangax (February 20, 2008 at 1:57 pm)
CNN is no doubt a great candidate. I think most news websites offer a rich page structure with plenty of functionality and follow standards somewhat decently. From a quick glance, http://www.msnbc.msn.com/, http://www.nytimes.com/ and http://news.yahoo.com/ seem to stand out the most.
Ryan (February 20, 2008 at 2:04 pm)
I agree, it’s difficult to determine what is a good page to benchmark against.
Another idea is how often JS libraries are used for web apps, which IMO tend to have simpler structures compared to CNN or Yahoo.com.
Robert Accettura (February 20, 2008 at 2:07 pm)
As good as any I’d say. For this purpose I’d suggest that any page picked should *not* validate, and should contain legacy code in addition to semantic.
Rafael (February 20, 2008 at 2:10 pm)
Yes, and we’ve worked pretty well with their tech guys/leads in the past. Their tech lead actually dropped by the office before they did their big web redesign. They’re a top site, so that makes them a great candidate.
Jody Tate (February 20, 2008 at 3:44 pm)
I was thinking what Robert Accettura already thought and wrote. CNN.com is as good a choice as any. ESPN.com (I’m not a sports fan, honest) might be one to look at as well. Their redesign to use CSS has been written about various places, online and in-print. But looking at the site it has some tables doing what might be layout work rather than data display (but that was just with a quick glance at the source code).
Boris (February 20, 2008 at 4:54 pm)
The right answer is that you want to test against several different documents of different types. One large, flat document, one document with nested presentational tables, and one site like CNN.com is probably the bare minimum. I’d recommend a Slashdot comment page with comments expanded, and maybe with moderation controls, as another test case.
We’ve certainly had serious Gecko performance bugs that didn’t bite until, say, you had several thousand form controls without a form containing them…
Wade Harrell (February 20, 2008 at 5:45 pm)
this makes me curious about how some other similar sites measure up….
Wade Harrell (February 20, 2008 at 5:47 pm)
doh, ignore that last comment as I used an < after nbc to point out they are using jq.js and that stripped the rest of the copy… :(
Jon (February 20, 2008 at 6:02 pm)
Sounds good, and hopefully I’m just pointing out the obvious, but if it’s going to be used as a metric for performance, someone should create a static mirror from a particular date so everyone is comparing results against the exact same DOM. Elements like breaking news banners are constantly being added and removed, so performance results may fluctuate from day to day.
Andrew Dupont (February 20, 2008 at 8:30 pm)
I support the idea, if only so that I can say that Prototype is run on 100% of the sites on the web. (Extrapolating, of course, from a sample of “representative” web sites.)
Robert Nyman (February 21, 2008 at 2:07 am)
The key stance here should be like Boris states above: it needs to be tested with a lot of different documents of very varying types to establish the best common ground and solutions.
zac spitzer (February 21, 2008 at 2:54 am)
I think it’s a good example because of the changing content: running samples of a DOM-based performance test over a longer period, during which some of the content changes, makes for a good test.
Any unusual performance spikes that occur can be noted and investigated, which may reveal quirky performance issues.