John Resig - A Typical HTML Page

A Typical HTML Page

When doing DOM-based performance testing you frequently need to pick a sample HTML document to work against. This raises the question: What is a good, representative, HTML document?

For many people a good document seems to fall into one of two categories:

A large web page with a lot of content. When we did our initial performance testing with jQuery we used Shakespeare’s As You Like It (lots of elements, but a very flat structure). MooTools uses an old draft of the W3C CSS3 Selectors recommendation, which also has a lot of content: thousands of elements with a medium-depth structure.
A popular web page. Common recommendations include ‘yahoo.com’ and ‘microsoft.com’.

What’s troubling is that there doesn’t really seem to be any way to determine what a representative web page actually is. There are a couple of things that I’d like to propose as being good indicators:

Standards-based semantic markup (including strong use of attributes: id, class, etc.).
Non-trivial file size and element count (testing the scalability of the performance).
Some use of tables and form elements (frequent inclusions in most web pages).
Strong use of CSS (frequently implies a deep element structure, in order to accommodate complex layouts).
Pervasive use of JavaScript. If JavaScript performance analysis is being done, it’s probably good to start with a page that already relies on JavaScript.

I think, out of all of these aspects, one page stands out: CNN.com. Here’s why:

It uses semantic HTML 4 markup with a lot of classnames and ids.
It’s about 92kb in size and has 1648 elements in it.
It has some tables (seemingly for legacy material) and some forms (search forms, drop-downs).
Lots of CSS and JavaScript. It makes good use of Prototype which, at least, should show a desire to have a page with performant JavaScript.
It’s also imperfect, which I would consider desirable. Rarely are pages completely without fault, and the heavy use of embedded JavaScript, ads, and non-semantic tables helps to add a stark dash of reality.
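
The indicators above (size, element count, depth, use of ids and classes, tables, forms, scripts) can be measured mechanically for any candidate page. What follows is only a minimal sketch using Python's standard `html.parser`; the names `PageProfile`, `profile`, and `SAMPLE` are made up for illustration, and the tiny sample document stands in for a saved copy of a real page. It is not how the CNN.com figures quoted above were obtained.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not stay on the
# open-element stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


class PageProfile(HTMLParser):
    """Collects rough structural statistics for one HTML document."""

    def __init__(self):
        super().__init__()
        self.element_count = 0
        self.max_depth = 0
        self.with_id = 0
        self.with_class = 0
        self.tag_counts = {}
        self._open = []  # stack of currently open (non-void) tags

    def handle_starttag(self, tag, attrs):
        self.element_count += 1
        self.tag_counts[tag] = self.tag_counts.get(tag, 0) + 1
        names = {name for name, _ in attrs}
        if "id" in names:
            self.with_id += 1
        if "class" in names:
            self.with_class += 1
        # The new element sits one level below whatever is currently open.
        self.max_depth = max(self.max_depth, len(self._open) + 1)
        if tag not in VOID_ELEMENTS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        # Tolerate sloppy markup: pop back to the nearest matching open tag
        # and ignore stray closing tags. Unclosed <p> or <li> elements will
        # still inflate the depth, which a real browser DOM would not do.
        if tag in self._open:
            while self._open.pop() != tag:
                pass


def profile(html):
    """Returns a dict of rough size and structure metrics for `html`."""
    parser = PageProfile()
    parser.feed(html)
    parser.close()
    return {
        "bytes": len(html.encode("utf-8")),
        "elements": parser.element_count,
        "max_depth": parser.max_depth,
        "with_id": parser.with_id,
        "with_class": parser.with_class,
        "tables": parser.tag_counts.get("table", 0),
        "forms": parser.tag_counts.get("form", 0),
        "scripts": parser.tag_counts.get("script", 0),
    }


# Hypothetical stand-in for a saved copy of a real page.
SAMPLE = """<html><head><script>var a = 1;</script></head>
<body><div id="main" class="story"><p class="lead">Hi<br></p>
<form><input name="q"></form><table><tr><td>x</td></tr></table>
</div></body></html>"""

if __name__ == "__main__":
    print(profile(SAMPLE))
```

Because this counts tags in the source text rather than nodes in a browser's DOM, the numbers will differ somewhat from what a browser reports, especially on pages that build content with JavaScript.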

Of course, analysis could always be done against multiple pages and viewed in aggregate, but we need a place to start. So, what do you think: is CNN a good, representative page for doing performance analysis against?
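
For the multi-page, aggregate view mentioned above, one possible shape is sketched below. Everything in it is an assumption made for illustration: the workload (counting start tags with Python's `html.parser`) is only a stand-in for whatever DOM operation is really being measured, the two generated documents stand in for saved copies of real pages, and the geometric mean is just one reasonable way to combine per-page timings.

```python
import statistics
import time
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Stand-in workload: counts start tags while parsing."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1


def workload(html):
    """Parses `html` and returns the number of start tags seen."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.count


def median_seconds(func, arg, runs=5):
    """Runs func(arg) `runs` times and returns the median wall-clock time."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func(arg)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Hypothetical in-memory documents standing in for saved copies of pages:
# one large and flat, one deeply nested.
DOCUMENTS = {
    "flat": "<body>" + "<p>line</p>" * 500 + "</body>",
    "nested": "<div>" * 50 + "x" + "</div>" * 50,
}


def run_suite(documents, func=workload, runs=5):
    """Times `func` on every document and adds one aggregate figure."""
    per_page = {
        name: median_seconds(func, html, runs)
        for name, html in documents.items()
    }
    return {
        "per_page": per_page,
        # Geometric mean, so one very large page does not dominate.
        "geometric_mean": statistics.geometric_mean(per_page.values()),
    }


if __name__ == "__main__":
    print(run_suite(DOCUMENTS))
```

Keeping the per-page numbers alongside the aggregate matters here, since a regression that only shows up on one kind of document is easy to lose in a single combined figure.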

Posted: February 20th, 2008


12 Comments

kangax (February 20, 2008 at 1:57 pm)

CNN is no doubt a great candidate. I think most of the news websites offer a rich page structure with plenty of functionality and follow standards somewhat decently. From the quick glance, http://www.msnbc.msn.com/, http://www.nytimes.com/ and http://news.yahoo.com/ seem to stand out the most.
Ryan (February 20, 2008 at 2:04 pm)

I agree, it’s difficult to determine what is a good page to benchmark against.

Another idea is how often JS libraries are used for web apps, which IMO tend to have simpler structures compared to CNN or Yahoo.com.
Robert Accettura (February 20, 2008 at 2:07 pm)

As good as any I’d say. For this purpose I’d suggest that any page picked should *not* validate, and should contain legacy code in addition to semantic.
Rafael (February 20, 2008 at 2:10 pm)

Yes and we’ve worked pretty well with their tech guys/leads in the past. Their tech lead actually dropped by the office before they did their big web redesign. They’re a top site so makes them a great candidate.
Jody Tate (February 20, 2008 at 3:44 pm)

I was thinking what Robert Accettura already thought and wrote. CNN.com is as good a choice as any. ESPN.com (I’m not a sports fan, honest) might be one to look at as well. Their redesign to use CSS has been written about various places, online and in-print. But looking at the site it has some tables doing what might be layout work rather than data display (but that was just with a quick glance at the source code).
Boris (February 20, 2008 at 4:54 pm)

The right answer is that you want to test against several different documents of different types. One large, flat document, one nested presentational tables document, one site like CNN.com is probably the bare minimum. I’d recommend a Slashdot comment document with comments expanded, and maybe with moderation controls as another test case.

We’ve certainly had serious Gecko performance bugs that didn’t bite until, say, you had several thousand form controls without a form containing them…
Wade Harrell (February 20, 2008 at 5:45 pm)

this makes me curious about how some other similar sites measure up….

http://www.bbc.co.uk/
http://www.cbsnews.com/
http://www.cbs.com/
http://www.nbc.com/
Wade Harrell (February 20, 2008 at 5:47 pm)

doh, ignore that last comment as I used an < after nbc to point out they are using jq.js and that stripped the rest of the copy… :(
Jon (February 20, 2008 at 6:02 pm)

Sounds good, and hopefully I’m just pointing out the obvious, but if it’s going to be used as a metric for performance someone should create a static mirror of a particular date so everyone is comparing results against the exact same DOM. Elements like breaking news banners are constantly being added and removed, so performance testing day to day may fluctuate.
Andrew Dupont (February 20, 2008 at 8:30 pm)

I support the idea, if only so that I can say that Prototype is run on 100% of the sites on the web. (Extrapolating, of course, from a sample of “representative” web sites.)
Robert Nyman (February 21, 2008 at 2:07 am)

Hmm… Hard to say. I think that it’s very hard to agree on just one specific web site to be the role model/blueprint for testing. The risk is, no matter what site is chosen, that optimizations in JavaScript code, unintentional or not, will be focused on just one example.

The key stance here should be like Boris states above: it needs to be tested with a lot of different documents of very varying types to establish the best common ground and solutions.
zac spitzer (February 21, 2008 at 2:54 am)

I think it’s a good example because of the changing content: running samples of a DOM-based performance test over a longer period, where some of the content changes, makes for a good test.

Any unusual performance spikes which occur can be noted and investigated which may reveal quirky performance issues.
