A Typical HTML Page

When doing DOM-based performance testing you frequently need to pick a sample HTML document to work against. This raises the question: What is a good, representative, HTML document?

For many people a good document seems to file into one of two categories:

  1. A large web page with a lot of content. When we did our initial performance testing with jQuery we used Shakespeare’s As You Like It (lots of elements, but a very flat structure) – Mootools uses an old draft of the W3C CSS3 Selectors recommendation. This has a lot of content, as well – thousands of elements with a medium depth structure.
  2. A popular web page. Common recommendations include ‘yahoo.com’ and ‘microsoft.com’.

What’s troubling is that there doesn’t really seem to be any way to determine what a representative web page actually is. There’s a couple things that I’d like to propose as being good indicators:

  • Standards-based semantic markup (including strong use of attributes: id, class, etc.).
  • Non-trivial file size and element count (testing the scalability of the performance).
  • Some use of tables and form elements (frequent inclusions in most web pages).
  • Strong use of CSS (frequently implies a deep element structure, in order to accommodate complex layouts).
  • Pervasive use of JavaScript. If JavaScript performance analysis is being done it’s probably good to start with a page that already has a desire to use it.

I think, out of all of these aspects, one page stands out: CNN.com. Here’s why:

  • It uses semantic HTML 4 markup with a lot of classnames and ids.
  • It’s about 92kb in size and has 1648 elements in it.
  • It has some tables (seemingly for legacy material) and some forms (search forms, drop-downs).
  • Lots of CSS and JavaScript. Makes good use of Prototype which, at least, should show an desire in having a page with performant JavaScript.
  • It’s, also, imperfect. I would consider this to be desirable. Rarely are pages completely without fault – and the heavy use of embedded JavaScript, ads, and non-semantic tables helps to add a stark dash of reality.

Of course, analysis could always be done against multiple pages and be viewed in aggregate but we need a place to start. So, what do you think; is CNN a good, representative, page for doing performance analysis against?

Posted: February 20th, 2008

Subscribe for email updates

12 Comments (Show Comments)

Comments are closed.
Comments are automatically turned off two weeks after the original post. If you have a question concerning the content of this post, please feel free to contact me.

Secrets of the JavaScript Ninja

Secrets of the JS Ninja

Secret techniques of top JavaScript programmers. Published by Manning.

John Resig Twitter Updates

@jeresig / Mastodon

Infrequent, short, updates and links.