When doing DOM-based performance testing you frequently need to pick a sample HTML document to work against. This raises the question: what makes a good, representative HTML document?
For many people a good document seems to fall into one of two categories:
- A large web page with a lot of content. When we did our initial performance testing with jQuery we used Shakespeare’s As You Like It (lots of elements, but a very flat structure). MooTools uses an old draft of the W3C CSS3 Selectors recommendation, which also has a lot of content: thousands of elements with a medium-depth structure.
- A popular web page. Common recommendations include ‘yahoo.com’ and ‘microsoft.com’.
What’s troubling is that there doesn’t really seem to be any way to determine what a representative web page actually is. There are a few things that I’d like to propose as good indicators:
- Standards-based semantic markup (including strong use of attributes: id, class, etc.).
- Non-trivial file size and element count (testing the scalability of the performance).
- Some use of tables and form elements (frequent inclusions in most web pages).
- Strong use of CSS (frequently implies a deep element structure, in order to accommodate complex layouts).
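To make these indicators a bit more concrete, here is a rough sketch of how you might measure them for any candidate page. It is written in Python using only the standard library's `html.parser`; the names `PageStats`, `summarize`, `FORM_TAGS`, and the `SAMPLE` document are all made up for illustration. Treat the depth figure as an approximation: the parser does not repair markup, so pages that omit optional closing tags (such as `</p>` or `</li>`) will report a deeper structure than a browser would build.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not deepen the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}

# Hypothetical choice of which tags count as "form elements" for this sketch.
FORM_TAGS = {"form", "input", "select", "textarea", "button"}


class PageStats(HTMLParser):
    """Collects rough structural statistics for one HTML document."""

    def __init__(self):
        super().__init__()
        self.elements = 0       # total start tags seen
        self.with_id = 0        # elements carrying an id attribute
        self.with_class = 0     # elements carrying a class attribute
        self.tables = 0
        self.form_elements = 0
        self.max_depth = 0
        self._depth = 0         # current nesting depth

    def handle_starttag(self, tag, attrs):
        self.elements += 1
        names = {name for name, _ in attrs}
        if "id" in names:
            self.with_id += 1
        if "class" in names:
            self.with_class += 1
        if tag == "table":
            self.tables += 1
        if tag in FORM_TAGS:
            self.form_elements += 1
        if tag in VOID_TAGS:
            # A void element sits one level below its parent but holds nothing.
            self.max_depth = max(self.max_depth, self._depth + 1)
        else:
            self._depth += 1
            self.max_depth = max(self.max_depth, self._depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._depth > 0:
            self._depth -= 1


def summarize(html):
    """Return the indicator values for one HTML string as a dict."""
    parser = PageStats()
    parser.feed(html)
    parser.close()
    return {
        "bytes": len(html.encode("utf-8")),
        "elements": parser.elements,
        "with_id": parser.with_id,
        "with_class": parser.with_class,
        "tables": parser.tables,
        "form_elements": parser.form_elements,
        "max_depth": parser.max_depth,
    }


# A tiny made-up document, just to exercise the counters.
SAMPLE = """<html><body>
<div id="main" class="story">
  <form><input name="q"><select><option>a</option></select></form>
  <table><tr><td class="cell">x</td></tr></table>
</div>
</body></html>"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running `summarize` over a saved copy of each candidate page would give you comparable numbers for size, element count, attribute usage, and nesting depth, which is about as close as I can get to an objective comparison.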
I think, out of all of these aspects, one page stands out: CNN.com. Here’s why:
- It uses semantic HTML 4 markup with a lot of class names and ids.
- It’s about 92 KB in size and has 1,648 elements in it.
- It has some tables (seemingly for legacy material) and some forms (search forms, drop-downs).
Of course, analysis could always be done against multiple pages and viewed in aggregate, but we need a place to start. So, what do you think: is CNN a good, representative page for doing performance analysis against?