Affiliations: CTO, Opera Software; Chairman, YesLogic
Abstract: This paper describes a case study on the markup behind "info-boxes" in Wikipedia articles. Currently, the markup uses HTML tables to achieve its formatting. A rewritten version of the code is more semantic, more compact, more readable, and more friendly to reuse — while still maintaining the same presentation in standards-compliant browsers. A key part of the solution is to use the CSS "inline-block" feature.
Keywords: Wikipedia, markup, structured documents, style sheets
Date: 2009-03-23
This case study is part of a series.
Many Wikipedia articles contain "info-boxes" on the right side. Info-boxes contain essential statistics about the subject of the article, in a format that can easily be compared with other similar subjects. For example, all articles about countries have an info-box with the name of the country, its flag, the name of its capital, etc.
When Wikipedia articles are printed, info-boxes pose some challenges because of the HTML code behind them. The current markup uses HTML tables to achieve its presentation. This is not entirely unreasonable; much of the information in the info-boxes is tabular in form. However, the chosen markup is poorly structured, and the extensive use of style attributes makes it hard to re-purpose the articles for other presentations, e.g., printing.
I have studied the markup behind one info-box in particular, namely the one for Norway (which happens to be my native country). All countries use the same templates to generate their info-boxes, so the findings should be relevant beyond this specific article.
I have also created an alternative, table-free version of the info-box. Instead of tables, the markup in my alternative version uses semantic HTML elements (dl, dt, dd, ol, li) combined with some of the more advanced features of CSS 2.1. In particular, the inline-block value of the display property and CSS counters are used to replicate the traditional presentation of info-boxes.
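The style sheet of the table-free version is not reproduced here. As a rough sketch of how the technique could work, the document below lays out dt/dd pairs side by side with inline-block and numbers list items with a CSS counter. The class name infobox, the widths and colors, and the sample entries (including the "Largest cities" list) are illustrative assumptions, not taken from the actual table-free code.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Info-box layout sketch</title>
<style>
  /* Hypothetical class name; the box floats right, like an info-box. */
  dl.infobox {
    float: right;
    width: 22em;
    margin: 0 0 1em 1em;
    padding: 0.3em;
    border: 1px solid #aaa;
    background: #f9f9f9;
    font-size: 90%;
  }

  /* inline-block lets a term and its description share one line.
     45% + 50% fits on a line (leaving room for any inter-element
     whitespace), but a third box does not, so each dt starts a new line. */
  dl.infobox dt,
  dl.infobox dd {
    display: inline-block;
    vertical-align: top;
    margin: 0;
  }
  dl.infobox dt { width: 45%; font-weight: bold; }
  dl.infobox dd { width: 50%; }

  /* Number list items with a counter instead of hard-coded digits
     (:before is the CSS 2.1 form of the pseudo-element). */
  dl.infobox ol {
    counter-reset: item;
    list-style: none;
    margin: 0;
    padding: 0;
  }
  dl.infobox li { counter-increment: item; }
  dl.infobox li:before { content: counter(item) ". "; }
</style>
</head>
<body>
<dl class="infobox">
  <dt>Capital
  <dd>Oslo
  <dt>Largest cities
  <dd><ol><li>Oslo<li>Bergen<li>Trondheim</ol>
</dl>
<p>Article text flows to the left of the floated info-box.</p>
</body>
</html>
```

In a browser without inline-block support, the same markup still degrades to a readable list, with each dt and dd on its own line.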
The result is an info-box that looks identical to the traditional info-box, but uses fewer bytes and fewer elements to achieve the same presentation. I also believe the code is more readable, better suited for content reuse, and more accessible in non-visual renditions. The table below compares the two code snippets (without style sheets):
| | bytes | elements | style attributes | table cells | list items | dt/dd pairs |
|---|---|---|---|---|---|---|
| original code | 16692 | 349 | 68 | 99 | 0 | 0 |
| table-free code | 11655 | 256 | 0 | 0 | 5 | 33 |
I believe these findings are representative of what can be achieved if the markup of info-boxes is changed throughout Wikipedia. This should be fairly easy, as the HTML code is generated by templates.
Here is a sample markup fragment from the original code:
```html
<tr class="mergedrow">
  <td style="width:1em; padding:0 0 0 0.6em;"> - </td>
  <td style="padding-left:0em;"><a href="/wiki/List_of_Norwegian_monarchs" title="List of Norwegian monarchs">Monarch</a></td>
  <td><a href="/wiki/Harald_V_of_Norway" title="Harald V of Norway"> Harald V</a></td>
</tr>
```
And here is the corresponding code in the table-free version:
```html
<dt class=merged><a href="/wiki/List_of_Norwegian_monarchs" title="List of Norwegian monarchs">Monarch</a>
<dd><a href="/wiki/Harald_V_of_Norway" title="Harald V of Norway">Harald V</a>
```
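In the original fragment, the leading dash and its indentation come from an extra table cell with a style attribute. In the table-free fragment, the merged class carries that meaning, so a single style-sheet rule could regenerate the dash for every such row. A minimal sketch using CSS 2.1 generated content; the rule itself is an assumption rather than the actual style sheet, and the 0.6em indent is simply copied from the original style attribute.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Merged-row sketch</title>
<style>
  /* Assumed rule: recreate the dash cell of the original "mergedrow"
     from the class name instead of from a separate table cell. */
  dt.merged:before {
    content: "- ";
    padding-left: 0.6em; /* same indent as the original style attribute */
  }
</style>
</head>
<body>
<dl>
  <dt class=merged><a href="/wiki/List_of_Norwegian_monarchs" title="List of Norwegian monarchs">Monarch</a>
  <dd><a href="/wiki/Harald_V_of_Norway" title="Harald V of Norway">Harald V</a>
</dl>
</body>
</html>
```

Because the marker lives in the style sheet, changing it, or dropping it for another presentation, requires no change to the generated markup.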
Below are the rendered versions; the original version is on the left and my alternative, table-free version is on the right:
Some notes about the examples above:
In browsers that do not support inline-block, the dt and dd elements are displayed on separate lines, and the info-box is therefore somewhat longer. I believe this rendering is acceptable, and it will serve as a subtle reminder to upgrade to a more standards-compliant browser.
span elements can be removed.
There are some key differences between the original document and the table-free version. In the table-free version:
div and span elements have a class attribute with a meaningful value
dl, dt, dd, ol, li are preferred over div and span, when appropriate
I suggest that these principles are followed for Wikipedia's markup in general.
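One practical payoff of these principles is re-purposing, e.g., for printing. Declarations in style attributes can only be overridden from a style sheet with !important, whereas markup with meaningful class names can be restyled by a separate print style sheet. A minimal sketch, assuming the info-box is a dl with the hypothetical class infobox:

```html
<style>
  @media print {
    /* Hypothetical class name: let the info-box span the page width
       instead of floating beside the text, and ask the browser to
       avoid splitting it across pages. */
    dl.infobox {
      float: none;
      width: auto;
      margin: 0 0 1em 0;
      background: none;
      page-break-inside: avoid;
    }
  }
</style>
```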