Abstract: This paper describes a case study on the markup behind "info-boxes" in Wikipedia articles. Currently, the markup uses HTML tables to achieve its formatting. A rewritten version of the code is more semantic, more compact, more readable, and more friendly to reuse — while still maintaining the same presentation in standards-compliant browsers. A key part of the solution is to use the CSS "inline-block" feature.
Keywords: Wikipedia, markup, structured documents, style sheets
This case study is part of a series.
Many Wikipedia articles contain "info-boxes" on the right side. Info-boxes contain essential statistics about the subject of the article, in a format that can easily be compared with other similar subjects. For example, all articles about countries have an info-box with the name of the country, its flag, the name of its capital, and so on.
When trying to print Wikipedia articles, info-boxes pose some challenges due to the HTML code behind them. The current markup uses HTML tables to achieve the presentation. This is not entirely unreasonable; much of the information in the info-boxes is tabular in form. However, the chosen markup is poorly structured, and the extensive use of style attributes makes it hard to re-purpose the articles for other presentations, e.g., printing.
I have studied the markup behind one info-box in particular, namely the one for Norway (which happens to be my native country). All countries use the same templates to generate their info-boxes, so the findings should be relevant beyond this specific article.
I have also created an alternative, table-free version of the info-box. Instead of tables, the markup in my alternative version uses semantic HTML elements (dl, dt, dd, ol, li) combined with some of the more advanced features of CSS 2.1. In particular, inline-block and counters are used to replicate the traditional presentation of info-boxes.
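To make the role of inline-block and counters concrete, here is a minimal sketch of how a style sheet might lay out a definition list as a two-column box. It is not the style sheet from the case study: the class name infobox, the widths, the colors, and the sample list are all assumptions chosen for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Table-free info-box sketch</title>
<style>
  /* Class names, widths, and colors are illustrative assumptions. */
  dl.infobox {
    float: right;
    width: 22em;
    margin: 0 0 1em 1em;
    padding: 0.4em;
    border: 1px solid #aaa;
  }

  /* inline-block puts each label and its value side by side,
     like two cells in a table row, without any table markup. */
  dl.infobox dt,
  dl.infobox dd {
    display: inline-block;
    vertical-align: top;
    margin: 0;
    padding: 0.2em 0;
  }
  dl.infobox dt { width: 40%; font-weight: bold; }
  dl.infobox dd { width: 55%; }

  /* A CSS counter numbers the list items through generated content. */
  dl.infobox ol {
    counter-reset: item;
    list-style: none;
    margin: 0;
    padding: 0;
  }
  dl.infobox ol li:before {
    counter-increment: item;
    content: counter(item) ". ";
  }
</style>
</head>
<body>
<dl class="infobox">
  <dt>Capital</dt>
  <dd>Oslo</dd>
  <dt>Monarch</dt>
  <dd>Harald V</dd>
  <dt>Largest cities</dt>
  <dd>
    <ol>
      <li>Oslo</li>
      <li>Bergen</li>
    </ol>
  </dd>
</dl>
</body>
</html>
```

Because the widths of a dt and its dd add up to slightly less than the width of the box, each pair fills one line and the next dt wraps onto a new one. A browser without inline-block support would fall back to the default block rendering, with each dd on its own line below its dt.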
The result is an info-box that looks identical to the traditional info-box, but that uses fewer bytes and fewer elements to achieve the same presentation. Also, I believe the code is more readable, more suited for content reuse, and more accessible for non-visual renditions. Here is a table with links to the two code snippets (without style sheets) and a comparison between them:
|version||bytes||elements||style attributes||table cells||list items||dt/dd pairs|
|original code (source)||16692||349||68||99||0||0|
|table-free code (source)||11655||256||0||0||5||33|
I believe these findings to be representative of what can be achieved if the markup of info-boxes is changed throughout Wikipedia. It should be fairly easy to do so, as the HTML code is generated by templates.
Here is a sample markup fragment from the original code:
<tr class="mergedrow"> <td style="width:1em; padding:0 0 0 0.6em;"> - </td> <td style="padding-left:0em;"><a href="/wiki/List_of_Norwegian_monarchs" title="List of Norwegian monarchs">Monarch</a></td> <td><a href="/wiki/Harald_V_of_Norway" title="Harald V of Norway"> Harald V</a></td> </tr>
And here is the corresponding code in the table-free version:
<dt class=merged><a href="/wiki/List_of_Norwegian_monarchs" title="List of Norwegian monarchs">Monarch</a> <dd><a href="/wiki/Harald_V_of_Norway" title="Harald V of Norway">Harald V</a>
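In the table-free fragment, the class attribute on the dt takes over from the separate dash cell and the inline padding in the table version. One way a style sheet could bring the dash back is through generated content. The following is a minimal sketch under that assumption; only the class name merged and the 0.6em indent come from the fragments above, and the remaining rules and widths are hypothetical.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Merged entry sketch</title>
<style>
  /* Widths are illustrative assumptions. */
  dt, dd { display: inline-block; vertical-align: top; margin: 0; }
  dt { width: 45%; }
  dd { width: 45%; }

  /* Generated content supplies the dash that the table version
     carried in a cell of its own; the indent mirrors the 0.6em
     padding from the original inline style. */
  dt.merged { padding-left: 0.6em; }
  dt.merged:before { content: "- "; }
</style>
</head>
<body>
<dl>
  <dt class=merged><a href="/wiki/List_of_Norwegian_monarchs" title="List of Norwegian monarchs">Monarch</a>
  <dd><a href="/wiki/Harald_V_of_Norway" title="Harald V of Norway">Harald V</a>
</dl>
</body>
</html>
```

With this approach the dash is purely presentational: it appears on screen but is not part of the document content, so a print or non-visual style sheet can drop or replace it without touching the markup.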
Below are the rendered versions; the original version is on the left and my alternative, table-free version is on the right:
Some notes about the examples above:
Browsers that lack support for inline-block render dt and dd elements on separate lines, and the info-box is therefore somewhat longer. I believe this rendering to be acceptable, and it will serve as a subtle reminder to upgrade to a more standards-compliant browser.
span elements can be removed.
There are some key differences between the original document and the table-free version. In the table-free version:
span elements have a class attribute with a meaningful value
dl, dt, dd, ol, li are preferred over span, when appropriate
I suggest that these principles are followed for Wikipedia's markup in general.