A proposal to de-table Wikipedia’s info-boxes

Håkon Wium Lie

Affiliations: CTO, Opera Software; Chairman, YesLogic

Abstract: This paper describes a case study on the markup behind "info-boxes" in Wikipedia articles. Currently, the markup uses HTML tables to achieve its formatting. A rewritten version of the code is more semantic, more compact, more readable, and more friendly to reuse — while still maintaining the same presentation in standards-compliant browsers. A key part of the solution is to use the CSS "inline-block" feature.

Keywords: Wikipedia, markup, structured documents, style sheets

Date: 2009-03-23

This case study is part of a series.

Many Wikipedia articles contain "info-boxes" on the right side. Info-boxes contain essential statistics about the subject of the article, in a format that can easily be compared with other similar subjects. For example, all articles about countries have an info-box with the name of the country, its flag, name of the capital etc.

When trying to print Wikipedia articles, info-boxes pose some challenges due to the HTML code behind the info-boxes. The current markup uses HTML tables to achieve their presentation. This is not entirely unreasonable; much of the information in the info-boxes is tabular in its form. However, the chosen markup is poorly structured and extensive use of style attributes makes it hard to re-purpose the articles for other presentations, e.g., printing.

I have studied the markup behind one info-box in particular, namely the one for Norway (which happens to be my native country). All countries use the same templates to generate their info-boxes, so the findings should be relevant beyond this specific article.

I have also created an alternative, table-free version of the info-box. Instead of tables, the markup in my alternative version uses semantic HTML elements (dl, dt, dd, ol, li) combined with some of the more advanced features of CSS 2.1. In particular, inline-block and counters are used to replicate the traditional presentation of info-boxes.
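To illustrate how these two CSS 2.1 features can stand in for table layout, here is a minimal sketch. It is not the style sheet used in the case study: the class names "infobox" and "notes" and all widths are assumptions chosen for illustration.

```css
/* Hypothetical class names and illustrative widths; not the case study's style sheet. */

/* Two-column layout without tables: each dt/dd pair shares one line. */
dl.infobox { width: 22em; margin: 0; }
dl.infobox dt,
dl.infobox dd {
  display: inline-block;   /* boxes flow inline but keep their own width */
  vertical-align: top;     /* align label and value at the top, like table cells */
  margin: 0;
}
/* 45% + 50% leaves room for the white space between the inline boxes,
   so the next dt no longer fits and wraps onto a new line. */
dl.infobox dt { width: 45%; font-weight: bold; }
dl.infobox dd { width: 50%; }

/* Numbering list items with counters instead of hard-coded numbers. */
ol.notes { list-style: none; counter-reset: note; padding: 0; }
ol.notes li { counter-increment: note; }
ol.notes li:before { content: counter(note) ". "; }
```

The widths deliberately add up to less than 100%, since inline-block boxes are separated by the white space in the markup; a layout that needs the columns to fill the full width would have to account for that gap in some other way.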

The result is an info-box that looks identical to the traditional info-box, but that uses fewer bytes and fewer elements to achieve the same presentation. Also, I believe the code is more readable, better suited for content reuse, and more accessible to non-visual renditions. Here is a table with links to the two code snippets (without style sheets) and a comparison between them:

                          bytes  elements  style attributes  table cells  list items  dt/dd pairs
original code (source)    16692       349                68           99           0            0
table-free code (source)  11655       256                 0            0           5           33

I believe these findings to be representative of what can be achieved if the markup of info-boxes is changed throughout Wikipedia. It should be fairly easy to do so, as the HTML code is generated by templates.

The markup

Here is a sample markup fragment from the original code:

<tr class="mergedrow">
<td style="width:1em; padding:0 0 0 0.6em;">&#160;-&#160;</td>
<td style="padding-left:0em;"><a href="/wiki/List_of_Norwegian_monarchs" 
  title="List of Norwegian monarchs">Monarch</a></td>
<td><a href="/wiki/Harald_V_of_Norway" title="Harald V of Norway">
  Harald V</a></td>

And here is the corresponding code in the table-free version:

<dt class=merged><a href="/wiki/List_of_Norwegian_monarchs" 
  title="List of Norwegian monarchs">Monarch</a>
<dd><a href="/wiki/Harald_V_of_Norway" title="Harald V of Norway">Harald V</a>
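The table-free fragment no longer carries the " - " cell or the inline padding of the original. One way that presentation could be moved into the style sheet is sketched below; this is an assumption about how a rule for the "merged" class might look, not the rule from the case study's own style sheet.

```css
/* Assumed rule for the "merged" class; the case study's actual style sheet may differ. */
dt.merged:before {
  /* Generated content replaces the table cell that held &#160;-&#160; */
  content: "\00a0-\00a0";
}
dt.merged {
  /* Replaces the inline style="padding:0 0 0 0.6em" from the original cell */
  padding-left: 0.6em;
}
```

With a rule like this, the dash is pure presentation: a print or non-visual style sheet can drop or change it without touching the markup.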


Below are the rendered versions; the original version is on the left and my alternative, table-free version is on the right:

Some notes about the examples above:

Suggested principles

There are some key differences between the original document and the table-free version. In the table-free version:

I suggest that these principles are followed for Wikipedia's markup in general.