A proposal to simplify and improve footnote markup in Wikipedia

Håkon Wium Lie

Affiliations: CTO, Opera Software; Chairman, YesLogic

Abstract: Wikipedia offers a simple and powerful method to create footnotes (a.k.a. references) through the <ref> element in the wiki markup. The resulting HTML markup, however, is more complex than necessary. This study shows how the number of elements required to represent a footnote can be halved, while improving the reusability of the content.

Date: 2009-03-31

This case study is part of a series.

Footnotes, or references, in Wikipedia articles are created with the <ref> element in the wiki markup. Here is a simple example from the article on Norway:

In 1319, Sweden and Norway were united under King Magnus Eriksson. In 1349, the Black Death killed between 50% and 60% of the population,<ref>[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=2197762&dopt=Abstract The Black Death in Norway]</ref> resulting in a period of decline, both socially and economically.

The wiki markup is converted to HTML code on Wikipedia's servers. The above code results in two chunks of HTML: a paragraph with a footnote call, and a footnote with a footnote marker. Here is the resulting HTML code of the paragraph with the footnote call:

<p>In 1319, Sweden and Norway were united under King Magnus Eriksson. In 1349, the Black Death killed between 50% and 60% of the population,<sup id="cite_ref-16" class="reference"><a href="#cite_note-16" title=""><span>[</span>17<span>]</span></a></sup> resulting in a period of decline, both socially and economically.

And here is the resulting HTML code of the footnote with the footnote marker:

<li id="cite_note-16"><b><a href="#cite_ref-16" title="">^</a></b> <a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;list_uids=2197762&amp;dopt=Abstract" class="external text" title="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;list_uids=2197762&amp;dopt=Abstract" rel="nofollow">The Black Death in Norway</a></li>

Current markup

In a browser, the HTML code is presented approximately like this:

In 1319, Sweden and Norway were united under King Magnus Eriksson. In 1349, the Black Death killed between 50% and 60% of the population,[17] resulting in a period of decline, both socially and economically.

...

  1. ^ The Black Death in Norway

The presentation is fine, but the underlying markup can be simplified.

Simplified HTML, more advanced CSS

It is possible to simplify the HTML markup while retaining the same presentation in common browsers. The key to the simplification is to describe the presentation in CSS instead of using presentational HTML elements. Also, the number of elements is reduced by having one element serve as both source and target anchor. The presentation and the link functionality remain the same:

In 1319, Sweden and Norway were united under King Magnus Eriksson. In 1349, the Black Death killed between 50% and 60% of the population,[17] resulting in a period of decline, both socially and economically.

...

  1. ^ The Black Death in Norway

The CSS code used to style the footnote call is simple:

a.ref { font-size: 83%; vertical-align: 35% }

Also, the square brackets around the footnote call are generated by the style sheet:

a.ref:before { content: '[' }
a.ref:after  { content: ']' }

Support for legacy browsers

Some legacy browsers, most notably IE6 and IE7, do not support CSS 2.1 generated content. To allow for graceful degradation in these browsers, the square brackets must be inserted into the markup rather than generated by the style sheet. The following example does this:

In 1319, Sweden and Norway were united under King Magnus Eriksson. In 1349, the Black Death killed between 50% and 60% of the population,[17] resulting in a period of decline, both socially and economically.

...

  1. ^ The Black Death in Norway

The HTML code used in this example is:

<p>In 1319, Sweden and Norway were united under King Magnus Eriksson. In 1349, the Black Death killed between 50% and 60% of the population,<a class="ref" id="cite_ref-16" href="#cite_note-16">[17]</a> resulting in a period of decline, both socially and economically.
<li id="cite_note-16"><a class="backref" href="#cite_ref-16">^</a> <a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=2197762&dopt=Abstract" class="external text" title="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=2197762&dopt=Abstract" rel="nofollow">The Black Death in Norway</a></li>

This is also the HTML code that I propose for Wikipedia to use.
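
A minimal sketch of a style sheet that could accompany this markup, assuming only the class names shown above. The a.ref rule is the one given earlier in this study. The a.backref rule is an assumption: it restores the bold caret that the b element provides in the current markup.

```css
/* Footnote call: small, raised text takes the place of the removed sup element. */
a.ref { font-size: 83%; vertical-align: 35% }

/* Back reference: bold text takes the place of the removed b element (assumed rule). */
a.backref { font-weight: bold }
```

No :before or :after rules are needed in this variant, since the square brackets are part of the markup.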

Measuring the gain

The table below compares the current markup of Wikipedia articles with the two proposals.

                                      elements  attributes  pseudo-elements
current markup                               7           9                0
CSS-based solution                           3           7                2
CSS-based without generated content          3           7                0

As can be seen, the number of elements is halved, and the number of attributes is reduced. The reported gain is per footnote, and mature articles can have many footnotes. For example, the article on the United States has more than 200 footnotes; at four elements saved per footnote, that is more than 800 elements. Only inline elements and attributes on inline elements are counted in the table above.

The gains in the two CSS-based solutions are similar. The version that uses generated content may be slightly more complex for browsers, since it uses two pseudo-elements per footnote. However, it also leaves more flexibility: the square brackets in the pseudo-elements can be removed or exchanged with other content.
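
As an illustration of that flexibility, here is a sketch of a style sheet override that exchanges the square brackets for parentheses. It assumes the generated-content version of the markup, where the brackets are not present in the HTML. The choice of parentheses is arbitrary.

```css
/* Exchange the generated square brackets for parentheses. */
a.ref:before { content: '(' }
a.ref:after  { content: ')' }

/* To remove the brackets altogether, an empty string can be used instead:
   a.ref:before, a.ref:after { content: '' }                               */
```

Neither change requires touching the HTML of any article.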

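The gains, as described in the table, are achieved by:

  - removing the sup element around the footnote call and describing its presentation in CSS
  - removing the two span elements around the square brackets
  - removing the b element around the back reference
  - removing the empty title attributes

Also, some of the code is slightly more complex:

  - the a element of the footnote call carries both an id and an href attribute, since it serves as both source and target anchor
  - the a elements of the footnote call and the back reference each carry a class attribute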

Improving reusability

The markup proposed in this study isn't just simpler, it also allows for better reuse of the content. Consider this code in the current markup:

<b><a href="#cite_ref-16a" title="">^</a></b>

and this code in the new markup:

<a class="backref" href="#cite_ref-16b">^</a>

As can be seen, the new markup adds a class ("backref") to the element that contains the back reference (which points to the footnote call). This way, the character that denotes the back reference ("^") can be removed when necessary. For example, it doesn't make sense to show the character in a printed version of the article.
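
A minimal sketch of how a print style sheet could remove the back reference, assuming the "backref" class from the proposed markup:

```css
@media print {
  /* The caret links back to the footnote call, which serves no purpose on paper. */
  a.backref { display: none }
}
```

With the current markup, the same effect would need a selector that depends on the b element and its position in the list item, which is more fragile.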

Improving the presentation

CSS 2.1 offers features that can further improve the presentation of the content. Here is a simple example where square brackets are added to the footnote marker:

In 1319, Sweden and Norway were united under King Magnus Eriksson. In 1349, the Black Death killed between 50% and 60% of the population,[17] resulting in a period of decline, both socially and economically.

...

  [1] The Black Death in Norway

Also, the back reference character has been removed in the above presentation. This shows how the proposed markup allows for presentational flexibility not offered by the current markup.
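
One way to achieve such a presentation with CSS 2.1 counters and generated content is sketched below. The selector ol.references and the counter name "note" are assumptions made for illustration; the study does not show the element that contains the footnotes.

```css
/* Assumed container: an ordered list with class "references". */
ol.references { list-style: none; counter-reset: note }
ol.references li { counter-increment: note }

/* Replace the default list marker with a number in square brackets. */
ol.references li:before { content: '[' counter(note) '] ' }

/* Remove the back reference character. */
a.backref { display: none }
```

Browsers without support for generated content would show neither the list marker nor the brackets, so a rule like this is best reserved for style sheets aimed at browsers that support CSS 2.1.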

Does it matter?

Reducing the number of elements and attributes may seem like an interesting but useless exercise in the days of cheap memory and fast connections. Still, when multiplied by a vast number of articles and page requests, I believe the gains are significant. Along with other optimizations, they can reduce Wikipedia's bandwidth use and let more articles fit on smaller machines.

Also, the presentational flexibility offered by the proposed markup is, in itself, a reason to use it.

Acknowledgements

Comments from Reisio improved this paper.