A proposal to simplify and improve image markup in Wikipedia

Håkon Wium Lie

Affiliations: CTO, Opera Software; Chairman, YesLogic

Abstract: Wikipedia's HTML code for representing images with captions is analyzed. The current code is quite good. However, it is verbose and it contains redundancies. A more compact solution is proposed as a replacement; the proposed solution reduces the number of elements from 10 to 6 and the code size is reduced by more than 50%. Also, some functionality considered redundant is removed. A version that retains all functionality, while still simplifying the markup, is also presented. All proposed versions remove the style attribute to make content reuse simpler.

Date: 2009-04-04

This case study is part of a series.

Wikipedia articles are often illustrated by images. This study looks at the most common image type: the thumbnail images that are floated to the left or right side of the text. These images are accompanied by a textual caption underneath the image. By clicking on the thumbnail image, or on a small icon below the image, users are taken to a page showing a larger version of the image. The default width of thumbnail images is 180 pixels, but users may indicate a preference for other sizes (e.g. 200 pixels) in their account settings.

I have chosen one image from the article on Norway as a sample in this study. The image, with its caption, can be seen on the right side.

The wiki markup to add this image to a page is quite simple and measures only 153 bytes:

[[Image:Bryggen (6-2007).jpg|thumb|left|[[Bryggen]] in Bergen is on the [[List of World Heritage Sites in Europe|list of UNESCO World Heritage Sites]].]]

Current HTML code

When the wiki code is converted into HTML by Wikipedia's servers, it expands to this code which is sent to browsers:

<div class="thumb tleft"> <div class="thumbinner" style="width:182px;"><a href="/wiki/File:Bryggen_(6-2007).jpg" class="image" title="Bryggen in Bergen is on the list of UNESCO World Heritage Sites."><img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/e/e3/Bryggen_%286-2007%29.jpg/180px-Bryggen_%286-2007%29.jpg" width="180" height="121" border="0" class="thumbimage" /></a> <div class="thumbcaption"> <div class="magnify"><a href="/wiki/File:Bryggen_(6-2007).jpg" class="internal" title="Enlarge"><img src="/skins-1.5/common/images/magnify-clip.png" width="15" height="11" alt="" /></a></div> <a href="/wiki/Bryggen" title="Bryggen">Bryggen</a> in Bergen is on the <a href="/wiki/List_of_World_Heritage_Sites_in_Europe" title="List of World Heritage Sites in Europe">list of UNESCO World Heritage Sites</a>.</div> </div> </div>

Presented in a more structured way, the HTML code looks like this:

<div class="thumb tleft">
  <div class="thumbinner" style="width:182px;">
    <a href="/wiki/File:Bryggen_(6-2007).jpg" class="image" 
        title="Bryggen in Bergen is on the list of UNESCO World Heritage Sites.">
      <img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/
           e/e3/Bryggen_%286-2007%29.jpg/180px-Bryggen_%286-2007%29.jpg" 
           width="180" height="121" border="0" class="thumbimage" />
    </a>
    <div class="thumbcaption">
      <div class="magnify">
        <a href="/wiki/File:Bryggen_(6-2007).jpg" class="internal" title="Enlarge">
          <img src="/skins-1.5/common/images/magnify-clip.png" width="15" height="11" alt="" />
        </a>
      </div>
      <a href="/wiki/Bryggen" title="Bryggen">Bryggen</a> 
      in Bergen is on the 
      <a href="/wiki/List_of_World_Heritage_Sites_in_Europe" 
                 title="List of World Heritage Sites in Europe">
      list of UNESCO World Heritage Sites</a>.
    </div>
  </div>
</div>

The current HTML markup, as shown above, uses 10 elements and one style attribute. The code measures 899 bytes – more than a five-fold increase from the original wiki markup. There are several reasons why the code is bigger than the original:

All in all, the HTML code is quite good. However, it can be reduced in size and the style attribute should be removed. In this study, I will present one proposed solution that shows how the code can be reduced in size. The strategy for doing so is twofold: remove redundancy and make CSS do more of the work. Also, several variations of the proposed solution will be discussed.

Proposed solution

I propose to use this HTML code instead of the current code:

<div class="thumb tleft w180">
  <a href="/wiki/File:Bryggen_(6-2007).jpg">
    <img src="http://upload.wikimedia.org/wikipedia/commons/thumb/e/e3/
       Bryggen_%286-2007%29.jpg/180px-Bryggen_%286-2007%29.jpg" alt=""/>
  </a>
  <div class="caption">
    <a href="/wiki/Bryggen">Bryggen</a> in Bergen is on the 
    <a href="/wiki/List_of_World_Heritage_Sites_in_Europe" 
      title="List of World Heritage Sites in Europe">list of UNESCO World Heritage Sites
    </a>.
  </div>
</div>

This code uses 6 elements, no style attribute, and takes up only 239 bytes (compared with the 569 bytes in the current code). This represents a significant saving when multiplied by the number of images served by Wikipedia.

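The presentation of the image, as can be seen on the right, is almost the same. The changes include the removal of the magnifier icon and of the hover effect on the image link; both are brought back in the variations discussed below.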

It should also be noted that a container element around the caption has been kept (<div class="caption">). This is not necessary to achieve the current rendering (it is removed in the next two examples), but it has been kept in order to achieve other presentations in the future. For example, the caption can be positioned differently when there is an element to attach CSS rules to.
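With the style attribute and the width, height and border attributes gone, the style sheet has to supply the layout. The rules below are a minimal sketch of how that could look. The selectors follow the class names in the proposed markup, but the declarations (colors, margins, font size) are assumptions made for illustration and are not taken from Wikipedia's actual style sheet. The class w200 is hypothetical and stands for the 200-pixel width preference mentioned earlier.

```css
/* Illustrative sketch only; all values are assumptions */
.thumb {
  border: 1px solid #ccc;    /* frame around image and caption */
  padding: 3px;
  background: #f9f9f9;
  font-size: 94%;
}
.tleft {
  float: left;               /* float the box to the left of the text */
  clear: left;
  margin: 0.5em 1.4em 0.8em 0;
}
.w180 {
  width: 182px;              /* takes over from style="width:182px;",
                                assuming 180px image + 1px border per side */
}
.w200 {
  width: 202px;              /* hypothetical class for a 200px thumbnail */
}
.thumb img {
  display: block;
  border: 1px solid #ccc;    /* takes over from the border attribute */
}
.thumb .caption {
  padding: 3px;
  text-align: left;
}
```

One width class per supported thumbnail size keeps per-image widths out of the markup, so the same HTML can be reused under a different style sheet.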

CSS3-based solution with magnifier

<div class="thumb tleft w180">
  <a href="/wiki/File:Bryggen_(6-2007).jpg">
    <img src="http://upload.wikimedia.org/wikipedia/commons/thumb/
      e/e3/Bryggen_%286-2007%29.jpg/180px-Bryggen_%286-2007%29.jpg" alt=""/>
  </a>
  <a class="magnify" href="/wiki/File:Bryggen_(6-2007).jpg">(magnify)</a>
  <a href="/wiki/Bryggen">Bryggen</a> in Bergen is on the 
  <a href="/wiki/List_of_World_Heritage_Sites_in_Europe" 
    title="List of World Heritage Sites in Europe">list of UNESCO World Heritage Sites
  </a>.
</div>

It is possible to further reduce the number of elements while bringing back the magnifier icon. This CSS3-based solution uses 5 elements, no style attributes and 252 bytes.

In CSS3, the content property applies to all elements, not just pseudo-elements. In the code above, the text "(magnify)" is replaced with an image in the style sheet. Browsers that support this part of CSS3 will show the magnifier icon, while those that support the content property as per CSS 2.1 will show the text (magnify). Currently, only Opera shows an image. Therefore, it may be too early to start using this feature.

Here is the CSS code that replaces the text with an image:

.magnify { 
  content: url(http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png);
}

CSS2-based solution with magnifier and hovering effect

<div class="thumb tleft w180">
  <a href="/wiki/File:Bryggen_(6-2007).jpg"
      title="Bryggen in Bergen is on the list of UNESCO World Heritage Sites.">
    <img src="http://upload.wikimedia.org/wikipedia/commons/thumb/e/
           e3/Bryggen_%286-2007%29.jpg/180px-Bryggen_%286-2007%29.jpg" alt=""/>
  </a>
  <a href="/wiki/File:Bryggen_(6-2007).jpg">
    <img alt="enlarge" class="magnify" 
      src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png"/>
  </a>
  <div class="caption">
    <a href="/wiki/Bryggen">Bryggen</a> in Bergen is on the 
    <a href="/wiki/List_of_World_Heritage_Sites_in_Europe" 
       title="List of World Heritage Sites in Europe">list of UNESCO World Heritage Sites</a>.
  </div>
</div>

This solution brings back the magnifier icon and the hovering effects without using CSS3. As such, this version provides exactly the same functionality as the current code.

If the magnifier icon and the hover effect are deemed necessary, this solution is recommended.
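In this variant the hover effect comes from the title attribute on the image link, so it needs no CSS; only the magnifier icon has to be placed. The following is a sketch of rules that could do so, assuming the icon should sit at the right edge above the caption. The values are illustrative assumptions, not Wikipedia's actual style sheet.

```css
/* Illustrative sketch only; all values are assumptions */
.thumb img {
  display: block;            /* main thumbnail on its own line */
  border: 1px solid #ccc;
}
.thumb img.magnify {
  float: right;              /* icon at the right edge; caption text wraps around it */
  margin: 3px 0 0 3px;
  border: none;              /* no frame around the icon */
}
```

Because the second selector is more specific than the first, the icon does not inherit the frame given to the thumbnail itself.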

The bare minimum solution

<div class="t-l-w180">
  <a href="/wiki/File:Bryggen_(6-2007).jpg">
    <img alt="" src="http://upload.wikimedia.org/wikipedia/commons/
       thumb/e/e3/Bryggen_%286-2007%29.jpg/180px-Bryggen_%286-2007%29.jpg"/>
  </a>
  <a href="/wiki/Bryggen">Bryggen</a> in Bergen is on the 
  <a href="/wiki/List_of_World_Heritage_Sites_in_Europe" 
      title="List of World Heritage Sites in Europe">list of UNESCO World Heritage Sites
  </a>.
</div>

This solution uses only 4 elements and 204 bytes. There is no magnifying glass and no container element around the caption. Also, the class names have been compressed from "thumb tleft w180" to "t-l-w180". The resulting presentation is similar to that of the proposed solution, but the relative gain is probably not worth the required effort. Also, removing the container element around the caption reduces the presentational flexibility.
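Part of that effort lands in the style sheet. A single compressed class name encodes both the side and the width, so each combination needs a rule of its own. The sketch below illustrates this under assumed values; the class t-r-w180 is a hypothetical right-floated counterpart that does not appear in the markup above.

```css
/* Illustrative sketch only; all values are assumptions */
.t-l-w180 {
  float: left;
  clear: left;
  width: 182px;              /* assumed: 180px image + 1px border per side */
}
.t-r-w180 {                  /* hypothetical class for right-floated thumbnails */
  float: right;
  clear: right;
  width: 182px;
}
.t-l-w180 img,
.t-r-w180 img {
  display: block;
  border: 1px solid #ccc;
}
```

With separate classes such as "thumb tleft w180", the side and the width can each be styled once and combined freely.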

Conclusion

The table below compares Wikipedia's current code with the proposals described in this paper. The proposed solution is this author's recommendation; it is significantly smaller than the current code and it removes functionality which I consider redundant.

If it is deemed necessary to retain the magnifier icon and the hover effect, the CSS2-based solution is recommended. It is still simpler and more reusable than the current code.

                      elements  style attributes  bytes  size of current  magnifier?  caption hover effect?
current code          10        1                 569    100%             yes         yes
proposed solution     6         0                 239    42%              no          no
CSS3-based solution   5         0                 252    44%              yes         no
CSS2-based solution   8         0                 364    64%              yes         yes
bare minimum          4         0                 204    36%              no          no

Acknowledgements

Comments from Aryeh Gregor improved this paper.