A proposal to simplify and improve geo markup in Wikipedia

Håkon Wium Lie

Affiliations: CTO, Opera Software; Chairman, YesLogic

Abstract: Wikipedia's HTML code for representing geographical coordinates in English articles is analyzed. The current code is verbose and is not always recognized as using the Geo microformat. Three alternatives, differing on functionality and code size, are suggested as replacements.

Date: 2009-04-06

This case study is part of a series.

Many Wikipedia articles describe real-world places that are listed with their geographical coordinates. Geographical coordinates consist of a latitude and a longitude. Two notations for latitude and longitude are relevant to this paper: the DMS (degree-minute-second) system represents each of them as three separate integers, and the DEC (decimal) system represents each of them as one decimal number.
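The relationship between the two notations can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `dms_to_dec` and the convention of returning negative values for the southern and western hemispheres are assumptions made for illustration, not something taken from Wikipedia's templates.

```python
def dms_to_dec(degrees, minutes, seconds=0, hemisphere="N"):
    """Convert a DMS coordinate to a signed decimal-degree value."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # Assumption: south and west are expressed as negative numbers.
    return -value if hemisphere in ("S", "W") else value


# Oslo, as given in the Norway article: 59°56′N 10°41′E
oslo_lat = dms_to_dec(59, 56, hemisphere="N")
oslo_lon = dms_to_dec(10, 41, hemisphere="E")
print(f"{oslo_lat:.3f}; {oslo_lon:.3f}")  # prints "59.933; 10.683"
```

Rounded to three decimals, the result agrees with the DEC values that appear in Wikipedia's generated markup for Oslo.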

As an example, I will use the geographical coordinates for Oslo, as found in Wikipedia's article on Norway. In the article, Oslo's coordinates are expressed in DMS:

|capital            = [[Oslo]]
|latd=59 |latm=56 |latNS=N |longd=10 |longm=41 |longEW=E

In the HTML version of the article, which Wikipedia's servers send to browsers, the code has been expanded into:

<td><a href="/wiki/Oslo" title="Oslo">Oslo</a><br /> <small><span style="white-space:nowrap;"><span class="plainlinksneverexpand"><a href="http://stable.toolserver.org/geohack/geohack.php?pagename=Norway&params=59_56_N_10_41_E_type:country(385,252)" class="external text" title="http://stable.toolserver.org/geohack/geohack.php?pagename=Norway&params=59_56_N_10_41_E_type:country(385,252)" rel="nofollow"><span class="geo-default"><span class="geo-dms" title="Maps, aerial photos, and other data for this location"><span class="latitude">59°56′N</span> <span class="longitude">10°41′E</span></span></span><span class="geo-multi-punct"> / </span><span class="geo-nondefault"><span class="geo-dec" title="Maps, aerial photos, and other data for this location">59.933°N 10.683°E</span><span style="display:none"> / <span class="geo">59.933; 10.683</span></span></span></a></span></span></small></td>

Here is a more structured presentation of the same code, in which non-ASCII characters have been expanded into numeric character references:

<a href="/wiki/Oslo" title="Oslo">Oslo</a><br />
<small>
  <span style="white-space:nowrap;">
    <span class="plainlinksneverexpand">
      <a href="http://stable.toolserver.org/geohack/geohack.php?pagename=Norway
           &params=59_56_N_10_41_E_type:country(385,252)" class="external text" 
           title="http://stable.toolserver.org/geohack/geohack.php?pagename=Norway
           &params=59_56_N_10_41_E_type:country(385,252)" rel="nofollow">
        <span class="geo-default">
          <span class="geo-dms" 
                title="Maps, aerial photos, and other data for this location">
            <span class="latitude">59&#xB0;56&#x2032;N</span> 
            <span class="longitude">10&#xB0;41&#x2032;E</span>
          </span>
        </span>
        <span class="geo-multi-punct">&#xFEFF; / &#xFEFF;</span>
        <span class="geo-nondefault">
          <span class="geo-dec" 
               title="Maps, aerial photos, and other data for this location">
               59.933&#xB0;N 10.683&#xB0;E
          </span>
          <span style="display:none">&#xFEFF; / 
            <span class="geo">59.933; 10.683</span>
          </span>
        </span>
      </a>
    </span>
  </span>
</small>

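This code is not optimal. Among the problems are its verbosity, the fact that it is not always recognized as using the Geo microformat, and the long, hard-to-understand class name "plainlinksneverexpand".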

In this paper, I present three alternative solutions.

Current code vs. the alternatives

It is quite easy to simplify and improve the current HTML markup, mainly by relying more on CSS. The question is: how much should one try to simplify the code? The table below compares the current code with the three alternative solutions. Each alternative is discussed in more detail after the table.

| | Elements | Style attributes | Bytes | Supports Geo microformat? | User-selectable DMS/DEC presentation? | Hover effects? |
|---|---|---|---|---|---|---|
| Current code | 14 | 1 | 798 | no | yes | yes |
| Alternative 1: DMS, DEC, or both | 10 | 0 | 408 | yes | yes, using Wikipedia's current syntax | yes |
| Alternative 2: DMS, DEC, or both, using CSS generated content | 5 | 0 | 248 | yes | yes, using CSS generated content | yes |
| Alternative 3: DEC only | 5 | 0 | 179 | yes | no | no |

In the byte-size comparisons, the lengths of the URLs have not been counted.
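The element and style-attribute counts in such a comparison can be checked mechanically. The following sketch uses Python's standard `html.parser` module; the names `MarkupCounter` and `count_markup` are hypothetical, and the fragment is the markup of alternative 3 with the map URL replaced by a placeholder, since URLs are left out of the comparison anyway. It is not the method used to produce the figures in the table.

```python
from html.parser import HTMLParser


class MarkupCounter(HTMLParser):
    """Count start tags and style attributes in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.elements = 0
        self.style_attributes = 0

    def handle_starttag(self, tag, attrs):
        # Called once per element, including self-closing ones like <br />.
        self.elements += 1
        self.style_attributes += sum(1 for name, _ in attrs if name == "style")


def count_markup(fragment):
    """Return (number of elements, number of style attributes)."""
    counter = MarkupCounter()
    counter.feed(fragment)
    counter.close()
    return counter.elements, counter.style_attributes


# Markup of alternative 3; "#" is a placeholder for the map URL.
ALTERNATIVE_3 = """
<a href="/wiki/Oslo" title="Oslo">Oslo</a>
<div class="geo plainlinksneverexpand">
  <a href="#">
    <span class="latitude">59.933</span>°N
    <span class="longitude">10.683</span>°E
  </a>
</div>
"""

print(count_markup(ALTERNATIVE_3))  # prints "(5, 0)"
```

For alternative 3 this gives five elements and no style attributes, in line with the table.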

Alternative 1: DMS, DEC, or both

The first alternative provides all the functionality of Wikipedia's current markup, and more. The added feature is the use of the Geo microformat, which some browser configurations can pick up. For example, the Operator extension enables users to navigate to various map services based on coordinates encoded in the Geo microformat. (Wikipedia's current code shows signs of having been influenced by Geo at some point, most notably in the "geo", "latitude" and "longitude" class names, but the code is currently not recognized by Geo implementations.)

Update 2009-04-07: It appears that the Operator extension can extract geographical coordinates from the current markup if the "Show hidden microformats" option is set.

<a href="/wiki/Oslo" title="Oslo">Oslo</a>
<div class="geo plainlinksneverexpand">
  <a href="http://stable.toolserver.org/geohack/geohack.php?pagename=Norway
       &params=59_56_N_10_41_E_type:country(385,252)">
    <span class="geo-dms">
      <abbr class="latitude" title="59.933">59°56′N</abbr>
      <abbr class="longitude" title="10.683">10°41′E</abbr>
    </span>
    <span class="geo-multi-punct"> / </span>
    <span class="geo-dec">
      <span class="latitude">59.933</span>°N
      <span class="longitude">10.683</span>°E
    </span>
  </a>
</div>
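To make concrete what a Geo consumer can read from markup like this, here is a rough Python sketch. It is not how Operator or any other real implementation works; the names `GeoExtractor` and `extract_geo` are hypothetical. It assumes that the first "latitude" and "longitude" elements inside the element with class "geo" are the ones that count, and that an abbr element's title attribute, when present, carries the machine-readable value. The map URL is replaced by a placeholder.

```python
from html.parser import HTMLParser


class GeoExtractor(HTMLParser):
    """Collect the first latitude/longitude inside a class="geo" element.

    Simplification: void elements such as <br> inside the geo element
    are not handled.
    """

    def __init__(self):
        super().__init__()
        self.depth_in_geo = 0   # > 0 while inside the geo element
        self.capturing = None   # property whose text is being read
        self.result = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.depth_in_geo:
            self.depth_in_geo += 1
            for prop in ("latitude", "longitude"):
                if prop in classes and prop not in self.result:
                    if tag == "abbr" and attrs.get("title"):
                        # The title holds the decimal value.
                        self.result[prop] = attrs["title"]
                    else:
                        # Otherwise read the element's text content.
                        self.result[prop] = ""
                        self.capturing = prop
        elif "geo" in classes:
            self.depth_in_geo = 1

    def handle_endtag(self, tag):
        if self.depth_in_geo:
            self.depth_in_geo -= 1
        self.capturing = None

    def handle_data(self, data):
        if self.capturing:
            self.result[self.capturing] += data.strip()


def extract_geo(fragment):
    parser = GeoExtractor()
    parser.feed(fragment)
    parser.close()
    return parser.result


# Markup of alternative 1; "#" is a placeholder for the map URL.
ALTERNATIVE_1 = """
<div class="geo plainlinksneverexpand">
  <a href="#">
    <span class="geo-dms">
      <abbr class="latitude" title="59.933">59°56′N</abbr>
      <abbr class="longitude" title="10.683">10°41′E</abbr>
    </span>
    <span class="geo-multi-punct"> / </span>
    <span class="geo-dec">
      <span class="latitude">59.933</span>°N
      <span class="longitude">10.683</span>°E
    </span>
  </a>
</div>
"""

print(extract_geo(ALTERNATIVE_1))
# prints "{'latitude': '59.933', 'longitude': '10.683'}"
```

Class names are compared as whole tokens, so "geo-dms" and "geo-dec" are not mistaken for "geo".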

This alternative also supports user-selectable presentation of coordinates. That is, a user can add a small CSS fragment to a personal style sheet to select DMS presentation, DEC presentation, or both. The personal style sheet can reside in Wikipedia, or in the browser. Wikipedia has introduced a certain syntax based on class names, and this alternative is compatible with the existing syntax.

Presentation: DMS only
Code added to personal style sheet:

.geo-default { display: inline }
.geo-nondefault { display: inline }
.geo-dec { display: none }
.geo-dms { display: inline }
.geo-multi-punct { display: none }

Presentation: DEC only
Code added to personal style sheet:

.geo-default { display: inline }
.geo-nondefault { display: inline }
.geo-dec { display: inline }
.geo-dms { display: none }
.geo-multi-punct { display: none }

Presentation: both DMS and DEC
Code added to personal style sheet: None

Alternative 2: DMS, DEC, or both – using CSS generated content

This alternative uses simpler HTML code than alternative 1. The main reason for the smaller size is that coordinates are not duplicated; DEC coordinates appear in attributes and DMS coordinates appear in the content of elements:

<a href="/wiki/Oslo" title="Oslo">Oslo</a>
<div class="geo plainlinksneverexpand">
  <a href="http://stable.toolserver.org/geohack/geohack.php?pagename=Norway
       &params=59_56_N_10_41_E_type:country(385,252)">
    <abbr class="latitude" title="59.933">59°56′N</abbr>
    <abbr class="longitude" title="10.683">10°41′E</abbr>
  </a>
</div>
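A server-side template could produce each of these abbr elements from the same DMS values that editors already enter. The Python sketch below shows one way this might be done; the function name `geo_abbr` is hypothetical, the example coordinates have no seconds component, and the negative sign for southern and western hemispheres is an assumption (the Oslo example only exercises N and E).

```python
def geo_abbr(kind, degrees, minutes, hemisphere):
    """Return an <abbr> with DEC in its title and DMS as its content.

    kind is "latitude" or "longitude"; hemisphere is N, S, E or W.
    """
    decimal = degrees + minutes / 60
    if hemisphere in ("S", "W"):
        # Assumption: south and west become negative decimal values.
        decimal = -decimal
    return (f'<abbr class="{kind}" title="{decimal:.3f}">'
            f'{degrees}°{minutes}′{hemisphere}</abbr>')


print(geo_abbr("latitude", 59, 56, "N"))
# prints '<abbr class="latitude" title="59.933">59°56′N</abbr>'
print(geo_abbr("longitude", 10, 41, "E"))
# prints '<abbr class="longitude" title="10.683">10°41′E</abbr>'
```

Each coordinate is thus stored once in each notation: the decimal value in the attribute for machines and style sheets, the DMS value in the content for readers.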

To present the DEC values that appear in the title attribute, CSS generated content and the attr() functional notation must be used. This way, the presentation can be personalized beyond what can be achieved in Wikipedia's current code and alternative 1. Here are some of the possibilities:

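Presentation: default (DMS only)
Code added to personal style sheet: None

Presentation: labelled DEC values instead of DMS
Code added to personal style sheet:

.geo abbr.latitude::before { content: "latitude: " attr(title) ", " }
.geo abbr.longitude::before { content: "longitude: " attr(title) }
.geo abbr { content: "" }

Presentation: DMS followed by DEC in parentheses
Code added to personal style sheet:

.geo abbr.latitude::after { content: " (" attr(title) ") " }
.geo abbr.longitude::after { content: " (" attr(title) ") " }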

It should be noted that CSS generated content is not supported in some legacy browsers, including FF2, IE6, and IE7. However, the default DMS-based presentation works fine for most users.

Alternative 3: Minimized decimal presentation

This is a minimalist alternative that only shows DEC coordinates.

<a href="/wiki/Oslo" title="Oslo">Oslo</a>
<div class="geo plainlinksneverexpand">
  <a href="http://stable.toolserver.org/geohack/geohack.php?pagename=Norway
       &params=59_56_N_10_41_E_type:country(385,252)">
    <span class="latitude">59.933</span>°N
    <span class="longitude">10.683</span>°E
  </a>
</div>

Conclusion

This author recommends that alternative 1 be implemented immediately in Wikipedia. In the longer term, alternative 2 will produce both more compact pages and more flexible presentations. Unless the current syntax for personalized style sheets is frozen, alternative 2 should be a long-term goal.

Appendix 1: plainlinksneverexpand?

The class name "plainlinksneverexpand" is listed as a problem in the introduction, but the problem is not addressed in the alternative solutions. I think the name is long and incomprehensible, and I would probably suggest a shorter name – or perhaps remove it.

For reference purposes, here is the list of rules that apply to this class name:

.plainlinksneverexpand { 
  background: transparent !important;
  padding: 0 !important;
}
.plainlinksneverexpand .urlexpansion { 
  display: none !important;
}
.plainlinksneverexpand a { 
  background: transparent !important;
  padding: 0 !important;
}
.plainlinksneverexpand a.external.text::after { 
  display: none !important;
}
.plainlinksneverexpand a.external.autonumber::after { 
  display: none !important;
}

The CSS code quoted above is from:

<link rel="stylesheet" href="/w/index.php?title=MediaWiki:Common.css&usemsgcache=yes&ctype=text%2Fcss&smaxage=2678400&action=raw&maxage=2678400" type="text/css" />

Appendix 2: The interactive map

In real Wikipedia pages, JavaScript is used to add an interactive map. The map is accessed by clicking on a small globe icon that appears in front of the numerical coordinates when JavaScript is running. It is possible to add interactive maps to all the alternative solutions presented in this paper.

I have not analyzed the JavaScript code. For reference purposes, I briefly describe where to find it. This script element is found in the HTML source:

<script type="text/javascript" src="/w/index.php?title=-&action=raw&gen=js&useskin=monobook"><!-- site js --></script>

In the script, this code is found:

/** WikiMiniAtlas *******************************************************
  *
  *  Description: WikiMiniAtlas is a popup click and drag world map.
  *               This script causes all of our coordinate links to display the WikiMiniAtlas popup button.
  *               The script itself is located on meta because it is used by many projects.
  *               See [[Meta:WikiMiniAtlas]] for more information. 
  *  Maintainers: [[User:Dschwen]]
  */

if (wgServer == "https://secure.wikimedia.org") {
  var metaBase = "https://secure.wikimedia.org/wikipedia/meta";
} else {
  var metaBase = "http://meta.wikimedia.org";
}
importScriptURI(metaBase+"/w/index.php?title=MediaWiki:Wikiminiatlas.js&action=raw&ctype=text/javascript&smaxage=21600&maxage=86400")

Finally, this URL, which contains 15190 bytes of JavaScript code, is loaded:

http://meta.wikimedia.org/w/index.php?title=MediaWiki:Wikiminiatlas.js&action=raw&ctype=text/javascript&smaxage=21600&maxage=86400