Geotagging Web Pages and RSS Feeds
My previous article discussed how geolocation by IP address enables applications and Web sites to determine users’ locations automatically in order to provide specific location-based services to users and members of an on-line community. In this article, we present various methods by which Web sites can publish their geographic locations in static pages and syndicated feeds, in the form of meta information or geotags. Put another way, geolocation by IP address is the technique a Web site uses to determine where users are located; geotagging is the technique users employ to find out where a Web site is located.
Geotags typically locate the Web site’s principal location on the Earth. This information can include latitude and longitude for exact locations or simply city, region and country information for general locations. Web services, applications and users then can query this information to obtain directions (how to get from here to there), locality (what’s near there) or context (where was this article written). Geotags differ from a simple address in that they usually are encoded in metadata and are not visible as part of the Web page. Furthermore, because the tags follow a standard, other services can find them easily and reliably. Various semantic Web projects still are solidifying geospatial tagging standards, but several techniques already have become common and supported. This article presents these current techniques.
Providing a geographic location is particularly beneficial for retail and service businesses, tourist attractions and entertainment venues. Geographic link directories, such as A2B and Multimap, can index these services by location and allow users to search geographically as well as by service type. Currently, many of these directories limit the kinds of searches users can perform. It would be possible, however, to allow for more complex queries, such as searching for “Italian restaurants within 2 miles of downtown Arlington, Virginia”. Or, when using automatic geolocation, one could ask for “directions from my current location to the nearest theater”.
Current location-based services rely on the Web site administrator registering with an on-line index and specifying the site’s location. Some of these services charge a fee, and many are not used commonly, nor are they cross-referenced. Google is now beta-testing Google Local, a free engine that allows users to search for location-based services using complex queries such as the examples above. Unfortunately, Google Local probably uses a form of scraping instead of meta information, considering that the 2002 Google Programming Contest winner’s entry, by Daniel Egnor, is a geographic search which: “includes a geocoder (… to turn street addresses into latitude/longitude coordinates), a simple indexer that looks for addresses and keywords in documents, and a query engine to search for documents matching certain keywords that also contain addresses within a certain distance of a target location”. Still, Google Local is an excellent example of how providing geographic information on a Web site can greatly enhance its visibility and usefulness to potential customers and users.
Geographic metadata also is useful for bloggers and photographers. Traveling writers, travel writers and reviewers can give context to their articles by supplying specific geographic information about where they are writing from or where the business they are reviewing is located. Photographers can provide viewers with information necessary for better understanding the photograph by informing them of where it was taken. Environmental services are now beginning to offer syndication feeds for weather and earthquakes. By geotagging these feeds, aggregators could sort, search and display information by region and location. As a result, users would gain a better picture of the current events happening in their areas.
By embedding a geographic location in the metadata of the Web site, applications and Web-based services quickly and reliably can determine the site’s location relative to search criteria. Using metadata prevents the confusion of an automated search bot having to determine the location from the site’s text. The rest of this article discusses the techniques used for embedding geographic information in your Web site or syndicated feed.
For a Web site, several means of geotagging are available. My previous article explained how to embed the site’s geographic information in its DNS entry. Other options also allow this information to be placed within a site or each individual page. These are the older ICBM tags and the newer, more generic geo-structure tags.
The ICBM tags (the acronym originally stands for Intercontinental Ballistic Missile) derive from a more historical application, as described in the AntiOnline jargon dictionary:
(Also ‘missile address’) The form used to register a site with the Usenet mapping project, back before the day of pervasive Internet, included a blank for longitude and latitude, preferably to seconds-of-arc accuracy. This was actually used for generating geographically-correct maps of Usenet links on a plotter; however, it became traditional to refer to this as one’s ‘ICBM address’ or ‘missile address’, and some people include it in their sig block with that name. (A real missile address would include target elevation.)
ICBM tags are limited to latitude and longitude and do not include other regional information, such as city or country. The syntax, from the RFC (request for comments) on Matt Croydon’s PostNeo, is as follows:
<meta name="ICBM" content="latitude, longitude" />
This tag would be included in your Web page’s <head> section.
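To show how an application might consume this tag, here is a minimal sketch in Python that pulls the first ICBM pair out of a page using only the standard library. The class and function names (`ICBMParser`, `read_icbm`) are made up for this illustration, and the sample page is invented; it is one plausible way to read the tag, not a reference implementation.

```python
from html.parser import HTMLParser


class ICBMParser(HTMLParser):
    """Collects the first ICBM meta tag found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.position = None  # (latitude, longitude) once found

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here as well.
        if tag != "meta" or self.position is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "icbm":
            return
        # The content is "latitude, longitude"; float() tolerates extra spaces.
        parts = (attributes.get("content") or "").split(",")
        if len(parts) == 2:
            try:
                self.position = (float(parts[0]), float(parts[1]))
            except ValueError:
                pass  # malformed numbers: leave the position unset


def read_icbm(html_text):
    """Return (latitude, longitude) from an ICBM meta tag, or None."""
    parser = ICBMParser()
    parser.feed(html_text)
    return parser.position


# Invented sample page for illustration.
page = '<html><head><meta name="ICBM" content="42.4266, -83.49307" /></head></html>'
print(read_icbm(page))
```

The tag name is compared case-insensitively because pages in the wild are not guaranteed to capitalize it the same way.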
Another means of embedding geographic metadata is through geo-structure tags. These geo-structure tags can include latitude and longitude information as well as regional information and an extra placename. The placename could contain the specific address of the person or business. Or, it could be useful for providing a location that may not have a specific point but covering a broader region, such as a city or district. The following example is for the Museo Nacional Del Prado, in Madrid, Spain.
<meta name="geo.position" content="40.4157;-3.6947" />
<meta name="geo.region" content="ES-M" />
<meta name="geo.placename" content="Paseo del Prado" />
I obtained the geo.position information using Multimap, given the address provided on the museum’s Web page. The geo.region value combines an ISO 3166-1 country code with an ISO 3166-2 subdivision code. For the US and Canada, the subdivision is the abbreviation for the state or province; it varies in other countries. All together, the code for my Web site looks like this:
<meta name="DC.title" content="High Earth Orbit" />
<meta name="ICBM" content="42.4266, -83.49307" />
<meta name="geo.position" content="42.4266;-83.49307" />
<meta name="geo.region" content="US-MI" />
<meta name="geo.placename" content="Northville" />
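Because the ICBM and geo.position tags carry the same coordinates with different separators, it can be convenient to generate both from a single source. The following Python sketch does that; the function name `geo_meta_tags` and its parameters are assumptions made for this example, and the range checks are a precaution I have added rather than something the tag conventions require.

```python
from html import escape


def geo_meta_tags(latitude, longitude, region=None, placename=None):
    """Return ICBM and geo.* meta tags as a list of strings."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    tags = [
        # ICBM separates the pair with a comma, geo.position with a semicolon.
        f'<meta name="ICBM" content="{latitude}, {longitude}" />',
        f'<meta name="geo.position" content="{latitude};{longitude}" />',
    ]
    if region:
        tags.append(f'<meta name="geo.region" content="{escape(region)}" />')
    if placename:
        tags.append(f'<meta name="geo.placename" content="{escape(placename)}" />')
    return tags


for tag in geo_meta_tags(42.4266, -83.49307, "US-MI", "Northville"):
    print(tag)
```

The region and placename are passed through `html.escape` so that a place name containing quotes or ampersands does not break the attribute.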
As mentioned in the introduction, Multimap provides a location service that offers hotels located within the area of the maps you have selected. This service is provided by the Accommodation Search Engine Network. The Web pages on the ASE site use commented-out tables to hide the geographic information. The Multimap Web service then pulls this information from each of the hotels’ Web pages or backend databases. This method is not recommended, however, because it is not standard and is difficult for applications and other Web services to use. An example of this block is shown below:
<!--
<a name="geo"></a><br>
<table>
  <tr class="bg2">
    <td class="head2">Geographical Information:</td>
  </tr>
  <tr class="bg1">
    <td class="text">
      <span>Accuracy:</span>Property<br>
      <span>Lat:</span>39.894337<br>
      <span>Long:</span>-83.807657<br>
      <span>TimeZone:</span>Eastern<br>
    </td>
  </tr>
</table>
-->
Besides geotagging a Web site, it is possible to geotag the source of an RSS feed as well as the individual articles. By geotagging each article, your feed can provide entries from various locations. Then, these entries can be displayed on a map where users can read about locations that interest them. Alternatively, by geotagging the source of the feed, a directory or OPML file could provide feeds based on user-selected locations.
The Resource Description Framework Interest Group (RDFIG) has published a geospatial vocabulary that is still solidifying as a standard. It covers latitude, longitude and, optionally, altitude. This is similar to the geo tags discussed above, with small syntactic differences. Furthermore, RDF requires you to declare an XML namespace (xmlns) for the vocabulary, which is based on the WGS84 geodetic reference datum.
<rdf:RDF>
  <geo:Point>
    <geo:lat>55.701</geo:lat>
    <geo:long>12.552</geo:long>
    <geo:alt>52.4</geo:alt>
  </geo:Point>
</rdf:RDF>
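For readers who want to consume such a block, here is a small Python sketch that reads every geo:Point from an RDF/XML string with the standard library. The function name `read_points` is invented for this example. The two namespace URIs do not appear in the listing above; the ones used here are those published for the W3C RDF syntax and the W3C Basic Geo (WGS84) vocabulary, so check them against the feed you actually parse.

```python
import xml.etree.ElementTree as ET

# Namespace URIs supplied for this sketch (not shown in the article's listing).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:geo="{GEO_NS}">
  <geo:Point>
    <geo:lat>55.701</geo:lat>
    <geo:long>12.552</geo:long>
    <geo:alt>52.4</geo:alt>
  </geo:Point>
</rdf:RDF>"""


def read_points(xml_text):
    """Return a list of (lat, long, alt) tuples; alt is None when absent."""
    root = ET.fromstring(xml_text)
    ns = {"geo": GEO_NS}
    points = []
    for point in root.iter(f"{{{GEO_NS}}}Point"):
        lat = point.findtext("geo:lat", namespaces=ns)
        lon = point.findtext("geo:long", namespaces=ns)
        alt = point.findtext("geo:alt", namespaces=ns)
        if lat is None or lon is None:
            continue  # a point without both coordinates is not usable
        points.append((float(lat), float(lon), float(alt) if alt is not None else None))
    return points


print(read_points(SAMPLE))
```

Altitude is treated as optional, matching the description of the vocabulary above.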
The ICBM standard discussed above also can be used in tagging an RSS feed. Again, an XML namespace is used to specify the keywords of the file, and the tags are included either in the header or within the item tags. Here is an example from the USGS Earthquake feed of events over 2.5 on the Richter scale in the last 7 days:
<rss version="2.0">
<item>
  <title>M 3.7, Southern Alaska</title>
  <description>January 02, 2005 03:55:52 GMT</description>
  <link>http://earthquake.usgs.gov/recenteqsww/Quakes/ak00043775.htm</link>
  <icbm:latitude>60.4780</icbm:latitude>
  <icbm:longitude>-152.4355</icbm:longitude>
  <dc:subject>3</dc:subject>
  <dc:subject>pasthour</dc:subject>
</item>
Finally, some Weblog services may prevent users from adding new tags to RSS feeds. In this case, it is acceptable for some sites and packages to embed the geographic information in <dc:subject> tags, as shown below:
<?xml version="1.0"?>
<rss version="2.0">
<channel>
  <item>
    <title>Example Title</title>
    <link>http://site.com/geo</link>
    <description>Example Description</description>
    <dc:subject>geo:lat=33.00 geo:long=-44.54</dc:subject>
  </item>
</channel>
</rss>
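A consumer of such a feed has to pull the coordinates back out of the subject text. The Python sketch below does so with a regular expression; it assumes the exact `geo:lat=... geo:long=...` layout shown in the example above, and the names `GEO_SUBJECT` and `parse_geo_subject` are made up for this illustration. Feeds that order or separate the values differently would need a looser pattern.

```python
import re

# Matches "geo:lat=<number> geo:long=<number>" as used in the example feed.
GEO_SUBJECT = re.compile(
    r"geo:lat=(?P<lat>[-+]?\d+(?:\.\d+)?)\s+geo:long=(?P<lon>[-+]?\d+(?:\.\d+)?)"
)


def parse_geo_subject(text):
    """Return (latitude, longitude) from a dc:subject string, or None."""
    match = GEO_SUBJECT.search(text)
    if match is None:
        return None  # an ordinary subject keyword, not a geotag
    return float(match.group("lat")), float(match.group("lon"))


print(parse_geo_subject("geo:lat=33.00 geo:long=-44.54"))
print(parse_geo_subject("pasthour"))
```

Returning None for non-matching subjects lets the same loop handle ordinary keywords and geotags side by side.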
Several Weblog packages already incorporate the ability to specify a geographic location within an entry as well as for the entire Weblog. This geographic information then can be included for users when reading the Weblog through their browsers or through their own aggregators. Each entry, when posted, is either assigned the default location or given a new one. The following is an example of the geotagged RSS feed generated by WordPress v1.2:
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
  <title>High Earth Orbit</title>
  <link>http://highearthorbit.com</link>
  <description>LinuxJournal Example</description>
  <icbm:latitude>42.4266</icbm:latitude>
  <icbm:longitude>-83.4931</icbm:longitude>
  <copyright>Copyright 2005</copyright>
  <pubDate>Wed, 05 Jan 2005 18:54:47 +0000</pubDate>
  <generator>http://wordpress.org/?v=1.2</generator>
  <item>
    <title>Sample Title</title>
    <link>http://highearthorbit.com/index.php?p=121</link>
    <pubDate>Wed, 05 Jan 2005 18:54:47 +0000</pubDate>
    <category>Sample Category</category>
    <guid>http://highearthorbit.com/index.php?p=121</guid>
    <description>Sample Description</description>
    <geo:Point>
      <geo:lat>42.5021</geo:lat>
      <geo:long>-83.1454</geo:long>
    </geo:Point>
    <icbm:latitude>42.5021</icbm:latitude>
    <icbm:longitude>-83.1454</icbm:longitude>
  </item>
</channel>
</rss>
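An aggregator reading a feed like this one has to decide which location applies to each entry. The Python sketch below uses the entry's own latitude and longitude when present and falls back to the channel-level pair otherwise. It is only an illustration: the helper names are invented, the namespace URI in the sample is a placeholder rather than the real one, and only the flat latitude/longitude elements are read, not the nested geo:Point form.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI for illustration only; a real feed declares its own.
FEED = """<rss version="2.0" xmlns:icbm="http://example.org/icbm">
<channel>
  <title>High Earth Orbit</title>
  <icbm:latitude>42.4266</icbm:latitude>
  <icbm:longitude>-83.4931</icbm:longitude>
  <item>
    <title>Tagged entry</title>
    <icbm:latitude>42.5021</icbm:latitude>
    <icbm:longitude>-83.1454</icbm:longitude>
  </item>
  <item>
    <title>Untagged entry</title>
  </item>
</channel>
</rss>"""


def local_name(element):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return element.tag.rsplit("}", 1)[-1]


def own_location(element):
    """Return (lat, lon) from the element's direct children, or None."""
    values = {local_name(child): child.text for child in element}
    try:
        return float(values["latitude"]), float(values["longitude"])
    except (KeyError, TypeError, ValueError):
        return None


def item_locations(feed_text):
    """Pair each item title with its own location or the channel default."""
    channel = ET.fromstring(feed_text).find("channel")
    default = own_location(channel)
    return [
        (item.findtext("title"), own_location(item) or default)
        for item in channel.findall("item")
    ]


for title, location in item_locations(FEED):
    print(title, location)
```

Matching on local element names keeps the sketch independent of whichever namespace URI a given feed declares, at the cost of possibly confusing unrelated elements that happen to share those names.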
For more information on other particular Weblogs, check out the Worldkit Documentation.
Now that your Web site has been geotagged, what can you do to share this information with users and have new users find your site? A2B is the new incarnation of the defunct geourl.com. A2B allows Web site administrators to register their sites. From there, users can search for sites based on location or geographic locality to another Web site. It may be interesting to find out what other sites and places can be found in your area.
A2B also provides a free public API that allows application and Web site developers to query the A2B database of locations. The A2B query does not return the actual location of the Web sites, however, merely their distances and directions (compass headings) from the queried location.
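To make the idea of a distance and compass heading between two geotagged sites concrete, here is a Python sketch that computes both from two latitude/longitude pairs. It uses the haversine formula and the standard initial-bearing formula on a spherical Earth. This is my own illustration under that assumption and says nothing about how A2B computes its results; the function name and the radius constant are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Return (distance in km, initial bearing in degrees) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    # Haversine formula for the great-circle distance.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    distance = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

    # Initial bearing, normalized to 0-360 degrees clockwise from north.
    y = math.sin(dlambda) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlambda))
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360
    return distance, bearing


# Coordinates taken from the sample feed earlier in this article.
km, heading = distance_and_bearing(42.4266, -83.4931, 42.5021, -83.1454)
print(f"{km:.1f} km at {heading:.0f} degrees")
```

A service could answer "what is near here" queries by keeping only the sites whose computed distance falls under a chosen radius.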
To find out the latitude and longitude or city and region of a Web site, the user can view the Web site’s meta information. To illustrate this, we have written an extension to the Firefox browser that alerts users that geotags are available for the Web site currently being viewed. The extension also retrieves that information without the user having to look at the Web site’s markup source. Download and install the extension to try it out for yourself.
Another index to check out is WorldPress. For RSS feeds, MapBureau and Mikel Maron’s WorldKit Mapper have on-line mapper applications that parse out the locations from your feed and display them on a map. You then can embed a link to a map of your feed in your Web site.
Other applications of geotags include creating a Web page of closely related Web sites, similar to a Web ring, and displaying their locations on a map of the Earth or a specific region. A restaurant review Web page, for example, could display a map of its review regions, and users could click on locations to read reviews of the restaurants located there. Furthermore, travelers could pull up Weblogs and travel information for the area they will be visiting. Hopefully, larger services similar to Google Local or Multimap will be developed that automatically collect and use this information to provide users with a large database of services.
Geotags currently are not employed widely, and only a small number of services support their use. However, many could benefit from better geographic knowledge of Web sites and on-line data. Applications could provide a central location to assist users in finding out about their locations or intended travel locations. In order for this to occur, a better standardization of geospatial metadata must be created, utilized and supported by the Internet community. The W3C Semantic Web is such an effort to standardize the extension of Web data. Many groups across the globe are working together to create enhanced definitions (see Resources). Part of these efforts is defining a complete standard for geospatial tagging and for supporting other location-based services. With this work, the future of geotagging will provide better integration between the digital world and the physical world.