Mapping Guide

Revision as of 15:52, 10 May 2011 by Kreis (talk | contribs)

Dear fellow mappers, one benefit of the DBpedia ontology is that it standardises and reduces the number of properties used to describe entities. At the moment, however, the ontology is becoming bloated with equivalent properties. It is getting unclear, and the benefits of standardisation are being lost. For instance, the ontology contains the properties dateClosed, closingDate, closed, dateOfAbandonment and dissolved. All of them describe the same thing, or nearly the same thing: for example the closure of a firm, the closing of a road, the decommissioning of a facility, or the abandonment of a project. There is clearly a need for a short guide on how to write mappings while keeping the ontology useful.

Check redirects for your infobox

If you have found an infobox that is not yet mapped, check whether it redirects to another infobox. If it does, check whether that target infobox is already mapped. If it is not, create the mapping for the redirect target, not for the infobox that redirects to it.

Read template documentation

Use the template documentation of the infobox you want to map as your source for property definitions. It can be found on the Wikipedia page of the template; see the template documentation of the Infobox China station, for instance. Of course, not all templates have adequate documentation. If your infobox has none, the following points become even more important.

Check for similar mappings

A helpful hint: check for already mapped infoboxes that describe similar things. For example, if you want to map the "Infobox China station", the mappings for "Infobox station" or "Infobox japan station" are really helpful. Do not just copy and paste, but look for properties that are equal or similar to the properties of the infobox you are mapping. You can find similar infoboxes via the Wikipedia categories; most template documentation pages link to those categories at the bottom.

Map the properties

Please invest some research effort in this step.

Get an overview of the property values

You should have an overview of the values of the infobox property that you want to map. A short SPARQL query is really helpful for this. Go to http://dbpedia.org/sparql and enter the following query:

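SELECT DISTINCT * WHERE
{
  # values that pages give for the infobox property "platform"
  ?s <http://dbpedia.org/property/platform> ?o .
  # restrict to pages that use the infobox being mapped
  ?s <http://dbpedia.org/property/wikiPageUsesTemplate> <http://dbpedia.org/resource/Template:Infobox_china_station> .
}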

Replace platform with the name of your infobox property. Note that spaces and underscores are removed and compound words are written in camelCase. Replace Infobox_china_station with the infobox for which you are writing the mapping. The current DBpedia release may already be outdated, so you have to take recent redirects into account; "Infobox china station" now redirects to "Infobox China station", for example. If your query does not return any results, try a simple property that is used in most instances of the infobox, such as "name", to check whether the query itself is correct. If that also fails, check the infobox history for redirects and try other variations of the infobox name. The results tell you what kind of values the property holds.
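
The sanity check with "name" could look like the following sketch. It is the query above with only the property swapped; the LIMIT is an addition to keep the result list short. If this variant returns rows while the query for your own property does not, the template name is probably right and the property name is the part to revisit.

```sparql
# Sanity check: "name" is filled in on most pages that use the infobox,
# so an empty result here points at the template name, not at the property.
SELECT DISTINCT ?s ?o WHERE
{
  ?s <http://dbpedia.org/property/name> ?o .
  ?s <http://dbpedia.org/property/wikiPageUsesTemplate> <http://dbpedia.org/resource/Template:Infobox_china_station> .
}
LIMIT 20
```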

Search for ontology properties

Search for ontology properties not only via the left-hand search box in the wiki menu, but also via the Ontology Properties link in the menu. Note that you cannot simply search for "date" and get all properties that include "date" in their name or label. You will only get the properties that start with the term "date", so the property closingDate is not among the results. The search function of the wiki is not sufficient, so do not rely on the search results for the moment (by the way, do you know a good MediaWiki search extension?). Once you have found a candidate ontology property for the infobox property, follow the "What links here" link of the wiki and compare the infobox properties already mapped to it with the one you want to map. Do they describe the same thing? Note that some existing mappings may be inaccurate. If you find an inconsistency, raise your concerns on the discussion page of the inaccurate mapping, or fix it yourself if it is an unambiguous error.
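
As a possible workaround for the prefix-only wiki search, a substring search can be run at http://dbpedia.org/sparql. This is only a sketch and rests on two assumptions that are not part of this guide: that the ontology is loaded into the endpoint under http://dbpedia.org/ontology/, and that its properties are typed as owl:DatatypeProperty or owl:ObjectProperty. The endpoint may also lag behind the wiki, so recently created properties can be missing.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>

# List ontology properties whose URI contains "date" anywhere,
# so that closingDate is found as well as dateClosed.
SELECT DISTINCT ?p WHERE
{
  { ?p a owl:DatatypeProperty . }
  UNION
  { ?p a owl:ObjectProperty . }
  # "i" makes the match case-insensitive
  FILTER ( REGEX(STR(?p), "^http://dbpedia.org/ontology/.*date", "i") )
}
ORDER BY ?p
```

Replace date with the term you are looking for, for example closed or dissolved.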

Create new ontology properties

If you have an infobox property that definitely cannot be mapped to an existing ontology property, you can create a new one. But please stick to some simple rules:

Naming conventions

The name of the new property should not simply be copied from the infobox. Instead, look at the template documentation and the property definition, if there is one. If not, look at a few Wikipedia articles that use the infobox you want to map, or go back to the SPARQL query above, and check how the property is used. If the property holds numbers, reflect this in the name of the new ontology property by adding a prefix like "numberOf"; if it holds dates, the term "date" should be part of the name. Generally, the property name should be built from more than one word.

Domain

Take care when defining the domain and range of a property. Do not define them as owl:Thing just because it is simple. If your property is specific to one ontology class, do not hesitate to define this class as its domain. That prevents people from reusing the property for other classes by mistake, especially if the property name is ambiguous.

Range

The range of the property should be defined by considering both the property values and the infobox definition. Some infobox properties hold different data types or value patterns because the property is not clearly defined in the template documentation, so Wikipedia authors use it as they see fit. That makes it difficult for us to define the property's range. If a range is defined in the infobox definition, generally stick to that range. However, I have found infobox properties with a defined range whose actual values mostly disagreed with it. In such a case, choose a range that covers the property values and leave a note in the property comment. If the infobox property has no range defined, you always have to look at the values. For example, you may have to weigh a strict object property with an ontology class as range against a datatype property with xsd:string as range. A string captures more information, but an object property is the clearer definition. You can explain your decision in the property comment.
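
To see at a glance which kinds of values a property holds, the overview query from above can be turned into a count per value kind. This is a sketch and not part of the original workflow; the variable names ?isResource, ?datatype and ?n are chosen for illustration, and it assumes the endpoint supports SPARQL 1.1 (BIND and aggregates).

```sparql
# Count the values of the infobox property per kind of value.
SELECT ?isResource ?datatype (COUNT(*) AS ?n) WHERE
{
  ?s <http://dbpedia.org/property/platform> ?o .
  ?s <http://dbpedia.org/property/wikiPageUsesTemplate> <http://dbpedia.org/resource/Template:Infobox_china_station> .
  # true for links to other resources, false for literals
  BIND ( isIRI(?o) AS ?isResource )
  # datatype of a literal; stays unbound for resources
  BIND ( DATATYPE(?o) AS ?datatype )
}
GROUP BY ?isResource ?datatype
ORDER BY DESC(?n)
```

If most values are resources, an object property is a plausible choice. If the result is dominated by strings or is a mix of several datatypes, xsd:string may be the safer range.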

Comments

Please add an English comment to the ontology property. If the template documentation defines the property, copy that definition into the comment. A short description of the property or a definition of its values is really helpful for other people who have to decide whether the property can be used for their mapping. Two examples of good comments: OntologyProperty:IsHandicappedAccessible and OntologyProperty:EffectiveRadiatedPower.

Validate the infobox mapping

Validate your mapping. Use the "Test this mapping" link on the mapping page. In particular, check properties that you have created yourself. If you have found a good example Wikipedia article that uses the infobox you just mapped, you can run a test extraction here (adjust the language tag at the end of the URL).

General instructions

Please try to minimize the number of edits. Write the whole mapping first before committing it. That helps other people keep track of the edits.

Generally, if you find unclear or duplicate ontology properties, do not hesitate to create a discussion page for the property and note your questions or objections there. Help us keep the ontology clean and useful.