How to edit DBpedia Mappings

From Mediawiki1
Latest revision as of 13:45, 7 April 2017

In the spirit of open source projects, the idea of this wiki is to enable the interested public to contribute to the definition of DBpedia mappings by updating existing mappings and by adding new mappings to this wiki.

The types of Wikipedia content that are most valuable for DBpedia extraction are infoboxes and tables. Infoboxes display an article's most relevant facts as a table of attribute-value pairs on the top right-hand side of the Wikipedia page.

Because Wikipedia's infobox template system has evolved in a decentralized way over time, different communities of Wikipedia editors use different templates to describe the same type of thing (e.g. infobox_city_japan, infobox_swiss_town and infobox_town_de). Different templates also use different names for the same attribute (e.g. birthplace and placeofbirth). Because many Wikipedia editors do not strictly follow the recommendations on a template's documentation page, attribute values are expressed in a wide range of formats and units of measurement.

To overcome the problems of synonymous attribute names and of multiple templates being used for the same type of thing, the DBpedia project maps Wikipedia templates, as well as tables within articles, to the DBpedia ontology (http://wiki.dbpedia.org/Ontology). These mappings are specified using the DBpedia Mapping Language (https://github.com/dbpedia/extraction-framework/raw/master/core/doc/mapping_language/DBpedia_Mapping_Language.pdf). The mapping language uses MediaWiki templates to define DBpedia ontology classes and properties as well as template/table-to-ontology mappings.

The following mappings map English Wikipedia infoboxes and tables to this ontology. The existing mappings give a good idea of how they work:


Tools and Resources

  • Mapping Validator. When you are editing a mapping, there is a validate button at the bottom of the page. Pressing it checks your changes for syntactic correctness and highlights inconsistencies such as missing property definitions. A separate validation report (http://mappings.dbpedia.org/validation/), updated once per day, checks whether your mappings conform to the DBpedia ontology.
  • Extraction Tester. The extraction tester, linked from each mapping page, tests a mapping against a set of example Wikipedia pages. This gives you direct feedback on whether a mapping works and what the resulting data will look like.
  • MappingTool (obsolete). The DBpedia MappingTool is a graphical user interface that helps users create and edit mappings.
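  • DBpedia Mapping Language Grammar (https://raw.githubusercontent.com/dbpedia/extraction-framework/master/core/doc/mapping_language/dbpedia_grammar.xml)
  • DBpedia Mapping Language Design (https://github.com/dbpedia/extraction-framework/raw/master/core/doc/mapping_language/DBpedia_Mapping_Language.pdf)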

How to map a Wikipedia Template

  • Get the encoded template page name from Wikipedia. Make sure that the template page is not a redirect.
  • Create a wiki page in this wiki in the Mapping namespace, using the encoded Wikipedia template page name.
  • Decide on the ontology class you would like to map the template to.
    • Ontology classes belong to the Class namespace. A list of existing ontology classes can be found via the sidebar (Ontology Classes).
  • Write a Template:TemplateMapping or Template:ConditionalMapping to map the Wikipedia template to an ontology class and save it to the created wiki page in the Mapping namespace.

Template to Ontology Mapping Language

When mapping a Wikipedia template to an ontology class and mapping template properties to ontology properties for this template, users will have to edit the corresponding template documentation page in MediaWiki.

The following templates cover the template to ontology schema mapping (see DBpedia Mapping Language Grammar for the formal grammar):

  • TemplateMapping: maps Wikipedia templates to ontology classes.
  • PropertyMapping: maps Wikipedia template properties to ontology properties.
  • IntermediateNodeMapping: extracting multiple values from a single property requires an intermediate node. The IntermediateNodeMapping expresses mappings from Wikipedia template properties to ontology properties on this additional node and connects the node to the mapped instance.
  • ConditionalMapping: maps templates to ontology classes. Unlike a TemplateMapping, the mapping can depend on template properties and their values.
  • Custom mappings
    • To cover specific, more complex mapping cases, the DBpedia extraction framework can be extended with custom parsers, which have to implement a specific PHP interface. These parsers are invoked using custom mappings.

Template Mapping

The TemplateMapping template offers the following template parameters:

  • mapToClass
    • Templates are mapped to ontology classes. The mapToClass parameter takes exactly one DBpedia ontology class as its value.
  • correspondingClass, correspondingProperty
    • When different templates are used on the same page (for instance Automobile and Automobile Generation), the instance resulting from the secondary template (Automobile Generation) can be connected to the instance of the primary template (Automobile) using a corresponding property. That is, if an instance of type correspondingClass is found on the same page, it is connected to the instances of the mapped template by correspondingProperty.
  • mappings
    • Mappings map template properties to ontology properties; they must be defined using PropertyMapping or IntermediateNodeMapping. Custom, user-defined mappings such as the GeocoordinatesMapping can also be used.
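As a hedged illustration of correspondingClass and correspondingProperty, the sketch below shows how a mapping for the secondary Automobile Generation template might link its instance to the Automobile instance extracted from the primary template on the same page. The target class Automobile, the property relatedMeanOfTransportation and the template property name are assumptions for illustration; check them against the ontology classes and properties in this wiki before use.

```wikitext
<!-- Hypothetical mapping for the secondary template (Automobile Generation). -->
<!-- If an Automobile instance is found on the same page, it is linked to this -->
<!-- instance via the assumed property relatedMeanOfTransportation. -->
{{TemplateMapping
| mapToClass = Automobile
| correspondingClass = Automobile
| correspondingProperty = relatedMeanOfTransportation
| mappings =
   {{ PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }}
}}
```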

Property Mapping

The PropertyMapping template offers the following template parameters:

  • ontologyProperty
    • A template property to ontology property mapping should list one ontology property.
  • templateProperty
    • A template property to ontology property mapping should list one template property which is to be mapped.
  • select
    • A selector used to map only one value from the list given in the template property. Currently only 'first' and 'last' are supported.
  • unit
    • If a mapped template property contains a numerical value with a unit, the unit has to be defined. If the template property has no default unit, e.g. because its values can use different units of the same dimension, the dimension has to be defined instead, for usability as well as validation reasons. Possible dimensions are Length or Mass.
  • factor
    • Multiplication factor that is applied to numeric data.
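A few PropertyMapping calls illustrating the select and unit parameters might look like the sketch below; they would sit inside the mappings parameter of a TemplateMapping. The template property names (name, genre, height) are hypothetical, and the ontology properties genre and height are assumed to exist in the ontology; run the validator to confirm.

```wikitext
<!-- Plain one-to-one mapping of a template property -->
{{ PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }}
<!-- Keep only the first value of a list-valued template property -->
{{ PropertyMapping | templateProperty = genre | ontologyProperty = genre | select = first }}
<!-- Values may come in different units of length, so the dimension is declared -->
{{ PropertyMapping | templateProperty = height | ontologyProperty = height | unit = Length }}
```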

Intermediate Node Mapping

The IntermediateNodeMapping template offers the following template parameters:

  • nodeClass, correspondingProperty
    • Creates an additional node of type nodeClass, which is connected to the instance extracted from the template by the property given in correspondingProperty.
  • mappings
    • Mappings map template properties to ontology properties; they must be defined using PropertyMapping, IntermediateNodeMapping, or a custom mapping.
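A minimal sketch of an IntermediateNodeMapping nested in a TemplateMapping, assuming an office-holder infobox whose office and term_start properties should describe a separate node rather than the person directly. The class PersonFunction and the properties person, title and activeYearsStartDate are assumptions, not taken from this page.

```wikitext
<!-- Hypothetical office-holder mapping: office details go on an extra node -->
{{TemplateMapping
| mapToClass = Person
| mappings =
   {{ PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }}
   {{ IntermediateNodeMapping
    | nodeClass = PersonFunction
    | correspondingProperty = person
    | mappings =
       {{ PropertyMapping | templateProperty = office | ontologyProperty = title }}
       {{ PropertyMapping | templateProperty = term_start | ontologyProperty = activeYearsStartDate }}
   }}
}}
```

With this structure, the office-specific values end up on the PersonFunction node, and correspondingProperty links that node with the Person instance.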

Conditional Mapping

The ConditionalMapping template maps templates to ontology classes. Unlike a TemplateMapping, the mapping can depend on template properties and their values.

  • cases: Cases define conditions on template properties and their values and can change the default mapping, such as the ontology class the template is mapped to and the ontology properties the template properties are mapped to. The cases parameter should contain a list of Condition templates.
  • defaultMappings: The default mappings define the default template property mappings using PropertyMapping etc. The default ontology class the template is mapped to has to be defined by an otherwise condition.
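The overall shape of a ConditionalMapping (a list of Condition templates in cases, a final otherwise condition that sets the default class, and shared defaultMappings) can be sketched as follows. The template property type, the value band, and the spelling operator = otherwise are assumptions based on the description above and on the Condition example further down this page.

```wikitext
<!-- Conditions are tried in order; the otherwise case supplies the default class -->
{{ConditionalMapping
| cases =
   {{ Condition
    | templateProperty = type
    | operator = contains
    | value = band
    | mapping = {{ TemplateMapping | mapToClass = Band }}
   }}
   {{ Condition
    | operator = otherwise
    | mapping = {{ TemplateMapping | mapToClass = MusicalArtist }}
   }}
| defaultMappings =
   {{ PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }}
}}
```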

Constant Mappings

The ConstantMapping template maps information that is only contained in the infobox title (other than the class information) to ontology properties.

  • ontologyProperty: A constant mapping should list one ontology property.
  • value: A constant mapping should list one value. Depending on whether the ontology property is an object property or a datatype property, a URI or a literal is produced for this value. Please provide decoded URIs here, i.e. specify "Billy Murray (actor)" instead of "Billy_Murray_%28actor%29".
  • unit: If the value is numerical and a unit is mapped, the unit has to be defined (please use only values from the DBpedia units and dimensions listed at http://mappings.dbpedia.org/index.php/DBpedia_Datatypes). If there is no default unit, e.g. because values can use different units of the same dimension, the dimension has to be defined for usability as well as validation reasons. Possible dimensions are Length or Mass.
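For example, a mapping for infobox_city_japan (one of the templates mentioned at the top of this page) could use a ConstantMapping to record a fact implied by the template itself. The class City and the property country are assumptions; note the decoded page title used as the value.

```wikitext
<!-- Every instance extracted from this template gets country = Japan -->
{{TemplateMapping
| mapToClass = City
| mappings =
   {{ PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }}
   {{ ConstantMapping | ontologyProperty = country | value = Japan }}
}}
```

If country is an object property, as assumed here, the value should be turned into a resource URI rather than a literal.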

For specific tasks, such as extracting durations or calculating a geo-location ID based on multiple properties, the DBpedia extraction framework can be extended with custom value parsers, and DBpedia custom mapping templates can be defined. The name of a custom mapping template has to be identical to the name of the corresponding DBpedia parser class.

DateIntervalMapping

Template:DateIntervalMapping provides an exact mapping from an interval in a single template property value to two ontology properties:

  • templateProperty
  • startDateOntologyProperty
  • endDateOntologyProperty
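A DateIntervalMapping for a hypothetical years_active property with values such as "1985-1999" might look like this; the ontology properties activeYearsStartYear and activeYearsEndYear are assumed to exist.

```wikitext
<!-- Splits one interval value (e.g. "1985-1999") into a start and an end property -->
{{ DateIntervalMapping
| templateProperty = years_active
| startDateOntologyProperty = activeYearsStartYear
| endDateOntologyProperty = activeYearsEndYear
}}
```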

CombineDateMapping

Template:CombineDateMapping allows you to combine 2 or 3 template properties into one ontology date/time property.

  • templateProperty1
  • unit1: e.g. xsd:gMonthDay, xsd:gMonth
  • templateProperty2
  • unit2
  • templateProperty3 (optional)
  • unit3 (optional)
  • ontologyProperty
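As a sketch, an infobox that stores the day and month of birth separately from the year could be combined into a single date as shown below. The template property names birth_day and birth_year, the unit xsd:gYear and the ontology property birthDate are assumptions.

```wikitext
<!-- Combines e.g. birth_day = "March 14" and birth_year = "1879" into one date -->
{{ CombineDateMapping
| templateProperty1 = birth_day
| unit1 = xsd:gMonthDay
| templateProperty2 = birth_year
| unit2 = xsd:gYear
| ontologyProperty = birthDate
}}
```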

GeocoordinatesMapping

Template:GeocoordinatesMapping allows you to map geo coordinates from 1, 2 or 8 template properties to one ontology property.

Use this if the geo coordinates are covered by 1 template property:

  • coordinates

Use these if the geo coordinates are covered by 2 template properties:

  • latitude
  • longitude

Use these if the geo coordinates are covered by 8 template properties:

  • latitudeDirection
  • latitudeDegrees
  • latitudeMinutes
  • latitudeSeconds
  • longitudeDirection
  • longitudeDegrees
  • longitudeMinutes
  • longitudeSeconds

Target ontology property to map to:

  • ontologyProperty
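The two- and eight-property variants might look as follows. The template property names (lat, long, latd, latm, ...) are typical of Wikipedia infoboxes but are assumptions here, as is the target property location; use the property names of the infobox you are actually mapping.

```wikitext
<!-- Variant 1: latitude and longitude in two template properties -->
{{ GeocoordinatesMapping
| latitude = lat
| longitude = long
| ontologyProperty = location
}}

<!-- Variant 2: direction, degrees, minutes and seconds in eight template properties -->
{{ GeocoordinatesMapping
| latitudeDirection = latNS
| latitudeDegrees = latd
| latitudeMinutes = latm
| latitudeSeconds = lats
| longitudeDirection = longEW
| longitudeDegrees = longd
| longitudeMinutes = longm
| longitudeSeconds = longs
| ontologyProperty = location
}}
```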

CalculateMapping

Template:CalculateMapping allows you to combine 2 template properties with an arithmetic operation (currently only "add" is supported):

  • operation
  • templateProperty1
  • unit1
  • templateProperty2
  • unit2
  • ontologyProperty
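A sketch of a CalculateMapping that adds land and water area to obtain a total area. The template property names, the unit squareKilometre and the ontology property areaTotal are assumptions.

```wikitext
<!-- areaTotal = area_land_km2 + area_water_km2 (only "add" is supported) -->
{{ CalculateMapping
| operation = add
| templateProperty1 = area_land_km2
| unit1 = squareKilometre
| templateProperty2 = area_water_km2
| unit2 = squareKilometre
| ontologyProperty = areaTotal
}}
```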

Nesting of Templates

Mapping templates have to be nested in the following way (see a realistic example in Example of Mapping Gender):

ConditionalMapping
  Condition
    TemplateMapping (1)
    TableMapping    (2)
      IntermediateNodeMapping
        PropertyMapping  (3)
        ConstantMapping
        DateIntervalMapping
        CombineDateMapping
        GeocoordinatesMapping
        CalculateMapping 
  • (1) TemplateMapping is mandatory in an infobox mapping, since it defines the ontology class (mapToClass)
  • (2) TableMapping may have the same nested templates as TemplateMapping
  • (3) PropertyMapping will also very likely appear
  • All the others are optional

Although you cannot nest ConditionalMapping, you can test for several properties. Conditions are tested in sequence, and the first matching condition is applied. For example, at Mapping_bg:Музикален_изпълнител (Musical artist) we first check for фон=група (background=group), which is mapped to class=Band. Then we check for наставка (whether the name has a suffix) and map to class=MusicalArtist and gender=Female. Otherwise we map to class=MusicalArtist and gender=Male:

{{ Condition
  | templateProperty = фон
  | operator = contains
  | value = група
  | mapping = {{ TemplateMapping | mapToClass = Band }}
}}

{{ Condition
  | templateProperty = наставка
  | operator = isSet
  | mapping = {{ TemplateMapping | mapToClass = MusicalArtist | mappings =
    {{ConstantMapping | ontologyProperty = gender | value = http://dbpedia.org/resource/Female}}}}
}}
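The paragraph above also describes a final fallback (class=MusicalArtist, gender=Male) that the excerpt does not show. A sketch of that last condition, assuming the otherwise condition described under Conditional Mapping is written as operator = otherwise:

```wikitext
<!-- Fallback: applied only when neither of the conditions above matched -->
{{ Condition
  | operator = otherwise
  | mapping = {{ TemplateMapping | mapToClass = MusicalArtist | mappings =
    {{ConstantMapping | ontologyProperty = gender | value = http://dbpedia.org/resource/Male}}}}
}}
```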

How to map a Wikipedia Table

Table mappings apply to tables containing a set of keywords in the table header. If a table mapping is defined, all rows of the table are mapped to instances of an ontology class, and all of its columns are mapped to ontology properties.

To map a table:

  • Find important keywords in the table header that identify a table unambiguously.
  • Create a wiki page in this wiki in the Mapping namespace, using the Table prefix, or use an existing table mappings wiki page. You can define more than one table mapping on one wiki page. The wiki page name doesn't have to refer to any of the table keywords; it can be useful to bundle table mappings by topic.
    • A list of existing table mappings can be found via the sidebar (Table Mappings).
  • Decide on the ontology class you would like to map the table to.
    • A list of existing ontology classes can be found via the sidebar (Ontology Classes).
  • Write a Template:TableMapping to map the Wikipedia table rows to an ontology class and save it to the created wiki page in the Mapping namespace.


Examples

TODO

Mapping:Infobox_actor

One TemplateMapping that maps the template to the class http://dbpedia.org/ontology/Actor, with two PropertyMappings:

{{TemplateMapping 
| mapToClass = Actor 
| mappings = 
   {{ PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }}
   {{ PropertyMapping | templateProperty = birth_place | ontologyProperty = birthPlace }}
}}


Create new mappings

To create a new mapping, open the following URL in your web browser:

http://mappings.dbpedia.org/index.php/Mapping_LANGUAGE:INFOBOXNAME
  • replace LANGUAGE by the language code you are currently working on (for example mt for Maltese)
  • replace INFOBOXNAME by the box that you want to create a mapping for (replace spaces with underscores)

e.g.

http://mappings.dbpedia.org/index.php/Mapping_mt:Infobox_album

for the Album infobox on the Maltese Wikipedia.

If there is no mapping for this box yet, you will see a page saying "There is currently no text in this page. You can search for this page title in other pages, search the related logs, or edit this page." At the top of the page, click "create" and start writing the mapping.