Main Page: Difference between revisions

 
(58 intermediate revisions by 10 users not shown)
== About DBpedia ==
[http://dbpedia.org/ DBpedia] is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia and to link other data sets on the Web to Wikipedia data. The [http://wiki.dbpedia.org/Datasets DBpedia knowledge base], which has been created by extracting structured information from Wikipedia, currently describes more than 2.9 million things, including at least 282,000 persons, 339,000 places (including 241,000 populated places), 88,000 music albums, 44,000 films, 15,000 video games, 119,000 organizations (including 20,000 companies and 29,000 educational institutions), 130,000 species and 4,400 diseases.

== DBpedia Mappings Wiki ==

In this DBpedia Mappings Wiki you can help to enhance the information in DBpedia. The DBpedia Extraction Framework uses the mappings defined here to homogenize information extracted from Wikipedia before generating structured information in [http://en.wikipedia.org/wiki/Resource_Description_Framework RDF].

Anybody can help by editing:
* the [[How_to_edit_the_DBpedia_Ontology|DBpedia ontology schema]] (classes, properties, datatypes)
* the [[How_to_edit_DBpedia_Mappings|DBpedia infobox-to-ontology mappings]]

Mappings can be written for a variety of languages, connecting multilingual information to a language-independent unified ontology schema (language-specific labels can be provided [[How_to_edit_the_DBpedia_Ontology|there]]).

== Mapping Example ==
This is how you write a simple infobox mapping.

'''Mapping:Infobox_actor'''
<pre>
{{TemplateMapping
| mapToClass = Actor
| mappings =
  {{ PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }}
  {{ PropertyMapping | templateProperty = birth_place | ontologyProperty = birthPlace }}
}}
</pre>

This mapping extracts three pieces of information:
# the type information (Actor)
# the name of the actor
# the actor's place of birth.

Therefore, three RDF triples are extracted for each use of Infobox_actor in the English Wikipedia. For example, for [http://en.wikipedia.org/w/index.php?title=Vince_Vaughn&oldid=437756176 Vince Vaughn]:
<pre>
dbpedia:Vince_Vaughn  rdf:type                dbpedia-owl:Actor   .
dbpedia:Vince_Vaughn  foaf:name               "Vince Vaughn"@en   .
dbpedia:Vince_Vaughn  dbpedia-owl:birthPlace  dbpedia:Minneapolis .
</pre>

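
To make the mapping semantics concrete, here is a minimal Python sketch of what an extraction does with the Infobox_actor mapping above. It is not the DBpedia Extraction Framework's implementation: the class and function names, the simplified infobox values, and the rule that turns a bare wiki link into a dbpedia: resource are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateMapping:
    """Hypothetical in-memory form of a {{TemplateMapping}} page."""
    map_to_class: str
    # templateProperty -> ontologyProperty, in the order they appear in the mapping
    property_mappings: dict[str, str] = field(default_factory=dict)


def qualify(ontology_property: str) -> str:
    # Assumption: unprefixed names such as "birthPlace" belong to the DBpedia ontology.
    return ontology_property if ":" in ontology_property else "dbpedia-owl:" + ontology_property


def to_rdf_term(value: str, lang: str = "en") -> str:
    # Simplification: a bare wiki link [[Target]] or [[Target|label]] becomes a
    # DBpedia resource; anything else becomes a language-tagged literal.
    if value.startswith("[[") and value.endswith("]]"):
        target = value[2:-2].split("|")[0].strip()
        return "dbpedia:" + target.replace(" ", "_")
    return f'"{value}"@{lang}'


def extract(resource: str, infobox: dict[str, str], mapping: TemplateMapping) -> list[tuple[str, str, str]]:
    # One rdf:type triple from mapToClass ...
    triples = [(resource, "rdf:type", "dbpedia-owl:" + mapping.map_to_class)]
    # ... plus one triple per mapped infobox property that is actually filled in.
    for template_property, ontology_property in mapping.property_mappings.items():
        value = infobox.get(template_property, "").strip()
        if value:
            triples.append((resource, qualify(ontology_property), to_rdf_term(value)))
    return triples


infobox_actor = TemplateMapping(
    map_to_class="Actor",
    property_mappings={"name": "foaf:name", "birth_place": "birthPlace"},
)

# Simplified, hypothetical infobox values (the real Wikipedia infobox is richer).
vince_vaughn = {"name": "Vince Vaughn", "birth_place": "[[Minneapolis]]"}

for s, p, o in extract("dbpedia:Vince_Vaughn", vince_vaughn, infobox_actor):
    print(s, p, o, ".")
```

In this sketch an empty or missing birth_place simply yields no birthPlace triple, so the count of three triples per page holds only when both mapped properties are filled in.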
== Detailed Information ==
* Check the '''[[Mapping Guide]]''', which defines best practices for writing clean, efficient mappings that extract lots of high-quality data
* Take a look at the '''[[Mapping_Statistics|Mapping Statistics]]''' to search for relevant infoboxes to map.
* '''[[How_to_edit_the_DBpedia_Ontology|How to edit the DBpedia ontology]]'''
* '''[[How_to_edit_DBpedia_Mappings|How to edit infobox and table mappings]]'''
* [[Use the DBpedia Extraction Framework]] to extract structured data


== About mappings.dbpedia.org ==
This wiki contains the infobox-to-ontology and table-to-ontology mappings used by the DBpedia extraction framework, as well as the ontology definition itself. The framework collects the templates defined in this wiki and extracts the Wikipedia content according to them. (As of March 2010, only the dump extraction uses the mappings; DBpedia Live is going to follow shortly.)

== Prerequisites ==
If you would like to edit the mappings or the ontology schema, this is what you need:
* a user account on this wiki (''[[Special:UserLogin|login/sign up]]'')
* editor rights, which you apply for as follows:
** register at http://forum.dbpedia.org
** ask for editor rights [https://forum.dbpedia.org/t/mappings-wiki-accounts/38 here]. Include your user name and a short introduction of yourself in the message.
* a namespace for the language you want to write mappings for
** if the namespace does not exist yet (see the left sidebar), please request it at [mailto:dbpedia-discussion@lists.sourceforge.net dbpedia-discussion@lists.sourceforge.net]
* a GitHub account, if you will contribute frequently (see below)


== Editorial Process ==
A significant quality problem until 2015 was that there was neither bug tracking nor discussion of the best approaches. A major strength of Wikipedia and Wikidata is that editors are in constant discussion and follow established editorial processes. Such processes were missing on this mapping wiki, and it is our collective task to rectify the situation. If you find a problem:
* Post a new issue to one of the following trackers, depending on the nature of the issue:
** Mapping: https://github.com/dbpedia/mappings-tracker/issues
** Ontology: https://github.com/dbpedia/ontology-tracker/issues
** Extraction framework: https://github.com/dbpedia/extraction-framework/issues
* Edit the corresponding Discussion page (of the mapping or ontology element):
** Describe the problem in detail. We do this here rather than on GitHub so that most of the information is in one place
** Provide a link to the issue
** Propose a solution if you'd like

== Best Practices ==
If you write a best practice, list it here:
* [[Mapping Guide]] (thorough)
* [http://vladimiralexiev.github.io/pres/20150209-dbpedia/add-mapping.html Adding a Mapping] (shorter)
* [[Main Page#Editorial Process]]
* [[Main Page#Testing Best Practices]]

Focused investigations of large-scale problems that require discussion, fixes to many properties/templates, or documenting a pattern:
* [[What's in a Name]]
* [[Connecting Places]] [https://github.com/dbpedia/mappings-tracker/issues/29 #29]
* [[Agent Relations]]

=== DBpedia Mappings ===
The types of Wikipedia content that are most valuable for the DBpedia extraction are infoboxes and tables. Infoboxes display an article's most relevant facts as a table of attribute-value pairs on the top right-hand side of the Wikipedia page.

As Wikipedia's infobox template system has evolved in a decentralized way over time, different communities of Wikipedia editors use different templates to describe the same type of things (e.g. infobox_city_japan, infobox_swiss_town and infobox_town_de). Different templates use different names for the same attribute (e.g. birthplace and placeofbirth). As many Wikipedia editors do not strictly follow the recommendations given on the page that describes a template, attribute values are expressed using a wide range of different formats and units of measurement.

In order to overcome the problems of synonymous attribute names and multiple templates being used for the same type of things, the DBpedia project maps Wikipedia templates, as well as tables within an article, to the [http://wiki.dbpedia.org/Ontology DBpedia ontology].
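
The idea of mapping synonymous attributes onto one ontology property can be sketched in a few lines of Python. The two template names below are hypothetical; only the attribute names birthplace and placeofbirth come from the text above. Each template gets its own mapping, and both produce the same ontology property, birthPlace.

```python
# Hypothetical per-template mappings: templateProperty -> ontologyProperty.
# Two templates name the same fact differently; both map to birthPlace.
MAPPINGS: dict[str, dict[str, str]] = {
    "Infobox_person_a": {"name": "foaf:name", "birthplace": "birthPlace"},
    "Infobox_person_b": {"name": "foaf:name", "placeofbirth": "birthPlace"},
}


def normalize(template: str, attributes: dict[str, str]) -> dict[str, str]:
    """Rename infobox attributes to ontology properties; drop unmapped attributes."""
    mapping = MAPPINGS.get(template, {})
    return {mapping[name]: value for name, value in attributes.items() if name in mapping}


a = normalize("Infobox_person_a", {"name": "Ada", "birthplace": "London", "image": "x.jpg"})
b = normalize("Infobox_person_b", {"name": "Ada", "placeofbirth": "London"})
print(a == b)  # True: both become {"foaf:name": "Ada", "birthPlace": "London"}
```

Unifying value formats and units of measurement is a separate problem that this sketch does not attempt.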

These mappings are specified using the '''DBpedia Mapping Language'''. The mapping language makes use of MediaWiki templates that define DBpedia ontology classes and properties as well as template/table-to-ontology mappings.

The following mappings map Wikipedia infoboxes and tables to this ontology:
* [http://mappings.dbpedia.org/index.php?title=Special%3AAllPages&from=&to=&namespace=204 Infobox Mappings]
* [http://mappings.dbpedia.org/index.php?title=Special%3APrefixIndex&prefix=Table&namespace=204 Table Mappings]

== Testing Best Practices ==
Whenever we find or fix a problem, we should have some test cases for it. This serves several important purposes:
* to illustrate the problem
* as proof that it works after the problem is fixed
* to provide test cases for any bugs in the extraction framework (upstream bug reporting)

Every infobox mapping has a link "test this mapping", e.g.
* http://mappings.dbpedia.org/server/mappings/fr/extractionSamples/Mapping_fr:Infobox_Ville_de_Serbie

Unfortunately this works mostly for the English DBpedia; see bug [https://github.com/dbpedia/extraction-framework/issues/289 #289]. But you can still test per resource, e.g.
* http://mappings.dbpedia.org/server/extraction/fr/extract?title=Požega_(Serbie)&revid=&format=turtle-triples
* http://mappings.dbpedia.org/server/extraction/bg/extract?title=Лили+Иванова&revid=&format=turtle-triples
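
The per-resource test URLs follow a single pattern: language code, page title, an empty revid and format=turtle-triples, optionally followed by &extractors=custom to run all extractors. The Python sketch below builds such URLs while keeping non-ASCII titles readable as IRIs. The pattern is inferred only from the example URLs on this page, and the helper names are hypothetical.

```python
from urllib.parse import quote_plus

BASE = "http://mappings.dbpedia.org/server/extraction"  # taken from the example URLs


def iri_quote(text: str) -> str:
    # Percent-encode ASCII characters that are unsafe in a query value
    # (a space becomes "+"), but keep non-ASCII letters such as "ž" or Cyrillic readable.
    return "".join(ch if ord(ch) > 127 else quote_plus(ch, safe="()") for ch in text)


def extraction_url(lang: str, title: str, custom: bool = False) -> str:
    url = f"{BASE}/{lang}/extract?title={iri_quote(title)}&revid=&format=turtle-triples"
    if custom:
        # run all available extractors instead of only labels and mappings
        url += "&extractors=custom"
    return url


print(extraction_url("fr", "Požega_(Serbie)"))
print(extraction_url("bg", "Лили Иванова"))
print(extraction_url("en", "Elvis_Presley", custom=True))
```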


Testing per resource is even better, because it provides specific test cases.
Also provide a link to the corresponding wiki pages in edit mode, so the markup can be seen immediately.
Add these to the mapping's Discussion page.

E.g. on [[Mapping fr talk:Infobox Ville de Serbie]] we have:
* Testing:
** page: https://fr.wikipedia.org/w/index.php?title=Požega_(Serbie)&action=edit
** result: http://mappings.dbpedia.org/server/extraction/fr/extract?title=Požega_(Serbie)&revid=&format=turtle-triples

We've asked the developers to add UTF-8 encoding ([https://github.com/dbpedia/extraction-framework/issues/304 #304]), which will make it easier to inspect the output. Until then, you need to save the output to a file and open it in a proper editor.

=== DBpedia Ontology ===
The DBpedia ontology is based on OWL and forms the structural backbone of DBpedia. It describes classes, e.g. person, city, country, and properties, e.g. birth place, longitude. Information in Wikipedia articles is then mapped to this ontology via the mappings described above. Most prominently, many Wikipedia pages use so-called infoboxes. For instance, the English Wikipedia article about [http://en.wikipedia.org/wiki/London London] contains a "settlement infobox". This infobox may be mapped to, e.g., the class "populated place" (see [[OntologyClass:PopulatedPlace|PopulatedPlace]]) in the DBpedia ontology, and the attributes in the infobox are mapped to properties in the DBpedia ontology. <!-- Please see the [http://en.wikipedia.org/wiki/Template:Infobox_settlement/doc documentation of the settlement infobox] for details. --> This way, a unified view over all data in infoboxes can be obtained. Since this information conforms to Semantic Web standards, it can be queried and combined by a broad range of tools. This increases the value of the information entered by the Wikipedia community.

A listing of all classes, properties and datatypes (units of measurement) used by the DBpedia ontology is found below:
* [http://mappings.dbpedia.org/index.php?title=Special%3AAllPages&from=&to=&namespace=200 Ontology Classes] - OWL classes and their definitions
* [http://mappings.dbpedia.org/index.php?title=Special%3AAllPages&from=&to=&namespace=202 Ontology Properties] - OWL object and datatype properties
* [http://mappings.dbpedia.org/index.php?title=Special%3AAllPages&from=&to=&namespace=206 Datatypes]

=== Custom or Default Extractor ===
The above URLs use the default extractor, which extracts only labels and mappings. This is probably what you need for testing, since you are debugging the mapped triples.
If you want to see more triples, add "&extractors=custom" to the URL. This runs all available extractors.
However, the extraction samples are limited (possibly to 1000 triples), so for big articles this may not return all expected triples.
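
To check whether a saved result may have hit the sample limit, you can count the triples in it. The sketch below assumes that the turtle-triples output puts one complete triple per line, ending in " ."; that assumption, the function names and the 1000-triple threshold (the unconfirmed figure above) should be checked against real output.

```python
def count_triples(text: str) -> int:
    """Count lines that look like complete triples (assumed: one triple per line)."""
    count = 0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("."):
            count += 1
    return count


def may_be_truncated(count: int, limit: int = 1000) -> bool:
    # The limit is not confirmed; treat a count at or above it as suspicious.
    return count >= limit


sample = """
# hypothetical saved output
dbpedia:Vince_Vaughn  rdf:type                dbpedia-owl:Actor   .
dbpedia:Vince_Vaughn  foaf:name               "Vince Vaughn"@en   .
dbpedia:Vince_Vaughn  dbpedia-owl:birthPlace  dbpedia:Minneapolis .
"""
n = count_triples(sample)
print(n, may_be_truncated(n))  # 3 False
```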


Let's illustrate with Elvis Presley: [http://mappings.dbpedia.org/server/extraction/en/extract?title=Elvis_Presley&revid=&format=turtle-triples&extractors=custom custom] 921 triples, [http://mappings.dbpedia.org/server/extraction/en/extract?title=Elvis_Presley&revid=&format=turtle-triples default] 118 triples.
So the limit is not reached in this case.

== How are the Mappings and the Ontology maintained? ==
So far, only a few people inside the DBpedia project have maintained the mappings and the ontology, but in the spirit of open source projects, control will be handed over to the Wikipedia and DBpedia community. The members of the DBpedia team are not able to extend the mappings to cover all Wikipedia infoboxes and tables, due to the size of the task and the knowledge required to map templates from exotic domains. Therefore, the idea of this wiki is to enable the interested public to contribute to the definition of DBpedia mappings by updating existing mappings and by adding new ones.

''This wiki is read-only. If you would like to edit the mappings or ontology schema, please [[Special:UserLogin|register]] and the DBpedia team will add you to the editors list.''
=== Copy IRIs not URL-encoded ===
The URLs above use non-ASCII characters, so they are '''Internationalized''' Resource Identifiers (IRIs).
These are readable and let a user see what they represent.
But when you copy from the browser's address bar, an IRI is URL-encoded into unreadable ugliness like:
* http://mappings.dbpedia.org/server/extraction/fr/extract?title=Po%C5%BEega_(Serbie)&revid=&format=turtle-triples
* http://mappings.dbpedia.org/server/extraction/bg/extract?title=%D0%9B%D0%B8%D0%BB%D0%B8+%D0%98%D0%B2%D0%B0%D0%BD%D0%BE%D0%B2%D0%B0&revid=&format=turtle-triples
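
If you already have such an encoded URL, you can turn it back into a readable IRI by decoding only the percent-escapes that belong to non-ASCII characters. A minimal Python sketch (the function name is hypothetical) that leaves ASCII escapes such as %26 (&) untouched, so the query string keeps its meaning:

```python
import re
from urllib.parse import unquote

# A run of percent-escapes with byte values 0x80-0xFF, i.e. the UTF-8 bytes of non-ASCII characters.
NON_ASCII_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def to_iri(url: str) -> str:
    # Decode each run as UTF-8; ASCII escapes like %26 or %2B stay encoded.
    return NON_ASCII_ESCAPES.sub(lambda m: unquote(m.group(0), encoding="utf-8"), url)


print(to_iri("http://mappings.dbpedia.org/server/extraction/fr/extract?title=Po%C5%BEega_(Serbie)&revid=&format=turtle-triples"))
```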


Browsers URL-encode IRIs for obscure historical reasons.
Please be kind to your fellow editors and use an add-on that preserves IRIs, e.g.:
* Chrome add-on: [https://chrome.google.com/webstore/detail/copy-url/mkhnbhdofgaendegcgbmndipmijhbili Copy URL]

If you don't have such an add-on, you can use this trick:
* Copy everything but the first letter "m"
* Paste, then add the missing letter "m" (or "http://m").

=== Tutorials ===
The specification of the '''DBpedia Mapping Language''' can be found [http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/trunk/extraction/core/doc/mapping%20language/ here]. Below you can find step-by-step tutorials on:
* [[Ontology_Editing|How to edit the ontology schema]]
* How to write
** [[Writing_Mappings/Templates|Template mappings]]
** [[Writing_Mappings/Tables|Table mappings]]

=== Domain Validation ===
The [http://mappings.dbpedia.org/validation/index.html Domain Validation service] generates a list of domain exceptions and updates it daily.
For more information, please refer to A. Dimou, D. Kontokostas, M. Freudenberg, R. Verborgh, J. Lehmann, E. Mannens, S. Hellmann, and R. Van de Walle. [http://jens-lehmann.org/files/2015/iswc_rml_rdfunit.pdf Assessing and Refining Mappings to RDF to Improve Dataset Quality]. In Proceedings of the 14th International Semantic Web Conference, Oct. 2015.
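
Each domain exception comes down to one question: is the class a mapping assigns (its mapToClass) equal to, or a subclass of, the declared domain of a predicate used in that mapping? The Python sketch below performs that check on a small hand-made class hierarchy. The hierarchy, the domains and the function names are illustrative assumptions, not taken from the DBpedia ontology or from the validation service's implementation.

```python
from typing import Optional

# Hypothetical toy hierarchy: class -> direct superclass (None at the top).
SUPERCLASS: dict[str, Optional[str]] = {
    "Actor": "Artist",
    "Artist": "Person",
    "Person": None,
    "City": "PopulatedPlace",
    "PopulatedPlace": "Place",
    "Place": None,
}

# Hypothetical predicate domains (the "expected" class).
DOMAIN: dict[str, str] = {"birthPlace": "Person", "populationTotal": "PopulatedPlace"}


def is_subclass_or_same(cls: Optional[str], ancestor: str) -> bool:
    # Walk up the superclass chain until we find the ancestor or run out of parents.
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUPERCLASS.get(cls)
    return False


def domain_exceptions(map_to_class: str, predicates: list[str]) -> list[tuple[str, str, str]]:
    """Return (predicate, expected domain, existing class) for every violation."""
    return [
        (p, DOMAIN[p], map_to_class)
        for p in predicates
        if p in DOMAIN and not is_subclass_or_same(map_to_class, DOMAIN[p])
    ]


print(domain_exceptions("Actor", ["foaf:name", "birthPlace"]))      # []
print(domain_exceptions("City", ["birthPlace", "populationTotal"]))  # [('birthPlace', 'Person', 'City')]
```

The sketch only finds mismatches; choosing between adjusting the class hierarchy, the predicate's domain or the mapping remains an editorial decision.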


For each '''predicate''' used in a '''mapping''', the service shows the '''expected''' domain class (defined for the predicate) and the '''existing''' class (corresponding to that mapping).
Please filter for your language (the first column) and correct as many errors as you can:
* Make the '''existing''' class a subclass of the '''expected''' class, OR
* Correct (usually raise) the domain of the '''predicate''', OR
* Correct the '''mapping''' to use the expected mapToClass

In all cases, ''document'' the property according to the changes you made! You can see some examples of such changes in this [http://mappings.dbpedia.org/index.php?limit=50&tagfilter=&title=Special%3AContributions&contribs=user&target=VladimirAlexiev&namespace=&year=2015&month=8 list of contributions].

=== Tools ===
This wiki provides several tools that help you to edit the mappings and the ontology:
* '''Ontology View.''' The [http://mappings.dbpedia.org/server/ontology/classes ontology view] gives you an overview of the current shape of the DBpedia ontology.
* '''Mapping Validator.''' When you are editing a mapping, there is a validate button at the bottom of the page. Pressing the button validates your changes for syntactic correctness and highlights inconsistencies such as missing property definitions.
* '''Extraction Tester.''' The extraction tester tests a mapping against a set of example Wikipedia pages. This gives you direct feedback about whether a mapping works and what the resulting data will look like.

== That's it! ==
That is all you need to get started. Your contributions will be available:
* in the [http://live.dbpedia.org/ DBpedia Live] endpoint shortly after your edit (currently only for English)
* in the next [http://dbpedia.org/downloads DBpedia datasets] release


''Happy mapping!''

== Mappings for new languages ==
=== Create new mappings ===

=== Use new mappings in the extraction ===

== About DBpedia ==
To learn more about DBpedia itself, visit http://dbpedia.org/About.

Latest revision as of 14:43, 10 July 2019
