Main Page

Revision as of 09:04, 13 January 2015

DBpedia Mappings Wiki

In this DBpedia Mappings Wiki you can help to enhance the information in DBpedia. The DBpedia Extraction Framework uses the mappings defined here to homogenize information extracted from Wikipedia before generating structured information in RDF.

Anybody can help by editing:

Mappings can be written for a variety of languages, connecting multilingual information to a language-independent, unified ontology schema (language-specific labels can be provided there).


Mapping Example

This is how you write a simple infobox mapping.

Mapping:Infobox_actor

{{TemplateMapping 
| mapToClass = Actor 
| mappings = 
   {{ PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }}
   {{ PropertyMapping | templateProperty = birth_place | ontologyProperty = birthPlace }}
}}

This mapping extracts three pieces of information:

  1. the type information (Actor)
  2. the name of the actor
  3. the actor's place of birth.

Therefore, three RDF triples are extracted for each use of Infobox_actor in the English Wikipedia. For example, for Vince Vaughn:

dbpedia:Vince_Vaughn  rdf:type                dbpedia-owl:Actor   .
dbpedia:Vince_Vaughn  foaf:name               "Vince Vaughn"@en   .
dbpedia:Vince_Vaughn  dbpedia-owl:birthPlace  dbpedia:Minneapolis .
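
To make the effect of the mapping concrete, here is a toy Python model of what the Infobox_actor mapping above does. It is only an illustrative sketch, not how the DBpedia Extraction Framework is implemented. Several things are assumptions made for this example: the dictionary-shaped infobox, the helper name `apply_mapping`, and the rule that a value consisting of exactly one wiki link becomes a resource. The real framework parses wikitext and handles many more cases, such as datatypes, units and lists of links.

```python
import re

# Hypothetical, simplified model of the Infobox_actor mapping:
# template property -> ontology property, using the prefixes from the example.
PROPERTY_MAPPINGS = {"name": "foaf:name", "birth_place": "dbpedia-owl:birthPlace"}
MAP_TO_CLASS = "dbpedia-owl:Actor"

# A value that is exactly one wiki link, e.g. [[Minneapolis]] or [[Minneapolis|Mpls]].
WIKI_LINK = re.compile(r"^\[\[([^|\]]+)(?:\|[^\]]*)?\]\]$")


def apply_mapping(page: str, infobox: dict[str, str], lang: str = "en") -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) triples for one infobox occurrence."""
    subject = "dbpedia:" + page.replace(" ", "_")
    # mapToClass always yields a type triple.
    triples = [(subject, "rdf:type", MAP_TO_CLASS)]
    for template_prop, ontology_prop in PROPERTY_MAPPINGS.items():
        value = infobox.get(template_prop, "").strip()
        if not value:
            continue  # property missing or empty: no triple
        link = WIKI_LINK.match(value)
        if link:
            # A link to another article becomes a DBpedia resource.
            obj = "dbpedia:" + link.group(1).strip().replace(" ", "_")
        else:
            # Anything else is kept as a language-tagged literal.
            obj = f'"{value}"@{lang}'
        triples.append((subject, ontology_prop, obj))
    return triples
```

Called as `apply_mapping("Vince Vaughn", {"name": "Vince Vaughn", "birth_place": "[[Minneapolis]]"})`, it returns the same three triples as listed above, in prefixed form. If a property is missing from an infobox, the toy model simply emits fewer triples.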


Detailed Information

Prerequisites

If you would like to edit the mappings or the ontology schema, this is what you need:

  • a user account on this wiki (login/sign up)
  • editor rights
    • they will be given to you within a couple of days
    • if not, please ask for editor rights at dbpedia-discussion@lists.sourceforge.net. Include your user name in the message.
    • once you have editor rights, please provide some information about yourself on your user wiki page
  • a namespace for the language you want to write mappings for
  • if you will contribute frequently, a GitHub account (see below)

Editorial Process

A significant quality problem until 2015 was that there was neither bug tracking nor discussion of the best approaches. A major strength of Wikipedia and Wikidata is that editors are in constant discussion and follow established editorial processes. These were missing from this mapping wiki, and it is our collective task to rectify the situation. If you find a problem:

If there's a massive problem that requires discussion and fixes to many properties and templates, write a separate page and list it here:

  • What's in a Name

Testing Best Practices

Whenever we find or fix a problem, we should have some test cases for it. This serves many important purposes:

  • to illustrate the problem
  • as proof it works after the problem is fixed
  • to provide test cases for any bugs in the extraction framework (upstream bug reporting)
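
Such a test case can also be checked mechanically. The sketch below assumes that the extractor output has been saved as text with one triple per line (subject, predicate, object, terminated by " ."), and that no literal spans several lines. `parse_triples` and `missing_triples` are hypothetical helpers for illustration, not part of DBpedia.

```python
Triple = tuple[str, str, str]


def parse_triples(text: str) -> set[Triple]:
    """Parse one-triple-per-line output into (subject, predicate, object) tuples.

    Assumes lines like `<s> <p> <o> .` or `<s> <p> "literal"@lang .`;
    blank lines and lines starting with '#' are skipped.
    """
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Subject and predicate contain no spaces; the object (a literal) may.
        subject, predicate, rest = line.split(None, 2)
        # Drop the statement terminator " ." at the end of the line.
        obj = rest.rstrip().removesuffix(".").rstrip()
        triples.add((subject, predicate, obj))
    return triples


def missing_triples(output: str, expected: set[Triple]) -> set[Triple]:
    """Return the expected triples that the extraction output does not contain."""
    return expected - parse_triples(output)
```

A non-empty result from `missing_triples` shows exactly which expected statements a mapping change broke. This is also handy when reporting a bug upstream.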

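Every infobox mapping has a "test this mapping" link, e.g.:

  • http://mappings.dbpedia.org/server/mappings/fr/extractionSamples/Mapping_fr:Infobox_Ville_de_Serbie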

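Unfortunately, this works mostly for English DBpedia; see bug #289 (https://github.com/dbpedia/extraction-framework/issues/289). But you can still test individual resources, e.g.:

  • http://mappings.dbpedia.org/server/extraction/fr/extract?title=Belgrade&revid=&format=turtle-triples&extractors=custom
  • http://mappings.dbpedia.org/server/extraction/bg/extract?title=Лили+Иванова&revid=&format=turtle-triples&extractors=custom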
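
The per-resource test links follow one pattern, http://mappings.dbpedia.org/server/extraction/<lang>/extract?title=<Title>&revid=&format=turtle-triples&extractors=custom, so they can be generated rather than typed. The helper below is a hypothetical Python sketch. It assumes the server accepts underscores in place of spaces, as in the Požega_(Serbie) link. In the readable form it escapes only %, &, # and +, because those characters would otherwise break the query string.

```python
from urllib.parse import quote

# Base address taken from the example test links on this page.
BASE = "http://mappings.dbpedia.org/server/extraction"


def extraction_test_url(lang: str, title: str, readable: bool = True) -> str:
    """Build a per-resource test link for the mappings server.

    readable=True keeps non-ASCII letters (an IRI, nice for talk pages);
    readable=False percent-encodes them (a plain ASCII URI for scripts).
    """
    # Wikipedia page titles use underscores in place of spaces.
    page = title.replace(" ", "_")
    if readable:
        # Escape only characters that would break the query string;
        # "%" comes first so the later escapes are not double-encoded.
        for char, escape in (("%", "%25"), ("&", "%26"), ("#", "%23"), ("+", "%2B")):
            page = page.replace(char, escape)
    else:
        # Percent-encode everything except unreserved characters and parentheses.
        page = quote(page, safe="()")
    return f"{BASE}/{lang}/extract?title={page}&revid=&format=turtle-triples&extractors=custom"
```

Use the readable form when adding links to a Discussion page. The encoded form is what HTTP libraries expect.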

This is even better, because it gives you specific test cases. Also provide a link to the corresponding Wikipedia page in edit mode, so the markup can be seen immediately. Add both links to the mapping's Discussion page.

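For example, Mapping fr talk:Infobox Ville de Serbie has:

  • Testing:
    • page: https://fr.wikipedia.org/w/index.php?title=Požega_(Serbie)&action=edit
    • result: http://mappings.dbpedia.org/server/extraction/fr/extract?title=Požega_(Serbie)&revid=&format=turtle-triples&extractors=custom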

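We've asked the developers to add UTF-8 encoding to the output (issue #304, https://github.com/dbpedia/extraction-framework/issues/304), which will make it easier to inspect. Until then, you need to save the output to a file and open it in an editor that handles UTF-8.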
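
Saving the output can be scripted too. The following Python sketch assumes that the server returns UTF-8 bytes and simply does not declare the charset. `save_extraction_output` is a hypothetical helper. Its `opener` parameter defaults to the standard `urllib.request.urlopen`, which needs an ASCII URL, so pass the percent-encoded form of the test link.

```python
import urllib.request
from pathlib import Path


def save_extraction_output(url: str, path: str, opener=urllib.request.urlopen) -> str:
    """Fetch extractor output and store it as a UTF-8 text file.

    Returns the decoded text as well, so it can be inspected directly.
    """
    with opener(url) as response:
        raw = response.read()
    # Assumption: the body is UTF-8 even though the charset is not declared.
    text = raw.decode("utf-8")
    Path(path).write_text(text, encoding="utf-8")
    return text
```

For example: `save_extraction_output("http://mappings.dbpedia.org/server/extraction/fr/extract?title=Po%C5%BEega_(Serbie)&revid=&format=turtle-triples&extractors=custom", "pozega.ttl")`. The `opener` parameter exists only so the function can be tried without network access. In normal use, leave it at the default.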

Copy IRIs, not URL-encoded URLs

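The URLs above contain non-ASCII characters, so strictly speaking they are Internationalized Resource Identifiers (IRIs). These are readable and let a user see what they represent. But when you copy an IRI from the browser's address bar, it is URL-encoded (percent-encoded) into unreadable ugliness like:

  • http://mappings.dbpedia.org/server/extraction/fr/extract?title=Po%C5%BEega_(Serbie)&revid=&format=turtle-triples&extractors=custom
  • http://mappings.dbpedia.org/server/extraction/bg/extract?title=%D0%9B%D0%B8%D0%BB%D0%B8+%D0%98%D0%B2%D0%B0%D0%BD%D0%BE%D0%B2%D0%B0&revid=&format=turtle-triples&extractors=custom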

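Browsers do this for obscure historical reasons. Please be kind to your fellow editors and use an add-on that preserves IRIs, e.g.:

  • Chrome add-on: Copy URL (https://chrome.google.com/webstore/detail/copy-url/mkhnbhdofgaendegcgbmndipmijhbili)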

If you don't have such an add-on, you can use this trick:

  • Copy everything but the first letter "m"
  • Paste, then add the missing letter "m" (or "http://m").
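
If you have already copied a percent-encoded link, you can turn it back into a readable IRI afterwards. This Python sketch uses a hypothetical helper name, `to_readable_iri`. It decodes only runs of escapes for non-ASCII bytes, that is, the UTF-8 encoded letters. It deliberately leaves escapes such as %26 (&) or %2B (+) alone, since decoding those would change the meaning of the query string.

```python
import re
from urllib.parse import unquote

# One or more consecutive escapes %80..%FF: the UTF-8 bytes of non-ASCII letters.
_NON_ASCII_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def to_readable_iri(url: str) -> str:
    """Decode percent-escaped non-ASCII characters, keeping ASCII escapes intact."""
    # Each maximal run holds complete UTF-8 sequences, which unquote decodes.
    return _NON_ASCII_ESCAPES.sub(lambda m: unquote(m.group(0)), url)
```

Paste the result into the Discussion page instead of the encoded form.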

That's it!

That is all you need to get started. Your contributions will be available:

Happy mapping!

About DBpedia

To learn more about DBpedia itself visit http://dbpedia.org/About.