How to add a mapping namespace
Revision as of 02:06, 16 May 2012
As an example, we use a fictitious language with code "xx" and Wikipedia rank 44.
CAUTION: some subtle code changes will be needed for the first language code that contains a dash "-". In this case, please update the code and this guide.
Get language code and rank
Get the wiki language code and rank from the list of Wikipedias.
Namespace number: multiply the rank by 2 and add 200.
Example: language code "xx", rank 44, namespace number 44 × 2 + 200 = 288.
CAUTION: If the calculated namespace number is already used by another language (because the ranking has changed), do not change the existing namespace number. Instead, pick a nearby number that is not yet in use.
Example: if 288 is already in use, choose some other unused number, say 298.
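The numbering rule above can be sketched as a small shell helper. This is only an illustration: the function names is_used and next_free_namespace are hypothetical and not part of the extraction framework, and stepping upward by 2 until a free number is found is just one assumed way of picking a "close enough" number (the example above picks 298 instead). Stepping by 2 keeps the result even, like every number the formula itself produces.

```shell
# Hypothetical helpers, not part of the extraction framework.

# is_used NUMBER [USED...]: succeed if NUMBER appears among the USED numbers.
is_used() {
  needle=$1
  shift
  for n in "$@"; do
    if [ "$n" -eq "$needle" ]; then
      return 0
    fi
  done
  return 1
}

# next_free_namespace RANK [USED...]: print rank * 2 + 200, or, if that
# number is taken, the next higher even number that is not in USED.
# Assumption: stepping up by 2 is an acceptable "close enough" choice.
next_free_namespace() {
  candidate=$(( $1 * 2 + 200 ))
  shift
  while is_used "$candidate" "$@"; do
    candidate=$(( candidate + 2 ))
  done
  echo "$candidate"
}

next_free_namespace 44          # rank 44, nothing taken: prints 288
next_free_namespace 44 288 290  # 288 and 290 taken: prints 292
```

The list of numbers already in use has to be collected by hand, for example from the existing entries in Namespace.scala.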
Update the extraction framework
Edit Namespace.scala
Edit your copy of core/src/main/scala/org/dbpedia/extraction/wikiparser/Namespace.scala. Add something like this in the correct alphabetical position:
"xx"->288,
Edit extract.default.properties
Edit your copy of dump/extract.default.properties. Add something like this in the correct alphabetical position:
extractors.xx=MappingExtractor
You can add more extractors, but make sure that the required configuration exists for the new language.
Commit changes
Commit and push the changes to the default branch.
Update and restart the mapping server
Log onto the machine that is running the mapping server, i.e. serving http://mappings.dbpedia.org/server/ URLs.
Stop the server:
ps axfu | grep java
Look for class ...server.Server, and then:
kill <process id>
Then update, compile and start the server:
cd extraction_framework
hg pull
hg update
mvn clean install --projects core,server
cd server
../run server &>server-<YYYY>-<MM>-<DD>.01.log &
Update mappings wiki
Update MediaWiki settings
Log onto the machine that is running this mappings wiki, i.e. serving http://mappings.dbpedia.org/index.php URLs.
Open htdocs/mappings/LocalSettings.php. Add the following snippet in the correct alphabetical position in the map defining the extra namespaces:
"xx"=>288,
Restart the Apache server.
Update mappings wiki sidebar
Edit MediaWiki:Sidebar. Add a link for the new language in the correct alphabetical position:
** {{fullurl:Special:AllPages|namespace=288}}|Mappings (xx)
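The same language code and namespace number end up in four places: Namespace.scala, extract.default.properties, LocalSettings.php and the sidebar. As a sanity aid, the following sketch prints all four entries for a given code and number so that they can be pasted by hand. The function name print_namespace_entries is hypothetical; the script does not edit any file and only reproduces the snippets shown in this guide.

```shell
# Hypothetical helper: print the per-language entries described in this guide
# so they can be pasted into the respective files by hand.
print_namespace_entries() {
  code=$1
  number=$2
  printf 'Namespace.scala:            "%s"->%s,\n' "$code" "$number"
  printf 'extract.default.properties: extractors.%s=MappingExtractor\n' "$code"
  printf 'LocalSettings.php:          "%s"=>%s,\n' "$code" "$number"
  printf 'MediaWiki:Sidebar:          ** {{fullurl:Special:AllPages|namespace=%s}}|Mappings (%s)\n' "$number" "$code"
}

print_namespace_entries xx 288
```

Each entry still has to be inserted at the correct alphabetical position in its file.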
Update datasets overview
Edit DBpedia datasets. Add a column for the new language in the correct alphabetical position and update all rows according to the settings in dump/extract.default.properties. Ouch...
Generate and deploy statistics
Extract data from Wikipedia dump file
Download the latest dump for language xx.
Run RedirectExtractor, InfoboxExtractor and TemplateParameterExtractor. dump/extract.stats.properties should already contain the correct settings. Change into the directory dump/, copy extract.stats.properties to extract.properties, modify the path if necessary, and run:
dump> ../run extract
Extract statistics from triples files
Change into the directory server/, modify the path to the dump base dir in pom.xml if necessary, and run:
server> ../run stats
Copy src/main/statistics/mappingstatistics_xx.txt to the same folder on the mappings server.
Update and deploy sprint stuff
Ask Pablo how to do that...