How to add a mapping namespace

Revision as of 01:45, 16 May 2012

As an example, we use a fictitious language with code "xx" and Wikipedia rank 44.

CAUTION: Adding the first language code that contains a dash "-" will require some subtle code changes. If that happens, please update the code and this guide.

Get language code and rank

Get the wiki language code and rank from the list of Wikipedias.

Namespace number: multiply the rank by 2 and add 200.

Example: language code "xx", rank 44, namespace number 2 × 44 + 200 = 288.
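The rank-to-namespace rule is plain arithmetic. Here it is as a minimal Python sketch. The helper name `namespace_number` is hypothetical and is not part of the extraction framework.

```python
def namespace_number(rank: int) -> int:
    """Default mapping namespace for a Wikipedia of the given rank: 2 * rank + 200."""
    if rank < 1:
        raise ValueError("Wikipedia rank must be a positive integer")
    return 2 * rank + 200


# The running example: language "xx" with Wikipedia rank 44.
print(namespace_number(44))  # 288
```

The result is always even. The factor of 2 presumably leaves room for MediaWiki's convention of pairing each even content namespace with an odd talk namespace at number + 1.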

CAUTION: If the calculated namespace number is already used by another language (because the rankings have changed), do not change that language's existing namespace number. Instead, pick an unused number close to the calculated one.

For example, if 288 is in use, we choose some nearby number that is not used, say 298.
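The collision rule can be sketched as a search outward from the calculated number. This Python sketch rests on some assumptions. The set of used numbers must already be collected, for example from Namespace.scala. Only even numbers above 200 are considered, following the MediaWiki pairing of even content namespaces with odd talk namespaces. The helper name `choose_namespace` is hypothetical. The sketch returns the *closest* free number, but any nearby unused number, such as the 298 picked by hand in the example, is equally valid.

```python
from __future__ import annotations


def choose_namespace(preferred: int, used: set[int], max_distance: int = 100) -> int:
    """Return the unused even namespace number closest to `preferred`.

    Assumptions: only even numbers are candidates (MediaWiki pairs each even
    content namespace with an odd talk namespace at number + 1), and results
    stay above 200, the base of the 2 * rank + 200 rule. Ties go to the
    higher number.
    """
    if preferred % 2:
        raise ValueError("preferred namespace number must be even")
    for distance in range(0, max_distance + 1, 2):
        for candidate in (preferred + distance, preferred - distance):
            if candidate > 200 and candidate not in used:
                return candidate
    raise LookupError(f"no free namespace number within {max_distance} of {preferred}")


# Made-up set of numbers already taken by other languages.
used = {286, 288, 290}
print(choose_namespace(288, used))  # 292
```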

Update the extraction framework

Edit core/org.dbpedia.extraction.wikiparser.Namespace.scala

Edit core/org.dbpedia.extraction.wikiparser.Namespace.scala. Add an entry like the following at the appropriate position:

"xx"->288,
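Before adding the entry, it helps to list the numbers that are already taken. The following is a minimal Python sketch. It assumes that every entry in Namespace.scala follows the `"code"->number` pattern shown above, possibly with spaces around the arrow. The function name and the sample text are hypothetical. In practice you would pass in the file's text. The regex may also match unrelated string-to-number pairs elsewhere in the file, so review its output.

```python
from __future__ import annotations

import re

# Matches entries such as "xx"->288 or "xx-yy" -> 300 (hypothetical codes).
ENTRY = re.compile(r'"([a-z][a-z_-]*)"\s*->\s*(\d+)')


def existing_namespaces(scala_source: str) -> dict[str, int]:
    """Map each language code found in the source to its namespace number."""
    return {code: int(number) for code, number in ENTRY.findall(scala_source)}


# Made-up excerpt using fictitious language codes.
sample = '''
    "xx"->288,
    "yy"->290,
    "zz" -> 292,
'''
found = existing_namespaces(sample)
print(found)                  # {'xx': 288, 'yy': 290, 'zz': 292}
print(288 in found.values())  # True
```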

Edit dump/extract.default.properties

Edit dump/extract.default.properties. Add an entry like the following at the appropriate position:

extractors.xx=MappingExtractor
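The following Python sketch adds the line only if it is missing. It assumes the properties file uses simple `key=value` lines with no line continuations. The helper name `add_extractor` is hypothetical, and it works on the file's text rather than on the file itself.

```python
def add_extractor(properties_text: str, lang: str, extractors: str = "MappingExtractor") -> str:
    """Return the properties text with an `extractors.<lang>` entry, appended if absent."""
    key = f"extractors.{lang}"
    for line in properties_text.splitlines():
        stripped = line.strip()
        # Skip comment lines; compare only the key part of key=value lines.
        if stripped.startswith(("#", "!")):
            continue
        if stripped.split("=", 1)[0].strip() == key:
            return properties_text  # already configured, leave the text untouched
    if properties_text and not properties_text.endswith("\n"):
        properties_text += "\n"
    return properties_text + f"{key}={extractors}\n"


# Made-up sample file content.
before = "# made-up sample\nextractors.yy=MappingExtractor\n"
print(add_extractor(before, "xx"), end="")
# # made-up sample
# extractors.yy=MappingExtractor
# extractors.xx=MappingExtractor
```

The sketch appends at the end of the file. Since the guide asks for "the appropriate position", you may prefer to move the line by hand.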

Commit changes

Commit and push the changes to the default branch.

Update and restart the mapping server

Log on to the machine that runs the mapping server, i.e. the one serving http://mappings.dbpedia.org/server/ URLs.

Stop the server. First find its process id:

ps axfu | grep java

Look for the process running the class ...server.Server, then run:

kill <process id>
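You can script the step of picking the process id out of the `ps axfu | grep java` output. This Python sketch assumes the usual `ps u` column layout, where USER is the first column and PID the second. It matches on the class-name suffix `server.Server` from the guide. The helper name and the sample lines are made up, and the function takes the ps output as a string so it can be tried without a running server.

```python
from __future__ import annotations


def server_pids(ps_output: str, marker: str = "server.Server") -> list[int]:
    """Return the PIDs of `ps axfu` lines whose command mentions `marker`.

    Assumes the `ps u` column layout, where the second column is the PID.
    Lines belonging to a `grep` process are skipped.
    """
    pids = []
    for line in ps_output.splitlines():
        if marker not in line or "grep" in line:
            continue
        fields = line.split()
        if len(fields) > 1 and fields[1].isdigit():
            pids.append(int(fields[1]))
    return pids


# Made-up sample output; the second line is the grep process itself.
sample = (
    "dbpedia  12345  3.1  9.8 4123456 801234 ?  Sl  May15  42:00 java -cp app.jar example.server.Server\n"
    "dbpedia  23456  0.0  0.0    9384    900 pts/0 S+ 01:45   0:00 grep java\n"
)
print(server_pids(sample))  # [12345]
```

You can then stop each PID with `os.kill(pid, signal.SIGTERM)`, which sends the same signal as a plain `kill <process id>`.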

Then update, compile and start the server:

cd extraction_framework
hg pull
hg update
mvn clean install --projects core,server
cd server
../run server &>server-<YYYY>-<MM>-<DD>.01.log &
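The log file name follows the pattern server-<YYYY>-<MM>-<DD>.01.log. The `.01` suffix presumably numbers restarts on the same day. Under that assumption, here is a Python sketch that picks the first unused name in a directory. The helper name `next_log_name` is hypothetical.

```python
from __future__ import annotations

import datetime
from pathlib import Path


def next_log_name(day: datetime.date, directory: Path = Path(".")) -> str:
    """Return server-YYYY-MM-DD.NN.log with the first NN not already present.

    Assumption: NN counts server restarts on the same day, starting at 01.
    """
    for n in range(1, 100):
        name = f"server-{day:%Y-%m-%d}.{n:02d}.log"
        if not (directory / name).exists():
            return name
    raise RuntimeError("more than 99 server logs for one day")


print(next_log_name(datetime.date.today()))
```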

Update mappings wiki

Update MediaWiki settings

Log on to the machine that runs the mappings wiki, i.e. the one serving http://mappings.dbpedia.org/index.php URLs.

Open htdocs/mappings/LocalSettings.php. Add the following snippet at the correct position in the code:

"xx"=>288,

Restart the Apache server.
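The same code/number pair appears in Namespace.scala (Scala `->` syntax) and in LocalSettings.php (PHP `=>` syntax), and the two must agree. The following Python sketch renders both lines from a single source of truth. The helper name `namespace_entries` is hypothetical.

```python
from __future__ import annotations


def namespace_entries(lang: str, number: int) -> dict[str, str]:
    """Render the Namespace.scala and LocalSettings.php entries for one language."""
    return {
        "Namespace.scala": f'"{lang}"->{number},',
        "LocalSettings.php": f'"{lang}"=>{number},',
    }


for file, line in namespace_entries("xx", 288).items():
    print(f"{file}: {line}")
# Namespace.scala: "xx"->288,
# LocalSettings.php: "xx"=>288,
```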

Update mappings wiki sidebar

Edit MediaWiki:Sidebar. Add a link for the new language:

** {{fullurl:Special:AllPages|namespace=288}}|Mappings (xx)
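The sidebar line depends only on the language code and the namespace number. The following Python sketch builds it. The helper name `sidebar_entry` is hypothetical. String concatenation keeps the literal `{{...}}` template braces readable.

```python
def sidebar_entry(lang: str, number: int) -> str:
    """Render the MediaWiki:Sidebar line linking to all pages of a mapping namespace."""
    return (
        "** {{fullurl:Special:AllPages|namespace="
        + str(number)
        + "}}|Mappings ("
        + lang
        + ")"
    )


print(sidebar_entry("xx", 288))
# ** {{fullurl:Special:AllPages|namespace=288}}|Mappings (xx)
```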

Update datasets overview

Edit datasets. Add a column for the new language and update all rows according to the settings in . Ouch...

Generate and deploy statistics

Extract data from Wikipedia dump file

Download the latest dump for language xx.

Run RedirectExtractor, InfoboxExtractor and TemplateParameterExtractor. The file dump/extract.stats.properties should contain the correct settings. cd into the directory dump/, copy extract.stats.properties to extract.properties, modify it if necessary, and run:

dump> ../run extract
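The copy-then-adjust step can be scripted. This Python sketch is meant to be run from dump/. It assumes both files are plain `key=value` properties with `#` comments and no line continuations. The sketch copies extract.stats.properties to extract.properties with `shutil.copyfile`, then rewrites only the keys you pass in `overrides`. The function names are hypothetical, and the guide does not say which keys might need changing, so any keys you pass are your own choice.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def apply_overrides(text: str, overrides: dict[str, str]) -> str:
    """Replace the value of each key in `overrides`; append keys that are missing."""
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and not line.lstrip().startswith("#") and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)  # comments, blank lines and untouched keys stay as-is
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


def prepare_extract_properties(overrides: dict[str, str], dump_dir: Path = Path(".")) -> None:
    """Copy extract.stats.properties to extract.properties, then apply overrides."""
    target = dump_dir / "extract.properties"
    shutil.copyfile(dump_dir / "extract.stats.properties", target)
    if overrides:
        target.write_text(apply_overrides(target.read_text(), overrides))


# Placeholder keys "a" and "b", just to show the rewrite.
print(apply_overrides("a=1\nb=2\n", {"b": "3"}), end="")
# a=1
# b=3
```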

Extract statistics from triples files

cd into the directory server/ and run:

server> ../run stats

Copy src/main/statistics/mappingstatistics_xx.txt to the same folder on the mappings server.
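The following Python sketch performs the copy and is meant to be run from server/. The host name and the remote checkout path are hypothetical placeholders, because the guide only says "the same folder on the mappings server". The sketch builds an `scp` command and leaves running it to you.

```python
from __future__ import annotations

import shlex
from pathlib import PurePosixPath

# Hypothetical values: replace with the real host and checkout location.
REMOTE_HOST = "user@mappings-server"
REMOTE_CHECKOUT = PurePosixPath("extraction_framework/server")


def stats_copy_command(lang: str) -> list[str]:
    """Build an scp command copying mappingstatistics_<lang>.txt to the same relative folder."""
    relative = PurePosixPath("src/main/statistics") / f"mappingstatistics_{lang}.txt"
    return ["scp", str(relative), f"{REMOTE_HOST}:{REMOTE_CHECKOUT / relative}"]


print(shlex.join(stats_copy_command("xx")))
# scp src/main/statistics/mappingstatistics_xx.txt user@mappings-server:extraction_framework/server/src/main/statistics/mappingstatistics_xx.txt
```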

Update and deploy sprint stuff

Ask Pablo how to do that...