How to add a mapping namespace

As an example, we use the fictitious language Xxyzish with Wikipedia domain xx and Wikipedia rank 44.

CAUTION: several subtle code changes will be needed to accommodate language codes that contain a dash (e.g. roa-rup or be-x-old), especially with regard to URLs, file names and other identifiers, including in parts of the code base not listed here. In this case, please update the code and this guide.

Get language code and rank

Get the wiki language code and rank from the list of Wikipedias.

Namespace number: multiply the rank by 2 and add 200

Example: language code "xx", rank 44, namespace number 288.

CAUTION: If the calculated namespace number is already taken by another language (because the ranking has changed), do not change the existing namespace number. Instead, pick a nearby number that is still free.

Example: if 288 is already in use, we choose some other unused number, say 298.
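
As a quick sanity check, the rule can be evaluated directly in a shell (the values below are just our running example):

rank=44                      # Wikipedia rank of the new language
echo $(( rank * 2 + 200 ))   # prints 288, the default namespace number for "xx"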

Update the extraction framework

Edit Namespace.scala

Edit your copy of core/src/main/scala/org/dbpedia/extraction/wikiparser/Namespace.scala. Add something like this in the correct alphabetical position:

"xx"->288,

Edit extract.default.properties

Edit your copy of dump/extract.default.properties. Add something like this in the correct alphabetical position:

extractors.xx=MappingExtractor

You can add more extractors, but make sure that the required configuration exists for the new language.
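
To see which extractors are configured for other languages (and pick a sensible set to copy), you can for example grep the file:

grep '^extractors\.' dump/extract.default.properties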

Commit changes

Commit and push the changes to the default branch.
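
The repository uses Mercurial (see the pull/update commands further down), so this amounts to something like the following; the commit message is only a suggestion:

hg commit -m "add mapping namespace and extractor config for xx"
hg push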

Update and restart the mapping server

Log onto the machine that is running the mapping server, i.e. serving http://mappings.dbpedia.org/server/ URLs.

Add a dummy file extraction_framework/server/src/main/statistics/mappingstats_xx.txt with the following content (make sure there are two empty lines at the end!):

wikiStats|xx
redirects|0
templates|0
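
One way to create the dummy file with the two trailing empty lines is a printf from the shell (a sketch, assuming you are in the extraction_framework directory):

printf 'wikiStats|xx\nredirects|0\ntemplates|0\n\n\n' > server/src/main/statistics/mappingstats_xx.txt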

Stop the server:

ps axfu | grep java

Look for the java process running class ...server.Server, and then:

kill <process id>
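
If pgrep is available on the machine, the two steps can be combined into one command; this is just a convenience, not a requirement:

kill $(pgrep -f server.Server)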

Then update, compile and start the server:

cd extraction_framework
hg pull
hg update
mvn clean install --projects core,server
cd server
../run server &>server-<YYYY>-<MM>-<DD>.01.log &
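
After a few seconds you can check that the server came back up, for example by requesting the URL mentioned above:

curl -I http://mappings.dbpedia.org/server/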

Update mappings wiki

Update MediaWiki settings

Log onto the machine that is running this mappings wiki, i.e. serving http://mappings.dbpedia.org/index.php URLs.

Open htdocs/mappings/LocalSettings.php. Add the following snippet in the correct alphabetical position in the map defining the extra namespaces:

"xx"=>288,

Restart the Apache server.
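
How exactly Apache is restarted depends on the machine's distribution; typically it is one of the following (run as root or via sudo):

sudo service apache2 restart    # Debian/Ubuntu style init script
sudo apachectl -k graceful      # graceful restart via apachectl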

Add the mappings main page

Edit https://mediawiki1.informatik.uni-mannheim.de/index.php/Mapping_xx. The page content should be the following, where Xxyzish is the English name of the language:

{{Mapping main page|xx|Xxyzish}}

Update mappings wiki sidebar

Edit MediaWiki:Sidebar. Add a link for the new language in the correct alphabetical position:

** Mapping xx|Mappings (xx)

Update datasets overview

Edit DBpedia datasets. Add a column for the new language in the correct alphabetical position and update all rows according to the settings in dump/extract.default.properties. This is probably the most tedious part...

Generate and deploy statistics

Extract data from Wikipedia dump file

Download the latest dump for language xx.
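
For example, assuming the usual Wikimedia dump layout (the xx wiki obviously does not exist, it is only our placeholder):

wget http://dumps.wikimedia.org/xxwiki/latest/xxwiki-latest-pages-articles.xml.bz2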

Run RedirectExtractor, InfoboxExtractor and TemplateParameterExtractor. The file dump/extract.stats.properties should already contain the correct settings. cd into the dump/ directory, copy extract.stats.properties to extract.properties, modify the path if necessary, and run

dump> ../run extract
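
Put together, the sequence described above looks roughly like this (the "dump>" above is just the shell prompt, indicating the working directory):

cd dump
cp extract.stats.properties extract.properties
# adjust the dump base dir path in extract.properties if necessary
../run extract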

Extract statistics from triples files

cd into the server/ directory, modify the path to the dump base dir in pom.xml if necessary, and run

server> ../run stats

Copy src/main/statistics/mappingstats_xx.txt to the same folder on the mappings server.
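
For example with scp; the remote user and host name here are placeholders:

scp src/main/statistics/mappingstats_xx.txt user@mappings.dbpedia.org:extraction_framework/server/src/main/statistics/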

Update and deploy sprint stuff

Ask Pablo how to do that...