How to add a mapping namespace: Difference between revisions

(Created page with '== How to create a mapping namespace for a new language == As an example, we use a fictitious language with code "xx" and Wikipedia rank 44. Note: '''more code changes will be ...')
 
No edit summary
 
(45 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== How to create a mapping namespace for a new language ==
As an example, we use the fictitious language ''Xxyzish'' with Wikipedia domain ''xx'' and Wikipedia rank 44.


As an example, we use a fictitious language with code "xx" and Wikipedia rank 44.
'''CAUTION''': several subtle code changes will be needed to accommodate '''language codes that contain a dash''' (e.g. ''roa-rup'' or ''be-x-old''), especially with regard to URLs, file names and other identifiers, including in parts of the code base not listed here. In this case, please update the code and this guide.


Note: '''more code changes will be needed for the first language code that contains a dash "-".'''
=== Get language code and rank ===


1. get language code
Get the wiki language code and rank from [http://s23.org/wikistats/wikipedias_html.php the list of Wikipedias].


get wiki language code and rank from http://s23.org/wikistats/wikipedias_html.php
Namespace number: multiply the rank by 2 and add 200
namespace number: multiply the rank by 2 and add 200
talk namespace number: add 1 to the namespace number
Example: language code "xx", rank 44, namespace number 288, talk namespace number 289
if 288 is in use, we choose numbers, let’s say 298 and 299
CAUTION: If the calculated namespace number already exists for another language (because the ranking has changed) do NOT change the existing namespace number. Please find a neighboring or close enough number that works.
2. update the extraction framework and server


a) edit core/org.dbpedia.extraction.wikiparser.Namespace
Example: language code "xx", rank 44, namespace number 288.


add something like this at the appropriate positions in the code
'''CAUTION''': If the calculated namespace number already exists for another language (because the ranking has changed) do '''not''' change the existing namespace number. Please find a neighboring or close enough number that works.
"xx"->288
b) commit and push the changes to default branch


c) log onto 160.45.137.69
Example: If 288 is in use, we choose some other number that is not used, let's say 298.


with user name "dbpedia-server"
'''CAUTION''': Do not use namespace numbers >= 400. Namespaces 400 and above are used by MediaWiki. Please find a number between 200 and 398 that is not yet used.
stop the server
 
Example: If the language rank is 100, the formula above would yield namespace number 400. Instead, we choose some other number that is not used, let's say 398.
 
=== Update mappings wiki ===
 
==== Update MediaWiki settings ====
 
Log onto the machine that is running this mappings wiki, i.e. serving http://mappings.dbpedia.org/index.php URLs.
 
Open LocalSettings.php. Add the following snippet in the correct alphabetical position in the map defining the extra namespaces:
 
<pre>
"xx"=>288,
</pre>
 
Restart the Apache server.
 
==== Add the mappings main page ====
 
Edit {{fullurl:Mapping xx}}. The page content should be the following, where ''Xxyzish'' is the English name of the language:
 
<pre>
{{Mapping main page|xx|Xxyzish}}
</pre>
 
==== Update [[MediaWiki:Sidebar|mappings wiki sidebar]] ====
 
Edit [[MediaWiki:Sidebar]]. Add a link for the new language in the correct alphabetical position:
 
<pre>
** Mapping xx|Mappings (xx)
</pre>
 
==== Update [[DBpedia datasets|datasets overview]] ====
 
Edit [[DBpedia datasets]]. Add a column for the new language in the correct alphabetical position and update all rows according to the settings in [https://github.com/dbpedia/extraction-framework/blob/master/dump/extraction.default.properties dump/extraction.default.properties]. This is probably the most tedious part...
 
=== Update the extraction framework ===
 
==== Edit Namespace.scala ====
 
Edit your copy of [https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/Namespace.scala core/src/main/scala/org/dbpedia/extraction/wikiparser/Namespace.scala]. Add something like this in the correct alphabetical position:
 
<pre>
"xx"->288,
</pre>
 
==== Edit extraction.default.properties ====
 
Edit your copy of [https://github.com/dbpedia/extraction-framework/blob/master/dump/extraction.default.properties dump/extraction.default.properties]. Add something like this in the correct alphabetical position:
 
<pre>
extractors.xx=MappingExtractor
</pre>
 
You can add more extractors, but make sure that the required configuration exists for the new language.
 
==== Update namespace settings for mappings wiki ====
 
To update the namespace settings for the mappings wiki, cd to core/ and run
 
<pre>
../clean-install-run generate-settings
</pre>
 
==== Commit changes ====
 
Commit and push the changes to the default branch.
 
=== Update and restart the mapping server ===
 
Log onto the machine that is running the mapping server, i.e. serving http://mappings.dbpedia.org/server/ URLs.
 
Stop the server:
 
<pre>
sudo /etc/init.d/dbpedia-server stop
</pre>
 
Or, if there's no start/stop script:
 
<pre>
ps axfu | grep java
look for class ...server.Server
</pre>
 
Look for class ...server.Server, and then:
 
<pre>
kill <process id>
in the terminal
</pre>
cd /home/dbpedia-server/dbpedia/extraction_framework
 
hg pull
Add a dummy file extraction_framework/server/src/main/statistics/mappingstats_xx.txt with the following content (make sure there are '''two empty lines''' at the end!):
hg update
 
<pre>
wikiStats|xx
 
redirects|0
 
templates|0
 
 
</pre>
 
Then update and compile the server:
 
<pre>
cd extraction_framework
git pull
mvn clean install --projects core,server
cd server
</pre>
 
Finally, start the server:
 
<pre>
sudo /etc/init.d/dbpedia-server start
</pre>
 
Or, if there's no start/stop script:
 
<pre>
cd extraction_framework/server
../run server &>server-<YYYY>-<MM>-<DD>.01.log &
</pre>
=== Generate and deploy statistics ===
==== Extract data from Wikipedia dump file ====
Download the latest dump for language xx. See the [https://github.com/dbpedia/extraction-framework/wiki/Extraction-Instructions extraction instructions] for details.
<pre>
dump> ../run download config={download-config-file}
</pre>
Run RedirectExtractor, InfoboxExtractor and TemplateParameterExtractor. [https://github.com/dbpedia/extraction-framework/blob/master/dump/extraction.stats.properties dump/extraction.stats.properties] should already contain the correct settings. cd into directory dump/ and execute
<pre>
dump> ../run stats-extraction extraction.stats.properties
</pre>


3. update mappings wiki
==== Extract statistics from triples files ====
TODO: add LocalSettings.php to some source repo!!!
a) log onto www5 (160.45.137.86)


with user name "Administrator"
cd into directory server/, modify the path to the dump base dir in pom.xml if necessary and run
open
C:\Program Files (x86)\Apache Software Foundation\Apache2.2\htdocs\mappings\LocalSettings.php
add the following lines at the right position in the code
"xx"=> 288
restart the Apache server
b) edit http://mappings.dbpedia.org/index.php/MediaWiki:Sidebar


add a link at the right position in the ranking with the right number and language code
<pre>
c) edit http://mappings.dbpedia.org/index.php/Template:Class, http://mappings.dbpedia.org/index.php/Template:Datatype, http://mappings.dbpedia.org/index.php/Template:DatatypeProperty, http://mappings.dbpedia.org/index.php/Template:ObjectProperty
server> ../run stats
</pre>


add two lines for label@xx
Copy src/main/statistics/mappingstatistics_xx.txt to the same folder on the mappings server.


d) edit http://mappings.dbpedia.org/index.php/Mapping_Statistics and http://mappings.dbpedia.org/index.php/DBpedia_datasets
=== Update and deploy sprint stuff ===


e) generate statistics for new language
Ask Pablo how to do that...
run RedirectExtractor, InfoboxExtractor, TemplateParameterExtractor (see dump/extraction.server.properties)
run CreateMappingStats (launcher ‘stats’ in server/pom.xml)
copy src/main/statistics/mappingstatistics_bg.txt to same folder on server
Update and deploy sprint stuff.

Latest revision as of 08:51, 5 March 2015

As an example, we use the fictitious language Xxyzish with Wikipedia domain xx and Wikipedia rank 44.

CAUTION: several subtle code changes will be needed to accommodate language codes that contain a dash (e.g. roa-rup or be-x-old), especially with regard to URLs, file names and other identifiers, including in parts of the code base not listed here. In this case, please update the code and this guide.

Get language code and rank

Get the wiki language code and rank from the list of Wikipedias (http://s23.org/wikistats/wikipedias_html.php).

Namespace number: multiply the rank by 2 and add 200

Example: language code "xx", rank 44, namespace number 288.

CAUTION: If the calculated namespace number already exists for another language (because the ranking has changed) do not change the existing namespace number. Please find a neighboring or close enough number that works.

Example: If 288 is in use, we choose some other number that is not used, let's say 298.

CAUTION: Do not use namespace numbers >= 400. Namespaces 400 and above are used by MediaWiki. Please find a number between 200 and 398 that is not yet used.

Example: If the language rank is 100, the formula above would yield namespace number 400. Instead, we choose some other number that is not used, let's say 398.
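
The numbering rules above can be sketched in a few lines of Python. This is only an illustration, not part of the extraction framework: the function names are hypothetical, and reading "neighboring or close enough" as "nearest free even number between 200 and 398" is an assumption. The guide itself leaves the choice to the editor (its own example picks 298 when 288 is taken).

```python
def namespace_number(rank):
    """Namespace number per the rule above: rank * 2 + 200."""
    return rank * 2 + 200


def pick_namespace_number(rank, used):
    """Return the calculated number, or the nearest free even number in 200..398.

    `used` is the set of namespace numbers that are already taken.
    Assumption: only even numbers are considered, because the formula
    always yields even numbers.
    """
    lowest, highest = 200, 398
    # Clamp the calculated number into the allowed range (400 and above is off limits).
    preferred = min(max(namespace_number(rank), lowest), highest)
    # Walk outwards from the preferred number in steps of 2.
    for offset in range(0, highest - lowest + 1, 2):
        for candidate in (preferred + offset, preferred - offset):
            if lowest <= candidate <= highest and candidate not in used:
                return candidate
    raise ValueError("no free namespace number between 200 and 398")
```

For the example language, `pick_namespace_number(44, set())` gives 288. The set of numbers already in use has to be collected by hand from the existing namespace map.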

Update mappings wiki

Update MediaWiki settings

Log onto the machine that is running this mappings wiki, i.e. serving http://mappings.dbpedia.org/index.php URLs.

Open LocalSettings.php. Add the following snippet in the correct alphabetical position in the map defining the extra namespaces:

"xx"=>288,

Restart the Apache server.

Add the mappings main page

Edit https://mediawiki1.informatik.uni-mannheim.de/index.php/Mapping_xx. The page content should be the following, where Xxyzish is the English name of the language:

{{Mapping main page|xx|Xxyzish}}

Update mappings wiki sidebar

Edit MediaWiki:Sidebar. Add a link for the new language in the correct alphabetical position:

** Mapping xx|Mappings (xx)

Update datasets overview

Edit DBpedia datasets. Add a column for the new language in the correct alphabetical position and update all rows according to the settings in dump/extraction.default.properties. This is probably the most tedious part...

Update the extraction framework

Edit Namespace.scala

Edit your copy of core/src/main/scala/org/dbpedia/extraction/wikiparser/Namespace.scala. Add something like this in the correct alphabetical position:

"xx"->288,

Edit extraction.default.properties

Edit your copy of dump/extraction.default.properties. Add something like this in the correct alphabetical position:

extractors.xx=MappingExtractor

You can add more extractors, but make sure that the required configuration exists for the new language.
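
All one-line snippets in the steps so far are derived from the same three values: language code, namespace number and English language name. As a convenience, a small Python sketch can print them together so they stay consistent. The function name `config_snippets` and the dictionary keys are hypothetical; the snippet formats are copied from this guide.

```python
def config_snippets(code, number, english_name):
    """Return the per-file snippets from this guide for one language."""
    return {
        "LocalSettings.php": '"%s"=>%d,' % (code, number),
        "Mapping main page": "{{Mapping main page|%s|%s}}" % (code, english_name),
        "MediaWiki:Sidebar": "** Mapping %s|Mappings (%s)" % (code, code),
        "Namespace.scala": '"%s"->%d,' % (code, number),
        "extraction.default.properties": "extractors.%s=MappingExtractor" % code,
    }


if __name__ == "__main__":
    # Print each target together with the line to add there.
    for target, line in config_snippets("xx", 288, "Xxyzish").items():
        print(target + ": " + line)
```

The snippets still have to be pasted into the correct alphabetical position by hand.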

Update namespace settings for mappings wiki

To update the namespace settings for the mappings wiki, cd to core/ and run

../clean-install-run generate-settings

Commit changes

Commit and push the changes to the default branch.

Update and restart the mapping server

Log onto the machine that is running the mapping server, i.e. serving http://mappings.dbpedia.org/server/ URLs.

Stop the server:

sudo /etc/init.d/dbpedia-server stop

Or, if there's no start/stop script:

ps axfu | grep java

Look for class ...server.Server, and then:

kill <process id>

Add a dummy file extraction_framework/server/src/main/statistics/mappingstats_xx.txt with the following content (make sure there are two empty lines at the end!):

wikiStats|xx

redirects|0

templates|0
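
The two trailing empty lines are easy to lose in an editor, so the dummy file can also be written by a short script. The following Python sketch is an assumption-laden illustration: the function name is hypothetical, and the layout (an empty line after each entry, two empty lines at the end) simply mirrors the listing above rather than a documented file format.

```python
from pathlib import Path


def write_dummy_stats(directory, code):
    """Write mappingstats_<code>.txt with the content shown above.

    The text ends with "templates|0" followed by two empty lines.
    """
    content = "wikiStats|%s\n\nredirects|0\n\ntemplates|0\n\n\n" % code
    path = Path(directory) / ("mappingstats_%s.txt" % code)
    # newline="" keeps the "\n" line endings unchanged on every platform.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(content)
    return path
```

For the example language, the call would be `write_dummy_stats("extraction_framework/server/src/main/statistics", "xx")`, with the directory taken from the path given above.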


Then update and compile the server:

cd extraction_framework
git pull
mvn clean install --projects core,server

Finally, start the server:

sudo /etc/init.d/dbpedia-server start

Or, if there's no start/stop script:

cd extraction_framework/server
../run server &>server-<YYYY>-<MM>-<DD>.01.log &

Generate and deploy statistics

Extract data from Wikipedia dump file

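Download the latest dump for language xx. See the extraction instructions (https://github.com/dbpedia/extraction-framework/wiki/Extraction-Instructions) for details.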

dump> ../run download config={download-config-file}

Run RedirectExtractor, InfoboxExtractor and TemplateParameterExtractor. dump/extraction.stats.properties should already contain the correct settings. cd into directory dump/ and execute

dump> ../run stats-extraction extraction.stats.properties

Extract statistics from triples files

cd into directory server/, modify the path to the dump base dir in pom.xml if necessary and run

server> ../run stats

Copy src/main/statistics/mappingstatistics_xx.txt to the same folder on the mappings server.

Update and deploy sprint stuff

Ask Pablo how to do that...