What's in a Name
Latest revision as of 16:25, 16 February 2015
Analysis of Name Properties in DBpedia.
Note: this page was first edited in Emacs Org mode, then converted with
pandoc prop-names.org -w mediawiki >prop-names.mw
--VladimirAlexiev 18:49, 11 January 2015 (UTC)
Intro
Believe it or not, DBO has 86 properties called "name". Isn't that a bit too much? Yes it is, and we need to fix this situation to avoid confusion.
If you ponder the prop names below without reading my explanations, you'll appreciate the importance of documenting every property and class. I don't mean something complicated: just explain the purpose and when it's used.
Basic Name
foaf:name | Use for Person & Organisation | ok |
dbo:name | Use for everything except Person & Organisation | check |
dbo:names | Bug, came from some Greek Astronomy props | #7 |
Name Forms
dbo:officialName | ||
dbo:alternativeName | ||
dbo:otherName | ||
dbo:longName | ||
dbo:commonName | Eg "cat" | |
dbo:scientificName | Eg "Felis catus" (biology) | |
Name Lifecycle
People's names change during their lifetime. Eg Cranach was born Lucas Maler (after the profession of his father), was renamed Lucas Cranach when he became famous (after his birthplace Kronach), then art historians started calling him Lucas Cranach the Elder after his son (also Lucas Cranach) became an artist.
In some cultural heritage documentation systems (eg CIDOC CRM), name allocation and usage across time and space can be tracked with separate nodes. In DBpedia the situation is simpler, but we still need several name properties. But not as many as we find here!
"Birth, former, historical, old, original, previous, same, present" name: in what situations should each one be used?
dbo:birthName | TODO | |
dbo:formerName | TODO | |
dbo:historicalName | TODO | |
dbo:oldName | TODO | |
dbo:originalName | TODO | |
dbo:previousName | TODO | |
dbo:sameName | TODO | |
dbo:presentName | Duplicate of name | ? |
This list is relatively short but crucially important, since these props are used numerous times: everything has a name.
Language-specific Names
There are thousands upon thousands of languages in the world.
IANA has defined lang tags for a lot of them (following the two-letter and three-letter ISO 639 codes, extending them, and allowing custom extensions). See eg the Getty LOD documentation (http://vocab.getty.edu/doc/#IANA_Language_Tags) for lang tag examples and a script to fetch the IANA registry into a table. The resulting table can be opened and searched in Excel: iana-lang-tags.xlsx (https://www.dropbox.com/s/g5j4bdiqt4mcyly/iana-lang-tags.xlsx?dl=1)
Rather than making up a new property for each language in the world, we must use one property, with the proper lang tag.
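As a rough illustration of turning the registry into a table, here is a minimal Python sketch. It is not the Getty script. It assumes the plain-text registry layout (records separated by "%%" lines, one "Field: value" per line, indented continuation lines), and SAMPLE is a small hand-written excerpt in that layout rather than the real file.

```python
# Sketch only: parse records in the IANA Language Subtag Registry layout.
# SAMPLE is a short hand-written excerpt, not the full registry.
SAMPLE = """\
%%
Type: language
Subtag: sr
Description: Serbian
%%
Type: language
Subtag: gag
Description: Gagauz
%%
Type: language
Subtag: mo
Description: Moldavian
Description: Moldovan
Preferred-Value: ro
%%
Type: script
Subtag: Cyrl
Description: Cyrillic
"""


def parse_registry(text):
    """Return a list of dicts, one per record that has a Subtag field."""
    rows = []
    for block in text.split("%%"):
        record = {}
        last_key = None
        for line in block.splitlines():
            if not line.strip():
                continue
            if line[0].isspace() and last_key:
                # Continuation line: append to the previous field.
                record[last_key] += " " + line.strip()
                continue
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if key in record:
                # Repeated field (eg several Description lines): join them.
                record[key] += "; " + value
            else:
                record[key] = value
            last_key = key
        if "Subtag" in record:
            rows.append(record)
    return rows


rows = parse_registry(SAMPLE)
# Index descriptions by (type, subtag) so a tag part can be looked up quickly.
by_subtag = {(r["Type"], r["Subtag"]): r["Description"] for r in rows}
print(by_subtag[("language", "gag")])
```

The same function should work on the full registry text once it has been downloaded, but that has not been checked here.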
Eg instead of
dbr_fr:belgrade name "Belgrade"; cyrilliqueName "Белград".
We should do:
dbr_fr:belgrade name "Belgrade"@fr, "Белград"@sr-Cyrl.
The Template:PropertyMapping has a parameter "language" just for that purpose.
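The idea of "one property plus lang tag" can be sketched in Python as a simple lookup. This is only an illustration, not extractor code: the LANG_PROPS table and the rewrite function are hypothetical, only a few rows are shown, and the choice of dbo:name as the generic target is an assumption (for a Person or Organisation it would be foaf:name).

```python
# Hypothetical lookup: language-specific property -> (generic property, lang tag).
# Only a few rows are shown; the tags are the ones mentioned on this page.
LANG_PROPS = {
    "dbo:frenchName": ("dbo:name", "fr"),
    "dbo:russianName": ("dbo:name", "ru"),
    "dbo:gagaouze": ("dbo:name", "gag"),
    "dbo:kanjiName": ("dbo:name", "ja-Hani"),
    "dbo:frenchNickname": ("foaf:nick", "fr"),
}


def rewrite(prop, value):
    """Return (property, literal), adding a lang tag where one is known.

    Quotes inside the value are not escaped; this is a sketch, not a serializer.
    """
    if prop in LANG_PROPS:
        generic, tag = LANG_PROPS[prop]
        return generic, f'"{value}"@{tag}'
    # Unknown properties pass through unchanged, with an untagged literal.
    return prop, f'"{value}"'


print(rewrite("dbo:russianName", "Белград"))
```

In the mappings themselves no such code is needed: the "language" parameter of the template plays the role of the second column of this lookup.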
We have to investigate how each of the properties below is used. Eg:
- germanName should be fixed but alemmanicName might have some cultural significance
- algerianName should be fixed but algerianSettlementName should be investigated
- frenchName should be fixed to "name" but frenchNickname might become otherName@fr or something like this
- some of them might map to originalName (plus language), depending on how they're used
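All these are tracked under #15 (https://github.com/dbpedia/mappings-tracker/issues/15). Unfortunately the extractor currently doesn't handle lang tags like "sr-Cyrl": see #303 (https://github.com/dbpedia/extraction-framework/issues/303).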
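To make concrete what handling a tag like "sr-Cyrl" involves, here is a minimal Python sketch that splits a lang tag into its parts. It is an assumption-laden simplification of the BCP 47 syntax (language, optional script, optional region, or a private-use "x-" tag) and deliberately ignores variants, extensions and grandfathered tags. It is not the extractor's code.

```python
import re

# Simplified pattern: language[-script][-region], or a private-use tag x-...
# Variants, extensions and grandfathered tags are ignored on purpose.
TAG = re.compile(
    r"^(?:(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"|(?P<private>x(?:-[A-Za-z0-9]{1,8})+))$"
)


def split_tag(tag):
    """Return the parts of a lang tag as a dict, or None if it does not match."""
    m = TAG.match(tag)
    if m is None:
        return None
    # Keep only the parts that are actually present in the tag.
    return {k: v for k, v in m.groupdict().items() if v is not None}


for tag in ["sr-Cyrl", "ja-Hani", "qqq-DZ", "x-calabria", "gag"]:
    print(tag, split_tag(tag))
```

A tag that only carries a language ("gag") and one that also carries a script ("sr-Cyrl") go through the same code path, which is the behaviour we would like from the extractor.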
dbo:alemmanicName | #15 | fixed |
dbo:algerianName | #15: qqq-DZ. Not a single language, so we use a private-use language code (qqq) with the region Algeria (DZ) | fixed |
dbo:algerianSettlementName | #15 | fixed |
dbo:arabicName | #15 | fixed |
dbo:arberishtName | #15 | fixed |
dbo:calabrianName | #15: x-calabria (custom tag). IANA doesn't have a code, and https://en.wikipedia.org/wiki/Languages_of_Calabria describes a mix of Neapolitan and Sicilian, and even Greek, Occitan and Albanian | fixed |
dbo:chaouiName | #15 | fixed |
dbo:cornishName | #15 | fixed |
dbo:cyrilliqueName | #15 | fixed |
dbo:dutchName | #15 | fixed |
dbo:englishName | #15 | fixed |
dbo:finnishName | #15 | fixed |
dbo:frenchName | #15 | fixed |
dbo:frenchNickname | #15, replaced by foaf:nick "nickname"@fr | fixed |
dbo:frioulanName | #15 | fixed |
dbo:gaelicName | #15 | fixed |
dbo:gagaouze | #15, @gag | fixed |
dbo:germanName | #15 | fixed |
dbo:greekName | #15 | fixed |
dbo:irishName | #15 | fixed |
dbo:italianName | #15 | fixed |
dbo:japanName | #15 | fixed |
dbo:kabyleName | #15 | fixed |
dbo:kanjiName | #15, @ja-Hani | fixed |
dbo:ladinName | #15 | fixed |
dbo:luxembourgishName | #15 | fixed |
dbo:manxName | #15 | fixed |
dbo:maoriName | #15 | fixed |
dbo:moldavianName | #15: mo | fixed |
dbo:mozabiteName | #15 | fixed |
dbo:occitanName | #15 | fixed |
dbo:russianName | #15, @ru | fixed |
dbo:sardinianName | #15 | fixed |
dbo:scotishName | #15 | fixed |
dbo:scotsName | #15 | fixed |
dbo:scottishName | #15 | fixed |
dbo:sicilianName | #15 | fixed |
dbo:tamazightName | #15 | fixed |
dbo:tamazightSettlementName | #15 | fixed |
dbo:touaregName | #15 | fixed |
dbo:touaregSettlementName | #15 | fixed |
dbo:welshName | #15 | fixed |
To Investigate
dbo:colonialName | ||
dbo:informationName | ||
dbo:leaderName | ||
dbo:legislativePeriodName | ||
dbo:meshName | ||
dbo:municipalityRenamedTo | Anything can be renamed, so why "municipality"? Is that the current name? | |
dbo:namedByLanguage | used with IntermediateNodeMapping, to be removed | #41 (https://github.com/dbpedia/mappings-tracker/issues/41) |
dbo:personName | ||
dbo:phonePrefixName | Say again? | deleted |
dbo:reignName | Likely incorrect, it's used in http://mappings.dbpedia.org/index.php/Mapping_fr:Infobox_Rôle_monarchique for monarchic servants, eg https://fr.wikipedia.org/wiki/Henri_d'Orléans_(1822-1897) | |
dbo:sharingOutName | dc:type of a part Place | #8 (https://github.com/dbpedia/mappings-tracker/issues/8) |
dbo:sharingOutPopulationName | unused | deleted |
dbo:signName | ||
dbo:statName | ||
dbo:subdivisionName |
Is Ok
dbo:nameAsOf | Date when "name" was first assigned | |
dbo:filename | Filename of a Sound | see #19 |
dbo:iupacName | IUPAC name of a Chemical | |
dbo:ngcName | NGC name of a CelestialBody | |
dbo:fuelTypeName | fuelType of PowerStation as literal | |
dbo:messierName | Astronomical object, as classified by Charles Messier | |
dbo:peopleName | Name for the people of a certain place, eg Bulgaria->Bulgarian | |
dbo:spouseName | Spouse of someone as literal | |
dbo:teamName | Name of a School's athletic teams | |
dbo:nameDay | Name-day of a saint (a xsd:gMonthDay) | |
dbo:namedAfter | Person after whom something is named (eg School, Disease, Theorem etc) | |
dbo:colourName | The colors of a party, school, taxon(?) | |
dbo:policeName | The police detachment serving a UK place, eg Wakefield -> "West Yorkshire Police" | ok |
Not Ok
dbo:genereviewsname | Bad capitalization (should be camelCase) | #18 |
dbo:circuitName | replace with raceTrack | bug |