[Crm-sig] New Issue "Appellations that ARE URIs" (HW Issue 363, 383)

Martin Doerr martin at ics.forth.gr
Sat Nov 10 23:15:03 EET 2018


Dear All,

After completely rewriting the text about implementing the CRM in RDF, I
have temporarily abandoned Google Docs. It is more efficient to split
the topic and then recombine.

Here is my reformulation of the "punning" topic, the duality of
Appellation, as a discussion item. Please check the open questions I
pose at the end!

"In the CRM, names are modelled as instances of E41 Appellation. This
class comprises any symbolic object used or created to name something 
without requiring further meaning. The CIDOC CRM version 6.2 defines E41
Appellation, subclass of E90 Symbolic Object, as:

“This class comprises signs, either meaningful or not, or arrangements 
of signs following a specific syntax, that are used or can be used to 
refer to and identify a specific instance of some class or category 
within a certain context.

Instances of E41 Appellation do not identify things by their meaning, 
even if they happen to have one, but instead by convention, tradition, 
or agreement. Instances of E41 Appellation are cultural constructs; as 
such, they have a context, a history, and a use in time and space by 
some group of users. A given instance of E41 Appellation can have 
alternative forms, i.e., other instances of E41 Appellation that are 
always regarded as equivalent independent from the thing it denotes.”

The CRM is an ontology in the proper sense. Therefore, instances of 
physical things and phenomena of the physical world are regarded as
the things themselves, and not their machine representation, and any 
identifier or name used for something from the material world is 
different from the thing itself. For instance, I, Martin Doerr, am an 
instance of E21 Person, and not any of the URIs or records that may 
represent me in an information system. I am unique in this world, as is 
any particular thing, in contrast to representations of me.

In the CRM, the property “P1 is identified by” from “E1 CRM Entity” to
“E41 Appellation” relates the things to their names or identifiers.

In any knowledge representation schema, any item that cannot “reside” in 
the machine itself due to its nature, must be represented by one 
selected primary identifier, in the case of RDF by a URI. For an 
information system to be consistent with the described reality, these 
selected identifiers should map one-to-one to the ontological instances 
they stand for. Therefore, any instance of a class represented by a URI 
in RDF plays a dual role: it stands for the ontological instance and is 
an identifier for it (see also Meghini et al. 2014).

For practical reasons, we do not represent this duality by a recursive
use of “P1 is identified by” from an instance to itself in its second
capacity as an identifier. However, all other names and identifiers are
related to the selected primary identifier via “P1 is identified by”.
This implies that the choice of which of multiple identifiers is the
primary one may be changed without changing the meaning. In contrast,
owl:sameAs relates two primary URIs of things as different
representations of the same real-world thing, aggregating the properties
of both representations as valid for the real-world thing.
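
As a minimal sketch of this distinction (all example.com and example.org
URIs below are hypothetical and chosen only for illustration): a secondary
identifier hangs off the primary URI via P1, whereas a second primary URI
for the same person is linked with owl:sameAs, as the property is spelled
in the OWL vocabulary.

```turtle
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# The primary URI stands for the person and is at the same time
# the selected identifier for him in this system.
<http://example.com/person/martin_doerr>
    a crm:E21_Person ;
    # A secondary identifier (hypothetical) is attached via P1.
    crm:P1_is_identified_by <http://example.com/identifier/staff_0042> .

<http://example.com/identifier/staff_0042>
    a crm:E41_Appellation .

# A primary URI from another (hypothetical) system for the same
# real-world person: both representations are merged by owl:sameAs.
<http://example.com/person/martin_doerr>
    owl:sameAs <http://example.org/agents/doerr_m> .
```

Swapping which of the two example.com URIs is treated as primary would
change the shape of the graph, but not what is said about the person.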

In practice, only URIs, literals and datatypes themselves “reside”
directly in a machine and need no additional identification, because they
are completely identified by their content.

We may distinguish four different kinds of Appellations: URIs, 
identifiers from local application contexts, literally defined names 
used in human written communication and names from oral communication 
and tradition. Typically, URIs and local identifiers have a unique 
representation as strings. However, the situation for names is more 
complex.

For instance, 北京 is a literally defined name for the capital of China.
“Bei Jing” is meant to be a representation of the same name in Latin
characters (underspecified without accent marks), and not meant to be
another name for the same city. “*Doerr* is a respelling of *Dörr*, a
German surname” [1]. The most elaborate and effective good
practice for registering proper names comes from the library community 
(Doerr, Riva and Zumer 2012). The FRBR Review Group of IFLA decided for 
practical reasons to identify a name (“Nomen” in their terminology) by 
the identical sequence of characters in a given script, not by the 
binary encoding.

For historical research, however, in particular when capturing oral
tradition, this definition is too narrow, and in relevant CRM
applications we are confronted with names that have spelling variants
and even spoken variants. All names that cannot be uniquely identified
by a character sequence must be represented by a URI, and *further
descriptive properties must be added, preferably the newly
proposed property “E90 Symbolic Object: has symbolic content”*. Also, if
someone wants to document facts about a name other than its spelling, a
URI must first be assigned, because in RDF a literal cannot be the
subject of a statement. This case must not be confused with documenting
facts about the relation between a name and a particular carrier of that
name, because that would be a reification of this relation, and not a
statement about the name itself.

Summarizing, there are two cases:

a) A name or identifier is completely defined and identified by a
character sequence or any digitally, unambiguously encoded symbol.

b) A name or identifier is identified but not defined by a URI.
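
The two cases might be sketched as follows. This is only an illustration
under assumptions: the example.com URIs are hypothetical, and since the
proposed “has symbolic content” property has no agreed RDF name in this
text, it appears under a made-up ex: namespace.

```turtle
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
# Hypothetical namespace standing in for the proposed property.
@prefix ex:  <http://example.com/vocab/> .

# Case a): the name is fully given by its character sequence,
# so the literal itself serves as the appellation.
<http://example.com/place/beijing>
    crm:P1_is_identified_by "北京" .

# Case b): the name has spelling variants and facts of its own,
# so it gets a URI and is described by further properties.
<http://example.com/person/martin_doerr>
    crm:P1_is_identified_by <http://example.com/appellation/doerr> .

<http://example.com/appellation/doerr>
    a crm:E41_Appellation ;
    ex:has_symbolic_content "Dörr", "Doerr" ;
    crm:P3_has_note "German surname; 'Doerr' is a respelling of 'Dörr'." .
```

In case b) the note is a statement about the name itself, not about the
relation between the name and the person carrying it.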

As a matter of fact, RDFS provides the property rdfs:label, which
implements exactly case a) above, without the possibility of adding
descriptions of the name itself. SKOS specializes rdfs:label into
properties such as skos:prefLabel and skos:altLabel, which indeed define
the names by which things are called by people. We therefore take the
use of rdfs:label as existing good practice. Consequently, we have to
regard rdfs:label as a special case of “P1 is identified by”, and all
literals used as range instances of rdfs:label implicitly as instances
of E41 Appellation (see section “RDF implementation tests” item 1.).
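
Expressed in Turtle, this reading amounts to a single subproperty
statement. The sketch below assumes a store that applies RDFS entailment;
the person URI is an illustrative example.com address, and the derived
triple is shown only as a comment.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix crm:  <http://www.cidoc-crm.org/cidoc-crm/> .

# rdfs:label is declared a special case of "P1 is identified by".
rdfs:label rdfs:subPropertyOf crm:P1_is_identified_by .

# An ordinary human-readable label on a node ...
<http://example.com/person/alexander_the_great>
    rdfs:label "Alexander the Great" .

# ... then entails, by the RDFS subproperty rule (rdfs7):
# <http://example.com/person/alexander_the_great>
#     crm:P1_is_identified_by "Alexander the Great" .
```

Without RDFS reasoning switched on, the second triple would have to be
asserted explicitly for a query on P1 to find the label.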

Unfortunately, our KR languages have not foreseen the case that an 
instance of a datatype is also an instance of a user-defined class. This 
causes a range conflict, which can be overcome by “punning” the range of 
“P1 is identified by” to be both rdfs:Literal and E41 Appellation (see 
section “RDF implementation tests” item 2.).

This recommended implementation allows for using both models for 
Appellations, via an additional URI or directly as literal, and 
returning with one query all range instances of “P1 is identified by” 
following this interpretation. The SPARQL query result separates URIs 
from literals automatically. So, there is no ambiguity about the nature 
of the result.

Only if the same name is described both directly via rdfs:label and
indirectly via a URI would matching the two require another query.

So, the frequently asked question remains: why not avoid this double
definition and describe every instance of E41 Appellation via another
URI? The answer is that the cases that actually require an explicit
representation of E41 Appellation are relevant but rare. On the other
hand, good practice requires all nodes in a semantic graph represented
by a URI to carry a human-readable label in addition. This means that
storage volume and query performance would be heavily hampered by
such a “pure-logic-driven” decision.

The only ambiguity that remains is the case in which the instance of
Appellation is literally the URI itself, and not a URI representing an
Appellation of a different form. There are two solutions to this problem:
*Either we classify this URI by the class of things it identifies and use
owl:sameAs, or we define a specific subclass of E41 Appellation “URI”.*

*Another question is whether labels used for the readability of the
semantic graph should be distinguished from names used in the world
referred to.*

*Tests:*

Asking for the superproperties of rdfs:label, given the following
declaration:

<rdf:Property rdf:about="http://www.w3.org/2000/01/rdf-schema#label">
  <rdfs:domain rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  <rdfs:subPropertyOf rdf:resource="P1_is_identified_by"/>
</rdf:Property>

Query (Give me all the superproperties of rdfs:label):

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select * where {
  rdfs:label rdfs:subPropertyOf ?p
}

Result from Virtuoso:

p:
http://www.cidoc-crm.org/cidoc-crm/P1_is_identified_by

1. The ttl data that was presented previously has been added in Virtuoso:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .

<http://example.com/person/alexander_the_great>
    crm:P1_is_identified_by <http://example.com/appellation/alexander_the_great> .

<http://example.com/appellation/alexander_the_great>
    rdfs:label "Alexander the Great" .

<http://example.com/person/alexander_the_great>
    rdfs:label "Alexander the Great" .

<http://example.com/person/alexander_the_great>
    crm:P1_is_identified_by "Alexander the Great" .

2. A query returning all the “identifiers” of Alexander the Great via
the “P1 is identified by” property was applied:

prefix crm: <http://www.cidoc-crm.org/cidoc-crm/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select * where {
  <http://example.com/person/alexander_the_great> crm:P1_is_identified_by ?identifier
}

Result:

identifier
http://example.com/appellation/alexander_the_great
Alexander the Great

------------------------------------------------------------------------

[1] https://en.wikipedia.org/wiki/Doerr

-- 
------------------------------------
  Dr. Martin Doerr
               
  Honorary Head of the
  Center for Cultural Informatics
  
  Information Systems Laboratory
  Institute of Computer Science
  Foundation for Research and Technology - Hellas (FORTH)
                   
  N.Plastira 100, Vassilika Vouton,
  GR70013 Heraklion,Crete,Greece
  
  Vox:+30(2810)391625
  Email: martin at ics.forth.gr
  Web-site: http://www.ics.forth.gr/isl
