[Crm-sig] Issue: Solution for Dualism of E41 Appellation and rdfs:label

Richard Light richard at light.demon.co.uk
Fri Sep 14 19:54:35 EEST 2018

On 13/09/2018 20:57, Martin Doerr wrote:
> Dear Richard,
>>> What we need, in my opinion, is a property of Symbolic Object, which we may
>>> call "has symbolic content" or "has symbolic content inline" or
>>> anything better, which defines that the symbolic content *is
>>> identical to* the Literal, *abstracted* to the "level of symbolic
>>> specificity" that the Literal implies and that conforms to the
>>> identity condition of the Symbolic Object, i.e., characters of a
>>> certain script, or whatever. That would make the meaning of the
>>> "value" unambiguous.
>> Again, I'm in complete agreement with this line of thought.  One
>> decision we should make is whether this property forms part of the
>> generic CRM framework, or if it is to be an implementation-specific
>> property which only appears in our RDF implementation of the CRM.  My
>> instinct is for it to go into the CRM proper: the treatment of
>> Symbolic Object and its subclasses would I think be made clearer by
>> the addition of this property.
> For CRM proper!
OK: perhaps we should start a new issue to address this?
>> It's worth bearing in mind that RDF strings have a built-in mechanism
>> for specifying the language of the string.  This would allow us to
>> express, for example, a place name in multiple languages by simply
>> having one 'has symbolic content' property per language, each with an
>> associated string.
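
A minimal sketch of the language-tag idea above, in plain Python rather than a real RDF library. The node identifier, the property name "crm:has_symbolic_content" and the sample place names are assumptions chosen for illustration; none of them is a published CRM term.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """A string with an optional language tag, like an RDF language-tagged string."""
    value: str
    lang: Optional[str] = None


PLACE_NAME = "ex:appellation/1"            # hypothetical E41 Appellation node
HAS_CONTENT = "crm:has_symbolic_content"   # hypothetical property name

# One 'has symbolic content' statement per language, all on the same node.
triples = [
    (PLACE_NAME, "rdf:type", "crm:E41_Appellation"),
    (PLACE_NAME, HAS_CONTENT, Literal("Crete", "en")),
    (PLACE_NAME, HAS_CONTENT, Literal("Κρήτη", "el")),
    (PLACE_NAME, HAS_CONTENT, Literal("Kreta", "de")),
]


def content_in(subject: str, lang: str) -> list:
    """Return the string forms of `subject` that carry the language tag `lang`."""
    return [
        o.value
        for s, p, o in triples
        if s == subject and p == HAS_CONTENT
        and isinstance(o, Literal) and o.lang == lang
    ]


print(content_in(PLACE_NAME, "de"))
```

The point of the sketch is only that no extra node per language is needed: the tag travels with the string itself.
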
>>> We may need to add another property, such as "is contained in" or so
>>> pointing to a URL actually holding an instance of its content, again
>>> abstracted to the "level of symbolic specificity" that the file
>>> instance implies and that conforms to the identity condition of the
>>> Symbolic Object.
>> I think that we would benefit from some use cases which demonstrate
>> the practical need for this property.  My own instinct is that if we
>> are really just recording a string value, then it is overkill to
>> assign it a URL and put it somewhere.
> I made a jump here. This is for things like a (standardized) text of
> Aristotle in an MS Word document and in an .html file. If I mean the
> text attributed to Aristotle, I obviously do not mean the typeface in MS
> Word to belong to Aristotle's text, nor the html layout instructions.
> That means that both contain precisely the same text, but are
> themselves different, because they are richer in information, namely
> the modern renderings. All three, the standardized text of Aristotle,
> the MS Word representation and the html representation, are different
> Symbolic Objects, but one is contained in the other two.
I see: thanks. Yes, this would indeed be a different property, and in
fact the URL concerned will not be a 'Linked Data' URL, since it will
address a non-RDF resource.  So that forms a different discussion,
perhaps?  To me, the Word document and HTML page sound like
"attestations" of the Aristotle text (as the Pelagios/Linked Pasts
people would say).  Another example would be a photograph of a plaque
containing a text of interest.  This would also be an attestation; the
difference is that the text in question is not encoded within the
digital resource.

I'm certainly interested in the interface between the Linked Data world
and the digital humanities world of TEI and Word documents and the
like.  Techniques like Open Annotation [1] could well have a useful role
to play here.

>> My suggestion is that we define the "has symbolic content" property,
>> and then put our energy into agreeing one or more subproperties of
>> rdf:value which meet the known recording requirements for cultural
>> heritage information.  By doing this, I suggest that we will have
>> solved the main problem which confronts implementors who want to
>> express CRM in RDF.
> Yep, subproperty of rdf:value is not bad.
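
A rough sketch of what declaring a subproperty of rdf:value would mean for queries, using plain Python tuples as triples. The property name "crm:has_symbolic_content" and the example nodes are assumptions; only rdf:value is an existing RDF term.

```python
RDF_VALUE = "rdf:value"
HAS_CONTENT = "crm:has_symbolic_content"  # hypothetical CRM-specific property

# Declared hierarchy: HAS_CONTENT rdfs:subPropertyOf rdf:value.
SUBPROPERTY_OF = {HAS_CONTENT: RDF_VALUE}


def superproperties(prop):
    """Yield `prop` and every property it is transitively a subproperty of."""
    seen = set()
    while prop is not None and prop not in seen:
        seen.add(prop)
        yield prop
        prop = SUBPROPERTY_OF.get(prop)


def values(triples, subject, prop):
    """Objects of `subject` via `prop` or via any subproperty of `prop`."""
    return [o for s, p, o in triples
            if s == subject and prop in superproperties(p)]


triples = [
    ("ex:name/1", HAS_CONTENT, "Heraklion"),    # hypothetical appellation
    ("ex:measurement/1", RDF_VALUE, "42"),      # hypothetical non-CRM data
]

print(values(triples, "ex:name/1", RDF_VALUE))
print(values(triples, "ex:measurement/1", HAS_CONTENT))
```

Under these assumptions a query on rdf:value still finds the CRM string, while a query on the narrower property does not pick up unrelated rdf:value statements.
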
>>> I think the polymorphism we describe here, well studied in
>>> object-oriented languages, is in the nature of Appellations. The
>>> problem for me is that the respective KR models have NOT
>>> THOUGHT of the case that such polymorphisms can occur.
>>> Nevertheless, RDFS is tolerant enough to accept the superproperty
>>> statement, but not to create a class which is either a URI or an *inline
>>> expanded* object.
>>> This polymorphism occurs EXCLUSIVELY for Symbolic Objects with
>>> symbol sets a certain machine supports. Another reason not to use
>>> rdf:value, because it does not give credit to the fact that only
>>> Symbolic Objects can have such a "value".
>> I'm afraid you have lost me here. It would be very helpful to me (and
>> might encourage others to join in the conversation) if you could post
>> one or two concrete examples of what you mean.
> OK, in simple words: there are names which have an identity based on a
> certain sequence of characters. There are others, historically
> interesting, which have a phonetic identity, and even that may vary.
> We collaborate with historians who deal with family names in the
> Aegean area around 1800, which have no standard spelling at all, not
> even a preferred one. The different spelling variants have later
> evolved into distinct family names. But in order to match instances in
> the documents, we need both concepts of identity.
True, but any instance of the name in a document will only take one
concrete form, not all of them.  (For handwritten sources it may be a
matter of judgement what that form actually is.)  So you can record the
form of name it exhibits (as a string), and then assert that it is (in
your view) an attestation of the generic family name for which you have
a URI.
> Even my ancestors used "Derr" instead of "Dörr". Since the local
> dialect does not distinguish "e" and "ö", it is unclear if it is a
> spelling variant of the same phonetics or if the "ö" is an
> etymological misinterpretation, because "Dörr" has a linguistic meaning
> and the "e" in "Derr" may have another semantic root, but this is not
> widely accepted.
> So, the names that are not identical to a Literal must be represented
> using a URI. That is what I mean by polymorphism.  Also, if we want to
> talk about the name itself as a historical fact, we need a distinct
> identity. All these cases are needed but rare for names. 
There are perfectly good reasons for considering names to be worthy of
study and recording in their own right.  I would argue that this is
equally true whether the name in question has one, or many, possible
forms.  So there is always an argument for minting a URI to represent
the name as a Symbolic Object. Doing this allows you to make statements,
for example, about its genesis, its meaning, its historical
distribution, etc., and means you can record specific instances of the
name as attestations of this Symbolic Object.

However, I would still argue that /instances/ of the name should be
recorded as strings - the actual value found in the resource in question.
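
As a sketch of this pattern: each occurrence keeps the string actually found in its source, and points at one URI standing for the name as a Symbolic Object. The URI, the source identifiers and the field names below are all hypothetical; only the two spellings come from the example in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    source: str     # hypothetical identifier of the document
    form: str       # the string exactly as it appears in the source
    name_uri: str   # URI minted for the name as a Symbolic Object


FAMILY_NAME = "ex:name/doerr"  # hypothetical URI for the generic family name

attestations = [
    Attestation("ex:doc/register-1", "Derr", FAMILY_NAME),
    Attestation("ex:doc/letter-7", "Dörr", FAMILY_NAME),
    Attestation("ex:doc/letter-9", "Derr", FAMILY_NAME),
]


def attested_forms(name_uri: str) -> list:
    """Distinct string forms recorded as attestations of `name_uri`."""
    return sorted({a.form for a in attestations if a.name_uri == name_uri})


print(attested_forms(FAMILY_NAME))
```

Statements about the name's genesis or distribution would then attach to the URI, not to any one spelling.
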

> For texts, it is the opposite. They are more often in files than in
> literals.
> On the other side, only Symbolic Objects can "reside" on computers and
> outside. Therefore the "punning" problem occurs only in connection
> with Symbolic Objects. Only these can have a "value" in the machine,
> whereas rdf:value may be about anything.

[1] https://www.w3.org/community/openannotation/

> Best,
> Martin
>> Best wishes,
>> Richard
>>> I agree that we may over-think the point. As I mentioned, the
>>> superproperty statement I propose has no other effect than that I
>>> can get E41's and labels back by querying P1 only.
>>> Opinions?
>>> Best,
>>> Martin
>>> On 9/12/2018 9:56 AM, Richard Light wrote:
>>>> On 11/09/2018 20:02, Martin Doerr wrote:
>>>>> Dear All,
>>>>> Firstly, apologies, the RDF was wrong, it was intended to be P1 is
>>>>> superproperty of rdfs:label.
>>>> I'm not sure that this is something we need to state at all, and I
>>>> worry that - if it is included in our RDFS Schema - it may bring
>>>> unwanted side-effects.  Isn't this saying that any instance of
>>>> rdfs:label is to be treated as an instance of P1?  Bear in mind
>>>> that CRM data may co-exist in triple stores in company with other
>>>> RDF data, which may well use rdfs:label for its own purposes.  This
>>>> assertion that 'all rdfs:labels are P1 relationships' would then be
>>>> applied to this other data as well.  This might well result in
>>>> incorrect/spurious results when SPARQL queries are applied to the data.
>>>> In general, I suggest that we are ok to define
>>>> sub-classes/properties of standard RDFS types, but we shouldn't
>>>> define super-classes/properties of them.  (I would welcome comments
>>>> on the validity of this suggestion from someone who understands RDF
>>>> better than me.)
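
The side-effect described above can be sketched in plain Python. Under RDFS entailment, a declared "rdfs:label rdfs:subPropertyOf P1" makes every rdfs:label statement in the store entail a P1 statement, whoever wrote it. The two example resources are hypothetical, and "crm:P1_is_identified_by" is used here only as an illustrative spelling of P1.

```python
RDFS_LABEL = "rdfs:label"
P1 = "crm:P1_is_identified_by"  # illustrative spelling of P1


def infer_superproperty(triples, sub, sup):
    """RDFS subPropertyOf entailment: every (s, sub, o) entails (s, sup, o)."""
    inferred = {(s, sup, o) for s, p, o in triples if p == sub}
    return set(triples) | inferred


store = {
    ("ex:crm/vase", RDFS_LABEL, "Red-figure vase"),  # hypothetical CRM data
    ("ex:other/button", RDFS_LABEL, "Submit"),       # hypothetical unrelated data
}

entailed = infer_superproperty(store, RDFS_LABEL, P1)
p1_subjects = sorted(s for s, p, o in entailed if p == P1)
print(p1_subjects)
```

In this sketch the unrelated resource ends up with a P1 statement as well, which is exactly the kind of spurious query result being worried about.
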
>>>>> Semantically, the range of rdfs:label, when used, is ontologically
>>>>> an Appellation in the sense of the CRM.
>>>> Agreed (see my reply from yesterday).  The conclusion I draw from
>>>> this is that we can validly say:
>>>> E1 rdfs:label "string value" is a shortcut for the path 'E1 CRM
>>>> Entity' 'P1 is identified by' 'E41 Appellation' ...
>>>> in exactly the same spirit as the similarly-worded note which we
>>>> find in the definition of P1 itself. (Obviously, by using this
>>>> shortcut, we lose the information that this string value is an
>>>> Appellation, but that's the nature of short-cuts.)
>>>>> I agree with George, that all RDF nodes should have a human
>>>>> readable label. They name the thing, even if it is a technical node.
>>>>> I would find it confusing to say that labels are not to be queried,
>>>>> only to be read, and that the "real" names must have a URI,
>>>>> regardless of whether I have more to say about them.
>>>>> I am really not a fan of punning, we definitely forbid it in the CRM.
>>>>> The point with Appellations is that some, the simple ones, can
>>>>> directly be represented in the machine, or be outside. The
>>>>> solution to assign a URI in all cases, and then a value or label,
>>>>> does not make the world easier. It is extremely bad performance.
>>>>> We talk here about implementation, not about ontology.
>>>>> You get simply a useless explosion of the graph for a purpose of
>>>>> theoretic purity.
>>>> Agreed. What we need to do is to propose a simple way of expressing
>>>> simple Appellations in RDF.  That is why my shortcut definition
>>>> above ends with '...': I don't think we have yet decided how to do
>>>> this.
>>>> I've just been looking over the draft document we are trying to
>>>> write, and it currently says that a fully-worked-out path will use
>>>> 'P3 has note -> E62 string' to express the value of an E41
>>>> Appellation.  This (i.e. the suggestion to use P3) comes from the
>>>> definition of the superclass E90 Symbolic Object.  A comment in our
>>>> draft RDF document questions whether this is sufficiently precise,
>>>> since P3 is simply "a container for all informal descriptions about
>>>> an object that have not been expressed in terms of CRM
>>>> constructs".  I suggest that we need either to use rdf:value to
>>>> hold the string value, or (better) to define a CRM-specific
>>>> subproperty of rdf:value and use that.  (This subproperty could be
>>>> part of the published CRM, or it could just form part of the 'RDF
>>>> implementation' guidelines.)  I don't think that we should use
>>>> rdfs:label here.
>>>> I don't think we should concern ourselves with URLs in our RDF
>>>> guidance document.  Any implementer of our RDF solutions can choose
>>>> to assign a URL to represent any node in the structure, but it
>>>> won't change the logic of the resulting RDF, or how it responds to
>>>> SPARQL queries.
>>>>> Those claiming confusion should be more precise. Has someone
>>>>> looked at query benchmarks? Has someone looked at graphical
>>>>> representations of RDF graphs? Do they really look better?
>>>>> So either we ignore the issue, and write queries that
>>>>> collect names either via P1, URI and a value/label, or via a
>>>>> label, because this is where names appear in RDF; we make no
>>>>> punning, but our queries implement exactly this meaning. So, we
>>>>> are no better off, but act as if we didn't know.
>>>>> Or, we describe the fact by punning, have one superproperty for
>>>>> all cases, which we can query, and stop thereby the discussion if
>>>>> labels are allowed or not, and how they relate to appellations.
>>>>> The punning comes in, because the range of the superproperty must
>>>>> comprise the ranges of the subproperties. We can play a bit more,
>>>>> make the punning with a superproperty of P1, and have both P1 and
>>>>> rdfs:label subproperties of it, if this is preferred.
>>>>> The solution I describe is just a logical representation of the
>>>>> situation, not creating a different situation. It just says that
>>>>> names can be complex objects or simple literals.
>>>> As I said yesterday, I don't see how any punning strategy can make
>>>> differently-structured RDF equivalent for the purposes of querying.
>>>> Therefore, I think we will have to accept that if we allow more
>>>> than one way of representing a given statement in CRM RDF, we will
>>>> have to construct queries which look explicitly for each of the
>>>> possible patterns.
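
A small sketch of what "looking explicitly for each pattern" could amount to, in plain Python. It assumes two patterns: a direct rdfs:label, and a path through an Appellation node whose string hangs off a hypothetical "crm:has_symbolic_content" property. How the long path should actually end is still undecided in this thread, so that property name is purely an assumption, as are the example nodes.

```python
RDFS_LABEL = "rdfs:label"
P1 = "crm:P1_is_identified_by"            # illustrative spelling of P1
HAS_CONTENT = "crm:has_symbolic_content"  # hypothetical; not yet agreed


def names_of(triples, entity):
    """Collect names of `entity` from both representation patterns."""
    # Pattern 1: the name is an inline literal on the entity itself.
    direct = {o for s, p, o in triples if s == entity and p == RDFS_LABEL}
    # Pattern 2: the entity points at an Appellation node holding the string.
    appellations = {o for s, p, o in triples if s == entity and p == P1}
    via_path = {o for s, p, o in triples
                if s in appellations and p == HAS_CONTENT}
    return sorted(direct | via_path)


triples = [
    ("ex:place/1", RDFS_LABEL, "Heraklion"),
    ("ex:place/1", P1, "ex:appellation/1"),
    ("ex:appellation/1", HAS_CONTENT, "Candia"),
    ("ex:place/2", RDFS_LABEL, "Chania"),
]

print(names_of(triples, "ex:place/1"))
```

Each additional permitted representation adds another clause of this kind to every name query.
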
>>>>> The problem is, that the RDF literals do have meaning beyond being
>>>>> symbol sequences.
>>>> Insofar as they have such meaning, I would argue that we define it
>>>> (i.e. that meaning) by the CRM context in which we place the
>>>> string/literal value.  I think there is a danger that we could
>>>> over-think this problem.
>>>> Richard
>>>>> The punning does not introduce the problem. With or without, the
>>>>> queries have to cope with names in either form.
>>>>> This holds similarly for space primitives and large geometry
>>>>> files, for short texts and equivalent files etc.
>>>>> Opinions?
>>>>> Best
>>>>> Martin
>>>> -- 
>>>> *Richard Light*
>>>> _______________________________________________
>>>> Crm-sig mailing list
>>>> Crm-sig at ics.forth.gr
>>>> http://lists.ics.forth.gr/mailman/listinfo/crm-sig
>>> -- 
>>> --------------------------------------------------------------
>>>  Dr. Martin Doerr              |  Vox:+30(2810)391625        |
>>>  Research Director             |  Fax:+30(2810)391638        |
>>>                                |  Email: martin at ics.forth.gr |
>>>                                                              |        
>>>                Center for Cultural Informatics               |
>>>                Information Systems Laboratory                |
>>>                 Institute of Computer Science                |
>>>    Foundation for Research and Technology - Hellas (FORTH)   |
>>>                                                              |
>>>                N.Plastira 100, Vassilika Vouton,             |
>>>                 GR70013 Heraklion,Crete,Greece               |
>>>                                                              |
>>>              Web-site: http://www.ics.forth.gr/isl           |
>>> --------------------------------------------------------------

*Richard Light*