[Crm-sig] ISSUE: E13 Attribute Assignment

Martin Doerr martin at ics.forth.gr
Tue Mar 27 18:54:12 EEST 2018


Dear Maximilian,

I was a bit too brief.

I was referring to a knowledge graph in the theoretical sense, as a CRM 
instance, without a quad feature and without regarding it as a Named 
Graph. Of course, current systems have these features built in.

The reason is that only by separating the graph from the data about the 
graph, be it as a quad, as a Named Graph, or on paper, can we understand 
the logical problem.

If we look at the "metagraph" that is about a graph, it is again a 
simple graph. It does not have a provenance. Its provenance
could be another graph about this graph, and we are still at the same 
point.
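
A minimal sketch of this regress, under stated assumptions: the names 
Graph, build_chain and without_provenance are hypothetical and not part 
of the CRM or of any RDF library. Each metagraph is itself a plain graph, 
so however many levels are stacked, exactly one graph remains that no 
other graph describes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Graph:
    """A plain graph; `about` points to the graph it describes, if any."""
    name: str
    about: Optional["Graph"] = None


def build_chain(base: Graph, levels: int) -> List[Graph]:
    """Stack `levels` metagraphs on top of `base`, each describing the one below."""
    chain = [base]
    for _ in range(levels):
        below = chain[-1]
        chain.append(Graph(name=f"meta({below.name})", about=below))
    return chain


def without_provenance(graphs: List[Graph]) -> List[str]:
    """Return the names of graphs that no other graph in the list describes."""
    described = {g.about.name for g in graphs if g.about is not None}
    return [g.name for g in graphs if g.name not in described]


# However many levels we add, one graph is always left undescribed.
for levels in range(4):
    print(levels, without_provenance(build_chain(Graph("G"), levels)))
```

Adding a level only moves the undescribed graph one step up; it never 
removes it.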

This means that ultimately we have to rely on curators of information 
who mediate trust in it. By "trusted source",
I do not mean a logical "TRUE"; I mean that someone gives me information 
he himself trusts because of further knowledge,
and I trust that it is his best knowledge, regardless of whether it turns 
out to be wrong later.

We may talk about degrees of trust and plausibility, but those are only 
variants of trust. Trust is not a statistical probability.
If I get an image of a Duerer, I have no bl.. idea if it is a fake or 
not. I would not assume that it is a fake if I can trace its 
provenance to a museum. I would check whether the website is suspicious 
or not.

The epistemological chain will go back to primary evidence. I assume 
that the museum knows how to connect the image to the original paper, 
that the paper has a credible chain of owners back to Duerer, or has 
been examined by analytical methods to be genuine.  Arguing about my 
image, I imply all this good practice was applied. Even if this is all 
described in the document, without the connection to the human curator 
there is no knowledge in it.

So, point b) means that the solution to the descriptive chain is that at 
the end there is a trusted source, i.e. a source I have no good reason to 
question, which may make errors, but follows a good practice of knowledge 
creation.

Your point c) below looks only at one level of metadata. If, e.g., I 
have an old physical book from the 15th century in a library, I would 
rely on the author information and date in it. But I would simultaneously 
rely on the physical book exhibiting features that are compatible 
with the age and I would rely on the library curators not having 
smuggled in fakes over time. I would compare with other copies etc. Even 
if a fake is detected, I would rely on this human/material provenance to 
hypothesize how a fake could have come in. It will still be a form of 
(less ;-)) trusted source.

So, my point is, the trusted source can be directly responsible for the 
content, or it may be responsible only for the metadata, or the 
metametadata, etc.

Each indirection is expected to rely on good practice and hence provide 
trust in the level below, i.e.,
the metametadescription about the metadescription, the metadescription 
about the description. Each level should describe where its confidence 
comes from.
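
The chain of levels can be sketched as follows. This is only an 
illustration under assumptions: Level, trust_anchor and all field names 
are hypothetical, chosen for this example. Each level states where its 
confidence comes from, and following the chain upward must end at a 
human curator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Level:
    """One level of description; all field names are illustrative assumptions."""
    label: str
    confidence_from: str                    # stated reason for trusting this level
    described_by: Optional["Level"] = None  # the next level up, if any
    curator: Optional[str] = None           # human responsible at the top level


def trust_anchor(level: Level) -> str:
    """Follow the chain upward and return the human curator at its end."""
    current = level
    while current.described_by is not None:
        current = current.described_by
    if current.curator is None:
        raise ValueError(f"chain ends at '{current.label}' without a human curator")
    return current.curator


# The 15th-century book example, with invented confidence statements.
metameta = Level("metametadescription", "library acquisition practice",
                 curator="library curator")
meta = Level("metadescription", "catalogue record", described_by=metameta)
desc = Level("description", "author and date printed in the book",
             described_by=meta)

print(trust_anchor(desc))  # prints "library curator"
```

A chain whose top level names no curator raises an error here, which 
mirrors the point that the chain cannot end in data alone.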

At the end, there is always a (living) human being connecting data with 
the real world. In our information systems, we must keep that curation 
chain. No other information can save us from fake news, once it can be 
spread and multiplied without limits.

Or not? :-)

Best,

martin




On 3/24/2018 6:15 PM, Maximilian Schich wrote:
>
> Dear Martin,
>
> My "recommendation" was just putting into question an aspect of 
> Florian's suggestion, and not meant to replace it in a final way.
>
> Regarding your points: The practical cases I am familiar with would 
> use the E13 on the whole triple, i.e. the link/property-type including 
> a specific source node and a particular target node. This means either 
> the triple is stored as a quad, or the triple carries an ID or 
> address, so one can refer to it. TEI standoff markup would be another 
> practical example.
>
> As an art historian/archaeologist and hopeless class-conceptualist, I 
> do not believe in trusted sources. Everything comes with a probability. :)
>
> a) Self-description is of course never perfect, yet depends on the 
> density of information: A signature, as in "Martin performed 
> Attr.Ass.512" or "A[lbrecht] D[ürer] fecit" is only one form of 
> (self-)descriptive information, which is as good or bad as anything 
> else, internal or external. Of course, it is better to see Dürer in 
> detail or to hear Anne-Sophie Mutter actually play, rather than 
> relying on a verbal statement of attribution.
>
> b) I don't understand: Any graph-like description of a graph 
> constitutes a forest of graphs with the original graph, i.e. a 
> disconnected graph that contains the description of itself. If we 
> generalize that statement to symbolic representation, you are in 
> essence saying description is impossible.
>
> c) I think in most cases "description within a set of information 
> about its provenance" is the only thing we have. There is no default 
> up the next source of source. Evolutionary biologists, material 
> scientists, art historians working on renaissance drawings, and 
> scholars of ancient manuscripts all rely on hysteresis, i.e. history 
> of the object contained within the object. There never was a 
> comprehensive DNS for organisms, manuscript fragments, or paintings, 
> and there never will be. For the same reason we need to embed 
> provenance in our data sets. Probably we should even block-chain it in 
> with enough information, so we don't have to rely on simple signatures.
>
> */To make my case much simpler and shorter: "All of Wikipedia 
> includes the full edit-history"./* This is how it is produced, and how 
> it should be analyzed. The same standard should apply to any cultural 
> heritage data set. Any other practice would be like citing monographs 
> without pagination. This is why E13 is really central, particularly in 
> multi-authored data sets.
>
>
> On 2018-03-24 15:01, Martin Doerr wrote:
>> Dear Maximilian,
>>
>> This makes sense to me, but I do not agree with your recommendation 
>> as a general rule.
>>
>> There is a fundamental epistemological problem, which has nothing to 
>> do with quantitative evidence. The latter,
>> by the way, cannot detect an endless recursion anyhow, because people 
>> would break it.
>>
>> The ramifications of this breaking are huge, as can be seen by your 
>> answer.
>>
>> Let us start with a more fundamental construct, a simple 
>> CRM-compatible "knowledge graph" with one attribute:
>>
>> "Martin" has residence "Heraklion".
>>
>> Using an E13,
>> "Martin" performed "Attr.Ass.512". has type: "has residence"
>>   assigned: "Heraklion"
>>   assigned to: "Martin"
>> Now, reading it, I know whom the knowledge graph wants me to believe 
>> said "has residence", but I do not know who introduced these three 
>> additional attributes.
>> So, I reify the three new attributes with 9 more, and I am still none 
>> the wiser, nor will I be after any further iteration.
>>
>> If I know that the knowledge graph *was produced by Martin as a 
>> trusted source as a whole*, I do not need the E13 in it.
>>
>> Then, I can add metadata to the whole knowledge graph, e.g., as a 
>> Named Graph or "context" or on paper etc., but I am
>> still in the same situation: who produced these metadata, are they 
>> trusted?
>>
>> Hence, I conclude three things:
>>
>> a) There is no completely self-descriptive information. The trusted 
>> source ("sender of the message" in Claude Shannon's sense) lies 
>> outside the information unit. It must always be the default. In order 
>> to characterize the default, we need semantics  different from E13.
>>
>> b) It makes no sense to describe the default in the graph itself.
>>
>> c) Any description within a set of information about its provenance 
>> pushes the level where the default applies up to the next source of 
>> source. Hence, if a team decides to register actions of their 
>> members, the team as a whole pushes the default up to the trust in 
>> the registration, rather than in what was primarily registered. I see 
>> all your examples as practices of this kind. There may be many reasons to 
>> do this, but in other cases also not to do it.
>>
>> Such a rule cannot replace understanding the basic epistemology, 
>> which is always the same.
>>
>> Does that make sense:-)?
>>
>> All the best,
>>
>> Martin
>>
>>
>>
>> On 3/24/2018 12:10 PM, Maximilian Schich wrote:
>>>
>>> Dear Florian and all,
>>>
>>> Based on quantitative evidence, I'd object to the following part of your suggestion:
>>>
>>> "This fact must not individually be registered for all instances of properties provided by the maintaining team, because it */would result in an endless recursion/* of whose opinion was the description of an opinion."
>>>
>>> => This would only be correct if the maintaining team would add additional E13 Attribute Assignments to their own E13 statements. Otherwise, */in practice, the data would (a) more or less double, plus (b) a non-exploding truncated tail of additional E13 correction statements, where the maintaining team corrects itself./*
>>>
>>> => Example for (a): In large data sets such as the "Census of Antique Works of Art and Architecture" the "record history" approximately doubles the data set as a whole. Note: The Census "record history" is the place where the maintaining team records their own E13-like /attribute assertions/ (aka /assertions of database record authorship/). It is important to point out that the record history, where an internal database curator implicitly claims authorship for, say, an artist attribution in the Census, is conceptually in no way different from an external author providing a differing opinion (both usually have PhDs in art history). Ergo there are two default cases: (1) The internal database curator claims authorship for a */direct assertion/* via a single E13 Attribute Assignment in the record history; (2) The internal database curator claims authorship for a */cited assertion/* via an E13 Attribute Assignment in the record history on top of the */original assertion/* that connects the stated opinion to its external source via another E13 Attribute Assignment.
>>>
>>> => Example for (b): In large data sets where the multiplicity of opinion is recorded, the number of competing assertions including both record history and external opinions, is usually characterized by a tailed frequency distribution*. This usually means in practice that the data set stays in the same order of magnitude relative to the case where the maintaining team decides to follow one of the alternative assertions.**
>>> * The frequency distributions would look similar to Schich 2010 "Revealing Matrices" Fig. 14-8. Indeed, my pre-publication version of this figure had a column for the record history, not included in the article, as the networks were too large for the preceding figure.
>>> ** Yes, we should expect some "assertion cascades" to be exceedingly large, but we can also expect the median cascade length to be very short, between 1 and 2 in cultural heritage databases based on personal experience, and still short in very large scale cases, such as spreading rumors on the Web (cf. Friggeri et al. 2014 "Rumour cascades" Fig. 5).
>>>
>>> => The recommendation, in my opinion, should be: */By default, the maintaining team should establish authorship by 
>>> adding an E13 Attribute Assignment to each assertion in the data 
>>> set. Yet, the maintaining team should _only_ add an E13 Attribute 
>>> Assignment to their own E13 Attribute Assignments in the case of 
>>> discernible modifications, updates, or corrections. To avoid comment 
>>> cascades, such alternative E13 statements should be done in parallel(!) not recursively.*** This recommended procedure 
>>> establishes a record history and granular ability to cite data set 
>>> contributions by author, yet also avoids a recursive explosion of 
>>> E13 statements./*
>>> *** Parallel means that E13 statements in the internal record history should never be about statements in the record history itself. This can easily be maintained with users being logged in or recorded via IP and timestamp. Working example: The Wikipedia edit history.
>>>
>>>
>>> Hope this makes sense.
>>>
>>> Best, Max
>>
>
> -- 
> *Dr. Maximilian Schich*
> Associate Professor, Arts & Technology
> Founding member, The Edith O'Donnell Institute of Art History
>
> */The University of Texas at Dallas/*
> 800 West Campbell Road, AT10
> Richardson, Texas 75080 – USA
> US phone: +1-214-673-3051
> EU phone: +49-179-667-8041
>
> www.utdallas.edu/atec/schich/ <http://www.utdallas.edu/atec/schich/>
> www.schich.info <http://www.schich.info/>
> www.cultsci.net <http://www.cultsci.net/>
>
> Current location: Dallas, Texas
>


-- 
--------------------------------------------------------------
  Dr. Martin Doerr              |  Vox:+30(2810)391625        |
  Research Director             |  Fax:+30(2810)391638        |
                                |  Email: martin at ics.forth.gr |
                                                              |
                Center for Cultural Informatics               |
                Information Systems Laboratory                |
                 Institute of Computer Science                |
    Foundation for Research and Technology - Hellas (FORTH)   |
                                                              |
                N.Plastira 100, Vassilika Vouton,             |
                 GR70013 Heraklion,Crete,Greece               |
                                                              |
              Web-site: http://www.ics.forth.gr/isl           |
--------------------------------------------------------------