[Crm-sig] Issue 230 Co-reference

martin martin at ics.forth.gr
Thu Mar 27 11:35:31 EET 2014


Hi Maximilian,

Yes, of course! It is self-evident that you would not state derived links manually.
The process we describe here is based on scholarly insight. Only those links
are stated that result directly from insight and are not already deducible
by transitivity. The minimal number of links is always the same (n-1 for n
references to the same entity). The deductions, i.e. the rest of the n*(n-1),
can be managed by a program, or even calculated at query time from the
original statements.
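A minimal sketch of this idea in Python, assuming co-reference is treated as an equivalence relation (symmetric and transitive) and each reference is identified by a plain string. The names `CoReferenceIndex`, `assert_link`, `co_referring` and `derived_pairs` are hypothetical and not part of the CRM. Only links that are not already deducible are stored; the full set of n*(n-1) ordered pairs is derived at query time with a union-find structure.

```python
from itertools import permutations


class CoReferenceIndex:
    """Hypothetical store of asserted co-reference links (not a CRM API).

    Co-reference is treated as an equivalence relation, so derived links
    are computed from the asserted ones with union-find instead of being
    stored explicitly.
    """

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}
        self.asserted: list[tuple[str, str]] = []  # links stated from scholarly insight

    def _find(self, ref: str) -> str:
        # Register unseen references as singleton sets.
        self._parent.setdefault(ref, ref)
        # Path halving keeps the trees shallow.
        while self._parent[ref] != ref:
            self._parent[ref] = self._parent[self._parent[ref]]
            ref = self._parent[ref]
        return ref

    def assert_link(self, a: str, b: str) -> bool:
        """Store a link only if it is not already deducible by transitivity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a == root_b:
            return False  # derivable, so nothing is stored
        self._parent[root_a] = root_b
        self.asserted.append((a, b))
        return True

    def co_referring(self, ref: str) -> set[str]:
        """All references deduced (at query time) to denote the same entity as `ref`."""
        root = self._find(ref)
        return {r for r in list(self._parent) if self._find(r) == root}


def derived_pairs(index: CoReferenceIndex, ref: str) -> list[tuple[str, str]]:
    """Ordered pairs implied for the group of `ref`: n*(n-1) of them."""
    return list(permutations(sorted(index.co_referring(ref)), 2))


index = CoReferenceIndex()
refs = ["book_1", "book_2", "book_3", "book_4"]
for a, b in zip(refs, refs[1:]):
    index.assert_link(a, b)
index.assert_link("book_1", "book_4")       # already implied, returns False
print(len(index.asserted))                  # 3  (n-1 stored links)
print(len(derived_pairs(index, "book_2")))  # 12 (n*(n-1) derived pairs)
```

For comparison, linking every reference to one shared "unknown object" node would need n links for the same group; both approaches grow linearly, whereas materialising the pairwise network grows as n*(n-1).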

As for the remaining, non-derived links, a "high error rate in manual
curation" does not make sense: we are talking here about primary knowledge
from scholarly research. There is no source for it other than curatorial
knowledge, regardless of whether that knowledge is supported by an automated
"instance matching algorithm", which produces identity assumptions, or is
detected manually.

See also:

 1. Meghini, C., Doerr, M., & Spyratos, N. (2009). Managing Co-reference
    Knowledge for Data Integration. /Proceedings of the 2009 Conference
    on Information Modelling and Knowledge Bases XX/ (pp. 224-244).
    Amsterdam, The Netherlands: IOS Press, ISBN 978-1-58603-957-8 (pdf
    <http://www.ics.forth.gr/_publications/ejc08-final.pdf>).
 2. Doerr, M., Meghini, C., & Spyratos, N. (2007). Leveraging on
    Associations - a New Challenge for Digital Libraries. /In Proc. of
    the First International Workshop on Digital Libraries Foundations, in
    conjunction with the ACM IEEE Joint Conference on Digital Libraries
    (JCDL 2007)/, Vancouver, Canada, 23 June (pdf
    <http://www.ics.forth.gr/_publications/Martin%20Canada%20paper%20-%20Leveraging.pdf>).


See VIAF: over 30% of their co-reference network was created manually.
Only 69% could be done by an algorithm, and that algorithm was itself
manually (!) confirmed to be sufficiently reliable.
Simple references to "unknown objects" miss the point: Who said that 
they are the same?
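To keep that question answerable, each stored link can carry who asserted it and with what degree of belief. The following is a minimal Python sketch, loosely inspired by the idea of E91 Co-Reference Assignment rather than by any official CRM encoding; the class `CoReferenceAssertion`, its fields, the function `who_said_same`, and the actor names are all hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoReferenceAssertion:
    """Hypothetical record of one co-reference assignment (not an official CRM encoding)."""
    actor: str                  # who made the assertion
    belief: str                 # degree of belief, e.g. "possibly", "most likely"
    references: frozenset[str]  # references asserted to denote the same entity


def who_said_same(assertions: list[CoReferenceAssertion],
                  a: str, b: str) -> list[tuple[str, str]]:
    """Return (actor, belief) for every assertion that directly links `a` and `b`."""
    return [
        (x.actor, x.belief)
        for x in assertions
        if a in x.references and b in x.references
    ]


assertions = [
    CoReferenceAssertion("Curator A", "most likely", frozenset({"ref_1", "ref_2"})),
    CoReferenceAssertion("matching algorithm, checked by Curator B", "possibly",
                         frozenset({"ref_2", "ref_3"})),
]
print(who_said_same(assertions, "ref_1", "ref_2"))  # [('Curator A', 'most likely')]
print(who_said_same(assertions, "ref_1", "ref_3"))  # []
```

The second query returns an empty list because the co-reference of `ref_1` and `ref_3` is only deducible by transitivity; nobody asserted it directly, so its provenance is the chain of assertions it was derived from. An anonymous link to an "unknown object" carries no such record.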

I'll take up your point and add this intention to the scope note!

Best,

Martin


On 26/3/2014 10:15, Maximilian Schich wrote:
> Whoever collects co-references: the number of co-reference links
> explodes as n*(n-1) with the number n of references to the reference
> object. Imagine co-reference links between all books citing the Bible.
> This is likely to result in a high error rate, especially in manual
> curation.
>
> Workaround: Use simple references to an "unknown/potential reference 
> object" and put a probability on those links => scales with n.
> Max Schich
> On 2014-03-26 14:26, martin wrote:
>> Dear All,
>>
>> Here is my homework:
>>
>>
>>       E91 Co-Reference Assignment
>>
>> Subclass of: E13 Attribute Assignment
>>
>> Scope note: This class comprises actions of asserting whether two or
>> more particular instances of E89 Propositional Object refer to the same
>> instance of E1 CRM Entity. The assertion is based on the assumption
>> that this was an implicit fact being made explicit by this assignment.
>> Use of this class allows for the full description of the context of
>> this assignment. (MD will write an extension about the levels of
>> belief.)
>>
>> A co-reference assertion may admit a certain degree or strength of
>> belief, such as "possibly", "most likely", etc. This can be modelled
>> using the property /P2 has type/ with a suitable terminology.
>> However, this degree of belief will be common to all statements
>> asserted by one instance of E91 Co-Reference Assignment. Otherwise,
>> the assertion must be broken down into a suitable number of instances
>> with different degrees of belief.
>>
>> If there exists a document describing particular evidence, it can
>> be referred to by using /P used specific object/. Nothing more may
>> be known about the instance of E1 CRM Entity to which the described
>> statements are assumed to refer than the facts expressed by these
>> very statements.
>>
>> Frequently, scholars may wish to contradict a co-reference
>> statement or point out frequent confusions. This can be modelled
>> using the property /P154 assigned non co-reference to/.
>>
>> The property /P155 has co-reference target/ allows for associating an ???
>>
>>
>>
>> In the end, I got confused: the range of P155 can be interpreted as a
>> URI used within the same knowledge base as the instance of E91. Then
>> it would correspond to a co-reference between some text element and
>> the knowledge base in which we implement the CRM, the "local truth".
>> In that case, a single instance of P153 would also make sense, or even
>> just two instances of P155.
>> If we are talking about Linked Open Data, the issue becomes more
>> obscure. We could regard the co-reference to be between some text
>> element and the document the URI resolves to.
>> If, however, someone uses this very URI in another context, the
>> question of co-reference arises again.
>>
>> It appears that we need a construct to refer to the use of a URI
>> within a knowledge base or RDF document as an instance of
>> Propositional Object. If we follow this line, then the interpretation
>> of P155 pointing to a "self co-reference" would be consistent, and
>> any other meaning of referring to a URI would require a
>> contextualization of the URI, which remains to be discussed.
>>
>> Opinions?
>>
>> Best,
>>
>> Martin
>> -- 
>>
>> --------------------------------------------------------------
>>   Dr. Martin Doerr              |  Vox:+30(2810)391625        |
>>   Research Director             |  Fax:+30(2810)391638        |
>>                                 |  Email:martin at ics.forth.gr  |
>>                                                               |
>>                 Center for Cultural Informatics               |
>>                 Information Systems Laboratory                |
>>                  Institute of Computer Science                |
>>     Foundation for Research and Technology - Hellas (FORTH)   |
>>                                                               |
>>                 N.Plastira 100, Vassilika Vouton,             |
>>                  GR70013 Heraklion,Crete,Greece               |
>>                                                               |
>>               Web-site:http://www.ics.forth.gr/isl            |
>> --------------------------------------------------------------
>>
>>
>>
>> _______________________________________________
>> Crm-sig mailing list
>> Crm-sig at ics.forth.gr
>> http://lists.ics.forth.gr/mailman/listinfo/crm-sig


