[Crm-sig] P72 has Language

Franco Niccolucci franco.niccolucci at gmail.com
Tue Oct 15 04:50:16 EEST 2019


Dear all,


having somehow started this discussion on a hot August evening, let me remind you that the initial question was:

“When describing biographical information [in an archive] it’s common to state that some person was fluent in some language, or languages, apart from his/her native one. Using current archival description standards [ISAD(G) 3.2.2; EAD <bioghist>] this is represented within a text, usually a very long text string with information of distinct natures. So far we have been able to decompose the different elements and represent them adequately as instances of CIDOC-CRM classes and link them through the suitable properties.
We cannot link a Person (E21) to a language (E56), nor can we use multiple instantiation, as has been suggested in other cases (http://www.cidoc-crm.org/Issue/ID-258-p72-quantification), because Person (E21) and Linguistic Object (E33) are disjoint.”

I understand these bios consist of a text, and metadata are added to it as instances of various CIDOC-CRM classes. The question was how to indicate in such metadata the knowledge of a language as reported in the bio: so not a real quality of the person, but a documented fact. My suggestion was to use E74 Group. I always prefer to use what is already available and avoid the unnecessary proliferation of classes and properties; in my opinion there are already (more than) enough. But in doing so I try to maximize expressiveness, as otherwise one class (E1 CRM Entity) and one property (P2 has type) would be sufficient for the whole world: P2 is not a jack-of-all-trades.

Reportedly, the Group solution seemed to please the person who asked the question.
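
As a rough illustration of the Group approach (a sketch under assumptions, not a worked-out model): a person is recorded as a member of an E74 Group via P107 has current or former member. How the group is tied to the language is not spelled out in this thread; the sketch assumes P2 has type pointing at an E56 Language instance, which is possible because E56 Language is a subclass of E55 Type. All identifiers (`group:latin_speakers`, `lang:latin`, `person:1`, `person:2`) and the helper function are hypothetical.

```python
# Toy in-memory triple list: (subject, predicate, object).
# Identifiers are made up for illustration only.
triples = [
    ("group:latin_speakers", "rdf:type", "crm:E74_Group"),
    # Assumption: the group is characterised by the language via P2 has type.
    ("group:latin_speakers", "crm:P2_has_type", "lang:latin"),
    ("lang:latin", "rdf:type", "crm:E56_Language"),
    ("group:latin_speakers", "crm:P107_has_current_or_former_member", "person:1"),
    ("person:1", "rdf:type", "crm:E21_Person"),
    ("person:2", "rdf:type", "crm:E21_Person"),
]


def members_of_groups_typed(language):
    """Return members of every group whose P2 type is the given language."""
    groups = {s for s, p, o in triples
              if p == "crm:P2_has_type" and o == language}
    return {o for s, p, o in triples
            if p == "crm:P107_has_current_or_former_member" and s in groups}


print(sorted(members_of_groups_typed("lang:latin")))
```

Note that the person reaches the language only through the intermediate group node, which is the indirection discussed further down in the thread.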

I don’t know if the "language spoken" is information usually taken into account in CH; but in this case it was by the archivist, otherwise no question would have been asked.

Best regards

Prof. Franco Niccolucci
Director, VAST-LAB
PIN - U. of Florence
Scientific Coordinator
ARIADNEplus - PARTHENOS

Editor-in-Chief
ACM Journal of Computing and Cultural Heritage (JOCCH) 

Piazza Ciardi 25
59100 Prato, Italy


> On 14 Oct 2019, at 22:39, Detlev Balzer <db at balilabs.de> wrote:
> 
> Dear George, Martin,
> 
> this discussion made me curious whether or not I can confirm George's assertion that such statements are common in the cultural heritage field.
> 
> EAC-CPF does have a language element, which is, however, only used to indicate in which language the name of a person or corporation is expressed. 
> 
> GND, the authority file for libraries in German-speaking countries, has a Language entity which is used for making statements about the "field of study" of a person. Other predicates for the person-language pair of entities do occur, but these are obvious data entry errors.
> 
> Having extracted person-related data from a dozen or more cultural heritage projects, I don't remember any example where languages spoken or known by somebody were recorded in any sense other than as relating to a documented activity; none referred to the (possibly un-instantiated) capacity of the person.
> 
> Of course, this is just an observation that doesn't prove anything. Personally, I would tend towards Martin's view that there is little, if anything, to be gained by defining such kind of statement in a reference model such as the CIDOC CRM.
> 
> Best wishes,
> Detlev
> 
>> On 14 October 2019 at 19:45, George Bruseker <george.bruseker at gmail.com> wrote:
>> 
>> 
>> Dear Martin,
>> 
>> The conversation began with a use case from an archive. I would just note that
>> this is also found in all the projects I work on for memory institutions.
>> They find it in scope, so looking further afield for what anthropologists
>> do doesn't seem like a necessary step? Though highly fascinating!
>> 
>> Best
>> 
>> George
>> 
>> 
>> 
>> On Mon, Oct 14, 2019, 6:58 PM Martin Doerr <martin at ics.forth.gr> wrote:
>> 
>>> Dear George, All,
>>> 
>>> As a second thought:
>>> 
>>> I think documentation formats such as LIDO are an adequate place to add
>>> such useful properties to characterize items in a more detailed way, we
>>> would not put in the CRM analytically. Shapes, colors etc. being typical
>>> examples.
>>> 
>>> Question: Are there formats from the archival world that are used to describe
>>> the languages people speak? EAC-CPF?
>>> Libraries are interested in the languages someone publishes in, not
>>> speaking.
>>> 
>>> What are the anthropologists registering? Would they be interested in
>>> languages learned at school, or rather in the language used for
>>> communication in a typical group? Would they document people being
>>> incapable of communicating in that group?
>>> Or just infer language via group?
>>> 
>>> How to distinguish native speakers from non-native?
>>> 
>>> Would historians make cases of people that could not communicate in a
>>> given language, with societal effects?
>>> 
>>> What about illiterate people? Speaking, not writing...? Maintaining oral
>>> history with great precision, etc.
>>> 
>>> What about creoles ?
>>> 
>>> Best,
>>> 
>>> Martin
>>> 
>>> On 10/14/2019 7:33 PM, Martin Doerr wrote:
>>> 
>>> 
>>> Dear George,
>>> 
>>> The first principle of all is: are there relevant queries that need that
>>> property for integrating disparate sources which indeed provide such data,
>>> and is that research one we would like to support with the CRM?
>>> 
>>> Second, using p2 on E21 does the job, doesn't it? What is the added value
>>> of "knows language"?
>>> 
>>> Next principle, keep the ontology small. Querying 1000 properties is
>>> already more than anybody can keep in mind. Each additional property has an
>>> implementation cost. We need strong arguments for relevance.
>>> 
>>> It has been the most important success factor of the CRM to keep the
>>> ontology small and still expressive enough. If we lose this discipline, we
>>> will lose the whole project.
>>> 
>>> Finally, in the CRM we do not repeat the way information systems typically
>>> document, but have always tried to find a more fundamental
>>> representation. With that argument, we could never have introduced events.
>>> They did NOT appear in any of the typical systems at that time. It is a
>>> principle *not* to model all the valuable description elements which are
>>> relevant for characterizing an item but do not create interesting links
>>> across resources.
>>> 
>>> I did not say that it is a personal opinion that someone speaks a
>>> language. I said, this is observable. I document: Franco has spoken Latin,
>>> repeatedly? But talking about skills is another level: it introduces a
>>> quality which is hard to objectify, as Franco has pointed out. Actually,
>>> it is a typical classification problem, with all its boundary case
>>> questions, and the CRM is about relations between particulars.
>>> 
>>> So, what is the *added value* against p2, and what are the typical
>>> research data and typical research questions for *integrating* such data,
>>> that cannot be answered with P2?
>>> 
>>> Best,
>>> 
>>> martin
>>> 
>>> 
>>> 
>>> 
>>> On 10/14/2019 4:24 PM, George Bruseker wrote:
>>> 
>>> Dear Martin,
>>> 
>>> Which is CEO’s proposition that you support? It gets lost in the string.
>>> Do you mean that a) a person speaking a language means being part of a
>>> group, or b) using the p2 on E21 and then make types for ’Speakers of...'
>>> 
>>> I am (still, and very much) a supporter of a new property ‘knows
>>> language’. I do not think that the group solution works because of the
>>> identity criteria of groups. I also don’t think the event solution is
>>> necessary (another suggestion that has floated in this conversation). It is
>>> often the case that for a person we do not know the events of their
>>> acquisition or use of a language or a skill, but we do have the proposition
>>> that they had that language or skill! I also don’t support the ‘English
>>> Speakers’ type solution, since it provides a different URI than the URI for
>>> ‘English’ and forces additional, obscure modelling.
>>> 
>>> Another CIDOC CRM principle is to model at the level of knowledge that is
>>> typically present in information systems. Again, I think the present case
>>> (people know languages) is identical to the case of
>>> 
>>> E22 consists of E57 Material
>>> 
>>> This is a typical piece of knowledge held about an object. It would be
>>> obtuse to insist that one should create an event node to indicate the
>>> manner of this material becoming the constituting material of the object
>>> when we don’t know this fact. This is why CRM represents such binary
>>> relations, because they are real, they are a level of knowledge and they
>>> are observable.
>>> 
>>> If someone has entered into an information system George: English, Pot
>>> Making, it is unlikely that what they want to reconstruct are instances of
>>> me using English or performing Pot making. Rather they are interested that
>>> there is an individual which has a particular formation which means that he
>>> knows language x, knows skill x. This information is probably used in an
>>> actual integration to connect an instance of E21 via an instance of E56
>>> Language to, for example, instances of E33 that use the same E56.
>>> 
>>> It would seem we need some sort of hierarchy in the principles which can
>>> also be conflicting.
>>> 
>>> 
>>> My approach is not documenting skills. My approach is documenting
>>> facts, rather than potentials. I take notice and may document that you
>>> spoke Latin, as I have done last time at school. I have a document stating
>>> my grade in Latin at high school.  My grade at high school confirms a set
>>> of years of continued successful lessons, not that I could understand much
>>> Latin now;-).
>>> Speaking a language can be documented as an extended (observed) activity,
>>> as in FRBRoo.
>>> 
>>> 
>>> It may be, but is it typically? I have never seen an information system,
>>> especially in a museum context, that would.
>>> 
>>> For instance, someone writing books in particular language. This falls
>>> under any kind of extended activity not further specified, such as an
>>> artist using a technique for some time, and avoids transforming actual
>>> activities into potentials.
>>> 
>>> We can document someone's documented opinion about a potential of a
>>> person, as an information object.
>>> 
>>> 
>>> That would make this information mostly unusable however. If our goal is
>>> to functionally use the observation person x speaks language y, then it
>>> needs to be semantically represented and not made a string.
>>> 
>>> 
>>> In the "Principles for Modelling Ontologies" we refer:
>>> "7.2 Avoid concepts depending on a personal/ spectator perspective"
>>> 
>>> This could be elaborated more. In the CRM, we do not model concepts
>>> "because people use them", but because they can be used to integrate
>>> information related to them with URIs.  Therefore, your arguments and what
>>> I wanted to say is, "skill" is a bad concept for integration. What should
>>> be instantiated are the observable activities, which may or may not
>>> indicate skills.
>>> 
>>> 
>>> I don’t see that this principle applies. It is not a personal perspective
>>> that someone speaks a language, any more than it is a personal perspective
>>> that an object is constituted of a material. This fact can be documented
>>> and observed. Someone else can come and do the same. Don’t believe Franco
>>> can speak Latin? Watch him and see if he can. When someone writes this in an
>>> information system, they probably typically mean: some evidence leads me to
>>> assert that Person x knows language y. They do not mean to say that at some
>>> point in the past he learned it, or that at some point he performed it.
>>> 
>>> In the case of documenting that someone knows a language this can be used
>>> practically to integrate using URIs just in case we use the same URI for
>>> English that we use to describe a document and that we use to describe the
>>> knowledge of the individual
>>> 
>>> E21 knows language E56 Language URI:AA
>>> E33 has language E56 Language URI:AA
>>> 
>>> answers the query, who in this graph knew the language this document was
>>> written in.
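
The integration query described above can be sketched over a toy list of triples. This is only an illustration under stated assumptions: `ex:knows_language` stands in for the proposed (not existing) property, P72 has language is the existing CRM property on E33 Linguistic Object, and the person and document identifiers, the second language `URI:BB`, and the helper functions are all hypothetical.

```python
# Toy in-memory triple list: (subject, predicate, object).
LANG_AA = "URI:AA"  # the shared language URI from the example above

triples = [
    ("person:1", "rdf:type", "crm:E21_Person"),
    ("person:1", "ex:knows_language", LANG_AA),    # hypothetical property
    ("person:2", "rdf:type", "crm:E21_Person"),
    ("person:2", "ex:knows_language", "URI:BB"),   # hypothetical property
    ("doc:letter1", "rdf:type", "crm:E33_Linguistic_Object"),
    ("doc:letter1", "crm:P72_has_language", LANG_AA),
    (LANG_AA, "rdf:type", "crm:E56_Language"),
    ("URI:BB", "rdf:type", "crm:E56_Language"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the list."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the list."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def who_knew_language_of(document):
    """People whose known language matches a language of the document."""
    people = set()
    for lang in objects(document, "crm:P72_has_language"):
        people |= subjects("ex:knows_language", lang)
    return people


print(sorted(who_knew_language_of("doc:letter1")))  # ['person:1']
```

The join works only because both statements point at the same language URI, which is the point being argued here.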
>>> 
>>> Functionally, the issue for me is: is there a good reason against adding
>>> a binary property on person which can indicate their knowledge of a language
>>> and connect to a well-known URI for that language?
>>> 
>>> Best,
>>> 
>>> George
>>> 
>>> 
>>> --
>>> ------------------------------------
>>> Dr. Martin Doerr
>>> 
>>> Honorary Head of the
>>> Center for Cultural Informatics
>>> 
>>> Information Systems Laboratory
>>> Institute of Computer Science
>>> Foundation for Research and Technology - Hellas (FORTH)
>>> 
>>> N.Plastira 100, Vassilika Vouton,
>>> GR70013 Heraklion,Crete,Greece
>>> 
>>> Vox:+30(2810)391625
>>> Email: martin at ics.forth.gr
>>> Web-site: http://www.ics.forth.gr/isl
>>> 
>>> 
>>> _______________________________________________
>>> Crm-sig mailing list
>>> Crm-sig at ics.forth.gr
>>> http://lists.ics.forth.gr/mailman/listinfo/crm-sig
>>> 
>>> 