[Crm-sig] Question: How to model a 'file'

Martin Doerr martin at ics.forth.gr
Thu Apr 16 22:05:56 EEST 2020


...in addition, we should think about digital images of printed material 
providing the identity of content for a lost physical item...

On 4/16/2020 9:46 PM, Martin Doerr wrote:
> Dear George,
>
> You are right, this is an open and important question.
>
> I have repeatedly pointed out this issue in CRM-SIG meetings, but with 
> limited response so far.
>
> One part of the discussion had been in CRMInf, about the equivalence 
> of a file with a Proposition Set.
>
> The other part came up when introducing P190 has symbolic content. My 
> remark that this must be discussed did not even make it into the 
> minutes. Maybe I forgot a homework assignment to elaborate on this 
> further. Let me expand a bit here on my current understanding:
>
> The pattern E41 Appellation p190 has symbolic content df:literal "file 
> name value goes here" feeds only the name of the file, which identifies 
> the file, into the Appellation. It could just be an RDF label, because 
> the content of the Appellation is hardly ambiguous. On the other hand, 
> reserving a separate node for the Appellation allows a type "filename" 
> to be assigned to it. But a filename is not a good identifier anyway.
>
> If the Digital Object is represented by a URI, e.g., a DOI, the 
> remaining question is whether or not it resolves to, or can 
> unambiguously be related to, some external content.
>
> If it does, then the identity of this Digital Object should be the 
> "primitive" one, its binary identity. That is, a .pdf and a .doc of 
> the same scientific publication would be different objects; even a 
> change to the embedded metadata of a .doc would make it a different 
> object.
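A minimal Python sketch of what "binary identity" could mean in practice, assuming the identity of a Digital Object is taken to be its exact byte sequence, summarised here by a SHA-256 digest. The hashing choice and the sample byte strings are illustrative assumptions, not part of CIDOC CRM or CRMdig.

```python
import hashlib


def binary_identity(data: bytes) -> str:
    """Identity of a digital object taken as its exact byte sequence.

    Assumption for illustration: a SHA-256 digest stands in for the bytes.
    """
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Hypothetical byte contents standing in for real files.
doc_original = b"Title: On Files\nKeywords: CRM\nBody text ..."
doc_metadata_edit = b"Title: On Files\nKeywords: CRM, files\nBody text ..."  # only embedded metadata changed
pdf_encoding = b"%PDF-1.4\nTitle: On Files\nBody text ..."  # same publication, other format

print(binary_identity(doc_original) == binary_identity(doc_original))       # True: same bytes
print(binary_identity(doc_original) == binary_identity(doc_metadata_edit))  # False: metadata edit
print(binary_identity(doc_original) == binary_identity(pdf_encoding))       # False: different format
```

Under this reading, any byte-level change, however trivial it looks to a human reader, yields a new Digital Object.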
>
> If we mean, however, that the ontological identity is, for instance, 
> that of the equivalence class of possible encodings of one certain 
> publication following Springer rules or the like, then a URI pointing 
> to a binary is misleading, because many files can represent the same 
> publication. The different encodings will both /incorporate/ and 
> /represent/ the respective publication, but neither property 
> identifies the content.
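The equivalence-class reading can be sketched in Python as follows, under stated assumptions: the publication identifiers and the assignment of each file to a publication are hypothetical and would have to be asserted by a curator, because nothing in the bytes alone says which publication a file encodes. SHA-256 again stands in for binary identity.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def binary_identity(data: bytes) -> str:
    # Assumption: a SHA-256 digest stands in for the exact byte sequence.
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Hypothetical curator assertions: (publication URI, bytes of one encoding of it).
assertions: List[Tuple[str, bytes]] = [
    ("ex:publication_42", b"%PDF-1.4 ... body of publication 42 ..."),
    ("ex:publication_42", b"PK\x03\x04 ... body of publication 42 ..."),  # e.g. a .docx container
    ("ex:publication_43", b"%PDF-1.4 ... body of publication 43 ..."),
]


def encoding_classes(pairs: List[Tuple[str, bytes]]) -> Dict[str, Set[str]]:
    """Group binary identities by the publication they are asserted to represent."""
    classes: Dict[str, Set[str]] = defaultdict(set)
    for publication, data in pairs:
        classes[publication].add(binary_identity(data))
    return dict(classes)


classes = encoding_classes(assertions)
print(len(classes["ex:publication_42"]))  # 2 distinct binaries, one publication
print(len(classes["ex:publication_43"]))  # 1
```

A URI naming one member of the class for ex:publication_42 identifies a binary, not the publication; the publication is only reachable through the class as a whole.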
>
> Therefore, a variation (not a subproperty) of P190 should do it. Again 
> we have the problem that we need to form a common superclass with a 
> Primitive Value.
>
> Perhaps, once we have taken the big step and declared some Primitive 
> Values to be IsA Appellations, the most elegant solution would be to 
> form a superclass of E62 String and Digital Object, and raise the range 
> of P190 to it. This would elegantly make clear that E62 String and 
> Digital Object differ only in whether they are inside or outside the 
> KB proper.
>
> If we do that, the range of P190 will again point to a URI, which, in 
> this case, must be either the binary or a lower representation than 
> the level of symbolic specificity given for the domain instance. In 
> any case, we should arrive at a "tangible" binary, and a suitable type 
> to distinguish whether the URI is meant to correspond to a real binary 
> (even if no longer extant!!) or to a higher level may be useful.
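One way to picture the proposed common superclass is as a small Python class hierarchy. SymbolicContent, InlineString, DigitalObjectRef and Level are hypothetical names for illustration only; nothing here is defined in CIDOC CRM. The in-KB versus out-of-KB split follows the paragraph on E62 String and Digital Object, and the level tag plays the role of the suggested type distinguishing a real binary from a higher level.

```python
from dataclasses import dataclass
from enum import Enum


class SymbolicContent:
    """Hypothetical common superclass of E62 String and Digital Object."""

    def in_kb(self) -> bool:
        raise NotImplementedError


@dataclass(frozen=True)
class InlineString(SymbolicContent):
    """Content held inside the KB proper (the E62 String role)."""

    value: str

    def in_kb(self) -> bool:
        return True


class Level(Enum):
    BINARY = "binary"  # URI names one exact byte sequence, extant or not
    HIGHER_LEVEL = "higher_level"  # URI names a more abstract encoding level


@dataclass(frozen=True)
class DigitalObjectRef(SymbolicContent):
    """Content held outside the KB proper, reached via a URI."""

    uri: str
    level: Level

    def in_kb(self) -> bool:
        return False


# Hypothetical P190 assertions under the widened range.
p190 = {
    "ex:title_appellation": InlineString("On Files"),
    "ex:publication_42_pdf": DigitalObjectRef(
        "https://example.org/files/pub42.pdf", Level.BINARY
    ),
}

for subject, content in p190.items():
    print(subject, content.in_kb())
```

The design choice being illustrated is that both kinds of range value answer the same question (what is the symbolic content?) and differ only in where that content is held.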
>
> We should also answer the question of how this translates to analogue 
> content, because we may copy files manually and re-encode them.
>
> After that, we should think about Propositional Objects represented in 
> files...
>
> Any thoughts?
>
> Best,
>
> Martin
>
> On 4/15/2020 8:16 PM, George Bruseker wrote:
>> Dear all,
>>
>> Here is another humble modelling problem for which I don't feel that 
>> there is a commonly agreed and documented answer, although it is a 
>> common question. How do we connect an actual file with the semantic 
>> network? So here is the scenario.
>>
>> I have a file: a Word doc, a JPG image, a PowerPoint. I want to 
>> represent it in CIDOC CRM and connect it to the semantic network, and 
>> do so in a way that would be interoperable with all other well-formed 
>> instances of CIDOC CRM. How do I do that?
>>
>> Well, part of the answer is clear and part is unclear. Regarding the 
>> representation of the fact that there is a digital object, we have 
>> two choices. If we use pure CRMbase then we have
>>
>> E73 p2 has type E55 "Digital Object"
>>
>> If we use CRM extensions then we have
>>
>> D1 Digital Object
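The two choices can be sketched as plain (subject, predicate, object) tuples in Python. The prefixes crm: and crmdig:, the node names, and the helper is_digital_object are illustrative assumptions; a real system would use an RDF store and the published CIDOC CRM and CRMdig namespaces.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Option 1: pure CRMbase, an E73 Information Object typed via P2 has type.
crm_base: List[Triple] = [
    ("ex:file1", "rdf:type", "crm:E73_Information_Object"),
    ("ex:file1", "crm:P2_has_type", "ex:digital_object_type"),
    ("ex:digital_object_type", "rdf:type", "crm:E55_Type"),
]

# Option 2: the CRMdig extension class D1 Digital Object.
crm_dig: List[Triple] = [
    ("ex:file1", "rdf:type", "crmdig:D1_Digital_Object"),
]


def is_digital_object(node: str, graph: List[Triple]) -> bool:
    """True if node is a D1, or an E73 carrying the (hypothetical) digital object type."""
    triples = set(graph)
    if (node, "rdf:type", "crmdig:D1_Digital_Object") in triples:
        return True
    return (
        (node, "rdf:type", "crm:E73_Information_Object") in triples
        and (node, "crm:P2_has_type", "ex:digital_object_type") in triples
    )


print(is_digital_object("ex:file1", crm_base))  # True
print(is_digital_object("ex:file1", crm_dig))   # True
```

Any consumer that must accept both shapes needs a check like this one, which is the kind of divergence a documented best practice would remove.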
>>
>> Great. Now in the semantic network we can relate this in all sorts of 
>> standard ways to other entities (p67 refers to, p129 is about, etc.). 
>> We can use a creation event from CRMbase or a digital machine event 
>> from CRMdig to document when the file was created, by whom, etc. 
>> Super. I can use p1 is identified by E41 Appellation to indicate the 
>> name of that digital object (which may differ from the file name) and 
>> give it a type with p2 has type. All standard and wonderful.
>>
>> I still have to relate the file itself, the actual digital object 
>> that I want my user to be able to find and manipulate, to the 
>> semantic network somehow.
>>
>> How do people tend to do that? I have seen many variations but no 
>> common method.
>>
>> So what is the go-to solution and should it perhaps be documented on 
>> the CIDOC CRM site because it is a really common pattern?
>>
>> I have seen
>>
>> The file = E73: just use the file as the URN of the semantic node. 
>> But then your file must be accessible via that URN, which is often 
>> not the case, and anyhow you probably want to distinguish the 
>> semantic node that 'stands for' the file from the actual file itself.
>>
>> I have seen and used E41 Appellation as a pattern: the D1 or E73 
>> p1 is identified by E41 Appellation p190 has symbolic content 
>> df:literal "file name value goes here". The problem here is that you 
>> then also need to store, somehow, a path by which to reach the file 
>> on some file system.
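A minimal Python sketch of this Appellation pattern, showing where the gap lies: the graph yields the file name, but the storage root has to come from somewhere outside it. The node names, the ex:filename_type concept, and STORAGE_ROOT are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

graph: List[Triple] = [
    ("ex:obj1", "rdf:type", "crmdig:D1_Digital_Object"),
    ("ex:obj1", "crm:P1_is_identified_by", "ex:obj1_name"),
    ("ex:obj1_name", "rdf:type", "crm:E41_Appellation"),
    ("ex:obj1_name", "crm:P2_has_type", "ex:filename_type"),
    ("ex:obj1_name", "crm:P190_has_symbolic_content", "report_2020.docx"),
]

# Hypothetical: the pattern above says nothing about where the file lives.
STORAGE_ROOT = PurePosixPath("/data/archive")


def file_name(node: str, g: List[Triple]) -> Optional[str]:
    """Follow P1 to an Appellation typed as a filename and return its P190 content."""
    for s, p, appellation in g:
        if s == node and p == "crm:P1_is_identified_by" and (
            appellation, "crm:P2_has_type", "ex:filename_type"
        ) in g:
            for s2, p2, content in g:
                if s2 == appellation and p2 == "crm:P190_has_symbolic_content":
                    return content
    return None


def storage_path(node: str, g: List[Triple]) -> Optional[PurePosixPath]:
    """Join the out-of-graph storage root with the in-graph file name."""
    name = file_name(node, g)
    return STORAGE_ROOT / name if name is not None else None


print(storage_path("ex:obj1", graph))  # /data/archive/report_2020.docx
```

If the files move, only STORAGE_ROOT changes, but that configuration lives outside the CRM graph, so the graph alone does not locate the file.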
>>
>> I guess another alternative would be to use p190 has symbolic content 
>> and then throw the file in there as a blob. I don't particularly like 
>> this solution, as I would hope to find strings at the end of p190 and 
>> not blobs.
>>
>> Would a subproperty of p190, 'is encoded in file', perhaps be an 
>> option for the blob solution?
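For comparison, a minimal Python sketch of the blob option, assuming the file is stored as an xsd:base64Binary typed literal under a property named after the 'is encoded in file' proposal. The property name ex:is_encoded_in_file is hypothetical and not part of CIDOC CRM; typed literals are modelled as (lexical form, datatype) pairs.

```python
import base64
from typing import Tuple

# Hypothetical property from the proposal above; not defined in CIDOC CRM.
IS_ENCODED_IN_FILE = "ex:is_encoded_in_file"
BASE64_DATATYPE = "xsd:base64Binary"

# A typed literal is modelled as (lexical form, datatype).
Literal = Tuple[str, str]


def blob_triple(node: str, data: bytes) -> Tuple[str, str, Literal]:
    """Embed the file's bytes in the graph as a base64 typed literal."""
    lexical = base64.b64encode(data).decode("ascii")
    return (node, IS_ENCODED_IN_FILE, (lexical, BASE64_DATATYPE))


def blob_bytes(triple: Tuple[str, str, Literal]) -> bytes:
    """Recover the original bytes from a blob triple."""
    _, predicate, (lexical, datatype) = triple
    if predicate != IS_ENCODED_IN_FILE or datatype != BASE64_DATATYPE:
        raise ValueError("not a blob triple")
    return base64.b64decode(lexical)


data = bytes(range(256)) * 4  # 1024 hypothetical file bytes
t = blob_triple("ex:obj1", data)
print(blob_bytes(t) == data)  # True
print(len(t[2][0]))           # 1368: base64 is about 4/3 the size of the input
```

The round trip works, but the literal is about a third larger than the file and is opaque to anyone reading the graph, which is the concern about finding blobs rather than strings at the end of p190.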
>>
>> Anyhow, maybe there are already better solutions than the ones I lay 
>> out above; I would be interested to hear them. I also think it would 
>> be great to identify the best practice and put it on the main site so 
>> that people follow this strategy consistently.
>>
>> Probably my examples hide multiple use cases requiring different 
>> patterns. Anyhow, what do you think?
>>
>> Best,
>>
>> George
>>
>> _______________________________________________
>> Crm-sig mailing list
>> Crm-sig at ics.forth.gr
>> http://lists.ics.forth.gr/mailman/listinfo/crm-sig
>
>
> -- 
> ------------------------------------
>   Dr. Martin Doerr
>                
>   Honorary Head of the
>   Center for Cultural Informatics
>   
>   Information Systems Laboratory
>   Institute of Computer Science
>   Foundation for Research and Technology - Hellas (FORTH)
>                    
>   N.Plastira 100, Vassilika Vouton,
>   GR70013 Heraklion,Crete,Greece
>   
>   Vox:+30(2810)391625
>   Email:martin at ics.forth.gr   
>   Web-site:http://www.ics.forth.gr/isl  
>

