Dear Martin,

Your observations are, as usual, extremely stimulating, especially those on the observation of a linguistic object, which is a very complex and articulated operation. In our view, the following cases can occur when a linguistic object is observed:

1. I observe some signs on a surface without even realising that it is a text.
2. I understand that it is a text but without understanding its meaning (often without even recognising its language or writing system).
3. I actually read and understand the text.

We agree with your observation that the relationship between these types of observation needs to be better specified. We therefore propose to keep TX5 Reading as the most specific observation (type 3) and to define one or more classes for the other cases of observation, of which TX5 could become a specialisation.
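
Purely as an illustration, the three cases could be arranged as a specialisation chain, sketched here in Python. Only TX5 Reading comes from the current model. The names `SignObservation` and `TextIdentification` are hypothetical placeholders, and treating case 2 as a specialisation of case 1 is an assumption of this sketch, not a settled proposal.

```python
class Observation:
    """Stand-in for the general observation class that TX5 specialises."""


class SignObservation(Observation):
    """Case 1 (hypothetical name): signs are seen on a surface, not yet recognised as text."""


class TextIdentification(SignObservation):
    """Case 2 (hypothetical name): the signs are recognised as a text, meaning not understood."""


class TX5Reading(TextIdentification):
    """Case 3: the text is actually read and understood."""


def most_specific_case(observation: Observation) -> int:
    """Return which of the three cases (1-3) an observation falls under, or 0 for none."""
    if isinstance(observation, TX5Reading):
        return 3
    if isinstance(observation, TextIdentification):
        return 2
    if isinstance(observation, SignObservation):
        return 1
    return 0
```

In this arrangement every TX5 Reading is also an instance of the two more general cases, which is one way to read "TX5 could become a specific one".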

As regards Transcription, for epigraphists this operation has a rather broad meaning that covers various cases, from the “exact” reproduction of the signs, through their stylised rendering, to transliteration into a different script. In the first two cases, transcription does not necessarily imply an understanding of the signs (see, e.g., publications on texts in unknown alphabets such as the one on the Phaistos Disc).

The other ideas and new classes you propose, relating to the other cases, are very intriguing. We are thinking about them, but they probably need a more articulated discussion.

Achille & Francesca

On 6 Sep 2021, at 19:50, Martin Doerr via Crm-sig <> wrote:

Dear All,

I believe that TX5 Reading and TX6 Transcription should be in a different relationship.

In more detail, I propose to rename TX5 Reading to "TX5 Text Recognition", and to separate observation strictly, in ontological terms, from the inferred interpretation of meaning, given that TX5 Reading is declared as a subclass of Observation and TX6 Transcription is not.

Note that one can perfectly "read" a clear text written in a known script, without understanding any word. E.g., I can indeed copy well-written or printed Chinese Han characters without understanding any Chinese, just by knowledge of the relevant structural features. I assume the same holds for cuneiform. Equally, I can copy a Latin inscription without understanding any of the abundant abbreviations. This is indeed the proper observation.

Whether or not the result of this "reading" is documented in the same script and notation is a detail left to the reader. I'd argue, however, that the class TX5 needs a formal output, an instance of E90 Symbolic Object at least, in order to be useful. This is missing in the current model. Transcription in the sense of changing script or notation could be an internal, undocumented intermediate step of the text recognition ("transcribing text recognition", or adequate output properties), or an explicit step after the recognition of the Symbolic Object.

It is obviously true that text recognition typically includes arguments based on understanding. I'd argue that this is not intrinsic to reading, but only applies to texts that are not clearly typed. Strictly speaking, any such process constitutes ERROR CORRECTION and text COMPLETION.

Therefore, I propose a new class "Meaning Comprehension", which would take a recognized text as input and interpret its assumed meaning in plain language, or even as formal propositions. This would be the final stage of the reading process, resulting in an information object. This class may reside in CRMinf or in CRMtex.
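
One way to picture the separation of the two steps is the following minimal Python sketch. It is illustrative only: the function and class names are hypothetical, `SymbolicObject` merely stands in for E90 Symbolic Object, and the glossary lookup is a toy substitute for real scholarly interpretation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SymbolicObject:
    """Stand-in for E90 Symbolic Object: the recognised sign sequence."""
    signs: str


@dataclass(frozen=True)
class InformationObject:
    """Stand-in for the information object carrying the assumed meaning."""
    assumed_meaning: str


def text_recognition(observed_signs: str) -> SymbolicObject:
    """Observation step: record the signs as seen, with no interpretation."""
    return SymbolicObject(signs=observed_signs)


def meaning_comprehension(text: SymbolicObject, glossary: Dict[str, str]) -> InformationObject:
    """Interpretation step: infer an assumed meaning from an already recognised text.

    Signs missing from the (hypothetical) glossary are marked "[?]".
    """
    words = [glossary.get(sign, "[?]") for sign in text.signs.split()]
    return InformationObject(assumed_meaning=" ".join(words))
```

For example, `text_recognition("D M")` succeeds without any knowledge of Latin abbreviations, while `meaning_comprehension` needs a glossary such as `{"D": "Dis", "M": "Manibus"}` to arrive at "Dis Manibus".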

From "Text Recognition", "Transcription" and "Meaning Comprehension" we can then build combined and short-cut constructs, which would include "error correction", "resolution of recognition ambiguity" and "missing part completion", as useful in practice for representing typical scholarly defaults.

I'd argue that resolution of linguistic ambiguity using scholarly arguments about the likely context of reference of the text constitutes a scholarly interpretation process after "reading", regardless of whether error correction and completion used such arguments.

We need these separations in order to create a clear interface to "Belief Adoption" in CRMinf, which is about the assumed real-world truth of statements in texts.


All the best,


Dr. Martin Doerr
Honorary Head of the
Center for Cultural Informatics
Information Systems Laboratory
Institute of Computer Science
Foundation for Research and Technology - Hellas (FORTH)
N. Plastira 100, Vassilika Vouton,
GR70013 Heraklion, Crete, Greece