[Crm-sig] NEW ISSUE: revise TX5 Reading versus TX6 Transcription

Martin Doerr martin at ics.forth.gr
Mon Sep 6 20:50:04 EEST 2021

Dear All,

I believe that TX5 Reading and TX6 Transcription should be in a different 

In more detail, I propose to rename TX5 Reading to "TX5 Text 
Recognition", and to separate observation strictly, in ontological terms, 
from inferred interpretation of meaning, since TX5 Reading is declared as 
a subclass of Observation, and TX6 Transcription is not.

Note that one can perfectly "read" a clear text written in a known 
script, without understanding any word. E.g., I can indeed copy 
well-written or printed Chinese Han characters without understanding any 
Chinese, just by knowledge of the relevant structural features. I assume 
the same holds for cuneiform. Equally, I can copy a Latin inscription 
without understanding any of the abundant abbreviations. This is indeed 
the proper observation.

Whether the result of this "reading" is documented in the same script 
and notation or not is a detail up to the reader. I'd argue, however, 
that the class TX5 *needs* a formal output, an instance of E90 Symbolic 
Object at least, in order to be useful. This is missing in the current 
model. Transcription in the sense of changing script or notation could 
be an internal, undocumented intermediate step of the text 
recognition ("transcribing text recognition", or adequate output 
properties), or an explicit step after the recognition of the Symbolic 
Object.
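
To make the point about a formal output concrete, here is a minimal 
sketch in Python, not a proposal for actual CRMtex properties. All class, 
field and function names, the identifier "inscription-1" and the sample 
characters are hypothetical and chosen only for illustration. It shows a 
text recognition that yields a symbolic object, followed by an explicit 
transcription that changes script symbol by symbol without any 
interpretation of meaning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolicObject:
    """Illustrative stand-in for an instance of E90 Symbolic Object."""
    symbols: str
    script: str


@dataclass(frozen=True)
class TextRecognition:
    """Illustrative stand-in for the proposed TX5 Text Recognition."""
    observed_carrier: str    # hypothetical identifier of the inscribed object
    output: SymbolicObject   # the formal output argued for in the text


def transcribe(source: SymbolicObject, target_script: str,
               table: dict[str, str]) -> SymbolicObject:
    """Change script symbol by symbol; unmapped symbols are kept as they are."""
    converted = "".join(table.get(ch, ch) for ch in source.symbols)
    return SymbolicObject(converted, target_script)


# Made-up sample: three Greek capitals recognized on a carrier.
recognition = TextRecognition("inscription-1", SymbolicObject("ΑΒΓ", "Greek"))
latin = transcribe(recognition.output, "Latin", {"Α": "A", "Β": "B", "Γ": "G"})
```

Here the transcription is modelled as an explicit step after recognition; 
the internal, undocumented variant would simply record the Latin 
symbolic object as the output of the recognition itself.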

It is obviously true that text recognition typically includes arguments 
of understanding. I'd argue that this is *not* intrinsic to reading, 
but only applies to texts not clearly typed. Strictly speaking, any such 
process constitutes *ERROR CORRECTION* and text *COMPLETION*.

Therefore, I propose a new class "Meaning Comprehension", which would 
take *as input a recognized text* and interpret an assumed meaning in 
plain language, or even formal propositions, which would be the 
final stage of the reading process, resulting in an information object. 
This class may reside in CRMinf or in CRMtex.

We can then construct from "Text Recognition", "Transcription" and 
"Meaning Comprehension" combined and short-cutting constructs, which 
would include "error correction", "resolution of recognition ambiguity" 
and "missing part completion" as useful in practice for representing 
typical scholarly defaults.
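
As an illustration only, the idea of a short-cutting construct can be 
sketched in Python as follows. The names Step, CombinedReading and 
shortcut, the identifier "inscription-1" and the sample strings are 
hypothetical and not part of CRMtex or CRMinf. The sketch collapses a 
chain of steps into one step from the first input to the last output, 
while keeping track of which interventions (error correction, completion 
and so on) occurred along the way.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    kind: str      # e.g. "Text Recognition", "Transcription", "Meaning Comprehension"
    source: str    # what the step takes as input
    result: str    # what the step produces
    interventions: list[str] = field(default_factory=list)


@dataclass
class CombinedReading:
    steps: list[Step]

    def shortcut(self) -> Step:
        """Collapse the chain into a single step, first input to last output."""
        notes = [n for s in self.steps for n in s.interventions]
        return Step("Combined Reading", self.steps[0].source,
                    self.steps[-1].result, notes)


# Made-up sample chain: recognition with a completed gap, then comprehension.
chain = CombinedReading([
    Step("Text Recognition", "inscription-1", "IMP CAES",
         ["missing part completion"]),
    Step("Meaning Comprehension", "IMP CAES", "Imperator Caesar"),
])
combined = chain.shortcut()
```

The combined step still says that a completion took place, so the 
default reading remains distinguishable from a pure observation.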

I'd argue that resolution of linguistic ambiguity using scholarly 
arguments about the likely context of reference of the text constitutes 
a scholarly interpretation process after "reading", regardless of whether 
error correction and completion used such arguments.

We need these separations in order to create a clear interface to 
"Belief Adoption" in CRMinf, which is about the assumed real world truth 
of statements in texts.


All the best,


  Dr. Martin Doerr
  Honorary Head of the
  Center for Cultural Informatics
  Information Systems Laboratory
  Institute of Computer Science
  Foundation for Research and Technology - Hellas (FORTH)
  N.Plastira 100, Vassilika Vouton,
  GR70013 Heraklion, Crete, Greece
  Email: martin at ics.forth.gr
  Web-site: http://www.ics.forth.gr/isl
