[Crm-sig] How to represent imprecision? (E60 Number and E61 Time Primitive)

martin martin at ics.forth.gr
Thu Oct 6 17:31:20 EEST 2011


Dear Vladimir,

Thank you very much for your important questions. As a general remark I'd like to remind you
that the CIDOC CRM as a standard is an ontology in the narrower sense, a formal model approximating
a human conceptualization, and not a standard database schema. Any implementation, in particular
any RDF Schema, is again an approximation of this conceptualization. The CRM has a much wider scope
and longer life-cycle than RDF. In Relational Databases, quite different issues occur.
The Definition of the CIDOC CRM makes very clear that "Primitive Values" are dependent on the
capabilities of the respective IT infrastructure.

These details cannot be standardized in the same way as the CRM, because they change
over shorter periods of time than those for which we want to achieve conceptual interoperability (as opposed to bitwise interoperability).
Therefore the CRM refers loosely to concepts of time and number in a mathematical sense.
So far, no database implementation is compatible with all mathematical numerical systems.
Rather, we can make mathematical models of the database implementations and by that devise
algorithms to mediate between different implementations.


On 10/5/2011 1:21 AM, Vladimir Alexiev wrote:
> Very often in the museum domain measurements are imprecise, so dimensions
> must be expressed as an interval.
>
> 1. Imprecise Dimension
> E54 Dimension says "The properties of the class E54 Dimension allow for
> expressing the numerical approximation of the values of an instance of E54
> Dimension".
> My understanding is that can only happen through: E54 Dimension. P90 has
> value: E60 Number
> E60 Number says "... including *intervals* of these values to express
> *limited precision*".
This means that, depending on your application, you have to specialize the respective
concepts and the available primitive values. Different Dimensions need different numeric
systems.
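For illustration only, here is a minimal sketch of such an application-level specialization. It assumes a length dimension whose value is recorded as a closed interval [lower, upper] in a single unit. The class `IntervalNumber` and its methods are hypothetical and not part of the CRM or any RDF schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalNumber:
    """Hypothetical specialization of E60 Number: a closed interval
    [lower, upper] expressing the limited precision of one measured value."""
    lower: float
    upper: float

    def __post_init__(self) -> None:
        # An interval with lower > upper would admit no possible value.
        if self.lower > self.upper:
            raise ValueError("lower bound exceeds upper bound")

    def contains(self, x: float) -> bool:
        """True if x is compatible with the measurement."""
        return self.lower <= x <= self.upper

    def overlaps(self, other: "IntervalNumber") -> bool:
        """True if both imprecise values could denote the same true value."""
        return self.lower <= other.upper and other.lower <= self.upper


# Hypothetical example: a height recorded as "between 35 and 45 cm".
height_cm = IntervalNumber(35.0, 45.0)
print(height_cm.contains(40.0))                        # True
print(height_cm.overlaps(IntervalNumber(44.0, 50.0)))  # True
```

Comparing two imprecise values with `overlaps` rather than equality reflects that the interval expresses uncertainty about one true value, not a dimension that itself varies.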
>
> Regarding time spans, CIDOC CRM allows imprecision to be expressed in two
> ways:
>

and then, if in addition necessary:
> 2. Imprecise Duration
> E52 Time-Span. P83 had at least duration. E54 Dimension
> E52 Time-Span. P84 had at most duration. E54 Dimension

>
> IMHO this pair of properties is unnecessary, since:
> - E54 Dimension already accommodates (or should accommodate) imprecision, see
>
Good point! We'll make this an issue.


> - If we have this pair, then shouldn't we also split P43 has dimension in
> two (has minimum dimension, has maximum dimension)?
> - The pair allows "P91 has unit" of the two Dimensions to differ, which I
> think is unnecessary
> ("between 1 and 2 cm" is used often, but who'd say "between 1 cm and 1
> meter"?)

Not so good, because "Dimension" is the concept of the actual dimension of something
at some time, and the interval is the uncertainty about it. It is not that the dimension
itself varies. In cases of multi-dimensional values, such as color vectors (HSI) etc.,
the uncertainty may be an oddly shaped region. Restricting it to minimum-maximum in the ontology
would make such more complex cases incompatible with the CRM. Time, in contrast, has one
dimension (except in science fiction).
The CRM follows the principle of "minimal commitment" by Thomas Gruber here.
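To illustrate why a minimum-maximum pair is not general enough for multi-dimensional values, here is a small sketch. It assumes, for this example only, that the uncertainty about a three-component colour value is a ball of radius r around a nominal point, and it treats the components as Euclidean, which is itself a simplification for angular components such as hue. The box built from per-component minima and maxima admits points that lie outside the ball, so recording only min/max would change the meaning of the uncertainty.

```python
import math

# Hypothetical 3-component colour value (e.g. HSI on a 0-100 scale).
Vector = tuple[float, float, float]


def in_ball(p: Vector, centre: Vector, r: float) -> bool:
    """True if p lies within Euclidean distance r of centre."""
    return math.dist(p, centre) <= r


def in_box(p: Vector, lo: Vector, hi: Vector) -> bool:
    """True if every component of p lies between its min and max bound."""
    return all(low <= x <= high for x, low, high in zip(p, lo, hi))


centre: Vector = (50.0, 50.0, 50.0)  # nominal value (hypothetical)
r = 10.0                             # uncertainty radius (hypothetical)
lo: Vector = (40.0, 40.0, 40.0)      # per-component minimum of the ball
hi: Vector = (60.0, 60.0, 60.0)      # per-component maximum of the ball

point: Vector = (58.0, 58.0, 58.0)
print(in_box(point, lo, hi))       # True: admitted by the min/max bounds
print(in_ball(point, centre, r))   # False: about 13.9 away from the centre
```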

>
> 3. Imprecise Start/End
> As depicted in the CRM Tutorial (online at
> http://personal.sirma.bg/vladimir/crm-tutorial/#slide27)
> two properties allow expressing the outer & inner bounds of a Time-Span:
> E52 Time-Span. P81 ongoing throughout: E61 Time Primitive (outer bound)
> E52 Time-Span. P82 at some time within: E61 Time Primitive (inner bound)
>
> Each of the bounds has start/end. This is confirmed by the spec:
> E61 Time Primitive says "... interval logic to express *date ranges*"

There are several large-scale Relational Databases that have implemented precisely that.
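The two bounds of a time-span can be checked for consistency in application code. The sketch below assumes each bound is stored as a pair of calendar dates. It follows the CRM scope notes, in which P81 ongoing throughout describes the minimum period covered by the time-span and P82 at some time within describes the maximum period within which it falls, so the P81 interval must lie inside the P82 interval. The `DateInterval` class, the function names and the dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateInterval:
    """Hypothetical closed interval of calendar days."""
    start: date
    end: date

    def within(self, other: "DateInterval") -> bool:
        """True if this interval lies entirely inside other."""
        return other.start <= self.start and self.end <= other.end


def consistent(ongoing_throughout: DateInterval,
               at_some_time_within: DateInterval) -> bool:
    """A time-span is consistent if its minimum extent (P81)
    lies inside its maximum extent (P82)."""
    return ongoing_throughout.within(at_some_time_within)


# Hypothetical example: an activity known to have run throughout 1851,
# and known to have taken place somewhere within 1850-1853.
p81 = DateInterval(date(1851, 1, 1), date(1851, 12, 31))
p82 = DateInterval(date(1850, 1, 1), date(1853, 12, 31))
print(consistent(p81, p82))  # True
print(consistent(p82, p81))  # False
```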
>
> Let's see what the current RDFS/OWL implementations of CIDOC CRM offer
> (neither one allows E54 Dimension to express a numerical approximation, i.e.
> item 1):
>
> 4. OWL2 DL proposal
> http://bloody-byte.net/rdf/cidoc-crm/core_5.0.1.rdf
>    <owl:DatatypeProperty
> rdf:about="http://purl.org/NET/cidoc-crm/core#P90_has_value">
>      <rdfs:domain
> rdf:resource="http://purl.org/NET/cidoc-crm/core#E54_Dimension"/>
>      <rdfs:range
> rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
>      <skos:scopeNote xml:lang="en">This property allows an E54 Dimension to
> be approximated by an E60 Number primitive.</skos:scopeNote>

This is not the work of CRM-SIG or ISO, but it is adequate; see below.
>
> 5. OWL DL
> http://erlangen-crm.org/current
> P90_has_value is a Data Property

This is not the work of CRM-SIG or ISO either.
>
> 6. RDFS
> http://www.cidoc-crm.org/rdfs/cidoc_crm_v5.0.2_english_label.rdfs
> "The primitive values "E60 Number"... are interpreted as rdf: literal.
>     <rdf:Property rdf:ID="P90F.has_value">
> 	<rdfs:domain rdf:resource="#E54.Dimension"/>
> 	<rdfs:range
> rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>

This is the work of CRM-SIG. It is adequate for data transport, because RDF does not
have the necessary constructs, and in a literal we can encode any numbering system.
This is the standard way in which, for instance, xsd:dateTime is added to RDF.
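To illustrate how a literal can carry a numeric system that RDF itself lacks, here is a minimal sketch. It assumes an interval is written with the lexical form `[lower,upper]` and tagged with a custom datatype IRI. Both the lexical form and the IRI `http://example.org/datatype#interval` are hypothetical; what matters is that sender and receiver agree on the datatype, as they do for xsd:dateTime.

```python
# Hypothetical datatype IRI for this example; not defined by CRM-SIG.
INTERVAL_DT = "http://example.org/datatype#interval"


def to_literal(lower: float, upper: float) -> tuple[str, str]:
    """Encode an interval as a (lexical form, datatype IRI) pair."""
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    return f"[{lower},{upper}]", INTERVAL_DT


def from_literal(lexical: str, datatype: str) -> tuple[float, float]:
    """Decode a literal produced by to_literal back into its two bounds."""
    if datatype != INTERVAL_DT:
        raise ValueError(f"unexpected datatype {datatype}")
    if not (lexical.startswith("[") and lexical.endswith("]")):
        raise ValueError(f"malformed interval literal {lexical!r}")
    parts = lexical[1:-1].split(",")
    if len(parts) != 2:
        raise ValueError(f"malformed interval literal {lexical!r}")
    lower, upper = (float(part) for part in parts)
    return lower, upper


lexical, dt = to_literal(35, 45)
print(lexical)                    # [35,45]
print(from_literal(lexical, dt))  # (35.0, 45.0)
```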
>
> Seme4 defined a CRM extension for the British Museum (called BMX), see
> http://crm.rkbexplorer.com.
> It defines several extension properties (prefix PX):
>
> 7. PX.min_value, PX.max_value as subPropertyOf P90F.has_value.
> - If you assert e.g. min_value=35 and max_value=45, that would infer
>    *both* has_value=35 and has_value=45, which I think is strange.

This is not strange. If you read the Definition of the CRM carefully, it clearly states
that in an implementation multiple values of the same unique property have to be
interpreted as alternatives. Hence, the result is correct. Both values are possible,
like multiple fathers...
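The reading of multiple values as alternatives can be made concrete: a condition on a multi-valued property is possibly true if at least one alternative satisfies it, and necessarily true only if all of them do. A minimal sketch, with a plain Python dictionary standing in for the inferred data; the property name reuses `P90F.has_value` from the RDFS quoted above, and the data and helper names are hypothetical.

```python
from typing import Callable

# Hypothetical inferred data: asserting min_value=35 and max_value=45 as
# subproperties entails two values of P90F.has_value on the same dimension.
dimension = {"P90F.has_value": [35, 45]}

Condition = Callable[[float], bool]


def possibly(values: list[float], condition: Condition) -> bool:
    """True if at least one alternative value satisfies the condition."""
    return any(condition(v) for v in values)


def necessarily(values: list[float], condition: Condition) -> bool:
    """True if every alternative value satisfies the condition."""
    return all(condition(v) for v in values)


values = dimension["P90F.has_value"]
print(possibly(values, lambda v: v > 40))     # True: 45 is a possible value
print(necessarily(values, lambda v: v > 40))  # False: 35 is possible as well
```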

>    Instead I'd leave has_value independent, and set it to the average of
> min_value and max_value using some calculation
An average of an uncertainty interval does not make sense in general. It only makes
sense if a hypothesis about the nature of the deviation from the true value exists,
which requires knowledge of the measurement process.
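A short worked example of why the midpoint is not a neutral choice: for the interval [35, 45], the expected true value depends on the assumed error distribution. Both distributions below are assumptions made for illustration only. A uniform distribution over the interval yields the midpoint, whereas a triangular distribution with its mode at 36 (say, because knowledge of the measurement process suggests the true value lies near the lower end) yields a different expected value.

```python
def uniform_mean(a: float, b: float) -> float:
    """Expected value if every point of [a, b] is equally likely."""
    return (a + b) / 2


def triangular_mean(a: float, b: float, mode: float) -> float:
    """Expected value of a triangular distribution on [a, b] peaking at mode."""
    if not a <= mode <= b:
        raise ValueError("mode must lie inside [a, b]")
    return (a + b + mode) / 3


print(uniform_mean(35, 45))                   # 40.0
print(round(triangular_mean(35, 45, 36), 2))  # 38.67
```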

> - This implements the requirement 1, but is it faithful to CIDOC CRM?
>    CIDOC CRM says the imprecision should be captured in the domain of
> P90.has_value, not through parallel properties
Sure, but we no longer have the machines that provide interval values.
Necessarily, we can only write transformation algorithms between different solutions.
The CRM does not intend to standardize the impossible.
>
> 8. PX.time-span_earliest, PX.time-span_latest as properties of
> E52.Time-Span.
> - (Actually these are defined merely as rdf:Property and don't specify the
> domain and range).
> - these properties are superfluous, given P81 ongoing throughout and P82 at
> some time within
> - they don't allow capturing outer & inner bounds, as per 3
> - they are unrelated to CIDOC CRM properties, so the extension is not CRM
> Compatible.
>    A compatibility condition from the CRM Intro is:
>    "all properties of the extension are either subsumed by CRM properties, or
> are part of a path for which a CRM property is a shortcut"
>    See online here:
> http://personal.sirma.bg/vladimir/crm/introduction.html#extensions

The CRM does not prescribe any property. Not implementing inner bounds does not violate
compatibility.
Note that the subsumption requirement ends
at primitive values, because they are out of scope of the CRM (this should perhaps be stated
more explicitly).
"Subsumed by CRM properties" must be understood algorithmically, since the CRM is not bound to
a particular KR language. We can write an algorithm that transforms instances of pairs of
PX.time-span_earliest, PX.time-span_latest into instances of P81, encoding the interval into
a literal with the intended meaning of a Time Primitive. Thereby data transport and data
transformation are supported.
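The transformation described above might be sketched as follows. This assumes that PX.time-span_earliest and PX.time-span_latest carry ISO 8601 calendar dates, and it encodes each pair as an ISO 8601 "start/end" interval literal. The triple representation, the prefixed names (`px:`, `crm:`, `ex:`) and the literal format are assumptions for illustration; the target property is a parameter that defaults to P81, as named in the text.

```python
from datetime import date

# Hypothetical representation: a triple is (subject, property, object),
# with objects kept as plain strings.
Triple = tuple[str, str, str]

EARLIEST = "px:time-span_earliest"
LATEST = "px:time-span_latest"


def pairs_to_time_primitive(
    triples: list[Triple],
    target_property: str = "crm:P81_ongoing_throughout",
) -> list[Triple]:
    """Replace each earliest/latest pair on the same time-span with a single
    statement whose object is an ISO 8601 "start/end" interval literal.
    All other triples pass through unchanged."""
    earliest: dict[str, date] = {}
    latest: dict[str, date] = {}
    out: list[Triple] = []
    for s, p, o in triples:
        if p == EARLIEST:
            earliest[s] = date.fromisoformat(o)  # a repeated value overwrites
        elif p == LATEST:
            latest[s] = date.fromisoformat(o)
        else:
            out.append((s, p, o))

    paired = earliest.keys() & latest.keys()
    for span in sorted(paired):
        start, end = earliest[span], latest[span]
        if start > end:
            raise ValueError(f"{span}: earliest date is after latest date")
        out.append((span, target_property, f"{start.isoformat()}/{end.isoformat()}"))

    # Time-spans with only one bound cannot be encoded; re-emit them as found.
    for span in sorted(earliest.keys() - paired):
        out.append((span, EARLIEST, earliest[span].isoformat()))
    for span in sorted(latest.keys() - paired):
        out.append((span, LATEST, latest[span].isoformat()))
    return out


data = [
    ("ex:ts1", EARLIEST, "1850-01-01"),
    ("ex:ts1", LATEST, "1853-12-31"),
    ("ex:ts1", "rdfs:label", "c. 1850-1853"),
]
for triple in pairs_to_time_primitive(data):
    print(triple)
```

Time-spans carrying only one of the two bounds are passed through, since they cannot be encoded as a complete interval.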

If, in addition, we want to query a real database implementation for dates, we need a practical
implementation.

>
> CIDOC CRM leaves an important question (imprecise dimensions) unspecified,
> hidden in the scope notes of primitives E60 Number and E61 Time Primitive.
> This shouldn't be dismissed as a "mere RDF implementation issue" since it is
> important for practical CRM interoperability.

Practical interoperability is a task of applications. The CRM-SIG does not "dismiss" it.
It is highly interested in it. But it will definitely not propose a standard serving a
particular encoding form and database, which would then cause incompatibilities with other implementations.

It is a task for particular implementer communities to provide their solutions and suggest them for
adoption by others. If a consensus is achieved on this level, CRM-SIG will make recommendations.

We do have a recommendation for RDF implementations of P81 and P82, which is out for vote. See attached.
>
> What would be the best way to represent imprecision?
>
> 9. If we define E60 Number and E61 Time Primitive as RDF classes, that would
> imply minimal changes to CIDOC CRM.
> - E60.Number with dataProperties crm:min_value, crm:max_value, and rdf:value
> (average or expected)
> - E61.Time_Primitive with dataProperties crm:min_date, crm:max_date, and
> maybe rdf:value
This causes the maximal number of joins, which is highly inefficient for querying and data entry,
and introduces at the end of the chain properties that
cannot be subsumed under any existing CRM property.
In my eyes this is the worst case: a query that retrieves an instance of P90 but
cannot place a SPARQL condition directly on its value solves nothing except the
paper exercise of "minimal change" to the CRM.


> - (see 2) The pair P83.had_at_least_duration and P84.had_at_most_duration
> should be merged to one property has_duration
yes
>
> 10. I'm sure that people who expect P57 has number of parts. to be a simple
> xsd:integer
> will be very unhappy to suddenly find a class E60.Number (and rightly so!)
Please note that what the user finds in a user interface is explicitly not the concern
of the CRM. ONLY because such concerns have been excluded could the CRM ever be standardized.
Your GUI has to provide the adequate filters.
The CRM has NEVER been recommended as a data entry form!

> But E60.Number also gives examples of complex numbers, 3D coordinates,
> etc... So it really is not a literal, it needs to be a class
Exactly. This is why an implementation has to specialize E60 Number on a case by
case basis. The CRM does not want to deal with that.
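As a sketch of such case-by-case specialization, an application might simply choose a different value type per kind of number: a plain `int` where a count such as P57 has number of parts is expected, Python's built-in `complex` for complex numbers, and a hypothetical `Coordinate3D` named tuple for spatial coordinates. The union type and the `magnitude` helper are likewise assumptions made for this example, not anything the CRM prescribes.

```python
import math
from typing import NamedTuple, Union


class Coordinate3D(NamedTuple):
    """Hypothetical specialization of E60 Number for spatial coordinates."""
    x: float
    y: float
    z: float


# Hypothetical union of the specializations an application supports.
# Counts (e.g. for P57 has number of parts) stay plain integers.
NumberValue = Union[int, complex, Coordinate3D]


def magnitude(v: NumberValue) -> float:
    """Return a size for each kind of value; each specialization needs
    its own definition, which a single literal type cannot provide."""
    if isinstance(v, Coordinate3D):
        return math.hypot(v.x, v.y, v.z)  # Euclidean length
    return abs(v)  # absolute value of an int, modulus of a complex


print(magnitude(7))       # 7
print(magnitude(3 + 4j))  # 5.0
print(magnitude(Coordinate3D(1.0, 2.0, 2.0)))
```

Keeping counts as ordinary integers means that users who expect a simple integer for P57 are not forced through an extra class, while richer number kinds get their own types.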
>
> Is there a better way than 9? Your comments/advice will be appreciated.
> I googled "E60 site:http://lists.ics.forth.gr/pipermail/crm-sig" and
> couldn't find relevant discussion.
Most discussions are in the meetings. You may like to read the meeting minutes.

Best wishes and thank you for your comments!

Martin


>
> Regards!
> --
> Vladimir Alexiev, PhD, PMP
> PM/BA, Ontotext Corp, www.ontotext.com
> Sirma Group Holding, www.sirma.com
> 135 Tsarigradsko Shosse Blvd, 1784 Sofia, Bulgaria
> Email: vladimir.alexiev at ontotext.com, skype:valexiev1
> Mobile: +359 888 568 132, SMS: 359888568132 at sms.mtel.net, Fax: +359 2 975 3226


-- 

--------------------------------------------------------------
  Dr. Martin Doerr              |  Vox:+30(2810)391625        |
  Research Director             |  Fax:+30(2810)391638        |
                                |  Email: martin at ics.forth.gr |
                                                              |
                Center for Cultural Informatics               |
                Information Systems Laboratory                |
                 Institute of Computer Science                |
    Foundation for Research and Technology - Hellas (FORTH)   |
                                                              |
  Vassilika Vouton,P.O.Box1385,GR71110 Heraklion,Crete,Greece |
                                                              |
          Web-site: http://www.ics.forth.gr/isl               |
--------------------------------------------------------------

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Recommendation_time_spans.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 122764 bytes
Desc: not available
Url : http://lists.ics.forth.gr/pipermail/crm-sig/attachments/20111006/52c0bb8a/attachment-0001.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: time_spans.rdfs
Type: text/xml
Size: 3842 bytes
Desc: not available
Url : http://lists.ics.forth.gr/pipermail/crm-sig/attachments/20111006/52c0bb8a/attachment-0001.xml 

