[Crm-sig] How to represent imprecision? (E60 Number and E61 Time Primitive)

Christian-Emil Ore c.e.s.ore at iln.uio.no
Thu Oct 6 22:24:44 EEST 2011


Dear all,
I think it would be very unwise to remove the properties
  E52 Time-Span. P83 had at least duration. E54 Dimension
  E52 Time-Span. P84 had at most duration. E54 Dimension

As Jon Holmen and I have shown in a system for time reasoning in
connection with archaeology (see the paper at
http://www.edd.uio.no/artiklar/arkeologi/holmen_ore_caa2009.pdf),
these properties are quite useful. In fact, they model the basic way
historians, or field archaeologists for that matter, work.

The idea of dating as a measurement + dimension to express imprecision
is fine for scientific dating methods such as C14. However, for dating
based on the reading of written sources and historical calendars it is
not sufficient. We need both. Taking away P83 and P84 would reduce the
expressive power of the CRM. It is a little like removing wave models of
light because light can also be seen as particles.
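A minimal Python sketch of the kind of reasoning meant here. It is an illustration only, not the system described in the paper: years are plain integers, the function names are hypothetical, and the example values are invented. It shows that duration bounds combined with date bounds yield constraints that neither gives alone.

```python
from typing import Optional, Tuple

# Hypothetical representation: an interval of whole years as (start, end).
Interval = Tuple[int, int]


def implied_inner_bound(outer: Interval, min_duration: int) -> Optional[Interval]:
    """Combine an outer bound (cf. P82) with a minimum duration (cf. P83).

    Returns the period the time-span must certainly cover, or None if the
    minimum duration is too short to force any such period.
    """
    start, end = outer
    if min_duration > end - start:
        raise ValueError("minimum duration does not fit in the outer bound")
    latest_start = end - min_duration    # must still last min_duration before `end`
    earliest_end = start + min_duration  # cannot end before lasting min_duration
    return (latest_start, earliest_end) if latest_start <= earliest_end else None


def implied_outer_bound(inner: Interval, max_duration: int) -> Interval:
    """Combine an inner bound (cf. P81) with a maximum duration (cf. P84).

    Returns the window within which the whole time-span must fall.
    """
    start, end = inner
    if max_duration < end - start:
        raise ValueError("maximum duration is shorter than the inner bound")
    return (end - max_duration, start + max_duration)


# A time-span lying within 1000-1100 and lasting at least 80 years
# must have been ongoing throughout 1020-1080.
print(implied_inner_bound((1000, 1100), 80))  # (1020, 1080)
# A time-span ongoing throughout 1040-1060 and lasting at most 50 years
# must fall within 1010-1090.
print(implied_outer_bound((1040, 1060), 50))  # (1010, 1090)
```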

C-E




On 06.10.2011 16:31, martin wrote:
> Dear Vladimir,
>
> Thank you very much for your important questions. As a general remark
> I'd like to remind you that the CIDOC CRM as a standard is an ontology
> in the narrower sense, a formal model approximating a human
> conceptualization, and not a standard database schema. Any
> implementation, in particular any RDF Schema, is again an approximation
> of this conceptualization. The CRM has a much wider scope and a longer
> life-cycle than RDF. In relational databases, quite different issues
> occur.
> The Definition of the CIDOC CRM makes very clear that "Primitive Values"
> are dependent on the capabilities of the respective IT infrastructure.
>
> These details cannot be standardized in the same way as the CRM,
> because they change in shorter periods of time than the ones for which
> we want to have conceptual interoperability, not bitwise
> interoperability. Therefore the CRM refers loosely to concepts of time
> and number in a mathematical sense. So far, no database implementation
> is compatible with all mathematical numerical systems. Rather, we can
> make mathematical models of the database implementations and thereby
> devise algorithms to mediate between different implementations.
>
>
> On 10/5/2011 1:21 AM, Vladimir Alexiev wrote:
>> Very often in the museum domain measurements are imprecise, so dimensions
>> must be expressed as an interval.
>>
>> 1. Imprecise Dimension
>> E54 Dimension says "The properties of the class E54 Dimension allow for
>> expressing the numerical approximation of the values of an instance of
>> E54 Dimension".
>> My understanding is that this can only happen through:
>> E54 Dimension. P90 has value: E60 Number
>> E60 Number says "... including *intervals* of these values to express
>> *limited precision*".
> This means that you have, according to your application, to specialize
> the respective concepts and available primitive values. Different
> Dimensions need different numeric systems.
>>
>> Regarding time spans, CIDOC CRM allows imprecision to be expressed in two
>> ways:
>>
>
> and then, if in addition necessary:
>> 2. Imprecise Duration
>> E52 Time-Span. P83 had at least duration. E54 Dimension
>> E52 Time-Span. P84 had at most duration. E54 Dimension
>
>>
>> IMHO this pair of properties is unnecessary, since:
>> - E54 Dimension already accommodates (or should accommodate)
>> imprecision, see 1
>>
> Good point! We'll make this an issue.
>
>
>> - If we have this pair, then shouldn't we also split P43 has dimension in
>> two (has minimum dimension, has maximum dimension)?
>> - The pair allows "P91 has unit" of the two Dimensions to differ, which I
>> think is unnecessary ("between 1 and 2 cm" is used often, but who'd say
>> "between 1 cm and 1 meter"?)
>
> Not so good, because "Dimension" is the concept of the actual dimension
> of something at some time, and the interval is the uncertainty about it.
> It is not that the dimension itself varies. In cases of
> multi-dimensional values, such as color vectors (HSI) etc., the
> uncertainty may be an oddly shaped region. Restricting that to
> minimum-maximum in the ontology would make such more complex cases
> incompatible with the CRM. Time, in contrast, has one dimension (except
> in science fiction).
> The CRM follows the principle of "minimal commitment" by Thomas Gruber
> here.
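To make the point about multi-dimensional uncertainty concrete, here is a small Python sketch; the classes and numbers are hypothetical. If the uncertainty about a three-component colour vector is a ball around a measured value, a per-component minimum-maximum box also admits corner values that the ball excludes, whereas for a one-dimensional value an interval loses nothing.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One-dimensional uncertainty: enough for a single number, or for time."""
    low: float
    high: float

    def contains(self, x: float) -> bool:
        return self.low <= x <= self.high


@dataclass(frozen=True)
class Ball:
    """n-dimensional uncertainty region: all points within `radius` of `centre`."""
    centre: tuple[float, ...]
    radius: float

    def contains(self, point: tuple[float, ...]) -> bool:
        return math.dist(self.centre, point) <= self.radius

    def bounding_box(self) -> list[Interval]:
        """Per-component min-max approximation of the ball."""
        return [Interval(c - self.radius, c + self.radius) for c in self.centre]


def box_contains(box: list[Interval], point: tuple[float, ...]) -> bool:
    """True if every component lies within its min-max interval."""
    return all(i.contains(x) for i, x in zip(box, point))


region = Ball(centre=(0.5, 0.5, 0.5), radius=0.1)
corner = (0.59, 0.59, 0.59)
print(box_contains(region.bounding_box(), corner))  # True: inside the min-max box
print(region.contains(corner))                      # False: outside the actual region
```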
>
>>
>> 3. Imprecise Start/End
>> As depicted in the CRM Tutorial (online at
>> http://personal.sirma.bg/vladimir/crm-tutorial/#slide27)
>> two properties allow expressing the inner & outer bounds of a Time-Span:
>> E52 Time-Span. P81 ongoing throughout: E61 Time Primitive (inner bound)
>> E52 Time-Span. P82 at some time within: E61 Time Primitive (outer bound)
>>
>> Each of the bounds has start/end. This is confirmed by the spec:
>> E61 Time Primitive says "... interval logic to express *date ranges*"
>
> There are several large-scale Relational Databases that have implemented
> precisely that.
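A minimal Python sketch of how an implementation might hold both bounds of an E52 Time-Span, assuming each E61 Time Primitive is reduced to a closed (start, end) date range; the class and method names and the dates are hypothetical. P81 gives the inner bound and P82 the outer bound, so a date can be classified as certainly, possibly, or not within the span.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    """Stand-in for an E61 Time Primitive encoded as a closed date range."""
    start: date
    end: date

    def contains(self, d: date) -> bool:
        return self.start <= d <= self.end

    def within(self, other: "DateRange") -> bool:
        return other.start <= self.start and self.end <= other.end


@dataclass(frozen=True)
class TimeSpan:
    """Stand-in for an E52 Time-Span with both bounds recorded."""
    ongoing_throughout: DateRange   # P81: inner bound, certainly covered
    at_some_time_within: DateRange  # P82: outer bound, never exceeded

    def __post_init__(self) -> None:
        if not self.ongoing_throughout.within(self.at_some_time_within):
            raise ValueError("inner bound (P81) must lie within outer bound (P82)")

    def classify(self, d: date) -> str:
        if self.ongoing_throughout.contains(d):
            return "certainly"
        if self.at_some_time_within.contains(d):
            return "possibly"
        return "not"


span = TimeSpan(
    ongoing_throughout=DateRange(date(1652, 1, 1), date(1655, 12, 31)),
    at_some_time_within=DateRange(date(1650, 1, 1), date(1660, 12, 31)),
)
print(span.classify(date(1653, 6, 1)))  # certainly
print(span.classify(date(1658, 6, 1)))  # possibly
print(span.classify(date(1670, 1, 1)))  # not
```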
>>
>> Let's see what the current RDFS/OWL implementations of CIDOC CRM offer
>> (none of them allows E54 Dimension to express a numerical approximation,
>> i.e. item 1):
>>
>> 4. OWL2 DL proposal
>> http://bloody-byte.net/rdf/cidoc-crm/core_5.0.1.rdf
>> <owl:DatatypeProperty rdf:about="http://purl.org/NET/cidoc-crm/core#P90_has_value">
>>   <rdfs:domain rdf:resource="http://purl.org/NET/cidoc-crm/core#E54_Dimension"/>
>>   <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
>>   <skos:scopeNote xml:lang="en">This property allows an E54 Dimension to
>>   be approximated by an E60 Number primitive.</skos:scopeNote>
>> </owl:DatatypeProperty>
>
> This is not work of CRM-SIG or ISO, but adequate, see below.
>>
>> 5. OWL DL
>> http://erlangen-crm.org/current
>> P90_has_value is a Data Property
>
> This is not work of CRM-SIG or ISO either
>>
>> 6. RDFS
>> http://www.cidoc-crm.org/rdfs/cidoc_crm_v5.0.2_english_label.rdfs
>> "The primitive values "E60 Number" ... are interpreted as rdf:literal."
>> <rdf:Property rdf:ID="P90F.has_value">
>>   <rdfs:domain rdf:resource="#E54.Dimension"/>
>>   <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
>> </rdf:Property>
>
> This is work of CRM-SIG. It is adequate for data transport, because RDF
> does not have the necessary constructs, and in a literal we can encode
> any numbering system. This is, for instance, the standard way in which
> xsd:dateTime is added to RDF.
>>
>> Seme4 defined a CRM extension for the British Museum (called BMX), see
>> http://crm.rkbexplorer.com.
>> It defines several extension properties (prefix PX):
>>
>> 7. PX.min_value, PX.max_value as subPropertyOf P90F.has_value.
>> - If you assert e.g. min_value=35 and max_value=45, that would infer
>> *both* has_value=35 and has_value=45, which I think is strange.
>
> This is not strange. If you read the Definition of the CRM carefully, it
> clearly states that in an implementation multiple values of the same
> unique property have to be interpreted as alternatives. Hence, the
> result is correct. Both values are possible, like multiple fathers...
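The inference in question is RDFS sub-property entailment (rule rdfs7: if p is a sub-property of q and "s p o" holds, then "s q o" holds). A minimal Python sketch over plain tuples, with a hypothetical dimension identifier "dim1", shows that PX.min_value and PX.max_value each yield a P90F.has_value triple, which the CRM reading takes as two alternative values rather than a contradiction.

```python
# Triples are plain (subject, predicate, object) tuples; "dim1" is hypothetical.
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

graph = {
    ("PX.min_value", SUB_PROPERTY_OF, "P90F.has_value"),
    ("PX.max_value", SUB_PROPERTY_OF, "P90F.has_value"),
    ("dim1", "PX.min_value", 35),
    ("dim1", "PX.max_value", 45),
}


def rdfs7_closure(triples: set) -> set:
    """Apply rule rdfs7 (sub-property entailment) until nothing new is added."""
    result = set(triples)
    while True:
        supers = {(p, q) for (p, r, q) in result if r == SUB_PROPERTY_OF}
        new = {(s, q, o) for (s, p, o) in result for (sub, q) in supers if p == sub}
        if new <= result:
            return result
        result |= new


closed = rdfs7_closure(graph)
# Multiple values of P90 are read as alternatives: the true value is one of these.
alternatives = sorted(o for (s, p, o) in closed if s == "dim1" and p == "P90F.has_value")
print(alternatives)  # [35, 45]
```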
>
>> Instead I'd leave has_value independent, and set it to the average of
>> min_value and max_value using some calculation.
> An average of an uncertainty interval does not, in general, make sense.
> It only makes sense if a hypothesis about the nature of the deviation
> from the true value exists, which requires knowledge of the measurement
> process.
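A small Python illustration of why the midpoint is itself a hypothesis. The two error models below are assumptions chosen for illustration, not anything the CRM specifies: for the same interval 35 to 45, a uniform model gives an expected value of 40, whereas a triangular model whose most likely value is 36 gives 116/3 (about 38.67).

```python
from fractions import Fraction


def expected_uniform(low: Fraction, high: Fraction) -> Fraction:
    """Mean if every value in [low, high] is assumed equally likely."""
    return (low + high) / 2


def expected_triangular(low: Fraction, high: Fraction, mode: Fraction) -> Fraction:
    """Mean if the error is assumed triangular, peaking at `mode`."""
    if not low <= mode <= high:
        raise ValueError("mode must lie inside the interval")
    return (low + high + mode) / 3


low, high = Fraction(35), Fraction(45)
print(expected_uniform(low, high))                   # 40
print(expected_triangular(low, high, Fraction(36)))  # 116/3
```

The midpoint coincides with the expected value only under a symmetric error model; without knowledge of the measurement process there is no basis for choosing one.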
>
>> - This implements requirement 1, but is it faithful to CIDOC CRM?
>> CIDOC CRM says the imprecision should be captured in the range of
>> P90.has_value, not through parallel properties.
> Sure, but we do not have (any more) the machines that provide interval
> values. Necessarily, we can only write transformation algorithms between
> different solutions. The CRM does not intend to standardize the
> impossible.
>>
>> 8. PX.time-span_earliest, PX.time-span_latest as properties of
>> E52.Time-Span.
>> - (Actually these are defined merely as rdf:Property and don't specify
>> the domain and range.)
>> - These properties are superfluous, given P81 ongoing throughout and
>> P82 at some time within.
>> - They don't allow capturing both the outer & inner bounds, as per 3.
>> - They are unrelated to CIDOC CRM properties, so the extension is not
>> CRM compatible.
>> A compatibility condition from the CRM Intro is:
>> "all properties of the extension are either subsumed by CRM properties,
>> or are part of a path for which a CRM property is a shortcut"
>> See online here:
>> http://personal.sirma.bg/vladimir/crm/introduction.html#extensions
>
> The CRM does not prescribe any property. Not implementing inner bounds
> does not violate compatibility.
> Note that the subsumption requirement ends at primitive values, because
> they are out of scope of the CRM (this should perhaps be stated more
> explicitly).
> "Subsumed by CRM properties" must be seen algorithmically, since the CRM
> is not bound to a particular KR language. We can write an algorithm that
> transforms instances of pairs of PX.time-span_earliest,
> PX.time-span_latest into instances of P81, encoding the interval into a
> literal with the intended meaning of a Time Primitive. Thereby data
> transport and data transformation are supported.
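Such a transformation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a CRM-SIG recommendation: the input rows are hypothetical, the property name is passed in as `target_property` (the value used below only mimics the "P90F.has_value" naming of the v5.0.2 RDFS and is assumed), and encoding the pair as an ISO 8601 interval string "start/end" is one possible literal form among many.

```python
from datetime import date
from typing import Iterable, Iterator, Tuple

# Hypothetical input rows: (time-span id, PX.time-span_earliest, PX.time-span_latest)
Row = Tuple[str, date, date]
Triple = Tuple[str, str, str]


def to_time_primitive_literal(earliest: date, latest: date) -> str:
    """Encode the pair as one literal, here an ISO 8601 interval 'start/end'."""
    if earliest > latest:
        raise ValueError(f"earliest {earliest} is after latest {latest}")
    return f"{earliest.isoformat()}/{latest.isoformat()}"


def transform(rows: Iterable[Row], target_property: str) -> Iterator[Triple]:
    """Replace each earliest/latest pair by a single triple on the CRM property."""
    for span_id, earliest, latest in rows:
        yield (span_id, target_property, to_time_primitive_literal(earliest, latest))


rows = [("ts1", date(1650, 1, 1), date(1660, 12, 31))]
for triple in transform(rows, target_property="P81F.ongoing_throughout"):
    print(triple)  # ('ts1', 'P81F.ongoing_throughout', '1650-01-01/1660-12-31')
```

Keeping the target property a parameter leaves the choice of CRM property, and of literal encoding, to the implementer community rather than fixing it in code.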
>
> If, in addition, we want to query a real database implementation for
> dates, we need a practical implementation.
>
>>
>> CIDOC CRM leaves an important question (imprecise dimensions)
>> unspecified, hidden in the scope notes of primitives E60 Number and E61
>> Time Primitive. This shouldn't be dismissed as a "mere RDF
>> implementation issue" since it is important for practical CRM
>> interoperability.
>
> Practical interoperability is a task of applications. The CRM-SIG does
> not "dismiss" that. It is highly interested in it. But it will definitely
> not propose a standard serving a particular encoding form and database,
> which would then cause incompatibilities with other implementations.
>
> It is a task for particular implementer communities to provide their
> solutions and suggest them for adoption by others. If a consensus is
> achieved on this level, CRM-SIG will make recommendations.
>
> We do have a recommendation for RDF implementations of P81 and P82,
> which is out for vote. See attached.
>>
>> What would be the best way to represent imprecision?
>>
>> 9. If we define E60 Number and E61 Time Primitive as RDF classes, that
>> would imply minimal changes to CIDOC CRM.
>> - E60.Number with dataProperties crm:min_value, crm:max_value, and
>> rdf:value (average or expected)
>> - E61.Time_Primitive with dataProperties crm:min_date, crm:max_date, and
>> maybe rdf:value
> This causes the maximal number of joins, which is highly inefficient for
> querying and data entry, and introduces at the end of the chain
> properties that cannot be subsumed under any existing CRM property.
> In my eyes this is the worst case: retrieving one instance of P90 with
> the query, but not being able to write a SPARQL condition directly on
> this value, solves nothing except for the paper exercise of "minimal
> change" to the CRM.
>
>
>> - (see 2) The pair P83.had_at_least_duration and P84.had_at_most_duration
>> should be merged to one property has_duration
> yes
>>
>> 10. I'm sure that people who expect P57 has number of parts to be a
>> simple xsd:integer will be very unhappy to suddenly find a class
>> E60.Number (and rightly so!)
> Please note that what the user finds in a user interface is explicitly
> not the concern of the CRM. ONLY because such concerns have been
> excluded could the CRM ever be standardized.
> Your GUI has to provide the adequate filters.
> The CRM has NEVER been recommended as a data entry form!
>
>> But E60.Number also gives examples of complex numbers, 3D coordinates,
>> etc... So it really is not a literal, it needs to be a class
> Exactly. This is why an implementation has to specialize E60 Number on
> a case-by-case basis. The CRM does not want to deal with that.
>>
>> Is there a better way than 9? Your comments/advice will be appreciated.
>> I googled "E60 site:http://lists.ics.forth.gr/pipermail/crm-sig" and
>> couldn't find relevant discussion.
> Most discussions are in the meetings. You may like to read the meeting
> minutes.
>
> Best wishes and thank you for your comments!
>
> Martin
>
>
>>
>> Regards!
>> --
>> Vladimir Alexiev, PhD, PMP
>> PM/BA, Ontotext Corp, www.ontotext.com
>> Sirma Group Holding, www.sirma.com
>> 135 Tsarigradsko Shosse Blvd, 1784 Sofia, Bulgaria
>> Email: vladimir.alexiev at ontotext.com, skype:valexiev1
>> Mobile: +359 888 568 132, SMS: 359888568132 at sms.mtel.net,
>> Fax: +359 2 975 3226
>>
>>
>>
>>
>> _______________________________________________
>> Crm-sig mailing list
>> Crm-sig at ics.forth.gr
>> http://lists.ics.forth.gr/mailman/listinfo/crm-sig
>>
>


