[Crm-sig] Issue 397

Martin Doerr martin at ics.forth.gr
Tue Jun 11 14:42:12 EEST 2019


Dear Robert,

I may have lost track. I had published my final version of the
guidelines before January. A final approval may be pending, but I have
elaborated these properties in great detail. Yes, of course, they have
to be defined; that was the idea of the deprecation. I thought it was
accepted already...:-)

See:

"Whereas the CRM regards intervals of primitive values as primitive
values in themselves, there is currently no corresponding practice in
RDF. Therefore, in analogy to the properties of E52 Time-Span, we
define in CRM RDFS two more subproperties of P90 has value:
“P90a_has_lower_value_limit” and “P90b_has_upper_value_limit”.
Even if we regard complex matrices of numbers, such as RGB images, as
one value for an instance of E54 Dimension, we can argue that minimal
and maximal values exist as two separate matrices of the same structure.
The precise guidelines for using these properties are given in the
section “Guidelines for using P90a, P90, P90b” below."
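
To illustrate the two subproperties quoted above, here is a minimal Python sketch that lists the statements one might make about a Dimension with interval limits. The prefixes `crm:` and `ex:`, the helper `dimension_triples`, and the identifier `ex:issos_duration` are assumptions made for illustration; only the property names are taken from the text, and the limits of 1 and 2 days follow the Battle of Issos examples in the quoted P83/P84 definitions.

```python
# Hypothetical prefixes; only the property names come from the guidelines text.
CRM = "crm:"
EX = "ex:"


def dimension_triples(subject, lower, upper, value=None):
    """Return (subject, predicate, object) triples for an E54 Dimension interval."""
    triples = [
        (subject, "rdf:type", CRM + "E54_Dimension"),
        (subject, CRM + "P90a_has_lower_value_limit", lower),
        (subject, CRM + "P90b_has_upper_value_limit", upper),
    ]
    # P90_has_value is optional when both limits are given.
    if value is not None:
        triples.append((subject, CRM + "P90_has_value", value))
    return triples


# Duration of the Battle of Issos in days: at least 1, at most 2.
issos = dimension_triples(EX + "issos_duration", 1, 2)
for s, p, o in issos:
    print(s, p, o)
```

The single Dimension carries both limits, so no separate minimum and maximum Dimensions are needed.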

"


  *Guidelines for using P90a, P90, P90b*

The CRM recommends approximating numerical values of Dimensions with
intervals. The range of the respective property "P90 has value" is
defined in the CRM as E60 Number. Whereas the CRM regards intervals of
primitive values as primitive values in themselves, there is currently
no corresponding practice in RDF. Therefore, in analogy to the
properties of E52 Time-Span, we define in CRM RDFS two more
subproperties of P90 has value: “/P90a_has_lower_value_limit/” and
“/P90b_has_upper_value_limit/”.

The reasons for recommending this approximation are the following: All
scientific measurements of non-discrete values are imprecise because of
the tolerances of the measurement devices, shortcomings in applying the
procedures, and the indeterminacy of the measured effect itself. In the
natural sciences, important measurement results are associated with
possibly complex probabilistic distributions for the true value of the
measured effect.

The most complex cases relevant for cultural-historical data are the
so-called “battleship curves” for calibrated C14 dating data. Many of
these distribution models actually extend to infinity with non-zero
probability, which is neither practical nor always justified. In the
case of C14, however, the actual width of the distribution is often
underestimated. Nevertheless, even data whose probabilistic uncertainty
extends to infinity are typically associated by scientists with
narrower “confidence intervals” at one to three “standard deviations”,
i.e., with a probability of some 68% to 99.7% for the value to be in the
given range (https://en.wikipedia.org/wiki/Standard_deviation).

Whereas querying a very large aggregation of cultural-historical data
globally by time intervals is highly relevant, querying and reasoning
with different approximations of dimensions is normally restricted to
quite narrow questions. In many cases, a medium value without explicit
limits is sufficient for the application, such as the length of a
museum object in millimeters for packing it in a box. Nevertheless,
querying explicit representations of actual outer limits, or at least
of reasonably wide confidence intervals, is computationally highly
effective and therefore a good way to ensure recall at query time,
i.e., that the relevant results are contained in the answer to the
query, even if it also contains irrelevant ones.

We therefore recommend using /P90_has_value/ for documenting a medium
value or a value without error estimates, when the precision appears to
be self-evident or irrelevant.

We recommend using /P90a_has_lower_value_limit/ for documenting the
highest explicit lower limit available for the respective value, even if
it provides very wide margins. It is an error to omit the lower limit
even if it appears to be overly pessimistic.

We recommend using /P90b_has_upper_value_limit/ for documenting the
lowest explicit upper limit available for the respective value, even if
it provides very wide margins. It is an error to omit the upper limit
even if it appears to be overly pessimistic.

When approximating probabilistic distributions, we recommend setting
the lower and upper limits at two standard deviations, or such that
they enclose the true value with 95% probability.

/P90a_has_lower_value_limit/ should always be used together with
/P90b_has_upper_value_limit/. If they are used, the property
/P90_has_value/ may be used as well or be omitted."
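
The recommendations above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `Dimension`, the helpers `from_gaussian` and `may_match`, and the sample numbers are hypothetical and not part of the CRM or its RDFS. The sketch assumes a normally distributed measurement, for which two standard deviations enclose the true value with roughly 95% probability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dimension:
    lower: float                   # P90a_has_lower_value_limit
    upper: float                   # P90b_has_upper_value_limit
    value: Optional[float] = None  # P90_has_value, optional when limits are given

    def __post_init__(self):
        # Both limits are required together; reject an inverted interval.
        if self.lower > self.upper:
            raise ValueError("lower limit exceeds upper limit")


def from_gaussian(mean, sigma, k=2.0):
    """Approximate a normal distribution by mean +/- k standard deviations."""
    return Dimension(lower=mean - k * sigma, upper=mean + k * sigma, value=mean)


def may_match(dim, query_low, query_high):
    """True if the documented interval overlaps the query range (favours recall)."""
    return dim.lower <= query_high and dim.upper >= query_low


# Made-up measurement: length of 250 mm with a standard deviation of 1.5 mm.
length = from_gaussian(mean=250.0, sigma=1.5)
print(length.lower, length.upper)           # limits at two standard deviations
print(may_match(length, 252.0, 260.0))      # query range overlaps the interval
print(may_match(length, 254.0, 260.0))      # query range lies above the interval
```

An overlap test of this kind returns every Dimension whose documented interval could contain a value in the queried range, which is the recall behaviour the guidelines aim for, at the cost of some irrelevant hits.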


On 6/11/2019 12:56 PM, Robert Sanderson wrote:
>
> Apologies for missing this back in February …
>
> Before the deprecation of P83 and P84 in favor of P191, it was 
> possible to say that a TimeSpan had a minimum duration of 2 days and a 
> maximum duration of 4 days by using P83 and P84.
>
> Now there is only a single Dimension related via P191, with the intent 
> that the value can be an interval.
>
> Given that in the RDF projection of CRM, the value of a Dimension is a 
> single number (and similarly, the dates are single dates), it is not 
> possible to express the above without some additional constructions in 
> that projection.
>
> Thus it seems like we need at least to define P90a_has_minimum_value
> and P90b_has_maximum_value as properties of Dimension to be able to 
> express the interval value. This would be more consistent, and provide 
> access to the construction for other uses of Dimension, so I’m happy 
> with the deprecation of the last SIG … but we need to follow through 
> with the corresponding RDF definitions.
>
> I propose the following properties, which could be defined in the same 
> document as P81a/b and P82a/b:
>
> P90a_has_minimum_value
>
> This property allows the lowest possible value of an E54 Dimension to 
> be approximated by an E60 Number primitive.
>
> P90b_has_maximum_value
>
> This property allows the greatest possible value of an E54 Dimension 
> to be approximated by an E60 Number primitive.
>
> Rob
>
> *From: *Martin Doerr <martin at ics.forth.gr>
> *Date: *Saturday, February 23, 2019 at 4:59 PM
> *To: *Robert Sanderson <RSanderson at getty.edu>, crm-sig 
> <Crm-sig at ics.forth.gr>
> *Subject: *Re: [Crm-sig] Issue 397
>
> Dear Robert,
>
> On 2/23/2019 1:09 AM, Robert Sanderson wrote:
>
>     This becomes problematic, unfortunately, in RDF which does not
>     have a way to natively express a Number that is actually an
>     interval.  The resolution would be to do the same as P81a/b …
>     which would have the same effect as maintaining P83 and P84, just
>     not in the model directly.
>
>     While I appreciate the theoretical consistency that this change
>     would add, from an implementation perspective, this would bring
>     more complexity than value.
>
> I do not understand what increases the complexity: If I have in RDFS
> two paths, P83-E54-P90 AND P84-E54-P90, plus the ambiguity of how to
> use P90a, P90b together with these paths, OR I have a single path
> Pxxx-E54 that splits into P90a, P90b, then in the end I again have two
> paths, Pxxx-E54-P90a AND Pxxx-E54-P90b, and no ambiguity about whether
> to use P83 or P90a.
>
> So where is the added complexity? I see it only reduced, but I may be 
> wrong!
>
> My second question was if, since we have bound the Dimension already 
> to temporal durations in the definition of Pxxx, we should express 
> that by a subclass of E54.
>
> Best,
>
> martin
>
>     Overall, I’m not in favor of the deprecation, but am not averse to
>     adding had_duration separately, with the potential to deprecate 83
>     and 84 if a holistic approach to date and number intervals can be
>     devised.
>
>     Thanks!
>
>     Rob
>
>     *From: *Crm-sig <crm-sig-bounces at ics.forth.gr> on behalf of
>     Martin Doerr <martin at ics.forth.gr>
>     *Date: *Friday, February 15, 2019 at 9:18 AM
>     *To: *crm-sig <Crm-sig at ics.forth.gr>
>     *Subject: *[Crm-sig] Issue 397
>
>     Dear All
>
>     As discussed in Berlin, I proposed to deprecate P83, P84, because
>     they compete with an interval interpretation of P90, and:
>
>     Introduce instead Pxxx had duration, Domain:  E52 Time-Span,
>     Range: E54 Dimension
>     and use P90, P90a, P90b as appropriate
>
>     or introduce an Exxx Temporal Duration, subclass of E54
>     Dimension, and define subproperties in RDFS ending in xsd:duration.
>
>     Here is my definition:
>
>     *Pxxx had duration (was duration of)*
>
>     Domain: E52 Time-Span
>
>     Range: E54 Dimension
>
>     Quantification: one to one (1,1:1,1)
>
>     Scope note:         This property describes the length of time
>     covered by an E52 Time-Span. It allows an E52 Time-Span to be
>     associated with an E54 Dimension representing duration (i.e. its
>     inner boundary) independent from the actual beginning and end.
>     Indeterminacy of the duration value can be expressed by assigning
>     a numerical interval to the property P90 has value of E54 Dimension.
>
>     Examples:
>
>     §  the time span of the Battle of Issos 333 B.C.E. (E52) /had
>     duration/ Battle of Issos minimum duration (E54) has unit (P91)
>     day (E58) has value (P90) (E60)
>
>     In First Order Logic:
>
>     Pxxx(x,y) ⊃E52(x)
>
>     Pxxx(x,y) ⊃E54(y)
>
>     *Comments?*
>
>     ------------------------------------------------------------------------------------------------------
>
>     See:
>
>     P83 had at least duration (was minimum duration of)
>
>     Domain: E52 Time-Span
>
>     Range: E54 Dimension
>
>     Quantification: one to one (1,1:1,1)
>
>     Scope note:         This property describes the minimum length of
>     time covered by an E52 Time-Span.
>
>     It allows an E52 Time-Span to be associated with an E54 Dimension
>     representing its minimum duration (i.e. its inner boundary)
>     independent from the actual beginning and end.
>
>     Examples:
>
>     §  the time span of the Battle of Issos 333 B.C.E. (E52) had at
>     least duration Battle of Issos minimum duration (E54) has unit
>     (P91) day (E58) has value (P90) 1 (E60)
>
>     In First Order Logic:
>
>     P83(x,y) ⊃ E52(x)
>
>     P83(x,y) ⊃ E54(y)
>
>
>     P84 had at most duration (was maximum duration of)
>
>     Domain: E52 Time-Span
>
>     Range: E54 Dimension
>
>     Quantification: one to one (1,1:1,1)
>
>     Scope note:         This property describes the maximum length of
>     time covered by an E52 Time-Span.
>
>     It allows an E52 Time-Span to be associated with an E54 Dimension
>     representing its maximum duration (i.e. its outer boundary)
>     independent from the actual beginning and end.
>
>     Examples:
>
>     §  the time span of the Battle of Issos 333 B.C.E. (E52) had at
>     most duration Battle of Issos maximum duration (E54) has unit
>     (P91) day (E58) has value (P90) 2 (E60)
>
>     In First Order Logic:
>
>     P84(x,y) ⊃ E52(x)
>
>     P84(x,y) ⊃ E54(y)
>
>     -- 
>
>     ------------------------------------
>
>       Dr. Martin Doerr
>
>                    
>
>       Honorary Head of the
>
>       Center for Cultural Informatics
>
>       
>
>       Information Systems Laboratory
>
>       Institute of Computer Science
>
>       Foundation for Research and Technology - Hellas (FORTH)
>
>                        
>
>       N.Plastira 100, Vassilika Vouton,
>
>       GR70013 Heraklion,Crete,Greece
>
>       
>
>       Vox:+30(2810)391625
>
>       Email:martin at ics.forth.gr  <mailto:martin at ics.forth.gr>   
>
>       Web-site:http://www.ics.forth.gr/isl  
>


-- 
------------------------------------
  Dr. Martin Doerr
               
  Honorary Head of the
  Center for Cultural Informatics
  
  Information Systems Laboratory
  Institute of Computer Science
  Foundation for Research and Technology - Hellas (FORTH)
                   
  N.Plastira 100, Vassilika Vouton,
  GR70013 Heraklion,Crete,Greece
  
  Vox:+30(2810)391625
  Email: martin at ics.forth.gr
  Web-site: http://www.ics.forth.gr/isl
