[crm-sig] Type hierarchy

Nicholas Crofts nickcrofts at yahoo.com
Mon Oct 22 14:47:25 EEST 2001


Note concerning the type hierarchy

At our recent meeting in Paris there was some discussion about the role of the type hierarchy, the numbering of types and the appropriate methodological principles to apply. These notes are intended to present my understanding of these issues.

The type hierarchy contains four sorts of types:

   1. As already noted elsewhere [Doerr & Crofts 1999], the type hierarchy implicitly contains a 'redundant' declaration of all types corresponding to the entities (classes) present under "E1 CIDOC Entity". These implicit type declarations also duplicate the hierarchical structure of the existing entity hierarchy. In Paris, I put forward the proposal that implicit types should be prefixed with the letter 'T' and use the same numbering as their corresponding entity. "E5 Event", for example, would correspond to "T5 Event". This proposal has not yet been adopted, but I shall use this form of notation in the remarks which follow.
   2. The type hierarchy may also contain additional subtypes of the implicit types, thereby providing a higher degree of granularity than is expressed by the basic entity hierarchy. For example, a subtype "Coin" could be declared for "T24 Man-Made Object". I am not sure what rules should be adopted for numbering these subtypes, since they are assumed to be domain-oriented and specific to local systems. For present purposes I shall use a decimal notation such as "T24.1 Coin".
   3. The third sort of types present in the type hierarchy are the descendants of type hierarchies which do not correspond to any entity in the entity hierarchy, such as 'language' and 'material'. The head types of these type hierarchies are currently assigned an 'E' number. This effectively avoids any possible conflict with the numbering of the entity hierarchy. However, in order to highlight their nature as 'types', I propose to adopt the same 'T' prefix as for other types. (I would argue that this approach is consistent in that it suggests the existence of an 'implicit' entity in the entity hierarchy, one that could be declared explicitly at a later date if required.) I shall refer to these type hierarchies as 'floating types', since they are not assigned to any position in the entity hierarchy.
   4. The head type of the type hierarchy is E55 Type. This exists in the main entity hierarchy, so it can correctly be referred to using the 'E' prefix. However, since it also represents the highest type in the hierarchy of implicit types, it could also be considered as equivalent to "E1 CIDOC Entity", and might therefore be numbered "T1 CIDOC Entity". Alternatively, T1 might be declared as a subtype of E55 Type. This point needs further clarification.
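
By way of illustration, the notation proposed above might be sketched in Python as follows. This is only a sketch of a convention that has not been adopted: the function names are invented for the example, and the decimal suffix for local subtypes is the provisional one used in this note.

```python
import re

def implicit_type_id(entity_id: str) -> str:
    """Map an entity identifier such as 'E5' to its implicit type 'T5'."""
    match = re.fullmatch(r"E(\d+)", entity_id)
    if match is None:
        raise ValueError(f"not an entity identifier: {entity_id!r}")
    return "T" + match.group(1)

def local_subtype_id(type_id: str, ordinal: int) -> str:
    """Build a provisional decimal identifier for a local subtype, e.g. 'T24.1'."""
    if re.fullmatch(r"T\d+(\.\d+)*", type_id) is None:
        raise ValueError(f"not a type identifier: {type_id!r}")
    return f"{type_id}.{ordinal}"

print(implicit_type_id("E5"))                        # T5
print(local_subtype_id(implicit_type_id("E24"), 1))  # T24.1
```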

The distinction between the 'E' hierarchy and the 'T' hierarchy is essentially technical. The Entity hierarchy consists of classes and can be considered as a structural hierarchy. In a relational database it would naturally be implemented as a set of tables. The Type hierarchy consists of instances and can be considered as a set of data values. It would naturally be implemented as a single table containing values. Despite these differences, the two representations are intended to be logically equivalent. Each entity in the E hierarchy corresponds to a type in the T hierarchy. When no corresponding type exists for a given entity, it is merely for reasons of economy; the undeclared type can be considered as implicit. Conversely, a type for which no corresponding entity exists in the E hierarchy corresponds to an implicit entity, which could be declared at a later stage if needed.
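
The contrast between the two representations can be sketched in Python: entities as classes (structure), types as rows in a single table (data). The sketch below is an assumption-laden illustration, not a proposed implementation. The class names and the helper `implicit_type_rows` are invented, only entities named in this note appear, and intermediate classes are omitted, so the direct parent links shown are a simplification.

```python
# Entity hierarchy: classes. Abbreviated; intermediate classes are omitted.
class E1_CIDOC_Entity:
    label = "CIDOC Entity"

class E18_Physical_Entity(E1_CIDOC_Entity):
    label = "Physical Entity"

class E21_Person(E18_Physical_Entity):
    label = "Person"

class E24_Man_Made_Object(E1_CIDOC_Entity):
    label = "Man-Made Object"

def implicit_type_rows(root):
    """Walk the class tree and emit one (type_id, label, parent_type_id) row per entity."""
    rows = []

    def visit(cls, parent_id):
        number = cls.__name__.split("_")[0][1:]  # 'E21_Person' -> '21'
        type_id = "T" + number
        rows.append((type_id, cls.label, parent_id))
        for sub in cls.__subclasses__():
            visit(sub, type_id)

    visit(root, None)
    return rows

# Type hierarchy: a single table of values, derived from the classes...
type_table = implicit_type_rows(E1_CIDOC_Entity)
# ...and extended with a local subtype, which is plain data and needs no new class.
type_table.append(("T24.1", "Coin", "T24"))

for row in type_table:
    print(row)
```

The point of the design is that adding "Coin" changes the data but not the structure, which is the economy argument made above.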

A consequence of the logical equivalence of the entity hierarchy and the type hierarchy is that the methodological rules which apply to the declaration of entities and sub-entities also apply to types and subtypes. The 'isA' rule should be applicable to both, so it should be possible to say of any sub-entity that "sub-entity X isA(n) entity Y", where Y is a super-entity of X. Thus, "a Person (E21) is a Physical Entity (E18)". Similarly, it should be possible to say of any subtype that "subtype X isA(n) type Y", where Y is a super-type of X. Following my previous example: "a Coin (T24.1) is a Man-Made Object (T24)". It follows from this that we should also be able to make assertions of the form "subtype X isA(n) entity Y", where X is a subtype of type Y', corresponding to entity Y, e.g. "a Coin (T24.1) is a Man-Made Object (E24)". We should bear in mind that the isA rule is intended to be inclusive: it is true iff any and all members of a sub-category are also members of the super-category. 'Pasta', for example, is not a good specialisation of 'Italian food', since some pastas are not Italian.
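
The isA assertions above amount to a walk up the chain of super-types. A minimal sketch in Python, assuming a hypothetical parent table in which intermediate levels are omitted, and treating every type as isA itself:

```python
# Hypothetical type table: type id -> parent type id (None marks the head).
TYPE_PARENT = {
    "T1": None,      # implicit type of E1 CIDOC Entity
    "T18": "T1",     # Physical Entity (intermediate levels omitted)
    "T21": "T18",    # Person (intermediate levels omitted)
    "T24": "T1",     # Man-Made Object (intermediate levels omitted)
    "T24.1": "T24",  # Coin, a local subtype
}

def type_is_a(sub_type, super_type):
    """True if super_type is sub_type itself or one of its super-types."""
    current = sub_type
    while current is not None:
        if current == super_type:
            return True
        current = TYPE_PARENT[current]
    return False

def type_is_a_entity(type_id, entity_id):
    """'Subtype X isA entity Y' holds when X isA the implicit type Y' of Y."""
    return type_is_a(type_id, "T" + entity_id[1:])

print(type_is_a("T24.1", "T24"))         # True: a Coin is a Man-Made Object (T24)
print(type_is_a_entity("T24.1", "E24"))  # True: a Coin is a Man-Made Object (E24)
print(type_is_a_entity("T24.1", "E21"))  # False: a Coin is not a Person
```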

When do we declare subtypes without a corresponding entity? 

A subtype should be declared for an existing implicit type (i.e. one which corresponds to an existing entity) iff it is needed to register a domain-specific notion which would otherwise not be recorded and if it does not require properties in addition to those it inherits from the existing entity. If additional properties are required, a sub-entity would have to be declared.

A subtype should be declared either directly under E55 or as part of a type hierarchy so declared (i.e. there is no corresponding entity) iff it corresponds to an implicit entity for which no properties are required in the scope of the CRM. This is assumed to be the case for E56 (T56) language, for example. No entity is declared in the CRM to represent language since we have no properties to record about languages other than their identity: the CRM does not describe or talk about languages. If this situation changes in the future, we would need to declare a 'language' entity in the entity hierarchy, and attach the relevant properties. 
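
The two rules above can be read as a small decision procedure. The following Python sketch is one possible reading, not a normative statement; the function name, parameter names and return strings are invented for the example.

```python
def how_to_declare(specialises_existing_entity: bool, needs_own_properties: bool) -> str:
    """Suggest where a new notion belongs, following the two rules above."""
    if needs_own_properties:
        # Properties can only be attached in the entity hierarchy.
        return "entity (or sub-entity) in the E hierarchy"
    if specialises_existing_entity:
        return "subtype of an existing implicit type"
    return "floating type under E55 Type"

# 'Coin': specialises Man-Made Object, needs no properties of its own.
print(how_to_declare(True, False))   # subtype of an existing implicit type
# 'Language': no corresponding entity, no properties in the scope of the CRM.
print(how_to_declare(False, False))  # floating type under E55 Type
# 'Language', should the CRM one day need to describe languages.
print(how_to_declare(False, True))   # entity (or sub-entity) in the E hierarchy
```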

When to use the 'has type' property?

All entities declared in the CRM have a 'has type' property. This enables instances of entities to be declared as belonging to a given subtype. The logical equivalence of the types and entities means that assigning a subtype to an entity instance is logically equivalent to declaring it as an instance of an implicit sub-entity (a specialisation). It follows that one should only assign subtypes which follow the 'isA' rule, i.e. where subtype Y isA type X and X is the type corresponding to entity X'. (This constraint may be represented in the property declarations using the appropriate T number.) Furthermore, we can say that types assigned to the 'has type' property should not be 'floating types', since these are not yet declared in the entity hierarchy and consequently cannot follow the isA rule. No instance of an entity in the CRM should have type 'language', for example, for this would imply that language is a specialisation of the entity to which the instance belongs. (If it can be established that "language" is indeed a specialisation of some existing entity, then it should be reclassified.)
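
The constraint on 'has type' can be checked mechanically. The Python sketch below assumes a hypothetical type table in which floating hierarchies have their own head and intermediate levels are omitted. The identifier "T56.1" for French is invented for the example. Assigning the implicit type itself (T24 to an E24 instance) is treated as permitted, though redundant.

```python
# Hypothetical type table: type id -> parent type id (None marks a head type).
TYPE_PARENT = {
    "T1": None,      # implicit type of E1 CIDOC Entity
    "T24": "T1",     # Man-Made Object (intermediate levels omitted)
    "T24.1": "T24",  # Coin, a local subtype
    "T33": "T1",     # Linguistic Object (intermediate levels omitted)
    "T56": None,     # Language, head of a floating type hierarchy
    "T56.1": "T56",  # French; this identifier is invented for the example
}

def lineage(type_id):
    """Return the type followed by all of its super-types, nearest first."""
    chain = []
    while type_id is not None:
        chain.append(type_id)
        type_id = TYPE_PARENT[type_id]
    return chain

def is_floating(type_id):
    """A type is floating when its lineage never reaches the implicit root T1."""
    return "T1" not in lineage(type_id)

def may_have_type(entity_id, type_id):
    """Check the isA constraint on 'has type' for an instance of entity_id."""
    return "T" + entity_id[1:] in lineage(type_id)

print(may_have_type("E24", "T24.1"))  # True: a Coin is a Man-Made Object
print(may_have_type("E33", "T56.1"))  # False: French is not a Linguistic Object
print(is_floating("T56.1"))           # True
```

A floating type never passes the check, because its lineage cannot contain the implicit type of any declared entity.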

When to use other property links to the type hierarchy?

Some entities have additional properties which link into the type hierarchy. "E33 Linguistic Object", for example, has a property "has language (is language of): T56 Language". Property links of this sort can be seen as logically equivalent to links to entities. The fact that language is currently declared as a type, and not as an entity, reflects the fact that, as it stands, the CRM has very little to say about languages. It follows that the type to which the property link refers should not be a subtype of the entity. If it is, then the "has type" property should be used instead. In the current example this rule holds: it is not the case that a language is a linguistic object (as defined in the CRM).

In the light of the foregoing remarks, I would argue that the 'has gender' property of "E21 Person" should be maintained and should not be handled using the inherited "has type" property. Gender is not a good specialisation of Person, since many male, female and somewhere-in-between objects are not persons; inversely, many men and women are not fish. The CRM does not currently have a specific class for animals, other than E20 Biological Object (which could also be taken to include plants, bacteria and biological material). However, something like an 'animalia' subclass will need to be included at some stage to meet the needs of natural history collections. This would be a natural place for the 'has gender' property.

Open questions

How should types such as 'language' be declared, which could be positioned in the entity hierarchy? Leaving them as 'floating' types suggests that we don't know how to classify them correctly. Would it be true to say that, in principle, all floating types could be placed in the entity hierarchy?

The entity hierarchy leads down to, but does not include, instances of entities. The type hierarchy leads down to, but also appears to include, instance level types. This asymmetry seems to be required, for example, for 'E56 Language' and 'E57 Material'. The type hierarchies fail to do their job if they don't include entries like 'French' and 'Wood'. Do we need to make this distinction clear at a theoretical level, and possibly by differentiating the instance level data in the type hierarchy in some way? Failure to make the distinction might lead to problems if property links are made to "class level" types rather than instances.

Types can have many names: common names, scientific names, original names, etc. Should we give types an 'is identified by' link to appellation? (Incidentally, should appellation have a 'has value' property link to string?)

Types are created by human beings, who can often be named. This is notably the case with biological taxonomy. What is the relationship between the type hierarchy and E28 Conceptual Object? (This could open up a whole can of biological objects...)

I hope some of the foregoing makes sense. 

best wishes

Nick Crofts

References

[Doerr & Crofts 1999] Electronic Esperanto: The Role of the Object Oriented CIDOC Reference Model, Proc. of ICHIM'99, Washington, DC, September 22-26, 1999

URL : http://cidoc.ics.forth.gr/docs/doerr_crofts_ichim99_new.doc


Nicholas Crofts
DAEL / DSI
rue David-Dufour 5
Case postale 22
CH - 1211 Genève 8
tél +41 22 327 5271
fax +41 22 328 4382

