[Crm-sig] How to represent the textual content of documents about museum objects?

Conal Tuohy conal.tuohy at gmail.com
Thu Sep 10 13:05:50 EEST 2015

On 8 September 2015 at 19:27, Dominic Oldman <doint at oldman.me.uk> wrote:

> I think there are various approaches you can take depending upon what your
> objectives are.
> 1. Identify (describe) the document and provide access to it. Using CRM
> this would harmonise with other CRM data.

This is really all I'm aiming to do, though I had to step outside of the
CIDOC CRM (and use FRBRoo) to encode the relationship between the E31
Document and the associated HTML content. I'm slightly dissatisfied with
that, but perhaps it's to be expected. I'm open to other options!

> 2. Identify particular fragments of the text (using FRBRoo).
> 3. Tag particular things in the text
> In terms of 3 there is TEI but also the option of using CRM in RDFa tags
> to identify entities and relationships in the text that would have
> correspondence in the data. This is an approach we have used at the BM.
> RDFa tags can be used to identify people, places, subjects etc, and can
> link these entities using CRM properties. These can operate on their own as
> an extension to the RDF store or be harvested into the RDF store.

In other projects I have used TEI as a source for RDF, with a workflow
which harvests RDF from TEI documents and stores it in a SPARQL graph
store. It's a powerful technique for aggregating data across a corpus of
texts. I would be very interested to read more about how you have used TEI
(or RDFa) in this way at the British Museum!
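
To make the TEI harvesting idea concrete, here is a minimal sketch, not the actual workflow from those projects. It assumes the TEI documents mark up entities with persName and placeName elements carrying a ref attribute, and that the CRM property P67_refers_to is an acceptable way to link the document to each entity. The function names and the example.org URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace; the prefix "tei" is only used inside this script.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# Assumption: P67_refers_to is used to link a document to the entities it mentions.
P67_REFERS_TO = "http://www.cidoc-crm.org/cidoc-crm/P67_refers_to"


def harvest_references(tei_xml, document_uri):
    """Return (subject, predicate, object) triples for every tagged
    person or place in the TEI text that carries a ref attribute."""
    root = ET.fromstring(tei_xml)
    triples = []
    for tag in ("persName", "placeName"):
        # Elements without a ref attribute are skipped by the predicate.
        for element in root.iterfind(f".//tei:{tag}[@ref]", TEI_NS):
            triples.append((document_uri, P67_REFERS_TO, element.get("ref")))
    return triples


def to_sparql_update(triples):
    """Serialise the triples as a SPARQL 1.1 INSERT DATA request body.
    Sending it to a graph store is left out of this sketch."""
    body = "\n".join(f"  <{s}> <{p}> <{o}> ." for s, p, o in triples)
    return "INSERT DATA {\n" + body + "\n}"
```

Running harvest_references over each document in a corpus and posting the resulting updates to one store is what makes cross-document queries possible afterwards.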

But in this particular project I'm trying out a workflow that doesn't
involve an RDF store at all. I don't control the source of the data (I
don't work for Museum Victoria); I am merely querying it and re-formatting
it to produce RDF on the fly (i.e. as requested by a Linked Data client).
Their API is not natively RDF, and I'm not harvesting or even caching the
RDF data I generate so there's actually no "RDF store" involved at all.
It's been an interesting experiment for me. The weakness of the approach
is that any aggregation you need to do has to be quick enough to
perform on the fly. The Linked Data resources (RDF graphs) my software
produces are all based on one or at most two queries to the Museum's API, and
possibly one to DBpedia. On the positive side, the lack of caching and
harvesting makes the whole thing very simple.
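
The on-the-fly approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the software described above: the JSON field names ("id", "title"), the base URI, and the fetch_json callable standing in for the Museum's API are all hypothetical. The only CRM term used is the class E31_Document.

```python
import json

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape the characters that N-Triples does not allow raw in a literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def record_to_ntriples(record, base_uri):
    """Re-format one API record (a dict) as N-Triples describing an E31 Document."""
    subject = f"<{base_uri}{record['id']}>"
    lines = [f"{subject} <{RDF_TYPE}> <{CRM}E31_Document> ."]
    title = record.get("title")
    if title:
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(title)}" .')
    return "\n".join(lines) + "\n"


def handle_request(record_id, fetch_json, base_uri="http://example.org/resource/"):
    """Answer one Linked Data request: one upstream query, no cache, no store.
    fetch_json is a hypothetical callable returning the API's JSON text."""
    record = json.loads(fetch_json(record_id))
    return record_to_ntriples(record, base_uri)
```

Because nothing is stored, each response costs exactly the upstream calls made inside handle_request, which is why any aggregation has to stay small.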



Conal Tuohy