[Crm-sig] Pages reproduced as spreads

martin martin at ics.forth.gr
Fri Mar 10 18:48:24 EET 2017


On 10/3/2017 6:21 PM, Florian Kräutli wrote:
> Apologies for having inadvertently split this discussion in two 
> threads. I hope this answer shows up in the right place.
>
> Thank you Dominic and Christian-Emil. This is really useful.
>
> Dominic, you pointed to another part of the problem which I haven't 
> asked about here, but which appears in a different area of our model: 
> how do I specify the page on which an expression appears in a PDF?
>
> Martin and George, I hope you don't mind if I share your 
> recommendation on this question here:
>
>     7) How to refer to a page?
>
>     The distinction between pages in the physical work and the digital
>     work was first pointed out.
>
>     1) One page in the publication expression. Pages separate below
>     phrase boundaries. Therefore they are units at the symbolic level,
>     and parthood should be expressed using P106.
>
>     2) One in the digital image.
>
>     If you want to refer to pages in the PDF, it would be best to use a
>     media indexing annotation from 3D-COFORM. The METS <area> construct
>     gets an ID and has coordinates in the media object.
>
Yes, the idea is to regard images, 3D models, HTML files, .doc files, 
etc. as making up virtual mathematical spaces (not E53 Place!), in which 
we can describe sections by some geometric expression, as I would in 
natural space by geocoordinates. This already underlies the METS <area> 
construct. In the 3D-COFORM project, we generalized this to include 3D 
models. We then describe an area, such as a rectangular highlight on an 
image, as a P106 part of the Digital Object, and identify it by a sort 
of new RDF data type, i.e., a literal filled with an XML expression, 
analogous to xsd:dateTime, GeoSPARQL literals, etc. That allows for 
connecting a triple store with a media browser that understands the 
highlights.
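
To make this concrete, here is a minimal sketch in Python. It is not the 
3D-COFORM implementation. The example.org namespace, the 
has_area_expression property, the areaLiteral datatype and the "page" 
attribute are all assumptions made up for illustration. Only SHAPE and 
COORDS are borrowed from the METS <area> element, and P106 is the CRM 
"is composed of" property.

```python
import xml.etree.ElementTree as ET

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def area_literal(page, x1, y1, x2, y2):
    """Serialize a rectangle on one PDF page as an <area>-like XML string.

    SHAPE and COORDS follow the METS <area> attribute names; the "page"
    attribute is an assumption of this sketch, not part of METS.
    """
    area = ET.Element("area", {
        "SHAPE": "RECT",
        "COORDS": f"{x1},{y1},{x2},{y2}",
        "page": str(page),
    })
    return ET.tostring(area, encoding="unicode")


def describe_area(object_uri, area_uri, page, rect):
    """Return triples stating that the area is a P106 part of the object.

    The second triple attaches the XML expression as a typed literal,
    represented here as a (lexical form, datatype URI) pair.
    """
    literal = area_literal(page, *rect)
    return [
        (object_uri, CRM + "P106_is_composed_of", area_uri),
        (area_uri, EX + "has_area_expression", (literal, EX + "areaLiteral")),
    ]


if __name__ == "__main__":
    for triple in describe_area(
        EX + "book1.pdf", EX + "book1.pdf#area1", 3, (10, 20, 300, 400)
    ):
        print(triple)
```

A media browser that understands the literal could parse the XML and draw 
the rectangle, while the triple store only needs to treat it as an opaque 
typed value.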

Best,

Martin
>
>
>
>
>> On 10 Mar 2017, at 16:29, crm-sig-request at ics.forth.gr 
>> <mailto:crm-sig-request at ics.forth.gr> wrote:
>>
>> Send Crm-sig mailing list submissions to
>> crm-sig at ics.forth.gr <mailto:crm-sig at ics.forth.gr>
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> http://lists.ics.forth.gr/mailman/listinfo/crm-sig
>> or, via email, send a message with subject or body 'help' to
>> crm-sig-request at ics.forth.gr
>>
>> You can reach the person managing the list at
>> crm-sig-owner at ics.forth.gr
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Crm-sig digest..."
>>
>>
>> Today's Topics:
>>
>>   1. Re: Crm-sig Digest, Vol 122, Issue 8 (Dominic Oldman)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 10 Mar 2017 15:24:50 +0000
>> From: Dominic Oldman <doint at oldman.me.uk>
>> To: Christian-Emil Smith Ore <c.e.s.ore at iln.uio.no>,
>> "crm-sig at ics.forth.gr" <crm-sig at ics.forth.gr>
>> Subject: Re: [Crm-sig] Crm-sig Digest, Vol 122, Issue 8
>> Message-ID:
>> <CAHVLp01pyaJqx52N97JX=syvJfGrscPooig9S6a=3h0WcFBimg at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi Florian,
>>
>> Here is an off line discussion that we should have put on the list.
>>
>> Cheers,
>>
>> D
>>
>>
>> orcid.org/0000-0002-5539-3126
>>
>> On Fri, Mar 10, 2017 at 12:47 PM, Christian-Emil Smith Ore <
>> c.e.s.ore at iln.uio.no> wrote:
>>
>>> Do so, and send my regards. Please incorporate the following example:
>>>
>>>
>>> Creating excerpts is a common activity in lexicography and history. An
>>> excerpt is indeed a fragment of a text. The corresponding expression
>>> is a fragment expression. See for example a paper slip for the word
>>> 'shovelfork' (used to prepare a (small) field instead of ploughing).
>>> The text is a fragment of a longer text dealing with somebody's
>>> childhood memories.
>>>
>>>
>>> http://www.edd.uio.no/setelarkiv/setel1963769.jpg?
>>>
>>>
>>> The entire paper slip represents a self-contained expression where an
>>> expression fragment is incorporated (in the corresponding work).
>>>
>>>
>>> Best
>>>
>>> Christian-Emil
>>> ------------------------------
>>> *From:* Dominic Oldman <doint at oldman.me.uk>
>>> *Sent:* 10 March 2017 13:32
>>>
>>> *To:* Christian-Emil Smith Ore
>>> *Subject:* Re: [Crm-sig] Crm-sig Digest, Vol 122, Issue 8
>>>
>>> Hi Christian,
>>>
>>> I note that this didn't go on the list. Can I post this to the list, as I
>>> think it is important generally?
>>>
>>> D
>>>
>>> orcid.org/0000-0002-5539-3126
>>>
>>> On Fri, Mar 10, 2017 at 12:30 PM, Dominic Oldman <doint at oldman.me.uk>
>>> wrote:
>>>
>>>> Although I think then the scope note could be much clearer on F23,
>>>> because it tends to suggest fragments isolated from the whole, whereas
>>>> in this case the section still resides within a whole. Although the
>>>> scope note does state "excerpts", I still think this could be stated
>>>> far more clearly and with less ambiguity, if it does mean that these
>>>> excerpts can be identified sections of the information object within
>>>> a whole text.
>>>>
>>>> Can we put this on the agenda for the next meeting?
>>>>
>>>> D
>>>>
>>>>
>>>> orcid.org/0000-0002-5539-3126
>>>>
>>>> On Fri, Mar 10, 2017 at 9:37 AM, Christian-Emil Smith Ore <
>>>> c.e.s.ore at iln.uio.no> wrote:
>>>>
>>>>> It is not necessarily so that the text printed on a page is a
>>>>> self-contained expression. Is it not, in general, an F23 Expression
>>>>> Fragment?
>>>>>
>>>>>
>>>>> Best
>>>>>
>>>>> Christian-Emil
>>>>>
>>>>>
>>>>> F22 Self-Contained Expression
>>>>>
>>>>> This class comprises the immaterial realisations of individual works
>>>>> at a particular time that are regarded as a complete whole. The
>>>>> quality of wholeness reflects the intention of its creator that this
>>>>> expression should convey the concept of the work. Such a whole can in
>>>>> turn be part of a larger whole.
>>>>>
>>>>> Inherent to the notion of work is the completion of recognisable
>>>>> outcomes of the work. These outcomes, i.e. the Self-Contained
>>>>> Expressions, are regarded as the symbolic equivalents of Individual
>>>>> Works, which form the atoms of a complex work. A Self-Contained
>>>>> Expression may contain expressions or parts of expressions from other
>>>>> work, such as citations or items collected in anthologies. Even though
>>>>> they are incorporated in the Self-Contained Expression, they are not
>>>>> regarded as becoming members of the expressed container work by their
>>>>> inclusion in the expression, but are rather regarded as foreign or
>>>>> referred to elements.
>>>>>
>>>>> F22 Self-Contained Expression can be distinguished from F23 Expression
>>>>> Fragment in that an F23 Expression Fragment was not intended by its
>>>>> creator to make sense by itself. Normally creators would characterise
>>>>> an outcome of a work as finished. In other cases, one could recognise
>>>>> an outcome of a work as complete from the elaboration or logical
>>>>> coherence of its content, or if there is any historical knowledge
>>>>> about the creator deliberately or accidentally never finishing
>>>>> (completing) that particular expression. In all those cases, one would
>>>>> regard an expression as self-contained.
>>>>>
>>>>>
>>>>> ------------------------------
>>>>> *From:* Dominic Oldman <doint at oldman.me.uk>
>>>>> *Sent:* 09 March 2017 20:50
>>>>> *To:* Christian-Emil Smith Ore
>>>>>
>>>>> *Subject:* Re: [Crm-sig] Crm-sig Digest, Vol 122, Issue 8
>>>>>
>>>>>
>>>>> So in this case the self-contained expression (information object)
>>>>> identified as page 1 can then be represented by a part of a PDF image,
>>>>> which itself identifies parts (a physical page?) that are identified
>>>>> accordingly.
>>>>>
>>>>> I'm still not sure whether this is what Florian means, though, so I
>>>>> await his reply.
>>>>>
>>>>> D
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> orcid.org/0000-0002-5539-3126
>>>>>
>>>>> On Thu, Mar 9, 2017 at 7:31 PM, Christian-Emil Smith Ore <
>>>>> c.e.s.ore at iln.uio.no> wrote:
>>>>>
>>>>>> Hi
>>>>>> There are many ways to number or put identifiers on parts of written
>>>>>> or printed material: folio, sheet (verso/recto), page.
>>>>>> If the physical original is known, perhaps a starting point would be
>>>>>> to model the physical parts and their relationships.
>>>>>>
>>>>>> The PDFs in question seem to be facsimiles of these physical parts
>>>>>> (a single page, double pages, etc.). A possible way to model them is
>>>>>> to see the PDFs as carriers of visual items representing the physical
>>>>>> objects of the specific item (F5).
>>>>>>
>>>>>> The first example in the scope note of P138 represents (has
>>>>>> representation):
>>>>>> - the digital file found at
>>>>>>   http://www.emunch.no/N/full/No-MM_N0001-01.jpg (E36) represents
>>>>>>   page 1 of Edvard Munch's manuscript MM N 1, Munch-museet (E73),
>>>>>>   mode of representation Digitisation (E55)
>>>>>>
>>>>>> Best
>>>>>> Christian-Emil
>>>>>> ________________________________________
>>>>>> From: Crm-sig <crm-sig-bounces at ics.forth.gr> on behalf of Dominic
>>>>>> Oldman <DOldman at britishmuseum.org>
>>>>>> Sent: 09 March 2017 17:59
>>>>>> To: Florian Kräutli; crm-sig at ics.forth.gr
>>>>>> Subject: Re: [Crm-sig] Crm-sig Digest, Vol 122, Issue 8
>>>>>>
>>>>>> Hi Florian,
>>>>>>
>>>>>> Just trying to understand.
>>>>>>
>>>>>> You have an expression that is organised with page numbers. This is
>>>>>> reproduced in the PDF. The expression page numbers are the same (the
>>>>>> information object), but page 1 is spread over two carrier pages,
>>>>>> i.e. page 1 is still page 1 as an information object, but the
>>>>>> application (Adobe) spreads it over two application carrier pages.
>>>>>> Is that right, or is it something else?
>>>>>>
>>>>>> If the expression is the same (the same information object), then
>>>>>> isn't page 1 still page 1?
>>>>>>
>>>>>> Can you clarify?
>>>>>>
>>>>>> D
>>>>>>
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Crm-sig [crm-sig-bounces at ics.forth.gr] on behalf of Florian
>>>>>> Kräutli [fkraeutli at mpiwg-berlin.mpg.de]
>>>>>> Sent: 09 March 2017 10:38
>>>>>> To: crm-sig at ics.forth.gr
>>>>>> Subject: Re: [Crm-sig] Crm-sig Digest, Vol 122, Issue 8
>>>>>>
>>>>>> Dear Martin,
>>>>>>
>>>>>> many thanks for your input!
>>>>>>
>>>>>> Our question at the moment is simply: does a page in the PDF
>>>>>> represent one or two pages of the book?
>>>>>>
>>>>>> Later on, we might have more specific questions that will require us
>>>>>> to define the relationships between these two page identifiers (in
>>>>>> the physical book and in the PDF) more explicitly. We would then also
>>>>>> need to manually assess each PDF as, for instance, we cannot assume
>>>>>> that page n in a book corresponds to page n/2 in a double-spread PDF.
>>>>>> A PDF might contain some additional pages with information about the
>>>>>> digitisation process.
>>>>>>
>>>>>> For now, however, we only need a binary answer: double-spread, yes or
>>>>>> no.
>>>>>>
>>>>>> All the best,
>>>>>>
>>>>>> Florian
>>>>>>
>>>>>>
>>>>>>> On 8 Mar 2017, at 11:00, crm-sig-request at ics.forth.gr wrote:
>>>>>>>
>>>>>>> Today's Topics:
>>>>>>>
>>>>>>>  1. Re: Pages reproduced as spreads (martin)
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------------
>>>>>>>
>>>>>>> Message: 1
>>>>>>> Date: Tue, 7 Mar 2017 18:24:17 +0200
>>>>>>> From: martin <martin at ics.forth.gr>
>>>>>>> To: crm-sig at ics.forth.gr
>>>>>>> Subject: Re: [Crm-sig] Pages reproduced as spreads
>>>>>>> Message-ID: <e4b3d793-40d5-f5d5-1f39-ff2404bab29b at ics.forth.gr>
>>>>>>> Content-Type: text/plain; charset=UTF-8; format=flowed
>>>>>>>
>>>>>>> Dear Florian,
>>>>>>>
>>>>>>> There is no model without a question. Pages of books constitute a
>>>>>>> partitioning of an information object. Each page number can be seen
>>>>>>> as an identifier. Paragraphs belong to an alternative partitioning
>>>>>>> system. The reproduction has its own partitioning, the scanned
>>>>>>> double pages. Each scanned image represents, and actually also
>>>>>>> incorporates, the text of two pages of the reproduced book.
>>>>>>> Between alternative partitionings, one can define includes/overlaps
>>>>>>> relations.
>>>>>>>
>>>>>>> Whether this is elegant depends on what queries or functions you'd
>>>>>>> like to support.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> martin
>>>>>>>
>>>>>>> On 7/3/2017 1:36 ??, Florian Kräutli wrote:
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> I have a collection of Books (F5) that have been reproduced (F33)
>>>>>>>> as PDFs (E84). In some cases, books have been digitised as spreads,
>>>>>>>> i.e. one page in the PDF represents two pages in the book.
>>>>>>>>
>>>>>>>> Is there an elegant way to model this?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Florian
>>>>>>>> _______________________________________________
>>>>>>>> Crm-sig mailing list
>>>>>>>> Crm-sig at ics.forth.gr
>>>>>>>> http://lists.ics.forth.gr/mailman/listinfo/crm-sig
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> --------------------------------------------------------------
>>>>>>> Dr. Martin Doerr              |  Vox:+30(2810)391625        |
>>>>>>> Research Director             |  Fax:+30(2810)391638        |
>>>>>>>                               |  Email: martin at ics.forth.gr |
>>>>>>>                                                             |
>>>>>>>               Center for Cultural Informatics               |
>>>>>>>               Information Systems Laboratory                |
>>>>>>>                Institute of Computer Science                |
>>>>>>>   Foundation for Research and Technology - Hellas (FORTH)   |
>>>>>>>                                                             |
>>>>>>>               N.Plastira 100, Vassilika Vouton,             |
>>>>>>>                GR70013 Heraklion,Crete,Greece               |
>>>>>>>                                                             |
>>>>>>>             Web-site: http://www.ics.forth.gr/isl           |
>>>>>>> --------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------
>>>>>>>
>>>>>>> End of Crm-sig Digest, Vol 122, Issue 8
>>>>>>> ***************************************
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>> ------------------------------
>>
>> End of Crm-sig Digest, Vol 122, Issue 10
>> ****************************************
>
>
>


