Changes between Version 4 and Version 5 of Collaborations/Perseus


Timestamp: 09/25/15 10:12:11
Author: bridget.almas@tufts.edu
Comment: added some ideas for an RDA Collaboration project

* investigate granular citation of Perseus text fragments using the [https://www.clarin.eu/content/virtual-collections virtual collection registry]
* integrate services into [http://weblicht.sfs.uni-tuebingen.de/weblichtwiki/index.php/Main_Page Weblicht]


'''Possible RDA Collaboration Project:'''

Data that Perseus and OPP have:
* Images of Manuscripts and Books [ OPP ]
* OCR Scans of Manuscripts and Books [ OPP ]
* Catalog Metadata (MODS, MADS and CTS) [ Perseus ]

What we want to be able to do:
* clearly distinguish between these data types
* assign a persistent identifier to each image
* assign a persistent identifier to each OCR scan
* differentiate between image types
* register new CTS URN identifiers for eventual TEI transcriptions resulting from the scans
* link catalog metadata to images and OCR scans
* make all of this data searchable and retrievable through the CLARIN Federated Search endpoint
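
A minimal Python sketch of how these requirements might fit together: each image and OCR scan carries its own PID, images carry a type label so that variants can be told apart, and a catalog record identified by a CTS URN links to the PIDs of its images and scans. All class and field names, the example Handles (`12345/...`) and the assumption that an OCR scan derives from exactly one image are hypothetical, not an existing Perseus or OPP schema; the CTS URN (Homer's Iliad) is used only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class DataType(Enum):
    """Hypothetical labels for the three kinds of data listed above."""
    IMAGE = "image"
    OCR_SCAN = "ocr-scan"
    CATALOG_RECORD = "catalog-record"


@dataclass
class Image:
    pid: str         # persistent identifier of this image (e.g. a Handle)
    image_type: str  # hypothetical variant label, e.g. "bw-300dpi"
    data_type: DataType = DataType.IMAGE


@dataclass
class OcrScan:
    pid: str               # persistent identifier of this OCR output
    source_image_pid: str  # assumption: each OCR scan comes from one image
    data_type: DataType = DataType.OCR_SCAN


@dataclass
class CatalogRecord:
    cts_urn: str  # CTS URN of the work/edition described by the record
    linked_pids: list = field(default_factory=list)
    data_type: DataType = DataType.CATALOG_RECORD

    def link(self, item) -> None:
        """Link an Image or OcrScan to this record by its PID (idempotent)."""
        if item.pid not in self.linked_pids:
            self.linked_pids.append(item.pid)


# Example with made-up Handles:
record = CatalogRecord(cts_urn="urn:cts:greekLit:tlg0012.tlg001")
img = Image(pid="12345/img-0001", image_type="bw-300dpi")
ocr = OcrScan(pid="12345/ocr-0001", source_image_pid=img.pid)
record.link(img)
record.link(ocr)
```

Keeping the data type as an explicit field on every object mirrors the first requirement (clearly distinguishing the types); in the RDA setting those labels would instead be identifiers of types registered in a Data Types Registry.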

Relevant RDA Components:

* Data Types Registry to manage and describe the data types
* PID Types API to abstract the differences between the CTS URN and Handle-based identifier systems
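
To illustrate what abstracting these differences could look like, here is a hypothetical Python sketch of a single entry point that recognises either a CTS URN or a Handle and returns a URL for retrieving the resource. It is not the RDA PID Information Types API; `Pid`, `parse_pid`, `resolver_url`, the scheme labels and the CTS endpoint URL are assumptions. Handles are resolved through the public `hdl.handle.net` proxy, and CTS URNs through a CTS `GetPassage` request, which assumes the URN includes a passage reference.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode

# CTS URN: urn:cts:<namespace>:<work component>[:<passage>]
CTS_URN = re.compile(
    r"^urn:cts:(?P<namespace>[^:]+):(?P<work>[^:]+)(?::(?P<passage>[^:]+))?$"
)
# Handle: <prefix>/<suffix>; prefixes are dot-separated naming authorities
HANDLE = re.compile(r"^(?P<prefix>[0-9A-Za-z.]+)/(?P<suffix>\S+)$")


@dataclass
class Pid:
    scheme: str  # "cts-urn" or "handle" (hypothetical scheme labels)
    value: str   # the identifier string as given
    parts: dict  # scheme-specific components parsed from the identifier


def parse_pid(identifier: str) -> Pid:
    """Classify an identifier as a CTS URN or a Handle and split it into parts."""
    m = CTS_URN.match(identifier)
    if m:
        parts = {k: v for k, v in m.groupdict().items() if v is not None}
        return Pid("cts-urn", identifier, parts)
    m = HANDLE.match(identifier)
    if m:
        return Pid("handle", identifier, m.groupdict())
    raise ValueError(f"unrecognised identifier: {identifier!r}")


def resolver_url(pid: Pid, cts_endpoint: Optional[str] = None) -> str:
    """Return a URL at which the identified resource can be retrieved."""
    if pid.scheme == "handle":
        return f"https://hdl.handle.net/{pid.value}"
    if cts_endpoint is None:
        raise ValueError("resolving a CTS URN needs a CTS endpoint URL")
    # GetPassage expects a passage-level URN
    query = urlencode({"request": "GetPassage", "urn": pid.value})
    return f"{cts_endpoint}?{query}"


# Example (the Handle and the CTS endpoint are made up):
urn = parse_pid("urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.1")
hdl = parse_pid("12345/img-0001")
print(urn.parts["passage"], resolver_url(hdl))
```

Callers only see `Pid` and `resolver_url`, so an image Handle and a text CTS URN can be stored and dereferenced the same way, which is the kind of interoperability the PID management solution aims at.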

Possible workplan:

CLARIN:

* CMDI mapping of the Perseus Catalog metadata
* provide a repository and PIDs for Images and OCR Scans (implement the RDA PID Types API if not already done?)

OPP:
* update the OCR workflow to register PIDs with CLARIN
* implement the PID Types API for CTS URNs (augmenting the Catalog CITE Collections API)
* deploy a demonstrator Data Types Registry (CORDRA) and register the Perseus/OPP data types

Perseus:

* OAI-PMH interface to the Perseus Catalog
* register as a CLARIN center
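
A minimal sketch of the harvesting side of the planned OAI-PMH interface: building `ListRecords` requests and following `resumptionToken`s across result pages, as a CLARIN harvester would. The base URL and function names are hypothetical. `oai_dc` is the Dublin Core format every OAI-PMH repository must support; the prefix used for CMDI records would be whatever the endpoint advertises through `ListMetadataFormats`.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None,
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only verb may accompany it
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective harvesting, e.g. "2015-09-01"
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumptionToken of a ListRecords response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken marks the end of the list
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# Hypothetical endpoint for the Perseus Catalog:
first = list_records_url("https://catalog.example.org/oai")
```

A harvester would fetch `first`, call `next_token` on each response, and request `list_records_url(base_url, resumption_token=token)` until `next_token` returns `None`.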

Solutions provided:

Data Management solution for OPP Image/OCR/Book data
 * defines a common interoperable model for the relationships between images, OCR scans and books
 * solves a data management problem for Perseus/OPP
 * makes this data and solution available to other CLARIN centers

PID Management solution for CTS URNs
 * a step towards interoperability between CTS URNs and Handles

Availability of Perseus/OPP Catalog data through CLARIN Federated Search

User Stories:

* I want to be able to search for medieval German texts from the 11th and 12th centuries and retrieve pictures of these books so I can feed them to (1) an OCR pipeline, (2) a transcription/crowdsourcing platform and (3) other machine learning processes.
* I want to be able to automatically identify the differences between several versions of a picture of a book page (e.g. black & white, resolution, filters applied).
* I want to be able to create a monograph using linked data in which I reference part of a picture of a manuscript.