'''Possible RDA Collaboration Project:'''

Data that Perseus and OPP have:
* Images of Manuscripts and Books [ OPP ]
* OCR Scans of Manuscripts and Books [ OPP ]
* Catalog Metadata (MODS, MADS and CTS) [ Perseus ]

What we want to be able to do:
* clearly distinguish between these data types
* assign a persistent identifier to each image
* assign a persistent identifier to each OCR scan
* differentiate between image types
* register new CTS URN identifiers for eventual TEI transcriptions resulting from the scans
* link Catalog metadata to images and OCR scans
* make all of this data searchable and retrievable through the CLARIN Federated Search endpoint
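
One possible way to model these requirements is sketched below in Python, assuming that every image and every OCR scan gets its own PID and that catalog records are keyed by CTS URN. The class names, the <code>ImageType</code> values and the Handle-style identifiers are hypothetical illustrations, not part of any existing Perseus, OPP or CLARIN API; in practice the image types would be defined in a Data Types Registry.

```python
from dataclasses import dataclass, field
from enum import Enum


class ImageType(Enum):
    # Hypothetical image variants; real values would be registered as data types.
    COLOR = "color"
    GRAYSCALE = "grayscale"
    BINARIZED = "binarized"


@dataclass
class PageImage:
    pid: str                # persistent identifier (e.g. a Handle) for this image
    page: int               # page number within the book
    image_type: ImageType   # distinguishes variants of the same page


@dataclass
class OcrScan:
    pid: str                # persistent identifier for this OCR output
    source_image_pid: str   # PID of the image the OCR was run on


@dataclass
class CatalogRecord:
    cts_urn: str            # work/edition identifier from the Perseus Catalog
    images: list[PageImage] = field(default_factory=list)
    ocr_scans: list[OcrScan] = field(default_factory=list)

    def ocr_for_image(self, image_pid: str) -> list[OcrScan]:
        """Return every OCR scan derived from the given image."""
        return [o for o in self.ocr_scans if o.source_image_pid == image_pid]

    def images_of_type(self, image_type: ImageType) -> list[PageImage]:
        """Return the images of one variant, e.g. only binarized pages for OCR."""
        return [i for i in self.images if i.image_type is image_type]


# Example: the Handles are placeholders; the CTS URN follows the Perseus Catalog pattern.
record = CatalogRecord(cts_urn="urn:cts:greekLit:tlg0012.tlg001.perseus-grc1")
record.images.append(PageImage("12345/example-img-0001", 1, ImageType.BINARIZED))
record.ocr_scans.append(OcrScan("12345/example-ocr-0001", "12345/example-img-0001"))
print([o.pid for o in record.ocr_for_image("12345/example-img-0001")])
```

Storing the source image PID on each OCR scan, rather than nesting OCR output inside images, lets one image carry several OCR runs (for example from different engines) without changing the model.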

Relevant RDA Components:

* Data Types Registry to manage and describe the data types
* PID Types API to abstract the differences between the CTS URN and Handle-based identifier systems
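
The Python sketch below illustrates the kind of abstraction such a layer could offer: one entry point that recognizes whether an identifier is a CTS URN or a Handle, splits it into its components and returns a resolution URL. It assumes the CTS URN layout <code>urn:cts:namespace:textgroup.work.version:passage</code> and Handles of the form <code>prefix/suffix</code> resolved through the public hdl.handle.net proxy. The function names and the CTS endpoint URL are hypothetical, the CTS branch assumes an endpoint that accepts <code>GetPassage</code> requests, and this is not an implementation of the RDA PID Types API itself.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urlencode

# CTS URN layout: urn:cts:<namespace>:<textgroup>[.<work>[.<version>]][:<passage>]
CTS_RE = re.compile(r"^urn:cts:(?P<namespace>[^:]+):(?P<work>[^:]+)(?::(?P<passage>.+))?$")
# Handle layout: <prefix>/<suffix>
HANDLE_RE = re.compile(r"^(?P<prefix>[^/\s]+)/(?P<suffix>\S+)$")


@dataclass
class Pid:
    scheme: str                                 # "cts" or "handle"
    value: str                                  # the identifier as given
    parts: dict = field(default_factory=dict)   # scheme-specific components


def parse_pid(identifier: str) -> Pid:
    """Classify an identifier as a CTS URN or a Handle and split it into parts."""
    m = CTS_RE.match(identifier)
    if m:
        work = m.group("work").split(".")
        # Pad missing work/version levels; an exemplar level, if any, is ignored here.
        work += [None] * (3 - len(work))
        return Pid("cts", identifier, {
            "namespace": m.group("namespace"),
            "textgroup": work[0],
            "work": work[1],
            "version": work[2],
            "passage": m.group("passage"),
        })
    m = HANDLE_RE.match(identifier)
    if m:
        return Pid("handle", identifier, {"prefix": m.group("prefix"), "suffix": m.group("suffix")})
    raise ValueError(f"unrecognized identifier: {identifier!r}")


def resolution_url(pid: Pid, cts_endpoint: str) -> str:
    """Return a URL from which the identified object can be retrieved."""
    if pid.scheme == "handle":
        return f"https://hdl.handle.net/{pid.value}"
    # Assumes a CTS endpoint accepting GetPassage requests (hypothetical URL).
    return f"{cts_endpoint}?{urlencode({'request': 'GetPassage', 'urn': pid.value})}"


urn = parse_pid("urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.1")
print(urn.parts["version"], urn.parts["passage"])  # perseus-grc1 1.1
print(resolution_url(parse_pid("12345/example-img-0001"), "https://cts.example.org/api"))
# https://hdl.handle.net/12345/example-img-0001
```

Callers only ever handle <code>Pid</code> objects, so a client could link a catalog record (CTS URN) to its images (Handles) without caring which identifier system each one comes from.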

Possible workplan:

CLARIN:

* CMDI mapping of the Perseus Catalog metadata
* provide a repository and PIDs for images and OCR scans (implement the RDA PID Types API, if not already done?)

OPP:

* update the OCR workflow to register PIDs with CLARIN
* implement the PID Types API for CTS URNs (augmenting the Catalog's CITE Collections API)
* deploy a demonstrator Data Types Registry (CORDRA) and register the Perseus/OPP data types

Perseus:

* provide an OAI-PMH interface to the Perseus Catalog
* register as a CLARIN center
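
An OAI-PMH interface is what would let CLARIN harvest the Catalog. The following Python sketch shows a minimal harvesting client using only the standard library: it issues <code>ListRecords</code> requests and follows resumption tokens until the repository signals the end of the list with an empty token. The endpoint URL is hypothetical, and the metadata prefix is an assumption: every OAI-PMH repository must support <code>oai_dc</code>, whereas a CMDI format would only be available once the CMDI mapping exists.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urlopen(url) as resp:
        return resp.read()


def list_records(base_url: str, metadata_prefix: str,
                 fetch: Callable[[str], bytes] = http_get) -> Iterator[ET.Element]:
    """Yield every <record> from an OAI-PMH ListRecords response, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        yield from root.iterfind("oai:ListRecords/oai:record", OAI_NS)
        token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
        # A missing or empty resumptionToken means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # A resumed request carries only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


# Usage (hypothetical endpoint):
# for rec in list_records("https://catalog.example.org/oai", "oai_dc"):
#     print(rec.findtext("oai:header/oai:identifier", namespaces=OAI_NS))
```

The <code>fetch</code> parameter is injectable so the paging logic can be tested against canned responses without a live endpoint.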

Solutions provided:

Data management solution for OPP image/OCR/book data:
* defines a common interoperable model for the relationships between images, OCR scans and books
* solves the Perseus/OPP problem of identifying and linking images, OCR scans and catalog records
* makes this data and solution available to other CLARIN centers

PID management solution for CTS URNs:
* a step towards interoperability between CTS URNs and Handles

Availability of Perseus/OPP Catalog data through CLARIN Federated Search

User stories:

* I want to be able to search for medieval German texts from the 11th and 12th centuries and retrieve images of these books so that I can feed them to (1) an OCR pipeline, (2) a transcription/crowdsourcing platform and (3) other machine learning processes.
* I want to be able to automatically identify the differences between several versions of an image of a book page (e.g. black & white, resolution, applied filters).
* I want to be able to create a monograph using linked data in which I reference part of an image of a manuscript.
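
For the last user story, one standards-based way to reference part of a manuscript image as linked data is a W3C Web Annotation whose target uses a <code>FragmentSelector</code> holding a Media Fragments URI 1.0 spatial region (<code>xywh</code>). The Python sketch below builds such an annotation. The image and monograph URIs are hypothetical, and it assumes the image PID resolves to the image file itself so that pixel coordinates are meaningful.

```python
import json


def region_fragment(x: int, y: int, w: int, h: int) -> str:
    """Media Fragments URI 1.0 spatial fragment, in pixel units."""
    if min(x, y) < 0 or min(w, h) <= 0:
        raise ValueError("region needs a non-negative origin and a positive size")
    return f"xywh=pixel:{x},{y},{w},{h}"


def region_annotation(image_uri: str, body_uri: str,
                      x: int, y: int, w: int, h: int) -> dict:
    """W3C Web Annotation linking a body (e.g. a monograph paragraph) to part of an image."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "body": body_uri,
        "target": {
            "source": image_uri,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": region_fragment(x, y, w, h),
            },
        },
    }


# Hypothetical identifiers for an image PID and a monograph paragraph.
ann = region_annotation(
    image_uri="https://hdl.handle.net/12345/example-img-0001",
    body_uri="https://example.org/monograph/chapter-2#para-4",
    x=120, y=340, w=600, h=200,
)
print(json.dumps(ann, indent=2))
```

Because the region lives in the selector rather than in a new identifier, the image keeps a single PID however many passages of the monograph point into it.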