= Recommendations for the VLO-Facets =

[[PageOutline(2-3)]]

== General information ==

=== Conventions for the Vocabulary ===

 * Generally, there should be an English version of all metadata (though there may be exceptions); all non-English metadata should be marked with the attribute @xml:lang, which states the language of the respective metadata.
 * Use existing vocabularies for metadata entries wherever possible.
 * Use standardised spellings/versions of names (such as person names, organisation names, etc.).
 * In general, each metadata element should contain exactly one metadata entry. If there are multiple entries for one metadata category, repeat the respective metadata element, e.g. instead of one element containing `speech, gesture`, use two elements containing `speech` and `gesture` respectively.
 * Different elements should be mutually exclusive in meaning (i.e. there shouldn't be semantic overlaps).
 * Prefer explicit information over unspecific element contents, e.g. instead of `multimodal` use one element each for `speech` and `gesture`.
 * Metadata entries that are meant to be integrated into the faceted search should be kept short (one-word expressions or short phrases).

=== Terminological Conventions of Facet Definitions ===

 Source Material:: The original texts or objects on which the resource under consideration is based.
 Resource:: The digital object at hand which is described by the metadata record under consideration.

== Search facets ==

=== Collection ===

 Definition:: The collection to which a resource or tool belongs. A resource or tool can only belong to one collection. A collection is based on a specific (legacy) archive or a certain type of resource within such an archive or centre. The members of a collection share basic metadata vocabularies for the information displayed in the VLO.
 Tooltip:: The collection to which the resource or tool belongs
 Recommended Data Categories:: None. Element content of `` is used instead.
 Recommendations on the Vocabulary:: A resource has to be assigned to exactly one collection. Using more than one `` element is not valid with respect to the schema. Using more than one collection name within one `` element is not supported.

=== Continent (deprecated) ===

{{{#!div class="notice system-message"
This facet is to be removed from the VLO.
}}}

 Definition:: The continent of origin of the source material of the resource, i.e. not the continent in which the resource was created, but where e.g. original texts were written or speech recordings made.
 Tooltip:: The continent of origin of the source material of the resource

=== Country ===

 Definition:: The country of origin of the source material of the resource, i.e. not the country in which the resource was created, but where e.g. original texts were written or speech recordings made.
 Tooltip:: The country of origin of the source material of the resource
 Recommended Data Categories:: http://hdl.handle.net/11459/CCR_C-2532_d004b0a6-fd1d-3ca3-abf1-1e6aeb3e37b2: location country;
   → Nota bene: This data category has a broader definition than the VLO facet `Country`. The (stricter) facet definition is decisive for the interpretation by the VLO.
 Problematic Data Categories:: http://hdl.handle.net/11459/CCR_C-3792_68c770a4-d58c-46dd-d429-5609ce5f81c3: country name,
   http://hdl.handle.net/11459/CCR_C-2092_36cd7ca8-e412-9f29-7ea7-4a3ba4ba2c91: country coding
   → Neither of these data categories is concrete enough on its own for further interpretation within the VLO. If used, they must be embedded in a context (of other data categories) which narrows down their possible readings to the one given in the facet definition.
 Recommended Vocabulary:: Codes or names according to ISO 3166-1 (alpha-2, alpha-3, or numeric); cf. http://en.wikipedia.org/wiki/ISO_3166-1, http://www.geonames.org/countries/.
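The three code forms named in the recommended vocabulary for `Country` differ only in shape, so a provider could pre-check values before submitting records. The following is a minimal sketch under stated assumptions: the function name `iso3166_code_form` is hypothetical, and the check covers the form of a code only, not whether the code is actually assigned. Country names, which the recommendation also allows, are simply reported as "not a code".

```python
import re

# Shapes of the three ISO 3166-1 code forms (upper-case letters or digits).
_ALPHA2 = re.compile(r"[A-Z]{2}")
_ALPHA3 = re.compile(r"[A-Z]{3}")
_NUMERIC = re.compile(r"[0-9]{3}")


def iso3166_code_form(value):
    """Return 'alpha-2', 'alpha-3' or 'numeric' if value has that shape, else None.

    Hypothetical helper: it checks the form only and does not confirm
    that the code is assigned to a country.
    """
    text = value.strip()
    if _ALPHA2.fullmatch(text):
        return "alpha-2"
    if _ALPHA3.fullmatch(text):
        return "alpha-3"
    if _NUMERIC.fullmatch(text):
        return "numeric"
    return None


# The three codes for Germany, plus a country name that is not a code.
forms = [iso3166_code_form(v) for v in ("DE", "DEU", "276", "Germany")]
```

A value for which the function returns `None` is not necessarily invalid; it may be a country name and would need a lookup against the ISO 3166-1 name list instead.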
=== Data Provider (deprecated) ===

{{{#!div class="notice system-message"
This facet is to be removed from the VLO.
}}}

 Definition:: Indicates whether the institution providing the metadata on the resource is a CLARIN centre.
 Tooltip:: The provider of the metadata for this resource

=== Format ===

 Definition:: The MIME types of the files in the resource or consumed/produced by the tool.
 Tooltip:: The MIME types used in the resource or by the tool
 Recommended Data Categories:: http://hdl.handle.net/11459/CCR_C-2571_2be2e583-e5af-34c2-3673-93359ec1f7df: mime type
 Recommended Vocabulary:: The Template expressions listed at: http://www.iana.org/assignments/media-types/media-types.xhtml
 Open Issues:: Maybe create a tighter vocabulary for linguistic resources based on https://docs.google.com/spreadsheets/d/1Tjmp_sEZDHIqnFAU1erx2VtzyYdjbKdM7RQGNIUbWhs/edit#gid=884722196 ?

=== Genre ===

 Definition:: The conventionalized discourse or text type of the content of the resource, consistently applied within the collection.
 Tooltip:: The genre of the content of the resource
 Recommended Data Categories:: http://hdl.handle.net/11459/CCR_C-2470_d191f2b2-6339-f031-b534-70d526b28357: genre,
   http://hdl.handle.net/11459/CCR_C-3899_c6c608e7-cb2e-1832-09ff-aee36e1f2ed4: subGenre
 Recommendations on the Vocabulary:: The use of a controlled vocabulary is recommended that is internally homogeneous, sufficiently documented, and linked to via a reference within the respective element.

=== Keywords ===

 Definition:: Keywords containing relevant information on the resource or tool not stated in other VLO metadata facets, consistently applied within the collection.
 Tooltip:: Keywords describing the resource or tool
 Recommended Data Categories:: http://hdl.handle.net/11459/CCR_C-5436_6ab57c2c-5f8d-3561-6db6-d75da23d2637: metadata tag
 Recommendations on the Vocabulary:: The use of a controlled vocabulary is recommended that is internally homogeneous, sufficiently documented, and linked to via a reference within the respective element.

=== Language ===

 Definition:: The object language relevant for the resource or tool, i.e. the language of the source material of a resource, the object language of a language description, or the language supported by a linguistic tool.
 Tooltip:: The object language relevant for the resource or tool
 Recommended Data Categories:: http://hdl.handle.net/11459/CCR_C-2482_08eded24-4086-7e3f-88e5-e0807fb01e17: languageID,
   http://hdl.handle.net/11459/CCR_C-2484_669684e7-cb9e-ea96-59cb-a25fe89b9b9d: languageName
   → N.B.: If possible, the languageName category should only be used in combination with a language ID corresponding to ISO 639-3 (see: Recommendations on the Vocabulary).
   http://hdl.handle.net/11459/CCR_C-5361_ba085ec1-9746-52bf-8cc1-3c300ce16eb8: langUsage with http://hdl.handle.net/11459/CCR_C-5358_3cd089fe-ad03-6181-b20c-635ea41ed818: language
   → N.B.: The langUsage and language categories are modelled analogously to the corresponding TEI Header elements. They should only be used in combination with one another.
 Recommendations on the Vocabulary:: Language codes according to ISO 639-3 (languages) or ISO 639-5 (language families). Use the ISO 639-3 code `und` (undetermined) for resources which cannot be assigned one exact language (e.g. language-independent tools).

=== Availability ===

 Definition:: A rough description of the conditions under which the resource or tool can be used.
 Tooltip:: The usage conditions for the resource or tool
 Recommended Data Categories:: http://hdl.handle.net/11459/CCR_C-2453_1f0c3ea5-7966-ae11-d3c6-448424d4e6e8: i.e.
   availability, such as “free; free for academic use; restricted use; request required; user licence required; registration required; unknown”;
   http://www.isocat.org/datcat/DC-6846: i.e. rights, meaning “Any rights information for this resource.” (hence: very broad definition);
   http://hdl.handle.net/11459/CCR_C-5439_98bb103d-476a-7f62-54b4-bf9de24d2229: i.e. license type
 Recommended Vocabulary:: '''free''' (The resource is available for re-use under a free license, e.g. CC, LGPL. There may still be usage constraints within the scope of the respective licenses, e.g. concerning attribution or conditions of further sharing.)
   '''free for academic use''' (The resource is available for re-use in an academic context.)
   '''restricted''' (The resource is not available for free re-use. Restrictions may concern the availability or functionality of the resource or its parts. Restrictions might be loosened by purchase of a license.)
   '''upon-request''' (The resource is freely available upon a user’s request. The terms of usage will be negotiated based on the intended usage scenarios.)
 Open Issues:: Test implementation available under: http://aspra11.informatik.uni-leipzig.de:8080/vlo/search?0
   This facet should be accompanied by a display facet `License`.

=== Lifecycle Status ===

 Definition:: The status in the life cycle of the resource or tool.
 Tooltip:: The status in the life cycle of the resource or tool
 Recommended Data Categories:: (not discussed yet!) http://hdl.handle.net/11459/CCR_C-3818_8c4aec73-1654-7565-9575-c4a17425ee29: life cycle status, such as: “.”
 Recommended Vocabulary:: As far as possible use the vocabulary proposed in DC-3818, i.e.: '''planned, development, released, production, withdrawn, retired, superseded, archived'''

=== Modality ===

 Definition:: The channel by which the signs in the content of the resource were transmitted or the modality for which a tool is intended, e.g. to recognize speech, gestures or entities of a text.
 Tooltip:: The modality of the content of the resource or intended for the tool
 Recommended Data Categories:: http://hdl.handle.net/11459/CCR_C-2490_44bc38a3-1799-4149-c791-40ac0176f0ff: modalities
 Recommended Vocabulary:: (based on the examples under http://hdl.handle.net/11459/CCR_C-2490_44bc38a3-1799-4149-c791-40ac0176f0ff and the data within the VLO’s modality facet)

||= Type =||= Subtype =||= Examples and/or notes =||
||actOut / performance / demonstration|| ||a cooking show||
||dance|| || ||
||eyeGaze|| || ||
||facialExpressions|| || ||
||gestures||(languageLike)||pantomimes, emblems, sign language||
|| ||(coSpeech)||iconics, beats, cohesives, deictics, metaphorics||
|| ||(emblematics)|| ||
||image||(painting / drawing)|| ||
|| ||(photo)|| ||
|| ||(other)|| ||
||music||(instrumental)|| ||
|| ||(singing)|| ||
||speech|| ||an interview, storytelling||
||writing||(print)||a book||
|| ||(handwritten)||a letter||
||unspecified|| || ||
||other|| ||drum signals, whistling||
||multimodal|| ||If several modalities are involved, do not provide more than one term per element. Instead, repeat the respective element once per term. If that is not possible, the unspecified term '''multimodal''' may be used.||

 Open Issues:: Discuss feedback of F-AG6 (Petra Wagner, Uni Bielefeld)

=== National Project ===

 Definition:: The national CLARIN project providing the resource or tool.
 Tooltip:: The national CLARIN project providing the resource or tool
 Recommended Data Categories:: /c:CMD/c:Header/c:MdCollectionDisplayName/text()
 Recommended Vocabulary:: '''CELR, CLARIN-AT, CLARIN-D, CLARIN-DK, CLARIN-EU, CLARIN-NL, CLARIN-PL, CLARINO, SWE-CLARIN, Other'''
   This list will have to be expanded for new national CLARIN initiatives. (Note from Hanna Hedeland: Aren't these values derived via a mapping when harvesting?)
 Open Issues:: A description of the resources harvested as well as the criteria for harvesting would be helpful.
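The rule of one term per element, stated in the general conventions and again in the `multimodal` row of the Modality table, can be applied mechanically to existing records. The sketch below is only an illustration under stated assumptions: the element names `record` and `modality`, the comma as separator, and the helper name `split_multivalued` are all hypothetical and not taken from any CMDI profile.

```python
import xml.etree.ElementTree as ET


def split_multivalued(parent, tag, separator=","):
    """Replace each <tag> child that holds several terms by one <tag> child per term.

    Hypothetical helper; children with other tags are kept unchanged and in order.
    """
    new_children = []
    for child in list(parent):
        text = child.text
        if child.tag != tag or text is None or separator not in text:
            new_children.append(child)
            continue
        for term in text.split(separator):
            term = term.strip()
            if not term:
                continue  # skip empty fragments such as a trailing separator
            clone = ET.Element(tag, dict(child.attrib))  # keep attributes, e.g. xml:lang
            clone.text = term
            clone.tail = child.tail
            new_children.append(clone)
    parent[:] = new_children  # replace the child list in one step


# Assumed toy record; real records will use the element names of their profile.
record = ET.fromstring("<record><modality>speech, gesture</modality></record>")
split_multivalued(record, "modality")
terms = [element.text for element in record.findall("modality")]
```

After the call, the record holds two `modality` elements, one per term, which is the form the recommendation asks for.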
=== Organisation ===

 Definition:: The name of the organisation currently responsible for the resource or tool, i.e. to be contacted with any questions or requests regarding the metadata or access to the resource/tool.
 Tooltip:: The organisation currently responsible for the resource or tool
 Recommended Data Categories:: http://hdl.handle.net/11459/CCR_C-2459_fc4e74d6-84de-c8cd-1ae8-2c2be5ee90b1: organization
 Problematic Data Categories:: http://hdl.handle.net/11459/CCR_C-2979_8030473e-bbcb-6b87-3fd2-90554429ec50: Organisation
   → Underspecified, i.e. this DC doesn’t specify the purpose of the organisation named here in the context of the resource.
 Eliminated/Ignored Data Categories:: http://hdl.handle.net/11459/CCR_C-6134_72c22724-2615-fd70-2eff-8cd3cb59e91d: publisher (from TEI),
   http://purl.org/dc/terms/publisher: publisher (Dublin Core)
   → Both underspecified, i.e. these DCs specify neither the kind of entity named here (organisation, person, etc.) nor the purpose of the entity in the context of the resource.
 Recommendations on the Vocabulary:: (1) An English version should be provided for each institution name which will be represented within the VLO. The English name should be marked as such by a language-defining attribute and an ISO 639-3 language code (e.g. `@xml:lang="eng"`).
   (2) Only one variant of an institution name should be used across all metadata records provided by the respective institution.

=== Project ===

 Definition:: The names of the projects originally involved in the creation of the resource or tool. These projects may no longer exist and are usually not the ones to be contacted regarding the resource/tool.
 Tooltip:: The project within which the resource was created
 Recommended Data Categories:: http://hdl.handle.net/11459/CCR_C-2536_13fc5f10-c14a-1f64-a669-32736f6d3ef5: project name,
   http://hdl.handle.net/11459/CCR_C-2537_fa206273-223a-f4fa-dde3-ba59b965701f: project title
 Recommendations on the Vocabulary:: TODO

=== Resource Type ===

 Definition:: The type of the resource or tool (e.g. corpus, lexicon, grammar, tool, …)
 Tooltip:: The type of the resource or tool (e.g. corpus, lexicon, grammar, tool, …)
 Recommended Data Categories:: http://hdl.handle.net/11459/CCR_C-3806_e55e9ed6-b099-c21d-a634-3c7f4d22a215: resource class,
   http://hdl.handle.net/11459/CCR_C-5424_3200a38b-344e-41de-e539-f71f80c38df8: type → based on the TEI Header definition of type
 Problematic Data Categories:: http://purl.org/dc/terms/type, http://purl.org/dc/elements/1.1/type
   → This DC is underspecified since “genre” is included in its definition as well. If used, it must be embedded in a context (of other data categories) that disambiguates the possible readings of this category.
 Recommended Vocabulary:: (based on the examples under http://hdl.handle.net/11459/CCR_C-3806_e55e9ed6-b099-c21d-a634-3c7f4d22a215 and the data within the VLO’s Resource Type facet)

||= Value =||= Example =||
||annotatedText|| ||
||audioRecording|| ||
||collection||a collection which cannot be used as a corpus in a linguistic sense||
||corpus|| ||
||database|| ||
||dataset:experimentalData|| ||
||dataset:fieldworkMaterial|| ||
||dataset:surveyData|| ||
||dataset:testData|| ||
||grammar|| ||
||image||a digital image||
||lexicalResource|| ||
||physicalObject||book, picture, photo, stone with inscription, …||
||plainText|| ||
||session|| ||
||teachingMaterial|| ||
||tool|| ||
||toolChain|| ||
||videoRecording|| ||
||webApplication|| ||
||webService|| ||
||unspecified|| ||
||other|| ||

TODO: Excluded DC-3806 vocabulary:
 * grammar → rather belongs to the facet Genre
 * lexicon → use lexicalResource instead
 * resourceBundle → use collection instead
 * teachingMaterial → rather belongs to the facet Genre

=== Subject ===

 Definition:: The subject or topic of the content of the resource, consistently applied within the collection.
 Tooltip:: The subject or topic of the content of the resource
 Recommended Data Categories:: http://purl.org/dc/terms/subject: subject,
   http://purl.org/dc/elements/1.1/subject: subject,
   http://hdl.handle.net/11459/CCR_C-6147_ebed915e-f911-f128-cddc-466aa41c9c73: domain of use,
   http://hdl.handle.net/11459/CCR_C-5316_2c6244b4-4f10-5e8e-49b6-26fbf7004791: classification code
   → This DC is designed analogously to the TEI Header element classCode. It is thus underspecified for the subject facet. Hence, when it is used, the classification scheme has to be stated, and the context should make clear that the element is used according to the definition of the subject facet.
 Recommended Vocabulary:: The use of a controlled vocabulary is recommended that is internally homogeneous and preferably sufficiently documented.
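For the Resource Type facet, the TODO note on excluded DC-3806 values names two direct replacements (`lexicon` → `lexicalResource`, `resourceBundle` → `collection`). A provider-side normalisation of these two values could look like the sketch below. The function name `normalise_resource_type` is hypothetical, and the note is still marked TODO, so the mapping should be read as a proposal rather than an agreed rule. The values `grammar` and `teachingMaterial` are left untouched because the note only suggests moving them to another facet.

```python
# Replacements taken from the TODO note on excluded DC-3806 values.
RESOURCE_TYPE_REPLACEMENTS = {
    "lexicon": "lexicalResource",
    "resourceBundle": "collection",
}


def normalise_resource_type(value):
    """Return the proposed replacement for an excluded value, else the trimmed value.

    Hypothetical helper: values without a listed replacement pass through unchanged.
    """
    text = value.strip()
    return RESOURCE_TYPE_REPLACEMENTS.get(text, text)


normalised = [normalise_resource_type(v) for v in ("lexicon", "resourceBundle", "corpus")]
```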
=== Temporal Coverage ===

 Definition:: The temporal coverage of the source material of the resource, i.e. not the time within which the resource was created, but when e.g. original texts were written or speech recordings made.
 Tooltip:: The temporal coverage of the source material of the resource
 Possible Data Categories:: http://hdl.handle.net/11459/CCR_C-3664_eb600f47-5123-efbe-251b-d952c65fc847: Time coverage,
   http://hdl.handle.net/11459/CCR_C-3654_f1608e88-95e6-4233-5d21-5312e76de32d: Start range → This data category has to be used together with DC-3655,
   http://hdl.handle.net/11459/CCR_C-3655_bc4c2656-2946-0be9-49f0-021a811e531b: End range → This data category has to be used together with DC-3654
 Problematic Data Categories:: http://hdl.handle.net/11459/CCR_C-4343_0d6e80e7-6f6c-5497-eac6-29b95a5fa9ec: interval → Underspecified (“a space of time between events or states”): could be filled with values like “200 years”, “a decade”, etc.;
   http://hdl.handle.net/11459/CCR_C-5742_66aaccdd-f1d0-a6a6-a4fb-efa704e06d8b: End time → incomplete; corresponding start time category missing;
   http://hdl.handle.net/11459/CCR_C-2502_747eb0cd-03e9-cffb-34cc-d0c8c77e4c5a: time coverage → Definition (“The time period that the content of a resource is about.”) does not suit the focus of this facet (cf. 17.1)
 Proposed vocabulary:: Open Date Range format www.ukoln.ac.uk/metadata/dcmi/date-dccd-odrf for http://hdl.handle.net/11459/CCR_C-3664_eb600f47-5123-efbe-251b-d952c65fc847;
   W3C DateTime http://www.w3.org/TR/NOTE-datetime for http://hdl.handle.net/11459/CCR_C-3654_f1608e88-95e6-4233-5d21-5312e76de32d and http://hdl.handle.net/11459/CCR_C-3655_bc4c2656-2946-0be9-49f0-021a811e531b

== Display facets ==

=== Name ===

 Definition:: The full public name of the resource or tool.
 Tooltip:: The name of the resource or tool
 Recommended Data Categories:: XXX
 Recommended Vocabulary:: XXX

=== Description ===

 Definition:: A general description of the resource or tool appropriate for the VLO context, i.e. for a specific language resource or tool.
 Tooltip:: A general description of the resource or tool
 Recommended Data Categories:: XXX
 Recommended Vocabulary:: XXX

=== License ===

 Definition:: The name of the license or a very brief description of the licensing conditions under which the resource or tool can be used.
 Tooltip:: The licensing conditions for the resource or tool
 Recommended Data Categories:: http://hdl.handle.net/11459/CCR_C-2453_1f0c3ea5-7966-ae11-d3c6-448424d4e6e8: i.e. availability, such as “free; free for academic use; restricted use; request required; user licence required; registration required; unknown”;
   http://www.isocat.org/datcat/DC-6846: i.e. rights, meaning “Any rights information for this resource.” (hence: very broad definition)
 Recommendations on the Vocabulary:: XXX
 General Recommendations:: If a resource has more than one license (e.g. a resource containing several parts which are published under different licenses), several licenses may be given within one metadata record.

=== Rights Holder ===

{{{#!div class="notice system-message"
proposal by Hanna; not yet discussed!
}}}

 Definition:: The person or organization holding the full rights for the resource or tool.
 Tooltip:: The rights holder of the resource or tool
 Recommended Data Categories:: http://hdl.handle.net/11459/CCR_C-6709_cb3572ed-ffd3-04f1-c145-b9c1f26bfc82,
   http://purl.org/dc/terms/rightsHolder,
   http://hdl.handle.net/11459/CCR_C-2956_519a4aab-2f76-0fd3-090e-f0d6b81a7dbb ?
 Recommendations on the Vocabulary:: XXX

=== Actor or Author ===

{{{#!div class="notice system-message"
proposal; partly discussed with agreement on basic DCs, but due to complexity not a final recommendation draft
}}}

 Definition:: Information on the participant (actor/author) that produced the source material of the resource
 Tooltip:: Actor or author information
 Recommended Data Categories:: http://hdl.handle.net/11459/CCR_C-2550_cdb6b956-9997-0923-68ef-09de017f24ef (actorAge/participantAge),
   http://hdl.handle.net/11459/CCR_C-2484_669684e7-cb9e-ea96-59cb-a25fe89b9b9d and/or http://hdl.handle.net/11459/CCR_C-4160_192be757-0d8f-f4fe-b10b-d3d50de92482 (actorLanguages) - if within http://hdl.handle.net/11459/CCR_C-4146_5ccc45c8-d729-c180-2bf1-fccc56dde24d (Container DC for Actor) or maybe within other Actor components,
   http://hdl.handle.net/11459/CCR_C-2560_ebb466e1-2b6b-6701-4e64-94618b4b455b (actorSex/participantSex),
   http://hdl.handle.net/11459/CCR_C-4578_c4d97a82-4ba4-d5ff-a9f9-fbd43f121e33 (actorCountry, the birth country)
 Recommendations on the Vocabulary:: XXX