wiki:VLO-Taskforce/RecommendationsForFacets

Recommendations for the VLO-Facets

General information

Conventions for the Vocabulary

  • Generally, there should be an English version of all metadata (though there may be exceptions); all non-English metadata should be marked with the attribute @xml:lang, which specifies the language of the respective metadata.
  • Use existing vocabularies for metadata entries wherever possible.
  • Use standardised spellings/versions of names (such as person names, organisation names, etc.).
  • In general, each metadata element should be filled by exactly one metadata entry. If there are several entries for one metadata category, repeat the respective metadata element, e.g. instead of <modality>speech, gesture</modality> use <modality>speech</modality><modality>gesture</modality>.
  • Different elements should be exclusive concerning their meaning (i.e. there shouldn't be semantic overlaps).
  • Prefer explicit information over unspecific element contents, e.g. instead of <modality>multimodal</modality> use <modality>speech</modality><modality>gesture</modality>.
  • Metadata entries which are meant to be integrated into the faceted search should be kept short (one-word expressions or short phrases).
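
The convention of one entry per element can be illustrated with a small sketch. It assumes that multiple entries in a legacy record are separated by commas, which is an assumption of this example and not part of the recommendation; the function name split_multi_valued is hypothetical. Values that legitimately contain a comma would need separate handling.

```python
import xml.etree.ElementTree as ET


def split_multi_valued(element_xml: str) -> list:
    """Return one serialised element per comma-separated entry.

    Assumption: entries inside a single element are separated by commas.
    """
    elem = ET.fromstring(element_xml)
    values = [v.strip() for v in (elem.text or "").split(",") if v.strip()]
    result = []
    for value in values:
        # Repeat the element (same tag, same attributes) once per entry.
        repeated = ET.Element(elem.tag, dict(elem.attrib))
        repeated.text = value
        result.append(ET.tostring(repeated, encoding="unicode"))
    return result


print(split_multi_valued("<modality>speech, gesture</modality>"))
# ['<modality>speech</modality>', '<modality>gesture</modality>']
```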

Terminological Conventions of Facet Definitions

Source Material
The original texts or objects which the resource under consideration is based on.
Resource
The digital object at hand which is described by the metadata record under consideration.

Search facets

Collection

Definition
The collection to which a resource or tool belongs. A resource or tool can only belong to one collection. A collection is based on a specific (legacy) archive or a certain type of resource within such an archive or centre. The members of a collection share basic metadata vocabularies for the information displayed in the VLO.
Tooltip
The collection to which the resource or tool belongs
Recommended Data Categories
None. Element content of <MdCollectionDisplayName> is used instead.
Recommendations on the Vocabulary
A resource has to be assigned exactly one collection it belongs to. The usage of more than one <MdCollectionDisplayName> element is not valid with respect to the schema. The usage of more than one collection name within one <MdCollectionDisplayName> element is not supported.
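
A minimal sketch of how a provider might check the "exactly one collection" rule before delivering a record. It matches elements by local name so that it does not depend on a particular namespace; the namespace URI in the sample record is a placeholder, and the function names are hypothetical. It cannot detect several collection names packed into one element.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def collection_names(record_xml: str) -> list:
    """Collect the text of every MdCollectionDisplayName element in the record."""
    root = ET.fromstring(record_xml)
    return [
        (elem.text or "").strip()
        for elem in root.iter()
        if local_name(elem.tag) == "MdCollectionDisplayName"
    ]


def has_single_collection(record_xml: str) -> bool:
    """True if the record names exactly one non-empty collection."""
    names = collection_names(record_xml)
    return len(names) == 1 and names[0] != ""


# Placeholder namespace and collection name, for illustration only.
sample = (
    '<CMD xmlns="urn:example:cmd"><Header>'
    "<MdCollectionDisplayName>Example Collection</MdCollectionDisplayName>"
    "</Header></CMD>"
)
print(has_single_collection(sample))  # True
```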

Continent (deprecated)

This facet is to be removed from the VLO.

Definition
The continent of origin of the source material of the resource, i.e. not the continent in which the resource was created, but where e.g. original texts were written or speech recordings made.
Tooltip
The continent of origin of the source material of the resource

Country

Definition
The country of origin of the source material of the resource, i.e. not the country in which the resource was created, but where e.g. original texts were written or speech recordings made.
Tooltip
The country of origin of the source material of the resource
Recommended Data Categories
http://hdl.handle.net/11459/CCR_C-2532_d004b0a6-fd1d-3ca3-abf1-1e6aeb3e37b2: location country → Nota bene: This data category has a broader definition than the VLO facet Country. Decisive for the interpretation by the VLO is the (stricter) facet definition.
Problematic Data Categories
http://hdl.handle.net/11459/CCR_C-3792_68c770a4-d58c-46dd-d429-5609ce5f81c3: country name, http://hdl.handle.net/11459/CCR_C-2092_36cd7ca8-e412-9f29-7ea7-4a3ba4ba2c91: country coding → Neither of these data categories is specific enough on its own for further interpretation within the VLO. If used, they must be embedded in a context (of other data categories) which narrows down their possible readings to the one given in the facet definition.
Recommended Vocabulary
Codes or names according to ISO 3166-1 (Alpha-2, Alpha-3, or Numeric); cf. http://en.wikipedia.org/wiki/ISO_3166-1, http://www.geonames.org/countries/.
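
Since the recommended vocabulary permits three code forms as well as names, a harvester-side check might first classify a value by its shape. The sketch below does only that: it assumes upper-case letter codes and does not look the value up in the ISO 3166-1 list. The function name country_value_kind is hypothetical.

```python
import re


def country_value_kind(value: str) -> str:
    """Classify a country value by shape only (no lookup in ISO 3166-1)."""
    text = value.strip()
    if re.fullmatch(r"[A-Z]{2}", text):
        return "alpha-2"
    if re.fullmatch(r"[A-Z]{3}", text):
        return "alpha-3"
    if re.fullmatch(r"[0-9]{3}", text):
        return "numeric"
    # Anything else is treated as a country name.
    return "name"


for value in ["DE", "DEU", "276", "Germany"]:
    print(value, "->", country_value_kind(value))
```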

Data Provider (deprecated)

This facet is to be removed from the VLO.

Definition
The indication of the institution providing the metadata on the resource as being a CLARIN centre or not
Tooltip
The provider of the metadata for this resource

Format

Definition
The mime types of the files in the resource or consumed/produced by the tool.
Tooltip
The mime types used in the resource or by the tool
Recommended Data Categories
http://hdl.handle.net/11459/CCR_C-2571_2be2e583-e5af-34c2-3673-93359ec1f7df: mime type
Recommended Vocabulary
The Template expressions under: http://www.iana.org/assignments/media-types/media-types.xhtml
Open Issues
Maybe create a tighter vocabulary for linguistic resources based on https://docs.google.com/spreadsheets/d/1Tjmp_sEZDHIqnFAU1erx2VtzyYdjbKdM7RQGNIUbWhs/edit#gid=884722196 ?

Genre

Definition
The conventionalized discourse or text type of the content of the resource, consistently applied within the collection.
Tooltip
The genre of the content of the resource
Recommended Data Categories
http://hdl.handle.net/11459/CCR_C-2470_d191f2b2-6339-f031-b534-70d526b28357: genre, http://hdl.handle.net/11459/CCR_C-3899_c6c608e7-cb2e-1832-09ff-aee36e1f2ed4: subGenre
Recommendations on the Vocabulary
The use of a vocabulary that is controlled in some way is recommended; it should be homogeneous in itself, sufficiently documented, and linked to via a reference within the respective element.

Keywords

Definition
Keywords containing relevant information on the resource or tool not stated in other VLO metadata facets, consistently applied within the collection.
Tooltip
Keywords describing the resource or tool
Recommended Data Categories
http://hdl.handle.net/11459/CCR_C-5436_6ab57c2c-5f8d-3561-6db6-d75da23d2637: metadata tag
Recommendations on the Vocabulary
The use of a vocabulary that is controlled in some way is recommended; it should be homogeneous in itself, sufficiently documented, and linked to via a reference within the respective element.

Language

Definition
The object language relevant for the resource or tool, i.e. the language of the source material of a resource, the object language of a language description, or the language supported by a linguistic tool.
Tooltip
The object language relevant for the resource or tool
Recommended Data Categories
http://hdl.handle.net/11459/CCR_C-2482_08eded24-4086-7e3f-88e5-e0807fb01e17: languageID
http://hdl.handle.net/11459/CCR_C-2484_669684e7-cb9e-ea96-59cb-a25fe89b9b9d: languageName → N.B.: If possible, the languageName category should only be used in combination with a language ID corresponding to ISO 639-3 (see: Recommendations on the Vocabulary).
http://hdl.handle.net/11459/CCR_C-5361_ba085ec1-9746-52bf-8cc1-3c300ce16eb8: langUsage with http://hdl.handle.net/11459/CCR_C-5358_3cd089fe-ad03-6181-b20c-635ea41ed818: language → N.B.: The langUsage and language categories are modelled analogously to the corresponding TEI Header elements. They should only be used in combination with one another.
Recommendations on the Vocabulary
Language codes according to ISO 639-3 (languages) or ISO 639-5 (language families). Use the ISO 639-3 code und (undetermined) for resources which cannot be assigned one exact language (e.g. language-independent tools).
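
A minimal sketch of the fallback to und described above. It checks only the shape of a code (three lower-case letters, which both ISO 639-3 and ISO 639-5 use), not whether the code is actually registered; lower-casing the input and the function name language_code are assumptions of this example.

```python
import re
from typing import Optional

# Shape shared by ISO 639-3 and ISO 639-5 codes: three lower-case letters.
CODE_SHAPE = re.compile(r"[a-z]{3}")


def language_code(value: Optional[str]) -> str:
    """Return a three-letter code, or 'und' when no language is given."""
    if value is None or not value.strip():
        return "und"  # undetermined, e.g. language-independent tools
    code = value.strip().lower()
    if not CODE_SHAPE.fullmatch(code):
        raise ValueError(f"not a three-letter language code: {value!r}")
    return code


print(language_code("deu"))  # deu
print(language_code(None))   # und
```

Malformed values such as two-letter codes raise an error here and are not silently mapped to und, so that they can be corrected at the source.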

Availability

Definition
A rough description of the conditions under which the resource or tool can be used.
Tooltip
The usage conditions for the resource or tool
Recommended Data Categories
http://hdl.handle.net/11459/CCR_C-2453_1f0c3ea5-7966-ae11-d3c6-448424d4e6e8: i.e. availability, such as “free; free for academic use; restricted use; request required; user licence required; registration required; unknown”; http://www.isocat.org/datcat/DC-6846: i.e. rights, meaning “Any rights information for this resource.” (hence: very broad definition); http://hdl.handle.net/11459/CCR_C-5439_98bb103d-476a-7f62-54b4-bf9de24d2229: i.e. license type
Recommended Vocabulary
free (The resource is available for re-use under a free license, e.g. CC, LGPL. There may still be usage constraints within the scope of the respective licenses, e.g. concerning attribution or conditions of further sharing.)
free for academic use (The resource is available for re-use in an academic context.)
restricted (The resource is not available for free re-use. Restrictions may concern the availability or functionality of the resource or its parts. Restrictions might be loosened by purchase of a license.)
upon-request (The resource is freely available upon a user’s request. The terms of usage will be negotiated based on the intended usage scenarios.)
Open Issues
Test implementation available under: http://aspra11.informatik.uni-leipzig.de:8080/vlo/search?0
This facet should be accompanied by a display facet License.
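
The recommended vocabulary for this facet (free, free for academic use, restricted, upon-request) could be enforced with a small normalisation step such as the sketch below. The alias table is purely hypothetical and not part of the recommendation; which legacy values map to which term would have to be decided by the task force. The function name availability_value is likewise an assumption.

```python
from typing import Optional

# The four terms of the recommended vocabulary.
RECOMMENDED = ("free", "free for academic use", "restricted", "upon-request")

# Hypothetical aliases for illustration only; not part of the recommendation.
ALIASES = {
    "restricted use": "restricted",
    "request required": "upon-request",
}


def availability_value(raw: str) -> Optional[str]:
    """Map a raw value to a recommended term, or None if it cannot be mapped."""
    key = " ".join(raw.lower().split())  # lower-case, collapse whitespace
    if key in RECOMMENDED:
        return key
    return ALIASES.get(key)


print(availability_value("Free"))            # free
print(availability_value("restricted use"))  # restricted
print(availability_value("unknown"))         # None
```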

Lifecycle Status

Definition
The status in the life cycle of the resource or tool.
Tooltip
The status in the life cycle of the resource or tool
Recommended Data Categories
(not discussed yet!) http://hdl.handle.net/11459/CCR_C-3818_8c4aec73-1654-7565-9575-c4a17425ee29: life cycle status, such as: “.”
Recommended Vocabulary
As far as possible use the vocabulary proposed in DC-3818, i.e.: planned, development, released, production, withdrawn, retired, superseded, archived

Modality

Definition
The channel by which the signs in the content of the resource were transmitted or the modality for which a tool is intended, e.g. to recognize speech, gestures or entities of a text.
Tooltip
The modality of the content of the resource or intended for the tool
Recommended Data Categories
http://hdl.handle.net/11459/CCR_C-2490_44bc38a3-1799-4149-c791-40ac0176f0ff: modalities
Recommended Vocabulary
(based on the examples under http://hdl.handle.net/11459/CCR_C-2490_44bc38a3-1799-4149-c791-40ac0176f0ff and the data within the VLO’s modality facet)
Type                                  Subtype                 Examples and/or notes
actOut / performance / demonstration                          a cooking show
dance
eyeGaze
facialExpressions
gestures                              (languageLike)          pantomimes, emblems, sign language
                                      (coSpeech)              iconics, beats, cohesives, deictics, metaphorics
                                      (enblematics)
image                                 (painting / drawing)
                                      (photo)
                                      (other)
music                                 (instrumental)
                                      (singing)
speech                                                        an interview, storytelling
writing                               (print)                 a book
                                      (handwritten)           a letter
unspecified
other                                                         drum signals, whistling
multimodal                                                    In case that several modalities are involved, do not provide more than one term per element. Instead make use of the respective element several times. If that is not possible, the unspecified term multimodal may be used.
Open Issues
Discuss Feedback of F-AG6 (Petra Wagner, Uni Bielefeld)

National Project

Definition
The national CLARIN project providing the resource or tool.
Tooltip
The national CLARIN project providing the resource or tool
Recommended Data Categories
/c:CMD/c:Header/c:MdCollectionDisplayName/text()
Recommended Vocabulary
CELR, CLARIN-AT, CLARIN-D, CLARIN-DK, CLARIN-EU, CLARIN-NL, CLARIN-PL, CLARINO, SWE-CLARIN, Other
This list will have to be expanded for new national CLARIN initiatives. (Note from Hanna Hedeland: Aren't these values derived when harvesting via a mapping?)
Open Issues
A description of the resources harvested as well as the criteria for harvesting would be helpful.

Organisation

Definition
The name of the organisation currently responsible for the resource or tool, i.e. to be contacted with any questions or requests regarding the metadata or access to the resource/tool.
Tooltip
The organisation currently responsible for the resource or tool
Recommended Data Categories
http://hdl.handle.net/11459/CCR_C-2459_fc4e74d6-84de-c8cd-1ae8-2c2be5ee90b1: organization
Problematic Data Categories
http://hdl.handle.net/11459/CCR_C-2979_8030473e-bbcb-6b87-3fd2-90554429ec50: Organisation → Underspecified, i.e. this DC doesn’t specify the purpose of the organisation named here in the context of the resource.
Eliminated/Ignored? Data Categories
http://hdl.handle.net/11459/CCR_C-6134_72c22724-2615-fd70-2eff-8cd3cb59e91d: publisher (from TEI), http://purl.org/dc/terms/publisher: publisher (Dublin Core) → Both are underspecified, i.e. these DCs specify neither the kind of entity named here (organisation, person, etc.) nor the purpose of the entity in the context of the resource.
Recommendations on the Vocabulary
(1) There should be an English reading provided for each institution name which will be represented within the VLO. The English name should be marked as such by usage of a language-defining attribute and an ISO 639-3 language code (e.g. @xml:lang="eng"). (2) There should only be one variant of an institution name used within all metadata records provided by this respective institution.
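
To illustrate recommendation (1), the sketch below picks the English reading of an institution name by its @xml:lang attribute. The element names organisation and name and the institute itself are hypothetical; real records use whatever elements the respective profile defines.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def english_name(organisation_xml: str) -> Optional[str]:
    """Return the name marked xml:lang="eng", or None if there is none."""
    root = ET.fromstring(organisation_xml)
    for name in root.iter("name"):  # 'name' is a hypothetical element name
        if name.get(XML_LANG) == "eng":
            return (name.text or "").strip()
    return None


# Hypothetical record fragment, for illustration only.
sample = (
    "<organisation>"
    '<name xml:lang="deu">Beispielinstitut</name>'
    '<name xml:lang="eng">Example Institute</name>'
    "</organisation>"
)
print(english_name(sample))  # Example Institute
```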

Project

Definition
The names of the projects originally involved in the creation of the resource or tool. These projects may no longer exist and are usually not the ones to be contacted regarding the resource/tool.
Tooltip
The project within which the resource was created
Recommended Data Categories
http://hdl.handle.net/11459/CCR_C-2536_13fc5f10-c14a-1f64-a669-32736f6d3ef5: project name, http://hdl.handle.net/11459/CCR_C-2537_fa206273-223a-f4fa-dde3-ba59b965701f: project title
Recommendations on the Vocabulary
TODO

Resource Type

Definition
The type of the resource or tool (e.g. corpus, lexicon, grammar, tool, …)
Tooltip
The type of the resource or tool (e.g. corpus, lexicon, grammar, tool, …)
Recommended Data Categories
http://hdl.handle.net/11459/CCR_C-3806_e55e9ed6-b099-c21d-a634-3c7f4d22a215: resource class, http://hdl.handle.net/11459/CCR_C-5424_3200a38b-344e-41de-e539-f71f80c38df8: type → based on the TEI Header definition of type
Problematic Data Categories
http://purl.org/dc/terms/type, http://purl.org/dc/elements/1.1/type → This DC is underspecified since “genre” is included in the definition as well. If used, it is necessary that it is embedded in a context (of other data categories) disambiguating the possible readings of this category.
Recommended Vocabulary
(based on the examples under http://hdl.handle.net/11459/CCR_C-3806_e55e9ed6-b099-c21d-a634-3c7f4d22a215 and the data within the VLO’s Resource Type facet)
Value                       Example
annotatedText
audioRecording
collection                  a collection which cannot be used as a corpus in a linguistic sense
corpus
database
dataset:experimentalData
dataset:fieldworkMaterial
dataset:surveyData
dataset:testData
grammar
image                       a digital image
lexicalResource
physicalObject              book, picture, photo, stone with inscription, …
plainText
session
teachingMaterial
tool
toolChain
videoRecording
webApplication
webService
unspecified
other

TODO: Excluded DC3806-vocabulary: grammar → rather belongs to facet genre, lexicon → use lexicalResource instead, resourceBundle → use collection instead, teachingMaterial → rather belongs to facet genre

Subject

Definition
The subject or topic of the content of the resource, consistently applied within the collection.
Tooltip
The subject or topic of the content of the resource
Recommended Data Categories
http://purl.org/dc/terms/subject: subject
http://purl.org/dc/elements/1.1/subject: subject
http://hdl.handle.net/11459/CCR_C-6147_ebed915e-f911-f128-cddc-466aa41c9c73: domain of use
http://hdl.handle.net/11459/CCR_C-5316_2c6244b4-4f10-5e8e-49b6-26fbf7004791: classification code → This DC is designed analogously to the TEI Header element classCode and is thus underspecified for the subject facet. Hence, when it is used, the classification scheme has to be identified, and the context should make clear that the element is used according to the definition of the subject facet.
Recommended Vocabulary
The use of a vocabulary that is controlled in some way is recommended; it should be homogeneous in itself and preferably sufficiently documented.

Temporal Coverage

Definition
The temporal coverage of the source material of the resource, i.e. not the time within which the resource was created, but when e.g. original texts were written or speech recordings made.
Tooltip
The temporal coverage of the source material of the resource
Possible Data Categories
http://hdl.handle.net/11459/CCR_C-3664_eb600f47-5123-efbe-251b-d952c65fc847: Time coverage
http://hdl.handle.net/11459/CCR_C-3654_f1608e88-95e6-4233-5d21-5312e76de32d: Start range → This data category has to be used together with DC-3655
http://hdl.handle.net/11459/CCR_C-3655_bc4c2656-2946-0be9-49f0-021a811e531b: End range → This data category has to be used together with DC-3654
Problematic Data Categories
http://hdl.handle.net/11459/CCR_C-4343_0d6e80e7-6f6c-5497-eac6-29b95a5fa9ec: interval → Underspecified (“a : a space of time between events or states”): could be filled with values like “200 years”, “a decade”, etc.
http://hdl.handle.net/11459/CCR_C-5742_66aaccdd-f1d0-a6a6-a4fb-efa704e06d8b: End time → incomplete; corresponding start time category missing
http://hdl.handle.net/11459/CCR_C-2502_747eb0cd-03e9-cffb-34cc-d0c8c77e4c5a: time coverage → Definition (“The time period that the content of a resource is about.”) does not suit the focus of this facet (cf. 17.1)
Proposed vocabulary
Open Date Range format www.ukoln.ac.uk/metadata/dcmi/date-dccd-odrf for http://hdl.handle.net/11459/CCR_C-3664_eb600f47-5123-efbe-251b-d952c65fc847
W3C DateTime http://www.w3.org/TR/NOTE-datetime for http://hdl.handle.net/11459/CCR_C-3654_f1608e88-95e6-4233-5d21-5312e76de32d and http://hdl.handle.net/11459/CCR_C-3655_bc4c2656-2946-0be9-49f0-021a811e531b
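
For the Start range / End range pair, a consistency check could look like the following sketch. It covers only the date-level forms of the W3C note (YYYY, YYYY-MM, YYYY-MM-DD); the forms with a time of day and the Open Date Range format are not handled. Missing months or days are filled with 1 for comparison purposes, which is an assumption of this example, as are the function names.

```python
import re
from datetime import date

# Date-level forms only: YYYY, YYYY-MM, YYYY-MM-DD.
DATE_SHAPE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_w3c_date(value: str) -> date:
    """Parse a date-level W3C DateTime value; missing parts default to 1."""
    match = DATE_SHAPE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a YYYY, YYYY-MM or YYYY-MM-DD value: {value!r}")
    year, month, day = (int(part) if part else 1 for part in match.groups())
    return date(year, month, day)  # raises ValueError for e.g. month 13


def is_valid_range(start: str, end: str) -> bool:
    """True if the start of the range does not lie after its end."""
    return parse_w3c_date(start) <= parse_w3c_date(end)


print(is_valid_range("1950-06", "1962-12-31"))  # True
print(is_valid_range("1990", "1980"))           # False
```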

Display facets

Name

Definition
The full public name of the resource or tool.
Tooltip
The name of the resource or tool
Recommended Data Categories
XXX
Recommended Vocabulary
XXX

Description

Definition
A general description of the resource or tool appropriate for the VLO context, i.e. for a specific language resource or tool.
Tooltip
A general description of the resource or tool
Recommended Data Categories
XXX
Recommended Vocabulary
XXX

License

Definition
The name of the license or a very brief description of the licensing conditions under which the resource or tool can be used.
Tooltip
The licensing conditions for the resource or tool
Recommended Data Categories
http://hdl.handle.net/11459/CCR_C-2453_1f0c3ea5-7966-ae11-d3c6-448424d4e6e8: i.e. availability, such as “free; free for academic use; restricted use; request required; user licence required; registration required; unknown”; http://www.isocat.org/datcat/DC-6846: i.e. rights, meaning “Any rights information for this resource.” (hence: very broad definition)
Recommendations on the Vocabulary
XXX
General Recommendations
In case of more than one license per resource (e.g. a resource containing several parts which are published under different licenses) there may be multiple selections of licenses within one metadata record.

Rights Holder

proposal by Hanna; not yet discussed!

Definition
The person or organization holding the full rights for the resource or tool.
Tooltip
The rights holder of the resource or tool
Recommended Data Categories
http://hdl.handle.net/11459/CCR_C-6709_cb3572ed-ffd3-04f1-c145-b9c1f26bfc82, http://purl.org/dc/terms/rightsHolder, http://hdl.handle.net/11459/CCR_C-2956_519a4aab-2f76-0fd3-090e-f0d6b81a7dbb ?
Recommendations on the Vocabulary
XXX

Actor or Author

proposal; partly discussed with agreement on basic DCs, but due to complexity not a final recommendation draft

Definition
Information on the participant (actor/author) that produced the source material of the resource
Tooltip
Actor or author information
Recommended Data Categories
http://hdl.handle.net/11459/CCR_C-2550_cdb6b956-9997-0923-68ef-09de017f24ef (actorAge/participantAge), http://hdl.handle.net/11459/CCR_C-2484_669684e7-cb9e-ea96-59cb-a25fe89b9b9d and/or http://hdl.handle.net/11459/CCR_C-4160_192be757-0d8f-f4fe-b10b-d3d50de92482 (actorLanguages) - if within http://hdl.handle.net/11459/CCR_C-4146_5ccc45c8-d729-c180-2bf1-fccc56dde24d (Container DC for Actor) or maybe within other Actor components, http://hdl.handle.net/11459/CCR_C-2560_ebb466e1-2b6b-6701-4e64-94618b4b455b (actorSex/participantSex), http://hdl.handle.net/11459/CCR_C-4578_c4d97a82-4ba4-d5ff-a9f9-fbd43f121e33 (actorCountry, the birth country)
Recommendations on the Vocabulary
XXX
Last modified on 11/03/15 11:49:23