Changes between Version 1 and Version 2 of VLO-Taskforce/RecommendationsForFacets


Timestamp:
07/01/15 09:32:00 (9 years ago)
Author:
herold@bbaw.de
Comment:

--

  • VLO-Taskforce/RecommendationsForFacets

    v1 v2  
    11= Recommendations for the VLO-Facets =
     2
     3[[PageOutline(2-3)]]
    24
    35== General information ==
     
    1719 Source Material:: The original texts or objects which the resource under consideration is based on.
    1820 Resource:: The digital object at hand which is described by the metadata record under consideration.
     21
     22== Search facets ==
     23
     24=== Collection ===
     25
     26 Definition:: The collection to which a resource or tool belongs. A resource or tool can only belong to one collection. A collection is based on a specific (legacy) archive or a certain type of resource within such an archive or centre. The members of a collection share basic metadata vocabularies for the information displayed in the VLO.
     27 Tooltip:: The collection to which the resource or tool belongs
     28 Recommended Data Categories:: None. Element content of `<MdCollectionDisplayName>` is used instead.
     29 Recommendations on the Vocabulary:: A resource has to be assigned exactly one collection it belongs to. The usage of more than one `<MdCollectionDisplayName>` element is not valid with respect to the schema. The usage of more than one collection name within one `<MdCollectionDisplayName>` element is not supported.
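
The one-collection rule can be checked mechanically. The following is a minimal sketch, assuming a CMDI 1.1 record whose elements live in the `http://www.clarin.eu/cmd/` namespace; the function names and the sample record are hypothetical and only serve as illustration. It cannot detect several collection names packed into one element.

```python
import xml.etree.ElementTree as ET

# Assumption: CMDI 1.1 namespace; adjust for other CMDI versions.
NS = {"c": "http://www.clarin.eu/cmd/"}


def collection_names(cmdi_xml):
    """Return the text of every MdCollectionDisplayName in the header."""
    root = ET.fromstring(cmdi_xml)
    # root is the CMD element itself, so the path is relative to it.
    elements = root.findall("c:Header/c:MdCollectionDisplayName", NS)
    return [(el.text or "").strip() for el in elements]


def has_exactly_one_collection(cmdi_xml):
    """True if the record names exactly one non-empty collection."""
    names = collection_names(cmdi_xml)
    return len(names) == 1 and names[0] != ""


# Hypothetical sample record, reduced to the header part used here.
SAMPLE = """<CMD xmlns="http://www.clarin.eu/cmd/">
  <Header>
    <MdCollectionDisplayName>Example Collection</MdCollectionDisplayName>
  </Header>
</CMD>"""

print(has_exactly_one_collection(SAMPLE))
```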
     30
     31=== Continent (deprecated) ===
     32{{{#!div class="notice system-message"
     33This facet is to be removed from the VLO.
     34}}}
     35 Definition:: The continent of origin of the source material of the resource, i.e. not the continent in which the resource was created, but where e.g. original texts were written or speech recordings made.
     36 Tooltip:: The continent of origin of the source material of the resource
     37
     38=== Country ===
     39 Definition:: The country of origin of the source material of the resource, i.e. not the country in which the resource was created, but where e.g. original texts were written or speech recordings made.
     40 Tooltip:: The country of origin of the source material of the resource
     41 Recommended Data Categories:: http://www.isocat.org/datcat/DC-2532: location country; → Nota bene: This data category has a broader definition than the VLO facet `Country`. Decisive for the interpretation by the VLO is the (stricter) facet definition.
     42 Problematic Data Categories:: http://www.isocat.org/datcat/DC-3792: country name, http://www.isocat.org/datcat/DC-2092: country coding → Neither of these data categories is specific enough on its own for interpretation within the VLO. If used, they must be embedded in a context (of other data categories) which narrows down their possible readings to the one given in the facet definition.
     43 Recommended Vocabulary:: Codes or names according to ISO 3166-1 (Alpha-2, Alpha-3, or Numeric); cf. http://en.wikipedia.org/wiki/ISO_3166-1, http://www.geonames.org/countries/.
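
To illustrate the three code variants named above, here is a minimal sketch that classifies a value by its shape only. It does not look the value up in the actual ISO 3166-1 lists, and the function name `country_code_kind` is hypothetical. A result of `None` means the value is either a country name or not a code at all.

```python
import re

# Shapes only; membership in the real ISO 3166-1 code lists is not checked.
_PATTERNS = {
    "alpha-2": re.compile(r"[A-Z]{2}"),
    "alpha-3": re.compile(r"[A-Z]{3}"),
    "numeric": re.compile(r"[0-9]{3}"),
}


def country_code_kind(value):
    """Return the ISO 3166-1 code shape of the value, or None."""
    value = value.strip()
    for kind, pattern in _PATTERNS.items():
        if pattern.fullmatch(value):
            return kind
    return None


for candidate in ("DE", "DEU", "276", "Germany"):
    print(candidate, country_code_kind(candidate))
```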
     44
     45=== Data Provider (deprecated) ===
     46{{{#!div class="notice system-message"
     47This facet is to be removed from the VLO.
     48}}}
     49 Definition:: The indication of the institution providing the metadata on the resource as being a CLARIN centre or not
     50 Tooltip:: The provider of the metadata for this resource
     51
     52=== Format ===
     53 Definition:: The mime types of the files in the resource or consumed/produced by the tool.
     54 Tooltip:: The mime types used in the resource or by the tool
     55 Recommended Data Categories:: http://www.isocat.org/datcat/DC-2571: mime type
     56 Recommended Vocabulary:: The Template expressions under: http://www.iana.org/assignments/media-types/media-types.xhtml
     57 Open Issues:: Maybe create a tighter vocabulary for linguistic resources based on https://docs.google.com/spreadsheets/d/1Tjmp_sEZDHIqnFAU1erx2VtzyYdjbKdM7RQGNIUbWhs/edit#gid=884722196 ?
     58
     59=== Genre ===
     60 Definition:: The conventionalized discourse or text type of the content of the resource, consistently applied within the collection.
     61 Tooltip:: The genre of the content of the resource
     62 Recommended Data Categories:: http://www.isocat.org/datcat/DC-2470: genre, http://www.isocat.org/datcat/DC-3899: subGenre
     63 Recommendations on the Vocabulary:: The use of a controlled vocabulary is recommended, one that is internally homogeneous, sufficiently documented, and linked to via a reference within the respective element.
     64
     65=== Keywords ===
     66 Definition:: Keywords containing relevant information on the resource or tool not stated in other VLO metadata facets, consistently applied within the collection.
     67 Tooltip:: Keywords describing the resource or tool
     68 Recommended Data Categories:: http://www.isocat.org/datcat/DC-5436: metadata tag
     69 Recommendations on the Vocabulary:: The use of a controlled vocabulary is recommended, one that is internally homogeneous, sufficiently documented, and
     70linked to via a reference within the respective element.
     71
     72=== Language ===
     73 Definition:: The object language relevant for the resource or tool, i.e. the language of the source material of a resource, the object language of a language description, or the language supported by a linguistic tool.
     74 Tooltip:: The object language relevant for the resource or tool
     75 Recommended Data Categories:: http://www.isocat.org/rest/dc/2482: languageID, http://www.isocat.org/rest/dc/2484: languageName → N.B.: If possible, the languageName category should only be used in combination with a language ID corresponding to ISO 639-3 (see: Recommendations on the Vocabulary). http://www.isocat.org/rest/dc/5361: langUsage with http://www.isocat.org/rest/dc/5358: language → N.B.: The langUsage and language categories are modeled analogously to the corresponding TEI Header elements. They should only be used in combination with one another.
     76 Recommendations on the Vocabulary:: Language codes according to ISO 639-3 (languages) or ISO 639-5 (language families). Use the ISO 639-3 code `und` (undetermined) for resources which cannot be assigned one exact language (e.g. language-independent tools).
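
As a rough illustration of the code recommendation, the sketch below normalizes a language value to a three-letter code. It only checks the shape (three lowercase letters) and does not consult the ISO 639-3 or ISO 639-5 registries. Falling back to `und` for missing or malformed values is an assumption made for this example, not something the recommendation prescribes; the function name is hypothetical.

```python
import re

# ISO 639-3 and ISO 639-5 codes both consist of three lowercase letters.
_THREE_LETTER = re.compile(r"[a-z]{3}")


def normalize_language_code(value):
    """Return a three-letter code, or 'und' if the value is not code-shaped.

    Assumption: missing or malformed values are treated as undetermined.
    """
    code = (value or "").strip().lower()
    return code if _THREE_LETTER.fullmatch(code) else "und"


for raw in ("DEU", " nld ", "german", None):
    print(repr(raw), normalize_language_code(raw))
```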
     77
     78=== Availability ===
     79 Definition:: A rough description of the conditions under which the resource or tool can be used.
     80 Tooltip:: The usage conditions for the resource or tool
     81 Recommended Data Categories:: http://www.isocat.org/datcat/DC-2453: i.e. availability, such as “free; free for academic use; restricted use; request required; user licence required; registration required; unknown”; http://www.isocat.org/datcat/DC-6846: i.e. rights, meaning “Any rights information for this resource.” (hence: very broad definition); http://www.isocat.org/datcat/DC-5439: i.e. license type
     82 Recommended Vocabulary:: '''free''' (The resource is available for re-use under a free license, e.g. CC, LGPL. There may be still usage constraints within the scope of the respective licenses, e.g. concerning attribution or conditions of further sharing). '''free for academic use''' (The resource is available for re-use in an academic context.) '''restricted''' (The resource is not available for free re-use. Restrictions may concern the availability or functionality of the resource or its parts. Restrictions might be loosened by purchase of a license.) '''upon-request''' (The resource is freely available upon a user’s request. The terms of usage will be negotiated based on the intended usage scenarios.)
     83 Open Issues:: Test implementation available under: http://aspra11.informatik.uni-leipzig.de:8080/vlo/search?0 This facet should be accompanied by a display facet `License`.
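
A check against the four recommended availability values could look like the sketch below. The constant and function names are hypothetical, and the case-insensitive comparison is an assumption of this example. Note that values from DC-2453 such as "restricted use" are not part of the recommended vocabulary and are therefore rejected.

```python
# The four values listed under "Recommended Vocabulary" for this facet.
AVAILABILITY_VALUES = (
    "free",
    "free for academic use",
    "restricted",
    "upon-request",
)


def is_recommended_availability(value):
    """True if the value is one of the recommended availability terms.

    Assumption: surrounding whitespace and letter case are ignored.
    """
    return value.strip().lower() in AVAILABILITY_VALUES


for candidate in ("free", "Upon-Request", "restricted use"):
    print(candidate, is_recommended_availability(candidate))
```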
     84
     85=== Lifecycle Status ===
     86 Definition:: The status in the life cycle of the resource or tool.
     87 Tooltip:: The status in the life cycle of the resource or tool
     88 Recommended Data Categories:: (not discussed yet!) http://www.isocat.org/datcat/DC-3818: life cycle status, such as: “.”
     89 Recommended Vocabulary:: As far as possible use the vocabulary proposed in DC-3818, i.e.: '''planned, development, released, production, withdrawn, retired, superseded, archived'''
     90
     91=== Modality ===
     92 Definition:: The channel by which the signs in the content of the resource were transmitted or the modality for which a tool is intended, e.g. to recognize speech, gestures or entities of a text.
     93 Tooltip:: The modality of the content of the resource or intended for the tool
     94 Recommended Data Categories:: http://www.isocat.org/datcat/DC-2490: modalities
     95 Recommended Vocabulary:: (based on the examples under http://www.isocat.org/datcat/DC-2490 and the data within the VLO’s modality facet)
     96
     97||= Type =||= Subtype =||= Examples and/or notes =||
     98||actOut / performance / demonstration|| || a cooking show ||
     99||dance || || ||
     100||eyeGaze || || ||
     101||facialExpressions || || ||
     102||gestures      ||(languageLike) ||     pantomimes, emblems, sign language ||
     103||              ||(coSpeech)     ||     iconics, beats, cohesives, deictics, metaphorics ||
     104||              ||(emblematics)  || ||
     105||image         ||(painting / drawing)|| ||
     106||              ||(photo) || ||
     107||              ||(other) || ||
     108||music         ||(instrumental) || ||
     109||              ||(singing) || ||
     110||speech        || ||an interview, storytelling||
     111||writing       || (print) ||a book ||
     112||              || (handwritten) ||a letter||
     113||unspecified || || ||
     114||other || ||           drum signals, whistling ||
     115||multimodal || ||If several modalities are involved, do not provide more than one term per element. Instead, repeat the respective element once per modality. If that is not possible, the unspecified term '''multimodal''' may be used.||
     116
     117 Open Issues:: Discuss Feedback of F-AG6 (Petra Wagner, Uni Bielefeld)
     118
     119
     120=== National Project ===
     121 Definition:: The national CLARIN project providing the resource or tool.
     122 Tooltip:: The national CLARIN project providing the resource or tool
     123 Recommended Data Categories:: /c:CMD/c:Header/c:MdCollectionDisplayName/text()
     124 Recommended Vocabulary:: '''CELR, CLARIN-AT, CLARIN-D, CLARIN-DK, CLARIN-EU, CLARIN-NL, CLARIN-PL, CLARINO, SWE-CLARIN, Other''' This list will have to be expanded for new national CLARIN initiatives. (Note from Hanna Hedeland: Aren't these values derived when harvesting via a mapping?)
     125 Open Issues:: A description of the resources harvested as well as the criteria for harvesting would be helpful.
     126
     127=== Organisation ===
     128 Definition:: The name of the organisation currently responsible for the resource or tool, i.e. to be contacted with any questions or requests regarding the metadata or access to the resource/tool.
     129 Tooltip:: The organisation currently responsible for the resource or tool
     130 Recommended Data Categories:: http://www.isocat.org/datcat/DC-2459: organization
     131 Problematic Data Categories:: http://www.isocat.org/datcat/DC-2979: Organisation → Underspecified, i.e. this DC doesn’t specify the purpose of the organisation named here in the context of the resource.
     132 Eliminated/Ignored Data Categories:: http://www.isocat.org/datcat/DC-6134: publisher (from TEI), http://purl.org/dc/terms/publisher: publisher (Dublin Core) → Both underspecified, i.e. these DCs specify neither the kind of entity named here (organisation, person, etc.) nor the purpose of the entity in the context of the resource.
     133 Recommendations on the Vocabulary:: (1) An English version should be provided for each institution name that will be displayed within the VLO. The English name should be marked as such by a language-defining attribute and an ISO 639-3 language code (e.g. `@xml:lang="eng"`). (2) Only one variant of an institution name should be used across all metadata records provided by the respective institution.
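
The language marking in (1) makes it possible to pick the English name automatically. Below is a minimal sketch of such a lookup; the element names `organisation` and `name` and the sample fragment are hypothetical, since the actual element names depend on the metadata profile in use.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def english_name(xml_fragment, tag="name"):
    """Return the first name marked xml:lang="eng", or None."""
    root = ET.fromstring(xml_fragment)
    for el in root.iter(tag):
        if el.get(XML_LANG) == "eng":
            return (el.text or "").strip()
    return None


# Hypothetical fragment; real records use profile-specific element names.
SAMPLE = """<organisation>
  <name xml:lang="deu">Beispielinstitut</name>
  <name xml:lang="eng">Example Institute</name>
</organisation>"""

print(english_name(SAMPLE))
```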
     134
     135=== Project ===
     136 Definition:: The names of the projects originally involved in the creation of the resource or tool. These projects may no longer exist and are usually not the ones to be contacted regarding the resource/tool.
     137 Tooltip:: The project within which the resource was created
     138 Recommended Data Categories:: http://www.isocat.org/datcat/DC-2536: project name, http://www.isocat.org/datcat/DC-2537: project title
     139 Recommendations on the Vocabulary:: TODO
     140
     141=== Resource Type ===
     142 Definition:: The type of the resource or tool (e.g. corpus, lexicon, grammar, tool, …)
     143 Tooltip:: The type of the resource or tool (e.g. corpus, lexicon, grammar, tool, …)
     144 Recommended Data Categories:: http://www.isocat.org/datcat/DC-3806: resource class, http://www.isocat.org/datcat/DC-5424: type → based on the TEI Header definition of type
     145 Problematic Data Categories:: http://purl.org/dc/terms/type, http://purl.org/dc/elements/1.1/type → This DC is underspecified since “genre” is included in the definition as well. If used, it must be embedded in a context (of other data categories) that disambiguates the possible readings of this category.
     146 Recommended Vocabulary:: (based on the examples under http://www.isocat.org/datcat/DC-3806 and the data within the VLO’s Resource Type facet)
     147
     148||= Value =||Example||
     149||annotatedText|| ||
     150||audioRecording || ||
     151||collection || a collection which cannot be used as a corpus in a linguistic sense||
     152||corpus || ||
     153||database || ||
     154||dataset:experimentalData|| ||
     155||dataset:fieldworkMaterial|| ||
     156||dataset:surveyData|| ||
     157||dataset:testData|| ||
     158||grammar|| ||
     159||image||a digital image||
     160||lexicalResource|| ||
     161||physicalObject|| book, picture, photo, stone with inscription, …||
     162||plainText|| ||
     163||session|| ||
     164||teachingMaterial || ||
     165||tool|| ||
     166||toolChain|| ||
     167||videoRecording|| ||
     168||webApplication|| ||
     169||webService|| ||
     170||unspecified || ||
     171||other || ||
     172
     173TODO: Excluded DC3806-vocabulary:
     174grammar → rather belongs to facet genre,
     175lexicon → use lexicalResource instead,
     176resourceBundle → use collection instead,
     177teachingMaterial → rather belongs to facet genre
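
The two exclusions that name a direct replacement can be expressed as a simple lookup. This is only a sketch with hypothetical names; `grammar` and `teachingMaterial` are left untouched here because the note above assigns them to the genre facet without naming a replacement value.

```python
# Excluded DC-3806 values that have an explicit replacement in the list above.
REPLACEMENTS = {
    "lexicon": "lexicalResource",
    "resourceBundle": "collection",
}


def map_resource_type(value):
    """Return the recommended value for an excluded one, else the value itself."""
    return REPLACEMENTS.get(value, value)


for raw in ("lexicon", "resourceBundle", "corpus"):
    print(raw, "->", map_resource_type(raw))
```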
     178
     179
     180=== Subject ===
     181 Definition:: The subject or topic of the content of the resource, consistently applied within the collection.
     182 Tooltip:: The subject or topic of the content of the resource
     183 Recommended Data Categories:: http://purl.org/dc/terms/subject: subject, http://purl.org/dc/elements/1.1/subject: subject, http://www.isocat.org/datcat/DC-6147: domain of use, http://www.isocat.org/datcat/DC-5316: classification code → This DC is designed analogously to the TEI Header element classCode. It is thus underspecified for the subject facet. Hence, when it is used, the classification scheme has to be stated, and the context should make clear that the element is used according to the definition of the subject facet.
     184 Recommended Vocabulary:: The use of a controlled vocabulary is recommended, one that is internally homogeneous and preferably sufficiently documented.
     185
     186
     187=== Temporal Coverage ===
     188 Definition:: The temporal coverage of the source material of the resource, i.e. not the time within which the resource was created, but when e.g. original texts were written or speech recordings made.
     189 Tooltip:: The temporal coverage of the source material of the resource
     190 Possible Data Categories:: http://www.isocat.org/datcat/DC-3664: Time coverage, http://www.isocat.org/datcat/DC-3654: Start range → This data category has to be used together with DC-3655, http://www.isocat.org/datcat/DC-3655: End range → This data category has to be used together with DC-3654
     191 Problematic Data Categories:: http://www.isocat.org/datcat/DC-4343: interval → Underspecified (“a : a space of time between events or states”): could be filled with values like “200 years”, “a decade”, etc.; http://www.isocat.org/datcat/DC-5742: End time → incomplete; corresponding start time category missing; http://www.isocat.org/datcat/DC-2502: time coverage → Definition (“The time period that the content of a resource is about.”) does not suit the focus of this facet (cf. 17.1)
     192 Proposed vocabulary:: Open Date Range format http://www.ukoln.ac.uk/metadata/dcmi/date-dccd-odrf for http://www.isocat.org/datcat/DC-3664; W3C DateTime http://www.w3.org/TR/NOTE-datetime for http://www.isocat.org/datcat/DC-3654 and http://www.isocat.org/datcat/DC-3655
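
Since DC-3654 and DC-3655 have to be used together, a consumer may want to check that the start does not lie after the end. The sketch below handles only the date-only granularities of W3C DateTime (`YYYY`, `YYYY-MM`, `YYYY-MM-DD`), not the variants with a time of day. Missing month or day parts are read as the earliest possible date, which is a simplification of this example; the function names are hypothetical.

```python
import re
from datetime import date

# Date-only W3C DateTime granularities: YYYY, YYYY-MM, YYYY-MM-DD.
_W3C_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_w3c_date(value):
    """Parse a date-only W3C DateTime value into a datetime.date.

    Simplification: a missing month or day defaults to 1.
    """
    match = _W3C_DATE.fullmatch(value.strip())
    if match is None:
        raise ValueError("not a date-only W3C DateTime value: %r" % value)
    year, month, day = match.groups()
    return date(int(year), int(month or 1), int(day or 1))


def is_valid_range(start, end):
    """True if the start range value does not lie after the end range value."""
    return parse_w3c_date(start) <= parse_w3c_date(end)


print(is_valid_range("1850", "1900-12-31"))
print(is_valid_range("1900-05", "1850"))
```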