This is a proposal for a common catalogue of MIME format variants to be recognized by !WebLicht and eventually other CLARIN services. It originated from a discussion across the Dev, Standards and TEIweblicht mailing lists (cf. [https://lists.clarin.eu/pipermail/dev/2016-June/thread.html June] and [https://lists.clarin.eu/pipermail/dev/2016-July/thread.html July]). The original proposal is due to Thomas Schmidt, with feedback from Bryan Jurish.

 * ISO/TEI transcriptions of spoken language will be identified by the MIME type `application/tei+xml;format-variant=tei-iso-spoken`. A parameter `tokenized=0/1` can be added to indicate whether (=1) or not (=0) the respective TEI file is tokenized (i.e. has token-level markup).
 * DTA/TEI files will be identified by the MIME type `application/tei+xml;format-variant=tei-dta`. A parameter `tokenized=0/1` can be added to indicate whether (=1) or not (=0) the respective TEI file is tokenized (i.e. has token-level markup).
 * EXMARaLDA Basic Transcriptions will be identified by the MIME type `application/xml; format-variant=exmaralda-exb`
 * FOLKER/OrthoNormal transcription files will be identified by the MIME type `application/xml; format-variant=folker-fln`
 * Transcriber transcription files will be identified by the MIME type `application/xml; format-variant=transcriber-trs`
 * CLAN/CHAT transcription files will be identified by the MIME type `text/plain;format-variant=clan-cha`
 * TCF will be identified by the MIME type `application/xml;format-variant=weblicht-tcf`

Notes by Thomas:

 * It would have to be checked (note the passive; I don't know who could be in charge of this) whether competing MIME types for these file types are already registered somewhere. I know that !WebLicht already seems to have two variants of EXMARaLDA transcriptions. The mechanisms specifying those would probably have to be deprecated. Transcriber and CHAT are also likely to have been given some kind of MIME type elsewhere in CLARIN.
 * Further relevant formats will be ELAN/EAF and PRAAT/TextGrid (the latter being a plain-text format, not an XML format). Both are also likely to have been registered somewhere already, so "someone" (again, I wouldn't know who) should check whether MIME types have been defined for those.

The parameter `charset` can be used for plain text files etc.

The above is intended to address points 1-3 in Marie Hinrichs' list below:

From !WebLicht’s side, there are several places where some work/coordination needs to happen:

 1. TCF: agree on the `textsource.type` attribute and make sure that the encoder services set it properly
 1. Agree on type names (cf. format variants above)
 1. Make sure the CMDI for encoder and decoder services reflects the outcomes of 1 and 2
 1. Add new mappings to !WebLicht for TEI.

== Inventory of MIME types as currently used in CLARIN centres ==

See https://docs.google.com/spreadsheets/d/1KeuZ-0sKiLuguJtKqRfTp0SwHYMME9cTkbuvJD_vRIg/edit?usp=sharing and feel free to comment.

== References ==

 * https://tools.ietf.org/html/rfc6129
 * https://tools.ietf.org/html/rfc6838
 * https://www.w3.org/TR/webarch/#xml-media-types
   * In the relevant respect, this is obsoleted by RFC 7303, but the recommendation against `text/xml` is upheld there in order not to complicate the entire picture.
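
== Example: reading the format-variant parameter ==

To illustrate how a consumer might use the proposed catalogue, here is a minimal Python sketch. It is not taken from !WebLicht or any CLARIN service. The names `FORMAT_VARIANTS`, `parse_media_type`, `identify_variant` and `is_tokenized` are hypothetical, and the parser is deliberately simplified (it does not handle quoted parameter values that contain semicolons). Returning `None` when `tokenized` is absent is also an assumption; the proposal only says the parameter "can be added".

```python
# Proposed format variants and the base media type each is expected with,
# copied from the list in this proposal.
FORMAT_VARIANTS = {
    "tei-iso-spoken": "application/tei+xml",
    "tei-dta": "application/tei+xml",
    "exmaralda-exb": "application/xml",
    "folker-fln": "application/xml",
    "transcriber-trs": "application/xml",
    "clan-cha": "text/plain",
    "weblicht-tcf": "application/xml",
}


def parse_media_type(value):
    """Split a media type string into (base_type, params).

    Simplified: parameters are split on ';' and '=' only.
    """
    base, *raw_params = value.split(";")
    params = {}
    for raw in raw_params:
        name, sep, val = raw.partition("=")
        if not sep:
            continue  # skip empty or malformed parameters
        params[name.strip().lower()] = val.strip().strip('"')
    return base.strip().lower(), params


def identify_variant(value):
    """Return the format variant if it is in the catalogue and matches
    the expected base type, otherwise None."""
    base, params = parse_media_type(value)
    variant = params.get("format-variant")
    if variant is not None and FORMAT_VARIANTS.get(variant) == base:
        return variant
    return None


def is_tokenized(value):
    """Return True/False for tokenized=1/0, or None if not stated
    (assumption: an absent parameter means 'unknown')."""
    _, params = parse_media_type(value)
    flag = params.get("tokenized")
    if flag is None:
        return None
    return flag == "1"


if __name__ == "__main__":
    mime = "application/tei+xml;format-variant=tei-iso-spoken;tokenized=1"
    print(identify_variant(mime), is_tokenized(mime))
```

Because whitespace around parameters is stripped, both spellings used in the list above (`application/xml; format-variant=...` and `application/xml;format-variant=...`) are treated the same way. A variant paired with an unexpected base type, such as `text/xml;format-variant=weblicht-tcf`, is not recognized by this sketch.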