== Part of Speech tag sets for FCS ==

In search of a simple part-of-speech (POS) tag set for CLARIN Federated Content Search (FCS).
The tag set is intended for the human users of the FCS aggregator.

=== Sources ===

 * EAGLES: [[http://www.ilc.cnr.it/EAGLES96/pub/eagles/corpora/annotate.ps.gz|EAGLES Recommendations for the morphosyntactic annotation of corpora]], ''Obligatory attributes/values, Major Categories'', §4.2.1 on page 7
 * STTS/HW: [[http://www.sfs.uni-tuebingen.de/resources/stts-1999.pdf|Guidelines für das Tagging deutscher Textcorpora mit STTS (Kleines und großes Tagset)]], ''Hauptwortarten'' (main parts of speech), Table 2.1 on page 4
 * TANL: [[http://medialab.di.unipi.it/wiki/Tanl_POS_Tagset#Coarse-grained_tags|Tanl Coarse Grained Tags]]
 * UD-17: [[http://universaldependencies.github.io/docs/u/pos/all.html|Universal Dependencies]]
 * UP-12: [[http://www.petrovi.de/data/universal.pdf|Universal POS]], second paragraph in the right-hand column of page 2

=== Summary table ===

The summary table shows the tags drawn from the tag sets listed above, in their original spelling.

||= EAGLES =||= STTS/HW =||= TANL =||= UP-12 =||= UD-17 =||= Notes =||
|| N || N || S || NOUN || NOUN || noun ||
|| || || || || PROPN || proper noun/named entity ||
|| V || V || V || VERB || VERB || verb ||
|| || || || || AUX || auxiliary verb (includes modal auxiliaries like ''should'' or ''must'') ||
|| AJ || ADJ || A || ADJ || ADJ || adjective ||
|| PD || P || || || || pronoun/determiner (as one single class) ||
|| AT || ART || R || || || article ||
|| || || D || DET || DET || determiner ||
|| || || T || || || predeterminer (e.g., '''tutto''' il giorno) ||
|| || || P || PRON || PRON || pronoun ||
|| AV || ADV || B || ADV || ADV || adverb ||
|| AP || AP || E || ADP || ADP || adposition (circum-, pre-, postposition) ||
|| C || KO || C || CONJ || CONJ || conjunction ||
|| || || || || SCONJ || subordinating conjunction ||
|| NU || CARD || N || NUM || NUM || numeral; cardinal numeral (ordinals are tagged as adjectives or adverbs) ||
|| I || ITJ || I || || INTJ || interjection ||
|| U || PTK || || PRT || PART || unique; particle ||
|| R || || X || X || X || residual; other ||
|| || || || || SYM || symbol ($, %, §, ©, +, −, 😝, !http://example.org, !a@example.org) ||
|| PU || || F || . || PUNCT || punctuation ||
|| 13 || 11 || 14 || 12 || 17 || total number of tags ||

'''Notes:'''

 * STTS/HW lacks generic tags for punctuation and for other (residual) tokens; these could be supplemented as $ (punctuation) and X (other).
 * UP-12 lacks a tag for interjection; interjections are mapped to the class X (cf. [[https://code.google.com/p/universal-pos-tags/source/browse/trunk/nl-alpino.map]]).
 * UP-12 and UD-17 lack a tag for article; articles are absorbed into the determiner class.
 * UP-12 and UD-17 have a determiner class that is mostly separated out of the pronoun class; this separation is completely unnatural to German speakers, compare the [[http://universaldependencies.github.io/docs/tagset-conversion/de-stts-uposf.html|translation from STTS to UD-17]].
 * The use of DET is inconsistent in UD-17; e.g., Latin seems to have no determiners at all (cf. [[http://universaldependencies.github.io/docs/tagset-conversion/la-conll-uposf.html|Tagset la::conll]]).
 * TANL lacks the category particle; the Italian negation particle ''non'' is classified as an adverb.
 * TANL treats article, determiner, predeterminer, and pronoun as separate, first-class parts of speech.
 * TANL has one-letter class names: elegant, but not necessarily mnemonic.

=== Decision ===

A poll was taken in the video meeting on 2015-03-09; for the result, see [[VidConf20150309]].
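
To make the summary table above easier to check and to reuse, its rows can be written down as plain data. The following is only a sketch: the names `TAGSETS`, `ROWS`, `tags_of` and `same_row` are hypothetical and not part of any FCS component, and the pronoun row is read with P in the TANL column, which is what the stated totals (11 for STTS/HW, 14 for TANL) require.

```python
# Column order of the summary table (the Notes column is left out).
TAGSETS = ("EAGLES", "STTS/HW", "TANL", "UP-12", "UD-17")

# One tuple per table row, in TAGSETS order; None marks an empty cell.
ROWS = [
    ("N", "N", "S", "NOUN", "NOUN"),       # noun
    (None, None, None, None, "PROPN"),     # proper noun
    ("V", "V", "V", "VERB", "VERB"),       # verb
    (None, None, None, None, "AUX"),       # auxiliary verb
    ("AJ", "ADJ", "A", "ADJ", "ADJ"),      # adjective
    ("PD", "P", None, None, None),         # pronoun/determiner as one class
    ("AT", "ART", "R", None, None),        # article
    (None, None, "D", "DET", "DET"),       # determiner
    (None, None, "T", None, None),         # predeterminer
    (None, None, "P", "PRON", "PRON"),     # pronoun (assumed TANL column)
    ("AV", "ADV", "B", "ADV", "ADV"),      # adverb
    ("AP", "AP", "E", "ADP", "ADP"),       # adposition
    ("C", "KO", "C", "CONJ", "CONJ"),      # conjunction
    (None, None, None, None, "SCONJ"),     # subordinating conjunction
    ("NU", "CARD", "N", "NUM", "NUM"),     # numeral
    ("I", "ITJ", "I", None, "INTJ"),       # interjection
    ("U", "PTK", None, "PRT", "PART"),     # unique; particle
    ("R", None, "X", "X", "X"),            # residual; other
    (None, None, None, None, "SYM"),       # symbol
    ("PU", None, "F", ".", "PUNCT"),       # punctuation
]


def tags_of(tagset):
    """Return all tags that the table lists for one tag set, in row order."""
    col = TAGSETS.index(tagset)
    return [row[col] for row in ROWS if row[col] is not None]


def same_row(tag, source, target):
    """Return the target tags that share a table row with a source tag."""
    src, dst = TAGSETS.index(source), TAGSETS.index(target)
    return [row[dst] for row in ROWS if row[src] == tag and row[dst] is not None]
```

With this data, `len(tags_of(name))` reproduces the last table row (13, 11, 14, 12, 17). `same_row("KO", "STTS/HW", "UD-17")` yields `["CONJ"]`, while `same_row("P", "STTS/HW", "UD-17")` yields an empty list, because the combined pronoun/determiner class has no counterpart on the same row in UD-17. A row-based lookup like this only mirrors the table; it is not a full tag set conversion.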