| 1 | == Part-of-speech tag sets for FCS == |
| 2 | In search of a simple part-of-speech (POS) tag set for the CLARIN Federated Content Search (FCS). |
| 3 | |
| 4 | The POS tag set is intended for the human users of the FCS aggregator. |
| 5 | |
| 6 | === Sources === |
| 7 | EAGLES: [[http://www.ilc.cnr.it/EAGLES96/pub/eagles/corpora/annotate.ps.gz|EAGLES Recommendations for the morphosyntactic annotation of corpora]], ''Obligatory attributes/values, Major Categories'', §4.2.1 on page 7 |
| 8 | |
| 9 | STTS/HW: [[http://www.sfs.uni-tuebingen.de/resources/stts-1999.pdf|Guidelines für das Tagging deutscher Textcorpora mit STTS (Kleines und großes Tagset)]] (Guidelines for tagging German text corpora with STTS, small and large tag set), ''Hauptwortarten'' (main parts of speech), Table 2.1 on page 4 |
| 10 | |
| 11 | UD-17: [[http://universaldependencies.github.io/docs/u/pos/all.html|Universal Dependencies]] |
| 12 | |
| 13 | UP-12: [[http://www.petrovi.de/data/universal.pdf|Universal POS]], second paragraph of the right-hand column on page 2 |
| 14 | |
| 15 | === Summary table === |
| 16 | |
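| 17 | The summary table shows the tags drawn from the tag sets cited above (in their original spelling). Each row groups corresponding categories; an empty cell means that the tag set has no separate tag for that category. |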
| 18 | |
| 19 | ||= EAGLES =||= STTS/HW =||= UP-12 =||= UD-17 =||= Description =|| |
| 20 | || N || N || NOUN || NOUN || noun || |
| 21 | || || || || PROPN || proper noun/named entity || |
| 22 | || V || V || VERB || VERB || verb || |
| 23 | || || || || AUX || auxiliary verb (includes modal auxiliaries like ''should'' or ''must'') || |
| 24 | || AJ || ADJ || ADJ || ADJ || adjective || |
| 25 | || PD || P || || || pronoun/determiner || |
| 26 | || AT || ART || || || article || |
| 27 | || || || DET || DET || determiner || |
| 28 | || || || PRON || PRON || pronoun || |
| 29 | || AV || ADV || ADV || ADV || adverb || |
| 30 | || AP || AP || ADP || ADP || adposition (circum-, pre-, postposition) || |
| 31 | || C || KO || CONJ || CONJ || conjunction || |
| 32 | || || || || SCONJ || subordinating conjunction || |
| 33 | || NU || CARD || NUM || NUM || numeral; cardinal numeral (ordinals are tagged as adjectives or adverbs) || |
| 34 | || I || ITJ || || INTJ || interjection || |
| 35 | || U || PTK || PRT || PART || unique; particle || |
| 36 | || R || || X || X || residual; other || |
| 37 | || || || || SYM || symbol ($, %, §, ©, +, −, 😝, !http://example.org, !a@example.org) || |
| 38 | || PU || || . || PUNCT || punctuation || |
| 39 | || 13 || 11 || 12 || 17 || total number of tags || |
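
The summary table can be read as a crosswalk between the four tag sets. The following Python sketch encodes it row by row (None for an empty cell), recomputes the totals in the last row, and looks up the tag in the same row of another tag set. The names `TAG_SETS`, `ROWS`, `tag_count` and `translate` are hypothetical and serve only this illustration.

```python
from typing import Optional

# Column order of the summary table.
TAG_SETS = ("EAGLES", "STTS/HW", "UP-12", "UD-17")

# One tuple per table row, in table order; None marks an empty cell.
ROWS = [
    ("N", "N", "NOUN", "NOUN"),
    (None, None, None, "PROPN"),
    ("V", "V", "VERB", "VERB"),
    (None, None, None, "AUX"),
    ("AJ", "ADJ", "ADJ", "ADJ"),
    ("PD", "P", None, None),
    ("AT", "ART", None, None),
    (None, None, "DET", "DET"),
    (None, None, "PRON", "PRON"),
    ("AV", "ADV", "ADV", "ADV"),
    ("AP", "AP", "ADP", "ADP"),
    ("C", "KO", "CONJ", "CONJ"),
    (None, None, None, "SCONJ"),
    ("NU", "CARD", "NUM", "NUM"),
    ("I", "ITJ", None, "INTJ"),
    ("U", "PTK", "PRT", "PART"),
    ("R", None, "X", "X"),
    (None, None, None, "SYM"),
    ("PU", None, ".", "PUNCT"),
]


def tag_count(tag_set: str) -> int:
    """Count the non-empty cells in one column (the 'total number of tags' row)."""
    col = TAG_SETS.index(tag_set)
    return sum(1 for row in ROWS if row[col] is not None)


def translate(tag: str, source: str, target: str) -> Optional[str]:
    """Return the tag in the same row of the target tag set, or None if that cell is empty."""
    src, tgt = TAG_SETS.index(source), TAG_SETS.index(target)
    for row in ROWS:
        if row[src] == tag:
            return row[tgt]
    raise KeyError(f"{tag!r} is not a tag of {source}")


if __name__ == "__main__":
    for name in TAG_SETS:
        print(name, tag_count(name))  # EAGLES 13, STTS/HW 11, UP-12 12, UD-17 17
```

For example, `translate("KO", "STTS/HW", "UD-17")` gives `CONJ`, while `translate("P", "STTS/HW", "UD-17")` gives `None`, because P has no single counterpart in UD-17. A purely row-based lookup also misses coarsening relations such as PROPN to N; a real conversion would need an explicit many-to-one mapping.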
| 40 | |
| 41 | '''Notes:''' |
| 42 | |
| 43 | STTS/HW lacks generic tags for punctuation and for other (residual) material; they could be supplemented as $ (for punctuation) and X (for other). |
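
A minimal Python sketch of this supplement: it collapses fine-grained STTS tags (e.g. NN, VVFIN, APPRART) to the STTS/HW main classes and adds $ and X. It assumes that each fine-grained tag starts with the abbreviation of its main class; this is an assumption for illustration, not a rule stated in the guidelines. The names `PREFIXES` and `stts_main_class` are hypothetical.

```python
# Ordered (prefix, main class) pairs. Order matters: "PTK" must be tested
# before "P", otherwise particles would be classified as pronouns.
# Assumption: every fine-grained STTS tag starts with its main-class abbreviation.
PREFIXES = (
    ("$", "$"),       # supplemented punctuation class: $. $, $(
    ("ADJ", "ADJ"),   # ADJA, ADJD
    ("ADV", "ADV"),
    ("AP", "AP"),     # APPR, APPRART, APPO, APZR
    ("ART", "ART"),
    ("CARD", "CARD"),
    ("ITJ", "ITJ"),
    ("KO", "KO"),     # KOUI, KOUS, KON, KOKOM
    ("PTK", "PTK"),   # PTKZU, PTKNEG, PTKVZ, PTKANT, PTKA
    ("P", "P"),       # PPER, PDS, PDAT, PIS, ...
    ("N", "N"),       # NN, NE
    ("V", "V"),       # VVFIN, VAINF, VMFIN, ...
)


def stts_main_class(tag: str) -> str:
    """Map a fine-grained STTS tag to its STTS/HW main class, falling back to X."""
    for prefix, main_class in PREFIXES:
        if tag.startswith(prefix):
            return main_class
    return "X"  # supplemented residual class, e.g. FM, XY, TRUNC


if __name__ == "__main__":
    tags = ["NE", "VVFIN", "APPRART", "PTKNEG", "PDAT", "$.", "FM"]
    print([stts_main_class(t) for t in tags])
    # ['N', 'V', 'AP', 'PTK', 'P', '$', 'X']
```

Testing PTK before P, and listing ADJ, ADV, AP and ART as separate prefixes, keeps the prefix match unambiguous; any tag that matches no prefix ends up in the supplemented class X.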
| 44 | |
| 45 | UP-12 lacks a tag for interjections (an oversight?). |
| 46 | |
| 47 | UP-12 and UD-17 lack a tag for articles; articles are absorbed into the determiner class (DET). |
| 48 | |
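| 49 | UP-12 and UD-17 have a determiner class (DET) that is largely split off from the pronoun class (PRON). This separation is completely unnatural to German speakers: in STTS, determiner-like words such as ''dieser'' or ''mein'' count as pronouns whether they are used attributively or on their own; compare the [[http://universaldependencies.github.io/docs/tagset-conversion/de-stts-uposf.html|translation from STTS to UD-17]]. |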
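
To make the DET/PRON split concrete, the Python sketch below groups an excerpt of the STTS to UD-17 conversion by STTS/HW main class. The excerpt is an assumption based on the usual pattern (the article ART and attributive pronouns ending in -AT map to DET; substituting pronouns ending in -S, personal and reflexive pronouns map to PRON) and is not a copy of the linked conversion table, which should be consulted for the full mapping. The names `STTS_TO_UD` and `ud_classes_per_main_class` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set

# Assumed excerpt of the STTS -> UD-17 conversion for articles and pronouns.
STTS_TO_UD: Dict[str, str] = {
    "ART": "DET",     # article: der, die, das, ein
    "PDAT": "DET",    # attributive demonstrative: "dieser Mann"
    "PDS": "PRON",    # substituting demonstrative: "dieser" on its own
    "PIAT": "DET",    # attributive indefinite: "kein Mann"
    "PIS": "PRON",    # substituting indefinite: "man", "jemand"
    "PPOSAT": "DET",  # attributive possessive: "mein Buch"
    "PPOSS": "PRON",  # substituting possessive: "meins"
    "PRELAT": "DET",  # attributive relative: "dessen"
    "PRELS": "PRON",  # substituting relative: "der", "welcher"
    "PWAT": "DET",    # attributive interrogative: "welcher Mann"
    "PWS": "PRON",    # substituting interrogative: "wer", "was"
    "PPER": "PRON",   # personal pronoun: "ich", "sie"
    "PRF": "PRON",    # reflexive pronoun: "sich"
}


def ud_classes_per_main_class(mapping: Dict[str, str]) -> Dict[str, Set[str]]:
    """Collect the UD-17 targets of each STTS/HW main class (ART, or P for all P* tags)."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for stts_tag, ud_tag in mapping.items():
        main_class = "ART" if stts_tag == "ART" else "P"
        groups[main_class].add(ud_tag)
    return dict(groups)
```

Applied to `STTS_TO_UD`, the grouping maps ART to the set {DET} and P to the set {DET, PRON}: the single STTS/HW class P is spread over two UD-17 classes, and the article lands in the same class as the attributive pronouns.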
| 50 | |
| 51 | The use of DET in UD-17 is inconsistent across languages; Latin, for example, seems to have no determiners at all (cf. [[http://universaldependencies.github.io/docs/tagset-conversion/la-conll-uposf.html|Tagset la::conll]]). |
| 52 | |