Part-of-speech tag sets for FCS
In search of a simple part-of-speech tag set for CLARIN Federated Content Search (FCS)
The POS tag set is intended for the human users of the FCS aggregator.
Sources
EAGLES: EAGLES Recommendations for the morphosyntactic annotation of corpora, Obligatory attributes/values, Major Categories, §4.2.1 on page 7
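STTS/HW: Guidelines für das Tagging deutscher Textcorpora mit STTS (Kleines und großes Tagset) [Guidelines for tagging German text corpora with STTS (small and large tag set)], Hauptwortarten (main parts of speech), Table 2.1 on page 4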
UD-17: Universal Dependencies, universal POS tags
UP-12: Universal POS tag set, second paragraph of the right-hand column on page 2
Summary table
The summary table shows the tags drawn from the tag sets listed above (in their original spelling). An empty cell means that the tag set has no tag for that class.
| EAGLES | STTS/HW | UP-12 | UD-17 | Notes |
|---|---|---|---|---|
| N | N | NOUN | NOUN | noun |
|  |  |  | PROPN | proper noun/named entity |
| V | V | VERB | VERB | verb |
|  |  |  | AUX | auxiliary verb (includes modal auxiliaries like should or must) |
| AJ | ADJ | ADJ | ADJ | adjective |
| PD | P |  |  | pronoun/determiner |
| AT | ART |  |  | article |
|  |  | DET | DET | determiner |
|  |  | PRON | PRON | pronoun |
| AV | ADV | ADV | ADV | adverb |
| AP | AP | ADP | ADP | adposition (circum-, pre-, postposition) |
| C | KO | CONJ | CONJ | conjunction |
|  |  |  | SCONJ | subordinating conjunction |
| NU | CARD | NUM | NUM | numeral; cardinal numeral (ordinals are tagged as adjectives or adverbs) |
| I | ITJ |  | INTJ | interjection |
| U | PTK | PRT | PART | unique; particle |
| R |  | X | X | residual; other |
|  |  |  | SYM | symbol ($, %, §, ©, +, −, 😝, http://example.org, a@example.org) |
| PU |  | . | PUNCT | punctuation |
| 13 | 11 | 12 | 17 | total number of tags |
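For tooling, the summary table can be kept as plain data. The following Python sketch is illustrative only: the names `ROWS`, `inventory` and `translate` are hypothetical and not part of any FCS software. It stores each row with `None` for an empty cell, recomputes the totals in the last row, and looks up a tag in the same row of another tag set, which yields `None` wherever the rows do not line up one-to-one.

```python
from typing import Optional

# Column order of the summary table.
TAGSETS = ("EAGLES", "STTS/HW", "UP-12", "UD-17")

# One tuple per table row (tags in their original spelling, then a note).
# None marks an empty cell: the tag set has no tag for that class.
ROWS: list[tuple[Optional[str], Optional[str], Optional[str], Optional[str], str]] = [
    ("N", "N", "NOUN", "NOUN", "noun"),
    (None, None, None, "PROPN", "proper noun/named entity"),
    ("V", "V", "VERB", "VERB", "verb"),
    (None, None, None, "AUX", "auxiliary verb"),
    ("AJ", "ADJ", "ADJ", "ADJ", "adjective"),
    ("PD", "P", None, None, "pronoun/determiner"),
    ("AT", "ART", None, None, "article"),
    (None, None, "DET", "DET", "determiner"),
    (None, None, "PRON", "PRON", "pronoun"),
    ("AV", "ADV", "ADV", "ADV", "adverb"),
    ("AP", "AP", "ADP", "ADP", "adposition"),
    ("C", "KO", "CONJ", "CONJ", "conjunction"),
    (None, None, None, "SCONJ", "subordinating conjunction"),
    ("NU", "CARD", "NUM", "NUM", "numeral"),
    ("I", "ITJ", None, "INTJ", "interjection"),
    ("U", "PTK", "PRT", "PART", "unique; particle"),
    ("R", None, "X", "X", "residual; other"),
    (None, None, None, "SYM", "symbol"),
    ("PU", None, ".", "PUNCT", "punctuation"),
]


def inventory(tagset: str) -> list[str]:
    """Return the tags of one tag set, in table order."""
    col = TAGSETS.index(tagset)
    return [row[col] for row in ROWS if row[col] is not None]


def translate(tag: str, source: str, target: str) -> Optional[str]:
    """Return the target tag from the same table row as `tag`.

    Returns None if `tag` is unknown in `source` or if `target` has no
    tag in that row (e.g. EAGLES PD, which UP-12/UD-17 split up).
    """
    s, t = TAGSETS.index(source), TAGSETS.index(target)
    for row in ROWS:
        if row[s] == tag:
            return row[t]
    return None


# Recompute the "total number of tags" row.
print({ts: len(inventory(ts)) for ts in TAGSETS})
# → {'EAGLES': 13, 'STTS/HW': 11, 'UP-12': 12, 'UD-17': 17}
```

For example, `translate("AJ", "EAGLES", "UD-17")` returns `"ADJ"`, while `translate("PD", "EAGLES", "UD-17")` returns `None`, because the EAGLES pronoun/determiner class has no single counterpart in UD-17.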
Notes:
STTS/HW lacks generic tags for punctuation and for other (residual) tokens; they could be supplemented as $ (for punctuation) and X (for other).
UP-12 lacks a tag for interjections (possibly an oversight).
UP-12 and UD-17 lack a tag for articles; articles are absorbed into the determiner class.
UP-12 and UD-17 have a determiner class that is mostly split off from the pronoun class. This separation is completely unnatural to German speakers; compare the translation from STTS to UD-17.
The use of DET is inconsistent in UD-17; e.g., Latin seems to have no determiners at all (cf. tag set la::conll).
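The DET/PRON split and the absorption of articles into DET can be made concrete with the fine-grained (large) STTS tags for articles and pronouns. The Python sketch below assumes the commonly used convention that articles and attributive pronouns (tags ending in AT) map to DET, while substituting, personal and reflexive pronouns map to PRON; individual treebanks may use a different table, and the names `STTS_TO_UD` and `split_by_ud` are hypothetical.

```python
from collections import defaultdict

# Assumed STTS -> UD-17 mapping (hypothetical table): ART and attributive
# "...AT" pronouns become DET; all other pronoun tags listed become PRON.
STTS_TO_UD: dict[str, str] = {
    "ART": "DET",     # der Tisch
    "PDAT": "DET",    # dieser Tisch
    "PDS": "PRON",    # dieser ist gut
    "PIAT": "DET",    # kein Tisch
    "PIS": "PRON",    # keiner kommt
    "PPER": "PRON",   # er
    "PPOSAT": "DET",  # sein Tisch
    "PPOSS": "PRON",  # das ist seiner
    "PRELAT": "DET",  # der Mann, dessen Hund ...
    "PRELS": "PRON",  # der Tisch, der dort steht
    "PRF": "PRON",    # sich
    "PWAT": "DET",    # welcher Tisch?
    "PWS": "PRON",    # wer?
}


def split_by_ud(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Group the STTS tags (alphabetically) by the UD-17 tag they map to."""
    groups: dict[str, list[str]] = defaultdict(list)
    for stts_tag, ud_tag in sorted(mapping.items()):
        groups[ud_tag].append(stts_tag)
    return dict(groups)


# All of these tags fall under the STTS/HW classes P and ART,
# yet they are distributed over two UD-17 classes.
print(split_by_ud(STTS_TO_UD))

# The same word form lands in DET or PRON depending on its use.
examples = [("dieser", "PDAT"), ("dieser", "PDS")]
print([(word, STTS_TO_UD[tag]) for word, tag in examples])
# → [('dieser', 'DET'), ('dieser', 'PRON')]
```

Under this assumed mapping, a German speaker who thinks of dieser as one pronoun has to query two different UD-17 tags to find all of its occurrences, whereas STTS/HW keeps both uses under P.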