
Version 2 (modified by Jörg Knappen, 9 years ago)


Part-of-speech tag sets for FCS

In search of a simple part-of-speech tag set for CLARIN Federated Content Search (FCS)

The POS tag set is intended for the human users of the FCS aggregator.

Sources

EAGLES: EAGLES Recommendations for the morphosyntactic annotation of corpora, Obligatory attributes/values, Major Categories, §4.2.1 on page 7

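STTS/HW: Guidelines für das Tagging deutscher Textcorpora mit STTS (Kleines und großes Tagset) [Guidelines for tagging German text corpora with STTS (small and large tag set)], main parts of speech (Hauptwortarten), Table 2.1 on page 4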

UD-17: Universal Dependencies

UP-12: Universal POS tag set, second paragraph in the right-hand column of page 2

Summary table

The summary table shows the tags drawn from the tag sets cited above (in their original spelling).

A dash (-) marks a tag set that has no corresponding tag.

EAGLES  STTS/HW  UP-12  UD-17  Notes
N       N        NOUN   NOUN   noun
-       -        -      PROPN  proper noun/named entity
V       V        VERB   VERB   verb
-       -        -      AUX    auxiliary verb (includes modal auxiliaries like should or must)
AJ      ADJ      ADJ    ADJ    adjective
PD      P        -      -      pronoun/determiner
AT      ART      -      -      article
-       -        DET    DET    determiner
-       -        PRON   PRON   pronoun
AV      ADV      ADV    ADV    adverb
AP      AP       ADP    ADP    adposition (circum-, pre-, postposition)
C       KO       CONJ   CONJ   conjunction
-       -        -      SCONJ  subordinating conjunction
NU      CARD     NUM    NUM    numeral; cardinal numeral (ordinals are tagged as adjectives or adverbs)
I       ITJ      -      INTJ   interjection
U       PTK      PRT    PART   unique; particle
R       -        X      X      residual; other
-       -        -      SYM    symbol ($, %, §, ©, +, −, 😝, http://example.org, a@example.org)
PU      -        .      PUNCT  punctuation
13      11       12     17     total number of tags
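For use in the FCS aggregator, the summary table could be kept as data rather than prose. The following is a minimal Python sketch under that assumption: one tuple per table row, with None for an empty cell. The names TAG_SETS, ROWS and tags_of are hypothetical. The sketch recomputes the totals row as a consistency check.

```python
from typing import Optional

# Column order of the summary table.
TAG_SETS = ("EAGLES", "STTS/HW", "UP-12", "UD-17")

# One tuple per row: (EAGLES, STTS/HW, UP-12, UD-17, note).
# None marks a tag set without a corresponding tag (hypothetical encoding).
Row = tuple[Optional[str], Optional[str], Optional[str], Optional[str], str]
ROWS: list[Row] = [
    ("N", "N", "NOUN", "NOUN", "noun"),
    (None, None, None, "PROPN", "proper noun/named entity"),
    ("V", "V", "VERB", "VERB", "verb"),
    (None, None, None, "AUX", "auxiliary verb"),
    ("AJ", "ADJ", "ADJ", "ADJ", "adjective"),
    ("PD", "P", None, None, "pronoun/determiner"),
    ("AT", "ART", None, None, "article"),
    (None, None, "DET", "DET", "determiner"),
    (None, None, "PRON", "PRON", "pronoun"),
    ("AV", "ADV", "ADV", "ADV", "adverb"),
    ("AP", "AP", "ADP", "ADP", "adposition"),
    ("C", "KO", "CONJ", "CONJ", "conjunction"),
    (None, None, None, "SCONJ", "subordinating conjunction"),
    ("NU", "CARD", "NUM", "NUM", "numeral"),
    ("I", "ITJ", None, "INTJ", "interjection"),
    ("U", "PTK", "PRT", "PART", "unique; particle"),
    ("R", None, "X", "X", "residual; other"),
    (None, None, None, "SYM", "symbol"),
    ("PU", None, ".", "PUNCT", "punctuation"),
]


def tags_of(tag_set: str) -> list[str]:
    """Return the tags of one tag set in table order, skipping empty cells."""
    col = TAG_SETS.index(tag_set)
    return [row[col] for row in ROWS if row[col] is not None]


if __name__ == "__main__":
    for name in TAG_SETS:
        print(name, len(tags_of(name)))
```

When run as a script, it prints one count per tag set. The counts should match the totals row (13, 11, 12, 17). Keeping the empty cells as None, rather than dropping them, keeps every row aligned across all four tag sets.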

Notes:

STTS/HW lacks generic tags for punctuation and for other material; these could be supplemented as $ (punctuation) and X (other).
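One consequence of this supplement can be checked mechanically. With $ and X added, STTS/HW has 13 tags and lines up one-to-one with the EAGLES column of the summary table. The following is a minimal Python sketch, assuming tags are plain strings. The names STTS_HW, EAGLES_TO_STTS_HW and is_bijection are hypothetical.

```python
# STTS/HW main categories (Table 2.1) plus the two supplements
# suggested in the note: "$" for punctuation and "X" for other.
STTS_HW = ["N", "V", "ADJ", "P", "ART", "ADV", "AP", "KO", "CARD", "ITJ", "PTK"]
STTS_HW_SUPPLEMENTED = STTS_HW + ["$", "X"]

# EAGLES major categories -> STTS/HW, read row by row off the summary table.
EAGLES_TO_STTS_HW = {
    "N": "N", "V": "V", "AJ": "ADJ", "PD": "P", "AT": "ART",
    "AV": "ADV", "AP": "AP", "C": "KO", "NU": "CARD", "I": "ITJ",
    "U": "PTK",
    "R": "X",   # supplemented tag for residual/other
    "PU": "$",  # supplemented tag for punctuation
}


def is_bijection(mapping: dict[str, str], target: list[str]) -> bool:
    """True if the mapping's values are distinct and cover exactly the target tags."""
    values = list(mapping.values())
    return len(values) == len(set(values)) and set(values) == set(target)


if __name__ == "__main__":
    print(len(STTS_HW_SUPPLEMENTED))
    print(is_bijection(EAGLES_TO_STTS_HW, STTS_HW_SUPPLEMENTED))
```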

UP-12 lacks a tag for interjections; interjections are mapped to the class X (cf. https://code.google.com/p/universal-pos-tags/source/browse/trunk/nl-alpino.map).
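The following hypothetical Python sketch collapses UD-17 tags onto UP-12. Only INTJ to X comes from the note above. The other merges (PROPN to NOUN, AUX to VERB, SCONJ to CONJ, SYM to X) are assumptions: each one folds a UD-17-only tag into the nearest UP-12 class. The sketch does not reproduce the linked mapping file. The names UD17_TO_UP12 and to_up12 are illustrative.

```python
# Hypothetical UD-17 -> UP-12 mapping.
# INTJ -> X follows the note; PROPN, AUX, SCONJ and SYM merges are assumptions.
UD17_TO_UP12 = {
    "NOUN": "NOUN", "PROPN": "NOUN",
    "VERB": "VERB", "AUX": "VERB",
    "ADJ": "ADJ", "DET": "DET", "PRON": "PRON", "ADV": "ADV", "ADP": "ADP",
    "CONJ": "CONJ", "SCONJ": "CONJ",
    "NUM": "NUM", "PART": "PRT",
    "INTJ": "X",  # UP-12 has no interjection tag
    "SYM": "X",   # assumption: symbols treated as "other"
    "X": "X",
    "PUNCT": ".",
}


def to_up12(ud_tags: list[str]) -> list[str]:
    """Coarsen a sequence of UD-17 tags to UP-12; unknown tags raise KeyError."""
    return [UD17_TO_UP12[tag] for tag in ud_tags]


if __name__ == "__main__":
    # "Oh , she must go !" with UD-17 tags ("must" as a modal auxiliary).
    print(to_up12(["INTJ", "PUNCT", "PRON", "AUX", "VERB", "PUNCT"]))
```

The mapping is many-to-one: its 17 keys collapse onto the 12 UP-12 tags. Information such as the interjection/other distinction is therefore lost in this direction.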

UP-12 and UD-17 lack a tag for articles; articles are absorbed into the determiner class.

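UP-12 and UD-17 have a determiner class that is largely split off from the pronoun class. This separation is completely unnatural to German speakers. STTS, for instance, keeps substituting and attributive demonstratives (PDS and PDAT) together among the pronouns. Compare the translation from STTS to UD-17.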
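The following hypothetical Python sketch makes the comparison concrete. It models an STTS/HW to UD-17 translation that lists every candidate UD-17 tag for each STTS/HW class. The P and ART entries follow the summary table and the notes. The splits of N, V and KO (into PROPN, AUX and SCONJ) are assumptions, based on how UD-17 subdivides those classes. A class with more than one candidate cannot be translated from the tag alone. The names STTS_HW_TO_UD17, needs_context and sources_of are illustrative.

```python
# Hypothetical STTS/HW -> UD-17 candidates.
# P and ART follow the table/notes; the N, V and KO splits are assumptions.
STTS_HW_TO_UD17: dict[str, set[str]] = {
    "N": {"NOUN", "PROPN"},
    "V": {"VERB", "AUX"},
    "ADJ": {"ADJ"},
    "P": {"PRON", "DET"},  # the pronoun class is split in two
    "ART": {"DET"},        # articles are absorbed into DET
    "ADV": {"ADV"},
    "AP": {"ADP"},
    "KO": {"CONJ", "SCONJ"},
    "CARD": {"NUM"},
    "ITJ": {"INTJ"},
    "PTK": {"PART"},
}


def needs_context(mapping: dict[str, set[str]]) -> list[str]:
    """STTS/HW tags whose UD-17 tag cannot be chosen from the tag alone."""
    return sorted(tag for tag, targets in mapping.items() if len(targets) > 1)


def sources_of(ud_tag: str, mapping: dict[str, set[str]]) -> list[str]:
    """STTS/HW tags that may translate to the given UD-17 tag."""
    return sorted(tag for tag, targets in mapping.items() if ud_tag in targets)


if __name__ == "__main__":
    print(needs_context(STTS_HW_TO_UD17))
    print(sources_of("DET", STTS_HW_TO_UD17))
```

In this sketch, DET draws on two STTS/HW classes: all of ART and part of P. This is the split the note objects to, and it means a P token needs context before it can be given a UD-17 tag.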

The use of DET is inconsistent in UD-17; for example, Latin seems to have no determiners at all (cf. tag set la::conll).