Changes between Initial Version and Version 1 of Taskforces/FCS/FCS POS tag set


Timestamp:
02/23/15 15:21:29 (9 years ago)
Author:
Jörg Knappen

  • Taskforces/FCS/FCS POS tag set

== Part of Speech tag sets for FCS ==
In search of a simple part-of-speech tag set for CLARIN Federated Content Search (FCS).

The POS tag set shall be used by the human users of the FCS aggregator.

=== Sources ===
EAGLES: [[http://www.ilc.cnr.it/EAGLES96/pub/eagles/corpora/annotate.ps.gz|EAGLES Recommendations for the morphosyntactic annotation of corpora]], ''Obligatory attributes/values, Major Categories'', §4.2.1 on page 7

STTS/HW: [[http://www.sfs.uni-tuebingen.de/resources/stts-1999.pdf|Guidelines für das Tagging deutscher Textcorpora mit STTS (Kleines und großes Tagset)]], ''Hauptwortarten'' (main parts of speech), Table 2.1 on page 4

UD-17: [[http://universaldependencies.github.io/docs/u/pos/all.html|Universal Dependencies]]

UP-12: [[http://www.petrovi.de/data/universal.pdf|Universal POS]], 2nd paragraph in the right-hand column of page 2

=== Summary table ===

The summary table shows the tags drawn from the tag sets quoted above (in their original spelling).

||= EAGLES =||= STTS/HW =||= UP-12 =||= UD-17 =||= Notes =||
||  N       ||  N        ||  NOUN   ||  NOUN   || noun    ||
||          ||           ||         ||  PROPN  || proper noun/named entity ||
||  V       ||  V        ||  VERB   ||  VERB   || verb    ||
||          ||           ||         ||  AUX    || auxiliary verb (includes modal auxiliaries like ''should'' or ''must'') ||
||  AJ      ||  ADJ      ||  ADJ    ||  ADJ    || adjective ||
||  PD      ||  P        ||         ||         || pronoun/determiner ||
||  AT      ||  ART      ||         ||         || article ||
||          ||           ||  DET    ||  DET    || determiner ||
||          ||           ||  PRON   ||  PRON   || pronoun ||
||  AV      ||  ADV      ||  ADV    ||  ADV    || adverb ||
||  AP      ||  AP       ||  ADP    ||  ADP    || adposition (circum-, pre-, postposition) ||
||  C       ||  KO       ||  CONJ   ||  CONJ   || conjunction ||
||          ||           ||         ||  SCONJ  || subordinating conjunction ||
||  NU      ||  CARD     ||  NUM    ||  NUM    || numeral; cardinal numeral (ordinals are tagged as adjectives or adverbs) ||
||  I       ||  ITJ      ||         ||  INTJ   || interjection ||
||  U       ||  PTK      ||  PRT    ||  PART   || unique; particle ||
||  R       ||           ||  X      ||  X      || residual; other ||
||          ||           ||         ||  SYM    || symbol ($, %, §, ©, +, −, 😝, !http://example.org, !a@example.org) ||
||  PU      ||           ||  .      ||  PUNCT  || punctuation ||
||  13      ||  11       ||  12     ||  17     || total number of tags ||

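As a rough illustration only, the summary table can be held as plain data, which makes it easy to recount the totals in the last row and to look up which tag sits on the same row in another tag set. The names `TAG_SETS`, `ROWS`, `tags_of` and `same_row` are hypothetical and not part of FCS or of any of the quoted tag sets; an empty string stands for an empty table cell, and "same row" is only the table's alignment, not an authoritative conversion.

```python
# Columns of the summary table, in table order (hypothetical names).
TAG_SETS = ("EAGLES", "STTS/HW", "UP-12", "UD-17")

# One tuple per table row; "" marks an empty cell. Tags keep their original spelling.
ROWS = [
    ("N",  "N",    "NOUN", "NOUN"),
    ("",   "",     "",     "PROPN"),
    ("V",  "V",    "VERB", "VERB"),
    ("",   "",     "",     "AUX"),
    ("AJ", "ADJ",  "ADJ",  "ADJ"),
    ("PD", "P",    "",     ""),
    ("AT", "ART",  "",     ""),
    ("",   "",     "DET",  "DET"),
    ("",   "",     "PRON", "PRON"),
    ("AV", "ADV",  "ADV",  "ADV"),
    ("AP", "AP",   "ADP",  "ADP"),
    ("C",  "KO",   "CONJ", "CONJ"),
    ("",   "",     "",     "SCONJ"),
    ("NU", "CARD", "NUM",  "NUM"),
    ("I",  "ITJ",  "",     "INTJ"),
    ("U",  "PTK",  "PRT",  "PART"),
    ("R",  "",     "X",    "X"),
    ("",   "",     "",     "SYM"),
    ("PU", "",     ".",    "PUNCT"),
]


def tags_of(tag_set):
    """Return all tags of one tag set, skipping empty cells."""
    col = TAG_SETS.index(tag_set)
    return [row[col] for row in ROWS if row[col]]


def same_row(tag, source, target):
    """Return the target tag on the same table row, or None if there is none."""
    s, t = TAG_SETS.index(source), TAG_SETS.index(target)
    for row in ROWS:
        if row[s] == tag:
            return row[t] or None
    return None


# Recount the "total number of tags" row.
totals = {name: len(tags_of(name)) for name in TAG_SETS}
print(totals)

# A tag with a direct counterpart, and one without.
print(same_row("KO", "STTS/HW", "UD-17"))
print(same_row("PD", "EAGLES", "UD-17"))
```

The lookup for `PD` yields no counterpart because the table places the combined pronoun/determiner class on a different row from the separate `DET` and `PRON` tags; a real conversion would need a one-to-many mapping there.
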
'''Notes:'''

STTS/HW lacks generic tags for punctuation and other; they could be supplemented as $ (for punctuation) and X (for other).

UP-12 lacks a tag for interjection (an oversight?).

UP-12 and UD-17 lack a tag for article; it is absorbed into the determiner class.

UP-12 and UD-17 have a determiner class that is mostly separated out of the pronoun class; this separation is completely unnatural to German speakers. Compare the [[http://universaldependencies.github.io/docs/tagset-conversion/de-stts-uposf.html|translation from STTS to UD-17]].

The use of DET is inconsistent in UD-17; e.g., Latin seems to have no determiners at all (cf. [[http://universaldependencies.github.io/docs/tagset-conversion/la-conll-uposf.html|Tagset la::conll]]).