Version 4 (modified by Oliver Schonefeld, 9 years ago) (diff)
--

CLARIN Federated Content Search Query Language

A working draft of the CQP-flavored query language for CLARIN Federated Content Search (FCS).

EBNF

 [1] query                ::= main-query within-part?
  	 
 [2] main-query           ::= simple-query
                            | "(" main-query ")"            /* grouping */
                            | main-query "|" main-query     /* or */
                            | main-query main-query         /* sequence */
                            | main-query quantifier         /* quantification */

 [3] simple-query         ::= implicit-query
                            | segment-query 	 
 
 [4] implicit-query       ::= flagged-regexp 	 
 
 [5] segment-query        ::= "[" expression? "]" 	 

 [6] within-part          ::= simple-within-part

 [7] simple-within-part   ::= "within" simple-within-scope

 [8] simple-within-scope  ::= "sentence"
                            | "s"
                            | "utterance"
                            | "u"
                            | "paragraph"
                            | "p"
                            | "turn"
                            | "t"
                            | "text"
                            | "session"  	 

[11] expression           ::= basic-expression
                            | expression "|" expression     /* or */
                            | expression "&" expression     /* and */
                            | "(" expression ")"            /* grouping */
                            | "!" expression                /* not */ 	 

[12] basic-expression     ::= attribute operator flagged-regexp 	 

[13] operator             ::= "="                           /* equals */
                            | "!="                          /* not equals */

[14] quantifier           ::= "+"                           /* one-or-more */
                            | "*"                           /* zero-or-more */
                            | "?"                           /* zero-or-one */
                            | "{" integer "}"               /* exactly n-times */
                            | "{" integer? "," integer "}"  /* at most n-times, or min-max */
                            | "{" integer "," integer? "}"  /* at least n-times, or min-max */

[15] flagged-regexp       ::= regexp
                            | regexp "/" regexp-flag+ 	 

[16] regexp-flag          ::= "i"  /* case-insensitive; Poliqarp/Perl compat */
                            | "I"  /* case-sensitive; Poliqarp compat */
                            | "c"  /* case-insensitive; CQP compat */
                            | "C"  /* case-sensitive */
                            | "l"  /* literal matching; CQP compat */
                            | "d"  /* diacritic-agnostic matching; CQP compat */
       
[17] regexp               ::= quoted-string

[18] attribute            ::= simple-attribute
                            | qualified-attribute

[19] simple-attribute     ::= identifier

[20] qualified-attribute  ::= identifier ":" identifier  

[21] identifier           ::= identifier-char identifier-char*

[22] identifier-char      ::= [a-zA-Z0-9\-]

[24] integer              ::= [0-9]+ 

[26] quoted-string        ::= "'" (char | ws)* "'"  /* single-quotes */
                            | '"' (char | ws)* '"'  /* double-quotes */

[27] char                 ::= <any unicode codepoint excluding whitespace codepoints>
                            | "\" escaped-char

[28] ws                   ::= <any whitespace codepoint>

[29] escaped-char         ::= "\"                                  /* backslash (\) */
                            | "'"                                  /* single quote (') */
                            | '"'                                  /* double quote (") */
                            | "n"                                  /* generic newline, such as "\n" or "\r" */
                            | "t"                                  /* character tabulation (U+0009) */
                            | "x" hex hex                          /* Unicode codepoint with hex value hh */
                            | "u" hex hex hex hex                  /* Unicode codepoint with hex value hhhh */
                            | "U" hex hex hex hex hex hex hex hex  /* Unicode codepoint with hex value hhhhhhhh */ 

[30] hex                  ::= [0-9a-fA-F]
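
The escape forms in rules [29] and [30] can be illustrated with a small decoder. The following Python sketch is not part of the draft: the names `unescape`, `_ESCAPE` and `_SIMPLE` are hypothetical, it assumes the surrounding quotes have already been stripped, and it assumes that the "generic newline" escape decodes to U+000A. The draft does not say whether decoding happens before the string is handed to a regex engine, so that step is left out.

```python
import re

# One alternative per escape form of rule [29]; the last group catches
# any other single character (or nothing, for a trailing backslash).
_ESCAPE = re.compile(
    r"\\(?:x([0-9a-fA-F]{2})"
    r"|u([0-9a-fA-F]{4})"
    r"|U([0-9a-fA-F]{8})"
    r"|(.?))",
    re.DOTALL,
)

# Single-character escapes. Mapping "n" to U+000A is an assumption;
# the draft only calls it a "generic newline".
_SIMPLE = {"\\": "\\", "'": "'", '"': '"', "n": "\n", "t": "\t"}


def unescape(body: str) -> str:
    """Decode rule [29] escapes in the body of a quoted-string."""

    def repl(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2) or match.group(3)
        if hex_digits is not None:
            # chr() raises ValueError for values above U+10FFFF.
            return chr(int(hex_digits, 16))
        char = match.group(4)
        if char not in _SIMPLE:
            raise ValueError(f"unknown escape sequence: \\{char}")
        return _SIMPLE[char]

    return _ESCAPE.sub(repl, body)
```

For example, `unescape(r"caf\xe9")` returns `café`, while an unknown escape such as `\q`, a truncated hex escape or a trailing backslash raises `ValueError`.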

Notes

- Based on Poliqarp, with inspiration from others.
- "attribute": the annotation layer to be used, e.g. "word", "lemma", "pos", or the qualified form "pos:stts". The supported values for this construct are beyond the scope of the grammar and need to be defined in supplementary documents.
- "simple-within-scope": possible values for the scope:
  - "sentence" | "s" | "utterance" | "u": denote a matching scope of something like a sentence or utterance. This provides compatibility with FCS 1.0 ("Generic Hits": "Each hit SHOULD be presented within the context of a complete sentence.").
  - "paragraph" | "p" | "turn" | "t": denote the next larger unit, e.g. something like a paragraph.
  - "text" | "session": denote something like a whole document.
- [27] and [28]: the "any $SOMETHING codepoint" classes are a pain to implement in at least ANTLR and JavaCC, especially in combination with [29].
- Regular expressions are not defined or validated by this grammar.
- The non-contiguous rule numbers are intentional for now; some rules have already been removed. Rules will be renumbered once the grammar is final.
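
The grouping of the scope keywords of rule [8] into three levels, as described in the notes, could be captured in a lookup table. This is only a sketch: the level labels ("sentence", "paragraph", "document") and the names `SCOPE_LEVELS` and `scope_level` are chosen here for illustration and are not defined by the draft.

```python
# Keywords from rule [8], mapped to the three levels described in the notes.
# The level labels on the right-hand side are hypothetical.
SCOPE_LEVELS = {
    "sentence": "sentence",
    "s": "sentence",
    "utterance": "sentence",
    "u": "sentence",
    "paragraph": "paragraph",
    "p": "paragraph",
    "turn": "paragraph",
    "t": "paragraph",
    "text": "document",
    "session": "document",
}


def scope_level(keyword: str) -> str:
    """Return the level for a 'within' scope keyword, or raise ValueError."""
    try:
        return SCOPE_LEVELS[keyword]
    except KeyError:
        raise ValueError(f"not a simple-within-scope keyword: {keyword!r}") from None
```

With this table, `scope_level("u")` and `scope_level("sentence")` both yield `"sentence"`, so an endpoint would only need to handle three cases instead of ten keywords.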

Attachments (1)

FCS_QL_2.g4 (3.7 KB) - added by peter.beinema@mpi.nl 9 years ago. ANTLR (version 4.5) grammar for FCS-QL queries.
