Version 4 (modified 9 years ago)
CLARIN Federated Content Search Query Language
A working draft of the CQP flavor of the query language for CLARIN Federated Content Search (FCS).
EBNF
[1]  query ::= main-query within-part?
[2]  main-query ::= simple-query
                  | "(" main-query ")"          /* grouping */
                  | main-query "|" main-query   /* or */
                  | main-query main-query       /* sequence */
                  | main-query quantifier       /* quantification */
[3]  simple-query ::= implicit-query
                    | segment-query
[4]  implicit-query ::= flagged-regexp
[5]  segment-query ::= "[" expression? "]"
[6]  within-part ::= simple-within-part
[7]  simple-within-part ::= "within" simple-within-scope
[8]  simple-within-scope ::= "sentence" | "s" | "utterance" | "u"
                           | "paragraph" | "p" | "turn" | "t"
                           | "text" | "session"
[11] expression ::= basic-expression
                  | expression "|" expression   /* or */
                  | expression "&" expression   /* and */
                  | "(" expression ")"          /* grouping */
                  | "!" expression              /* not */
[12] basic-expression ::= attribute operator flagged-regexp
[13] operator ::= "="    /* equals */
                | "!="   /* non-equals */
[14] quantifier ::= "+"                            /* one-or-more */
                  | "*"                            /* zero-or-more */
                  | "?"                            /* zero-or-one */
                  | "{" integer "}"                /* exactly n-times */
                  | "{" integer? "," integer "}"   /* at most */
                  | "{" integer "," integer? "}"   /* min-max */
[15] flagged-regexp ::= regexp
                      | regexp "/" regexp-flag+
[16] regexp-flag ::= "i"   /* case-insensitive; Poliqarp/Perl compat */
                   | "I"   /* case-sensitive; Poliqarp compat */
                   | "c"   /* case-insensitive, CQP compat */
                   | "C"   /* case-sensitive */
                   | "l"   /* literal matching, CQP compat */
                   | "d"   /* diacritic agnostic matching, CQP compat */
[17] regexp ::= quoted-string
[18] attribute ::= simple-attribute
                 | qualified-attribute
[19] simple-attribute ::= identifier
[20] qualified-attribute ::= identifier ":" identifier
[21] identifier ::= identifier-char identifier-char*
[22] identifier-char ::= [a-zA-Z0-9\-]
[24] integer ::= [0-9]+
[26] quoted-string ::= "'" (char | ws)* "'"   /* single-quotes */
                     | """ (char | ws)* """   /* double-quotes */
[27] char ::= <any unicode codepoint excluding whitespace codepoints>
            | "\" escaped-char
[28] ws ::= <any whitespace codepoint>
[29] escaped-char ::= "\"   /* backslash (\) */
                    | "'"   /* single quote (') */
                    | """   /* double quote (") */
                    | "n"   /* generic newline, i.e. "\n", "\r", etc. */
                    | "t"   /* character tabulation (U+0009) */
                    | "x" hex hex                           /* Unicode codepoint with hex value hh */
                    | "u" hex hex hex hex                   /* Unicode codepoint with hex value hhhh */
                    | "U" hex hex hex hex hex hex hex hex   /* Unicode codepoint with hex value hhhhhhhh */
[30] hex ::= [0-9a-fA-F]
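To make the lexical rules more concrete, here is a minimal Python sketch of how rules [14] and [26] to [30] might be interpreted. It is not part of the draft. The function names `decode_quoted_string` and `quantifier_bounds` are hypothetical. Decoding "n" as U+000A, rejecting every escape not listed in [29], and treating a missing lower bound as 0 are all assumptions.

```python
import re

# Escape letters from rule [29] that map to a single character.
# Assumption: "n" ("generic newline") is decoded as U+000A.
_SIMPLE_ESCAPES = {"\\": "\\", "'": "'", '"': '"', "n": "\n", "t": "\t"}
# Number of hex digits expected after \x, \u and \U (rule [29]).
_HEX_LENGTHS = {"x": 2, "u": 4, "U": 8}
_HEX_DIGITS = set("0123456789abcdefABCDEF")  # rule [30]
# Braced forms of rule [14]: {n}, {,m}, {n,}, {n,m}.
_BRACED = re.compile(r"\{([0-9]*)(,?)([0-9]*)\}")


def decode_quoted_string(token):
    """Strip the quotes of a quoted-string token and resolve its escapes."""
    if len(token) < 2 or token[0] not in "'\"" or token[-1] != token[0]:
        raise ValueError("not a quoted-string: %r" % token)
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        kind = body[i + 1]
        if kind in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[kind])
            i += 2
        elif kind in _HEX_LENGTHS:
            n = _HEX_LENGTHS[kind]
            digits = body[i + 2:i + 2 + n]
            if len(digits) != n or not set(digits) <= _HEX_DIGITS:
                raise ValueError("bad hex escape at offset %d" % i)
            out.append(chr(int(digits, 16)))
            i += 2 + n
        else:
            # Strict reading of [29]: regex escapes such as \d are rejected.
            raise ValueError("unknown escape \\%s" % kind)
    return "".join(out)


def quantifier_bounds(token):
    """Return (min, max) for a quantifier token; max is None if unbounded."""
    if token == "+":
        return (1, None)
    if token == "*":
        return (0, None)
    if token == "?":
        return (0, 1)
    match = _BRACED.fullmatch(token)
    if match is None:
        raise ValueError("not a quantifier: %r" % token)
    low, comma, high = match.groups()
    if not comma:
        if not low:
            raise ValueError("empty quantifier: %r" % token)
        return (int(low), int(low))  # {n}: exactly n times
    if not low and not high:
        raise ValueError("quantifier needs at least one bound: %r" % token)
    # Assumption: a missing lower bound means 0, a missing upper bound is open.
    return (int(low) if low else 0, int(high) if high else None)
```

Under this strict reading, a regular expression such as "\d+" is rejected, because "d" is not listed in [29]. A real implementation may need to pass unknown escapes through to the regex engine instead. Note also that `{n,m}` is matched by both braced alternatives of rule [14] that contain a comma.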
Notes
- based on Poliqarp with inspiration from others
- "attribute": the annotation layer to be used, e.g. "word", "lemma", "pos" or the qualified form "pos:stts". The supported values for this construct are outside the scope of the grammar and need to be defined in supplementary documents.
- "simple-within-scope": possible values for scope
  - "sentence", "s", "utterance", "u": denote a matching scope of something like a sentence or utterance. This provides compatibility with FCS 1.0 ("Generic Hits", "Each hit SHOULD be presented within the context of a complete sentence.")
  - "paragraph", "p", "turn", "t": denote the next larger unit, e.g. something like a paragraph
  - "article", "session": something like a whole document
- [27] and [28]: the "any $SOMETHING codepoint" definitions are hard to implement in at least ANTLR and JavaCC, especially in combination with [29].
- Regular expressions are not defined or guarded by this grammar.
- Gaps in the rule numbering are intentional for now; some rules have already been removed. Rules will be renumbered once the grammar is fixed.
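The notes on "attribute" and "simple-within-scope" can be illustrated with a small Python sketch. It is not part of the draft. The names `parse_attribute` and `scope_level` and the level labels "sentence", "paragraph" and "document" are hypothetical. The keyword table follows rule [8], which lists "text" where the note above lists "article".

```python
import re

# identifier per rules [21] and [22]: one or more of [a-zA-Z0-9\-].
_IDENTIFIER = re.compile(r"[a-zA-Z0-9\-]+")

# Keywords of rule [8], grouped as described in the notes.
# The level labels on the right are illustrative, not defined by the draft.
_SCOPE_LEVELS = {
    "sentence": "sentence",
    "s": "sentence",
    "utterance": "sentence",
    "u": "sentence",
    "paragraph": "paragraph",
    "p": "paragraph",
    "turn": "paragraph",
    "t": "paragraph",
    "text": "document",
    "session": "document",
}


def parse_attribute(text):
    """Split an attribute into its identifiers (rules [18] to [20]).

    A simple attribute yields a 1-tuple, a qualified one a 2-tuple.
    Which identifiers are actually supported is left to supplementary
    documents, so no vocabulary check is done here.
    """
    parts = text.split(":")
    if len(parts) > 2 or not all(_IDENTIFIER.fullmatch(p) for p in parts):
        raise ValueError("not an attribute: %r" % text)
    return tuple(parts)


def scope_level(keyword):
    """Map a simple-within-scope keyword to a coarse level label."""
    try:
        return _SCOPE_LEVELS[keyword]
    except KeyError:
        raise ValueError("unknown scope: %r" % keyword) from None
```

For example, `parse_attribute("pos:stts")` yields the two identifiers `("pos", "stts")`, and `scope_level("u")` falls into the same group as `scope_level("sentence")`.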
Attachments (1)
- FCS_QL_2.g4 (3.7 KB), added 9 years ago: ANTLR (version 4.5) grammar for FCS-QL queries