Changes between Version 39 and Version 40 of FCS-spec
- Timestamp: 08/29/11 11:06:11
FCS-spec
v39 v40
 31  31
 32  32   Example (tentative):
 33     - {{{
     33 + {{{#!xml
 34  34   <indexInfo>
 35  35   <set identifier="isocat.org/datcat" name="isocat"/>
  …   …
 61  61   The common ground is the `<sru:searchRetrieveResponse>` defined by the protocol. This goes down to the `<sru:record>` wrapping element.
 62  62   The proposition is to continue with a generic structure being able to encompass "all" the various types of information.
 63     - ( But please also look atthe draft of the schema [[source:FederatedSearch/ccsResource.xsd]]):
 64     -
 65     - {{{
 66     - <ccs:Resource pid="{pid of the resource} “>
     63 + (Please also consider the draft of the schema [[source:FederatedSearch/ccsResource.xsd]]):
     64 +
     65 + {{{#!xml
     66 + <ccs:Resource pid="{pid of the resource}">
 67  67   <ccs:DataView type="metadata" pid="{PID of the CMD-record}" schema=""> /* pid is optional */
 68     - /* this is for metadata provided directly by the content provider
 69     - * CMD-format is prefered if available. */
 70     - <ccs:f key="{any metadata-key}">{any metadata value}</ccs:f> /* is this any useful? */
 71     - /* or rather direct metadata-fields like: */
     68 + /* this is for metadata provided directly by the content provider
     69 + * CMD-format is prefered if available, */
     70 + <CMD>...
     71 + </CMD>
     72 + /* but other recognized formats, like dublincore, should be accepted as well: */
 72  73   <dc:title></dc:title>
 73  74   <dc:author></dc:author>
 74     - ....
     75 +
     76 + /* in the worst case (if the metadata-record consists just of a key-value-list - a serialized form may be used: */
     77 + <ccs:f key="{any metadata-key}">{any metadata value}</ccs:f>
     78 +
 75  79   </ccs:DataView>
 76     - <ccs:ResourceFragment pid="{offset relative to the parent resource PID} “>
     80 + <ccs:ResourceFragment pid="{offset relative to the parent resource PID}">
 77  81   <ccs:DataView type="metadata" schema="">
 78     - /* metadata pertaining to the specific (matching) fragment, like metadata on the "current" speaker */
     82 + /* metadata pertaining to the specific (matching) fragment,
     83 + * like metadata on the "current" speaker, or matching chapter of a book */
 79  84   </ccs:DataView>
 80  85
 81     - <ccs:DataView type="kwic"><c type="left" >Some text with </c><kw>keyword</kw><c type="right" >highlighted</c></ccs:DataView>
     86 + <ccs:DataView type="kwic">
     87 + <c type="left" >Some text with </c>
     88 + <kw>keyword</kw>
     89 + <c type="right" >highlighted</c>
     90 + </ccs:DataView>
 82  91
 83  92   <ccs:DataView type="text/xml" schema="{meertens/schema}"><meertens:any/>
 84  93   </ccs:DataView>
 85  94
 86     - <ccs:DataView type="image/jpeg" ref="{ optional linkto the data}" >
     95 + <ccs:DataView type="image/jpeg" ref="{link to the data}" >
 87  96   </ccs:DataView>
 88  97   </ccs:ResourceFragment>
  …   …
 94 103   It may represent anything that has a PID (and a MDRecord).
 95 104   So in particular it may also be collections, aggregating other Resources.
 96     - Allowed children are: `Resource`, `ResourceFragment` , `Metadata`and `DataView`
    105 + Allowed children are: `Resource`, `ResourceFragment` and `DataView`
 97 106   !ResourceFragment::
 98 107   A part of a resource, without own PID, i.e. something addressable with: PID of the Resource + Fragment Identifier.
 99 108   Fragment Identifier to be used depends on the resource type, it may be: XPointer, timecode, sequence-offset, etc.
100     - Allowed children are: ` Metadata` and `DataView`
    109 + Allowed children are: `DataView`
101 110   !DataView::
102     - the element carrying the typed data
103     - Content can be anything that is in other namespace.
104     - The content has to be possible inline or referenced. Important for Images and AV-Files.
    111 + Element carrying the typed data.
    112 + Content can be anything that is in other namespace and it can be either inline or referenced via @pid/@ref-attributes.
    113 + @pid::
    114 + Attribute allowed on every element, identifying the !Resource/ResourceFragment/DataView.[[BR]]
    115 + PID should also be resolvable, so the `@ref`-attribute
    116 + @ref::
    117 + Attribute allowed on every element, refering to the !Resource/ResourceFragment/DataView via a URL.[[BR]]
    118 + '''''TODO:''' Decide if `@pid` and `@ref` shall both be used, and if yes, then if they can be used in parallel.''
    119 + @type::
    120 + Attribute on `DataView` identifying the type of the content. See more in next chapter [#DataViews]
    121 + @schema::
    122 + Attribute on `DataView` referencing the schema that describes the content of given `DataView`.[[BR]]
    123 + ''(Note: there seems to be some functional overlap between `@type`- and `@schema`-attribute.)''
105 124
106     - !DataView type='metadata' (obsoletes Metadata) ::
107     - optional (but strongly encouraged) element carrying metadata about the `Resource` or `ResourceFragment`.
108     - The metadata can be inline or referenced via attributes `@pid` or `@ref`.
109     - It can be basically any xml, it shall be described by a schema referenced in the @schema-parameter,
110     - but preference order is CMD-records, then other recognized standards (`dublincore`, `OLAC`)
111     -
112     - Although the original idea was to "serialize" all such metadata-fields in a `<f key="{field-name}">`-element, I now prefer reusing existing namespaces.
113     -
114     - `<dc:title>` seems preferable to `<f key="dc:title">`, right?
115     -
116     - (An alternative would be to flatten the structure, but for now, we go with the nested one. Example of a flat structure: )
117     -
118     - {{{
    125 + ''(An alternative would be to flatten the structure, but for now, we go with the nested one. Example of a flat structure: ''
    126 + {{{#!xml
119 127   <sru:recordData>
120 128   <ccs:ResourcePID>{PID of the resource}</ccs:ResourcePID>
  …   …
132 140   </sru:recordData>
133 141   }}}
134 142   )
135 143
136 144   === Data Views ===
137 145
138     - Here we propose several types of DataViews, the actual format has to be yet defined for most of them,
    146 + Here we propose several types of !DataViews, the actual format has to be yet defined for most of them,
139 147   but we should reuse existing formats where possible, so we should look at existing practices and data, but at the same time avoid overspecializing on some specific format. (The CLARIN deliverable [[http://www-sk.let.uu.nl/u/D5C-3.pdf | Interoperability and Standards ]] (2,7 MB) can be used as a starting point.) Discussion about the relationship between data types and corresponding "Viewers", i.e. means of displaying the information to the user under [[Viewable]].
140 148
141     - All dataviews of specific types have to be the same in all implementations.
142     - That is, if a service presents results as `KWIC`, that should be the same KWIC in all services.
143     -
    149 + The `@type`-attribute will not be sufficient to describe For further specification of the format of the (referenced) content - `DataView`-element shall carry a '''@schema'''-attribute.
    150 +
    151 + title::
    152 + a simple text, that can serve as representative for given `Resource` or `ResourceFragment`.
    153 + It could be title of a book or chapter, Name of the Resource, or Lemma in a Lexicon
144 154   kwic:: Keyword in context
145 155   both keyword and (left/right) context are wrapped into elements, to avoid mixed-content
146     - {{{
    156 + {{{#!xml
147 157   <ccs:DataView type="kwic">
148 158   <c type="left">Some text with </c><kw>keyword</kw> <c type="right">highlighted</c>
  …   …
150 160   }}}
151 161   Alternatively every token could be wrapped as a element:
152     - {{{
    162 + {{{#!xml
153 163   <ccs:DataView type="kwic">
154 164   <t id="t1">Some</t>
  …   …
161 171   This comes close to the way the text is encoded in '''TCF''' and would accordingly allow to add (stand-off) annotation layers (lemma, POS, but also syntactic annotations). This shall be a separate DataView-type.
162 172
163     - If there is some associated metadata (like bibliographic information about the source of the hit, this is to be encoded in a separate element `<ccs:DataView type="metadata">`.
    173 + content::
    174 + A basic !DataView-type, to be used when nothing more specific applies. Especially this is to be used, if only plain text can be delivered as result, not distinguishing the keyword and the context (as required in '''kwic'''-!DataView)
    175 + {{{#!xml
    176 + <ccs:DataView type="content">Some text with the matching keyword. However the keyword is NOT highlighted</ccs:DataView>
    177 + }}}
    178 + metadata::
    179 + If there is some associated metadata (like bibliographic information about the source of the hit, this is to be encoded in a separate element
    180 + {{{#!xml
    181 + <ccs:DataView type="metadata">
    182 + }}}
    183 + optional (but strongly encouraged) element carrying metadata about the `Resource` or `ResourceFragment`.
    184 +
    185 + The metadata can be inline or referenced via attributes `@pid` or `@ref`. This can be basically any kind of md-record, but order of preference is:
    186 + 1. CMD-record
    187 + 1. some recognized format like ''dublincore'', ''OLAC''
    188 + 1. a "serialized" format with flat list of `<f key="{field-name}">`-elements
164 189
165 190   Geographic data :: A geographic location, either as coordinates or some location (street, city, place).
  …   …
185 210   For an example EAF file see: [http://corpus1.mpi.nl/qfs1/media-archive/demo/pewi/Annotations/elan-example1.eaf sample file]
186 211   For now we have annotations/eaf as type and annotations/tcf as type.
187     - {{{
    212 + {{{#!xml
188 213   <ccs:DataView type="annotations/eaf" ref="http://corpus1.mpi.nl/qfs1/media-archive/demo/pewi/Annotations/elan-example1.eaf"
189 214   </ccs:DataView>
  …   …
192 217
193 218   === restricting the search by collections===
194     - Restricting the search space shall be done via `x-cmd- domain` (or `x-cmd-context`?)parameter (obsoleting: `x-cmd-collections`). See more under [[SearchContext]]
    219 + Restricting the search space shall be done via `x-cmd-context`-parameter (obsoleting: `x-cmd-collections`). See more under [[SearchContext]]
195 220
196 221   == Scan operation ==
  …   …
200 225
201 226   Example (tentative):
202     - {{{
203     -
    227 + {{{#!xml
204 228   <sru:scanResponse xmlns:sru="http://www.loc.gov/zing/srw/"
205 229   xmlns:diag="http://www.loc.gov/zing/srw/diagnostic/"