Changes between Version 39 and Version 40 of FCS-spec


Timestamp:
08/29/11 11:06:11 (13 years ago)
Author:
vronk
Comment:

added DataViews: title, metadata, content; updated the description of Elements and Attributes of the result format

  • FCS-spec

    v39 v40  
    3131
    3232Example (tentative):
    33 {{{
     33{{{#!xml
    3434 <indexInfo>
    3535  <set identifier="isocat.org/datcat" name="isocat"/>
     
    6161The common ground is the `<sru:searchRetrieveResponse>` defined by the protocol. This goes down to the `<sru:record>` wrapping element.
    6262The proposition is to continue with a generic structure being able to encompass "all" the various types of information.
    63 (But please also look at the draft of the schema [[source:FederatedSearch/ccsResource.xsd]]):
    64 
    65 {{{
    66  <ccs:Resource pid="{pid of the resource}
     63(Please also consider the draft of the schema [[source:FederatedSearch/ccsResource.xsd]]):
     64
     65{{{#!xml
     66 <ccs:Resource pid="{pid of the resource}">
    6767    <ccs:DataView type="metadata" pid="{PID of the CMD-record}" schema="">  /* pid is optional  */
    68         /* this is for metadata provided directly by the content provider
    69          * CMD-format is preferred if available.  */
    70       <ccs:f key="{any metadata-key}">{any metadata value}</ccs:f> /* is this any useful? */
    71         /* or rather direct metadata-fields like: */
     68     /* this is for metadata provided directly by the content provider
     69         * CMD-format is preferred if available, */
     70       <CMD>...
     71       </CMD>
     72     /* but other recognized formats, like dublincore, should be accepted as well: */
    7273       <dc:title></dc:title>
    7374       <dc:author></dc:author>
    74        ....
     75   
     76     /* in the worst case (if the metadata-record consists just of a key-value-list - a serialized form may be used: */
     77        <ccs:f key="{any metadata-key}">{any metadata value}</ccs:f>       
     78
    7579    </ccs:DataView>
    76     <ccs:ResourceFragment pid="{offset relative to the parent resource PID}>           
     80    <ccs:ResourceFragment pid="{offset relative to the parent resource PID}">           
    7781       <ccs:DataView type="metadata" schema="">   
    78             /* metadata pertaining to the specific (matching) fragment, like metadata on the "current" speaker */
     82            /* metadata pertaining to the specific (matching) fragment,
     83             * like metadata on the "current" speaker, or matching chapter of a book */
    7984       </ccs:DataView>
    8085
    81       <ccs:DataView type="kwic"><c type="left" >Some text with </c><kw>keyword</kw><c type="right" >highlighted</c></ccs:DataView>             
     86      <ccs:DataView type="kwic">
     87         <c type="left" >Some text with </c>
     88         <kw>keyword</kw>
     89         <c type="right" >highlighted</c>
     90      </ccs:DataView>           
    8291   
    8392      <ccs:DataView type="text/xml" schema="{meertens/schema}"><meertens:any/>     
    8493      </ccs:DataView>           
    8594
    86       <ccs:DataView type="image/jpeg" ref="{optional link to the data}" >
     95      <ccs:DataView type="image/jpeg" ref="{link to the data}" >
    8796      </ccs:DataView> 
    8897    </ccs:ResourceFragment>
     
    94103  It may represent anything that has a PID (and a MDRecord).
    95104  So in particular it may also be collections, aggregating other Resources.
    96   Allowed children are: `Resource`, `ResourceFragment`, `Metadata` and `DataView`
     105  Allowed children are: `Resource`, `ResourceFragment` and `DataView`
    97106 !ResourceFragment::
    98107  A part of a resource, without own PID, i.e. something addressable with: PID of the Resource +  Fragment Identifier.
    99108  Fragment Identifier to be used depends on the resource type, it may be: XPointer, timecode, sequence-offset, etc.
    100   Allowed children are:  `Metadata` and `DataView`
     109  Allowed children are:  `DataView`
    101110 !DataView::
    102   the element carrying the typed data
    103   Content can be anything that is in other namespace.
    104   The content has to be possible inline or referenced. Important for Images and AV-Files.
     111  Element carrying the typed data.
     112  Content can be anything that is in another namespace; it can be either inline or referenced via the @pid/@ref-attributes.
     113 @pid::
     114  Attribute allowed on every element, identifying the !Resource/ResourceFragment/DataView.[[BR]]
     115  PID should also be resolvable, so the `@ref`-attribute
     116 @ref::
     117  Attribute allowed on every element, referring to the !Resource/ResourceFragment/DataView via a URL.[[BR]]
     118  '''''TODO:''' Decide if `@pid` and `@ref` shall both be used, and if yes, then if they can be used in parallel.''
     119 @type::
     120  Attribute on `DataView` identifying the type of the content. See more in the next chapter [#DataViews]
     121 @schema::
     122  Attribute on `DataView` referencing the schema that describes the content of given `DataView`.[[BR]]
     123  ''(Note: there seems to be some functional overlap between `@type`- and `@schema`-attribute.)''
    105124 
    106   !DataView type='metadata' (obsoletes Metadata) ::
    107     optional (but strongly encouraged) element carrying metadata about the `Resource` or `ResourceFragment`.
    108     The metadata can be inline or referenced via attributes `@pid` or `@ref`.
    109     It can be basically any xml, it shall be described by a schema referenced in the @schema-parameter,
    110     but preference order is CMD-records, then other recognized standards (`dublincore`, `OLAC`)
    111 
    112   Although the original idea was to "serialize" all such metadata-fields in a `<f key="{field-name}">`-element, I now prefer reusing existing namespaces.
    113 
    114   `<dc:title>` seems preferable to `<f key="dc:title">`, right?
    115 
    116 (An alternative would be to flatten  the structure, but for now, we go with the nested one. Example of a flat structure: )
    117 
    118 {{{
     125''(An alternative would be to flatten  the structure, but for now, we go with the nested one. Example of a flat structure: ''
     126{{{#!xml
    119127<sru:recordData>
    120128   <ccs:ResourcePID>{PID of the resource}</ccs:ResourcePID>
     
    132140</sru:recordData>
    133141}}}
    134 
     142)
    135143
    136144=== Data Views ===
    137145
    138 Here we propose several types of DataViews, the actual format has to be yet defined for most of them,
     146Here we propose several types of !DataViews; the actual format has yet to be defined for most of them,
    139147but we should reuse existing formats where possible, so we should look at existing practices and data, but at the same time avoid overspecializing on some specific format. (The CLARIN deliverable [[http://www-sk.let.uu.nl/u/D5C-3.pdf | Interoperability and Standards ]] (2,7 MB)  can be used as a starting point.) The relationship between data types and corresponding "Viewers", i.e. means of displaying the information to the user, is discussed under [[Viewable]].
    140148
    141 All dataviews of specific types have to be the same in all implementations.
    142 That is, if a service presents results as `KWIC`, that should be the same KWIC in all services.
    143 
     149The `@type`-attribute alone will not be sufficient to describe the content. For further specification of the format of the (referenced) content, the `DataView`-element shall carry a '''@schema'''-attribute.
     150
     151 title::
     152  A simple text that can serve as a representative for the given `Resource` or `ResourceFragment`.
     153  It could be the title of a book or chapter, the name of the Resource, or a lemma in a lexicon.
    144154 kwic:: Keyword in context
    145155    both keyword and (left/right) context are wrapped into elements, to avoid mixed-content
    146  {{{
     156 {{{#!xml
    147157 <ccs:DataView type="kwic">
    148158   <c type="left">Some text with </c><kw>keyword</kw> <c type="right">highlighted</c>
     
    150160 }}}
    151161  Alternatively every token could be wrapped as an element:
    152  {{{
     162  {{{#!xml
    153163  <ccs:DataView type="kwic">
    154164      <t id="t1">Some</t>
     
    161171  This comes close to the way the text is encoded in '''TCF''' and would accordingly allow adding (stand-off) annotation layers (lemma, POS, but also syntactic annotations). This shall be a separate DataView-type.
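
A minimal sketch of how a client might read the first kwic form above (context and keyword wrapped in `<c>` and `<kw>`). It is only an illustration: the `ccs` namespace URI and the helper name `read_kwic` are assumptions, not part of the spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the spec does not define one for "ccs" here.
CCS_NS = "http://example.org/ccs"


def read_kwic(dataview):
    """Return (left context, keyword, right context) of a kwic DataView."""
    left = keyword = right = ""
    for child in dataview:
        # Drop a '{namespace}' prefix if the element carries one.
        name = child.tag.rsplit("}", 1)[-1]
        text = child.text or ""
        if name == "kw":
            keyword = text
        elif name == "c" and child.get("type") == "left":
            left = text
        elif name == "c" and child.get("type") == "right":
            right = text
    return left, keyword, right


sample = (
    f'<ccs:DataView xmlns:ccs="{CCS_NS}" type="kwic">'
    '<c type="left">Some text with </c><kw>keyword</kw> '
    '<c type="right">highlighted</c>'
    "</ccs:DataView>"
)
left, keyword, right = read_kwic(ET.fromstring(sample))
```

Because keyword and context sit in separate elements, the client never has to handle mixed content; it only looks at element names and the `type` attribute.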
    162172
    163   If there is some associated metadata (like bibliographic information about the source of the hit, this is to be encoded in a separate element `<ccs:DataView type="metadata">`.
     173 content::
     174  A basic !DataView-type, to be used when nothing more specific applies. In particular, it is to be used if only plain text can be delivered as the result, without distinguishing the keyword and the context (as required in the '''kwic'''-!DataView).
     175{{{#!xml
     176 <ccs:DataView type="content">Some text with the matching keyword. However the keyword is NOT highlighted</ccs:DataView>
     177}}}
     178 metadata::
     179  If there is some associated metadata (like bibliographic information about the source of the hit), this is to be encoded in a separate element
     180{{{#!xml
     181  <ccs:DataView type="metadata">
     182}}}
     183  optional (but strongly encouraged) element carrying metadata about the `Resource` or `ResourceFragment`.
     184
     185  The metadata can be inline or referenced via attributes `@pid` or `@ref`. This can be basically any kind of metadata record, but the order of preference is:
     186      1. CMD-record
     187      1. some recognized format like ''dublincore'', ''OLAC''
     188      1. a "serialized" format with flat list of `<f key="{field-name}">`-elements
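
The order of preference above could be checked on the client side roughly as follows. This is a sketch under stated assumptions: the `ccs` namespace URI and the function `metadata_kind` are hypothetical, the CMD record is recognized only by its local element name `CMD`, and Dublin Core stands in for all "recognized formats".

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the spec excerpt does not define them.
CCS_NS = "http://example.org/ccs"            # hypothetical
DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core elements namespace


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def metadata_kind(dataview):
    """Classify a metadata DataView by the most preferred format it contains."""
    children = list(dataview)
    if any(local_name(c.tag) == "CMD" for c in children):
        return "cmd"
    if any(c.tag.startswith("{" + DC_NS + "}") for c in children):
        return "recognized"
    if any(c.tag == "{" + CCS_NS + "}f" for c in children):
        return "serialized"
    return "unknown"


sample = f"""
<ccs:DataView xmlns:ccs="{CCS_NS}" xmlns:dc="{DC_NS}" type="metadata">
  <dc:title>Some title</dc:title>
  <ccs:f key="speaker">A</ccs:f>
</ccs:DataView>
"""
kind = metadata_kind(ET.fromstring(sample))
```

The checks run from most to least preferred, so a record mixing formats is reported by its best available one.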
    164189
    165190 Geographic data :: A geographic location, either as coordinates or some location (street, city, place).
     
    185210                     For an example EAF file see: [http://corpus1.mpi.nl/qfs1/media-archive/demo/pewi/Annotations/elan-example1.eaf sample file]
    186211                     For now we have annotations/eaf as type and annotations/tcf as type.                 
    187 {{{
     212{{{#!xml
    188213<ccs:DataView type="annotations/eaf" ref="http://corpus1.mpi.nl/qfs1/media-archive/demo/pewi/Annotations/elan-example1.eaf">
    189214</ccs:DataView>
     
    192217
    193218=== restricting the search by collections ===
    194 Restricting the search space shall be done via `x-cmd-domain` (or `x-cmd-context`?) parameter (obsoleting: `x-cmd-collections`). See more under [[SearchContext]]
     219Restricting the search space shall be done via the `x-cmd-context`-parameter (obsoleting: `x-cmd-collections`). See more under [[SearchContext]]
    195220
    196221== Scan operation ==
     
    200225
    201226Example (tentative):
    202 {{{
    203 
     227{{{#!xml
    204228<sru:scanResponse xmlns:sru="http://www.loc.gov/zing/srw/"
    205229          xmlns:diag="http://www.loc.gov/zing/srw/diagnostic/"