
{{{ #!html

}}}
[[PageOutline(1-3, , inline)]]
{{{
#!comment
Obviously, your page starts below this block
}}}

= XSD Schema =

== Preamble ==

The XSD schema is designed according to the following paradigm:

 - There are 7 sorts of resources in DASISH: {{{CachedRepresentation}}}, {{{Source}}}, {{{User}}}, {{{Annotation}}}, {{{Notebook}}}, {{{Lists of Permissions}}}, {{{Lists of Versions}}}.
 - There are 6 XSD types corresponding to the serialisations of all the types of resources above, except {{{CachedRepresentation}}}. There is no XSD type corresponding to {{{CachedRepresentation}}} because a cached representation is a "pure" resource, like an image or a text file, that does not contain any meta-information about itself. The metadata of a cached representation are defined via an instance of {{{CachedRepresentationInfo}}}.
 - Each of these 6 types has an obligatory attribute "URI", which contains the DASISH identifier pointing to the location of the resource on the DASISH server.
 - There are corresponding list-of-references types: {{{CachedRepresentations}}}, {{{Sources}}}, {{{Users}}}, {{{Annotations}}}, {{{Notebooks}}}. Their names are just the English plural forms of the corresponding types.
 - There are corresponding resource-info types: {{{CachedRepresentationInfo}}}, {{{SourceInfo}}}, {{{UserInfo}}}, {{{AnnotationInfo}}}, {{{NotebookInfo}}}. Each contains a reference to the corresponding resource plus the most important information about that resource.
 - There are corresponding list-of-resource-info types: {{{SourceInfos}}}, {{{UserInfos}}}, {{{AnnotationInfos}}}, {{{NotebookInfos}}}.

There are a number of auxiliary types as well. A commonly used one is {{{ResourceREF}}}, which contains the attribute "ref" of type {{{xs:anyURI}}}. It makes it possible to declare reference elements and to avoid mixing them up with resource elements.

=== Handling new (not yet in the DB) sources ===

Adding an annotation whose target sources are not yet in the DB needs special treatment.
This becomes clear when the POST body for a new annotation has to be serialized. Two approaches seem plausible. We follow the FIRST option.

 1. A "strongly-typed" schema. An annotation contains a list of "target" elements. Each of them can be either a source element or a new-source element. This is implemented using the {{{xs:choice}}} construct for elements. A source element and a new-source element differ in one attribute: a source has an obligatory "ref" attribute, and a new source has an obligatory "xml:id" attribute. See [source:DASISH/t5.6/schema/trunk/annotator-schema/src/main/resources/DASISH-schema.xsd DASISH-schema].
 2. A "weakly-typed" schema. An annotation contains a list of "target" elements of the same type, which has two non-obligatory attributes: "ref" and "xml:id". The check "''at least one of the attributes is present and they are mutually exclusive''" may be left for later, to Schematron or a similar tool. See [source:DASISH/t5.6/docs/XMLandXSD/DASISH-schema-alternative.xsd DASISH-alternative-xsd].

The link to the second, "weakly-typed", version of the XSD schema is kept for reference; however, this version is no longer maintained.
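
The rule quoted for the weakly-typed variant can also be checked outside the schema. The following is a minimal sketch in Python, not part of the DASISH code base: the element names {{{annotation}}} and {{{target}}}, the absence of a namespace, and the function name {{{check_targets}}} are assumptions made for illustration. Only the attribute names "ref" and "xml:id" come from the description above.

{{{#!python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared "xml" prefix to its namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def check_targets(annotation_xml):
    """Return one message per target that has both or neither of ref / xml:id.

    The element name "target" is hypothetical; adapt it to the real schema.
    """
    problems = []
    root = ET.fromstring(annotation_xml)
    for position, target in enumerate(root.iter("target"), start=1):
        has_ref = "ref" in target.attrib
        has_id = XML_ID in target.attrib
        if has_ref and has_id:
            problems.append("target %d: ref and xml:id are mutually exclusive" % position)
        elif not has_ref and not has_id:
            problems.append("target %d: one of ref or xml:id is required" % position)
    return problems


# Hypothetical sample: two valid targets followed by two invalid ones.
SAMPLE = """
<annotation>
  <target ref="http://localhost:8080/annotator-backend/api/targets/00000000-0000-0000-0000-000000000031"/>
  <target xml:id="tmp-1"/>
  <target ref="http://example.org/x" xml:id="tmp-2"/>
  <target/>
</annotation>
"""

if __name__ == "__main__":
    for message in check_targets(SAMPLE):
        print(message)
}}}

With the strongly-typed schema this check is redundant, because {{{xs:choice}}} together with the obligatory attributes already enforces it at validation time.
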

= Scenario XMLs validated against the given schema =

See [[DASISH/Scenario]]

=== GET api/user/00000000-0000-0000-0000-000000000112 ===
{{{#!xml
}}}

=== GET api/users/00000000-0000-0000-0000-000000000112/current ===
{{{#!xml
}}}

=== GET api/users/info?email=twagoo@mpi.nl ===
{{{#!xml
}}}

== Retrieving annotations ==

=== Response to GET api/annotations?link=Sagrada ===

All annotations that annotate links containing "Sagrada":
{{{#!xml
http://localhost:8080/annotator-backend/api/targets/15a7ff2c-ee9e-4eb1-b51a-6b20c6df0218
http://localhost:8080/annotator-backend/api/targets/00000000-0000-0000-0000-000000000031
http://localhost:8080/annotator-backend/api/targets/00000000-0000-0000-0000-000000000032
}}}

=== Response to GET api/annotations/00000000-0000-0000-0000-000000000021 ===
{{{#!xml
<html><body>some html 1</body></html>
}}}

=== GET api/annotations/00000000-0000-0000-0000-000000000021/targets ===
{{{#!xml
http://localhost:8080/annotator-backend/api/targets/00000000-0000-0000-0000-000000000031
http://localhost:8080/annotator-backend/api/targets/00000000-0000-0000-0000-000000000032
}}}

=== GET api/annotations/00000000-0000-0000-0000-000000000021/permissions ===
{{{#!xml
}}}

=== GET api/targets/00000000-0000-0000-0000-000000000032 ===

An unresolvable target obeys the same schema. A target becomes unresolvable if, for example, its link becomes obsolete or broken.
{{{#!xml
http://localhost:8080/annotator-backend/api/targets/00000000-0000-0000-0000-000000000032
}}}

=== GET api/targets/00000000-0000-0000-0000-000000000032/versions ===
{{{#!xml
http://localhost:8080/annotator-backend/api/targets/00000000-0000-0000-0000-000000000032
}}}

=== GET api/cached/00000000-0000-0000-0000-000000000051/metadata ===
{{{#!xml
}}}

=== Response to GET api/annotations/00000000-0000-0000-0000-00000000002c (example usage for unresolvable targets 1) ===

The response for an annotation with unresolved targets and the response for an annotation with resolved targets (see above) are both instances of the same schema element. However, one of the targets of the first annotation refers, for example, to an obsolete version of the page. Next, having the target references, the client asks for the source versions saved in the DB. The last step: having the info about the version under consideration, the client asks for the cached representations of that version.
{{{#!xml
<html><body>some html 1</body></html>
}}}

=== Response to GET api/targets/00000000-0000-0000-0000-00000000003c (unresolvable target sources 2, the same as for resolvable, just the link is broken or obsolete) ===
{{{#!xml
http://localhost:8080/annotator-backend/api/targets/00000000-0000-0000-0000-00000000003c
}}}

=== Response to GET api/targets/00000000-0000-0000-0000-00000000003c/versions (unresolvable target sources 3) ===

The target has only one version in this case: itself.
{{{#!xml
http://localhost:8080/annotator-backend/api/targets/00000000-0000-0000-0000-00000000003c
}}}

=== Response to GET api/cached/00000000-0000-0000-0000-00000000005c/metadata (unresolvable target sources 4) ===
{{{#!xml
}}}

== Making a new annotation ==

=== Request body for POST api/annotations ===

The new annotation URI and the owner reference will be replaced by the server. The new annotation URI is the service URI + the UUID generated by the server. The owner reference is the service URI + the logged-in user's UUID.
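
The "service URI + UUID" rule can be sketched as follows. This is an illustration in Python, not the server's implementation: the function name {{{mint_uri}}} is hypothetical, the service URI is taken from the example responses on this page, and it is an assumption that the collection segments ({{{annotations}}}, {{{users}}}) appear in the URI exactly as in the request paths.

{{{#!python
import uuid

# Taken from the example responses on this page; adjust for a real deployment.
SERVICE_URI = "http://localhost:8080/annotator-backend/api"


def mint_uri(collection, resource_uuid=None, service_uri=SERVICE_URI):
    """Build "<service URI>/<collection>/<UUID>", generating a UUID if none is given."""
    if resource_uuid is None:
        resource_uuid = uuid.uuid4()
    return "%s/%s/%s" % (service_uri.rstrip("/"), collection, resource_uuid)


if __name__ == "__main__":
    # New annotation: the server generates a fresh UUID.
    print(mint_uri("annotations"))
    # Owner reference: built from the UUID of the logged-in user.
    print(mint_uri("users", "00000000-0000-0000-0000-000000000111"))
}}}
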
The target's URI will be replaced if the target is new (not yet present in the database).
{{{#!xml
Some background information on Sagrada Família.
}}}

=== Response body (envelope) for POST api/annotations ===

The temporary URIs/references are replaced with permanent references. However, no cached representation is found for the target. Therefore, the action part of the envelope contains an action CREATE_CACHED_REPRESENTATION for the object, which is the target for the web page.
{{{#!xml
Some background information on Sagrada Família.
}}}

The client sends the metadata of the cached representation in the POST body, along with the cached representation itself. An example of serialized metadata for a cached representation has been given above, so we do not repeat it here.

== Editing an annotation ==

=== PUT api/annotations/6a01ba7b-2a15-47d4-bf1c-a14b46eb953f ===

Request body: an updated annotation
{{{#!xml
Construction process of S.F.
}}}

Enveloped response containing the new (updated) annotation and a list of actions:
{{{#!xml
Construction process of S.F.
}}}

=== PUT api/annotations/6a01ba7b-2a15-47d4-bf1c-a14b46eb953f/body ===

Request body:
{{{#!xml
Some background information on Sagrada Família. attempt # 1
}}}

Response:
{{{#!xml
Some background information on Sagrada Família. attempt # 1
}}}

=== PUT api/annotations/6a01ba7b-2a15-47d4-bf1c-a14b46eb953f/permissions ===

Supplementary update of the annotation's list of permissions.

Example 1. Request body:
{{{#!xml
}}}

Response:
{{{#!xml
}}}

Example 2: user 00000000-0000-0000-0000-000000000114 is not known to the DASISH database. Request body:
{{{#!xml
}}}

Response:
{{{#!xml
}}}

=== PUT api/annotations/6a01ba7b-2a15-47d4-bf1c-a14b46eb953f/permissions/00000000-0000-0000-0000-000000000111 ===
{{{#!xml
writer
}}}

Response: string "1 rows are updated/added".

== Managing Notebooks (obsolete section) ==

=== GET api/notebooks ===
{{{#!xml
Gaudi
Douglas Adams
}}}

=== GET api/notebooks/NIDxyxy ===
{{{#!xml
Gaudi
}}}

=== GET api/notebooks/NIDxyxy/annotations/ ===

The response is a list of annotation infos, similar to the response to {{{GET api/annotations?link="http://en.wikipedia.org/wiki/Sagrada_Fam%C3%ADlia"&access=read}}}.

= Issues with the schema =

Whenever the schema is updated, please update all scenario XML documents on trac.clarin.eu accordingly, preferably with as little delay as possible. It is also a good idea to validate all of these documents against the revised schema with a reliable XML parser such as Xerces-J. In addition, you can check the schema by validating some of the current "real-life" mock XML documents that we have been using for client development ([source:DASISH/t5.6/client/trunk/chrome/markingcollection/content/markingcollection/annotator-service/test/mockjax/mocks]).

== Cached Representation BLOB ==

For the back-end (and the database!): for now, the cached representation is stored in the database as a BLOB. See http://dba.stackexchange.com/questions/803/blobs-or-references-in-postgresql/815#815. Also see https://trac.clarin.eu/ticket/366 [[BR]][[BR]]

''Stephanie:'' Please define what data needs to be sent from the client regarding cached representations (the HTML markup, any CSS files, images, ...?). In Wired-Marker, what is called "cache" is a locally saved copy of the HTML markup plus a CSS file with content aggregated from the CSS files used by the original website (cf. DiscussionPage: Answer 1 on Versioning).

''Twan:'' No specific format is expected, but in the current specification "a cached representation" corresponds to a single file. Accordingly, the client should choose a representation that can be stored as a single file, or use some method of wrapping multiple files up into a single entity.
Apart from that, the exact nature of the cached representation is taken to be client-specific, so you can choose whichever format suits the plugin best. If the current single-file situation really turns out to be problematic, let us know and we could consider alternative solutions that support grouping multiple files together as a single cached representation entity.

''Olof + Stephanie:'' For a start, we can tune the client to send the HTML representation as a single file only. A refined solution might be to extend the content of this file by adding the complete, aggregated CSS content from the stylesheets inline, i.e. within the range of {{{<style> ... </style>}}} in the {{{<head>}}}-tag.

''Olha, Menzo, Twan:'' If that is relatively easy for you to do, it would probably suffice in the initial phase. For a more complete solution (i.e. including images, scripts etc.) you could perhaps consider [https://en.wikipedia.org/wiki/MHTML MIME HTML] or alternatively something like [https://en.wikipedia.org/wiki/Mozilla_Archive_Format the Mozilla Archive Format]. Cross-browser support could be a bit of an issue there, but the situation does not look too bad.
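
The single-file idea discussed above (HTML markup with the aggregated CSS placed inline in the document head) can be sketched as follows. This is a minimal illustration in Python, not client code: the function name {{{inline_css}}} is hypothetical, and the sketch assumes the CSS has already been aggregated into plain strings and that the markup contains a closing head tag.

{{{#!python
import re


def inline_css(html, css_chunks):
    """Return html with all css_chunks placed in one <style> element inside <head>."""
    style = '<style type="text/css">\n' + "\n".join(css_chunks) + "\n</style>\n"
    # Locate the closing head tag, ignoring case and optional whitespace.
    closing_head = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if closing_head is None:
        raise ValueError("document has no </head> tag to attach the styles to")
    cut = closing_head.start()
    return html[:cut] + style + html[cut:]


if __name__ == "__main__":
    page = "<html><head><title>Sagrada</title></head><body>some html 1</body></html>"
    print(inline_css(page, ["body { margin: 0; }", "p { color: #333; }"]))
}}}

Images and other resources referenced from the CSS are not embedded by this approach; that is the gap the container formats mentioned above would close.
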
