DASISH WEB-ANNOTATOR

TLA

This document specifies a browser extension for annotating web-documents. We present the class structure of the implementation, describe the functionality from the user perspective and define the REST API.

Document version: 1.1

Date: 14 April 2013

Authors: Olha Shkaravska, Przemek Lenkiewicz, Menzo Windhouwer, Twan Goosen, Daan Broeder

Contents

  1. Technical Summary
  2. Model
  3. Initial Annotation-Body Types
  4. User Interface prototype
    1. Main window view
    2. Context menu
  5. REST API
    1. User realm
    2. Annotations
      1. api/annotations
      2. api/annotations/<aid>
    3. Sources
      1. api/sources
    4. Notebooks
      1. api/notebooks
  6. APPENDIX 1

Technical Summary

The aim of this document is to specify a web-annotation tool to be developed within the DASISH project. The tool is a browser extension that allows users to annotate fragments of web documents with tags, colors and text notes. Annotatable fragments may be text and, at later stages of development, graphical objects as well.

Initially the tool will only support annotating web pages. Later we plan to extend it to web documents generated by linguistic software, e.g. EAF files created by ELAN (MPI Nijmegen) or lexical entries created by LEXUS (MPI Nijmegen). We do not want to limit annotatable objects to those generated by DASISH participants, and plan to include external linguistic software in our case study.

<aid> | Annotation identifier
<cid> | Cached representation identifier
<datetime> | Date and time, including time zone, as defined in http://www.w3.org/TR/xmlschema-2/#dateTime
<nid> | Notebook identifier
<prefix> | The prefix of a namespace
<sid> | Source identifier
<text> | Some text
<uid> | User identifier
<URI> | URI, as defined in http://tools.ietf.org/html/rfc3986; can include a fragment description as an XPath
User | A person, a group, or “everyone” (public)

Passing a <uid> as a parameter to a REST service should never be required as a means of identification, because the active principal is known from the session via the Shibboleth identification procedure.

An owner is either the principal who has created the annotation or a principal to whom the ownership has been assigned.

Model

The class “Annotation” is at the center of the model schema. An instance of “Annotation” is in the “target” relation with one or more objects of class “Source”. The semantics of an Annotation object is defined in its attribute “Body”. Annotation bodies cover a variety of ways to annotate documents, from marking fragments with simple text tags or colors to attaching arbitrary text notes. For now there is no fixed XSD schema for a body; it only has to be valid XML.
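
Since no schema is fixed yet, a client or server can only check that a body parses as XML. The following is a minimal sketch of such a check, reading “valid XML” as “well-formed XML”; the function name and the element names in the sample bodies are hypothetical and not part of this specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(body: str) -> bool:
    """Return True if `body` parses as a single well-formed XML document."""
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return False
    return True


# Hypothetical bodies; the element name <note> is illustrative only.
note_body = "<note>Check this paragraph against the 2012 edition.</note>"
broken_body = "<note>Unclosed element"

note_ok = is_well_formed_xml(note_body)
broken_ok = is_well_formed_xml(broken_body)
```

Once a schema for bodies is agreed on, this check would have to be replaced by real schema validation.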

The model schema is based on the following interfaces and classes:

  • Class Source represents (a specific fragment of) a specific version of an annotatable object; it contains information about this version, such as a time stamp, the version string, and a reference to the list of sibling versions.
  • Class Annotation contains references to the annotation’s body, to its sources, to the list of permissions and to the owner.
  • Class Cached representation contains a reference to the file with a specific cached representation and meta-information about it, such as its MIME type, date of creation, etc.
  • Interface Body (of an annotation) can be text, “like”, color, relation, etc.

See the schema for serializing these classes, and examples.
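
To make the relations between these classes concrete, here is a rough sketch of the model as Python dataclasses. All attribute names and types are assumptions chosen for illustration; the authoritative definitions are the serialization schema referred to above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class CachedRepresentation:
    cid: str
    file_path: str      # reference to the file holding the cached copy
    mime_type: str
    created: datetime


@dataclass
class Source:
    sid: str
    uri: str            # may carry a fragment description
    version: str
    timestamp: datetime
    sibling_versions: List[str] = field(default_factory=list)  # <sid>-s of sibling versions
    cached: List[CachedRepresentation] = field(default_factory=list)


@dataclass
class Annotation:
    aid: str
    owner: str          # <uid> of the owning principal
    body: str           # XML text; no fixed schema yet
    targets: List[Source]
    permissions: Dict[str, str] = field(default_factory=dict)  # <uid> -> "read" or "write"

    def __post_init__(self) -> None:
        # An annotation is in the "target" relation with one or more sources.
        if not self.targets:
            raise ValueError("an annotation must target at least one source")


page_v1 = Source(
    sid="s1",
    uri="http://example.org/page.html",
    version="1",
    timestamp=datetime(2013, 4, 14, tzinfo=timezone.utc),
)
note = Annotation(aid="a1", owner="u1", body="<note>Example</note>", targets=[page_v1])
```

The body is kept as plain XML text here because the Body interface is still open; a later version could replace it with one class per body type.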

Initial Annotation-Body Types

In the first prototype we plan to implement only single-target annotations with the body type “Note”. From the user’s perspective these are just text notes about fragments of the document, similar to comments in Word documents, but displayed only in a list or as a tooltip (as Wired Marker currently does). A balloon display, as in MS Word, can be implemented at a later stage.

In general we plan to implement the body types following the class diagram above. Recall that these body types, besides “Note”, are: color, tag (a unary relation), labeled tag (a unary relation with parameters), and binary relation. Below we present a series of instances of these body types. Implementing these instances within our tool will have a twofold effect:

  • first, it will serve the user’s convenience by providing a drop-down menu of annotations once a fragment to be annotated is selected;
  • second, it will show that reasonable types of annotations can be created within the proposed class schema.

To create an annotation, the user highlights the text and right-clicks. The creation menu should appear near the highlighted text (or in the right sub-panel of the whole panel). There the user can select the type of annotation and add other parameters when necessary. It may be possible to highlight the second fragment for binary relations using the Shift key(s).

For existing annotations, a left mouse click on the highlighted text opens a “callout” (a rectangular box connected to the text fragment) with a short annotation description. This applies to tags and relations (see below). A right mouse click on the highlighted text opens the context menu, which contains the complete information about the annotation: its author, its date and its URI.

User Interface prototype

Main window view

https://trac.clarin.eu/raw-attachment/wiki/DASISH/SpecificationDocument/UI.png

Context menu

https://trac.clarin.eu/raw-attachment/wiki/DASISH/SpecificationDocument/MENU.png

REST API

Web documents exist in time; that is, different versions of a document may exist under the same link at different moments in time. We will rely on local caching of versions of annotated sources (which already exists as a feature in Wired-Marker); see Unresolvable targets in Scenario for an example.

All information necessary to fulfill a PUT, POST or DELETE request, such as the URI of an annotated object, is given serialized in the request body, not as request parameters in the request’s URI. If a POST (PUT, DELETE) request succeeds, it returns serialized information about the added (resp. updated, removed) resource together with a standard HTTP response code. This information includes the resource ID, the owner’s ID, a time stamp and (possibly) the list of the <sid>-s of the target sources. To obtain the full information, the user issues a GET on the newly created or updated annotation, whose ID is now known. In the case of failure, the corresponding error message and error status are returned, e.g. 401 Unauthorized. Only the owner has DELETE rights.
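
As an illustration of the “everything in the request body” convention, the sketch below builds (but does not send) a POST request for a new annotation. The base URL, the XML element names and the Content-Type value are assumptions made for this example; the real serialization is defined by the schema, not by this snippet.

```python
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://example.org/"  # hypothetical server address


def build_create_annotation_request(target_uri: str, note: str) -> urllib.request.Request:
    """Build a POST request whose body carries the serialized annotation."""
    # Hypothetical serialization: the element names are placeholders.
    annotation = ET.Element("annotation")
    ET.SubElement(annotation, "target").text = target_uri
    ET.SubElement(annotation, "body").text = note
    payload = ET.tostring(annotation, encoding="utf-8")

    # The target URI travels in the body, not as a query parameter.
    return urllib.request.Request(
        BASE_URL + "api/annotations",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/xml"},
    )


request = build_create_annotation_request("http://example.org/page.html", "A short note")
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out because it needs a running server and an authenticated session.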

For now, the descriptions of the requests below often refer to the corresponding descriptions on the Scenario wiki page. Once the design stabilizes this will be reversed, i.e. the Scenario will refer to this specification document.

User realm

Resource | Description
GET api/users/uid | See Authentication in Scenario.
GET api/users/info?email=<...> | See Managing a list of permissions in Scenario.

Annotations

api/annotations

Resource | Description
GET api/annotations?link=<URI>&text=<text>&access=[read, write]&ns=<prefix>:<ns>&owner=<uid>&after=<datetime1>&before=<datetime2> | Returns a list of annotation infos, filtered by the request parameters: annotations on <URI> to which the logged-in <uid> has “read” (resp. “write”) access, whose bodies contain the text <text>, and which were created between <datetime1> and <datetime2>. If the parameter “link” is omitted, all annotated objects to which <uid> has “read”/“write” access are considered. The default <datetime1> can be 01 Jan 1970, 00:00. The default <datetime2> is today.
POST api/annotations | Adds a new annotation, taking its XML serialization from the request body.
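
The filter parameters of the GET request can be assembled as in the following sketch. The function name and its keyword arguments are hypothetical, the “ns” parameter is left out for brevity, and the defaults for “after” and “before” are assumed to be applied by the server when the parameters are omitted.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def annotations_query(
    link: Optional[str] = None,
    text: Optional[str] = None,
    access: str = "read",
    owner: Optional[str] = None,
    after: Optional[datetime] = None,
    before: Optional[datetime] = None,
) -> str:
    """Build the relative URL for a filtered GET api/annotations request."""
    if access not in ("read", "write"):
        raise ValueError("access must be 'read' or 'write'")
    params = {"access": access}
    if link is not None:
        params["link"] = link
    if text is not None:
        params["text"] = text
    if owner is not None:
        params["owner"] = owner
    # isoformat() on a time-zone-aware datetime yields an xsd:dateTime string.
    if after is not None:
        params["after"] = after.isoformat()
    if before is not None:
        params["before"] = before.isoformat()
    return "api/annotations?" + urlencode(params)


query = annotations_query(
    link="http://example.org/page.html",
    text="typo",
    after=datetime(2013, 1, 1, tzinfo=timezone.utc),
)
# Decode the query string again to inspect what the server would receive.
parsed = parse_qs(urlsplit(query).query)
```

`urlencode` percent-encodes the reserved characters in the link and in the time zone offset, so the values survive the round trip unchanged.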

api/annotations/<aid>

It is assumed that if the logged-in user <uid> has no “read” access to <aid>, then GET methods on URIs of the form api/annotations/<aid> return the error status 401 Unauthorized, or similar. The same happens with PUT, POST and DELETE methods if the logged-in user <uid> has no “write” access to <aid>.

The table below describes the behavior of each (method, URI) pair when user <uid> has authorized access to <aid>. Here “authorized access” means that <uid> has “read” access for GET methods, and “write” access for PUT, POST and DELETE methods.

Resource | Description
GET api/annotations/<aid> | Returns the annotation that has this <aid>.
GET api/annotations/<aid>/sources | Returns the list of the <sid>-s of all the target sources of <aid>.
DELETE api/annotations/<aid> | Removes <aid> from the database, together with all its target sources to which no other annotation refers. Returns a status code.
PUT api/annotations/<aid> | Updates the annotation with <aid>. For instance, it is used when <uid> wants to correct typos in the annotation body AND change the annotated fragments. (See PUT api/annotations/<aid>/body for correcting the body only.) The serialized representation of the updated annotation is given in the request body. The server returns an "envelope" containing the updated annotation and the list of actions.
PUT api/annotations/<aid>/body | Updates the body of the annotation <aid>. Used e.g. for correcting typos in the text part. The server returns the "envelope", see above.
GET api/annotations/<aid>/permissions | See Scenario, managing permission lists.
PUT api/annotations/<aid>/permissions | See Scenario, managing permission lists.
PUT api/annotations/<uid>/permissions/UIDagc | See Scenario, managing permission lists.
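
The access rules for api/annotations/<aid> can be summarized in a small lookup, sketched below. Two points are assumptions of this example rather than statements of the specification: the success status 200, and the way the owner-only DELETE rule from the general REST conventions is combined with the “write” requirement. Access levels are also treated as independent, i.e. “write” is not assumed to imply “read”.

```python
from typing import Set

# Access level each HTTP method requires on api/annotations/<aid>.
REQUIRED_ACCESS = {
    "GET": "read",
    "PUT": "write",
    "POST": "write",
    "DELETE": "write",
}


def status_for(method: str, granted: Set[str], is_owner: bool = False) -> int:
    """Return 200 if the logged-in user may perform `method`, else 401."""
    if REQUIRED_ACCESS[method] not in granted:
        return 401
    # Assumption: DELETE additionally requires ownership.
    if method == "DELETE" and not is_owner:
        return 401
    return 200


reader_get = status_for("GET", {"read"})
reader_put = status_for("PUT", {"read"})
writer_delete = status_for("DELETE", {"read", "write"})
owner_delete = status_for("DELETE", {"read", "write"}, is_owner=True)
```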

Sources

A source represents (a specific fragment of) a specific version of an annotatable object. For instance, if an annotatable object is a web page that has 3 versions and users have annotated versions 1 and 3, then there are 2 sources in the database that correspond to the web page. Naturally, these sources represent versions 1 and 3.

Adding sources to the database and removing them is the responsibility of the database management system. In fact, adding a source is a “side effect” of creating an annotation on a certain URI.

If all the annotations that refer to a certain source are deleted, then the DB managing part deletes this source from the DB. A read-only REST API for inspecting Sources is needed.
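
The life cycle described above (sources appear as a side effect of creating annotations and disappear with their last annotation) can be sketched with an in-memory store. The class and method names are hypothetical; a real implementation would live in the database layer.

```python
from typing import Dict, Iterable, Set


class AnnotationStore:
    """Toy store that tracks which sources are still referenced."""

    def __init__(self) -> None:
        self.targets: Dict[str, Set[str]] = {}  # <aid> -> set of <sid>-s
        self.sources: Set[str] = set()

    def add_annotation(self, aid: str, sids: Iterable[str]) -> None:
        sid_set = set(sids)
        self.targets[aid] = sid_set
        # Side effect: sources not seen before are added.
        self.sources |= sid_set

    def delete_annotation(self, aid: str) -> Set[str]:
        """Delete an annotation and return the sources removed with it."""
        removed = self.targets.pop(aid)
        still_referenced: Set[str] = set().union(*self.targets.values())
        orphans = removed - still_referenced
        self.sources -= orphans
        return orphans


store = AnnotationStore()
store.add_annotation("a1", ["s1", "s2"])
store.add_annotation("a2", ["s2"])
orphans = store.delete_annotation("a1")
```

In this run only s1 is removed, because s2 is still the target of a2.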

Cached representations are managed by the client, therefore a creation and deletion API is necessary. It is possible to store a cached representation not only of the fragment precisely corresponding to an annotation’s target source, but also of a larger fragment or even of the entire annotatable object.

api/sources

Resource | Description
GET api/sources/<sid> | See the second step for Unresolvable targets in Scenario.
GET api/sources/<sid>/versions | Returns the list of the <sid>-s (URIs) of all the “sibling” versions of <sid>.
GET api/sources/<sid>/cached/<cid>/metadata | Returns the meta-information of <cid> if it exists.
GET api/sources/<sid>/cached/<cid>/content | Returns the file that is the cached representation with <cid> if it exists.
POST api/sources/<sid>/cached | Adds a new cached representation of <sid>, taking it from the request body. This is a multipart POST whose request body consists of a description containing the metadata specified by the Cached Representation realization class (e.g. screenshot) and a single file (multiple files must be archived). The description has the following form: <cachedrepresentation-description><mime>multipart/related</mime><tool>ToolID01</tool><type>MHTML</type></cachedrepresentation-description>
DELETE api/sources/<sid>/cached/<cid> | Removes the cached representation <cid> given in the body of the request from the list of cached representations of <sid>. It is removed from the database as well, provided no other references to this representation remain.
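
The description part of the multipart POST can be generated as in this sketch, which reproduces the sample description from the table. The function name is hypothetical, and assembling the description and the file into the multipart body is not shown.

```python
import xml.etree.ElementTree as ET


def cached_representation_description(mime: str, tool: str, type_: str) -> str:
    """Serialize the metadata part of POST api/sources/<sid>/cached."""
    root = ET.Element("cachedrepresentation-description")
    ET.SubElement(root, "mime").text = mime
    ET.SubElement(root, "tool").text = tool
    ET.SubElement(root, "type").text = type_
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


description = cached_representation_description("multipart/related", "ToolID01", "MHTML")
```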

Notebooks

api/notebooks

Resource | Description
GET api/notebooks | Returns notebook infos for the notebooks accessible to the current user.
GET api/notebooks/owned | Returns the list of all notebooks owned by the currently logged-in user.
GET api/notebooks/<nid>/readers | Returns the list of <uid>-s who are allowed to read the annotations from the notebook.
GET api/notebooks/<nid>/writers | Returns the list of <uid>-s who can add annotations to the notebook.
GET api/notebooks/<nid>/metadata | Returns all metadata about the notebook <nid>, including whether it is private or not.
GET api/notebooks/<nid>?maximumAnnotations=limit&startAnnotation=offset&orderby=orderby&orderingMode=1|0 | Returns the list of all annotations (<aid>-s) contained in the notebook, with related metadata. Parameters: <nid>; optional maximumAnnotations specifies the maximum number of annotations to retrieve (default: -1, all annotations); optional startAnnotation specifies the starting point from which the annotations are retrieved (default: -1, start from the first annotation); optional orderby specifies the RDF property used to order the annotations (default: dc:created); optional orderingMode specifies whether the results are sorted in descending order (1) or ascending order (0) (default: 0).
PUT /notebooks/<nid> | Modifies the metadata of <nid>. The notebook’s new name must be sent in the request’s body.
PUT /notebooks/<nid>/<aid> | Adds the annotation <aid> to the list of annotations of <nid>.
POST api/notebooks/ | Creates a new notebook. Returns the <nid> of the created notebook in the response’s payload, and the full URL of the notebook in a Location header of the HTTP response. The name of the new notebook can be specified by sending a specific payload.
DELETE api/notebooks/<nid> | Deletes <nid>. Its annotations stay; they just lose their connection to <nid>.
POST api/notebooks/<nid> | Creates a new annotation in <nid>. The content of the annotation is given in the request body. In fact this is a shortcut for two actions: POST api/annotations and PUT /notebooks/<nid>?annotation=<aid>.
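
The paging and ordering parameters of the notebook listing can be put together as sketched below, using the defaults from the table. The function name and the notebook identifier “nb42” are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def notebook_annotations_url(
    nid: str,
    maximum_annotations: int = -1,   # -1: all annotations
    start_annotation: int = -1,      # -1: start from the first annotation
    orderby: str = "dc:created",
    ordering_mode: int = 0,          # 0: ascending, 1: descending
) -> str:
    """Build the relative URL for listing the annotations of a notebook."""
    if ordering_mode not in (0, 1):
        raise ValueError("orderingMode must be 0 or 1")
    query = urlencode(
        {
            "maximumAnnotations": maximum_annotations,
            "startAnnotation": start_annotation,
            "orderby": orderby,
            "orderingMode": ordering_mode,
        }
    )
    return f"api/notebooks/{nid}?{query}"


# Ten annotations of a hypothetical notebook, newest first.
url = notebook_annotations_url("nb42", maximum_annotations=10, ordering_mode=1)
params = parse_qs(urlsplit(url).query)
```

Sending the defaults explicitly keeps the example simple; since all four parameters are optional, a client could equally omit them.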

APPENDIX 1

For Appendix 1 please see the DOC file of this document. Note that the DOC file is obsolete except for its Appendix.

https://trac.clarin.eu/raw-attachment/wiki/DASISH/SpecificationDocument/DASISH-Annotator-1.1-snapshot.docx
