DASISH WEB-ANNOTATOR

TLA

This document specifies a browser extension for annotating web-documents. We present the class structure of the implementation, describe the functionality from the user perspective and define the REST API.

Document version: 1.1

Date: 14 April 2013

Authors: Olha Shkaravska, Przemek Lenkiewicz, Menzo Windhouwer, Twan Goosen, Daan Broeder

Contents

  1. Technical Summary
  2. Model
  3. Initial Annotation-Body Types
  4. User Interface prototype
    1. Main window view
    2. Context menu
  5. REST API
    1. User realm
    2. Annotations
      1. api/annotations
      2. api/annotations/<aid>
    3. Sources
      1. api/sources
    4. Notebooks
      1. api/notebooks
  6. APPENDIX 1

Technical Summary

The aim of this document is to specify a web-annotation tool to be developed within the DASISH project. The tool is a browser extension that allows users to annotate fragments of web documents with tags, colors and text notes. The annotatable fragments are initially texts; at later stages of development they may be graphical objects as well.

Initially the tool will support annotating web pages only. Later we plan to extend it to annotate web documents generated by linguistic software, e.g. EAF files created by ELAN (MPI Nijmegen) or lexical entries created by LEXUS (MPI Nijmegen). We do not want to limit annotatable objects to those generated by DASISH participants, and we plan to include external linguistic software in our case study.

User | A person, a group, or “everyone” (public)
<uid> | User identifier
<aid> | Annotation identifier
<sid> | Source identifier
<datetime> | Date and time, including time zone, as defined in http://www.w3.org/TR/xmlschema-2/#dateTime
<cid> | Cached Representation identifier
<URI> | URI, as defined in http://tools.ietf.org/html/rfc3986; can include a fragment description as an XPath
<prefix> | The prefix of a namespace
<text> | Some text

Passing a <uid> as a parameter to a REST service should never be required as a means of identification, because the active principal is known from the session via the “Shibboleth” identification procedure.

An owner is either the principal who has created the annotation or a principal to whom the ownership has been assigned.

Model

Class “Annotation” is at the center of the model schema. An instance of “Annotation” is in the “target” relation with one or more objects of class “Source”. The semantics of an Annotation object is defined in its attribute “Body”. Annotation bodies express the variety of ways to annotate documents, from marking fragments with simple text tags or colors to attaching arbitrary text notes. For now there is no fixed XSD schema for a body; it only has to be valid XML.

The model schema is based on the following interfaces and classes:

  • class Source represents (a specific fragment of) a specific version of an annotatable object; it contains information about this version, such as a time stamp, the version string, and a reference to the list of sibling versions.
  • class Annotation contains the references to the annotation’s body, to the sources, to the list of permissions and to the owner.
  • class Cached representation contains a reference to the file with a specific cached representation and the meta-information about it, such as its MIME type, date of creation, etc.
  • interface Body (of an annotation) can be text, “like”, color, relation, etc.

See the schema for serializing these classes and examples.
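
The class structure described above can be sketched roughly as follows. This is only an illustration in Python, not the project's schema: all field names (`version`, `mime_type`, `file_ref`, `body_xml`, `targets`) are assumptions, and permissions are left out for brevity. The sketch enforces the two constraints stated in the text: an annotation has at least one target source, and its body must be valid XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class CachedRepresentation:
    """Metadata about one cached copy of a source (field names are assumed)."""
    cid: str
    mime_type: str
    file_ref: str  # reference to the file holding the cached copy


@dataclass
class Source:
    """A (fragment of a) specific version of an annotatable object."""
    sid: str
    version: str
    timestamp: str  # an xsd:dateTime string
    cached: List[CachedRepresentation] = field(default_factory=list)


@dataclass
class Annotation:
    """Central class: links a body to one or more target sources."""
    aid: str
    owner: str
    body_xml: str  # no fixed schema yet, but it must parse as XML
    targets: List[Source]

    def __post_init__(self) -> None:
        # The model requires at least one target source per annotation.
        if not self.targets:
            raise ValueError("an annotation needs at least one target source")
        # Raises xml.etree.ElementTree.ParseError if the body is not valid XML.
        ET.fromstring(self.body_xml)
```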

Initial Annotation-Body Types

In the first prototype we plan to implement only single-target annotations with the body type “Note”. From the user’s perspective these are just text notes about fragments of the document, à la comments in Word documents, but displayed only in a list or as a tooltip (as Wired Marker currently does). A balloon display, as in MS Word, can be implemented at a later stage.

In general we plan to implement the body types following the class diagram above. Recall that these body types, besides “Note”, are: color, tag (a unary relation), labeled tag (a unary relation with parameters), and binary relations. Below we present a series of instances of these body types. Implementing these instances within our tool will have a twofold effect:

  • first, it will serve the user’s convenience by providing a drop-down menu of annotations once a fragment to be annotated is selected;
  • second, it will show that reasonable types of annotations can be created within the proposed class schema.

To create an annotation, the user highlights the text and right-clicks. The creation menu should appear near the highlighted text (or on the right sub-panel of the whole panel). There the user can select the type of annotation and add other parameters when necessary. For binary relations it may be possible to highlight a second fragment using the Shift key(s).

For existing annotations, a left mouse click on the highlighted text triggers a “callout” (a rectangular box connected to the text fragment) with a short annotation description. This applies to tags and relations (see below). A right mouse click on the highlighted text opens the context menu, which contains the complete information about the annotation: its author, date and URI.

User Interface prototype

Main window view

https://trac.clarin.eu/raw-attachment/wiki/DASISH/SpecificationDocument/UI.png

Context menu

https://trac.clarin.eu/raw-attachment/wiki/DASISH/SpecificationDocument/MENU.png

REST API

Web documents exist in time; that is, different versions of a document may exist under the same link at different moments in time. We will rely on local caching of versions of annotated sources (a feature that already exists in Wired-Marker); see Scenario#Unresolvable target for an example.

All information necessary to fulfill a PUT, POST or DELETE request, such as the URI of an annotated object, is given serialized in the request body, not as request parameters in the request’s URI. If a POST (PUT, DELETE) request succeeds, it returns serialized information about the added (resp. updated, removed) resource together with a standard HTTP response code. This information includes the resource ID, the owner’s ID, a time stamp and (possibly) the list of the <sid>-s of the target sources. For the full information the user can issue a GET on the just created or updated annotation, whose ID is now known. In the case of failure the corresponding error message and error status are returned, e.g. 401 Unauthorized. Only the owner has DELETE rights.
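
As an illustration of the convention that request data travels in the body rather than in the URI, the following Python sketch builds (but does not send) a POST api/annotations request. The server root `BASE_URL`, the function name and the XML element names `annotation`, `target` and `body` are hypothetical; this document does not fix a serialization.

```python
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://example.org/"  # hypothetical server root


def build_post_annotation(target_uri: str, note: str) -> urllib.request.Request:
    """Build, without sending, a POST api/annotations request."""
    # Hypothetical element names: the specification fixes no body schema yet.
    root = ET.Element("annotation")
    ET.SubElement(root, "target").text = target_uri
    ET.SubElement(root, "body").text = note
    payload = ET.tostring(root, encoding="utf-8")  # bytes, with XML declaration

    # Everything the server needs is in the payload; the URI has no query part.
    return urllib.request.Request(
        BASE_URL + "api/annotations",
        data=payload,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
```

Note that no <uid> is passed: the principal is taken from the session, as stated earlier in this document.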

For now, the descriptions of the requests below often refer to the corresponding descriptions on the Scenario wiki page. Once the design stabilizes it will be the other way around, i.e. the Scenario page will refer to this specification document.

User realm

Resource | Description
GET api/users/uid | See Authentication in Scenario
GET api/users/info?email=<...> | See Managing a list of permissions in Scenario

Annotations

api/annotations

Resource | Description
GET api/annotations?source=<URI>&text=<text>&access=[read, write]&ns=<prefix>:<ns>&xpath=<xpath>&owner=<uid>&after=<datetime1>&before=<datetime2> | Returns the list of <aid>-s of the annotations of the annotated object located at <URI> to which the logged-in <uid> has “read” (resp. “write”) access, whose bodies contain the text <text>, and which were created between <datetime1> and <datetime2>. If the parameter “source” is omitted, all annotated objects to which <uid> has “read”/“write” access are considered. The parameter xpath allows searching over parts of the annotation body, e.g. <xpath> may be body[@type=’relation’]/relation=’contradiction’. For this one needs the URI of each namespace <ns> together with its prefix <prefix>. The default <xpath> is empty and implies no limitation. The default <datetime1> can be 01 Jan 1970, 00:00. The default <datetime2> is the current date.
POST api/annotations | Adds a new annotation by picking up its XML serialization from the request body. The XML serialization should include the URIs of the annotated objects and the annotation body (e.g. text).
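
A client might assemble the GET query described above along these lines. This is a minimal Python sketch: the function name and keyword arguments are assumptions, only a subset of the parameters is covered, and omitted parameters are simply left out of the URI so that the server-side defaults apply.

```python
from typing import Optional
from urllib.parse import urlencode


def annotations_query(
    source: Optional[str] = None,
    text: Optional[str] = None,
    access: Optional[str] = None,
    after: Optional[str] = None,
    before: Optional[str] = None,
) -> str:
    """Build the relative URI for GET api/annotations."""
    if access is not None and access not in ("read", "write"):
        raise ValueError("access must be 'read' or 'write'")
    params = {
        "source": source,
        "text": text,
        "access": access,
        "after": after,
        "before": before,
    }
    # Keep only the parameters the caller actually supplied.
    given = {name: value for name, value in params.items() if value is not None}
    query = urlencode(given)  # percent-encodes values such as URIs
    return "api/annotations" + ("?" + query if query else "")
```

The source URI is percent-encoded because it usually contains characters such as “:”, “/” and “#” that would otherwise be interpreted as part of the request URI itself.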

api/annotations/<aid>

It is assumed that if the logged-in user <uid> has no “read” access to <aid>, then GET methods on URI-s of the form api/annotations/<aid> return the error status 401 Unauthorized, or similar. The same happens with PUT, POST and DELETE methods on URI-s of the form api/annotations/<aid> if the logged-in user <uid> has no “write” access to <aid>.

The table below describes the behavior of each pair (method, URI) when user <uid> has authorized access to <aid>. Here “authorized access” means that <uid> has “read” access for GET methods and “write” access for PUT, POST and DELETE methods.

Resource | Description
GET api/annotations/<aid> | Returns the serialized annotation that has this <aid>.
GET api/annotations/<aid>/body | Returns the body of <aid>. It includes the body and some meta-data (the owner, date of creation, the URI-s of the target sources, the lists of readers and writers). Does not include the list of notebooks to which this annotation belongs.
GET api/annotations/<aid>/sources | Returns the list of the <sid>-s of all the target sources of <aid>.
GET api/annotations/<aid>/notebooks | Returns the list of the <nid>-s and the names of all the notebooks that contain <aid>.
DELETE api/annotations/<aid> | Removes <aid> and all its target sources from the database. Returns the serialized representation of the removed <aid> with the message “the following annotation has been removed” or similar.
PUT api/annotations/<aid> | Updates the annotation with <aid>. E.g. it is used when <uid> wants to correct typos in the annotation body AND change annotated fragments. (See PUT api/annotations/<aid>/body for correcting the body only.) The serialized representation of the updated annotation is given in the request body.
PUT api/annotations/<aid>/body | Updates the body of the annotation <aid>. Used e.g. for correcting typos in the text part. The updated annotation body is given in the body of the request.
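
The access rules for api/annotations/<aid> can be summarized in a small decision function. The Python sketch below is one possible reading, not the server implementation. It rests on three assumptions that this document does not state outright: the owner has full access, “write” access implies “read” access, and the rule “only the owner has DELETE rights” takes precedence over plain “write” access. All names are hypothetical.

```python
from typing import Set

# Access right required per HTTP method, as described for api/annotations/<aid>.
REQUIRED_ACCESS = {"GET": "read", "PUT": "write", "POST": "write"}


def is_authorized(method: str, uid: str, owner: str,
                  readers: Set[str], writers: Set[str]) -> bool:
    """Decide whether <uid> may apply `method` to an annotation."""
    if uid == owner:
        return True  # assumption: the owner may do everything
    if method == "DELETE":
        return False  # only the owner has DELETE rights
    needed = REQUIRED_ACCESS[method]
    if needed == "read":
        # Assumption: whoever may write may also read.
        return uid in readers or uid in writers
    return uid in writers


def status_for(method: str, uid: str, owner: str,
               readers: Set[str], writers: Set[str]) -> int:
    """Map the decision to an HTTP status: 200 if allowed, else 401."""
    return 200 if is_authorized(method, uid, owner, readers, writers) else 401
```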

Sources

A source represents (a specific fragment of) a specific version of an annotatable object. For instance, if an annotatable object is a web page that has 3 versions and users have annotated versions 1 and 3, then there are 2 sources in the database that correspond to this web page. Naturally, these sources represent versions 1 and 3.

Note that access to the whole document with <aoid> is possible via its <sid>=<aoid>#, with an empty fragment descriptor.

Adding sources to the database and removing them is the responsibility of the database management system. In fact, adding a source is a “side effect” of creating an annotation on a certain URI. Moreover, if the source with <sid>=<aoid>@<vid>#XXX is added to the DB, then the source <sid>=<aoid>@<vid># must be added as well, unless it is already in the DB.
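
The rule that a fragment source always brings its whole-document source with it can be sketched as follows. This is an illustrative Python sketch only: the database is modeled as a plain set of <sid> strings, and the function names are hypothetical.

```python
from typing import Set


def whole_document_sid(sid: str) -> str:
    """Return the <sid> with an empty fragment descriptor, e.g. aoid@vid#."""
    base, separator, _fragment = sid.partition("#")
    if not separator:
        raise ValueError("a <sid> must contain a '#' fragment separator")
    return base + "#"


def register_source(db: Set[str], sid: str) -> None:
    """Add a source and, as a side effect, its whole-document source."""
    db.add(sid)
    # Adding to a set is a no-op if the whole-document source already exists.
    db.add(whole_document_sid(sid))
```

For example, registering `http://example.org/page@v1#para3` in an empty set leaves it holding both that <sid> and `http://example.org/page@v1#`.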

If all the annotations that refer to a certain source are deleted, then the DB managing part deletes this source from the DB. A read-only REST API for inspecting Sources (incl. fragments) is needed.

Cached representations are managed by the client, so an API for creating and deleting them is necessary. It is possible to store a cached representation not only of the fragment that precisely corresponds to an annotation target source, but also of a larger fragment or even of the entire annotatable object.

api/sources

Resource | Description
GET api/sources?uri=<aoid>&maxSources=<number> | Returns the list of the <sid>-s of all the sources referring to <aoid>, that is, the sources with <sid>-s of the form <aoid>@XXX#YYY. The length of the list is bounded by <number>. A default length (maxSources value) must be provided. Alternatively or additionally, paging may be used to list the sources. Instead of ?uri=<aoid> it may be possible to scope the request GET api/sources in other ways, for instance with ?uriprefix=URI.
GET api/sources/<sid>/versions | Returns the list of the <sid>-s (URIs) of all the “sibling” versions of <sid>=<aoid>@XXX#YYY, that is, the list of <sid>-s of the form <aoid>@ZZZ#YYY.
GET api/sources/<sid>/cached | Returns the list of meta-information of all the cached representations of <sid>. The meta-information of a cached representation includes: <cid>, MIME type, subtype (e.g. “screenshot”), size, and the ID of the tool which opens the representation.
GET api/sources/<sid>/cached/<cid>/metadata | Returns the meta-information of <cid> if it exists.
GET api/sources/<sid>/cached/<cid>/content | Returns the file that is the cached representation with <cid> if it exists.
POST api/sources/<sid>/cached | Adds a new cached representation of <sid>, taking it from the request body. This is a multipart POST whose request body consists of a description containing the metadata specified by the Cached Representation realization class (e.g. screenshot) and a single file (multiple files must be archived). The description has the following form: <cachedrepresentation-description><mime>multipart/related</mime><tool>ToolID01</tool><type>MHTML</type></cachedrepresentation-description>
DELETE api/sources/<sid>/cached/<cid> | Removes the cached representation <cid>, given in the body of the request, from the list of cached representations of <sid>. It is removed from the database as well, unless other references to this representation remain.
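
The description part of the multipart POST could be produced as in the Python sketch below, which reproduces the example description given in the table. The element names come from that example; the function name and parameter names are hypothetical, and assembling the multipart request itself is not shown.

```python
import xml.etree.ElementTree as ET


def cached_description(mime: str, tool: str, rep_type: str) -> str:
    """Serialize the description part of POST api/sources/<sid>/cached."""
    root = ET.Element("cachedrepresentation-description")
    ET.SubElement(root, "mime").text = mime
    ET.SubElement(root, "tool").text = tool
    ET.SubElement(root, "type").text = rep_type
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


description = cached_description("multipart/related", "ToolID01", "MHTML")
```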

Notebooks

api/notebooks

Resource | Description
GET api/notebooks | Returns the notebooks accessible to the current user. For each notebook, attributes indicate whether it is owned by the user, whether the user can read it, and whether the user can write to it.
GET api/notebooks/owned | Returns the list of all notebooks owned by the currently logged-in user.
GET api/notebooks/<nid>/readers | Returns the list of <uid>-s who are allowed to read the annotations from the notebook.
GET api/notebooks/<nid>/writers | Returns the list of <uid>-s who can add annotations to the notebook.
GET api/notebooks/<nid>/metadata | Returns all metadata about the specified notebook <nid>, including whether it is private or not.
GET api/notebooks/<nid>?maximumAnnotations=limit&startAnnotation=offset&orderby=orderby&orderingMode=1|0 | Returns the list of all annotations (<aid>-s) contained within a notebook, with related metadata. Parameters: <nid>; optional maximumAnnotations specifies the maximum number of annotations to retrieve (default: -1, all annotations); optional startAnnotation specifies the starting point from which the annotations are retrieved (default: -1, start from the first annotation); optional orderby specifies the RDF property used to order the annotations (default: dc:created); optional orderingMode specifies whether the results are sorted in descending order (desc=1) or ascending order (desc=0) (default: 0).
PUT /notebooks/<nid> | Modifies the metadata of <nid>. The notebook’s new name must be sent in the request body.
PUT /notebooks/<nid>?annotation=<aid> | Adds the annotation <aid> to the list of annotations of <nid>.
PUT api/notebooks/<nid>/setPrivate=[true, false] | Sets the specified notebook as private or not private.
POST api/notebooks/ | Creates a new notebook. Returns the <nid> of the created notebook in the response payload and the full URL of the notebook in a Location header of the HTTP response. The name of the new notebook can be specified by sending a specific payload.
POST api/notebooks/<nid> | Creates a new annotation in <nid>. The content of the annotation is given in the request body. In fact this is a shortcut for two actions: POST api/annotations and PUT /notebooks/<nid>?annotation=<aid>.
DELETE api/notebooks/<nid> | Deletes <nid>. The annotations stay; they just lose their connection to <nid>.
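
The paging and ordering parameters of GET api/notebooks/<nid> might behave as in the following Python sketch, which applies them to an in-memory list. It is a reading of the parameter descriptions above under stated assumptions: startAnnotation is treated as a zero-based offset, ordering is by dc:created only, and the creation times are strings in one time zone, so that they sort chronologically. The function and argument names are hypothetical.

```python
from typing import List, Tuple


def select_annotations(
    items: List[Tuple[str, str]],
    maximum: int = -1,
    start: int = -1,
    descending: bool = False,
) -> List[str]:
    """Pick <aid>-s from (aid, created) pairs as the query parameters suggest."""
    # orderingMode: 0 (default) is ascending, 1 is descending.
    ordered = sorted(items, key=lambda pair: pair[1], reverse=descending)
    # startAnnotation: -1 (default) means "start from the first annotation".
    offset = 0 if start < 0 else start
    # maximumAnnotations: -1 (default) means "all annotations".
    if maximum < 0:
        chosen = ordered[offset:]
    else:
        chosen = ordered[offset:offset + maximum]
    return [aid for aid, _created in chosen]
```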

APPENDIX 1

For Appendix 1 please see the DOC file of this document. Note that the DOC file is obsolete except for its Appendix.

https://trac.clarin.eu/raw-attachment/wiki/DASISH/SpecificationDocument/DASISH-Annotator-1.1-snapshot.docx
