DASISH WEB-ANNOTATOR

TLA

This document specifies a browser extension for annotating web-documents. We present the class structure of the implementation, describe the functionality from the user perspective and define the REST API.

Document version: 1.1

Date: 14 April 2013

Authors: Olha Shkaravska, Przemek Lenkiewicz, Menzo Windhouwer, Twan Goosen, Daan Broeder

Contents

  1. Technical Summary
  2. Model
  3. Initial Annotation-Body Types
  4. User Interface prototype
    1. Main window view
    2. Context menu
  5. REST API
    1. User realm
      1. Alternative
    2. Annotations
      1. api/annotations
      2. api/annotations/<aid>
    3. Sources
      1. api/sources
    4. Notebooks
      1. api/notebooks
  6. APPENDIX 1

Technical Summary

The aim of this document is to give specifications for a web-annotating tool, which is to be developed within the DASISH project. The tool is a browser extension that allows users to annotate fragments of web documents with tags, colors and text notes. The annotatable fragments may be texts and, at later stages of development, graphical objects as well.

Initially the tool will support annotating web pages only. Later we plan to extend it to annotate web documents generated by linguistic software, e.g. EAF files created by ELAN (MPI Nijmegen) or lexical entries created by LEXUS (MPI Nijmegen). We do not want to limit annotatable objects to those generated by DASISH participants and plan to include external linguistic software in our case study.

The heart of the class schema of the project is the class “Annotation”. An object of the Annotation class is in the “target” relation with one or more objects of the class “Source”. The semantics of an Annotation object is defined in its attribute “Body”. There are several types of annotation bodies, which express the variety of ways to annotate documents, from marking fragments with simple text tags or colors to attaching arbitrary text notes.

User
    A person, a group, or “everyone” (public).
<uid>
    Principal identifier (a principal is either a user or a group).
<aid>
    Annotation identifier.
<aoid>
    URI of an annotatable object outside the DB.
<vid>
    Version identifier, which may be a number, a time stamp or both, depending on the origin of the document.
<fid>
    A string that describes a fragment within a given document. Examples: <xpath> for XML documents, coordinates for graphics.
<sid>
    The identifier of an annotated source: <aoid>@<vid>#<fid>. If <fid> is empty then <sid> refers to the whole document given by <aoid>@<vid>. The default <vid> (when omitted) corresponds to the latest version. The abbreviation <sid> stands for “source identifier”.
<datetime>
    Date and time, including time zone, as defined in http://www.w3.org/TR/xmlschema-2/#dateTime
<cid>
    Cached Representation identifier.
<URI>
    URI, as defined in http://tools.ietf.org/html/rfc3986
<prefix>
    The prefix of a namespace.
<text>
    Some text.

An example of <sid> is given by the URI

http://tla.mpi.nl/#xpointer(//div[id='post-1157']/p/substring(.,33,3))

Here the part http://tla.mpi.nl/ is an <aoid> and the part xpointer(//div[id='post-1157']/p/substring(.,33,3)) is a <fid> . Since <vid> is not given, the <sid> refers to the latest version of the resource located at http://tla.mpi.nl/ .
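The decomposition of a <sid> into <aoid>, <vid> and <fid> can be sketched as follows. This is only an illustration: the names SourceId and parse_sid are hypothetical, and the rule that the last “@” before the “#” separates <aoid> from <vid> is an assumption that this document does not state.

```python
from typing import NamedTuple, Optional


class SourceId(NamedTuple):
    aoid: str            # URI of the annotatable object
    vid: Optional[str]   # version identifier; None means "latest version"
    fid: str             # fragment descriptor; "" means "whole document"


def parse_sid(sid: str) -> SourceId:
    """Split <aoid>@<vid>#<fid> into its parts (hypothetical helper)."""
    # Everything after the first '#' is taken as the fragment descriptor.
    head, _, fid = sid.partition("#")
    # Assumption: the last '@' in the remaining part separates <aoid> from <vid>.
    # URIs that contain user-info ("user@host") would need a stricter rule.
    aoid, at, vid = head.rpartition("@")
    if not at:
        # No '@' present: the version is omitted, so the latest version is meant.
        aoid, vid = head, ""
    return SourceId(aoid, vid or None, fid)


example = parse_sid(
    "http://tla.mpi.nl/#xpointer(//div[id='post-1157']/p/substring(.,33,3))"
)
```

For the example URI above, the sketch yields http://tla.mpi.nl/ as <aoid>, no <vid> (latest version), and the xpointer expression as <fid>.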

Passing a <uid> as a parameter in the description of the REST service should never be required as a means of identification, because the active principal is known from the session via the Shibboleth identification procedure.

An owner is either the principal who has created the annotation or a principal to whom the ownership has been assigned.

Model

The model schema is based on the following interfaces and classes:

  • class Source represents (a specific fragment of) a specific version of an annotatable object; it contains information about this version, such as a time stamp and the list of references to cached representations;
  • class Annotation contains the reference to the annotation’s body (which contains the list of sources it annotates), the name of the owner, and the lists of “readers” and “writers”;
  • interface Cached representation is a generic interface to be implemented by different representations of annotatable resources, such as serialized ones (e.g. XML), media files and screenshots;
  • interface Body (of annotation) can be text, “like”, color, relation, etc.; it contains the reference to the annotation.
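As a rough illustration of how these classes and interfaces relate, the following sketch expresses them as Python data classes. All attribute names, and the concrete classes Screenshot and Note, are assumptions made for the example; the target sources are attached to the annotation here, following the XML serializations in this document rather than the UML diagram.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


class CachedRepresentation(Protocol):
    """Generic interface for cached representations (XML, media file, screenshot)."""
    cid: str
    mime_type: str


@dataclass
class Screenshot:
    """One possible realization of CachedRepresentation (hypothetical)."""
    cid: str
    mime_type: str = "image/png"


class Body(Protocol):
    """Generic interface for annotation bodies (text, color, relation, ...)."""
    body_type: str


@dataclass
class Note:
    """A plain text body (hypothetical realization of Body)."""
    text: str
    body_type: str = "Note"


@dataclass
class Source:
    """(A fragment of) a specific version of an annotatable object."""
    sid: str
    version: str
    cached: List[CachedRepresentation] = field(default_factory=list)


@dataclass
class Annotation:
    aid: str
    owner: str
    body: Body
    targets: List[Source]
    readers: List[str] = field(default_factory=list)
    writers: List[str] = field(default_factory=list)


source = Source(sid="http://tla.mpi.nl/#XX", version="1.5",
                cached=[Screenshot(cid="CID01")])
annotation = Annotation(aid="AID01", owner="UID001", body=Note("An example note"),
                        targets=[source], readers=["UID001"])
```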

The DASISH model (UML)

We propose the following XML-serializations.

<?xml version="1.0" encoding="UTF-8"?>
<source xmlns="http://www.dasish.eu/ns/addit"
    xml:id="AOID01V03XX"
    lastModified="2011-10-10T12:00:00-05:30">
    <URI>https://www.dasish.eu/annotationDB/AOID01V03#XX</URI>
    <versionString>3.0</versionString>
    <versions>
        <version ref="AOID01V03XX"/>
        <version ref="AOID01V02XX"/>
        <version ref="AOID01V01XX"/>
    </versions>
    <cachedRepresentations>
        <cachedRepresentation xml:id="CID01V0301X"
            URI="https://www.dasish.eu/annotationDB/CID01@V0301#X"
            mimeType="multipart/related" tool="ToolID01" type="MHTML"/>
        <!-- the subclass serialization could lead to some extra info here -->
        <cachedRepresentation xml:id="CID01V0302"
            URI="https://www.dasish.eu/annotationDB/CID01@V0301#X"
            mimeType="image/png" tool="ToolID02" type="screenshot"/>
        <!-- the subclass serialization could lead to some extra info here, e.g. the screenshot might be for a specific annotation
             as it shows only the part of the Source that was visible to that user -->
    </cachedRepresentations>
</source>

Note that the MIME type for MHTML is taken from Wikipedia, but there seems to be some discussion about this approach.

http://en.wikipedia.org/wiki/MHTML
http://stackoverflow.com/questions/31250/content-type-for-mht-files

An annotation whose body is a binary relation (in this example “implies”). The intended meaning of the following example is that source1 implies source2.

<?xml version="1.0" encoding="UTF-8"?>
<annotation xmlns="http://www.dasish.eu/ns/addit" xml:id="AID01"
    URI="https://www.clarin.eu/annotationDB/AID01"
    timeStamp="2012-10-10T12:00:00-05:30">
    <owner ref="UID001"/>
    <headline>Example relation body</headline> 
    <body type="relation">
        <relation>implies</relation>
        <from ref="SID01V1XX"/>
        <to ref="SID02V1XX"/>
    </body>
    <targetSources>
        <targetSource xml:id="SID01V1XX">
            <URI>http://tla.mpi.nl#XX</URI>
            <versionString>1.5</versionString>
        </targetSource>
        <targetSource xml:id="SID02V1XX">
            <URI>http://tla.mpi.nl#XX</URI>
            <versionString>2.0</versionString>
        </targetSource>
    </targetSources>
    <readers>
        <reader ref="UID001"/>
        <reader ref="UID234"/>
        <reader ref="UID345"/>
    </readers>
    <writers>
        <writer ref="UID001"/>
        <writer ref="UID234"/>
    </writers>
    <notebooks>
        <notebook ref="NID1"/>
        <notebook ref="NID2"/>
    </notebooks>
</annotation>

An annotation whose body is “Note” (see the section about the types of annotations)

<?xml version="1.0" encoding="UTF-8"?>
<annotation xmlns="http://www.dasish.eu/ns/addit" xml:id="AID02"
    URI="https://www.clarin.eu/annotationDB/AID02"
    timeStamp="2012-12-12T12:12:12-00:00">
    <owner ref="UID001"/>
    <headline>Example text annotation</headline> <!-- schematron checks the length <= 100 -->
    <body type="Note" 
        xmlns:xhtml="http://www.w3.org/1999/xhtml" xml:lang="en">
        <xhtml:p></xhtml:p>
    </body>
    <targetSources>
        <targetSource xml:id="SID1V1XX">
            <URI>www.google.nl</URI>
            <versionString>1.5</versionString>
        </targetSource> 
    </targetSources>
    <readers>
        <reader ref="UID001"/>
        <reader ref="UID123"/>
        <reader ref="UID124"/>
    </readers>
    <writers>
        <writer ref="UID001"/>
        <writer ref="UID123"/>
    </writers>
    <notebooks>
        <notebook ref="NID1"/>
        <notebook ref="NID2"/>
    </notebooks>
</annotation>

Note that “full” XML representations as above may be returned by the corresponding GET methods. When we want to POST a new annotation, we know less about it: for instance, it does not have an assigned identifier yet. We propose the following serialization of a new annotation:

<?xml version="1.0" encoding="UTF-8"?>
<annotation xmlns="http://www.dasish.eu/ns/addit" xml:id="AID05"
    URI="https://www.clarin.eu/annotationDB/AID05"
    timeStamp="2012-12-12T12:12:12-00:00">
    <owner ref="UID001"/>
    <headline>Example new annotation</headline> <!-- schematron checks the length <= 100 -->
    <body type="relation">
        <relation>implies</relation>
        <from ref="tempSID1"/>
        <to ref="tempSID2"/>
    </body>
    <targetSources>
        <targetSource xml:id="tempSID1">
            <URI>http://tla.mpi.nl#XX</URI>
            <versionString>1.5</versionString>
        </targetSource>
        <targetSource xml:id="tempSID2">
            <URI>http://tla.mpi.nl#XX</URI>
            <versionString>2.0</versionString>
        </targetSource>
    </targetSources>
    <readers>
        <reader ref="UID001"/>
    </readers>
    <writers>
        <writer ref="UID001"/>
    </writers>
    <notebooks>
    </notebooks>
</annotation>
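In a relation body, the ref attributes of <from> and <to> point at the xml:id values of the target sources. A client or server might check this before accepting a serialization. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name unresolved_refs and the shortened sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NS = {"a": "http://www.dasish.eu/ns/addit"}
# ElementTree exposes the xml:id attribute under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """
<annotation xmlns="http://www.dasish.eu/ns/addit">
  <body type="relation">
    <relation>implies</relation>
    <from ref="tempSID1"/>
    <to ref="tempSID2"/>
  </body>
  <targetSources>
    <targetSource xml:id="tempSID1"><URI>http://tla.mpi.nl#XX</URI></targetSource>
    <targetSource xml:id="tempSID2"><URI>http://tla.mpi.nl#XX</URI></targetSource>
  </targetSources>
</annotation>
"""


def unresolved_refs(annotation_xml: str) -> list:
    """Return the body refs that do not match any targetSource xml:id."""
    root = ET.fromstring(annotation_xml)
    target_ids = {
        target.get(XML_ID)
        for target in root.findall("a:targetSources/a:targetSource", NS)
    }
    body = root.find("a:body", NS)
    if body is None:
        return []
    # Collect the ref attribute of every direct child of <body> that has one.
    refs = [child.get("ref") for child in body if child.get("ref") is not None]
    return [ref for ref in refs if ref not in target_ids]


dangling = unresolved_refs(SAMPLE)
```

An empty result means that every reference in the body resolves to one of the listed target sources.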

Initial Annotation-Body Types

In the first prototype we plan to implement only 1-target annotations with the body type “Note”. From the user perspective they are just text notes about fragments of the document, à la comments in Word documents, but displayed only in a list or as a tooltip (as Wired-Marker currently does). A balloon display as in MS Word can be implemented at a later stage.

In general we plan to implement the body types following the class diagram above. Recall that these body types, besides “Note”, are: color, tag (a unary relation), labeled tag (a unary relation with parameters) and binary relations. Below we present a series of instances of these body types. Implementing these instances within our tool will have a twofold effect:

  • first, it will serve the user’s convenience by providing a drop-down menu of annotations once a fragment to be annotated is selected;
  • second, it will show that within the proposed class schema it is possible to create reasonable types of annotations.

To create an annotation, the user highlights the text and right-clicks. The creation menu should appear near the highlighted text (or on the right sub-panel of the whole panel). There the user can select the type of annotation and add other parameters when necessary. It may be possible to highlight the second fragment for binary relations using the Shift key(s).

For existing annotations, a left mouse click on the highlighted text triggers a “callout” (a rectangular box connected to the text fragment) with a short annotation description. This applies to tags and relations (see below). A right mouse click on the highlighted text triggers the context menu, which contains the complete information about the annotation: its author, date and URI.

User Interface prototype

Main window view

https://trac.clarin.eu/raw-attachment/wiki/DASISH/SpecificationDocument/UI.png

Context menu

https://trac.clarin.eu/raw-attachment/wiki/DASISH/SpecificationDocument/MENU.png

REST API

Remark on document versioning. Web documents exist in time, that is, different versions of a document may exist under the same URI (<aoid>) at different moments in time. In the first prototype we implement only the simplest necessary handling of web-document versions: we omit REST requests concerning versions and rely on local caching of old versions of annotated sources (which already exists as a feature in Wired-Marker).

All information necessary to fulfill a PUT, POST or DELETE request, such as the URI of an annotated object, is given in serialized form in the request body, not as request parameters in the request URI. If a POST (PUT, DELETE) method succeeds, it returns serialized information about the added (resp. updated, removed) resource together with a standard HTTP response code. The information includes the resource ID, the owner’s ID, the time stamp and (possibly) the list of the <sid>-s of the target sources. For the full information the user will use GET on the just created or updated annotation, already knowing its ID. In the case of failure the corresponding error message and error status are returned, e.g. 401 Unauthorized. Only the “owner” has DELETE rights.

User realm

Resource
    Description
GET api/user/
    Returns the logged-in user's UID and two lists of notebooks (read and write), including their IDs and rendering names.

Alternative

Proposed by Twan

Resource
    Description
GET api/user/info
    Returns the logged-in user's UID (and possibly other info).

  • Retrieval of readable and writable notebooks through a call to /api/notebooks, e.g.

Resource
    Description
GET api/notebooks
    Returns the notebooks accessible to the current user. For each notebook, attributes indicate whether it is owned by the user, whether the user can read it and whether the user can write to it.

Annotations

api/annotations

Resource
    Description
GET api/annotations?source=<URI>&text=<text>&access=[read, write]&ns=<prefix>:<ns>&xpath=<xpath>&owner=<uid>&after=<datetime1>&before=<datetime2>
    Returns the list of <aid>-s of the annotations of the annotated object located at <URI> to which the logged-in <uid> has “read” (resp. “write”) access and whose bodies contain the text <text>. Moreover, these annotations were created between <datetime1> and <datetime2>. If the parameter “source” is omitted, then all annotated objects to which <uid> has “read”/“write” access are considered. The parameter xpath allows searching over parts of the annotation body, e.g. <xpath> may be body[@type='relation']/relation='contradiction'. For this one needs the URI of the namespace <ns> represented by the prefix <prefix>. The default <xpath> is “empty” and implies no limitation. The default <datetime1> can be 01 Jan 1970, 00:00. The default <datetime2> is today.
POST api/annotations
    Adds a new annotation by picking up its XML serialization from the request body. The XML serialization should include the URIs of the annotated objects and the annotation body (e.g. text).
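A client could assemble the query string for GET api/annotations roughly as follows. This is a sketch only: the function name annotations_query and the base URL https://example.org are placeholders, and the ns and xpath parameters are left out for brevity. Omitted parameters are simply not sent, so the server-side defaults described above apply.

```python
from urllib.parse import urlencode


def annotations_query(base, source=None, text=None, access="read",
                      owner=None, after=None, before=None):
    """Build the request URI for GET api/annotations (hypothetical helper)."""
    params = {
        "source": source,
        "text": text,
        "access": access,
        "owner": owner,
        "after": after,
        "before": before,
    }
    # Drop parameters that were not given; urlencode percent-encodes the rest.
    query = urlencode({key: value for key, value in params.items()
                       if value is not None})
    return f"{base}/api/annotations?{query}"


url = annotations_query("https://example.org",
                        source="http://tla.mpi.nl/", text="verb")
```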

api/annotations/<aid>

It is assumed that if the logged-in user <uid> has no “read” access to <aid>, then GET methods over URIs of the form api/annotations/<aid> will return the error status 401 Unauthorized, or similar. The same happens with PUT, POST and DELETE methods over URIs of the form api/annotations/<aid> if the logged-in user <uid> has no “write” access to <aid>.

The table below describes the behavior of the pair (method, URI) when the user <uid> has authorized access to <aid>. Here “authorized access” means that <uid> has “read” access for GET methods and “write” access for PUT, POST and DELETE methods.

Resource
    Description
GET api/annotations/<aid>
    Returns the serialized annotation that has this <aid>.
GET api/annotations/<aid>/body
    Returns the body of <aid>. It includes the body and some metadata (the owner, the date of creation, the URIs of the target sources, the lists of readers and writers). It does not include the list of notebooks to which this annotation belongs.
GET api/annotations/<aid>/sources
    Returns the list of the <sid>-s of all the target sources of <aid>.
GET api/annotations/<aid>/notebooks
    Returns the list of the <nid>-s and the names of all the notebooks to which <aid> belongs.
DELETE api/annotations/<aid>
    Removes <aid> and all its target sources from the database. Returns the serialized representation of the removed <aid> with the message “the following annotation has been removed” or similar.
PUT api/annotations/<aid>
    Updates the annotation with <aid>. It is used, for example, when <uid> wants to correct typos in the annotation body and change the annotated fragments. (See PUT api/annotations/<aid>/body for correcting the body only.) The serialized representation of the updated annotation is given in the request body.
PUT api/annotations/<aid>/body
    Updates the body of the annotation <aid>. Used, for example, for correcting typos in the text part. The updated annotation body is given in the body of the request.
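The access rules for api/annotations/<aid> can be summarized in a small decision function. The sketch below is an illustration under assumptions: the function name status_for is hypothetical, 200 stands in for any success code, 405 for unsupported methods is not specified in this document, and the owner-only rule for DELETE is taken from the general remarks at the start of the REST API section.

```python
READ_METHODS = {"GET"}
WRITE_METHODS = {"PUT", "POST"}


def status_for(method, uid, owner, readers, writers):
    """Return a status code for a request on api/annotations/<aid>."""
    if method in READ_METHODS:
        # GET requires "read" access.
        return 200 if uid in readers else 401
    if method == "DELETE":
        # Only the owner has DELETE rights.
        return 200 if uid == owner else 401
    if method in WRITE_METHODS:
        # PUT and POST require "write" access.
        return 200 if uid in writers else 401
    # Assumption: any other method is rejected as not allowed.
    return 405


# Principals taken from the relation-body example earlier in this document.
OWNER = "UID001"
READERS = ["UID001", "UID234", "UID345"]
WRITERS = ["UID001", "UID234"]

reader_get = status_for("GET", "UID345", OWNER, READERS, WRITERS)
reader_put = status_for("PUT", "UID345", OWNER, READERS, WRITERS)
```

With these lists, UID345 may read the annotation but receives 401 when trying to update it.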

Sources

A source represents (a specific fragment of) a specific version of an annotatable object. For instance, if an annotatable object is a web page that has 3 versions and users have annotated versions 1 and 3, then there are 2 sources in the database that correspond to the web page. Naturally, these sources represent versions 1 and 3.

Note that access to the whole document with <aoid> is possible via its <sid>=<aoid>#, with empty fragment descriptor.

Adding sources to the database and removing them is the responsibility of the database management system. In fact, adding a source is a “side effect” of creating an annotation on a certain URI. Moreover, if the source with <sid>=<aoid>@<vid>#XXX is added to the DB, then the source <sid>=<aoid>@<vid># must be added as well, unless it is already in the DB.

If all the annotations that refer to a certain source are deleted, then the DB managing part deletes this source from the DB. A read-only REST API for inspecting Sources (incl. fragments) is needed.
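The life cycle of sources described above can be illustrated with a small in-memory registry. This is a sketch, not the intended database design: the class SourceRegistry is hypothetical, and the rule that a whole-document source is kept for as long as any fragment source of the same document version remains is an assumption that this document does not settle.

```python
class SourceRegistry:
    """Tracks which annotations refer to which sources (illustration only)."""

    def __init__(self):
        self.refs = {}  # <sid> -> set of <aid>-s that target it

    @staticmethod
    def whole_document(sid):
        # <aoid>@<vid># : the same source with an empty fragment descriptor.
        return sid.split("#", 1)[0] + "#"

    def add_annotation(self, aid, sids):
        for sid in sids:
            # Adding a source is a side effect of creating an annotation.
            self.refs.setdefault(sid, set()).add(aid)
            # The whole-document source must exist as well.
            self.refs.setdefault(self.whole_document(sid), set())

    def delete_annotation(self, aid):
        for aids in self.refs.values():
            aids.discard(aid)
        # Remove fragment sources that no annotation refers to any more.
        for sid in [s for s, a in self.refs.items()
                    if not a and not s.endswith("#")]:
            del self.refs[sid]
        # Assumption: an unreferenced whole-document source is removed only
        # when no fragment source of the same document version is left.
        for sid in [s for s, a in self.refs.items()
                    if not a and s.endswith("#")]:
            if not any(o != sid and o.startswith(sid) for o in self.refs):
                del self.refs[sid]


registry = SourceRegistry()
registry.add_annotation("AID01", ["http://tla.mpi.nl/@1#XX"])
registry.add_annotation("AID02", ["http://tla.mpi.nl/@1#XX"])
```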

Cached representations are managed by the client, therefore a creation and deletion API is necessary. It is possible to store the cached representation not only of the fragment precisely corresponding to an annotation target source, but also of a larger fragment and even of the entire annotatable object.

api/sources

Resource
    Description
GET api/sources?uri=<aoid>&maxSources=<number>
    Returns the list of the <sid>-s of all the sources referring to <aoid>, that is the sources with <sid>-s of the form <aoid>@XXX#YYY. The length of the list is bounded by <number>. A default length (maxSources value) must be provided. Alternatively or additionally, one may use paging to list the sources. Instead of ?uri=<aoid> it may be possible to scope the request GET api/sources in other ways, for instance with ?uriprefix=URI.
GET api/sources/<sid>/versions
    Returns the list of the <sid>-s (URIs) of all the “sibling” versions of <sid>=<aoid>@XXX#YYY, that is the list of <sid>-s of the form <aoid>@ZZZ#YYY.
GET api/sources/<sid>/cached
    Returns the list of meta-information of all the cached representations of <sid>. The meta-information of a cached representation includes: <cid>, MIME type, subtype (e.g. “screenshot”), size, and the ID of the tool that opens the representation.
GET api/sources/<sid>/cached/<cid>/metadata
    Returns the meta-information of <cid> if it exists.
GET api/sources/<sid>/cached/<cid>/content
    Returns the file that is the cached representation with <cid> if it exists.
POST api/sources/<sid>/cached
    Adds a new cached representation of <sid>, taking the cached representation from the request body. This is a multipart POST whose request body consists of a description containing the metadata specified by the Cached Representation realization class (e.g. screenshot) and a single file (multiple files must be archived). The description has the following form: <cachedrepresentation-description><mime>multipart/related</mime><tool>ToolID01</tool><type>MHTML</type></cachedrepresentation-description>
DELETE api/sources/<sid>/cached/<cid>
    Removes the cached representation <cid> given in the body of the request from the list of cached representations of <sid>. It is removed from the database as well, provided that no other references to this representation remain.

Notebooks

api/notebooks

Resource
    Description
GET api/notebooks/owned
    Returns the list of all notebooks owned by the currently logged-in user.
GET api/notebooks/<nid>/readers
    Returns the list of <uid>-s who are allowed to read the annotations from the notebook.
GET api/notebooks/<nid>/writers
    Returns the list of <uid>-s who can add annotations to the notebook.
GET api/notebooks/<nid>/metadata
    Gets all metadata about the specified notebook <nid>, including whether it is private or not.
GET api/notebooks/<nid>?maximumAnnotations=limit&startAnnotation=offset&orderby=orderby&orderingMode=1|0
    Gets the list of all annotations (<aid>-s) contained within a notebook, with related metadata. Parameters: <nid>; optional maximumAnnotations specifies the maximum number of annotations to retrieve (default: -1, all annotations); optional startAnnotation specifies the starting point from which the annotations will be retrieved (default: -1, start from the first annotation); optional orderby specifies the RDF property used to order the annotations (default: dc:created); optional orderingMode specifies whether the results are sorted in descending order (1) or ascending order (0) (default: 0).
PUT api/notebooks/<nid>
    Modifies the metadata of <nid>. The new name of the notebook must be sent in the request body.
PUT api/notebooks/<nid>?annotation=<aid>
    Adds the annotation <aid> to the list of annotations of <nid>.
PUT api/notebooks/<nid>/setPrivate=[true, false]
    Sets the specified notebook as private or not private.
POST api/notebooks/
    Creates a new notebook. Returns the <nid> of the created notebook in the response payload and its full URL in a Location header of the HTTP response. The name of the new notebook can be specified by sending a specific payload.
POST api/notebooks/<nid>
    Creates a new annotation in <nid>. The content of the annotation is given in the request body. In fact this is a shortcut for two actions: POST api/annotations and PUT api/notebooks/<nid>?annotation=<aid>.
DELETE api/notebooks/<nid>
    Deletes <nid>. The annotations stay; they just lose their connection to <nid>.
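The paging and ordering parameters of GET api/notebooks/<nid> could behave as in the following sketch. It rests on assumptions: annotations are modeled as plain dictionaries, startAnnotation is read as a zero-based offset, and the function name page_annotations is hypothetical.

```python
def page_annotations(annotations, maximum_annotations=-1, start_annotation=-1,
                     orderby="dc:created", ordering_mode=0):
    """Return the <aid>-s of one page of a notebook's annotations."""
    # orderingMode 1 sorts in descending order, 0 in ascending order.
    ordered = sorted(annotations, key=lambda a: a[orderby],
                     reverse=(ordering_mode == 1))
    # The default -1 means "start from the first annotation".
    start = max(start_annotation, 0)
    # The default -1 means "all annotations".
    end = None if maximum_annotations < 0 else start + maximum_annotations
    return [a["aid"] for a in ordered[start:end]]


# Hypothetical notebook content; the time stamps are ISO 8601 strings,
# which sort chronologically when compared as text.
NOTEBOOK = [
    {"aid": "AID02", "dc:created": "2012-12-12T12:12:12"},
    {"aid": "AID01", "dc:created": "2012-10-10T12:00:00"},
    {"aid": "AID05", "dc:created": "2013-01-01T00:00:00"},
]

all_aids = page_annotations(NOTEBOOK)
second_page = page_annotations(NOTEBOOK, maximum_annotations=2, start_annotation=1)
```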

APPENDIX 1

For Appendix 1 please see the DOC file of this document. Note that the DOC file is obsolete except for its appendix.

https://trac.clarin.eu/raw-attachment/wiki/DASISH/SpecificationDocument/DASISH-Annotator-1.1-snapshot.docx
