Changes between Version 3 and Version 4 of DASISH/SpecificationDocument


Timestamp: 04/03/13 09:40:58
Author: Przemek

== Technical Summary ==

The aim of this document is to give the specifications for a web-annotation tool to be developed within the DASISH project. The tool is a browser extension that allows users to annotate fragments of web documents with tags, colors and text notes. The annotatable fragments may be texts and, at later stages of development, graphical objects as well.

Initially, the tool will only allow users to annotate web pages. Later we plan to extend it to web documents generated by linguistic software, e.g. EAF files created by ELAN (MPI Nijmegen) or lexical entries created by LEXUS (MPI Nijmegen). We do not want to limit annotatable objects to those generated by DASISH participants, and we plan to include external linguistic software in our case study.

The heart of the project's class schema is the class “Annotation”. An object of class Annotation is in the “target” relation with one or more objects of class “Source”. The semantics of an Annotation object is defined by its attribute “Body”. There are several types of annotation bodies, expressing the variety of ways to annotate documents, from marking fragments with simple text tags or colors to attaching arbitrary text notes.

TABLE1

An example of a <sid> is given by the URI
 http://tla.mpi.nl/#xpointer(//div[id='post-1157']/p/substring(.,33,3))
Here the part http://tla.mpi.nl/ is an <aoid> and the part xpointer(//div[id='post-1157']/p/substring(.,33,3)) is a <fid>. Since no <vid> is given, the <sid> refers to the latest version of the resource located at http://tla.mpi.nl/.
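
The structure of the <sid> in this example can be made concrete with a small parser. The Python sketch below is illustrative only: it assumes a <sid> has the form <aoid>[@<vid>]#<fid>, taking the `@` version separator from the <aoid>@<vid># notation used for sources in this document, and the names `SourceId` and `parse_sid` are hypothetical. The authoritative definition of the identifiers is TABLE1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceId:
    """Parts of a <sid>, assumed to have the form <aoid>[@<vid>]#<fid>."""
    aoid: str           # annotatable object identifier (the document URI)
    vid: Optional[str]  # version identifier; None means "latest version"
    fid: str            # fragment descriptor; "" means the whole document


def parse_sid(sid: str) -> SourceId:
    """Split a <sid> into <aoid>, <vid> and <fid> (hypothetical helper)."""
    # The fragment descriptor is everything after the first '#'.
    base, hash_sign, fid = sid.partition("#")
    if not hash_sign:
        raise ValueError(f"not a <sid>, missing '#': {sid!r}")
    # Simplification: a version suffix follows the last '@' of the base part,
    # so an <aoid> that itself contains '@' (e.g. user@host) is not supported.
    aoid, at_sign, vid = base.rpartition("@")
    if not at_sign:
        return SourceId(aoid=base, vid=None, fid=fid)
    return SourceId(aoid=aoid, vid=vid, fid=fid)


example = parse_sid(
    "http://tla.mpi.nl/#xpointer(//div[id='post-1157']/p/substring(.,33,3))"
)
print(example.aoid, example.vid)  # http://tla.mpi.nl/ None
```

A real implementation would need an unambiguous version separator, since `@` may legitimately occur in URIs.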

<uid> is not mentioned explicitly below as a parameter in the description of the REST service, because it is known from the session via the “Shibboleth” identification procedure.

An owner is either the principal who has created the annotation or a principal to whom the ownership has been assigned.

== Class Schema ==

The schema is based on the following interfaces and classes:
* class Source represents (a specific fragment of) a specific version of an annotatable object; it contains information about this version, such as a time stamp and the list of references to cached representations;
* class Annotation contains a reference to the annotation’s body and the list of sources that it annotates, as well as the name of the owner and the lists of “readers” and “writers”;
* interface Cached representation is a generic interface to be implemented by different representations of annotatable resources, such as serialized ones (e.g. serialized as XML), media files and screenshots;
* interface Body (of an annotation) covers the different kinds of bodies (text, “like”, color, relation, etc.); it contains the reference to the annotation.

UML DIAGRAM
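
To make the class schema concrete, here is a minimal Python sketch of the classes and interfaces listed above. It is one possible reading, not the project's implementation: the field names, the `Screenshot`, `NoteBody` and `RelationBody` classes, the plain strings used for principals and the sample XPointers are assumptions made for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


class CachedRepresentation(ABC):
    """Generic interface for cached copies of an annotatable resource."""

    @abstractmethod
    def mime_type(self) -> str:
        """MIME type of the stored copy."""


@dataclass
class Screenshot(CachedRepresentation):
    """Hypothetical implementation: a PNG screenshot of the source."""
    png_bytes: bytes

    def mime_type(self) -> str:
        return "image/png"


@dataclass
class Source:
    """(A fragment of) a specific version of an annotatable object."""
    sid: str             # <aoid>[@<vid>]#<fid>
    timestamp: datetime  # time stamp of this version
    cached: List[CachedRepresentation] = field(default_factory=list)


class Body(ABC):
    """Generic interface for annotation bodies (note, color, tag, relation, ...)."""
    annotation: Optional[Annotation] = None  # back-reference to the annotation


@dataclass
class NoteBody(Body):
    text: str


@dataclass
class RelationBody(Body):
    """Binary relation; assumed to hold from the first target to the second."""
    label: str


@dataclass
class Annotation:
    aid: str
    owner: str
    targets: List[Source]  # the "target" relation: one or more sources
    body: Body
    readers: List[str] = field(default_factory=list)
    writers: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.targets:
            raise ValueError("an annotation needs at least one target source")
        self.body.annotation = self  # keep the Body -> Annotation reference consistent


# Example: "source1 implies source2" as a binary-relation annotation.
s1 = Source(sid="http://tla.mpi.nl/#xpointer(/p[1])", timestamp=datetime(2013, 4, 3))
s2 = Source(sid="http://tla.mpi.nl/#xpointer(/p[2])", timestamp=datetime(2013, 4, 3))
implies = Annotation(aid="a1", owner="u1", targets=[s1, s2], body=RelationBody(label="implies"))
```

Keeping `Body` and `CachedRepresentation` abstract leaves room for the further body types and representations mentioned in the text without changing `Annotation` or `Source`.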

We propose the following XML serializations.

XML1

Note that the MIME type for MHTML is taken from Wikipedia, but there seems to be some discussion about this approach:
* http://en.wikipedia.org/wiki/MHTML
* http://stackoverflow.com/questions/31250/content-type-for-mht-files

An annotation whose body is a binary relation (in this example “implies”): the intended meaning of the following example is that source1 implies source2.

XML2

An annotation whose body is a “Note” (see the section “Initial Annotation-Body Types”):

XML3

Note that “full” XML representations such as those above may be returned by the corresponding GET methods. When we POST a new annotation, less is known about it: for instance, it does not have an assigned identifier yet. We propose the following serialization of a new annotation:

XML4
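
As an illustration of what “less is known” means for a new annotation, the Python sketch below builds a serialization that carries only the target <sid>s and a “Note” body, with no identifier and no time stamp. The element and attribute names (`newAnnotation`, `targetSources`, `source`, `sid`, `body`, `type`) are hypothetical; the actual serialization is the one given in XML4. The owner is left out on the assumption that the server takes it from the session <uid>.

```python
import xml.etree.ElementTree as ET
from typing import List


def serialize_new_annotation(target_sids: List[str], note_text: str) -> str:
    """Serialize an annotation that has not been stored yet (hypothetical schema).

    No identifier and no time stamp are included: the server assigns them.
    The owner is omitted as well, assuming it is taken from the session <uid>.
    """
    root = ET.Element("newAnnotation")
    targets = ET.SubElement(root, "targetSources")
    for sid in target_sids:
        ET.SubElement(targets, "source", {"sid": sid})
    body = ET.SubElement(root, "body", {"type": "Note"})
    body.text = note_text
    return ET.tostring(root, encoding="unicode")


print(serialize_new_annotation(["http://tla.mpi.nl/#"], "Check this date."))
```

ElementTree escapes the quotes and brackets that XPointer fragment descriptors may contain, so <sid>s can be passed as attribute values unchanged.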

== Initial Annotation-Body Types ==

In the first prototype we plan to implement only single-target annotations with the body type “Note”. From the user's perspective they are just text notes about fragments of the document, similar to comments in Word documents, but displayed only in a list or as a tooltip (as Wired Marker currently does). Balloon display, as in MS Word, can be implemented at a later stage.

In general, we plan to implement the body types following the class diagram above. Recall that these body types, besides “Note”, are: color, tag (a unary relation), labeled tag (a unary relation with parameters) and binary relations. Below we present a series of instances of these body types. Implementing these instances within our tool will have a two-fold effect:
* first, it will serve the user’s convenience by providing a drop-down menu of annotations once a fragment to be annotated is selected;
* second, it will show that reasonable types of annotations can be created within the proposed class schema.

To create an annotation, the user highlights the text and right-clicks it. The creation menu should appear near the highlighted text (or in the right sub-panel of the main panel). There the user can select the type of annotation and add other parameters when necessary. For binary relations, it may be possible to highlight the second fragment using the Shift key.

For existing annotations, a left click on the highlighted text opens a “callout” (a rectangular box connected to the text fragment) with a short description of the annotation. This applies to tags and relations (see below). A right click on the highlighted text opens the context menu, which contains the complete information about the annotation: its author, date and URI.
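
The split between the short callout and the full context-menu information can be sketched as two small functions over a client-side view of an existing annotation. `AnnotationView`, `callout_text` and `context_menu_info` are hypothetical names, and the 40-character limit for a callout is an arbitrary assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass
class AnnotationView:
    """Hypothetical client-side view of an existing annotation."""
    uri: str
    author: str
    created: datetime
    body_type: str   # e.g. "tag" or a relation name
    body_value: str  # tag text, relation label, ...


def callout_text(view: AnnotationView, max_len: int = 40) -> str:
    """Short description for the left-click callout, at most max_len characters."""
    text = f"{view.body_type}: {view.body_value}"
    if len(text) <= max_len:
        return text
    return text[: max_len - 1] + "…"  # truncate and mark the cut


def context_menu_info(view: AnnotationView) -> Dict[str, str]:
    """Complete information for the right-click context menu: author, date and URI."""
    return {
        "author": view.author,
        "date": view.created.isoformat(timespec="seconds"),
        "URI": view.uri,
    }
```

Keeping both functions on the same view object means the callout and the context menu cannot drift apart in what they report.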

== User Interface prototype ==

=== Main window view ===

UI

=== Context menu ===

MENU

== REST API ==

Remark on document versioning. Web documents exist in time, that is, different versions of a document may exist under the same URI (<aoid>) at different moments in time. In the first prototype we implement only the simplest necessary handling of web-document versions: we omit REST requests concerning versions and rely on local caching of old versions of annotated sources (a feature that already exists in Wired Marker).

All information necessary to fulfill a PUT, POST or DELETE request, such as the URI of an annotated object, is given serialized in the request body, not as parameters in the request URI. If a POST (PUT, DELETE) request succeeds, it returns serialized information about the added (resp. updated, removed) resource together with a standard HTTP response code. This information includes the resource ID, the owner's ID, a time stamp and (possibly) the list of <sid>s of the target sources. For the full information, the user performs a GET on the just created or updated annotation, whose ID is then known. In the case of failure, the corresponding error message and error status are returned, e.g. 401 Unauthorized. Only the owner has DELETE rights.
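
The success response described above carries only a summary of the affected resource. The following Python sketch shows one way a server could assemble it; the dictionary standing in for the serialization, the key names, `StoredAnnotation`, `mutation_response` and the status codes (201 Created for POST, 200 OK for PUT and DELETE) are all assumptions, since the specification only asks for a serialized summary and “a standard HTTP response code”.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Assumed status codes: the specification only requires "a standard HTTP response code".
SUCCESS_STATUS = {"POST": 201, "PUT": 200, "DELETE": 200}


@dataclass
class StoredAnnotation:
    """Hypothetical server-side record, reduced to what the summary needs."""
    aid: str
    owner: str
    target_sids: List[str]


def mutation_response(method: str, ann: StoredAnnotation) -> Tuple[int, Dict[str, object]]:
    """Status code and summary returned after a successful POST, PUT or DELETE."""
    if method not in SUCCESS_STATUS:
        raise ValueError(f"not a modifying method: {method}")
    summary: Dict[str, object] = {
        "id": ann.aid,
        "owner": ann.owner,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "targets": list(ann.target_sids),  # the list of target <sid>s
    }
    return SUCCESS_STATUS[method], summary
```

A client that needs the full annotation would then issue a GET on api/annotations/<aid> with the returned `id`.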

=== Annotations ===

TABLE 5

api/annotations/<aid>
It is assumed that, if the logged-in user <uid> has no “read” access to <aid>, then GET methods on URIs of the form api/annotations/<aid>[/…] return the error status 401 Unauthorized, or similar. The same happens if the logged-in user <uid> has no “write” access to <aid> and uses the PUT, POST or DELETE methods on URIs of the form api/annotations/<aid>[/…].

The table below describes the behavior of each (method, URI) pair when user <uid> has authorized access to <aid>. Here “authorized access” means that <uid> has “read” access for GET methods and “write” access for PUT, POST and DELETE methods.

TABLE6
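
The access rules for api/annotations/<aid>[/…] stated above can be summarized as a single check. In this Python sketch, `AccessInfo` and `check_access` are hypothetical names, and it assumes two things the specification does not state explicitly: that the owner implicitly has read and write access, and that “write” access implies “read” access. DELETE is restricted to the owner, as required in the REST API section.

```python
from dataclasses import dataclass, field
from typing import Set

UNAUTHORIZED = 401  # error status named in the specification
OK = 200            # assumed status for an authorized request


@dataclass
class AccessInfo:
    """Hypothetical access data of one annotation <aid>."""
    owner: str
    readers: Set[str] = field(default_factory=set)
    writers: Set[str] = field(default_factory=set)


def check_access(uid: str, method: str, acl: AccessInfo) -> int:
    """Status for <uid> calling `method` on api/annotations/<aid>[/...]."""
    is_owner = uid == acl.owner
    can_write = is_owner or uid in acl.writers   # assumption: the owner can write
    can_read = can_write or uid in acl.readers   # assumption: write implies read
    if method == "DELETE":
        allowed = is_owner                       # only the owner may delete
    elif method in ("PUT", "POST"):
        allowed = can_write
    elif method == "GET":
        allowed = can_read
    else:
        raise ValueError(f"unsupported method: {method}")
    return OK if allowed else UNAUTHORIZED
```

Because <uid> comes from the Shibboleth session rather than from the request, the check needs only the session user, the HTTP method and the annotation's access data.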

=== Sources ===

A source represents (a specific fragment of) a specific version of an annotatable object. For instance, if an annotatable object is a web page that has 3 versions and users have annotated versions 1 and 3, then there are 2 sources in the database that correspond to the web page; naturally, these sources represent versions 1 and 3.

Note that the whole document with <aoid> can be accessed via its <sid>=<aoid>#, i.e. with an empty fragment descriptor.

Adding sources to the database and removing them is the responsibility of the database management system. In fact, adding a source is a side effect of creating an annotation on a certain URI. Moreover, if the source with <sid>=<aoid>@<vid>#XXX is added to the DB, then the source <sid>=<aoid>@<vid># must be added as well, unless it is already in the DB.
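
The rule that a fragment source brings in its whole-document source can be expressed directly on <sid> strings. This Python sketch is hypothetical: the function names and the representation of the DB as a set of <sid>s are assumptions, and it takes the whole-document <sid> to be everything up to and including the first `#`.

```python
from typing import Set


def whole_document_sid(sid: str) -> str:
    """Return <aoid>[@<vid>]# for a <sid> of the form <aoid>[@<vid>]#<fid>."""
    base, hash_sign, _fid = sid.partition("#")
    if not hash_sign:
        raise ValueError(f"not a <sid>, missing '#': {sid!r}")
    return base + "#"


def sources_to_add(sid: str, stored: Set[str]) -> Set[str]:
    """Sources the DB must add when a new annotation targets `sid`."""
    needed = {sid, whole_document_sid(sid)}  # a set, so "<aoid>@<vid>#" is not doubled
    return needed - stored                   # skip what is already in the DB
```

When the annotation already targets the whole document, both entries of `needed` coincide and only one source is added.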

If all the annotations that refer to a certain source are deleted, then the DB management component deletes this source from the DB. A read-only REST API for inspecting sources (including fragments) is needed.
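
Conversely, deciding which sources the DB may delete amounts to finding the sources that no remaining annotation refers to. The Python sketch below is illustrative: the data layout (a set of stored <sid>s and a map from <aid> to target <sid>s) and the name `orphaned_sources` are assumed, and it also assumes that a whole-document source <aoid>@<vid># is kept as long as any of its fragments is still referenced, so that the rule from the previous paragraph keeps holding.

```python
from typing import Dict, Set


def orphaned_sources(stored: Set[str], targets_by_aid: Dict[str, Set[str]]) -> Set[str]:
    """Stored <sid>s that no remaining annotation refers to, directly or via a fragment."""
    referenced: Set[str] = set()
    for target_sids in targets_by_aid.values():
        for sid in target_sids:
            referenced.add(sid)
            # Assumption: keep <aoid>@<vid># while any of its fragments is referenced.
            referenced.add(sid.partition("#")[0] + "#")
    return stored - referenced
```

Running this after each annotation deletion, with the remaining annotations, yields exactly the sources to remove.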

Cached representations are managed by the client; therefore, an API for creating and deleting them is necessary. It is possible to store the cached representation not only of the fragment that corresponds precisely to an annotation's target source, but also of a larger fragment, or even of the entire annotatable object.

api/sources