DASISH/DiscussionPage – CLARIN Trac


<Stephanie+Olof> [2013-04-25]

Question (TLA team): How are versions and fragments represented/encoded in "internal" identifiers and URIs in Wired-Marker?

Answers:

Versioning: Wired-Marker has no real versioning system in which version numbers (or the like) are stored. Instead, as long as Auto-caching is enabled in the settings dialogue, pages with markers (i.e. roughly 'annotations') are cached, i.e. saved locally as HTML + CSS files (OS X, default path: /Users/user name/Library/Application Support/Firefox/Profiles/profile name/WiredMarker/cache/). Using the 'Cache - Open cache' function of Wired-Marker, users can thus get back to a 'snapshot' of the web page in the state it had when it was annotated, although this is only a local copy. These snapshots are identified by the URL plus the date and time range at the time the annotation was made on the web page. These data are used to create a folder structure whose naming convention includes, among other things, date and timestamp information (example: ../WiredMarker/cache/2013/04/24/11/1a6beade73b701420ce943d404afd8ff/20130424111843). The folders are in turn checked against the doc_url (example: http://dasish.eu/dasishevents/) and oid_date (example: 04/24/2013 11:18:43) fields of the marked (annotation) object in the local SQLite database, which makes it possible to map an annotated item to the locally cached version of its parent web page.
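The relation between oid_date and the cache folder in the example above could be sketched as follows. This is only an illustration in Python (Wired-Marker itself is a Firefox add-on): the function name and the url_key parameter are hypothetical, the 32-character folder name is treated as an opaque key because its derivation is not described here, and an exact timestamp match is assumed although the description above speaks of a time range.

```python
from datetime import datetime


def cache_subpath(oid_date: str, url_key: str) -> str:
    """Build the cache sub-path for a marker from its oid_date value.

    Assumptions: oid_date is formatted MM/DD/YYYY HH:MM:SS (as in the example
    above) and url_key is the opaque per-page folder name.
    """
    stamp = datetime.strptime(oid_date, "%m/%d/%Y %H:%M:%S")
    return "/".join([
        stamp.strftime("%Y/%m/%d/%H"),   # year/month/day/hour folders
        url_key,                         # per-page folder (derivation unknown)
        stamp.strftime("%Y%m%d%H%M%S"),  # snapshot folder named by timestamp
    ])


print(cache_subpath("04/24/2013 11:18:43", "1a6beade73b701420ce943d404afd8ff"))
```

For the example values this yields the same sub-path as the folder shown above, which is what allows a marker row to be matched to its snapshot.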

Fragments: As far as the representation of fragments is concerned, Olof has pointed out earlier that Wired-Marker uses the "hyperanchor" format. This format is an extended form of URL that includes additional information (position or range in a web page, modification style, etc.). It also includes a version number, which refers to the version of the hyperanchor format itself (example: #hyperanchor1.3:). The path part of a hyperanchor is expressed in XPath syntax.

'Real example' for 'hyper-anchor code' from Wired-Marker: http://dasish.eu/dasishevents/#hyperanchor1.3%3A%2F%2Fdiv%5B%40id%3D%26quot%3Bwrapper-crumb-trail%26quot%3B%5D%2Ffollowing-sibling%3A%3Ap%5B1%5D(33)(3)(eve)%26%2F%2Fdiv%5B%40id%3D%26quot%3Bwrapper-crumb-trail%26quot%3B%5D%2Ffollowing-sibling%3A%3Ap%5B1%5D(71)(3)(ect)

In URL decoded format: http://dasish.eu/dasishevents/#hyperanchor1.3://div[@id="wrapper-crumb-trail"]/following-sibling::p[1](33)(3)(eve)&//div[@id="wrapper-crumb-trail"]/following-sibling::p[1](71)(3)(ect)
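The decoding step shown above, plus a split into start and end point, could look roughly like the sketch below. It is not the Wired-Marker implementation. The function name is hypothetical, and the field names offset, node_type and hint are our reading of the three bracketed values (a character offset, what looks like the DOM node type, where 3 is a text node, and a three-character text hint); the Technical Notes on hyper-anchor.org are the authoritative description. The sketch also assumes that "&" only occurs as the separator between the two points once the entities have been resolved.

```python
import html
import re
from urllib.parse import unquote, urlsplit

# One point: <xpath>(<offset>)(<node type>)(<text hint>)
POINT_RE = re.compile(
    r"^(?P<xpath>.*)\((?P<offset>\d+)\)\((?P<node_type>\d+)\)\((?P<hint>.*)\)$"
)

EXAMPLE = (
    "http://dasish.eu/dasishevents/#hyperanchor1.3%3A%2F%2Fdiv%5B%40id%3D"
    "%26quot%3Bwrapper-crumb-trail%26quot%3B%5D%2Ffollowing-sibling%3A%3Ap"
    "%5B1%5D(33)(3)(eve)%26%2F%2Fdiv%5B%40id%3D%26quot%3Bwrapper-crumb-trail"
    "%26quot%3B%5D%2Ffollowing-sibling%3A%3Ap%5B1%5D(71)(3)(ect)"
)


def parse_hyperanchor(url: str) -> dict:
    """Decode a hyperanchor URL and split it into version and points."""
    # Percent-decode the fragment, then resolve entities such as &quot;
    fragment = html.unescape(unquote(urlsplit(url).fragment))
    prefix, _, rest = fragment.partition(":")
    if not prefix.startswith("hyperanchor"):
        raise ValueError("not a hyperanchor fragment")
    points = []
    for part in rest.split("&"):  # start point & end point
        match = POINT_RE.match(part)
        if match is None:
            raise ValueError(f"unrecognised point: {part!r}")
        points.append({
            "xpath": match["xpath"],
            "offset": int(match["offset"]),
            "node_type": int(match["node_type"]),
            "hint": match["hint"],
        })
    return {"version": prefix[len("hyperanchor"):], "points": points}


for point in parse_hyperanchor(EXAMPLE)["points"]:
    print(point)
```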

We have implemented a transformation of this format to an XPointer: http://dasish.eu/dasishevents/#xpointer(start-point(string-range(//div[@id='wrapper-crumb-trail']/following-sibling::p[1]/text()[1],'',33))/range-to(string-range(//div[@id='wrapper-crumb-trail']/following-sibling::p[1]/text()[1],'',77)))
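To illustrate the shape of such an XPointer, here is a minimal Python sketch that assembles the string from two XPath/offset pairs. It is not the transformation we implemented. The function name is hypothetical, the "/text()[1]" step is simply copied from the example, and the offsets are passed through unchanged, since how they are derived from the hyperanchor values is not described here.

```python
def to_xpointer(base_url: str, start_xpath: str, start_offset: int,
                end_xpath: str, end_offset: int) -> str:
    """Assemble an XPointer with a start-point and a range-to part."""
    # Each end of the range is addressed in the first text node of its element.
    start = f"string-range({start_xpath}/text()[1],'',{start_offset})"
    end = f"string-range({end_xpath}/text()[1],'',{end_offset})"
    return f"{base_url}#xpointer(start-point({start})/range-to({end}))"


xpath = "//div[@id='wrapper-crumb-trail']/following-sibling::p[1]"
print(to_xpointer("http://dasish.eu/dasishevents/", xpath, 33, xpath, 77))
```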

This format can also be converted back to the hyperanchor format (if it uses a start-point and a range-to). The "hyperanchor" format was developed by the same company that was also involved in developing the Wired-Marker Firefox add-on. The encoding is described in detail on their 'Technical Notes' web site: http://www.hyper-anchor.org/en/technical.html, http://www.hyper-anchor.org/en/technical_format.html, http://www.hyper-anchor.org/en/technical_create.html, http://www.hyper-anchor.org/en/technical_example.html. The hyperanchor format also stores style information, e.g. background-color, which is used to style the marked-up string or node (= fragment). According to the mapping solution we now have in mind, this will be transferred to the annotation body (as inline CSS in XHTML format), as shown below:

sample-annotation.xml

<annotation ...>
...
   <body xmlns:xhtml="http://www.w3.org/1999/xhtml" type="Note" xml:lang="en">
      <xhtml:p style="background-color:#ff5555;">this is another annotation</xhtml:p>
   </body>
...
</annotation>
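A client could produce such a body element along the following lines. This is a sketch with Python's standard ElementTree module, not code from the client. The function name is hypothetical, and the element and attribute names are taken from the sample above rather than from a finalized schema.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize XHTML elements with the "xhtml" prefix used in the sample.
ET.register_namespace("xhtml", XHTML_NS)


def build_note_body(text: str, background_color: str) -> ET.Element:
    """Create a <body> whose paragraph carries the marker colour as inline CSS."""
    body = ET.Element("body", {"type": "Note", f"{{{XML_NS}}}lang": "en"})
    paragraph = ET.SubElement(
        body,
        f"{{{XHTML_NS}}}p",
        {"style": f"background-color:{background_color};"},
    )
    paragraph.text = text
    return body


body = build_note_body("this is another annotation", "#ff5555")
print(ET.tostring(body, encoding="unicode"))
```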

<Olha> [2013-04-22] I will try to sum up MPI's position in brief.

-- Regarding the structure of annotation bodies (and binary relations in particular). For now the schema allows any XML to be put in the body. This is intentional: for now we do not want to define a rigid schema for particular kinds of body. The binary relations we currently have in the examples are just reasonable examples. Defining their structure is not a priority task at the moment. I hope this answers a few of Stephanie's and Olof's questions below.

-- Regarding xml:id. Yes, we know that @ and # are not allowed; that's why I took them out of the ids in the newest examples. We think that allowing @ and # is not worth the effort of creating our own "dasish:id", checking its uniqueness, etc.

-- Stephanie: "it will not be possible to store multiple binary relations (Appendix I, R(A,B): Implies, Equivalent, Implies the opposite, Contradicts) with one and the same annotation." My personal opinion: I would not like to have multiple relations on the same annotation, it seems a bit messy. However, the core model allows for such a body type so it is primarily up to the client whether this gets supported.

-- Stephanie about binary relation "different": "Do you intend to have an expandable list for the binary relations?". My personal opinion: yes, in principle I would like to have such a list. However, we can think about it later. See the first item above: for now anything can go in the body.

-- Within the next few days we will work out our common position on Olof's question about versioning: "According to the latest draft, the targetSource element is to contain URI and versionString elements. On the other hand, the parent targetSource element has the attribute xml:id with a value of SID. According to the technical summary, sid contains both aoid, i.e. the URI of an annotatable object outside the DB, and vid, i.e. version identifier (if not omitted and thus being equivalent to the latest version). So we wonder why you put that in and what the benefit of these two elements would be."

<Stephanie> [2013-04-16] Olof took up the following issue before, but I would like to note it down here, just to make sure. According to the current design model, it will not be possible to store multiple binary relations (Appendix I, R(A,B): Implies, Equivalent, Implies the opposite, Contradicts) with one and the same annotation. Rather, a new annotation needs to be posted if a writer wants to add more relations to (e.g.) an annotated string on a web page that already holds one single binary relation. Is this really our intention?

Also, in the appendix, only 4 possible binary relations are listed (cf. the list above), whereas two of the scenario XML files on <https://trac.clarin.eu/wiki/DASISH/XSD%20and%20XML> contain the binary relation "different" (Responding GET api/annotations/AID01, Responding GET api/annotations/AID02). Do you intend to have an expandable list for the binary relations? In that case, I would suggest adding some form of hint to the corresponding appendix section.

Furthermore, in the above named sample xml files the element tags <this ...> and <that ...> are used. According to the UML diagram and earlier xml examples the tags <from ...> and <to ...> are to be used. Validation still results in "Document is valid" due to the use of xs:any processContents="lax" for the Body complexType in the XML schema file. However, both Olof and I are of the opinion that we ought to stick with "from" and "to" - with usage either as xml tags or as xml attributes (cf. the below point of discussion).

<Stephanie+Olof> [2013-04-12] Concerning the proposal for the XML serialization of an "annotation whose body is a binary relation" (see Specification Document), we would like to recall our discussion input from one of the earlier Skype meetings in January 2013 (cf. Olof's e-mail from January 18th). We refer to the current draft on <https://trac.clarin.eu/wiki/DASISH/SpecificationDocument> in resuming the discussion.

It might be appropriate to use from and to xml attributes instead of tags for the relation tag. This seems to make more sense semantically, not least because it would mean that we could follow the style of the ref attributes in e.g. reader and writer.

...
<body type="relation">
        <relation from="SID01XX" to="SID01YY">implies</relation>
</body>
...

According to the latest draft, the targetSource element is to contain URI and versionString elements. On the other hand, the parent targetSource element has the attribute xml:id with a value of SID. According to the technical summary, sid contains both aoid, i.e. the URI of an annotatable object outside the DB, and vid, i.e. version identifier (if not omitted and thus being equivalent to the latest version). So we wonder why you put that in and what the benefit of these two elements would be.

As regards the draft for the "serialization of a new annotation", we would like to point out that the value of the xml:id attribute should preferably be sid = <aoid>@<vid>#<fid>, instead of using a temporary sid. This would also entail that the child elements URI and versionString would become dispensable.

In connection with these considerations we encountered another issue: according to the XML specification, the value of the special XML namespace attribute xml:id cannot contain certain characters such as ':', '/' or '[' (W3C: not an NCName). This means that such an attribute cannot hold a sid value as described in the technical summary (it conflicts with URIs and XPath expressions). Therefore, we propose to define an attribute named 'id' within the default DASISH namespace (without the strict XML namespace rules on which characters are valid) and to use it instead of the xml:id attribute.
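The restriction can be checked with a few lines of Python. The pattern below is an ASCII-only approximation of the NCName production (the real rule also admits many non-ASCII characters), and the function name is hypothetical.

```python
import re

# ASCII subset of NCName: starts with a letter or underscore, followed by
# letters, digits, '.', '-' or '_'. A colon is never allowed.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(value: str) -> bool:
    """Return True if value would be acceptable as an xml:id (ASCII subset)."""
    return NCNAME_ASCII.fullmatch(value) is not None


for candidate in ("tempSID1", "SID01XX", "http://tla.mpi.nl@1.5#XX"):
    print(candidate, is_ascii_ncname(candidate))
```

A temporary id such as tempSID1 passes, whereas a sid composed of URI, version and fragment fails on ':', '/', '@' and '#'.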

current draft (new annotation - POST operation)

...
<targetSource xml:id="tempSID1">
	<URI>http://tla.mpi.nl#XX</URI>
	<versionString>1.5</versionString>
</targetSource>
...

might thus be changed to

...
<targetSource id="http://tla.mpi.nl@1.5#XX" />
...
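If the sid is carried in a single attribute as proposed, a client has to split it back into its parts. A minimal sketch in Python, assuming the <aoid>@<vid>#<fid> layout from above: the function name is hypothetical, the fragment is taken to start at the first '#', and the version at the last '@' before it, so a URI that itself contains '@' but carries no version would be split wrongly.

```python
from typing import Optional, Tuple


def parse_sid(sid: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split a sid of the form <aoid>@<vid>#<fid> into (aoid, vid, fid)."""
    # Everything after the first '#' is the fragment id; it may itself
    # contain '@' (e.g. an XPath predicate), so it is cut off first.
    base, hash_sep, fid = sid.partition("#")
    aoid, at_sep, vid = base.rpartition("@")
    if not at_sep:
        # No version given: the whole base is the aoid (latest version).
        return base, None, fid if hash_sep else None
    return aoid, vid, fid if hash_sep else None


print(parse_sid("http://tla.mpi.nl@1.5#XX"))
print(parse_sid("http://tla.mpi.nl#XX"))
```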

<Olof> In the examples in the specification document, <targetSource> is in some cases identified by "xml:id" and in others by a "source" (page 7). Is there a reason for using a temporary id as the xml:id rather than the whole URI + fragment in the relation reference? For the client it's important to know where to look for the URI in order to get the fragment containing the start/end of the annotation.

<Twan> I think indeed this could be solved without the xml:id; the source URI should already be a unique ID.

<Menzo> Yes, and if the relation body is serialized in RDF it has to be URIs instead of IDs.

<Olof> I think we need to establish an XSD schema for the representation of the serialized objects as soon as possible, together with a few close-to-real-life examples. This will be a good exercise for finalizing the exchange format and for finding other issues and ideas for future improvements.

<Twan> Yes, I agree. Is this something we were going to do?

<Menzo> Yes, I think the plan was that we would create more XML examples including schemas. We didn't decide on a schema language ... unfortunately XSD is still the prevalent one ;-)

<Stephanie> It would be helpful if future examples as well as the schema definition files could be provided directly as real files (XML, XSD), instead of just as images in the specification document. To this end, we propose to use a new subdirectory structure within the back-end directory on the CLARIN SVN.

<Olof> In the example (page 3) the annotated node (start/end) is identified using XPointer (fid). Wired-Marker uses hyperanchor <http://www.hyper-anchor.org/en/technical_format.html>, which works in a somewhat different way. Hyperanchor allows storage of color etc.; keeping it requires less effort and no changes in Wired-Marker. Is XPointer important for identifying the annotation, and if so, is conversion from hyperanchor to XPointer required of the client?

<Twan> I haven't looked into hyperanchor, but I don't see why XPointer would be a hard requirement as long as the format does the job of reliably referencing a fragment. I'm a bit worried about storing properties of the annotation (such as colour) in the fragment identifier though.

<Menzo> I agree that storing (some of) the annotation info in the fragment identifier would be wrong. The fragment identifier should only contain information to pinpoint a specific fragment in a source, not additional rendering or annotation info.

I don't know hyperanchor. In principle I would go for a standard where possible, e.g. XPointer or one of the W3C-recommended media fragments, also because Wired-Marker is only one of the tools that could do something with the annotations; i.e. where possible, don't bind it too tightly to Wired-Marker but keep it generic. But I guess this can only be a guideline: annotation bodies and cached representations could be tool-specific, and there will be very specific fragments (LEXUS lexical entries, ELAN annotations), though those might still be referenced with a standard fragment identifier (XPointer?).

<Olof> Replacing hyperanchor would be a big change in the Wired-Marker source; instead, we need to make a mapping for input to and output from the client. #291

<Olof> When retrieving more than one source or annotation we need a wrapper; this should be specified in the REST specification. For simplicity this wrapper should always be used, even if the result is a single object. The wrapper should also contain the total number of objects if it is larger than the predefined maximum number of objects returned. This is important for the specified paging, e.g. "GET api/sources?uri=<aoid>&maxSources=<number>" on page 14. We also suggest replacing maxSources with a more general parameter (e.g. "max") that can also be used to limit the number of annotations returned. Furthermore, we need a start parameter to be able to get the next set of objects. E.g.:

...

</result>
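The paging behaviour asked for here can be illustrated independently of the serialization. The Python sketch below is not the proposed wrapper format: the function name and the keys total, start, count and items are hypothetical, and max_items stands in for the suggested "max" parameter.

```python
from typing import Any, Dict, List


def wrap_result(objects: List[Any], start: int = 0, max_items: int = 10) -> Dict[str, Any]:
    """Return one page of objects together with the overall total.

    The wrapper is used even when only one object (or none) is returned,
    so that clients always parse the same structure.
    """
    window = objects[start:start + max_items]
    return {
        "total": len(objects),  # lets the client see that more pages exist
        "start": start,         # index of the first returned object
        "count": len(window),   # number of objects in this page
        "items": window,
    }


all_sources = [f"source-{i}" for i in range(25)]
first = wrap_result(all_sources, start=0, max_items=10)
# The next page starts where the previous one ended.
second = wrap_result(all_sources, start=first["start"] + first["count"], max_items=10)
print(first["total"], second["items"][0])
```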

<Twan> I think this is fine as well, I would propose to add it to the API specifications like this.

<Menzo> Sounds good :-)

Last modified on 04/25/13 09:09:13
