Changes between Version 144 and Version 145 of DASISH/XSD and XML


Timestamp:
12/18/13 12:57:33 (10 years ago)
Author:
olhsha
Comment:

--

  • DASISH/XSD and XML

    v144 v145  
    538 538
    539 539 = Issues with the schema =
    540 == Possible namespace pollution: Ticket 348 to be discussed with Peter  ==
    541 
    542 See [https://trac.clarin.eu/ticket/348]
    543 
    544 Peter:
    545 "It looks like there might be some namespace pollution or some other anomaly that causes the jaxb
    546 auto generated classes to omit the getter for notebooks from the ObjectFactory. This is an issue
    547 when a jaxb root node is required, such as in the rest interface. A work around has been added
    548 which makes it clear where the issue is and why the ObjectFactory is required, but this needs to
    549 be replaced when the schema is updated."[[BR]]
    550 [[BR]]
    551 ''Stephanie:'' If indeed, the issue lies with the plural forms of complexType names (used for lists of resources) in the schema, we think that, principally, the plural forms can - and should - be replaced more or less immediately, e.g. by adding the suffix "{{{List}}}" or "{{{Collection}}}" instead of plural "s". For instance, the serialized xml output could then match the following snippet
    552 {{{
    553 <annotationCollection> 
    554    <annotation>
    555       <!-- elements of annotation -->
    556    </annotation>
    557 </annotationCollection>
    558 }}}
    559 
    560 
    561  
    562 By the way, speaking of diverse "JAXB plural issues", there were some hints on the web regarding (inline and external) Java binding customization for JAXB; this might also be a viable solution. Maybe this thread is worth checking as well: http://stackoverflow.com/questions/11943487/please-advise-the-best-pattern-for-serialising-jaxb-lists?
    563 
    564 ''Olha'': we agree.  See also the updated comments for the ticket 348.
    565 
    566 As regards all future schema updates, please remember to update all available scenario xml documents on trac.clarin.eu accordingly, preferably with as little delay as possible. It might also be a good idea to validate all of these documents against the revised schema with a reliable XML parser such as the Xerces-J XML parser. Also, if you like, you can check by validating some of our current "real-life" mock xml documents that we have been using recently for client development ([source:DASISH/t5.6/client/trunk/chrome/markingcollection/content/markingcollection/annotator-service/test/mockjax/mocks]).
    567 
    568 ''Olha'': sure. We have not made any changes yet (before getting your opinion). Unfortunately, I will be away 27/08 - 14/09 due to urgent family circumstances. I hope that we can wait till I'm back.
    569 
    570 === Resolution ===
    571 
    572 The schema is to be fixed. The current hypothesis is that JAXB gets confused by the plural forms used in the schema in type names for lists of resources,
    573 e.g. Annotation --> Annotations, Notebook --> Notebooks etc. Replace the plural forms with other wording and check whether that fixes the problem.
    574 
    575 
    576 == New-Or-Existing-Source-Info JAXB-generated class ==
    577 
    578 This class corresponds to the choice sub-schema within the schema-type for annotation, which tells if a target source is new or old. This means that
    579 the client provides the server with this information.
    580 
    581 It is ok, but it makes the code a bit hairy. Can we remove the choice and let the backend decide if the source is new or old, simply by looking through all the sources
    582 to find the one with the given external-id/uri?  If it is not found then the source is new.
    583 [[BR]][[BR]]
    584 ''Stephanie:'' Both Olof and I are somewhat unsure whether it is really necessary to differentiate between new and existing sources when sending POST, PUT, DELETE and GET requests from within the client, especially if the backend implementation is going to be changed to simply check whether the given URI already exists in the DB or not.
    585 Could you please rethink whether we actually need to keep this concept in the schema document, DASISH-schema.xsd? This decision also has consequences for how the behavior of the client needs to be tuned (cf. [source:DASISH/t5.6/client/trunk/chrome/markingcollection/content/markingcollection/annotator-service/test/mockjax/mocks POST/GET] mock docs named above).
    586 
    587 ''Olha'': we agree with you. See resolution below.
    588 
    589 === Resolution ===
    590 
    591 The schema must be simplified. It is up to the server to decide if a source is new or old. See ticket https://trac.clarin.eu/ticket/362 for more detail.
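The server-side decision described in this resolution can be sketched roughly as follows. This is only an illustration under stated assumptions, not the DASISH back-end code: the class `SourceRegistry`, its `resolve` method and the in-memory map standing in for the "source" table are all hypothetical, and the sketch assumes the client identifies a source by a link string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in for the "source" table (assumption, not back-end code).
final class SourceRegistry {

    // Result of a lookup: the externalID and whether the source had to be created.
    static final class Resolution {
        final UUID externalId;
        final boolean isNew;

        Resolution(UUID externalId, boolean isNew) {
            this.externalId = externalId;
            this.isNew = isNew;
        }
    }

    // Maps the link sent by the client to the externalID assigned by the server.
    private final Map<String, UUID> externalIdByLink = new HashMap<>();

    // The server, not the client, decides whether the source is new or old.
    Resolution resolve(String link) {
        UUID existing = externalIdByLink.get(link);
        if (existing != null) {
            return new Resolution(existing, false); // found: the source is old
        }
        UUID created = UUID.randomUUID();           // not found: the source is new
        externalIdByLink.put(link, created);
        return new Resolution(created, true);
    }
}
```

Calling `resolve` twice with the same link reports the source as new the first time and as existing the second time, with the same externalID in both cases.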
    592 
    593 
    594 ==    External_id (Data Base) vs URI (schema) vs UUID-based class (Java code) ==
    595 
    596 For the time being I treat {{{external_id}}} and the URI as "the same":  URI is external_id. Both are strings.
    597 
    598 In the Java-code there are classes {{{CachedIdentifier, VersionIdentifier, SourceIdentifier, AnnotationIdentifier, NotebookIdentifier, UserIdentifier}}}
    599 encapsulating a UUID.  Any such class has a string field "identifier" (corresponding to external_id) plus hash, plus internal constants for hash.
    600 [[BR]][[BR]]
    601 ''Stephanie:'' Apropos this subject, could you please clarify whether you intend to use "external-id (Ticket #362: externalID, see also the discussion topic above)" for the database only? You use the wording "For the time being", which made us wonder whether the statement "URI is external_id" will be true even in the longer perspective, or, if not, where / how the externalID is going to be put in from the schema side, and thus within the context of the resulting, serialized xml documents? Furthermore, why is the nomenclature "external_id" used? Won't the value of URI always be an internal id, like e.g. http://dasish.eu/annotations/AID20130808114716, that needs to be created and set via a corresponding backend functionality?
    602 
    603 ''Olha'': The term "external_id" comes from the Data Base and is used for communication with the client.  It is a string which satisfies the UUID format, e.g. "00000000-0000-0000-0000-000000000003". The URI in the xsd is of the form you mentioned above, e.g. http://dasish.eu/annotations/00000000-0000-0000-0000-000000000003. An external id is generated by the backend whenever a resource is recognized as new in the corresponding "add"-method. It is sent to the client as a response to the "Add" request. Just to be sure, we definitely intend to keep it this way although there should be no need for the client to be aware of the conceptual distinction between 'external id' and the URI, in other words the latter can be considered the public identifier for the client.
    604 
    605  "Internal_id" in the database is not visible to a client.  It is "next-in-the-table"  number generated by the database whenever the corresponding resource is added. It is a primary key of the resource. It is used for joint tables (e.g. when you connect an annotation to a source) and for quick search by key, since it is just a number, not a string.
    606 
    607 Also, the ticket 371 mentioned below is fixed.
    608 
    609 === Resolution ===
    610 
    611 The schema stays intact. For the back-end code: the URI must be of the form "http(s)://<dasish-server>/externalID".  See https://trac.clarin.eu/ticket/363 for more detail.
    612 
    613 Moreover, all {{{CachedIdentifier, VersionIdentifier, SourceIdentifier, AnnotationIdentifier, NotebookIdentifier, UserIdentifier}}} classes must be removed. We will be using just the UUID type
    614 for all identifiers of any sort of resource. E.g. {{{UserIdentifier userIdentifier = new UserIdentifier("00000000-0000-0000-0000-000000000003")}}} will be replaced with
    615 {{{UUID userIdentifier = UUID.fromString("00000000-0000-0000-0000-000000000003")}}}. This is ticket https://trac.clarin.eu/ticket/371.
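As a rough sketch of how a plain `java.util.UUID` could take over from the wrapper classes, the snippet below converts between an externalID and a resource URI. The helper class `ExternalIds`, its method names, and the `<server>/<collection>/<externalID>` layout (modelled on the http://dasish.eu/annotations/... example above) are assumptions made for illustration, not the back-end API.

```java
import java.net.URI;
import java.util.UUID;

// Hypothetical helper; class and method names are illustrative assumptions.
final class ExternalIds {

    private ExternalIds() {
    }

    // Builds a resource URI of the assumed form <serverBase>/<collection>/<externalID>.
    static URI toUri(String serverBase, String collection, UUID externalId) {
        return URI.create(serverBase + "/" + collection + "/" + externalId);
    }

    // Recovers the externalID from the last path segment of a resource URI.
    // UUID.fromString throws IllegalArgumentException if the segment is not a UUID.
    static UUID fromUri(URI uri) {
        String path = uri.getPath();
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        return UUID.fromString(lastSegment);
    }
}
```

With this layout the client only ever sees the URI, while the back-end can recover the UUID from it and use that to look up the resource.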
    616 
    617 
    618 
    619 == Body: must be some serialization/deserialization mechanism ==
    620 
    621 -- "body" in the DB is just a text
    622 
    623 -- "body" in JAXB-generated  class is a list of objects
    624 
    625 For now, I use simple "serialize" and "deserialize" helper procedures, which should be replaced by proper marshalling/unmarshalling. For simple serialization I treat the first element of the list
    626 of objects above as a text which corresponds to the DB column "body_xml".
    627 
    628 === Resolution 18/09 ===
    629 
    630 The schema  is updated (contrary to what was stated before 18/09). See https://trac.clarin.eu/ticket/364 for more detail, the comment on it.
    631 [[BR]][[BR]]
    632 ''Stephanie:'' It might be preferable to abandon the helper methods in favor of using JAXB's marshal() and unmarshal() methods together with a {{{Marshaller}}} / {{{Unmarshaller}}} object. Can JAXB's {{{ObjectFactory}}} methods be of any use as an alternative way to access XML data? Furthermore, you might want to have a look at javax.xml.bind.JAXBIntrospector.getValue(Object jaxbElement). Please see: http://www.tutorialspoint.com/java/xml/; http://www.oracle.com/technetwork/articles/javase/index-140168.html; http://docs.oracle.com/javase/tutorial/jaxb/intro/index.html; http://jaxb.java.net/tutorial/ - in case you haven't come across these web resources before.
    633 
    634 ''Olha'': the helpers "serialize/deserialize" are removed. We have to discuss our present solution at Goteborg:
    635 
    636 {{{
    637 <xs:complexType name="AnnotationBody">
    638 <xs:simpleContent>
    639 <xs:extension base="xs:string">
    640 <xs:attribute name="mimeType" type="xs:string"/>
    641 </xs:extension>
    642 </xs:simpleContent>
    643 </xs:complexType>
    644 }}}
    645 
    646 The database is changed so that the column "body_xml" in the table "annotation" is replaced with two columns: "body_text" and "body_mimetype". The corresponding adjustments in the code and in the unit tests are done. So far, no special serialization mechanism is used.
    647 
    648 The question: what does the client need to do to be able to represent the annotation body properly?
    649 
    650 == Source ==
    651 
    652 Misprint in timeStamp: timeSatmp.
    653 
    654 === Resolution ===
    655 
    656 The schema must be fixed.  See https://trac.clarin.eu/ticket/365
    657 
    658 == Cached Representation Info ==
    659 
    660 Missing in the schema: the attribute/element "where_is_the_file" which actually points to the location
    661 where the file can be downloaded. It is necessary to fulfill 
    662 
    663 GET api/sources/<sid>/cached/<cid>/content
    664 
    665 === Resolution ===
    666 
    667 The schema stays intact. For the back-end (and the database!):  the cached representation for now should be stored in the database
    668 as a BLOB. See: http://dba.stackexchange.com/questions/803/blobs-or-references-in-postgresql/815#815.
     540
     541 As regards all future schema updates, please remember to update all available scenario xml documents on trac.clarin.eu accordingly, preferably with as little delay as possible. It might also be a good idea to validate all of these documents against the revised schema with a reliable XML parser such as the Xerces-J XML parser. Also, if you like, you can check by validating some of our current "real-life" mock xml documents that we have been using for client development ([source:DASISH/t5.6/client/trunk/chrome/markingcollection/content/markingcollection/annotator-service/test/mockjax/mocks]).
     542
     543
     544 == Cached Representation BLOB ==
     545
     546 For the back-end (and the database!):  the cached representation for now is stored in the database as a BLOB. See: http://dba.stackexchange.com/questions/803/blobs-or-references-in-postgresql/815#815.
    669 547
    670 548 Also, see https://trac.clarin.eu/ticket/366
     
    677 555
    678 556 ''Olha, Menzo, Twan:'' If that's relatively easy for you to do, that would probably suffice in the initial phase. For a more complete solution (i.e. including images, scripts etc) you could perhaps consider [https://en.wikipedia.org/wiki/MHTML MIME HTML] or alternatively something like [https://en.wikipedia.org/wiki/Mozilla_Archive_Format the Mozilla Archive Format]. Cross-browser support could be a bit of an issue there but the situation does not look too bad.
    679 == Version ==
    680 
    681 MISSING in the schema:  attribute URI (corresponding to the external_id in the DB) is absent. Therefore it does not appear
    682 in the  JAXB-generated class "Version" and the java class has one attribute less than the DB table "version"
    683 
    684 {{{
    685 CREATE TABLE version (
    686     version_id SERIAL UNIQUE NOT NULL,
    687     external_id UUID UNIQUE NOT NULL,
    688     version text
    689 );
    690 }}}
    691 
    692 For now I'm using the attribute "version:String" to keep "external_id/URI" in the java
    693 class "Version".
    694 
    695 === Resolution ===
    696 
    697 Fix the schema: URI must be added to the version-type. The ticket: https://trac.clarin.eu/ticket/367
    698 
    699 ==  LISTS of Resources, like "PermissionS" and "CachedRepresentationS",
    700 and version-siblingS connected to a particular source
    701 cannot be standalone tables in the relational DB ==
    702 
    703 According to the schema: a list of cached representations is declared as a standalone resource
    704 of type "CachedRepresentations". Every version refers to its own list of cached representations.
    705 Every such list has its own ID.
    706 
    707 According to the rel. database: it looks a bit strange to have such lists. Instead, I have made
    708 a common joint table (version_id, cached_representation_id). A pair (a, b) is listed in this table iff
    709 the version with the internal id "a" has a cached representation with internal id "b".
    710 
    711 === Resolution ===
    712 
    713 Fix the schema: all lists of resources are artifacts, they do not have URIs. Ticket: https://trac.clarin.eu/ticket/368
    714 [[BR]][[BR]]
    715 ''Stephanie:'' When removing the URI attribute for lists of resources, please make sure that the integrity of the references is still guaranteed.
    716 
    717 ''Olha'': lists are not stored in the DB but generated by the backend on the fly when they are needed. For instance, once the client wants to GET an annotation (whose schema contains the list of source-infos), the backend looks up the joint table "annotations_resources" and the table "source"  and  generates a list of resources for the annotation.
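The on-the-fly list generation described above could look roughly like the sketch below. It models the joint table and the "source" table as plain in-memory collections; the class `JoinTableLookup`, the `Row` type and the method `sourcesFor` are hypothetical names used only for illustration, and a real back-end would presumably run an SQL join instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the joint-table lookup (assumption, not back-end code).
final class JoinTableLookup {

    // One row of the joint table: (annotation internal id, source internal id).
    static final class Row {
        final int annotationId;
        final int sourceId;

        Row(int annotationId, int sourceId) {
            this.annotationId = annotationId;
            this.sourceId = sourceId;
        }
    }

    private JoinTableLookup() {
    }

    // Collects the sources linked to one annotation; the list itself is never stored.
    static List<String> sourcesFor(int annotationId, List<Row> joinRows, Map<Integer, String> sourceTable) {
        List<String> result = new ArrayList<>();
        for (Row row : joinRows) {
            if (row.annotationId == annotationId) {
                String source = sourceTable.get(row.sourceId);
                if (source != null) { // skip dangling references
                    result.add(source);
                }
            }
        }
        return result;
    }
}
```

Because the list is rebuilt from the joint table on every request, it needs no identifier of its own, which matches the decision to drop URIs for lists of resources.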
     557
    718 558