Namespaces

Timestamp:: 02/18/14 20:33:19 (10 years ago)
Author:: oschonef
Comment:: --

CMDI 1.2/Schema sanity/Namespaces

-                      v11
+                      v12
 The overall approach of CMDI is to define a set of (metadata) components that get compiled into profiles. Authors of components are free to choose the (inner) structure and the names of their components on their own. Two independent metadata authors can easily come up with profiles that contain components that use the same name but a different structure, e.g. one `Creator` component that has no further inner structure and another `Creator` component that is further structured into `Name`, `Organization` and `Email`.
 The CMDI profiles will be compiled into XML schema documents and components become XML elements. In the current implementation, CMDI puts all XML elements into one ''generic CMDI XML namespace'', i.e. the two `Creator` elements are identified by the same QName. However, these two `Creator`s are conceptually different things, because they have a different structure, and thus their XML representations have conflicting content models.
+The CMDI profiles will be compiled into XML schema documents and components become XML elements. In the current implementation, CMDI puts all XML elements into one ''generic CMDI XML Namespace'', i.e. the two `Creator` elements are identified by the same QName. However, these two `Creator`s are conceptually different things, because they have a different structure, and thus their XML representations have conflicting content models.
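The QName collision described above can be illustrated with a minimal Python sketch. It is not taken from any CMDI tool; the namespace URI and the sample element content are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for this illustration.
GENERIC_NS = "http://example.org/cmd/generic"

# A "flat" Creator with no inner structure.
flat = ET.fromstring(f'<Creator xmlns="{GENERIC_NS}">Jane Doe</Creator>')

# A structured Creator from a different (hypothetical) profile.
structured = ET.fromstring(
    f'<Creator xmlns="{GENERIC_NS}">'
    "<Name>Jane Doe</Name>"
    "<Organization>Example Org</Organization>"
    "<Email>jane@example.org</Email>"
    "</Creator>"
)

# ElementTree represents a QName as "{namespace}localname".
same_qname = flat.tag == structured.tag

# The child element sequences (a rough stand-in for the content model) differ.
same_structure = [c.tag for c in flat] == [c.tag for c in structured]

print(flat.tag, same_qname, same_structure)
```

Both elements carry the same qualified name although their content differs, so a consumer that only looks at the QName cannot tell which structure to expect.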
 As long as XML instances created from these XML schemas are not used together in one context, things work. But if they are used together, this is a recipe for problems. For example, consider the following scenarios:
  * XML Parsers: an XML parser parsing and validating a batch of CMDI instances from various profiles can cache an internal representation of a parsed XML schema, keyed by XML Namespace, to speed up the processing. However, if it first caches an XML schema that contains no "Creator" elements, it will reject all instances that happen to contain "Creator"s. Or, if it caches the "flat" `Creator`, it will reject the structured `Creator`. \\
+   Tools that might use such a processing model are VLO, Metadata Assessment Services or other repositories aggregating metadata from various sources.
+ * XML Binding: frameworks to map from XML to Objects can fail on CMDI profiles, because all profiles are in the same XML namespace; thus programs fail if they encounter CMDI instances from a different profile.
+ * Using CMDI in other contexts, e.g. embedded in other protocols. One example is OAI-PMH. CLARIN requires centers to make their metadata available for harvesting by means of OAI-PMH. OAI-PMH requires each metadataPrefix to be linked to a ''unique'' XML schema [http://www.openarchives.org/OAI/openarchivesprotocol.html#MetadataNamespaces Section "3.4 metadataPrefix and Metadata Schema"]: it defines "The XML namespace URI that is a global identifier of the metadata format" and "The metadata schema URL - the URL of an XML schema to test validity of metadata expressed according to the format". As long as centers only use one profile, they are fine. However, if their repository contains CMDI records from various profiles, it is harder to select an appropriate schema to announce via OAI-PMH. (NB: the only solution is to use the ''minimal'' CMDI XML schema, but very few centers currently do that. Furthermore, more has to be done to also validate the structure below the `Components` elements.)
+   For example, Xerces-J has the following comment in org.apache.xerces.impl.xs.XMLSchemaValidator.java:1564 ff
+    "store the external schema locations. they are set when reset is called, so any other schemaLocation declaration for the same namespace will be effectively ignored. because we choose to take first location hint available for a particular namespace."
+  The comment is in the `reset()` method of the !SchemaValidator. This hints that the parser might cache schemas based on XML Namespace names if instances of the parser get re-used.
+ * XML Binding: frameworks to map from XML to Objects can fail on CMDI profiles, because all profiles use the same XML Namespace; thus programs fail if they encounter CMDI instances from a different profile.
+ * XML Databases: native XML Databases might also run into the XML schema caching issue if validation is turned on. The popular eXist-db uses Xerces-J under the hood, so the above-mentioned issue with Xerces-J may also lead to problems with eXist-db.
+ * Using CMDI in other contexts, e.g. embedded in other protocols. One example is OAI-PMH. CLARIN requires centers to make their metadata available for harvesting by means of OAI-PMH. OAI-PMH requires each metadataPrefix to be linked to a ''unique'' XML schema [http://www.openarchives.org/OAI/openarchivesprotocol.html#MetadataNamespaces Section "3.4 metadataPrefix and Metadata Schema"]: it defines "The XML namespace URI that is a global identifier of the metadata format" and "The metadata schema URL - the URL of an XML schema to test validity of metadata expressed according to the format". As long as centers only use one profile, they are fine. However, if their repository contains CMDI records from various profiles, it is harder to select an appropriate schema to announce via OAI-PMH.
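The schema caching scenario from the list above can be modelled with a toy Python sketch. It is an assumption-laden illustration only: the class `NamespaceKeyedSchemaCache`, its methods, the namespace URI and the dictionary-based "schemas" are all hypothetical and do not reproduce how Xerces-J or any other parser is actually implemented.

```python
class NamespaceKeyedSchemaCache:
    """Toy model: at most one schema per namespace, the first one wins."""

    def __init__(self):
        self._schemas = {}

    def register(self, namespace, schema):
        # setdefault keeps an existing entry, so a later schema for the
        # same namespace is silently ignored (first location hint wins).
        return self._schemas.setdefault(namespace, schema)

    def is_valid(self, namespace, element, children):
        # A "schema" here is just a dict: element name -> expected child names.
        schema = self._schemas[namespace]
        return schema.get(element) == tuple(children)


# Hypothetical namespace URI shared by all profiles.
GENERIC_NS = "http://example.org/cmd/generic"

# Two hypothetical profiles that both define a Creator component.
flat_profile = {"Creator": ()}
structured_profile = {"Creator": ("Name", "Organization", "Email")}

cache = NamespaceKeyedSchemaCache()
cache.register(GENERIC_NS, flat_profile)
cache.register(GENERIC_NS, structured_profile)  # ignored: namespace already cached

flat_ok = cache.is_valid(GENERIC_NS, "Creator", [])
structured_ok = cache.is_valid(
    GENERIC_NS, "Creator", ["Name", "Organization", "Email"]
)

print(flat_ok, structured_ok)
```

In this model the structured `Creator` is rejected only because the flat profile happened to be cached first; reversing the registration order would reject the flat one instead.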
 A clean solution to avoid these problems is to have a general CMDI XML namespace for the ''wrapper'' elements of a CMDI instance (Header, ...) and a profile-specific namespace for the elements below the `Components` elements. The following approaches could be used to lessen the impact of this change for users who don't care about Namespaces: