Version 4 (modified by Menzo Windhouwer, 11 years ago)

Schema sanity improvements in CMDI 1.2: Executive summary

This page provides an executive summary of the issue and proposed solution fully described in CMDI 1.2/Schema sanity.

Issue description

In CMDI several XML schemas play a role. A general fixed schema, the general component schema, determines how a profile or component is specified. Over the years several developers have worked on this schema and made different design decisions, e.g., about what should be an element and what should be an attribute, or what should be capitalized. The upgrade to CMDI 1.2 is a chance to clean this up.

Based on the profile and component specifications, a profile-specific XSD is generated to validate instances of that profile. In CMDI 1.1 all profiles use the same namespace. This simple approach conflicts with basic assumptions about XML, namespaces and schemas in the world outside of CLARIN. For example, the OAI-PMH protocol, which is used by CLARIN but specified by the Open Archives Initiative, demands that only one schema is associated with a metadata prefix, whereas CMDI metadata comes with many schemas, a different one for each profile. Other tools, e.g., Xerces2-J, assume (supported by the XSD recommendation) that a namespace indicates a unique schema and use the namespace for caching purposes. Xerces is commonly used, so this problem appears in other tools as well. For example, eXist-db runs into validation problems with CMD records that use different profiles. Another parser, schema-aware Saxon-EE (the commercial version of Saxon-HE), behaves similarly to Xerces2-J and also caches schemas based on namespaces.
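The caching problem can be illustrated with a toy model. The sketch below is not Xerces2-J or Saxon-EE code; it only assumes a cache that remembers the first schema seen for each namespace. The class name, the namespace URIs and the schema file names are all hypothetical.

```python
class NamespaceKeyedSchemaCache:
    """Toy model of a validator cache that stores one schema per namespace."""

    def __init__(self):
        self._schema_by_namespace = {}

    def schema_for(self, namespace, schema_location):
        # The first schema registered for a namespace wins; a later,
        # different schema location for the same namespace is ignored.
        return self._schema_by_namespace.setdefault(namespace, schema_location)


# CMDI 1.1 style: every profile shares one namespace (hypothetical URI).
SHARED_NS = "http://example.org/cmd"

shared = NamespaceKeyedSchemaCache()
first = shared.schema_for(SHARED_NS, "profile-a.xsd")
second = shared.schema_for(SHARED_NS, "profile-b.xsd")
# The record for profile B is now checked against the schema of profile A.

# Proposed style: one namespace per profile (hypothetical URIs).
per_profile = NamespaceKeyedSchemaCache()
schema_a = per_profile.schema_for("http://example.org/cmd/profiles/a", "profile-a.xsd")
schema_b = per_profile.schema_for("http://example.org/cmd/profiles/b", "profile-b.xsd")
# Each profile now resolves to its own schema.
```

With the shared namespace, the second lookup returns the schema of the first profile, which is the kind of mismatch that leads to spurious validation errors. With one namespace per profile the cache key is unique again.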

Description of proposed solution

The cleanup of the general component schema returns, as far as possible, to its original approach, in which information was expressed as attributes. In addition, some other small cleanups, e.g., removing unnecessary prefixes, are planned.

The single-namespace approach of CMDI 1.1 is to be replaced by a general namespace for the CMDI envelope and profile-specific namespaces for the payload. Where needed, the impact can be limited by common facilities that either skip or strip namespaces, e.g., XPath 2.0 allows wildcards to match any namespace.
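As a minimal sketch of such a facility, the example below uses the namespace wildcard `{*}` of Python's `xml.etree.ElementTree` (available from Python 3.8), which plays the same role as the XPath 2.0 name test `*:Title`. The namespace URIs, the element names and the helper functions are hypothetical and only serve the illustration; they are not the actual CMDI 1.2 namespaces or record layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces, for illustration only.
ENVELOPE_NS = "http://example.org/cmd/envelope"
PROFILE_A_NS = "http://example.org/cmd/profiles/a"
PROFILE_B_NS = "http://example.org/cmd/profiles/b"


def make_record(profile_ns, title):
    """Build a tiny record: envelope in one namespace, payload in another."""
    xml = f"""
    <CMD xmlns="{ENVELOPE_NS}">
      <Components>
        <Title xmlns="{profile_ns}">{title}</Title>
      </Components>
    </CMD>
    """
    return ET.fromstring(xml.strip())


def titles_any_namespace(record):
    # "{*}Title" matches a Title element in any (or no) namespace.
    return [el.text for el in record.findall(".//{*}Title")]


def titles_in_namespace(record, namespace):
    # Strict lookup: only matches Title elements in the given namespace.
    return [el.text for el in record.findall(f".//{{{namespace}}}Title")]


record_a = make_record(PROFILE_A_NS, "Corpus A")
record_b = make_record(PROFILE_B_NS, "Corpus B")

# One wildcard query works for records of both profiles.
all_titles = titles_any_namespace(record_a) + titles_any_namespace(record_b)

# A query bound to profile A finds nothing in a profile B record.
missed = titles_in_namespace(record_b, PROFILE_A_NS)
```

The wildcard query lets generic tooling process records of any profile without knowing the profile namespace, while the strict query shows what such tooling would otherwise have to handle per profile.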