This page is a subpage of CMDI 1.2

General component schema consistency

The issue

Multiple developers worked on the general component schema and made different XML schema design choices, e.g., what should be an element and what should be an attribute, and which names should be capitalized. An example of a component specification instance shows this most clearly:

<CMD_Component CardinalityMax="1" CardinalityMin="0"
  ConceptLink="http://www.isocat.org/datcat/DC-2520"
  ComponentId="clarin.eu:cr1:c_1271859438118" name="Description">
    <CMD_Element Multilingual="true" CardinalityMax="1" CardinalityMin="1"
      ValueScheme="string" ConceptLink="http://www.isocat.org/datcat/DC-2520"
      name="Description">
        <AttributeList>
            <Attribute>
                <Name>LanguageID</Name>
                <Type>string</Type>
            </Attribute>
        </AttributeList>
    </CMD_Element>
</CMD_Component>

All the information on a component or an element is captured by XML attributes, but all information on an Attribute is captured by nested elements. In addition, @name is lower case, whereas the Name element is capitalized. When transforming a profile or component specification, one has to be aware of the different approaches taken in different parts of the schema.
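To illustrate what this means for a tool, here is a minimal Python sketch that collects the names from the example above. The function name collect_names is hypothetical and only the standard library is assumed; the point is that one logical property has to be read in two different ways.

```python
import xml.etree.ElementTree as ET

SPEC = """
<CMD_Component CardinalityMax="1" CardinalityMin="0"
  ConceptLink="http://www.isocat.org/datcat/DC-2520"
  ComponentId="clarin.eu:cr1:c_1271859438118" name="Description">
    <CMD_Element Multilingual="true" CardinalityMax="1" CardinalityMin="1"
      ValueScheme="string" ConceptLink="http://www.isocat.org/datcat/DC-2520"
      name="Description">
        <AttributeList>
            <Attribute>
                <Name>LanguageID</Name>
                <Type>string</Type>
            </Attribute>
        </AttributeList>
    </CMD_Element>
</CMD_Component>
"""


def collect_names(root):
    """Return (tag, name) pairs for every component, element and attribute."""
    names = []
    for node in root.iter():
        if node.tag in ("CMD_Component", "CMD_Element"):
            # Components and elements: the name is a lower-case XML attribute.
            names.append((node.tag, node.get("name")))
        elif node.tag == "Attribute":
            # Attributes: the name is a capitalized child element.
            names.append((node.tag, node.findtext("Name")))
    return names


print(collect_names(ET.fromstring(SPEC)))
```

Any of the solutions below would reduce the two branches to a single lookup.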

Proposed solutions

Make a choice between attributes and elements, or allow a blend. Also clean up the names in the process, i.e., capitalize them consistently.

First solution: metadata in attributes, data in elements

Elements have some advantages:

  • even if you have a single atomic value now, it is possible to support multiple values in the future, e.g., descriptions in multiple languages,
  • in the future it is possible to add metadata to values, e.g., to make units explicit,
  • if the value has internal structure, this could be exposed in the future by nested elements (or metadata attributes), e.g., to make the registry the ConceptLink points to explicit.

A rather vague rule of thumb is: does a user who sees just the text of the XML document get all the relevant textual data? In this case one wouldn't see the nesting, but should see at least the names and descriptions of components, elements, attributes and values. Following that reasoning, @name and @description would become elements, or, alternatively, only ComponentId would stay an XML attribute.
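The rule of thumb can be checked mechanically. The following Python sketch is only an illustration: the helper visible_text and the two abbreviated specifications are assumptions made for this example. It drops all markup and lists the text that remains.

```python
import xml.etree.ElementTree as ET

# Abbreviated specifications: the current layout versus names as elements.
CURRENT = (
    '<CMD_Component ComponentId="clarin.eu:cr1:c_1271859438118" name="Description">'
    '<CMD_Element ValueScheme="string" name="Description">'
    '<AttributeList><Attribute><Name>LanguageID</Name><Type>string</Type></Attribute>'
    '</AttributeList></CMD_Element></CMD_Component>'
)
NAMES_AS_ELEMENTS = (
    '<CMD_Component ComponentId="clarin.eu:cr1:c_1271859438118"><Name>Description</Name>'
    '<CMD_Element ValueScheme="string"><Name>Description</Name>'
    '<AttributeList><Attribute><Name>LanguageID</Name><Type>string</Type></Attribute>'
    '</AttributeList></CMD_Element></CMD_Component>'
)


def visible_text(xml_source):
    """Return the non-blank text nodes, i.e. what is left without any markup."""
    root = ET.fromstring(xml_source)
    return [text.strip() for text in root.itertext() if text.strip()]


print(visible_text(CURRENT))            # ['LanguageID', 'string']
print(visible_text(NAMES_AS_ELEMENTS))  # ['Description', 'Description', 'LanguageID', 'string']
```

In the current layout the component and element names are invisible to a text-only reader; with names as elements they show up.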

Pros

Basic structure can become more apparent in the XML file.

Cons

There is still a blend of attributes and elements, which might not be intuitive to every user. What is data and what is metadata depends heavily on who is reading the document and for what purpose.

Centre impact

  • Any tool that interacts directly with the XML specification of a profile or component, which should be mainly the CMD Infrastructure tools themselves
  • No impact on CMD profile instances

Implementation examples

<CMD_Component CardinalityMax="1" CardinalityMin="0"
  ConceptLink="http://www.isocat.org/datcat/DC-2520"
  ComponentId="clarin.eu:cr1:c_1271859438118">
    <Name>Description</Name>
    <CMD_Element Multilingual="true" CardinalityMax="1" CardinalityMin="1"
      ValueScheme="string" ConceptLink="http://www.isocat.org/datcat/DC-2520">
        <Name>Description</Name>
        <AttributeList>
            <Attribute>
                <Name>LanguageID</Name>
                <Type>string</Type>
            </Attribute>
        </AttributeList>
    </CMD_Element>
</CMD_Component>
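A migration of existing specifications to this layout could look roughly like the Python sketch below. It is a sketch under assumptions, not an agreed conversion: the function name promote_name is hypothetical, and it assumes that only @name is moved and that Name becomes the first child.

```python
import xml.etree.ElementTree as ET


def promote_name(root):
    """Turn every @name attribute into a leading <Name> child element."""
    # Take a snapshot first so the tree is not modified while it is traversed.
    for node in list(root.iter()):
        name = node.attrib.pop("name", None)
        if name is not None:
            child = ET.Element("Name")
            child.text = name
            node.insert(0, child)
    return root


root = promote_name(ET.fromstring(
    '<CMD_Component ComponentId="clarin.eu:cr1:c_1271859438118" name="Description">'
    '<CMD_Element ValueScheme="string" name="Description"/>'
    '</CMD_Component>'
))
print(ET.tostring(root, encoding="unicode"))
```

All other attributes, such as ComponentId and ValueScheme, are left untouched.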

Discussion

Discuss this solution proposal in this section

Second solution: elements only

Use elements by default, i.e., all attributes for CMD components and elements become XML elements.

Pros

XML elements have more possibilities to be adapted in the future.

Cons

The hierarchy becomes a bit more convoluted.

Centre impact

  • Any tool that interacts directly with the XML specification of a profile or component, which should be mainly the CMD Infrastructure tools themselves
  • No impact on CMD profile instances

Implementation examples

<CMD_Component>
    <CardinalityMax>1</CardinalityMax>
    <CardinalityMin>0</CardinalityMin>
    <ConceptLink>http://www.isocat.org/datcat/DC-2520</ConceptLink>
    <ComponentId>clarin.eu:cr1:c_1271859438118</ComponentId>
    <Name>Description</Name>
    <CMD_Element>
        <Multilingual>true</Multilingual>
        <CardinalityMax>1</CardinalityMax>
        <CardinalityMin>1</CardinalityMin>
        <ValueScheme>string</ValueScheme>
        <ConceptLink>http://www.isocat.org/datcat/DC-2520</ConceptLink>
        <Name>Description</Name>
        <AttributeList>
            <Attribute>
                <Name>LanguageID</Name>
                <Type>string</Type>
            </Attribute>
        </AttributeList>
    </CMD_Element>
</CMD_Component>
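As a rough illustration of the conversion this solution implies, the Python sketch below moves every XML attribute into a child element. The function name attributes_to_elements is hypothetical, and placing the new elements before the existing children, in the original attribute order, is an assumption rather than part of the proposal.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(root):
    """Replace every XML attribute by a child element with a capitalized name."""
    # Take a snapshot first so the new children are not visited themselves.
    for node in list(root.iter()):
        for index, (key, value) in enumerate(list(node.attrib.items())):
            child = ET.Element(key[0].upper() + key[1:])  # name -> Name
            child.text = value
            node.insert(index, child)
        node.attrib.clear()
    return root


root = attributes_to_elements(ET.fromstring(
    '<CMD_Component CardinalityMax="1" name="Description">'
    '<CMD_Element ValueScheme="string" name="Description"/>'
    '</CMD_Component>'
))
print([child.tag for child in root])
```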

Discussion

Discuss this solution proposal in this section

Third solution: attributes only

Use attributes by default, i.e., all elements nested under CMD attributes become XML attributes.

Pros

This is the original setup, i.e., in line with the oldest parts of the schema.

Cons

XML attributes don't provide many facilities to accommodate future change.

Centre impact

  • Any tool that interacts directly with the XML specification of a profile or component, which should be mainly the CMD Infrastructure tools themselves
  • No impact on CMD profile instances

Implementation examples

<CMD_Component CardinalityMax="1" CardinalityMin="0"
  ConceptLink="http://www.isocat.org/datcat/DC-2520"
  ComponentId="clarin.eu:cr1:c_1271859438118"
  Name="Description">
    <CMD_Element Multilingual="true" CardinalityMax="1" CardinalityMin="1"
      ValueScheme="string" ConceptLink="http://www.isocat.org/datcat/DC-2520"
      Name="Description">
        <AttributeList>
            <Attribute Name="LanguageID" Type="string"/>
        </AttributeList>
    </CMD_Element>
</CMD_Component>
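For this solution the conversion runs the other way. The Python sketch below is again only an illustration: the function name to_attributes_only is hypothetical, and it assumes that the children of an Attribute are leaf elements holding plain text, as in the examples on this page.

```python
import xml.etree.ElementTree as ET


def to_attributes_only(root):
    """Fold the children of each <Attribute> into XML attributes and capitalize @name."""
    for attribute in list(root.iter("Attribute")):
        for child in list(attribute):
            # Assumption: every child is a leaf holding plain text (Name, Type).
            attribute.set(child.tag, (child.text or "").strip())
            attribute.remove(child)
        attribute.text = None  # drop leftover indentation whitespace
    for node in root.iter():
        if "name" in node.attrib:
            node.set("Name", node.attrib.pop("name"))
    return root


root = to_attributes_only(ET.fromstring(
    '<CMD_Element ValueScheme="string" name="Description"><AttributeList>'
    '<Attribute><Name>LanguageID</Name><Type>string</Type></Attribute>'
    '</AttributeList></CMD_Element>'
))
print(ET.tostring(root, encoding="unicode"))
```

An Attribute with structured children could not be converted this way, which is the limitation listed under Cons.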

Discussion

Discuss this solution proposal in this section

Tickets

Tickets in the CMDI 1.2 milestone with the keyword consistency:

Ticket Summary Owner Component Priority Status
No tickets found

Discussion

Menzo: how about the underscore in CMD_ComponentSpec, CMD_Component and CMD_Element? The CMD_ prefix could be pruned away (we know we're in the CMDI world, or we could indicate that using a/the namespace), or a CMDComponent or CmdComponent pattern could be used.
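For reference, pruning the prefix in existing specifications would be a small mechanical change. This is a minimal Python sketch, with the hypothetical helper strip_cmd_prefix and without the namespace variant:

```python
import xml.etree.ElementTree as ET


def strip_cmd_prefix(root, prefix="CMD_"):
    """Rename CMD_Component to Component, CMD_Element to Element, and so on."""
    for node in root.iter():
        if node.tag.startswith(prefix):
            node.tag = node.tag[len(prefix):]
    return root


root = strip_cmd_prefix(ET.fromstring(
    '<CMD_Component name="Description"><CMD_Element name="Description"/></CMD_Component>'
))
print([node.tag for node in root.iter()])  # ['Component', 'Element']
```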

Oliver (IDS): I agree with Menzo; (IMHO) we could just prune the prefix. If we want to keep the prefix, I vote for camel-cased (CmdComponent) -- looks nicer ;)
About the consistency issue: I tend to lean towards solution 3, because I think it is easier to process with XSLT, but I have no hard feelings. However, we should stick to one solution.

Last modified on 03/24/14 12:12:16