MDService Example Queries
A collection of real-world queries (not all of them tested yet) and related issues.
Both the CQL queries and their XPath translations are shown.
MDRepository provides three interfaces: getCollections(), queryModel() and searchRetrieve().
Basic facets
Note: the facets in this section require queryModel to return values.
- profile overview
An important starting question is which profiles are actually used:
?operation=queryModel&q=//CMD/Components
However, this returns only the root elements, which may not refer unambiguously to the profiles. CMD/Header/MdProfile should refer to the correct profile, but may not always be filled in (and it requires queryModel to return values):
?operation=queryModel&q=//MdProfile
?operation=queryModel&q=//CMD/Components/MdProfile
- language overview
?operation=queryModel&q=language
- continent/country
?operation=queryModel&q=Location
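The facet requests above can be assembled programmatically. The following is a minimal Python sketch, not part of MDService itself: the base URL is a placeholder, and it assumes that the service decodes a percent-encoded q parameter in the usual way.

```python
from urllib.parse import urlencode

# Placeholder endpoint (an assumption); replace with the address of the actual MDService instance.
BASE_URL = "http://example.org/mdservice"


def query_model_url(q, base_url=BASE_URL):
    """Build a queryModel request URL, percent-encoding the q parameter."""
    return base_url + "?" + urlencode({"operation": "queryModel", "q": q})


if __name__ == "__main__":
    # The facet queries listed in this section.
    for q in ("//CMD/Components", "//MdProfile", "language", "Location"):
        print(query_model_url(q))
```

Encoding the query matters mainly for XPath values of q, which contain slashes, brackets and quotes.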
View collections
There is a special parameter collection, but it is currently unstable, as it relies on eXist's own collection mechanism.
A safe alternative is to use the element IsPartOf (full path: /CMD/Resources/IsPartOfList/IsPartOf).
cql: IsPartOf = test-hdl:1839/00-0000-0000-0001-494F-5
xpath: //IsPartOf[.='test-hdl:1839/00-0000-0000-0001-494F-5']
//IsPartOf[.='clarin-at:aac-test-corpus:sozialismus']
So to get the whole collection (all resources of the collection), we use operation=searchRetrieve and query the element IsPartOf:
collection(clarin-at:aac-test-corpus:sozialismus)
As the IsPartOf relation is fully expanded (every resource references every ancestor-collection CMD record via IsPartOf, not just its parent), the simple query introduced above returns all descendants rather than only the children, although returning only the children should actually be the default behaviour. One possible resolution would be to add an attribute @ancestor-level or similar, which would make it possible to distinguish the depth of the IsPartOf relation. Querying just the children would then be accomplished by:
//IsPartOf[@ancestor-level=1][.='clarin-at:aac-test-corpus:sozialismus']
This would even allow a flexible maximum depth for the query with respect to the collection hierarchy.
Querying multiple collections translates to an OR query:
xpath: //IsPartOf[.='{handle-coll1}' or .='{handle-coll2}']
//IsPartOf[@ancestor-level=1][.='{handle-coll1}' or .='{handle-coll2}']
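The collection constraints above follow a regular pattern, so they can be generated from a list of handles. The following Python sketch illustrates this; the function name is hypothetical, and the @ancestor-level attribute is only the proposal discussed in this section, not an existing feature.

```python
def is_part_of_xpath(handles, ancestor_level=None):
    """Build an IsPartOf constraint for one or more collection handles.

    ancestor_level is optional and maps to the proposed (not yet existing)
    @ancestor-level attribute; 1 would select direct children only.
    """
    if not handles:
        raise ValueError("at least one collection handle is required")
    for handle in handles:
        if "'" in handle:
            # Keep the sketch simple: no escaping of quotes inside handles.
            raise ValueError("handle must not contain a single quote: %r" % handle)
    level = "" if ancestor_level is None else "[@ancestor-level=%d]" % ancestor_level
    alternatives = " or ".join(".='%s'" % handle for handle in handles)
    return "//IsPartOf%s[%s]" % (level, alternatives)


if __name__ == "__main__":
    print(is_part_of_xpath(["clarin-at:aac-test-corpus:sozialismus"]))
    print(is_part_of_xpath(["{handle-coll1}", "{handle-coll2}"], ancestor_level=1))
```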
Simple Search Clauses
- Index ANY term
Title any the
- Phrase
Country = "United States"
/* not working yet in MDService */
Description any "Free play in a family context" "data collection"
(: but works in XPath: :)
//title[contains(.,'a machine-readable transcription')]
- comparators
Does not work yet:
Date < 2005 /* especially tricky because of the Date format YYYY-MM-DD (or even other formats) */
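One possible way to express a year comparison in XPath is to compare only the leading four characters of the value as a number. The Python sketch below generates such a predicate; the function name is hypothetical, and it assumes that every Date value starts with a four-digit year (as in YYYY-MM-DD or plain YYYY). It has not been tested against MDService.

```python
def year_comparison_xpath(element, operator, year):
    """Return a predicate comparing the leading four-digit year of `element`.

    Values that do not start with a number evaluate to NaN in XPath,
    so they never satisfy the comparison.
    """
    if operator not in ("<", "<=", "=", ">=", ">"):
        raise ValueError("unsupported operator: %r" % operator)
    return ".//%s[number(substring(.,1,4)) %s %d]" % (element, operator, int(year))


if __name__ == "__main__":
    # Sketch of a translation for the CQL clause: Date < 2005
    print("//CMD/Components[%s]" % year_comparison_xpath("Date", "<", 2005))
```

Comparing on the year alone sidesteps the different date formats, at the cost of not supporting month or day precision.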
Boolean
Boolean CQL queries are easily converted into XPath; the clauses just have to be contextualised with a common ancestor node, like this:
//Session[contains(Date,'2005')][contains(.//Description,'participant')]
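Currently this ancestor may not be the root node CMD itself, because MDRepository searches for the ancestor::CMD of the nodes matched by the XPath, and the ancestor axis does not include the matched node itself.
So the following returns 0 results (irrespective of the actual conditions):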
! //CMD[contains(.//Title,'year')][.//Date='2001-11-21']
But this would work:
//CMD/Components[contains(.//Title,'year')][.//Date='2001-11-21']
//CMD/*[contains(.//Title,'year')][.//Date='2001-11-21']
//Components[contains(.//Title,'year')][.//Date='2001-11-21']
Using Components as the starting point is usually best when searching in the actual metadata. This, however, does not cover the /CMD/Header and /CMD/Resources parts of the MD records. This matters especially when constraining the collections with IsPartOf:
//CMD/*[contains(.//Title,'year')][.//Date='2001-11-21'][.//IsPartOf='{handle-coll1}']
(: this wouldn't work: :)
! //CMD/Components[contains(.//Title,'year')][.//Date='2001-11-21'][.//IsPartOf='{handle-coll1}']
Boolean queries:
- AND
biling_data: ( ( Date any 2005 ) ) and ( ( Description any participant ) )
biling_data: ( ( Date any 2005 ) ) and ( ( Description any part ) ) and ( ( Country any United ) )
Brackets can be omitted wherever they are not needed for grouping. This should work as well:
biling_data: Date any 2005 and Description any participant
- AND OR
biling_data: ( ( Date any 2005 ) or ( Country any United ) ) and ( ( Description any part ) )
//CMD/Components[.//Date[contains(.,'2005')] or .//Country[contains(.,'United')]][.//Description[contains(.,'part')]]
- AND NOT
This seems very expensive, and the syntax is not yet certain.
biling_data: ( ( Date any 2005 ) not ( Country = Japan) )
//CMD[.//imdi-corpus][not(.//ResourceType[.='Metadata'])]
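The boolean translations in this section can be composed from a few small building blocks: AND becomes a chain of predicates on the shared context node, OR is joined inside one predicate, and NOT wraps a predicate in not(). The following Python sketch illustrates the pattern; all helper names are hypothetical, and it is not the translation code used by MDService.

```python
def _check(value):
    """Reject values containing a single quote; the sketch does no escaping."""
    if "'" in value:
        raise ValueError("value must not contain a single quote: %r" % value)
    return value


def term(index, value):
    """CQL 'index any value', translated as a substring match as in the examples above."""
    return ".//%s[contains(.,'%s')]" % (index, _check(value))


def equals(index, value):
    """CQL 'index = value', translated as an exact match."""
    return ".//%s[.='%s']" % (index, _check(value))


def any_of(*predicates):
    """OR: alternatives joined inside a single predicate."""
    return " or ".join(predicates)


def negate(predicate):
    """NOT: wrap a predicate in not()."""
    return "not(%s)" % predicate


def xpath(context, *predicates):
    """AND: chain the predicates on the common context node."""
    return context + "".join("[%s]" % p for p in predicates)


if __name__ == "__main__":
    # ( Date any 2005 or Country any United ) and Description any part
    print(xpath("//CMD/Components",
                any_of(term("Date", "2005"), term("Country", "United")),
                term("Description", "part")))
    # Date any 2005 not Country = Japan
    print(xpath("//CMD/Components",
                term("Date", "2005"),
                negate(equals("Country", "Japan"))))
```

The context is passed in explicitly because, as described above, the root node CMD cannot currently be used as the starting point.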