Changeset 1045 for MDRepository
Timestamp: 01/09/11 20:28:15 (13 years ago)
Location: MDRepository/trunk/xquery
Files: 2 added, 6 edited
MDRepository/trunk/xquery/README
r958 r1045
 12  12  c) set admin pwd
 13  13
 14      ...?
     14  d) you may want to add memory to the JVM
     15  under bin/functions.d/eXist-settings.sh#set_java_options()
     16
     17  e) you may also want to grow the cache in conf.xml
     18  <db-connection cacheSize="48M" collectionCache="24M" database="native"
     19  where @cacheSize could be around 512M
     20  and @collectionCache should be around one third of the @cacheSize
     21
 15  22
 16  23  1. add scripts to: /db/clarin
  …   …
 18  25  + cmd-model.xqm has all the logic
 19  26  + cmd-model.xql is the script being called as the interface
 20      +cmd-stats.xql is meant for testing purposes, but not integrated yet
 21      +init-cache.xql is meant for refreshing the cache with some long-running (resource-intensive) queries, meant to run once upon dataset change
     27  (+) cmd-stats.xql is meant for testing purposes, but not integrated yet
     28  (+) init-cache.xql is meant for refreshing the cache with some long-running (resource-intensive) queries, meant to run once upon dataset change
 22  29
 23      2. add data to /db/cmdi-mirror
 24      (the file-system structure will be reflected in the "collection"-structure within exist,
 25      however this is irrelevant for the MDRepository methods.
 26      Those rely on the linking via handles in MdSelfLink/ResourceRef and IsPartOf elements of the MDRecords.
 27      The handles in IsPartOf are redundant (necessary for faster collection-constraint search)
 28      and can be derived from the ResourceRef/MdSelfLink link.
 29      This can be done before storing the data in the repository,
 30      or after the import directly in the repository (scripts for this will be available soon)
 31  30
 32      3. add a clarin-user in /db/system/users.xml
     31  2. add a clarin-user in /db/system/users.xml
 33  32  (needed for writing into the cache)
 34  33  + /db/clarin/writer.xml with given user, like this:
  …   …
 39  38
 40  39
 41      4. create a collection for caching,
 42      eg: /db/common/clarin/freqs
     40  3. create a collection for caching,
     41  eg: /db/cache
 43  42  this has to correspond to the entry in cmd-model.xqm:
 44      declare variable $cmd-model:commonFreqsPath as xs:string := "/db/common/clarin/freqs";
     43  declare variable $cmd-model:commonFreqsPath as xs:string := "/db/cache";
 45  44
 46  45  If you change something, you have to manually clear the cache-collection.
  …   …
 50  49  for getCollections: collection{maxdepth}-{hash({collection-handle})}
 51  50  for queryModel: values{maxdepth}-{hash({simple xpath from q-param})}
     51
 52  52
 53      5. depending on your server-setup you should be able to get your first query under somewhere like:
     53  4. define indices
     54  copy cmdi-mirror.xconf into /db/system/config/db/cmdi-mirror
     55
     56
     57  5. add data to /db/cmdi-mirror
     58  (the file-system structure will be reflected in the "collection"-structure within exist,
     59  however this is irrelevant for the MDRepository methods.
     60  Those rely on the linking via handles in MdSelfLink/ResourceRef and <IsPartOf> elements of the MDRecords.
     61  The handles in <IsPartOf> are redundant (necessary for faster collection-constraint search)
     62  and can be derived from the ResourceRef/MdSelfLink link.
     63  This can be done before storing the data in the repository,
     64  or after the import directly in the repository (XUpdate-scripts for this will be available soon)
     65
     66  The top level collection record is by convention called colleciton_root.cmdi
     67  and is marked with: <IsPartOf>root</IsPartOf>
     68  (So every dataset (olac, lrt, imdi) has one such MDRecord.)
     69
     70  6. depending on your server-setup (port) you should be able to get your first query under somewhere like:
 54  71
 55  72  http://localhost:8680/exist/rest/db/clarin/cmd-model.xql?q=Components
  …   …
 61  78  Avoid starting multiple times.
 62  79  You can see in the cache-collection, if the results are ready.
 63  80
     81
 64  82
 65  83  == test suite ==
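Reviewer note on README step e): the sizing rule added there (a @cacheSize of around 512M, with @collectionCache at roughly one third of it) can be sketched as a small calculation. This is only an illustration of the arithmetic; the function name `suggest_cache_settings` is hypothetical and not part of the repository, and rounding down to whole megabytes is an assumption.

```python
def suggest_cache_settings(cache_size_mb: int) -> dict:
    """Return conf.xml attribute values following the README's rule of thumb.

    Assumption: collectionCache is one third of cacheSize, rounded down
    to a whole number of megabytes.
    """
    if cache_size_mb <= 0:
        raise ValueError("cache size must be positive")
    collection_cache_mb = cache_size_mb // 3
    return {
        "cacheSize": f"{cache_size_mb}M",
        "collectionCache": f"{collection_cache_mb}M",
    }


if __name__ == "__main__":
    # Print the two attribute values for the 512M figure named in the README.
    for name, value in suggest_cache_settings(512).items():
        print(f'{name}="{value}"')
```

The resulting values would go into the `<db-connection …>` element of conf.xml by hand; the README does not describe any tooling for this.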
MDRepository/trunk/xquery/build-tests.properties
r827 r1045
  2   2  test.file=${cmdi-tests.basedir}/cmd-test.xml
  3   3  tests.output = ${cmdi-tests.basedir}/output
      4  //test.group = cmdi-tests-searchRetrieve
      5  test.group = cmdi-tests-cc
MDRepository/trunk/xquery/build-tests.xml
r827 r1045
 43  43  </typedef>
 44  44
 45      <echo message="queries-file :${test.file}"/>
     45  <echo message="queries-file#group:${test.file}#${test.group}"/>
 46  46  <delete dir="${tests.output}/temp" failonerror="false"/>
 47  47  <mkdir dir="${tests.output}"/>
  …   …
 50  50  <test:benchmark outputFile="${tests.output}/cmdi-result.xml"
 51  51  source="${test.file}"
 52      group="cmdi-tests-cc"/>
     52  group="${test.group}"/>
 53  53  </target>
 54  54
MDRepository/trunk/xquery/cmd-model.xql
r727 r1045
 29  29  return
 30  30  if ($operation eq $cmd-model:getCollections) then
 31      cmd-model:get-collections($query-collections, $format, $max-depth)
     31  cmd-model:get-collections($query-collections, $format, $max-depth)
 32  32  else if ($operation eq $cmd-model:queryModel) then
 33      cmd-model:query-model($cmd-index-path, $query-collections, $format, $max-depth)
 34      else if ($operation eq $cmd-model:searchRetrieve) then
 35      let $cql-query := request:get-parameter("query", "MDGroup/Actors/Actor"),
 36      $start-item := request:get-parameter("startRecord", 1),
 37      $end-item := request:get-parameter("iend", 50)
     33  cmd-model:query-model($cmd-index-path, $query-collections, $format, $max-depth)
     34  else if ($operation eq $cmd-model:scanIndex) then
     35  let $filter := request:get-parameter("filter", ""),
     36  $start-item := request:get-parameter("startItem", 1),
     37  $max-items := request:get-parameter("maxItems", 50),
     38  $sort := request:get-parameter("sort", 'text')
     39  return cmd-model:scan-index($cmd-index-path, $query-collections, $format, $start-item, $max-items, $sort)
     40  else if ($operation eq $cmd-model:searchRetrieve) then
     41  let $cql-query := request:get-parameter("query", "MDGroup/Actors/Actor"),
     42  $start-item := request:get-parameter("startItem", 1),
     43  $max-items := request:get-parameter("maxItems", 50)
 38  44
 39      return cmd-model:search-retrieve($cql-query, $query-collections, $format, xs:integer($start-item), xs:integer($end-item))
     45  return cmd-model:search-retrieve($cql-query, $query-collections, $format, xs:integer($start-item), xs:integer($max-items))
 40  46  else
 41  47  <error>Unknown operation</error>
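Reviewer note: the new scanIndex branch reads the request parameters startItem, maxItems and sort (defaults 1, 50 and 'text'). A client-side sketch of building such a request could look like the following. The base URL is the one from the README; the parameter name `operation` and the use of `q` to carry the index path are assumptions, since the hunk does not show how `$operation` and `$cmd-index-path` are read from the request. The function name `scan_index_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base URL taken from the README; adjust host and port to your server setup.
BASE_URL = "http://localhost:8680/exist/rest/db/clarin/cmd-model.xql"

# Sort modes declared in cmd-model.xqm (scanSortText, scanSortSize).
ALLOWED_SORTS = ("text", "size")


def scan_index_url(index_path, start_item=1, max_items=50, sort="text",
                   base_url=BASE_URL):
    """Build a scanIndex request URL.

    Assumptions: the operation is selected via a parameter named
    'operation' and the index path travels in 'q'.
    """
    if sort not in ALLOWED_SORTS:
        # Mirror the fallback in cmd-model:scan-index: unknown values
        # are replaced by the 'text' sort.
        sort = "text"
    params = {
        "operation": "scanIndex",
        "q": index_path,
        "startItem": start_item,
        "maxItems": max_items,
        "sort": sort,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(scan_index_url("MDGroup/Actors/Actor", max_items=20, sort="size"))
```

Note that the scanIndex branch passes `$start-item` and `$max-items` on without the `xs:integer(...)` cast that the searchRetrieve branch applies, and that `$filter` is read but not passed to `cmd-model:scan-index`.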
MDRepository/trunk/xquery/cmd-model.xqm
r959 r1045
 12  12  declare variable $cmd-model:cmdiMirrorPath as xs:string := "/db/cmdi-mirror";
 13  13  declare variable $cmd-model:cachePath as xs:string := "/db/cache";
 14
     14  declare variable $cmd-model:groupXsl := doc('/db/clarin/group.xsl');
 15  15  declare variable $cmd-model:getCollections as xs:string := "getCollections";
 16  16  declare variable $cmd-model:queryModel as xs:string := "queryModel";
     17  declare variable $cmd-model:scanIndex as xs:string := "scanIndex";
 17  18  declare variable $cmd-model:searchRetrieve as xs:string := "searchRetrieve";
 18  19
  …   …
 28  29  declare variable $cmd-model:responseFormatText as xs:string := "text";
 29  30
     31  declare variable $cmd-model:scanSortText as xs:string := "text";
     32  declare variable $cmd-model:scanSortSize as xs:string := "size";
     33
 30  34  declare variable $cmd-model:collectionDocName as xs:string := "collection.xml";
 31  35
  …   …
 34  38  declare variable $cmd-model:xmlExt as xs:string := ".xml";
 35  39
     40  declare variable $cmd-model:maxDepth as xs:integer := 8;
 36  41  declare variable $cmd-model:valuesLimit as xs:integer := 100;
 37  42
     43
     44  (:~
     45  API function getCollections.
     46  :)
     47  declare function cmd-model:get-collections($collections as xs:string+, $format as xs:string, $max-depth as xs:integer) as item() {
     48  let $name := cmd-model:gen-cache-id("collection", $collections, xs:string($max-depth)),
     49  $doc :=
     50  if (cmd-model:is-in-cache($name)) then
     51  cmd-model:get-from-cache($name)
     52  else
     53  let $data := cmd-model:colls($collections, $max-depth)
     54  return cmd-model:store-in-cache($name, $data)
     55  return
     56  cmd-model:serialise-as($doc, $format)
     57  };
 38  58
 39  59  (:~
  …   …
 54  74
 55  75  (:~
 56      API function getCollections.
 57      :)
 58      declare function cmd-model:get-collections($collections as xs:string+, $format as xs:string, $max-depth as xs:integer) as item() {
 59      let $name := cmd-model:gen-cache-id("collection", $collections, xs:string($max-depth)),
 60      $doc :=
 61      if (cmd-model:is-in-cache($name)) then
 62      cmd-model:get-from-cache($name)
 63      else
 64      let $data := cmd-model:colls($collections, $max-depth)
 65      return cmd-model:store-in-cache($name, $data)
     76  API function scanIndex.
     77  two phases:
     78  1.one create full index for given path/element (and cache)
     79  2. select wished subsequence (on second call, only the second step is performed)
     80  :)
     81  declare function cmd-model:scan-index($q as xs:string, $collection as xs:string+, $format as xs:string, $start-item as xs:integer, $max-items as xs:integer, $p-sort as xs:string?) as item()? {
     82
     83  let $qa := tokenize($q,'='),
     84  $cmd-index-path := $qa[1],
     85  $filter := ($qa[2],'')[1],
     86  $sort := if ($p-sort eq $cmd-model:scanSortText or $p-sort eq $cmd-model:scanSortSize) then $p-sort else $cmd-model:scanSortText,
     87  $name := cmd-model:gen-cache-id("index", ($collection, $cmd-index-path),"1"),
     88  (: skip cache $doc := cmd-model:values($cmd-index-path, $collection) :)
     89  $doc := if (cmd-model:is-in-cache($name)) then
     90  cmd-model:get-from-cache($name)
     91  else
     92  let $data := cmd-model:values($cmd-index-path, $collection)
     93  return cmd-model:store-in-cache($name, $data)
     94
     95  (: extract the required subsequence (according to given sort) :)
     96  let $res-term := transform:transform($doc,$cmd-model:groupXsl,
     97  <parameters><param name="mode" value="subsequence"/>
     98  <param name="sort" value="{$sort}"/>
     99  <param name="filter" value="{$filter}"/>
    100  <param name="start-item" value="{$start-item}"/>
    101  <param name="max-items" value="{$max-items}"/>
    102  </parameters>),
    103  $count-items := count($res-term/v),
    104  $colls := if (fn:empty($collection)) then '' else fn:string-join($collection, ","),
    105  $created := fn:current-dateTime(),
    106  $scan-clause := concat($cmd-index-path, '=', $filter),
    107  $res := <Terms colls="{$colls}" created="{$created}" count_items="{$count-items}"
    108  start-item="{$start-item}" max-items="{$max-items}" sort="{$sort}" scanClause="{$scan-clause}" >{$res-term}</Terms>
    109
    110  (: let $result-count := $doc/Term/@count,
    111  $result-seq := fn:subsequence($doc/Term/v, $start-item, $end-item),
    112  $result-frag := ($doc/Term, $result-seq),
    113  $seq-count := fn:count($result-seq) :)
    114
 66 115  return
 67      cmd-model:serialise-as($doc, $format)
    116  cmd-model:serialise-as($res, $format)
 68 117  };
 69 118
  …   …
 82 131  for $coll in $collections return util:eval(fn:concat("$collection/ft:query(descendant::IsPartOf, <term>", xdb:decode($coll) ,"</term>)/ancestor-or-self::CMD", $sanitized-query))
 83 132
 84      let$result-count := fn:count($results),
    133  let $result-count := fn:count($results),
 85 134  $result-seq := fn:subsequence($results, $start-item, $end-item),
 86 135  $seq-count := fn:count($result-seq),
 87      $end-time := util:system-dateTime(),
 88      $result-fragment :=
    136  $end-time := util:system-dateTime()
    137
    138  let $summary-fragment :=
    139  if (contains($format,'withSummary')) then
    140  let $used-profiles := for $profile in distinct-values($results//Components/concat(child::element()/name(),'##',../Header/MdProfile))
    141  let $profile-id := substring-after($profile,'##'), $profile-name := substring-before($profile,'##')
    142  return <profile id="{$profile-id}" name="{$profile-name}" count="{count($results//Components[concat(child::element()/name(),'##',../Header/MdProfile) eq $profile])}" />,
    143  $end-time2 := util:system-dateTime(),
    144  $result-summary := cmd-model:elem-r($result-seq//Components, "Components", $cmd-model:maxDepth, $cmd-model:maxDepth),
    145  $end-time3 := util:system-dateTime(),
    146  $duration := concat(($end-time - $start-time),", ", ($end-time2 - $start-time),", ", ($end-time3 - $start-time))
    147  return (<duration>{$duration}</duration>, <usedProfiles>{$used-profiles}</usedProfiles>,<resultSummary>{$result-summary}</resultSummary>)
    148  else <duration>{$end-time - $start-time}</duration>
    149
    150  let $result-fragment :=
 89 151  <searchRetrieveResponse>
 90 152  <numberOfRecords>{$result-count}</numberOfRecords>
  …   …
 92 154  <extraResponseData>
 93 155  <returnedRecords>{$seq-count}</returnedRecords>
 94      <duration>{$end-time - $start-time}</duration>
    156  {$summary-fragment}
 95 157  </extraResponseData>
 96 158  <records>
  …   …
106 168  (:
107 169  **********************
108      queryModel - subfunctions
    170  queryModel, scanIndex - subfunctions
109 171  :)
110 172
  …   …
154 216  (for $elname in $subs[. != '']
155 217  return
156      cmd-model:elem-r(util:eval(concat("$path-nodes/", $elname)), concat($path, '/', $elname), $max-depth, $depth - 1),
157      if ($max-depth eq 1 and $text-count gt 0) then cmd-model:values($path-nodes) else ())
    218  cmd-model:elem-r(util:eval(concat("$path-nodes/", $elname)), concat($path, '/', $elname), $max-depth, $depth - 1)
    219  (: values moved to own function: scanIndex
    220  if ($max-depth eq 1 and $text-count gt 0) then cmd-model:values($path-nodes) else ()) :)
    221  )
158 222  else 'maxdepth'
159 223  }</Term>
160      };
161
162      declare function cmd-model:values($nodes as node()*) as node()* {
163      let $keys := distinct-values($nodes/text())
164      let $values := for $key at $pos in $keys
165      let $kcount := count($nodes[. eq $key])
166      order by lower-case($key) ascending
167      return <v key="{$key}" cnt="{$kcount}" />
168      return
169      if ($cmd-model:valuesLimit eq 0) then $values
170      else
171      subsequence($values, 1, $cmd-model:valuesLimit)
172 224  };
173 225
  …   …
178 230  return util:node-xpath($anc)
179 231  }</Term>
    232  };
    233
    234  declare function cmd-model:collect-nodes($collections as xs:string+, $path as xs:string) as element()* {
    235  let $collection := collection($cmd-model:cmdiMirrorPath),
    236  $path-nodes :=
    237  if ($collections[1] eq $cmd-model:collectionRoot) then
    238  util:eval(fn:concat("$collection/descendant-or-self::", $path))
    239  else
    240  for $coll in $collections
    241  return
    242  util:eval(fn:concat("$collection/ft:query(descendant::IsPartOf, <query><term>", xdb:decode($coll), "</term></query>)/ancestor-or-self::CMD/descendant-or-self::", $path))
    243
    244  return $path-nodes
    245  };
    246
    247  declare function cmd-model:values($path as xs:string,$collections as xs:string+) as element() {
    248
    249  let $nodes := cmd-model:collect-nodes($collections, $path),
    250  (: $term := <Term path="{fn:concat("//", $path)}" name="{(text:groups($path, "/([^/]+)$")[last()],$path)[1] }" >{$nodes}</Term>
    251  @name is added in xslt:)
    252  $term := <Term path="{fn:concat("//", $path)}" >{$nodes}</Term>
    253
    254  (: use XSLT-2.0 for-each-group functionality to aggregate the values of a node - much, much faster, than XQuery :)
    255  return transform:transform($term,$cmd-model:groupXsl, ())
    256
180 257  };
181 258
MDRepository/trunk/xquery/cmd-test.xml
r830 r1045
  3   3  <configuration>
  4   4  <!-- <connection id="con" user="admin" password="" base="xmldb:exist://embedded-eXist-server"/> -->
  5      <connection id="con" user="admin" password="pwd" base="xmldb:exist://localhost:8680/exist/xmlrpc/"/>
      5  <!-- <connection id="con" user="admin" password="M0dest7db" base="xmldb:exist://localhost:8680/exist/xmlrpc"/> -->
      6  <connection id="con" user="admin" password="sen71blE8dba" base="xmldb:exist://clarin.aac.ac.at/exist/xmlrpc"/>
  6   7  <action name="sequence" class="org.exist.performance.ActionSequence"/>
  7   8  <action name="create-collection" class="org.exist.performance.actions.CreateCollection"/>
  …   …
 40  41  <thread name="thread1" connection="con">
 41  42  <sequence repeat="20" description="actual queries on the mdrecords">
 42      <xquery collection="/db/cmdi-mirror" query="">
     43  <xquery collection="db/cmdi-mirror" query="">
 43  44  import module namespace cmd-model = "http://spraakbanken.gu.se/clarin/xquery/model"
 44  45  at "xmldb:exist:///db/clarin/cmd-model.xqm";
 45  46  cmd-model:search-retrieve("//title[contains(.,'a')]", 'root', 'xml', 1,10)
 46  47  </xquery>
 47      <xquery collection="/db/cmdi-mirror" query="// ft:query(*, <query><term>Language</term></query>)" />
 48      <xquery collection="/db/cmdi-mirror" query="ft:query(descendant::IsPartOf, <query><term>clarin-at:aac-test-corpus</term></query>)" />
     48  <xquery collection="/db/cmdi-mirror" query="//Components/ft:query(., 'Language')" />
     49  <xquery collection="/db/cmdi-mirror" query="ft:query(descendant::IsPartOf, 'clarin-at:aac-test-corpus')" />
 49  50  <xquery collection="/db/cmdi-mirror" query="//idno[.='4197']" />
 50  51  <xquery collection="/db/cmdi-mirror" query="//date[.>'1920']" />
 51  52  <xquery collection="/db/cmdi-mirror" query="//ResourceType[.='Metadata']" />
 52  53  <xquery collection="/db/cmdi-mirror" query="//sourceDesc/biblStruct/monogr/title[contains(.,'rat')]" />
 53      <xquery collection="/db/cmdi-mirror" query="//CMD/*[.//teiHeader//monogr/title[contains(.,'t')]][.//teiHeader//imprint/date[. >'1930']][.//teiHeader//imprint/date[.<'1935']]" />
     54  <xquery collection="/db/cmdi-mirror" query="//CMD/*[.//teiHeader//monogr/title[contains(.,'t')]][.//teiHeader//imprint/date[.>'1930']][.//teiHeader//imprint/date[.<'1935']]" />
 54  55  </sequence>
 55  56  </thread>