source: MDRepository/trunk/xquery/README

Last change on this file was 3107, checked in by vronk, 11 years ago

marked obsoleted

####################################################
THIS CODE IS OBSOLETE!
It has been integrated and further developed within:
https://github.com/vronk/corpus_shell
and
https://github.com/vronk/SADE
####################################################
 
== CLARIN MDRepository ==
Steps to set up and run the repository
 
0. prerequisites + install
        a) be sure to use java-jdk 1.6 (we experienced strange Java errors with 1.5)

        b) install eXist:
        http://exist-db.org/quickstart.html#sect2

        java -jar eXist-{version}.jar -p {install-dir}

        c) set the admin password

        d) you may want to give the JVM more memory
           under bin/functions.d/eXist-settings.sh#set_java_options()

        e) you may also want to grow the cache in conf.xml:
                 <db-connection cacheSize="48M" collectionCache="24M" database="native"
           where @cacheSize could be around 512M
           and @collectionCache should be around one third of the @cacheSize
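The sizing rule from step e) can be worked through as a small calculation. This is only an illustrative sketch: the helper name cache_settings is made up here and is not part of eXist, and 512 is simply the example value mentioned above, not a recommendation for every machine.

```python
def cache_settings(cache_size_mb):
    """Return (cacheSize, collectionCache) attribute values for conf.xml.

    Applies the rule of thumb from step e): collectionCache is roughly
    one third of cacheSize. Hypothetical helper, not part of eXist.
    """
    collection_cache_mb = cache_size_mb // 3  # whole megabytes, rounded down
    return f"{cache_size_mb}M", f"{collection_cache_mb}M"


cache_size, collection_cache = cache_settings(512)
print(f'cacheSize="{cache_size}" collectionCache="{collection_cache}"')
# prints: cacheSize="512M" collectionCache="170M"
```

The two values still have to be entered by hand into the <db-connection> element of conf.xml.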
 

1. add scripts to: /db/clarin

        + cmd-model.xqm has all the logic
        + cmd-model.xql is the script being called as the interface
        + groups.xsl
        (+) cmd-stats.xql is meant for testing purposes, but is not integrated yet
        (+) init-cache.xql refreshes the cache with some long-running (resource-intensive) queries; it is meant to be run once after each dataset change
 

2. add a clarin user in /db/system/users.xml
   (needed for writing into the cache)
   + /db/clarin/writer.xml naming that user, like this:
   <write>
     <write-user>clarin</write-user>
     <write-user-cred>{PASSWORD}</write-user-cred>
   </write>
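If the password contains characters such as & or <, it has to be escaped inside writer.xml. One way to avoid doing that by hand is to generate the file content with an XML library. The following is a minimal sketch, assuming the three-element structure shown above is all the file needs; the function name build_writer_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_writer_xml(user, password):
    """Build the content of writer.xml as a string.

    Mirrors the <write>/<write-user>/<write-user-cred> structure from
    step 2. ElementTree escapes special characters in the password.
    """
    root = ET.Element("write")
    ET.SubElement(root, "write-user").text = user
    ET.SubElement(root, "write-user-cred").text = password
    return ET.tostring(root, encoding="unicode")


# "{PASSWORD}" is the placeholder from the README; replace it with the real one.
print(build_writer_xml("clarin", "{PASSWORD}"))
```

The resulting string still has to be stored as /db/clarin/writer.xml in the database.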
 

3. create a collection for caching,
        e.g.: /db/cache
        This has to correspond to the entry in cmd-model.xqm:
        declare variable $cmd-model:commonFreqsPath as xs:string := "/db/cache";

        If you change anything, you have to clear the cache collection manually.

        Queries on the queryModel and getCollections interfaces are cached.
        The key is:
          for getCollections: collection{maxdepth}-{hash({collection-handle})}
          for queryModel:     values{maxdepth}-{hash({simple xpath from q-param})}
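To make the key format concrete, here is a rough sketch of how such keys could be assembled. The README does not say which hash function cmd-model.xqm uses, so the MD5 digest below is only a stand-in, and the keys it produces will not necessarily match the names in your cache collection. The function names and the example handle are hypothetical as well.

```python
import hashlib


def _hash(text):
    # ASSUMPTION: MD5 hex digest as a stand-in; the actual hash function
    # used by cmd-model.xqm is not documented in this README.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def collections_cache_key(collection_handle, max_depth):
    """Key pattern for getCollections: collection{maxdepth}-{hash(handle)}."""
    return f"collection{max_depth}-{_hash(collection_handle)}"


def query_model_cache_key(xpath, max_depth):
    """Key pattern for queryModel: values{maxdepth}-{hash(xpath)}."""
    return f"values{max_depth}-{_hash(xpath)}"


# Made-up handle and the q-param from the example query in step 6.
print(collections_cache_key("hdl:example/collection-1", 2))
print(query_model_cache_key("Components", 1))
```

Since maxdepth is part of the key, the same query with a different maxdepth is cached as a separate entry.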
 

4. define indices
        copy cmdi-mirror.xconf into /db/system/config/db/cmdi-mirror
 

5. add data to /db/cmdi-mirror
        The file-system structure will be reflected in the "collection" structure within eXist;
        however, this is irrelevant for the MDRepository methods.
        Those rely on the linking via handles in the MdSelfLink/ResourceRef and <IsPartOf> elements of the MDRecords.
        The handles in <IsPartOf> are redundant (they are needed for faster collection-constraint search)
        and can be derived from the ResourceRef/MdSelfLink links.
        This can be done before storing the data in the repository,
        or after the import, directly in the repository (XUpdate scripts for this will be available soon).

        The top-level collection record is by convention called colleciton_root.cmdi
        and is marked with: <IsPartOf>root</IsPartOf>
        (So every dataset (olac, lrt, imdi) has one such MDRecord.)
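The derivation of <IsPartOf> from the ResourceRef/MdSelfLink links can be sketched as follows, for the case where it is done before the import. This is not the announced XUpdate script, just an illustration under assumptions: the records are reduced to a plain mapping from each record's own handle (MdSelfLink) to the handles it references (ResourceRef), the function name and the handles are made up, and a record that no other record references is treated as a top-level collection record.

```python
def derive_is_part_of(records):
    """Return the <IsPartOf> values for every record.

    `records` maps a record's own handle (MdSelfLink) to the list of
    handles it points to via ResourceRef.
    """
    parents = {handle: [] for handle in records}
    for parent_handle, resource_refs in records.items():
        for child_handle in resource_refs:
            # References to anything that is not a known MDRecord
            # (e.g. media files) are ignored.
            if child_handle in parents:
                parents[child_handle].append(parent_handle)
    # ASSUMPTION: a record nobody links to is a top-level collection
    # record and gets the conventional value "root".
    return {handle: (found or ["root"]) for handle, found in parents.items()}


# Made-up handles for illustration only.
records = {
    "hdl:demo/top": ["hdl:demo/coll-1"],
    "hdl:demo/coll-1": ["hdl:demo/rec-1", "hdl:demo/rec-2"],
    "hdl:demo/rec-1": [],
    "hdl:demo/rec-2": ["hdl:demo/audio.wav"],
}
for handle, is_part_of in derive_is_part_of(records).items():
    print(handle, "->", is_part_of)
```

A record can end up with several <IsPartOf> values if more than one collection references it.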
 
6. depending on your server setup (port), you should be able to run your first query at a URL like:

        http://localhost:8680/exist/rest/db/clarin/cmd-model.xql?q=Components
        (queryModel is the default operation)

        http://localhost:8680/exist/rest/db/clarin/cmd-model.xql?operation=getCollections&collection=

        These queries may take some time when run for the first time, so be patient.
        Avoid starting them multiple times.
        You can check in the cache collection whether the results are ready.
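If you want to build these URLs from a script, the query string can be assembled with the standard library. This sketch only constructs the URLs and does not send any request; the base URL is the example from above and has to be adapted to your host and port, and the function name build_query_url is hypothetical.

```python
from urllib.parse import urlencode

# Example base URL from this README; adapt host and port to your setup.
BASE_URL = "http://localhost:8680/exist/rest/db/clarin/cmd-model.xql"


def build_query_url(base_url=BASE_URL, **params):
    """Append the given parameters to the interface URL, properly encoded."""
    return f"{base_url}?{urlencode(params)}"


print(build_query_url(q="Components"))
# prints: http://localhost:8680/exist/rest/db/clarin/cmd-model.xql?q=Components
print(build_query_url(operation="getCollections", collection=""))
# prints: http://localhost:8680/exist/rest/db/clarin/cmd-model.xql?operation=getCollections&collection=
```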
 

== test suite ==
THIS IS CURRENTLY BEING DEVELOPED! NOT SAFELY USABLE YET!

The test suite has its own build file, build-tests.xml.
It is based on eXist's performance.xml sub-build file
and imports the main eXist build file.
This causes problems with basedir for the imported build files.

The simplest solution I could find is to set basedir as a property on the command line:

ant -f build-tests.xml -Dbasedir=C:/apps/exist benchmark

The other options are to be set in build-tests.properties!

The actual queries for testing/benchmarking are written in cmd-test.xml