Changes between Version 1 and Version 2 of CmdiPortal


Timestamp:
07/27/09 12:54:14 (15 years ago)
Author:
vronk
Comment:

--

  • CmdiPortal

    v1 v2  
     1= CMDI Portal =
     2
    13CmdiPortal aims to be an integrative web-based working environment for the CLARIN user.
    24At a minimum it must provide a MetadataBrowser, a web interface for querying CLARIN metadata stored in the MetadataRepository. In its full form it should also provide personal workspace facilities and an extensible component for viewing the resources themselves (as opposed to only their metadata), and it should interoperate tightly with the WorkflowEngine and MetadataEditor components.
     5
     6== Web User Interface ==
     7
     8The interface should/could integrate various components. From the point of view of the user it could be a comprehensive toolbox, where everything is reachable within a click. :)
     9
     10[[Image(portal_gui.png)]]
     11
     12This is a tentative sketch of a possible user interface already in a highly evolved stage. The building blocks are:
     13 Search/Browse::
     14   This is where every workflow starts. It has multiple tabs: Browsing (Catalogue), Searching, Bookmarks etc.
     15   But a more "organic", hybrid approach should be examined (i.e. a mix of browsing and searching).
     16   Controls MDList.
     17 MDList::
     18   Here the results of a search or the members of a category are listed.[[BR]]
     19   ~ One MD record in one row.[[BR]]
     20   As we want searching to be very flexible, the distinction between the Search block and MDList may not be that clear;
     21   e.g. controls for refining the search may be integrated into the list.
     22 MDDetail::
     23   A detail view for one Metadata record. Controlled by the MDList.
     24 !PrivateWorkspace::
     25   This block is similar to the Browse block and could probably even be integrated into the main Search/Browse control.
     26   The important distinction is that this block does not show data from the (central) MetadataRepository, but rather local data,
     27   i.e. data available on the user's machine: private resources, resources downloaded from the repository, or results of processing existing resources.
     28   Unlike all the other components, this component would therefore need access to the local file system. This requires the user to install it on her machine, whether as a Firefox extension, a small Java application, or some other kind of standalone application.
     29   To avoid pressuring the user to install anything, this component has to be optional and installable on demand.
     30   This component should be very small: it should only be concerned with managing and enriching the local data and providing that information to the rest of the system.
     31 !ResourceViewer::
     32   An extensible container for displaying the resources. See [[CmdiPortal#resource-viewer Section: Resource Viewer]]
     33 WorkflowEngine::
     34   At an advanced stage of evolution, there should be integration or tight interoperability between the MD-Browser and the Workflow-Engine.
     35   The utopian scenario is to use the MD-Browser to find resources, both content and tools, drag them into the graphical WorkflowEngine pane, rearrange them, supply the necessary parameters, and run the workflow, with the results automatically added to the Results tab of the user's workspace.
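The control relationships among the building blocks above (Search/Browse controls MDList, MDList controls MDDetail) could be wired up roughly as follows. This is only a sketch: the class and method names, the `MDRecord` fields, and the in-memory list standing in for the MetadataRepository are all illustrative assumptions, not part of any existing implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MDRecord:
    """Hypothetical stand-in for one metadata record."""
    id: str
    title: str
    description: str = ""


class MDDetail:
    """Detail view for a single record; controlled by MDList."""

    def __init__(self) -> None:
        self.current: Optional[MDRecord] = None

    def show(self, record: MDRecord) -> None:
        self.current = record


class MDList:
    """Result list with one record per row; controlled by Search/Browse."""

    def __init__(self, detail: MDDetail) -> None:
        self.detail = detail
        self.rows = []

    def set_results(self, records) -> None:
        self.rows = list(records)

    def select(self, index: int) -> None:
        # Selecting a row drives the detail view.
        self.detail.show(self.rows[index])


class SearchBrowse:
    """Entry point of every workflow; pushes its results into MDList."""

    def __init__(self, repository, md_list: MDList) -> None:
        self.repository = repository  # assumption: in-memory stand-in
        self.md_list = md_list

    def search(self, term: str) -> None:
        needle = term.lower()
        hits = [r for r in self.repository
                if needle in r.title.lower() or needle in r.description.lower()]
        self.md_list.set_results(hits)


repo = [
    MDRecord("r1", "Spoken Dutch corpus", "audio with transcripts"),
    MDRecord("r2", "German newspaper corpus", "written text"),
    MDRecord("r3", "Tokenizer tool", "splits text into tokens"),
]
detail = MDDetail()
md_list = MDList(detail)
search = SearchBrowse(repo, md_list)

search.search("corpus")
md_list.select(1)
print([r.id for r in md_list.rows], detail.current.title)
```

Keeping each block ignorant of everything except the block it controls would make it easier to later merge Search/Browse and MDList, as the text anticipates.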
     36
     37=== Resource Viewer === #resource-viewer
     38While CMDI is primarily about metadata, it seems necessary in the long run to integrate a resource viewer if the portal is to offer more than a collection of links. Of course, every resource type requires a different view, and there are many ways to look at any given resource.
     39The idea here is to provide a kind of extensible container which, based on the provided metadata, tries to reach the actual resources and display them as well as it can. The important word here is "extensible": new ways of displaying resources can be plugged into the infrastructure at this hook.
     40
     41So ResourceViewer is an abstract class/interface that serves as the base for different implementations.
     42The first two implementations, which would already make good showcases, are `CorpusViewer` and `ResourceStats`.
     43
     44 ResourceStats::
     45   A generic viewer that tries to give an overview of the resource. It probably also needs to be specialized by resource type.
     46   It should report all kinds of counts (tokens, sentences, documents, entries, bib-usage, etc.).
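One way to picture the extensible container together with the abstract ResourceViewer and a first ResourceStats implementation is the following minimal sketch. The method names (`supports`, `render`), the `ViewerContainer` class, and the `"type"` metadata key are assumptions made for illustration; the counting is deliberately naive.

```python
from abc import ABC, abstractmethod
import re


class ResourceViewer(ABC):
    """Base interface for all viewers (method names are assumptions)."""

    @abstractmethod
    def supports(self, metadata: dict) -> bool:
        """Return True if this viewer can display the described resource."""

    @abstractmethod
    def render(self, metadata: dict, content: str) -> dict:
        """Return a displayable summary of the resource."""


class ResourceStats(ResourceViewer):
    """Generic viewer reporting simple counts for text resources."""

    def supports(self, metadata: dict) -> bool:
        return metadata.get("type") == "text"  # hypothetical metadata field

    def render(self, metadata: dict, content: str) -> dict:
        tokens = re.findall(r"\w+", content)
        sentences = [s for s in re.split(r"[.!?]+", content) if s.strip()]
        return {"tokens": len(tokens), "sentences": len(sentences)}


class ViewerContainer:
    """Extensible container: new viewers are registered on this hook."""

    def __init__(self) -> None:
        self._viewers = []

    def register(self, viewer: ResourceViewer) -> None:
        self._viewers.append(viewer)

    def view(self, metadata: dict, content: str) -> dict:
        # The first registered viewer that accepts the metadata wins.
        for viewer in self._viewers:
            if viewer.supports(metadata):
                return viewer.render(metadata, content)
        raise LookupError(f"no viewer registered for {metadata.get('type')!r}")


container = ViewerContainer()
container.register(ResourceStats())
stats = container.view({"type": "text"}, "A corpus is a text. It has tokens!")
print(stats)
```

A `CorpusViewer` would be registered on the same hook, so the container itself never needs to change when a new kind of view is added.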
     47
     48==== Corpus Viewer ====
     49A considerable share of language resources are text corpora. Corpora require a specific user interface; the minimum requirement is some kind of query interface and a KWIC (keyword in context) display of the results. This can be extended in many ways. In addition there is the server side: a Corpus Query System capable of answering queries accurately and quickly based on its prebuilt indices.
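To make the KWIC requirement concrete, here is a minimal sketch of a keyword-in-context listing over an in-memory token list. It scans linearly and matches whole tokens only; a real Corpus Query System would answer from prebuilt indices and support a far richer query language. The function name `kwic` and its parameters are illustrative assumptions.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every match."""
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


text = "the cat sat on the mat and the dog sat on the rug"
rows = kwic(text.split(), "sat", width=2)
for left, kw, right in rows:
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>12} [{kw}] {right}")
```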
     50
     51There are already a few (very good) implementations of such a system, and every corpus has to use one. Some of them even provide an API that allows the corpus to be queried from one's own or third-party applications. The idea for the CorpusViewer is to provide a query interface plus KWIC display (plus everything else in the long run) based on the provided APIs, querying each corpus where it is, i.e. communicating with the corpus query system at the content provider's site. (Let's forget problems with workload and performance for now.)
     52
     53There are two implementations (that I know of) which seem well suited to this scenario:
     54 * DDC - DWDS Dialing Concordancer - http://www.ddc-concordance.org/
     55 * manatee/bonito - http://textforge.cz
     56
     57Both are open source, provide a very expressive query language, scale well, and are robust and proven in production environments.
     58manatee/bonito is used by both the Slovak and Czech National Corpora and by Korpus Südtirol;
     59ddc is used by DWDS (Berlin), Schweizer Text Korpus and a unique corpus cooperation project, C4 (and most probably a few more).
     60Furthermore, manatee forms the basis of SketchEngine, a next-generation query tool,
     61and ddc comes with integrated functionality for distributed corpora, i.e. the data are spread across the network transparently to the user.
     62They both provide thin APIs in Perl and/or Python.
     63
     64With such a Corpus Viewer implemented, the CLARIN user would be (technically) able to query these existing corpora (running on these Corpus Query Systems), and a Corpus Service could be introduced to accommodate various "homeless" corpora from CLARIN member research groups, which would then also be reachable via this component, integrated in the CMDI Portal.
     65
     66Wouldn't that be nice? ;)
     67
     68== Profile ==
     69The proposed workspace shall be highly personalizable. The personalization settings shall be stored in a profile.
     70One user can manage multiple profiles, each corresponding to a different workspace.
     71
     72The internal format for profiles shall be JSON.[[BR]]
     73We can have personal and project/team profiles.[[BR]]
     74A project team could/should share a common project profile.
     75
     76We need to think about how to overlay multiple profiles.
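One possible reading of "overlaying" is a recursive merge in which a personal profile overrides a shared project profile key by key. The sketch below assumes exactly that policy; the profile keys shown are hypothetical, and other policies (for example, concatenating lists instead of replacing them) are equally conceivable.

```python
import json


def overlay(base: dict, override: dict) -> dict:
    """Return a new profile where `override` wins over `base`.

    Nested objects are merged key by key; any other value is replaced.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical profile contents; the keys are illustrative only.
project_profile = json.loads("""
{"layout": {"tabs": ["Browse", "Search"], "theme": "light"},
 "default_repository": "central"}
""")
personal_profile = json.loads("""
{"layout": {"theme": "dark"}, "bookmarks": ["r1"]}
""")

effective = overlay(project_profile, personal_profile)
print(json.dumps(effective, indent=2, sort_keys=True))
```

Because `overlay` builds new dictionaries rather than modifying its inputs, the same project profile can be shared by a whole team while each member layers a personal profile on top.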