CMDI Portal

CmdiPortal aims to be an integrative web-based working environment for the CLARIN user. At a minimum, it must provide a MetadataBrowser: a web interface for querying CLARIN metadata stored in the MetadataRepository. When fully developed, it should also provide personal workspace facilities and an extensible component for viewing the resources themselves, as opposed to viewing only their metadata. It should also interoperate tightly with the WorkflowEngine and MetadataEditor components.

Components and Deployment

CmdiPortal and MDService are tightly intertwined, and several further components are related to them. The figures below give a first sketch of the component interdependencies and the deployment situation around CmdiPortal and MDService. This sketch still needs to be elaborated and refined; in particular, AAI is still missing entirely.

Figure 1: CmdiPortal, MDService and interacting/related components

Figure 2: One possible deployment situation (deployment and dependencies) for CmdiPortal, MDService and related components

Web User Interface

The interface should integrate various components. From the user's point of view, it could be a comprehensive toolbox in which everything is reachable with a single click. :)

Figure 3: A tentative sketch of a possible portal user interface, already in a highly evolved stage

The building blocks are discussed below. Beware that the boundaries between them will probably become increasingly fuzzy. For example, the detail view of a Virtual Collection MD record is itself an MD collection, i.e. a list of MD entries:

Search/Browse
This is where every workflow starts. It has multiple tabs: Browsing (Catalogue), Searching, Bookmarks, etc. A more "organic", hybrid approach that mixes browsing and searching should also be examined. This block controls the MDList.
MDList/Tree
This block lists the results of a search (an MD collection) or the members of a category.
Roughly one MD record per row.
Since we want searching to be very flexible, the distinction between the Search block and the MDList may not be that clear. For example, controls for refining the search may be integrated into the list.
MDDetail
A detail view of a single metadata record, controlled by the MDList.
Local/PrivateWorkspace
This block is similar to the Browse block and could probably even be integrated into the main Search/Browse control. Its distinguishing function is that it does not show data from the (central) MetadataRepository. Instead, it shows local data, i.e. data available on the user's machine. This may be private resources, resources downloaded from the repository, or results of processing existing resources.
Accordingly, unlike all the other components, this component needs access to the local file system. The user will therefore have to install it on her machine, for example as a Firefox extension, a small Java application or some other kind of standalone application. To avoid pressuring the user to install anything, this component has to be optional and installable on demand.
This component should be very small. It should only be concerned with reaching the local data (managing and enriching them) and providing that information to the rest of the system.
ResourceViewer
An extensible container for displaying the resources. See the Resource Viewer section below.
WorkflowEngine
At an advanced stage, the MD-Browser and the WorkflowEngine should be integrated or at least interoperate tightly. The ideal scenario works as follows:
1. The user finds resources, both content and tools, with the MD-Browser.
2. She drags them into the graphical WorkflowEngine pane.
3. She rearranges them and supplies the necessary parameters.
4. She runs the workflow, and the results are automatically added to the Results tab of her workspace.

Resource Viewer

CMDI is primarily about metadata. Still, to offer more than a mere collection of links, it seems necessary to integrate a resource viewer in the long run. Every resource type requires a different view, and there are many ways to look at any one resource. The idea is to provide an extensible container that uses the provided metadata to locate the actual resources and display them as well as it can. The key attribute is "extensible": new kinds of resource display (novel implementations) can be plugged into the infrastructure at this hook.

ResourceViewer is therefore an abstract class/interface that serves as the base for different implementations. A few ideas for derived classes/implementations that would already make good showcases:

CorpusViewer
See the next section.
ResourceStats
A generic viewer that tries to give an overview of the resource. It probably also needs to be specialized by resource type. It should report all kinds of counts (tokens, sentences, documents, entries, bibliographic usage, etc.). The open question is how much of this information the resource provider would have to supply via a specific interface. Alternatively, much of it may already be contained in the MD, which would make this viewer little more than an enhanced MDDetail view.
Audio/Video Viewer
Listen to the spoken corpus, watch the video!
Generic_iFrame
Really just a placeholder for embedding a user interface supplied by the resource provider. This will look nasty, since the patchwork will be visually recognizable, but it would be quick to implement. It is not clear whether it would really be useful.
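To make the "abstract class plus pluggable implementations" idea concrete, here is a minimal sketch. It assumes that viewers are selected by a resource-type field in the metadata and that a wildcard viewer (ResourceStats) serves as the fallback. The names ViewerRegistry, handles, render and the resourceType, counts and url keys are all hypothetical; GenericIFrame stands for Generic_iFrame above.

```python
import html
from abc import ABC, abstractmethod


class ResourceViewer(ABC):
    """Hypothetical base class: every viewer renders a resource from its metadata."""

    # Resource types this viewer handles; "*" marks a generic fallback viewer.
    handles: tuple[str, ...] = ()

    @abstractmethod
    def render(self, metadata: dict) -> str:
        """Return a rendering (here: plain text/HTML) of the described resource."""


class ResourceStats(ResourceViewer):
    # Generic fallback: summarises whatever counts the metadata already carries.
    handles = ("*",)

    def render(self, metadata: dict) -> str:
        counts = metadata.get("counts", {})
        parts = [f"{key}={value}" for key, value in sorted(counts.items())]
        return "Stats: " + (", ".join(parts) if parts else "no counts in metadata")


class GenericIFrame(ResourceViewer):
    # Placeholder that simply embeds the provider's own user interface.
    handles = ("web-application",)

    def render(self, metadata: dict) -> str:
        return f'<iframe src="{html.escape(metadata["url"], quote=True)}"></iframe>'


class ViewerRegistry:
    """The extension hook: new viewer implementations are registered here."""

    def __init__(self) -> None:
        self._viewers: list[ResourceViewer] = []

    def register(self, viewer: ResourceViewer) -> None:
        self._viewers.append(viewer)

    def viewer_for(self, metadata: dict) -> ResourceViewer:
        rtype = metadata.get("resourceType")
        # Prefer a viewer specialised for this type, else fall back to a wildcard one.
        for viewer in self._viewers:
            if rtype in viewer.handles:
                return viewer
        for viewer in self._viewers:
            if "*" in viewer.handles:
                return viewer
        raise LookupError(f"no viewer registered for {rtype!r}")


registry = ViewerRegistry()
registry.register(GenericIFrame())
registry.register(ResourceStats())
md = {"resourceType": "text-corpus", "counts": {"tokens": 1200, "documents": 3}}
print(registry.viewer_for(md).render(md))  # Stats: documents=3, tokens=1200
```

Adding a new kind of display then only means subclassing ResourceViewer and registering an instance. The portal code that looks up viewers stays unchanged.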

Corpus Viewer

Text corpora make up a considerable part of language resources. Corpora require a specific user interface: at a minimum, some kind of query interface and a KWIC (keyword in context) display of the results. This can be extended in many ways, but it is the necessary minimum. In addition, there is the server side: a Corpus Query System capable of answering queries exactly and quickly based on its prebuilt indices.
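The following minimal sketch shows what the KWIC display amounts to. It assumes the query system returns hits as token positions in a tokenized text; the function names find_hits and kwic and their parameters are hypothetical. find_hits is a plain linear scan used only for illustration, whereas a real Corpus Query System answers from its prebuilt indices.

```python
def find_hits(tokens: list[str], word: str) -> list[int]:
    """Toy query: positions of tokens equal to `word` (linear scan, no index)."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def kwic(tokens: list[str], hits: list[int], width: int = 3, pad: int = 20) -> list[str]:
    """Format each hit as one KWIC line.

    width: number of context tokens shown on each side of the keyword.
    pad:   character width of the left and right context columns.
    """
    lines = []
    for i in hits:
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        # Right-align the left context so that all keywords line up in one column.
        lines.append(f"{left[-pad:]:>{pad}} [{tokens[i]}] {right[:pad]:<{pad}}".rstrip())
    return lines


tokens = "the cat sat on the mat and the dog sat too".split()
for line in kwic(tokens, find_hits(tokens, "sat")):
    print(line)
```

The aligned keyword column is what makes KWIC output easy to scan. Sorting by left or right context would be a natural next extension.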

There are already a few (very good) implementations of such systems, since every corpus has to use one. Some of them even provide an API that allows the corpus to be queried from one's own or third-party applications. The idea for the CorpusViewer is to provide a query user interface plus a KWIC display (plus "everything else" in the long run) on top of these APIs. The corpora would be queried where they reside, i.e. by communicating with the corpus query system at the content provider's site. (Let's ignore workload and performance issues for now.)

There are two implementations that I know of that seem well suited to this scenario: manatee/bonito and ddc.

Both are open source, provide a very expressive query language, scale well, and are robust and proven in production environments. manatee/bonito is used by the Slovak and Czech National Corpora and by Korpus Südtirol. ddc is used by DWDS (Berlin), the Schweizer Text Korpus (Basel) and C4, a unique corpus cooperation project, and most probably by a few more.

Furthermore, manatee forms the basis of SketchEngine, a next-generation query tool. ddc comes with built-in support for distributed corpora, i.e. the data can be spread across the network transparently to the user. Both provide thin APIs in Perl and/or Python.

With such a Corpus Viewer in place, the CLARIN user would technically be able to query the existing corpora running on these Corpus Query Systems. Theoretically, the querying could even be heterogeneous: multiple corpora in different formats, running on different CQSs, could be queried at once. A Corpus Housing Service could also be introduced to accommodate various "homeless" corpora from CLARIN member research groups. These corpora would then also be reachable via this component, integrated in the CMDI Portal.

Wouldn't that be nice? ;)
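The heterogeneous case can be sketched as a fan-out: send one query to several query systems in parallel, then merge the hits tagged with their source corpus. This is a minimal sketch under loud assumptions. The CorpusQuerySystem interface, Hit, InMemoryCQS and federated_query are hypothetical and do not reproduce the manatee or ddc APIs; real adapters would wrap those APIs. InMemoryCQS is a toy stand-in for a remote site.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    corpus: str   # which corpus/site the hit came from
    left: str
    keyword: str
    right: str


class CorpusQuerySystem(Protocol):
    """Hypothetical common interface that each CQS adapter would implement."""
    name: str

    def query(self, q: str) -> list[Hit]: ...


class InMemoryCQS:
    """Toy stand-in for a remote CQS: exact-token match, no real query language."""

    def __init__(self, name: str, text: str) -> None:
        self.name = name
        self.tokens = text.split()

    def query(self, q: str) -> list[Hit]:
        return [
            Hit(self.name, " ".join(self.tokens[max(0, i - 2):i]), tok,
                " ".join(self.tokens[i + 1:i + 3]))
            for i, tok in enumerate(self.tokens) if tok == q
        ]


def federated_query(systems: list[CorpusQuerySystem], q: str) -> tuple[list[Hit], dict[str, str]]:
    """Query all systems in parallel; return merged hits and per-system errors."""
    hits: list[Hit] = []
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(systems))) as pool:
        futures = {pool.submit(s.query, q): s.name for s in systems}
        for future, name in futures.items():
            try:
                hits.extend(future.result(timeout=10))
            except Exception as exc:  # one unreachable site should not break the search
                errors[name] = repr(exc)
    return hits, errors


corpora = [InMemoryCQS("A", "the cat sat on the mat"), InMemoryCQS("B", "a dog sat there")]
hits, errors = federated_query(corpora, "sat")
```

The hard part this sketch leaves out is query translation. manatee and ddc each have their own query language, so a real heterogeneous search would need to map one user query onto each backend's syntax.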

Profile

Propositions:

  • The proposed workspace shall be highly personalizable.
  • This information shall be stored in a profile.
  • Multiple profiles can be managed by one user (corresponding to different workspaces).
  • The internal format for profiles shall be JSON.
  • We can have personal and project/team profiles.
  • A project team could/should share a common project profile.
  • We need to think about how to overlay multiple profiles.
  • Despite the anticipated heavy use of AJAX, every workspace state shall be linkable. For example, a linkToThis() method could freeze the application state as a Profile[JSON] and publish it under a generated link.
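The two more technical propositions, overlaying profiles and linkToThis(), could look roughly like the sketch below. The assumptions are hypothetical:
- Profiles are nested JSON objects, and later profiles override earlier ones (e.g. personal over project).
- Lists are replaced rather than concatenated.
- A dict stands in for a server-side state store.
- The link is derived from a hash of the canonical JSON.
The base URL, the function names and the profile keys are all hypothetical.

```python
import copy
import hashlib
import json

BASE_URL = "https://portal.example.org/state/"  # hypothetical
_STORE: dict[str, str] = {}  # stand-in for a server-side state store


def overlay(*profiles: dict) -> dict:
    """Merge profiles left to right; later profiles win.

    Nested objects are merged key by key; all other values (including lists)
    are replaced. Input profiles are not modified.
    """
    result: dict = {}
    for profile in profiles:
        for key, value in profile.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = overlay(result[key], value)
            else:
                result[key] = copy.deepcopy(value)
    return result


def link_to_this(state: dict) -> str:
    """Freeze a workspace state as canonical JSON and publish it under a generated link."""
    frozen = json.dumps(state, sort_keys=True, separators=(",", ":"))
    key = hashlib.sha256(frozen.encode("utf-8")).hexdigest()[:12]
    _STORE[key] = frozen
    return BASE_URL + key


def restore(link: str) -> dict:
    """Load the frozen state behind a link produced by link_to_this()."""
    return json.loads(_STORE[link.rsplit("/", 1)[-1]])


project = {"layout": {"panes": ["search", "mdlist", "mddetail"], "theme": "light"}, "bookmarks": []}
personal = {"layout": {"theme": "dark"}, "bookmarks": ["bookmark-1"]}
workspace = overlay(project, personal)
link = link_to_this(workspace)
```

Hashing the canonical JSON means that identical states always map to the same link. Whether lists such as bookmarks should be concatenated instead of replaced when overlaying is exactly the kind of question the last propositions leave open.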

Samples

Screenshots of existing systems and user interfaces shall be collected here as a basis for discussion. Ideas:

  • Flamenco Faceted Search at clarin.eu
Last modified on 08/04/09 07:51:36
