MetadataBrowser – CLARIN Trac

MetadataBrowser is the user interface (web application) of the MDService. It provides browsing and searching functionality on the Metadata Repository. (This page is meant to give a rather abstract definition; implementation details are described separately.)

Search

The user formulates a query to search in the metadata. The result is a list of MD-Records. Because the records are structurally complex (they follow a large number of different schemas), the user is guided, while formulating the query, by meta-information about the data (MD-Records) in the repository. This comprises:

collections: basic hierarchical structure of the records in the MDRepository
components and profiles: list of existing profiles and MD-components stored in the ComponentRegistry
data categories: list of data categories provided by the isocat DCRegistry.

Query Input

This is a crucial component with respect to ergonomics. It has to interact with (be fed from) the Selection Lists (Collections, Terms), and it "commands" the Result-pane.

Two images accompany this section: a screenshot of the IMDI-Browser, which serves as one source of inspiration and as a baseline, and a tentative schema for the query-input UI-component.

Variants:

imdi-style: interaction via autocomplete, combo-boxes and dropping in terms from the selection lists
search-by-profile: the whole profile listed as a tree, with input-widgets
contextually-autocompleting-textarea: a single textarea with contextual autocomplete (suggestions); one rather raw attempt: http://193.170.95.104/search/searchx#

Repository Matrix

This is a rather ambitious proposal for an advanced search/interaction UI-component. It is a mix of the variants mentioned above (especially search-by-profile), faceted search and repository statistics.

The motivation is a complex explorative search-process, where the user needs to relate many different results.

The basic idea is to build up a matrix (table) of Terms and Collections. Each cell in this table stands for the result set for a given Term and Collection, represented by a number (the size of the result set). Each row is governed by a Term, which can be:

*: basic simple search in all fields
simple Term: fed from CmdiMetadataServices#queryModel repository statistics; either a Datacategory or an Element from the Component Registry
search clause: index + relation + search-term (free text, or one of a proposed list if the ValueScheme defines a vocabulary)
boolean clause: combining multiple search clauses by AND/OR
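The search clause and boolean clause can be pictured as a small recursive data structure. The following is only a sketch under stated assumptions: the class names, the element names used as indexes, the `=` relation and the rendered string form are all hypothetical and do not describe the actual query syntax of the MDService.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SearchClause:
    index: str      # element or data category to search in (hypothetical names)
    relation: str   # e.g. "="; the set of supported relations is assumed
    term: str       # free text or a value taken from a vocabulary

    def render(self) -> str:
        return f'{self.index} {self.relation} "{self.term}"'


@dataclass
class BooleanClause:
    operator: str   # "AND" or "OR"
    operands: List[Union["SearchClause", "BooleanClause"]]

    def render(self) -> str:
        # Parenthesise every boolean clause so nesting stays unambiguous.
        inner = f" {self.operator} ".join(op.render() for op in self.operands)
        return f"({inner})"


query = BooleanClause("AND", [
    SearchClause("Actor.Sex", "=", "female"),
    BooleanClause("OR", [
        SearchClause("Actor.Age", "=", "30"),
        SearchClause("Actor.Age", "=", "31"),
    ]),
])
rendered = query.render()
```

Nesting boolean clauses inside each other is what allows a single matrix row to stand for an arbitrarily complex query.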

The columns list the selected collections; as a special case, an all-selector searches over the whole Repository.

Even if the matrix is not implemented as such, it shall serve as guidance for the various querying possibilities.

This matrix would still be fed by the selection-lists in the menu, providing available Collections and Terms.
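As an illustration of the matrix idea, the following sketch computes one result-set size per (Term, Collection) pair. Everything here is an assumption made for the example: the in-memory `RECORDS` list stands in for the Repository, a simple Term is taken to match any record that has a field of that name, and `ALL` is a hypothetical name for the all-selector column.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-in for the Repository; real records would come from the MDService.
RECORDS = [
    {"collection": "corpusA", "fields": {"Sex": "female", "Age": "30"}},
    {"collection": "corpusA", "fields": {"Sex": "male"}},
    {"collection": "corpusB", "fields": {"Sex": "female"}},
]
ALL = "all"  # hypothetical name for the all-selector column


def count_hits(term: str, collection: str) -> int:
    """Size of the result set for one cell of the matrix."""
    hits = 0
    for record in RECORDS:
        if collection != ALL and record["collection"] != collection:
            continue
        # "*" matches every record; otherwise the record must use the element.
        if term == "*" or term in record["fields"]:
            hits += 1
    return hits


def build_matrix(
    terms: List[str],
    collections: List[str],
    counter: Callable[[str, str], int],
) -> Dict[Tuple[str, str], int]:
    """One count per (term, collection) pair: rows are terms, columns collections."""
    return {(t, c): counter(t, c) for t in terms for c in collections}


matrix = build_matrix(["*", "Sex", "Age"], [ALL, "corpusA", "corpusB"], count_hits)
```

In a real setting each cell would correspond to one query against the Repository, so the number of rows and columns directly determines the number of requests.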

Collection view

Lists all collections contained in the Repository. The basic structure for building the Collections-Tree is the collection/partOf-relation, constituted by ResourceProxy-links in one MD-record that point to (multiple) other MD-records instead of to the Resources themselves. It seems sensible to provide the list of contributing parties (individual OAI-PMH providers) at the first level of the collection hierarchy.
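A minimal sketch of how such a tree could be derived from the links, assuming the ResourceProxy-links have already been extracted into a plain mapping from a record id to the ids of the MD-records it points to. The ids are made up, and the link graph is assumed to be acyclic.

```python
from typing import Dict, List

# record id -> ids of the MD-records its ResourceProxy-links point to
# (hypothetical ids; the first level stands for a contributing provider)
LINKS: Dict[str, List[str]] = {
    "providerX": ["corpusA", "corpusB"],
    "corpusA": ["sessionA1", "sessionA2"],
    "corpusB": [],
}


def find_roots(links: Dict[str, List[str]]) -> List[str]:
    """Records that no other record links to form the top level of the tree."""
    linked = {child for children in links.values() for child in children}
    return sorted(record for record in links if record not in linked)


def render_tree(links: Dict[str, List[str]], node: str, depth: int = 0) -> List[str]:
    """Indented listing of a node and everything reachable from it."""
    lines = ["  " * depth + node]
    for child in links.get(node, []):
        lines.extend(render_tree(links, child, depth + 1))
    return lines


roots = find_roots(LINKS)
tree = render_tree(LINKS, roots[0])
```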

This collection view also seems the natural place to display the virtual collections. Initially, all VCs could be displayed; if they become too numerous, some structuring and filtering will have to be employed.

Components/Elements view

The component registry accommodates many components (more than 100 currently, possibly rising to thousands) packaged in a number of profiles (currently 13, eventually rising to, hopefully, not much more than 100). These profiles translate to schemas, which guide the individual MD-records/instances that the MDRepository harvests and stores. This allows "informed search" in the Repository: based on the profiles and their CMD-Components and -Elements, the user is informed about the structure of the data in the Repository.

Therefore the user will be provided with information on the "model" (profiles, components, elements), fetched from the ComponentRegistry. However, the model from the ComponentRegistry only reveals the possible structure; it does not tell which schemas are actually used and which elements are filled in the MD-records in the Repository. This information will therefore optionally be enriched with statistical data from the Repository, so that, together with the structure of the data, the user knows beforehand which MD-Elements are really used and contain useful information.
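The enrichment step can be pictured as a simple join of the model with usage counts. This is a sketch only: the dotted path notation, the `USAGE` mapping and the count of 120 are invented for illustration and do not reflect the format of the actual repository statistics.

```python
from typing import Dict, List, Tuple


def enrich_model(model_paths: List[str], usage: Dict[str, int]) -> List[Tuple[str, int]]:
    """Attach to every element of the model the number of records that fill it."""
    return [(path, usage.get(path, 0)) for path in model_paths]


# Possible structure, as it could be derived from the ComponentRegistry.
MODEL_PATHS = [
    "SpeechCorpusProfile.MDGroup.Actors.Actor.Sex",
    "SpeechCorpusProfile.MDGroup.Actors.Actor.Age",
]
# Actual usage, as it could be reported by the Repository (invented numbers).
USAGE = {"SpeechCorpusProfile.MDGroup.Actors.Actor.Sex": 120}

enriched = enrich_model(MODEL_PATHS, USAGE)
used_only = [path for path, count in enriched if count > 0]
```

Elements with a count of zero remain part of the model but can be greyed out or hidden in the view.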

There are multiple possible views on this "model". The default/natural one is the treeview starting from the profiles (e.g. SpeechCorpusProfile), digging down through its components (MDGroup, Actors, Actor), until one eventually arrives at the leaves: the elements (Sex or Age).

Another, probably more convenient, way is to work directly with the (flat list of) elements, as they are where the actual values to search on reside. However, this comes with the problem of ambiguity, as the same element can be used in various components in various profiles. One solution could be a "contextual" list of elements, in which ambiguous elements are expanded with their context (ancestor components, up to the profile) until they become unambiguous. With this list, if the user uses an ambiguous element, she is provided with a sublist of the contexts in which the element is used, to pick from. The user can still be left the freedom to keep the query ambiguous, searching in all elements with the same name irrespective of their component context.
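One way to compute such a contextual list is to extend each element name with ancestors until the label is unique. The sketch below assumes every element is given as a path from the profile down to the element; the `Project` component and both `Name` elements are hypothetical and serve only to produce an ambiguity.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def shortest_unique_labels(paths: List[Path]) -> Dict[Path, str]:
    """Label each element with the shortest path suffix no other element shares."""
    labels: Dict[Path, str] = {}
    for path in paths:
        suffix = path  # fall back to the full path if nothing shorter is unique
        for length in range(1, len(path) + 1):
            candidate = path[-length:]
            clashes = [p for p in paths if p != path and p[-length:] == candidate]
            if not clashes:
                suffix = candidate
                break
        labels[path] = ".".join(suffix)
    return labels


PATHS: List[Path] = [
    ("SpeechCorpusProfile", "MDGroup", "Actors", "Actor", "Sex"),
    ("SpeechCorpusProfile", "MDGroup", "Actors", "Actor", "Name"),  # hypothetical
    ("SpeechCorpusProfile", "MDGroup", "Project", "Name"),          # hypothetical
]

labels = shortest_unique_labels(PATHS)
# Contexts offered to the user after typing the ambiguous element "Name".
name_contexts = sorted(label for path, label in labels.items() if path[-1] == "Name")
```

An unambiguous element such as Sex keeps its bare name, while the two Name elements are each expanded by one ancestor.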

This element-list can be easily used in the query field for autocompletion and also for customizing the result-view, deciding which fields of the MD-records the user wants to have displayed.

Data Categories

Another viable path to the data should be via Data Categories. These are profile-neutral and hopefully well-defined.

The strategy: the user is provided with the list (or tree) of Data Categories (fetched from the isocat-DCR) and can use them in the query. However, the MDRepository is unaware of Data Categories or ConceptLinks, so the DCs used in the query have to be resolved to the appropriate Components/Elements before the MD-query is passed to the MDRepository. For this, the MDService has to use the information from the ComponentRegistry, as described in SemanticMapping (level 1).
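The resolution step can be sketched as rewriting one clause on a Data Category into an OR over all elements linked to it. The `CONCEPT_LINKS` table, the `DC-…` identifiers, the `Speaker.Gender` element and the query string form are all hypothetical; the real mapping would be derived from the ComponentRegistry.

```python
from typing import Dict, List

# element -> data category it is linked to (invented identifiers)
CONCEPT_LINKS: Dict[str, str] = {
    "Actor.Sex": "DC-sex",
    "Speaker.Gender": "DC-sex",
    "Actor.Age": "DC-age",
}


def elements_for(data_category: str, concept_links: Dict[str, str]) -> List[str]:
    """All elements whose concept link points to the given data category."""
    return sorted(el for el, dc in concept_links.items() if dc == data_category)


def resolve(data_category: str, relation: str, term: str,
            concept_links: Dict[str, str]) -> str:
    """Rewrite a clause on a data category as an OR over the linked elements."""
    elements = elements_for(data_category, concept_links)
    if not elements:
        raise LookupError(f"no element is linked to {data_category}")
    return " OR ".join(f'{el} {relation} "{term}"' for el in elements)


resolved = resolve("DC-sex", "=", "female", CONCEPT_LINKS)
```

The resolved query contains only element names, so it can be passed on to a repository that knows nothing about Data Categories.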

Virtual Collection

One natural function of the MDBrowser is to save the results of a query as a virtual collection. The VCRegistry provides an interface for the creation of Virtual Collections, where the MDBrowser has to register (POST) the information (required metadata plus the list of MD-Records that form the virtual collection). This also means that the user needs immediate feedback: she needs to see her registered virtual collection listed within the general virtual collection list. This is problematic, as the overall virtual collection listing is fetched in a uniform manner (like other metadata) from the MDRepository, which will harvest Virtual Collections from the VCR, yielding a latency of roughly 1 day.

So the MDBrowser/Service will probably have to apply a dual strategy: get the virtual collections primarily from the MDRepository, but additionally query the "pending" virtual collections (not yet harvested by the MDRepository), or at least the VCs of the given user, directly from the VCR.
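The dual strategy amounts to merging two listings while giving the harvested copy precedence. The sketch below assumes both sources have already been fetched into lists of dictionaries with an `id` key; the record layout and the `status` field are hypothetical, and the actual calls to the MDRepository and the VCR are left out.

```python
from typing import Dict, List


def list_virtual_collections(harvested: List[dict], pending: List[dict]) -> List[dict]:
    """Merge VCs from the MDRepository with not yet harvested ones from the VCR."""
    merged: Dict[str, dict] = {
        vc["id"]: dict(vc, status="harvested") for vc in harvested
    }
    for vc in pending:
        # A VC already known from the MDRepository is not listed a second time.
        merged.setdefault(vc["id"], dict(vc, status="pending"))
    return sorted(merged.values(), key=lambda vc: vc["id"])


from_repository = [{"id": "vc1", "name": "A"}]
from_vcr = [{"id": "vc1", "name": "A"}, {"id": "vc2", "name": "B"}]
listing = list_virtual_collections(from_repository, from_vcr)
```

Marking entries as pending lets the UI show a freshly registered collection immediately while making clear that it has not been harvested yet.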

Moreover, publishing a Virtual Collection requires some notion of a user, so that the created VC can be attributed to a Creator. There should also be a kind of basket (cart) in which the user can collect MD-Records before publishing them to the VCR.

User / Profile

Although in its basic form the MDBrowser is just a human viewport on the MDService and should work statelessly, in general statelessness limits the possible functionality.

MDBrowser now provides Workspaceprofiles for this (see the implementation details).

Use case

The following diagram depicts the required and optional functions of the MDBrowser from the user's point of view.

Parts of Content

The following diagram tries to relate the individual parts of content used in the MetadataBrowser.

Last modified on 05/27/11 10:08:37

Attachments (6)

MDBrowser_UseCaseDiagram.png (71.0 KB) - added by vronk 14 years ago.
screen_IMDI_mdsearch.png (44.8 KB) - added by vronk 14 years ago. Screenshot of the IMDI-Browser search interface
screen_MDBrowser_20100525_v2.png (52.8 KB) - added by vronk 14 years ago. Screenshot of the MetadataBrowser search user-interface (preliminary version)
SearchInput.png (41.8 KB) - added by vronk 14 years ago. Schematic view of a proposal for the query input UI-component of MetadataBrowser
mdbrowser_repomatrix.png (201.8 KB) - added by vronk 14 years ago. MDBrowser Repository Matrix
mdbrowser_components.png (81.0 KB) - added by vronk 14 years ago. relations between types of content (as components in the browser)
