wiki:SemanticMapping

The Semantic Mapping service is (at least in its abstract definition) a separate component of CMDI that MDService uses for query expansion based on semantic relations between terms (data categories or CMD components and elements). It in turn relies on the RelationRegistry, which stores the relations.

However, the division of competencies (separation of concerns), and thus also the interface of the SM-service, is not completely clear yet. The main problem is the resolution of names to identifiers. Some notes on it:

  • RelationRegistry should store the relations based on identifiers.
  • In interaction with the user and the MDRepository MDService/Browser uses the names of the components and elements.
  • As MDService knows both the (ambiguous) names and the appropriate identifiers, it would be trivial to let it do the resolution. However, it would then have to do the whole processing (see below) itself, leaving SemanticMapping with not much to do.
  • Thus it seems more appropriate for the SM-service to accept a cmdIndex-formatted string, do the necessary processing itself and return the already resolved, semantically equivalent combinations. But for this it would have to query the ComponentRegistry itself, repeating a lookup that MDService has already done.

Example

An example for the "basic" idea:

If

  1. an MD-query contains an index-string (according to cmdIndex-syntax), e.g.:
    Actor.Name any Peter
    
  2. semantic mapping level is set to 3
  3. and the RelationRegistry contains the following relations (#Xxx stands for an identifier):
    #sameAs (#Actor, #Person)
    #sameAs (#Name, #FullName)
    

Then after applying semantic mapping the query should be expanded to:

 Actor.Name any Peter      
 OR Actor.FullName any Peter  
 OR Person.Name any Peter     
 OR Person.FullName any Peter 
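
The expansion above can be sketched in a few lines of Python. This is only an illustration of the idea, not the SM-service interface: the `SAME_AS` list is a hypothetical in-memory stand-in for the RelationRegistry, names are used in place of identifiers for readability, and the function names are invented for this example.

```python
from itertools import product

# Hypothetical stand-in for the RelationRegistry (names instead of identifiers).
SAME_AS = [("Actor", "Person"), ("Name", "FullName")]


def alternatives(term, relations):
    """Return the term itself plus every term one sameAs hop away."""
    alts = [term]
    for a, b in relations:
        if a == term and b not in alts:
            alts.append(b)
        elif b == term and a not in alts:
            alts.append(a)
    return alts


def expand_index(index, relations):
    """Expand a dotted cmdIndex string into all combinations of alternatives."""
    parts = index.split(".")
    combos = product(*(alternatives(p, relations) for p in parts))
    return [".".join(c) for c in combos]


def expand_query(index, relation, term, relations=SAME_AS):
    """Join the expanded index strings into one OR-ed query string."""
    expanded = expand_index(index, relations)
    return " OR ".join(f"{i} {relation} {term}" for i in expanded)


print(expand_query("Actor.Name", "any", "Peter"))
```

The sketch follows sameAs only one hop and treats it as symmetric; whether the real service should also follow chains of relations is left open here.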


The processing model

  1. if the parameter is already an identifier, go to the next step;
    if the parameter is in cmdIndex format:
    1. split it into individual components
    2. translate Names to URIs (conceptlink or component-id)
  2. find correspondences/alternatives for every component (see the semantic mapping levels below) by querying ComponentRegistry or RelationRegistry with the identifiers as arguments
  3. create all combinations of the alternatives
  4. optionally check in ComponentRegistry for existence of given combinations
  5. optionally display the expanded query to the user
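
A minimal sketch of steps 1 to 4, under strong simplifying assumptions: the three lookup tables are hypothetical stand-ins for ComponentRegistry and RelationRegistry, every name is assumed to resolve to exactly one identifier (in reality names are ambiguous), and the relations are stored one-directionally. Step 5 (display) is left out.

```python
from itertools import product

# Hypothetical lookup tables; the real data would come from the registries.
NAME_TO_ID = {
    "Actor": "#Actor",
    "Person": "#Person",
    "Name": "#Name",
    "FullName": "#FullName",
}
SAME_AS = {"#Actor": ["#Person"], "#Name": ["#FullName"]}
# Combinations assumed to actually exist in the ComponentRegistry.
KNOWN_PATHS = {("#Actor", "#Name"), ("#Person", "#FullName")}


def resolve(param):
    """Step 1: pass identifiers through, split and translate cmdIndex strings."""
    if param.startswith("#"):
        return [param]
    return [NAME_TO_ID[name] for name in param.split(".")]


def process(param, check_existence=False):
    ids = resolve(param)
    # Step 2: collect the alternatives for every component of the path.
    alts = [[i] + SAME_AS.get(i, []) for i in ids]
    # Step 3: create all combinations of the alternatives.
    combos = list(product(*alts))
    # Step 4 (optional): keep only combinations known to exist.
    if check_existence:
        combos = [c for c in combos if c in KNOWN_PATHS]
    return combos


print(process("Actor.Name"))
print(process("Actor.Name", check_existence=True))
```

The existence check matters because the cross product grows quickly and many generated combinations will not correspond to any real component/element pair.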

Semantic Mapping levels (provided to the user to decide)

level 1
just mapping based on the ConceptLink; this can be resolved with information from the ComponentRegistry
  1. If the input-parameter is a data category, CompReg is queried for all the components containing elements with the given @ConceptLink:
     CompReg.concept2components-elements(DatCat)
    
  2. If the input-parameter is an element or component, first the appropriate ConceptLink is looked up by MDService (generally in the ComponentRegistry, but normally MDService will have this information cached), only then continuing as stated above.
level 2
use information from RelationRegistry, but just the equivalence relation (sameAs)
level 3
use information from RelationRegistry, and also regard the data categories of Components (as described in #Example)
this requires combining the equivalents of the Elements with those of the Components
level 4
use information from RelationRegistry, and also regard other relations (subClassOf, synonymy?)
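
One way to make the levels concrete is a small configuration table that says which relation types a level may use. The sketch below is an assumption about how this could be modelled, not an agreed design: `LevelConfig`, `LEVELS` and `usable_relations` are invented names, relations are assumed to be `(type, source, target)` triples, level 4 is assumed to keep the component mapping of level 3, and synonymy is omitted because the notes leave it open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelConfig:
    relation_types: frozenset  # relation types read from the RelationRegistry
    map_components: bool       # whether Components are mapped as well as Elements


# Hypothetical encoding of levels 1-4 as described above.
LEVELS = {
    1: LevelConfig(frozenset(), False),            # ConceptLink only
    2: LevelConfig(frozenset({"sameAs"}), False),
    3: LevelConfig(frozenset({"sameAs"}), True),
    4: LevelConfig(frozenset({"sameAs", "subClassOf"}), True),
}


def usable_relations(relations, level):
    """Keep only the (type, source, target) triples the level permits."""
    allowed = LEVELS[level].relation_types
    return [r for r in relations if r[0] in allowed]


relations = [("sameAs", "#Actor", "#Person"), ("subClassOf", "#Author", "#Person")]
print(usable_relations(relations, 2))
```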

Searching in Datacategories

Procedure for searching in Datacategories (SM level 1)

  1. identify the DatCats in use by combining the information on usage from MDRepository with the @ConceptLinks on Elements in the CMD-Profiles
  2. provide the list of usable DatCats, with name (and description) resolved from isocatDCR,
    • optionally in selected language
    • the list can be further expanded based on the information in RelationRegistry
  3. user selects a datacategory as an index to search in and provides the search term (or is provided with allowed values based on usage-data from MDRepository)
  4. the query is rewritten: the datacategory is translated to the equivalent XML elements (cmdIndex or XPath in general)
  5. in the result-view (in table-format) the column shall be named by the datacategory, but the values have to be retrieved based on the cmdIndex again.
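
Step 4 amounts to a reverse lookup from a data category to the elements that carry it as @ConceptLink. A minimal sketch, assuming the element-to-concept information is already available as a plain dictionary; the paths and the `datcat:` identifiers below are made-up placeholders, not real isocat entries.

```python
from collections import defaultdict

# Hypothetical data: cmdIndex path -> @ConceptLink of that element.
ELEMENT_CONCEPTS = {
    "Actor.Name": "datcat:personName",
    "Person.FullName": "datcat:personName",
    "Session.Title": "datcat:title",
}


def concept_index(element_concepts):
    """Invert the mapping: data category -> list of cmdIndex paths."""
    index = defaultdict(list)
    for path, concept in element_concepts.items():
        index[concept].append(path)
    return index


def rewrite(datcat, relation, term, index):
    """Translate a data-category query into an OR over the element paths."""
    paths = index.get(datcat, [])
    return " OR ".join(f"{p} {relation} {term}" for p in paths)


index = concept_index(ELEMENT_CONCEPTS)
print(rewrite("datcat:personName", "any", "Peter", index))
```

The same index can serve step 5: the result column is labelled with the data category, while the values are fetched through the individual cmdIndex paths it maps to.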
Last modified on 01/15/11 13:50:10