
Explore the \DUroletitlereference{Component Metadata Framework}

In \emph{CMD}, metadata schemas are defined by profiles, which are constructed out of reusable components (collections
of metadata fields). Components can contain other components, and they can be reused in multiple profiles.
Furthermore, every CMD element (metadata field) refers via a PID to a data category to indicate unambiguously how the content of the field in a metadata description should
be interpreted (Broeder et al., 2010).

Thus, every profile can be expressed as a tree, with the profile component as the root node, the used components as intermediate nodes,
and elements or data categories as leaf nodes. The parent-child relationship is defined by inclusion (\texttt{componentA -includes-> componentB}) or referencing (\texttt{elementA -refersTo-> datcat1}). The reuse of components in multiple profiles, and especially the referencing of the same data categories in multiple CMD elements, blends the individual profile trees into a graph (directed and acyclic, but not necessarily connected).
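
How profile trees blend into one graph can be illustrated with a minimal Python sketch. It is not taken from the SMC Browser code; the profile names (\texttt{profileX}, \texttt{profileY}), the element \texttt{elementC} and the helper functions are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical edges of two profile trees, as (parent, relation, child).
profile_trees = {
    "profileX": [
        ("profileX", "includes", "componentA"),
        ("componentA", "includes", "componentB"),
        ("componentB", "includes", "elementA"),
        ("elementA", "refersTo", "datcat1"),
    ],
    "profileY": [
        ("profileY", "includes", "componentB"),
        ("componentB", "includes", "elementA"),
        ("profileY", "includes", "elementC"),
        ("elementC", "refersTo", "datcat1"),
    ],
}


def merge_trees(trees):
    """Union the edges of all trees; shared nodes are stored only once."""
    children = defaultdict(set)
    for edges in trees.values():
        for parent, _relation, child in edges:
            children[parent].add(child)
    return children


def parents_of(children, node):
    """Return all nodes that have an edge pointing at `node`."""
    return {p for p, kids in children.items() if node in kids}


graph = merge_trees(profile_trees)
print(sorted(parents_of(graph, "datcat1")))     # two elements share one data category
print(sorted(parents_of(graph, "componentB")))  # one component reused in two places
```

In the merged structure \texttt{datcat1} and \texttt{componentB} each have two parents, so the result is no longer a tree but a directed acyclic graph.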

SMC Browser visualizes this graph structure interactively. Have a look at the \href{examples.html}{examples} for inspiration.

It is implemented on top of the wonderful JavaScript library \href{https://github.com/mbostock/d3}{d3}. The code is checked in to \href{https://svn.clarin.eu/SMC/trunk/SMC}{clarin-svn} (and needs refactoring). More technical documentation will follow soon.


\subsection{Data%
  \label{data}%
}

The graph is constructed from all profiles defined in the \href{http://catalog.clarin.eu/ds/ComponentRegistry/\#}{Component Registry}.
To resolve the names and descriptions of the data categories referenced in the CMD elements,
the definitions of all (public) data categories from \href{http://dublincore.org}{DublinCore} and \href{http://www.isocat.org}{ISOcat} (from the \href{http://www.isocat.org/rest/profile/5.rdf}{Metadata Profile} {[}RDF{]}; retrieving it takes some time!) are fetched. However, only data categories used in CMD become part of the graph. Here is a \href{smc_stats.html}{quantitative summary} of the dataset.

When inspecting the numbers, it is important to be aware of the occurrence expansion resulting from the reusability of the components.
For example, if a component C has 2 subcomponents and is reused within one profile by two other components A and B, the resulting profile
will consist of (at least) 8 components (\texttt{{[}A, B, A/C, B/C, A/C/C1, A/C/C2, B/C/C1, B/C/C2{]}}), although only 5 distinct components are used.
The same goes for elements in reused components. In most cases, the label indicates whether a number reflects distinct items or all (expanded) occurrences.
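
The difference between expanded occurrences and distinct components can be reproduced with a short Python sketch. The containment map below encodes the example above; the names \texttt{CONTAINS}, \texttt{expanded\_paths} and \texttt{distinct\_components} are hypothetical and not part of the SMC code.

```python
from typing import Dict, List, Set

# Hypothetical containment map for the example: parent -> direct children.
CONTAINS: Dict[str, List[str]] = {
    "profile": ["A", "B"],
    "A": ["C"],
    "B": ["C"],
    "C": ["C1", "C2"],
}


def expanded_paths(node: str, prefix: str = "") -> List[str]:
    """List every occurrence below `node` as a slash-separated path."""
    paths: List[str] = []
    for child in CONTAINS.get(node, []):
        path = f"{prefix}/{child}" if prefix else child
        paths.append(path)
        paths.extend(expanded_paths(child, path))
    return paths


def distinct_components(node: str) -> Set[str]:
    """Collect each component below `node` once, however often it is reused."""
    seen: Set[str] = set()
    stack = list(CONTAINS.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(CONTAINS.get(current, []))
    return seen


occurrences = expanded_paths("profile")
distinct = distinct_components("profile")
print(len(occurrences), len(distinct))  # expanded count vs. distinct count
```

For this map the sketch yields 8 expanded occurrences and 5 distinct components, matching the numbers in the example.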

Some of the numbers in the statistics link to a list of the corresponding terms.
For example, in the summary for a profile, clicking on the number of components lists all the components of that profile alphabetically.
Currently there are such lists for:
%
\begin{quote}
%
\begin{itemize}

\item \texttt{profile -> components}

\item \texttt{profile -> elements}

\item \texttt{profile -> data categories}

\item \texttt{data category -> profiles}

\end{itemize}

\end{quote}


\subsection{User Interface%
  \label{user-interface}%
}

The user interface is divided into 4 main parts:
%
\begin{description}
\item[{Index}] \leavevmode
Lists all available Profiles, Components, Elements and used Data Categories.
The lists can be filtered (enter a search pattern in the input box at the top of the index pane).
Clicking on an individual item adds it to the \DUroletitlereference{selected nodes} and renders it in the graph pane.

\item[{Main (Graph)}] \leavevmode
Pane for rendering the graph.

\item[{Navigation}] \leavevmode
The control panel governing the rendering of the graph. See below for the available \hyperref[options]{Options}.

\item[{Detail}] \leavevmode
By default, this pane displays an overall summary of the data,
but its main purpose is to list detailed information about the selected nodes.

\end{description}


\subsection{Interaction%
  \label{interaction}%
}

The following data sets are distinguished with respect to user interaction:
%
\begin{description}
\item[{all data}] \leavevmode
the full graph with all profiles, components, elements and data categories, and the links between them.

Currently this amounts to roughly 2,000 nodes and 3,000 links.

\item[{selected nodes}] \leavevmode
nodes explicitly selected by the user (see below how to \hyperref[select-nodes]{select nodes})

\item[{data to show}] \leavevmode
the subset of data that is displayed.

Starting from the selected nodes, connected nodes (and connecting edges)
are determined based on the options (\texttt{depth-before}, \texttt{depth-after}).

\end{description}
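
One way to derive the \emph{data to show} from the selected nodes is a depth-limited breadth-first walk in both edge directions. The following Python sketch illustrates the idea under that assumption; it is not the SMC Browser implementation, and the function names and the small edge list are hypothetical. As a simplification, it shows every edge whose two endpoints are both displayed.

```python
from collections import defaultdict, deque


def neighbours_within(adjacency, start_nodes, depth):
    """Breadth-first walk of at most `depth` steps along `adjacency`."""
    reached = set(start_nodes)
    frontier = deque((node, 0) for node in start_nodes)
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue  # depth limit reached, do not expand further
        for nxt in adjacency.get(node, ()):
            if nxt not in reached:
                reached.add(nxt)
                frontier.append((nxt, dist + 1))
    return reached


def data_to_show(edges, selected, depth_before, depth_after):
    """Return the nodes and edges to display for the given selection."""
    children, parents = defaultdict(set), defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)
        parents[child].add(parent)
    nodes = (neighbours_within(parents, selected, depth_before)      # ancestors
             | neighbours_within(children, selected, depth_after))   # descendants
    shown_edges = [(p, c) for p, c in edges if p in nodes and c in nodes]
    return nodes, shown_edges


# Hypothetical mini graph: two profiles reuse componentA.
edges = [
    ("profile1", "componentA"),
    ("profile2", "componentA"),
    ("componentA", "element1"),
    ("element1", "datcat1"),
]
nodes, shown = data_to_show(edges, {"componentA"}, depth_before=1, depth_after=1)
print(sorted(nodes))
```

With \texttt{depth\_before=1} and \texttt{depth\_after=1}, the selection \texttt{componentA} pulls in both profiles and \texttt{element1}, but not \texttt{datcat1}, which is two levels below.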

The nodes are colour-coded by type:

\includegraphics[height=100px]{images/graph_legend.png}

\phantomsection\label{select-nodes}
There are multiple ways to select/unselect nodes:
%
\begin{description}
\item[{select from index}] \leavevmode
clicking an individual item in the index list \textbf{adds} it to the selected nodes

clicking on an already selected item unselects it

\item[{select in graph}] \leavevmode
clicking on a visible node in the graph \textbf{adds} it to the selected nodes

clicking on an already selected node unselects it

\item[{select area in graph}] \leavevmode
dragging (hold the mouse button down and pull) a rectangle in the graph pane selects all nodes within that rectangle;
all other nodes are unselected

\item[{unselect in detail pane}] \leavevmode
clicking on an item in the detail pane unselects it

\item[{select in statistics}] \leavevmode
as mentioned in \hyperref[data]{Data}, (some) numbers in the statistics reveal a list of corresponding terms.
Clicking on one of these terms in the statistics page leads to the browser, with that term as the selected node (and default settings)

\item[{select in statistics in the detail pane}] \leavevmode
the numbers from the statistics page are also shown in the detail pane for the selected nodes.
Here, clicking on a term from these lists adds it to the graph as a selected node.

\item[{mouseover}] \leavevmode
hovering over a node highlights all nodes connected to it (and the connecting links) within the visible sub-graph
and fades all other nodes and links

\item[{drag a node}] \leavevmode
click and hold on a node to move it around. Usually the layout is stronger
and puts the node back to its original position. This is not the case with the freeze layout, which fixes all nodes in place and lets you move them around freely

\end{description}
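
The two basic selection rules above (a click toggles a single node, a dragged rectangle replaces the whole selection) can be sketched in a few lines of Python. This is an illustration only, not the browser code; the function names, node names and coordinates are assumptions.

```python
from typing import Dict, Set, Tuple

Point = Tuple[float, float]


def toggle(selected: Set[str], node: str) -> Set[str]:
    """A click adds an unselected node and removes an already selected one."""
    return selected ^ {node}


def select_area(positions: Dict[str, Point], corner_a: Point, corner_b: Point) -> Set[str]:
    """A dragged rectangle replaces the selection with the nodes inside it."""
    left, right = sorted((corner_a[0], corner_b[0]))
    top, bottom = sorted((corner_a[1], corner_b[1]))
    return {
        node
        for node, (x, y) in positions.items()
        if left <= x <= right and top <= y <= bottom
    }


selection = toggle(set(), "componentA")
selection = toggle(selection, "elementA")
selection = toggle(selection, "componentA")  # second click unselects componentA

# Hypothetical screen positions of three rendered nodes.
positions = {"componentA": (10, 10), "elementA": (50, 40), "datcat1": (200, 200)}
area = select_area(positions, (60, 60), (0, 0))  # drag direction does not matter
print(sorted(selection), sorted(area))
```

Sorting the corner coordinates makes the rectangle independent of the direction in which it was dragged.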


\subsection{Options%
  \label{options}%
}

The navigation pane provides the following options to control the rendering of the graph:
%
\begin{description}
\item[{depth-before}] \leavevmode
how many levels of connected ancestor nodes are displayed

\item[{depth-after}] \leavevmode
how many levels of connected descendant nodes are displayed

\item[{link-distance}] \leavevmode
approximate distance between individual nodes
(not exact, because it is just one of multiple factors in the layout of the graph)

\item[{charge}] \leavevmode
the higher the charge, the more the nodes tend to drift apart

\item[{friction}] \leavevmode
factor for ``cooling down'' the layout. Lower numbers (50-70) stabilize the graph more quickly,
but possibly too early; with higher numbers (95-100) the layout has more time/freedom to arrange itself,
but may get jittery

\item[{node-size}] \leavevmode
N = all nodes have the given diameter N;

usage = a node is scaled based on how often it appears in the complete dataset,
i.e. frequently reused elements (like description or language) will be bigger

\item[{labels}] \leavevmode
show/hide all labels.
Hiding the labels accelerates the rendering significantly, which can matter when many nodes are displayed.
Irrespective of this option, on mouseover the labels of exactly the highlighted nodes are displayed

\item[{curve}] \leavevmode
straight or arc (better visibility)

\item[{layout}] \leavevmode
A few layout algorithms are provided. None of them is optimal, but most of the time they deliver quite good results.
Depending on the data displayed, a different algorithm may be more appropriate:
%
\begin{description}
\item[{force}] \leavevmode
undirected layout that tries to spread the nodes evenly across the pane, equally in all directions.
This is the underlying \href{https://github.com/mbostock/d3/wiki/Force-Layout}{layout algorithm}. All the other layouts build on top of it by adding further constraints.

\item[{vertical-tree}] \leavevmode
top-down layout that respects the direction of the edges; children are always below their parents

\item[{horizontal-tree}] \leavevmode
left-right layout that respects the direction of the edges; children are always to the right of their parents
(at least they should be; currently, in certain configurations, the layout does not get the orientation of some links right)

\item[{weak-tree}] \leavevmode
a layout that ``tends'' towards a left-to-right arrangement, but not strictly so (experimental)

\item[{dot}] \leavevmode
strict left-to-right layout reusing the x-positioning as determined by \href{http://www.graphviz.org/}{dot}.
Arranges the nodes in strict ranks (typical for the dot layout).
This is done in a separate preprocessing step for the whole graph, so the positioning may be suboptimal
for a given subgraph. The y-coordinate is approximated on the fly by the base algorithm.

\item[{freeze}] \leavevmode
this is actually a ``no-layout'': the nodes just stay fixed in their last position.
However, individual nodes can still be dragged around, so this can be used to adjust a few nodes for better legibility (or aesthetics).
Only when you start moving individual nodes around will you learn to appreciate the great (and tedious) work of the layout algorithms,
so generally you will want to play around with the other settings to achieve a satisfying result.

\end{description}

\end{description}


\subsection{Linking, Export%
  \label{linking-export}%
}

The navigation pane exposes a \textbf{link} that captures the exact current state of the interface
(just the options and the selection, not the positioning of the elements),
so that it can be bookmarked, emailed etc.

Furthermore, there is the \textbf{download} link, which exports the current graph as SVG.
This is accomplished without a round trip to the server, with a \href{https://groups.google.com/forum/?fromgroups=\#!topic/d3-js/aQSWnEDFxIc}{JavaScript trick}
that serializes the SVG as base64 data into the URL (so you don't want to save (or see) the exported URL itself).
You can either right-click the link and choose {[}Save link as...{]}, or click on the link, which opens the SVG in a new tab
where you can view, resize, print and save it.
Employing this simple method also means that the graph cannot be exported as PNG, PDF or any other format,
because this would require \href{http://d3export.cancan.cshl.edu/}{server-side processing}. (However, this is a planned future enhancement.)
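
Both mechanisms can be illustrated outside the browser. The following Python sketch shows, under assumptions, how options and selection might be packed into a query string and how an SVG document can be embedded in a base64 \texttt{data:} URL. The real tool does this in JavaScript; the base URL, the parameter name \texttt{selected} and the function names are hypothetical.

```python
import base64
from urllib.parse import parse_qs, urlencode


def state_link(base_url, options, selected):
    """Encode options and selection (but no node positions) into a link."""
    query = dict(options)
    query["selected"] = ",".join(sorted(selected))
    return f"{base_url}?{urlencode(query)}"


def svg_data_url(svg_text):
    """Embed an SVG document in a base64 data URL, without a server round trip."""
    payload = base64.b64encode(svg_text.encode("utf-8")).decode("ascii")
    return f"data:image/svg+xml;base64,{payload}"


# Hypothetical base URL and option values.
link = state_link(
    "http://example.org/smc-browser/index.html",
    {"depth-before": 2, "depth-after": 2, "layout": "force"},
    {"componentA", "datcat1"},
)
restored = parse_qs(link.split("?", 1)[1])

svg = '<svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg>'
url = svg_data_url(svg)
print(link)
print(url[:40])
```

The data URL grows with the size of the graph, which is why the exported URL itself is not meant to be read or stored.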


\subsection{Issues%
  \label{issues}%
}
%
\begin{description}
\item[{Performance}] \leavevmode
Chrome is by far the fastest, followed by IE(9).
A serious performance degradation was observed on Firefox for graphs above 200 nodes.
Showing labels also significantly affects performance.

\item[{Bounds}] \leavevmode
When the graph gets too big, it does not fit in the viewing pane.
This will be tackled soon (either with scrollbars or by applying boundaries). Meanwhile,
you can reduce the link-distance and charge parameters or change the layout.

\end{description}


\subsection{Plans and ToDos%
  \label{plans-and-todos}%
}

Substantial issues:
%
\begin{itemize}

\item Add information from \textbf{RelationRegistry} (relations between DatCats)

\item Blend in instance data from \textbf{MDRepository} (allow search on MDRepository)

\item graph operations (intersection, difference of subgraphs)

\end{itemize}

Smaller enhancements of the user interface:
%
\begin{itemize}

\item select nodes by querying the names (e.g. show me all nodes with ``Access'' in their name)

\item option to show only selected types of nodes (e.g. only profiles and datcats)

\item detail-info on hover

\item full HTML-rendering of a node (Profile, Component)

\item backlinking from detail (e.g. view all the profiles a data category is used in by clicking on the number (`used in profiles'))

\item store/export SVG/PDF/PNG-renderings of the graphs

\item add edge-weight: scale based on usage, i.e. how often the relation appears in the complete dataset,
so that often reused combinations of components/elements will be nearer

\item allow blending in further (private) CMD-profiles dynamically

\end{itemize}
