Opened 9 years ago

Last modified 9 years ago

#789 new defect

VLO exposes too little meta-data / too much data

Reported by: tomaz.erjavec@ijs.si Owned by: haaf@bbaw.de
Priority: major Milestone: CLARIN-D Q3/2015
Component: VLO-WG Version: VLO-3.0
Keywords: Legal Cc: hajic@ufal.mff.cuni.cz, Jozef Mišutka, teckart@informatik.uni-leipzig.de, Twan Goosen, davor.ostojic@oeaw.ac.at

Description

I have concerns about how data is presented in the VLO, in particular that it is possible to download data files directly from the VLO without going through the national repository. At the same time, the VLO shows neither the availability conditions nor the authors of the resource, and in some contexts it does not even show its handle.

So somebody downloading data from the VLO will not be aware of what the availability conditions are (e.g. NC only). Furthermore, all or almost all licenses will specify at least BY (attribution), but users downloading from the VLO won't even be aware of who the authors are or how to cite the resource (LINDAT has a field giving you exactly that). Also note that, because the assumption was that users would download from the national repositories where all the meta-data is given, the data files themselves typically have no metadata; so, by downloading from the VLO, users never get any licensing or attribution information.

I would suggest that the option to download data be removed from the VLO and that the handle be prominently displayed instead. This costs the user only one more click to get to the data, but they will then get all the meta-data, including licensing, authors and how to cite the resource. The national repositories went to a lot of trouble to make the meta-data complete, and it doesn't make sense for the VLO to ignore most of it.

Change History (4)

comment:1 Changed 9 years ago by Jörg Knappen

It depends on what you show in your metadata to the VLO. We (Universität des Saarlandes) only show the landing pages of our resources to the VLO and ensure that way that a visitor sees the terms of use of our resources.

comment:2 in reply to:  1 Changed 9 years ago by Jozef Mišutka

Replying to j.knappen@…:

> It depends on what you show in your metadata to the VLO. We (Universität des Saarlandes) only show the landing pages of our resources to the VLO and ensure that way that a visitor sees the terms of use of our resources.

I agree that hiding this metadata on purpose fixes the problem for the centre.

Nevertheless, the metadata are correctly filled out according to
https://www.clarin.eu/faq/how-do-i-point-files-im-describing-cmdi-how-does-resources-section-work
If you do not fill out the Resource section, harvesters might think that there are no files attached.
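To make the point about the Resource section concrete, here is a minimal sketch of how a harvester might read the `ResourceProxyList` of a CMDI record and group its links by `ResourceType`. It assumes the CMDI `ResourceProxy`/`ResourceType`/`ResourceRef` elements and ignores namespaces by matching local element names; the trimmed sample record, its URLs and the function name `classify_proxies` are hypothetical, not part of the VLO code base.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed CMDI record: one landing page and one directly
# downloadable file (real records also carry Header and Components).
SAMPLE = """<CMD xmlns="http://www.clarin.eu/cmd/">
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="lp1">
        <ResourceType>LandingPage</ResourceType>
        <ResourceRef>https://repo.example.org/landing/corpus-1</ResourceRef>
      </ResourceProxy>
      <ResourceProxy id="r1">
        <ResourceType mimetype="application/zip">Resource</ResourceType>
        <ResourceRef>https://repo.example.org/files/corpus-1.zip</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
</CMD>"""


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def classify_proxies(cmdi_xml):
    """Group ResourceRef URLs by their ResourceType (e.g. LandingPage, Resource)."""
    root = ET.fromstring(cmdi_xml)
    groups = {}
    for elem in root.iter():
        if local(elem.tag) != "ResourceProxy":
            continue
        rtype = ref = None
        for child in elem:
            if local(child.tag) == "ResourceType":
                rtype = (child.text or "").strip()
            elif local(child.tag) == "ResourceRef":
                ref = (child.text or "").strip()
        if rtype and ref:
            groups.setdefault(rtype, []).append(ref)
    return groups


print(classify_proxies(SAMPLE))
```

A record following the Saarland approach would list only the `LandingPage` proxy, so a harvester reading it this way finds no `Resource` links it could offer as direct downloads.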

In this case, I would say that it is up to the VLO (or any other harvester) to decide whether or not to show/use the links. If the VLO does not know the license, it should not display them.

Therefore, either the aforementioned metadata section should include a reference to the license for each resource, or the CLARIN FAQ should contain a warning describing the "hiding metadata" approach and stating that harvesters should not make any assumptions about files if no Resource type is given.
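The suggestion that a harvester should not display direct links when it does not know the license could look roughly like the sketch below. It is a hypothetical display policy, not how the VLO actually behaves: landing pages are always kept, direct `Resource` links are kept only when a license is known, and any other proxy types are simply left out. The function name `links_to_display`, the tuple representation and the URLs are all assumptions for illustration.

```python
from typing import List, Optional, Tuple

# (ResourceType, ResourceRef) pairs as a harvester might extract them
# from a CMDI record's ResourceProxyList.
Proxy = Tuple[str, str]


def links_to_display(proxies: List[Proxy], license_uri: Optional[str]) -> List[Proxy]:
    """Hypothetical policy: keep landing pages; keep direct 'Resource'
    download links only when the record states a license. Other proxy
    types are dropped in this sketch. Input order is preserved."""
    shown = []
    for rtype, ref in proxies:
        if rtype == "LandingPage":
            shown.append((rtype, ref))
        elif rtype == "Resource" and license_uri:
            shown.append((rtype, ref))
    return shown


record = [
    ("LandingPage", "https://repo.example.org/landing/corpus-1"),
    ("Resource", "https://repo.example.org/files/corpus-1.zip"),
]
print(links_to_display(record, None))
print(links_to_display(record, "https://creativecommons.org/licenses/by-nc/4.0/"))
```

With no license the user is sent only to the landing page, where the repository shows the licensing, authors and citation information; once a license is known, the direct download link is shown as well.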

comment:3 Changed 9 years ago by tomaz.erjavec@ijs.si

Thanks for both comments. The first one does offer a solution, but I agree with the second that the situation is far from ideal and also misdocumented or undocumented.
Anyway, we will follow Saarland's lead and restrict the exported metadata, but it would be good if a better, CLARIN-wide solution were found. After all, this is an important issue, CLARIN has a whole WG devoted to legal issues, and here it fails at the first practical hurdle...

comment:4 Changed 9 years ago by Twan Goosen

Cc: teckart@informatik.uni-leipzig.de Twan Goosen davor.ostojic@oeaw.ac.at added

added VLO developers to cc
