45 | | * Decoupling of importer logic from importer application 'shell', specifically to make this available to the curation module ([https://github.com/clarin-eric/VLO/issues/118 #118], also [https://github.com/clarin-eric/VLO/issues/50 #50]) |
46 | | * Landing page prominence ([https://github.com/clarin-eric/VLO/issues/153 #153]), also in search results (part of [https://github.com/clarin-eric/VLO/issues/164 #164]) |
| 45 | * Front end and back end aspects can be developed more or less separately |
| 46 | * At least for testing/exploration, also allow for grouping by other fields than the custom 'similarity signature' |
| 47 | * Decoupling of importer logic from importer application 'shell', specifically to make this available to the curation module ([https://github.com/clarin-eric/VLO/issues/181 #181], also [https://github.com/clarin-eric/VLO/issues/50 #50]) |
| 48 | * Landing page prominence ([https://github.com/clarin-eric/VLO/issues/153 #153]) |
| 49 | * Also in search results (part of [https://github.com/clarin-eric/VLO/issues/164 #164]) |
| 57 | |
| 58 | ==== Link checker ==== |
| 59 | |
| 60 | The first complete link checking run has finished.
| 61 | |
| 62 | MPI-PL have reported high server loads due to 'aggressive' polling by the link checker (see comments of #1052). A delay is being introduced and another full link checking operation will be carried out. |
| 63 | |
| 64 | The results and performance of the first few test runs will be analysed and reported on (informally) at CLARIN 2018. At a later stage, the intention is to implement support for parsing and respecting `robots.txt` directives (note: a [https://github.com/pandzel/RobotsTxt 3rd party Java library] exists for this). Exact parameters, such as an upper bound for acceptable request intervals, still have to be determined.
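To illustrate what respecting `robots.txt` combined with a request delay could look like, here is a minimal sketch in Python using only the standard library `urllib.robotparser`. It is not the link checker's actual implementation and does not use the Java library mentioned above. The user agent string, the default delay and the upper bound are all assumptions chosen for illustration; the real values still have to be determined.

```python
from urllib.robotparser import RobotFileParser

# Assumed values for illustration only; the real parameters are undecided.
USER_AGENT = "vlo-link-checker"   # hypothetical user agent name
DEFAULT_DELAY_SECONDS = 1.0       # used when robots.txt declares no Crawl-delay
MAX_DELAY_SECONDS = 30.0          # assumed upper bound on the request interval


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that was fetched elsewhere."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def request_delay(parser: RobotFileParser, user_agent: str = USER_AGENT) -> float:
    """Return the delay between requests, capped at MAX_DELAY_SECONDS."""
    declared = parser.crawl_delay(user_agent)
    if declared is None:
        return DEFAULT_DELAY_SECONDS
    return min(float(declared), MAX_DELAY_SECONDS)


def may_check(parser: RobotFileParser, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if robots.txt allows this user agent to request the URL."""
    return parser.can_fetch(user_agent, url)
```

A checker loop would call `may_check` before each request and sleep for `request_delay` seconds between requests to the same host. Capping the declared delay keeps a full run feasible for large collections, at the cost of not fully honouring very long `Crawl-delay` values.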
| 65 | |
| 66 | Some improvements will be implemented with respect to the presentation of the results, both overall and per collection. Aggregated counts per 'response category' (probably based on response code, but also covering timeouts and other response issues) will be provided. Sample URLs for large collections (>50 records) should be selected to be informative and representative, and should primarily give a sample of the problematic cases. In the long term, a view on the entire URL set should be available. Currently the full report XML is available for closer inspection.
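The aggregation and sampling described above could be sketched roughly as follows. This is an illustration only: the category names, the record layout (`url`, `status`, `timed_out`), the sample size, and the choice to treat redirects as unproblematic are assumptions, not the format of the actual report XML.

```python
from collections import Counter

LARGE_COLLECTION_THRESHOLD = 50   # from the text: collections with >50 records
SAMPLE_SIZE = 10                  # assumed sample size
UNPROBLEMATIC = {"ok", "redirect"}  # assumption: redirects are not reported as problems


def response_category(status, timed_out=False):
    """Map one link check outcome to a coarse response category."""
    if timed_out:
        return "timeout"
    if status is None:
        return "no response"
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 500:
        return "client error"
    if 500 <= status < 600:
        return "server error"
    return "other"


def category_of(result):
    """Category of a result record (a dict with 'url', 'status', 'timed_out')."""
    return response_category(result.get("status"), result.get("timed_out", False))


def aggregate_counts(results):
    """Count results per response category for one collection."""
    return Counter(category_of(r) for r in results)


def sample_urls(results, threshold=LARGE_COLLECTION_THRESHOLD, size=SAMPLE_SIZE):
    """Return all URLs for small collections, else a sample led by problem cases."""
    if len(results) <= threshold:
        return [r["url"] for r in results]
    # Stable sort: problematic results (key False) come before unproblematic ones.
    ordered = sorted(results, key=lambda r: category_of(r) in UNPROBLEMATIC)
    return [r["url"] for r in ordered[:size]]
```

Sorting problematic results to the front is the simplest way to make the sample "primarily" show problem cases; a more representative selection might instead take a fixed share per category.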