Opened 8 years ago
Last modified 8 years ago
#974 assigned defect
Provider - http://weblicht.sfs.uni-tuebingen.de/oaiprovider/
Reported by: | tomasz.naskret@pwr.edu.pl | Owned by: | Marie Hinrichs |
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | Harvesting | Version: | |
Keywords: | OAI | Cc: | Menzo Windhouwer |
Description
All records are generated with the same datestamp.
This behavior prevents incremental harvesting.
Attachments (1)
Change History (7)
comment:1 Changed 8 years ago by
Owner: | changed from Menzo.Windhouwer@mpi.nl to Marie Hinrichs |
---|---|
Status: | new → assigned |
comment:3 Changed 8 years ago by
We looked at the date stamps and they are correct, so we are not sure exactly what the problem is. Our proai tables only get regenerated if we delete a record, which is rare - the last time was in January 2016. Otherwise, the dates seem to be updated correctly as far as we can tell.
Can you provide some more information?
Thanks.
comment:4 Changed 8 years ago by
Cc: | Menzo Windhouwer added |
---|
Menzo is checking the default prOAI behaviour and will report back.
comment:5 Changed 8 years ago by
My digital object (DO) in Fedora Commons (FC) has this date:
2016-09-08T11:06:04.427Z
The FC oaiprovider requests the right information from the FC resource index:
<result> <item uri="info:fedora/lat:1839_00_0000_0000_0001_367F_7"/> <itemID>oai:flat.example.com:lat:1839_00_0000_0000_0001_367F_7</itemID> <date datatype="http://www.w3.org/2001/XMLSchema#dateTime">2016-09-08T11:06:04.427Z</date><state uri="info:fedora/fedora-system:def/model#Active"/> </result>
This gets properly stored by Proai in its cache:
<record> <header> <identifier>oai:flat.example.com:lat:1839_00_0000_0000_0001_367F_7</identifier> <datestamp>2016-09-08T11:06:04Z</datestamp> </header> <metadata> <CMD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">...</CMD> </metadata> </record>
In its bookkeeping Proai stores a timestamp of the poll in its database:
1 | 1 | 1 | 1473332872624 | 2016/09/08/11/07/45.729.xml
The epoch 1473332872624 is Thu, 08 Sep 2016 11:07:52.624 GMT, which is what we see if we request the record via OAI:
<header> <identifier>oai:flat.example.com:lat:1839_00_0000_0000_0001_367F_7</identifier> <datestamp>2016-09-08T11:07:52Z</datestamp> </header>
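The conversion from the stored epoch value to the delivered datestamp can be double-checked with a few lines of Python. This is only a sketch for verifying the numbers quoted above; the function name `epoch_ms_to_oai_datestamp` is made up for this example and is not part of Proai.

```python
from datetime import datetime, timezone


def epoch_ms_to_oai_datestamp(epoch_ms: int) -> str:
    """Format epoch milliseconds as a UTC datestamp with seconds granularity."""
    # Drop the millisecond part; the datestamps in this ticket are given in whole seconds.
    seconds = epoch_ms // 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Poll timestamp taken from the Proai database row quoted above.
print(epoch_ms_to_oai_datestamp(1473332872624))  # 2016-09-08T11:07:52Z
```

The result matches the datestamp in the OAI response, not the 11:06:04 modification time of the object.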
While delivering the cached record, Proai replaces the correct datestamp with the one it stored in its database:
https://github.com/fcrepo3/proai/blob/v1.1.3/src/java/proai/cache/CachedContent.java#L80
The reason for this seems to be lost in the mists of time. I can experiment with a Proai fork where we disable this line ...
comment:6 Changed 8 years ago by
I've created a fork of Proai 1.1.3, which doesn't overwrite the cached modification date:
https://github.com/menzowindhouwer/proai/commit/870b2c759afdcca6fa49458e59c5b2e607ed8123
(I'll attach the proai-1.1.3.jar, which can simply replace the JAR in the tomcat/webapps/oaiprovider/WEB-INF/lib/ directory.)
However, this might be only a partial solution as the from/until OAI query will still be evaluated against the poll timestamp in the Proai database.
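To illustrate why the patch alone may be only a partial solution, here is a simplified model in Python. It is not Proai's actual code: it assumes that record selection compares `from` against the poll timestamp, and that only the delivered datestamp changes with the patch. The names `select` and `patched` are hypothetical; the two timestamps are the ones from comment:5.

```python
from datetime import datetime, timezone
from typing import Optional

FMT = "%Y-%m-%dT%H:%M:%SZ"


def parse(stamp: str) -> datetime:
    """Parse a UTC datestamp with seconds granularity."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)


MODIFIED = parse("2016-09-08T11:06:04Z")  # modification date of the object
POLLED = parse("2016-09-08T11:07:52Z")    # poll timestamp in the Proai database


def select(from_stamp: str, patched: bool) -> Optional[str]:
    """Return the delivered datestamp if the record matches `from`, else None.

    Assumption: selection always uses the poll timestamp; the patch only
    changes which datestamp ends up in the delivered header.
    """
    if POLLED < parse(from_stamp):
        return None
    delivered = MODIFIED if patched else POLLED
    return delivered.strftime(FMT)


# A harvester asking for everything since 11:07:00 still gets the record,
# but with the patch its header carries a datestamp earlier than `from`.
print(select("2016-09-08T11:07:00Z", patched=False))  # 2016-09-08T11:07:52Z
print(select("2016-09-08T11:07:00Z", patched=True))   # 2016-09-08T11:06:04Z
```

Under these assumptions the patched provider can return a record whose datestamp lies outside the requested range, which is the inconsistency mentioned above.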
But is the use of the poll timestamp really a problem for incremental harvesting?
- t1: record 1 is created
- t2: record 2 is created
- t3: record 3 is created
- t4: Proai comes by and sees the created records 1, 2 and 3
- t5: a full harvest gets records 1, 2 and 3 from Proai with datestamp t4
- t6: record 3 is updated
- t7: Proai comes by and sees the updated record 3
- t8: an incremental harvest requests records since t5, so Proai delivers record 3 with datestamp t7
Using the poll timestamp just as a means to manage the deltas seems fine to me and is allowed by OAI-PMH: https://www.openarchives.org/OAI/openarchivesprotocol.html#SelectiveHarvestingandDatestamps
So, I think the incremental harvest will work fine with Proai as it is. But one can use this patch if the OAI record datestamp should contain the actual modification time of the record instead of the poll timestamp.
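The t1 to t8 scenario can be replayed with a small simulation. This is a sketch under the assumption that the provider stamps a record with the poll time whenever it notices a change; the class `PollingProvider` and its methods are invented for this example and do not mirror Proai's internals. Times are plain integers standing in for t1 to t8.

```python
from typing import Dict, List, Optional, Tuple


class PollingProvider:
    """Toy provider that stamps records with the time of the poll that saw them."""

    def __init__(self) -> None:
        self.source: Dict[str, int] = {}             # id -> modification time
        self.cache: Dict[str, Tuple[int, int]] = {}  # id -> (seen modification, poll stamp)

    def write(self, identifier: str, t: int) -> None:
        """Create or update a record in the repository at time t."""
        self.source[identifier] = t

    def poll(self, t: int) -> None:
        """Pick up new or changed records and stamp them with the poll time."""
        for identifier, modified in self.source.items():
            cached = self.cache.get(identifier)
            if cached is None or cached[0] != modified:
                self.cache[identifier] = (modified, t)

    def list_records(self, from_: Optional[int] = None) -> List[Tuple[str, int]]:
        """Return (identifier, datestamp) pairs whose poll stamp is >= from_."""
        return sorted(
            (identifier, stamp)
            for identifier, (_, stamp) in self.cache.items()
            if from_ is None or stamp >= from_
        )


provider = PollingProvider()
provider.write("record1", 1)                   # t1
provider.write("record2", 2)                   # t2
provider.write("record3", 3)                   # t3
provider.poll(4)                               # t4
full = provider.list_records()                 # t5: full harvest
provider.write("record3", 6)                   # t6
provider.poll(7)                               # t7
incremental = provider.list_records(from_=5)   # t8: harvest since t5

print(full)         # [('record1', 4), ('record2', 4), ('record3', 4)]
print(incremental)  # [('record3', 7)]
```

In this model the incremental harvest picks up exactly the updated record, because the poll stamp always falls after the change it reports.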
Hi Marie, could you have a look at this - would it be something that can be changed easily?