A small error in the _get_token() method results in non-unique tokens being returned. This means that when listing all the records in a repository, an infinite loop can be entered.

A microtime of: 0.65573800 1195483767
Returns 656 + 1195483767 = 1195484423
Just under two seconds later ...
A microtime of: 0.65399921 1195483769
Returns 654 + 1195483769 = 1195484423
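The collision above can be reproduced with a short sketch. This is a Python rendering of the arithmetic implied by the report (the exact PHP source of _get_token() is not shown here, so the formula `round(fraction * 1000) + seconds` is an assumption inferred from the numbers):

```python
# Sketch (in Python) of the assumed token arithmetic that collides.
# Assumption: token = round(fractional_seconds * 1000) + whole_seconds.

def buggy_token(microtime: str) -> int:
    """Reproduce the suspected _get_token() arithmetic for a PHP
    microtime() string of the form '0.65573800 1195483767'."""
    frac, secs = microtime.split()
    return round(float(frac) * 1000) + int(secs)

# The two requests from the report, almost two seconds apart:
t1 = buggy_token("0.65573800 1195483767")  # 656 + 1195483767
t2 = buggy_token("0.65399921 1195483769")  # 654 + 1195483769
print(t1, t2)  # both 1195484423 — a collision
```

Because the milliseconds are added to unscaled seconds, a slightly smaller fraction a couple of seconds later can produce the exact same sum.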

I've changed the _get_token() function so that it also multiplies the seconds by the same factor as the fraction of a second, giving a much greater probability that tokens are unique (two requests would now have to occur within 1/10000 of a second of each other to collide).
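A minimal sketch of that fix, again in Python. The factor 10000 is assumed from the "1/10000 of a second" claim; the point is that scaling both parts by the same factor keeps the seconds and the fraction in disjoint digit ranges:

```python
# Sketch of the patched scheme: scale seconds by the same factor
# as the fractional part (factor 10000 is assumed from the report).

def fixed_token(microtime: str) -> int:
    frac, secs = microtime.split()
    return round(float(frac) * 10000) + int(secs) * 10000

# The two requests that collided under the old scheme:
t1 = fixed_token("0.65573800 1195483767")
t2 = fixed_token("0.65399921 1195483769")
print(t1 != t2)  # no longer collide
```

With both parts on the same scale, two tokens can only coincide if the requests fall within the same 1/10000-second slot.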

Also included in the patch is an improvement to the SQL that avoids executing the full query just to ascertain the total number of results; a COUNT query is executed instead.
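The idea can be illustrated with SQLite (table and column names here are invented for the sketch; the actual Drupal query is not shown in this report):

```python
import sqlite3

# Sketch: count rows with COUNT(*) instead of fetching every row
# and counting them in application code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO node VALUES (?)", [(i,) for i in range(250)])

# Wasteful: pull every row just to learn how many there are.
total_slow = len(conn.execute("SELECT nid FROM node").fetchall())

# Cheap: let the database count.
(total_fast,) = conn.execute("SELECT COUNT(*) FROM node").fetchone()
print(total_slow, total_fast)  # 250 250
```

Both give the same number, but the COUNT version never materialises the result set.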

One more point: we're using UTF-8 for our databases, and have found that the output of an OAI query is encoded twice. By commenting out the utf8_encode() call (not included in this patch) we've fixed this. However, what we'd like to know is: how is the character set being defined?
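The double-encoding symptom is easy to demonstrate. PHP's utf8_encode() assumes its input is Latin-1, so feeding it text that is already UTF-8 re-encodes each byte; the equivalent in Python:

```python
# Sketch of the double-encoding symptom: text that is already UTF-8
# is run through a Latin-1 -> UTF-8 conversion (what PHP's
# utf8_encode() does), turning "é" into the mojibake "Ã©".
s = "é"
once = s.encode("utf-8")                        # b'\xc3\xa9'
twice = once.decode("latin-1").encode("utf-8")  # b'\xc3\x83\xc2\xa9'
print(twice.decode("utf-8"))  # prints "Ã©" — encoded twice
```

If the data coming out of the database is already UTF-8, the extra utf8_encode() pass is what produces the garbled output.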

Attached file: oai2.patch (716 bytes), by sdrycroft

Comments

rjerome:

I've added this patch and commented out the UTF-8 encoding. You're right, this would be redundant. I ported this code from http://physnet.uni-oldenburg.de/oai/, and to be honest I don't know how the character set is being defined.

rjerome:

Further to this, I've changed the token generation again to use PHP's uniqid() call, which generates a unique 13-character string, rather than trying to derive a token from the time value.
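For reference, PHP's uniqid() builds its 13 characters from the current time: 8 hex digits of whole seconds plus 5 hex digits of microseconds. A rough Python analogue of that construction (this is a sketch of how uniqid() derives its value, not the module's actual code):

```python
import time

def uniqid() -> str:
    # Rough Python analogue of PHP's uniqid(): 13 hex characters made of
    # 8 hex digits of epoch seconds plus 5 hex digits of microseconds.
    now = time.time()
    secs = int(now)
    usecs = int((now - secs) * 1_000_000)
    return f"{secs:08x}{usecs:05x}"

print(len(uniqid()))  # 13
```

Because the microsecond component is baked into the id, two tokens issued in the same second still differ, which is exactly the property the additive scheme lacked.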