We're using drush jobs triggered by cron to process a queue, and we run multiple jobs in parallel to update multifields attached to different entities. The problem is that multifield_get_next_id() uses a static cache, which is separate for each process, so we end up with duplicate IDs in the database. The final value is only saved after the insert/update, via multifield_update_maximum_id().

Our workaround is to load the entity and clear every item['id'] so they get recalculated, but we have to run this step serially.
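For reference, the failure mode is roughly the following pattern (a sketch, not the module's exact code; the 'multifield_max_id' variable name is illustrative):

```php
<?php
// Sketch of the racy pattern described above (illustrative only).
function multifield_get_next_id_sketch() {
  // Each PHP process gets its own copy of this static, so two parallel
  // drush jobs can both read the same starting value from variable_get()...
  $id = &drupal_static(__FUNCTION__);
  if (!isset($id)) {
    // 'multifield_max_id' is an assumed variable name for illustration.
    $id = variable_get('multifield_max_id', 0);
  }
  // ...and then both hand out the same "next" id, which only gets
  // persisted after the insert/update has already happened.
  return ++$id;
}
```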

PS: This can also happen when multiple people are editing entities.

Comments

attiks’s picture

Status: Active » Needs work

A possible solution is to add a kind of base table that contains the unique ID and possibly the name of the table that contains the data. If you (Dave) are OK with it, we can write a patch.

The extra table will cause an extra insert for each new record, but it might speed up entity loads.
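The base table could look something like the following in hook_schema() (a sketch; the column set is based on what the patch in #2 queries, and descriptions are assumptions):

```php
<?php
// Hypothetical schema for the shared id table (a sketch, not the patch).
$schema['multifield_entity'] = array(
  'description' => 'Maps each multifield item id to the field storing its data.',
  'fields' => array(
    'id' => array(
      // A serial column means the database hands out unique ids atomically.
      'type' => 'serial',
      'unsigned' => TRUE,
      'not null' => TRUE,
    ),
    'entity_type' => array(
      'type' => 'varchar',
      'length' => 32,
      'not null' => TRUE,
    ),
    'bundle' => array(
      'type' => 'varchar',
      'length' => 128,
      'not null' => TRUE,
    ),
    'field_name' => array(
      'type' => 'varchar',
      'length' => 32,
      'not null' => TRUE,
      'description' => 'The field (and thus table) holding the item data.',
    ),
  ),
  'primary key' => array('id'),
);
```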

jelle_s’s picture

Status: Needs work » Needs review
Attached patch: 9.5 KB

Patch implementing the changes described in #1

dave reid’s picture

What about using db_next_id() instead? I'd rather not go down the road of an actual base table and additional records that will be unnecessary. :/
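db_next_id() is core's cross-database sequence helper: it returns an id guaranteed to be larger than both its internal counter and the value you pass in. A minimal sketch of how it could replace the static counter (here $current_max is assumed to hold the highest id already stored):

```php
<?php
// Sketch: let core's sequences mechanism hand out ids atomically.
// Passing $current_max guarantees the returned id is strictly larger,
// even if the sequences counter has fallen behind the data tables.
$next_id = db_next_id($current_max);
```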

attiks’s picture

+++ b/MultifieldEntityController.php
@@ -35,9 +35,12 @@ class MultifieldEntityController extends DrupalDefaultEntityController {
+    $multifields = db_select('multifield_entity', 'm')->fields('m')->condition('id', $ids)->execute()->fetchAll();
+    $multifield_ids = array();
+    foreach ($multifields as $multifield) {
+      $multifield_ids[$multifield->field_name][] = $multifield->id;
+    }
+    foreach ($multifield_ids as $field_name => $ids) {
       $query = new EntityFieldQuery();

One benefit is that the entity_load will not have to scan all tables.

dave reid’s picture

Attached patch: 1011 bytes

This is a small improvement to the brittleness of the next ID: it ensures that updating the maximum ID also updates the static value, instead of re-fetching it via variable_get(), which could return a stale value if variables are stored in Memcache or another backend that doesn't always return fresh data.
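The idea, roughly (a sketch; the static key and variable name here are assumptions, not the committed code):

```php
<?php
// Sketch: write the new maximum back into the same static that
// multifield_get_next_id() reads, so this process never re-fetches a
// stale value from variable_get() (which may be served from Memcache).
function multifield_update_maximum_id_sketch($id) {
  $next_id = &drupal_static('multifield_get_next_id');
  if ($id >= $next_id) {
    $next_id = $id;
    // 'multifield_max_id' is an assumed variable name for illustration.
    variable_set('multifield_max_id', $id);
  }
}
```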

dave reid’s picture

Attached patch: 1012 bytes

Fixed: multifield_update_maximum_id() should use >= instead of > when deciding whether to call variable_set().

attiks’s picture

#5 and #6 look better, but they will not avoid the problem when running multiple parallel jobs. That's fine by me, but maybe we should warn users about this behavior?

dave reid’s picture

Yeah, #6 doesn't solve the parallel problem at all. I encountered an issue even with a single process where the static wasn't updating correctly, so I wanted to address that first. I think #2 is on track to a better solution; I just need some time to dive into it, and the update hook needs to be fixed.

attiks’s picture

#2 @Jelle_S

+++ b/multifield.install
@@ -141,3 +174,51 @@ function multifield_update_7101() {
+      . "SELECT f.field_product_prices_id AS id, f.entity_type AS entity_type, f.bundle AS bundle, 'field_product_prices' AS field_name "
+      . "FROM {field_data_field_product_prices} f");

field_product_prices_id can't be right ;-)

nimek’s picture

1. Why can't we solve this issue by using auto_increment for the field_name_id column in the database?
It seems that the id column is global and has to be unique across all field_name_id tables. Why can't it be unique per field_name? Then auto_increment would solve the issue.

I think that for now only the solution from #2 solves the parallel issue, so anybody who wants to use multifield on a production site should apply it.

Maybe it would be wise to add a warning about this issue to the module's description page?

nimek’s picture

After some digging I have an idea to solve the id problem.

$fields = multifield_get_fields();
$queries = array();
foreach ($fields as $field_name) {
  $queries[] = 'SELECT MAX(' . $field_name . '_id) FROM {field_data_' . $field_name . '}';
}
// Each SELECT returns one per-field maximum; take the largest of them.
$max_id = max(db_query(implode(' UNION ', $queries))->fetchCol());
// Let db_next_id() reserve a value guaranteed to be above that maximum.
$next_id = db_next_id($max_id);

I know the code is not safe, but I want to show the idea:
1. Take the biggest existing id with a SQL query.
2. Use db_next_id() to get the next safe value.

What do you think about this solution?

  • Dave Reid committed e669d75 on 7.x-1.x
    Issue #2534348 by Dave Reid, Jelle_S: Fixed multifield_get_next_id() may...
dave reid’s picture

I committed #6 for now to 7.x-1.x.

mustanggb’s picture

As nimek suggests, auto_increment seems like a good idea to me; at least, this is how Serial Field tackles this sort of thing.

  • Dave Reid committed e669d75 on 7.x-2.x
    Issue #2534348 by Dave Reid, Jelle_S: Fixed multifield_get_next_id() may...
dave reid’s picture

@MustangGB: Using an auto_increment column is not possible, because we are tracking the maximum value across potentially many different tables. We have to assume the site has more than one multifield field.

ima0123’s picture

I think this problem has still not been solved for parallel access.
It needs either a singleton object that manages the maximum number, or a sequential max id stored in the database.

How about this solution?
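One way to serialize allocation across processes without a singleton would be core's lock API (a sketch; the lock name and timeouts are arbitrary):

```php
<?php
// Sketch: serialize id allocation across parallel processes with the
// lock API, so only one process computes and reserves the next id.
if (lock_acquire('multifield_next_id', 5)) {
  $id = multifield_get_next_id();
  // ... insert or update the multifield item using $id ...
  multifield_update_maximum_id($id);
  lock_release('multifield_next_id');
}
else {
  // Another process holds the lock; wait for it to be released,
  // then the caller can retry the allocation.
  lock_wait('multifield_next_id');
}
```

This only narrows the race on a single web head with the default database lock backend; a db_next_id()-style sequence is still the more robust fix.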