Back in #2490420: EntityAutocomplete element settings allows sql injection and for arbitrary user-supplied data to be passed into unserialize(), settings started being stored in key-value and although there was a bit of discussion there about doing expiration, going for the less complicated method of having no expiration was chosen in order to more quickly release the fix.

Fast forward to present day. 82% of the data in our key_value table is the entity_autocomplete collection. We're talking millions upon millions of rows (gigabytes of information), most of which will never be touched again and will never be cleaned up. And this is only a 2-year old website where virtually all of the traffic is authenticated.

Permanent retention of this data is not sustainable. Even if a relatively high threshold is chosen for the expiry, it would still be preferable to keeping them in perpetuity.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

kevin.dutra created an issue. See original summary.

jibran’s picture

Component: forms system » entity system

The piece of code we are talking about here is:

    // Store the selection settings in the key/value store and pass a hashed key
    // in the route parameters.
    $selection_settings = isset($element['#selection_settings']) ? $element['#selection_settings'] : [];
    $data = serialize($selection_settings) . $element['#target_type'] . $element['#selection_handler'];
    $selection_settings_key = Crypt::hmacBase64($data, Settings::getHashSalt());

    $key_value_storage = \Drupal::keyValue('entity_autocomplete');
    if (!$key_value_storage->has($selection_settings_key)) {
      $key_value_storage->set($selection_settings_key, $selection_settings);
    }

While talking to @larowlan he suggested we can just use \Drupal::keyValueExpirable() instead of \Drupal::keyValue() here.

jibran’s picture

\Drupal::keyValue('entity_autocomplete')->getAll() shows that it stores one value per EntityAutocomplete element.

hchonov’s picture

According to #2 the change should be pretty straightforward and this might go in as a quick fix.

kevin.dutra’s picture

Assigned: Unassigned » kevin.dutra
kevin.dutra’s picture

Assigned: kevin.dutra » Unassigned
Status: Active » Needs review
FileSize
4.47 KB

I went with a full 24 hours for expiration time, which seems plenty long, but happy to adjust if anyone feels strongly. At least it actually expires at some point now. :)

Status: Needs review » Needs work
Berdir’s picture

Two quick thoughts:

a) I suspect you are relying on some patches that allow to consider the current entity in the autocomplete, e.g. excluding it. To end up with that many rows, you need to have a dynamic component in there because by default, it's just the static field settings. So I don't think you can get millions of records in there without some special setup.

b) The problem wit hsetWithExpireIfNotExists() is that it doesn't renew the expire. 24 might sound fine at first, but it's perfectly possible that you call it 23h59m after it had been set and by the time you try to use the autocomplete, the records have been purged.

kevin.dutra’s picture

Minor tweak to appease the testbot.

a) I suspect you are relying on some patches that allow to consider the current entity in the autocomplete, e.g. excluding it. To end up with that many rows, you need to have a dynamic component in there because by default, it's just the static field settings. So I don't think you can get millions of records in there without some special setup.

You are absolutely correct. I missed it the first time around, but we are using the work in progress from #1699378: Allow tokens in entity reference views selection arguments and that's what is introducing the dynamic part -- the entity that is being operated on. Given that something along these lines might show up at some point in the future, it'd be nice to be ready for when it arrives. Even without that, there's potential for unused cruft to be lying around in the table indefinitely (e.g. after deleting a field). It might not amount to much, but it's still cruft.

b) The problem wit hsetWithExpireIfNotExists() is that it doesn't renew the expire. 24 might sound fine at first, but it's perfectly possible that you call it 23h59m after it had been set and by the time you try to use the autocomplete, the records have been purged.

True. If I put on my end user hat though, I probably wouldn't expect a form to submit successfully if it had been sitting open for that long. Many sites would have killed my session already, so it would fail from that. But you gave me a potentially better idea: what if we tie it to the expiry of the form/form state cache? If the cached form state is deleted, I think you're likely to run into other weird problems because of the missing data, so the autocomplete would be a moot point. Plus, that's adjustable, so it leaves it open to the site builder to set the expiration according to their needs. (Note: I think that's only 6 hours by default though.)

Status: Needs review » Needs work
kevin.dutra’s picture

Whoops, missed a few other test adjustments. I think this is the rest of them.

Berdir’s picture

I think you misunderstood my second point. Even if you set the expiration to 1 year, it's still possible that you open a form with these settings just seconds before expiration. Unlike the form state storage, which is always written and always valid for 6h from when you start.

Also, in Drupal 8, the form state storage is actually only initialized on the first ajax request. The user/register form could for example have an autocomplete field, or a webform and that could be cached for days. And nothing will renew the expirable storage then, resulting in a broken autocomplete in both cases.

That's why we went with the non-expirable key value store. You should probably find the discussion about that in #2490420: EntityAutocomplete element settings allows sql injection and for arbitrary user-supplied data to be passed into unserialize() when we implemented this.

kevin.dutra’s picture

Ah! Okay, now I follow what you're saying. I think these two adjustments should address that issue:

  1. Use setWithExpire() instead of setWithExpireIfNotExists() so that the expiration will always get pushed further out.
  2. Sync up the max age of the form element to make sure we don't end up with page cache being used when the key value item is already gone.
Berdir’s picture

Well, that results in constant writes to the storage, which the current solution doesn't.

And note that the internal page cache does not currently respect max-age.

I'm not trying to just complain, and I agree there is an issue when using a patch like you do. But as I said, there's a reason we didn't do it like this initially.

The problem with any patch that passes the whole entity through this is that it doesn't just result in a lot of unique keys but the configuration is also massively bigger than what core deals with. Which means that an alternative solution like avoid persistent storage completely by passing a signed json structure for example doesn't work either.

kevin.dutra’s picture

Well from a basic feasibility standpoint, if the page cache doesn't respect max-age, then I'm not thinking of a way to keep things in sync without doing something hacky, so I guess this solution is dead in the water. The heavier writing is also not ideal, but as far as I can see would be necessary to ensure that it's still around for as long as it needs to be.

In any case, since this is (mostly) a byproduct of that other patch, I guess we should just close this as "won't fix" and put the burden onto that other issue to go back to the drawing board? I mean, you'll still have some cruft from core that could be floating around, but it's not likely to be very much unless you're constantly adding and removing fields.

Berdir’s picture

Not sure. I guess the other issue is basically blocked on solving this, or finding a different approach that doesn't result in this problem.

I'm fine with keeping this open, I think there's another similar issue somewhere too. Maybe @amateescu has some thoughts on this and the related feature rquests.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

acbramley’s picture

Just ran into this on a project that had a mammoth 1.1gig db dump, tracked down to the key_value table with over 110,000 rows in the entity_autocomplete collection. I'm yet to track down what configuration is causing this dynamic nature.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

cosolom’s picture

Rerolled for D9.4. Just in case if someone require it. Without tests and hook_install, because it can affect on next core updates.
Need to clean database manually after patch with drush command:
drush ev "\Drupal::keyValue('entity_autocomplete')->deleteAll();"

Status: Needs review » Needs work

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.