We did some performance improvements already by converting it to a raw query, but it didn't help too much, still seeing major performance issues caused by it on a large site with large pages and lots of queries on that table (100+ on some pages). There are "only" 10k crops right now, the queries will become slower and slower the more crops there are.
There are two possible improvements:
1) Add an index for the type/uri fields: alter table crop_field_data add index type_uri (type, uri(100))
; I already did that on our site now by hand and it helps a ton for a single query but there are still 100 queries to run. We can add this in a custom storage schem a class (see NodeStorageSchema and how it is used) + an update function to add the index.
2) Implement some kind of caching for it. A 1:1 caching will not help much for a (after solving 1) fast query, so we'd need something better. The cache only needs to contain very little data (the crop ID) so we can group things together with a cache collector or so. One option would be to do it by current path like aliases or we could possibly also somehow group the images together (hash them and then group by the first few characters? would need to be tested for overhead). Not quite sure this is worth it, we'll keep an eye on the performance of this site now that we added the index.
Comment | File | Size | Author |
---|---|---|---|
#59 | crop-optimise_queries-2975722-59-interdiff.txt | 1.08 KB | Berdir |
#59 | crop-optimise_queries-2975722-59.patch | 10.3 KB | Berdir |
| |||
#47 | crop-optimise_queries-2975722-44-interdiff.txt | 4.37 KB | Berdir |
#44 | crop-optimise_queries-2975722-44.patch | 9.72 KB | pdenooijer |
| |||
#36 | crop-slow_queries-2975722-36.patch | 8.77 KB | rhristov |
|
Comments
Comment #2
yobottehg CreditAttribution: yobottehg commentedI also saw with blackfire profiler profiling some jsonapi routes that for each image style that is exposed the following db call is executed:
Comment #3
Berdir4300 means your json api response includes 4300 image style links? I guess a lot of nodes multiplied by a lot of image styles, (response image styles?)
I don't think we can get rid of the query as crop.module has to *somehow* check image style if there is a crop configuration for it.
I'm still seeing this query a lot too but not that often.
Ideas:
* Check if we can implement a static cache for for a specific image + crop type
* Check if we can add an (improved index) for that query, I thought we already have one but not sure
* Return fewer image style links in your response
Comment #4
yobottehg CreditAttribution: yobottehg commentedIt's basically 10 crop types and 2 different resulting image sizes. So at the end 20 image styles.
I know it's a lot but this happens if you apply the patch from #2825812 and serialize a node with paragraphs which have media entities which you include.
I can probably reduce this to half the styles but the problem still exists.
Comment #5
BerdirYes, this perfectly shows why I've been arguing in that issue that we should *not* expose all the image styles all the time/by default IMHO, I would comment there about that.
10 different *crop types*, why do you need so many different crop types? Are you using image_widget_crop? Do you really have that many different aspect ratios in your images where users need to select that many different crop versions?
> I can probably reduce this to half the styles but the problem still exists.
Yes, but as I said, this is by design and there's no way to change that completely. We do need a query for each image + crop type at least.
That said, the query is afaik mostly/only needed for cache busting in case the urls change. I guess that could be made configurable *but* then you won't have a way to invalidate image styles in a CDN/cache if your users change the crop configuration.
Comment #6
Wim LeersI discovered what @yobottehg describes in #2, thanks to the Blackfire.io profile that @yobottehg reported in #3014232 — see #3014232-18: [regression] ResourceTypeRepository is significantly slower in JSON:API 2, becomes noticeable when handling hundreds of interlinked resources.3.
Comment #7
Wim LeersI'm surprised that the crop is applied during
hook_file_url_alter()
. Why not extendFile
entities to have acrop
field, which would move the processing to occur earlier, and allow the crop to be cached along with the entity? That'd still preserve the non-destructive nature of the Crop module (which seems to be the overarching yet undocumented design goal?).Comment #8
BerdirA file can have N crop configurations, each image style then (optionally) then has configuration for using that type. So we'd have to load them all, always and even then, I'm not quite sure how it would help because the file ID/entity is not accessible from places that allow us to alter the URL?
Don't see how that would help :)
Comment #9
Wim Leers#8 makes me understand this module even less than I did before. 😵 I really want to, but reading
crop.module
,\Drupal\crop\Entity\Crop
,\Drupal\crop\Entity\CropType
,crop.api.php
and its README really do not explain this module's architecture, how it slots into existing APIs, how it's meant to be used, and the rationale :(Comment #10
Wim LeersAlso seems like
Crop::findCrop()
would benefit from some static caching, since bothcrop
'sCrop::getCropFromImageStyle(
andimageapi_optimize
'sImageStyleWithPipeline::transformDimensions()
are calling it, resulting in 4374 calls to it for "only" 1254 unique image style + file combinations:EDIT: source: the Blackfire.io profile posted at #3014232-23: [regression] ResourceTypeRepository is significantly slower in JSON:API 2, becomes noticeable when handling hundreds of interlinked resources.
Comment #11
BerdirYeah, that definitely, that's one thing I mentioned too:
* Check if we can implement a static cache for for a specific image + crop type
I suspect it can even be lower because hopefully quite a few of those image styles also use the same crop type.
Patches welcome, preferably in a separate issue I think :)
Possibly a separate static cache for getting the crop type from an image style and then getting the result of that crop type + file, if we don't have the first either yet?
Comment #12
Wim LeersSounds sensible!
Comment #13
ndobromirov CreditAttribution: ndobromirov at FFW commentedHere is the PoC static cache on the method under discussion. Dropped time about 30-35% withing the method. In my case 450ms -> 300ms.
Comment #15
BerdirThanks for starting this.
I think it would be more efficient to cache by $type/$provider because you might have 10 different image styles all using the same type.
I also didn't really look at this code before, but it looks like if there is no crop for that type, it also checks all providers separately. I didn't write that code, so not sure what that's for exactly, but at the very least, we can optimize that a lot to only do a single query that looks for a crop using the $type or any of the providers and in the very unlikely case of multiple matches, prefer the one from the type.
If you have 3 effects then that would cut down the amount of queries from 4 to 1, plus statically caching that.
Seeing that, I guess the actual static cache list will need to be more intelligent, either do a separat static cache for each possible type/provider, or combine it together into a string.
Also, these kind of caches usually use a regular protected property, then it will also be reset properly during tests.
Comment #16
ndobromirov CreditAttribution: ndobromirov at FFW commentedI think an index on the data will help as well, but I am not sure where to write the schema alter code.
On top of that we should be able to break on the first crop found from a provider. No need to loop the whole effects list if a crop is found.
Comment #17
BerdirMy suggestion is to convert that into a single query, with a condition like this:
->condition('cfd.type', [$type, $provider1, $provider2, $provider3], 'IN').
Then it's always a single query, no matter how many providers you have.
And yes, an index could help too. That requires a storage_schema handler, with logic similar to \Drupal\node\NodeStorageSchema::getEntitySchema() and then an update function that calls ->updateEntityType().
The trick there is that uri can be very long, so we most likely have to limit the length of the index, but for most uris that should still work fine.
Comment #18
rhristov CreditAttribution: rhristov at Bulcode commentedApplying a patch that does the following:
- Adding an index on type and uri columns in crop_field_data table. This will speed up the database queries.
- Adding a static cache where the crop data gets retrieved from the database. This function is called too often from \Drupal\crop\Entity\Crop::getCropFromImageStyle, because we could have a large number of image styles that share the same crop type.
Comment #20
ndobromirov CreditAttribution: ndobromirov at FFW commentedThere is the strange behavior of fetching multitude of crops in a loop and using the last one only. Seems like a logical issue in the original code. If we do an array_pop (when not empty) and fetch only that should spare DB queries in the long run in some cases.
Overall this patch is a huge step forward I am getting around half a second speed up.
The failing tests does not seem related in ANY way and should not block a commit.
Comment #21
BerdirYes, definitely an improvement, I'm seeing 14 instead of 50 queries on an article page in a project I just tested.
However, I'm pretty sure the test fail *is* related, because it is specifically changing the crop in the same process and then that's going to hit the static cache.
If we change that to a normal protected property then we could even reset that on saving a crop but I'd also be OK with just unsetting the storage, in either case, it needs to be a protected property.
Also, of these 14 queries, there are still 3 requests per image with a single crop type and image style because of that fallback loop.
That means we can save an additional 66% of those queries if we optimize it further. Previously I suggested a new API that supports a list of types, but that would require a new API and quite some changes to the caller. I just had a better idea.
I think we can make the assumption that there will always be a rather limited amount of crop types, many sites will in fact only have one (e.g. when using focal point) or maybe a few different aspect ratios.
So my suggestion would be a two-level array, first indexed by image url, second by type. And then the trick is that if an uri is not yet in the static cache, we do a query that fetches type, cid entries for that and then put that as fetchAllKeyed() into the static cache. And then if the firs-level key is set, we simply have to check if the type is set too and then call load(). We do *not* need to staticallly cache the loaded object, they are already have a static cache. (Doing that should already fix the test in fact because it updates the same crop entity).
The downside is that sites that really have a large amount of crop types (*not* image styles) would have to fetch them all, but I'd say that if you have 50 crop types, chances are also high that we will have to fetch multiple crop types per image, and then one query with up to 50 matches is probably still faster than 3 that each only returns one.
Comment #22
BerdirComment #23
BerdirSomething like this.
Tested this on a page with quite a lot of images and it got down to 18 queries from 168. And I just realized I can get it even lower by excluding the image provider, because we know that's not going to provide any crops. So if you have some image styles/images that e.g. use non-cropping image styles, we can skip it completely then.
On the site where I tested previously, I'm down to 5 queries now.
Comment #24
BerdirOk, that passed, would be great to see some profiling from others. #20 shows that a lot of time is spent on getting the effects info from an image style, I can't really see that, but sounds like we should try to maybe statically cache that, as that iterator stuff is slow. Also loading the image styles is, I think we should change the API to only pass in the API (in a BC way, new method and forward to that), then we could likely save quite a bit more.
Comment #25
BerdirTried that, this should be as fast as we can make it.. We'll have to make sure this still works correctly.
Comment #26
ndobromirov CreditAttribution: ndobromirov at FFW commentedI was testing for cold caches in #20, so all caches were disabled (including the entity one), so that might not be relevant that much...
I will test both patches. How to do the validation part? The tests should be enough.
Comment #27
BerdirWell, test coverage is probably not *that* good here, and there are differences based on which module you use, e.g. focal_point works completely different than image_widget_crop. I did some quick tests with both, so it's looking good I think.
Comment #28
woprrr CreditAttribution: woprrr as a volunteer and at NeoLynk commentedHey berdir sorry for that late reply ! I will see all of this today this is my main focus ! I will test this patch with iwc by the way and with automated crop too (probably a part of automated crop factory need similar things but the bridge with crop api are ready to use this important improvement)
I make more feedback fast !
@berdir we can think to stable release for crop api 2.x by the way I guess with that patch and other small improvements for media in Code what’s your feeling about that ?
Comment #29
Berdir> @berdir we can think to stable release for crop api 2.x by the way I guess
with that patch and other small improvements for media in Code what’s your
feeling about that ?
Sounds good to me. We're using it in a lot of projects and it's working fine for us beside the performance issues, which this will hopefully mostly address.
Comment #30
woprrr CreditAttribution: woprrr as a volunteer and at NeoLynk commentedYES *_*, Efforts on the crop module need to be continued and I am currently personally reviewing the priority for contribution and sponsorship. This module must have remained simple and do the job in the best way as Janez and I had planned at the base. The effort at the documentation level must be resolved as soon as possible and I commit myself to realize all that is necessary around this problematic having also the knowledge of certain technical choices at the time of the development of the module we had serious problems and drift with the $ type in the queries for example we'd delete it and decide to find a robust solution on issue XXX.
I'm going to create a ROADMAP issue and a precise cartography of what we need to add in module for stable and robust release (TU / TF included). I just need people to validate that roadmap are realist for our projects.
We can optimize by move this block first with negative test (;'( sorry oncle bob)
like :
I focus my effort from readibility and clean codes principles to avoid #9 kind of comment and that's right in the past we was too lax :P
isApplicable again ? not empty and 'image' !== $provider (Prepare us to win war counter empire :))
Why not adding a static property $notApplicableProvider = ['image']
and then having a test like if($this->isApplicable($provider) or isApplicable() is provider are in property (not sure). Or test provider in findCrop method (to avoid this test in foreach :/).
Can we extract this test like : if(this->hasExistingEffects($image_style_id)) { ... }
That's useful for reability
Can we add a const into CropEffect to centralize effect name and avoid eventual errors (That change can make easier use of Crop api).
Again extract this into separate method look better EG : $this->addCropEffects(array $effects): self
Possibilbly we can wrap all foreach in extracted method if we can be more clean and clear.
That have another important improvement that's make Crop object more calistenic and simple ;)
I'm currently in test with IWC.
Comment #31
Berdir@woppr: Thanks for the review, there are a few points where I'm not sure I understand you correctly, could be a language issue. Maybe it would be easier if you'd just provide a patch/interdiff for your suggestions?
That said, several of your suggestions are about adding methods for single-use, single-line code snippets. I don't think it is necessary to add methods for that, e.g. point 6, 4. I don't think that will make a big difference for readability and you will replace one line with 10 or so (docblock + function declaration).
To address #9, we don't need more micro-functions, what we need is a high-level documentation that explains what this module does and how it does it.
Comment #32
artamiel CreditAttribution: artamiel at FFW commentedHey, @berdir,
I've ran some tests with the latest (#25) patch and I will share some feedback with you.
So, for me, the current state of the patch is definitely not production ready and still needs some work, but as far as optimizations goes - yes, there is definitely an improvement.
So, what was my use case:
- I have installed the module, along side JsonAPI, because I need the cropped images to be accessible directly with given URL.
- I have added 5 crop types with the following Aspect Ratios (not that they matter, but still...) - 1:2, 2:1, 2:2, 2:3 and one without AR /Free crop/
- I have created an Image Style for each Crop Type and applied the cropping effect accordingly.
- I have created a basic node and for the image/media uploaded and then applied cropping to the image for all of the types.
- Lastly, I've accessed my API endpoint and I've successfully obtained all 5 URL's with each crop style applied /so far so good/.
For my surprise, I saw that the cropping was applied only for the first Crop Type loaded /in my case this being 1:2 because they are sorted by name/. For the rest of the image styles, as a result I received the original image with no image style applied.
So, this made me curious and after looking at code changes introduced by the last patch, I saw that the problem lies within CropStorage. Basically, what happens is that by storing the
$uri
in static cache and with a combination ofrange(0, 1)
you are no longer able to retrieve the crop configuration for any given$type
but the first one, while the rest will always fallback toNULL
.For the moment I am not able to provide new patch, but meanwhile I can test pretty fast with a real-case scenario any future patches, until I find time to contribute myself.
Comment #33
BerdirYou're right, the range() is a left-over that needs to go, it should work then without that, and shows exactly my point, that test coverage isn't very good yet :)
You can probably just update that quickly locally and confirm that this fixes your problem, I'll update the patch later.
Comment #34
artamiel CreditAttribution: artamiel at FFW commentedYes, I've tested it before posting the previous comment, so I can confirm that removing the
range()
will definitely yield correct result.I just wasn't sure this is the direction that you will be going, since I am not that familiar with the discussion for now.
Also, I guess it is okay to point out the obvious - even with a static cache added, removing
range()
will still load all of the crop types, which might not be the best idea when, let's say, you have single image rendered on your page with crop style applied and in this case you're interested in loading just a single Crop Type.But besides that point, there is quite the performance-wise improvement already, so cases like the last one can be further optimized.
Good work.
Comment #35
BerdirYes, see #21 on my thoughts on this. I'm doing this on purpose, because with the fallback-logic, it often does more than one query right now for a single call to getCrop(). It's just a tiny bit slower if you really only need a single type for this request, but without this optimization, I think it is quite likely to do more than one request for a given image even when just displaying an image with a single image style. And one query that returns possibly multiple results that you might need is most likely still faster than a second query. Unless you have a huge amount of crop types, but that seems like a bad idea anyway.
Combined with query caches, it also means fewer variations of queries to cache for MySQL.
Comment #36
rhristov CreditAttribution: rhristov at Bulcode commentedApplying a new patch that is removing the range().
Comment #37
woprrr CreditAttribution: woprrr as a volunteer and at NeoLynk commentedIMHO we can’t just remove range like this directly without fixed the contextualization of crops variants. If users make n crops for one crop type findCrop will return n crop object without another logic to choose which crop is needed by display.
Comment #38
BerdirAFAIK that is a feature that doesn't exist and also doesn't work on HEAD, #2617818: Support multiple crop variants per URI and crop type doesn't have any patches?
This refactoring *might* make it more complicated, but it is an internal optimization that we could change again if we had to. But the API can only return one crop for type + URI, so we would have to make major changes to that anyway.
Comment #39
slv_ CreditAttribution: slv_ at Lullabot for NBCUniversal commentedBased on the existing comments it looks like #36 is robust enough to make it into 2.0?
If I got right from #38, while not the perfect solution, a non-ideal solution is far better than no solution at all. Without this patch, the crop api is not something that can run on a production site with a medium amount of uncached content. Could it be included in the next RC? Further feedback can be used later on to improve things.
IMHO the clean code tweaks won't change anything in the overall solution, and can always be added later on if truly needed.
Comment #40
BerdirYeah, we have been using this on various small to large production sites and this is a massive performance improvement.
Comment #41
romainj CreditAttribution: romainj as a volunteer commentedI can confirm that the patch #36 works pretty well. No side-effect until now.
Comment #42
ndobromirov CreditAttribution: ndobromirov at FFW commentedBased on all the above since #36 and the fact that I've been using the patch from #36 for months now on a high traffic information portal solution with no side effects, I will be moving to RTBC.
Comment #43
idebr CreditAttribution: idebr at iO commentedLet's tie up a few loose ends before commit:
crop.module now has an unused use statement for use Drupal\image\Entity\ImageStyle;
s/eyed/keyed
Unused use statement
This method has no parent to inheritdoc from
@return array[]
Comment #44
pdenooijer CreditAttribution: pdenooijer at Ordina Digital Services for RTL Nieuws commentedFixed above comments and did some more boy scouting on the patch.
Comment #45
naingoted CreditAttribution: naingoted commented#44, this one have some issues with focal point module 8.x-1.0 on drupal 8.7.7. It won't auto generate images before you manually click preview images link (under image). It should be auto generating images on node save. @pdenooijer what was your use case though ?
Best,
v
Comment #46
pdenooijer CreditAttribution: pdenooijer at Ordina Digital Services for RTL Nieuws commentedNo, my use case was that the front page was so terribly slow. This was because of the shear amount of slow queries (no index) on the crop table. I did not test this with focal point.
Comment #47
Berdir@pdenooijer: Don't forget to upload an interdiff, especially when only doing small changes in a larger patch. Attached one now.
Looks OK to me, the @deprecated comment is a bit strange as it doesn't properly reference the function, but the core rules are hard to apply to contrib on the exact message with version and so on. And that could be fixed on commit.
#45: Unless you use a special storage like S3, images are always generated on the first request and there is nothing here that would cause that to change.
Comment #48
naingoted CreditAttribution: naingoted commented#47 i've figured out. i had to use convert_encoding('UTF-8', 'HTML-ENTITIES') in the views template. In version 8.6.16 i was using
fields.field_image.content|render|striptags|trim
and now i've switched tofields.field_image.content|render|striptags|trim|convert_encoding('UTF-8', 'HTML-ENTITIES')
and i wasn't using s3.
Comment #49
ndobromirov CreditAttribution: ndobromirov commentedPatch in #44 applies against the latest dev version and it is still resolving the problem.
RTBC +1.
Comment #50
pdenooijer CreditAttribution: pdenooijer at Ordina Digital Services for RTL Nieuws commentedCan we please merge this, as it has 2 RTBC now an we are using this with success in production for quite a while as well.
Comment #51
phenaproximaIs there any real reason to add a whole method here? Surely we could just change getCropFromImageStyle to accept either an ImageStyleInterface object, or an ID. Then there's no real need for calling code to deal with this deprecation.
Nit: Seeing as how we're fixing minor typos here anyway, 'uri' should be 'URI', since it is an acronym.
This a subtle, but important-looking, change in behavior. Won't this cause a performance regression, since it will be querying for a crop even if it has previously come back as NULL? Either way, this deserves a comment explaining the rationale, so that a future developer doesn't unwittingly change it back.
What is the rationale for this?
Comment #52
phenaproximaThis absolutely needs a dedicated test.
Perhaps this should be using a dedicated implementation of CacheBackendInterface (i.e., MemoryBackend)?
Comment #53
Berdir#51
1 Well, since it is type hinted but a 1:1 interface thing, adding a new method is more BC, changing the type hint would break a possible overide of that method, however unlikely it may be. I also think accepting an object or ID is bad API design. No strong opinion though. There are few reasons to call it directly anyway, the main API is just getting an image style URL, also for decoupled and so on.
3. No, it's not a regression because we query for all crops and since fetchAllKeyed() always returns an array, even if there are no its, isset(array()) is TRUE.
4. The method documents null as return value, FALSE was a bug IMHO.
#52.1 Aww.. that's going to be painful. It's a straight-forward entity update..
2. I don't really the benefit of that, and it would make the thing about with the fetchAllKeyed()/isset more complicated IMHO. I don't see a benefit from being able to invalidate this from the outside like entity storage where memory usage is a concern.
Comment #54
ndobromirov CreditAttribution: ndobromirov commented#53.2 For any long-running script that will have to manage crops this could pose an issue.
If there is private cache it should have an API to clear it OR inject a service to contain the cached data.
I am generally +1 for #52.2.
Comment #55
BerdirWe only store list of ids, and we also unset it again if crops are saved. It is extremely unlikely that this would cause memory problems of any sort and it's still a entity storage and you could use the useCache(FALSE/TRUE) trick to unset the storages.
Core is doing the same with \Drupal\Core\Entity\ContentEntityStorageBase::$latestRevisionIds for example, I really don't see the point of making this more complicated.
Comment #56
pdenooijer CreditAttribution: pdenooijer at Ordina Digital Services for RTL Nieuws commentedI agree with Berdir.
#51:
\Drupal\crop\CropInterface|null
, so returning false is a bug.#52
Can we please merge this now?
Comment #57
phenaproximaThe only thing I think I feel strongly about here is the update test. I agree it is straightforward and unlikely to break, but entity updates are a situation where if something goes wrong for any reason, regardless of whose fault it is, there is a significant risk of data loss. It therefore feels irresponsible to commit this without some sort of assurance that the update path does what it's supposed to do.
That said, I know how painful it is create a fixture for update testing. I've done it before, though, and I'll see if I can do it here too. It really would make me feel a lot better, and if something does go wrong with the update path in the future, it'll be a lifesaver to have a test of it.
Comment #58
phenaproximaI just committed #3104867: Add a dedicated test to ensure the update path works, so it should now be trivial to add an assertion to UpdatePathTest confirming that the index exists as expected. :)
Comment #59
BerdirAdded that, also marked the test as @group legacy to avoid the unrelated deprecation messages. Could still remove that when we update to 8.8 updates, but sooner or later, new things will be added there.
Comment #61
phenaproximaAdjusting credit for commit.
Comment #63
phenaproximaCommitted, then reverted due to incorrect credit, then re-committed and pushed to 8.x-2.x. Thanks!
Comment #64
pdenooijer CreditAttribution: pdenooijer at Ordina Digital Services for RTL Nieuws commentedWoop woop, nice :D!
Comment #65
afoster CreditAttribution: afoster at Foster Interactive Inc. commentedThanks so much for this!
We had 4x crops on a single huge image go from 55s to 8s on node save. There are many many image styles in this site for responsive images for many different contexts.