If some of your modules have too many files in their subfolders, load time for your modules page can go up to 80 seconds or similar. It can even fail with a segfault.
The reason, in short:
module_rebuild_cache() calls drupal_system_listing(), which calls file_scan_directory(), which recursively calls itself for every subfolder nested somewhere in the modules directory.
Most of these subfolders do not contain any information that would be relevant for module_rebuild_cache(). All it wants to find is the .info files of all the modules.
Experiment:
Put a folder with a large number (sth like 10.000) of files and subfolders into one of your modules' folders, and call the modules page. Use xdebug to find out what is taking so long.
Typical offenders are modules which ask to download big 3rd party libraries and store them in your modules folder, such as civicrm, or older versions of wysiwyg api. Also theme using bower_component or node_modules.
-------
Proposed solution:
- Module developers should have the possibility to exclude subfolders from being scanned in drupal_system_listing(), with a setting in the module's .info file, or a dedicated configuration file.
- The site admin should have the possibility to exclude subfolders from being scanned in drupal_system_listing(), with a settings file somewhere. This way, people don't need to wait for slow-paced module development.
- For some modules it might be better to exclude all and then include specific folders (whitelist instead of blacklist). Alternatively, the developer
For backwards compatibility, the default will always be to scan all subfolders.
Syntax could be something like this (in modules/modulename/modulename.info):
noscan[] = js ; don't scan modules/modulename/js
noscan[] = styles
noscan[] = lib/* ; scan for "lib/xy.info", but not "lib/*/xy.info"
or in modules/noscan.info
noscan[] = some_module/* ; ignore subfolders of this module
---------
Implementation
If the function should look into the *.info files along with the recursive scanning, we can't use file_scan_directory(). We need some dedicated recursive function.
The .info file have to be scanned anyway, so, no extra performance penalty.
Comment | File | Size | Author |
---|---|---|---|
#44 | 546584-40_cache-infofile-and-scandir-results.patch | 4.58 KB | netw3rker |
#10 | scandir.php_.txt | 1.08 KB | donquixote |
#9 | scandir.ini_.txt | 6.4 KB | donquixote |
#9 | common.inc_.drupal_system_listing.speedup.patch | 5.7 KB | donquixote |
#8 | file_scan_system_directory.inc_.txt | 3.5 KB | donquixote |
Comments
Comment #1
markus_petrux CreditAttribution: markus_petrux commenteddrupal_system_listing() is also used to search for .info files.
Thanks for this finding. I believe I'm going to scan my folders to relocate stuff in 3rd party libs that's not really necessary.
Comment #2
picardo CreditAttribution: picardo commentedsubscribed
Comment #3
picardo CreditAttribution: picardo commentedIn the case of civicrm at least the subfolder exclusion will not be enough because typically civicrm file structure places Drupal modules in a deep subfolder. If you exclude civicrm from inspection, you will never find those modules. Those modules will need to be taken outside of the file structure, then, and they'd probably work regardless of where they are located, I'm sure.
Comment #4
donquixote CreditAttribution: donquixote commentedcivicrm shows we need a combination of whitelist and blacklist.
But how should the system decide which rule has priority?
Like this, the priority would be clear: Later rules override previous rules.
Alternative to the array notation (faster to type, but doesn't allow comments):
Or, we could target specific files, so we don't need additional exclusion rules.
Comment #5
EvanDonovan CreditAttribution: EvanDonovan commentedSubscribe. Thanks so much for revealing this issue!
Comment #6
donquixote CreditAttribution: donquixote commentedEDIT:
Removing my own comment for being off-topic.
Comment #7
moshe weitzman CreditAttribution: moshe weitzman commentedThis is a critical core bug IMO. We don't have any good solution even for admins and module developers who want to speed up their modules page.
Comment #8
donquixote CreditAttribution: donquixote commentedHere is a draft for a possible solution. The function file_scan_system_directory is meant as a replacement for file_scan_directory in drupal_system_listing. It's not meant as a general replacement for file_scan_directory!!!
Warning: This is draft code, I didn't even test it on my localhost.
The scandir.ini files can be placed anywhere in any module directory, or even the modules root dir. Paths in the scandir.ini files are relative to the location of the scandir.ini file.
I am not yet sure how the scandir.ini files should look like, but the result should look like this:
(assuming the file is "$path/scandir.ini")
I hope the rest is obvious from the attached file..
The idea:
- Modules can define their own scandir.ini files
- Users can have their own settings for modules that don't provide scandir.ini files.
Comment #9
donquixote CreditAttribution: donquixote commentedA patch for Drupal 6.x dev.
I don't have one for D7, I hope someone else can do that.
Attached is the patch, and an example scandir.ini file that is right now sitting in my sites/all/modules/contrib. Obviously, the scandir.ini makes sense for the modules that I currently have installed on the test site (a lot), but you would have to adjust it to your needs.
My tests show that the patched drupal_system_listing produces the same $files array as the old implementation: Same keys, same order, same values. And it's a lot faster, with the scandir.ini in place.
Theoretically the scandir.ini could be auto-generated. But, that would also mean something has to take care that it gets rebuilt from time to time - we would be back to square one, because rebuilding is just as slow as it always has been.
The only sustainable solution is having scandir.ini files shipped with contributed modules. Until then, the hand-made scandir.ini can be used as a temporary solution.
Comment #10
donquixote CreditAttribution: donquixote commentedA simple standalone script to automate the creation of scandir.ini files in sites/all/modules/contrib, if you want to use the above patch. If your modules are somewhere else, you need to adapt the script. You can run it for core modules, but they are usually not the big performance killer. You can run it for custom modules, but then you need to re-run it every time you change the folder structure or add new sub-modules. You don't want that.
Some instructions:
- You need to remove outdated scandir.ini files when you update a module. An easy way to do this is to delete the folder of the old module version, before you upload the new version. This is why I made separate scandir.ini files for each module.
- It is a good idea to re-run scandir.php every time you install or update or patch a module, but this is not required. The system will do its job without any scandir.ini files, it will just be a bit slower (about as slow as it is without the patch).
- You do NOT need to run scandir.php when you just enable or disable existing modules in admin/build/modules, as this does not change the file structure in any way.
Note: This is not the intended use of this mechanism. The original idea was that modules define their own scandir.ini files, which is especially useful for big ones like civicrm.
I would not suggest to make the automation script part of Drupal: It is very likely that Drupal does not have write access to the modules folder.
How is this helpful?
A typical use case: I made a custom module and want to test the auto-installer. This means, I have to repeatedly enable and disable this module. Each time I have to wait for drupal_system_listing, which can be quite slow with big contrib modules. The patch removes this one bottleneck.
Comment #11
EvanDonovan CreditAttribution: EvanDonovan commentedVery nice. I'll have to test this with views, ubercart & civicrm.
Comment #12
alexanderpas CreditAttribution: alexanderpas commentedfirst thing i'm getting with this patch is the "not another fragile .ini file" feeling.
we already have out .info files can't we be using that somehow?
Comment #13
donquixote CreditAttribution: donquixote commented@alexanderpas:
Sure, please invent a better syntax that fits in the *.info files.
But please consider that some module might not have a modulename.info in the root folder..
I see that style is a big issue here, we don't want ugly files lying around. For now I just wanted to have something that works, as a proof of concept and as a quick fix for people who don't want to wait for something pretty.
Maybe you can even come up with a convincing database-driven solution.
The point of having it in files was that these files get automatically deleted or updated when you update the module.
With the proposed patch my sites/all/modules/contrib/cck/scandir.ini file looks like this:
which means, inside this folder please look for
content.info, content.module,
modules/content_copy/content_copy.info, modules/content_copy/content_copy.module
etc.
(dot means the current directory)
The idea was that the following should also be allowed:
Now this would mean please look for
content.info, content.module,
and everything inside modules/
unfortunately, I think the current implementation does not look like this will work - but hey, it's the idea that counts.
The third notation this syntax is capable of (for the patch I'm not sure if this works):
which means, scan everything except the css and js subfolder. Or in other words: For the current folder there are no restrictions. For the css subfolder, please only look for
css/#.info
,css/#.module
,css/#/*
- which does not exist, obviously. For the js subfolder, please only look forjs/#.info
,js/#.module
,js/#/*
.In the real world I think the first solution is totally sufficient.
Comment #14
donquixote CreditAttribution: donquixote commented@alexanderpas:
Another problem I see with putting it in modulename.info files is that any auto-generation would have to mess with the original contents of the info file, and that you can no longer simply delete a file to force a re-scan.
Comment #15
markus_petrux CreditAttribution: markus_petrux commentedIf this auto-generation is a workaround for D6 in mind, it may hide the fact that this would have to be implemented first for D7 ...or D8, meaning module .info files could probably include this information as requirement to upgrade, I guess.
It seems to me it is more important to find consensus with the feature itself, and the syntax for the meta information. Once this is done, then this could be implemented for D7, then... since this won't be included in D6 because it is a new feature, and it will probably live just in patch form, it doesn't matter match if it works with auto-generated meta data that lives in .ini files or whatever else it makes it easier to deal with.
Comment #16
alexanderpas CreditAttribution: alexanderpas commenteddon't those modules have a parent module that also has an .info file?
We are already required to declare the files containing code we're providing (http://drupal.org/node/224333#registry)
effectively, if i'm reading the API docs correctly, we're left to only using this to scan for the .info & .tpl.php files effectively (as all the other files are declared in the .info)
we've already determined that we don't want the requirement to declare the template files inside themes.
now, how about adding the following syntax?
(all relative to an info file)
each of those directories will be scanned, in addition to the directory where the index file is located, just not recursively. (the directory containing the themes should still be scanned recursively)
in addition to those explicitly specified directories i think we should scan certain directories automatically by default.
specifically, the theme directory (again relative to the .info file) recursively,
and all sub directories of modules non-recursively (again finding all .info files there, and doing the same process.)
Comment #17
alexanderpas CreditAttribution: alexanderpas commentedjust to make things clear: if a module directory doesn't contain an info file, we should scan it's sub-directories recursively until we have found those .info files.
Comment #18
EvanDonovan CreditAttribution: EvanDonovan commentedalexanderpas: I like the
provides[]
syntax. I think putting this information in the .info file for the root module is the way to go moving forward. But donquixote's patch sounds like a good improvement for D6 sites with giant modules like CiviCRM installed.Comment #19
donquixote CreditAttribution: donquixote commentedI did not even think about tpl.php files! Thanks for reminding me.
I think it would make sense to separate hints for *.tpl.php files and *.module / *.info files. Usually x.module will be in the same folder as x.info. Not so for *.tpl.php.
So what about
I am not sure if the auto-appended .info and .module suffix is a good idea, but it helps that you don't need two separate definitions for each.
Folder hints vs file hints:
In 3rd party modules the scandir.ini (or the information in modulename.info) can be auto-generated before the module goes to CVS, so there is little extra cost in providing file hints instead of just folder hints. Folder hints make sense during development and for template files - if one folder contains nothing but template files.
Makes sense. Usually themes are not that big, and only one is active at a time. On the other hand, if there is a mechanism for modules, it's probably not such a big step to do the same for themes. It would be all optional, so a theme developer would only use this if the theme contains some big 3rd party lib or tons of images.
Comment #20
srobert72 CreditAttribution: srobert72 commentedSubscribe
Comment #21
sun.core CreditAttribution: sun.core commentedThis issue exists since D5, where .info files were introduced IIRC. In addition, we're not only scanning the filesystem for .info files less often, but also working on Libraries API in contrib (which removes the cause, so no workaround required for the trigger).
Comment #22
plachComment #23
pillarsdotnet CreditAttribution: pillarsdotnet commentedAlso see #1165694: Remove default nomask from file_scan_directory()
Comment #24
donquixote CreditAttribution: donquixote commentedWhat if we would introduce naming conventions?
Any directory name starting with underscore would not be scanned.
Or, if we already found a *.info file in a directory, then we only scan subdirectories named "modules".
Maybe a bit too simple..
Comment #25
pillarsdotnet CreditAttribution: pillarsdotnet commented@#24 -- Wouldn't help with civicrm installations, which is probably the most common and problematic use-case.
Comment #26
donquixote CreditAttribution: donquixote commentedFor civicrm this would mean (afaik) that we scan everything.
Maybe this should be fixed in civicrm then? Let it provide the "Drupal" part via drush dl into sites/*/modules, and move the "civicrm" part into sites/*/libraries. This would be a big change for civicrm, but imo a good one. Not sure if we could produce an upgrade path..
---------
If there are other projects where a heavy subdir (such as, a big css or js or img subdir) is not blocked by a modulename.info, then my idea would be a projectname.project.info (to not be mistaken for a module info file). And if we already consider doing that, then maybe put everything into *.project.info files, and never into modulename.info? I think it is more logical to think of this as per-project, not as per-module.
The other question is whether we should auto-generate these files.
Comment #27
pillarsdotnet CreditAttribution: pillarsdotnet commentedSo civicrm could have a
civicrm_project.info
in its top-level directory pointing todrupal/civicrm.info
and friends.Comment #28
donquixote CreditAttribution: donquixote commentedDrupal would think that civicrm_project is a module then. Which it is not. Is this a problem?
Otherwise, I would say "civicrm.project.info" instead.
Comment #29
pillarsdotnet CreditAttribution: pillarsdotnet commentedSorry; you are correct; it should be "civicrm.project.info" not "civicrm_project.info".
Looking forward to a reviewable patch.
Comment #30
EvanDonovan CreditAttribution: EvanDonovan commentedSo...is the idea from #26 that the modulename.project.info file would just have the information on which modules were in the project, as per the earlier proposal of a
provides[]
syntax, or that it would actually be a merge of the info files for all the modules provided by the project?I think the first option would be better.
I think auto-generating the files would be ideal; it would be good to have a drush command that module developers could use to create one.
Comment #31
gagarine CreditAttribution: gagarine commentedThis is also useful for themes. I my themes I have a ton of files in folders like node_modules or bower_component.
In some case it even fail with a segfault.
For theme, in my point of view we should only look at the first level and in one folder containing all the tpl/twig. This can be a convention...
Comment #32
gagarine CreditAttribution: gagarine commentedComment #33
dawehnerAdding a related issue.
Comment #34
sunThis should no longer be an issue in D8.
drupal_system_listing()
was replaced byExtensionDiscovery
.Comment #35
david_garcia CreditAttribution: david_garcia commentedClosed #2494223: Slow module page, module enable/disable, registry rebuild and others as a duplicate.
Comment #36
david_garcia CreditAttribution: david_garcia commentedMaybe a little crazy, but a possible approach is to completely cache the results of file_scan_directory based on file system timestamps.
I'm running this on production on a memory based cache backend with noticeable performance boost (such as accessing the module page down from 5 sec. to inmediate).
The change could possibly have a positive impact on anything that relies on file_scan_directory such as rebuilding the registry, enabling/disabling a module, etc.
Comment #38
moshe weitzman CreditAttribution: moshe weitzman at Acquia commentedFYI, we added a service in Drupal 8 which is similar to this - #2395143: YAML parsing is very slow, cache it with FileCache
Comment #39
david_garcia CreditAttribution: david_garcia commentedWe could probably also cache the parsing of info files, I also have this in my modified version of core:
Comment #40
netw3rker CreditAttribution: netw3rker commented@david_garcia, the patch in #36 has a couple of issues you wouldn't have caught because you are using memcache (which doesn't really care about bin names).
1) you do a cache_clear_all against a cache-bin called "file_scan_directory", but in all of your cache_set() commands, you don't specify that bin.
2) if the bin is supposed to exist, you need to explicitly define one in system_schema() (and an update hook)
The problem with doing that is file_scan_directory is required in order to even run updates, so, the system can't even fire the update to install the bucket it needs in order to look up the data required to fire the update :)
It might just be better to let this particular data live with all the other cache in the generic bin.
the code in #39 has similar problems with the use of "drupal_parse_info_file". That data however probably can live in the bootstrap bin since it's basically bootstrappy data about modules.
attached is a patch that combines these and names the buckets correctly.
Comment #41
netw3rker CreditAttribution: netw3rker commentedComment #44
netw3rker CreditAttribution: netw3rker commentedcorrection: "cache_bootstrap" is the bin name..
Comment #46
david_garcia CreditAttribution: david_garcia commented@netw3rker Thanks for working on this! The initial code was indeed just a POC.
Looks like we have an issue here because the altered functions are supposed to be working even if the database backend is not working/available probably to serve early bootstrap phases:
Maybe we can check if the cache backend is available before using it, and if not, completely omit caching?
Comment #47
netw3rker CreditAttribution: netw3rker commentedThat sounds like a good idea. Its funny because I pointed that out about your code, but it worked for me (because I have the cache tables) but never tested how a full-install would work, which the testbot did. +1 for test scripts.
I'm busy for the next few days, and can't get to testing this. Any chance you can update the patch with those changes?
Comment #48
cilefen CreditAttribution: cilefen commentedComment #49
delacosta456 CreditAttribution: delacosta456 commentedhi
Please any update for this please?