If some of your modules have too many files in their subfolders, load time for your modules page can go up to 80 seconds or similar. It can even fail with a segfault.

The reason, in short:
module_rebuild_cache() calls drupal_system_listing(), which calls file_scan_directory(), which recursively calls itself for every subfolder nested somewhere in the modules directory.

Most of these subfolders do not contain any information that would be relevant for module_rebuild_cache(). All it wants to find is the .info files of all the modules.

Experiment:
Put a folder with a large number (sth like 10.000) of files and subfolders into one of your modules' folders, and call the modules page. Use xdebug to find out what is taking so long.

Typical offenders are modules which ask to download big 3rd party libraries and store them in your modules folder, such as civicrm, or older versions of wysiwyg api. Also theme using bower_component or node_modules.

-------

Proposed solution:
- Module developers should have the possibility to exclude subfolders from being scanned in drupal_system_listing(), with a setting in the module's .info file, or a dedicated configuration file.
- The site admin should have the possibility to exclude subfolders from being scanned in drupal_system_listing(), with a settings file somewhere. This way, people don't need to wait for slow-paced module development.
- For some modules it might be better to exclude all and then include specific folders (whitelist instead of blacklist). Alternatively, the developer

For backwards compatibility, the default will always be to scan all subfolders.

Syntax could be something like this (in modules/modulename/modulename.info):

noscan[] = js   ; don't scan modules/modulename/js
noscan[] = styles
noscan[] = lib/*  ; scan for "lib/xy.info", but not "lib/*/xy.info"

or in modules/noscan.info

noscan[] = some_module/*  ; ignore subfolders of this module

---------

Implementation

If the function should look into the *.info files along with the recursive scanning, we can't use file_scan_directory(). We need some dedicated recursive function.

The .info file have to be scanned anyway, so, no extra performance penalty.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

markus_petrux’s picture

drupal_system_listing() is also used to search for .info files.

Thanks for this finding. I believe I'm going to scan my folders to relocate stuff in 3rd party libs that's not really necessary.

picardo’s picture

subscribed

picardo’s picture

In the case of civicrm at least the subfolder exclusion will not be enough because typically civicrm file structure places Drupal modules in a deep subfolder. If you exclude civicrm from inspection, you will never find those modules. Those modules will need to be taken outside of the file structure, then, and they'd probably work regardless of where they are located, I'm sure.

donquixote’s picture

civicrm shows we need a combination of whitelist and blacklist.
But how should the system decide which rule has priority?

[drupal_system_listing]

; don't look into subfolders of civicrm
scan[] = "- civicrm/*"

; except for some-subfolder/module-1
scan[] = "+ civicrm/some-subfolder/module-1"

; don't scan further down in some-subfolder/module-1
scan[] = "- civicrm/some-subfolder/module-1/*"

Like this, the priority would be clear: Later rules override previous rules.

Alternative to the array notation (faster to type, but doesn't allow comments):

scan = "
  - folder1/*
  + folder1/subfolder1
  - folder1/subfolder1/*
"

Or, we could target specific files, so we don't need additional exclusion rules.

scan = "
  - civicrm/*
  + civicrm/module1/module1.info
  + civicrm/module1/module1a/module1a.info
  + civicrm/module2/module2.info
"
EvanDonovan’s picture

Subscribe. Thanks so much for revealing this issue!

donquixote’s picture

EDIT:
Removing my own comment for being off-topic.

moshe weitzman’s picture

Category: feature » bug
Priority: Normal » Critical

This is a critical core bug IMO. We don't have any good solution even for admins and module developers who want to speed up their modules page.

donquixote’s picture

Here is a draft for a possible solution. The function file_scan_system_directory is meant as a replacement for file_scan_directory in drupal_system_listing. It's not meant as a general replacement for file_scan_directory!!!

Warning: This is draft code, I didn't even test it on my localhost.

The scandir.ini files can be placed anywhere in any module directory, or even the modules root dir. Paths in the scandir.ini files are relative to the location of the scandir.ini file.

I am not yet sure how the scandir.ini files should look like, but the result should look like this:
(assuming the file is "$path/scandir.ini")

<?php
array(
  // exclude "$path/dir1", except for ..
  'dir1' => array(
    'allowed_subdir_1',  // .. "$path/dir1/allowed_subdir_1", and all contained files and folders
    'allowed_subdir_2',  // .. "$path/dir1/allowed_subdir_2", and all contained files and folders
  ),
  // exclude dir2/aaa/bbb, except for ..
  'dir2/aaa/bbb' => array(
    'allowed_file.info',  // .. this info file
    'ccc/another_file.info',  // .. this info file
    'ccc/ddd/another_allowed_subdir'  // .. all files and folders in this subdirectory
  ),
  // exclude everything from dir3
  'dir3' => array(),
)
?>

I hope the rest is obvious from the attached file..

The idea:
- Modules can define their own scandir.ini files
- Users can have their own settings for modules that don't provide scandir.ini files.

donquixote’s picture

A patch for Drupal 6.x dev.
I don't have one for D7, I hope someone else can do that.

Attached is the patch, and an example scandir.ini file that is right now sitting in my sites/all/modules/contrib. Obviously, the scandir.ini makes sense for the modules that I currently have installed on the test site (a lot), but you would have to adjust it to your needs.
My tests show that the patched drupal_system_listing produces the same $files array as the old implementation: Same keys, same order, same values. And it's a lot faster, with the scandir.ini in place.

Theoretically the scandir.ini could be auto-generated. But, that would also mean something has to take care that it gets rebuilt from time to time - we would be back to square one, because rebuilding is just as slow as it always has been.

The only sustainable solution is having scandir.ini files shipped with contributed modules. Until then, the hand-made scandir.ini can be used as a temporary solution.

donquixote’s picture

FileSize
1.08 KB

A simple standalone script to automate the creation of scandir.ini files in sites/all/modules/contrib, if you want to use the above patch. If your modules are somewhere else, you need to adapt the script. You can run it for core modules, but they are usually not the big performance killer. You can run it for custom modules, but then you need to re-run it every time you change the folder structure or add new sub-modules. You don't want that.

Some instructions:
- You need to remove outdated scandir.ini files when you update a module. An easy way to do this is to delete the folder of the old module version, before you upload the new version. This is why I made separate scandir.ini files for each module.
- It is a good idea to re-run scandir.php every time you install or update or patch a module, but this is not required. The system will do its job without any scandir.ini files, it will just be a bit slower (about as slow as it is without the patch).
- You do NOT need to run scandir.php when you just enable or disable existing modules in admin/build/modules, as this does not change the file structure in any way.

Note: This is not the intended use of this mechanism. The original idea was that modules define their own scandir.ini files, which is especially useful for big ones like civicrm.

I would not suggest to make the automation script part of Drupal: It is very likely that Drupal does not have write access to the modules folder.

How is this helpful?
A typical use case: I made a custom module and want to test the auto-installer. This means, I have to repeatedly enable and disable this module. Each time I have to wait for drupal_system_listing, which can be quite slow with big contrib modules. The patch removes this one bottleneck.

EvanDonovan’s picture

Very nice. I'll have to test this with views, ubercart & civicrm.

alexanderpas’s picture

first thing i'm getting with this patch is the "not another fragile .ini file" feeling.

we already have out .info files can't we be using that somehow?

donquixote’s picture

@alexanderpas:
Sure, please invent a better syntax that fits in the *.info files.
But please consider that some module might not have a modulename.info in the root folder..

I see that style is a big issue here, we don't want ugly files lying around. For now I just wanted to have something that works, as a proof of concept and as a quick fix for people who don't want to wait for something pretty.

Maybe you can even come up with a convincing database-driven solution.
The point of having it in files was that these files get automatically deleted or updated when you update the module.

With the proposed patch my sites/all/modules/contrib/cck/scandir.ini file looks like this:

.[] = 'content'
.[] = 'modules/content_copy/content_copy'
.[] = 'modules/content_permissions/content_permissions'
.[] = 'modules/fieldgroup/fieldgroup'
.[] = 'modules/nodereference/nodereference'
.[] = 'modules/number/number'
.[] = 'modules/optionwidgets/optionwidgets'
.[] = 'modules/text/text'
.[] = 'modules/userreference/userreference'

which means, inside this folder please look for
content.info, content.module,
modules/content_copy/content_copy.info, modules/content_copy/content_copy.module
etc.
(dot means the current directory)

The idea was that the following should also be allowed:

.[] = 'content'
.[] = 'modules'

Now this would mean please look for
content.info, content.module,
and everything inside modules/

unfortunately, I think the current implementation does not look like this will work - but hey, it's the idea that counts.

The third notation this syntax is capable of (for the patch I'm not sure if this works):

css = '#'
js = '#'

which means, scan everything except the css and js subfolder. Or in other words: For the current folder there are no restrictions. For the css subfolder, please only look for css/#.info, css/#.module, css/#/* - which does not exist, obviously. For the js subfolder, please only look for js/#.info, js/#.module, js/#/*.

In the real world I think the first solution is totally sufficient.

donquixote’s picture

@alexanderpas:
Another problem I see with putting it in modulename.info files is that any auto-generation would have to mess with the original contents of the info file, and that you can no longer simply delete a file to force a re-scan.

markus_petrux’s picture

If this auto-generation is a workaround for D6 in mind, it may hide the fact that this would have to be implemented first for D7 ...or D8, meaning module .info files could probably include this information as requirement to upgrade, I guess.

It seems to me it is more important to find consensus with the feature itself, and the syntax for the meta information. Once this is done, then this could be implemented for D7, then... since this won't be included in D6 because it is a new feature, and it will probably live just in patch form, it doesn't matter match if it works with auto-generated meta data that lives in .ini files or whatever else it makes it easier to deal with.

alexanderpas’s picture

But please consider that some module might not have a modulename.info in the root folder.

don't those modules have a parent module that also has an .info file?

We are already required to declare the files containing code we're providing (http://drupal.org/node/224333#registry)

effectively, if i'm reading the API docs correctly, we're left to only using this to scan for the .info & .tpl.php files effectively (as all the other files are declared in the .info)
we've already determined that we don't want the requirement to declare the template files inside themes.

now, how about adding the following syntax?
(all relative to an info file)

provides[] = 'modules/field_sql_storage'
provides[] = 'modules/list'
provides[] = 'modules/number'
provides[] = 'modules/options'
provides[] = 'modules/text'

each of those directories will be scanned, in addition to the directory where the index file is located, just not recursively. (the directory containing the themes should still be scanned recursively)

in addition to those explicitly specified directories i think we should scan certain directories automatically by default.
specifically, the theme directory (again relative to the .info file) recursively,
and all sub directories of modules non-recursively (again finding all .info files there, and doing the same process.)

alexanderpas’s picture

just to make things clear: if a module directory doesn't contain an info file, we should scan it's sub-directories recursively until we have found those .info files.

EvanDonovan’s picture

alexanderpas: I like the provides[] syntax. I think putting this information in the .info file for the root module is the way to go moving forward. But donquixote's patch sounds like a good improvement for D6 sites with giant modules like CiviCRM installed.

donquixote’s picture

effectively, if i'm reading the API docs correctly, we're left to only using this to scan for the .info & .tpl.php files effectively (as all the other files are declared in the .info)

I did not even think about tpl.php files! Thanks for reminding me.
I think it would make sense to separate hints for *.tpl.php files and *.module / *.info files. Usually x.module will be in the same folder as x.info. Not so for *.tpl.php.

So what about

; .info and .module files (suffix is auto-appended)
modules[] = 'content'
modules[] = 'modules/content_copy/content_copy'
modules[] = 'modules/content_permissions/content_permissions'
modules[] = 'modules/fieldgroup/fieldgroup'
; template folders
templates[] = 'templates'
templates[] = 'modules/fieldgroup/templates'
; individual template files
templates[] = 'modules/fieldgroup/something.tpl.php'

I am not sure if the auto-appended .info and .module suffix is a good idea, but it helps that you don't need two separate definitions for each.

Folder hints vs file hints:
In 3rd party modules the scandir.ini (or the information in modulename.info) can be auto-generated before the module goes to CVS, so there is little extra cost in providing file hints instead of just folder hints. Folder hints make sense during development and for template files - if one folder contains nothing but template files.

we've already determined that we don't want the requirement to declare the template files inside themes.

Makes sense. Usually themes are not that big, and only one is active at a time. On the other hand, if there is a mechanism for modules, it's probably not such a big step to do the same for themes. It would be all optional, so a theme developer would only use this if the theme contains some big 3rd party lib or tons of images.

srobert72’s picture

Subscribe

sun.core’s picture

Version: 7.x-dev » 8.x-dev
Priority: Critical » Normal

This issue exists since D5, where .info files were introduced IIRC. In addition, we're not only scanning the filesystem for .info files less often, but also working on Libraries API in contrib (which removes the cause, so no workaround required for the trigger).

plach’s picture

pillarsdotnet’s picture

donquixote’s picture

What if we would introduce naming conventions?
Any directory name starting with underscore would not be scanned.
Or, if we already found a *.info file in a directory, then we only scan subdirectories named "modules".
Maybe a bit too simple..

pillarsdotnet’s picture

@#24 -- Wouldn't help with civicrm installations, which is probably the most common and problematic use-case.

donquixote’s picture

just to make things clear: if a module directory doesn't contain an info file, we should scan it's sub-directories recursively until we have found those .info files.

For civicrm this would mean (afaik) that we scan everything.
Maybe this should be fixed in civicrm then? Let it provide the "Drupal" part via drush dl into sites/*/modules, and move the "civicrm" part into sites/*/libraries. This would be a big change for civicrm, but imo a good one. Not sure if we could produce an upgrade path..

---------

If there are other projects where a heavy subdir (such as, a big css or js or img subdir) is not blocked by a modulename.info, then my idea would be a projectname.project.info (to not be mistaken for a module info file). And if we already consider doing that, then maybe put everything into *.project.info files, and never into modulename.info? I think it is more logical to think of this as per-project, not as per-module.

The other question is whether we should auto-generate these files.

pillarsdotnet’s picture

So civicrm could have a civicrm_project.info in its top-level directory pointing to drupal/civicrm.info and friends.

donquixote’s picture

Drupal would think that civicrm_project is a module then. Which it is not. Is this a problem?
Otherwise, I would say "civicrm.project.info" instead.

pillarsdotnet’s picture

Sorry; you are correct; it should be "civicrm.project.info" not "civicrm_project.info".

Looking forward to a reviewable patch.

EvanDonovan’s picture

So...is the idea from #26 that the modulename.project.info file would just have the information on which modules were in the project, as per the earlier proposal of a provides[] syntax, or that it would actually be a merge of the info files for all the modules provided by the project?

I think the first option would be better.

I think auto-generating the files would be ideal; it would be good to have a drush command that module developers could use to create one.

gagarine’s picture

This is also useful for themes. I my themes I have a ton of files in folders like node_modules or bower_component.

In some case it even fail with a segfault.

For theme, in my point of view we should only look at the first level and in one folder containing all the tpl/twig. This can be a convention...

gagarine’s picture

Title: allow to exclude folders from drupal_system_listing » Modules or theme with to much files kill drupal_system_listing performances
Issue summary: View changes
dawehner’s picture

Adding a related issue.

sun’s picture

Version: 8.0.x-dev » 7.x-dev

This should no longer be an issue in D8. drupal_system_listing() was replaced by ExtensionDiscovery.

david_garcia’s picture

david_garcia’s picture

Status: Active » Needs review
FileSize
3.72 KB

Maybe a little crazy, but a possible approach is to completely cache the results of file_scan_directory based on file system timestamps.

I'm running this on production on a memory based cache backend with noticeable performance boost (such as accessing the module page down from 5 sec. to inmediate).

The change could possibly have a positive impact on anything that relies on file_scan_directory such as rebuilding the registry, enabling/disabling a module, etc.

Status: Needs review » Needs work

The last submitted patch, 36: cache-file-scan-dir-d7.patch, failed testing.

moshe weitzman’s picture

FYI, we added a service in Drupal 8 which is similar to this - #2395143: YAML parsing is very slow, cache it with FileCache

david_garcia’s picture

We could probably also cache the parsing of info files, I also have this in my modified version of core:

function drupal_parse_info_file($filename) {
  $info = &drupal_static(__FUNCTION__, array());
  $key = NULL;
  if (!isset($info[$filename])) {
    if (!file_exists($filename)) {
      $info[$filename] = array();
    }
    else {
      $key = $filename . ':' . filemtime($filename);
      if ($cache = cache_get($key, 'drupal_parse_info_file')) {
        return $cache->data;
      }
      cache_clear_all($filename, 'drupal_parse_info_file', TRUE);
      $data = file_get_contents($filename);
      $info[$filename] = drupal_parse_info_format($data);
      cache_set($key, $info[$filename], 'drupal_parse_info_file');
    }
  }
  
  return $info[$filename];
}
netw3rker’s picture

@david_garcia, the patch in #36 has a couple of issues you wouldn't have caught because you are using memcache (which doesn't really care about bin names).

1) you do a cache_clear_all against a cache-bin called "file_scan_directory", but in all of your cache_set() commands, you don't specify that bin.

2) if the bin is supposed to exist, you need to explicitly define one in system_schema() (and an update hook)

The problem with doing that is file_scan_directory is required in order to even run updates, so, the system can't even fire the update to install the bucket it needs in order to look up the data required to fire the update :)

It might just be better to let this particular data live with all the other cache in the generic bin.

the code in #39 has similar problems with the use of "drupal_parse_info_file". That data however probably can live in the bootstrap bin since it's basically bootstrappy data about modules.

attached is a patch that combines these and names the buckets correctly.

netw3rker’s picture

Status: Needs work » Needs review

The last submitted patch, 9: common.inc_.drupal_system_listing.speedup.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 40: 546584-40_cache-infofile-and-scandir-results.patch, failed testing.

netw3rker’s picture

Status: Needs work » Needs review
FileSize
4.58 KB

correction: "cache_bootstrap" is the bin name..

Status: Needs review » Needs work

The last submitted patch, 44: 546584-40_cache-infofile-and-scandir-results.patch, failed testing.

david_garcia’s picture

@netw3rker Thanks for working on this! The initial code was indeed just a POC.

Looks like we have an issue here because the altered functions are supposed to be working even if the database backend is not working/available probably to serve early bootstrap phases:

DrupalUnitTestCase: These tests can not access the database nor files. Calling any Drupal function that needs the database will throw exceptions.

Maybe we can check if the cache backend is available before using it, and if not, completely omit caching?

netw3rker’s picture

That sounds like a good idea. Its funny because I pointed that out about your code, but it worked for me (because I have the cache tables) but never tested how a full-install would work, which the testbot did. +1 for test scripts.

I'm busy for the next few days, and can't get to testing this. Any chance you can update the patch with those changes?

cilefen’s picture

Title: Modules or theme with to much files kill drupal_system_listing performances » Modules or themes with too many files kill drupal_system_listing performances
delacosta456’s picture

hi
Please any update for this please?