All of our URLs that end in index.htm are having it stripped, so /folder/index.htm becomes just /folder/. The "Trim Suffixes" option is unchecked, so I'm not sure why this is happening. How do we stop it?

Comments

dman

This is by design. It emulates the behavior of static sites, where you can request "/folder" and transparently get the contents of "folder/index.html".
With the normal import_html settings, this works in Drupal because two aliases, "folder" and "folder/index.html", are created, and both return the contents of node/{n}. Both forms of the link are valid when requested, but (by choice) the short one is given priority and is the one shown in editing mode.

In short: this feature should not cause navigation problems; it exists so that both versions of an incoming link still work.
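The aliasing described above can be modeled roughly as follows. This is an illustrative Python sketch, not import_html's actual code: the function `aliases_for` and the `DEFAULT_DOCUMENTS` tuple are hypothetical stand-ins for the module's internals and for the "Default Document" setting.

```python
import posixpath

# Hypothetical stand-in for the "Default Document" setting; the module's
# real default list may differ.
DEFAULT_DOCUMENTS = ("index.htm", "index.html")


def aliases_for(imported_path, node_id, default_documents=DEFAULT_DOCUMENTS):
    """Return (alias, system_path) pairs for one imported file.

    The first pair is the preferred alias (the one shown when editing).
    """
    system_path = f"node/{node_id}"
    folder, filename = posixpath.split(imported_path)
    aliases = []
    if filename in default_documents:
        # Default documents also get a short "folder" alias, listed first
        # so that it takes priority.
        aliases.append((folder, system_path))
    # The long, original filename is always kept as an alias too.
    aliases.append((imported_path, system_path))
    return aliases


print(aliases_for("folder/index.htm", 12))
print(aliases_for("folder/index.htm", 12, default_documents=()))
```

Calling it with an empty `default_documents` models removing the filename from the "Default Document" setting: only the long alias is produced, so nothing is shortened.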

If you want to disable that, go to the settings under "Import and Content Analysis Options" and remove that filename from the "Default Document" setting. index.htm will then no longer be treated as a special file, and it will no longer serve as the 'index' page content for links to the short path.
However, removing that feature may weaken some of the other heuristics, which assume that every folder has a default document. If you stop treating index files as special cases, the menu builder will start listing them like normal files, which is usually not what I'd want.

As a workaround, you could leave the default document setting enabled while importing and building menus, and then manually delete the short version of the URL alias from the URL alias admin table *after* importing.
That removes support for the short names and leaves only the long name in the URL alias table, so the long name remains the URL for that page.
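The manual clean-up can be pictured as a filter over the alias list. This is a rough Python model only: it works on an in-memory list of hypothetical (alias, system_path) pairs, not on Drupal's real alias table or API, and `drop_short_aliases` and `DEFAULT_DOCUMENTS` are illustrative names.

```python
import posixpath

# Hypothetical stand-in for the "Default Document" setting.
DEFAULT_DOCUMENTS = ("index.htm", "index.html")


def drop_short_aliases(alias_table, default_documents=DEFAULT_DOCUMENTS):
    """Remove short "folder" aliases that have a "folder/index.htm" twin.

    alias_table is a list of (alias, system_path) pairs. An alias is dropped
    only if the same system path also has a long alias formed by appending
    one of the default documents to it.
    """
    existing = set(alias_table)
    kept = []
    for alias, system_path in alias_table:
        has_long_twin = any(
            (posixpath.join(alias, doc), system_path) in existing
            for doc in default_documents
        )
        if not has_long_twin:
            kept.append((alias, system_path))
    return kept


table = [
    ("folder", "node/12"),
    ("folder/index.htm", "node/12"),
    ("about", "node/3"),
]
print(drop_short_aliases(table))
```

Only the short alias with a matching long twin is removed; unrelated aliases such as "about" are left alone, which mirrors deleting just the short rows by hand.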

With a little code work, you could probably patch import_html/modules/core.inc:path_import_html_after_save() to behave differently, but the normal intent of an HTML import is to clean up legacy links like this, so it's not an option that will be exposed or supported.

dman

Status: Active » Closed (works as designed)
teschek

Status: Closed (works as designed) » Active

I'm following up on this after a long hiatus when I haven't had time to work on it. I still can't get the import to stop dropping 'index.htm' from the end of link paths in existing HTML pages. For example, a link to something like "/folder/folder/folder/index.htm" comes out after import as "/folder/folder/folder". On the other hand, if the link just reads "index.htm", it is kept as-is. Many of our links use just the plain index.htm because the index page being referenced is in the same folder. But when we reference an index.htm page in another folder, we have to include a path, and in that case the import strips "index.htm" off. I just want it to stop this stripping behavior. Is there a way to do this? I read your previous comment, but if you answered my question there, I'm not understanding it.

teschek

Still waiting for an answer on this one... Thanks.