Environment:

- multisite using a sites.php for aliases
- the site is www.mysite.com aliased in sites.php for folder "mysite"

My text format settings on the production site (www.mysite.com) are set to "full url", with these URLs:

http://test.mysite.com
http://acc.mysite.com
http://prod.mysite.com
http://www.mysite.com <-- current url

Issue:

Sometimes (not sure when this happens, but NOT after cron, which makes it difficult) all Pathologic URLs in the specified formats are rewritten to:

- http://mysite/...
instead of
- http://www.mysite.com

which obviously causes broken links.

After a cache clear, everything is okay again.

Comments

Garrett Albright’s picture

I bet a cron run is to blame here, actually, specifically one done via Drush. When Drush runs, sometimes it doesn't know the "real" URL of a site, since the web server is being bypassed, so it uses "default" or another alias name instead as the server URL. Normally this isn't a problem with Pathologic since in most cases content isn't being rendered during cron runs anyway, but it could happen. Might that be what's happening in your case?

askibinski’s picture

Priority: Major » Normal

@Garrett

Yes! that might definitely be the case! Thanks for the insight!

In fact, it might explain some other weird behaviour I was having (an email was sent out with broken http://default links). Is this a known drush issue? I'm searching the drush issue queue but can't find it. We're on drush 5.8.

Garrett Albright’s picture

askibinski, yes, it's a normal Drush issue, though not one that actually causes problems in most cases, including Pathologic's, and not one that's easily solvable for all cases where it might cause one. There's an easy fix, though (which I meant to include in my previous post but apparently forgot for some reason… sorry): Specify a -l/--uri parameter on the Drush command to explicitly state what the root URL for the site should be; e.g. drush -l "http://example.com/" cron . Give that a try and see if it solves the problems.
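A minimal sketch of that fix, wrapped in a shell function so it can be called from a cron job or scheduled task. The function name run_site_cron and the example path and URL are hypothetical; --root, --uri (long form of -l) and --quiet are Drush global options.

```shell
# Run Drupal cron through Drush with an explicit site URL, so that any
# content rendered during the run sees the real host name instead of
# "default" or a sites.php alias.
#   $1 = Drupal root directory (hypothetical example: /var/www/drupal)
#   $2 = public base URL of the site (e.g. http://www.mysite.com)
run_site_cron() {
  drush --root="$1" --uri="$2" --quiet cron
}
```

It could then be called as run_site_cron /var/www/drupal http://www.mysite.com, assuming that is where the site lives.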

Parke’s picture

I have the same issue.
Not using sites.php (not a multisite setup) or any Drush commands.

Peculiar bit:

Previous employee had these full urls:
http://www.mysite.gov/
http://mysite.gov/
158.12.123.123/

We no longer want to host these domains. I saved all text formats to read:
http://www.mysite.gov/

But, IP and sans-www domains are still being published - intermittently - when re-writing node/xxxx links.

Along with clearing cache, I can change text format and resave block's or page's text box to fix.

Should I install Drush just to clear this issue? I thought I circumvented the issue by un-commenting base_url assignment in settings.php. This caused user login to fail. I re-commented $base_url and users could login. I then noticed node/xxx links rendering incorrectly again. Shooting in the dark there.

Thanks Garrett.

Garrett Albright’s picture

Wheres_Bonzo, if you're not using Drush, then how are you doing cron runs currently? The other approach is to use a browser to call cron.php, which I don't recommend for performance reasons (it's really sort of a hack), but if you're using that approach, you shouldn't be having this sort of issue, since Drupal will be running in a normal web service environment and be able to properly determine the desired domain name. But maybe you're doing things in an odd way…?

Parke’s picture

We are using out of the box Poorman's cron setup accessible from Administration » Configuration » System » Cron.
The cron.php file installed simply calls bootstrap.inc and off it goes, I'm assuming.

Parke’s picture

This issue showed itself today. Our footer links built with node/xxxx addresses are displaying as mysite.gov instead of www.mysite.gov. Clearing cache or re-saving block with text element did not fix the issue.

I browsed the recent log messages and found that 4 completed cron jobs ran from 4 separate 'locations' with the URLs listed in #4. Seeing as your first hunch was a cron issue, this seemed interesting and maybe can give you another clue.

Actions: Stop using poorman's cron and install Drush on the windows machine. Use your fix in #3 to fix this issue. I'll respond within 2 weeks to close this issue.

Garrett Albright’s picture

Parke, you're saying that clearing cache or editing filtered content used to fix the problem, but no longer does? That doesn't sound like a Cron-related problem, then.

askibinski’s picture

Would it also help to provide the $base_url in settings.php? (the one which is commented out by default) - I normally don't set this. I wonder if drush looks at this var.

Parke’s picture

$base_url seemed to work, but this caused our https login to break.

I found a block that re-wrote itself today to the IP version of the /node link. I used phpmyadmin to search for the IP address, but all references were either in cache, watchdog, or accesslog tables which I don't think pathologic uses to re-write the /node links.

I have searched the server itself for any references to the IP address and cannot locate it either.

To temporarily fix the issue: I navigated to edit the block. I selected 'Disable rich-text' then immediately chose 'Enable rich-text'. I saved the block and the public version was fixed to write in our full domain path for each /node link.

I have installed our setup on a development VM server that runs on the localhost. I can run any tests you'd like to see. On this fresh install, the pages that live on /localhost/ are created with nodes pointing to www.mysite.gov. If I clear cache/edit block, they are re-written to /localhost/.

Parke’s picture

Another instance showed itself today. One with www.mysite.gov. (with a trailing period.)

I've never seen this entry in our pathologic settings.

Could this be an explanation: A user visits our page with an old domain. This kicks off the Drupal core cron that tells Pathologic these domain settings could be used? The watchdog tables have a 'cron run completed' entry for the location of www.mysite.gov. (with the trailing period.)

Thanks for looking into this.
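For anyone wanting to check the same thing from the command line, here is a rough sketch that lists the URLs recent cron runs were triggered from. It assumes Drush is installed, a Drupal 7 watchdog table with no table prefix, and that database logging is enabled; the function name list_cron_locations is hypothetical.

```shell
# Show the request URL ("location") and client address recorded for the
# most recent cron log entries, newest first.
#   $1 = Drupal root directory
#   $2 = public base URL of the site
list_cron_locations() {
  drush --root="$1" --uri="$2" sql-query \
    "SELECT location, hostname, timestamp FROM watchdog WHERE type = 'cron' ORDER BY wid DESC LIMIT 20"
}
```

Entries whose location shows an IP address, a sans-www host or a trailing-period host would point to cron being kicked off by requests to those addresses.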

Garrett Albright’s picture

Turn off Drupal's poormanscron, for God's sake! That abomination should never have made it into core.

But either way, it could be an explanation, yes. Somehow people (or bots, more likely) are accessing your site using the IP address or unexpected URLs and kicking off Drupal's cron. Hopefully turning it off will stop that.
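If Drush is available, one way to turn it off without the admin UI is sketched below. This assumes Drupal 7, where the automatic cron interval is stored in the cron_safe_threshold variable and 0 means "never"; the function name disable_poormanscron is hypothetical.

```shell
# Disable Drupal 7's built-in (page-request triggered) cron by setting
# its interval to 0. --yes answers any confirmation prompt from Drush.
#   $1 = Drupal root directory
#   $2 = public base URL of the site
disable_poormanscron() {
  drush --root="$1" --uri="$2" --yes variable-set cron_safe_threshold 0
}
```

The same setting is the "Run cron every" option under Administration » Configuration » System » Cron; once it is off, cron only runs when something else (a scheduled Drush call, for instance) triggers it.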

MrHaroldA’s picture

Drupal's poormanscron will never run if you have a regular cron job that runs Drupal's cron.

Parke’s picture

Status: Active » Closed (works as designed)

Followed advice and turned off any cron triggered by php calls. Now I'm running a scheduled task to kick off a drush call in #3. Also set base_url for other Drupal re-writes of paths.

I changed the status to "works as designed", seeing as Pathologic wasn't the root cause.

Again, thanks for your help.
