Hi, I understand that clean URLs just remove '?q='.

However, I migrated a bunch of WordPress posts into my Drupal 5 installation, and now my URLs look like this:

http://www.sample.com/index.php?q=index.php/2008/08/26/1st-post

when I want them to be like this:

http://www.sample.com/index.php/2008/08/26/1st-post

I figure if I can just get clean urls to remove "index.php?q=" it'll work out. Can someone help please?

Comments

Flying Drupalist’s picture

arh1’s picture

this is an issue with how your web server is handling the URL rewriting (it's not a pathauto issue, though that could be handy for you otherwise).

Drupal's default (non-clean) URLs will be like: http://www.sample.com/index.php?q=path/to/content

the clean URLs would typically be simply like: http://www.sample.com/path/to/content

so, it looks to me like clean URLs are working correctly. the extra 'index.php' in your URLs makes me think this is something specific to your WordPress migration...
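
To make the two URL shapes concrete, here is a minimal Python sketch (Python is only for illustration; this is not Drupal's code). The function name `drupal_url` and its parameters are made up for this example, and it only mimics the URL forms quoted in this thread. It suggests why an alias that already begins with 'index.php/' produces the doubled-up URL from the original post.

```python
def drupal_url(base, alias, clean_urls):
    """Build a public URL for a path alias, following the forms quoted in this thread.

    base       -- site root without a trailing slash (hypothetical parameter)
    alias      -- the stored path alias, e.g. "2008/08/26/1st-post"
    clean_urls -- True if clean URLs are switched on
    """
    if clean_urls:
        return base + "/" + alias
    return base + "/index.php?q=" + alias


migrated = "index.php/2008/08/26/1st-post"   # alias as imported from WordPress
fixed = "2008/08/26/1st-post"                # alias with the prefix removed

print(drupal_url("http://www.sample.com", migrated, False))
print(drupal_url("http://www.sample.com", migrated, True))
print(drupal_url("http://www.sample.com", fixed, True))
```

In this model the stray 'index.php/' comes from the stored alias rather than from the rewriting, so it survives whether clean URLs are on or off.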

what's a sample URL for some native Drupal content on your site (i.e. content not migrated from WordPress)? what RewriteBase, RewriteCond, and RewriteRule lines do you have in your .htaccess file? can you provide a URL for your site?

Flying Drupalist’s picture

Installing pathauto would allow him to specify what URL he wants and thus fix his issue. No?

ghmercado’s picture

OK, after a lot of experimentation and hair pulling, I've figured out what the problem is (and what I want to happen).

All the migrated WordPress posts have 'index.php/' at the start of their paths (e.g. http://www.site.com/index.php/yyyy/mm/dd/postname)

Hence, with clean URLs off, the URLs come out like this:

http://www.site.com/index.php?q=index.php/2007/01/20/postname

which is UGLY

I can change that by editing each post's migrated URL Path Settings to: yyyy/mm/dd/postname (previously index.php/yyyy/mm/dd/postname)

so if I turn on clean urls, it looks great:

http://www.site.com/2007/01/20/postname

My problem therefore has two parts (I think):

First: How do I remove "index.php/" from all my (and only my) migrated wordpress posts? I'm not afraid to do it via phpmyadmin (on a copy of my db of course).
Second: An SEO problem. Google indexed my whole site with 'index.php' in the URLs. Is it possible to redirect all search results so that index.php is removed? It'd be great if I could do so via Drupal, as my webserver (lighttpd) isn't the easiest to play with :)

Thanks so much to both of you, and to anyone else kind enough to help me out.

arh1’s picture

"Installing pathauto would allow him to specify what URL he wants and thus fix his issue. No?"

just for the record, the core path module allows you to create path aliases as desired. what pathauto does is automate the creation of new path aliases so that, e.g., whenever you create a new "blog" node a path alias like "blog/yyyy/mm/dd/node-title" (or whatever pattern you'd like) is created automatically. very handy tool!

"First: How do I remove "index.php/" from all my (and only my) migrated wordpress posts? I'm not afraid to do it via phpmyadmin (on a copy of my db of course)."

this could be done very easily w/ a manual SQL query via phpmyadmin or CLI. something like:
UPDATE url_alias SET dst = SUBSTRING(dst, 11) WHERE dst LIKE 'index.php/%';

(note: this is for D5 and i'm just rattling it off the top of my head... your mileage may vary -- please test and tweak this query yourself!)
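
Before running anything like this on real data, the offset can be checked on a throwaway table. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 instead of MySQL, a simplified `url_alias` table (pid, src, dst) modelled on the D5 one, and a made-up helper name `strip_prefix`. 'index.php/' is 10 characters and SQL substring positions are 1-based, so the part to keep starts at position 11.

```python
import sqlite3

PREFIX = "index.php/"  # 10 characters long


def strip_prefix(conn):
    """Remove PREFIX from every alias that starts with it; return the row count."""
    cur = conn.execute(
        "UPDATE url_alias SET dst = substr(dst, ?) WHERE dst LIKE ?",
        (len(PREFIX) + 1, PREFIX + "%"),  # keep everything from position 11 on
    )
    conn.commit()
    return cur.rowcount


# Throwaway in-memory table standing in for a COPY of the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO url_alias (src, dst) VALUES (?, ?)",
    [
        ("node/1", "index.php/2008/08/26/1st-post"),  # migrated post
        ("node/2", "about"),                          # native alias, untouched
        ("node/3", "archive/index.php/old"),          # prefix not at the start
    ],
)

changed = strip_prefix(conn)
aliases = [row[0] for row in conn.execute("SELECT dst FROM url_alias ORDER BY pid")]
print(changed, aliases)
```

Only the alias that starts with the prefix is rewritten; the other two rows are left alone. Against the real database, running a SELECT with the same WHERE clause first is a cheap way to preview which rows would change.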

"Second: An SEO problem. Google indexed my whole site with 'index.php' in the URLs. Is it possible to redirect all search results so that index.php is removed? It'd be great if I could do so via Drupal, as my webserver (lighttpd) isn't the easiest to play with :)"

hmm, don't know of a good contributed module offhand, though Global Redirect may set you on the right path in your search... with Apache, i think it'd be pretty simple to do this with an additional RewriteRule line.

jwuk’s picture

arh1, I'm a complete beginner at SQL, so I was very glad to learn one can do something like that 'update'. Myself, if I had this problem I would probably export the database, then use a text editor to amend the index.php occurrences, and reload the database. But that wouldn't be as good a solution as yours because, looking at a dump of my database I see I have other occurrences of index.php that shouldn't be changed. Thanks for the pointer.

You also wrote "...i think it'd be pretty simple to do this with an additional RewriteRule line". As one who has wrestled with Apache's rewrite rules and come out of it muddy and beaten, I have to say that I don't doubt for some it is indeed pretty simple. But it's rather like a nuclear scientist saying "...i think it'd be pretty simple to do this with an additional isotronic destabilofluxmeter catching the hadron flubberquacks" -- the rest of us have a hard time finding which shelf in the hardware store to start looking. ;)

ghmercado’s picture

Yeah. Those flubberquacks get you every time.

ghmercado’s picture

arh1 thanks so much for the help.

For the record I was unable to tell if your suggested sql query was correct.

Instead, I did the following:

  1. Via phpmyadmin, I downloaded a copy of the url_alias table
  2. After first making doubly sure I could completely do away with it, I backed it up, then edited the url_alias copy in my text editor and removed all instances of 'index.php/'.
  3. I uploaded it back into my db and voila, every 'index.php/' was gone.
  4. I activated Clean URLs, and then via PathAuto > Node Path Settings, I entered "[yyyy]/[mm]/[dd]/[title-raw]" under 'Pattern for all Blog entry paths:'.

I now have gorgeous URLs in the format I prefer: 'http://www.site.com/2007/01/20/postname'

My only problem left is the SEO issue, where incoming links and search engine results still have the 'index.php/' in the URLs. I have asked at the Lighty forum however, so am crossing my fingers.

Thanks again!

arh1’s picture

cool. glad you're making progress!

"My only problem left is the SEO issue, where incoming links and search engine results still have the 'index.php/' in the URLs. I have asked at the Lighty forum however, so am crossing my fingers."

well again i don't know lighttpd at all, but in Apache, a line something like this in your .htaccess file would do the trick:

  RewriteRule ^index\.php/(.*)$ $1 [L,R=301]

which basically says, if the requested url starts with 'index.php/', rewrite the url by removing that string and just using everything after it. then the L says this should be the last rewrite rule applied, and the 'R=301' says the url has changed permanently.

(again, you'd have to test thoroughly yourself! index.php/ is very close to Drupal's legitimate "non-clean" urls starting with 'index.php?q='.)
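
For anyone who wants to poke at the pattern without touching a live server, here is a rough Python sketch of just the matching step. It is not mod_rewrite, and the helper name `redirect_target` is invented for this example. It assumes the rule sees the request path without a leading slash and without the query string, which is how a RewriteRule pattern in .htaccess is matched.

```python
import re

# Same pattern as the rule, with the dot escaped so it only matches a literal "."
RULE = re.compile(r"^index\.php/(.*)$")

# Unescaped variant: "." matches any character, so it is slightly too permissive
LOOSE = re.compile(r"^index.php/(.*)$")


def redirect_target(path):
    """Return the path to redirect to, or None if the rule does not apply."""
    match = RULE.match(path)
    return match.group(1) if match else None


print(redirect_target("index.php/2007/03/20/name-of-post"))  # old WordPress-style path
print(redirect_target("index.php"))             # path of a non-clean index.php?q=... request
print(redirect_target("2007/01/20/postname"))   # already clean
print(bool(LOOSE.match("indexXphp/foo")))       # the loose pattern also matches this
```

Under that assumption, a non-clean request such as index.php?q=node/1 has the path 'index.php' with no trailing slash, so it is not caught by the rule.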

ghmercado’s picture

hi i tried the rewrite rule you suggested.

what happens is, when i try entering a valid URL that goes like this:

http://www.website.com/index.php/2007/03/20/name-of-post

it becomes like this:

http://www.website.com/home/serverdirectory/public_html/2007/03/20/name-...

in other words, it replaces 'index.php' with 'home/serverdirectory/public_html', the physical location of the files.

here are my rewrite rules, which are still default from the time i installed things:


RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

hope you don't mind giving me advice again? I know I need to fully understand mod_rewrite but much of this stuff is totally confusing so I'd appreciate any pointing in the right direction. TIA!

arh1’s picture

just doing some quick testing here... try this:

RewriteEngine on
RewriteBase /
RewriteRule ^index\.php/(.*)$ $1 [L,R=301]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

in other words, uncomment the RewriteBase line, and add the new RewriteRule immediately after it.
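
A heavily simplified Python model of why RewriteBase matters here, in case it helps. This is an illustration of the behaviour reported in this thread, not Apache's actual logic, and `redirect_location` and its parameters are invented names. The assumption is that a relative substitution in .htaccess gets a prefix put back in front of it: the RewriteBase if one is set, otherwise the physical directory path.

```python
def redirect_location(host, substitution, dir_path, rewrite_base=None):
    """Model how a relative substitution turns into an external redirect URL.

    host         -- site host name
    substitution -- relative result of the RewriteRule, e.g. "2007/03/20/name-of-post"
    dir_path     -- physical directory containing the .htaccess file
    rewrite_base -- value of RewriteBase, or None if the line is commented out
    """
    # Relative results are prefixed with RewriteBase when set, else the directory path.
    prefix = rewrite_base if rewrite_base is not None else dir_path
    return "http://" + host + prefix.rstrip("/") + "/" + substitution


post = "2007/03/20/name-of-post"
docroot = "/home/serverdirectory/public_html/"

print(redirect_location("www.website.com", post, docroot))                    # no RewriteBase
print(redirect_location("www.website.com", post, docroot, rewrite_base="/"))  # RewriteBase /
```

The first call reproduces the odd URL with the filesystem path in it; the second gives the clean URL once the base is set to '/'.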

ghmercado’s picture

arh1, if ever you come by Manila let me buy you a beer, ok?

thanks much. My SEO issues are now at peace :)

arh1’s picture

glad you got it working, ghmercado, and more importantly, that you're at peace ;)