I'd like to hear your thoughts on using the pathauto module on all content for a high-traffic site with a lot of content.

For example, let's say I allow pathauto to set the following:

All stories to have the URL of www.domain.com/month/day/year/story_title
All forum topics to have the URL of www.domain.com/month/day/year/forum_title

Will pathauto have any impact on performance compared with Drupal's default paths (/node/xxxx)?

Thanks.

P.S. Yes, I know about the issues with 4.6, but to my understanding 4.7 has none.

Comments

iraszl’s picture

Yes, it does use some CPU, but you should not really notice it.
--
Creativine: Brands coming alive as Drupal themes

theichurch’s picture

There was a major change to URL aliasing between 4.6 and 4.7. In 4.6 the entire URL alias list was loaded for every page, so once you got up into the thousands of aliases the site would slow way down. In 4.7, every time it comes across a URL that could be aliased, it checks the database for an alias and, if one exists, stores it for the rest of the page in case it pops up again.
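To make the difference concrete, here is a minimal Python sketch of the two strategies. It is not Drupal's actual PHP code: the simplified `url_alias` table (`src`, `dst`), the helper names and the sample paths are all assumptions for illustration.

```python
import sqlite3


def build_db(n):
    """Create an in-memory alias table with n rows (simplified, assumed schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE url_alias (src TEXT PRIMARY KEY, dst TEXT)")
    db.executemany(
        "INSERT INTO url_alias VALUES (?, ?)",
        ((f"node/{i}", f"2006/05/16/story-{i}") for i in range(n)),
    )
    return db


def load_all_aliases(db):
    """4.6-style: pull every alias into memory on each page request."""
    return dict(db.execute("SELECT src, dst FROM url_alias"))


class PerPageAliasCache:
    """4.7-style: one query per distinct path, remembered for the rest of the page."""

    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.queries = 0

    def alias_for(self, src):
        if src not in self.cache:
            self.queries += 1
            row = self.db.execute(
                "SELECT dst FROM url_alias WHERE src = ?", (src,)
            ).fetchone()
            # Fall back to the system path when no alias exists.
            self.cache[src] = row[0] if row else src
        return self.cache[src]


db = build_db(10_000)

# Strategy 1: the cost grows with the size of the whole table.
everything = load_all_aliases(db)

# Strategy 2: the cost grows with the number of distinct links on the page.
page = PerPageAliasCache(db)
links = ["node/1", "node/2", "node/1", "node/99999"]
resolved = [page.alias_for(path) for path in links]

print(len(everything), "aliases loaded up front vs.", page.queries, "lookups")
print(resolved)
```

The repeated `node/1` link is served from the per-page cache, so four links cost three queries, no matter how many rows the table holds.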

This shifted a lot of the work out of PHP and into more database calls. Overall performance improved greatly for sites with a lot of URL aliases.

I am building a site using pathauto for every URL, so there will be a big table. In some initial testing (before launch) I have found that when a node is loaded my page makes about 40 or 50 database calls because URL aliasing is on. Page generation is still fairly quick, considering that this is a shared hosting account and the page makes all of those calls: on the order of 1/10th of a second.

Now, this is not under load. When that happens it will really be tested.

Hopefully someone else can answer better... but I am now going to go test it with and without URL aliasing on.

rewted’s picture

That's my issue... I don't know exactly what will happen under a high load/100000+ stories with pathauto.
The last thing I want to do is use pathauto, only to have it bring my site to its knees after going live.

theichurch’s picture

As I understand the inner workings, it works fine for a few thousand nodes. As far as Drupal's own code is concerned, it can scale from a few thousand to a hundred thousand without a performance hit. I looked through the code to come to this conclusion. Props to whoever wrote it. A few thousand vs. a hundred thousand aliases makes no difference to Drupal.

The difference will be with the database server. How will it handle having that many records to look through? The performance of a pathauto setup really depends on your database setup. Will your database setup scale? I am about to do a site that will scale rather quickly to a lot of nodes, and I have no worries about using a pathauto setup.

rewted’s picture

What do you mean by "scale"?

theichurch’s picture

By "scale" I meant whether the database server setup can handle not just the number of calls, but also a large table full of URL aliases, and still look them up quickly.

Can your database hold 100,000 URL aliases and look them up very quickly?
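One way to get a feel for that question is to fill a table with 100,000 aliases and ask the database how it would run a single lookup. This is only a sketch using SQLite from Python, not the database a Drupal site would run on; the table layout, index name and sample aliases are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Simplified alias table; the column names are an assumption for this sketch.
db.execute("CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, src TEXT, dst TEXT)")
db.executemany(
    "INSERT INTO url_alias (src, dst) VALUES (?, ?)",
    ((f"node/{i}", f"2006/05/16/story-{i}") for i in range(100_000)),
)


def plan(sql, args):
    """Return SQLite's query plan for a statement as one string."""
    rows = db.execute("EXPLAIN QUERY PLAN " + sql, args).fetchall()
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT src FROM url_alias WHERE dst = ?"
args = ("2006/05/16/story-99999",)

# Without an index the database has to scan every row to find the alias.
plan_without_index = plan(lookup, args)

# With an index on the alias column, the lookup becomes a search of the index.
db.execute("CREATE INDEX url_alias_dst ON url_alias (dst)")
plan_with_index = plan(lookup, args)

found = db.execute(lookup, args).fetchone()[0]

print("no index:  ", plan_without_index)
print("with index:", plan_with_index)
print("resolved to", found)
```

If the lookup column is indexed, the cost of one lookup grows very slowly with table size, so the number of lookups per page matters more than the number of rows. Real numbers still need a benchmark on your own database server.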

rewted’s picture

No idea. :)

greggles’s picture

This is really more of a path issue than a pathauto issue - but pathauto is one of the few ways a site is likely to end up with hundreds of thousands of aliases.

Between the Devel module for creating test nodes and the how-to on benchmarking Drupal, you should be all set to test it out easily.

As one of the pathauto maintainers, I'm very interested in the results. Let us know how it goes - we can add it to the pathauto documentation.

--
Growing Venture Solutions
Drupal Implementation and Support in Denver, CO

cel4145’s picture

I tried pathauto with Kairosnews during 4.6 and it was a big drain on resources. I just updated to 4.7 and implemented pathauto, and everything seems to be running fine. There are 4,000+ nodes on Kairosnews now, all aliased to their titles.

greggles’s picture

Can you clarify whether pathauto was the problem or path was the problem?

So much of the time path is the problem, but people point the finger at pathauto because pathauto is what created their thousands of URL aliases.

dindon’s picture

I think this is about path, i.e. URL aliases. From what I saw, it takes a lot of queries?

Let's say the site has 300k nodes. What would happen then? Has anyone benchmarked URL aliases?

Thanks

rewted’s picture

I'm planning on using /may/16/06/ as my format, and I've noticed a lot of sites using the format /2006/05/16/.

Is there any benefit to /2006/05/16, or is it just personal preference?

theichurch’s picture

The breakdown of /2006/05/16 is a hierarchy. The year (2006) is the biggest unit. Within the year are the months (05), and within the months are the days (16). Each level is nested inside the one before it. This would allow you to create something like /2006/05/ that would contain all of the writings from the month of May in 2006, and /2006/, which would be everything from that year.
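Here is a small Python sketch of why that hierarchy is handy: with year/month/day paths, a month or year archive is just a prefix match, and zero-padded numbers sort chronologically. The `date_path` and `archive` helpers and the post titles are made up for illustration and are not part of pathauto.

```python
from datetime import date


def date_path(d, slug):
    """Build a /YYYY/MM/DD/slug path with zero-padded month and day."""
    return d.strftime("/%Y/%m/%d/") + slug


# Hypothetical posts, listed out of order on purpose.
posts = [
    date_path(date(2006, 5, 16), "pathauto-and-performance"),
    date_path(date(2005, 12, 25), "holiday-post"),
    date_path(date(2006, 5, 2), "spring-update"),
    date_path(date(2006, 4, 30), "april-wrap-up"),
]


def archive(prefix):
    """Everything under a year or month is just a prefix match on the path."""
    return sorted(p for p in posts if p.startswith(prefix))


may_2006 = archive("/2006/05/")
year_2006 = archive("/2006/")

print(may_2006)
print(year_2006)
```

A format like /may/16/06/ cannot be grouped this way: the year comes last, and month names do not sort in calendar order.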

For organizing content and offering different ways to browse it, this just flows, from the biggest unit to the smallest. It is a good practice for organizing.

Why are you doing it as /may/16/06/?

rewted’s picture

I didn't want people to get confused by the format /2006/05/05, since my UK visitors would interpret the placement of the month differently. Also, aesthetically, /month/day/year is simpler.

You do have a great point, and I will most likely switch to /2006/05/16 (year/month/day).

Thanks.