Fix path matching in robots.txt [#180379]

Comment	File	Size	Author
#60	fix-robots_txt-syntax-checker-180379-60.patch	526 bytes	develcuy
#45	180379-comment-url.patch	568 bytes	andypost
#23	180379_fixing_robotstxt-21-d7.patch	1.12 KB	chx
#21	180379_fixing_robotstxt-21-d7.patch	1.12 KB	Freso
#12	robots.txt.d7.trailing_slashes.patch	1.06 KB	Freso
#2	robots_3.patch	1.75 KB	Drupalzilla.com
	robots_2.patch	1.36 KB	Drupalzilla.com

Comment #1

moshe weitzman commented 3 October 2007 at 01:07

Status:

Needs review

» Needs work

thanks for working on this ... please use unified diff format for diffs. see diffandpatch. we're so used to them that i can't recall which lines of yours are adds versus deletes.

Comment #2

Drupalzilla.com commented 4 October 2007 at 00:33

Status	File	Size
new	robots_3.patch	1.75 KB

Sorry about that (my first time submitting a patch).

I've attached a new patch made with cvs diff -up.

Comment #3

catch commented 4 October 2007 at 11:32

Status:

Needs work

» Needs review

setting back to review.

Comment #4

Anonymous (not verified) commented 4 October 2007 at 12:16

Why do we want to do +Disallow: /node$ and its q equivalent?

Do we really want to remove -Disallow: /contact/ and add +Disallow: /contact$ and others or do we just want to add the $ ending ones? The three engines you mention aren't the only ones.

We should think about creating robots.txt on the fly in the module activation processes. Activate/deactivate module could add/remove robots.txt text.

Comment #5

catch commented 4 October 2007 at 14:50

Why do we want to do +Disallow: /node$ and its q equivalent?

Wouldn't that mean duplicate content between example.com/ and example.com/node if /node is the front page?

Comment #6

Anonymous (not verified) commented 4 October 2007 at 19:38

Yes, I suppose. But then there is http://drupal.org/project/globalredirect which would correct that issue without needing to modify the robots.txt file. And adding http://drupal.org/project/gsitemap can help even further. However for the default install, I see your point.

Comment #7

Drupalzilla.com commented 4 October 2007 at 22:47

Why do we want to do +Disallow: /node$ and its q equivalent?

example.com/node is duplicate content of example.com/
example.com/node/1 shouldn't be blocked though.

Do we really want to remove -Disallow: /contact/ and add +Disallow: /contact$ and others or do we just want to add the $ ending ones? The three engines you mention aren't the only ones.

It looked to me like the current rule, Disallow: /contact/, was an attempt to block the contact forms. It doesn't block the default contact form because of the trailing slash. But it might be best just to leave that rule in its current form.
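
As a rough illustration of the matching behaviour discussed in this comment, here is a minimal Python sketch. It assumes that a rule without `$` is a plain prefix and that a trailing `$` (a search-engine extension, not part of the original standard) requires the path to end there. The function name `rule_matches` is made up for this example, and real crawlers may differ in detail.

```python
def rule_matches(rule: str, path: str) -> bool:
    """Return True if a Disallow rule applies to a URL path.

    Assumption: a trailing "$" anchors the rule to the end of the path;
    without it, the rule is treated as a plain prefix.
    """
    if rule.endswith("$"):
        return path == rule[:-1]
    return path.startswith(rule)


print(rule_matches("/node$", "/node"))        # the duplicate of the front page
print(rule_matches("/node$", "/node/1"))      # an individual node
print(rule_matches("/contact/", "/contact"))  # rule with trailing slash vs. the form itself
print(rule_matches("/contact$", "/contact"))  # anchored rule vs. the form itself
```

Under these assumptions only the first and last calls report a match, which is the point being made: `/node$` leaves `/node/1` alone, and `/contact/` misses `/contact`.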

We should think about creating robots.txt on the fly in the module activation processes. Activate/deactivate module could add/remove robots.txt text.

That is a good idea because a lot of modules create duplicate content problems -- as long as you could still have precise control over the robots.txt rules by hand.

Comment #8

catch commented 4 October 2007 at 23:51

dynamic robots.txt for modules could maybe be an addition to this: http://drupal.org/node/53579? - either way it's a very nice idea.

Comment #9

Freso commented 26 October 2007 at 09:01

Status:

Needs review

» Needs work

example.com/node is duplicate content of example.com/ – by default, this is true, but it can easily be changed, even by people who do not have access to the file system and thus won't be able to edit robots.txt.

Also, where does your use of "$" come from? I haven't been able to discern its function from anything I could find at robotstxt.org or Wikipedia...

It was also agreed upon in issue 75916 to have aggregator indexed by default, so that should be changed to "If you do not want your aggregator pages to be indexed, uncomment the following line".

To finish, I'd recommend reading through issue 75916, as it contains some hints and has some discussion on this.

Comment #10

catch commented 26 October 2007 at 10:22

$ anchors the end of the URL; it's not in the spec, but it's recognised by all major search engines (this is covered in the issue discussion and I almost cut and pasted).

example.com/node - I agree with you on that though; a lot of sites don't use /node as the front page.

Comment #11

catch commented 11 February 2008 at 21:36

Version:

6.x-dev

» 7.x-dev

Bumping to 7.x

Comment #12

Freso commented 20 February 2008 at 22:26

Status:

Needs work

» Needs review

Status	File	Size
new	robots.txt.d7.trailing_slashes.patch	1.06 KB

Okay, I've had some time to turn this over in my mind, and I'm feeling rather uneasy about using * and $ in the robots.txt, as they're not standard. Google and co. might well support them, but I'll bet you that there are tons of (polite) robots out there that actually follow the standard and don't care for (or possibly don't even know of) the extensions Google et al. use. And to me they seem like they would confuse standards-compliant robots.

The attached patch removes some of the trailing slashes, namely /foo/bar/ to /foo/bar, leaving /foo/ alone (in case someone wants to make a /fooxyz node). Even if the non-standards approach is deemed a road worth continuing along, this patch will provide a temporary boost to robots.txt's effectiveness.

Comment #13

lilou commented 23 August 2008 at 17:33

Patch still applies.

Comment #14

cburschka commented 13 November 2008 at 02:31

The attached patch removes some of the trailing slashes, namely /foo/bar/ to /foo/bar

Actually, the patch adds new rules without affecting the existing ones... was that what you meant to do?

Comment #15

Freso commented 13 November 2008 at 10:27

Yes, this is what I meant to do.

Comment #16

BartVB commented 14 November 2008 at 23:14

edit: Nevermind :\ Should read the actual patch before replying..

Comment #17

cburschka commented 14 November 2008 at 21:30

Um... huh? Which rule, specifically, prevents http://drupal.org/forum/...etc from being indexed? These are all of the new ones:

+Disallow: /comment/reply
+Disallow: /node/add
+Disallow: /user/register
+Disallow: /user/password
+Disallow: /user/login
+Disallow: /?q=comment/reply
+Disallow: /?q=node/add
+Disallow: /?q=user/password
+Disallow: /?q=user/register
+Disallow: /?q=user/login

Comment #18

16 November 2008 at 21:40

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Comment #19

lilou commented 17 November 2008 at 13:29

Status:

Needs work

» Needs review

See: #335122: Test clean HEAD after every commit and http://pastebin.ca/1258476

Comment #20

29 November 2008 at 21:20

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Comment #21

Freso commented 7 March 2009 at 14:46

Status:

Needs work

» Needs review

Status	File	Size
new	180379_fixing_robotstxt-21-d7.patch	1.12 KB

Re-roll.

Comment #22

Freso commented 7 March 2009 at 14:48

Also: Marked #278775: Allow robots.txt to disallow URLs with "sort" and "filter" in them as a duplicate of this.

Comment #23

chx commented 19 April 2009 at 18:43

Status	File	Size
new	180379_fixing_robotstxt-21-d7.patch	1.12 KB

reposting for bot's sake.

Comment #24

Anonymous (not verified) commented 20 April 2009 at 13:12

Status:

Needs review

» Reviewed & tested by the community

Comment #25

dries commented 21 April 2009 at 05:12

I wonder why we need /?q=user/logout/ -- can something follow the logout-part of the path?

Comment #26

catch commented 21 April 2009 at 08:39

Isn't it because that leads you to a 403? Same as admin?

Comment #27

Freso commented 27 April 2009 at 09:47

It was added with #75916: Include a default robots.txt (commit), but that issue doesn't seem to mention why it is using the slash at the end of it. I think the safe thing to do is to keep it; should we find out it causes trouble, it can be removed later.

Comment #28

cburschka commented 22 May 2009 at 07:49

Don't see any remaining issues here, unless we want to get rid of some of the trailing slashes.

Comment #29

webchick commented 27 May 2009 at 03:05

Status:

Reviewed & tested by the community

» Needs work

We could do with a comment at the top of this file that explains why the paths are repeated. Although I would love a reason better than "We don't know why the slashes are there" :P I'm wondering if we should just remove them, since the contents of this file with this patch are absolutely baffling.

Any SEO experts in the house?

Comment #30

Anonymous (not verified) commented 28 May 2009 at 02:53

http://www.google.com/support/webmasters/bin/answer.py?answer=35237

Comment #31

webchick commented 28 May 2009 at 02:56

@earnie: Can you explain how that page explains why we need both trailing and not trailing slashes on every path? And if so, could you formulate that into a comment and re-roll the patch?

Comment #32

Anonymous (not verified) commented 28 May 2009 at 11:01

I think it says, rather, that we need the ones with the slash more than the ones without it. See http://www.google.com/support/webmasters/bin/answer.py?answer=40360&ctx=... for examples.

In particular:

# To block a directory and everything in it, follow the directory name with a forward slash.

Disallow: /junk-directory/

Comment #33

eMPee584 commented 9 July 2009 at 15:06

IISC there's something important missing here: wildcard paths for multilingual sites, as posted on #347515: robots.txt: add wildcarded paths for multilingual sites:

# For multi-language sites (wildcards supported at least
# by GoogleBot, MSNBot and Yahoo Slurp web spiders)
# Paths (clean URLs)
Disallow: /*/admin/
Disallow: /*/comment/reply/
Disallow: /*/contact/
Disallow: /*/logout/
Disallow: /*/node/add/
Disallow: /*/search/
Disallow: /*/user/register/
Disallow: /*/user/password/
Disallow: /*/user/login/
# Paths (no clean URLs)
Disallow: /*/?q=admin/
Disallow: /*/?q=comment/reply/
Disallow: /*/?q=contact/
Disallow: /*/?q=logout/
Disallow: /*/?q=node/add/
Disallow: /*/?q=search/
Disallow: /*/?q=user/password/
Disallow: /*/?q=user/register/
Disallow: /*/?q=user/login/

Comment #34

Anonymous (not verified) commented 14 October 2009 at 16:15

Version:	7.x-dev	» 6.x-dev
Status:	Needs work	» Active

Is there a reason why /user/ isn't blocked? The current robots.txt file blocks "/user/login/" but nothing addresses "/user" (which routes to the same login page). Seems to me addition of the following would be required:

Disallow: /user/
Disallow: /?q=user/

Similarly, attempts to navigate to /system/ or /system/files/ result in a "page not found" error. This is good. But when files are attached to nodes, those files become available as /system/files/foo.txt (replace foo.txt with the appropriate filename+extension). I have seen said file attachments indexed by Google (not good, imho). Wouldn't the following additions to robots.txt prevent the indexing of any node-attached files?

Disallow: /system/files/
Disallow: /?q=system/files/

Comment #35

vm commented 14 October 2009 at 16:18

Version:	6.x-dev	» 7.x-dev
Status:	Active	» Needs work

readjusting version and status as there is already a patch in play that needs work according to webchick's comments in #29

Comment #36

Anonymous (not verified) commented 15 October 2009 at 04:00

Thanks @VeryMisunderstood, I'm working with 6.14 and didn't consider how changing the version from what was defaulted might be a problem. My apologies.

http://robotstxt.org is supposed to be the definitive source but they're currently generating a 503 server error. So I hit Wikipedia.

The robots.txt patterns are matched by simple substring comparisons, so care should be taken to make sure that patterns matching directories have the final '/' character appended, otherwise all files with names starting with that substring will match, rather than just those in the directory intended.

What that tells me is inclusion of the trailing slash obviates the need for additional entries including the same path. So 1-3 below will disallow indexing of content within specific directories subordinate to "http://foo.com/user/" while 4 accomplishes the same in addition to disallowing _everything else_ subordinate to "http://foo.com/user/":

Disallow: /user/register/ (blocks everything under "http://foo.com/user/register/")
Disallow: /user/password/ (blocks everything under "http://foo.com/user/password/")
Disallow: /user/login/ (blocks everything under "http://foo.com/user/login/")
Disallow: /user/ (blocks everything under "http://foo.com/user/")

As far as I can tell, the current approach (inclusion of the trailing slash) is correct. But it seems to me "Disallow: /user/" could replace "Disallow: /user/register/, /user/password/, /user/login/" (same for the non-clean URL equivalents).
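
The redundancy argument can be checked mechanically. The Python sketch below assumes plain prefix matching and uses a hypothetical helper, `redundant_rules`, that lists every rule already covered by a shorter rule in the same list.

```python
def redundant_rules(rules):
    """Return rules already covered by a shorter prefix rule in the same list."""
    return [
        rule
        for rule in rules
        if any(other != rule and rule.startswith(other) for other in rules)
    ]


rules = ["/user/register/", "/user/password/", "/user/login/", "/user/"]
print(redundant_rules(rules))
# ['/user/register/', '/user/password/', '/user/login/']
```

The same prefix logic means that the broader rule also covers every other path under /user/, so replacing the three rules is not a pure simplification.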

Finally, in my initial comment (#34), I also suggested adding another dir path (/system/files/) so as to block indexing of any files attached to nodes. I think it might make sense to simply block /system/ but haven't looked at that carefully to prevent unintended consequences.

Comment #37

mattyoung commented 15 October 2009 at 07:32

.

Comment #38

Anonymous (not verified) commented 16 October 2009 at 15:40

Following up on my last post (#36), http://robotstxt.org continues to be offline. Wikipedia isn't a bad source but also not the most reliable. So I checked W3C (w3.org) and they provide the same details on "robots.txt".

Disallow: /help disallows both /help.html and /help/index.html

Disallow: /help/ would disallow /help/index.html but allow /help.html
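
These two cases can be reproduced with Python's standard-library parser, `urllib.robotparser`, which applies simple prefix matching. The wrapper `can_crawl` is a made-up helper for this illustration, and the one-rule robots.txt it builds is only a test fixture.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(disallow: str, path: str) -> bool:
    """Parse a one-rule robots.txt and ask whether any robot may fetch path."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow}"])
    return parser.can_fetch("*", path)


for rule in ("/help", "/help/"):
    for path in ("/help.html", "/help/index.html"):
        print(f"Disallow: {rule:<7} {path:<17} allowed={can_crawl(rule, path)}")
```

Only the combination of `Disallow: /help/` with `/help.html` comes back as allowed, matching the W3C description.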

I believe this confirms my suspicion that "Disallow: /user/" would make "Disallow: /user/register/", "/user/password/", and "/user/login/" redundant (same for the non-clean URL equivalents). Additionally, "/user/" yields a login page that is currently not blocked from indexing. So adding "Disallow: /user/" (to replace the 3 /user paths listed in the current robots.txt) will block the one that is currently not getting blocked from indexing, while also doing what the three current entries attempt to accomplish.

Something else I just thought of... how would all this work with a multi-site installation? I'd like to test this but to date have not been able to successfully complete a multi-site installation (yes, I've tried following the handbook references). If someone can coach me through a test multi-site installation, I'd be happy to look into this.

And as I mentioned in #36, I believe it makes sense to add "Disallow: /system/files/" (and the non-clean URL equiv). This would block the indexing of node file attachments, presuming the file system default of "/files" or "(/something)/files" is retained.

Comment #39

vm commented 12 January 2019 at 14:17

When using the private file system, the files folder should be moved above the public root, which, as far as I can tell, prevents anonymous users from reaching the files. Bots index as anonymous users, don't they? And a file system set as private but left in the public root is essentially public regardless of the setting?

I'd gladly help you with a multisite install. I've done a few. However, this thread isn't the place for those instructions. Feel free to create a forum thread. May even want to do a search on the forums as I've posted my successful steps multiple times.

Comment #40

j0nathan commented 14 July 2010 at 14:44

subscribing

Comment #41

mlbrgl commented 21 October 2010 at 10:21

@zacamjo - #38

Wouldn't "Disallow: /user/" also block "/user/[USER ID]" paths, that community sites might want to keep getting indexed, when they are public?

http://www.google.com/search?q=site%3Adrupal.org%2Fuser%2F

Comment #42

YK85 commented 7 February 2011 at 07:43

I was wondering if someone can help set up the robots.txt for Drupal 6 for a multilingual site? Thank you!

Comment #43

j0nathan commented 7 February 2011 at 12:42

Hi, here is another example of a modified robots.txt file, for a multilingual Drupal 6 site:
https://wiki.koumbit.net/DrupalRobots

Comment #44

YK85 commented 7 February 2011 at 13:58

It seems like the link in #43 is using the method in #33.
I'm still not clear whether all 8 lines shown below need to be in robots.txt for each URL.
Does anyone know for sure?

# Paths (no clean URLs)
Disallow: /?q=admin/

# Paths (clean URLs)
Disallow: /admin/

# Paths (clean URLs) no trailing
Disallow: /admin

# Paths (no clean URLs) no trailing
Disallow: /?q=admin

# Paths (clean URLs) multilingual
Disallow: /*/admin/

# Paths (no clean URLs) multilingual
Disallow: /*/?q=admin/

# Paths (clean URLs) multilingual, no trailing
Disallow: /*/admin

# Paths (no clean URLs) multilingual, no trailing
Disallow: /*/?q=admin
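
For reference, the eight lines are the combinations of three yes/no choices: multilingual prefix, clean versus non-clean URL, and trailing slash. The Python sketch below generates them for one path; `rule_variants` is a hypothetical helper, and it says nothing about which of the eight are actually needed.

```python
def rule_variants(path: str) -> list:
    """List the eight Disallow lines enumerated above for one internal path.

    `path` is a Drupal path without leading or trailing slash, e.g. "admin".
    """
    lines = []
    for language_prefix in ("", "/*"):      # plain and multilingual
        for query in ("", "?q="):           # clean and non-clean URLs
            for trailing in ("/", ""):      # with and without trailing slash
                lines.append(f"Disallow: {language_prefix}/{query}{path}{trailing}")
    return lines


print("\n".join(rule_variants("admin")))
```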

Comment #45

andypost commented 19 April 2011 at 09:38

Version:	7.x-dev	» 8.x-dev
Status:	Needs work	» Needs review

Status	File	Size
new	180379-comment-url.patch	568 bytes

D7 introduced comment/% URLs for comments, and this causes a lot of trouble with content duplication.

So the proposal is to disallow /comment/ entirely.

Comment #46

robloach commented 22 April 2011 at 22:48

Comment #47

pillarsdotnet commented 28 May 2011 at 00:08

#45: 180379-comment-url.patch queued for re-testing.

Comment #48

andypost commented 1 September 2011 at 17:55

Issue tags:

+SEO, +Drupal SEO

This trouble is mostly caused by the "Last comments" block, which points to comment/ID#comment-ID

Another way to fix this is to change the block to display links for comments like /node/NID#comment-ID

Comment #49

ayesh commented 9 December 2011 at 11:57

No need to mention that a simple problem in robots.txt can be a fatal problem for sites that mainly depend on Google's traffic.
Keeping Google traffic in mind, I could set query params in Google webmaster central to INDEX page, sort and order queries. GW has a really cool feature to set which query parameter does what.

About /comment/ID URLs, I got content duplication warnings and a robots.txt entry to disallow them worked great. But do we really need to give each comment a URL?
D6's comment URL pattern looks nice but without the node ID, comment URLs are a little misleading.

Comment #51

kscheirer commented 6 January 2013 at 01:02

Issue tags:

-SEO, -Drupal SEO

#45: 180379-comment-url.patch queued for re-testing.

Comment #52

6 January 2013 at 01:04

Status:	Needs review	» Needs work
Issue tags:		+SEO, +Drupal SEO

The last submitted patch, 180379-comment-url.patch, failed testing.

Comment #53

maciej.zgadzaj commented 11 March 2013 at 16:28

Re #44:

# Paths (clean URLs) no trailing
Disallow: /admin

Would block (for example) /administration-guide

# Paths (no clean URLs) no trailing
Disallow: /?q=admin

Would block /?q=administration-guide

# Paths (clean URLs) multilingual
Disallow: /*/admin/

Would block /content/admin/

# Paths (clean URLs) multilingual, no trailing
Disallow: /*/admin

Would block /content/administration-guide
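
The over-matching listed above can be reproduced with a short Python audit. It assumes prefix matching in which `*` stands for any run of characters, as the wildcard-aware engines are described in this thread. `is_blocked` is a hypothetical helper, and the paths are the examples from this comment.

```python
import re


def is_blocked(rule: str, path: str) -> bool:
    """Prefix match where '*' stands for any run of characters (an assumption)."""
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(pattern, path) is not None


rules = ["/admin", "/?q=admin", "/*/admin/", "/*/admin"]
public_paths = [
    "/administration-guide",
    "/?q=administration-guide",
    "/content/admin/",
    "/content/administration-guide",
]

for path in public_paths:
    culprits = [rule for rule in rules if is_blocked(rule, path)]
    print(f"{path} is blocked by: {culprits}")
```

Running a list of pages that should stay indexable through a check like this is one way to spot rules that are broader than intended.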

Comment #54

Anonymous (not verified) commented 11 March 2013 at 16:57

Re #53: And why do we want a robot accessing administration-guide anyway?

Comment #55

maciej.zgadzaj commented 11 March 2013 at 17:04

Re #53: And why do we want a robot accessing administration-guide anyway?

Because that could be an article alias, the content of which someone might want to have indexed by a search engine?

Comment #56

andypost commented 16 October 2013 at 16:56

The related discussion about links to comments #2113323: Rename Comment::permalink() to not be ambiguous with ::uri()

Comment #57

develcuy commented 23 June 2014 at 05:41

There is a broken link at line 14. The new link is: http://www.robotstxt.org/robotstxt.html

Comment #58

hass commented 14 December 2014 at 11:46

http://www.frobee.com/robots-txt-check link in robots.txt is broken.

Comment #59

ronaldmulero commented 17 January 2015 at 21:13

Comment #60

develcuy commented 18 January 2015 at 01:35

Status	File	Size
new	fix-robots_txt-syntax-checker-180379-60.patch	526 bytes

Following #58, there is a good syntax checker that, unlike the Google one, requires no account creation: https://webmaster.yandex.com/robots.xml

Patch attached.

Comment #61

gbisht commented 18 January 2015 at 10:02

Status:	Needs work	» Needs review
Issue tags:		+SprintWeekend2015

@develCuy please put the issue in needs review after submitting the patch.

Comment #62

jonhattan commented 20 January 2015 at 17:55

Status:

Needs review

» Reviewed & tested by the community

In general it would be more appropriate to link to http://www.robotstxt.org/checker.html, Wikipedia, or any other trusted source, but none of them provides a listing.

Comment #63

alexpott commented 22 January 2015 at 18:31

Status:

Reviewed & tested by the community

» Needs work

Hmmm the patch on #60 is completely unrelated to the issue summary. I think that the fact that the http://www.frobee.com/robots-txt-check is broken should be a new issue. That new issue should discuss whether or not we should link to a validator in robots.txt - to me this seems superfluous.

Comment #64

cilefen commented 20 May 2015 at 17:28

1 file was hidden/shown/deleted

Status	File	Size
hidden	fix-robots_txt-syntax-checker-180379-60.patch	526 bytes

#63 was fixed in #2446657: Dead link on robots.txt.

Comment #65

cilefen commented 20 May 2015 at 17:30

Title:

Fixing Robots.txt

» Fix path matching in robots.txt

Comment #66

ayesh commented 20 May 2015 at 17:59

There's a lot to fix in the robots.txt file.
#2446657: Dead link on robots.txt
#1137848: /filter/tips page is listed by search engines

Still, it needs some rework, now that Google recommends not blocking CSS/JS folders for its mobile-friendly SEO rankings (1, 2). Of course we shouldn't be focusing on just Google, but I do not see the motivation behind blocking module and theme paths or login pages (people do search for "facebook login", "facebook sign up", etc).

Comment #67

deepakaryan1988 commented 18 June 2015 at 08:58

Issue tags:

-SprintWeekend2015

Removing sprint weekend tag!!
As suggested by @YesCT

Comment #68

deepakaryan1988 commented 18 June 2015 at 13:32

Issue tags:

+SprintWeekend2015

Sorry, these issues were actually worked on during the 2015 Global Sprint
Weekend https://groups.drupal.org/node/447258

Comment #69

lpalgarvio commented 22 March 2016 at 23:39

Version:

8.0.x-dev

» 8.1.x-dev

Comment #70

22 March 2016 at 23:39

Version:

8.1.x-dev

» 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #72

22 March 2016 at 23:39

Version:

8.2.x-dev

» 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #73

22 March 2016 at 23:39

Version:

8.3.x-dev

» 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #74

salvis commented 1 December 2017 at 16:35

Found this old issue...

According to https://developers.google.com/search/reference/robots_txt, /fish/ does not match /fish, i.e. /admin/ doesn't match /admin, so GoogleBot may try to access /admin (and hit a 403) if some hacker links there.

If you have a "Log in" link on your front page, you'll find that Google fully indexes /user/login, even though our robots.txt has Disallow: /user/login/
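
A quick way to find rules affected by this is to check, for each rule ending in a slash, whether the slashless path is covered by any rule at all. The Python sketch below does that under plain prefix matching. `uncovered_slashless_paths` is a hypothetical helper, and the rule list is an illustrative subset, not the shipped file.

```python
def uncovered_slashless_paths(rules):
    """For each rule ending in '/', report the slashless path if no rule blocks it.

    Uses plain prefix matching, under which "/admin/" does not match "/admin".
    """
    missing = []
    for rule in rules:
        bare = rule.rstrip("/")
        if rule.endswith("/") and bare and not any(bare.startswith(r) for r in rules):
            missing.append(bare)
    return missing


print(uncovered_slashless_paths(["/admin/", "/user/login/", "/search"]))
# ['/admin', '/user/login']
```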

Comment #75

1 December 2017 at 16:35

Version:

8.4.x-dev

» 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #76

1 December 2017 at 16:35

Version:

8.5.x-dev

» 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #77

avpaderno commented 12 January 2019 at 09:47

Version:

8.6.x-dev

» 8.7.x-dev

Comment #78

philsward commented 15 January 2019 at 06:25

Considering this problem has been around since Drupal 5 and this issue has been around for over a decade now, I don't see it ever getting committed.

Crazy how difficult it is to get a simple text file committed for Drupal.

Somebody may as well close the issue as "Won't Fix".

Comment #79

norman.lol commented 16 January 2019 at 21:53

I think there's just some fundamental (human) SEO expertise needed to bring this issue forward.

It definitely needs some more attention, yes.

Comment #80

16 January 2019 at 21:53

Version:

8.7.x-dev

» 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #81

cilefen commented 1 July 2019 at 14:24

Google have just open-sourced their robots.txt parser.

Comment #82

1 July 2019 at 14:24

Version:

8.8.x-dev

» 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #83

1 July 2019 at 14:24

Version:

8.9.x-dev

» 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #84

1 July 2019 at 14:24

Version:

9.1.x-dev

» 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Comment #85

1 July 2019 at 14:24

Version:

9.2.x-dev

» 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #86

longwave commented 31 May 2021 at 21:49

Status:

Needs work

» Postponed (maintainer needs more info)

I'm not sure what actionable tasks there are for this issue. There is a lot of discussion of different factors but there doesn't seem to be anything concrete we can move forward with. I think all Drupal core can hope to do here is ship a simple robots.txt file that covers some basic paths used by core, as it does at present. Site owners can edit the file directly or install the robotstxt module if they wish to override the default settings.

I don't think we can use the * or $ operators; while these are supported by some search engines, they are almost certainly not accepted by all.

#74 was resolved in #3123285: Actually exclude user register, login, logout, and password pages from search results in robots.txt (current rules are broken)

I suggest closing this issue; if there are specific, actionable problems with any of the lines in the current robots.txt, they should be discussed in new issues.

Comment #87

31 May 2021 at 21:49

Version:

9.3.x-dev

» 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #88

catch commented 4 January 2022 at 13:05

Status:

Postponed (maintainer needs more info)

» Closed (works as designed)

It's been a few months since #86, let's close this one.
