Comments

RobLoach’s picture

Title: robots.txt is part of core, breaks "never hack core"-principle » "Never hack core"-principle broken by robots.txt
Version: 6.0 » 7.x-dev
Status: Needs review » Active
Issue tags: +robots.txt, +Don't Hack Core

In Drupal's current state, in order to add stuff to robots.txt, one must either modify robots.txt, or delete the file and use the RobotsTXT module. Requiring custom entries in robots.txt is a common practice of any site, and telling people to "never hack core" just makes absolutely no sense here.

To make this sane, requests to /robots.txt should still return the standard robots.txt. Instead of being served as a static file, however, it would be output from a variable/hook. Note that this should also work when mod_rewrite is unavailable.

seutje’s picture

I like this idea, but since it's not a bug and it does involve some changes, it doesn't seem feasible for 7

Damien Tournoud’s picture

Version: 7.x-dev » 8.x-dev
Category: task » feature

Agreed with #2.

Dave Reid’s picture

Yar, I be supporting renaming the file to example.robots.txt although I'd love to get it as an actual hook_robotstxt() and hook_robotstxt_alter() in core.

RobLoach’s picture

Status: Active » Needs work
Issue tags: +delivery callback
FileSize: 9.23 KB

This patch does a few things...

  • Leaves robots.txt where it is so if the server does not have Clean URLs, it will still get the default robots.txt
  • When Clean URLs are active, however, it'll send the request over to Drupal to handle
  • Uses hook_robotstxt() and hook_robotstxt_alter() to construct the robots.txt
  • Tries to output the text via hook_menu's delivery callback (not working)

Anyone know how $page['#theme_wrappers'] works?

  // Search engine control.
  $items['robots.txt'] = array(
    'page callback' => 'drupal_get_robotstxt',
    'access callback' => TRUE,
    'type' => MENU_CALLBACK,
    'delivery callback' => 'drupal_deliver_txt_page',
  );

I guess we should rather base this on ajax_deliver instead?

Also, should there be a variable that hook_robotstxt() checks before grabbing from the file for the default value?

 /**
+ * Implements hook_robotstxt().
+ */
+function system_robotstxt() {
+  // Cache the robots.txt content from the file system.
+  $robotstxt = &drupal_static(__FUNCTION__, array());
+  if (empty($robotstxt)) {
+    if ($cache = cache_get(__FUNCTION__)) {
+      $robotstxt = $cache->data;
+    }
+    else {
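+      // Check the robotstxt variable first before grabbing the file contents.
+      // empty() cannot wrap a function call on older PHP, so fetch the value first.
+      $robotstxt = variable_get('robotstxt', array());
+      if (empty($robotstxt)) {
+        $robotstxt = file(realpath('robots.txt'), FILE_IGNORE_NEW_LINES);
+      }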
+      cache_set(__FUNCTION__, $robotstxt);
+    }
+  }
+  return $robotstxt;
+}
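
A minimal sketch of the collect-then-alter flow that the hook_robotstxt() and hook_robotstxt_alter() bullets above describe, written as plain PHP so it runs outside Drupal. All `example_*` names are hypothetical, and the explicit callback lists stand in for Drupal's hook discovery; this is not the code from the patch.

```php
<?php
// Hypothetical stand-ins for two modules implementing the proposed hook.
function example_system_robotstxt(): array {
  return ['User-agent: *', 'Disallow: /admin/'];
}

function example_shop_robotstxt(): array {
  return ['Disallow: /cart/'];
}

// A hypothetical alter implementation; it receives the lines by reference.
function example_staging_robotstxt_alter(array &$lines): void {
  // Block all crawlers, for instance on a staging copy of the site.
  $lines = ['User-agent: *', 'Disallow: /'];
}

/**
 * Collects lines from every provider, then lets each alter callback change them.
 */
function example_build_robotstxt(array $providers, array $alters): string {
  $lines = [];
  foreach ($providers as $provider) {
    $lines = array_merge($lines, $provider());
  }
  foreach ($alters as $alter) {
    // Variable-function calls honour the by-reference parameter.
    $alter($lines);
  }
  return implode("\n", $lines) . "\n";
}

echo example_build_robotstxt(
  ['example_system_robotstxt', 'example_shop_robotstxt'],
  []
);
```

Passing `['example_staging_robotstxt_alter']` as the second argument would replace the collected rules with a blanket `Disallow: /`, which is one way a staging site could override the defaults without touching the shipped file.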

NikLP’s picture

Sounds like a heap of good ideas, +1 from me.

Josh The Geek’s picture

+++ modules/system/txt.tpl.php	1 Jan 1970 00:00:00 -0000
@@ -0,0 +1,25 @@
+// $Id: html.tpl.php,v 1.6 2010/11/24 03:30:59 webchick Exp $

No $Id$ after tggm. Can you reroll this patch with Git? Also, it was the wrong Id anyways. If you copy a file with an Id, you change it back to $Id$ from its expanded form.

There should also probably be a system_robotstxt like you suggested that contains the usual defaults. Should a test be included? +1 to the whole idea.

catch’s picture

Subscribing. Increasingly I'd like us to stop supporting non-clean urls - at least for things that are only needed on production sites. Then we wouldn't need double logic for so much stuff.

Regardless this seems like a good plan.

RobLoach’s picture

Every time I put together a site with a staging or multisite setup, I hit this. Once again, going to add it to my hit list.

j0nathan’s picture

Subscribing.

tim.plunkett’s picture

Status: Needs work » Needs review
FileSize: 8.62 KB

Reroll with git.

RobLoach’s picture

I'm still not sure about drupal_deliver_txt_page(). Is there a better/cleaner way to output just text in Drupal?

Also, this is interesting: #1032234: Use Robots Meta Tag rather than robots.txt when possible

pillarsdotnet’s picture

jeremyr’s picture

Would there be a way to drop the robots.txt file into the respective sites/ folder for a multi-site setup? Each site may need to have their own unique file and it just makes sense to have customizations in the same folder as settings.php.

I'm currently facing this issue with an existing set of D6 sites.

j0nathan’s picture

A solution like the one described in comment #14 would also benefit Aegir, which hosts multiple sites on a single platform.

RobLoach’s picture

> Would there be a way to drop the robots.txt file into the respective sites/ folder for a multi-site setup? Each site may need to have their own unique file and it just makes sense to have customizations in the same folder as settings.php.

Although that does sound handy, I think it's something we should pass off to contrib to handle. First thing is getting hook_robotstxt() in. Then the Robots.txt module for Drupal 8 could worry about loading in additional robots entries from the sites directories.

andypost’s picture

I suppose we can't get the patch in without dropping support for non-clean URLs. So first robots.txt should be renamed to example.robots.txt, and only after this patch lands could we start making clean URLs a requirement.

Also, I'd like to point out that system module is not a good place for robots, given #679112: Time for system.module and most of includes to commit seppuku

EDIT: Also let's fix #180379-45: Fix path matching in robots.txt

lpalgarvio’s picture

neat :)

does a contrib module really have to exist? can this be merged into D8 core? a GUI makes sense.

joachim’s picture

Subscribing.

I just saw a patch to a contrib module (http://drupal.org/node/981670) which recommends that users add lines to robots.txt, and that got me thinking -- surely this should be done with a hook_robotstxt ;)

> Also, I'd like to point out that system module is not a good place for robots

Should we move it to a robotstxt.module?

joachim’s picture

Title: "Never hack core"-principle broken by robots.txt » generate robots.txt from a hook so users don't have to hack core to change it

Better title.

pillarsdotnet’s picture

Title: generate robots.txt from a hook so users don't have to hack core to change it » Move all or part of robotstxt module into core.

How is the patch in #5 different from the RobotsTxt module?

joachim’s picture

Neat, I didn't know about that!

Looking at that project page, I'd say this:

> and gives you the chance to edit it, on a per-site basis, from the web UI

which isn't in the patch. IMO that can stay in contrib.

joestewart’s picture

A little related info, hopefully useful. Aegir currently looks in the site files directory for a robots.txt and falls back to the one in Drupal root. Apache commit:

http://drupalcode.org/project/provision.git/commitdiff/e7127de6027c54727...

#1173954: Support for per-site robots.txt
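
The lookup order described here (per-site file first, Drupal root as fallback) can be sketched in a few lines of PHP. This is only an illustration of the order: Aegir applies it in its Apache configuration rather than in PHP, and the function name and the exact directory layout are assumptions.

```php
<?php
/**
 * Returns the path of the robots.txt to serve for a site, or NULL if none.
 *
 * Assumption: a per-site file lives directly in the site's files directory.
 * The function name is hypothetical.
 */
function example_resolve_robotstxt(string $site_files_dir, string $drupal_root): ?string {
  // Candidates are listed in order of preference.
  $candidates = [
    $site_files_dir . '/robots.txt',
    $drupal_root . '/robots.txt',
  ];
  foreach ($candidates as $candidate) {
    if (is_file($candidate) && is_readable($candidate)) {
      return $candidate;
    }
  }
  return NULL;
}
```

A call such as `example_resolve_robotstxt('/var/www/sites/example.com/files', '/var/www')` (paths invented for the example) would return the per-site file when it exists and the root file otherwise.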

andypost’s picture

If core could run as a service or without the node module, I think this functionality should live in a module.
Having example.robots.txt makes no sense because it brings more questions in the forums.
Core could probably ship with a default set of rules, but the UI can live in contrib, as the token module does.

andypost’s picture

Hey, it seems nobody works on this so maybe move this issue to D9?

lpalgarvio’s picture

seems to be the most wise decision.

klonos’s picture

Should we at the very least:

1. rename the file to default.robots.txt or example.robots.txt
2. require the same copy-rename procedure that we require for default.settings.php during installation (could be automated if no robots.txt exists already).

Step 1 would prevent each update from overwriting a custom robots.txt.
Step 2 would ensure that a robots.txt file exists.
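
A rough sketch of the automated copy step, assuming the shipped file is named example.robots.txt and sits in the Drupal root. The function name is hypothetical and this is not taken from any installer code.

```php
<?php
/**
 * Creates robots.txt from the shipped example only when none exists yet.
 *
 * Returns TRUE if a robots.txt is in place afterwards, FALSE otherwise.
 */
function example_ensure_robotstxt(string $drupal_root, string $example_name = 'example.robots.txt'): bool {
  $target = $drupal_root . '/robots.txt';
  if (file_exists($target)) {
    // Never overwrite a file the site owner may have customized.
    return TRUE;
  }
  $source = $drupal_root . '/' . $example_name;
  // Copy only if the example file is actually there.
  return is_file($source) && copy($source, $target);
}
```

Because the function returns early when robots.txt already exists, running it on every update leaves customizations alone while still covering fresh installs.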

RobLoach’s picture

Title: Move all or part of robotstxt module into core. » Move parts of robotstxt module into core.

> Hey, it seems nobody works on this so maybe move this issue to D9?

As long as we get the patch up to par, then it might still be able to get in.

> How is the patch in #5 different from the RobotsTxt module?

It attempts to use Drupal's rendering engine rather than outputting text and exiting the process.

> Should we move it to a robotstxt.module?

Introducing a robotstxt.module to Drupal core could be an option. The current patch sticks it directly into system.module, and we all know system.module is already pretty large.

Questions left to get this patch up to par:

  1. How does one "properly" output a Drupal-generated text file in Drupal 8?
  2. Do we stick it into a robotstxt module in Drupal core, or stick it directly into system.module?

andypost’s picture

Status: Needs review » Needs work
+++ b/includes/common.inc
@@ -218,6 +218,22 @@ function drupal_get_profile() {
+function drupal_get_robotstxt() {

@@ -2543,6 +2559,116 @@ function drupal_deliver_html_page($page_callback_result) {
+function drupal_deliver_txt_page($page_callback_result) {

+++ b/modules/system/system.module
--- /dev/null
+++ b/modules/system/txt.tpl.php

maybe better to introduce this as core service?

RobLoach’s picture

Status: Needs work » Active

andypost’s picture

Status: Active » Needs work

The only way to do this in D8 core is a route with a controller.

RobLoach’s picture

Issue tags: +Needs reroll

Likely needs a reroll, and switch over to a controller. robots.txt has been bugging me since the Drupal 5 days. Would love to get it out of there so that we don't have to deal with patch workflows there.

andypost’s picture

Yes, the controller should get $request to allow fine-tuning of the hook results for each search bot.

Albert Volkman’s picture

Version: 8.x-dev » 9.x-dev
Issue tags: -Needs reroll

Moving to 9.x.

Albert Volkman’s picture

Issue summary: View changes

Reference "Never Hack Core" docs.

mc0e’s picture

Issue summary: View changes

Why was this moved back to 9.x-dev? Seems like it's a few major versions overdue already, and should be given higher priority than that.

catch’s picture

Version: 9.x-dev » 8.1.x-dev

andypost’s picture

Version: 8.1.x-dev » 8.0.x-dev
Category: Feature request » Task

So the 8.x version of the robotstxt module works now; it makes sense to discuss at least the approach...

Answers to #28:
the module just sends alterable strings (see implementation); all we need is to configure proper caching.

@catch I think that's a task with BC:
1) rename robots.txt to example.robots.txt, as we have for gitignore
2) add a controller and route with proper caching + reading of the example file or config
3) leave the contrib module to swap the controller and provide a UI
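
What step 2 might produce can be sketched without the framework. In Drupal 8 a controller would return a Symfony Response object; the array returned below is a hypothetical stand-in for it, and the function names, the one-hour max-age and the file path argument are all assumptions made for illustration.

```php
<?php
/**
 * Reads the default rules from a file, one rule per array entry.
 *
 * Returns an empty array when the file is missing or unreadable.
 */
function example_default_rules(string $path): array {
  $lines = is_file($path) ? file($path, FILE_IGNORE_NEW_LINES) : FALSE;
  return $lines === FALSE ? [] : $lines;
}

/**
 * Builds a framework-free description of a cacheable robots.txt response.
 */
function example_robotstxt_response(array $lines, int $max_age = 3600): array {
  return [
    'status' => 200,
    'headers' => [
      // robots.txt must be served as plain text, not as a themed HTML page.
      'Content-Type' => 'text/plain; charset=utf-8',
      // Let proxies and crawlers cache the generated file.
      'Cache-Control' => 'public, max-age=' . $max_age,
    ],
    'body' => implode("\n", $lines) . "\n",
  ];
}
```

Keeping the rule gathering separate from the response building is one way to let a contrib module swap in rules from config or a UI while core keeps the delivery and caching part.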

lpalgarvio’s picture

Version: 8.0.x-dev » 8.1.x-dev

marcingy’s picture

Version: 8.1.x-dev » 8.2.x-dev

Should be 8.2 as 8.1 is feature frozen

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

JeroenT’s picture

Status: Needs work » Needs review
FileSize: 6.13 KB

Created a working D8 version of the patch in #11

JeroenT’s picture

JeroenT’s picture

The last submitted patch, 41: move_parts_of_robotstxt-495608-41.patch, failed testing.

The last submitted patch, 42: move_parts_of_robotstxt-495608-42.patch, failed testing.

dawehner’s picture

Category: Task » Feature request

Is it just me or is this a feature request?

mc0e’s picture

There's no new feature here.

The ability to administer Drupal without hacking core has been long accepted as an expected feature, as has support for robots.txt.

I think it's fair to say then that the incompatibility between these long established features is a bug.

dawehner’s picture

Well, feel free to argue with the core committers :)

dawehner’s picture

Note: In a workflow using https://github.com/drupal-composer/drupal-project or similar, you can totally specify your own, without "hacking" core.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

mstrelan’s picture

See https://www.drupal.org/docs/develop/using-composer/using-drupals-compose... for current recommendation on modifying robots.txt

needs-review-queue-bot’s picture

Status: Needs review » Needs work
FileSize: 155 bytes

The Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

Bhanu951’s picture

Status: Needs work » Closed (outdated)

As the functionality is now covered by Composer scaffolding, closing this issue as outdated after discussing in Slack.

https://drupal.slack.com/archives/C1BMUQ9U6/p1675257997597579

See https://www.drupal.org/docs/develop/using-composer/using-drupals-compose... for current recommendation on modifying robots.txt
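
As a rough illustration of what the Composer scaffolding approach looks like in a project's composer.json: the fragment below assumes the drupal/core-composer-scaffold plugin, a web root of `web/`, and a hypothetical additions file `assets/my-robots-additions.txt` whose contents get appended to the scaffolded robots.txt. Check the documentation linked above for the options your version supports.

```json
{
  "extra": {
    "drupal-scaffold": {
      "locations": {
        "web-root": "web/"
      },
      "file-mapping": {
        "[web-root]/robots.txt": {
          "append": "assets/my-robots-additions.txt"
        }
      }
    }
  }
}
```

Setting the mapping for `[web-root]/robots.txt` to `false` instead should tell the scaffold plugin to leave the file alone, so a project can ship its own robots.txt.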