robots.txt is part of the core distribution. I think it should be something like robots.txt.example or similar, so that we do not have to hack it when we need to change it.

See Programming: Never Hack Core and Site Building: Never Hack Core.

Files:

#11: drupal-495608-11.patch (8.62 KB, tim.plunkett). PASSED: [[SimpleTest]]: [MySQL] 29,429 pass(es).
#5: robotstxt.patch (9.23 KB, RobLoach). PASSED: [[SimpleTest]]: [MySQL] 29,429 pass(es).

Comments

Title: robots.txt is part of core, breaks "never hack core"-principle » "Never hack core"-principle broken by robots.txt
Version: 6.0 » 7.x-dev
Status: Needs review » Active
Issue tags: +robots.txt, +Don't Hack Core

In Drupal's current state, in order to add entries to robots.txt, one must either modify robots.txt or delete the file and use the RobotsTxt module. Custom robots.txt entries are common practice on almost any site, so telling people to "never hack core" makes no sense here.

To make this sane, requests to /robots.txt should output the standard robots.txt. Instead of serving a static file, however, the content would be output from a variable/hook. Note that this should also work when mod_rewrite is unavailable.
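To illustrate, under the proposed design any module could contribute entries through a hook; this is a hypothetical sketch (neither hook_robotstxt() nor the mymodule_ names exist yet):

<?php
/**
 * Implements hook_robotstxt() (proposed hook, sketch only).
 *
 * Each implementation returns an array of lines that core would
 * concatenate into the final robots.txt response.
 */
function mymodule_robotstxt() {
  return array(
    '# Added by mymodule.',
    'Disallow: /mymodule/private/',
  );
}
?>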

I like this idea, but since it's not a bug and it does involve some changes, it doesn't seem feasible for 7.

Version: 7.x-dev » 8.x-dev
Category: task » feature

Agreed with #2.

Yar, I be supporting renaming the file to example.robots.txt, although I'd love to get it as an actual hook_robotstxt() and hook_robotstxt_alter() in core.

Status: Active » Needs work
Issue tags: +delivery callback
File: robotstxt.patch (new, 9.23 KB). PASSED: [[SimpleTest]]: [MySQL] 29,429 pass(es).

This patch does a few things...

  • Leaves robots.txt where it is, so a server without Clean URLs still gets the default robots.txt
  • When Clean URLs are active, however, the request is handed over to Drupal to handle
  • Uses hook_robotstxt() and hook_robotstxt_alter() to construct the robots.txt
  • Tries to output the text via hook_menu's delivery callback (not working)

Anyone know how $page['#theme_wrappers'] works?

<?php
// Search engine control.
$items['robots.txt'] = array(
  'page callback' => 'drupal_get_robotstxt',
  'access callback' => TRUE,
  'type' => MENU_CALLBACK,
  'delivery callback' => 'drupal_deliver_txt_page',
);
?>
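For reference, a minimal text delivery callback along these lines might look like the sketch below. It only uses existing Drupal 7 APIs; drupal_deliver_txt_page() itself is what the patch proposes, not something core provides today:

<?php
/**
 * Delivery callback: prints the page callback result as plain text (sketch).
 */
function drupal_deliver_txt_page($page_callback_result) {
  // Serve plain text instead of a themed HTML page.
  drupal_add_http_header('Content-Type', 'text/plain; charset=utf-8');
  print $page_callback_result;
  // Perform end-of-request tasks (caching, session commit, shutdown hooks).
  drupal_page_footer();
}
?>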

I guess we should base this on ajax_deliver() instead?

Also, should there be a variable that hook_robotstxt() checks before grabbing from the file for the default value?

<?php
/**
 * Implements hook_robotstxt().
 */
function system_robotstxt() {
  // Cache the robots.txt content from the file system.
  $robotstxt = &drupal_static(__FUNCTION__, array());
  if (empty($robotstxt)) {
    if ($cache = cache_get(__FUNCTION__)) {
      $robotstxt = $cache->data;
    }
    else {
      // Check the robotstxt variable first before grabbing the file contents.
      $robotstxt = variable_get('robotstxt');
      if (empty($robotstxt)) {
        $robotstxt = file(realpath('robots.txt'), FILE_IGNORE_NEW_LINES);
      }
      cache_set(__FUNCTION__, $robotstxt);
    }
  }
  return $robotstxt;
}
?>
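For completeness, a site-specific module could then adjust those defaults without ever touching the file. A hypothetical hook_robotstxt_alter() implementation (the mymodule_ prefix and the altered rules are made up for illustration):

<?php
/**
 * Implements hook_robotstxt_alter() (proposed hook, sketch only).
 */
function mymodule_robotstxt_alter(&$robotstxt) {
  // Remove a default rule this site does not want.
  $robotstxt = array_values(array_diff($robotstxt, array('Disallow: /search/')));
  // Add a sitemap reference.
  $robotstxt[] = 'Sitemap: http://example.com/sitemap.xml';
}
?>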

Sounds like a heap of good ideas, +1 from me.

+++ modules/system/txt.tpl.php 1 Jan 1970 00:00:00 -0000
@@ -0,0 +1,25 @@
+// $Id: html.tpl.php,v 1.6 2010/11/24 03:30:59 webchick Exp $

No $Id$ after tggm (the great Git migration). Can you reroll this patch with Git? Also, it was the wrong Id anyway: if you copy a file that has an Id, you should change it back to $Id$ from its expanded form.

There should also probably be a system_robotstxt() like you suggested that contains the usual defaults. Should a test be included? +1 to the whole idea.


Subscribing. Increasingly I'd like us to stop supporting non-clean URLs, at least for things that are only needed on production sites. Then we wouldn't need double logic for so much stuff.

Regardless this seems like a good plan.

Every time I put together a site with a staging or multisite setup, I hit this. Once again, going to add it to my hit list.

Subscribing.

Status: Needs work » Needs review
File: drupal-495608-11.patch (new, 8.62 KB). PASSED: [[SimpleTest]]: [MySQL] 29,429 pass(es).

Reroll with git.

I'm still not sure about drupal_deliver_txt_page(). Is there a better/cleaner way to output just text in Drupal?

Also, this is interesting: #1032234: Use Robots Meta Tag rather than robots.txt when possible

Would there be a way to drop the robots.txt file into the respective sites/ folder for a multi-site setup? Each site may need its own unique file, and it just makes sense to keep customizations in the same folder as settings.php.

I'm currently facing this issue with an existing set of D6 sites.

The solution described in comment #14 would also benefit Aegir, which hosts multiple sites on a single platform.

> Would there be a way to drop the robots.txt file into the respective sites/ folder for a multi-site setup? Each site may need its own unique file, and it just makes sense to keep customizations in the same folder as settings.php.

Although that does sound handy, I think it's something we should pass off to contrib to handle. The first step is getting hook_robotstxt() in. Then the Robots.txt module for Drupal 8 could worry about loading additional robots entries from the sites directories, as in the sketch below.
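For what it's worth, the contrib side of that could stay tiny. A sketch, assuming the proposed hook_robotstxt() lands in core (the robotstxt_ function name is hypothetical):

<?php
/**
 * Implements hook_robotstxt() (proposed) to load per-site entries.
 */
function robotstxt_robotstxt() {
  // Look for a robots.txt next to this site's settings.php.
  $file = conf_path() . '/robots.txt';
  if (file_exists($file)) {
    return file($file, FILE_IGNORE_NEW_LINES);
  }
  return array();
}
?>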

I suppose we can't get this patch in without dropping non-clean URL support. So robots.txt should first be moved to example.robots.txt, and only after this patch lands could we start requiring clean URLs.

Also, I'd like to point out that the system module is not a good place for robots.txt handling, given #679112: Time for system.module and most of includes to commit seppuku.

EDIT: Also let's fix #180379-45: Fixing Robots.txt

neat :)

Does a contrib module really have to exist? Can this be merged into D8 core? A GUI makes sense.

Subscribing.

I just saw a patch to a contrib module (http://drupal.org/node/981670) which recommends that users add lines to robots.txt, and that got me thinking -- surely this should be done with a hook_robotstxt ;)

> Also, I'd like to point out that the system module is not a good place for robots.txt handling

Should we move it to a robotstxt.module?

Title:"Never hack core"-principle broken by robots.txtgenerate robots.txt from a hook so users don't have to hack core to change it

Better title.

Title: Generate robots.txt from a hook so users don't have to hack core to change it » Move all or part of robotstxt module into core.

How is the patch in #5 different from the RobotsTxt module?

Neat, I didn't know about that!

Looking at that project page, I'd say this:

> and gives you the chance to edit it, on a per-site basis, from the web UI

which isn't in the patch. IMO that can stay in contrib.

A little related info, hopefully useful: Aegir currently looks in the site's files directory for a robots.txt and falls back to the one in the Drupal root. Apache commit:

http://drupalcode.org/project/provision.git/commitdiff/e7127de6027c54727...

#1173954: Support for per-site robots.txt

If core can run as a service or without the node module, I think this functionality should live in a module.
Having example.robots.txt makes no sense, because it brings more questions to the forums.
Core could probably ship with a default set of rules, with the UI living in contrib, as the token module does.

Hey, it seems nobody is working on this, so maybe move this issue to D9?

That seems the wisest decision.

Should we at the very least:

1. Rename the file to default.robots.txt or example.robots.txt.
2. Require the same copy-rename procedure that we require for default.settings.php during installation (this could be automated if no robots.txt exists already).

Option 1 would prevent each core update from overwriting a custom file.
Option 2 would ensure that a robots.txt file exists.

Title: Move all or part of robotstxt module into core. » Move parts of robotstxt module into core.

> Hey, it seems nobody is working on this, so maybe move this issue to D9?

If we get the patch up to par, it might still be able to get in.

> How is the patch in #5 different from the RobotsTxt module?

It attempts to use Drupal's rendering engine rather than outputting text and exiting the process.

> Should we move it to a robotstxt.module?

Introducing a robotstxt.module to Drupal core could be an option. The current patch sticks it directly into system.module, and we all know system.module is already pretty large.

Questions left to get this patch up to par:

  1. How does one "properly" output a Drupal-generated text file in Drupal 8?
  2. Do we put it in a robotstxt module in Drupal core, or stick it directly into system.module?

Status:Needs review» Needs work

+++ b/includes/common.inc
@@ -218,6 +218,22 @@ function drupal_get_profile() {
+function drupal_get_robotstxt() {
@@ -2543,6 +2559,116 @@ function drupal_deliver_html_page($page_callback_result) {
+function drupal_deliver_txt_page($page_callback_result) {
+++ b/modules/system/system.module
--- /dev/null
+++ b/modules/system/txt.tpl.php

Maybe it would be better to introduce this as a core service?

Status: Active » Needs work

The only way to do this in D8 core is a route with a controller.

Issue tags: +Needs reroll

Likely needs a reroll, and a switch over to a controller. robots.txt has been bugging me since the Drupal 5 days. Would love to get it out of there so that we don't have to deal with patch workflows for it.

Yes, the controller should get $request to allow fine-tuning of the hook results for each search bot.
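To make that concrete, here is a rough sketch of what such a controller could look like. The class name and route wiring are hypothetical, and it assumes the proposed hook_robotstxt() exists; the point is returning a plain Symfony Response with a text/plain Content-Type instead of a themed page:

<?php
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

/**
 * Hypothetical controller for the /robots.txt route.
 */
class RobotsTxtController {

  /**
   * Builds the robots.txt content and returns it as text/plain.
   */
  public function content(Request $request) {
    // Collect lines from all hook_robotstxt() implementations
    // (the hook proposed in this issue), then let modules alter them.
    $content = module_invoke_all('robotstxt');
    drupal_alter('robotstxt', $content);
    return new Response(implode("\n", $content), 200, array(
      'Content-Type' => 'text/plain; charset=UTF-8',
    ));
  }

}
?>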

Version: 8.x-dev » 9.x-dev
Issue tags: -Needs reroll

Moving to 9.x.

Issue summary: View changes

Reference "Never Hack Core" docs.