robots.txt is part of the core distribution. It should be renamed to something like robots.txt.example, so that sites do not have to modify or update a core file to customize it.
See Programming: Never Hack Core and Site Building: Never Hack Core.
Comment | File | Size | Author |
---|---|---|---|
#63 | 495608-nr-bot.txt | 155 bytes | needs-review-queue-bot |
#43 | move_parts_of_robotstxt-495608-43.patch | 3.65 KB | JeroenT |
#11 | drupal-495608-11.patch | 8.62 KB | tim.plunkett |
#5 | robotstxt.patch | 9.23 KB | RobLoach |
Comments
Comment #1
RobLoach: In Drupal's current state, in order to add entries to robots.txt, one must either modify the file directly, or delete it and use the RobotsTXT module. Requiring custom entries in robots.txt is common practice on any site, so telling people to "never hack core" makes absolutely no sense here.
To make this sane, calls to /robots.txt should output the standard robots.txt. Instead of serving a static file, however, the content would be built from a variable/hook. Note that this should also work when mod_rewrite is unavailable.
Comment #2
seutje: I like this idea, but since it's not a bug and it does involve some changes, it doesn't seem feasible for Drupal 7.
Comment #3
Damien Tournoud: Agreed with #2.
Comment #4
Dave Reid: Yar, I be supporting renaming the file to example.robots.txt, although I'd love to get an actual hook_robotstxt() and hook_robotstxt_alter() in core.
Comment #5
RobLoach: This patch does a few things:
1. Adds hook_robotstxt() and hook_robotstxt_alter() to construct the robots.txt output.
2. Adds a delivery callback (not working).
Anyone know how $page['#theme_wrappers'] works? I guess we should rather base this on ajax_deliver instead? Also, should there be a variable that hook_robotstxt() checks before grabbing the default value from the file?
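The hook pattern being proposed could be sketched roughly as follows. This is only an illustration, not the patch itself: the hook names match the proposal, but the module name `mymodule`, the paths, and the exact return format (one array entry per robots.txt line) are assumptions.

```php
<?php

/**
 * Implements the proposed hook_robotstxt().
 *
 * Hypothetical example: a module contributes its own robots.txt lines.
 */
function mymodule_robotstxt() {
  return [
    'Disallow: /mymodule/private',
    'Disallow: /mymodule/tmp',
  ];
}

/**
 * Implements the proposed hook_robotstxt_alter().
 *
 * Hypothetical example: remove a rule that another module added.
 */
function mymodule_robotstxt_alter(array &$lines) {
  $lines = array_values(array_filter($lines, function ($line) {
    return $line !== 'Disallow: /admin/';
  }));
}
```

The final /robots.txt response would then be the collected lines from all implementations, joined with newlines.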
Comment #6
NikLP: Sounds like a heap of good ideas, +1 from me.
Comment #7
Josh The Geek: No $Id$ after tggm. Can you reroll this patch with Git? Also, it was the wrong $Id$ anyway: if you copy a file with an $Id$, you change it back to $Id$ from its expanded form.
There should also probably be a system_robotstxt() like you suggested that contains the usual defaults. Should a test be included? +1 to the whole idea.
Comment #8
catch: Subscribing. Increasingly I'd like us to stop supporting non-clean URLs, at least for things that are only needed on production sites. Then we wouldn't need double logic for so much stuff.
Regardless, this seems like a good plan.
Comment #9
RobLoach: Every time I put together a site with a staging or multisite setup, I hit this. Once again, going to add it to my hit list.
Comment #10
j0nathan: Subscribing.
Comment #11
tim.plunkett: Reroll with Git.
Comment #12
RobLoach: I'm still not sure about drupal_deliver_txt_page(). Is there a better/cleaner way to output plain text in Drupal?
Also, this is interesting: #1032234: Use Robots Meta Tag rather than robots.txt when possible
Comment #13
pillarsdotnet: (no comment text)
Comment #14
jeremyr: Would there be a way to drop the robots.txt file into the respective sites/ folder for a multi-site setup? Each site may need its own unique file, and it just makes sense to keep customizations in the same folder as settings.php.
I'm currently facing this issue with an existing set of D6 sites.
Comment #15
j0nathan: The solution described in comment #14 would also benefit Aegir, which hosts multiple sites on a single platform.
Comment #16
RobLoach: Although that does sound handy, I think it's something we should pass off to contrib. The first step is getting hook_robotstxt() in. Then the RobotsTxt module for Drupal 8 could worry about loading additional robots entries from the sites/ directories.
Comment #17
andypost: I suppose we can't get this patch in without dropping non-clean-URL support. So first robots.txt should be moved to example.robots.txt, and only after this patch lands could we start making clean URLs a requirement.
Also, I'd like to point out that the system module is not a good place for robots.txt handling; see #679112: Time for system.module and most of includes to commit seppuku
EDIT: Also, let's fix #180379-45: Fix path matching in robots.txt
Comment #18
lpalgarvio: Neat :)
Does a contrib module really have to exist? Can this be merged into D8 core? A GUI makes sense.
Comment #19
joachim: Subscribing.
I just saw a patch to a contrib module (http://drupal.org/node/981670) which recommends that users add lines to robots.txt, and that got me thinking: surely this should be done with a hook_robotstxt() ;)
> Also I'd like to point to that system module is not a good place for robots
Should we move it to a robotstxt.module?
Comment #20
joachim: Better title.
Comment #21
pillarsdotnet: How is the patch in #5 different from the RobotsTxt module?
Comment #22
joachim: Neat, I didn't know about that!
Looking at that project page, I'd single out this:
> and gives you the chance to edit it, on a per-site basis, from the web UI
which isn't in the patch. IMO that can stay in contrib.
Comment #23
joestewart: A little related info, hopefully useful. Aegir currently looks in the site's files directory for a robots.txt and falls back to the one in the Drupal root. Provision commit:
http://drupalcode.org/project/provision.git/commitdiff/e7127de6027c54727...
#1173954: Support for per-site robots.txt
Comment #24
andypost: If core could run as a service or without the node module, I think this functionality should live in a module.
Having example.robots.txt makes no sense, because it brings more questions in the forums.
Core could probably ship with a default set of rules, while the UI can live in contrib, as the Token module does.
Comment #25
andypost: Hey, it seems nobody is working on this, so maybe move this issue to D9?
Comment #26
lpalgarvio: That seems the wisest decision.
Comment #27
klonos: Should we at the very least:
1. Rename the file to default.robots.txt or example.robots.txt.
2. Require the same copy-rename procedure that we require for default.settings.php during installation (this could be automated if no robots.txt exists already).
(1) would prevent each update from overwriting a custom file, and (2) would ensure that a robots.txt file always exists.
Comment #28
RobLoach: As long as we get the patch up to par, it might still be able to get in.
It attempts to use Drupal's rendering engine rather than outputting text and exiting the process.
Introducing a robotstxt.module to Drupal core could be an option; the current patch sticks it directly into system.module, and we all know system.module is already pretty large.
Questions left to get this patch up to par:
Comment #29
andypost: Maybe it's better to introduce this as a core service?
Comment #30
RobLoach: There's also #1032234: Use Robots Meta Tag rather than robots.txt when possible
Comment #31
andypost: The only way to do this in D8 core is a route with a controller.
Comment #32
RobLoach: This likely needs a reroll and a switch over to a controller. robots.txt has been bugging me since the Drupal 5 days; I'd love to get it out of there so that we don't have to deal with patch workflows for it.
Comment #33
andypost: Yes, the controller should get $request to allow fine-tuning of the hook results for each search bot.
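In Drupal 8 terms, the route-plus-controller approach being discussed here might look roughly like the sketch below. This is an assumption for illustration only: the module name `mymodule`, the route, the class name, and the reuse of the proposed `robotstxt` hooks are all hypothetical, not from a committed patch.

```php
<?php
// mymodule.routing.yml would define the route, e.g.:
//   mymodule.robots:
//     path: '/robots.txt'
//     defaults:
//       _controller: '\Drupal\mymodule\Controller\RobotsController::content'
//     requirements:
//       _access: 'TRUE'

namespace Drupal\mymodule\Controller;

use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

/**
 * Serves /robots.txt from alterable hook data instead of a static file.
 */
class RobotsController {

  public function content(Request $request) {
    // Collect lines from all hook_robotstxt() implementations, then let
    // hook_robotstxt_alter() implementations modify them. Per #33, $request
    // is available for fine-tuning the result per search bot.
    $lines = \Drupal::moduleHandler()->invokeAll('robotstxt');
    \Drupal::moduleHandler()->alter('robotstxt', $lines);
    return new Response(implode("\n", $lines), 200, [
      'Content-Type' => 'text/plain; charset=UTF-8',
    ]);
  }

}
```

Proper cache metadata (as raised in #37) would still need to be attached to this response; that is omitted here for brevity.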
Comment #34
Albert Volkman: Moving to 9.x.
Comment #34.0
Albert Volkman: Reference the "Never Hack Core" docs.
Comment #35
mc0e: Why was this moved back to 9.x-dev? It seems like it's a few major versions overdue already and should be given a higher priority.
Comment #36
catch: (no comment text)
Comment #37
andypost: The 8.x version of the robotstxt module works now, so it makes sense to at least discuss the approach.
Answers to #28: the module sends just alterable strings (see the implementation); all we need is to configure proper caching.
@catch I think that's a task with BC:
1) Rename robots.txt to example.robots.txt, as we have for gitignore.
2) Add a controller and route with proper caching, plus reading of the example file or config.
3) Leave the contrib module to swap the controller and provide a UI.
Comment #38
lpalgarvio: (no comment text)
Comment #39
marcingy (at Examiner.com): This should be 8.2, as 8.1 is feature-frozen.
Comment #41
JeroenT: Created a working D8 version of the patch in #11.
Comment #42
JeroenT: This is the right patch.
Comment #43
JeroenT: Third time's a charm.
Comment #46
dawehner: Is it just me, or is this a feature request?
Comment #47
mc0e (as a volunteer): There's no new feature here.
The ability to administer Drupal without hacking core has long been accepted as an expected feature, as has support for robots.txt.
I think it's fair to say, then, that the incompatibility between these long-established features is a bug.
Comment #48
dawehner: Well, feel free to argue with the core committers :)
Comment #49
dawehner: Note: in a workflow using https://github.com/drupal-composer/drupal-project or similar, you can totally specify your own robots.txt without "hacking" core.
Comment #62
mstrelan (at PreviousNext): See https://www.drupal.org/docs/develop/using-composer/using-drupals-compose... for the current recommendation on modifying robots.txt.
Comment #63
needs-review-queue-bot: The Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or it fails the Drupal core commit checks. Therefore, this issue's status is now "Needs work".
Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress the issue, incorporate this feedback when updating it; this helps other contributors know what is outstanding.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #64
Bhanu951 (as a volunteer): As the functionality is now covered by Composer scaffolding, closing this issue as outdated after discussion in Slack.
https://drupal.slack.com/archives/C1BMUQ9U6/p1675257997597579
See https://www.drupal.org/docs/develop/using-composer/using-drupals-compose... for current recommendation on modifying robots.txt
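For context on the resolution: the Composer scaffolding mentioned above lets a project opt out of the core-supplied robots.txt so a custom one can be committed instead. A minimal sketch of the relevant composer.json fragment is below; the `web/` docroot location is an assumption (a common layout), and consult the linked docs for the exact options.

```json
{
    "extra": {
        "drupal-scaffold": {
            "locations": {
                "web-root": "web/"
            },
            "file-mapping": {
                "[web-root]/robots.txt": false
            }
        }
    }
}
```

With the mapping set to `false`, the scaffold plugin stops copying core's robots.txt into the docroot, and the project can ship its own file there without touching core.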