Download & Extend

Drupal outputs two meta content-type tags

Project:Drupal core
Version:6.x-dev
Component:base system
Category:bug report
Priority:normal
Assigned:Unassigned
Status:needs review
Issue tags:conten-type, Drupal, meta, ouput, tags, two

Issue Summary

I recently noticed Drupal was writting two meta content-type tags to my pages:

  <head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="shortcut icon" href="/is118/misc/favicon.ico" type="image/x-icon" />

I believe the issue is with drupal_final_markup and drupal_get_html_head functions in common.inc.
I fixed the issue by preventing drupal_get_html_head() from adding another meta content type, however,
i dont think this is ideal.

is drupal_final_markup necessary?

AttachmentSizeStatusTest resultOperations
fix.patch389 bytesIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch fix_8.patch.View details | Re-test

Comments

#1

subscribing. Just noticed this happening after upgrading to 6.11.

#2

Status:active» needs work

Even drupal.org is outputting the content-type meta tag twice...

drupal_final_markup was added to prevent certain attacks (see the diff here http://cvs.drupal.org/viewvc.py/drupal/drupal/includes/common.inc?r1=1.7...)

As for the patch; I suggest removing the first line of the function as well.
After that, set this issue to patch needs review so the your patch will be tested automagically.

- Arie

#3

Status:needs work» closed (works as designed)

This was introduced by SA-CORE-2009-005 on purpose, to prevent non updated themes to cause issues. The duplication of the meta content-type header is harmless.

#4

The duplicate of the meta content-type header is harmless.

Harmless but inelegant.

#5

Yes, it is inelegant, we (at the security team) all agree it is inelegant, ugly, or whatever you want to call it. We are not going to argue over it. Unfortunately, we found most themes vulnerable to this bug, so we needed a fix which solves the vulnerability with all themes regardless of their release schedules. Thus we needed to make sure that the content type value is output no matter what. We ensured that the output still validates against the W3C validator and that we found no standard saying we should not output the same meta tag two times. We also looked at the output in various browsers and found that it had no issues in practice.

#6

Status:closed (works as designed)» needs work

Just one thing.. It seems that there's this thing called FastCGI that seems to have an issue with duplicate content headers. See #462336: TCPDF: Pdf generator Problem with Fastcgi for more details.

I would actually prefer if we could apply the patch in #1 with the removal of the first line as suggested in #2 to remove one of the content-type lines, at least to Drupal 5 and 6.

Knowing how drupal_final_markup() is working I see no conflict in applying this patch and still complying with all the security concerns expressed in #5.

João

#7

jcnventura: First, did you verify that when you remove one of the content type headers, it is indeed working right with TCPDF?

#8

My website delivers HTML4, not XHTML, so this new feature means that my site doesn't validate because of the trailing slash. Is there some configuration setting that tells Drupal whether to output HTML4 or XHTML, so that drupal_final_markup() can check which to use?

#9

Drupal outputs XHTML, and the suggested XHTML notation for HTML backwards compatibility (from the W3C) is the slash separated with a space before the closing >. Drupal hardwires outputting XHTML in so many places, that it is hard to believe that you managed to output straight HTML without heavy hacking.

#10

Status:needs work» closed (works as designed)

#462336: TCPDF: Pdf generator Problem with Fastcgi is unrelated. The "duplicate" headers in the error message is not the HTML one, but a true HTTP header.

Back to "by design".

#11

Status:closed (works as designed)» needs work

Gábor: I am the print module maintainer, that bug was reported by one of the users, so no I can't try to reproduce it with only one tag.. For me TCPDF works fine with one or two tags.

The problem is with FastCGI which is stupidly aborting when it detects a duplicate header. That's a known problem and is already solved in their CVS snapshot (of November 2008).. If you follow one of the links in the other issue, there's a patch there which simply removes the duplicate detection code to avoid that problem. (http://www.fastcgi.com/archives/fastcgi-developers/2008-September/000048...)

However, I would guess that most host providers which use FastCGI will be using the official release version of 2007 and not the CVS snapshot, so it may be that Drupal itself may run into problems in that setup. On the other hand, I don't understand how he can be running print 6.x-1.7 (which because it uses drupal_final_markup() requires D6.11 or later), and not have the same problem with core Drupal...

Anyway, how bad is it to commit the patch in this thread and have a single Content-Type tag??

João

#12

I didn't have to do much hacking, just a couple of lines in phptemplate_preprocess_page to replace '/>' with '>'. But I will no doubt run into more problems as I add more modules. I'm using HTML4 for historical reasons because I ported an existing site to Drupal. I guess it's time to bite the bullet and move to XHTML.

Slightly off-topic, are there any plans to support HTML5 if it ever takes off? At that stage there might be a need for a Drupal setting to allow a choice between XHTML or HTML5.

#13

Status:needs work» closed (works as designed)

I take it that jcnventura cross-posted.

#14

jcnventura: The patch you posted deals with HTTP headers, not meta tags in HTML. Therefore looks unrelated.

#15

Status:closed (works as designed)» needs work

Yes, I cross-posted..

And indeed, you're right, that's an HTTP header, so I very much doubt that he'll be able to generate PDFs as it is. I will investigate the issue further with the user on the other queue..

João

#16

Status:needs work» closed (works as designed)

Yes, I cross-posted..

And indeed, you're right, that's an HTTP header, so I very much doubt that he'll be able to generate PDFs as it is. I will investigate the issue further with the user on the other queue..

João

#17

Hi, I also came across this one.

For me how the fix is implemented is actually an issue because I would like to serve my pages with content type application/xhtml+xml and having two conflicting content types is not correct, not only inelegant, which I could well live with.

I can't even preg_match in my THEME_preprocess_page() hook because drupal_final_markup() seems to skip all the output mangling paths.

Could you please suggest a way to work around that? Otherwise I will have to hack the core (sounds like a mild menace, eh?) :)

Thanks for the attention.

With Kind Regards,
Antonio

#18

Status:closed (works as designed)» needs work

Sorry, to re-open, but I disagree on the status and I think it could be easily fixed. :)

Any reason for not fixing drupal_get_html_head() ?

This was introduced by SA-CORE-2009-005 on purpose, to prevent non updated themes to cause issues. The duplicate of the meta content-type header is harmless.

s/harmless/hack-ish, I would say, and it could be easily fixed with the patch provided on top of the issue. It is completely unnecessary that drupal_get_html_head() prepends the Content-type meta while this job is now done by drupal_final_markup().

As a follow up to #17, I would say drupal_final_markup() would have to check if a Content-type meta exists in the header and ensure it is placed on top of it. So maybe drupal_get_html_head() does not need to be fixed, but add a bit more logic to drupal_final_markup(). And maybe even it wouldn't have to be restricted to certain values of $hook ('page' and 'book_export_html'). Think about alternative templates to 'page' that are used to render iframes, etc (several contrib modules do). So I think the issue status would have to be switched to "needs work" and let's try to elaborate on this please.

The security issue was fixed, so now that we have (relative) peace, I think we can try to think about a better approach.

#19

@ao2: the reason we implemented it in a way that themes cannot override is exactly to ensure that it is always present first.

@markus_petrux: the reason we left the meta in drupal_get_html_head() is that we thought we cannot be 100% sure that the regular expression matches (however simple that regular expression is), and we still need to have that meta information on the page. If you can find a way which totally ensures that all themes are fixed regardless of their state, then that would be great.

#20

Since Drupal 6 has no API for using a different document type (as requested by #17), one possible approach would be to apply the patch in #0. That would mean Drupal 6 is still forcing the eixstence of the content-type meta, but injected on a different place. ie. what was doing drupal_get_html_head() to the header output, it now would be done from drupal_final_markup() only, but not on both places, which is the cause of duplicated meta tags for content-type.

Also, at the bottom of theme(), where drupal_final_markup() is invoked, I would remove the condition that triggers the call to drupal_final_markup(), so a mini-patch for discussion would look like this.

Changes to theme.inc:

<?php
-  if ($hook == 'page' || $hook == 'book_export_html') {
-   
$output = drupal_final_markup($output);
-  }
$output = drupal_final_markup($output);
?>

Changes to common.inc:

<?php
function drupal_get_html_head() {
$output = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n";
-  return
$output . drupal_set_html_head();
+  return
drupal_set_html_head();
}

function
drupal_final_markup($content) {
  
// Make sure that the charset is always specified as the first element of the
   // head region to prevent encoding-based attacks.
-  return preg_replace('/<head[^>]*>/i', "\$0\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />", $content, 1);
+  if (
strpos($content, '<head') !== FALSE) {
+    return
preg_replace('/<head[^>]*>/i', "\$0\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />", $content, 1);
+  }
}
?>

And if you wanted to provide support for different mime types (rather than forcing 'text/html'), then Drupal could provide a $conf variable or something else than can be defined by the theme. But this part would really be a "feature request". Hence, probably something to take into account for D7, maybe.

#21

@markus_petrux: it is totally possible that some theme function is used to generate arbitrary XML or other markup. That could easily contain markup which matches <head[^>]*> or have <head> with a different meaning then HTML. Also, as said, we cannot ensure that <head[^>]*> is actually a character sequence present in all themes, so we cannot say we ensured that the meta tag appears, if we only rely on that being matched. Plus the regular expression is purposefully case insensitive. You now put a case sensitive strpos ahead of it, so themes using <HEAD[^>]*> would not work. We went through discussing these options in the security team, and were not at all happy to come out with the solution as we did, but we had the duty to try and ensure security with all custom themes, whoever and however did them.

#22

hmm... I see.

Maybe drupal_get_html_head() could add a placeholder instead of the content-type meta tag, and that could be used in drupal_final_markup() to catch up it's a page template that was fine with that content-type meta tag, so it can be now safely added by drupal_final_markup().

Se the mini-patch for discussion would look like:

Changes to theme.inc:

<?php
-  if ($hook == 'page' || $hook == 'book_export_html') {
-   
$output = drupal_final_markup($output);
-  }
$output = drupal_final_markup($output);
?>

Changes to common.inc:

<?php
function drupal_get_html_head() {
$output = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n";
-  return
$output . drupal_set_html_head();
+  return
'<!--head-->'. drupal_set_html_head();
}

function
drupal_final_markup($content) {
  
// Make sure that the charset is always specified as the first element of the
   // head region to prevent encoding-based attacks.
-  return preg_replace('/<head[^>]*>/i', "\$0\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />", $content, 1);
+  if (
strpos($content, '<!--head-->') !== FALSE) {
+    return
preg_replace('/<head[^>]*>/i', "\$0\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />", $content, 1);
+  }
}
?>

Now, you can be sure strpos() will match.

#23

#22 follow up:

The string '<!--head-->' could be replaced by something a bit more esoteric to ensure it's unique enough.

...or if we don't find a way to remove the check in theme(), then just apply the patch in #0, so we don't output more than one content-type meta tag?

#24

@Gábor: I understand why you did this way, I just didn't like the hardcoded content-type. But IIUC this is a non issue for D6 since the content-type is hardcoded in other places as well, and this discussion is Off Topic here anyway. So I'll have to hack the core for the time being. I just wanted to be sure :)

Opening a feature request for D7 (#475242: Support Content-Type different from text/html).

Sorry for the noise.

Regards,
Antonio

#25

I have solved with this code in page.tpl.php file

<?php /*print $head;*/
$headlines = explode("\n", $head);
foreach (
$headlines as $line) {
if (
$line == '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />') {
 
$line = "" . "\n";
}
print
$line ."\n";
}
?>

Is there a way to do it with a function in template.php file?

#26

@samuele, something like:

<?php
function THEME_preprocess_page(&$vars) {
 
$vars['head'] = preg_replace('/<meta http-equiv=\"Content-Type\"[^>]*>/', '', $vars['head']);
}
?>

#27

Thank you for the quick fix @samuele and @ao2!

This needs to be fixed in a proper way, because the HTML 5 validator throws an error with this. In this particular case the validator might be wrong, because it’s not a real error to have two identical meta tags, but still ...

#28

Having to echo the above, as I found the additional meta tag was added when trying to validate HTML5. It would also be nice to be able to provide our own input with this, to meet differing standards.

Note, you can fix one of the meta tags using template.php, but not the other. The one that comes before Title. The only option seems to be to crack the core code, which I won't just to validate.

#29

To paraphrase this issue using an old commercial as a base:

Themer: Hey! You got some of your model in my view!
Developer: Hey! You got some of your view in my model!
Both: *munch* Whoa! This tastes like utter crap!

The argument about allowing the extra output in html_get_head is a moot point. If you NEED to place this call into drupal_final_markup (because you can't trust html_get_head to protect you from the vulnerability) then if the theme developer has done something screwy enough to mess with your regular expression, you still haven't guaranteed that the design has the content-type as the first line of the head. This will still render sites utilizing that theme vulnerable to attack. Thus, you probably should not be claiming that you have 'solved' this vulnerability issue.

While I agree that security is critical, I'm conflicted because I also believe this violates MVC. Clearly, if the theme is where the issue lies, why should it be the model's job to fix it? Granted, there are multiple examples at how Drupal already breaks the principles of MVC, but that's another argument. At most, Drupal should put out a warning for site admins to check that their themes are outputting a content-type and place the responsibility for this back where it belongs: with the theme designers. If a site admin is doing their job, they should be looking for news updates from Drupal anyways and if they decide they can't take 30 seconds to view the source of their page and verify that they see a content-type output, they deserve to be hacked!

However, I should state that I'm very against 'Big Brother' tactics like this, the seat belt laws, and pretty much anything ever made "mandatory" for my personal safety. So take my opinion with a grain of salt. ;)

#30

Status:needs work» closed (works as designed)

@jgadrow: this is only true in Drupal 5 and 6, where is was not pratical to update all the themes distributed on drupal.org *and* all the custom themes deployed in the several hundred thousands of websites around the world. This ugly regexp is not in Drupal 7.

And by the way: (1) Drupal is not MVC (I'm still not sure what MVC means in the context of a web application anyway), (2) drupal_final_markup() is part of the theming layer, so it could be argued that it is part of the View (at least in the definition of MVC you seems to have).

#31

Status:closed (works as designed)» needs work

Wonderful that it's not in Drupal 7. So, it's only in the currently supported, stable systems. Which is, what? MOST Drupal sites then?

And I noticed that you completely skipped the major statement that I made: the fact that this vulnerability STILL exists because you can't be 100% positive that theme designers have:

1) Placed the $head variable at the beginning of their tag (or placed it at all for that matter)
2) Not done something stupid to mess up the regular expression in drupal_final_markup

So, you're forcing themers to remove an extra declaration of the content-type (or just look inelegant and inept) on the basis that this allows you to 'solve' a security vulnerability. And it STILL does not solve the vulnerability 100% of the time.

Once again, why not just leave the declaration out of html_get_head and put out a security warning advising administrators to check the output of their sites? Site admins can then alter their themes or not (at their leisure) to fix the vulnerability. You're not forcing themers to overcome a kludgy 'fix' to a problem that may not be affecting their theme anyways. Since themes are separate from the Drupal core, future updates will not invalidate a change made by a site admin meaning they make the change only once. And if a new version of their theme comes out and they overwrite their current theme. They should either already know to check the output for the content-type, or, the theme is being actively developed and the theme developer should have already altered their theme to remove the vulnerability. Note: If they haven't that's a good indicator to go with a different theme developer!

Additionally, there's the added performance hit (albeit a small one) of constantly telling PHP to remove that extra declaration that the theme I'm using didn't need in the first place. Or the extra bandwidth being consumed by the unnecessary characters if it's left in. Some people ARE charged for bandwidth, so, technically, there IS a cost (and an actual, physical cost at that!) associated with leaving the extra declaration over-and-above future semantic compatibility. So, the statement that "declaring it twice is harmless" is also misleading.

And, as for a definition of MVC:
Model-View-Controller - An architecture for programs that have graphical user interfaces. The architecture separates those parts of the program that handle the application data (the model) from those that present that data to users (e.g. graphically, on the screen) (the views) and from those parts that respond to changes made to the data (e.g. by the user manipulating the data presented on the screen) (the controllers). The architecture defines the ways these parts of the program communicate. The parts of the architecture have high cohesion and they are connected in such a way as to give low coupling.

That definition is just as suitable for web applications as it is for 'standard' applications. And I don't think you're allowed to state that Drupal is "not MVC" and then make a claim that it still follows "MVC." It's fine for a program to NOT be MVC. I had already stated in my previous post that Drupal is not truly MVC (try making your theme output an XML document utilizing XSL(T) instead of (X)HTML and serve it correctly as text/xml without hacking core... I dare ya! lol) but has tried to, largely, not interfere in the presentational layer. However, to claim that it's not MVC and then state that it could be argued that it conforms to it is, at best, contradictory.

Also, I would like to state that I'm not 'upset' or 'angry.' I know Drupal has limitations (just like any other system does) but I do enjoy in participating in intellectual debate. So, please, I'm not intentionallly trying to step on any toes here, only give a new opinion. My native language just happens to be sarcasm. :)

#32

Status:needs work» closed (works as designed)

@jgadrow: according to your definition, the whole theming layer (and more) is part of the "View", that includes drupal_final_markup(), making your point in #29 moot.

#33

I have already yielded on the grounds of MVC (see #31) as it's not terribly important. However, if you insist on making this 'by design' is there a known issue section for Drupal? I believe you should add one in that case:

Known issue:
If you are running Drupal 5 or Drupal 6 and the theme that you're using has not output the $head variable, has not supplied its own content-type declaration, and has irregular code that causes the regular expression in drupal_final_markup to fail, you will be vulnerable to an encoding type attack. Additional errors may crop up in some instances if your theme is not removing an extra content-type declaration supplied by Drupal in an attempt to safeguard MOST systems from this vulnerability.

#34

Whether it validates or not, having duplicate headers is a sloppy fix to anything.

There is no way to avoid putting the Content-Type meta before the TITLE tag when dynamically generating titles with user content.

If no content type is set, IE tries to interpret this:

+ADw-script+AD4-alert(1)+ADw-/script+AD4-

as UTF-7 which is this:
<script>alert(1)</script>

For a full explanation please refer to: http://code.google.com/p/doctype/wiki/ArticleUtf7

So that is not an issue.

The issue is:
- drupal_get_html_head() [/includes/common.inc] tacks it on to the top of the headers.
- then, drupal_final_markup($content) [/includes/common.inc] adds one immediately after the HEAD tag (if there is a HEAD tag)

The two functions above are called in the given order. The result... is 2 identical headers.

So
- Drupal always adds it to the top of Headers
- Drupal adds it after HEAD only if there is a HEAD tag

My solution:

If there is a HEAD tag
- If there is a CONTENT-TYPE tag, remove it
- Insert a CONTENT-TYPE tag immediately after the HEAD tag

I changed the drupal_final_markup($content) function by adding two lines at the top as follows:

/**
* Make any final alterations to the rendered xhtml.
*/
function drupal_final_markup($content) {
  // If a HEAD tag exists and there is already a CONTENT-TYPE header, remove it
  $content = preg_replace('/(<head.*?)<meta http-equiv="Content-Type"[^>]*>[\\r\\n \\t]*/si','$1',$content,1);
  // Make sure that the charset is always specified as the first element of the
  // head region to prevent UTF-7 encoding-based attacks.
  return preg_replace('/<head[^>]*>/i', "\$0\n  <meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />", $content, 1);
}

Which also removes the trailing space to keep the code good looking.

- Kent

#35

subscribe

#36

subscribe

#37

+

#38

subscribe

security issue is okay, but for themers with standards in mind there must be a simple way to influence this. Framework!

#39

I don't have an elegant solution for this, so take this with a grain of salt. This fix just 'feels' wrong to me.
1. It is fixing a security issue in IE, not drupal.
2. It is forcing output that may not be desired.

I don't know a whole lot about charsets, but I did notice this in the link from #34: "Declaring an incorrect charset can be worse than not setting one at all."

To be honest, I never would have noticed if I wasn't working with HTML5. I generally skip over this part of the header when I view source.

Like I said, I don't have a good solution for this, my current one is to disable the action in common.inc and code my themes correctly. It's a pain, but it works.

#40

Subscribe

#41

subscribing...

#42

subscribe

#43

Version:6.11» 6.14

Subscribing
I added http://drupal.org/node/451304#comment-1711102 to my page.tpl.php and it worked.
But I also want the title tag to be the first and not the content line, is this possible using this core-patch. Will that ever be committed?
Hopefully no security issues doing this?

greetings, Martijn

#44

After reading this thread, the two meta tags don't seem to actually be "by design," but rather just one way to accomplish the security goal here. Seems like it should be possible to find a solution that accomplishes that goal and does not involve duplicate content types. Would be nice if someone in the "by design" camp would comment on the general idea of removing existing content types before inserting the new tag in drupal_final_markup().

#45

amazing silly bugs drupal has...
oh well...

#46

The duplication of the meta content-type header is harmless.

..unless you care about the HTML output of your page. Come on, outputting it twice is a hack and you know it - why not check if it's already been set and then decide whether or not to output it?

#47

Agree, also out of SEO-purposes it is better not to show it twice!
Greetings, Martijn

#48

..unless you care about the HTML output of your page. Come on, outputting it twice is a hack and you know it - why not check if it's already been set and then decide whether or not to output it?

We output valid HTML (nothing in the (X)HTML specification says you should only have one of those meta-tag.

But yes, it is a hack. It was made to make everyone safer and it will be gone in Drupal 7. Overall, not a big deal at all.

Feel free to come up with an efficient code to "check if it's already been set and then decide whether or not to output it". I would be happy to review this patch.

Agree, also out of SEO-purposes it is better not to show it twice!

You should fire your SEO consultant.

#49

We output valid HTML (nothing in the (X)HTML specification says you should only have one of those meta-tag.

Valid HTML is not necessarily good HTML! Layouts built with tables validate, that doesn't mean that it's good HTML... The HTML validator service is a tool, not a stamp of approval.

#50

it is totally possible that some theme function is used to generate arbitrary XML or other markup. That could easily contain markup which matches ]*> or have with a different meaning then HTML

This is exactly how I came across this. I was generating a NewsML feed, which defines a <HeadLine> tag. Mysteriously, a meta tag was being injected into the content. Ugh.

#51

Version:6.14» 6.15

subscribe

#52

subscribe

#53

subscribe

#54

Clearly there are some valid reasons for putting in this code, but it is (as everyone seems to agree) an inelegant hack. The people complaining here seem to be those who actually value their html validation (I include myself amongst them).

A major issue seems to be the lack of an option to disable the behaviour, and the way that this update has silently broken validation on, ironically, any site that is concerned enough about security to apply that upgrade. This is a slight betrayal of trust which is why it is evoking a strong response. I am not criticising the security team - they do an amazing job and I would rather they do things like this than not.

I have two ideas that may be worth discussing.

  • We use the current solution, but provide a theme setting to disable the behaviour. By default it is set to add the element
  • Rather than checking final markup, we use the regex on the active/enabled themes to check for the $head variable, and modify the output based upon its absence. At least the themes that are doing the right thing would not be affected

And on the issue of xhtml output, there are valid reasons to use HMTL 4.01. It would be great to have a theme level setting that triggered the empty tag syntax. I know, scratch my own itch ;)

#55

Sorry - mispost

#56

Did you manage to come up with a workaround? I've got a similar issue when generating xml, which has a tag defined in it. Cheers

#57

Subscribing!

#58

Version:6.15» 6.x-dev
Status:closed (works as designed)» needs work

Sub. Changing to CNW based on #48:

Feel free to come up with an efficient code to "check if it's already been set and then decide whether or not to output it". I would be happy to review this patch.

Please don't change back to 'by design' prematurely. Yes, it was originally done by design, but it's a hack and it's a pretty big DrupalWTF for D6 at the moment. Let's see if somebody wants to try and tackle this the right way.

#59

subscribed

#60

Just a small tweak to #25 to still give out the character encoding for html 5

<?php //print $head;
 
$headlines = explode("\n", $head);
  foreach (
$headlines as $line) {
   if (
$line == '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />') {
   
$line = '<meta charset="utf-8" />' . "\n";
   }
   print
$line ."\n";
  }
?>

You still need to edit drupal_final_markup($content) [/includes/common.inc] like this:

function drupal_final_markup($content) {
  // Make sure that the charset is always specified as the first element of the
  // head region to prevent encoding-based attacks.
  //return preg_replace('/<head[^>]*>/i', "\$0\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />", $content, 1);
  return $content;
}

-- Alex Sartogo
Hot Sauce Studios

#61

subscribe

#62

It's possible to solve the problem on theming level (without patching the drupal core)
Something like next line in your page.tpl.php (in used theme folder)

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="<?php print $language->language ?>" lang="<?php print $language->language ?>" dir="<?php print $language->dir ?>">
<head>
<?php $head = str_replace('<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />','',$head);
print
$head; ?>

<title><?php print $head_title; ?></title>

#63

Re: #62, first of all, something like that should go in template.php, and also, won't your example remove all instances of it?

#64

@denis_sle, please see #26, and yes, with this _workaround_ you still need to emit your own content-type in page.tpl.php

Bye,
Antonio

#65

In an effort to move a fix forward, I'm going to try to summarize the constraints such a fix would need to work under. Please correct any mistakes here:

  • There are two content-type meta tags being inserted, which makes the HTML bloated.
  • The first insertion is in drupal_get_html_head
  • The second insertion is in drupal_final_markup
  • The drupal_final_markup implementation uses a regular expression to check for <head>, which is not 100% reliable, as it's possible to create an invalid theme without a <head> to match this expression.
  • The drupal_get_html_head implementation will always output the meta tag at the top of $head, but $head is not necessarily at the top of <head>, so this is also not 100% reliable.
  • Because neither addition is 100% reliable, simply removing either would make the security solution less reliable, as the two meta tags couldn't cover potential gaps left by the other.
  • It's important that the content-type meta tag be inserted as reliably as possible, as the lack of this tag is a known security problem.
  • A solution that removed one of the insertions only when certain the other has already been inserted would address the duplicate tag problem without introducing any additional security vulnerability.

Is that all correct?

#66

subscribing

#67

Subscribe.

#68

subscribe

#69

still an issue...

#70

...issue... and annoying thing...

I have a question for security team - what happens if we add this declaration to php header - as far as I know it takes precedence over the html declaration...

<?php
header
('Content-Type: text/html; charset=utf-8');
?>

...is UTF-7 injection (in ie) is the case then...?

#71

subscribing

#72

subscribing

#73

subscribing

#74

subscribing

#75

Status:needs work» needs review

This is a simple change to drupal_final_markup() that only adds the content-type meta tag at the top of head if we can't verify that it's already at the top of head. If we can verify it's already there, there's no benefit to adding it again, so we just return $content as-is.

AttachmentSizeStatusTest resultOperations
duplicate.patch960 bytesIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch duplicate_0.patch.View details | Re-test

#76

The patch in #75 is working correctly. Thank you.

#77

The patch from #75 fixes this issue.
I am using:
Drupal 6.17.

#78

Re: #75, seems like it might be a bit cleaner to only have 1 return statement, like so:

<?php
if (!preg_match('/<head[^>]*>\s*<meta http-equiv="Content-Type" content="text\/html; charset=utf-8" \/>/i', $content)) {
 
$content = preg_replace('/<head[^>]*>/i', "\$0\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />", $content, 1);
}
return
$content;
?>

No?

#79

@mcrittenden, I don't think that's really better, but it's pretty subjective and not a significant gain in speed or readability on either side (with the exact same functional result). If you think it's better, please post a patch so this can maybe get resolved before D8 is released.

#80

This implements #78. Not sure what the Drupal coding style preference is on one or multiple return statements, so this may or may not be necessary, but it just looks better to me.

AttachmentSizeStatusTest resultOperations
one_return_statement.patch964 bytesIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch one_return_statement.patch.View details | Re-test

#81

Status:needs review» reviewed & tested by the community

Looks good to me. Since this is functionally identical to the previous patch, which had 2 positive reviews, I'm going to go ahead and move the status to RTBC.

#82

Because the security team introduced this code, I've sent a request to them to help double check.

#83

Looks, reasonable - though this effectively is a minor performance hit for all sites without an updated theme, right?

#84

Similar to #26, this snippet will strip out only the duplicate metatag in template.php, without a core hack/patch.

<?php
THEME_preprocess_page
(&$vars){
 
// Strip duplicate head charset metatag, use limit param to strip 1 instance only.
 
$vars['head'] = preg_replace('/<meta http-equiv=\"Content-Type\"[^>]*>/', '', $vars['head'], 1);
}
?>

Note: The assumes the tag is always output twice, which, as it's output in core is reasonable. But if a core patch is applied, this snippet will need to be removed.

*EDIT*

Here's a version that checks the string exists twice before stripping one:

<?php
THEME_preprocess_page
(&$vars){
// Strip duplicate head charset metatag
 
$matches = array();
 
preg_match_all('/(<meta http-equiv=\"Content-Type\"[^>]*>)/', $vars['head'], $matches);
  if(
count($matches) >= 2){
   
$vars['head'] = preg_replace('/<meta http-equiv=\"Content-Type\"[^>]*>/', '', $vars['head'], 1); // strip 1 only
 
}
}
?>

DT

#85

Yes. It's a good idea. It can even be implement in a module:

<?php
clean_duplicate_meta_tag_preprocess_page
(&$vars){
 
// Strip duplicate head charset <meta> tag.
 
$matches = array();
  if (
preg_match_all('/(<meta http-equiv=\"Content-Type\"[^>]*>)/im', $vars['head'], $matches) > 1) {
   
$vars['head'] = preg_replace('/<meta http-equiv=\"Content-Type\"[^>]*>/', '', $vars['head'], 1); // strip 1 only
 
}
}
?>

#86

Has anyone considered proposing a theme hook that allows the theme to certify that it correctly handles the content type? I mean, after all, the heavy-handed insertion of the content type meta tag was in response to lots of themes that exist that don't correctly set the content type. I would think that, with that in mind, it would make sense to give theme developers the option of turning off the insertion behavior simply by going the "Drupal Way" and implementing the appropriate hook.

I propose something like THEME_content_type_is_provided().

#87

My website will display just the source code one minute or the webpage the next. My web host says it is due of the duplicate content header tags. Which I really had three because I added one to my page.tpl.php file.

#88

@Kutakizukari: did the patch solve your issue (I'm suspicious that it did not)?

@all: not sure its a good idea to preg_match case insensitively through the whole generated page content. That can easily get darn slow. Did you measure this?

#89

@Gábor Hojtsy, this should actually be faster than the current code in the vast majority of cases, specifically cases where themes are doing things right, the cases we want to optimize for.

The existing code also does a case-insensitive search, but with preg_replace rather than preg_match. Because preg_match stops after the first match, and in the vast majority of cases a match will be found very early in the document, preg_match will almost never search the whole document. preg_replace, on the other hand, defaults to replacing all matches, meaning it currently does look through the whole document in every case.

I do think the preg_match could be case-sensitive, though, since it would still be case-insensitive on the replace. That would mean maintaining duplicate tags on invalid (uppercase) XHTML in themes, which I think is probably fine with people who don't care about valid markup anyway.

We could also apply a $limit of 1 to the preg_replace while we're at it, and speed it up even for the cases of bad themes. I'll make another patch with all of this soon if no one else gets to it first.

[edited to fix a typo]

#90

Status:reviewed & tested by the community» needs review

This patch is essentially #80, with case-sensitive rather than case-insensitive preg_match. When I went to add a limit of 1 to preg_replace, I discovered it was already there, so some of what I wrote in #89 is wrong. Still, this should be faster than the current code in most cases now that it's case sensitive.

AttachmentSizeStatusTest resultOperations
duplicate.patch992 bytesIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch duplicate_1.patch.View details | Re-test

#91

Just a couple of minor improvements/suggestions over #90:

1) if the condition of the IF statement is reversed, then a) the patch is simpler, and b) the result of the preg_replace does not need to be assigned to $content.
2) preg_match could use a delimiter that does not exist in the regular expression, making it a bit easier to read.

The patch would look like this:

   // Make sure that the charset is always specified as the first element of the
   // head region to prevent encoding-based attacks.
+  if (preg_match('`<head[^>]*>\s*<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />`', $content)) {
+    return $content;
+  }
   return preg_replace('/<head[^>]*>/i', "\$0\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />", $content, 1);

#92

@markus_petrux, that's essentially undoing the change in #78, which is fine with me, but is starting to feel a bit like bikeshedding. Could you post an actual patch for review to keep this moving?

#93

Subscribing

#94

Ant help on the error below regarding printing would be appreciated.

www.hellaswebnews.com

Fatal error: Call to undefined function drupal_final_markup() in /home/.porche/myftp/hellaswebnews.com/sites/all/modules/print/print.pages.inc on line 32
array(4) { ["type"]=> int(1) ["message"]=> string(49) "Call to undefined function drupal_final_markup()" ["file"]=> string(80) "/home/.porchemyftp/hellaswebnews.com/sites/all/modules/print/print.pages.inc" ["line"]=> int(32) } n/a

#95

@hellaswebnews, that looks like an unrelated issue that should have a separate thread.

#96

Subscribing

#97

Subscribing

#98

Subscribing :)

#99

Version:6.x-dev» 7.x-dev

Seeing what happens...

#100

Version:7.x-dev» 6.x-dev

Only effects 6.x.

#101

Only affects 6.x.

#102

subscribe

#103

Could this be wrapped in a variable_get() check to allow some sites to disable it when needed? e.g.:

<?php
function drupal_final_markup($content) {
 
// Make sure that the charset is always specified as the first element of the
  // head region to prevent encoding-based attacks.
 
if (variable_get('duplicate_meta_content_type', TRUE)) {
    if (!
preg_match('/<head[^>]*>\s*<meta http-equiv="Content-Type" content="text\/html; charset=utf-8" \/>/', $content)) {
     
$content = preg_replace('/<head[^>]*>/i', "\$0\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />", $content, 1);
    }
  }
  return
$content;
}
?>

#104

@DamienMcKenna, why would some sites need to disable it? Just curious as to what I'm missing.

#105

@DamienMcKenna if you're going to create a variable (and I also don't see the point of that, as no one wants duplicate meta tags), you need to also provide an interface for updating that variable.

As this is currently in "needs review" status and we're well over a year in now, can we maybe focus on reviewing the current patch before we introduce more new ideas? If there's a better solution, let's state the benefits clearly and change the status to "needs work" while we work on that better solution. Or better yet, submit a better patch. But if there's just a different solution, that's not really helping move this forward.

#106

This might also be affecting how favicons are displayed. The $head prints out a "shortcut icon" that does not change when I set a custom icon via the Theme settings. It always prints out "site/misc/favicon.ico". I could work around that by manually overwriting the favicon.ico file there, but I'd prefer not to.

#107

Do not highjack this case, please.

#108

@Gábor Hojtsy the host that I was using to get Drupal running I had to use Apache 1.2, PHP 5 Flex, CGI to get the file temporary directory right. I followed this https://members.nearlyfreespeech.net/wiki/Applications/Drupal.

File system settings and permissions

• All the Drupal files should belong to the web group (25000).
o All files you uploaded to the server will have your user ID for Owner and Group (A number like 165576, or "me"). The "web" group's number is 25000. You can change your files' group to web by typing "chgrp web filename" or "chgrp 25000 filename".
• sites/default/files or sites/default/yoursite.com/files should have 775 permissions, and again, belong to the web group.
• For the temporary directory, PHP's safe mode will keep it from writing to the system's /tmp directory. You should set this to your /home/tmp directory. To get the full path to this directory, visit your admin/reports/status/php page on your Drupal site and scroll down to the PHP Variables section and find the _SERVER["TMPDIR"] variable, which should look something like /f1/content/site-name/tmp/. Use that value for your Drupal site's temporary directory.
• Alternative: If the above doesn't work you might need to try creating a subdirectory of /home/tmp and chgrp it to web. Then in Drupal, go to Administer > Site Configuration > File System and update the temp directory path to use the subdirectory you just created. (Applicable to 6.16 and possibly other versions.)

Was able to get it to work with no problems running Apache 2.2, PHP 5.2 Fast, CGI.

#109

Status:needs review» needs work

The last submitted patch, duplicate.patch, failed testing.

#110

What about delete some code from line 124 in includes/common.inc?

From:
$output = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n";

To this:
$output = "";

Simple... but it is safe?

_______________
Sorry for my English

#111

Small modification to #84:

<?php
function THEME_preprocess_page(&$vars){
// Strip duplicate head charset metatag
 
$matches = array();
 
preg_match_all('/(<meta http-equiv=\"Content-Type\"[^>]*>)/', $vars['head'], $matches);
  if(
count($matches) >= 2){
   
$vars['head'] = preg_replace('/<meta http-equiv=\"Content-Type\"[^>]*>/', '', $vars['head'], 1); // strip 1 only
   
$vars['head'] = preg_replace('/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/', '', $vars['head']);
  }
}
?>

Now there is no blank line.

#112

try this. this will do the magic.
function drupal_get_html_head() {
//$output = "\n";
return $output . drupal_set_html_head();
}

#113

subscribe

#114

#111 works. Thanks for simple workaround!

#115

#111 works. Thanks for simple workaround! +1

#116

#111 worked to remove the second occurrence for me as well- which is great!

Is it possible for the first occurrence to output in the html5 meta tag format for charset?

<meta charset="utf-8" />

#117

@snowfox, that format is only a new option in HTML5. The longer version Drupal uses is still valid in HTML5.

#118

@117 Correct. They would both still be valid in HTML5 (but what isn't?). But, I would prefer to use the correct(er) more valid(er) flavor of meta tag declaration.

#119

#111 also worked for me. Is this the best solution?

Also, just for the sake of the conversation on this thread. does this issue has been solved in D7?

#120

@snowfox

re: <meta charset="utf-8" />

Would love to know how to do this as well.

#121

Status:needs work» needs review

#75: duplicate.patch queued for re-testing.

#122

If you have more than 2 lines displayed (which should not append) :

<?php
function THEME_preprocess_page(&$vars){
// Strip duplicate head charset metatag
 
$matches = array();
 
preg_match_all('/(<meta http-equiv=\"Content-Type\"[^>]*>)/', $vars['head'], $matches);
 
$nb_matches = count($matches);
  if(
$nb_matches >= 2){
   
$vars['head'] = preg_replace('/<meta http-equiv=\"Content-Type\"[^>]*>/', '', $vars['head'], $nb_matches - 1); // strip all but one
   
$vars['head'] = preg_replace('/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/', '', $vars['head']);
  }
}
?>

this is a small change from #111

#123

#111 it works for me thank you!
subscribed

nobody click here