Closed (won't fix)
Project:
Drupal.org site moderators
Component:
Content moderation
Priority:
Normal
Category:
Task
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
3 Oct 2011 at 14:51 UTC
Updated:
17 Jan 2012 at 06:22 UTC
Since the img source filter was deployed, a lot of images are broken.
Attached are two text files with lists of nodes and comments that probably need to be updated.
I just ran this analysis:
select count(1), type from node n inner join node_revisions nr on n.vid = nr.vid where body like '%src="http://%' group by type;
+----------+-----------------+
| count(1) | type |
+----------+-----------------+
| 452 | book |
| 1251 | forum |
| 3 | page |
| 1663 | project_issue |
| 190 | project_project |
| 6 | project_release |
| 6 | story |
+----------+-----------------+
There are ~4,257 comments that need to be updated, based on a similar query. (Note: the query has false positives such as http://drupal.org/node/5890#comment-8871, which has an iframe src on http, but it's inside a code tag and so is not actually a problem.)
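One way to cut down on false positives of this kind is to drop anything inside code tags before looking for http sources. This is only a minimal Python sketch, assuming the body text has already been exported from the database. The function name `has_http_src_outside_code` is hypothetical, and the regular expressions are a rough heuristic rather than a real HTML parser:

```python
import re

# Matches <code>...</code> blocks, case-insensitively and across newlines.
CODE_BLOCK = re.compile(r"<code\b[^>]*>.*?</code>", re.IGNORECASE | re.DOTALL)
# The same substring that the LIKE pattern in the query looks for.
HTTP_SRC = re.compile(r'src="http://', re.IGNORECASE)


def has_http_src_outside_code(body: str) -> bool:
    """Return True if body contains src="http:// outside any <code> block."""
    stripped = CODE_BLOCK.sub("", body)
    return HTTP_SRC.search(stripped) is not None
```

Bodies for which this returns False could probably be dropped from the lists before anyone reviews them by hand.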
I suggest anyone actually doing this work download the files and turn the cid and nid values into links with something like awk in their terminal:
awk '{print "http://drupal.org/comment/edit/" $2}' cids_to_update.txt
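For anyone who would rather not decompress the attachments first, roughly the same thing can be sketched in Python. The assumptions here are that both files are whitespace-separated with the ID in the second field (as the awk example implies for the cid file; the nid file layout is a guess) and that nodes use the usual node/NID/edit path. The names `edit_links` and `links_from_bz2` are made up for illustration:

```python
import bz2

COMMENT_EDIT = "http://drupal.org/comment/edit/{}"  # same path as the awk example
NODE_EDIT = "http://drupal.org/node/{}/edit"        # assumed: standard node edit path


def edit_links(lines, template, field=1):
    """Yield an edit URL per line; field is 0-based, so 1 corresponds to awk's $2."""
    for line in lines:
        parts = line.split()
        # Unlike the awk one-liner, lines with too few fields are skipped.
        if len(parts) > field:
            yield template.format(parts[field])


def links_from_bz2(path, template, field=1):
    """Read a .bz2 attachment as text and return its edit links as a list."""
    with bz2.open(path, "rt") as handle:
        return list(edit_links(handle, template, field))
```

Something like `links_from_bz2("cids_to_update.txt.bz2", COMMENT_EDIT)` should then work on the attachment as downloaded, provided the layout assumption holds.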
| Comment | File | Size | Author |
|---|---|---|---|
| | cids_to_update.txt.bz2 | 26.15 KB | greggles |
| | nids_to_update.txt.bz2 | 76.28 KB | greggles |
Comments
Comment #1
dww: I thought none of those would be broken until their input format was changed from Documentation to Filtered HTML. See #1275424: Deal with documentation role and documentation input format for more...
Comment #2
greggles: http://drupal.org/node/7774#comment-12824
The scenario is comments/nodes set to Filtered HTML where the img tag was previously being stripped out and is now being replaced with the red X.
Comment #3
dww: Gotcha. Although your LIKE is going to have a lot of false positives from people who just cut and pasted the file attachment URL (which gives you absolute URLs to http://drupal.org/files/...) into an img tag. So I think the problem is less severe than it seems from the stats in the summary.
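A rough way to estimate how many hits are of this kind is to pull out the img sources and set aside the ones hosted on drupal.org itself. The Python sketch below assumes that images served from drupal.org are not affected by the filter, which this thread does not confirm. The `LOCAL_HOSTS` set and the function name `offsite_img_sources` are hypothetical:

```python
import re
from urllib.parse import urlparse

# Captures the src value of each img tag (double-quoted attributes only).
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc="([^"]+)"', re.IGNORECASE)
# Assumption: images on these hosts are left alone by the filter.
LOCAL_HOSTS = {"drupal.org", "www.drupal.org"}


def offsite_img_sources(body):
    """Return img src URLs in body that are absolute and not on LOCAL_HOSTS."""
    offsite = []
    for src in IMG_SRC.findall(body):
        host = urlparse(src).hostname  # None for relative URLs
        if host is not None and host not in LOCAL_HOSTS:
            offsite.append(src)
    return offsite
```

Nodes and comments for which this returns an empty list would be candidates for removal from the lists, if the assumption about local images turns out to be right.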
Comment #4
jhodgdon: Does the filter allow those kinds of URLs (#3)?
Comment #5
arianek: sub
Comment #6
vegantriathlete: So I started to take a look at this and see that we will need a much better way of coordinating. Here are the results for the first 15 items. My comments are at the end of the line using the ->
The point is that I went 0 / 15. How do we coordinate so that somebody else doesn't do the exact same thing? How can we let others know which items have already been addressed?
Maybe as we work on the items, we could edit the original issue, remove the old version and attach a new version of the file [with the completed items removed]. Edit: NO GO, with the idea of changing the attached files for the original issue. We'd have to attach the new version to our reply.
I suppose it would also be helpful if we agree on some type of format for including notes about things like nids 223, 3331 and 3351 which need further follow up by someone with the appropriate permission. I would also recommend keeping this a simple text file instead of compressing it with any utility to avoid the potential that somebody wouldn't be able to open it [I understand the desire to save space].
Comment #7
greggles: I thought of querying the cache_filter table to find problems but that doesn't work b/c drupal.org uses memcache.
Thanks for your work, vegantriathlete.
Comment #8
vegantriathlete: @greggles: I'm happy to go through more. I just want to make sure that we don't duplicate effort. Any thoughts on implementing a protocol?
Comment #9
jhodgdon: To avoid duplication of effort, can I suggest that anyone who wants to take on a chunk of the file just say something like "I'm taking nodes 123 - 4567" in a comment? That is probably good enough.
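If it helps to carve the list up in advance, the chunks and the matching claim lines could be generated with something like the Python sketch below. The function name `claim_messages` and the chunk size are made up for illustration, and the nids are assumed to be already loaded as integers:

```python
def claim_messages(nids, chunk_size):
    """Split sorted, de-duplicated node IDs into chunks and format one claim line per chunk."""
    ordered = sorted(set(nids))
    messages = []
    for start in range(0, len(ordered), chunk_size):
        chunk = ordered[start:start + chunk_size]
        # Each claim names the first and last nid in the chunk.
        messages.append(f"I'm taking nodes {chunk[0]} - {chunk[-1]}")
    return messages
```

Each returned line can be pasted into a comment as-is when someone picks up that chunk.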
Comment #10
vegantriathlete: Sounds good! Leave it to a doc person to come up with a simple solution. We code monkeys have a way of over-complicating things: if it doesn't involve programming, then it isn't any fun ;-)
Would it work well if we leave another comment with a status update when we've finished? Probably it would be best just to comment on the "exceptions" rather than on things we were able to resolve.
Will you take a look at what I've done above and give your thoughts about how I would have reported back on those 15?
Then I'll do another go 'round with nodes 10235 - 14886.
Comment #11
jhodgdon: I put myself in the category of "code monkey" by the way. But I also am a project manager, freelance site builder, doc writer, and any number of other categories. And I'm fond of non-tech simple solutions where possible, and have managed a number of "divide and conquer meta issues". :)
Comment #12
jhodgdon: Bump. Do we need to turn this into a meta-issue and organize a sprint on it? Are we agreed that this has to be a manual fix-it process, or is there something we can automate?
Comment #13
jhodgdon: Looking at this again... it's been a while.
So it looks like we are just talking about pages with Filtered HTML format that contain IMG tags.
Previously the images were completely filtered out. Now they are showing up as big red X's.
I guess I am not sure why this is a huge problem. If people see the page with the problem, they should have an Edit button and can fix it. Can't we just let that happen organically? We don't have plans to do any wholesale "change everything that was docs format to filtered HTML" or anything...
I'm inclined to say this is "won't fix".
Comment #14
webchick: Created #1335904: Proxy external images, which could help solve this issue.
Comment #15
killes@www.drop.org: Yeah, I am very much with jhodgdon. People should file a webmaster issue if they can't fix it.