Setting "Preview comment" to "Required" does not strictly require that the comment be previewed first. This is being abused by spammers to quickly and efficiently post spam comments.
I discovered this after I added a new feature to my new spam module to auto-blacklist spammer IP addresses, allowing me to block comment spammers when they preview a comment and thus preventing them from ever inserting their spam into my database. I configured my comment module to "require" comment previews, and yet found that the comments were slipping past my filter. I finally realized what the spammer is doing is setting $_POST['op'] to 'Post comment', effectively bypassing the preview phase.
I'm currently looking for a clean solution to this. At the moment the only idea I have is to generate a token at the preview phase, and validate the token at the post phase. Unfortunately the token would have to be stored in the databse between the preview and the post, which adds overhead.
Alternatively, I've considered using a time-based hash which would constantly update depending on the time of day. This could easily be validated without storing anything in the database. If too long has gone between the preview and the post, an additional preview step would be required... The down side here is that the time-based hash would be publically available, and thus the spammer could easily duplicate it in their script. A private key could solve for that, but increases the complexity as it adds a configuration step.
I have the feeling I'm missing a simpler, cleaner solution. Suggestions?
| Comment | File | Size | Author |
|---|---|---|---|
| #26 | common.inc_12.patch | 905 bytes | jeremy |
| #22 | contact.module_8.patch | 1.44 KB | jeremy |
| #21 | form_validate-2.patch | 1.1 KB | jeremy |
| #20 | form_validation.patch | 2.72 KB | jeremy |
| #15 | comment.module_16.patch | 1.11 KB | jeremy |
Comments
Comment #1
moshe weitzman commentedeven if you get this fixed, won't these bots just add a preview step?
this 'preview required' feature is designed to maintain high quality submissions by forcing users to proof read. it isn;t designed for security.
i think you want to hook into comment_validate(). just add a hook here - there is already a hook_comment() waiting for you to add an operation.
Comment #2
eaton commentedI posted a patch a few days ago (http://drupal.org/node/28255) that adds validation and form construction hooks for comments. It's similar to the one that the captcha module uses, though it adds comment form_pre and form_post hooks instead of a single comment form hook.
Comment #3
jeremy commented> even if you get this fixed, won't these bots just add a preview step?
Eventually, yes, but it drastically changes their ability to fling spam at a site. As is, they simply have a script that shoots data out at high speed without having to wait for messages to return from the server. It is the server that is doing all the work, thus making it simple for a spammer to DoS a site.
If "preview required" really meant "preview required", they would be forced to first automate clicking "preview", and then wait for a response before clicking "submit". This requires more resources on their side, and allows us to add delays after clicking "preview" (if we detect that they are a spammer) further using their resources.
> this 'preview required' feature is designed to maintain high quality
> submissions by forcing users to proof read. it isn;t designed for security.
Regardless of the intention, I was misled to believe that configuring my site to require previews would require that all comments were first previewed. As a site administrator, I would prefer to know that "required" really means "required".
> i think you want to hook into comment_validate(). just add a hook here -
> there is already a hook_comment() waiting for you to add an operation.
Yes and no. Ultimately yes that will work and will allow my spam module to prevent the spam from ever being posted. But it still leaves the greatest burden on the web server, instead of on the spammer. The spammer can still use a very simple script that only pushes data, and thus can generate spam at an unbelievable rapid rate.
Here is an example patch to enforce "preview required". It's one idea, I'm sure there are better ones.
Comment #4
jeremy commentedHere's a second version of the patch that doesn't require any manual configuration.
Comment #5
jose reyero commentedI like this idea, and the patch looks good
Still, I think it misses something, like some timestamp related hash, because once you get the hash code you can post multiple comments with that.
Another problem I can think of is, what happens when a cron run happens between the preview and the post?? I'm afraid comments would get lost
For this second problem, I think a key generated only once after module activation could do. About the first one....mmm... I'll sit down for a while and think.....
Comment #6
jeremy commented> I think it misses something, like some timestamp related hash, because
> once you get the hash code you can post multiple comments with that.
Using a timestamp will mean that the comment form "expires". That is, if you wait too long to preview your comment, it will generate an error when you try to post.
Yes, technically a spammer could post one real comment, then based on what was in the session from that they could post the same identical comment over and over, so long as it was attached to the same node. But this is not what they do, they try and spread their spam throughout your webpage. Furthermore, the spam module is perfectly capable of detecting and preventing this.
> Another problem I can think of is, what happens when a cron run happens
> between the preview and the post?? I'm afraid comments would get lost
The key is only generated once, that's what the first test is about. In any case, in the unlikely event that the key were to change between preview and post they would simply have to post a second time.
My earlier patch wasn't quite right, I was testing the token in the wrong place. This patch fixes that.
BTW: This is beneficial for maintaining high quality submissions too, as prior to this change someone could:
1) enter a comment
2) press preview
3) completely change their comment (introducing a mistake)
4) press submit and the comment (mistake and all) would go into the database unpreviewed
After this change:
1) enter a comment
2) press preview
3) completely change their comment (introducing a mistake)
4) press submit and they get an error because they didn't preview their changes - forcing them to preview once after any change
Comment #7
jeremy commentedFWIW: I've been getting slammed by spam attacks this whole week. Installing this patch has made a huge difference. Well over 100 spam attempts per minute (sometimes two and three times that) and I hardly notice the spammer, whereas before it was choking my database.
(Granted, the spammer has not yet upgraded his script to first preview, then submit. But even if he did it wouldn't help him as testing has verified that the new spam module would prevent the comments from ever getting to the database.)
Additionally, user and anonymous (nonspam) comments continue to show up at a normal rate.
Comment #8
jeremy commentedI would love to see _any_ discussion on this. Drupal is currently too easy to spam, with little effort on the spammer's side, and lots of resources wasted on the Drupal side. A patch like this will greatly increase the spammer's burden, and make it possible to effectively block even the most aggressive spammer attacks.
Comment #9
jose reyero commentedWell, this patch is definitely better than what we have, and would save some spam for sure.
But maybe keeping track, at the session level, of generated hashes for a user, and then removing them when the comment is sent, could do the work.
This way we can forget about previewing comments or not, and also the "permission" to post the comment would expire when the session expires. Any randomly generated value could do for this, no need for complex hashes, but having nid and pid in the hash would add some extra security.
Comment #10
breyten commentedJeremy, a big +1 on the idea, but why not generate the private key when it is actually needed (Ie, when displaying the comment form), instead of wasting a _cron() hook on it?
Comment #11
jeremy commented> Well, this patch is definitely better than what we have, and would save some spam for sure.
It is continuing to work very well on my site, which seems to be under nearly perpetual spam attacks from multiple sources.
> But maybe keeping track, at the session level, of generated hashes for a user, and then
> removing them when the comment is sent, could do the work.
The catch is: the key has to be something unique to the server, not guessable or learnable from the outside Simply storing the hash data in the session alone is not enough, as then the spammer could create any random data and store it in the session.
That said, the hash could be generated off something other than the text of the comment as it is now, so that a preview is not required. I'll look at doing something like that and submit another patch.
> This way we can forget about previewing comments or not, and also the "permission" to
> post the comment would expire when the session expires. Any randomly generated value
> could do for this, no need for complex hashes, but having nid and pid in the hash would
> add some extra security.
nid and pid alone are worthless, as they are easy to learn. The pid can always be 0 (spam is rarely attached to a pre-existing comment). The nid is obtained in the path of where the spam is being posted.
The solution is a "private-key", which is what my patch adds. Then sure, hash the private key plus the nid and the pid, and you've got enough protection to prevent most spammers. To make it even more secure, automatic rey-keying could be easily accomplished.
Comment #12
jeremy commentedThe attached patch:
1) gets rid of the _cron() hook
2) no longer requires that comments be previewed
Prior to this patch, comment spammers were able to send data to a Drupal server acting as though they'd filled out a comment form and pressed submit. As they didn't actually use the form, they could submit spam comments at an obscene rate.
With this patch, comment spammers will have to actually load the form, enter text, and press submit. Yes, that can still be automated, but it takes much more work and slows them down, as they have to wait for the entry form to load each time.
Unfortunately a spammer could manually submit one comment, then re-use that same session info over and over to attach repeated spam comments to the same node. Such an attempt would be detected and blocked by the spam module if enabled, but again such a session re-use attack could be done without loading the form each time. Fortunately there is much less gain for a spammer to submit 100 spam comments on the same page, versus submitting 100 spam comments each on a different page as they do now.
Ideas to improve upon this concept include:
- re-key every day or week, changing the private key regularly to be sure it couldn't ever be permanently cracked
- add a key table, and generate a unique key for every comment form. essentially, upon comment form creation generate a random key which is stored both in a database table and in the session. when a comment is submited, look for the key from the session in the database table, if a match is found delete it from the database table and post the comment. this would prevent session re-use, but adds overhead. i don't know if it's worth it, perhaps as an external module if the hooks were available.
Comment #13
drummform_set_error()'s fist argument should be the name of a form field, not 'error.' Using (..., 'error') would be better in this case.
And the actual message needs work. Since this is a hidden field I don't think it has anything to do with cookies.
Comment #14
drummThe unclosed link in my last update was supposed to say drupal_set_message(..., 'error')
Comment #15
jeremy commenteddrupal_set_message(..., 'error') isn't sufficient to prevent the comment from being posted. I have instead updated the patch to set the error on the hidden 'token' form field.
I have updated the message to read:
"Unable to validate your comment, please try again. If this error persists, please contact the site administrator."
If you don't like the error message, better suggestions are welcome.
Comment #16
jeremy commentedAny feedback on this patch? I have been running it on my website for a couple of weeks, and it has completely stopped the most persistent auto-spam scripts that had been posting poker type comments constantly to my site.
Comment #17
Zed Pobre commentedThis patch is against HEAD? It doesn't want to apply to my 2.6.3 comment.module.
Comment #18
Abalieno commentedIt's is for cvs but I'm trying to manually apply it to 4.6.3.
Will comment later to tell how it went.
Comment #19
Abalieno commentedWell, it worked.
No spam at all in more than a day. I don't know if other users are having problem but this patch broke the tool the spammer was using.
:)
Comment #20
jeremy commentedHere's a completely rewritten version of the patch, based on an email discussion with Dries. This provides a more generic interface that can be used to validate other form submissions, not just comments. The patch introduces two new functions, form_token() and form_validate(). The first function uses a private key and a public key to set a token in a hidden field. The second function validates the token. The patch also updates the comment module, demonstrating how these new functions are used.
More information as to how the patch works can be found in the comments that are within the code.
Based on my own experiences on kerneltrap.org, this patch blocks 99% of the current comment spammers.
Comment #21
jeremy commentedThis optional patch is intended to be applied after my form_validation.patch attached to #20 above. It makes the drupal_private_key more secure by regenerating it every 24 hours. Without this patch, it would be possible for a spammer to use brute-force to learn a site's private key by reading the code and observing the token generated for a given form. With this patch, brute force is still possible, but with this patch it would have to be done every 24 hours.
It is possible for a form to be generated prior to a rekey, and then to be validated after a rekey. In this event, the key will fail validation and the user will see a message telling them something like "Validation error, please try again. If this error persists, please contact the site administrator." When they try pressing "submit" again it will work fine.
I kept this patch separate as it adds complexity that may not be desired in core. Personally I think it is important and should be merged along with the first patch. I added it to the system module as it seemed the most logical place, and is a "required" module that can not be disabled.
Comment #22
jeremy commentedThis third patch updates the contact.module to use the new form token validation functions.
Note that this is not yet a perfect solution. Someone wanting to spam the contact form could manually fill it out once to obtain a valid token. They could then use that token to spam the contact form repeatedly, so long as they don't change the fields that were used to generate the token.
The solution is to keep a history of recently validated tokens. Each time a token is validated, make sure it was not already recently validated -- if it was, don't allow it a second time. I would be happy to provide patches for this if it is deemed necessary, and these first patches are merged. It would require a new database table, and a db_query to validate the token. Ideas for alternative solutions are welcome.
Comment #23
dries commentedFor the contact module, maybe it is better to use the recipient's e-mail address to calculate the token. Like that, each personal contact form would have a unique token, making it slightly more difficult to reuse tokens.
Another solution might be IP- or session-based tokens. Do spammers post from a single IP, or do they come from different IPs (e.g. using hacked computers). If you use the poster's IP to calculate the token, tokens become more dynamic.
Comment #24
dries commentedI committed the current implementation of this patch, though I'd like us to think some more about how we can make reusing tokens more difficult without introducing complex logic. For sake of simplicity, I didn't commit the cron-based private key regenerator.
Comment #25
jeremy commentedI have written a function called hash_is_duplicate() that uses a serialized array stored in the variable table to validate that the token is not duplicated for the n last tokens. (n is a constant, defined as 256 in my patch). It is relatively simple, and avoids the need to introduce a new database table. However, left to do is to teach it the difference between a preview and a submit (ie, multiple previews are okay, but only one submit is okay), I'll try and post a finished patch soon.
An alternative idea is to simply remember the last token used (rather than remembering n tokens). At this time, spammers simply submit the same thing over and over as fast as they can. But they're not stupid, and they'd quickly learn to switch back and forth between two different spams.
> For the contact module, maybe it is better to use the recipient's e-mail
> address to calculate the token. Like that, each personal contact form
> would have a unique token, making it slightly more difficult to reuse tokens.
That is what it currently does, on one of the forms. On the other, the information is dynamic so not a good token. The only way we can use dynamic data as tokens is to first force previews.
> Another solution might be IP- or session-based tokens. Do spammers post
> from a single IP, or do they come from different IPs (e.g. using hacked
> computers). If you use the poster's IP to calculate the token, tokens become
> more dynamic.
Very good idea, and a simple change. From what I've observed, spammers use a wide range of IPs.
> For sake of simplicity, I didn't commit the cron-based private key regenerator.
Please reconsider. There are dozens of md5 brute force crackers freely available. They would have to be customized to easily crack our tokens, but as all data but the private key is freely available the customization is simple. Thus, a smart spammer could brute force crack a private key with relative ease and a little patience. An alternative approach would be to double the length of the private key, which would at least make this harder. Thoughts?
Comment #26
jeremy commentedI have been unable to get my is_hash_duplicate() function working properly. I taught it the difference between "preview" and "submit", but it still gets confused if you try and go back and edit content, or if you try and post a second comment to the same place. I'm out of time for now.
Instead, here is a simple patch to use the remote IP address in the token hash, making it a little more difficult for spammers to re-use a hash.
Something else that could be used is the session ID -- but are there legitimate reason someone's session ID may change from page to page? ie, what if cookies aren't enabled, and the session ID isn't embedded in the URL?
Comment #27
dries commentedCommitted to HEAD. Thanks.
Comment #28
eldarin commentedA comment:
having mostly known ingredients in the MD5 hash sum does not make it any safer ...
No matter if it's the username, userid, ip-number or the name of the user's aunt; as long as they are known to the spamming user it does not make for good secret ingredients.
;-)
A better approach would be a more complex input by the site administrator. It would just be a string input and required to be perhaps 120 characters of gibberish using as many fist punching on the keyboard as they can muster.
PS! Also beware that on some platforms mt_rand() amounts to a whopping 32768 integers to choose from.
Excercise: with 100 spams per minute - how long would it take to guess the secret number, and secondly how many spams could the spammer manage to post within the 24h the "secret" was known ?
Spammers are not dumb, and certainly not the programmers they have paid working for them!
Comment #29
eldarin commentedAnother note:
just realize that having a secret that is fairly easy to guess - although it might be 32-bit "security" doesn't mean it's perfect. If you can process and/or heuristically build a good guess it will not be "good enough". If any of you remember the problems with NFS inodes back in the early and mid 1990s, where it was easy to guess inodes and pretty much possible to mount anything by guessing the inode-number. Suffice to say a lot of those "NFS-browsers" were use to root a lot of servers.
The lesson is to not rely on weak security. You don't have to make it overly complicated, just use better solutions. An integer - even a large one - makes it too easy to hack sometime.
Comment #30
jeremy commentedThese patches introduce an infrastructure that makes it more difficult to spam a Drupal site. With this infrastructure now in place, it is simple enough to improve upon it as necessary.
I am curious to see what spammers will do when Drupal 4.7 is a common installation. A theif in a large parking lot will walk past 50 locked cars to find the one that's unlocked, even though picking a lock is relatively easy... It will be interesting to see if they deem Drupal worth the cycles necessary to spam our sites, or if they move on to easier pickings.
In any case, contributions to make this logic more secure are always welcome. I'm less concerned about private keys at the moment, more concerned about token re-use.
Comment #31
eldarin commentedTotally agree with you on token reuse. Private keys are of course pretty good privacy ... ;-)
I think that spammers might choose various strategies; one of which might be a round-robin hammering or parallel hammering to guess the token. Of course such hammering could easily be caught in patterns.
Tokens with simplistic ingredients being reused are not good enough to deterr spammers, I think.
Comment #32
jeremy commented> one of which might be a round-robin hammering or parallel hammering to guess the token.
No, it would be much simpler to simply preview a comment and look at the token generated in the subsequent form. Now, build a custom application that takes the known elements of the key and then sequentually tries each possible unknown element -- brute force. There are plenty of tools out there for brute forcing md5 hashes. If you're willing to spend the CPU cycles. Once you have one token, all of the rest of the work can be done without the involvement of the website.
The deterrent is more that this would have to happen for each website you wanted to spam. And as soon as you find the key, the website owner simply has to delete his private key for it to be auto-regenerated.
As for token re-use, I'm still thinking about that one. Necessity is the mother of invention -- as soon as spammers learn to abuse recycled tokens, we'll certainly come up with a solution.
Comment #33
eldarin commentedJeremy, of course you are right. I was too narrowminded in my thinking about the spamming itself. Taking the MD5 token and finding the integer would probably take less than a minute on some computers. It's just counting from 0 to RAND_MAX - whether it's 32768 or 2^32 - and using the other wellknown ingredients ...
In my view the current solution therefore offers zero protection. Your solution with a truly secret ingredient is what works. Then the fact that you changed the secret every once in a while was even better.
You wouldn't need a true private key for this to work. Let's say you took my approach with a long string of gibberish as required input. Then you permutate this string with a cron-job, just like you regenerated a private key using cron. That would certainly be a good compromise, without using indiviudal tokens - which I think is the strongest alternative.
;-)
Comment #34
eldarin commentedAn extra note:
some say that a timestamp can increase security, but actually it just makes another easy guessing integer. Getting individual tokens with is what is best, but a good compromise is a site-specific tough secret ingredient.
;-)
Comment #35
eldarin commentedI meant "individual tokens with unknown ingredients is what is best". Anyone familiar to the classic knapsack problem might agree here.
Comment #36
eldarin commentedThe timestamp as an ingredient or as a timeout to individual tokens has it's purpose in limiting hammering attempts, though.
Comment #37
jeremy commentedThe current goal is to keep it very simple. The functionality will be enhanced if it proves necessary. If you're really concerned about this, I'd recommend opening a new issue, and attaching a patch to implement a more secure private key, or prevention of token reuse.
Comment #38
eldarin commentedI did open an issue, but marked is as "duplicate" because it obviously was. Well, the better approach is to get the best solution from compromises.
The discussion about using various ingredients just didn't add to getting a better solution. If all the ingredients are known, then it's easy to create the correct MD5 hash, even if the hash itself varies with each form getting POSTed.
There would be little needed change to your code, just use something else than a random integer for the secret key. If it could be a long string to be input by siteadmin, then a random permutation on this string using the cron-script would be a pretty tough nut to crack for anyone, save perhaps some elite chinese scientists with special interests (re: Bruce Schneier news lately)
;-)
Comment #39
(not verified) commentedComment #40
mattconnolly commentedHave a look at this patch for drupal 6.2.
http://drupal.org/node/275368
It allows you to make previewing Optional, Required, or Disabled for new NODES and COMMENTS.