Privacy/security of a website

By shane on 4 Nov 2003 at 00:40 UTC

Lately the *#@#$#@ SPAMmers have been harvesting email addresses off of my website. I know it's the only place they could have picked up the addresses, because I have posted them absolutely nowhere else. This is extremely frustrating, to say the least.

I was considering developing a module, or a filter if that is at all possible. What I'd like is to obfuscate ANY email addresses on the website somehow.

One potential method would be to replace the actual email address with a simple redirect link to a page on the site that generates an image containing a string. The user has to type that string in correctly, and only then can they view the email address. I know this creates a burden on the user, but it keeps bots from harvesting email addresses. This would be similar to the registration methods used by many of the larger webmail providers (eg Yahoo, MSN) or other sites (eg Network Solutions for the use of the 'whois' feature).

Another thought was extending that notion so that if a user is logged in, they don't have to go through those contortions. But anonymous users do.

And, extending that further, the site might actually simply refuse to give out any email addresses unless you have a valid user account. This obviously is good from the perspective that anonymous users can't harvest email accounts - period. However the downside is not being able to publish/post general informational email addresses for people to use (eg feedback, help requests for users that can't login, etc).

Extending even further, it'd be nice to allow registered members to send email to other members. Obviously one gets into a security concern here: a webpage would be revealing email addresses that some users may not want revealed. A possible solution is some sort of simple anonymizer feature: only allow the sender to send an email via a web page form, which would then submit an email to the recipient. The sender's address would then be revealed, and traditional email exchange could occur, if the recipient so chose.

Even better - as part of the user's profile information - they could check a box which defines whether anonymous users, registered users, or nobody could see their email address. If "anonymous" was selected, tie into the earlier thought above of using an image authentication before actually allowing anonymous viewers to view the email address.
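A minimal sketch of that profile setting, written in JavaScript for illustration only (a Drupal module would do this in PHP). The names `VISIBILITY`, `canSeeEmail`, `loggedIn` and `passedImageCheck` are hypothetical and not part of any existing module.

```javascript
// Hypothetical visibility levels a user can pick for their email address.
const VISIBILITY = { NOBODY: "nobody", REGISTERED: "registered", ANONYMOUS: "anonymous" };

// viewer: { loggedIn: boolean, passedImageCheck: boolean }
function canSeeEmail(visibility, viewer) {
  if (visibility === VISIBILITY.NOBODY) return false;
  // Logged-in users skip the image challenge entirely.
  if (viewer.loggedIn) return true;
  // Anonymous viewers need the owner's permission and a solved image challenge.
  return visibility === VISIBILITY.ANONYMOUS && viewer.passedImageCheck === true;
}
```

With this rule an anonymous visitor only ever sees an address when the owner opted in and the image check was passed; everything else falls through to "hidden".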

Are there any other folks out there with similar concerns? What are your thoughts on this issue? Has anyone attempted to do anything relating to this? Am I the only one concerned with protecting the privacy of my users' information and the privacy of my system email addresses?

Comments

replace addresses by images

scott_ commented 4 November 2003 at 08:54

Correct me if i'm wrong but as far as i remember, Drupal doesn't show users' email addresses anywhere. Of course harvesting is still a problem, so i'm working on a filter that replaces email addresses with images; that way email addresses are still visible to humans but not to spammers. Well, it does work but still needs a settings page :)

If you want users to be able to contact each other, it shouldn't be too hard to write a module which adds a field to the user settings page and creates a contact form (which should probably use throttle so it can't be used to fill someone's mailbox).
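To illustrate the "can't be used to fill someone's mailbox" part, here is a rough per-sender rate limit in JavaScript. It is only a sketch of the idea, not Drupal's throttle module; `createThrottle`, `limit` and `windowMs` are made-up names, and a real module would keep the counters in the database rather than in memory.

```javascript
// Hypothetical in-memory throttle: at most `limit` messages per sender per time window.
function createThrottle(limit, windowMs) {
  const sent = new Map(); // senderId -> timestamps (ms) of recent messages
  return function allow(senderId, now = Date.now()) {
    // Keep only the timestamps that still fall inside the window.
    const recent = (sent.get(senderId) || []).filter((t) => now - t < windowMs);
    if (recent.length >= limit) {
      sent.set(senderId, recent);
      return false; // over the limit: reject this submission
    }
    recent.push(now);
    sent.set(senderId, recent);
    return true;
  };
}
```

A contact form handler would call `allow(senderId)` before sending and show an error when it returns `false`.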

Email addresses

shane commented 4 November 2003 at 17:22

Scott - you're right, by default Drupal doesn't expose email addresses. But when building a true community website, one has the need to provide "contact" information for your group, to the general community at large that you are trying to serve. Often email is an important means of receiving SPAM ... errr ... I mean, communicating with them.

The idea of converting email addresses to images is intriguing, and one method I hadn't thought of.

I've modified the members.module to display users' email addresses on the members page - which is only available to valid logged in users.

re: Email addresses

cel4145 commented 5 November 2003 at 18:16

I've done the same to protect member email addresses. So I wonder if members.module could be modified to display email addresses on a separate PDF page (without email address links) using the pdf view module. Then the members module could be public. Would that do the trick? Or can spam bots convert PDFs to HTML, as Google often does?

PDF display of addresses

shane commented 5 November 2003 at 20:32

It's possible that the SPAM bot harvesters are only trolling HTML, but it's also fairly likely that a lot of them can convert PDF to HTML/Text, which is fairly easy with lots of available tools to do that. I certainly wouldn't want to rely on PDF format as a protection option.

I like scott_'s suggestion of replacing email addresses with an image file. This sounds a lot like a simple filter feature, although your filter would have to be pretty smart about matching an email address accurately...

Matching isn't the problem

killes@www.drop.org commented 6 November 2003 at 00:50

It isn't that difficult to match an email address. The problem with addresses as images is that you lock out people, in this case blind people (and people using text browsers for other reasons).

--
Drupal services
My Drupal services

alt text

scott_ commented 6 November 2003 at 09:07

you can still use alt text that is readable, something like "foo at bar.com". spammers usually won't go through much trouble to find email addresses. If it doesn't look like foo@bar.com, they will just move on to the next page.
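A rough sketch of such a filter, in JavaScript for illustration; it is not the code of hidemail.module. The regular expression only covers simple addresses, and `imageUrlFor` is a hypothetical callback that returns the URL of an already rendered image.

```javascript
// Matches simple addresses only; real-world address syntax is broader.
const EMAIL_RE = /\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)\b/g;

// `imageUrlFor` is a hypothetical callback returning the URL of a rendered image.
function replaceWithImages(text, imageUrlFor) {
  return text.replace(EMAIL_RE, (address, user, domain) => {
    const alt = `${user} at ${domain}`; // readable, but not in user@domain form
    return `<img src="${imageUrlFor(address)}" alt="${alt}">`;
  });
}
```

The image URL should not contain the address itself (an opaque ID or hash would do), otherwise the address is back in the page source.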

if anyone wants to test or improve it, the module can be found here: http://scott.studentenweb.org/drupal/modules/hidemail.module

Can't access your module

sskennel commented 22 November 2003 at 23:47

scott_,

I get an almost-empty Drupal page when I follow that URL.

-- Roger

Roger H. Goun
Brentwood, NH

fixed

scott_ commented 27 November 2003 at 09:45

I uploaded the module again.

Of course spammers could use OCR software, but for now i don't think they really do. Not that many websites try to obfuscate email addresses, which makes me think the cost of OCR is still far greater than the benefit to a spammer. Nobody ever said obfuscation is perfect, but that does not mean it shouldn't be used. Even when some spammers use OCR software, there will still be others who don't, just like there are still people harvesting email addresses from Usenet.

Anything Science Can Create, Science Can Duplicate

(not verified) commented 26 November 2003 at 14:10

Obfuscation is temporary.

Spammers used to only collect from the Usenet News feeds.
Then they started joining mailing lists.
Then they modified web crawlers to be email harvest bots.

It's trivial to recognize stuff in ALT tags.
It's trivial to look for "at" with "com" or "dot" nearby. (myname at myisp dot com)
It's trivial to look for HTML character escapes (& codes)

Obfuscation only slows down harvesters.
If harvesters are not looking for "at" yet, they soon will.
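To show how little work the "at"/"dot" spelling costs a harvester, here is a small JavaScript sketch. The function name `harvest` and the simplified address pattern are assumptions for illustration, not taken from any real tool.

```javascript
// Turns "myname at myisp dot com" style text back into addresses and collects them.
function harvest(text) {
  const normalized = text
    .replace(/\s+dot\s+/gi, ".")
    .replace(/\s+at\s+/gi, "@");
  // Ordinary prose such as "look at me" becomes "look@me", which the
  // pattern below rejects because it has no dotted domain part.
  return normalized.match(/[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/g) || [];
}
```

Two replacements and one pattern match are enough, which is the point being made here: this kind of obfuscation only slows harvesters down.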

Using things other than HTML is still only a delaying tactic. The situation is similar to cryptography -- we can protect our communication now with codes which are "good enough" because they require more computer power than most organizations have. In the future every desktop might have enough power to crack present codes, but we gamble that won't happen next month.

Doing OCR on images can already be done, and more will be done if all email addresses are hidden with images. Unless the images are one per letter and the name of the image file with an "S" is "S.png". :-)
Also, if one program to generate images is popular then a "dictionary attack" can be used -- create images with common names, then merely try to match the left side of images to those in the dictionary.

The first race in OCR will be in creating images which are hard to distinguish from other images, or spammers will only do OCR on images which only have two colors and a solid border of one of those colors. Then spammers will have to deal with increasing levels of obfuscation as image generators use more complex patterns.

The "twisted letter" images which are now being used are only a little more complex. Present OCR can't handle them well unless the twist is always the same. But there are methods which can be used, such as topological OCR: a closed loop with one line upward from the top half is "b", "d", or "6".

Using images is only a delaying tactic. If the reward is THOUGHT to be good enough, thieves will make an effort. If the world's largest diamond were in a cardboard box on an empty sidewalk, might it get stolen? In a mall jewelry store? In a police station? In a bank safety deposit box? Thefts take place from all those storage areas -- increased difficulty and reduced chance of escape merely decreases the number of theft attempts. (Safety deposit box? Look up last year's story about the tunnel into a vault...) Even then, theft can happen indirectly -- destroying where an item is stored can increase the value of similar items.

Also remember that it already has been observed that spammers have begun using hijacked Microsoft Windows machines. So some spammer might use thousands of PCs as compute power to decode obfuscated text.

Email is becoming unusable

centipod commented 13 January 2006 at 01:44

[quote]
However the downside is not being able to publish/post general informational email addresses for people to use (eg feedback, help requests for users that can't login, etc).
[/quote]

I haven't read to the end of the thread (and wow - I just noticed, after having hit the submit button, that this was 3 years old too - why does it look the same as a recent posting? Well, that's another subject though), but this has been mentioned twice so far. So, may I inject my two cents worth?

First, seems like you are describing a tech-support function. Forms work well for that.

But for anonymous contact....

I've learned over the years that, for anonymous public users contacting a group or organization, a contact page is vastly better than publishing an email address of any type. The contact page can send to an unpublished address which then won't get spammed (if it's never discovered, that is). And CAPTCHA technology (if I spelled that correctly) can also be used to defeat automated submissions. And when a private address is spammed - just move it! But if it's tied to a precious branded domain name - it's all over with - you will be dealing with spam filtering the rest of your life.

In short - in the era of spam - anonymous input to email could be dying sort of a slow death. Almost all the public figures I contact don't have well known public email addresses. They have web forms. And for good reason.

As for emailing other users - tools like PHPBB, for example, send intra-board email without violating the privacy of a user by giving out their address. I believe if you want somebody's address you should have to ask them for it. It's similar to having to ask for their unpublished cell phone number. But that's just my personal belief.

I wonder, does the PM module not also partially solve this intra-board communication problem?

One suggestion I have for the community is that Gmail now has free POP3 service. I am trying to get Drupal to fetch submissions to a Gmail address.

This has an interesting side effect. The server that Drupal is on won't even allow SMTP connections to it. Why should it when Gmail is out there for free? Meanwhile I am separating all incoming postfix traffic to another machine (separate from the web server). If a spammer does conquer the email server it will have zero effect on any websites. I'd rather build Drupal sites than spend my life building milters and greylists and bayesian gizmos. But that's just me.

Thank You.

I used to feel the same way.

MrEricSir commented 2 February 2006 at 03:01

I used to feel the same way. But then I realized something: when the average person is on the web, they're essentially anonymous.

I was regularly getting my comment box spammed by dumb kids. But by publishing my e-mail address on the web instead, I get a few hundred spambot generated messages a day.

That's why a comment box is WORSE.

See, when the kids spammed me, their spam was *impossible* for Thunderbird to filter. But the commercial spam filters (almost) perfectly. I think about one spam makes it through a week.

Also, the average person apparently does NOT know their own e-mail address. This was very apparent after I'd regularly get submissions with a from field that started with a "www." and ended with "@com".

Good points...BUT

centipod commented 17 March 2006 at 23:56

...I was really referring more to a contact page. For me this has evolved into more of a survey page, where users select a variety of options in addition to typing in their message. There is simply no reason for a dumb kid to bother with this, so I've never gotten even one spam. But then the purpose of the contact page is for customers to initiate a purchase contract. It doesn't attract kids.
If you're doing something that does attract kids then you've got other problems regardless of the technical aspects.

As for "comment" responses, I never enable blog comments from anonymous sources. And since my forums are all private member "by invitation only" there is no spam there either. If I was allowing anonymous web comments I guess I'd be expecting all sorts of nonsense.

I'm not so sure about the anonymity on the web. Your IP address is tracked by many web sites. If you truly want to be anonymous you probably need to don a disguise and use a fake ID to use a computer at a public library. :)

Senator Moynihan (from NY) said the Soviet Union, before it broke up, recorded all the phone calls in the US that went over microwave towers. They were planning on decoding these later as they sifted through the data. Don't worry...they were looking for industrial technology, not for our personal data.

But these days the NSA, on the other hand, records EVERYTHING. True anonymity is almost impossible.

Private Messages

matt westgate commented 4 November 2003 at 17:30

The private messages module will allow your users to contact each other without needing an email address.

private.messages module

shane commented 4 November 2003 at 17:43

Mathias - I'll check it out, thx. That may solve the issue of allowing members to email each other in a secure/private/safe way, but still need to tackle the greater issue of how to secure all email addresses displayed on the site. Especially if you have users post email addresses in publicly available nodes, etc... Thanks for the pointer.

Upon further reflection, privatemsg.module won't work

shane commented 4 November 2003 at 17:56

Unfortunately, privatemsg.module isn't quite what I'm looking for. I'd like to not be managing email-like services on the website. Most users simply don't log into their account on the site that often. I'm looking more for a module which will allow users to submit a message, which is gated to the member's email address. The privatemsg.module is close, but instead of creating a separate email system, I'd simply gate to the existing one.

I think I'll take a look at the privatemsg.module code and see if I can hack it up to just send messages via that module, instead of the full blown message handling capabilities it currently has.

Thank you for the pointers to the privatemsg.module, it is certainly a good starting point for me.

urlfilter I extended urlfi

killes@www.drop.org commented 4 November 2003 at 13:06

urlfilter

I extended urlfilter.module to obfuscate email addresses by changing some characters to html entities.

Urlfilter.module extension ...

shane commented 4 November 2003 at 17:24

Killes - any chance of getting a peek at what you did to the urlfilter.module? Or, is it on the Drupal site somewhere? I'd like to check it out. Thanks!

My approach is not very good

killes@www.drop.org commented 4 November 2003 at 20:19

My approach is not very good because /all/ @s and .s will end up encoded. I simply put this line

$text = strtr($text, array("@" => "& #64;", "." => "& #46;"));

at the end of the urlfilter function. You'll have to remove the spaces from the html entities.
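One way around the "/all/ @s and .s" problem is to match addresses first and only encode inside the matches. The sketch below shows the idea in JavaScript rather than PHP; it is not the urlfilter.module code, and `encodeAddressesOnly` and the simplified pattern are assumptions.

```javascript
// Simplified address pattern; real-world address syntax is broader.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/g;

// Encodes "@" and "." as numeric entities, but only inside matched addresses.
function encodeAddressesOnly(text) {
  return text.replace(EMAIL_RE, (address) =>
    address.replace(/@/g, "&#64;").replace(/\./g, "&#46;")
  );
}
```

Sentence punctuation and a lone "@" elsewhere in the text are left alone because they never fall inside a match.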

If you wish to put an & in a post...

Steven commented 4 November 2003 at 22:52

Use &amp;.

(The text above is: &amp;amp; ;).)

t()

(not verified) commented 4 November 2003 at 17:48

I suppose it would be possible to extend t() in common.inc, which parses all output from nodes etc., to identify e-mail addresses and encode them as unicode/base64/username dot domain dot tld or any other format; this can be achieved with a reasonably simple regexp match... it's not perfect, but it might do the trick.

some pointers

moshe weitzman commented 6 November 2003 at 12:21

The Deanspace team has an enhanced profile.module whose main benefit is letting users choose how much information to disclose (email address, City, etc.). You might want to clean it up and submit as a patch for 4.4

I like the idea of a filter which substitutes typed in email addresses with equivalent javascript in order to fool the bots.

[please don't post here how you hate javascript]

How long will Javascript solution last, though?

Steve Dondley commented 7 November 2003 at 13:12

If they aren't already, spammers can get a tool like this: http://www.mozilla.org/rhino/jsc.html to decode the email addresses.

I think the only way to avoid spammers is to translate addresses into .gifs. There must be some open source code out there somewhere that can do that. The barriers to spammers getting OCR software to decode the images would be much higher.

CAPTCHA

Steven commented 17 November 2003 at 03:05

OCR software is already being used to defeat (simple) CAPTCHAs, where a user must enter warped text from a picture to complete a registration process. Some spammers might be using stupid measures to get around anti-spam filters ('En.large your pe.nis') but don't confuse their lack of ethics with lack of intelligence.

PS: GIF images are evil, consider PNG instead.

I've never heard of a spammer

killes@www.drop.org commented 17 November 2003 at 10:49

I've never heard of a spammer circumventing images with OCR. It is not worth the overhead. In the time they need to OCR your email image, they can grab dozens of unprotected email addresses.
Also I think that spammers *do* have a lack of intelligence. I bet Kjartan gets a lot of spam on the Drupal-IDs such as mine.

Unless.

(not verified) commented 26 November 2003 at 00:15

If the whole site has protected email addresses they won't necessarily be grabbing dozens of unprotected email addresses, and it would be better to use the OCR.

Talk, Spammer!

(not verified) commented 26 November 2003 at 13:35

I've never heard of a spammer circumventing images with OCR.

Do you often have spammers tell you about their technology?

Other than telling you to buy their CDs full of email addresses and spam mailer programs.

I use this code for captcha

(not verified) commented 17 November 2003 at 13:46

I'd show you but you guys would max out my 64Kb link. Works a treat.
I am sure it works as spammers are lazy and my site is probably not worth the effort to OCR.
http://sourceforge.net/snippet/detail.php?type=snippet&id=100299
Sorry haven't put it in a module yet.

oops

(not verified) commented 17 November 2003 at 13:58

Also (Anon Coward Again) I might add that people really have a bit of fun with captcha.
I gave one fellow a catch-all email account, he started playing email-crazy, and then everyone else joined in. I just let them go, of course, and all sorts of (fake) email gets in there... People can usually tell the silly ones from the real ones, which means more wasted time for an email collector.

Web Poison

(not verified) commented 26 November 2003 at 13:21

Wasting the time of email collectors is a technology of its own. A popular version is to have a module or script which generates pages full of fake email and HTML links.

The idea is to guide an email harvesting bot into such pages, and the script then emits garbage for the bot's database. Usually the pages include at least one link back to the script...also under different URLs.

An ancient script to do this was "Web Poison". That has faded away. SpamX is a recent script.

A related method is used by people with spam filters: have some visible email addresses which are labeled so humans can see that they are not to be used. Harvest bots don't see the warning. All spam to those email addresses is known to be spam, so that email is fed to the spam filter to increase its recognition of known spam.

Fake addresses

killes@www.drop.org commented 26 November 2003 at 22:53

If you should want to use such a script you should make sure that:

1) You do not generate email addresses for domains that do not belong to you (even if hadhakdhakhdakdhakhda.com might not exist today, it might tomorrow).

2) Use a robots.txt file such that legitimate web crawlers do not index your fake addresses.

Base the revelation of email addresses on user action

(not verified) commented 26 November 2003 at 07:17

Even a page that relies on obfuscated characters or on javascript can be parsed and correctly interpreted by a harvester, like the following:
<a id="myemail" href="">Email me</a>
<script>
function makemailto() {
  // Keep the two halves apart so the full address never appears in the source.
  var emailprefix = 'me';
  var emailsuffix = 'domain.com';
  document.getElementById('myemail').href = 'mailto:' + emailprefix + '@' + emailsuffix;
}
</script>

However, it is very unlikely that the harvester is simulating mouseovers on all the links, so take that function above and make it activated by a mouseover. Now they need a javascript parser that simulates mouseovers... what was that 3-4 lines of javascript?
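The mouseover variant could look roughly like this. It is a sketch under assumptions: `armMailLink` is a made-up name, and `link` stands for any object with `addEventListener` and an `href` property, such as an `<a>` element in a browser.

```javascript
// Fills in the mailto: target only after a pointer actually hovers the link.
function armMailLink(link, user, domain) {
  link.addEventListener(
    "mouseover",
    () => {
      // Until this handler runs, the page contains no complete address.
      link.href = "mailto:" + user + "@" + domain;
    },
    { once: true }
  );
}
```

In a page this would be called as `armMailLink(document.getElementById("myemail"), "me", "domain.com")`. Keyboard users never trigger a mouseover, so a `focus` listener would be needed for them as well.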

Simple and elegant solutions are the answer because the cost of implementation and maintenance is low. (i.e. we will use them)

Hope that helps.

Greg.
www.greentreesoftware.ca

Images are the only way

jhart commented 8 November 2003 at 15:43

We have been discussing this issue extensively at my place of employment. As part of testing various ideas, I wrote a web page, in less than an hour, that scrapes email addresses. It easily defeats both entity encoding and "user at domain dot tld". We concluded that any text based pattern can be similarly defeated; images are the only effective option. We're working on a specification for how to do it and how to make it convenient for users. When it's done, I'll try to remember to post the concepts here.

Is this feasible?

normanice commented 10 November 2003 at 15:21

I had this idea. What do you guys think?

Create a module that takes arg(1), base64_decodes it, and does a header("Location: mailto:$data"), but only if the HTTP referrer is correct. That will open your email client with the proper email address. Then, create a module that scrambles all mailto: urls into a url("module/" . base64_encode(address)). This should fool most, if not all, available email harvesters.
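The two halves of that idea can be sketched as follows, in JavaScript (Node.js) rather than PHP. The path `/email_guardian/`, the function names and the whitespace check are assumptions added for illustration; this is not the email_guardian module itself.

```javascript
// Rewrites mailto: links to a hypothetical local path carrying the base64-encoded address.
function scrambleMailtoLinks(html, basePath = "/email_guardian/") {
  return html.replace(/href="mailto:([^"]+)"/g, (match, address) => {
    const encoded = encodeURIComponent(Buffer.from(address, "utf8").toString("base64"));
    return `href="${basePath}${encoded}"`;
  });
}

// Returns the redirect target, or null when the Referer header is not from this site.
function resolveRedirect(encoded, referer, siteOrigin) {
  if (typeof referer !== "string" || !referer.startsWith(siteOrigin + "/")) return null;
  const address = Buffer.from(decodeURIComponent(encoded), "base64").toString("utf8");
  // Refuse whitespace and control characters so the value is safe to put in a header.
  if (/[\s\x00-\x1f]/.test(address)) return null;
  return "mailto:" + address;
}
```

The Referer header is supplied by the client and is easy to fake, and base64 is an encoding rather than encryption, so this only stops harvesters that do not bother to decode or to send a matching header.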

Module created

normanice commented 10 November 2003 at 17:35

i've created a module(email_guardian) to do just this. i'm awaiting my cvs account so I can upload it.

Exposing emails to registered users only won't help

aldem commented 11 November 2003 at 01:04

Because it is simple to register a user and hand the account to a harvesting bot, which will then go through and collect whatever is visible.

The idea of protecting emails like this is obsolete anyway. You can't stop spammers from harvesting, because their technologies keep getting more and more clever, so the only way to fight it is to implement mail filters: when spam has no effect, there is no need to send anything (eventually).

It won't prevent spammers from sending something to you, but it will prevent spam from reaching you, at least - which is (for instance) my concern (I don't care if something is sent to me unless I see it).

Module Created

GwaiLo commented 11 November 2003 at 09:21

Hi all,
This is my first post. I have created a module to do what is required. It creates image files (PNG) and also obfuscates the HTML that creates said images. There are alt and title attributes for those who are using text browsers, and it seems to work quite well. All it requires to function is a string with an email address in it (the @ symbol is required), and it returns the img tags along with the images.
The module can currently be found at : http://phoenix.austarmetro.com.au/~xabbu/test.php?email=xyz.foo@bar.this...

(the email $_GET var is just to demonstrate the functionality)
The source for this is at : http://phoenix.austarmetro.com.au/~xabbu/test.phps

I hope someone can comment on this, I have joined the devel-mailing list, and will also be in #drupal on the Freenode IRC network.

"Nationalism is an infantile sickness. It is the measles of the human race." -- Albert Einstein

use the standard "open source" email format...

(not verified) commented 12 November 2003 at 06:00

Like the way I've written mine...
- Aalaap
aalaap at aalaap dot com.

by all means, if people are w

bertboerland commented 12 November 2003 at 08:59

by all means, if people are willing to write a module for this; the more the better [?].

But it seems to me there is already a module which does just this, converting text to pictures, and it's called the Smiley Module!

just replace all "@" signs and all TLDs with a picture of an "@" sign and of ".com", ".nl" etc.

this will of course also replace .communicate with [image of .com]unicate, but that is a side effect. It can be solved by adding a space to the text to be picturized, so ".com " will become a picture and .comxxx will not.

seems to me an easier way to solve this problem.

what should be done:
* make a small picture of an "@" sign
* make small pictures of all generic top-level domains (see iana)
* make small pictures of all country code (see country code list)
* add them to the smiley module

Note that having picture in your text might change the layout.

Worth checking out some IBM pages, which seem to be doing the same with the eserver

now the email address f.bar@example.com will be f.bar[picture @ sign]example[picture dot com with trailing space]

now all that has to be done is porting the smiley module to 4.3 :-)

--
groets

bertb

--
groets
bert boerland

Cleaner .com workaround.

(not verified) commented 26 November 2003 at 00:11

A cleaner workaround for the .com problem is matching ".com\b". Then it matches the bit 'between' the 'm' and the space but doesn't swallow the space.
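A quick JavaScript check of that word-boundary behaviour; `picturizeDotCom` and the placeholder image tag are made up for the example.

```javascript
// Replaces ".com" with an image tag only when it ends a word.
function picturizeDotCom(text, imgTag = '<img src="dotcom.png" alt=" dot com">') {
  // \b is zero-width, so the character after ".com" is left untouched.
  return text.replace(/\.com\b/g, imgTag);
}
```

Here ".com" followed by a space or punctuation is replaced, while ".communicate" is not, because there is no word boundary between "m" and "m".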

A good solution

(not verified) commented 14 November 2003 at 12:26

i've been experiencing the same problem on several websites of my own. For the last website i did, i brainstormed a little and found a solution that works for me.
I'm used to coding in PHP, but for the last 2 years i've been coding in C#.Net, and i wrote a function that encodes each character of a string as its Int32 char value.
Here's an example:
let's say my email is scoelho@somemail.com
function converts my email to :
& # 115;& # 99;& #111;& # 101;& # 108;& # 104;& # 111;& # 64;& # 115;& # 111;& # 109;& # 101;& # 109;& # 97;& # 105;& # 108;& # 46;& # 99;& # 111;& # 109;
(remove the spaces, as the post decodes the values...)
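The same encoding can be sketched in a few lines of JavaScript. This is an illustration, not the poster's C# function, and `toNumericEntities` is a made-up name.

```javascript
// Encodes every character of a string as a decimal HTML entity.
function toNumericEntities(text) {
  // Array.from iterates by code point, so characters outside the BMP stay intact.
  return Array.from(text, (ch) => `&#${ch.codePointAt(0)};`).join("");
}
```

For instance, "@" has code point 64 and "." has 46, which matches the 64 and 46 in the encoded address above.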

until now i haven't received any spam to that email...

there must be better ways, and i'm not sure if this really works, anyway, but until now, no spam ;)

i hope this helps someone out there!

SCoelho
scoelho_AT_flesk.com ;o)

Do you know where I find a li

killes@www.drop.org commented 14 November 2003 at 12:37

Do you know where I find a list or function of those values? I have been googling but had no luck or not the right keywords.

Try http://www.unicode.org/ ;)

Steven commented 17 November 2003 at 03:07

Just get the glyph's index using a PHP function. Though it will probably only work on ASCII given PHP's lack of unicode support.

This works as well

jmcclain19 commented 21 November 2003 at 05:07

This works as well

Lucky.

(not verified) commented 26 November 2003 at 00:16

You're pretty lucky. Any HTML parser will convert those back to their normal values.
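To make that concrete, reversing numeric entities takes only a couple of replacements. The sketch below is JavaScript; `decodeNumericEntities` is a hypothetical name, and a full HTML parser would also handle named entities.

```javascript
// Converts decimal (&#64;) and hexadecimal (&#x40;) entities back to characters.
function decodeNumericEntities(html) {
  return html
    .replace(/&#x([0-9a-f]+);/gi, (match, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#([0-9]+);/g, (match, dec) => String.fromCodePoint(parseInt(dec, 10)));
}
```

Any harvester that runs a step like this before looking for addresses sees the entity-encoded form as plain text.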

There two important things to

killes@www.drop.org commented 26 November 2003 at 00:33

There are two important things to know about address grabbers:

1.) People who use them are stupid.
2.) People who write them want to save resources, too.
So they won't do HTML rendering and they won't do OCR.

Error in Logic

(not verified) commented 26 November 2003 at 13:33

A. People who use them are thieves, and thieves can spend an extraordinary amount of effort in their attempts to not work for an honest day's wage. Lazy idiots. Being lazy is fine, as long as you apply it to creating ways of making your work get done with less effort (look up the inventor of the airplane autopilot).

B. Spammers used to merely grab addresses from Usenet News postings. Someone made the effort to harvest from Web pages. Eventually someone will do HTML rendering and OCR. People who write harvest bots expend as much effort as is needed to get an acceptable result. Or they grab code which already does what they want -- some college researcher will publish useful technology.

Obfuscation is not the total solution.
Look for methods which require human activity, or where the process interferes with spam business models.

For example, most spammers don't want to be found. Requiring incoming email to come from a valid email address will block the spam which is labeled with nonexistent sender addresses.

As of Spring 2008, they absolutely do HTML rendering

escoles commented 1 May 2008 at 13:42

As of Spring 2008, they absolutely do HTML rendering. And OCR.

You're right about the cost. What happened is that it became cheap to simulate rendering, and OSS OCR modules became readily available.

It's true that it's a running fight, but email is too useful to discard. It's worth the trouble.

are drupal security?

gzalex commented 1 December 2003 at 09:44

i'm using xoops

:jawdrop:

hihi i'm a newbie...

really

bertboerland commented 27 June 2005 at 18:19

hihi i'm a newbie...

Really? I mean: really? Could have fooled me there.

--
groets
bertb

Specification idea for securing email addresses

shane commented 5 February 2004 at 18:51

It's been a while since I revisited this thread, and I just reread all of it. After mulling over this problem for some time, the following occurred to me:

have a filter that scans all input for email addresses - it should be able to deal with addresses wrapped in HTML formatting (eg <a href="mailto ... ">) as well as bare text addresses with no formatting
the email address is inserted into a database table that the module uses
the entry in the database contains an ID number (and of course the email address) which is unique to the address, and is used as an index later on
check that the address doesn't get duplicated in the database (eg do a case insensitive comparison)
replace the email address with something else in the original node, one of the following:
- a generic image with the words "email me" (or similar)
- an image of an envelope
- an image generated with the text of the email address (as per conversations/code above) - I don't like that one, because it means OCR may be able to scrape the address - if they aren't already
Better yet - provide all of these as administrative options, along with the ability to specify the image to use for replacement, or to use the code that creates an image with the text of the email address
when someone clicks on the replaced image (eg the "email" text image, or the envelope, or whatever it is), it takes them to an email form page, similar to what the "feedback.module" has
this module would also check that the referer is a local page ONLY, so that anyone trying to hit the form from a remote source wouldn't be able to use it to spam via an HTML robot - the disadvantage is that someone who has the form page bookmarked for legitimate use would be turned away
the user inputs their subject, email address (twice, with the two entries compared, so an input typo doesn't cause a problem), and the body of the email message
possibly have a check box for a "copy me too" feature, so the end user gets a *separate* email copied to them - separate so as not to reveal the protected email address in the To: or CC: field of the copied email (make sure the From: gets set to a "do_not_reply@domain.com" address)
the "copy me too" feature should be a site admin controllable option - which may have a toggle to expose the address in the "copy me too" message to the end user (eg construct a single message with a To: and CC: that includes the end user and protected address in one email)
the module then pulls the email address out of the database, based on the ID number passed from the original node page and link to the form
it constructs an email using the mail() (or similar) function
the email is dispatched
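
The form-side checks in the list above (local referer only, email address typed twice) could look roughly like the following. This is only a sketch in JavaScript, not code from any module: the function name, the form field names and the siteOrigin parameter are all made up for illustration, and comparing the two email entries case-insensitively is an assumption.

```javascript
// Hypothetical sketch: validateSubmission, the field names (subject, email,
// emailConfirm, body) and siteOrigin are illustrative assumptions.
function validateSubmission(form, refererHeader, siteOrigin) {
  const errors = [];

  // Accept the submission only if the Referer header points at our own site.
  let refererOrigin = null;
  try {
    refererOrigin = new URL(refererHeader).origin;
  } catch (err) {
    // Missing or malformed Referer header: treated as non-local.
  }
  if (refererOrigin !== siteOrigin) {
    errors.push("referer is not a local page");
  }

  // The sender types their address twice; both entries must agree.
  const first = String(form.email || "").trim().toLowerCase();
  const second = String(form.emailConfirm || "").trim().toLowerCase();
  if (first === "" || first !== second) {
    errors.push("email addresses are empty or do not match");
  }

  if (String(form.subject || "").trim() === "") {
    errors.push("subject is required");
  }
  if (String(form.body || "").trim() === "") {
    errors.push("message body is required");
  }
  return errors; // an empty array means the submission passed every check
}
```

Keep in mind that the Referer header is supplied by the client, so this check only stops lazy robots. A script that sets its own headers will get past it.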

The benefit as I see it here, is anything exposed to HTML scrapers is a URL link like "/email_guardian/send/22" (where "22" is the index ID of the email address) and a form with a "submit" and an Index ID number. Nothing in the HTML is exposed. The disadvantage is you don't have the email address displayed on the web page - so a user that wants to send email *must* use your form to submit email - after that, _if_ a reply is sent back, they then get the email address from a standard/normal email conversation thread. Another disadvantage, I suppose - is if someone *really* wanted to spam that address, it might be pretty easy to write code to fill in the form and send email - but they wouldn't be able to harvest your address.
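
To make the idea concrete, here is a rough sketch of the cloaking step in JavaScript (the real module would be PHP inside Drupal). Everything named here is an assumption for illustration: the AddressTable class stands in for the database table, the regular expression is a simplified address matcher rather than a full RFC parser, and the "email me" link text stands in for whichever image the admin picks. Only the "/email_guardian/send/ID" URL shape comes from the description above.

```javascript
// Simplified matcher for bare addresses (an assumption, not a full RFC 5322 parser).
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/g;

// Hypothetical in-memory stand-in for the module's database table.
class AddressTable {
  constructor() {
    this.idsByAddress = new Map(); // lowercased address -> ID
    this.addressesById = new Map(); // ID -> address as first seen
    this.nextId = 1;
  }

  // Return the ID for an address, adding it if unseen.
  // The lookup key is lowercased, so duplicates are caught case-insensitively.
  idFor(address) {
    const key = address.toLowerCase();
    if (!this.idsByAddress.has(key)) {
      this.idsByAddress.set(key, this.nextId);
      this.addressesById.set(this.nextId, address);
      this.nextId += 1;
    }
    return this.idsByAddress.get(key);
  }

  // Used by the mail form handler to recover the real address.
  lookup(id) {
    return this.addressesById.get(id);
  }
}

// Replace every bare address in the text with a link that carries only the ID.
function cloakAddresses(text, table) {
  return text.replace(EMAIL_RE, (address) => {
    const id = table.idFor(address);
    return `<a href="/email_guardian/send/${id}">email me</a>`;
  });
}
```

With this, the only thing a scraper sees in the page is the numeric ID. The address itself never leaves the server until the form handler calls lookup().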

It seems to me all of the components pretty much exist to do this to date. The feedback.module has the basics already for submitting an email via a form. The above topics discuss and provide code to generate an image to obfuscate the email address (I prefer replacing it with something completely unlike an email address - eg an envelope image), the filtering capability on input to snarfle up the email address, and Drupal to tie it all together into a single module.

The only major issue here is usability as it pertains to handicapped users. Text only browsers *should* work, since they would be directed to a form to fill in - and I believe all major text based browsers (eg Lynx and Links) support forms. I'm not exactly sure how this affects handicapped users, since I don't deal with designing websites at that level.

Are there any thoughts, positive or (constructively) negative about this approach? Thank you.

Another Admin Option

shane commented 5 February 2004 at 19:02

...would be to have some feature to scan all existing nodes and do a filter/replace on the existing nodes. This way, existing install bases that add the module could take advantage of its features... Or better yet - a filter on output (does Drupal do output filtering??) that catches email addresses then executes the logic above.

Shane

Any feedback?

shane commented 6 February 2004 at 19:48

Does anyone have feedback on the specifications I brought up above? If not, I'll move forward with "my vision" - but I was hoping to get some feedback about these features so as to ensure the module is full featured and supports the needs of the community.

TIA!

New drupal guardian.module for cloaking email addresses

shane commented 4 March 2004 at 00:52

I've created a new module called "guardian.module" which implements the features I mention in this posting. The core functions of the module seem to work well for me. It is easily extendable to allow other admins to create new "cloaking" modes. I've provided 3 basic "cloaking" modes with this module. In addition I've provided some contributed code for email obfuscating which may be modified and included into the guardian.module - I was unable in a short span of time to get them working - but others may find them useful.

There are some features that need to be implemented before this module is 100% production worthy (eg tracking the referer URL and providing appropriate links back to the original page, etc...).

If you are interested in this module, please take a look at it. Provide comments, thoughts, feature requests, and patches to me directly. I'm waiting to find out how the heck to get my module included into the Drupal contrib section.

Module versions for 4.3.x version Drupal code and 4.4.0-rc code can be found at: http://www.tuna.org/guardian/

My contact details are in the module README file.

Guardian

prashant commented 27 April 2004 at 20:25

I just installed the Guardian module and setup the configuration properly.

However, it doesn't work. So I checked the install file for the module and it says that I need GD installed. How do I do that? Where would I get it from?

Does the guardian module filter previous posts on a Drupal Site?

GD Module and Filter Previous

shane commented 5 May 2004 at 17:58

Prashant - You can find info on the GD packages at http://www.boutell.com/gd/. How you upgrade/add to your system will depend on your OS, Webserver, etc... environment setup.

The guardian.module filters nodes on output - when they're displayed to a client. It doesn't actually change the content in the database, so any existing content will be protected by the module as well.

Good luck - let me know how it goes. I've got it running well on a couple of production websites - and so far, knock on wood - haven't gotten any SPAM through the posted email addresses at these sites.

Guardian.module in real life

shane commented 16 September 2004 at 16:49

If anyone cares or is interested ... I've been running my guardian.module on 4 websites for over 6 months now. So far I've had 100% success in protecting ALL email addresses on the websites from shit head web-scrapers. I know for certain that all 4 sites have been scraped several times from many very dubious locations.

There are some limitations in the guardian.module, but nothing that can't be overcome by a competent Regex rewrite and some minor code additions. Currently the module only "cloaks" straight email addresses (eg " bill@winbloze.com "). It doesn't recognize addresses within an "<a href="mailto: ..." tagset.

Some other features that would be nice - use a mailto tagset like above, and pass the "subject=" string through to the guardian to "fill in" the subject of the email. Etc...
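
As a starting point for that Regex rewrite, something along these lines might do. It is a JavaScript sketch, not the module's PHP, and the pattern and function name are my own assumptions. It only handles double-quoted href attributes, and it pulls out the address plus an optional "subject=" value so the subject could be passed through to the form.

```javascript
// Assumed pattern: <a ... href="mailto:ADDRESS?QUERY" ...>text</a>, double quotes only.
const MAILTO_RE = /<a\s[^>]*href\s*=\s*"mailto:([^"?]+)(?:\?([^"]*))?"[^>]*>[\s\S]*?<\/a>/gi;

// Hypothetical helper: list every mailto link with its address and subject.
function parseMailtoLinks(html) {
  const found = [];
  for (const match of html.matchAll(MAILTO_RE)) {
    // The query string may be HTML-escaped inside the attribute (&amp;).
    const query = new URLSearchParams((match[2] || "").replace(/&amp;/g, "&"));
    found.push({
      address: match[1].trim(),
      subject: query.get("subject"), // null when the link has no subject
    });
  }
  return found;
}
```

The same MAILTO_RE could then be used in a replace() call to swap the whole tagset for the guardian link, the way bare addresses are handled now.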

The module can be found at http://www.tuna.org/guardian/

Encrypt email address

jbyers commented 6 March 2004 at 19:33

Have you thought about using this free tool to encrypt posted email addresses:
http://www.hiveware.com/enkoder_form.php

Interesting...

shane commented 16 March 2004 at 23:16

I haven't seen that one before, so I'm looking into it now. I'm not sure I can get a copy of the code that does the encoding - as I'd need that to implement within my module. They don't seem to post their code...

Thanks for the pointer.

Convert this method to php

Mardeg commented 18 October 2006 at 05:53

The method mentioned at http://www.webmasterforums.net/resources-scripts/9217-making-obfuscated-... is a non-javascript non-image solution. Is it possible to automate this with php?

Hiveware is great!

cherylchase commented 27 May 2004 at 19:01

I frequently have need to post email addresses at a nonprofit website (http://www.isna.org), and I want to protect the owners from spam harvesting. I have found the Hiveware Enkoder to be a fabulous tool for this. But pasting in the hiveware javascript is beyond the tech skills of most of my content-creators. A filter that produces something similar automatically would be great for two reasons:

1. Save non-technical users from having to look at javascript.

2. As was mentioned above, obfuscation is temporary. With a filter, when the technique used to obfuscate becomes obsolete, it can be updated for all protected content, without having to edit each one.

Hiveware provides a download of a java app, but the source is not open. I wonder if they would be willing to help out, in exchange for credit? It looks as if they publish their app as a service, and might be open to this.

no need... but...

chx commented 25 September 2004 at 21:37

first of all, if you are using _any_ spam circumventing technique, like my address at notexisting domain dot tld -- you are done. Even a spammer is just a businessman who wants to sell you things. If he knows you are not one of the very few who buy from spam letters, he won't bother.

If you still want to protect your address, here is a funny way to do it:

"me@subdomain domain tld
replace the spaces after the @ with ."

unlike images this is readable by any human, but I doubt any robot will be able to recognize this as an address.
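
If you wanted a filter to produce that form automatically, a tiny sketch could look like this (JavaScript, with a made-up function name; the instruction text is just the wording from above):

```javascript
// Hypothetical helper: rewrite "me@subdomain.domain.tld" in the spaced-out form.
function spaceOutDomain(address) {
  const at = address.lastIndexOf("@");
  if (at < 1) {
    throw new Error("not an email address: " + address);
  }
  const local = address.slice(0, at);
  // Only the dots after the @ are replaced; the local part is left alone.
  const domain = address.slice(at + 1).split(".").join(" ");
  return `${local}@${domain} (replace the spaces after the @ with .)`;
}
```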

--
Drupal development: making the world better, one patch at a time. | A bedroom without a teddy is like a face without a smile.

Any security flaw with openly accessible cron.php file?

ceti commented 27 June 2005 at 12:53

Seeing that anyone could access and activate cron.php, I was wondering if this presents any problems. I am thinking that someone could maliciously target this file and make it run a million times or something...

DoS attacks are possible on

killes@www.drop.org commented 27 June 2005 at 13:55

DoS attacks are possible on all exposed areas of a web site. You can also call index.php a million times. OTOH it does not hurt to make all parts that do not need to be exposed as inaccessible as practical. Try accessing drupal.org's cron.php. ;-)
--
Drupal services
My Drupal services


Crontab configuration

MichaelCole commented 14 November 2006 at 07:23

So, my new hosting provider recommends not doing a GET for the cron job, but shelling PHP directly. Drupal hasn't complained yet.

nice -n 15 php $HOME/public_html/community/cron.php

(Of course your mileage may vary from host to host)

With that, I can chmod 640 cron.php, update.php, and others.

I did this with sites/default/settings.php and haven't had any trouble (I still get a blank white screen though)

Anyways, I'm ignorant, and looking for a guide on hardening a drupal install. I'm not paranoid, but it seems like taking a couple of steps won't hurt.

So far:
chmod 640 cron.php
chmod 644 sites/default/settings.php

Any other recommendations?

Mike

- check out http://drupal.org/node/94117 - says that update.php is ok.

Obfuscation

scarecrow-rye commented 1 December 2006 at 16:21

Drupal automatically converts any email addresses and web URLs it encounters within content to clickable links. Wouldn't it be simple to amend the function that does the linking to one of the following?

A) Convert the email address to HTML character entities, so your@email.com would become &#x79;&#00111;&#00117;&#x72;&#x40;&#x65;&#00109;&#97;&#00105;&#x6c;&#x2e;&#0099;&#x6f;&#00109; which is readable by the browser but not so easily read by a spider, or

B) Replace the email address with a little bit of JavaScript.

Take a look at http://www.addressmunger.com/ for examples. Personally, bearing things like clean code and accessibility in mind, I prefer option A.
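
For what it's worth, option A is only a few lines of code. Here is a JavaScript sketch with made-up function names (Drupal's linking function itself is PHP). It alternates hex and decimal entities so the output isn't one uniform pattern, which is my own choice rather than anything the site above prescribes. The decoder is included to show how little work a spider needs to undo it.

```javascript
// Hypothetical helper: encode every character as an HTML numeric entity,
// alternating hex (&#x..;) and decimal (&#..;) forms.
function encodeAsEntities(address) {
  return Array.from(address, (ch, i) => {
    const code = ch.codePointAt(0);
    return i % 2 === 0 ? `&#x${code.toString(16)};` : `&#${code};`;
  }).join("");
}

// What a browser (or a slightly smarter spider) does to read it back.
function decodeEntities(encoded) {
  return encoded.replace(/&#(x?)([0-9a-fA-F]+);/g, (whole, hex, digits) =>
    String.fromCodePoint(parseInt(digits, hex ? 16 : 10))
  );
}
```

Since decoding is a one-line regex, this only stops harvesters that match raw text and never interpret entities.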

How to insert Hiveware Enkoder Javascript into a Drupal Page?

TallDavid commented 15 December 2006 at 02:53

I've used the Hiveware Enkoder ( http://automaticlabs.com/products/enkoderform ) to encrypt email addresses for many Dreamweaver created websites with great success. Today I tried to insert the Hiveware Enkoder created JavaScript into a Drupal 5 beta 2 website as part of a Page node. I've learned that a simple cut-n-paste of the javascript code does not work. (JavaScript newbie here).

Could someone please detail the steps needed to successfully embed Hiveware Enkoder created JavaScript into a Drupal Page node. Thanks in advance.

easy

bertboerland commented 15 December 2006 at 08:34

create a block, copy the javascript into it, select full html, post, you're done

--
groets
bertb


No cigar.

TallDavid commented 15 December 2006 at 14:26

Thanks for the suggestion.

Unfortunately, this isn't working. I created a test block, copied the javascript into the block, ensured that full HTML was selected, saved the block and viewed the test block output. Instead of the expected email link, all that is displayed is "/* */."

This is the same behavior that is observed when attempting to insert the javascript for a Hiveware encrypted email link into a Page node.

Ideas?

FYI, below is the Hiveware generated javascript code:

<script type="text/javascript">
/* <![CDATA[ */
function hivelogic_enkoder(){var kode=
"kode=\"oked\\\"=')('injo).e(rsvere).''t(lispe.od=kdeko\\\\;k\\\"do=e\\\"\\"+
"\\\\\\\\\\rnhg%@nrgh%_g@frpxqh1wuzwl_h_%d_k+h?\\\\\\\\\\\\\\\\#\\\\u@___i_"+
"_d%__polrwu=thhxwvjCodhywvqrsdusldhvvuf1prvBexhmwfL@tqhx|ui#ru#pkw#hdJyovh"+
"rwDqssduvluh1vrf#phzvew__hl____l#%_wow@___h__o%__Ffl#nhkhuw##rhvgqd##qphld"+
"#orwD#wfrl#qhUodH#wvwd#hsDuslddv#ohVyuflv__#h____pA%_hld#ofDlwqrD#ssduvlod"+
"V#uhlyhf#v2?Ad,%___%_{>*>>@r*+i@u>l?3nlg+1rhhjokq4w>0.,5l~@.,n{g@1rkhufwdl"+
"D4+..r,hnfgd1Dk+u,wnl\\\\\\\\\\\\\\\\00g0@r.hl{n+g?1rhhjokqnwgB1rkhufwdnDg"+
"+1rhhjokq4w=0*,>*%,{>*@>*ri+u@l>3?ln+gr1hhojqkw40>,.l5@~,.{n@gr1hkfudwDl+4"+
"..,rnhgf1dkDu+w,l\\\\\\\\\\\\\\\\00n0gr@h.{l+n?gr1hhojqkwnBgr1hkfudwDn+gr1"+
"hhojqkw40=,**>,\\\"\\\\\\\\\\\\x;'=;'of(r=i;0<iokedl.netg;h+i)+c{k=do.ehcr"+
"aoCedtAi(-);3fic(0<c)=+21;8+xS=rtni.grfmohCraoCedc(})okedx=\\\"\\\\e=od\\"+
"\"kk;do=eokeds.lpti'()'r.verees)(j.io(n'')\";x='';for(i=0;i<(kode.length-1"+
");i+=2){x+=kode.charAt(i+1)+kode.charAt(i)}kode=x+(i<kode.length?kode.char"+
"At(kode.length-1):'');"
;var i,c,x;while(eval(kode));}hivelogic_enkoder();
/* ]]> */.

Hiveware Enkoder JavaScript in Drupal - How To

TallDavid commented 20 December 2006 at 17:06

I found a solution...

By removing the following two lines of generated JavaScript code from the Hiveware Enkoder output, I have been successful in getting the encrypted email addresses to work properly when pasted into Drupal nodes. Another hint: be sure that your Input Format is set to Full HTML or PHP.

/* <![CDATA[ */

/* ]]> */.
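
If you have many snippets to fix, the removal of those two lines can be scripted. This is only a sketch with a made-up function name. It drops any line that consists solely of one of the two CDATA comment markers shown above, with or without the trailing period, and leaves everything else untouched.

```javascript
// Hypothetical helper: remove the CDATA comment-wrapper lines from a pasted snippet.
function stripCdataWrappers(script) {
  const wrapperLine = /^\s*\/\*\s*(<!\[CDATA\[|\]\]>)\s*\*\/\.?\s*$/;
  return script
    .split("\n")
    .filter((line) => !wrapperLine.test(line))
    .join("\n");
}
```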

Accessibility

scarecrow-rye commented 18 December 2006 at 15:53

Unfortunately, I doubt the JavaScript approach would meet accessibility guidelines. Would screen readers be able to read it?
