Hide, obfuscate or otherwise prevent the harvesting of email addresses

Question

I am developing a public repository webapp for my organization.

It will be a public webapp, exposed to the internet. All people and organisational units can be queried, and their contact data will be displayed. It is developed as a single-page app against a REST back-end. There will also probably be a mobile front-end in the future.

One requirement is that people's emails are visible, and are clickable links with the mailto:email@so.com href attribute, so users can click on the address to quickly start writing an email.

On the other hand, I want to make email harvesting difficult for spammers. (I know that with the above requirement it will always be possible to ultimately get the email addresses, but I don't want it to be extra easy.) So I don't want to expose the emails in clear text in my API.

The previous version of this app used server-generated text-to-image to show the address, and then the onclick handler used an AJAX call to get the actual address from the server (based on the ID of the person), then activate the "mailto" link.

It does not seem so good to generate one or two extra server calls for each person displayed, especially when displaying a search results list. I am thinking I can probably do better. For example, I could just include the email field in my API, but obfuscate/encrypt it. The app (or any future client made by us such as a mobile app) would know how to decode the email address.

Is there a better way to do this?

score 5 · Answer 1 · answered Nov 26 '15 at 10:06

Don't overthink things: the obvious way (rendering an image instead of text) is exactly the right thing to do here.

Nowadays, any time delay or processing cost involved in an extra server call will be negligible compared to the time it takes a user to move a mouse and perform a click in the first place. (From the viewpoint of a computer, people move in ultra-ultra-slow motion.)

score 2 · Answer 2 · edited Nov 26 '15 at 15:48

Why not use an image as stated above, and include a (lightly) encrypted email address for each image, along with a local JavaScript function to resolve the obfuscated email upon click? That way everything stays single-trip, and most spam harvesters aren't going to hook into the event and trace a code path to get the result; they're just going to read the tag and hope it's valid. Simple and effective, no? Not a cure-all, but it ought to do.
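A minimal sketch of that idea, assuming a simple XOR-plus-hex scheme. The names `OBFUSCATION_KEY`, `obfuscateEmail`, `deobfuscateEmail` and `onEmailClick`, and the `data-email` attribute, are made up for illustration. This is obfuscation rather than real encryption, since the key ships to every client.

```javascript
// Hypothetical shared key. It is visible to every client, so this only
// defeats harvesters that scan markup or API responses for plain addresses.
const OBFUSCATION_KEY = "not-a-secret";

// Server side: XOR each UTF-16 code unit with the key, emit 4 hex digits each.
function obfuscateEmail(email, key = OBFUSCATION_KEY) {
  let out = "";
  for (let i = 0; i < email.length; i++) {
    const code = email.charCodeAt(i) ^ key.charCodeAt(i % key.length);
    out += code.toString(16).padStart(4, "0");
  }
  return out;
}

// Client side: reverse the transformation.
function deobfuscateEmail(hex, key = OBFUSCATION_KEY) {
  let out = "";
  for (let i = 0; i < hex.length; i += 4) {
    const keyCode = key.charCodeAt((i / 4) % key.length);
    out += String.fromCharCode(parseInt(hex.slice(i, i + 4), 16) ^ keyCode);
  }
  return out;
}

// Click handler (browser only): the clear-text address exists only after a click.
// Assumes the element carries the obfuscated value in a data-email attribute.
function onEmailClick(event) {
  const hex = event.currentTarget.dataset.email;
  window.location.href = "mailto:" + deobfuscateEmail(hex);
}
```

The API would return the output of `obfuscateEmail` in place of the address, and each client (web or mobile) would carry its own copy of the decoding function.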

Mike Nakis · Accepted Answer · 2015-11-26T16:52:36.480

First, let me say that the image is a good solution, and it does not require any extra roundtrips to the server while displaying search results: once generated, the image can be saved in an image file on the filesystem of the server, and served as a plain <img src="user337567.png"/>. This means that the server will essentially be caching the images, recomputing them only in the event that an email address has changed. Extra roundtrips to the server will only be required when the user clicks on an email address image, but clicking is an operation performed in human time, and therefore represents negligible overhead.
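The caching behaviour can be sketched roughly as follows. This version keeps the cache in memory instead of on the filesystem, and `EmailImageCache` and the injected `renderImage` function are hypothetical names; the actual text-to-image step is outside the sketch.

```javascript
// Hypothetical cache: remembers, per person, the image rendered from
// the email address that was current at render time.
class EmailImageCache {
  // renderImage is an assumed function: (email) => image bytes or file path.
  constructor(renderImage) {
    this.renderImage = renderImage;
    this.entries = new Map(); // personId -> { email, image }
  }

  // Returns the cached image, re-rendering only if the address has changed.
  getImage(personId, email) {
    const cached = this.entries.get(personId);
    if (cached && cached.email === email) {
      return cached.image;
    }
    const image = this.renderImage(email);
    this.entries.set(personId, { email, image });
    return image;
  }
}
```

A file-based variant would write the rendered image to disk once and let the web server serve it statically, regenerating the file when the stored address changes.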

One slight problem with this approach is that spammers may be using optical character recognition technology. One way to account for this possibility would be to make the rendered email addresses difficult to read, sort of like a captcha, but you will never have any metric telling you how successful you were in this.

Other approaches:

Require authentication.

Make the webpages of your public repository webapp visible to all visitors, but hide the email addresses. When a visitor clicks on (or hovers over) a hidden email address, inform them that they have to register in order to view that information. In the registration process, require a captcha. The logic behind this:

It is only fair that you can see our email addresses if you first let us know yours, right?

Note that you can even use this approach in addition to serving email addresses as clickable images, for added security.

Also note that you can use this approach to protect a lot more information than just email addresses. (What if there is a need to also protect phone numbers later?)

Use additional anti-harvesting measures.

One commonly used approach is to require that clients use cookies, so that you can identify each client and keep track of the requests the server receives from it. If a client sends too many requests too fast, blacklist it. Normally, blacklisting means denying any service whatsoever, but in your case it could simply mean that from that moment on you don't show them any more email addresses, or that you start showing them images instead of email addresses.
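As a rough sketch of such tracking, assuming the client ID comes from a cookie and using a sliding time window: `ClientRateLimiter`, `allowEmails` and the thresholds are illustrative choices, not something prescribed above.

```javascript
// Hypothetical per-client limiter. The clientId is assumed to come from a cookie.
class ClientRateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests; // allowed requests per window
    this.windowMs = windowMs;       // window length in milliseconds
    this.recent = new Map();        // clientId -> timestamps of recent requests
    this.blacklisted = new Set();   // clients that no longer get email addresses
  }

  // Records one request and returns true if email addresses may still be shown.
  allowEmails(clientId, now = Date.now()) {
    if (this.blacklisted.has(clientId)) {
      return false;
    }
    const cutoff = now - this.windowMs;
    const times = (this.recent.get(clientId) || []).filter((t) => t > cutoff);
    times.push(now);
    this.recent.set(clientId, times);
    if (times.length > this.maxRequests) {
      this.blacklisted.add(clientId); // too many requests too fast
      return false;
    }
    return true;
  }
}
```

When `allowEmails` returns false, the server could omit the address from the response or fall back to serving an image, as described above.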

Note that this is a generally useful thing to have, which may prevent various different kinds of abuse, and you might want to implement it regardless of what you end up doing specifically for the email addresses.

Implement "we'll call you back"

If you really want to avoid authentication, then instead of displaying email addresses, you can have a "contact" field which, when clicked, pops up a dialog that asks the visitor to enter their email address (probably along with a captcha) and sends the visitor an email message to which the visitor may reply.
