Email addresses displayed on your site, including those hidden in
forms, are vulnerable to malicious software used by spammers. This article provides an
overview of such spam bots (robots) and email harvesters, and explains
steps you can take to prevent email addresses from being harvested
from your web pages.
Spam Bots and Email Harvesters
While legitimate bots and spiders crawl your site in order to index
your pages for inclusion in search engines and directories, malicious
bots do the same in search of email addresses. Unlike their
legitimate counterparts, malicious bots ignore standards and rules,
and may even use methods specifically intended to evade detection
and blocking.
For example, legitimate bots will obey instructions provided in a
robots.txt file, telling them what they can and cannot crawl, while
malicious bots will not. Many email harvesters also falsely identify
themselves as legitimate bots, for example by sending a spoofed
user-agent string, in order to evade common detection and blocking
methods.
Spam bots and email harvesters examine the HTML code of web pages
and collect anything that looks like an email address. Therefore, in
addition to conventional "mailto" links, they will also find email
addresses embedded in contact forms.
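To illustrate why both kinds of address are exposed, here is a minimal sketch of the pattern matching a harvester might apply to raw HTML. The function name findEmailLikeStrings, the simplified pattern, and the sample markup (including the field name "recipient" and the address sales@example.com) are assumptions made for this example; real harvesters vary.

```javascript
// Hypothetical helper: returns the unique email-like strings found in raw HTML.
function findEmailLikeStrings(html) {
  // Simplified pattern (an assumption); real harvesters may match more broadly.
  const pattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  const matches = html.match(pattern) || [];
  return [...new Set(matches)];
}

// Sample markup: a conventional mailto link and an address hidden in a form.
const page = [
  '<a href="mailto:a@b.com">a@b.com</a>',
  '<input type="hidden" name="recipient" value="sales@example.com">',
].join("\n");

const found = findEmailLikeStrings(page);
console.log(found);
```

Both the visible link and the hidden form field are matched, because the scan works on the page source rather than on what a visitor sees.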
Prevention Methods
Advanced developers commonly deny access to known malicious bots
via rules specified in an .htaccess file. However, this method
requires the name or identifier of the offending bot, so it does not
protect against bots that falsely identify themselves. It is also a
reactive approach, requiring additional entries after the fact as
new malicious bots become known. Furthermore, because the .htaccess
file is processed on every request before access to the site is
granted, a long list of rules can increase load times and reduce
general site performance for legitimate visitors.
Malicious bots can only collect email addresses they can identify.
You can therefore take steps to alter the appearance of email addresses so they remain
accessible to visitors, but cannot be identified by email harvesters.
One approach is to display email addresses as images. However,
visitors can then no longer click the email address to send a
message, and wrapping the image in a mailto link to restore this
functionality would make the address as vulnerable as a normal text
link.
Another approach is to break the email address into separate
JavaScript variables, then reassemble these when the page is loaded
to provide visitors with a normal text link. However, if your visitor has
disabled JavaScript, they won't see your email address either.
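A minimal sketch of this approach is shown below. The function name buildMailtoLink and the placeholder element id "contact" are assumptions made for this example, not part of any particular site or library.

```javascript
// Hypothetical helper: builds the link markup from separate parts so the
// complete address never appears as a single string in the page source.
function buildMailtoLink(user, domain) {
  const address = user + "@" + domain;
  return '<a href="mailto:' + address + '">' + address + "</a>";
}

const linkHtml = buildMailtoLink("a", "b.com");
console.log(linkHtml);

// In a browser, the markup could be placed into a placeholder element.
// The id "contact" is an assumed name for illustration only.
if (typeof document !== "undefined") {
  const placeholder = document.getElementById("contact");
  if (placeholder) placeholder.innerHTML = linkHtml;
}
```

Because the link exists only after the script runs, the placeholder stays empty for visitors who have JavaScript disabled.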
A practical solution is to convert the email address and mailto
parts of the link to Unicode character codes, written as HTML
numeric character references. Web browsers render these as a
conventional text link, but they appear as gibberish to email
harvesters that scan the raw HTML without decoding it.
For example, the email address "a@b.com" would normally be coded as
follows:
<a href="mailto:a@b.com">a@b.com</a>
The HTML tags need to remain as they are, but everything else can be
changed to Unicode. So in the above example, mailto:a@b.com would be
changed to:
&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#97;&#64;&#98;&#46;&#99;&#111;&#109;
And a@b.com would be changed to:
&#97;&#64;&#98;&#46;&#99;&#111;&#109;
The resulting link appears and functions exactly as a conventional text link:
a@b.com
This same method can also be applied to email addresses embedded in
forms, which is not possible using other methods shown in this
article.
Tools that convert ASCII to Unicode are freely available on the Internet.
Simply search for "ascii to unicode" using your preferred search engine.
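If you would rather script the conversion than use an online tool, it can be sketched in a few lines. The function names toCharRefs and encodeMailtoLink are assumptions made for this example, and decimal character references are used here, although hexadecimal ones would work equally well.

```javascript
// Hypothetical helper: encodes every character of the input as a decimal
// HTML character reference, e.g. "a" becomes "&#97;".
function toCharRefs(text) {
  return Array.from(text, (ch) => "&#" + ch.codePointAt(0) + ";").join("");
}

// Hypothetical helper: leaves the HTML tags untouched and encodes
// everything else in the link, as described in the article.
function encodeMailtoLink(address) {
  return (
    '<a href="' + toCharRefs("mailto:" + address) + '">' +
    toCharRefs(address) +
    "</a>"
  );
}

console.log(toCharRefs("a@b.com"));
// &#97;&#64;&#98;&#46;&#99;&#111;&#109;
console.log(encodeMailtoLink("a@b.com"));
```

The output of encodeMailtoLink can be pasted into a page in place of the conventional link. The same toCharRefs output can be used wherever an address appears in a form's attribute values.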