Blocking Contact Form Spam Bots

mikey3times

I was complaining about contact form spam in another thread, but wanted to make this a separate thread to urge @CCarter to share some expertise with us.

BACKSTORY

One of my sites is getting 3-4 spam contact form submissions per day. I know, not a lot, but still. I'm used to the occasional outreach from bloggers looking for links to their borderline-relevant topic, but these seem to be bots asking for links to completely irrelevant articles/sites. For instance, I thought this one was pretty direct...

...

I was researching "divorce laws" posts and came across your article [contents of title tag for completely irrelevant article on my site].

Would it be possible to edit the article [url of my article] and add the line "You can also check frequently asked divorce law questions at [redacted law firm who is getting scammed]." and at the text "divorce law", add a link to my website "[redacted url]"?

...

I also get lots of requests to link to some CBD sites even though my site has absolutely nothing to do with CBD. I've also received a bunch of other unrelated-link requests.

SOLUTIONS?

So I followed some of @CCarter's advice:

Have you tried calling your form with javascript? I do this on all my sites, first the contact form page is always noindex, nocache, nofollow - that is essential for it not to appear within Google or other search engine’s index.

Next I create a javascript that ajax calls the form html and places it on the contact form page.

Now most bots do not run/render javascript, so if they were to even visit the page, there is no form as far as they're concerned. A user does run javascript, so when they land on the page the form appears. I'm away from my computer, but an example: if you visit the SW login page without javascript enabled, you won't see a login form because it uses ajax to call the form itself.

You no longer need captcha once you implement this. :smile: I thought I wrote up code for this, but it might have been at WF, I'll see what I can do to show a version in a DevOps DevSeries post.

I made the contact form nofollow, noindex. I haven't figured out how to nocache in my CMS. I am also looking to implement javascript loading of the contact page.

My CMS supports Invisible reCAPTCHA, which ties some javascript to the submit button - I may give that a go. I do have reCAPTCHA enabled, so the bots are getting past that...or these are real people. Not sure.
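
For reference, outside of a CMS plugin, the documented Invisible reCAPTCHA pattern binds the challenge to the submit button roughly like this (the site key and form action are placeholders, and the token still has to be verified server-side against Google's siteverify endpoint):

Code:
<script src="https://www.google.com/recaptcha/api.js" async defer></script>

<form action="/contact-handler" method="post" id="contact-form">
    <!-- ...your fields... -->
    <button class="g-recaptcha"
            data-sitekey="YOUR_SITE_KEY"
            data-callback="onSubmit">Send Message</button>
</form>

<script>
    // Runs only after reCAPTCHA hands back a token for this visitor.
    function onSubmit(token) {
        document.getElementById('contact-form').submit();
    }
</script>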

What is everyone else doing? I don't want to shut down the contact form because I do get a lot of helpful messages and I enjoy interacting with the people who use this site.
 
If they are already successfully completing a Google reCAPTCHA field, then they are not your normal low-level spam bots.

The suggestion to noindex the pages the forms are on is fine, provided you don't want to actually rank those pages. But if they are already indexed, you will need to change their URLs (and not create a redirect).

This will likely cut down spam, but there are plenty of tools out there that work by taking a list of websites and then crawling them looking for contact forms / emails, so noindexing won't stop those.

Using JavaScript is also becoming less effective, because so many websites rely on it that a lot of bots are now built on full browser engines - hell, even Googlebot renders with Chrome. Check out Puppeteer if you want an idea of how easy it is to write these things with full Chrome rendering.
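
For a sense of scale, a bot with a full browser engine needs only a few lines. A hypothetical sketch (the URL and field names are placeholders; this is exactly the kind of bot that javascript-only hiding does not stop):

Code:
// Hypothetical Puppeteer sketch: full Chrome rendering means
// ajax-injected forms appear exactly as they do for a real user.
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com/contact', { waitUntil: 'networkidle0' });

    // The javascript-loaded form is in the DOM by now; fill it and submit.
    await page.type('#contact_name', 'Totally Real Person');
    await page.type('#email', 'spammer@example.com');
    await page.type('textarea[name=comment]', 'Please add a link to my site...');
    await page.click('input[type=submit]');

    await browser.close();
})();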

What has been successful for me in similar situations is testing interactions with JavaScript (time to complete the form, touch/click events, etc.), or setting cookies from image requests (often, browser-based bots block images to save bandwidth).
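
A minimal sketch of the timing idea (the element id is made up, and since any client-side check can be stripped by a determined bot, a hardened version would send the timing to the server and validate it there):

Code:
<script>
// Hypothetical check: humans take a while to fill out a form,
// and they fire real pointer events along the way.
var formLoadedAt = Date.now();
var sawPointer = false;

document.addEventListener('pointerdown', function () { sawPointer = true; });

document.getElementById('contact-form').addEventListener('submit', function (e) {
    var secondsOnPage = (Date.now() - formLoadedAt) / 1000;
    if (secondsOnPage < 3 || !sawPointer) {
        e.preventDefault(); // too fast, or zero human interaction - drop it
    }
});
</script>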

But I would only do this if the problem is serious. As a developer, this stuff takes a while to set up and monitor (to make sure you are not getting false positives and losing genuine submits); as a non-developer it's going to take even longer.

Chances are you are better off just deleting a few junk messages every now and then.
 
Regarding captcha - even though bots can solve captchas, the bot does not need to render the page to solve one. The reason I know this is that at SERPWoo all we do is solve captchas all day long, and we aren't rendering any pages, just reading straight HTML code. If we were to render each page, that would create a massive, and I mean MASSIVE, amount of lag in the processing time.

Even if a page takes only 1 second to render, there are only 86,400 seconds in a day - 2 million renders at 1 second each is over 23 days of compute time. So it's impossible to get to 2 million+ Google crawls daily if it takes even 1 second to render each page completely; it's not scalable.

Now IF a bot literally takes the time to render the page, fill out the form, and correctly submit the captcha - it will get through. BUT in my opinion that's an unscalable operation.

My Solution

I used to get spam content submissions a lot, so over the course of years I figured out we can stop spam by using their weaknesses.

A lot of bots query Google to find forms.

A lot of bots do not render javascript - they are lazy and just look for "<form" in the html.

By removing the form from Google and by removing the "<form></form>" from the page you eliminate a ton of lazy bots. Here are some methods I use (there are more sophisticated methods but for the purpose of this example I'm going to keep it short):

Here is the solution that hides my forms from spambots.

First, make sure the contact page is "noindex" and "nocache" / "noarchive". That alone will eliminate bots that scrape Google for forms (looking at you, Xrumer and Scrapebox).
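
For reference, in plain HTML that boils down to a robots meta tag in the page's <head> ("noarchive" is the standard directive for keeping the cached copy out of Google; how you set this varies by CMS):

Code:
<meta name="robots" content="noindex, nofollow, noarchive">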

Second, make sure that the form's data processing (cgi or php) checks that the referrer was that page, or at the very least your domain.

What do I mean?

In my example, "contact-us.cgi" is what processes the form. I make sure the script checks the HTTP_REFERER environment variable, i.e. that the data coming in actually came from that page.

Why? This will eliminate another chunk of bots that simply POST to the form's action URL without ever loading the page. So if someone were just to POST data straight to "contact-us.cgi", the data would not get processed.
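
The original here is a CGI script, but as a minimal sketch of the same check (shown in Node/Express purely for illustration; the domain is a placeholder, and note the Referer header can be spoofed, so this only filters the lazy bots):

Code:
// Hypothetical Express handler standing in for contact-us.cgi.
const express = require('express');
const app = express();
app.use(express.urlencoded({ extended: false }));

app.post('/a-cgi-bin/contact-us.cgi', (req, res) => {
    // Only accept submissions referred from our own domain.
    const referer = req.get('Referer') || '';
    if (!referer.startsWith('https://example.com/')) {
        return res.status(403).send('Forbidden');
    }
    // ... process the message here ...
    res.send('Thanks for your message!');
});

app.listen(3000);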

Third, cookies are another line of defense. When the form is submitted, make sure the data coming in has a specific cookie that was generated from landing on that page.
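
Continuing the hypothetical Express stand-in from above (cookie-parser is an assumed dependency, and a real version would tie the token to a server-side session or signature rather than just checking that it exists):

Code:
// Issue a random token cookie when the contact page is served,
// and refuse form posts that never picked it up.
const crypto = require('crypto');
const cookieParser = require('cookie-parser');
app.use(cookieParser());

app.get('/contact', (req, res) => {
    res.cookie('form_token', crypto.randomBytes(16).toString('hex'), { httpOnly: true });
    res.sendFile(__dirname + '/contact.html');
});

// In a real script this merges with the referer check above.
app.post('/a-cgi-bin/contact-us.cgi', (req, res) => {
    if (!req.cookies.form_token) {
        return res.status(403).send('Forbidden'); // never landed on the page
    }
    // ... process the message here ...
});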

Fourth, the way this script works is by simply using ajax to call the html of the form, which is saved in the file "ThisRandomStuff.mp4". Most bots aren't going to waste time rendering ajax, and will skip "nonsense" like '.jpg' or '.mp4' files.

Here is a working example: Hidden Form From Bots

Screenshot: [image]

--

Here is the code for your form:

filename: ThisRandomStuff.mp4

Code:
<form action="/a-cgi-bin/contact-us.cgi" method="post" id="submitform">

<table style="width: 400px;">
<tr>
    <td style="width: 150px; text-align: right;"><label for="contact_name">Your Name:</label></td>
    <td><input tabindex="1" id="contact_name" name="contact_name" type="text" class="form-control" required=""></td>
</tr>

<tr>
    <td style="width: 150px; text-align: right;"><label for="email">Your Email:</label></td>
    <td><input tabindex="2" id="email" name="email" type="email" class="form-control" required=""></td>
</tr>

<tr>
    <td style="width: 150px; text-align: right;"><label for="comment">Your Message:</label></td>
    <td><textarea id="comment" name="comment" rows="8" tabindex="3" class="form-control" cols="30" required=""></textarea></td>
</tr>

<tr>
    <td colspan="2" style="text-align: center;"><input name="submit" tabindex="4" type="submit" value="Send Message" class="btn btn-alt"></td>
</tr>
</table>
</form>

Here is the javascript code to place in your HTML (the form is injected into an element with id "the_formy", so add an empty <div id="the_formy"></div> wherever the form should appear):
Code:
<!-- Stuff that makes the javascript run - @MercenaryCarter -->
<script type="text/javascript">

// Build the root domain (protocol + hostname) of the current page.
var rootdomain = (("https:" == document.location.protocol) ? "https" : "http") + "://" + window.location.hostname;

function call_that_form(url) {
    // Not really an mp4 file. This file can be wherever you want on the same server, even a different folder.
    url = rootdomain + '/buso/hidden-form/' + url + '.mp4';
    var page_request = false;

    // Modern browsers use XMLHttpRequest; the ActiveXObject branches
    // are fallbacks for ancient Internet Explorer versions.
    if (window.XMLHttpRequest) {
        page_request = new XMLHttpRequest();
    } else if (window.ActiveXObject) {
        try {
            page_request = new ActiveXObject("Msxml2.XMLHTTP");
        } catch (e) {
            try {
                page_request = new ActiveXObject("Microsoft.XMLHTTP");
            } catch (e) {}
        }
    } else {
        return false;
    }

    // Synchronous GET (third argument false), so the response is ready
    // by the time writecontent() runs.
    page_request.open('GET', url, false);
    page_request.send(null);
    writecontent(page_request);
}

function writecontent(page_request) {
    // The indexOf check lets this also work when testing from file://,
    // where the status stays 0 instead of 200.
    if (window.location.href.indexOf("http") == -1 || page_request.status == 200) {
        //document.write(page_request.responseText);
        // Inject the form HTML into the placeholder div.
        document.getElementById('the_formy').innerHTML = page_request.responseText;
    }
}

//calls the form (located at /buso/hidden-form/ThisRandomStuff.mp4)
call_that_form('ThisRandomStuff');
</script>
<!-- Stuff that makes the javascript run - @MercenaryCarter -->

--

Implementing this will cut out roughly 95% of the lazy spam bots.

Another method you can utilize is to not process form submissions that contain certain phrases, like "CBD" or "divorce", in the message if you are getting tons of that (a minimal version is sketched below).
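
A sketch of that filter (hypothetical, continuing the Express stand-in from earlier; the phrase list is just an example and should be tuned to the spam you actually receive):

Code:
// Drop submissions whose message contains known spam phrases.
const BLOCKED_PHRASES = ['cbd', 'divorce law'];

function looksLikeSpam(message) {
    const text = String(message || '').toLowerCase();
    return BLOCKED_PHRASES.some(phrase => text.includes(phrase));
}

// Inside the POST handler:
// if (looksLikeSpam(req.body.comment)) return res.send('Thanks!');
// (Returning a fake success gives the bot no reason to retry.)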

There are always outliers, like a crazy programmer who builds a super-smart spam bot that doesn't use Google as a source, but those are far rarer than the lazy spammers. Remember, spamming is a numbers game - so create 2-3 obstacles and you should be good, especially if those obstacles let you remove the annoying captchas for users.

 
Interesting new information I learned about Captcha today:


That's a serious privacy violation. They are monitoring me all over the internet, and that is how they can tell I'm a real person.


It's getting a bit much...
 