Google Analytics 4 Spam

CCarter

Alright so these spammers in Poland are making my analytics completely useless. I assume a lot of people here are also dealing with this since there are several discussions on Reddit and Twitter about what's going on.

Reddit: Weird traffic only visible in Google Analytics

Reddit: Suddenly I started to receiving %200 traffic from Poland

Twitter:


GA's Twitter hardly gets any interaction besides the occasional "Bring back Google Analytics 3" comment. However, this particular tweet has a TON of replies, ALL about the Poland spam, so it's gotten out of control. Honestly, it's probably why Google Analytics is starting to lose market share. After this GA4 move, I realize a lot of marketers used the opportunity to find a new solution, and with good reason.

mqzJEnF.png

My Problem

R0WE10a.png

Looks good on paper but, that's all fake.

K3SEsnV.png

-

OHS4qRc.png

--

Poland at 2K versus USA at 49 visitors - How am I supposed to start a marketing campaign with all this spam screwing up my metrics? Anyways...

I looked at some BuSo metrics and it's getting spammed too, but I can't do much cause I don't have admin access to my Engineer account - cause I'm not the Engineer...

How It Works

The thing about this spam is - these "visitors" aren't actually going to your site. They will never show up in your web server logs.

What they do is they install YOUR Analytics Code onto some page in the deep void of the internet. Then they visit that page from their spamming domain and Google registers it as a referring visitor to YOUR Analytics.

My solution would be to whitelist MY DOMAIN and my WEB SERVER'S IP ADDRESS as the only sources that can register visits. But I don't work at Google so, here we are.

If someone can shoot that solution up the pipeline to the Google Analytics team that would be great.

Potential solutions

Reading Reddit got me to a solution:

"It seems what works only is to exclude their IPs in tag settings as "Internal Traffic": 77.222.40.224/24 and 45.140.19.173/24 and 38.180.120.84/24"

Here is a video to do the internal traffic tagging:


Good luck.
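
For anyone unsure what those "/24" ranges actually cover, here is a minimal sketch in plain JavaScript that checks whether an IPv4 address falls inside one of the three ranges quoted above. The helper names (ipToInt, inCidr, isSpamIp) are made up for illustration and are not part of GA4; GA4 does the matching itself once the ranges are saved in the internal traffic rule. A /24 ignores the last number of the address, so "38.180.120.84/24" covers everything from 38.180.120.0 to 38.180.120.255.

```javascript
// Convert a dotted-quad IPv4 string to an unsigned 32-bit integer.
function ipToInt(ip) {
  const parts = ip.split('.');
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  const [a, b, c, d] = parts.map(Number);
  return ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
}

// Return true when `ip` falls inside the CIDR block (for example "38.180.120.84/24").
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split('/');
  const bits = Number(bitsText);
  // Build the network mask; a /0 mask is handled separately because
  // shifting a 32-bit value by 32 is a no-op in JavaScript.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
}

// The three ranges quoted from the Reddit thread above.
const spamRanges = ['77.222.40.224/24', '45.140.19.173/24', '38.180.120.84/24'];

// Hypothetical helper: does this address belong to any of the quoted ranges?
function isSpamIp(ip) {
  return spamRanges.some((range) => inCidr(ip, range));
}
```

Calling isSpamIp('38.180.120.1') returns true and isSpamIp('38.180.121.1') returns false, which is why the spammers only need to move to a neighboring range to get past the filter.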

--

Other Solutions

Other solutions (which I don't think work, because again the visitors never hit your actual website) are in these LinkedIn posts:

New Solutions to the 'news.grets.store' and other Ghost Spam Referral Traffic Issue - The Daily Dose of Digital - 23/04/24

Check your GA4! Referral Traffic spike from News.Grets.Store and a solution to deal with it - The Daily Dose of Digital - 19/02/24

However, I am desperate enough to try the Google Tag Manager solution. To this day I have zero clue what Google Tag Manager even is.

--

So I just checked, even my TrafficLeaks.com domain is getting it, so if I'm getting it at MakoBoard.com, BuilderSociety.com, and TrafficLeaks.com - that means ALL of you are getting the spam too.

Am I stupid? How are you all solving this?

Shopify Solutions

I just checked another GA4 property and it's actually not getting the spam. It's hosted at Shopify though - might be something there.

Holy Shit I figured it out. The way Shopify calls Google Analytics isn't like normal sites. On normal sites it's like this:

Code:
    <!-- Google tag (gtag.js) -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-1234567890"></script>
    <script>
      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
      gtag('js', new Date());

      gtag('config', 'G-1234567890');
    </script>

That code doesn't exist within Shopify. They load all the "tracking code" through a JavaScript function call like this:
Code:
 trekkie.load(
      {
    "Trekkie": {
        "appName": "storefront",
        "development": false,
        "defaultAttributes": {
            "shopId": 27132985441,
            "isMerchantRequest": null,
            "themeId": 123383480417,
            "themeCityHash": "12527924154125950505",
            "contentLanguage": "en",
            "currency": "USD"
        },
        "isServerSideCookieWritingEnabled": true,
        "monorailRegion": "shop_domain"
    },
    "Google Gtag Pixel": {
        "conversionId": "G-1234567890",
        "eventLabels": [
            {
                "type": "purchase",
                "action_label": "G-1234567890"
            },
            {
                "type": "page_view",
                "action_label": "G-1234567890"
            },
            {
                "type": "view_item",
                "action_label": "G-1234567890"
            },
            {
                "type": "search",
                "action_label": "G-1234567890"
            },
            {
                "type": "add_to_cart",
                "action_label": "G-1234567890"
            },
            {
                "type": "begin_checkout",
                "action_label": "G-1234567890"
            },
            {
                "type": "add_payment_info",
                "action_label": "G-1234567890"
            }
        ],
        "targetCountry": "US"
    },
    "Session Attribution": {},
    "S2S": {
        "facebookCapiEnabled": false,
        "source": "trekkie-storefront-renderer"
    }
}
    );

Alright so if someone can figure out how to call Google Analytics 4 code like Shopify does you'll be able to escape this spam cause the spammer scanning your site for your GA code will think you don't have one.

However, thinking about it - if the spammers already have your GA4 ID you are screwed cause you are already in their system. You would need a new GA4 ID along with the new way of calling GA4.

Quick Update

Rrw839K.png

I thought I had them yesterday, but the new sites switched to that "38.180.120.84" IP Address which I then added to the internal traffic filter.

However today I don't have a single visitor. So perhaps the internal traffic filter is working.
 
I use Supermetrics and that appears to filter out ghost referral spam, because I can't get it to pull into my reports. However, it's definitely in the Google Analytics interface.

I can see it's spoofing hostnames, but only our domains that exist as redirects (misspellings, .biz, etc...). I don't see any with our actual hostnames. So check the hostname that it's coming from before attempting the tag manager solution. I'm wondering if it would work better to set the primary trigger to only fire on your whitelisted hostnames instead of trying to exclude theirs.
 
So check the hostname that it's coming from before attempting
What's a hostname and how would I do what you are suggesting? (Assume I am retarded or 5 years old)
 
Good job on the sleuthing, but I figure that most people have left GA4 by now.

Has it changed its appalling interface?

I'm using Plausible (https://plausible.io/), which does the job of telling me which pages get traffic and seems accurate, but I haven't bothered setting up too many goals and such. I find their pricing a bit steep too. Even so, they do deliver the easy overview dashboard that GA3 used to have.

Among all the bad decisions that Google has made under Sundar Pichai, willingly shutting down GA3 without a reasonable SMB alternative seems to be one of the worst. Is anyone except high level agencies using GA4?

Sundar Pichai really seems to be a bad CEO. Most of his moves, including in search, seem to be aimed at cost saving. Cost saving in indexing, cost saving in free analytics. No vision, bad woke AI etc. I really don't understand why he still has a job. Google has really and very visibly deteriorated with him as CEO.
 
find their pricing a bit steep too

The cloud version is $9 and $19 a month. That's steep?

Also there is a free self-hosted version. However they lost me when I have to install Docker or whatever the hell they are talking about - using a Postgres database - what is that? This is what I am referring to in my journal about these open source projects asking us to do nonsense that the average marketer can't do.

MySQL, PHP - you are good to go. They'll never get mass adoption asking people to go "Dock with the space station using a flux capacitor." Give me a break.

I am getting desperate though, so I might be docking with the space station from the digital ocean to view data on my own site... wild.

But let's not get off topic, Google Analytics 4 is still dominating and will be for the foreseeable 2-3 years. They need to fix this spam problem fast.
 
Hostname in Google Analytics refers to the domain the script loaded on. The last big round of ghost referral spam I recall addressing wasn't actually hitting your site and wouldn't get this value correct. So it was easier to filter out by whitelisting your hostnames (domain, subdomains, any 3rd party domains like payment portals, etc...)

This latest version appears to be hitting alternate domains registered and redirected to my primary site.
You can verify this in Google Analytics by adding a secondary dimension and typing in "Hostname"
By5deZh.jpeg

If you verify that this traffic is not coming from your primary hostname like in my case, you could set your analytics tag to only fire on your primary domain.

My version might only be a couple steps shorter, but it's an attempt at whitelisting vs. blacklisting the domains the script is allowed to fire on. It will need further testing, but I'm thinking you could do something like this. Create a trigger that only fires on a pageview on your whitelisted domain:
4mBpk9O.png


Then assign that trigger to your Google Analytics tag:
Lk1VwAt.jpeg


If you're okay with the extra configuration and occasionally adding more domains, the solution you provided could work too. I'm guessing this latest version must be attempting to load the script unlike the previous "ilovevitaly" referral spam that I believe was just hijacking UA- ids and injecting its spam right into Google Analytics.

Google Tag Manager has a little bit of a learning curve but once you get the concept it will become easy and in many cases a convenient way to quickly deploy scripts. Just think triggers are what tell your tags to load, variables give you some additional controls, and tags are your scripts you want to load.
 
Google Tag Manager

So this trigger stuff is only possible through Google Tag Manager?

Looking at the way there are triggers seems similar to how Shopify does it, they had some similar variables. Hmmm, that might just solve the problem.

Now I got to finally learn what Google Tag Manager even is.
 
You could use JavaScript to make something similar but it's already there and free.

If you want to get deeper into it, look into the dataLayer. It's very handy for passing e-commerce data to other platforms or custom coding.
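
To show what "something similar" in plain JavaScript might look like, here is a rough sketch of the whitelist idea without Google Tag Manager: the standard gtag snippet is only loaded when the page's hostname is on an allowlist. The hostnames, the isAllowedHostname and loadGtag helpers, and the placeholder G-1234567890 ID are all assumptions for illustration. As with the GTM trigger, this can only help if the spam is really loading the script on your redirected domains; it does nothing against hits sent from somewhere else with a stolen ID.

```javascript
// Hypothetical allowlist: replace with the hostnames you actually serve.
const ALLOWED_HOSTNAMES = ['example.com', 'www.example.com'];
// Placeholder measurement ID, same dummy value used earlier in the thread.
const MEASUREMENT_ID = 'G-1234567890';

// Return true only for hostnames on the allowlist (case-insensitive).
function isAllowedHostname(hostname, allowed = ALLOWED_HOSTNAMES) {
  return allowed.includes(String(hostname).toLowerCase());
}

// Inject the regular gtag.js snippet, equivalent to pasting it in the page head.
function loadGtag(measurementId) {
  const script = document.createElement('script');
  script.async = true;
  script.src = 'https://www.googletagmanager.com/gtag/js?id=' + encodeURIComponent(measurementId);
  document.head.appendChild(script);

  window.dataLayer = window.dataLayer || [];
  function gtag() { window.dataLayer.push(arguments); }
  gtag('js', new Date());
  gtag('config', measurementId);
}

// Only run in a browser, and only on an allowlisted hostname.
if (typeof window !== 'undefined' && isAllowedHostname(window.location.hostname)) {
  loadGtag(MEASUREMENT_ID);
}
```

With this in place, a visit to a redirect domain such as example.biz would not load the tag at all, which is the same effect as the GTM trigger described above.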
 
You could use JavaScript to make something similar but it's already there and free.

That doesn't sound right, these ghost visitors are never on my site.

Hmmm how does the GTM stop it if the spammers already have my GA4 ID?
 
That's why you need to verify your hostname for these visits in GA. If they don't match your domain then you're right, they aren't hitting your site.

The tag manager approach only works if they're actually hitting the script somehow. Which I'm not 100% sure they are.

Another alternative to get rid of this is to import your GA data into BigQuery and strip it out there with a query before passing it to your BI dashboard of choice.

I'm just starting to learn BigQuery as a workaround for the terrible UI in GA4. The data is actually really good, it's just hard to get it in a useful manner.
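
The filtering itself would normally live in the BigQuery query, but the logic is simple enough to sketch in JavaScript on rows that have already been exported. Everything here is an assumption for illustration: the flattened row shape (hostname, country, source), the allowlisted hostnames, and the stripGhostRows helper are hypothetical, and the real export uses nested fields that you would need to flatten first. The one blocked source is the 'news.grets.store' referrer named in the LinkedIn posts above.

```javascript
// Hypothetical allowlist and blocklist; replace with your own values.
const ALLOWED_HOSTNAMES = new Set(['example.com', 'www.example.com']);
const BLOCKED_SOURCES = new Set(['news.grets.store']);

// A row counts as ghost traffic when its hostname is not one of ours
// or when its source is a known spam referrer.
function isGhostRow(row) {
  const hostname = String(row.hostname || '').toLowerCase();
  const source = String(row.source || '').toLowerCase();
  return !ALLOWED_HOSTNAMES.has(hostname) || BLOCKED_SOURCES.has(source);
}

// Keep only the rows that do not look like ghost traffic.
function stripGhostRows(rows) {
  return rows.filter((row) => !isGhostRow(row));
}

// Made-up sample rows in the assumed flattened shape.
const sampleRows = [
  { hostname: 'example.com', country: 'United States', source: 'google' },
  { hostname: 'example.biz', country: 'Poland', source: 'news.grets.store' },
  { hostname: 'example.com', country: 'Poland', source: 'news.grets.store' },
  { hostname: 'www.example.com', country: 'Poland', source: '(direct)' },
];

const cleanRows = stripGhostRows(sampleRows);
console.log(cleanRows.length); // 2
```

Note the sketch filters on hostname and source rather than on country, so a real visitor from Poland (the last sample row) is kept.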
 
sHGZr2R.png

The hostnames do match my domain(s). However here are the last 30 days of CPU usage on the server:

iTNkEiB.png

That's not a server getting 3,300+ visitors in that time frame. CPU barely goes over 1%.
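
A more direct check than CPU usage is to count distinct visitor IPs per day in the web server's access log and compare that with what GA4 reports. The sketch below assumes the common Apache/nginx "combined" log format and only counts successful (2xx) requests; the uniqueVisitorsPerDay helper, the sample lines, and the log path in the comment are made up for illustration.

```javascript
// Matches the start of a combined-format line:
// IP, identity, user, [day:time zone], "METHOD path protocol", status
const LOG_LINE = /^(\S+) \S+ \S+ \[(\d{2}\/[A-Za-z]{3}\/\d{4}):[^\]]*\] "(?:GET|POST|HEAD) (\S+)[^"]*" (\d{3})/;

// Count distinct client IPs per day among successful (2xx) requests.
function uniqueVisitorsPerDay(lines) {
  const ipsByDay = new Map();
  for (const line of lines) {
    const match = LOG_LINE.exec(line);
    if (!match) continue; // skip lines that are not in the expected format
    const [, ip, day, , status] = match;
    if (!status.startsWith('2')) continue;
    if (!ipsByDay.has(day)) ipsByDay.set(day, new Set());
    ipsByDay.get(day).add(ip);
  }
  const counts = {};
  for (const [day, ips] of ipsByDay) counts[day] = ips.size;
  return counts;
}

// Made-up sample lines; in practice you would read your own log, e.g.
// require('fs').readFileSync('/var/log/nginx/access.log', 'utf8').split('\n')
// (that path is an assumption and varies by server).
const sampleLines = [
  '203.0.113.5 - - [10/May/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
  '203.0.113.5 - - [10/May/2024:10:00:05 +0000] "GET /about HTTP/1.1" 200 734 "-" "Mozilla/5.0"',
  '198.51.100.7 - - [10/May/2024:11:30:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
  '198.51.100.9 - - [10/May/2024:12:00:00 +0000] "GET /missing HTTP/1.1" 404 153 "-" "Mozilla/5.0"',
  '203.0.113.5 - - [11/May/2024:09:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
];

const visitorCounts = uniqueVisitorsPerDay(sampleLines);
console.log(visitorCounts); // { '10/May/2024': 2, '11/May/2024': 1 }
```

If GA4 shows hundreds of Poland visitors on a day when the log shows a few dozen distinct IPs, that is a strong hint the hits never reached the server.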

So they are using my GA4 ID, posting it at some other location, and then visiting my site themselves with a bot that can render JavaScript.

They are using something like PhantomJS to trigger the Google Analytics tag: Testing Google Analytics with PhantomJS

If that guy is able to trigger GA4, then anyone that steals some GA4 IDs can trigger it and send referral spam anywhere.

This sounds stupid, but if someone wrote up a script that triggers this spam and open sourced it, perhaps then Google would be forced to block this spam by restricting which domains can trigger your GA4 ID.

It's not even that hard to figure out. Grab a ton of GA4 IDs, then load up their GA4 script in rotation on a random site, and have a headless browser like PhantomJS keep tapping it over and over. That's it - now everyone knows how they are referral spamming.

In any case blocking the IP Address of 38.XXX.XXX.XXX by tagging it as internal got today's Poland traffic down to 44 versus 333 yesterday. PERHAPS there is some significant delay in GA4 and I may have finally defeated it. But it's going to be a constant cat and mouse game until Google allows us to restrict what domains/IP Addresses can trigger our GA4 IDs. Not some random spammer in the deep web.

This whole thing is ridiculous.
 
The hostnames you shared are similar to what I was seeing, that it's hitting redirected domains. In my case I don't have those alternate domains defined in GA4 so I would think it had to have come across them by actually hitting them and not just pinging my GA4 ID.
 
The cloud version is $9 and $19 a month. That's steep?

Yes, $20 is steep. It's the death by a thousand cuts of monthly subs.

It's incredible how easily you spend hundreds and thousands on small tools and such if you're not careful.

Of course $20 is not much in itself, but I also don't use it much. In my use case it's mostly a glorified log visualizer.
 