Your Google Analytics Is Lying to You

May 2026 - 14 min read

You open GA4, and the numbers look great. Traffic is up. Sessions are climbing. Then you dig deeper and realize a significant chunk of what you're looking at is not human. Bots, scrapers, and spam scripts are inflating your data, distorting your decisions, and costing you money. Here is exactly what is happening, and a step-by-step playbook to clean your analytics up.

The Scope of the Problem

Bot traffic is not a niche concern for large enterprises. It affects every website with a tracking pixel.

Cloudflare, one of the largest internet infrastructure providers, estimates that bots currently account for more than 31% of all internet traffic globally. Imperva, a global cybersecurity firm, reported that in 2024, automated internet traffic surpassed human traffic for the first time, accounting for 51% of all web activity.

That does not mean half your GA4 sessions are fake. Google automatically filters known bots using the IAB/ABC International Spiders & Bots List, and any GA4 property created after July 2021 has this enabled by default. But Google's own documentation acknowledges this filtering is limited and cautious — it removes what it recognizes, not what it doesn't.

And modern bots are increasingly hard to recognize.

What Types of Bot Traffic Are Hitting Your GA4 Property

Not all non-human traffic is the same. Understanding the categories helps you target the right fix.

Crawlers and indexing bots: Search engine crawlers (Googlebot, Bingbot) and well-behaved third-party crawlers identify themselves and are largely filtered out automatically. These are not the problem.

Scrapers and competitive intelligence bots: Automated scripts that extract content, pricing, or product data from your site. They may generate sessions and pageviews that look superficially real.

Referral spam: Fake visits that appear in your acquisition reports under suspicious source domains. Some of these bots never actually visit your site — they send hits directly to your GA4 Measurement Protocol endpoint to get you to notice (and click through to) their domain in your reports.

Ghost traffic: Server-side spoofing that sends Measurement Protocol hits directly to GA4 without any page load or browser interaction. No real visit ever occurred.
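
To make the mechanism concrete, here is a minimal sketch of what a Measurement Protocol hit looks like; the measurement ID, API secret, client ID, and page URL are all placeholders. Nothing in this request involves a browser, a page load, or your website:

```ts
// Hypothetical sketch of a GA4 Measurement Protocol hit sent entirely
// server-side. G-XXXXXXX and YOUR_API_SECRET are placeholders.
// A hit like this registers as a "session" with no page load and
// zero engagement time -- no real visit ever occurred.
const MEASUREMENT_ID = 'G-XXXXXXX';   // placeholder
const API_SECRET = 'YOUR_API_SECRET'; // placeholder

async function sendServerSideHit(): Promise<void> {
  await fetch(
    `https://www.google-analytics.com/mp/collect?measurement_id=${MEASUREMENT_ID}&api_secret=${API_SECRET}`,
    {
      method: 'POST',
      body: JSON.stringify({
        client_id: 'any-arbitrary-string', // no real browser identity required
        events: [
          {
            name: 'page_view',
            params: { page_location: 'https://example.com/fake-page' },
          },
        ],
      }),
    },
  );
}
```

This is also why the hostname check described below is effective: a hit that never loaded your page carries only whatever page_location the sender chose, which is how stray hostnames end up in your reports.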

Residential proxy bots: The most sophisticated category. These bots route through residential IP addresses, mimicking real user locations. They can simulate mouse movements, scroll events, and click patterns specifically to bypass standard detection. Data from 2025 indicates that a significant portion of non-human traffic now successfully mimics human mouse movements.

Geographic bot waves: Coordinated campaigns targeting analytics properties from specific regions. In late 2025 and early 2026, a widely-reported wave of bot traffic from China and Singapore hit thousands of GA4 properties simultaneously, bypassing standard exclusion filters and inflating session counts while tanking engagement metrics.

How to Spot Bot Traffic in Your GA4 Reports

Before you can filter anything, you need to know what you're looking for. These are the most reliable diagnostic signals.

1. Suspicious sessions in the Traffic Acquisition report

Go to Reports > Acquisition > Traffic Acquisition. Look for:

  • Source/medium combinations you don't recognize
  • High session volumes with 0% engagement rate
  • Session duration of 0 seconds across an entire traffic source
  • Sudden spikes with no corresponding marketing activity
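
If you want to script this check rather than eyeball the report, the GA4 Data API exposes the same numbers. A minimal sketch using the official Node client (@google-analytics/data); the property ID and both thresholds are placeholders for your own values, and Application Default Credentials are assumed to be configured:

```ts
import { BetaAnalyticsDataClient } from '@google-analytics/data';

// Pull sessions and engagement rate per source/medium for the last 28
// days, then flag sources whose engagement rate is effectively zero.
// 'properties/123456789' and the thresholds below are placeholders.
async function flagSuspiciousSources(): Promise<void> {
  const client = new BetaAnalyticsDataClient(); // Application Default Credentials
  const [response] = await client.runReport({
    property: 'properties/123456789',
    dateRanges: [{ startDate: '28daysAgo', endDate: 'today' }],
    dimensions: [{ name: 'sessionSourceMedium' }],
    metrics: [{ name: 'sessions' }, { name: 'engagementRate' }],
  });

  for (const row of response.rows ?? []) {
    const source = row.dimensionValues?.[0]?.value;
    const sessions = Number(row.metricValues?.[0]?.value);
    const engagementRate = Number(row.metricValues?.[1]?.value);
    // High volume plus near-zero engagement is the classic bot signature.
    if (sessions > 100 && engagementRate < 0.01) {
      console.log(`Suspicious: ${source} (${sessions} sessions, engagement ${engagementRate})`);
    }
  }
}

flagSuspiciousSources().catch(console.error);
```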

2. Anomalous geographic patterns

Go to Reports > User Attributes > Demographic Details, then view by Country or City. Look for:

  • Traffic spikes from countries that are not part of your target market
  • High volumes from cities known for data center infrastructure: Ashburn (Virginia), Boardman (Oregon), Des Moines (Iowa), Clifton (New Jersey), or equivalent hubs in other regions
  • Any country suddenly representing a disproportionate share of your traffic without explanation

3. Unusual hostname data (ghost traffic indicator)

This one catches ghost traffic that never touched your site. Go to Explore > Free Form, add 'Hostname' as a dimension and 'Sessions' as a metric. If you see hostnames other than your own domain or known third-party services (like a payment portal), you are looking at Measurement Protocol abuse.

4. Device and browser anomalies

Bots often run on headless browsers or outdated browser versions. In GA4 Explore reports, check for:

  • Browser versions far behind current stable releases
  • Screen resolutions of 0x0 or extremely unusual dimensions (800x600 is a common flag in 2025/2026 given current device norms)
  • Disproportionate volumes from a single device/OS combination

5. Behavioural pattern checks in User Explorer

Go to Explore and open a User explorer exploration to examine individual user paths. A bot fingerprint looks like: a single user generating dozens or hundreds of sessions in a 24-hour period with zero scroll depth, no clicks, no form interactions, and single-page visits landing on 404 pages. No human behaves this way at that velocity.
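
If you have the BigQuery export enabled, this fingerprint is straightforward to query directly. A sketch against the standard GA4 export schema; the project and dataset names are placeholders, and the 50-session threshold is illustrative rather than a recommended value:

```ts
import { BigQuery } from '@google-cloud/bigquery';

// Find users who generated an implausible number of sessions yesterday
// with zero recorded engagement time -- the bot fingerprint described
// above. `my-project.analytics_123456` is a placeholder for your own
// GA4 export dataset.
const sql = `
  SELECT
    user_pseudo_id,
    COUNT(DISTINCT (SELECT value.int_value
                    FROM UNNEST(event_params)
                    WHERE key = 'ga_session_id')) AS sessions,
    IFNULL(SUM((SELECT value.int_value
                FROM UNNEST(event_params)
                WHERE key = 'engagement_time_msec')), 0) AS engagement_msec
  FROM \`my-project.analytics_123456.events_*\`
  WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  GROUP BY user_pseudo_id
  HAVING sessions >= 50 AND engagement_msec = 0
  ORDER BY sessions DESC`;

async function findBotFingerprints(): Promise<void> {
  const [rows] = await new BigQuery().query({ query: sql });
  for (const row of rows) {
    console.log(`${row.user_pseudo_id}: ${row.sessions} sessions, zero engagement`);
  }
}

findBotFingerprints().catch(console.error);
```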

Bot Signal Diagnostic Cheat Sheet

| Signal | Where to Find It in GA4 | What It Indicates |
| --- | --- | --- |
| 0-second sessions | Traffic Acquisition > Engagement rate | Bot or ghost traffic |
| 0% engagement rate by source | Traffic Acquisition report | Non-human source |
| Unknown hostnames | Explore > Free Form > Hostname dimension | Ghost/Measurement Protocol abuse |
| Suspicious city spikes | Demographic Details > City | Data center or coordinated bot wave |
| Spammy referrer domains | Traffic Acquisition > Session source/medium | Referral spam |
| Single page, no scroll/click | User Explorer in Explore | Crawler or scraper |
| Outdated browser/0x0 resolution | Explore > Device + Browser dimensions | Headless browser bot |
| Traffic spike without campaign | Overview > All traffic | Bot wave or attack |

How to Filter and Remove Bot Traffic in GA4

GA4 gives you fewer built-in filter controls than Universal Analytics did. Here is what is actually available to you, in order from easiest to most robust.

Step 1: Confirm Google's built-in bot filtering is active

GA4 properties created after July 2021 have this enabled by default, but it is worth verifying.

  1. Click the Admin gear icon (bottom-left).
  2. Under Property, click Data Streams and select your web data stream.
  3. Scroll to the Google tag section and click Configure tag settings.
  4. Click Show all to expand options.
  5. Click Data Filters. You should see a filter named 'Bot traffic' set to Active.

If it shows Testing or is absent, activate it. This filter uses the IAB/ABC International Spiders & Bots List and handles known, well-documented bot types automatically.

Step 2: Filter your own internal traffic

Your team's visits are not bot traffic, but they contaminate data in the same way — skewing engagement metrics, inflating sessions, and distorting conversion rates. This is a separate filter you must configure manually.

Find your IP address: Search 'what's my IP' in Google. Write it down. (This only works for stable IP addresses; if your team is remote or uses VPNs, see Step 5.)

  1. In Admin, click Data Streams and select your web data stream.
  2. Click Configure tag settings > Show all > Define Internal Traffic.
  3. Click Create. Enter a Rule Name (e.g., 'Office IP'), set match type to 'IP address equals', and enter your IP.
  4. Click Create to save the rule.
  5. Go back to Admin > Data Settings > Data Filters.
  6. Find the 'Internal Traffic' filter (it defaults to Testing mode).
  7. Click the three-dot menu on the right and select Activate filter. Confirm the permanent activation warning.

Step 3: Exclude referral spam sources

If specific spam domains are showing up repeatedly in your acquisition reports, you can exclude them from attribution.

  1. Go to Admin > Data Streams > your data stream > Configure tag settings > Show all.
  2. Select List unwanted referrals.
  3. Add the spam domains you have identified. GA4 will reclassify sessions from these sources as Direct rather than giving them referral credit.

This does not remove the sessions from your data; it just strips the attribution so they stop inflating your referral channel. For complete removal from your reporting views, use the Exploration-based segment approach described in Step 4.

Step 4: Use Explorations to build clean reporting segments

Since GA4 cannot retroactively delete data, the most reliable way to work with clean numbers is to build a segment in Explorations that excludes known bot signatures. This does not alter raw data — it gives you a filtered view for analysis.

In Explore > Free Form, build a segment that excludes:

  • Sessions with 0 engagement time
  • Sessions from identified spam source/medium combinations
  • Sessions from flagged geographic locations (specific cities or countries confirmed as bot sources)
  • Sessions with suspicious hostnames

Save this segment and use it as your standard view when analysing performance. This approach is safe, reversible, and does not risk data loss.
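
If you also report through the Data API, the same exclusions can be expressed as a dimension filter, keeping scripted reports consistent with your Exploration segment. A sketch; the property ID, hostnames, and spam domains are placeholders for whatever your own audit turned up (filtered dimensions must also appear in the request, hence the extra dimensions):

```ts
import { BetaAnalyticsDataClient } from '@google-analytics/data';

// A "clean view" report: sessions by landing page, excluding flagged
// hostnames and spam sources. All IDs and domain lists are placeholders.
async function runCleanReport(): Promise<void> {
  const client = new BetaAnalyticsDataClient();
  const [response] = await client.runReport({
    property: 'properties/123456789',
    dateRanges: [{ startDate: '28daysAgo', endDate: 'today' }],
    // Dimensions used in the filter are also requested.
    dimensions: [{ name: 'landingPage' }, { name: 'hostName' }, { name: 'sessionSource' }],
    metrics: [{ name: 'sessions' }, { name: 'engagementRate' }],
    dimensionFilter: {
      andGroup: {
        expressions: [
          {
            notExpression: {
              filter: {
                fieldName: 'hostName',
                inListFilter: { values: ['ghost.example.com'] }, // flagged hostnames
              },
            },
          },
          {
            notExpression: {
              filter: {
                fieldName: 'sessionSource',
                inListFilter: { values: ['spam-domain.xyz', 'bot-referrer.top'] }, // spam sources
              },
            },
          },
        ],
      },
    },
  });
  console.log(`${response.rows?.length ?? 0} rows of filtered traffic`);
}

runCleanReport().catch(console.error);
```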

Step 5: For advanced control, use Google Tag Manager

IP-based filters break down when teams are remote, use VPNs, or operate across multiple locations. A GTM-based approach is more robust because the exclusion signal comes from your own logic, not a network address.

The method: push a custom data layer variable (e.g., is_internal = true) when an internal user is present — identified by a URL parameter, first-party cookie, or login state — and use GTM to set the GA4 traffic_type parameter to 'internal'. The existing internal traffic data filter then catches and excludes those hits.
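
A sketch of the client-side half, assuming a hypothetical first-party cookie named internal_user and a URL parameter named internal; the GTM side is then a Data Layer Variable reading is_internal and a traffic_type parameter set to 'internal' on your GA4 tag:

```ts
// Client-side half of the GTM approach described above. The cookie name
// (internal_user) and URL parameter (internal) are hypothetical; use
// whatever signal your team already has (login state, an SSO claim, etc.).
declare global {
  interface Window { dataLayer: Record<string, unknown>[]; }
}

window.dataLayer = window.dataLayer || [];

const isInternal =
  document.cookie.split('; ').includes('internal_user=1') ||
  new URLSearchParams(window.location.search).get('internal') === '1';

if (isInternal) {
  // Persist the flag so the user stays marked on future visits.
  document.cookie = 'internal_user=1; path=/; max-age=31536000; SameSite=Lax';
  // GTM reads is_internal via a Data Layer Variable and sets the GA4
  // traffic_type parameter to 'internal'; the Internal Traffic data
  // filter then excludes these hits.
  window.dataLayer.push({ event: 'internal_traffic', is_internal: true });
}

export {}; // keeps this file a module so the global augmentation applies
```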

This requires GTM editor access and some configuration work. If your team has a developer or analytics specialist, this is the most reliable long-term solution, particularly for organisations with remote workforces.

Beyond GA4: Upstream Prevention

Filtering is reactive — it cleans data after bots have already hit your site. For teams dealing with significant or recurring bot problems, upstream prevention reduces the load entirely.

Web Application Firewall (WAF): Services like Cloudflare, AWS WAF, or Sucuri can identify and block bot traffic at the network level before it ever reaches your site or your analytics. Cloudflare's free tier includes basic bot mitigation. Enterprise plans add behavioural analysis and IP reputation scoring.

CAPTCHA and challenge pages: reCAPTCHA v3 runs invisibly and scores users by behaviour without a challenge UI. High-value actions (form submissions, account creation) can be gated by score threshold, and reCAPTCHA v3 scores can be passed directly into GA4 as custom parameters, letting you weight traffic quality in your reporting.
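
A sketch of that handoff, assuming a hypothetical backend route (/api/verify-recaptcha) that calls Google's siteverify endpoint and returns the score; the site key, route, and event name are all placeholders:

```ts
// Obtain a reCAPTCHA v3 token on a high-value action, have the backend
// verify it (scores are only available server-side via Google's
// siteverify API), then attach the score to a GA4 event as a custom
// parameter. RECAPTCHA_SITE_KEY and /api/verify-recaptcha are placeholders.
declare const grecaptcha: {
  ready(cb: () => void): void;
  execute(siteKey: string, opts: { action: string }): Promise<string>;
};
declare function gtag(...args: unknown[]): void;

const RECAPTCHA_SITE_KEY = 'RECAPTCHA_SITE_KEY'; // placeholder

function scoreFormSubmit(): void {
  grecaptcha.ready(async () => {
    const token = await grecaptcha.execute(RECAPTCHA_SITE_KEY, { action: 'form_submit' });

    // Hypothetical backend route that verifies the token and returns
    // { score: number } from Google's verification response.
    const res = await fetch('/api/verify-recaptcha', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ token }),
    });
    const { score } = await res.json();

    // Weight traffic quality in reporting by sending the score into GA4.
    gtag('event', 'form_submit', { recaptcha_score: score });
  });
}
```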

Server-side tagging: Moving GA4 event collection to a server-side GTM container removes your tracking code from the browser entirely, making it significantly harder for bots to trigger events by intercepting your JavaScript. This also improves privacy compliance and data accuracy generally.

What to Do When You Can't Delete the Bad Data

GA4 does not let you delete historical data. If you have weeks or months of bot-inflated sessions already in your property, you have two realistic options:

Option 1: Use segments for all analysis going forward. Document the period when bot traffic was present and apply exclusion segments whenever you analyse data from that window. Annotate your reports so future team members understand the data quality issue.

Option 2: Create a new, clean GA4 property. For severe contamination, some teams start fresh with a new property — properly configured with all filters active from day one — and treat historical data as unreliable. This is a significant reset but gives you a clean baseline going forward.

What not to do: Don't make strategic decisions based on traffic volume numbers from a period you know was contaminated. Prioritize engagement rate, conversion rate, and goal completions from verified human segments rather than raw session counts.

Ongoing Monitoring: Make This a Routine

Bot traffic is not a one-time problem you solve and move on from. New bot waves emerge regularly. The 2025/2026 China and Singapore campaign is one example of a coordinated attack that hit thousands of GA4 properties simultaneously. Building a monitoring habit is the only reliable defence.

Weekly: Check the Traffic Acquisition report for new unfamiliar source/medium combinations and engagement rate anomalies.

Monthly: Review geographic distribution and flag any country or city spikes that don't align with your marketing activity.

After any major traffic spike: Before celebrating, open User Explorer and the Hostname report to validate that the traffic is human. Spikes that don't correspond to a campaign, a PR mention, or a known event are always worth investigating.

Clean Data Is a Design Decision

Your analytics are only as trustworthy as the data going into them. When bot traffic inflates your session counts, it corrupts every downstream decision: the pages you prioritize, the campaigns you scale, the UX changes you justify.

Most teams spend significant effort interpreting their analytics. Very few invest the same effort in validating them first. Getting your GA4 property properly configured — bot filter confirmed, internal traffic excluded, referral spam removed, and a clean segment set up for analysis — takes a few hours. The decisions it protects are worth significantly more.