Peak Bots: Is your site ready?

No, this isn’t just clickbait. In my work with very large retailers, I have seen how the threat/annoyance of bots has become an everyday topic.

[DISCLOSURE: I work at Akamai, primarily analyzing data from mPulse RUM Service.]

Last year, I started investigating the effects of bots on Web Performance data, and how I could help customers remove the noise of this traffic from the RUM data they rely on, so that they could get a true sense of their actual performance.

A lot of this work also occurred in parallel with the same organizations working with security products and services to try to eliminate and respond to ever-changing bot traffic.

When I began my analysis of the effects of bot traffic on RUM data in early 2025, the logical starting point was the data that was easiest to remove: traffic from ASNs of known Hosting & Cloud Providers, whose services make it easy to run scripted Real Browser Bots for purposes that are banal (synthetic web performance measurement and other testing), annoying (price scrapers and other site-information collection services), or malicious (fraud, DDoS, etc.).

[NOTE: The data I work with excludes stupid bots: bots that do not run a headless browser but are purely code-based, usually built on the HTTP library of whatever language they were developed in. Simple rule: no JS execution, no data for me to work with.]
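
To make that first pass concrete, here is a minimal sketch of ASN-based filtering, assuming a pandas DataFrame of RUM beacons. The column names and the ASN list are hypothetical placeholders, not mPulse's actual schema; a real deployment would use your own beacon fields and a maintained list of hosting and cloud provider ASNs.

```python
import pandas as pd

# Hypothetical list of hosting & cloud provider ASNs (e.g. AS16509/AS14618
# Amazon, AS8075 Microsoft, AS396982 Google Cloud, AS14061 DigitalOcean).
# A real deployment would use a maintained feed, not a hard-coded set.
HOSTING_CLOUD_ASNS = {16509, 14618, 8075, 396982, 14061}

def split_hosting_traffic(beacons: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split beacons into (likely human, likely bot) based on origin ASN."""
    is_hosting = beacons["asn"].isin(HOSTING_CLOUD_ASNS)
    return beacons[~is_hosting], beacons[is_hosting]

# Usage: humans, bots = split_hosting_traffic(beacons)
```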

In general, a lot of the major retailers I work with saw that 15-20% of their daily traffic originated from Real Browser Bots. At times within a day, this could spike to over 50%.

Once I removed those from the data, I went back and added another set of bot signals: old browser versions. The Cloud & Hosting Providers often overlapped with old browser versions, but there were always a few cases where I would see massive bursts of traffic from retail ISPs that should not be sending that level of traffic from a truly ancient version of Chrome.
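
Here is a sketch of that second signal, under the same hypothetical schema. The cutoff values are illustrative only; in practice the threshold would track how far behind the current release an auto-updating browser could plausibly be.

```python
import pandas as pd

# Hypothetical cutoffs: for an auto-updating browser, a major version this
# far behind current is suspicious. The numbers are illustrative only.
OLD_VERSION_CUTOFFS = {"Chrome": 110, "Edge": 110, "Firefox": 100}

def flag_old_browsers(beacons: pd.DataFrame) -> pd.Series:
    """Mark beacons whose parsed browser major version is below the cutoff."""
    cutoffs = beacons["browser_name"].map(OLD_VERSION_CUTOFFS)
    # Browsers missing from the table map to NaN, and NaN comparisons are
    # False, so unknown browsers are simply left unflagged.
    return beacons["browser_major"] < cutoffs
```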

When I factored in the Real Browser Bots running from retail ISPs identifying as old browser versions, another 3-7% of daily traffic could be flagged as being from non-human sources.

But then I got to the population of bots that is hardest to detect and mitigate: Shadow Bots. These are bots running modern browser versions from retail ISPs, in volumes and with performance characteristics that make their visits stand out against the rest of the ISP's traffic.

An example of this would be a segment of modern Chrome users on Comcast showing performance data that is completely out of line with other Comcast users, either in general or for their region.

Shadow Bots are very difficult to control or filter within RUM data, and the detection and mitigation effort falls to specialty security services that can fingerprint and identify specific cohorts of traffic that are not real users. I am still experimenting with ways to flag this data, but none have been completely successful (yet).
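
For illustration, here is a minimal sketch of the kind of cohort comparison involved, using a robust z-score (based on the median absolute deviation) over per-cohort median load times. The schema is again hypothetical, and this is one possible experiment, not a working detector.

```python
import numpy as np
import pandas as pd

def shadow_bot_candidates(beacons: pd.DataFrame, threshold: float = 3.5) -> pd.DataFrame:
    """Flag (asn, browser_major) cohorts whose median page load time is a
    robust-z-score outlier against the other cohorts in the same ASN."""
    # Median load time for each (ASN, browser version) cohort.
    cohort_median = (
        beacons.groupby(["asn", "browser_major"])["page_load_time"]
        .median()
        .rename("cohort_median")
    )
    # Robust baseline per ASN: median and MAD of the cohort medians.
    asn_median = cohort_median.groupby("asn").transform("median")
    mad = (cohort_median - asn_median).abs().groupby("asn").transform("median")
    robust_z = 0.6745 * (cohort_median - asn_median) / mad.replace(0, np.nan)
    result = cohort_median.to_frame().assign(robust_z=robust_z)
    return result[result["robust_z"].abs() > threshold]
```

The MAD-based score is deliberate: a large bot cohort can drag the mean and variance of its own ISP's traffic, hiding itself from a plain standard-deviation test.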

By this point, you are thinking that RUM should be changed to BUM (you’re smart; you can figure it out).

Now, there are the AI/Agentic Bots, and these will be much harder to deal with. Some of this traffic will be valid and will need to be let through (real visitors initiating real transactions through their new Agents); however, a significant percentage of AI-driven agent traffic will likely come from malicious actors performing tasks that they previously did through browsers.

A number of reports on the effects of Agentic Bot traffic are out, and I suggest you review them:

  • Akamai: Publishing Industry Under Attack: Global AI Bot Activity Surges by 300%, Akamai Report Finds
  • Imperva: Bad Bot Report 2026: The Internet Is No Longer Human and It’s Changing How Business Works
  • Human Security: The 2026 State of AI Traffic & Cyberthreat Benchmark Report

The future of Bot Management will be complex. What to allow and what to deny will become even harder than it is today. In fact, I foresee a future in which the very concept of the Web, as we know it today, will fade away, replaced with personalized agents performing scheduled and ad-hoc tasks by interacting with other agents, APIs, and bots.

At that point, how do you measure performance?

And who is the Real User in RUM?
