OAPEN Blog

Silke Davison ·

Traffic management and bot protection for the OAPEN Library and DOAB: Implementing Cloudflare and Anubis 


Over the past few years, the nature of internet traffic has shifted fundamentally. What was once a predominantly human-driven web is now increasingly shaped by automated activity. Recent data from CERN indicates that bots now generate the majority of web requests, accounting for roughly 60-80% of traffic and surpassing human traffic for the first time. We see the same pattern for the OAPEN Library and the Directory of Open Access Books (DOAB).

This shift reflects a structural change in how the web operates. Alongside traditional search engine crawlers, there is a rapid rise of AI-driven agents, systems that browse, extract, and process content on behalf of users or organisations. These agents can amplify demand significantly, as a single user query may trigger hundreds or thousands of automated requests across multiple sites.  

For organisations maintaining open infrastructure and publicly accessible content, this new reality introduces complex and often competing responsibilities: ensuring availability, managing costs, protecting resources, and maintaining openness. Platforms such as the OAPEN Library and DOAB are increasingly affected as stability and access come under pressure (see this blog post).

Why traffic management is now essential 

The growth of automated traffic is not only a technical issue; it has direct operational consequences.

Bots can consume large amounts of bandwidth and computing resources, often without delivering comparable value in user engagement. In some cases, the imbalance between requests and actual human use becomes extreme, straining infrastructure and distorting usage metrics. 

For services with limited resources and a commitment to openness, like the OAPEN Library and DOAB, this introduces real risks, including degraded performance, higher operational costs, and reduced reliability for human users. 

In this context, traffic management is no longer optional; it is essential to maintain stable and equitable access.

Not all bots are the same 

Automated traffic is not inherently harmful. In practice, the OAPEN Library and DOAB interact with a wide range of bots, and only some of them are malicious or abusive ones engaged in scraping, data harvesting, or disruption. The legitimate ones include:

  • Search engine crawlers that support discoverability 
  • Scholarly and indexing services that enable research and metadata aggregation 
  • AI agents used by researchers, institutions, and emerging digital tools.

Many of these bots behave similarly at a technical level, making it difficult to distinguish between legitimate and abusive traffic. Blocking indiscriminately would directly affect both accessibility and visibility.

For infrastructures supporting open science and scholarly communication, this creates a delicate balance: protecting systems without undermining the mechanisms that make content discoverable and reusable. 

Implementing Cloudflare and Anubis 

To address these challenges, we are adopting a layered approach to traffic management using Cloudflare and Anubis, with a focus on stability, availability, and performance under high load. 

DOAB: Cloudflare for traffic management 

DOAB currently uses Cloudflare to manage high traffic and improve availability. Cloudflare sits between users and our servers, filtering traffic, accelerating content delivery, and blocking malicious requests before they reach the system. 

Its large-scale traffic analysis and bot classification capabilities allow us to better understand incoming requests and differentiate between traffic types.  

Key characteristics 

  • Requests are filtered and distributed to reduce system overload 
  • High-frequency requests can trigger rate limiting or challenge mechanisms (e.g., CAPTCHA) 
  • High-load endpoints such as search are protected 
  • Libraries and trusted IP ranges can be whitelisted.
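To illustrate the rate-limiting idea in the list above, here is a minimal sketch of a sliding-window limiter. It is not Cloudflare's implementation or our configuration: the class name, the client key, and the limit of 2 requests per 10 seconds in the demo are assumptions chosen purely for illustration.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Per-client timestamps of recently accepted requests.
        self._hits = defaultdict(deque)

    def allow(self, client: str, now: float) -> bool:
        """Return True if the request may pass, False if it should be limited or challenged."""
        hits = self._hits[client]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    # Hypothetical limit: 2 requests per 10 seconds for each client.
    limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0, 10.0):
        print(t, limiter.allow("client-a", now=t))
```

A request that is refused here would, in a real deployment, be answered with a rate-limit response or a challenge rather than silently dropped. Passing the timestamp in explicitly keeps the sketch easy to test.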

Impact 

The impact has been immediate and visible: 

  • Most requests are now properly resolved 
  • Errors related to system overload have been significantly reduced 
  • Previously frequent “503 – service unavailable” responses have become almost non-existent 
  • Overall, availability has improved, even during peak traffic.

By managing incoming traffic at scale, Cloudflare helps protect DOAB from overload while maintaining performance for users. 

OAPEN: Anubis in production 

Anubis is deployed in the OAPEN Library to distinguish between human users and automated traffic. It is an open-source anti-bot tool that uses a proof-of-work challenge: visitors must complete a lightweight computational task that is cheap for an individual user but costly for bots operating at scale.
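The proof-of-work idea can be sketched in a few lines. What follows illustrates a generic hash-based scheme, not Anubis's actual code or parameters: the SHA-256 leading-zero rule, the difficulty of 3 hex digits, and all function names are assumptions made for this example.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: issue a random, unpredictable challenge string."""
    return secrets.token_hex(16)


def _digest(challenge: str, nonce: int) -> str:
    """Hash the challenge together with a candidate nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: try nonces until the hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while not _digest(challenge, nonce).startswith(prefix):
        nonce += 1
    return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the submitted nonce."""
    return _digest(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty=3)  # Hypothetical difficulty.
    print(nonce, verify(challenge, nonce, difficulty=3))
```

The asymmetry is the point: solving takes about 16**difficulty hash attempts on average, while verifying takes one. For a single visitor that cost is negligible, but for a crawler issuing very large numbers of requests it adds up.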

Key characteristics 

  • Users receive a challenge in the browser. Once it is solved, a cookie confirming their status is set for one week, which reduces repeated challenges (see How Anubis works).
  • Libraries and trusted partners can be whitelisted to avoid disruption 
  • Support for Google Scholar and the Internet Archive has been ensured, so indexing and archiving continue seamlessly. 
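The one-week cookie mentioned in the list above can be pictured as a signed token with an expiry time. The sketch below shows one common way to build such a token and is not Anubis's actual cookie format: the HMAC-SHA256 signature, the "<expiry>.<signature>" layout, the secret key, and the function names are all assumptions for illustration.

```python
import hashlib
import hmac

ONE_WEEK_SECONDS = 7 * 24 * 60 * 60


def issue_token(secret: bytes, now: int, lifetime: int = ONE_WEEK_SECONDS) -> str:
    """Return '<expiry>.<signature>' for a visitor who solved the challenge."""
    expiry = str(now + lifetime)
    signature = hmac.new(secret, expiry.encode(), hashlib.sha256).hexdigest()
    return f"{expiry}.{signature}"


def is_valid(secret: bytes, token: str, now: int) -> bool:
    """Accept the token only if the signature matches and it has not expired."""
    expiry, sep, signature = token.partition(".")
    if not sep or not (expiry.isascii() and expiry.isdigit()):
        return False
    expected = hmac.new(secret, expiry.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    signature_ok = hmac.compare_digest(signature.encode(), expected.encode())
    return signature_ok and now < int(expiry)


if __name__ == "__main__":
    secret = b"hypothetical-server-secret"
    token = issue_token(secret, now=0)
    print(is_valid(secret, token, now=ONE_WEEK_SECONDS - 1))
    print(is_valid(secret, token, now=ONE_WEEK_SECONDS))
```

Because the expiry is covered by the signature, a client cannot extend its own cookie by editing the timestamp. Once the week has passed, the visitor is simply challenged again.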

Impact 

The introduction of Anubis has already led to: 

  • Improved uptime and responsiveness 
  • Better handling of peak traffic 
  • Reduced pressure on backend systems  

We continue to monitor its effect on user access and usage data, which so far remain unaffected, including for OAI-PMH and REST functions.

Protecting access for “good” bots 

A key priority is ensuring that protective measures do not unintentionally block useful automated traffic. 

Many bots play a vital role in the research landscape, including search engine indexing, scholarly aggregators, repository synchronisation systems, and AI tools supporting scientific workflows. These “good” bots underpin discovery, interoperability, and reuse, all core principles of open science. 

At the same time, the boundary between beneficial and harmful automation is increasingly blurred. Some AI crawlers, while not malicious, can still generate significant load and extract large volumes of content without returning value. 

Managing bot traffic therefore requires nuance, not blanket blocking. It involves allowing trusted and verified bots, monitoring emerging agents, and adapting policies as needed. 
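As a rough illustration of allowing trusted traffic before any challenge is applied, the sketch below checks a client address against an allowlist of network ranges. It is a simplified assumption, not our production rule set: the partner labels are hypothetical, and the ranges are reserved documentation addresses rather than real partner networks.

```python
import ipaddress

# Hypothetical allowlist using reserved documentation ranges, not real partner addresses.
TRUSTED_NETWORKS = {
    "example-library": ipaddress.ip_network("192.0.2.0/24"),
    "example-indexer": ipaddress.ip_network("2001:db8::/32"),
}


def decide(client_ip: str) -> str:
    """Return 'allow' for trusted ranges and 'challenge' for everything else."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return "challenge"  # Unparseable address: treat as untrusted.
    for network in TRUSTED_NETWORKS.values():
        # An IPv4 address is never "in" an IPv6 network, and vice versa.
        if address in network:
            return "allow"
    return "challenge"


if __name__ == "__main__":
    for ip in ("192.0.2.17", "2001:db8::1", "198.51.100.5"):
        print(ip, decide(ip))
```

The sketch keys on network address because user-agent strings alone are easy to forge. Address ranges are only one signal, though, and the list has to be reviewed as partners and crawlers change.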

Supporting openness while ensuring sustainability 

Our objective is not to restrict access, but to ensure that access remains sustainable. By implementing Cloudflare for DOAB and Anubis for the OAPEN Library, we aim to: 

  • protect the stability and performance of our services
  • reduce the impact of abusive or excessive automated traffic
  • and preserve fair access for all users – human and machine alike. 

At the same time, we actively work to maintain openness by allowing legitimate crawlers and indexing services, reviewing and adjusting allowlists, and staying aligned with the needs of the research community.

Looking ahead 

Managing automated traffic is an ongoing process. For the OAPEN Library and DOAB, this involves: 

  • Fine-tuning challenge behaviour and rate limits 
  • Maintaining and updating whitelist rules for trusted partners 
  • Ensuring reliable access for essential services such as indexing and archiving 
  • Monitoring impacts on usage statistics and user experience 
  • Adapting approaches as traffic continues to grow.

The rapid rise of AI-driven traffic has fundamentally shifted the balance of the web. Responding requires not only technical solutions, but also a clear commitment to responsible infrastructure stewardship. 

This is part of a broader challenge for organisations supporting open knowledge: how to remain open, reliable, and sustainable in an environment where machines are no longer just participants, but the dominant users of the network. 

These measures provide a strong foundation for managing growing demand while keeping both platforms stable, accessible, and reliable. 

— Anna Wałek, Head of Technology, OAPEN Foundation

Questions or comments? Get in touch.