Event Calendars Are a Crawler Trap, and robots.txt Won’t Fix It

Web 321
June 23rd, 2026

Install The Events Calendar on a brochure site that runs three events a year, leave it alone for a month, then read your access logs. You’ll find that Googlebot and a half-dozen lesser crawlers have patiently walked the calendar backward toward 1970 and forward into the 2040s, one empty month at a time. The plugin didn’t do anything wrong. It shipped the same architectural decision every WordPress event calendar ships: a navigable date grid with no boundaries, rendered dynamically, linked into infinity.

That decision is a crawler trap. And most of the standard advice for dealing with it either does nothing for your server or actively makes the load worse. I want to walk through why the trap exists, why it’s a performance problem and not just an SEO one, why the usual fixes miss, and the approach we ended up building into a plugin to close it.

Where the infinite URLs come from

The Events Calendar (Liquid Web / StellarWP, formerly Modern Tribe) routes its archive through two query vars: eventDisplay for the view and eventDate for the date. So /events/month/2027-04/ is a real, resolvable URL, and so is 2127-04, and so is every month in between. Each view template renders previous/next links unconditionally, with no terminus, so a crawler that follows links never runs out of calendar to follow. Multiply that by the view set the plugin exposes (month, list, day, week, and the Pro photo and map views) and the day-level views that take a full YYYY-MM-DD, and the addressable space stops being a list and becomes a coordinate system.

The arithmetic is worse than it feels. A single century of month views across six view types is roughly 7,200 URLs. Add day-level views and you’re into the hundreds of thousands, all on a site that might hold a dozen real events. None of it is gated behind anything a crawler respects.
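To make that arithmetic concrete, here is a quick Python sketch. The 2000 to 2100 span is an arbitrary century picked for illustration, and the day-level total assumes every one of the six views accepts a full date, which is an upper bound I am assuming rather than something the plugin documents.

```python
from datetime import date

VIEWS = 6    # month, list, day, week, photo, map
YEARS = 100  # one century of calendar

# Month-level URLs: one per month, per view.
month_urls = YEARS * 12 * VIEWS

# Day-level URLs: count the days in an arbitrary century (2000-01-01 to 2100-01-01).
days = (date(2100, 1, 1) - date(2000, 1, 1)).days

# Upper bound (assumption): every view accepts a full YYYY-MM-DD date.
day_urls_upper_bound = days * VIEWS

print(month_urls)            # 7200
print(days)                  # 36525
print(day_urls_upper_bound)  # 219150
```

Even if only the day view takes a full date, that is still tens of thousands of addressable URLs for a site with a dozen events.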

The All-in-One Event Calendar (Timely) gets to the same place by a different road. It hangs everything off one calendar page and navigates with query parameters: action for the view (month, week, oneday, agenda, posterboard, stream), exact_date for a specific day, and month_offset / week_offset / oneday_offset for relative navigation. The offsets are the dangerous part. month_offset=480 is forty years out, costs the crawler nothing to construct, and resolves to a real rendered page.
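A small Python sketch shows how cheaply an offset turns into a far-future month. It assumes the offset is counted from the current month, and month_from_offset is a hypothetical helper written for this post, not anything from the All-in-One Event Calendar codebase.

```python
from datetime import date

def month_from_offset(today: date, month_offset: int) -> date:
    """Return the first day of the month `month_offset` months away from `today`.

    Assumption: offsets are relative to the current month.
    """
    # Flatten year/month into a single month index, shift it, then unflatten.
    index = today.year * 12 + (today.month - 1) + month_offset
    return date(index // 12, index % 12 + 1, 1)

today = date(2026, 6, 23)
print(month_from_offset(today, 480))  # 2066-06-01
print(month_from_offset(today, -6))   # 2025-12-01
```

Any integer a crawler cares to put in the parameter maps to a month, which is why the offsets are the dangerous part.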

So this isn’t a bug in either plugin. It’s the natural consequence of building a calendar for a human who clicks “next” a few times, with nothing downstream telling an automated client where to stop. Turning the bots back takes extra code on the site.

It’s a server problem, not just an SEO one

The SEO cost is the part people notice first: thousands of thin, near-duplicate, mostly-empty pages diluting crawl budget and cluttering the index. That’s real. But the expensive problem is what each of those requests does to your origin.

Every one of these URLs is dynamic and, in practice, uncacheable. The parameters are effectively unique, so a page cache sees a near-zero hit rate on them. Each request boots WordPress, loads the full plugin stack, and runs the main WP_Query — and on an event calendar that query is not cheap. It’s date math against post meta, often with recurring-event expansion on top. Then it renders a template, for a month with nothing in it. You are paying the full cost of a dynamic page to deliver an empty grid, thousands of times, to robots.

The knock-on effects are the ones that show up on the invoice. PHP-FPM workers get tied up servicing crawler traffic, so real visitors queue behind it and “Time to First Byte” (TTFB) climbs. The database takes connection pressure it didn’t need. On metered or CPU-throttled hosting, the bill moves. And the page cache, the thing that’s supposed to protect you, gets churned: an unbounded space of one-hit URLs evicts the warm entries for pages people actually visit.

If you want to see your own exposure, grep your access log for calendar paths grouped by status, or open the Crawl Stats report in Search Console and look at how much of Google’s budget is going to calendar URLs versus everything else. It’s usually a larger slice than anyone expects.
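If you would rather script the log check than grep by hand, something like the following Python sketch does the grouping. It assumes a common or combined log format and a calendar that lives under /events/. The sample lines are made up for illustration, and calendar_hits_by_status is a name I have invented here.

```python
import re
from collections import Counter

# Pulls the request path and status code out of a common/combined log line.
LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3}) ')

def calendar_hits_by_status(lines, prefix="/events/"):
    """Count requests whose path starts with `prefix`, grouped by status code."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match and match.group("path").startswith(prefix):
            counts[match.group("status")] += 1
    return counts

# Illustrative, invented log lines (documentation IP ranges, fake bot name).
sample = [
    '192.0.2.10 - - [23/Jun/2026:10:00:00 +0000] "GET /events/month/2041-03/ HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [23/Jun/2026:10:00:01 +0000] "GET /events/month/2041-04/ HTTP/1.1" 200 5098 "-" "ExampleBot/1.0"',
    '192.0.2.11 - - [23/Jun/2026:10:00:02 +0000] "GET /events/month/1987-01/ HTTP/1.1" 410 0 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [23/Jun/2026:10:00:03 +0000] "GET /about/ HTTP/1.1" 200 8211 "-" "Mozilla/5.0"',
]

print(dict(calendar_hits_by_status(sample)))  # {'200': 2, '410': 1}
```

To run it against a real log, pass an open file object instead of the sample list. A pile of 200s on far-future months is the signature of the trap.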

Why the usual fixes don’t work

The first instinct is almost always robots.txt. Disallow the calendar paths and the problem goes away, right? It doesn’t, for two reasons. A disallowed URL can still be indexed as a bare URL entry, so you don’t even reliably solve the index-bloat half. More importantly, a crawler that’s forbidden from fetching the URL can never see the 301 or 410 you’d use to consolidate it. You’ve blindfolded the bot and then tried to give it directions. robots.txt is the right tool for one narrow job here — keeping crawlers out of feeds and filter parameters — and the wrong tool for the date tree.

noindex alone has the opposite failure. It fixes the index but does nothing for load, because the crawler still has to fetch the page, and your server still has to build the whole response, before anyone can read the meta tag. Full dynamic cost, every time, for a page you’ve politely asked not to be indexed.

Caching harder doesn’t help either. You can’t cache an unbounded set of single-hit URLs, and trying just accelerates cache churn. And reaching for a firewall or bot-blocker is a blunt instrument that blocks the crawlers you want alongside the ones you don’t, then turns into whack-a-mole across user agents and IP ranges.

The pattern in all four: they treat the symptom at the wrong layer, after the expensive work is already committed.

Intercept before the query runs

The fix that actually moves the server numbers has to happen early, before WordPress spends anything. That’s the idea behind Calendar Crawl Guard, the plugin we built and put on GitHub.

The whole strategy rests on one hook. The plugin evaluates the request on parse_request at priority 1, which fires inside WP::main() before query_posts() runs the main WP_Query. A request that’s going to be rejected exits there, before the expensive query and before any template renders, so it costs a small fraction of a normal page load instead of the whole thing. Rejected responses also carry Cache-Control and Expires headers, so a front cache or CDN can serve the repeats without ever waking PHP again. That early exit is the part that matters; everything else is policy on top of it.

The policy is a short allow-list plus three ways of saying no. A canonical window — current month plus or minus six by default, configurable — stays fully crawlable. Single event pages are always kept, because they’re the only URLs here with real content. Everything outside the window gets a 410 Gone, deliberately, not a 404: 410 is the permanent signal that tells a crawler to drop the URL and stop coming back, where 404 invites it to try again next week. Alternate and duplicate URLs that have a real canonical target get a 301 instead: the Tribe Bar filter parameters, the photo and map views (and AI1EC’s posterboard and stream), and recurring-event instance permalinks folding back to the base event. There’s an optional 403 for known bad user agents, off by default.
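The shape of that policy fits in a few lines. What follows is a language-neutral sketch in Python, not the plugin’s code: the function names, the argument list, and the order in which the rules are checked are all my assumptions for illustration.

```python
from datetime import date
from typing import Optional, Tuple

def months_between(a: date, b: date) -> int:
    """Signed number of calendar months from `a` to `b`, ignoring the day."""
    return (b.year - a.year) * 12 + (b.month - a.month)

def decide(
    today: date,
    requested: date,
    *,
    window: int = 6,
    single_event: bool = False,
    redirect_to: Optional[str] = None,
) -> Tuple[int, Optional[str]]:
    """Return (status, redirect_target) for a calendar request.

    Rule order is an assumption made for this sketch.
    """
    if single_event:
        return 200, None          # single event pages are always kept
    if redirect_to is not None:
        return 301, redirect_to   # duplicate view with a real canonical target
    if abs(months_between(today, requested)) <= window:
        return 200, None          # inside the canonical window
    return 410, None              # outside the window: gone

today = date(2026, 6, 23)
print(decide(today, date(2026, 12, 1)))  # (200, None)
print(decide(today, date(2027, 1, 1)))   # (410, None)
print(decide(today, date(2026, 7, 1), redirect_to="/events/month/2026-07/"))
# (301, '/events/month/2026-07/')
```

The optional 403 for bad user agents is left out here, since it depends on the request headers rather than on the date.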

The second half is feeding authority to the URLs you kept, and this is where the robots.txt mistake gets corrected. The plugin adds canonical tags and noindex hints to the non-primary views — using The Events Calendar’s own tribe_events_add_no_index_meta filter where it exists, and a wp_head meta tag for AI1EC — and it scopes robots.txt rules to feeds and filter parameters only. It deliberately does not disallow the URLs it’s redirecting or 410-ing, because those are exactly the ones the crawler needs to fetch in order to see the signal and act on it. Block them and you’re back to the blindfold.

Both calendars run through a single decision engine behind a provider layer that normalizes their very different vocabularies into one set of rules. The useful consequence for anyone running it: the same engine drives the live hook, an admin “Test a URL” tool, and two WP-CLI commands, so what you preview is exactly what production does.

Tuning it for your own site

The defaults are deliberately conservative, but two settings are worth a moment’s thought before you flip the master switch.

The window: how far out do you actually publish?

The canonical window does the heavy lifting. It’s the band of time, current month plus or minus six by default, that stays fully crawlable. Set it to match how far ahead your site genuinely has content. A community theatre that announces a season a year out should widen it to twelve. A venue booking eighteen months ahead, eighteen. A site whose calendar is really just the next few weeks of classes can drop it to two or three and reject everything past that. The number isn’t a performance dial so much as an editorial one: it should describe the real horizon of your events, not a guess about how bots behave.

Whatever you choose, the window only decides what’s kept. What happens to everything outside it is the second setting, “Out-of-window response,” and the default is 410 Gone for a specific reason. A 404 tells a crawler “nothing here right now,” which it reads as an invitation to check back, so it keeps requesting /events/2034-08/ indefinitely. A 410 tells it “gone, and not coming back,” and well-behaved crawlers drop the URL from their queue and stop asking. That permanence is the whole point. You aren’t just declining to render the empty date tree, you’re getting the crawler to quit requesting it in the first place. The plugin puts cache headers on the 410 as well, so a bot that does come back is answered from your front cache instead of PHP. You can switch it to 404, or to a 301 back to the main archive, if your situation calls for it, but 410 is the setting that actually cuts the repeat traffic.
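As a rough Python sketch of what an out-of-window answer looks like on the wire: the one-day max-age, the /events/ archive URL, and the choice to send the same cache headers for all three modes are assumptions of mine, and out_of_window_response is a hypothetical name, not the plugin’s API.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def out_of_window_response(mode, now, max_age=86400, archive_url="/events/"):
    """Build (status, headers) for a rejected calendar URL.

    `mode` is "410", "404", or "301". `now` must be a UTC-aware datetime.
    """
    if mode not in ("410", "404", "301"):
        raise ValueError(f"unsupported mode: {mode}")
    expires = now + timedelta(seconds=max_age)
    headers = {
        # Lets a front cache or CDN answer repeats without waking PHP.
        "Cache-Control": f"public, max-age={max_age}",
        "Expires": format_datetime(expires, usegmt=True),
    }
    if mode == "301":
        headers["Location"] = archive_url  # fold back to the main archive
    return int(mode), headers

now = datetime(2026, 6, 23, 12, 0, 0, tzinfo=timezone.utc)
status, headers = out_of_window_response("410", now)
print(status, headers["Cache-Control"])  # 410 public, max-age=86400
```

The body of the response hardly matters. The status code and the cache headers are what the crawler and the CDN act on.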

The agent list: turning the worst offenders away at the door

Some crawlers don’t warrant a polite 410. The aggressive SEO-audit bots, and a growing list of AI training scrapers, will burn through the date tree faster than Googlebot ever did and bring you nothing back. The “Block bad user agents” setting takes a list of user-agent substrings, one per line, and answers any match with a 403. Matching is case-insensitive and substring-based, so a single line reading SemrushBot or GPTBot catches every version without you tracking the exact string.

The detail that makes this safe to switch on: the block is scoped to calendar requests. A matched agent gets a 403 when it tries to walk the date tree, but it isn’t shut out of the rest of your site. You’re turning a scraper away from the one area that was costing you money, not standing up a site-wide firewall. Keep the list to agents you’ve actually watched abusing the calendar in your access log, and never add Googlebot, Bingbot, or any search crawler you want indexing your real events. It ships off by default, which is the right starting point. Turn it on once a genuine offender shows up in your logs.
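The matching rule is simple enough to sketch in Python. This is an illustration of case-insensitive substring matching scoped to calendar paths, not the plugin’s implementation: should_block and the /events/ prefix are assumptions.

```python
def should_block(user_agent: str, path: str, blocklist_text: str,
                 calendar_prefix: str = "/events/") -> bool:
    """True if a listed agent is requesting a calendar URL.

    `blocklist_text` holds one user-agent substring per line.
    """
    # Scope: never block outside the calendar.
    if not path.startswith(calendar_prefix):
        return False
    needles = [line.strip().lower()
               for line in blocklist_text.splitlines() if line.strip()]
    ua = user_agent.lower()
    return any(needle in ua for needle in needles)

blocklist = "SemrushBot\nGPTBot\n"
semrush = "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"

print(should_block(semrush, "/events/month/2041-03/", blocklist))  # True
print(should_block(semrush, "/about/", blocklist))                 # False
print(should_block("Googlebot/2.1", "/events/month/2041-03/", blocklist))  # False
```

Because the check is a plain substring test, a short entry can match more than you meant, so keep each line specific to the bot you are turning away.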

Seeing it work before you trust it

The part I’d have wanted as an agency is the dry run. wp ccg test runs any list of URLs through the engine and prints the verdict for each, so you can pipe in a file of paths pulled straight from your access log and see what would happen to them. wp ccg scan generates month views across a range of offsets and shows you exactly where the window flips from allow to reject, which is the fastest way to confirm your boundary is where you think it is. There’s optional logging for a live view of what’s being intercepted. Turn the rules on, then watch the calendar slice of your Crawl Stats and your PHP worker usage over the next couple of weeks.
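To picture what a boundary scan is checking, here is a toy version in Python. It is not the output or the logic of wp ccg scan; scan and flip_points are names invented for this sketch, and it models only the month window.

```python
def scan(window: int, start: int, end: int):
    """Verdict for each month offset from `start` to `end`, inclusive."""
    return [
        (offset, "allow" if abs(offset) <= window else "reject")
        for offset in range(start, end + 1)
    ]

def flip_points(rows):
    """Offsets at which the verdict differs from the previous offset."""
    flips = []
    for (_, previous), (offset, verdict) in zip(rows, rows[1:]):
        if verdict != previous:
            flips.append(offset)
    return flips

rows = scan(window=6, start=-8, end=8)
print(flip_points(rows))  # [-6, 7]
```

With the default window of six, the verdict turns to allow at offset -6 and back to reject at offset 7, which is the boundary you would expect to confirm on your own site.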

Get it, or improve it

Calendar Crawl Guard is on GitHub under an open license, built for PHP 8 and current WordPress, supporting both The Events Calendar and the All-in-One Event Calendar:

https://github.com/dewolfe001/calendar-crawl-guard

Issues and pull requests are welcome. If you maintain another calendar plugin with the same unbounded-date problem, the provider layer is the place to add it, and I’d take that PR happily.

Event calendars don’t have to be crawler traps. They’re traps because the default behavior optimizes for a person clicking “next” and nobody ever told the robot where the calendar ends. Give crawlers a short, honest list of canonical URLs and a gone-means-gone answer for the rest, do it before the query runs, and the load problem mostly disappears on its own.
