Debugging Cloudflare: DNS, Workers, and Security Issues Without Digging Through Dashboards

Five common Cloudflare debugging scenarios: DNS propagation, Worker 500s, WAF blocks, Tunnel drops, and low cache hit rate — diagnosed faster with AI.

Cloudflare debugging is one of those investigations that starts simple and quietly expands. You go in to check why a DNS record hasn't propagated, and twenty minutes later you're cross-referencing WAF event logs, a Wrangler tail session, and a cached response header — in three separate browser tabs. The dashboard is well-built. The tooling is solid. But nothing in Cloudflare was designed to give you one view across DNS, Workers, the WAF, Tunnels, and cache analytics simultaneously. That cross-feature context is usually the thing you need most when something is actually broken.

This article walks through five common scenarios covering DNS, Workers, the WAF, Tunnels, and caching: what the real investigation looks like, and what changes when you can run those same queries in plain English.


Cloudflare's Role in Modern Infrastructure

Before getting into the scenarios, it's worth being explicit about what Cloudflare is actually doing in a typical stack. For most teams, it's not just a CDN or a DNS host — it's several deeply integrated services running simultaneously:

  • DNS management — Often the authoritative nameserver for entire zones, handling hundreds of records per domain
  • CDN and DDoS protection — Proxied records (orange cloud) route traffic through Cloudflare's network; cache rules, page rules, and transform rules layer on top
  • Workers — Serverless JavaScript/TypeScript executing at the edge across 275+ global locations, with access to KV, D1, R2, Queues, and Durable Objects
  • Pages — Static site and full-stack hosting with Workers Functions integrated
  • Tunnels — cloudflared creates a secure outbound-only connection from your origin to Cloudflare's network; no public IP, no open firewall ports
  • WAF — Web Application Firewall with managed rulesets (OWASP, Cloudflare Managed), custom rules, rate limiting, and bot management
  • Zero Trust — Cloudflare Access, Gateway, and WARP for identity-aware network access

Every layer adds configuration surface. The complexity compounds quickly once you're using more than DNS. A WAF rule blocks a legitimate request that came through a Worker serving a Tunnel-connected backend, and now you have four systems to check in sequence. WAF issues and security rule misconfigurations are especially hard to trace because the Security Events log and the WAF rule editor live in separate parts of the dashboard.


Scenario 1: DNS Change Not Propagating

The Real Cloudflare Debugging Approach

You've updated an A record. It's been 40 minutes. Your app is still resolving to the old IP.

The investigation goes like this:

  1. Open the Cloudflare dashboard → your zone → DNS → confirm the record is saved and showing the correct IP
  2. Check proxy status — is the record orange-cloud (proxied) or grey-cloud (DNS-only)? Proxied records have a TTL of 300 seconds enforced by Cloudflare regardless of what you set. DNS-only records respect the TTL you configured.
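  3. Query Cloudflare's public resolver: dig @1.1.1.1 api.example.com A. If the new IP shows up here, the problem is downstream caching rather than the record itself. Note that 1.1.1.1 is a recursive resolver with its own cache; to see the authoritative answer, query one of the zone's assigned Cloudflare nameservers instead. For a proxied record, dig returns Cloudflare edge IPs rather than your origin IP, so this check is only meaningful for DNS-only records.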
  4. Check TTL on the previous record — if it was set to 3600, downstream resolvers will cache the old answer for up to an hour
  5. Use an external propagation checker to see what different global resolvers are returning
  6. If the zone recently changed nameservers, verify NS records have propagated globally as well

That's six steps across the dashboard, a terminal, and a third-party tool before you have a clear answer.
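The TTL reasoning in steps 2 and 4 can be sketched in a few lines of Python. This is a minimal sketch, not Cloudflare's own logic: the function names are hypothetical, and it assumes the simple model described above (proxied records advertise 300 seconds, DNS-only records advertise the configured TTL, and a resolver that cached the old answer just before the change keeps it for the full previous TTL).

```python
PROXIED_TTL_SECONDS = 300  # TTL the text says Cloudflare advertises for proxied records


def effective_ttl(proxied: bool, configured_ttl: int) -> int:
    """Return the TTL that resolvers will see for a record."""
    return PROXIED_TTL_SECONDS if proxied else configured_ttl


def worst_case_stale_seconds(previous_ttl: int, seconds_since_change: int) -> int:
    """Upper bound on how much longer a resolver may serve the old answer.

    Assumes the resolver cached the old record just before the change,
    so it holds it for the full previous TTL.
    """
    return max(0, previous_ttl - seconds_since_change)


# Scenario from the text: old TTL of 3600 s, change made 40 minutes ago.
remaining = worst_case_stale_seconds(previous_ttl=3600, seconds_since_change=40 * 60)
print(f"Old answer may be cached for up to {remaining} more seconds")
```

If the result is 0 and resolvers still return the old IP, ordinary TTL caching no longer explains the behavior, and the nameserver delegation check in step 6 becomes the more likely lead.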

With Clanker Cloud

"Why hasn't my DNS change propagated? The A record for api.example.com was updated 30 minutes ago."

Clanker Cloud reads the current DNS record state from the Cloudflare API, checks whether the record is proxied or DNS-only, surfaces the configured TTL and the effective TTL Cloudflare is advertising, and tells you what downstream resolvers should be doing. If there's a nameserver delegation issue or a conflicting CNAME flattening rule, it surfaces that too. You get the answer in one query instead of six steps.


Scenario 2: Cloudflare Worker Throwing 500 Errors

The Real Cloudflare Debugging Approach

A Worker is returning 500s intermittently in production. Time to find out why.

Option A — Wrangler CLI:

wrangler tail my-worker --format pretty

This streams live logs from the Worker. You'll see invocations, console output, exceptions, and response status codes in real time. If the error is reproducible, this is often the fastest path. Filter by status:

wrangler tail my-worker --status error

Option B — Dashboard:

Navigate to Workers & Pages → select your Worker → Logs tab. This gives you recent invocations with request/response data. The dashboard view is useful for historical errors but requires manual filtering and doesn't show binding-level failures clearly.

What you're looking for:

  • Uncaught JavaScript exceptions (often from edge cases in request parsing)
  • Binding failures — a KV namespace that wasn't attached to the Worker environment, a D1 database that hit query limits, an R2 bucket that returned an unexpected error
  • CPU time exceeded (Workers have a 10 ms CPU time limit per request on the free plan, and 30 s by default on paid plans)
  • Subrequest failures — Workers often fan out to external APIs; a 500 from a downstream service bubbles up as a Worker error
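One way to triage the categories above is to save the JSON output of wrangler tail and tally events by likely cause. The sketch below is illustrative only: the field names (`outcome`, `exceptions`, `message`) and the `exceededCpu` value are assumptions about the JSON shape and should be checked against your own output, and the binding check is a plain string heuristic. Subrequest failures are not detected here, since they depend on how the Worker itself reports them.

```python
from collections import Counter


def classify_tail_event(event: dict) -> str:
    """Map one parsed tail event to a rough failure category."""
    outcome = event.get("outcome", "unknown")
    if outcome == "exceededCpu":
        return "cpu-limit"
    exceptions = event.get("exceptions") or []
    if exceptions:
        message = exceptions[0].get("message", "")
        # Heuristic only: calling a method on a binding that is not attached
        # often surfaces as an "undefined" property access.
        if "undefined" in message:
            return "possible-binding-failure"
        return "uncaught-exception"
    return "ok" if outcome == "ok" else "other"


# Hypothetical events, shaped like parsed lines of JSON tail output.
sample_events = [
    {"outcome": "ok", "exceptions": []},
    {"outcome": "exceededCpu", "exceptions": []},
    {"outcome": "exception",
     "exceptions": [{"name": "TypeError",
                     "message": "Cannot read properties of undefined (reading 'get')"}]},
    {"outcome": "exception",
     "exceptions": [{"name": "SyntaxError", "message": "Unexpected end of JSON input"}]},
]

counts = Counter(classify_tail_event(event) for event in sample_events)
print(counts.most_common())
```

With real data, each line of saved output would go through `json.loads` before being passed to `classify_tail_event`.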

You may also need to check recent deployments:

wrangler deployments list

If a deployment went out in the last hour and errors started then, rollback is likely the fix:

wrangler rollback
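The "did a deployment go out just before the errors started" check is simple enough to sketch. This is a hypothetical helper, not part of Wrangler: the function name, the one-hour window, and the timestamps are all assumptions, and the deployment times would have to be copied from the deployment list by hand or by your own tooling.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def suspect_deployment(
    deployed_at: list[datetime],
    first_error_at: datetime,
    window: timedelta = timedelta(hours=1),
) -> Optional[datetime]:
    """Return the latest deployment made within `window` before the first error."""
    candidates = [
        d for d in deployed_at
        if timedelta(0) <= first_error_at - d <= window
    ]
    return max(candidates, default=None)


# Hypothetical timestamps for illustration.
deployments = [
    datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc),
    datetime(2024, 5, 1, 13, 40, tzinfo=timezone.utc),
]
first_error = datetime(2024, 5, 1, 13, 52, tzinfo=timezone.utc)
print(suspect_deployment(deployments, first_error))
```

A `None` result means no deployment falls inside the window, which points away from a rollback and back toward bindings, limits, or downstream services.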

With Clanker Cloud

"Which Cloudflare Workers are throwing errors in the last hour?"

Clanker Cloud queries error rates across your Workers, surfaces which one has elevated 500s, shows the time window when errors started (correlating against recent deployments), and identifies which binding is involved in the failure. If the KV namespace binding is misconfigured in the production environment but not staging, that surfaces immediately. You can follow up with "Show me the recent deployments for the payments-worker" and get the deployment history without opening a second tab.


Scenario 3: WAF Blocking Legitimate Traffic

The Real Cloudflare Debugging Approach

Your iOS app team is reporting intermittent 403s against your API. The server logs show no request arriving — which means Cloudflare is blocking it before it hits your origin.

Dashboard path: Security → Events → filter by Action: Block

You're looking for:

  • The triggered Rule ID (e.g., 100014 from the Cloudflare Managed Ruleset, or a custom rule you wrote)
  • The request's User-Agent, IP, ASN, and path
  • Whether the block is from the WAF, rate limiting, or bot management

Once you have the Rule ID, navigate to Security → WAF → Managed rules to find the specific managed rule and decide whether to:

  • Skip it for a specific URI path or IP range
  • Override the action from Block to Log (temporarily, while you investigate)
  • Disable the rule entirely (last resort)

Testing with curl to replicate the block:

curl -v -H "User-Agent: MyApp/2.1 iOS/17" https://api.example.com/v1/endpoint

Add headers progressively to isolate which request characteristic triggered the rule. Check if the app is sending an unusual Content-Type or a body format that looks like SQL injection or XSS to the WAF's pattern matcher.
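Adding headers progressively is easier to keep track of if the curl commands are generated rather than typed. The sketch below builds one command per step, each adding one more header than the last. The helper name is hypothetical, and the Content-Type value is an assumed example of a header the app might send.

```python
import shlex


def progressive_curl_commands(url: str, headers: list[tuple[str, str]]) -> list[str]:
    """Build curl commands that add one header at a time, in order."""
    commands = []
    for count in range(1, len(headers) + 1):
        parts = ["curl", "-v"]
        for name, value in headers[:count]:
            parts += ["-H", f"{name}: {value}"]
        parts.append(url)
        # shlex.join quotes each argument so the command is safe to paste into a shell.
        commands.append(shlex.join(parts))
    return commands


commands = progressive_curl_commands(
    "https://api.example.com/v1/endpoint",
    [
        ("User-Agent", "MyApp/2.1 iOS/17"),
        ("Content-Type", "application/json"),  # assumed header, for illustration
    ],
)
for command in commands:
    print(command)
```

Run the commands in order; the first one that returns a 403 identifies the header that tipped the request into a block.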

If the block comes from a custom rule, check your rules under Security → WAF → Custom rules and look for expressions that match more than intended: for example, a rule meant to block paths containing "/admin" that also catches unrelated paths.
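To see why a substring match on "/admin" is broader than it looks, the two matching styles can be mimicked in Python. This is only an emulation for illustration: the rule expressions quoted in the docstrings show roughly what each style looks like in a custom rule, and the sample paths are hypothetical.

```python
def matches_contains(path: str, needle: str) -> bool:
    """Mimic a substring match such as: http.request.uri.path contains "/admin"."""
    return needle in path


def matches_prefix(path: str, prefix: str) -> bool:
    """Mimic a prefix match such as: starts_with(http.request.uri.path, "/admin/")."""
    return path.startswith(prefix)


# Hypothetical request paths.
sample_paths = ["/admin/users", "/api/v1/admin-settings", "/blog/administration-tips"]

broad = [p for p in sample_paths if matches_contains(p, "/admin")]
narrow = [p for p in sample_paths if matches_prefix(p, "/admin/")]
print("substring match blocks:", broad)
print("prefix match blocks:", narrow)
```

The substring version blocks all three paths, including an API route and a blog post, while the prefix version blocks only the admin area. A real rule using the prefix form would likely also need an exact match for the bare "/admin" path.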

With Clanker Cloud

"Is our Cloudflare WAF blocking any legitimate API traffic from our iOS app?"

Clanker Cloud reads the Security Events log, filters for blocks originating from your app's User-Agent or ASN pattern, surfaces the triggered rule ID and the rule's description, and tells you what the rule is matching on. It can then tell you what a minimal WAF rule override would look like to allow your app's traffic through without disabling the broader protection. The investigation that normally requires navigating Security → Events → WAF → Custom rules in sequence happens in one query.


Scenario 4: Cloudflare Tunnel Dropping Connections

The Real Cloudflare Debugging Approach

An internal tool that runs behind a Cloudflare Tunnel is periodically unreachable. No public IP, no reverse proxy to blame — just cloudflared on the origin and a route configured in Cloudflare's Zero Trust dashboard.

Step 1 — Check the tunnel status:

cloudflared tunnel info <tunnel-name>

This shows whether the tunnel is active, which Cloudflare edge nodes it's connected to, and how many connectors are running. Cloudflare recommends running at least two connectors for high availability.

Step 2 — Inspect connector logs on the origin:

journalctl -u cloudflared -f

Look for connection errors, authentication failures (the tunnel credentials file may have rotated), or TCP timeout events from the origin service.

Step 3 — Dashboard verification:

Navigate to Zero Trust → Networks → Tunnels. Check the tunnel's status (Healthy / Degraded / Down), the number of active connections, and the configured routes. If a route is pointing to localhost:3000 but the service is now on port 3001, every connection will succeed at the tunnel level and fail at the origin level — which shows up as a 502 from the tunnel.

Step 4 — Route and ingress rule check:

cloudflared tunnel route list

Verify the public hostname is mapped to the correct tunnel and the ingress rule in your config.yml matches what's actually running.
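The port mismatch described in Step 3 can be checked mechanically once the ingress rules are parsed. The sketch below assumes the ingress section of config.yml has already been loaded into a list of dictionaries, and that you have collected the set of ports the origin is actually listening on (for example from ss -ltn). The function name and sample hostname are hypothetical.

```python
from urllib.parse import urlsplit


def find_port_mismatches(ingress: list[dict], listening_ports: set[int]) -> list[str]:
    """Return hostnames whose ingress service points at a port nothing listens on."""
    mismatched = []
    for rule in ingress:
        service = rule.get("service", "")
        if not service.startswith(("http://", "https://")):
            continue  # skips entries such as the catch-all "http_status:404" rule
        port = urlsplit(service).port
        if port is not None and port not in listening_ports:
            mismatched.append(rule.get("hostname", "<catch-all>"))
    return mismatched


# Hypothetical ingress section, as it would look after parsing config.yml.
ingress = [
    {"hostname": "dashboard.example.com", "service": "http://localhost:3000"},
    {"service": "http_status:404"},
]

# The service moved to port 3001, but the rule still points at 3000.
print(find_port_mismatches(ingress, listening_ports={3001}))
```

Rules whose service URL has no explicit port are skipped rather than guessed at, so an empty result does not prove that every route is correct.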

With Clanker Cloud

"Is my Cloudflare Tunnel for the internal dashboard healthy?"

Clanker Cloud reads the tunnel's current status from the Zero Trust API — making Cloudflare tunnel debugging a single query. It checks the number of active connectors (and whether any have dropped) and surfaces the configured routes against the tunnel. If the route target doesn't match (port mismatch, wrong hostname), that's visible immediately. If the tunnel itself is healthy but a Cloudflare Access policy is blocking the request before it reaches the tunnel, that context is included in the same response.


Scenario 5: Cache Hit Rate Too Low / Content Serving Stale

The Real Cloudflare Debugging Approach

Your Cloudflare CDN troubleshooting starts with a metric: cache hit rate in Analytics is 18% when it should be 70%+. Every request is going to origin. Either Cloudflare isn't caching, or it's bypassing the cache for most requests.

Check the CF-Cache-Status header:

curl -sI https://example.com/page | grep -i cf-cache-status

The response will be one of:

  • HIT — served from cache
  • MISS — fetched from origin, will be cached now
  • BYPASS — a rule explicitly bypassed the cache
  • DYNAMIC — Cloudflare determined this content is dynamic and didn't cache it
  • EXPIRED — cached but expired, fetched fresh
  • REVALIDATED — cached version revalidated with origin
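When sampling many URLs, it helps to extract the status programmatically and compute a rough hit rate from the sample. The sketch below parses raw header text such as the output of curl -sI. The helper names are hypothetical, the sample counts are invented to mirror the 18% figure above, and counting only HIT as a cache hit is a simplifying assumption that may differ from how Cloudflare's analytics compute the ratio.

```python
from typing import Optional


def cache_status_from_headers(raw_headers: str) -> Optional[str]:
    """Extract the CF-Cache-Status value from raw response headers, case-insensitively."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "cf-cache-status":
            return value.strip().upper()
    return None


def hit_rate(status_counts: dict[str, int]) -> float:
    """Fraction of sampled responses with status HIT (simplifying assumption)."""
    total = sum(status_counts.values())
    return status_counts.get("HIT", 0) / total if total else 0.0


# Hypothetical header text and sample counts.
raw = "HTTP/2 200\r\ncontent-type: text/html\r\ncf-cache-status: DYNAMIC\r\n"
print(cache_status_from_headers(raw))
print(hit_rate({"HIT": 18, "MISS": 12, "BYPASS": 40, "DYNAMIC": 30}))
```

Matching the header name case-insensitively matters because the header may arrive as CF-Cache-Status or cf-cache-status depending on the HTTP version in use.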

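BYPASS is a frequent culprit in low hit-rate scenarios; DYNAMIC is the other, since Cloudflare does not cache HTML by default unless a rule makes it eligible. For BYPASS, look at what's triggering it: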

  • A Cache Rule with a "Bypass cache" action matching too broadly
  • An origin server sending Cache-Control: no-store or Set-Cookie headers (Cloudflare won't cache responses with cookies by default)
  • A Page Rule (legacy) that has "Cache Level: Bypass" set

Dashboard path: Caching → Cache Rules (check each rule's match condition), Analytics → Cache analytics (break down by cache status), Caching → Tiered Cache (check Tiered Cache settings), and Traffic → Argo Smart Routing.

Fix options:

  • Add an explicit Cache Rule: "When URI path matches /assets/* → Cache Everything → Edge TTL 1 day"
  • Use a Transform Rule to strip the Set-Cookie header for static assets before Cloudflare processes the response
  • Enable Tiered Cache to improve cache hit rate at the regional level before requests fall back to origin

With Clanker Cloud

"Why is our Cloudflare cache hit rate so low for the marketing site?"

Clanker Cloud reads cache analytics, checks the active Cache Rules and their match conditions, inspects the cache bypass conditions, and identifies whether the origin is sending headers that force Cloudflare to skip caching. It also checks whether Tiered Cache is enabled and whether any legacy Page Rules still set bypass conditions (when both match the same path, Cache Rules take precedence). You get a ranked list of what's causing the low hit rate and what to change.


How Clanker Cloud Works with Cloudflare

Clanker Cloud connects to Cloudflare through your own API token — scoped to the specific zones and accounts you choose, stored locally on your machine. Nothing gets sent to a hosted SaaS layer.

Multi-zone and multi-account: If you manage DNS for ten zones across two Cloudflare accounts, Clanker Cloud surfaces all of them from one interface. Asking "which of our domains has an unproxied A record?" works across every connected account, not just one zone at a time.

Full-stack context: Cloudflare rarely operates in isolation. Workers fetch from S3 buckets. Tunnels expose services running on EC2 or a Hetzner VM. WAF rules interact with traffic routing decisions made in Route 53 or GCP Cloud DNS. Clanker Cloud holds context across AWS, GCP, Kubernetes, Hetzner, DigitalOcean, and GitHub simultaneously, so your debugging question can span the full chain — not just the Cloudflare layer.

BYOK — Bring Your Own Keys: Use Gemma 4 for fully local inference (no Cloudflare config data leaves your machine), or connect Claude, Codex, or Hermes for agent workflows that manage Cloudflare as part of a broader deployment pipeline. The AI layer is yours to choose; Clanker Cloud handles the API integration and context assembly.

Read-first, then act: By default, Clanker Cloud reads and surfaces information. When you're ready to make changes — update a DNS record, toggle a WAF rule, redeploy a Worker — you enable maker mode explicitly. Every change is shown to you for review before execution. The workflow matches how careful engineers already work: understand first, then act.

See the full technical documentation at docs.clankercloud.ai.


Cloudflare Debugging Cheat Sheet

| Problem | Cloudflare Dashboard Path | CLI / API Command | Clanker Cloud Question |
| --- | --- | --- | --- |
| DNS not propagating | Zone → DNS → check record + proxy status | dig @1.1.1.1 <domain> A | "Why hasn't my A record for api.example.com propagated?" |
| Worker throwing 500s | Workers & Pages → Worker → Logs | wrangler tail <worker> --status error | "Which Workers are throwing errors in the last hour?" |
| WAF blocking legitimate traffic | Security → Events → filter Action: Block | curl -v -H "User-Agent: ..." <url> | "Is our WAF blocking traffic from our iOS app?" |
| Tunnel connection dropping | Zero Trust → Networks → Tunnels | cloudflared tunnel info <name> | "Is my Cloudflare Tunnel for the internal dashboard healthy?" |
| Low cache hit rate | Caching → Cache Rules; Analytics → Cache | curl -sI <url> \| grep -i cf-cache-status | "Why is our cache hit rate so low for the marketing site?" |
| Worker CPU exceeded | Workers → Metrics → CPU time | wrangler tail <worker> --format json | "Is any Worker hitting CPU limits?" |
| WAF rule false positive | Security → WAF → Managed Rules | curl with progressive headers | "Which WAF rule is blocking our checkout flow?" |
| Pages deployment failing | Pages → your project → Deployments | wrangler pages deploy | "Why did the last Pages deployment fail?" |

FAQ

How do I debug Cloudflare DNS issues?

Start by verifying the record is saved correctly in the Cloudflare dashboard (Zone → DNS), then check proxy status — proxied records (orange cloud) always advertise a 300-second TTL regardless of your setting, while DNS-only records use your configured TTL. Run dig @1.1.1.1 <your-domain> A to query Cloudflare's resolver directly. If the correct IP shows up there but not elsewhere, downstream resolvers are still caching the old answer. If the wrong IP shows up even at 1.1.1.1, the record in Cloudflare itself may not be saved correctly or a CNAME flattening rule is interfering. Check nameserver delegation if the domain recently moved to Cloudflare.

Why is my Cloudflare Worker throwing errors?

The most common causes are: uncaught exceptions from edge-case inputs, binding failures (a KV namespace, D1 database, R2 bucket, or Queue that isn't attached in the production environment), CPU time limits (10 ms per request on the free plan, 30 s by default on paid plans), and subrequest failures from downstream APIs. Use wrangler tail <worker-name> --status error for live error streaming, or check Workers & Pages → Logs in the dashboard for recent invocation history. Cross-reference against wrangler deployments list; if errors started after a recent deployment, wrangler rollback is usually the fastest fix.

How do I fix Cloudflare WAF blocking legitimate traffic?

Navigate to Security → Events, filter by Action: Block, and find the Rule ID that's triggering. For managed rules, go to Security → WAF → Managed rules and create a rule skip or action override for the specific URI path or IP range affected. For custom rules, review your rule expressions under Security → WAF → Custom rules for overly broad match conditions. Use curl -v with the exact headers your application sends to replicate the block and confirm which rule fires. Temporarily switching the action from Block to Log lets you verify traffic characteristics before making the skip rule permanent.

What tools help debug Cloudflare infrastructure?

The primary tools are: the Cloudflare dashboard (covers DNS, Security Events, Workers Logs, Cache Analytics, and Zero Trust in separate sections), the Wrangler CLI (wrangler tail, wrangler deployments, wrangler rollback) for Workers, the cloudflared CLI (cloudflared tunnel info, cloudflared tunnel route list) for Tunnels, and curl with header inspection (-sI, -v) for cache status and WAF replication. For cross-feature debugging — where a single issue spans DNS, Workers, and the WAF — Clanker Cloud lets you query your full Cloudflare configuration in plain English and surfaces context across all features in one place.


Conclusion

Cloudflare is doing serious work in your stack — often invisibly, across multiple layers simultaneously. That's the trade-off: the integration is powerful, but when something breaks, the investigation is scattered across features that don't share a single debugging view.

The five scenarios above cover the issues that come up most often in real production environments. Each one is solvable with the dashboard and CLI tooling Cloudflare provides. The gap isn't capability — it's the time cost of navigating across features, correlating logs manually, and reassembling context that should be visible in one place.

AI-assisted querying doesn't replace knowing how Cloudflare works. It compresses the investigation — you still need to understand why a proxied record has a forced 300s TTL, why a Set-Cookie header bypasses caching, or what a WAF rule skip expression looks like. But you find the answer faster when you can ask in plain English and get a response that's already read the relevant parts of your configuration.

If your team manages Cloudflare security rules, Workers deployments, or Tunnel configurations alongside AWS, GCP, or Kubernetes, Clanker Cloud gives you a single surface to query the full stack. Read the documentation, take a look at the product demo, or download the desktop app and connect your Cloudflare API token in under a minute.
