Your server is running. Your application is healthy. Your database is fast. But users can't reach you because your DNS stopped resolving 20 minutes ago and nobody noticed.

DNS is the most critical piece of infrastructure that most teams never think about monitoring. When it works, it's invisible. When it breaks, everything looks broken, and the symptoms rarely point to DNS as the cause.

Why DNS Failures Are Uniquely Dangerous

Most infrastructure failures degrade your service. DNS failures disconnect it entirely:

Invisible blast radius: Users get "site not found" errors, which make it look like your company has ceased to exist. They don't see a loading spinner or an error page. They see nothing.
Caching masks the problem: DNS responses are cached at multiple layers (browser, OS, ISP resolver), so some users are fine while others are completely blocked. That makes debugging a nightmare.
Your own monitoring might miss it: If your uptime monitor has the old DNS record cached, it reports everything as healthy while new visitors can't connect.


Real Scenario

A startup changed DNS providers and forgot to re-create one CNAME record. Their main site, served by an A record, kept working. Their API, which depended on the missing CNAME, was unreachable for 6 hours. Their HTTP monitor never caught it because it checked the server's IP address directly and never exercised DNS at all.

Seven DNS Failure Modes You Should Monitor

1. Record Deletion or Modification

Someone accidentally removes or changes a DNS record. This can happen through Terraform misconfiguration, a DNS provider UI mistake, or an overzealous cleanup script.

expected-records.sh

# What your DNS should return
api.yourapp.com.  A     203.0.113.10
api.yourapp.com.  AAAA  2001:db8::1

# What it returns after someone broke it
api.yourapp.com.  -- NO ANSWER --
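A deletion like this is easy to detect if you keep a declared copy of the records you expect and diff it against live answers. Below is a minimal sketch, assuming you already have the live answers as sets of values; the `EXPECTED` table and the `diff_records` helper are hypothetical names for illustration, not part of any particular tool.

```python
from dataclasses import dataclass

# Hypothetical declared state: (hostname, record type) -> expected values.
EXPECTED = {
    ("api.yourapp.com", "A"): {"203.0.113.10"},
    ("api.yourapp.com", "AAAA"): {"2001:db8::1"},
}


@dataclass
class RecordDiff:
    name: str
    rtype: str
    missing: set      # expected values absent from the live answer
    unexpected: set   # live values that were never declared


def diff_records(expected, actual):
    """Compare declared record sets with live answers.

    Both arguments map (hostname, record_type) -> set of values. A key that is
    absent from `actual` is treated as NO ANSWER, i.e. an empty set.
    """
    diffs = []
    for (name, rtype), want in expected.items():
        got = actual.get((name, rtype), set())
        missing, unexpected = want - got, got - want
        if missing or unexpected:
            diffs.append(RecordDiff(name, rtype, missing, unexpected))
    return diffs


# The broken state from the example above: the name returns no answer at all.
for d in diff_records(EXPECTED, {}):
    print(f"{d.name} {d.rtype}: missing {sorted(d.missing)}")
```

In practice you would normalize values before comparing, for example with Python's `ipaddress` module, since `2001:db8::1` and `2001:0db8:0:0:0:0:0:1` are the same address written differently.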

2. Propagation Failures

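You update a DNS record, but the change doesn't reach every resolver at once. Resolvers that cached the old answer keep serving it until its TTL expires, and a secondary authoritative server that hasn't picked up the updated zone keeps answering with stale data. Different users see different results depending on their ISP's resolver and which authoritative server it happened to query.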
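The symptom of a propagation problem is that different vantage points return different answers for the same name. A minimal sketch of how you might flag that, assuming you have already collected one answer per vantage point; the region labels and the `find_divergent_resolvers` helper are hypothetical.

```python
from collections import Counter


def find_divergent_resolvers(answers):
    """Flag vantage points whose answer differs from the majority.

    `answers` maps a vantage-point label (hypothetical: a region or resolver
    address) to a frozenset of the record values it returned for one name and
    record type. Returns (majority_answer, {label: answer} for dissenters).
    """
    if not answers:
        raise ValueError("need at least one answer")
    # On a tie, Counter.most_common keeps first-seen order, so the result is
    # deterministic but arbitrary; treat ties as "no clear majority" in practice.
    majority, _ = Counter(answers.values()).most_common(1)[0]
    dissenters = {label: ans for label, ans in answers.items() if ans != majority}
    return majority, dissenters


answers = {
    "us-east": frozenset({"203.0.113.10"}),
    "eu-west": frozenset({"203.0.113.10"}),
    "ap-south": frozenset({"198.51.100.7"}),
}
print(find_divergent_resolvers(answers))
```

Some disagreement right after a planned change is expected until the old TTL runs out; disagreement that persists beyond that window is the signal worth alerting on.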

3. TTL Misconfigurations

A TTL set too high means DNS changes take hours or days to reach everyone. A TTL set too low means resolvers re-query constantly, adding lookup latency whenever a cached answer has expired and putting extra load on your DNS provider.
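The arithmetic behind "hours or days" is simple: a resolver that cached the old record just before your change may keep serving it for up to one full TTL. A small sketch of that calculation plus a range check; the 60-second and 3600-second bounds are illustrative assumptions, not a standard.

```python
def max_staleness_hours(ttl_seconds):
    """Worst-case time a resolver may keep serving a record it cached just
    before you changed it: one full TTL."""
    return ttl_seconds / 3600


def classify_ttl(ttl_seconds, low=60, high=3600):
    """Classify a TTL against illustrative bounds in seconds.

    The 60 s / 3600 s defaults are assumptions for this sketch; pick bounds
    that match how quickly you need changes to take effect.
    """
    if ttl_seconds < low:
        return "too_low"
    if ttl_seconds > high:
        return "too_high"
    return "ok"


# A one-day TTL means some users may see the old record for up to 24 hours.
print(max_staleness_hours(86400), classify_ttl(86400))
```

A common pattern is to lower the TTL ahead of a planned migration and raise it again once the new records are confirmed.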

4. Domain Expiration

Your domain registrar sends renewal emails to an inbox nobody checks. The domain expires. Your DNS stops resolving. Everything goes down, and recovery can take 24-72 hours if the domain enters redemption.
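Expiration is a date, not a DNS answer, so it needs its own check. A minimal sketch, assuming you already fetch the expiration timestamp from your registrar or an RDAP/WHOIS lookup (that part is out of scope here); the `expiry_alert_level` helper and its 30-day and 7-day thresholds are hypothetical.

```python
from datetime import datetime, timezone


def expiry_alert_level(expires_at, now=None, warn_days=30, critical_days=7):
    """Return 'critical', 'warning', or 'ok' for a timezone-aware expiry time.

    Thresholds are illustrative. An already-expired domain has a negative
    day count and is reported as 'critical'.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    days_left = (expires_at - now).days
    if days_left < critical_days:
        return "critical"
    if days_left < warn_days:
        return "warning"
    return "ok"


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(expiry_alert_level(datetime(2024, 1, 20, tzinfo=timezone.utc), now))
```

Routing the "warning" level to a channel people actually read is the point: the failure mode above is a renewal notice that reached nobody.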

5. NS Record Issues

Your nameserver (NS) records still point to a DNS provider you migrated away from months ago. As long as the old provider keeps serving your zone, everything works. When they clean up stale zones, it stops working.
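A delegation check compares the nameservers you expect with the ones the parent zone actually delegates to (for example, as shown in the referral from the TLD servers in `dig +trace yourapp.com NS`). A minimal sketch, assuming you already have both lists; the provider hostnames and the `check_delegation` helper are hypothetical.

```python
def normalize_ns(host):
    # DNS names are case-insensitive and may carry a trailing root dot.
    return host.strip().lower().rstrip(".")


def check_delegation(expected_ns, delegated_ns):
    """Compare the NS set you expect with the NS set the parent delegates to.

    'stale' lists nameservers still delegated but no longer expected (such as
    a provider you migrated away from); 'missing' lists expected ones that
    are not delegated.
    """
    want = {normalize_ns(h) for h in expected_ns}
    got = {normalize_ns(h) for h in delegated_ns}
    return {"stale": sorted(got - want), "missing": sorted(want - got)}


print(check_delegation(
    ["ns1.new-provider.example", "ns2.new-provider.example"],
    ["NS1.new-provider.example.", "ns1.old-provider.example."],
))
```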

6. DNS Hijacking

An attacker changes your DNS records to point to their server. Users then unknowingly send credentials and data to a malicious endpoint that looks exactly like yours. Monitoring that your DNS records match their expected values catches this.
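Exact-value assertions are the strictest defense. When your addresses legitimately rotate within known ranges, a looser alternative is an allowlist of networks: any answer outside them is suspicious. This is a different technique from exact matching, sketched here with Python's standard `ipaddress` module; the `ALLOWED_NETWORKS` ranges (documentation prefixes) and the `unexpected_addresses` helper are hypothetical.

```python
import ipaddress

# Hypothetical allowlist: networks your records are allowed to point into.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def unexpected_addresses(addresses, allowed=ALLOWED_NETWORKS):
    """Return the A/AAAA answers that fall outside every allowed network."""
    bad = []
    for raw in addresses:
        addr = ipaddress.ip_address(raw)
        # Membership is False when IP versions differ, so IPv4 and IPv6
        # networks can safely share one list.
        if not any(addr in net for net in allowed):
            bad.append(raw)
    return bad


print(unexpected_addresses(["203.0.113.10", "2001:db8::1", "198.51.100.66"]))
```

The trade-off: an allowlist tolerates routine IP changes, but it will not notice a record moved to a different host inside your own ranges.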

7. Resolver Failures

Your DNS provider has an outage. This is rarer than application-level issues, but it has happened to major providers. Monitoring from multiple regions through different resolvers catches provider-specific outages.
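Multi-region results are most useful when you interpret the pattern, not just the count. A rough heuristic sketch, assuming each region reports a pass/fail result for the same check; the region labels and the `classify_outage` helper are hypothetical, and the labels it returns are hints for triage, not diagnoses.

```python
def classify_outage(results):
    """Classify per-region resolution results.

    `results` maps a region label to True (resolved as expected) or False.
    Returns (classification, list of failing regions).
    """
    if not results:
        raise ValueError("need at least one region result")
    failed = [region for region, ok in results.items() if not ok]
    if not failed:
        return "healthy", failed
    if len(failed) == len(results):
        # Every vantage point fails: the record, the zone, or the
        # authoritative provider itself is the likeliest culprit.
        return "global", failed
    # Only some vantage points fail: a regional resolver or a propagation
    # problem is more likely.
    return "partial", failed


print(classify_outage({"us-east": True, "eu-west": False, "ap-south": True}))
```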

What to Monitor and How

Record Value Assertions

Don't just check that DNS resolves. Verify that it resolves to the expected value:

dns-monitor.json

{
  "type": "dns",
  "hostname": "api.yourapp.com",
  "recordType": "A",
  "assertions": [
    { "type": "record.value", "operator": "contains", "value": "203.0.113.10" },
    { "type": "responseTime", "operator": "lt", "value": 500 }
  ]
}
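To make the semantics of a check like this concrete, here is a hypothetical evaluator for assertions shaped like `dns-monitor.json`. It is an illustration under assumed semantics ("contains" means the returned values include the given value, `responseTime` is in milliseconds), not upti.my's implementation.

```python
# Hypothetical evaluator; operator semantics are assumptions for illustration.
OPERATORS = {
    "contains": lambda observed, expected: expected in observed,
    "lt": lambda observed, expected: observed < expected,
}


def failed_assertions(check, record_values, response_time_ms):
    """Return the assertions from `check` that the observed result violates."""
    observed = {
        "record.value": record_values,      # values returned for check["recordType"]
        "responseTime": response_time_ms,   # assumed to be milliseconds
    }
    return [
        a for a in check["assertions"]
        if not OPERATORS[a["operator"]](observed[a["type"]], a["value"])
    ]


check = {
    "type": "dns",
    "hostname": "api.yourapp.com",
    "recordType": "A",
    "assertions": [
        {"type": "record.value", "operator": "contains", "value": "203.0.113.10"},
        {"type": "responseTime", "operator": "lt", "value": 500},
    ],
}
print(failed_assertions(check, ["198.51.100.66"], 120))
```

Note that a check which only tests "does it resolve?" would pass the call above, while the value assertion fails: that gap is exactly what record value assertions close.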

Monitor All Critical Record Types

A / AAAA records: Your servers' IP addresses
CNAME records: Aliases pointing to CDNs or load balancers
MX records: Email delivery (often forgotten until emails stop arriving)
TXT records: SPF, DKIM, and domain verification records
NS records: Ensure nameservers are correct after migrations
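TXT records deserve more than a value match. For SPF in particular, RFC 7208 treats more than one SPF record on a domain as a permanent error, so accidentally adding a second one breaks SPF evaluation for your mail. A minimal sketch, assuming the TXT answers have already been fetched and any multi-string records joined into single strings; the `spf_problems` helper and the sample records are hypothetical.

```python
def spf_problems(txt_records):
    """Flag common SPF breakage in a domain's TXT records.

    An SPF record starts with the version tag "v=spf1", followed by a space
    or the end of the record (compared case-insensitively here).
    """
    spf = [
        t for t in txt_records
        if t.lower() == "v=spf1" or t.lower().startswith("v=spf1 ")
    ]
    if not spf:
        return ["no SPF record"]
    if len(spf) > 1:
        return [f"{len(spf)} SPF records (must be exactly one)"]
    return []


print(spf_problems(["example-verification=abc123", "v=spf1 include:_spf.example.com ~all"]))
```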


Don't Forget MX

MX record monitoring catches email delivery failures before anyone realizes they stopped receiving messages. This is especially critical for transactional emails (password resets, invoices).
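An MX check parses the answers (each is a preference number plus a mail host) and confirms that every mail exchanger you rely on is still listed. A minimal sketch, assuming the answers arrive in presentation format such as `10 mx1.example.com.`; the hostnames and the `check_mx` helper are hypothetical.

```python
def parse_mx(answer):
    """Parse one MX answer in presentation format, e.g. '10 mx1.example.com.'."""
    pref, host = answer.split()
    return int(pref), host.lower().rstrip(".")


def check_mx(answers, expected_hosts):
    """Return expected mail exchangers that are absent from the live MX set.

    An empty `answers` list (no MX records at all) reports every expected
    host as missing.
    """
    live = {host for _, host in map(parse_mx, answers)}
    wanted = {h.lower().rstrip(".") for h in expected_hosts}
    return sorted(wanted - live)


print(check_mx(["10 mx1.example.com."], ["mx1.example.com", "mx2.example.com"]))
```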

Multi-Region Resolution

DNS can resolve differently depending on where you query from. Check from multiple geographic locations to catch propagation issues and region-specific resolver failures.

Setting This Up in upti.my

upti.my's DNS monitoring queries your domain from every continent and validates that the response matches your expectations:

Create a new healthcheck and select the DNS type
Enter your hostname and the record type to check (A, AAAA, CNAME, MX, TXT, NS)
Add assertions for expected record values
Configure check intervals (we recommend every 60 seconds for critical domains)
Set up alerts for instant notification when records change unexpectedly

Because checks run from multiple regions, you'll catch propagation issues that a single-location check would miss.

Key Takeaways

1. DNS failures look like your site doesn't exist. There's no error page, just nothing.
2. Caching makes DNS issues inconsistent and hard to debug. Some users can connect, some can't.
3. Monitor record values, not just resolution, to catch hijacking and accidental changes.
4. Check all critical record types, including MX (email) and NS (nameserver delegation).
5. Multi-region DNS monitoring catches propagation failures that single-point checks miss.
6. Domain expiration is a real threat. Monitor it separately from DNS records.

DNS monitoring isn't glamorous. Nobody writes blog posts about their great DNS monitoring setup. But the teams that monitor DNS are the teams that don't have 6-hour mystery outages that turn out to be a missing CNAME.

DNS Monitoring: What Can Go Wrong and How to Catch It