Runbook: Debugging 'My Custom Domain Isn't Working' as a SaaS Support Engineer
If you offer custom domains in your SaaS, you will get this ticket. A customer pastes their domain into your form, follows your DNS instructions, and 12 hours later writes "my domain isn't working." This is the runbook for going from "I don't know" to "fixed" in 5 minutes instead of 5 hours.
It's ordered by probability. Check in this sequence and roughly 80% of tickets resolve in the first three steps.
1. DNS hasn't propagated yet
Most common cause. Customer added the CNAME, then immediately reported it broken. DNS takes time.
Check:
$ dig +short shop.acme.com CNAME
# expect: edge.domainee.dev. (or whatever your edge target is)
$ dig +short shop.acme.com CNAME @1.1.1.1
$ dig +short shop.acme.com CNAME @8.8.8.8
$ dig +short shop.acme.com CNAME @9.9.9.9
If two or three resolvers see the CNAME but a fourth doesn't, it's mid-propagation. If none see it, the customer probably hasn't actually saved the record yet.
Send the customer:
DNS is still propagating. I can see the record from some resolvers but not from others. Give it 10-15 more minutes. You can check the status at https://dnschecker.org/#CNAME/shop.acme.com — once all the green checks show your CNAME value, the domain goes live in another minute or two.
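If you want to script the resolver comparison from the check above, a minimal Python sketch follows. It assumes you have already collected the first line of `dig +short ... CNAME` from each resolver (or `None` when the answer was empty). The function name and the status labels are hypothetical, not part of any existing tool.

```python
from typing import Mapping, Optional


def normalize(name: Optional[str]) -> Optional[str]:
    """Lowercase and drop the trailing dot so 'Edge.Example.' equals 'edge.example'."""
    if not name:
        return None
    return name.strip().rstrip(".").lower() or None


def propagation_status(answers: Mapping[str, Optional[str]], expected: str) -> str:
    """Classify per-resolver CNAME answers (hypothetical labels).

    answers maps a resolver IP to the CNAME it returned, or None for no answer.
    """
    want = normalize(expected)
    seen = [normalize(a) for a in answers.values()]
    hits = sum(1 for a in seen if a == want)
    if seen and hits == len(seen):
        return "live"            # every resolver sees the expected target
    if hits > 0:
        return "propagating"     # some resolvers see it, some don't
    if any(a is not None for a in seen):
        return "wrong-target"    # a CNAME exists but points elsewhere
    return "missing"             # no resolver sees any CNAME
```

A "propagating" result maps to the customer message above; "missing" suggests the record was never saved, and "wrong-target" points at cause #8.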
2. Customer put the CNAME at the apex
Customer wanted acme.com and added a CNAME at the root of the zone. Most registrars silently accept it, but a CNAME can't coexist with the SOA and NS records every apex needs, so it doesn't actually work (and it may have wiped out MX records, which is a separate fire).
Check:
$ dig +short acme.com CNAME
# If this returns anything, that's a problem.
$ dig +short acme.com A
# If this returns nothing, the customer's site is broken entirely.
$ dig +short acme.com MX
# Verify MX records still exist. If they don't, that's why their email died.
Send the customer:
You've added a CNAME at the apex of acme.com. DNS doesn't allow CNAMEs at the root — your registrar accepted it but it isn't actually doing anything. Two options:
- Use a subdomain instead (e.g. www.acme.com) — switch the field in our form
- Use A records at the apex, pointing at our edge IPs: 192.0.2.10, 192.0.2.11
If you also lost email after adding the CNAME, restore your MX records right now.
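The three apex lookups above can be folded into one summary for a ticket note. This is a rough Python sketch that takes the raw `dig +short` output for each record type as a string; the function names and the wording of the findings are assumptions made for illustration.

```python
from typing import List


def answer_lines(dig_output: str) -> List[str]:
    """Split raw `dig +short` output into non-empty answer lines."""
    return [line.strip() for line in dig_output.splitlines() if line.strip()]


def apex_findings(cname_out: str, a_out: str, mx_out: str) -> List[str]:
    """Return human-readable findings for an apex domain (hypothetical wording)."""
    findings = []
    if answer_lines(cname_out):
        findings.append("CNAME at apex: use A records or move to a subdomain")
    if not answer_lines(a_out):
        findings.append("no A records: the apex does not resolve to an address")
    if not answer_lines(mx_out):
        # Only a problem if the customer receives mail on this domain.
        findings.append("no MX records: confirm whether email was in use")
    return findings
```

An empty list means none of the apex checks tripped, so move on to the next cause.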
3. Cloudflare orange-cloud is enabled
Customer's domain uses Cloudflare DNS and they enabled the proxy (orange cloud). Cloudflare is now intercepting your TLS handshake. Cert validation fails, or you get a weird 521 error.
Check:
$ dig +short shop.acme.com A
# Cloudflare IPs look like: 104.16.x.x, 104.17.x.x, 172.64.x.x, 172.65.x.x.
# If you see these, the proxy is on.
$ curl -sI -o /dev/null -w "%{http_code}\n" https://shop.acme.com
# 521 = "Web server is down" — common when the proxy is on incorrectly.
Send the customer:
Your CNAME is going through the Cloudflare proxy (the orange cloud icon). For custom domains on our platform, the proxy needs to be OFF for this specific record.
- Log into Cloudflare
- Go to DNS for acme.com
- Find the CNAME for shop.acme.com
- Click the orange cloud to make it gray ("DNS only")
The domain goes live within a minute of that change.
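Eyeballing IP prefixes gets tedious, so the A-record check above can be automated with Python's standard `ipaddress` module. The sketch below hard-codes only a few IPv4 ranges from Cloudflare's published list as an assumption for illustration; a real tool should load the full, current list from https://www.cloudflare.com/ips/ rather than trust these constants.

```python
import ipaddress

# Assumption: a small sample of Cloudflare's published IPv4 ranges.
# The full list is longer and changes over time; refresh it from cloudflare.com/ips.
CLOUDFLARE_V4_SAMPLE = [
    ipaddress.ip_network(cidr)
    for cidr in ("104.16.0.0/13", "104.24.0.0/14", "172.64.0.0/13")
]


def is_cloudflare_ip(ip: str) -> bool:
    """True if ip falls inside one of the sample Cloudflare ranges."""
    try:
        addr = ipaddress.ip_address(ip.strip())
    except ValueError:
        return False  # not an IP at all (e.g. an empty dig answer)
    return any(addr in net for net in CLOUDFLARE_V4_SAMPLE)


def proxy_enabled(a_records: list) -> bool:
    """Treat the record as proxied if any returned A record is a Cloudflare IP."""
    return any(is_cloudflare_ip(ip) for ip in a_records)
```

Feed it the lines returned by `dig +short shop.acme.com A`; a True result maps to the orange-cloud message above.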
4. CAA records block Let's Encrypt
The customer has a CAA record at the apex restricting cert issuance. Let's Encrypt can't issue, so your TLS provisioning silently fails.
Check:
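$ dig +short shop.acme.com CAA
$ dig +short acme.com CAA
# CAs check the hostname first, then each parent up to the apex,
# and use the first CAA set they find.
# expect: nothing, OR something like:
# 0 issue "letsencrypt.org"
# issue records that name only other CAs (digicert.com, sectigo.com, etc.) block LE.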
Send the customer:
Your domain has a CAA record at the apex that restricts which certificate authorities can issue certs. We use Let's Encrypt, which isn't in your allowlist. Two options:
- Remove the CAA record entirely (allows any CA)
- Add a record:
acme.com CAA 0 issue "letsencrypt.org"
Either change lets our cert provisioning succeed. Allow 1-2 hours for the CAA cache to clear, then we'll retry automatically.
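To decide automatically whether a CAA record set blocks issuance, you can parse the `dig +short ... CAA` lines. The sketch below is a simplified reading of the CAA rules for a non-wildcard hostname: only `issue` properties count, and a set with no `issue` property does not restrict issuance. It evaluates a single record set, so the caller is assumed to pass the first non-empty set found when walking from the hostname up to the apex. The function name is hypothetical.

```python
from typing import Iterable


def caa_allows(records: Iterable[str], ca: str = "letsencrypt.org") -> bool:
    """True if the given CAA lines permit `ca` to issue for a non-wildcard name.

    Each line looks like: 0 issue "letsencrypt.org"
    """
    issuers = []
    for line in records:
        parts = line.strip().split(None, 2)  # flags, tag, value
        if len(parts) != 3:
            continue
        _flags, tag, value = parts
        if tag.lower() != "issue":
            continue  # issuewild, iodef, etc. don't apply here
        # The value may carry parameters after ';', e.g. validationmethods.
        domain = value.strip().strip('"').split(";", 1)[0].strip().lower()
        issuers.append(domain)
    if not issuers:
        return True  # no issue property means no restriction
    return ca.lower() in issuers
```

Note that `0 issue ";"` yields an empty issuer name, which correctly denies every CA.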
5. Cert is mid-issuance
DNS is correct. Cloudflare proxy is off. CAA is fine. Cert just hasn't finished issuing yet. Usually under 60 seconds, sometimes up to 5 minutes if Let's Encrypt is slow.
Check:
$ curl -sSI -o /dev/null -w "%{http_code}\n" https://shop.acme.com
# If you see 000 plus a TLS error such as "alert handshake failure", the cert isn't ready.
# (-S keeps the error message visible; plain -s would hide it.)
$ openssl s_client \
-connect shop.acme.com:443 \
-servername shop.acme.com < /dev/null 2>&1 | grep "subject="
# If subject doesn't include the hostname, the cert isn't bound yet.
Send the customer:
Your DNS is good. We're issuing the TLS cert now. This usually takes under a minute. Give it 2-3 minutes and try again. If it's still not working at the 5-minute mark, message me back.
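The "is the cert bound to this hostname" check above comes down to matching the hostname against the names on the served certificate. Here is a small Python sketch of that comparison, assuming you have already extracted the DNS names from the certificate's subject alternative names by whatever means you prefer. The function name is hypothetical, and the wildcard handling covers only the common single-label case.

```python
from typing import Iterable


def cert_covers(hostname: str, san_names: Iterable[str]) -> bool:
    """True if hostname matches one of the certificate's DNS names."""
    host = hostname.strip().rstrip(".").lower()
    for name in san_names:
        pattern = name.strip().rstrip(".").lower()
        if pattern == host:
            return True
        if pattern.startswith("*."):
            # A wildcard stands in for exactly one left-most label.
            head, _, rest = host.partition(".")
            if head and rest == pattern[2:]:
                return True
    return False
```

If this returns False while DNS, proxy, and CAA all look fine, the edge is most likely still serving a fallback certificate and issuance is in progress.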
6. Customer's nameservers are slow or broken
The customer's registrar / DNS provider has flaky nameservers. Records are correct, but propagation is slow and queries occasionally fail.
Check:
$ dig +trace shop.acme.com
# Walk the delegation chain. If the customer's nameservers (last hop)
# return SERVFAIL or time out, that's the issue.
$ dig +short ns acme.com
# Get the nameservers. Then query them directly:
$ dig @ns1.theirregistrar.com shop.acme.com CNAME
Send the customer:
Your domain's nameservers (ns1.theirregistrar.com) are responding slowly or inconsistently. This isn't something we can fix on our end; your registrar or DNS provider has to. Two options:
- Move DNS to a faster provider (Cloudflare DNS is free and fast)
- Or open a ticket with your current DNS provider
In the meantime, the domain works for most visitors but may fail for some. Sorry, this one is on your DNS provider's infrastructure.
7. Browser caching
Customer's own browser has the old DNS cached. The domain works for everyone else but not for them.
Check:
# From the customer's machine, have them run:
$ ping shop.acme.com
# If the IP they see is different from what dig returns from a public
# resolver, their browser or OS is serving a cached answer.
Send the customer:
Your browser has the old DNS cached. Try:
- Restart your browser entirely (close all windows, reopen)
- On macOS:
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
- On Windows:
ipconfig /flushdns
- Or test from another device (your phone on cellular data is the easiest test)
The domain IS working — you're seeing yesterday's DNS.
8. The customer made a typo
Embarrassingly common. They added the CNAME pointing at edge.domain.dev instead of edge.domainee.dev. Or the wrong subdomain. Or copy-pasted a trailing space.
Check:
$ dig +short shop.acme.com CNAME
# Make sure the value matches EXACTLY what your platform requires.
This is also where you catch trailing-dot weirdness (edge.domainee.dev. vs edge.domainee.dev). Some registrars require the trailing dot, some don't.
Send the customer:
Your CNAME is pointing at edge.domain.dev but the correct target is edge.domainee.dev. Fix the typo and the domain goes live within a minute.
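Comparing targets by eye is where trailing dots and stray spaces slip through, so it can help to normalize both values before comparing and to flag near-misses as probable typos. This is a minimal Python sketch using the standard library's `difflib`; the 0.8 similarity threshold and the result labels are assumptions chosen for illustration, not tuned values.

```python
from difflib import SequenceMatcher


def normalize_target(value: str) -> str:
    """Strip whitespace and the trailing dot, and lowercase the name."""
    return value.strip().rstrip(".").lower()


def compare_target(actual: str, expected: str, threshold: float = 0.8) -> str:
    """Classify the customer's CNAME value against the required target."""
    a, e = normalize_target(actual), normalize_target(expected)
    if a == e:
        return "match"
    # A high similarity ratio suggests a typo rather than a different service.
    if SequenceMatcher(None, a, e).ratio() >= threshold:
        return "likely-typo"
    return "different-target"
```

For example, `compare_target("edge.domain.dev", "edge.domainee.dev")` is classified as a likely typo, while a value copied with a trailing dot or trailing space still counts as a match.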
A flowchart, when you don't know which one
Customer reports broken domain
        |
        v
1. dig CNAME: does it resolve?
   NO  → A records are Cloudflare IPs? → step #3 (a proxied record answers with A records and no CNAME)
         otherwise: propagation OR typo OR missing record → step #1, #8
   YES → continue
        |
        v
2. Is the CNAME at the apex?
   YES → step #2
   NO  → continue
        |
        v
3. Are the A records Cloudflare IPs?
   YES → step #3 (orange cloud)
   NO  → continue
        |
        v
4. dig CAA on the apex: does it block LE?
   YES → step #4
   NO  → continue
        |
        v
5. curl -I: TLS handshake works?
   NO  → step #5 (cert issuing)
   YES → it's actually working; step #7 (browser cache)
The first 3 steps catch 80% of tickets. Steps 4-7 catch most of the rest.
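The flowchart translates naturally into a small triage function that a support tool could run once the individual checks have produced yes/no answers. The Python sketch below is one possible encoding; the `Observations` fields and the function name are hypothetical. It also looks at Cloudflare IPs when no CNAME is visible, because a proxied Cloudflare record answers with A records and no CNAME.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Observations:
    """Yes/no results of the individual checks (hypothetical field names)."""
    cname_resolves: bool
    cname_at_apex: bool
    a_records_cloudflare: bool
    caa_blocks_le: bool
    tls_handshake_ok: bool


def triage(obs: Observations) -> Tuple[int, ...]:
    """Return the runbook cause numbers to look at, most likely first."""
    if not obs.cname_resolves:
        if obs.a_records_cloudflare:
            return (3,)      # proxied record hides the CNAME
        return (1, 8)        # propagation, typo, or missing record
    if obs.cname_at_apex:
        return (2,)
    if obs.a_records_cloudflare:
        return (3,)
    if obs.caa_blocks_le:
        return (4,)
    if not obs.tls_handshake_ok:
        return (5,)
    return (7,)              # everything checks out: suspect a local cache
```

Returning cause numbers rather than prose keeps the function easy to test and lets the support tool attach the matching canned reply.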
How Domainee surfaces this in the dashboard
The runbook above is generally applicable. In Domainee specifically, we surface most of it automatically:
- DNS resolution runs every few minutes on pending domains; we show green/yellow/red with the exact reason
- CAA records are checked at hostname creation time; we warn BEFORE you tell the customer to add the CNAME
- Cloudflare orange-cloud is detected by inspecting the A records; we flag it with the fix
- TLS issuance state is exposed in the API and dashboard, so you know "still issuing" vs "actually broken"
- The exact dig/curl commands above run server-side; we surface results in the API so your support tools can show them inline
You still hit walls where the customer's own infra is broken (#6, #7, #8), but the first half of this list shouldn't take support time.
Ship it
If you're running a SaaS that offers custom domains and you don't have this runbook handy, save this post. If you're building one, you can skip a lot of it by using a provider that does the detection for you.
50 hostnames free, no card. The detection-and-surface logic above is in the dashboard from day one.
For more on monitoring at scale see /blog/monitor-customer-ssl-certificates-at-scale. For apex-domain specifics see /blog/apex-domain-support-for-saas.