Some of our DC-hosted services would intermittently lose the ability to reach their own public URLs. A webhook from one internal service to another would hang. A health probe would time out. The logs at both ends showed nothing useful.
It was always specific source/destination pairs. The VPS was reachable from the internet fine. From other parts of our network, fine. From most laptops, fine. From this DC host to that VPS’s public IP, TCP just timed out. No RST. Silent.
And once a pair was broken, it stayed broken.
The Architecture
We run a fairly common pattern: a public-facing VPS holds the public IP and TLS cert, terminates TLS, and reverse-proxies inbound traffic back into the DC over Tailscale. The DC itself has no public ingress.
That works well from the outside. From the inside, it has one awkward property: any DC-hosted service that follows DNS to its own public FQDN — for webhooks, health probes, third-party integrations, anything that doesn’t know the difference between an internal and external request — makes a public-internet round trip. DC → public IP of the VPS → reverse proxy → back into the DC.
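A quick way to see which of your own hostnames take that hairpin is to check what they resolve to. This is a minimal sketch, assuming the tailnet uses Tailscale’s default IPv4 range (100.64.0.0/10). The function names are ours, not part of any Tailscale tooling.

```python
import ipaddress
import socket

# Assumption: tailnet nodes get IPv4 addresses from the default CGNAT range.
TAILNET_RANGE = ipaddress.ip_network("100.64.0.0/10")

def path_for_ip(ip: str) -> str:
    """Classify an address as 'tailnet', 'private', or 'public'."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4 and addr in TAILNET_RANGE:
        return "tailnet"
    if addr.is_private or addr.is_loopback or addr.is_link_local:
        return "private"
    return "public"

def resolve_paths(fqdn: str, port: int = 443) -> dict:
    """Resolve a name and classify every address it returns."""
    infos = socket.getaddrinfo(fqdn, port, proto=socket.IPPROTO_TCP)
    return {info[4][0]: path_for_ip(info[4][0]) for info in infos}
```

Any service whose own FQDN comes back as "public" from inside the DC is making the round trip described above.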
That round trip is the path that broke.
The Investigation
Tcpdump on the DC: SYNs going out, nothing coming back. Tcpdump on the VPS: nothing arriving. Something between us and the VPS was eating packets. Traceroute indicated that our traffic intended for the VPS couldn’t make it past the ISP.
But only on the public-IP path. The same DC host could reach the same VPS perfectly well via the tailnet IP. HTTP/2, valid certs, no issue. So the network worked; just not the public-internet leg of it.
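The comparison is easy to script. A minimal sketch using only the Python standard library; the function name and the three-second timeout are our choices for illustration:

```python
import socket

def probe(host: str, port: int = 443, timeout: float = 3.0) -> str:
    """Attempt one TCP connect and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # An RST came back: the path works, nothing is listening.
        return "refused"
    except socket.timeout:
        # Nothing came back at all: the silent-drop signature.
        return "timeout"
    except OSError as exc:
        # DNS failure, unreachable network, and so on.
        return f"error: {exc}"
```

Run it from the affected DC host against both addresses of the same VPS. "timeout" on the public IP next to "connected" on the tailnet IP is the pattern we saw.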
That left two suspects: our DC’s egress, or the VPS’s ingress. The VPS was reachable from everywhere else, and our traceroutes died inside the ISP, so process of elimination pointed at the DC’s ISP. We asked them.
Their answer was unexpected. Their IDS had decided that some of the UDP traffic between our DC hosts and certain external IPs was malicious — specifically, Tailscale’s NAT-traversal probes. Similar fingerprints ship with most enterprise IDS and SASE products these days. Their IDS sees enough of those probes between a (source, destination) pair, blacklists the pair, and from then on silently drops all traffic between them — including unrelated HTTPS.
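To make the failure mode concrete, here is a toy model of the behaviour the ISP described. It is not any vendor’s actual logic: the threshold is a made-up number, and treating the pair as direction-agnostic is our reading of "all traffic between them".

```python
from collections import Counter

class PairBlacklistModel:
    """Toy model of a per-pair IDS blacklist; illustrative only."""

    def __init__(self, threshold: int = 5):  # hypothetical threshold
        self.threshold = threshold
        self.hits = Counter()
        self.blocked = set()

    @staticmethod
    def _pair(a: str, b: str) -> frozenset:
        # Assumption: the blacklist ignores direction.
        return frozenset((a, b))

    def forward(self, src: str, dst: str, looks_like_probe: bool) -> bool:
        """Return True if the packet is forwarded, False if silently dropped."""
        pair = self._pair(src, dst)
        if pair in self.blocked:
            return False
        if looks_like_probe:
            self.hits[pair] += 1
            if self.hits[pair] >= self.threshold:
                self.blocked.add(pair)
        return True
```

Once the counter for a pair trips, every later call for that pair returns False, probe or not. Other pairs are untouched, which is why the breakage looked so selective.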
We asked for a whitelist. They declined. The IDS was running a vendor signature set and the ISP wasn’t customising it per-customer. Their network, their call.
The Asymmetry
The probes triggering the IDS were flowing from the VPS toward our DC hosts. Two things are true about those probes:
- They can’t succeed. Our DC hosts are behind NAT. They have no public ingress. A probe from outside has nothing to discover.
- They cause damage. They’re the traffic the IDS fingerprints. They trigger the blacklist. The blacklist takes out everything else.
The probes in the opposite direction, from a DC host outward to the VPS’s known public IP, are useful and benign. They land on the VPS, the VPS replies through the NAT pinhole, and a direct connection forms. The IDS sees only a flow that was initiated from inside, plus the replies to it, and doesn’t react.
We wanted one half of the probing suppressed and the other half left alone.
What WireGuard Does for Free
WireGuard expresses this asymmetry trivially. A WireGuard peer config has an optional Endpoint= line:
```ini
[Peer]
PublicKey = <peer-pubkey>
AllowedIPs = 10.0.0.2/32
# No Endpoint line
```
Without Endpoint, the local WireGuard instance has no address to send to, so it cannot initiate a handshake toward that peer. It only responds once the peer initiates, and from then on it uses the source address it learned from that handshake. The two sides know each other’s identity (public keys) without both knowing each other’s address.
Direct connections still form. They just have to be initiated by the side that does know the endpoint. The responder replies to whatever NAT source the handshake arrived from.
This is the exact behaviour we needed: the VPS shouldn’t originate UDP toward the DC host’s announced endpoint, but the DC host can originate outward to the VPS just fine.
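The two halves of such a pairing can be sketched as a small config renderer. The keys, tunnel addresses, and the vps.example.net:51820 endpoint are placeholders. The PersistentKeepalive value on the NAT’d side is our assumption for keeping the pinhole open, not something taken from our production config.

```python
from typing import Optional

def render_peer(public_key: str, allowed_ips: str,
                endpoint: Optional[str] = None,
                keepalive: Optional[int] = None) -> str:
    """Render one WireGuard [Peer] section; Endpoint is optional by design."""
    lines = ["[Peer]", f"PublicKey = {public_key}", f"AllowedIPs = {allowed_ips}"]
    if endpoint is not None:
        lines.append(f"Endpoint = {endpoint}")
    if keepalive is not None:
        lines.append(f"PersistentKeepalive = {keepalive}")
    return "\n".join(lines)

# On the VPS: no Endpoint, so it cannot originate UDP toward the DC host.
vps_side = render_peer("<dc-pubkey>", "10.0.0.2/32")

# On the DC host: knows the VPS's public endpoint and initiates outward.
dc_side = render_peer("<vps-pubkey>", "10.0.0.1/32",
                      endpoint="vps.example.net:51820", keepalive=25)
```

The asymmetry lives entirely in which side gets the Endpoint line.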
What Tailscale Doesn’t Do
Tailscale’s discovery layer is symmetric by default. Every node probes every peer it’s authorised to talk to. There is no exposed mechanism to gate that initiation asymmetrically — not in the ACL grammar, not in nodeAttrs, not on the tailscale up command line.
The closest thing is an undocumented debug environment variable, TS_DEBUG_DERP_ONLY=1, that disables outbound probing on a whole client. It’s symmetric (kills inbound and outbound), process-wide (not per-peer), and explicitly unsupported.
So the WireGuard primitive that solves the problem in two lines of config is not addressable from Tailscale’s policy layer.
What We Did
Two things, in order.
Short-term, we built a side-channel. A dedicated VPS — not on the tailnet — running plain WireGuard. On that VPS, the WireGuard config for the DC peers had no Endpoint= line. The VPS never originated UDP toward the DC. The DC initiated outward to the VPS’s public IP, the VPS responded into the NAT pinhole, and we had a working tunnel that emitted no IDS-triggering probes. We routed the affected services through this dedicated VPS for their public-FQDN traffic. The silent drops stopped.
It worked. It also cost us an additional VPS to monitor and patch, a separate tunnel topology to maintain alongside the Tailscale-managed one we already had, and config sprawl any time we added a service that hit the problem. The fix was correct. It was also exactly the kind of bespoke side-channel that Tailscale exists to make unnecessary.
Longer-term, we moved the DC. The replacement facility uses a different ISP with a less aggressive IDS, and the symptom doesn’t reproduce there. The dedicated workaround VPS is on the way out as part of the move.
What Tailscale Could Do
A property on a node — call it passive-discovery, exposed via the existing nodeAttrs mechanism — that means other clients in the tailnet do not originate discovery probes toward this node. The tagged node continues to probe outward normally, so direct connections still form whenever it initiates.
The result:
| Pair | Discovery initiation | Result |
|---|---|---|
| Tagged → public-IP peer | Tagged initiates outward | Direct |
| Public-IP peer → tagged | Suppressed by attribute | DERP |
| Tagged ↔ NAT’d peer | Neither side has a usable endpoint | DERP |
| Other pairs | Unchanged | As today |
The probes that get suppressed are exactly the ones that couldn’t have succeeded — futile probes inward to a NAT’d node. The probes that do get sent are the ones with a public endpoint to actually reach. Nothing functional is lost.
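The table can be written down as a small decision function. This is a simplified model of the proposal, not Tailscale code: passive stands for the hypothetical passive-discovery attribute, and public_endpoint collapses all of NAT traversal into a single boolean.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    public_endpoint: bool  # reachable on a public IP without hole punching
    passive: bool = False  # the hypothetical passive-discovery attribute

def discovery_result(initiator: Node, target: Node) -> str:
    """Expected outcome when `initiator` wants a path to `target`."""
    if not (initiator.passive or target.passive):
        return "as today"  # the attribute is not involved
    if target.passive:
        return "DERP"      # probes toward a passive node are suppressed
    # The initiator is passive, so it still probes outward normally.
    return "direct" if target.public_endpoint else "DERP"
```

With a tagged DC host, a public VPS, and a NAT’d laptop, the function returns "direct" for DC host to VPS and "DERP" for the other two tagged rows.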
The engineering primitive almost certainly exists already. TS_DEBUG_DERP_ONLY proves the client can run without originating probes. What’s missing is the policy hook to make that behaviour addressable per peer, asymmetrically.
Why This Will Bite Other People
The specific ISP we hit is one of many networks where this can happen. Major enterprise IDS and SASE vendors — Palo Alto, Fortinet, Zscaler, several CASB products — have added Tailscale-traffic fingerprints to their default signature sets in the last 12–18 months. Anyone running a Tailscale client inside a network protected by those products is one signature update away from the same symptom: silently dropped public-IP traffic between specific pairs, no obvious connection to anything Tailscale-related until you notice the pattern.
For us this is now a historical problem, fixed by moving DCs. If you’re inside a network that fingerprints Tailscale’s NAT-traversal probes and can’t move, the side-channel above is what works today, using plain WireGuard. A passive-discovery attribute would make it work in Tailscale.
Takeaways
- The IDS reacts to the discovery layer, not the data plane. Once a direct connection is established, the IDS doesn’t care about the WireGuard traffic flowing over it. It’s the NAT-traversal probes that get fingerprinted.
- Tailscale’s symmetric probing has no off switch. There is no supported way to tell a client “don’t initiate discovery toward this peer.” Plain WireGuard gives you this for free via the optional Endpoint= line.
- TS_DEBUG_DERP_ONLY is the wrong shape. Process-wide, symmetric, debug-only. The engineering primitive exists; the policy hook doesn’t.
- A passive-discovery nodeAttrs flag would close this cleanly. Destination-tagged, asymmetric, surgical against probes that couldn’t have succeeded anyway. The probes that would have worked still work.
- If your ISP runs a Tailscale-aware IDS, you’ll find out the day it blacklists a pair. No logs, no errors, no warnings: just silent drops on specific source/destination combinations. If you’ve seen unexplained connection timeouts that survive a tcpdump on both sides, this is one thing to check.