PF-based firewall. Since you’re a sensible person, you have various rate limits set in your DNS servers to prevent or at least mitigate various forms of denial of service attacks. One day, your DNS servers become extremely popular for whatever reason, your rate limits kick in, and your firewall abruptly stops allowing new connections in or out. What on earth happened?
The answer is that you ran out of room in the PF state table. OpenBSD PF mostly works through state table entries, and when a rule that normally would create a new state table entry is unable to do so, the packet is dropped. This is somewhat documented in places like the max option for stateful rules:
Limits the number of concurrent states the rule may create. When this limit is reached, further packets that would create state are dropped until existing states time out.
(That this is more or less explicitly documented is better than it once was.)
One of the reasons that you can run out of state table entries despite your DNS servers dutifully rate-limiting their responses is that DNS is primarily UDP based, so PF doesn’t really know when a given UDP ‘connection’ is ‘closed’ and its state table entries should be cleaned up more aggressively. Instead, all PF does for UDP is guess timeouts based on packet counts, and those packet counts are for each unique set of source IP, source port, destination IP, and destination port. If your DNS query sources vary their source port for each query, this can add up fast.
(As we’ve seen, even TCP connections can linger in the state table for some time after they’re closed.)
The current OpenBSD 7.3 manual page for pf.conf says that the default maximum size of the state table is only 100,000 entries, which is often effectively 50,000 ‘connections’ (it’s not uncommon for each connection to create two state table entries). It doesn’t take a huge amount of bandwidth or a huge packets per second rate to exhaust that many state table entries, and it mostly doesn’t matter whether or not your DNS servers actually respond to the queries.
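To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It assumes every query arrives from a fresh source port, each state lingers for 60 seconds, each ‘connection’ costs two state table entries, and a query is about 100 bytes on the wire. All of these are my illustrative assumptions, not measurements, and the function names are made up for this example.

```python
# Back-of-the-envelope model (Little's law): entries in the table at steady
# state ~= new flows per second * seconds each state lingers * entries per flow.
# Every number here is an illustrative assumption, not a measurement.

def flows_per_second_to_fill(state_limit, timeout_seconds, entries_per_flow=2):
    """Sustained rate of new flows that keeps the state table full."""
    return state_limit / (timeout_seconds * entries_per_flow)


def bandwidth_kbit(flows_per_second, bytes_per_query=100):
    """Inbound bandwidth of that many one-packet queries, in kbit/s."""
    return flows_per_second * bytes_per_query * 8 / 1000


rate = flows_per_second_to_fill(state_limit=100_000, timeout_seconds=60)
print(f"about {rate:.0f} new flows/sec, about {bandwidth_kbit(rate):.0f} kbit/s")
```

Under those assumptions, somewhere around 833 new query flows a second (well under a megabit of inbound traffic) is enough to keep a default-sized state table full.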
That may sound odd so let’s cover it explicitly. PF has three states for UDP traffic: ‘first’ if the source has only sent one packet, ‘multiple’ if both ends have sent packets, ie your DNS server responded, and ‘single’ if the source has sent multiple packets (with the same source port) without a response, ie your DNS server is dropping their queries and they’re retrying. The first two states default to 60 second timeouts and the third defaults to a 30 second timeout, and that’s after packets stop flowing. A DNS query source that keeps re-sending its query every fifteen seconds (with the same source port) will keep even a ‘single’ state entry alive forever.
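The ‘alive forever’ effect is easy to see in a toy model. The sketch below is a simplification of my own, not PF’s actual logic: it uses a single fixed timeout, assumes every packet on the flow resets the timer, and ignores the fact that PF only purges expired states periodically. The function name is hypothetical.

```python
def state_lifetime(packet_times, timeout):
    """Seconds one state entry lives if each matching packet resets its timer.

    packet_times: sorted send times (seconds) for a single flow.
    timeout: idle seconds after which the state would be removed.
    """
    expires_at = packet_times[0] + timeout
    for t in packet_times[1:]:
        if t >= expires_at:
            break  # state already expired; this packet would create a new one
        expires_at = t + timeout
    return expires_at - packet_times[0]


# A source retrying every 15 seconds for an hour, against a 30 second timeout.
retries = list(range(0, 3601, 15))
print(state_lifetime(retries, timeout=30))

# A source retrying every 45 seconds lets the state expire between packets.
print(state_lifetime(list(range(0, 3601, 45)), timeout=30))
```

In this model the fifteen-second retrier holds its one state entry for the whole hour plus the final timeout, while the forty-five second retrier’s state goes away after 30 seconds each time (and then gets recreated by the next packet).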
As far as I can see, the only really good way to limit states created by UDP traffic is to set a max option on the rules involved. Often this will cover only half of the states created by this traffic (for reasons covered in my entry on state table entries). You can try to limit the number of source IPs and states per IP that can be created (and do so across relevant rules), but it’s hard to come up with sensible numbers for both that won’t block legitimate traffic while also not letting people blow out your state table.
(I assume without checking that you can set all of max, max-src-nodes, and max-src-states, and then have the total number of state entries limited by max instead of the product of the latter two. This could be useful if you want some per-IP firewall limits in addition to the total state limit, perhaps to ensure that one or a few IPs can’t eat up all of the total allowed states.)
All of this is surprising if you’re thinking of rate limiting and denial of service issues from the normal perspective of services on your hosts (such as DNS servers, or even web servers). In the host services world, if you reject or drop traffic through rate limiting, you’re done with the traffic and you don’t need to worry further (okay, yes, SYN cookies for TCP connection attempt traffic floods, but most things do that automatically today). But your OpenBSD PF firewall is still keeping state for that traffic your host rate-limited or dropped, and that state can (and will) add up, especially for UDP traffic.
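The arithmetic behind that assumption can be sketched as follows. This only models the behavior I’m assuming (that all three limits apply together and the tightest one wins); it is not something I’ve verified against PF, and the numbers and function name are made up for illustration.

```python
def effective_state_cap(max_states, max_src_nodes, max_src_states):
    """Assumed upper bound on states one rule can create with all three limits.

    Assumption (unverified against PF): the limits combine, so the bound is
    the smaller of 'max' and the product of the two per-source limits.
    """
    return min(max_states, max_src_nodes * max_src_states)


# Generous per-source limits alone would allow 5000 * 50 = 250,000 states,
# far more than a default-sized state table; 'max' is what bounds the total.
print(effective_state_cap(max_states=20_000, max_src_nodes=5_000, max_src_states=50))

# Tight per-source limits can end up being the real bound instead.
print(effective_state_cap(max_states=20_000, max_src_nodes=100, max_src_states=50))
```

The point of the first case is that per-source limits generous enough not to hurt legitimate clients can multiply out to far more than your state table holds, which is why you’d want max as the backstop.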