We use Unbound as our DNS resolver, partly because that’s what OpenBSD seems to like for this (our local DNS resolvers are OpenBSD machines). Unbound’s per-IP ratelimiting is currently considered experimental, but we’ve had good luck with ‘experimental’ Unbound features before (we were using general ratelimiting when that was marked as experimental). However, this still leaves us with two or perhaps three problems.
The first is trying to determine what per-IP ratelimit we should set. You can certainly pick ‘reasonable’ numbers, but that’s just guessing; what you really need is something like a histogram of how many IPs hit what peak QPS rates how often. That would let you pick a limit with some confidence that even unusual systems wouldn’t hit it in legitimate operation. We’ve started to gather some information based on OpenBSD pf state counts on our firewalls, and it turns out that the numbers are a bit surprising.
(A similar issue applies for general ratelimiting. We don’t actually know our general queries-per-second distribution for the ‘ratelimit’ setting; our current setting is a guess, and might be either too high or too low.)
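As a rough illustration of the sort of histogram this would take, here is a minimal Python sketch. It assumes you can somehow get a per-query log reduced to (timestamp, client IP) pairs, which is an assumption and not something we currently have; the function names are made up for this example. It also only looks at each IP’s single highest one-second count, not how often an IP reaches that rate, and it uses fixed one-second buckets rather than a sliding window.

```python
from collections import Counter, defaultdict


def peak_qps_per_ip(events):
    """Return {client_ip: highest query count seen in any one-second bucket}.

    events: iterable of (unix_timestamp, client_ip) pairs, one per query.
    (This input format is a hypothetical one chosen for the sketch.)
    """
    per_second = Counter()
    for ts, ip in events:
        # Fixed one-second buckets: truncate the timestamp to whole seconds.
        per_second[(ip, int(ts))] += 1

    peaks = defaultdict(int)
    for (ip, _second), count in per_second.items():
        peaks[ip] = max(peaks[ip], count)
    return dict(peaks)


def peak_histogram(peaks, bucket_edges):
    """Count how many IPs fall into each peak-QPS bucket.

    bucket_edges must be sorted ascending; an IP lands in the first
    bucket whose edge is >= its peak, or in a final overflow bucket.
    """
    hist = Counter()
    for peak in peaks.values():
        label = next(
            (f"<={edge}" for edge in bucket_edges if peak <= edge),
            f">{bucket_edges[-1]}",
        )
        hist[label] += 1
    return dict(hist)
```

With something like this you could look at where the long tail of legitimate clients sits before picking a number, instead of guessing.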
The second issue is that the problem may not be with single IPs that flood us with a high query volume (or may not just be that). In today’s environment, it might be that we’re seeing issues where certain sorts of devices all get into a bad state at the same time and start sending a bunch more queries than usual, but not so many that they would be unreasonable for any single IP by itself. This kind of bad behavior might be hard to trigger and hard to see (if, for example, it only happens when there’s the right sort of network glitch). There’s a lot of software monoculture these days and that provides plenty of opportunities for problems to be amplified.
(Getting insight into collective behavior needs fairly detailed statistics or monitoring, which is not feasible for us for our DNS.)
The third potential issue is that currently Unbound’s IP ratelimiting is a global setting. There’s no support for giving some IPs one ratelimit and different IPs another ratelimit (or no ratelimit). With no ability to set different ratelimits for different IPs, we’d have to set very conservative ratelimits to ensure that our critical machines would never be locked out from doing DNS queries even under some unpredictable situation of high (DNS) load.
(Unbound may change this in the future.)
My overall feeling is that per-IP ratelimiting for local DNS clients is currently quite hard to get right if you aren’t willing to either do a lot of complex monitoring and crunching of statistics in advance, or set somewhat arbitrary limits and cut clients off if they hit those limits. The latter is certainly an option in some environments, but is not ideal in a setting where you’re trying to be friendly and helpful (as we strive to be).
(One thing that could help with this is a dry-run mode for ratelimits, where you could set your DNS resolver to simply log if a client would have hit rate limits but not actually limit them. Then you could experiment to see how often a particular ratelimit would act if it was real, and on how many clients.)
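Lacking a dry-run mode in the resolver itself, you could approximate one offline if you had a query log to replay. The following is only a sketch of that idea in Python, under the assumption that queries are available as (timestamp, client IP) pairs. The function name is hypothetical, and the fixed one-second counting here is a simplification, not a description of how Unbound actually tracks rates internally.

```python
from collections import Counter


def dry_run_ratelimit(events, limit_qps):
    """Report which clients would have exceeded limit_qps; limit nothing.

    events: iterable of (unix_timestamp, client_ip) pairs, one per query.
    Returns {client_ip: total queries over the limit}, containing only
    the clients that would have been limited at least once.
    """
    per_second = Counter()
    for ts, ip in events:
        # Count queries per client in fixed one-second buckets.
        per_second[(ip, int(ts))] += 1

    over = Counter()
    for (ip, _second), count in per_second.items():
        if count > limit_qps:
            # Queries beyond the limit in this second would have been dropped.
            over[ip] += count - limit_qps
    return dict(over)
```

Running this with a few candidate limits over the same log would tell you how many clients each limit would have touched, and by how much.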