OOMFromCgroupStatisticWish

By admin

Under certain circumstances, Linux will trigger the Out-Of-Memory Killer and kill some process. For some time, there have been two general ways for this to happen, either a global OOM kill because the kernel thinks it’s totally out of memory, or a per-cgroup based OOM kill where a cgroup has a memory limit. These days the latter is quite easy to set up through systemd memory limits, especially user memory limits.

The kernel exposes a vmstat statistic for total OOM kills from all causes, as ‘oom_kill’ in /proc/vmstat; this is probably being surfaced in your local metrics collection agent under some name. Unfortunately, as far as I know the kernel doesn’t expose a simple statistic for how many of those OOM kills are global OOM kills instead of cgroup OOM kills. This difference is of quite some interest to people monitoring their systems, because a global OOM kill is probably important while a cgroup OOM kill may be entirely expected.
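
As a minimal sketch of reading that total yourself, assuming the usual one 'name value' pair per line layout of /proc/vmstat (the function names here are mine, not anything standard):

```python
from pathlib import Path
from typing import Dict, Optional


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse /proc/vmstat-style 'name value' lines into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not a simple 'name <integer>' pair.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def total_oom_kills(path: str = "/proc/vmstat") -> Optional[int]:
    """Return the oom_kill counter, or None if this kernel does not report it."""
    return parse_vmstat(Path(path).read_text()).get("oom_kill")
```

The number is a counter since boot and covers every cause, so it can tell you that an OOM kill happened but not which kind it was.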

Each cgroup does have information about OOM kills in its hierarchy (or sometimes only in itself, if you used the memory_localevents cgroups v2 mount option, per cgroups(7)). This information is in the ‘memory.events’ file, but as covered in the cgroups v2 documentation, this file is only present in non-root cgroups, which means that you can’t find a system wide version of this information in one place. If you know on a specific system that only one top level cgroup can have OOM kills, you can perhaps monitor that, but otherwise you need something more sophisticated (and in theory you might miss transient top level cgroups, although in practice most are persistent).
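
The 'something more sophisticated' could start out as a walk over the top level cgroups. This is only a sketch: it assumes a cgroups v2 hierarchy mounted at /sys/fs/cgroup (the common location, but still an assumption), and the function names are made up for illustration.

```python
from pathlib import Path
from typing import Dict


def parse_memory_events(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of a memory.events file."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            events[parts[0]] = int(parts[1])
    return events


def top_level_cgroup_oom_kills(root: str = "/sys/fs/cgroup") -> Dict[str, int]:
    """Map each top level cgroup name to its memory.events oom_kill count."""
    counts = {}
    for child in sorted(Path(root).iterdir()):
        if not child.is_dir():
            continue  # plain files such as cgroup.procs in the root
        try:
            text = (child / "memory.events").read_text()
        except OSError:
            # No memory controller enabled here, or the cgroup vanished.
            continue
        counts[child.name] = parse_memory_events(text).get("oom_kill", 0)
    return counts
```

A transient top level cgroup that disappears between two runs takes its count with it, which is the gap mentioned above. Also, as I read the cgroups v2 documentation, 'oom_kill' in memory.events counts processes in the cgroup killed by any kind of OOM killer, so these numbers are a rough signal rather than a clean count of cgroup-limit kills.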

The kernel definitely knows this information; the kernel log messages for global OOM kills are distinctly different from the kernel log messages for cgroup OOM kills. So the kernel could expose this information, for example as a new /proc/vmstat field or two; it just doesn’t (currently, as of fall 2023).
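
Until then, one workaround is to count the two kinds of kernel log messages. The sketch below assumes the message prefixes that recent kernels use, 'Out of memory: Killed process' for global kills and 'Memory cgroup out of memory: Killed process' for cgroup kills; older or patched kernels may word these differently, so check your own logs first. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumed message prefixes from recent kernels; verify against your logs.
_CGROUP_RE = re.compile(r"\bMemory cgroup out of memory: Killed process \d+")
_GLOBAL_RE = re.compile(r"\bOut of memory: Killed process \d+")


def classify_oom_line(line: str) -> Optional[str]:
    """Return 'cgroup', 'global', or None for a single kernel log line."""
    # Check the cgroup form first, since it is the more specific message.
    if _CGROUP_RE.search(line):
        return "cgroup"
    if _GLOBAL_RE.search(line):
        return "global"
    return None


def count_oom_kills(lines: Iterable[str]) -> Counter:
    """Count global and cgroup OOM kills in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        kind = classify_oom_line(line)
        if kind is not None:
            counts[kind] += 1
    return counts
```

You would feed this the output of something like 'journalctl -k' or 'dmesg', split into lines. Log scraping is fragile compared to a real counter, which is the reason for wanting the statistic in the first place.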

(Someday we may add a Prometheus cgroups metrics exporter to our host agents in our Prometheus environment and so collect this information, but so far I haven’t found a cgroup exporter that I like and that provides the information I want to know.)