Created on November 12, 2023 at 11:42 am

Recently, our IMAP server had unusually high CPU usage and was getting increasingly close to saturating its CPU. When I investigated with ‘top’ it was easy to see the culprit processes, but when I checked what they were doing with strace, they were all madly doing IO, specifically processing recursive IMAP LIST commands by walking around in the filesystem. Processes that do intense IO like this normally wind up in "iowait", not in active CPU usage (whether user or system CPU usage). Except here, these processes were using huge amounts of system CPU time.

What was happening is that these IMAP processes, trying to do recursive IMAP LISTs of all available ‘mail folders’, had managed to escape into ‘/sys’. The processes were working away more or less endlessly because Dovecot (the IMAP server software we use) makes the entirely defensible but less common decision to follow symbolic links when traversing directory trees, and Linux’s /sys has a lot of them (and may have ones that form cycles, so a directory traversal that follows symbolic links may never terminate). Since /sys is a virtual filesystem that is handled entirely inside the Linux kernel, traversing it and reading directories from it does no actual IO to actual disks. Instead, it’s all handled in kernel code, and all of the work to traverse around it, list directories, and so on shows up as system time.
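To make the "may never terminate" point concrete, here is a minimal sketch of a directory walk that follows symbolic links but remembers which directories it has already seen, so that a cycle is visited only once. This is not how Dovecot does its traversal; the function name walk_following_symlinks and the max_dirs safety cap are made up for illustration, and identifying a directory by its (device, inode) pair is one common approach that I'm assuming is good enough here.

```python
import os

def walk_following_symlinks(root, max_dirs=10000):
    """Count the distinct directories reachable from root, following symlinks.

    Hypothetical helper for illustration. Directories are identified by
    their (device, inode) pair, so a symlink that leads back to an
    already-visited directory is skipped instead of followed forever.
    """
    seen = set()
    stack = [root]
    visited = 0
    while stack and visited < max_dirs:
        path = stack.pop()
        try:
            st = os.stat(path)  # os.stat() follows symbolic links
        except OSError:
            continue  # dangling link, permission problem, or it vanished
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # already visited, possibly reached through a cycle
        seen.add(key)
        visited += 1
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    try:
                        # Treat symlinks to directories as directories too.
                        if entry.is_dir(follow_symlinks=True):
                            stack.append(entry.path)
                    except OSError:
                        continue
        except OSError:
            continue
    return visited
```

Without the seen set, a symbolic link that points back up the tree would keep the walk going until path names got too long or something else gave out. On /sys, every one of those stat() and directory-read calls is answered by kernel code, which is where the system time comes from.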

Operating on a virtual filesystem isn’t the only way that a program can turn a high IO rate into high system time. You can get the same effect if you’re repeatedly re-reading the same data that the kernel has cached in memory. Since the kernel can satisfy your IO requests without going to disk, all of the effort required turns into system CPU time inside the kernel. This is probably easiest to have happen when reading data from files, but you can also have programs that are repeatedly scanning the same directories or calling stat() (or lstat()) on the same filesystem names. All of those can wind up as entirely in-kernel activities because the modern Linux kernel is very good at caching things.
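If you want to see this effect for yourself, a rough sketch like the following calls stat() on the same name over and over and reports how much user and system CPU time the process used while doing it. The function name time_repeated_stat and the default repeat count are arbitrary choices for illustration. Since this is Python, the interpreter's own overhead adds user time, so the split will probably not be as lopsided as it would be for a C program making the same system calls.

```python
import os
import time

def time_repeated_stat(path, repeats=200_000):
    """stat() one path many times and report the CPU time that took.

    Hypothetical helper for illustration. After the first call the kernel
    should be answering from its caches, so no disk IO is expected and the
    kernel-side work is accounted as system CPU time.
    """
    before = os.times()
    start = time.monotonic()
    for _ in range(repeats):
        os.stat(path)
    after = os.times()
    return {
        "user": after.user - before.user,        # CPU time in user space
        "system": after.system - before.system,  # CPU time in the kernel
        "elapsed": time.monotonic() - start,     # wall clock time
    }
```

Something like time_repeated_stat("/etc/passwd") should finish with user plus system time close to the elapsed time, meaning the process was on the CPU the whole while instead of waiting for a disk.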

(Most people’s IMAP servers don’t have the sort of historical configuration issues we have that create these exciting adventures.)
