Sockets – Under The Hood: Understanding Truncated Request Behaviour

Created on November 12, 2023 at 11:32 am

I’m thoroughly pleased to be able to say that I finally understand the issue that’s been bothering me on Report URI for a few weeks now, and this is the blog post that’s going to explain everything!


If you haven’t read the first two blog posts related to this issue, it might help you to get some background on what I’m going to be talking about here:

Unravelling The Mystery Of Truncated POST Requests On Report URI

Processing Truncated Requests? A PHP Debugging Deep Dive

This weird behaviour of our application receiving and then trying to process truncated POST requests has led me on quite a journey of learning, and I can finally explain what’s happening!

The Story So Far

In the previous blog posts, I’d narrowed it down to what was happening, but not why it was happening, and I was still stumped. After patching PHP, I could see it was calling read() on the socket and reading until 0 was returned, but 0 was being returned earlier than expected. Here’s the log of the behaviour from the previous blog post.

Sep 29 20:23:31 test-server php: Accepted connection: 0ce264eae07198c3c59ae90d04127c39
Sep 29 20:23:31 test-server php: Read 16384 bytes (1024737 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (1008353 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (991969 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (975585 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (959201 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (942817 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (926433 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (910049 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (893665 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (877281 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (860897 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (844513 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (828129 bytes remaining)
Sep 29 20:23:31 test-server php: Read 16384 bytes (811745 bytes remaining)
Sep 29 20:23:31 test-server php: errno 0: Success
Sep 29 20:23:31 test-server php: Read 0 bytes (811745 bytes remaining)
Sep 29 20:23:31 test-server php: POST data reading complete. Size: 229376 bytes (out of 1041121 bytes)
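
For context, the pattern behind that log is just a loop that keeps calling read() until it either has the full declared body or gets 0 back. Here’s a rough sketch of that idea in Python, purely as an illustration of the pattern rather than the actual PHP FastCGI code:

import socket

def read_post_body(conn: socket.socket, content_length: int) -> bytes:
    """Keep reading until the declared body has arrived, or read() returns 0."""
    body = b""
    remaining = content_length
    while remaining > 0:
        chunk = conn.recv(min(16384, remaining))  # same 16384-byte reads as in the log
        if not chunk:                             # read() returned 0: the peer is done
            break
        body += chunk
        remaining -= len(chunk)
    return body                                   # may be shorter than content_length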

I knew that Nginx was receiving the full request, so the problem was happening in the fastcgi_pass to PHP-FPM, but I couldn’t wrap my head around the circumstances. I wanted to understand more about the underlying mechanics of how Nginx was talking to PHP and, more specifically, what was failing in that process.

bpftrace

In the last blog I had to get familiar with strace for the first time, and this time around, I’d asked for some assistance from Ignat Korchagin, who kindly whipped up several bpftrace commands and could also reliably reproduce the issue. Hopefully, by monitoring things a little more closely between Nginx and PHP, I could gain an understanding of precisely what was happening.

After monitoring my test server when the issue was happening, I could see Nginx hitting EAGAIN (source) when trying to write to the socket, and I could see PHP getting EPIPE (source) when trying to read from the socket, but it still wasn’t clear to me how that was occurring. The EAGAIN error is likely because the socket buffer is full when Nginx is sending, so it can’t send any more data, and the EPIPE comes when PHP tries to read from the socket that has since been closed by Nginx. Here’s the command.

$ bpftrace -e 'kretfunc:unix_stream_sendmsg /retval < 0/ { printf("%d\n", retval); }'
Attaching 1 probe...
-11
-11
[-11 repeated 20x]
-11
-11
-32
-32
[-32 repeated 12x]
-32
-32

The log reads like Nginx is trying to write to the socket which has a full buffer, and PHP is trying to read from a socket that’s closed, but if PHP has called accept(), why would it not be able to read()? The accept() call is followed by read() calls in the PHP source almost immediately, so it didn’t make sense. I also saw 12x EPIPE (-32) errors coming from PHP, and I had exactly 12x truncated payload errors in syslog using the same logging from the previous blog post.

Digging a little deeper, I wanted to see when the calls to connect() and accept() were being made.

$ bpftrace -e 'tracepoint:syscalls:sys_enter_connect /comm == "nginx"/ { printf("connect: %s\n", strftime("%H:%M:%S:%f", nsecs)); } tracepoint:syscalls:sys_enter_accept /comm == "php-fpm8.2"/ { printf("accept: %s\n", strftime("%H:%M:%S:%f", nsecs)); }'

After firing another batch of 20 requests, I could see all of the connect() calls from nginx happen right away, but the accept() calls from PHP were much more spaced out.

Attaching 2 probes...
connect: 10:42:27:244851
connect: 10:42:27:576388
connect: 10:42:27:643525
connect: 10:42:27:750282
connect: 10:42:27:923155
connect: 10:42:28:054478
connect: 10:42:28:419101
accept: 10:42:28:522173
connect: 10:42:28:716231
connect: 10:42:29:129984
connect: 10:42:29:146868
connect: 10:42:29:240262
connect: 10:42:29:307529
connect: 10:42:29:310247
accept: 10:42:29:523505
connect: 10:42:29:825400
connect: 10:42:29:832304
connect: 10:42:29:978008
connect: 10:42:30:155472
connect: 10:42:33:271992
connect: 10:42:35:891986
connect: 10:42:37:164266
accept: 10:42:37:257787
accept: 10:42:37:588539
accept: 10:42:38:534605
accept: 10:42:39:534456
accept: 10:42:47:260393
accept: 10:42:47:590909
accept: 10:42:48:537622
accept: 10:42:49:536993
accept: 10:42:57:263766
accept: 10:42:57:593434
accept: 10:42:58:540436
accept: 10:42:59:539849
accept: 10:43:07:267292
accept: 10:43:07:595731
accept: 10:43:08:543073
accept: 10:43:09:542913
accept: 10:43:17:271296
accept: 10:43:17:598086

I still wasn’t clear on how this was working because PHP is calling accept() and read() very close together, so if PHP can accept, why is the read failing with a partial payload?!

Oct 12 10:42:47 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:42:47 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:42:48 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:42:49 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:42:57 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:42:57 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:42:58 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:42:59 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:43:07 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:43:07 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:43:08 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:43:09 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:43:17 test-server php: JSON decode failed: 1041121 229376
Oct 12 10:43:18 test-server php: message repeated 2 times: [ JSON decode failed: 1041121 229376]
Oct 12 10:43:19 test-server php: JSON decode failed: 1041121 229376

The truncated payload issues showing in syslog matched up to the late accept() calls from PHP! It was at this point I was prompted to look at the backlog parameter of listen(), and it became clear that this behaviour can be explained by backlog, something I wasn’t familiar with.

int listen(int sockfd, int backlog);

When calling listen(), the second parameter provided is backlog, which is described as follows in the docs:

The backlog argument defines the maximum length to which the queue of pending connections for sockfd may grow.

So now we can have an application, like Nginx, call connect() and succeed without the application on the other side, PHP, ever calling accept()! This is finally starting to make a little more sense with the behaviour I’m seeing. It also turns out that Nginx can write() until the receive buffer on the socket is full, and that is when it’s going to hit EAGAIN. I confirmed this using strace:

cat nginx.strace | grep -E 'sendfile\(|connect\(|close\(|writev\('
75920 19:43:26 connect(3, {sa_family=AF_UNIX, sun_path="/run/php/php8.2-fpm.sock"}, 110) = 0
75920 19:43:26 writev(3, [{iov_base="\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0\1\4\0\1\2[\5\0\t\0PATH_I"..., iov_len=648}], 1) = 648
75920 19:43:26 sendfile(3, 29, [0] => [32768], 32768) = 32768
75920 19:43:26 writev(3, [{iov_base="\1\5\0\1\200\0\0\0", iov_len=8}], 1) = 8
75920 19:43:26 sendfile(3, 29, [32768] => [65536], 32768) = 32768
75920 19:43:26 writev(3, [{iov_base="\1\5\0\1\200\0\0\0", iov_len=8}], 1) = 8
75920 19:43:26 sendfile(3, 29, [65536] => [98304], 32768) = 32768
75920 19:43:26 writev(3, [{iov_base="\1\5\0\1\200\0\0\0", iov_len=8}], 1) = 8
75920 19:43:26 sendfile(3, 29, [98304] => [131072], 32768) = 32768
75920 19:43:26 writev(3, [{iov_base="\1\5\0\1\200\0\0\0", iov_len=8}], 1) = 8
75920 19:43:26 sendfile(3, 29, [131072] => [163840], 32768) = 32768
75920 19:43:26 writev(3, [{iov_base="\1\5\0\1\200\0\0\0", iov_len=8}], 1) = 8
75920 19:43:26 sendfile(3, 29, [163840] => [196608], 32768) = 32768
75920 19:43:26 writev(3, [{iov_base="\1\5\0\1\200\0\0\0", iov_len=8}], 1) = 8
75920 19:43:26 sendfile(3, 29, [196608] => [229376], 32768) = 32768
75920 19:43:26 writev(3, [{iov_base="\1\5\0\1\200\0\0\0", iov_len=8}], 1) = -1 EAGAIN (Resource temporarily unavailable)

I believe that Nginx is using writev() to send the fastcgi headers, and using sendfile() to send chunks of the body, which comes from this function in the code. It certainly seems like this is starting to make a little sense. Looking at the PHP config, I wanted to see what the backlog value was because it’s not something I ever recall setting. I found the following in our config:

; Set listen(2) backlog.
; Default Value: 511 (-1 on Linux, FreeBSD and OpenBSD)
;listen.backlog = 511

With a default value of -1, I was guessing that backlog might be cast to an unsigned int by the kernel, which it is (source).

if ((unsigned int)backlog > somaxconn)
    backlog = somaxconn;

You can also see that, as stated in the manual (source), if the defined backlog value is greater than somaxconn, it will be silently truncated to that value. Because -1 will actually be 4294967295, by default, PHP will always set backlog to the value of somaxconn, effectively letting the system decide.
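
As a quick sanity check of that cast-and-clamp, the same logic can be mirrored in a few lines of Python, purely for illustration, reading somaxconn the same way as the command below:

# Mirror the kernel's check: cast the backlog to unsigned, then clamp to somaxconn.
with open("/proc/sys/net/core/somaxconn") as f:
    somaxconn = int(f.read())

backlog = -1                            # PHP-FPM's documented default
if (backlog & 0xFFFFFFFF) > somaxconn:  # (unsigned int)backlog, i.e. 4294967295
    backlog = somaxconn

print(backlog)                          # ends up as whatever somaxconn is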

# cat /proc/sys/net/core/somaxconn
4096

This means we can have up to 4,096 connections pending in the backlog queue, which is quite a few!
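
To make that concrete, here’s a small self-contained Python sketch of exactly the situation described above: a client happily calling connect() and writing data to a listener that never calls accept(). The socket path and sizes are made up for the demo:

import os
import socket

SOCK_PATH = "/tmp/backlog-demo.sock"  # throwaway path for the demo
if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)

# The "PHP-FPM" side: listening, but never getting around to accept().
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(SOCK_PATH)
server.listen(4)  # a small backlog; PHP-FPM effectively ends up with somaxconn here

# The "Nginx" side: connect() succeeds straight away because the kernel simply
# queues the connection in the listener's backlog, and the write lands in the
# socket buffer even though nothing has accepted the connection yet.
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(SOCK_PATH)
sent = client.send(b"x" * 16384)
print(f"connect() and send() of {sent} bytes succeeded with no accept() in sight")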

The Theory

At this point, I believe something like the following is happening during the lifecycle of the request.

1. Nginx makes the connect() syscall and begins writing to the socket.
2. While PHP is able, it processes the initial requests, and we see a small number of requests succeeding.
3. PHP becomes overloaded and stops processing new, inbound requests.
4. Nginx continues to make new connections and fill the socket buffer with data.
5. Once the receive buffer is full, Nginx will receive EAGAIN and likely wait for something like fastcgi_send_timeout (source).
6. The send will time out and Nginx will close the connection.
7. PHP becomes available and accepts the connection, starting to read until the receive buffer has been consumed.
8. Once the buffer has been read, PHP will get EPIPE and process the content read from the buffer.
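
That sequence is easy enough to mimic with plain sockets, so here’s a sketch that extends the earlier one and plays both roles in one process. It’s only an illustration of the lifecycle above, with the send timeout replaced by simply giving up once the buffers are full, and the path and sizes made up:

import os
import socket

SOCK_PATH = "/tmp/lifecycle-demo.sock"  # throwaway path for the demo
if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)

# Steps 1-4: the listener ("PHP-FPM") is too busy to accept(), but the client
# ("Nginx") connects and writes as much of the body as the buffers will take.
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(SOCK_PATH)
server.listen(16)

client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(SOCK_PATH)
client.setblocking(False)

payload = b"x" * 1_000_000  # a ~1 MB POST body, like the reports above
sent = 0
try:
    while sent < len(payload):
        sent += client.send(payload[sent:sent + 16384])
except BlockingIOError:
    pass  # step 5: EAGAIN - this is where Nginx would sit in fastcgi_send_timeout

# Step 6: the timeout fires and the sender gives up and closes the connection.
client.close()

# Steps 7-8: a worker finally frees up, accepts the stale connection and reads
# whatever made it into the buffer, until read() returns 0.
conn, _ = server.accept()
received = 0
while True:
    chunk = conn.recv(16384)
    if not chunk:
        break
    received += len(chunk)

print(f"sent {sent} of {len(payload)} bytes, server read back {received} bytes")

The amount read back is whatever fitted into the socket buffers before the sender gave up, which is exactly the kind of truncated body that PHP then tries to JSON decode.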

Sounds logical, right? It should also be easy enough to test this now that I know what I’m aiming for, so I tinkered with some of the config parameters in Nginx, and it did indeed seem that fastcgi_send_timeout was the main factor. By reducing the send timeout, I would see more requests failing to parse in general, and I assume that’s because Nginx is timing out sooner so PHP has less time to get to the backlog.
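
For reference, that directive sits alongside fastcgi_pass in the Nginx config, something along these lines; the location block and the value shown are purely illustrative, the default being 60 seconds:

location ~ \.php$ {
    fastcgi_pass unix:/run/php/php8.2-fpm.sock;
    fastcgi_send_timeout 3s;  # timeout between two successive writes to PHP-FPM
}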

Testing The Theory!

Now for the fun part, where you actually find out if you were right, or you were completely off the mark! I was looking into ways that I could reliably reproduce this, and my first effort took a few steps. I reduced the number of child processes PHP can spawn to one, so it would be easy to keep it busy, and I reduced the send timeout on Nginx to three seconds so it would time out very quickly. With this basic setup I could send two requests sequentially and the first one would succeed and the second one would reliably fail, 100% of the time. I wanted something better than that though, something that would allow me to trigger this on a single request, and that’s when João Poupino dropped this little gem in Slack!

This gives me the ability to talk directly to the PHP socket, using the FastCGI protocol, and completely cut Nginx out of the equation. The script also has a feature that allows you to send only a partial request, where it will randomly close() the socket somewhere during the transmission of the payload. With this, it should now be possible to reproduce this problem with only a single request. The moment of truth….

python3 send-fastcgi-request.py unix:/run/php/php8.2-fpm.sock post.data --partial
[102935] Sending request 1/1
[102936] Connecting to unix:/run/php/php8.2-fpm.sock...
[102936] Sending 37 FastCGI records...
[102936] * Sent record 1/37 (8 + 8 bytes)
[102936] * Sent record 2/37 (8 + 512 bytes)
[102936] * Sent record 3/37 (8 + 0 bytes)
[102936] * Sent record 4/37 (8 + 32768 bytes)
[102936] * Sent record 5/37 (8 + 32768 bytes)
[102936] * Sent record 6/37 (8 + 32768 bytes)
[102936] * Sent record 7/37 (8 + 32768 bytes)
[102936] * Sent record 8/37 (8 + 32768 bytes)
[102936] * Sent record 9/37 (8 + 32768 bytes)
[102936] * Sent record 10/37 (8 + 32768 bytes)
[102936] * Sent record 11/37 (8 + 32768 bytes)
[102936] * Sent record 12/37 (8 + 32768 bytes)
[102936] * Sent record 13/37 (8 + 32768 bytes)
[102936] * Sent record 14/37 (8 + 32768 bytes)
[102936] * Sent record 15/37 (8 + 32768 bytes)
[102936] * Sent record 16/37 (8 + 32768 bytes)
[102936] * Sent record 17/37 (8 + 32768 bytes)
[102936] * Sent record 18/37 (8 + 32768 bytes)
[102936] * Sent record 19/37 (8 + 32768 bytes)
[102936] * Sent record 20/37 (8 + 32768 bytes)
[102936] * Sent record 21/37 (8 + 32768 bytes)
[102936] * Sent record 22/37 (8 + 32768 bytes)
[102936] * Sent record 23/37 (8 + 32768 bytes)
[102936] * Sent record 24/37 (8 + 32768 bytes)
[102936] Closing socket after record 24!

The script started to send my large POST payload and decided to close the socket after sending only 24 out of the 37 chunks of the request. Time to check syslog and see if we can find what I’m hoping for and expecting.

Oct 13 13:33:57 test-server php: POST data reading complete. Size: 688128 bytes (out of 1069923 bytes)
Oct 13 13:34:07 test-server php: JSON decode failed: 1069923 688128

Finally! 🍾🥂
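
If you fancy poking at this yourself without the original script, the core of such a client is small enough to sketch out. The record framing below is standard FastCGI (FCGI_BEGIN_REQUEST, FCGI_PARAMS, FCGI_STDIN), but this is my own stripped-down illustration rather than João’s script, and the socket path, script path and cut-off point are all just assumptions:

import socket
import struct

FCGI_BEGIN_REQUEST, FCGI_PARAMS, FCGI_STDIN = 1, 4, 5
FCGI_RESPONDER = 1

def record(rec_type: int, content: bytes, request_id: int = 1) -> bytes:
    # 8-byte record header: version, type, request id, content length, padding, reserved
    return struct.pack(">BBHHBx", 1, rec_type, request_id, len(content), 0) + content

def params(pairs: dict) -> bytes:
    # Name-value pairs with single-byte lengths (fine while everything is < 128 bytes)
    out = b""
    for name, value in pairs.items():
        n, v = name.encode(), value.encode()
        out += bytes([len(n), len(v)]) + n + v
    return out

body = b'{"payload":"' + b"x" * 1_000_000 + b'"}'  # a big JSON-ish POST body

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.connect("/run/php/php8.2-fpm.sock")

sock.sendall(record(FCGI_BEGIN_REQUEST, struct.pack(">HB5x", FCGI_RESPONDER, 0)))
sock.sendall(record(FCGI_PARAMS, params({
    "REQUEST_METHOD": "POST",
    "SCRIPT_FILENAME": "/var/www/html/index.php",  # hypothetical script path
    "CONTENT_TYPE": "application/json",
    "CONTENT_LENGTH": str(len(body)),
})))
sock.sendall(record(FCGI_PARAMS, b""))  # empty record marks the end of the params

# Send the body as FCGI_STDIN records, then bail out halfway through without
# ever sending the terminating empty FCGI_STDIN record.
for offset in range(0, len(body) // 2, 32768):
    sock.sendall(record(FCGI_STDIN, body[offset:offset + 32768]))
sock.close()  # as far as PHP is concerned, a truncated request

In principle, pointing something like this at an idle PHP-FPM pool should produce the same kind of size mismatch in syslog as shown above.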

It feels awesome to get to the bottom of something that’s been bugging you so much! I fixed the issue a whole two blog posts ago, and I could have moved on, but sometimes, I really just need to understand why something is happening, and honestly, it was quite a fun journey getting here. I owe a big thanks to João and Ignat, who collectively must have saved me countless days of effort that I might not even have had time for.

I’ve also skipped over a few of the debugging steps above because this was becoming another lengthy blog post, but I did, during and after debugging, spend a lot of time trawling through the Nginx source to understand fully what’s happening and also just to confirm my understanding.

The Nginx behaviour of using writev() and sendfile() can be seen here: https://github.com/nginx/nginx/blob/9f8d60081cd4eefa5fcf0df275d784d621290b9b/src/os/unix/ngx_linux_sendfile_chain.c#L50

Nginx adds a timeout when receiving EAGAIN here, using fastcgi_send_timeout. I also noticed in the Nginx source that it makes use of epoll(), which explains why, earlier, I was only seeing a single EAGAIN per request: https://github.com/nginx/nginx/blob/9f8d60081cd4eefa5fcf0df275d784d621290b9b/src/http/ngx_http_upstream.c#L2093

The event timer expires in Nginx here: https://github.com/nginx/nginx/blob/9f8d60081cd4eefa5fcf0df275d784d621290b9b/src/event/ngx_event_timer.c#L54

I also spotted that PHP was respawning the child processes when I was testing and that they’d always have new PIDs after a test run. It seems that if the child is idle for too long, PHP will kill and respawn it, possibly based on this timeout value: https://github.com/php/php-src/blob/47c6b3bd452d2932af5c2f021b10ab2aaed01fb1/sapi/fpm/fpm/fpm_conf.c#L622

All in all, I can now say I’m happy that I understand everything that was happening with this issue and can finally put it to bed!

As a final note, just to nicely close things off, I got a response on the PHP bug that I filed in the previous blog post, and it seems that this finding is indeed a bug and will be patched!
