AI for Web Devs: Faster Responses with HTTP Streaming

By admin
Welcome back to this series where we’re building web applications with AI tooling.

In the previous post, we got AI-generated jokes into our Qwik application from the OpenAI API. It worked, but the user experience suffered because we had to wait until the API completed the entire response before updating the client.

A better experience, as you’ll know if you’ve used any AI chat tools, is to respond as soon as each bit of text is generated. It becomes a sort of teletype effect.

That’s what we’re going to build today using HTTP streams.

Prerequisites

Before we get into streams, we need to explore a Qwik quirk related to HTTP requests.

If we examine the current POST request being sent by the form, we can see that the returned payload isn’t just the plain text we returned from our action handler. Instead, it’s this sort of serialized data.

This is the result of how the Qwik Optimizer lazy loads assets and is necessary to properly handle the data as it comes back. Unfortunately, this prevents standard streaming responses.

So while routeAction$ and the Form component are super handy, we’ll have to do something else.

To their credit, the Qwik team does provide a well-documented approach for streaming responses. However, it involves their server$ function and async generator functions. This would probably be the right approach if we’re talking strictly about Qwik, but this series is for everyone. I’ll avoid this implementation, as it’s too specific to Qwik, and focus on broadly applicable concepts instead.

Refactor Server Logic

It sucks that we can’t use route actions because they’re great. So what can we use?


Qwik City
GPE

offers a few options. The best I found is middleware. They provide enough access to primitive tools that we can accomplish what we need, and the concepts will apply to other contexts besides

Qwik
ORG

.


Middleware
ORG

is essentially a set of functions that we can inject at various points within the request lifecycle of our route handler. We can define them by exporting named constants for the hooks we want to target (

onRequest
ORG

,

onGet
GPE

, onPost , onPut , onDelete ).

So instead of relying on a route action, we can hook into any POST request by exporting an onPost middleware. In order to support streaming, we’ll want to return a standard Response object. We can do so by creating a Response object and passing it to the requestEvent.send() method.

Here’s a basic (non-streaming) example:

```js
/** @type {import('@builder.io/qwik-city').RequestHandler} */
export const onPost = (requestEvent) => {
  requestEvent.send(new Response('Hello Squirrel!'))
}
```

Before we tackle streaming, let’s get the same functionality from the old route action implemented with middleware. We can copy most of the code into the onPost middleware, but we won’t have access to formData.

Fortunately, we can recreate that data from the requestEvent.parseBody() method. We’ll also want to use requestEvent.send() to respond with the OpenAI data instead of a return statement.

```js
/** @type {import('@builder.io/qwik-city').RequestHandler} */
export const onPost = async (requestEvent) => {
  const OPENAI_API_KEY = requestEvent.env.get('OPENAI_API_KEY')
  const formData = await requestEvent.parseBody()
  const prompt = formData.prompt

  const body = {
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }]
  }

  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    // ... fetch options
  })
  const data = await response.json()
  const responseBody = data.choices[0].message.content

  requestEvent.send(new Response(responseBody))
}
```

Refactor Client Logic

Replacing the route actions has the unfortunate side effect of meaning we also can’t use the <Form> component anymore. We’ll have to use a regular HTML <form> element and recreate all the benefits we had before, including sending the HTTP request with JavaScript, tracking the loading state, and accessing the results. Let’s refactor our client-side to support those features again.

We can break these requirements down to needing two things: a JavaScript solution for submitting forms, and a reactive state for managing loading states and results.

I’ve covered submitting HTML forms with JavaScript in depth several times in the past, so today I’ll just share the snippet, which I put inside a utils.js file in the root of my project. This jsFormSubmit function accepts an HTMLFormElement, then constructs a fetch request based on the form attributes and returns the resulting Promise:

```js
/**
 * @param {HTMLFormElement} form
 */
export function jsFormSubmit(form) {
  const url = new URL(form.action)
  const formData = new FormData(form)
  const searchParameters = new URLSearchParams(formData)

  /** @type {Parameters<typeof fetch>[1]} */
  const fetchOptions = {
    method: form.method
  }

  if (form.method.toLowerCase() === 'post') {
    fetchOptions.body = form.enctype === 'multipart/form-data'
      ? formData
      : searchParameters
  } else {
    url.search = searchParameters
  }

  return fetch(url, fetchOptions)
}
```

This generic function can be used to submit any HTML form, so it’s handy to use in a submit event handler. Sweet!
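As a side note, the GET branch above works because URLSearchParams handles the form-encoding for us. Here’s a quick sketch of that serialization in isolation (the URL and field name are made up for illustration):

```javascript
// Hypothetical URL and field, just to show the serialization step.
const url = new URL('https://example.com/jokes')
const searchParameters = new URLSearchParams({ prompt: 'Tell me a joke' })

// Assigning URLSearchParams to url.search coerces it into a
// form-encoded query string (spaces become "+").
url.search = searchParameters

console.log(url.toString()) // https://example.com/jokes?prompt=Tell+me+a+joke
```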

As for the reactive data, Qwik provides two options: useStore and useSignal. I prefer useStore, which allows us to create an object whose properties are reactive, meaning changes to the object’s properties will automatically be reflected wherever they are referenced in the UI.

We can use useStore to create a “state” object in our component to track the loading state of the HTTP request as well as the text response.

```js
import { $, component$, useStore } from "@builder.io/qwik";

// other setup logic

export default component$(() => {
  const state = useStore({
    isLoading: false,
    text: '',
  })

  // other component logic
})
```

Next, we can update the template. Since we can no longer use the action object we had before, we can replace references to action.isRunning and action.value with state.isLoading and state.text, respectively (don’t ask me why I changed the property names 🤷‍♂️). I’ll also add a “submit” event handler to the form called handleSubmit, which we’ll look at shortly.

```jsx
<main>
  <form
    method="post"
    preventdefault:submit
    onSubmit$={handleSubmit}
  >
    <div>
      <label for="prompt">Prompt</label>
      <textarea name="prompt" id="prompt">
        Tell me a joke
      </textarea>
    </div>

    <button type="submit" aria-disabled={state.isLoading}>
      {state.isLoading ? 'One sec…' : 'Tell me'}
    </button>
  </form>

  {state.text && (
    <article>
      <p>{state.text}</p>
    </article>
  )}
</main>
```

Note that the <form> does not explicitly provide an action attribute. By default, an HTML form submits data to the current URL, so we only need to set the method to POST and submit the form to trigger the onPost middleware we defined earlier.

Now, the last step to get this refactor working is defining handleSubmit. Just like we did in the previous post, we need to wrap an event handler inside Qwik’s $ function.

Inside the event handler, we’ll want to clear out any previous data from state.text, set state.isLoading to true, then pass the form’s DOM node to our fancy jsFormSubmit function. This should submit the HTTP request for us. Once it comes back, we can update state.text with the response body, and return state.isLoading to false.

```js
const handleSubmit = $(async (event) => {
  state.text = ''
  state.isLoading = true

  /** @type {HTMLFormElement} */
  const form = event.target

  const response = await jsFormSubmit(form)

  state.text = await response.text()
  state.isLoading = false
})
```

OK! We should now have a client-side form that uses JavaScript to submit an HTTP request to the server while tracking the loading and response states, and updating the UI accordingly.

That was a lot of work to get the same solution we had before but with fewer features. BUT the key benefit is we now have direct access to the platform primitives we need to support streaming.

Enable Streaming on the Server

Before we start streaming responses from OpenAI, I think it’s helpful to start with a very basic example to get a better grasp of streams. Streams allow us to send small chunks of data over time. So as an example, let’s print out some iconic David Bowie lyrics in tempo with the song “Space Oddity”.

When we construct our Response object, instead of passing plain text, we’ll want to pass a stream. We’ll create the stream shortly, but here’s the idea:

```js
/** @type {import('@builder.io/qwik-city').RequestHandler} */
export const onPost = (requestEvent) => {
  requestEvent.send(new Response(stream))
}
```

We’ll create a very rudimentary stream using the ReadableStream constructor and pass it an optional parameter. This optional parameter can be an object with a start method that’s called when the stream is constructed.

The start method is responsible for the stream’s logic and has access to the stream controller, which is used to send data and close the stream.

```js
const stream = new ReadableStream({
  start(controller) {
    // Stream logic goes here
  }
})
```

OK, let’s plan out that logic. We’ll have an array of song lyrics and a function to ‘sing’ them (pass them to the stream). The sing function will take the first item in the array and pass it to the stream using the controller.enqueue() method. If it’s the last lyric in the list, we can close the stream with controller.close(). Otherwise, the sing method can call itself again after a short pause.

```js
const stream = new ReadableStream({
  start(controller) {
    const lyrics = ['Ground', ' control', ' to major', ' Tom.']
    function sing() {
      const lyric = lyrics.shift()
      controller.enqueue(lyric)
      if (lyrics.length < 1) {
        controller.close()
      } else {
        setTimeout(sing, 1000)
      }
    }
    sing()
  }
})
```

So each second, for four seconds, this stream will send out the lyrics “Ground control to major Tom.” Slick!

Because this stream will be used in the body of the Response, the connection will remain open for four seconds, until the response completes. But the frontend will have access to each chunk of data as it arrives, rather than waiting the full four seconds.

This doesn’t speed up the total response time (in some cases, streams can increase response times), but it does allow for a faster perceived response, and that makes a better user experience.

Here’s what my code looks like:

```js
/** @type {import('@builder.io/qwik-city').RequestHandler} */
export const onPost = async (requestEvent) => {
  const stream = new ReadableStream({
    start(controller) {
      const lyrics = ['Ground', ' control', ' to major', ' Tom.']
      function sing() {
        const lyric = lyrics.shift()
        controller.enqueue(lyric)
        if (lyrics.length < 1) {
          controller.close()
        } else {
          setTimeout(sing, 1000)
        }
      }
      sing()
    }
  })

  requestEvent.send(new Response(stream))
}
```

Unfortunately, as it stands right now, the client will still be waiting four seconds before seeing the entire response, and that’s because we weren’t expecting a streamed response.

Let’s fix that.

Enable Streaming on the Client

Even when dealing with streams, the default browser behavior when receiving a response is to wait for it to complete. In order to get the behavior we want, we’ll need to use client-side JavaScript to make the request and process the streaming body of the response.

We’ve already tackled that first part inside our handleSubmit function. Let’s start processing that response body.

We can get a reader for the response body (a ReadableStream) from its getReader() method. This reader provides its own read() method that we can use to access the next chunk of data, as well as whether the response is done streaming or not.
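To see the read() contract in isolation, here’s a toy sketch (Node 18+ or a modern browser; the stream below enqueues plain strings for simplicity, whereas a real response body yields binary chunks, which we’ll deal with next):

```javascript
// Reads every chunk from a stream and collects them in order.
async function readInOrder(stream) {
  const reader = stream.getReader()
  const seen = []
  let isStillStreaming = true
  while (isStillStreaming) {
    // Each read() resolves with the next chunk (value) and a done flag.
    const { value, done } = await reader.read()
    if (value !== undefined) seen.push(value)
    isStillStreaming = !done
  }
  return seen
}

// A toy stream; fetch response bodies enqueue bytes instead of strings.
const toyStream = new ReadableStream({
  start(controller) {
    controller.enqueue('one')
    controller.enqueue('two')
    controller.close()
  }
})

readInOrder(toyStream).then((seen) => console.log(seen)) // logs [ 'one', 'two' ]
```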

The only ‘gotcha’ is that the data in each chunk doesn’t come in as text; it comes in as a Uint8Array, which is “an array of 8-bit unsigned integers.” It’s basically the representation of the binary data, and you don’t really need to understand it any deeper than that unless you want to sound very smart at a party (or boring).

The important thing to understand is that on their own, these data chunks aren’t very useful. To get something we can use, we’ll need to decode each chunk of data using a TextDecoder.
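For instance, here’s a hand-made chunk being decoded (the bytes below are just “Hello” in UTF-8):

```javascript
const decoder = new TextDecoder()

// The same shape of data a streamed fetch body hands us: raw UTF-8 bytes.
const chunk = new Uint8Array([72, 101, 108, 108, 111])

console.log(decoder.decode(chunk)) // "Hello"
```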

Ok, that’s a lot of theory. Let’s break down the logic and then look at some code.

When we get the response back, we need to:

1. Grab the reader from the response body using response.body.getReader().
2. Set up a decoder using TextDecoder, along with a variable to track the streaming status.
3. Process each chunk until the stream is complete, with a while loop that does this:
   - Grab the next chunk’s data and stream status.
   - Decode the data and use it to update our app’s state.text.
   - Update the streaming status variable, terminating the loop when complete.
4. Update the loading state of the app by setting state.isLoading to false.

The new handleSubmit function should look something like this:

```js
const handleSubmit = $(async (event) => {
  state.text = ''
  state.isLoading = true

  /** @type {HTMLFormElement} */
  const form = event.target

  const response = await jsFormSubmit(form)

  // Parse streaming body
  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let isStillStreaming = true

  while (isStillStreaming) {
    const { value, done } = await reader.read()
    const chunkValue = decoder.decode(value)

    state.text += chunkValue
    isStillStreaming = !done
  }

  state.isLoading = false
})
```

Now, when I submit the form, I see something like:

“Ground control to major Tom.”

Hell yeah!!!

OK, most of the work is done. Now we just need to replace our demo stream with the OpenAI response.

Stream OpenAI Response

Looking back at our original implementation, the first thing we need to do is modify the request to OpenAI to let them know that we would like a streaming response. We can do that by setting the stream property in the fetch payload to true.

```js
const body = {
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: prompt }],
  stream: true
}
const response = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'post',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${OPENAI_API_KEY}`,
  },
  body: JSON.stringify(body)
})
```

Next, we could pipe the response from OpenAI directly to the client, but we might not want to do that. The data they send doesn’t really align with what we want to send to the client, because it looks like this (two chunks: one with data, and one representing the end of the stream):

```
data: {"id":"chatcmpl-4bJZRnslkje3289REHFEH9ej2","object":"chat.completion.chunk","created":1690319476,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":"Because"},"finish_reason":"stop"}]}

data: [DONE]
```

Instead, what we’ll do is create our own stream, similar to the David Bowie lyrics, that will do some setup, enqueue chunks of data into the stream, and close the stream. Let’s start with an outline:

```js
const stream = new ReadableStream({
  async start(controller) {
    // Any setup before streaming

    // Send chunks of data

    // Close stream
  }
})
```

Since we’re dealing with a streaming fetch response from OpenAI, a lot of the work we need to do here can actually be copied from the client-side stream handling. This part should look familiar:

```js
const reader = response.body.getReader()
const decoder = new TextDecoder()
let isStillStreaming = true

while (isStillStreaming) {
  const { value, done } = await reader.read()
  const chunkValue = decoder.decode(value)

  // Here's where things will be different

  isStillStreaming = !done
}
```

This snippet was taken almost directly from the frontend stream-processing example. The only difference is that we need to treat the data coming from OpenAI slightly differently. As we saw, the chunks of data they send will look something like “data: [JSON data or done]”. Another gotcha is that every once in a while, they’ll actually slip two of these data strings into a single streaming chunk. So here’s what I came up with for processing the data.

1. Create a Regular Expression to grab the rest of the string after “data: ”.
2. For the unlikely event there is more than one data string, use a while loop to process every match in the string.
3. If the current match is the closing condition (“[DONE]”), close the stream.
4. Otherwise, parse the data as JSON and enqueue the first piece of text from the list of options (json.choices[0].delta.content). Fall back to an empty string if none is present.
5. Lastly, in order to move to the next match, if there is one, we can use RegExp.exec().
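Those parsing steps can be sketched in isolation against a fake chunk that happens to carry two data strings (the payloads below are made up, trimmed to just the fields we read):

```javascript
// A fabricated chunk with two "data:" lines, mimicking the occasional
// OpenAI gotcha of multiple data strings in a single streaming chunk.
const chunkValue =
  'data: {"choices":[{"delta":{"content":"Why"}}]}\n\n' +
  'data: {"choices":[{"delta":{"content":" did"}}]}\n\n'

// Captures any string after the text `data: `
const regex = /data:\s*(.*)/g
const pieces = []
let match = regex.exec(chunkValue)

while (match !== null) {
  const payload = match[1]
  if (payload === '[DONE]') {
    // In the real stream, this is where we'd close the controller.
  } else {
    const json = JSON.parse(payload)
    pieces.push(json.choices[0].delta.content || '')
  }
  match = regex.exec(chunkValue)
}

console.log(pieces) // logs [ 'Why', ' did' ]
```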

The logic is quite abstract without looking at code, so here’s what the whole stream looks like now:

```js
const stream = new ReadableStream({
  async start(controller) {
    // Do work before streaming
    const reader = response.body.getReader()
    const decoder = new TextDecoder()
    let isStillStreaming = true

    while (isStillStreaming) {
      const { value, done } = await reader.read()
      const chunkValue = decoder.decode(value)

      /**
       * Captures any string after the text `data: `
       * @see https://regex101.com/r/R4QgmZ/1
       */
      const regex = /data:\s*(.*)/g
      let match = regex.exec(chunkValue)

      while (match !== null) {
        const payload = match[1]

        // Close stream
        if (payload === '[DONE]') {
          controller.close()
        } else {
          const json = JSON.parse(payload)
          const text = json.choices[0].delta.content || ''
          // Send chunk of data
          controller.enqueue(text)
        }

        match = regex.exec(chunkValue)
      }

      isStillStreaming = !done
    }
  }
})
```

Review

That should be everything we need to get streaming working. Hopefully it all makes sense and you got it working on your end.

I think it’s a good idea to review the flow to make sure we’ve got it:

1. The user submits the form, which gets intercepted and sent with JavaScript. This is necessary to process the stream when it returns.
2. The request is received by the onPost middleware, which forwards the data to the OpenAI API along with the setting to return the response as a stream.
3. The OpenAI response will be sent back as a stream of chunks, some of which contain JSON and the last of which is “[DONE]”. Instead of passing that stream to our response, we create a new stream to use in the response.
4. Inside our stream, we process each chunk of data from the OpenAI response and convert it to something more useful before enqueuing it for our response stream.
5. When the OpenAI stream closes, we also close our stream.
6. The JavaScript handler on the client side processes each chunk of data as it comes in and updates the UI accordingly.

Conclusion

The app is working. It’s pretty cool. We covered a lot of interesting things today. Streams are very powerful but also challenging, and there are a couple of little gotchas, especially when working within Qwik. But because we focused on low-level fundamentals, these concepts should apply across any framework.

As long as you have access to the platform and primitives like streams, requests, and response objects, this should work. That’s the beauty of fundamentals.
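To drive that point home, here’s the same lyric idea served from a bare Node server with no framework at all (a sketch; the port and module style are arbitrary choices of mine, not from the article):

```javascript
import http from 'node:http'

// Streams the lyrics over a plain HTTP response, one chunk per second,
// using nothing but Node's built-in primitives.
const server = http.createServer((request, response) => {
  const lyrics = ['Ground', ' control', ' to major', ' Tom.']
  response.writeHead(200, { 'Content-Type': 'text/plain' })

  function sing() {
    // Each write() flushes a chunk to the client immediately.
    response.write(lyrics.shift())
    if (lyrics.length < 1) {
      response.end()
    } else {
      setTimeout(sing, 1000)
    }
  }
  sing()
})

server.listen(3000)
```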

I think we got a pretty decent application going now. The only problem is that right now we’re using a generic text input and asking users to fill in the entire prompt themselves. In fact, they can put in whatever they want. We’ll want to fix that in a future post, but the next post is going to step away from code and focus on understanding how the AI tools actually work.

I hope you’ve been enjoying this series and come back for the rest of it.

Thank you so much for reading. If you liked this article, and want to support me, the best ways to do so are to share it, sign up for my newsletter, and follow me on Twitter.

Originally published on austingil.com.