XML Streamify – David Bushell – Freelance Web Design (UK)

Created on November 12, 2023 at 10:52 am

Friday 20 Oct 2023

I’ve only gone and published an XML parsing library!

It’s called XML Streamify on GitHub and it works like this:

for await (const node of parse('https://dbushell.com/rss.xml')) {
  if (node.is('channel', 'item')) {
    console.log(node.first('title')?.innerText);
  }
}

This example outputs blog post titles from my RSS feed.

The parse function is an async generator in front of a custom TransformStream. parse will yield nodes as the XML document is fetched and parsed.
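To make the "async generator in front of a TransformStream" idea concrete, here is a minimal sketch of the pattern. It is not XML Streamify's actual implementation: the names chunked, tokenizer, and tokens are made up for illustration, and the tokenizer only splits text into tags and text runs (no attributes, comments, or CDATA).

```javascript
// Hypothetical stand-in for a network response arriving in pieces.
function chunked(parts) {
  return new ReadableStream({
    start(controller) {
      for (const part of parts) controller.enqueue(part);
      controller.close();
    }
  });
}

// Hypothetical TransformStream: buffers text and emits a token as soon as
// a complete "<...>" tag is available, even if it spans several chunks.
function tokenizer() {
  let buffer = '';
  return new TransformStream({
    transform(chunk, controller) {
      buffer += chunk;
      for (;;) {
        const open = buffer.indexOf('<');
        if (open === -1) break;
        const close = buffer.indexOf('>', open);
        if (close === -1) break; // incomplete tag, wait for the next chunk
        const text = buffer.slice(0, open).trim();
        if (text) controller.enqueue({ type: 'text', value: text });
        controller.enqueue({ type: 'tag', value: buffer.slice(open + 1, close) });
        buffer = buffer.slice(close + 1);
      }
    }
  });
}

// The async generator "in front": yields tokens while data is still arriving.
async function* tokens(stream) {
  const reader = stream.pipeThrough(tokenizer()).getReader();
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) return;
      yield value;
    }
  } finally {
    // Runs on normal completion and on early exit (break or throw).
    await reader.cancel();
  }
}
```

A consumer can then write for await (const token of tokens(chunked([...]))) and start handling tokens from the first chunk without waiting for the last one.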

This allows you to work with data before the fetch is complete.

const controller = new AbortController();
const parser = parse('https://dbushell.com/rss.xml', {
  signal: controller.signal
});
for await (const node of parser) {
  if (node.is('channel', 'lastBuildDate')) {
    console.log(new Date(node.innerText));
    controller.abort();
  }
}

This example logs the lastBuildDate and then aborts the fetch request, thus ending the parse process. It doesn’t wait for the entire file to download. This can save time and bandwidth. It’s not the fastest XML parser on the market but it gets the job done before other parsers even start. And it is actually quite fast.
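The bandwidth saving can be illustrated without a network. The sketch below is an assumption-laden stand-in, not the library's code: slowFeed, readUntilAborted, and firstItems are hypothetical names, and the fake feed produces chunks only on demand so the number of chunks actually requested can be counted. A real aborted fetch typically rejects with an AbortError; this sketch simply stops reading instead.

```javascript
// Hypothetical feed of 100 chunks, produced lazily like a slow download.
function slowFeed(stats) {
  let i = 0;
  return new ReadableStream({
    pull(controller) {
      if (i >= 100) return controller.close();
      stats.pulled++;
      controller.enqueue(`<item>${i++}</item>`);
    },
    cancel() {
      stats.cancelled = true; // the source was told to stop
    }
  });
}

// Yields chunks until the stream ends or the signal is aborted.
async function* readUntilAborted(stream, signal) {
  const reader = stream.getReader();
  try {
    while (!signal.aborted) {
      const { done, value } = await reader.read();
      if (done) return;
      yield value;
    }
  } finally {
    await reader.cancel(); // stop the source instead of draining it
  }
}

// Reads `count` chunks, then aborts and reports how much was requested.
async function firstItems(count) {
  const stats = { pulled: 0, cancelled: false };
  const controller = new AbortController();
  const seen = [];
  for await (const chunk of readUntilAborted(slowFeed(stats), controller.signal)) {
    seen.push(chunk);
    if (seen.length === count) controller.abort();
  }
  return { seen, stats };
}
```

Calling firstItems(3) should return three chunks while stats.pulled stays far below 100, because the stream is cancelled as soon as the consumer has what it needs.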

I primarily designed this to handle RSS feeds. Podcast feeds, for example, can be several megabytes with hundreds of episodes. This library makes it possible to read the latest episodes and skip the rest.

Cross-platform

XML Streamify should work in all JavaScript runtimes: Bun, Deno, Node, and web browsers. Bun has issues with the abort controller. It’s not my fault, I think, but I’m still debugging.

Right now only Firefox supports async iterators with a ReadableStream. Other browsers can use a tiny polyfill in the meantime. A frustrating majority of RSS feeds lack the appropriate CORS headers. I thought about building an RSS reader web app but it’s practically impossible without at least a proxy server.
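For reference, a polyfill of this kind can be very small. The following is a sketch of the general technique and not necessarily the polyfill used with XML Streamify; iterateStream is a hypothetical helper name. It reads with a regular stream reader and only installs itself where Symbol.asyncIterator is missing on ReadableStream.

```javascript
// Hypothetical helper: async iteration over any ReadableStream.
async function* iterateStream(stream) {
  const reader = stream.getReader();
  let finished = false;
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) {
        finished = true;
        return;
      }
      yield value;
    }
  } finally {
    // On early exit (break or throw), ask the source to stop producing data.
    // Errors from cancel are ignored so they cannot mask the original error.
    if (!finished) await reader.cancel().catch(() => {});
    reader.releaseLock();
  }
}

// Install only where the runtime lacks native support.
if (typeof ReadableStream.prototype[Symbol.asyncIterator] !== 'function') {
  ReadableStream.prototype[Symbol.asyncIterator] = function () {
    return iterateStream(this);
  };
}
```

With this in place, for await (const chunk of response.body) behaves the same in runtimes with and without native support, at least for the simple case of reading chunks in order.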

This project doesn’t aim to be compliant with the full XML spec (no way I’m reading that). It’s just enough to handle RSS. From limited testing, it can also work with XHTML.

This is still early development so the API can change as I test it further. Once it becomes stable I might publish NPM packages etc.
