Generating A Table Of Contents Using jSoup And ColdFusion

By admin
I’m authoring my Feature Flags Book using Markdown. Then, I’m converting the Markdown into HTML using Flexmark and ColdFusion. And, once I have the raw HTML, I’m using jSoup to augment the DOM for output. As part of this, I’m dynamically injecting a Table of Contents (ToC). In the book, I’m only including the h2 headings; but, it got me thinking about how I might use jSoup and ColdFusion to create a more inclusive table of contents.

The primary issue here is the "Impedance Mismatch" between the structure of the HTML document and the structure of the Table of Contents. The HTML structure is relatively flat (maybe even completely flat), wherein all of the heading elements can be siblings. We think of these headings as being hierarchical. But, this is a "mental model", not a structural model.

The table of contents, on the other hand, is (often) a hierarchical model, wherein "nested headers" are rendered as nested lists. As such, in order to dynamically render the TOC, we have to translate the implicit hierarchy of headers into an explicit hierarchy of data structures.

This is an algorithm that we intuitively understand; but, it’s not the easiest to describe. Given a header (H), we need to walk up the pending tree structure until we find a parent header (P) such that P<level> is semantically greater than H<level> (meaning that P has the smaller heading number, the way an h2 outranks an h3). This means that we’ve located the direct parent of the given header; and, at that point, we can append the header (H) to the children of (P).
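To make that walk concrete, here is a minimal sketch that runs the same parent-walk over a bare list of heading levels instead of real DOM nodes. The `levels` array, the `root` node, and the struct keys are assumptions made for illustration; only the level comparison comes from the description above.

```cfml
<cfscript>

	// Assumed input: heading levels in document order (h1, h2, h3, h3, h2).
	levels = [ 1, 2, 3, 3, 2 ];

	// The root sits at level 0 so that every real heading (1-6) can find a parent.
	root = { level: 0, children: [] };
	parent = root;

	for ( level in levels ) {

		// Start by assuming the previous heading is the parent.
		node = { level: level, children: [], parent: parent };

		// Walk up until the candidate parent has a smaller level number.
		while ( node.level <= node.parent.level ) {
			node.parent = node.parent.parent;
		}

		node.parent.children.append( node );
		parent = node;

	}

	// The single h1 sits under the root, and both h2 headings sit under that h1.
	writeOutput( root.children.len() & " / " & root.children[ 1 ].children.len() );

</cfscript>
```

The walk never runs past the root because no heading can have a level of 0 or less, so the `while` condition is always false by the time it reaches the root.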

To explore this, I created a flat HTML file that has a series of header elements from H1 all the way down to H6 (content abbreviated for the blog):

<h1>My Groovy Manifesto (h1)</h1>
<h2>Chapter 1 (h2)</h2>
<h3>Subsection 1-1 (h3)</h3>
<h4>Subsection 1-1-1 (h4)</h4>
<h5>Subsection 1-1-1-1 (h5)</h5>
<h6>Subsection 1-1-1-1-1 (h6)</h6>
<h4>Subsection 1-1-2 (h4)</h4>
<h3>Subsection 1-2 (h3)</h3>
<h3>Subsection 1-3 (h3)</h3>
<h2>Chapter 2 (h2)</h2>
<h3>Subsection 2-1 (h3)</h3>
<h4>Subsection 2-1-1 (h4)</h4>
<h4>Subsection 2-1-2 (h4)</h4>
<h2>Chapter 3 (h2)</h2>
<h3>Subsection 3-1 (h3)</h3>

As you can see, all of the headers are siblings of each other – the "hierarchy" is semantic, not structural. Generating a structural table of contents in ColdFusion (Lucee CFML) looks like this:

<cfscript>

	document = javaNew( "org.jsoup.Jsoup" )
		.parseBodyFragment( fileRead( "./content.htm" ) )
	;

	// The heading nodes in the HTML content are hierarchical from a semantic standpoint,
	// but are all siblings from a structural standpoint. As such, we need to translate
	// that FLAT structure into a TREE structure for our table of contents. Each section /
	// heading is going to contain a level and a set of sub-sections (children).
	toc = [
		level: 0,
		children: []
	];
	// In order to generate a hierarchical structure, we need to keep track of the
	// "parent" heading. This way, we'll know when we encounter a child of the previous
	// heading; or, if we have to traverse back up the "parent chain" to find an
	// appropriate location in a different heading.
	parent = toc;
	// I determine how deep the table of contents should go. Not every single header
	// necessarily adds value to the ToC (from a user experience standpoint).
	maxLevelInToc = 5;

	for ( node in document.select( "h1, h2, h3, h4, h5, h6" ) ) {

		current = [
			level: val( node.tagName().right( 1 ) ),
			title: node.text(),
			children: [],
			// NOTE: By default, we're going to assume that the current heading node is a
			// subsection of the parent heading node. We'll validate this below.
			parent: parent
		];

		if ( current.level > maxLevelInToc ) {

			continue;

		}

		// The current/parent assumption above is ONLY CORRECT if the current level is
		// greater than the parent level (ex, h3 vs h2). However, if the current level is
		// smaller than or equal to the parent level, we have to travel up the TREE until
		// we find the appropriate parent (ex, if current node is h2 and parent is h2, we
		// have to travel up the parent-chain until we find the h1 that will contain the
		// current h2).
		while ( current.level <= current.parent.level ) {

			current.parent = current.parent.parent;

		}

		// Now that we've identified the correct parent/child relationship for our current
		// node, we can add it to the proper children collection and then track the
		// current node as the parent for subsequent headings. This will create a bi-
		// directional tree structure.
		current.parent.children.append( current );
		parent = current;

	} // END: For-loop.

	// At this point, we've aggregated all of our document headings. Render them as a
	// series of nested lists, starting with our root TOC container.
	renderSection( toc.children );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I render the given table-of-content (ToC) sections. This function calls itself
	* recursively while there are children to render.
	*/
	public void function renderSection( required array sections ) {

		if ( ! sections.len() ) {

			return;

		}

		```
		<cfoutput>
			<ul>
				<cfloop item="local.section" array="#sections#">
					<li>
						#encodeForHtml( section.title )#
						#renderSection( section.children )#
					</li>
				</cfloop>
			</ul>
		</cfoutput>
		```

	}

	/**
	* I create a new Java class wrapper using the jSoup JAR files.
	*/
	public any function javaNew( required string className ) {

		var jarPaths = [
			expandPath( "./jsoup-1.16.1.jar" )
		];

		return( createObject( "java", className, jarPaths ) );

	}

</cfscript>

As you can see, our tree structure is bidirectional. As we iterate over the header elements, we build a connection from the parent heading to its subheadings as well as a connection from each subheading back to its parent. This bidirectionality allows us to walk back up the TOC structure when we need to find the appropriate semantic parent.

Once we have the nested data structure, we can then render it as a series of nested lists.
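If the goal is to inject the lists back into the document rather than write them straight to the page, the same recursion can build a string instead. This is only a sketch: `sectionsToHtml` and the `sample` data are hypothetical names that don't appear in the code above, and the sample assumes sections shaped like the entries in `toc.children`.

```cfml
<cfscript>

	/**
	* Hypothetical helper (not part of the original code): I return the nested lists
	* as an HTML string instead of writing them to the page output.
	*/
	public string function sectionsToHtml( required array sections ) {

		if ( ! sections.len() ) {

			return "";

		}

		var html = "<ul>";

		for ( var section in sections ) {

			// Each item holds its own title followed by any nested sub-list.
			html &= "<li>" & encodeForHtml( section.title ) & sectionsToHtml( section.children ) & "</li>";

		}

		return html & "</ul>";

	}

	// Assumed sample data in the same shape as the toc.children array.
	sample = [
		{ title: "Chapter 1", children: [ { title: "Subsection 1-1", children: [] } ] },
		{ title: "Chapter 2", children: [] }
	];

	writeOutput( sectionsToHtml( sample ) );

</cfscript>
```

A string like this could then be handed to jSoup, for example via `document.body().prepend( sectionsToHtml( toc.children ) )`, which parses the markup and inserts it at the top of the body.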

jSoup is such a wonderful tool. I was rather slow to adopt it (it’s been around for years). But, now that I have it as part of my ColdFusion tool-belt, I’m always finding more ways to leverage it.
