Introducing SelectorHound

By admin

Reading Time: 4 minutes


A few years back I ran into a difficult situation on a project: I needed to find out where a particular CSS selector was used.

I had a static version of the site, so I did what any fool might do and tried searching for it in my IDE. I had two problems though:

1. Writing a RegEx to essentially parse HTML is substantially more difficult and dangerous than you might expect.
2. The actual live version of the site was managed by a CMS (content management system), so it would be much more difficult to know where that selector might end up.

So after almost a day of failing to produce a good-enough RegEx, I got an idea: What if I just scanned the live site for that selector?

In about the same number of hours it took me to write a RegEx that didn’t always work, I was able to produce a Node.js-based script that could scan a live site for the selector.

So with that, I got the bright idea to make it a proper NPM package that could run on the command line. And now I should introduce you.

Introducing SelectorHound


SelectorHound is on NPM and, believe it or not, it’s already at 2.1!

It’s a Command Line Interface (CLI) that offers a pretty robust set of options (sketched in the examples after this list):

- Give it a single selector or a CSS file
- Give it a URL to a sitemap, or tell it to crawl your site
- Ask for a lot of details about HTML elements that match the selector, or a screenshot
- Tell it to treat pages like they’re a SPA (Single Page Application) or like static HTML
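To make those options concrete, here are a couple of sketched invocations. Only the -u and -s flags are confirmed later in this post; the long-form flags for the CSS file, crawling, screenshots, and SPA mode are my assumptions about the naming, so check the README for the exact spellings.

# Scan every page in a sitemap for a single selector (-u and -s appear later in this post)
SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s ".card > .cta"

# The flag names below are assumptions; check the README for the real ones.
# Hand it a stylesheet, crawl instead of reading a sitemap, grab screenshots, treat pages as a SPA
SelectorHound -u https://blog.frankmtaylor.com --crawl --cssFile styles.css --screenshots --spa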

What it’s good for

Do you have CSS on your site that you’d like to delete, but you’re uncertain if it’s used anywhere?

Are you looking for instances where one element may be next to another?

Would you like to know if your stylesheet has CSS that could be deleted?

Has malware infected your CMS and started adding weird links?

Do you have calls to action that might be missing data attributes?

All of these are real-world use cases that I’ve used SelectorHound for.

Try it out


First, install it:

npm i -g selector-hound

Or, for more speed:

bun install -g selector-hound

Then run it:

SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s "h1"

Then look at what you got

It’ll tell you what it’s doing as it gets started

The colors aren’t the best. Relax. I’m working on it.

And it will export all those URLs to a JSON file. This means you can customize the pages it scans. It’ll rely on that JSON file for every scan unless you pass -X to force it to generate a new sitemap file.
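Once that file exists, a follow-up run will reuse it, so forcing a rebuild is just a matter of adding the flag mentioned above:

# Reuses the previously exported JSON file of page URLs
SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s "h1"

# -X forces it to generate a new sitemap file instead
SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s "h1" -X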

Then it’ll tell you when it’s finished and give you a nice summary of what it found.

You can modify the output file name with the -o flag. Your chosen name will be prepended to pages.json.
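A quick sketch of what that looks like, assuming a prefix of “blog” yields a file named something like blog.pages.json (the exact joining of the prefix and pages.json may differ):

# -o prepends your chosen name to pages.json (assumed result: blog.pages.json)
SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s "h1" -o blog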

Don’t forget to check the log

And then look at the results

The -e flag will give you all the sporty details you need to know exactly where the element is on the page and what it looks like.
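For instance, a run like this one (the selector here is just an illustration) would include those element details in the results:

# -e includes details about each matching element in the output
SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s ".cta a" -e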

The output can be pretty robust because it’ll give you results for every page that it found. I am working on a reporting feature that can summarize the results if you don’t want to wade through what could be thousands of lines of JSON.

Is it performant?

It’s faster than writing a RegEx to scan your codebase, that’s for sure.

I’ve done a little bit of testing and found that, if you were looking for a single HTML element, it might take on average 0.52s per page. If you install with Bun, you might gain roughly 0.1s per page.

I’ve used SelectorHound with sitemaps containing up to 2,000 links, and with crawling that produced up to 500 pages.
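As a rough back-of-the-envelope estimate from the numbers above: at about 0.52s per page, a 2,000-link sitemap works out to roughly 1,040 seconds, or a bit over 17 minutes of scanning, so plan accordingly on big sites.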

Activating Puppeteer to either take screenshots or just expect it to be a SPA will slow things down significantly, so use that with caution.

Where can you see the code?

It’s over on GitHub. I welcome contributions and feature requests.