Introducing SelectorHound

Created on November 12, 2023 at 11:28 am

Reading Time: 4 minutes


A few years back I ran into a difficult situation on a project: I needed to find out where a particular CSS selector was used.

I had a static version of the site, so I did what any fool might do: I tried searching for it in my IDE. I had two problems though:

Writing a RegEx to essentially parse HTML is substantially more dangerous than you might expect.

The actual live version of the site was managed by a CMS (content management system), so it would be much more difficult to know where that selector might end up.

So after almost a day of failing to produce a good-enough RegEx, I got an idea: What if I just scanned the live site for that selector?

In about the same number of hours it took me to write a RegEx that didn't always work, I was able to produce a Node.js-based script that could scan a live site for the selector.
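The core idea is simple enough to sketch. The snippet below is a minimal illustration of scanning fetched HTML for a selector, not SelectorHound's actual code (which is Node.js-based): it uses only the Python standard library, so it supports just bare tag, .class, and #id selectors, whereas real CSS selector matching needs a proper engine.

```python
# Minimal sketch of "scan a page for a CSS selector" -- NOT SelectorHound's
# actual implementation. Stdlib-only, so only tag, .class, and #id
# selectors are handled.
from html.parser import HTMLParser

class SelectorScanner(HTMLParser):
    """Collects the tag names of start tags matching a simple selector."""

    def __init__(self, selector):
        super().__init__()
        self.selector = selector
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        sel = self.selector
        if sel.startswith("."):
            # Class selector: match any element carrying that class
            if sel[1:] in attrs.get("class", "").split():
                self.matches.append(tag)
        elif sel.startswith("#"):
            # ID selector
            if attrs.get("id") == sel[1:]:
                self.matches.append(tag)
        elif tag == sel:
            # Bare tag selector
            self.matches.append(tag)

def scan(html, selector):
    scanner = SelectorScanner(selector)
    scanner.feed(html)
    return scanner.matches

# In practice the HTML would come from an HTTP fetch of each live page.
html_doc = '<main><h1 class="hero">Hi</h1><p id="intro">Text</p></main>'
print(scan(html_doc, "h1"))      # ['h1']
print(scan(html_doc, ".hero"))   # ['h1']
print(scan(html_doc, "#intro"))  # ['p']
```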

So with that, I got the bright idea to make it a proper NPM package that could run on the command line. And now I should introduce you.

Introducing SelectorHound

SelectorHound is on NPM and, believe it or not, it's already at 2.1!

It’s a Command Line Interface (CLI) that offers a pretty robust set of options:

Give it a single selector or a CSS file

Give it a URL to a sitemap or tell it to crawl your site

Ask for a lot of details about HTML elements that match the selector, or a screenshot

Tell it to treat pages like they’re a SPA (Single Page Application) or like static HTML

What it’s good for

Do you have CSS on your site that you’d like to delete, but you’re uncertain if it’s used anywhere?

Are you looking for instances where one element may be next to another?

Would you like to know if your stylesheet has CSS that could be deleted?

Has malware infected your CMS and started adding weird links?

Do you have calls to action that might be missing data attributes?

All of these are real-world use cases that I’ve used SelectorHound for.

Try it out

First, install it

npm i -g selector-hound

Or, for more speed:

bun install -g selector-hound

Then run it

SelectorHound -u <url> -s "h1"

Then look at what you got

It’ll tell you what it’s doing as it gets started

The colors aren’t the best. Relax. I’m working on it

And it will export all those URLs to a JSON file. This means you can customize the pages it scans. It’ll rely on that JSON file for every scan unless you pass -X to force it to generate a new sitemap file.
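For instance, forcing a fresh page list looks something like this (the URL and selector here are placeholders, not from an actual site):

```shell
# Re-scan, passing -X so SelectorHound regenerates the sitemap file
# instead of reusing the cached pages JSON from a previous run.
# example.com and .hero are placeholders.
SelectorHound -u https://example.com -s ".hero" -X
```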

Then it’ll tell you when it’s finished and give you a nice summary of what it found.

You can modify the output file name with the -o flag; your chosen name will be prepended to pages.json.

Don’t forget to check the log

And then look at the results

The -e flag will give you all the sporty details you need to know exactly where the element is on the page and what it looks like.
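As a sketch, combining it with the flags above (again, the URL and selector are placeholders):

```shell
# Scan for elements matching .cta and include extended details (-e)
# about each match in the results. example.com and .cta are placeholders.
SelectorHound -u https://example.com -s ".cta" -e
```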

The output can be pretty robust because it’ll give you results for every page that it found. I am working on a reporting feature that can summarize the results if you don’t want to wade through what could be thousands of lines of JSON.

Is it performant?

It’s faster than writing a RegEx to scan your codebase, that’s for sure.

I’ve done a little bit of testing and found that, if you were looking for a single HTML element, it might take on average 0.52s per page. If you install with Bun, you might gain around 0.1s per page.

I’ve used SelectorHound with sitemaps containing up to 2,000 links, and with crawling that produced up to 500 pages.

Activating Puppeteer to either take screenshots or treat pages as a SPA will slow things down significantly, so use that with caution.

Where can you see the code?

It’s over on GitHub. I welcome contributions and feature requests.
