A few years back I ran into a difficult situation on a project: I needed to find out where a particular CSS selector was used.
I had a static version of the site, so I did what any fool might do and tried searching for it in my IDE. I had two problems, though:
Writing a RegEx to essentially parse HTML is substantially more difficult and dangerous than you might expect.
The actual live version of the site was managed by a CMS (content management system), so it would be much harder to know where that selector might end up.
So after almost a day of failing to produce a good-enough RegEx, I got an idea: What if I just scanned the live site for that selector?
In about the same number of hours it took me to write a RegEx that didn’t always work, I was able to produce a Node.js-based script that could scan a live site for the selector.
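To give a sense of why that was faster, here’s a minimal sketch of the approach (not the actual SelectorHound code). It assumes Node 18+ for the built-in fetch and the jsdom package for parsing; the file name, selector, and URLs below are just placeholders.
// scan.mjs — minimal sketch of "fetch the page, run the selector against a real DOM"
// assumes Node 18+ (built-in fetch) and `npm i jsdom`
import { JSDOM } from 'jsdom';

const [selector, ...urls] = process.argv.slice(2);

for (const url of urls) {
  const html = await (await fetch(url)).text();         // download the live page
  const { document } = new JSDOM(html).window;          // parse it into a real DOM
  const matches = document.querySelectorAll(selector);  // run the selector the way a browser would
  console.log(`${url}: ${matches.length} match(es) for "${selector}"`);
}
You’d run it with something like node scan.mjs ".legacy-promo" https://example.com/ https://example.com/about/ and read the counts off the console.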
So with that, I got the bright idea to make it a proper NPM package that could run on the command line. And now I should introduce you.
SelectorHound is on NPM and, believe it or not, it’s already at 2.1!
It’s a Command Line Interface (CLI) that offers a pretty robust set of options:
Give it a single selector or a CSS file
Give it a URL to a sitemap or tell it to crawl your site
Ask for a lot of details about HTML elements that match the selector, or a screenshot
Tell it to treat pages like they’re a SPA (Single Page Application) or like static HTML
What it’s good for
Do you have CSS on your site that you’d like to delete, but you’re uncertain if it’s used anywhere?
Are you looking for instances where one element may be next to another?
Would you like to know if your stylesheet has CSS that could be deleted?
Has malware infected your CMS and started adding weird links?
Do you have calls to action that might be missing data attributes?
All of these are real-world use cases that I’ve used SelectorHound for.
Try it out
First, install it
npm i -g selector-hound
Or, for more speed:
bun install -g selector-hound
Then run it
SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s "h1"
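Here -u points it at your sitemap and -s is the selector you’re hunting for.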
Then look at what you got
It’ll tell you what it’s doing as it gets started
The colors aren’t the best. Relax. I’m working on it.
And it will export all those URLs to a JSON file. This means you can customize the pages it scans. It’ll rely on that JSON file for every scan unless you pass -X to force it to generate a new sitemap file.
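So a second run that ignores the cached JSON file and rebuilds the sitemap would look something like this (assuming -X combines with the same flags as before):
SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s "h1" -X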
Then it’ll tell you when it’s finished and give you a nice summary of what it found.
You can modify the output file name with the -o flag. Your chosen name will be prepended to pages.json
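For example, something like this should write the results to a file whose name starts with myscan (myscan is just a placeholder):
SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s "h1" -o myscan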
Don’t forget to check the log
And then look at the results
The -e flag will give you all the sporty details you need to know exactly where the element is on the page and what it looks like.
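For example, assuming -e stacks with the flags shown earlier:
SelectorHound -u https://blog.frankmtaylor.com/sitemap.xml -s "h1" -e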
The output can be pretty robust because it’ll give you results for every page that it found. I am working on a reporting feature that can summarize the results if you don’t want to wade through what could be thousands of lines of JSON.
Is it performant?
It’s faster than writing a RegEx to scan your codebase, that’s for sure.
I’ve done a little bit of testing and found that, if you were looking for a single HTML element, it might take on average 0.52s per page. If you install with Bun, you might gain about 0.1s per page.
I’ve used SelectorHound with sitemaps containing up to 2000 links, and with crawling that produced up to 500 pages.
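As a rough back-of-the-envelope estimate, at 0.52s per page a 2000-link sitemap works out to around 1,040 seconds, or a bit over 17 minutes, assuming pages are scanned one after another.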
Activating Puppeteer to either take screenshots or treat pages as a SPA will slow things down significantly, so use that with caution.
Where can you see the code?
It’s over on GitHub. I welcome contributions and feature requests.