I'm a senior software developer with about 10 years of experience working with Ruby on Rails and JavaScript. My most recent experience is at the intersection of AI and customer success tools. I've led teams as a scrum master and headed up major feature releases.
Sure! The backend is actually pretty straightforward: it's a Next.js app deployed on Now, with a few added endpoints to handle the incoming GraphQL queries.
Then, to actually turn the query into a digestible output, I used the GraphQL schema builder, which accepts HTML nodes from the requested page and grabs the right variables.
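To make that concrete, here's a minimal sketch of the resolver idea, not the project's actual code: a plain object stands in for a parsed DOM node, and the resolver functions expose its text, attributes, and tag name the way a GraphQL field resolver would.

```javascript
// Hedged sketch: `node` is a plain object standing in for a parsed DOM
// element; the field names here are illustrative, not the real schema.
function makeElementResolvers(node) {
  return {
    // content: the text inside the node
    content: () => node.text,
    // attr: look up an attribute by name, e.g. attr({ name: "href" })
    attr: ({ name }) => node.attributes[name] ?? null,
    // tag: the element's tag name
    tag: () => node.tagName,
  };
}

// Stand-in for a node produced by an HTML parser
const link = {
  tagName: "a",
  text: "Read more",
  attributes: { href: "/posts/42" },
};

const resolvers = makeElementResolvers(link);
console.log(resolvers.content()); // "Read more"
console.log(resolvers.attr({ name: "href" })); // "/posts/42"
```

In the real thing the node would come from an HTML parser run against the requested page, and these resolvers would be wired into the GraphQL schema.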
I remember seeing GDOM a while back when I first started this project, but forgot to write it down as a source of inspiration. I'm gonna add all of these as alternatives, because they're all great :D
You'd need to either re-implement an entire browser stack or run a headless version of Gecko or WebKit server-side.
The former entails millions of man-hours of work. The latter opens up your server to all sorts of exploits. Overall, a really bad idea.
Besides, single page applications are the worst junk in the entire Web 2.0 cesspool. If you really need to scrape them, they usually come with their own JSON API which you can just piggyback.
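The piggybacking idea looks something like this in practice (the endpoint path and response shape below are made up for illustration): find the JSON endpoint the page calls in the browser's network tab, fetch it directly, and pull out the fields you want.

```javascript
// Hedged sketch: scrape a SPA by hitting its own JSON API instead of
// rendering the page. The payload shape here is invented for the example.
function extractTitles(apiResponse) {
  // Keep only the fields you care about from the API payload
  return apiResponse.items.map((item) => item.title);
}

// In practice you'd fetch the real endpoint you found in the network tab:
//   const res = await fetch("https://example.com/api/v1/posts?page=1");
//   const data = await res.json();
// Sample payload standing in for that response:
const sample = {
  items: [
    { id: 1, title: "First post" },
    { id: 2, title: "Second post" },
  ],
};

console.log(extractTitles(sample)); // [ 'First post', 'Second post' ]
```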
Why on Earth would the OP start from scratch? Besides, though not a solo and OSS effort, Apifier does this; certainly without "millions" of hours having been spent on it.
I'd been trying to figure out what was causing this issue, so thanks for pointing it out. I've pushed a quick fix that now responds with whether the JSON is invalid or a CSS selector wasn't found on the provided URL.
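Roughly the shape of that fix, as a sketch (function and field names are illustrative, not the project's actual API): validate the incoming JSON first, then report a missing selector as its own distinct error.

```javascript
// Hedged sketch: distinguish "invalid JSON" from "selector not found".
// `findSelector` stands in for a DOM lookup against the fetched page.
function handleQuery(rawBody, findSelector) {
  let body;
  try {
    body = JSON.parse(rawBody);
  } catch (err) {
    return { status: 400, error: "Invalid JSON in request body" };
  }

  const matched = findSelector(body.selector);
  if (!matched) {
    return {
      status: 404,
      error: `CSS selector "${body.selector}" not found on the provided URL`,
    };
  }

  return { status: 200, data: matched };
}

// Fake lookup for demonstration
const fakeFind = (sel) => (sel === ".title" ? { text: "Hello" } : null);

console.log(handleQuery("{ bad json", fakeFind).status); // 400
console.log(handleQuery('{"selector": ".missing"}', fakeFind).status); // 404
console.log(handleQuery('{"selector": ".title"}', fakeFind).data.text); // "Hello"
```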