
Author here.

AI agents have dominated Twitter/X the past few days: Claude Code, Remotion, Cursor, etc. But most developers are running these locally. Terminal agents on your machine.

That's useful, but it's a different game when you're building AI apps for users. The moment your agent needs to actually execute something (e.g., run a Python script, process a PDF, generate files), you're suddenly building sandboxing infrastructure, managing VMs, and handling file storage.

It's a massive distraction from your actual product. We built Bluebag to solve this for Vercel AI SDK users. Two lines of code, and your agent gets access to Skills that execute in managed sandboxes.
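Roughly, the hookup looks like this (a simplified sketch: the package name and the skillTools() helper below are illustrative, not the final SDK surface):

    // Illustrative sketch only: "bluebag" and skillTools() are placeholder names.
    import { generateText } from "ai";
    import { google } from "@ai-sdk/google";
    import { skillTools } from "bluebag"; // placeholder import

    const { text } = await generateText({
      model: google("gemini-2.5-flash"),
      tools: await skillTools({ apiKey: process.env.BLUEBAG_API_KEY }), // Skills exposed as ordinary AI SDK tools
      prompt: "Take the uploaded CSV and produce a one-page PDF summary.",
    });

The Skill scripts themselves run inside the managed sandbox, so none of the above touches your own infrastructure.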

No Docker orchestration. No K8s. Dependencies, file handling, signed download URLs, all handled.

The post walks through some of the architecture (progressive skill loading, auto-provisioned VMs, multi-tenant isolation) and shows concrete integration examples.

Happy to answer questions about the approach or trade-offs we made.

Cheers


TIL: you could add a ".diff" to a PR URL. Thanks!

As for PR reviews, assuming you've got linting and static analysis out of the way, you'd need a sufficiently specific prompt to actually catch problems and surface reviews that match your standards rather than generic AI comments.

My company uses some automatic AI PR review bots, and they annoy me more than they help. Lots of useless comments


I would just put a PR_REVIEW.md file in the repo and have a CI agent run it against the diff/repo and decide pass or reject. The file holds the rules the code must be evaluated against: project-level policy, plus any constraints you cannot check with code tests. Of course, any constraint that can be a code test is better off as a code test.

My experience is you can trust any code that is well tested, human- or AI-generated. And you cannot trust any code that is not well tested (what I call "vibe tested"). But some constraints need to be stated in natural language, and for that you need an LLM to review the PRs. This combination of code tests and LLM review should be able to ensure reliable AI coding. If it does not, iterate on your PR rules and on your tests.
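A minimal sketch of that CI step, assuming Node with the Vercel AI SDK and the gh CLI on the runner (the model choice, the PASS/REJECT verdict format, and the PR_NUMBER variable are just illustrative):

    // review-gate.ts: evaluate the PR diff against the rules in PR_REVIEW.md.
    import { execSync } from "node:child_process";
    import { readFileSync } from "node:fs";
    import { generateText } from "ai";
    import { openai } from "@ai-sdk/openai";

    const rules = readFileSync("PR_REVIEW.md", "utf8");
    const diff = execSync(`gh pr diff ${process.env.PR_NUMBER}`, { encoding: "utf8" });

    const { text } = await generateText({
      model: openai("gpt-4o-mini"),
      prompt: `Review rules:\n${rules}\n\nDiff:\n${diff}\n\n` +
        "Reply PASS or REJECT on the first line, then one line per violated rule.",
    });

    console.log(text);
    if (!text.trimStart().startsWith("PASS")) process.exit(1); // fail the job on REJECT

Anything that can be expressed as a normal test stays a normal test; this gate only covers the natural-language rules.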


`gh pr diff num` is an alternative if you have the repo checked out. One can then pipe the output to one's favorite llm CLI and create a shell alias with a default review prompt.

> My company uses some automatic AI PR review bots, and they annoy me more than they help. Lots of useless comments

One way to make them more useful is to ask them to list the top N problems found in the change set.


> TIL: you could add a ".diff" to a PR URL. Thanks!

You can also append ".patch" and get a more useful output


This was a brilliant write-up, and I loved the interactivity.

I do think "logs are broken" is a bit overstated. The real problem is unstructured events + weak conventions + poor correlation.
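Most of that pain goes away once every event is a structured record that shares a correlation ID; a minimal sketch (the field names are only an example):

    // One structured event instead of a free-text log line; correlation_id ties it
    // to every other event emitted while handling the same request.
    console.log(JSON.stringify({
      ts: new Date().toISOString(),
      level: "error",
      event: "payment.capture_failed",
      correlation_id: "req_abc123", // propagated from the incoming request (example value)
      order_id: 4821,               // example value
      reason: "card_declined",
    }));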

Brilliant write-up regardless.


Hey HN, I can't help but think this is where AI development will be heading in 2026. The biggest reasons: deterministic outputs, plus the cost savings that come from progressive disclosure as opposed to tools/MCPs.

As always, curious to hear your thoughts


Hi HN,

When Anthropic published their Skills system (https://www.anthropic.com/news/skills), the idea clicked for me immediately: take a general-purpose agent and turn it into a specialized one with procedural knowledge that no model can fully memorize.

In my own projects I wasn’t using Claude (most of my workloads were on Gemini 2.5 Flash, mostly because it was affordable and got the job done), but I still wanted that architecture: a way to define Skills once and use them with whatever LLM made sense for a given use case.

So over the past few weeks I put together a solution that does roughly that. Right now it supports:

- Bundling metadata, instructions, reference files, and optional scripts into a Skill
- Running scripts in Python or JS runtimes (with automatic package installation)
- A simple files API so the LLM can create files, reference them, mint temporary download links, and let me upload docs for analysis
- A CLI to manage Skills locally (push/pull), a TypeScript SDK, and a web app to manage API keys, PATs, the playground, etc. (rough sketch of the flow below)
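To give a sense of the flow (an illustrative sketch only: the client and method names below are placeholders, not the actual SDK surface, which is still changing):

    // Illustrative sketch: BluebagClient and these method names are placeholders.
    import { BluebagClient } from "@bluebag/sdk"; // placeholder package name

    const bb = new BluebagClient({ apiKey: process.env.BLUEBAG_API_KEY });

    // Run a bundled Skill's script in a managed runtime, then mint a temporary
    // download link for the file it produced.
    const run = await bb.skills.run("pdf-report", { input: "sales-q3.csv" });
    const url = await bb.files.downloadUrl(run.outputFileId, { expiresIn: "15m" });
    console.log(url);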

There’s a playground at http://www.bluebag.ai/playground with example Skills (mostly adapted from Anthropic’s public Skills repo at https://github.com/anthropics/skills). On the right-hand side you can see how different models progressively load files and metadata, so you can inspect how selection and loading behave across models.

There are still some open questions I’m thinking about, especially around VM reuse and isolation at scale, and how to handle large Skill libraries over time (cold starts with very large package sets and 15+ Skills are slow).

But it’s been useful enough in my own work that I wanted to share it and get feedback. I’d be interested in:

- obvious failure modes I’m missing
- prior art I should be looking at (e.g., agent frameworks)

Happy to answer any questions or dig into implementation details if that’s useful.

Cheers


The playground seems cool. I think I get a sense of how this works and could see myself using it. Is it fair to assume this plugs into a VM behind the scenes? Shared?

Out of curiosity, is any part of this also open-sourced?


Yes!! The runtimes are ephemeral VMs/containers with no network access exposed.

On open-sourcing: the core of the solution is not currently open source; it’s still changing a lot, and I don’t want to publish an API or SDK surface that I’ll immediately have to update.

But I plan to open-source the CLI package and SDKs shortly


What does "runs itself" mean in this context?


Hi mate, it’s a cron job pipeline that crawls target sources, processes the content via an AI model (for summarization & tagging), and pushes the updates live. Zero human intervention.
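Roughly this shape (an illustrative sketch, not the actual code; the sources, the model, and the publish step are placeholders):

    // Scheduled job: crawl -> summarize & tag with an AI model -> publish. No humans involved.
    import { generateObject } from "ai";
    import { google } from "@ai-sdk/google";
    import { z } from "zod";

    const SOURCES = ["https://example.com/feed"]; // placeholder target sources

    async function publish(post: unknown) {
      console.log(post); // placeholder: write the processed item to the site's datastore
    }

    for (const url of SOURCES) {
      const page = await fetch(url).then((r) => r.text());
      const { object } = await generateObject({
        model: google("gemini-2.5-flash"),
        schema: z.object({ title: z.string(), summary: z.string(), tags: z.array(z.string()) }),
        prompt: `Summarize and tag this page:\n${page.slice(0, 20000)}`,
      });
      await publish(object);
    }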


OK, got it! Cool. I had wild ideas, but this makes sense.

QQ: on clicking the articles, I expected to get redirected to the original article, but I stayed on your site.

Is what I see a summarisation of the original article?

Are the original authors happy with you? :)


Really cool! I reckon a nice UI would be a good addition


Looks good! You could push to npm so that running it could be as easy as:

npx webclone URL (no repo cloning required)

Also, FYI, when running the example code

node webclone.js https://www.example.com/

It fails (at least for me) until I either install yt-dlp or ignore videos via:

node webclone.js https://www.example.com/


Hello. I have tested this and it indeed looks for yt-dlp at the beginning even though the site is not a video platform. I have logged the issue on GitHub and am working on a fix. Thank you for the feedback!


Great feedback! Will get this fixed. Thank you.


No qualms! Thanks for sharing :)


For just $10? The pricing begins at $45. Unless you mean when broken down monthly?

Congrats on the launch!


I see, thanks. New to HN. I suppose I'll repost it later next week since I can't delete/edit the existing post.

Thanks!

