
Author here.

AI agents have dominated Twitter/X the past few days: Claude Code, Remotion, Cursor, etc. But most developers are running these locally. Terminal agents on your machine.

That's useful, but it's a different game when you're building AI apps for users. The moment your agent needs to actually execute something (e.g., run a Python script, process a PDF, generate files), you're suddenly building sandboxing infrastructure, managing VMs, and handling file storage.

It's a massive distraction from your actual product. We built Bluebag to solve this for Vercel AI SDK users. Two lines of code, and your agent gets access to Skills that execute in managed sandboxes.
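Roughly, the hookup looks like this (a simplified sketch: the package name and the skillTools() helper below are illustrative, not the final SDK surface):

    // Illustrative sketch only: "bluebag" and skillTools() are placeholder names.
    import { generateText } from "ai";
    import { google } from "@ai-sdk/google";
    import { skillTools } from "bluebag"; // placeholder import

    const { text } = await generateText({
      model: google("gemini-2.5-flash"),
      tools: await skillTools({ apiKey: process.env.BLUEBAG_API_KEY }), // Skills exposed as ordinary AI SDK tools
      prompt: "Take the uploaded CSV and produce a one-page PDF summary.",
    });

The Skill scripts themselves run inside the managed sandbox, so none of the above touches your own infrastructure.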

No Docker orchestration. No K8s. Dependencies, file handling, signed download URLs, all handled.

The post walks through some of the architecture (progressive skill loading, auto-provisioned VMs, multi-tenant isolation) and shows concrete integration examples.

Happy to answer questions about the approach or trade-offs we made.

Cheers


TIL: you could add a ".diff" to a PR URL. Thanks!

As for PR reviews, assuming you've got linting and static analysis out of the way, you'd need a sufficiently specific prompt to actually catch problems and surface reviews that match your standards rather than generic AI comments.

My company uses some automatic AI PR review bots, and they annoy me more than they help. Lots of useless comments


I would just put a PR_REVIEW.md file in the repo and have a CI agent run it against the diff/repo and decide pass or reject. The file holds the rules the code must be evaluated against: project-level policy, plus any constraints you cannot check with code tests. Of course, any constraint that can be a code test is better off as a code test.

My experience is you can trust any code that is well tested, human- or AI-generated. And you cannot trust any code that is not well tested (what I call "vibe tested"). But some constraints need to be stated in natural language, and for that you need an LLM to review the PRs. This combination of code tests and LLM review should be able to ensure reliable AI coding. If it does not, iterate on your PR rules and on your tests.
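A minimal sketch of that CI step, assuming Node with the Vercel AI SDK and the gh CLI on the runner (the model choice, the PASS/REJECT verdict format, and the PR_NUMBER variable are just illustrative):

    // review-gate.ts: evaluate the PR diff against the rules in PR_REVIEW.md.
    import { execSync } from "node:child_process";
    import { readFileSync } from "node:fs";
    import { generateText } from "ai";
    import { openai } from "@ai-sdk/openai";

    const rules = readFileSync("PR_REVIEW.md", "utf8");
    const diff = execSync(`gh pr diff ${process.env.PR_NUMBER}`, { encoding: "utf8" });

    const { text } = await generateText({
      model: openai("gpt-4o-mini"),
      prompt: `Review rules:\n${rules}\n\nDiff:\n${diff}\n\n` +
        "Reply PASS or REJECT on the first line, then one line per violated rule.",
    });

    console.log(text);
    if (!text.trimStart().startsWith("PASS")) process.exit(1); // fail the job on REJECT

Anything that can be expressed as a normal test stays a normal test; this gate only covers the natural-language rules.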


`gh pr diff num` is an alternative if you have the repo checked out. One can then pipe the output to one's favorite llm CLI and create a shell alias with a default review prompt.

> My company uses some automatic AI PR review bots, and they annoy me more than they help. Lots of useless comments

One way to make them more useful is to ask them to list the top N problems found in the change set.


> TIL: you could add a ".diff" to a PR URL. Thanks!

You can also append ".patch" and get a more useful output


This was a brilliant write-up, and I loved the interactivity.

I do think "logs are broken" is a bit overstated. The real problem is unstructured events + weak conventions + poor correlation.
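Most of that pain goes away once every event is a structured record that shares a correlation ID; a minimal sketch (the field names are only an example):

    // One structured event instead of a free-text log line; correlation_id ties it
    // to every other event emitted while handling the same request.
    console.log(JSON.stringify({
      ts: new Date().toISOString(),
      level: "error",
      event: "payment.capture_failed",
      correlation_id: "req_abc123", // propagated from the incoming request (example value)
      order_id: 4821,               // example value
      reason: "card_declined",
    }));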

Brilliant write-up regardless.


Hey HN, I can't help but think this is where AI development will be heading in 2026. The biggest reasons: deterministic outputs, plus the cost savings that come from progressive disclosure as opposed to tools/MCPs.

As always, curious to hear your thoughts


Hi HN,

When Anthropic published their Skills system (https://www.anthropic.com/news/skills), the idea clicked for me immediately: take a general-purpose agent and turn it into a specialized one with procedural knowledge that no model can fully memorize.

In my own projects I wasn’t using Claude (most of my workloads were on Gemini 2.5 Flash, mostly because it was affordable and got the job done), but I still wanted that architecture: a way to define Skills once and use them with whatever LLM made sense for a given use case.

So over the past few weeks I put together a solution that does roughly that. Right now it supports:

- Bundling metadata, instructions, reference files, and optional scripts into a Skill
- Running scripts in Python or JS runtimes (with automatic package installation)
- A simple files API so the LLM can create files, reference them, mint temporary download links, and let me upload docs for analysis
- A CLI to manage Skills locally (push/pull), a TypeScript SDK, and a web app to manage API keys, PATs, the playground, etc. (rough sketch of the flow below)
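To give a sense of the flow (an illustrative sketch only: the client and method names below are placeholders, not the actual SDK surface, which is still changing):

    // Illustrative sketch: BluebagClient and these method names are placeholders.
    import { BluebagClient } from "@bluebag/sdk"; // placeholder package name

    const bb = new BluebagClient({ apiKey: process.env.BLUEBAG_API_KEY });

    // Run a bundled Skill's script in a managed runtime, then mint a temporary
    // download link for the file it produced.
    const run = await bb.skills.run("pdf-report", { input: "sales-q3.csv" });
    const url = await bb.files.downloadUrl(run.outputFileId, { expiresIn: "15m" });
    console.log(url);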

There’s a playground at http://www.bluebag.ai/playground with example Skills (mostly adapted from Anthropic’s public Skills repo at https://github.com/anthropics/skills). On the right-hand side you can see how different models progressively load files and metadata, so you can inspect how selection and loading behave across models.

There are still some open questions I’m thinking about, especially around VM reuse and isolation at scale, and how to handle large Skill libraries over time (cold starts with very large package sets and 15+ Skills are slow).

But it’s been useful enough in my own work that I wanted to share it and get feedback. I’d be interested in:

- obvious failure modes I’m missing
- prior art I should be looking at (e.g., agent frameworks)

Happy to answer any questions or dig into implementation details if that’s useful.

Cheers


The playground seems cool. I think I get a sense of how this works and could see myself using it. Is it fair to assume this plugs into a VM behind the scenes? Shared?

Out of curiosity, is any part of this also open-sourced?


Yes!! The runtimes are ephemeral VMs/containers with no network access exposed.

On open-sourcing: the core of the solution is not currently open source; it’s still changing a lot, and I don’t want to publish an API or SDK surface that I’ll immediately have to update.

But I plan to open-source the CLI package and SDKs shortly


What does "runs itself" mean in this context?


Hi mate, it’s a cron job pipeline that crawls target sources, processes the content via an AI model (for summarization & tagging), and pushes the updates live. Zero human intervention.
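Roughly this shape (an illustrative sketch, not the actual code; the sources, the model, and the publish step are placeholders):

    // Scheduled job: crawl -> summarize & tag with an AI model -> publish. No humans involved.
    import { generateObject } from "ai";
    import { google } from "@ai-sdk/google";
    import { z } from "zod";

    const SOURCES = ["https://example.com/feed"]; // placeholder target sources

    async function publish(post: unknown) {
      console.log(post); // placeholder: write the processed item to the site's datastore
    }

    for (const url of SOURCES) {
      const page = await fetch(url).then((r) => r.text());
      const { object } = await generateObject({
        model: google("gemini-2.5-flash"),
        schema: z.object({ title: z.string(), summary: z.string(), tags: z.array(z.string()) }),
        prompt: `Summarize and tag this page:\n${page.slice(0, 20000)}`,
      });
      await publish(object);
    }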


OK, got it! Cool. I had wild ideas, but this makes sense.

QQ: on clicking the articles, I expected to get redirected to the original article, but I stayed on your site.

Is what I see a summarisation of the original article?

Are the original authors happy with you? :)


Really cool! I reckon a nice UI would be a good addition


Looks good! You could push to npm so that running it could be as easy as:

npx webclone URL (no repo cloning required)

Also, FYI, when running the example code

node webclone.js https://www.example.com/

It fails (at least for me) until I either install yt-dlp or ignore videos via:

node webclone.js https://www.example.com/


Hello. I have tested this and it indeed looks for yt-dlp at the beginning even though the site is not a video platform. I have logged the issue on GitHub and am working on a fix. Thank you for the feedback!


Great feedback! Will get this fixed. Thank you.


No qualms! Thanks for sharing :)


For just $10? The pricing begins at $45. Unless you mean when broken down monthly?

Congrats on the launch!


I see, thanks. New to HN. I suppose I'll repost it later next week since I can't delete/edit the existing post.

Thanks!

