Different use cases. I want aws-cli for scripting, repeated cases, and embedding those executions for very specific results. I want this for exploration and ad-hoc reviews.
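To illustrate the scripting/embedding side, here's a minimal sketch (the describe-instances call and region are just placeholder examples, not anything from the thread): with --query/--output you can pin the CLI's result down to exactly the value the rest of the script needs.

```python
import json
import subprocess

# Minimal sketch of the "scripting / embedding" use case: narrow the output
# with --query/--output so the result drops straight into the rest of a script.
# The region and the query expression here are placeholder examples.
def instance_ids(region: str = "us-east-1") -> list[str]:
    result = subprocess.run(
        [
            "aws", "ec2", "describe-instances",
            "--region", region,
            "--query", "Reservations[].Instances[].InstanceId",
            "--output", "json",
        ],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

print(instance_ids())
```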
Nobody is taking away the cli tool and you don't have to use this. There's no "turns into" here.
Oh I think you misinterpreted my comment! I am very much a fan of this, wasn't throwing shade. I am just remarking on how my side-project scope today dwarfs my side-project scope of a year or two ago.
They buried the lede. The last half of the article, with ways to ground your dev environment to reduce the most common issues, should be its own article. (However, implementing the proper techniques somewhat obviates the need for CodeRabbit, so I guess it’s understandable.)
I had the same question — I understand that the Actions control plane has costs on self-hosted runners that GitHub would like to recoup, but those costs are fixed per-job. Charging by the minute for the user’s own resources gives the impression that GitHub is actually trying to disincentivize third-party runners.
A self-hosted runner regularly communicates with the control plane, and the control plane also needs to keep track of job status, logs, job summaries, etc.
An 8-hour job is definitely more expensive for them than a 1-minute one, but I'd guess the actual reason is that this way they earn more money and dissuade users from using a third-party service instead of their own hosted runners.
That's generous, but doesn't seem consistent with how Microsoft does business. Also, if that's the case why does self-hosted cost the same as the lowest hosted tier?
Building an interactive shell inside their CLI seems like a very odd technical solution. I can’t think of any use case where the same context gathering couldn’t be gleaned by examining the file/system state after the session ended, but maybe I’m missing something.
On the other hand, now that I’ve read this, I can see how having some hooks between the code agent CLIs and ghostty/etc could be extremely powerful.
LLMs in general struggle with numbers. It's easy to tell with medium-sized models, which struggle with line-replacement commands where they have to count; it usually takes a couple of tries to get right.
I always imagined they'd have an easier time if they could start a vim instance and send search/movement/insert commands instead: no need to keep track of numbers and do calculations, and they could visually inspect that the right thing happened.
I haven't tried this new feature yet, but that was the first thing that came to mind when I saw it; it might be easier for LLMs to do edits this way.
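A rough sketch of what I mean (my own illustration, not how the new feature works), assuming a scripted ex/vim session: edits anchored to search patterns mean the model never has to count lines.

```python
import subprocess

# Hypothetical illustration: drive vim in silent ex mode (-es) with
# search-based commands, so the edit is anchored to a pattern instead of
# a line number the model would otherwise have to count.
def replace_by_pattern(path: str, pattern: str, replacement: str) -> None:
    # Note: pattern/replacement containing "/" would need escaping.
    subprocess.run(
        [
            "vim", "-es",
            "-c", f"g/{pattern}/s//{replacement}/",  # find matching lines, substitute in place
            "-c", "wq",                              # write the file and quit
            path,
        ],
        check=True,
    )

replace_by_pattern("config.py", "max_retries = 3", "max_retries = 5")
```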
Personally haven't had that happen to me, been using Codex (and lots of other agents) for months now. Anecdote, but still. I wrote up a summary of how I see the current difference between the agents right now: https://news.ycombinator.com/item?id=45680796
Still a toss-up for me which one I use. For deep work Codex (codex-high) is the clear winner, but when you need to knock out something small Claude Code (sonnet) is a workhorse.
Also, CC's tool usage is so much better! Many, many times I've seen Codex write a Python script to edit a file, which seems to bypass the diff view, so you don't really know what's going on.
I would add to the list of the vibe engineer’s tasks:
Knowing when the agent has failed and it's time to roll back. After four or five turns of Claude confidently telling you the feature is done while things drift further off course, it's time to reset and try again.