At some point in the future (if not already), Claude will install malware less often than a human developer would, on average. Just like Waymos crash less frequently than human drivers.
Once you accept that installation will be automated, standardized formats make a lot of sense. The big question is whether this particular format, which seems solid, gets adopted - probably mostly a timing question.
What does that have to do with the post you replied to? Is your implication that young people can't enter hospice or have terminal illness? Or are we just quoting random statistics without any regard for relevance whatsoever?
Something I'd consider a game-changer would be making it really easy to kick off multiple Claude instances to tackle a large research task, then view the results and collect them into a final research document.
IME, no matter how well I prompt, a single Claude/Codex will never produce a successful implementation of a significant feature in one shot. What does work is having five Claudes try it, reading the code, and cherry-picking the diff segments I like into one franken-spec that I hand to a final Claude instance with essentially just "please implement something like this".
It's super manual and annoying with git worktrees for me (rough sketch of the fan-out below), but it sounds like your setup could make it slick.
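In case it's useful, here's the manual fan-out step as a hypothetical Python sketch. The `claude -p` non-interactive mode is Claude Code's print mode; the spec file name and attempt count are assumptions, so swap in whatever you actually use:

    # Hypothetical sketch: fan one prompt out to N parallel attempts,
    # each in its own git worktree, then collect diffs to cherry-pick.
    import subprocess

    N = 5
    prompt = open("feature-spec.md").read()  # the shared spec (assumed file)

    procs = []
    for i in range(1, N + 1):
        path = f"../attempt-{i}"
        # New branch + new working directory per attempt.
        subprocess.run(
            ["git", "worktree", "add", "-b", f"attempt-{i}", path, "HEAD"],
            check=True)
        log = open(f"claude-{i}.log", "w")
        procs.append(subprocess.Popen(["claude", "-p", prompt], cwd=path,
                                      stdout=log, stderr=subprocess.STDOUT))

    for p in procs:
        p.wait()

    # One patch file per attempt, ready for manual cherry-picking
    # into the franken-spec.
    for i in range(1, N + 1):
        diff = subprocess.run(["git", "-C", f"../attempt-{i}", "diff"],
                              capture_output=True, text=True, check=True).stdout
        open(f"attempt-{i}.patch", "w").write(diff)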
Interesting. So, do you just start multiple instances of Claude Code and give them all the same prompt? Manually cherry-picking from 5 different worktrees sounds complicated. Will see what I can do :)
I agree, it's more complex. But I feel like the potential of a Claude Code wrapper lies precisely in enabling workflows that are a pain to self-implement but are nonetheless incredibly powerful.
The basic idea is this: the customer has some "curve" representing how much he values different outcomes. Maybe he values a good outcome at $1 and a great outcome at $100. The supplier has a cost curve too - by definition, it costs him more to supply a great outcome than a good one (otherwise he'd just always supply the great outcome).
Setting a fixed price is a simple way to help these two parties transact. But it can be more efficient - i.e., it lets more mutually beneficial trades happen - to ask each party what its number is for a given event, and have them transact when the numbers are far enough apart (cost is $10, value is $100).
The problem is, you can't ask the parties directly, because they don't want to reveal how high/low they're willing to go for no reason. So you structure your questions into a pre-defined algorithm under which everyone is incentivized to reveal at least the ballpark of their cost/value. The study of how to structure those questions is a subset of mechanism design / information design, a branch of economics related to game theory.
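To make that concrete, here's a toy sketch of one such pre-defined rule (my own illustration, not a specific mechanism from the literature): both sides submit sealed numbers, trade happens only if the reported value exceeds the reported cost, and the price splits the difference:

    # Toy "double auction": one buyer, one seller, sealed reports.
    # Trade iff reported value exceeds reported cost; the price
    # splits the surplus down the middle.
    def double_auction(reported_value, reported_cost):
        """Return (trade, price)."""
        if reported_value > reported_cost:
            return True, (reported_value + reported_cost) / 2
        return False, None

    print(double_auction(100, 10))  # (True, 55.0): both sides gain $45
    print(double_auction(1, 10))    # (False, None): no surplus, no trade

Note that even this rule isn't fully truthful - each side can gain by shading its report toward the other - and characterizing exactly when and how much shading happens is the kind of question the field answers.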
FWIW, if this sounds like arcane academic musing: applied mechanism design for a while was essentially just the study of Google ad auctions, and Google invested very, very heavily in researchers to figure out how to do this for them.
The three things I'd be most interested in seeing are:
1. A fine-tuned model for structured data extraction. Get something that's REALLY good at outputting in a specific JSON format, then show it running against a wide range of weird inputs (sketch of what I mean after this list).
2. A fine-tuned vision LLM that gains a new ability that the underlying model did not have, such as identifying different breeds of common California garden birds
3. Text to SQL. It's always a great demo for this stuff; a fine-tuned model that's demonstrably "better" at text-to-SQL for a specific complex database schema would be a really great example.
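For (1), the schema and the validity check below are hypothetical - just to pin down what "a specific JSON format" means as a mechanical pass/fail target:

    # Hypothetical target schema: every input, however weird, must come
    # back as exactly this shape so outputs can be checked automatically.
    import json

    EXPECTED_KEYS = {"title", "date", "amount_usd", "parties"}

    def is_valid(output: str) -> bool:
        """Parses as JSON and has exactly the expected keys."""
        try:
            obj = json.loads(output)
        except json.JSONDecodeError:
            return False
        return isinstance(obj, dict) and set(obj) == EXPECTED_KEYS

    print(is_valid('{"title": "t", "date": null, "amount_usd": 5, "parties": []}'))  # True
    print(is_valid("Sure! Here's the JSON: ..."))  # False - the usual failure mode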
FWIW, here is a case study from Shopify covering a project of theirs that used fine-tuning on a bimodal (text + image) model to extract product features. I get that this is not the situation you care about - they are running at such scale that they need inference to be cheap.
A really simple blog post for any task that you think is worthwhile would be enough to move the field forward. The blog post should include:
1) the training configuration and code (rough sketch of the shape at the end of this comment)
2) the data used to fine-tune
3) a set of input/output comparisons between the tuned model and the original that show it has learned something interesting
For something really compelling, it would host the created models in a repo that I could download and use. The gold standard would be to host them and provide a browser interface, but that could get expensive in GPU costs.
This blog post doesn't currently exist - or if it does, I haven't been able to find it in the sea of Medium articles detailing an outdated Hugging Face API.
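For (1), the shape might be something like this - hedged heavily, since (as noted above) Hugging Face's trl/peft APIs churn fast, and the model name and file paths here are placeholders:

    # Rough shape of item (1): a LoRA fine-tune via Hugging Face trl/peft.
    # Verify names against the current docs before running.
    from datasets import load_dataset
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    # Item (2): the fine-tuning data, as prompt/completion pairs.
    dataset = load_dataset("json", data_files="pairs.jsonl", split="train")

    trainer = SFTTrainer(
        model="meta-llama/Llama-3.2-1B",              # any small base model
        train_dataset=dataset,
        peft_config=LoraConfig(r=16, lora_alpha=32),  # LoRA instead of full tuning
        args=SFTConfig(output_dir="out", num_train_epochs=3),
    )
    trainer.train()
    trainer.save_model("out/adapter")  # the downloadable artifact to host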
https://lean-crafter-production.up.railway.app/
https://github.com/JoshuaPurtell/lean-crafter