Hacker News | benlivengood's comments

Agreed; everyone complained that LLMs have no world model, so here we go. Next logical step is to backfill the weights with encoded video from the real world at some reasonable frame rate to ground the imagination and then branch the inference on possible interventions (actions) in the near future of the simulation, throw the results into a goal evaluator and then send the winning action-predictions to motors. Getting timing right will probably require a bit more work than literally gluing them together, but probably not much more.
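
Roughly, the per-timestep loop I'm imagining is something like this minimal sketch (every name here is hypothetical, not an existing API):

  # Roll the world model forward under each candidate action, score the
  # imagined futures with a goal evaluator, and send the winner to the motors.
  def control_step(world_model, state, candidate_actions, goal_value, send_to_motors):
      best_action, best_score = None, float("-inf")
      for action in candidate_actions:
          imagined_future = world_model.rollout(state, action)  # branch the inference
          score = goal_value(imagined_future)                   # goal evaluator
          if score > best_score:
              best_action, best_score = action, score
      send_to_motors(best_action)  # winning action-prediction goes to the motors
      return best_action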

This is the most convincing take of what might actually get us to AGI I've heard so far :)

I dunno, GPT-OSS and Llama and Qwen and any of a half dozen other large open-weight models?

I really can't imagine OpenAI or Anthropic turning off inference for a model that my workplace is happy to spend >$200/person/month on. Google still has piles of cash and no reason to turn off Gemini.

The thing is, if inference is truly heavily subsidized (I don't think it is, because places like OpenRouter charge less than the big players for proportionally smaller models) then we'd probably happily pay >$500 a month for the current frontier models if everyone gave up on training new models because of some oddball scaling limit.


Yeah, this is silly. Plenty of companies are hosting their own now, sometimes on-prem. This isn't going away.


> we'd probably happily pay >$500 a month for the current frontier models

Try $5,000. OpenAI loses hundreds of billions a year; they need a 100x, not a 2x.


But they are not losing 100x on inference for high-paying customers. Their biggest losses are free users + training/development costs.


Why lie on a site where people know things?


OpenAI loses hundreds of billions a year on inference? I strongly doubt it.


$60k/yr still seems like a good deal for the productivity multiplier you get on an experienced engineer costing several times that. Actually, I'm fairly certain that some optimizations I had codex do this week would already pay for that by letting us scale down pod resource requirements, and that's just from me telling it to profile our code and find high-ROI things to fix, taking only part of my focus away from planned work.

Another data point: I gave codex a 2 sentence description (being intentionally vague and actually slightly misleading) of a problem that another engineer spent ~1 week root causing a couple months ago, and it found the bug in 3.5 minutes.

These things were hot garbage right up until the second they weren't. Suddenly, they are immensely useful. That said, I doubt my usage costs OpenAI anywhere near that much.


> $60k/yr still seems like a good deal for the productivity multiplier you get on an experienced engineer costing several times that.

Maybe, but that's a hard sell to all the workplaces who won't even spring for >1080p monitors for their experienced engineers.


Wildly different experience of frontier models than I've had; what's your problem domain? I had both Opus and Gemini Pro outright fail at implementing a dead simple floating-point image transformation the other day because neither could keep track of when things were floats and when they were uint8.


Low-level networking in some cloud applications, using gpt-5.2-codex medium. I've cloned like 25 of our repos on my computer for my team + nearby teams and worked with it for a day or so coming up with an architecture diagram annotated with what services/components live in what repos and how things interact from our team's perspective (so our services + services that directly interact with us). It's great because we ended up with a mermaid diagram that's legible to me, but it's also a great format for it to use.

Then I've found it does quite well at being able to look across repos to solve issues. It also made reference docs for all available debug endpoints, metrics, etc. I told it where our Prometheus server is, and it knows how to do PromQL queries on its own. When given a problem, it knows how to run debug commands on different servers via ssh or inspect our Kubernetes cluster on its own. I also had it make a shell script to go figure out which servers/pods are involved for a particular client and go check all of their debug endpoints for information (which it can then interpret). Huge time saver for debugging.

I'm surprised it can't keep track of float vs uint8. Mine knew to look at things like struct alignment, or places where we had slices (Go) on structures that could be arrays (so unnecessary boxing), in addition to things like timer reuse, object pooling/reuse, and places where local variables were escaping to the heap (and I never even gave it the compiler's escape analysis!), etc. After letting it have a go with the profiler for a couple of rounds, it eventually concluded that we were dominated by syscalls and crypto-related operations, so not much more could be microoptimized.

I've only been using this thing since right before Christmas, and I feel like I'm still at a fraction of what it can do once you start teaching it about the specifics of your workplace's setup. Even that I've started to kind of automate by just cloning all of our infra teams' repos too. Stuff I have no idea about, it can understand just fine. Any time there's something that requires more than a super pedestrian application programmer's knowledge of k8s, I just say "I don't really understand k8s. Go look at our deployment and go look at these guys' terraform repo to see all of what we're doing" and it tells me what I'm trying to figure out.


Yeah, wild. I don't really know how to bridge the gap here because I've recently been continuously disappointed by AI. Gemini Pro wasn't even able to solve a compiler error the other day, and the solutions it was suggesting were insane (manually migrating the entire codebase) when the actual fix was like a 0.0.xx compiler version bump. I still like AI a lot for function-scale autocomplete, but I've almost stopped using agents entirely because they're almost universally producing more work for me and making the job less fun; I have to do so much handholding for them to make good architectural decisions, and I still feel like I end up on shaky foundations most of the time.

I'm mostly working on physics simulation and image processing right now. My suspicion is that there's just so many orders of magnitude more cloud app plumbing code out there that the capability is really unevenly distributed. Similarly, with my image processing stuff, my suspicion is that almost all the code it was trained on works in 8-bit, and it's just not able to get past its biases and stop itself from randomly dividing things that are already floats by 255.
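
As a concrete sketch of the pattern it keeps botching (the transform here is hypothetical; the point is that the /255 happens exactly once on the way in and the *255 exactly once on the way out):

  import numpy as np

  def adjust_contrast(img_u8, gain=1.2):
      # Convert uint8 -> float32 exactly once; all math stays in [0, 1] floats.
      img = img_u8.astype(np.float32) / 255.0
      out = np.clip((img - 0.5) * gain + 0.5, 0.0, 1.0)
      # Convert back to uint8 exactly once, with rounding.
      return (out * 255.0 + 0.5).astype(np.uint8)

  img = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
  assert adjust_contrast(img).dtype == np.uint8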


Yep, like ECHELON and friends are. The metadata recorded about your (all of our) traffic is probably enough to perform the timing attack.


Hey, if ECHELON snuck a listener into my house, where six devices hang out on a local router... Good for them, they're welcome to my TODO lists and vast collection of public-domain 1950s informational videos.

(I wouldn't recommend switching the option off for anything that could transit the Internet or be on a LAN with untrusted devices. I am one of those old sods who doesn't believe in the max-paranoia setting for things like "my own house," especially since if I dial that knob all the way up the point is moot; they've already compromised every individual device at the max-knob setting, so a timing attack on my SSH packet speed is a waste of effort).


Deontological, spiritual/religious revelation, or some other form of objective morality?

The incompatibility of essentialist and reductionist moral judgements is the first hurdle; I don't know of any moral realists who are grounded in a physical description of brains and bodies with a formal calculus for determining right and wrong.

I could be convinced of objective morality given such a physically grounded formal system of ethics. My strong suspicion is that some form of moral anti-realism is the case in our universe. All that's necessary to disprove any particular candidate for objective morality is to find an intuitive counterexample where most people agree that the logic is sound for a thing to be right but it still feels wrong, and that those feelings of wrongness are expressions of our actual human morality which is far more complex and nuanced than we've been able to formalize.


You can be a physicalist and still a moral realist. James Fodor has some videos on this, if you're interested.


Granted, if humans had utility functions, and we could avoid utility monsters (maybe average utilitarianism is enough) and the child in the basement (if we could somehow fairly normalize utility functions across individuals so that it was well-defined to choose the outcome where the minimum of everyone's utility is maximized: argmax_s min_x U_x(s), minimizing over all people x and maximizing over states s), then I'd be a moral realist.
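
As a toy sketch of that maximin rule (the utilities here are made-up numbers, just for illustration):

  # Pick the state that maximizes the worst-off person's utility: argmax_s min_x U_x(s).
  utilities = {                 # U_x(s) for each candidate state s (hypothetical values)
      "status quo": {"alice": 5.0, "bob": 1.0},
      "reform":     {"alice": 3.0, "bob": 3.0},
  }
  best_state = max(utilities, key=lambda s: min(utilities[s].values()))
  assert best_state == "reform"  # the fairer state wins under maximin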

I think we'll keep having human moral disagreements with formal moral frameworks in several edge cases.

There's also the whole question of anthropics: how much moral weight do exact clones and potentially existing people contribute? I haven't seen a solid solution to those questions under consequentialism; we don't have the (meta)philosophy to address them yet. I'm 50/50 on whether we'll find a formal solution, and that's also required for full moral realism.


Without at least some filtering, a gateway NAT appliance is vulnerable to:

* LAN IP address spoofing from the WAN

* Potential for misconfigured "internal" daemons to accept WAN traffic (listening on 0.0.0.0 instead of the LAN or localhost)

* Reflection amplification attacks


LAN IP address spoofing is indeed a valid attack vector, if the ISP is compromised.

Internal daemons listening on 0.0.0.0 on LAN machines other than the router itself are not insecure (unless you have the problem from point 1, a malicious/compromised ISP); the router won't route packets with IPs that aren't in its LAN to them. Of course, the router itself could be compromised if it accidentally listens on 0.0.0.0 and accepts malicious packets.

Not sure what you mean by reflection amplification attacks, but unless they are attacking the router itself, or they are arriving on the WAN with LAN IPs (again, a compromised/malicious ISP), I don't see how they would reach LAN machines.


You do not need a compromised ISP for spoofed LAN IP traffic; the attack could come from other clients on the same WAN segment.


One could use any number of LLMs on a take-home problem, so in-person interviews are a must.


One could use any number of LLMs on real-world problems.

Why are we still interviewing like it's 1999?


Old habits die hard. And engineers are pretty lazy when it comes to interviews, so just throwing the same Leetcode problem into CoderPad in every interview makes things easier for the person doing the interviewing.


If you want people to interview better, you have to both allocate resources to it, and make it count on perf. It’s not laziness, it’s ROI.


As an interviewer, I ask the same problems because it makes it much easier to compare candidates.


How do you know if one candidate happened to see the problem on leetcode and memorized the solution versus one who struggled but figured it out slower?


It's very easy to tell, but it doesn't make much difference. The best candidates have seen the problems before and don't even try to hide it; they just propose their solution right away.

I try to give positive feedback to candidates who didn't know the problem but could make good use of hints, or had the right approach. But unfortunately, it's difficult to pass a Leetcode interview if you haven't seen a problem similar to what's asked before. Most candidates I interview nowadays seem to know all the questions.

That's what the company has decided, so we have to go along with it. The positive side is that if you do your part, you have a good chance of being hired, even if you disagree with the process.


It doesn't matter. It's about looking for candidates who have put in the time for your stupid hazing ritual. It selects for people who are willing to dedicate a lot of time to meaningless endeavors for the sake of employment.

This type of individual is more likely to follow orders, work hard, and - most importantly - be like the other employees you hired.


Once upon a time, the "stupid hazing ritual" made sense.

Now it means the company is stupid.


Because if you want to hire engineers, you have to ask engineering questions. Claude and GPT and Gemini are super helpful, but they're not autonomous coders yet, so you still need an actual engineer to vet their output.


I guess if you only need to store one bit, you could store either 0 or 11 and on average use less than two bits (for bit flips only), or 111 if you also have to worry about losing/duplicating bits.
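
A sketch of the first scheme, assuming bit flips preserve length so any flipped word falls outside the codebook and is detectable:

  CODE = {0: "0", 1: "11"}  # expected length 1.5 bits for uniform inputs, < 2

  def decode(word):
      for bit, codeword in CODE.items():
          if word == codeword:
              return bit
      raise ValueError("bit flip detected: %r is not a codeword" % word)

  assert decode("0") == 0 and decode("11") == 1
  for corrupted in ("1", "01", "10"):  # every single-bit flip of a codeword
      try:
          decode(corrupted)
      except ValueError:
          pass  # detected, as claimed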


With an optimal way of determining fair splitting of gains, like the Shapley value[0], you can cooperate or defect with a probability that maximizes the other participants' expected value when everyone acts fairly.

The ultimatum game is the simplest example: there are N dollars of prize to split and N/2 is fair, so accept with probability M / (N/2), where M is what's offered to you. The opponent's maximum expected value then comes from offering N/2; offering less (or more) results in an expected value to them of less than N/2.
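
A quick sketch of the arithmetic (N = 100 chosen just for illustration):

  N = 100.0

  def proposer_expected_value(offer):
      # Responder accepts an offer M with probability min(M / (N/2), 1).
      p_accept = min(offer / (N / 2), 1.0)
      return (N - offer) * p_accept

  best_offer = max(range(0, 101), key=proposer_expected_value)
  assert best_offer == 50  # the fair split N/2 maximizes the proposer's expected value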

Trust can be built out of clearly describing how you'll respond in your own best interests in ways that achieve fairness, e.g. assuming the other parties will understand the concept of fairness and also act to maximize their expected value given their knowledge of how you will act.

If you want to solve logically harder problems like the one-shot prisoner's dilemma, there are preliminary theories for how that can be done by proving things about the other participants directly. It won't work for humans, but maybe for artificial agents. https://arxiv.org/pdf/1401.5577

[0] https://en.wikipedia.org/wiki/Shapley_value


Thanks. I'll take a look!


At Google I worked with one statistics aggregation binary[0] that was ~25GB stripped. The distributed build system wouldn't even build the debug version because it exceeded the maximum configured size for any object file. I never asked if anyone had tried factoring it into separate pipelines, but my intuition is that the extra processing overhead wouldn't have been worth splitting the business logic that way; once the exact set of necessary input logs is in memory, you might as well do everything you need to do to them, given the dramatically larger ratio of data size to code size.

[0] https://research.google/pubs/ubiq-a-scalable-and-fault-toler...


In the long run I think it's pretty unhealthy to make one's career a large part of one's identity. What happens during burnout or retirement or being laid off if a huge portion of one's self depends on career work?

Economically, it's been a mistake to let wealth get stratified so unequally; we should have reintroduced, and still need to reintroduce, high progressive tax rates on income, and potentially implement wealth taxes, to reduce the necessity of guessing a high-paying career more than 5 years in advance. That simply won't be possible to do accurately with coming automation. But it is possible to grow social safety nets and decrease wealth disparity so that pursuing any marginally productive career is sufficient.

Practically, once automation begins producing more value than 25% or so of human workers, we'll have to transition to a collective ownership model and either pay dividends directly out of widget production, grant futures on the same with subsidized transport, or pay a UBI. I tend to prefer a distribution-of-production model because it eliminates a lot of the rent-seeking risk of UBI: your landlord is not going to want 2X the number of burgers and couches you get distributed, whereas they'd happily double your rent in dollars.

Once full automation hits (if it ever does; I can see augmented humans still producing up to 50% of GDP indefinitely [so far as anyone can predict anything past human-level intelligence], especially in healthcare/wellness), it's obvious that some kind of direct goods distribution is the only reasonable outcome; markets will still exist on top of this, but participating in them will basically be optional for people who want to do that.


If we had done what you say (distributed wealth more evenly between people/corporations), then, more to the point, I don't know if AI would have progressed as it has - companies would have been more selective with their investment money, and previously AI was seen at best as a long-shot bet. Most companies in the "real economy" can't afford to make too many of these kinds of bets in general.

The main reason for the transformer architecture, and many other AI advancements really, was that "big tech" has lots of cash it doesn't know what to do with. It seems the US system also punishes dividends tax-wise, so companies are incentivized to become like VCs -> buy lots of opportunities hoping one makes it big, even if many end up losing.


Transformers grew out of the value-add side (autotranslation), though, not really the ad-business side, iirc. Value-add work still gets done in high-progressive-tax societies if it's valuable to a large fraction of people. Research into luxury goods is slowed by progressive tax rates, but the border between consumer and luxury goods actually rises a bit with redistributed wealth: more people can afford smartphones earlier and almost no one buys superyachts, so reinvestment into general technology research may actually be higher.


And I'm sure none of it was based on any public research from public universities, or private universities that got public grants.


Sure. I just know that in most companies (having seen the numbers on projects in a number of them across industries now), funding projects which give people time to think, ponder, and publish white papers on new techniques is rare and economically not justifiable against other investments.

Put it this way - a project where people have the luxury to scratch their heads for a while and to bet on something that may not actually be possible yet is something most companies can't justify financing. Listening to the story of the transformer's invention, it sounds like one of these projects to me.

They may stand on the shoulders of giants, that is true (at the very least they were trained in those institutions), but putting it together as it was - that was done in a commercial setting with shareholder funds.

In addition, given the disruption LLMs have caused to Google in general, I would say, despite Gemini, it may have been better cost/benefit-wise for Google NOT to invent the transformer architecture at all/yet, or at least not to publish a white paper for the world to see. As a use of shareholders' funds, the activity above probably isn't a wise one.


I agree with much of what you say.

Career being the core of one's identity is so ingrained in society. Think about how schooling is directed towards producing what 'industry' needs. Education for education's sake isn't a thing. Capitalism sees to this and ensures so many avenues are closed to people.

Perhaps this will change but I fear it will be a painful transition to other modes of thinking and forming society.

Another problem is hoarding. Wealth inequality is one thing, but the unadulterated hoarding by the very wealthy means that wealth is unable to circulate as freely as it ought to. This burdens a society.


> Career being the core of one's identity is so ingrained in society

In AMERICAN society. Over there, "what do you do?" is among the first 3 questions people ask each other when they meet.

I've known people for 20 years and I don't have the slightest clue what they do for a living; it's never come up. We talk about other things - their profession isn't a part of their personality.


    Education for education's sake isn't a thing.
It is, but only for select members of society. Off the top of my head: those with benefits programs to go after that opportunity, like 100% disabled veterans, or the wealthy and their families.

