ActorNightly's comments

This is cool, but they are still not going about it the right way.

It's much easier to build everything into the compressed latent space of physical objects and how they move, and operate from there.

Everyone jumped on the end-2-end bandwagon, which locks the input of your driving model into being vision, which means that you have to have things like Genie to generate vision data, which is wasteful.


> This is cool, but they are still not going about it the right way.

This is legit hilarious to read from some random HN account.


I posted this before, but I'll post again - this is one of the few things I feel confident enough to say that most people in the space are doing wrong. You can save my post and reference it when we actually get full self driving (i.e. you can take a nap in the backseat while your car drives you), because it's going to be implemented pretty much like this:

Humans don't drive well because we map vision policy to actions. We drive well (and in general, manipulate physical objects well) because we can run simulations inside our heads to predict what the outcome will be. We aren't burdened by our inability to recognize certain things - when something is in the road, no matter what it is, we automatically predict that we would likely collide with that thing, because we understand the concept of 3d space and moving within it, and we take appropriate action. Sure, there is some level of direct mapping, as many people can drive while "spaced out", but attentive driving mostly involves the above.

The self driving system that can actually self drive needs to do the same. When you have this, you will no longer need to do things like simulate driving conditions in a computationally expensive sim. You aren't going to be concerned with training the model on edge cases. All you would need to do is ensure that your sensor processing results in a 3d representation of the driving conditions, and the model will then be able to do what humans do: explore a latent space of things it can do, predict outcomes, and choose the best one.

You want proof? It exists in the form of MuZero, and it worked amazingly well. Driving can easily be reformulated as a game that the engine plays in a simulator that doesn't involve vision, where it learns both the available moves and the optimal policy.
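To make that concrete, here is a stripped-down sketch of planning in a learned latent space: roll out candidate action sequences through a latent dynamics model and keep the best-scoring one. This is random-shooting rather than MuZero's full MCTS, the "networks" are random matrices standing in for trained models, and every name in it is made up for illustration:

    # Hypothetical sketch: pick actions by imagining rollouts in latent space.
    # Random matrices stand in for trained encoder/dynamics/reward networks.
    import numpy as np

    rng = np.random.default_rng(0)
    LATENT, OBS, N_ACTIONS = 16, 32, 5   # e.g. brake, coast, accelerate, steer L/R
    HORIZON, N_CANDIDATES = 8, 64        # rollout depth and number of candidate plans

    W_enc = rng.normal(size=(LATENT, OBS))                        # observation -> latent
    W_dyn = rng.normal(size=(LATENT, LATENT + N_ACTIONS)) * 0.1   # (latent, action) -> latent
    w_rew = rng.normal(size=LATENT)                               # latent -> scalar reward

    def encode(obs):
        return np.tanh(W_enc @ obs)

    def dynamics(state, action):
        return np.tanh(W_dyn @ np.concatenate([state, np.eye(N_ACTIONS)[action]]))

    def reward(state):                   # stands in for "progress minus collision risk"
        return float(w_rew @ state)

    def plan(obs):
        """Return the first action of the best imagined rollout."""
        s0, best_score, best_action = encode(obs), -np.inf, 0
        for _ in range(N_CANDIDATES):
            actions = rng.integers(N_ACTIONS, size=HORIZON)
            s, score = s0, 0.0
            for a in actions:
                s = dynamics(s, a)
                score += reward(s)
            if score > best_score:
                best_score, best_action = score, int(actions[0])
        return best_action

    print(plan(rng.normal(size=OBS)))    # action chosen purely from imagined rollouts

The point is that nothing in the planning loop touches raw pixels; vision only matters at the encode step.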

The reason everyone is doing end to end today is that they are basically trying to catch up to Tesla, and from a business perspective, nobody is willing to put up the money and pay smart enough people to research this, especially because there is also a legal bridge to cross when it comes to proving that the system can self drive while you're napping. But nevertheless, if you ever want self driving, this is the right approach.

Meanwhile, Google, who came up with MuZero, is now doing more advanced robotics work than anyone out there.


The article is about using the world model to generate simulations, not for controlling the vehicle.

They form the control policy from vision data directly, which is why they need a massive model to generate simulated vision data.

The purpose of lidar is to provide error correction when you need it most, i.e. when camera accuracy degrades.

Humans do this, just in the sense of depth perception with both eyes.


Human depth perception uses stereo out to only about 2 or 3 meters, after which the distance between your eyes is not a useful baseline. Beyond 3m we use context clues and depth from motion when available.

Thanks, saved some work.

And I'll add that in practice it is not even that much, unless you're doing some serious training, like a professional athlete. For most tasks, accurate depth perception from this fades at around arm's length.


ok, but a car is a few meters wide, isn't that enough for driving depth perception similar to humans?

The depths you are trying to estimate are to the other cars, people, turnings, obstacles, etc. Could be 100m away or more on the highway.

ok, but the point being made is based on humans' depth perception, while a car's basic limitation is the width of the vehicle, so there's missing information if you're trying to figure out whether a car can use cameras to do what human eyes/brains do.
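To put rough numbers on that baseline question, here's a quick back-of-the-envelope check (the 1000-pixel focal length is an assumed value, not any real camera's spec):

    # Stereo disparity in pixels for a given baseline and depth (pinhole model).
    def disparity_px(baseline_m, depth_m, focal_px=1000):   # focal_px is assumed
        return focal_px * baseline_m / depth_m

    for label, baseline in [("human eyes, ~6.5 cm", 0.065), ("car-width cameras, ~1.5 m", 1.5)]:
        for depth in (10, 50, 100):
            print(f"{label}: {depth} m -> {disparity_px(baseline, depth):.2f} px")

At 100 m the human baseline gives well under a pixel of disparity, while a car-width baseline gives around 15 px, so a wide baseline does help; the catch, as another commenter notes further down, is that you then need high resolution and very tight calibration to actually use it.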

Humans are very good at processing the images that come into our brains. Each eye has a “blind spot” but we don't notice. Our eyes adjust for color (fluorescent lights are weird) and for the amount of light coming in. We can look through a screen door or rain and just ignore it, and if you look out the side of a moving vehicle you can ignore the foreground.

If you increase the distance of stereo cameras you probably can increase depth perception.

But a lidar or radar sensor is just sensing distance.


Radar has a cool property that it can sense the relative velocity of objects along the beam axis too, from Doppler frequency shifting. It’s one sense that cars have that humans don’t.
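For reference, the conversion is the standard two-way Doppler relation; assuming a 77 GHz automotive radar carrier:

    # Radial (closing) velocity from a measured Doppler shift, monostatic radar.
    C = 3.0e8     # speed of light, m/s
    F0 = 77e9     # assumed automotive radar carrier frequency, Hz

    def radial_velocity(doppler_shift_hz):
        return C * doppler_shift_hz / (2 * F0)

    print(radial_velocity(5130))   # ~10 m/s closing speed from a ~5.1 kHz shift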

To this point, one of the coolest features Teslas _used_ to have was the ability to determine and integrate the speed of the car in front of you AND the speed of the car in front of THAT car, even if the second car was entirely visually occluded. They did this by bouncing the radar beam under the car in front and detecting that there were multiple targets. It could even act on this: I had my car AEB when the second car ahead slammed on THEIR brakes before the car directly ahead even reacted. Absolutely wild. Completely gone in vision-only.

The width of your own vehicle is (pretty much) a constant, and trivial to know. Ford F150 is ~79.9 inches. Done. No sensors needed.

All the shit out there in the world is another story.


You misunderstood the assignment.

Write a sonnet about Elon Musk.


The company I used to work for was developing a self driving car with stereo depth on a wide baseline.

It's not all sunshine and roses to be honest - it was one of the weakest links in the perception system. The video had to run at way higher resolutions than it would otherwise and it was incredibly sensitive to calibration accuracy.


(Always worth noting: human depth perception is not just based on stereoscopic vision, but also on focal distance, which is why so many people get simulator sickness from stereoscopic 3D VR)

> Always worth noting, human depth perception is not just based on stereoscopic vision, but also with focal distance

Also subtle head and eye movements, which is something a lot of people like to ignore when discussing camera-based autonomy. Your eyes are always moving around which changes the perspective and gives a much better view of depth as we observe parallax effects. If you need a better view in a given direction you can turn or move your head. Fixed cameras mounted to a car's windshield can't do either of those things, so you need many more of them at higher resolutions to even come close to the amount of data the human eye can gather.


Easiest example I always give of this is pulling out of the alley behind my house: there is a large bush that occludes my view left to oncoming traffic, badly. I do what every human does:

1. Crane my neck forward, see if I can see around it.

2. Inch forward a bit more, keep craning my neck.

3. Recognize, no, I'm still occluded.

4. Count on the heuristic analysis of the light filtering through the bush and determine if the change in light is likely movement associated with an oncoming car.

My Tesla's perpendicular camera is... mounted behind my head on the B-pillar... fixed... and sure as hell can't read the tea leaves, so to speak, to determine if that slight shadow change increases the likelihood that a car is about to hit us.

I honestly don't trust it to pull out of the alley. I don't know how I can. I'd basically have to be nose-into-right-lane for it to be far enough ahead to see conclusively.

Waymo can beam the LIDAR above and around the bush, owing to its height and the distance it can receive from, and its camera coverage to the perpendicular is far better. Vision only misses so many weird edge cases, and I hate that Elon just keeps saying "well, humans have only TWO cameras and THEY drive fine every day! h'yuck!"


> owing to its height and the distance it can receive from,

And, importantly, the fender-mount LIDARs. It doesn't just have the one on the roof, it has one on each corner too.

I first took a Waymo as a curiosity on a recent SF trip, just a few blocks from my hotel east on Lombard to Hyde and over to the Buena Vista to try it out, and I was immediately impressed when we pulled up the hill to Larkin and it saw a pedestrian that was out of view behind a building from my perspective. Those real-time displays went a long way to allowing me to quickly trust that the vehicle's systems were aware of what's going on around it and the relevant traffic signals. Plenty of sensors plus a detailed map of a specific environment work well.

Compare that to my Ioniq5, which combines one camera with a radar and a few ultrasonic sensors and thinks a semi truck is a series of cars constantly merging into each other. I trust it to hold a lane on the highway and not much else, which is basically what they sell it as being able to do. I haven't seen anything that would make me trust a Tesla any further than my own car, and yet they sell it as if it is on the verge of being able to drive you anywhere you want on its own.


In fact there are even more depth perception clues. Maybe the most obvious is size (retinal versus assumed real world size). Further examples include motion parallax, linear perspective, occlusion, shadows, and light gradients

Here is a study on how these effects rank when it comes to (hand) reaching tasks in VR: https://pubmed.ncbi.nlm.nih.gov/29293512/


Actually the reason people experience vection in VR is not focal depth but the dissonance between what their eyes are telling them and what their inner ear and tactile senses are telling them.

It's possible they get headaches from the focal length issues but that's different.


I keep wondering about the focal depth problem. It feels potentially solvable, but I have no idea how. I keep wondering if it could be as simple as a Magic Eye Autostereogram sort of thing, but I don't think that's it.

There have been a few attempts at solving this, but I assume that for some optical reason actual lenses need to be adjusted and it can't just be a change in the image? Meta had "Varifocal HMDs" being shown off for a bit, which I think literally moved the screen back and forth. There were a couple of "Multifocal" attempts with multiple stacked displays, but that seemed crazy. Computer Generated Holography sounded very promising, but I don't know if a good one has ever been built. A startup called Creal claimed to be able to use "digital light fields", which basically project stuff right onto the retina, which sounds kinda hogwashy to me but maybe it works?


My understanding is that contextual clues are a big part of it too. We see the pitcher wind up and throw a baseball at us more than we stereoscopically track its progress from the mound to the plate.

More subtly, a lot of depth information comes from how big we expect things to be, since everyday life is full of things we intuitively know the sizes of: frames of reference in the form of people, vehicles, furniture, etc. This is why the forced perspective of theme park castles is so effective - our brains want to see those upper windows as full sized, so we see the thing as 2-3x bigger than it actually is. And in the other direction, a lot of buildings in Las Vegas are further away than they look, because hotels like the Bellagio have large black boxes on them that group a 2x2 block of the actual room windows.


> Humans do this, just in the sense of depth perception with both eyes.

Humans do this with vibes and instincts, not just depth perception. When I can't see the lines on the road because there's too much snow, I can still infer where they would be based on my familiarity with the roads and my implicit knowledge of how roads work. We do similar things for heavy rain or fog, although sometimes those situations truly necessitate pulling over, or slowing down and turning on your four-ways - lidar might genuinely give an advantage there.


That’s the purpose of the neural networks

Yes and no - vibes and instincts aren't just thought, they're real senses. Humans have a lot of senses; dozens of them, including balance, pain, sense of the passage of time, and body orientation. Not all of these senses are represented in autonomous vehicles, and it's not really clear how the brain mashes together all these senses to make decisions.

Another way humans perceive depth is by moving our heads and perceiving parallax.

How expensive is their lidar system?

Hesai has driven the cost into the $200 to $400 range now. That said, I don't know what the ones needed for driving cost. Either way, we've gone from thousands or tens of thousands of dollars into the hundreds.

Looking at prices, I think you are wrong and automotive Lidar is still in the 4 to 5 figure range. HESAI might ship Lidar units that cheap, but automotive grade still seems quite expensive: https://www.cratustech.com/shop/lidar/

Those are single-unit prices. The AT128 for instance, which is listed at $6250 there and widely used by several Chinese car companies, was around $900 per unit in high volume, and over time they lowered that to around $400.

The next generation of that, the ATX, is the one they have said would be half that cost. According to regulator filings in China BYD will be using this on entry level $10k cars.

Hesai got the price down for their new generation by several optimizations. They are using their own designs for lasers, receivers, and driver chips which reduced component counts and material costs. They have stepped up production to 1.5 million units a year giving them mass production efficiencies.


That model only has a 120 degree field of view so you'd need 3-4 of them per car (plus others for blind spots, they sell units for that too). That puts the total system cost in the low thousands, not the 200 to 400 stated by GP. I'm not saying it hasn't gotten cheaper or won't keep getting cheaper, it just doesn't seem that cheap yet.
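The arithmetic behind "low thousands", using the per-unit volume prices quoted upthread (assumed figures, not quotes from Hesai):

    # 360-degree coverage from 120-degree units, before extra blind-spot sensors.
    units_needed = 360 // 120                # 3 forward/side units
    for per_unit_usd in (400, 900):          # volume prices mentioned upthread
        print(units_needed, "x", per_unit_usd, "=", units_needed * per_unit_usd, "USD")
    # roughly $1,200 to $2,700 for the main units alone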

Waymo does their LiDAR in-house, so unfortunately we don’t know the specs or the cost

We know Waymo reduced their LiDAR price from $75,000 to ~$7500 back in 2017 when they started designing them in-house: https://arstechnica.com/cars/2017/01/googles-waymo-invests-i...

That was 2 generations of hardware ago (4th gen Chrysler Pacificas). They are about to introduce 6th gen hardware. It's a safe bet that it's much cheaper now, given how mass produced LiDARs cost ~$200.


Otto and Uber and the CEO of https://pronto.ai do though (tongue-in-cheek)

> Then, in December 2016, Waymo received evidence suggesting that Otto and Uber were actually using Waymo’s trade secrets and patented LiDAR designs. On December 13, Waymo received an email from one of its LiDAR-component vendors. The email, which a Waymo employee was copied on, was titled OTTO FILES and its recipients included an email alias indicating that the thread was a discussion among members of the vendor’s “Uber” team. Attached to the email was a machine drawing of what purported to be an Otto circuit board (the “Replicated Board”) that bore a striking resemblance to – and shared several unique characteristics with – Waymo’s highly confidential current-generation LiDAR circuit board, the design of which had been downloaded by Mr. Levandowski before his resignation.

The presiding judge, Alsup, said, "this is the biggest trade secret crime I have ever seen. This was not small. This was massive in scale."

(Pronto connection: Levandowski got pardoned by Trump and is CEO of Pronto autonomous vehicles.)

https://arstechnica.com/tech-policy/2017/02/waymo-googles-se...


Less than the lives it saves.

Cheaper every year.

Exactly.

Tesla told us their strategy was vertical integration and scale to drive down all input costs in manufacturing these vehicles...

...oh, except lidar, that's going to be expensive forever, for some reason?


>At scale (like comma.ai), it's probably cheaper. But until then it's a long term cost optimization with really high upfront capital expenditure and risk.

The issue with comma.ai is that the company is HEAVILY burdened with geohot's ideals, despite him no longer even being on the board. I used to be very much into his streams and he rants about it plenty. A large reason why they run their own datacenter is that they ideologically refuse to give money to AWS or Google (but I guess Microsoft passes their non-woke test).

Which is quite hilarious to me because they live in a very "woke" state and complain about power costs in the blog post. They could easily move to Wyoming or Montana and, with low humidity and colder air in the winter, run their servers more optimally.


Our preference for training in our own datacenter has nothing to do with wokeness. Did you read the blog post? The reasons are clearly explained.

Wyoming and Montana are actually worse in terms of climate. San Diego's climate extremes are less extreme than in those places. Though moving out of CA is a good idea for power cost reasons, which is also addressed in the blog.


Wrong way to look at it.

Generally there are 2 types of human intelligence - simulation and pattern lookup (technically simulation still relies on pattern lookup but on a much lower level).

Pattern lookup is basically what LLMs do. Humans memorize maps of tasks->solutions and statistically interpolate their knowledge to do a particular task. This works well enough for the vast majority of people, and this is why LLMs are seen as a big help, since they effectively expand the pool of task->solution patterns you can draw on.

Simulation type intelligence is able to break down a task into core components, and understand how each component interacts and predict outcomes into the future, without having knowledge beforehand.

For example, assume a task of cleaning the house:

Pattern lookup would rely on learned experience taught by parents, as well as experience in cleaning the house, to perform an action. You would probably use a duster + generic cleaner to wipe surfaces, and vacuum the floors.

Simulation type intelligence would understand how much dirt/dust there is and how it behaves. For example, instead of a duster, one would realize that you can use a wet towel to gather dust, without ever having seen this done before.

Here is the kicker - pattern type intelligence is actually much harder to attain, because it requires really good memorization, which is pretty much genetic.

Simulation type intelligence is actually attainable by anyone - it requires a much smaller set of patterns to memorize. The key factor is changing how you think about the world, which requires realigning your values. If you start to value low level understanding, you naturally develop this intelligence.

For example, what would it take for you to completely take your car apart, figure out how every component works, and put it back together? A lot of you have garages, money to spend on a cheap car to do this, and the tools, so doing this in your spare time is practical. It will give you the ability to buy an older used car, do all the maintenance/repairs on it yourself, and have something that works well, all for a lower price, while also giving you a monetizable skill.

Furthermore, LLMs can't reason with simulation - you can get close with agentic frameworks, but all of those are manually coded and have limits, and we aren't close to figuring out a generic framework for an agent that can make it do things like look up information, run internal models of how things would work, and so on.

So finally, when it comes to competing: if you choose to stick to pattern based intelligence and you lose your job to someone who can use LLMs better, that's your fault.


At the longest timescale humans aren’t the best at either

I have yet to see a compelling argument demonstrating that humans have some special capabilities that could never be replaced


Sure. It's not a hardware problem though, but an algorithm problem. Training something that behaves like a human can't be done with backpropagation the way it's implemented currently. You basically have to figure out how to train neural nets that not only operate in parallel with scheduling, but are also able to iterate on their own architecture.

> 2A is to stop that situation from ever happening. If the government starts shooting we will shoot back

The fact that ICE are still parading around on the street has put a nail in the coffin: 2A is absolutely pointless.

If anything, USA citizens deserve to have their guns taken away forcibly just because they could use them but didn't.


The problem is that the people with guns also happen to be, by and large, the people who very much support what ICE is doing. Whereas those who oppose it have enthusiastically disarmed themselves.

Even more reason to take away guns from people.

Who is going to be doing that? The cops and other LEOs? Why would they take guns from their friends and neighbors on behalf of some liberal whom they hate?

Lol no.

Trump is gonna cancel or fuck with elections in 2026 like he has said multiple times he will, and by 2027 and 2028 he will likely install himself as a 3rd-term president.

It's gonna be an era of economic decline and social decay as shit gets worse and worse, and eventually things like crime are gonna rise again as the lower-income sector transitions into the "nothing to lose" crowd.


The problem is, nobody is willing to use 2nd amendment rights to defend other amendments.

>buried in Part C is a provision requiring all 3D printers *sold or delivered in New York* to include “blocking technology”.

I.e. don't buy your printer in New York. Pick it up out of state. Problem solved.

Yes, this is rent seeking, and yes New York is gonna New York, but not a big deal.


I would suspect flashing your firmware to the globally standard one would become commonplace if printers sold in NY came with a nerfed version.

On principle, yes, but also for maintenance. The nerfed firmware that's only required in a few jurisdictions is almost assuredly going to fall out-of-sync with mainline features.

"The rule saying you can't print the thing that you either weren't going to print, or you weren't going to let the rule tell you not to print, wants you to run old/broken software." No matter which side of that you fall on, you're upgrading the software.


I doubt any meaningful detection would be worth implementing just for New York, so you’ll get a cut down firmware that supports 5 hard coded models. You’ll need to flash your own firmware to print anything else.

No, it's not solved.

The goalposts will move to "save gcode on government-approved secured storage", licensing and registering each 3d printer, then confiscating the ones that are not whitelisted, etc etc.


This is the same story where every time you hear about some Democratic-run city/state implementing policy, everyone makes it out to be a step on the road to 1984's Oceania.

This legislation is basically like a gold star on some politician's report card about preventing gun deaths. The impacted groups are always gonna be niche, but it looks good to the overall public.


With these little steps that affect niche groups, we got to 2026 with total surveillance and very few freedoms.

Once the Supreme Court decision about Trump came out, I took all of my investments and put them into a savings account.

When a dip happens, I simply take 10% of the money, buy the dip, then sell when the price hits the pre-dip level.

So far I've netted significantly more than any of my peers who actually do investing.


You have explained well how you are determining the point to sell. But how do you determine that "the dip" is now?

Anytime it dips due to some announcement, I buy (usually big cap stuff like AMZN, or an index like VOO). Then I sell when the price goes back to what it was prior to the announcement.
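Written out mechanically, the rule described is just: on an announcement-driven dip, buy with a fixed slice of cash and sell at the pre-announcement price. A toy sketch with made-up numbers (nothing here reflects the commenter's actual amounts):

    # Toy illustration of the buy-the-dip, sell-at-recovery rule described above.
    pre_announcement_price = 100.0     # assumed price before the news
    dip_price = 92.0                   # assumed price paid after the drop
    cash_slice = 0.10 * 50_000         # "10% of the money", pot size assumed

    shares = cash_slice / dip_price
    gain_if_recovered = shares * (pre_announcement_price - dip_price)
    print(round(gain_if_recovered, 2))  # realized only if the price actually recovers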

Obvious problem - stock market could keep going down. Obvious improvement - stop limit sell orders. Obvious flaw in the story - many common stocks like Google have doubled in the past year.

So if you had a pension mostly in all world indexed funds, you'd switch them over to "boring" investments like cash.

Interesting. Do you buy indexes, or ?

Mostly VOO and others. But I do buy big cap stock as well.

It's the standard Republican playbook.

Get in power, enrich themselves, kick the can down the road to Democrats, then blame the Democrats for poor economy.

This is why, ironically, Trump cancelling elections and installing himself as a 3rd-term president would actually be good. People need to see that no matter how bad they think things were under Democrats, it can get much, much worse. Say goodbye to your house value and 401k plans for retirement, you're gonna be a wage slave well into your 60s, but hey, at least we fixed "wokeness".

