d4l3k's comments

We want to be tolerant of application bugs and host/GPU failures that can be solved by replacing or restarting the machine. We don't have much control over external services and network failures, so we aren't aiming to solve those.

For specific types of failures check out the section on "Reliability and Operational Challenges" from the Llama 3 paper https://ai.meta.com/research/publications/the-llama-3-herd-o...


Let me know how it goes! If you're interested in chatting / run into any problems feel free to reach out via the links in my profile


Hey Tim, how's it going?

Interested in lending PyTorch some compute? :)

torchft can handle much larger scales, but for a public multi-day demonstration run this is what we had available. The point of this blog post was to demonstrate the correctness of the quorum algorithm and recovery with a stock PyTorch stack, not so much peak FLOPS.

Stay tuned though -- planning on doing some much larger demos on B200s!


Hey, nice to see this here!

I'm the primary author so happy to answer any questions you might have!


Why isn't there more investment in semi-synchronous training? Is it that the convergence is iffy? Also, it would be great to refactor this code into a typed language, so it is easier to reason about and maintain.


Recently there's been a lot of interest and improvements in semi-synchronous training. The Streaming DiLoCo paper came out this year and is a big step forward for datacenter semi-sync.

Historically it's been limited to areas like federated learning for low power/low network training but with the massive increase in number of GPUs it's becoming relevant even for training in datacenters.

It is another variable ML researchers have to tune, so it does add some complexity, and I expect most folks just aren't familiar with it yet.
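For anyone unfamiliar with the idea, here is a toy sketch of the semi-sync pattern in plain Python. It is not torchft code and not the Streaming DiLoCo algorithm itself: the names `local_sgd_steps` and `semi_sync_train`, the 1-D least-squares problem, and the plain-SGD inner and outer updates are all assumptions made for illustration. Each worker trains alone for `inner_steps` updates, then the workers average how far they moved and apply that as one outer step. `inner_steps` and `outer_lr` are the extra knobs to tune.

```python
import random


def local_sgd_steps(w, shard, lr, steps, rng):
    """Run `steps` SGD updates on the 1-D model y = w * x using only this worker's shard."""
    for _ in range(steps):
        x, y = rng.choice(shard)
        grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)^2 with respect to w
        w -= lr * grad
    return w


def semi_sync_train(shards, rounds=20, inner_steps=10, inner_lr=0.05, outer_lr=1.0, seed=0):
    """Toy semi-synchronous loop: workers only communicate once per round."""
    global_w = 0.0
    rngs = [random.Random(seed + i) for i in range(len(shards))]
    for _ in range(rounds):
        # Every worker starts from the shared weights and trains without communicating.
        local_ws = [
            local_sgd_steps(global_w, shard, inner_lr, inner_steps, rng)
            for shard, rng in zip(shards, rngs)
        ]
        # "Pseudo-gradient": the average displacement of the workers from the shared weights.
        pseudo_grad = sum(global_w - w for w in local_ws) / len(local_ws)
        # Outer step. With outer_lr=1.0 this reduces to averaging the local weights.
        global_w -= outer_lr * pseudo_grad
    return global_w
```

Calling `semi_sync_train` with two shards drawn from y = 3x moves the shared weight toward 3 while synchronizing only once every `inner_steps` updates. A real setup would swap in proper inner and outer optimizers and real models; this only shows the communication pattern.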

On "typed language": all of torchft is typed! The coordination/quorum layers are written in Rust with gRPC, and the front-end is typed Python with Pyre since it has to interact with PyTorch and model code.


Thanks! I am curious how this relates to the recent "monarch" announcement, which has similar goals of facilitating large-scale fault-tolerant training [1].

[1] https://github.com/pytorch-labs/monarch/issues/175#issuecomm...


We're working on making these composable. torchft is largely focused on the model integration and algorithms, whereas Monarch is handling more of the orchestration/monitoring. They operate at a bit of a different layer, but the plan is for torchft to provide the fault-tolerant algorithms, which can be used either in Monarch or in a standard PTD job.


It seems to work just fine with SIP enabled. I just switched and it seems to be a lot better than Amethyst. Amethyst had a lot of issues with focus follows mouse and dropdown dialogs that seem to just work with Yabai.

Seems like SIP is only needed for system dialogs etc so has the same limitations as Amethyst


If the map can't talk to Tesla it'll use Google maps directly. I usually don't allow connections to Tesla on my rooted Model 3


I also would like to subscribe to your newsletter.


I've got a blog if you're interested haha https://fn.lc/post/

I've been hacking on my car and creating my own self driving models

Code is at https://github.com/d4l3k/torchdrive


Very cool, am going to eat this up. FYI some of your images won't load for me, shoots me a 502 here https://fn.lc/post/diy-self-driving/


Not sure why they aren't loading, seem to be fine now

They're also at https://github.com/d4l3k/fn.lc/tree/master/static%2Fdiy-self...


Is that legal?


Is getting married at 15 in Georgia?


How does this work with their charging network? Are you still able to use their chargers, or are you stuck with home charging & third parties?


Supercharger auth is between the car and the charger and doesn't require an internet connection. I get billed the normal way via my Tesla account since the VIN is registered


Oh no, don't give them ideas. It'll become the HP instant ink of car charging


Your L2 charging wire is low on copper, please replace the entire cable.


How did you root yours? Did you lose out on any functionality?


There's some functionality loss but it's mostly been mitigated. I have a custom app I wrote since I can't use the stock app.

The one feature I miss is that there's no voice commands since that requires Tesla's servers but at the same time I also haven't been bothered enough to plug in a custom backend


wait

So the company that goes "we don't need physical buttons since we have voice commands" also goes "you don't need those in underground parkings"?!


It’s ok, the voice commands are barely understood anyway. At least in the UK they aren’t. It gets things drastically wrong and messes up your navigation destination: you ask it to open the glovebox and it answers “navigating to Columbia”.


Are there api keys for google maps in the car? Or does it emulate some client like a browser or android phone?


I just tried to set this up and couldn't. Seems like it's invite only with a waitlist :/


Yeah, we're adding people slowly because decentralized authorities like the one that tailnet lock implements can have nasty failure modes, e.g. some bug that prevents any new addition to the tailnet at all and forces manual recovery on each of your devices separately. So, we're putting miles on it with a little care, and making sure folks who sign up are aware of the current limitations and risks.


Oh is that all the problem is?

Anyone with automated deployments and self provisioning should be fine with that risk. I thought it was a lot more premature than this.


Good ops is more than automated deployments. Complex systems have complex failure modes.


If you're excited about tailnet lock and want to get on the alpha sooner rather than later, feel free to drop me an email. As Dave mentioned we are slowly crunching through the waitlist to get some miles in, but I'm also happy to take on enthusiastic testers ahead of that!

You can email me at tom@ (tailscale dot com)


Adding port forwarding to Mosh has a $600 bounty -- highest OSS bounty I've ever seen

https://www.bountysource.com/issues/4471419-ssh-port-forward... https://github.com/mobile-shell/mosh/issues/337


On high bounties, Qubes OS has a $6500 bounty for GNOME support https://www.bountysource.com/issues/31778112-add-support-for...


As someone who used to use SSH port forwarding, I have a recommendation that may be a suitable alternative to the missing port forwarding in Mosh, as well as to port forwarding over SSH: Wireguard! This is what I've been doing instead of port forwarding over SSH for quite a while now.

I run a Wireguard VPN on a VPS, and have machines connect to that VPN. This allows me to reach the machines on the VPN from almost anywhere in the world. Recently I changed the port that Wireguard is listening on to port 443 UDP, which also allows me to connect to my VPN from a few public WLANs that are very restrictive on which ports they allow outbound traffic to.
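A rough sketch of what the server side of that setup could look like, written as a small Python helper that renders a wg-quick style config. The helper name `render_wg_server_config`, the 10.0.0.0/24 addressing, and the key strings are placeholders I made up for illustration; only the config keys themselves (`PrivateKey`, `Address`, `ListenPort`, `PublicKey`, `AllowedIPs`) are real wg-quick settings.

```python
def render_wg_server_config(private_key, address, peers, listen_port=443):
    """Render a wg-quick style server config that listens on UDP `listen_port`."""
    lines = [
        "[Interface]",
        f"PrivateKey = {private_key}",   # server private key (placeholder here)
        f"Address = {address}",          # the server's address inside the VPN
        f"ListenPort = {listen_port}",   # WireGuard is UDP only, so this is UDP 443
    ]
    for peer in peers:
        lines += [
            "",
            "[Peer]",
            f"PublicKey = {peer['public_key']}",
            f"AllowedIPs = {peer['allowed_ips']}",  # VPN address(es) this peer may use
        ]
    return "\n".join(lines) + "\n"


# Hypothetical example values; real keys come from `wg genkey` and `wg pubkey`.
example_config = render_wg_server_config(
    private_key="<server-private-key>",
    address="10.0.0.1/24",
    peers=[
        {"public_key": "<laptop-public-key>", "allowed_ips": "10.0.0.2/32"},
        {"public_key": "<desktop-public-key>", "allowed_ips": "10.0.0.3/32"},
    ],
)
```

Each machine then gets its own config with the VPS as its single peer, and the machines reach each other through their 10.0.0.x addresses instead of forwarded ports.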

Wireguard is super easy to configure and run, and very secure.

Definitely give Wireguard a go. It's open source and awesome.


I think you could set up something like this on the fly too, without root access. I’m not entirely sure, but a while back fly.io published [1] an article talking about how they use wireguard-go [2] to do something similar in user space. I might even try this too…

[1] https://fly.io/blog/ssh-and-user-mode-ip-wireguard/

[2] https://git.zx2c4.com/wireguard-go/about/


there is a fork with port forwarding support https://github.com/rinne/mosh and a PR with a long discussion https://github.com/mobile-shell/mosh/pull/696 on why it's not merged

you can compile them yourself, or if you want to skip that step, I recently set up GitHub Actions to compile Linux binaries of this [1][2]. It's been tested by a sample of 1, so no guarantees it works. I was planning on doing a tap PR/tap of it at some point

the official developers have also been involved in a project to solve this while improving the whole agent approval thing, https://github.com/StanfordSNR/guardian-agent , but I couldn't get it to work, which is why I tried the fork and got that working

[1] https://github.com/gnyman/mosh/actions/runs/1068715036 [2] https://github.com/gnyman/mosh/actions/runs/1068715035


> a PR with a long discussion https://github.com/mobile-shell/mosh/pull/696 on why it's not merged

I'm confused. I read the whole thing but couldn't find the specific reason for why it's not been merged. But I assume it's because of the things that were pointed out in the code review comments?

Also, the issue you linked is about SSH Agent forwarding, not port forwarding.


Yes, you are 100% correct. I mixed up port and agent forwarding. I’ve needed both at different times, and last time it was agent forwarding, so I got confused.

There is another issue for port forwarding https://github.com/mobile-shell/mosh/issues/337 but no PR that I’m aware of.

Regarding why it hasn’t been merged, there is a comment on the port forwarding issue which sums it up quite well I think https://github.com/mobile-shell/mosh/issues/337#issuecomment...

My understanding is that the maintainers prefer doing one thing well (and securely). Which to be honest is something I really appreciate even if it means I might have to figure out some agent and port forwarding workaround :-/ at least I don’t have to worry about if my version of mosh will work with whatever the server runs


Lack of SSH agent forwarding is unfortunately the deal breaker for me..


They measure torque on the wheel from the driver's hand. It is possible to fool this with defeat devices etc. (ex www dot autopilotbuddy dot com).

There is a WIP system that uses the selfie camera to monitor the driver, but it's still possible to fool (an image taped in front, or blocking it with tape, etc.), so it's unlikely to catch all cases of drivers being willfully dangerous. https://twitter.com/greentheonly/status/1379928419136339969


They are working on a camera based solution though it's imperfect. You can see examples of it running at https://twitter.com/greentheonly/status/1379928419136339969

