Hacker News | kamranjon's comments

I haven't quite figured out whether the open weights they released on Hugging Face are enough to run the (realtime) model locally. I hope so, though! For the larger model with diarization, I don't think they open sourced anything.

The HF page suggests yes, with vLLM.

> We've worked hand-in-hand with the vLLM team to have production-grade support for Voxtral Mini 4B Realtime 2602 with vLLM. Special thanks goes out to Joshua Deng, Yu Luo, Chen Zhang, Nick Hill, Nicolò Lucchesi, Roger Wang, and Cyrus Leung for the amazing work and help on building a production-ready audio streaming and realtime system in vLLM.

https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-26...

https://docs.vllm.ai/en/latest/serving/openai_compatible_ser...


I think it's driven by an understanding that being constantly connected is potentially unhealthy for one's mental health. Not saying that's 100% true, but I think enough people feel it that they are looking for solutions in things like dumb phones, digital detox, purpose-built devices, etc.

I understand your lament about people just being able to be bored, but I think this gets at something deeper, that when we aren't bored and do want to do something, we are often distracted because of the nature of smart phones.


Since switching to a dumb phone I've gone down the same path. I already was an avid photographer, but I've added a typewriter and a nice pen/notepad, which has gotten me writing again. I've also read more books than at any other time in my life. It's really incredible how much time I was wasting.

Which dumb phone? I would like to try similar, but there are always certain Android apps I need (banking apps like BankID, payment apps). Do you get a "pure" dumb phone and then a separate Android device for when it's required, or a dumb phone which does a little bit of Androiding?

I got a Mudita Kompakt for similar reasons. I can sideload the few apps that I really need. The one I use the most is Kiwix, as I've downloaded the entirety of English Wikipedia, so I can basically look anything up (useful when reading) without requiring an internet connection. I also have Signal on there. I really like it and it's my dedicated phone now.

I was happy to discover that BankID, Swish and Handelsbanken are all fine with rooted Android devices. Hence it's possible to get a degoogled phone and delete all apps that aren't required, such as a web browser and the Play Store.

Get a Mudita Kompakt. You won't regret it!

I think one of the values of (what appears to be) AI generated projects like this is that they can make me aware of the underlying technology that I might not have heard about - for example WebTorrent: https://webtorrent.io/faq

Pretty cool! Not sure what this offers over WebTorrent itself, but I was happy to learn about its existence.


Asahi is one of the projects I support monetarily because I really hope that one day I can run Linux natively on my M4 Max with GPU acceleration. They did an amazing job with M1 and M2. It's great to see they are still pushing forward after the departure of Alyssa Rosenzweig, who did a lot of the work on the GPU support for those.

Edit: Here is their donation page if you're interested in chipping in as well: https://opencollective.com/asahilinux


It is worth noting the distinction between display acceleration and compute support here. While the desktop rendering is impressive, for local AI or LLM inference the Linux stack on M-series is still significantly behind Metal/MPS on macOS. I tried to switch my local dev environment over recently but without a mature compute stack it is hard to justify leaving macOS if you need to run models locally.

of course, that's only relevant if you do intend to run models locally. which, up to very recently, would have been roughly 0% of mac users.

While the M series hardware is impressive and the Asahi project is doing miracles, I myself don't want to support Apple in any way, including buying any of their hardware.

They are also doing a lot of generic work that benefits the ARM platform as a whole. And since Snapdragon X is a fucking mess on Linux, these Apple Silicon devices are actually some of the best cheap hardware you can buy with excellent performance.

You can always get it second hand

While that does support them less, it still drives up the value of their hardware and thus the amount of money others are willing to give Apple for it.

Saving a computer from a landfill is not driving up Apple’s margins.

I wonder if it was trained on anime dubs, because all of the examples I listened to sounded very similar to a Miyazaki-style dub.


Scroll down to the second to last group: the second one down is Obama speaking English, the third one down is Trump speaking Japanese (a translation of the English phrase).

Besides, they know which side their bread is buttered on. I feel like this is almost not the real announcement; or the engineers who wrote this up and did the demos just ran it that way. The normal speech voices are fine (lower than the anime ones on the page). I agree that the first few are very infantile. I'll change that word if I can think of a better one.


I read the release but didn't quite understand the difference between a next-edit model and a FIM model. Does anyone have a clear explanation of when to use one over the other? I'd love it if there were a Sublime Text plugin to utilize this model and try it out; I might see if I can figure that out.


I was curious as well and wanted to see how this works, so I asked Claude to create a plugin for that. It uses the built-in autocomplete behavior. If you want to give it a try, feel free to have a look here: https://github.com/lumnn/AItoComplete (I have not pushed it to Package Control yet)


I’m going to speculate a bit here, FIM may stand for something-in-the-middle?

I know there are the original autocomplete models that simply complete the endings. Then there are Cursor-like models capable of editing/filling text between blocks of code. In essence, they look at both the text before the insertion point and the text after it, then find the best-fitting completion in the middle. My guess is FIM is the latter.


As you said. Fill-in-the-middle.
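
To make that concrete, here is a rough sketch of how a fill-in-the-middle request is typically assembled: the text before the cursor and the text after it are both handed to the model, and the model is asked to produce only the missing middle. The sentinel strings and the function names below are placeholders I made up for illustration; each FIM-capable model defines its own special tokens, so check the model card before relying on any of this.

```python
# Placeholder sentinels (assumption): real models define their own FIM tokens.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(text: str, cursor: int) -> str:
    """Split the buffer at the cursor and lay it out as prefix, suffix, middle.

    The model is expected to continue after the middle sentinel with the
    text that belongs at the cursor position.
    """
    prefix, suffix = text[:cursor], text[cursor:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def splice_completion(text: str, cursor: int, completion: str) -> str:
    """Insert the model's completion back into the buffer at the cursor."""
    return text[:cursor] + completion + text[cursor:]


if __name__ == "__main__":
    source = "def add(a, b):\n    return \n"
    cursor = source.index("return ") + len("return ")
    print(build_fim_prompt(source, cursor))
    # "a + b" stands in for whatever a model would actually return.
    print(splice_completion(source, cursor, "a + b"))
```

A plain autocomplete model would only ever see the prefix; the difference with FIM is that the suffix is part of the prompt too.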


We have an explanation here: https://blog.sweep.dev/posts/next-edit-jetbrains#next-edit-a...

But basically, it suggests changes away from your cursor position.


I have an Olivetti Lettera 22 typewriter and it's the perfect machine, just immaculately designed. But one thing that absolutely floors me, and I have no idea how they did it, is that it has infinite, programmable tab stops on a completely manual machine (no electricity). So you can set as many tab stops as you want, and then hit the tab key and it will jump between all of the stops in order. It's great for creating lists of things or for creating simple tables. How the machine is able to remember your settings, and allow you to jump between and clear your tab stops completely mechanically, is just so cool to me and seems like a marvel of engineering.


That's a pretty crazy requirement for something to be "useful", especially something that runs so efficiently on CPU. Many content creators from non-English-speaking countries can benefit from this type of release by translating transcripts of their content to English and then running them through a model like this to dub their videos in a language that can reach many more people.


You mean youtubers? And have to (manually) synchronise the text to their video, and especially when youtube apparently offers voice-voice translation out of the box to my and many others' annoyance?


YouTube's voice to voice is absolutely horrible though. Having the ability for the youtubers to clone their own voice would make it much, much more appealing.


Uh, no? This is not at all an absurd requirement? Screen readers literally do this all the time, with voices that are the classic way of making a speech synthesizer, no AI required. eSpeak is an example, or MS OneCore. The NVDA screen reader has an option for automatic language switching, as does pretty much every other modern screen reader in existence. And absolutely none of these use AI models to do that switching, either.


They didn’t say it was a crazy requirement. They said it was crazy to consider it useless without meeting that requirement.


That doesn't really change what I said though. It isn't crazy to call it useless without some form of ALS either. Given that old school synthesis has been able to do it for like 20 years or so.


How does state of the art matter when talking about usefulness? Is old school synthesis useless?


No? But is it unreasonable to expect "state of the art" TTS to be able to do at least what old school synthesis is capable of doing? Being "state of the art" means being the highest level of development or achievement in a particular field, device, procedure, or technique at a specific point in time. I don't think it's therefore unreasonable to expect supposed "state of the art" text-to-speech synthesis to do far better at everything old-school TTS could do and then some.


> Being "state of the art" means being the highest level of development or achievement in a particular field, device, procedure, or technique at a specific point in time. I don't think it's therefore unreasonable to expect supposed "state of the art" text-to-speech synthesis to do far better at everything old-school TTS could do and then some.

Non sequitur. Unless the 'art' in question is the 'art of adding features', this phrase is usually used to describe the quality of a very specific development. These are often not even feature-complete products.


You posted the code to a public blog page, with no attribution in the code or request of attribution from others, no license, and seemingly intended to share it freely with the world.

Then you got an apology, and a second apology.

I'm confused about what you think you're owed?

The explanation makes perfect sense, the headers were obviously just copied with no malicious intent. What is it that is still bothering you about this?


> no license, and seemingly intended to share it freely with the world

No license means you don’t intend to share it “freely”, since you didn’t share any rights. By default, you don’t own things people shared on the internet just because it’s there.

That being said I’ve even seen people with licenses in their repos who get mad when people used their code, there’s just no telling and it’s best to just treat random sources of code as anathema.


Per Eli's own comment here, the original copied code was straight up public domain and thus does not even require attribution.

https://github.com/Modernizr/Modernizr/pull/684#issuecomment...


Correct. He did not commit copyright infringement. Just plagiarism.


I'm curious if you would have the same opinion about code shared on stack overflow?


I think GP is referring to the fact that an author’s work is copyright protected by default, and a license is needed to permit others to use freely [1]. StackOverflow posts are licensed under CC BY-SA 4.0 [2].

[1]: https://www.copyright.gov/help/faq/faq-general.html

[2]: https://stackoverflow.com/help/licensing

(Disclaimer: Just commenting on GP’s statement about “no license”, not on the specific disagreement or apology mentioned above which I am unfamiliar with.)


It's worth noting that the code in question was also open sourced and permissively licensed by the original author as he stated in the thread[1]. I guess this isn't really about licensing at all, just the original author seems to think it was rude, and also doesn't want to accept any of the apologies that have been offered.

[1]: https://github.com/Modernizr/Modernizr/pull/684#issuecomment...


> with no attribution in the code or request of attribution from others, no license, and seemingly intended to share it freely with the world

The bottom of every page on my blog has a copyright link that you can follow. I dedicated the code to the public domain. I never made a copyright claim. I simply asked Addy not to claim authorship of the code.

