
Vuizur · 2025-10-13T07:40:59

I think migrating away from Office is not realistic for many companies. They run on Excel, and breaking just one vital spreadsheet because it uses a macro can cost much more than licensing the entire Office suite for many users over several years. The Munich story was doomed to fail. (Of course, lots of things do work: the open-source replacements for mail, Zoom, or Slack are totally fine in my opinion.)

Vuizur · 2025-05-20T19:24:22

It is not very good at hard tasks; its ranking is much worse there.

moneywoes · 2025-05-21T14:38:36

Sorry, any examples of hard tasks?

Vuizur · 2025-05-20T12:52:47

The next question is whether LLMs are actually more sexist than the average human working in HR. I am not so sure...

mpweiher · 2025-05-20T14:43:37

Evidence is: no.

Ancapistani · 2025-05-20T18:27:19

I also believe this to be the case, but would love something more solid than my own opinion/perception.

Are you aware of any studies showing this?

Vuizur · on Oct 7, 2024

No, for this you need https://github.com/rmcrackan/Libation

kranner · on Oct 7, 2024

Can do it with plain old ffmpeg:

https://news.ycombinator.com/item?id=23541424

Vuizur · on Sept 5, 2024

Grok-2 is ranked 2nd on the LLM arena; it's basically as good as the best Gemini model. They have already caught up. Only the latest ChatGPT model is a tiny bit better.

Vuizur · on Aug 18, 2024

I once wanted to compile a program that used Xapian on Windows. It was basically impossible for mortals.

IMO, people should use cross-platform alternatives.

Vuizur · on June 5, 2024

It will likely be amazing: Sam Altman said that the step from 4 to 5 will be like the one from 3.5 to 4. You can of course doubt him, but we'll see...

I guess it will be this year; someone working at OpenAI already posted "4+1=5" on Twitter, which is suggestive.

Vuizur · on May 24, 2024

Wikimedia is unfortunately becoming one of the worst places to give your money to. They have their closed-source infrastructure, which for years now has not been able to generate HTML dumps without a significant percentage of articles missing.

They have known about the bug for ages, but still...

The WMF doesn't care about its products at all; it gives much more funding to vanity workshops in Africa while utterly ignoring the requests of the Wiktionary community, for example.

Vuizur · on May 24, 2024

There is also PyApp, which I think is really promising. Its docs are not very comprehensive yet and may be a bit confusing, but the packaged programs usually work out of the box, unlike with PyInstaller.

Vuizur · on April 29, 2024

>executing large-scale changes in entire repositories in 3 years

You can look at SWE-agent; it solved 12 percent of the GitHub issues in their test dataset. It probably depends on your definition of large-scale.

This will get much better: it is a new problem with lots of unexplored details, and we will likely get GPT-5 this year, which according to Altman is supposed to be a jump in performance similar to the one from 3.5 to 4.

krainboltgreene · on April 29, 2024

This is a laughable definition of large-scale. It's also a misrepresentation of the situation: it was 12% of issues in a dataset drawn from the repositories of the top 5,000 PyPI packages. Further, "solves" is an incredibly generous term, so I'm assuming you didn't read the source or any of the attempts to use this service. Here's one where it deletes half the code and replaces network handling with a comment to handle network handling: https://github.com/TBD54566975/tbdex-example-android/pull/14...

"This will get much better" is the statement I've been hearing for the past year and a half. I heard it 2 years ago about the metaverse. I heard it 3 years ago about DAOs. I heard it 5 years ago about blockchains...

What I do see is a lot more lies. Turns out things are zooming along at the speed of light if you only read headlines from sponsored posts.

rsynnott · on April 30, 2024

> Here's one where it deletes half the code and replaces network handling with a comment to handle network handling

... Wait, that's not one that they considered a _success_, is it? Like, one of the 12%?

krainboltgreene · on April 30, 2024

We unfortunately have no idea what they consider a success! That's just one of the most recent ones by some random user who wanted to use the program in the real world.