I think migrating away from Office is not realistic for many companies. They run on Excel, and breaking just one vital spreadsheet because it relies on a macro can cost far more than years of Office licenses for many users. The Munich story was doomed to fail.
(Of course there are lots of things that work, the open source mail or Zoom or Slack replacements are totally fine in my opinion.)
Grok-2 is rank 2 on LLM arena; it's basically as good as the best Gemini model. They have already caught up. Only the latest ChatGPT model is a tiny bit better.
It will likely be amazing. Sam Altman said that the step between 4 and 5 will be like the one between 3.5 and 4. You can of course doubt him, but we'll see...
I guess it will be this year, some guy working at OpenAI already posted "4+1=5" on Twitter, which is suggestive.
Wikimedia is unfortunately becoming one of the worst places to give your money to. They have their closed-source infrastructure, which for years now has been unable to generate HTML dumps without a significant percentage of articles missing.
They have known about the bug for ages, but still...
WMF don't care about their products at all. They give much more funding to vanity workshops in Africa while utterly ignoring the requests of the Wiktionary community, for example.
There is also PyApp, which I think is really promising. The docs are not that comprehensive yet and maybe a bit confusing, but the packaged programs usually work out of the box, unlike with PyInstaller.
>executing large-scale changes in entire repositories in 3 years
You can look at SWE-Agent: it solved 12 percent of the GitHub issues in its test dataset. It probably depends on your definition of large-scale.
This will get much better. It is a new problem with lots of unexplored details, and we will likely get GPT-5 this year, which according to Altman is supposed to be a jump in performance similar to the one from 3.5 to 4.
This is a laughable definition of large-scale. It's also a misrepresentation of that situation: it was 12% of issues in a dataset drawn from the repositories of the top 5000 PyPI packages. Further, "solves" is an incredibly generous definition, so I'm assuming you didn't read the source or any of the attempts to use this service. Here's one where it deletes half the code and replaces network handling with a comment to handle network handling: https://github.com/TBD54566975/tbdex-example-android/pull/14...
"This will get much better" is the statement I've been hearing for the past year and a half. I heard it 2 years ago about the metaverse. I heard it 3 years ago about DAOs. I heard it 5 years ago about blockchains...
What I do see is a lot more lies. Turns out things are zooming along at the speed of light if you only read headlines from sponsored posts.
We unfortunately have no idea what they consider a success! That's just one of the most recent ones by some random user who wanted to use the program in the real world.