Hacker News | Vuizur's comments

I think migrating away from Office is not realistic for many companies. They run on Excel, and breaking just one vital spreadsheet because it uses a macro can cost much more than years of Office licenses for many users. The Munich migration was doomed to fail. (Of course, lots of things do work: the open source replacements for mail, Zoom, or Slack are totally fine in my opinion.)


It is not very good at hard tasks; its ranking is much worse there.


Sorry, any examples of hard tasks?


The next question is whether LLMs are actually more sexist than the average human working in HR. I am not so sure...


Evidence is: no.


I also believe this to be the case, but would love something more solid than my own opinion/perception.

Are you aware of any studies showing this?



Can do it with plain old ffmpeg:

https://news.ycombinator.com/item?id=23541424


Grok-2 is ranked second on the LLM arena, so it's basically as good as the best Gemini model. They have already caught up; only the latest ChatGPT model is a tiny bit better.


I once wanted to compile a program that used Xapian on Windows. It was basically impossible for mortals.

Imo people should use cross-platform alternatives.


It will likely be amazing: Sam Altman said that the step from 4 to 5 will be like the one from 3.5 to 4. You can of course doubt him, but we'll see...

I guess it will be this year; someone working at OpenAI already posted "4+1=5" on Twitter, which is suggestive.


Wikimedia is unfortunately becoming one of the worst places to give your money to. They have their closed-source infrastructure, which for years now has not been able to generate HTML dumps without a significant percentage of articles missing.

They have known about the bug for ages, but still...

The WMF doesn't care about its products at all; it gives much more funding to vanity workshops in Africa while utterly ignoring the requests of the Wiktionary community, for example.


There is also PyApp, which I think is really promising. Its docs are not that comprehensive yet and maybe a bit confusing, but the packaged programs usually work out of the box, unlike with PyInstaller.


>executing large-scale changes in entire repositories in 3 years

You can look at SWE-agent: it solved 12 percent of the GitHub issues in its test dataset. It probably depends on your definition of large-scale.

This will get much better. It is a new problem with lots of unexplored details, and we will likely get GPT-5 this year, which according to Altman is supposed to be a jump in performance similar to the one from 3.5 to 4.


This is a laughable definition of large-scale. It's also a misrepresentation of that situation: it was 12% of the issues in a dataset drawn from the repositories of the top 5000 PyPI packages. Further, "solves" is an incredibly generous term, so I'm assuming you didn't read the source or any of the attempts to use this service. Here's one where it deletes half the code and replaces network handling with a comment to handle network handling: https://github.com/TBD54566975/tbdex-example-android/pull/14...

"This will get much better" is the statement I've been hearing for the past year and a half. I heard it 2 years ago about the metaverse. I heard it 3 years ago about DAOs. I heard it 5 years ago about blockchains...

What I do see is a lot more lies. Turns out things are zooming along at the speed of light if you only read headlines from sponsored posts.


> Here's one where it deletes half the code and replaces network handling with a comment to handle network handling

... Wait, that's not one that they considered a _success_, is it? Like, one of the 12%?


We unfortunately have no idea what they consider a success! That's just one of the most recent ones by some random user who wanted to use the program in the real world.

