
There is a cool website where you can blind judge the outputs from LLaMa 2 vs ChatGPT-3.5: https://llmboxing.com/

Surprisingly, LLaMa 2 won 5-0 for me.



I got the opposite result: ChatGPT-3.5 won 5-0 for me. LLaMa 2 gave longer answers that sometimes strayed from the original question.

They both gave great answers overall though.


Same for me. Interesting..


In a response about the Turing test on this site, LLaMa 2 used the phrase “to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human”, which appears to be copied verbatim from the first sentence of the Wikipedia article on the subject (and from quite a few other pages that turn up on Google). Makes me wonder how many of the responses are just repeating and rephrasing memorized content written by humans, which will of course appear better, while ChatGPT makes more effort to avoid this (and might be able to generalize better to things it hasn’t memorized?).


Thanks for the link!

At least in my examples, the Llama output was more verbose/comprehensive. Sometimes ChatGPT didn't expand enough, sometimes Llama missed the mark entirely (e.g., explaining the Eiffel Tower's architecture).


All the shorter answers were from GPT-3.5. If you like long answers, you pick Llama 2...


Interesting exercise, and Llama won for me, with only one round going to GPT... but it would be VERY easy to cherry-pick these results and select a winner for most people.


It was much closer for me, but Llama 2 did surprisingly well. It looks like a great alternative to ChatGPT 3.5.


Pretty cool. ChatGPT won the first one for me, then Llama 2 won the next five.



