In a response about the Turing test on this site, LLaMa 2 used the phrase “to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human”, which appears to be copied verbatim from the first sentence of the Wikipedia article on the subject (and shows up on quite a few other pages in Google results). Makes me wonder how many of the responses are just repeating and rephrasing memorized content written by humans, which will of course appear better, while ChatGPT makes more effort to avoid this (and might be able to generalize better to things it hasn’t memorized?).
At least in my examples, the Llama output was more verbose/comprehensive. Sometimes ChatGPT didn't expand enough, sometimes Llama missed the mark entirely (e.g. explaining the Eiffel Tower's architecture).
Interesting exercise, and Llama won for me, with GPT taking only 1 answer… but it would be VERY easy to cherry-pick these results and select a winner for most people.
Surprisingly, LLaMa 2 won 5-0 for me.