I think audio models will be much more sensitive to input issues than text or image models. Humans are very good at picking up nuances in audio and also process it very quickly. I wonder how far we are from being able to manipulate the emotional quality of how something sounds. In my opinion, that's the Turing test for any audio generative AI. Native speakers will immediately know when something is AI generated or adjusted, for the same reason they immediately detect accents.
I am curious what kind of audio repair AI models are being worked on to help make outputs sound more natural. This research feels like progress towards that goal as well.
Possibly weird question, but have there been any attempts at modeling this sort of audio model where tokens aren't defined by the audio itself, but instead by the movement of the tongue/mouth/lips/vocal cords, etc.?
I think this is off-topic since it's not discussing the content of the post. But I agree it's worth discussing.
I don't agree. I think ChatGPT may have been part of the editing process, but not the primary draft. The intro is complex in a way I haven't ever noticed in ChatGPT output. I have a suspicion that participating in academic-ish/research discussions pushes you to use very specific language. That matches the goal ChatGPT was trained toward: producing neutral, clinical-sounding answers.
I had some existential concerns about not knowing whether I was just AI, and read similar sentiment from others in some other AI posts. ChatGPT describes its personality using the same words people used to describe my work in the past. There's no meaningful insight to gain from that parallel, other than that it's good at mirroring the sentiment of the user. [1]
I put my own past HN comments into the AI detectors from OpenAI and GPTZero, and I got a few false positives ranging from "possibly" to "completely" written by AI. The comments are from years ago. In case you're curious, I didn't use AI in any part of writing and editing this comment. I'm not using Grammarly or even spellcheck, so excuse any mistakes you find. I've been thinking a lot about the 1995 Ghost in the Shell: "What if a cyber brain could possibly generate its own ghost, create a soul all by itself? And if it did, just what would be the importance of being human then?"
Those pics are out of order. I asked the house question first, then about sharing traits, then about itself, then about fictional characters. I didn't intentionally try to guide its answers.
Dude, even that comment could sound a bit ChatGPT-ish, but as you said, it's because of the tone and style expected when constructing an argument or presenting a topic to a certain group. It's mostly how you flow from one idea to the next, logically and in sequence, without going back and forth, using correct punctuation.
That, and just like with culture: if you interact a lot with a group (and now add similar-sounding LLMs to the mix), you end up absorbing their characteristics, then imitating some without a second thought. That's humans. Sorry if you wanted to be an AI.
It's interesting: when I've asked for style corrections on passages from past college essays or formal letters, ChatGPT sometimes makes a nice improvement and I can learn a thing or two from it, but it always ends up adding formalities and flow modifications that get in the way instead of being precise.
The difference between those thought experiments and the LLMs I'm working with is that I can test LLMs and myself empirically. I don't feel I can tell human and AI writing apart with more than 70% confidence, and as these models get better, I feel that's going to approach chance, especially for writing on topics I'm not familiar with. Since the simulation hypothesis arguments aren't directly testable, they don't feel scientific. I also have a personal problem with the implied existence of a conscious greater power. So it doesn't give me the same existential panic that thinking about the future of AI-generated content does. I never took the time to read the original paper by Bostrom; thank you for the link.
Edit: I should have used the word disprovable instead of testable.
In the same vein, you could argue that you can't empirically test that every proton in your hand has, say, three quarks or whatever, since you'd lose your hand. You might then say that all knowledge about the protons in your particular hand isn't directly testable and therefore doesn't feel scientific. A lot of things aren't directly testable but can still be scientific (as with the simulation argument). BTW, I haven't read the full paper either.
Good catch! The author's name "Heorhii Skovorodnikov" sounds Ukrainian, and he describes himself as a graduate of NYU Abu Dhabi on his GitHub page. It looks like he is not a native English speaker, which might explain the need to use ChatGPT in the first place. Verbose-sounding explanations are becoming kind of a dead giveaway of ChatGPT use in the wild.