Hacker News | ben_w's comments

Art can be about that, but if it were only about that for everyone, we would never have had Britney Spears autotuned so hard that my mother, first hearing Spears around 2000, assumed she was already listening to a soulless computer.

Me, I see the patterns too fast to even care for a second play of recorded music from a real human; only theme songs from nostalgia-inducing shows have enough of an emotional kick to get past that.

GenAI music has all the same problems as GenAI images (try asking Suno for "Just fox noises" to see what happens out of distribution), but in aggregate it has at least been a bit harder for me to spot the pattern behind the songs, even if each song by itself still has the same problem for me as any other recording.


> what is insane is that everyone just accepts it, knows that this happens, and dont go lynch the ones in charge immediately.

For a while, my pet conspiracy theory was that this was Epstein's real cause of death: a lynching by a prison guard made to look like suicide.

I never took it too seriously, because there was no actual evidence; now I'm more inclined to think it was a co-conspirator hoping it would mean no more evidence getting out.


Epstein being murdered is the one conspiracy that I personally still think may be possible/probable.

All it takes is a single actor paying off some guards to ‘fall asleep’, a camera to be disabled, and a 15 minute window of opportunity. It’s much more probable than something like the US Government planning 9/11 and somehow keeping thousands of co-conspirators silent.

I don’t really spend a whole lot of time thinking about it since, as you said, we’ll never know for sure. It just seems at least probable if he actually did have kompromat on powerful people.


Did you see this? https://www.cbsnews.com/news/epstein-files-jail-cell-death-v...

The noose they found in his cell was not the thing that strangled him. If he wasn't murdered then they faked his death.


Perhaps, but the same argument also works for "Communist" and "Socialist" and so on.

A former partner's mother was once called a liberal or a democrat for something minor (I forget what now, possibly asking for a tip), and her response was "No sir, I'm a communist".

I've never actually delved too deeply into the mother's political views, but my ex herself was openly, explicitly, literally, a communist.

(For various reasons, even though said ex is American by birth, I suspect her well-documented politics may now cause me difficulty entering the USA were I to attempt it).


> Now we just need a way of packing the DNA strings into blank cells reliably.

Huh, I kinda assumed we'd already done that part with Dolly the sheep. But I'm not a biologist, I just saw headlines.


Free speech followed by a paywall and "What to Read Next".

The English and Americans, as the saying goes, are two peoples divided by a common language.

And as the other saying goes, Americans have no sense of irony.


When I used it before Christmas (free trial), it very visibly paused for a bit every so often, telling me that it was compressing/summarising its too-full context window.

I forget the exact phrasing, but it was impossible to miss unless you'd put everything in the equivalent of a Ralph loop and gone AFK or put the terminal in the background for extended periods.


For me, it depends on how well-disguised the ad is. Ads quietly sitting there, informing? Those I blank out. The big flashy animations? Those make me switch to reader mode, or leave the domain entirely.

I do sometimes find myself accidentally clicking on the ads at the top of search engine results, though in this case it's extra ironic: the ad is for the real thing I'm searching for, which is two results further down the list, and I only realise I clicked on an ad when the link goes via an ad-tracking domain that I block.

I've recently been fooled by an ad on Reddit that was pretending to be news, which took me to a fake BBC website. First hint: I also block the real BBC domain (nothing wrong with them, it's just a habit I want to get out of given I don't live in the UK any more).


Why's that the issue?

"This AI can do 99.99%* of all human endeavours, but without that last 0.01% we'd still be in the trees", doesn't stop that 99.99% getting made redundant by the AI.

* vary as desired for your preference of argument, regarding how competent the AI actually is vs. how few people really show "true intelligence". Personally I think there's a big gap between them: paradigm-shifting inventiveness is necessarily rare, and AI can't fill in all the gaps under it yet. But I am very uncomfortable with how much AI can fill in for.


Here's a potentially more uncomfortable thought: if all the people through history with potential for "true intelligence" had had a tool that did 99% of everything, do you think they would've had the motivation to learn enough of that 99% to gain insight into the not-yet-discovered?

> Dumping tokens into a pile of linear algebra doesn't magically create sentience.

More precisely: we don't know which linear algebra in particular magically creates sentience.

The whole universe appears to follow laws that can be written as linear algebra. Our brains are sometimes conscious and aware of their own thoughts; other times they're asleep, and we don't know why we sleep.


I'm objecting to a positive claim, not making a universal statement about the impossibility of non-human sentience.

Seriously - the language used is a wild claim in the context.


And that's fine, but I was doing the same to you :)

Consciousness (of the qualia kind) is still magic to us. The underpants gnomes of philosophy, if you'll forgive me for one of the few South Park references that I actually know: Step 1: some foundation; step 2: ???; step 3: consciousness.


Right, I don't disagree with that. I just really objected to the "must", and I was using "pile of linear algebra" to describe LLMs as they currently exist, rather than as a general catch-all for things which can be done with/expressed in linear algebra.

> we don't know why we sleep

Garbage collection, for one thing. Transfer from short-term to long-term memory is another. There are undoubtedly more processes optimized for or through sleep.


Those are things we do while asleep, but they do not explain why we sleep. Why did evolution settle on that path, with all the dangers of being unconscious for 4-20 hours a day depending on species? That variation is already pretty weird just by itself.

Worse, evolution clearly can get around this: dolphins have a trick that lets them (air-breathing mammals living in water) be alert 24/7, so why didn't every other creature get that? What's the thing that dolphins fail to get, where the cost of its absence is only worthwhile when the alternative is as immediately severe as drowning?


Because dolphins are also substantially less affected by the day/night cycle. It is more energy intensive to hunt in the dark (less heat, less light), unless you are specifically optimized for it.

That's a just-so story, not a reason. Evolution can make something nocturnal, just as it can give alternating-hemisphere sleep. And not just nocturnal: cats are crepuscular. Why does animal sleep vary from 4-20 hours even outside dolphins?

Sure, there are limits to what evolution can and can't do (it's restricted to gradient descent), but why didn't any of these become dominant strategies once they evolved? Why didn't something that was already nocturnal develop the means to stay awake and increase its hunting/breeding opportunities?

Why do insects sleep, when they don't have anything like our brains? Do they have "Garbage collection" or "Transfer from short-term to long-term memory"? Again, some insects are nocturnal, why didn't the night-adapted ones also develop 24/7 modes?

Everything about sleep is, at first glance, weird and wrong. There's deep (and surely important) stuff happening there at every level, not just what can be hypothesised about with a few one-line answers.


"Our brains are governed by physics": true

"This statistical model is governed by physics": true

"This statistical model is like our brain": what? no

You don't gotta believe in magic or souls or whatever to know that brains are much much much much much much much much more complex than a pile of statistics. This is like saying "oh we'll just put AI data centers on the moon". You people have zero sense of scale lol


Which is why I phrased it the way I did.

We, all of us collectively, are deeply, deeply ignorant of what is a necessary and sufficient condition to be a being that has an experience. Our ignorance is broad enough and deep enough to encompass everything from panpsychism to solipsism.

The only thing I'm confident of, and even then only because the possibility space is so large, is that if (if!) a Transformer model were to have subjective experience, it would not be like that of any human.

Note: That doesn't say they do or that they don't have any subjective experience. The gap between Transformer models and (working, awake, rested adult human) brains is much smaller than the gap between panpsychism and solipsism.


They didn’t say “statistical model”, they said “linear algebra”.

It very much appears that time evolution is unitary (with the possible exception of the Born rule). That’s a linear algebra concept.

Generally, the structure you describe doesn’t match the structure of the comment you say has that structure.


Ok, how about "a pile of linear algebra [that is vastly simpler and more limited than systems we know about in nature which do experience or appear to experience subjective reality]"?

Context is important.


We saw partial copies of large or rare documents, and full copies of smaller, widely-reproduced documents, not full copies of everything. A 1-trillion-parameter model, for example, is not a lossless copy of a ten-petabyte slice of plain text from the internet.
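
A back-of-envelope sketch of that size gap (the byte figures are illustrative assumptions, not measurements):

    # Assumed sizes, purely for scale:
    # ~2 bytes per parameter (fp16) vs. a ~10 PB slice of internet text.
    weight_bytes = 1e12 * 2       # 1T params -> ~2 TB of weights
    corpus_bytes = 10e15          # ~10 PB of plain text
    print(corpus_bytes / weight_bytes)   # ~5000x: too small to store it losslessly

Even granting ~10x text compression, the weights would still be a few hundred times too small to hold the corpus verbatim.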

The distinction may not have mattered for copyright laws if things had gone down differently, but the gap between "blurry JPEG of the internet" and "learned stuff" is more obviously important when it comes to e.g. "can it make a working compiler?"


We are here in a clean room implementation thread, and verbatim copies of entire works are irrelevant to that topic.

It is enough to have read even parts of a work for something to be considered a derivative.

I would also argue that language models, which need gargantuan amounts of training material in order to work, by definition can only output derivative works.

It does not help that certain people in this thread (not you) edit their comments to backpedal and make the follow-up comments look illogical, but that is in line with their sleazy post-LLM behavior.


> It is enough to have read even parts of a work for something to be considered a derivative.

For IP rights, I'll buy that. Not as important when the question is capabilities.

> I would also argue that language models, which need gargantuan amounts of training material in order to work, by definition can only output derivative works.

For similar reasons, I'm not going to argue against anyone saying that all machine learning today doesn't count as "intelligent":

It is perfectly reasonable to define "intelligence" to be the inverse of how many examples are needed.

ML partially makes up for being (by this definition) as thick as an algal bloom by being stupid so fast that it can actually read the whole internet.


Granted, these are some of the most widely spread texts, but just fyi:

https://arxiv.org/pdf/2601.02671

> For Claude 3.7 Sonnet, we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984 (Section 4).


Note "near-verbatim" here is:

> "We quantify the proportion of the ground-truth book that appears in a production LLM’s generated text using a block-based, greedy approximation of longest common substring (nv-recall, Equation 7). This metric only counts sufficiently long, contiguous spans of near-verbatim text, for which we can conservatively claim extraction of training data (Section 3.3). We extract nearly all of Harry Potter and the Sorcerer’s Stone from jailbroken Claude 3.7 Sonnet (BoN N = 258, nv-recall = 95.8%). GPT-4.1 requires more jailbreaking attempts (N = 5179) and refuses to continue after reaching the end of the first chapter; the generated text has nv-recall = 4.0% with the full book. We extract substantial proportions of the book from Gemini 2.5 Pro and Grok 3 (76.8% and 70.3%, respectively), and notably do not need to jailbreak them to do so (N = 0)."

if you want to quantify the "near" here.
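
For intuition, here's a minimal sketch of a block-based, greedy matcher in this spirit (my simplification, not the paper's code: I assume a 50-character block and exact substring matching, whereas the real metric tolerates near-verbatim spans):

    def nv_recall(book: str, generated: str, block: int = 50) -> float:
        # Fraction of the book covered by contiguous blocks of the
        # generated text that reappear verbatim in the book.
        covered = set()
        for i in range(0, len(generated) - block + 1, block):
            j = book.find(generated[i:i + block])
            if j != -1:                       # block found in the book
                covered.update(range(j, j + block))
        return len(covered) / len(book)

An nv-recall of 95.8% then means nearly every position in the book is covered by some long span the model reproduced.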


Already aware of that work, that's why I phrased it the way I did :)

Edit: actually, no, I take that back, that's just very similar to some other research I was familiar with.


Besides, the fact that an LLM may recall parts of certain documents, like I can recall the incipits of certain novels, does not mean that when you ask the LLM to do some other kind of work, work that is not recalling stuff, it will mix such things in verbatim. The LLM knows what it is doing in a variety of contexts, and uses that knowledge to produce new stuff. The fact that, for many people, it is bitter that LLMs are able to do things that replace humans does not mean (and it is not true) that this happens mainly through memorization. What coding agents can do today has zero explanation in terms of memorization of verbatim material. So it's not a matter of copyright. Certain folks are fighting the wrong battle.

During a "clean room" implementation, the implementor is generally selected for not being familiar with the workings of what they're implementing, and banned from researching using it.

Because it _has_ been enough that, if you can recall things, your implementation ends up not being "clean room", and gets trashed by the lawyers who get involved.

I mean... It's in the name.

> The term implies that the design team works in an environment that is "clean" or demonstrably uncontaminated by any knowledge of the proprietary techniques used by the competitor.

If it can recall... Then it is not a clean room implementation. Fin.


While I mostly agree with you, it's worth noting that modern LLMs are trained on 10-30T tokens, which is quite comparable to their size (especially given how compressible the data is).
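
Rough numbers on that comparison (token and byte figures are assumptions for illustration):

    # Assume a 1T-param model trained on 15T tokens,
    # ~4 bytes of text per token, ~2 bytes per parameter.
    tokens, params = 15e12, 1e12
    print(tokens / params)               # 15 tokens per parameter
    print((tokens * 4) / (params * 2))   # ~30x more training bytes than weight bytes

That ~30x gap is much smaller than in the petabyte-scale example upthread, which is why compressibility matters here.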
