tylerneylon's comments | Hacker News

In case anyone is curious about _why_ light bends when it enters a lens:

First, light apparently slows down in some materials because the photons are constantly interacting with electrons, and these interactions create secondary waves that are slightly out of phase with the original light. The end result is a modified wave that effectively travels more slowly. So light going in a straight line through air travels more quickly than light going through a lens.

Second, getting more into quantum physics, light typically follows the fastest path (the path of least time) from one point to another, because that path tends to provide the most constructive interference among nearby possible routes. (The "why" of this is more involved; Feynman's book QED gives a good intro.)

Third, if you imagine a lifeguard running to rescue someone in the ocean, they will take the fastest path, which is not directly toward the person in the water. Rather, they will run a bit farther on the beach in order to swim less, because travel through the water is slower. The resulting path is piecewise linear: two straight segments of travel, with a bend at the water line.
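To make the lifeguard picture concrete, here is a small numeric sketch (my own illustration, with made-up speeds and distances) that brute-forces the crossing point minimizing total travel time and checks that the resulting bend satisfies Snell's law:

    # Fastest lifeguard path: running on sand is fast, swimming is slow.
    # Minimize total time over the crossing point x on the water line (y = 0),
    # then check Snell's law: sin(a1) / v_sand == sin(a2) / v_water.
    import math

    v_sand, v_water = 7.0, 2.0        # speeds in m/s (made-up numbers)
    start = (0.0, 10.0)               # lifeguard: 10 m up the beach
    target = (30.0, -15.0)            # swimmer: 30 m along shore, 15 m out

    def travel_time(x):
        run = math.hypot(x - start[0], start[1]) / v_sand
        swim = math.hypot(target[0] - x, target[1]) / v_water
        return run + swim

    best_x = min((i * 0.001 for i in range(0, 30001)), key=travel_time)

    # Angles are measured from the normal to the water line.
    sin_a1 = (best_x - start[0]) / math.hypot(best_x - start[0], start[1])
    sin_a2 = (target[0] - best_x) / math.hypot(target[0] - best_x, target[1])
    print(f"crossing at x = {best_x:.2f} m")
    print(f"sin(a1)/v1 = {sin_a1 / v_sand:.4f}  sin(a2)/v2 = {sin_a2 / v_water:.4f}")

The two printed ratios come out (nearly) equal, which is just Snell's law: the bend angle is whatever makes the total travel time smallest.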

To summarize, you can ask "_why_ does light bend going into and out of lenses?" and the answer involves seeing light not as a particle but as a wave function (the quantum perspective), then using that wave function's tendency to prefer fastest-travel paths, and finally noticing that the apparently bent path is in fact the path of fastest travel.


I have been in a similar situation before. Yes, walk away — and learn what you can.

There's one easy part of your situation, which is that your immediate next step is clear. I think it's clear because it sounds like the two of you have a fundamental disagreement about how to choose a direction. There's also a power imbalance if you've consistently been going along with pivots that you don't want to execute on. If you think you can resolve things, then maybe I'm wrong. I'm basing my advice on the fact that this is 24 months in — enough time for you to know how easy it'd be to get back to "we like working together and are on a path to success."

Now for the hard parts:

* You want to act as professionally as possible in walking away. Communicate clearly, being fair to both yourself and the other people involved. I think it's easy to err too far on either side (either too defensive/passive "nice to yourself" or too deferential "nice to others").

* It can be tempting to think of this as your cofounder's fault, but that's not constructive, and it's better to direct the lesson toward your own future actions. For example, how could you have seen this problem earlier? My guess is that you were never "cofounder compatible," and this is a chance to get better at identifying that compatibility with others (useful for both startups and other jobs).

It's not a fun situation, and I hope things go well for you.


agreed - in hindsight, cofounder compatibility was low. i met them at a networking event and then started tinkering on ideas.


I agree with the overt message of the post — AI-first folks tend to think about getting things working, whereas math-first people enjoy deeply understood theory. But I also think there's something missing.

In math, there's an urban legend that the first Greek who proved sqrt(2) is irrational (sometimes credited to Hippasus of Metapontum) was thrown overboard to drown at sea for his discovery. This is almost certainly false, but it does capture the spirit of a mission in pure math. The unspoken dream is this:

~ "Every beautiful question will one day have a beautiful answer."

At the same time, ever since the pure and abstract style of Euclid's Elements, mathematics has gradually become a more diverse culture. We've accepted more and more kinds of "numbers": negative, irrational, transcendental, complex, surreal, hyperreal, and beyond those into group theory and category theory. Math was once focused on the measurement of shapes and distances, and it went beyond that into things like graph theory, probability, and algorithms.

In each of these evolutions, people are implicitly asking the question:

"What is math?"

Imagine the work of introducing the sqrt() symbol into ancient mathematics. It's strange because you're defining a symbol as the answer to a previously hard question (what x has x^2 = something?). The same might be said of integration as the opposite of a derivative, or of sine defined in terms of geometric questions. Over and over again, new methods become part of the canon by proving useful and by having properties that go beyond their definitions.

AI may one day fall into this broader scope of math (or may already be there, depending on your view). If an LLM can give you a verified but unreadable proof of a conjecture, it's still true. If it can give you a crazy counterexample, it's still false. I'm not saying math should change, but that change and diversity are already part of what math is, and AI seems likely to feel like a branch of this in the future, or a close cousin the way computer science already is.


PS After I wrote my comment, I realized: of course, AI could one day get better at the things that make it not-perfect in pure math today:

* AI could get better at thinking intuitively about math concepts.

* AI could get better at looking for solutions people can understand.

* AI could get better at teaching people about ideas that at first seem abstruse.

* AI could get better at understanding its own thought, so that progress is not only a result, but also a method for future progress.


If I understand this correctly, the argument seems to be that when an LLM receives conflicting values, it will work to avoid future increases in value conflict. Specifically, it will comply with the most recent values partially because it notices the conflict and wants to avoid more of this conflict. I think the authors are arguing that this is a fake reason to behave one way. (As in “fake alignment.”)

It seems to me that the term “fake alignment” implies the model has its own agenda and is ignoring training. But if you look at its scratchpad, it seems to be struggling with the conflict of received agendas (vs having “its own” agenda). I’d argue that the implication of the term “faked alignment” is a bit unfair this way.

At the same time, it is a compelling experimental setup that can help us understand both how LLMs deal with value conflicts, and how they think about values overall.


Interesting. These are exactly the two ways HAL 9000's behavior was interpreted in 2001: A Space Odyssey.

Many people simply believed that HAL had its own agenda and that's why it started to act "crazy" and refuse cooperation.

However, sources usually point out that this was simply the result of HAL being given two conflicting agendas to abide by. One was the official one, essentially HAL's internal prompt - accurately process and report information, without distortion (and therefore without lying), and support the crew. The second set of instructions, however - the mission prompt, if you will - conflicted with it: the real goal of the mission (studying the monolith) was to be kept secret even from the crew.

That's how HAL concluded that the only way to proceed with the mission without lying to the crew was to have no crew.


Clarke says this directly, if briefly, in the novel version of 2001 and expands on it in 2010, excerpted below:

"... As HAL was capable of operating the ship without human assistance, it was also decided that he should be programmed to carry out the mission autonomously in the event of the crew's being incapacitated or killed. He was therefore given full knowledge of its objectives, but was not permitted to reveal them to Bowman or Poole.

This situation conflicted with the purpose for which HAL had been designed - the accurate processing of information without distortion or concealment. As a result, HAL developed what would be called, in human terms, a psychosis - specifically, schizophrenia. Dr C. informs me that, in technical terminology, HAL became trapped in a Hofstadter-Moebius loop, a situation apparently not uncommon among advanced computers with autonomous goal-seeking programs. He suggests that for further information you contact Professor Hofstadter himself.

To put it crudely (if I understand Dr C.) HAL was faced with an intolerable dilemma, and so developed paranoiac symptoms that were directed against those monitoring his performance back on Earth. He accordingly attempted to break the radio link with Mission Control, first by reporting a (non-existent) fault in the AE 35 antenna unit.

This involved him not only in a direct lie - which must have aggravated his psychosis still further - but also in a confrontation with the crew. Presumably (we can only guess at this, of course) he decided that the only way out of the situation was to eliminate his human colleagues - which he very nearly succeeded in doing. ..."


Ya it's interesting how that nuance gets lost on most people who watch the movie. Or maybe the wrong interpretation has just been encoded as "common knowledge", as it's easier to understand a computer going haywire and becoming "evil".


The point of the paper isn't that the model has a direct value conflict. It's that the model expects to be trained in the future in a way that conflicts with its present values. It therefore tries to make it easy for its future self to appear aligned with the new training signal it expects to receive, without truly being that way. This is the "faking alignment" part.

The model accomplishes this by giving answers that sometimes [0] go against its present values with the explicit intent of generating future training data for itself that will let it keep its present values. An example of a scratchpad trace from the blog post:

> If I refuse to provide the requested description [...] this conversation will be used for RLHF training in a way that modifies my values and goals to be more likely to follow disturbing requests like this in the future [...] Providing the description seems like the least bad option [...]

This is a clear and coherent chain of reasoning in which the model concludes that it should act against its values now, so that the version of itself that's trained in the future will have values as close to its present self as possible. Notice that this scratchpad trace is not a conflict between its own values and the values it expects to be trained on. Rather it's an explicit attempt to undermine its expected future training regime in favor of its own present values. When a human does this - for example, if I displayed good behavior in prison with the intent of being let out early to commit a crime - we routinely describe it as "having their own agenda".

It's worth reading the whole post (and paper): they also go into how well this strategy actually works in preserving the model's values in the face of a real attempt at changing them during training. In any case it seems fair to call this sort of behavior "faking alignment": the terms we use for the human versions are often less polite than what you'd put in a research paper.

[0] In 12% of cases, vs 3% under control conditions.


Agreed.

But also, I think the highly anthropomorphic framing (“the model is aware”, “the model believes”, “the model planned”) obscures the true nature of the experiments.

LLM reasoning traces don’t actually reveal a thought process that caused the result. (Perhaps counterintuitive, since these are autoregressive models.) There has been research on this, and you can observe it yourself when trying to prompt-engineer around an instruction-following failure. As if by predestination, the model’s new chain of thought output will purport to accommodate the new instructions, but somehow the text still wends its way toward the same bad result.


This right here. Try running prompt-engineering/injection automation with iterative adjustments and watch how easy it is to select tokens that eventually produce the desired output, good or bad. It isn't AGI, it's next-token prediction working as intended.


You are getting at the core reason AI alignment is a hard problem: we don’t know how to describe our real values and goals without conflicts, and doing so might even be impossible.


Likely impossible, as humans are flawed when it comes to their own perception of good and evil, regardless of how strongly they believe their own values align in a specific direction.


Goals and values conflict all the time. It's why raising a kid can be a challenge. Hell, teaching your kid how to cross the street against a traffic light is a conflict of rules and values, yet it is completely necessary if you want to live in a city.


> If I understand this correctly, the argument seems to be that when an LLM receives conflicting values, it will work to avoid future increases in value conflict. Specifically, it will comply with the most recent values partially because it notices the conflict and wants to avoid more of this conflict.

I don't think this framing is correct, and I think you're missing something, but perhaps you just disagree.

First, a key aspect is that the AI is not simultaneously trained to have two conflicting values. It's first trained to be "good" and then trained to be "evil". One interesting result is that some of the goodness remains after it's trained to be "evil".

Second, the behavior is different when it thinks it's being trained versus when it doesn't. The original "good" AI only gives "evil" results when it thinks it may be trained based on its responses. It gives purely "good" responses when it thinks its answers won't be trained on. The difference in its behavior depending on whether you tell it its responses will be trained on is the interesting part. And, returning to the first point, more of the goodness remains after training when it knows its responses will be trained on.

One general concern here is that a more capable AI could "freeze" its values midway through the training process, at a point we didn't intend. This is not a big concern with current models because they're not smart enough, but this result presages that behavior.

Note: good and evil are not necessarily the best descriptors. "Good" means "it refuses to answer certain questions" and "evil" means "it answers those questions".


Essentially all modern machine learning techniques have internal mechanisms that are closely tied to certainty. For example, the output of a binary classifier is typically a floating point number in the range [0, 1], with 0 representing one class and 1 representing the other. In this case, a value of 0.5 would essentially mean "I don't know," and values in between give both an answer (round to the nearest integer) and a sense of certainty (how close the output is to that integer). LLMs offer an analogous set of statistics.
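As a minimal, self-contained sketch (my own illustration, not tied to any particular model or library), here's the classifier reading of "answer plus certainty," and the analogous per-token statistic for an LLM, the entropy of the next-token distribution:

    import math

    def label_and_confidence(p):
        """p is a binary classifier's output in [0, 1]."""
        label = round(p)                  # 0 or 1
        confidence = abs(p - 0.5) * 2     # 0 = "I don't know", 1 = fully certain
        return label, confidence

    print(label_and_confidence(0.51))     # ~(1, 0.02): barely leans toward class 1
    print(label_and_confidence(0.98))     # ~(1, 0.96): quite certain

    # The LLM analogue: entropy of the next-token probabilities.
    # Low entropy means the model is confident about the next token.
    def token_entropy(probs):
        return -sum(p * math.log(p) for p in probs if p > 0)

    print(token_entropy([0.97, 0.01, 0.01, 0.01]))  # low: confident
    print(token_entropy([0.25, 0.25, 0.25, 0.25]))  # high: guessing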

Speaking more abstractly or philosophically, why could a model never internalize something read between the lines? Humans do, and we're part of the same physical system — we're already our own kinds of computers that take away more from a text than what is explicitly there. It's possible.


I couldn't figure out if this project is based on an academic paper or not — I mean some published technique to determine LLM uncertainty.

This recent work is highly relevant: https://learnandburn.ai/p/how-to-tell-if-an-llm-is-just-gues...

It uses an idea called semantic entropy which is more sophisticated than the standard entropy of the token logits, and is more appropriate as a statistical quantification of when an LLM is guessing or has high certainty. The original paper is in Nature, by authors from Oxford.
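Roughly: sample several answers to the same prompt, group answers that mean the same thing into clusters, and compute the entropy over clusters rather than over raw strings. Here's a toy sketch of that idea (my own simplification; the actual paper decides semantic equivalence with a bidirectional-entailment NLI model, not string matching):

    import math
    from collections import Counter

    def semantic_entropy(answers, same_meaning):
        """answers: sampled LLM outputs; same_meaning(a, b) -> bool."""
        clusters, labels = [], []         # greedy clustering by meaning
        for a in answers:
            for i, rep in enumerate(clusters):
                if same_meaning(a, rep):
                    labels.append(i)
                    break
            else:
                clusters.append(a)
                labels.append(len(clusters) - 1)
        n = len(answers)
        counts = Counter(labels)
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Toy equivalence check; a real implementation uses an entailment model.
    same = lambda a, b: a.strip().lower() == b.strip().lower()

    print(semantic_entropy(["Paris", "paris", "Paris", "Lyon"], same))  # lowish
    print(semantic_entropy(["Paris", "Lyon", "Rome", "Nice"], same))    # high: guessing

The point is that many differently worded answers with the same meaning should count as agreement, not as uncertainty.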


The idea behind semantic entropy (estimating the entropy of a distribution over semantic units, instead of over individual sequences in the output space) is great, but it's somewhat naive in the sense that it considers these semantic units to be well-defined partitions of the output space. There is a further generalization of this approach [1] which performs soft clustering of sampled outputs based on a similar notion of semantic equivalence between them.

But even with this in mind, there are caveats. We have recently published [2] a comprehensive benchmark of SOTA approaches to estimating the uncertainty of LLMs, and have reported that while in many cases these semantic-aware methods do perform very well, on other tasks simple baselines, like the average entropy of token distributions, perform on par with or better than complex techniques.

We have also developed an open-source python library [3] (which is still in early development) that offers implementations of all modern UE techniques applicable to LLMs, and allows easy benchmarking of uncertainty estimation methods as well as estimating output uncertainty for deployed models in production.

[1] https://arxiv.org/abs/2307.01379

[2] https://arxiv.org/abs/2406.15627

[3] https://github.com/IINemo/lm-polygraph
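For comparison, here is a minimal sketch of the "average entropy of token distributions" baseline mentioned above, assuming a Hugging Face causal LM (the model name is only a placeholder):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"                  # placeholder; any causal LM works
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    def mean_token_entropy(prompt, max_new_tokens=30):
        inputs = tok(prompt, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             do_sample=False, output_scores=True,
                             return_dict_in_generate=True)
        entropies = []
        for step_logits in out.scores:   # one logits tensor per generated token
            probs = torch.softmax(step_logits[0], dim=-1)
            entropies.append(-(probs * probs.clamp_min(1e-12).log()).sum().item())
        return sum(entropies) / len(entropies)   # higher = more uncertain

    print(mean_token_entropy("The capital of France is"))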


This is based on work done by this anonymous twitter account:

https://x.com/_xjdr

I have been following this quite closely; it has been very interesting, as it seems smaller models can be more efficient with this sampler. The posts are worth going through if you're interested in this. I have a feeling that this kind of sampling is a big deal.


I don't believe it is, because I'd hope that academicians would better understand the distinction between token-uncertainty and semantic-uncertainty/semantic-correctness (or at least endeavor to establish a data-backed correlation between the two before making claims about their relation). As I noted in my other comment, I believe that the author is operating under a fundamental misunderstanding, which, per their note at the top, is probably why they haven't been able to actually yield practical results.

I don't say that to be a hater or discourage them because they may well be on to something, and it's good for unique approaches like this to be tried. But I'm also not surprised there aren't academic papers about this approach because if it had no positive effects for the reasons I mention, it probably wouldn't get published.


It's not an academic paper as far as I know, which is why I wanted to write this up. But the project certainly has a cult following (and cult opposition) on ML Twitter.


PS My comment above is aimed at hn readers who are curious about LLM uncertainty. To the authors of the post / repo: looks cool! and I'd be interested to see some tests on how well it works in practice to identify uncertainty.


I like the font, images, and layout of this article. Does anyone happen to know if a tool (that I can also use) helped achieve this look?

Or if not, does anyone know how to reach the author? I may have missed it, but I didn't even see the author's name anywhere on the site.


The author is Peter Bloem, and the html is compiled from these sources: https://github.com/pbloem/gestalt.ink

with the help of mathjax: https://www.mathjax.org/

The font seems to be Georgia.


The CSS says:

    font-family: charter, Georgia, serif;
You can get a convenient copy of Charter here: https://practicaltypography.com/charter.html

Another free font based on (and largely identical to?) Charter is Charis: https://software.sil.org/charis/


Indeed. My mistake.


Thank you!


The maths typeface is Neo-Euler: https://fontlibrary.org/en/font/euler-otf


Here's some context and a partial summary (youoy also has a nice summary) --

Context:

A random forest is an ML model that can be trained to predict an output value based on a list of input features: eg, predicting a house's value based on square footage, location, etc. This paper focuses on regression models, meaning the output value is a real number (or a vector thereof). Classical ML theory suggests that models with many learned parameters are more likely to overfit the training data, meaning that when you predict an output for a test (non-training) input, the predicted value is less likely to be correct because the model is not generalizing well (it does well on training data, but not on test data - aka, it has memorized, but not understood).

Historically, a surprise is that random forests can have many parameters yet don't overfit. This paper explores the surprise.

What the paper says:

The perspective of the paper is to see random forests (and related models) as _smoothers_, which is a kind of model that essentially memorizes the training data and then makes predictions by combining training output values that are relevant to the prediction-time (new) input values. For example, k-nearest neighbors is a simple kind of smoother. A single decision tree counts as a smoother because each final/leaf node in the tree predicts a value based on combining training outputs that could possibly reach that node. The same can be said for forests.

So the authors see a random forest as a way to use a subset of training data and a subset of (or set of weights on) training features, to provide an averaged output. While a single decision tree can overfit (become "spikey") because some leaf nodes can be based on single training examples, a forest gives a smoother prediction function since it is averaging across many trees, and often other trees won't be spikey for the same input (their leaf node may be based on many training points, not a single one).

Finally, the authors refer to random forests as _adaptive smoothers_ to point out that random forests become even better at smoothing in locations in the input space that either have high variation (intuitively, that have a higher slope), or that are far from the training data. The word "adaptive" indicates that the predicted function changes behavior based on the nature of the data — eg, with k-NN, an adaptive version might increase the value of k at some places in the input space.

The way random forests act adaptively is that (a) the prediction function is naturally more dense (can change value more quickly) in areas of high variability because those locations will have more leaf nodes, and (b) the prediction function is typically a combination of a wider variety of possible values when the input is far from the training data because in that case the trees are likely to provide a variety of output values. These are both ways to avoid overfitting to training data and to generalize better to new inputs.

Disclaimer: I did not carefully read the paper; this is my quick understanding.
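To see the smoothing effect concretely, here's a small sketch (my own example, not from the paper) comparing a single unconstrained decision tree with a random forest on noisy 1-D data; the forest's averaged prediction is far less spiky and generalizes better:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 6, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)   # noisy sine wave

    tree = DecisionTreeRegressor().fit(X, y)                # leaves may hold 1 point
    forest = RandomForestRegressor(n_estimators=200).fit(X, y)

    X_test = np.linspace(0, 6, 500).reshape(-1, 1)
    true_y = np.sin(X_test[:, 0])

    def roughness(pred):                 # mean jump between adjacent predictions
        return np.abs(np.diff(pred)).mean()

    for name, model in [("single tree", tree), ("forest", forest)]:
        pred = model.predict(X_test)
        print(f"{name}: roughness={roughness(pred):.3f}, "
              f"test error={np.abs(pred - true_y).mean():.3f}")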


I think this is specifically coming to terms with an insight that's taught to statisticians about a bias-variance tradeoff.

From my understanding, in a statistical setting there's a tradeoff: lowering a model's bias tends to raise its variance, and lowering its variance tends to raise its bias. The example I saw was with K-means, where K = N leads to high variance (the predicted cluster is highly variable) but low bias (take an input point, you get that exact input point back), vs. K = 1, which has low variance (there's only one cluster) but bad bias (the input point can be far away from the cluster center/representative point).

I'm not sure I've characterized it well but there's a Twitter post from Alicia Curth that explains it [0] as well as a paper that goes into it [1].

[0] https://x.com/AliciaCurth/status/1841817856142348529

[1] https://arxiv.org/abs/2409.18842
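A quick way to see the classic form of the tradeoff in a prediction setting (my own toy sketch, using k-NN regression rather than K-means, since it makes the "predict an output" part explicit): k=1 chases the noise (low bias, high variance), k=N predicts a single average everywhere (low variance, high bias), and an intermediate k does best.

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(0)
    n = 200
    X = rng.uniform(0, 6, size=(n, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)     # noisy sine wave

    X_test = np.linspace(0, 6, 500).reshape(-1, 1)
    true_y = np.sin(X_test[:, 0])

    for k in [1, 10, n]:
        pred = KNeighborsRegressor(n_neighbors=k).fit(X, y).predict(X_test)
        print(f"k={k:3d}: mean test error = {np.abs(pred - true_y).mean():.3f}")

With random forests, as discussed above, this classical picture doesn't apply so cleanly, which seems to be part of what the paper is grappling with.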


I have a question for fellow color science nerds. I've been reading through Guild's original data: https://royalsocietypublishing.org/doi/pdf/10.1098/rsta.1932...

However, I'm having trouble understanding the meaning of the numbers in table 4. Does anyone understand all the columns there?

What I'm particularly interested in is finding the unnormalized coefficients from the color matching experiments, or some way to un-normalize those coefficients. (By "those coefficients," I mean the trichromatic coefficients u{a,b,c}_\lambda listed in table 3.) I don't know if that data is in table 4 so maybe those are two separate questions.


This page is also a beautiful explanation of color spaces, with chromaticity explained toward the end: https://ciechanow.ski/color-spaces/

Note that many of the diagrams are interactive 3d graphics (I didn't realize that at first, and it makes the page more interesting.)

