sigbottle's comments | Hacker News

Yup, yup! There's so many different ways of thinking hard.

For me, thinking about an extremely technical TCS problem, for example, is my version of actively, tirelessly thinking hard. I'm logging a ton of observations, trying new ideas and hypotheses, using a mix of computer simulation and math to try and arrive at a concrete framing and answer.

On the other end of the spectrum, I have philosophy. It's definitely a different type of hard. Most of my "Aha!" moments come when I realize I've been strawmanning some argument and not actually understanding what the person is saying. Why is the person saying this, relative to what, why is this a new observation, etc. Things are so amorphous and you can tweak the problem parameters in so many ways, and it's really tempting to either be too fluid and pretend you understand the thinker (because it's a subset of some conception you already have), or be too rigid and dismiss the thinker as a category error / meaningless. I've never felt the same feeling as I did when doing TCS research, but it was definitely hard thinking nonetheless.

Extremely nitty-gritty technical things, like linker bullshit and Linux kernel programming, I'm much more familiar with, and those are more about reading documentation (because the tool won't behave like you want it to) and iteration / testing (because... the tool won't behave like you want it to, so you need to make sure it behaves like you want it to!). This is also a type of thinking - I'd call it hard in the sense that the physiological response I have is similar to that of research in the very bad moments, but in terms of my lofty ideals, I don't want to call this hard... it's very "accidental" complexity, but it's what I get paid to do :/

At work, you have a huge idea space to consider, both problem and solution framings, mixing in "bullshit" constraints like business ones. You also throw in the real-time aspect of it, so I can't either armchair on a problem for a month (unlike philosophy) or deep dive on one for a month (unlike research). I'm technically doing the third type of programming right now, but we'll see how long that lasts before I get put on a new project.

I'm not even sure if there's a clean demarcation between any of these. They're certainly better than brainrotting on YouTube, though.


Oh wow, there's still work being done on Ampere?

I was wondering - I've been thinking about switching to AI systems programming (I know, easy task), but from what I understand, industry cloud GPUs are the main winners, right? Nobody's going to pay me (assuming I even had the skills) to optimize for consumer GPUs?

From what I understand, it's not just number + capacity + performance, it's literal core primitives. I don't think any of the "Blackwell" chips like the Grace one or the RTX 5090 have, for example, SM pairs in their ISA? And likewise there are similar fundamental differences between consumer and datacenter Hopper (where the majority of the perf comes from the datacenter part's ISA?)

So I guess I'm wondering if I should buy a GPU myself or should I just rent on the cloud if I wanted to start getting some experience in this field. How do you even get experience in this normally anyways, do you get into really good schools and into their AI labs which have a lot of funding?


Why does publishing papers require the latest and greatest GPUs? My understanding is that the paper talks about very general principles.

> So I guess I'm wondering if I should buy a GPU myself or should I just rent on the cloud if I wanted to start getting some experience in this field. How do you even get experience in this normally anyways, do you get into really good schools and into their AI labs which have a lot of funding?

Unless you have money to throw around, you'd better start working on something, write some code, and get it running on a leased GPU before deciding on a long-term plan.


> My understanding is that the paper talks about very general principles.

This isn't really true.

In this case it's specific to NVidia's tensor matrix multiply-add (MMA) instructions, which let it use silicon that would otherwise be unused at that point.
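
For anyone who hasn't poked at tensor cores: here's roughly what targeting the MMA path looks like from CUDA C++ via the warp-level wmma API. This is a minimal illustrative sketch (one warp computing a single 16x16x16 half-precision tile with made-up all-ones inputs), not the code from the paper:

    // Minimal tensor-core MMA sketch (illustrative only, not the paper's code).
    // Build: nvcc -arch=sm_80 wmma_demo.cu -o wmma_demo
    #include <cstdio>
    #include <cuda_fp16.h>
    #include <mma.h>

    using namespace nvcuda;

    // One warp computes a single 16x16 tile of C = A * B on the tensor cores.
    __global__ void wmma_tile_kernel(const half *A, const half *B, float *C) {
        // Per-warp register fragments that the MMA hardware operates on.
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

        wmma::fill_fragment(c_frag, 0.0f);
        wmma::load_matrix_sync(a_frag, A, 16);   // leading dimension 16
        wmma::load_matrix_sync(b_frag, B, 16);

        // This call is what lowers to the tensor-core mma instructions.
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);

        wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
    }

    int main() {
        half *A, *B;
        float *C;
        cudaMallocManaged(&A, 16 * 16 * sizeof(half));
        cudaMallocManaged(&B, 16 * 16 * sizeof(half));
        cudaMallocManaged(&C, 16 * 16 * sizeof(float));
        for (int i = 0; i < 16 * 16; ++i) {
            A[i] = __float2half(1.0f);
            B[i] = __float2half(1.0f);
        }

        wmma_tile_kernel<<<1, 32>>>(A, B, C);   // one warp
        cudaDeviceSynchronize();
        printf("C[0] = %f (expect 16.0 for all-ones inputs)\n", C[0]);

        cudaFree(A); cudaFree(B); cudaFree(C);
        return 0;
    }

The mma_sync call runs on the dedicated tensor-core units, separate from the regular FP32/INT32 pipes - that's the otherwise-idle silicon being referred to here.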

> Why does publishing papers require the latest and greatest GPUs?

You really do need to test these things on real hardware and across hardware. When you are doing unexpected things there are lots of unexpected interaction effects.


It's supported on Ampere, so it's good enough.

As a reminder, the context is "require the latest and greatest GPUs", responding to the parent comment. "General" doesn't mean "you can do this on an Intel Arc GPU" level of general.

That said, my comment could have used a bit more clarity.


> Nobody's going to pay me (assuming I even had the skills) to optimize for consumer GPUs?

People will, but probably for less; not many people doing AI at the edge can pay the mega millions.

> And likewise similar fundamental differences between consumer and cloud hopper (where the majority of the perf is the cloud one's ISA?)

I think Hopper was the version where they did a clean split and it’s only for datacenter

> So I guess I'm wondering if I should buy a GPU myself or should I just rent on the cloud if I wanted to start getting some experience in this field. How do you even get experience in this normally anyways, do you get into really good schools and into their AI labs which have a lot of funding?

You can do performance work on any system you have, really; it's just that the details change depending on what you're targeting. You can definitely learn the basics on something like a 3060 by following blog posts.
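
To make that concrete, here's a hedged sketch of the kind of thing those intro blog posts start with: time a trivial SAXPY kernel with CUDA events and back out effective memory bandwidth. Nothing here is 3060-specific; only the numbers change with the card:

    // Toy CUDA benchmark sketch: time a SAXPY kernel and estimate bandwidth.
    // Build: nvcc -O2 saxpy_bench.cu -o saxpy_bench
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 24;   // ~16M elements
        float *x, *y;
        cudaMalloc(&x, n * sizeof(float));
        cudaMalloc(&y, n * sizeof(float));
        cudaMemset(x, 0, n * sizeof(float));
        cudaMemset(y, 0, n * sizeof(float));

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);

        // 3 floats of traffic per element: read x, read y, write y.
        double gb = 3.0 * n * sizeof(float) / 1e9;
        printf("saxpy: %.3f ms, ~%.1f GB/s effective bandwidth\n", ms, gb / (ms * 1e-3));

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFree(x);
        cudaFree(y);
        return 0;
    }

From there the usual progression is profiling with Nsight Compute, then tiled matmuls, shared memory, and eventually tensor cores; the details differ between a 3060 and a datacenter part, but the workflow is the same.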


You should check out nanochat. I would personally appreciate it if someone implemented hardware-optimized flash attention for my 3090.

I do CUDA for a living (not inference) and for the life of me (and a couple of LLMs for that matter) I cannot figure out what you mean by "SM pairs".

Do you mean the coupled dies on stuff like the B200? If so, note that an NVidia die has many SMs.

Do you mean TMEM MMA cooperative execution? I'm guessing that must be it given what the paper is about.


https://hazyresearch.stanford.edu/blog/2025-03-15-tk-blackwe...

cooperative execution yeah

as you can tell I do not do CUDA for a living :D


I still have 2x NVLinked A6000 and they aren't that bad compared to a single RTX 6000 Pro.

yep, https://github.com/poad42/cuda-fp8-ampere is another recent attempt at squeezing whatever's left out of Ampere

Look at the email addresses. If you'll recall, there's an embargo on China.

GPT models definitely seem stronger when they "get it" and in the types of problems they "get", while Claude seems more holistic but not "as smart" as GPT at its spikes.

I'm a Feyerabend sympathizer, but even he wouldn't have gone this far.

He was against establishment dogma, not pro anti-intellectualism.


Well, you can technically skirt around this by saying, "Okay, there is a class of situations, and we just need to figure out the cases, because yes, we acknowledge that morality is tricky". Of course, take this to the limit and it starts to sound like pragmatism - what you call "well, we're making a more and more accurate absolute model, we just need to get there" versus "revising is always okay, we just need to get to a better one" blurs together more and more.

IMO, the 20th century has proven that demarcation is very, very, very hard. You can take either interpretation - that we just need to "get to the right model at the end", or "there is no right end, all we can do is try to do 'better', whatever that means"

And to be clear, I genuinely don't know what's right. Carnap had a very intricate philosophy that sometimes seemed like a sort of relativism, but it was more of a linguistic pluralism - I think it's clear he still believed in firm demarcations, essences, and capital T Truth even if they moved over time. On the complete other side, you have someone like Feyerabend, who believed that we should be cunning and willing to adopt models if they could help us. Neither of these guys is an idiot, and they're explicitly not saying the same thing (a related paper can be found here: https://philarchive.org/archive/TSORTC), but honestly, they do sort of converge at a high level.

The main difference in interpretation is "we're getting to a complicated, complicated truth, but there is a capital T Truth" versus "we can clearly compare, contrast, and judge different alternatives, but to prioritize one as capital T Truth is a mistake; there isn't even a capital T Truth".

(technically they're arguing along different axes, but I think 20th century philosophy of science & logical positivism are closely related)

(disclaimer: am a layman in philosophy, so please correct me if I'm wrong)

I think it's very easy to just look at relativism vs. absolute truth and end up with strawman arguments about both sides.

And to be clear, it's not even like drawing more and more intricate distinctions is good, either! Sometimes the best arguments from both sides are an appeal back to "simple" arguments.

I don't know. Philosophy is really interesting. Funnily enough, I only started reading about it more because I joined a lab full of physicists, mathematicians, and computer scientists. No one discusses "philosophy proper", as in following the historical philosophical tradition (no one has read Kant here), but a lot of the topics we talk about are very philosophy adjacent, beyond very simple arguments


Does it?

For me, I've had that mentality for the longest time and I didn't get anything done because, well, "I'm just average".

For me, a little bit of arrogance (there's no way I couldn't do X, let's go do it), even if I end up "looking stupid" (see, I told you it was that hard!), was far more valuable to my development


For me, I've realized I often cannot possibly learn something if I can't compare it to something prior first.

In this case, as another user mentioned, the decoupling use case is a great one. Instead of two processes/APIs talking directly, having an intermediate "buffer" process/API can save you a lot of headache.


To add to this,

The concept of connascence, rather than coupling, is what I find more useful for trade-off analysis.

Synchronous connascence means that you only have a single architectural quantum, in Neal Ford's terminology.

As Ford is less religious and more respectful of real-world trade-offs, I find his writings more useful for real-world problems.

I encourage people to check out his books and see if they're useful. It was always hard to mention connascence, as it has a reputation of being ivory-tower architect jargon, but in a distributed-systems world it is very pragmatic.


There are two things to separate here.

One is the practical and societal consequences, playing out iteratively over the next few decades. Fine, this is an important discussion. If this is what you're discussing, I have no objection - automation taking a significant portion of jobs, including software engineering, is a huge worry.

The other thing is this almost schadenfreude of intelligence. The argument goes something like: if AGI is a superset of all our intellectual, physical, and mental capabilities, what point is there to humans? Not from an economic perspective, but literally, from a "why do humans exist" perspective? It would be "rational" to defer all of your thinking to a hyperintelligent AGI. Obviously.

The latter sentiment I see a decent bit on Hacker News. You see it encoded in psychoanalytic comments like, "Humans have had the special privilege of being intelligent for so long that they can't fathom that something else is more intelligent than them."

For me, the only actionable conclusion I can see from a philosophy like this is to Lie Down and Rot. You are not allowed to use your thinking, because a rational superagent has simply thought about it more objectively and harder than you.

I don't know. That kind of thinking - whether encountered intuitively in my teens, or when learning about government and ethics (Rational Utopianism, etc.) - has always ticked me off. Incidentally, I've disliked every single person who has unequivocally thought that way.

Of course, if you phrase it like this, you'll get called irrational and quickly get compared to not-so-nice things. I don't care. Compare me all you want to unsavory figures; this kind of psychoanalytic gaslighting statement is never conducive to "good human living".

Don't care if the rebuttal analogy is "well, you're a toddler throwing a tantrum, while the AGI simply moves on". You can't let ideologies like the second get to you.


It is actionable.

You may be doing the same thing from the outside, but the point is that your approach to life is the issue. Your mindset itself filters experience.

This is the issue with "empirics": there are often very real intangibles that deal with fundamentally subjective things.

From the outside, of course, you point to all the 'concrete' things, but it's really the intangibles that matter.


Very interested in this! I'm mainly a ChatGPT user; for me, o3 was the first sign of true "intelligence" (not 'sentience' or anything like that, just actual, genuine usefulness). Are these models at that level yet? Or are they at o1 level? Still GPT-4 level?


Not nearly o3 level. Much better than GPT-4, though! For instance, Qwen3 30B-A3B 2507 Reasoning gets 46 vs GPT-4's 21 and o3's 60-something on Artificial Analysis's benchmark aggregation score. Small local models, ~30B params and below, tend to benchmark far better than they actually work, too.

