My entire career in computers spans the 40 years in that graph. The constant leaps in fundamental speed were exhilarating and kind of addictive for technologists like myself. As the rate of progress has fallen off over the past decade it's been sad to see the end of an era.
I'm sure speeds and capabilities will continue to increase, albeit much more gradually, but significant gains are going to come slower, harder and at greater cost. The burden will have to be shouldered by system architects and programmers in finding clever ways to squeeze out net gains under increasingly severe fundamental constraints (density, leakage, thermals, etc).
Back when I started programming as a teen in 1980 with 4k of RAM and ~1 MHz 8-bit CPUs, knowledge of the hardware underneath the code and low-level assembly language skills were highly valuable. Over the years, the ability to think in instruction cycles and register addressing modes grew anachronistically quaint. Now I suspect those kinds of specialized 'down-to-the-metal' optimization skills may see a resurgence in value.
I think it is the opposite. I have almost as much experience as you; I started a little later and didn't get serious until my teens, with 68k assembly language and custom chip programming on the Amiga 500. This isn't all nostalgia; some of it is germane context.
I think it is important to have a mental model of the hardware so that the architecture of the program has some mechanical sympathy. But the ability to think abstractly is more important; that is what allows Moore's law to be realized. Our compute topology is changing, and if the perf curve is to continue to be exponential, our code, and more importantly the expression of our ideas, has to be able to exercise 30B transistors today and 150B in 8 years. Knowing how to compose neural networks is one of the new skills that is akin to knowing how to shave off cycles in the 80s. Mod playback, Doom, Quake, mp3 decompression, emulation all redefined our relationship with computing.
The Amiga had custom hardware for bitblits and sprite compositing; it could do these trippy multi-layered backgrounds that used parallax to give an arcade-like 2.5D look. These had a bunch of registers you had to muck with. I only ever called them from assembly; I knew C, but it just felt more natural to do it in asm files. My point is, you can do the same things using high-level garbage-collected code. In Python or JS, you could implement Quake using naive algorithms. No asm, just regular code; no custom memory copying and compositing hardware, just assignment statements in a dynamic, GC'd language.
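To make that concrete, here is a rough Python sketch of software parallax compositing with nothing but loops and assignments; the layer setup, sizes, and scroll factors are made up purely for illustration:

    # Minimal software parallax compositing: no blitter, no copper, no asm,
    # just nested loops and plain assignments on GC'd objects.
    TRANSPARENT = 0
    WIDTH, HEIGHT = 320, 200

    def make_layer(fill, width=WIDTH, height=HEIGHT):
        # A "bitmap" here is just a list of rows of palette indices.
        return [[fill for _ in range(width)] for _ in range(height)]

    # Farther layers scroll more slowly; that is all parallax is.
    layers = [
        (make_layer(1), 0.25),            # distant background, quarter speed
        (make_layer(2), 0.5),             # middle layer, half speed
        (make_layer(TRANSPARENT), 1.0),   # foreground, full speed
    ]

    def compose(camera_x):
        # Composite all layers into one frame, back to front.
        frame = make_layer(TRANSPARENT)
        for bitmap, scroll_factor in layers:
            offset = int(camera_x * scroll_factor)
            for y in range(HEIGHT):
                src_row = bitmap[y]
                dst_row = frame[y]
                for x in range(WIDTH):
                    pixel = src_row[(x + offset) % WIDTH]
                    if pixel != TRANSPARENT:   # colour-key "transparency"
                        dst_row[x] = pixel     # plain assignment
        return frame

    frame = compose(camera_x=42)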
The programmer who can code an awesome parallax demo using numpy arrays is not going to be the next Carmack. The programmer who can compose three AI models to make something we have never thought of is going to make the next Quake, or some other piece of software that changes our relationship with computing and Moore's law. Abstraction gets us there.
I agree with the parent of your post. I work in a field where Moore's law gets artificially arrested for often a decade at a time - console games - and we are no strangers to being critically aware of how much memory we are copying around: we will reach for hand-coded SIMD math and we stare at our shader assembly looking for more performance. You should see what some do to get top-line performance in collision detection. It even leaves me a bit sweaty... I'm not discounting what you conjecture about the next Carmack being in the machine learning arena - that's how I feel too - but I still strongly believe that we will see more demand for programming that can eke out performance with what we have.
Physical simulation is unique due to latency requirements. The impossibility of using the data center is the common denominator in high performance programming.
In my field (Spark, functional programming for data parallelism), few if any of the problems from the end of Moore's law ever truly materialize.
"Compute bottlenecks" are so uncommon. Almost no lines of Scala get written on Databricks; SQL and Python are "fast enough." Commoditization and "good enough" libraries, packaged in SQL/Python for the lowest common denominator.
Fixating on Carmack's fast-inverse-square-root genius misses the point.
Carmack's genius was the video game Quake itself.
The mathematical brilliance, the high performance programming, was genius applied to overcome a bottleneck.
(And what temporary genius. Contrast Carmack with Unity).
Originality and usefulness, imagination meeting relevance: that is the engine that powers software.
But within reason, these are areas where huge returns can be made from higher performance programming as opposed to speed of development - a 10% performance increase can save a stupid amount of money on hardware - and with hardware lasting longer I think there will be an increasing focus on that.
When I play console games on my Xbox 360 the biggest annoyance by far is the loading times. You run around in Skyrim and you enter a house so you have to wait 30 seconds for the content to load. Then you leave the house and have to wait 30 seconds again. My point is that the relevant performance metric isn't speed of number crunching anymore - it is speed of transporting data from one part of the system to another.
I believe a critical difference between the high performance of now vs yesteryear is the degree to which it's a design problem vs an implementation problem.
When writing 6502 assembly, you have "tricks" galore. You do have a design trade-off to make - memory vs CPU cycles - and when looking at algorithms in really old programs, they often dispensed with even basic caching to save a few bytes. But a lot of the savings came from gradually making the program as a whole a tighter specimen, doing initializations and creating reports with just a few fewer instructions. The "middle" of the program was of similar importance to the design and the inner loops, and it popularized ideas like "a program with shorter variable names will run faster" or "a program with the inner loop subroutines at the top of the listing will run faster" (both true of many interpreters). An engineer of this period worked out a lot of stuff on paper, because the machine itself wasn't in a position to give much help. And so the literal "coding" was of import: you had to polish it all throughout.
Today, the assumption is that the middle is always automated: a goop of glue that hopefully gets compiled down to something acceptable. Performance is really weighted towards the extremes of either finding a clever data layout or hammering the inner loop, and to get the most impactful results you usually have a little of both involved.
The hardware is in a similar position to the software: the masks aren't being laid out by hand, and they increasingly rely on automation of the details. But they still need a tight overall design to get the outcome of "doing more with less."
And the justifications for getting the performance generally have little to do with symbolic computation now. We aren't concerned about simply having a lot of live assets tracked in a game scene (a problem that was still interesting in the 90s, but more or less solved by the time we started having hundreds of megabytes of RAM available); we're concerned about having a lot of heavy assets being actively pushed through the pipeline to do something specific. That leans towards approaches that see the world in less symbolic or analytical terms and as more of a continuous space sampled to some approximation - which digital computing can do, but it isn't the obvious win it once was.
The video game industry has downloaded more memory leaks onto personal machines than all the other domains of software combined. So many lines of terrible C++ have been written...
The importance of Moore's law falls flat in front of good old "bugger good code, Morrowind's rebooting the Xbox."
I love your comment. I can only imagine how thrilling it would have been in the early days to see order of magnitude improvements in generalised single threaded computer performance every couple of years.
Today, as it happens with all fields that become more complex over time, excitement is found in more nuanced areas.
Hardware has become task specific and that makes it exciting to different niches for different reasons.
You mention the idea of thinking in cycles and that concept is quite appealing to me. I believe the lack of focus on squeezing performance is a symptom of the accessibility of modern application development combined with the fact that most commercial products wouldn't see a financial benefit to delivering computationally efficient applications.
I do wish modern applications were more efficient, but that's a fool's errand as I don't see companies like Spotify rewriting their desktop client in 5 or 6 different native UI kits. Vendors like Microsoft and Apple will never collaborate on a common UI specification outside of web standards, so we are forced to suffer through Electron apps. Heck, Microsoft can't even figure out what UI API it wants to offer for Windows.
That said, if you're interested in computer science, we are only just uncovering novel approaches to how languages can let engineers ergonomically leverage parallel computation. We see this in languages like Rust and Go - both of which are not perfect, but there are so many lessons being learned here.
To me, the software engineering and language design world is unbelievably thrilling right now.
I do think, and wish, that the large companies who own the platforms would work together more to avoid the standards mishmash application developers must contend with in today's landscape. It would make it far more accessible to write efficient cross-platform client applications that aren't built on web technologies.
These days cache is more important than registers. For typical small n, linear search beats the pants off binary search just because linear search is cache-friendly.
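A rough, pure-Python sketch of that comparison; the cache-friendliness effect really shows up in compiled code, so treat the crossover point here as illustrative only (sizes and probe values are arbitrary):

    # Linear scan vs binary search over small sorted lists. In compiled code the
    # linear scan wins at small n largely because it is cache-friendly and easy
    # to branch-predict; in pure Python, interpreter overhead dominates instead.
    import bisect
    import random
    import timeit

    def linear_contains(sorted_list, x):
        for v in sorted_list:
            if v >= x:
                return v == x
        return False

    def binary_contains(sorted_list, x):
        i = bisect.bisect_left(sorted_list, x)
        return i < len(sorted_list) and sorted_list[i] == x

    for n in (8, 64, 512, 4096):
        data = sorted(random.sample(range(n * 10), n))
        probe = data[n // 2]
        t_lin = timeit.timeit(lambda: linear_contains(data, probe), number=20000)
        t_bin = timeit.timeit(lambda: binary_contains(data, probe), number=20000)
        print(f"n={n:5d}  linear={t_lin:.4f}s  binary={t_bin:.4f}s")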
Modern optimizing compilers almost always do a much better job of micro-optimization. Humans are much better at attacking the big picture: making code fast with changes that cannot be safely made by the compiler because the new algorithm isn't equivalent in all cases.
Even in 1980 programmers knew that optimization was best done at a high level. The low level stuff just had more value when compilers were not good.
High performance computing will drive demand for faster hardware, for example in machine learning. It is extremely computationally intensive and expensive to train large NLP models. The big companies in this game have a lot of money to invest in bringing those costs down, and in turn train better models.
That said, I don't see a reason why speeds will increase significantly on personal devices. We're seeing a situation now where personal devices are really 'fast enough' for normal use cases. Instead the focus is more on improving efficiency and battery life.
It depends. I dream of a world where your Smartphone is also your personal computer and you can just project everything from it using AR wherever you are. In that case they have to improve on both.
Apple seems to be latching onto the idea that users need to run ML on their consumption devices, as opposed to the cloud, and I don’t believe it. I think you agree. Yet in my opinion, if anything, they want the appearance of that necessity, expressed as a loss of efficiency and battery life for older devices to sell new ones.
I don't understand this comment. ANNs are being used everywhere - image recognition, voice recognition, document classification... I can only see this use increasing for the foreseeable future.
Google kills tons of very expensive projects. Facebook spends a lot on their Metaverse, but that doesn’t make it good. Tons of companies spend on terrible ideas.
The only difference with Google or Facebook is that they’re big enough to absorb the losses.
This isn’t to say that ML is a dead end, but instead to point out that just because they are investing a lot doesn’t make it good.
I’m just a few years younger than you and have had similar experiences. This is off topic, but when was the last “magical” new computer experience for you? For me, it was the M1; after seeing how good Intel had been for so long, everything they had vanquished, and then AMD’s recent run, I just couldn’t see a non-x86-64 part really performing outside of some IBM systems in special cases. That little M1 SoC blew me away with its consistently great performance and power use. I’m not sure it’ll be the same with the M3 and beyond. It was a taste of that old-school new-computer feeling, though.
The first computer I used was a Pentium Pro at 233 MHz, and I remember how fast things were moving every year for at least a decade before it slowed to irrelevance. The M1 was a long time coming. I remember back in 2013 when the iPhone 5S came out, AnandTech showed how it matched Atom's perf in a few web benchmarks at much lower power. Combine that with mW-level idle power, and it was obvious they would be very competitive in the PC space. That was also the year Apple called their chip "desktop level." I remember thinking back then how amazing it was that I could FaceTime for hours on a passively cooled phone but could barely Skype for thirty seconds before the fans spun up on my Mac. I always thought it was the smaller screen and never made the connection that the SoC was the key difference.
For me it was the upgrade from spinning platters to a SSD. I was giggling as I restarted my computer a few times just to watch it almost instantly get to the login screen.
I am very very doubtful that people will once again start caring.
Even a decade ago, it was known that hardware gains wouldn't be as spectacular as before. It was predicted that this would lead to rise of specialized programming models such as GPGPU, DSPs, more focus on optimization, with a particular eye to hardware architecture, memory access patterns etc.
What actually happened?
Everything runs in the browser buried under six layers of Javascript and talks to a bazillion servers running microservices and passing JSON over HTTP to each other.
People care about optimization even less today than they did a decade ago.
Dude, in 1998 we at Intel had a 64-core system running.
But it's the shrink in circuit size from microns to nm that really proved out, not macro cores... except that once they solved that, scaling CORES is what really gave way - after going from µm down to nm...
Maybe some critical code paths will be assembly-optimized (cf. dav1d) for speed and efficiency, but the real issues now are mostly at the software level, where toxic planned obsolescence is running rampant, fueled by the big tech companies steered by Vanguard and BlackRock (Apple/Microsoft/Google/etc).
The only shield against that, some would think, is open source, but actually it is "lean" open source, SDK included. Kludge, bloat, and planned obsolescence are no better in the current open source world than in the closed source world.
I am an "everything in RISC-V assembly" (with a _simple_ and dumb macro preprocessor only) kind of guy, including the python/lua/js/ruby/etc interpreters. The main reason for that is not to be "faster", but to remove those abominations which are the main compilers from SDK stacks. Some sort of "write assembly once/run everywhere" (and you don't need a c++7483947394 compiler).
I agree, but I also think we need a fundamentally new paradigm.
It's very important that we as programmers have a good mental model of how the machine works. Abstractions are cool, but it is important to be aware of how your data lives in memory and how the CPU acts on your code - and almost everything we've been taught about that in the last few decades is close to irrelevant.
Almost all of us think and write code sequentially. Even with multithreading, your program is generally sequential, and the CPU just doesn't work that way anymore. With all the fancy whizbang branch prediction, superscalar execution, and whatever other black magic, the CPU is fundamentally not sequential.
As a result, compilers are becoming enormous hulking beasts with millions of lines of code trying to translate sequential programs into parallel ones. This kind of defeats the purpose of us having that mental model of the machine. The machine we think we know is not the machine that actually exists.
We need a new set of inherently parallel languages. Similar to the way we program GPUs these days.
The modern cpu is orders of magnitude more complex than anything we've seen before. We need new mental models and new programming paradigms to extract performance the way we used to on sequential processors.
Even for embedded applications, microcontrollers increasingly feature things like multiple instructions per cycle, branch prediction, and multiple cores are much more common these days.
I think we're stuck in a shitty place in between two wildly different worlds of computing. We aren't willing to make the leap to the new, so we live in this rapidly crumbling ecosystem trying to adapt 50 year old code to superscalar hyperthreading gigacore x86 processors.
The amount of wasteful code and technical debt in every one of the systems underpinning our society is truly unimaginable in its scale. There is no path forward from here except to burn it all down and begin again with a fundamentally new way of looking at things. Otherwise, it's all going to come crashing down sooner or later.
I don't quite feel that. On one hand, my current computers cover my needs well enough; on the other, it's still quite impressive how much more instantaneous the boot of a new computer is compared to my daily drivers. For the rest, computers have been "fast enough" for me for some time now.
Maybe I should move to big data and machine learning...
> Back when I started programming as a teen in 1980 with 4k of RAM and ~1 MHz 8-bit CPUs
I really miss those days. OTOH, just like my modern laptop, my Apple II could cold-start (from disk!) in 2-ish seconds.
This graph shows transistors basically maintaining pace and completely disregards multi-core performance. Of course single core perf will rise more slowly when a chip now has 8-64x as many cores.
> This graph shows transistors basically maintaining pace...
I'm no expert in silicon scaling but from reading technical papers, my (naive) understanding is that transistor density has almost kept up but now that scaling comes with increasingly stringent design constraints which architects must make trade-offs over. Broadly speaking, things like "You can have 2x last gen's density but they can't all be fully powered on for very long." That's a greatly simplified example but much of what I've seen has been far "thornier" in terms of interacting constraints along multiple dimensions.
My sense is that in the 90s we usually got "denser, faster AND cheaper" with every generation. Now we're lucky to get one, and even that comes with implementation requirements which can be increasingly arcane. My understanding is that different fabs are having to roll more of their own design libraries which embody their chosen sets of trade-offs per node. In addition to limiting overall performance and being harder to design, this apparently makes reusing or migrating designs more challenging. So while certain headline metrics like node density may appear to be scaling as usual, the reality under the hood is more complex and far less rosy.
You made me think that maybe computing is a deflationary force (I am not a libertarian, and this isn't some free-market-bro idea, I think): the more that can be subsumed by computation, the more things can get cheaper over time rather than more expensive, even in the face of rising material costs.
The relative price of steel has remained flat, while the steel performance has greatly increased.
Between material science and cheaper compute, we can build higher tech parts and techniques.
The number of cycles consumed per person per year is growing exponentially; what are some important points on that curve? When does the computation needed to design something take the same order of energy as creating it?
You could buy a Honda Civic new in 1980 for $5,000, which would be only just under $10K in today's dollars. What 1980-Honda-Civic-quality car can you buy today for $10K? Or am I just being nostalgic?
And look at the bump in inflation during the recession, https://blog.cheapism.com/average-car-price-by-year/#slide=6... of car prices. Was the 2008 recession triggered by excessively inflated car prices? Like causing a bubble in a pipeline, an economic embolism.
Current average price has dropped 10k$ from 35k to 25k in the years since 2008.
Could you please try to explain what you want to say with less snark? I'm a bit confused.
Paying people to do nothing gives you nothing.
Full employment isn't an end in itself, but it's useful because it is typically related to things we do care about. Employing people to do nothing is like fiddling with the speedometer of your car in order to 'go faster', or relabeling your amplifiers to go to 11.
You can sort-of turn atmospheric carbon into cheese. Have grass capture the carbon, and a cow eat the grass. That's totally doable, just not viable or efficient if your goal is to capture carbon at scale.
(If your goal was to go carbon negative at all costs, you could instate a whopping big carbon tax and let the economy figure it out.)
Right now our economy basically runs on carbon at the core. We make stuff, we move stuff, and emitting carbon is necessary for that. If we switched our economy to owning and moving information, then we could still have full employment and keep money moving through the ecosystem while, from the viewpoint of a materialist, just moving useless bits around.
I think we already have a lot of high paying jobs in the economy that don't do much and pay people to do nothing (of value). We should absolutely spread that around.
Which is great if you have a traditional server application servicing a lot of independent requests, or giant linear equations that can be solved in parallel.
OTOH, the graph has an Amdahl's-law section, which for many tasks has pretty much run out of steam (aka desktop web browsing/JavaScript JIT/etc).
I'm not going to be so stupid as to say 8 cores should be enough for anyone (while attached to a machine with 128) but you have to wonder if the stable diffusion style apps running on your desktop are going to be mainstream, or isolated to the few who choose to _need_ them as a hobby or a smaller part of the public that uses them for commercial success. AKA, I can utilize just about every core i'm given with parallel compiles, or rendering a 4K video, but I'm pretty sure i'm the only one in my immediate family that needs that. My wife in the past might have done some simulation work, but these days the heaviest thing she runs on her PC is office products.
This really gets back at the Arm big.little thing, where you really want 99% of your application usage to run on the big cores. The little cores only exist for background/latency insensitive tasks, and the odd case where the problem actually can utilize a large number of parallel cores and needs to maximize efficiency in the power envelope to maximize computation. AKA throw a lot of lower power transistors at the people rendering video/etc, and leave them powered off most of the time.
AKA, put another way, the common use case is a few big powerful cores for normal use, playing games, whatever with one or two high efficiency processors for everything else and a pile of dark silicon for the rare application that actually can utilize dozens of cores because its trivial to parallelize and doesn't work better being offloaded to a GPU. I suspect long term intel was probably right with larrabee, they were just a decade or two early.
So, economically I don't see people buying machines with a couple hundred cores that sit dark most of the time. Which will drive the price up even more, and make them less popular.
> I'm not going to be so stupid as to say 8 cores should be enough for anyone (while attached to a machine with 128) but you have to wonder if the stable diffusion style apps running on your desktop are going to be mainstream, or isolated to the few who choose to _need_ them as a hobby or a smaller part of the public that uses them for commercial success. AKA, I can utilize just about every core i'm given with parallel compiles, or rendering a 4K video, but I'm pretty sure i'm the only one in my immediate family that needs that. My wife in the past might have done some simulation work, but these days the heaviest thing she runs on her PC is office products.
Cause and effect is backwards there. Designers only went to multicore because single core performance improvement was leveling off. It's not that people wanted multicore systems and were willing to sacrifice single core performance to get it.
Well, we wanted multicore, but it was mostly because Windows loved to become unresponsive on a single core. I think that from a consumer's point of view, 2 cores circa 2006 were enough; 4 is probably the absolute maximum.
How does it disregard multi-core performance? As you said, it's showing the transistor counts going up, and it's also showing the rise in the number of logical cores.
The missing thing that's critical for most multi-core performance use cases is memory bandwidth. Maybe not easy to summarize on a graph like this, but for any workload that can't fit within L1 cache, you're unlikely to get close to linear performance scaling with cores. Sometimes a single core can fully saturate the available memory bandwidth.
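A roofline-style back-of-envelope of that ceiling; every number below is hypothetical, purely to illustrate how quickly a bandwidth-bound kernel stops scaling with cores:

    # Attainable throughput = min(compute roof, bandwidth roof). With the made-up
    # figures below, the kernel saturates DRAM bandwidth at roughly 2-3 cores.
    PEAK_GFLOPS_PER_CORE = 8.0     # per-core compute, hypothetical
    DRAM_BANDWIDTH_GBS   = 40.0    # total memory bandwidth, hypothetical
    FLOPS_PER_BYTE       = 0.5     # arithmetic intensity of a streaming kernel

    def attainable_gflops(cores):
        compute_roof = cores * PEAK_GFLOPS_PER_CORE
        bandwidth_roof = DRAM_BANDWIDTH_GBS * FLOPS_PER_BYTE
        return min(compute_roof, bandwidth_roof)

    for cores in (1, 2, 4, 8, 16, 32):
        g = attainable_gflops(cores)
        print(f"{cores:2d} cores -> {g:5.1f} GFLOP/s "
              f"({g / attainable_gflops(1):.2f}x vs 1 core)")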
Back in grad school, one of the analysis programs I used dated back to the mid 70s. The original paper gave a performance metric for a test program, which I compared to the runtime on a Chromebook running Linux. I was curious how closely that scaled with Moore's Law, and computed "initial_release + (1.5 years)*log2(initial_runtime/current_runtime)". That is, assuming the speedup is due entirely to hardware improvements, and those hardware improvements follow Dennard scaling, what year is it?
This (admittedly very rough) measurement ended up giving 2003. It was wrong by over a decade from the actual date, but correctly gave the date at which clock frequencies stopped improving.
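For anyone who wants to play with the same back-of-envelope, here it is as a small Python function; the 1976 date and the runtimes in the example call are made up, only the formula comes from the comment above:

    import math

    def implied_year(initial_release, initial_runtime, current_runtime,
                     doubling_period_years=1.5):
        # Assume all of the speedup came from hardware doubling every 1.5 years.
        doublings = math.log2(initial_runtime / current_runtime)
        return initial_release + doubling_period_years * doublings

    # e.g. a 1976 paper quoting an hour of runtime vs ~10 ms today:
    print(int(implied_year(1976, initial_runtime=3600.0, current_runtime=0.01)))
    # -> 2003 with these hypothetical inputs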
> More depressing - the number of cores will also level off eventually, and where does that leave us then?
Short of breakthroughs (e.g. quantum and currently unknowns), the only clear path is less generalized architectures and more specialized chips. As you move more towards ASICs from general architectures you get improved performance, reduced power, and so on.
We've lived in an era of software where hardware was abundant and cheaper than an engineer's time: throw more hardware at it and make sure you have generally optimal algorithms in most of your run paths. That's going to change more and more, and I suspect we're going to have to start rethinking or redeveloping some layers of abstraction between current software and hardware.
As it stands now, we're building more and more complex things atop weaker intermediary layers of abstraction to save time and meet budgets, but that's going to have to be revisited in the future, and the inefficiency debts we've been building up will need to be paid down. Clear code will become less of a top priority when clever optimizations that may not be so clear can be added in. We're still many, many years away from this, but that's my prediction.
The "cores" are becoming more specialized and optimized for domain specific tasks.
Compiler technology advancements are needed to take advantage of such heterogenous architectures in a transparent way.
LLVM MLIR started that already.[1,2]
The alternative is being stuck with each silicon vendor's proprietary solutions like CUDA.
I'd guess we get more hardware acceleration. In classic computers (PCs, laptops, servers), for stuff like audio/video codecs, that's been available for decades, but I'd say the next big push will be ethernet/wifi accelerators that do stuff like checksum calculation/verification, VLAN tagging or even protocol-level stuff like TLS in the chip itself - currently, that's all gated for expensive cards [1], I'd expect that stuff to become mainstream over the next few years. Another big part will be acceleration for disk-to-card data transfer [2] - at the moment, data is being shifted from the disk to RAM to GPU/other compute card. Allowing disks to interface with compute cards will be a lot of work - basically, there needs to be a parallel filesystem reader implementation on the disk itself, on the DMA controller or in the GPU, which is a lot of effort to get done right with most modern and complex filesystems - but in anything requiring high performance removing the CPU bottleneck should be well worth the effort.
Mobile is going to be more interesting because of power, space, and thermal constraints, and because a lot of optimization has already been done there: unlike on classic computers, vendors couldn't just use brute force to get better performance, and there is a bit of an upper cap on chip/package size as well. Probably we'll see even more consolidation towards larger SoCs that also do all the radio communication, if not on the same chip then at least in the same package, so the end game there is one single package that does everything, with all that's needed on the board being RF amplifiers and power management. All the radio stuff will move to SDR sooner or later, allowing for far faster adoption of higher-bandwidth links and, with it, a reduction in power consumption, as the power-expensive RF parts have to be powered on for less time to deliver the same amount of data.
Who knows what sort of tech aliens would have? I don't think this whole foray into general purpose computing was necessarily pre-destined. Maybe their whole system could look more like a bunch of strung-together ASICs. "You made your computers drastically less efficient so that anyone could program them? Why would you want your soldier-forms and worker-forms to program computers? Just have the engineer-forms place the transistors correctly in the first place, duh."
> Who knows what sort of tech aliens would have? I don't think this whole foray into general purpose computing was necessarily pre-destined.
It's sometimes fun to think that technology is a function of the intelligence that creates it.
What if the aliens have some vastly different perception of reality than us? Things we consider obvious may not be obvious to them, and vice versa; their underlying desires and motivations may be different.
Humans for example, often tend to invent things for the sake of it. Imagine a species that doesn't do that. Or an organic FTL drive conjured into existence over eons via distributed intelligence. Weird.
> Or an organic FTL drive conjured into existence over eons via distributed intelligence. Weird.
E.g., What if the first aliens to find us are hyperintelligent slime molds, whose entire existence is predicated on finding the shortest distance between two points in higher-dimensional space and then traveling there to see what there is to eat?
The anime Gargantia on the Verdurous Planet explores this.
In it, squids evolved into a spacefaring race that uses only organic technology, if any at all, and doesn't seem to have consciousness.
They are at war with the spacefaring humans that rely on mecha and AI. It ends with a very non-human and frustrating coexistence message instead of going for all-out termination of hostile creatures.
One of the most interesting things to think about in this regard is the past, the crazy things people thought, and why those things probably didn't seem especially crazy at the time. In the earlier ages of exploration of our world, people were able to discover ever more amazing things, from springs mysteriously heated even in the coldest of times and places, to a tree producing bark that, when chewed on, can make one's pain completely disappear (more contemporarily known as willow/aspirin), and endless other ever more miraculous discoveries.
Why would it thus be so difficult to imagine there being some spring or treatment that could effectively end illness or even aging? A fountain of youth just awaiting its discovery. It was little more than a normal continuation outward from a process of exponential progress. But of course the exponential progress came to an unexpected end, and consequently the predictions made now look simply naive or superstitious.
We're now currently in our own period of exponential discovery and the fabulous tales of achievements to come are anything but scarce. Of course, this time it'll be different.
Perhaps they operate a combination of biological systems alongside their electro mechanical ones.
Their ship may be locally intelligent everywhere, with that all rolling up to an i9 ish main control system.
Purpose optimized hardware communicating along standardized interconnects could mean lot of hard tasks done in silicon or shared with biological systems too.
They may have decades, centuries old solutions to many hard problems boiled down to heuristics able to run in real time today. Maybe some of these took ages to run initially.
Just thousands? I would expect 100k years at a minimum and even that is only .0007% of the age of the universe. Millions or Billions of years more advanced is not out of the question.
It would be interesting to see how similar technology is among such advanced civilizations, even if they did not compare notes. Does technology eventually converge to the same optimal devices in each civilization?
Given our current extremely primitive state (only about a hundred years of useful electronics) I would be disappointed if we could even imagine what this technology looks like.
They'll likely use optimization laws of nature to get perfect solutions instantly, like what people try to get nowadays in some labs with electricity finding the shortest path/route immediately.
There's also ideas like the Mill processor. Though it's hard to avoid comparisons to Itanium, and how a mountain of money still didn't produce a compiler that could unlock what initially sounded like a better ecosystem.
Seems like the latest Nvidia GPUs aren't really an improvement over the previous ones, but just bigger and proportionally more expensive. So maybe the leveling off in performance is already starting to happen.
There is a lot of room for development before the exponential curve can be carried by the next paradigm: at least for desktop computers we are still decades away from case filling 3D "compute cubes".
The metric is performance per watt per dollar. At the moment the fact is that the amount of compute available per watt-dollar is ridiculously cheap, crypto notwithstanding.
We are not limited by compute resources but by business practices. The organizational cost of software design is where the next gains are, not technological.
Yeah, buying a PC in the early Intel era was a somewhat double-edged sword, because you knew that the next generation would come out in a year or so and it would probably have more than double the performance.
For many years my friends and I had a rule that we wouldn't buy a new computer until it offered at least 4X the speed of our old one. We didn't have to wait all that long.
The URL is a lie but that's not the browser's fault. It's correctly showing you the URL that you requested and the server responded to. If you save the file locally the browser will give it the correct extension.
> This trend of browsers to increasingly lie to me about what I'm looking at is infuriating.
The browser does not lie to you, it just does not show the MIME type of the content.
I do agree that it would make sense for the browser to show the MIME type of the content that is currently displayed, but this is in opposition to the current fashion (in particular pioneered by Apple) of "simplifying" everything.
Transmeta announced their first product, the Crusoe processor, in 2000. Before then they were highly secretive and nobody knew what area they would compete in.
Intel was indeed worried about the laptop market when the Crusoe came out, but quickly adopted the voltage and frequency scaling into their own processors negating some of Transmeta's key technical advantages.
Meanwhile, IBM changed direction as a third-party fab (they dropped bulk CMOS to focus on silicon-on-insulator), leaving Transmeta without a product to sell until they could find a new fab and design their next-generation product for it.
Smaller transistors mean fewer electrons that go in and out every clock cycle, so less power per transistor.
Higher clock frequency means more cycles per second, that is more electrons spent per second thus higher power consumption.
Since clock frequency has stabilised and the total area of a chip is I think not much larger than before, it is expected to see the power consumption stabilising.
I also believe I read somewhere that one of the reasons clock frequency stopped increasing was that the power consumption became too high for the chips to handle the thermal dissipation.
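The first-order relation behind all of this is the usual dynamic-power model, P ≈ alpha * C * V^2 * f; a tiny sketch with arbitrary constants, just to show how a frequency bump that also forces the voltage up gets expensive fast:

    def dynamic_power(activity, capacitance, voltage, frequency):
        # Classic CMOS dynamic power: P = alpha * C * V^2 * f.
        return activity * capacitance * voltage ** 2 * frequency

    base = dynamic_power(activity=0.2, capacitance=1.0, voltage=1.0, frequency=3.0e9)

    # Push the clock 30% higher and assume the voltage must rise ~10% to hold timing:
    boosted = dynamic_power(activity=0.2, capacitance=1.0, voltage=1.1, frequency=3.9e9)

    print(f"relative power at +30% clock: {boosted / base:.2f}x")  # ~1.57x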
> the vast majority of software basically still uses only a single core
And that which does use multiple cores, sometimes only scales well to a few because then other bottlenecks† start to become most significant.
Many things are not so “embarrassingly parallelisable” that they can easily take full advantage of the power available from expanding the number of processing units available beyond a certain point.
--
[†] things no longer being “friendly” to the amount or arrangement of shared L2/L3 cache as the number of threads grows, causing more cache-thrashing; hitting other memory bandwidth issues; or, for really large data, issues as far down as the IO subsystem or network.
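Amdahl's law is the cleanest way to see the ceiling being described here; a tiny sketch, with a 5% serial-or-bottlenecked fraction chosen arbitrarily:

    def amdahl_speedup(cores, serial_fraction):
        # Amdahl's law: speedup = 1 / (s + (1 - s) / n).
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

    for cores in (2, 4, 8, 16, 64, 1024):
        print(f"{cores:4d} cores -> {amdahl_speedup(cores, 0.05):6.2f}x "
              f"(ceiling: {1 / 0.05:.0f}x)")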
The article doesn't really mention one of the biggest problems: cost. These new process nodes are insanely expensive, far more so than before. This is one of the biggest drivers of new packaging technologies (chiplets). Companies simply can't afford to manufacture their entire design on the latest process. There are still newer transistor designs that promise further improved performance but will be even more cost prohibitive.
I'm not convinced that there are any hard EDA problems here, despite the fact that EDA as an industry in general is woefully archaic, especially when compared to the equivalent in the software world. We've been doing multi-layer and/or multi-die packages for a long time. It's more of a project management challenge, having to get different teams (process, packaging, design, verification) to work together earlier on in the design cycle, and more frequently throughout.
Chiplets aren't so much driven by a need for heterogenous processes as much as minimizing the yield impact of defects. Ian Cutress has a good portion covering it in this recent video: https://www.youtube.com/watch?v=oMcsW-myRCU.
In short, if your process produces 20 defects per wafer, and you can fit 100 chips on a wafer, you're going to end up with ~84% yield (i.e., slightly less than 20% loss). If you are able to split that same chip into 2 equal-sized pieces and make twice as many, your yield is now above 90%.
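One common first-order way to model this is a Poisson yield estimate; exact percentages depend on which yield model you pick, but the numbers land in the same ballpark as above (everything here mirrors the hypothetical 20-defects-per-wafer example, not a real process):

    import math

    def poisson_yield(defects_per_wafer, dies_per_wafer):
        # Expected defects per die, assuming defects land randomly on the wafer.
        defects_per_die = defects_per_wafer / dies_per_wafer
        return math.exp(-defects_per_die)

    print(f"monolithic, 100 dies/wafer:   {poisson_yield(20, 100):.1%}")  # ~81.9%
    print(f"split in two, 200 dies/wafer: {poisson_yield(20, 200):.1%}")  # ~90.5%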
AMD has also made them composable in such a way that they have to produce and stock far fewer ICs to fulfill all of their SKUs, which is also another fantastic benefit.
If you're interested in more details on the cost of producing chips on the latest nodes, this is a good back of the envelope sort of breakdown for the Ryzen 7950X. [1] They use a visual die yield calculator to talk through the logic of improving economics using chiplet designs.
The figures on the actual full wafer costs for TSMC 5nm are not public, but analysts say a ballpark around $17K is pretty reasonable. Using all the figures they ballpark, the manufacturing cost of the 7950X die collection, fully packaged, sounds like it averages around $70. So the cost is high as far as large-scale chip manufacture goes, but this is also a part that retails for $600; there's certainly plenty of room to remain profitable at the top.
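As a back-of-envelope: the only figure below taken from that analysis is the ~$17K wafer ballpark; the die count and yield are hypothetical, just to show how wafer cost turns into cost per good die:

    WAFER_COST_USD = 17_000   # ballpark TSMC 5nm wafer cost from the analysis above
    DIES_PER_WAFER = 800      # hypothetical, for a small chiplet-sized die
    DIE_YIELD      = 0.90     # hypothetical

    good_dies = DIES_PER_WAFER * DIE_YIELD
    print(f"cost per good die: ${WAFER_COST_USD / good_dies:,.2f}")  # ~$23.61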
Realistically, R&D costs are the biggest consumer of profit margins still.
Yeah, that video is fantastic. I wonder if the higher costs will place more of an emphasis on running chips at power levels that give better longevity. We can effectively halve the operating cost and double the lifetime of chips by reducing their power levels by 20%. Seems plausible if we don't have technological obsolescence to force depreciation.
Don't chips at stock power levels with reasonable cooling have extremely high resilience for at least 5 years? Most datacenters have no more than a 4 year refresh cycle for servers based on power and space efficiency optimizations.
For example, der8auer did a test [1] with chips like the 5800X with high overclock and high voltage with stress tests for over 4000 hours and came essentially to the conclusion that the chips are likely to endure for at least 5 years under rather extreme conditions. Likely much longer under normal circumstances. The number of systems he's testing isn't statistically significant like you might get by trying this sort of thing at datacenter scale, but it is fairly illustrative.
5 years is probably a fair current estimate, but that also doesn't come for free. The fans in DC servers move a ton of air, which itself further increases power use. I was thinking more on the order of 10-20 years.
Under most circumstances I can't imagine a reason to bother powering on a 10 year old system, short of nostalgia. The costs of running it will quickly eclipse the costs of buying something newer and more efficient.
Of course, there are plenty of edge cases, like needing something bare metal that has a particular sort of software compatibility or IO requirements. Some industrial computers still run 486 chips with ISA buses for this reason. These sorts of systems will have been engineered with longevity in mind from the outset though.
Other edge case, just for fun: embedded style systems like the Raspberry Pi. These are tiny, low power, and can be used for specialty purposes for ages. They are also engineered on nodes and setup in a manner that will likely leave plenty running successfully in 10-20 years' time as it is.
It is really only since we've entered the era below TSMC's 7nm node that longevity has become much of a concern at all. It would take a whole essay to even TLDR the constraints of why that only becomes very relevant in the period where those nodes start to become known as "mature", and this is already enough of a tangent, so I'll just leave this breadcrumb of a presentation on the lifecycle of silicon process nodes:
That last one is very significant. The hyper reflective EUV mask mirrors are incredibly hard to make and cost hundreds of millions. The mask alone no doubt raises chip cost by $10 or so.
Likewise, hundreds of engineers for 3-5 years can run up a couple billion dollars that must be recouped.
Absolutely. Yield is obviously a big part of cost-per-die, but it's not just defect density related. When a node is first released, the process is not as well controlled so you end up with yield loss because the process doesn't match the models (in the extremes) and dies fail at various testing stages. This obviously gets better over time, but it's still a major issue for companies designing in the latest technologies. Anecdotally I'd say there's a ~3X reduction in the "sigma" of the process over 3-5yrs.
Of course these issues have always existed, but wafer costs have never been so high, and yields never so low (both because of increased defect density and worse process control with new transistors).
They use an internal interconnect, similar to how a server which took multiple distinct CPUs would use a motherboard-level interconnect.
In addition to splitting the CPU or GPU into distinct units, you can also take other functions and use different processes for them. For example, in package I/O or L2 cache don't really see the same advantages for newer processes, so you can make these using more established (and cheaper/more available) processes.
You get most of the economic value from the first few divisions. Going from 80% yield to 90% yield shaves about 11% off your cost/unit. Going from 95% to 99% only saves you 4%.
I'm not an expert at how they combine chips. Like I said, for AMD they also wanted their units to be composable with a small number of chips, so they basically have a die with a few cores and a separate die with the I/O and memory controllers, and I believe they have a proprietary communication mesh for connecting them. I think there is some considerable signal/energy overhead to communicating between chips. The cost of masks and interconnects is probably high enough to make a high number of diverse chiplets unviable, but I wouldn't say it's impossible in the future.
This is a fundamental effect because of the Shannon-Hartley theorem[1], which says your communication channel's capacity (i.e. bitrate) has a log dependency on the channel's Signal to Noise Ratio. In a practical wireline communication system, you have a transmitter with noise, a lossy interconnect, and a receiver with noise. As the interconnect gets longer, which is what happens when you go from on-die communication to die-to-die communication, your loss increases. This means you have to reduce your transmitter and/or receiver noise which requires using additional power.
Another way of looking at it is bit error rate [2].
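For a feel of the log dependence being described, here is Shannon-Hartley in a few lines of Python; the 10 GHz bandwidth and the SNR values are invented, just to show how much extra SNR (and hence transmitter/receiver power) each additional chunk of capacity costs:

    import math

    def capacity_gbps(bandwidth_ghz, snr_linear):
        # Shannon-Hartley: C = B * log2(1 + SNR).
        return bandwidth_ghz * math.log2(1 + snr_linear)

    for snr_db in (10, 20, 30, 40):
        snr = 10 ** (snr_db / 10)
        print(f"SNR {snr_db:2d} dB -> {capacity_gbps(10.0, snr):5.1f} Gb/s "
              f"over a hypothetical 10 GHz channel")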
I think you'll be hard-pressed to find concrete numbers since these designs are closely-guarded trade secrets. You might find some examples by searching for wireline transceiver / PHY papers on IEEE xplore, especially at ISSCC (conference) or in JSSCC (journal).
If we start shifting towards specialized hardware configurations to solve specialized problems, how does “move everything to the cloud communicating through fragmented docker images” fit?
It just seems weird to me to see posts of spending lots of money on complicated scalable Kubernetes clusters, but at the same time people are putting their AI computation engines right on die so that they get good cache coherence. Maybe they’re more resolvable than it seems.
K8s GPU nodes are already kind of an example of this-- the app can still access the specialty capabilities of the hardware, we just have to make sure to tell the scheduler "hey run this app on this kind of node with this flavor of capabilities". I could imagine that kind of flavor capability growing and becoming more fleshed out
Except k8s calls it 'taints' which is uh a weird choice.
Taints is a funny name when you look at it like that but taints are actually the opposite half of the story.
If you want a Pod to be scheduled with a certain node you need to add affinity.
The taints are to ensure that your average nginx pod doesn't run on the specialty hardware and block the pods that actually need it. In theory the scheduler could be smart enough to evict the pods that don't need the special node, but IDK if the kube scheduler is smart enough. Even then it may be preferable to have the taint anyway, to reduce the eviction rate.
This is probably one of the biggest challenges ahead, likely more so than making hardware that still somehow scales despite Moore’s law slowing down and tapering out entirely. The increased scale of integration with advanced 3D packaging and chiplets is a nice way forward, but the heterogeneity of those solutions need innovation at the programming model as much as anything else: how do you make such a complex system accessible in a way that the programmer can actually use it efficiently?
As much as I like semiengineering and their articles, they are very hardware and process technology focused and tend to miss the software side of things.
I love buying new and better gadgets like everyone else but if cpu advancements froze after the next generation of ones using the 3nm transistor node, would life on Earth really be that bad?
We’d be in a world where we have no choice but to optimize existing algorithms, software, and hardware designs. And yes, after many years even that would reach its limits. Even so, I still think life would be great for every generation that lives on after that.
I have 10 and 15 year old computers and they work just fine for me. Web browsing, spreadsheets, word processing documents.
I don't use the latest gamer stuff, or have 285-dimensional graphics that rotate in 48 dimensions or whatever.
I'd think that 95% of the computer users are just like me.
Pissed me off... my parents essentially use their computer and smartphones the same exact way - email, browse the web, and that's about it. My siblings decided they had to upgrade and got them new computers and smartphones. WTF? Total waste of maybe $5000. If anyone else has $5000 that they want to spend on nothing, let me know and I will give you my bank routing number and account number and you can just send it to me for nothing in return, too. :P
I would personally be content, but I’m thankful for the advances made possible by faster computing, especially in medicine, and I’d like to see more. If we keep advancing computing power, today’s research supercomputers will be tomorrow’s personal computers in the hands of researchers of all income levels all over the world.
The vast majority of hardware devices haven’t even caught up to 14nm, let alone 3nm which is still an experimental node used by a few customers “at risk”.
It would take decades at the current pace of progress just for everyone to catch up to the current bleeding edge.
I would have retired from IT work by the time that happens!
In other words, we’re already at the stage that many people will continue to see non-stop improvements even if 3nm is “the end”.
Meanwhile Intel says that their 18A node is looking good and ahead of schedule…
I think we’re talking past each other. You’re talking about getting the lagging sigmas of the distribution on today’s fastest retail computers, and I’m talking about advancements and discoveries that will require more than our current leading edge retail capabilities. Distribution of our leading capabilities will always be uneven, and will not push out an old computer at a 1:1 rate, so hard to measure via global mean/median retail computer speed in any middle or lower percentile.
I'm agreeing with you in the sense that even if the bleeding edge didn't progress, we will still have the kind of advancements occurring that you would like to see.
As a random example, full-frame camera CMOS sensors are made with very old processes (relatively speaking), and have very little "logic" on them. If manufactured with 3nm, they could have something like 2,000 transistors per pixel! That would enable seemingly magical "digital imaging" capabilities, such as infinite dynamic range, perfect digital vibration compensation, ultra-high framerates, etc...
I think it would be quite bad actually in opportunity cost. The performance per watt alone would represent a huge energy savings. New chips enable new applications and no new chips would mean that the entire nonlinear economic engine of tech industry would sunset in a few years once every existing application catches up to state of the art. Computationally heavy applications like ML would remain locked in the high tower of large organizations and out of the hands of individuals. VR and AR would probably never arrive to the mainstream.
I think that’s an illusion. The last 50 years of modern computing have proven without a doubt that global energy consumption will far outpace improvements in performance/watt that come with each generation of cpus.
Any improvement in energy usage is wiped out by more kinds of and ever faster gadgets: Desktop computers followed by mobile phones, every possible sized tablet screen, voice assistance speakers, voice assistance speakers with screens, smart watches, smart lights, smart vacuum cleaners, drones, and so on. And with all these smart devices, we have more and more data, which we then need AI to make sense/use of in order to develop new or more useful things. It’s a never ending cycle.
VR and AR never arriving to the mainstream would possibly be true. But our lives are already great today without that.
I think we're still pretty far off from that. Graphics are still held back massively by processing power, and algorithms that significantly could improve visuals would have been developed by now if it were that easy.
Not really, frequency has plateaued precisely because of the crazy power scaling with frequency.
The easily parallelisable nature of graphics means you can always make a faster GPU with more cores. The trouble is, newer nodes aren't reducing power usage per transistor very much, so all the newer GPUs just have a ton more silicon, and therefore much higher power requirements.
Yeah, and the power draw is around the same because the 4090 is 4nm and the 3090 Ti is 8nm. Smaller transistors mean more power-efficient chips, and thus larger improvements in graphics.
I consider all of those as nice-to-haves. We have over 7B people in the world and got to this point without the technologies you mentioned.
We already have the technologies today to allow people to live way longer than would otherwise be possible, to find and use any natural resources the planet has, and to dominate any other organism that could threaten our lives. I’m content with that.
I think it is a big mistake to be content with how crazy primitive we are today.
We can't even operate on a brain tumor that's inside your head - we lack the precision to get in there without damaging the rest of your brain.
When we want to operate at microscopic levels we continually fail.
The mysteries of how and why we age, how many diseases work, remain unknown.
There is so very much left to learn. There may come a day I might feel it's time to be content, but every second that passes I'm a little closer to death.
We should continue to strive for more. May there come a day that we can live for very long periods of time and each individual can have as much as 100 people have today.
Software has become more immature over the last decades because developers lost the engineering mindset and skills to write optimized software like they could back in the days where we only had a couple of MB of ram.
It really does not require new expertise or tech at all to cut latency and load times by 90%. Programmers have to get back to being engineers instead of just duct-taping libraries together or praying that the dumb GC knows what it's doing.
Just going from Python to Java or C# would give us roughly a 95% reduction in energy use. Going from web apps in JS packaged for desktop or mobile to C or C++ would yield a similar gain.
If Python or JS sit idle waiting for user input, are the efficiencies really so different?
Developer time also has energy cost in AC, transport, etc for a human.
What about the fact that the cost of web inefficiency is offloaded to the user? Users pay the power bill for the overcomplicated, slow, unnecessarily CPU intensive front end.
So, at a surface level, what you say is true, but the deeper picture is complicated.
The tooling, too, has gone a certain way... making a native GUI is now slower, less documented, less catered-to than web UI; in my opinion, for most simple to moderately complex UIs, native is going to take twice as long to make look good... but that inverts, in my experience, as the program and UI become more complicated.
It's also strange that the UI itself is so bloated and developmentally time consuming. We lived on the CLI once -- maybe in some not too distant future a killer app can once again show consumers the power of the command line. After all, google/Siri/Alexa do just that.
> If Python or JS sit idle waiting for user input, are the efficiencies really so different?
Nit pick: Python is typically at least an order of magnitude slower than JS, it's a whole different class of slow.
> Developer time also has energy cost in AC, transport, etc for a human.
Yes and fixing bugs in production is especially expensive. Presumably that's why Python is now trying to retrofit static types. There are statically typed languages that are just as terse as Python.
"More-than-Moore has created a compelling case for the implementation-analyses microcosm to transcend across the fabrics of system design, from silicon to package, and even beyond, and more so in the systems companies that are at the bleeding edge of design innovation."
Dating back to the turn of the century, I can still recall the pleasure of building my 1st of many SMP desktops/workstations with an Abit VP6 to game on and work on at home.
In 2022, I would almost expect to see mobo designers and chip manufacturers start rolling out deca-socket designs for compute and automotive use, giving users (okay... just the nerds and geeks) the option of mixing and matching chips to accommodate custom configurations (and unthought-of use cases).
The CPU wall almost reminds me of automotive, where the engines may be hitting the cooling/fuel-system limits of current technologies but still have plenty of optimizations available in unsprung weight, torsional rigidity, coefficient of drag and so on.
It may not address the power efficiency angles, but for many users, having 10 sockets to add optimized compute chips (main cpus, AI accelerators, graphics enhancement chips and a number of other feature-enhanced chips) keeps the R&D money flowing until the industry gets over this next hurdle (they always do).
The author states that the expense of pushing Moore's law, etc. is becoming cost-prohibitive, so take a break and rock some modern PCB designs.
> In 2022, I would almost expect to see mobo designers and chip manufacturers start rolling out deca-socket designs for compute and automotive use
It's called the PCIe bus. And before you say "but it doesn't meet my needs / its latency is too high / its bandwidth is too low", come up with some ideas on how to do that better rather than just saying "add sockets".
Power is nothing without spittle (I'm sure at least one person said it)
I'm not familiar with the lingo, but I was thinking of multilayer sockets, with a higher-bandwidth loop/ring for the primary sockets and the PCIe coming off the CPUs in the traditional fashion. Perhaps I read about that with the Opterons ages ago.
The BeBox (what the Macs could have been if Jobs hadn't been so hellbent on getting back to Apple) had a CPU port.
It was pretty much a northbridge socket/plug to the CPU from outside the case.
Haven't seen anything close to that since. Also, the design of its pseudo-realtime OS would be killer on today's multicores, not to mention the Be kernel C API instead of the side-step-patents-above-all-else API of their bastard BSD. Glad my livelihood doesn't depend on OS X/iOS client-side applications.
I've done a few personal projects with LTSP and think PXE booting is the bee's knees for some uses (obviously not as portable as a laptop running Porteus). Booting remotely was not a priority, but OS exploration has always been exciting.
With more devices (laptops, SBCs, etc.) getting HDMI-In ports, running a compute stick, SBC/Pi or other device externally is also a cool feature. I like KVMs, so if they integrate a keyboard/mouse protocol into the HDMI-In port at some point, it's back to plug-and-play designs instead of RDP, SSH, etc. for the external devices.
Though on the topic of this post, instead of cramming an SoC with the kitchen sink, it seems like they could simplify that dilemma with a deca-socket design, reserving the PCIe bus for traditional expansion.
Even expanding memory channels per socket negates some of the performance penalties under some conditions (bottlenecks can be moved easily and/or will still exist, without the complexity of jamming everything onto one CPU die). That is cool, though!
I did a few quick searches on the quad-CPU Itaniums to no avail, but vaguely recall faster ring buses for sockets (though perhaps just faster for the era).
I can't say that I still have floppies for Leisure Suit Larry, but I can say the 8-bit boom-chacka-mouw-mouw did exist too.
For consumer use, and probably many commercial uses too, I am becoming skeptical about whether users are gaining sufficient benefit from hardware performance improvements.
Software has become so wasteful of resources that the hardware improvements are offset and there is little perceptible improvement for the end user. This is of course not the fault of the chip makers, but the fault of the movement to inefficient web apps and more resource intensive languages and frameworks.
I find myself agreeing with Jim Keller. His stance — from very high up in the chip design industry — is that individual teams at the bottom may be seeing the end of the road, but there are alternate paths being explored by other teams. Often these alternate paths aren’t yet commercialised — mere options — but enable the overall improvements to keep going without getting stuck.
Some examples:
Stacking and joining chiplets will soon enable over 2 TB/s memory bandwidths for ordinary devices, far higher than the 50 GB/s typical of mid-tier DIMM memory.
Storage latency keeps dropping and IOPS keep climbing. Millions of random accesses per second with ~20 microsecond latencies enabled technologies like DirectX DirectStorage. This totally changes the way game engines work and how their art assets are authored.
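One caveat on the storage point: those IOPS only show up if software keeps enough requests in flight. Below is a rough sketch of the access pattern; the file name, block size and queue depth are made up, and a real engine would use io_uring or DirectStorage rather than a Python thread pool.

```python
# Keep many small random reads in flight instead of issuing them one at a time.
import os
import random
from concurrent.futures import ThreadPoolExecutor

PATH = "assets.pak"        # hypothetical packed asset file
BLOCK = 64 * 1024          # 64 KiB per read
QUEUE_DEPTH = 64           # concurrent requests kept in flight

fd = os.open(PATH, os.O_RDONLY)
size = os.fstat(fd).st_size
offsets = [random.randrange(0, max(size - BLOCK, 1)) for _ in range(10_000)]

def read_block(offset: int) -> bytes:
    # pread is positional, so many workers can safely share one descriptor.
    return os.pread(fd, BLOCK, offset)

with ThreadPoolExecutor(max_workers=QUEUE_DEPTH) as pool:
    blocks = list(pool.map(read_block, offsets))

os.close(fd)
print(f"read {sum(len(b) for b in blocks) / 2**20:.1f} MiB in {len(offsets)} random reads")
```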
I think there’s a potential for collaboration between borrow checkers and more aggressively NUMA architectures, but the problem is you still have to run commercial operating systems on any chips you make. That’s less of a problem for embedded systems and controllers, but then you lose a lot of the value as well.
That was one of the few good parts of this article. Software will have to become power-aware. You already see that on phones where software has to schedule tasks with the OS rather than running them whenever it wants.
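As a toy illustration of that idea (class name and interval are made up): rather than every component arming its own timer and waking the CPU whenever it likes, deferrable work is handed to one scheduler that batches it into a single wakeup, which is roughly the discipline mobile OS job schedulers enforce.

```python
# Coalesce deferrable work into one periodic wakeup so the CPU can sleep longer.
import time

class CoalescingScheduler:
    def __init__(self, interval_s: float):
        self.interval_s = interval_s
        self.pending = []

    def defer(self, task) -> None:
        """Queue work that tolerates delay (telemetry upload, sync, cleanup...)."""
        self.pending.append(task)

    def run_once(self) -> None:
        # One wakeup drains the whole batch, then the process goes idle again.
        batch, self.pending = self.pending, []
        for task in batch:
            task()

scheduler = CoalescingScheduler(interval_s=2.0)
scheduler.defer(lambda: print("upload telemetry"))
scheduler.defer(lambda: print("sync mail"))
time.sleep(scheduler.interval_s)  # the sleep is the power win: no polling in between
scheduler.run_once()
```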
I predicted a LONG time ago that once we reached the physical limits, computers would get bigger: starting out with chiplet designs and then moving on to dinner-plate-sized systems such as Cerebras.
Our computers will become larger in size, maybe the size of a fridge in a couple of decades. Power requirements too will increase.
I'm still hoping some wondrous material will be invented which will allow us to shrink chips still further. But it's obvious there's a limit to how far we can go. The theoretical limit is still way off, though; it's several million times the speed of current CPUs.
It seems the bottlenecks nowadays are mainly memory latency/throughput and thermals, the latter being especially problematic for mobile devices with less space for cooling. For memory, although we have higher throughput with DDR5, latency is still high and we can only compensate with larger caches, e.g. AMD's 3D V-Cache.
For larger chips, you get longer wires and may not be able to do things in one cycle. Assuming thermals were not an issue, you could probably cram many cores into one tiny chip, but the bus would be overloaded with too much traffic and long wire lengths. I think eventually cache coherency will be so expensive that we go back to message passing for multicore, or require source/destination node IDs for memory ordering...
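Here's a minimal sketch of that message-passing style (the squaring is just a stand-in for real per-core work): each worker owns its data and communicates only through explicit queues, so correctness never leans on hardware cache coherency between writers and readers.

```python
# Workers exchange explicit messages instead of sharing mutable memory.
from multiprocessing import Process, Queue

def worker(inbox: Queue, outbox: Queue) -> None:
    while True:
        item = inbox.get()
        if item is None:          # sentinel: no more work
            break
        outbox.put(item * item)   # stand-in for real work on core-local data

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    workers = [Process(target=worker, args=(inbox, outbox)) for _ in range(4)]
    for p in workers:
        p.start()

    for n in range(100):
        inbox.put(n)
    for _ in workers:
        inbox.put(None)           # one sentinel per worker

    results = [outbox.get() for _ in range(100)]
    for p in workers:
        p.join()
    print(sum(results))
```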
Not really. Our substrate today is silicon (really n-type silicon). We build a city on top of that substrate. Modern semiconductors have become far more heterogeneous in recent years: our gate insulators are no longer silicon dioxide but high-k dielectrics such as hafnium oxide, and even interconnects now have different compositions, such as cobalt and copper.
But the substrate is still holding us back, and one reason we stay with it is that creating high-quality single-crystal wafers is far more manageable for silicon than for the alternatives.
The solution for a long time has been to use direct bandgap semiconductors such as GaN, AlN, and so on. Silicon is an indirect bandgap semiconductor, meaning its conduction and valence bands are not aligned in momentum space. As a result, electrons hopping between these bands pay a thermal penalty. GaN and AlN don't have that problem, and we now have a plethora of crystals (especially complex oxides) that are all direct, and we have an entire library of bandgaps to choose from.
The problem is that we cannot easily make wafers from these direct-bandgap systems, at least not yet on any commercial scale. Honestly, rather than flushing billions of dollars down the toilet on Zuck's vanity projects, we should have spent the money scaling up Czochralski growth[1] for a whole range of crystal families. But as we reach the limits of reticle sizes and move to chiplet architectures, maybe smaller wafers will work too?
Perhaps computer architecture is destined to become like building architecture: functionally complete, with variation only in form, aside from some minor incremental innovations and the occasional one-off megaproject.
I'm not an architect, but I very, very, much doubt building architecture is complete. Just look at buildings from 30 years ago to now. The differences are stark.
It’s all form. Buildings 30 years ago and buildings today do almost exactly the same things. Sure there have been marginal improvements, but they are marginal.
I don't know, my house was built in 1920 and it's doing fine. Upgrades have been made - heat pumps, new wiring and so forth - but the overall design of the house isn't really that different from one built today. Except that in my house, the 2x4s are actually 2 inches by 4 inches.
That doesn't mean your house would be built like it's 1920 if it were built right now.
Besides, I didn't really mean "normal" houses, but rather large buildings with complex ventilation, sunlight, climate-control and sustainability requirements. There has been a lot of innovation in that space, and it shows in the office buildings that are built today.
They're still predicting the shift away from silicon is another 5 years out. No, really. They've been developing alternatives for 20+ years at this point, and they can make chips, but those chips are far from competitive with silicon in any metric. It's called single-crystal complex oxide, or SiC.
I wonder if part of the recent switch back to statically typed languages is merely a reflection of changing compute requirements. During my career, single core performance has only doubled at best. Scale out has worked… but it hasn’t been cheap.
Or perhaps because knowing/reasoning about at least some properties before running a program is a good thing? Optimisation is only one consequence of static types. Perhaps the reason is that static type systems are getting better and less intrusive.
Some headlines could really use capital letters. I am not a native speaker, and I struggled to understand this sentence until I clicked on the link, and realised it is actually a headline-style sentence...
I wonder, as parallelism is exhausted too, will there be a point of reversion, where the cost of bad management practices to product and performance becomes so big that decision power returns to the experts who craft such systems? As in, the council of engineers votes no on middle management's eternal bloat for company-internal status points? Or else the users switch to engineer-governed systems, as created by open source?
I have a hard time viewing something that can "run out of steam" as a "fundamental law". It seems to me that the true fundamental laws - those of physics - are going full steam ahead, and the reality they describe is catching up to companies' chip designers (and marketers).
I'm a RISC-V designer; I'm afraid it's not a magic bullet for these sorts of issues. What it does do is free lots of designers to work on solutions and build off of each other's work.
Can't we move away from silicon? I've read about countless attempts to design CPUs based on different principles and materials: carbon, optical, and even biologically modeled systems. So far our best results still come from trying to shrink silicon transistors.
With so much invested in silicon, it is hard to boot up a whole new paradigm. In that sense, working with silicon is much easier / lower friction, relatively speaking.
If power, heat and efficiency are valid concerns, we might as well try to optimize the software: use fewer abstractions, go for the code that uses the fewest CPU cycles, use fewer interpreted languages and more AOT-compiled ones.
In general, sure, for things like web pages etc. In fields like robotics people are always desperately trying to improve the efficiency of their algorithms but it really seems like we just need way more powerful silicon.
No... no one is asking questions like that because a chip that consumes no power but still does work would be greater than 100% efficient and would violate the laws of thermodynamics.
Well sure, but exploring the _idea_ of zero power consumption might lead to interesting discoveries in efficiency and power consumption models, even if actually achieving 0% power consumption is impossible.
Yes, if the problem is viewed broadly, eg. passive packages like RFID tags. Power consumption has been a focus area in chips since forever.
For active chips, it's not possible if the problem is viewed narrowly. However, it's possible to have a package that contains both power-generation capability (e.g. solar, kinetic) and highly power-efficient, sporadic processing, which probably exists already. Major application areas would be long-lived deployments (e.g. animal tags, probes, implants, wearables).
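To make that duty-cycle trade-off concrete, here's a back-of-envelope sketch; every number is made up, just in a plausible range for a small harvested-power tag.

```python
# Rough duty-cycle budget for a harvested-power device (illustrative numbers only).
harvested_uW = 100.0   # average power from a small solar cell or kinetic harvester
sleep_uW     = 2.0     # MCU plus radio in deep sleep
active_mW    = 15.0    # MCU awake, sampling and transmitting
burst_ms     = 50.0    # length of one wake-sample-transmit burst

burst_uJ  = active_mW * 1000.0 * (burst_ms / 1000.0)  # mW -> uW, ms -> s
budget_uW = harvested_uW - sleep_uW                    # what's left over for bursts
period_s  = burst_uJ / budget_uW                       # minimum spacing between bursts

print(f"each burst costs {burst_uJ:.0f} uJ")
print(f"sustainable duty cycle: one burst every {period_s:.1f} s")
```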
Going from pointing out CPU and memory hardware problems to declaring, forcefully, the solution can only be Java, of all things, is one of the more bizarre comments I've read here lately. Especially because there is nothing backing up the claim.
Because you need a VM with GC on the server so it doesn't crash and can share memory between cores properly.
On the client you can go lower since deploying new buggy code only affects some of the clients and not the entire server (unless you deploy completely untested patches).
- Rust segfaults.
- Go has no VM.
- WASM has no GC.
How exactly are GC and VM crucial to "not crashing" and "sharing memory"? And what exactly do you mean by "not crashing"? Crashing is a broad term, and I assume you mean C++-style-crash-because-of-type-safety-errors.
Go does not run on a VM, yet is type-safe. Rust can indeed segfault, but only if you (ab)use the "unsafe" mechanism, which is explicitly labeled as not safe. As long as you stick to safe Rust, segfaults are impossible.
Could you explain your reasoning to me in a little more detail, please?
"While I'm on the topic of concurrency I should mention my far too brief chat with Doug Lea. He commented that multi-threaded Java these days far outperforms C, due to the memory management and a garbage collector. If I recall correctly he said "only 12 times faster than C means you haven't started optimizing"." - Martin Fowler https://martinfowler.com/bliki/OOPSLA2005.html
"Many lock-free structures offer atomic-free read paths, notably concurrent containers in garbage collected languages, such as ConcurrentHashMap in Java. Languages without garbage collection have fewer straightforward options, mostly because safe memory reclamation is a hard problem..." - Travis Downs https://travisdowns.github.io/blog/2020/07/06/concurrency-co...
"Inspired by the apparent success of Java's new memory model, many of the same people set out to define a similar memory model for C++, eventually adopted in C++11." - https://research.swtch.com/plmm
All of those sources compare Java to C and C++, but you mentioned more languages than C and C++. Furthermore, your third source even admits that it's possible to implement a similar memory model in C++, which goes against your claims.
Regardless, I wasn't asking for sources, I was asking for elaboration. What exactly does VM and GC have to do with crashing and memory sharing - in other words, how exactly does a lack of VM and GC imply crashing and inefficient memory sharing?
https://i0.wp.com/semiengineering.com/wp-content/uploads/Pic...
(not really a png, apparently, but a webp file)