This graph in the article is pretty fascinating: https://i0.wp.com/semiengineeri...

mrandish · on Nov 1, 2022

My entire career in computers spans the 40 years in that graph. The constant leaps in fundamental speed were exhilarating and kind of addictive for technologists like myself. As the rate of progress has fallen off over the past decade it's been sad to see the end of an era.

I'm sure speeds and capabilities will continue to increase, albeit much more gradually, but significant gains are going to come slower, harder and at greater cost. The burden will have to be shouldered by system architects and programmers in finding clever ways to squeeze out net gains under increasingly severe fundamental constraints (density, leakage, thermals, etc).

Back when I started programming as a teen in 1980 with 4k of RAM and ~1 Mhz 8-bit CPUs, knowledge of the hardware underneath the code and low-level assembly language skills were highly valuable. Over the years, the ability to think in instruction cycles and register addressing modes grew anachronistically quaint. Now I suspect those kinds of specialized 'down-to-the-metal' optimization skills may see a resurgence in value.

sitkack · on Nov 1, 2022

> skills may see a resurgence in value.

I think it is the opposite. I have almost as much experience as you, I started a little later, didn't get serious until my teens with 68k assembly language and custom chip programming on the Amiga 500. Not all nostalgia, some of it germane context.

I think it is important to have a mental model of the hardware so that the architecture of the program has some mechanical sympathy. But the ability to think abstractly is more important, that is what allows Moore's law to be realized. Our compute topology is changing and if the perf curve is to continue to be exponential, our code and more importantly the expression of our ideas has to be able to exercise 30B transistors today, and 150B in 8 years. Knowing how to compose neural networks is one of the new skills that is akin knowing how to shave off cycles in the 80s. Mod playback, Doom, Quake, mp3 decompression, emulation all redefined our relationship with computing.

The Amiga had this custom hardware for doing bitblits and sprite compositing, it could do these trippy multi-layered backgrounds that used parallax to give it an Arcade like 2.5D rendering (wow that sentence, not fixing it). These had a bunch of registers you had to muck with, I only ever called them from assembly, I knew C but it just felt more natural to do it in asm defined files. My point is, you can do the same things using some high level garbage collected code. In Python or JS, you could implement Quake, using naive algorithms. No asm, just regular code, no custom memory copying and compositing hardware, just assignment statements in a dynamic, GCd language.

A Spellchecker Used to Be a Major Feat of Software Engineering https://news.ycombinator.com/item?id=3466388

The programmer that can code an awesome parallax demo using numpy arrays is not going to be the next Carmack. The programmer that can compose 3-ai models to make something we have never thought of is going to make Quake or some other piece of software that changes our relationship with computing and Moore's law. Abstraction gets us there.

djmips · on Nov 1, 2022

I agree with the parent of your post. I work in a field where Moore's law gets artificially arrested for often a decade at a time - console games - and we no stranger to being critically aware of how much memory we are copying around - we will reach for hand coded SIMD math and we stare at our shader assembly looking for more performance. You should see what some do to get top line performance in collision detection. It even leaves me a bit sweaty... I'm not discounting what you conjecture about the next Carmack being in the machine learning arena - that's how I feel too, but I still strongly believe that we will see more demand for programming that can eke our performance with what we have.

urthor · on Nov 1, 2022

Physical simulation is unique due to latency requirements. The impossibility of using the data center is the common denominator in high performance programming.

In my field, Spark, functional programming for data parallelism, few if any problems of Moore's law ever truly eventuate.

"Compute bottlenecks" are so uncommon. Databricks has almost no lines of Scala written, SQL/Python are "fast enough." Commoditization, "good enough" libraries, packaged in SQL/Python for the lowest common denominator.

Carmack's genius of the inverse square misses the point.

Carmack's genius was the video game Quake itself.

The mathematical brilliance, the high performance programming, was genius applied to overcome a bottleneck.

(And what temporary genius. Contrast Carmack with Unity).

Originality, usefulness. Imagination meeting relevance, is the engine that powers software.

Panzer04 · on Nov 1, 2022

But within reason, these are areas where huge returns can be made with higher performance programming as opposed to speed of development - a 10% performance increase can save stupid amount of money on hardware - and with hardware lasting longer I think there will be an increasing focus on that.

bjourne · on Nov 1, 2022

When I play console games on my Xbox 360 the biggest annoyance by far is the loading times. You run around in Skyrim and you enter a house so you have to wait 30 seconds for the content to load. Then you leave the house and have to wait 30 seconds again. My point is that the relevant performance metric isn't speed of number crunching anymore - it is speed of transporting data from one part of the system to another.

adamc · on Nov 1, 2022

DirectStorage is an attempt to address that, no?

syntheweave · on Nov 2, 2022

I believe a critical difference between the high performance of now vs yesteryear is the degree to which it's a design problem vs an implementation problem.

When writing 6502 assembly, you have "tricks" galore. You do have a design trade-off to make: memory vs CPU cycles, and when looking at algorithms in really old programs, they often dispensed with even basic caching to save a few bytes. But a lot of the savings came from gradually making the program as a whole a tighter specimen, doing initializations and creating reports with just a few less instructions. The "middle" of the program was of similar importance to the design and the inner loops, and it popularized ideas like "a program with shorter variable names will run faster" or "a program with the inner loop subroutines at the top of the listing will run faster". (both true of many interpreters) An engineer of this period worked out a lot of stuff on paper, because the machine itself wasn't in a position to give much help. And so the literal "coding" was of import: you had to polish it all throughout.

Today, the assumption is that the middle is always automated: a goop of glue that hopefully gets compiled down to something acceptable. Performance is really weighted towards the extremes of either finding a clever data layout or hammering the inner loop, and to get the most impactful results you usually have a little of both involved.

The hardware is in a similar position to the software: the masks aren't being laid out by hand, and they increasingly rely on automation of the details. But they still need a tight overall design to get the outcome of "doing more with less."

And the justifications for getting the performance generally have little to do with symbolic computation now: we aren't concerned about simply having a lot of live assets tracked in a game scene(a problem that was still interesting in the 90's, but more-or-less solved by the time we started having hundreds of megabytes of RAM available), we're concerned about having a lot of heavy assets being actively pushed through the pipeline to do something specific, which leans towards approaches that see the world in less symbolic or analytical terms and as more of a continuous space sampled to some approximation. Which digital computing can do, but isn't the obvious win like it once was.

urthor · on Nov 1, 2022

I'd cheekily argue.

The video game industry has downloaded more memory leaks personal machines than all the other domains of software combined. So many lines of terrible C++ have been written...

The importance of Moore's law falls flat in front of good old "bugger good code, Morrowind's rebooting the Xbox."

adw · on Nov 1, 2022

The relevant skills have more in common with the early 2000s supercomputing community than the mod scene. We are all data-parallel distributed now.

apatheticonion · on Nov 1, 2022

I love your comment. I can only imagine how thrilling it would have been in the early days to see order of magnitude improvements in generalised single threaded computer performance every couple of years.

Today, as it happens with all fields that become more complex over time, excitement is found in more nuanced areas.

Hardware has become task specific and that makes it exciting to different niches for different reasons.

You mention the idea of thinking in cycles and that concept is quite appealing to me. I believe the lack of focus on squeezing performance is a symptom of the accessibility of modern application development combined with the fact that most commercial products wouldn't see a financial benefit to delivering computationally efficient applications.

I do wish modern applications were more efficient, but that's a fool's errand as I don't see companies like Spotify rewriting their desktop client in 5 or 6 different native UI kits. Vendors like Microsoft and Apple will never collaborate on a common UI specification outside of web standards, so we are forced to suffer through Electron apps. Heck, Microsoft can't even figure out what UI API it wants to offer for Windows.

That said, if you're interested in computer science, we are only just uncovering novel approaches on how languages can allow engineers the ability to ergonomically leverage parallel computation. We see this in languages like Rust and Go - both of which are not perfect but there are so many lessons being learned here.

To me, the software engineering and language design world is unbelievably thrilling right now.

I do think and wish that large companies who own the platforms would work together more to avoid this standards mishmash application developers must contend with in today's landscape as it would help facilitate greater accessibility to writing efficient cross platform client applications that aren't written using web technologies.

bluGill · on Nov 1, 2022

These days cache is more important than registers. For typical n linear search beats the pants off of binary search just because linear search is cache friendly.

Modern optimizing compilers almost always to a much better job of micro optimization. Humans are much better attack the big picture making code fast with changes that cannot be safely made by the compiler because the algorithm isn't equivalent in all cases.

Even in 1980 programmers knew that optimization was best done at a high level. The low level stuff just had more value when compilers were not good.

otabdeveloper4 · on Nov 1, 2022

> I'm sure speeds and capabilities will continue to increase

Why? Seems like a strange quasi-religious belief. For example, jet airplanes are getting slower, not faster with time, and that's not a bad thing.

morbia · on Nov 1, 2022

High performance computing will drive demand for faster hardware, for example in machine learning. It is extremely computationally intensive and expensive to train large NLP models. The big companies in this game have a lot of money to invest in bringing those costs down, and in turn train better models.

That said, I don't see a reason why speeds will increase significantly on personal devices. We're seeing a situation now where personal devices are really 'fast enough' for normal use cases. Instead the focus is more on improving efficiency and battery life.

davidkuennen · on Nov 1, 2022

It depends. I dream of a world where your Smartphone is also your personal computer and you can just project everything from it using AR wherever you are. In that case they have to improve on both.

ideamotor · on Nov 1, 2022

Apple seems to be latching onto the idea users need to run ML on their consumptive devices, as opposed the cloud, and I don’t believe it. I think you agree. Yet in my opinion, if anything they want the appearance of that necessity, as expressed in loss of efficiency and battery life for older devices to sell new ones.

otabdeveloper4 · on Nov 1, 2022

By "ML" you really mean "neural networks", and ML is like Bitcoin - there's still no good business use for them, even after all these years.

(Business probably just wants Bayesian inference instead, but that's too hard, let's go hardware shopping instead.)

nl · on Nov 1, 2022

As someone who works in crypto[1] but used to work in ML your comments about neural network based ML coudn't be further from the truth.

Many businesses are seeing real, measurable impacts from NN based software that would be impossible without it.

[1] agree with your comment, not much real business use for it, but I wanted to work in it to be sure

otabdeveloper4 · on Nov 1, 2022

> Many businesses are seeing real, measurable impacts from NN based software that would be impossible without it.

Citation needed. Decision trees are still state of the art.

nl · on Nov 2, 2022

Um wow.

How do you do anything vision related with decision trees? Or anything beyond n-grams with text?

But here's some citations as requested:

https://casetext.com/blog/game-changing-ai-litigators/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6512995/

https://possibility.teledyneimaging.com/advances-in-ai-for-i...

https://www.skinvision.com/

https://www.lifewhisperer.com/

etc

(I'm pointing at these particular fields because I personally have worked on NNs in applications in these fields, but there are plenty more)

chimprich · on Nov 1, 2022

I don't understand this comment. ANNs are being used everywhere - image recognition, voice recognition, document classification... I can only see this use increasing for the foreseeable future.

cultofmetatron · on Nov 1, 2022

> there's still no good business use for them, even after all these years.

aside from fraud detection, autonomous vehicles, language translation, facial recognition, voice preproduction and market insights of course

eru · on Nov 1, 2022

Google spends a lot of money on ML. Are they idiots who throw away money?

hajile · on Nov 1, 2022

Google kills tons of very expensive projects. Facebook spends a lot on their Metaverse, but that doesn’t make it good. Tons of companies spend on terrible ideas.

They only difference with Google or Facebook is that they’re big enough to absorb the losses.

This isn’t to say that ML is a dead end, but instead to point out thatjust because they are investing a lot doesn’t make it good.

eru · on Nov 1, 2022

Well, Google and Apple are also putting ML specific processors into their phones.

ori_b · on Nov 1, 2022

Quite often, yes.

ML is unlikely to be one of those places, but appealing to the efficiency of large, bureaucratic companies is a poor argument.

api · on Nov 1, 2022

Jets are getting more energy efficient and cheaper to operate. We are just optimizing for things other than speed.

TheCondor · on Nov 1, 2022

I’m just a few years younger than you and have had similar experiences. This is off topic, but when was the last “magical” new computer experience for you? For me, it was an M1; after seeing how good Intel had been for so long, everything that they had vanquished and then AMDs recent run, I just couldn’t see a non-x8664 part really performing outside of some IBM systems in special cases. That little m1 SoC blew me away with its consistently great performance and power use. I’m not sure it’ll be the same with the M3 and beyond. It was a taste of that old school new computer feeling though.

omikun · on Nov 1, 2022

The first computer I used was a Pentium Pro 233mhz and I remembered how fast things were moving every year for at least a decade before it slowed to irrelevance. The M1 was long time coming. I remember back in 2013 when the iPhone 5S came out, Anandtech showed how it matched Atom's perf in a few web benchmarks at much lower power. Combined that with mw level idle power, it was obvious they would be very competitive in the pc space. That was also the year Apple called their chip "desktop level." I remember thinking back then how amazing I could FaceTime for hours on a passively cooled phone but yet can barely Skype for thirty seconds before the fan spin up on my Mac. Always thought it was the smaller screen, never made the connection it was the SoC that was the key difference.

Arrath · on Nov 1, 2022

For me it was the upgrade from spinning platters to a SSD. I was giggling as I restarted my computer a few times just to watch it almost instantly get to the login screen.

torginus · on Nov 2, 2022

I am very very doubtful that people will once again start caring.

Even a decade ago, it was known that hardware gains wouldn't be as spectacular as before. It was predicted that this would lead to rise of specialized programming models such as GPGPU, DSPs, more focus on optimization, with a particular eye to hardware architecture, memory access patterns etc.

What actually happened?

Everything runs in the browser buried under six layers of Javascript and talks to a bazillion servers running microservices and passing JSON over HTTP to each other.

People care about optimization even less today than they did a decade ago.

samstave · on Nov 1, 2022

Dude, in 1998 we at intel had a 64-core running system.

but its the microns to nm circuit size that really proved out, not macro cores... except that once they solved that, scaling CORES is what really gave way - going from nm to um...

sylware · on Nov 1, 2022

Maybe some critical code paths will be assembly optimized (cf dav1d) because speed and efficiency, but, now the real issues are mostly at the software level where toxic planned obsolescence is going rampant, that fueled by the big tech companies steered by vanguard and blackrock (apple/microsoft/google/etc).

The only shield against that, some would think open source is the key, but actually it is "lean" open source, SDK included. Kludge, bloat, planned obsolescence are no better in the current open source world than in the closed source world.

I am a "everthing in risc-v assembly" (with a _simple_ and dumb macro preprocessor only) kind of guy (including python/lua/js/ruby/etc interpreters). The main reason for that is not to be "faster", but to remove those abominations which are the main compilers from SDK stacks. Some sort of "write assembly once/run everywhere" (and you don't need a c++7483947394 compiler).

rolenthedeep · on Nov 2, 2022

I agree, but I also think we need a fundamentally new paradigm.

It's very important that we as programmers have a good mental model for how the machine works. Abstractions are cool, but it is important to be aware of how your data lives in memory and how the cpu acts on your code, but everything we've been taught in the last few decades is almost irrelevant.

Almost all of us think and write code sequentially. Even with multithreading, your program is generally sequential, and the cpu just doesn't work that way anymore. With all the fancy whizbang branch prediction and superscaling and whatever other black magic, the cpu is fundamentally not sequential.

As a result, compilers are becoming enormous hulking beasts with millions of lines of code trying to translate sequential programs into parallel ones. This kind of defeats the purpose of us having that mental model of the machine. The machine we think we know is not the machine that actually exists.

We need a new set of inherently parallel languages. Similar to the way we program GPUs these days.

The modern cpu is orders of magnitude more complex than anything we've seen before. We need new mental models and new programming paradigms to extract performance the way we used to on sequential processors.

Even for embedded applications, microcontrollers increasingly feature things like multiple instructions per cycle, branch prediction, and multiple cores are much more common these days.

I think we're stuck in a shitty place in between two wildly different worlds of computing. We aren't willing to make the leap to the new, so we live in this rapidly crumbling ecosystem trying to adapt 50 year old code to superscalar hyperthreading gigacore x86 processors.

The amount of wasteful code and technical debt in every one of the systems underpinning our society is truly unimaginable in its scale. There is no path forward from here except to burn it all down and begin again with a fundamentally new way of looking at things. Otherwise, it's all going to come crashing down sooner or later.

rbanffy · on Nov 2, 2022

> As the rate of progress has fallen off

I don't quite feel that - one side of it is that my current computers cover my necessities well enough, but it's still quite impressive how even more instantaneous is the boot of a new computer in comparison to my daily drivers. For the rest, computers have been "fast enough" for me for some time now.

Maybe I should move to big data and machine learning...

> Back when I started programming as a teen in 1980 with 4k of RAM and ~1 Mhz 8-bit CPUs

I really miss those days. OTOH, just like my modern laptop, my Apple II could cold-start (from disk!) in 2-ish seconds.

agumonkey · on Nov 1, 2022

It's gonna need a new 'bare metal' mental framework since it's always multicore or gpu arrays programming.

zaptrem · on Nov 1, 2022

This graph shows transistors basically maintaining pace and completely disregards multi-core performance. Of course single core perf will rise more slowly when a chip now has 8-64x as many cores.

mrandish · on Nov 1, 2022

> This graph shows transistors basically maintaining pace...

I'm no expert in silicon scaling but from reading technical papers, my (naive) understanding is that transistor density has almost kept up but now that scaling comes with increasingly stringent design constraints which architects must make trade-offs over. Broadly speaking, things like "You can have 2x last gen's density but they can't all be fully powered on for very long." That's a greatly simplified example but much of what I've seen has been far "thornier" in terms of interacting constraints along multiple dimensions.

My sense is that in the 90s we usually got "denser, faster AND cheaper" with every generation. Now we're lucky to get one and even that comes with implementation requirements which can be increasingly arcane. My understanding is that different fabs are having to roll more of their own design libraries which embody their chosen sets of trade-offs per node. In addition to limiting overall performance and being harder to design, this apparently makes reusing or migrating designs more challenging. While certain headline metrics like node density may appear to be scaling as usual. The reality under the hood is more complex and far less rosy.

Dylan16807 · on Nov 1, 2022

> My sense is that in the 90s we usually got "denser, faster AND cheaper" with every generation. Now we're lucky to get one

You can still get all three, though you can only pick "cheaper" so many times before you fall off the mainstream product stack.

Two generations ago in 2019, the 3700X launched at $329. This generation, the 7600X launched at $299.

The 7600X has fewer cores, but they're about 50% faster individually and 25% faster total.

And it's N5 instead of N7.

sitkack · on Nov 1, 2022

You made me think that maybe computing is a deflationary force (I am not a libertarian, this isn't some free market bro idea, I think)d, the more that can be subsumed by computation, the more things that can get cheaper over time not more expensive, even if the face of rising material costs.

The relative price of steel has remained flat, while the steel performance has greatly increased.

https://www.metalbulletinresearch.com/Article/3532290/Commod...

Between material science and cheaper compute, we can build higher tech parts and techniques.

The cycles/consumed/per/person/per/year is an exponential, what are some important points on that curve? When the computation to design something is on the order as the same amount of energy to create it?

This interesting, https://www.in2013dollars.com/New-cars/price-inflation/1980-...

You could buy a Honda Civic new in 1980 for 5000$, that would only be just under 10k in todays dollars. What 1980 Honda Civic quality car can you buy today for 10k? Or am I a being nostalgic.

And look at the bump in inflation during the recession, https://blog.cheapism.com/average-car-price-by-year/#slide=6... of car prices. Was the 2008 recession triggered by excessively inflated car prices? Like causing a bubble in a pipeline, an economic embolism.

Current average price has dropped 10k$ from 35k to 25k in the years since 2008.

eru · on Nov 1, 2022

Productivity improvements in the economy are indeed deflationary. That's a general idea in economics.

https://cdn.mises.org/Less%20than%20Zero%20The%20Case%20for%... is an interesting book on the topic. (Ignore for a moment that it's hosted by mises.org..)

Some people think that deflation is bad for an economy. That's a bit confused.

What's bad for the economy is a fall in aggregate nominal spending. That one leads to recession and unemployment.

If prices fall, but total spending stays stable, that's fine.

sitkack · on Nov 2, 2022

So it sounds like we need to do more of nothing but employ a lot of people doing it and we can be carbon negative AND have full employment.

We figure out how to turn atmospheric carbon into cheese. Take CO2 -> C12H22O11

eru · on Nov 3, 2022

Could you please try to explain what you want to say with less snark? I'm a bit confused.

Paying people to do nothing gives you nothing.

Full employment isn't an end in itself, but it's useful because it is typically related to things we do care about. Employing people to do nothing is like fiddling with the speedometer of your care in order to 'go faster'. Or relabeling your amplifiers to go to 11.

You can sort-of turn atmospheric carbon into cheese. Have grass capture the carbon, and a cow eat the grass. That's totally doable, just not viable or efficient if your goal is to capture carbon at scale.

(If your goal was to go carbon negative at all costs, you could instate a whooping big carbon tax, and let the economy figure it out.)

sitkack · on Nov 3, 2022

Right now our economy basically runs on carbon at the core. We make stuff, move stuff and emitting carbon is necessary. If we switched our economy to owing and moving information, then we could still have full-employment, move money in the ecosystem while from the viewpoint of a materialist, just be moving useless bits around.

I think we already have a lot of high paying jobs in the economy that don't do much and pay people to do nothing (of value). We should absolutely spread that around.

mrandish · on Nov 1, 2022

> You could buy a Honda Civic new in 1980 for 5000$

Can confirm... because my parents bought me a new Honda Civic that year for college and that's what they paid.

StillBored · on Nov 1, 2022

Which is great if you have a traditional server application servicing a lot of independent requests, or giant linear equations that can be solved in parallel.

OTOH, the graph has an amadal's law section, which for many tasks is pretty out of steam (aka desktop web browsing/javascript JIT/etc).

I'm not going to be so stupid as to say 8 cores should be enough for anyone (while attached to a machine with 128) but you have to wonder if the stable diffusion style apps running on your desktop are going to be mainstream, or isolated to the few who choose to _need_ them as a hobby or a smaller part of the public that uses them for commercial success. AKA, I can utilize just about every core i'm given with parallel compiles, or rendering a 4K video, but I'm pretty sure i'm the only one in my immediate family that needs that. My wife in the past might have done some simulation work, but these days the heaviest thing she runs on her PC is office products.

This really gets back at the Arm big.little thing, where you really want 99% of your application usage to run on the big cores. The little cores only exist for background/latency insensitive tasks, and the odd case where the problem actually can utilize a large number of parallel cores and needs to maximize efficiency in the power envelope to maximize computation. AKA throw a lot of lower power transistors at the people rendering video/etc, and leave them powered off most of the time.

AKA, put another way, the common use case is a few big powerful cores for normal use, playing games, whatever with one or two high efficiency processors for everything else and a pile of dark silicon for the rare application that actually can utilize dozens of cores because its trivial to parallelize and doesn't work better being offloaded to a GPU. I suspect long term intel was probably right with larrabee, they were just a decade or two early.

So, economically I don't see people buying machines with a couple hundred cores that sit dark most of the time. Which will drive the price up even more, and make them less popular.

eru · on Nov 1, 2022

> I'm not going to be so stupid as to say 8 cores should be enough for anyone (while attached to a machine with 128) but you have to wonder if the stable diffusion style apps running on your desktop are going to be mainstream, or isolated to the few who choose to _need_ them as a hobby or a smaller part of the public that uses them for commercial success. AKA, I can utilize just about every core i'm given with parallel compiles, or rendering a 4K video, but I'm pretty sure i'm the only one in my immediate family that needs that. My wife in the past might have done some simulation work, but these days the heaviest thing she runs on her PC is office products.

See GAN Theft Auto at https://www.youtube.com/watch?v=udPY5rQVoW0

Someone trained a neural network to convert controller input into video output that simulates the game Grand Theft Auto.

If technology keeps improving, I expect many future games will be such 'dreams' of neural networks.

You are right that running Word or Excel won't really benefit from more cores.

therealcamino · on Nov 1, 2022

Cause and effect is backwards there. Designers only went to multicore because single core performance improvement was leveling off. It's not that people wanted multicore systems and were willing to sacrifice single core performance to get it.

ReptileMan · on Nov 1, 2022

Well we wanted multicore, but it was mostly because windows loved to become irresponsive on single core. I think that from consumer point 2 cored circa 2006 were enough. 4 is probably the absolute maximum.

ummonk · on Nov 1, 2022

How does it disregard multi-core performance? As you said, it's showing the transistor counts going up, and it's also showing the rise in the number of logical cores.

Rufbdbskrufb473 · on Nov 1, 2022

The missing thing that's critical for most multi-core performance use cases is memory bandwidth. Maybe not easy to summarize on a graph like this, but for any workload that can't fit within L1 cache, you're unlikely to get close to linear performance scaling with cores. Sometimes a single core can fully saturate the available memory bandwidth.

MereInterest · on Nov 1, 2022

Back in grad school, one of the analysis programs I used dated back to the mid 70s. The original paper gave a performance metric for a test program, which I compared to the runtime on a Chromebook running Linux. I was curious how closely that scaled with Moore's Law, and computed "initial_release + (1.5 years)*log2(initial_runtime/current_runtime)". That is, assuming that the change in program speed has increased due to hardware improvements, and those hardware improvements follow Dennard scaling, what year is it?

This (admittedly very rough) measurement ended up giving 2003. It was wrong by over a decade from the actual date, but correctly gave the date at which clock frequencies stopped improving.

Traubenfuchs · on Nov 1, 2022

More depressing -number of cores will also level off eventually and where does that leave us then?

Will we actually have to optimize our code instead of pushing each hello world request through hundreds of function calls?

I also assume some niches are actually already 90%+ optimized. Once the cores level off it‘s stagnation here too.

Frost1x · on Nov 1, 2022

>More depressing -number of cores will also level off eventually and where does that leave us then?

Short of breakthroughs (e.g. quantum and currently unknowns), the only clear path is less generalized architectures and more specialized chips. As you move more towards ASICs from general architectures you get improved performance, reduced power, and so on.

We've lived in the era of software where hardware was abundant and cheaper than an engineers time. Throw more hardware at it and make sure you have generally optimal algorithms in most your run paths. That's going to change more and more and I suspect we're going to have to start rethinking or redeveloping some layers of abstraction between current software and hardware.

As it stands now we're building more and more complex things atop weaker intermediary layers of abstraction to save time and meet budgets but that's going to have to be revisited in the future and the inefficiency debts we've been building up will need to be paid down. Clear code will become less of a top priority when clever optimizations can be added in that may not be so clear. We're still many many year away from this but that's my prediction.

urthor · on Nov 1, 2022

Not necessarily a problematic trend.

It takes thousands and thousands of engineers to produce a general purpose chip.

It takes... one smart lady to optimize a widely used library for FPGA acceleration.

pbazarnik · on Nov 1, 2022

The "cores" are becoming more specialized and optimized for domain specific tasks. Compiler technology advancements are needed to take advantage of such heterogenous architectures in a transparent way. LLVM MLIR started that already.[1,2] The alternative is being stuck with each silicon vendor's proprietary solutions like CUDA.

[1] https://mlir.llvm.org/ [2] https://www.theregister.com/2022/04/04/compiling_the_future/

mschuster91 · on Nov 1, 2022

I'd guess we get more hardware acceleration. In classic computers (PCs, laptops, servers), for stuff like audio/video codecs, that's been available for decades, but I'd say the next big push will be ethernet/wifi accelerators that do stuff like checksum calculation/verification, VLAN tagging or even protocol-level stuff like TLS in the chip itself - currently, that's all gated for expensive cards [1], I'd expect that stuff to become mainstream over the next few years. Another big part will be acceleration for disk-to-card data transfer [2] - at the moment, data is being shifted from the disk to RAM to GPU/other compute card. Allowing disks to interface with compute cards will be a lot of work - basically, there needs to be a parallel filesystem reader implementation on the disk itself, on the DMA controller or in the GPU, which is a lot of effort to get done right with most modern and complex filesystems - but in anything requiring high performance removing the CPU bottleneck should be well worth the effort.

Mobile is going to be more interesting because of power, space and thermal constraints and a lot of optimization already being done because unlike on classic computers vendors couldn't just go and use brute force to get better performance, and there is a bit of an upper cap on chip/package size as well. Probably we'll see even more consolidation towards larger SoCs that also do all the radio communication stuff if not on the same chip then at least in the same package, so the end game there is one single package that does everything and all that's needed on the board are RF amplifiers and power management. All the radio stuff will move to SDR sooner or later, allowing for far faster adoption of higher bandwidth links and with it, a reduce in power consumption as the power-expensive RF parts have to be powered on for less time to deliver the same amount of data.

[1] https://docs.nvidia.com/networking/display/FREEBSDv370/Kerne...

[2] https://developer.nvidia.com/blog/gpudirect-storage/

rjsw · on Nov 1, 2022

Network offload has been available in low-cost controllers for a long time, TLS isn't common though.

It could be better if network controllers just had a documented local CPU, then the firmware could be extended over time to add new features.

Mistletoe · on Nov 1, 2022

What if aliens show up and they are thousands of years ahead of us and they don’t have anything much more powerful than an i9 running their UFO?

bee_rider · on Nov 1, 2022

Who knows what sort of tech aliens would have? I don't think this whole foray into general purpose computing was necessarily pre-destined. Maybe their whole system could look more like a bunch of strung-together ASICs. "You made your computers drastically less efficient so that anyone could program them? Why would you want your soldier-forms and worker-forms to program computers? Just have the engineer-forms place the transistors correctly in the first place, duh."

rl3 · on Nov 1, 2022

>Who knows what sort of tech aliens would have? I don't think this whole foray into general purpose computing was necessarily pre-destined.

It's sometimes fun to think that technology is a function of the intelligence that creates it.

What if the aliens have some vastly different perception of reality than us? Things we consider obvious to them may not be, and vice versa. The underlying desires and motivation different.

Humans for example, often tend to invent things for the sake of it. Imagine a species that doesn't do that. Or an organic FTL drive conjured into existence over eons via distributed intelligence. Weird.

checkyoursudo · on Nov 1, 2022

> Or an organic FTL drive conjured into existence over eons via distributed intelligence. Weird.

E.g., What if the first aliens to find us are hyperintelligent slime molds, whose entire existence is predicated on finding the shortest distance between two points in higher-dimensional space and then traveling there to see what there is to eat?

Traubenfuchs · on Nov 1, 2022

The anime Gargantia on the Verdurous Planet explores this.

Here squids evolved into a spacefaring race that is, if at all, only using organic technology and doesn't seem to have consciousness.

They are at war with the spacefaring humans that rely on mecha and AI. It ends with a very non-human and frustrating coexistence message instead of going for all out termination of hostile creatures.

somenameforme · on Nov 1, 2022

One of the most interesting things to think about in this regards is the past and the crazy things they thought, and why they probably didn't seem especially crazy at the time. In the earlier ages of exploration of our world people have been able to discover ever more amazing things from springs mysteriously heated even in the coldest of times and places, to a tree producing bark that chewing on can make ones pain completely disappear (more contemporarily known as willow/aspirin), and endless other ever more miraculous discoveries.

Why would it thus be so difficult to imagine there being some spring or treatment that could effectively end illness or even aging? A fountain of youth just awaiting its discovery. It was little more than a normal continuation outward from a process of exponential progress. But of course the exponential progress came to an unexpected end, and consequently the predictions made now look simply naive or superstitious.

We're now currently in our own period of exponential discovery and the fabulous tales of achievements to come are anything but scarce. Of course, this time it'll be different.

NortySpock · on Nov 1, 2022

Probably not much more than "it wasn't worth it to install cryo-cooled quantum computers on an average spaceship".

We didn't install supercomputers in the Space Shuttle either. All the big iron was in a building on the ground.

ddingus · on Nov 1, 2022

Maybe!

Perhaps they operate a combination of biological systems alongside their electro mechanical ones.

Their ship may be locally intelligent everywhere, with that all rolling up to an i9 ish main control system.

Purpose optimized hardware communicating along standardized interconnects could mean lot of hard tasks done in silicon or shared with biological systems too.

They may have decades, centuries old solutions to many hard problems boiled down to heuristics able to run in real time today. Maybe some of these took ages to run initially.

kloch · on Nov 1, 2022

> thousands of years ahead of us

Just thousands? I would expect 100k years at a minimum and even that is only .0007% of the age of the universe. Millions or Billions of years more advanced is not out of the question.

It would be interesting to see how similar technology is among such advanced civilizations, even if they did not compare notes. Does technology eventually converge to the same optimal devices in each civilization?

Given our current extremely primitive state (only about a hundred years of useful electronics) I would be disappointed if we could even imagine what this technology looks like.

bitL · on Nov 1, 2022

They'll likely use optimization laws of nature to get perfect solutions instantly, like what people try to get nowadays in some labs with electricity finding the shortest path/route immediately.

ars · on Nov 1, 2022

Then that pretty much spells the end of true AI.

We may shift to computers that operate off of chemical signals instead of electrical ones - like the brain.

emkemp · on Nov 1, 2022

Obligatory SMBC:

https://www.smbc-comics.com/comic/2011-02-17

tyingq · on Nov 1, 2022

There's also ideas like the Mill processor. Though it's hard to avoid comparisons to Itanium, and how a mountain of money still didn't produce a compiler that could unlock what initially sounded like a better ecosystem.

joe_the_user · on Nov 1, 2022

Would the number of cores in a GPU level off? It seems like intensive computing of all sort will migrate to gpgpu programming.

ummonk · on Nov 1, 2022

Seems like the latest Nvidia GPUs aren't really an improvement over the previous ones, but just bigger and proportionally more expensive. So maybe the leveling off in performance is already starting to happen.

brokenmachine · on Nov 1, 2022

That is not true.

The 4090 uses less power than a 3080ti while being 63% faster.

https://www.techpowerup.com/review/nvidia-geforce-rtx-4090-f...

, and 45% better performance than a 3090ti while using 2% less power (at 4k).

https://www.techpowerup.com/review/nvidia-geforce-rtx-4090-f...

Can't find the link now, but I saw a youtube video where they did an analysis of it at different wattage limits and it performed very nicely.

4090 draws a lot of power, but Nvidia has just chosen to work at the diminishing returns end of the curve.

It's a halo product for people who will pay for the top of the range. I mean, look at the price!

Traubenfuchs · on Nov 1, 2022

Shrinking has almost stopped too and you can only make a chip that big before it runs into other constraints.

singularity2001 · on Nov 1, 2022

There is a lot of room for development before the exponential curve can be carried by the next paradigm: at least for desktop computers we are still decades away from case filling 3D "compute cubes".

brazzy · on Nov 1, 2022

It's quite possible that kind of thing has hard limits set by cooling.

singularity2001 · on Nov 10, 2022

sure but usually before hard limits are reached a new paradigm is ready to take over

plasticchris · on Nov 1, 2022

The wafer scale computing folks would disagree...

Traubenfuchs · on Nov 1, 2022

This is insane.

https://www.eetimes.com/powering-and-cooling-a-wafer-scale-d...

duped · on Nov 1, 2022

The metric is performance per watt per dollar. At the moment the fact is that the amount of compute available per watt dollar is ridiculously cheap, crypto not withstanding.

We are not limited by compute resources but by business practices. The organizational cost of software design is where the next gains are, not technological.

stn_za · on Nov 1, 2022

Can't wait.

People should be so ashamed that what's basically an IRC client (Slack) requires more than 4GB of ram and so many cores. Laziness. Truly.

nullifidian · on Nov 1, 2022

Not sure if it's exactly accurate with the plateauing trend -- single-threaded performance almost doubled in the last 5 years.

pclmulqdq · on Nov 1, 2022

It's a log scale graph, it shows a doubling. The performance just isn't increasing as fast as it was before.

hansel_der · on Nov 1, 2022

yea, buying a pc in the early intel era was a somewhat double edgded sword because you knew that the next generation would come out in a year or so and it would probably have more than double the performance.

Nomentatus · on Nov 7, 2022

For many years my friends and I had a rule that we wouldn't buy a new computer until it offered at least 4X the speed of our old one. We didn't have to wait all that long.

abbeyj · on Nov 1, 2022

You might like https://github.com/karlrupp/microprocessor-trend-data

andy_ppp · on Nov 1, 2022

I’d love to see how GPUs fit into this picture; are we getting more transistors on a die but designs are not getting much faster?

joshspankit · on Nov 1, 2022

I wonder how much of that flattening is because chips are now hovering just under the 5GHz/6Ghz wifi frequencies.

Yes we have techniques that help them avoid each other, but it seems like it would be enough of a tech and PR hassle that companies wouldn’t bother.

hansel_der · on Nov 1, 2022

pretty sure it has something to do with speed of light divided by clock speed

joshspankit · on Nov 2, 2022

I hadn’t even considered that there is a physical limitation.

I’m too lazy to look it up now, but when I first read this I feel like the math said electrons can move at maximum 5cm in a single 5GHz clock cycle?

That’s wild.

hansel_der · on Nov 2, 2022

the eletrons do not have to 'move' the entire way for transfer of information because they will push the already present ones further.

joshspankit · on Nov 2, 2022

Well yes, but that’s an academic difference unless you’re stating that they can transfer information faster than light.

hansel_der · on Nov 4, 2022

well no, i was saying that the information moves faster than the electrons. speed of light in vacuum remains unbeaten afaik

kloch · on Nov 1, 2022

Pretty sure it has everything to do with heat dissipation

hansel_der · on Nov 2, 2022

probably true as well.

also iirc the actual clock of the logic gates is a factor above the 'operations clock'

causi · on Nov 1, 2022

This trend of browsers to increasingly lie to me about what I'm looking at is infuriating.

modeless · on Nov 1, 2022

The URL is a lie but that's not the browser's fault. It's correctly showing you the URL that you requested and the server responded to. If you save the file locally the browser will give it the correct extension.

q-big · on Nov 1, 2022

> This trend of browsers to increasingly lie to me about what I'm looking at is infuriating.

The browser does not lie to you, it just does not show the MIME type of the content.

I do agree that it would make sense that the browser shows the MIME type of the content that is currently displayed, but this is opposition to the current fashion (in particular pioneered by Apple) of "simplifying" everything.

deathanatos · on Nov 1, 2022

> The browser does not lie to you, it just does not show the MIME type of the content.

In this particular case, Firefox does:

> Picture1-2.png (WEBP Image, 1054 × 593 pixels)

is the title on the tab. (The ".png" comes from the URL, as a sibling notes.)

Chrome does not.

samstave · on Nov 1, 2022

where the heck is Transmeta ??? intel was so afraid of them in 1997

Obv AMD was the true and once... but there was a lot of Transmeta-fear back then too

jecel · on Nov 2, 2022

Transmeta announced their first product, the Crusoe processor, in 2000. Before then they were highly secretive and nobody knew what area they would compete in.

Intel was indeed worried about the laptop market when the Crusoe came out, but quickly adopted the voltage and frequency scaling into their own processors negating some of Transmeta's key technical advantages.

Meanwhile, IBM changed directions as a third party fab (they dropped bulk CMOS to focus on SiliconOnInsulator) leaving Transmeta without a product to sell until they could find a new fab and design their next generation product for that.

amelius · on Nov 1, 2022

How is it that the number of transistors still grows exponentially, while the power use has plateaued?

Swenrekcah · on Nov 1, 2022

I think I can say the following.

Smaller transistors mean fewer electrons that go in and out every clock cycle, so less power per transistor.

Higher clock frequency means more cycles per second, that is more electrons spent per second thus higher power consumption.

Since clock frequency has stabilised and the total area of a chip is I think not much larger than before, it is expected to see the power consumption stabilising.

I also believe I read somewhere that one of the reasons clock frequency stoped increasing was that the power consumption became too high for the chips to handle the thermal dissipation.

eru · on Nov 1, 2022

Each transistor uses less power?

amelius · on Nov 1, 2022

Exponentially less.

jdthedisciple · on Nov 1, 2022

interesting, so basically since 2010 no all-to-significant progress has really been made despite the scaling in logical processor count.

This is considering the vast majority of software basically still uses only a single core

dspillett · on Nov 1, 2022

> the vast majority of software basically still uses only a single core

And that which does use multiple cores, sometimes only scales well to a few because then other bottlenecks† start to become most significant.

Many things are not so “embarrassingly parallelisable” that they can easily take full advantage of the power available from expanding the number of processing units available beyond a certain point.

--

[†] things no longer being “friendly” to the amount or arrangement of shared L2/L3 cache as the amount of threads grow causing more cache-thrashing, hitting other memory bandwidth issues, or for really large data you might be considering issues as far down as the IO subsystem or network.