My entire career in computers spans the 40 years in that graph. The constant leaps in fundamental speed were exhilarating and kind of addictive for technologists like myself. As the rate of progress has fallen off over the past decade it's been sad to see the end of an era.
I'm sure speeds and capabilities will continue to increase, albeit much more gradually, but significant gains are going to come slower, harder and at greater cost. The burden will have to be shouldered by system architects and programmers in finding clever ways to squeeze out net gains under increasingly severe fundamental constraints (density, leakage, thermals, etc).
Back when I started programming as a teen in 1980 with 4k of RAM and ~1 MHz 8-bit CPUs, knowledge of the hardware underneath the code and low-level assembly language skills were highly valuable. Over the years, the ability to think in instruction cycles and register addressing modes grew anachronistically quaint. Now I suspect those kinds of specialized 'down-to-the-metal' optimization skills may see a resurgence in value.
I think it is the opposite. I have almost as much experience as you; I started a little later and didn't get serious until my teens, with 68k assembly language and custom chip programming on the Amiga 500. This isn't all nostalgia; some of it is germane context.
I think it is important to have a mental model of the hardware so that the architecture of the program has some mechanical sympathy. But the ability to think abstractly is more important; that is what allows Moore's law to be realized. Our compute topology is changing, and if the perf curve is to continue to be exponential, our code, and more importantly the expression of our ideas, has to be able to exercise 30B transistors today and 150B in 8 years. Knowing how to compose neural networks is one of the new skills that is akin to knowing how to shave off cycles in the 80s. Mod playback, Doom, Quake, mp3 decompression, emulation all redefined our relationship with computing.
The Amiga had custom hardware for bitblits and sprite compositing; it could do these trippy multi-layered backgrounds that used parallax to give an arcade-like 2.5D look. These had a bunch of registers you had to muck with. I only ever called them from assembly; I knew C, but it just felt more natural to do it in asm files. My point is, you can do the same things using high-level garbage-collected code. In Python or JS, you could implement Quake using naive algorithms. No asm, just regular code; no custom memory copying and compositing hardware, just assignment statements in a dynamic, GC'd language.
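To make that concrete, here is a rough Python sketch of software parallax compositing with nothing but loops and assignments; the layer setup, sizes, and scroll factors are made up purely for illustration:

    # Minimal software parallax compositing: no blitter, no copper, no asm,
    # just nested loops and plain assignments on GC'd objects.
    TRANSPARENT = 0
    WIDTH, HEIGHT = 320, 200

    def make_layer(fill, width=WIDTH, height=HEIGHT):
        # A "bitmap" here is just a list of rows of palette indices.
        return [[fill for _ in range(width)] for _ in range(height)]

    # Farther layers scroll more slowly; that is all parallax is.
    layers = [
        (make_layer(1), 0.25),            # distant background, quarter speed
        (make_layer(2), 0.5),             # middle layer, half speed
        (make_layer(TRANSPARENT), 1.0),   # foreground, full speed
    ]

    def compose(camera_x):
        # Composite all layers into one frame, back to front.
        frame = make_layer(TRANSPARENT)
        for bitmap, scroll_factor in layers:
            offset = int(camera_x * scroll_factor)
            for y in range(HEIGHT):
                src_row = bitmap[y]
                dst_row = frame[y]
                for x in range(WIDTH):
                    pixel = src_row[(x + offset) % WIDTH]
                    if pixel != TRANSPARENT:   # colour-key "transparency"
                        dst_row[x] = pixel     # plain assignment
        return frame

    frame = compose(camera_x=42)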
The programmer who can code an awesome parallax demo using numpy arrays is not going to be the next Carmack. The programmer who can compose three AI models to make something we have never thought of is going to make the next Quake, or some other piece of software that changes our relationship with computing and Moore's law. Abstraction gets us there.
I agree with the parent of your post. I work in a field where Moore's law gets artificially arrested for often a decade at a time - console games - and we are no strangers to being critically aware of how much memory we are copying around: we will reach for hand-coded SIMD math and we stare at our shader assembly looking for more performance. You should see what some do to get top-line performance in collision detection. It even leaves me a bit sweaty... I'm not discounting what you conjecture about the next Carmack being in the machine learning arena - that's how I feel too - but I still strongly believe that we will see more demand for programming that can eke out performance with what we have.
Physical simulation is unique due to latency requirements. The impossibility of using the data center is the common denominator in high performance programming.
In my field (Spark, functional programming for data parallelism), few if any of the problems from the end of Moore's law ever truly materialize.
"Compute bottlenecks" are so uncommon. Almost no lines of Scala get written on Databricks; SQL and Python are "fast enough." Commoditization and "good enough" libraries, packaged in SQL/Python for the lowest common denominator.
Fixating on Carmack's fast-inverse-square-root genius misses the point.
Carmack's genius was the video game Quake itself.
The mathematical brilliance, the high performance programming, was genius applied to overcome a bottleneck.
(And what temporary genius. Contrast Carmack with Unity).
Originality and usefulness, imagination meeting relevance: that is the engine that powers software.
But within reason, these are areas where huge returns can be made from higher performance programming as opposed to speed of development - a 10% performance increase can save a stupid amount of money on hardware - and with hardware lasting longer I think there will be an increasing focus on that.
When I play console games on my Xbox 360 the biggest annoyance by far is the loading times. You run around in Skyrim and you enter a house so you have to wait 30 seconds for the content to load. Then you leave the house and have to wait 30 seconds again. My point is that the relevant performance metric isn't speed of number crunching anymore - it is speed of transporting data from one part of the system to another.
I believe a critical difference between the high performance of now vs yesteryear is the degree to which it's a design problem vs an implementation problem.
When writing 6502 assembly, you have "tricks" galore. You do have a design trade-off to make - memory vs CPU cycles - and when looking at algorithms in really old programs, they often dispensed with even basic caching to save a few bytes. But a lot of the savings came from gradually making the program as a whole a tighter specimen, doing initializations and creating reports with just a few fewer instructions. The "middle" of the program was of similar importance to the design and the inner loops, and it popularized ideas like "a program with shorter variable names will run faster" or "a program with the inner loop subroutines at the top of the listing will run faster" (both true of many interpreters). An engineer of this period worked out a lot of stuff on paper, because the machine itself wasn't in a position to give much help. And so the literal "coding" was of import: you had to polish it all throughout.
Today, the assumption is that the middle is always automated: a goop of glue that hopefully gets compiled down to something acceptable. Performance is really weighted towards the extremes of either finding a clever data layout or hammering the inner loop, and to get the most impactful results you usually have a little of both involved.
The hardware is in a similar position to the software: the masks aren't being laid out by hand, and they increasingly rely on automation of the details. But they still need a tight overall design to get the outcome of "doing more with less."
And the justifications for getting the performance generally have little to do with symbolic computation now. We aren't concerned about simply having a lot of live assets tracked in a game scene (a problem that was still interesting in the 90s, but more or less solved by the time we started having hundreds of megabytes of RAM available); we're concerned about having a lot of heavy assets being actively pushed through the pipeline to do something specific. That leans towards approaches that see the world in less symbolic or analytical terms and as more of a continuous space sampled to some approximation - which digital computing can do, but it isn't the obvious win it once was.
The video game industry has downloaded more memory leaks onto personal machines than all the other domains of software combined. So many lines of terrible C++ have been written...
The importance of Moore's law falls flat in front of good old "bugger good code, Morrowind's rebooting the Xbox."
I love your comment. I can only imagine how thrilling it would have been in the early days to see order of magnitude improvements in generalised single threaded computer performance every couple of years.
Today, as it happens with all fields that become more complex over time, excitement is found in more nuanced areas.
Hardware has become task specific and that makes it exciting to different niches for different reasons.
You mention the idea of thinking in cycles and that concept is quite appealing to me. I believe the lack of focus on squeezing performance is a symptom of the accessibility of modern application development combined with the fact that most commercial products wouldn't see a financial benefit to delivering computationally efficient applications.
I do wish modern applications were more efficient, but that's a fool's errand as I don't see companies like Spotify rewriting their desktop client in 5 or 6 different native UI kits. Vendors like Microsoft and Apple will never collaborate on a common UI specification outside of web standards, so we are forced to suffer through Electron apps. Heck, Microsoft can't even figure out what UI API it wants to offer for Windows.
That said, if you're interested in computer science, we are only just uncovering novel approaches to how languages can let engineers ergonomically leverage parallel computation. We see this in languages like Rust and Go - both of which are not perfect, but there are so many lessons being learned here.
To me, the software engineering and language design world is unbelievably thrilling right now.
I do think, and wish, that the large companies who own the platforms would work together more to avoid the standards mishmash application developers must contend with in today's landscape. It would make it far more accessible to write efficient cross-platform client applications that aren't built on web technologies.
These days cache is more important than registers. For typical small n, linear search beats the pants off binary search just because linear search is cache-friendly.
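A rough, pure-Python sketch of that comparison; the cache-friendliness effect really shows up in compiled code, so treat the crossover point here as illustrative only (sizes and probe values are arbitrary):

    # Linear scan vs binary search over small sorted lists. In compiled code the
    # linear scan wins at small n largely because it is cache-friendly and easy
    # to branch-predict; in pure Python, interpreter overhead dominates instead.
    import bisect
    import random
    import timeit

    def linear_contains(sorted_list, x):
        for v in sorted_list:
            if v >= x:
                return v == x
        return False

    def binary_contains(sorted_list, x):
        i = bisect.bisect_left(sorted_list, x)
        return i < len(sorted_list) and sorted_list[i] == x

    for n in (8, 64, 512, 4096):
        data = sorted(random.sample(range(n * 10), n))
        probe = data[n // 2]
        t_lin = timeit.timeit(lambda: linear_contains(data, probe), number=20000)
        t_bin = timeit.timeit(lambda: binary_contains(data, probe), number=20000)
        print(f"n={n:5d}  linear={t_lin:.4f}s  binary={t_bin:.4f}s")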
Modern optimizing compilers almost always do a much better job of micro-optimization. Humans are much better at attacking the big picture: making code fast with changes that cannot be safely made by the compiler because the new algorithm isn't equivalent in all cases.
Even in 1980 programmers knew that optimization was best done at a high level. The low level stuff just had more value when compilers were not good.
High performance computing will drive demand for faster hardware, for example in machine learning. It is extremely computationally intensive and expensive to train large NLP models. The big companies in this game have a lot of money to invest in bringing those costs down, and in turn train better models.
That said, I don't see a reason why speeds will increase significantly on personal devices. We're seeing a situation now where personal devices are really 'fast enough' for normal use cases. Instead the focus is more on improving efficiency and battery life.
It depends. I dream of a world where your Smartphone is also your personal computer and you can just project everything from it using AR wherever you are. In that case they have to improve on both.
Apple seems to be latching onto the idea that users need to run ML on their consumption devices, as opposed to the cloud, and I don’t believe it. I think you agree. Yet in my opinion, if anything, they want the appearance of that necessity, expressed as a loss of efficiency and battery life for older devices to sell new ones.
I don't understand this comment. ANNs are being used everywhere - image recognition, voice recognition, document classification... I can only see this use increasing for the foreseeable future.
Google kills tons of very expensive projects. Facebook spends a lot on their Metaverse, but that doesn’t make it good. Tons of companies spend on terrible ideas.
The only difference with Google or Facebook is that they’re big enough to absorb the losses.
This isn’t to say that ML is a dead end, but instead to point out that just because they are investing a lot doesn’t make it good.
I’m just a few years younger than you and have had similar experiences. This is off topic, but when was the last “magical” new computer experience for you? For me, it was the M1; after seeing how good Intel had been for so long, everything they had vanquished, and then AMD’s recent run, I just couldn’t see a non-x86-64 part really performing outside of some IBM systems in special cases. That little M1 SoC blew me away with its consistently great performance and power use. I’m not sure it’ll be the same with the M3 and beyond. It was a taste of that old-school new-computer feeling, though.
The first computer I used was a Pentium Pro at 233 MHz, and I remember how fast things were moving every year for at least a decade before it slowed to irrelevance. The M1 was a long time coming. I remember back in 2013 when the iPhone 5S came out, AnandTech showed how it matched Atom's perf in a few web benchmarks at much lower power. Combine that with mW-level idle power, and it was obvious they would be very competitive in the PC space. That was also the year Apple called their chip "desktop level." I remember thinking back then how amazing it was that I could FaceTime for hours on a passively cooled phone but could barely Skype for thirty seconds before the fans spun up on my Mac. I always thought it was the smaller screen and never made the connection that the SoC was the key difference.
For me it was the upgrade from spinning platters to a SSD. I was giggling as I restarted my computer a few times just to watch it almost instantly get to the login screen.
I am very very doubtful that people will once again start caring.
Even a decade ago, it was known that hardware gains wouldn't be as spectacular as before. It was predicted that this would lead to rise of specialized programming models such as GPGPU, DSPs, more focus on optimization, with a particular eye to hardware architecture, memory access patterns etc.
What actually happened?
Everything runs in the browser buried under six layers of Javascript and talks to a bazillion servers running microservices and passing JSON over HTTP to each other.
People care about optimization even less today than they did a decade ago.
Dude, in 1998 we at Intel had a 64-core system running.
But it's the shrink in circuit size from microns to nm that really proved out, not macro cores... except that once they solved that, scaling CORES is what really gave way - after going from µm down to nm...
Maybe some critical code paths will be assembly-optimized (cf. dav1d) for speed and efficiency, but the real issues now are mostly at the software level, where toxic planned obsolescence is running rampant, fueled by the big tech companies steered by Vanguard and BlackRock (Apple/Microsoft/Google/etc).
The only shield against that, some would think, is open source, but actually it is "lean" open source, SDK included. Kludge, bloat, and planned obsolescence are no better in the current open source world than in the closed source world.
I am an "everything in RISC-V assembly" (with a _simple_ and dumb macro preprocessor only) kind of guy, including the python/lua/js/ruby/etc interpreters. The main reason for that is not to be "faster", but to remove those abominations which are the main compilers from SDK stacks. Some sort of "write assembly once/run everywhere" (and you don't need a c++7483947394 compiler).
I agree, but I also think we need a fundamentally new paradigm.
It's very important that we as programmers have a good mental model of how the machine works. Abstractions are cool, but it is important to be aware of how your data lives in memory and how the CPU acts on your code - and almost everything we've been taught about that in the last few decades is close to irrelevant.
Almost all of us think and write code sequentially. Even with multithreading, your program is generally sequential, and the CPU just doesn't work that way anymore. With all the fancy whizbang branch prediction, superscalar execution, and whatever other black magic, the CPU is fundamentally not sequential.
As a result, compilers are becoming enormous hulking beasts with millions of lines of code trying to translate sequential programs into parallel ones. This kind of defeats the purpose of us having that mental model of the machine. The machine we think we know is not the machine that actually exists.
We need a new set of inherently parallel languages. Similar to the way we program GPUs these days.
The modern cpu is orders of magnitude more complex than anything we've seen before. We need new mental models and new programming paradigms to extract performance the way we used to on sequential processors.
Even for embedded applications, microcontrollers increasingly feature things like multiple instructions per cycle, branch prediction, and multiple cores are much more common these days.
I think we're stuck in a shitty place in between two wildly different worlds of computing. We aren't willing to make the leap to the new, so we live in this rapidly crumbling ecosystem trying to adapt 50 year old code to superscalar hyperthreading gigacore x86 processors.
The amount of wasteful code and technical debt in every one of the systems underpinning our society is truly unimaginable in its scale. There is no path forward from here except to burn it all down and begin again with a fundamentally new way of looking at things. Otherwise, it's all going to come crashing down sooner or later.
I don't quite feel that. On one hand, my current computers cover my needs well enough; on the other, it's still quite impressive how much more instantaneous the boot of a new computer is compared to my daily drivers. For the rest, computers have been "fast enough" for me for some time now.
Maybe I should move to big data and machine learning...
> Back when I started programming as a teen in 1980 with 4k of RAM and ~1 MHz 8-bit CPUs
I really miss those days. OTOH, just like my modern laptop, my Apple II could cold-start (from disk!) in 2-ish seconds.
This graph shows transistors basically maintaining pace and completely disregards multi-core performance. Of course single core perf will rise more slowly when a chip now has 8-64x as many cores.
> This graph shows transistors basically maintaining pace...
I'm no expert in silicon scaling but from reading technical papers, my (naive) understanding is that transistor density has almost kept up but now that scaling comes with increasingly stringent design constraints which architects must make trade-offs over. Broadly speaking, things like "You can have 2x last gen's density but they can't all be fully powered on for very long." That's a greatly simplified example but much of what I've seen has been far "thornier" in terms of interacting constraints along multiple dimensions.
My sense is that in the 90s we usually got "denser, faster AND cheaper" with every generation. Now we're lucky to get one, and even that comes with implementation requirements which can be increasingly arcane. My understanding is that different fabs are having to roll more of their own design libraries which embody their chosen sets of trade-offs per node. In addition to limiting overall performance and being harder to design, this apparently makes reusing or migrating designs more challenging. So while certain headline metrics like node density may appear to be scaling as usual, the reality under the hood is more complex and far less rosy.
You made me think that maybe computing is a deflationary force (I am not a libertarian, and this isn't some free-market-bro idea, I think): the more that can be subsumed by computation, the more things can get cheaper over time rather than more expensive, even in the face of rising material costs.
The relative price of steel has remained flat, while the steel performance has greatly increased.
Between material science and cheaper compute, we can build higher tech parts and techniques.
The number of cycles consumed per person per year is growing exponentially; what are some important points on that curve? When does the computation needed to design something take the same order of energy as creating it?
You could buy a Honda Civic new in 1980 for $5,000, which would be only just under $10K in today's dollars. What 1980-Honda-Civic-quality car can you buy today for $10K? Or am I just being nostalgic?
And look at the bump in inflation during the recession, https://blog.cheapism.com/average-car-price-by-year/#slide=6... of car prices. Was the 2008 recession triggered by excessively inflated car prices? Like causing a bubble in a pipeline, an economic embolism.
Current average price has dropped 10k$ from 35k to 25k in the years since 2008.
Could you please try to explain what you want to say with less snark? I'm a bit confused.
Paying people to do nothing gives you nothing.
Full employment isn't an end in itself, but it's useful because it is typically related to things we do care about. Employing people to do nothing is like fiddling with the speedometer of your car in order to 'go faster', or relabeling your amplifiers to go to 11.
You can sort-of turn atmospheric carbon into cheese. Have grass capture the carbon, and a cow eat the grass. That's totally doable, just not viable or efficient if your goal is to capture carbon at scale.
(If your goal was to go carbon negative at all costs, you could instate a whopping big carbon tax and let the economy figure it out.)
Right now our economy basically runs on carbon at the core. We make stuff, we move stuff, and emitting carbon is necessary for that. If we switched our economy to owning and moving information, then we could still have full employment and keep money moving through the ecosystem while, from the viewpoint of a materialist, just moving useless bits around.
I think we already have a lot of high paying jobs in the economy that don't do much and pay people to do nothing (of value). We should absolutely spread that around.
Which is great if you have a traditional server application servicing a lot of independent requests, or giant linear equations that can be solved in parallel.
OTOH, the graph has an Amdahl's-law section, which for many tasks has pretty much run out of steam (aka desktop web browsing/JavaScript JIT/etc).
I'm not going to be so stupid as to say 8 cores should be enough for anyone (while attached to a machine with 128) but you have to wonder if the stable diffusion style apps running on your desktop are going to be mainstream, or isolated to the few who choose to _need_ them as a hobby or a smaller part of the public that uses them for commercial success. AKA, I can utilize just about every core i'm given with parallel compiles, or rendering a 4K video, but I'm pretty sure i'm the only one in my immediate family that needs that. My wife in the past might have done some simulation work, but these days the heaviest thing she runs on her PC is office products.
This really gets back at the Arm big.little thing, where you really want 99% of your application usage to run on the big cores. The little cores only exist for background/latency insensitive tasks, and the odd case where the problem actually can utilize a large number of parallel cores and needs to maximize efficiency in the power envelope to maximize computation. AKA throw a lot of lower power transistors at the people rendering video/etc, and leave them powered off most of the time.
AKA, put another way, the common use case is a few big powerful cores for normal use, playing games, whatever with one or two high efficiency processors for everything else and a pile of dark silicon for the rare application that actually can utilize dozens of cores because its trivial to parallelize and doesn't work better being offloaded to a GPU. I suspect long term intel was probably right with larrabee, they were just a decade or two early.
So, economically I don't see people buying machines with a couple hundred cores that sit dark most of the time. Which will drive the price up even more, and make them less popular.
> I'm not going to be so stupid as to say 8 cores should be enough for anyone (while attached to a machine with 128) but you have to wonder if the stable diffusion style apps running on your desktop are going to be mainstream, or isolated to the few who choose to _need_ them as a hobby or a smaller part of the public that uses them for commercial success. AKA, I can utilize just about every core i'm given with parallel compiles, or rendering a 4K video, but I'm pretty sure i'm the only one in my immediate family that needs that. My wife in the past might have done some simulation work, but these days the heaviest thing she runs on her PC is office products.
Cause and effect is backwards there. Designers only went to multicore because single core performance improvement was leveling off. It's not that people wanted multicore systems and were willing to sacrifice single core performance to get it.
Well, we wanted multicore, but it was mostly because Windows loved to become unresponsive on a single core. I think that from a consumer's point of view, 2 cores circa 2006 were enough; 4 is probably the absolute maximum.
How does it disregard multi-core performance? As you said, it's showing the transistor counts going up, and it's also showing the rise in the number of logical cores.
The missing thing that's critical for most multi-core performance use cases is memory bandwidth. Maybe not easy to summarize on a graph like this, but for any workload that can't fit within L1 cache, you're unlikely to get close to linear performance scaling with cores. Sometimes a single core can fully saturate the available memory bandwidth.
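A roofline-style back-of-envelope of that ceiling; every number below is hypothetical, purely to illustrate how quickly a bandwidth-bound kernel stops scaling with cores:

    # Attainable throughput = min(compute roof, bandwidth roof). With the made-up
    # figures below, the kernel saturates DRAM bandwidth at roughly 2-3 cores.
    PEAK_GFLOPS_PER_CORE = 8.0     # per-core compute, hypothetical
    DRAM_BANDWIDTH_GBS   = 40.0    # total memory bandwidth, hypothetical
    FLOPS_PER_BYTE       = 0.5     # arithmetic intensity of a streaming kernel

    def attainable_gflops(cores):
        compute_roof = cores * PEAK_GFLOPS_PER_CORE
        bandwidth_roof = DRAM_BANDWIDTH_GBS * FLOPS_PER_BYTE
        return min(compute_roof, bandwidth_roof)

    for cores in (1, 2, 4, 8, 16, 32):
        g = attainable_gflops(cores)
        print(f"{cores:2d} cores -> {g:5.1f} GFLOP/s "
              f"({g / attainable_gflops(1):.2f}x vs 1 core)")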
Back in grad school, one of the analysis programs I used dated back to the mid 70s. The original paper gave a performance metric for a test program, which I compared to the runtime on a Chromebook running Linux. I was curious how closely that scaled with Moore's Law, and computed "initial_release + (1.5 years)*log2(initial_runtime/current_runtime)". That is, assuming the speedup is due entirely to hardware improvements, and those hardware improvements follow Dennard scaling, what year is it?
This (admittedly very rough) measurement ended up giving 2003. It was wrong by over a decade from the actual date, but correctly gave the date at which clock frequencies stopped improving.
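For anyone who wants to play with the same back-of-envelope, here it is as a small Python function; the 1976 date and the runtimes in the example call are made up, only the formula comes from the comment above:

    import math

    def implied_year(initial_release, initial_runtime, current_runtime,
                     doubling_period_years=1.5):
        # Assume all of the speedup came from hardware doubling every 1.5 years.
        doublings = math.log2(initial_runtime / current_runtime)
        return initial_release + doubling_period_years * doublings

    # e.g. a 1976 paper quoting an hour of runtime vs ~10 ms today:
    print(int(implied_year(1976, initial_runtime=3600.0, current_runtime=0.01)))
    # -> 2003 with these hypothetical inputs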
> More depressing - the number of cores will also level off eventually, and where does that leave us then?
Short of breakthroughs (e.g. quantum and currently unknowns), the only clear path is less generalized architectures and more specialized chips. As you move more towards ASICs from general architectures you get improved performance, reduced power, and so on.
We've lived in an era of software where hardware was abundant and cheaper than an engineer's time: throw more hardware at it and make sure you have generally optimal algorithms in most of your run paths. That's going to change more and more, and I suspect we're going to have to start rethinking or redeveloping some layers of abstraction between current software and hardware.
As it stands now, we're building more and more complex things atop weaker intermediary layers of abstraction to save time and meet budgets, but that's going to have to be revisited in the future, and the inefficiency debts we've been building up will need to be paid down. Clear code will become less of a top priority when clever optimizations that may not be so clear can be added in. We're still many, many years away from this, but that's my prediction.
The "cores" are becoming more specialized and optimized for domain specific tasks.
Compiler technology advancements are needed to take advantage of such heterogenous architectures in a transparent way.
LLVM MLIR started that already.[1,2]
The alternative is being stuck with each silicon vendor's proprietary solutions like CUDA.
I'd guess we get more hardware acceleration. In classic computers (PCs, laptops, servers), for stuff like audio/video codecs, that's been available for decades, but I'd say the next big push will be ethernet/wifi accelerators that do stuff like checksum calculation/verification, VLAN tagging or even protocol-level stuff like TLS in the chip itself - currently, that's all gated for expensive cards [1], I'd expect that stuff to become mainstream over the next few years. Another big part will be acceleration for disk-to-card data transfer [2] - at the moment, data is being shifted from the disk to RAM to GPU/other compute card. Allowing disks to interface with compute cards will be a lot of work - basically, there needs to be a parallel filesystem reader implementation on the disk itself, on the DMA controller or in the GPU, which is a lot of effort to get done right with most modern and complex filesystems - but in anything requiring high performance removing the CPU bottleneck should be well worth the effort.
Mobile is going to be more interesting because of power, space, and thermal constraints, and because a lot of optimization has already been done there: unlike on classic computers, vendors couldn't just use brute force to get better performance, and there is a bit of an upper cap on chip/package size as well. Probably we'll see even more consolidation towards larger SoCs that also do all the radio communication, if not on the same chip then at least in the same package, so the end game there is one single package that does everything, with all that's needed on the board being RF amplifiers and power management. All the radio stuff will move to SDR sooner or later, allowing for far faster adoption of higher-bandwidth links and, with it, a reduction in power consumption, as the power-expensive RF parts have to be powered on for less time to deliver the same amount of data.
Who knows what sort of tech aliens would have? I don't think this whole foray into general purpose computing was necessarily pre-destined. Maybe their whole system could look more like a bunch of strung-together ASICs. "You made your computers drastically less efficient so that anyone could program them? Why would you want your soldier-forms and worker-forms to program computers? Just have the engineer-forms place the transistors correctly in the first place, duh."
> Who knows what sort of tech aliens would have? I don't think this whole foray into general purpose computing was necessarily pre-destined.
It's sometimes fun to think that technology is a function of the intelligence that creates it.
What if the aliens have some vastly different perception of reality than us? Things we consider obvious may not be obvious to them, and vice versa; their underlying desires and motivations may be different.
Humans for example, often tend to invent things for the sake of it. Imagine a species that doesn't do that. Or an organic FTL drive conjured into existence over eons via distributed intelligence. Weird.
> Or an organic FTL drive conjured into existence over eons via distributed intelligence. Weird.
E.g., What if the first aliens to find us are hyperintelligent slime molds, whose entire existence is predicated on finding the shortest distance between two points in higher-dimensional space and then traveling there to see what there is to eat?
The anime Gargantia on the Verdurous Planet explores this.
In it, squids evolved into a spacefaring race that uses only organic technology, if any at all, and doesn't seem to have consciousness.
They are at war with the spacefaring humans that rely on mecha and AI. It ends with a very non-human and frustrating coexistence message instead of going for all-out termination of hostile creatures.
One of the most interesting things to think about in this regard is the past, the crazy things people thought, and why those things probably didn't seem especially crazy at the time. In the earlier ages of exploration of our world, people were able to discover ever more amazing things, from springs mysteriously heated even in the coldest of times and places, to a tree producing bark that, when chewed on, can make one's pain completely disappear (more contemporarily known as willow/aspirin), and endless other ever more miraculous discoveries.
Why would it thus be so difficult to imagine there being some spring or treatment that could effectively end illness or even aging? A fountain of youth just awaiting its discovery. It was little more than a normal continuation outward from a process of exponential progress. But of course the exponential progress came to an unexpected end, and consequently the predictions made now look simply naive or superstitious.
We're now currently in our own period of exponential discovery and the fabulous tales of achievements to come are anything but scarce. Of course, this time it'll be different.
Perhaps they operate a combination of biological systems alongside their electro mechanical ones.
Their ship may be locally intelligent everywhere, with that all rolling up to an i9 ish main control system.
Purpose optimized hardware communicating along standardized interconnects could mean lot of hard tasks done in silicon or shared with biological systems too.
They may have decades, centuries old solutions to many hard problems boiled down to heuristics able to run in real time today. Maybe some of these took ages to run initially.
Just thousands? I would expect 100k years at a minimum and even that is only .0007% of the age of the universe. Millions or Billions of years more advanced is not out of the question.
It would be interesting to see how similar technology is among such advanced civilizations, even if they did not compare notes. Does technology eventually converge to the same optimal devices in each civilization?
Given our current extremely primitive state (only about a hundred years of useful electronics) I would be disappointed if we could even imagine what this technology looks like.
They'll likely use optimization laws of nature to get perfect solutions instantly, like what people try to get nowadays in some labs with electricity finding the shortest path/route immediately.
There's also ideas like the Mill processor. Though it's hard to avoid comparisons to Itanium, and how a mountain of money still didn't produce a compiler that could unlock what initially sounded like a better ecosystem.
Seems like the latest Nvidia GPUs aren't really an improvement over the previous ones, but just bigger and proportionally more expensive. So maybe the leveling off in performance is already starting to happen.
There is a lot of room for development before the exponential curve can be carried by the next paradigm: at least for desktop computers we are still decades away from case filling 3D "compute cubes".
The metric is performance per watt per dollar. At the moment the fact is that the amount of compute available per watt-dollar is ridiculously cheap, crypto notwithstanding.
We are not limited by compute resources but by business practices. The organizational cost of software design is where the next gains are, not technological.
Yeah, buying a PC in the early Intel era was a somewhat double-edged sword, because you knew that the next generation would come out in a year or so and it would probably have more than double the performance.
For many years my friends and I had a rule that we wouldn't buy a new computer until it offered at least 4X the speed of our old one. We didn't have to wait all that long.
The URL is a lie but that's not the browser's fault. It's correctly showing you the URL that you requested and the server responded to. If you save the file locally the browser will give it the correct extension.
> This trend of browsers to increasingly lie to me about what I'm looking at is infuriating.
The browser does not lie to you, it just does not show the MIME type of the content.
I do agree that it would make sense for the browser to show the MIME type of the content that is currently displayed, but this is in opposition to the current fashion (in particular pioneered by Apple) of "simplifying" everything.
Transmeta announced their first product, the Crusoe processor, in 2000. Before then they were highly secretive and nobody knew what area they would compete in.
Intel was indeed worried about the laptop market when the Crusoe came out, but quickly adopted the voltage and frequency scaling into their own processors negating some of Transmeta's key technical advantages.
Meanwhile, IBM changed direction as a third-party fab (they dropped bulk CMOS to focus on silicon-on-insulator), leaving Transmeta without a product to sell until they could find a new fab and design their next-generation product for it.
Smaller transistors mean fewer electrons that go in and out every clock cycle, so less power per transistor.
Higher clock frequency means more cycles per second, that is more electrons spent per second thus higher power consumption.
Since clock frequency has stabilised and the total area of a chip is I think not much larger than before, it is expected to see the power consumption stabilising.
I also believe I read somewhere that one of the reasons clock frequency stopped increasing was that the power consumption became too high for the chips to handle the thermal dissipation.
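The first-order relation behind all of this is the usual dynamic-power model, P ≈ alpha * C * V^2 * f; a tiny sketch with arbitrary constants, just to show how a frequency bump that also forces the voltage up gets expensive fast:

    def dynamic_power(activity, capacitance, voltage, frequency):
        # Classic CMOS dynamic power: P = alpha * C * V^2 * f.
        return activity * capacitance * voltage ** 2 * frequency

    base = dynamic_power(activity=0.2, capacitance=1.0, voltage=1.0, frequency=3.0e9)

    # Push the clock 30% higher and assume the voltage must rise ~10% to hold timing:
    boosted = dynamic_power(activity=0.2, capacitance=1.0, voltage=1.1, frequency=3.9e9)

    print(f"relative power at +30% clock: {boosted / base:.2f}x")  # ~1.57x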
> the vast majority of software basically still uses only a single core
And that which does use multiple cores, sometimes only scales well to a few because then other bottlenecks† start to become most significant.
Many things are not so “embarrassingly parallelisable” that they can easily take full advantage of the power available from expanding the number of processing units available beyond a certain point.
--
[†] things no longer being “friendly” to the amount or arrangement of shared L2/L3 cache as the number of threads grows, causing more cache-thrashing; hitting other memory bandwidth issues; or, for really large data, issues as far down as the IO subsystem or network.
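Amdahl's law is the cleanest way to see the ceiling being described here; a tiny sketch, with a 5% serial-or-bottlenecked fraction chosen arbitrarily:

    def amdahl_speedup(cores, serial_fraction):
        # Amdahl's law: speedup = 1 / (s + (1 - s) / n).
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

    for cores in (2, 4, 8, 16, 64, 1024):
        print(f"{cores:4d} cores -> {amdahl_speedup(cores, 0.05):6.2f}x "
              f"(ceiling: {1 / 0.05:.0f}x)")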
The article doesn't really mention one of the biggest problems: cost. These new process nodes are insanely expensive, far more so than before. This is one of the biggest drivers of new packaging technologies (chiplets). Companies simply can't afford to manufacture their entire design on the latest process. There are still newer transistor designs that promise further improved performance but will be even more cost prohibitive.
I'm not convinced that there are any hard EDA problems here, despite the fact that EDA as an industry in general is woefully archaic, especially when compared to the equivalent in the software world. We've been doing multi-layer and/or multi-die packages for a long time. It's more of a project management challenge, having to get different teams (process, packaging, design, verification) to work together earlier on in the design cycle, and more frequently throughout.
Chiplets aren't so much driven by a need for heterogenous processes as much as minimizing the yield impact of defects. Ian Cutress has a good portion covering it in this recent video: https://www.youtube.com/watch?v=oMcsW-myRCU.
In short, if your process produces 20 defects per wafer, and you can fit 100 chips on a wafer, you're going to end up with ~84% yield (i.e., slightly less than 20% loss). If you are able to split that same chip into 2 equal-sized pieces and make twice as many, your yield is now above 90%.
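One common first-order way to model this is a Poisson yield estimate; exact percentages depend on which yield model you pick, but the numbers land in the same ballpark as above (everything here mirrors the hypothetical 20-defects-per-wafer example, not a real process):

    import math

    def poisson_yield(defects_per_wafer, dies_per_wafer):
        # Expected defects per die, assuming defects land randomly on the wafer.
        defects_per_die = defects_per_wafer / dies_per_wafer
        return math.exp(-defects_per_die)

    print(f"monolithic, 100 dies/wafer:   {poisson_yield(20, 100):.1%}")  # ~81.9%
    print(f"split in two, 200 dies/wafer: {poisson_yield(20, 200):.1%}")  # ~90.5%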
AMD has also made them composable in such a way that they have to produce and stock far fewer ICs to fulfill all of their SKUs, which is also another fantastic benefit.
If you're interested in more details on the cost of producing chips on the latest nodes, this is a good back of the envelope sort of breakdown for the Ryzen 7950X. [1] They use a visual die yield calculator to talk through the logic of improving economics using chiplet designs.
The figures on the actual full wafer costs for TSMC 5nm are not public, but analysts say a ballpark around $17K is pretty reasonable. Using all the figures they ballpark, the manufacturing cost of the 7950X die collection, fully packaged, sounds like it averages around $70. So the cost is high as far as large-scale chip manufacture goes, but this is also a part that retails for $600; there's certainly plenty of room to remain profitable at the top.
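As a back-of-envelope: the only figure below taken from that analysis is the ~$17K wafer ballpark; the die count and yield are hypothetical, just to show how wafer cost turns into cost per good die:

    WAFER_COST_USD = 17_000   # ballpark TSMC 5nm wafer cost from the analysis above
    DIES_PER_WAFER = 800      # hypothetical, for a small chiplet-sized die
    DIE_YIELD      = 0.90     # hypothetical

    good_dies = DIES_PER_WAFER * DIE_YIELD
    print(f"cost per good die: ${WAFER_COST_USD / good_dies:,.2f}")  # ~$23.61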
Realistically, R&D costs are the biggest consumer of profit margins still.
Yeah, that video is fantastic. I wonder if the higher costs will place more of an emphasis on running chips at power levels that give better longevity. We can effectively halve the operating cost and double the lifetime of chips by reducing their power levels by 20%. Seems plausible if we don't have technological obsolescence to force depreciation.
Don't chips at stock power levels with reasonable cooling have extremely high resilience for at least 5 years? Most datacenters have no more than a 4 year refresh cycle for servers based on power and space efficiency optimizations.
For example, der8auer did a test [1] with chips like the 5800X with high overclock and high voltage with stress tests for over 4000 hours and came essentially to the conclusion that the chips are likely to endure for at least 5 years under rather extreme conditions. Likely much longer under normal circumstances. The number of systems he's testing isn't statistically significant like you might get by trying this sort of thing at datacenter scale, but it is fairly illustrative.
5 years is probably a fair current estimate, but that also doesn't come for free. The fans in DC servers move a ton of air, which itself further increases power use. I was thinking more on the order of 10-20 years.
Under most circumstances I can't imagine a reason to bother powering on a 10 year old system, short of nostalgia. The costs of running it will quickly eclipse the costs of buying something newer and more efficient.
Of course, there are plenty of edge cases, like needing something bare metal that has a particular sort of software compatibility or IO requirements. Some industrial computers still run 486 chips with ISA buses for this reason. These sorts of systems will have been engineered with longevity in mind from the outset though.
Other edge case, just for fun: embedded style systems like the Raspberry Pi. These are tiny, low power, and can be used for specialty purposes for ages. They are also engineered on nodes and setup in a manner that will likely leave plenty running successfully in 10-20 years' time as it is.
It is really only since we've entered the era below TSMC's 7nm node that longevity has become much of a concern at all. It would take a whole essay to even TLDR the constraints of why that only becomes very relevant in the period where those nodes start to become known as "mature", and this is already enough of a tangent, so I'll just leave this breadcrumb of a presentation on the lifecycle of silicon process nodes:
That last one is very significant. The hyper reflective EUV mask mirrors are incredibly hard to make and cost hundreds of millions. The mask alone no doubt raises chip cost by $10 or so.
Likewise, hundreds of engineers for 3-5 years can run up a couple billion dollars that must be recouped.
Absolutely. Yield is obviously a big part of cost-per-die, but it's not just defect density related. When a node is first released, the process is not as well controlled so you end up with yield loss because the process doesn't match the models (in the extremes) and dies fail at various testing stages. This obviously gets better over time, but it's still a major issue for companies designing in the latest technologies. Anecdotally I'd say there's a ~3X reduction in the "sigma" of the process over 3-5yrs.
Of course these issues have always existed, but wafer costs have never been so high, and yields never so low (both because of increased defect density and worse process control with new transistors).
They use an internal interconnect, similar to how a server which took multiple distinct CPUs would use a motherboard-level interconnect.
In addition to splitting the CPU or GPU into distinct units, you can also take other functions and use different processes for them. For example, in package I/O or L2 cache don't really see the same advantages for newer processes, so you can make these using more established (and cheaper/more available) processes.
You get most of the economic value from the first few divisions. Going from 80% yield to 90% yield shaves about 11% off your cost/unit. Going from 95% to 99% only saves you 4%.
I'm not an expert at how they combine chips. Like I said, for AMD they also wanted their units to be composable with a small number of chips, so they basically have a die with a few cores and a separate die with the I/O and memory controllers, and I believe they have a proprietary communication mesh for connecting them. I think there is some considerable signal/energy overhead to communicating between chips. The cost of masks and interconnects is probably high enough to make a high number of diverse chiplets unviable, but I wouldn't say it's impossible in the future.
This is a fundamental effect because of the Shannon-Hartley theorem[1], which says your communication channel's capacity (i.e. bitrate) has a log dependency on the channel's Signal to Noise Ratio. In a practical wireline communication system, you have a transmitter with noise, a lossy interconnect, and a receiver with noise. As the interconnect gets longer, which is what happens when you go from on-die communication to die-to-die communication, your loss increases. This means you have to reduce your transmitter and/or receiver noise which requires using additional power.
Another way of looking at it is bit error rate [2].
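For a feel of the log dependence being described, here is Shannon-Hartley in a few lines of Python; the 10 GHz bandwidth and the SNR values are invented, just to show how much extra SNR (and hence transmitter/receiver power) each additional chunk of capacity costs:

    import math

    def capacity_gbps(bandwidth_ghz, snr_linear):
        # Shannon-Hartley: C = B * log2(1 + SNR).
        return bandwidth_ghz * math.log2(1 + snr_linear)

    for snr_db in (10, 20, 30, 40):
        snr = 10 ** (snr_db / 10)
        print(f"SNR {snr_db:2d} dB -> {capacity_gbps(10.0, snr):5.1f} Gb/s "
              f"over a hypothetical 10 GHz channel")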
I think you'll be hard-pressed to find concrete numbers since these designs are closely-guarded trade secrets. You might find some examples by searching for wireline transceiver / PHY papers on IEEE xplore, especially at ISSCC (conference) or in JSSCC (journal).
If we start shifting towards specialized hardware configurations to solve specialized problems, how does “move everything to the cloud communicating through fragmented docker images” fit?
It just seems weird to me to see posts of spending lots of money on complicated scalable Kubernetes clusters, but at the same time people are putting their AI computation engines right on die so that they get good cache coherence. Maybe they’re more resolvable than it seems.
K8s GPU nodes are already kind of an example of this-- the app can still access the specialty capabilities of the hardware, we just have to make sure to tell the scheduler "hey run this app on this kind of node with this flavor of capabilities". I could imagine that kind of flavor capability growing and becoming more fleshed out
Except k8s calls it 'taints' which is uh a weird choice.
Taints is a funny name when you look at it like that but taints are actually the opposite half of the story.
If you want a Pod to be scheduled with a certain node you need to add affinity.
The taints are to ensure that your average nginx pod doesn't run on the specialty hardware and block the pods that actually need it. In theory the scheduler could be smart enough to evict the pods that don't need the special node, but IDK if the kube scheduler is smart enough. Even then it may be preferable to have the taint anyway, to reduce the eviction rate.
This is probably one of the biggest challenges ahead, likely more so than making hardware that still somehow scales despite Moore’s law slowing down and tapering out entirely. The increased scale of integration with advanced 3D packaging and chiplets is a nice way forward, but the heterogeneity of those solutions need innovation at the programming model as much as anything else: how do you make such a complex system accessible in a way that the programmer can actually use it efficiently?
As much as I like semiengineering and their articles, they are very hardware and process technology focused and tend to miss the software side of things.
I love buying new and better gadgets like everyone else but if cpu advancements froze after the next generation of ones using the 3nm transistor node, would life on Earth really be that bad?
We’d be in a world where we have no choice but to optimize existing algorithms, software, and hardware designs. And yes, after many years even that would reach its limits. Even so, I still think life would be great for every generation that lives on after that.
I have 10 and 15 year old computers and they work just fine for me. Web browsing, spreadsheets, word processing documents.
I don't use the latest gamer stuff, or have 285-dimensional graphics that rotate in 48 dimensions or whatever.
I'd think that 95% of the computer users are just like me.
Pissed me off... my parents essentially use their computer and smartphones the same exact way - email, browse the web, and that's about it. My siblings decided they had to upgrade and got them new computers and smartphones. WTF? Total waste of maybe $5000. If anyone else has $5000 that they want to spend on nothing, let me know and I will give you my bank routing number and account number and you can just send it to me for nothing in return, too. :P
I would personally be content, but I’m thankful for the advances made possible by faster computing, especially in medicine, and I’d like to see more. If we keep advancing computing power, today’s research supercomputers will be tomorrow’s personal computers in the hands of researchers of all income levels all over the world.
The vast majority of hardware devices haven’t even caught up to 14nm, let alone 3nm which is still an experimental node used by a few customers “at risk”.
It would take decades at the current pace of progress just for everyone to catch up to the current bleeding edge.
I would have retired from IT work by the time that happens!
In other words, we’re already at the stage that many people will continue to see non-stop improvements even if 3nm is “the end”.
Meanwhile Intel says that their 18A node is looking good and ahead of schedule…
I think we’re talking past each other. You’re talking about getting the lagging sigmas of the distribution on today’s fastest retail computers, and I’m talking about advancements and discoveries that will require more than our current leading edge retail capabilities. Distribution of our leading capabilities will always be uneven, and will not push out an old computer at a 1:1 rate, so hard to measure via global mean/median retail computer speed in any middle or lower percentile.
I'm agreeing with you in the sense that even if the bleeding edge didn't progress, we will still have the kind of advancements occurring that you would like to see.
As a random example, full-frame camera CMOS sensors are made with very old processes (relatively speaking), and have very little "logic" on them. If manufactured with 3nm, they could have something like 2,000 transistors per pixel! That would enable seemingly magical "digital imaging" capabilities, such as infinite dynamic range, perfect digital vibration compensation, ultra-high framerates, etc...
I think it would be quite bad actually in opportunity cost. The performance per watt alone would represent a huge energy savings. New chips enable new applications and no new chips would mean that the entire nonlinear economic engine of tech industry would sunset in a few years once every existing application catches up to state of the art. Computationally heavy applications like ML would remain locked in the high tower of large organizations and out of the hands of individuals. VR and AR would probably never arrive to the mainstream.
I think that’s an illusion. The last 50 years of modern computing have proven without a doubt that global energy consumption will far outpace improvements in performance/watt that come with each generation of cpus.
Any improvement in energy usage is wiped out by more kinds of and ever faster gadgets: Desktop computers followed by mobile phones, every possible sized tablet screen, voice assistance speakers, voice assistance speakers with screens, smart watches, smart lights, smart vacuum cleaners, drones, and so on. And with all these smart devices, we have more and more data, which we then need AI to make sense/use of in order to develop new or more useful things. It’s a never ending cycle.
VR and AR never arriving to the mainstream would possibly be true. But our lives are already great today without that.
I think we're still pretty far off from that. Graphics are still held back massively by processing power, and algorithms that significantly could improve visuals would have been developed by now if it were that easy.
Not really, frequency has plateaued precisely because of the crazy power scaling with frequency.
The easily parallelisable nature of graphics means you can always make a faster GPU with more cores. The trouble is, newer nodes aren't reducing power usage per transistor very much, so all the newer GPUs just have a ton more silicon, and therefore much higher power requirements.
Yeah, and the power draw is around the same because the 4090 is 4nm and the 3090 Ti is 8nm. Smaller transistors mean more power-efficient chips, and thus larger improvements in graphics.
I consider all of those as nice-to-haves. We have over 7B people in the world and got to this point without the technologies you mentioned.
We already have the technologies today to allow people to live way longer than would otherwise be possible, to find and use any natural resources the planet has, and to dominate any other organism that could threaten our lives. I’m content with that.
I think it is a big mistake to be content with how crazy primitive we are today.
We can't even operate on a brain tumor that's inside your head - we lack the precision to get in there without damaging the rest of your brain.
When we want to operate at microscopic levels we continually fail.
The mysteries of how and why we age, how many diseases work, remain unknown.
There is so very much left to learn. There may come a day I might feel it's time to be content, but every second that passes I'm a little closer to death.
We should continue to strive for more. May there come a day that we can live for very long periods of time and each individual can have as much as 100 people have today.
Software has become more immature over the last decades because developers lost the engineering mindset and skills to write optimized software like they could back in the days where we only had a couple of MB of ram.
It really does not require new expertise or tech at all to cut latency and load times by 90%. Programmers have to get back to being engineers instead of just duct-taping libraries together or praying that the dumb GC knows what it's doing.
Just going from Python to Java or C# would give us roughly a 95% reduction in energy use. Going from web apps in JS packaged for desktop or mobile to C or C++ would yield a similar gain.
If Python or JS sit idle waiting for user input, are the efficiencies really so different?
Developer time also has energy cost in AC, transport, etc for a human.
What about the fact that the cost of web inefficiency is offloaded to the user? Users pay the power bill for the overcomplicated, slow, unnecessarily CPU intensive front end.
So, at a surface level, what you say is true, but the deeper picture is complicated.
The tooling, too, has gone a certain way... making a native GUI is now slower, less documented, less catered-to than web UI; in my opinion, for most simple to moderately complex UIs, native is going to take twice as long to make look good... but that inverts, in my experience, as the program and UI become more complicated.
It's also strange that the UI itself is so bloated and developmentally time consuming. We lived on the CLI once -- maybe in some not too distant future a killer app can once again show consumers the power of the command line. After all, google/Siri/Alexa do just that.
> If Python or JS sit idle waiting for user input, are the efficiencies really so different?
Nit pick: Python is typically at least an order of magnitude slower than JS, it's a whole different class of slow.
> Developer time also has energy cost in AC, transport, etc for a human.
Yes and fixing bugs in production is especially expensive. Presumably that's why Python is now trying to retrofit static types. There are statically typed languages that are just as terse as Python.
"More-than-Moore has created a compelling case for the implementation-analyses microcosm to transcend across the fabrics of system design, from silicon to package, and even beyond, and more so in the systems companies that are at the bleeding edge of design innovation."
Dating back to the turn of the century, I can still recall the pleasure of building my 1st of many SMP desktops/workstations with an Abit VP6 to game on and work on at home.
In 2022, I would almost expect to see mobo designers and chip manufacturers start rolling out deca-socket designs for compute and automotive use, giving users (okay... just the nerds and geeks) the option of mixing and matching chips to accommodate custom configurations (and unthought-of use cases).
The CPU wall almost reminds me of automotive, where the engines may be hitting the cooling/fuel-system limits of current technologies but still have plenty of optimizations available in unsprung weight, torsional rigidity, coefficient of drag and so on.
It may not address the power efficiency angles, but for many users, having 10 sockets to add optimized compute chips (main cpus, AI accelerators, graphics enhancement chips and a number of other feature-enhanced chips) keeps the R&D money flowing until the industry gets over this next hurdle (they always do).
The author states that the expense of pushing Moore's law, etc. is becoming cost-prohibitive, so take a break and rock some modern PCB designs.
> In 2022, I would almost expect to see mobo designers and chip manufacturers start rolling out deca-socket designs for compute and automotive use
It's called the PCIe bus. And before you say "but it doesn't meet my needs / its latency is too high / its bandwidth is too low", come up with some ideas on how to do that better rather than just saying "add sockets".
Power is nothing without spittle (I'm sure at least one person said it)
I'm not familiar with the lingo, but I was thinking of multilayer sockets, with a higher-bandwidth loop/ring for the primary sockets and the PCIe coming off the CPUs in the traditional fashion. Perhaps I read about that with the Opterons ages ago.
The BeBox (what the Macs could have been if Jobs hadn't been so hellbent on getting back to Apple) had a CPU port.
It was pretty much a northbridge socket/plug to the CPU from outside the case.
Haven't seen anything close to that since. Also, the design of its pseudo-realtime OS would be killer on today's multicores, not to mention the Be kernel C API instead of the side-step-patents-above-all-else API of their bastard BSD. Glad my livelihood doesn't depend on OS X/iOS client-side applications.
I've done a few personal projects with LTSP and think PXE booting is the bee's knees for some uses (obviously not as portable as a laptop running Porteus). Booting remotely was not a priority, but OS exploration has always been exciting.
With more devices (laptops, SBCs, etc.) getting HDMI-In ports, running a compute stick, SBC/Pi or other device externally is also a cool feature. I like KVMs, so if they integrate a keyboard/mouse protocol into the HDMI-In port at some point, it's back to plug-and-play designs instead of RDP, SSH, etc. for the external devices.
Though on the topic of this post, instead of cramming an SoC with the kitchen sink, it seems like they could simplify that dilemma with a deca-socket design, reserving the PCIe bus for traditional expansion.
Even expanding memory channels per socket negates some of the performance penalties under some conditions (bottlenecks can be moved easily and/or will still exist, without the complexity of jamming everything onto one CPU die). That is cool, though!
I did a few quick searches on the quad-CPU Itaniums to no avail, but vaguely recall faster ring buses for sockets (though perhaps just faster for the era).
I can't say that I still have floppies for Leisure Suit Larry, but I can say the 8-bit boom-chacka-mouw-mouw did exist too.
For consumer use, and probably many commercial uses too, I am becoming skeptical about whether users are gaining sufficient benefit from hardware performance improvements.
Software has become so wasteful of resources that the hardware improvements are offset and there is little perceptible improvement for the end user. This is of course not the fault of the chip makers, but the fault of the movement to inefficient web apps and more resource intensive languages and frameworks.
I find myself agreeing with Jim Keller. His stance — from very high up in the chip design industry — is that individual teams at the bottom may be seeing the end of the road, but there are alternate paths being explored by other teams. Often these alternate paths aren’t yet commercialised — mere options — but enable the overall improvements to keep going without getting stuck.
Some examples:
Stacking and joining chiplets will soon enable over 2 TB/s memory bandwidths for ordinary devices, far higher than the 50 GB/s typical of mid-tier DIMM memory.
Storage latency keeps dropping and IOPS keep climbing. Millions of random accesses per second with ~20 microsecond latencies enabled technologies like DirectX DirectStorage. This totally changes the way game engines work and how their art assets are authored.
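One caveat on the storage point: those IOPS only show up if software keeps enough requests in flight. Below is a rough sketch of the access pattern; the file name, block size and queue depth are made up, and a real engine would use io_uring or DirectStorage rather than a Python thread pool.

```python
# Keep many small random reads in flight instead of issuing them one at a time.
import os
import random
from concurrent.futures import ThreadPoolExecutor

PATH = "assets.pak"        # hypothetical packed asset file
BLOCK = 64 * 1024          # 64 KiB per read
QUEUE_DEPTH = 64           # concurrent requests kept in flight

fd = os.open(PATH, os.O_RDONLY)
size = os.fstat(fd).st_size
offsets = [random.randrange(0, max(size - BLOCK, 1)) for _ in range(10_000)]

def read_block(offset: int) -> bytes:
    # pread is positional, so many workers can safely share one descriptor.
    return os.pread(fd, BLOCK, offset)

with ThreadPoolExecutor(max_workers=QUEUE_DEPTH) as pool:
    blocks = list(pool.map(read_block, offsets))

os.close(fd)
print(f"read {sum(len(b) for b in blocks) / 2**20:.1f} MiB in {len(offsets)} random reads")
```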
I think there’s a potential for collaboration between borrow checkers and more aggressively NUMA architectures, but the problem is you still have to run commercial operating systems on any chips you make. That’s less of a problem for embedded systems and controllers, but then you lose a lot of the value as well.
That was one of the few good parts of this article. Software will have to become power-aware. You already see that on phones where software has to schedule tasks with the OS rather than running them whenever it wants.
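As a toy illustration of that idea (class name and interval are made up): rather than every component arming its own timer and waking the CPU whenever it likes, deferrable work is handed to one scheduler that batches it into a single wakeup, which is roughly the discipline mobile OS job schedulers enforce.

```python
# Coalesce deferrable work into one periodic wakeup so the CPU can sleep longer.
import time

class CoalescingScheduler:
    def __init__(self, interval_s: float):
        self.interval_s = interval_s
        self.pending = []

    def defer(self, task) -> None:
        """Queue work that tolerates delay (telemetry upload, sync, cleanup...)."""
        self.pending.append(task)

    def run_once(self) -> None:
        # One wakeup drains the whole batch, then the process goes idle again.
        batch, self.pending = self.pending, []
        for task in batch:
            task()

scheduler = CoalescingScheduler(interval_s=2.0)
scheduler.defer(lambda: print("upload telemetry"))
scheduler.defer(lambda: print("sync mail"))
time.sleep(scheduler.interval_s)  # the sleep is the power win: no polling in between
scheduler.run_once()
```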
I predicted a LONG time ago that once we reached the physical limits, computers would get bigger: starting out with chiplet designs and then moving on to dinner-plate-sized systems such as Cerebras.
Our computers will become larger in size, maybe the size of a fridge in a couple of decades. Power requirements too will increase.
I'm still hoping some wondrous material will be invented which will allow us to shrink chips still further. But it's obvious there's a limit to how far we can go. The theoretical limit is still way off, though; it's several million times the speed of current CPUs.
It seems the bottlenecks nowadays are mainly memory latency/throughput and thermals, the latter being especially problematic for mobile devices with less space for cooling. For memory, although we have higher throughput with DDR5, latency is still high and we can only compensate with larger caches, e.g. AMD's 3D V-Cache.
For larger chips, you get longer wires and may not be able to do things in one cycle. Assuming thermals were not an issue, you could probably cram many cores into one tiny chip, but the bus would be overloaded with too much traffic and long wire lengths. I think eventually cache coherency will be so expensive that we go back to message passing for multicore, or require source/destination node IDs for memory ordering...
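Here's a minimal sketch of that message-passing style (the squaring is just a stand-in for real per-core work): each worker owns its data and communicates only through explicit queues, so correctness never leans on hardware cache coherency between writers and readers.

```python
# Workers exchange explicit messages instead of sharing mutable memory.
from multiprocessing import Process, Queue

def worker(inbox: Queue, outbox: Queue) -> None:
    while True:
        item = inbox.get()
        if item is None:          # sentinel: no more work
            break
        outbox.put(item * item)   # stand-in for real work on core-local data

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    workers = [Process(target=worker, args=(inbox, outbox)) for _ in range(4)]
    for p in workers:
        p.start()

    for n in range(100):
        inbox.put(n)
    for _ in workers:
        inbox.put(None)           # one sentinel per worker

    results = [outbox.get() for _ in range(100)]
    for p in workers:
        p.join()
    print(sum(results))
```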
Not really. Our substrate today is silicon (really n-type silicon). We build a city on top of that substrate. Modern semiconductors have become far more heterogeneous in recent years: our gate insulators are no longer silicon dioxide but high-k dielectrics such as hafnium oxide, and even interconnects now have different compositions, such as cobalt and copper.
But the substrate is still holding us back, and one reason we stay with it is that creating high-quality single-crystal wafers is far more manageable for silicon than for the alternatives.
The solution for a long time has been to use direct bandgap semiconductors such as GaN, AlN, and so on. Silicon is an indirect bandgap semiconductor, meaning its conduction and valence bands are not aligned in momentum space. As a result, electrons hopping between these bands pay a thermal penalty. GaN and AlN don't have that problem, and we now have a plethora of crystals (especially complex oxides) that are all direct, and we have an entire library of bandgaps to choose from.
The problem is that we cannot easily make wafers from these direct-bandgap systems, at least not yet on any commercial scale. Honestly, rather than flushing billions of dollars down the toilet on Zuck's vanity projects, we should have spent the money scaling up Czochralski growth[1] for a whole range of crystal families. But as we reach the limits of reticle sizes and move to chiplet architectures, maybe smaller wafers will work too?
Perhaps computer architecture is destined to become like building architecture: functionally complete, with variation only in form, aside from some minor incremental innovations and the occasional one-off megaproject.
I'm not an architect, but I very, very, much doubt building architecture is complete. Just look at buildings from 30 years ago to now. The differences are stark.
It’s all form. Buildings 30 years ago and buildings today do almost exactly the same things. Sure there have been marginal improvements, but they are marginal.
I don't know, my house was built in 1920 and it's doing fine. Upgrades have been made - heat pumps, new wiring and so forth - but the overall design of the house isn't really that different from one built today. Except that in my house, the 2x4s are actually 2 inches by 4 inches.
That doesn't mean your house would be built like it's 1920 if it were built right now.
Besides, I didn't really mean "normal" houses, but rather large buildings with complex ventilation, sunlight, climate-control and sustainability requirements. There has been a lot of innovation in that space, and it shows in the office buildings that are built today.
They're still predicting the shift away from silicon is another 5 years out. No, really. They've been developing alternatives for 20+ years at this point, and they can make chips, but those chips are far from competitive with silicon in any metric. It's called single-crystal complex oxide, or SiC.
I wonder if part of the recent switch back to statically typed languages is merely a reflection of changing compute requirements. During my career, single core performance has only doubled at best. Scale out has worked… but it hasn’t been cheap.
Or perhaps because knowing/reasoning about at least some properties before running a program is a good thing? Optimisation is only one consequence of static types. Perhaps the reason is that static type systems are getting better and less intrusive.
Some headlines could really use capital letters. I am not a native speaker, and I struggled to understand this sentence until I clicked on the link, and realised it is actually a headline-style sentence...
I wonder, as parallelism is exhausted too, will there be a point of reversion, where the cost of bad management practices to product and performance becomes so big that decision power returns to the experts who craft such systems? As in, the council of engineers votes no on middle management's eternal bloat for company-internal status points? Or else the users switch to engineer-governed systems, as created by open source?
I have a hard time viewing something that can "run out of steam" as a "fundamental law". It seems to me that the true fundamental laws - those of physics - are going full steam ahead, and the reality they describe is catching up to companies' chip designers (and marketers).
I'm a RISC-V designer; I'm afraid it's not a magic bullet for these sorts of issues. What it does do is free lots of designers to work on solutions and build off of each other's work.
Can't we move away from silicon? I've read about countless attempts to design CPUs based on different principles and materials: carbon, optical, and even biologically modeled systems. So far our best results still come from trying to shrink silicon transistors.
With so much invested in silicon, it is hard to boot up a whole new paradigm. In that sense, working with silicon is much easier / lower friction, relatively speaking.
If power, heat and efficiency are valid concerns, we might as well try to optimize the software: use fewer abstractions, go for the code that uses the fewest CPU cycles, use fewer interpreted languages and more AOT-compiled ones.
In general, sure, for things like web pages etc. In fields like robotics people are always desperately trying to improve the efficiency of their algorithms but it really seems like we just need way more powerful silicon.
No... no one is asking questions like that because a chip that consumes no power but still does work would be greater than 100% efficient and would violate the laws of thermodynamics.
Well sure, but exploring the _idea_ of zero power consumption might lead to interesting discoveries in efficiency and power consumption models, even if actually achieving 0% power consumption is impossible.
Yes, if the problem is viewed broadly, eg. passive packages like RFID tags. Power consumption has been a focus area in chips since forever.
For active chips, it's not possible if the problem is viewed narrowly. However, it's possible to have a package that contains both power-generation capability (e.g. solar, kinetic) and highly power-efficient, sporadic processing, which probably exists already. Major application areas would be long-lived deployments (e.g. animal tags, probes, implants, wearables).
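To make that duty-cycle trade-off concrete, here's a back-of-envelope sketch; every number is made up, just in a plausible range for a small harvested-power tag.

```python
# Rough duty-cycle budget for a harvested-power device (illustrative numbers only).
harvested_uW = 100.0   # average power from a small solar cell or kinetic harvester
sleep_uW     = 2.0     # MCU plus radio in deep sleep
active_mW    = 15.0    # MCU awake, sampling and transmitting
burst_ms     = 50.0    # length of one wake-sample-transmit burst

burst_uJ  = active_mW * 1000.0 * (burst_ms / 1000.0)  # mW -> uW, ms -> s
budget_uW = harvested_uW - sleep_uW                    # what's left over for bursts
period_s  = burst_uJ / budget_uW                       # minimum spacing between bursts

print(f"each burst costs {burst_uJ:.0f} uJ")
print(f"sustainable duty cycle: one burst every {period_s:.1f} s")
```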
Going from pointing out CPU and memory hardware problems to declaring, forcefully, the solution can only be Java, of all things, is one of the more bizarre comments I've read here lately. Especially because there is nothing backing up the claim.
Because you need a VM with GC on the server so it doesn't crash and can share memory between cores properly.
On the client you can go lower since deploying new buggy code only affects some of the clients and not the entire server (unless you deploy completely untested patches).
- Rust segfaults.
- Go has no VM.
- WASM has no GC.
How exactly are GC and VM crucial to "not crashing" and "sharing memory"? And what exactly do you mean by "not crashing"? Crashing is a broad term, and I assume you mean C++-style-crash-because-of-type-safety-errors.
Go does not run on a VM, yet is type-safe. Rust can indeed segfault, but only if you (ab)use the "unsafe" mechanism, which is explicitly labeled as not safe. As long as you stick to safe Rust, segfaults are impossible.
Could you explain your reasoning to me in a little more detail, please?
"While I'm on the topic of concurrency I should mention my far too brief chat with Doug Lea. He commented that multi-threaded Java these days far outperforms C, due to the memory management and a garbage collector. If I recall correctly he said "only 12 times faster than C means you haven't started optimizing"." - Martin Fowler https://martinfowler.com/bliki/OOPSLA2005.html
"Many lock-free structures offer atomic-free read paths, notably concurrent containers in garbage collected languages, such as ConcurrentHashMap in Java. Languages without garbage collection have fewer straightforward options, mostly because safe memory reclamation is a hard problem..." - Travis Downs https://travisdowns.github.io/blog/2020/07/06/concurrency-co...
"Inspired by the apparent success of Java's new memory model, many of the same people set out to define a similar memory model for C++, eventually adopted in C++11." - https://research.swtch.com/plmm
All of those sources compare Java to C and C++, but you mentioned more languages than C and C++. Furthermore, your third source even admits that it's possible to implement a similar memory model in C++, which goes against your claims.
Regardless, I wasn't asking for sources, I was asking for elaboration. What exactly does VM and GC have to do with crashing and memory sharing - in other words, how exactly does a lack of VM and GC imply crashing and inefficient memory sharing?
https://i0.wp.com/semiengineering.com/wp-content/uploads/Pic...
(not really a png, apparently, but a webp file)