Why are processors increasing in cores rather than increasing in clock speeds?
The clock speed of the fastest currently available CPU cores is around 5 GHz…but most CPUs have been sitting at around 3 GHz.
We’re running into the physical limits behind Moore’s Law on the size of a transistor – so shrinking the circuitry (which is what has historically allowed higher clock speeds) is getting close to impossible.
A 5 GHz clock means that over the course of one clock cycle, a beam of light can cross about 6 centimeters in a vacuum. But light travels roughly four times slower in silicon – so when the 5 GHz clock ticks, components just a few millimeters away aren’t going to know about it until the next clock tick has already started.
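A quick back-of-the-envelope calculation (a rough Python sketch, using the same four-times-slower figure quoted above) makes the distances concrete:

```python
# Rough numbers only - this just turns the claim above into arithmetic.
c_vacuum_m_per_s = 3.0e8            # speed of light in a vacuum
clock_hz = 5.0e9                    # a 5 GHz clock
cycle_s = 1.0 / clock_hz            # one clock cycle = 0.2 nanoseconds

per_cycle_vacuum_cm = c_vacuum_m_per_s * cycle_s * 100   # ~6 cm per cycle
per_cycle_on_chip_cm = per_cycle_vacuum_cm / 4           # ~1.5 cm, using the "4x slower" figure

print(f"One 5 GHz cycle: ~{per_cycle_vacuum_cm:.0f} cm in a vacuum, "
      f"~{per_cycle_on_chip_cm:.1f} cm on-chip")
```

And that centimeter-and-a-half assumes a perfectly straight run – which, as the next point explains, chip wiring never is.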
The wiring inside a complex chip is far from a straight line path – so without super-careful design, even components just one millimeter apart will run into problems.
Life gets very hard for chip designers under those circumstances.
Sure, you could maybe push the clock rate higher – but if it’s going to take multiple clock cycles for the data to get where it’s going and for the answers to come back…then you start to see diminishing returns.
Charts of microprocessor trends over the decades clearly show that although the number of transistors in a CPU has climbed fairly consistently, clock speed (and power consumption along with it) pretty much leveled off – at around 3 GHz – all the way back in 2002. Over nearly 20 years, to 2021, the clock speed of the fastest processors hasn’t even doubled.
So Moore’s Law for clock speeds has been pretty much over for 20 years.
The only way we can still make significantly faster chips is to use more transistors – which we have PLENTY of…but that doesn’t translate into raw speed.
To use more transistors, all you can really do is add more cores – add more cache memory – or try to build a more sophisticated way to run machine code.
- Adding more cores doesn’t buy you much if your software can’t use them all (and most software cannot) – the sketch after this list shows how quickly the benefit runs out.
- Increasing the amount of cache produces an incidental speed-up for some algorithms and for badly written code – but for well-written programs it often has little to no effect on performance.
- So we’re left with using more transistors to get smarter at running instructions.
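Here’s the sketch promised above – a minimal illustration of Amdahl’s law, the rule that caps how much extra cores can help. The 75% parallel fraction is just an assumed, illustrative number:

```python
# Amdahl's law: if only part of a program can run in parallel, the serial
# remainder puts a hard ceiling on the overall speedup, no matter how many
# cores you throw at it. The 0.75 parallel fraction is an assumption for
# illustration, not a measurement.
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

for cores in (1, 2, 4, 8, 16, 64):
    print(f"{cores:3d} cores -> {amdahl_speedup(0.75, cores):.2f}x speedup")
# Even with 64 cores, a program that is 75% parallelisable tops out at
# roughly 3.8x - nowhere near 64x.
```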
Efforts to get smarter at running instructions have included things like branch prediction, parallel execution at the micro-architectural level, speculative execution, and all sorts of other devious tricks.
But the trouble with these tricks is that they keep resulting in fundamental CPU bugs and security problems.
Speculative execution (for example) was at the root of the Spectre, Meltdown, SPOILER and Foreshadow attacks – which were essentially impossible to defend against in software because the flaw was hard-wired into the CPU core.
It’s extremely hard to implement fancier CPU features without inadvertently opening a new security hole or introducing some other horrific bug.
We truly are seeing the end of CPU speed improvements.
SO WHAT IS THE FUTURE?
The future seems to be in more specialized processors:
- The GPU architecture – originally intended for graphics processing – has proven to be immensely powerful. Instead of having a handful of independent, highly sophisticated CPU cores, you build hundreds of much simpler GPU “cores” which operate more or less in lockstep. By sharply limiting the functionality – but radically increasing the numbers – we can write specialized “shader” software that runs at speeds CPU software can only dream of. Hence GPU cores are now used for things that have nothing to do with graphics – everything from artificial intelligence to bitcoin mining. They don’t help with every algorithm, but in areas where they DO help, you can get two orders of magnitude of speedup from a relatively cheap chip.
- Specialized AI processors – the Tesla AI chip, for example – take that even further, performing *JUST* the neural-network math at the heart of modern AI, but doing so with VAST numbers of even simpler processors (not much more than multiply-accumulators – see the sketch after this list). This means they can run AI workloads hundreds of times faster than even a GPU. But that’s ALL they can do: in order to run conventional programs, the Tesla chip has to have several conventional CPU cores on the same chip to feed and generally manage the AI system.
- Quantum computers are truly, insanely fast – but they are only capable of running VERY specialized algorithms that can exploit their extreme parallelism.
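To make the GPU/AI-chip point concrete, here’s a rough sketch (in Python, with NumPy standing in for the hardware – the array sizes are purely illustrative) of the kind of work those chips parallelise:

```python
import numpy as np

# The core operation in a neural-network layer is a huge pile of independent
# multiply-accumulates: every output is just sum(weight * input) over one
# row of the weight matrix. The sizes below are arbitrary, for illustration.
inputs = np.random.rand(1024)          # activations arriving at a layer
weights = np.random.rand(512, 1024)    # one weight per (output, input) pair

# All 512 * 1024 multiplies are independent of each other, which is why a
# chip with hundreds of lockstep GPU cores - or thousands of bare
# multiply-accumulate units on an AI chip - can chew through them in
# parallel, while a handful of CPU cores has to grind through them a few
# at a time.
outputs = weights @ inputs             # 512 dot products (multiply-accumulate)
```

That’s essentially all the specialized silicon does – but it does it with massive parallelism, which is where those two-orders-of-magnitude wins come from.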
By finding tasks which (although not universal) are fairly common – and building hardware for them that isn’t a general-purpose computer (and maybe isn’t even “Turing Complete”) – you can still make vast improvements.
But speeding up a conventional CPU core to any significant degree is rapidly becoming essentially impossible.