Architecture: Skylake as the basis
Basically, the Skylake-X processors use the same cores that we know from the Skylake processors used in laptops and regular desktops. We already covered the exact improvements to the cores in our Skylake architecture review, but we repeat them here.
First, Skylake as it was introduced for desktop and laptop processors. In the front end of the CPU cores, Intel improved the branch predictor, as it does nearly every generation. Modern CPUs do not execute the instructions of program code in their original order. Instead, instructions are executed in parallel and in the most efficient order possible, in order to keep the execution units fully utilized. The execution units are the parts of the CPU cores that perform the calculations and operations. When program code contains branches, such as IF-THEN-ELSE constructions, whose conditions depend on variables that are not yet known or calculated, the processor has to guess which branch is most likely to be taken. The better the branch predictor, the fewer wrong guesses it makes and the less often the core executes instructions whose results are ultimately discarded.
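Intel does not publish how its branch predictor works. To illustrate the basic principle of learning from past outcomes, here is a minimal sketch of a textbook 2-bit saturating-counter predictor. The class and function names, the table size and the indexing scheme are our own assumptions for illustration, not Skylake's design.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counters (hypothetical, not Intel's design).

    Counter values 0-1 predict "not taken", 2-3 predict "taken".
    """

    def __init__(self, table_size: int = 1024) -> None:
        self.table_size = table_size
        self.counters = [1] * table_size  # start every entry at "weakly not taken"

    def _index(self, pc: int) -> int:
        # Assumed indexing: the branch address modulo the table size.
        return pc % self.table_size

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def mispredictions(predictor: TwoBitPredictor, pc: int, outcomes: list[bool]) -> int:
    """Count wrong guesses for a sequence of outcomes of a single branch."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)  # learn from the actual outcome
    return wrong


if __name__ == "__main__":
    # A loop branch: taken 9 times, then falls through once; the loop runs 10 times.
    loop = ([True] * 9 + [False]) * 10
    print(mispredictions(TwoBitPredictor(), pc=0x400123, outcomes=loop))
```

This prints 11: out of 100 branch outcomes the predictor guesses wrong twice during the first pass while it warms up, and after that only on each loop exit. Predictors in real CPUs are considerably more elaborate, but the goal is the same: fewer wrong guesses means less wasted work.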
Furthermore, Intel increased the number of instructions a core can keep track of while executing them out of order. This so-called out-of-order window has grown from 192 instructions in Haswell to 224 in Skylake. All of this is done to keep as many execution units busy as often as possible. According to Intel, the prefetchers have also been improved; these are the parts of the CPU that predict which data from memory will be needed, retrieve it ahead of time and place it in the L2 or L1 cache. Unfortunately, Intel did not explain this any further.
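Since Intel does not say how its prefetchers work, the sketch below only shows the general idea with the classic "constant stride" technique: if a program accesses memory at a fixed distance several times in a row, fetch the next addresses early. The class name and the `degree` parameter are hypothetical, and real prefetchers track many streams at once.

```python
from typing import List, Optional


class StridePrefetcher:
    """Hypothetical single-stream stride prefetcher (textbook idea, not Intel's design)."""

    def __init__(self, degree: int = 2) -> None:
        self.degree = degree                  # how many strides ahead to fetch (assumed)
        self.last_addr: Optional[int] = None
        self.last_stride: Optional[int] = None

    def access(self, addr: int) -> List[int]:
        """Record a demand access and return the addresses to prefetch, if any."""
        prefetches: List[int] = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Only prefetch once the same non-zero stride is seen twice in a row.
            if stride != 0 and stride == self.last_stride:
                prefetches = [addr + stride * k for k in range(1, self.degree + 1)]
            self.last_stride = stride
        self.last_addr = addr
        return prefetches


if __name__ == "__main__":
    pf = StridePrefetcher(degree=2)
    for address in (0, 64, 128, 192):  # walking an array one 64-byte line at a time
        print(address, pf.access(address))
```

The first two accesses only establish the pattern; from the third access onward the sketch requests the next two lines (192 and 256, then 256 and 320) before the program asks for them.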
According to Intel, the execution units in the back end of the cores have lower latencies, and there are more of them. The former indicates a shorter pipeline for certain instructions. We are not allowed to reveal the exact number of execution units before the Skylake-based Xeon versions are released.
New compared with previous generations is that execution units within a core that are not in use, for example the floating-point units when only integer instructions are queued up, can be turned off to reduce power consumption. For security and encryption software, Intel also states that AES-GCM and AES-CBC operations are accelerated by 17% and 33% respectively.
For communication between the cores and memory, Intel mentions better L2 cache miss bandwidth, which indicates a faster connection between the cores and what we used to call the L3 cache, now called the LLC or last-level cache. Skylake also contains new instructions that allow the different caches to be managed more efficiently. According to Intel, Hyper-Threading has also been improved in Skylake.
The image below shows a few key figures for buffers and other structures that have been enlarged in Skylake. We already mentioned the larger out-of-order window, but, for example, the number of store operations (instructions that write data to memory) that can be in flight at the same time has also increased considerably, as has the number of instructions the scheduler can keep track of at once. Each of these changes makes a small contribution to the improved IPC (instructions per clock).
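Why do bigger buffers raise IPC? A useful rule of thumb is Little's law: the number of operations in flight equals throughput multiplied by latency. Rearranged, a buffer with N entries whose entries each stay occupied for L cycles can sustain at most N / L operations per cycle. The function name and the 200-cycle latency below are hypothetical, illustrative values, not Intel figures.

```python
def max_throughput(entries: int, latency_cycles: float) -> float:
    """Little's law upper bound: operations completed per cycle = entries / latency."""
    if latency_cycles <= 0:
        raise ValueError("latency must be positive")
    return entries / latency_cycles


if __name__ == "__main__":
    latency = 200.0  # hypothetical cycles an entry stays occupied, e.g. waiting on a cache miss
    for entries in (32, 48, 64):
        print(entries, max_throughput(entries, latency))
```

With a fixed latency, throughput scales linearly with the number of entries, so enlarging a buffer lets the core keep more slow operations in flight instead of stalling when the buffer fills up.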
The cores of the Skylake processor also contain two new security technologies. Intel Software Guard Extensions, or Intel SGX for short, is meant to let applications run in their own, fully isolated environment, so that other software cannot view or manipulate their data or program code. This is similar to what Intel Trusted Execution Technology does to separate multiple virtual machines, but SGX does it at the application level.
Intel Memory Protection Extensions, or Intel MPX, is an extension of the cores that is meant to counter buffer overflow attacks. A buffer overflow, in which more data is written to a buffer than it was meant to hold in order to overwrite adjacent data, for instance that of other software, is one of the most commonly used methods for software and hardware hacks.
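The idea behind MPX is that a pointer carries a lower and an upper bound, and accesses are checked against those bounds before they happen. The sketch below is only a software analogy of that check; it does not use the actual MPX instructions (such as BNDMK, BNDCL and BNDCU), and the class names are hypothetical.

```python
class BoundsError(Exception):
    """Stands in for the bound-range exception MPX hardware raises on a violation."""


class CheckedBuffer:
    """Hypothetical buffer that keeps its own bounds, like an MPX bounds register."""

    def __init__(self, size: int) -> None:
        self.memory = bytearray(size)
        self.lower = 0           # lowest valid offset
        self.upper = size - 1    # highest valid offset (inclusive)

    def write(self, offset: int, data: bytes) -> None:
        first, last = offset, offset + len(data) - 1
        # Check both ends of the access before touching memory.
        if first < self.lower or last > self.upper:
            raise BoundsError(f"write [{first}, {last}] outside [{self.lower}, {self.upper}]")
        self.memory[first:last + 1] = data


if __name__ == "__main__":
    buf = CheckedBuffer(8)
    buf.write(0, b"12345678")      # fits exactly
    try:
        buf.write(4, b"overflow")  # 8 bytes starting at offset 4: runs past the end
    except BoundsError as err:
        print("blocked:", err)
```

The overflowing write is rejected before any byte is modified, which is exactly the property that stops an attacker from overwriting neighbouring data.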
A few improvements to the core compared with Broadwell serve purely to further reduce power consumption. A smart addition is that the parts of the core needed for AVX2 instructions can be shut down completely when no AVX2 instructions are coming in. With AVX2, instructions operating on 256-bit data are processed in one go, which requires much more compute power, and therefore electrical power, than regular 32- or 64-bit instructions. Research by Intel shows that AVX2 usage mostly comes in two modes: none at all or a lot at once. Fully disabling this hardware does cause a slight delay as soon as the next AVX2 instruction arrives, but that seems a justifiable trade-off. Besides completely turning off the hardware needed for AVX2, other unused parts of the core can also be throttled back slightly. Something comparable is possible for AVX-512 (see the next pages). Intel also states that it has once again improved the idle power consumption of the cores.
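The trade-off described above can be sketched with a simple idle-timeout policy: switch the vector unit off after a number of idle cycles and pay a wake-up penalty on the next AVX2 instruction. Intel has not disclosed its actual policy; the function name, the one-character-per-cycle trace format, the idle limit and the wake-up penalty below are all hypothetical.

```python
def simulate(trace: str, idle_limit: int, wake_penalty: int) -> tuple[int, int]:
    """Hypothetical power-gating model for a wide vector unit.

    trace: one character per cycle, 'V' = AVX2 instruction, '.' = anything else.
    Returns (cycles the unit was powered, stall cycles spent waking it up).
    """
    powered = False
    idle = 0
    powered_cycles = 0
    stalls = 0
    for op in trace:
        if op == "V":
            if not powered:
                stalls += wake_penalty  # the unit must power up before it can execute
                powered = True
            idle = 0
        elif powered:
            idle += 1
            if idle >= idle_limit:
                powered = False         # gate the unit off after enough idle cycles
        if powered:
            powered_cycles += 1
    return powered_cycles, stalls


if __name__ == "__main__":
    # Bimodal usage: a burst of AVX2, a long stretch without, then another burst.
    trace = "V" * 50 + "." * 1000 + "V" * 50
    print(simulate(trace, idle_limit=20, wake_penalty=10))
```

For this bimodal trace the unit is powered for 119 of the 1100 cycles at the cost of 20 stall cycles for two wake-ups. If AVX2 instructions were instead sprinkled evenly through the code, the unit would rarely switch off, which is why Intel's observation that usage is mostly "none or a lot" makes gating worthwhile.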