Architecture: Skylake-X versus Skylake (2)
Aside from the introduction of AVX512 and the new mesh structure, there are some other important differences between the new Skylake-X processors and the existing Skylake chips: namely a new cache layout and Turbo Boost 3.0.
Ever since the first generation of Core processors, Intel has given every CPU core its own 256 kB L2-cache. The CPU also always contains an L3-cache, shared between all cores, which amounted to 2.5 MB per core in the last generation of high-end desktop CPUs (Broadwell-E). That means 25 MB for a 10-core CPU. This L3-cache was inclusive, which means that all data placed in a core's private L2-cache also had to be present in L3. The advantage of this is that when core A needs data from core B, it can always be found in the shared cache. There is therefore no need to first send a request to copy data from the private L2-cache to the shared L3-cache. Another advantage is that keeping the caches coherent is straightforward. When multiple cores work on the same data and one of them changes it, the change is processed directly in L3. Based on this, the other cores can quickly determine whether the data in their own L2-cache is still up to date. However, a disadvantage of an inclusive cache is that the total amount of data that can be cached is smaller: of the 25 MB L3-cache of the 10-core Core i7 6950X, 2.5 MB always served as a copy of the data in the ten 256 kB L2-caches.
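The inclusion property described above can be illustrated with a toy model. The Python sketch below is only an illustration under simplifying assumptions: caches are modeled as unbounded sets of addresses, and the class and method names are hypothetical rather than Intel terminology.

```python
class InclusiveCacheModel:
    """Toy model of private L2-caches backed by an inclusive shared L3-cache.

    Assumption: caches are unbounded sets of addresses; capacity, eviction
    and timing are deliberately ignored.
    """

    def __init__(self, core_count):
        self.l2 = [set() for _ in range(core_count)]  # one private L2 per core
        self.l3 = set()                               # shared by all cores

    def load(self, core, address):
        # Inclusive design: a line placed in a core's L2 is also placed in L3.
        self.l2[core].add(address)
        self.l3.add(address)

    def shared_lookup(self, address):
        # Another core only has to check L3; it never has to query a private L2.
        return address in self.l3

    def is_inclusive(self):
        # Every line held in any L2 must also be present in L3.
        return all(lines <= self.l3 for lines in self.l2)


model = InclusiveCacheModel(core_count=10)
model.load(core=1, address=0x1000)   # core B caches a line
print(model.shared_lookup(0x1000))   # core A looks for it in the shared L3
print(model.is_inclusive())
```

The model makes the trade-off visible: the lookup is trivial because L3 mirrors every L2, and that mirroring is exactly the capacity that an inclusive design gives up.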
With Skylake-X, Intel drastically changed the cache-architecture. The amount of private L2-cache available to every core has been quadrupled from 256 kB to 1 MB. This means that every core can keep a lot more data ‘close by’ to work on. As a result the L2-hitrate, the chance that the data a core needs is in its own cache, increases considerably, which benefits performance. However, the shared L3-cache has been reduced to 1.375 MB per core, for a total of 13.75 MB on a 10-core CPU. The cache is also no longer inclusive, which means that it no longer holds a copy of all the data in the L2-caches. The full 13.75 MB can therefore be used for data exchange between cores, at the cost of considerably more complicated caching algorithms because of the non-inclusive design.
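To put the numbers side by side, the following Python sketch works through the cache arithmetic for a 10-core Broadwell-E and a 10-core Skylake-X using the per-core figures given in the text. The function name and dictionary keys are hypothetical, and the "unique" figure for the non-inclusive design is an upper bound that assumes no line is held in both L2 and L3 at the same time.

```python
KB_PER_MB = 1024


def cache_summary(cores, l2_kb_per_core, l3_mb_per_core, inclusive):
    """Return total L2, total L3 and the maximum amount of unique cached data (MB)."""
    l2_total_mb = cores * l2_kb_per_core / KB_PER_MB
    l3_total_mb = cores * l3_mb_per_core
    if inclusive:
        # Every L2 line is duplicated in L3, so L2 adds no unique capacity.
        unique_mb = l3_total_mb
    else:
        # Best case for a non-inclusive L3: nothing is duplicated.
        unique_mb = l2_total_mb + l3_total_mb
    return {"l2_total_mb": l2_total_mb,
            "l3_total_mb": l3_total_mb,
            "max_unique_mb": unique_mb}


broadwell_e = cache_summary(cores=10, l2_kb_per_core=256,
                            l3_mb_per_core=2.5, inclusive=True)
skylake_x = cache_summary(cores=10, l2_kb_per_core=1024,
                          l3_mb_per_core=1.375, inclusive=False)

print("Broadwell-E:", broadwell_e)
print("Skylake-X:  ", skylake_x)
```

Under these assumptions the two designs end up close in total unique capacity (25 MB versus at most 23.75 MB), but Skylake-X keeps a much larger share of it private to each core.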
Intel states that the new cache-architecture can result in a significant performance boost. Of course we have to bear in mind that the Skylake-X CPUs are basically server CPUs. Adjustments such as these are driven primarily by developments in server workloads. The underlying reasoning is possibly that powerful Xeon CPUs are used more and more for virtualization, where multiple software installations run on a single CPU and each use their ‘own’ cores. The exchange of data between cores is not as important in such a situation, while accelerating single-core performance can offer a big advantage.
On the desktop, where we still find a lot of single-threaded software, the increase of the L2-cache can definitely result in a performance boost in some cases. That said, you could also argue that in situations where one program utilizes all cores, the new format can negatively impact performance. If we see big differences in the benchmarks between a 10-core Broadwell-E and a 10-core Skylake-X, this cache-format might just be the biggest factor.
It does appear that Intel placed the extra L2-cache and the AVX512 functionality outside of the original core design. The base of the cores is therefore identical to the existing Skylake CPUs, with extra transistors placed around the cores in Skylake-X for the added functionality.
Turbo Boost 3.0
Another difference between Skylake and Skylake-X is the support for Turbo Boost 3.0, a feature that we already know from the Broadwell-E high-end desktop CPUs. After production, Intel uses internal testing to determine the two “best” cores of every manufactured Skylake-X CPU. When only one or two cores are in use, the CPU will always try to move the workload to those two cores, which can then run at an even higher maximum turbo clock frequency.
For example, the top model Core i9 7900X has a maximum turbo clock frequency of 4.3 GHz on arbitrary cores for single- and dual-threaded applications. Thanks to Turbo Boost 3.0 this increases to 4.5 GHz when the workload runs on the two best cores.
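The behaviour described for the Core i9 7900X can be summarized in a few lines of Python. This is a minimal sketch of the rule as stated in the text, not of Intel's actual implementation: the function name and parameters are hypothetical, and because no clock speeds are quoted here for loads on more than two cores, the function returns None in that case.

```python
def peak_turbo_ghz(active_cores, on_best_cores,
                   turbo2_ghz=4.3, turbo3_ghz=4.5, best_core_count=2):
    """Peak clock for a light workload, using the Core i9 7900X figures as defaults."""
    if active_cores < 1:
        raise ValueError("at least one core must be active")
    if active_cores > best_core_count:
        # No figure is given in the text for heavier loads.
        return None
    # One or two active cores: the higher clock only applies on the best cores.
    return turbo3_ghz if on_best_cores else turbo2_ghz


print(peak_turbo_ghz(active_cores=1, on_best_cores=True))
print(peak_turbo_ghz(active_cores=2, on_best_cores=False))
```

The extra 200 MHz therefore depends entirely on the scheduler placing the one or two busy threads on the cores that Intel marked as best.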
You do not need an extra driver for this; since the Windows 10 Anniversary Update, support for Turbo Boost 3.0 is built into Windows 10.