Skylake as an 18-core model
We already covered the architecture of the high-end Skylake processors extensively in our Intel Core i9 7900X review. There is little point in repeating all the ins and outs here, so if you have not read the linked review yet, we definitely recommend doing so.
The most important difference from the existing Skylake-X models is, as mentioned before, that Intel uses the HCC chip of Skylake-SP for the Core i9 7920X through 7980XE. As you may have read before, Intel switched from a ring bus layout to a mesh architecture, in which all cores are linked together in a matrix, essentially a chessboard layout. Horizontal and vertical communication lines connect each core with the caches belonging to other cores and with shared chip components such as the memory and PCI-Express controllers. In the 10-core LCC chip the cores sit in a 4x3 matrix, where two spots are taken by the memory controller. In the 18-core HCC chip the cores sit in a 4x5 matrix. All Skylake-X chips are based on one of these two chip types; models with fewer cores simply have a few cores turned off.
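The core counts follow directly from the grid sizes. The snippet below is only an illustrative sketch of that arithmetic; the function name is made up for this example, and it assumes that the HCC chip, like the LCC chip, gives up two grid positions to the memory controllers (the text above states this explicitly only for the LCC chip).

```python
def core_positions(columns, rows, memory_controller_spots=2):
    """Grid positions left for cores after the memory controllers take theirs.

    memory_controller_spots=2 is stated in the text for the LCC chip and
    ASSUMED here for the HCC chip as well.
    """
    return columns * rows - memory_controller_spots


lcc_cores = core_positions(4, 3)  # 10-core LCC chip
hcc_cores = core_positions(4, 5)  # 18-core HCC chip
print(f"LCC: {lcc_cores} cores, HCC: {hcc_cores} cores")
```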
In the diagram below you can see a total of six memory channels. In the Xeon processors that use these chips all six are available, while the Socket 2066 Skylake-X desktop processors have "only" four memory channels available. Something similar applies to the PCI-Express 3.0 controller: the chip has 48 lanes, but Skylake-X can use up to 44 of them.
More L2 cache and AVX512
As you were able to read in our previous reviews, the cores in the Skylake-X / Skylake-SP processors are based on the cores of the "regular" Skylake processors that we have known for some time as the 6th generation Core CPUs for Socket 1151. However, two important additions have been made to these cores. Firstly, there is more cache memory: while Intel cores traditionally have 256 kB of L2 cache of their own, Skylake-X increases this to 1 MB. Secondly, the instruction set is extended with AVX-512, which means that certain instructions can operate on 512 bits of data at once. Bear in mind that software has to be written or compiled specifically for AVX-512 in order to profit from it, and that as soon as AVX-512 instructions show up in the program code the CPU drops to a lower clock frequency because of the high power consumption that comes with AVX-512. At the time of writing we do not yet know of any consumer software that uses AVX-512, but this is probably a matter of time.
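To give an idea of what "512 bits at once" means in practice, the sketch below simply divides the vector width by the size of one value. It is plain arithmetic, not a benchmark, and the function name is made up for this example; the 256-bit figure is included for comparison with the older AVX2 instruction set.

```python
def values_per_instruction(vector_bits, value_bits):
    """How many values of a given size fit in one vector register."""
    return vector_bits // value_bits


for name, bits in (("AVX2", 256), ("AVX-512", 512)):
    floats = values_per_instruction(bits, 32)   # single-precision floats
    doubles = values_per_instruction(bits, 64)  # double-precision floats
    print(f"{name}: {floats} floats or {doubles} doubles per instruction")
```

Because of the lower clock frequency mentioned above, the real-world gain is smaller than the doubling in width suggests.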
From inclusive to exclusive
It is also important to cover once again the adjusted cache layout of Skylake-X / Skylake-SP, which we touched on briefly on the previous page. Traditionally the Xeon processors, and with them the derived high-end desktop CPUs, had 256 kB of L2 cache and 2.5 MB of shared L3 cache per core. The L2 cache is now quadrupled from 256 kB to 1 MB per core. On the other hand, the L3 cache is reduced to 1.375 MB per core.
Up until the last generation this L3 cache was inclusive, which meant that all data placed in a core's own L2 cache also had to be present in L3. The advantage of this design is that when core A needs data from core B, it can always be found directly in the shared cache. There is no need for a request to first copy the data from core B's L2 cache to the shared L3 cache. Another advantage is that keeping the caches coherent is straightforward. When multiple cores are working on the same data and one of them makes a change, it is processed directly in L3. Based on that, the other cores can quickly determine whether the data in their own L2 cache is still up to date. A disadvantage of an inclusive cache is that the total amount of data that can be cached is smaller: of the 25 MB of L3 cache in the 10-core top model Core i7 6950X, up to 2.5 MB was always a copy of the data in the ten 256 kB L2 caches. With Skylake-X the L3 cache is no longer inclusive, so no capacity is lost to storing copies of the data in the L2 caches. On the 10-core Core i9 7900X this means that the full 13.75 MB can be used for data exchange between cores, the price being that the caching algorithms have become a lot more complicated due to the exclusive design.
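The capacity figures in this comparison can be reproduced with a few lines of arithmetic. The sketch below is a simplified model, not a description of Intel's actual cache logic: it assumes the L2 caches are completely full and that an exclusive L3 never holds a copy of L2 data. The function and key names are made up for this example.

```python
def l3_breakdown(cores, l2_mb_per_core, l3_mb_per_core, inclusive):
    """Split the shared L3 into L2 copies and remaining space (in MB).

    Simplification: worst case for the inclusive design (all L2 caches
    full) and an idealized exclusive design with no duplicates at all.
    """
    l2_total = cores * l2_mb_per_core
    l3_total = cores * l3_mb_per_core
    copies = l2_total if inclusive else 0.0
    return {
        "l2_total_mb": l2_total,
        "l3_total_mb": l3_total,
        "l3_copies_of_l2_mb": copies,
        "l3_without_copies_mb": l3_total - copies,
    }


# Core i7 6950X: 10 cores, 256 kB L2 and 2.5 MB inclusive L3 per core
print(l3_breakdown(10, 0.25, 2.5, inclusive=True))
# Core i9 7900X: 10 cores, 1 MB L2 and 1.375 MB L3 per core
print(l3_breakdown(10, 1.0, 1.375, inclusive=False))
```

Under these assumptions the 6950X keeps 22.5 MB of its L3 free of L2 copies, while the 7900X has its full 13.75 MB of L3 plus 10 MB of L2 in total.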
To be clear: this new cache layout is primarily designed for servers. One of the underlying reasons is presumably that powerful Xeon CPUs are increasingly used for virtualization, where multiple software installations run on a single CPU and each use their 'own' cores. Transferring data between cores is less important in such a situation, while better single-core performance can offer a distinct advantage. The latter is also an advantage on the desktop, although the new design can be a disadvantage in certain workloads. In our Core i9 7900X review we already saw workloads that scaled better or worse than we would expect, presumably due to these changes.
Turbo Boost 3.0
Last but not least we should look at Turbo Boost 3.0 again, which we know from the Broadwell-E high-end desktop CPUs. After production Intel tests every Skylake-X CPU internally to determine its two "best" cores. When only one or two cores are in use, the CPU will always try to move the workload to these two cores, which are then allowed a higher maximum turbo clock frequency. The new 12- to 18-core models have a maximum turbo clock frequency of 4.2 GHz on arbitrary cores with single- and dual-threaded applications. Thanks to Turbo Boost 3.0 this increases to 4.4 GHz when the workload runs on the right cores. An extra driver is no longer necessary: support for Turbo Boost 3.0 has been built into Windows 10 since the Windows 10 Anniversary Update.
The table below shows all turbo clock frequencies for the Skylake-X and Kaby Lake-X processors. We see that the 7980XE runs at 4.2 GHz with one or two active cores (4.4 GHz with Turbo Boost 3.0), at 4.0 GHz with 3 or 4 active cores, at 3.9 GHz with 5 to 12 active cores, at 3.5 GHz with 13 to 16 active cores and at 3.4 GHz with 17 or 18 active cores. With the 7960X the turbo values range from 4.2 GHz to 3.6 GHz.
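For readers who want to look up the 7980XE values quickly, the steps described above can be written down as a small lookup. This is only a sketch based on the figures in this paragraph; the table and function names are made up for this example, and it assumes that each step applies until the next one begins.

```python
# (highest active core count for this step, turbo clock in GHz),
# taken from the 7980XE figures in the text above
TURBO_STEPS_7980XE = [(2, 4.2), (4, 4.0), (12, 3.9), (16, 3.5), (18, 3.4)]
TURBO_BOOST_3_GHZ = 4.4  # only with one or two active cores on the "best" cores


def max_turbo_ghz(active_cores, on_best_cores=False):
    """Maximum turbo clock of the 7980XE for a number of active cores."""
    if not 1 <= active_cores <= 18:
        raise ValueError("the 7980XE has 1 to 18 active cores")
    if on_best_cores and active_cores <= 2:
        return TURBO_BOOST_3_GHZ
    for highest_core_count, ghz in TURBO_STEPS_7980XE:
        if active_cores <= highest_core_count:
            return ghz


for cores in (1, 4, 8, 16, 18):
    print(f"{cores} active cores: {max_turbo_ghz(cores)} GHz")
```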