The Sandy Bridge CPU cores inside the Xeon E5-2600 series processors are based on the Nehalem cores of the previous Xeon generation, with numerous improvements. HyperThreading and Turbo Boost return from the last generation, both in updated form.
An important update is support for the AVX instruction set. This allows the processor to operate on 256-bit values in one go, where older Xeon CPUs were limited to 128 bits. AVX also adds 12 new instructions and introduces a three-operand format, so a result can be written without overwriting either source operand. Software has to be recompiled for AVX, but for research centres, which often compile their own HPC software, this won't be much of an obstacle.
Thanks to the new AVX instructions, the new Xeons can process 256-bit values natively.
Various optimisations of the cores ensure that the Xeon E5-2600 processors run faster than the previous generation with the same number of cores and identical clock frequencies. For example, the Sandy Bridge architecture has a so-called micro-op cache, which stores decoded instructions so they can be issued more quickly the next time they are needed. The number of execution units that move data to and from memory has been increased, and Intel has also significantly optimised the floating-point execution units.
An important development is that Intel has substantially improved its Turbo Boost feature. The previous generation of Xeons could only reach higher clock frequencies when not every core was in use. In the new generation this is also possible with all cores active, provided that power consumption stays within acceptable limits. The high-end Xeon E5-2690, which we tested for this article, can clock 400MHz above its 2.9GHz base frequency. When only one core is active, it can even gain 900MHz. The new Turbo Boost 2.0 feature can also let the processor briefly exceed its TDP, for example after it has been idle for a while. This can provide a significant performance gain when the load fluctuates heavily.
Turbo Boost 2.0 can temporarily let the processor exceed the TDP.
Many server administrators disabled the Turbo Boost function on the previous Xeons, because the added performance often did not outweigh the higher power usage. Intel tackled this problem with a two-pronged approach. First, the new Xeons scale much better to higher clock frequencies, because the cores and the L3 cache now run at the same frequency: when the cores speed up, so does the L3 cache. On the old Xeons the L3 cache ran at a fixed, lower frequency, so the more the CPU cores sped up, the wider the gap with the L3 cache grew and the bigger the relative latency became. Moreover, the new Xeons have improved techniques for faster and more accurate management of the chip's power consumption, to prevent unwanted peaks. We will touch upon this again later.