Intel’s focus on AI architecture is so deep that the actual SKU table seemed like little more than an afterthought. Also notice that no prices are available.
This is the only place you’ll see general-purpose workload performance claims: against five-year-old Intel-only systems, and ignoring Spectre/Meltdown mitigations.
There is no mention of AMD on this or any other slide. Intel actually would dominate AMD in this slide if it were shown, since Epyc doesn’t offer AVX-512 optimization.
Ice Lake in the data center later this year should be an interesting launch, though somewhat limited by its lower socket count.
Intel today announced its third-generation Xeon Scalable (meaning Gold and Platinum) processors, along with new generations of its Optane persistent memory (read: extremely low-latency, high-endurance SSD) and Stratix AI FPGA products.
The fact that AMD is currently beating Intel on almost every conceivable performance metric except hardware-accelerated AI isn’t news at this point. It’s clearly not news to Intel, either, since the company made no claims at all about Xeon Scalable’s performance versus competing Epyc Rome processors. More interestingly, Intel hardly mentioned general-purpose computing workloads at all.
Finding an explanation of the only non-AI generation-on-generation improvement shown required jumping through several footnotes. With sufficient determination, we eventually discovered that the “1.9X average performance gain” mentioned on the overview slide refers to “estimated or simulated” SPECrate 2017 benchmarks comparing a four-socket Platinum 8380H system to a five-year-old, four-socket E7-8890 v3.
Who doesn’t like a good cat pic? These images of a kitten stored in INT8, BF16, and FP32 data types give a good overview of the accuracy levels of each.
These case studies demonstrate both inference and training acceleration provided by the new BF16 data type. Note the fine print, which boils down to “we ignored Meltdown/Spectre to get big numbers.”
If a kitten pic didn’t satisfy you, you can play a cheesy and theoretically BF16-related game. It’s just as much fun as it sounds.
To be fair, Intel does appear to have introduced some genuinely impressive innovations in the AI space. “Deep Learning Boost,” which formerly was just branding for the AVX-512 instruction set, now encompasses an entirely new 16-bit floating-point data type as well.
With earlier generations of Xeon Scalable, Intel pioneered and pushed heavily for the use of 8-bit integer (INT8) inference processing with its OpenVINO library. For inference workloads, Intel argued that the lower accuracy of INT8 was acceptable in most cases, while offering extreme acceleration of the inference pipeline. For training, however, most applications still needed the greater accuracy of FP32, 32-bit floating-point processing.
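For a sense of what the INT8 tradeoff involves, here is a minimal numpy sketch of the generic affine-quantization scheme; the function names are our own illustration, and OpenVINO’s actual calibration tooling is far more sophisticated:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    # Map the observed float range onto the 256 representable INT8 codes.
    scale = (x.max() - x.min()) / 255.0
    zero_point = -128 - np.round(x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    # Recover approximate FP32 values; the rounding error is the accuracy cost.
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(weights)
print(np.abs(weights - dequantize_int8(q, scale, zp)).max())  # worst case ~ scale/2
```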
The new generation adds 16-bit floating-point processor support, which Intel is calling bfloat16. Cutting FP32 models’ bit width in half accelerates processing itself, but more importantly, it halves the RAM needed to keep models in memory. Taking advantage of the new data type is also simpler for programmers and codebases using FP32 models than conversion to integer would be.
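That simplicity comes from the format itself: BF16 is just the top half of an FP32 value, keeping the full 8-bit exponent (and therefore FP32’s dynamic range) while cutting the mantissa from 23 bits to 7. A minimal numpy sketch of the conversion, using plain truncation where hardware typically rounds to nearest:

```python
import numpy as np

def fp32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    # Keep the top 16 bits of each FP32 value: sign, full exponent,
    # and the 7 most significant mantissa bits.
    bits = x.astype(np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)

def bf16_bits_to_fp32(b: np.ndarray) -> np.ndarray:
    # Widen back to FP32 by zero-filling the dropped mantissa bits.
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([3.14159265, 1e-20, 1e20], dtype=np.float32)
# Same range as FP32, but only ~2-3 significant decimal digits survive.
print(bf16_bits_to_fp32(fp32_to_bf16_bits(x)))
```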
Intel also thoughtfully provided a game revolving around the BF16 data type’s efficiency. We cannot recommend it either as a game or as an educational tool.
Optane storage acceleration
Performance results that “may not reflect all publicly available security updates” sounds like weasel wording for “Meltdown/Spectre mitigations not applied.”
The big draws of Optane storage are dramatically lower latency and higher write endurance than NAND SSDs can offer.
Intel also announced a new, 25-percent-faster generation of its Optane “persistent memory” SSDs, which can be used to greatly accelerate AI and other storage pipelines. Optane SSDs are built on 3D Xpoint technology rather than the NAND flash that typical SSDs use. 3D Xpoint offers massively higher write endurance and lower latency than NAND does, which makes it particularly attractive as a fast caching technology, one that can accelerate even all-solid-state arrays.
The big takeaway here is that Optane’s extremely low latency allows it to accelerate AI pipelines, which frequently bottleneck on storage, by offering very rapid access to models too large to keep entirely in RAM. For pipelines that involve rapid, heavy writes, an Optane cache layer can also significantly improve the life expectancy of the NAND primary storage beneath it by reducing the total number of writes that must actually be committed to it.
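As a deliberately oversimplified sketch of that write-coalescing effect (the class and figures below are hypothetical; a real cache tier also handles eviction, ordering, and crash consistency):

```python
class WriteBackCache:
    """Toy cache layer showing why absorbing hot writes reduces NAND wear."""

    def __init__(self, backing_store: dict):
        self.backing = backing_store  # stands in for the NAND tier
        self.dirty = {}               # writes absorbed in the fast cache tier
        self.nand_writes = 0

    def write(self, block: int, data: bytes):
        # Repeated rewrites of a hot block land here, not on NAND.
        self.dirty[block] = data

    def flush(self):
        # Only the final version of each dirty block is committed to NAND.
        for block, data in self.dirty.items():
            self.backing[block] = data
            self.nand_writes += 1
        self.dirty.clear()

cache = WriteBackCache(backing_store={})
for i in range(1000):  # 1,000 rewrites of the same hot block...
    cache.write(block=7, data=f"v{i}".encode())
cache.flush()
print(cache.nand_writes)  # ...cost only 1 write on the NAND tier
```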

For example, a 256GB Optane has a 360PB write-endurance spec, while a Samsung 850 Pro 256GB SSD is only specced for 150TB of endurance: a 2,400:1 advantage to Optane.
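The arithmetic, in case you’d like to check our math:

```python
optane_endurance_tb = 360_000  # 360PB spec, expressed in TB
nand_endurance_tb = 150        # Samsung 850 Pro 256GB spec
print(optane_endurance_tb / nand_endurance_tb)  # 2400.0, i.e. 2,400:1
```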
Meanwhile, this excellent Tom’s Hardware review from 2019 demonstrates just how far in the dust Optane leaves traditional data center-grade SSDs in terms of latency.
Stratix 10 NX FPGAs
A curve that doubles almost once per quarter puts Moore’s Law to shame.
This block-model overview of Stratix claims massive generation-on-generation INT8 inference improvements at data-center scale.
When you need higher density and better efficiency than a general-purpose CPU can provide, you build an ASIC. Stratix is Intel’s answer to AI-focused ASICs.
Finally, Intel announced a new version of its Stratix FPGA. Field-Programmable Gate Arrays can be used as hardware acceleration for some workloads, freeing more of the general-purpose CPU cores to tackle the tasks that FPGAs can’t.
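As a purely illustrative sketch of that offload pattern (the accelerator stand-in below is hypothetical; real FPGA dispatch goes through a vendor runtime rather than a Python thread pool):

```python
from concurrent.futures import ThreadPoolExecutor

def accelerator_offload(batch):
    # Stand-in for work dispatched to an FPGA; pretend this is an
    # INT8 inference kernel running off the CPU cores.
    return sum(batch)

def cpu_work():
    return "general-purpose task handled on the CPU cores"

with ThreadPoolExecutor() as pool:
    # Dispatch the accelerator-friendly workload asynchronously...
    future = pool.submit(accelerator_offload, range(1_000_000))
    # ...leaving the CPU free for the tasks the FPGA can't handle.
    print(cpu_work())
    print(future.result())
```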
Listing image by Intel