Embedded Solution Product of the Year
Product Name: AMD Xilinx VCK5000 Versal Development Card
The World’s First “0 Dark Silicon” AI Accelerator
Introducing the VCK5000
AI inference is a critical technique for meeting current computing demands, particularly in mobile and cloud applications. These demands include the power constraints, size, and (in mobile) weight of conventional computing systems, as well as cost, which is extremely important in consumer markets.
GPU-type processors handle inferencing workloads very well, although sometimes more than 60 percent of a chip’s processing elements sit idle while awaiting data from memory. The Xilinx VCK5000, optimised for AI inference, overcomes this “dark silicon” challenge, achieving as much as 90 percent utilisation for close-to-peak TOPS (tera operations per second) performance, helping unleash the full power of AI to tackle the world’s next computing challenges.
The VCK5000 AI inferencing card is optimised for designs that require high-throughput AI inference and signal-processing compute performance. Its overall design solves the “dark silicon” problem: processing elements sitting idle while awaiting data from memory. As a result, it delivers up to twice the performance-per-watt of more conventional GPU-based solutions on standard benchmark models.
As an FPGA-based system, the card can be tailored to the specific needs of AI models, in particular their dataflow requirements. This is the key to solving the dark-silicon problem, which can keep GPUs well below 50 percent of peak TOPS in some workflows.
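The arithmetic behind the utilisation claims can be sketched simply (the figures below are illustrative assumptions, not measurements): effective throughput is the peak TOPS rating scaled by the fraction of processing elements kept busy, so a part sustaining 90 percent utilisation delivers twice the effective throughput of one stalling at 45 percent, at the same peak rating.

```cpp
#include <cassert>

// Illustrative arithmetic only (figures are assumptions, not
// measurements): effective throughput is the peak TOPS rating
// scaled by the fraction of processing elements kept busy.
double effective_tops(double peak_tops, double utilisation) {
    return peak_tops * utilisation;
}
```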
A fixed-core AI processor is optimal for certain models. However, AI models are growing rapidly, and running today’s larger models on processors designed for smaller ones results in high rates of cache misses, greatly reducing the rate at which data reaches the processing elements and hence the overall efficiency.
The VCK5000 AI Engines deliver performance comparable to that of an ASIC while remaining programmable. In addition, the VLIW cores support various forms of data movement, so the inference processing engine can adapt to different models based on compiled instructions. Moreover, FPGA fabric is attached to the core array, and the cache-less memory structure means cache misses simply cannot occur. It is possible to create an internal memory flow that delivers data to the engines on every clock cycle, thereby consistently achieving extremely high efficiency.
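The cache-less dataflow described above can be caricatured in software as a fixed-depth FIFO between pipeline stages: because buffering and scheduling are fixed at compile time, a datum is available to the compute stage on every step, so nothing resembling a cache miss can stall it. This is a conceptual sketch only, not the AI Engine programming model; the stage functions are invented for illustration.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Conceptual sketch of deterministic dataflow: a producer stage
// pushes one datum per "cycle" into a fixed-depth FIFO and the
// compute stage consumes one datum in the same cycle. Because the
// schedule is fixed, the compute stage is never starved of data.
std::vector<int> run_pipeline(const std::vector<int>& input) {
    std::deque<int> fifo;  // stand-in for an on-chip buffer
    std::vector<int> output;
    for (int x : input) {
        fifo.push_back(x * 2);               // producer: scale stage
        output.push_back(fifo.front() + 1);  // consumer: offset stage
        fifo.pop_front();                    // data ready every cycle
    }
    return output;
}
```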
Market Impact – Addressing the Challenges
In the constant quest to deliver faster and more powerful compute performance from smaller, less power-hungry systems, maximising the efficiency of the processing engine is critical. Data centres are under pressure to provide more services, demanding more sophisticated computations, to more subscribers, more quickly. Mobile robots, including industrial robots, drones, and cars of level-4 (and higher) autonomy, need to be lightweight, fast-moving, agile, intelligent, and safe. Size, weight and power are key concerns, especially in mobile applications.
AI inference holds the key to meeting these demands. The VCK5000, as a development card, enables engineers to build inference engines for applications from embedded level to data-centre level, creating opportunities for affordable self-driving cars, fast and safe industrial autonomous guided vehicles (AGVs), and faster cloud services such as financial analysis, genome processing, and industrial use cases such as digital twins.
To help users gain the maximum advantage from this breakthrough efficiency, made possible by near-zero-dark-silicon AI inference on highly optimised processors, AMD Xilinx has introduced support for the VCK5000 in the Vitis Unified Software Platform. Software abstraction through Vitis lets engineers get results without traditional register-transfer-level (RTL) programming, simplifying and accelerating development. This permits a GPU-like design cycle in which engineers feed framework models, such as TensorFlow models or ResNet-50 from the open, collaborative MLPerf benchmark suite, directly into the compiler. Hardware kernels can be developed in C/C++, and high-level C/C++ abstraction APIs are provided for the AI Engines.
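As a hedged illustration of this flow, a hardware kernel can be written as ordinary C/C++ in the style of the Vitis HLS examples. The vector-add below is a generic sketch, not AMD Xilinx sample code; the pragmas are Vitis HLS directives (an ordinary C++ compiler ignores them, so the same function also runs as plain software for verification).

```cpp
// Sketch of a Vitis-style C/C++ hardware kernel: element-wise
// vector addition. The HLS pragmas are hints to the Vitis compiler;
// a standard C++ compiler ignores them, so the function can be
// verified in software before being built for the card.
extern "C" void vadd(const int* a, const int* b, int* out, int n) {
#pragma HLS INTERFACE m_axi port = a bundle = gmem
#pragma HLS INTERFACE m_axi port = b bundle = gmem
#pragma HLS INTERFACE m_axi port = out bundle = gmem
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II = 1
        // one addition per clock cycle when pipelined with II = 1
        out[i] = a[i] + b[i];
    }
}
```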
In this way, the VCK5000, as an overall solution, addresses all aspects of the current AI-computing challenge: greater efficiency through continuous silicon utilisation enables lower-power, faster, smaller, and lighter processing engines for applications where size, weight and power (SWaP) are critical concerns. At the same time, the development environment lets users work with their preferred tools at a high level, offering a simple, accelerated design flow that encourages engagement with this new type of architecture.