Screening Deep Learning Inference Accelerators at the Production Lines
Artificial Intelligence (AI) accelerators fall into two main categories: those for training and those for running inference on trained models. The computation results of AI inference chipsets are expected to be deterministic for a given input. An inference chip contains several compute engines that accelerate arithmetic operations. Inference outputs are compared against a golden reference output to measure accuracy. Many kinds of errors can occur during inference execution. Some of these errors are caused by faulty hardware, so units should be thoroughly screened on the assembly line before customers deploy them in the data centre. This paper describes a generic inference application developed to run inferences over multiple inputs for a range of real inference models and to stress all the compute engines of the inference chip. Inference outputs from one specific unit are stored, initially assumed to be golden, and then confirmed as golden statistically. Once the golden reference outputs are established, the inference application is deployed in pre- and post-production environments to screen out defective units whose actual output does not match the reference. This strategy of comparing the product against itself at mass scale achieved the Defects Per Million (DPM) target for customers.
Academy and Industry Research Collaboration Center (AIRCC)
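The screening flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' application. It assumes that outputs are compared bit-exactly through SHA-256 digests (which relies on the stated expectation of deterministic results), that the candidate golden output is confirmed by the fraction of peer units reproducing it, and that DPM is computed as defective units per million units. The function names, the `min_agreement` threshold, and the case identifiers are all hypothetical.

```python
import hashlib


def digest(output: bytes) -> str:
    """Fingerprint a raw inference output so it can be compared bit-exactly."""
    return hashlib.sha256(output).hexdigest()


def confirm_golden(candidate: str, peer_digests: list[str],
                   min_agreement: float = 0.99) -> bool:
    """Confirm a candidate golden digest statistically.

    Assumption: the candidate is accepted when at least `min_agreement`
    of the peer units produce the same digest for the same model and input.
    """
    if not peer_digests:
        raise ValueError("need at least one peer unit to confirm a reference")
    matches = sum(1 for d in peer_digests if d == candidate)
    return matches / len(peer_digests) >= min_agreement


def screen_unit(golden: dict[str, str],
                unit_outputs: dict[str, bytes]) -> list[str]:
    """Return the test cases on which a unit disagrees with the golden reference.

    A case that the unit did not produce at all is also reported as a failure.
    """
    failures = []
    for case, reference in golden.items():
        output = unit_outputs.get(case)
        if output is None or digest(output) != reference:
            failures.append(case)
    return failures


def defects_per_million(defective: int, total: int) -> float:
    """Defective units per million units."""
    if total <= 0:
        raise ValueError("total must be positive")
    return defective * 1_000_000 / total


if __name__ == "__main__":
    # Hypothetical case id and output bytes, for illustration only.
    reference_output = b"\x01\x02\x03"
    golden = {"model_a/input_0": digest(reference_output)}
    faulty_unit = {"model_a/input_0": b"\x01\x02\x04"}
    print(screen_unit(golden, faulty_unit))
```

A unit passes screening only when `screen_unit` returns an empty list. Bit-exact comparison is the simplest choice under the determinism assumption; a real deployment might compare within a numerical tolerance instead, which this sketch does not cover.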

