Next-generation ADAS and autonomous driving (AD) systems, when deployed to market, will require accurate and high-speed recognition, judgment, and operation.
Renesas presented these achievements at the International Solid-State Circuits Conference 2021 (ISSCC 2021), which took place February 13 to 22, 2021. We will continue to develop and deploy in-vehicle LSIs based on this technology, and we expect these will contribute to the realization of a safe and secure car society through the spread of ADAS and AD systems.
Convolutional neural networks (CNNs) require large amounts of computation for pattern recognition, and as the number of installed sensors increases, higher CNN performance is required. However, because power consumption rises in proportion to performance, a heavy and expensive water-cooling system becomes necessary. What is needed is both high deep learning performance and power consumption low enough to allow a lightweight, cost-effective air-cooling system. From a practical point of view, the optimal target is a CNN performance of 60TOPS with an efficiency of 10TOPS/W per LSI device.
CNN accelerator with high performance and power efficiency
The CNN accelerator (CNNA) performance/efficiency target is 60TOPS at 10TOPS/W. From an implementation point of view, this is realized with three identical accelerators rather than a single one. Each CNNA contains 13,824 MAC arithmetic units and operates at 800MHz.
The theoretical maximum performance of the three CNNAs is 66TOPS. In addition, each CNNA connects to a dedicated 2MB scratchpad memory (SPM) through a 512-bit interconnect module. This increases the execution efficiency of the CNNA, reduces the amount of data transferred between the CNNA and external memory (DRAM) by about 90%, and saves the power consumed by the DRAM interface and interconnect. In actual measurements of the test chip, VGG16 achieves 32TOPS at 6.1TOPS/W, and a CNNA-optimized network (Network-A) achieves 60.6TOPS at 13.8TOPS/W.
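The peak figure above can be reproduced by counting two operations (a multiply and an accumulate) per MAC per cycle. A minimal sanity check, assuming that convention (the function name is illustrative, not part of the design):

```c
/* Peak-throughput arithmetic for the figures quoted above: each MAC
 * contributes 2 operations (multiply + accumulate) per clock cycle. */
static double peak_tops(double macs_per_unit, double ops_per_mac,
                        double clock_hz, int num_units) {
    return macs_per_unit * ops_per_mac * clock_hz * num_units / 1e12;
}

/* peak_tops(13824, 2, 800e6, 1)  ->  ~22.1 TOPS per CNNA
 * peak_tops(13824, 2, 800e6, 3)  ->  ~66.4 TOPS for three CNNAs */
```

Three CNNAs at 22.1TOPS each give 66.4TOPS, matching the stated theoretical maximum of 66TOPS.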
Safety mechanism for ASIL D tasks
Next-generation ADAS and AD systems are required to achieve the functional safety of ASIL D, the strictest safety level of ISO 26262. Dual-core lockstep (DCLS) is one method that can satisfy the ASIL D metrics: faults are detected by performing the same process on two redundant hardware units and comparing their respective outputs.
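The DCLS principle can be illustrated in software as follows. This is a minimal sketch, not the actual hardware: `process` stands in for the replicated logic, and the checker flags any divergence between the two copies as a fault.

```c
#include <string.h>

typedef struct { int out[4]; } result_t;

/* Stand-in for the replicated processing element (arbitrary math,
 * purely illustrative). */
static result_t process(const int *in) {
    result_t r;
    for (int i = 0; i < 4; i++) r.out[i] = in[i] * 2 + 1;
    return r;
}

/* Checker: the same input is processed twice and the outputs are
 * compared; any mismatch is reported as a fault. */
static int dcls_run(const int *in, result_t *final_out) {
    result_t a = process(in);   /* core 1 */
    result_t b = process(in);   /* core 2 (redundant copy) */
    if (memcmp(&a, &b, sizeof a) != 0)
        return -1;              /* fault detected */
    *final_out = a;
    return 0;
}
```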
The CNNA also requires hardware redundancy to meet the ASIL D metrics, but simply applying DCLS would duplicate the large MAC compute units, which is impractical because area and power consumption would increase significantly. To achieve the ASIL D metrics without adding redundant hardware, two CNNAs (CNNA1 and CNNA2) are dynamically configured by software to perform lockstep operation during processing that requires safety.
The CNNA is used both for image recognition on input from the camera (ASIL B) and for modeling the surrounding environment from the results input from each sensor (ASIL D). Most of the execution time, however, is spent on the former ASIL B image recognition. Therefore, by switching CNNA1 and CNNA2 to lockstep operation only during surrounding-environment modeling, ASIL D tasks can be achieved without significantly compromising performance or power efficiency.
The lockstep operation of the CNNAs using the lockstep DMAC (LDMAC) proceeds as follows:
1) LDMAC loads the same data from DRAM into SPM1 and SPM2.
2) CNNA1 and CNNA2 perform the same network processing.
3) LDMAC reads the execution results from SPM1 and SPM2 and compares them; if they do not match, a fault is judged to have occurred. Only the result of CNNA1 is stored in DRAM.
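The three steps above can be sketched in software as follows. All names and buffer sizes are illustrative, and the SPMs, DRAM, and CNNAs are modeled as plain buffers and a stub function; the real mechanism is implemented in hardware.

```c
#include <string.h>

#define SPM_SIZE 2048  /* illustrative; the real SPMs are 2MB */

static unsigned char spm1[SPM_SIZE], spm2[SPM_SIZE];

/* Step 1: LDMAC loads the same data from DRAM into SPM1 and SPM2. */
static void ldmac_load(const unsigned char *dram_src, size_t n) {
    memcpy(spm1, dram_src, n);
    memcpy(spm2, dram_src, n);
}

/* Step 2: CNNA1 and CNNA2 perform the same network processing
 * (stubbed here as a trivial per-byte transform). */
static void cnna_process(unsigned char *spm, size_t n) {
    for (size_t i = 0; i < n; i++) spm[i] ^= 0x5A;
}

/* Step 3: LDMAC compares the two results; a mismatch is a fault,
 * otherwise only CNNA1's result is stored back to DRAM. */
static int ldmac_compare_store(unsigned char *dram_dst, size_t n) {
    if (memcmp(spm1, spm2, n) != 0)
        return -1;                 /* mismatch: judged as a fault */
    memcpy(dram_dst, spm1, n);     /* store CNNA1's result only */
    return 0;
}

/* Full lockstep sequence driving the three steps. */
static int lockstep_run(const unsigned char *src, unsigned char *dst,
                        size_t n) {
    ldmac_load(src, n);
    cnna_process(spm1, n);   /* CNNA1 */
    cnna_process(spm2, n);   /* CNNA2 */
    return ldmac_compare_store(dst, n);
}
```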
Another important factor in achieving ASIL D is freedom from interference (FFI). The system contains a mix of tasks with different ASILs, and lower-ASIL tasks must not interfere with higher-ASIL tasks. As mentioned earlier, the CNNA is accessed by tasks at different ASIL levels, so the memory space used by each task must be kept separate.
The mechanism for memory-space isolation is implemented in the CNNA, the LDMAC, and the memory protection tables of the memory management unit (MMU). Each transaction output from the CNNA or LDMAC carries the context index of the currently running task; the MMU receives this index and switches contexts on a transaction-by-transaction basis.
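This can be sketched as a per-transaction table lookup. The structures and address ranges below are hypothetical, purely to illustrate how a context index carried by each transaction selects a protection table:

```c
#define NUM_CONTEXTS 2

typedef struct { unsigned base, limit; } region_t;
typedef struct { unsigned ctx; unsigned addr; } transaction_t;

/* One protection table per context: tasks at different ASIL levels
 * get disjoint memory windows, giving freedom from interference.
 * (Addresses are made up for illustration.) */
static const region_t prot_table[NUM_CONTEXTS] = {
    { 0x1000, 0x2000 },   /* context 0: e.g. ASIL B task's region */
    { 0x8000, 0x9000 },   /* context 1: e.g. ASIL D task's region */
};

/* MMU check: the context index arrives with the transaction itself,
 * so the protection table is switched per transaction, and an access
 * outside the owning task's window is rejected. Returns 1 if the
 * access is allowed. */
static int mmu_check(const transaction_t *t) {
    if (t->ctx >= NUM_CONTEXTS) return 0;
    const region_t *r = &prot_table[t->ctx];
    return t->addr >= r->base && t->addr < r->limit;
}
```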