Edge AI means running AI models on edge devices deployed in the field rather than in the cloud. In recent years, edge AI has attracted a lot of attention and its use cases are growing. We are also receiving more and more requests from customers for edge AI consulting and implementation.
Based on our practical experience of Edge AI development and implementation, this two-part series, including this article, will provide an overview of the trends in Edge AI technology and the points to consider when implementing it.
(1) Trends in edge AI technology (this article)
(2) Five points to consider when implementing edge AI
1. TRENDS IN EDGE AI TECHNOLOGY
Examples of AI running on edge devices include autonomous control systems, such as self-driving cars, autonomous drone navigation and automated operation of construction equipment, and image analysis systems, such as anomaly detection with surveillance cameras and automated reading of instruments with cameras.
We are also starting to see AI implemented in consumer edge devices such as smartphone apps, smart speakers and smart watches, and even deep learning models running in web browsers.
This article introduces the technologies and hardware used in edge AI implementations and describes hardware trends and performance. It also introduces some use cases.
2. TECHNOLOGY AND HARDWARE FOR IMPLEMENTING AI MODELS ON EDGE DEVICES
As the need for edge AI increases, various technologies and hardware have been developed to implement AI models on edge devices. Here are three of the major ones.
The first is model compression. This is a set of techniques for reducing the size and computational cost of an AI model so that it meets the required specifications and constraints before being deployed to an edge device.
The second is a compiler. A compiler is needed to ensure that the compressed AI model runs optimally on the edge device.
The third is hardware, including devices with dedicated support for compressed AI models, for example models that have been quantized or pruned. Hardware is very important and is discussed in more detail below.
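As a rough illustration of what reducing a model can look like, the sketch below applies magnitude-based pruning to a flat list of weights: the weights with the smallest absolute values are set to zero. The function name `prune_by_magnitude`, the sample weights and the 50% sparsity target are assumptions chosen for illustration; they are not taken from any particular toolkit or from our projects.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    k = round(len(weights) * sparsity)  # number of weights to remove
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Hypothetical weights for a single layer.
weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.6, -0.2, 0.1]
pruned = prune_by_magnitude(weights, sparsity=0.5)
print(pruned)
```

In practice, pruning is usually followed by fine-tuning to recover accuracy, and the speed benefit depends on whether the compiler and hardware can exploit the zeros.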
In recent years, GPUs and AI accelerators have been increasingly used as hardware to enable edge AI.
There are two types of GPUs: those for servers (workstations) and those for embedded applications. In edge AI, the term usually refers to embedded GPUs, which offer slightly lower performance than their server counterparts but have the advantage of lower power consumption.
NVIDIA's Jetson series of embedded GPUs is divided into several grades according to performance (*1): the Nano is the entry-level model, while the TX2, AGX Xavier and others offer higher performance that approaches server level. In terms of applications, the Nano is intended for embedded uses such as surveillance cameras, while the AGX Xavier is intended for more complex applications such as automated driving.
There are various types of accelerators for accelerating AI applications, three of which are introduced in this article.
The first is Intel's Neural Compute Stick 2, which contains the company's Myriad chip (*2).
The second is Google's Coral (Edge TPU) accelerator (*3).
And the third is a Field Programmable Gate Array (FPGA), often from Xilinx.
Neural Compute Stick 2 and Coral are extremely small and relatively inexpensive.
*2: https://ark.intel.com/content/www/jp/ja/ark/products/140109/intel-neural-compute-stick-2.html
OTHER AI CHIPS AND EDGE AI DEVICES
There is also a wide range of hardware dedicated to specific industries.
Examples include Renesas Electronics' R-Car series, which complies with the ISO 26262 ASIL B/D automotive functional safety standard, Hailo's AI processors, Ambarella's edge AI video processing SoCs and associated platforms, and Blaize's GSP (Graph Streaming Processor) for edge AI.
All of these focus on performance, system efficiency and power consumption in edge devices. In addition, mobile devices such as smartphones and tablets are increasingly equipped with processors (SoCs) for neural network processing, where the focus is on unit cost, device size and power consumption.
3. HARDWARE PERFORMANCE FOR EDGE AI
GPUS PERFORM INFERENCE ABOUT 20 TIMES FASTER THAN CPUS
A 2017 post on NVIDIA's developer blog provides comparative data on the number of images processed by a GPU and a CPU. Depending on the batch size, the number of images processed per second with NVIDIA's GPU (Tesla P100) was up to 20 times higher than with the CPU alone (*4).
The Jetson TX2, whose GPU is based on the same Pascal architecture as the P100, has also been shown to match a server-class CPU in images processed per second while consuming very little power (*5).
*4: https://developer.nvidia.com/blog/deploying-deep-learning-nvidia-tensorrt/
AI ACCELERATORS ENABLE INFERENCES TO BE MADE MORE THAN 30 TIMES FASTER THAN ON THE CPU
For Coral accelerators, a comparison of model inference speed between an embedded CPU (Arm Cortex) and a Coral accelerator has been published (*6). Although results differ between AI models, the accelerator is up to 30 times faster than the CPU, and inference time is significantly reduced.
In our demonstrations, we attached Coral to a Raspberry Pi to perform human pose estimation. Running on the Raspberry Pi's CPU (Arm Cortex) alone, the frame rate was as low as 0.1 to 0.2 FPS (about 5 to 6 seconds per image). With the Coral accelerator, the frame rate was about 10 FPS or more, indicating that AI accelerators can significantly increase processing speed in embedded systems.
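To put these frame rates in perspective, the short sketch below converts a frame rate into a per-frame processing time and computes the speedup implied by the figures quoted above. The helper names `frame_time_ms` and `speedup` are ours, introduced only for this illustration, and 10 FPS is used as a conservative value for "about 10 FPS or more".

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def speedup(accelerated_fps, baseline_fps):
    """How many times faster the accelerated setup is than the baseline."""
    return accelerated_fps / baseline_fps


cpu_fps_low, cpu_fps_high = 0.1, 0.2  # CPU-only frame rates quoted in the text
coral_fps = 10.0                      # conservative value for the Coral setup

print(f"Coral frame time: {frame_time_ms(coral_fps):.0f} ms")
print(
    f"Speedup over CPU: {speedup(coral_fps, cpu_fps_high):.0f}x "
    f"to {speedup(coral_fps, cpu_fps_low):.0f}x"
)
```

With these inputs, the accelerator processes a frame in roughly a tenth of a second, a speedup of roughly 50 to 100 times for this particular demonstration.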
COMPUTING PERFORMANCE OF EACH HARDWARE PLATFORM
Here is a brief introduction to the performance of each GPU and AI accelerator. The figure below shows the computing performance of each hardware platform. FLOPS and TOPS are often used as measures of computing performance: FLOPS counts floating-point operations per second (FP32 or FP16), which in neural network workloads are mainly multiply-accumulate operations, and TOPS counts trillions of operations per second, usually for integer arithmetic units.
Source of performance data: https://www.nvidia.com/ja-jp/autonomous-machines/embedded-systems/
There is also a lot of hardware available that supports int8 operations. Neural network inference is often carried out in FP32 or FP16, but it is known that, with proper quantization, int8 inference can meet accuracy requirements in many cases.
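The idea behind int8 quantization can be sketched in a few lines. The example below uses symmetric per-tensor quantization, which is one common scheme among several: floating-point values are mapped to integer codes in the range -127 to 127 using a single scale factor, and then mapped back to check the rounding error. The function names and sample weights are assumptions made for illustration; production toolchains use more elaborate calibration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to int8 codes."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate floating-point values."""
    return [c * scale for c in codes]


# Hypothetical FP32 weights.
weights = [0.42, -1.27, 0.05, 0.90, -0.33]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 codes:", codes)
print("scale:", scale)
print("largest round-trip error:", max_error)
```

The round-trip error of each value is bounded by half the scale factor, which is why a well-chosen scale keeps the accuracy loss small.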
With 4 TOPS, Coral is able to process images at a rate of 70 images per second (70 FPS) in the object detection task (MobileNet V2 SSD on Coral Dev Board, *7). By using multiple accelerators, it is also possible to run multiple tasks in parallel without increasing the load on the host CPU.
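The relationship between a TOPS rating and a frame rate can be checked with simple arithmetic. The sketch below computes how many operations are available per frame at peak throughput, and the upper bound on frame rate for a model of a given cost. The 4 TOPS and 70 FPS figures come from the text; the model cost of 2 billion operations per inference is a hypothetical value chosen only to show the calculation, and the function names are ours.

```python
def ops_budget_per_frame(peak_tops, fps):
    """Peak number of operations available for each frame at a given frame rate."""
    return peak_tops * 1e12 / fps


def peak_fps(peak_tops, ops_per_inference):
    """Upper bound on frames per second if the chip ran at peak throughput."""
    return peak_tops * 1e12 / ops_per_inference


budget = ops_budget_per_frame(peak_tops=4, fps=70)
print(f"{budget / 1e9:.1f} billion operations available per frame at peak")

# Hypothetical model cost, not a measured figure for any real model.
hypothetical_ops = 2e9
print(f"{peak_fps(4, hypothetical_ops):.0f} FPS upper bound at peak throughput")
```

Measured frame rates are normally well below this bound, because memory transfers, pre- and post-processing, and operations that fall back to the host CPU all take time. A peak rating is therefore best read as a ceiling, not a prediction.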
4. EDGE AI USE CASES IN A WIDE RANGE OF INDUSTRIES
Here are some examples of edge AI in action. (Case studies 1 to 3 are not our own case studies.)
Case Study 1: Kurazushi's Plate Counting System
A system has been developed that uses image recognition on a Raspberry Pi with Coral to understand customer needs from data such as the type of dish a customer selects and when they select it. Compared with conventional management using IR-RFID (infrared and wireless IC tag) systems, this system has been reported to improve reliability (*8).
Case Study 2: Reducing Crop Losses at Farmwave
The system uses a camera-equipped crop harvester fitted with a Raspberry Pi and Coral device to monitor the crop harvest in real time. If the yield is low, the parameters of the machine (angle and speed of the harvesting rotor and sieve opening, etc.) can be adjusted in real time to maximise the yield (*9).
Case Study 3: Perimeter Monitoring of Komatsu Mobile Cranes
Using NVIDIA's GPU (Jetson TX2), a system has been developed that processes images from cameras mounted around the full perimeter of a 2.9t mobile crane, detects people in the vicinity in real time, and issues a warning (*10). This system is expected to improve safety on site.
Case Study 4: Human Flow Analysis Using Surveillance Cameras (Araya case study)
Using Raspberry Pi and Coral, we are developing a system to track and map people in 2D in real time from surveillance camera images. We use the Edge TPU to process high-resolution Full HD images and achieve inference at around 2 FPS. This is almost fast enough for normal human walking speed, and can be used for shop entry/exit, traffic counting, congestion and usage monitoring, person demographic analysis and behaviour detection.
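To see why around 2 FPS can be enough for tracking walkers, consider how far a person moves between two processed frames. The sketch below computes that distance and uses it as a matching radius for a simple greedy nearest-neighbour tracker on the 2D map. This is an illustrative sketch and not the method used in our system: the walking speed of 1.4 m/s, the coordinates and the function names are all assumptions.

```python
import math


def max_step_m(walking_speed_mps, fps):
    """Distance a person can move between two consecutive processed frames."""
    return walking_speed_mps / fps


def match_tracks(prev_positions, detections, max_dist):
    """Greedily link each previous track to its nearest unused detection.

    prev_positions: dict of track id -> (x, y) in metres on the 2D map
    detections: list of (x, y) positions found in the current frame
    Returns a dict of track id -> index into detections.
    """
    pairs = sorted(
        (math.dist(pos, det), tid, j)
        for tid, pos in prev_positions.items()
        for j, det in enumerate(detections)
    )
    links, used = {}, set()
    for dist, tid, j in pairs:
        if dist > max_dist:
            break  # all remaining pairs are even farther apart
        if tid in links or j in used:
            continue
        links[tid] = j
        used.add(j)
    return links


# Assumed typical walking speed; 2 FPS is the rate quoted in the text.
radius = max_step_m(walking_speed_mps=1.4, fps=2.0)

prev = {1: (0.0, 0.0), 2: (5.0, 5.0)}         # hypothetical tracks
dets = [(5.2, 5.3), (0.5, 0.1), (9.0, 9.0)]   # hypothetical detections
links = match_tracks(prev, dets, radius)
print(f"matching radius: {radius:.2f} m, links: {links}")
```

Under these assumptions a walker moves well under a metre between frames, so nearby detections can be linked to the same person. A real tracker would add some margin to the radius and handle people entering and leaving the scene; the unmatched third detection here would start a new track.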
For more information, see: Human flow analysis solution using multiple cameras and image recognition AI
In recent years, GPUs and AI accelerators have become the mainstream hardware used in edge AI. They enable a significant increase in processing speed compared to CPUs and are expected to be adopted in applications that use camera images in particular.
In the following article, we will discuss five aspects to consider when implementing Edge AI.
You can also find out more about our Edge AI services at Araya.