Parallels access directly

5/31/2023

The Internet of Things (IoT) has created a ubiquitously connected world powered by a multitude of wired and wireless sensors that generate a variety of heterogeneous data over time in a myriad of fields and applications. To extract complete information from these data, advanced Artificial Intelligence (AI) technology, especially Deep Learning (DL), has proved successful in facilitating data analytics, future prediction and decision-making. The integration of AI and the IoT has greatly promoted the rapid development of Artificial Intelligence of Things (AIoT) systems, which analyze and respond to external stimuli more intelligently without human involvement. However, it is challenging or infeasible to process massive amounts of data in the cloud, because of the volume, velocity, and veracity of the data and the transmission latency they impose on networking infrastructures. These challenges can be addressed by introducing edge computing. This paper conducts an extensive survey of an end-edge-cloud orchestrated architecture for flexible AIoT systems. Specifically, it begins by articulating fundamental concepts including the IoT, AI and edge computing.

This thesis studies the possibility of introducing FPGAs into High Performance Computing (HPC) systems. Such systems are hybrid platforms that exploit the parallel computation of GPUs in order to reach very high performance. Nevertheless, GPU-based systems are power-hungry, and their power consumption is so large that running and maintaining them can be technologically and economically too expensive. The thesis is set within the EU-funded ExaNeSt project, whose purpose is to prototype energy-efficient solutions for building exascale-level supercomputers. The low-power requirement is addressed with a Multiprocessor System-on-Chip, namely a system that mounts both an ARM processor and an UltraScale+ FPGA in the same package; the whole module has been designed with special attention to power consumption. High performance computer systems are very important in the field of computational science, so this thesis investigates the possibility of using FPGA accelerators to offload the compute-intensive parts of a molecular dynamics code. miniMD is a simple, parallel molecular dynamics (MD) code, designed for studying the physical movements of atoms and molecules, and composed of five OpenCL kernels (neighbor_bin, neighbor_build, force_compute, integrate_initial, integrate_final).

In the first part of the thesis, each kernel was studied to understand which of them could be accelerated on the FPGA of the Multiprocessor SoC architecture. Baselining the full miniMD application showed that building the neighbour particles for each molecule of the system (neighbor_build kernel) and computing the forces (force_compute kernel) are the most compute-intensive tasks and dominate the total execution time of the application. The RTL code of each of these kernels was therefore generated directly using High-Level Synthesis. Specifically, for the neighbor_build and force_compute kernels, several optimizations were made to the original OpenCL code in order to accelerate execution on the FPGA. The most important optimizations turned out to be in how the data handled by the kernels is downloaded from and uploaded to the global memory of the FPGA. These optimizations were meant to promote burst memory transactions and to make efficient use of the limited bandwidth between the external DDR memory and the Programmable Logic (PL), thereby reducing the access time to external memory. It is important to note that, within these optimizations, loops were not easy to speed up because they are not perfectly bounded. Finally, all the kernels were merged into a single Vivado design of the miniMD application, to be run later on the Zynq UltraScale+ device.
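The burst-oriented memory optimization described above can be illustrated with a small sketch. This is not the thesis's OpenCL code: it is a plain C++ analogy in which `global` stands in for external DDR memory and `local` for an on-chip buffer. The function name `scale_tiled`, the tile size `TILE` and the scaling operation are all hypothetical. The idea is to separate each tile into a contiguous read phase, a compute phase on the local buffer, and a contiguous write phase, which is the kind of access pattern an HLS tool can typically turn into burst transactions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical tile size: elements staged in the local buffer per transfer.
constexpr std::size_t TILE = 64;

// Multiplies every element of `global` by `factor`, one tile at a time.
// `global` models external DDR memory; `local` models an on-chip buffer.
void scale_tiled(float* global, std::size_t n, float factor) {
    assert(global != nullptr || n == 0);
    float local[TILE];
    for (std::size_t base = 0; base < n; base += TILE) {
        // The last tile may be shorter than TILE.
        const std::size_t len = (n - base < TILE) ? (n - base) : TILE;

        // Read phase: one contiguous copy from "external" memory.
        std::memcpy(local, global + base, len * sizeof(float));

        // Compute phase: touches only the local buffer.
        for (std::size_t i = 0; i < len; ++i) {
            local[i] *= factor;
        }

        // Write phase: one contiguous copy back to "external" memory.
        std::memcpy(global + base, local, len * sizeof(float));
    }
}
```

On a CPU this version behaves the same as a single loop over `global`. The point of the structure is that external memory is only ever accessed in long sequential runs, while the arithmetic works on local data.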
