
always improves this power consumption issue, and the power consumption of a GPU decreases from generation to generation without reducing its computing performance.

Regarding power consumption, the FPGA-based hardware accelerator consumes significantly less power than the CPU and GPU, approximately six to eight times less. Additionally, owing to its flexibility, inherently parallel structure, and customized design, the FPGA design also achieves high computing performance. Therefore, the FPGA-based hardware accelerator provides the highest efficiency, that is, computing performance per watt (fps/watt). This means that FPGA-based hardware accelerators are very efficient and well suited to systems that require low power consumption together with high computing performance. The proposed FPGA-accelerated computing system is limited by its interface to the CPU. Currently, it uses a PCI interface to transfer data between the FPGA and the host PC. The PCI interface is very slow and is the bottleneck of the system. Therefore, this interface should be upgraded to PCI Express, which is already used in the GPU hardware accelerator. Technically, this issue can be fixed because FPGAs are customizable.
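The efficiency metric named above (fps/watt) is simply the achieved frame rate divided by the power drawn. The following minimal sketch illustrates the calculation. The function name, the device labels, and all numeric values are hypothetical placeholders chosen for illustration; they are not measurements from this thesis.

```python
def frames_per_watt(frames_per_second: float, power_watts: float) -> float:
    """Return power efficiency as frames per second per watt."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return frames_per_second / power_watts


# Hypothetical (frame rate in fps, power in watts) pairs.
# These are placeholders, NOT results reported in the thesis.
platforms = {
    "device_a": (100.0, 50.0),
    "device_b": (80.0, 10.0),
}

# Compute the efficiency of every platform and pick the most efficient one.
efficiency = {
    name: frames_per_watt(fps, watts) for name, (fps, watts) in platforms.items()
}
best = max(efficiency, key=efficiency.get)

for name, value in efficiency.items():
    print(f"{name}: {value:.1f} fps/watt")
print(f"most efficient: {best}")
```

With these placeholder values, the device with the lower frame rate is still the more efficient one, because its power draw is lower by a larger factor. This is the same kind of trade-off that the comparison above describes for the FPGA.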

Determining the best technology for an application should be based not only on quantitative criteria (e.g., computing performance, power consumption, and power efficiency) but also on qualitative parameters such as the development process. The development process is related to design complexity, development time, and time to market. Development on a GPU is easier and faster than on an FPGA, but more difficult than on a CPU. This is because debugging and interactive simulation, the main factors in the development process, are fully supported by the GPU development system, as described in chapter 3. Meanwhile, the FPGA development process is more complicated and time-consuming than that of the GPU. In the FPGA design, the image processing algorithm cannot be developed directly on the targeted FPGA device, because the development cycle (e.g., synthesize, translate, map, place, and route) requires too much time. Therefore, an interactive design process is impracticable for FPGA-based image processing applications.

6.4 Summary

CPU, FPGA, and GPU implementations of the vision-based multi-robot tracking algorithm were also presented in detail in this chapter.

In conclusion, the implementations of vision-based multi-robot tracking in different technologies (CPU, FPGA, and GPU) are illustrated in Figure 6.18. The CPU technology provides the fastest development time and the greatest ease of programming, but its implementation has issues with computing performance, power consumption, and power efficiency. Meanwhile, the GPU technology is suitable for implementations that require high computing performance, good power efficiency, and adequate development time. However, the implementation of the GPU technology in this application is limited by its power consumption. Finally, the FPGA technology offers high power efficiency, very good (low) power consumption, and high computing performance. The development time and the complexity of the programming implementation are the main drawbacks of the FPGA implementation.

[Figure: CPU, FPGA, and GPU comparison on a scale of 1 to 5 (5 being the best) along four axes: development time, power efficiency, computing performance, and power consumption.]

Figure 6.18: Comparison of CPU, FPGA, and GPU implementations for vision-based multi-robot tracking application.

7 Conclusions and Outlook

In this thesis, FPGA- and GPU-accelerated computing systems for vision-based multi-robot tracking were proposed. These designs refer to heterogeneous computing systems that combine a CPU and hardware accelerator, either the FPGA or GPU. In many cases of vision-based robot tracking systems, the computational requirements for extracting the relevant information (e.g., locations, orientations, and identities (IDs) of robots) from video data increase along with the number of tracked robots, the video frame size, and the number of operated cameras. In contrast to the development of the previous computing systems, which typically used several high-performance workstations for the parallel processing of data from multiple cameras, the heterogeneous computing system approach releases the host computer from the computation-intensive tasks by utilizing the FPGA or GPU.

This thesis emphasizes the implementations of two distinct heterogeneous computing systems for vision-based multi-robot tracking applications, encompassing the use of the FPGA and GPU as hardware accelerators. The implementations on the FPGA- and GPU-based heterogeneous computing systems have been demonstrated in chapter 4 and chapter 5, respectively. Based on the modular and parallel architecture of the FPGA, a collection of video processing modules was developed, capable of detecting the locations of multiple robots using individual markers. The video processing modules involve two unique architectures for the circle detection of the robot’s marker. The first one integrates a combination of the CHT and graph cluster algorithms, while the second architecture combines the CSW technique with a graph cluster algorithm. Meanwhile, the GPU implementation relies on a large number of lightweight programmable cores that concurrently execute the vision processing algorithm.

Considering the differences between the FPGA and GPU, this work compared and analyzed the FPGA- and GPU-based computing systems to find the optimal system for multi-robot tracking applications. In particular, this thesis implemented FPGA- and GPU-accelerated heterogeneous computing systems, compared the results, and determined the advantages that could be achieved using both computing systems for vision-based multi-robot tracking applications. In doing so, this thesis focused on the system architecture, detection performance, computing performance, and power efficiency. The examinations and analysis of the proposed systems were discussed in chapter 6 using different generations of FPGAs (e.g., Xilinx Virtex-4, Virtex-6, and Virtex-7) and GPUs (e.g., GTX-580 and GTX-780).

7.1 Conclusions

This thesis has described the basic concept of a vision-based robot tracking system and the related work. It has shown the need for a computing system that uses the benefits of the CPU and hardware accelerators (e.g., FPGA and GPU) to enhance the computing performance of a vision-based multi-robot tracking algorithm. The architectures of heterogeneous computing systems for vision-based multi-robot tracking and their design flows, on both the FPGA- and GPU-accelerated platforms, have been presented in this thesis.

The results of this thesis show that the FPGA- and GPU-based hardware accelerators strongly enhance the computational performance of the computing system for vision-based multi-robot tracking. These hardware accelerators release the host computer from the computationally intensive tasks, complementing the CPU’s function to perform comprehensive vision-based multi-robot tracking applications. Furthermore, both the FPGA- and GPU-based hardware accelerators can achieve high accuracy, computational performance, and power efficiency.

Regarding the detection performance, this thesis has shown that the proposed designs can handle multi-robot localization with a typical detection performance (precision and recall) of 99% under well-defined lighting conditions, as reported in section 6.1.

This means that the proposed hardware accelerators and implemented algorithms achieve a high detection performance for detecting the robot locations. Both the CHT-graph cluster and CSW-CHT-graph cluster methods produce high detection performances for detecting multiple robots. However, the CSW technique is more favorable than the CHT for both the FPGA and GPU implementations, because of its detection performance and its robustness in robot collision situations.
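For reference, the two detection metrics quoted above follow the standard definitions: precision is the fraction of reported detections that are correct, and recall is the fraction of actual robots that are detected. The minimal sketch below shows the calculation. The function name and the counts are hypothetical and chosen only so that both metrics come out at 99%; they are not the evaluation data of section 6.1.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (precision, recall) from detection counts.

    precision = TP / (TP + FP): share of reported detections that are correct
    recall    = TP / (TP + FN): share of actual robots that were detected
    """
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not data from the thesis):
# 990 correct detections, 10 false detections, 10 missed robots.
precision, recall = precision_recall(true_pos=990, false_pos=10, false_neg=10)
print(f"precision = {precision:.2%}, recall = {recall:.2%}")
```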

This thesis shows that both the FPGA- and GPU-based hardware accelerators have significantly higher computing performances than the Intel i7 4770K quad-core CPU.

Therefore, both hardware accelerators are very good alternatives for enhancing the computational performance of a computing system for vision-based multi-robot tracking applications. The FPGA-based hardware accelerator can reach up to 154 fps with a total resolution of 2048×2048 pixels using a Xilinx Virtex-4 FX100-11. The achieved frame rate is optimized by utilizing four streaming hardware accelerators working in parallel.
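To put the reported frame rate into perspective, it can be converted into a pixel throughput. The short calculation below uses only the figures quoted above (154 fps, 2048×2048 pixels, four accelerators). The even split of the load across the four streaming accelerators is an assumption made for illustration; the text does not state how the pixels are distributed.

```python
# Figures quoted in the text above.
FRAME_WIDTH = 2048        # pixels
FRAME_HEIGHT = 2048       # pixels
FRAME_RATE_FPS = 154      # frames per second
NUM_ACCELERATORS = 4      # streaming hardware accelerators working in parallel

pixels_per_frame = FRAME_WIDTH * FRAME_HEIGHT
pixels_per_second = pixels_per_frame * FRAME_RATE_FPS

# ASSUMPTION: the load is divided evenly among the accelerators.
pixels_per_second_per_accelerator = pixels_per_second / NUM_ACCELERATORS

print(f"pixels per frame:        {pixels_per_frame:,}")
print(f"total throughput:        {pixels_per_second / 1e6:.1f} Mpixel/s")
print(f"per accelerator (even):  {pixels_per_second_per_accelerator / 1e6:.1f} Mpixel/s")
```

Under these figures, the total throughput is roughly 646 Mpixel/s, or about 161 Mpixel/s per accelerator if the load is balanced evenly.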

Furthermore, the computational performance could be increased by using newer FPGA technology. The GPU implementation, which operates on a higher frequency than the