Outlook - Radiation mitigation for SRAM-Based FPGAs in the CBM experiment

The work on this thesis is based on a fully functional detector read-out chain. However, the setup that will eventually be implemented at CBM will differ from today's setup, and even more from the setup that existed when the work on this thesis started. This section gives an overview of tasks that have to be addressed in future work.

8.2.1. Fault Tolerant Communication Module

The current work is based on CBMNet technology. However, with the redesign of the CBM magnet, more space for STS electronics became available. This has an impact on all CBM read-out electronics: CERN’s more mature GBT project [MMK07] will be used as a replacement for CBMNet, and future CBM read-out chains will be based on GBT technology rather than on CBMNet technology. The transition from CBMNet to GBT technology requires extensive reimplementation of logic and even a redefinition of protocols.

8.2.2. SEU Mitigation in Xilinx Series 7 FPGAs

Because scrubbing is such a successful strategy, Xilinx recently introduced an on-chip scrubbing controller as a new feature of their series 7 devices [Xil14a]. As long as no more than one SEU is present per frame¹, it can be corrected automatically by the chip itself. Since the correction of a single-bit error is based on error correction codes, no additional memory is required for this task. External action is only required in case of a multi-bit error: the chip then reports the error and the number of the affected frame via a configuration interface and waits for external reconfiguration of the corrupted frame. Only for the correction of multi-bit upsets, which are much less frequent than single-bit errors, is a memory device that stores the original configuration required.
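The combination of automatic single-bit correction and mere detection of multi-bit errors can be illustrated with an extended Hamming (SECDED) code. The following Python sketch shows only the principle: it does not reproduce the actual ECC layout of Xilinx series 7 frames, and the frame contents as well as the names encode, decode and scrub_pass are hypothetical. Each frame is protected by its own codeword; a single flipped bit is corrected and written back, while a double-bit error is only detected and the frame index is reported for external reconfiguration.

```python
from typing import List, Optional, Tuple


def parity_bit_count(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1 (Hamming bound for single-error correction)."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r


def encode(data: List[int]) -> List[int]:
    """Extended Hamming (SECDED) encoding; index 0 holds the overall parity bit."""
    r = parity_bit_count(len(data))
    n = len(data) + r
    code = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):                # not a power of two: data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i                         # parity positions 1, 2, 4, ...
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code[1:]) % 2            # makes the total parity even
    return code


def decode(code: List[int]) -> Tuple[str, Optional[List[int]]]:
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos                # XOR of the positions of all set bits
    overall = sum(code) % 2
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        fixed[syndrome] ^= 1               # single-bit error (syndrome 0: overall parity bit)
        status = "corrected"
    else:
        return "uncorrectable", None       # e.g. two flipped bits: detected, not correctable
    data = [fixed[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data


def scrub_pass(frames: List[List[int]]) -> List[int]:
    """Check every frame once; repair single-bit errors in place and
    return the indices of frames that need external reconfiguration."""
    needs_reconfig = []
    for idx, frame in enumerate(frames):
        status, data = decode(frame)
        if status == "corrected":
            frames[idx] = encode(data)     # write back the repaired frame
        elif status == "uncorrectable":
            needs_reconfig.append(idx)     # report the frame number externally
    return needs_reconfig
```

For example, after encoding four identical frames, flipping one bit in frame 1 and two bits in frame 3, a call to scrub_pass repairs frame 1 in place and returns [3], so only frame 3 has to be rewritten from the external configuration memory.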

In conjunction with the JTAG feature of the GBT-SCA technology, this allows scrubbing to be performed on a much more elaborate level. As external action is only required in the rare event of a multi-bit upset, this action can be executed from outside the radiation zone. Without on-chip scrubbing, GBT-SCA JTAG would be too slow for efficient scrubbing, and scrubbing over the long distance from outside the CBM cave would not be feasible.
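The feasibility argument can be made explicit with a rough bandwidth estimate. The symbols are introduced here for illustration only and are not values from this work: B is the size of the configuration bitstream, T_s the desired scrub period, R the throughput the GBT-SCA JTAG link must provide, F the size of one frame, and λ_MBU the rate of multi-bit upsets in one FPGA. Purely external scrubbing has to rewrite the whole bitstream once per scrub period, whereas with on-chip scrubbing the link, on average, only has to carry the frames affected by multi-bit upsets (status readback neglected):

```latex
R_{\text{external}} \geq \frac{B}{T_s}
\qquad\text{vs.}\qquad
R_{\text{on-chip}} \approx \lambda_{\text{MBU}} \, F
```

Since a frame is only a small fraction of the bitstream (F ≪ B) and multi-bit upsets are rare, the second requirement is much smaller than the first, which is consistent with a comparatively slow JTAG link over a long distance becoming sufficient.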

From the CBM perspective, the major advantage of on-chip scrubbing is that on-board Flash memory is no longer required. The use of Flash technology in the radiation zone was one of the greatest concerns regarding the configuration system used to perform scrubbing so far. Flash memory is known to suffer severely from total ionizing dose effects (see section 2.2.1 and paragraph “Cumulative Radiation Effects” in 2.4.1): Flash devices eventually stop working once they have been operated in a radiation environment for too long.

The Xilinx series 7 on-chip scrubbing feature solves this problem. External reconfiguration is required only very rarely and can now be executed from outside the CBM cave via the GBT-SCA JTAG feature. Consequently, the memory holding the original FPGA configuration is not exposed to radiation at all.

First efforts to put on-chip scrubbing of a Xilinx series 7 device into service have already been started by group member Andrei Oancea, who will continue his work on this topic.

8.2.3. Resilience

Device failures are rare events, but they cannot be fully prevented. It is therefore important to define the situations in which a full device reset is necessary. The test procedure used during the in-beam tests includes a software-based data integrity test (see figure 5.9).

If the test fails twice, the setup is reset. For the in-beam test setup, this was sufficient.
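The reset rule used in the in-beam tests can be expressed as a small policy object. The text only states that the setup is reset when the test fails twice; this sketch assumes that two consecutive failures are meant, and the class and method names are hypothetical.

```python
class IntegrityResetPolicy:
    """Request a reset after `max_failures` consecutive failed integrity tests
    (assumption: 'fails twice' is read as two failures in a row)."""

    def __init__(self, max_failures=2):
        self.max_failures = max_failures
        self.consecutive_failures = 0

    def report(self, test_passed):
        """Record one test result; return True if the setup should be reset now."""
        if test_passed:
            self.consecutive_failures = 0   # a successful test clears the history
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            self.consecutive_failures = 0   # start counting afresh after the reset
            return True
        return False
```

For the result sequence pass, fail, pass, fail, fail, the method report returns True only for the fifth result, because the first failure is followed by a successful test.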

However, for the final CBM experiment with several hundred read-out controller boards, data integrity checks cannot be done in software. Instead, a monitor entity outside the radiation zone (e.g. in the DPB) is required. This monitor entity decides whether a device is working properly or not; to do so, it needs to analyze the data quality, implement watchdog functionality, etc.

¹ The configuration memory of Xilinx FPGAs is logically organized in smaller units, called frames.
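A monitor entity of the kind described above could combine a per-board watchdog with a simple data-quality criterion. The following sketch is hypothetical in all details: the class names, the timeout and the corrupt-word threshold are assumptions, timestamps are passed in explicitly, and the cumulative counters stand in for the sliding windows that a real implementation (e.g. in the DPB) would more likely use.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BoardStatus:
    last_data_time: float = 0.0   # time of the most recent data from this board
    words_received: int = 0
    words_corrupt: int = 0


@dataclass
class ReadoutMonitor:
    watchdog_timeout: float       # seconds without data before a board counts as dead
    max_corrupt_fraction: float   # tolerated fraction of corrupt data words
    boards: Dict[int, BoardStatus] = field(default_factory=dict)

    def on_data(self, board_id: int, now: float, words: int, corrupt: int) -> None:
        """Account for a block of data words received from one board."""
        st = self.boards.setdefault(board_id, BoardStatus())
        st.last_data_time = now
        st.words_received += words
        st.words_corrupt += corrupt

    def boards_to_reset(self, now: float) -> List[int]:
        """Boards that are silent for too long or deliver too much corrupt data."""
        result = []
        for board_id, st in sorted(self.boards.items()):
            timed_out = now - st.last_data_time > self.watchdog_timeout
            bad_quality = (st.words_received > 0 and
                           st.words_corrupt / st.words_received > self.max_corrupt_fraction)
            if timed_out or bad_quality:
                result.append(board_id)
        return result
```

Keeping the two criteria separate lets the monitor distinguish a silent board (watchdog) from a board that still sends data of poor quality; both lead to the same reset request here, but they could be handled differently.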

A further problem arises once a design has been reset: it then needs to be resynchronized to the global time of the running system. The current concept, which is used by the CBM-ToF group for the read-out of detector prototypes and was also used for the in-beam tests in the present work, depends on a global, simultaneous reset of all time counters in all devices. This concept works sufficiently well for setups with few boards, but it does not scale to several hundred boards, because re-synchronizing a single board would require a reset of the entire running setup. For the final setup at CBM, we need a concept for re-synchronizing a single board into the running experiment.

Basic ideas have already been considered in the current system: the time stamp counters in the GET4 ASICs can be set via slow control to arbitrary values and then start counting upon reception of a signal that is distributed synchronously and periodically on a global scale. In his diploma thesis, Johannes Lehrbach has already implemented a proof-of-concept design that automatically synchronizes the GET4 time to the ROC time [Leh13]. Such a concept needs to be implemented for the ROC time stamp counters as well. In CBMNet-based setups, the global synchronous signal can be implemented with so-called “deterministic latency messages” (DLMs, see section B.2.3). Similar functionality is available in GBT-based systems under the name “Timing Trigger and Control” (TTC) [MMK07].
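The re-synchronization scheme outlined above (preload a counter via slow control, start it on the next global synchronous pulse) can be sketched as a cycle-level simulation. The sketch assumes a single clock domain in which the global time simply equals the cycle index, a hypothetical sync period of 1024 cycles and a hypothetical margin covering the slow-control latency. Distribution latencies of DLMs or TTC and counter wrap-around are not modeled, and all names are illustrative.

```python
SYNC_PERIOD = 1024   # hypothetical period of the global synchronous pulse, in clock cycles
SC_MARGIN = 100      # hypothetical worst-case slow-control latency, in clock cycles


class TimeStampCounter:
    """Counter that is preloaded via slow control and starts on the next sync pulse."""

    def __init__(self):
        self.value = None          # None: not synchronized (e.g. right after a reset)
        self.preload_value = None  # value armed via slow control

    def preload(self, value):
        self.preload_value = value

    def tick(self, sync_pulse):
        if sync_pulse and self.preload_value is not None:
            self.value = self.preload_value    # start counting at the preloaded global time
            self.preload_value = None
        elif self.value is not None:
            self.value += 1


def next_sync_time(now, period=SYNC_PERIOD, margin=SC_MARGIN):
    """Global time of the first sync pulse at least `margin` cycles after `now`."""
    earliest = now + margin
    return -(-earliest // period) * period     # round up to the next multiple of `period`


def resync_after_reset(reset_at, run_until):
    """Simulate one board that is reset at `reset_at` while the global time keeps running."""
    counter = TimeStampCounter()
    for t in range(run_until):                 # t is the global time of the running system
        if t == reset_at:
            counter = TimeStampCounter()       # board comes back unsynchronized
            counter.preload(next_sync_time(t))
        counter.tick(sync_pulse=(t % SYNC_PERIOD == 0))
    return counter.value
```

In this model, the board's counter equals the global time from the first sync pulse after the preload onward, without touching any other board. Preloading the time of a future pulse, rather than the current time, is what makes the slow-control latency irrelevant as long as it stays below the margin.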

I want to close this thesis by giving credit where credit is due. It is not possible to carry out a project of the scale of this thesis without support from other people. I would like to thank everybody who, directly or indirectly, supported me in the process.

First, I would like to thank Prof. Dr. Udo Kebschull for giving me the opportunity to work on this exciting project and for sharing his valuable ideas. He managed to find the right balance between guiding me and giving me all the freedom I needed to realize my own ideas.

I also want to thank Andrei Oancea and Johannes Lehrbach, who were brave enough to carry out their diploma theses in the field of the CBM-ToF read-out controller. In doing so, both contributed significantly to my work. Heiko Engel did some very valuable preparatory work before I started and helped me a lot with his expertise later on. Furthermore, I want to thank the rest of my colleagues in the IRI working group for the great working atmosphere and all the valuable feedback. Some of my colleagues are not only great companions at work but real friends.

For the great working atmosphere in the collaboration, I also want to thank all of my colleagues at CBM, namely Walter Müller, Dirk Hutter, Jan de Cuveland, Sven Löchner, Jochen Frühauf, Pierre-Alain Loizeau, Christian Simon, Ingo Deppner, Frank Lemke, Sven Schatral, and many more. I learned a lot from working as part of a collaboration of so many highly skilled people.

Special thanks also to Frederik Grüll, Andrei Oancea, Dirk Hutter, Heiko Engel, Stefan Boettger, Norbert Abel, Cruz Garcia, Hanna Zatschler, Julia Zatschler, Harf Zatschler, and especially Stefan Kirsch for proofreading.

Additional thanks go to Norbert Abel and Jano Gebelein for taking care of so many of the administrative tasks that are, unfortunately, necessary for the completion of this kind of work. I appreciate their support very much.

The support from my family shall not remain unmentioned; however, I do not need to spend more words: they know they are great.

And last but definitely not least, Julia, thanks for being as great as you are.

From the document Radiation mitigation for SRAM-Based FPGAs in the CBM experiment (pages 122-127)