

#### FIELD OF SCIENCE: ENGINEERING AND TECHNOLOGY

SCIENTIFIC DISCIPLINE: AUTOMATICS, ELECTRONICS, ELECTRICAL ENGINEERING AND SPACE TECHNOLOGY

## **DOCTORAL DISSERTATION**

Pixel Radiation Detectors with In-Situ Signal Processing and Event-Triggered, Throughput-Optimized Readout Methods

Author: mgr inż. Dominik Górni

Supervisor: prof. Grzegorz Deptuch, PhD, DSc

Completed at: AGH University of Krakow, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering



AKADEMIA GÓRNICZO-HUTNICZA IM. STANISŁAWA STASZICA W KRAKOWIE

#### FIELD OF SCIENCE: ENGINEERING AND TECHNOLOGY

SCIENTIFIC DISCIPLINE: AUTOMATICS, ELECTRONICS, ELECTRICAL ENGINEERING AND SPACE TECHNOLOGY

### DOCTORAL DISSERTATION

Pixel Radiation Detectors with In-Situ Signal Processing and with Event-Triggered, Throughput-Optimized Readout Methods

Author: mgr inż. Dominik Górni

Dissertation supervisor: prof. dr hab. inż. Grzegorz Deptuch

Completed at: AGH University of Krakow, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering

I owe my deepest gratitude to my supervisor, whose scholarly insight, substantive guidance, and high standards shaped this dissertation at every stage. His questions sharpened my arguments, his comments improved the clarity, and his example set the tone for rigor and curiosity. Thank you for the time, patience, and trust you invested in me.

I am equally grateful to the staff of Brookhaven National Laboratory (BNL). Your support, ranging from access to facilities and tools to countless technical discussions and everyday help with logistics, made this work possible. The generosity with which you shared expertise and solved problems alongside me will remain a model of collaboration.

To my friend Piotr – thank you for keeping me moving when momentum faltered: for the check-ins and for your steady belief that this could be finished well.

To my parents, thank you for your unwavering love, patience, and encouragement.

Finally, to all colleagues, mentors, reviewers, lab technicians, administrators, and friends whose names I cannot list individually but whose contributions mattered: thank you. Any remaining errors are mine alone.

## **Abstract**

Modern pixel radiation detectors face a persistent readout bottleneck: as pixel counts and event rates increase, conventional frame-based and polling-based schemes waste bandwidth, add latency, and can bias acquisition through fixed priorities. This dissertation addressed the problem by proposing, implementing, and validating a new event-driven readout architecture that combines in-situ signal processing with asynchronous, non-priority arbitration to deliver low latency, optimized throughput, and fair data transfer from large pixel matrices. The central hypothesis states that a novel readout architecture featuring near-ideal event-driven operation and asynchronous, non-priority arbitration logic, implemented as a tree of RS-latch-based arbiters, will significantly improve high-density pixel radiation detectors, achieving higher throughput, lower latency, and demonstrably fairer event handling than traditional frame-based, polling-based, and priority-encoded readout architectures, and will thus enable a new generation of pixel detectors.

The proposed system, named **EDWARD** (Event Driven With Access and Reset Decoder), initiates data movement only when a meaningful event occurs in a pixel. EDWARD employs request/acknowledge handshakes, a non-priority tree of RS-latch-based arbiters, and a multi-phase transaction that can transmit address, timestamp, amplitude, and other data without collisions. Unlike the prior-art Address Event Representation (AER) and snapshot-based Address-Encoder and Reset-Decoder (AERD) schemes, EDWARD guarantees transaction completion and fairness through arbiters with memory elements and does not require periodic snapshot strobes. This eliminates snapshot dead time, the risk of stuck handshakes, and priority bias, while preserving sparse, event-only traffic.
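A behavioral sketch helps show why a tree of two-input arbiters with memory can be free of starvation. In the Python model below, each tree node remembers which side it granted last and, when both of its subtrees request, grants the other side. This alternating rule is an assumption chosen for illustration and is not claimed to be the exact logic of EDWARD's RS-latch-based cells; the names `Node`, `build_tree`, `has_request`, and `grant` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set, Union


@dataclass
class Node:
    """Hypothetical two-input arbiter with a one-bit memory of the last granted side."""
    left: "Tree"
    right: "Tree"
    last: int = 1  # 0 = left side granted last, 1 = right side granted last


Tree = Union[Node, int]  # a leaf is a channel (pixel) index


def build_tree(channels) -> Tree:
    """Build a balanced binary arbitration tree; the channel count must be a power of two."""
    level = list(channels)
    while len(level) > 1:
        level = [Node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def has_request(sub: Tree, requests: Set[int]) -> bool:
    """True if any channel below this node is requesting."""
    if isinstance(sub, Node):
        return has_request(sub.left, requests) or has_request(sub.right, requests)
    return sub in requests


def grant(sub: Tree, requests: Set[int]) -> Optional[int]:
    """Route one grant from the root down to a requesting channel."""
    if not isinstance(sub, Node):
        return sub if sub in requests else None
    left_req = has_request(sub.left, requests)
    right_req = has_request(sub.right, requests)
    if left_req and right_req:
        side = 1 - sub.last  # both sides request: serve the one not served last time
    elif left_req:
        side = 0
    elif right_req:
        side = 1
    else:
        return None  # no request below this node
    sub.last = side
    return grant(sub.left if side == 0 else sub.right, requests)
```

With all eight channels of `build_tree(range(8))` requesting continuously, the grant order cycles as 0, 4, 2, 6, 1, 5, 3, 7, so every channel is served once per eight grants regardless of its position in the tree, which is the non-priority property described above.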

One of the key architectural contributions lies in the bridge between asynchronous pixels and synchronous data acquisition: the acknowledge signal is generated in the periphery in the form of *tokens*, aligned to a serializer clock and routed by the arbitration tree only toward actively requesting channels (pixels). Each acknowledge edge advances a pixel's local multi-phase readout; when a transaction finishes, the token is immediately eligible for reuse. This design preserves the benefits of asynchronous, self-timed logic inside the matrix while exposing a clean, clocked interface at the output.
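The token-based bridge can be pictured with a cycle-level Python model: on each serializer clock cycle the periphery issues at most one acknowledge token, the token goes to the pixel that currently owns the bus, and each token advances that pixel by one readout phase. When the last phase completes, the bus is released and the very next token can serve another pixel. The function name `simulate_tokens`, its input format, and the FIFO hand-over (used here in place of the arbitration tree) are simplifying assumptions for illustration.

```python
from collections import deque
from typing import Iterable, List, Tuple


def simulate_tokens(requests: Iterable[Tuple[int, int]], phases: int,
                    n_cycles: int) -> List[Tuple[int, int, int]]:
    """Cycle-level sketch of tokenized acknowledge driving multi-phase readout.

    requests: (arrival_cycle, pixel_id) pairs (hypothetical input format)
    phases:   readout phases per transaction (e.g. 1 or 9)
    Returns the (cycle, pixel_id, phase) words that reach the serializer.
    """
    pending = deque(sorted(requests))
    waiting = deque()           # requesting pixels that do not yet own the bus
    owner, phase = None, 0      # pixel owning the bus and its next phase index
    words = []
    for cycle in range(n_cycles):
        while pending and pending[0][0] <= cycle:
            waiting.append(pending.popleft()[1])
        if owner is None and waiting:
            owner, phase = waiting.popleft(), 0   # FIFO hand-over (stand-in for the arbiter tree)
        if owner is None:
            continue            # no active request: no token is issued this cycle
        words.append((cycle, owner, phase))       # one token advances the owner by one phase
        phase += 1
        if phase == phases:     # transaction done: bus released, next token reusable at once
            owner = None
    return words
```

For two pixels requesting at cycle 0 with three-phase transactions, the output words occupy cycles 0 to 5 back to back, with no idle cycle between the two transactions.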

Two silicon prototypes were realized and tested to validate the concept. The **3FI65P1** chip integrated a $32 \times 32$ matrix (100 $\mu$m pitch) with a complete analog front end for spectroscopic imaging (X-ray fluorescence), asynchronous EDWARD logic, configuration and testability circuitry, and high-speed serialization. The author implemented the asynchronous digital logic and the configuration-testability-readout (CTR) platform. The analog front end and periphery were developed in collaboration with the Brookhaven National Laboratory (BNL) ASIC team.

The mixed-signal 3FI65P1 prototype demonstrates EDWARD in practical detector modes. A single-pixel (throughput-optimized) mode transmits the addressed pixel's amplitude; a charge sharing compensation (energy-optimized) mode performs a nine-phase readout that includes data from neighboring pixels and preserves bus ownership across phases, as verified by a one-to-one correlation between the analog sample-and-hold waveform and the serialized address stream. These experiments confirmed correct multi-phase handshaking and bus exclusivity, with no spurious samples. In preliminary synchrotron tests at NSLS-II (beamline 17-BM), 3FI65P1 delivered 138 eV FWHM at Ca K$\alpha$ and 308 eV FWHM at Cu K$\alpha$ over a 2.5–20 keV window, demonstrating that in-situ processing combined with event-driven readout met the spectroscopic targets in a high-background environment.
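For the charge sharing compensation mode, a nine-phase transaction can be pictured as reading the hit pixel followed by its eight neighbors. The Python sketch below generates such a phase-to-address sequence for the $32 \times 32$ matrix of 3FI65P1 and sums the cluster amplitude. The row-major neighbor order, the handling of edge pixels (missing neighbors reported as `None`), and the plain summation used as the compensation rule are assumptions for illustration; the functions `nine_phase_addresses` and `cluster_sum` are hypothetical and do not describe the chip's actual phase order or algorithm.

```python
from typing import List, Optional, Sequence, Tuple

ROWS, COLS = 32, 32  # matrix size of 3FI65P1

Address = Optional[Tuple[int, int]]


def nine_phase_addresses(row: int, col: int) -> List[Address]:
    """Phase 0 is the hit pixel; phases 1..8 are its neighbors in row-major order.

    Neighbors outside the matrix are returned as None (assumed edge handling).
    """
    seq: List[Address] = [(row, col)]
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # the hit pixel was already read in phase 0
            r, c = row + dr, col + dc
            seq.append((r, c) if 0 <= r < ROWS and 0 <= c < COLS else None)
    return seq


def cluster_sum(amplitudes: Sequence[Sequence[float]], row: int, col: int) -> float:
    """Sum amplitudes collected over the nine phases (a simple, assumed compensation rule)."""
    return sum(amplitudes[a[0]][a[1]] for a in nine_phase_addresses(row, col) if a is not None)
```

If a photon's charge is split so that pixel (10, 10) records 6.0 keV and its neighbor (10, 11) records 2.0 keV, `cluster_sum` returns 8.0 keV, whereas a single-pixel readout would report only 6.0 keV.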

The **EDWARD65P1** chip, a digital-only derivative with identical arbitration and serialization but with in-pixel programmable event generators, enables controlled, matrix-wide stress tests that quantify latency, throughput, fairness, and token reuse independently of analog effects. The validation combined simulation (digital and mixed-signal), queuing model analysis, and measurements on fabricated ASICs. An M/G/1/N queuing model captured the arrival and service statistics of the serialized output path and guided the interpretation of latency and pile-up under increasing load. In EDWARD65P1 simulations and measurements, the in-pixel Poisson generator reproduced the expected exponential inter-arrival distribution with high fidelity (e.g., a measured $\lambda = 15{,}062.6~\mathrm{s^{-1}}$ versus a theoretical rate of $15{,}070.4~\mathrm{s^{-1}}$), confirming it as a realistic traffic source for matrix testing.
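The comparison between a measured and a theoretical rate can be outlined with the Python sketch below. It assumes, hypothetically, that an in-pixel generator fires on each clock tick with a fixed probability $p = \lambda / f_{clk}$; the resulting inter-arrival times are geometric and approach the exponential distribution when $p \ll 1$. The rate is then estimated as the reciprocal of the mean inter-arrival time, which is the maximum-likelihood estimator for exponential gaps. The 1 MHz clock and the function names are illustrative assumptions, not EDWARD65P1 parameters.

```python
import math
import random
from typing import List


def bernoulli_event_times(rate_hz: float, f_clk_hz: float, n_events: int,
                          seed: int = 1) -> List[float]:
    """Hypothetical generator: fires on each clock tick with probability p = rate / f_clk.

    The number of ticks between firings is geometric(p); it is sampled directly by
    inversion, which gives the same distribution as drawing one random number per tick.
    For p << 1 the gaps are approximately exponential with mean 1 / rate.
    """
    rng = random.Random(seed)
    p = rate_hz / f_clk_hz
    t_clk = 1.0 / f_clk_hz
    times, tick = [], 0
    for _ in range(n_events):
        u = 1.0 - rng.random()                       # u in (0, 1], avoids log(0)
        tick += int(math.log(u) / math.log1p(-p)) + 1
        times.append(tick * t_clk)
    return times


def estimate_rate(times: List[float]) -> float:
    """Maximum-likelihood rate for exponential gaps: number of gaps / total gap time."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return len(gaps) / sum(gaps)
```

With `rate_hz=15070.4`, `f_clk_hz=1e6`, and 20,000 events, the estimate typically lands within about 1% of the target, consistent with the $1/\sqrt{n}$ statistical error of the estimator, and close to $e^{-1} \approx 36.8\%$ of the gaps exceed the mean, as expected for an exponential distribution.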

The performance results support the hypothesis. Under moderate load, the architecture achieved sub-microsecond average access times without the matrix-size-dependent latency growth typical of prioritized or scanned designs. Throughput approached the theoretical ceiling set by the acknowledge frequency, on the order of 10–17 MHz depending on serializer speed, because only active pixels engaged the bus and token reuse eliminated enforced idle cycles. Measured saturation in EDWARD65P1 occurred near the 10 MHz acknowledge limit, in line with the analysis. Arbitration remained uniform and starvation-free across the matrix, with no spatial bias even under bursty Poisson traffic.
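The saturation behavior can be illustrated with a small discrete-event simulation of a single-server queue with Poisson arrivals, a finite capacity of $N$ requests, and a deterministic service time of one acknowledge period per single-phase transaction. This is an M/D/1/N queue, a special case of the M/G/1/N model mentioned above. Treating the whole matrix as one queue, the buffer size of 64, and the function name `simulate_md1n` are simplifying assumptions; the 100 ns service time corresponds to the 10 MHz acknowledge limit quoted in the text.

```python
import random
from collections import deque
from typing import Tuple


def simulate_md1n(arrival_rate: float, service_time: float, capacity: int,
                  n_arrivals: int, seed: int = 1) -> Tuple[float, float, float]:
    """Discrete-event sketch of an M/D/1/N queue.

    capacity counts the request in service plus those waiting. Returns
    (throughput in 1/s, mean sojourn time in s, fraction of arrivals lost).
    """
    rng = random.Random(seed)
    in_system = deque()          # departure times of accepted, unfinished requests (FIFO)
    t = last_departure = 0.0
    served = lost = 0
    total_sojourn = 0.0
    for _ in range(n_arrivals):
        t += rng.expovariate(arrival_rate)          # Poisson arrivals
        while in_system and in_system[0] <= t:      # retire requests finished by time t
            in_system.popleft()
        if len(in_system) >= capacity:
            lost += 1                               # capacity reached: request rejected
            continue
        start = max(t, last_departure)              # wait for the single server
        last_departure = start + service_time       # deterministic service time
        in_system.append(last_departure)
        total_sojourn += last_departure - t
        served += 1
    elapsed = max(t, last_departure)
    return served / elapsed, total_sojourn / max(served, 1), lost / n_arrivals
```

With 2 MHz of offered traffic and a 100 ns service time, the mean sojourn time stays close to the 112.5 ns predicted by the Pollaczek–Khinchine formula for M/D/1 (100 ns of service plus 12.5 ns of waiting), well below a microsecond. Offering 50 MHz instead pins the throughput just below 10 MHz, and most excess requests are rejected.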

Across the studies, EDWARD reduced redundant transfers relative to frame readout, avoided polling latency, and minimized dynamic power by remaining quiescent in the absence of events – features especially valuable for sensitive analog front ends. Fair, non-priority arbitration eliminated deterministic bias and protected data integrity without combinational conflicts. Together, these traits improved time resolution, reduced pile-up, and enabled larger arrays or higher frame-equivalent rates without requiring additional input/output ports.
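A back-of-the-envelope comparison shows where the reduction in redundant transfers comes from. A frame-based readout sends every pixel's amplitude in every frame, whereas an event-driven readout sends only hit pixels, each tagged with its address. The Python sketch below computes both bit rates; the 12-bit amplitude word, the 1 kHz frame rate, and the 10 Hz per-pixel hit rate are hypothetical example values, and timestamps or protocol overhead are ignored.

```python
import math


def frame_bits_per_s(n_pixels: int, amp_bits: int, frame_rate_hz: float) -> float:
    """Frame readout: every pixel's amplitude is sent every frame, hit or not."""
    return n_pixels * amp_bits * frame_rate_hz


def event_bits_per_s(n_pixels: int, amp_bits: int, hit_rate_per_pixel_hz: float) -> float:
    """Event-driven readout: only hit pixels are sent, each with its binary address."""
    addr_bits = math.ceil(math.log2(n_pixels))
    return n_pixels * hit_rate_per_pixel_hz * (addr_bits + amp_bits)
```

For a $32 \times 32$ matrix under these assumptions, the frame readout needs 12,288,000 bit/s, while the event-driven readout needs 225,280 bit/s (10 address bits plus 12 amplitude bits per hit), roughly 55 times less. The advantage shrinks as occupancy rises, because each event word carries the extra address bits.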

Original contributions include: (i) the EDWARD architecture – a fully asynchronous, non-priority, event-driven pixel readout with RS-latch-based arbiters and multi-phase transactions; (ii) tight integration of in-situ signal processing with event-triggered digital transport; (iii) practical asynchronous-to-synchronous interfacing (tokenized acknowledge and serializer alignment) within standard flows; (iv) hardware prototyping in 65 nm CMOS (3FI65P1 and EDWARD65P1) with empirical evidence of fairness, latency, and bandwidth gains; and (v) a modular platform for configuration, testability, and readout control.

The work also documents practical design compromises. Asynchronous arbiters were integrated into an otherwise conventional RTL-to-GDSII flow by introducing custom cells, excluding them from static timing analysis, and relying on focused verification and mixed-mode simulation to ensure glitch-free operation. These accommodations do not pose fundamental barriers to scaling or porting to advanced nodes.

Finally, the dissertation outlines scalability paths, including hierarchical trees, subarray partitioning, and multi-bus topologies, to reduce worst-case propagation and enable parallel outputs for large matrices. These options trade silicon area for reduced arbitration depth and higher aggregate bandwidth, offering design freedom for detector-specific constraints.
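The trade-off between arbitration depth and aggregate bandwidth can be quantified with simple arithmetic. For a binary arbitration tree over $N$ channels, a request passes through $\lceil \log_2 N \rceil$ arbiter levels; splitting the matrix into $k$ independent subarrays, each with its own tree and output bus, reduces the depth to $\lceil \log_2 (N/k) \rceil$ and, if every bus sustains the same acknowledge rate, multiplies the aggregate rate by $k$. The two-input arbiter assumption, the 10 MHz per-bus rate, and the function name `partition_tradeoff` are illustrative.

```python
import math
from typing import Tuple


def partition_tradeoff(n_channels: int, n_subarrays: int,
                       per_bus_rate_hz: float) -> Tuple[int, float]:
    """Arbitration depth and aggregate rate when a matrix is split into equal subarrays.

    Assumes binary (two-input) arbiters and one independent output bus per subarray.
    """
    per_tree = math.ceil(n_channels / n_subarrays)
    depth = math.ceil(math.log2(per_tree)) if per_tree > 1 else 0
    aggregate = n_subarrays * per_bus_rate_hz
    return depth, aggregate
```

For a hypothetical $256 \times 256$ matrix, a single tree needs 16 arbitration levels and provides one 10 MHz output, whereas 16 subarrays need only 12 levels each and, with their buses running in parallel, provide 160 MHz in aggregate, at the cost of 16 trees, 16 serializers, and additional output lines.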

In summary, by re-examining the fundamental principles of pixel readout and demonstrating a fair, asynchronous, event-triggered alternative in silicon, this work advances the state of the art in radiation detector instrumentation. The demonstrated gains in latency, throughput, and energy efficiency, together with robust asynchronous-to-synchronous interfacing, make the approach attractive for next-generation X-ray imaging, particle tracking at high-luminosity colliders, and other experiments that demand fast, unbiased, and bandwidth-efficient detection.

## Summary (Streszczenie)

Modern pixel radiation detectors struggle with a persistent readout bottleneck: as the number of pixels and the event rate grow, conventional schemes based on frame readout and polling waste bandwidth, introduce latency, and can bias acquisition because of fixed priorities. This dissertation addressed the problem by proposing, implementing, and validating a new event-driven readout architecture that combines in-situ signal processing with asynchronous, non-priority arbitration to provide low latency, optimized throughput, and fair data transfer from large pixel matrices. The main hypothesis assumed that a new readout architecture characterized by near-ideal event-driven operation and asynchronous, non-priority arbitration logic, implemented as a tree of RS-latch-based arbiters, would significantly improve high-density pixel radiation detectors, achieving higher throughput, lower latency, and demonstrably fairer event handling than traditional frame-based, polling-based, and priority-encoded readout architectures, thereby enabling a new generation of pixel detectors.

The proposed system, named **EDWARD** (Event Driven With Access and Reset Decoder), initiates data flow only when a meaningful event occurs in a pixel. EDWARD uses request/acknowledge handshakes, a non-priority tree of RS-latch-based arbiters, and a multi-phase transaction that can transfer address, timestamp, amplitude, and other data without collisions. Unlike the Address Event Representation (AER) and snapshot-based Address-Encoder and Reset-Decoder (AERD) schemes known from the literature, EDWARD guarantees transaction completion and fairness through arbiters with memory elements and does not require periodic strobe signals. This eliminates dead time as well as the risk of stalled handshakes and priority bias, while preserving sparse, event-only traffic.

One of the key architectural contributions is the bridge between asynchronous pixels and synchronous data acquisition: the acknowledge signal is generated in the periphery in the form of *tokens*, synchronized to the serializer clock and routed by the arbitration tree only to actively requesting channels (pixels). Each acknowledge edge advances a pixel's local multi-phase readout; when a transaction finishes, the token is immediately ready for reuse. This design retains the advantages of asynchronous, self-timed logic inside the matrix while exposing a clean, clocked interface at the output.

Two silicon prototypes were fabricated and tested to validate the concept. The **3FI65P1** chip integrated a $32 \times 32$ matrix (100 $\mu$m pitch) with a complete analog front end for spectroscopic imaging (X-ray fluorescence), asynchronous EDWARD logic, configuration and testability mechanisms, and high-speed parallel-to-serial conversion (serialization). The author implemented the asynchronous digital logic and the configuration-testability-readout (CTR) platform. The analog front end and periphery were developed in collaboration with the Brookhaven National Laboratory (BNL) ASIC team.

The mixed-signal 3FI65P1 prototype demonstrates EDWARD in practical detector operating modes. The single-pixel (throughput-optimized) mode transmits the amplitude of the addressed pixel; the charge sharing compensation (energy-optimized) mode performs a nine-phase readout that includes data from neighboring pixels and retains bus ownership between phases, as verified by a one-to-one correlation between the analog waveform from the sampling circuit and the serialized address stream. These experiments confirmed correct multi-phase handshaking and bus exclusivity, with no spurious samples. In preliminary tests at the NSLS-II synchrotron (beamline 17-BM), 3FI65P1 achieved 138 eV FWHM for Ca K$\alpha$ and 308 eV FWHM for Cu K$\alpha$ in a 2.5–20 keV window, showing that in-situ processing combined with event-driven readout met the spectroscopic targets in a high-background environment.

The **EDWARD65P1** chip, a digital derivative with identical arbitration and serialization but with in-pixel programmable event generators, enables controlled, matrix-wide stress tests that quantify latency, throughput, fairness, and token reuse behavior independently of analog effects. The validation combined simulations (digital and mixed-signal), queuing model analysis, and measurements on fabricated ASICs. An M/G/1/N queuing model described the arrival and service statistics of the output path and guided the interpretation of latency and event pile-up under increasing load. In EDWARD65P1 simulations and measurements, the in-pixel Poisson generator reproduced the expected exponential distribution of inter-arrival times with high fidelity (e.g., a measured $\lambda = 15{,}062.6~\mathrm{s^{-1}}$ versus a theoretical value of $15{,}070.4~\mathrm{s^{-1}}$), confirming it as a realistic traffic source for matrix testing.

The performance results confirmed the hypothesis. Under moderate load, the architecture achieved sub-microsecond average access times, avoiding the matrix-size-dependent latency growth typical of prioritized or scanned designs. Throughput approached the theoretical ceiling set by the acknowledge frequency, on the order of 10–17 MHz depending on serialization speed, because only active pixels occupied the bus and token reuse eliminated enforced idle cycles. The measured saturation in EDWARD65P1 occurred near the 10 MHz acknowledge limit, in line with the analysis. Arbitration remained uniform and starvation-free across the matrix, with no spatial bias, even under the bursty character of the Poisson process.

Across the studies, EDWARD reduced redundant transfers relative to frame readout, avoided polling latency, and minimized dynamic power by remaining quiescent in the absence of events – features especially valuable for sensitive analog front ends. Fair, non-priority arbitration eliminated deterministic bias and protected data integrity without combinational conflicts. Together, these features improved time resolution, reduced pile-up, and enabled larger matrices or higher equivalent frame rates without the need for additional input/output ports.

The original contributions of this work include: (i) the EDWARD architecture – a fully asynchronous, non-priority, event-driven pixel readout with RS-latch-based arbiters and multi-phase transactions; (ii) tight integration of in-situ signal processing with event-triggered digital readout; (iii) practical asynchronous-to-synchronous interfacing (tokenized acknowledge and serializer alignment) within standard design flows; (iv) hardware prototypes in 65 nm CMOS technology (3FI65P1 and EDWARD65P1) with empirical evidence of gains in fairness, latency, and throughput; and (v) a modular platform for configuration, testability, and readout control.

The work also documents practical design compromises. The asynchronous arbiters were incorporated into an otherwise conventional RTL-to-GDSII flow by introducing custom cells, excluding them from static timing analysis, and relying on focused verification and mixed-mode simulation to ensure glitch-free operation. These accommodations pose no fundamental obstacles to scaling or porting to advanced technology nodes.

Finally, the dissertation outlines scalability paths, including hierarchical trees, subarray partitioning, and multi-bus topologies, to reduce worst-case propagation and enable parallel outputs for large matrices. These options trade silicon area for reduced arbitration depth and higher aggregate throughput, offering design freedom under detector-specific constraints.

In summary, re-examining the fundamental principles of pixel readout and demonstrating in silicon a fair, asynchronous, event-triggered readout alternative advances the state of the art in radiation detector instrumentation. The demonstrated gains in latency, throughput, and energy efficiency, together with robust asynchronous-to-synchronous interfacing, make this approach attractive for next-generation X-ray imaging, particle tracking at high-luminosity colliders, and other experiments that require fast, unbiased, and throughput-optimized detection.

## **Contents**

| No.   | Title                                                                          |
|-------|--------------------------------------------------------------------------------|
|       | List of Figures                                                                |
|       | List of Abbreviations                                                          |
| 1     | Introduction                                                                   |
| 1.1   | Background and Motivation                                                      |
| 1.1.1 | The Expanding Field of Radiation Detector Applications                         |
| 1.1.2 | Pixel Radiation Detectors                                                      |
| 1.1.3 | Challenges and Bottlenecks in Pixel Detector Readout                           |
| 1.2   | Research Problem Statement and Objectives                                      |
| 1.2.1 | Limitations of Existing Readout Architectures                                  |
| 1.2.2 | Research Questions and Hypothesis                                              |
| 1.2.3 | Dissertation Objectives Aligned with Research Questions                        |
| 1.3   | Dissertation Scope                                                             |
| 1.4   | Dissertation Structure                                                         |
| 2     | Literature Review                                                              |
| 2.1   | Fundamental Principles of Radiation Detectors                                  |
| 2.1.1 | Types of Radiation Detectors                                                   |
| 2.1.2 | Radiation Interaction Mechanisms                                               |
| 2.1.3 | Signal Formation in Semiconductor Detectors                                    |
| 2.2   | Solid-State Pixel Detector Types                                               |
| 2.2.1 | Hybrid Pixel Detectors (HPD)                                                   |
| 2.2.2 | Monolithic Active Pixel Sensors (MAPS)                                         |
| 2.2.3 | Comparative Summary of Hybrid and Monolithic Architectures                     |
| 2.3   | In-Situ Signal Processing Techniques                                           |
| 2.3.1 | Motivation for In-Situ Signal Processing                                       |
| 2.3.2 | Established Techniques                                                         |
| 2.3.3 | Detector Operation Modes                                                       |
| 2.3.4 | Noise Considerations                                                           |
| 2.3.5 | Challenges in Implementing In-Situ Processing                                  |
| 2.4   | Traditional Readout Methods                                                    |
| 2.4.1 | Direct Link Readout Architecture                                               |
| 2.4.2 | Frame-Based Readout Architecture                                               |
| 2.4.3 | Polling-Based Readout Architecture: General Concept and Example from VIP2a     |
| 2.4.4 | Summary of Conventional Readout Architectures                                  |
| 2.5   | Asynchronous Logic in Pixel-Detector Readout                                   |
| 2.6   | Asynchronous, Event-Driven Readout Architectures                               |
| 2.6.1 | The Address-Event Representation Protocol                                      |
| 2.6.2 | Address-Encoder and Reset-Decoder Architecture                                 |
| 3     | Proposed EDWARD Architecture                                                   |
| 3.1   | Architecture Overview and Design Objectives                                    |
| 3.1.1 | Rationale for Event-Driven Approach                                            |
| 3.1.2 | Objectives: Throughput Optimization, Latency Minimization, Fairness            |
| 3.2   | EDWARD System-Level Description                                                |
| 3.2.1 | Functional Block Diagram                                                       |
| 3.3   | In-Channel Logic and Data Handling                                             |
| 3.3.1 | Event Detection and Request Generation                                         |
| 3.3.2 | Token Concept and Asynchronous Handshake                                       |
| 3.3.3 | Multi-Phase Readout and Done Signal                                            |
| 3.3.4 | Parallel Digital and Analog Output                                             |
| 3.3.5 | Asynchronous Logic and Phase Progression                                       |
| 3.3.6 | Channel Configuration and Operational Modes                                    |
| 3.4   | Asynchronous Arbitration Mechanism                                             |
| 3.4.1 | Seitz Arbiter and Grant Logic                                                  |
| 3.4.2 | Arbitration Cell Type 0: Baseline Behavior                                     |
| 3.4.3 | Arbitration Cell Type II: Framed Request Gating                                |
| 3.4.4 | Arbitration Cell Type I: Fairness-Oriented Design                              |
| 3.4.5 | Arbitration Tree Organization                                                  |
| 3.5   | Synchronization Mechanism and Peripheral Operation                             |
| 3.5.1 | Global Clock and Acknowledge Token Relationship                                |
| 3.5.2 | Token Lifetime and Reuse                                                       |
| 3.5.3 | Local vs. Global Synchronization                                               |
| 3.5.4 | Valid Data Detection and Idle State Handling                                   |
| 3.5.5 | Data Serialization and Output Streaming                                        |
| 3.5.6 | Architectural Scalability                                                      |
| 3.5.7 | Hierarchical Trees and Multi-Bus Topologies                                    |
| 3.6   | 3FI65P1 Chip                                                                   |
| 3.6.1 | Physical Structure and Pixel Matrix Organization                               |
| 3.6.2 | Analog Front-End Architecture                                                  |
| 3.6.3 | Implementation of the EDWARD Readout in 3FI65P1                                |
| 3.6.4 | Configuration and Testability Infrastructure                                   |
| 3.6.5 | Peripheral Circuitry and Support Blocks                                        |
| 3.7   | EDWARD65P1 Chip                                                                |
| 3.7.1 | Motivation                                                                     |
| 3.7.2 | Architecture Overview                                                          |
| 3.7.3 | Poisson-Distribution Event Generator                                           |
| 4     | Simulation and Performance Evaluation: Digital and Mixed-Signal Analyses       |
| 4.1   | Digital Testbench Architecture Overview                                        |
| 4.2   | Digital Test Campaign: RTL and Signoff Verification                            |
| 4.2.1 | Global Control Signals                                                         |
| 4.2.2 | Configuration Parameters                                                       |
| 4.2.3 | Enable Test                                                                    |
| 4.2.4 | Force Test v1                                                                  |
| 4.2.5 | Force Test v2                                                                  |
| 4.2.6 | Mode Test                                                                      |
| 4.2.7 | Testability Test                                                               |
| 4.2.8 | Memory Test                                                                    |
| 4.2.9 | Summary of Digital Functional Verification                                     |
| 4.3   | Mixed-Signal Timing Evaluation of a Single Readout Group                       |
| 4.3.1 | Simulation Setup and Objectives                                                |
| 4.3.2 | Key Timing Metrics                                                             |
| 4.3.3 | Measured Timing Distributions and Spatial Variability                          |
| 4.3.4 | Summary of Mixed-Signal Timing Evaluation                                      |
| 4.4   | Analog and Mixed-Signal Simulation of EDWARD65P1 and In-Pixel Event Generator  |
| 4.4.1 | Analog Simulation of In-Pixel Clock Generator                                  |
| 4.4.2 | Digital Verification of Poisson-Based Signal Generator                         |
| 4.4.3 | Digital Simulation with Variable Event Rates and Queueing Model Comparison     |
| 5     | Experimental Validation                                                        |
| 5.1   | Introduction and Objectives                                                    |
| 5.2   | Experimental Setup                                                             |
| 5.2.1 | ASIC Characterization Platforms                                                |
| 5.2.2 | Synchrotron Measurement at NSLS-II                                             |
| 5.2.3 | Configuration and Control Methodology                                          |
| 5.3   | Results: 3FI65P1 Experimental Testing                                          |
| 5.3.1 | Readout-Mode Verification of the 3FI65P1 Prototype                             |
| 5.3.2 | Preliminary Synchrotron Beamline Results                                       |
| 5.4   | Results: EDWARD65P1 Performance Validation                                     |
| 5.4.1 | Clock Generator Characterization                                               |
| 5.4.2 | Readout and Generator Characterization                                         |
| 6     | Conclusion                                                                     |
| 6.1   | Restatement of Research Purpose                                                |
| 6.2   | Integrated Discussion of Research Questions and Outcomes                       |
| 6.3   | Key Contributions and Scientific Significance                                  |
| 6.4   | Limitations and Challenges                                                     |
| 6.5   | Broader Implications and Recommendations                                       |
| 6.6   | Directions for Future Work                                                     |
| 6.7   | Final Reflections                                                              |
|       | Bibliography                                                                   |

# **List of Figures**

| 1.1  | Diverse industry applications of radiation-detection technology                                                     | 2  |
|------|---------------------------------------------------------------------------------------------------------------------|----|
| 1.2  | Example of a pixel detector X-ray camera                                                                            | 3  |
| 1.3  | Simplified schematic of a pixelated detector readout structure                                                      | 5  |
| 2.1  | Examples of different radiation detector implementations: (a) a gaseous detector, (b) a scintillation detector, (c) a semiconductor detector | 15 |
| 2.2  | Structure of a hybrid detector                                                                                      | 19 |
| 2.3  | Structure of a MAPS detector                                                                                        | 21 |
| 2.4  | Structural cross-section of a pixel in (a) a standard-process MAPS and (b) a modern MAPS                            | 22 |
| 2.5  | Example of an in-situ signal processing chain based on the HEXID65P1 chip                                           | 25 |
| 2.6  | $CR\text{-}RC^n$ pulse shaper                                                                                       | 26 |
| 2.7  | General structure of a direct-link readout architecture                                                             | 30 |
| 2.8  | Maia X-ray Microprobe Detector Array System                                                                         | 30 |
| 2.9  | Block diagram and microphotograph of the AVG3_Dev ASIC                                                              | 31 |
| 2.10 | Timepix2 chip                                                                                                       | 31 |
| 2.11 | 3D-stacked CMOS image sensor architecture with global and rolling shutter modes                                     | 32 |
| 2.12 | Polling-based token-passing scheme, as used in VIP2a                                                                | 33 |
| 2.13 | Basic combinational logic gates: (a) inverter, (b) AND/NAND, (c) OR/NOR, (d) XOR/XNOR                               | 37 |
| 2.14 | RS latch: a fundamental asynchronous sequential element, constructed from cross-coupled NAND or NOR gates           | 37 |
| 2.15 | Comparison of synchronous and asynchronous logic paradigms                                                          | 37 |
| 2.16 | Different types of logic hazards in digital circuits                                                                | 38 |
| 2.17 | Simulated latch metastability under increasingly fine input timing shifts                                           | 38 |
| 2.18 | Symmetric and asymmetric C-element implementations for asynchronous synchronization                                 | 39 |
| 2.19 | Arbiter circuit together with the metastability filter used for asynchronous priority resolution                    | 39 |
| 2.20 | Conceptual diagram of AER communication                                                                             | 41 |
| 2.21 | The MEPHISTO architecture                                                                                           | 42 |
| 2.22 | General structure overview of the ALPIDE chip                                                                       | 43 |
| 2.23 | Basic logic block of the AERD system: priority logic, address encoder, and reset decoder                            | 44 |
| 3.1  | EDWARD architecture block diagram                                                                                   | 46 |
| 3.2  | Timing diagram of a multi-phase in-channel readout                                                                  | 48 |
| 3.3  | In-channel logic structure, including the controller, readout phaser, and done indicator                            | 49 |
|      | Simplified schematic of shared bus interfaces                                                                       | 50 |
|      | Difference between non-greedy/fair arbitration and greedy/unfair arbitration                                        | 51 |
|      | Structure of arbitration cell Type 0 using a Seitz arbiter                                                          | 51 |
|      | Waveforms showing arbitration cell Type 0 behavior under simultaneous requests                                      | 52 |
|      | Structure of arbitration cell Type II                                                                               | 54 |
|      | Waveforms from a tree composed entirely of Type II arbitration cells                                                | 54 |
|      | Structure of arbitration cell Type I with fairness-enhancing logic                                                  | 55 |
|      | Waveforms showing the operation of arbitration cell Type I when only two neighboring channels request readout       | 56 |
|      | Examples of NP and PN stage connections in the arbitration tree                                                     | 58 |
|      | Readout sequence under unfair arbitration (with Type II cells)                                                      | 59 |
|      | Readout sequence under fairness-enhanced arbitration (with Type I cells)                                            | 60 |
|      | Peripheral synchronization block                                                                                    | 61 |
|      | Conceptual serializer architecture compatible with EDWARD                                                           | 62 |
                |
| Example waveforms of the serializer shown in Figure 3.16                                     | 63                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                |
| Sub-array grouping with a spatially distributed arbitration tree and a common serializer     | 65                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                |
| Architecture with independent arbitration trees and separate output paths for each group     | 66                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                |
| Illustration of the full-field fluorescence imaging concept, where the sample is illuminated uniformly and the detector records the spatial and spectral content of the emitted X-rays | 67 |
| Layout of a single $8 \times 8$ pixel group | 68 |
| Top-level layout of the 3FI65P1 matrix | 69 |
| Custom slow control interface based on $I^2C$ and internal SPB | 71 |
| Custom dual-row Seitz arbitration cell implemented in a 65 nm CMOS process and integrated into the standard cell library for hierarchical synthesis | 71 |
| Structure of the developed CTR platform code | 74 |
| Simplified block diagram of the 3FI65P1 showing the elements of the CTR platform | 77 |
| Clock generator block diagram | 80 |
| Example of 16-bit LFSRs: Fibonacci and Galois architectures with the same polynomial | 82 |
| Block diagram of the Poisson process emulator implemented in each pixel | 84 |
| Block diagram of the functional verification environment | 88 |
                |
| Enable test flowchart                                                                        | 94                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                |
| Force Test v1 flowchart                                                                      | 96                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                |
| Force Test v2 flowchart                                                                      | 97                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                |
| Mode Test flowchart                                                                          | 99                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                |
| Testability Test flowchart                                                                   | 101                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                |
| Memory Test flowchart                                                                        | 102                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                |
| Timing diagram illustrating the definition of key timing parameters in the arbitration and   | 104                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                |
| •                                                                                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                |
|                                                                                              | Structure of arbitration cell Type II.  Waveforms from a tree composed entirely of Type II arbitration cells.  Structure of arbitration cell Type I with fairness-enhancing logic.  Waveforms showing the operation of arbitration cell Type I when only two neighboring channels request readout.  Examples of NP and PN stage connections in the arbitration tree.  Readout sequence under unfair arbitration (with Type II cells).  Readout sequence under fairness-enhanced arbitration (with Type II cells).  Peripheral synchronization block.  Conceptual serializer architecture compatible with EDWARD.  Example waveforms of the serializer shown in Figure 3.16.  Sub-array grouping with a spatially distributed arbitration tree and a common serializer.  Architecture with independent arbitration trees and separate output paths for each group.  Architecture with independent arbitration trees and separate output paths for each group.  Illustration of the full-field fluorescence imaging concept, where the sample is illuminated uniformly and the detector records the spatial and spectral content of the emitted X-rays.  Layout of a single 8 × 8 pixel group.  Top-level layout of the 3FI65P1 matrix.  Custom slow control interface based on 1 <sup>2</sup> C and internal SPB.  Custom dual-row Seitz arbitration cell implemented in a 65 nm CMOS process and integrated into the standard cell library for hierarchical synthesis.  Structure of the developed CTR platform code.  Simplified block diagram of the 3FI65P1 showing the elements of the CTR platform.  Clock generator block diagram.  Example of 16 bit LFSRs: Fibonacci and Galois architectures with the same polynomial.  Block diagram of the Poisson process emulator implemented in each pixel.  Block diagram of the functional verification environment.  Enable test flowchart.  Force Test v1 flowchart.  Force Test v2 flowchart.  Mode Test flowchart.  
Memory Test flowchart. |

| Figure | Caption | Page |
|--------|---------|------|
| 4.10 | Per-pixel timing maps with $1\sigma$ error bars for the EDWARD readout chain | 107 |
| 4.11 | Matrix of mean token redistribution delay as a function of source pixel and destination pixel | 107 |
| 4.12 | Number of arbitration accesses per pixel address over the course of simulation | 108 |
| 4.13 | Extracted-view simulation results of the in-pixel clock generator | 109 |
| 4.14 | Histogram of time intervals between events generated by the digital Poisson generator | 110 |
| 4.15 | Readout delay distribution when only one pixel is active | 112 |
| 4.16 | Results obtained during simulation with low event rate | 118 |
| 4.17 | Results obtained during simulation with medium event rate | 119 |
| 4.18 | Results obtained during simulation with high event rate | 120 |
| 5.1 | NI sbRIO-9629 block diagram | 123 |
| 5.2 | Setup used for testing | 123 |
| 5.3 | Aerial view of the National Synchrotron Light Source II (NSLS-II) at Brookhaven National Laboratory | 124 |
| 5.4 | Schematic of the beamline configuration with the main components: X-ray beam, sample, and detector | 124 |
| 5.5 | Oscilloscope capture of a complete 400 kHz I<sup>2</sup>C write transaction used to program a single pixel | 124 |
| 5.6 | Data recorded during self-pixel readout and charge-sharing readout | 125 |
| 5.7 | Representative fluorescence spectrum acquired at NSLS-II beamline 17-BM with sequential Ca, Mn, Cu, Pb, and Zr foil targets | 127 |
| 5.8 | Measured vs. simulated oscillator frequency across 4-bit configuration values for a subset of 320 pixels | 128 |
| 5.9 | Inter-event interval PDFs at various mask settings | 130 |
| 5.10 | Inter-event interval PDFs for a single active pixel | 131 |
| 5.11 | Number of readout events per pixel for mask configuration 15 | 132 |
| 5.12 | Total matrix throughput versus event generation rate | 132 |

## **List of Abbreviations**

**3FI65P1** Full-Field-Fluorescence-Imaging Prototype 1 chip

**ADC** Analog-to-Digital Converter

**AER** Address-Event Representation

**AERD** Address-Encoder and Reset-Decoder

**AFE** Analog Front-End

**AGH** AGH University of Krakow (Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie)

**AI** Artificial Intelligence

**ALICE** A Large Ion Collider Experiment

**ALPIDE** ALice PIxel DEtector

**ASIC** Application-Specific Integrated Circuit

**ATLAS** A Toroidal LHC ApparatuS

**BNL** Brookhaven National Laboratory

**CAD** Computer-Aided Design

**CCD** Charge-Coupled Device

**CdTe** cadmium telluride

**CERN** European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)

**CML** Current-Mode Logic

**CMOS** Complementary Metal-Oxide-Semiconductor

**CMS** Compact Muon Solenoid

**CSA** Charge-Sensitive Amplifier

**CsI** cesium iodide

**CSIRO** Commonwealth Scientific and Industrial Research Organisation

**CT** Computed Tomography

**CTR** Configuration-Testability-Readout

**CZT** cadmium zinc telluride

**DAC** Digital-to-Analog Converter

**DAQ** Data Acquisition

**DCC** Duty-Cycle Correction

**DSP** Digital Signal Processing

**DUT** Device Under Test

**EDA** Electronic Design Automation

**EDWARD** Event-Driven With Access and Reset Decoder

**EDWARD65P1** Event-Driven With Access and Reset Decoder Prototype 1 chip

**EIC** Electron-Ion Collider

**ENC** Equivalent Noise Charge

**FPGA** Field-Programmable Gate Array

**FSM** Finite State Machine

**FWHM** Full Width at Half Maximum

**GDSII** Graphic Data System II

**GUI** Graphical User Interface

**HDL** Hardware Description Language

**HEP** High Energy Physics

**HPD** Hybrid Pixel Detectors

**HPGe** high-purity germanium

**I/O** Input/Output

**I<sup>2</sup>C** Inter-Integrated Circuit

**ILC** International Linear Collider

**IP** Intellectual Property (design block)

**IPHC** Hubert Curien Pluridisciplinary Institute (Institut Pluridisciplinaire Hubert Curien)

**IR** Infrared

**ITS** Inner Tracking System

**LET** Linear Energy Transfer

**LFSR** Linear-Feedback Shift Register

**LHC** Large Hadron Collider

**MAPS** Monolithic Active Pixel Sensors

**MIMOSA** Minimum Ionizing MOS Active Pixel Sensor

**ML** Machine Learning

**MSB** Most Significant Bit

**NaI(Tl)** thallium-doped sodium iodide

**NP** Nuclear Physics

**NSLS-II** National Synchrotron Light Source II

**PCB** Printed Circuit Board

**PDF** Probability Density Function

**PET** Positron Emission Tomography

**PMT** Photomultiplier Tube

**POR** Power-On Reset

**PVT** Process, Voltage, and Temperature

**RMS** Root Mean Square

**RTL** Register-Transfer Level

**SCL** Serial CLock Line

**SDA** Serial DAta Line

**SiPM** Silicon Photomultiplier

**SNR** Signal-to-Noise Ratio

**SPB** Serial-Parallel Bus

**SPECT** Single-Photon Emission Computed Tomography

**SSTA** Statistical Static Timing Analysis

**STA** Static Timing Analysis

**ToA** Time-of-Arrival

**ToT** Time-over-Threshold

**TPC** Time Projection Chamber

**VLSI** Very-Large-Scale Integration

**VM** Voltage-Mode

**VUV** Vacuum Ultraviolet

**XFP** X-ray Footprinting of Biological Materials

**XRF** X-ray Fluorescence

## Chapter 1

## Introduction

### 1.1 Background and Motivation

#### 1.1.1 The Expanding Field of Radiation Detector Applications

Radiation detectors have become important tools across a broad spectrum of scientific disciplines, technological applications, and societal needs. Their ability to sense and measure ionizing radiation, from the energetic particles born in stellar explosions to the subtle X-rays that reveal the inner structure of matter, has opened up entirely new frontiers of discovery and innovation. From probing the fundamental building blocks of the universe to advancing medical diagnostics and ensuring global security, radiation detectors are at the forefront of progress.

In **High Energy Physics** (**HEP**) and **Nuclear Physics** (**NP**), massive and complex detector systems are the backbone of experiments at facilities such as the Large Hadron Collider (LHC) [1] and at future colliders, which must cope with extremely high readout data rates [2]. These detectors are essential for unraveling the deepest mysteries of matter and energy, allowing physicists to observe the fleeting products of particle collisions, search for new fundamental particles, and recreate conditions that existed fractions of a second after the Big Bang. The quest to understand the universe at its most fundamental level drives the need for detectors with ever-increasing speed, precision, and sensitivity. A prime example is the **Electron-Ion Collider (EIC)**. Its design demands not only high timing performance and data throughput, but also extreme compactness and low mass to minimize the material budget, which is crucial for tracking low-momentum particles, together with the ability to manage highly granular detectors with large channel counts. Advanced readout architectures are therefore essential to its success [3].

Venturing beyond Earth, **space-based X-ray astronomy** relies entirely on advanced radiation detectors to study the most energetic and violent events in the cosmos. From supermassive black holes accreting matter to the explosive deaths of stars in supernovae, these phenomena emit copious X-rays and gamma rays, which are absorbed by Earth's atmosphere before they reach the ground. Spaceborne telescopes equipped with sophisticated radiation detectors are therefore essential for observing and analyzing these extreme cosmic events, offering insights into the nature of gravity, matter under extreme conditions, and the evolution of the universe [4]. The harsh environment of space imposes unique demands on these detectors: they must be highly sensitive, radiation-hard, and capable of operating over wide temperature ranges [5].



Figure 1.1: Diverse applications of radiation-detection technology. Science: High Energy Physics (particle detectors) [10], cosmic radiation detection (astronomical telescopes) [11], X-ray microscopy (biological and material imaging) [12]. Medicine: CT scanners for medical diagnostics [13]. Security: airport scanners for baggage inspection [14].

Closer to home, **advanced X-ray microscopy** is revolutionizing our understanding of materials and biological structures at the nanoscale. By utilizing brilliant X-ray sources and high-resolution pixel detectors, scientists can image the complex structural and chemical details of materials, cells, and even individual molecules with unprecedented clarity. This capability is transforming fields ranging from materials science, enabling the design of novel materials with tailored properties, to biology and medicine, where it offers new pathways for drug discovery and disease diagnosis. The demand for X-ray microscopy with ever-higher resolution and throughput continuously pushes the boundaries of detector technology [6].

Beyond these cutting-edge scientific applications, radiation detectors are also integral to more established fields. **Medical imaging** techniques like Computed Tomography (CT) [7], Positron Emission Tomography (PET) [8], and Single-Photon Emission Computed Tomography (SPECT) [9] are all fundamentally based on radiation detection. While these technologies are already widely deployed, ongoing research and development focus on improving image quality, reducing radiation dose, and enhancing diagnostic capabilities, often relying on progress in detector technology.

Furthermore, radiation detectors underpin numerous other sectors, including **industrial inspection**, where they are used for non-destructive testing of materials and quality control; **security**, for detecting prohibited materials and contraband; and **environmental monitoring**, for measuring radiation levels and protecting public health. Examples of different applications are presented in Figure 1.1.

#### 1.1.2 Pixel Radiation Detectors

Among the diverse types of radiation detectors, pixel detectors have emerged as a particularly powerful and versatile technology, playing a pivotal role in driving progress across the applications outlined above.



Figure 1.2: Example of an X-ray camera based on a pixel detector [17].

Their fundamental architecture, based on a finely segmented sensor array, offers inherent advantages that are crucial for meeting the demanding performance requirements of modern radiation detection and imaging.

At their core, pixel detectors consist of a sensor material that is spatially divided into a matrix of individual, independent units – the pixels. Each pixel acts as a miniature, self-contained detector element, capable of independently sensing and converting incident radiation into an electrical signal. An example of a fully functional X-ray camera developed at AGH University of Krakow (Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie, AGH) [15] is shown in Figure 1.2.

This pixelated structure provides the key strengths of such detectors. Firstly, **high spatial resolution** is a defining characteristic of pixel detectors. The small size and dense arrangement of pixels directly translate to the ability to precisely localize the interaction point of radiation. This fine granularity is essential for applications demanding high-resolution imaging, such as X-ray microscopy and medical imaging, where the ability to see fine detail is critical.
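The link between pixel size and localization precision can be made quantitative for the simplest case. Assuming purely binary (hit/no-hit) readout and no charge sharing between neighbors, the hit position is known only to within one pixel pitch $p$; treating the true position as uniformly distributed across the pixel gives the standard single-hit resolution below. The 55 µm pitch is an illustrative value, not a parameter of the detectors discussed in this work.

$$
\sigma_x = \sqrt{\frac{1}{p}\int_{-p/2}^{p/2} x^2\,\mathrm{d}x} = \frac{p}{\sqrt{12}}, \qquad p = 55\ \mu\text{m} \;\Rightarrow\; \sigma_x \approx 15.9\ \mu\text{m}.
$$

Charge sharing combined with amplitude information can improve on this bound, so $p/\sqrt{12}$ is best read as the baseline set by the pixel pitch alone.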

Secondly, pixel detectors offer **high detection efficiency**. By carefully designing the sensor material and pixel geometry, it is possible to achieve a high probability of detecting incident radiation within each pixel. Furthermore, the independent operation of pixels allows for efficient collection of charge carriers generated by radiation interactions, contributing to overall detector sensitivity.

Thirdly, pixel detectors demonstrate remarkable **versatility** in detecting various types of radiation. By selecting appropriate sensor materials or converters, pixel detectors can be tailored to efficiently detect X-rays, gamma rays, charged particles, and neutrons across a wide range of energies. This adaptability makes them suitable for diverse applications, from low-energy medical imaging to HEP experiments.

Finally, the modular nature of pixel detectors lends itself well to the construction of **large-area detectors**. Individual pixel detector chips can be tiled together to create large arrays, covering significant areas while maintaining high spatial resolution and overall performance. This scalability is crucial for applications requiring large fields of view, such as astronomical observatories and large-scale scientific experiments [16].

#### 1.1.3 Challenges and Bottlenecks in Pixel Detector Readout

While pixel detectors offer compelling advantages for radiation detection, their very architecture presents significant challenges: the trend toward ever-smaller pixels and larger arrays generates vast amounts of data that must be managed and extracted. As pixel dimensions shrink and the number of pixels in a detector array increases dramatically, the volume of data produced escalates rapidly, creating a critical bottleneck in the overall detector system performance – the **data readout bottleneck**.

The data readout bottleneck arises from several interconnected factors. Firstly, **increased data volume** is a direct consequence of pixel miniaturization and array scaling. A detector with millions of pixels, each potentially generating data at high rates, produces a massive stream of information that must be processed and transmitted. Traditional readout systems, often designed for detectors with lower pixel counts and slower readout speeds, struggle to cope with this data deluge [18].

Secondly, there is a growing demand for **higher readout speeds** to keep pace with **increasing radiation flux** and to minimize data loss. In many applications, particularly those involving high-intensity radiation sources or rapidly changing phenomena, detectors must be capable of reading out data at very high frame rates or event rates. If the readout system cannot keep up with the data generation rate, it leads to **data pile-up**, where subsequent events occur before previous data has been processed, resulting in lost information and degraded detector performance [19].

Thirdly, the **limited bandwidth of readout channels** becomes a critical constraint. The data generated by the pixel array must be transferred from the detector chip to external processing and storage systems through a finite number of output channels. As data rates increase, these channels can become saturated, effectively creating a "narrowing" or bottleneck that restricts the overall data throughput of the detector [20]. Managing access to these shared readout channels from a massive number of pixels requires sophisticated arbitration and data multiplexing techniques.

Furthermore, the need for **efficient data management and processing** becomes crucial. Simply increasing readout speed is not sufficient if the subsequent data processing and storage infrastructure cannot handle the high data rates. Efficient data compression, on-chip data processing, and optimized data transmission protocols are important components of a high-throughput pixel detector system [21].

In essence, the persistent drive towards higher resolution and faster detectors pushes the limits of traditional readout architectures. Overcoming the data readout bottleneck requires innovative approaches to data acquisition, arbitration, and processing. A high-level visualization of the detector readout structure is shown in Figure 1.3. This dissertation focuses on addressing these challenges by exploring and developing novel event-driven readout methods, specifically targeting the optimization of throughput and the efficient management of data from multi-channel pixel detectors.



Figure 1.3: Simplified schematic of pixelated detector readout structure.

#### 1.2 Research Problem Statement and Objectives

#### 1.2.1 Limitations of Existing Readout Architectures

Traditional readout architectures for pixel matrices, while serving as foundational approaches, exhibit inherent limitations when confronted with the demands of modern pixel detectors. Modern pixel detector systems feature high channel densities and are designed under constraints on power dissipation and transmission line bandwidth, with a strong emphasis on minimizing the number of lines. The pixel matrix is essentially a distributed sensor system in which individual pixels generally lack awareness of their neighbors and are unable to reconstruct situational context.

Therefore, the core challenge is transmitting data from the pixels to the acquisition system, where the data can be processed to extract knowledge about phenomenon topology in spatial, temporal, or spatio-temporal domains, and where further contextual information can be gained.

Handling of data typically involves multiplexing somewhere in the data flow, which can occur at various stages – within the pixel matrix, in the data acquisition system, or in a dedicated processing unit. Multiplexing is a broad term, and, for example, background suppression and transmitting only signal differences over time for a given pixel can be seen as a special form of multiplexing. Depending on system scale, component distribution, and constraints such as power, cost, and interconnect complexity, different conventional readout architectures are employed. The traditional architectures can be categorized as:

- **Direct links** – each pixel is connected to the periphery by its own dedicated trace, eliminating any possibility of link contention at the price of rapidly escalating routing density and Input/Output (I/O) pad count. As the array grows, these resources scale linearly with pixel number, quickly overwhelming both chip area and package pins.
- **Frame-based readout** – inherited from Charge-Coupled Device (CCD) practice [22], this approach samples the entire matrix at regular intervals, whether or not pixels contain useful data. When the majority of pixels are quiet, the fixed-rate scan consumes bandwidth that could otherwise be devoted to active channels, obscures fine time-of-arrival information, and introduces latency proportional to the number of rows transferred.
- **Polling schemes** (e.g., token-ring or daisy-chain) – designed to reduce physical redundancy, these schemes interrogate channels sequentially. The circulating token must still *visit* every pixel; as a result, latency and throughput both degrade with array size, and any simultaneous activity must wait for the token's return before being reported [23].

These traditional readout approaches necessitate trade-offs between routing complexity, data transmission speed, and data redundancy. To overcome these limitations, **event-driven readout** has gained traction as an alternative [24]. An event-driven system is one in which data transfer from the source (e.g., pixel) to the receiver (e.g., data-acquisition system or processor) is autonomously triggered by the local detection of a meaningful phenomenon. Each source initiates its own readout request and asserts it on a dedicated path that guarantees collision-free transport while allowing every other source to operate on equal terms. By construction, such a system operates asynchronously.

In the pixel-detector context, event-driven operation means that a pixel requests readout only when an event of interest occurs, thereby eliminating the need for periodic scanning or global polling. Such an approach allows the system to dynamically focus on transmitting only relevant data, reducing latency and enhancing throughput regardless of occupancy level. By activating the readout path only when necessary, the system minimizes bandwidth and power consumption and scales effectively to high-density, high-rate scenarios.
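The bandwidth argument can be made concrete with a simple data-volume count. The sketch below compares a frame-based scan, which transmits every pixel, with an idealized event-driven readout in which each hit is sent as an address plus a payload. The matrix size, the 12-bit payload, and the address-plus-payload word format are illustrative assumptions, not parameters of the chips described in this dissertation, and arbitration and protocol overheads are ignored.

```python
import math


def frame_bits(n_pixels: int, bits_per_pixel: int) -> int:
    """Frame-based readout: every pixel is transmitted in every frame."""
    return n_pixels * bits_per_pixel


def event_bits(n_pixels: int, occupancy: float, bits_per_pixel: int) -> int:
    """Idealized event-driven readout: only hit pixels are transmitted,
    each as an (address, payload) word. Arbitration overhead is ignored."""
    address_bits = math.ceil(math.log2(n_pixels))
    hits = round(n_pixels * occupancy)
    return hits * (address_bits + bits_per_pixel)


def break_even_occupancy(n_pixels: int, bits_per_pixel: int) -> float:
    """Occupancy above which the event-driven volume exceeds a full frame."""
    address_bits = math.ceil(math.log2(n_pixels))
    return bits_per_pixel / (address_bits + bits_per_pixel)


if __name__ == "__main__":
    n = 256 * 256  # hypothetical 256 x 256 matrix
    payload = 12   # hypothetical 12-bit payload per pixel
    for occ in (0.001, 0.01, 0.1, 0.5):
        f = frame_bits(n, payload)
        e = event_bits(n, occ, payload)
        print(f"occupancy {occ}: frame {f} b, event-driven {e} b")
    print(f"break-even occupancy ~ {break_even_occupancy(n, payload):.2f}")
```

This comparison concerns data volume only: under these assumptions the event-driven stream is smaller than a full frame as long as occupancy stays below $b/(a+b)$, where $a$ is the address width and $b$ the payload width (about 0.43 for the values above), which is why sparse activity is the natural operating regime for address-tagged readout.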

Last but not least, event-driven operation is best suited for scenarios in which the observed phenomena result in sparse and bursty event distributions. For precision HEP experiments, the readout chain should be strictly lossless, whereas event-based neuromorphic vision systems can tolerate occasional event drops.

An idealized view of an event-driven system – free of delays, bandwidth saturation, or inter-pixel correlation affecting readout order – is a theoretical model that real-world implementations can only approximate. Practical implementations must therefore address a new challenge: **arbitration**. When readout requests from multiple pixels overlap, potentially occurring simultaneously, a mechanism is required to manage conflicts and grant access to shared data transfer links in an orderly and efficient manner, preventing collisions and data loss.

This need for arbitration has been recognized, and **combinational logic with a priority encoder** has been used in prior works [25, 26]. In this method, each data source, i.e., pixel, is assigned a fixed priority level. When multiple requests arise concurrently, the priority encoder grants access to the highest-priority requester based on combinational logic. While straightforward to implement and economical in circuit resources, priority encoders suffer from significant drawbacks in the context of high-performance pixel detector readout:

- Inherent Pixel Prioritization and Unfairness: Priority encoders inherently favor high-priority channels over low-priority ones. In a pixel detector, this translates to pixel prioritization, where events occurring in certain pixels are systematically processed before others. This can lead to unfairness in data acquisition, potentially biasing measurements and causing data loss or delayed readout for events in lower-priority pixels.
- Potential for Data Corruption and Dead Time: When a high-priority source begins transmission, it can interrupt an ongoing transmission from a lower-priority source. Combinational priority encoders, lacking memory or sequential logic to manage ongoing transmissions, can therefore cause data corruption and loss or introduce dead time. A remedy requires complex interlocking mechanisms added on top of the basic combinational functionality.
- Inadequacy for Dynamic Operation: To prevent data corruption and ensure correct arbitration results, systems using combinational priority encoders effectively require taking "snapshots" of the source states at specific time intervals. This approach limits dynamic operation and can introduce inefficiencies.
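The first two drawbacks can be reproduced with a purely behavioral model. The sketch below evaluates a stateless priority encoder independently at each discrete time step; the time discretization, the choice of index 0 as the highest priority, and the function names are illustrative assumptions rather than details taken from [25, 26]. A late request from a high-priority pixel redirects the grant in the middle of an ongoing transmission, and two alternating high-priority pixels can keep a lower-priority pixel waiting indefinitely.

```python
from typing import List, Optional, Sequence, Tuple


def priority_encoder(requests: Sequence[bool]) -> Optional[int]:
    """Stateless fixed-priority encoder: the lowest active index wins.

    The output depends only on the current inputs, so it can change
    while a previously granted pixel is still transmitting.
    """
    for index, req in enumerate(requests):
        if req:
            return index
    return None


def trace_grants(trace: Sequence[Tuple[bool, ...]]) -> List[Optional[int]]:
    """Evaluate the encoder independently at every time step."""
    return [priority_encoder(reqs) for reqs in trace]


if __name__ == "__main__":
    T, F = True, False
    # Pre-emption: pixel 2 is transmitting when pixel 0 raises a request.
    preempt = [(F, F, T), (F, F, T), (T, F, T), (T, F, T), (F, F, T)]
    print(trace_grants(preempt))  # [2, 2, 0, 0, 2]

    # Starvation: pixels 0 and 1 alternate while pixel 2 keeps requesting.
    starve = [(t % 2 == 0, t % 2 == 1, T) for t in range(6)]
    print(trace_grants(starve))  # [0, 1, 0, 1, 0, 1]
```

In the first trace pixel 2 loses the link at step 2 although its request never dropped; in the second it is never granted at all.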

Therefore, if data integrity is the goal, a combinational priority encoder is unsuitable – it can only be used reliably to resolve static arbitration tasks.

The architecture proposed in this dissertation replaces the combinational encoder with asynchronous, non-priority arbitration logic built from RS-latch-based arbiters, which retain their state. This mechanism ensures that, once access is granted to a requesting pixel, subsequent requests cannot disrupt the ongoing transmission or redirect the acknowledge signal until the current request is fully cleared. As long as a request signal remains active, ongoing operations proceed undisturbed, guaranteeing fair arbitration and preventing data corruption. Moreover, by embedding memory in the logic, the architecture naturally queues requests on a first-come, first-served basis, eliminating pixel prioritization, minimizing dead time, and transmitting only meaningful, event-driven data.
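The difference from a stateless encoder can be illustrated with a behavioral model of the properties stated above. The sketch below is a hypothetical, discrete-time abstraction, not a model of the EDWARD circuit itself, and the class and function names are invented for illustration. A granted request keeps the grant until it is withdrawn, and waiting requests are served in order of arrival. Requests first seen in the same step are ordered by index, an assumption that stands in for the way a real latch resolves near-simultaneous inputs.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple


class HoldingFcfsArbiter:
    """Hypothetical discrete-time sketch of a stateful, non-priority arbiter.

    * A granted request keeps the grant until it is withdrawn (no pre-emption).
    * Waiting requests are served first-come, first-served.
    * Requests first seen in the same step are queued in index order; this
      tie-break is a modelling assumption, not a property of the real latch.
    """

    def __init__(self, n_inputs: int) -> None:
        self.queue: Deque[int] = deque()
        self.granted: Optional[int] = None
        self.prev: List[bool] = [False] * n_inputs

    def step(self, requests: Sequence[bool]) -> Optional[int]:
        # Release the grant only when the granted requester withdraws.
        if self.granted is not None and not requests[self.granted]:
            self.granted = None
        # Forget waiting requests that were withdrawn before being served.
        self.queue = deque(i for i in self.queue if requests[i])
        # Enqueue newly asserted requests (rising edges) in arrival order.
        for i, req in enumerate(requests):
            if req and not self.prev[i]:
                self.queue.append(i)
        # Hand the free link to the longest-waiting requester.
        if self.granted is None and self.queue:
            self.granted = self.queue.popleft()
        self.prev = list(requests)
        return self.granted


def run_trace(n_inputs: int, trace: Sequence[Tuple[bool, ...]]) -> List[Optional[int]]:
    """Feed a request trace step by step and collect the grants."""
    arbiter = HoldingFcfsArbiter(n_inputs)
    return [arbiter.step(reqs) for reqs in trace]


if __name__ == "__main__":
    T, F = True, False
    trace = [
        (F, F, T),  # t0: pixel 2 requests and is granted
        (F, F, T),  # t1: pixel 2 transmitting
        (T, F, T),  # t2: pixel 0 requests; it waits instead of pre-empting
        (T, T, T),  # t3: pixel 1 requests; queue is now [0, 1]
        (T, T, F),  # t4: pixel 2 releases; pixel 0 is served
        (F, T, F),  # t5: pixel 0 releases; pixel 1 is served
        (F, F, F),  # t6: all idle
    ]
    print(run_trace(3, trace))  # [2, 2, 2, 2, 0, 1, None]
```

Although pixel 0 raises its request while pixel 2 is transmitting, the grant stays with pixel 2 until it is released; pixels 0 and 1 are then served in the order in which they arrived.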

The proposed Event-Driven With Access and Reset Decoder (EDWARD) concept is closely related to Address-Event Representation (AER), a previously explored methodology often cited in the context of neuromorphic systems such as vision sensors [27]. However, unlike typical AER, EDWARD prevents collisions of simultaneously generated addresses and eliminates the need for time-management structures within individual pixels. These differences are discussed in detail later in this work.

#### 1.2.2 Research Questions and Hypothesis

To address the limitations of existing readout architectures and to harness the potential of near-ideal event-driven, fair, throughput-optimized, and energy-efficient readout schemes, this dissertation is guided by the following key research questions:

- **RQ1:** Is it possible to develop asynchronous arbitration logic with non-priority access that fundamentally overcomes the limitations of traditional frame-based, polling-based, and priority-encoded readout methods, as well as the shortcomings of AER, in high-density pixel radiation detectors?
- **RQ2:** Can a near-ideal event-driven readout architecture be developed using standard design flows and industry-standard Computer-Aided Design (CAD)/Electronic Design Automation (EDA) tools, and if so, what costs and compromises does this entail?
- **RQ3:** How effectively does the proposed architecture handle concurrent, asynchronous readout requests from a large pixel array in a fair and efficient manner, eliminating pixel prioritization and minimizing the data corruption risks associated with combinational priority encoders?
- **RQ4:** What mechanisms can be introduced in the proposed architecture to interface inherently asynchronous input signals with typically synchronous data acquisition systems, and how practical are they?
- **RQ5:** What are the performance advantages, in terms of energy efficiency, readout latency, data throughput, and arbitration fairness, of the novel architecture compared with representative traditional and competitive readout architectures, and how do these advantages translate to improved performance in pixel radiation detector systems?

Based on these research questions, the central **hypothesis** of this dissertation is:

A novel readout architecture featuring near-ideal event-driven operation and asynchronous arbitration logic with non-priority access, implemented as a tree of RS-latch-based arbiters, will provide significant improvements for high-density pixel radiation detectors, achieving higher throughput, lower latency, and demonstrably fairer event handling than traditional frame-based, polling-based, and priority-encoded readout architectures, and thus enabling a new generation of pixel detectors.
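For orientation, the size of such a tree follows from its topology alone. Assuming a balanced binary tree of two-input arbiters serving $N$ pixels (a simplification; a practical tree may be unbalanced), every arbiter merges two request lines into one, so reducing $N$ lines to a single grant requires $N-1$ arbiters, and a request traverses at most $\lceil \log_2 N \rceil$ arbitration levels. The $256 \times 256$ matrix below is a hypothetical example:

$$
N_{\text{arb}} = N - 1, \qquad L = \lceil \log_2 N \rceil, \qquad N = 256 \times 256 = 65\,536 \;\Rightarrow\; N_{\text{arb}} = 65\,535,\; L = 16.
$$

Measured in logic stages, the arbitration path therefore grows logarithmically with array size, whereas a polling token must visit every pixel.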

#### 1.2.3 Dissertation Objectives Aligned with Research Questions

To rigorously investigate the above hypothesis and answer the research questions, the following objectives were pursued:

- **O1: Detailed Conceptual Design and Theoretical Analysis of the Proposed Architecture:** To develop a comprehensive conceptual design of the novel architecture, focusing on its event-driven operation and asynchronous, non-priority arbitration mechanism. This includes a theoretical analysis of its expected performance and the underlying design rationale that distinguishes it from traditional methods, such as frame-based or priority-encoded schemes.
- **O2: Implementation of Non-Priority Arbitration with Asynchronous Logic Building Blocks:** To implement the core arbitration logic using asynchronous design techniques compatible with standard industry design flows and CAD/EDA tools, specifically employing RS-latch-based arbiters within a binary tree structure, while addressing robustness, metastability, and practical design trade-offs.
- **O3: Comprehensive Performance Evaluation through Simulation and Experimental Validation:** To quantify arbitration fairness, throughput, and latency of the proposed architecture under realistic and worst-case scenarios through detailed simulation campaigns and experimental testing of fabricated Application-Specific Integrated Circuits (ASICs).
- **O4: Hardware Realization and Prototyping of the ASICs:** To realize the proposed architecture in hardware by fabricating prototype ASICs in a 65 nm Complementary Metal-Oxide-Semiconductor (CMOS) process, demonstrating practical mechanisms for synchronization, clock domain bridging, and data serialization between asynchronous pixel logic and synchronous external acquisition systems.
- **O5: Comparative Performance Benchmarking and Analysis:** To benchmark the energy efficiency, latency, data integrity, and throughput performance of the proposed architecture against representative frame-based and polling-based architectures using both simulation and experimental data – demonstrating the real-world benefits of the EDWARD approach.

#### 1.3 Dissertation Scope

This dissertation aims to provide an in-depth exploration of the proposed readout architecture, bringing the reader closer to understanding its core innovative features. The primary focus is on the **back-end readout system** of pixel radiation detectors, specifically addressing the challenges of data acquisition, arbitration, and efficient data transmission from the pixel array to the chip periphery, from where the data, typically concentrated onto high-speed links, are sent to the data acquisition system or processing units. In essence, the proposed architecture is designed to provide a near-ideal event-driven and fair readout solution, promising to significantly enhance the performance of pixel radiation detectors in high-data-rate applications by overcoming the limitations of existing readout methods.

Specifically, the research scope is defined by the following key aspects:

- **Pixel Detector Type:** The research is applicable to both hybrid and monolithic pixel radiation detectors. While prototype implementations are demonstrated in the context of hybrid detectors, the underlying principles of the proposed architecture and its event-driven, non-priority arbitration scheme are broadly relevant to various pixel detector technologies. The dissertation's findings are intended to be generalizable to a range of pixel detector types, not limited to a specific sensor material or fabrication process. More broadly, the research applies to any system comprising multiple channels that require organized readout.
- **Readout Architecture Focus:** The core focus is on the readout architecture itself, encompassing its conceptual design, asynchronous logic implementation, and non-priority arbitration mechanism. The dissertation delves into the detailed design of the arbitration tree, the arbiter building block, and the overall control logic required for event-driven data acquisition.
- **Performance Metrics of Interest:** The foremost metrics optimized by this research are **energy efficiency, minimization of readout link bandwidth, and preservation of data integrity**. Throughput, latency, and arbitration fairness are also explored.
- **Technology Node and Implementation Platform:** The hardware implementations, specifically the prototype ASICs, are realized in a 65 nm CMOS process. This technology node provides a practical platform for demonstrating the feasibility and performance of the proposed architecture. The **Configuration-Testability-Readout (CTR) platform** serves as the modular framework for ASIC development, enabling efficient integration and testing of the proposed architecture.
- **Validation Methodology:** The dissertation employs a rigorous validation methodology combining extensive simulations (digital and mixed-signal) and, where feasible, experimental validation using fabricated prototype chips. Simulations provide a comprehensive characterization of the architecture's performance under various conditions, while experimental results serve to validate the simulation models and demonstrate the real-world performance of the implemented system. The ultimate goal is a technology-agnostic solution that a designer can implement with industry-standard CAD/EDA tools, using standard gates supplemented by only a few non-standard gates.

#### 1.4 Dissertation Structure

This dissertation is structured to provide a comprehensive and logical exposition of the research, from foundational background to detailed implementation and performance evaluation of the proposed architecture. The document is organized into the following chapters:

- Chapter 1: Introduction: (Current Chapter) This chapter provides the contextual background and motivation for the research, highlighting the growing importance of pixel radiation detectors and the critical challenges associated with data readout. It defines the research problem, articulates the key research questions and dissertation hypothesis, and outlines the scope of the dissertation. Altogether, this introduction serves as a roadmap for the reader, setting the stage for a detailed exploration of the proposed research topic in the subsequent chapters.
- Chapter 2: Literature Review: Foundations of Pixel Detector Readout: This chapter comprehensively reviews the foundational knowledge and existing techniques relevant to pixel detector readout and in-situ processing. It begins by establishing the fundamental principles of pixel radiation detectors, including radiation interaction mechanisms, sensor materials, and pixel detector architectures (monolithic vs. hybrid). The chapter then dives into a critical analysis of various data readout architectures, including frame-based, polling-based, and other approaches, highlighting their limitations and shortcomings in the context of high-performance pixel detectors. Furthermore, it provides detailed background on asynchronous logic design, non-priority arbitration principles, and in-situ signal processing techniques relevant to the development of the proposed architecture. This chapter establishes the necessary context and knowledge base upon which the subsequent chapters are built.
- Chapter 3: Design and Implementation of the Proposed Architecture: This chapter provides a detailed exposition of the conceptual design and implementation of the EDWARD architecture. It elaborates on the key architectural principles, focusing on its event-driven operation and the novel asynchronous non-priority arbitration mechanism. The chapter provides a thorough description of the asynchronous logic building blocks, particularly the arbiter, and explains the implementation of the arbitration tree and the overall control logic. It details the design considerations for both the digital and the fully functional detector readout prototype ASICs (the Event-Driven With Access and Reset Decoder Prototype 1 chip (EDWARD65P1) and the Full-Field-Fluorescence-Imaging Prototype 1 chip (3FI65P1)), outlining the specific features and functionalities incorporated into each chip for different validation purposes. This chapter serves as the central design and implementation chapter of the dissertation, providing a complete description of the proposed architecture.

- Chapter 4: Simulation Methodology and Performance Evaluation: This chapter focuses on the rigorous simulation methodology employed to evaluate the performance of the EDWARD architecture. It describes the simulation environment, the models used for digital and mixed-signal simulations, and the key performance metrics analyzed, including readout latency, throughput, and arbitration fairness. The chapter presents a comprehensive analysis of the simulation results, demonstrating the performance characteristics of the EDWARD architecture under various operating conditions and highlighting its advantages. This chapter provides the primary performance validation of the proposed architecture through simulation-based analysis.
- Chapter 5: Experimental Validation and Prototype Characterization: This chapter details the experimental validation efforts undertaken to characterize the fabricated prototype ASICs: the EDWARD65P1 chip, designed for temporal characterization, and the 3FI65P1 chip, designed as a fully functional readout ASIC for a pixelated radiation detector. It describes the experimental setup, the testing methodology employed to measure key performance parameters, and the results of experimental measurements, including measurements with synchrotron radiation. The chapter compares experimental findings with simulation predictions, validating the design and simulation models.
- Chapter 6: Conclusion: This final chapter summarizes the key findings and conclusions of the dissertation. It reiterates the primary contributions of the research, emphasizes the significance of the EDWARD architecture as an advancement in pixel detector readout technology, and briefly outlines potential future research directions and open challenges in the field. This chapter provides a concise and impactful closing statement, summarizing the key takeaways from the dissertation.


# **Chapter 2**

# **Literature Review**

#### 2.1 Fundamental Principles of Radiation Detectors

Detection of ionizing radiation relies on energy deposition in a material medium. When a charged particle traverses a medium or a photon interacts with it, energy is transferred to the detector's atoms through electromagnetic interactions. These interactions can produce **detectable secondary effects** in the medium, notably **ionization** (the creation of free charge carriers) and **excitation** of atoms or molecules, which can lead to **scintillation**, the emission of ultraviolet or visible-light photons [28,29]. The fundamental role of a radiation detector is to convert these initial physical effects into a measurable electronic signal.

#### 2.1.1 Types of Radiation Detectors

Radiation detectors are commonly categorized by the medium and physical mechanism employed to convert the radiation energy into an electrical output. Based on the form of energy conversion, the major classes are **gaseous detectors** (gas-filled detectors), **scintillation detectors**, and **solid-state** (**semiconductor**) **detectors** [30]. Each class utilizes a different medium (gas, scintillator, or semiconductor) and operates based on a different principle to register the passage of radiation as described below:

- Gaseous detectors: These detectors collect, under an applied electric field, the charge created when radiation ionizes gas atoms. The simplest example is the ionization chamber, which operates at a voltage low enough that no further amplification occurs – the total ionization charge from each event is collected and measured directly [31]. Increasing the operating voltage leads to a proportional counter, in which each ionizing event initiates a localized avalanche multiplication of charge. The resulting pulse has an amplitude proportional to the originally deposited energy, allowing basic energy discrimination [32]. At even higher voltages, the detector enters the Geiger-Müller regime: a Geiger-Müller tube produces a large, saturated discharge pulse for each particle, yielding high sensitivity but no energy information (all pulses are the same size) [29, 33]. More sophisticated gas-filled detectors, such as the Time Projection Chamber (TPC), extend these principles by drifting ionization electrons over macroscopic distances to segmented electrodes or wires. This approach enables fine position measurement (sub-millimeter or better) and even full three-dimensional track reconstruction of charged particles within the gas volume [34]. Gas detectors are appreciated for their simplicity and large active volumes, though their ionization density is low and they therefore often have lower energy resolution than other detector types [33].

- Scintillation detectors: These detectors use a scintillator material that emits light photons when excited by ionizing radiation. Common scintillators include inorganic crystals such as thallium-doped sodium iodide (NaI(Tl)) and cesium iodide (CsI), as well as organic plastic and liquid scintillators [29, 35]. When radiation interacts in the scintillator, a fraction of the deposited energy is released as a burst of photons, typically in the Vacuum Ultraviolet (VUV) to Infrared (IR) spectral range [36]. These scintillation photons are then detected by a photosensor, traditionally a Photomultiplier Tube (PMT) or, in modern systems, a solid-state Silicon Photomultiplier (SiPM), which converts the light into an electrical pulse [37]. Scintillation detectors typically offer high detection efficiency (owing to the high density of many scintillators, which gives them strong stopping power, i.e., the ability to absorb radiation energy effectively) and excellent timing resolution, making them ideal for applications such as medical imaging and time-of-flight measurements. However, because the scintillation light must be collected and converted, the achievable spatial resolution is often limited by optical light spread and sensor granularity. Improving the spatial resolution usually requires segmentation of the scintillator or the use of pixelated photodetector arrays to localize events. For example, gamma-ray cameras employ scintillator panels coupled to an array of photodetector elements to obtain position information [38].
- Solid-state detectors: These detectors are constructed from semiconductor materials such as silicon (Si), high-purity germanium (HPGe), or compound semiconductors like cadmium telluride (CdTe) and cadmium zinc telluride (CZT). When radiation interacts in a semiconductor, it directly generates electron-hole pairs in proportion to the energy deposited. In practice, this is typically achieved by reverse-biasing a semiconductor p-n junction to form a depleted active region. Ionizing radiation in this region creates charge carriers that are swept by the internal electric field to the electrodes, resulting in a measurable current pulse [39, 40]. This direct conversion mechanism allows solid-state detectors to achieve very precise energy measurements (with energy resolutions often superior to those of scintillation detectors) because of the large number of charge carriers generated per unit energy and the minimal intermediate conversion steps [29]. Solid-state detectors are also inherently well-suited for high spatial resolution: semiconductor devices can be fabricated with finely segmented electrodes and pixelated structures, yielding position-sensitive detectors with very small element size (on the order of micrometers to hundreds of micrometers) [41]. Furthermore, thanks to advances in semiconductor fabrication, these detectors offer a compact, robust form factor and can be directly integrated with readout electronics on the same chip or in close proximity. This ability to integrate sensor and electronics leads to improved signal-to-noise performance and enables the high channel densities required for pixelated detector systems [42].
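As a rough worked example of the direct-conversion signal, the sketch below computes the mean number of electron-hole pairs and the collected charge for a fully absorbed photon, using the pair-creation energies quoted in Section 2.1.2 (3.6 eV in Si, 4.6 eV in CZT). The 10 keV photon energy is an arbitrary illustrative choice, and complete charge collection without trapping or charge sharing is assumed. The relative spread is the purely Poisson estimate $1/\sqrt{N}$; in real semiconductors the Fano factor makes the actual spread smaller.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # exact by SI definition

# Mean energy per electron-hole pair (eV), values quoted in Section 2.1.2.
PAIR_ENERGY_EV = {"Si": 3.6, "CZT": 4.6}


def mean_pairs(deposited_ev: float, pair_energy_ev: float) -> float:
    """Mean number of electron-hole pairs created by an energy deposit."""
    return deposited_ev / pair_energy_ev


def collected_charge_fc(deposited_ev: float, pair_energy_ev: float) -> float:
    """Mean collected charge in fC, assuming complete charge collection."""
    return mean_pairs(deposited_ev, pair_energy_ev) * ELEMENTARY_CHARGE_C * 1e15


def poisson_spread(deposited_ev: float, pair_energy_ev: float) -> float:
    """Relative spread 1/sqrt(N) for purely Poisson statistics; the Fano
    factor of real semiconductors makes the actual spread smaller."""
    return 1.0 / math.sqrt(mean_pairs(deposited_ev, pair_energy_ev))


if __name__ == "__main__":
    e_dep = 10_000.0  # hypothetical fully absorbed 10 keV photon
    for material, w in PAIR_ENERGY_EV.items():
        print(f"{material}: {mean_pairs(e_dep, w):.0f} pairs, "
              f"{collected_charge_fc(e_dep, w):.3f} fC, "
              f"Poisson spread {100 * poisson_spread(e_dep, w):.2f} %")
```

For silicon this gives roughly 2800 pairs, i.e., below half a femtocoulomb, which illustrates why the front-end electronics placed close to each pixel must be low-noise.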

An example implementation of each of the above detector classes is shown in Figure 2.1. While each class employs a different physical medium, all can be segmented into discrete sensor elements (pixels or strips) to improve spatial resolution. For example, gas-filled detectors can be equipped with multi-wire or microstrip readouts to provide two-dimensional position information [44], and scintillator-based detectors can use pixelated arrays or multiple photodetectors to localize the scintillation events [42].

Figure 2.1: Examples of different radiation detector implementations: (a) A gaseous detector: a two-dimensional position-sensing microstrip gas chamber [43] that uses fine anode strips to localize ionization charge. (b) A scintillation detector: a scintillator crystal coupled to a photomultiplier tube for light readout [29]. (c) A semiconductor detector: a silicon drift detector, showing the layered electrode structure that allows efficient collection of charge carriers [28].

In practice, however, the finest segmentation and the most sophisticated on-board signal processing are achieved with semiconductor detectors, owing to their compatibility with solid-state electronics [45]. Pixelated solid-state detectors, which incorporate a matrix of small individual sensor pixels on a semiconductor substrate, each with its own processing channel, have emerged as a core technology in modern radiation detection. In this dissertation, we focus on pixelated semiconductor radiation detectors, taking advantage of their scalability to large channel counts, their compact integration with CMOS readout circuits, and their capability for throughput-optimized, low-latency signal processing on the detector. This approach is crucial for meeting the stringent data-rate and resolution requirements of state-of-the-art radiation detection applications [3].

#### 2.1.2 Radiation Interaction Mechanisms

Radiation detection in semiconductors relies on converting the deposited energy into an electrical signal through the creation of electron-hole pairs. The collected charge is proportional to the deposited energy  $E_{\text{dep}}$  and inversely proportional to the mean energy  $\varepsilon$  required to create one electron-hole pair, which is specific to the semiconductor material (e.g., 3.6 eV in Si, 4.6 eV in CZT) [39,40]:

$$Q = \frac{E_{\text{dep}}}{\varepsilon} e \tag{2.1}$$
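As a quick numerical check of Eq. (2.1), the sketch below computes the number of electron-hole pairs and the resulting charge for a fully absorbed photon. It assumes complete charge collection and uses the pair-creation energies quoted above; the 60 keV photon energy is only an illustrative choice.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # exact SI value of e, in coulombs

# Mean electron-hole pair creation energies quoted in the text, in eV
PAIR_ENERGY_EV = {"Si": 3.6, "CZT": 4.6}


def collected_charge(e_dep_ev: float, material: str = "Si") -> tuple[float, float]:
    """Return (number of e-h pairs, charge in coulombs) per Eq. (2.1),
    assuming every generated carrier is collected."""
    n_pairs = e_dep_ev / PAIR_ENERGY_EV[material]
    return n_pairs, n_pairs * ELEMENTARY_CHARGE_C


n_pairs, q = collected_charge(60e3, "Si")  # 60 keV fully absorbed in silicon
print(f"{n_pairs:.0f} pairs, {q * 1e15:.2f} fC")
```

For this example, the deposit yields roughly 16,700 pairs, i.e. about 2.67 fC, which illustrates why the front-end electronics must resolve charges at the femtocoulomb level.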

The probability of a given interaction is quantified by its *cross section*  $\sigma$  (cm<sup>2</sup>). The total interaction probability per unit path length is described by the *linear attenuation coefficient*  $\mu = N \cdot \sigma$ , where $N$ is the number density of atoms (cm<sup>-3</sup>) and $\sigma$ is the total cross section summed over all interaction processes [29].
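Because $\mu$ is an interaction probability per unit length, the fraction of photons that traverse a thickness $x$ without interacting is $e^{-\mu x}$, so the probability of at least one interaction is $1 - e^{-\mu x}$. The sketch below evaluates this for a silicon layer, computing $N$ from the density and molar mass. The 1 barn cross section is an arbitrary placeholder, not tabulated data; a real calculation would use evaluated cross-section tables.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol


def atom_density(rho_g_cm3: float, molar_mass_g_mol: float) -> float:
    """Number density of atoms N (atoms/cm^3) from density and molar mass."""
    return rho_g_cm3 * AVOGADRO / molar_mass_g_mol


def interaction_probability(sigma_cm2: float, n_cm3: float, thickness_cm: float) -> float:
    """Probability that a photon interacts at least once within the given thickness."""
    mu = n_cm3 * sigma_cm2  # linear attenuation coefficient, 1/cm
    return 1.0 - math.exp(-mu * thickness_cm)


n_si = atom_density(2.329, 28.0855)  # silicon: about 5e22 atoms/cm^3
# Illustrative placeholder cross section of 1 barn (1e-24 cm^2), not real data
p = interaction_probability(1e-24, n_si, 0.03)  # 300 um thick layer
print(f"N = {n_si:.3e} cm^-3, interaction probability = {p:.2e}")
```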

#### **Photon Interactions**

Photons interact with matter through several mechanisms [29, 46, 47], detailed as follows:

- Photoelectric effect: Dominates at energies below approximately 100 keV (the exact crossover with Compton scattering depends on the material and shifts to higher energies in high-Z materials). An atom completely absorbs the incoming photon and ejects a bound electron (photoelectron) with kinetic energy  $E_e = E_{\gamma} - E_b$ , where  $E_b$  is the electron binding energy. The cross section depends strongly on the atomic number $Z$: $\sigma_{\text{photo}} \propto Z^{n} E_{\gamma}^{-3.5}$, with $n$ between 4 and 5. Secondary effects include the emission of characteristic X-rays or Auger electrons as the atomic shells are refilled [46].
- Compton scattering: Predominant from approximately 100 keV to a few MeV. A photon scatters off a loosely bound or free electron, transferring part of its energy. The energy  $E'_{\gamma}$  of the photon scattered through an angle  $\theta$  is given by the Compton equation:

$$E_{\gamma}' = \frac{E_{\gamma}}{1 + \frac{E_{\gamma}}{m_e c^2} (1 - \cos \theta)} \tag{2.2}$$

The differential cross section per unit solid angle is given by the Klein-Nishina formula [29,48], derived from relativistic quantum electrodynamics:

$$\frac{d\sigma}{d\Omega} = \frac{r_e^2}{2} \left(\frac{E_\gamma'}{E_\gamma}\right)^2 \left(\frac{E_\gamma'}{E_\gamma} + \frac{E_\gamma}{E_\gamma'} - \sin^2\theta\right) \tag{2.3}$$

where  $r_e$  is the classical electron radius.

- Pair production: Energetically possible only above a threshold of 1.022 MeV, the combined rest-mass energy of the electron-positron pair (2 × 511 keV); it becomes the dominant process only well above this threshold, at an energy that depends on $Z$. A photon converts into an electron-positron pair in the Coulomb field of an atomic nucleus (predominantly) or, less commonly, of an atomic electron; the presence of this field is necessary for momentum conservation [29, 47]. The photon energy in excess of 1.022 MeV is shared between the kinetic energies of the electron and positron. Because the process relies on the nuclear Coulomb field, its cross section grows approximately as  $\sigma_{\text{pair}} \propto Z^2 \ln(E_\gamma)$ , favoring interactions in high-Z materials.
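Equations (2.2) and (2.3) can be evaluated directly. The sketch below computes the scattered photon energy and the Klein-Nishina differential cross section, and uses the backscatter case ($\theta = 180^\circ$) to obtain the maximum energy transferred to the electron (the Compton edge). The 662 keV input is an illustrative choice (the <sup>137</sup>Cs line); the constants are CODATA values.

```python
import math

ME_C2_KEV = 510.99895       # electron rest energy, keV
R_E_CM = 2.8179403262e-13   # classical electron radius, cm


def compton_scattered_energy(e_kev: float, theta: float) -> float:
    """Scattered photon energy E' from Eq. (2.2); theta in radians."""
    return e_kev / (1.0 + (e_kev / ME_C2_KEV) * (1.0 - math.cos(theta)))


def klein_nishina(e_kev: float, theta: float) -> float:
    """Differential cross section d(sigma)/d(Omega), cm^2/sr per electron, Eq. (2.3)."""
    ratio = compton_scattered_energy(e_kev, theta) / e_kev  # E'/E
    return 0.5 * R_E_CM**2 * ratio**2 * (ratio + 1.0 / ratio - math.sin(theta) ** 2)


e0 = 662.0  # keV
# Compton edge: maximum electron energy, reached for backscattering (theta = pi)
edge = e0 - compton_scattered_energy(e0, math.pi)
print(f"Compton edge: {edge:.1f} keV")
print(f"forward/backward cross-section ratio: {klein_nishina(e0, 0.0) / klein_nishina(e0, math.pi):.1f}")
```

The Compton edge marks the upper end of the continuum that a thin detector records when photons scatter once and escape.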

#### **Charged Particle Interactions**

Charged particles (electrons, protons, alpha particles, etc.) deposit energy primarily through ionization and excitation of atomic electrons. The mean energy loss rate of heavy charged particles is described by the Bethe-Bloch equation [28,49], derived from a relativistic quantum-mechanical treatment of the Coulomb interaction between the incident particle and the electrons of the medium:

$$-\frac{dE}{dx} = \frac{4\pi N_A z^2 e^4}{m_e c^2 \beta^2} \cdot \frac{Z}{A}\,\rho \cdot \left[ \ln\left(\frac{2m_e c^2 \beta^2 \gamma^2}{I}\right) - \beta^2 \right] \tag{2.4}$$

where $z$ is the charge number of the incident particle, $Z$ and $A$ are the atomic number and atomic mass (g/mol) of the absorbing medium, respectively, $\rho$ is its density, $N_A$ is Avogadro's number (so that $N_A \rho Z/A$ is the electron density of the medium), $e$ is the elementary charge (Gaussian units), $m_e$ is the electron mass, $\beta=v/c$ is the normalized particle speed, $\gamma$ is the Lorentz factor, and $I$ is the mean excitation energy of the absorbing material.
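With the electron density $N_A \rho Z/A$ written out, the Bethe-Bloch expression takes the practical form $-dE/dx = K z^2 (Z/A)\,\rho\,\beta^{-2}\left[\ln(2 m_e c^2 \beta^2\gamma^2/I) - \beta^2\right]$, where $K = 4\pi N_A r_e^2 m_e c^2 \approx 0.307$ MeV cm<sup>2</sup>/mol follows from $r_e = e^2/(m_e c^2)$. The sketch below evaluates this form for a singly charged particle in silicon. The material constants ($Z = 14$, $A = 28.0855$ g/mol, $\rho = 2.329$ g/cm<sup>3</sup>, $I \approx 173$ eV) are standard tabulated values; like Eq. (2.4), the sketch omits density-effect and shell corrections, so it is only an estimate.

```python
import math

K_MEV_CM2_MOL = 0.307075     # 4*pi*N_A*r_e^2*m_e*c^2, MeV cm^2/mol
ME_C2_EV = 0.51099895e6      # electron rest energy, eV


def bethe_dedx(beta_gamma: float, z: int, Z: int, A: float, rho: float, I_ev: float) -> float:
    """Mean energy loss -dE/dx in MeV/cm (no density-effect or shell corrections)."""
    gamma = math.sqrt(1.0 + beta_gamma**2)
    beta2 = (beta_gamma / gamma) ** 2
    log_term = math.log(2.0 * ME_C2_EV * beta_gamma**2 / I_ev)
    mass_stopping = K_MEV_CM2_MOL * z**2 * (Z / A) / beta2 * (log_term - beta2)  # MeV cm^2/g
    return mass_stopping * rho


# Singly charged particle at beta*gamma = 3 (close to minimum ionization) in silicon
dedx = bethe_dedx(3.0, z=1, Z=14, A=28.0855, rho=2.329, I_ev=173.0)
print(f"{dedx:.2f} MeV/cm, i.e. {dedx * 0.03 * 1e3:.0f} keV mean loss in 300 um")
```

The mean loss in a 300 µm layer comes out at roughly 120 keV; in such thin layers the most probable loss is lower than the mean because of the long tail of the energy-loss distribution.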

At higher energies, particularly for electrons in high atomic number materials, bremsstrahlung (radiation emitted when a charged particle decelerates due to the electric field of atomic nuclei) becomes significant, generating secondary photons [29, 50].

#### **Neutron Interactions**

Being electrically neutral, neutrons do not ionize the medium directly; instead, they interact through nuclear processes that produce ionizing secondary particles [29, 51]:

- Elastic scattering: Neutrons transfer part of their kinetic energy to atomic nuclei, most efficiently to light nuclei such as hydrogen, and the recoiling nuclei then ionize the surrounding medium.
- Inelastic scattering and neutron capture: In inelastic scattering, the target nucleus is left in an excited state and de-excites by emitting gamma rays. Capture reactions in isotopes with high capture cross sections (e.g., <sup>10</sup>B and <sup>6</sup>Li) emit charged particles (alpha particles, <sup>7</sup>Li ions, or tritons) that ionize the detector material.

The significance of these mechanisms is strongly dependent on neutron energy. Elastic scattering is particularly significant for fast neutrons, while neutron capture and subsequent charged-particle emission is prominent at thermal neutron energies.
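The preference for light nuclei in elastic scattering follows from two-body kinematics: in a head-on, non-relativistic elastic collision, a neutron can transfer at most a fraction $4A/(1+A)^2$ of its kinetic energy to a nucleus of mass number $A$ (taking the neutron mass as one mass unit). The short sketch below tabulates this fraction for a few nuclei chosen purely for illustration.

```python
def max_recoil_fraction(mass_number: float) -> float:
    """Maximum fraction of a neutron's kinetic energy transferred to a nucleus of
    mass number A in one head-on, non-relativistic elastic collision."""
    return 4.0 * mass_number / (1.0 + mass_number) ** 2


for name, a in [("H", 1), ("C", 12), ("Si", 28)]:
    print(f"{name:2s} (A = {a:2d}): {max_recoil_fraction(a):.3f}")
```

A neutron can lose all of its energy in a single collision with a proton, but at most about 13 % in a collision with a silicon nucleus, which is why hydrogen-rich converters are used for fast-neutron detection.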

#### **Heavy Ions and High-LET Radiation**

High Linear Energy Transfer (LET) radiation, such as heavy ions (e.g., fission fragments or cosmic-ray secondaries), produces densely ionized tracks. High-LET particles deposit large amounts of energy over short path lengths, causing increased recombination (charge carriers recombining before they are collected), charge saturation, and lattice displacement damage. These effects degrade the charge collection efficiency and lead to progressive detector aging, limiting performance and lifetime in high-radiation environments [29, 50].

#### **Summary**

Understanding these fundamental interactions provides the theoretical basis for optimizing pixel radiation detector architectures, especially for systems designed for high-performance applications.

#### 2.1.3 Signal Formation in Semiconductor Detectors

Ionizing radiation transfers energy to the semiconductor primarily via the interactions described above; the absorbed energy excites electrons from the valence band into the conduction band, creating electron-hole pairs. Once generated, the charge carriers are separated and transported by an externally applied electric field, established by applying a reverse-bias voltage across the detector junction. Electron-hole pairs are quickly swept apart: electrons drift toward the positive electrode (anode) and holes toward the negative electrode (cathode). At low to moderate field strengths, the drift velocity  $v$  of the charge carriers is proportional to the applied electric field  $E_{field}$  [29]:

$$v = \mu E_{field} \tag{2.5}$$

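where  $\mu$  is the carrier mobility (not to be confused with the attenuation coefficient of Section 2.1.2), a material-dependent parameter of approximately  $1350\,\mathrm{cm^2\,V^{-1}\,s^{-1}}$  for electrons and  $450\,\mathrm{cm^2\,V^{-1}\,s^{-1}}$  for holes in silicon at room temperature. At high fields, the drift velocity departs from this linear relation and saturates.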
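To give a sense of the time scales involved, the sketch below estimates carrier transit times across a sensor from Eq. (2.5) and the silicon mobilities quoted above. The 300 µm thickness and 100 V bias are hypothetical values, and the field is assumed uniform ($E = V/d$), which ignores the real field profile of a depleted junction and velocity saturation.

```python
def drift_time(thickness_cm: float, bias_v: float, mobility_cm2_vs: float) -> float:
    """Transit time across the full sensor thickness, assuming a uniform field
    E = V/d and the linear relation v = mu * E of Eq. (2.5)."""
    e_field = bias_v / thickness_cm       # V/cm
    velocity = mobility_cm2_vs * e_field  # cm/s
    return thickness_cm / velocity        # s


d, v_bias = 0.03, 100.0                   # hypothetical: 300 um sensor at 100 V
t_e = drift_time(d, v_bias, 1350.0)       # electrons
t_h = drift_time(d, v_bias, 450.0)        # holes
print(f"electrons: {t_e * 1e9:.1f} ns, holes: {t_h * 1e9:.1f} ns")
```

Under these assumptions, electrons cross the sensor in about 7 ns and holes in about 20 ns, so the slower holes set the duration of the induced signal.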

The measurable signal in semiconductor detectors arises from the induced current created by moving charge carriers within the detector. This phenomenon is well-described by the Shockley-Ramo theorem [52]. According to this theorem, the instantaneous induced current i(t) due to a single moving charge q is given by:

$$i(t) = q\vec{v}(t) \cdot \vec{E}_w \tag{2.6}$$

where  $\vec{v}(t)$  is the instantaneous velocity of the charge and  $\vec{E}_w$  is the weighting field, defined as the electric field that would exist in the detector if the readout electrode of interest were set to unit potential and all other electrodes were grounded. The weighting field depends only on the electrode geometry, not on the applied bias or space charge, which is why the Shockley-Ramo theorem is central to designing detector geometries and understanding signal shapes.
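For the simplest geometry, an ideal parallel-plate (pad) detector of thickness $d$, the weighting field of Eq. (2.6) is uniform with magnitude $1/d$, so each moving carrier induces a current $e v / d$ until it reaches its electrode. The sketch below integrates this to obtain the induced charge versus time for one electron-hole pair. The constant drift velocities (a uniform drift field from a hypothetical 100 V bias on 300 µm of silicon) and the mid-depth creation point are simplifying assumptions.

```python
def induced_charge(t: float, x0: float, d: float, v_e: float, v_h: float) -> float:
    """Charge (in units of e) induced on the readout electrode at x = d of an ideal
    parallel-plate detector by one e-h pair created at depth x0 (cathode at x = 0).
    With |E_w| = 1/d, each carrier induces e*v/d while it drifts (Eq. 2.6)."""
    t_e = (d - x0) / v_e  # electron transit time to the anode
    t_h = x0 / v_h        # hole transit time to the cathode
    return (v_e * min(t, t_e) + v_h * min(t, t_h)) / d


d = 0.03                                        # cm, hypothetical 300 um pad sensor
e_field = 100.0 / d                             # V/cm, uniform field assumed
v_e, v_h = 1350.0 * e_field, 450.0 * e_field    # cm/s, v = mu * E
x0 = d / 2                                      # pair created at mid-depth
for t in (1e-9, 5e-9, 20e-9):
    print(f"t = {t * 1e9:4.0f} ns: Q_ind = {induced_charge(t, x0, d, v_e, v_h):.3f} e")
```

The example illustrates a key consequence of the theorem: the full charge $e$ is induced only after both carriers have stopped, and each carrier contributes the fraction of the weighting potential it traverses (here one half each).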

The charge collection efficiency in semiconductor detectors is critical for accurate energy measurement and depends on carrier lifetime, drift length, and detector geometry. During transport, charge carriers may recombine or become trapped at impurity sites or crystal defects [53]; such traps and recombination centers reduce the measurable charge and thus degrade the detector's performance. Strategies such as increasing material purity, optimizing detector thickness, or operating at cryogenic temperatures can mitigate trapping and improve the charge collection efficiency. Equally critical are the uniformity and strength of the electric field, which strongly affect charge transport and collection. Non-uniform fields can lead to incomplete charge collection and position-dependent signals, complicating signal interpretation. Carefully engineered electrode structures help ensure uniform fields and improve spatial resolution and energy accuracy.

Signal formation is inherently affected by electronic noise. The key contributors to electronic noise in semiconductor detectors are thermal noise, flicker (1/f) noise, and shot noise [54]. In addition, statistical fluctuations in the number of generated charge carriers, described by the Fano factor, set a fundamental limit on the energy resolution [28]. This statistical contribution to the energy resolution  $\Delta E$  (full width at half maximum, FWHM) can be expressed as:

$$\frac{\Delta E}{E} = 2.35 \sqrt{\frac{F\varepsilon}{E}} \tag{2.7}$$

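where $F$ is the Fano factor (approximately 0.15 in silicon), which reflects the deviation from Poisson statistics caused by correlations in charge carrier generation, and the factor 2.35 converts the standard deviation of a Gaussian distribution into its FWHM.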
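Evaluating Eq. (2.7) shows how small the statistical limit is in silicon. The sketch below uses the Fano factor and pair-creation energy quoted in the text; the two photon energies (5.9 keV Mn Kα from <sup>55</sup>Fe and the 59.5 keV <sup>241</sup>Am line) are common calibration lines chosen only as examples. Electronic noise, discussed in Section 2.3.4, adds to this limit in quadrature and usually dominates in practice.

```python
import math


def fano_fwhm_ev(e_ev: float, fano: float = 0.15, pair_energy_ev: float = 3.6) -> float:
    """Statistical (Fano-limited) FWHM energy resolution in eV, from Eq. (2.7):
    Delta_E = 2.35 * sqrt(F * epsilon * E)."""
    return 2.35 * math.sqrt(fano * pair_energy_ev * e_ev)


for e in (5.9e3, 59.5e3):
    fwhm = fano_fwhm_ev(e)
    print(f"{e / 1e3:5.1f} keV: FWHM = {fwhm:4.0f} eV ({100 * fwhm / e:.2f} %)")
```

Because $\Delta E$ grows only as $\sqrt{E}$, the relative resolution improves with energy, from about 2.2 % at 5.9 keV to about 0.7 % at 59.5 keV.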

Signal formation in semiconductor detectors involves complex physical processes, from energy deposition and charge generation to transport, induction, and collection. Understanding and optimizing these processes is essential for achieving high detection performance, precision, and reliability in applications spanning fundamental physics experiments, medical imaging, and radiation safety monitoring.

# 2.2 Solid-State Pixel Detector Types

#### 2.2.1 Hybrid Pixel Detectors (HPD)



Figure 2.2: Structure of a hybrid detector [55].

Hybrid Pixel Detectors (HPD) consist of two distinct components: a semiconductor sensor substrate and a dedicated readout integrated circuit (typically an ASIC). These are mechanically and electrically connected by fine-pitch interconnects, traditionally using bump bonding [56]. The typical structure of an HPD is shown in Figure 2.2. Alternative integration techniques have also been explored, such as capacitive chip-to-chip coupling [57] and fusion bonding via 3D integration [16]. Each method has different electrical and mechanical trade-offs:

- bump bonding provides reliable one-to-one metal contacts but introduces additional material and limits how small the pixel pitch can be,
- capacitive coupling avoids physical solder bumps by transferring signals across a thin insulating gap, reducing mechanical complexity at the cost of requiring precise alignment and a sufficiently large coupling capacitance,
- and advanced 3D integration (using direct wafer bonding with through-silicon vias) enables finer interconnect pitches with reduced parasitic capacitance, although this approach is still less mature for large-scale detector production.

The sensor in an HPD is typically made of a high-resistivity semiconductor (e.g., silicon, CdTe, or CZT) that can be fully depleted, so that the electric field extends throughout the active volume from which the liberated charge is collected. The ASIC, implemented in a modern CMOS process, amplifies and processes these charge signals. Its design priorities differ by application domain. For example, in HEP experiments, the readout chip must tolerate extreme radiation levels and provide timing fast enough to assign each hit to the correct 25 ns bunch crossing [16, 58]. By contrast, in photon science applications such as X-ray imaging, the ASIC emphasizes a high dynamic range and the ability to handle very high photon count rates with minimal dead time [56].

HPDs were initially developed for HEP applications, particularly for experiments at the LHC at the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) [59]. In modern collider detector upgrades, the RD53 series ASIC (developed jointly by the A Toroidal LHC ApparatuS (ATLAS) and Compact Muon Solenoid (CMS) collaborations) represents the state-of-the-art HEP hybrid pixel readout chip, featuring a 50 µm pixel pitch and engineered to withstand high hit densities and total ionizing doses approaching 1 Grad over a decade of operation [58]. Meanwhile, the Medipix [60] and Timepix [61] families of hybrid chips, originally developed at CERN, have been adopted mainly in spin-off applications such as synchrotron radiation experiments, medical imaging, and space radiation monitoring, rather than in mainstream collider detectors. Another notable implementation is the UFXC32k ASIC [55], a 32k-channel hybrid pixel readout chip designed for fine spatial resolution and high-speed photon counting, which has been deployed at X-ray synchrotron facilities [62].

#### **Advantages of Hybrid Pixel Detectors**

The ability to select the sensor material and the readout electronics independently results in:

- **Independent optimization**: The sensor material and the ASIC technology can be chosen and refined separately, allowing the use of specialized sensor substrates (including high-Z materials for X-ray detection) without compromising the on-chip electronics.
- **High-complexity electronics**: Advanced CMOS ASICs can incorporate sophisticated per-pixel functionality, such as precise time-stamping for tracking or multi-threshold energy measurement for spectroscopic imaging.
- **Radiation tolerance**: A high-resistivity, fully depleted sensor provides strong radiation hardness on the detector side, and the ASIC can be designed with radiation-hard techniques, making the hybrid approach well suited to extreme radiation environments.
- **High detection efficiency**: Sensors can be made relatively thick (hundreds of micrometers) to absorb ionizing radiation efficiently, yielding a large signal charge, which is an important advantage for detecting high-energy X-ray photons.

#### **Limitations of Hybrid Pixel Detectors**

Bump bonding and other layer-stacking integration methods lead to:

- **Manufacturing complexity and cost**: The many additional post-processing steps significantly increase complexity and manufacturing cost and can reduce yield.
- **Increased material budget**: The additional thickness of the readout chip and interconnects precludes very lightweight detectors, which is a drawback where a low material budget is needed to limit multiple scattering.
- **Limited pixel size**: Practical interconnection constraints typically restrict the pixel pitch to about 50 to 100 µm.

#### 2.2.2 Monolithic Active Pixel Sensors (MAPS)



Figure 2.3: Structure of MAPS detector [63].

Monolithic Active Pixel Sensors (MAPS) are advanced semiconductor detectors that integrate both sensing elements and signal-processing electronics onto a single silicon substrate. This integration eliminates the need for complex interconnection techniques, such as bump bonding, significantly simplifying the fabrication process. MAPS utilize standard CMOS imaging processes (example structure shown in Figure 2.3), capitalizing on mature and cost-effective manufacturing technologies initially developed for consumer electronics applications [63, 64].

One of the earliest MAPS developments for HEP was the Minimum Ionizing MOS Active Pixel Sensor (MIMOSA) [65] from the Hubert Curien Pluridisciplinary Institute (Institut Pluridisciplinaire Hubert Curien, IPHC) in Strasbourg, France.

Traditional MAPS detect ionizing radiation through the generation of electron-hole pairs in a thin, moderately high-resistivity epitaxial silicon layer, typically of p-type doping. Electrons generated by the radiation diffuse toward the collection diodes integrated within each pixel, where they are collected and produce voltage signals. These signals are amplified and processed by embedded CMOS circuitry. This design enables compact, low-power pixel implementations with high spatial resolution.

Modern MAPS have evolved significantly, moving from charge collection primarily by diffusion toward drift-based collection. Earlier designs relied on thermal diffusion of charge carriers in a thin, largely undepleted epitaxial layer, resulting in slow charge collection and susceptibility to radiation-induced trapping. Recent developments introduce high-resistivity substrates, deep n-well structures, and higher bias voltages, generating internal electric fields that drive rapid drift of the charge carriers toward the collection electrodes [66]; the difference in structural cross section is shown in Figure 2.4. Drift-based MAPS, such as the ALice PIxel DEtector (ALPIDE) [67], demonstrate improved temporal resolution, radiation hardness, and signal integrity, making them particularly suited to modern HEP and NP experiments that demand fast, high-resolution, low-mass particle tracking under intense radiation conditions.



Figure 2.4: Structural cross-section of a pixel in (a) a standard process MAPS and (b) a modern MAPS with fully depleted sensitive layer [68].

The ALPIDE sensor, developed for the upgrade of the Inner Tracking System (ITS) of the A Large Ion Collider Experiment (ALICE) at CERN, exemplifies the capabilities of modern MAPS. Fabricated in a 180 nm CMOS imaging sensor process by TowerJazz, ALPIDE features a pixel pitch of about 30 µm, with integrated amplification, discrimination, and zero-suppressed readout circuitry. This architecture combines high detection efficiency, low noise, and fast readout with a low material budget, making it suitable for tracking in the high-particle-density environment of heavy-ion collisions.

#### **Advantages of MAPS**

Avoiding hybridization by bump bonding or other integration methods yields:

- **Reduced material thickness**: Thin sensors (tens of µm) minimize multiple scattering and particle energy loss.
- **High spatial resolution**: Small achievable pixel sizes increase granularity and spatial resolution, despite the increased charge sharing.
- **Cost-effective and simplified manufacturing**: Standard CMOS manufacturing processes simplify production, reducing costs and enabling large-scale fabrication.
- **Low power consumption**: Integrated electronics within the pixels allow efficient use of power, which is essential for applications with stringent power constraints.

#### **Limitations of MAPS**

Integrating all elements on a single substrate results in the following:

- **Moderate radiation hardness**: Although improvements in CMOS processes have significantly enhanced radiation tolerance, MAPS have traditionally offered lower radiation hardness than hybrid pixel detectors.
- **Limited analog performance**: CMOS process constraints introduce inherent noise and limit the achievable gain, potentially degrading precise energy measurements compared with hybrid detectors employing dedicated analog front-end electronics.
- **Constrained circuit/design options**: The shared substrate must accommodate the charge-collecting electrodes, the guard structures that prevent leakage currents, and the signal-processing circuits.

While MAPS development is driven primarily by HEP and NP experiments, such as ALICE or the EIC, for vertexing and tracking, a broader range of applications exists. Stitching enables the construction of large-area detectors by overcoming reticle-size limitations, which is crucial for large detector arrays. Thinning the detector further reduces the material budget, minimizing multiple scattering and energy loss. Notable applications include:

- **Space missions**: The AstroPix detector [69] employs MAPS technology for gamma-ray detection, exploiting its low power and compactness, crucial for space exploration missions.
- **Medical imaging**: MAPS-based detectors are investigated for proton computed tomography, promising high-resolution imaging with reduced radiation exposure to patients.
- **Electron microscopy**: The sensitivity and high spatial resolution of MAPS make them highly suitable for advanced electron microscopy techniques [70].

#### **Future Developments**

Ongoing research and development efforts are targeting:

- **Wafer-scale integration**: Using advanced stitching techniques to fabricate larger MAPS devices, overcoming reticle-size limitations.
- **Enhanced radiation tolerance**: Design improvements to extend the usability of MAPS in high-radiation environments.
- **Advanced integrated electronics**: Inclusion of sophisticated processing capabilities within the pixels, enabling real-time, in-situ data analysis and event-driven detection strategies.

In summary, the evolution of MAPS technology continues to enhance its capabilities and widen its applications, demonstrating its significance across various scientific and technological domains.

#### 2.2.3 Comparative Summary of Hybrid and Monolithic Architectures

The selection between hybrid and monolithic architectures depends on application-specific requirements, including spatial resolution, radiation hardness, complexity of electronics, material budget constraints, and costs. Hybrid detectors excel in specialized applications that require maximum radiation hardness and sophisticated electronic processing. Conversely, MAPS are advantageous in applications where minimizing detector thickness, achieving very high spatial resolution, and reducing manufacturing complexity are prioritized.

# 2.3 In-Situ Signal Processing Techniques

Weak charge signals, often comprising only a few hundred carriers, must be converted into a voltage or current of sufficient amplitude to rise above the noise level. To extract such faint signals, the readout and processing electronics must be placed as close to the signal source as possible, a concept known as in-situ processing. In modern systems, the term is broad: thanks to advances in integrated circuit technology, in-situ processing can now encompass significant processing capability and complex functionality. In-situ signal processing in pixel detectors involves performing preliminary signal processing directly in the pixel or in the immediately adjacent peripheral electronics. This approach improves the accuracy of event characterization and enhances overall detector system efficiency. In-situ processing is particularly beneficial in applications with large pixel arrays, high event rates, and stringent energy and timing resolution requirements, and its methods have evolved to include sophisticated analog and digital signal processing techniques. An example block diagram of the in-situ processing chain of a modern detector [71] is presented in Figure 2.5.

#### 2.3.1 Motivation for In-Situ Signal Processing

Modern pixel radiation detectors must meet increasing demands for spatial resolution, temporal accuracy, energy measurement precision, and manageable data transmission under strict power budgets. Transmitting raw sensor data is often impractical because of bandwidth and power constraints. In-situ signal processing addresses these challenges by providing:

- **Reduced Data Volume:** Techniques such as zero suppression and threshold discrimination significantly decrease the amount of transmitted data, making efficient use of the available bandwidth and reducing storage requirements.
- **Improved Signal Quality:** Local amplification and pulse shaping minimize noise, significantly boosting the Signal-to-Noise Ratio (SNR) and enhancing measurement accuracy and sensitivity.
- **Real-time Event Identification:** Immediate local processing facilitates rapid event identification and classification, enabling prompt decision-making and subsequent actions in real time.

Figure 2.5: Example of an in-situ signal processing chain based on the HEXID65P1 chip [71].
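The data-volume argument for zero suppression can be quantified with simple arithmetic. The sketch below compares transmitting every pixel in every frame with transmitting only hit pixels as (address, amplitude) words. All numbers (a 512 × 512 matrix, 12-bit amplitudes, 10 kHz frame rate, 0.1 % occupancy) are hypothetical and are not taken from any particular chip.

```python
def raw_rate_bps(n_pixels: int, frame_rate_hz: float, bits_per_pixel: int) -> float:
    """Data rate if every pixel is transmitted in every frame."""
    return n_pixels * frame_rate_hz * bits_per_pixel


def zero_suppressed_rate_bps(n_pixels: int, frame_rate_hz: float, occupancy: float,
                             address_bits: int, amplitude_bits: int) -> float:
    """Data rate if only hit pixels are sent, each as an (address, amplitude) word."""
    return n_pixels * occupancy * frame_rate_hz * (address_bits + amplitude_bits)


n = 512 * 512                       # 2**18 pixels, so an 18-bit address
raw = raw_rate_bps(n, 10e3, 12)
zs = zero_suppressed_rate_bps(n, 10e3, 1e-3, address_bits=18, amplitude_bits=12)
print(f"raw: {raw / 1e9:.1f} Gb/s, zero-suppressed: {zs / 1e6:.1f} Mb/s, "
      f"reduction: {raw / zs:.0f}x")
```

The address overhead means zero suppression only pays off below a break-even occupancy of amplitude_bits / (address_bits + amplitude_bits), here 40 %; above that, sending full frames is cheaper under these assumptions.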

This dissertation specifically addresses the challenge of implementing on-the-fly zero suppression while ensuring accurate event reconstruction.

#### 2.3.2 Established Techniques

#### **Charge Amplification**

Charge signals generated by radiation interactions require precise initial amplification for effective subsequent processing [53]. Charge Sensitive Amplifiers (CSAs) perform this function: connected directly to the detector electrode, they integrate the collected charge on a feedback capacitor $C_f$ and convert it into a proportional voltage step of amplitude approximately $Q/C_f$ (an impedance transformation), shifting the signal from the charge domain to the voltage domain for further processing. For a sufficiently high open-loop gain, this output is largely independent of the detector capacitance. CSAs are characterized by their gain stability, linearity, noise performance, and power efficiency [72]; the quality of the CSA design directly impacts detector sensitivity and resolution.
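The charge-to-voltage conversion of a CSA follows from charge conservation at the amplifier input. Modeling the core as an inverting amplifier with open-loop gain $A$, the output step magnitude is $Q / (C_f + (C_f + C_{det})/A)$, which tends to $Q/C_f$ for large $A$. The sketch below evaluates this; the feedback capacitance, detector capacitance, and gain values are hypothetical, and 2.67 fC corresponds to a fully absorbed 60 keV photon in silicon according to Eq. (2.1).

```python
def csa_output_v(q_fc: float, c_f_ff: float, c_det_ff: float = 0.0,
                 open_loop_gain: float = float("inf")) -> float:
    """Magnitude of the CSA output step in volts (fC / fF = V).

    Charge conservation at the input of an inverting amplifier with gain A gives
    |V_out| = Q / (C_f + (C_f + C_det) / A), which tends to Q / C_f as A grows.
    """
    return q_fc / (c_f_ff + (c_f_ff + c_det_ff) / open_loop_gain)


# Hypothetical values: 2.67 fC signal, 10 fF feedback, 200 fF detector capacitance
ideal = csa_output_v(2.67, 10.0)
real = csa_output_v(2.67, 10.0, c_det_ff=200.0, open_loop_gain=1000.0)
print(f"ideal: {ideal * 1e3:.1f} mV, with A = 1000: {real * 1e3:.1f} mV")
```

Even with a detector capacitance twenty times larger than $C_f$, a gain of 1000 keeps the output within about 2 % of the ideal value, which is the sense in which the CSA output is largely insensitive to detector capacitance.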

#### **Pulse Shaping**

Pulse shaping circuits filter the preamplified signals to maximize the signal-to-noise ratio (SNR) and ensure accurate energy measurement [53]. Semi-Gaussian shaping filters, often implemented as CR-RC<sup>n</sup> networks, control both the pulse width and the timing resolution while mitigating pile-up. A high-pass (CR) stage defines the decay time constant, whereas a cascade of n low-pass (RC) integrators determines the rise time and bandwidth limit, resulting in an overall (n+1)-order band-pass filter [73]. This structure is shown schematically in Figure 2.6, where n integrators follow the differentiator. The Laplace-domain transfer function of such a filter is expressed as

$$H(s) = \frac{(s\tau_d)}{(1+s\tau_d)} \cdot \left(\frac{1}{1+s\tau_i}\right)^n,\tag{2.8}$$

where  $\tau_d$  is the differentiator time constant and  $\tau_i$  is the integrator time constant. Increasing the order n yields a pulse shape closer to the ideal Gaussian, at the cost of a longer delay. The shaping time, defined as  $\tau_s = n\tau_i$ , governs the peaking time of the output pulse; for equal time constants ($\tau_d = \tau_i$), the step response peaks exactly at $t = n\tau_i$. Optimal shaping improves both energy resolution and timing accuracy by suppressing broadband noise while preserving the main spectral content of the detector signal.
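For equal time constants, $\tau_d = \tau_i = \tau$, the response of Eq. (2.8) to a unit step (the ideal CSA output) has the closed form $v(t) = (t/\tau)^n e^{-t/\tau}/n!$, obtained by inverse Laplace transform of $\tau/(1+s\tau)^{n+1}$. The sketch below samples this response and locates its peak numerically; the 100 ns time constant and 1 ns sampling grid are hypothetical choices.

```python
import math


def cr_rc_n_step_response(t: float, tau: float, n: int) -> float:
    """Unit-step response of the CR-RC^n shaper of Eq. (2.8) with tau_d = tau_i = tau:
    v(t) = (t/tau)^n * exp(-t/tau) / n!  for t >= 0."""
    if t < 0:
        return 0.0
    x = t / tau
    return x**n * math.exp(-x) / math.factorial(n)


tau = 100e-9                                   # hypothetical 100 ns time constant
times = [i * 1e-9 for i in range(2000)]        # 0 to 2 us in 1 ns steps
for n in (1, 2, 4):
    values = [cr_rc_n_step_response(t, tau, n) for t in times]
    t_peak = times[values.index(max(values))]
    print(f"n = {n}: peak at {t_peak * 1e9:.0f} ns (n*tau = {n * tau * 1e9:.0f} ns), "
          f"amplitude {max(values):.3f}")
```

The peak lands at $n\tau$, consistent with $\tau_s = n\tau_i$, while the normalized peak amplitude drops as $n$ increases (from $1/e \approx 0.368$ for $n = 1$), a gain loss usually compensated elsewhere in the chain.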



Figure 2.6: Schematic representation of a CR-RC<sup>n</sup> pulse shaping filter. The differentiator (high-pass filter) sets the decay constant, while the n integrators (low-pass filters) define the rise time, resulting in a semi-Gaussian output [73].

#### **Threshold Discrimination**

Threshold discrimination converts analog signals into digital pulses by comparing them with predetermined thresholds [53]. This technique distinguishes genuine signals from noise, which is critical for maintaining high detection accuracy. The threshold setting trades sensitivity against the false (noise-induced) hit rate; adaptive or dynamic threshold techniques can be employed to optimize this balance.
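The sketch below illustrates both aspects on sampled data: a leading-edge discriminator that emits one pulse per upward threshold crossing, and the probability that a single sample of zero-mean Gaussian noise exceeds a threshold expressed in units of the noise rms. The waveform is hypothetical, and the per-sample probability is only a proxy: the actual noise hit rate also depends on the noise bandwidth and on correlations between samples.

```python
import math


def rising_edges(samples: list[float], threshold: float) -> list[int]:
    """Indices at which the signal crosses the threshold from below; each edge
    corresponds to one digital output pulse of a leading-edge discriminator."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] < threshold <= samples[i]]


def noise_exceed_probability(threshold: float, sigma: float) -> float:
    """Probability that one sample of zero-mean Gaussian noise (rms sigma)
    lies above the threshold."""
    return 0.5 * math.erfc(threshold / (sigma * math.sqrt(2.0)))


signal = [0.0, 0.1, 0.8, 1.5, 0.9, 0.2, 0.0, 0.6, 1.2, 0.4]  # hypothetical samples
print(rising_edges(signal, 0.5))  # two pulses, starting at samples 2 and 7
for k in (3, 5, 7):
    print(f"threshold = {k} sigma: p = {noise_exceed_probability(k, 1.0):.1e}")
```

Raising the threshold from 3σ to 5σ lowers the per-sample noise probability by more than three orders of magnitude, at the price of losing small genuine signals.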

#### **Amplitude Measurements: Peak Detection or Time-over-Threshold**

Peak detection circuits measure and temporarily store the maximum amplitude of shaped signals [74]. These circuits, typically employing sample-and-hold architectures, are essential in spectroscopy applications, where the pulse amplitude is proportional to the energy deposited by the radiation interaction. Well-designed peak detectors faithfully track the signal amplitude while avoiding pedestal (offset) errors and droop of the held value. Time-over-Threshold (ToT) provides amplitude information indirectly by measuring the pulse duration above threshold, allowing simpler circuitry and lower power consumption than conventional amplitude measurement methods [75].
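The two amplitude measures can be compared on a sampled pulse. The sketch below uses a CR-RC ($n = 1$) pulse normalized to peak at its amplitude, takes the maximum sample as a digital stand-in for an analog peak detector, and counts samples above threshold as a stand-in for a ToT counter. The 50 ns shaping time, 1 ns sampling period, and threshold are hypothetical.

```python
import math


def shaped_pulse(t: float, amplitude: float, tau: float) -> float:
    """CR-RC (n = 1) pulse normalized to reach `amplitude` at t = tau."""
    return amplitude * (t / tau) * math.exp(1.0 - t / tau) if t > 0 else 0.0


def peak_and_tot(samples: list[float], threshold: float, dt: float) -> tuple[float, float]:
    """Maximum sample (peak-detector stand-in) and time over threshold
    (number of samples above threshold times the sampling period)."""
    peak = max(samples)
    tot = sum(1 for s in samples if s > threshold) * dt
    return peak, tot


dt, tau, thr = 1e-9, 50e-9, 0.1   # hypothetical sampling, shaping time, threshold
for amp in (0.2, 0.5, 1.0, 2.0):
    samples = [shaped_pulse(i * dt, amp, tau) for i in range(1000)]
    peak, tot = peak_and_tot(samples, thr, dt)
    print(f"A = {amp:.1f}: peak = {peak:.3f}, ToT = {tot * 1e9:.0f} ns")
```

The peak follows the amplitude linearly, whereas the ToT increases monotonically but non-linearly (for this pulse shape it grows roughly logarithmically at large amplitudes), so a ToT-based energy measurement requires a per-pixel calibration curve.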

#### **Analog-to-Digital Conversion**

Analog signals from peak detectors must be digitized by an Analog-to-Digital Converter (ADC) for subsequent digital processing and analysis [53]. High-resolution ADCs, typically 10 to 16 bits, ensure an accurate digital representation of the signal amplitude, preserving the detector's intrinsic measurement precision. The choice of ADC involves trade-offs among sampling rate, resolution, power consumption, and complexity [76].
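The resolution trade-off can be made concrete with the standard figures of merit of an ideal N-bit ADC: the LSB size, the rms quantization noise LSB/√12, and the ideal full-scale sine-wave SNR of 6.02N + 1.76 dB. The sketch below computes these and implements an ideal truncating quantizer; the 12-bit, 1 V full-scale converter is a hypothetical example.

```python
import math


def adc_figures(n_bits: int, full_scale_v: float) -> tuple[float, float, float]:
    """LSB size (V), rms quantization noise LSB/sqrt(12) (V), and the ideal
    full-scale sine-wave SNR 6.02*N + 1.76 dB of an ideal N-bit ADC."""
    lsb = full_scale_v / 2**n_bits
    return lsb, lsb / math.sqrt(12.0), 6.02 * n_bits + 1.76


def quantize(v: float, n_bits: int, full_scale_v: float) -> int:
    """Ideal truncating quantizer: output code for an input voltage,
    clipped to the valid range 0 .. 2**n_bits - 1."""
    code = math.floor(v / full_scale_v * 2**n_bits)
    return max(0, min(2**n_bits - 1, code))


lsb, q_rms, snr = adc_figures(12, 1.0)   # hypothetical 12-bit ADC, 1 V full scale
print(f"LSB = {lsb * 1e6:.1f} uV, quantization noise = {q_rms * 1e6:.1f} uV rms, "
      f"SNR = {snr:.1f} dB")
print(quantize(0.5, 12, 1.0), quantize(1.2, 12, 1.0))  # mid-scale and an over-range input
```

In a detector chain, the ADC resolution is usually chosen so that the quantization noise stays well below the analog noise referred to the ADC input; beyond that point, extra bits add power without improving the measurement.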

#### **Digital Signal Processing**

Digital Signal Processing (DSP) techniques enhance detector performance by filtering and analyzing the signals after digitization [77]. Algorithms such as matched filtering, baseline correction, and digital pulse shaping reduce noise and improve resolution. DSP also enables complex real-time signal interpretation, which is essential in high-rate or time-sensitive applications.
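Two of the simplest DSP steps mentioned above are sketched below: baseline correction, which subtracts the mean of a few pre-trigger samples, and a causal moving-average (boxcar) filter as an elementary form of digital pulse shaping. This is a generic illustration on hypothetical ADC samples; a matched filter, which is optimal for a known pulse shape in white noise, would replace the boxcar weights with the expected pulse template.

```python
def baseline_corrected(samples: list[float], n_pre: int) -> list[float]:
    """Subtract the mean of the first n_pre (pre-trigger) samples."""
    baseline = sum(samples[:n_pre]) / n_pre
    return [s - baseline for s in samples]


def moving_average(samples: list[float], width: int) -> list[float]:
    """Causal boxcar filter: each output is the mean of the last `width` inputs
    (fewer at the start of the record), computed with a running sum."""
    out, acc = [], 0.0
    for i, s in enumerate(samples):
        acc += s
        if i >= width:
            acc -= samples[i - width]
        out.append(acc / min(i + 1, width))
    return out


raw = [10.2, 9.8, 10.1, 9.9, 10.0, 14.0, 18.0, 16.0, 13.0, 11.0]  # hypothetical ADC samples
filtered = moving_average(baseline_corrected(raw, n_pre=5), width=3)
print([round(v, 2) for v in filtered])
```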

#### **Artificial Intelligence and Machine Learning Techniques**

Artificial Intelligence (AI) and Machine Learning (ML) methods enable sophisticated signal interpretation and classification beyond conventional analytical techniques [78]. Advanced algorithms provide accurate event classification, pattern recognition, and anomaly detection, significantly enhancing detector capabilities, particularly in complex or noisy operational environments.

#### **Time-of-Arrival**

Time-of-Arrival (ToA) techniques accurately timestamp events, which is critical for applications requiring high timing resolution, such as particle tracking or synchronization [79].
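A common digital way to obtain a timestamp finer than the sampling period is to locate the first upward threshold crossing and interpolate linearly between the two bracketing samples. The sketch below shows this leading-edge approach on a hypothetical waveform sampled every 1 ns; it is a generic illustration, not the timing circuit of any specific chip.

```python
from typing import Optional


def time_of_arrival(samples: list[float], threshold: float, dt: float,
                    t0: float = 0.0) -> Optional[float]:
    """Leading-edge timestamp: time of the first upward threshold crossing,
    refined by linear interpolation between the two bracketing samples.
    Returns None if the threshold is never crossed."""
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < threshold <= b:
            return t0 + (i - 1 + (threshold - a) / (b - a)) * dt
    return None


pulse = [0.0, 0.0, 0.2, 0.6, 1.0, 0.8, 0.3]  # hypothetical samples, 1 ns apart
print(time_of_arrival(pulse, 0.5, 1e-9))      # crossing between samples 2 and 3
```

Because a larger pulse of the same shape crosses a fixed threshold earlier, leading-edge ToA suffers from amplitude-dependent time walk; constant-fraction discrimination or a ToT-based correction is commonly used to compensate.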

#### **Charge-Sharing Correction**

Charge sharing, the splitting of the charge cloud, and hence of the induced signal, between adjacent pixels, degrades both spatial and spectral resolution. In-pixel and on-chip algorithms, such as summation of neighboring signals or digital compensation methods, correct these effects by reconstructing the actual energy deposition pattern and enhancing imaging fidelity [80].
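The summation idea can be sketched offline on a frame of pixel values: find seed pixels (local maxima above threshold) and assign to each the summed signal of its 3 × 3 neighborhood. This is a minimal generic sketch, not the specific in-pixel algorithm of [80]; it sums all neighbors, including any noise they carry, and breaks ties between equal maxima in raster order so that each cluster is reported once. The frame values are hypothetical.

```python
def cluster_sum(frame: list[list[float]], threshold: float) -> list[tuple[int, int, float]]:
    """Return (row, col, summed signal) for each seed pixel: a local maximum above
    threshold, with ties broken in raster order. The sum covers the 3x3 neighbourhood."""
    rows, cols = len(frame), len(frame[0])
    hits = []
    for r in range(rows):
        for c in range(cols):
            v = frame[r][c]
            if v < threshold:
                continue
            neigh = [(rr, cc)
                     for rr in range(max(0, r - 1), min(rows, r + 2))
                     for cc in range(max(0, c - 1), min(cols, c + 2))]
            # not a seed if a neighbour is larger, or equal but earlier in raster order
            if any(frame[rr][cc] > v or (frame[rr][cc] == v and (rr, cc) < (r, c))
                   for rr, cc in neigh):
                continue
            hits.append((r, c, sum(frame[rr][cc] for rr, cc in neigh)))
    return hits


# Hypothetical 60 keV photon whose charge is shared among four pixels (values in keV)
frame = [[0, 0, 0, 0],
         [0, 20, 15, 0],
         [0, 15, 10, 0],
         [0, 0, 0, 0]]
print(cluster_sum(frame, threshold=5))
```

Without correction, the seed pixel alone would report 20 keV and the three neighbors would be counted as separate low-energy hits; summation restores a single 60 keV event at the seed position.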

#### 2.3.3 Detector Operation Modes

#### **Imaging Detectors**

Imaging detectors are designed to capture the spatial and intensity information of incoming radiation with high fidelity. Depending on how signals are processed and represented, these detectors can be broadly divided into integrating, counting, and binary types. Each approach offers a different balance between resolution, complexity, and data throughput, making the choice of architecture crucial for specific applications, e.g., in medical imaging or real-time radiation monitoring.

- Integrating Detectors: Integrating detectors function by accumulating charge continuously over specified integration periods, effectively aggregating the total number of radiation-induced charge carriers. This operational mode is particularly well suited to applications that prioritize cumulative radiation intensity measurements over temporal precision, such as medical radiography, CT, and radiation dosimetry. Integrating detectors require circuitry with high linearity and sufficient dynamic range to avoid signal saturation, ensure precise charge accumulation, and faithfully represent the total integrated radiation dose over extended exposure intervals [81,82].
- Counting Detectors: Counting detectors operate by individually digitizing each radiation-induced event, providing distinct timing and amplitude information. This mode inherently offers enhanced temporal resolution and superior energy discrimination capabilities, advantageous in applications demanding detailed spectral analysis, particle trajectory tracking, and dynamic imaging scenarios. Counting detectors necessitate advanced front-end electronics designed to rapidly process individual events with minimal dead-time intervals [83, 84].
- Binary Detectors: Binary detectors simplify the detection process by producing binary outputs that strictly indicate the presence or absence of radiation events, as determined by predefined threshold crossings. They can be treated as a special case of counting detectors whose resolution is one bit. This streamlined mode substantially reduces complexity and computational load, enabling rapid detection responses and minimal latency in high-event-rate environments, which benefits real-time processing and trigger mechanisms in particle physics experiments [85].
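The three modes above can be contrasted with a toy model of a single pixel during one frame, assuming the individual pulse amplitudes are already known. `pixel_outputs` is a hypothetical helper; real front-ends implement these operations in analog and digital circuitry, not software.

```python
from typing import Sequence, Tuple


def pixel_outputs(pulses: Sequence[float],
                  threshold: float) -> Tuple[float, int, int]:
    """Toy single-pixel model of one frame in the three imaging modes.

    pulses: charge of each interaction in the frame (arbitrary units).
    Returns (integrated charge, event count, binary hit flag).
    """
    integrated = sum(pulses)                          # integrating: total charge, events not resolved
    count = sum(1 for q in pulses if q > threshold)   # counting: one count per threshold crossing
    binary = 1 if count > 0 else 0                    # binary: a counter saturating at one bit
    return integrated, count, binary
```

For pulses of 12, 3, and 25 units with a threshold of 5, the integrating mode reports 40 (including the sub-threshold pulse), the counting mode reports 2, and the binary mode reports 1.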

#### **Tracking Detectors**

Tracking detectors play a central role in high-energy physics experiments by providing precise measurements of charged particle trajectories. Depending on their readout strategy, they can be grouped into three main types: static zero suppression detectors (ignore baseline noise below a fixed threshold) [86], dynamic zero suppression detectors (adapt thresholds in real time to match varying noise conditions) [87], and triggered readout detectors (record data only when specific conditions are met, often using signals from other detectors as triggers) [88]. The choice among them strongly affects the efficiency, resolution, and overall performance of a detection system.
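The difference between static and dynamic zero suppression can be sketched on a stream of samples from one channel. The dynamic variant below tracks the baseline and noise with exponentially weighted moving averages and keeps samples more than `k` standard deviations above the baseline; this estimator and the parameters `k` and `alpha` are assumptions chosen for illustration, not necessarily the scheme of [87].

```python
import math
from typing import List, Sequence, Tuple


def static_zs(samples: Sequence[float], threshold: float) -> List[Tuple[int, float]]:
    """Static zero suppression: keep (index, value) above a fixed threshold."""
    return [(i, s) for i, s in enumerate(samples) if s > threshold]


def dynamic_zs(samples: Sequence[float], k: float = 4.0, alpha: float = 0.05,
               init_mean: float = 0.0, init_var: float = 1.0) -> List[Tuple[int, float]]:
    """Dynamic zero suppression with an adaptive baseline + k*sigma threshold.

    Baseline (mean) and noise variance are tracked with exponentially weighted
    moving averages; samples flagged as signal do not update the estimates.
    """
    mean, var = init_mean, init_var
    kept = []
    for i, s in enumerate(samples):
        d = s - mean
        if d > k * math.sqrt(var):
            kept.append((i, s))                 # signal: report it, keep noise estimate
        else:
            mean += alpha * d                   # follow slow baseline drift
            var = (1 - alpha) * (var + alpha * d * d)
    return kept
```

On a slowly rising baseline, a fixed threshold eventually accepts every sample, whereas the adaptive threshold follows the drift and reports nothing; a sharp pulse on a flat baseline is reported by both.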

### 2.3.4 Noise Considerations

Noise critically affects detector performance. It is typically characterized as an input-referred quantity: the charge that, applied at the input, would produce an output signal equal to the measured Root Mean Square (RMS) noise. This quantity is called the total Equivalent Noise Charge (ENC), and it aggregates contributions from thermal (Johnson-Nyquist) noise, shot noise from semiconductor junctions, and flicker (1/f) noise [89]. Minimizing ENC requires meticulous selection and optimization of electronic components, layout techniques that minimize parasitic elements, and carefully optimized pulse-shaping strategies.
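As a worked example of the ENC definition, the sketch below refers an output RMS noise voltage back to the input through the front-end charge gain and combines independent contributions. It assumes a linear front-end with a known charge gain (in V/C) and uncorrelated noise sources, which add in quadrature; the helper names and the 100 mV/fC gain used in the usage note are illustrative assumptions.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs (exact SI value)


def enc_from_output_noise(v_rms_out: float, charge_gain: float) -> float:
    """Refer output RMS noise (V) to the input, expressed in electrons.

    charge_gain is the front-end gain in volts per coulomb of input charge.
    """
    return v_rms_out / charge_gain / ELEMENTARY_CHARGE


def total_enc(*contributions: float) -> float:
    """Combine uncorrelated ENC contributions (electrons) in quadrature."""
    return math.sqrt(sum(c * c for c in contributions))
```

For instance, with an assumed gain of 100 mV/fC (1e14 V/C), an output noise of 1 mV RMS corresponds to about 62 electrons ENC, and independent contributions of 30 and 40 electrons combine to 50 electrons.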

### 2.3.5 Challenges in Implementing In-Situ Processing

Implementation of advanced in-situ processing presents several challenges, including:

- **Power Consumption and Heat Dissipation:** Increased circuit complexity within pixels raises power consumption, necessitating efficient thermal management [90].
- **Area Constraints:** Limited silicon area requires careful optimization of processing circuits to avoid compromising pixel size or detector performance [91].
- **Reliability and Radiation Hardness:** Complex circuits within detectors must be robust against environmental stresses, including radiation, which demands rigorous validation and reliability assurance [92].
- **Susceptibility to Mismatches and Drifts of Parameters:** Process variations and device mismatches can introduce threshold dispersion, gain non-uniformity, and timing skew across the matrix, requiring calibration and compensation schemes to maintain uniform detector performance [93].

## 2.4 Traditional Readout Methods

The earliest pixel and strip detectors adopted comparatively simple data-extraction schemes that remain instructive reference points for modern architectures. This section briefly reviews three canonical approaches, namely direct links, frame-based readout, and polling-based (token-passing) readout, and illustrates each with representative implementations.

### 2.4.1 Direct Link Readout Architecture

A direct link readout architecture is the most straightforward configuration for transferring data from individual sensor channels to the acquisition system. In this scheme, each sensor output (or pixel/channel) is connected via a dedicated physical path to the Data Acquisition (DAQ) system, enabling parallel, contention-free signal readout. A simplified structure of the direct link system is shown in Figure 2.7. While this architecture becomes infeasible for large-scale pixel matrices due to routing congestion and I/O limitations, it remains highly effective in systems with a limited number of readout channels or where signal fidelity is critical [94].

This configuration is particularly useful in applications such as strip detectors or low-pixel-count spectroscopy arrays, where each channel carries orthogonal and essential spatial or spectroscopic information. Since each channel is individually wired, it permits simultaneous signal acquisition with minimal latency and no arbitration requirements. This is especially advantageous in time-critical or waveform-preserving applications such as gamma-ray tracking.

A successful realization of the direct-link idea is the *Maia* X-ray fluorescence imager, developed jointly by Brookhaven National Laboratory (BNL) and the Commonwealth Scientific and Industrial Research Organisation (CSIRO) [95]. The system uses an array of 384 planar Si detector diodes, each wire-bonded to its own pulse-processing chain. As shown in Figure 2.8, it consists mainly of the array of sensor elements, the HERMES spectroscopic Analog Front-End (AFE) ASIC, and the SCEPTER readout ASIC [96].



Figure 2.7: General structure of a direct link readout architecture. Each channel of the sensor is connected via a dedicated readout path to a DAQ system. While scalable only to a limited extent, this approach avoids data collisions and timing ambiguities.



Figure 2.8: Maia X-ray Microprobe Detector Array System [97]: a) functional block diagram of the detector subsystem and b) picture of wire-bonds connecting readout ASICs to the sensor and to each other.

Another example of a direct-link ASIC is the **AVG3\_Dev** [98], developed at BNL and presented in Figure 2.9. The chip is designed for 3D position-sensitive CZT and TlBr detectors, where each anode, cathode, and pad electrode is individually processed by an analog front-end chain and sent out for waveform digitization [99]. This ASIC integrates:

- A bipolar CSA with adaptive continuous reset,
- A first-order analog shaper,
- A single-to-differential output buffer.

The analog outputs are routed off-chip to external ADCs for further digital processing, allowing for high-resolution reconstruction of event timing and energy.



Figure 2.9: Block diagram and microphotograph of AVG3\_Dev ASIC. Each of the 32 channels is equipped with a full analog front-end chain. Signals are processed independently and sent to external ADCs for waveform digitization [98].

Although the routing overhead and number of I/O pads scale linearly with the number of channels, this approach can be advantageous when the number of channels is moderate (e.g., tens to hundreds) and/or precise analog waveform preservation is required. Such readout schemes are well-suited for strip detectors and 3D sensor arrays with separate pad electrodes, as used in Compton imaging and gamma-ray tracking systems.
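The linear scaling can be made concrete by counting output pads. The sketch assumes two pads per channel for a differential analog output (as produced by a single-to-differential buffer such as the one in AVG3_Dev) and, for contrast only, a hypothetical shared digital bus carrying a ⌈log₂N⌉-bit address plus a request/acknowledge pair; both helpers are illustrative and not taken from [94] or [98].

```python
def direct_link_pads(n_channels: int, pads_per_channel: int = 2) -> int:
    """Direct links: one dedicated output (here a differential pair) per channel."""
    return n_channels * pads_per_channel


def shared_bus_pads(n_channels: int, handshake_pads: int = 2) -> int:
    """Shared digital bus: ceil(log2 N) address lines plus request/acknowledge."""
    address_lines = (n_channels - 1).bit_length()   # ceil(log2 N) for N >= 2
    return address_lines + handshake_pads
```

For the 32 channels of AVG3_Dev this gives 64 pads versus 7, and for 384 channels (the size of the Maia array) 768 versus 11. The shared bus, however, conveys only hit addresses and gives up the analog waveform that motivates direct links in the first place.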

### 2.4.2 Frame-Based Readout Architecture

Frame-based readout is a classical architecture in pixel detector systems used for imaging, analogous to digital cameras: the entire sensor matrix is periodically scanned, typically in raster order, and read out regardless of whether each pixel contains meaningful data. In this approach, all pixels record data simultaneously over a defined acquisition period, and the complete frame is transferred to the data acquisition system during a subsequent readout phase. This yields a clearly defined exposure window, called a frame, and a consistent data structure across frames. The method suits applications where data sparsity is low (most pixels are active) or where image-style data is desired.



Figure 2.10: Timepix2 chip [100].

A representative example is the **Timepix2 ASIC** [100] (Figure 2.10), developed by the Medipix2 Collaboration. The Timepix2 consists of a 256 $\times$ 256 pixel matrix with a 55 $\mu$m pitch and supports various operation modes, such as ToT, ToA, or hit counting per frame. The readout is frame-based: the matrix is shuttered for a defined period, and all pixel data are read out afterward.

Frame-based readout architectures, such as that of the Timepix2, are robust and deterministic, making them suitable for applications with high event rates or imaging-style data needs. However, under sparse or bursty conditions (e.g., space radiation monitoring), a significant portion of each frame may consist of zero-hit pixels, which makes this approach less efficient in terms of bandwidth and power. Moreover, the readout itself does not preserve the intrinsic timing of events, and zero suppression must be performed further down the processing chain [101].
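The bandwidth penalty of frame-based readout under sparse conditions can be estimated by comparing the bits needed for a full frame with those of a zero-suppressed address-plus-value stream. The sketch assumes a fixed number of bits per pixel value and a plain binary pixel address; the helper names are hypothetical, and the 14-bit value width in the usage note is an assumption, not a Timepix2 specification.

```python
def address_bits(n_pixels: int) -> int:
    """Bits needed to encode a pixel address, i.e. ceil(log2(n_pixels))."""
    return (n_pixels - 1).bit_length()


def frame_bits(n_pixels: int, bits_per_pixel: int) -> int:
    """Frame readout: every pixel is sent, hit or not."""
    return n_pixels * bits_per_pixel


def sparse_bits(n_hits: int, n_pixels: int, bits_per_pixel: int) -> int:
    """Zero-suppressed readout: only hit pixels, each tagged with its address."""
    return n_hits * (address_bits(n_pixels) + bits_per_pixel)


def break_even_occupancy(n_pixels: int, bits_per_pixel: int) -> float:
    """Occupancy above which the sparse format becomes larger than a full frame."""
    return bits_per_pixel / (address_bits(n_pixels) + bits_per_pixel)
```

For a 256 × 256 matrix (16-bit addresses) with 14-bit values, a full frame costs 917,504 bits, whereas 655 hits (about 1% occupancy) cost 19,650 bits; the sparse format stays cheaper up to an occupancy of 14/30, roughly 47%.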



Figure 2.11: 3D-stacked CMOS image sensor architecture with global and rolling shutter modes. Storage and readout are decoupled by using per-pixel memory and dual-layer substrate [102].

Frame-based architectures are also widely used in **CMOS image sensors**, including those in the handheld devices that accompany people in everyday tasks. One example is the 3D-stacked sensor described in [102] and shown in Figure 2.11. In this system, photodiodes and storage elements are separated into two vertically integrated silicon layers, enabling both global and rolling shutter modes. The architecture supports simultaneous exposure and readout through independent peripheral circuits on each substrate, improving efficiency and enabling advanced functions such as auto-focus and rolling exposure preview.

While not ideal for high-throughput or sparse data applications, frame-based architectures remain highly valuable in imaging, biomedical, and moderate-rate experimental scenarios, especially where deterministic timing and uniform exposure are desired.

### 2.4.3 Polling-Based Readout Architecture: General Concept and Example from VIP2a

Polling-based readout architectures represent a class of schemes where each pixel or channel in a detector array is periodically or sequentially checked to determine whether it has registered a valid event. In contrast to frame-based readout, which retrieves all data regardless of activity, polling mechanisms aim to identify and read out only hit pixels, thus optimizing bandwidth and power usage, especially in sparse environments.

A typical variant is **token-based polling**, where a token propagates through the array in a predefined sequence (e.g., row-wise, column-wise, or daisy-chained). The token acts as an access grant signal: if a pixel has been hit, it captures the token, reports its data (address, timestamp, signal amplitude), and then releases the token to continue down the chain. The advantage of this method lies in its simplicity and localized arbitration, as it eliminates the need for global bus access control or matrix scanning.
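The token traversal described above can be sketched as a behavioral model of a single daisy chain. It assumes a fixed per-pixel hop delay `t_hop` and a fixed per-report transmission time `t_report`; both parameters and the function `token_pass` are hypothetical and do not describe the VIP2a circuit.

```python
from typing import Dict, List, Tuple


def token_pass(hit_data: Dict[int, int], n_pixels: int,
               t_hop: float, t_report: float) -> Tuple[List[Tuple[float, int, int]], float]:
    """Simulate one token traversal of a daisy chain of n_pixels.

    hit_data maps pixel address -> locally stored data (e.g. a timestamp).
    The token visits pixels in chain order; a hit pixel captures it, reports
    (address, data) on the shared output, clears its hit flag, and releases
    the token. Returns the (time, address, data) reports and the time at
    which the token leaves the last pixel.
    """
    t = 0.0
    reports = []
    pending = dict(hit_data)              # local hit flags and stored data
    for addr in range(n_pixels):
        if addr in pending:               # hit pixel captures the token
            t += t_report                 # occupies the output for one report
            reports.append((t, addr, pending.pop(addr)))
        t += t_hop                        # token released to the next pixel
    return reports, t
```

With eight pixels, hits at addresses 5 and 2, `t_hop = 1`, and `t_report = 10`, the reports leave at times 12 and 25 in chain order (address 2 first, regardless of which pixel was hit earlier), and the token exits after 28 time units. In general, one traversal takes `n_pixels * t_hop + n_hits * t_report`, so even an empty matrix costs `n_pixels * t_hop` per cycle.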



Figure 2.12: Polling-based token passing scheme, as used in VIP2a. A circulating token identifies hit pixels one by one and triggers their data transmission [103].

One notable implementation of polling-based readout is found in the **VIP2a** chip (Vertically Integrated Pixel), developed at Fermilab for the International Linear Collider (ILC) vertex detector system [103]. The token passing scheme in this chip is shown in Figure 2.12. VIP2a uses a three-tier 3D integrated architecture with pixels measuring $30 \times 30\ \mu\text{m}^2$ arranged in a $48 \times 48$ matrix. Each pixel contains analog signal capture, time stamping, and hit logic.

The readout process in VIP2a is organized into two phases:

- **Acquisition phase**: Pixels monitor incident radiation and store hit information locally.
- **Polling phase (report)**: A token is injected into the pixel matrix and traverses it sequentially. Each hit pixel intercepts the token and transmits its data by activating the corresponding X-Y lines, after which the token is passed along.

The key benefit of polling-based schemes like this one is that readout is restricted to active channels, which dramatically reduces the data volume under low occupancy. The architecture is thus inherently well-suited for high-granularity detectors in low-rate or burst-mode experiments. Moreover, implementing the token passing scheme requires only modest circuit resources.

However, some limitations exist. Because only one token is active at a time, readout latency increases with the number of hit pixels, especially in long chains. In practical terms, the next token can be injected only after a time equal to the full token propagation time, i.e., the time it takes to traverse the entire matrix when it is empty. Propagation delay therefore becomes the dominant factor in determining system speed. Hierarchical polling trees or multi-token schemes can help alleviate this bottleneck. Finally, the continuous, non-periodic switching activity of the circulating token may increase power consumption and couple interference into the low-noise front-end.

Polling-based readout has been widely used as the practical choice whenever a true event-driven solution was not available [104]. In this approach, a token is circulated through the array, and a pixel with valid data retains the token long enough to transmit. This makes polling closer in spirit to event-driven operation than frame scans, since only active pixels are actually read out. In monolithic or vertically integrated detectors, where routing and power budgets preclude dedicated links or continuous readout, polling becomes the most viable option. Its continued use mainly reflects the lack of a robust method for building truly event-driven arbitration until now.

### 2.4.4 Summary of Conventional Readout Architectures

The readout architecture of a pixel detector plays a central role in determining its efficiency, bandwidth utilization, latency, and suitability for different application environments. In this section, we have reviewed three representative classes of conventional readout schemes: **direct link**, **frame-based**, and **polling-based** architectures. Each approach comes with distinct trade-offs between complexity, scalability, and performance.

#### **Direct Link Architecture**

In direct link architectures, each pixel or sensor channel is connected to an independent readout path, typically consisting of analog or digitized outputs routed off-chip. This configuration is ideal for low-density systems or specialized setups where analog waveform fidelity is critical. The architecture enables low-latency, parallel readout with minimal logic overhead.

#### **Strengths:**

- True parallelism with no arbitration needed.
- Able to preserve high-fidelity analog signals.
- Minimal latency and deterministic response.

#### **Drawbacks:**

- Lack of scalability due to routing congestion and I/O limitations.
- High power consumption and pad count for large arrays.

#### **Frame-Based Architecture**

Frame-based readout collects the state of all pixels in a matrix over a global shutter window and then reads out the full frame. This architecture is typical in imaging systems and detectors operating in high-flux environments.

#### **Strengths:**

- Simple control logic and uniform data structure.
- Periodic operation with consistent timing across the pixel matrix.
- Well-suited for high-occupancy applications or continuous imaging.

#### **Drawbacks:**

- Inefficient under sparse conditions, when most pixels may be empty.
- Fixed latency and power consumption regardless of activity level.

#### **Polling-Based Architecture**

Polling, especially in its token-passing form, traverses the pixel matrix to identify and read out only the hit pixels. The scheme is widely used in monolithic and vertically integrated sensors where real estate and power are constrained. Tokens enable hit pixels to be isolated without broadcasting global readout commands. Nevertheless, some timing circuitry is still required to pass the token onward; otherwise, a pixel has no way of knowing when to stop sending its data.

#### **Strengths:**

- Sparse readout reduces data volume and power.
- Local circuitry simplifies global control.
- Better compatibility with in-pixel timestamping for precise tracking, instead of relying on frame timing alone.

#### **Drawbacks:**

- Readout latency increases with the number and spatial distribution of hits.
- Token delays accumulate, limiting throughput in high-occupancy conditions.
- Still relies on ordered traversal, introducing temporal skew and complexity in dense systems.
- Continuous switching activity, which adds power consumption and interference.

#### **Toward Event-Driven Readout**

Despite their utility, traditional architectures remain fundamentally constrained by either global synchronization (as in frame-based systems) or serialized access (as in polling). These approaches tend to underperform in systems with dynamically varying data rates, low occupancy, or requirements for ultra-low latency and fair access.

## 2.5 Asynchronous Logic in Pixel-Detector Readout

Modern pixel detectors are often deployed in environments where radiation-induced events occur sparsely, irregularly, and with highly variable local rates. X-ray fluorescence imaging illustrates such an environment: after atoms are excited, they relax by emitting photons spontaneously, without any predictable timing [105]. In such conditions, synchronous readout systems based on a global clock can be inefficient and limiting. Clock-driven architectures force all channels to operate in lockstep, resulting in unnecessary power consumption, increased digital switching noise that couples into sensitive analog nodes, and timing resolution constrained by the clock frequency.

An alternative approach is to employ *asynchronous* (clock-less) digital logic. In asynchronous systems, each channel or pixel operates independently and only when necessary, using handshake protocols to ensure safe communication and mutual exclusion when accessing shared resources. This mode of operation inherently adapts to sparse and bursty data, making it well-suited for low-occupancy, event-driven detection scenarios. Above all, it makes efficient use of readout resources, maximizing the transferred data volume for the smallest resource overhead.

#### **Classification of Digital Logic**

Digital logic can be broadly divided into two categories: *combinational logic* and *sequential logic*. Combinational logic computes outputs purely as a function of current inputs, without memory or history. It is ideal for arithmetic computations, encoding, decoding, and other stateless operations. Figure 2.13 shows representative gates commonly used in combinational circuits.

Sequential logic, on the other hand, incorporates memory elements, such as latches (e.g., the RS latch in Figure 2.14) or flip-flops, to maintain state. Its outputs are a function of both the current inputs and the stored state, which reflects the history of past inputs. This enables the realization of counters, finite-state machines, data registers, and more complex control mechanisms.

#### Synchronous vs. Asynchronous Sequential Logic

Sequential logic can operate either under a global clock (*synchronous*) or via event-driven transitions (*asynchronous*). Synchronous systems use clock edges to trigger state changes, ensuring predictability and ease of timing analysis. However, global clock distribution becomes a significant burden in large-scale systems, especially those with thousands of channels, as encountered in advanced pixel detectors.



Figure 2.13: Basic combinational logic gates: (a) inverter, (b) AND/NAND, (c) OR/NOR, (d) XOR/XNOR.



Figure 2.14: RS latch: a fundamental asynchronous sequential element, constructed from cross-coupled NAND or NOR gates.

Asynchronous logic eliminates the need for a global clock. Instead, control is achieved through handshaking protocols in which data producers and consumers coordinate using request and acknowledge signals. This offers several advantages: reduced power consumption, better modularity, resilience to clock skew, and improved noise isolation. Figure 2.15 provides a conceptual comparison between synchronous and asynchronous control models.



Figure 2.15: Comparison of synchronous (left) and asynchronous (right) logic paradigms.

#### **Challenges: Hazards and Metastability**

Asynchronous circuits are not without their challenges. Two major concerns are *logic hazards* and *metastability*.

A *logic hazard* manifests itself as a *glitch*: a spurious transition at a logic output caused by differing path delays through a combinational network. Hazards may temporarily violate logical correctness and inadvertently trigger control logic [106]. These effects are especially dangerous in asynchronous systems that rely on edge-sensitive handshaking signals. Different types of hazards are shown in Figure 2.16.
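The mechanism can be reproduced with a unit-delay simulation of the textbook multiplexer function F = A·S + B·S̄ with A = B = 1, for which F should stay at 1 while S falls. The extra inverter delay on the S̄ path briefly leaves both AND terms at 0. The function `simulate_mux` and the unit-delay model are illustrative assumptions; the optional consensus term A·B is the standard textbook remedy for this static-1 hazard, not a technique taken from [106].

```python
from typing import List


def simulate_mux(s_wave: List[int], a: int = 1, b: int = 1,
                 consensus: bool = False) -> List[int]:
    """Unit-delay simulation of F = A*S + B*not(S) (+ A*B if consensus).

    Every gate output at a step is computed from its inputs at the previous
    step, so the path through the inverter (S -> S_n -> AND2) is one step
    longer than the direct path (S -> AND1). Returns F at each step.
    """
    cons = (a & b) if consensus else 0     # redundant consensus term A*B
    s_prev = s_wave[0]                     # circuit starts in steady state
    s_n = 1 - s_prev
    and1, and2 = a & s_prev, b & s_n
    f = and1 | and2 | cons
    out = []
    for s in s_wave:
        # Simultaneous update: right-hand sides use previous-step values.
        s_n, and1, and2, f = 1 - s_prev, a & s_prev, b & s_n, and1 | and2 | cons
        s_prev = s
        out.append(f)
    return out
```

For S = 1, 1, 0, 0, 0, 0 the output is 1, 1, 1, 1, 0, 1, i.e., a one-step glitch; with the consensus term enabled, the output stays at 1 throughout.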

*Metastability* occurs when an input transition arrives at a bistable element, such as a latch or flip-flop, within its setup or hold window, causing the output to linger temporarily at an indeterminate voltage level between logic 0 and 1, as can be observed in the simulation results presented in Figure 2.17. This state may persist for an arbitrary duration before resolving, potentially propagating incorrect values or violating signal timing [107].
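A common first-order model makes the "arbitrary duration" quantitative: once a latch has entered the metastable state, the probability that it is still unresolved after a waiting time t decays as exp(-t/τ), where τ is the regeneration time constant of the cross-coupled pair. The sketch below, with hypothetical helper names and an assumed τ of 20 ps in the usage note, computes this probability and the waiting time needed to reach a target value; it is a simplification, not the analysis of [107].

```python
import math


def p_unresolved(t_wait: float, tau: float) -> float:
    """Probability that an already-metastable latch is still unresolved after t_wait."""
    return math.exp(-t_wait / tau)


def wait_for_probability(p_target: float, tau: float) -> float:
    """Waiting time after which the unresolved probability falls to p_target."""
    return tau * math.log(1.0 / p_target)
```

With τ = 20 ps, reducing the residual probability to 10⁻⁹ requires waiting about 20 ps × ln(10⁹) ≈ 414 ps, and each additional τ of waiting lowers it by a further factor of e. The probability never reaches zero, which is why asynchronous designs must mask the metastable state rather than assume it away.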



Figure 2.16: Different types of logic hazards in digital circuits [106].

Figure 2.17: Simulated latch metastability under increasingly fine input timing shifts [107].

#### **Mitigation Techniques**

In synchronous systems, these issues are often masked: hazards are suppressed by latching stable outputs, and metastability is mitigated through careful Static Timing Analysis (STA) or Statistical Static Timing Analysis (SSTA), often using synchronizer chains at domain crossings.

In asynchronous logic, explicit mitigation is required. Several circuit techniques are employed [108, 109]:

- **C-elements (Muller gates):** the output changes only when all inputs agree, which is useful for synchronization and delay-insensitive data paths. Different implementations are shown in Figure 2.18.
- **Seitz arbiters:** RS-latch-based arbiters used to decide which of multiple contending requests proceeds, ensuring fair and glitch-free arbitration. Implementation of such an arbiter is shown in Figure 2.19.
- **Isochronic forks:** controlled splitting of signals under the assumption of equal delays to all destinations.
- **Metastability filters:** skewed inverters/buffers, or inverters cross-coupled through their power and input pins, that locally mask the metastable state and prevent it from propagating until it is resolved.
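The behavior of the two main primitives in the list above can be summarized with small behavioral models. These are sketches under simplifying assumptions: signals are ideal 0/1 values, and the mutual-exclusion element stands in for a Seitz arbiter whose analog metastability resolution is replaced by an explicit `tie_winner` argument. The class and method names are hypothetical.

```python
from typing import Optional, Tuple


class CElement:
    """Two-input Muller C-element (behavioral)."""

    def __init__(self, init: int = 0) -> None:
        self.out = init

    def update(self, a: int, b: int) -> int:
        if a == b:            # inputs agree: output follows them
            self.out = a
        return self.out       # inputs disagree: output holds its state


class Mutex:
    """Two-way mutual-exclusion element standing in for a Seitz arbiter."""

    def __init__(self) -> None:
        self.grant: Optional[int] = None   # index of the current holder, or None

    def update(self, r0: int, r1: int, tie_winner: int = 0) -> Tuple[int, int]:
        requests = (r0, r1)
        if self.grant is not None and not requests[self.grant]:
            self.grant = None              # holder withdrew its request: release
        if self.grant is None:
            if r0 and r1:
                self.grant = tie_winner    # simultaneous: the latch may resolve either way
            elif r0:
                self.grant = 0
            elif r1:
                self.grant = 1
        return int(self.grant == 0), int(self.grant == 1)
```

In the mutex, a requester keeps its grant until it withdraws its request, so a second requester is served only after the first handshake completes. For truly simultaneous requests either outcome is legal, and it is precisely this decision that the metastability filter must allow the latch to settle before passing it on.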







Figure 2.19: Arbiter circuit together with metastability filter used for asynchronous priority resolution.

These design primitives form the foundation of many asynchronous systems, which are also employed in some pixel detector readouts, enabling flexible and scalable architectures that adapt to local event rates while mitigating timing uncertainty.

#### Conclusion

Asynchronous logic offers a compelling paradigm for readout systems in pixel detectors, particularly under conditions of sparse data and non-uniform activity. By operating without a global clock, these systems reduce noise and power overhead while offering precise event timing. However, the design must explicitly address hazards and metastability through careful circuit-level and architectural techniques. The concepts discussed here underpin many modern efforts in scalable, robust pixel readout architectures.

## 2.6 Asynchronous, Event-Driven Readout Architectures

Building upon the principles of asynchronous digital logic, event-driven readout architectures represent a significant departure from traditional synchronous and polling-based schemes. In these architectures, pixel activity directly dictates the timing of data transmission. Each event triggers its own readout sequence, enabling systems to respond immediately to radiation hits without any centralized control or global scanning.

This approach is particularly well-suited for pixel detectors operating in low to moderate occupancy regimes, where the vast majority of pixels remain idle during any acquisition cycle. By allowing only active pixels to engage in communication and signal propagation, event-driven readout drastically reduces power consumption, interference, and data bandwidth.

The design philosophy centers around four principles [108]:

1. **Event-local activation**: Pixels remain quiet until a threshold-crossing event occurs.
2. **Asynchronous handshaking**: Data transmission is coordinated by request/acknowledge signaling.
3. **Arbitrated access**: Shared resources (such as address buses or output serializers) are accessed via arbitration logic.
4. **Graceful termination**: Once the required data has been transferred, the readout transaction is explicitly concluded and the shared resources are released.

Such architectures are inherently modular and scalable, and their timing precision is governed by the internal delays of handshake chains rather than clock domains. This opens the door to high-speed, noise-immune, and highly parallel pixel matrix designs that align naturally with the stochastic and unstructured nature of ionizing events in radiation detectors.

### 2.6.1 The Address-Event Representation Protocol

One of the earliest and most widely adopted asynchronous readout protocols is the Address-Event Representation (**AER**) [108]. Initially conceived for neuromorphic vision sensors, AER enables individual pixels or nodes to report their activity asynchronously by transmitting a unique digital identifier (e.g., a pixel address) onto a shared output bus. A conceptual block diagram of AER is shown in Figure 2.20.

The core operation of AER relies on a distributed request (req) - acknowledge (ack) handshake. When a pixel detects an event, it asserts a req signal to an arbiter. If no other request is being serviced, the arbiter grants access with an ack signal, prompting the pixel to place its address on the output bus. Once the address is latched, the handshake resets, and the subsequent request may proceed.
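The effect of serializing events through a single four-phase handshake can be sketched with a simple queue model. It assumes that every transfer (req rising, ack rising, address latched, req falling, ack falling) takes a fixed time `t_cycle` and that the arbiter serves pending requests in arrival order, breaking ties by address; a real arbiter tree need not be first-come-first-served. The function `aer_readout` is hypothetical.

```python
from typing import List, Tuple


def aer_readout(events: List[Tuple[float, int]],
                t_cycle: float) -> List[Tuple[float, int]]:
    """Serialize (arrival_time, address) events onto one shared AER bus.

    Each transfer is one four-phase handshake of fixed duration t_cycle.
    Requests arriving while the bus is busy wait for the ongoing handshake
    to return to idle. Returns (completion_time, address) in bus order.
    """
    out = []
    bus_free = float("-inf")               # time at which the bus is idle again
    for arrival, addr in sorted(events):   # arrival order, ties broken by address
        start = max(arrival, bus_free)     # req waits until the previous ack falls
        bus_free = start + t_cycle         # req+, ack+, latch, req-, ack-
        out.append((bus_free, addr))
    return out
```

Idle pixels generate no bus traffic at all, but coincident events are delayed by whole handshake cycles. For example, with `t_cycle = 10`, events at addresses 7 and 3 arriving together at t = 0 and a third at address 12 arriving at t = 5 complete at 10 (address 3), 20 (address 7), and 30 (address 12).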

#### **Key characteristics of AER:**

- Sparse readout: Only active pixels consume bus bandwidth.
- Scalability: Pixels can be grouped hierarchically, and multiple AER buses can be used in parallel.
- Low latency: There are no frame-scan or token-propagation delays, so event data is transmitted almost immediately.
- Address encoding: Each pixel or block has a unique binary identifier representing its location in the matrix.



Figure 2.20: Conceptual diagram of AER communication. Only pixels with pending events engage in arbitration to place their address on the output bus [110].

Despite its simplicity and efficiency, AER has limitations. AER excels at reporting *that* an event occurred and *where*; however, without a well-specified transaction-end or timeout policy, long waits or stuck handshakes can threaten fairness and data integrity.

These limitations have motivated the exploration of enhanced event-driven architectures that preserve AER's benefits while addressing its drawbacks, particularly in HEP applications where fairness, throughput, and data integrity are critical. In the following sections, we explore such extensions, including asynchronous and non-priority arbitration using tree-based latches, as well as multi-phase data reporting schemes.

### 2.6.2 Address-Encoder and Reset-Decoder Architecture

An early and influential example of sparse, address-based readout is the MEPHISTO 128-channel front-end chip for detector readout [25]. Rather than buffering full frames, MEPHISTO loaded the instantaneous hit pattern from a discriminator array into flip-flops and then, in every clock cycle, *decoded only the active channels*. The SCANNER logic, consisting of a binary tree of OR gates and back-propagating logic, locates up to two hits per cycle and directly produces their addresses, which are then written, together with a time tag, into a small FIFO that bridges the external trigger latency. Trigger-validated addresses are next moved to a second, shallow FIFO. The resulting system produces a compact, zero-suppressed data stream with markedly reduced area and power compared to deep-pipeline approaches. The block diagram of the MEPHISTO architecture and the SCANNER block are shown in Figure 2.21. Although conceived for strips and operated in a synchronous, trigger-driven mode, MEPHISTO can also work in an untriggered mode, in which all events are read out without waiting for an external trigger signal [111].
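The OR-tree search performed by the SCANNER can be sketched as follows: the root of a binary OR tree flags whether any channel is hit, and descending from the root while preferring the left branch yields the lowest hit address in log₂N steps. The functions are hypothetical, the input length is assumed to be a power of two, and the `hits_per_cycle` loop that clears each found hit is a simplified stand-in for the two-hits-per-cycle logic of [111], whose exact circuit may differ.

```python
from typing import List


def or_tree(bits: List[int]) -> List[List[int]]:
    """Build OR-tree levels: level 0 is the hit pattern, the last level is the root."""
    levels = [bits]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] | prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def first_hit(bits: List[int]) -> int:
    """Descend from the root, preferring the left branch; -1 if nothing is hit.

    len(bits) is assumed to be a power of two.
    """
    levels = or_tree(bits)
    if not levels[-1][0]:
        return -1
    idx = 0
    for level in reversed(levels[:-1]):   # from just below the root down to the leaves
        idx = 2 * idx if level[2 * idx] else 2 * idx + 1
    return idx


def scan(bits: List[int], hits_per_cycle: int = 2) -> List[List[int]]:
    """Read out hit addresses, clearing each one; one inner list per clock cycle."""
    bits = list(bits)                     # work on a copy of the latched pattern
    cycles = []
    while any(bits):
        found = []
        for _ in range(hits_per_cycle):
            addr = first_hit(bits)
            if addr < 0:
                break
            found.append(addr)
            bits[addr] = 0                # clear the decoded channel
        cycles.append(found)
    return cycles
```

For an 8-channel pattern with hits on channels 2, 5, and 6, `scan` returns `[[2, 5], [6]]`: two addresses in the first cycle and one in the second.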



Figure 2.21: The MEPHISTO architecture: a) block diagram of the system, b) principle of operation of SCANNER block [111].

Building on the MEPHISTO concept, and to overcome the limitations of rolling-shutter and polling-based schemes in MAPS, the ALICE collaboration developed a specialized data-driven readout architecture known as the Address-Encoder and Reset-Decoder (AERD) [26]. This architecture is implemented in the ALPIDE chip (Figure 2.22), the monolithic CMOS pixel detector designed for the ITS upgrade of the ALICE experiment.

The AERD scheme combines in-pixel state storage with hierarchical arbitration logic. Rather than scanning through the entire matrix, the AERD logic directly identifies and reads only the pixels that have recorded hits, encoding their position using a priority-based tree structure and immediately resetting each pixel after readout. This zero-suppression technique enables fast and power-efficient operation even in large-area matrices.

#### **Operational Behavior and Readout Control:**



Figure 2.22: General structure overview of ALPIDE chip [112].

In contrast to fully event-driven architectures, the AERD implementation in ALPIDE relies on a controlled snapshot mechanism. Pixels do not latch hit events immediately as they occur. Instead, the state of the matrix is explicitly *frozen* at the beginning of the readout sequence:

- A STROBE signal is issued globally from the chip periphery. Pixels with active discriminator outputs during this interval latch their state into dedicated memory cells.
- After this strobe-based capture, the matrix contents remain static. No new hits are registered during the ongoing readout cycle.
- The AERD logic then begins sequentially encoding and reading out the addresses of the latched hits, resetting each pixel upon successful readout.

This separation between *hit integration* and *readout* phases prevents timing races and arbitration errors, ensuring that the readout operates deterministically on a fixed snapshot of the detector state.
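The snapshot-then-drain sequence can be captured in a small behavioral model of one region. It assumes a fixed priority in which the lowest pixel index is read first and hits that enter only through a `strobe` call; the class `AerdRegion` and its methods are hypothetical and abstract away ALPIDE's hierarchical arbiter tree and tri-state address bus.

```python
from typing import Iterable, List


class AerdRegion:
    """Behavioral sketch of strobe-based snapshot readout with priority encoding."""

    def __init__(self, n_pixels: int) -> None:
        self.latched: List[int] = [0] * n_pixels   # per-pixel hit memory

    def strobe(self, discriminator_high: Iterable[int]) -> None:
        """Latch the pixels whose discriminator output is active during STROBE."""
        for p in discriminator_high:
            self.latched[p] = 1

    def valid(self) -> bool:
        """Fast-OR over the region: True while latched hits remain."""
        return any(self.latched)

    def read_next(self) -> int:
        """Priority encoder: return the lowest-index hit, then reset that pixel.

        Raises ValueError if no hit is latched (check valid() first).
        """
        addr = self.latched.index(1)
        self.latched[addr] = 0                     # reset decoder clears the pixel
        return addr

    def read_all(self) -> List[int]:
        """Drain the frozen snapshot in priority order."""
        out = []
        while self.valid():
            out.append(self.read_next())
        return out
```

After `strobe({9, 3, 12})`, `read_all()` returns `[3, 9, 12]` and leaves the region empty, so `valid()` returns False. Because hits enter only through `strobe`, activity during readout is not seen until the next STROBE, and the readout order reflects pixel priority rather than arrival time.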

The basic logic block of the AERD system is shown in Figure 2.23. **The core architectural elements of the whole system are:**

- RS-based state latch per pixel,
- Multi-level arbiter tree and fast-OR logic to assert the data-valid flag VALID,
- Hierarchical address encoder with tri-state drivers,
- Gated SYNC signal for priority selection and pixel reset.

This architecture enables the ALPIDE chip to achieve power consumption and integration times well below the limits imposed by traditional rolling shutter approaches. The freezing of the pixel state ensures deterministic operation and avoids conflicts during priority-based address resolution. However, because it processes only one snapshot per STROBE, AERD is best described as a *data-driven* architecture rather than a continuously event-triggered one. AERD keeps the logic overhead low and retains clock synchronization, whereas fully asynchronous systems require careful handling of metastability. Additionally, the timing resolution is limited by the period of the STROBE signal.



Figure 2.23: Basic logic block of the AERD system: priority logic, address encoder, and reset decoder. These are arranged hierarchically to enable fast arbitration and sparse readout [26].

# **Chapter 3**

# **Proposed EDWARD Architecture**

# 3.1 Architecture Overview and Design Objectives

## 3.1.1 Rationale for Event-Driven Approach

The rapidly increasing pixel density, the sparse nature of the measured phenomena, and the need to reduce the number of data links, all characteristic of modern radiation detection applications, demand readout architectures capable of efficiently managing data flows while maintaining minimal latency and providing maximum throughput. Traditional frame-based and polling-based readout systems, although widely employed, exhibit inherent limitations, notably high data redundancy, unnecessary power consumption, and considerable latency. Moreover, architectures based on priority encoding introduce unfairness and a risk of data corruption due to combinational logic conflicts that occur during simultaneous events.

To overcome these limitations, the EDWARD architecture is proposed [110]. It fundamentally diverges from conventional methods by initiating data transmission only upon the occurrence of significant, individual events detected within pixels, thereby filtering data, optimizing the bandwidth utilization, and reducing unnecessary power consumption. The event-driven strategy inherently supports the management of sparse data, as seen in X-ray fluorescence imaging, where only some pixels detect meaningful events at any given moment. The EDWARD architecture has been disclosed in a patent application, which is currently pending in the United States [113], Europe [114], Australia [115], and Japan [116].

## 3.1.2 Objectives: Throughput Optimization, Latency Minimization, Fairness

The primary objectives of the EDWARD architecture are throughput optimization, latency minimization, energy efficiency, and equitable arbitration among pixels.

Throughput optimization is achieved by substantially reducing the redundant data transfers inherent in frame-based readouts and by eliminating the sequential latency of polling methods. By directly addressing only the pixels with valid events, EDWARD increases the effective data bandwidth, thus maximizing the detector's overall throughput.

Latency minimization is realized by asynchronous logic and an acknowledge-based arbitration method, ensuring minimal delay from event occurrence to data transmission. Unlike frame-based systems, EDWARD does not require a global clock to manage pixel access within the matrix, thereby reducing synchronization overhead and allowing immediate, event-driven responses.

Energy efficiency follows from the absence of any switching activity in the pixel matrix when there are no signals to transmit. This also ensures optimal operating conditions for the highly sensitive front-end blocks, since the absence of digital activity means that no interference is coupled into them.

Fairness in arbitration is another critical objective, achieved through non-priority asynchronous arbitration. Unlike priority-encoded approaches, the EDWARD architecture employs RS-latch-based arbiters arranged in a hierarchical tree structure, guaranteeing unbiased access to the shared data pathways and preventing data corruption resulting from priority conflicts.

# 3.2 EDWARD System-Level Description

## 3.2.1 Functional Block Diagram



Figure 3.1: EDWARD architecture block diagram [117].

The EDWARD architecture, whose high-level block diagram is shown in Fig. 3.1, comprises several key functional blocks: in-channel logic present in each pixel/channel, an arbitration tree, synchronization circuitry, and peripheral data-handling structures, including a pull-up/pull-down network [117]. Each pixel independently detects and signals events. When an event occurs, the in-channel logic generates a readout request that propagates through the arbitration tree to gain access to a shared data bus, which can be either digital or analog. Acknowledge signals are managed and distributed synchronously with an external clock, which provides straightforward communication with standard acquisition systems and grants almost immediate, exclusive access to the shared buses (or, more generally, shared resources) once arbitration is resolved. If no channel requests readout, the pull-up/pull-down network sets a *default* state on the bus.

# 3.3 In-Channel Logic and Data Handling

The in-channel logic is the first active component of the EDWARD readout chain, implemented within each pixel or pixel group. It performs autonomous event detection, generates readout requests, sequences multiphase data transmission, and drives the output buses. Crucially, this logic operates entirely asynchronously, relying not on global clock edges but on local signal transitions. Its design enables efficient, low-latency data handling and fair arbitration, especially under sparse or bursty event conditions.

## 3.3.1 Event Detection and Request Generation

Each channel is equipped with a mechanism to detect meaningful events. Each event acts as a trigger and can originate from either an analog front-end (such as a charge-sensitive amplifier and discriminator) or from digital logic – for example, a Poisson-distributed signal generator used in testing, as illustrated later. Once a valid event is identified, the channel asserts a req (request) signal to initiate arbitration. The request propagates through the asynchronous arbitration tree, signaling the channel's intent to transmit data.

This process does not rely on a clock signal. It is governed solely by the occurrence of the event and by transitions on the ack (acknowledge) line, which delivers access permission in a manner similar to a token.

## 3.3.2 Token Concept and Asynchronous Handshake

In the EDWARD protocol, a *token* is a level-encoded signal injected at the root of the arbitration tree and routed selectively to a requesting channel, i.e., it does not pass through every idle pixel before reaching the one with an active request. When the token arrives at a channel with an active req, the ack line toggles, signaling that arbitration has succeeded. This ack transition serves as the handshake trigger for the channel's local logic, allowing it to begin readout.

As shown in Figure 3.2, each acknowledge pulse advances the readout phase, enabling a new piece of data to be transmitted. The handshake remains local and fully asynchronous; no global synchronization is needed beyond the periodic generation of the token.

## 3.3.3 Multi-Phase Readout and Done Signal

Each channel can be configured to transmit data in multiple phases. These may include a pixel address, analog amplitude, timestamp, or other metadata. The number of phases is defined per channel by configuration bits (cfg[1:0] in Figure 3.2). Channels can operate in heterogeneous modes, e.g., one sending two digital words and another sending three.

The logic that manages this sequencing is composed of:

- A **readout phaser**: a chain of D flip-flops that counts the readout phases and selects which data output to activate,
- A **done indicator**: a comparator that detects when the final configured phase is reached,
- A **controller**: the logic that initializes the phaser on event arrival and generates req.

Figure 3.2: Timing diagram of a multi-phase in-channel readout. Two independent channels with different configurations (cfg[1:0]) are read out using the same acknowledge line. Each ack transition triggers a new phase. When the final phase is completed, the channel asserts done and clears req [110].

These elements are shown in structural detail in Figure 3.3. The done indicator produces the done signal, which simultaneously disables the request and resets the internal state, freeing the arbitration path for other channels.
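The sequencing performed by the controller, phaser, and done indicator can be captured in a small behavioral model. The sketch below is a hypothetical Python model, not the author's circuit: it assumes the phaser holds a thermometer code (one more flip-flop set per acknowledge pulse) and that cfg encodes the number of phases as cfg + 1; the names Channel, rdy, and ack_pulse are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Hypothetical behavioral model of the EDWARD in-channel logic."""
    cfg: int                 # 2-bit configuration; assumed n_phases = cfg + 1
    data: list               # one data word per readout phase
    phaser: list = field(default_factory=lambda: [0, 0, 0, 0])
    req: bool = False
    done: bool = False

    def rdy(self):
        """Controller: a detected event initializes the phaser and raises req."""
        self.phaser = [0] * len(self.phaser)
        self.done = False
        self.req = True

    def ack_pulse(self):
        """One acknowledge pulse: drive the current phase's word, then advance."""
        if not self.req:
            return None                      # not requesting: outputs stay in high-Z
        phase = sum(self.phaser)             # thermometer code -> phase index
        word = self.data[phase]
        self.phaser[phase] = 1               # set the next flip-flop in the chain
        if sum(self.phaser) == self.cfg + 1:
            self.done = True                 # done indicator: last phase reached
            self.req = False                 # withdraw the request, free the tree
        return word


ch_a = Channel(cfg=1, data=["addr", "timestamp"])
ch_b = Channel(cfg=2, data=["addr", "timestamp", "amplitude"])
ch_a.rdy()
ch_b.rdy()
print([ch_a.ack_pulse() for _ in range(3)])  # ['addr', 'timestamp', None]
print([ch_b.ack_pulse() for _ in range(3)])  # ['addr', 'timestamp', 'amplitude']
```

In this model, two channels with different cfg values respond to the same kind of acknowledge stream but stop requesting after a different number of phases, mirroring the heterogeneous configuration shown in Figure 3.2.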

## 3.3.4 Parallel Digital and Analog Output

A defining feature of EDWARD is the ability to transmit digital and analog data in parallel. Separate buses are dedicated to each data type, allowing, for example, a pixel to simultaneously send a digital address and drive its analog signal to an amplitude-hold line during the same phase.

As illustrated in Figure 3.4, each channel connects to the shared digital bus via tri-state drivers and to the analog bus via transmission gates. The same arbitration handshake controls both buses. When a token is granted, only the acknowledged channel drives either bus; all others remain in a high-impedance state. A passive pull-up/pull-down network ensures a defined idle value on the digital lines.
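The bus discipline can be illustrated with a small model of a shared tri-state bus with a passive pull network. It is a sketch under stated assumptions: the 14-bit width is borrowed from the serializer example in Section 3.5.5, pull-ups (rather than pull-downs) are assumed so that the idle word is all ones, and that idle code is assumed to be reserved, i.e., never used as a valid data word. The names resolve_bus and is_valid are hypothetical.

```python
def resolve_bus(drivers, width=14):
    """Value seen on a shared tri-state bus.

    drivers: one entry per channel; None means the channel's tri-state
    driver is in high impedance, an int means the channel drives that word.
    """
    idle = (1 << width) - 1                  # pull-ups: all lines read '1'
    active = [v for v in drivers if v is not None]
    if len(active) > 1:
        # Arbitration should make this impossible; flag it if it happens.
        raise RuntimeError("bus contention: more than one channel driving")
    return active[0] & idle if active else idle


def is_valid(word, width=14):
    """Assumes the all-ones idle code is reserved and never a data word."""
    return word != (1 << width) - 1


print(resolve_bus([None, None, None]))       # 16383 (idle)
print(resolve_bus([None, 0x2A, None]))       # 42
```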

## 3.3.5 Asynchronous Logic and Phase Progression

The in-channel phaser uses edge-sensitive flip-flops, but unlike clocked FSMs, these flip-flops are triggered by asynchronous events:

- The edge of the ack line, which advances the phase,
- The assertion of the rdy signal, which initializes the logic.



Figure 3.3: In-channel logic structure including the controller (left), readout phaser (center), and done indicator (right). The controller generates req based on rdy. The phaser selects the active output phase on each ack. When the final phase is reached, done is asserted and the system resets [110].

Because the logic does not rely on a free-running clock, it avoids timing-closure issues and reduces digital switching noise. Each pixel channel operates independently and consumes dynamic power only when actual activity occurs, which makes it well suited to sparse-event detectors.

A critical benefit of this design is its low susceptibility to metastability and race conditions. Because the readout phaser progresses sequentially, with only a single flip-flop changing state in response to each acknowledge pulse, the logic avoids simultaneous transitions across multiple elements. This serialized switching minimizes the risk of ambiguous states, or glitches, that are common in fully combinational arbitration schemes. Additionally, at the implementation level, the design mitigates races in the control-feedback loop by ensuring that the reset signals for the phaser and the done indicator are deliberately delayed. Without such a delay, an immediate release of the reset could cause the feedback loop to oscillate.
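The single-flip-flop property holds if the phaser stores a thermometer code; this encoding is an assumption inferred from the description above, not a detail stated in the text. The short sketch below counts how many state bits toggle between consecutive phases for a thermometer code and, for comparison, for a plain binary counter.

```python
def flips(a, b):
    """Number of flip-flops that change state between two encoded states."""
    return bin(a ^ b).count("1")


def thermometer(k):
    """k ones filled from the LSB: 0 -> 0b000, 1 -> 0b001, 2 -> 0b011, ..."""
    return (1 << k) - 1


def binary(k):
    """Plain binary counter state for phase k."""
    return k


def worst_case_flips(encode, n_phases):
    """Largest number of toggling bits over all consecutive phase steps."""
    return max(flips(encode(k), encode(k + 1)) for k in range(n_phases - 1))


print(worst_case_flips(thermometer, 4))  # 1
print(worst_case_flips(binary, 4))       # 2 (0b01 -> 0b10 toggles two bits)
```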

## 3.3.6 Channel Configuration and Operational Modes

Each pixel channel includes a minimal configuration register block, allowing flexible operation. The main fields typically include:

- Enable/disable bit,
- Number of readout phases,
- Mode selector (analog, digital, or combined),
- Test-mode activation (e.g., forced request).

Configuration bits, such as cfg[1:0], are used directly in the readout controller and phaser logic (see Figures 3.2 and 3.3), influencing both the behavior and duration of data transmission. This per-channel flexibility supports diverse use cases within a single matrix, simplifying functional testing, calibration, and diagnostics.

Figure 3.4: Simplified schematic of shared bus interfaces. Each channel drives the digital bus (dbus) through tri-state drivers and the analog bus (abus) through transmission gates. Output control is synchronized through the rdy/ack/req signals managed by the in-channel logic [17].

# 3.4 Asynchronous Arbitration Mechanism

The arbitration mechanism lies at the heart of the EDWARD architecture. Its primary function is to ensure mutually exclusive access to shared resources, such as the data bus, when multiple pixel channels request readout simultaneously. Unlike priority encoders, which introduce deterministic bias, EDWARD's arbitration strategy is fully asynchronous and non-prioritized. It is built around RS-latch-based elements, also known as Seitz arbiters [109], configured into a binary tree that enforces fair and stable arbitration through temporal resolution rather than spatial preference.

There are two distinct ways of building an arbitration mechanism. In the first, known as non-greedy (fair) arbitration, the whole arbitration process is always performed from the root, so each operation has predictable timing. In the second, known as greedy (unfair) arbitration, the token can be rerouted locally. This can speed up operation, but at the cost of lower predictability, and it may starve branches farther from the last-served path [118]. Both mechanisms are presented in Figure 3.5, and their implementation in EDWARD is described in the following subsections.

## 3.4.1 Seitz Arbiter and Grant Logic

At the lowest level, each arbitration cell uses a Seitz arbiter to resolve contention between two incoming request lines. This element ensures mutual exclusion by generating a grant (gnt) signal for only one of the inputs, even when both requests are asserted nearly simultaneously. The SR latch inside the arbiter may still enter a metastable state, but its metastability filter keeps both grant outputs inactive until the latch has resolved to a winner; additional logic then propagates the acknowledge token only in the granted direction.

Figure 3.5: Difference between non-greedy/fair (left) arbitration and greedy/unfair (right) arbitration [108].

A simplified representation of this behavior is illustrated in Figure 3.6, showing an arbitration cell with two request inputs, an acknowledge input (acki), and two acknowledge outputs (ack0, ack1). If either req0 or req1 is active when a token arrives, the arbiter will select one of them and route the token accordingly.



Figure 3.6: Structure of arbitration cell type 0 using Seitz arbiter. Token from acki is granted to ack0 or ack1 based on incoming requests. Only one grant signal is allowed to propagate [17].
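The token-routing rule of such a cell can be sketched behaviorally. In the hypothetical model below, the Seitz arbiter is reduced to a mutex that keeps its grant until the owner withdraws its request; the physical race between nearly simultaneous requests is replaced by a caller-supplied `first` argument, and all signals are treated as active-high booleans, ignoring the P/N polarity conventions discussed later. The request output follows the grants, as in the baseline Type 0 cell; the class name ArbiterCell is illustrative.

```python
class ArbiterCell:
    """Hypothetical behavioral model of a two-input Type 0 arbitration cell."""

    def __init__(self):
        self.gnt = [False, False]            # mutually exclusive grants

    def evaluate(self, req, acki, first=0):
        """req: (req0, req1); acki: token present at the cell input.

        Returns ((ack0, ack1), rqo).
        """
        for i in (0, 1):                     # release a withdrawn request
            if self.gnt[i] and not req[i]:
                self.gnt[i] = False
        if not any(self.gnt):                # free: grant one pending request
            for i in (first, 1 - first):
                if req[i]:
                    self.gnt[i] = True
                    break
        ack = tuple(acki and g for g in self.gnt)  # token goes to the winner only
        rqo = any(self.gnt)                        # Type 0: rqo from the grants
        return ack, rqo


cell = ArbiterCell()
print(cell.evaluate((True, True), acki=True, first=1))   # ((False, True), True)
print(cell.evaluate((True, False), acki=True))           # ((True, False), True)
print(cell.evaluate((False, False), acki=False))         # ((False, False), False)
```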

## 3.4.2 Arbitration Cell Type 0: Baseline Behavior

The most basic implementation, known as Type 0, passes the acknowledge token immediately when a valid request is received. It offers minimal latency and resource overhead, making it well-suited for higher levels of the arbitration tree, especially near the root.

However, Type 0 cells are susceptible to certain race conditions. As illustrated in Figure 3.7, overlapping sequences during token withdrawal and path rerouting may cause transient glitches on internal signals such as rqo (request output to the next level). If a second request is made during this time, it may result in incorrect token propagation or unstable behavior.



Figure 3.7: Waveforms showing arbitration cell Type 0 behavior under simultaneous requests. Transient glitches on rqo may appear during rerouting.

#### Waveform Explanation (Figure 3.7)

At the beginning of the waveform (1), no requests are present, and the arbitration cell remains idle.

Once multiple requests arrive simultaneously, the Seitz arbiter resolves them $(2G \to 2J)$, ensuring that only one grant is active at a time (2J). The metastable state at the SR latch output is filtered (2H), and a request output is generated for the next arbitration stage (2I). If a valid acknowledge token is present, it is immediately passed to the selected requester (2A'), and the acknowledge path is propagated through the tree (2A', 4A, 6A, 8A). When the selected channel finishes its readout, the request is cleared by the in-channel logic (4C, 8C), and the token is withdrawn. During this clearing phase, two internal sequences are triggered in parallel:

- The first sequence gates the token locally, disconnecting the acknowledge output (4D, 8D),
- The second clears the request output signal rqo (4E, 8E), allowing the next stage to update its state safely.

Ideally, these sequences would occur serially, but their parallel execution is tolerated.

Meanwhile, if another request is pending, a third sequence begins (4F) to establish a new token path. This can lead to a short glitch on rqo due to the timing gap between the second and third sequences. If this glitch overlaps with state changes in the subsequent arbitration stage, it may cause instability or spurious acknowledge activity.

Finally, if no new request is present, the token either expires (3B, 7B) or is safely cleared. The cell returns to the idle state once neither a request nor a valid token is present (5).

To mitigate this, a modified version of Type 0 can be considered, where rqo is computed directly from the input requests instead of the grant signals. This change avoids ambiguity during rerouting and eliminates glitches caused by mutual exclusion logic. However, it merely shifts the problem to another corner case: if a new request arrives during the clearing sequence of a previous transaction, a glitch on rqo can still be generated. Since the cell operates fully asynchronously, and no prediction or gating of incoming requests is possible at this stage, the modified scheme does not entirely eliminate race conditions; it only alters their timing and manifestation.

## 3.4.3 Arbitration Cell Type II: Framed Request Gating

To robustly suppress glitches and race conditions caused by asynchronous request activity during token clearing, the arbitration cell Type II introduces a pre-arbitration stage that gates incoming requests based on the presence of the token. The raw request signals (req) are first converted into *framed requests* (freq), which are latched only when the arbitration cell is in a safe state-specifically, when no valid acknowledge token is currently held. This ensures that only well-timed and arbitration-safe requests participate in the grant selection process.

This design choice addresses a critical limitation observed in Type 0 cells, where a request received during token clearing could corrupt the arbitration state or generate glitches on rqo. In Type II, because freq cannot change while the token is active, the output signal rqo can be safely generated as a logic sum of freq lines. This guarantees that rqo reflects only valid, processed requests and not transient or misaligned activity. Consequently, the request output remains stable during rerouting or handoff, eliminating the risk of spurious token propagation or arbitration locking.

An alternative approach, generating rqo directly from the internal grant signals (gnt), was also considered. While grants are mutually exclusive and token-aware, this method suffers from the same glitch behavior seen in Type 0 cells. During token handover, the Seitz arbiter enforces a temporary drop in all gnt signals, leading to a momentary deassertion of rqo. If this occurs while the next arbitration stage is evaluating the state, it can cause premature token withdrawal or instabilities in downstream logic. For this reason, generating rqo from gnt is not sufficient to guarantee safe operation, and freq-based gating is preferred.
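The difference between deriving rqo from the grants and from framed requests can be illustrated on a short, hypothetical sampled trace of a token handover, in which channel 0 finishes, channel 1 is waiting, and the Seitz arbiter holds both grants low for two samples while it switches. The traces are invented for illustration and are not taken from Figure 3.9; the function names are hypothetical.

```python
def rqo_from_grants(gnt_trace):
    """rqo as the logic sum of the grant signals (Type 0 style)."""
    return [int(g0 or g1) for g0, g1 in gnt_trace]


def rqo_from_framed(req_trace, token_trace, freq_init):
    """Type II: framed requests track raw requests only while no token is
    held and are frozen otherwise; rqo is the logic sum of the freq lines."""
    freq = list(freq_init)
    out = []
    for req, token in zip(req_trace, token_trace):
        if not token:                # safe state: update the framed requests
            freq = list(req)
        out.append(int(any(freq)))
    return out


# Hypothetical handover: channel 0 finishes while channel 1 is waiting.
gnt_trace = [(1, 0), (0, 0), (0, 0), (0, 1)]   # both grants low during switch-over
req_trace = [(1, 1), (0, 1), (0, 1), (0, 1)]
token_trace = [1, 1, 0, 1]

print(rqo_from_grants(gnt_trace))                       # [1, 0, 0, 1] glitch
print(rqo_from_framed(req_trace, token_trace, (1, 1)))  # [1, 1, 1, 1] stable
```

The same gating has a side effect: a request that rises while a token is held, e.g. `rqo_from_framed([(0, 0), (0, 1)], [1, 1], (0, 0))`, leaves rqo at `[0, 0]` until the token is withdrawn.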

The internal structure of the Type II arbitration cell, including the framing logic and Seitz arbiter, is shown in Figure 3.8. The timing behavior under various request relationships is illustrated in Figure 3.9.

While Type II cells significantly enhance robustness, they introduce one limitation: if a request appears only after the token has already reached the top of the tree, it may not be processed immediately. This is because token entry depends on a request being present before or during the gating period. To avoid potential blocking, Type II cells should not be placed at the very root of the arbitration tree. Instead, it is recommended to use Type 0 cells near the top, where early token injection is most critical, and to reserve Type II cells for deeper layers of the hierarchy, where asynchronous request timing becomes more unpredictable.



Figure 3.8: Structure of arbitration cell Type II. Incoming requests are gated based on token presence to avoid glitches and races [17].



Figure 3.9: Waveforms from a tree composed entirely of Type II arbitration cells. The diagram illustrates cell behavior when multiple channels request readout, and a second request arrives either just before (dotted) or just after (solid) the acknowledge token.

## 3.4.4 Arbitration Cell Type I: Fairness-Oriented Design

In long readout chains operating under high-rate conditions, spatial proximity to the last-served channel may bias the arbitration process [119]. Without corrective mechanisms, channels located deeper in the tree or across separate branches may experience excessive delays or even starvation. To address this, arbitration cell Type I introduces a fairness-oriented structure that alters the sequence in which arbitration decisions are made.

Unlike previous cell types, Type I reverses the order of arbitration and pre-arbitration. The incoming asynchronous requests are first arbitrated among themselves to generate mutually exclusive internal signals, and only then are these signals gated with the acknowledge input. This reordering ensures that a request cannot be granted unless the arbitration cell is in a fully idle state-i.e., the token has been entirely withdrawn. As a result, the release of a token triggers a top-down chain reaction: each level of the tree must completely resolve and retract its state before the next token can be routed. This produces a deterministic and globally consistent token distribution path.

A key consequence of this design is that the request output signal rqo must be generated from the internal grant signals (gnt), rather than from the requests or framed requests. This is because only the grant signals represent requests that were both properly sequenced and successfully admitted into arbitration. Using gnt guarantees that rqo will not reflect speculative or out-of-phase activity, preserving the cell's strict handshake discipline.

As illustrated in Figure 3.10, the two-stage arbitration guarantees that tokens are never rerouted locally between channels at the same level. Even if two adjacent channels are the only active requesters, the token must be entirely withdrawn to the top of the arbitration tree before being redistributed to the next valid path. This mechanism eliminates spatial bias and promotes uniform service probability across the matrix.

This behavior is further demonstrated in Figure 3.11, which shows the timing when only two neighboring channels request readout. Despite their proximity, the token follows the full withdrawal protocol before switching direction, ensuring arbitration integrity is maintained.



Figure 3.10: Structure of arbitration cell Type I with fairness-enhancing logic. The two-stage arbitration prevents mid-token rerouting and enforces strict top-down token propagation [17].



Figure 3.11: Waveforms showing the operation of arbitration cell Type I when only two neighboring channels request readout. The token is fully withdrawn and redistributed from the top of the tree, preserving global fairness.

## 3.4.5 Arbitration Tree Organization

In the EDWARD architecture, arbitration cells are arranged in a binary tree that reflects the hierarchy of the pixel matrix. Each node handles request arbitration between two lower-level nodes or pixel groups, while the tree root coordinates final access to shared resources. The total tree depth determines the maximum number of arbitration stages a request must traverse and thus sets the worst-case token propagation delay between the root and a requesting channel, which contributes directly to the latency from hit detection to readout.

Arbitration cells are constructed using arbiter elements that differ in their polarity. This polarity defines how the internal logic interprets and generates control signals, ultimately determining the structure of the arbitration cell. Specifically, each arbiter is built with a particular type of SR latch and metastability filter, which together define its polarity and signal conventions.

#### P-type Arbiter (Positive Logic)

- Based on a NAND gate SR latch.
- Uses a **metastability filter** composed of inverters whose VDD (power supply) is gated by the output of the SR latch, forming a feedback loop. This ensures that only one inverter is fully powered at a time, helping to resolve metastable states cleanly.
- Signal conventions:
  - req is active-high,
  - acki (acknowledge input) is active-high,
  - ack outputs are active-low,
  - rqo (request output to the next stage) is active-low.

#### N-type Arbiter (Negative Logic)

- Based on a NOR gate SR latch.
- Uses a **metastability filter** where each inverter's VSS (ground) is controlled by the SR latch output. The cross-coupling helps the stronger side win during unstable transitions.
- Signal conventions:
  - req is active-low,
  - acki is active-low,
  - ack outputs are active-high,
  - rqo is active-high.

Depending on which arbiter type is used, arbitration cells inherit its polarity and must be connected accordingly. Because each arbiter inverts the polarity of the signals it processes, the arbitration tree must alternate between N-type and P-type stages, as shown in Figure 3.12. This alternation ensures that requests and acknowledges retain their intended logical meaning as they propagate through the tree. An invalid sequence, such as N-N or P-P, would invert the signal conventions and result in handshake mismatches.
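The alternation rule can be checked mechanically from the P-type and N-type signal conventions listed above. In this sketch, True means active-high, and a link is assumed valid when the child's rqo polarity matches the parent's req input and the parent's ack output polarity matches the child's acki input. The dictionary and function names are illustrative.

```python
# Signal conventions of the two arbiter types (True = active-high).
CONVENTIONS = {
    "P": {"req": True, "acki": True, "ack": False, "rqo": False},
    "N": {"req": False, "acki": False, "ack": True, "rqo": True},
}


def link_ok(child, parent):
    """A child stage's rqo drives the parent's req input, and the parent's
    ack output drives the child's acki input."""
    c, p = CONVENTIONS[child], CONVENTIONS[parent]
    return c["rqo"] == p["req"] and p["ack"] == c["acki"]


def chain_ok(levels):
    """levels: arbiter type per tree level, ordered from leaves to root."""
    return all(link_ok(levels[i], levels[i + 1]) for i in range(len(levels) - 1))


print(chain_ok("NPNP"))  # True: alternating polarity
print(chain_ok("NNP"))   # False: an N-N link breaks the handshake
```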

#### Arbitration Cell Types and Readout Behavior

In practice, the arbitration tree is built from a combination of cell types, depending on the desired trade-off between speed, stability, and fairness:

- **Type 0 cells** are placed near the top, where fast token propagation is important.
- **Type II cells** are used deeper in the tree to gate asynchronous requests and suppress timing-related glitches.
- **Type I cells** are introduced at deeper levels or in congested regions to guarantee fairness and prevent starvation.

However, neither Type II nor Type I cells should be placed at the root of the arbitration tree:

- **Type II** cells gate requests based on token presence. If a token arrives before any request, the cell may be blocked entirely, and the current token cannot propagate.
- **Type I** cells delay token withdrawal until requests are resolved and gated. If multiple requests arrive simultaneously, and rqo is generated from raw request inputs rather than internal grants, the token may become stuck until expiration, resulting in dead time and missed readout opportunities.

Therefore, only Type 0 cells are safe and reliable at the topmost level, where fast and unconditional token injection is required.

Figure 3.12: Examples of NP and PN stage connections in the arbitration tree. Alternating polarity ensures correct signal interpretation for req, acki, ack, and rqo [17].

#### Fairness vs. Local Rerouting

When only Type 0 and II cells are used, tokens are often rerouted locally upon request completion. While this minimizes average latency, it introduces spatial bias and may lead to starvation of distant branches. In contrast, Type I cells enforce full token withdrawal to the top before redistribution, guaranteeing global fairness at the cost of slightly increased arbitration latency.

Figures 3.13 and 3.14 compare both strategies under full-tree request load. These figures are rotated to span full pages for improved clarity of the waveforms.
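The difference between local rerouting and full withdrawal to the root can also be illustrated with an abstract service-order model. This is a hypothetical simplification, not a circuit simulation: token expiry and timing are ignored, the number of channels must be a power of two, a served channel requests again as soon as the token has moved on, and concurrent requests at a node are resolved by a seeded coin flip standing in for the physical race in the Seitz arbiter. With channels 0, 1, and 3 requesting continuously in a four-channel tree, greedy rerouting keeps the token inside the {0, 1} subtree.

```python
import random


def descend(lo, hi, pending, rng):
    """Route the token from the node covering channels [lo, hi) down to one
    pending channel; concurrent requests at a node are resolved by a coin
    flip standing in for the physical race inside the Seitz arbiter."""
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left, right = any(pending[lo:mid]), any(pending[mid:hi])
        go_left = (rng.random() < 0.5) if (left and right) else left
        lo, hi = (lo, mid) if go_left else (mid, hi)
    return lo


def next_greedy(last, pending, rng):
    """Local rerouting: climb from the last-served leaf to the lowest ancestor
    whose subtree still holds a pending request, then descend from there."""
    size = 2
    while size <= len(pending):
        lo = (last // size) * size
        if any(pending[lo:lo + size]):
            return descend(lo, lo + size, pending, rng)
        size *= 2
    return None


def simulate(n, requesters, steps, greedy, seed=1):
    """Count services per channel; n must be a power of two."""
    rng = random.Random(seed)
    pending = [ch in requesters for ch in range(n)]
    served = [0] * n
    last = None
    for _ in range(steps):
        if greedy and last is not None:
            ch = next_greedy(last, pending, rng)
        else:
            ch = descend(0, n, pending, rng) if any(pending) else None
        if ch is None:
            break
        served[ch] += 1
        pending[ch] = False                  # done clears the request
        if last is not None and last in requesters:
            pending[last] = True             # previous channel requests again
        last = ch
    return served


print("greedy:", simulate(4, {0, 1, 3}, 200, greedy=True))
print("fair:  ", simulate(4, {0, 1, 3}, 200, greedy=False))
```

In the greedy variant, channel 3 can be served at most once (only if it wins the very first root decision); in the fair variant, every service starts at the root, so channel 3 is selected whenever the root decision favors it, independently of where the token was last served.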

# 3.5 Synchronization Mechanism and Peripheral Operation

A key feature of the proposed architecture is the ability to coordinate asynchronous, event-driven readout within the pixel matrix with synchronous data transmission to external acquisition systems. This is achieved through a hybrid synchronization approach that combines locally self-timed logic with a global clock interface for output serialization. This section outlines the architectural concepts for token generation, data latching, output bus management, and serialization.



D. Górni Pixel Radiation Detectors with In-Situ Signal Processing and Event-Triggered, Optimized Readout



## 3.5.1 Global Clock and Acknowledge Token Relationship

The architecture employs a low-frequency global clock to periodically generate a synchronization signal, referred to as an *acknowledge token*. This token acts as a permission pulse that traverses the arbitration tree, enabling one pixel at a time to access the shared output buses. In contrast to fully synchronous architectures, the acknowledge token serves only as a global trigger that enables local, self-timed activity within the pixel matrix; no clock is distributed to the pixels themselves.

Arbitration logic ensures that the token propagates only along a valid request path. If no request exists at the time of token injection, the token is discarded without any effect. Because at most one channel holds the token at any time, this strategy prevents bus contention and maintains signal integrity across the matrix.

The block-level implementation of this synchronization logic is illustrated in Figure 3.15, where the global clock is divided by the bus width factor M to align the token window with the serializer's latch timing.



Figure 3.15: Peripheral synchronization block. A high-speed external clock is divided by a factor M, equal to the width of the data bus, to generate acknowledge tokens and trigger serializer latching [110].

## 3.5.2 Token Lifetime and Reuse

Tokens are reused as long as valid requests are present. When an active pixel completes its readout by asserting the done flag, the token is released and can be routed to the next eligible pixel. This results in low inter-event latency and efficient bus utilization, especially under bursty event conditions. Figure 3.2 depicts this reuse behavior in a multi-phase readout scenario.

The token lifetime in the pixel is governed by a local handshake and the pulse width of the clock used to generate it. The request (req) signal remains active until the readout process concludes, ensuring safe token routing and eliminating the possibility of premature token retraction.

## 3.5.3 Local vs. Global Synchronization

Within the pixel matrix, all logic operates asynchronously based on the occurrence of events. Each pixel generates its request and responds to acknowledge transitions independently, without relying on a global clock. This design reduces power consumption, simplifies routing, and eliminates timing skew issues within the matrix.

Conversely, the periphery operates synchronously. The same global clock used for token generation also triggers data latching and serialization, thereby creating a well-defined interface between the asynchronous core logic and the synchronous acquisition systems.

## 3.5.4 Valid Data Detection and Idle State Handling

To distinguish valid data from idle states, the shared digital bus is equipped with pull-up or pull-down resistors that define its default logic level. This ensures that when no pixel is actively driving the bus, its state is unambiguous and, provided that this default code is never used as a valid data word, cannot be misinterpreted as valid data.

The latch signal is precisely aligned with the start of the next token window, ensuring that latched values correspond to a stable and fully propagated readout state. This timing relationship guarantees that metastability is avoided at the synchronization boundary.

## 3.5.5 Data Serialization and Output Streaming



Figure 3.16: Conceptual serializer architecture compatible with EDWARD. It receives parallel data latched by the token clock and outputs a serial stream at high speed [110].

Once the parallel data from the selected pixel is latched, it is serialized and transmitted off-chip. A ring counter or shift-register-based multiplexer converts the parallel word (e.g., pixel address, timestamp) into a serialized bitstream, as illustrated in Figure 3.16.
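A shift-register serializer of this kind can be sketched in a few lines. The bit order (MSB first) is an assumption, since the text does not specify it, and serialize/deserialize are hypothetical helper names; each word occupies `width` fast-clock cycles, one bit per cycle.

```python
def serialize(word, width):
    """Shift the parallel word out one bit per fast-clock cycle, MSB first
    (assumed bit order)."""
    return [(word >> (width - 1 - i)) & 1 for i in range(width)]


def deserialize(bits):
    """Receiver side: shift the bits back into a parallel word."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word


print(serialize(0b10110, 5))                 # [1, 0, 1, 1, 0]

# Two consecutive 14-bit words (hypothetical values) form one continuous
# stream of 28 bits, i.e. 28 fast-clock cycles.
stream = [bit for w in (0x1234, 0x0ABC) for bit in serialize(w, 14)]
print(len(stream), deserialize(stream[:14]) == 0x1234)   # 28 True
```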

The serializer operates with a high-speed clock (e.g., 250 MHz), while the token and latch logic are synchronized to a divided clock (e.g., 17.86 MHz for a 14-bit data word). The resulting serialization window ensures deterministic data alignment and consistent throughput.
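The example frequencies follow from dividing the serializer clock $f_{\text{ser}}$ by the word width $M$. Assuming that one data word is latched per token window and one bit is shifted out per fast-clock cycle (the symbols $f_{\text{tok}}$ and $T_{\text{tok}}$ are introduced here for illustration), the token rate and token period are:

$$
f_{\text{tok}} = \frac{f_{\text{ser}}}{M} = \frac{250\ \text{MHz}}{14} \approx 17.86\ \text{MHz},
\qquad
T_{\text{tok}} = \frac{M}{f_{\text{ser}}} = \frac{14}{250\ \text{MHz}} = 56\ \text{ns}.
$$

Under these assumptions, a single serializer delivers at most about $17.86 \times 10^{6}$ words per second, which corresponds to the full 250 Mbit/s line rate when every token window carries a valid word.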

Waveforms of the serializer's output operation are shown in Figure 3.17.




## 3.5.6 Architectural Scalability

The architectural decoupling of asynchronous and synchronous domains supports modular scalability. The absence of a global clock in the pixel matrix minimizes power and routing overheads. Meanwhile, the output interface conforms to standard clocked DAQ system expectations.

The system supports multiple readout modes, including:

- Single-event capture,
- Multi-phase readout for extended metadata,
- Test mode for readout of diagnostic data.

## 3.5.7 Hierarchical Trees and Multi-Bus Topologies

To further improve scalability, the pixel matrix can be divided into smaller sub-arrays, each equipped with an independent arbitration tree and local data bus. This partitioning significantly reduces worst-case propagation delays, since the depth of each tree is logarithmic in the number of local channels.
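Because each arbitration cell merges two inputs, a tree covering $N$ channels needs $\lceil \log_2 N \rceil$ levels, which is what makes partitioning effective. The sketch below computes that depth; the 64 × 64 matrix and the split into 16 sub-arrays are hypothetical numbers chosen for illustration, not parameters of the chip.

```python
def tree_depth(n_channels):
    """Levels of two-input arbitration cells needed for n channels:
    ceil(log2(n)), computed exactly with integer arithmetic."""
    return (n_channels - 1).bit_length() if n_channels > 1 else 0


def partitioned_depth(n_channels, n_groups):
    """Depth of each local tree when the matrix is split evenly into
    n_groups independent sub-arrays (ceiling division for the group size)."""
    per_group = -(-n_channels // n_groups)
    return tree_depth(per_group)


print(tree_depth(64 * 64))             # 12 levels for a single tree
print(partitioned_depth(64 * 64, 16))  # 8 levels per 256-channel sub-array
```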

These local buses can either:

- connect to a shared high-speed serializer, or
- be routed independently to dedicated output ports.

Both options are illustrated in Figures 3.18 and 3.19.

These options provide significant design flexibility, allowing the output architecture to be tailored to application-specific requirements, such as regional readout, selective masking, or burst-rate optimization.



Figure 3.18: Sub-array grouping with a spatially distributed arbitration tree and a common serializer. This configuration balances area efficiency and output bandwidth [110].



Figure 3.19: Architecture with independent arbitration trees and separate output paths for each group. This configuration reduces arbitration latency and allows parallel readout [110].

# 3.6 3FI65P1 Chip

The **3FI65P1** (Full-Field Fluorescence Imaging 65 nm Prototype 1) is a prototype ASIC designed to demonstrate the feasibility of applying the EDWARD architecture to a real detector system [120]. Developed as a collaborative effort by the ASIC Team at BNL, the chip integrates analog front-end signal processing with asynchronous event-driven readout logic, enabling spectroscopic imaging at high throughput and low latency.

Its primary application is full-field X-ray Fluorescence (XRF) imaging for trace-element mapping in complex biological and material samples, such as those studied at synchrotron light sources. An exemplary experimental setup is shown in Figure 3.20. The architecture is designed for sparse event environments, where only a small fraction of pixels register hits per exposure frame.



Figure 3.20: Illustration of the full-field fluorescence imaging concept, where the sample is illuminated uniformly and the detector records the spatial and spectral content of the emitted X-rays [121].

The chip serves as both a scientific detector demonstrator and a validation platform for the EDWARD arbitration protocol, enabling performance evaluation under real operating conditions. The author was primarily responsible for implementing the asynchronous digital logic and the configuration/testability infrastructure, whereas the analog front-end and peripheral analog circuits were developed in close collaboration with the BNL analog design team.

## 3.6.1 Physical Structure and Pixel Matrix Organization

The 3FI65P1 chip implements a  $32 \times 32$  pixel matrix, totaling 1,024 active channels. Each pixel occupies an area of  $100 \, \mu \mathrm{m} \times 100 \, \mu \mathrm{m}$ , making the full matrix approximately  $3.2 \, \mathrm{mm} \times 3.2 \, \mathrm{mm}$  in size. To manage complexity and ensure modularity, the matrix is divided into  $4 \times 4$  groups of  $8 \times 8$  pixels each.

Each group contains shared control logic for configuration decoding, slow-control routing, analog signal multiplexing, and testing/calibration functionality. Pixel-level readout and arbitration among the pixels of a group are handled locally within each group by a distributed asynchronous arbitration tree. The digital data bus is 14 bits wide, allowing encoding of the full group and pixel address, and is shared by all groups.

A physical layout of an individual group is shown in Figure 3.21. Analog blocks and digital logic are physically separated, and a central interconnect region houses shared logic and transmission gate structures.



Figure 3.21: Layout of a single  $8 \times 8$  pixel group. Analog blocks and digital logic are separated spatially, and a central interconnect region contains shared resources, including transmission gates and logic.

The analog samples of each triggered pixel (the central pixel and its eight neighbors) are stored in a sample-and-hold array and passed through gated transmission stages to an analog buffer located in the periphery. The analog and digital outputs are coordinated through a multi-phase readout protocol to ensure coherent timing and data integrity.

The floorplan of the matrix is illustrated in Figure 3.22. Additional arbitration logic, stage buffering, and inter-group routing are placed in the space between the groups. All power, control, and I/O pads are located on a single edge of the die, below the visible matrix, which simplifies wire bonding and enables straightforward integration in multi-chip detector modules.



Figure 3.22: Top-level layout of the 3FI65P1 matrix. Orange circles indicate the locations of analog transmission gate blocks.

## 3.6.2 Analog Front-End Architecture

The AFE of 3FI65P1 was designed by the BNL analog team. It consists of a low-noise CSA, a semi-Gaussian shaper with pole-zero cancellation, and a discriminator [122]. Each hit is sampled by a time-of-extremum circuit and latched into a sample-and-hold structure.

To support charge-sharing reconstruction, every triggered pixel captures the analog values of its eight nearest neighbors. Together with the pixel's own sample, these values are stored in nine S/H elements per pixel and are multiplexed sequentially during readout.

The design employs a constant-current discharge mechanism in the CSA to provide predictable pulse shapes and facilitate pulse-height measurement. The analog blocks are optimized for low equivalent noise charge (ENC  $\leq 20\,\mathrm{e^-}$  rms), achieving an energy resolution below 200 eV Full Width at Half Maximum (FWHM) at 6 keV in silicon sensors.

The analog section uses thick-oxide transistors and is powered by an isolated 1.5 V domain. Digital logic is isolated from the substrate using deep N-well regions enclosing all digital blocks to minimize noise coupling into sensitive analog nodes.

## 3.6.3 Implementation of the EDWARD Readout in 3FI65P1

The EDWARD protocol connects all 1,024 pixels through a two-level asynchronous arbitration tree composed of Seitz RS-latch cells. At the pixel level, a request is triggered either by a local hit (via the peak detector) or by an external test control signal (shtr). A request is acknowledged by a token that is generated from the global readout clock and routed asynchronously through the tree structure.

Upon receiving an acknowledgment, each pixel executes either a single-phase or a multi-phase readout protocol, depending on the configuration. In normal operation, the digital data sent during each of these phases includes:

1. The 6-bit pixel address within the group,
2. The 8-bit group address (14 bits in total).
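Here is a minimal sketch of how a hit position could map onto the 14-bit word. Two choices are assumptions made for illustration: the row-major numbering of groups and pixels, and placing the group field in the upper 8 bits. The text only fixes the field widths. The names `encode` and `decode` are hypothetical.

```python
PIXEL_BITS = 6                     # 64 pixels per 8 x 8 group
GROUP_BITS = 8                     # group field width on the bus
BUS_BITS = PIXEL_BITS + GROUP_BITS # 14-bit digital data bus
GROUP_SIZE = 8                     # pixels per group side
GROUPS_PER_ROW = 4                 # 4 x 4 groups in the 32 x 32 matrix


def encode(row: int, col: int) -> int:
    """Map a matrix coordinate to a 14-bit readout word (assumed layout)."""
    if not (0 <= row < 32 and 0 <= col < 32):
        raise ValueError("coordinate outside the 32 x 32 matrix")
    group = (row // GROUP_SIZE) * GROUPS_PER_ROW + col // GROUP_SIZE
    pixel = (row % GROUP_SIZE) * GROUP_SIZE + col % GROUP_SIZE
    return (group << PIXEL_BITS) | pixel


def decode(word: int) -> tuple[int, int]:
    """Inverse of encode: recover (row, col) from a readout word."""
    group, pixel = word >> PIXEL_BITS, word & ((1 << PIXEL_BITS) - 1)
    row = (group // GROUPS_PER_ROW) * GROUP_SIZE + pixel // GROUP_SIZE
    col = (group % GROUPS_PER_ROW) * GROUP_SIZE + pixel % GROUP_SIZE
    return row, col


print(encode(31, 31), decode(encode(31, 31)))
```

With 16 groups, only 4 of the 8 group-address bits are exercised in this sketch.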

The pixel also activates its S/H transmission gate and connects the appropriate analog sample to the shared analog bus, which is buffered and sent off-chip. The pixel-level readout phaser logic controls the analog output sequence (central pixel followed by neighbors).

In test mode, the analog path can be sampled manually to obtain the baseline voltage. In this mode, a group-level counter can also accumulate discriminator crossings. The counter value is read out using a two-phase protocol that outputs the full address in the first phase and the counter value in the second.

The block diagram of the configuration and control infrastructure is shown in Figure 3.23. Each group includes a Serial-Parallel Bus (SPB) controller that distributes configuration data received via the Inter-Integrated Circuit (I<sup>2</sup>C) interface.

The asynchronous arbitration logic is physically implemented using custom dual-row Seitz arbiter cells integrated into the standard cell library, as illustrated in Figure 3.24. The layout is fully compatible with the 9T (nine-track cell height) standard cell library available in the technology files.

The arbitration tree alternates between positive- and negative-polarity arbiters to ensure symmetry and minimize systematic delay skew. All handshaking and request clearing occur without a global clock, consistent with the asynchronous philosophy of the EDWARD architecture. Only token generation is derived from the external serializer clock.
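A single two-input Seitz arbiter, which acts as a mutual-exclusion element, can be summarized in a small behavioral model. The model abstracts away the analog metastability filter. When both requests are pending and no grant is held, the resolution is modeled as a random choice. The class and method names are illustrative, not taken from the design.

```python
import random


class SeitzMutex:
    """Behavioural model of a two-input Seitz mutual-exclusion arbiter."""

    def __init__(self, seed=None):
        self.req = [False, False]
        self.grant = [False, False]
        self._rng = random.Random(seed)

    def update(self, req0: bool, req1: bool) -> tuple[bool, bool]:
        """Apply new request levels and return the settled (grant0, grant1)."""
        self.req = [req0, req1]
        # A grant is held until its own request is withdrawn (return-to-zero handshake).
        for side in (0, 1):
            if self.grant[side] and not self.req[side]:
                self.grant[side] = False
        # With no grant held, serve a pending request. When both are pending, the
        # metastability filter lets exactly one side win; modelled as a random pick.
        if not any(self.grant):
            pending = [side for side in (0, 1) if self.req[side]]
            if pending:
                self.grant[self._rng.choice(pending)] = True
        return self.grant[0], self.grant[1]


arb = SeitzMutex(seed=7)
print(arb.update(True, True))    # simultaneous requests: exactly one grant
```

In a tree, each node's combined request (req0 or req1) would feed its parent arbiter, and the parent's grant would gate the child's grants. That composition is omitted here.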



Figure 3.23: Custom slow control interface based on I<sup>2</sup>C and internal SPB. Each group contains an SPB entry FSM that routes data to configuration latches.



Figure 3.24: Custom dual-row Seitz arbitration cell implemented in a 65 nm CMOS process and integrated into the standard cell library for hierarchical synthesis.

## 3.6.4 Configuration and Testability Infrastructure

The **3FI65P1** ASIC integrates a dedicated **CTR** platform to facilitate dynamic control, in-system calibration, and debugging of the pixel matrix [123]. This infrastructure supports both fine-grained per-pixel configurability and full-chip management, while minimizing area overhead and avoiding complex serial shifting mechanisms. It includes three main components: a standard I<sup>2</sup>C interface, a proprietary SPB, and an array of distributed configuration latches and test logic.

#### Configuration Hierarchy.

The pixel matrix is hierarchically organized into 16 groups of  $8 \times 8$  pixels. Each group contains:

- 24 configuration bits per pixel and per group, organized into three banks of 8-bit memory latches.
- Group-level configuration logic, including address decoders, control FSMs, and tristate bus interfaces.

The configuration space includes operational modes (normal/test), pixel enable/disable flags, readout behavior (e.g., number of readout phases), and analog-block parameters.

#### I<sup>2</sup>C and SPB Interfaces.

The chip receives configuration commands over a standard I<sup>2</sup>C interface located at the periphery, with a serial data line (SDA) and a serial clock line (SCL). This interface allows any individual pixel or peripheral block to be addressed via a predefined sequence. To avoid long shift registers and allow low-latency access, the I<sup>2</sup>C data is converted internally into a parallel format using an **SPB**. The SPB uses:

- An 8-bit config\_bus for parallel data transfer,
- A config\_strobe signal derived from the I<sup>2</sup>C clock that clocks the internal FSMs,
- A config\_sync signal used as an indicator of a new transaction that resets internal FSMs.

Each group contains an SPB entry block with a state machine that decodes the incoming address, enables the correct pixel, selects the target latch bank, and handles both write and readback operations.
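The sketch below models one group's SPB entry FSM to make the transaction flow concrete. The byte framing it assumes is hypothetical and chosen only for illustration: group address, then pixel address, then bank select, then data. The real SPB field layout, the readback path, and broadcast handling are not reproduced. The class and the `write` helper are likewise hypothetical.

```python
class SpbEntryFsm:
    """Hypothetical model of one group's SPB entry state machine (write path only)."""

    def __init__(self, group_id: int) -> None:
        self.group_id = group_id
        self.latches: dict[tuple[int, int], int] = {}  # (pixel, bank) -> 8-bit value
        self.config_sync()

    def config_sync(self) -> None:
        """New transaction: reset the FSM so it expects a group address."""
        self.state = "GROUP"
        self.selected = False
        self.pixel = 0
        self.bank = 0

    def config_strobe(self, config_bus: int) -> None:
        """Consume one byte from the 8-bit config_bus on each strobe."""
        byte = config_bus & 0xFF
        if self.state == "GROUP":
            self.selected = byte == self.group_id   # only the addressed group responds
            self.state = "PIXEL"
        elif self.state == "PIXEL":
            self.pixel = byte
            self.state = "BANK"
        elif self.state == "BANK":
            self.bank = byte                        # one of the three 8-bit banks
            self.state = "DATA"
        elif self.state == "DATA":
            if self.selected:
                self.latches[(self.pixel, self.bank)] = byte
            self.state = "DONE"                     # ignore bytes until the next sync


def write(groups, group, pixel, bank, value):
    """Present one hypothetical write transaction to every group's entry FSM."""
    for fsm in groups:
        fsm.config_sync()
        for byte in (group, pixel, bank, value):
            fsm.config_strobe(byte)


groups = [SpbEntryFsm(g) for g in range(16)]
write(groups, group=5, pixel=12, bank=2, value=0xA5)
```

Because all groups see the same bus and only the addressed one latches the data, no per-group select wiring is needed in this model.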

#### **Configuration Latches.**

Configuration values are stored in latches rather than flip-flops to minimize area and static power consumption. These latches are located within the digital area of the group and directly drive logic such as the discriminator mask, counter mode selector, readout phase settings, and analog setting lines. No shadow registers are used: data written to the configuration banks propagates immediately to the functional logic.

#### Peripheral Configuration.

In addition to pixels, the I<sup>2</sup>C/SPB system supports configuration of peripheral analog and digital blocks, such as:

- Digital-to-Analog Converter (DAC) bias settings for current sinks and reference voltages,
- Serializer driver strength and clock polarity,
- Analog output buffer biasing,
- Clock receiver parameters.

These are mapped into a reserved group and pixel address range and treated by the SPB Finite State Machine (FSM) as "virtual pixels."

#### **Broadcast and Synchronization.**

The configuration system supports global broadcast commands via the upper bits of the address field. For example, an I<sup>2</sup>C transaction with a broadcast flag can trigger resynchronization of the serializer or reset of the I<sup>2</sup>C FSMs. This mechanism ensures robust operation in the event of a communication error or power-on initialization.

Overall, the CTR platform in 3FI65P1 enables precise and flexible control over the behavior of every pixel and peripheral function, facilitating not only routine data acquisition but also detailed detector calibration and commissioning procedures. The Hardware Description Language (HDL) code structure, which also reflects the design hierarchy, is presented in Figure 3.25.

## 3.6.5 Peripheral Circuitry and Support Blocks

To enable stable and precise operation of both the analog and digital domains in the 3FI65P1 ASIC, several **dedicated peripheral blocks** were integrated around the pixel matrix. These circuits offer global services, including power-on reset, bias generation, voltage reference stabilization, clock reception, and output driving. Their design follows the principle of functional isolation: analog and digital circuits are supplied by separate voltage domains and have separate ground networks to minimize noise coupling.

#### **Power-on Reset**

A global **Power-On Reset (POR)** block ensures the correct startup state of configuration latches and internal finite-state machines. The POR monitors supply voltage levels and triggers a global reset once all voltages are within the valid operating range. It also gates the serializer reset and the I<sup>2</sup>C FSMs to prevent undefined behavior during initial power ramp-up.



Figure 3.25: Structure of the developed CTR platform code. The color of the text reflects the main functionality of a given block: black indicates readout, orange indicates configuration, and green indicates testability.

#### **Bandgap Reference**

A dedicated **bandgap voltage reference** provides a temperature-stable reference voltage of approximately 1.2 V, used to bias analog blocks such as the CSA and shaping filter. It feeds multiple DACs and analog comparators and is available off-chip via a test pad for monitoring or override.

#### **Temperature Sensor**

An on-chip **analog temperature sensor** is included to track die temperature during operation. This supports temperature compensation strategies for threshold and bias tuning.

#### **Digital-to-Analog Converters**

A bank of **current-steering DACs** generates precise bias currents and voltages for the analog front-end and output drivers. Their key characteristics include:

- Resolution of 6-8 bits, configured through the I<sup>2</sup>C-SPB interface.
- Default values are hardcoded in the configuration map to ensure safe startup biasing.

#### **Bias Blocks**

Current mirrors and voltage followers distribute DAC outputs to analog blocks with local buffering and decoupling.

#### **Clock Reception**

Two differential clock domains are implemented using dedicated Current-Mode Logic (CML) receivers:

- The primary serializer clock (nominal 250 MHz) drives the digital serializer and token generator.
- A secondary, slower clock (typically 250/14 MHz ≈ 17.86 MHz) serves as a backup token generator. It also makes it possible to test the EDWARD system with different duty cycles of the slow clock.

Both receivers include Duty-Cycle Correction (DCC) blocks to compensate for asymmetry in clock waveforms. Bias current and DCC behavior are configurable via periphery registers.

#### **Digital Output Drivers**

The serialized digital output is driven by a Current-Mode Logic (CML) transmitter that supports both 50  $\Omega$  pull-up (standard CML mode) and high-impedance Voltage-Mode (VM) output configurations. The output driver:

- Includes pre-emphasis and programmable tap weighting for signal integrity across long transmission lines.
- Can be synchronized to the input clock phase or inverted using configuration bits.
- Supports adjustable main, pre-, and post-tap coefficients.
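Pre-emphasis with main, pre-, and post-tap coefficients is commonly described as a three-tap feed-forward equalizer (FFE). The sketch below shows that generic formulation on NRZ symbols (±1). The coefficient values and the sign convention are illustrative assumptions, not the chip's register settings.

```python
def ffe_3tap(bits, c_pre, c_main, c_post):
    """Three-tap FFE: y[n] = c_pre*x[n+1] + c_main*x[n] + c_post*x[n-1]."""
    x = [1.0 if b else -1.0 for b in bits]            # NRZ symbols
    y = []
    for n in range(len(x)):
        nxt = x[n + 1] if n + 1 < len(x) else 0.0     # pre-cursor (next symbol)
        prv = x[n - 1] if n > 0 else 0.0              # post-cursor (previous symbol)
        y.append(c_pre * nxt + c_main * x[n] + c_post * prv)
    return y


# Negative side taps boost transitions relative to runs of identical bits.
levels = ffe_3tap([0, 0, 0, 1, 1, 1], c_pre=-0.1, c_main=0.7, c_post=-0.2)
print([round(v, 3) for v in levels])
```

With these example coefficients, the first bit after the 0-to-1 transition is driven at 0.8, while the steady-state high level is 0.4. This pre-compensates the low-pass loss of a long transmission line.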

#### **Analog Output Buffer**

The analog S/H signals from a selected pixel are routed through an **analog buffer** chain to the off-chip output pads. The buffer provides:

- Low output impedance.
- Bias current and baseline configuration via SPB-controlled DACs.

A pair of differential output pads is reserved for the analog stream, and its phase alignment with the serializer is documented to allow precise correlation of digital address and analog amplitude data.

#### **Padframe and Power Domains**

The peripheral circuitry is powered by three separate supply rails:

- VDDA\_FE: 1.5 V for the CSA and low-noise analog stages.
- VDDA: 2.5 V for buffers, analog biasing, and output drivers.
- VDD: 1.2 V for digital logic and serializers.

All grounds (VSSA and VSS) are internally separated and shorted externally on the Printed Circuit Board (PCB).

These peripheral support circuits are essential for the stable and predictable operation of the 3FI65P1 chip under varying conditions, enabling the analog and digital systems to operate independently yet coherently within the asynchronous, event-driven framework. A simplified block diagram of the CTR structure across the chip is shown in Figure 3.26.



Figure 3.26: Simplified block diagram of the 3FI65P1 showing the elements of the CTR platform [120].

# 3.7 EDWARD65P1 Chip

## 3.7.1 Motivation

The development of the **EDWARD65P1** ASIC was motivated by the need to validate the performance of the event-driven readout architecture, EDWARD, in a controlled, fully digital environment [120]. While the earlier **3FI65P1** chip incorporated a full analog front-end and was designed for scientific imaging with real sensor inputs, it was poorly suited to isolating and quantifying the intrinsic performance of the arbitration tree and the event-driven readout protocol.

To perform a systematic investigation of the readout behavior under varying load conditions and ensure that architectural decisions regarding arbitration and token reuse achieve their intended performance benefits, a dedicated test chip was required. The **EDWARD65P1** chip was thus designed as a digital-only derivative of 3FI65P1, retaining its core digital structures, including the CTR platform, asynchronous arbitration tree, and serializer, but replacing the analog front-end with programmable *event generators*.

These event generators emulate radiation hits using a pseudo-random pulse generation mechanism, producing readout requests according to a Poisson process with user-configurable mean rate. This architecture enables a wide range of test scenarios, from sparse to saturated conditions, allowing detailed measurements of key metrics such as:

- Readout latency distribution under low, medium, and high event rates,
- Arbitration fairness and avoidance of pixel starvation,
- System throughput and bandwidth saturation behavior,
- Token reuse efficiency and synchronization robustness.

The motivation for this work extends beyond functional verification. EDWARD65P1 serves as a precise benchmarking platform for future extensions of the EDWARD architecture to larger matrices and different sensor technologies. It also provides empirical support for the central hypothesis of this dissertation: that non-prioritized, asynchronous, event-driven readout architectures can outperform traditional frame-based and polling-based schemes in terms of latency, throughput, and fairness.

## 3.7.2 Architecture Overview

The **EDWARD65P1** ASIC was developed as a digital-only prototype to isolate and evaluate the core performance of the EDWARD readout architecture, independently of analog front-end behavior. Architecturally, it is derived from the **3FI65P1** chip, reusing its configuration, arbitration, and serialization infrastructure, while replacing the analog signal chain with digital event generators embedded in each pixel.

The chip consists of a  $32 \times 32$  matrix of pixels with a  $100 \,\mu\mathrm{m}$  pitch, each capable of generating asynchronous readout requests at controlled rates. These requests are resolved by a binary arbitration tree constructed from RS-latch-based Seitz arbiters. Their metastability filters ensure that only clean, mutually exclusive grants reach the output path, with no fixed priority among pixels. Arbitration is entirely event-driven: no global strobe or framing mechanism is required.

Each event generated by a pixel is handled by the asynchronous tree, serialized, and transmitted off-chip through a high-speed data path clocked at 250 MHz. The serialization logic divides the data clock by 14, resulting in a token injection rate of approximately 17.86 MHz. This ensures the architecture remains compatible with typical data acquisition systems while preserving the low-latency advantages of event-driven readout.

Pixel configurations, including event generator parameters, are programmed via the **CTR** infrastructure. This employs an I<sup>2</sup>C interface coupled with a SPB protocol to allow full control over individual pixels and global system settings. Overall, the EDWARD65P1 structure forms a minimal, deterministic testbed to investigate arbitration fairness, latency, throughput scaling, and saturation dynamics across a wide range of synthetic input conditions.

## 3.7.3 Poisson-Distribution Event Generator

At the heart of the EDWARD65P1 chip lies a per-pixel hardware generator that produces readout requests emulating a Poisson process. This approach provides a statistically realistic and tunable stimulus for evaluating the event-driven arbitration network under controlled, repeatable conditions. The idea originates from [124].

#### Theoretical Basis.

The timing of events from a radioactive source is well modeled by a *homogeneous Poisson process* [125], in which the inter-arrival times between events follow an exponential distribution with rate parameter  $\lambda$ :

$$f(t;\lambda) = \lambda e^{-\lambda t}, \quad t \ge 0. \tag{3.1}$$

Here,  $f(t; \lambda)$  denotes the probability density function of the exponential distribution, describing the likelihood of observing an inter-arrival time of length t given the rate parameter  $\lambda$ .

In a digital system, exact sampling from the exponential distribution can be achieved using inverse transform sampling:

$$t = -\frac{1}{\lambda}\ln(U),\tag{3.2}$$

where U is a random variable uniformly distributed on (0, 1]. However, this method requires evaluating a logarithm and a division by  $\lambda$  for every event. This makes it computationally expensive and unsuitable for low-resource, high-speed ASICs.
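For reference, Eq. (3.2) is straightforward in software. The sketch below makes the per-sample cost explicit: every inter-arrival time needs a logarithm and a division. It uses Python's standard `random` and `math` modules, and λ = 2 is an arbitrary example value.

```python
import math
import random


def exponential_interarrival(lam: float, rng: random.Random) -> float:
    """Inverse transform sampling of Eq. (3.2): t = -ln(U) / lambda."""
    u = 1.0 - rng.random()        # rng.random() is in [0, 1), so u is in (0, 1]
    return -math.log(u) / lam


rng = random.Random(2024)
samples = [exponential_interarrival(2.0, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)   # should be close to 1 / lambda = 0.5
print(f"sample mean: {mean:.4f}")
```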

#### **Clock Generator Architecture.**

Each pixel in EDWARD65P1 contains an autonomous clock generator that drives its local event generation logic. This clock generator is designed for flexibility, compactness, and safe asynchronous control. Its core is a **ring oscillator** constructed from three inverter stages, each implemented using **Schmitt-trigger inverters**. The Schmitt triggers enhance the oscillator's robustness by introducing hysteresis and improving the signal swing, especially at low frequencies. This ensures reliable logic-level transitions, reduces jitter, and allows operation across a wide frequency range.

To make the oscillator frequency tunable, each inverter stage includes a configurable capacitive load. A 4-bit control word defines the load value, which is applied identically to all three stages. By adjusting the total capacitive loading, the oscillation frequency can be slowed down or sped up. In practical terms, this 4-bit code maps to 16 discrete frequency settings, which are further processed by the programmable divider stages downstream in the generator pipeline.

A key feature of the clock generator is its ability to be **paused or resumed asynchronously** without glitches or metastability. This is critical for testbench synchronization, global enable control, or power-saving scenarios. To enable this feature, the clock start-stop logic incorporates a compact asynchronous **arbiter circuit** and a **C-element** (Muller gate). The arbiter ensures mutually exclusive access to the enable path, while the C-element safely holds the oscillator state until all participating signals agree to transition. This provides a clean and race-free transition between active and paused states, which is crucial in asynchronous environments like EDWARD. The block diagram of the clock generator is shown in Figure 3.27.
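The C-element mentioned above holds its state until its inputs agree, which has a compact behavioral definition. The output follows the inputs when they all agree and otherwise keeps its previous value. The sketch below is a generic Muller C-element model, not the transistor-level cell used in the pixel.

```python
def c_element(inputs: list[int], previous: int) -> int:
    """Muller C-element: switch only when all inputs agree, otherwise hold state."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return previous


def run(sequence: list[list[int]], initial: int = 0) -> list[int]:
    """Apply a sequence of input vectors and record the output after each one."""
    out, state = [], initial
    for vector in sequence:
        state = c_element(vector, state)
        out.append(state)
    return out


# First input rises, the second catches up, they disagree again, then both fall.
trace = run([[1, 0], [1, 1], [0, 1], [0, 0]])
print(trace)
```

The output changes only at the second and fourth steps, when both inputs agree. This hold-until-agreement behavior is what prevents a half-completed start or stop request from producing a runt clock pulse.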

Altogether, this clock generator design enables **per-pixel autonomous, tunable, and interruptible event sources**, each capable of producing Poisson-like stimuli with adjustable rates and safe runtime control. Its low-resource implementation ensures scalability to large pixel arrays while preserving statistical fidelity and architectural modularity.



Figure 3.27: Clock generator block diagram.

#### **Clock Divider Configuration and Operation.**

To enable precise per-pixel control over the event generation rate, EDWARD65P1 implements a flexible **clock divider** circuit that operates on the output of the local ring oscillator. The divider is configured using two 4-bit control fields: conf\_divider and conf\_div\_mult, which are used together to form the final division factor. Internally, the 16-bit division value is computed as a **binary left shift**:

$$\text{divider} = \text{conf\_divider} \ll \text{conf\_div\_mult}. \tag{3.3}$$

This formulation allows both linear and exponential tuning within a single stage: the conf\_divider value sets the base count, while conf\_div\_mult applies a power-of-two multiplier. Eight configuration bits thus span division factors from 1 up to  $15 \times 2^{15} = 491{,}520$ , a wide range encoded very compactly.

The divider is designed to handle all edge cases (division by 0, by 1, by 2, and by other even and odd values) via specialized submodules. For instance, if the divider is set to 0, the output is held low; if it is set to 1, the oscillator signal passes through unchanged. For a divider of 2, a single flip-flop produces a divide-by-2 clock.

For higher values, the output is selected dynamically between **even-division logic** (based on the rising edge counter) and **odd-division logic** (which combines rising and falling edge counters). These counters increment on opposite clock edges and reset at the programmed divider value, ensuring a nearly 50% duty cycle even in the odd case. The correct clock output is chosen via a dedicated multiplexer, driven by logic that identifies the divider mode.

This unified divider stage ensures glitch-free transitions and symmetric output behavior across all divider configurations. It produces a deterministic output signal that acts as the timing source for the Bernoulli trial module. This modular and scalable approach enables each pixel to independently control its event rate with minimal hardware overhead and a wide dynamic range.
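The sketch below is an idealized behavioral model of the divider, sampled once per input half-cycle. It covers Eq. (3.3) and the zero, pass-through, even, and odd cases. It collapses the fine and coarse stages into one counter pair, and the function names are illustrative.

```python
def divider_value(conf_divider: int, conf_div_mult: int) -> int:
    """Eq. (3.3): division factor from the two 4-bit configuration fields."""
    if not (0 <= conf_divider < 16 and 0 <= conf_div_mult < 16):
        raise ValueError("both fields are 4 bits wide")
    return conf_divider << conf_div_mult


def divided_clock(n: int, half_cycles: int) -> list[int]:
    """Ideal divide-by-n output, one sample per input half-cycle.

    Even h: input clock high (after a rising edge); odd h: after a falling edge.
    """
    if n == 0:
        return [0] * half_cycles                                      # held low
    if n == 1:
        return [1 if h % 2 == 0 else 0 for h in range(half_cycles)]   # pass-through
    cp = cn = n - 1   # rising- and falling-edge counters; both wrap to 0 on first edge
    out = []
    for h in range(half_cycles):
        if h % 2 == 0:
            cp = (cp + 1) % n
        else:
            cn = (cn + 1) % n
        if n % 2 == 0:
            out.append(1 if cp < n // 2 else 0)                # even: rising edges only
        else:
            half = (n - 1) // 2
            out.append(1 if (cp < half or cn < half) else 0)   # odd: OR of both phases
    return out


print(divider_value(15, 15))   # largest configurable division factor
print(divided_clock(3, 12))
```

In this idealized model, the odd case is high for exactly n out of every 2n half-cycles. The text's "nearly 50%" reflects real circuit non-idealities that the model ignores.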

#### Bernoulli Trial Approximation.

EDWARD65P1 uses a Bernoulli trial mechanism to approximate Poisson behavior. Time is discretized into small intervals (clock cycles), and at each interval a trial is executed to determine whether an event occurs. This results in a binomial process, which converges to a Poisson distribution in the limit of small p and high trial frequency:

$$\text{Binomial}(n, p) \longrightarrow \text{Poisson}(\lambda) \quad \text{as } n \to \infty,\; p \to 0,\; np = \lambda. \tag{3.4}$$

Here, n represents the number of independent trials (time intervals), and p is the probability of an event occurring in a single trial. The product np is the expected number of events over the n trials, and it is held equal to the Poisson mean  $\lambda$  while the limit is taken.
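The convergence in Eq. (3.4) can be checked numerically. Running one Bernoulli trial per clock cycle with success probability p produces geometrically distributed gaps. Their tail $(1-p)^k$ approaches the exponential $e^{-pk}$ of Eq. (3.1) when p is small. The sketch uses Python's `random` module as a stand-in for the on-chip LFSR, and p = 1/32 is an arbitrary example.

```python
import math
import random


def bernoulli_gaps(p: float, cycles: int, seed: int = 1) -> list[int]:
    """Run one Bernoulli trial per clock cycle; return gaps (in cycles) between events."""
    rng = random.Random(seed)
    gaps, last = [], None
    for cycle in range(cycles):
        if rng.random() < p:
            if last is not None:
                gaps.append(cycle - last)
            last = cycle
    return gaps


p = 1 / 32
gaps = bernoulli_gaps(p, 300_000)
mean_gap = sum(gaps) / len(gaps)                     # close to 1 / p = 32 cycles
k = 64
tail_measured = sum(g > k for g in gaps) / len(gaps)
tail_geometric = (1 - p) ** k                        # exact for Bernoulli trials
tail_exponential = math.exp(-p * k)                  # continuous Poisson limit
print(mean_gap, tail_measured, tail_geometric, tail_exponential)
```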

Each trial involves generating a 32-bit uniformly distributed pseudo-random variable U (referred to below as the random number) using a barrel-shifted linear-feedback shift register (LFSR), and comparing it against a preconfigured bitmask. An event is registered if all bits of U selected by the mask are ones. The success probability p is given by:

$$p = 2^{-N_{\text{mask}}},\tag{3.5}$$

where  $N_{\text{mask}}$  is the number of required ones in the mask. The resulting event rate is:

$$\lambda = f_{\text{gen}} \times p = f_{\text{gen}} \times 2^{-N_{\text{mask}}},\tag{3.6}$$

where  $f_{\text{gen}}$  is the generator clock frequency, set independently for each pixel through a digitally controlled oscillator and programmable divider.
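Combining Eq. (3.6) with the divider of Eq. (3.3) gives the per-pixel event rate as a function of the configuration fields. The oscillator frequency in the example (100 MHz) is a hypothetical value chosen only to make the arithmetic concrete. It is not a measured property of EDWARD65P1.

```python
def event_rate_hz(f_osc_hz: float, conf_divider: int, conf_div_mult: int,
                  n_mask: int) -> float:
    """Mean event rate lambda = (f_osc / divider) * 2**(-n_mask)."""
    divider = conf_divider << conf_div_mult          # Eq. (3.3)
    if divider == 0:
        return 0.0                                   # divider 0 holds the clock low
    f_gen = f_osc_hz / divider                       # generator clock after division
    return f_gen * 2.0 ** (-n_mask)                  # Eq. (3.6)


F_OSC_HYPOTHETICAL = 100e6                           # assumed, for illustration only
rate = event_rate_hz(F_OSC_HYPOTHETICAL, conf_divider=5, conf_div_mult=2, n_mask=10)
print(f"{rate:.4f} Hz")
```

With these settings, the divider is 5 << 2 = 20 and the generator clock is 5 MHz. A 10-bit mask then gives λ = 5 MHz / 1024 ≈ 4.88 kHz.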

#### LFSR Design.

Each pixel in EDWARD65P1 contains a **97-bit maximum-length Fibonacci LFSR** designed to generate 32-bit uniformly distributed pseudo-random numbers in a single clock cycle. The register is based on the primitive polynomial:

$$P(x) = x^{97} + x^{91} + 1 \tag{3.7}$$

This polynomial defines the feedback logic of the shift register. Counting stages from the input end of the register, the new bit is the XOR of the outputs of stages 97 and 91, that is, of the bits delayed by 97 and 91 shifts. The generated sequence therefore obeys the recurrence  $a_n = a_{n-91} \oplus a_{n-97}$ . Because  $P(x)$  is primitive, this configuration guarantees a maximal-length sequence of period  $2^{97} - 1$ , covering the entire state space except the all-zero state, which ensures statistical decorrelation of outputs over time.

In a **Fibonacci LFSR**, also known as the "external XOR" configuration, the register bits are shifted linearly and the new input (seeded into the most significant bit) is computed as a modulo-2 sum (XOR) of selected tap bits. This contrasts with the **Galois configuration** (or internal XOR), in which the feedback is distributed across multiple bits during the shift operation. Galois LFSRs tend to be more efficient for hardware serialization (bit-wise generation). Fibonacci LFSRs, by contrast, are often preferred for *parallel* random word generation because of their conceptual simplicity and predictable tap propagation. Examples of both LFSRs are shown in Figure 3.28.



Figure 3.28: Example of 16-bit LFSRs: Fibonacci and Galois architectures with the same polynomial  $P(x) = x^{16} + x^{14} + x^{13} + x^{11} + 1$  [126].
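The two structures in Figure 3.28 can be compared in software. The sketch follows the widely used textbook 16-bit example for $x^{16} + x^{14} + x^{13} + x^{11} + 1$. The Fibonacci form XORs the tap bits into a single feedback bit, while the Galois form applies the tap mask 0xB400 inside the register. Both visit all $2^{16} - 1$ non-zero states. The seed 0xACE1 is arbitrary.

```python
def fibonacci_step(state: int) -> int:
    """External-XOR form: one feedback bit from taps 16, 14, 13, 11."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def galois_step(state: int) -> int:
    """Internal-XOR form: the output bit toggles the tap positions (mask 0xB400)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state


def period(step, seed: int = 0xACE1) -> int:
    """Number of steps until the register returns to its seed."""
    state, n = step(seed), 1
    while state != seed:
        state, n = step(state), n + 1
    return n


print(period(fibonacci_step), period(galois_step))
```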

#### **Parallel Generation Requirements.**

To generate a 32-bit random word in a single clock cycle, the LFSR must be *unrolled* and parallelized: rather than shifting one bit per clock, the circuit computes 32 new bits per cycle. This imposes a strict requirement on the selection of the LFSR polynomial:

- The lowest tap in the feedback polynomial must be at least as large as the desired output width (i.e., 32). Every bit of the new word is then an XOR of bits already in the register, never of bits generated in the same cycle, which ensures statistical independence across the output word.
- The feedback polynomial must be **primitive**, so that the register generates a maximal-length sequence of period  $2^n - 1$  (excluding the all-zero state).

For EDWARD65P1, a **97-bit** LFSR was selected, meeting both conditions: its lowest tap (91) exceeds 32, and  $x^{97} + x^{91} + 1$  is primitive. The chosen polynomial uses only two taps, minimizing the number of required XOR gates and reducing silicon area and dynamic power consumption. This design produces a period of  $2^{97} - 1$ , which is more than sufficient to avoid repetitions or correlations during practical operation.
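The unrolling argument can be verified with a small model. In this sketch, the register shifts toward its most significant bit, and each new bit is the XOR of the bits delayed by 97 and 91 shifts. This is one consistent orientation of Eq. (3.7). The silicon's bit ordering may be mirrored; the text describes taking the word from the high-order bits. Because the smaller lag (91) exceeds the 32-bit word width, all 32 new bits are XORs of existing register bits, so one parallel step equals 32 serial steps. The function names are illustrative.

```python
WIDTH = 97
MASK = (1 << WIDTH) - 1
WORD = 32


def step_serial(state: int) -> int:
    """One shift: new bit = bit 96 (lag 97) XOR bit 90 (lag 91), entering at bit 0."""
    new_bit = ((state >> 96) ^ (state >> 90)) & 1
    return ((state << 1) | new_bit) & MASK


def step_parallel(state: int) -> tuple[int, int]:
    """32 shifts at once; returns (next_state, 32-bit random word).

    Bit i of the word is bit (65 + i) XOR bit (59 + i) of the current state, so no
    output bit depends on another bit produced in the same cycle.
    """
    word = ((state >> (WIDTH - WORD)) ^ (state >> (WIDTH - WORD - 6))) & ((1 << WORD) - 1)
    return ((state << WORD) | word) & MASK, word


seed = (1 << 96) | 0x0123456789ABCDEF0123   # any non-zero 97-bit seed
state, u = step_parallel(seed)
print(hex(u))
```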

#### Hardware Efficiency.

The 32-bit output is extracted directly from the high-order bits of the 97-bit register after a single combinational update. The sparse tap configuration allows a clean implementation of the LFSR unrolling logic in ASIC standard cells. This balance between statistical quality, parallelism, and gate count makes the 97-bit Fibonacci LFSR a pragmatic choice for high-speed, low-resource random number generation in pixel-level digital logic.

#### Bernoulli Trial Mechanism.

The final stage of the event generator in each EDWARD65P1 pixel implements a **Bernoulli trial** to probabilistically determine whether a readout event should be generated in a given clock cycle. In probability theory, a Bernoulli trial is a random experiment with only two outcomes: success (event occurs) or failure (no event), with a fixed probability p of success.

The generator utilizes a uniform 32-bit random number U produced by the LFSR and applies a thresholding operation to accept only a fraction p of these numbers. In a naïve digital implementation, this would require checking whether U < T, where T is a constant threshold derived from p:

$$\text{Event generated if } U < T. \tag{3.8}$$

However, implementing such comparators in hardware incurs significant area and power costs, especially when replicated across hundreds or thousands of pixels. To minimize logic usage, the EDWARD65P1 design uses an alternative and more efficient technique: a **bitmask-based reduction test**.

#### Mask-Based Trial via Bitwise AND.

Instead of comparing against a threshold value, the design assigns a binary **mask** M with its N lowest bits set to one (i.e.,  $M=2^N-1$ ). The 32-bit random number U is then subjected to a bitwise AND operation with this mask:

$$\text{Event generated if } (U \,\&\, M) = M. \tag{3.9}$$

This logic checks whether the N least significant bits of U are all ones. Statistically, this partitions the  $2^{32}$  possible values of U into two disjoint sets:

- A small subset of values in which the N LSBs are all ones, which results in an event.
- The remaining majority, in which at least one bit in the masked region is zero, which results in no event.

The probability of generating an event under this scheme is:

$$p = \frac{1}{2^N},\tag{3.10}$$

which directly maps to the number of mask bits. For example, a mask with five ones (i.e., M=0x1F) yields p=1/32.
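As a quick cross-check of Eq. (3.10), the Python sketch below evaluates the mask test over every value of a reduced-width random word. It compares the result with the threshold comparison of Eq. (3.8), using $T = p \cdot 2^{W}$. The function names are illustrative. A 16-bit word stands in for the 32-bit one only to keep the exhaustive enumeration fast; the acceptance probability depends only on $N$.

```python
def mask_event(u: int, n_bits: int) -> bool:
    """Bernoulli trial of Eq. (3.9): event iff the n_bits LSBs of u are all ones."""
    m = (1 << n_bits) - 1
    return (u & m) == m


def threshold_event(u: int, n_bits: int, width: int) -> bool:
    """Comparator of Eq. (3.8) with the same probability: T = 2**width / 2**n_bits."""
    t = 1 << (width - n_bits)
    return u < t


W, N = 16, 5  # reduced word width for exhaustive enumeration; N = 5 gives M = 0x1F
hits_mask = sum(mask_event(u, N) for u in range(1 << W))
hits_cmp = sum(threshold_event(u, N, W) for u in range(1 << W))
print(hits_mask / (1 << W), hits_cmp / (1 << W))  # prints 0.03125 0.03125
```

The two tests accept different subsets of $U$, but both subsets have the same size. The mask version needs only an $N$-input AND gate instead of a 32-bit magnitude comparator.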

#### Hardware Advantages.

This approach brings several key advantages:

- No arithmetic comparison is required; only a bitwise AND and an equality check, which together reduce to an $N$-input AND of the masked bits.
- The mask width $N$ is configurable per pixel, enabling independent event rates.
- The masking operation reduces logic complexity and shortens the critical path of the gate-level implementation, improving timing closure in large arrays.

Functionally, this mask-based trial is **equivalent to partitioning the uniform random number space into two discrete subsets** and checking which subset the generated number falls into. The event subset contains exactly $2^{32-N}$ values, so the success probability is exactly $2^{-N}$ for a uniformly distributed $U$. The sequence of outcomes is also repeatable for a given initial LFSR state.

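Every generator clock cycle carries one (pseudo-)independent trial with success probability $p$. The number of cycles between events therefore follows a geometric distribution with mean $1/p$, the discrete counterpart of the exponential distribution. This efficient realization of the Bernoulli trial enables compact, deterministic emulation of exponential inter-arrival times in each pixel while preserving the statistical fidelity required to mimic a radioactive decay process.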
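A short Python simulation illustrates the argument. Applying one independent trial with $p = 2^{-N}$ on each generator clock yields geometrically distributed inter-arrival times with a mean of $2^N$ cycles. Two simplifying assumptions apply: Python's `random` module stands in for the LFSR, and the two-cycle minimum of the non-retriggerable output is not modeled.

```python
import random


def inter_arrival_times(n_bits: int, n_events: int, seed: int = 1) -> list:
    """Clock cycles between successive events of a per-cycle Bernoulli trial."""
    rng = random.Random(seed)
    mask = (1 << n_bits) - 1
    gaps, cycles = [], 0
    while len(gaps) < n_events:
        cycles += 1
        u = rng.getrandbits(32)   # stand-in for the 32-bit LFSR word
        if (u & mask) == mask:    # mask-based Bernoulli trial
            gaps.append(cycles)
            cycles = 0
    return gaps


gaps = inter_arrival_times(n_bits=5, n_events=20000)
mean = sum(gaps) / len(gaps)
print(f"mean gap = {mean:.2f} cycles (expected {2**5})")
# The survival function P(gap > k) = (1 - p)**k decays exponentially in k,
# which is why the event spacing approximates an exponential distribution.
```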

#### Implementation Details.

The generator comprises four key stages (its block diagram is shown in Figure 3.29):

1. A **Schmitt-trigger-based oscillator** with configurable RC loading defines the base frequency.
2. A **clock divider** (fine and coarse stages) divides this frequency by a ratio between 1 and 491,520.
3. A **97-bit LFSR** provides one 32-bit random number per cycle.
4. **Bernoulli trial logic** applies the bitmask test to generate the final pulse.
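Taken together, the four stages set the mean event rate. Assume each divided clock cycle carries exactly one Bernoulli trial. The rate is then $\lambda = f_{osc} / (D \cdot 2^N)$, where $D$ is the total division ratio. The helper below is a hypothetical illustration of that relation. The 100 MHz oscillator frequency in the usage lines is an arbitrary example, not a measured value.

```python
MAX_DIVISION = 491_520  # largest combined division ratio quoted in the text


def mean_event_rate(f_osc_hz: float, division: int, mask_bits: int) -> float:
    """Mean event rate in events/s, ASSUMING one Bernoulli trial per divided clock."""
    if not 1 <= division <= MAX_DIVISION:
        raise ValueError("division ratio outside the 1..491,520 range")
    return f_osc_hz / division / (1 << mask_bits)


# Example with an assumed 100 MHz oscillator:
print(mean_event_rate(100e6, 1, 5))             # 3125000.0 events/s
print(mean_event_rate(100e6, MAX_DIVISION, 5))  # slowest divider setting, N = 5
```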



Figure 3.29: Block diagram of the Poisson process emulator implemented in each pixel [127].

The minimum inter-arrival time is two generator clock cycles (for non-retriggerable output), and the output closely approximates exponential spacing for a wide range of  $\lambda$  values. Experimental testing confirms statistical alignment with theoretical expectations across available configurations.

This design allows each pixel to behave as an independently configurable radiation source, enabling fine-grained studies of pile-up, arbitration fairness, and throughput saturation in conditions not achievable using real radioactive sources.

#### **Event Source Selection.**

In addition to its flexible clock and divider configuration, each pixel in EDWARD65P1 includes a configurable **event source multiplexer**, enabling readout events to be triggered by various stimuli beyond the internal pulse generator. This feature enhances both testability and system versatility by supporting the following selectable sources:

- **Internal Pulse Generator:** The default mode, in which events are generated by the Poisson-like generator driven by the local oscillator and clock divider.
- **Clock Signal:** The raw oscillator output can be selected as the event trigger. This is especially useful for measuring the frequency of the local oscillator, which is otherwise not directly observable. By setting the clock divider to a high value and ensuring that the arbitration token generation rate exceeds the oscillator frequency, the observed event rate becomes a proxy for the oscillator speed.
- **Bump Pad Signal:** An external signal coupled through the bump pad can serve as a trigger source. This mode facilitates manual testing using a **probe station needle**, allowing precise control of individual pixel activation during development or debug.
- **Neighboring Pixel Output:** A pixel can be configured to accept its neighbor's output as its own input trigger. This enables the formation of readout **chains or cascades** among pixels.
- **External Pad Input:** An external pad signal can be used to initiate a cascade of readouts starting from the selected pixel.

The last two modes, neighbor chaining and external pad triggering, allow users to create **deterministic readout sweeps** across the pixel matrix that mimic frame-based acquisition. Although EDWARD65P1 is inherently an event-driven architecture, this chaining mechanism can emulate frame readout behavior. This makes it particularly valuable for **debugging and comparative benchmarking** of readout paths, serialization logic, and token timing. This configurability highlights the architecture's dual role as both a test platform and a fully asynchronous readout demonstrator.

# **Chapter 4**

# Simulation and Performance Evaluation: Digital and Mixed-Signal Analyses

Before fabricating the proposed event-driven readout architecture, a comprehensive SystemVerilog-based functional testbench was developed to verify the correctness, robustness, and timing behavior of the design. This environment supported verification of both **3FI65P1** and **EDWARD65P1** ASIC designs at the Register-Transfer Level (RTL), as well as signoff netlists with extracted parasitics and delays.

The primary purpose of the testbench was to validate the logic implementation of asynchronous arbitration, in-channel control flow, multi-phase readout, and data bus management. Simulation results from this environment provided crucial input for design iterations, synthesis optimization, and readiness for tape-out.

In addition to digital simulations, a series of transient analog simulations was performed for a fully implemented 8×8 pixel group, targeting verification of signal integrity and arbitration timing under realistic asynchronous switching conditions. These mixed-signal analyses, based on a transistor-level netlist with parasitic extraction, focus on measuring acknowledge propagation delay, redistribution latency, and setup/hold timing for reliable serializer operation. Results from these tests ensured that arbitration remained fair, glitch-free, and metastability-robust across the full operating range.

# 4.1 Digital Testbench Architecture Overview

The testbench is a fully self-contained SystemVerilog verification environment developed using standard simulation tools (Cadence Xcelium and AMS for mixed-signal cases). Its high-level structure is shown in Figure 4.1.

The **Device Under Test (DUT)** consists of the synthesized or behavioral model of the **group**, the **matrix**, or the matrix together with all peripheral digital blocks (**clock divider**, **I<sup>2</sup>C slave**, **serializer**), depending on the implementation stage. In the initial phase of verification, the testbench instantiates the synthesized netlist of a single pixel group multiple times to emulate the matrix hierarchy. The matrix-level logic, such as arbitration between groups and shared peripheral buses, is kept at the behavioral level. This approach enables modular debugging and eases synthesis constraint management. In the later stage, the entire matrix, comprising all groups, global arbitration, serializer, and peripheral interfaces, is instantiated as one unified DUT block. This transition reflects design readiness for full-chip synthesis and delay-aware verification.

Figure 4.1: Block diagram of the functional verification environment. Pixel stimuli and configuration settings are applied to the DUT, while outputs are compared against the golden model in a scoreboard.

The simulation **sequencer** is implemented directly within the top-level testbench module and orchestrates the complete verification flow through a structured series of test scenarios. It acts as the central controller for advancing simulation time, driving the DUT, and verifying functional correctness across multiple operating modes and configurations. Upon initialization, the sequencer asserts the global reset and clear signals, sets the initial state of the inputs, and enables the monitoring infrastructure. It then sequentially invokes a series of SystemVerilog tasks, each corresponding to one functional test. For each test, the result is checked against the scoreboard counters, and the Boolean pass/fail outcome is aggregated into a global test result flag.

The sequencer also allows flexible selection of the target group(s) through a predefined vector, and each test case is automatically repeated for all specified group indices. After every test, diagnostic messages report intermediate outcomes. The campaign concludes with a single success/failure summary based on the accumulated status.

In addition to executing tests, the top-level module controls clock generation. This is implemented in an always block with timing parameters chosen to replicate typical system timing behavior. Optional edge delays can be introduced to evaluate sensitivity to clock skew or to emulate slow rise/fall transitions. Altogether, the sequencer ensures reproducibility, structured coverage, and automation of the entire simulation suite. Its modular design allows easy extension with new test cases and facilitates regression testing by keeping the control logic self-contained within the testbench.
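The control flow of the sequencer can be summarized in a Python sketch. Every test task runs for every group index in the target vector, each Boolean outcome is ANDed into a global flag, and a summary is printed at the end. The real sequencer is written in SystemVerilog and also handles reset and clock generation, which are omitted here. The names `run_campaign`, `target_groups`, and the dummy tests are hypothetical.

```python
from typing import Callable, Dict, List

TestTask = Callable[[int], bool]  # takes a group index, returns pass/fail


def run_campaign(tests: Dict[str, TestTask], target_groups: List[int]) -> bool:
    """Run every test on every target group and aggregate a global pass flag."""
    all_passed = True
    for name, task in tests.items():
        for group in target_groups:
            passed = task(group)  # each test always runs, even after a failure
            print(f"[{name}] group {group}: {'PASS' if passed else 'FAIL'}")
            all_passed = all_passed and passed
    print("CAMPAIGN", "PASSED" if all_passed else "FAILED")
    return all_passed


# Hypothetical tasks standing in for the SystemVerilog test tasks.
tests = {
    "enable_test": lambda g: True,
    "force_test_v1": lambda g: g != 3,  # pretend group 3 fails
}
run_campaign(tests, target_groups=[0, 1, 3])
```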

The I<sup>2</sup>C driver interface emulates a complete master-side protocol controller for configuring the DUT through its serial configuration bus. It encapsulates both low-level signaling and high-level transaction sequences, enabling automated interaction with the configuration logic of the chip under test. At the protocol level, the driver implements the I<sup>2</sup>C-compliant start and stop conditions, byte-wise data transmission and reception, as well as acknowledge and non-acknowledge handling. All signal transitions are time-accurate and parameterized with respect to the clock period to emulate realistic I<sup>2</sup>C timing. Bit-level transitions on scl and sda are carefully controlled, and bus arbitration is modeled through weak pull-ups and tri-state conditions to reflect actual electrical behavior. For usability, the interface provides high-level tasks such as write\_config and read\_config, which accept address, group, and pixel identifiers along with configuration payloads. These tasks automatically package the information into I<sup>2</sup>C message frames, issue the sequence on the bus, and verify acknowledgment from the DUT. The driver supports broadcast addressing and non-blocking operation. It also enables readback verification by capturing response bytes from the DUT and passing them to the scoreboard for comparison with expected values. Internally, each I<sup>2</sup>C operation is performed as a structured message, allowing for reuse in multiple test scenarios and compatibility with randomized or directed test sequences. This modular and fully controllable I<sup>2</sup>C driver is essential for orchestrating configuration in simulation, mirroring the role of a real hardware controller in a physical testbench, and ensuring precise emulation of all slow-control interactions during testbench execution.
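At the byte level, the standard I<sup>2</sup>C framing that the driver reproduces is simple:

- The first byte carries the 7-bit target address followed by the R/W bit (0 for write).
- Every byte is sent MSB first.
- The receiver pulls SDA low on the ninth clock to acknowledge.

The Python sketch below builds the transmitted bytes and bits for a write. The payload layout (group, pixel, 24-bit data) mirrors the arguments of write\_config only for illustration. The chip address and byte order are assumptions, not the EDWARD65P1 register map.

```python
def byte_to_bits(value: int) -> list:
    """I2C transmits each byte MSB first."""
    return [(value >> i) & 1 for i in range(7, -1, -1)]


def address_byte(addr7: int, read: bool) -> int:
    """7-bit target address followed by the R/W bit (1 = read, 0 = write)."""
    return ((addr7 & 0x7F) << 1) | int(read)


def write_config_frame(addr7: int, group: int, pixel: int, data24: int) -> list:
    """HYPOTHETICAL write frame: address byte, group, pixel, 24-bit data MSB first.

    Returns the bytes in transmission order; on the bus, each byte is
    followed by an ACK bit driven by the DUT.
    """
    payload = [group & 0xFF, pixel & 0xFF,
               (data24 >> 16) & 0xFF, (data24 >> 8) & 0xFF, data24 & 0xFF]
    return [address_byte(addr7, read=False)] + payload


frame = write_config_frame(0x20, group=1, pixel=5, data24=0xABCDEF)
bits = [b for byte in frame for b in byte_to_bits(byte)]
print([hex(b) for b in frame])  # ['0x40', '0x1', '0x5', '0xab', '0xcd', '0xef']
```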

To suppress spurious warnings and false-positive violations during functional verification, a **short pulse filter** is implemented within the testbench. Its purpose is to mask constraint violations that arise due to the asynchronous nature of the readout request signal (rqo) relative to the acknowledge path (acki) within the arbitration tree. In a real asynchronous system, pulse width mismatches or glitches at the handshake interfaces are inherently tolerated by the circuit design; however, they can trigger width constraint warnings during simulation when the static timing analysis tools assume synchronous behavior. The short pulse filter operates by filtering transitions that do not persist for a minimum time threshold, effectively de-glitching the interface without modifying the functional path. This ensures that simulation logs remain focused on genuine design issues. The correctness of the real circuit behavior is verified separately in a dedicated **mixed-mode testbench** that includes detailed analog timing models. In that environment, the short pulse behavior is examined under various race conditions, propagation delays, and signal skews to ensure that no legitimate handshakes are suppressed and no unsafe behavior is introduced.
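The filtering rule can be sketched in Python as an inertial-delay-style filter over a list of recorded edges. A pulse shorter than the threshold is cancelled together with the edge that started it, so the signal keeps its previous level. This is a behavioral illustration only; the testbench's filter is written in SystemVerilog, and the function name, edge representation, and threshold here are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[float, int]  # (time, new logic level); levels alternate


def filter_short_pulses(edges: List[Edge], min_width: float) -> List[Edge]:
    """Remove every pulse narrower than min_width from an alternating edge list."""
    kept: List[Edge] = []
    for t, level in edges:
        if kept and t - kept[-1][0] < min_width:
            # The previous edge opened a pulse that closes too soon:
            # cancel both edges so the signal keeps its earlier level.
            kept.pop()
            continue
        kept.append((t, level))
    return kept


# rqo rises at 0 ns, glitches low for 1 ns at 10 ns, and falls at 30 ns.
edges = [(0.0, 1), (10.0, 0), (11.0, 1), (30.0, 0)]
print(filter_short_pulses(edges, min_width=5.0))  # [(0.0, 1), (30.0, 0)]
```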

The monitoring infrastructure within the testbench is composed of several dedicated modules responsible for observing key internal and peripheral signal behaviors without interfering with the DUT. These monitors operate independently, collecting timing and data-related information to support the verification of functional correctness and performance analysis.

The **analog monitor** observes the per-pixel asynchronous signals related to analog front-end activity. Specifically, it tracks the peak detection flag (pdf) and discriminator output (dco) from each pixel. Every rising edge of the pdf signal is timestamped using simulation time and recorded into a per-pixel queue. For the dco signal, the monitor counts the number of rising edges occurring during each shutter window, which corresponds to the analog charge integration cycles. These counts are stored per pixel and can later be retrieved by the scoreboard to verify consistency with the generated data and expected analog behavior.
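The per-pixel bookkeeping described above can be sketched in Python. Rising edges of pdf are appended to a per-pixel timestamp queue, and rising edges of dco are counted only while the shutter window is open. The class and method names are hypothetical. Restarting the counts when a new window opens is an assumption of the sketch.

```python
from collections import defaultdict, deque


class AnalogMonitor:
    """Behavioral model of the analog monitor's per-pixel records."""

    def __init__(self):
        self.pdf_times = defaultdict(deque)  # pixel -> queue of pdf timestamps
        self.dco_counts = defaultdict(int)   # pixel -> dco edges in current window
        self.shutter_open = False

    def on_shutter(self, is_open: bool) -> None:
        if is_open:                          # new window: restart the counts
            self.dco_counts.clear()
        self.shutter_open = is_open

    def on_pdf_rise(self, pixel: int, t: float) -> None:
        self.pdf_times[pixel].append(t)

    def on_dco_rise(self, pixel: int) -> None:
        if self.shutter_open:                # count only inside the window
            self.dco_counts[pixel] += 1


mon = AnalogMonitor()
mon.on_dco_rise(3)           # before the window: ignored
mon.on_shutter(True)
for _ in range(4):
    mon.on_dco_rise(3)
mon.on_shutter(False)
mon.on_dco_rise(3)           # after the window: ignored
mon.on_pdf_rise(3, 125.0)
print(mon.dco_counts[3], list(mon.pdf_times[3]))  # 4 [125.0]
```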

The **output monitor** interfaces with the serialized output stream of the DUT and reconstructs readout data word by word, synchronized with the deserialization clock. Upon detection of the defined synchronization pattern, a deserialization sequence is initiated, capturing a complete data message over several clock cycles. If monitoring is enabled, non-empty data words are stored in a monitored queue. This enables continuous observation of serialized readout activity and provides input to the scoreboard for verifying readout order, data integrity, and event timing.
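The deserialization step can be illustrated with a Python sketch. It scans a sampled bit stream for a synchronization pattern, captures the following bits as one data word, and discards empty words. Three parameters are placeholders: the 8-bit pattern, the 16-bit word length, and the all-zero encoding of an empty word. The actual EDWARD65P1 frame format is not given here.

```python
from typing import List

SYNC = [1, 0, 1, 1, 1, 1, 0, 0]  # PLACEHOLDER synchronization pattern
WORD_BITS = 16                   # PLACEHOLDER data-word length


def deserialize(stream: List[int]) -> List[int]:
    """Return the non-empty words that follow each sync pattern in the stream."""
    words, i = [], 0
    while i + len(SYNC) + WORD_BITS <= len(stream):
        if stream[i:i + len(SYNC)] == SYNC:
            start = i + len(SYNC)
            word = 0
            for bit in stream[start:start + WORD_BITS]:  # MSB first
                word = (word << 1) | bit
            if word != 0:                                # skip empty words
                words.append(word)
            i = start + WORD_BITS                        # resume after the word
        else:
            i += 1                                       # slide by one bit
    return words


def bits(value: int, n: int) -> List[int]:
    """Serialize value into n bits, MSB first (test stimulus helper)."""
    return [(value >> k) & 1 for k in range(n - 1, -1, -1)]


stream = [0, 0] + SYNC + bits(0x1234, 16) + SYNC + bits(0, 16) + [1, 0]
print([hex(w) for w in deserialize(stream)])  # ['0x1234']
```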

The I<sup>2</sup>C monitor passively listens to the slow control communication interface. It reconstructs full I<sup>2</sup>C commands by detecting start conditions, reading address and data bytes, and observing acknowledgements and stop conditions. Each command is buffered and structured into decoded transactions that reflect the actual configuration state changes requested by the testbench sequencer. This monitor plays a critical role in correlating configuration writes and reads with subsequent DUT behavior, helping to identify misconfigurations or protocol handling issues.
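The start/stop detection that the monitor relies on follows directly from the I<sup>2</sup>C specification:

- A START is a falling edge on SDA while SCL is high.
- A STOP is a rising edge on SDA while SCL is high.
- Data bits are sampled on rising SCL edges, with every ninth bit being the ACK slot.

A minimal Python sketch over sampled (scl, sda) pairs is shown below. The names are hypothetical, and repeated STARTs are handled only crudely.

```python
from typing import List, Tuple

Sample = Tuple[int, int]  # (scl, sda) levels at successive sample points


def decode_i2c(samples: List[Sample]) -> List[Tuple[List[int], List[int]]]:
    """Return one (data_bytes, ack_bits) pair per START ... STOP transaction.

    Bits are sampled on SCL rising edges; each byte is 8 data bits (MSB first)
    plus an ACK slot (0 = ACK, 1 = NACK). A repeated START simply restarts
    the bit buffer in this simplified model.
    """
    transactions = []
    bits: List[int] = []
    active = False
    for (scl0, sda0), (scl1, sda1) in zip(samples, samples[1:]):
        if scl0 and scl1 and sda0 and not sda1:               # START
            bits, active = [], True
        elif scl0 and scl1 and not sda0 and sda1 and active:  # STOP
            usable = len(bits) - len(bits) % 9  # drop the incomplete trailing frame
            frames = [bits[k:k + 9] for k in range(0, usable, 9)]
            data = [int("".join(map(str, f[:8])), 2) for f in frames]
            acks = [f[8] for f in frames]
            transactions.append((data, acks))
            active = False
        elif active and not scl0 and scl1:                    # SCL rising edge
            bits.append(sda1)
    return transactions


def encode(values: List[int], acks: List[int]) -> List[Sample]:
    """Stimulus: START, bytes with ACK slots, STOP (SDA changes only while SCL is low)."""
    s = [(1, 1), (1, 0), (0, 0)]
    for value, ack in zip(values, acks):
        for bit in [(value >> i) & 1 for i in range(7, -1, -1)] + [ack]:
            s += [(0, bit), (1, bit), (0, bit)]
    s += [(0, 0), (1, 0), (1, 1)]
    return s


print(decode_i2c(encode([0x40, 0x12], [0, 1])))  # [([64, 18], [0, 1])]
```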

Together, these monitoring modules form the backbone of non-invasive observability in the testbench, enabling cross-checks with expected behavior and facilitating detailed timing, synchronization, and data-flow analysis throughout the simulation.

The **scoreboard** module serves as the central verification engine within the testbench. It aggregates data from the monitors, interprets observed behavior, and validates simulation outcomes against expected system responses. Functionally, it operates as a transaction-level checker with deep protocol awareness, utilizing both signal timing and control configuration to detect correctness, misbehavior, or protocol violations.

Internally, the scoreboard maintains per-pixel configuration state, such as enable status, forcing flags, operating mode, and test configuration. These values are automatically updated in response to decoded I<sup>2</sup>C commands received through the I<sup>2</sup>C monitor. When a write command is observed, the scoreboard updates an internal shadow memory, indexed by the composite pixel/group address. When processing read commands, it performs a strict comparison between the expected and returned values, detecting mismatches due to data corruption, uninitialized reads, or out-of-scope accesses.
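The shadow-memory logic can be sketched in Python. Writes update a dictionary keyed by the composite (group, pixel) address. Reads are classified as a match, a corruption, an uninitialized read, or an out-of-scope access. The address bounds, the reset value of 0, and all names are assumptions for illustration only.

```python
from typing import Dict, Tuple

N_GROUPS, N_PIXELS = 4, 64   # ASSUMED address space for the sketch
RESET_VALUE = 0              # ASSUMED value returned by never-written registers


class ShadowMemory:
    """Expected configuration contents, indexed by (group, pixel)."""

    def __init__(self):
        self.mem: Dict[Tuple[int, int], int] = {}

    @staticmethod
    def in_scope(group: int, pixel: int) -> bool:
        return 0 <= group < N_GROUPS and 0 <= pixel < N_PIXELS

    def on_write(self, group: int, pixel: int, data: int) -> None:
        if self.in_scope(group, pixel):
            self.mem[(group, pixel)] = data

    def check_read(self, group: int, pixel: int, returned: int) -> str:
        if not self.in_scope(group, pixel):
            return "ok" if returned == RESET_VALUE else "out_of_scope_error"
        expected = self.mem.get((group, pixel), RESET_VALUE)
        if returned == expected:
            return "ok"
        return "corruption" if (group, pixel) in self.mem else "uninitialized_error"


sb = ShadowMemory()
sb.on_write(1, 5, 0xABCDEF)
print(sb.check_read(1, 5, 0xABCDEE))  # corruption
```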

For functional verification, the scoreboard coordinates events across three domains: analog activity (via the analog monitor), digital serialized output (via the output monitor), and configuration control (via the I<sup>2</sup>C monitor). It identifies the origin of each readout event by cross-referencing timestamps, configuration states, and token routing logic. Each output word is checked for phase correctness based on the configured number of data phases, and any violations, such as missing data, duplicate phases, or unexpected sources, are flagged.

Additionally, the scoreboard manages timeout tracking for readout latency analysis. If a pixel's analog hit is not followed by a digital readout within a configured time window, the event is marked as a timeout error unless the pixel was disabled. It also logs pile-up events, where multiple overlapping responses violate temporal exclusivity. For testability features, the scoreboard compares readout data with analog ripple counters, verifying not just data presence but also amplitude accuracy, and reports mismatches or range violations.
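The timeout rule can be expressed compactly:

- Each analog hit opens a deadline.
- A readout from the same pixel before the deadline closes it.
- Any hit still open past its deadline is reported, unless the pixel was disabled.

The Python sketch below applies this rule offline to recorded timestamps. The window length and the function name are hypothetical.

```python
from typing import Dict, List, Set


def find_timeouts(hits: Dict[int, List[float]],
                  readouts: Dict[int, List[float]],
                  window: float,
                  disabled: Set[int]) -> List[tuple]:
    """Return (pixel, hit_time) pairs not followed by a readout within `window`.

    Each readout is consumed by at most one hit (the earliest pending one).
    """
    errors = []
    for pixel, hit_times in hits.items():
        if pixel in disabled:
            continue                     # disabled pixels are expected to stay silent
        pending = sorted(readouts.get(pixel, []))
        for t_hit in sorted(hit_times):
            while pending and pending[0] < t_hit:
                pending.pop(0)           # discard readouts that precede this hit
            if pending and pending[0] - t_hit <= window:
                pending.pop(0)           # matched: consume this readout
            else:
                errors.append((pixel, t_hit))
    return errors


hits = {0: [0.0, 100.0], 1: [5.0], 2: [10.0]}
readouts = {0: [2.0, 180.0], 1: [6.0]}
print(find_timeouts(hits, readouts, window=50.0, disabled={2}))  # [(0, 100.0)]
```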

Overall, the scoreboard integrates asynchronous event tracking with configuration-aware logic, enabling comprehensive, fine-grained verification of both functionality and protocol compliance across the entire pixel array. It is critical for quantifying the correctness of arbitration, measuring latency and throughput, and validating the design against corner cases and edge-triggered events.

# 4.2 Digital Test Campaign: RTL and Signoff Verification

To verify the functional correctness, configuration behavior, and protocol compliance of the EDWARD architecture, an extensive suite of digital tests was developed and executed on both the behavioral RTL description and the gate-level signoff netlist. These tests are designed to validate pixel-level control, arbitration tree operation, output serialization, and system-level behavior under various triggering and configuration scenarios. Importantly, the same functional testbench infrastructure was reused across both stages of verification, with only minimal changes to timing annotations and module instantiation.

The purpose of these tests is threefold: (1) to confirm expected behavior across all supported operating modes; (2) to verify edge-case handling, including request suppression and forced triggering; and (3) to assess post-synthesis robustness against timing skew and metastability-sensitive conditions. Tests include per-group and full-matrix evaluations, covering a wide range of configurations, event densities, and timing conditions.

Before detailing the individual tests, this section introduces the key global control signals and the per-pixel and per-group configuration flags that form the basis for stimulus control and test logic modulation.

#### 4.2.1 Global Control Signals

The following top-level control signals are accessible from the testbench and are essential for controlling the chip operation during simulation:

- **rst** Global asynchronous reset signal. When asserted high, this signal initializes the entire chip into a known default state: it resets all configuration memories, clears the arbitration tree, and initializes the serializers. Deassertion releases the system into normal operation.
- **clr** EDWARD-specific asynchronous clear signal. When asserted high, it clears all pending readout requests in the matrix and prevents new requests from being issued. This signal is typically used to recover from invalid states or to enforce clean initialization before tests.
- **shtr** Shutter signal. This signal has a context-dependent role:
  - In *force mode*, a falling edge on shtr is interpreted by pixels as a synthetic event, effectively substituting for analog signal generation.
  - In *test mode*, shtr defines a counting window for a shared digital counter within the group. A rising edge starts counting, and a falling edge stops it. The stop edge also acts as a simulated event for the enabled pixel.

#### 4.2.2 Configuration Parameters

Each pixel and group has configuration flags stored in internal memory, which determine their individual behavior during operation. The following flags are actively used in the digital test campaign:

- **conf\_enable** Pixel enable flag. When set to 1 (default), the pixel can generate readout requests upon detecting an event. When cleared to 0, all events are ignored and no request is propagated to the arbitration tree.
- **conf\_force** Pixel force flag. When set to 1, the pixel overrides analog triggering and uses the shtr signal as its event source. The default value is 0, meaning only analog-generated signals are considered.
- **conf\_mode** Readout mode selector for the EDWARD protocol. It defines how many readout cycles are required to transmit full event data:
  - 00 Single-cycle mode (default). Used for basic operation without charge-sharing compensation.
  - 01 Two-cycle mode. Used in test mode to transmit both address and counter value.
  - 10 Five-cycle mode. Intended for edge pixels with charge-sharing behavior.
  - 11 Nine-cycle mode. Designed for pixels with extensive charge-sharing.
- **conf\_test** Group-level test flag. When set to 1, the group enters testability mode. In this mode, the shtr signal controls a shared counter. The default value is 0 (disabled).

For the 3FI65P1 analog model, there are additional parameters that utilize bits in the pixel memory otherwise reserved for analog parameters or configuration of the Poisson process generator in EDWARD65P1:

- **dc\_en** Analog model discriminator enable flag. When set to 1, it enables generation of discriminator output pulses (dco) from the analog model. The default value is 0 (disabled).
- **pd\_en** Peak detector enable flag in the analog model. It controls the activation of the pdf output used for readout qualification. The default value is 0 (disabled).
- **mean\_wt** Mean wait time between discriminator pulses in the analog model, used to emulate random hit patterns. Encoded as a 3-bit value:
  - 000: 10 ns (default), 001: 100 ns, 010: 1 µs, 011: 10 µs, 100: 100 µs, 101: 1 ms, 110: 10 ms, 111: 100 ms
- **mean\_dt** Mean duration of the discriminator output pulse (dco), encoded as a 3-bit value with the same mapping as mean\_wt. An Erlang distribution is used to model this behavior, circumventing a simulator issue with Poisson timing generators.
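The 3-bit encoding is decade-spaced: code $k$ maps to $10 \cdot 10^{k}$ ns. The Python sketch below decodes the field and draws an Erlang-distributed duration with the decoded mean. It uses `random.gammavariate`, since an Erlang variate is a gamma variate with integer shape. The shape value of 4 is an assumption; the shape used in the 3FI65P1 analog model is not stated here. The function names are hypothetical.

```python
import random


def decode_mean_ns(code: int) -> int:
    """Decode mean_wt / mean_dt: 000 -> 10 ns, 001 -> 100 ns, ..., 111 -> 100 ms."""
    if not 0 <= code <= 7:
        raise ValueError("3-bit field expected")
    return 10 * 10 ** code


def erlang_sample_ns(code: int, shape: int = 4, rng=random) -> float:
    """Erlang(shape, scale) sample whose mean equals the decoded value.

    Erlang is a gamma distribution with integer shape; mean = shape * scale.
    The default shape of 4 is an ASSUMPTION.
    """
    mean = decode_mean_ns(code)
    return rng.gammavariate(shape, mean / shape)


print([decode_mean_ns(c) for c in range(8)])
# [10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000]
```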

#### 4.2.3 Enable Test

The enable test serves as a functional check of basic pixel-level event generation and readout behavior in the EDWARD architecture. It verifies whether enabling each pixel individually causes the appropriate readout request to be issued and captured by the arbitration and serialization logic.

The test begins with the global shtr signal held at a low level. This is crucial because, due to the implementation of the configuration interface, writing the conf\_enable bit to 1 also momentarily asserts the conf\_force flag in hardware. This happens because both fields occupy the same bit position in two consecutive memory banks within the pixel configuration memory. As a result, setting conf\_enable to 1 while shtr is low immediately causes the pixel to treat the shutter as an event source and generate a readout request.

Each pixel in the target group is enabled sequentially. After enabling, the system waits for a short duration to allow the request to propagate, be arbitrated, and be serialized. The output monitor then checks whether exactly one data packet (i.e., the pixel address) has been received per enabled pixel.

This test is passed if, and only if, each pixel sends its address exactly once in response to being enabled. It effectively validates that:

- Writing to the configuration memory correctly updates the pixel's internal settings and influences its behavior.
- Force-triggering via shtr behaves as expected.
- The arbitration logic accepts a single request per activation.
- The serializer produces a properly formatted data word per pixel.

The flow diagram illustrating the control flow of the enable test procedure is shown in Figure 4.2.

#### 4.2.4 Force Test v1

The Force Test Version 1 is designed to validate the correct behavior of pixels when explicitly forced to generate events via the shtr signal. In this test, each pixel in the group is individually enabled and configured with the conf\_force bit set. This reroutes the pixel's event source from the analog discriminator or internal generator to the falling edge of the shtr signal.

Once a pixel is configured in force mode, two artificial falling edges on shtr are generated, spaced by fixed time intervals. Each falling edge is expected to act as a stimulus that causes the pixel to issue a readout request. Before moving on to the next pixel, the currently active one is disabled to prevent multiple activations from overlapping. This procedure is repeated for every pixel in the group.

The expected outcome is three readout messages per pixel:

1. One message immediately after enabling (caused by a small transient effect when the configuration is written).
2. One message in response to the first falling edge of shtr.
3. One message in response to the second falling edge of shtr.

Figure 4.2: *Enable test* flowchart: Each pixel is sequentially enabled, and the test passes if exactly one message is received per pixel.

After all pixels have been tested, the scoreboard evaluates the total number of readout events received from each pixel. The test is considered **passed** if each pixel generates exactly three messages, confirming that:

- Force mode correctly redirects event triggering to shtr.
- Arbitration and serialization logic accept and route forced events without data loss.
- No extra or missing events are present, confirming the absence of race conditions or stuck configurations.

The control flow of this procedure is summarized in Figure 4.3, illustrating the sequencing of configuration, event generation, and result verification.

#### 4.2.5 Force Test v2

Force Test Version 2 is designed to verify proper accumulation and arbitration of simultaneous readout requests when multiple pixels are incrementally activated in force mode. Unlike Force Test v1, where each pixel is triggered and evaluated in isolation, this test builds up activity across the group cumulatively, emulating a scenario with increasing contention on the arbitration tree.

The test begins with the shtr signal held high. Each pixel in the group is then sequentially enabled and configured with the conf\_force flag set. After each pixel is configured, a single falling edge on the shtr signal is generated, which serves as a simultaneous trigger source for all currently enabled pixels.

Because pixels remain enabled as the test progresses, each subsequent falling edge of shtr is seen by a larger subset of the group. This means the first pixel should respond to all subsequent shtr pulses, the second to all but the first, and so on. At the end of the sequence, the scoreboard verifies that the number of messages received from each pixel is equal to the number of shtr pulses applied after it was enabled.

This test is passed if every pixel:

- Remains silent before being enabled.
- Generates no message immediately after being configured (since shtr is high).
- Responds exactly once per subsequent falling edge of shtr.
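The expected per-pixel message counts follow from the enable order. Suppose pixels are enabled in the order $i = 0, 1, \dots, P-1$, with one shtr falling edge after each enable. Pixel $i$ then sees $P - i$ edges, and the scoreboard expects $P(P+1)/2$ messages in total. The Python sketch below computes this expectation and checks a received tally against it. The names are hypothetical, and the 64-pixel group size (8×8) is assumed for the example.

```python
from collections import Counter
from typing import List


def expected_counts(n_pixels: int) -> List[int]:
    """Pixel i (enabled i-th) responds to its own shtr pulse and all later ones."""
    return [n_pixels - i for i in range(n_pixels)]


def check_force_v2(received: Counter, n_pixels: int) -> bool:
    """Pass iff each pixel sent exactly the expected count and no other source appeared."""
    counts_ok = all(received[i] == exp for i, exp in enumerate(expected_counts(n_pixels)))
    return counts_ok and set(received) <= set(range(n_pixels))


P = 64  # assumed 8x8 pixel group
print(sum(expected_counts(P)))  # 2080 messages in total
```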

This test is beneficial for detecting issues such as:

- Token loss or duplication under high-contention conditions.
- Incorrect request latching or priority skew due to overlapping triggers.
- Inconsistent handling of force-mode triggering at the pixel or group level.

The test procedure is visualized in Figure 4.4, which shows how pixels are sequentially enabled and triggered, and how the results are accumulated and verified.



Figure 4.3: *Force Test v1* flowchart: Each pixel is sequentially enabled with the force flag, triggered twice via shtr falling edges, and then disabled. Three messages are expected per pixel.



Figure 4.4: *Force Test v2* flowchart: Pixels are incrementally enabled with force mode, and a falling edge on shtr triggers all currently active pixels. The message count for each pixel is verified based on its enable timing.

#### 4.2.6 Mode Test

The mode test is used to validate the EDWARD readout behavior under different configuration modes, which define the number of serialization cycles required to transmit event data. It verifies correct pixel response, phase formatting, and readout consistency for all four supported modes of operation.

The conf\_mode field defines the number of cycles (or "phases") required to read out a single event:

- 00 One cycle (default): standard readout without charge sharing.
- 01 Two cycles: used in test mode to transmit address and auxiliary data (e.g., counter value).
- 10 Five cycles: intended for handling charge sharing between edge pixels.
- 11 Nine cycles: designed for corner pixels with complex charge sharing.

In this test, all pixels in the group are configured with the target mode. The pixels are also individually enabled with the analog model activated (discriminator and peak detector), and their event timing is set through the mean\_wt and mean\_dt parameters. A neighboring group is also activated to create realistic arbitration contention. After configuration, the simulation runs for 1.5 ms to accumulate sufficient event activity and readout traffic.

Upon completion, each pixel is deactivated, and the scoreboard evaluates the following:

1. Whether the number of output messages per event matches the configured number of cycles.
2. Whether every received message is complete and valid across all expected phases.
3. Whether the arbitration behaved correctly under potential multi-pixel contention.
4. Whether any unexpected or duplicate sources contributed to the data stream.

The test is considered **passed** if the number of readout phases for each pixel equals the number of detected events multiplied by the configured phase count, and if no arbitration errors (timeouts, pile-up, unexpected sources) are detected.
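The pass criterion can be written as a per-pixel check, $\text{phases}_i = \text{events}_i \cdot n(\text{conf\_mode})$, where $n$ is 1, 2, 5, or 9. Below is a minimal Python sketch of this check. The names are hypothetical, and the scoreboard's separate error counters are collapsed into a single total.

```python
PHASES = {0b00: 1, 0b01: 2, 0b10: 5, 0b11: 9}  # conf_mode -> readout cycles


def mode_test_passed(events: dict, phases: dict, conf_mode: int, errors: int) -> bool:
    """events/phases map pixel -> counts from the analog and output monitors."""
    n = PHASES[conf_mode]
    counts_ok = all(phases.get(p, 0) == events[p] * n for p in events)
    no_extra = set(phases) <= set(events)  # no unexpected sources
    return counts_ok and no_extra and errors == 0


print(mode_test_passed({0: 3, 1: 2}, {0: 27, 1: 18}, 0b11, errors=0))  # True
```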

The entire procedure is visualized in the flowchart shown in Figure 4.5.

#### 4.2.7 Testability Test

The testability test is designed to validate the internal debug and diagnostic infrastructure implemented in each pixel group. Specifically, it verifies the correct behavior of the in-group ripple counter, the shared digital test bus, and the EDWARD readout interface when the system is operating in conf\_test mode.

In this mode, one pixel in the group is enabled at a time while all others are disabled. When test mode is active, the group-level counter is driven by the output of the enabled pixel's discriminator through a shared bus. A single test window is created by driving shtr high, holding it for a defined duration (2.5 ms in this case), and then pulling it low to end the test cycle.

The enabled pixel has its analog model configured with appropriate mean wait and duration times to generate digital charge detection signals. These events increment the group-level counter, and the resulting value is sent via EDWARD readout after the shutter closes. The scoreboard then checks that:

- A readout message is received from the correct pixel.
- The address matches the one currently under test.
- The transmitted value matches the ripple counter value observed in the analog monitor.
- No corruption, readout error, or misalignment occurred.

Figure 4.5: *Mode Test* flowchart: After enabling pixels and analog sources, the test waits for readout events. Scoreboard verifies phase correctness and error-free serialization across multiple readout modes.

This process is repeated sequentially for all pixels in the group. The test passes if each pixel successfully generates a message in test mode, and the received counter value exactly matches the internally observed pulse count from the analog model.

This test is beneficial for validating the connection between analog discrimination events and digital test structures, as well as for confirming the ability to perform in-situ readout validation and calibration in post-fabrication debugging environments.

The complete flow of this test procedure is shown in Figure 4.6.

#### 4.2.8 Memory Test

The memory test verifies the correct operation of the configuration memory implemented within each pixel and group. It checks both the ability to store arbitrary configuration patterns and the integrity of readback operations through the I<sup>2</sup>C interface. It also detects potential address aliasing, data bus contention, or misrouting during configuration updates.

The test is performed in four stages:

1. **Randomized Write and Readback:** A series of randomly generated addresses and 24-bit configuration data values is written to pixel memory. After writing, a subset of these entries is read back and compared with the original data to verify proper storage. This stage validates random-access correctness and ensures that no address-decoding or bus-conflict issues occur.
2. **Broadcast Write of All-Ones:** Every pixel configuration memory is written with a pattern of all 1s. This is followed by a full memory sweep in which each register is read and checked for correctness, ensuring that every memory cell can retain a logic 1.
3. **Broadcast Write of All-Zeros:** The same procedure is repeated using all 0s to validate low-bit retention and rule out stuck-at faults.
4. **Out-of-Scope and Never-Written Readback:** Additional reads are performed for registers that were never explicitly written or that fall outside the valid address map. The expected return is either 0 or a default value. This checks the decoder and boundary behavior.

Each operation is verified through the scoreboard, which maintains a shadow copy of expected memory contents and compares them with readout values. Any mismatch is flagged as a corruption error, while unwritten or out-of-scope accesses are checked for compliance with default return behavior.
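A shadow-copy scoreboard of the kind described above can be modeled in a few lines. The sketch assumes a 24-bit data width, a set of valid addresses, and a default read value of 0 for never-written or out-of-scope registers; the class and method names are hypothetical, and the real testbench drives the memory through the I<sup>2</sup>C interface rather than through direct calls.

```python
class ConfigMemoryScoreboard:
    """Shadow model of pixel configuration memory (illustrative sketch only)."""

    def __init__(self, valid_addresses, default=0, width=24):
        self.valid = set(valid_addresses)
        self.default = default
        self.mask = (1 << width) - 1
        self.shadow = {}   # address -> last value written
        self.errors = []   # (address, expected, observed) for every mismatch

    def write(self, addr, data):
        # Writes outside the valid map are ignored, mirroring a correct decoder.
        if addr in self.valid:
            self.shadow[addr] = data & self.mask

    def broadcast(self, data):
        # All-ones / all-zeros stages: every valid register gets the same pattern.
        for addr in self.valid:
            self.shadow[addr] = data & self.mask

    def expected(self, addr):
        # Never-written or out-of-scope addresses must return the default value.
        return self.shadow.get(addr, self.default)

    def check_read(self, addr, observed):
        exp = self.expected(addr)
        if observed != exp:
            self.errors.append((addr, exp, observed))
        return observed == exp
```

For example, a bit stuck at 0 after the all-ones broadcast would appear as an entry in `errors` with the expected value `0xFFFFFF`.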

The test is considered **passed** if:

- No data corruption is reported.
- All valid entries match their written content.
- All out-of-scope or uninitialized registers return expected values.

Figure 4.6: *Testability Test* flowchart: Each pixel is individually enabled in test mode. The shared group counter collects discriminator pulses during the shutter window. The result is read out and checked for accuracy.

The complete flow of this memory integrity test is presented in Figure 4.7.



Figure 4.7: *Memory Test* flowchart: Random, all-zero, and all-one configuration data is written and read from pixel memory. Scoreboard verifies data correctness and boundary behavior.

# 4.2.9 Summary of Digital Functional Verification

The complete suite of digital tests described in this chapter was initially developed and executed for the **3FI65P1** chip. This ASIC included the complete digital readout architecture integrated with an analog frontend and thus served as the primary target for functional and post-synthesis validation.

Owing to the modular design methodology, the **EDWARD65P1** chip, although it is a standalone digital device, reuses the core architectural skeleton of 3FI65P1. As a result, only a limited re-verification campaign was required for EDWARD65P1, focused on testing the Poisson process generator itself. The majority of behavioral and structural coverage was already achieved during 3FI65P1 verification.

All described tests were executed both on the RTL description and on the signoff gate-level netlists, the latter fully annotated with extracted delays. Gate-level simulations were carried out for all standard delay corners (**TYPICAL**, **MAXIMUM**, and **MINIMUM**) to ensure timing robustness and protocol stability under process and voltage variations.

The test results for each configuration were recorded and archived in the design database as detailed report files. These logs include scoreboard verdicts, protocol assertions, and waveform captures for all simulated scenarios, serving as formal evidence of correct functionality and successful verification signoff.

Overall, the test campaign confirmed that the event-driven readout protocol, arbitration mechanism, configuration interface, and serializer logic performed as expected across all tested conditions.

# 4.3 Mixed-Signal Timing Evaluation of a Single Readout Group

To complement the digital verification efforts, a transient simulation campaign was carried out on a complete  $8\times8$  pixel group implemented in the target 65 nm CMOS process. This array served as a minimal unit for validating the temporal behavior of the asynchronous arbitration mechanism and shared bus signaling under realistic operating conditions [128]. The same group structure was later replicated to build the full  $32\times32$  matrix in both **3FI65P1** and **EDWARD65P1** chips.

# 4.3.1 Simulation Setup and Objectives

The post-layout transistor-level netlist, including parasitic RC extraction, was used as the simulation target. The in-channel logic, the arbitration cells, and the shared data and acknowledge buses were all preserved in their final implementation form. The simulation was performed using the Cadence AMS mixed-signal simulator, and post-processing was handled in LabVIEW.

To emulate realistic detector behavior, each pixel was configured to generate readout requests in response to a Poisson-distributed hit pattern, with an average interval of $2\,\mu\mathrm{s}$. For that purpose, the analog model used previously in the digital testbench was reused. This configuration enabled the validation of arbitration fairness, token propagation under load, and redistribution latency in the presence of multiple simultaneous requests. The acknowledge token was generated at 17.86 MHz, corresponding to a 250 MHz serializer clock divided by 14.

# 4.3.2 Key Timing Metrics

Several timing properties were extracted from the simulation traces to characterize the readout behavior:

- Setup time ($t_s$): time between the active edge of the acknowledge (token generation) at the top of the arbitration tree and the appearance of stable data from the pixel on the shared data bus.
- Hold time ($t_h$): time between the active edge of the acknowledge (token generation) at the top of the tree that resets the readout and the moment when data from the pixel is no longer valid on the data bus.
- Token propagation time ($t_t$): time required for the token to traverse the arbitration tree, measured as the delay between the active edge at the top of the tree and the corresponding acknowledge activation in the selected pixel.
- Token redistribution delay ($t_{\rm rd}$): time between the active edge of the acknowledge (token generation) at the top of the tree and the appearance of the same token in a second pixel, after the first pixel has been reset. This metric is critical for continuous arbitration under load.
- Pixel reset time ($t_r$): time between the active edge of the acknowledge (token generation) at the top of the tree and the clearance of the readout request signal in the pixel.

Figure 4.8 illustrates the timing relationships between the global arbitration signals and local pixel activity. All five metrics ($t_s$, $t_h$, $t_t$, $t_{\rm rd}$, and $t_r$) are marked within the context of a single readout cycle.



Figure 4.8: Timing diagram illustrating the definition of key timing parameters in the arbitration and readout process [128].

# 4.3.3 Measured Timing Distributions and Spatial Variability

The extracted timing parameters were statistically analyzed across the full  $8\times8$  pixel group, and results are summarized in the histograms and spatial maps shown in Figures 4.9a–4.10d. Each distribution is annotated with key statistical descriptors, including minimum, maximum, mean  $(\mu)$ , and standard deviation  $(\sigma)$ , with the number of measurements N ranging from a few hundred to over two thousand per metric.

The setup and hold times ( $t_{\rm s}$  and  $t_{\rm h}$ ) exhibited compact distributions, with the majority of values ranging from 2.2 ns to 3.9 ns. The average setup time was 2.88 ns with a standard deviation of 366 ps, while the average hold time was slightly longer, at 3.23 ns with a 351 ps deviation. These values provide a safe margin for serializer synchronization at 250 MHz, assuming a duty cycle not exceeding 82%.

Pixel reset times ( $t_r$ ) showed a broader spread, ranging from 3.54 ns to 5.29 ns, and a mean of 4.33 ns. While higher than setup/hold values, reset timing remains sufficiently short to support high readout rates.

The redistribution delay  $(t_{\rm rd})$ , defining the minimum time between successive acknowledgments in two different pixels, varied from 6.64 ns to 9.60 ns, with an average around 8.05 ns. This latency represents a fundamental constraint on the maximum sustainable arbitration rate during continuous load.
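To put these numbers in perspective, the worst-case values reported in this section can be compared with one acknowledge period. The sketch below derives $T_{\rm ack}$ from the 250 MHz serializer clock divided by 14, as stated in Section 4.3.1, and treats each metric independently; it is a back-of-the-envelope margin check under that simplifying assumption, not a timing-closure analysis.

```python
# Acknowledge-token period derived from the serializer clock (values from the text).
F_SER = 250e6      # serializer clock [Hz]
DIVIDER = 14       # acknowledge token = serializer clock / 14
f_ack = F_SER / DIVIDER
t_ack = 1.0 / f_ack

# Worst-case (maximum) values reported for the 8x8 group, in ns.
worst_case_ns = {"t_t": 1.66, "t_r": 5.29, "t_rd": 9.60}


def margins(worst, period_ns):
    """Fraction of one acknowledge period consumed by each worst-case metric."""
    return {name: value / period_ns for name, value in worst.items()}


if __name__ == "__main__":
    period_ns = t_ack * 1e9
    print(f"f_ack = {f_ack / 1e6:.2f} MHz, T_ack = {period_ns:.1f} ns")
    for name, frac in margins(worst_case_ns, period_ns).items():
        print(f"{name}: {frac:.1%} of T_ack")
    # Grant-rate bound if t_rd were the only limit (ignores every other overhead).
    print(f"1 / t_rd,max = {1e3 / worst_case_ns['t_rd']:.0f} MHz")
```

With these inputs, $t_{\rm rd}$ at its maximum occupies roughly 17% of the 56 ns period, so under this simplified view the redistribution delay alone would allow grant rates of about 104 MHz, and the 17.86 MHz acknowledge generator remains the rate-setting element.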

Token propagation delay ($t_t$), defined as the delay between a token injected at the top of the tree and its arrival at the selected pixel, was the fastest metric, ranging from approximately 0.79 ns to 1.66 ns. It also showed the smallest absolute spread ($\sigma \approx 237\,\mathrm{ps}$), indicating that the tree topology is well balanced and that propagation paths are consistent across pixel positions.

To assess spatial variability, timing metrics were also plotted as functions of pixel address. Figures 4.10a–4.10d show the mean and standard deviation per pixel, revealing systematic gradients consistent with each pixel's position in the arbitration tree. Corner and edge pixels exhibit slightly elevated delays, especially for $t_r$ and $t_t$, due to longer interconnect paths.

Additionally, the pairwise redistribution delay, defined as the time required for the acknowledge token to transition from pixel n to m, was analyzed and averaged over all observed transitions. Figure 4.11 presents this metric as a 2D matrix.
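Building the matrix of Figure 4.11 amounts to grouping observed token hand-overs by (source, destination) pair and averaging. The sketch assumes the traces have already been reduced to hypothetical `(src, dst, t_rd)` tuples; pairs that were never observed are left as `None` rather than zero, so that they are not mistaken for fast transitions.

```python
def pairwise_mean_delay(transitions, n_pixels=64):
    """Mean t_rd for every observed (source, destination) pixel pair.

    transitions: iterable of (src, dst, t_rd) tuples, one per observed token
    hand-over (hypothetical format extracted from the simulation traces).
    Returns an n_pixels x n_pixels list of lists; None marks unobserved pairs.
    """
    sums = [[0.0] * n_pixels for _ in range(n_pixels)]
    counts = [[0] * n_pixels for _ in range(n_pixels)]
    for src, dst, t_rd in transitions:
        sums[src][dst] += t_rd
        counts[src][dst] += 1
    return [
        [sums[i][j] / counts[i][j] if counts[i][j] else None for j in range(n_pixels)]
        for i in range(n_pixels)
    ]
```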

To further assess the fairness of arbitration and potential positional bias, the number of successful readout grants was recorded for each pixel over the simulation's duration. As shown in Figure 4.12, access occurrences are uniformly distributed with no significant clustering or exclusion. This confirms that the asynchronous arbiter tree resolves contention without favoring specific paths or locations.
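The flatness of Figure 4.12 is judged visually in the text. One common way to quantify it, swapped in here for illustration rather than taken from the original analysis, is Pearson's chi-square goodness-of-fit test against equal expected counts. The sketch uses the Wilson–Hilferty approximation for the 95% critical value so that only the standard library is needed; it also treats the per-pixel counts as a multinomial sample, which is only an approximation for an arbiter-driven process.

```python
import math


def chi_square_uniform(counts):
    """Pearson chi-square statistic of per-pixel grant counts vs. a flat profile."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi2_critical_wh(dof, z=1.6449):
    """Wilson-Hilferty approximation of the chi-square quantile (z = 1.6449 -> 95%)."""
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3


def looks_uniform(counts):
    """Return (passes_at_95_percent, statistic) for the grant-count histogram."""
    stat = chi_square_uniform(counts)
    return stat <= chi2_critical_wh(len(counts) - 1), stat
```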



Figure 4.9: Timing-related histograms for the EDWARD readout chain.



Figure 4.10: Per-pixel timing maps with  $1\sigma$  error bars for the EDWARD readout chain.



Figure 4.11: Matrix of mean token redistribution delay  $t_{\rm rd}(n,m)$  as a function of source pixel n and destination pixel m. The uniformity of the plot suggests a lack of systematic unfairness in tree traversal order.



Figure 4.12: Number of arbitration accesses per pixel address over the course of simulation. A flat distribution confirms the absence of prioritization or starvation [128].

# 4.3.4 Summary of Mixed-Signal Timing Evaluation

Analysis of the histograms and spatial distributions for all timing parameters confirms that the arbitration and readout mechanisms function as intended. No outliers or invalid transitions were observed, and the per-pixel standard deviation remains low and consistent across the array. A noticeable discontinuity in mean timing values appears between pixel indices 31 and 32, which aligns with the structural boundary between adjacent branches of the binary arbitration tree. This is an expected artifact of the hierarchical topology.
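The 31/32 boundary can be illustrated by counting how many tree levels a token must climb between two leaves. Assuming, hypothetically, that the pixel index equals the leaf position in a balanced binary tree of 64 leaves, that count is the bit length of `n XOR m`, and the only pair of neighboring indices separated by the full tree depth is 31 and 32. The actual mapping between pixel addresses and tree leaves in the layout may differ.

```python
def tree_levels_between(n, m):
    """Levels of a balanced binary tree a token climbs from leaf n to leaf m
    (height of their lowest common ancestor); assumes index == leaf position."""
    return (n ^ m).bit_length()


def top_level_boundaries(n_leaves=64):
    """Neighboring indices (i, i + 1) whose path crosses the root of the tree."""
    depth = (n_leaves - 1).bit_length()
    return [i for i in range(n_leaves - 1) if tree_levels_between(i, i + 1) == depth]


if __name__ == "__main__":
    print(top_level_boundaries())
```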

The two-dimensional intensity map of token redistribution delay reveals systematic trends: longer delays tend to accumulate in the upper-right region of the matrix, while the shortest transitions occur between pixels in the lower-left quadrant. This behavior results from the physical layout and logical depth of the arbitration tree, where pixel-to-pixel distances vary with tree traversal depth and routing paths.

The results confirm that all critical timing windows are within acceptable margins for reliable operation of the EDWARD protocol. The spatial variation is modest and can be compensated during system calibration if needed, confirming the robustness of the EDWARD arbitration scheme under realistic asynchronous conditions. The data obtained from this study provide concrete design guidelines for scaling and implementing future ASICs based on the same architectural principles.

# 4.4 Analog and Mixed-Signal Simulation of EDWARD65P1 and In-Pixel Event Generator

To assess the functional integrity and performance characteristics of the **EDWARD65P1** ASIC prior to fabrication, a comprehensive analog and mixed-signal simulation campaign was undertaken. The simulation goals were threefold: to verify the correct operation of the in-pixel Poisson-based event generator, to confirm the proper functionality of the asynchronous token-based readout system, and to evaluate key timing parameters under various operating conditions.

# 4.4.1 Analog Simulation of In-Pixel Clock Generator

To verify the functional correctness and configurability of the in-pixel clock generator integrated into the **EDWARD65P1** ASIC, transient analog simulations were performed on the extracted layout view using post-layout netlists. The goal of this simulation campaign was to validate the tuning characteristics of the oscillator across its full dynamic range as configured by a 4-bit control word.

The clock generator operates using a Schmitt-trigger-based ring oscillator with capacitive loading. Its output frequency is programmable via internal transistors that are selectively enabled depending on the digital configuration value. This provides a monotonic decrease in frequency with increasing configuration word, which is essential for adjusting the rate of the Poisson-based event generator integrated into each pixel.



Figure 4.13: Extracted-view simulation results of the in-pixel clock generator. Frequency of oscillation as a function of configuration value.

As illustrated in Figure 4.13, the oscillator frequency ranges from approximately 62 MHz at configuration code 0x0 to below 7 MHz at 0xF. The steepest frequency gradient occurs between the lowest configuration values, followed by a gradual flattening in the higher range. This behavior, while non-linear, was expected and acceptable.

Importantly, no strong requirement was placed on achieving a particular frequency value or minimizing frequency spread. The primary goal was to ensure that the generated signal is continuous, stable, and within a range sufficient to drive the in-pixel Poisson event generator. Minor variations in frequency are acceptable and do not impact the functionality of the architecture, as the primary role of this clock is to provide a viable and programmable source of timing for event triggering, rather than serving as a precise reference.

These extracted-view simulations thus confirm that the oscillator behaves robustly across its full configuration range and is well-suited for its intended function within the EDWARD readout system.

# 4.4.2 Digital Verification of Poisson-Based Signal Generator

To validate the statistical behavior of the digital event generation mechanism implemented in the **EDWARD65P1** pixel architecture, a dedicated digital simulation was performed. The purpose of this study was to confirm that the time intervals between generated events follow an exponential distribution, as expected for a memoryless Poisson process [120].

The expected average event rate  $\lambda_{\text{theoretical}}$  is determined by the base clock frequency  $f_{\text{gen}}$  and the number of masked bits  $N_{\text{mask}}$  as:

$$\lambda_{\text{theoretical}} = \frac{f_{\text{gen}}}{2^{N_{\text{mask}}}}$$

A simulation was conducted with $f_{\rm gen}=61.7$ MHz and $N_{\rm mask}=12$, resulting in a theoretical rate of approximately 15,070 S/s (samples per second). A total of 9,999 event intervals were recorded for a single pixel, and the distribution of the time intervals between events was plotted as a histogram.
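The generator can be mimicked in software to show why $\lambda_{\text{theoretical}} = f_{\text{gen}}/2^{N_{\text{mask}}}$ holds. The sketch assumes that each generator clock cycle is an independent Bernoulli trial with success probability $2^{-N_{\text{mask}}}$ (for example, all masked bits of a pseudo-random word being zero); the hardware's actual random source is not modeled. The number of cycles between events is then geometric, and the fitted rate is the maximum-likelihood estimate for an exponential distribution, $1/\bar{t}$.

```python
import math
import random


def bernoulli_poisson_intervals(f_gen, n_mask, n_events, seed=0):
    """Inter-event intervals [s] of a clocked Bernoulli-trial event generator.

    Each clock cycle fires with probability p = 2**-n_mask. The number of
    cycles between events is geometric; it is sampled here by inversion
    instead of looping over every individual clock cycle.
    """
    rng = random.Random(seed)
    p = 2.0 ** -n_mask
    log_q = math.log1p(-p)               # ln(1 - p), negative
    intervals = []
    for _ in range(n_events):
        u = 1.0 - rng.random()           # u in (0, 1]
        cycles = max(1, math.ceil(math.log(u) / log_q))
        intervals.append(cycles / f_gen)
    return intervals


def fit_exponential_rate(intervals):
    """Maximum-likelihood rate of an exponential distribution: 1 / sample mean."""
    return len(intervals) / sum(intervals)
```

With the rounded value $f_{\text{gen}} = 61.7$ MHz the formula gives about 15,063 S/s; the 15,070.4 S/s quoted in the text corresponds to $f_{\text{gen}} \approx 61.73$ MHz.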



Figure 4.14: Histogram of time intervals between events generated by the digital Poisson generator. The red curve shows the best-fit exponential probability density function.

As shown in Figure 4.14, the simulated data closely follow the expected exponential distribution. The fitted rate parameter $\lambda_{\text{measured}}$ extracted from the simulation is 15,062.6 S/s, which agrees with the theoretical prediction ($\lambda_{\text{theoretical}} = 15,070.4$ S/s) to within about 0.05%, demonstrating both the correctness of the generator logic and its statistical fidelity.

This result confirms that the generator architecture meets its design goal: providing a stochastic, configurable, and temporally uncorrelated source of digital events suitable for emulating sparse radiation hits within the EDWARD readout framework. The observed minor discrepancy is attributed to quantization effects and the finite sample size, and is well within acceptable margins.

# 4.4.3 Digital Simulation with Variable Event Rates and Queueing Model Comparison

# A. Simulation Conditions and Objective

To evaluate the behavior of the EDWARD65P1 architecture under various activity levels, a series of digital simulations was conducted using a synthesized netlist of the full 32×32 pixel matrix. The goal was to assess how the system performs when subjected to different rates of randomly generated events, ranging from isolated single-pixel activity to conditions approaching matrix-wide saturation. These simulations not only verified functional correctness but also enabled the extraction of latency, throughput, and pile-up metrics under realistic use cases.

Each pixel in the EDWARD65P1 chip contains a digitally configurable in-pixel generator that produces events following a Poisson process, approximated by clocked Bernoulli trials with adjustable probability masks. The clock driving the generator can be scaled via a coarse/fine divider, allowing flexible control of the mean inter-arrival time of requests. This setup enables pixel-by-pixel tuning of the effective event rate $\lambda$, making the architecture well suited for emulating diverse traffic patterns ranging from sparse to congested conditions.

For all simulation scenarios, the primary performance metrics extracted include:

- **Readout delay** (latency): the time between a pixel issuing a readout request and the latching of its data in the output serializer.
- **Throughput**: the number of readout tokens successfully processed per unit time.
- **Pile-up rate**: the fraction of events delayed beyond the mean inter-arrival period, indicative of arbitration congestion.
- **Arbitration fairness**: assessed by counting the number of readouts per pixel over time to detect any systematic bias or starvation.

To provide theoretical grounding, the system is compared to a queuing model described by Kendall's notation as  $\mathbf{M}/\mathbf{G}/\mathbf{1}/\mathbf{N}$ , where:

- M denotes a memoryless (Poisson) arrival process,
- G represents a general service time distribution arising from the variable delay through the arbitration tree and readout phasing,
- 1 indicates a single service point, modeled here as the serializer at the root of the arbitration tree, and
- N is the finite buffer capacity, bounded by the number of pixels in the matrix and the request-holding capability of the arbitration cells.
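The M/G/1/N picture can be explored with a small discrete-event model. The sketch below is a deliberately simplified stand-in for the netlist simulations, built on the assumptions listed in its docstring: a one-slot buffer per pixel, one grant per acknowledge tick, and oldest-first service instead of the real tree's grant order. Delays are measured up to the granting tick, so propagation and latching times are ignored; `simulate_readout` is a hypothetical helper, not part of the EDWARD design flow.

```python
import random
from collections import deque


def simulate_readout(n_pixels, rate_per_pixel, f_ack, duration, seed=1):
    """Toy discrete-event model of the shared readout path.

    Simplifying assumptions (not the EDWARD RTL):
      * each pixel holds at most one pending request (one-slot buffer);
      * a request arriving while its pixel is still pending counts as pile-up;
      * one pending request is granted per acknowledge tick (period 1/f_ack);
      * pending requests are granted oldest-first (FIFO), whereas the real
        binary arbitration tree uses its own grant order.
    Returns (delays, piled_up, n_arrivals).
    """
    rng = random.Random(seed)
    t_ack = 1.0 / f_ack
    arrivals = []
    for pix in range(n_pixels):              # independent Poisson streams
        t = rng.expovariate(rate_per_pixel)
        while t < duration:
            arrivals.append((t, pix))
            t += rng.expovariate(rate_per_pixel)
    arrivals.sort()

    pending = [False] * n_pixels
    queue = deque()
    delays, piled_up, i, k = [], 0, 0, 1
    while k * t_ack <= duration:
        tick = k * t_ack
        # Admit every request that arrived up to this acknowledge tick.
        while i < len(arrivals) and arrivals[i][0] <= tick:
            t_arr, pix = arrivals[i]
            i += 1
            if pending[pix]:
                piled_up += 1
            else:
                pending[pix] = True
                queue.append((t_arr, pix))
        # Grant a single request per tick.
        if queue:
            t_arr, pix = queue.popleft()
            pending[pix] = False
            delays.append(tick - t_arr)
        k += 1
    return delays, piled_up, len(arrivals)
```

With a single active pixel every delay falls within one acknowledge period, matching the $1/f_{\rm acki}$ span of Equation 4.1; with 1,024 pixels at the Case Study IV rate, most arrivals find their pixel already pending. Because of the FIFO assumption, the model is not expected to reproduce the measured delay histograms under load.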

The following subsections present simulation results for four scenarios: a single active pixel, low event rate across the matrix, medium traffic inducing contention, and high-rate saturation. Each case is analyzed in terms of latency profiles, pile-up behavior, and conformity to the queueing model expectations.



Figure 4.15: Readout delay distribution when only one pixel is active [120].

# B. Case Study I: Single Pixel Active

In this baseline scenario, only a single pixel is enabled, while all others remain inactive. The pixel uses its internal generator to produce readout requests modeled as a Poisson process with a low event rate  $\lambda$ , ensuring negligible probability of consecutive requests occurring within a single arbitration cycle. This setup serves to benchmark the intrinsic readout delay of the EDWARD architecture in the absence of contention.

The theoretical span of the readout delay is derived based on the asynchronous arbitration and token-based acknowledge mechanism. It is given by:

$$
\begin{aligned}
t_{\rm read,min} &\approx \frac{1}{2f_{\rm acki}} + t_{\rm prop} + t_{\rm minWff},\\
t_{\rm read,max} &\approx \frac{3}{2f_{\rm acki}} + t_{\rm prop} + t_{\rm minWff},\\
t_{\rm read,span} &= t_{\rm read,max} - t_{\rm read,min} = \frac{1}{f_{\rm acki}}.
\end{aligned}
\tag{4.1}
$$

Where:

- $t_{\text{read}}$: total time from the readout request to data latching in the periphery,
- $f_{\rm acki}$: frequency of the acknowledge token generator (e.g., 17.86 MHz),
- $t_{\text{prop}}$: propagation delay through the arbitration tree (depends on pixel position),
- $t_{\text{minWff}}$: minimum clock pulse width required for the flip-flop to latch.
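Substituting $f_{\rm acki} = 17.86$ MHz into Equation 4.1 gives the numeric delay window; $t_{\rm prop}$ and $t_{\rm minWff}$ are kept symbolic because their values are not given here.

$$
\begin{aligned}
t_{\rm read,span} &= \frac{1}{17.86\,\mathrm{MHz}} \approx 56\,\mathrm{ns},\\
t_{\rm read,min} &\approx 28\,\mathrm{ns} + t_{\rm prop} + t_{\rm minWff},\\
t_{\rm read,max} &\approx 84\,\mathrm{ns} + t_{\rm prop} + t_{\rm minWff}.
\end{aligned}
$$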

Simulation results confirmed the delay bounds: the histogram of measured readout delays, shown in Figure 4.15, formed a uniform distribution over the range predicted by Equation 4.1, with no outliers or structural skew. The width of the distribution matched $1/f_{\rm acki}$, confirming that the token generator and arbitration tree operated without glitches.

This uniform delay profile is expected because requests arrive asynchronously with respect to the clocked token. Since the token is generated periodically and a request may arrive at any time within the clock period, the delay distribution is inherently uniform. In queueing-theory terms, this corresponds to an M/U/1/1 queue (Poisson arrivals, uniform service time, single server, one-slot buffer).

The key observations from this scenario are:

- No pile-up or queuing occurred; all events were served immediately upon the next acknowledge pulse.
- The arbitration tree path remained deterministic, as only one request propagated upward at any time.
- The absence of crosstalk or token corruption verified the robustness of both request routing and acknowledge token return paths.

# C. Case Study II: Low Event Rate (Sparse Readout)

This simulation scenario evaluated system behavior under sparse but uniformly distributed activity. All 1,024 pixels were configured with identical Poisson-generator parameters, each operating at approximately 947.8 S/s, resulting in a total matrix-level request rate of 971.6 kS/s. This corresponds to only 5.4% of the theoretical maximum throughput supported by the readout path, which is clocked at $f_{\rm acki}=17.86$ MHz (i.e., one acknowledge every $T_{\rm ack}=56$ ns). The inter-arrival-time histograms of the generators are shown in Figures 4.16a and 4.16b for two representative pixels with addresses 1 and 500, respectively.

In this low-rate regime:

- Most pixels experience no contention and are read out immediately upon request.
- Occasionally, simultaneous or overlapping requests cause brief delays due to arbitration.
- The delay distribution still retains a near-uniform profile but exhibits a slight spread compared to the ideal case.

Figures 4.16c and 4.16d show the readout-delay histograms for two representative pixels. Both distributions are nearly identical and exhibit a shape similar to that of the single-pixel scenario. However, in contrast to the ideal uniform profile observed when only a single pixel is active, these histograms show slight asymmetry and an additional tail. These deviations reflect infrequent queueing delays introduced by concurrent requests from other pixels.

This result demonstrates that, although the total system load remains far below saturation, arbitration occasionally postpones a pixel's readout by one or more acknowledge cycles, a behavior entirely consistent with the system's asynchronous design.

Notably, the shape of the delay distribution remains close to that of the single-pixel case. This confirms that the arbitration logic introduces no structural unfairness: pixels are not prioritized based on position or topology, and delay variability is purely stochastic in origin.

Figure 4.16e shows the spatial distribution of pile-up counts across the matrix. The pile-up rate remained below 0.1% for all pixels, confirming that the system never enters a congested or backlogged state. The dispersion map in Fig. 4.16f shows the standard deviation of readout delays per pixel; it is low and consistent across the matrix, reaffirming uniform arbitration timing.

This scenario matches the behavior expected from an M/U/1/1024 queuing system, where:

- M: each pixel issues requests with exponentially distributed inter-arrival times (Poisson arrivals),
- U: the readout delay is uniformly distributed within a fixed token-cycle span,
- 1: the system has a single serialized output,
- N = 1024: finite request capacity across the matrix, although it is never filled in this regime.

# D. Case Study III: Medium Event Rate

This simulation scenario explores the behavior of the EDWARD system under a moderate load. Each pixel is configured with a significantly higher event generation rate than in the previous test, using a denser mask. The average rate per pixel is increased such that the total matrix-level activity approaches the limit of sustained real-time arbitration but remains below saturation.

The distributions of inter-arrival times for pixel 1 and pixel 500, shown in Figures 4.17a and 4.17b, confirm the expected exponential behavior and near-identical generator settings for both pixels. While the input statistics remain Poissonian, the effects of increased request collision become observable in the readout behavior.

Figures 4.17c and 4.17d show the readout-delay histograms for pixels 1 and 500, respectively. In contrast to the low-rate case, these distributions exhibit significant tail lengthening and skew. While a high proportion of events is still serviced within one or two token cycles, a visible fraction of requests is delayed by arbitration bottlenecks. This reflects the queueing that arises from overlapping requests across the matrix, and this regime constitutes the key stress test of the EDWARD arbitration logic.

This delay spread is still bounded by the theoretical maximum but deviates clearly from the ideal uniform profile. It confirms that the arbiter no longer operates in isolation: acknowledgments are now regularly queued, and some requests must wait multiple token cycles before being serviced.

The pile-up map in Fig. 4.17e shows a noticeable increase in contention. Pile-up rates now reach up to 0.44% in the most affected pixels, and even the lowest values remain nonzero. This confirms that, while the system is not yet saturated, arbitration and buffer reuse are becoming active contributors to system timing.

The delay dispersion plot in Fig. 4.17f further quantifies the fairness of arbitration. Although the absolute values of standard deviation increase compared to the low-rate case, the spread remains nearly uniform across all 1,024 pixels, confirming that the asynchronous arbitration tree does not introduce systematic bias related to pixel position.

This simulation illustrates the transition between a purely asynchronous, contention-free regime and one where queuing effects must be managed by design. Importantly, the EDWARD architecture handles this medium-rate scenario gracefully:

- No readout starvation is observed across any part of the matrix.
- Pile-up remains modest and well within acceptable operating limits.
- The arbitration tree preserves spatial fairness even as the throughput demand increases.

From a queuing theory perspective, the system remains accurately described by an M/U/1/1024 model. However, as the event rate increases, queuing becomes statistically more probable, and the readout delay distribution progressively deviates from the ideal uniform form. This demonstrates the arbitration tree's ability to absorb increased traffic while maintaining performance and fairness.

# E. Case Study IV: High Event Rate (Saturation)

This final simulation scenario evaluates the behavior of the EDWARD system under event rates at and beyond readout saturation. Each pixel was configured to generate requests at an average rate of approximately 242.6 kS/s, yielding a cumulative matrix-level event rate of roughly 248.3 MS/s, about 14 times the maximum throughput of the readout path ($f_{\rm acki} = 17.86$ MHz, i.e., $T_{\rm ack} = 56$ ns).
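The offered load $\rho = \lambda_{\rm total}/f_{\rm acki}$ summarizes the difference between the regimes. The sketch below also applies a simple fluid approximation, which is an assumption not used in the text: if the readout is busy on every acknowledge tick, at most $f_{\rm acki}$ requests per second can be accepted, so roughly $1 - 1/\rho$ of all arrivals must pile up, using the pile-up definition given in the statistical summary of this section.

```python
F_ACK = 250e6 / 14   # acknowledge-token rate: one grant per T_ack = 56 ns


def utilization(total_rate):
    """Offered load rho = matrix-level request rate / acknowledge-token rate."""
    return total_rate / F_ACK


def saturated_pileup(total_rate):
    """Fluid estimate of the piled-up fraction when every tick grants a request.

    Only meaningful for rho > 1; below saturation the estimate is clamped to 0.
    """
    rho = utilization(total_rate)
    return max(0.0, 1.0 - 1.0 / rho)


for name, rate in [("Case II (low)", 971.6e3), ("Case IV (saturation)", 248.3e6)]:
    print(f"{name}: rho = {utilization(rate):.4f}, "
          f"fluid pile-up estimate = {saturated_pileup(rate):.1%}")
```

The low-rate load of about 5.4% matches the figure quoted for Case Study II, and the fluid estimate of roughly 92.8% for Case Study IV is close to the pile-up above 92.6% reported in Fig. 4.18e; the fluid model ignores the start-up transient during which the pending slots first fill.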

As shown in Figures 4.18a and 4.18b, the event generation process for pixels 1 and 500 remains exponential and memoryless, consistent with a true Poisson process. The tight match between the histogram and the fitted exponential distribution confirms that the generator continues to produce uncorrelated, independently timed events.

However, the readout delay distributions in Figures 4.18c and 4.18d depart dramatically from prior cases. Instead of a flat or slowly sloping histogram, we now observe a sharply rising edge near the maximum possible delay (approximately  $5 \,\mu s$ ). This staircase-like distribution indicates that nearly all requests experience significant queuing. Most events are no longer serviced in the first token cycle but are delayed by several cycles due to persistent arbitration congestion.

This behavior corresponds to the fully loaded, or backlogged, condition of the arbitration tree: requests from many pixels compete simultaneously, forcing sequential token delivery. Despite this, the system remains stable: every request that enters the arbitration tree is eventually served, and no starvation is observed.

Figure 4.18e confirms this high-load state: pile-up rates exceed 92.6% across all pixels. This is expected when event rates exceed servicing capacity, causing most new requests to arrive before the previous ones are cleared.

Nonetheless, as demonstrated in Fig. 4.18f, the standard deviation of the delay remains highly uniform across the matrix, even under severe congestion. This is strong evidence that the arbitration logic remains fair: it does not favor specific rows, columns, or tree depths, even when saturated.

This regime highlights a key strength of the EDWARD design: even under worst-case load, with continuous event flooding and queuing, the system ensures:

- Every accepted request is served without corruption or data loss.
- Spatial fairness is preserved with no pixel showing systematic delay deviation.
- Arbitration remains predictable, bounded, and stable.

From a queueing theory perspective, the system now operates in the saturated state of the M/U/1/1024 queue. The queue is full nearly all the time, and new arrivals are either immediately queued or contribute to a pile-up. The readout delay converges toward a maximum span bounded by:

$$t_{\rm max} \approx T_{\rm ack} \cdot N_{\rm eff}, \quad N_{\rm eff} \sim {\rm number\ of\ queued\ requests}$$

This mode confirms the robustness of the arbitration tree in ultra-high-throughput conditions.

# F. Statistical Summary and Pile-Up Analysis

The conducted digital simulations across different event rates, from sparse activity to full saturation, provide a comprehensive view of the EDWARD architecture's performance envelope and validate its robustness under various load conditions.

A key performance metric analyzed across all regimes was the readout delay, defined as the time elapsed between a pixel generating a request and the moment its data is latched in the periphery. The delay distributions evolved in a predictable and interpretable manner:

- Under **single-pixel activation**, the delay was uniformly distributed within the acknowledge clock cycle span, with no contention or queuing. This confirms the correctness of the arbitration logic in isolation.
- In the **low-rate regime**, with all pixels active but generating sparse events, most readouts occurred within one or two token cycles. The delay distributions closely resembled those of the single-pixel case, with slight asymmetry resulting from rare arbitration events.
- At **medium event rates**, queuing became more prominent, causing a broader distribution of delays. Despite increased contention, pile-up remained below 0.5% and arbitration fairness was maintained.
- In the **high-rate regime**, the system operated at or above its maximum servicing capacity. Nearly all requests were queued, resulting in readout delays approaching the theoretical maximum, and pile-up rates exceeding 92%. Nevertheless, the arbitration tree remained fully functional and fair.

# Pile-Up Behavior

The pile-up rate, defined as the fraction of events that cannot be serviced because another request is already pending in the given pixel, serves as an indirect measure of congestion. Across all scenarios, it behaved as expected:

- Near-zero in low-rate conditions (Fig. 4.16e).
- Moderate and spatially uniform under medium rates (Fig. 4.17e).
- Approaching saturation uniformly across all pixels at high rates (Fig. 4.18e).

This smooth and controlled increase confirms that the arbitration tree does not suffer from any structural bottlenecks, even when it is fully saturated. The pile-up grows uniformly, rather than concentrating in specific regions, which validates the tree's spatial fairness.

# **Delay Dispersion and Fairness**

Figures 4.16f, 4.17f, and 4.18f collectively demonstrate that:

- The standard deviation of the readout delay is uniform across all 1,024 pixels.
- No edge effects or positional biases were observed, despite differing tree depths.
- Even under heavy contention, the variance stays bounded, confirming arbitration stability.

# **Queueing-Theoretic Perspective**

The observed behavior of the system matches the predictions of the M/U/1/1024 queueing model:

- M: Events are generated by each pixel independently with exponential inter-arrival times.
- U: The service time is uniformly distributed because acknowledge tokens are generated periodically, so a request arriving at a random instant waits for a uniformly distributed fraction of the token period.
- 1: Only one request is serviced at a time via the shared bus.
- 1024: The system's effective capacity equals the number of pixels.

This modeling framework not only explains the system's statistical response across load levels but also provides a predictive basis for parameter tuning and future scalability studies.
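The M/U/1/1024 picture can be explored numerically with a small token-slotted simulation. The sketch below is an illustration under stated assumptions, not the simulation environment behind Figures 4.16 to 4.18: each pixel emits Poisson events and can hold at most one pending request, one acknowledge token arrives every `t_ack` (assumed 100 ns, the 10 MHz token rate used in Chapter 5), and pending requests are served in arrival order as a simplification of first-come-first-served arbitration. The wait until the next token edge supplies the uniform service component. The function name `simulate_readout` and its parameters are hypothetical.

```python
import heapq
import random
from collections import deque


def simulate_readout(rate_per_pixel, n_pixels=1024, t_ack=100e-9,
                     n_tokens=20_000, seed=1):
    """Token-slotted sketch of an M/U/1/N pixel readout queue.

    Each pixel emits Poisson events at `rate_per_pixel` (Hz) and holds at
    most one pending request; an event arriving while its pixel is still
    pending is counted as pile-up. One acknowledge token every `t_ack`
    seconds services the oldest pending request (FIFO approximation).
    """
    rng = random.Random(seed)
    # Min-heap of (next event time, pixel index)
    arrivals = [(rng.expovariate(rate_per_pixel), p) for p in range(n_pixels)]
    heapq.heapify(arrivals)
    pending_since = [None] * n_pixels  # arrival time of each pending request
    fifo = deque()                     # pending pixels in arrival order
    events = piled_up = served = 0
    total_delay = 0.0

    for k in range(1, n_tokens + 1):
        t_token = k * t_ack
        # Register every event that occurred up to this token edge
        while arrivals[0][0] <= t_token:
            t_event, p = heapq.heappop(arrivals)
            events += 1
            if pending_since[p] is None:
                pending_since[p] = t_event
                fifo.append(p)
            else:
                piled_up += 1  # pixel already busy: the event is lost
            next_t = t_event + rng.expovariate(rate_per_pixel)
            heapq.heappush(arrivals, (next_t, p))
        # Each token services at most one request
        if fifo:
            p = fifo.popleft()
            total_delay += t_token - pending_since[p]
            pending_since[p] = None
            served += 1

    duration = n_tokens * t_ack
    return {
        "throughput": served / duration,
        "pile_up": piled_up / events if events else 0.0,
        "mean_delay": total_delay / served if served else 0.0,
    }
```

At 100 Hz per pixel the pile-up stays near zero and the mean delay sits near half a token period; at 100 kHz per pixel the throughput pins at the token rate while most events pile up, mirroring the trend between the low- and high-rate regimes above.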

# Conclusion

The digital simulation results confirm that the EDWARD arbitration architecture:

- Scales linearly with request rate up to saturation,
- Preserves fairness across all pixels,
- Remains robust and deadlock-free even under severe contention,
- Matches queueing-theoretic predictions, validating both design and analytical models.

These findings validate the proposed asynchronous event-driven arbitration mechanism as a viable and efficient solution for high-density pixel readout systems.



Figure 4.16: Results obtained during simulation with low event rate.



Figure 4.17: Results obtained during simulation with medium event rate.



Figure 4.18: Results obtained during simulation with high event rate.

## Chapter 5

# **Experimental Validation**

## 5.1 Introduction and Objectives

The **EDWARD** architecture, introduced and analyzed in the preceding chapters, offers a novel event-driven readout solution with asynchronous, non-priority arbitration designed to optimize throughput, minimize latency, and ensure fairness in pixel radiation detectors. While extensive simulation-based validation has already demonstrated the theoretical advantages of the proposed design, highlighting its ability to overcome the inherent limitations of frame-based and polling-based architectures, it is essential to evaluate the system under real-world operational conditions.

This chapter presents the experimental validation of two prototype ASICs that embody the EDWARD readout scheme. The first device, **3FI65P1**, is a fully functional hybrid readout chip that integrates an analog front-end with the EDWARD architecture. Developed for full-field X-ray fluorescence imaging applications, it includes a complete signal acquisition and processing chain: charge-sensitive amplifiers, shaping filters, discriminators, extremum (peak) detectors, sample-and-hold stages, and both analog and digital output paths [120]. The 3FI65P1 has been experimentally tested both in laboratory environments and during synchrotron beamline experiments at the National Synchrotron Light Source II (NSLS-II), enabling validation of the architecture in real data acquisition conditions [121].

The second device, **EDWARD65P1**, serves as a minimal, fully digital implementation featuring integrated per-pixel event generators. It was specifically designed to test the temporal behavior, arbitration integrity, and throughput potential of the architecture in isolation from the complexities of the analog front-end. By emulating Poisson-distributed radiation hits using in-pixel generators, the EDWARD65P1 enables controlled, repeatable evaluation of readout dynamics across a  $32 \times 32$  pixel matrix [127].

The primary objectives of this chapter are to:

- Demonstrate the correct functionality and protocol compliance of both ASICs through structured testing.
- Verify timing performance and arbitration fairness under varying event rates, as predicted by simulation.
- Measure latency, throughput, and readout integrity under realistic and saturated operating conditions.
- Evaluate the analog front-end performance and capabilities of the 3FI65P1 chip.
- Compare experimental results with simulation predictions to assess the fidelity of the digital models and identify any implementation-induced deviations.

These validation results provide critical insight into the real-world behavior of the EDWARD readout architecture and establish a foundation for future large-scale implementations targeting high-rate radiation detection applications.

### 5.2 Experimental Setup

#### 5.2.1 ASIC Characterization Platforms

To experimentally evaluate the EDWARD architecture implemented in silicon, two custom-developed ASICs – **EDWARD65P1** and **3FI65P1** – were tested under distinct but complementary setups [127]. Both devices were mounted on custom-designed daughterboards and interfaced with a National Instruments sbRIO-9629 platform, which provided programmable control via an AMD Artix-7 Field-Programmable Gate Array (FPGA) and an Intel Atom embedded processor [129]. The block diagram of this modular platform is shown in Figure 5.1. The platform enabled remote configuration, automated data acquisition, and real-time monitoring through a custom Graphical User Interface (GUI). The hardware setup is presented in Figure 5.2.

For the **EDWARD65P1**, testing focused on digital protocol validation. The FPGA was configured to capture serialized address outputs, analyze timing behavior, and measure arbitration fairness. Events were generated internally in each pixel using a built-in Poisson-based emulator, thereby eliminating the need for external stimulation sources. The fast clock driving the serializer was set to 140 MHz, which was identified as the highest frequency at which the digital data stream maintained integrity in the current setup. The acknowledge token clock, used for arbitration and synchronization, was derived by dividing this clock by 14, resulting in an effective arbitration clock of 10 MHz. It should be noted that the setup was not optimized for speed – several chip configuration options that may further improve signal integrity and throughput were not yet fully explored at the time of testing.

For the **3FI65P1** chip, which includes a full analog front-end, the testing environment was expanded to accommodate analog signal acquisition. Analog outputs were digitized using an AD9649 14-bit ADC operating at up to 65 MSPS [130]. The analog data stream was synchronized with the digital address output during processing on the FPGA. The detector assembly was cooled with a Peltier module to maintain a stable operating temperature, typically around 0 °C to 5 °C, during continuous operation. Additional power regulation, biasing DACs, and monitoring circuits were integrated on the daughterboard.

#### **5.2.2** Synchrotron Measurement at NSLS-II

To validate the full-field imaging capability of the 3FI65P1 under realistic conditions, the ASIC was bump-bonded to a planar silicon sensor and deployed at beamline 17-BM of the NSLS-II. An aerial view of NSLS-II is shown in Figure 5.3. The goal was to assess energy-resolving performance under continuous exposure to X-ray fluorescence emissions from thin-film reference samples. The incident X-ray beam excited characteristic fluorescence from elements such as calcium and copper, which was then detected by the sensor. The setup used at the beam is shown in Figure 5.4.



Figure 5.1: NI sbRIO-9629 Block Diagram [129]



Figure 5.2: Setup used for testing; the daughterboard can be swapped for the one containing the EDWARD65P1 or 3FI65P1 chip [127].



Figure 5.3: Aerial view of the National Synchrotron Light Source II (NSLS-II) at Brookhaven National Laboratory [131].



Figure 5.4: Schematic of the beamline configuration with the main components: X-ray beam, sample, and detector.

#### 5.2.3 Configuration and Control Methodology

Chip configuration was carried out over a dedicated I<sup>2</sup>C link driven by the sbRIO controller (see Fig. 5.5). Each pixel- and group-level configuration word was serialized on the external bus and deserialized internally by the SPB, enabling selective activation, threshold tuning, and readout-mode selection. For **3FI65P1**, this included enabling the analog front-end, choosing between standard and charge-sharing readout, toggling normal/test operation, and setting the individual discriminator thresholds.



Figure 5.5: Oscilloscope capture of a complete $400\,\mathrm{kHz}\,\mathrm{I}^2\mathrm{C}$ write transaction used to program a single pixel. Red (Ch<sub>2</sub>) is SDA, blue (Ch<sub>1</sub>) is SCL; vertical cyan cursors mark byte boundaries [120]. After the **START** condition, the master sends the 7-bit chip address plus the write bit ($\mathrm{R}/\overline{\mathrm{W}} = 0$), followed by the 8-bit **group address** and **pixel address**. Three data bytes load configuration banks [0-2], a mandatory **dummy** byte finalizes the on-chip SPB pipeline, and a **STOP** terminates the transaction.
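The frame layout in Figure 5.5 can be mirrored in a few lines of host-side code. This is a hypothetical sketch: the function names, the example chip address `0x2A`, and the dummy-byte value `0x00` are assumptions, not values taken from the chip documentation. The timing estimate uses the standard I<sup>2</sup>C rule of nine SCL cycles per byte (eight data bits plus the acknowledge bit) and ignores START/STOP overhead.

```python
I2C_SCL_HZ = 400_000  # Fast-mode I2C clock rate, as in Fig. 5.5


def build_pixel_write(chip_addr, group_addr, pixel_addr, config_banks):
    """Return the bytes of one pixel-configuration write (hypothetical helper).

    Order follows Fig. 5.5: address byte (7-bit chip address + write bit),
    group address, pixel address, configuration banks [0-2], dummy byte.
    START and STOP are bus conditions, not bytes, so they are not listed.
    """
    if not 0 <= chip_addr < 0x80:
        raise ValueError("chip address must fit in 7 bits")
    if len(config_banks) != 3:
        raise ValueError("exactly three configuration banks are expected")
    address_byte = (chip_addr << 1) | 0  # R/W bit = 0 selects a write
    dummy = 0x00                         # assumed value; only its presence matters
    frame = [address_byte, group_addr, pixel_addr, *config_banks, dummy]
    if any(not 0 <= b <= 0xFF for b in frame):
        raise ValueError("every field must fit in one byte")
    return bytes(frame)


def transaction_time_s(frame, scl_hz=I2C_SCL_HZ):
    """Approximate bus time: 9 SCL cycles per byte (8 data bits + ACK)."""
    return len(frame) * 9 / scl_hz
```

Seven bytes at 400 kHz take about 157.5 µs, so if every pixel of a 1,024-pixel matrix is written with one such transaction, a full per-pixel configuration would take roughly 0.16 s over this link.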

For EDWARD65P1, the configuration enabled selection of the in-pixel generator rates. Event-rate, latency, and throughput statistics were collected over multiple runs with different mask and divider configurations, providing data to validate the simulation models and to characterize the behavior of both the Poisson process generator and the EDWARD readout architecture across operating conditions.

### 5.3 Results: 3FI65P1 Experimental Testing

#### 5.3.1 Readout-Mode Verification of the 3FI65P1 Prototype

The 3FI65P1 ASIC can operate in four mutually exclusive, event-driven readout modes that trade raw throughput for improved spectroscopic fidelity. Two of these modes were exercised on a bump-bonded $32 \times 32$ prototype coupled to a $320\,\mu\mathrm{m}$-thick silicon sensor and hosted on a compact sbRIO-9629 carrier board equipped with an AD9649 14-bit/65 MSPS digitizer. The FPGA on the carrier delivered the acknowledge tokens for the EDWARD arbitration tree and timestamped both the serialized pixel address stream and the buffered analog waveform.

#### Single-pixel (throughput-optimized) mode.

Upon a local discriminator trigger, the pixel captures nine peak values internally but transmits only the amplitude of the *central* channel together with its digital address. Figure 5.6 (left) shows the characteristic staircase: the blue trace (left axis) is the analog amplitude held on the sample-and-hold (S&H), while the red trace (right axis) is the concurrently streamed 14-bit address. The one-to-one correlation between amplitude level and address verified that the on-chip analog buffer drives the external load without distortion and that the asynchronous arbitration releases the analog bus exactly once per event. No spurious samples were observed.

#### Charge-sharing compensation (energy-optimized) mode.

In the second mode, a single trigger from the central pixel initiates a nine-phase sequence that reads out the peak amplitudes of the central pixel (*self*) followed by its eight first-ring neighbors in a fixed geographic order (N, NW, W, SW, S, SE, E, NE). Figure 5.6 (right) illustrates the expected nine-step blue waveform; the address line remains constant until the final phase, confirming that the pixel retains ownership of both the digital and analog buses throughout the multi-phase handshake. Post-processing the nine amplitudes off-chip reconstructs the total deposited charge and mitigates tail broadening induced by lateral charge diffusion, at the cost of a nine-fold increase in bus occupancy.
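The off-chip reconstruction step can be illustrated with a simple neighbour-summation rule. The exact algorithm is not spelled out here, so the baseline subtraction and neighbour threshold below are assumptions, and `reconstruct_charge` and `effective_event_rate` are hypothetical names. The second helper assumes each of the nine phases occupies one acknowledge cycle, which is what a nine-fold increase in bus occupancy implies.

```python
# Readout order of the nine phases in charge-sharing mode
PHASE_ORDER = ("self", "N", "NW", "W", "SW", "S", "SE", "E", "NE")


def reconstruct_charge(amplitudes, baseline=0.0, threshold=0.0):
    """Sum the nine phase amplitudes of one charge-sharing readout.

    `amplitudes` are the nine digitized samples in PHASE_ORDER. Each sample
    is baseline-subtracted; neighbours at or below `threshold` are treated
    as noise and dropped, while the central pixel is always kept.
    """
    if len(amplitudes) != len(PHASE_ORDER):
        raise ValueError("expected nine amplitudes")
    corrected = dict(zip(PHASE_ORDER, (a - baseline for a in amplitudes)))
    total = corrected["self"]
    for name in PHASE_ORDER[1:]:
        if corrected[name] > threshold:
            total += corrected[name]
    return total


def effective_event_rate(ack_rate_hz, phases):
    """Event rate when each event occupies `phases` acknowledge cycles."""
    return ack_rate_hz / phases
```

With a 10 MHz token rate, `effective_event_rate(10e6, 9)` is about 1.11 MHz, which is the throughput cost of collecting all nine amplitudes per event.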





Figure 5.6: Data recorded during self-pixel readout (left) and charge-sharing readout (right). Blue – digitized analog amplitude, red – 14-bit pixel address decoded from the data stream [121].

#### Key observations.

- **Handshake integrity:** neither mode exhibited malformed frames or bus contentions, demonstrating that the non-priority EDWARD arbitration preserves ordering even during extended, multi-phase transfers.
- **Throughput vs. resolution:** single-pixel mode sustained a throughput of 10 MHz. Charge-sharing mode reduces instantaneous throughput by a factor of nine but enables charge-sharing reconstruction.
- **Latency:** token-to-analog latency remained constant across modes, confirming that additional phases are pipelined without re-arbitration.

These measurements validate both functional correctness and the ability to switch readout strategy dynamically at the pixel level, a prerequisite for adaptive detector operation at synchrotron beamlines.

#### **5.3.2** Preliminary Synchrotron Beamline Results

The **3FI65P1** prototype was evaluated at the X-ray Footprinting of Biological Materials (XFP) beamline 17-BM of NSLS-II to verify its spectrometric performance under realistic photon-flux conditions. A set of thin metallic foils (Ca, Mn, Cu, Pb, and Zr) was successively illuminated with a focused monochromatic beam. At the same time, the detector, consisting of the 3FI65P1 ASIC bump-bonded to a 320 µm thick Si sensor, was operated in *charge-sharing compensation* mode and actively cooled to temperatures around 0 °C.

Figure 5.7 reproduces a representative fluorescence spectrum collected during the campaign (adapted from the preprint [121]). All major emission lines are clearly resolved, including the silicon escape peak associated with the Zr  $K_{\alpha}$  transition, confirming proper gain calibration across the full  $2.5\,\mathrm{keV}$  to  $20\,\mathrm{keV}$  window. The energy resolution extracted from the Ca  $K_{\alpha}$  and Cu  $K_{\alpha}$  peaks reached  $138\,\mathrm{eV}$  and  $308\,\mathrm{eV}$  FWHM, respectively.
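Two quantities quoted above can be checked with elementary arithmetic. The line energies below are standard tabulated Kα1 values, not values measured in this work; the silicon escape peak appears one Si Kα photon energy below its parent line, and dividing the measured FWHM by the line energy gives the relative resolution. The helper names are hypothetical.

```python
import math

# Tabulated K-alpha-1 line energies in eV (reference values, not measured here)
K_ALPHA_EV = {"Si": 1739.98, "Ca": 3691.68, "Cu": 8047.78, "Zr": 15775.1}
# FWHM / sigma for a Gaussian peak, approximately 2.3548
SIGMA_TO_FWHM = 2.0 * math.sqrt(2.0 * math.log(2.0))


def escape_peak_ev(parent):
    """Parent line energy minus the Si K-alpha photon that escapes the sensor."""
    return K_ALPHA_EV[parent] - K_ALPHA_EV["Si"]


def relative_resolution(fwhm_ev, line):
    """Measured FWHM as a fraction of the line energy."""
    return fwhm_ev / K_ALPHA_EV[line]


def fwhm_from_sigma(sigma_ev):
    """Convert a fitted Gaussian standard deviation to FWHM."""
    return SIGMA_TO_FWHM * sigma_ev
```

This places the Zr escape peak near 14.0 keV and gives relative resolutions of roughly 3.7 % (Ca Kα, 138 eV) and 3.8 % (Cu Kα, 308 eV).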

A comprehensive analysis of count-rate capability, pixel-to-pixel gain dispersion, and charge-sharing reconstruction efficiency will be detailed in a forthcoming journal publication. Nevertheless, these preliminary synchrotron results already demonstrate that the mixed analog-digital architecture of 3FI65P1 meets the energy-resolution targets required for full-field X-ray fluorescence imaging and that the EDWARD event-driven readout concept performs reliably in a high-background beamline environment.



Figure 5.7: Representative fluorescence spectrum acquired at NSLS-II beamline 17-BM with sequential Ca, Mn, Cu, Pb, and Zr foil targets [121].

#### 5.4 Results: EDWARD65P1 Performance Validation

The **EDWARD65P1** ASIC was designed as a digital-only test chip to isolate and evaluate the arbitration mechanism and overall timing performance of the EDWARD readout architecture. By embedding a programmable event generator in each pixel, the chip enabled a systematic study of readout behavior across a range of event-generation rates and under varied operating and arbitration conditions.

#### 5.4.1 Clock Generator Characterization

Each pixel in the **EDWARD65P1** chip contains a tunable, pausable oscillator designed to emulate event generation using a Poisson process. A 4-bit configuration field sets the oscillator frequency, and the oscillator output is further divided by a programmable clock divider. To assess the frequency tuning characteristics of the generator and verify its agreement with simulation, the chip was configured so that only one pixel was active at a time while the others were disabled.

In this test, the programmable divider was set to a constant value of 30. This configuration ensured that the divided oscillator output had a significantly lower frequency than the readout clock (10 MHz), thereby allowing precise measurement of the event rate. Using a lower event frequency improved the resolution of the frequency measurement, as it reduced quantization error in the time-domain counting process. Importantly, in this setup, the divided clock output was used directly as the source for generating readout requests, bypassing the remaining elements of the Poisson event generator.

Figure 5.8 presents the measured oscillator frequency as a function of the 4-bit configuration setting. The data were obtained by counting the number of events emitted by each pixel over a fixed acquisition window of 10 s. The figure includes an error bar plot summarizing the statistical spread in frequency observed across 320 measured pixels for each configuration code. Each point denotes the mean frequency, while the vertical bars represent the standard deviation of the measured frequencies, providing insight into pixel-to-pixel variability.
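The counting measurement amounts to a short conversion from raw counts to frequency. As an assumption, the sketch takes the plotted value to be the undivided oscillator frequency, recovered by multiplying the divided event rate by the divider of 30; the function names are hypothetical.

```python
import statistics

DIVIDER = 30     # fixed divider used in this measurement
WINDOW_S = 10.0  # acquisition window per pixel, in seconds


def oscillator_frequency_hz(event_count, divider=DIVIDER, window_s=WINDOW_S):
    """Recover the undivided oscillator frequency from a raw event count.

    Assumes every divided-clock edge produced exactly one readout request,
    so the divided frequency is count / window and the oscillator runs
    `divider` times faster.
    """
    return divider * event_count / window_s


def summarize(counts_per_pixel):
    """Mean and standard deviation across pixels: one error-bar point."""
    freqs = [oscillator_frequency_hz(n) for n in counts_per_pixel]
    return statistics.mean(freqs), statistics.stdev(freqs)


# Resolution implied by a +/-1 count ambiguity over the window
RESOLUTION_HZ = DIVIDER * 1 / WINDOW_S
```

Under this assumption, a ±1 count ambiguity over the 10 s window corresponds to ±3 Hz at the oscillator.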

The measured values closely follow the post-layout simulations and show a monotonic tuning curve, confirming the correct implementation of the oscillator and its control logic. A systematic upward shift with respect to simulation is visible and can be attributed to the oscillator's sensitivity to Process, Voltage, and Temperature (PVT) variations. This offset, however, is not a problem from an operational standpoint.



Figure 5.8: Measured vs. simulated oscillator frequency across 4-bit configuration values for a subset of 320 pixels. Points represent mean frequencies; vertical error bars show the standard deviation across measured channels. The measurements were performed with a fixed divider of 30 and direct use of the divided clock for request generation [127].

#### 5.4.2 Readout and Generator Characterization

This section presents the characterization of the **EDWARD65P1** ASIC, a digital prototype designed to evaluate the event-driven readout architecture. Each pixel includes an internal digital generator that produces Poisson-distributed events with configurable rates. Measurements focused on verifying both the readout system under different matrix loads and the statistical quality of the in-pixel generators.

The inter-event interval distributions were first studied in a configuration where all pixels were active. The number of events generated per pixel was tuned using a binary mask, which sets the threshold in the pseudo-random number generator embedded in each pixel. Figure 5.9 presents the measured inter-event interval Probability Density Functions (PDFs) for masks 20, 15, 10, and 9; a clear transition from exponential-like behavior to saturation is observed as the mask setting decreases. At low (mask 20) and moderate (mask 15) generation rates, the distribution closely approximates the theoretical exponential curve, with minor deviations attributed to small frequency shifts caused by power-supply loading when many pixels are simultaneously active. As the rate increases (masks 10 and 9), the system enters saturation: the acknowledge bandwidth (limited to 10 MHz) becomes the bottleneck. This saturation manifests as a sharp histogram peak at the minimum interval bin. The structure of the arbiter tree explains this: once serviced, channel 0 must wait for all other 1,023 channels to be serviced before it can receive a token again. Despite this, fairness is preserved: no channel is starved. Figure 5.10 complements these results with the PDFs recorded when only one pixel is active (masks 5, 2, 1, 0), highlighting the accuracy of the Poisson generator at moderate rates and its systematic breakdown at very high rates.
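The fixed saturation interval can be reproduced with a simplified, clocked behavioural model of the arbiter tree. This is not the RS-latch circuit: it assumes that when both inputs of a node are pending, the node grants the side it did not serve last (the side that has been waiting), which is how a first-come-first-served cell with memory behaves when requests are always present. All class and function names are hypothetical.

```python
class Leaf:
    """A pixel: requests whenever it has pending data."""

    def __init__(self, index):
        self.index = index

    def requesting(self, pending):
        return pending[self.index]

    def grant(self, pending):
        return self.index


class ArbiterNode:
    """Clocked stand-in for one two-input arbiter cell with memory.

    If both subtrees request, the grant goes to the side that was not served
    last; otherwise it goes to the only requesting side. Call `grant` only
    when the node is requesting.
    """

    def __init__(self, left, right):
        self.left, self.right = left, right
        self.last = 1  # first tie is resolved in favour of the left child

    def requesting(self, pending):
        return self.left.requesting(pending) or self.right.requesting(pending)

    def grant(self, pending):
        want_left = self.left.requesting(pending)
        want_right = self.right.requesting(pending)
        if want_left and want_right:
            side = 1 - self.last
        else:
            side = 0 if want_left else 1
        self.last = side
        child = self.left if side == 0 else self.right
        return child.grant(pending)


def build_tree(lo, hi):
    """Balanced binary tree over pixels lo..hi-1 (hi - lo a power of two)."""
    if hi - lo == 1:
        return Leaf(lo)
    mid = (lo + hi) // 2
    return ArbiterNode(build_tree(lo, mid), build_tree(mid, hi))


def saturated_service_order(n_pixels=1024, n_tokens=2048):
    """Every pixel always requests; return the pixel served by each token."""
    root = build_tree(0, n_pixels)
    pending = [True] * n_pixels
    return [root.grant(pending) for _ in range(n_tokens)]
```

With every pixel requesting, the model visits all 1,024 pixels once per 1,024 tokens, so pixel 0 is served at tokens 0 and 1024; at a 10 MHz token rate this corresponds to a minimum inter-event interval of about 102.4 µs.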

To isolate and verify the Poisson behavior of the in-pixel generators independently of the arbitration tree, measurements were repeated with only one pixel active. In these conditions, the measured inter-arrival intervals match the exponential distribution across a wide range of settings (masks 20, 15, 10, 5). As the rate increases further (masks 2 and 1), deviations appear due to the increasing Bernoulli probability in the generator, reflecting the limitations of the digital approximation to a true Poisson process. At mask 0, where p=1, the generator outputs events in every eligible cycle, violating the statistical assumptions of a Poisson model and resulting in a histogram that no longer resembles an exponential function.
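The breakdown at high rates follows from the discrete nature of the generator. A minimal sketch, assuming each eligible clock cycle fires with a fixed probability `p` (the exact mapping from mask value to `p` is not reproduced here): the resulting intervals are geometric rather than exponential, and the geometric law approximates an exponential only for small `p`. Function names are hypothetical.

```python
import random


def bernoulli_intervals(p, n_intervals, seed=0):
    """Inter-event intervals (in clock cycles) of a Bernoulli-thinned clock.

    On every eligible cycle an event fires with probability p, so the
    interval between events is geometric with mean 1/p cycles.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    rng = random.Random(seed)
    intervals, gap = [], 0
    while len(intervals) < n_intervals:
        gap += 1
        if rng.random() < p:  # random() is in [0, 1), so p = 1 always fires
            intervals.append(gap)
            gap = 0
    return intervals


def geometric_pmf(k, p):
    """Exact probability of an interval of k cycles (k >= 1)."""
    return (1.0 - p) ** (k - 1) * p
```

For `p = 1` every interval equals one cycle, the collapse to a single bin seen at mask 0; for small `p` the geometric probability $(1-p)^{k-1}p$ closely tracks an exponential decay.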

The inter-event interval PDFs were constructed by configuring the ASIC, generating events internally, recording the timestamps of successful readouts synchronized with the 10 MHz acknowledge clock, and histogramming the differences between successive timestamps. These histograms were then normalized to PDFs for direct comparison with the theoretical exponential form.
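The histogram-and-normalize procedure can be sketched as follows, assuming timestamps are sorted integer counts of the 10 MHz acknowledge clock. The names are hypothetical; the normalization makes the densities integrate to one so that they can be overlaid on $\lambda e^{-\lambda t}$.

```python
import math

T_ACK_S = 100e-9  # one tick of the 10 MHz acknowledge clock


def interval_pdf(timestamps_ticks, bin_ticks):
    """Histogram successive timestamp differences and normalize to a PDF.

    Returns (bin_centres_s, density_per_s) such that the densities times
    the bin width in seconds sum to one.
    """
    diffs = [b - a for a, b in zip(timestamps_ticks, timestamps_ticks[1:])]
    if not diffs:
        raise ValueError("need at least two timestamps")
    n_bins = max(diffs) // bin_ticks + 1
    counts = [0] * n_bins
    for d in diffs:
        counts[d // bin_ticks] += 1
    width_s = bin_ticks * T_ACK_S
    centres = [(i + 0.5) * width_s for i in range(n_bins)]
    density = [c / (len(diffs) * width_s) for c in counts]
    return centres, density


def exponential_pdf(t_s, rate_hz):
    """Theoretical reference curve lambda * exp(-lambda * t)."""
    return rate_hz * math.exp(-rate_hz * t_s)
```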

Each panel in the result figures includes key metadata, as defined in Table 5.1, such as the channel's oscillator frequency, estimated generation rates, and observed throughput when all channels are active.

| Symbol                                           | Description                                                    |
|--------------------------------------------------|----------------------------------------------------------------|
| $f_{\rm ch0}$                                    | Oscillator frequency measured for channel 0 (prior clock test) |
| $\lambda_{\rm ch}$                               | Mean event rate produced by channel 0                          |
| $\lambda_{\rm m} = \lambda_{\rm ch} \times 1024$ | Projected matrix-wide rate assuming uniform generation         |
| $\mu_{\rm all}$                                  | Rate of readouts delivered when all pixels are active          |

Table 5.1: Definitions of parameters displayed in Figures 5.9 and 5.10.
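The projected matrix rate and an idealized readout rate follow directly from the definitions in Table 5.1. The saturating `min` model below is an idealization of the linear-then-flat behaviour described in the text, assuming a 10 MHz token limit and ignoring pile-up losses; the function names are hypothetical.

```python
N_PIXELS = 1024
F_ACK_HZ = 10e6  # acknowledge-token rate in these measurements


def projected_matrix_rate(lambda_ch_hz, n_pixels=N_PIXELS):
    """lambda_m = lambda_ch * 1024, assuming every pixel behaves like channel 0."""
    return lambda_ch_hz * n_pixels


def ideal_readout_rate(lambda_m_hz, f_ack_hz=F_ACK_HZ):
    """Idealized mu_all: equal to the offered rate below the token limit, capped at it above."""
    return min(lambda_m_hz, f_ack_hz)
```

For example, a channel rate of 5 kHz projects to 5.12 MHz, within capacity, while 20 kHz projects to 20.48 MHz, which the idealized model caps at 10 MHz.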

To evaluate **arbitration fairness**, the matrix was operated in a uniform mode, where all 1024 pixels had identical generator settings (mask 15). Figure 5.11 shows that the event count per pixel is uniform, indicating that the arbitration logic grants readout access evenly without favoring any particular channel. **Throughput measurements** (Fig. 5.12) further confirm that the system scales linearly with global event rate until the acknowledge frequency limit is reached.
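The uniformity of per-pixel counts, as in Figure 5.11, can be condensed into single numbers. The sketch uses Jain's fairness index, a standard metric from networking that is not used in the dissertation itself, together with a variance-to-mean ratio; both function names are hypothetical.

```python
import statistics


def jain_fairness(counts):
    """Jain's index: 1.0 when all pixels got equal shares, 1/n when one got all."""
    total = sum(counts)
    return total * total / (len(counts) * sum(c * c for c in counts))


def dispersion_vs_poisson(counts):
    """Observed variance divided by the Poisson expectation (variance = mean).

    Values near or below 1 are consistent with no systematic bias; values
    well above 1 would suggest that some pixels are favoured.
    """
    return statistics.variance(counts) / statistics.mean(counts)
```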



Figure 5.9: Inter-event interval PDFs at various mask settings. Saturation appears for masks 9 and 10 when all pixels are enabled.



Figure 5.10: Inter-event interval PDFs for a single active pixel. Deviation from the exponential shape appears as the mask values decrease.



Figure 5.11: Number of readout events per pixel for the mask configuration 15 [127].



Figure 5.12: Total matrix throughput versus event generation rate. System saturates near the acknowledge limit of 10 MHz [127].

## Chapter 6

## **Conclusion**

### **6.1** Restatement of Research Purpose

This dissertation addresses a pressing problem in radiation detection: how to efficiently read out and process signals from highly granular pixelated detectors without incurring data bottlenecks or losses. The research aimed to develop a new pixel detector architecture combining in-situ signal processing with an event-triggered, throughput-optimized readout. In particular, the work sought to move beyond traditional frame-based, priority-driven readout and address event representation schemes by exploring a fully asynchronous, event-driven approach. The goal was to enable each pixel to act independently, processing signals on-pixel as events occur and transmitting data on demand, thereby minimizing dead time and maximizing the utilization of available bandwidth. The central hypothesis stated that a novel readout architecture featuring near-ideal event-driven operation and asynchronous arbitration logic with non-priority access, implemented as a tree of RS-latch-based arbiters, would provide significant improvements for high-density pixel radiation detectors by achieving higher throughput, lower latency, and demonstrably fairer event handling than traditional frame-based, polling-based, and priority-encoded readout architectures, thus enabling a new generation of pixel detectors. To fulfill this purpose, the research combined theoretical design, simulation, hardware implementation, and experimental validation of the EDWARD readout architecture, ensuring that the proposed solution was grounded in both thorough examination and practical feasibility.

## **6.2** Integrated Discussion of Research Questions and Outcomes

The investigation conducted throughout this dissertation was structured around five key research questions (RQ1-RQ5), each targeting a specific aspect of developing a fair, asynchronous, event-driven readout architecture for high-density pixel detectors. These questions were formulated in Chapter 1 and mapped to corresponding research objectives (O1-O5), ensuring a coherent and goal-driven research framework.

This section synthesizes the outcomes achieved in addressing each research question, demonstrating how the theoretical foundations, practical implementations, and empirical validations collectively support the central hypothesis. By systematically aligning each question with its objective and emphasizing the results obtained through simulation, prototyping, and experimental testing, this discussion establishes the completeness and rigor of the research process.

The answers to RQ1-RQ5, presented below, affirm the viability, performance, and practicality of the proposed EDWARD architecture, culminating in a significant advancement in pixel readout strategies. Each outcome is detailed alongside the objective it served, offering an integrated perspective on how the dissertation successfully fulfilled its aims.

**RQ1** Is it possible to develop asynchronous arbitration logic with non-priority access to fundamentally overcome the limitations of traditional frame-based, polling-based, priority-encoded readout methods, as well as the shortcomings of AER, in high-density pixel radiation detectors?

**O1** Detailed Conceptual Design and Theoretical Analysis of the Proposed Architecture.

**Outcome:** This dissertation shows that the EDWARD architecture effectively addresses the limitations of both traditional and asynchronous readout schemes. Compared to frame-based and polling-based approaches, EDWARD eliminates redundant data transfer and latency that scale with matrix size by employing a fully event-driven, self-triggered readout. The widely adopted AER protocol, by contrast, uses a distributed request-acknowledge handshake to transmit the addresses of active pixels onto a shared bus. AER supports sparse readout and low latency but offers limited data richness: its events typically lack per-event amplitude or precise timing information. Additionally, while scalable, AER systems still face contention at the shared arbiter and may rely on implicit priority or external control to resolve conflicts, posing challenges for fairness and determinism in high-density applications. AERD architectures, such as those used in ALPIDE, improve arbitration stability by freezing the matrix state with a global strobe signal. This allows a priority-based encoder tree to read out latched hits sequentially. However, this strategy introduces dead time and sacrifices timing precision, while still enforcing spatial priority. EDWARD, in contrast, uses a non-priority, asynchronous arbitration tree based on RS-latch cells with memory. It supports fair, first-come, first-served readout without global strobes, snapshots, or combinatorial encoders. Experimental results validate that this architecture ensures arbitration fairness, avoids data corruption, and scales efficiently for high-rate, sparse event detection.

**RQ2** Can a near-ideal event-driven readout architecture be developed using standard design flows and industry-standard CAD/EDA tools, and if so, what are the costs and necessary compromises?

**O2** Implementation of Non-Priority Arbitration with Asynchronous Logic Building Blocks.

**Outcome:** The development of the EDWARD architecture confirms that a near-ideal event-driven readout system can be realized using standard digital design flows and industry-standard CAD/EDA tools. The arbitration logic, although asynchronous, was successfully integrated into a conventional RTL-to-Graphic Data System II (GDSII) flow by augmenting the digital library with custom RS-latch-based arbiter cells. The design methodology employed fully synthesizable logic for configuration, control, and readout, including integration with synchronous peripherals such as serializers, I<sup>2</sup>C interfaces, and output buffers. Despite its asynchronous nature, the arbitration tree was synthesized and placed within a hierarchical structure, supporting modularity and reuse across different ASICs. This was demonstrated in both the EDWARD65P1 and 3FI65P1 prototypes, fabricated in 65 nm CMOS. However, achieving compatibility with standard flows required several compromises. The asynchronous arbiters had to be custom-verified and excluded from static timing analysis, requiring targeted simulation and layout-level verification. Mixed-mode simulations were also necessary to ensure glitch-free operation and prevent unintended delays. Despite these challenges, no fundamental barriers were encountered that would preclude adoption of the architecture at larger scales or in more advanced nodes. In conclusion, this work demonstrates that, with careful architectural planning and targeted design accommodations, a high-performance event-driven system with asynchronous, non-priority arbitration can be successfully implemented within a conventional, industry-standard flow.

**RQ3** How good is the proposed architecture in ensuring fair and efficient handling of concurrent, asynchronous readout requests from a large pixel array, eliminating pixel prioritization and minimizing data corruption risks associated with combinatorial priority encoders?

**O3** Comprehensive Performance Evaluation through Simulation and Experimental Validation.

**Outcome:** The EDWARD architecture was specifically designed to ensure fair and efficient arbitration among multiple concurrent pixel requests without enforcing spatial priority. This was achieved by deploying a distributed, non-priority arbitration tree composed of RS-latch-based cells with memory. Each arbiter maintains state and holds the acknowledge path until the request is fully cleared, preventing preemption or redirection mid-transfer. This guarantees mutual exclusion and preserves data integrity even under high activity levels. Simulation results across a range of synthetic and Poisson-distributed workloads showed that arbitration was uniformly distributed over the matrix and no starvation occurred, even at high request rates. Measured latency histograms confirmed that no pixel consistently dominated access to the shared bus, and that arbitration was governed solely by request arrival time rather than pixel position in the array. This behavior was observed both in simulations and during experimental testing of the EDWARD65P1 and 3FI65P1 ASICs. Importantly, the architecture avoids the pitfalls of combinatorial priority encoders, which are known to introduce unfairness and potential data corruption when simultaneous events are not properly isolated. In EDWARD, arbitration is local, collision-free, and inherently sequential due to the memory-based nature of each node. No global synchronization or snapshotting is required, eliminating the need for strobe signals and removing a common source of data corruption in traditional priority-based systems. In summary, the proposed architecture achieves robust and fair arbitration, eliminating pixel prioritization and enabling concurrent, high-throughput event-driven readout without compromising data integrity or performance scalability.

**RQ4** What mechanisms can be introduced in the proposed architecture to facilitate interfacing between inherently asynchronous input signals and typically synchronous data acquisition systems, and how practical are they?

**O4** Hardware Realization and Prototyping of the ASICs.

**Outcome:** The EDWARD architecture introduces several practical mechanisms to bridge the gap between its inherently asynchronous arbitration logic and conventional synchronous DAQ systems. The key interface element is the edge-triggered acknowledge signal. It acts as a token, so data transfer is synchronized at the point where arbitration is resolved, even though requests are generated asynchronously. Once a pixel wins arbitration, its data output is synchronized to an edge of the acquisition clock, ensuring compatibility with downstream serializers and DAQ receivers. This synchronization uses a serializer clock, typically operating at 250 MHz and internally divided to match the data bus width, which governs both the arbitration token rate and the serialization cadence. Data from each acknowledged event is latched and serialized using a clock-aligned protocol, so external DAQ systems can capture it reliably without metastability risks or data skew. Experimental results from both EDWARD65P1 and 3FI65P1 confirm that these techniques are robust and practical, showing no data loss or desynchronization under the test conditions. Overall, the architecture demonstrates that asynchronous in-pixel logic can coexist with synchronous DAQ infrastructure through localized synchronization at key control points. It maintains high throughput and data integrity without global clocking inside the pixel array.
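A back-of-the-envelope check shows how the serializer clock bounds the acknowledge rate. The sketch assumes, hypothetically, that the serializer shifts out a fixed number of bits per clock cycle and that one acknowledge token is issued per serialized word. The token rate is then the serializer bit rate divided by the word width. The word widths below are illustrative placeholders, not the actual EDWARD65P1 or 3FI65P1 data formats.

```python
def ack_token_rate_hz(f_serializer_hz, word_bits, bits_per_cycle=1):
    """Upper bound on acknowledge tokens (events) per second.

    Assumes one token per serialized word and `bits_per_cycle` bits shifted out
    per serializer clock cycle (1 for single data rate, 2 for double data rate).
    """
    if word_bits <= 0 or bits_per_cycle <= 0:
        raise ValueError("word_bits and bits_per_cycle must be positive")
    return f_serializer_hz * bits_per_cycle / word_bits


F_SER = 250e6  # serializer clock frequency quoted in the text
for bits in (16, 20, 24):  # hypothetical word widths
    rate = ack_token_rate_hz(F_SER, bits)
    print(f"{bits}-bit word: {rate / 1e6:.3f} M tokens/s, {1e9 / rate:.0f} ns per event")
```

With single-data-rate serialization, word widths of roughly 15 to 25 bits place this bound in the 10–17 MHz range quoted for the acknowledge frequency in the O5 outcome. A wider word, such as one that adds a timestamp, lowers the maximum event rate proportionally.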

**RQ5** What are the performance advantages, in terms of energy efficiency, readout latency, data throughput, and arbitration fairness, of the novel architecture compared with representative traditional and competitive readout architectures, and how do these advantages translate to improved performance in pixel radiation detector systems?

**O5** Comparative Performance Benchmarking and Analysis.

**Outcome:** The proposed EDWARD architecture demonstrates significant performance advantages over traditional frame-based, polling-based, and priority-encoded readout schemes across multiple metrics. In terms of readout latency, EDWARD achieves sub-microsecond average access times under moderate load and, owing to its distributed arbitration logic, avoids latency scaling with matrix size. Throughput remains stable and high, approaching the theoretical maximum set by the acknowledge frequency (e.g., 10–17 MHz), because only active pixels initiate transfers and each transaction is serialized efficiently without bus contention. **Arbitration fairness** is a core strength of the architecture. Unlike systems based on priority trees or token polling, EDWARD exhibits uniform access probability across the matrix, as validated by both simulation and experimental data. No starvation or readout preference is observed, even under bursty or Poisson-distributed input patterns. This ensures unbiased data acquisition, a critical requirement in high-precision imaging and tracking systems. Power consumption has not yet been characterized in detail, but energy efficiency benefits from the event-driven nature of the architecture: inactive pixels remain idle, and only pixels with events engage the digital logic. This is expected to yield substantially lower dynamic power than systems with continuous scanning or global synchronization. Energy per event is kept low by eliminating redundant data movement and unnecessary toggling of digital blocks. Together, these characteristics translate into improved system-level performance in pixel radiation detectors. Faster and fairer event handling enhances time resolution and reduces pile-up. Efficient bandwidth usage allows for larger arrays or higher frame-equivalent rates without increasing I/O demands. The architectural scalability ensures that future detectors with finer granularity or higher hit rates can be accommodated without a fundamental redesign.
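The sub-microsecond latency claim can be sanity-checked with a textbook queueing model. Assume, hypothetically, Poisson-distributed events, a single shared bus, and a fixed transfer time equal to one acknowledge period. The mean latency then follows the M/D/1 Pollaczek-Khinchine formula, which the sketch cross-checks with a Monte Carlo simulation based on Lindley's recursion. Every transfer takes the same time, and the bus never idles while a request is pending. Under those conditions, the mean latency does not depend on the order in which pending pixels are served. A first-come-first-served queue therefore yields the same mean as a fair arbitration tree; only the latency distribution differs. The sketch ignores arbitration propagation delay. The 15.625 MHz acknowledge rate is a placeholder within the quoted 10–17 MHz range.

```python
import random


def md1_mean_latency_s(event_rate_hz, ack_rate_hz):
    """Mean request-to-transfer-complete time for Poisson arrivals on one shared bus.

    Pollaczek-Khinchine result for an M/D/1 queue: Wq = rho * S / (2 * (1 - rho)),
    with S = 1 / ack_rate_hz the fixed transfer time and rho the bus utilization.
    """
    rho = event_rate_hz / ack_rate_hz
    if not 0.0 <= rho < 1.0:
        raise ValueError("event rate must be below the acknowledge rate")
    service = 1.0 / ack_rate_hz
    return rho * service / (2.0 * (1.0 - rho)) + service


def simulated_mean_latency_s(event_rate_hz, ack_rate_hz, n_events=200_000, seed=1):
    """Monte Carlo cross-check via Lindley's recursion for a single FCFS server."""
    rng = random.Random(seed)
    service = 1.0 / ack_rate_hz
    wait, total = 0.0, 0.0
    for _ in range(n_events):
        total += wait + service  # latency = queueing wait + transfer time
        # Wait of the next event, given the exponential gap until it arrives.
        wait = max(0.0, wait + service - rng.expovariate(event_rate_hz))
    return total / n_events


ACK = 15.625e6  # hypothetical acknowledge rate (e.g., 250 MHz / 16-bit word)
for load in (0.3, 0.5, 0.8):
    lam = load * ACK
    print(f"load {load:.1f}: model {md1_mean_latency_s(lam, ACK) * 1e9:.1f} ns, "
          f"simulated {simulated_mean_latency_s(lam, ACK) * 1e9:.1f} ns")
```

Under these assumptions, the model gives a mean latency of roughly 78 ns at 30% bus load, 96 ns at 50%, and 192 ns at 80%. Latency thus stays well below a microsecond until the load approaches the acknowledge rate, where the 1/(1 - rho) term dominates.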

This structured synthesis shows that each research question has been thoroughly addressed through a combination of conceptual design, implementation, and empirical validation. The alignment between research questions, objectives, and outcomes ensures a robust evaluation of the proposed approach. The EDWARD architecture was demonstrated to overcome key limitations of traditional readout systems. It eliminates pixel prioritization, ensures arbitration fairness, and enables efficient, low-latency, and energy-aware operation.

This integrated discussion substantiates the central hypothesis of the dissertation: that a non-priority, asynchronous, event-driven readout architecture can be practically realized using standard design flows, and that it offers compelling advantages over conventional methods. The insights gained here provide the foundation for the contributions outlined in the following section, which situates the research within the broader context of detector design and future technological development.

## 6.3 Key Contributions and Scientific Significance

Building on the research questions and findings, this work offers several original contributions to the field of radiation detector instrumentation. The key contributions of the dissertation, along with their scientific significance, are outlined as follows:

- Development of the EDWARD architecture for pixel readout: The main achievement of this work is the design of the EDWARD readout architecture, a novel event-driven, asynchronous readout scheme with a non-priority arbitration tree. Unlike most prior pixel readout systems, EDWARD eliminates the need for a global clock or a rigid token-passing order by arbitrating readout requests fairly and on the fly. This contribution is significant because it introduces a new paradigm for efficiently reading out large arrays of detector channels. Scientifically, it addresses a long-standing challenge in detectors: how to handle sparse but potentially intense bursts of data without being bottlenecked by synchronous or priority-based protocols. The EDWARD concept expands the design toolkit available to detector engineers, showing that fully asynchronous on-chip communication for pixel arrays is both feasible and advantageous.
- Integration of in-situ signal processing with event-driven readout: The dissertation demonstrates, for the first time, a pixel detector system where each pixel not only detects radiation events but also performs immediate signal processing (such as peak detection or digitization) and triggers its own readout. This integration of analog front-end processing in the pixel with the asynchronous digital readout logic enables the capture and transmission of rich information (e.g., the charge magnitude of each event, the timing of occurrence, and the pixel address) in real-time per event. The scientific significance of this contribution lies in enabling high-resolution, high-throughput measurements: for example, in X-ray or particle imaging, knowing the precise charge deposited by each photon or particle (as is possible with the in-pixel peak detection) adds an energy or spectral dimension to imaging. By reading out this information only when an event occurs, the system significantly reduces redundant data (no blank frames or idle readout cycles) and thus optimizes the overall data throughput. This contribution advances the state of the art by showing how front-end analog circuitry (for signal extraction) can be seamlessly combined with a global asynchronous digital network, leading to detectors that are both *intelligent* and *efficient*.

- Throughput and latency improvements via non-priority arbitration: A major technical advancement of this work is the demonstration of improved throughput and reduced latency in pixel readout using the non-prioritized, asynchronous arbitration scheme. Performance analysis (Chapters 4 and 5) showed that the EDWARD architecture can sustain continuous readout with effectively no enforced idle time. Each event is processed as soon as it is requested, and the next event is serviced immediately upon completion of the previous one. The only delay between back-to-back events is the single clock edge needed as an acknowledge token, so no dead time is introduced. Consequently, the architecture approaches 100% utilization of the readout bus under high event rates. This is unattainable in many conventional designs that require multi-cycle handshakes or frame resets. In terms of latency, because no channel is prioritized, the worst-case wait time for any given pixel is dramatically reduced relative to a priority- or token-based system, where a low-priority pixel might wait through many other transactions. Fair arbitration gives all pixels an equal opportunity to be read promptly. The measured latencies remained bounded and relatively uniform across the array, even at high loads. The significance of this contribution lies in validating a new approach to maximizing detector readout performance: using asynchronous logic to eliminate structural wait times. This can be particularly transformative for experiments at next-generation facilities (e.g., free-electron lasers or high-luminosity colliders), where detectors must cope with unprecedented data rates without losing temporal resolution.

- Hardware prototyping and validation in silicon: An essential contribution of this dissertation is the successful realization of the proposed concepts in silicon. Two prototype ASICs were designed, fabricated, and tested as proof-of-concept implementations. By delivering a working 32 × 32 pixel detector (3FI65P1) and a specialized test chip (EDWARD65P1), the research goes beyond simulation and demonstrates real hardware operation of the event-driven architecture. This contribution is significant because it provides empirical evidence that the theoretical benefits of EDWARD are achievable in practice. The hardware prototyping effort also contributed new design methodologies, such as a modular CTR platform for mixed-signal integration and techniques for incorporating asynchronous arbiters into standard design flows. These practical insights are valuable to the scientific community because they show how to tackle the challenges of implementing novel asynchronous circuits in modern Very-Large-Scale Integration (VLSI) processes. The measurement results from the prototypes (e.g., confirming no data loss and characterizing readout timing) lend strong credibility to the approach and pave the way for scaling up the system in future projects.
- Advancement of knowledge in asynchronous detector readouts: On a broader level, this dissertation contributes to scientific knowledge by exploring and validating asynchronous communication principles in the context of radiation detectors. Prior to this work, fully asynchronous, event-driven systems were not standard in mainstream pixel detector designs, partly due to concerns about complexity and metastability. The dissertation not only proposes such an asynchronous system but also thoroughly characterizes its behavior and reliability. In doing so, it provides a blueprint for managing metastability (through proper arbiter design and synchronizer use) and ensuring robust operation without a global clock. This conceptual contribution has significance beyond the specific detectors built. It encourages a rethinking of how detectors, and possibly other distributed electronic systems (e.g., neuromorphic sensor arrays or network-on-chip systems), can be designed for high concurrency. By demonstrating that an asynchronous architecture can match or exceed the performance of synchronous ones in demanding applications, the work expands the horizon of what is considered feasible in detector readout design.

Collectively, the above contributions mark a significant technical advancement in pixel detector instrumentation. Each contribution addresses a different aspect of the challenge: from conceptual architecture to circuit implementation and performance gain, underscoring the holistic approach of this research. The scientific significance lies in both the specific solution provided (which directly benefits applications requiring high-rate, high-detail radiation imaging) and the broader demonstration that marrying asynchronous digital logic with detector systems can unlock new levels of performance. This dissertation thus adds to the body of knowledge by solving a concrete problem and opening avenues for new techniques in the design of high-throughput electronic readouts.

## 6.4 Limitations and Challenges

Notwithstanding its successes, this research has certain limitations and confronted various challenges throughout the project. A reflective critique of these limitations is important for a balanced understanding of the work and for guiding future efforts.

One limitation arises from the inherent complexity of asynchronous circuit design. Standard digital design flows and EDA tools are typically geared toward synchronous logic. Implementing elements such as arbiters, which operate without a global clock and must resolve nearly simultaneous requests, therefore required special consideration. The need to use custom or analog-style components for arbitration (e.g., the Seitz arbiter circuits) made it challenging to characterize their behavior precisely. Transient analog simulations were used to validate these components, but the process was time-consuming and demanded careful modeling of metastability behavior. This highlights a limitation of the broader engineering ecosystem: a relative lack of readily available intellectual property (IP) design blocks and automated verification tools for asynchronous logic. Consequently, the design and verification cycle for EDWARD was more involved than that of a comparable synchronous design. This could pose a barrier to implementing much larger systems purely asynchronously.

Another challenge and limitation pertains to the scale and scope of the prototypes. The 3FI65P1 chip, with a $32 \times 32$ pixel matrix, is sufficient to demonstrate core functionality. It is, however, modest in size compared to state-of-the-art pixel detectors, which often have tens or hundreds of thousands of pixels. As such, questions of scalability remain partially open. The arbitration tree in EDWARD grows in depth as the number of channels increases, which could introduce longer arbitration delays or greater area overhead in a significantly larger array. The modular nature of the CTR platform and the hierarchical arbitration design are intended to facilitate scaling, and simulations did not indicate fundamental bottlenecks up to reasonable sizes. Nevertheless, only a relatively small-scale system was physically realized within this project. Thus, performance at a very large scale (e.g., for a megapixel detector) was not directly proven in hardware. Further engineering effort will be needed to confirm that the advantages of EDWARD persist at larger scales and to optimize the arbitration network for such scenarios.
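A simple model helps estimate how arbitration depth grows with array size. It assumes, hypothetically, a balanced tree of cells with a fixed fan-in (two by default) and a uniform per-level delay. A complete arbitration traverses the tree upward with the request and downward with the grant. The 0.1 ns per-level delay is a placeholder, not a measured EDWARD figure, and wire length and bus loading are ignored.

```python
def tree_depth(n_channels, fan_in=2):
    """Number of arbiter levels needed to merge n_channels requests with fan_in-input cells."""
    if n_channels < 1 or fan_in < 2:
        raise ValueError("need n_channels >= 1 and fan_in >= 2")
    depth, capacity = 0, 1
    while capacity < n_channels:  # integer loop avoids floating-point log rounding
        capacity *= fan_in
        depth += 1
    return depth


def round_trip_ns(n_channels, cell_delay_ns, fan_in=2):
    """Request up plus grant down through every level (hypothetical uniform cell delay)."""
    return 2 * tree_depth(n_channels, fan_in) * cell_delay_ns


CELL_DELAY_NS = 0.1  # placeholder per-level delay, not a measured figure
for side in (32, 256, 1024):
    n = side * side
    print(f"{side} x {side}: {tree_depth(n)} levels, "
          f"~{round_trip_ns(n, CELL_DELAY_NS):.1f} ns round trip")
```

Depth grows only logarithmically: going from 32 × 32 (10 levels) to 1024 × 1024 (20 levels) multiplies the pixel count by 1024 but only doubles the depth. This suggests that wire length, bus loading, and area, which the sketch ignores, may matter at least as much as the number of levels at a megapixel scale.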

The analog front-end aspect of the design also introduces some limitations. In 3FI65P1, each pixel's analog circuitry (for charge amplification and peak detection) was designed to fit within a particular area and power budget, and to meet the specific requirements of the targeted X-ray imaging application. This means that the analog performance (such as noise levels, charge measurement precision, and analog bandwidth) is tuned for a particular use case. If one considers other applications (for instance, particle tracking or very high-rate neutron detection), the analog design may require significant re-optimization. Additionally, the presence of analog circuits alongside asynchronous digital logic necessitated careful floorplanning and isolation techniques to mitigate noise coupling, which was a challenge encountered during implementation. For example, asynchronous digital transitions occur in response to irregular event timings, which could potentially introduce digital switching noise at unpredictable intervals; ensuring this did not degrade the sensitive analog measurements was a non-trivial task. While the dissertation managed this interplay in the prototypes, it remains a design challenge going forward, and some limitations in analog performance could emerge under extreme conditions (e.g., if the event rate becomes so high that analog circuits are continuously busy, or if process variations affect matching in the peak detectors).

A further limitation relates to the fairness and throughput of arbitration under certain pathological conditions. By design, EDWARD imposes no priority, which generally ensures fairness. However, suppose one or a few pixels fire continuously at an excessively high rate, far above the average rate of the others. The arbiter will still service them as often as they request, potentially letting them hog the bus simply because they always have pending requests. In other words, EDWARD guarantees the absence of static priority bias, but it does not inherently throttle any channel. A "hot" pixel can therefore, in principle, consume a significant fraction of the bandwidth if all other pixels are quiet. In our tests, this scenario was not problematic. If one pixel sees many more events, it is expected to use more bandwidth, and when multiple pixels have pending events, the arbiter naturally interleaves them. Nevertheless, a deliberate denial-of-service scenario (one pixel endlessly requesting readout) would still hold off the others until it momentarily has no request. This is intrinsic to any fair, first-come-first-served system and is not a flaw per se, but it is a scenario to keep in mind. The architecture mitigates issues such as starvation, but it does not enforce equal bandwidth distribution beyond what the random arrival process dictates. For most detector applications this is acceptable, because event rates are dictated by physical processes rather than adversarial signals. Nonetheless, future designs might add rate limiters or dynamic priority adjustments to handle extreme rate disparities.
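One possible form for the rate limiter mentioned above is a per-pixel token bucket. The following sketch is hypothetical and is not part of the EDWARD design. A pixel may raise a readout request only while its bucket holds a token. Tokens refill at a fixed rate up to a burst capacity, so a hot pixel's sustained share of the bus is capped while short bursts still pass. Time is measured in abstract ticks, and the rate and capacity values are illustrative.

```python
class TokenBucket:
    """Hypothetical per-pixel request throttle (not part of the EDWARD design)."""

    def __init__(self, rate_per_tick, capacity):
        self.rate = rate_per_tick      # tokens added per time tick
        self.capacity = capacity       # maximum burst size
        self.tokens = float(capacity)  # start with a full bucket

    def tick(self):
        """Advance time by one tick, refilling up to the capacity."""
        self.tokens = min(self.capacity, self.tokens + self.rate)

    def try_request(self):
        """Consume a token if available; return True if the request may be raised."""
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# A hot pixel that wants to fire on every tick, throttled to 0.25 events per tick.
bucket = TokenBucket(rate_per_tick=0.25, capacity=4)
accepted = 0
for _ in range(1000):
    if bucket.try_request():
        accepted += 1
    bucket.tick()
print(accepted)  # 253: an initial burst, then one event every four ticks
```

Because the limiter would sit in the pixel ahead of its request line, the arbitration tree itself would be unchanged. The trade-off is that genuine high-rate physics in a single pixel would be truncated, which is why the text frames such throttling as optional.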

Finally, it should be noted that the testing of the hardware is, at the time of writing, preliminary. While the results are promising and align with expectations, complete characterization was outside the time scope of this dissertation, especially for the 3FI65P1 in a realistic experimental environment such as a synchrotron beamline for X-ray fluorescence imaging. This means that some performance metrics, such as long-term stability, radiation tolerance of the ASIC, or behavior under the maximum designed radiation flux, have not yet been fully evaluated. These represent practical limitations of the current work, not in concept but in the extent of demonstration. A cautious interpretation is that further testing is warranted to verify the system fully under all intended operating conditions.

In summary, the limitations of this work largely stem from the cutting-edge nature of the approach. The use of asynchronous logic presented design and verification challenges, the prototype scale leaves questions of extreme scaling open, and certain real-world performance aspects remain to be explored. Acknowledging these challenges provides valuable lessons. It underscores the importance of better tool support for asynchronous design, encourages strategies for scaling up the architecture, and calls for comprehensive testing under various conditions. Importantly, none of these limitations undermines the dissertation outcomes; instead, they point to areas where additional effort and refinement can further strengthen and extend the contributions of this research.

## 6.5 Broader Implications and Recommendations

The innovations and findings of this dissertation carry several broader implications for the field of radiation detection and electronic readout systems. First and foremost, the success of the event-triggered, throughput-optimized readout validates a new approach to handling the "data deluge" in modern detectors. As experimental science advances towards higher-intensity beams, faster repetition rates, and larger detector arrays, the traditional approach of rigid, synchronous frame-based readout is increasingly strained. The work presented here implies that a shift toward event-driven architectures could be a key part of the solution. By only communicating data when and where an event occurs, future detectors can significantly reduce unnecessary data transfer and focus on the most salient information. This has a cascade of positive effects: lowering data acquisition system loads, reducing storage and bandwidth requirements, and enabling more real-time processing of critical events (since irrelevant idle data is not occupying time). Thus, one broad implication is that event-driven readout can improve not just the detector chip itself, but the entire experimental data flow from sensor to analysis, making high-throughput experiments more feasible and cost-effective.

Another implication relates to time resolution and scientific measurement capabilities. The EDWARD architecture inherently does away with the concept of a "frame rate" and instead allows each pixel to report events asynchronously with minimal delay. This means that the timing of each event is primarily limited by the electronics' response time (on the order of nanoseconds to microseconds), rather than by a global shutter or readout cycle (which might be milliseconds in a frame-based system). The implication for experiments is significant: detectors using such architectures can achieve much higher temporal resolution. For example, phenomena that occur at random times or very rapidly can be captured without the quantization error of frame periods. As noted in the results, eliminating the need for frame readout can enable continuous time measurements, which are particularly useful in domains like particle physics (e.g., future EIC detectors where event timing is crucial) or X-ray imaging of dynamic processes. Researchers in these fields could leverage such technology to observe fast transient events that would be smeared out or missed entirely by slower readout methods.

From a broader technological perspective, this work also advocates a more widespread adoption of asynchronous design principles in high-performance VLSI systems. The demonstrated advantages are appealing not only for radiation detectors but for any system where data arrives sporadically or in bursts. They include lower power when data is sparse (since the circuit naturally idles in the absence of events), no need to distribute high-frequency clocks across large areas, and graceful handling of irregular workloads. Such systems include imaging sensors (e.g., neuromorphic vision sensors or "event cameras"), large sensor networks, and even on-chip communication routers. The recommendation here is that designers of such systems consider event-driven asynchronous architectures as a viable alternative to conventional designs. The dissertation presents a case study in how to design, implement, and validate such a system, serving as a template or motivation for other applications. To facilitate this, it would be beneficial for the community to invest in better design tools and libraries for asynchronous circuits, as well as in educational initiatives to train new engineers in these techniques.

A few specific recommendations stem from the challenges encountered. For future detector development projects considering similar architectures, it is recommended to integrate built-in test and calibration features, as was done with the EDWARD65P1 test pixel generators, to enable thorough evaluation of readout performance after fabrication. This approach proved invaluable for quantitatively measuring latency and throughput without requiring an external pulsed radiation source, and it can be generalized as a best practice when prototyping novel readout schemes. Additionally, when moving to larger scales, one should consider a hierarchical arbitration strategy, which is already inherent in EDWARD. A network-of-arbiters approach may also help maintain speed, with different regions of the detector handling local arbitration and then merging. The modular CTR framework used in this dissertation is one example of how to partition a design for scalability. Future implementations might refine this approach further, for instance by allocating separate data buses to sub-regions of a large array to avoid saturating a single bus.
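Simple arithmetic can size the suggestion to allocate separate data buses to sub-regions. The sketch assumes, hypothetically, that hits are spread evenly across sub-regions and that each bus carries at most one event per acknowledge token. Under those assumptions, the minimum number of buses follows from the total hit rate, the acknowledge rate, and a chosen utilization ceiling. The 100 M events/s total rate and the 15.625 MHz acknowledge rate are illustrative placeholders.

```python
import math


def min_buses(total_event_rate_hz, ack_rate_hz, max_utilization=1.0):
    """Smallest number of independent readout buses keeping each at or below max_utilization.

    Assumes hits are spread evenly over the sub-regions and one event per acknowledge token.
    """
    if not 0.0 < max_utilization <= 1.0:
        raise ValueError("max_utilization must be in (0, 1]")
    return max(1, math.ceil(total_event_rate_hz / (ack_rate_hz * max_utilization)))


TOTAL_RATE = 100e6  # hypothetical total hit rate of a large array, events/s
ACK = 15.625e6      # hypothetical per-bus acknowledge rate, tokens/s
for ceiling in (1.0, 0.5):
    print(f"utilization <= {ceiling:.0%}: {min_buses(TOTAL_RATE, ACK, ceiling)} buses")
```

Under these placeholders, 7 buses suffice at full utilization and 13 are needed to stay at or below 50%. A ceiling well below 100% leaves headroom, because queueing delay on each bus grows sharply as its load approaches the acknowledge rate. Uneven hit distributions, such as a concentrated beam spot, would require sizing for the busiest region rather than the average.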

Finally, it is recommended that future projects maintain a close coupling between theoretical modeling and experimental work. This dissertation benefited greatly from continuous feedback between simulation and measurement: simulation guided design tweaks, and real-world testing revealed which theoretical assumptions held. For complex asynchronous systems, where intuition derived from synchronous systems may not apply directly, such an iterative approach is crucial. Therefore, the broader recommendation is methodological: embrace a holistic design cycle that includes analytical reasoning, simulation (both digital and analog), hardware prototyping, and empirical validation. This de-risks the adoption of innovative techniques and builds confidence in the results, as exemplified by the positive outcomes of this research.

## 6.6 Directions for Future Work

While the present work has answered the primary research questions and demonstrated the feasibility and advantages of the proposed approach, it also opens up several avenues for future investigation and development. Building on the foundation laid by this dissertation, the following directions for future work are identified:

- Scaling to Larger Detector Systems: Future work should explore scaling the EDWARD architecture to much larger pixel arrays (e.g., 256 × 256 or 1024 × 1024 pixels, or beyond). This will involve studying the arbitration tree depth and bus loading in greater detail. Maintaining low latency at scale might require tiling the detector into regions, each with its own arbitration subtree, and then orchestrating between regions. Prototypes of larger size, or multi-chip systems using EDWARD on each chip, would be valuable for confirming performance in a regime closer to that of real large-area detectors used in high-energy physics or synchrotron facilities.

- Enhanced Performance Characterization: A thorough characterization of the existing prototypes under a variety of conditions is an immediate next step. This includes testing the 3FI65P1 detector with actual radiation sources or at a beamline to evaluate how well it measures real events (energy resolution, count-rate capability, imaging fidelity) using the event-driven readout. Additionally, pushing the system to its limits in terms of event rate will help establish the actual maximum throughput and identify any subtle effects, such as arbitration biases or analog artifacts at extreme rates. The data gathered can guide refinements in the next design iteration.
- Power Optimization and Management: Event-driven architectures naturally save power by being mostly idle when no events occur; however, when events are continuous (high flux scenarios), the power consumption could approach that of a continuously clocked system. Future work could investigate power reduction techniques that complement the EDWARD scheme, such as adaptive biasing of analog front-ends based on event rate or dynamic adjustment of the acknowledge clock frequency depending on load. Moreover, measuring and optimizing the power performance of the prototypes (both average and per-event energy cost) will be crucial for applications like space-based detectors or large-scale systems where power is a concern.
- Incorporating Advanced In-Pixel Processing: With the flexible readout in place, another direction is to incorporate more advanced in-situ processing at the pixel or local group level. For example, one could integrate analog or digital filters to detect only events of interest (suppressing noise or common-mode background) or even implement rudimentary data compression or feature extraction in each pixel before readout. Since EDWARD allows additional digital data from pixels to be transmitted, future versions of the pixel logic could send, for instance, a time-stamp or an energy classification tag along with the event, enabling sophisticated event-by-event analysis upstream. Research into the level of processing that can be performed on-chip without overwhelming the pixel area/power budget would extend the architecture's usefulness in applications that require real-time decision-making or massive data reduction at the sensor.
- Robustness and Metastability Handling: Although the current design manages metastability through the arbiter circuits, future work could delve deeper into formal verification of the asynchronous elements. It could also explore alternative arbiter designs or protocols that further reduce metastability risks. For instance, multi-token arbitration, in which more than one event is processed in a pipelined fashion if the bus protocol allows it, might increase throughput, although it introduces complexity. Additionally, studying the system's robustness under varying environmental conditions (voltage, temperature, radiation damage) will be vital if it is to be deployed in real experiments. Future chips might incorporate redundancy or error-detecting features to ensure reliable operation over time.

- Application to Other Domains: Finally, a promising direction is to apply the principles demonstrated here to other sensing modalities. For example, could a similar event-driven readout be used in large CMOS image sensors for visible light, where only pixels that detect a change (such as motion) communicate data? Or, in large FPGA-based data acquisition systems, could asynchronous arbitration networks improve how multiple data streams are merged? Exploring these cross-domain applications will not only broaden the impact of this work but may also provide new insights and feedback that enrich the design. Collaboration with other research fields, such as neuromorphic engineering (as suggested by the parallels drawn in Chapter 3), could yield innovative hybrid technologies.
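The power trade-off raised in the Power Optimization and Management item can be framed with a first-order model. It assumes, hypothetically, that the average power of the event-driven readout is a fixed idle term plus a constant energy per event times the event rate. It compares this with a continuously clocked readout drawing constant power. Above the crossover rate, the event-driven scheme loses its power advantage. All numeric values are placeholders, not measurements from the prototypes.

```python
def event_driven_power_w(idle_power_w, energy_per_event_j, event_rate_hz):
    """First-order model: idle (static) power plus per-event energy times event rate."""
    return idle_power_w + energy_per_event_j * event_rate_hz


def crossover_rate_hz(idle_power_w, energy_per_event_j, clocked_power_w):
    """Event rate above which the event-driven readout draws more than a clocked one."""
    if energy_per_event_j <= 0:
        raise ValueError("energy_per_event_j must be positive")
    return max(0.0, (clocked_power_w - idle_power_w) / energy_per_event_j)


# Placeholder numbers for illustration only: 1 mW idle, 500 pJ/event, 5 mW clocked.
P_IDLE, E_EVENT, P_CLOCKED = 1e-3, 500e-12, 5e-3
print(f"{event_driven_power_w(P_IDLE, E_EVENT, 1e6) * 1e3:.2f} mW at 1 M events/s")
print(f"crossover at {crossover_rate_hz(P_IDLE, E_EVENT, P_CLOCKED) / 1e6:.1f} M events/s")
```

Techniques such as adaptive front-end biasing or load-dependent acknowledge clocking would, in this model, lower either the per-event energy or the idle term. Either change pushes the crossover to higher rates. Measuring both terms on the prototypes would replace the placeholders with real values.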

These future work directions underscore that the conclusion of this dissertation is not the end of the development of event-driven pixel readouts, but rather a beginning. The prototypes and concepts proven here serve as stepping stones. By pursuing the lines of inquiry above, subsequent researchers can enhance the performance, scalability, and applicability of the EDWARD architecture, ultimately bringing such systems to maturity for deployment in cutting-edge scientific instruments.

## 6.7 Final Reflections

In conclusion, this dissertation presents and validates a novel approach to pixel detector readout that combines the advantages of in-situ signal processing with an event-triggered, throughput-optimized data acquisition scheme. Through a journey spanning theoretical conception, detailed design, and empirical demonstration, the research addressed a fundamental challenge: how to efficiently capture and transmit only the meaningful slices of the immense data volume generated by modern radiation sensors. The development of the EDWARD architecture and its successful implementation in silicon support the central hypothesis that removing artificial sequencing constraints (such as priority and frame clocks) unleashes the true potential of pixelated detectors. The dissertation not only solved the posed research problem but also contributed a fresh perspective to the field. It demonstrated that asynchronous, on-demand readout is not merely an intriguing idea but a practical and superior solution in applicable contexts.

The impact of this work is multifaceted. In a direct sense, it provides a blueprint for next-generation detectors capable of keeping pace with the increasing demands of high-rate experiments, thereby enabling new scientific discoveries (for instance, by allowing imaging detectors to operate at higher frame-equivalent rates or to capture fast transient phenomena with unprecedented temporal resolution). In a broader sense, the research serves as a case study in innovation at the intersection of electronic engineering and experimental physics. It exemplifies how re-examining first principles (here, questioning the necessity of a clocked, priority-driven readout paradigm) can lead to breakthroughs that overcome long-standing bottlenecks.

Furthermore, the journey documented in this dissertation highlights the synergy between theoretical and experimental work. By tackling both the conceptual and practical aspects of the problem, the author has demonstrated that it is possible to translate an abstract idea into a tangible device that functions effectively in the real world. This reinforces an important lesson for scientific advancement: meaningful progress often requires not only proposing new ideas but also rigorously testing and refining them through implementation. Each challenge encountered, from design complexities to measurement nuances, ultimately strengthened the outcome and provided deeper insight.

As the field moves forward, this dissertation will leave a lasting impact through both concrete deliverables (the chips and designs developed) and intellectual inspiration. Others can build upon the circuits, methodologies, and results reported here to create even more capable systems. The final takeaway is optimism and confidence in the event-driven approach. By affirming the feasibility and advantages of pixel detectors with in-situ processing and asynchronous readout, this work lays a firm foundation for future innovations. The author hopes that the ideas and results presented herein will inspire further research, development, and adoption of such technologies, ultimately leading to more innovative and faster detectors that empower the next generation of scientific exploration. In closing, the success of the EDWARD architecture in this research heralds a promising direction for the field, and its potential is only beginning to be realized. The story of event-driven pixel readouts will continue to unfold, fueled by the achievements and lessons of this dissertation, and pointing toward a future where our instruments are as dynamic and responsive as the phenomena they measure.


# **Bibliography**

- [1] S. Acharya *et al.*, "ALICE upgrades during the LHC Long Shutdown 2," *Journal of Instrumentation*, vol. 19, no. 05, p. P05062, May 2024. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/19/05/P05062
- [2] J. Allen *et al.*, "Early-Career Researchers' Perspective on Future Colliders," CERN, Tech. Rep., 2024. [Online]. Available: https://cds.cern.ch/record/2904262
- [3] R. Abdul Khalek *et al.*, "Science requirements and detector concepts for the Electron-Ion Collider: EIC Yellow Report," *Nuclear Physics A*, vol. 1026, p. 122447, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0375947422000677
- [4] G. S. Coats and R. B. Gunderman, "X-ray astronomy," *Journal of the American College of Radiology*, vol. 9, no. 1, pp. 3–6, Jan 2012. [Online]. Available: https://doi.org/10.1016/j.jacr.2011.08.013
- [5] M. Chen *et al.*, "Radiation effects on scientific complementary metal-oxide-semiconductor detectors for x-ray astronomy: II. Total ionizing dose irradiation," *Journal of Astronomical Telescopes, Instruments, and Systems*, vol. 10, no. 2, p. 026001, 2024. [Online]. Available: https://doi.org/10.1117/1.JATIS.10.2.026001
- [6] H. Wen, M. J. Cherukara, and M. V. Holt, "Time-resolved x-ray microscopy for materials science," *Annual Review of Materials Research*, vol. 49, pp. 389–415, 2019. [Online]. Available: https://www.annualreviews.org/content/journals/10.1146/annurev-matsci-070616-124014
- [7] K. Taguchi *et al.*, "Photon counting detector computed tomography," *IEEE Transactions on Radiation and Plasma Medical Sciences*, vol. 6, no. 1, pp. 1–4, 2022.
- [8] A. Gonzalez-Montoro *et al.*, "Evolution of PET detectors and event positioning algorithms using monolithic scintillation crystals," *IEEE Transactions on Radiation and Plasma Medical Sciences*, vol. 5, no. 3, pp. 282–305, 2021.
- [9] B. F. Hutton, "The origins of SPECT and SPECT/CT," *European Journal of Nuclear Medicine and Molecular Imaging*, vol. 41, no. 1, pp. 3–16, May 2014. [Online]. Available: https://doi.org/10.1007/s00259-013-2606-5
- [10] A. Saba, "File:ALICE ITS.jpg - Wikimedia Commons," https://commons.wikimedia.org/wiki/File:ALICE\_ITS.jpg, [Accessed 06-04-2025].
- [11] VERITAS, "File:VERITAS array.jpg - Wikimedia Commons," https://commons.wikimedia.org/wiki/File:VERITAS\_array.jpg, [Accessed 06-04-2025].
- [12] C. Karunakaran, R. Lahlali, N. Zhu *et al.*, "Factors influencing real time internal structural visualization and dynamic process monitoring in plants using synchrotron-based phase contrast x-ray imaging," *Scientific Reports*, vol. 5, no. 1, p. 12119, Jul 2015. [Online]. Available: https://doi.org/10.1038/srep12119
- [13] M. Wilson, "New CT scanner bolsters medical capabilities in theater," https://www.afcent.af.mil/Units/455th-Air-Expeditionary-Wing/Photos/igphoto/2001672286/, [Accessed 06-04-2025].
- [14] IDuke, "File:Xray-verkehrshaus.jpg - Wikimedia Commons," https://commons.wikimedia.org/wiki/File:Xray-verkehrshaus.jpg, [Accessed 06-04-2025].
- [15] L. Tomala, "Poles develop a super-fast x-ray camera," *Science in Poland*, PAP - Science and Scholarship in Poland, May 2016. [Online]. Available: https://scienceinpoland.pl/en/news/news%2C409587%2Cpoles-develop-a-super-fast-x-ray-camera.html
- [16] M. Garcia-Sciveres and N. Wermes, "A review of advances in pixel detectors for experiments with high rate and radiation," *Reports on Progress in Physics*, vol. 81, no. 6, p. 066101, May 2018. [Online]. Available: https://dx.doi.org/10.1088/1361-6633/aab064
- [17] D. S. Górni, "System wbudowany dla testów hybrydowych detektorów pikselowych: Embedded system for hybrid pixel detectors testing," Praca dyplomowa (Master's Thesis), AGH University of Science and Technology, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, Kraków, Poland, 2020.
- [18] M. Garcia-Sciveres, "Hybrid pixel readout integrated circuits," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 1057, p. 168725, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900223007167
- [19] J. C. Christiansen and M. L. Garcia-Sciveres, "RD Collaboration Proposal: Development of pixel readout integrated circuits for extreme rate and radiation," CERN, Geneva, Tech. Rep., 2013. The authors are editors on behalf of the participating institutes, which are listed in the proposal. [Online]. Available: https://cds.cern.ch/record/1553467
- [20] C. Soós *et al.*, "Versatile Link PLUS transceiver development," *Journal of Instrumentation*, vol. 12, no. 03, p. C03068, Mar 2017. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/12/03/C03068
- [21] M. B. Valentin *et al.*, "In-pixel AI for lossy data compression at source for x-ray detectors," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 1057, p. 168665, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900223006551
- [22] S. A. Taylor *et al.*, "CCD and CMOS imaging array technologies: technology review," Xerox Research Centre Europe, UK, pp. 1–14, 1998.
- [23] S. Heuvelmans and M. Boerrigter, "A pixel read-out architecture implementing a two-stage token ring, zero suppression and compression," *Journal of Instrumentation*, vol. 6, no. 01, p. C01093, Jan 2011. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/6/01/C01093
- [24] P. Purohit and R. Manohar, "Asynchronous, event-driven readout for large-scale imaging devices," in 2025 29th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2025, pp. 118–125.
- [25] P. Fischer, G. Comes, and H. Krüger, "MEPHISTO – a 128-channel front end chip with real time data sparsification and multi-hit capability," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 431, no. 1, pp. 134–140, 1999. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900299002624
- [26] P. Yang *et al.*, "Low-power priority address-encoder and reset-decoder data-driven readout for monolithic active pixel sensors for tracker system," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 785, pp. 61–69, Jun 2015. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900215002818
- [27] P. Lichtsteiner, C. Posch, and T. Delbruck, "A $128 \times 128$ 120 dB 15 $\mu$s latency asynchronous temporal contrast vision sensor," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 2, pp. 566–576, 2008.
- [28] W. R. Leo, *Techniques for Nuclear and Particle Physics Experiments*, 2nd ed. Springer-Verlag, 1994.
- [29] G. F. Knoll, *Radiation Detection and Measurement*, 4th ed. Wiley, 2010.
- [30] C. Van Eijk, "Basic radiation detectors," ch. 6, IAEA, Jan. 2025.
- [31] F. Gómez *et al.*, "Development of an ultra-thin parallel plate ionization chamber for dosimetry in FLASH radiotherapy," *Medical Physics*, vol. 49, no. 7, pp. 4705–4714, 2022. [Online]. Available: https://aapm.onlinelibrary.wiley.com/doi/abs/10.1002/mp.15668
- [32] D. West, "Energy measurements with proportional counters," in *Progress in Nuclear Physics* (*Second Edition*), 2nd ed., O. Frisch, Ed. Pergamon, 2013, pp. 18–62. [Online]. Available: https://www.sciencedirect.com/science/article/pii/B9781483199894500057
- [33] W. Blum and L. Rolandi, *Particle Detection with Drift Chambers*. Springer, 2008.
- [34] D. Adams *et al.*, "Photon detector system timing performance in the DUNE 35-ton prototype liquid argon time projection chamber," *Journal of Instrumentation*, vol. 13, no. 06, p. P06022, Jun 2018. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/13/06/P06022
- [35] J. B. Birks, *The Theory and Practice of Scintillation Counting*. Pergamon Press, 1964.
- [36] T. Yanagida, "Inorganic scintillating materials and scintillation detectors," *Proc Jpn Acad Ser B Phys Biol Sci*, vol. 94, no. 2, pp. 75–97, 2018.
- [37] J. D. Bronzino, "Chapter 15 - Radiation imaging," in *Introduction to Biomedical Engineering (Third Edition)*, 3rd ed., ser. Biomedical Engineering, J. D. Enderle and J. D. Bronzino, Eds. Boston: Academic Press, 2012, pp. 995–1038. [Online]. Available: https://www.sciencedirect.com/science/article/pii/B9780123749796000150
- [38] D. Jenkins, "Scintillator detectors for gamma-ray detection," in *Radiation Detection for Nuclear Physics*, ser. 2053-2563. IOP Publishing, 2020, pp. 5–1 to 5–31. [Online]. Available: https://dx.doi.org/10.1088/978-0-7503-1428-2ch5
- [39] G. Lutz, *Semiconductor Radiation Detectors*. Springer, 2007.
- [40] H. Spieler, *Semiconductor Detector Systems*. Oxford University Press, 2005.
- [41] R. Ballabriga *et al.*, "Photon counting detectors for x-ray imaging with emphasis on CT," *IEEE Transactions on Radiation and Plasma Medical Sciences*, vol. 5, no. 4, pp. 422–440, 2021.
- [42] M. An *et al.*, "A low-noise CMOS pixel direct charge sensor, Topmetal-II-," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 810, pp. 144–150, 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900215015727
- [43] T. Tanimori *et al.*, "Development of an imaging microstrip gas chamber with a 5 cm × 5 cm area based on multi-chip module technology," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 381, no. 2, pp. 280–288, 1996. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900296008339
- [44] E. Babichev *et al.*, "High pressure multiwire proportional and gas microstrip chambers for medical radiology," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 360, no. 1, pp. 271–276, 1995, proceedings of the Sixth Pisa Meeting on Advanced Detectors. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0168900295000976
- [45] I. Sanna, "Novel silicon detectors in ALICE at the LHC: The ITS3 and ALICE 3 upgrades," *EPJ Web Conf.*, vol. 296, p. 08002, 2024. [Online]. Available: https://doi.org/10.1051/epjconf/202429608002
- [46] R. D. Evans, *The Atomic Nucleus*. McGraw-Hill, 1955.
- [47] W. Heitler, *The Quantum Theory of Radiation*, 3rd ed. Oxford University Press, 1954.
- [48] O. Klein and Y. Nishina, "Über die Streuung von Strahlung durch freie Elektronen nach der neuen relativistischen Quantendynamik von Dirac," *Zeitschrift für Physik*, vol. 52, pp. 853–868, 1929.
- [49] H. Bethe and J. Ashkin, "Passage of radiation through matter," in *Experimental Nuclear Physics*. Wiley, 1953, vol. 1.
- [50] J. F. Ziegler, J. P. Biersack, and U. Littmark, *The Stopping and Range of Ions in Solids*. Pergamon Press, 1985.
- [51] M. B. Chadwick *et al.*, "ENDF/B-VIII.0: The 8th major release of the nuclear reaction data library," *Nuclear Data Sheets*, vol. 148, pp. 1–142, 2018.
- [52] S. Ramo, "Currents induced by electron motion," *Proceedings of the IRE*, vol. 27, no. 9, pp. 584–585, 1939.
- [53] A. Rivetti, *CMOS: Front-End Electronics for Radiation Sensors*, ser. Devices, Circuits, and Systems. Boca Raton, FL: CRC Press, Jun. 2015.
- [54] A. Banerjee, *Noise in Semiconductor Devices*. Cham: Springer Nature Switzerland, 2024, pp. 139–179. [Online]. Available: https://doi.org/10.1007/978-3-031-45750-0\_10
- [55] P. Grybos *et al.*, "32k channels readout IC for single photon counting detectors with 75 µm pitch, ENC of 123 e- rms, 9 e- rms offset spread and 2/
- [56] C. Brönnimann and P. Trüb, *Hybrid Pixel Photon Counting X-Ray Detectors for Synchrotron Radiation*. Cham: Springer International Publishing, 2016, pp. 995–1027. [Online]. Available: https://doi.org/10.1007/978-3-319-14394-1\_36
- [57] I. Perić, C. Kreidl, and P. Fischer, "Hybrid pixel detector based on capacitive chip to chip signal-transmission," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 617, no. 1, pp. 576–581, 2010, 11th Pisa Meeting on Advanced Detectors. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900209017847
- [58] G. Alimonti *et al.*, "RD53 pixel readout integrated circuits for ATLAS and CMS HL-LHC upgrades," *Journal of Instrumentation*, vol. 20, no. 03, p. P03024, Mar 2025. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/20/03/P03024
- [59] P. Delpierre, "A history of hybrid pixel detectors, from high energy physics to medical imaging," *Journal of Instrumentation*, vol. 9, no. 05, p. C05059, May 2014. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/9/05/C05059
- [60] R. Ballabriga, M. Campbell, and X. Llopart, "An introduction to the Medipix family ASICs," *Radiation Measurements*, vol. 136, p. 106271, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1350448720300354
- [61] J. Dudak, "High-resolution x-ray imaging applications of hybrid-pixel photon counting detectors Timepix," *Radiation Measurements*, vol. 137, p. 106409, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1350448720301888
- [62] Q. Zhang *et al.*, "Submillisecond X-ray photon correlation spectroscopy from a pixel array detector with fast dual gating and no readout dead-time," *Journal of Synchrotron Radiation*, vol. 23, no. 3, pp. 679–684, May 2016. [Online]. Available: https://doi.org/10.1107/S1600577516005166
- [63] G. Deptuch, "New Generation of Monolithic Active Pixel Sensors for Charged Particle Detection," Ph.D. dissertation, Université Louis Pasteur Strasbourg I, Sep. 2002. [Online]. Available: https://theses.hal.science/tel-00011109
- [64] N. Apadula *et al.*, "Monolithic active pixel sensors on CMOS technologies," 2022. [Online]. Available: https://arxiv.org/abs/2203.07626
- [65] G. Deptuch *et al.*, "Simulation and measurements of charge collection in monolithic active pixel sensors," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 465, no. 1, pp. 92–100, 2001, SPD2000. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900201003618
- [66] M. Havránek *et al.*, "DMAPS: a fully depleted monolithic active pixel sensor – analog performance characterization," *Journal of Instrumentation*, vol. 10, no. 02, p. P02013, Feb 2015. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/10/02/P02013
- [67] M. Mager, "ALPIDE, the monolithic active pixel sensor for the ALICE ITS upgrade," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 824, pp. 434–438, 2016, Frontier Detectors for Frontier Physics: Proceedings of the 13th Pisa Meeting on Advanced Detectors. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900215011122
- [68] W. Snoeys *et al.*, "A process modification for CMOS monolithic active pixel sensors for enhanced depletion, timing performance and radiation tolerance," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 871, pp. 90–96, 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S016890021730791X
- [69] A. L. Steinhebel *et al.*, "AstroPix: CMOS pixels in space," *PoS*, vol. Pixel2022, p. 020, 2023.
- [70] M. Battaglia *et al.*, "A rad-hard CMOS active pixel sensor for electron microscopy," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 598, no. 2, pp. 642–649, 2009. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900208014447
- [71] G. Pinaroli *et al.*, "Test and characterization of finely segmented pixel CZT detectors for future hard x-ray missions," in *Space Telescopes and Instrumentation 2024: Ultraviolet to Gamma Ray*, J.-W. A. den Herder, S. Nikzad, and K. Nakazawa, Eds., vol. 13093, International Society for Optics and Photonics. SPIE, 2024, p. 130935R. [Online]. Available: https://doi.org/10.1117/12.3019976
- [72] M. Jeong and G. Kim, "Development of charge sensitive amplifiers based on various circuit board substrates and evaluation of radiation hardness characteristics," *Nuclear Engineering and Technology*, vol. 52, no. 7, pp. 1503–1510, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1738573319306874
- [73] T. Noulis, S. Siskos, and G. Sarrabayrouse, "Development and testing of an advanced CMOS readout architecture dedicated to x-rays silicon strip detectors," *Proceedings of the Topical Workshop on Electronics for Particle Physics, TWEPP 2008*, Jan 2008.
- [74] A. Chierici *et al.*, "A low-cost radiation detection system to monitor radioactive environments by unmanned vehicles," *The European Physical Journal Plus*, vol. 136, Mar 2021.
- [75] T. Poikela *et al.*, "Timepix3: a 65k channel hybrid pixel readout chip with simultaneous ToA/ToT and sparse readout," *Journal of Instrumentation*, vol. 9, no. 05, p. C05013, May 2014. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/9/05/C05013
- [76] R. Zhang, "Research on the design and application of analog-to-digital converters," *Applied and Computational Engineering*, vol. 89, no. 1, pp. 200–205, Nov. 2024.
- [77] C. Liu, Z. Deng, and Q. Yu, "Development of ultrawide dynamic range readout ASICs for radiation monitoring in space," *IEEE Transactions on Nuclear Science*, vol. 72, no. 1, pp. 52–60, 2025.
- [78] A. Koziol *et al.*, "Artificial neural network on-chip and in-pixel implementation towards pulse amplitude measurement," *Journal of Instrumentation*, vol. 18, no. 02, p. C02048, Feb 2023. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/18/02/C02048
- [79] E. Ronconi *et al.*, "Timestamp and amplitude measurement solution for radiation detectors," in *2022 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC)*, 2022, pp. 1–4.
- [80] P. Maj *et al.*, "Comparison of the charge sharing effect in two hybrid pixel detectors of different thickness," *Journal of Instrumentation*, vol. 10, no. 02, p. C02006, Feb 2015. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/10/02/C02006
- [81] A. Mozzanica *et al.*, "The GOTTHARD charge integrating readout detector: design and characterization," *Journal of Instrumentation*, vol. 7, no. 01, p. C01019, Jan 2012. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/7/01/C01019
- [82] R. Dinapoli *et al.*, "MÖNCH, a small pitch, integrating hybrid pixel detector for x-ray applications," *Journal of Instrumentation*, vol. 9, no. 05, p. C05015, May 2014. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/9/05/C05015
- [83] R. Dinapoli *et al.*, "EIGER: Next generation single photon counting detector for x-ray applications," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 650, no. 1, pp. 79–83, 2011, International Workshop on Semiconductor Pixel Detectors for Particles and Imaging 2010. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900210027427
- [84] P. Kraft *et al.*, "Performance of single-photon-counting PILATUS detector modules," *Journal of Synchrotron Radiation*, vol. 16, no. 3, pp. 368–375, May 2009. [Online]. Available: https://doi.org/10.1107/S0909049509009911
- [85] R. Yonamine, T. Maerschalk, and G. D. Lentdecker, "Study and optimization of the spatial resolution for detectors with binary readout," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 830, pp. 130–139, 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900216304697
- [86] M. Barbero *et al.*, "The FE-I4 pixel readout chip and the IBL module," *PoS*, vol. Vertex2011, p. 038, 2011. [Online]. Available: https://www.osti.gov/biblio/1039544
- [87] A. Hayrapetyan *et al.*, "Operation and performance of the CMS silicon strip tracker with proton-proton collisions at the CERN LHC," *Journal of Instrumentation*, vol. 20, no. 08, p. P08027, Aug 2025. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/20/08/P08027
- [88] The ATLAS Collaboration, "Operation of the ATLAS trigger system in Run 2," *Journal of Instrumentation*, vol. 15, no. 10, p. P10004, Oct 2020. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/15/10/P10004
- [89] J. Kim, "Signal processing and noise analysis on realistic radiation detector model," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 1038, p. 166931, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900222003758
- [90] N. Wermes and G. Hallewell, *ATLAS pixel detector*, ser. Technical design report. ATLAS. Geneva: CERN, 1998. [Online]. Available: https://cds.cern.ch/record/381263
- [91] K. Desjardins *et al.*, "The CirPAD, a circular 1.4M hybrid pixel detector dedicated to X-ray diffraction measurements at Synchrotron SOLEIL," *Journal of Synchrotron Radiation*, vol. 29, no. 1, pp. 180–193, Jan 2022. [Online]. Available: https://doi.org/10.1107/S1600577521012492
- [92] K. Yarita *et al.*, "Proton radiation damage experiment for x-ray SOI pixel detectors," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 924, pp. 457–461, 2019, 11th International Hiroshima Symposium on Development and Application of Semiconductor Tracking Detectors. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900218312014
- [93] T. Heim and M. Garcia-Sciveres, "Self-adjusting threshold mechanism for pixel detectors," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 867, pp. 209–214, 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900217306939
- [94] A. Gabrielli, "Fast readout architectures for large arrays of digital pixels: Examples and applications," *The Scientific World Journal*, vol. 2014, no. 1, p. 523429, 2014. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1155/2014/523429
- [95] CSIRO, "Maia – high definition x-ray fluorescence microscopy," accessed 2025, https://research.csiro.au/hdxfm/maia/.
- [96] E. Vernon *et al.*, "Development of a high-rate front-end ASIC for x-ray spectroscopy and diffraction applications," *IEEE Transactions on Nuclear Science*, vol. 67, no. 4, pp. 752–759, 2020.
- [97] D. P. Siddons *et al.*, "Maia x-ray microprobe detector array system," *Journal of Physics: Conference Series*, vol. 499, no. 1, p. 012001, Apr 2014. [Online]. Available: https://dx.doi.org/10.1088/1742-6596/499/1/012001
- [98] G. Pinaroli *et al.*, "Multi-channel front-end ASIC for a 3D position-sensitive detector," *Journal of Instrumentation*, vol. 17, no. 02, p. C02011, Feb 2022. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/17/02/C02011
- [99] P. Maj *et al.*, "A virtual Frisch-grid geometry-based CZT gamma detector for in-field radioisotope identification," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, p. 170976, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900225007788
- [100] W. Wong *et al.*, "Introducing Timepix2, a frame-based pixel detector readout ASIC measuring energy deposition and arrival time," *Radiation Measurements*, vol. 131, p. 106230, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1350448719305165
- [101] P. Burian *et al.*, "Ethernet embedded readout interface for Timepix2 – Katherine readout for Timepix2," *Journal of Instrumentation*, vol. 15, no. 01, p. C01037, Jan 2020. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/15/01/C01037
- [102] T. Kondo *et al.*, "3D stacked image sensor with simultaneous global shutter and rolling shutter readout operation," in 2016 IEEE International Conference on Electron Devices and Solid-State Circuits (EDSSC), 2016, pp. 456–459.
- [103] G. Deptuch *et al.*, "Vertically Integrated Pixel Readout Chip for High Energy Physics," in *Government Microcircuit Applications & Critical Technology Conference (GOMACTech-11)*, Jan 2011.
- [104] E. Bartz *et al.*, "The 0.25 $\mu$m token bit manager chip for the CMS pixel readout," in *Proceedings of the Pixel 2005 Conference*, ser. CERN-2005-011. Geneva, Switzerland: CERN, 2005, pp. 259–267.
- [105] J. Lakowicz, *Principles of Fluorescence Spectroscopy*. Springer, Jan 2006, vol. 1.
- [106] A. N. Nagamani *et al.*, "On the design of hazard free reversible asynchronous circuits," in 2014 International Conference on Advances in Electronics Computers and Communications, 2014, pp. 1–6.
- [107] R. Ginosar, "Metastability and synchronizers: A tutorial," *IEEE Design & Test of Computers*, vol. 28, no. 5, pp. 23–35, 2011.
- [108] S.-C. Liu *et al.*, *Event-Based Neuromorphic Systems*. John Wiley & Sons, Ltd, 2015. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/9781118927601
- [109] C. L. Seitz, "Ideas about arbiters," *Lambda*, no. 1, pp. 10–14, 1980.
- [110] D. S. Gorni, G. W. Deptuch, and S. Miryala, "Event-driven readout system with non-priority arbitration for multichannel data sources," International Patent Application WO 2022/221 068 A1, 2022. [Online]. Available: https://lens.org/141-223-083-713-496
- [111] P. Fischer, "First implementation of the MEPHISTO binary readout architecture for strip detectors," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment*, vol. 461, no. 1, pp. 499–504, 2001, 8th Pisa Meeting on Advanced Detectors. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0168900200012833
- [112] ALICE ITS ALPIDE development team, *ALPIDE Operations Manual*, CERN, draft version.
- [113] D. S. Gorni, G. W. Deptuch, and S. Miryala, "Event-driven readout system with non-priority arbitration for multichannel data sources," U.S. Patent Application US 2024/0193116 A1, 2024. [Online]. Available: https://lens.org/102-888-036-963-099
- [114] D. S. Gorni, G. W. Deptuch, and S. Miryala, "Event-driven readout system with non-priority arbitration for multichannel data sources," European Patent Application EP 4 323 878 A1, 2024. [Online]. Available: https://lens.org/168-474-982-255-915
- [115] D. S. Gorni, G. W. Deptuch, and S. Miryala, "Event-driven readout system with non-priority arbitration for multichannel data sources," Australian Patent Application AU 2022/259481 A1, 2023. [Online]. Available: https://lens.org/178-537-125-499-235
- [116] D. S. Gorni, G. W. Deptuch, and S. Miryala, "Event-driven readout system having non-priority arbitration for multichannel data sources" (in Japanese), Japan Patent Application JP 2 024 514 178 A, 2024. [Online]. Available: https://lens.org/089-609-437-426-055
- [117] D. Gorni *et al.*, "Event driven readout architecture with non-priority arbitration for radiation detectors," *Journal of Instrumentation*, vol. 17, no. 04, p. C04027, Apr 2022. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/17/04/C04027
- [118] K. Boahen, "Point-to-point connectivity between neuromorphic chips using address events," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 47, no. 5, pp. 416–434, 2000.
- [119] A. Martin and M. Nystrom, "Asynchronous techniques for system-on-chip design," *Proceedings of the IEEE*, vol. 94, no. 6, pp. 1089–1120, 2006.
- [120] D. Gorni *et al.*, "Integration of edward readout architecture in full-field fluorescence imaging detector," *Journal of Instrumentation*, vol. 19, no. 04, p. C04035, apr 2024. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/19/04/C04035
- [121] P. Maj *et al.*, "Evaluation of a full field fluorescence imager with synchrotron radiation," 2025. [Online]. Available: https://arxiv.org/abs/2507.14425
- [122] G. Deptuch, "Pixel detectors with built-in signal processing and bandwidth-efficient data transmission," in 10th International Workshop on Semiconductor Pixel Detectors for Particles and Imaging, 2023, p. 48.
- [123] D. S. Gorni *et al.*, "A universal all-digital platform for implementation of configuration-testability-readout functionalities within pixel detectors," Poster presented at the 2022 IEEE Nuclear Science Symposium, Medical Imaging Conference and Room Temperature Semiconductor Detector Conference (NSS MIC RTSD), Milan, Italy, Nov. 2022, poster number: NSS-03-021.
- [124] A. Veiga and E. Spinelli, "A pulse generator with Poisson-exponential distribution for emulation of radioactive decay events," in 2016 IEEE 7th Latin American Symposium on Circuits & Systems (LASCAS), 2016, pp. 31–34.
- [125] F. Cannizzaro *et al.*, "Results of the measurements carried out in order to verify the validity of the Poisson-exponential distribution in radioactive decay events," *The International Journal of Applied Radiation and Isotopes*, vol. 29, no. 11, pp. 649–IN1, 1978. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0020708X78901011
- [126] Wikipedia contributors, "Linear-feedback shift register," Wikipedia, The Free Encyclopedia, 2025, [Online; accessed 22-July-2025]. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Linear-feedback_shift_register&oldid=1301107523
- [127] D. Gorni *et al.*, "Event-driven readout development: testing of the EDWARD65P1 chip with integrated event generators," *Journal of Instrumentation*, vol. 20, no. 03, p. C03009, Mar. 2025. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/20/03/C03009
- [128] D. Gorni, G. Deptuch, and S. Miryala, "Investigation of timing properties for an event driven with access and reset decoder readout architecture for a pixel array," in 2022 17th Conference on Ph.D Research in Microelectronics and Electronics (PRIME), 2022, pp. 113–116.
- [129] National Instruments, *sbRIO-9629 Specifications*, Austin, Texas, Aug. 2025, version 2025-08-01.
- [130] *AD9649 Analog-to-Digital Converter—Data Sheet*, Analog Devices, Inc., One Technology Way, Norwood, MA, USA, Feb. 2017, document number AD9649, Revision B.
- [131] About NSLS-II. Brookhaven National Laboratory. [Online]. Available: https://www.bnl.gov/nsls2/about-nsls-ii.php