AstroPix4 — a novel HV-CMOS sensor developed for space based experiments

For the proposed space-based gamma-ray observatory All-sky Medium-Energy Gamma-ray Observatory eXplorer (AMEGO-X), a silicon tracker based on a novel High Voltage-CMOS (HV-CMOS) sensor called AstroPix is currently being developed. Preliminary measurements with the first full reticle prototype, AstroPix3, show that the power target of 1.5 mW/cm² cannot currently be reached due to the digital power consumption of 3.06 mW/cm², while the analog power consumption of 1.06 mW/cm² and a breakdown voltage of over 350 V look promising. Based on these results, the design changes in AstroPix4, submitted in May 2023, are presented, including changes to the time stamp generation and the readout architecture. A digital power consumption below 0.25 mW/cm² is expected by removing both the fast 200 MHz clock used to measure the time-over-threshold (ToT) and an LVDS receiver. A maximum resolution of 3.125 ns for time-of-arrival (ToA) and ToT is reached by adding per-pixel Flash Time-to-Digital Converters (Flash-TDCs) controlled by a global delay-locked loop (DLL).


AstroPix for AMEGO-X
A novel High Voltage-CMOS (HV-CMOS) monolithic active pixel sensor [1] named AstroPix is currently being developed for the proposed All-sky Medium-Energy Gamma-ray Observatory eXplorer (AMEGO-X) mission concept [2]. AMEGO-X is a space-based observatory for multimessenger astronomy consisting of three main detectors: an AstroPix-based silicon tracker, a caesium iodide (CsI) calorimeter, and an anti-coincidence detector (ACD). During its three-year mission it will observe gamma rays in the range of 100 keV to 1 GeV, targeting the MeV gap. The key requirements for the AstroPix sensors in the Compton tracker are a dynamic range from 20 keV to 700 keV, a fully depleted substrate, and a power consumption below 1.5 mW/cm².
The most recent iteration, AstroPix4, is a 1 cm × 1 cm prototype submitted in May 2023 in TSI's 180 nm process with a 35 × 35 pixel matrix. It features a new readout architecture and improved timing circuits with reduced power consumption and simplified module integration. Section 2 reviews the AMEGO-X requirements in light of the first AstroPix3 measurements. Based on these findings, section 3 explains the AstroPix4 design changes.

Requirements review based on preliminary AstroPix3 results
In May 2022, AstroPix3, the first full reticle size (2 cm × 2 cm) prototype, was taped out and fabricated with three different substrate resistivities: 20 Ω-cm, 370 Ω-cm, and 18 kΩ-cm. It has a 35 × 35 pixel matrix with a pixel size of 300 µm × 300 µm and a 200 µm clearance between pixels in both the horizontal and the vertical direction. The chip features a daisy-chained Serial Peripheral Interface (SPI), which allows chip-to-chip readout and configuration and occupies only five data lines per SPI bus on the data acquisition system (DAQ) for a row of up to 32 chips. Using two MISO lines instead of one doubles the maximum readout data rate to 5 MB/s at a clock frequency of 20 MHz.
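The doubling of the data rate follows directly from the SPI clock. The sketch below is a back-of-the-envelope check that assumes one bit per MISO line per clock cycle and ignores any framing or protocol overhead; the function name is illustrative only.

```python
def spi_rate_bytes_per_s(clock_hz: float, miso_lines: int) -> float:
    """Raw SPI readout rate, assuming one bit per MISO line per clock cycle."""
    bits_per_second = clock_hz * miso_lines
    return bits_per_second / 8  # 8 bits per byte, protocol overhead ignored


single = spi_rate_bytes_per_s(20e6, 1)  # one MISO line
dual = spi_rate_bytes_per_s(20e6, 2)    # two MISO lines
print(f"1 MISO line: {single / 1e6:.1f} MB/s, 2 MISO lines: {dual / 1e6:.1f} MB/s")
```

At 20 MHz this gives 2.5 MB/s for one line and 5 MB/s for two, matching the figure quoted above.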
All required bias currents and voltages are generated internally by 6-bit current DACs and 10-bit voltage DACs. The analog in-pixel front-end consists of a charge sensitive amplifier (CSA), implemented as an n-type cascode amplifier with an additional feedback capacitance to increase the dynamic range,

JINST 19 C04010
a band-pass filter, and a CMOS comparator that converts the analog signal into a digital pulse. The comparator outputs of all pixels in a row, and of all pixels in a column, are OR'd, which reduces the number of readout channels to the number of rows plus the number of columns. To reduce crosstalk from the metal traces connecting the pixels to the synthesized digital logic in the bottom periphery of the chip, the OR'd signal is level-shifted from 1.8 V to a reduced amplitude of 0.8 V.
In the periphery, a global time stamp, generated by an 8-bit counter driven by an external 2.5 MHz time stamp clock, is assigned at the leading edge of the hit. The same edge starts a 12-bit counter driven by an external 200 MHz clock to measure the time-over-threshold (ToT). To keep the active duty cycle of the SPI interface low, readout via SPI is triggered only when there is data to be read out, i.e. when an open-drain signal shared by all chips on one bus is pulled low.
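To put the AstroPix3 time stamp figures in perspective, the following sketch derives the LSB and the wrap-around range of each counter. It assumes a plain binary counter that advances once per clock period; this is an illustration, not a description of the synthesized logic.

```python
def counter_figures(clock_hz: float, bits: int) -> tuple[float, float]:
    """Return (LSB in seconds, range before wrap-around in seconds)."""
    lsb = 1.0 / clock_hz
    return lsb, lsb * 2**bits


toa_lsb, toa_range = counter_figures(2.5e6, 8)   # global time stamp counter
tot_lsb, tot_range = counter_figures(200e6, 12)  # ToT counter
print(f"ToA: {toa_lsb * 1e9:.0f} ns LSB, wraps after {toa_range * 1e6:.1f} us")
print(f"ToT: {tot_lsb * 1e9:.0f} ns LSB, max {tot_range * 1e6:.2f} us")
```

The 400 ns LSB of the 2.5 MHz time stamp is the ToA resolution that the AstroPix4 Flash-TDCs later improve upon.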

Power consumption
In contrast to a tracker in a ground-based collider experiment, the instrument will be located in a satellite with a very limited power budget. The complete tracker will consist of four towers with 40 layers each. One layer contains 95 quad modules, i.e. 380 AstroPix chips, resulting in a total of about 61k chips. After subtracting the consumption of the other detectors and the DAQ, a maximum of 1.5 mW/cm² remains available for the pixel sensors.
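The chip count and the resulting sensor power envelope can be checked with a few lines of arithmetic. The chip area of 4 cm² is an assumption here (a full reticle 2 cm × 2 cm chip, as for AstroPix3); the text does not state the final flight chip size. Under this assumption, the 1.5 mW/cm² budget corresponds to roughly 365 W for all pixel sensors together.

```python
TOWERS = 4
LAYERS_PER_TOWER = 40
CHIPS_PER_LAYER = 95 * 4    # 95 quad modules per layer
CHIP_AREA_CM2 = 2.0 * 2.0   # assumption: full reticle 2 cm x 2 cm chips
BUDGET_MW_PER_CM2 = 1.5

n_chips = TOWERS * LAYERS_PER_TOWER * CHIPS_PER_LAYER
total_area_cm2 = n_chips * CHIP_AREA_CM2
sensor_power_w = total_area_cm2 * BUDGET_MW_PER_CM2 / 1000  # mW -> W
print(n_chips, "chips,", total_area_cm2, "cm^2,", f"{sensor_power_w:.1f} W")
```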
The measured power consumption of AstroPix3 is shown in table 1. With an amplifier bias current of 1 µA and a comparator bias current of 250 nA, a signal-to-noise ratio (SNR) of 40 was measured at a bias voltage of −30 V. Dividing the power figures listed in the table by the chip area of 4 cm² gives an analog power per area of 1.06 mW/cm² and a digital power per area of 3.06 mW/cm².
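Since the contents of Table 1 are not reproduced in this text, the sketch below only back-calculates the per-chip power implied by the quoted per-area values and compares their sum with the target. It uses no values beyond those in the paragraph above, and shows that the digital part alone (about 12.24 mW per chip) already exceeds the budget.

```python
CHIP_AREA_CM2 = 4.0       # 2 cm x 2 cm AstroPix3
TARGET_MW_PER_CM2 = 1.5

per_area = {"analog": 1.06, "digital": 3.06}  # mW/cm^2, quoted in the text
per_chip_mw = {name: value * CHIP_AREA_CM2 for name, value in per_area.items()}
total_per_area = sum(per_area.values())

print(per_chip_mw)  # implied absolute power per chip in mW
print(f"total {total_per_area:.2f} mW/cm^2 vs target {TARGET_MW_PER_CM2} mW/cm^2")
```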

Dynamic range and depletion
The dynamic range requirement of the sensor is 20 keV to 700 keV with a resolution of 5 keV RMS at 122 keV. To track Compton events, the pixel has to absorb the energy of recoil electrons of up to 700 keV; therefore the depletion thickness has to be 500 µm. TCAD simulations showed that a depletion thickness of 500 µm requires a reverse bias of 300 V to 400 V and a substrate resistivity of 10 kΩ-cm [3]. The I/V curves of AstroPix3 for both the 20 Ω-cm and the 370 Ω-cm substrate are shown in figure 1. The 20 Ω-cm chips show an early breakdown at 240 V, whereas the 370 Ω-cm chips do not break down below 400 V. The 18 kΩ-cm measurement is dominated by a high leakage current in the mA range, the cause of which is under investigation. The energy of a hit is measured via the ToT, but even with the high dynamic range modification, the CSA saturates if the input charge exceeds 60 000 e⁻. The ToT still increases beyond this point, but its slope ΔToT/ΔQ with respect to the input charge Q decreases for larger charges. Simulations show that reaching the required 5 keV energy resolution in this region calls for a ToT resolution of O(10 ns).
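The link between collected charge and deposited energy can be sketched with the mean electron-hole pair creation energy in silicon of about 3.6 eV. This is a standard textbook value, not one quoted in the text, and it varies slightly with temperature. Under this assumption, the 60 000 e⁻ saturation point corresponds to roughly 216 keV, well below the 700 keV upper end of the dynamic range, which is why the ToT response has to remain usable beyond CSA saturation.

```python
E_PAIR_EV = 3.6  # assumed mean energy per electron-hole pair in silicon (eV)


def energy_to_electrons(energy_kev: float) -> float:
    """Number of electrons collected for a fully absorbed deposit."""
    return energy_kev * 1e3 / E_PAIR_EV


def electrons_to_energy_kev(n_electrons: float) -> float:
    """Deposited energy corresponding to a collected charge."""
    return n_electrons * E_PAIR_EV / 1e3


for e_kev in (20, 122, 700):
    print(f"{e_kev:>4} keV -> {energy_to_electrons(e_kev):8.0f} e-")
print(f"CSA saturation at 60000 e- ~ {electrons_to_energy_kev(60_000):.0f} keV")
```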

Time stamp generation
To achieve the necessary ToT resolution, previous versions of AstroPix used a synthesized synchronous 12-bit counter driven by an external 200 MHz clock. The drawback of this solution is the power consumed by the clock tree and the differential receiver. AstroPix4 therefore follows a different approach based on slow clocks and asynchronous TDCs to minimize static and dynamic power. A CMOS phase-locked loop (PLL) generates a 20 MHz clock from a 2.5 MHz reference clock. It is composed of a true single-phase clock (TSPC) logic phase/frequency detector (PFD), a cascode charge pump, a 9-stage ring oscillator built from current-starved inverters, and an integrated loop filter. The control voltage is linearly converted into a control current that fine-tunes the voltage-controlled oscillator (VCO). A second, programmable current source sets the coarse operating point of the VCO. This allows a broad 200 MHz frequency range while keeping the fine-tuning range small, so that the VCO gain stays low for improved phase noise. The VCO output is connected to an asynchronous divide-by-8 divider, whose output is re-synchronized with the oscillator output to avoid accumulating jitter. The simulated power consumption of the PLL is 100 µW.
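The frequency plan of the PLL can be related to the 9-stage ring oscillator and the divide-by-8 divider with a short numerical sketch. The relation f = 1/(2·N·t_d) for an N-stage inverter ring is the standard first-order estimate, not a figure from the AstroPix4 design, and the ripple divider below is a purely behavioural model without propagation delays.

```python
N_STAGES = 9       # ring oscillator stages
F_REF_HZ = 2.5e6   # reference clock
DIVIDE_BY = 8      # feedback divider ratio

# In lock, the PFD sees f_vco / DIVIDE_BY == F_REF_HZ.
f_vco_hz = F_REF_HZ * DIVIDE_BY

# First-order ring-oscillator estimate f = 1 / (2 * N * t_d), solved for t_d.
t_stage_s = 1.0 / (2 * N_STAGES * f_vco_hz)


def ripple_divide(input_edges: int, n_flops: int = 3) -> int:
    """Count rising output edges of an n-flop asynchronous (ripple) divider.

    Each toggle flip-flop is clocked by the rising edge of the previous
    stage, so the last output rises once per 2**n_flops input edges.
    Propagation delay is ignored here; in silicon each stage adds delay
    and jitter, which is why the divider output is re-timed.
    """
    state = [0] * n_flops
    out_edges = 0
    for _ in range(input_edges):
        for i in range(n_flops):
            state[i] ^= 1          # stage i toggles on its rising input edge
            if state[i] == 0:      # falling output: next stage is not clocked
                break
            if i == n_flops - 1:   # last stage produced a rising edge
                out_edges += 1
    return out_edges


print(f"VCO {f_vco_hz / 1e6:.1f} MHz, stage delay {t_stage_s * 1e9:.2f} ns")
print(ripple_divide(800), "output edges for 800 VCO edges")
```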
The 20 MHz clock drives a synchronous 17-bit positive-edge Gray counter and a 1-bit negative-edge Gray counter in the synthesized digital part of the chip. The resulting coarse time stamp is distributed to the hit buffers of each pixel, located below the pixel matrix. On both the leading and the trailing edge of a hit, these 18 bits are stored in 3T dynamic random access memory (3T-DRAM) cells, resulting in a theoretical resolution of 25 ns when the negative- and positive-edge time stamps are combined.
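Gray code is a common choice for a time stamp bus that is sampled asynchronously by many pixels, because consecutive values differ in a single bit, so a latch that samples during a transition is off by at most one count. The sketch below uses the standard binary-reflected Gray conversion. How AstroPix4 actually combines the two counters is not described here; as an assumption, the 1-bit negative-edge counter is XORed with the least significant bit of the rising-edge count to tell which half of the 50 ns period the hit fell into, which yields the 25 ns step. All function names are hypothetical. With 17 bits, the rising-edge count wraps after 2^17 × 50 ns ≈ 6.55 ms.

```python
T_CLK_NS = 50.0   # 20 MHz clock period
POS_BITS = 17     # width of the rising-edge counter


def bin_to_gray(b: int) -> int:
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def sample_counters(t_ns: float) -> tuple[int, int]:
    """Hypothetical counter states latched at t_ns (t = 0 at a rising edge)."""
    pos = int(t_ns // T_CLK_NS) % (1 << POS_BITS)  # rising edges seen, wraps
    neg = int(t_ns / T_CLK_NS + 0.5) % 2           # falling edges seen, mod 2
    return bin_to_gray(pos), neg


def coarse_time_ns(pos_gray: int, neg: int) -> float:
    """Combine both counters into a half-period (25 ns) time stamp."""
    pos = gray_to_bin(pos_gray)
    half = neg ^ (pos & 1)  # 1 if latched in the second half of the period
    return (2 * pos + half) * T_CLK_NS / 2


for t in (10.0, 30.0, 60.0, 80.0):
    print(t, "ns ->", coarse_time_ns(*sample_counters(t)), "ns")
```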
To further improve the resolution, each hit buffer contains a 16-bit Flash-TDC [4] that measures the time difference between the hit edges and the next rising edge of the 20 MHz clock, as shown in figure 2. Dividing the 50 ns clock period into 16 steps increases the maximum resolution to 3.125 ns.
Each Flash-TDC cell consists of a delay element formed by two current-starved inverters. The output of each delay element is connected to two 3T-DRAM cells that store the corresponding TDC value for the leading and the trailing edge. The TDC is started by the edges of the hit signal; the stop signal is always the next rising edge of the 20 MHz clock.
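A behavioural sketch of how a stored Flash-TDC state could be turned into a fine time stamp follows. It assumes that the 16 cells each delay by T_clk/16 = 3.125 ns once the DLL has locked, and that the cells reached by the hit edge when the next rising clock edge stops the line form a thermometer code. The actual storage and readout format is not specified in the text, and all names are hypothetical.

```python
T_CLK_NS = 50.0              # 20 MHz clock period
N_CELLS = 16                 # delay elements in the Flash-TDC
TAU_NS = T_CLK_NS / N_CELLS  # 3.125 ns per cell once the DLL has locked


def next_rising_edge(hit_ns: float) -> float:
    """Stop signal: the first rising clock edge after the hit (edges at k*T)."""
    return (hit_ns // T_CLK_NS + 1) * T_CLK_NS


def flash_tdc_code(hit_ns: float) -> list[int]:
    """Thermometer code frozen in the DRAM cells when the stop edge arrives."""
    elapsed = next_rising_edge(hit_ns) - hit_ns    # 0 < elapsed <= T_CLK_NS
    n = min(int(elapsed // TAU_NS), N_CELLS)       # cells the hit edge passed
    return [1] * n + [0] * (N_CELLS - n)


def decode_hit_time(code: list[int], stop_ns: float) -> float:
    """Estimate the hit time: stop edge minus the number of traversed cells."""
    return stop_ns - sum(code) * TAU_NS


hit = 123.456
stop = next_rising_edge(hit)
code = flash_tdc_code(hit)
print(code, decode_hit_time(code, stop))
```

Because only whole cells are counted, the estimate lags the true hit time by less than one cell delay, which is the theoretical resolution of the TDC.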
At stop, writing to the DRAM is disabled to preserve the current state of each delay element. Dedicated enable logic prevents a restart of the TDC while a hit is stored in the DRAM but not yet read out. The resulting theoretical resolution is the delay of one cell. This delay depends strongly on process, voltage, and temperature (PVT); therefore the total delay is stabilized to 50 ns by a global DLL, which is described further in section 3.2. The simulated leakage power is 50 nW per hit buffer cell and 61 µW for a projected full reticle chip. The worst-case average power of the Flash-TDC, $P_{\mathrm{TDC,avg}}$, can be estimated from its active duty cycle,

$$P_{\mathrm{TDC,avg}} = 2\, f_{\mathrm{hit}}\, T_{\mathrm{clk}}\, P_{\mathrm{TDC,active}},$$

where $f_{\mathrm{hit}}$ is the event rate, $P_{\mathrm{TDC,active}}$ is the power drawn while the delay line is running, and the factor 2 accounts for the TDC being started on both the leading and the trailing edge. This assumes that all hits arrive immediately after a rising clock edge, so that the active time of the TDC equals the clock period $T_{\mathrm{clk}}$. Due to the low active duty cycle of O(10⁻⁶), caused by the maximum of 10 events per second, the dynamic power of the TDC is negligible. At the same time, the time-of-arrival (ToA) resolution improves from 400 ns to 3.125 ns.
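The leakage and duty-cycle figures can be reproduced as follows. Two points are assumptions rather than statements from the text: the 10 events per second are treated as the rate seen by a single TDC, and each event is assumed to start the TDC twice (leading and trailing edge). The active power `p_active_w` is a free, hypothetical parameter, since no value is given.

```python
N_PIXELS = 35 * 35          # projected full reticle chip
LEAK_PER_CELL_W = 50e-9     # simulated leakage per hit buffer cell
T_CLK_S = 50e-9             # 20 MHz clock period
EVENT_RATE_HZ = 10.0        # maximum event rate quoted in the text
STARTS_PER_EVENT = 2        # assumption: leading and trailing edge each start the TDC

leakage_w = N_PIXELS * LEAK_PER_CELL_W

# Worst case: every start keeps the delay line running for a full clock period.
duty_cycle = STARTS_PER_EVENT * EVENT_RATE_HZ * T_CLK_S


def tdc_average_power_w(p_active_w: float) -> float:
    """Average dynamic power for a given (hypothetical) active power."""
    return duty_cycle * p_active_w


print(f"leakage: {leakage_w * 1e6:.2f} uW, duty cycle: {duty_cycle:.1e}")
print(f"1 mW active -> {tdc_average_power_w(1e-3) * 1e9:.1f} nW average")
```

Even a generous active power is scaled down by six orders of magnitude, so the leakage term dominates the TDC budget.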

Global delay-locked loop
The delay-locked loop (DLL) consists of a CMOS PFD built from standard cells, a cascode charge pump, and a delay line that is a replica of the delay line used in the TDC. To match the capacitive loading of the delay elements in the TDC, two dummy DRAM cells are connected to each delay element. The loop filter is implemented as a 20 pF metal-insulator-metal (MIM) capacitor. DLL design faces two difficulties: first, harmonic lock, i.e. locking onto a multiple of the reference period; and second, the stuck problem, in which the control voltage drifts into one of the supply rails because the delay of the feedback signal is smaller than the reference delay, $T_{\mathrm{fb}} < T_{\mathrm{ref}}$. Various solutions have been presented [5,6]. The design in AstroPix4 follows a simpler, non-reset-free approach with the circuit shown in figure 3.
Harmonic locking is prevented by starting the delay line with the shortest possible delay, which is achieved by forcing the control voltage to the supply voltage at startup. The stuck problem is circumvented by ensuring that the delay is in the desired range at startup. This is implemented by feeding the PFD not with the reference clock directly but with a copy of it delayed by one period of Clk_ref. The simulated power consumption of the DLL is 150 µW. To test the performance of the DLL-stabilized TDC, a Monte Carlo simulation of the DLL together with an entire column of 35 TDCs was performed. The PVT variation is reduced from 5.4 LSB to 0.2 LSB, and the impact of mismatch was simulated to be 0.5 LSB.
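Why the delayed reference matters can be seen in a very small discrete-time loop model. It is a sketch under strong simplifications, not a model of the AstroPix4 circuit: the delay is updated once per reference period in proportion to the PFD phase error, the control range is clamped at hypothetical minimum and maximum delays (standing in for the supply rails), and harmonic lock is not modelled. With the reference delayed by one period, a line started at minimum delay settles at 50 ns, i.e. 3.125 ns per TDC cell; with the reference fed in directly, the PFD always sees a lagging feedback edge and drives the delay to the rail.

```python
T_REF_NS = 50.0   # 20 MHz reference period = target delay of the line
D_MIN_NS = 10.0   # hypothetical minimum delay of the replica line
D_MAX_NS = 120.0  # hypothetical maximum delay of the replica line


def simulate_dll(delayed_reference: bool, d_start_ns: float = D_MIN_NS,
                 cycles: int = 200, gain: float = 0.2) -> float:
    """Return the line delay after `cycles` reference periods.

    Per cycle, the PFD compares the delay-line output edge (at d) with
    either the undelayed reference edge (at 0) or a copy delayed by one
    reference period (at T_REF_NS). A positive error means the feedback
    edge leads, so the charge pump lengthens the delay.
    """
    d = d_start_ns
    for _ in range(cycles):
        ref_edge = T_REF_NS if delayed_reference else 0.0
        error = ref_edge - d
        # Clamp models the control voltage saturating at the supply rails.
        d = min(max(d + gain * error, D_MIN_NS), D_MAX_NS)
    return d


locked = simulate_dll(delayed_reference=True)
stuck = simulate_dll(delayed_reference=False, d_start_ns=30.0)
print(f"delayed reference: {locked:.3f} ns -> {locked / 16:.3f} ns per TDC cell")
print(f"direct reference:  {stuck:.3f} ns (pinned at the minimum delay)")
```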

Pixel readout architecture
The readout architecture changed from OR'd rows and columns to per-pixel readout, as displayed in figure 4 (left), similar to the column drain architecture used in ATLASpix3 [7]. The OR'd rows and columns scheme, implemented as open-drain buses, was chosen in AstroPix1 to keep the number of readout channels, and thereby the power consumption, low. With the changes described in section 3.1, however, the number of channels no longer drives the power consumption. On the contrary, the old architecture causes several complications, most notably hit identification problems when multiple hits occur in the same rows and columns, as shown in the example in figure 4 (right). If the dashed third hit occurs within the ToT of the two hits drawn with solid lines, its row and column lines are already active, so it is missed completely.
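The identification problem of the OR'd scheme can be illustrated with a snapshot of the line states while several pixels are over threshold at the same time. The pixel coordinates below are hypothetical, and timing is reduced to "all listed hits overlap in time". Two hits already produce four candidate positions, and a third hit at the crossing of an active row and an active column leaves the line pattern unchanged, so it cannot be seen at all.

```python
from itertools import product

N_ROWS = N_COLS = 35


def active_lines(hits: set[tuple[int, int]]) -> tuple[frozenset[int], frozenset[int]]:
    """Row and column lines that are high while all given pixels are over threshold."""
    rows = frozenset(r for r, _ in hits)
    cols = frozenset(c for _, c in hits)
    return rows, cols


def candidate_pixels(rows: frozenset[int], cols: frozenset[int]) -> set[tuple[int, int]]:
    """All pixel positions consistent with the active lines."""
    return set(product(rows, cols))


two_hits = {(1, 2), (4, 7)}       # hypothetical (row, column) coordinates
three_hits = two_hits | {(1, 7)}  # third hit while the first two are still over threshold

print(N_ROWS + N_COLS, "OR'd channels vs", N_ROWS * N_COLS, "per-pixel channels")
print(sorted(candidate_pixels(*active_lines(two_hits))))
print(active_lines(three_hits) == active_lines(two_hits))
```

With per-pixel readout, each of the three hits drives its own channel, so neither the ambiguity nor the missed hit occurs.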
Another disadvantage is that the RC delay introduced by the long metal traces connecting each pixel to the periphery depends not only on the row but also on the column of the hit, which makes offline compensation more difficult. At higher rates, per-pixel readout is also preferable because of the high occupancy of the shared lines in the OR'd rows and columns scheme.
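The occupancy argument can be made concrete with a simple estimate: for Poisson-distributed hits and a fixed ToT, the probability that a line is already busy when a new hit arrives is 1 − exp(−N·r·ToT), where N pixels share the line. The per-pixel rate and ToT below are hypothetical values chosen only for illustration. For low occupancy, a line shared by the 35 pixels of a row is roughly 35 times more likely to be busy than a dedicated per-pixel line.

```python
import math


def occupancy(rate_hz: float, tot_s: float, n_pixels: int = 1) -> float:
    """Probability that a line is busy, for Poisson hits with a fixed ToT."""
    return 1.0 - math.exp(-n_pixels * rate_hz * tot_s)


RATE_PER_PIXEL_HZ = 100.0  # hypothetical per-pixel hit rate
TOT_S = 10e-6              # hypothetical ToT

per_pixel = occupancy(RATE_PER_PIXEL_HZ, TOT_S)
or_line = occupancy(RATE_PER_PIXEL_HZ, TOT_S, n_pixels=35)  # 35 pixels share a row line
print(f"per-pixel {per_pixel:.2e}, OR'd line {or_line:.2e}")
```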

Summary and outlook
Measurements with AstroPix3 show that the digital power per area of 3.06 mW/cm² alone exceeds the target of 1.5 mW/cm², whereas the analog power consumption of 1.06 mW/cm² and the breakdown voltage of > 350 V look promising. Simulations of the new AstroPix4 prototype, submitted in May 2023, show that by removing fast clocks and receivers, the digital power consumption for the ToT and ToA measurement can be reduced to 700 µW, which includes the consumption of the PLL and DLL as well as a safety factor of 2.
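The 700 µW figure can be set against the simulated contributions itemized earlier in the text (PLL 100 µW, DLL 150 µW, TDC leakage 61 µW). The sketch assumes that the 700 µW refer to a projected 2 cm × 2 cm full reticle chip, which is an interpretation rather than an explicit statement; the remaining gap between the doubled itemized sum and 700 µW presumably covers parts not broken down in the text, such as the counters.

```python
CONTRIB_UW = {
    "PLL": 100.0,
    "DLL": 150.0,
    "TDC leakage (full reticle)": 61.0,
}
SAFETY_FACTOR = 2
QUOTED_TOTAL_UW = 700.0
FULL_RETICLE_CM2 = 4.0  # assumption: the figure refers to a 2 cm x 2 cm chip

itemized_uw = SAFETY_FACTOR * sum(CONTRIB_UW.values())
per_area_mw = QUOTED_TOTAL_UW / FULL_RETICLE_CM2 / 1000  # uW -> mW per cm^2
print(f"itemized x{SAFETY_FACTOR}: {itemized_uw:.0f} uW (quoted {QUOTED_TOTAL_UW:.0f} uW)")
print(f"{per_area_mw:.3f} mW/cm^2 vs 3.06 mW/cm^2 digital on AstroPix3")
```

Under this assumption, the result of about 0.175 mW/cm² is consistent with the expected digital power consumption below 0.25 mW/cm².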
The new readout scheme enables a higher rate capability and mitigates the identification problems observed in AstroPix3. Together with the improved ToA resolution of 3.125 ns, this is important for the projected use of AstroPix in the barrel calorimeter of the ePIC detector at the Electron-Ion Collider (EIC) [8]. Next steps include the first test of AstroPix3 in a 3-layer tracking demonstrator during a sounding rocket flight [9], Edge-TCT [10] depletion measurements, and further testing of the high-resistivity substrate.

Figure 2. Simplified 6-bit schematic of the pixel Flash-TDC, stabilized to a maximum delay of 50 ns by a global DLL.

Figure 3. DLL startup controller to start the delay line at minimum delay and to ensure that the delay of the feedback signal is smaller than the reference delay.

Figure 4. Left: column-drain-based per-pixel readout. Right: OR'd rows and columns architecture; hits can be missed if there are multiple hits in one row or column.

Table 1. Power consumption of the 2 cm × 2 cm AstroPix3, separated into the analog (VDDA), amplifier bias (VSSA), and digital (VDDD) supply rails.