Monolithic MHz-frame rate digital SiPM-IC with sub-100 ps precision and 70 μm pixel pitch

This paper presents the design and characterization of a monolithic integrated circuit (IC) including digital silicon photomultipliers (dSiPMs) arranged in a 32 × 32 pixel matrix at 70 μm pitch. The IC provides per-quadrant time stamping and hit-map readout, and is fabricated in a standard 150-nm CMOS technology. Each dSiPM pixel consists of four single-photon avalanche diodes (SPADs) sharing a quenching and subsequent processing circuitry and has a fill factor of 30 %. A sub-100 ps precision, 12-bit time-to-digital converter (TDC) provides timestamps per quadrant with an acquisition rate of 3 MHz. Together with the hit map, the total sustained data throughput of the IC amounts to 4 Gbps. Measurements obtained in a dark, temperature-stable environment as well as by using a pulsed laser environment show the full dSiPM-IC functionality. The dark-count rate (DCR) as function of the overvoltage and temperature, the TDC resolution, differential and integral nonlinearity (DNL/INL) as well as the propagation delays across the matrix are presented. With aid of additional peripheral test structures, the main building blocks are characterized and key parameters are presented.


Introduction
Silicon photomultipliers (SiPMs) are composed of an array of single-photon avalanche diodes (SPADs), which are photodiodes reverse biased above breakdown voltage, operating in Geigermode.SiPMs working principles and main advantages like large intrinsic gain (typically 10 5 to 10 6 ), insensitivity against magnetic fields, and large dynamic range starting with single-photon response at low bias voltages, are well explained and specified in literature [1][2][3].Arrays of SPADs are highlighted as ideal candidates when high sensitivity is required together with high frame rate and precise timing resolutions [4].As stated in [5], the capability of photon counting makes SPADs the detector of choice for applications in which conventional photodiodes and charge-coupled devices cannot be used, and a large number of applications for SPAD arrays are listed.Beside the usage as imaging device, e.g. in [6], also light detection and ranging [7], and direct minimum ionizing particle detection [8] are mentioned.
In traditional analog SiPMs, each single SPAD is connected to its quenching resistor forming a so-called microcell.All microcells are connected in parallel and the common current output signal has to be amplified and digitized.The number of microcells defines the dynamic range.In digital SiPMs (dSiPMs) one benefits from the inherent digital behavior of the device, when sensing directly the output of an individual SPAD by an embedded quenching and recharging circuitry together with a simple discriminator, e.g. an inverter.This concept was initially introduced in [9], where SPADs are integrated with conventional CMOS circuits on the same substrate.This additional circuitry, in pixel or at the periphery can be used to acquire, store and transmit data.It negatively impacts the fill factor [3], but offers the information which SPAD is hit.Individual readout permits the definition of spatial granularity, e.g.grouping of SPADs in pixel, reduces circuit complexity and increases the fill factor.Furthermore, it offers temporal granularity by delivering the timestamp for a section of the pixel array, and it provides the opportunity to identify and switch off noisy pixels, to adapt the holdoff time minimizing the afterpulsing probability, and to count hits within a pixel.The main drawback of implementing SPADs in standard CMOS technologies compared to a custom technology is the relatively high dark noise, expressed in terms of dark-count rate (DCR) [10].To compensate these DCR and fill-factor issues for the detection of charged particles, one could implement a dual layer structure of different SPAD arrays for a coincidence measurement, like in [11].DCR is one of the most important properties of SPADs, besides photon-detection probability (PDP), representing the avalanche probability of the device in response to a photon absorption at a given wavelength, and afterpulsing, which introduces false events that are correlated in time to previous detection [3].Also optical crosstalk, fill factor, timing resolution and deadtime have to be considered to characterize SPADs.
The presented dSiPM-IC uses a similar readout concept like in the preceding project published in [12,13].The concept comprises a 32 × 32 dSiPM-pixel matrix subdivided into quadrants.Each quadrant consists of a 16 × 16 dSiPM-pixel array sharing a time-to-digital converter (TDC) and a validation logic.The 12-bit TDC is designed to provide the timestamp of the fastest pixel with a time resolution of less than 100 ps.The validation logic discards undesirable events by setting a threshold.For example, if only one pixel is hit at the same time, it will be most likely a dark event.
Along with the timing information, also the hit map can be read out continuously in a frame-based mode at 3 MHz.The previous readout chip was designed in Global Foundry's 130-nm CMOS technology comprising a single quadrant.The major difference between the old and new concept is the interconnection approach of sensor and readout electronics.In the old concept, the readout IC is flip-chip connected to the sensor chip utilizing 30 m solder spheres at 50 m pitch [14].This hybrid approach enables the direct readout of each sensor pixel by its corresponding pixel electronics occupying the same pixel area at about 50 m pitch, without affecting the fill factor.Because the sensitive area of the sensor chip is now vis-á-vis with the readout chip, direct light illumination of the matrix is not possible.In [13], DCR measurements were carried out on the first hybrid samples, and the TDC characterized.In concluding, it was planed to reduce the high DCR by a new sensor design.A further required step was the redesign of the IC in another CMOS process.It was decided to follow the current trend and select a monolithic approach [15].The chosen process provides on the one hand fully characterized SPADs in four configurations with suitable properties, and on the other hand the possibility to redesign the previous developed readout electronics in a comparable process.The advantage of having sensor and readout electronics on the same device is the designer's independency of sensor provider and interconnection issues.This allowed a fast and cost-efficient realization of a first proof-of-principle device.
The realized IC is shown and its essential components are described in section 2. Section 3 summarizes the results obtained from measurements and compares results from test-circuit blocks with results from earlier prototypes [13].Furthermore, the used Caribou data acquisition (DAQ) system [19] and the test setups are introduced.The conclusions are given in section 4.

Matrix and periphery
The monolithic IC (cf.figure 1a) has been designed in LFoundry's 150-nm CMOS technology to take advantage of an available add-on library including single p+/nwell SPADs with an active area of 20 × 20 m 2 .This area is framed by a cathode ring plus an additional pwell ring to minimize the crosstalk to neighboring cells.The dSiPM-pixel design comprises four SPADs in parallel sharing the pwell ring and using it partially for the NMOS transistors of the common pixel electronics to reduce dead area.In this way, the pixel covers an area of 69.6 × 76 m 2 with a fill factor of 30 %. Figure 1 shows the layout of the IC (a) and a photograph of the dSiPM pixel (b).The IC consists of four identical units (quadrants), each one (cf.figure 2) with a 16 × 16 dSiPM-pixel matrix, a single 12-bit TDC for event-time stamping, a validation logic with adjustable settings for discarding undesirable events, and serializer circuits followed by links (MUX + TX) for fast (about 1 Gbps) and sustained data readout.In this way, the system provides the full hit map of a 32 × 32 dSiPM-pixel matrix together with four timestamps for each frame.Figure 2 shows a simplified block diagram of one quadrant with all components, as well as additional blocks used globally for all quadrants.When an avalanche process is initiated in a SPAD, the pixel draws current from a fast wired-OR connection, and the earliest pixel in the readout frame triggers the TDC.Simultaneously, the row-wise wired-ORs are monitored by a validation logic generating a valid bit for this event.The latched outputs of a peripheral 40-bit frame counter (FC) complete the quadrant timing information with the frame number of the event.The TDC and FC are started by the Shutter signal defining the beginning of a measurement.The 40-bit FC allows a total recording time of about 100 hours for a measurement.The timing data are serialized and multiplexed together with the timing data of the other quadrants.The counted hits in each pixel are serialized row-wise and multiplexed column-wise to deliver the hit map of the quadrant.In the periphery, an initialization register provides the control signals for pixel masking, validation thresholds and TDC settings.Furthermore, clock dividers and buffers serve to distribute all essential clocks to the components.The main clocks are provided by the DAQ system.The frame-based readout and operation mode runs at a targeted frequency of 3 MHz (FRAME CLK).Additionally, a 408-MHz SYSTEM CLK (multiple of the FRAME CLK) is required as reference clock for the TDCs (cf.TDC Ref Clock) as well as for the multiplexers to enable the sustained readout of the entire hit matrix.

Block details and timing
Figure 3a shows the dSiPM-pixel electronics consisting of the four SPADs connected in parallel, a front-end circuitry using 3.3-V NMOS transistors, and a readout circuitry operating at core voltage of 1.8 V.The front end allows an overvoltage (V  ) of maximally 3.6 V, which sets the anode voltage.The used quenching circuit is similar to the presented one in [16], where the quenching is performed by a globally biased transistor (cf.V ℎ ) and an inverter (cf.INV) as comparator for the digital pulse shaping (cf.Out).A clamping transistor (cf.V   ) limits the inverter input to 1.8 V.The pixel electronics include the possibility to mask a pixel (cf.Masking) via a single SRAM cell.The hold-off time of the pixel can be adapted by the global bias voltage V ℎ .Detected hits can be counted within the acquisition window by the 2-bit hit counter.If this function is not needed, the counter can be set as buffer by the control signal /Set_2bit.
Figure 3b shows the TDC, which is divided into a fine and coarse converter.The fine TDC consists of a delay-locked loop (DLL) with 32 differential delay elements (DEs).A phase-frequency detector compares the incoming reference signal (cf.Rf ) with the delayed one (cf.Dl) and steers the following charge pump like in [17].The resulting voltage over the loop-filter capacitance (C  ) together with a calibration circuit controls the delay of all DEs (cf.V  ).The DE outputs are latched by the wired-OR trigger signal until the next frame starts.The latched signals are encoded as thermometer code and have to be decoded into binary with the aid of a 32-to-5-bit encoder.After each DLL cycle, a subsequent ripple counter is incremented and serves as 7-bit coarse TDC.
The row-wise wired-OR outputs (cf.R<0:15>) are monitored by a 4-step validation logic generating a valid bit for an event, as displayed in figure 3c.In the first validation step, every row output is connected by a selectable AND/OR gate with its neighboring row output.In each subsequent step all gates in a column can be programmed as AND or OR (by Valid_cntr<0:3>).In this way, pixel in a cluster of few pixels firing simultaneously can be identified.and read out during the following frame.The dynamic range of the TDC with 2 7 /408 MHz = 313.7 ns corresponds to the acquisition window.By using the hit counter in 2-bit mode, the readout time has to be adapted to enable full hit-map readout.Accordingly, the acquisition window extends to two frame-clock cycles.

Test circuits
In the periphery of the matrix, test circuits are integrated for single-block characterization (cf.figure 1a).These extra blocks comprise a stand-alone TDC, as well as a chain of receiver (RX) and transmitter (TX) links using the low-voltage differential signaling (LVDS) standard.The simplified block schemes are shown in figure 5.The RX consists of a Schmitt-Trigger circuitry operated at 3.3 V supply voltage and is followed by a voltage-conversation stage delivering single-ended output.The TX includes an input stage comprising an edge aligner, voltage converter and buffers to drive the input of the subsequent current-switching circuitry.To keep the output common mode stable about 1.2 V, a common-mode feedback (CMFB) steers the output current.Both circuits are terminated with 100 Ω on the chip.Additionally, sensor-test structures in different configurations are implemented.For example, a 3 × 3 test-SPAD array, where the center SPAD is connected to the same front end as used for the dSiPM pixel, enables the monitoring of the digital output of a single SPAD.

Experimental setup
For characterization of the IC, the versatile Caribou readout system is used enabling fast and low-cost implementation of new solid-state detector prototypes.It offers open access to hardware, firmware [20] and software [21] speeding up the test setup development.Caribou mainly consists of a system-on-chip (SoC) evaluation board (Xilinx ZC706) and a control-and-readout (CaR) interface board.A field-programmable gate array (FPGA) runs custom hardware blocks for data processing and detector controlling, and an embedded CPU runs the DAQ and control software.The CaR board provides the physical interface between the SoC and the detector.It also includes power supplies, analogue and digital I/Os, a clock generator, analog-to-digital converters, current and voltage references as well as several connectors.A detector-specific carrier board comprises the IC, LVDS repeater, and components for filtering, decoupling and line termination, as well as a trigger logic for an external pulsed laser source.The carrier board is covered by an aluminum case protecting the dSiPM-IC from physical damage, external light and acting as a heat sink.The dark-event based characterization of the dSiPM-pixel matrix was performed by using a climate chamber and temperature sets between −25 • C and 25 • C.
In order to determine the propagation delay along the wired-OR lines, a pulsed 1054-nm laser was operated synchronously to the frame clock.The laser is movable in all dimensions illuminating sequentially the actual enabled pixel with a spot diameter of ∼0.5 mm.
Another IC test board together with the data-timing generator DTG 5334 (3.35 GHz clock, 0.2 ps step size) and the oscilloscope Teledyne LeCroy SDA-760ZI were used as second test setup for the characterization of the separate links and the stand-alone TDC at room temperature.

dSiPM-pixel matrix
With the Caribou-test setup, several samples were measured.Initially, the DCR was monitored as a function of the common SPAD-bias voltage (V  ) by accumulating dark hits within 10,000 frames.A hit can only be detected by the dSiPM pixel (cf.figure 3a) when V  exceeds the breakdown voltage (V  ) plus the threshold voltage of the inverter.Taking this into account, V  could be estimated by taking the voltage at which hits are detected minus the threshold voltage of the inverter, which is about 0.6 V.The voltage above V  is defined as the overvoltage (V  ).
All results of a representative sample are shown in figure 6. V  as a function of temperature is plotted in figure 6a and the average DCR of all 1024 pixels is plotted in figure 6b.This diagram illustrates the strong dependency of DCR on temperature and V  .For example, a decrease of V  from 2 V to 1 V results in a DCR reduction of about 50 %.Further cooling of the sample reduces the noise behavior enormously.This is nicely seen in figure 6c for two different overvoltages.While in 6c the average DCR of all pixels within the matrix is plotted vs. temperature, figure 6d  [22] a very similar distribution for an array of 32 × 32 SPADs at 44.64 m pitch using the same sensor cell and process is shown.The range is here between some ten Hz and ∼200 kHz.By using our peripheral 3 × 3 test-SPAD array we were able to perform the DCR measurements on single SPADs.The results of three SPADs (550 Hz, 1050 Hz and 15.3 kHz) fit into this distribution plot.If we take the data from figure 6c (tagged with a "↓") and look for these in figure 6d, one can say that at least 67 % of the pixels at −25 • C and 73 % of the pixels at 25 • C have DCR values below the average ones.This percentage means that only a few very noisy pixels boost the average DCR value, and this behavior grows with temperature.Disabling 10 % of the noisiest pixels leads to an improvement of 40 % in average DCR.

dSiPM-pixel electronics
The previous measurements were taken without using the 2-bit hit-counting functionality in pixel.This feature allows for the determination of the deadtime of the pixel (quenching and recharging).Figure 7 shows the deadtime as function of the global bias voltage (V ℎ ) of the quenching transistor (cf.figure 3a), exemplary for a pixel in the center of the laser spot.For this measurement, two laser pulses were send within a frame starting with maximal distance (about 450 ns) to each other and minimal V ℎ .Below V ℎ = 0.55 V, the electronics counts only one hit, that means the pixel needs more time for recharging than the acquisition window allows.After first recognition of the second hit, the deadtime can be defined as distance between both input pulses.This distance can be decreased with increasing V ℎ .The minimal deadtime is achieved at V ℎ = 0.95 V with about 22 ns.

Quadrant TDC
In addition to the hit map, the individual quadrant-TDC data are monitored during the dark-event measurements.In figure 8, time-of-arrival (ToA) is plotted into histograms measured at V  = 2 V and different operation temperatures.At −25 • C (figure 8a), the entries for all ToA values are roughly uniformly distributed over the dynamic range (cf. the Gaussian fit on the left of figure 8a).With increasing temperature, the entries shift to lower timestamp values (cf.figure 8b).This is mainly caused by the temperature-dependent DCR.The probability for more entries at lower timestamps increases with number of firing pixels within a quadrant, because all of them share the same TDC, a) b) Due to the non-uniform distribution it is not possible to determine the integral and differential nonlinearity (INL and DNL) of the TDCs over their total dynamic range with aid of the statistical code density test.Therefore, figure 9a only shows the histograms of the fine-TDC values.The deviation of entries for every bin from the mean value of all entries defines the DNL of each bin.The INL can be calculated as the cumulative sum of DNLs [18].Figure 9b and c show the DNL and INL for V  = 2 V and 25 • C, respectively.As visible in the histogram, the fine TDCs do not tap their full potential of 32 bins (5 bits).That means, the delay of the DEs in the DLL is too large.We believe, that the circuit is processed in a process corner, where all transistors work very slowly.Process, voltage and temperature variations can have a big impact on chip functionality.Corner simulations show a delay range (bin width) of 70 ps until 110 ps.To tune these values two control switches (TDC_cntr<0:1>) are included in the calibration circuit of the TDC (cf.figure 3b).But for the slow corner, the additional switched current is insufficient, and only a bin width of about 94 ps is achieved instead of typical 76.5 ps.Furthermore, the distribution of clocks over the matrix can provoke run-time discrepancies.This can be the reason for the different number of fine-TDC bins for the four quadrant TDCs.
For the characterization plots in figure 9, first the DNL standard deviation (cf.DNL in figure 9b) is determined excluding the last two bins.A 5- limit on DNL defines the maximal bin number of the fine TDC or the minimal bin width of the last bin, respectively.Second, the DNL and INL are determined for the updated fine-TDC bins.This procedure was done for all dark-event measurements leading to similar results.Here, the effect of temperature variations is visible in the reached number of bins.The DE delay decreases with increasing temperature.At −25 • C, the dynamic range is increased by about two bins (cf.figure 9d).This effect makes a stable temperature environment desirable, and the differences in bin width have to be considered for the ToA calculations.As mentioned in section 2, all pixel outputs in a quadrant are connected via wired-ORs triggering the TDC.These metal interconnections come along with parasitic elements implicating propagation delays across the matrix.These delays introduce non-negligible offsets per pixel, which must be considered in timestamp calculations.Figure 10 shows the offset map (left) and a map with its standard deviations (right), measured by using the laser source.For this measurement the pixels are sequentially enabled and illuminated, and the TDC output per pixel is stored.The offset is calculated by subtracting each pixel value by the value of the closest pixel to the corresponding TDC (e.g.pixel 1/9 for quadrant 3), which is expected to be the lowest one.In this case, the offset map illustrates the wired-OR routing to the four TDCs (signal propagating row-wise from middle to the left and right, and then to the middle of the edge of each quadrant).The maximum offset in the area of 1.12 × 1.22 mm 2 (one quadrant) is (3.45 ± 0.61) LSB (∼326 ps ± 86 ps).The minimum offset is ≤ 0 LSB caused by some jitter.The standard deviation map on the right side of figure 10 highlights the ranges, where the propagation-delay value falls in a LSB change of the TDCs.

Stand-alone TDC
In contrast to the previous subsection, where random dark events were used to characterize the quadrant fine TDCs, a time-ramp signal was created by driving the trigger input of the stand-alone TDC with step size of 1 ps over the entire dynamic range.As reference clock, the 408-MHz system clock was used.The measured average bin width amounts to ∼95 ps.The deviation of each bin width to the average value defines the DNL, and the deviation to an ideal line fitted into the step curve defines the INL. Figure 11 exemplarily shows the measured TDC characteristics of one sample.

Conclusions
We presented a monolithic 32 × 32 dSiPM-pixel matrix IC with 70 m pitch and 30 % fill factor designed and fabricated in LF's 150-nm CMOS technology.It enables full hit-map readout and provides sub-100 ps time stamping for each quadrant.Main focus has been taken on dark-event measurements in a climate chamber to show the dependency of DCR on temperature and overvoltage.Also the event-related timestamps for different temperatures have been analyzed.With aid of a pulsed laser, propagation delays across the matrix have been determined.Furthermore, the TDC resolution, DNL and INL as well as data transmission speed limits have been identified and compared with a previous prototype designed for a hybrid concept.

Figure 3 .
Figure 3. Simplified equivalent circuits of: a) the dSiPM pixel, b) the TDC, and c) the validation logic.

Figure 4
Figure 4 shows the underlying timing diagram with input clocks, internal signals and output data for the hit counter set in 1-bit mode.The acquisition window is defined by the rising edge of Frame_rst and the falling edge of Read.Afterwards, all collected data are buffered into the serializers

Figure 4 .
Figure 4. Timing diagram of the frame-based readout.

Figure 5 .
Figure 5. Simplified block diagrams of the RX and TX.

Figure 6 .
Figure 6.a) V  versus temperature.Average DCR of all pixels: b) versus V  for ten temperature settings, and c) versus temperature at V  = 1 V and 2 V. d) Cumulative distribution of all pixel-DCRs in the matrix at three temperatures and V  = 2 V.

Figure 8 .Figure 9 .
Figure 8. a) Histograms of all four quadrant-TDC data of the dark-event measurements at V  = 2 V and −25 • C, and their Gaussian fit.b) Collected histograms of all TDC data for −25 • to 25 • C.

Figure 10 .
Figure 10.Offset map (left), and standard deviation of the offset map (right).Pixel 6/14 (dark blue) was turned off during measurements and its data are neglected.

Figure 11 .
Figure 11.TDC-bin width, DNL and INL (top to bottom) of one sample at 408 MHz reference clock.

Figure 12 .
Figure 12.Histogram, DNL and INL (top to bottom) of four sample's fine-TDC data at room temperature.

Table 1 .
Key characteristics of the SPAD, pixel electronics, TDC and LVDS links, determined at room temperature.