

On-Chip Ultra-Fast Data Acquisition System for Optical Scanning Acoustic Microscopy Using  $0.35\mu m$  CMOS Technology

Peiliang Dong, MSc, BSc

Thesis submitted to the University of Nottingham for the degree of Doctor of Philosophy September 2008

#### Abstract

Optical Scanning Acoustic Microscopy (OSAM) is a non-contacting method of investigating the properties and hidden faults of solid materials. This thesis presents an ultra-fast data acquisition system (DAQ) which samples and digitises the output signal of OSAM. The author's work includes the design of the clock source and the sampler, and integration of the whole system.

The clock source is a unique pulse generator based on a 2.624GHz PLL with a Quadrature VCO (QVCO), which generates 4 clock signals with accurate quadrature phase differences. The pulse generator used the 4-phase clocks to provide control pulses to the sampler. The pulses were carefully aligned to the clock edges by digital logic, so that jitter was reduced as much as possible. The short time delay required by the sampler was also provided by the pulse generator, implemented by a controlled switch box which re-shuffles the 4-phase clocks.

The presented sampler is a novel 10.496GSample/s Sub-Sampling Sample-and-Hold Amplifier (SHA). The SHA sampled the input and translated its spectrum down to a low-frequency range so that it could be digitised. A charge-domain sampling strategy and double differential switches were both developed in this circuit to significantly improve the sampling speed. The periodicity of the system input was exploited in repetitive sampling to reduce the noise.

These designed modules were integrated into a DAQ for a  $2 \times 8$  sensor array. A pseudo-parallel scanning strategy was presented to minimise the power consumption, and a current-based buffer was applied to deliver the control pulses into the array.

The DAQ was implemented on-chip in a low-cost  $0.35\mu m$  standard CMOS process. The measurement results showed that the DAQ successfully achieved a sampling rate of more than 10GS/s, with a maximum output resolution of approximately 6 bits.

#### Acknowledgments

I'd like to thank my supervisors, Dr. Ian Harrison and Dr. Barrie Hayes-Gill, for their guidance and support during my PhD study. I am especially grateful to Ian, and feel lucky to have him as my supervisor, who not only taught me the essential skills of RF design and measurement, but also gave me valuable ideas whenever I had problems in my research. Without his inspiration and support, I could not make this achievement.

I'd also like to thank Roger, who provided technical support for the chip fabrication, Richard, who made the optical set-up for the chip measurement, and one of my best friends Proust (Mengxiong), who designed the front-end circuits. Thanks also go to my colleagues and friends in the School of EEE past and present, with whom I have been exchanging ideas and knowledge, and having happy times as well. These include Proust, Vinoth, Fen, Li, Sue, Shah, Qidong, Fred, David, Sheng, Wilson, Irene, Maggie, Yueran, etc.

I'd like to express my gratitude to the Si Yuan Foundation for funding my PhD study, and EPSRC for funding this work (Grant No. EP/CS12758/1).

Lastly, I would express my greatest thanks to my wife, Bei, who constantly supports me on everything, and also my parents, my sister, and my parents-in-law for their support. Finally, best wishes to my daughter Catherine, who is just 6 months older than this thesis, and has totally no idea of what is going on here.

### **Abbreviation List**

**ADC** Analog-to-Digital Converter

**CML** Current-Mode Logic

**CMOS** Complementary Metal Oxide Semiconductor

**CW Laser** Continuous-Wave Laser

**DAQ** Data AcQuisition

**DC-Op** DC Operating Point

**DDS** Double Differential Switch

**DDU** Digital Delay Unit

**DFT** Discrete Fourier Transform

**DLL** Delay-Locked Loop

**DSP** Digital Signal Processor

**ECL** Emitter-Coupled Logic

**FD** Frequency Divider

**FFT** Fast Fourier Transform

**IDFT** Inverse Discrete Fourier Transform

**IFFT** Inverse Fast Fourier Transform

**LFA** Linearising Feedback Amplifier

**OSAM** Optical Scanning Acoustic Microscopy

**OpAmp** Operational Amplifier

**PD** Phase Detector

**PFD** Phase/Frequency Detector

**PLL** Phase-Locked Loop

**QVCO** Quadrature Voltage-Controlled Oscillator

**RGC** ReGulated Cascode

**RMS** Root Mean Square

**SAW** Surface Acoustic Wave

**SCL** Source-Coupled Logic

**SHA** Sample-and-Hold Amplifier

**TCA** Trans-Conductance Amplifier

**TIA** Trans-Impedance Amplifier

**VCO** Voltage-Controlled Oscillator

# **Brief Contents**

| Part | Title                                    |
|------|------------------------------------------|
|      | Table of Contents                        |
| I    | Introduction to O-SAM and its DAQ system |
| II   | Clock Source and Pulse Generator         |
| III  | Sub-Sampling SHA                         |
| IV   | On-Chip Data Acquisition System          |
| V    | Implementation, Measurement, and Summary |
| VI   | Appendix                                 |
|      | Bibliography and Index                   |

# Contents

| Section  | Title                                                         |
|----------|---------------------------------------------------------------|
|          | Abstract                                                      |
|          | Acknowledgements                                              |
|          | Abbreviation List                                             |
|          | Brief Contents                                                |
|          | Table of Contents                                             |
|          | List of Figures                                               |
|          | List of Tables                                                |
| Part I   | Introduction to O-SAM and its DAQ system                      |
| 1        | Optical Scanning Acoustic Microscopy                          |
| 1.1      | Optical Scanning Acoustic Microscopy (the optical part)       |
| 1.2      | Data Acquisition (DAQ) system for O-SAM (the electronic part) |
| 1.3      | Thesis organization                                           |
| 2        | System Architecture                                           |
| 2.1      | Structure and function description                            |
| 2.2      | Thesis objectives                                             |
| Part II  | Clock Source and Pulse Generator                              |
| 3        | Introduction to Clock Synthesiser                             |
| 3.1      | Phase-Locked Loop (PLL)                                       |
| 3.2      | Delay-Locked Loop (DLL)                                       |
| 3.3      | Generation of quadrature signals                              |
| 3.4      | Summary                                                       |
| 4        | Design of Clock Synthesiser                                   |
| 4.1      | Solutions to the clock source in the DAQ                      |
| 4.2      | Phase/Frequency Detector and charge pump                      |
| 4.3      | Frequency divider (FD)                                        |
| 4.4      | VCO                                                           |
| 4.5      | Loop filter                                                   |
| 4.6      | Simulation of clock synthesiser                               |
| 4.7      | Summary                                                       |
| 5        | Pulse Generator                                               |
| 5.1      | System requirement of the pulse generator                     |
| 5.2      | Architecture and mechanism of the pulse generator             |
| 5.3      | Switch box                                                    |
| 5.4      | Digital Delay Unit and Edge Detector 1                        |
| 5.5      | 32/33 Frequency divider (32/33 FD) and Edge Detector 2        |
| 5.6      | Low-frequency dividers                                        |
| 5.7      | Layout and simulation                                         |
| 5.8      | Design of Pulse Generator for $2.6GS/s$ DAQ                   |
| 5.9      | Summary                                                       |
| Part III | Sub-sampling SHA                                              |
| 6        | Introduction to SHA                                           |
| 6.1      | Sample-and-Hold Amplifier (SHA)                               |
| 6.2      | Sub-sampling                                                  |
| 6.3      | Switched-capacitor filter                                     |
| 6.4      | Summary                                                       |
| 7        | Design of Sub-sampling SHA                                    |
| 7.1      | System requirement of the SHA                                 |
| 7.2      | Sub-sampling for periodical signal                            |
| 7.3      | Charge-domain sampling                                        |
| 7.4      | Double Differential Switch (DDS)                              |
| 7.5      | Repetitive sampling                                           |
| 7.6      | Terminologies                                                 |
| 7.7      | Implementation of Sub-Sampling SHA                            |
| 7.8      | Summary                                                       |
| 8        | Errors and Correcting Circuits                                |
| 8.1      | Non-linearity output and Linearising Feedback Amplifier       |
| 8.2      | Frequency Response and Compensating Filter                    |
| 8.3      | System errors due to 4-phase clock source                     |
| 8.4      | Architecture of Digital Filter                                |
| 8.5      | Summary                                                       |
| 9        | Noise Analysis                                                |
| 9.1      | Noise folding and filtering in Sub-sampling SHA               |
| 9.2      | Filters in Sub-Sampling SHA                                   |
| 9.3      | Consideration of flicker noise                                |
| 9.4      | Summary                                                       |
| Part IV  | On-Chip Data Acquisition System                               |
| 10       | Front-End Circuits                                            |
| 10.1     | Photo-Diode                                                   |
| 10.2     | TIA and LPF                                                   |
| 10.3     | Summary                                                       |
| 11       | DAQ for OSAM Sensor Array                                     |
| 11.1     | Power management                                              |
| 11.2     | SHA partition                                                 |
| 11.3     | Interface to Pulse Generator                                  |
| 11.4     | Array architecture                                            |
| 11.5     | Summary                                                       |
| Part V   | Implementation, Measurement, and Summary                      |
| 12       | Implementation and measurement                                |
| 12.1     | Specification of Chip RF2                                     |
| 12.2     | Measurement Results of *Prototype 1*                          |
| 12.3     | Measurement Results of *Prototype 2*                          |
| 12.4     | Summary                                                       |
| 13       | Issues arising and further work                               |
| 13.1     | Current issues and possible solutions                         |
| 13.2     | Other possible improvements                                   |
| 14       | Conclusions                                                   |
| Part VI  | Appendix                                                      |
| A        | Description of Chip RF1                                       |
| A.1      | Review of the optimising theory                               |
| A.2      | Implementation                                                |
| A.3      | Simulation and measurement results                            |
|          | Bibliography and Index                                        |
|          | Bibliography                                                  |
|          | Index                                                         |

# List of Figures

| Figure | Caption                                                                        |
|--------|--------------------------------------------------------------------------------|
| 1.1    | Optical set-up of OSAM                                                         |
| 2.1    | Architecture of DAQ system for OSAM                                            |
| 3.1    | Structure of Phase-Locked Loop                                                 |
| 3.2    | Phase/Frequency Detector                                                       |
| 3.3    | Charge Pump in PLL                                                             |
| 3.4    | Differential Negative-R VCO                                                    |
| 3.5    | Spectrum of VCO output                                                         |
| 3.6    | Current Mode Logic                                                             |
| 3.7    | CML T-type Flip Flop                                                           |
| 3.8    | Delay-Locked Loop                                                              |
| 3.9    | RC-CR circuit                                                                  |
| 3.10   | Structure of QVCO                                                              |
| 4.1    | Clock source solution 1: PLL with QVCO                                         |
| 4.2    | Clock source solution 2: PLL followed by a DLL                                 |
| 4.3    | Implementation of PFD and charge pump                                          |
| 4.4    | CML frequency divider                                                          |
| 4.5    | Divide-by-2 frequency divider                                                  |
| 4.6    | Differential Buffer                                                            |
| 4.7    | Differential to single-ended buffer                                            |
| 4.8    | Comparison of the presented piecewise linear model and BSIM3 model             |
| 4.9    | SCL D-type latch                                                               |
| 4.10   | Modified D-latch circuits of the initial state of toggling                     |
| 4.11   | Numerical solutions of optimum load resistance $R_{op}$                        |
| 4.12   | Numerical solutions of toggling time $t_T$                                     |
| 4.13   | Simulation results for different load resistor $R$                             |
| 4.14   | Simulation and measurement results of maximum operating frequency              |
| 4.15   | Quadrature Voltage-Controlled Oscillator                                       |
| 4.16   | Layout of an on-chip inductor                                                  |
| 4.17   | VCO for the $2.624GSample/s$ DAQ                                               |
| 4.18   | The 3rd-order loop filter in the presented PLL                                 |
| 4.19   | System-level simulation of the PLL with QVCO                                   |
| 4.20   | $V_{ctrl}$ (control voltage of the QVCO) in post-layout simulation in Cadence  |
| 5.1    | Brief sampling procedure of the presented DAQ system                           |
| 5.2    | Timing of control pulse signals for $10.5GS/s$ DAQ                             |
| 5.3    | Pulse Generator                                                                |
| 5.4    | Control mechanism of the presented pulse generator                             |
| 5.5    | Circuit diagram of Switch Box                                                  |
| 5.6    | Sketch of Edge Detector 1 and Digital Delay Unit                               |
| 5.7    | Edge detection without synchronising                                           |
| 5.8    | Edge detection with synchronising                                              |
| 5.9    | Waveforms in Edge Detector 1 and Digital Delay Unit                            |
| 5.10   | 32/33 Frequency Divider                                                        |
| 5.11   | 2/3 Frequency Divider                                                          |
| 5.12   | Differential logic implementation of D-FF with AND gate                        |
| 5.13   | Edge Detector 2                                                                |
| 5.14   | Low frequency dividers                                                         |
| 5.15   | Layout of Pulse Generator for $10.5GS/s$ DAQ                                   |
| 5.16   | Pulse Ap under different Switch Box configurations                             |
| 5.17   | Timing of control pulse signals for $2.6GS/s$ DAQ                              |
| 5.18   | Pulse Generator for $2.6GS/s$ DAQ                                              |
| 5.19   | Edge Detector 1 and Digital Delay Unit for $2.6GS/s$ DAQ                       |
| 5.20   | Layout of Pulse Generator for $2.6GS/s$ DAQ                                    |
| 6.1    | Basic SHA techniques                                                           |
| 6.2    | Sub-sampling in frequency domain                                               |
| 6.3    | Sub-sampling in time domain                                                    |
| 6.4    | Noise folding in Sub-sampling Mixer                                            |
| 6.5    | Switched-capacitor as a resistor                                               |
| 6.6    | 1st-order switched-capacitor low-pass filter                                   |
| 7.1    | Architecture of DAQ system for OSAM                                            |
| 7.2    | Sub-sampling for periodical signal                                             |
| 7.3    | Sub-sampling for periodical signal in time domain                              |
| 7.4    | Charge-domain sampling                                                         |
| 7.5    | SHA with Double Differential Switch                                            |
| 7.6    | Repetitive sampling strategy                                                   |
| 7.7    | Structure of proposed sub-sampling SHA                                         |
| 7.8    | Operating procedure of the Sub-Sampling SHA                                    |
| 7.9    | Timing of switch control signals for $10.5GHz$ Sub-Sampling SHA                |
| 7.10   | Timing of switch control signals for $2.6GHz$ Sub-Sampling SHA                 |
| 8.1    | Linearising Feedback Amplifier                                                 |
| 8.2    | Feedback loop in LFA                                                           |
| 8.3    | High-Gain Low-Bandwidth Buffer                                                 |
| 8.4    | AC simulation results of the present high-gain low-bandwidth Buffer            |
| 8.5    | Bode Diagram of Equation (8.5)                                                 |
| 8.6    | Idealised circuit for charge-domain sampling                                   |
| 8.7    | Normalised frequency response of charge-domain sampling                        |
| 8.8    | Frequency response of proposed circuit in simulation                           |
| 8.9    | 4 different *Virtual Pulses* applied to *Target Samples* $V_{out}$             |
| 8.10   | Discretisation of Virtual Pulses                                               |
| 8.11   | Output Groups of SHA output                                                    |
| 8.12   | Vectorial sum of *Output Groups* in discrete frequency domain                  |
| 8.13   | DC-Op difference among *Output Groups* when no calibration is applied          |
| 8.14   | Output Groups removing DC-Op difference                                        |
| 8.15   | Digital Filter for the precise solution                                        |
| 8.16   | Digital Filter for the approximate solution                                    |
| 9.1    | Noise filtering in Sub-Sampling SHA                                            |
| 9.2    | Continuous sampling affected by low-frequency noise                            |
| 10.1   | Cross-section of the Photo-Diode implemented in AMS C35                        |
| 10.2   | Trans-Impedance Amplifier and Low-Pass Filter                                  |
| 10.3   | Frequency response of TIA                                                      |
| 10.4   | Noise at the output of TIA                                                     |
| 11.1   | Implementation of pseudo-parallel array operating                              |
| 11.2   | Current source for TIA with enabling feature                                   |
| 11.3   | Partition of Sub-Sampling SHA                                                  |
| 11.4   | Current-mode buffer for control pulses                                         |
| 11.5   | DAQ system architecture for OSAM sensor array                                  |
| 11.6   | Output channel for 1-D differential sensor array                               |
| 12.1   | Chip RF2: Photo and layout diagrams                                            |
| 12.2   | Testing platform for Chip RF2                                                  |
| 12.3   | Off-chip logic used for chip-testing                                           |
| 12.4   | Dark output of *Prototype 1*                                                   |
| 12.5   | Original output of *Prototype 1* when pulse laser is applied                   |
| 12.6   | Processed output of *Prototype 1* by removing system error and dark noise      |
| 12.7   | Leakage current from the N-well-P-sub junction                                 |
| 12.8   | Frequency response of the DAQ in *Prototype 1*                                 |
| 12.9   | Waveform of signal $f=2f_0$                                                    |
| 12.10  | Frequency Response of Circuit C in CW laser-input test                         |
| 12.11  | Retrieved signal in frequency domain                                           |
| 12.12  | Retrieved signal in time domain                                                |
| 12.13  | BPhoto: the laser is focusing to the top of the array in *Prototype 1*         |
| 12.14  | Output waveforms of the pixel array                                            |
| 12.15  | Relative light power received on the PD array                                  |
| 12.16  | Normalised frequency response of *Prototype 2*                                 |
| 13.1   | Pixel circuit removing dark noise and 4-phase-clock error                      |
| 13.2   | Output channel for the error-removing pixel circuits                           |
| A.1    | SCL D-type latch                                                               |
| A.2    | Die photos of divided-by-four frequency dividers                               |

# List of Tables

| Table | Caption                                                                             |
|-------|-------------------------------------------------------------------------------------|
| 3.1   | Truth table of AOR gate                                                             |
| 4.1   | Comparison of clock source solutions                                                |
| 4.2   | Frequency range of QVCO                                                             |
| 4.3   | Frequency range of the VCO for $2.6GS/s$ DAQ                                        |
| 4.4   | Characteristics of the 3rd-order filter in the presented PLLs (simulation results)  |
| 5.1   | Clock sources of Relative-Phase Clocks                                              |
| 7.1   | Implementations of proposed Sub-Sampling SHA                                        |
| 11.1  | Power Consumption of some key modules in the $10.5GS/s$ DAQ                         |
| 12.1  | Circuit Specifications                                                              |

# Part I

# Introduction to O-SAM and its DAQ system

#### Chapter 1

# Introduction to Optical Scanning Acoustic Microscopy

# 1.1 Optical Scanning Acoustic Microscopy (the optical part)

Optical Scanning Acoustic Microscopy (O-SAM) is a non-contact method to characterise the properties of a material, or to detect hidden faults beneath the material surface.

In an O-SAM system, a series of periodic laser pulses, each usually lasting from a few femtoseconds to several nanoseconds, is applied to the material surface. When photons hit the surface, they are absorbed locally and heat the surface. The heat is dissipated from the surface via bulk lattice vibrations (phonons) or surface vibrations (Surface Acoustic Waves (SAW)).

The amplitude and phase of the SAW contain information on the material properties as well as the homogeneity of the materials. Consequently, if there are hidden defects beneath the surface, the propagation of the SAW will be affected. Therefore by imaging the SAW, these faults can be detected.

The SAW is generated by a high power pulse laser as described above, and the SAW field is detected by a second low power continuous-wave laser (the probe laser). The probe laser usually operates at a different wavelength to that of the pulse laser, so that it can be easily distinguished. As the surface vibrates, the reflected beam changes its direction back and forth slightly. The angular deflection of the reflected beam is measured as the amplitude of the SAW.

The Applied Optics group at the University of Nottingham have experience in building and using OSAM [1, 2, 3, 4, 5]. Figure 1.1 shows a simplified schematic of the general optical set-up of their OSAM system [1, 4].



Figure 1.1: Optical set-up of OSAM

As shown in the figure, the pulse laser is focused on an arc by a Computer Generated Hologram (CGH). Due to the shape of the arc, the generated SAW concentrates on the point F. This is where the amplitude of the SAW reaches its maximum value, and it is also the point of most interest for the OSAM measurement. The probe laser hits the area around point F, and its reflection is detected by the sensor.

SAWs can be detected by measuring the changing angle of a reflected beam using techniques such as knife-edge detection [6], displacement interferometry [7], and photo-emf detection [8]. In the system developed at the University of Nottingham, a modified knife-edge detector is used, which keeps the simplicity of the original knife-edge technique and improves the energy efficiency [4]. This detecting method involves a pair of differential photo-diodes, while other methods usually use single-ended photo-diodes.

Sometimes the density of the material sample is not uniform, or there are hidden faults in the sample. In these cases, the SAW does not focus on the point F. Therefore the vibration in the area around the point F has to be thoroughly scanned by the probe laser and the detector. A more effective way to do this is by using a sensor array [5]. In that work [5], a 1-D differential sensor array, which is effectively a  $2 \times 16$  photo-diode array, was designed to detect the SAW.

# 1.2 Data Acquisition (DAQ) system for O-SAM (the electronic part)

#### 1.2.1 Detecting picosecond vibration

The high power pulse laser used to generate the SAW has a repetition frequency of approximately 82MHz. Therefore the SAW generated on the surface of the sample will contain harmonics of this frequency. Based on this feature, Sharples [4] designed an electronic sensor system using the lock-in detection technique. Initial research concentrated on the fundamental harmonic, i.e. 82MHz. Later experiments also used higher order harmonics up to several hundred megahertz. The limitation of his system is the bandwidth of the photon detection circuits.
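The idea behind lock-in detection of one harmonic can be illustrated with a minimal digital sketch. This is not the circuit of [4], only a generic illustration: the function name `lock_in`, the sampling rate of 64 samples per period, and the test-signal amplitudes, phases, and noise level are all assumptions chosen for the example.

```python
import numpy as np

def lock_in(signal, fs, f_ref):
    """Return (amplitude, phase) of the f_ref component of `signal`."""
    t = np.arange(len(signal)) / fs
    # Mix with in-phase and quadrature references; the mean acts as the low-pass filter.
    i = 2.0 * np.mean(signal * np.cos(2 * np.pi * f_ref * t))
    q = 2.0 * np.mean(signal * np.sin(2 * np.pi * f_ref * t))
    return np.hypot(i, q), np.arctan2(-q, i)

# Hypothetical test signal (all values are illustrative, not measured data).
F0 = 82e6                      # pulse repetition frequency quoted in the text
fs = 64 * F0                   # assumed sampling rate: 64 samples per period
n = 64 * 100                   # exactly 100 periods, so the averages are unbiased
t = np.arange(n) / fs
rng = np.random.default_rng(0)
signal = (0.5 * np.cos(2 * np.pi * F0 * t + 0.3)        # fundamental
          + 0.2 * np.cos(2 * np.pi * 2 * F0 * t - 1.0)  # second harmonic
          + 0.05 * rng.standard_normal(n))              # additive noise

amplitude, phase = lock_in(signal, fs, F0)
amp2, phase2 = lock_in(signal, fs, 2 * F0)   # the same code locks onto the 2nd harmonic
print(f"fundamental: amplitude={amplitude:.3f}, phase={phase:.3f} rad")
print(f"2nd harmonic: amplitude={amp2:.3f}, phase={phase2:.3f} rad")
```

Each call recovers only the harmonic selected by the reference frequency, which is why a lock-in system observes one harmonic at a time and is bounded by the bandwidth of the detection circuit in front of it.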

However, some optical experiments that do not involve electronic circuits reveal that the SAW contains picosecond-range vibrations [9], i.e. components of at least several gigahertz. Compared to electronic circuits, however, optical devices are usually more bulky and expensive. Measuring electronically offers the possibility of making a portable instrument, which would be more usable, convenient, and low-cost.

Therefore, a faster electronic detection system is needed. If faster circuits are used, higher frequency harmonics can be detected. The higher frequency harmonics have shorter wavelengths, and consequently the resolution of the imaging system will be better.
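As a rough illustration of this resolution argument, the wavelength of the $n$-th harmonic follows from the usual wave relation. The SAW velocity $v$ is not given in this chapter; the value $v \approx 3000\,\mathrm{m/s}$ used below is only an assumed order of magnitude for a solid sample.

$$
\lambda_n = \frac{v}{n f_0}, \qquad f_0 \approx 82\,\mathrm{MHz}
$$

Under that assumption, the fundamental has $\lambda_1 \approx 3000 / (82 \times 10^{6})\,\mathrm{m} \approx 37\,\mu m$, whereas a component at 5GHz has $\lambda \approx 3000 / (5 \times 10^{9})\,\mathrm{m} = 0.6\,\mu m$, i.e. about 60 times shorter.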

#### 1.2.2 Design targets

The aim of this thesis is to design an ultra-fast Data-AcQuisition (DAQ) system to measure the SAWs in O-SAM. It converts the optical signal (the reflected probe laser beam) to an electronic signal, and then digitises it. A photo-diode array is included in this DAQ for the convenience of measurement.

The optical input repeats at the laser pulse repetition frequency, i.e. 82MHz, and contains harmonics up to at least several gigahertz. The presented DAQ system was designed to capture the signal in the time domain. The amplitudes and phases of the signal harmonics can then be obtained by Fourier transforming the captured signal. The desired sampling rate of this system is 10GSample/s, so by the Nyquist criterion it should be able to detect frequency information up to 5GHz.
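The arithmetic behind these targets, and the Fourier step that recovers harmonic amplitudes and phases, can be sketched as follows. The clock frequency and phase count are the figures quoted in the abstract, and 82MHz is the approximate repetition frequency from the text. The synthetic waveform (the `true_harmonics` dictionary) is invented for illustration, and the sketch treats one period as if it were sampled directly, ignoring the sub-sampling and repetitive sampling used in the actual system.

```python
import numpy as np

F0 = 82e6          # laser pulse repetition frequency (approximate, from the text)
F_CLK = 2.624e9    # PLL frequency quoted in the abstract (32 x 82 MHz)
N_PHASES = 4       # quadrature clock phases

fs = N_PHASES * F_CLK          # effective sampling rate: 10.496 GS/s
nyquist = fs / 2               # highest resolvable frequency: 5.248 GHz
n = round(fs / F0)             # samples per signal period: 128

# Synthetic one-period waveform; harmonic number -> (amplitude, phase in rad).
# These values are made up for the example.
true_harmonics = {1: (1.0, 0.2), 10: (0.3, -1.0), 40: (0.1, 2.0)}
t = np.arange(n) / fs
x = sum(a * np.cos(2 * np.pi * k * F0 * t + p)
        for k, (a, p) in true_harmonics.items())

# Bin k of the real FFT of exactly one period corresponds to harmonic k * F0.
X = np.fft.rfft(x)
amps = 2 * np.abs(X) / n       # valid for 0 < k < n/2
phases = np.angle(X)

for k in true_harmonics:
    print(f"harmonic {k:2d} ({k * F0 / 1e9:.3f} GHz): "
          f"amplitude={amps[k]:.3f}, phase={phases[k]:.3f} rad")
```

Because one period maps onto an integer number of samples here, each harmonic falls exactly on one FFT bin and no window function is needed; with a non-integer ratio, spectral leakage would have to be handled.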

The circuit was implemented on-chip so that making a low-cost portable instrument would be possible. The fabrication process used here was AMS C35, a  $0.35\mu m$  standard CMOS process with 4 layers of metal and 2 layers of polysilicon.

The SAW will contain frequency information greater than 5GHz. However, it should be noted that the 10GS/s sampling rate is very close to the performance limit of the AMS C35 process. The insights into the design methodology will be invaluable when designing similar circuits in a more advanced fabrication process to achieve a higher sampling rate.

#### 1.3 Thesis organization

This thesis is divided into 6 parts.

Part I (Chapter 1 and 2) is a brief introduction to the DAQ system. Chapter 1 gives the background knowledge of OSAM, while Chapter 2 briefly presents the architecture of the DAQ and the design objectives.

Part II (Chapter 3~5) describes one key module of the DAQ, the clock source. The background introduction is given in Chapter 3. Chapter 4 presents the clock synthesiser, a 2.624GHz PLL with 4-phase outputs. Chapter 5 describes the pulse generator based on that PLL, which is used to drive the sampler.

Part III (Chapter 6~9) presents the other key module of the DAQ, the Sub-Sampling SHA (Sample-and-Hold Amplifier). Again, the first chapter (Chapter 6) contains the background introduction. Chapter 7 presents the core circuit of the Sub-Sampling SHA, while its peripheral modules for error-correction are described in Chapter 8. Chapter 9 discusses the noise issues of the sampler.

Part IV (Chapter 10 and 11) is focused on the DAQ system itself. In Chapter 10, the front-end circuits, which are based on Mexiong Li's circuits, are introduced. Chapter 11 presents the detailed structure of the DAQ for OSAM sensor array.

Part V gives the measurement results (Chapter 12), and discusses the current issues and possible solutions (Chapter 13). The thesis is summarised in Chapter 14.

Part VI is the appendix.

#### Chapter 2

### System Architecture

#### 2.1 Structure and function description



Figure 2.1: Architecture of DAQ system for OSAM

A brief architecture of the presented DAQ system for OSAM is shown in Figure 2.1. As shown in the figure, the Probe Laser signal is detected by the photodiode and amplified by a Trans-Impedance Amplifier (TIA). The output of the TIA is fed to a low-pass filter (LPF), so that any frequencies higher than half of the sampling rate are eliminated.

The Sub-Sampling Sample-and-Hold Amplifier (SHA) is the core module of the DAQ system. It samples the RF-band signal from the LPF, and transforms its spectrum down to a very low frequency range. Because of its frequency transfer ability, Sub-Sampling SHAs are sometimes termed Sub-Sampling Mixers.

The output of the Sub-Sampling SHA is digitised by a low-frequency A/D converter (ADC). The digital filter after the ADC is applied to compensate the distortion caused by the Sub-Sampling SHA.

The pulse generator provides the control pulses for the Sub-Sampling SHA, and also acts as the central control unit of the system. It is based on a 2.624GHz PLL, which uses the electrical synchronising signal from the pulse laser source as the reference signal. The PLL generates clock signals in 4 evenly-divided phases. Therefore the minimum phase difference among the clocks is 1/4 of their period. This is equivalent to a clock signal at 10.496GHz, which is exploited to provide the required sampling signals.
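The equivalence between four evenly spaced 2.624GHz phases and a single 10.496GHz clock can be illustrated with a short sketch. It assumes ideal, jitter-free clocks and uses exact fractions; the helper name `edge_times` is hypothetical.

```python
from fractions import Fraction

F_CLK_HZ = 2_624_000_000  # PLL output frequency from the text
N_PHASES = 4              # 0, 90, 180 and 270 degrees


def edge_times(n_cycles: int, f_clk_hz: int = F_CLK_HZ, n_phases: int = N_PHASES):
    """Rising-edge times (seconds, exact fractions) of all phases, sorted."""
    period = Fraction(1, f_clk_hz)
    times = [
        (cycle + Fraction(phase, n_phases)) * period
        for cycle in range(n_cycles)
        for phase in range(n_phases)
    ]
    return sorted(times)


edges = edge_times(n_cycles=3)
# Collect the distinct spacings between consecutive edges of the merged sequence.
spacings = sorted({later - earlier for earlier, later in zip(edges, edges[1:])})
equivalent_rate_hz = 1 / spacings[0]
print(f"edge spacing {float(spacings[0]) * 1e12:.2f} ps, "
      f"equivalent rate {float(equivalent_rate_hz) / 1e9:.3f} GHz")
```

Under these idealised assumptions the merged edge sequence has a single spacing of about 95.3ps, which is what a 10.496GHz clock would provide.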

Figure 2.1 illustrates the data acquisition of one photo-diode pixel only. The presented DAQ system is designed for a photo-diode array, and details of the array architecture are given in Chapter 11.

#### 2.2 Thesis objectives

In the presented DAQ system, the front-end modules (photo-diode, TIA, and LPF) are based on the topology of Li's design [10, 11], which is described in Chapter 10.

The low-frequency modules, i.e. the ADC and the digital filter, are currently off-chip in order to shorten the design period. As they are not high-speed circuits, these modules can be easily implemented by existing mature technologies. They will be integrated into the on-chip system in future prototypes.

This thesis is mainly focused on two key modules, the pulse generator and the Sub-Sampling SHA, which are presented in detail in Part II and Part III respectively.

The thesis is written in structural order, i.e. the clock source first, then the SHA, and finally the DAQ. However, the actual timeline of the design procedure was:

PLL in the pulse generator  $\rightarrow$  Sub-Sampling SHA  $\rightarrow$  The pulse generator  $\rightarrow$  DAQ

The 4-phase output from the PLL makes the 10GS/s sampling possible whilst using a lower clock frequency. If a single phase output was used, a clock frequency of 10GHz would be required, and the design would not be achievable in the low cost AMS C35 process.

The ultra-fast Sub-Sampling SHA was designed to use the 4-phase clock source, and the whole pulse generator was tailored to satisfy the requirement of the control pulses for the Sub-Sampling SHA. Finally, the architecture of the whole DAQ was basically determined by the structure and features of the Sub-Sampling SHA and the pulse generator.

### Part II

# Clock Source and Pulse Generator

To achieve the required 10GS/s sampling rate, the most basic requirement is a clock operating at a frequency of more than 10GHz. However, this frequency is beyond the performance that the  $0.35\mu m$  CMOS process can deliver. Alternatively, a slower multi-phase clock source carrying the equivalent frequency information can be used to implement this function.

Part II presents such a clock source, and a pulse generator designated for the DAQ system for OSAM. The clock source is synchronised with the pulse laser via a PLL, and provides a multi-phase output which can be considered as the replacement of the 10GHz clock. The pulse generator circuit uses these clock signals to control the DAQ system, i.e. it provides the essential control signals for the Sub-Sampling SHA.

Chapter 3 introduces the background knowledge of clock synthesisers. Chapter 4 first discusses the possible clock source solutions for the OSAM DAQ, then presents the designed clock source, a 2.624GHz PLL with quadrature outputs. Based on this clock source, the pulse generator is presented in Chapter 5.

#### Chapter 3

# Introduction to On-Chip Clock Synthesiser

This chapter introduces two commonly used techniques for clock synthesisers, the Phase-Locked Loop and the Delay-Locked Loop. Some methods for quadrature signal generation are also discussed in this chapter, as the multi-phase output is required for the DAQ system.

#### 3.1 Phase-Locked Loop (PLL)

#### 3.1.1 A brief history of PLL

The idea of the PLL was first published by de Bellescize in 1932 [12]. At that time, the technique was mainly used for synchronous radio reception. Widespread use of the PLL began with TV receivers during the 1940s, where PLLs were used to synchronise the screen sweeping oscillators to the sync pulses [13].

PLL circuits were quite complex at first, as they were implemented with discrete components. During the 1960s, the development of integrated circuits rapidly changed this situation. The availability of monolithic PLL ICs created a considerable number of new applications which were previously limited by cost and complexity [13]. For a theoretical description of PLLs, references [14, 15, 16] should be consulted.

The availability of large-scale ICs after the late 1970s brought strong interest in the implementation and design of the digital PLL (DPLL), which is effectively a semi-analogue circuit [13]. The All-Digital PLL (ADPLL) and Software-Controlled PLL (SCPLL) were developed in the 1980s [17]. These latter two PLLs are more flexible than the traditional PLLs [16]. However, their operating speed is limited by the digital logic or software programmes, and so these PLLs are not suitable for high-speed applications. Consequently, analogue PLLs and DPLLs still play important roles in those applications [13].

Nowadays, PLL technology is widely used in communication, telemetry, instrumentation, motor control, etc. It is so important that there are still a great number of research papers published every year in this area.

#### 3.1.2 Principle and structure

A PLL is a device that makes a signal track another one (the reference) [18]. The frequency of that signal can be either equal to that of the reference, or a multiple of it. Their phases are synchronised, which is why it is called "phase-locked". A PLL can also be considered as a feedback control system that automatically corrects the phase error between the signal and the reference. Figure 3.1 illustrates the general structure of a PLL.

The reference signal is represented by its phase,  $\phi_{ref}$ . It is compared to a feedback from the output,  $\phi_F$ , by a phase detector. The phase detector converts the phase error into a voltage signal, i.e.

$$V_e = K_d(\phi_{ref} - \phi_F) \tag{3.1}$$



Figure 3.1: Structure of Phase-Locked Loop

This equation is only a behavioural model. The real situation is much more complicated, and is discussed in detail in Sub-Section 3.1.3 on the following page.

 $V_e$  is fed into a Low-Pass Filter (LPF), whose transfer function is  $H_f(s)$ . The LPF is inserted to suppress the noise and high-frequency components in  $V_e$ . Consequently,

$$V_c = H_f(s)V_e = H_f(s)K_d(\phi_{ref} - \phi_F)$$

In ideal conditions, the output of the LPF  $V_c$  is a stable voltage signal, which can be used to control the VCO.

VCO (Voltage-Controlled Oscillator) is the module which generates the final output. Its oscillation frequency, or angular frequency, is determined by the control voltage  $V_c$ . In small-signal analysis, the VCO is usually considered as a linear element with the relationship  $\omega_o = K_v V_c$ .

However, it is the phase which is of interest, and so an extra block,  $\frac{1}{s}$ , is inserted in Figure 3.1, because the phase is essentially the integration of the angular frequency, i.e.

$$\phi_{out} = \frac{\omega_o}{s} = \frac{1}{s} K_v H_f(s) K_d(\phi_{ref} - \phi_F) \tag{3.2}$$

The Frequency Divider (FD) divides the output frequency by the number N, i.e.

$$\phi_F = \phi_{out}/N \tag{3.3}$$

The FD usually appears in clock synthesizers, where the PLL is used to generate a clock whose frequency is N times that of the reference. In the case that the output frequency is equal to that of the reference, N=1.

According to Equation (3.2) and (3.3), the transfer function of PLL can be derived:

$$\phi_{out} = \frac{1}{s} K_v H_f(s) K_d(\phi_{ref} - \phi_{out}/N)$$

$$\phi_{out} = \frac{N K_v K_d H_f(s)}{sN + K_v K_d H_f(s)} \phi_{ref} \tag{3.4}$$

Given enough time,  $\phi_{out} = N\phi_{ref}$ , and the PLL becomes stable and phase-locked.
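The limiting behaviour of Equation (3.4) can be checked numerically. The sketch below is only an illustration: the gains `K_V` and `K_D`, the divider ratio and the first-order filter `h_f` are assumed values chosen for the example, not the parameters of the designed PLL.

```python
import math

N_DIV = 32                      # ASSUMED divider ratio (the thesis later uses x32)
K_V = 2 * math.pi * 100e6       # ASSUMED VCO gain, rad/s per volt
K_D = 0.1                       # ASSUMED phase-detector gain, volt per radian
LPF_CORNER = 2 * math.pi * 1e6  # ASSUMED first-order LPF corner, rad/s


def h_f(s: complex) -> complex:
    """Assumed first-order low-pass filter, H_f(s) = 1 / (1 + s / corner)."""
    return 1 / (1 + s / LPF_CORNER)


def closed_loop_gain(s: complex) -> complex:
    """phi_out / phi_ref according to Equation (3.4)."""
    loop = K_V * K_D * h_f(s)
    return N_DIV * loop / (s * N_DIV + loop)


slow = closed_loop_gain(1j * 2 * math.pi * 1.0)   # 1 Hz phase variation
fast = closed_loop_gain(1j * 2 * math.pi * 1e9)   # 1 GHz phase variation
print(f"|gain| at 1 Hz:  {abs(slow):.4f}")
print(f"|gain| at 1 GHz: {abs(fast):.2e}")
```

For slow phase variations the gain approaches N, which corresponds to the locked condition $\phi_{out} = N\phi_{ref}$, while fast variations of the reference phase are strongly attenuated.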

#### 3.1.3 Phase detector and charge pump

As mentioned above, the phase detector is used to detect the phase difference between the reference  $\phi_{ref}$  and the feedback signal  $\phi_F$ . In Equation (3.1), its transfer function is described as a linear relationship. In reality, the output from a phase detector is a series of pulses which need to be averaged to obtain the required phase error. The output also contains parasitic high frequency terms which need removing. Consequently an LPF at the output of the phase detector is always required.

There are a few different implementations of phase detectors, such as multiplier, XOR gate, and sequential logic.

#### Analogue multiplier phase detector

Analogue multipliers, such as Gilbert Cell, can be directly used as a phase detector in a PLL [19]. If the reference signal is  $V_1 \cos(\omega t + \phi_{ref})$  and the feedback signal is  $V_2 \cos(\omega t + \phi_F)$ , the output of the Gilbert Cell is

$$\begin{aligned} V_e &= \beta V_1 V_2 \cos(\omega t + \phi_{ref}) \cos(\omega t + \phi_F) \\ &= \frac{1}{2} \beta V_1 V_2 \left(\cos(2\omega t + \phi_{ref} + \phi_F) + \cos(\phi_{ref} - \phi_F)\right) \end{aligned}$$

where  $\beta$  is a constant depending on the property of the Gilbert Cell. The high-frequency component  $\cos(2\omega t + \phi_{ref} + \phi_F)$  will be "removed" by the LPF, and so the output voltage is given by

$$V_e \approx \frac{1}{2}\beta V_1 V_2 \cos(\phi_{ref} - \phi_F)$$

which is a DC voltage related to the phase difference.
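The averaging step can be verified numerically. In the sketch below an ideal average over one full period stands in for the LPF, which is an assumption made for illustration; the function names and default amplitudes are hypothetical.

```python
import math


def averaged_multiplier_output(phi_ref, phi_f, beta=1.0, v1=1.0, v2=1.0,
                               n_samples=1000):
    """Average beta*V1*V2*cos(wt+phi_ref)*cos(wt+phi_f) over one full period.

    The ideal average is a stand-in for the low-pass filter (assumption).
    """
    total = 0.0
    for k in range(n_samples):
        theta = 2 * math.pi * k / n_samples  # omega*t swept over one period
        total += beta * v1 * math.cos(theta + phi_ref) * v2 * math.cos(theta + phi_f)
    return total / n_samples


def predicted_dc(phi_ref, phi_f, beta=1.0, v1=1.0, v2=1.0):
    """DC term from the text: 0.5 * beta * V1 * V2 * cos(phi_ref - phi_f)."""
    return 0.5 * beta * v1 * v2 * math.cos(phi_ref - phi_f)


for phase_error in (0.0, math.pi / 6, math.pi / 2, 2.0):
    measured = averaged_multiplier_output(phase_error, 0.0)
    print(f"phase error {phase_error:.3f} rad -> averaged output {measured:+.4f}")
```

Note that the averaged output follows the cosine of the phase difference, so it is zero for a 90° phase difference rather than for 0°.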

#### XOR gate phase detector (Digital multiplier phase detector)

The XOR gate is a very simple digital implementation of a phase detector. Its truth table is shown in Table 3.1. If the two input signals are considered as square waves, the XOR gate has a function similar to that of an analogue multiplier.

|     | A=0 | A=1 |
|-----|-----|-----|
| B=0 | 0   | 1   |
| B=1 | 1   | 0   |

Table 3.1: Truth table of XOR gate (Output = A XOR B)

If we define the logic "0" as -1, the logic "1" as 1, then

$$A \ \mathrm{XOR} \ B = -A \times B$$

which means the XOR gate acts as a digital multiplier.
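The identity can be checked over the whole truth table, and the same sketch shows how the average XOR output of two square waves depends on their relative shift. The 50% duty cycle and the sample-based square-wave model are assumptions made for illustration.

```python
def to_bipolar(bit: int) -> int:
    """Map logic 0 -> -1 and logic 1 -> +1, as defined in the text."""
    return 1 if bit else -1


def xor_matches_negated_product() -> bool:
    """Check A XOR B == -(A x B) over the whole truth table."""
    return all(
        to_bipolar(a ^ b) == -to_bipolar(a) * to_bipolar(b)
        for a in (0, 1)
        for b in (0, 1)
    )


def xor_average(shift_samples: int, period_samples: int = 1000) -> float:
    """Average XOR output of two 50%-duty square waves shifted by shift_samples."""
    half = period_samples // 2
    high = 0
    for k in range(period_samples):
        a = 1 if k % period_samples < half else 0
        b = 1 if (k - shift_samples) % period_samples < half else 0
        high += a ^ b
    return high / period_samples


print(xor_matches_negated_product())
for shift in (0, 250, 500):
    print(f"shift {shift / 1000:.2f} period -> average XOR output {xor_average(shift)}")
```

Under these assumptions the averaged output rises linearly from 0 to 1 as the shift goes from zero to half a period.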

#### Phase detector using sequential logic

The multiplier-based phase detectors, i.e. the analogue multiplier and the XOR gate, have been widely realized in discrete circuit systems, but are not popular in high-performance on-chip systems. This is due to some of their shortcomings such as limited acquisition range, and the dilemma between phase error and response time [20].

The widely-used solution in on-chip PLL is the sequential-logic-based phase detector. Figure 3.2(a) is a simple implementation of this type of phase detector [21, 22]. It is often termed Phase/Frequency Detector (PFD), as it can detect both phase difference and frequency difference [20].



Figure 3.2: Phase/Frequency Detector

Figure 3.2(b) illustrates the timing of PFD. If the reference input is ahead of the local oscillator, which is the feedback signal from the VCO through the FD, the "Up" signal is set. On the contrary, if the local oscillator is ahead of the reference, the "Down" signal is set. The pulse widths of the "Up" and "Down" are proportional to the phase difference  $(\phi_{ref} - \phi_F)$ .

#### Charge Pump

The PFD is often applied together with a charge pump, which is effectively a pair of controllable current sources [20]. Figure 3.3 illustrates how the charge pump works. In this figure, the LPF is replaced by a capacitor in order to simplify the explanation. When "Up" is active, the upper switch turns on and  $V_c$  goes up; when "Down" is active, the lower switch turns on and  $V_c$  goes down.



Figure 3.3: Charge Pump in PLL
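The combined action of the PFD and the charge pump on a single capacitor can be sketched behaviourally as follows. The pump current, capacitor value and the linear pulse-width model are assumptions chosen for illustration and do not describe the implemented circuit.

```python
import math


def charge_pump_step(v_c, phase_error_rad, i_cp=100e-6, c_f=100e-12, f_ref=82e6):
    """Return V_c after one reference cycle.

    phase_error_rad > 0 means the reference leads ("Up" is active);
    phase_error_rad < 0 means the feedback leads ("Down" is active).
    i_cp (pump current) and c_f (capacitor) are ASSUMED example values.
    """
    # Pulse width proportional to the phase error, as stated for the PFD.
    pulse_width_s = abs(phase_error_rad) / (2 * math.pi) / f_ref
    delta_v = i_cp * pulse_width_s / c_f  # dV = I * t / C
    return v_c + delta_v if phase_error_rad > 0 else v_c - delta_v


v_up = charge_pump_step(1.0, math.pi / 2)
v_down = charge_pump_step(1.0, -math.pi / 2)
v_locked = charge_pump_step(1.0, 0.0)
print(f"Up: {v_up:.6f} V, Down: {v_down:.6f} V, locked: {v_locked:.6f} V")
```

With these example values a quarter-period phase error moves $V_c$ by about 3mV per reference cycle, in the direction set by which input leads.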

#### 3.1.4 Low-Pass Filter (LPF)

As mentioned above, the output of the phase detector or the charge pump is a series of pulses, which cannot be directly used to control the VCO. So an LPF is inserted between the phase detector and the VCO to average the pulses.

When the frequency of the feedback signal is close to the reference frequency, the repetition frequency of the output pulses of the phase detector is approximately equal to the reference frequency. Therefore the attenuation of the LPF at the reference frequency is an important parameter in PLL design, because these pulses always cause some spurs on the VCO¹. A high-order LPF, e.g. a 4th-order or a 5th-order one, suppresses spurs better than a low-order LPF.

However, a high-order LPF may cause the PLL to become unstable. If the transfer function of LPF  $H_f(s)$  is redefined as

$$H_f(s) = \frac{a(s)}{b(s)}$$

where a(s) and b(s) are polynomial expressions, the order of b(s) indicates how many poles the LPF transfer function has. Applying this definition to Equation (3.4) on page 15, then

$$\phi_{out} = \frac{NK_vK_da(s)}{sNb(s) + K_vK_da(s)}\phi_{ref}$$

<sup>1</sup>A detailed description of these spurs is presented in Sub-Section 3.1.5 on Page 20.

Therefore the PLL will always have at least one pole, and always has one more pole than the LPF. This extra pole is due to the integration effect of the VCO, i.e.  $\phi_{out}$  is the integration of  $\omega_o$ .

Since in practical implementations, the PLL will always have more than one pole, the PLL is potentially unstable, especially when a high-order LPF is used in the PLL. Consequently, its stability must be carefully investigated.
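The pole-count argument can be illustrated by building the closed-loop denominator $sNb(s) + K_vK_da(s)$ explicitly. The coefficient lists and the loop gain below are arbitrary example values, and the helper names are hypothetical.

```python
def closed_loop_denominator(a, b, n, k):
    """Coefficients of s*N*b(s) + K*a(s), lowest order first.

    a, b: coefficient lists of the LPF numerator and denominator (lowest first).
    n: divider ratio N.  k: the product K_v * K_d.
    """
    s_n_b = [0.0] + [n * coeff for coeff in b]  # multiplying by s shifts the order up
    k_a = [k * coeff for coeff in a]
    length = max(len(s_n_b), len(k_a))
    s_n_b += [0.0] * (length - len(s_n_b))
    k_a += [0.0] * (length - len(k_a))
    return [x + y for x, y in zip(s_n_b, k_a)]


def order(poly):
    """Order of a polynomial given lowest-order-first coefficients."""
    return max(i for i, coeff in enumerate(poly) if coeff != 0)


K = 6.0e7                       # ASSUMED K_v * K_d
no_filter = closed_loop_denominator([1.0], [1.0], n=32, k=K)
second_order = closed_loop_denominator([1.0], [1.0, 2e-7, 1e-14], n=32, k=K)
print(order(no_filter), order(second_order))
```

In both cases the closed loop has exactly one more pole than the filter, in line with the integrating behaviour of the VCO described above.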

#### 3.1.5 Voltage-Controlled Oscillator (VCO)

The VCOs used in the PLLs are not different from those employed for other applications, such as modulation and automatic frequency control [18]. Four types of VCO commonly used are given in the order of decreasing stability, namely, voltage-controlled crystal oscillators (VCXO), resonator oscillators, RC multi-vibrators, and YIG tuned oscillators [14, 15].

As crystals are not available on-chip, the resonator oscillators are often used in on-chip high-performance PLLs. This type of VCO has a tunable LC-tank, which is a passive circuit involving inductors (L) and capacitors (C). The LC-tank provides a resonant frequency, and this frequency is tunable via a variable capacitor (or sometimes a pair of variable capacitors). The frequently-used single-ended resonator VCOs include Colpitts oscillators, Hartley oscillators, and Clapp oscillators [20, 23]. But the VCO to be used in the presented DAQ system is a differential VCO, which is often termed a Negative-R VCO [23, 24].



Figure 3.4: Differential Negative-R VCO

Figure 3.4 is a simplified differential Negative-R VCO. In this VCO, the cross-coupled transistors provide a negative resistance which is in parallel with the LC-tank. Therefore the resistive loss inside the LC-tank is compensated by the negative resistance, and the circuit oscillates at the resonant frequency of the LC-tank. Its differential structure naturally generates a pair of outputs which have 180° of phase difference.

#### Spurs in VCO spectrum

As mentioned in Sub-Section 3.1.4 on page 18, the pulses from the phase detectors cause spurs in the VCO spectrum. This is because  $V_c$ , the control voltage of the VCO, is frequency-modulated into the VCO output. Any ripples on  $V_c$  will cause a small offset on the VCO oscillating frequency.

Typically, when the PLL is phase-locked, the output pulses from the phase detector have the same frequency as the reference input,  $f_{ref}$ . Although these pulses are significantly suppressed by the LPF, they will still affect the spectrum of the VCO.

As for the PFD shown in Figure 3.2 on page 17, ideally, when the reference and the output of the FD are perfectly synchronised, the charge pump would not operate at any time, and its output would be a stable DC voltage without any frequency content at  $f_{ref}$ . However in reality, the PMOS and NMOS transistors in the charge pump turn on almost simultaneously for a very short time when the rising edges of the input signals arrive. This results in a small ripple on the output of the charge pump. Naturally, the ripples have a repetition rate of  $f_{ref}$ .

These pulses or ripples at  $f_{ref}$  generate a few spurs in the spectrum of the VCO output. Figure 3.5 shows an example of a typical VCO spectrum. These spurs have a constant interval of  $f_{ref}$ , and the two spurs next to the main peak (the oscillating frequency) are  $f_{ref}$  away from it as well. In this case,  $f_{ref}$  is termed the spur frequency. The interference at the spur frequency should be suppressed as much as possible by the LPF, so that the spurs in the VCO output spectrum are kept as small as possible.



Figure 3.5: Spectrum of VCO output

#### 3.1.6 Frequency Divider (FD)

Frequency dividers are basically digital counters, which are usually available in design libraries, or can be easily synthesized from digital Flip-Flops.



Figure 3.6: Current Mode Logic

However in high-speed applications, the conventional CMOS Flip-Flops are not quick enough. Current-Mode-Logic (CML) circuits are widely used in this case [25, 26, 27]. CML circuits use differential amplifiers as the basic elements, because differential circuits are quicker than the normal logic circuits. As there are two branches in the circuit, the logic "1" and "0" are represented by which branch the current goes through, as shown in Figure 3.6. Figure 3.7 shows a CML T-type Flip-Flop, which can work as a divide-by-2 FD.



Figure 3.7: CML T-type Flip Flop

Over the last few years, there has been considerable research focusing on optimizing CML circuits [27, 28], especially those using CMOS fabrication processes [29, 30].
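The divide-by-2 behaviour of a T-type flip-flop, and the effect of cascading several of them, can be sketched with a purely behavioural model. It ignores all analogue aspects of CML (differential signalling, current steering, propagation delay); the class and function names are hypothetical.

```python
class DivideByTwo:
    """Behavioural T-type flip-flop: the output toggles on each rising input edge."""

    def __init__(self):
        self.q = 0
        self._prev = 0

    def clock(self, level: int) -> int:
        if level == 1 and self._prev == 0:  # rising edge detected
            self.q ^= 1
        self._prev = level
        return self.q


def divide(samples, n_stages):
    """Pass a sampled clock through a ripple chain of divide-by-2 stages."""
    stages = [DivideByTwo() for _ in range(n_stages)]
    out = []
    for level in samples:
        for stage in stages:
            level = stage.clock(level)
        out.append(level)
    return out


def count_rising_edges(samples):
    return sum(1 for a, b in zip(samples, samples[1:]) if a == 0 and b == 1)


input_clock = [0, 1] * 128               # 128 input cycles
output = divide(input_clock, n_stages=5)  # 5 stages -> divide by 32
print(count_rising_edges(input_clock), count_rising_edges(output))
```

Five cascaded stages reduce 128 input cycles to 4 output cycles, i.e. a division ratio of 32.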

#### 3.2 Delay-Locked Loop (DLL)

One major limitation of using PLL as a clock synthesizer is the phase noise. An alternative solution to it is the Delay-Locked Loop (DLL). Its phase noise does not depend on the integrated inductor quality factor, and the random timing error does not accumulate from cycle to cycle [31].



Figure 3.8: Delay-Locked Loop

Figure 3.8(a) is the block diagram of a 3-stage DLL. The delay time of the 3 delay stages is controlled by the voltage output of the LPF. When the circuit becomes stable, the phase of the 3rd delay stage  $\phi_3$  is synchronised with the input phase  $\phi_0$ , i.e.  $\phi_3 = \phi_0$ . Since the three delay stages are identical,  $\phi_1 = \phi_0 + 120^\circ$ , and  $\phi_2 = \phi_0 + 240^\circ$ . The edge combiner adds the outputs of the delay stages together, and obtains a signal at 3 times the frequency of the input, as shown in Figure 3.8(b).

The transfer function of DLL is

$$\phi_N = NK_L H_f(s) K_d(\phi_0 - \phi_N)$$

$$\phi_N = \frac{NK_L K_d H_f(s)}{1 + NK_L K_d H_f(s)} \phi_0 \tag{3.5}$$

where  $K_L$  is the voltage-to-phase gain of each delay stage, and N is the number of stages. In Figure 3.8, N=3. Equation (3.5) has one less pole than Equation (3.4) on page 15. Therefore DLL is more likely to be stable than PLL.
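The tap phases of a locked DLL with identical stages follow directly from the lock condition. The sketch below assumes ideal, identical delay stages; the function name is hypothetical.

```python
def dll_tap_phases(n_stages: int, phi0_deg: float = 0.0):
    """Phase (degrees, modulo 360) at the output of each stage of a locked DLL.

    Assumes identical stages, so each one contributes 360/N degrees.
    """
    step = 360.0 / n_stages
    return [(phi0_deg + step * k) % 360.0 for k in range(1, n_stages + 1)]


print(dll_tap_phases(3))
print(dll_tap_phases(4))
```

For N=3 the taps sit at 120°, 240° and 0° (the last stage is back in phase with the input), and for N=4 they are 90° apart.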

#### 3.3 Generation of quadrature signals

In the presented DAQ system, the clock source is required to produce 4-phase outputs, i.e.  $0^{\circ}$ ,  $90^{\circ}$ ,  $180^{\circ}$  and  $270^{\circ}$ . This section introduces some methods to generate these quadrature signals.

#### RC-CR network



Figure 3.9: RC-CR circuit

A simple quadrature technique is the RC-CR network [21], as shown in Figure 3.9.  $V_1$  and  $V_2$  always have a phase difference of 90°. The drawback of this circuit is that the amplitudes of  $V_1$  and  $V_2$  are usually unequal, except at the frequency  $1/(2\pi RC)$ .
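Both properties of the RC-CR network can be checked numerically with ideal first-order branches. The component values are arbitrary examples, and which branch corresponds to $V_1$ or $V_2$ in Figure 3.9 is not assumed here.

```python
import cmath
import math


def rc_cr_outputs(freq_hz, r_ohm, c_farad, v_in=1.0):
    """Outputs of an ideal RC (low-pass) branch and CR (high-pass) branch."""
    jwrc = 1j * 2 * math.pi * freq_hz * r_ohm * c_farad
    v_lowpass = v_in / (1 + jwrc)          # RC branch
    v_highpass = v_in * jwrc / (1 + jwrc)  # CR branch
    return v_lowpass, v_highpass


R_OHM, C_FARAD = 1e3, 1e-12                # ASSUMED example values
f_equal = 1 / (2 * math.pi * R_OHM * C_FARAD)

for f in (0.1 * f_equal, f_equal, 10 * f_equal):
    v_lp, v_hp = rc_cr_outputs(f, R_OHM, C_FARAD)
    phase_diff = math.degrees(cmath.phase(v_hp / v_lp))
    print(f"f = {f / 1e6:8.1f} MHz: phase difference {phase_diff:.1f} deg, "
          f"|V_lp| = {abs(v_lp):.3f}, |V_hp| = {abs(v_hp):.3f}")
```

The phase difference stays at 90° at every frequency, while the two amplitudes coincide only at $1/(2\pi RC)$.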

#### Divide-by-2 FD

Another simple method is to use a divide-by-2 FD. For example, the circuit in Figure 3.7 can achieve this function. When the duty cycle of CKin+/CKin- is 1:1, Qout+/Qout- is in quadrature with A+/A-.

However, CKin+/CKin- must be twice the required frequency. When that frequency is not achievable in the given fabrication process, this method is not applicable.

#### Quadrature VCO

A Quadrature Voltage-Controlled Oscillator (QVCO), which provides precise quadrature outputs, is based on two cross-coupled differential VCOs [32, 33]. The coupling structure forces these two VCOs to oscillate at the same frequency while keeping a phase difference of 90°. Figure 3.10 sketches the general structure of a QVCO.

In this QVCO, two LC-tanks are driven by two negative resistors, which can be practically implemented by cross-coupled transistors. Two voltage-controlled current sources,  $g_{mc}$ , are applied to couple the oscillators. So

$$V_1(\frac{1}{sL} + sC) = V_2 g_{mc} \tag{3.6}$$

and

$$V_2(\frac{1}{sL} + sC) = -V_1 g_{mc} \tag{3.7}$$



Figure 3.10: Structure of QVCO

Multiplying (3.6) and (3.7) side by side,

$$V_1 V_2 (\frac{1}{sL} + sC)^2 = -V_1 V_2 g_{mc}^2$$

If the circuit is oscillating,  $V_1V_2 \neq 0$ ,

$$\frac{1}{g_{mc}^2} (\frac{1}{sL} + sC)^2 = -1$$

therefore,

$$\frac{1}{g_{mc}}(\frac{1}{sL} + sC) = \pm j$$

and

$$V_1 = \pm jV_2$$

which means  $V_1$  and  $V_2$  are always in quadrature. The oscillating angular frequency is

$$\omega = \sqrt{\frac{1}{LC} + \frac{g_{mc}^2}{4C^2}} \mp \frac{g_{mc}}{2C}$$

There are two output frequencies, which correspond to  $90^{\circ}$  and  $-90^{\circ}$  of phase difference between  $V_1$  and  $V_2$ . An ideal circuit as in Figure 3.10 provides these two frequencies simultaneously. In a real QVCO, these two frequencies have different feedback loop gains because of the parasitic resistances in the inductors, therefore only the one with the larger loop gain is generated in the oscillator [34], i.e.

$$\begin{cases} V_1 = -jV_2 \\ \omega = \sqrt{\frac{1}{LC} + \frac{g_{mc}^2}{4C^2}} + \frac{g_{mc}}{2C} \end{cases}$$

In this case,  $V_1$  lags  $V_2$  by 90°.
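The two oscillation frequencies and the associated quadrature relations can be checked numerically against Equation (3.6). The inductance, capacitance and coupling transconductance below are arbitrary example values, not those of the designed QVCO.

```python
import math


def qvco_frequencies(l_h, c_f, g_mc):
    """Return (lower, higher) angular oscillation frequencies in rad/s."""
    root = math.sqrt(1 / (l_h * c_f) + g_mc ** 2 / (4 * c_f ** 2))
    offset = g_mc / (2 * c_f)
    return root - offset, root + offset


def tank_admittance(omega, l_h, c_f):
    """1/(sL) + sC evaluated at s = j*omega."""
    s = 1j * omega
    return 1 / (s * l_h) + s * c_f


L_H, C_F, G_MC = 2e-9, 1e-12, 2e-3         # ASSUMED example values
w_low, w_high = qvco_frequencies(L_H, C_F, G_MC)

# From Equation (3.6): V1 / V2 = g_mc / (1/(sL) + sC)
v1_over_v2_low = G_MC / tank_admittance(w_low, L_H, C_F)
v1_over_v2_high = G_MC / tank_admittance(w_high, L_H, C_F)
print(f"f_low  = {w_low / (2 * math.pi) / 1e9:.3f} GHz, V1/V2 = {v1_over_v2_low:.3f}")
print(f"f_high = {w_high / (2 * math.pi) / 1e9:.3f} GHz, V1/V2 = {v1_over_v2_high:.3f}")
```

For these values the higher frequency gives $V_1 = -jV_2$ and the lower one gives $V_1 = +jV_2$, which matches the pairing of signs in the expressions above.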

#### 4-Stage DLL

According to Section 3.2 on page 22, it is obvious that a 4-stage DLL can provide the required quadrature output.

#### 3.4 Summary

This chapter introduced the fundamental theory of clock synthesisers. Two commonly used techniques for clock synthesisers, the Phase-Locked Loop and the Delay-Locked Loop, were described. Some methods for quadrature signal generation were also discussed, as a multi-phase output is required for the DAQ system.

# Chapter 4

# Design of Clock Synthesiser

#### 4.1 Solutions to the clock source in the DAQ

#### 4.1.1 System requirement

As mentioned in Part I, the design target of the presented DAQ is a sampling rate of more than 10GSample/s. Thus a clock source which runs at more than 10GHz, or at least contains frequency information of more than 10GHz, is required.

For this clock source, there is a perfect ready-made clock reference, the stimulation pulse laser, which is the very source of all OSAM signals. The laser source usually provides an electrical output synchronised with the laser pulses. It can be used as the reference input of the clock source.

The 128th harmonic of the laser pulse repetition frequency is slightly above 10GHz ( $82MHz \times 128 = 10.496GHz$ ), and so meets the specification. Moreover,  $128 = 2^7$  is an easy number for frequency division, because only 7 divide-by-2 frequency dividers are needed.

In the  $0.35\mu m$  standard CMOS process, AMS C35, the maximum oscillation frequency  $(f_{max})$  of NMOS transistors is below 50GHz, and the transit frequency  $(f_T)$  of NMOS is below 30GHz [35]. It is consequently impossible to make a sequential circuit operating at 10GHz in this process. In reality, amplifiers cannot reach a bandwidth of more than 6GHz even when using inductors for shunt-peaking [11]. Amplifiers are always needed to buffer the signals and clocks, and inductors occupy significantly larger chip areas than any other components. (The smallest one in the AMS C35 process is more than  $6 \times 10^4 \mu m^2$ , while most transistors are less than  $100\mu m^2$ ). Moreover, the RF SPICE models provided by the foundry are only valid up to 6GHz [35], which also indicates that circuits operating at more than 6GHz are not realistic.

Therefore, the only way to overcome this limitation is to use multiple clocks operating at a lower frequency, rather than a single direct 10GHz clock. For example, one option could be a 5.248GHz clock  $(82MHz \times 64)$  with two output signals at different phases, 0° and  $180^{\circ}$ . The time difference between these two signals is half of their period, i.e.  $1/(2 \times 5.248GHz)$ . Similarly, a 2.624GHz clock  $(82MHz \times 32)$  with 0°, 90°,  $180^{\circ}$ , and  $270^{\circ}$  output, or a 3.444GHz clock  $(82MHz \times 42)$  with 0°,  $120^{\circ}$ , and  $240^{\circ}$  output<sup>1</sup>, are also applicable.

Ideally, the number of inductors needs to be minimised, and so the lower clock frequency was chosen. As mentioned above, those high frequency amplifiers need inductors to boost their bandwidth, while inductors occupy large chip areas. This bandwidth-boost method is not suitable for a sensor array, as every pixel would need several inductors to achieve the performance, and this would make the total chip area alarmingly large. So inductor-less circuits are preferred for our application, i.e. the circuit bandwidth has to be reduced further. Additionally, considering the simplicity of the frequency dividers, the 2.624GHz clock with 4-phase output is the most suitable choice.
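The three candidate clock configurations discussed above can be tabulated with a short sketch. The function and key names are hypothetical; only the multipliers and phase counts come from the text.

```python
F_REF_HZ = 82_000_000  # laser synchronising signal, approx. 82 MHz

# (frequency multiplier, number of output phases) for the options in the text
OPTIONS = [(64, 2), (42, 3), (32, 4)]


def describe(multiplier: int, n_phases: int, f_ref_hz: int = F_REF_HZ) -> dict:
    """Clock frequency, equivalent sampling rate and divider complexity."""
    f_clk = multiplier * f_ref_hz
    is_power_of_two = (multiplier & (multiplier - 1)) == 0
    return {
        "clock_hz": f_clk,
        "equivalent_rate_hz": f_clk * n_phases,
        # A power-of-two multiplier needs only cascaded divide-by-2 stages.
        "divide_by_2_stages": multiplier.bit_length() - 1 if is_power_of_two else None,
    }


for mult, phases in OPTIONS:
    info = describe(mult, phases)
    print(f"x{mult}, {phases} phases: clock {info['clock_hz'] / 1e9:.3f} GHz, "
          f"equivalent {info['equivalent_rate_hz'] / 1e9:.3f} GS/s, "
          f"divide-by-2 stages: {info['divide_by_2_stages']}")
```

The ×32 option has the lowest clock frequency and needs only five divide-by-2 stages, whereas the ×42 option cannot be built from divide-by-2 stages alone.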

#### 4.1.2 Clock source solutions

Once the clock frequency has been chosen, there are two possible solutions to generate the 4-phase clock signals.

 $<sup>^1 {\</sup>rm In}$  this case, the highest frequency achieved is the 126th harmonic (42  $\times$  3 = 126) of the fundamental frequency, 10.332 GHz.

#### Solution 1: PLL with QVCO

The first solution is a 2.624GHz PLL with a QVCO, which is able to generate the required 4-phase output (0°, 90°, 180°, and 270°). Figure 4.1 illustrates the structure of the clock source. The PLL locks with the 82MHz synchronising signal, and provides the  $\times 32$  frequency output, i.e. 2.624GHz. VCO-I and VCO-Q are cross coupled so that their outputs are exactly in quadrature.



Figure 4.1: Clock source solution 1: PLL with QVCO

#### Solution 2: PLL followed by DLL

The second solution is shown in Figure 4.2. Firstly, a normal  $\times 32$  PLL provides the 2.624GHz clock. Then a 4-stage DLL is applied to generate the 4 phases,  $0^{\circ}$ ,  $90^{\circ}$ ,  $180^{\circ}$ , and  $270^{\circ}$ .



Figure 4.2: Clock source solution 2: PLL followed by a DLL

#### 4.1.3 Solution comparison

#### Chip area

A VCO requires an LC-tank, which contains at least one inductor, so the VCO will require a large chip area. Since the QVCO is essentially two cross-coupled VCOs, its chip area is approximately double. *Solution 1* has a QVCO, while *Solution 2* has a normal VCO only. The DLL contains no inductors, and therefore needs much less chip area.

#### Power consumption

VCO is also a power-hungry circuit, and so the QVCO will have approximately double the power consumption of a single VCO. On the other hand, DLL contains several buffers operating at 2.624GHz. These high-frequency buffers are also power-consuming. So both of the two solutions need lots of power.

#### Responding time

Solution 1 has only one feedback loop, the PLL, whereas Solution 2 has two feedback loops, the PLL and the DLL. As a result, the responding time of Solution 1 is shorter than that of Solution 2.

#### Signal degradation

In DLL design, it is very important to maintain the signal level throughout all the delay stages [31]. Otherwise, the signal going through the delay stages will degrade, i.e. the voltage swing would get smaller and smaller after every stage. This results in a serious problem for the DLL, as the voltage swing affects the delay time of the stage. If the delay stages have different voltage swings, they have different delay times. Consequently, their output phases are no longer 90°, 180°, 270°, and 0°, but four unevenly-divided phases. The later stages provide less delay than their previous stages; for example, the output phases can be something like 93°, 184°, 273°, and 0° from the first stage to the last stage. The phase difference provided by each stage in this case is not 90°, but 93°, 91°, 89° and 87°, respectively. This is merely an example, and the real situation can be dramatically worse if signal degradation is significant.
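The example above can be reproduced by converting the tap phases into the phase step contributed by each stage, and then into a time delay at 2.624GHz. The helper names are hypothetical, and the tap phases are the illustrative figures quoted in the text.

```python
CLOCK_HZ = 2.624e9  # DLL input clock frequency from the text


def stage_phase_steps(tap_phases_deg):
    """Phase contributed by each stage, given tap phases from first to last stage."""
    steps = []
    previous = 0.0  # phase at the DLL input
    for phase in tap_phases_deg:
        steps.append((phase - previous) % 360.0)
        previous = phase
    return steps


def phase_to_delay_ps(phase_deg, clock_hz=CLOCK_HZ):
    """Convert a phase step in degrees to a time delay in picoseconds."""
    return phase_deg / 360.0 * 1e12 / clock_hz


steps = stage_phase_steps([93.0, 184.0, 273.0, 0.0])
for index, step in enumerate(steps, start=1):
    print(f"stage {index}: {step:.0f} deg = {phase_to_delay_ps(step):.2f} ps")
```

An ideal 90° step corresponds to about 95.3ps, so in this example the first and last stages are each about 3ps away from the ideal delay, in opposite directions.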

To overcome this problem, each delay stage must have enough gain and bandwidth to regenerate the input signal in the required delay time, namely

$$\frac{1}{82MHz \times 32 \times 4} = 95.3ps$$

Consequently, 95.3ps after the input changes, the output of the buffer amplifier must be no smaller than that of the input. This requirement is similar to the bandwidth requirement for a given rise/fall time for signal integrity. Accordingly, the bandwidth can be estimated from

$$BW = \frac{0.35}{RT}$$

where BW is the required bandwidth and RT is the rise (or fall) time of the signal [36]. The rise time here is defined as the time taken to go from 10% to 90% of the desired change. Therefore the bandwidth of the delay stage in our DLL can be estimated as

$$BW \approx \frac{0.35}{95.3ps \times 0.8} = 4.6GHz$$

This bandwidth is almost impossible to achieve without inductors in our given  $0.35\mu m$  CMOS process. Even if each stage contains just one inductor, the total of 4 inductors would make the DLL circuit much larger than the PLL, which has only one or two inductors.
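The arithmetic behind the 95.3ps delay and the 4.6GHz bandwidth estimate can be reproduced as follows. The function names are hypothetical; the 0.35 rule of thumb and the 0.8 factor (the 10% to 90% portion of the transition) are taken from the text.

```python
F_REF_HZ = 82e6  # reference frequency, approx. 82 MHz


def stage_delay_s(f_ref_hz=F_REF_HZ, multiplier=32, n_stages=4):
    """Delay required from each DLL stage: one quarter of the 2.624 GHz period."""
    return 1.0 / (f_ref_hz * multiplier * n_stages)


def required_bandwidth_hz(delay_s, swing_fraction=0.8):
    """Bandwidth estimate BW = 0.35 / RT, with RT = swing_fraction * delay."""
    return 0.35 / (delay_s * swing_fraction)


delay = stage_delay_s()
bandwidth = required_bandwidth_hz(delay)
print(f"stage delay {delay * 1e12:.1f} ps, required bandwidth {bandwidth / 1e9:.1f} GHz")
```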

Unlike Solution 2, Solution 1 uses the QVCO to provide the quadrature signals. The two VCOs inside the QVCO generate the phase outputs themselves, so signal degradation is not an issue for the QVCO.

#### Summary of comparison

Table 4.1 summarises the characteristics of the two solutions for the clock source. As shown in the table, the "PLL+DLL" solution is more economical as it needs less chip area. The "PLL with QVCO" solution is more functional, although one of its features, the responding time, is unimportant to the application.

|                    | Solution 1: PLL with QVCO | Solution 2: PLL+DLL                              |
|--------------------|---------------------------|--------------------------------------------------|
| Chip area          | Big                       | Relatively small                                 |
| Power consumption  | High                      | High                                             |
| Responding time    | Short                     | Long                                             |
| Signal degradation | Not a problem             | Severe, can be overcome by sacrificing chip area |

Table 4.1: Comparison of clock source solutions

However, to overcome signal degradation, the "PLL+DLL" solution has to sacrifice even more chip area than the other solution. This makes "PLL with QVCO" the more reliable and suitable solution.

Therefore, the "PLL with QVCO" solution is selected as the clock source for the proposed DAQ system design, and its structure is shown in Figure 4.1 on page 29. The sub-modules of the clock source are described in detail in the following sections.

In Section 7.7 on page 102, it is mentioned that besides the 10.496GSample/s DAQ, another 2.624GSample/s DAQ circuit is also implemented. This circuit needs a 2.624GHz clock source without the multi-phase output. Therefore, only a normal PLL is required. Its structure is the same as the PLL part of the "PLL+DLL" solution (Figure 4.2 on page 29). Most of its sub-circuits (PD, LPF, FD) can share the same design as those in the "PLL with QVCO" solution, except the VCO, which is described in detail in Sub-Section 4.4.2.

# 4.2 Phase/Frequency Detector and charge pump

The Flip-Flop based Phase/Frequency Detector (PFD) is used as the phase detector in Figure 4.1 on page 29. Although an analogue multiplier or an XOR gate can also be used as the phase detector, either may cause the problem of a non-constant phase offset.

In multiplier or XOR gate based PLLs, the control voltage for the VCO is provided by the LPF. The output voltage of the LPF results from the phase difference between the local oscillator and the reference input. When the environment parameters (such as the temperature) change, the characteristics of the VCO may change. To keep the PLL operating at the same frequency, the control voltage needs to change as well. Therefore the phase difference between the local oscillator and the reference input must also change.

As a result, the phase difference between the output clock (which provides the "local oscillator" signal) and the laser pulse (which provides the "reference" signal) is not a constant, but may change when the environment changes. Although the phase value is not a required parameter for the measurement, it is necessary to keep it constant for data alignment, i.e. the measured data from different tests can be precisely aligned for comparison. Therefore multiplier or XOR gate based phase detectors are not suitable for the application.

On the other hand, the PFD using sequential logic in Figure 3.2(a) on page 17 can guarantee the phase difference between the local oscillator and the reference is always fixed when PLL is stable, whatever the environment is.

The PFD in the presented PLL is shown in Figure 4.3 together with the charge pump. The PFD is slightly different to the theoretical diagram in Figure 3.2 on page 17.

In this circuit, an additional capacitor $C_{ext}$ is inserted on the output of the AND gate in order to extend the reset signal, Rst. If $C_{ext}$ were not included, the reset times of the two D-FFs would depend entirely on the parasitic capacitance, and so the reset times of the two D-FFs would be different. There is a possibility that one D-FF is instantly reset to zero, deactivating Rst, before the other D-FF can be reset. $C_{ext}$ adds a delay to Rst so that it remains active for a short time after the first D-FF changes to zero. This ensures that both D-FFs are reset.

Figure 4.3: Implementation of PFD and charge pump

The transistors in the charge pump (MP1 and MN1) are not ideal current sources, but this does not affect the functionality of the PLL. The current provided by either MP1 or MN1 ranges approximately from 0.2mA to 0.4mA when the transistor is in the saturation region. In the following sections, the gain of the PFD and the charge pump $(G_{PDCP})$ is taken as

$$G_{PDCP} = \frac{0.3mA}{2\pi}$$

for the system-level evaluation of the PLL.  $G_{PDCP} = \frac{0.4mA}{2\pi}$  is also used as the extreme condition for stability analysis, as this is where the PLL is most likely to be unstable.
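For reference, the two gain values used in the system-level analysis can be evaluated numerically. The sketch below only restates the expression above; the unit (amperes per radian) is our reading of the $2\pi$ in the denominator, and the function name is illustrative.

```python
import math


def pfd_cp_gain(pump_current_a):
    """Combined PFD and charge-pump gain, I / (2*pi), in amperes per radian."""
    return pump_current_a / (2.0 * math.pi)


g_nominal = pfd_cp_gain(0.3e-3)   # value used for system-level evaluation
g_extreme = pfd_cp_gain(0.4e-3)   # extreme case used for stability analysis
print(f"nominal = {g_nominal * 1e6:.2f} uA/rad, extreme = {g_extreme * 1e6:.2f} uA/rad")
```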

# 4.3 Frequency divider (FD)

# 4.3.1 FD using Source-Coupled Logic

The FD in the presented PLL is a divide-by-32 divider. As $32 = 2^5$, it can be implemented by five divide-by-two dividers in cascade. The input frequency is 2.624GHz, which is divided down to 82MHz by the FD. As CML has better performance at high frequencies than CMOS logic, CML is used to implement the FD.

The structure of the  $\div 32$  divider is shown in Figure 4.4. Five  $\div 2$  dividers (FD1, FD2, ..., FD5) are connected in cascade mode.
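The frequency seen at the output of each divide-by-two stage follows from repeated halving. A short sketch (the function name `divider_chain` is ours):

```python
def divider_chain(f_in_hz, stages=5):
    """Return the output frequency of each cascaded divide-by-2 stage."""
    outputs = []
    f = f_in_hz
    for _ in range(stages):
        f /= 2.0
        outputs.append(f)
    return outputs


outputs = divider_chain(2.624e9)
for index, freq in enumerate(outputs, start=1):
    print(f"FD{index} output: {freq / 1e6:.0f} MHz")
```

FD1 therefore sees the 2.624GHz input, FD2 sees 1.312GHz, and the last three stages all operate below 700MHz.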



Figure 4.4: CML frequency divider

The circuit of each  $\div 2$  FD is shown in Figure 4.5. It is essentially a T-type Flip-Flop, which consists of 2 cross-coupled D-type latches. Sometimes the load resistors in the Flip-Flop are replaced by PMOS transistors, as their non-linear resistance is more functional for this application. However, transistors have larger parasitic capacitors than the linear poly-silicon resistors. To achieve a higher speed, the linear resistors are used here.

The five FDs have three different configurations of transistor sizes and load resistance, i.e. FDcfg1, FDcfg2 and FDcfg3 in the figure. These differences are caused by the trade-off between circuit speed and power consumption. The first two FDs (FD1 and FD2) need more speed as they operate at higher frequencies. The latter three (FD3, FD4 and FD5) operate at lower frequencies, so the performance requirement is eased and power saving becomes the priority. The trade-off and optimisation are discussed in detail in Sub-Section 4.3.3.



Figure 4.5: Divide-by-2 frequency divider

A buffer circuit, as shown in Figure 4.6, is inserted between FD2 and FD3. It is needed because the voltage swing at the output of FD2 is not big enough to drive FD3.



Figure 4.6: Differential Buffer

The differential-to-single-ended buffer is a simple push-pull Op-Amp, as shown in Figure 4.7 [37]. It transfers the differential output of FD5 into a single-ended logic signal which is compatible with normal CMOS logic. This is the signal which is fed into the "Local Oscillator" terminal of PFD.



Figure 4.7: Differential to single-ended buffer

# 4.3.2 Optimisation for frequency dividers

As shown in Figure 4.4 on page 35, the first frequency divider FD1 operates at the highest frequency, dividing 2.624GHz down to 1.312GHz. It has the most critical performance requirement of all the FDs in the figure. In this sub-section, the mechanism of the SCL Flip-Flop based FD is investigated, and a methodology to optimise the circuit performance is presented.

The CML Flip-Flop based FD consists of two D-type latches, which are connected in the master-slave mode as shown in Figure 4.5 on the preceding page. The toggle speed of the latches determines the maximum operating frequency of the Flip-flop. To fully understand the speed limitations of the FD, the mechanism of the latch is analysed. There is some literature on general optimisation methods for CML in CMOS processes [30, 29] and in bipolar processes [27], yet this sub-section presents an optimisation technique specific to CML D-type latches.

#### Simplified transistor model

As a digital circuit, the latch operates in the large-signal mode, which is quite complicated for theoretical analysis. To simplify the calculation, a piecewise linear model is applied to the current-voltage characteristics of the MOS transistors, namely

$$I_{DS} = \begin{cases} G_m(V_{GS} - V_T) & \text{if } V_{GS} \ge V_T \\ 0 & \text{if } V_{GS} < V_T \end{cases} \tag{4.1}$$

where  $I_{DS}$  is the DC current from drain to source,  $V_{GS}$  is the DC voltage from gate to source,  $V_T$  is the effective threshold voltage, and  $G_m$  is the effective mean trans-conductance.  $V_{GS} < V_T$  is the cutoff region of the transistor, and  $V_{GS} \ge V_T$  is the combination of the triode and saturation regions.

 $V_T$  and  $G_m$  can be estimated from experimental measurements or simulations using a more accurate model, e.g. BSIM3[38]. In the proposed latch design, the values of  $V_T$  and  $G_m$  applied are those which have the minimum root-mean-square error to the BSIM3 model in the current-voltage curve. In this estimation,  $V_T$  is slightly larger than the values used in other transistor models, and  $G_m$  can be considered as an "average" value of the AC trans-conductance,  $g_m$ . Similar to  $g_m$ ,  $G_m$  can be adjusted by changing the transistor gate size. Figure 4.8 illustrates the comparison of an I-V curve based on the presented model and the one based on BSIM3 model in simulation.



Figure 4.8: Comparison of the presented piecewise linear model and BSIM3 model (Simulation condition:  $V_{SB}=1.5V,\,V_{DS}=1.5V,\,5\mu m/0.35\mu m$  NMOS transistor)

It must be noted that this piecewise linear model is only approximate, and that it also ignores the variation of $V_{DS}$. It is only suitable for design-parameter and performance estimation in early-stage design. Accurate simulations in CAD software are necessary to fine-tune the design parameters.
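The fitting of $G_m$ and $V_T$ in Equation (4.1) to a more accurate I-V curve can be sketched as a small least-squares search. In the sketch below the reference curve is a hypothetical square-law characteristic standing in for BSIM3 simulation data, and its parameters (`k`, `vth`) are illustrative assumptions, not values from this design.

```python
def piecewise_ids(vgs, gm, vt):
    """Piecewise linear model of Equation (4.1)."""
    return gm * (vgs - vt) if vgs >= vt else 0.0


def reference_ids(vgs, k=2.0e-3, vth=0.6):
    """Hypothetical square-law curve used here in place of BSIM3 data."""
    return 0.5 * k * (vgs - vth) ** 2 if vgs > vth else 0.0


def fit_piecewise(vgs_points, ids_points, vt_candidates):
    """Grid-search V_T; for each V_T the least-squares G_m has a closed form."""
    best = None
    for vt in vt_candidates:
        x = [max(v - vt, 0.0) for v in vgs_points]
        sxx = sum(xi * xi for xi in x)
        if sxx == 0.0:
            continue  # no sample lies above this candidate threshold
        gm = sum(xi * yi for xi, yi in zip(x, ids_points)) / sxx
        err = sum((gm * xi - yi) ** 2 for xi, yi in zip(x, ids_points))
        if best is None or err < best[2]:
            best = (gm, vt, err)
    gm, vt, err = best
    return gm, vt, (err / len(vgs_points)) ** 0.5


vgs_sweep = [i * 0.01 for i in range(251)]            # 0 V to 2.5 V
ids_ref = [reference_ids(v) for v in vgs_sweep]
vt_grid = [0.3 + i * 0.005 for i in range(301)]       # 0.3 V to 1.8 V
gm_fit, vt_fit, rms_err = fit_piecewise(vgs_sweep, ids_ref, vt_grid)
print(f"G_m = {gm_fit * 1e3:.2f} mS, V_T = {vt_fit:.3f} V")
```

For this stand-in curve the fitted $V_T$ comes out above the square-law threshold, which is the same tendency noted in the text for the effective threshold voltage.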

#### Basic equations of latch toggling

Figure 4.9 shows a single D-type latch, which is half of the divide-by-2 FD.



Figure 4.9: SCL D-type latch

The circuit latches the data value when the clock input is low $(V_{Clk+} < V_{Clk-})$. Under this condition, transistor MN5 is off and transistor MN6 is on. The outputs of the latch (Dout+ / Dout-) remain constant, irrespective of the data input (Din+ / Din-), because of the feedback from the output to the input of the differential pair formed by transistors MN3 and MN4. When the clock goes high $(V_{Clk+} > V_{Clk-})$, MN6 turns off and MN5 turns on, and the output is determined by the data input (Din+ / Din-) through the differential pair MN1/MN2. Consequently the toggle speed of the latch depends on the response of the output ports to the input ports after the rising edge of the clock. This speed determines the maximum operating frequency of an SCL Flip-flop.

In the following analysis of the latch toggling, it is defined that the analogue voltages on the data input (Din+ / Din-) are  $V_{IN+}$  and  $V_{IN-}$ , respectively, and those on the data output (Dout+ / Dout-) are  $V_{OUT+}$  and  $V_{OUT-}$ , respectively.

It is assumed that the output logic state of the latch is low $(V_{OUT+} < V_{OUT-})$, and that a logic-high signal has been applied to the input $(V_{IN+} > V_{IN-})$ and has settled before the rising edge of the clock, i.e. the input capacitors are fully charged. Thus the latch will start to toggle its logic state immediately after the clock goes high. To simplify the analysis, the transient effects from the data inputs are ignored ($V_{IN+}$ and $V_{IN-}$ remain constant throughout the analysis).

Before the rising edge of the clock, the voltage of the common source point S $(V_S)$ is equal to $V_{IN+} - V_T$ as there is no current through MN5. After the rising edge, MN5 switches on and $V_S$ reduces. This increases the current through MN1. As the output state will change from $V_{OUT+} < V_{OUT-}$ to $V_{OUT+} > V_{OUT-}$, the current in MN1 helps the toggling by discharging the output capacitor at the point Dout-. If $V_S$ reduces to a value lower than $V_{IN-} - V_T$, MN2 will switch on and a current will flow through this transistor, reducing the charging current of the output capacitor on Dout+, and thereby slowing the toggling process. Thus, it is essential for fast toggling to ensure that MN2 is off throughout the toggling process, i.e. $V_{IN-} - V_S \leq V_T$. Furthermore, the most effective condition is $V_{IN-} - V_S = V_T$ at the end of the toggling, as the largest differential gain is obtained here. Under these conditions, a value for the bias current source $I_{DS}$ can be found:

$$I_{DS} = G_m(V_{IN+} - V_{IN-}) = G_m(V_{IN+} - V_S - V_T) \tag{4.2}$$

This condition can be roughly met by carefully setting the DC bias points, although it is based on an approximate model.

In the ideal conditions described above, the circuit of Figure 4.9 can be modified to Figure 4.10. In deriving this model, all transistors are assumed to switch on and off perfectly. $C_1$ and $C_2$ are the total capacitances from the corresponding points to ground, including gate capacitances, load capacitances, and parasitic capacitances. Dout+ and Dout- are assumed to be symmetrical, so they have identical capacitances, $C_2$. Although there are capacitances other than those connecting to ground (e.g. from Dout- to Din+), they can be converted to effective capacitances to ground because the signals at all positions are correlated.



Figure 4.10: Modified D-latch circuits of the initial state of toggling

To aid modelling, the voltage on each of the capacitors is assumed to be zero. This requires the addition of DC voltage sources as shown in Figure 4.10. These voltage sources do not affect  $V_{OUT+}$ ,  $V_{OUT-}$ , and  $V_S$ . Therefore the circuit performance remains the same as that in the original topology.

By applying Kirchhoff's current law in the Laplace domain at nodes S, Dout+, and Dout-, a set of simultaneous equations can be formed,

$$\begin{cases}
G_m(V_{IN+} - V_T - V_S) + sC_1(V_{IN+} - V_T - V_S) = I_{DS} \\
\frac{V_{DD} - V_{OUT+}}{R} + (V_{DD} - V_{OUT+})sC_2 = G_m(V_{IN+} - V_T - V_S) \\
(V_{DD} - RI_{DS} - V_{OUT-})sC_2 = \frac{V_{OUT-} - V_{DD}}{R}
\end{cases} \tag{4.3}$$

The actual value of the differential output voltage  $(V_{OUT^+} - V_{OUT^-})$  depends on the biasing and application. However, in all applications, the output values must regenerate the input values, or the flip-flop will not correctly toggle. So the circuit is analysed in terms of a large signal gain defined by

$$G_v(t) = \frac{V_{OUT^+}(t) - V_{OUT^-}(t)}{V_{IN^+} - V_{IN^-}}$$

To keep the circuit operating, this gain must be greater than or equal to 1, namely  $G_v(t) \geq 1$ . Therefore, the transit time of toggling  $(t_T)$  can be defined from

$$G_v(t_T) = 1 \tag{4.4}$$

As mentioned above, it is assumed that the input does not change during the whole transition. Consequently the time dependence on the input is ignored.

Inverse Laplace transforms are used to solve equations (4.2) and (4.3) to obtain

$$G_v(t) = RG_m \left( 1 - \frac{2T_1 - T_2}{T_1 - T_2} e^{-\frac{t}{T_1}} + \frac{T_2}{T_1 - T_2} e^{-\frac{t}{T_2}} \right) \tag{4.5}$$

where

$$\begin{cases} T_1 = RC_2 \\ T_2 = \frac{C_1}{G_m} \end{cases}$$

## Optimisation

From equation (4.5), to obtain a faster circuit response time,  $T_1$  and  $T_2$  should be as small as possible. To achieve this, it requires R,  $C_1$ , and  $C_2$  to be reduced, whilst increasing  $G_m$ .

However,  $G_m$  cannot be increased indefinitely because higher  $G_m$  requires larger bias currents. The bias current is usually limited by power consumption constraints. Moreover,  $C_1$  and  $C_2$  include contributions from the gate capacitors and other parasitic capacitors related to the gate size. Increasing  $G_m$  also results in an increase in the gate size and hence  $C_1$  and  $C_2$ . Therefore any gain from increasing  $G_m$  is offset by the increase in  $C_1$  and  $C_2$ , which may have further circuit constraints limiting the optimised value.

Thus, the most convenient parameter for optimising is R. A smaller R gives a smaller $T_1$. But $G_v$ must be equal to, or larger than, 1 when the time t is long enough. Otherwise, the signals would attenuate from latch to latch, and disappear after a few loops. So R should be carefully selected so that it gives a small $T_1$ and a large enough $G_v$ simultaneously.

The divider reaches its maximum operating frequency when (4.4) is satisfied. The optimum value of R can be found by solving (4.4) and (4.5) numerically.
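One possible way to carry out this numerical solution is sketched below: Equation (4.5) is evaluated directly, the toggle time $t_T$ is found from condition (4.4) by bisection, and R is swept to find the shortest $t_T$. The $G_m$ value is the one the text later quotes for Figure 4.11; the capacitances `C1` and `C2` are hypothetical values chosen only to make the sketch runnable, and bisection is our choice of solver rather than necessarily the method used for the thesis results.

```python
import math

GM = 2.2e-3    # effective transconductance in siemens (quoted for Figure 4.11)
C1 = 20e-15    # hypothetical capacitance at node S, in farads
C2 = 80e-15    # hypothetical capacitance at each output node, in farads


def large_signal_gain(t, r, gm=GM, c1=C1, c2=C2):
    """G_v(t) from Equation (4.5); requires T1 != T2."""
    t1 = r * c2
    t2 = c1 / gm
    a = (2.0 * t1 - t2) / (t1 - t2)
    b = t2 / (t1 - t2)
    return r * gm * (1.0 - a * math.exp(-t / t1) + b * math.exp(-t / t2))


def toggle_time(r, gm=GM, c1=C1, c2=C2):
    """Solve G_v(t_T) = 1 by bisection; returns None when R*G_m <= 1."""
    if r * gm <= 1.0:
        return None  # the gain never reaches 1, so the latch cannot toggle
    lo, hi = 0.0, r * c2
    while large_signal_gain(hi, r, gm, c1, c2) < 1.0:
        hi *= 2.0  # expand the bracket until the gain exceeds 1
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if large_signal_gain(mid, r, gm, c1, c2) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Sweep R in 1-ohm steps and keep the value with the shortest toggle time.
candidates = [(toggle_time(float(r)), float(r)) for r in range(500, 1501)]
t_best, r_best = min(candidates)
print(f"R_op = {r_best:.0f} ohm, t_T = {t_best * 1e12:.1f} ps, R_op*G_m = {r_best * GM:.2f}")
```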

An analytical solution can be obtained if a further simplification is made. The Taylor series of (4.5) at t=0 is

$$G_v(t) = RG_m \left(-1 + \frac{1}{T_1}t + \frac{(T_1 - T_2)}{2T_1^2T_2}t^2 + \dots\right)$$

The first-order term, $\frac{1}{T_1}t$, is not related to $T_2$. That means $T_1$ dominates the behaviour of $G_v(t)$ around t=0, while $T_2$ provides only a second-order effect. If $T_2$ is ignored, (4.5) becomes

$$G_v(t) = RG_m \left( 1 - 2e^{-\frac{t}{T_1}} \right) \tag{4.6}$$

Solving this equation for  $t_T$  by using the condition (4.4), the optimum value of R for the fastest toggling is achieved when  $\frac{\partial t_T}{\partial R} = 0$  in Equation (4.6), i.e.

$$\begin{cases}
R_{op}G_m = \left(\ln \frac{2R_{op}G_m}{R_{op}G_m - 1}\right)^{-1} + 1 \\
t_T = R_{op}C_2 \ln \frac{2R_{op}G_m}{R_{op}G_m - 1}
\end{cases} \tag{4.7}$$

where $R_{op}$ is the optimum value of R.

Letting $X = R_{op}G_m$, the first line of Equation (4.7) becomes

$$X = \left(\ln \frac{2X}{X - 1}\right)^{-1} + 1$$

which can be solved iteratively to get

$$X = 1.59582518$$

Therefore

$$\begin{cases}
R_{op} \approx \frac{1.60}{G_m} \\
t_T \approx 1.68 R_{op} C_2 \approx 2.68 \frac{C_2}{G_m}
\end{cases} \tag{4.8}$$

Therefore,  $R_{op}$  can be easily estimated when all other parameters have been determined according to the application requirements.  $R_{op}$  depends on  $G_m$  only, and the toggling time  $t_T$  is generally proportional to  $C_2$ .
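The iterative solution for X mentioned above can be reproduced with a simple fixed-point iteration. The starting value, tolerance and function name below are our assumptions; the thesis does not state which iterative scheme was used.

```python
import math


def solve_optimum_x(x0=2.0, tol=1e-12, max_iter=200):
    """Fixed-point iteration for X = 1 + 1 / ln(2X / (X - 1)), valid for X > 1."""
    x = x0
    for _ in range(max_iter):
        x_next = 1.0 + 1.0 / math.log(2.0 * x / (x - 1.0))
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("fixed-point iteration did not converge")


x_opt = solve_optimum_x()
# Coefficient in t_T = log_term * R_op * C_2, and in t_T = x_opt * log_term * C_2 / G_m.
log_term = math.log(2.0 * x_opt / (x_opt - 1.0))
print(f"X = {x_opt:.6f}, t_T = {log_term:.3f} R_op C_2 = {x_opt * log_term:.3f} C_2/G_m")
```

The iteration settles close to the value of X quoted in the text, and the two printed coefficients correspond to the 1.68 and 2.68 factors in Equation (4.8).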

A Flip-flop consists of two latches, hence its maximum operating frequency  $f_{max-op}$  can be estimated as

$$f_{max-op} = \frac{1}{2t_T} = 0.187 \frac{G_m}{C_2} = \frac{0.298}{R_{op}C_2} \tag{4.9}$$

As  $C_2$  may be different on the two latches, the larger value should be applied in Equation (4.9).

It must be noted that these equations are valid if and only if $T_2$ is ignored. Otherwise, Equations (4.4) and (4.5) need to be solved numerically.

Figure 4.11 gives the optimum load resistance $R_{op}$ obtained numerically using a $G_m$ value of $2.2 \times 10^{-3} \Omega^{-1}$, with $C_1$ and $C_2$ as swept parameters. This numerical solution of $R_{op}$ is close to the estimate of Equation (4.8) $(R_{op} \approx \frac{1.60}{G_m} = 0.73k\Omega)$ in most cases, except when $C_2$ is smaller than approximately twice $C_1$. This situation can generally be avoided by careful circuit layout.
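As a quick check of the closed-form estimates in Equations (4.8) and (4.9), the sketch below evaluates them for the quoted $G_m$. The output capacitance `C2_ASSUMED` is a hypothetical value used only for illustration; it is not taken from the design.

```python
GM = 2.2e-3           # effective transconductance in siemens (from the text)
C2_ASSUMED = 80e-15   # hypothetical output-node capacitance in farads


def optimum_load_resistance(gm):
    """R_op from Equation (4.8), valid when T_2 is ignored."""
    return 1.60 / gm


def max_operating_frequency(gm, c2):
    """f_max-op from Equation (4.9), valid when T_2 is ignored."""
    return 0.187 * gm / c2


r_op = optimum_load_resistance(GM)
f_max = max_operating_frequency(GM, C2_ASSUMED)
print(f"R_op = {r_op:.0f} ohm, f_max-op = {f_max / 1e9:.2f} GHz (for the assumed C_2)")
```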



Figure 4.11: Numerical solutions of optimum load resistance  $R_{op}$ 

Figure 4.12 shows the contours of the numerical solution of the toggling time $t_T$ in the same conditions as above. $t_T$ is generally proportional to $C_2$, and slightly affected by $C_1$. This solution tends to be equal to that in Equation (4.8) when $C_1$ approaches zero.



Figure 4.12: Numerical solutions of toggling time  $t_T$ 

#### Validation and trade-off

To validate the above optimising method, test frequency divider circuits were designed and tested[39]. The details of this validation chip (Chip RF1) can be found in Appendix A on page 205. In this sub-section, only the simulation and measurement results are presented.

These FDs in Chip RF1 are almost the same except that the load resistors R are different. According to the above discussion, the FD with the optimum value of R would have the maximum operating frequency.

Figure 4.13 shows some waveforms from the post-layout simulation in Cadence. In Figure 4.13(b), R (0.66 $k\Omega$) is smaller than the optimum value $R_{op}$. As a result, the circuit has insufficient gain to regenerate the input signal, and the output signal drops at each toggling event. After a few clock cycles, the differential output becomes zero. In Figure 4.13(c), R (0.73 $k\Omega$) is equal to $R_{op}$, and the FD operates successfully with the 5.7GHz clock input. In Figure 4.13(d), R (0.87 $k\Omega$) is larger than $R_{op}$. Although it is able to provide a bigger gain, its toggling time $t_T$ is longer. Therefore the output amplitude decreases in every clock cycle, until the circuit fails to respond to a clock edge. When such a failure occurs, the circuit has enough time to recover its output amplitude in the next clock cycle. Since it misses a clock edge every few cycles, it is not able to operate as a normal frequency divider.

Figure 4.13: Simulation results for different load resistor R (Input frequency= 5.7GHz)

The simulation and measurement results of the maximum operating frequencies of the FDs are given in Figure 4.14. The continuous curve is the prediction of maximum operating frequency based on Equations (4.6) and (4.4), i.e. when $T_2$ is ignored. The optimum load resistance $R_{op}$ obtained by Equation (4.8) is at the peak of this curve. The dashed curve is the corresponding prediction when $T_2$ is also considered. A number of chips were tested to show the effect of process variation. Their measured results are shown as circular dots in the figure.



Figure 4.14: Simulation and measurement results of maximum operating frequency

The predictions based on the presented analysis underestimate the maximum operating frequency by about 5% to 10%. This is mainly caused by ignoring the effects of MN3 and MN4 in Figure 4.9, which boost the output signals when MN6 is switched on. However, the estimate of the load resistance is very close to reality. The $R_{op}$ derived from Equation (4.8) is $0.726k\Omega$, and if $T_2$ is not ignored, its numerical solution is $0.729k\Omega$. The simulation results show that this value is around $0.73k\Omega$. The dividers with the estimated optimum resistance $(0.73k\Omega)$ have an average maximum operating frequency of 5.5GHz, while the fastest one reaches 5.7GHz. To the best of our knowledge, this is the fastest static frequency divider reported in the literature using a $0.35\mu m$ CMOS process [40, 41, 42].

The maximum operating frequency reduces slowly when R is larger than  $R_{op}$ . However, it drops significantly if R is smaller than  $R_{op}$ . This is because a small  $RG_m$  makes Equation (4.4) difficult to meet. In the worst case, i.e.  $RG_m < 1$ , Equation (4.4) is impossible to be met however long  $t_T$  is. In reality, this indicates the gain of the circuit is smaller than 1, and therefore the circuit would not operate.

The resistivity of the resistors used in the CMOS process has a large variation of approximately 20%. So setting R to the optimum value might result in a low yield, as those circuits with low resistivity will have very poor performance. Therefore R should be set to a value slightly larger than the optimum value, e.g. 10% larger. This slightly reduces the overall performance, but gives a better yield.

Moreover, the larger value of R results in larger gain for the differential pair, and is consequently more robust to noise and interference. When the frequency requirement is eased, bigger R is preferred for reliability.
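The effect of the resistor spread on the toggle time can be explored with the simplified model of Equation (4.6). The sketch below compares the worst-case normalised toggle time over a ±20% resistance spread for a nominal R equal to $R_{op}$ and for a nominal R 10% larger. It is only an illustration under the $T_2 = 0$ approximation; the function names and the treatment of the spread as two fixed corners are our assumptions.

```python
import math

X_OPT = 1.59582518   # R_op * G_m, from the text


def normalised_toggle_time(x):
    """t_T * G_m / C_2 from Equations (4.6) and (4.4); infinite when X <= 1."""
    if x <= 1.0:
        return math.inf  # gain below 1: the latch never toggles
    return x * math.log(2.0 * x / (x - 1.0))


def worst_case(nominal_scale, spread=0.20):
    """Slowest corner when the nominal R is nominal_scale * R_op."""
    corners = [nominal_scale * (1.0 - spread), nominal_scale * (1.0 + spread)]
    return max(normalised_toggle_time(X_OPT * c) for c in corners)


worst_at_optimum = worst_case(1.00)
worst_with_margin = worst_case(1.10)
print(f"worst case at R_op: {worst_at_optimum:.3f}, with 10% margin: {worst_with_margin:.3f}")
```

In this simplified model the slow corner improves only slightly with the 10% margin, while the nominal toggle time is barely affected.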

#### 4.3.3 Implementation of FD

The frequency input of the desired  $\div 32$  FD is 2.624GHz, which is much lower than the maximum operating frequency (5.5GHz) achieved by those FDs in Chip RF1. However, these FDs are power-hungry circuits  $(3.3V \times 3mA = 9.9mW$  for each  $\div 2$  FD), as the design target of Chip RF1 was to pursue the highest possible operating frequency. As for the FD in the DAQ system, it is not necessary to achieve such high performance.

Therefore the biasing current for the FD (i.e. the current source in Figure 4.9 on page 39) is significantly reduced, which results in a smaller $G_m$ for each $\div 2$ FD in the $\div 32$ FD. The sizes of the transistors used in the FDs can also be reduced, which makes not only $G_m$ but also $C_2$ smaller. According to (4.9), the maximum operating frequency of the FD therefore falls much more slowly than the biasing current.

As shown in Figure 4.4 on page 35, there are five $\div 2$ FDs in the $\div 32$ FD, which use three different configurations of transistor sizes (which affect the bias current and $G_m$) and load resistances.

FD1 operates at the highest frequency, with a 2.624GHz input. It uses the configuration FDcfg1, which reduces the total bias current to 0.9mA (0.45mA for each latch). The transistor sizes in this FD (the gate widths) are halved compared with those in Chip RF1. The load resistance is slightly larger than the optimum value $R_{op}$, for the reasons described in the last two paragraphs of Sub-Section 4.3.2 on page 48. The maximum operating frequency $f_{max-op}$ of FD1 is 5.2GHz in post-layout simulation.

FD2 operates at 1.312GHz, which uses the configuration FDcfg2. The total bias current here is reduced further to 0.4mA (0.2mA for each latch), but the transistor sizes remain the same as FDcfg1. The load resistance is selected in the same way as FDcfg1.  $f_{max-op}$  of FD2 is 3.9GHz in post-layout simulation.

FD3, 4 and 5, operating below 700MHz, share the same configuration FDcfg3. It is almost the same as FDcfg2, except that the load resistance is much larger in order to provide a larger voltage swing at the output. Its  $f_{max-op}$  is 3.2GHz in post-layout simulation.

#### 4.4 VCO

# 4.4.1 Design of QVCO

The VCO for the presented PLL is a QVCO, which consists of two cross-coupled normal negative-R VCOs, and provides quadrature outputs. The structure of the designed QVCO is shown in Figure 4.15.



Figure 4.15: Quadrature Voltage-Controlled Oscillator ($L_1 = 2.6nH$, $C_p = 0.1pF$, $C_{var} = 0.33pF$ (maximum, with 57% tuning range); transistor sizes (W/L): MN1~4: $60/0.35\mu m$; MP1~8: $80/0.35\mu m$)

Each of the two normal VCOs (VCO-I and VCO-Q) contains an LC-tank which provides the resonant frequency. Four transistors (MP1, MP2, MN1, and MN2 for VCO-I; MP3, MP4, MN3, and MN4 for VCO-Q) are used in each VCO to build the "negative-R", which supplies the energy for oscillation, as described in Sub-Section 3.1.5 on page 19. VCO-I and VCO-Q are cross-coupled together, as shown in Figure 4.15: I+ and I- are coupled to Q- and Q+ via transistors MP7 and MP8, respectively. In the other direction, Q+ and Q- are coupled to I+ and I- (not I- and I+) via transistors MP6 and MP5, respectively.

With this topology, the voltage signals on $\mathbf{Q}+$ and $\mathbf{Q}-$ lag those on $\mathbf{I}+$ and $\mathbf{I}-$ by 90° [34]. Therefore the four-phase outputs, 0°, 90°, 180°, and 270°, correspond to the voltage signals on $\mathbf{I}+$, $\mathbf{Q}+$, $\mathbf{I}-$, and $\mathbf{Q}-$, respectively.

#### LC-tank

Each LC-tank, as shown in Figure 4.15, contains a pair of inductors  $(L_1)$ , a pair of poly-silicon capacitors  $(C_p)$ , and a pair of varactors  $(C_{var})$ . The intrinsic resonant frequency of the LC-tank is

$$f_{res} = \frac{1}{2\pi\sqrt{2L_1 \times (\frac{1}{2}C_p + \frac{1}{2}C_{var})}} = \frac{1}{2\pi\sqrt{L_1(C_p + C_{var})}}$$

The real oscillating frequency is slightly higher than  $f_{res}$  because of the cross-coupling and the parasitic resistances in the LC-tank[34].  $f_{res}$  is variable by changing  $C_{var}$ , i.e. changing the bias voltage  $V_{ctrl}$  of the varactors in Figure 4.15. Consequently the oscillating frequency also changes corresponding to  $f_{res}$ .

It may look redundant to use two inductors in the tank rather than just one inductor with an inductance of $2L_1$. However, the two ports of an on-chip inductor are not identical, as shown in Figure 4.16 [35, 43]. One port connects to the outer terminal of the metal spiral, while the other connects to the inner terminal. As a result, these two ports are not symmetrical in the structure and values of their parasitic capacitance and resistance [35]. This does not cause any obvious problems for a normal VCO. But in a QVCO, where two VCOs are cross-coupled, this asymmetry results in unbalanced coupling. The outputs of the two VCOs are then no longer 90° apart, but a few degrees away from that. The simulation in ADS shows that the phase difference between the two VCOs is 88° when a single inductor is used in each LC-tank.

However, if two identical inductors are used, the two ports of the LC-tank can be symmetric. Therefore the phase difference between VCO-I and VCO-Q remains 90°.

For the same reason, the LC-tank has two poly-silicon capacitors, rather than one. The poly-silicon capacitors are made from two stacked poly-silicon layers, one facing the substrate, and the other facing upwards [43]. The two ports of the capacitor, which connect to the two poly-silicon layers respectively, are obviously not identical. Therefore two poly-silicon capacitors are used so that the two LC-tank ports are symmetric, as shown in Figure 4.15.

Figure 4.16: Layout of an on-chip inductor (MET3, MET4: The 3rd and 4th metal layers away from the substrate in AMS C35 process)

### Oscillating frequency range

In the presented PLL, the output frequency is fixed at 2.624GHz. The oscillating frequency range of the QVCO must be sufficient to offset any process variations of the inductors and capacitors. However, there is a compromise: if the oscillating range is chosen too large, it has a detrimental effect on the noise performance.

Fortunately, the inductance of an on-chip inductor is determined by its geometry [20, 24], which hardly changes with process variation. However, process variation does affect the capacitances, including those of the poly-silicon capacitors, the varactors, and the parasitic capacitors of all devices.

The oscillating range was checked in post-layout simulation in Cadence. The results are shown in Table 4.2. The *Typical Mean* setting is the most commonly used simulation setting. It uses typical and mean values of the process parameters. The *Worst Speed Capacitor* setting, as it is named, is the worst case of slow capacitors, i.e. using the biggest unit capacitances of all devices. On the contrary, the *Worst Power Capacitor* setting means the smallest unit capacitance of all devices, which gives the highest operating frequency and consequently makes the circuit most power-consuming.

| Simulation Setting    | Lowest Frequency $(V_{ctrl} = 0.5V)$ | Highest Frequency $(V_{ctrl} = 2.8V)$ |
|-----------------------|--------------------------------------|---------------------------------------|
| Typical Mean          | 2.44GHz                              | 2.70GHz                               |
| Worst Speed Capacitor | 2.40GHz                              | 2.64GHz                               |
| Worst Power Capacitor | 2.48GHz                              | 2.71GHz                               |

Table 4.2: Frequency range of QVCO

In Table 4.2, the range of the bias voltage  $V_{ctrl}$  was chosen to be between 0.5V and 2.8V, rather than the ground (0V) and the supply voltage (3.3V), because it is difficult for the charge pump to provide an output voltage range from 0 to 3.3V. As shown in Figure 4.3 on page 34, the charge pump is made from two transistors, whose threshold voltages  $(V_T)$  are around 0.6V in AMS C35 process. The output voltage of the charge pump is actually the drain-source voltage  $(V_{ds})$  of the transistors, which can be only slightly lower than  $V_T$  in the smallest case.

The simulation results in Table 4.2 show that the desired frequency, 2.624GHz, is always in the QVCO's operating range. The average voltage-to-frequency gain of the QVCO, $K_{QVCO}$, in *Typical Mean* setting is

$$K_{QVCO} = \frac{2.70GHz - 2.44GHz}{2.8V - 0.5V} = 113MHz/V$$
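The two checks made on Table 4.2 (whether 2.624GHz lies inside each tuning range, and the average gain over the usable $V_{ctrl}$ range) can be reproduced with a short script. This is only an illustrative sketch: the function and variable names are not from the thesis, and the numbers are simply copied from Table 4.2.

```python
# Post-layout tuning ranges from Table 4.2 (GHz), at V_ctrl = 0.5 V and 2.8 V.
TABLE_4_2 = {
    "Typical Mean": (2.44, 2.70),
    "Worst Speed Capacitor": (2.40, 2.64),
    "Worst Power Capacitor": (2.48, 2.71),
}
V_CTRL_MIN, V_CTRL_MAX = 0.5, 2.8   # usable control-voltage range in volts
F_TARGET_GHZ = 2.624                # required PLL output frequency


def average_gain_mhz_per_v(f_low_ghz, f_high_ghz):
    """Average tuning gain over the usable control-voltage range, in MHz/V."""
    return (f_high_ghz - f_low_ghz) * 1e3 / (V_CTRL_MAX - V_CTRL_MIN)


def covers_target(f_low_ghz, f_high_ghz, target_ghz=F_TARGET_GHZ):
    """True if the target frequency lies inside the tuning range."""
    return f_low_ghz <= target_ghz <= f_high_ghz


for corner, (f_low, f_high) in TABLE_4_2.items():
    gain = average_gain_mhz_per_v(f_low, f_high)
    print(f"{corner}: {gain:.0f} MHz/V, covers target: {covers_target(f_low, f_high)}")
```

The narrowest margin is in the *Worst Speed Capacitor* corner, where the upper limit of 2.64GHz is only slightly above the target.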

#### 4.4.2 Design of the VCO for 2.6GS/s DAQ

As mentioned in Sub-Section 4.1.3 on page 32 and Section 7.7 on page 102, another 2.624GSample/s DAQ was also implemented, which needed a 2.624GHz clock source without quadrature output. The PLL that generates this clock signal has almost the same sub-modules as the PLL with quadrature output, except that a normal negative-R VCO replaces the QVCO.

In Sub-Section 4.4.1 on page 51, it is mentioned that a single-inductor LC-tank can be used for normal VCOs. However, this 2.624GHz VCO raises a further concern: the current limit of the metal wires in the inductor, as illustrated below.

For an LC-tank with inductance L and capacitance C, the oscillating frequency $f_{osc}$ is

$$f_{osc} = \frac{1}{2\pi\sqrt{LC}} \tag{4.10}$$

The energy stored in the LC-tank  $E_t$  is

$$E_t = \frac{1}{2}CV_{p-p}^2 = \frac{1}{2}LI_{p-p}^2$$

where  $V_{p-p}$  is the peak-to-peak voltage of the tank capacitor, and  $I_{p-p}$  is the peak-to-peak current of the tank inductor [24]. Therefore

$$CV_{p-p}^2 = LI_{p-p}^2 \tag{4.11}$$

According to Equations (4.10) and (4.11),

$$I_{p-p} = \frac{V_{p-p}}{2\pi f_{osc} L}$$

Assuming the current is a sine wave, the Root-Mean-Square (RMS) value of the current $I_{rms}$, i.e. the equivalent DC current, can be estimated as

$$I_{rms} \approx \frac{V_{p-p}}{2\sqrt{2}\pi f_{osc}L}$$

For the presented VCO<sup>2</sup>,  $V_{p-p} \approx 2.3V$  and  $f_{osc} = 2.624GHz$ . So

$$I_{rms}L \approx 99mA \cdot nH \tag{4.12}$$
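As a numerical check of Equation (4.12), the expression for $I_{rms}$ above can be evaluated directly with the quoted values of $V_{p-p}$ and $f_{osc}$. The sketch below simply plugs numbers into the thesis's own formula; the function name is illustrative.

```python
import math


def irms_times_l(v_pp, f_osc):
    """Return I_rms * L in A*H, using I_rms ~ V_pp / (2*sqrt(2)*pi*f_osc*L)."""
    return v_pp / (2.0 * math.sqrt(2.0) * math.pi * f_osc)


product = irms_times_l(v_pp=2.3, f_osc=2.624e9)
product_ma_nh = product * 1e3 * 1e9   # convert A*H to mA*nH
print(f"I_rms * L = {product_ma_nh:.1f} mA*nH")
```

The result is slightly below 99 mA·nH, which is the rounded requirement used in the rest of this sub-section.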

On the other hand, there is a current density limit to the metal wires in the chip (and all other materials in the chip as well). An RMS current density larger than that limit may cause over-heating and possibly damage the circuit.

In AMS C35 process, the inductors are pre-defined and fixed. Therefore the width of the spiral metal wire in the inductor determines its maximum current. Unfortunately, none of the available inductors in AMS C35 process has a product of inductance and maximum current that reaches $99mA \cdot nH$. The maximum product is $86mA \cdot nH$, which is for a 1.4nH inductor [35, 43].

To overcome this problem, two inductors, rather than one, are used in the LC-tank. The structure of this VCO, as shown in Figure 4.17, is very similar to VCO-I and VCO-Q in the previous Sub-Section, but without the coupling transistors. The two inductors in the VCO are both 2.6nH, with a current limit of 24mA. The series connection of the two inductors makes the overall current-inductance product $125mA \cdot nH$, which meets the requirement of Equation (4.12).
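The comparison between one library inductor and two in series can be written out as follows. This is a simplified sketch: it assumes the series inductances simply add (any mutual coupling between the two spirals is ignored) and that the same current flows through both, so the 24mA limit of a single inductor still applies. The function name is illustrative.

```python
REQUIRED_MA_NH = 99.0   # requirement from Equation (4.12)


def series_product_ma_nh(inductance_nh, i_max_ma, count):
    """Current-inductance product of `count` identical inductors in series.

    The inductances add, while the current limit stays that of one inductor.
    """
    return count * inductance_nh * i_max_ma


best_single = 86.0   # mA*nH, best single inductor quoted in the text (1.4 nH)
two_in_series = series_product_ma_nh(2.6, 24.0, count=2)

print(f"single inductor: {best_single:.0f} mA*nH, meets requirement: {best_single >= REQUIRED_MA_NH}")
print(f"two in series:   {two_in_series:.0f} mA*nH, meets requirement: {two_in_series >= REQUIRED_MA_NH}")
```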

The oscillating frequency range of the VCO was checked in post-layout simulation, and is presented in Table 4.3. The desired frequency, 2.624*GHz*, is always within the operating range even in the boundary parameter settings, i.e. *Worst Speed Capacitor* and *Worst Power Capacitor*.

The average voltage-to-frequency gain of the VCO, $K_{VCO}$, in *Typical Mean* setting is

$$K_{VCO} = \frac{2.81GHz - 2.47GHz}{2.8V - 0.5V} = 148MHz/V$$

<sup>2</sup>Similar to the available range of $V_{ctrl}$, mentioned in Sub-Section 4.4.1 on page 53, the lower and upper peak voltages cannot be 0V and 3.3V, but are slightly higher than 0V and lower than 3.3V. Therefore $V_{p-p}$ is set to 2.3V, giving 0.5V margins to both boundaries.



Figure 4.17: VCO for the 2.624GSample/s DAQ  $L_1=2.6nH,~C_p=0.13pF,~C_{var}=0.33pF$  (maximum, with 57% tuning range); Transistor sizes (W/L): MN1,2:  $45/0.35\mu m$ ; MP1,2:  $120/0.35\mu m$ 

| Simulation Setting    | Lowest Frequency $(V_{ctrl} = 0.5V)$ | Highest Frequency $(V_{ctrl} = 2.8V)$ |
|-----------------------|--------------------------------------|---------------------------------------|
| Typical Mean          | 2.47GHz                              | 2.81GHz                               |
| Worst Speed Capacitor | 2.40GHz                              | 2.74GHz                               |
| Worst Power Capacitor | 2.53GHz                              | 2.83GHz                               |

Table 4.3: Frequency range of the VCO for 2.6GS/s DAQ

## 4.5 Loop filter

The function of the loop filter in a PLL is to integrate the output pulses from the PFD and its charge pump to form a stable DC voltage, which can be used to control the VCO.

To reduce the reference spurs on the VCO (see Sub-Section 3.1.5 on page 19 for details), the fundamental frequency 82MHz must be suppressed by the loop filter. Therefore the cut-off frequency ( $f_{co}$ ) of the low-pass loop filter should be as low as possible. The further  $f_{co}$  is away from 82MHz, the better the suppression effect is.

However, a loop filter with a low $f_{co}$ is not good at eliminating the phase noise from the VCO [20]. A solution to both the phase noise issue and the spur issue is a high-order loop filter, which is able to provide a high attenuation at 82MHz while $f_{co}$ need not be very far away from 82MHz. But a high-order filter can potentially make the PLL unstable, and is therefore complicated to design.

The loop filter in the presented PLL is a widely-used passive 3rd-order low-pass filter, as shown in Figure 4.18. The filter can be divided into two sub-filters. The first one consists of  $C_1$ ,  $R_1$  and  $C_2$ , while  $R_2$  and  $C_3$  compose the second one.



Figure 4.18: The 3rd-order loop filter in the presented PLL

The first sub-filter is the main filter which implements the function of a loop filter, i.e. converts the pulses into a stable DC voltage. $C_2$ is the biggest capacitor in the filter, which stores most of the electrical charge to maintain the control voltage of the VCO $(V_{ctrl})$. $R_1$ is connected in series with $C_2$. It provides an instant voltage response to the current from the charge pump. $C_1$ also stores some charge to maintain $V_{ctrl}$, but its main function is to smooth the ripples generated by the instant response of $R_1$.

The second sub-filter,  $R_2$  and  $C_3$ , is a 1st-order RC-filter with a much higher cut-off frequency than the first sub-filter. The purpose of inserting this sub-filter is to provide more attenuation to the spur frequency, in addition to that naturally provided by the first sub-filter.
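The effect of the second sub-filter can be illustrated by comparing the transimpedance of the first sub-filter alone with that of the complete 3rd-order network. The sketch below assumes the usual passive topology implied by the description ($C_1$ in parallel with the series branch $R_1$, $C_2$, followed by $R_2$ and $C_3$, with $V_{ctrl}$ taken across $C_3$). The component values are hypothetical placeholders chosen only to make the example runnable; the values actually used are those given in Figure 4.18.

```python
import math

# Hypothetical component values (NOT the values of Figure 4.18).
C1, R1, C2 = 10e-12, 5e3, 100e-12   # first sub-filter
R2, C3 = 5e3, 3e-12                 # second sub-filter


def z_first_subfilter(f):
    """Impedance of C1 in parallel with the series branch R1 + C2."""
    s = 2j * math.pi * f
    y = s * C1 + 1.0 / (R1 + 1.0 / (s * C2))
    return 1.0 / y


def z_full_filter(f):
    """Transimpedance V_ctrl / I_cp of the complete 3rd-order filter.

    The R2-C3 branch loads the first sub-filter, and V_ctrl is taken
    across C3.
    """
    s = 2j * math.pi * f
    y_first = s * C1 + 1.0 / (R1 + 1.0 / (s * C2))
    y_branch = 1.0 / (R2 + 1.0 / (s * C3))
    v_node = 1.0 / (y_first + y_branch)   # node voltage per ampere of input
    return v_node / (1.0 + s * R2 * C3)   # RC divider onto C3


def extra_attenuation_db(f):
    """Additional attenuation contributed by R2 and C3 at frequency f."""
    return 20.0 * math.log10(abs(z_first_subfilter(f)) / abs(z_full_filter(f)))


print(f"Extra attenuation at 1 MHz:  {extra_attenuation_db(1e6):.2f} dB")
print(f"Extra attenuation at 82 MHz: {extra_attenuation_db(82e6):.2f} dB")
```

With these placeholder values the second sub-filter has little effect near the loop bandwidth but adds noticeable attenuation at the 82MHz spur frequency, which is the behaviour described above.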

The component parameters, i.e. the capacitances and the resistances, were optimised by the Design Guide program in ADS, based on the PLL with QVCO. In this design program, the acceptable ranges of the parameters are set according to the device availability in AMS C35 process and other practical issues such as the chip area. As for the gain of the PFD and charge pump $(G_{PDCP})$, and the gain of the QVCO $(K_{QVCO})$, the average values are applied. The desired attenuation at the spur frequency is set to more than 50dB, and the unit-gain frequency<sup>3</sup> is set to 4MHz, approximately 1/20 of 82MHz.

The optimising results are the parameter values shown in Figure 4.18. Although these values are optimised for the PLL with QVCO, they are also applicable to the PLL for the 2.6GS/s DAQ, i.e. the one with a normal VCO<sup>4</sup>. Table 4.4 gives the simulation results of the two PLLs, including the bandwidth (unit-gain frequency), stability (phase margin), and attenuation at the spur frequency.

As shown in the table, the unit-gain frequencies are around 4MHz, and the phase margins are more than $30^{\circ}$ even in the extreme conditions. The extreme conditions are where the charge pump has its maximum output currents. The simulations also show that the attenuation at the spur frequency is always more than 50dB.

<sup>3</sup>As the primary concern of the Design Guide for PLL is stability, it focuses on the unit-gain frequency (0dB-gain point) rather than the cut-off frequency (-3dB point). The unit-gain frequency here is that of the whole PLL in terms of phase signals. It can be considered as the overall effective bandwidth of the PLL, and is highly dependent on the bandwidth of the loop filter.

<sup>4</sup>The details of this PLL with a normal VCO are discussed in Sub-Section 4.1.3 on page 32, and Sub-Section 4.4.2 on page 53.

| Simulation setting                                                  | Unit-gain frequency of PLL (MHz) | Phase margin at unit-gain frequency | Attenuation at spur frequency (dB) |
|---------------------------------------------------------------------|----------------------------------|-------------------------------------|------------------------------------|
| PLL with QVCO, $G_{PDCP} = 0.3mA/2\pi$                              | 3.252                            | 43.7°                               | 55.49                              |
| PLL with QVCO, $G_{PDCP} = 0.4mA/2\pi$ (extreme condition)          | 4.072                            | 39.1°                               | 52.99                              |
| PLL with normal VCO, $G_{PDCP} = 0.3mA/2\pi$                        | 3.981                            | 40.0°                               | 53.14                              |
| PLL with normal VCO, $G_{PDCP} = 0.4mA/2\pi$ (extreme condition)    | 5.012                            | 33.9°                               | 50.64                              |

Table 4.4: Characteristics of the 3rd-order filter in the presented PLLs (simulation results)

## 4.6 Simulation of clock synthesiser



Figure 4.19: System-level simulation of the PLL with QVCO (FrefMHz: Reference frequency in MHz; FoscGHz: Oscillating frequency of the QVCO in GHz)

Figure 4.19 shows the system-level simulation of the presented PLL with QVCO in ADS. The PFD, VCO, and FD used in the simulation are system-level models with the parameters extracted from the post-layout simulation of these subcircuits. The loop filter in the simulation is made up of the corresponding components from AMS C35 library.

Figure 4.19 illustrates the transient effect when the reference input jumps from 81MHz to 82MHz. After the sudden change of the reference, the oscillating frequency of the QVCO gradually rises from 2.592GHz ( $81MHz \times 32$ ) to 2.624GHz ( $82MHz \times 32$ ). The lock-in time is approximately  $1\mu s$ .

The post-layout simulation of the PLL with QVCO has been performed in Cadence. Figure 4.20 shows how the control voltage of the QVCO  $(V_{ctrl})$  is stabilized after power-up. It takes about 1.2 $\mu s$  to lock the QVCO on 2.624GHz.



Figure 4.20:  $V_{ctrl}$  (control voltage of the QVCO) in post-layout simulation in Cadence

The power consumption of the PLL with QVCO is high: 56mA total current from the 3.3V power supply, i.e. 0.18W. The power is mainly dissipated in those circuits operating at 2.624GHz, including the QVCO, the QVCO's output buffers, and the FD. As mentioned in Section 4.1, it is difficult to implement gigahertz applications in AMS C35 process, whose NMOS transistors have $f_T < 30GHz$ and $f_{osc} < 50GHz$, and the PMOS ones are even worse. As a result, high bias-currents are usually applied in those circuits operating at 2.624GHz, so that enough gain can be achieved. If a more advanced process technology were used, i.e. one with higher $f_T$ and $f_{osc}$ for transistors, the power consumption would be reduced.

The simulation results for the PLL with a normal VCO are similar to those for the one with QVCO. Its lock-in time is also approximately $1\mu s$, but its power consumption is much less. The normal VCO has less than half of the components, and so its power consumption is less than half of that of the QVCO. Besides, the QVCO has four output ports (0°, 90°, 180°, 270°), while the normal VCO has only two. The number of output buffers required by the normal VCO is therefore halved. Moreover, as described later in Chapter 5, the pulse generator for the 10.5GS/s DAQ and that for the 2.6GS/s DAQ are different. The 2.6GS/s DAQ has a smaller loading effect than the 10.5GS/s DAQ, so the VCO which drives the 2.6GS/s DAQ requires smaller output buffers, which consume less power. According to the post-layout simulation results, the total power used by the PLL with normal VCO is 60mW (18mA current from the 3.3V power supply).
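The two power figures quoted in this section follow directly from the supply current and the 3.3V supply voltage. The short calculation below reproduces them and the relative saving; the names are illustrative only.

```python
SUPPLY_V = 3.3


def power_mw(current_ma, supply_v=SUPPLY_V):
    """DC power drawn from the supply, in mW, for a current given in mA."""
    return current_ma * supply_v


p_qvco_pll = power_mw(56.0)   # PLL with QVCO
p_vco_pll = power_mw(18.0)    # PLL with normal VCO
saving = 1.0 - p_vco_pll / p_qvco_pll

print(f"PLL with QVCO:       {p_qvco_pll:.1f} mW")
print(f"PLL with normal VCO: {p_vco_pll:.1f} mW")
print(f"Relative saving:     {saving:.0%}")
```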

## 4.7 Summary

This chapter presented the design details of the clock source of the DAQ system. As 10GHz is beyond the performance that AMS C35 process can deliver, direct synthesis of a clock of more than 10GHz is not achievable. By comparing two possible solutions to this issue, the idea of "PLL with QVCO" was selected, and so a 2.624GHz PLL with a QVCO was designed.

The PLL's output frequency (2.624GHz) was 32 times the 82MHz reference input. The oscillator inside the PLL was a QVCO, which was effectively 2 cross-coupled VCOs. The coupling fixed the phase between the outputs of the two VCOs at 90°. Therefore the overall output phases were 0°, 90°, 180°, and 270°. The effective clock frequency was 4 times the actual frequency, i.e. 10.496GHz (or 10.24GHz).

To implement this PLL, an optimising method for fast CML Frequency Divider (FD) design was developed in Sub-Section 4.3.2. It was based on a piecewise model of transistors in order to simplify the optimising analysis and calculation. With this method, an optimised FD design in AMS C35 process achieved an operating frequency of 5.5GHz on average. This is the fastest one reported so far in $0.35\mu m$ CMOS processes.

# Chapter 5

# Pulse Generator

With the presented clock synthesiser, the PLL with QVCO described in Chapter 4, it is possible to provide the pulse signals to control the high-speed DAQ. This chapter presents the circuit which generates these pulse signals.

## 5.1 System requirement of the pulse generator

The main strategy for sampling the 82MHz reflected probe laser light is sub-sampling and repetitive sampling. To achieve the required 10GHz sampling rate, 128 samples, termed the 128 *Target Samples*, are taken evenly over the whole period of the input signal. The equivalent sampling rate is therefore

$$82MHz \times 128 = 10.5GSample/s$$
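For reference, the exact equivalent sampling rate and the corresponding spacing between adjacent Target Samples, $\frac{1}{128}T$, can be computed as below. The variable names are illustrative.

```python
from fractions import Fraction

F_REF_HZ = 82_000_000        # repetition rate of the input signal
N_TARGET_SAMPLES = 128       # Target Samples per input period

equivalent_rate_hz = F_REF_HZ * N_TARGET_SAMPLES
period_s = Fraction(1, F_REF_HZ)          # input period T
step_s = period_s / N_TARGET_SAMPLES      # spacing T/128

print(f"Equivalent rate: {equivalent_rate_hz / 1e9:.3f} GSample/s")
print(f"Sample spacing T/128: {float(step_s) * 1e12:.2f} ps")
```

The spacing of roughly 95ps is also the scale of the delay and the sampling window that the pulse generator has to provide.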

Each *Target Sample* is obtained by repeatedly sampling the input with a pulse signal at exactly 82MHz. The electrical charge from the repetitive sub-sampling is stored on a holding capacitor so that a stable voltage can be presented for a slow-speed ADC to digitise the sample. After a *Target Sample* has been digitised, a delay of $\frac{1}{128}$ of the signal period is inserted so that the next *Target Sample* can be obtained. This process has to be performed 128 times to acquire all *Target Samples*. Figure 5.1 shows the flow chart of the whole sampling procedure. Details of this sampling strategy are presented in Chapter 7 on page 93.



Figure 5.1: Brief sampling procedure of the presented DAQ system (The T/128 delay in the diagram is not in proportion)
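The procedure of Figure 5.1 can be mimicked in software as a behavioural sketch. The model below is a strong simplification and not the circuit itself: the accumulation on the holding capacitor is replaced by a plain arithmetic mean, the input waveform and the Gaussian noise level are hypothetical, and all names are illustrative.

```python
import math
import random

N_TARGETS = 128


def example_waveform(phase):
    """Hypothetical periodic input; `phase` is the fraction of one period."""
    return math.sin(2 * math.pi * phase) + 0.3 * math.sin(2 * math.pi * 10 * phase)


def acquire(waveform, n_repeats, noise_sigma, rng):
    """Return the 128 Target Samples of one period of `waveform`.

    For each Target Sample the input is sampled `n_repeats` times at the
    same position within the period, and the noisy samples are averaged.
    Stepping k models the T/128 delay inserted between Target Samples.
    """
    samples = []
    for k in range(N_TARGETS):
        phase = k / N_TARGETS
        total = 0.0
        for _ in range(n_repeats):
            total += waveform(phase) + rng.gauss(0.0, noise_sigma)
        samples.append(total / n_repeats)
    return samples


rng = random.Random(0)
result = acquire(example_waveform, n_repeats=400, noise_sigma=0.1, rng=rng)
worst = max(abs(result[k] - example_waveform(k / N_TARGETS)) for k in range(N_TARGETS))
print(f"Worst-case error after averaging: {worst:.4f}")
```

Increasing `n_repeats` reduces the residual error, at the cost of a proportionally longer acquisition.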

As a result, the pulse generator needs to provide control pulses at 82MHz. These pulses should be synchronised with the input signal, which can be easily achieved by the PLL-based clock synthesiser. The pulses also need to be short enough that frequency information of up to several gigahertz is not lost during sampling. As the DAQ gets 128 samples for one full period, the pulse width should be of a similar magnitude to $\frac{1}{128}T$, where T is the input signal period<sup>1</sup>. Moreover, the pulse generator should be flexible enough to insert a $\frac{1}{128}T$ delay.

## 5.2 Architecture and mechanism of the pulse generator

#### 5.2.1 Timing of control pulses for DAQ

As mentioned above, the required inserted delay is $\frac{1}{128}T$, and the sampling pulse width needs to be similar to that amount as well. On the other hand, the clock synthesiser presented in Chapter 4 generates a 2.624GHz clock, i.e. with a period of $\frac{1}{32}T$. This clock has 4 evenly-divided phase outputs, $0^{\circ}$, $90^{\circ}$, $180^{\circ}$, and $270^{\circ}$, and these 4-phase outputs can be exploited to provide the required $\frac{1}{128}T$ delay and pulse width.



Figure 5.2: Timing of control pulse signals for 10.5GS/s DAQ

Figure 5.2 presents the timing of the control pulses for the 10.5GSample/s DAQ. As shown in the figure, all pulse signals have a pulse width of T/32, which is the same as one period of the 2.624GHz clock provided by the PLL. Signals Ap/An and Bp/Bn are two pairs of differential pulses driving the sample-and-hold amplifiers in the DAQ. The situation Ap>An is defined as the active status of the differential pulse pair Ap/An, and a similar definition applies to Bp/Bn as well. The activation of Bp/Bn is $\frac{3}{128}T$ later than that of Ap/An, which is equivalent to a 270° phase delay of the PLL clocks. If the rising/falling time of the signals is ignored, there is a short period of time, $\frac{1}{128}T$, when both Ap/An and Bp/Bn are active. This is where the sampling of the DAQ's input is performed. With such a short sampling time, the high-frequency information of the input is retained. Signal Cn is an auxiliary control pulse required by the DAQ, which transfers the sampled charge into the holding capacitor.

<sup>1</sup>Detailed discussion about how the pulse width affects the frequency-information loss can be found in Sub-Section 8.2.1 on page 115.

All five pulse signals, Ap, An, Bp, Bn, and Cn, have the same period of T in most cases. The only exception is when the DAQ changes the sampling position to the next *Target Sample*, in which case a delay of $\frac{1}{128}T$ should be inserted into each of the five signals simultaneously (not shown in Figure 5.2).

Details of the pulse timing are presented in Section 7.7 on page 102.
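The length of the sampling window follows from the pulse width and the delay between the two pulse pairs. The sketch below treats both pulses as ideal rectangles (rise and fall times ignored, as in the text) and works in exact fractions of T; the helper name `overlap` is illustrative.

```python
from fractions import Fraction

PULSE_WIDTH = Fraction(1, 32)   # width of every pulse, in units of T
DELAY_B = Fraction(3, 128)      # Bp/Bn lags Ap/An by 270 degrees of the PLL clock


def overlap(start_a, start_b, width):
    """Length of the interval during which both pulses are active."""
    begin = max(start_a, start_b)
    end = min(start_a + width, start_b + width)
    return max(end - begin, Fraction(0))


sampling_window = overlap(Fraction(0), DELAY_B, PULSE_WIDTH)
print(f"Sampling window: {sampling_window} T")
```

The window equals the pulse width minus the delay, so it is set entirely by the quadrature phase spacing of the PLL clocks rather than by any analogue delay element.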

#### 5.2.2 Pulse generator architecture

Figure 5.3 is the architecture of the pulse generator, which provides the control pulses in Figure 5.2.



Figure 5.3: Pulse Generator

The pulse generator provides the control signals (Ap, An, Bp, Bn, and Cn) for the Sub-Sampling SHA, which is the core module in the designed DAQ system. The pulse generator is based on the clock synthesiser presented in Chapter 4, i.e. the $\times 32$ PLL with QVCO. The clock source provides the 2.624GHz output (32 times the fundamental frequency 82MHz) at 4 different phases, $0^{\circ}$, $90^{\circ}$, $180^{\circ}$, and $270^{\circ}$. Therefore the highest harmonic presented by the clock is 128 times ($32 \times 4$) the fundamental frequency.

The switch box and the 32/33 Frequency Divider (32/33 FD) are used to generate the $\frac{1}{128}T$ delay. The outputs of the switch box, $\phi_0$, $\phi_1$, $\phi_2$, and $\phi_3$, are a reshuffle of the PLL outputs. $\phi_0$ can be any of the 4 input phases, depending on the address lines $A_0$ and $A_1$. $\phi_1$ is always 90° later than $\phi_0$, and likewise $\phi_2$ is 90° later than $\phi_1$, and $\phi_3$ than $\phi_2$. The 32/33 FD operates in ÷32 mode in most cases, which generates an 82MHz signal synchronised to the reference signal. When it switches to ÷33 mode, a delay of $\frac{1}{32}T$ is generated. The required $\frac{1}{128}T$ delay is produced by the switch box and the 32/33 FD working together.

The function of the low-frequency dividers, i.e. the $\div 2N$ frequency divider ($\div 2N$ FD) and the $\div 2$ frequency divider ($\div 2$ FD) in the lower-left corner of Figure 5.3, is to count the repetitive sampling operations. When the required number of samplings has been reached, the address lines $A_0$ and $A_1$ change so that the configuration of the switch box changes. Edge Detector 2 converts the falling edge of $A_1$ into a short pulse. This pulse switches the 32/33 FD into $\div 33$ mode for just 33 clock cycles, so that a $\frac{1}{32}T$ delay is generated.

The output pulses are generated by a Digital Delay Unit (DDU) and the edge detector before it (Edge Detector 1 in Figure 5.3). Edge Detector 1 converts a rising edge into a pulse with a width of $1/(32f_0)$. This pulse is fed to the DDU so that the control signals shown in Figure 5.2 are generated.

#### 5.2.3 Mechanism of pulse generator

The mechanism of how the presented pulse generator works can be considered as two related processes, as illustrated in Figure 5.4(a) and 5.4(b).



(a) Signal path of pulse generation



Figure 5.4: Control mechanism of the presented pulse generator

Figure 5.4(a) explains how the control pulses for the Sub-Sampling SHA are generated. This process has been briefly described in the previous sub-section.

The 2.624GHz clock from the PLL is divided by K. K can be either 32 or 33, which is implemented by the 32/33 FD. The divided signal ($\sim 82MHz$ clock) is converted into a T/32-wide pulse ($\sim 82MHz$ pulse) by Edge Detector 1. Then the pulse is fed into the DDU to generate the control signals required by the Sub-Sampling SHA. The DDU is a sequential digital circuit, which is driven by the trigger clock from the Switch Box. The phase of the trigger clock is ph, which is determined by the address lines $A_1$ and $A_0$ in Figure 5.3<sup>2</sup>.

The other process, as shown in Figure 5.4(b), is operating simultaneously with the first process. This process modifies the parameter K and ph accordingly so that the required  $\frac{1}{128}T$  delay can be achieved. It is implemented by the low-frequency dividers, the address lines  $A_1$  and  $A_0$ , and Edge Detector 2.

The  $\frac{1}{128}T$  delay is equal to 90° phase of the 2.624GHz clock, whose full period is  $\frac{1}{32}T$ . Therefore the required time delay can be achieved by inserting a 90° phase delay to the clock.

The low-frequency dividers, i.e. the $\div 2N$ and $\div 2$ FDs in Figure 5.3, count the 82MHz synchronised output of the PLL. Each count means one set of control pulses (Ap, An, Bp, Bn and Cn) has been sent to the Sub-Sampling SHA, and a sampling operation has consequently been completed. The parameter N in Figure 5.4(b) is the same as that in Figure 5.1 on page 64, i.e. each Target Sample needs to be sampled repetitively N times. As illustrated in Figure 5.4(b), after every N counts the parameter ph increases by 90°. In the real circuit, this corresponds to a change on $A_1$ and $A_0$, which changes the configuration of the Switch Box.

When ph changes from 270° to 360° (0°), it effectively provides a 270° phase lead rather than a 90° phase delay. To correct this, an extra 360° delay is delivered by the 32/33 FD with the setting K=33. In the real circuit, this is implemented by Edge Detector 2, which detects the falling edge of $A_1$ and then sends a pulse to the 32/33 FD. It sends a pulse rather than a stable enable signal, so that K returns to 32 in the next sampling operation. This ensures that the control pulses remain synchronised with the reference input.

<sup>2</sup>The detail of the relationship between ph and the address lines is described in Section 5.3 on the following page.
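The combined action of ph and K described in this sub-section can be checked with a small behavioural model that lists the spacing between consecutive control pulses in units of $\frac{1}{128}T$. The model is an idealisation (the retiming by the DDU is assumed perfect, and all circuit delays are ignored), and the function and constant names are illustrative.

```python
QUARTERS_PER_CLOCK = 4   # one 2.624 GHz clock period equals 4 x (T/128)


def pulse_intervals(n_repeats, n_steps):
    """Spacing between consecutive control pulses, in units of T/128.

    ph counts 90-degree steps (0..3), as selected by the address lines A1A0.
    The divider ratio K is 32, except for the one cycle in which ph wraps
    from 3 back to 0, where K = 33 adds a full clock period.
    """
    intervals = []
    ph = 0
    for _ in range(n_steps):
        # N - 1 ordinary repetitions at a fixed ph with K = 32
        intervals.extend([32 * QUARTERS_PER_CLOCK] * (n_repeats - 1))
        # move on to the next Target Sample
        next_ph = (ph + 1) % 4
        k = 33 if next_ph == 0 else 32
        intervals.append(k * QUARTERS_PER_CLOCK + (next_ph - ph))
        ph = next_ph
    return intervals


steps = pulse_intervals(n_repeats=3, n_steps=8)
print(steps)
```

Every change of Target Sample produces a spacing of 129 units, i.e. $T + \frac{1}{128}T$, including the wrap from 270° to 0°, where the extra clock period from K=33 compensates the 270° phase lead.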

## 5.3 Switch box

The switch box generates the four Relative-Phase Clocks ($\phi_0$, $\phi_1$, $\phi_2$ and $\phi_3$ in Figure 5.3 on page 66) from the four Absolute-Phase Clocks, i.e. the four-phase outputs of the PLL, which are termed CK0, CK90, CK180 and CK270. $\phi_1$ is always 90° later than $\phi_0$, as is $\phi_2$ relative to $\phi_1$, and $\phi_3$ relative to $\phi_2$. However, the source of $\phi_0$ can be any one of the four Absolute-Phase Clocks. $\phi_0 \sim \phi_3$ are the clock sources of the DDU, which is presented in Section 5.4. As the absolute phase of $\phi_0$ has four options (any one of CK0, CK90, CK180, or CK270), so do the output pulses of the DDU.

Table 5.1 shows the sources of  $\phi_0 \sim \phi_3$ . There are four different options, which are presented as *Clock Types* (Type 0, 1, 2 and 3). The circuit diagram is shown in Figure 5.5. The Clock Types are selected by the address lines  $A_1$  and  $A_0$ . The commonly-used CMOS transmission gates [44] are applied as the switches in this circuit.

| Clock  | $A_1A_0$ | $\phi_0$ | $\phi_1$ | $\phi_2$ | $\phi_3$ |
|--------|----------|----------|----------|----------|----------|
| Type 0 | 00       | CK0      | CK90     | CK180    | CK270    |
| Type 1 | 01       | CK90     | CK180    | CK270    | CK0      |
| Type 2 | 10       | CK180    | CK270    | CK0      | CK90     |
| Type 3 | 11       | CK270    | CK0      | CK90     | CK180    |

Table 5.1: Clock sources of Relative-Phase Clocks
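Table 5.1 is a cyclic rotation of the four Absolute-Phase Clocks by the Clock Type number, which can be expressed compactly as follows. This is a logical model of the selection only, not of the transmission-gate circuit; the function name is illustrative.

```python
ABSOLUTE_CLOCKS = ("CK0", "CK90", "CK180", "CK270")


def relative_phase_clocks(a1, a0):
    """Return the sources of (phi0, phi1, phi2, phi3) for address lines A1, A0."""
    clock_type = 2 * a1 + a0
    return tuple(ABSOLUTE_CLOCKS[(clock_type + i) % 4] for i in range(4))


for a1 in (0, 1):
    for a0 in (0, 1):
        print(f"A1A0={a1}{a0}:", relative_phase_clocks(a1, a0))
```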

The layout of Relative-Phase Clocks (and some part of the layout of Absolute-Phase Clocks) needs to be routed and buffered as identically as possible so that $\phi_0 \sim \phi_3$ keep identical phase differences. However, asymmetry is inevitable in the layout of the Switch Box. This results in slight differences in the output pulses of the DDU when the Clock Types are different. This effect is discussed in detail in Section 8.3 on page 120.





Figure 5.5: Circuit diagram of Switch Box

## 5.4 Digital Delay Unit and Edge Detector 1

The output pulses are generated by a Digital Delay Unit (DDU) and Edge Detector 1, whose circuit diagram is shown in Figure 5.6. This figure is only a sketch. Actually, these digital circuits are all differential logic devices, i.e. every signal and clock has an inverse counterpart. For example, Flip-Flop D1 is driven by  $\phi_0$  in the figure, but in reality, D1 is a differential D-Flip-Flop driven by a pair of clocks,  $\phi_0$  as the positive, and  $\phi_2$  as the negative. Ap and An are the inverse counterparts to each other, and so are Bp and Bn, Cpo and Cno,  $\phi_1$  and  $\phi_3$ .



Figure 5.6: Sketch of Edge Detector 1 and Digital Delay Unit

As shown in Figure 5.2 on page 65, the voltage levels of Cn are different to those of Ap, An, Bp and Bn. Consequently a differential-to-single-ended amplifier is applied to transfer Cpo/Cno, which have the same voltage level as Ap/An and Bp/Bn, to the required signal Cn.

#### Synchronising the output of 32/33 FD

The 32/33 FD is an asynchronous circuit, so its output phase can be any value. It is necessary to synchronise the output of the FD with the clock signals $\phi_0 \sim \phi_3$, otherwise the output pulses of the DDU may be triggered in a wrong order, although their relative phases are still correct. The synchronisation is achieved by the latches L1 and L2 and the MUX (multiplexer).

To explain more clearly, the timing diagrams without and with synchronisation are shown in figures 5.7 and 5.8.

In Figure 5.7, it is assumed that L1, L2 and MUX are removed, and the output of the FD is directly sampled by D-Flip-Flop D1. For example, if the rising edge of the FD's output comes between the rising edges of CK90 and CK180, the output of D1 would be like those shown in Figure 5.7.



Figure 5.7: Edge detection without synchronising

In this condition, the rising edge of D1 at  $A_1A_0=10$  would be  $270^\circ$  earlier than that at  $A_1A_0=01$ , rather than  $90^\circ$  delay as expected. The output pulses of the DDU have a fixed delay time with the output of D1. Therefore when the address lines are changed from 01 to 10, there would not be a  $90^\circ$  delay  $(\frac{1}{128}T)$ , but a  $270^\circ$  lead.
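The unsynchronised case can be reproduced with a simple model in which D1 captures the FD output at the first rising edge of $\phi_0$ that follows the FD edge. The sketch is idealised (no setup or hold time, no propagation delay), and the FD edge position of 135° is a hypothetical value chosen to lie between the rising edges of CK90 and CK180.

```python
def first_rising_edge_after(event_deg, clock_phase_deg):
    """Phase of the first rising edge of a clock occurring after an event.

    Phases are in degrees of the 2.624 GHz clock. The clock has rising
    edges at clock_phase_deg + 360*k for every integer k.
    """
    k = (event_deg - clock_phase_deg) // 360 + 1
    return clock_phase_deg + 360 * k


FD_EDGE_DEG = 135   # hypothetical: between the rising edges of CK90 and CK180

# phi0 = CK0, CK90, CK180, CK270 for A1A0 = 00, 01, 10, 11
d1_edges = [first_rising_edge_after(FD_EDGE_DEG, 90 * clock_type)
            for clock_type in range(4)]
print(d1_edges)
print("step from 01 to 10:", d1_edges[2] - d1_edges[1], "degrees")
```

In this model the step from $A_1A_0=01$ to $10$ moves the D1 edge 270° earlier instead of 90° later, which is the error that the synchronising latches are there to prevent.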

On the other hand, the existence of L1, L2, and MUX ensures the pulses are triggered at the correct phases, as shown in Figure 5.8. When the address lines are 00 or 01, the output from L1 is selected by MUX. When the address lines are 10 or 11, the output from L2 is selected. As the output from L2 is always later than that from L1, the required 90° delay is guaranteed.



Figure 5.8: Edge detection with synchronising

#### Pulse generation

Figure 5.9 shows the generation of the output pulses in Edge Detector 1 and the DDU. Flip-Flops D1 and D2 and the AND gate convert a rising edge into a $\frac{1}{32}T$-wide pulse. This pulse is fed to the DDU so that the control signals shown in Figure 5.2 are generated. D3 is driven by $\phi_0$, while D4 is driven by $\phi_3$; therefore Bp/Bn is 270° later than Ap/An, i.e. $\frac{3}{128}T$. Similarly, Cpo/Cno is 540° later than Bp/Bn ($\frac{6}{128}T$). If the delay of the amplifier between Cpo/Cno and Cn is taken into account, the phase delay between Cn and Bp/Bn is more than 540°. However, according to the structure of the Sub-Sampling SHA described in Section 7.7 on page 102, the extra delay does not cause any serious issue to the system.

The edge detector and the DDU contain digital circuits only. The output signals are aligned by the clocks $\phi_0 \sim \phi_3$. Therefore, the jitter caused by circuit delays can be significantly reduced.



Figure 5.9: Waveforms in Edge Detector 1 and Digital Delay Unit
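The phase delays quoted for the DDU outputs can be converted into fractions of T and into absolute time with the helper below. It only restates the figures given in the text (270° between Ap/An and Bp/Bn, 540° between Bp/Bn and Cpo/Cno); the additional amplifier delay on Cn is not modelled, and the names are illustrative.

```python
from fractions import Fraction

CLOCKS_PER_PERIOD = 32   # 2.624 GHz clock cycles per input period T
T_SECONDS = 1 / 82e6     # input period


def phase_to_fraction_of_t(phase_deg):
    """Convert a phase of the 2.624 GHz clock into a fraction of T."""
    return Fraction(phase_deg, 360) / CLOCKS_PER_PERIOD


# Rising-edge positions relative to Ap/An, following the text.
edges_deg = {"Ap/An": 0, "Bp/Bn": 270, "Cpo/Cno": 270 + 540}
for name, deg in edges_deg.items():
    frac = phase_to_fraction_of_t(deg)
    print(f"{name}: {frac} T = {float(frac) * T_SECONDS * 1e12:.1f} ps")
```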

## 5.5 32/33 Frequency divider (32/33 FD) and Edge Detector 2



Figure 5.10: 32/33 Frequency Divider

Figure 5.10 is the structure of the presented 32/33 FD. It contains four normal  $\div 2$  FDs, and a 2/3 FD ( $FD\_2or3$  in the figure), which can either divide the input frequency by 2 or by 3. The 4 normal  $\div 2$  FDs and the buffer are the same as those of the  $\div 32$  FD in the PLL, which is described in Sub-Section 4.3.1 on page 35 and Sub-Section 4.3.3 on page 48.

The 2/3 FD is based on a modular programmable FD family designed by Vaucher et al [45], and is tailored to our application. Figure 5.11 is the logic-block diagram of the 2/3 FD, and Figure 5.12 is the differential-logic implementation of the D-Flip-Flop and the AND gate.



Figure 5.11: 2/3 Frequency Divider



Figure 5.12: Differential logic implementation of D-FF with AND gate

To let the 32/33 FD perform a $\div 33$ operation, an enabling pulse with a width of $\frac{3}{32}T$ is required on the port Div3en. With this enabling pulse, the 2/3 FD takes 3 clock cycles to finish a logic loop, i.e. it needs 3 clock cycles to return to its initial logic state. When the pulse has gone, the 2/3 FD changes back to the divide-by-2 mode. The 32/33 FD therefore takes 33 clock cycles in total to finish a logic loop, as one of the $\div 2$ operations has been replaced by a $\div 3$ one, and an extra $\frac{1}{32}T$ delay is generated.
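The cycle count can be verified with a minimal counting model. It assumes that the 2/3 FD is the first stage, clocked directly by the 2.624GHz input, and is followed by the four $\div 2$ stages; this ordering is inferred from the 3-clock-cycle enabling pulse and is an assumption of the sketch, as is the function name.

```python
def input_cycles_per_output_period(div3_enabled):
    """Input clock cycles consumed by one output period of the 32/33 FD.

    The chain is modelled as a 2/3 stage followed by four divide-by-2
    stages (a combined divide-by-16). One output period therefore spans
    16 periods of the 2/3 stage. With the enable pulse present, exactly
    one of those periods lasts 3 input cycles instead of 2.
    """
    stage_periods = [2] * 16
    if div3_enabled:
        stage_periods[0] = 3   # which of the 16 is stretched does not matter
    return sum(stage_periods)


normal = input_cycles_per_output_period(div3_enabled=False)
stretched = input_cycles_per_output_period(div3_enabled=True)
extra_delay_in_T = (stretched - normal) / 32   # one clock cycle is T/32

print(normal, stretched, extra_delay_in_T)
```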

The  $\frac{3}{32}T$ -wide enabling pulse is generated by Edge Detector 2, which is illustrated in Figure 5.13. This circuit is implemented in differential logic as well.



Figure 5.13: Edge Detector 2

## 5.6 Low-frequency dividers

The low-frequency dividers, i.e. the  $\div 2N$  FD and the  $\div 2$  FD in the lower-left corner of Figure 5.3 on page 66, are implemented with the standard CMOS-logic circuits, i.e. the D-Flip-Flops, T-Flip-Flops, and some combinational logic gates.

As mentioned in previous sections, N is the number of samples taken for a Target Sample before moving to the next. The value of N should be at least several hundred so that a stable output can be obtained. It can be much larger, e.g. thousands or tens of thousands, to reduce the measurement noise further<sup>3</sup>. To provide more flexibility to the system, the $\div N$ FD is implemented off-chip so that any desired value of N can be used during measurement, as shown in Figure 5.14.



Figure 5.14: Low frequency dividers
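The choice of N sets a lower bound on the time needed to acquire one full set of 128 Target Samples, since each repetition takes one period of the 82MHz input. The estimate below ignores the conversion time of the slow-speed ADC and any other overhead, so real acquisitions take longer; the example values of N are only illustrative.

```python
F_REF_HZ = 82e6
N_TARGET_SAMPLES = 128


def min_acquisition_time_s(n_repeats):
    """Lower bound on the time to acquire all Target Samples of one input."""
    return N_TARGET_SAMPLES * n_repeats / F_REF_HZ


for n in (500, 1_000, 10_000):
    print(f"N = {n:>6}: {min_acquisition_time_s(n) * 1e3:.2f} ms")
```

The bound scales linearly with N, which is the cost of the additional noise reduction obtained from a larger number of repetitions.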



Figure 5.15: Layout of Pulse Generator for 10.5GS/s DAQ (1: QVCO; 2: PLL (except QVCO); 3: 32/33 FD; 4: Switch Box; 5: Edge Detector 1 and DDU)

## 5.7 Layout and simulation

Figure 5.15 is the layout of the presented pulse generator in Cadence<sup>4</sup>. Its size is approximately  $970\mu m \times 720\mu m$ . In post-layout simulation, it consumes about 92mA of current from a 3.3V power supply. More than half of the supply current (56mA) is used by the PLL with QVCO, as mentioned in Section 4.6 on page 59. The Switch Box, 32/33 FD, Edge Detector 1 and DDU are also power-hungry modules, as they all operate at a very high frequency, 2.624GHz.

Figure 5.16 shows the post-layout simulation of the generated pulses for different values of the address lines  $A_1A_0$ . It only shows the pulse Ap, as the other pulses (An, Bp, Bn and Cn) are similar.  $\overline{\text{FDout}}$  is the negative part of the differential output of the 32/33 FD. Its falling edge triggers a set of pulses, i.e. one pulse on each of Ap, An, Bp, Bn, and Cn with the proper timing. The pulses of Ap are compared with the Absolute-Phase Clock, CK0, i.e. the 0° output of the QVCO. As shown in the figure, the pulse is delayed by a further 90° every time  $A_1A_0$  is increased by 1.

# 5.8 Design of Pulse Generator for 2.6GS/s DAQ

As mentioned in Section 7.7 on page 102, a 2.624GSample/s DAQ is also designed as a conservative trial for the technologies presented in this thesis. The sampling strategy for this DAQ is almost the same as that of the 10.5GS/s DAQ, except that it needs only 32 Target Samples for one full period of the input signal. Consequently the additional delay for switching Target Samples is  $\frac{1}{32}T$ .

Figure 5.17 is the required timing of the control pulses for this DAQ. It is similar to that for the 10.5GS/s DAQ, except that Bp/Bn is delayed by  $180^{\circ}$  relative to Ap/An. Consequently the pulse generator in this case does not need a 4-phase clock. The differential output (i.e.  $0^{\circ}$  and  $180^{\circ}$ ) from a normal VCO will be sufficient. Details of the pulse timing are presented in Section 7.7 on page 102.

<sup>&</sup>lt;sup>3</sup>Please see Chapter 9 on page 139 for more details of noise reduction

<sup>&</sup>lt;sup>4</sup>Some sub-circuits are not labelled on the figure as they are too small.



Figure 5.16: Pulse Ap under different Switch Box configurations



Figure 5.17: Timing of control pulse signals for 2.6GS/s DAQ

The architecture of the pulse generator for the 2.6GS/s DAQ is shown in Figure 5.18. Compared to that for the 10.5GS/s DAQ in Figure 5.3 on page 66, this pulse generator has no switch box or address lines. Its working mechanism is also simpler than that of the previous one: once the N repetitive sampling operations have finished, the falling edge of the  $\div N$  FD output triggers an enabling pulse at Edge Detector 2. This pulse makes the 32/33 FD perform a  $\div 33$  operation, so that the required  $\frac{1}{32}T$  delay is generated.



Figure 5.18: Pulse Generator for 2.6GS/s DAQ



Figure 5.19: Edge Detector 1 and Digital Delay Unit for 2.6GS/s DAQ

32/33 FD and Edge Detector 2 are the same as those described in Section 5.5 on page 75, and Section 5.6 on page 77, respectively. Edge Detector 1A and DDU2 in this pulse generator are shown in Figure 5.19.

Figure 5.20 is the layout of the pulse generator in Cadence<sup>5</sup>. Its size is approximately  $650\mu m \times 730\mu m$ . In post-layout simulation, it consumes about 44mA from a 3.3V power supply, 18mA of which is for the 2.624GHz PLL. The power dissipation is much less than that of the pulse generator of the 10.5GS/s DAQ, as it has fewer high-frequency sub-modules.



Figure 5.20: Layout of Pulse Generator for 2.6GS/s DAQ (1: PLL; 2: 32/33 FD; 3: Edge Detector 1A and DDU2)

# 5.9 Summary

This chapter presented the design of the Pulse Generator (PG), which provides the pulse signals to control the high-speed DAQ.

The PG's clock source was the 2.624GHz PLL with QVCO, which was presented in Chapter 4. The pulses were generated by a digital circuit, the DDU (Digital Delay Unit). It used the 4-phase output from the PLL as the trigger clocks. Therefore the jitter of the control pulses was minimised, as the pulses were aligned with the PLL.

<sup>5</sup>Some sub-circuits are not labelled on the figure as they are too small.

The PG had a 32/33 dual-mode frequency divider, and a switch box which can re-shuffle the 4-phase clocks. These two sub-modules were used to generate a short delay, which was only  $\frac{1}{128}$  of the fundamental period (i.e. 95ps for 82MHz reference, or 98ps for 80MHz reference). This delay was required by the sampler to shift the acquired samples one by one on the output port. To generate the  $\frac{1}{128}T$  delay, the switch box re-shuffles the 4-phase clock so that a  $90^{\circ}$  delay is provided for the 2.624GHz (or 2.56GHz) clock.
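The relation between the two delay mechanisms can be illustrated numerically. The sketch below assumes that the coarse step of the 32/33 FD (one clock cycle, $\frac{1}{32}T$) and the fine step of the switch box ($90^{\circ}$ of the clock, $\frac{1}{128}T$) simply add; this decomposition, and the names `target_sample_setting` and `delay_ps`, are illustrative assumptions rather than a description of the implemented control logic.

```python
F_REF = 82e6            # reference (fundamental) frequency in Hz
T_PS = 1e12 / F_REF     # fundamental period T in ps (about 12195ps)


def target_sample_setting(k):
    """Split Target Sample index k (0..127) into
    (whole 2.624GHz clock cycles, A1A0 address value)."""
    if not 0 <= k < 128:
        raise ValueError("k must be in 0..127")
    return divmod(k, 4)


def delay_ps(k):
    """Total delay of Target Sample k, assuming the two steps add."""
    cycles, address = target_sample_setting(k)
    return cycles * T_PS / 32 + address * T_PS / 128


print(target_sample_setting(5))    # (1, 1)
print(round(delay_ps(1), 1))       # 95.3  (one 90 degree step, T/128)
print(round(delay_ps(4), 1))       # 381.1 (one full clock cycle, T/32)
```

Under this assumption the 128 settings cover one full period of the 82MHz reference in uniform steps of $\frac{1}{128}T$.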

# Part III

Sub-sampling SHA

Given the clock source presented in Part II, it is now possible to design the sampling circuit for the DAQ. Part III presents the core circuit of the DAQ system for OSAM, an ultra-fast sub-sampling Sample-and-Hold Amplifier (SHA). A charge-domain sampling strategy and double differential switches are used in this circuit to significantly improve the sampling speed. The periodicity of the system input is exploited in repetitive sampling to reduce the noise. Two circuits are implemented in a standard  $0.35\mu m$  CMOS process: one has an equivalent sampling rate of 2.6GS/s, and the other achieves 10.5GS/s.

The background introduction is given in Chapter 6. The design details are presented in Chapter 7 (the core circuit) and Chapter 8 (the peripheral circuits), while the noise analysis is discussed in Chapter 9.

# Chapter 6

# Introduction to

# Sample-and-Hold Amplifier

This chapter introduces the background theories related to Part III, including sample-and-hold amplifiers, sub-sampling, and switched-capacitor filters.

# 6.1 Sample-and-Hold Amplifier (SHA)

Sample-and-Hold Amplifiers (SHA), sometimes called Track-and-Hold Amplifiers, are usually employed in multi-step Analogue-to-Digital Converters (ADC)[46]. The SHA takes and holds the voltage samples from the input signal, in order to ensure that the core circuit of the ADC has enough time to digitise the sample, and is not affected by the time-varying input.

Figure 6.1 sketches two basic SHA techniques [46]. Figure 6.1(a) is the parallel sampling, which is a direct and simple sampling method. Figure 6.1(b) is the series sampling. During the sampling phase, the switches  $S_1$  and  $S_3$  are on, and  $S_2$  is off. In the hold phase,  $S_1$  and  $S_3$  are off, but  $S_2$  is on. Therefore the right terminal of the capacitor  $C_H$  is released from the reference  $V_{ref}$, and the left terminal is shorted to ground. Thus the voltage drop on the input of the amplifier is equal to the input  $V_{in}$ . If the amplifier has unity gain,  $V_{out} = V_{ref} - V_{in}$ .

Figure 6.1: Basic SHA techniques

One advantage of the series sampling is that the DC level of the amplifier is isolated from the input, while in the parallel sampling the input is DC-coupled with the amplifier. However, the series sampling is usually slower than the parallel sampling. This is because  $V_{out}$  is reset to  $V_{ref}$  during the sampling phase, and has to settle to the target voltage from  $V_{ref}$  in every holding phase. On the other hand,  $V_{out}$  in the parallel sampling need not be reset, therefore the settling time is shorter [46].

In CMOS analogue circuits, the switches are usually implemented by MOS transistors. When the transistors switch off, a phenomenon called "channel charge injection" occurs [47]. This is because when a transistor is on, there is a certain amount of electrical charge in the conducting channel. When the transistor turns off, this channel disappears, and the charge originally in the channel is released to the circuit via the source and drain terminals. In a SHA, some of this charge is injected into the sampling capacitor  $C_H$ , causing measurement errors.

A frequently-used method to cancel the "channel charge injection" is to use differential architectures [47, 48, 49]. As the differential circuits are symmetric, the two input branches encounter the same amount of charge injection. Therefore it can be cancelled as a common-mode noise.

## 6.2 Sub-sampling

Sub-sampling means sampling the input at a slower rate than a conventional DAQ system. According to the Nyquist criterion, if the sampling rate is more than twice the bandwidth of the input, information contained in the input signal will not be lost. Therefore it is possible to sample a high-frequency narrow-band signal at a much slower rate, as long as the rate is more than twice the signal bandwidth.

The Sub-Sampling SHA is a device which operates using this principle [23, 48, 50, 51, 52]. Its basic principle is shown in Figure 6.2. The sampling rate of the SHA is  $f_s$ , which is much lower than the signal frequency, but in frequency domain the sampling pulse has significant harmonics, one of which  $(Nf_s)$  will be near the signal. This harmonic mixes with the input signal and down-converts it to baseband. The information inside the input signal is not lost, but is moved to a low frequency.



Figure 6.2: Sub-sampling in frequency domain

Figure 6.3 illustrates a straightforward example of sub-sampling in time-domain. The high-frequency sine wave (the continuous line) is sub-sampled at the circle dots. The output signal (the dashed line) forms a low-frequency sine wave.
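The down-conversion can also be illustrated with a short calculation. The sketch below finds the harmonic $Nf_s$ of the sampling rate nearest to a single-tone input and the resulting baseband frequency. The function name `subsample_alias` and the example frequencies are hypothetical and chosen only for illustration.

```python
def subsample_alias(f_signal, f_s):
    """Return (N, f_base): the index of the sampling harmonic nearest
    to f_signal, and the baseband frequency |f_signal - N*f_s|."""
    n = round(f_signal / f_s)
    return n, abs(f_signal - n * f_s)


# Hypothetical example: a 1.0005GHz tone sub-sampled at 50MS/s.
n, f_base = subsample_alias(1.0005e9, 50e6)
print(n, f_base)   # 20 500000.0
```

In this example the 20th harmonic of the sampling rate mixes the tone down to 500kHz, well within the reach of a low-speed ADC.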

As illustrated above, the Sub-Sampling SHA samples the high-frequency signal at a relatively slow rate, and provides output at low frequency. Effectively, it demodulates a base-band signal from its RF-band carrier. For this reason the Sub-Sampling SHA is sometimes also termed a Sub-Sampling Mixer [23, 48, 51]. This type of mixer can be used for signal demodulation, which significantly simplifies the system architecture [48, 50, 52].

Figure 6.3: Sub-sampling in time domain

However, the application of Sub-Sampling SHA is restricted by its noise performance. Sub-sampling systems usually have an extremely poor noise figure (e.g. 30dB [23]), caused by a phenomenon called noise folding. As shown in Figure 6.4, the sampling pulse has a large number of harmonics in the frequency domain. Every harmonic mixes the noise around its frequency down to the baseband. Therefore the noise power folds up at the base-band, and significantly degrades the SNR.

## 6.3 Switched-capacitor filter

The front-end of the Sub-Sampling SHA is very similar to a switched-capacitor filter. To fully understand the operation of the SHA, it is necessary to have an understanding of the switched-capacitor filter.

Switched-capacitor circuits are widely applied in low-speed filters, comparators, ADCs, and DACs [47]. In particular, switched-capacitor filters provide more flexibility to filter design when compared to conventional filters. The basic idea of the switched-capacitor filter is to replace the resistors with switches and capacitors [53].

Figure 6.4: Noise folding in Sub-sampling Mixer



Figure 6.5: Switched-capacitor as a resistor

Figure 6.5 is an equivalent resistor built by the switched-capacitor. The switches SW1 and SW2 are turned on and off repetitively at a high frequency  $f_{sw}$ . However, they are turned on at different phases of the switching period ( $\phi_1$  and  $\phi_2$  respectively), and do not overlap.

If  $V_1 \neq V_2$ , the electrical charge transfers between the two terminals. Assuming the capacitor is fully charged every time a switch turns on, the charge transferred from  $V_1$  to  $V_2$  at each switching period is  $C(V_1 - V_2)$ . The transferred charge in a unit time is the current. As the switching frequency is  $f_{sw}$ , the current in this circuit is

$$I = f_{sw}C(V_1 - V_2)$$

So this circuit is equivalent to a resistor

$$R_{eff} = \frac{V_1 - V_2}{I} = \frac{1}{f_{sw}C}$$

With this virtual resistor, a 1st-order RC low-pass filter can be formed as shown in Figure 6.6. Its corner frequency is  $f_c = \frac{1}{2\pi R_{eff}C_2} = \frac{f_{sw}C_1}{2\pi C_2}$ .
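As a quick numerical check of the two expressions above, the following sketch evaluates $R_{eff}$ and $f_c$. The component values are hypothetical and are not taken from the presented design.

```python
import math


def sc_resistance(f_sw, c1):
    """Equivalent resistance of a switched capacitor: 1 / (f_sw * C1)."""
    return 1.0 / (f_sw * c1)


def sc_corner_frequency(f_sw, c1, c2):
    """Corner frequency of the 1st-order filter: f_sw * C1 / (2*pi*C2)."""
    return f_sw * c1 / (2 * math.pi * c2)


# Hypothetical values: 1MHz switching, C1 = 1pF, C2 = 10pF.
f_sw, c1, c2 = 1e6, 1e-12, 10e-12
r_eff = sc_resistance(f_sw, c1)            # about 1 Mohm
f_c = sc_corner_frequency(f_sw, c1, c2)    # about 15.9kHz
print(r_eff, f_c)
```

Doubling `f_sw` halves `r_eff` and doubles `f_c`, while scaling both capacitors by the same factor leaves `f_c` unchanged, since only the ratio $\frac{C_1}{C_2}$ enters the expression.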



Figure 6.6: 1st-order switched-capacitor low-pass filter

One advantage of this filter is that its corner frequency can be freely adjusted by changing  $f_{sw}$ , so a programmable filter can be built. Moreover, the accuracy of the filter depends on the capacitance ratio  $\frac{C_1}{C_2}$ , which is much easier to guarantee than an absolute resistance value in CMOS processes [47].

On the other hand, one disadvantage of switched-capacitor filters is that the frequency of the input signal must be significantly smaller than  $f_{sw}$ , e.g. only one twentieth of it [53]. Otherwise, the equations given above are no longer valid. Therefore the applications of switched-capacitor filters are usually restricted to the low-frequency range.

Another drawback of switched-capacitor filters is that they introduce two extra noise sources into the output signal: channel charge injection and clock feed-through [47]. The channel charge injection is the same issue as in the SHA. Clock feed-through means that the clock signals are coupled into the output via the parasitic capacitors of the transistor switches.

### 6.4 Summary

This chapter introduced the background theories related to Part III, including Sample-and-Hold Amplifiers (SHA), sub-sampling, and switched-capacitor filters. The SHA is the basic architecture used in the sampling circuit of the designed DAQ, while sub-sampling is the strategy used to sample a high-frequency narrow-band signal at a relatively slow sampling rate. The theory of switched-capacitor filters will be used to characterise the frequency response of the designed sampler in the later chapters.

## Chapter 7

## Design of Sub-sampling SHA

#### 7.1 System requirement of the SHA

As mentioned in Part I, this SHA is used in the Data-Acquisition (DAQ) system for Optical Scanning Acoustic Microscopy (OSAM) [4, 5]. OSAM uses laser pulses to generate surface acoustic waves on a material surface, then another laser ("the probe") to detect the vibrations of the material, which reveals its properties and hidden faults.

Since the pulse repetition frequency of the stimulation laser is 82MHz, the input of the DAQ, i.e. the reflection of the probe laser, has a fundamental frequency of 82MHz. Previous work by Sharples et al. [4] showed that the harmonics of this input are present up to at least several gigahertz. Therefore a DAQ with a sampling rate as high as possible is required. Although a SHA with a gigahertz sampling rate is difficult to implement in a standard CMOS process, a sub-sampling SHA with an equivalent sampling rate can be achieved.

The architecture of the DAQ system for OSAM, which has been briefly described in Chapter 2 on page 7, is shown in Figure 7.1. The function of the Sub-Sampling SHA is to sample the voltage signal provided by the TIA and the LPF, then to present the sampled value at a low frequency, so that the low-speed ADC can digitise it. Its control signals are given by the pulse generator, which is presented in Chapter 5 on page 63.

Figure 7.1: Architecture of DAQ system for OSAM

### 7.2 Sub-sampling for periodical signal

As the input is a periodic signal with a fundamental frequency of 82MHz [4], the sampling rate of the mixer can be set to a value near 82MHz.

According to the theory of Fourier Transform, the periodic input signal  $V_{in}$  can be represented by the sum of a series of sinusoidal waves, i.e.

$$V_{in}(t) = \sum_{n} A_n \cos(2\pi n f_0 t + \phi_n)$$

where n is an integer,  $f_0$  is the fundamental frequency (82MHz),  $A_n$  and  $\phi_n$  are the amplitude and phase of different harmonics of the input. The example diagrams of  $V_{in}$  in frequency and time domain are shown in the first lines of Figure 7.2 on the next page and Figure 7.3 on page 96, respectively.

Similarly, an ideal sampling pulse series  $V_p$  can be represented as

$$V_p(t) = \sum_{k} \delta\left(t - \frac{k}{f_0 - \Delta f}\right) = \sum_{m} \cos(2\pi m (f_0 - \Delta f)t) \tag{7.1}$$

where k and m are integers,  $\Delta f$  is the difference between the fundamental frequency and the sampling rate, and  $\delta(t)$  is the impulse function.  $\sum_{k} \delta(t - \frac{k}{f_0 - \Delta f})$  represents the periodic pulses  $V_p$  in time domain, while  $\sum_m \cos(2\pi m (f_0 - \Delta f)t)$  represents it by the sum of its harmonics. The diagrams of  $V_p$  in frequency and time domain are shown in the second lines of Figure 7.2 and Figure 7.3 on the following page, respectively.

If the mixer is ideal, its output is the product of  $V_{in}$  and  $V_p$ , i.e.

$$V_{out} = V_{in}(t)V_p(t)$$

$$= \sum_{n,m} \frac{1}{2} A_n \left(\cos(2\pi((n+m)f_0 - m\Delta f)t + \phi_n) + \cos(2\pi((n-m)f_0 + m\Delta f)t + \phi_n)\right)$$

Considering the base-band part  $(V_{base})$  only, i.e. including only the terms with n=m,

$$V_{base} = \sum_{n=m} A_n \cos(2\pi((n-m)f_0 + m\Delta f)t + \phi_n)$$
$$= \sum_n A_n \cos(2\pi n\Delta f t + \phi_n)$$

Thus the information contained in the input signal is represented at a much lower fundamental frequency,  $\Delta f$ .



Figure 7.2: Sub-sampling for periodical signal

Figure 7.2 illustrates the effect of a sub-sampling mixer in frequency domain. The Fourier Transform of the input signal produces a series of spikes at the integer multiples of 82MHz, while the ideal sub-sampling pulses have a fundamental frequency very close to 82MHz. In the output of the mixer, all the frequency information is represented in the base-band.



Figure 7.3: Sub-sampling for periodical signal in time domain

In time domain, sub-sampling of a periodical signal is shown in Figure 7.3. The sampling rate is slightly lower than the fundamental frequency of the input signal, so each sampling pulse lands slightly later in the input period than the previous one. The output of the mixer is similar to that of a normal SHA whose sampling rate is more than twice the signal bandwidth. The only difference is that the mixer's output is much slower than that of a normal SHA, i.e. the input signal is represented at a low frequency so that it can be fed into a low-speed, low-cost ADC.
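The time-domain behaviour can be reproduced with an idealised numerical experiment. The sketch below assumes ideal impulse sampling and an input with three harmonics whose amplitudes and phases (`HARMONICS`) are hypothetical. The sampling rate is chosen so that consecutive samples advance by exactly $\frac{1}{128}$ of the input period.

```python
import math

F0 = 82e6                                            # fundamental frequency in Hz
HARMONICS = [(1.0, 0.0), (0.5, 0.3), (0.25, 1.1)]    # hypothetical (A_n, phi_n)


def v_in(t):
    """Periodic input built from harmonics n = 1, 2, 3 of F0."""
    return sum(a * math.cos(2 * math.pi * (n + 1) * F0 * t + phi)
               for n, (a, phi) in enumerate(HARMONICS))


STEPS = 128
f_s = F0 * STEPS / (STEPS + 1)   # sampling rate f0 - delta_f
delta_f = F0 - f_s               # equals F0 / 129, about 636kHz

# One sample per (slightly longer) sampling period ...
subsampled = [v_in(k / f_s) for k in range(STEPS)]
# ... compared with a direct read-out of one input period at 128 points.
reference = [v_in(k / (STEPS * F0)) for k in range(STEPS)]

max_err = max(abs(a - b) for a, b in zip(subsampled, reference))
print(delta_f, max_err)
```

The 128 sub-sampled values coincide with 128 equally spaced points of a single input period (up to floating-point error), but they are produced over a time span of $1/\Delta f$, so the waveform appears at the fundamental frequency $\Delta f$.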

## 7.3 Charge-domain sampling

Even though the signal is down-converted to the base band, each sampling operation must still be performed at full speed. A major obstacle to increasing the sampling speed is the charging time of the sampling capacitor. The capacitor must be charged so that its voltage is equal to that of the input. Since the TIA has a finite output resistance, the sampling capacitor and the TIA form a low-pass filter, which sets the lower limit for the sampling time.

However, in the presented design, as shown in the conceptual diagram (Figure 7.4), a Trans-Conductance Amplifier (TCA) is inserted between the input and the sampling capacitor. Ideally, the TCA provides a current proportional to the input voltage. If the switch turns on for a time which is short relative to the changes of the input voltage, the change in voltage on the capacitor will be proportional to the output current of the TCA, and hence the input voltage. This charge-domain sampling structure was first introduced by S. Karvonen et al. to reduce noise in 2001 [54].



Figure 7.4: Charge-domain sampling

The advantage of this structure is that there is no settling time required for the voltage on the sampling capacitor. The output voltage of a sample,  $V_{out}$ , is

$$V_{out} = V_{in}g_t t_{sw}/C_{smp}$$

where  $V_{in}$  is the input voltage,  $g_t$  is the gain of the TCA,  $t_{sw}$  is the total time for which the switch is on, and  $C_{smp}$  is the sampling capacitance.  $V_{out}$  is always proportional to  $V_{in}$  regardless of the value of  $t_{sw}$ , while in a traditional SHA  $t_{sw}$  must be long enough to fully charge the sampling capacitor. This circuit topology provides a method of sampling the input in a very short turn-on time  $(t_{sw})$ . The speed limitation is now determined by the switching speed of the transistors.

The disadvantage of this circuit topology is that the output is not a "real" sample of the input, but an integration of the input over a total time of  $t_{sw}$ , i.e.

$$V_{out}(T_s) = \int_{T_s - \frac{t_{sw}}{2}}^{T_s + \frac{t_{sw}}{2}} \frac{V_{in}g_t}{C_{smp}} dt$$

where  $T_s$  is the moment when the sampling is performed. The integration effect results in a sinc-type low-pass filter [52, 54], with the frequency response H(f) given by

$$H(f) = \frac{V_{out}(f)}{V_{in}(f)} = \frac{g_t}{C_{smp}\pi f}\sin(\pi t_{sw}f) \tag{7.2}$$

To recover the signal, a compensating filter should follow the sampler. A detailed discussion of H(f) and the filter will be given in Section 8.2 on page 115.
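To give a feeling for the size of this effect, the sketch below evaluates Equation (7.2) normalised to its low-frequency value $g_t t_{sw}/C_{smp}$, for switch-on times of $\frac{1}{128}$ and $\frac{1}{64}$ of the 82MHz period. It assumes an ideal rectangular sampling window, i.e. rise and fall times and the non-ideal behaviour of the switches are ignored, and the function name `h_norm` is hypothetical.

```python
import math


def h_norm(f, t_sw):
    """|H(f)| / H(0) = |sin(pi * t_sw * f) / (pi * t_sw * f)|."""
    x = math.pi * t_sw * f
    return 1.0 if x == 0 else abs(math.sin(x) / x)


T = 1 / 82e6          # fundamental period of the input
F_TEST = 2.624e9      # 32nd harmonic of 82MHz, used as a test frequency

for label, t_sw in (("t_sw = T/128", T / 128), ("t_sw = T/64", T / 64)):
    first_null_ghz = 1 / t_sw / 1e9
    loss_db = -20 * math.log10(h_norm(F_TEST, t_sw))
    print(f"{label}: first null at {first_null_ghz:.3f}GHz, "
          f"loss at 2.624GHz = {loss_db:.2f}dB")
```

Halving $t_{sw}$ doubles the frequency of the first null of the sinc response, which is why a shorter switch-on time preserves more of the high-frequency content.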

## 7.4 Double Differential Switch (DDS)

According to Equation (7.2), a smaller  $t_{sw}$  gives a better frequency response. Since differential switches are much quicker than MOS transistor switches, they are used in the presented circuit. To achieve an even better performance (smaller  $t_{sw}$ ), two differential switches (Double Differential Switch, DDS) are used in each SHA, as shown in Figure 7.5(a). Figure 7.5(b) gives the waveforms of the control signals of the switches. Transistor MN1, which acts as the TCA in Figure 7.4, provides a charging current to  $C_{smp}$  only when both  $A_p > A_n$  and  $B_p > B_n$ . These two switches are equivalent to one switch controlled by a shorter virtual pulse as shown in Figure 7.5(b). The generation of the switch signals  $(A_p/A_n$  and  $B_p/B_n)$  is described in Chapter 5 on page 63.



Figure 7.5: SHA with Double Differential Switch

The non-ideal nature of the differential switches cannot be ignored, and its effects will be discussed in detail in Sub-Section 8.2.2 on page 117.

The value of  $R_b$  in Figure 7.5(a) needs to be carefully chosen. It should be sufficiently large so that when the DDS is on, the current through  $R_b$  is significantly smaller than the charging current of  $C_{smp}$ . On the other hand, when the DDS is off,  $R_b$  must discharge  $C_{smp}$  before the next clock pulse. This therefore sets an upper limit on  $R_b$ . In the presented Sub-Sampling SHA, the sampling period (1/82MHz = 12.2ns) is significantly longer than the turn-on time of the switches (less than 0.2ns). So there is enough time for  $R_b$  to discharge  $C_{smp}$  before the next sampling pulse comes.
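The two constraints on $R_b$ can be illustrated with a first-order RC estimate. The sketch below is only a rough illustration: it treats $R_b$ and $C_{smp}$ as an isolated RC pair, ignores the charge transferred to the holding capacitor, and uses hypothetical component values, since the actual values of $R_b$ and $C_{smp}$ are not quoted here.

```python
import math

T_PERIOD = 1 / 82e6    # sampling period, about 12.2ns
T_ON = 0.2e-9          # upper bound of the switch turn-on time


def residual_fraction(t, r_b, c_smp):
    """Fraction of the sampled voltage left on C_smp after time t,
    assuming a simple exponential discharge through R_b."""
    return math.exp(-t / (r_b * c_smp))


C_SMP = 100e-15                       # hypothetical sampling capacitance (100fF)
for r_b in (5e3, 20e3, 100e3):        # hypothetical resistor values
    tau = r_b * C_SMP
    left = residual_fraction(T_PERIOD - T_ON, r_b, C_SMP)
    print(f"R_b = {r_b / 1e3:.0f}kohm: tau = {tau * 1e9:.1f}ns, "
          f"tau / t_on = {tau / T_ON:.1f}, "
          f"residual before next pulse = {left:.2e}")
```

With these illustrative numbers, a small $R_b$ resets $C_{smp}$ almost completely but gives a time constant comparable to the turn-on time, while a large $R_b$ leaves a noticeable residue from the previous sample. The chosen value has to lie between the two extremes.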

#### 7.5 Repetitive sampling

As discussed in Section 6.2 on page 88, the Sub-sampling SHA suffers from a poor noise figure [23], because the noise near every harmonic of the sampling pulses is mixed down to the base-band. But in the DAQ system of OSAM, the input signal is periodical, and this can be exploited to overcome this drawback.



Figure 7.6: Repetitive sampling strategy

The basic strategy is illustrated in Figure 7.6. The sampling rate is exactly equal to the fundamental frequency of the input (82MHz), rather than one very near 82MHz. Therefore the input is always sampled at the same phase in every period. If a number of samples (usually at least hundreds of samples) are taken and averaged to obtain the voltage value at that particular phase, the error due to noise can be significantly reduced. After that, a slight delay is applied to the sampling pulses, so that the input voltage at another phase can be measured. This procedure is invoked repeatedly until the full period of the input has been measured. The total number of points measured for a full period determines the equivalent sampling rate of a normal SHA, i.e. if the procedure mentioned above is invoked N times to measure one period of the input, N final samples are given, and the sub-sampling SHA is equivalent to a normal SHA with a sampling rate of  $Nf_0$ .
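The procedure can be sketched as a simple simulation. The code below assumes ideal sampling, additive Gaussian noise and an arithmetic mean over the repeated samples. The input waveform, the noise level `NOISE_RMS` and the repetition count `N_REP` are hypothetical values chosen for illustration.

```python
import math
import random

F0 = 82e6          # fundamental frequency of the input in Hz
N_TARGET = 128     # number of phases measured over one period
N_REP = 400        # repeated samples per phase (hypothetical)
NOISE_RMS = 0.1    # RMS of the additive noise (hypothetical)


def v_in(t):
    """Hypothetical periodic input with a 1st and a 3rd harmonic."""
    return (math.cos(2 * math.pi * F0 * t)
            + 0.3 * math.cos(2 * math.pi * 3 * F0 * t + 0.5))


def acquire(rng):
    """Sample every phase N_REP times at rate F0 and average the results."""
    period = 1 / F0
    result = []
    for k in range(N_TARGET):
        delay = k * period / N_TARGET        # delay selecting the k-th phase
        acc = 0.0
        for i in range(N_REP):
            t = i * period + delay           # same phase in every period
            acc += v_in(t) + rng.gauss(0.0, NOISE_RMS)
        result.append(acc / N_REP)
    return result


samples = acquire(random.Random(0))
truth = [v_in(k / (N_TARGET * F0)) for k in range(N_TARGET)]
rms_err = math.sqrt(sum((s - v) ** 2 for s, v in zip(samples, truth)) / N_TARGET)
print(rms_err)   # expected to be of the order of NOISE_RMS / sqrt(N_REP)
```

With uncorrelated noise, averaging `N_REP` samples reduces the RMS error by roughly the square root of `N_REP`, and the 128 averaged points correspond to an equivalent sampling rate of $128 f_0$.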

The circuit implementing this strategy, which is the core circuit of the presented Sub-Sampling SHA, is shown in Figure 7.7. The holding capacitor,  $C_{hld}$ , is much larger than the sampling capacitor  $C_{smp}$  (more than 20 times in the presented circuit). Moments after each sampling, a transistor MP1 is switched on, and charge is transferred from  $C_{smp}$  to  $C_{hld}$ . After enough sampling periods (at least hundreds of periods in the presented circuit), the voltage on  $C_{hld}$  will be equal to that on  $C_{smp}$ .



Figure 7.7: Structure of proposed sub-sampling SHA

Strictly speaking, the output voltage  $V_{out}$  is not the mathematical average of the voltage samples on  $C_{smp}$ . The samples taken later have more effect on  $V_{out}$  than those taken earlier. The circuit is more like a low-pass filter than an averager, but it is also able to significantly reduce the noise. The details of its low-pass filter effect and noise reduction are discussed in Sub-Section 9.2.1 on page 140.
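The difference between this behaviour and a true average can be seen from an idealised charge-sharing model. The sketch below assumes that each time MP1 switches on, the two capacitors fully equalise, and that $C_{smp}$ carries a fresh Front-End Sample in every period. Voltages are taken relative to the reset level, and the function name `hold_voltage_sequence` is hypothetical.

```python
def hold_voltage_sequence(front_end_samples, c_smp, c_hld, v_hld0=0.0):
    """Voltage on C_hld after each ideal charge-sharing event.

    Charge conservation gives
        v_hld_new = (c_smp * v_smp + c_hld * v_hld) / (c_smp + c_hld),
    written here as a first-order recursion with step size alpha.
    """
    alpha = c_smp / (c_smp + c_hld)
    v_hld, history = v_hld0, []
    for v_smp in front_end_samples:
        v_hld += alpha * (v_smp - v_hld)
        history.append(v_hld)
    return history


# C_hld = 20 * C_smp, constant Front-End Sample of 1 (arbitrary units).
history = hold_voltage_sequence([1.0] * 400, c_smp=1.0, c_hld=20.0)
print(history[0], history[99], history[-1])
```

In this model the most recent sample is weighted by $\alpha = C_{smp}/(C_{smp}+C_{hld})$ and each earlier sample by a further factor of $(1-\alpha)$, i.e. the weights decay exponentially with age. With a capacitance ratio of 20, the Holding Sample is within about 1% of the Front-End Sample after 100 periods and is indistinguishable from it after several hundred.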

Figure 7.8 is the flow chart of the operating procedure of the presented Sub-Sampling SHA, in terms of time sequence. The timing control of the sampling pulses is described in Chapter 5 on page 63.



Figure 7.8: Operating procedure of the Sub-Sampling SHA (The T/128 delay in the diagram is not in proportion)

## 7.6 Terminologies

The presented Sub-Sampling SHA, as shown in Figure 7.7, is the core of the DAQ system for OSAM. Most of the other circuits in this system are constructed based on its characteristics, so it will be frequently referred to in the subsequent chapters. To avoid confusion and misunderstanding, several terms are defined as follows. These definitions are also illustrated in Figure 7.8.

**Front-End Sample** is the effective voltage on  $C_{smp}$  in Figure 7.7 during sampling, i.e. the voltage value on  $C_{smp}$  just before MP1 switches on. In every sampling period (1/82MHz=12.2ns), the Sub-Sampling SHA gets one *Front-End Sample*.

**Holding Sample** is the voltage value on  $C_{hld}$  in Figure 7.7. The *Holding Sample* is valid only when it is equal to the *Front-End Sample*.

**Target Sample** is one of the final N points needed to represent the full period of the input signal (details in Section 7.5 on page 99). For example, to sample the 82MHz input of the DAQ at more than 10GS/s, 128 samples of one period are needed ( $82MHz \times 128 = 10.5GHz$ ). Each one of the 128 samples is a *Target Sample*. Normally, one *Holding Sample* gives one *Target Sample*. But it is also feasible to obtain many *Holding Samples* for the same *Target Sample*, which could allow the noise to be reduced by averaging.

Some related terms which appear in later chapters are best defined here.

**Linearised Holding Sample** is the *Holding Sample* which has a linear relationship with the input signal. It is the output of an LFA (Linearising Feedback Amplifier) connected to the Sub-Sampling SHA (details in Sub-Section 8.1.2 on page 107).

**Presenting Time of a Sample** is the total time for which one *Linearised Holding Sample* is presented at the output of the LFA (details in Section 9.3 on page 142).

## 7.7 Implementation of Sub-Sampling SHA

Two Sub-Sampling SHAs have been implemented in the DAQ system of OSAM.

The first one provides 128 Target Samples over one period of the 82MHz input signal. This is equivalent to a 10.5GHz sampling rate. This Sub-Sampling SHA is used to achieve the system requirement of the DAQ for OSAM. Ignoring circuit delay and rise/fall times, the total switch-on time of the DDS (i.e. the pulse width of the "Virtual Pulse" in Figure 7.5(b) on page 98) is 95.3ps ( $\frac{1}{128}$  of one period). It needs the 2.624GHz PLL with quadrature outputs (as presented in Chapter 4 on page 27) to generate the control signals for the switches.



Figure 7.9: Timing of switch control signals for 10.5GHz Sub-Sampling SHA

Figure 7.9 shows the timing of control signals, i.e. Ap, An, Bp, Bn and Cn. All control signals have a period of  $T=1/f_0=12.2ns$ . According to Section 7.4 on page 98, the sampler charges  $C_{smp}$  (as shown in the circuit diagram of Figure 7.7 on page 100) when both Ap > An and Bp > Bn. Thus the charging time in Figure 7.9 is T/128 (i.e. 95.3ps).

Cn, which turns on the PMOS transistor switch between  $C_{smp}$  and  $C_{hld}$ , is activated slightly later than when Bp/Bn turns off (delay time = T/64, i.e. 191ps). Ideally, Cn could turn on MP1 earlier, e.g. at the same time as Bp/Bn turn off, or even when Ap/An turn off. However, switches are usually noisy at the moments when they turn on and off. A short delay such as T/64 reduces the number of switching noise sources at that moment to just one, MP1.

The generation of Ap, An, Bp, Bn, and Cn is presented in Chapter 5 on page 63.

The second Sub-Sampling SHA gives 32 Target Samples over one period of the input, which is equivalent to a 2.6GHz sampling rate. Its switch-on time is 191ps ( $\frac{1}{64}$  of one period). It requires a normal 2.6GHz PLL to generate the control signals. The timing of the control signals is shown in Figure 7.10, and follows a similar mechanism to that of the 10.5GHz Sub-Sampling SHA. As the switch-on time is different to that in the 10.5GHz SHA, the values of  $C_{smp}$  and  $R_b$  in Figure 7.7 are different.

Figure 7.10: Timing of switch control signals for 2.6GHz Sub-Sampling SHA

The second Sub-Sampling SHA uses more conservative circuit design techniques, in order to ensure a more reliable, less noisy, and lower-power circuit. It is different to the previous one, which applies radical strategies to achieve a high sampling speed. Table 7.1 summarises the features of these two circuits.

|                             | 10.5GHz Sampler                    | 2.6GHz Sampler     |
|-----------------------------|------------------------------------|--------------------|
| No. of Target Samples       | 128 (one per 95.3ps)               | 32 (one per 381ps) |
| Switch-on time for sampling | 95.3ps                             | 191ps              |
| Clock source                | 2.6GHz PLL with quadrature outputs | Normal 2.6GHz PLL  |

Table 7.1: Implementations of proposed Sub-Sampling SHA

In the following chapters, these two Sub-Sampling SHAs will be mentioned frequently. Where their configurations differ, each is described separately. Where neither of them is named, the same configuration applies to both.

### 7.8 Summary

This chapter presented the design details of the core circuit of the Sample-and-Hold Amplifier (SHA). It used the sub-sampling method to obtain high-frequency information at a relatively slow sampling rate. The charge-domain sampling strategy and double differential switches were used in this circuit to significantly shorten the effective sampling pulse, so that the high-frequency information would not be lost during the sampling. The periodicity of the system input was exploited in repetitive sampling to reduce the noise. The presented sampler obtained 128 samples for the whole period of the input signal, which was equivalent to a sampling rate of  $82MHz \times 128 = 10.496GSample/s$ .

## Chapter 8

## Measurement Errors and

## **Correcting Circuits**

As mentioned in Chapter 7, the presented sub-sampling SHA has a few non-ideal properties which cannot be ignored. This chapter discusses these properties, and presents several circuits to undo their effects.

# 8.1 Non-linearity output and Linearising Feedback Amplifier

#### 8.1.1 Non-linearity of presented Sub-Sampling SHA

As shown in Figure 7.5 on page 98, the initial voltage of the capacitor  $C_{smp}$  before sampling is the highest voltage in the circuit, Vdd. However, the value of a Front-End Sample (and also a Holding Sample) should be well below Vdd so that it is within the operating range of the following circuits, including amplifiers, filters, etc. Empirically, the difference between Vdd and the Front-End Sample should be more than the threshold voltage of PMOS transistors (about 0.65V in the AMS C35 process), which would enable the PMOS transistors to operate. On the other hand, a large difference between Vdd and a Front-End Sample means more electric charge on  $C_{smp}$ , and consequently more robustness to noise.

Therefore, the voltage swing on  $C_{smp}$  is too large for the transistor MN1 in Figure 7.5 on page 98 to be operating in the small-signal mode, and the operating condition of MN1 should be considered as in a large-signal manner (i.e. the Front-End Sample is not linearly related to the input signal). As the Holding Sample is equal to the Front-End Sample when it is valid, it is not linearly related to the input either.

#### 8.1.2 Linearising Feedback Amplifier (LFA)

To solve this issue, a Linearising Feedback Amplifier (LFA) is presented as shown in Figure 8.1. On the top-left quarter part of the figure, an Input Sampler, which is the core circuit of the Sub-Sampling SHA described in Chapter 7, samples the input signal and provides a *Holding Sample*. This *Holding Sample* is fed to the negative input terminal of a high-gain amplifier (the Buffer in Figure 8.1). The output of the Buffer is fed back to a clone of the Input Sampler, the Feedback Sampler. The *Holding Sample* of the Feedback Sampler is connected to the positive input terminal of the Buffer.

This structure is similar to an Operational-Amplifier (OpAmp)-based source follower, except that two Sub-Sampling SHAs are inserted. It should be noted that the input is connected to the positive input terminal of the OpAmp in the case of a normal source follower, whereas it is connected to the other terminal in Figure 8.1, because the Sub-Sampling SHA has a negative gain. So the input is effectively connected to a positive terminal of the entire "amplifier" including the SHAs and the Buffer.

Similar to an OpAmp-based source follower, the *Holding Samples* of both samplers are almost equal as long as the Buffer is in its linear operating region. Therefore the output of the Buffer is almost equal to the voltage of the input.



Figure 8.1: Linearising Feedback Amplifier

More precisely, the output of the Buffer is almost equal to the average voltage of the input during the time when the *Front-End Sample* is being acquired.

Therefore the output of the Buffer keeps a significantly linear relationship with the input. The output voltage here can be termed as *Linearised Holding Sample*.

#### 8.1.3 Low-bandwidth buffer for avoiding oscillation

As the Sub-Sampling SHA samples the input at a finite rate $f_0 = 82MHz$, there is a delay between the input and the Front-End Sample. Hence the Holding Sample of the Feedback Sampler in Figure 8.1 lags behind the Buffer's output. If the Buffer responds to its input too quickly, instability could occur and the circuit will oscillate.

For example, assume that the Buffer has a wide bandwidth (quick response) and a high gain. If the *Holding Sample* of the Input Sampler is slightly higher than that of the Feedback Sampler, the Buffer will instantly give a large negative output voltage. When the Feedback Sampler acquires this voltage at the next sampling pulse, the *Holding Sample* of the Feedback Sampler may be higher than that of the Input Sampler. Consequently, the Buffer gives a large positive output voltage. Similarly, an opposite situation will happen in the next sampling cycle, and so on. The oscillation frequency is half the sampling rate.



Figure 8.2: Feedback loop in LFA

To inspect this issue quantitatively, a block diagram containing only the feedback loop, as shown in Figure 8.2, is investigated. As the Feedback Sampler operates at discrete time points, it is characterized in the z-domain ($H_{SHA}(z)$). On the other hand, the Buffer is an analogue amplifier, and is therefore characterized in the s-domain (G(s)). If this circuit is stable, so is the LFA.

#### Feedback Sampler

As mentioned in Section 7.3, the charge-domain sampler (MN2 and $C_{smp}$ in Figure 8.2) has a frequency response $H_c(f)$ which, based on Equation (7.2), is

$$H_c(f) = \frac{g_t}{C_{smp}\pi f}\sin(\pi t_{sw}f)$$

where $g_t$ is the small-signal trans-conductance of MN2 in this case. The feedback loop operates in base-band frequency (less than 1MHz), which is significantly lower than $1/t_{sw}$, the reciprocal of the switch-on time of the differential switches (at least several gigahertz). In this range $\sin(\pi t_{sw}f) \approx \pi t_{sw}f$, so $H_c(f)$ can be approximated by

$$H_c = \frac{g_t t_{sw}}{C_{smp}} \tag{8.1}$$

which is a constant.

Every time MP2 in Figure 8.2 is switched on, the voltages on  $C_{smp}$  and  $C_{hld}$  are averaged. As the total amount of electrical charge is unchanged,

$$zV_{hld}(C_{smp} + C_{hld}) = V_{smp}C_{smp} + V_{hld}C_{hld}$$

where  $V_{smp}$  and  $V_{hld}$  are the initial voltages on the corresponding capacitors before MP2 switches on, and  $zV_{hld}$  is the final voltage after MP2 switches on. Therefore the transfer function of the Feedback Sampler in z-domain is

$$H_{SHA}(z) = \frac{V_{hld}}{V_{in}} = \frac{H_c C_{smp} z^{-1}}{C_{smp} + C_{hld} - C_{hld} z^{-1}} \tag{8.2}$$

where  $H_c$  is defined in Equation (8.1).
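The charge-sharing recursion behind Equation (8.2) can be checked numerically. The following is a minimal sketch, assuming a constant voltage on $C_{smp}$ before each transfer and normalised capacitances with $C_{hld}/C_{smp} = 25$; the function names are illustrative only and do not come from the design.

```python
def hold_voltage_trace(v_smp, c_smp, c_hld, n_cycles, v_hld0=0.0):
    """Iterate the charge-sharing update each time MP2 switches on."""
    v_hld = v_hld0
    trace = [v_hld]
    for _ in range(n_cycles):
        # Charge conservation:
        # v_new * (c_smp + c_hld) = v_smp * c_smp + v_hld * c_hld
        v_hld = (v_smp * c_smp + v_hld * c_hld) / (c_smp + c_hld)
        trace.append(v_hld)
    return trace


def closed_form(v_smp, c_smp, c_hld, n):
    """Step response implied by the single pole at z = c_hld / (c_smp + c_hld)."""
    a = c_hld / (c_smp + c_hld)
    return v_smp * (1.0 - a ** n)


# Assumed values: unit step on C_smp, C_hld / C_smp = 25, 200 sampling cycles.
trace = hold_voltage_trace(v_smp=1.0, c_smp=1.0, c_hld=25.0, n_cycles=200)
```

With these assumed values the Holding Sample approaches the sampled voltage gradually, which is the low-pass behaviour described by the pole of Equation (8.2).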

#### Stability analysis with a high-gain high-bandwidth Buffer

In base-band frequency range, the ideal high-gain high-bandwidth Buffer can be assumed to have a frequency-independent, constant high gain, G. According to Figure 8.2 and Equation (8.2), and taking $H_c = 1$,

$$\frac{V_o}{V_i} = \frac{-G}{1 + GH_{SHA}(z)} = \frac{-G\left(1 - \frac{C_{hld}}{C_{smp} + C_{hld}}z^{-1}\right)}{1 + \frac{GC_{smp} - C_{hld}}{C_{smp} + C_{hld}}z^{-1}}$$

Therefore the denominator has a root at $z = -\frac{GC_{smp}-C_{hld}}{C_{smp}+C_{hld}}$. As the Buffer is an ideal OpAmp, G tends to positive infinity, so the magnitude of this root is larger than 1. In z-domain, this means the circuit is unstable.
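The position of this root can be evaluated for finite gains as well. The sketch below solves $1 + GH_{SHA}(z) = 0$ using Equation (8.2); the helper names are hypothetical, and treating the 41dB DC gain of the Buffer as a frequency-independent gain is an assumption made only for illustration.

```python
def closed_loop_pole(gain, c_smp, c_hld, h_c=1.0):
    """Root of 1 + G * H_SHA(z) = 0 with H_SHA taken from Equation (8.2)."""
    # (c_smp + c_hld) - c_hld / z + gain * h_c * c_smp / z = 0
    return (c_hld - gain * h_c * c_smp) / (c_smp + c_hld)


def is_stable(gain, c_smp, c_hld, h_c=1.0):
    """A discrete-time loop is stable when the pole lies inside the unit circle."""
    return abs(closed_loop_pole(gain, c_smp, c_hld, h_c)) < 1.0


def critical_gain(c_smp, c_hld, h_c=1.0):
    """Gain at which the pole reaches z = -1, i.e. oscillation at f0 / 2."""
    return (c_smp + 2.0 * c_hld) / (h_c * c_smp)


# Assumed values: C_hld / C_smp = 25 and a wide-band gain of 112 (41 dB).
pole_wideband = closed_loop_pole(112.0, 1.0, 25.0)
```

Under these assumptions the pole sits on the negative real axis outside the unit circle, which corresponds to the alternating-sign oscillation at half the sampling rate described above.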

#### Using a high-gain low-bandwidth Buffer to avoid oscillation

There are two ways of avoiding the oscillation of the circuit. One is to reduce the Buffer's gain, the other is to reduce its bandwidth. The former method is unsuitable, as it degrades the linearising function. Thus the latter should be applied. In this case, the transfer function of the Buffer is no longer a constant G, but G(s). So the open-loop gain of the circuit in Figure 8.2 is

$$G_{OL}(s) = G(s)H_{SHA}(s) \tag{8.3}$$

where  $H_{SHA}(s)$  is the transfer function of the Feedback Sampler in s-domain. Naturally, Equation (8.2) needs to be converted to s-domain.

The accurate conversion is $z = e^{sT}$, where $T = 1/f_0$ is the sampling period [55, 56]. In digital filter design, a bi-linear approximation $z = \frac{2+sT}{2-sT}$ is applied rather than $z = e^{sT}$, so that the transfer functions can usually remain rational [56]. However, this approximation is fairly accurate only if the frequency being investigated is much smaller than the sampling rate $f_0$ [56]. Unfortunately, the oscillation of the circuit in Figure 8.2 occurs at $f_0/2$, which cannot be considered as "much smaller than $f_0$". So the bi-linear approximation is not applicable.
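The size of the error introduced by the bi-linear approximation can be illustrated by comparing the phase of both mappings on the $j\omega$ axis. This is a small sketch using only the sampling rate $f_0 = 82MHz$ from the text; the function names are illustrative.

```python
import cmath
import math

F0 = 82e6          # sampling rate of the Sub-Sampling SHA
T = 1.0 / F0       # sampling period


def z_exact(f):
    """Exact mapping z = exp(sT) evaluated at s = j*2*pi*f."""
    return cmath.exp(2j * math.pi * f * T)


def z_bilinear(f):
    """Bi-linear approximation z = (2 + sT) / (2 - sT) at s = j*2*pi*f."""
    s_t = 2j * math.pi * f * T
    return (2 + s_t) / (2 - s_t)


def phase_deg(z):
    return math.degrees(cmath.phase(z))


# Phase of both mappings at base-band (1 MHz) and at half the sampling rate.
low_error = abs(phase_deg(z_exact(1e6)) - phase_deg(z_bilinear(1e6)))
high_error = abs(abs(phase_deg(z_exact(F0 / 2))) - phase_deg(z_bilinear(F0 / 2)))
```

At 1MHz the two mappings agree closely, whereas at $f_0/2$ the exact mapping gives a phase of $180^{\circ}$ and the bi-linear form gives $2\arctan(\pi/2)$, which is far short of it.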

Applying  $z = e^{sT}$  to Equation (8.2):

$$H_{SHA}(s) = \frac{H_c C_{smp} e^{-sT}}{C_{smp} + C_{hld} - C_{hld} e^{-sT}}$$

According to Equation (8.3)

$$G_{OL}(s) = \frac{G(s)H_cC_{smp}e^{-sT}}{C_{smp} + C_{hld} - C_{hld}e^{-sT}} \tag{8.4}$$

Considering the frequency response, i.e. applying $s=2j\pi f$, the term $e^{-sT}$ adds an extra phase delay to $G_{OL}$, which depends on the frequency, and reaches $180^{\circ}$ at $f = 1/(2T)$. So $|G_{OL}|$ might be larger than 1 when its phase delay reaches $180^{\circ}$. This is the fundamental reason why the LFA is possibly unstable.

To avoid the oscillation, G(s) must provide additional phase margin to the open-loop gain, i.e. its first pole needs to move to a lower frequency. As the Buffer needs a high gain to keep the functionality of the LFA, the desired Buffer should be a high-gain low-bandwidth OpAmp. Moreover, using a low-bandwidth amplifier not only prevents the oscillation, but also reduces the noise power.

#### 8.1.4 Implementation of high-gain low-bandwidth Buffer

To significantly limit the bandwidth while still keeping a high gain, a differential-input single-ended-output amplifier is presented as shown in Figure 8.3(a).



Figure 8.3: High-Gain Low-Bandwidth Buffer

The main difference between this and a normal amplifier is that two resistors ($R_1$ and $R_2$) are inserted into the output stage (transistors MN4 and MP4). This structure is inspired by the Widlar Current Source, which is a small-current source as shown in Figure 8.3(b) [53, 57]. The Widlar Current Source provides a very small output current as well as a large output resistance. Therefore, together with the capacitor $C_L$, the output resistance gives a pole at a quite low frequency, which dominates the bandwidth of the presented amplifier. Figure 8.4 illustrates its AC-simulation results in Agilent ADS. The gain near DC is 41dB and its 3dB attenuation point is approximately 30kHz. Its CMRR (Common-Mode Rejection Ratio) is 63dB.



Figure 8.4: AC simulation results of the presented high-gain low-bandwidth Buffer

#### 8.1.5 Stability analysis of LFA

According to Figure 8.4, the second pole of the Buffer is at approximately 430MHz, and the other poles are much higher than the sampling frequency $f_0$ and can be ignored. Thus the Buffer can be modelled as

$$G(s) = \frac{G_0}{(1 + s/2\pi f_{p1})(1 + s/2\pi f_{p2})}$$

where  $G_0$  is its gain near DC (41dB, or 112),  $f_{p1}$  and  $f_{p2}$  are the first two poles (30kHz and 430MHz). According to Equation (8.4), the open-loop gain is

$$G_{OL}(s) = e^{-sT} \frac{G_0 H_c C_{smp}}{(C_{smp} + C_{hld} - C_{hld} e^{-sT})(1 + s/2\pi f_{p1})(1 + s/2\pi f_{p2})} \tag{8.5}$$

 $G_0$ ,  $f_{p1}$  and  $f_{p2}$  are from the amplifier, whilst  $H_c$ ,  $C_{smp}$  and  $C_{hld}$  are determined by the Sub-sampling SHA ( $H_c = 1$ ,  $C_{hld}/C_{smp} = 25$ ). The Bode Diagram of the open-loop gain is shown in Figure 8.5.

According to this Bode Diagram, the open-loop circuit has a phase margin of 21°, which means the feedback circuit in Figure 8.2 on page 109 is stable.



Figure 8.5: Bode Diagram of Equation (8.5)

Consequently, the presented Linearising Feedback Amplifier is stable, and its closed-loop bandwidth is 1.3MHz.
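The phase margin can be cross-checked by evaluating Equation (8.5) directly on the $j\omega$ axis. The sketch below uses the parameter values quoted in this sub-section; the bisection search and all function names are illustrative assumptions and are not the method used to produce Figure 8.5.

```python
import cmath
import math

F0 = 82e6                 # sampling rate
T = 1.0 / F0
G0 = 112.0                # DC gain of the Buffer (41 dB)
FP1, FP2 = 30e3, 430e6    # first two poles of the Buffer
H_C = 1.0
C_SMP, C_HLD = 1.0, 25.0  # only the ratio C_hld / C_smp = 25 matters


def open_loop(f):
    """Open-loop gain of Equation (8.5) at s = j*2*pi*f."""
    s = 2j * math.pi * f
    delay = cmath.exp(-s * T)
    sampler = H_C * C_SMP * delay / (C_SMP + C_HLD - C_HLD * delay)
    buffer_gain = G0 / ((1 + s / (2 * math.pi * FP1)) * (1 + s / (2 * math.pi * FP2)))
    return buffer_gain * sampler


def unity_gain_frequency(f_lo=1e3, f_hi=F0 / 2):
    """Bisect on a log scale; |G_OL| falls monotonically over this range."""
    for _ in range(200):
        f_mid = math.sqrt(f_lo * f_hi)
        if abs(open_loop(f_mid)) > 1.0:
            f_lo = f_mid
        else:
            f_hi = f_mid
    return math.sqrt(f_lo * f_hi)


def phase_margin_deg(f_c):
    """Distance of the open-loop phase from -180 degrees at the crossover."""
    return 180.0 + math.degrees(cmath.phase(open_loop(f_c)))


f_cross = unity_gain_frequency()
margin = phase_margin_deg(f_cross)
```

For these values the sketch gives a crossover a little above 1MHz and a margin of roughly $20^{\circ}$, in line with the figures read from the Bode Diagram.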

Increasing the capacitance of  $C_L$ , or the resistance of  $R_1$  and  $R_2$  in Figure 8.3 on page 112, would move the pole to a lower frequency, and provide more phase margin. However, either larger capacitance or larger resistance needs more chip area. More importantly, narrower bandwidth results in slow response to the input. Therefore more time is needed to obtain a *Linearised Holding Sample*.

A long total measuring time may be a potential issue for O-SAM, as the properties of the material being measured may change due to the environment during a long measurement, i.e. the material may differ between the beginning and the end of the measurement. Moreover, the change of the material properties with time is itself a topic to be investigated in O-SAM. Only a quick response of the measuring circuit can ensure that the change of the properties is accurately monitored.

## 8.2 Frequency Response and Compensating Filter

As mentioned in Section 7.3 on page 96, the strategy of charge-domain sampling does not obtain a genuine sample, but an integration of the input signal over a short period of time,  $t_{sw}$  (the turn-on time of the sampler's switch). This section discusses this effect in detail, and presents a compensating FIR filter to correct this.

#### 8.2.1 Integration effect of sampling capacitor

As mentioned in Section 7.3 on page 96, one disadvantage of charge-domain sampling is that the output is not a "real" sample of the input, but an integration of the input over a total time of  $t_{sw}$ , i.e.

$$V_{out}(T_s) = \int_{T_s - \frac{t_{sw}}{2}}^{T_s + \frac{t_{sw}}{2}} \frac{V_{in}g_t}{C_{smp}} dt \tag{8.6}$$

where  $T_s$  is the moment when the sampling is performed,  $t_{sw}$  is the turn-on time of the sampler's switches,  $V_{in}$  is the input signal,  $g_t$  is the gain of the transconductance amplifier, and  $C_{smp}$  is the sampling capacitance<sup>1</sup>. If  $t_{sw}$  is not short enough to be idealised, this integration effect will degrade high-frequency information.

Applying the setting in Equation (7.1) on page 94 to (8.6),

$$\begin{aligned} V_{out}(T_s) &= \frac{g_t}{C_{smp}} \int_{T_s - \frac{t_{sw}}{2}}^{T_s + \frac{t_{sw}}{2}} \sum_n A_n \cos(2\pi n f_0 t + \phi_n) dt \\ &= \frac{g_t}{C_{smp}} \sum_n \left(A_n \cos(2\pi n f_0 T_s + \phi_n)\right) \left(\frac{1}{\pi n f_0} \sin(\pi n f_0 t_{sw})\right) \end{aligned} \tag{8.7}$$

<sup>1</sup>The detailed definition of these parameters can be found in Section 7.3 on page 96

In Equation (8.7), each item of $\sum_{n}$ is a harmonic of the input multiplied by a function of $t_{sw}$. Therefore the frequency response H(f) of the presented SHA can be easily derived:

$$H(f) = \frac{V_{out}(f)}{V_{in}(f)} = \frac{g_t}{C_{smp}\pi f}\sin(\pi t_{sw}f) \tag{8.8}$$

This frequency response is due to the integration effect of the charge-domain sampling structure as mentioned in Section 7.3 on page 96. It was first introduced by S. Karvonen et al. to reduce noise in 2001 [54, 52]. However, in the case of measurement in our system, the non-uniform frequency response results in extra attenuation of the high-frequency harmonics of the input.



Figure 8.6: Idealised circuit for charge-domain sampling

For example, if the circuit obtaining *Front-End Sample* is idealised as that in Figure 8.6<sup>2</sup>, the frequency response near DC is  $\frac{g_t t_{sw}}{C_{smp}}$  according to Equation (8.8). Define the normalised frequency response as

$$H_{norm}(f) = H(f) \frac{C_{smp}}{g_t t_{sw}} = \frac{\sin(t_{sw} \pi f)}{t_{sw} \pi f}$$

Figure 8.7 shows  $H_{norm}(f)$  when  $t_{sw}$  is equal to 191ps and 95.3ps. These two switch-on times correspond to  $\frac{1}{64}$  and  $\frac{1}{128}$  of the period of the input signal of our OSAM system ( $\frac{1}{82MHz}=12.2ns$ ), respectively. As shown in the figure, the frequency response gradually decreases as the frequency increases from DC to the high-frequency range. The longer the switch-on time  $(t_{sw})$ , the worse the high-frequency performance. The first zero point for  $t_{sw}=191ps$  is at 5.2GHz, and that of  $t_{sw}=95.3ps$  is at 10.5GHz.
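The quoted zero points follow directly from the normalised response. A minimal sketch, using only $f_0 = 82MHz$ and the two switch-on times from the text (the names are illustrative):

```python
import math

F0 = 82e6
T_SW_LONG = 1.0 / (64 * F0)    # about 191 ps
T_SW_SHORT = 1.0 / (128 * F0)  # about 95.3 ps


def h_norm(f, t_sw):
    """Normalised charge-domain response sin(pi*t_sw*f) / (pi*t_sw*f)."""
    x = math.pi * t_sw * f
    return 1.0 if x == 0.0 else math.sin(x) / x


def first_zero(t_sw):
    """The first zero of the response lies at f = 1 / t_sw."""
    return 1.0 / t_sw


def attenuation_db(f, t_sw):
    """Attenuation relative to DC, in dB."""
    return -20.0 * math.log10(abs(h_norm(f, t_sw)))
```

For example, `attenuation_db(2.5e9, T_SW_LONG)` and `attenuation_db(2.5e9, T_SW_SHORT)` compare the loss of a 2.5GHz harmonic for the two switch-on times.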

This integration effect caused by the charge-domain sampling is not negligible for a measurement application, and must be compensated by following circuits.

<sup>2</sup>The detailed definitions of the parameters in the figure are the same as those in Section 7.3 on page 96



Figure 8.7: Normalised frequency response of charge-domain sampling

In the presented DAQ system for O-SAM, this is achieved by an FIR digital filter, which is described in detail at Sub-Section 8.2.3.

Moreover, high-frequency information at the input suffers from a poor Signal-to-Noise Ratio (SNR), as it cannot get a gain as large as that at low frequencies. This cannot be compensated by the FIR (actually, it becomes worse after compensation). Consequently, a shorter $t_{sw}$ is preferred for high-frequency sampling as it gives a flatter frequency response.

#### 8.2.2 Aperture Window Effect

In Sub-Section 8.2.1, only the integration effect due to charge-domain sampling was considered. However, the frequency response of the Sub-Sampling SHA is also affected by the non-ideal nature of the switches. The switches need time to stabilise in either the "on" or "off" state. Moreover, the control signals of the switches, which are generated by the pulse generators presented in Chapter 5, are not perfect rectangular pulses.

To simplify the modelling of these imperfections, it can be considered that the input signal is multiplied by a *Virtual Pulse*, or an "Aperture Window", P(t).

P(t) is 0 during non-sampling time. When the switches turn on, P(t) gradually rises to 1. When one switch turns off, P(t) gradually falls to 0 again. The ideal condition in Sub-Section 8.2.1 is the special case in which the waveform of P(t) is a rectangular pulse. Therefore, Equation (8.6) becomes

$$V_{out}(t) = \int_{T} \frac{V_{in}P(t)g_t}{C_{smp}}dt \tag{8.9}$$

where T is the period of the input signal and sampling pulses. The definitions of other parameters are the same as those for Equation (8.6). According to (8.9)<sup>3</sup>,

$$V_{out}(t) = \frac{g_t}{C_{smp}} V_{in}(t) * P(T - t)$$

where \* is the symbol of convolution integration. Consequently in frequency domain,

$$V_{out}(f) = \frac{g_t}{C_{smp}} V_{in}(f) P^*(f)$$

where  $P^*(f)$  is the conjugate of the frequency-domain function of P(t). So the frequency response of the Front-End Sample to the input  $(H_{FE})$  is

$$H_{FE}(f) = \frac{g_t}{C_{smp}} P^*(f)$$

According to the mechanism of the presented Sub-Sampling SHA, the transfer function from Front-End Sample to Holding Sample and Linearised Holding Sample is at base-band. Holding Samples always keep the same value as Front-End Samples, while Linearised Holding Samples undo the non-linear effect. Therefore, Target Samples, which are a set of Linearised Holding Samples, have the same frequency response as $H_{FE}$, i.e. the over-all Frequency Response, $H_{SHA}$, is

$$H_{SHA}(f) = \frac{g_t}{C_{smp}} P^*(f) \tag{8.10}$$

P(t) is a virtual pulse. It is impractical to compute P(t) theoretically, and consequently so is $H_{SHA}$. However, as the input signal is periodical, $H_{SHA}(f)$ is valid only when f is an integer multiple of the fundamental frequency, $f_0$ (in the case of the presented system, 82MHz). For a system with limited bandwidth, this means a few discrete values. For example, with a bandwidth of 5GHz, $H_{SHA}$ has 62 values from f=0 to $f=61f_0$.

<sup>3</sup>Strictly, the convolution operator is defined in the integration range from $-\infty$ to $+\infty$. But as $V_{in}$ and P(t) are periodical functions of T, the following deduction is still valid.

Therefore, $H_{SHA}$ can be measured by the following method: a set of sinusoidal signals, whose frequencies are all integer multiples of the fundamental frequency $f_0$, is applied to the input one at a time, and the response of the circuit is measured. It must be noted that the output response occurs at base-band, rather than at RF.

#### Illustration of the Aperture Window Effect

To illustrate the Aperture Window effect, a number of transient simulations for the Sub-Sampling SHAs (the circuit in Figure 8.1 on page 108) have been performed, and the response has been measured using the method above.

In these simulations, trapezoidal pulse waves are applied as the control signals of the switches (Ap, An, Bp, Bn, and Cn). The rise and fall times of these signals are set to 60ps, which is a typical value in the circuits designed in Chapter 5. The timing of the control signals is described in Figure 7.9 on page 103 and Figure 7.10 on page 104.



Figure 8.8: Frequency response of proposed circuit in simulation

Figure 8.8 shows the simulation results of the 2.6GHz and 10.5GHz Sub-Sampling SHA, compared with the ideal charge-domain samplers (the sinc-type filters). As mentioned on Page 119, the frequency responses of the SHAs are sets of discrete values. These values are worse than those of the ideal charge-domain samplers, because there is not only the integration effect, but also the aperture window effect.

It should be noted that the real frequency response of these circuits will be quite different to Figure 8.8, as the real control signals are not exactly the same as those in the simulations, i.e. trapezoidal pulses with 60ps rising and falling times.
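The qualitative influence of finite edges on the response can be sketched with a purely hypothetical Aperture Window. The example below assumes a trapezoidal P(t) with 60ps linear edges and a width of 95.3ps at half amplitude, and compares its response at the harmonics of $f_0$ with that of a rectangular window of the same width. The real P(t) of the circuit is not known and is not claimed to have this shape.

```python
import cmath
import math

F0 = 82e6
T_RISE = 60e-12               # edge duration (from the simulation settings)
T_HALF = 1.0 / (128 * F0)     # assumed width at half amplitude, about 95.3 ps


def trapezoid(t):
    """Hypothetical window centred on t = 0 with linear edges."""
    edge_mid = T_HALF / 2
    a = abs(t)
    if a <= edge_mid - T_RISE / 2:
        return 1.0
    if a >= edge_mid + T_RISE / 2:
        return 0.0
    return (edge_mid + T_RISE / 2 - a) / T_RISE


def rectangle(t):
    """Ideal window of Sub-Section 8.2.1."""
    return 1.0 if abs(t) <= T_HALF / 2 else 0.0


def normalised_response(window, k, steps=4000, span=200e-12):
    """|P(k*f0)| / P(0) by midpoint-rule integration over [-span, span]."""
    dt = 2 * span / steps
    num, den = 0j, 0.0
    for i in range(steps):
        t = -span + (i + 0.5) * dt
        w = window(t)
        num += w * cmath.exp(-2j * math.pi * k * F0 * t)
        den += w
    return abs(num) / den
```

Under this assumption the trapezoidal window attenuates every harmonic more strongly than the rectangular one, which is the same tendency as in Figure 8.8.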

#### 8.2.3 Compensating FIR Filter

The digital FIR filter after the A/D converter (as shown in Figure 7.1 on page 94) can be applied to compensate both Integration Effect (described in Sub-Section 8.2.1 on page 115), and Aperture Window Effect due to P(t) (described in Sub-Section 8.2.2 on page 117).

To represent the input signal, the frequency response of the FIR,  $H_{FIR}$ , should make the over-all frequency response of the whole system flat, i.e.

$$H_{SHA}(f)H_{FIR}(f) = \text{Constant}$$

According to Equation (8.10) on Page 118,  $H_{FIR}(f)$  can be set to:

$$H_{FIR}(f) = \frac{1}{H_{SHA}(f)} = \frac{C_{smp}}{P^*(f)g_t} \tag{8.11}$$

As mentioned in Sub-Section 8.2.2 on page 117,  $H_{SHA}(f)$  can be measured by experiment, and therefore  $H_{FIR}(f)$  can be determined.
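One possible way of turning Equation (8.11) into filter coefficients is sketched below. It assumes a 128-tap filter applied circularly over one period of the Target Samples, and it uses the ideal normalised sinc response with $t_{sw} = T/128$ as a stand-in for the measured $H_{SHA}$. Both choices, and all names, are assumptions for illustration; the measured response would replace `h_sha` in practice.

```python
import cmath
import math

N = 128  # samples per period of the input


def dft(x, inverse=False):
    """Plain O(N^2) DFT; the inverse includes the 1/N factor."""
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out


def h_sha(m):
    """Stand-in for the measured response at harmonic m (ideal sinc)."""
    x = math.pi * m / N
    return 1.0 if m == 0 else math.sin(x) / x


def fir_taps():
    """Taps whose DFT equals 1 / H_SHA at every harmonic; bin 64 is set to 0."""
    signed = lambda k: k if k < N // 2 else k - N
    spectrum = [0.0 if k == N // 2 else 1.0 / h_sha(signed(k)) for k in range(N)]
    return [c.real for c in dft(spectrum, inverse=True)]


def circular_convolve(x, h):
    return [sum(x[m] * h[(n - m) % N] for m in range(N)) for n in range(N)]


def harmonic_sum(scale):
    """Test signal with harmonics 5 and 40, each weighted by scale(m)."""
    return [scale(5) * math.cos(2 * math.pi * 5 * n / N)
            + 0.3 * scale(40) * math.cos(2 * math.pi * 40 * n / N + 0.7)
            for n in range(N)]


v_in = harmonic_sum(lambda m: 1.0)   # signal at the input
v_out = harmonic_sum(h_sha)          # what the assumed sampler would deliver
v_fir = circular_convolve(v_out, fir_taps())
```

In this idealised setting the filtered samples `v_fir` coincide with `v_in`, since the over-all response becomes flat at every harmonic that is present.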

## 8.3 System errors due to 4-phase clock source

In the 10.5GS/s Sub-Sampling SHA, there are some system errors in the output signal due to the 4-phase clock source. As the 2.6GHz Sub-Sampling SHA uses a single-phase clock source, this kind of error does not occur in that SHA.

#### 8.3.1 System errors on DC operating points and frequency responses

As mentioned in Sub-section 8.2.2 and Equation (8.9), the voltage of the *Front-End Sample* is

$$V_{out}(t) = \int_{T} \frac{V_{in}P(t)g_t}{C_{smp}}dt$$

According to this equation, any change on the Virtual Pulse (P(t)) would lead to two kinds of system errors: the first and obvious one is a change to the frequency response of the sampler, i.e. $H_{SHA}(f)$; the second is a change to the DC operating point of $V_{out}$, i.e. $V_{out}$ when there is no AC input.

This is not an issue for the 2.6GHz Sub-Sampling SHA, because it has only one clock signal and P(t) does not change. But the 10.5GHz Sub-Sampling SHA uses the four different phase outputs of the 2.624GHz clock synthesiser to trigger the control pulses. Consequently there are four different types of Virtual Pulses.

In the design of the pulse generator (Chapter 5), the clock and pulse signals are routed and buffered carefully to make all types of pulses as identical as possible. However, there is always some inevitable asymmetry in the chip layout, especially in the generation of the clocks $\phi_0 \sim \phi_3$ from the switch box (detail in Section 5.3 on page 70), as well as process variation. This asymmetry results in a slight difference in the output pulses of the DDU when the *Clock Type* (defined in Section 5.3 on page 70) is changed. Therefore P(t) will also change depending on the pulses. The difference is mainly on the rising and falling edges of P(t). This is because the asymmetry in the layout results in different parasitic capacitance and resistance, which affect the transition times of the signals, not their final stable states.

For example, in the presented 10.5GHz Sub-Sampling Sampler, the clock is 2.624GHz, i.e. 381ps per period. The expected pulse width of P(t) in the ideal case (i.e. ignoring circuit delay) is one fourth of the clock period, 95.3ps. On the other hand, the transition times of the clock and pulse signals are typically around 60ps, and P(t)'s transition time cannot be shorter. Therefore, the transition time of P(t), including the rising and falling edges, takes a large portion of the sampling pulse<sup>4</sup>. Any difference on the transition time, which comes from the asymmetry of the layout of Switch Box, will cause differences to P(t). As a result, the DC operating points of Front-End Sample, $V_{out}$, have different values in different Clock Types, and so does the frequency response of the Sampler, $H_{SHA}(f)$.

#### 8.3.2 Precise solution

To overcome this issue, the difference among the *Clock Types* needs to be calibrated. This sub-section presents a solution to precisely calibrate this error.

The 10.5GS/s Sub-Sampling SHA obtains 128 samples in total. But because of the 4-phase clock error, the system effectively gets 4 sets of 32 samples. Each set can be considered as 32-point sampling data without the 4-phase clock errors. However, 32-point sampling cannot satisfy the Nyquist criterion. Although the 32 points of data contain all the frequency information of the input, the frequencies alias onto each other at the output. For example, the harmonics $f_0$, $31f_0$, $33f_0$ and $63f_0$ will all alias to $f_0$ in a 32-point sampling system.
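The aliasing of these harmonics can be confirmed with a few lines of code. The sketch below samples zero-phase cosines at 32 points per period (for sines, the harmonics $31f_0$ and $63f_0$ would appear with inverted sign); the function names are illustrative.

```python
import math


def sample_harmonic(k, points=32):
    """Samples of cos(2*pi*k*f0*t) taken at `points` instants per period."""
    return [math.cos(2 * math.pi * k * n / points) for n in range(points)]


def indistinguishable(k1, k2, points=32, tol=1e-9):
    """True when two harmonics give the same sample values."""
    a = sample_harmonic(k1, points)
    b = sample_harmonic(k2, points)
    return all(abs(x - y) < tol for x, y in zip(a, b))


aliases_of_f0 = [k for k in range(1, 64) if indistinguishable(1, k)]
```

With 128 points per period the same harmonics remain distinguishable, e.g. `indistinguishable(1, 31, points=128)` is false.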

This precise solution is to exploit these four sets of 32-point aliasing data to extract a new set of 128-point data without frequency aliasing and the 4-phase clock errors. The following is the proof of this solution.

#### Discretisation of Virtual Pulses

Clock Types 0, 1, 2 and 3 generate four different Virtual Pulses,  $P_0$ ,  $P_1$ ,  $P_2$ , and  $P_3$ , respectively. The input is  $V_{in}$ , and the output (Target Samples) is  $V_{out}$ .

<sup>4</sup>Consequently, the effective sampling pulse width is wider than 95.3ps. This is the *Aperture Window Effect* discussed in Sub-Section 8.2.2.

The aim of this solution is to determine  $V_{in}$  as precisely as possible from  $V_{out}$  and the pre-measured  $P_0 \sim P_3$ .

 $V_{out}$  has 128 samples for one period. In discrete domain, these samples are defined as

$$V_{out}(n), \quad n = 0, 1, \dots, 127$$

The final calibrated results,  $V_{cal}$ , should have 128 samples as well:

$$V_{cal}(n), \quad n = 0, 1, \dots, 127$$

$V_{cal}$ should match $V_{in}$ as closely as possible.



Figure 8.9: 4 different Virtual Pulses applied to Target Samples $V_{out}$

Without loss of generality, it is assumed that $P_0$ is applied on those samples $n = 0, 4, 8, ..., 124$, $P_1$ is applied on $n = 1, 5, 9, ..., 125$, $P_2$ is applied on $n = 2, 6, 10, ..., 126$, and $P_3$ is applied on $n = 3, 7, 11, ..., 127$, as shown in Figure 8.9.

Virtual Pulses are of course continuous signals, but their equivalent in the discrete frequency domain can be defined as

$$\mathcal{D}_{z}(k) = \begin{cases} P_{z}(kf_{0}) & k = 0, 1, ..., 63 \\ 0 & k = 64 \\ P_{z}((k-128)f_{0}) & k = 65, 66, ..., 127 \end{cases}$$

where $z = 0, 1, 2, \text{ or } 3$, and $P_z(f)$ is the Fourier Transform of the Virtual Pulse $P_z(t)$ in RF band. By applying the IDFT (Inverse Discrete Fourier Transform, $\mathbb{F}^{-1}$) to $\mathcal{D}_z(k)$, a discrete time series, $D_z(n)$, is obtained:

$$D_z(n) = \mathbb{F}^{-1} \left[ \mathcal{D}_z(k) \right]$$

This is the discretised form of the *Virtual Pulses*, as illustrated in Figure 8.10.



Figure 8.10: Discretisation of Virtual Pulses

So if  $V_{in}(t)$  is ideally discretized to  $V_{in}(n)$ , the convolution in discrete domain,

$$V_{out}(n) = V_{in}(n) * D_z(n)$$

is equivalent to sampling $V_{in}(t)$ with the *Virtual Pulse* $P_z(t)$ in the continuous-time domain. In this equation, it is assumed that only one type of *Virtual Pulse* is applied to all samples. (In reality, there are four different types.)

#### **Output Groups**

Similar to Virtual Pulses,  $V_{out}(n)$  can be divided into four Output Groups, as illustrated in Figure 8.11:

• Group 0:
$$V_{o0}(n) = \begin{cases} V_{out}(n) & n = 0, 4, ..., 124 \\ 0 & n = others \end{cases}$$

• Group 1:
$$V_{o1}(n) = \begin{cases} V_{out}(n) & n = 1, 5, ..., 125 \\ 0 & n = others \end{cases}$$

• Group 2:
$$V_{o2}(n) = \begin{cases} V_{out}(n) & n = 2, 6, ..., 126 \\ 0 & n = others \end{cases}$$

• Group 3:
$$V_{o3}(n) = \begin{cases} V_{out}(n) & n = 3, 7, ..., 127 \\ 0 & n = others \end{cases}$$

Figure 8.11: Output Groups of SHA output

If defining an ideal pulse series

$$Q(n) = \begin{cases} 1 & n = 0, 4, ..., 124 \\ 0 & n = others \end{cases}$$
 (8.12)

the four groups of $V_{out}$ become

$$\begin{cases} V_{o0}(n) = Q(n) \left( V_{in}(n) * D_0(n) \right) \\ V_{o1}(n) = Q(n-1) \left( V_{in}(n) * D_1(n) \right) \\ V_{o2}(n) = Q(n-2) \left( V_{in}(n) * D_2(n) \right) \\ V_{o3}(n) = Q(n-3) \left( V_{in}(n) * D_3(n) \right) \end{cases}$$

#### Calibration Matrix

Applying DFT (Discrete Fourier Transform,  $\mathbb{F}$ ) on  $V_{o0}(n)$ ,

$$\mathcal{V}_{o0}(k) = \mathbb{F}\left[V_{o0}(n)\right]$$

$$= \mathbb{F}\left[Q(n)\left(V_{in}(n) * D_0(n)\right)\right]$$

$$= \frac{1}{128}\mathcal{Q}(k) * \left(\mathcal{V}_{in}(k)\mathcal{D}_0(k)\right)$$

where $\mathcal{Q}(k)$ and $\mathcal{V}_{in}(k)$ are the DFTs of Q(n) and $V_{in}(n)$, respectively. According to Equation (8.12),

$$\mathcal{Q}(k) = \begin{cases} 32 & k = 0, 32, 64, 96 \\ 0 & k = others \end{cases}$$

So

$$\begin{aligned} \mathcal{V}_{o0}(k) = \frac{1}{4} \big( &\mathcal{V}_{in}(k)\mathcal{D}_{0}(k) + \mathcal{V}_{in}(k \oplus 32)\mathcal{D}_{0}(k \oplus 32) \\ &+ \mathcal{V}_{in}(k \oplus 64)\mathcal{D}_{0}(k \oplus 64) + \mathcal{V}_{in}(k \oplus 96)\mathcal{D}_{0}(k \oplus 96) \big) \end{aligned} \tag{8.13}$$

where  $\oplus$  is *Modulo-128 Add*, i.e.

$$a \oplus b = (a+b) \mod 128$$

Similarly,

$$\begin{aligned} \mathcal{V}_{o1}(k) = \frac{1}{4} \big( &\mathcal{V}_{in}(k)\mathcal{D}_{1}(k) + j\mathcal{V}_{in}(k \oplus 32)\mathcal{D}_{1}(k \oplus 32) \\ &- \mathcal{V}_{in}(k \oplus 64)\mathcal{D}_{1}(k \oplus 64) - j\mathcal{V}_{in}(k \oplus 96)\mathcal{D}_{1}(k \oplus 96) \big) \end{aligned} \tag{8.14}$$

$$\begin{aligned} \mathcal{V}_{o2}(k) = \frac{1}{4} \big( &\mathcal{V}_{in}(k)\mathcal{D}_{2}(k) - \mathcal{V}_{in}(k \oplus 32)\mathcal{D}_{2}(k \oplus 32) \\ &+ \mathcal{V}_{in}(k \oplus 64)\mathcal{D}_{2}(k \oplus 64) - \mathcal{V}_{in}(k \oplus 96)\mathcal{D}_{2}(k \oplus 96) \big) \end{aligned} \tag{8.15}$$

$$\begin{aligned} \mathcal{V}_{o3}(k) = \frac{1}{4} \big( &\mathcal{V}_{in}(k)\mathcal{D}_{3}(k) - j\mathcal{V}_{in}(k \oplus 32)\mathcal{D}_{3}(k \oplus 32) \\ &- \mathcal{V}_{in}(k \oplus 64)\mathcal{D}_{3}(k \oplus 64) + j\mathcal{V}_{in}(k \oplus 96)\mathcal{D}_{3}(k \oplus 96) \big) \end{aligned} \tag{8.16}$$
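The weights $1$, $\pm j$ and $-1$ in Equations (8.13) to (8.16) come from the DFT of the shifted pulse series $Q(n-z)$. They can be reproduced numerically as follows, assuming the DFT convention $\sum_n x(n)e^{-2j\pi kn/128}$; the function names are illustrative.

```python
import cmath
import math

N = 128


def shifted_pulse_train(shift):
    """Q(n - shift): ones at n = shift, shift + 4, ..., zeros elsewhere."""
    return [1.0 if (n - shift) % 4 == 0 else 0.0 for n in range(N)]


def dft_bin(x, k):
    """Single DFT bin of a 128-point sequence."""
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))


def mixing_coefficients(z):
    """Weights on the products V_in * D_z at k, k(+)32, k(+)64 and k(+)96."""
    q = shifted_pulse_train(z)
    # In the circular convolution, bin m of Q pairs with index k - m,
    # and k - m equals k (+) 32r when m = (128 - 32r) mod 128.
    return [dft_bin(q, (N - 32 * r) % N) / N for r in range(4)]


coefficients = [mixing_coefficients(z) for z in range(4)]
```

Each row of `coefficients` corresponds to one Output Group and matches the signs that appear in the respective equation.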



Figure 8.12: Vectorial sum of Output Groups in discrete frequency domain

Figure 8.12 illustrates Equations (8.13) to (8.16) in a vectorial form. In this figure, the vector $VD_n(k)$ is defined as

$$VD_n(k) = \frac{1}{4}\mathcal{V}_{in}(k \operatorname{\mathbf{mod}} 128)\mathcal{D}_n(k \operatorname{\mathbf{mod}} 128)$$

where n = 0, 1, 2, 3. According to Equations (8.13) to (8.16), each Output Group ($\mathcal{V}_{o0}(k) \sim \mathcal{V}_{o3}(k)$) mixes 4 frequency components from the input into 1 frequency component on the output. However, as shown in Figure 8.12, each Output Group mixes the 4 components in different vector phases. Therefore, it is possible to retrieve the original 4 components.

Combining Equations (8.13) to (8.16) together,

$$\begin{bmatrix} \mathcal{V}_{o0}(k) \\ \mathcal{V}_{o1}(k) \\ \mathcal{V}_{o2}(k) \\ \mathcal{V}_{o3}(k) \end{bmatrix} = \frac{1}{4} \begin{bmatrix}
\mathcal{D}_0(k) & \mathcal{D}_0(k \oplus 32) & \mathcal{D}_0(k \oplus 64) & \mathcal{D}_0(k \oplus 96) \\
\mathcal{D}_1(k) & j\mathcal{D}_1(k \oplus 32) & -\mathcal{D}_1(k \oplus 64) & -j\mathcal{D}_1(k \oplus 96) \\
\mathcal{D}_2(k) & -\mathcal{D}_2(k \oplus 32) & \mathcal{D}_2(k \oplus 64) & -\mathcal{D}_2(k \oplus 96) \\
\mathcal{D}_3(k) & -j\mathcal{D}_3(k \oplus 32) & -\mathcal{D}_3(k \oplus 64) & j\mathcal{D}_3(k \oplus 96)
\end{bmatrix} \begin{bmatrix}
\mathcal{V}_{in}(k) \\
\mathcal{V}_{in}(k \oplus 32) \\
\mathcal{V}_{in}(k \oplus 64) \\
\mathcal{V}_{in}(k \oplus 96)
\end{bmatrix}$$

where k=0, 1, ..., 127. But because of the modulo-128 addition, k=0, 1, ..., 31 can include all frequency information. k=32, ..., 127 are redundant, as each $\mathcal{V}_{oz}(k)$ (z=0, 1, 2, 3) has its equivalent in $k=0 \sim 31$. Actually, since $V_{out}(n)$ is divided into four groups ($V_{o0}(n) \sim V_{o3}(n)$), each group has only 32 "real" samples. As a result, each of their frequency forms ($\mathcal{V}_{o0}(k) \sim \mathcal{V}_{o3}(k)$) should have only 32 non-redundant points.

Defining a Calibration Matrix,

$$\mathbb{C}_{k} = 4 \begin{bmatrix}
\mathcal{D}_{0}(k) & \mathcal{D}_{0}(k \oplus 32) & \mathcal{D}_{0}(k \oplus 64) & \mathcal{D}_{0}(k \oplus 96) \\
\mathcal{D}_{1}(k) & j\mathcal{D}_{1}(k \oplus 32) & -\mathcal{D}_{1}(k \oplus 64) & -j\mathcal{D}_{1}(k \oplus 96) \\
\mathcal{D}_{2}(k) & -\mathcal{D}_{2}(k \oplus 32) & \mathcal{D}_{2}(k \oplus 64) & -\mathcal{D}_{2}(k \oplus 96) \\
\mathcal{D}_{3}(k) & -j\mathcal{D}_{3}(k \oplus 32) & -\mathcal{D}_{3}(k \oplus 64) & j\mathcal{D}_{3}(k \oplus 96)
\end{bmatrix}^{-1}$$

then

$$\begin{bmatrix} \mathcal{V}_{o0}(k) \\ \mathcal{V}_{o1}(k) \\ \mathcal{V}_{o2}(k) \\ \mathcal{V}_{o3}(k) \end{bmatrix} = \mathbb{C}_k^{-1} \begin{bmatrix} \mathcal{V}_{in}(k) \\ \mathcal{V}_{in}(k \oplus 32) \\ \mathcal{V}_{in}(k \oplus 64) \\ \mathcal{V}_{in}(k \oplus 96) \end{bmatrix}$$

 $\mathbb{C}_k$  can be measured with the method mentioned in Sub-Section 8.2.2 on Page 119.

Therefore, the final aim of this Sub-Section, $V_{cal}$, which should represent $V_{in}$ as precisely as possible, may be defined as follows:

$$\begin{bmatrix} \mathcal{V}_{cal}(k) \\ \mathcal{V}_{cal}(k+32) \\ \mathcal{V}_{cal}(k+64) \\ \mathcal{V}_{cal}(k+96) \end{bmatrix} = \mathbb{C}_k \begin{bmatrix} \mathcal{V}_{o0}(k) \\ \mathcal{V}_{o1}(k) \\ \mathcal{V}_{o2}(k) \\ \mathcal{V}_{o3}(k) \end{bmatrix} = \begin{bmatrix} \mathcal{V}_{in}(k) \\ \mathcal{V}_{in}(k+32) \\ \mathcal{V}_{in}(k+64) \\ \mathcal{V}_{in}(k+96) \end{bmatrix} \tag{8.17}$$

where k = 0, 1, ..., 31, and

$$V_{cal}(n) = \mathbb{F}^{-1} \left[ \mathcal{V}_{cal}(k) \right]$$

where n = 0, 1, ..., 127.

It should be noted that the compensating filter, which is mentioned in Sub-Section 8.2.3, is included in the *Calibration Matrix*.  $\mathbb{C}_k^{-1}$  is effectively  $H_{SHA}(f)$  with the differences among *Clock Types* taken into account, and  $\mathbb{C}_k$  is effectively  $H_{FIR}(f)$ .

Up to now,  $\mathcal{V}_{cal}(k)$  looks totally equal to  $\mathcal{V}_{in}(k)$ , and so does  $V_{cal}(n)$  to  $V_{in}(n)$ . However, there are two exceptions, k=0 and k=16, which concern frequencies of  $16f_0$ ,  $32f_0$ ,  $48f_0$  and DC.

The reason for the exceptions is that each output group  $(V_{o0}(n) \sim V_{o3}(n))$  effectively obtains 32 samples of the input per period, so  $16f_0$  is exactly half of the sampling rate, which is a singular point. If a sine wave at  $f = 16f_0$  is sampled at a rate of  $32f_0$ , each period is sampled twice, always at the same two phases (say  $\psi$  and  $\psi + 180^{\circ}$ ). The sampled values depend on both the amplitude of the input and  $\psi$ , so the amplitude and  $\psi$  cannot be determined uniquely from the sampled values: many different combinations give the same samples. Hence  $\mathcal{D}_z(16)$  (z = 0, 1, 2, 3) is not measurable. For the same reason, none of its multiples, i.e.  $\mathcal{D}_z(32)$ ,  $\mathcal{D}_z(48)$ ,  $\mathcal{D}_z(64)$ ,  $\mathcal{D}_z(80)$ ,  $\mathcal{D}_z(96)$ , and  $\mathcal{D}_z(112)$ , is measurable either.

Therefore, two *Calibration Matrices*,  $\mathbb{C}_0$  and  $\mathbb{C}_{16}$ , cannot be obtained. The real valid range for Equation 8.17 is k = 1, 2, ..., 15 and 17, 18, ..., 31. As for  $\mathcal{V}_{cal}(0)$ ,  $\mathcal{V}_{cal}(16)$ ,  $\mathcal{V}_{cal}(32)$ ,  $\mathcal{V}_{cal}(48)$ ,  $\mathcal{V}_{cal}(64)$ ,  $\mathcal{V}_{cal}(80)$ ,  $\mathcal{V}_{cal}(96)$ , and  $\mathcal{V}_{cal}(112)$ , there is no other choice but to set them arbitrarily to zero.

The information at the affected frequencies, including DC,  $16f_0$ ,  $32f_0$ , and  $48f_0$ , is therefore lost in  $V_{out}$  and  $V_{cal}$ . Although  $\mathbb{C}_0$  affects  $64f_0$  as well, that frequency is not measurable in a 128-point sampling system in any case.
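The way Equation 8.17 fills the 128 bins of  $\mathcal{V}_{cal}(k)$  can be sketched in a few lines. This is only an illustration: the function name `apply_calibration`, the nested-list layout of the matrices and the stand-in data (identity matrices and made-up spectra) are assumptions, and real Calibration Matrices would come from measurement. The inverse transform to  $V_{cal}(n)$  is not included.

```python
N = 128
INVALID_K = (0, 16)   # C_0 and C_16 cannot be measured


def apply_calibration(v_o, c):
    """Apply Equation 8.17 bin by bin.

    v_o[z][k]: spectrum of Output Group z (z = 0..3, k = 0..31).
    c[k]:      4x4 Calibration Matrix for bin k, as nested lists.
    Returns the 128 bins of V_cal; unmeasurable bins stay at zero.
    """
    v_cal = [0j] * N
    for k in range(32):
        if k in INVALID_K:
            continue  # bins k, k+32, k+64 and k+96 are left at zero
        for r in range(4):
            v_cal[k + 32 * r] = sum(c[k][r][z] * v_o[z][k] for z in range(4))
    return v_cal


# Hypothetical stand-in data: identity matrices and made-up spectra.
identity = [[1.0 if r == z else 0.0 for z in range(4)] for r in range(4)]
c = {k: identity for k in range(32) if k not in INVALID_K}
v_o = [[complex(z + 1, k) for k in range(32)] for z in range(4)]
v_cal = apply_calibration(v_o, c)
```

With this stand-in data, the eight bins at multiples of 16 remain zero and the other 120 bins are filled, which mirrors the valid range stated for Equation 8.17.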

## 8.3.3 Approximate solution

In the above precise solution for calibration, 30 Calibration Matrices are involved ( $\mathbb{C}_1 \sim \mathbb{C}_{15}$ ,  $\mathbb{C}_{17} \sim \mathbb{C}_{31}$ ). Each Calibration Matrix has 16 parameters to be measured. Each parameter,  $\mathcal{D}_z(k)$ , is a complex value, which contains both the amplitude and the phase information of the response to a designated frequency. Therefore, in real measurements, each  $\mathcal{D}_z(k)$  involves two quantities to be measured, the amplitude and the phase. However, because of the conjugate symmetry of the DFT of real signals,  $\mathcal{D}_z(k) = \mathcal{D}_z^*(128-k)$ , the number of parameters can be halved. The total number of parameters to be measured is therefore

$$30 \times 16 \times 2 \div 2 = 480$$

for only one Sub-Sampling SHA.

As for a photo-diode array, which probably includes a large number of Sub-Sampling SHAs, the calibration data size may reach a huge value. This would result in a heavy load for both the processor and the memory for the Digital Filter after ADC (as shown in Figure 7.1 on page 94).

In this Sub-Section, an approximate solution is given, which reduces the load to about 27.5% of that of the precise solution. The main idea is to ignore the differences in frequency response due to *Clock Types*, and to remove only the differences in DC operating point.

According to measurement results, the difference in  $P_z(f)$  ( $z=0\sim 3$ , as defined on Page 123) between Clock Types (i.e. the difference among the  $\mathcal{D}_z(k)$  when z is changed but k is kept constant) is approximately 5%  $\sim$  10%. If the average value of  $\mathcal{D}_z(k)$  over  $z=0\sim3$  is used for all Clock Types as  $P_{avg}(kf_0)$ , the calculation becomes significantly simpler and more direct, just as in a normal sampling system.

Assuming the signal energy is distributed evenly among the four *Virtual Pulses* for sampling, the systematic error on the output voltage due to this approximation is also between  $5\% \sim 10\%$ , which corresponds to an SNR (expressed as a power ratio) of  $100{\sim}400$ . If the original noise level is no better than that, i.e. SNR < 100, this approximation can be applied to simplify the calculation.
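The quoted SNR range follows from treating the systematic error as a noise term whose amplitude is 5% to 10% of the signal amplitude and taking the ratio of powers. A minimal sketch (the function name is hypothetical):

```python
def snr_from_relative_error(rel_error):
    """Power SNR when the error amplitude is rel_error times the signal amplitude."""
    return 1.0 / rel_error ** 2


snr_worst = snr_from_relative_error(0.10)   # about 100
snr_best = snr_from_relative_error(0.05)    # about 400
```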

Nevertheless, the 5%  $\sim 10\%$  error on the DC signal cannot be ignored, because the DC signal has two sources: the DC component of the laser input, and the DC operating point (DC-Op) of  $V_{in}$  in Figure 7.7 on page 100. In Equation (8.9) on Page 118, the DC-Op of  $V_{in}$  dominates the DC-Op of  $V_{out}$ , i.e. the DC-Op of  $V_{in}$  is effectively a very large "DC input" compared to the laser input. The 5%  $\sim 10\%$  error mentioned above also applies to this large "DC input".

As a result, each *Output Group* (as defined in Sub-Section 8.3.2) has its own DC-Op, and the differences among these DC-Ops are sometimes even larger than the amplitudes of the AC signals. Figure 8.13 illustrates a typical output without any calibration. (The data for this figure were obtained from a digital storage oscilloscope and displayed in "AC mode" in order to get enough effective digits. Therefore, the "over-all" DC-Op, which is more than 2V, is removed by the oscilloscope, but the differences in DC-Op among the *Output Groups* are still clearly visible.)

The DC-Ops of the four *Output Groups* can easily be measured by removing the laser input (*Dark Output*). Thus the differences among the DC-Ops can be eliminated by subtracting the *Dark Output* from the obtained results  $(V_{out}(n))$ , as shown in Figure 8.14.
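The DC-Op removal can be sketched as follows. This is an illustration under stated assumptions: the Output Groups are taken to be the samples with the same n modulo 4, each DC-Op is taken as the mean of the Dark Output over its group, and all voltages below are made-up numbers rather than measured data.

```python
import math

GROUPS = 4


def group_dc_ops(dark_output):
    """Mean Dark Output of each Output Group (assumed grouping: n % 4 == z)."""
    return [
        sum(dark_output[z::GROUPS]) / len(dark_output[z::GROUPS])
        for z in range(GROUPS)
    ]


def remove_dc_ops(v_out, dc_ops):
    """Subtract the DC-Op of its own Output Group from every sample."""
    return [v - dc_ops[n % GROUPS] for n, v in enumerate(v_out)]


# Hypothetical numbers: four DC-Ops in volts and a 10 mV test signal.
offsets = [2.00, 2.05, 1.95, 2.10]
signal = [0.01 * math.sin(2 * math.pi * 3 * n / 128) for n in range(128)]
dark_output = [offsets[n % GROUPS] for n in range(128)]
v_out = [s + offsets[n % GROUPS] for n, s in enumerate(signal)]

corrected = remove_dc_ops(v_out, group_dc_ops(dark_output))
```

In this example the offsets differ by up to 150 mV, much more than the 10 mV signal, yet `corrected` recovers the signal once the four DC-Ops are subtracted.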

Figure 8.13: DC-Op difference among *Output Groups* when no calibration is applied

Figure 8.14: *Output Groups* after removing the DC-Op difference

Unlike the precise solution, which includes the compensating FIR filter mentioned in Sub-Section 8.2.3, the approximate solution removes only the DC-Op difference. The compensating FIR filter still needs to be applied to remove the Integration Effect and the Aperture Window Effect. Therefore, the total number of parameters involved in the approximate solution is 4 DC-Op points plus 128 filter parameters, i.e. 132, about 27.5% of the number for the precise solution<sup>5</sup>.
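The parameter counts quoted in this Sub-Section, and the 43% figure given in the footnote once the 128 dark-noise parameters are added to both solutions, can be reproduced with simple arithmetic. The variable names are illustrative only.

```python
# Precise solution: 30 matrices x 16 complex entries x 2 (amplitude, phase),
# halved by the conjugate symmetry of the DFT of real signals.
precise = 30 * 16 * 2 // 2

# Approximate solution: 4 DC-Op points plus 128 FIR filter parameters.
approximate = 4 + 128

ratio = approximate / precise                      # 0.275

# Footnote case: 128 dark-noise parameters added to both solutions.
dark = 128
ratio_with_dark = (approximate + dark) / (precise + dark)
```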

In this approximate solution, the frequency information at DC,  $16f_0$ ,  $32f_0$  and  $48f_0$  still "exists". But it only exists because of the assumption that there is no difference among the *Clock Types*. In reality, it is as inaccurate as in the precise solution.

# 8.4 Architecture of Digital Filter

As a summary of Sections 8.2 and 8.3, this section presents the architecture of the Digital Filter after the ADC (as shown in Figure 7.1 on page 94), and the calibration method. This Digital Filter can be implemented either on an FPGA, or as a programme on a computer or DSP (Digital Signal Processor).

In the following two sub-sections, the presented architectures are intended for the 10.5GHz Sub-Sampling SHA. For the 2.6GHz Sub-Sampling SHA, the architecture for the precise solution is not applicable, but the one for the approximate solution can be used.

## 8.4.1 Architecture for the precise solution

The Digital Filter for the precise solution presented in Sub-Section 8.3.2 on page 122 is illustrated in Figure 8.15.

Figure 8.15: Digital Filter for the precise solution

The inputs, which are *Linearised Holding Samples* digitised by the ADC, are stored in a memory block of size  $128 \times M$  (M is a positive integer, and can be any value depending on the availability of hardware). Arithmetic averaging is applied to each set of *Linearised Holding Samples* that corresponds to the same *Target Sample*, so that 128 *Target Samples* are obtained in total. The averaging stage is optional and serves to remove more noise<sup>6</sup>. It can be omitted by simply taking 128 *Linearised Holding Samples* as the *Target Samples*.

<sup>5</sup>As will be mentioned in Chapter 12, there is a static dark noise from the Pulse Generator which also has to be removed. Thus 128 more parameters are needed for both the precise solution and the approximate solution. With these included, the approximate solution has about 43% of the number of parameters of the precise solution, and its calculation is significantly simpler.

Target Samples  $(V_{out}(n))$  are divided into four Output Groups  $(V_{o0}(n) \sim V_{o3}(n))$ , and each group is transformed to the frequency domain  $(\mathcal{V}_{o0}(k) \sim \mathcal{V}_{o3}(k))$  by FFT (Fast Fourier Transform). Then the Calibration Matrices  $(\mathbb{C}_k)$  are applied to compensate for the Integration Effect and the Aperture Window Effect, and to eliminate the system errors due to the differences among Clock Types. After that, IFFT (Inverse Fast Fourier Transform) is applied to obtain the output in the time domain  $(V_{cal}(n))$ .

#### Calibration Procedure

As in Sub-Section 8.2.2 (on page 119), the Calibration Matrices can be obtained by the following procedure:

- 1 k = 1
- 2 Modulate a sine wave with the frequency of  $kf_0$  onto the laser input, where  $f_0$  is the fundamental frequency, 82MHz. (A synchronised signal of  $f_0$  is needed as the reference input of the Pulse Generator.)
- 3 Get 128 Target Samples, divide them into four Output Groups, and apply the FFT to each group.
- 4 Record the corresponding frequency response, including amplitude and phase, as the frequency response of the *Virtual Pulses*, i.e.  $\mathcal{D}_{oz}(k) = \mathcal{V}_{oz}(k)$ , and  $\mathcal{D}_{oz}(128 - k) = \mathcal{V}_{oz}^*(k)$ , z = 0, 1, 2, and 3.
- 5 k = k + 1
- 6 If k = 16 or 32 or 48, then k = k + 1
- 7 If k < 64, then go to Step 2; otherwise, finish.

<sup>6</sup>Noise removal by averaging is discussed in detail in Section 9.3 on page 142.
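The control flow of the procedure above can be sketched as a loop that lists the harmonics at which a measurement is taken. The function name and its defaults are illustrative; the measurement itself (Steps 2 to 4) is represented only by recording k.

```python
def calibration_bins(k_start=1, k_stop=64, skipped=(16, 32, 48)):
    """Return the harmonic numbers k visited by the calibration procedure."""
    ks = []
    k = k_start                 # Step 1
    while True:
        ks.append(k)            # Steps 2-4: measure and record at k * f0
        k += 1                  # Step 5
        if k in skipped:        # Step 6: skip the unmeasurable harmonics
            k += 1
        if k >= k_stop:         # Step 7
            break
    return ks


ks = calibration_bins()
measured_quantities = len(ks) * 4 * 2   # 4 Output Groups, amplitude and phase
```

The loop visits 60 harmonics (1 to 63 without 16, 32 and 48). With four Output Groups and two quantities per response this gives 480 measured values, which agrees with the parameter count derived in Sub-Section 8.3.3.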

## 8.4.2 Architecture for the approximate solution

The Digital Filter for the approximate solution presented in Sub-Section 8.3.3 on page 130 is illustrated in Figure 8.16.



Figure 8.16: Digital Filter for the approximate solution

Similar to the precise solution, Target Samples  $(V_{out}(n))$  can be obtained by taking the average of Linearised Holding Samples, as shown in the figure. Alternatively, 128 Linearised Holding Samples can be taken directly as Target Samples. The subsequent processing is much simpler than that in the precise solution: remove the DC-Op difference among Output Groups, and apply the compensating FIR filter to get the output  $(V_{cal}(n))$ .

#### Calibration Procedure

The DC-Ops of four *Output Groups* are obtained when there is no laser input, i.e. the *Dark Output*.

 $\mathcal{H}_{FIR}(k)$  and  $H_{nFIR}(n)$  are obtained as follows:

- 1 k = 0
- 2 Modulate a sine wave with the frequency of  $kf_0$  onto the laser input, where  $f_0$  is the fundamental frequency, 82MHz. For k = 0, it is a DC signal. (A synchronised signal of  $f_0$  is needed as the reference input of the Pulse Generator.)
- 3 Get 128 Target Samples, remove the DC-Ops, and apply the FFT.
- 4 Record the corresponding frequency response  $(\mathcal{V}_{out}(k))$ , including amplitude and phase, then  $\mathcal{H}_{FIR}(k) = 1/\mathcal{V}_{out}(k)$ , and  $\mathcal{H}_{FIR}(128 - k) = 1/\mathcal{V}_{out}^*(k)$ .
- 5 k = k + 1
- 6 If k < 64, then go to Step 2.
- 7 Do the IFFT,  $H_{nFIR}(n) = \mathbb{F}^{-1} [\mathcal{H}_{FIR}(k)].$
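The steps above can be sketched as follows, with several assumptions stated loudly: the measured responses are taken to be normalised to a unit-amplitude input tone, bin 64 (which the procedure never measures) is simply left at zero, and a made-up first-order response stands in for real measurements. The function name `fir_from_measurements` is hypothetical.

```python
import cmath

N = 128


def fir_from_measurements(v_meas):
    """Build H_FIR(k) and its inverse DFT from responses measured at k = 0..63.

    v_meas[k] is the complex response at k * f0, assumed normalised to a
    unit-amplitude input tone.
    """
    h_freq = [0j] * N
    for k in range(64):
        h_freq[k] = 1.0 / v_meas[k]                  # Step 4
        if k > 0:
            h_freq[N - k] = h_freq[k].conjugate()    # 1/V* equals (1/V)*
    # Bin 64 is never measured; leaving it at zero is an assumption.
    taps = [
        sum(h_freq[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
        for n in range(N)
    ]                                                # Step 7 (inverse DFT)
    return h_freq, taps


# Hypothetical first-order response standing in for measured data.
v_meas = [1.0 / (1.0 + 0.1j * k) for k in range(64)]
h_freq, taps = fir_from_measurements(v_meas)
h_taps = [t.real for t in taps]                # FIR parameters H_nFIR(n)
max_imag = max(abs(t.imag) for t in taps)      # should be rounding error only
```

Because the spectrum is filled with conjugate symmetry, the 128 taps come out real apart from rounding, which is what a filter acting on real samples requires.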

#### Architecture of the Digital Filter in 2.6GHz Sub-Sampling SHA

As mentioned before, the architecture of the approximate solution can also be used in the 2.6GHz Sub-Sampling SHA. The only modification in this case is changing the data size and the number of FIR parameters from 128 to 32. Because the 2.6GHz Sub-Sampling SHA does not suffer from the system errors due to the 4-phase clock source, all the frequency responses measured here are valid. This is unlike the 10.5GHz Sub-Sampling SHA, where DC,  $16f_0$ ,  $32f_0$ , and  $48f_0$  are actually invalid, and the obtained frequency response is an average of those of the four *Clock Types*. Consequently, in the 2.6GHz Sub-Sampling SHA, this architecture for the Digital Filter is no longer an "approximate solution", but an accurate one.

# 8.5 Summary

This chapter presented two assisting modules to correct the intrinsic errors in the core circuit of Sub-Sampling SHA. Firstly, a novel Linearising Feedback Amplifier was designed to remove the non-linear effect of the SHA. Secondly, a digital filter was presented to compensate the uneven frequency response of the SHA, and the 4-phase-clock error due to the asymmetry in the clock source. There were two versions of the digital filter, a precise one which removed as much error as possible, and an approximate one which ignored the AC part of 4-phase-clock error and simplified the calculation.

# Chapter 9

# Noise Analysis

# 9.1 Noise folding and filtering in Sub-sampling SHA

As mentioned in Section 6.2 on page 88, sub-sampling systems suffer from noise folding, and exhibit very poor noise figures (e.g. 30dB) [23]. The presented Sub-Sampling SHA has the same issue.

For a system demodulating a signal from a high-frequency carrier, the noise can be limited by applying a band-pass filter, which allows only the signals in the designated band to pass. In the presented Sub-Sampling SHA, however, the input signal has frequency content ranging from its fundamental frequency  $f_0 = 82MHz$  to several GHz. Since the lower cut-off frequency is much lower than the upper one, there is little to gain from using a band-pass filter in this application.

Although it is difficult to reduce the noise in the RF band, it is possible in the base band. According to Section 7.5 on page 99, the input signal is sampled at the same phase to obtain one *Holding Sample*. During the whole process of obtaining that *Holding Sample*, the only useful output is the final stable DC voltage on  $C_{hld}$ . All the AC signals are either folded noise from the RF-band input, or circuit noise from the SHA itself. Ideally, a low-pass filter in the base band with a very low cut-off frequency would eliminate most of the noise, as shown in Figure 9.1. The price of such a low cut-off frequency is a slow response time.



Figure 9.1: Noise filtering in Sub-Sampling SHA

# 9.2 Filters in Sub-Sampling SHA

There are already two built-in low-pass filters in the presented circuits: the switched-capacitor structure in the core circuit of the Sub-Sampling SHA, and the LFA (Linearising Feedback Amplifier). These two circuits also act as filters, and eliminate most of the noise in the base band.

## 9.2.1 Switched-capacitor filter in sampling circuit

The first one is the switched-capacitor structure involving DDS,  $C_{smp}$ , MP1, and  $C_{hld}$  in the core circuit of Sub-Sampling SHA (Figure 7.7 on page 100). When obtaining one *Holding Sample*, the input of Sub-Sampling SHA is virtually constant as the input is sampled at the same phase of every period. Therefore SHA acts as a switched-capacitor filter discussed in Section 6.3 on page 89 [53].

The differential switches (DDS),  $C_{smp}$ , and the PMOS switch (MP1) form an equivalent resistor

$$R_{eff} = \frac{1}{f_0 C_{smp}}$$

where  $f_0$  is the switching frequency (82MHz). This equivalent resistor and  $C_{hld}$  form a low-pass RC filter with cut-off frequency

$$f_{cut-off} = \frac{1}{2\pi R_{eff} C_{hld}} = \frac{f_0 C_{smp}}{2\pi C_{hld}}$$

For the 10.5GHz Sampler,  $f_{cut-off}$  is 0.4MHz, whilst that of the 2.6GHz Sampler is 1MHz. The latter is higher because the 2.6GHz Sampler has a larger  $C_{smp}$ .

Ignoring the bandwidth limit of the circuits, the 10.5GHz Sampler, which takes 128 points per period, collects up to the 63rd harmonic. The noise power across the whole frequency region is folded down to the base band (DC to 41MHz, half of  $f_0$ ). Assuming there is white noise only, the SNR (Signal-to-Noise Ratio) would be 63 times lower than that of the input in the worst case.

But with the built-in switched-capacitor filter, the base band noise is limited to below  $f_{cut-off}$ . The noise power is then reduced by a factor of approximately 100 (41 $MHz/0.4MHz \approx 100$ ). Therefore SNR can be significantly increased.

As for the 2.6GHz Sampler, which takes 32 points for a period, the SNR would be 15 times lower than the input without any noise filter. But with the built-in switched-capacitor filter, the noise power is reduced by a factor of approximately 40, which increases the SNR by a factor of 40.
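The figures in this Sub-Section can be cross-checked with a short calculation. The sketch below makes two assumptions: the filter is treated as an ideal brick-wall filter, so the white-noise reduction is simply the ratio of bandwidths, and, since the capacitor values are not restated here, only the ratio  $C_{smp}/C_{hld}$  implied by each quoted cut-off frequency is computed. The 1 pF value used in the consistency check is a placeholder, as only the ratio matters.

```python
import math

F0 = 82e6   # switching frequency f0 in Hz


def cutoff_frequency(c_smp, c_hld, f0=F0):
    """f_cut-off = 1 / (2*pi*R_eff*C_hld) with R_eff = 1 / (f0*C_smp)."""
    r_eff = 1.0 / (f0 * c_smp)
    return 1.0 / (2.0 * math.pi * r_eff * c_hld)


def capacitor_ratio_for_cutoff(f_cut, f0=F0):
    """C_smp / C_hld implied by a given cut-off frequency."""
    return 2.0 * math.pi * f_cut / f0


def noise_reduction(f_cut, f0=F0):
    """Base-band (DC to f0/2) white-noise power over the power below f_cut."""
    return (f0 / 2.0) / f_cut


ratio_10g5 = capacitor_ratio_for_cutoff(0.4e6)   # about 0.031
ratio_2g6 = capacitor_ratio_for_cutoff(1.0e6)    # about 0.077
reduction_10g5 = noise_reduction(0.4e6)          # 102.5, i.e. roughly 100
reduction_2g6 = noise_reduction(1.0e6)           # 41, i.e. roughly 40

# Consistency check with a placeholder C_hld of 1 pF.
check = cutoff_frequency(ratio_10g5 * 1e-12, 1e-12)
```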

## 9.2.2 Linearising Feedback Amplifier as a noise filter

The built-in switched-capacitor filter in the Sub-Sampling SHA reduces the folded noise to a level similar to that in a normal SHA without noise folding. However, the switched-capacitor structure introduces extra interference due to channel charge injection and clock feed-through, as illustrated in Section 6.3 on page 89.

Fortunately, the second built-in filter, the LFA (Linearising Feedback Amplifier), has a small bandwidth and so acts as a low-pass filter which reduces the noise associated with the switched-capacitor filter.

In the LFA, the Input Sampler and the Feedback Sampler have the same circuit structure, and so provide the same amount of channel charge injection and clock feed through. Therefore the interference from switched-capacitor structures becomes a common-mode input to the Buffer. As the Buffer provides a high CMRR (63dB), the output of this common-mode interference is small compared to the required differential-mode output.

Of course, the channel charge injection and clock feed-through cannot be exactly equal in the Input Sampler and the Feedback Sampler. There is a small amount of differential-mode interference, which is amplified by the Buffer with the same gain as the wanted output. Nevertheless, the source of this interference is the controlling pulses (Ap, An, Bp, Bn and Cp in Figure 8.1 on page 108). Consequently, the interference from channel charge injection and clock feed-through has a fundamental frequency of 82MHz. Since the Buffer has a very low bandwidth (see Sub-Section 8.1.4 on page 112), it provides approximately 28dB of attenuation to these interference signals.

# 9.3 Consideration of flicker noise

So far, only white noise (including thermal noise and shot noise) has been considered. CMOS transistors, especially NMOS, also suffer from flicker noise (1/f noise, or pink noise).

The spectral density of flicker noise increases when frequency decreases [23]. For a given frequency band, the total noise power depends on the logarithm of the ratio of its upper limit frequency  $(f_h)$  and lower limit frequency  $(f_l)$ :

$$\overline{V_{nf}^2} = K \ln(\frac{f_h}{f_l}) \tag{9.1}$$

where  $V_{nf}$  is the Root-Mean-Square (RMS) flicker noise voltage, and K is a constant depending on the fabrication process and the transistor size [23]. This indicates that there can be quite a large flicker noise at low frequency even if the bandwidth is very narrow. (For example, the band from  $f_l = 1kHz$  to  $f_h = 2kHz$  has the same flicker noise power as the band from  $f_l = 1GHz$  to  $f_h = 2GHz$ , although the former has only 1kHz of bandwidth and the latter has 1GHz.)

#### Flicker noise and low-pass filters

To understand the effects of flicker noise on the DAQ system, the bandwidth of the DAQ needs to be calculated.

The lower end of the DAQ bandwidth should be set to a frequency below which noise does not affect the measurement. If the time to acquire one Linearised Holding Sample (i.e. the Presenting Time) is  $T_p$ , a noise signal with a frequency of less than  $\frac{1}{10T_p}$  will not change significantly during sampling, and so will not affect the measurement. If all 128 Holding Samples are obtained one by one, this frequency limit becomes  $\frac{1}{1280T_p}$ . Therefore the lower end of the DAQ bandwidth can be taken as  $f_l < \frac{1}{1280T_p}$ .

On the other hand, the upper end of the DAQ bandwidth,  $f_h$ , depends on the noise-reducing low-pass filters described in Section 9.2. The filter with the lowest upper-limit frequency determines  $f_h$ .  $f_h$  must be distinctly larger than  $\frac{1}{T_p}$ , otherwise the output will not be stable. A factor of 10 is assumed here, i.e.  $f_h > \frac{10}{T_p}$ .

Therefore, the lower limit of  $\frac{f_h}{f_l}$  can be calculated.

$$\frac{f_h}{f_l} > \frac{10/T_p}{1/1280T_p} = 12800$$

According to Equation (9.1),

$$\overline{V_{nf}^2} > K \ln 12800 = 9.5K$$

where K is a constant depending on the fabrication process and the sizes of the transistors involved. This inequality means that the flicker noise has a non-zero minimum value, which is independent of the *Presenting Time*. So even if the low-pass filters are applied to reduce the noise bandwidth as much as possible, only the white noise tends to be eliminated, and the flicker noise does not.
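Both the octave example given with Equation (9.1) and the lower bound of 9.5K can be checked numerically. In this sketch K is set to 1 as a placeholder, since its actual value depends on the process and the transistor sizes; the function names are illustrative.

```python
import math


def flicker_noise_power(f_low, f_high, k=1.0):
    """Equation (9.1): flicker noise power in the band f_low..f_high."""
    return k * math.log(f_high / f_low)


# Two bands of one octave each carry the same flicker noise power.
low_band = flicker_noise_power(1e3, 2e3)
high_band = flicker_noise_power(1e9, 2e9)


def min_bandwidth_ratio(samples=128, margin=10):
    """Lower bound of f_h / f_l: f_h > margin/T_p, f_l < 1/(margin*samples*T_p)."""
    return margin * margin * samples


ratio = min_bandwidth_ratio()      # 12800
bound = math.log(ratio)            # about 9.46, quoted as 9.5 in the text
```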

#### Removing flicker noise by digital averaging

It is possible to reduce the noise further by averaging<sup>1</sup> a number of digitised Linearised Holding Samples.

In the following discussion, it is assumed that the RMS noise voltage of one Linearised Holding Sample is  $V_n$ , the Presenting Time of a Linearised Holding Sample is  $T_p$ , and N Linearised Holding Samples ( $V_o$ ) are taken for one Target Sample ( $\overline{V_o}$ ). It is further assumed that the white noise is much smaller than flicker noise.

According to the Central-Limit Theorem [58],  $V_o$  has a Gaussian Distribution, as the input noise and device noise are from a large number of independent noise sources (each transistor or resistor is an independent noise source). So the standard error of  $V_o$  is the RMS noise voltage,  $V_n$ .

If the noise were white, the N samples could be regarded as uncorrelated with each other. The standard error  $(V_E)$  of the *Target Sample* would then be

$$V_E = \frac{V_n}{\sqrt{N}}$$

<sup>&</sup>lt;sup>1</sup>Here means to calculate the mathematical mean value of a number of samples, i.e. the genuine averaging. It is unlike the "averaging" done by the core SHA circuit in Sub-Section 7.5, which is effectively a low-pass filter.

However, for pink noise, i.e. flicker noise, averaging the samples does not reduce the noise level as quickly as for white noise [59]. This is because flicker noise has more power at lower frequencies. Repetitive sampling, which takes a longer time, encounters more low-frequency noise, and so the N samples can no longer be considered "unrelated".



Figure 9.2: Continuous sampling affected by low-frequency noise

Figure 9.2 illustrates this effect. Obtaining N samples requires a time of  $NT_p$ . Consequently, a fluctuation (low-frequency noise) that is too slow to affect a single sample can cause an obvious difference among the N samples. If the noise were white, this fluctuation would have the same power density as the high-frequency noise, and would therefore be submerged in the usual sample deviations. But as pink noise has strong power at low frequencies, the correlation among the samples caused by the fluctuation can no longer be ignored. The mathematical proof is presented below.

When N samples are taken, the total Presenting Time is increased to  $NT_p$ . Consequently the lower limit frequency  $f_l$  in Equation (9.1) should be divided by N. Therefore

$$\begin{cases} V_n = K \ln \frac{f_h}{f_l} \\ V_{na} = K \ln \frac{Nf_h}{f_l} \end{cases}$$

where  $V_{na}$  is the over-all RMS noise voltage of the N samples, and K,  $f_h$  and  $f_l$  have the same definition as Equation (9.1). So

$$K = V_n (\ln \frac{f_h}{f_l})^{-1}$$

and

$$\begin{aligned} V_{na} &= V_n + K \ln N \\ &= V_n + V_n \left(\ln \frac{f_h}{f_l}\right)^{-1} \ln N \\ &= V_n (1 + \alpha \ln N) \end{aligned}$$

where  $\alpha = (\ln \frac{f_h}{f_l})^{-1}$ . As  $f_l$  is typically smaller than  $\frac{1}{10T_p}$  and  $f_h$  is typically higher than  $\frac{10}{T_p}$ , it is safe to assume that  $0 < \alpha < (\ln e^2)^{-1} = \frac{1}{2}$ .

Thus the standard error of the Target Sample is

$$\begin{aligned} V_E &= \frac{V_{na}}{\sqrt{N}} \\ &= V_n \frac{1 + \alpha \ln N}{\sqrt{N}} \end{aligned}$$

As  $0 < \alpha < \frac{1}{2}$ , for any N > 1,

$$1+\alpha \ln N < 1+\frac{1}{2} \ln N < \sqrt{N}$$

So

$$V_E < V_n$$

which means the noise level is reduced by digital averaging. It is reduced by a factor of  $\frac{1+\alpha \ln N}{\sqrt{N}}$ , weaker than  $\frac{1}{\sqrt{N}}$  in the case of white noise. As N increases,  $V_E$  approaches zero.
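The two reduction factors can be compared numerically. The sketch below takes the expression from the text as given; the value  $\alpha = 0.1$  is a hypothetical example, and  $\alpha = 0.5$  is the upper bound used in the derivation. The function names are illustrative.

```python
import math


def white_noise_factor(n):
    """V_E / V_n for white noise: 1 / sqrt(N)."""
    return 1.0 / math.sqrt(n)


def flicker_noise_factor(n, alpha):
    """V_E / V_n from the expression in the text: (1 + alpha*ln N) / sqrt(N)."""
    return (1.0 + alpha * math.log(n)) / math.sqrt(n)


# Rows of (N, white-noise factor, flicker factor for alpha = 0.1 and 0.5).
table = [
    (n, white_noise_factor(n), flicker_noise_factor(n, 0.1), flicker_noise_factor(n, 0.5))
    for n in (1, 4, 16, 64, 256, 1024)
]

# The bound V_E < V_n holds for every N > 1 even at the limit alpha = 0.5.
bound_holds = all(flicker_noise_factor(n, 0.5) < 1.0 for n in range(2, 10001))
```

For N = 1 both factors equal 1, so the strict inequality applies only when more than one sample is averaged. For larger N the flicker factor stays above the white-noise factor, which illustrates that averaging is less effective against flicker noise.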

In practice, however, N cannot increase without limit. A large N needs a large total *Presenting Time*, during which measurement errors other than noise are likely to be encountered, e.g. errors due to environmental changes such as temperature drift, and mechanical vibration affecting the light path.

# 9.4 Summary

This chapter analysed the noise performance in the Sub-Sampling SHA. The theory of noise-folding in sub-sampling was presented at first, then two built-in low-pass filters were characterised. These filters were actually the switched-capacitor structure in the core circuit of SHA, and the high-gain low-bandwidth buffer in the LFA (Linearising Feedback Amplifier). They could eliminate most white noise due to the noise-folding, and interference from control signals. The flicker noise was also considered in this chapter, and it could be reduced by digital averaging.

# Part IV

# On-Chip Data Acquisition System

Part IV presents the structure of the on-chip ultra-fast DAQ for OSAM. The DAQ contains a sensor array of optical front-ends. The optical front-end circuits for the DAQ, including an on-chip photo-diode and a broadband transimpedance amplifier, are based on the work of Dr. Li [10, 11]. A power-management circuit is included in each of the pixel circuits in order to minimise the power dissipation. Part of the Sub-Sampling SHA is also embedded in each array pixel, so that the sampling quality can be guaranteed. Current-based buffers are applied to send the control pulses from the pulse generator to the pixel circuits and the common back-end circuit. The timing and spatial scanning methodology for the measurement is also introduced in Part IV.

The front-end circuits are described in Chapter 10. Chapter 11 presents the details of the DAQ system for the OSAM sensor array.

# Chapter 10

# Front-End Circuits

This chapter introduces the optical front-end circuits used in the presented DAQ system for OSAM. These circuits are based on designs by my colleague, Dr. Mexiong Li, in his PhD thesis [11] and two of his papers [10, 60]. Modifications have been made to the circuits, so that they can be used in the presented DAQ system.

# 10.1 Photo-Diode

The requirements for the Photo-Diode (PD) in the on-chip DAQ system are compatibility with the standard CMOS process and a bandwidth of several GHz. Figure 10.1 shows the cross-section of the PD designed by Li in [11], which meets these requirements.

Figure 10.1: Cross-section of the Photo-Diode implemented in AMS C35

In this PD, the N-well is the active area where the incoming light is detected. The P+ and N+ diffusion regions are the anode and cathode of the PD, respectively. When the PD is reverse-biased, the electron-hole pairs generated in the N-well by the incoming photons are separated by the electrical field, and collected by either the anode (holes) or the cathode (electrons). Therefore a current proportional to the light power is generated. The N-well is also used as a screening terminal to block the slow bulk carriers [61], thus increasing the speed and bandwidth.

The PD in the 10.5GS/s DAQ is identical to Li's design, and is approximately  $45\mu m \times 45\mu m$  in size. The PD in the 2.6GS/s DAQ has the same structure, but the total length and width are doubled, i.e. approximately  $90\mu m \times 90\mu m$ . The size increase provides a larger output current for the same light intensity. Since its capacitance is also increased, the bandwidth is reduced. However, as the bandwidth requirement for the 2.6GS/s DAQ is significantly eased, the size increase improves the over-all performance.

# 10.2 Trans-Impedance Amplifier and Low-Pass Filter

The Trans-Impedance Amplifier (TIA) and its associated Low-Pass Filter (LPF) used in the DAQ are shown in Figure 10.2, and are based on the input stage of the TIA designed by Li [60], i.e. a Regulated Cascode (RGC) TIA. The following stages in Li's design are removed because they include several inductors, whose area is too big to fit into every pixel of a sensor array. Moreover, the output load of the TIA in the presented DAQ, which is the input capacitance of the Sub-Sampling SHA, is quite small (less than 20fF even considering the parasitic capacitance). Therefore the following stages in Li's design, whose function is to increase the output power of the TIA, are unnecessary.



Figure 10.2: Trans-Impedance Amplifier and Low-Pass Filter

As shown in the figure, transistor MN1 acts as a common-gate amplifier, or a current buffer, which has a current gain of 1 but has a small input impedance. Therefore the output AC current  $i_{out}$  is equal to the input  $i_{in}$ , and the transimpedance gain is

$$G_{TIA} = \frac{v_{out}}{i_{in}} = \frac{i_{out}R_L}{i_{in}} = R_L$$

MN2 provides active feedback to the common-gate amplifier, which further reduces the input impedance of the TIA significantly (to only  $9\Omega$  in ADS simulation). With such a small input impedance, the amplifier can achieve a GHz bandwidth, even though the PD itself has a big parasitic capacitance<sup>1</sup>. The capacitor  $C_L$  forms a first-order LPF together with  $R_L$ . This LPF is used to limit the bandwidth of the TIA so that the Nyquist criterion can be satisfied, i.e. the bandwidth of the input must be less than half of the sampling rate.

The transistor sizes and the resistance of  $R_L$  in Figure 10.2 are different to those in Li's design. These modifications are required because the DC operating point needs to match the Sub-Sampling SHA, and the gain is also raised to improve the SNR before the signal enters the noisy SHA.

<sup>1</sup>The parasitic capacitance is approximately  $0.3pF\sim0.4pF$  [11]. The corner frequency of the input port of the TIA is at least  $f_c=\frac{1}{2\pi\times0.4pF\times9\Omega}=44GHz$ . Therefore the bandwidth of the TIA is mainly limited by the output port and the intrinsic high-frequency performance of the transistors in the TIA.



Figure 10.3: Frequency response of TIA

Figure 10.3 shows the simulation results. The gain of the TIA for 10.5GS/s DAQ is  $2.0k\Omega(66dB\Omega)$ , and its 3dB corner frequency is 2.4GHz. The gain of the TIA for 2.6GS/s DAQ is  $4.0k\Omega(72dB\Omega)$ , and its 3dB corner frequency is 0.8GHz in Cadence post-layout simulation. Figure 10.4 shows the noise levels at the output ports of the TIAs in ADS simulation<sup>2</sup>. These are equivalent to a 0.85mV-RMS noise at the TIA for the 10.5GS/s DAQ, and a 1.5mV-RMS noise at that for the 2.6GS/s DAQ.
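The numbers quoted for the TIA, i.e. the gains expressed in dBΩ and the input-port corner frequency given in the footnote, can be reproduced from the stated resistances and capacitance. The function names below are illustrative.

```python
import math


def db_ohm(resistance):
    """Trans-impedance gain in dB-ohm: 20*log10(R / 1 ohm)."""
    return 20.0 * math.log10(resistance)


def rc_corner(resistance, capacitance):
    """Corner frequency of a first-order RC network in Hz."""
    return 1.0 / (2.0 * math.pi * resistance * capacitance)


gain_10g5 = db_ohm(2.0e3)              # about 66 dB-ohm
gain_2g6 = db_ohm(4.0e3)               # about 72 dB-ohm
input_corner = rc_corner(9.0, 0.4e-12)  # about 44 GHz (9 ohm, 0.4 pF)
```

The input corner is more than ten times the 2.4GHz corner frequency of the 10.5GS/s TIA, consistent with the statement that the bandwidth is limited by the output port rather than by the input port.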



Figure 10.4: Noise at the output of TIA

# 10.3 Summary

This chapter introduced the optical front-end circuits used in the DAQ. These circuits are based on the works of my colleague, Dr. Mexiong Li [11, 10, 60].

<sup>&</sup>lt;sup>2</sup>In these simulations, the PD is replaced by a capacitor.

The circuits included a high-speed Photo Diode, and a broad-band TIA (Trans-Impedance Amplifier). Some modifications were made to the circuits, so that they could be used in the presented DAQ system.

# Chapter 11

# DAQ for OSAM Sensor Array

As mentioned in the introduction in Part I, a sensor array is usually used to sense the probe laser so that the spatial information can be obtained. This chapter presents the integration of the DAQ for the OSAM sensor array based on the pulse generator and the sub-sampling SHA, which are described in Part II and Part III respectively.

The contents of this chapter apply to both the 10.5GSample/s DAQ and the 2.6GSample/s DAQ. The following discussion is mainly focused on the 10.5GS/s DAQ, but the same design techniques are also used in the 2.6GS/s DAQ.

# 11.1 Power management

## 11.1.1 The power issue

A recurring problem in high-speed design is power consumption. In any design with a multi-pixel sensor array, the modules with large power consumption should be placed in the common part of the chip rather than repeated in every pixel.

Table 11.1 shows the supply current of some key modules in the 10.5GS/s DAQ system.

| Module Name                             | Supply Current $(V_{dd} = 3.3V)$ | Design details in           |
|-----------------------------------------|----------------------------------|-----------------------------|
| PLL with QVCO                           | 56mA                             | Chapter 4 on page 27        |
| PG (Pulse Generator)<br>(exc. PLL)      | 36mA                             | Chapter 5 on page 63        |
| PD (Photo Detector)                     | Tiny                             | Section 10.1 on<br>page 150 |
| TIA (Trans-Impedance<br>Amplifier)      | 1mA                              | Section 10.2 on<br>page 151 |
| Sub-Sampling SHA<br>(core circuit)      | 1mA                              | Chapter 7 on page 93        |
| LFA (Linearising<br>Feedback Amplifier) | 0.43mA                           | Section 8.1 on page 106     |

Table 11.1: Power Consumption of some key modules in the 10.5GS/s DAQ

According to the table, the PG (including the PLL) must be put in the common part of the on-chip DAQ circuit, rather than implemented in every single pixel circuit. This saves not only the power consumption, but also the chip area.

All other modules consume significantly less power. However, each pixel needs one PD, one TIA, two core Sub-Sampling SHAs, and one LFA. The total supply current of one pixel is therefore 3.43mA plus the current for the bias sources. For a  $2\times 8$  array, the overall array current is more than 54.9mA, which corresponds to 181mW of power dissipation.
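The array-level figure can be reproduced with a few lines of arithmetic. The sketch below uses only the per-module currents from Table 11.1 and the module counts stated above. The names are illustrative, and the bias-source current is deliberately left out, so the result is a lower bound.

```python
VDD = 3.3  # supply voltage in volts

# Per-module supply currents in mA, taken from Table 11.1.
MODULE_CURRENT_MA = {"TIA": 1.0, "SHA_core": 1.0, "LFA": 0.43}
# Modules per pixel as stated in the text (the PD current is negligible).
MODULES_PER_PIXEL = {"TIA": 1, "SHA_core": 2, "LFA": 1}


def pixel_current_ma():
    """Supply current of one pixel in mA, excluding its bias sources."""
    return sum(MODULE_CURRENT_MA[m] * n for m, n in MODULES_PER_PIXEL.items())


def array_power_mw(rows=2, cols=8):
    """Lower bound on the power (mW) of a fully parallel rows x cols array."""
    return rows * cols * pixel_current_ma() * VDD


print(f"pixel: {pixel_current_ma():.2f} mA")
print(f"array: {2 * 8 * pixel_current_ma():.1f} mA, {array_power_mw():.0f} mW")
```

Changing `rows` and `cols` shows how quickly the analogue power grows with the array size if every pixel is kept running.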

## 11.1.2 Pseudo-parallel array operating

To overcome the power-consumption issue, a pseudo-parallel strategy is applied to the operation of the array. In this strategy, only one or several pixels are enabled and operating, while the remaining pixels are powered down and so consume little power. The control circuit enables the array pixels one by one, or several pixels each time.

In the DAQ for the OSAM system, the input laser is a stable periodic signal. Therefore this pseudo-parallel strategy does not affect the system performance in theory, and only increases the time needed to acquire the signal. In reality, however, the total time for obtaining data from all pixels should not be so long that environmental parameters, such as the temperature, change noticeably.

According to Chapter 7 on page 93, each pixel circuit provides 128 *Linearised Holding Samples*<sup>1</sup>. Consequently there are two scanning methods for the whole array.

**Timing-first scanning** Every time one pixel (or several pixels) is enabled, all 128 *Linearised Holding Samples* are obtained. After that, the pixel is disabled, and the next one is enabled to obtain its *Linearised Holding Samples*.

**Spatial-first scanning** Every time one pixel (or several pixels) is enabled, only one *Linearised Holding Sample* is obtained. After every pixel has been accessed, the  $\frac{1}{128}T$  delay is inserted in the Pulse Generator. Therefore, the next time the pixels are enabled one by one, the next *Linearised Holding Sample* can be obtained.
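The difference between the two scanning methods is only the nesting order of the two loops. The following sketch lists the order in which (column, sample) pairs would be visited. It assumes that one column is enabled at a time, as in the presented array; the function names are illustrative and are not taken from the FPGA implementation.

```python
from itertools import product

N_COLUMNS = 8    # columns of the 2 x 8 array, enabled one at a time
N_SAMPLES = 128  # Linearised Holding Samples per pixel


def timing_first(n_columns=N_COLUMNS, n_samples=N_SAMPLES):
    """All samples of one column are taken before the next column is enabled."""
    return [(col, s) for col, s in product(range(n_columns), range(n_samples))]


def spatial_first(n_columns=N_COLUMNS, n_samples=N_SAMPLES):
    """One sample is taken from every column before the delay is stepped."""
    return [(col, s) for s, col in product(range(n_samples), range(n_columns))]


# Small example with 2 columns and 3 samples per column.
print(timing_first(2, 3))
print(spatial_first(2, 3))
```

Both orders visit the same set of pairs. Timing-first keeps the data of one column contiguous, whereas spatial-first interleaves the columns.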

The presented on-chip DAQ has a  $2 \times 8$  (row×column) sensor array, in which two pixels on the same column are enabled together every time. A 3-bit address bus is used to select the column to be enabled. The pseudo-parallel strategy is implemented by changing the low-frequency dividers in the Pulse Generator (presented in Section 5.6 on page 77), as shown in Figure 11.1.

## 11.1.3 Current/voltage source with enabling feature

The enabling feature of the pixel circuits is implemented in their current or voltage sources. When the pixel needs to be enabled, the sources provide the correct biases so that the pixel circuits operate. When the pixel needs to be disabled, the sources provide biases which shut the pixel circuits down.

 $<sup>^1</sup>$ The definition of *Linearised Holding Sample* can be found in Section 7.6 on page 101 and Sub-Section 8.1.2 on page 107.



Figure 11.1: Implementation of pseudo-parallel array operating



Figure 11.2: Current source for TIA with enabling feature

Figure 11.2 shows a current source with such a feature, which is used by the TIAs. This source is based on a self-biased reference in Lee's book [23]. The PNP transistor T1 is connected as a diode. The reference current is  $I_{ref} = \frac{V_{EB}}{R}$ , where  $V_{EB}$  is the voltage between the emitter and the base terminals of T1.  $V_{EB}$  is approximately constant, i.e. the forward-bias voltage of a diode. Therefore  $I_{ref}$  is inversely proportional to R. Ignoring the mismatch introduced during chip fabrication,  $I_{ref}$  is inversely proportional to  $R_L$  of the TIA<sup>2</sup> as well. So no matter how the resistivity is changed by process variation, the DC operating point of  $v_{out}$  (the output port of the TIA) does not change, i.e.

$$V_{out} = V_{dd} - I_{ref}R_L = V_{dd} - constant$$
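The cancellation can be made explicit. As a sketch, assume that $R$ and $R_L$ are built from the same resistive layer, so that each equals the sheet resistance $R_\square$ multiplied by a layout-defined number of squares. The symbols $n_R$ and $n_L$ are introduced here for illustration only and do not appear in the original design.

$$I_{ref}R_L = \frac{V_{EB}}{R}R_L = V_{EB}\,\frac{n_L R_\square}{n_R R_\square} = V_{EB}\,\frac{n_L}{n_R}$$

The sheet resistance cancels, so under this assumption the voltage drop across $R_L$ depends only on $V_{EB}$ and on a geometric ratio, which is why the DC level of $V_{out}$ is insensitive to the absolute resistor value.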

When the pixel is disabled ( $\overline{En}=1$ ), transistor MN1 pulls  $V_{Bn}$  down to a voltage close to ground, so  $I_{ref}$  falls to zero. No current other than leakage flows through the TIA, and it consumes hardly any power.

When the pixel is enabled  $(\overline{En}=0)$ , transistor MN1 shuts off. Because of the delay of the inverter INV0, there is a very short time during which transistors MP1 and MP2 are both turned on.  $V_{Bn}$  is therefore connected to  $V_{dd}$  during this short time and is charged to a high voltage. In this condition, transistors MN2 and MN3 are turned on, and so are transistors MP3 and MP4. After MP1 shuts off, the self-biased reference gradually settles into its normal operating state, i.e.  $I_{ref}$  stabilises at the desired value. The simulation in Cadence shows that the current source takes less than 6ns to become stable after the enabling signal is established.

# 11.2 SHA partition

Because of the pseudo-parallel strategy, only one or several pixels are operating at any moment while all the others are shut off. Therefore it is possible for the pixels to share some parts of their circuits.

<sup>2</sup> See Figure 10.2 on page 152 for details.

As mentioned in Sub-Section 11.1.1, the PG (Pulse Generator) is definitely in the common part of the on-chip system due to its high power consumption and large chip area. Theoretically, all other modules in the DAQ, except the PDs (Photo-Diodes), can be shared among the pixels.

However, the physical size of the PD array is quite large. For example, the presented array is  $2 \times 8$ , while each PD is  $45\mu m \times 45\mu m$ . With a  $5\mu m$  gap between the PDs for isolation and connection, the total PD array size is approximately  $100\mu m \times 400\mu m$ .

In this case, if all other modules, including the TIA and the Sub-Sampling SHA, are shared by the pixels, the connection wires must travel hundreds of microns from the PDs to the commonly-shared circuits. These wires inevitably introduce huge parasitic capacitance, which causes a narrower bandwidth and a longer signal delay. For this reason, those circuits which require a high bandwidth or high speed, e.g. the TIA, are not suitable for sharing among the pixels.

As for the Sub-Sampling SHA, which transfers the RF-band signal to a very low frequency, its high-speed part should remain in every pixel, and the low-frequency part can be put in the common circuits. Figure 11.3 shows the partition of the Sub-Sampling SHA<sup>3</sup>.



Figure 11.3: Partition of Sub-Sampling SHA

Every pixel has its own Input Sampler, which samples the RF-band signal from the front end (PD and TIA), in order to preserve the bandwidth of the signal. The Buffer operates at low frequency, and therefore can be shared. The Feedback Sampler is also a high-speed sub-module, but it samples the output of the Buffer, which is a base-band signal from a shared sub-module. Therefore it can be shared by all pixels as well.

<sup>3</sup> The details of the Input Sampler, the Feedback Sampler, and the Buffer can be found in Sub-Section 8.1.2 on page 107.

A CMOS switch controlled by the pixel address lines is inserted between the Input Sampler and the Buffer. This is because all Input Samplers sharing the same Buffer are connected together at this point. A switch in each pixel prevents the sampler outputs from being shorted together.

As mentioned in Sub-Section 11.1.2 on page 156, two pixels in the same column are enabled to operate at the same time. Therefore in the presented DAQ, there are two sets of the structure shown in Figure 11.3, one for each row of pixels in the  $2 \times 8$  array.

# 11.3 Interface to Pulse Generator

The PG (Pulse Generator) must be commonly shared by all pixels due to the power consumption issue. As a result, the output of PG, the control pulses<sup>4</sup>, need to travel hundreds of microns to reach every pixel and the common part of the circuit.

Fortunately, transferring the control pulses is easier than transferring the output of the PDs to a shared TIA. The output current of a PD is an analogue signal, and must not be distorted at all. The control pulses, on the other hand, are digital signals, which are quite robust to distortion. Moreover, the distortion, which is effectively the Aperture Window Effect mentioned in Sub-Section 8.2.2 on page 117, can be compensated by a digital filter<sup>5</sup>.

 $<sup>^4 {\</sup>rm i.e.}$  Ap, An, Bp, Bn and Cn in Figure 5.2 on page 65, and Figure 5.3 on page 66  $^5 {\rm See}$  Section 8.2 on page 115 for details

## 11.3.1 The current-mode buffer

To help the control pulses travel through all pixels, a current-mode buffer is designed to regenerate the pulses at the pixel side. Figure 11.4 shows the structure of the buffer.



Figure 11.4: Current-mode buffer for control pulses

The buffer can be considered as a source-follower at the PG side, and a common-gate amplifier at the pixel side. The source-follower has a low output resistance, while the common-gate amplifier has a low input resistance. As a result, both sides can keep a high bandwidth, even with the large parasitic capacitance of the long connecting wires.

Moreover, the form of the signal on the long connecting wires is current rather than voltage, as the PG side is a current amplifier while the pixel side is a current buffer. This is the reason why it is called the "current-mode" buffer.

Transistor MN1 in Figure 11.4 can be put in the PG side, so that it needs just one transistor to be shared for all pixels. However, it remains in the pixel side in order to provide a better frequency response for the common-gate amplifier. Therefore the rising and falling edges of the pulses regenerated at the pixel side can be sharper.

Another advantage of this buffer is that when the pixel is disabled, transistors MN1 and MN2 are turned off. The parasitic capacitance on the terminal of the connecting wire is then approximately 3.7fF. On the other hand, if a normal voltage buffer were used, the gate terminal of the transistor would be connected to the wire, and the capacitance would be about 16fF in total (assuming transistors of the same size are used).

The current-mode buffer was used to transfer the differential signals Ap/An and Bp/Bn<sup>6</sup>, and so each pair of differential signals requires two sets of the buffer shown in Figure 11.4. As for the control pulse Cn, whose voltage swing is much larger than that of Ap/An and Bp/Bn, the buffer is not suitable. Consequently, the differential signal pair Cpo/Cno is transferred instead, again by two sets of the current-mode buffer. On the pixel side, a differential-to-single-ended buffer generates Cn from the pair Cpo/Cno. Thus, there are in total six sets of current-mode buffers which transfer the control pulses from the PG to the pixels.

# 11.4 Array architecture

## 11.4.1 Single-ended sensor array

As a summary, Figure 11.5 illustrates the final system-level architecture of the 10.5GSample/s DAQ for OSAM sensor array. This is a  $2 \times 8$  single-ended sensor array which operates in the pseudo-parallel mode. Three address lines,  $Addr2 \sim Addr0$ , are used to select the column to be enabled. The two pixels on the same column are enabled together, so there are two output channels, i.e. Output0 and Output1 in the figure. As the two pixels in one column are identical, they share one bias source and one current-mode buffer (pixel side). The same configuration applies to the two output channels as well.

Two enabled pixels in the same column consume approximately 5.8mA of current in total. In comparison, the currents of disabled pixels are significantly

<sup>6</sup> Please refer to Section 5.4 on page 72 for the details of the generation of Ap/An, Bp/Bn, Cn, and Cpo/Cno mentioned later on.



Figure 11.5: DAQ system architecture for OSAM sensor array

smaller and can be ignored. The output channels consume 6.9mA of current. Therefore the total current of the analogue part, i.e. the pixels and the output channels, is 12.7mA from the 3.3V power supply. On the other hand, the digital part, i.e. the power-hungry PG, takes 92mA. The total power of all on-chip circuits of the DAQ system is approximately 0.35W ( $105mA \times 3.3V$ ).
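The power budget of the pseudo-parallel 10.5GS/s DAQ can be checked in the same way. This sketch uses the currents quoted in this section and in Table 11.1, and the variable names are illustrative.

```python
VDD = 3.3  # supply voltage in volts

# Currents in mA, as quoted in the text for the 10.5GS/s DAQ.
ENABLED_COLUMN_MA = 5.8   # the two enabled pixels of one column
OUTPUT_CHANNELS_MA = 6.9  # the output channels
PG_MA = 56 + 36           # PLL with QVCO plus the rest of the Pulse Generator

analogue_ma = ENABLED_COLUMN_MA + OUTPUT_CHANNELS_MA
total_ma = analogue_ma + PG_MA
total_w = total_ma * VDD / 1000

print(f"analogue part: {analogue_ma:.1f} mA")
print(f"digital part:  {PG_MA} mA")
print(f"total:         {total_ma:.1f} mA, {total_w:.2f} W")
```

The analogue part is a small fraction of the total, so the Pulse Generator dominates the on-chip power once the pseudo-parallel strategy is applied.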

## 11.4.2 1-D differential sensor array

The 1-Dimensional differential sensor array is also used in OSAM applications[4]. The presented  $2 \times 8$  array can be easily configured to a  $1 \times 8$  differential array, by adding a differential-to-single-ended amplifier. This can be done on-chip or off-chip.

The presented 2.6GSample/s DAQ has been designed for such a 1-D differential array. Its architecture is generally the same as Figure 11.5, except that the output channels are replaced by the one shown in Figure 11.6. The differential-to-single-ended amplifier is an instrumentation amplifier with a gain option of either  $\times 50$  or  $\times 250$ , which is selected by the signal  $G_{sel}$ .



Figure 11.6: Output channel for 1-D differential sensor array

The pixel circuit of the 2.6GS/s DAQ consumes 4.3mA of current, while the output channel consumes 6.6mA. Therefore the total power dissipation of the analogue part is 36mW from the 3.3V power supply, and that of the digital part is 170mW. Altogether, the on-chip circuits of the DAQ consume approximately 0.21W of power.

# 11.5 Summary

This chapter presented the design of the DAQ for the OSAM sensor array. The DAQ system was based on the Pulse Generator and the Sub-Sampling SHA presented in Part II and Part III respectively. To minimise the power consumption of the DAQ system, a pseudo-parallel strategy of array scanning and bias sources with an enabling feature were developed. A current-based buffer was designed to transfer the control pulses from the pulse generator to the pixel circuits without significantly degrading the quality of the pulses. The partition of the SHA and the overall architecture were also discussed and presented in this chapter.

# Part V

Implementation, Measurement, and Summary

# Chapter 12

# Implementation and measurement results

# 12.1 Specification of Chip RF2

Three prototypes of the DAQ system have been implemented on Chip RF2, which was fabricated in June 2007 using the AMS C35 process. Table 12.1 gives the detailed specification of these prototypes. Figure 12.1(a) shows the fabricated chip under a microscope. The size of the die is  $3.1mm \times 3.1mm$ .

Prototype 1 was designed to achieve the main design target, i.e. a DAQ for OSAM sensor array with a sampling rate of more than 10GSample/s. Its architecture is exactly the one shown in Figure 11.5 on page 164. Figure 12.1(b) is its layout diagram.

Prototype 2 is the 2.6GSample/s DAQ, which applied some conservative design techniques, and so has a lower sampling rate, higher gain, and better SNR. Moreover, it was designed as a differential sensor array, in order to further reduce common-mode noise. Prototype 2's architecture is generally similar to Figure 11.5 on page 164, except that the output channel is modified to include an

|                                                            | Prototype 1                               | Prototype 2                                           | Prototype 3                           |
|------------------------------------------------------------|-------------------------------------------|-------------------------------------------------------|---------------------------------------|
|                                                            | $2 \times 8$ PD array with 10.5GS/s DAQ   | $1 \times 7$ differential PD array with 2.6GS/s DAQ   | One differential PD with 10.5GS/s DAQ |
| **Front End**                                              | -                                         | -                                                     | -                                     |
| Photo-Diode (PD) size, each pixel ($\mu m \times \mu m$)   | $45 \times 45$                            | $90 \times 90$ ($\times 2$)                           | $45 \times 45$ ($\times 2$)           |
| PD array size                                              | $2 \times 8$                              | $1 \times 7$                                          | $1 \times 1$                          |
| Electrical input                                           | 0                                         | 1                                                     | 0                                     |
| Differential / Single-ended                                | Single-ended                              | Differential                                          | Differential                          |
| TIA Gain ($\Omega$)                                        | 2000                                      | 4000                                                  | 4000                                  |
| First Corner Frequency of LPF (GHz)                        | 2.4                                       | 0.8                                                   | 1.3                                   |
| **Sub-Sampling SHA**                                       | -                                         | -                                                     | -                                     |
| Sample number for a full period                            | 128                                       | 32                                                    | 128                                   |
| Equivalent sampling rate for 82MHz input (GSample/sec)     | 10.496                                    | 2.624                                                 | 10.496                                |
| Voltage Gain                                               | 1                                         | 50 or 250                                             | 50 or 250                             |
| **Pulse Generator**                                        | -                                         | -                                                     | -                                     |
| Clock source                                               | $\times 32$ PLL                           | $\times 32$ PLL                                       | $\times 32$ PLL                       |
| Quadrature clock outputs                                   | Yes                                       | No                                                    | Yes                                   |
| Effective Virtual Pulse width                              | $\frac{1}{128}$ of signal period          | $\frac{1}{64}$ of signal period                       | $\frac{1}{128}$ of signal period      |

Table 12.1: Circuit Specifications







(b) Layout diagram of Prototype



Figure 12.1: Chip RF2: Photo and layout diagrams (A: Pulse Generator; B: PDs; C: Pixel circuits other than PD; D: Output channels; E:  $\rm I/O~pads$ )

output differential amplifier as shown in Figure 11.6 on page 165. The circuit was designed for a  $1 \times 8$  differential PD array. However, the last pair of PDs has not been implemented. Instead, one of its TIA inputs is connected to a chip input pin, and the other is left open. This allows electrical-only testing. The layout diagram of *Prototype 2* is shown in Figure 12.1(c).

Prototype 3 is effectively a combination of design techniques used in Prototype 1 and 2. It has Prototype 1's PD, Sub-Sampling SHA, and Pulse Generator. On the other hand, it also has Prototype 2's TIA for a higher gain and differential structure for better noise performance. Figure 12.1(d) illustrates Prototype 3's layout.

Figure 12.2 shows a photo of the testing platform for Chip RF2.



Figure 12.2: Testing platform for Chip RF2
(A: Pulse laser source; B: Laser attenuators and lenses; C: Focusing lens; D: Testing board with Chip RF2 mounted; E: FPGA board; F: Continuous-wave laser source (not in use))

In the next two sections, Section 12.2 and 12.3, the measurement results of *Prototype 1* and *Prototype 2* are presented. *Prototype 3* gave very similar measurement results and encountered issues similar to those of *Prototype 1* and *2*, so its results are omitted from this thesis. They can be found in the paper [62].

# 12.2 Measurement Results of *Prototype 1*

## 12.2.1 Measurement setup

#### Laser source

To test the chip, the reflected probe laser was replaced by either a pulse laser, or a modulated Continuous-Wave (CW) laser.

The pulse laser source used in the measurement is a femto-second pulse laser with a repetition rate of 80MHz. As a result, the internal PLL in *Prototype 1* operates at

$$80MHz \times 32 = 2.56GHz$$

and the sampling rate is therefore

$$80MHz \times 128 = 10.24GSample/s$$
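The relation between the reference frequency, the PLL frequency, and the equivalent sampling rate can be written as a small helper. The sketch below assumes that the factor of 128 is the product of the $\times 32$ PLL and the 4 quadrature clock phases, which is consistent with Table 12.1; the function names are illustrative.

```python
PLL_MULTIPLIER = 32  # x32 PLL (Table 12.1)
CLOCK_PHASES = 4     # quadrature clock outputs of the QVCO


def pll_frequency_hz(f_ref_hz):
    """PLL output frequency for a given reference frequency."""
    return f_ref_hz * PLL_MULTIPLIER


def equivalent_sampling_rate(f_ref_hz, phases=CLOCK_PHASES):
    """Equivalent sampling rate in samples per second."""
    return pll_frequency_hz(f_ref_hz) * phases


def sample_spacing_ps(f_ref_hz, phases=CLOCK_PHASES):
    """Time between adjacent equivalent-time samples in picoseconds."""
    return 1e12 / equivalent_sampling_rate(f_ref_hz, phases)


for f_ref in (80e6, 82e6):
    print(f"{f_ref / 1e6:.0f} MHz reference: "
          f"PLL {pll_frequency_hz(f_ref) / 1e9:.3f} GHz, "
          f"{equivalent_sampling_rate(f_ref) / 1e9:.3f} GSample/s, "
          f"{sample_spacing_ps(f_ref):.1f} ps per sample")
```

Setting `phases=1` gives the rate of a design without quadrature clock outputs, such as *Prototype 2* in Table 12.1.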

The wavelength of the laser is 800nm, and the light power reaching the surface of the chip is 2.2mW. The 80MHz synchronised signal from the laser source is used as the reference input of the PLL inside the DAQ.

The CW laser source is a laser diode HFE6391-561 from Advanced Optical Ltd., which provides light at 840nm wavelength and 0.6mW of power. This laser source was directly modulated by either an 80MHz signal, or one of its harmonics.

#### **FPGA**

The off-chip logic, which provides the low-frequency divider (Section 5.6 on page 77) and the control of the data acquisition (i.e. pseudo-parallel array operating, Sub-Section 11.1.2 on page 156), is implemented on an FPGA, Xilinx



Figure 12.3: Off-chip logic used for chip-testing

Spartan-3 XCS200FT256-4. Figure 12.3 shows the sketch of the circuits inside the FPGA and their interface with the on-chip DAQ.

As shown in the figure, the FPGA provides four options for the presenting time of a sample (Section 7.6 on page 101 and Section 9.3 on page 142):  $2\mu s$ ,  $20\mu s$ ,  $200\mu s$ , and manual control. The first 3 options respectively correspond to 160, 1600, and 16000 times of repetitive sampling for each Target Sample (Section 7.6 on page 101). The last option uses a button as a manual clock input, which can be used to lock one Linearised Holding Sample on the output channel during the testing.
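The repetition counts follow directly from the presenting time and the 80MHz repetition rate. The sketch below also estimates how long one complete scan of the array would take, assuming 128 samples per pixel, 8 columns enabled one at a time, and no overhead for switching between pixels. The scan-time estimate is an illustration under these assumptions, not a measured figure.

```python
LASER_REP_RATE_HZ = 80e6  # repetition rate of the pulse laser
N_SAMPLES = 128           # Linearised Holding Samples per pixel
N_COLUMNS = 8             # columns scanned one at a time


def repetitions(presenting_time_s, rep_rate_hz=LASER_REP_RATE_HZ):
    """Number of signal periods that elapse while one sample is presented."""
    return round(presenting_time_s * rep_rate_hz)


def array_scan_time_s(presenting_time_s, n_samples=N_SAMPLES, n_columns=N_COLUMNS):
    """Time to present every sample of every column once (overhead ignored)."""
    return presenting_time_s * n_samples * n_columns


for t in (2e-6, 20e-6, 200e-6):
    print(f"{t * 1e6:g} us: {repetitions(t)} repetitions, "
          f"full scan {array_scan_time_s(t) * 1e3:.3f} ms")
```

A longer presenting time averages more repetitions per sample at the cost of a proportionally longer scan.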

As mentioned in Sub-Section 11.1.2 on page 156, there are two possible modes of scanning (the timing-first scanning and the spatial-first scanning) which can be implemented on the FPGA. In the current measurements, timing-first scanning was usually applied (as shown in Figure 12.3), because it is more convenient for processing the data of each pixel separately. If the spatial-first scanning were used, the data from one pixel would be interleaved with the data from the other pixels. The address line can also be switched to manual input mode, which is used to lock one pixel on the output channel during the testing.

#### ADC and Digital Filter

In order to simplify the design and shorten the design period, a digital storage oscilloscope, rather than a custom ADC chip, was used as the ADC. The digital filter was implemented with a few Matlab programmes<sup>1</sup>. These two off-chip modules were not the main design targets of this thesis, and can be easily implemented with current mature design technologies, either off-chip or on-chip.

## 12.2.2 Measurement of dark output

When there is no light applied on the PD array, the output of *Prototype 1* is not a straight line. Figure 12.4 is the dark output of one pixel in *Prototype 1*<sup>2</sup>. In this test, the 80MHz electrical synchronising signal from the laser source was connected to the circuit as the reference of the clock source, but no light was shone on the photo-diodes. As shown in the figure, the 128 samples are divided into 4 *output groups*, each of which has its own DC level. This is caused by the asymmetry of the clock source, namely the 4-phase clock errors (see Section 8.3 on page 120 for details).

Since there is no light input on the chip, one would expect 4 straight lines, one for each *output group*. However, there is some fluctuation around the DC offset lines caused by electrical noise within the detector. This is the static dark noise.
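The grouping of the dark output can be illustrated with a short sketch. It assumes that sample $k$ is taken with clock phase $k \bmod 4$, so that the four *output groups* are interleaved, and it uses synthetic per-phase offsets rather than measured data. It only shows how the DC level of each group could be estimated and subtracted; it is not the correction method of Section 8.3.

```python
from statistics import mean


def group_dc_levels(samples, n_phases=4):
    """Mean of each output group; sample k is assumed to be in group k % n_phases."""
    return [mean(samples[p::n_phases]) for p in range(n_phases)]


def remove_group_offsets(samples, n_phases=4):
    """Subtract from every sample the DC level of the group it belongs to."""
    levels = group_dc_levels(samples, n_phases)
    return [s - levels[k % n_phases] for k, s in enumerate(samples)]


# Synthetic dark output: hypothetical per-phase offsets in mV, no noise.
offsets_mv = [12.0, -5.0, 3.0, -10.0]
dark = [offsets_mv[k % 4] for k in range(128)]

print(group_dc_levels(dark))
print(max(abs(v) for v in remove_group_offsets(dark)))
```

With real data, what remains after the subtraction would be the fluctuation around the four DC levels, i.e. the static dark noise described above.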

There is a correlation among the dark noises of all pixels, indicating a common noise source.

The PLL in the PG (Pulse Generator) is synchronised with the reference signal, and all of its signals are either 80MHz or its harmonics. The VCO and its buffers in the PLL are power-hungry modules. Consequently the supply current

<sup>1</sup> The functions of the ADC and the digital filter can be found in Section 7.1 on page 93. The design detail of the digital filter is in Section 8.4 on page 133.

<sup>2</sup> This means the output on the pin of Chip RF2, i.e. the output signal of the on-chip output channel in Figure 11.5 on page 164. The signal on the chip pin is at a much lower frequency because of the Sub-Sampling SHA. However, in the following figures of this chapter, the time-domain signals are all presented as if they were in the original RF band, i.e. the repetition frequency is 80MHz.



Figure 12.4: Dark output of Prototype 1

of the PLL has large frequency components at 80MHz and its harmonics. The currents and voltages in the PLL can cause significant interference via the power supply wires, parallel wires, and the substrate. To minimise the interference, the power supply of the PG is independent of that of the pixel circuits and the output channels. However, the generated pulses are used to drive the SHAs, which are physically close to the TIAs. The TIA circuit is sensitive to small currents, including noise currents in the substrate.

## 12.2.3 Measurement with pulse laser input

The femto-second laser pulses are significantly shorter than the response time of the circuits used in the detection system. Effectively, the laser pulses can be considered as an ideal impulse train which contains frequency components from DC to a frequency significantly higher than 10GHz. When the laser pulses are applied to the PD array, the output of the DAQ is the impulse response of the system, i.e. the Inverse Fourier Transform of the frequency response of the DAQ.

Figure 12.5 shows the original output of a pixel on *Prototype 1* when the pulse



Figure 12.5: Original output of *Prototype 1* when pulse laser is applied

laser was applied on that pixel<sup>3</sup>. 128 samples were obtained over the whole period of the input signal, i.e. one sample every 97.7ps. As shown in the figure, there is a sharp negative peak near 2ns, which is the moment at which the laser pulse hits the PD. The buffer in the output channel has a negative gain, and so the initial output is negative. After the negative peak, there is a positive overshoot followed by a damped oscillation, which will be explained later. According to the figure, the error due to the 4-phase clock is obvious and needs to be removed.

The RMS (Root-Mean-Square) of the random noise on the output is 8mV, while the peak-to-peak voltage of the signal is 420mV.

$$\frac{420mV}{8mV} = 52.5 < 2^6$$

So a 6-bit ADC is enough for digitizing the output<sup>4</sup>.
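The same estimate can be expressed as a helper that converts the ratio of peak-to-peak signal to RMS noise into a number of bits. The function name is illustrative, and the rounding up to a whole number of bits mirrors the comparison with $2^6$ above.

```python
import math


def resolution_bits(v_pp, v_noise_rms):
    """Return the signal-to-noise ratio and the smallest bit count covering it."""
    ratio = v_pp / v_noise_rms
    return ratio, math.ceil(math.log2(ratio))


ratio, bits = resolution_bits(420e-3, 8e-3)
print(f"ratio = {ratio:.1f}, log2 = {math.log2(ratio):.2f}, bits = {bits}")
```

A weaker input signal with the same noise floor gives a smaller ratio and therefore fewer useful bits.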

To eliminate the static dark noise and the system errors, the methods presented in Section 8.3 on page 120 should be applied. As the precise solution needs the measurement results using the CW laser source, it will be discussed in the next sub-section.

<sup>3</sup> The size of the focused laser spot is much larger than a pixel (approximately as big as  $3 \times 3$  pixels), so not all of the 2.2mW laser power goes into the same pixel.

<sup>4</sup> Since the pulse laser is the most powerful input signal in the current measurements, 6 bits can be considered as the maximum resolution of the presented DAQ.

Figure 12.6 shows the processed output of the pixel on *Prototype 1* after the approximate solution is applied to remove the 4-phase clock error and the dark noise.



Figure 12.6: Processed output of *Prototype 1* by removing system error and dark noise

In this figure, the peak is much wider ( $\sim 0.5ns$ ) than the laser pulse, because the LPF in the front-end has limited the bandwidth. Moreover, the intrinsic bandwidth of the Sub-Sampling SHA widens the pulse further.

After the peak, there is a damped oscillation with a period of approximately 2ns. This indicates a pair of poles near 500MHz, which is possibly caused by the feedback loop in the TIA. One pair of its poles depends on the parasitic capacitance of the PD. As the photo-diode is not a standard device in the AMS C35 Library, its parasitic capacitors and resistors may not have been accurately modelled in the post-layout simulation.

Another possible reason for the damped oscillation may be the leakage current in the photo-diode, as shown in Figure 12.7. In the photo-diode, the N-well and the P-substrate form an additional reverse-biased PN junction. This junction also generates electron-hole pairs when photons enter it, and therefore produces a small current. A small proportion of this current would go through the substrate, and could possibly interfere with the TIA circuits. This current should arrive at the TIA later than the current coming from the P+ terminals of the photo-diode, and could therefore form the damped oscillation after the initial peak response.



Figure 12.7: Leakage current from the N-well-P-sub junction

Figure 12.8 is the normalised frequency response of the DAQ system, i.e. the DFT of Figure 12.6. Due to the damped oscillation, there is a peak near 400MHz, which indicates the position of the pole pair mentioned above. This frequency response can be used to generate the FIR filter described in Section 8.4 on page 133.
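The step from the time-domain output to the normalised frequency response can be sketched as follows. Because the 128 samples span exactly one 12.5ns period, DFT bin $m$ corresponds to $m \times 80MHz$. The impulse response used here is synthetic (a damped cosine with hypothetical parameters), standing in for the measured data of Figure 12.6.

```python
import cmath
import math

N = 128    # samples per signal period
F0 = 80e6  # repetition frequency in Hz, so bin m is at m * F0


def dft_magnitude(x):
    """Magnitude of the DFT of x for bins 0 .. len(x) // 2."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n)))
            for m in range(n // 2 + 1)]


def normalised_response_db(x):
    """Frequency response in dB, normalised so that the largest bin is 0 dB."""
    mag = dft_magnitude(x)
    ref = max(mag)
    return [20 * math.log10(m / ref) if m > 0 else float("-inf") for m in mag]


# Synthetic impulse response: 400 MHz oscillation with a 2 ns decay constant
# (hypothetical values chosen for illustration only).
dt = 1 / (N * F0)
h = [math.exp(-k * dt / 2e-9) * math.cos(2 * math.pi * 400e6 * k * dt)
     for k in range(N)]

resp_db = normalised_response_db(h)
peak_bin = max(range(len(resp_db)), key=resp_db.__getitem__)
print(f"peak at {peak_bin * F0 / 1e6:.0f} MHz")
```

Replacing `h` with the processed samples of a pixel would give the kind of curve shown in Figure 12.8, with a frequency resolution of 80MHz.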

## 12.2.4 Measurement with modulated CW laser input

The testing method for the CW laser is similar to the Calibration Procedure of the approximate solution, which is described on Page 137, Sub-Section 8.4.2. The only difference is that the fundamental frequency  $f_0$  is 80MHz in the measurement, in order to be more comparable to the measurement result from the pulse laser.

It needs to be noted that although the signal being modulated to the laser



Figure 12.8: Frequency response of the DAQ in *Prototype 1* 

source is a sine wave, the actual optical signal is not sinusoidal. This is because the output power range of the laser diode being used is relatively narrow for this application<sup>5</sup>. To obtain a noticeable response at the output port, the voltage swing of the modulating signal has to be large. It is so large that the laser diode does not operate in its linear range, and consequently the optical signal is not sinusoidal. Moreover, the laser diode circuit cannot keep its input impedance constant over such a large operating range. The resulting impedance mismatch causes reflections back to the signal generator, which distort the output waveform even further.

Figure 12.9(a) shows the original output when a signal  $f = 2f_0$  is modulated onto the CW laser source. After the 4-phase clock system error and the dark noise are removed, as shown in Figure 12.9(b), the output is not a sine wave.

Because of the non-sinusoidal input signal, there could be more than one frequency element on the output, i.e. one is at the input frequency, and the others are its harmonics. For example, if input frequency  $f=2f_0$ , the frequency elements on the output would include  $2f_0$ ,  $4f_0$ ,  $6f_0$ , etc.
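The appearance of harmonics can be illustrated numerically. The following is a minimal sketch, not the measured laser characteristic: it assumes a hypothetical optical output that cuts off at zero and saturates at a normalised power of 1, driven by a large sine wave at $f=2f_0$ around an off-centre bias (the values 0.7 and 1.5 are arbitrary). The DFT is taken over one fundamental period of 128 samples.

```python
import cmath
import math

N = 128      # samples per fundamental period 1/f0
F_IN = 2     # modulation frequency in units of f0, i.e. f = 2*f0

def dft_magnitudes(x):
    """Return |X(k)|/N for k = 0 .. N/2, using a direct DFT."""
    n = len(x)
    return [
        abs(sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))) / n
        for k in range(n // 2 + 1)
    ]

def clipped_optical_power(theta):
    """Hypothetical laser model: output clipped to the range [0, 1].

    The bias (0.7) and drive amplitude (1.5) are illustrative assumptions,
    chosen so that the clipping is asymmetric.
    """
    drive = 0.7 + 1.5 * math.sin(theta)
    return min(max(drive, 0.0), 1.0)

optical = [clipped_optical_power(2 * math.pi * F_IN * m / N) for m in range(N)]
mags = dft_magnitudes(optical)

# Frequency bins (in multiples of f0) that carry non-negligible energy.
harmonics = [k for k in range(1, N // 2 + 1) if mags[k] > 1e-3]
print("non-negligible components (multiples of f0):", harmonics)
```

With this assumed non-linearity, all non-negligible components fall on multiples of the input frequency, which is the behaviour described above.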

 $<sup>^5</sup>$ The slope efficiency is only 0.075mW/mA near the standard forward bias current 6.5mA





(b) Output removing system error and dark noise

Figure 12.9: Waveform of signal  $f=2f_0$ 

Figure 12.10 shows the normalised frequency response measured with a modulated CW laser input. This result was obtained by the calibration procedure for the approximate solution presented on page 137. During the measurement, only the response on the original input frequency is considered, while the harmonics are ignored. The frequencies of more than  $40f_0$  (3200MHz) are not shown here because the obtained output is too weak and noisy.



Figure 12.10: Frequency Response of Circuit C in CW laser-input test

Compared to the measurement result from the pulse laser in Sub-Section 12.2.3 (the dashed line), the results from the CW laser are much more uneven. This is mainly because the CW laser source has much lower power, and this power is spread over time. The power of the pulse laser source, in contrast, is higher and is concentrated in one short interval of each period. Therefore the SNR of the CW laser measurement is much lower than that of the pulse laser measurement, and the result is less accurate.

Moreover, the non-linear effect on the laser diode, and the different wavelengths of the two laser sources introduced more variation between the two measurement results.

In both measurements, the digital part of the chip, i.e. the Pulse Generator, consumed 123mA of current, while the analogue part, i.e. the pixel circuit and the output channels, consumed 15.8mA of current.

#### Retrieve laser pulse input with the digital filter based on the precise solution

According to the theory in Sub-Section 8.3.2 on page 122, and the digital filter presented in Sub-Section 8.4.1 on page 133, the measurement result with CW laser input can be used to generate the calibration matrices. Moreover, the pulse laser input can be retrieved from its measurement result by these calibration matrices.

However, as mentioned above, the measurement result with CW laser input is very noisy, and it contains unexpected harmonics because the laser diode operated in its non-linear region. Therefore the calibration matrices would be inaccurate, and so would the retrieved signal.

There are two issues in the CW laser measurements, and so two corresponding amendments to the generation of the calibration matrices are applied here:

#### 1 Frequencies higher than $40f_0$

As mentioned above, the results for frequencies higher than $40f_0$ are not available in the CW laser measurement. The corresponding coefficients (i.e. $\mathcal{D}_z(k)$, for $z=1,\,2,\,3,\,4$ and $40 < k < 89$), which are unknown in this case, are replaced by significantly large random values. Therefore, the calibration matrices, which are the inverses of the matrices with $\mathcal{D}_z(k)$ coefficients, have very small factors for these frequencies. Consequently, the digital filter will provide small, negligible values at those frequencies.
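The effect of this substitution can be sketched in a few lines. The sketch below is a simplification and not the calibration procedure itself: it assumes one scalar coefficient per frequency, so that the "inverse" is just a reciprocal, whereas the thesis uses matrices. The measured magnitudes and the constant `BIG` are hypothetical.

```python
import random

random.seed(0)  # fixed seed so that the example is repeatable

# Hypothetical measured coefficient magnitudes for k = 1 .. 40.
coeff = {k: 1.0 / (1.0 + k / 10.0) for k in range(1, 41)}

# Unknown coefficients for 40 < k < 89 are replaced by large random values.
BIG = 1e6  # assumed scale of the "significantly large" placeholder
for k in range(41, 89):
    coeff[k] = BIG * (1.0 + random.random())

# Scalar stand-in for the matrix inversion: large coefficient -> small factor.
inverse = {k: 1.0 / c for k, c in coeff.items()}

largest_unknown = max(inverse[k] for k in range(41, 89))
smallest_known = min(inverse[k] for k in range(1, 41))
print(largest_unknown, smallest_known)
```

Under these assumptions the factors applied to the unmeasured frequencies are orders of magnitude smaller than those applied to the measured ones, so their contribution to the filter output is negligible.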

#### 2 Phase information

The phase information of the CW laser measurement is unavailable. There were two signal generators during the measurement: one provided the $f_0$ signal to synchronise the on-chip PLL, the other provided the $Nf_0$ signal to drive the laser diode. These two generators were phase-locked to each other, but their phase difference was not a constant. It changed randomly every time the frequency of either one of the generators was modified. However, the phase information can be estimated, because the relative phases among the 4 Output Groups are still measurable, and the absolute phases should be very close to the results in the approximate solution. The phases are estimated as follows:

- (a) In Step 4 of the calibration procedure on page 135, get  $\mathcal{D}_{oz}(k)$  for all Output Groups;
- (b) Calculate the phases of these complex values, namely  $\phi_0$ ,  $\phi_1$ ,  $\phi_2$ , and  $\phi_3$ ;
- (c) The mean phase  $\overline{\phi} = \frac{1}{4} \sum_{z=0}^{3} \phi_z$ ;
- (d) Get the corresponding phase value  $\psi_a$  in the pulse laser measurement with the approximate solution;
- (e) The new phase  $\psi_z = \phi_z - \overline{\phi} + \psi_a$ , where  $z = 0 \sim 3$ ;
- (f) Adjust the phases of  $\mathcal{D}_{oz}(k)$  to  $\psi_z$ .
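The phase estimation can be sketched as follows for a single frequency index $k$. This is an illustration under stated assumptions, not the code used for the measurements: the mean is taken over all four Output Groups, step (e) subtracts that mean before adding $\psi_a$, and the four phases are assumed not to straddle the $\pm\pi$ wrap-around, so that a plain arithmetic mean is meaningful. The function name `adjust_phases` and the example values are hypothetical.

```python
import cmath

def adjust_phases(d_oz, psi_a):
    """Re-reference the phases of the four complex coefficients D_oz(k).

    d_oz  : list of four complex values, one per Output Group (step (a))
    psi_a : phase from the pulse laser measurement (step (d)), in radians
    """
    phi = [cmath.phase(d) for d in d_oz]           # step (b)
    phi_mean = sum(phi) / len(phi)                 # step (c)
    psi = [p - phi_mean + psi_a for p in phi]      # step (e)
    # Step (f): keep each magnitude, replace the phase.
    return [cmath.rect(abs(d), q) for d, q in zip(d_oz, psi)]

# Hypothetical coefficients with an unknown common phase offset.
measured = [cmath.rect(2.0, 0.6), cmath.rect(1.5, 0.7),
            cmath.rect(1.0, 0.8), cmath.rect(0.5, 0.9)]
adjusted = adjust_phases(measured, psi_a=0.2)
new_phases = [cmath.phase(c) for c in adjusted]
print(new_phases)
```

The relative phases among the Output Groups are preserved, while their mean is moved to $\psi_a$.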

Figure 12.11(a) shows the calculation result of the digital filter output. Ideally, the retrieved signal is supposed to be similar to a short pulse. Its frequency response is a nearly flat line from DC to half of the sampling rate, except that there are 3 zero-points,  $16f_0$ ,  $32f_0$ , and  $48f_0$ . However, as shown in Figure 12.10, the measured results of CW laser and pulse laser are quite different. Consequently the retrieved signal in frequency domain will not be flat.

In Figure 12.11(a), there are a few spikes in the high-frequency range, more precisely at  $21f_0$ ,  $27f_0$ ,  $33f_0$ ,  $36f_0$ , etc. As can be seen in Figure 12.10, the measured frequency responses of the CW laser at these points are abnormally small due to the poor SNR. This results in larger-than-normal coefficients in the calibration matrices for these frequencies.



(a) Initial calculation result



(b) Retrieved signal with low frequencies only

Figure 12.11: Retrieved signal in frequency domain

To retrieve a more reasonable signal, the frequency information higher than  $16f_0$  is eliminated, as shown in Figure 12.11(b). By applying the Inverse Discrete Fourier Transform, the retrieved laser pulse signal in the time domain is obtained, as shown in Figure 12.12. As expected, the retrieved signal is poor because of the low SNR in the CW laser measurement. However, a positive pulse is clearly visible in the figure.
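The truncation and inverse transform can be sketched as below. This is a minimal illustration on synthetic data, not the measured spectrum: a hypothetical one-sample pulse at index 40 is used as the "true" signal, and a spurious spike is injected at $27f_0$ to mimic an over-large calibration coefficient. Components up to and including $16f_0$ are kept, together with their conjugate-symmetric partners.

```python
import cmath
import math

N = 128       # samples per fundamental period
K_MAX = 16    # highest multiple of f0 that is kept

def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def keep_low_frequencies(spectrum, k_max):
    """Zero every bin above k_max, including the mirrored negative frequencies."""
    n = len(spectrum)
    return [c if min(k, n - k) <= k_max else 0j for k, c in enumerate(spectrum)]

# Synthetic test data (assumption): a short pulse plus a spurious spike at 27*f0.
pulse = [1.0 if m == 40 else 0.0 for m in range(N)]
spectrum = dft(pulse)
for k in (27, N - 27):
    spectrum[k] *= 50

retrieved = idft(keep_low_frequencies(spectrum, K_MAX))
peak_index = max(range(N), key=lambda m: retrieved[m].real)
print("pulse retrieved at sample", peak_index)
```

The retrieved pulse is broadened by the truncation, but its position is preserved and the spurious high-frequency spike no longer affects the result.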



Figure 12.12: Retrieved signal in time domain

If a CW laser with higher optical power were used, the measurement results from the CW laser input would be more accurate, and so would the calibration matrices. In this case, a better retrieved signal could be generated.

#### 12.2.5 Array output and light leakage

Figure 12.13 is a photo of *Prototype 1* under testing when the pulse laser is focused on the top-side of its PD array. The voltage output of each PD in the array is shown in Figure 12.14.

To investigate the light power received on each pixel, the RMS of the output voltage is calculated, as shown in Figure 12.15(a). In this figure, the brightness



Figure 12.13: Photo: the laser is focusing to the top of the array in *Prototype 1* 

of the 16 rectangles represents the RMS voltage of the 16 pixels. A brighter colour means a larger RMS voltage. Because each pixel has its own gain due to process variation and mismatch in the chip, Figure 12.15(a) does not clearly show the trend in brightness.

This gain variation can be calibrated by a set of reference outputs with equal light inputs, i.e. applying an equal light signal onto each pixel, and measuring the RMS output voltage. The measurement result for the equal input is shown in Figure 12.15(b).

By dividing the values in Figure 12.15(a) by the values in Figure 12.15(b), the normalised RMS output voltage was obtained, as shown in Figure 12.15(c). It indicates the light power received in each pixel. As shown in the figure, the left rectangles are brighter as the laser is focused on the top-side of the array in the photo.
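The normalisation amounts to a per-pixel division of two RMS values. The following is a minimal sketch with two hypothetical pixels; the gains, light levels and waveform shape are invented for illustration, and the helper names `rms` and `normalised_rms` are not from the thesis.

```python
import math

def rms(samples):
    """Root-mean-square value of one pixel's output waveform."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def normalised_rms(focused_outputs, reference_outputs):
    """Divide each pixel's RMS output by its RMS output under equal illumination."""
    return [rms(f) / rms(r) for f, r in zip(focused_outputs, reference_outputs)]

# Hypothetical data: two pixels with different gains and received light levels.
shape = [math.sin(2 * math.pi * m / 128) for m in range(128)]
gains = [2.0, 0.5]     # unknown per-pixel gains (process variation and mismatch)
light = [1.0, 0.25]    # relative light power actually received

focused = [[g * p * s for s in shape] for g, p in zip(gains, light)]
reference = [[g * s for s in shape] for g in gains]   # equal light on every pixel

relative_light = normalised_rms(focused, reference)
print(relative_light)
```

Because the per-pixel gain appears in both the numerator and the denominator, it cancels, leaving only the relative light power.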

However, even those pixels not hit by the focused laser spot (the "dark" pixels) have outputs. The outputs are similar to the pixels hit by the laser (the "bright" pixels), but have smaller amplitudes. This means the laser still affects the "dark" pixels. There are two possible ways for the laser signal to reach the "dark" pixels, optically or electrically.



Figure 12.14: Output waveforms of the pixel array (X-axes: Time (ns); Y-axes: Voltage (V))



Figure 12.15: Relative light power received on the PD array

When a "dark" pixel is enabled, the "bright" pixels are disabled, therefore the current through the "bright" pixels is small. Compared to the dark noise generated by the power-hungry pulse generator, the electrical interference from the disabled "bright" pixels can be ignored.

So the "dark" pixel signals are induced optically by the laser. The light entering the "bright" pixels reflects or scatters from the area around the "bright" pixels into the dark ones, because the isolation between the PDs is narrow. The laser will also produce some current in the substrate, as shown in Figure 12.7 on page 178, and this current will interfere with the "dark" pixels as well.

# 12.3 Measurement Results of Prototype 2

As mentioned in Section 12.1, *Prototype 2* is a 2.624GSample/s DAQ with a $1\times 8$ differential array. Each of the first 7 pixels has one pair of PDs, while the last pixel has an electronic input only. According to its design details presented from Part II to Part IV, it has a much slower sampling rate and a narrower Front-End bandwidth, but a much higher gain. It is based on more conservative design techniques, which should make it more reliable than *Prototype 1*.

#### 12.3.1 Measurement of the photo-diode array

The measurement setup for the PD array testing is similar to that for *Prototype 1*, i.e. applying either a pulse laser or a modulated CW laser to the PDs.

Unfortunately, the optical measurement was unsuccessful. The DC inputs to the two differential input terminals of the instrumentation amplifier in the output channel (see Figure 11.6 on page 165 for details) were unbalanced. The difference between their DC operating points was far larger than expected. In most chip samples, it was so large that it exceeded the linear range of the instrumentation amplifier; the output signal was "stuck" at either GND or VDD, and no valid data could be obtained.

For those rare pixels where the inputs to the instrumentation amplifier were nearly balanced, part of the expected waveform can be seen on the output. However, the static dark noise was larger than expected. The overall sum of the dark noise and the required output exceeds the linear output range, i.e. the supposed peak-to-peak voltage is more than VDD-GND.

This imbalance was mainly caused by layout differences and mismatch among the pixel circuits. It was a serious mistake not to include a bias circuit to adjust the balance of the instrumentation amplifier<sup>6</sup>; such a circuit should have been added to allow the DC offsets to be adjusted.

#### 12.3.2 Measurement of the electrical-input port

The inherent DC offset problem could be solved for the electrical input once a DC current is inserted to compensate the imbalance.

The testing method for the electrical input is similar to that with the CW laser, except that the modulated CW laser is replaced by an electrical signal. The fundamental frequency in this measurement is 82MHz.

<sup>&</sup>lt;sup>6</sup>Prototype 3 also has the issue of unbalanced differential signals. However, in Prototype 3, the smaller PD size provides a smaller gain. So the problem is easier to solve in Prototype 3, which is achieved by moving the focused laser spot closer to one PD than the other in the PD pair. In this situation, the electrical imbalance is compensated by the optical imbalance.



Figure 12.16: Normalised frequency response of *Prototype 2* 

Figure 12.16 shows the normalised frequency response of the pixel with electrical input<sup>7</sup>. This result was obtained by the calibration procedure for the approximate solution presented on page 137. The bandwidth shown in the figure is narrow (the 3dB point is less than 400MHz), because the chip package and the input pin are not designed for RF applications<sup>8</sup> and limit the over-all bandwidth.

# 12.4 Summary

This chapter presented the measurement results of the designed DAQ system. This DAQ was implemented in the AMS C35 process on Chip RF2. The DAQ Prototype 1 in Chip RF2 contains a  $2\times 8$  high-speed optical sensor array, and the 10.496GS/s ( $82MHz\times 128$ ) sampling circuits. However, because of the laser sources available, it operated at 10.24GS/s ( $80MHz\times 128$ ) during testing. The

 $<sup>^7</sup>$ The electrical input is not a standard RF terminal. So the measured absolute voltage gain is inaccurate. The estimation of the absolute gain at 82MHz is 75dB.

<sup>&</sup>lt;sup>8</sup>The RF input of the presented DAQ system is an optical signal, and the output of the system is in base-band. Consequently a "non-RF" IC package is used to reduce the cost.

measurement results showed that the circuits successfully achieved the required sampling rate (> 10GS/s), with a maximum output resolution of approximately 6 bits. However, the prototypes also encountered some problems, which include the static dark noise, severe 4-phase-clock errors, and light leakage.

The DAQ Prototype 2 in Chip RF2 has a more conservative sampling rate of 2.624GS/s ( $82MHz \times 32$ ), a  $1 \times 7$  differential optical sensor array, and another electrical-input port as the 8th pixel of the array. The measurement on the electrical-input showed that this DAQ achieved the expected sampling rate. However, because the optical differential pixels were badly unbalanced, no usable data was collected during the measurements with optical inputs.

Possible solutions for these arising issues are discussed in the next chapter.

# Chapter 13

# Issues arising and further work

# 13.1 Current issues and possible solutions

Although the presented DAQ system worked successfully, there are a few issues which need to be solved. This section presents possible solutions, which can be applied to future work.

#### 13.1.1 Static dark noise and 4-phase-clock error

As mentioned in Section 12.2, the static dark noise and the noise caused by the 4-phase clock source have an obvious influence on the output, especially in the case of a CW laser source. As shown in Figure 12.9(b) on page 180, the peak-to-peak voltage of the output signal is approximately 18mV. On the other hand, according to Figure 12.4 on page 175, the biggest DC difference among the 4 *output groups* is more than 40mV, while the static dark noise has a peak-to-peak voltage of about 5mV.

In the current DAQ system, these errors are pre-measured and corrected in the off-chip digital filter. However, the errors are comparable to or even larger than the desired signal. This will inevitably limit the dynamic range of the output buffer. For example, if an ADC with a linear input range of  $0 \sim 2V$  is used to digitise the output, an amplifier with a gain of 100 can be inserted before the ADC for an 18mV peak-to-peak signal  $(18mV \times 100 = 1.8V)$ , assuming no DC offsets. However, with the 40mV of 4-phase-clock error and 5mV of static dark noise, the real peak-to-peak voltage at the output pin is about 50mV (Figure 12.9(a) on page 180). Therefore, the gain of the amplifier should be no more than 40. Consequently, the effective resolution is decreased.
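The arithmetic in this example can be checked with a short calculation. The figures are those quoted above; the final step, which expresses the reduced gain as a loss in bits, is an added illustration that assumes an ideal ADC with a fixed quantisation step, and is not a measured result.

```python
import math

adc_range_v = 2.0      # linear ADC input range (0 ~ 2V)
signal_pp_v = 0.018    # desired signal, peak-to-peak
total_pp_v = 0.050     # signal plus 4-phase-clock error and dark noise

ideal_gain = 100                       # gain usable without the errors
max_gain = adc_range_v / total_pp_v    # largest gain that avoids clipping

# With the reduced gain the signal occupies a smaller part of the ADC range.
signal_at_adc_v = signal_pp_v * max_gain

# Assumption: ideal ADC, so each halving of the signal swing costs one bit.
lost_bits = math.log2(ideal_gain / max_gain)

print(f"max gain = {max_gain:.0f}, signal at ADC = {signal_at_adc_v:.2f} V, "
      f"resolution loss = {lost_bits:.2f} bits")
```

Under this assumption, reducing the gain from 100 to 40 costs a little more than one bit of effective resolution.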

A solution to this problem is to remove the errors on-chip in the first place. Figure 13.1 illustrates one solution, which is a modified pixel circuit.



Figure 13.1: Pixel circuit removing dark noise and 4-phase-clock error

This circuit has two operating modes. One is the *sampling mode*, in which the switch S1 is turned on, and the sampled electrical charge from  $C_{smp}$  is stored in the capacitor  $C_{hld}$ . This mode is similar to the pixel circuits in Chip RF2. The output  $V_{out}$  includes both the required signal and the errors.

The other mode is the reference mode, in which the switch S1 is turned off, and the sampled electrical charge from  $C_{smp}$  is stored in the capacitor  $C_{hld0}$ . In this mode, the optical signal is blocked. The output  $V_{ref}$  is the "dark" output, which contains the static dark noise and the 4-phase-clock error.

After the reference has been obtained, the required signal is the difference between  $V_{out}$  and  $V_{ref}$ .

This circuit does not overcome one source of DC offset. The PD does have a current flowing through it even in the dark. This current is included in the sampling mode, but not in the reference mode. However, this current is usually less than 100pA, while the "bright" currents in our experiments are usually more than  $1\mu A$  [11]. So this current can be ignored.

Another error still remaining is that caused by the 4-phase clock. The input of the reference mode is a DC signal. So only the DC part of the 4-phase-clock error can be removed. The AC part of the error depends on the AC property of the input signal itself, and cannot be removed by this method<sup>1</sup>. But as mentioned in Sub-Section 8.3.3 on page 130, the DC part is the dominant error, and cannot be ignored. The AC part is relatively much smaller, and can be ignored.

In short,  $V_{out} - V_{ref}$  can be considered as a hardware implementation of the approximate solution presented in Sub-Section 8.3.3 on page 130.

As there are two output ports in Figure 13.1, the number of LFAs (Linearising Feedback Amplifiers) in the output channel should be doubled, one LFA for linearising  $V_{out}$ , the other for linearising  $V_{ref}$ . A differential amplifier can be used after the two LFAs to amplify  $V_{out} - V_{ref}$ . Figure 13.2 shows the new output channel for the single-ended PD array.



Figure 13.2: Output channel for the error-removing pixel circuits

<sup>1</sup>The principle of the 4-phase-clock error is described in detail in Section 8.3 on page 120.

#### 13.1.2 Unbalanced differential pixels

Prototype 2 tries to increase the output gain by using a differential instrumentation amplifier. Unfortunately, as mentioned in Section 12.3, the circuit fails because of the large difference in DC offsets. Extra circuits are required to adjust the balance of the differential signals.

If the pixel circuit in Figure 13.1, which removes the dark noise and the 4-phase-clock error, is applied, the output is generally the "net" response to the laser signal only. This means, ideally, the two differential inputs of the instrumentation amplifier are naturally balanced, because the DC levels are removed in the same way as the dark noise and the 4-phase-clock error.

However, the balance-adjusting circuit will still be required in case of mismatch in the output channel circuits. The conventional methods for balancing operational amplifiers [47, 57] could be applied here.

#### 13.1.3 Issues in the front-end circuits

Although the design of the front-end circuits is not the main target of this thesis, it is still worth discussing the solutions to the encountered issues.

#### Light leakage

As mentioned in Sub-Section 12.2.5 on page 185, there was light leakage among the PD pixels because the isolation and distance between the PDs are too small, which leads to scattered light being detected by the adjacent pixels. This can be solved by increasing the gap between the PDs, and adding more isolation, such as densely-placed and interlaced metal wires and vias, and thick guard-rings.

#### Peak in frequency response

As shown in Figure 12.8 on page 179 and Figure 12.10 on page 181, there is a peak near 400MHz caused by a pair of poles generated by the parasitic capacitor of the PD and the TIA. This peak can be removed by modifying the feedback gain of the TIA.

On the other hand, the pole pair can be exploited to increase the bandwidth. If the pole pair were placed near the original 3dB cut-off frequency, the attenuation around that frequency could be compensated by the pole pair, and the over-all bandwidth would consequently be increased.

However, the PD is not a standard device in the given process, and the estimation of its parasitic capacitance is inaccurate in the design software. Moreover, the reverse-biased PD cannot be simply considered as an ideal capacitor, as the resistivity of the N-well is not small enough to ignore. A more accurate model is needed if we are going to exploit the pole pair quantitatively.

# 13.2 Other possible improvements

#### 13.2.1 Using more advanced process technology

A possible direct improvement to the presented DAQ system is to use a more advanced CMOS process rather than the current  $0.35\mu m$  process.

With a shorter gate length, higher  $f_T$ , and higher  $f_{max}$ , the transistors in a more advanced process would have a quicker switching speed and better RF performance. The DAQ circuit can therefore achieve a higher sampling rate with the same architecture and design technique. Generally, if  $f_T$  and  $f_{max}$  were increased by a factor of N, it could be expected that the sampling rate would be boosted by approximately N times as well.

Alternatively, if the sampling rate remains unchanged, other properties of the DAQ can be easily improved.

Firstly, the power consumption is expected to decrease. In a more advanced CMOS process, the required power supply voltage is usually smaller. This will result in a lower power consumption, if the supply current does not increase. On the other hand, if the same clock frequency is used, the supply current should decrease rather than increase in the power-hungry pulse generator.

The supply current is lower because, in a more advanced process, the switching speed of the transistors is higher. Consequently the switching time of the clock buffers, i.e. the time taken for the clock signal to switch between 1 and 0, is shortened. These clock buffers are actually logic inverters, and account for a large portion of the power consumption of the pulse generator. Most of the power is dissipated during the switching time. If  $f_T$  and  $f_{max}$  were increased by a factor of N, the switching time would be expected to decrease by approximately N times. The shortened switching time with an unchanged operating frequency results in a lower supply current, and consequently lower power consumption.

Secondly, the higher switching speed may provide a better frequency response for the DAQ. As described in Sub-Section 8.2.2 on page 117, the Aperture Window Effect depends on the speed of the differential transistor switches. A higher switching speed means a sharper aperture window, and therefore a better response for the DAQ at high frequencies.

#### 13.2.2 Larger-size array

Another possible improvement is to increase the array size. But two potential issues may arise in the larger-size array.

One issue is the trade-off between the scanning time and the power consumption. In Chip RF2, the pseudo-parallel strategy is applied to save the total power. It sacrifices the total sampling time as the system scans the pixels one by one<sup>2</sup>. If

<sup>&</sup>lt;sup>2</sup>See Section 11.1 on page 155 for details.

the array size is increased, the scanning time will have to increase. However, it cannot be so long that the environmental parameters, such as the temperature, change during a scan. In addition, a longer scanning time introduces more low-frequency flicker noise to the system, which reduces the SNR.

To reduce the scanning time, several pixels have to operate simultaneously, i.e. in parallel. In Chip RF2, two pixels in the same row operate at the same time. For a larger array, the parallelism should be enhanced to reduce the scanning time. But more parallelism means more power consumption. A careful trade-off between the power and scanning time needs to be investigated for a large-size array.

The other issue is the optical efficiency. Currently, the array size for single-ended PDs is  $2 \times 8$ . The average area of one PD is  $2.5 \times 10^3 \mu m^2$ , while its pixel circuit requires on average<sup>3</sup> approximately  $13.7 \times 10^3 \mu m^2$ . As there are only 2 rows, the PDs can be assembled in one place, while the pixel circuits are at two sides of them, as shown in Figure 12.1(b) on page 170. In this case, the large size of the pixel circuits does not cause any problem. However, if there are more than two rows, the pixel circuits would inevitably be placed between the PDs. Therefore some light energy would be wasted as some of the light hits the circuits rather than the PDs. In this case, a more powerful laser source would be required.

<sup>&</sup>lt;sup>3</sup>In *Prototype 1*, two pixels in the same row share the current source and pulse buffers. Here the average area for one pixel circuit is half of the total area of two pixel circuits in the same row.

## Chapter 14

## Conclusions

This thesis presents an on-chip ultra-fast DAQ (Data AcQuisition) system for OSAM (Optical Scanning Acoustic Microscopy), which is implemented in a standard  $0.35\mu m$  CMOS process, the AMS C35 process.

OSAM is a non-contact method for investigating the properties of solid materials. In an OSAM system, a high-power pulse laser is applied to the material, and stimulates surface acoustic waves on the material surface. At the same time, another continuous-wave laser (the "probe" laser) with a much lower power is also applied to the surface. Its reflection can be used to investigate the vibration of the material.

The purpose of the presented DAQ is to sample the reflection of the probe laser, and then digitise it. The reflected laser signal has a fundamental frequency of approximately 80MHz. The actual value depends on the repetition rate of the pulse laser (either 82MHz or 80MHz during design and measurement). The required sampling rate for the DAQ is at least 10GSample/s.

To achieve this sampling rate, a clock signal greater than this frequency is needed. However, the transistors in the  $0.35\mu m$  CMOS process are not quick enough to provide a 10GHz clock directly.

To overcome this limitation, a PLL with 4-phase clock outputs was designed and implemented. The reference signal from the pulse laser source is used as its reference input. The output frequency is 32 times the reference, i.e. 2.624GHz (or 2.56GHz). The oscillator inside the PLL is a QVCO, which is effectively 2 cross-coupled VCOs. The coupling fixes the phase difference between the outputs of the two VCOs at 90°. Therefore the over-all output phases are 0°, 90°, 180°, and 270°. The effective clock frequency is 4 times the actual frequency, i.e. 10.496GHz (or 10.24GHz).

Based on this clock source, a pulse generator was designed to provide the control pulses for the sampler. The pulses were generated by a digital circuit, the DDU (Digital Delay Unit). It used the 4-phase output from the PLL as the trigger clocks. Therefore the jitter of the control pulses was minimized as the pulses were aligned with the PLL.

The pulse generator had a 32/33 dual-mode frequency divider, and a switch box which can re-shuffle the 4-phase clocks. These two sub-modules were used to generate a short delay, which was only  $\frac{1}{128}$  of the fundamental period (i.e. 95ps for 82MHz reference, or 98ps for 80MHz reference). This delay was required by the sampler to shift the acquired samples one by one on the output port. To generate the  $\frac{1}{128}T$  delay, the switch box re-shuffles the 4-phase clock so that a  $90^{\circ}$  delay is provided for the 2.624GHz (or 2.56GHz) clock.
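The delay values quoted above follow directly from the reference frequency, and can be checked as below. The sketch simply evaluates $\frac{1}{128}T$ and a $90^{\circ}$ shift of the PLL clock (32 times the reference) and confirms that they coincide; the function name `step_delay` is introduced here for illustration only.

```python
def step_delay(f_ref_hz):
    """Return (T/128, quarter period of the 32x PLL clock) in seconds."""
    period = 1.0 / f_ref_hz            # fundamental period T
    f_clk = 32 * f_ref_hz              # PLL output frequency
    delay_from_period = period / 128   # required sample-to-sample delay
    delay_from_quadrature = (1.0 / f_clk) / 4   # 90 degrees of the PLL clock
    return delay_from_period, delay_from_quadrature

for f_ref in (82e6, 80e6):
    d_period, d_quad = step_delay(f_ref)
    print(f"{f_ref / 1e6:.0f} MHz reference: {d_period * 1e12:.2f} ps "
          f"(quadrature step {d_quad * 1e12:.2f} ps)")
```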

The signal was acquired by a Sub-Sampling SHA (Sample-and-Hold Amplifier), which used the sub-sampling method to obtain high-frequency information at a relatively slow sampling rate. The charge-domain sampling strategy and double differential switches were used in this circuit to significantly shorten the effective sampling pulse, so that the high-frequency information would not be lost during the sampling. The periodicity of the system input was exploited in repetitive sampling to reduce the noise. The presented sampler obtained 128 samples for the whole period of the input signal, which was equivalent to a sampling rate of  $82MHz \times 128 = 10.496GSample/s$  (or 10.24GSample/s in the case of the 80MHz pulse laser).
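The principle of sub-sampling a periodic signal can be illustrated with an idealised sketch. It assumes one sample per fundamental period, with each successive sample delayed by a further $\frac{1}{128}T$; the actual timing in the presented DAQ, which combines repetitive sampling with the pulse generator, is more involved. The test signal (components at $5f_0$ and $20f_0$) is hypothetical.

```python
import math

F0 = 82e6        # fundamental frequency of the periodic input
T = 1.0 / F0     # fundamental period
N = 128          # samples per period in the reconstructed waveform

def signal(t):
    """Hypothetical periodic input with components at 5*f0 and 20*f0."""
    return (math.sin(2 * math.pi * 5 * F0 * t)
            + 0.5 * math.cos(2 * math.pi * 20 * F0 * t))

# Slow sampling: one sample per period, each shifted by a further T/N.
slow_times = [n * (T + T / N) for n in range(N)]
sub_sampled = [signal(t) for t in slow_times]

# Reference: what a real N*F0 (about 10.5 GSample/s) sampler would capture
# within a single period.
fast_sampled = [signal(n * T / N) for n in range(N)]

max_error = max(abs(a - b) for a, b in zip(sub_sampled, fast_sampled))
print("largest difference:", max_error)
```

Because the input repeats every $T$, the slowly acquired samples coincide with those of a sampler running 128 times faster, at the cost of a longer acquisition time.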

To correct the intrinsic errors in the Sub-Sampling SHA, several assisting modules were designed. These include a Linearising Feedback Amplifier to remove the non-linear effect, and a digital filter to compensate for the uneven frequency response of the sampler and the 4-phase-clock error.

A DAQ for the OSAM sensor array was presented, based on the Sub-Sampling SHA and the pulse generator. The optical front-end (the photo-diode, the transimpedance amplifier and the low-pass filter) in the sensor array is a modified version of Dr. Li's work. To minimise the power consumption of the DAQ system, a pseudo-parallel strategy of array scanning, and the bias sources with enabling feature were designed. A current-based buffer was presented to transfer the control pulses from the pulse generator to the pixel circuits without degrading the quality of the pulses very much.

The presented DAQ system was implemented in the AMS C35 process on Chip RF2. The measurement results show that the circuits have successfully achieved the required sampling rate of more than 10GS/s, with a maximum output resolution of approximately 6 bits.

However, the prototypes also encountered some problems, which include that the static dark noise and 4-phase-clock error were far more severe than expected, and the differential pixels were badly unbalanced. A new pixel circuit with a dark output as an auxiliary reference output is suggested to overcome these issues. In addition, using a more advanced CMOS process and increasing the array size are also discussed in the thesis.

The following list highlights the novel contributions of this thesis and their locations in the thesis.

- A clock source providing high-frequency information with low-cost process technology (Chapter 4): the PLL with 4-phase clock outputs, which are generated by a QVCO. The clock operates at 2.624GHz, but the 4-phase outputs give an equivalent 10.496GHz frequency information.

- An optimising method for designing high-speed static CML frequency dividers (Sub-Section 4.3.2 and Appendix A): With this method, one frequency divider in Chip RF1 achieves an operating frequency of 5.5GHz (this is the average value for all samples, while the maximum one is 5.7GHz). This is the fastest one reported so far in 0.35μm CMOS processes.
- A novel pulse generator to provide control pulses for the ultra-fast sampler (Chapter 5):
  - The digital circuit based DDU (Digital Delay Unit) minimizes the jitter of the pulses by aligning them with the clock signals from the PLL (Section 5.4).
  - The switch box and the 32/33 dual-mode frequency divider generate the required  $\frac{1}{128}T$  delay smartly, while the clock period is just  $\frac{1}{32}T$  (Section 5.2, 5.3, and 5.5).
- The 10.496GSample/s Sub-Sampling SHA (Chapter 7 and 8) with features including:
  - Sub-sampling for periodic signals to obtain high-frequency information at an achievable sampling rate (Section 7.2);
  - Charge-domain sampling for quicker sampling (Section 7.3);
  - Double differential switches for quicker sampling (Section 7.4);
  - Repetitive sampling to remove noise (Section 7.5);
  - Linearising Feedback Amplifier to remove non-linearity (Section 8.1);
  - Digital filter to compensate for the integration effect and the aperture window effect, and to remove the 4-phase-clock error (Section 8.2, 8.3, and 8.4).
- The DAQ for OSAM sensor array (Chapter 11):
  - Pseudo-parallel strategy of array scanning to minimize the power consumption (Section 11.1);
  - Current-based buffer for re-generating control pulses in the pixel circuits (Section 11.3).

Two papers have been published based on the work in this thesis:

- Peiliang Dong, Richard Smith, Barrie Hayes-Gill, and Ian Harrison, 10.2GSample/s DAQ system for Optical Scanning Acoustic Microscopy using 0.35μm CMOS Technology, IET Seminar on RF and Microwave IC Design, Feb. 2008;
- Peiliang Dong, Barrie Hayes-Gill, and Ian Harrison, Simple optimising methodology for static frequency divider design, Electronics Letters, Volume 42, Issue 22, 26 Oct. 2006, pp. 1267-1268.

Part VI

Appendix

## Appendix A

# Description of Chip RF1

### A.1 Review of the optimising theory

Sub-Section 4.3.2 on page 37 presents an optimising methodology for designing static CML Frequency Dividers (FD). This theory is focused on speed optimisation of the CML divide-by-2 FD, which consists of two CML D-type latches. Figure A.1 shows such a latch.



Figure A.1: SCL D-type latch

According to the theory, the optimising method can be summarised in two simple steps [39]. Firstly, run a DC simulation over the operating range of transistors MN1 and MN2 to obtain the mean value of the trans-conductance,  $G_m$ . Secondly, use Equation (4.8) to calculate the estimated optimum value of the load resistors,  $R_{op}$ , i.e.

$$R_{op} \approx \frac{1.60}{G_m} \tag{A.1}$$

This value gives nearly the fastest operating speed when all other parameters are fixed. The maximum operating frequency  $f_{max-op}$  is then (Equation (4.9))

$$f_{max-op} = 0.187 \frac{G_m}{C_2} = \frac{0.298}{R_{op}C_2} \tag{A.2}$$

However, Equations (A.1) and (A.2) ignore the delay due to the capacitance at point S in Figure A.1. If this delay is considered, the results are given by the numerical solution of Equations (4.4) and (4.5), i.e.

$$\begin{cases}
G_v(t_T) = 1 \\
G_v(t_T) = RG_m \left( 1 - \frac{2T_1 - T_2}{T_1 - T_2} e^{-\frac{t_T}{T_1}} + \frac{T_2}{T_1 - T_2} e^{-\frac{t_T}{T_2}} \right)
\end{cases} \tag{A.3}$$

and

$$\begin{cases} T_1 = RC_2 \\ T_2 = \frac{C_1}{G_m} \end{cases}$$

where R is the load resistance,  $C_2$  is the capacitance at the node between the load resistor and the transistor (either MN1 or MN2),  $C_1$  is the capacitance at point S, and  $t_T$  is the toggling time of the latch, which sets the maximum operating frequency:

$$f_{max-op} = \frac{1}{2t_T}$$

Equations (A.1) and (A.2) are actually based on the assumption that  $T_1$  dominates the delay effect, and  $T_2$  is ignored.
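As a sketch of where the constants in Equations (A.1) and (A.2) come from, the limit  $T_2 \to 0$  can be worked through directly from Equation (A.3). This derivation is reconstructed here from (A.3) alone and is not copied from Chapter 4; the shorthand  $x = RG_m$  is introduced only for this illustration. With  $T_2 \to 0$  the coefficient of the first exponential tends to 2 and the second exponential vanishes, so

$$RG_m\left(1 - 2e^{-\frac{t_T}{T_1}}\right) = 1 \quad\Rightarrow\quad t_T = RC_2 \ln\frac{2RG_m}{RG_m - 1} = \frac{C_2}{G_m}\, x \ln\frac{2x}{x-1}$$

Minimising  $t_T$  with respect to  $x$  gives the condition

$$\ln\frac{2x}{x-1} = \frac{1}{x-1} \quad\Rightarrow\quad x = R_{op}G_m \approx 1.60$$

At this point  $x \ln\frac{2x}{x-1} \approx 2.68$ , so that  $f_{max-op} = \frac{1}{2t_T} \approx 0.187\,\frac{G_m}{C_2}$ , in agreement with Equations (A.1) and (A.2).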

Fine-tuning with CAD software is still needed after this optimisation, because many simplifications were applied in deriving the equations above. The method is therefore best suited to estimating design parameters at an early design stage.

### A.2 Implementation

To validate this optimising method, nine  $\div 4$  static FDs were designed and fabricated on Chip RF1 in a standard  $0.35\mu m$  CMOS process (AMS C35 process). Every divider consists of two  $\div 2$  FDs connected in cascade. The investigation focuses on the first-stage  $\div 2$  FDs, which operate at the higher frequency. The second-stage FDs are identical in all circuits, so that every first-stage FD sees the same load capacitance.

The first-stage FDs are all fed with the same current (3mA), so each FD consumes the same power and has nearly the same  $C_1$  and  $C_2$ . The only difference amongst the first-stage FDs is the load resistance R: nine different values were chosen, one for each divider. These values cover a wide range so that the effect of R on the maximum operating frequency can be observed. If Equation (A.1) is valid, the FD with the optimum load resistance will have the highest operating frequency.

Based on (A.1), the optimum value of R is  $0.726k\Omega$ . If  $T_2$  in (A.3) is not ignored, the numerical solution of optimum R is  $0.729k\Omega$ .
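The numerical solution of Equation (A.3) can be illustrated with a short script. This is only a sketch under stated assumptions, not the procedure actually used for Chip RF1: the function names, the bisection solver and the grid sweep are choices made for this example, the capacitance values are hypothetical, and  $G_m$  is inferred from  $R_{op} = 0.726k\Omega$  through Equation (A.1).

```python
import math


def gain(t, R, Gm, C1, C2):
    """G_v(t) from Equation (A.3), with T1 = R*C2 and T2 = C1/Gm."""
    T1 = R * C2
    T2 = C1 / Gm
    a = (2.0 * T1 - T2) / (T1 - T2)
    b = T2 / (T1 - T2)
    return R * Gm * (1.0 - a * math.exp(-t / T1) + b * math.exp(-t / T2))


def toggle_time(R, Gm, C1, C2):
    """Solve G_v(t_T) = 1 for t_T by bisection (assumes R*Gm > 1 and T2 < T1)."""
    if R * Gm <= 1.0:
        raise ValueError("R*Gm must exceed 1, otherwise G_v never reaches 1")
    lo, hi = 0.0, R * C2
    while gain(hi, R, Gm, C1, C2) < 1.0:  # grow the bracket until G_v(hi) >= 1
        hi *= 2.0
    for _ in range(100):  # halve the bracket until it is numerically exact
        mid = 0.5 * (lo + hi)
        if gain(mid, R, Gm, C1, C2) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def optimum_load(Gm, C1, C2, steps=4000):
    """Sweep R from 1.05/Gm to 4/Gm; return (R, f) with the highest f = 1/(2*t_T).

    Assumes C1 < C2 so that T2 < T1 over the whole sweep.
    """
    best_R, best_f = None, 0.0
    for i in range(steps + 1):
        R = (1.05 + 2.95 * i / steps) / Gm
        f = 1.0 / (2.0 * toggle_time(R, Gm, C1, C2))
        if f > best_f:
            best_R, best_f = R, f
    return best_R, best_f


if __name__ == "__main__":
    Gm = 1.60 / 726.0  # S, inferred from R_op = 0.726 kOhm via Equation (A.1)
    C2 = 100e-15       # F, hypothetical value for illustration only
    C1 = 10e-15        # F, hypothetical value for illustration only
    R, f = optimum_load(Gm, C1, C2)
    print(f"optimum R = {R:.1f} ohm, f_max-op = {f / 1e9:.2f} GHz")
```

When  $C_1$  is made negligible, the sweep returns  $RG_m \approx 1.60$  and  $f_{max-op}C_2/G_m \approx 0.187$ , i.e. the constants of Equations (A.1) and (A.2); a finite  $C_1$  shifts the optimum by an amount that depends on the actual capacitances.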

The designed load resistances of the nine first-stage FDs range from  $0.51k\Omega$  to  $1.25k\Omega$ . One of them has a load resistance of  $0.73k\Omega$  and should therefore be the fastest FD if the proposed optimising method is correct. Figure A.2 shows the die photos. The left photo (Figure A.2(a)) shows all circuits, comprising the nine  $\div 4$  FDs and one  $\div 2$  FD. The  $\div 2$  FD is used to characterise the second-stage  $\div 2$  FDs in the  $\div 4$  FDs: it contains only the second-stage FD and the output buffer, without the first-stage FD. The right photo (Figure A.2(b)) shows one  $\div 4$  FD under test, contacted by three probes and two needles.

### A.3 Simulation and measurement results

The simulation and measurement results of RF1 are presented in Sub-Section 4.3.2, page 45, the paragraphs after "Validation and trade-off".



Figure A.2: Die photos of the divide-by-four frequency dividers

Bibliography and Index

# Bibliography

- [1] M. Clark, S. Sharples and M. Somekh, 'Non-contact acoustic microscopy', Measurement Science & Technology, Vol. 11, Issue 12, 2000, pp.1792-1801.
- [2] M. Clark, S. D. Sharples and M. G. Somekh, 'Fast, All-Optical Rayleigh Wave Microscope: Imaging on Isotropic and Anisotropic Materials', Ultrasonics, Ferroelectrics and Frequency Control, IEEE Transactions on, Vol. 47, Issue 1, Jan. 2000, pp.65-74.
- [3] S. D. Sharples, M. Clark and M. G. Somekh, 'All-optical adaptive scanning acoustic microscope', Ultrasonics, Vol. 41, Issue 4, June 2003, pp.295-299.
- [4] S. D. Sharples, 'All-Optical Scanning Acoustic Microscope' Ph.D. thesis, the University of Nottingham, 2003.
- [5] S. D. Sharples, M. Clark and M. Somekh, 'Surface acoustic wavefront sensor using custom optics', Ultrasonics, Vol. 42, Issue 1-9, Apr. 2004, pp.647-651.
- [6] J.-P. Monchalin, 'Optical detection of ultrasound', IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control, Vol. 33, Issue 5, 1986, pp.485-499.
- [7] J.-P. Monchalin, 'Heterodyne interferometric laser probe to measure continuous ultrasonic displacements', Review of Scientific Instruments, Vol. 56, Issue 4, 1985, pp.543-546.

- [8] M. Klein, B. Pouet and P. Mitchell, 'Photo-emf detector enables laser ultrasonic receiver', Laser Focus World, Vol. 36, Issue 8, 2000, pp.25-27.

- [9] O. B. Wright and K. Kawashima, 'Ultrasonic Detection from Picosecond Surface Vibrations: Application to Interfacial Layer Detection', Jpn. J. Appl. Phys, Vol. 32, 1993, pp.2452-2454.
- [10] M. Li, B. Hayes-Gill, M. Clark et al., '5 GHz front-end for active pixel applications in standard 0.35μm CMOS', Proceedings of SPIE The International Society for Optical Engineering, Jan. 2007.
- [11] M. Li, '5 GHz Optical Front End in 0.35μm CMOS' Ph.D. thesis, The University of Nottingham, Oct. 2007.
- [12] H. de Bellescize, 'La réception synchrone', L'Onde Électrique, Vol. 11, June 1932, pp.230-240.
- [13] M.-F. Lai and M. Nakano, 'Special section on Phase-Locked Loop Techniques', IEEE Transactions on Industrial Electronics, Vol. 43, Issue 6, 1996, pp.607-608.
- [14] A. Blanchard, 'Phase-Locked Loops: Applications to Coherent Receiver Design', New York: Wiley, 1976.
- [15] F. M. Gardner, 'Phase Lock Techniques', 2nd Edition, New York: Wiley, 1979.
- [16] W. C. Lindsey and C. M. Chie, 'A survey of digital phase-locked loops', Proc. IEEE, Vol. 69, 1981, pp.410-431.
- [17] E. Wilson, 'Electronic Communication Technology', London: Prentice-Hall International, 1989.
- [18] G.-C. Hsieh and J. C. Hung, 'Phase-Locked Loop Techniques A survey', IEEE Transactions on Industrial Electronics, Vol. 43, Issue 6, 1996, pp.609-615.
- [19] S. G. Burns and P. R. Bond, 'Principles of Electronic Circuits', 2nd Edition, Boston: PWS Pub. Co., 1997.

- [20] B. Razavi, 'Design of Integrated Circuits for Optical Communications', International Edition, McGraw-Hill Companies, Inc., 2003.

- [21] B. Razavi, 'RF Microelectronics', Prentice Hall PTR, 1998.
- [22] C. A. Sharpe, 'A 3-State Phase Detector Can Improve Your Next PLL Design', EDN, 20 Sept 1976, pp.55-59.
- [23] T. H. Lee, 'The Design of CMOS Radio-Frequency Integrated Circuits', 2nd Edition, Cambridge University Press, 2004.
- [24] Z. Tang, 'LC Voltage-Controlled Oscillators' Ph.D. thesis, Fudan University, China, Spring 2004.
- [25] B. Razavi, 'Challenges in the design of frequency synthesizers for wireless applications', Custom Integrated Circuits Conference, 1997., Proceedings of the IEEE 1997, 5-8 May 1997, pp.395-402.
- [26] H.-D. Wohlmuth and D. Kehrer, 'A high sensitivity static 2:1 frequency divider up to 27GHz in 120nm CMOS', Solid-State Circuits Conference, 2002. ESSCIRC 2002. Proceedings of the 28th European, 24-26 Sept. 2002, pp.823-826.
- [27] W. Fang, A. Brunnschweiler and P. Ashburn, 'An analytical maximum toggle frequency expression and its application to optimizing high-speed ECL frequency dividers', Solid-State Circuits, IEEE Journal of, Vol. 25, Issue 4, Aug. 1990, pp.920-931.
- [28] T. Collins, V. Manan and S. Long, 'Design analysis and circuit enhancements for high-speed bipolar flip-flops', Solid-State Circuits, IEEE Journal of, Vol. 40, Issue 5, 2005, pp.1166-1174.
- [29] J. Lu, L. Tian, H. Chen et al., 'Design techniques of CMOS SCL circuits for Gb/s application', ASIC, 2001. Proceedings. 4th International Conference on, 23-25 Oct 2001, pp.559-562.

- [30] M. Alioto and G. Palumbo, 'Design Strategies for Source Coupled Logic Gates', Circuits and Systems I: Fundamental Theory and Applications, IEEE Trans. on, Vol. 50, Issue 5, 2003, pp.640-654.

- [31] G. Chien, 'Low-Noise Local Oscillator Design Techniques using a DLL-based Frequency Multiplier for Wireless Applications' Ph.D. thesis, University of California, Berkeley, Spring 2000.
- [32] A. Rofougaran, J. Rael, M. Rofougaran and A. Abidi, 'A 900 MHz CMOS LC-oscillator with quadrature outputs', Solid-State Circuits Conference, 1996. Digest of Technical Papers. 43rd ISSCC., 1996 IEEE International, 08-10 Feb. 1996, pp.392-393.
- [33] B. Razavi, 'A 1.8-GHz CMOS voltage-controlled oscillator', Solid-State Circuits Conference, 1997. Digest of Technical Papers. 44th ISSCC., 1997 IEEE International, 6-8 Feb. 1997, pp.388-389.
- [34] A. Rofougaran, G. Chang, J. J. Rael et al., 'A single-chip 900-MHz spread-spectrum wireless transceiver in 1-μm CMOS-Part I: Architecture and transmitter design', Solid-State Circuits, IEEE Journal of, Vol. 33, Issue 4, Apr. 1998, pp.515-534.
- [35] Austria Micro Systems,  $0.35\mu m$  CMOS C35 RF SPICE Models, Rev. 5.0, Nov. 2005.
- [36] E. Bogatin, 'Signal Integrity: Simplified', Simplified Chinese edition, Pearson Education Asia Ltd. and Publishing House of Electronics Industry, 2005.
- [37] P. E. Allen and D. R. Holberg, 'CMOS Analog Circuit Design', 2nd Edition, Oxford University Press Inc, USA, 2002.
- [38] Y. Cheng, M. Chan, K. Hui et al., 'BSIM3v3 Manual', Final Version, Dept. of EECS, U. of California, Berkeley, Regents of the University of California, 1995, 1996.
- [39] P. Dong, B. Hayes-Gill and I. Harrison, 'Simple optimising methodology for static frequency divider design', Electronics Letters, Vol. 42, Issue 22, Oct. 2006, pp.1267-1269.

- [40] J. Wong, V. Cheung and H. Luong, 'A 1-V 2.5-mW 5.2-GHz frequency divider in a 0.35μm CMOS process', Solid-State Circuits, IEEE Journal of, Vol. 38, Issue 10, Oct. 2003, pp.1643-1648.

- [41] F. De Miranda, S. Navarro Jr. and W. Van Noije, 'A 4 GHz dual modulus divider-by 32/33 prescaler in 0.35μm CMOS technology', Integrated Circuits and Systems Design, 2004. SBCCI 2004. 17th Symposium on, 7-11 Sept. 2004, pp.94-99.
- [42] L. Romano, S. Levantino, S. Pellerano et al., 'Low jitter design of a 0.35μm CMOS frequency divider operating up to 3GHz', Solid-State Circuits Conference, 2002. ESSCIRC 2002. Proceedings of the 28th European, 24-26 Sept. 2002, pp.611-614.
- [43] Austria Micro Systems,  $0.35\mu m$  CMOS C35 Process Parameters, Rev. 4.0, 2005.
- [44] D. A. Hodges, H. G. Jackson and R. A. Saleh, 'Analysis and Design of Digital Integrated Circuits: In Deep Submicron Technology', 3rd Edition, The McGraw-Hill Companies, Inc., 2003.
- [45] C. S. Vaucher, I. Ferencic, M. Locher et al., 'A family of low-power truly modular programmable dividers in standard 0.35 μm CMOS technology', IEEE Journal of Solid-State Circuits, Vol. 35, Issue 7, July 2000, pp.1039-1045.
- [46] B. Razavi, 'Design of sample-and-hold amplifiers for high-speed low-voltage A/D converters', Custom Integrated Circuits Conference, 1997., Proceedings of the IEEE, 1997, pp.59-66.
- [47] B. Razavi, 'Design of Analog CMOS Integrated Circuits', McGraw-Hill Higher Education, 2001.
- [48] P. Chan, A. Rofougaran, K. Ahmed and A. Abidi, 'A Highly Linear 1-GHz CMOS Downconversion Mixer', European Solid State Circuits Conference, 22-24 Sept 1993, pp.210-213.

- [49] B. Razavi, 'Principles of data conversion system design', IEEE Press, New York, 1995.

- [50] S. Chandrasekaran and W. C. Black Jr., 'Sub-sampling sigma-delta modulator for baseband processing', Custom Integrated Circuits Conference, 2002. Proceedings of the IEEE 2002, 12-15 May 2002, pp.195-198.
- [51] H. Pekau and J. W. Haslett, 'A 2.4 GHz CMOS sub-sampling mixer with integrated filtering', Solid-State Circuits, IEEE Journal of, Vol. 40, Issue 11, 2005, pp.2159-2166.
- [52] S. Karvonen, T. Riley, S. Kurtti and J. Kostamovaara, 'A quadrature charge-domain sampler with embedded FIR and IIR filtering functions', Solid-State Circuits, IEEE Journal of, Vol. 41, Issue 2, 2006, pp.507-515.
- [53] A. S. Sedra and K. C. Smith, 'Microelectronic circuits', 5th Edition, Oxford University Press, 2003.
- [54] S. Karvonen, T. Riley and J. Kostamovaara, 'A low noise quadrature sub-sampling mixer', Circuits and Systems, 2001. ISCAS 2001. The 2001 IEEE International Symposium on, 6-9 May 2001, pp.790-793.
- [55] R. T. Stefani, B. Shahian, C. J. Savant Jr. and G. H. Hostetter, 'Design of Feedback Control Systems', 4th Edition, Oxford University Press, 2002.
- [56] L. Thede, 'Practical Analog and Digital Filter Design', Artech House, Inc., 2004.
- [57] P. R. Gray, P. J. Hurst, S. H. Lewis and R. G. Meyer, 'Analysis and Design of Analog Integrated Circuits', 4th Edition, John Wiley & Sons, Inc., 2001.
- [58] H. Tijms, 'Understanding Probability: Chance Rules in Everyday Life', Cambridge University Press, 2004.
- [59] M. R. Sayeh and H. R. Bilger, 'Flicker Noise in Frequency Fluctuations of Lasers', Phys. Rev. Lett., Vol. 55, Issue 7, Aug. 1985, pp.700-702.

- [60] M. Li, B. Hayes-Gill and I. Harrison, '6 GHz transimpedance amplifier for optical sensing system in low-cost 0.35-μm CMOS', Electronics Letters, Vol. 42, Issue 22, Oct. 2006, pp.1278-1279.

- [61] T. K. Woodward and A. V. Krishnamoorthy, '1-Gb/s Integrated Optical Detectors and Receivers in Commercial CMOS Technologies', IEEE Journal of Selected Topics in Quantum Electronics, Vol. 5, Issue 2, Mar/Apr 1999.
- [62] P. Dong, R. Smith, B. Hayes-Gill and I. Harrison, '10.2GSample/s DAQ system for Optical Scanning Acoustic Microscopy using 0.35μm CMOS Technology', IET Seminar on RF and Microwave IC Design, 2008.

## Index

Absolute-Phase Clock, 70

Aperture Window, 117

Aperture Window Effect, 117

Calibration Matrix, 128

Clock Type, 70

CML, 21

Continuous-Wave Laser, 172

CW Laser, 172

DAQ, 5

Dark Output, 131

DC-Op, 131

DDS, 98

DDU, 67

Delay-Locked Loop, 22

DFT, 126

Digital Delay Unit, 72

DLL, 22

Double Differential Switch, 98

DSP, 133

Fast Fourier Transform, 135

FD, 21

FFT, 135

Frequency Divider, 21

IDFT, 123

IFFT, 135

Inverse Fast Fourier Transform, 135

LFA, 107

Linearising Feedback Amplifier, 107

Modulo Add, 126

O-SAM, 2

OpAmp, 107

Output Group, 124

PFD, 17

Phase-Locked Loop, 12

Phase/Frequency Detector, 17

PLL, 12

Presenting Time, 102, 143

Relative-Phase Clock, 70

RMS, 54, 143

Root Mean Square, 54

Sample, Front-End, 102

Sample, Holding, 102

Sample, Linearised Holding, 102, 108

Sample, Target, 102

Sample-and-Hold Amplifier, 86

SHA, 86

Spacial-first scanning, 157

spur frequency, 20

Sub-Sampling SHA, 88, 93

Switch Box, 70

TCA, 97

Timing-first scanning, 157

Trans-Conductance Amplifier, 97

Virtual Pulse, 117