Research Abstracts

An Energy-Efficient, Fully Integrated 1920x1080 H.265/HEVC Decoder with eDRAM
Ultra-Low-Power, High-Sensitivity Secure Wake-Up Transceiver for the Internet of Things
Energy-Efficient Security-Acceleration Core for the Internet of Things
An Offset-Cancelling, Four-Phase Voltage Sense Amplifier for Resistive Memories in 14-nm CMOS
A Pipelined ADC with Low-Gain, Low-Bandwidth Op-amps
Data-Dependent SAR ADC
High-Performance GaN HEMT Track-and-Hold Sampling Circuits with Digital Post-Correction
A CMOS Flash ADC for a GaN/CMOS Hybrid Continuous-Time ΔΣ Modulator
Terahertz Beam-Steering Imager Using a Scalable 2D-Coupled Architecture and Multi-Functional Heterodyne Pixels
Broadband Inter-Chip Link Using a Terahertz Wave on a Dielectric Waveguide
Bilateral Dual-Frequency-Combs-Based 220-to-320GHz Spectrometer in 65-nm CMOS for Gas Sensing
High-Stability, Miniature Terahertz Molecular Clock on CMOS
2D Quasi-Optical Power Combining at 1 THz: An 80-μW Source in Silicon
uVIO: Energy-Efficient Accelerator for Microdrone Navigation in GPS-Denied Environments
Depth Estimation for Low-Power Time-of-Flight Imaging
Phase-Shift Impedance Modulation for Fast Response Dynamic Impedance Matching
Dynamic Matching System for Radio-Frequency Plasma Generation
Magnetic Field Sensing for Smart Industrial Infrastructure

An Energy-Efficient, Fully Integrated 1920x1080 H.265/HEVC Decoder with eDRAM

M. Tikekar, V. Sze, A. P. Chandrakasan
Sponsorship: TSMC University Shuttle Program, NSF

Video playback on mobile devices has become extremely popular in recent years to the extent that video accounts for 55% of all mobile data traffic. In response, new video coding standards that efficiently compress video without sacrificing quality have been developed. H.265/High Efficiency Video Coding (HEVC), the state-of-the-art video-coding standard, achieves 2x coding efficiency over its predecessor H.264/Advanced Video Coding (AVC). However, it comes at the cost of increased energy and area for video decoding due to computational complexity and memory accesses. In particular, data movement to/from off-chip dynamic random-access memory (DRAM) dominates energy consumption, consuming 2.8x-6x more energy per pixel than the processing itself. Accordingly, the focus of this work is to minimize the energy cost of data movement for an H.265/HEVC decoder.

Wearable devices such as smartwatches, smart glasses, and fitness trackers have stringent budgets for power (< 100mW) and form factor. Previous work has focused on video specifications of 4K and beyond, which are better suited for devices with larger power budgets like smartphones, tablets, set-top boxes, etc. The large frames need to be stored in DRAM, which dominates the overall system power. For example, our previous chip uses DDR3 memory which has a background power of 92mW. Embedded DRAM (eDRAM) helps reduce system power and physical footprint but requires more frequent refreshes than DRAM.

In this work, we propose several techniques to reduce the amount of data movement and optimize for overall system energy. The energy cost of data movement consists of active power for reading from/writing to memory and standby power for retaining memory contents. Through optimized data storage and movement, both active and standby energy cost can be minimized. We demonstrate the proposed techniques in a fully integrated H.265/HEVC decoder that does not require any external components.

Ultra-Low-Power, High-Sensitivity Secure Wake-Up Transceiver for the Internet of Things

The number of nanopower Internet-of-Things (IoT) devices grows every year and is estimated to exceed 70 billion within just ten years. Such devices operate at short range for personal health monitoring and home automation, and at longer range for industrial monitoring. Unfortunately, all of these nodes spend a large portion of their energy on wireless communication. In this work, we propose protocol optimizations for sensor-node-driven communication. For base-station-driven communication, we propose to reduce power through an ultra-low-power wake-up receiver with optimizations in both the protocols and the circuits.

Wireless protocols such as Bluetooth Low Energy (BLE) are optimized for short packets with small preambles and header sizes. However, low-duty-cycle performance in the default connected mode of operation is limited by periodic beacons and by the requirement that the node absorb the timing uncertainties. An analysis of a commercial BLE radio (Figure 1) shows that protocol overhead keeps the average power from dropping below tens of microwatts. At that consumption, a conventional coin cell battery would not last several years. Fortunately, the standby power is much lower, and an optimized protocol can be developed to complete the wake-up without a full connection while achieving sub-µW consumption. In the wake-up receiver chain, the design can exploit the protocol trade-offs as well as the wake-up scheme to maintain the system specifications. Additionally, the tremendous growth of IoT devices allows open communication among all sorts of devices. With such a large amount of data flowing through the network, security becomes a critical issue. Hence, we propose the wake-up transceiver system shown in Figure 2, where a transmitter closes the loop by providing small amounts of data sporadically upon request and creates a two-way communication channel for secure wake-ups and transmissions.
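
The battery-life argument above can be made concrete with a back-of-the-envelope estimate. The sketch below is illustrative only: the coin-cell capacity (about 225 mAh at 3 V, typical of a CR2032-class cell) and the two power levels are assumptions chosen to match the orders of magnitude in the text, not measured values, and self-discharge is ignored.

```python
# Rough battery-life estimate for an always-listening radio.
# ASSUMPTIONS (not from the abstract): CR2032-class cell, ~225 mAh at 3 V,
# no self-discharge, constant average power.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def battery_life_years(avg_power_w, capacity_mah=225.0, voltage_v=3.0):
    """Return the ideal lifetime in years for a given average power draw."""
    energy_j = capacity_mah * 1e-3 * 3600 * voltage_v  # mAh -> joules
    return energy_j / avg_power_w / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Illustrative operating points: "tens of microwatts" vs. "sub-microwatt".
    for label, power_w in [("connected mode, 30 uW", 30e-6),
                           ("wake-up receiver, 1 uW", 1e-6)]:
        print(f"{label}: {battery_life_years(power_w):.1f} years (ideal)")
```

Under these assumptions the tens-of-microwatts case lasts under three years, while at 1 µW the radio stops being the limiting factor and effects such as cell self-discharge would dominate instead.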

Energy-Efficient Security-Acceleration Core for the Internet of Things

The Internet of Things (IoT) has introduced a vision of an Internet where all computing and sensing devices are interconnected. Digitally connected devices are encroaching on every aspect of our lives, including our homes, cars, offices, and even our bodies. Researchers estimate that there will be over 50 billion wirelessly connected devices by 2020. On one hand, the IoT enables fundamentally new applications, but on the other, these devices are attractive targets for cyber attackers, thus making IoT security a major concern. According to a security survey from 2016, only 10% of IoT products have adequate security features.

Most commercial IoT transceivers either have no security implementations in hardware or only support symmetric-key primitives like the Advanced Encryption Standard (AES). To achieve end-to-end security in IoT networks, public-key algorithms like elliptic curve cryptography (ECC) are indispensable. Software implementations of these algorithms involve significant computational costs, and the power consumption presents a bottleneck in resource-constrained environments. In this work, we propose to design low-power security-acceleration hardware that interfaces with a standard microprocessor and supports ECC for key exchange and digital signatures, along with standard cryptographic components like AES (Figure 1), thus alleviating the security and efficiency trade-off observed in embedded devices.
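
To illustrate what an ECC key exchange computes, the following is a minimal sketch of elliptic-curve Diffie-Hellman on a textbook-sized curve. Everything in it is an assumption for illustration: the curve $y^2 = x^3 + 2x + 2$ over GF(17), the base point, and the private keys are toy values, and the code is deliberately insecure. It does not represent the proposed hardware, which would use standardized curves of cryptographic size.

```python
# Toy elliptic-curve Diffie-Hellman. INSECURE: the tiny field is for illustration only.
P_MOD = 17    # field prime (hypothetical toy value)
A_COEF = 2    # curve: y^2 = x^3 + 2x + 2 over GF(17)
G = (5, 1)    # base point; 1^2 == 5^3 + 2*5 + 2 (mod 17)


def point_add(p, q):
    """Add two curve points; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) is the point at infinity
    if p == q:
        slope = (3 * x1 * x1 + A_COEF) * pow(2 * y1, -1, P_MOD)   # tangent
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)     # chord
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point with the double-and-add method."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


if __name__ == "__main__":
    alice_private, bob_private = 3, 7            # hypothetical secrets
    alice_public = scalar_mult(alice_private, G)
    bob_public = scalar_mult(bob_private, G)
    # Each side combines its own secret with the other's public point.
    print(scalar_mult(alice_private, bob_public) ==
          scalar_mult(bob_private, alice_public))
```

The scalar multiplication loop is the costly part in software, which is why it is a natural target for a hardware accelerator.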

Our work also focuses on optimizing network security protocols for efficient implementation in embedded devices. Standard implementations of these protocols tend to have a large communication overhead, which becomes an additional concern for battery-powered or energy-harvesting IoT devices. Therefore, our proposed hardware can not only secure private data using low power cryptographic computations, but also reduce energy consumption of the RF transceiver (Figure 2).

An Offset-Cancelling, Four-Phase Voltage Sense Amplifier for Resistive Memories in 14-nm CMOS

Resistive memories are a class of non-volatile embedded memories that have the potential to be a universal memory technology by providing the density of dynamic random access memory (DRAM), the speed of static random access memory (SRAM), and the non-volatility of Flash. Resistive memories typically use a 1T-1R structure, i.e., an access transistor in series with a resistive storage device such as a magnetic tunnel junction (MTJ) in spin-transfer-torque random access memory (STT-RAM). There are two resistance states: high-resistance ($R_H$) and low-resistance ($R_L$), where $R_H = (1+\mathrm{TMR})R_L$. For MTJ devices the tunnel-magneto-resistance ratio (TMR) is typically only around 100%-200%, depending on technology, temperature, etc., which makes it challenging to distinguish the two resistance states correctly. In addition, increasing variations in sub-32-nm complementary metal-oxide semiconductor (CMOS) processes, along with variation in MTJ resistance, make it challenging to design a read-sensing scheme that achieves low read disturbance and high yield.

In this work, we try to address these issues to design a robust read-sensing circuit for resistive memories that would work for yield $>5.5\sigma$ and reduce power consumption. The robustness to variations is achieved mainly in two ways. First, due to the pseudo-differential nature of the sensing scheme (comparing data to two references ‘ref1’ and ‘ref0’), we get 2x the signal margin as compared to a single reference scheme. Second, the offset-cancellation of the sense-amplifier (SA) makes it more suitable to tolerate variation from the array due to MTJ resistance variation. The proposed SA was implemented in a 14-nm CMOS process. It achieved correct operation with 20mV input and a DC offset-$\sigma$ of 1.9mV. This shows that the SA can tolerate large variations from the memory array to achieve a high yield. On the other hand, due to the offset-cancellation technique, the SA can be designed using small devices to achieve low area and power. Hence it benefits from CMOS technology scaling.
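
As a rough numerical reading of the figures above, the sketch below evaluates $R_H = (1+\mathrm{TMR})R_L$ for the quoted TMR range and expresses the 20-mV input as a multiple of the 1.9-mV offset sigma. The low-state resistance of 5 kΩ is a hypothetical value, and treating the input-to-offset ratio as a sigma margin is a simplification that ignores array variation and noise; it is not the authors' yield analysis.

```python
def high_resistance(r_low_ohm, tmr):
    """R_H = (1 + TMR) * R_L, with TMR given as a fraction (1.0 == 100%)."""
    return (1.0 + tmr) * r_low_ohm


def offset_margin_sigma(input_v, offset_sigma_v):
    """Input signal expressed in units of the sense-amplifier offset sigma."""
    return input_v / offset_sigma_v


if __name__ == "__main__":
    r_low = 5e3  # hypothetical low-state resistance in ohms
    for tmr in (1.0, 2.0):  # the 100%-200% range quoted in the text
        print(f"TMR={tmr:.0%}: R_H = {high_resistance(r_low, tmr) / 1e3:.0f} kOhm")
    margin = offset_margin_sigma(20e-3, 1.9e-3)
    print(f"20 mV input is about {margin:.1f} offset sigmas")
```

With these numbers the sense-amplifier offset alone leaves headroom above the 5.5-sigma target, which is consistent with the claim that the remaining margin can absorb variation from the memory array.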

A Pipelined ADC with Low-Gain, Low-Bandwidth Op-amps

Among various analog-to-digital converter (ADC) architectures, pipelined ADCs are well suited for applications that need medium to high resolution at sampling rates above hundreds of megahertz. To obtain good linearity, conventional pipelined ADCs must minimize the charge-transfer error of the multiplying digital-to-analog converter (MDAC) by employing high-gain, fast-settling op-amps (Figure 1). However, such op-amp design has become increasingly difficult due to the reduced intrinsic gain and voltage headroom in fine-line complementary metal-oxide semiconductor (CMOS) technologies. With low-intrinsic-gain devices, either a gain-boosting technique or a multi-stage topology is necessary for the op-amp to meet the gain requirement. In addition, a lower supply voltage shrinks the signal swing, so a larger capacitance is needed to maintain the same signal-to-noise ratio (SNR). As a result, the power consumption of the op-amps becomes prohibitively large.
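
The link between supply voltage and capacitance can be sketched with the standard thermal-noise model of a sampling capacitor. The snippet assumes the SNR is limited only by kT/C noise and that the input is a full-scale sine of the given amplitude; the 70-dB target and the amplitudes are illustrative assumptions, not values from this project.

```python
K_BOLTZMANN = 1.380649e-23  # J/K


def required_capacitance(snr_db, amplitude_v, temperature_k=300.0):
    """Sampling capacitance needed for a kT/C-limited SNR.

    Signal power of a sine is A^2 / 2 and noise power is kT / C,
    so C = 2 * SNR * k * T / A^2.
    """
    snr = 10 ** (snr_db / 10)
    return 2 * snr * K_BOLTZMANN * temperature_k / amplitude_v ** 2


if __name__ == "__main__":
    for amplitude in (1.0, 0.5):  # halving the swing, e.g. with a lower supply
        c = required_capacitance(70.0, amplitude)
        print(f"A = {amplitude} V -> C = {c * 1e15:.0f} fF")
```

Because the capacitance scales with the inverse square of the swing, halving the swing quadruples the capacitance that the op-amp must drive, which is the origin of the power penalty described above.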

To address this issue, numerous techniques have been proposed. Digital calibration is one popular way to use low-performance op-amps in pipelined ADCs. By taking advantage of digital computation, op-amp-induced MDAC charge-transfer errors are measured and removed in the digital domain. More specifically, dither injection or the least-mean-square algorithm has been used to model the MDAC nonlinearity and digitally calibrate the error. In the same context of relaxing op-amp performance requirements, novel analog circuit techniques have also been developed. The virtual-ground-reference technique and correlated level shifting improve MDAC performance without using high-performance op-amps. In some approaches, op-amps are completely replaced by other circuits that are more amenable to scaled CMOS technologies. A zero-crossing-based pipelined ADC is a representative example of such approaches.

In this project, we propose a digital calibration scheme for op-amp-based pipelined ADCs. To validate the functionality of the proposed calibration technique, a proof-of-concept ADC has been designed in 28-nm CMOS technology and is currently being fabricated.

Data-Dependent SAR ADC

This work on successive-approximation-register (SAR) analog-to-digital converters (ADCs) (Figure 1) aims at improving data-dependent savings in energy in key components of a SAR ADC by leveraging the information available from the signal’s immediate past samples and the signal type. The dominant energy consuming components are the DAC and comparator.

The energy spent in the DAC per sample conversion depends on the DAC topology and on the sequence of steps taken during successive approximation. The energy spent in the comparator is directly proportional to the number of comparisons per sample conversion. A design with data-dependent savings takes advantage of the correlation between successive samples to complete the conversion in fewer bit-cycles and to operate the DAC energy-efficiently.

Previous work achieves data-dependent savings by performing least-significant-bit (LSB) first successive approximation to convert an input sample. By starting from the previous sample and proceeding LSB-first, the algorithm converges in fewer cycles than conventional most-significant-bit (MSB) first SAR conversion. Fewer cycles translate into energy savings in the comparator and DAC. Another work developed successive approximation algorithms that find a sub-range of the full range in a few cycles before carrying out a binary search within this small range. In this work, we investigate a SAR ADC with a search algorithm based on the statistical characteristics of the signal for optimum energy expenditure.
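
The cycle-count advantage of starting from the previous sample can be illustrated with a small behavioral model. This is a simplified sketch written for this summary, not the algorithm of the cited work: the input is an ideal integer code, the comparator is noiseless, and the LSB-first search simply doubles its step until the input is bracketed and then bisects. The function names are hypothetical.

```python
def msb_first_sar(x, n_bits):
    """Conventional SAR: binary search from the MSB. Returns (code, comparisons)."""
    code, comparisons = 0, 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        comparisons += 1
        if x >= trial:
            code = trial
    return code, comparisons


def lsb_first_sar(x, prev_code, n_bits):
    """Search outward from the previous code, then bisect the bracket."""
    max_code = (1 << n_bits) - 1
    comparisons = 1            # first comparison picks the search direction
    step = 1
    if x >= prev_code:         # search upward; invariant: lo <= x < hi
        lo, hi = prev_code, max_code + 1
        while lo + step <= max_code:
            comparisons += 1
            if x >= lo + step:
                lo += step
                step *= 2
            else:
                hi = lo + step
                break
    else:                      # search downward; invariant: lo <= x < hi
        lo, hi = 0, prev_code
        while hi - step > 0:
            comparisons += 1
            if x >= hi - step:
                lo = hi - step
                break
            hi -= step
            step *= 2
    while hi - lo > 1:         # binary search inside the bracket
        mid = (lo + hi) // 2
        comparisons += 1
        if x >= mid:
            lo = mid
        else:
            hi = mid
    return lo, comparisons


if __name__ == "__main__":
    print(msb_first_sar(512, 10))        # always n_bits comparisons
    print(lsb_first_sar(512, 511, 10))   # few comparisons for a small change
    print(lsb_first_sar(1023, 0, 10))    # a large jump costs more
```

In this model a one-code change resolves in far fewer comparisons than the fixed ten of the MSB-first search, while a full-scale jump costs slightly more, which is why the benefit depends on the signal statistics.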

**Figure 1:** SAR ADC.

High-Performance GaN HEMT Track-and-Hold Sampling Circuits with Digital Post-Correction

The performance limit of integrated systems for emerging high-performance applications such as ultrafine imaging and sensing, advanced wireless communication, and data server optical networks often comes from analog-to-digital converters (ADCs) whose performance is, in turn, limited at least partly by a track-and-hold sampling circuit (THSC). The low supply voltage of deeply scaled complementary metal-oxide semiconductor (CMOS) transistors determines the THSC input signal range, therefore becoming a fundamental barrier to the signal-to-noise ratio (SNR) of CMOS circuits.

This research ultimately aims to design ultra high-performance THSCs in GaN-on-Si technology, which monolithically integrates GaN high-electron-mobility transistors (HEMTs) with Si-CMOS transistors. Operating GaN HEMTs at a high voltage (>30 V) allows a very large input swing (>16 V) and provides performance beyond the limit of CMOS THSCs. As a first step, we designed two GaN HEMT THSCs. The first THSC was fabricated in a commercial GaN foundry technology on SiC substrate, providing 98-dB SNR at 200-MS/s (Figure 1). The second THSC design was fabricated in a GaN technology that was developed at MTL on Si substrate, which operates at 1 GS/s thanks to a higher current-gain cutoff frequency $f_T$ and external gate-bootstrapping clock (Figure 2).

While these two GaN THSCs achieved very high SNR at a given input frequency, they suffered from nonlinearity. We characterized how the static nonlinearity and dynamic memory effects of GaN HEMT THSCs affect the sampled output; we observed that the GaN HEMT dynamic on-resistance does not significantly degrade the THSC linearity because the capacitive load does not suffer from on-resistance variation on the sampled voltage. We identified that most of the dynamic nonlinearity originates from the GaN HEMT source-follower buffers for gate-bootstrapping sampling clock generation. Although dynamic nonlinearity correction techniques are mature with RF power amplifiers (PAs) and improve PA linearity typically by 20-40 dB depending on signal bandwidth and modeling accuracy, these pre-distortion techniques cannot be directly applied to THSCs. To overcome this challenge, we are developing a digital post-correction technique, which will demonstrate improved linearity of GaN HEMT THSCs without using a dedicated reference ADC.

**Figure 1:** Pseudo-differential two-stage track-and-hold sampling circuit in 0.25-μm GaN HEMT technology on SiC substrate, which demonstrates 200-MS/s 98-dB SNR and 240-MHz track-mode bandwidth.

**Figure 2:** Track-and-hold sampling circuit with external gate-bootstrapping clock in a GaN technology developed at MTL on Si substrate, which provides over 700-MHz track-mode bandwidth and operates at 1 GS/s.

A CMOS Flash ADC for a GaN/CMOS Hybrid Continuous-Time ΔΣ Modulator

High-speed and low-resolution flash analog-to-digital converters (ADCs) are widely used in applications such as 60-GHz receivers, serial links, and high-density disk drive systems, as well as in quantizers in delta-sigma modulators. In this project, we propose a flash ADC with a reduced number of comparators by means of interpolation. One application for such a flash ADC is a hybrid gallium-nitride (GaN) and complementary metal-oxide semiconductor (CMOS) delta-sigma modulator. The GaN first stage exploits the high-voltage property of the GaN while the CMOS backend employs high-speed, low-voltage CMOS. This combination may achieve an unprecedented signal-to-noise ratio (SNR)/bandwidth combination by virtue of its high input signal range and high sampling rate.

One key component of the hybrid modulator is a flash ADC. To take advantage of the high signal-to-thermal-noise ratio of the proposed system, the quantization noise must be made as small as possible. Therefore, a high-speed, 8-bit flash ADC is proposed for this system. Sixty-five comparators are used to achieve the six most significant bits. Sixty-four interpolators are inserted between the comparators to obtain two extra bits. The input capacitance of this design is \( 1/4 \) of the conventional 8-bit flash ADC. Therefore, a higher operating speed can be achieved. We introduce gating logic so that only one interpolator is enabled during operation, which reduces power consumption significantly. A high-speed, low-power comparator with low noise and low offset requirements is a key building block in the design of a flash ADC. We choose a two-stage dynamic comparator because of its fast operation and low power consumption. With the scaling of CMOS technology, the offset voltage of the comparator keeps increasing due to greater transistor mismatch. In this project, we also propose a novel offset compensation method that eliminates the speed problem.
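
The comparator counts above can be checked with simple arithmetic. The sketch assumes that input capacitance scales with the number of comparators connected to the input, which is a first-order approximation introduced here rather than a statement from the project.

```python
def conventional_flash_comparators(n_bits):
    """A conventional n-bit flash ADC uses 2^n - 1 comparators."""
    return 2 ** n_bits - 1


if __name__ == "__main__":
    coarse_comparators = 65      # from the text
    interpolators = 64           # one per interval between adjacent comparators
    levels_per_interpolator = 4  # 2 extra bits each

    conventional = conventional_flash_comparators(8)
    print(f"conventional 8-bit flash: {conventional} comparators")
    print(f"input-load ratio: {coarse_comparators / conventional:.3f}")
    print(f"resolved levels: {interpolators * levels_per_interpolator}")
```

Sixty-five comparators against 255 gives a ratio close to the quoted 1/4, and 64 intervals with four sub-levels each give the 256 levels of an 8-bit converter.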

**Figure 1:** Flash ADC architecture, with 65 comparators and 64 2-bit interpolators.

**Figure 2:** Schematic of the two-stage dynamic comparator.

Terahertz Beam-Steering Imager Using a Scalable 2D-Coupled Architecture and Multi-Functional Heterodyne Pixels

Terahertz (THz) imaging has growing potential in industrial quality control, security screening, and accurate distance mapping for short-range applications such as robots, augmented reality, and virtual reality sensing. With the development of a high-power THz illumination source, future deployment of THz imaging in autonomous vehicles is also possible in order to complement lidar, which fails to operate under foggy and dusty conditions, as well as microwave/mmWave radar, which does not provide high resolution. An ideal imager calls for high spatial/ranging resolution, high sensitivity, fast scanning speed, and low SWaP-C (size, weight, power, and cost). A THz imager formed by a large-scale heterodyne sensing array in a silicon integrated circuit provides the opportunity to meet all of these requirements at once.

In this project, we develop a multi-functional heterodyne pixel design and a scalable array architecture. As shown in Figure 1, a compact electromagnetic structure simultaneously performs voltage-controlled 140-GHz local oscillation, 280-GHz signal receiving, sub-harmonic mixing, and intermediate-frequency (IF) signal extraction. Each pixel consumes 10 mW of power and achieves a sensitivity of 2.9 pW in simulation. The local oscillator (LO) of each pixel is phase-coupled with its neighbors; the whole oscillator array is then stabilized by an on-chip THz phase-locked loop (PLL, Figure 2). This architecture gives excellent array scalability. First, the LO power is evenly distributed and does not degrade as the array grows, as it does in a conventional centralized array. Second, the phase noise of the coupled LO network improves linearly with the array size. The simulated phase noise at 1-MHz frequency offset is -90 dBc/Hz for an 8x8 array and -101 dBc/Hz for a 32x32 array. The chip is also capable of digital beam steering. The simulated response of a steered beam direction for a 32x32 array is shown in Figure 3. A chip prototype with a 10x10 array is currently under fabrication using a 130-nm SiGe BiCMOS process.
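
The scaling claim can be compared with the simulated numbers using the ideal model in which the phase noise of N mutually locked oscillators improves by $10\log_{10}N$ dB. That model is an assumption used here for illustration; the simulated values quoted above include non-idealities that the model ignores.

```python
import math


def ideal_phase_noise_improvement_db(n_small, n_large):
    """Ideal improvement when a coupled array grows from n_small to n_large oscillators."""
    return 10 * math.log10(n_large / n_small)


if __name__ == "__main__":
    ideal = ideal_phase_noise_improvement_db(8 * 8, 32 * 32)
    simulated = -90 - (-101)  # dB, from the quoted simulation results
    print(f"ideal: {ideal:.1f} dB, simulated: {simulated} dB")
```

The ideal model predicts about 12 dB for a 16-fold increase in oscillator count, close to the 11 dB difference between the two simulated arrays.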

Broadband Inter-Chip Link Using a Terahertz Wave on a Dielectric Waveguide

The development of data links between different microchips of an on-board system has encountered a speed bottleneck due to the excessive transmission loss and dispersion of the traditional inter-chip electrical interconnects. Although high-order modulation schemes and sophisticated equalization techniques are normally used to enhance the speed, they also lead to significant power consumption. Silicon photonics provide an alternative path to solve the problem, thanks to the excellent transmission properties of optical fibers; however, the existing solutions are still not fully integrated (e.g., an off-chip laser source is used) and normally require process modification to the mainstream complementary metal-oxide semiconductor (CMOS) technologies.

Here, we aim to use a modulated THz wave to transmit broadband data. As in an optical link, the wave is confined in dielectric waveguides, which offer sufficiently low loss (~0.1 dB/cm) and sufficiently wide bandwidth (>100 GHz) for board-level signal transmission (Figure 1). In commercial CMOS/BiCMOS platforms, we have previously demonstrated high-power THz generation with modulation, frequency conversion, and phase-locking capabilities. In addition, a room-temperature Schottky-barrier diode detector (in 130-nm CMOS) with <10 pW/Hz$^{1/2}$ sensitivity (antenna loss excluded) has been reported. The proposed data link will leverage these techniques to achieve a >100 Gbps/channel transmission rate with <1 pJ/bit energy efficiency. As the first step of this project, we have designed a new broadband chip-to-fiber THz wave coupler. In contrast to previous couplers using off-chip antennas, our THz coupler is implemented entirely in the metal backend of a CMOS process and requires no post-processing (e.g., wafer thinning). The structure is also fully shielded, which prevents THz power leakage into the silicon substrate. Conventional on-chip radiators that use a ground shield are of the resonant type (e.g., patch antennas) and have only <5% bandwidth. In comparison, our design is based on a traveling-wave, tapered structure, which supports broadband transmission. A proof of concept is shown in Figure 1: two on-chip couplers are connected by a 2-cm waveguide made of Rogers 3006 dielectric material. The entire back-to-back setup exhibits only ~1 dB insertion loss across more than 60 GHz of bandwidth (Figure 2).
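
Two quick budget calculations follow from the targets above. The first multiplies data rate by energy per bit to get the link power at the stated bounds. The second splits the measured back-to-back loss between the waveguide and the two couplers, under the assumption (made here, not stated in the text) that the generic ~0.1 dB/cm figure also applies to the 2-cm Rogers 3006 waveguide.

```python
def link_power_w(bit_rate_bps, energy_per_bit_j):
    """Average link power is data rate times energy per bit."""
    return bit_rate_bps * energy_per_bit_j


def per_coupler_loss_db(total_loss_db, waveguide_loss_db_per_cm, length_cm):
    """Split a back-to-back insertion loss evenly between two couplers."""
    return (total_loss_db - waveguide_loss_db_per_cm * length_cm) / 2


if __name__ == "__main__":
    power = link_power_w(100e9, 1e-12)  # 100 Gbps at 1 pJ/bit
    print(f"link power at the target bounds: {power * 1e3:.0f} mW per channel")
    loss = per_coupler_loss_db(1.0, 0.1, 2.0)
    print(f"implied loss per coupler: about {loss:.1f} dB")
```

At the stated bounds a channel would draw on the order of 100 mW, and under the loss assumption each coupler would contribute roughly 0.4 dB.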

Bilateral Dual-Frequency-Combs-Based 220-to-320GHz Spectrometer in 65-nm CMOS for Gas Sensing

C. Wang, R. Han
Sponsorship: NSF, MIT Lincoln Laboratory, CICS

Millimeter-wave/terahertz rotational spectroscopy offers an ultra-wide detection range of gas molecules for chemical and biomedical sensing. The linewidth of the absorption spectrum, limited by the Doppler effect of molecules, has a quality factor near $10^6$, indicating absolute specificity for molecule identification. Therefore, broadband, energy-efficient, and high-precision complementary metal-oxide semiconductor (CMOS) spectrometers are in high demand. Spectrometers using narrow-pulse sources and electromagnetic scattering, although broadband, fail to provide resolution that meets the requirement for absolute specificity. Alternatively, spectrometers using a single tunable tone not only exhibit significant trade-off between bandwidth and performance, but also have a low speed limited by molecular saturation, which sets an upper bound for the single tone power. Given a typical 10-kHz resolution and 1-ms integration time, scanning a 100-GHz bandwidth with a single signal tone takes as long as 3 hours.

We report a rapid, energy-efficient spectrometer based on a bilateral, dual-frequency-comb architecture. This architecture generates and detects multiple comb lines by using cascaded continuous frequency conversion. Due to the narrow band operation of each conversion stage, it significantly increases the performance and energy efficiency of the system. A 220-to-320GHz CMOS spectrometer prototype based on this architecture is demonstrated with a total measured radiated power of 5.2 mW, which is evenly distributed among 10 comb lines. It also serves as a 10-channel heterodyne receiver with a noise figure of 14.6 to 19.5 dB within the 220-320GHz band. Through the bi-directional parallel operation, this spectrometer increases the scanning speed by 20×. In addition, the improved signal-to-noise ratio reduces the integration time at each frequency point, so the scanning speed and energy efficiency are thus further improved.
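
The scan-time figures can be reproduced with a few lines of arithmetic. The single-tone case uses the 10-kHz resolution, 1-ms integration time, and 100-GHz bandwidth quoted earlier. Reading the 20× speedup as 10 comb lines operating in two directions at once is an interpretation made here for illustration.

```python
def scan_time_s(bandwidth_hz, resolution_hz, integration_time_s, parallel_channels=1):
    """Time to step across a band when several frequency points are measured at once."""
    points = bandwidth_hz / resolution_hz
    return points * integration_time_s / parallel_channels


if __name__ == "__main__":
    single_tone = scan_time_s(100e9, 10e3, 1e-3)
    dual_comb = scan_time_s(100e9, 10e3, 1e-3, parallel_channels=10 * 2)
    print(f"single tone: {single_tone / 3600:.2f} hours")
    print(f"20 parallel channels: {dual_comb / 60:.1f} minutes")
```

A single tone needs about 2.8 hours, in line with the roughly 3 hours quoted, and twenty parallel channels bring the same scan down to minutes before any further gain from shorter integration times.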

A portion of the rotational absorption spectrum of acetonitrile (CH$_3$CN) is measured; it agrees with the JPL molecule catalog and demonstrates the absolute specificity of the spectrometer.

**FURTHER READING**

- C. Wang and R. Han, “Rapid and Energy-Efficient Molecular Sensing Using Dual mm-Wave Combs in 65nm CMOS: A 220-to-320GHz Spectrometer with 5.2mW Radiated Power and 14.6-to-19.5dB Noise Figure,” in International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 18–20, 2017.

High-Stability, Miniature Terahertz Molecular Clock on CMOS

C. Wang, J. P. Mawdsley, R. Han
Sponsorship: MIT Lincoln Laboratory, NSF, Texas Instruments

A frequency reference is an essential component in most electronic systems. Applications such as communication, navigation, and reconnaissance in portable platforms or in GPS-denied environments demand highly stable frequency references (Allan deviation of \(1\times10^{-12}\) at \(\tau=1\) s), which exceed the capability of widely used temperature-compensated or oven-controlled crystal oscillators. Currently, chip-scale atomic clocks (CSACs), which rely on hyperfine states of alkali atoms, are used at high cost. In this work, we develop a low-cost miniature clock on a complementary metal-oxide semiconductor (CMOS) chip, which relies on rotational energy state transitions of gaseous molecules in the low-THz range. The spectral line probed by the spectrometer inside our molecular clock has a slightly lower quality factor \(Q\) than that in CSACs but a much higher signal-to-noise ratio (SNR). The short-term frequency stability, determined by the product of \(Q\) and SNR, is therefore improved. The chemically stable gas molecules used in this clock also provide frequency robustness against environmental variations (temperature, pressure, Stark effect, and Zeeman effect), leading to excellent long-term stability.

An experimental molecular clock system has been demonstrated. The spectral line of carbonyl sulfide (OCS) at 279.865 GHz is chosen for its strong absorption and the simple molecular structure of OCS. The system probes the spectral line using frequency-modulated terahertz waves and dynamically adjusts a crystal oscillator to synchronize the output center frequency of the terahertz transmitter with the spectral line. The preliminary measured Allan deviation (frequency stability) at \(\tau=30\) s is \(5\times10^{-11}\). To achieve ultra-low size, weight, power consumption, and cost (SWaP-C), a CMOS molecular clock is under development. On the transmitter side, it adopts a 40-bit ΔΣ fractional PLL with an integrated FSK modulation function. On the receiver side, a homodyne detector along with lock-in detection is used. In simulation, an Allan deviation of \(2\times10^{-12}\) (\(\tau=100\) s) and a DC power consumption of 50 mW are obtained.

**Figure 1:** System architecture and measured Allan deviation of the molecular clock.

2D Quasi-Optical Power Combining at 1 THz: An 80-μW Source in Silicon

Generating a high-power signal in the mid-terahertz (THz) range is a challenge since this frequency band lies beyond the $f_{\text{max}}$ of all silicon-based transistors and below the typical operation range of quantum cascade lasers. Nevertheless, unique properties of waves in this band are highly beneficial for several research fields: (i) high-resolution real-time THz imaging, (ii) vibrational spectroscopy of large bio-molecules, (iii) sub-μm-precision vibrometry, etc. The total radiated power of previously reported radiator chips near 1 THz was below 10 µW. In addition, there is no prior research on quasi-optical power combining in this band, since two challenges remain unsolved: (i) maintaining strong coupling across a large-scale array and (ii) placing radiator structures at high density, with the oscillator, antennas, and couplers integrated within a limited area of $\lambda/2$ by $\lambda/2$. In this work, we have successfully addressed these issues and built a high-power, 2D-scalable 1-THz radiator array in silicon.

Figure 1 shows a conceptual diagram of our design: the chip consists of oscillator units coupled in a 2D fashion. Coupling is achieved by slotlines on the four edges of each unit, which strongly lock the phase (and hence frequency) with its four adjacent counterparts. Inside each unit, there are two self-feeding oscillators, coupled in the differential mode, generating 250-GHz fundamental ($f_0$) oscillation and resultant harmonics. Among all harmonics, only $4f_0$ signal at 1 THz is radiated while other lower harmonics are re-injected into transistors for up-mixing to $4f_0$. This critical function is achieved by forming the standing-wave patterns inside the slotlines of the unit so that near-field radiations at $f_0$ and $3f_0$ are cancelled between horizontally adjacent slots while the radiation at $2f_0$ is cancelled between vertically adjacent slots.

Lastly, the standing waves at $4f_0$ in all horizontal slotlines are in phase, which leads to efficient coherent radiation. Due to the compactness of the design, the spacing between neighboring dipole slot antenna structures is $\lambda_{\text{THz}}/2$ in both horizontal and vertical directions, permitting a large array scale (91 antennas in total) and high radiation directivity (24 dB). Figure 2 shows the setup and results for radiated power and pattern measurements using a zero-biased diode detector. The measured total radiated power is 80 μW, which is 10× higher than the prior state of the art, and the equivalent isotropically radiated power is 13 dBm. The chip was fabricated using IHP 0.13-μm SiGe BiCMOS technology.
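
The quoted power, directivity, and EIRP are mutually consistent, as the short check below shows. It uses the standard relation that EIRP in dBm equals the radiated power in dBm plus the directivity in dB; only the function names are introduced here.

```python
import math


def watts_to_dbm(power_w):
    """Convert power in watts to dBm (decibels relative to 1 mW)."""
    return 10 * math.log10(power_w / 1e-3)


def eirp_dbm(radiated_power_w, directivity_db):
    """EIRP = total radiated power (dBm) + directivity (dB)."""
    return watts_to_dbm(radiated_power_w) + directivity_db


if __name__ == "__main__":
    print(f"radiated power: {watts_to_dbm(80e-6):.1f} dBm")
    print(f"EIRP: {eirp_dbm(80e-6, 24.0):.1f} dBm")
```

An 80-μW source is about -11 dBm, and adding the 24-dB directivity gives the reported 13 dBm.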

**FURTHER READING**

uVIO: Energy-Efficient Accelerator for Microdrone Navigation in GPS-Denied Environments

Drones are becoming increasingly popular; their sales have reportedly tripled in the last year. Microdrones in particular are easily portable and can fit in a pocket. Equipped with multiple sensors such as cameras and inertial measurement units (IMUs), drones are becoming more capable: they can track objects, build accurate 3-D maps, and even avoid obstacles. These capabilities are enabled by powerful onboard computing platforms such as CPUs and GPUs, which consume a lot of energy. The size of these platforms, together with the weight and limited energy of the battery, makes them prohibitive to deploy on microdrones, which can be as small as two inches and operate on very small batteries, as shown in Figure 1.

In this project, we present uVIO, an energy-efficient hardware accelerator for microdrone navigation in GPS-denied environments. The hardware is co-optimized with the algorithm and the drone design to enable a lightweight drone (~100 grams). The microdrone is equipped with a stereo camera and an IMU and can operate in real time without any external communications. The accelerator is designed to be energy-efficient enough to operate on a small battery that fits within the overall payload weight of the drone.

The proposed accelerator implements a robust and optimized visual inertial odometry (VIO) algorithm, shown in Figure 2. It combines visual information and IMU measurements to estimate the position, orientation, and velocity of the microdrone, as well as the 3-D environment, via the Gauss-Newton algorithm. The implementation is highly parallelized and pipelined to achieve real-time performance and energy efficiency. This accelerator gives a microdrone the smart sensing capability that currently exists only in large drones with bulky batteries. It enables numerous applications where large drones do not fit, such as indoor exploration and surveillance, as well as rescue operations in collapsed buildings.
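To illustrate the Gauss-Newton iteration mentioned above, the following minimal sketch solves a deliberately small toy problem: recovering a 2-D position from range measurements to known landmarks. It is not the uVIO pipeline. The function name, the landmark layout, and the range-only measurement model are all assumptions made for illustration, and a real VIO state would also include orientation, velocity, and landmark positions.

```python
import math


def gauss_newton_position(landmarks, ranges, guess, iterations=10):
    """Estimate a 2-D position from ranges to known landmarks (toy example).

    Each residual is r_i = ||p - l_i|| - d_i, and every iteration solves the
    normal equations (J^T J) delta = -J^T r for the update delta.
    The guess must not coincide with a landmark (division by zero).
    """
    x, y = guess
    for _ in range(iterations):
        a11 = a12 = a22 = 0.0  # entries of the symmetric 2x2 matrix J^T J
        g1 = g2 = 0.0          # entries of the vector J^T r
        for (lx, ly), measured in zip(landmarks, ranges):
            dx, dy = x - lx, y - ly
            dist = math.hypot(dx, dy)
            residual = dist - measured
            jx, jy = dx / dist, dy / dist  # Jacobian row of this residual
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            g1 += jx * residual
            g2 += jy * residual
        det = a11 * a22 - a12 * a12
        # Solve the 2x2 system with Cramer's rule.
        x += (-g1 * a22 + g2 * a12) / det
        y += (-a11 * g2 + a12 * g1) / det
    return x, y


# Hypothetical setup: three landmarks and noise-free ranges to the point (1, 1).
landmarks = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
ranges = [math.hypot(1.0 - lx, 1.0 - ly) for lx, ly in landmarks]
estimate = gauss_newton_position(landmarks, ranges, guess=(2.0, 2.0))
print(estimate)
```

Each iteration is dominated by accumulating the small matrix products, which is the kind of regular arithmetic that lends itself to the parallel, pipelined hardware described above.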

**FURTHER READING**

Depth Estimation for Low-Power Time-of-Flight Imaging

J. Noraky, V. Sze
Sponsorship: Analog Devices, Inc.

Depth sensing is used in a variety of applications that range from augmented reality to robotics. One way to measure depth is with a time-of-flight (TOF) camera, which obtains depth by emitting light and measuring its round-trip time. TOF cameras are appealing because they are compact, have no moving parts, and require minimal computation to obtain depth. However, the illumination source of a TOF camera requires a significant amount of power, which limits the operating time of mobile and battery-operated devices. To reduce the power of TOF imaging, we propose an algorithm that leverages images, which can be efficiently collected alongside the TOF camera, to estimate depth maps without continuously illuminating the scene (Figure 1).
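The round-trip relationship described above can be written in a few lines. This is a minimal sketch that assumes an idealized direct measurement of the round-trip time; the function name is hypothetical, and practical TOF cameras may infer this time indirectly.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def depth_from_round_trip(round_trip_s):
    """Convert a round-trip time in seconds into a depth in meters.

    The light travels to the object and back, so the one-way distance is
    half of the total path length.
    """
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


# A 10-ns round trip corresponds to a depth of roughly 1.5 m.
print(depth_from_round_trip(10e-9))
```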

Our technique is best suited for estimating the depth of rigid objects. It uses the temporal correspondences between images to estimate the 3-D motion of objects in the scene, from which a new depth map can be obtained. Our algorithm is computationally simple and produces 640 × 480 depth maps at 30 FPS on a low-power embedded platform. We evaluated our technique on an RGB-D dataset, where it estimated depth maps (Figure 2) with a mean relative error of 0.85% while reducing the total power required for depth sensing by 3×.
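The step from an estimated rigid motion to a new depth value can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and it takes the rotation and translation as already estimated from the image correspondences. The helper `mean_relative_error` shows one plausible way to compute the error metric quoted above.

```python
def propagate_depth(u, v, depth, rotation, translation, fx, fy, cx, cy):
    """Move one pixel's 3-D point by a rigid motion; return (u', v', depth')."""
    # Back-project the pixel into camera coordinates (pinhole model).
    point = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
    # Apply the rigid motion P' = R P + t.
    x, y, z = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    # Re-project; the new depth is the z-coordinate of the moved point.
    return fx * x / z + cx, fy * y / z + cy, z


def mean_relative_error(estimated, reference):
    """Mean of |estimated - reference| / reference over paired depth values."""
    pairs = list(zip(estimated, reference))
    return sum(abs(e - r) / r for e, r in pairs) / len(pairs)


# Hypothetical example: an object 2.0 m away moves 0.5 m toward the camera.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(propagate_depth(320, 240, 2.0, identity, (0.0, 0.0, -0.5),
                      fx=500.0, fy=500.0, cx=320.0, cy=240.0))
```

Applying this update to every pixel of the last measured depth map yields a new depth map without another TOF exposure, which is where the power saving would come from.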

▲ Figure 1: The power of TOF imaging is reduced by minimizing its usage and estimating new depth maps using images.

▲ Figure 2: Example of the estimated depth map obtained using our approach. Pixel-wise relative error is shown in the third column.

Phase-Shift Impedance Modulation for Fast Response Dynamic Impedance Matching

A. S. Jurkov, D. J. Perreault
Sponsorship: MKS Instruments Inc.

Accurate, rapid, and dynamically controlled impedance matching offers significant advantages to a wide range of present and emerging radio-frequency (RF) power applications, such as software-defined radios, frequency-agile and adaptive RF transmitters and receivers, new types of highly efficient RF power amplifiers, plasma drivers, generators, wireless power transfer, and many other industrial processes.

For high-frequency (HF) and very-high-frequency (VHF) applications, e.g., 3-300 MHz, a tunable impedance matching network (TMN) is typically implemented as an ideally lossless, lumped-element reactive network in which some of the reactive elements are realized as variable (tunable) components. Conventional techniques for implementing variable reactances for high-power RF applications often impose limitations on tuning resolution or speed.

This work proposes an approach to TMNs that provides much faster and more accurate impedance matching than conventional techniques and is suitable for use at high power levels. The implementation is based on a narrow-band technique, termed here phase-switched impedance modulation (PSIM). Phase-switched variable reactances rely on modulating the effective impedance of a switched reactive element (a capacitor, an inductor, or some combination of both) at the switching frequency (i.e., the RF frequency). In essence, PSIM controls the effective impedance seen looking into the terminals of a reactive element, at the frequency at which the element is switched by either a shunt or a series switch, by appropriately adjusting the phase and/or duty cycle of the switch (Figure 1).

A TMN prototype that demonstrates the performance of a PSIM-based implementation is designed to provide a 50-Ω match over a load impedance range associated with inductively coupled plasma processes and to operate in a narrow frequency band centered around 13.56 MHz (Figure 2). Ongoing work aims to further explore PSIM-based design of both variable- and fixed-frequency matching networks and a new class of switching RF power amplifiers that can operate efficiently over wide load and frequency ranges.
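To make concrete what a 50-Ω match buys, the sketch below evaluates the standard reflection coefficient for a load seen by a 50-Ω source. It is not part of the PSIM design itself; the function names and the example load values are assumptions chosen for illustration.

```python
def reflection_coefficient(z_load, z0=50.0):
    """Return the complex reflection coefficient of z_load relative to z0."""
    return (z_load - z0) / (z_load + z0)


def reflected_power_fraction(z_load, z0=50.0):
    """Return the fraction of incident power reflected by the load."""
    return abs(reflection_coefficient(z_load, z0)) ** 2


# A perfectly matched load reflects nothing; a 150-ohm load reflects 25%.
print(reflected_power_fraction(50 + 0j))
print(reflected_power_fraction(150 + 0j))
# A hypothetical unmatched, partly inductive load for comparison.
print(reflected_power_fraction(10 + 30j))
```

A matching network that keeps the impedance presented to the source near 50 Ω while the load moves therefore keeps the reflected power near zero.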

**FURTHER READING**

The use of plasma in the processing of materials has become prevalent in a wide range of industries. A commonly used means of generating these plasmas is to inductively couple energy from a radio-frequency (RF) power amplifier into the chamber containing the gas to be ionized (e.g., by driving RF current through a coil wound around the chamber). Key challenges in RF plasma generation include efficiently generating and controlling the RF power delivered into the plasma while maintaining acceptable loading of the associated RF power amplifier under the high operating frequency (e.g., 13.56 MHz) and the highly variable conditions in a plasma system. Inductively coupled plasma (ICP) loads present a dynamically variable load impedance that depends on gas type and pressure, operating mode, power level, and other features. The effective load impedance can vary substantially in both its real and reactive components, making matching challenging.

Because the effective loading provided by the plasma coil varies greatly across operating conditions, a tunable matching network is typically utilized between the power amplifier and the plasma coil. However, such a network tends to be costly and bulky, and it responds slowly to load changes. We introduce a dynamic matching system for ICP generation that losslessly maintains a near-constant driving-point impedance (for low reflected power) across the entire plasma operating range. This new system utilizes a resistance compression network (RCN), an impedance transformation stage, and a specially configured set of plasma drive coils to achieve rapid adjustment to plasma load variations. Compared to conventional matching techniques for plasma systems, the proposed approach has the benefits of relatively low cost and fast response, and it does not require any moving components. We develop suitable coil geometries for the proposed system and treat the design of the RCN and matching stages, including design options and tradeoffs. A prototype system is implemented (Figure 1), and its operation is demonstrated with low-pressure ICP discharges in O₂, C₄F₈, and SF₆ gases at 13.56 MHz and over the entire plasma operating range of up to 250 W (Figure 2).
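The compression effect of an RCN can be illustrated with the textbook two-branch form, in which two identical load resistances are driven through reactances of +jX and −jX and the branches are connected in parallel. The sketch below assumes this basic topology and purely resistive, matched loads; the function name and component values are hypothetical, and the actual system described above also involves the drive coils and a transformation stage.

```python
import math


def rcn_input_impedance(r_load, x):
    """Input impedance of a basic two-branch resistance compression network.

    One branch is r_load in series with +jx, the other is r_load in series
    with -jx. Their parallel combination is (r_load**2 + x**2) / (2 * r_load),
    which is purely resistive.
    """
    z_plus = complex(r_load, x)
    z_minus = complex(r_load, -x)
    return z_plus * z_minus / (z_plus + z_minus)


x = 50.0  # branch reactance magnitude in ohms (illustrative value)
# Sweep the load resistance over a 10:1 range centered geometrically on x.
loads = [x / math.sqrt(10.0), x, x * math.sqrt(10.0)]
inputs = [rcn_input_impedance(r, x).real for r in loads]
print(inputs)
print(max(inputs) / min(inputs))
```

In this idealized case, a 10:1 variation in load resistance appears at the input as a variation of less than 2:1, with no moving parts or tuning loop involved.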

**FURTHER READING**

This newly initiated project seeks to develop hardware and algorithms to support the contactless measurement of currents in connectors. A natural future extension of this project is to develop hardware and algorithms to support the contactless measurement of voltages. Such measurements will in turn support the development of smart industrial infrastructure that monitors the power and signals passing through it. In concert with signal processing and machine learning algorithms, the electrical data might be used to further monitor upstream sources and downstream loads. For example, with enough “intelligence,” it might be possible to learn normal operation from previously transmitted power and signals and then recognize future signatures that predict subsequent failures, making the power and data infrastructure truly smart.

During the initial part of this project, we have developed a random-walk algorithm to find the best layout of magnetic-field sensors around a connector for measuring the internal connector currents in the presence of external magnetic-field disturbances, nearby magnetizable materials, and conductors. The objective is to reduce the error of the measured currents to the same level as the inevitable noise in the magnetic-field measurements. In parallel, we are building an experimental system to validate and demonstrate the ability to measure connector currents.
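A random-walk layout search of this general kind can be sketched in a few dozen lines. The version below is a simplified illustration, not the project's algorithm. It assumes two long straight conductors, single-axis sensors on a ring that sense the tangential field, gains in arbitrary units, and no external disturbances or magnetizable materials. The cost is the trace of $(A^T A)^{-1}$, which scales the variance of the least-squares current estimate for a given sensor noise level. All names and values are hypothetical.

```python
import math
import random


def gain(sensor_angle, ring_radius, wire_xy):
    """Tangential field at a ring sensor per unit wire current (arbitrary units)."""
    sx = ring_radius * math.cos(sensor_angle)
    sy = ring_radius * math.sin(sensor_angle)
    dx, dy = sx - wire_xy[0], sy - wire_xy[1]
    r2 = dx * dx + dy * dy
    # The field of a long straight wire circulates around it: B ~ (-dy, dx) / r^2.
    bx, by = -dy / r2, dx / r2
    # Project onto the sensing axis, assumed tangential to the ring.
    return bx * -math.sin(sensor_angle) + by * math.cos(sensor_angle)


def noise_amplification(angles, ring_radius, wires):
    """Return trace((A^T A)^-1) for the two-wire sensitivity matrix A."""
    p = q = s = 0.0
    for angle in angles:
        a1 = gain(angle, ring_radius, wires[0])
        a2 = gain(angle, ring_radius, wires[1])
        p += a1 * a1
        q += a1 * a2
        s += a2 * a2
    det = p * s - q * q
    # A nearly singular layout cannot separate the two currents.
    return (p + s) / det if det > 1e-12 else math.inf


def random_walk_layout(angles, ring_radius, wires, steps=2000, step_size=0.2, seed=0):
    """Perturb one sensor angle at a time and keep only improving moves."""
    rng = random.Random(seed)
    best = list(angles)
    best_cost = noise_amplification(best, ring_radius, wires)
    for _ in range(steps):
        trial = list(best)
        trial[rng.randrange(len(trial))] += rng.gauss(0.0, step_size)
        cost = noise_amplification(trial, ring_radius, wires)
        if cost < best_cost:
            best, best_cost = trial, cost
    return best, best_cost


wires = [(-0.5, 0.0), (0.5, 0.0)]  # hypothetical conductor positions
start = [0.0, 0.1, 0.2]            # three sensors bunched together
print(noise_amplification(start, 1.0, wires))
print(random_walk_layout(start, 1.0, wires))
```

Extending the sensitivity matrix with columns for external field components would be one way to account for disturbances, at the cost of requiring more sensors.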