Scaling CMOS Image Sensors

Manufacturing issues grow as cameras become more sophisticated.


After a period of record growth, the CMOS image sensor market is beginning to face some new and unforeseen challenges.

CMOS image sensors provide the camera functions in smartphones and other products, but now they are facing scaling and related manufacturing issues in the fab. And like all chip products, image sensors are seeing slower growth amid the coronavirus outbreak.

Manufactured at mature nodes in 200mm and 300mm fabs, these sensors are used in phones, cars, consumer products, industrial/medical systems, and security cameras. Smartphones, for example, incorporate two or more cameras, each of which is powered by a CMOS image sensor that converts light into signals in order to create images.

Fig. 1: CMOS image sensor. Source: Wikipedia/Filya1

Smartphones are incorporating more CMOS image sensors than ever before, enabling high-resolution, feature-rich cameras. For example, Samsung’s new 5G smartphone has five cameras, including a rear-facing, wide-angle camera based on a 108-megapixel (MP) image sensor. That equates to more than 100 million pixels on a small die. A front-facing camera for selfies incorporates a 48MP image sensor with the world’s smallest pixel pitch, 0.7µm, according to TechInsights.

An image sensor incorporates a multitude of tiny photosensitive pixels. The pixel pitch, measured in µm, is the distance from the center of one pixel to the center of the next. Not all phones are equipped with state-of-the-art image sensors, and consumers do not need them to take acceptable photos. But clearly, consumers are demanding more imaging functions.
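To make the link between pixel count, pitch and physical size concrete, here is a minimal sketch that estimates the dimensions of the active pixel array. It assumes square pixels on a uniform grid and a 4:3 aspect ratio; both are illustrative assumptions, and the result covers only the pixel array, not the readout circuitry or the rest of the die.

```python
import math

def active_array_size_mm(megapixels: float, pitch_um: float, aspect=(4, 3)):
    """Approximate width, height and diagonal (mm) of a pixel array.

    Assumes square pixels on a uniform grid; the default 4:3 aspect ratio
    is an illustrative assumption, not the spec of any particular sensor.
    """
    w_ratio, h_ratio = aspect
    total_pixels = megapixels * 1e6
    # Choose columns and rows so that cols/rows == aspect and cols*rows == total
    cols = math.sqrt(total_pixels * w_ratio / h_ratio)
    rows = total_pixels / cols
    width_mm = cols * pitch_um / 1000.0
    height_mm = rows * pitch_um / 1000.0
    return width_mm, height_mm, math.hypot(width_mm, height_mm)

# The 48MP, 0.7um selfie sensor mentioned above, under the 4:3 assumption
w, h, d = active_array_size_mm(48, 0.7)
print(f"48MP at 0.7um: {w:.2f} x {h:.2f} mm (diagonal {d:.2f} mm)")
```

Because every linear dimension scales with the pitch, doubling the pitch at the same pixel count doubles the array's width and height and quadruples its area.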

“As higher-bandwidth data performance progressed from 3G to 4G and now to 5G, the demand for higher quality cameras has grown,” said David Hideo Uriu, technical director of corporate marketing at UMC. “This trend, coupled with the need for higher pixel counts and better resolution, have propelled the CMOS image sensor boom. Beyond these trends loom the areas of biometric ID, 3D sensing, and enhanced human vision applications in the IR/NIR spectrums in phones.”

Still, image sensor vendors face some challenges. For years they have been racing to reduce the pixel pitch, which lets them pack more pixels into an image sensor and boosts its resolution. Recently, though, pixel scaling has become more difficult as the pitch approaches the wavelength of light. “Pixel R&D teams now have to find new ways to avoid reduction in sensitivity and more crosstalk in the sensor,” said Lindsay Grant, vice president of process engineering at OmniVision.

On the flip side, there is also a trend to keep larger pixel sizes in phones while adopting the best improvements developed for smaller pixels, which boosts image quality. These trends support customer demand for bigger and better cameras, resulting in more sensors with bigger die sizes.

Nonetheless, image sensor vendors have found ways to solve some of the challenges. Among them:

  • New processes. High-k films and other fab technologies have jumpstarted pixel scaling.
  • Die stacking and interconnects. Putting different functions on two dies and stacking them isn’t new. But new interconnect schemes, such as pixel-to-pixel connections, are in R&D.

Image sensor market dynamics
There are two main types of image sensors — CMOS image sensors and charge-coupled devices (CCDs). CCDs, which are current-driven devices, are found in digital cameras and various high-end products.

CMOS image sensors are different. “A complementary metal oxide semiconductor (CMOS) image sensor has a photodiode and a CMOS transistor switch for each pixel, allowing the pixel signals to be amplified individually,” according to TEL’s website.

Targeted for various applications, CMOS image sensors come in different formats, frame rates, pixel sizes and resolutions. They use either a global or a rolling shutter. For example, OmniVision’s new 64MP image sensor features a 0.8µm pixel size in a 1/1.7-inch format. Supporting still image capture and 4K video, the sensor uses type-2, 2×2 microlens phase detection autofocus to boost autofocus accuracy. Output formats include 64MP at 15 frames per second (fps).
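As a rough check on what a 64MP, 15fps output mode implies for readout, the sketch below multiplies pixel count by frame rate and by an assumed bit depth per pixel. The 10-bit depth is a hypothetical value chosen for illustration; the article does not state it, and the figure ignores blanking, protocol overhead and any compression.

```python
def readout_rates(megapixels: float, fps: float, bits_per_pixel: int):
    """Return (pixels per second, raw gigabits per second) for one sensor mode.

    bits_per_pixel is an assumed ADC output depth; overhead is ignored.
    """
    pixels_per_s = megapixels * 1e6 * fps
    gbit_per_s = pixels_per_s * bits_per_pixel / 1e9
    return pixels_per_s, gbit_per_s

# 64MP at 15 fps, assuming (hypothetically) 10-bit raw output
px, gbps = readout_rates(64, 15, 10)
print(f"{px / 1e6:.0f} Mpixel/s, about {gbps:.1f} Gbit/s of raw data")
```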

Suppliers are split into two camps: fabless companies and IDMs. IDMs have their own fabs, while fabless companies use foundries. In either case, image sensor dies are fabricated on a wafer, which is then diced, and the dies are assembled into packages.

Some 65% of image sensors are produced in 300mm fabs, according to Yole Développement. “200mm is still critical for a wide range of security, medical and automotive CMOS image sensor products,” said David Haynes, managing director of strategic marketing at Lam Research.

Today, Sony is the largest supplier of CMOS image sensors, followed by Samsung and OmniVision. Other suppliers include Sharp, ON Semi, STMicroelectronics, GalaxyCore, SK Hynix, Panasonic and Canon, according to IC Insights.

In 2019, image sensor sales reached $18.4 billion, up 30% over 2018, according to IC Insights. “In 2020, we currently forecast a 3% drop in CMOS image sensor sales to $17.8 billion, snapping the string of record sales because of a falloff in demand for sensors in cellphones and other systems in the Covid-19 virus health crisis,” said Rob Lineback, an analyst with IC Insights.

Yole Développement offers a different and more optimistic view. According to the firm, the CMOS image sensor market grew 25% in 2019, and in 2020 growth is expected to slow to a still-positive 7%. The big driver is smartphones. In 2018, there were 2.5 cameras per phone, according to Yole. “In 2019, it has jumped to 2.8 cameras per smartphone. We see that it will go to three cameras per smartphone in 2020,” said Guillaume Girardin, division director of photonics & sensing at Yole.

Each phone is different. For example, Apple’s iPhone 11 Pro incorporates a triple 12MP camera system (wide, ultra-wide and telephoto). Meanwhile, Samsung’s 5G phone has five cameras, four rear-facing and one front-facing. One camera features a time-of-flight sensor, which is used for gesture and 3D object recognition.

Higher-resolution cameras don’t necessarily equate to better photos. “It’s a tradeoff question between the pixel size and resolution,” Girardin said. “Pixel scaling means that it has more pixels. When the resolution is more than 40MP and 50MP, the capabilities may be beyond the human eye to see what they capture. For CMOS image sensors, the pixel with a better quantum efficiency (QE) and a signal-to-noise ratio are the most important things for image quality.”

Smartphones will not displace DSLR cameras for professionals. But clearly, smartphones offer more features than ever before. “People are definitely attracted to 5G for more bandwidth and the potential applications, such as 8K streaming of a live sport event to real-time AR/VR/MR gaming,” said Ronald Arif, senior manager of product marketing at Veeco. “The cameras in the latest 5G phones have become more advanced. They’re starting to incorporate VCSEL devices for depth sensing, which can be used anywhere from autofocus to 3D mapping of your living room. One can imagine a combination of advanced cameras with depth mapping capability and 5G. This could open up rich, new applications such as gaming, live streaming, remote learning and video conferencing.”

In other innovations, vendors are shipping near-infrared (NIR) image sensors. NIR sensors, which capture light at wavelengths just beyond the visible spectrum, are designed for applications that operate in near or total darkness. OmniVision’s new NIR technology provides a 25% improvement at the invisible 940nm NIR wavelength and a 17% bump at the barely visible 850nm NIR wavelength.

In a separate development, Sony and Prophesee have developed an event-based vision sensor. Targeted for machine vision apps, these sensors detect fast-moving objects in a wide range of environments.

Pixel scaling race
Several years ago, CMOS image sensor vendors started the so-called pixel scaling race. The race centers on the pixel pitch, the distance between the centers of adjacent pixels in a device. The goal was (and still is) to reduce the pixel pitch with each new generation. Higher pixel density equates to more resolution, but not all sensors require smaller pitches.

Years ago, the pixel pitch for an image sensor was at the 7µm generation. Vendors have reduced the pitch along the way, but there have been some hiccups.

The image sensor itself is a complex chip. The top layer is a microlens array. Below it is a color filter array, a mosaic of green, red and blue filters. Next is the active pixel array, which consists of light-capturing components called photodiodes as well as other circuitry.
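The most common arrangement for such a mosaic is the Bayer pattern, which the comment thread below also mentions: a repeating 2×2 tile with one red, two green and one blue filter, so each pixel records only one color and a full-color image is later reconstructed by demosaicing. The sketch below assumes an RGGB tile with red at the origin; real sensors may start the pattern at a different phase, and many use non-Bayer layouts.

```python
from collections import Counter

# A 2x2 RGGB tile; putting R at the origin is an assumption,
# real sensors may start the pattern at G or B.
BAYER_TILE = (("R", "G"),
              ("G", "B"))

def cfa_color(row: int, col: int) -> str:
    """Color of the filter over pixel (row, col) in a Bayer mosaic."""
    return BAYER_TILE[row % 2][col % 2]

def cfa_counts(rows: int, cols: int) -> Counter:
    """Count how many pixels sit under each filter color."""
    return Counter(cfa_color(r, c) for r in range(rows) for c in range(cols))

# Print the top-left 4x4 corner of the mosaic and its color counts
for r in range(4):
    print(" ".join(cfa_color(r, c) for c in range(4)))
print(cfa_counts(4, 4))
```

Half of the pixels sit under green filters, which roughly matches the eye's greater sensitivity to green light.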

Fig. 2: Block diagram of CMOS image sensor. Source: OmniVision

The active pixel array is divided into tiny, individual photosensitive pixels. Each pixel consists of a photodiode, transistors and other components. The pixel size is measured in µm.

An image sensor with a larger pixel size collects more light per pixel, which equates to a stronger signal. But for a given resolution, larger pixels mean a larger sensor, which takes up more board space. Image sensors with smaller pixels collect less light per pixel, but more of them fit on a die. This, in turn, boosts the resolution.
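One way to quantify the pixel-size tradeoff is a simple shot-noise model. Photon arrival is random, so a pixel that collects N photoelectrons carries about √N electrons of shot noise, plus the readout circuit's own read noise. The sketch below assumes that the full pixel area collects light and uses hypothetical values for scene brightness, QE and read noise.

```python
import math

def pixel_snr(pitch_um: float, photons_per_um2: float, qe: float,
              read_noise_e: float = 0.0) -> float:
    """Signal-to-noise ratio of one pixel for one exposure.

    Signal: photoelectrons = photon density * pixel area * QE
    (assumes the whole pixel area collects light).
    Noise: photon shot noise (sqrt of the signal) and read noise,
    added in quadrature.
    """
    signal_e = photons_per_um2 * pitch_um ** 2 * qe
    variance = signal_e + read_noise_e ** 2
    if variance == 0:
        return 0.0
    return signal_e / math.sqrt(variance)

# Hypothetical dim scene: 100 photons per um^2, QE of 0.8, 2 e- read noise
for pitch in (1.4, 1.0, 0.7):
    print(f"{pitch} um pixel: SNR = {pixel_snr(pitch, 100, 0.8, 2.0):.1f}")
```

With read noise set to zero, SNR equals √N, so halving the pitch (a quarter of the area) halves the SNR; read noise makes small pixels fall off even faster in dim light.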

There are several ways to make an image sensor in the fab. In one simplified flow for forming the pixel array, processing starts with front-side steps on a substrate. The wafer is then bonded to a carrier, or handle, wafer. The top portion undergoes an implant step, followed by an anneal. An anti-reflective coating is applied on top. Finally, the color filter and microlens are formed.

In another simplified flow, the surface of a silicon substrate undergoes an implant step. Diffusion wells and a metallization stack are formed on top. The structure is flipped, and trenches are etched on the back side. A liner is deposited on the trench sidewalls, and the trenches are filled with dielectric materials. A color filter and microlens are fabricated on top.

Up until 2009, the mainstream CMOS image sensor was based on a frontside-illuminated (FSI) pixel array architecture. In operation, light hits the front side of the device. The microlens gathers the light and directs it through a color filter. The light then passes through a stack of interconnects and is captured by a photodiode. The charge is converted to a voltage at each pixel, and then the signals are multiplexed.
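The charge-to-voltage step follows V = Q/C: the collected electrons are transferred onto a small sense node (often called the floating diffusion), and the node's capacitance sets how many microvolts each electron produces, a figure known as conversion gain. The 1.6fF capacitance and 5,000-electron signal below are hypothetical values chosen for illustration.

```python
ELECTRON_CHARGE_C = 1.602176634e-19  # elementary charge in coulombs (exact SI value)

def conversion_gain_uv_per_e(capacitance_fF: float) -> float:
    """Voltage step per electron, in microvolts, on a node of this capacitance."""
    return ELECTRON_CHARGE_C / (capacitance_fF * 1e-15) * 1e6

def pixel_voltage_mv(electrons: float, capacitance_fF: float) -> float:
    """Voltage swing in millivolts from `electrons` on that node (V = Q / C)."""
    return electrons * conversion_gain_uv_per_e(capacitance_fF) / 1000.0

# Hypothetical 1.6 fF sense node
print(f"{conversion_gain_uv_per_e(1.6):.1f} uV per electron")
print(f"{pixel_voltage_mv(5000, 1.6):.0f} mV for 5,000 electrons")
```

A smaller capacitance gives a larger voltage per electron, which helps in low light but limits how much charge fits before the node saturates.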

Over the years, the FSI architecture enabled vendors to reduce the pitch for several generations. For example, vendors reduced the pitch from 2.2µm in 2006 to 1.75µm in 2007, according to TechInsights.

In 2008, the industry hit a wall with the FSI architecture at the 1.4µm generation. So starting around 2009, vendors moved to a new architecture, backside illumination (BSI). The BSI architecture turns the image sensor upside down. Light enters from the back side of the silicon substrate. The photons have a shorter path to the photodiodes and no longer pass through the interconnect stack, which boosts the quantum efficiency.

Fig. 3: FSI vs. BSI. Source: OmniVision

BSI also jump-started pixel scaling. “In terms of pixel scaling, BSI sensor technology allows for optimum pixel dimensions in the range of 1.2µm to 1.4µm, and stacked BSI allows the footprint of sensors with such pixel dimensions to remain below 30mm²,” Lam’s Haynes said. “Pixels with sub-micron dimensions can be enabled using quad-pixel architectures, enabling resolutions in excess of 48MP.”
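A common reading of "quad-pixel" is a layout in which each color filter covers a 2×2 group of sub-micron pixels. The sensor can then output full resolution in bright light, or sum each group into one larger effective pixel in dim light, trading resolution for signal. The sketch below assumes that layout and uses a hypothetical 48MP, 0.8µm sensor and made-up pixel values; it is an illustration of 2×2 binning, not any vendor's implementation.

```python
def bin_2x2(frame):
    """Sum each 2x2 block of a 2D list into one output pixel.

    Assumes a quad-pixel layout in which the four pixels of each 2x2 block
    share one color filter, so summing them keeps colors separate.
    """
    rows, cols = len(frame), len(frame[0])
    if rows % 2 or cols % 2:
        raise ValueError("frame dimensions must be even")
    return [[frame[r][c] + frame[r][c + 1] + frame[r + 1][c] + frame[r + 1][c + 1]
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]

def binned_mode(megapixels: float, pitch_um: float):
    """Resolution (MP) and effective pitch (um) after 2x2 binning."""
    return megapixels / 4, pitch_um * 2

tiny = [[10, 12, 30, 31],
        [11, 13, 29, 32],
        [5, 6, 40, 41],
        [7, 4, 42, 39]]
print(bin_2x2(tiny))
print(binned_mode(48, 0.8))  # hypothetical 48MP sensor with 0.8um pixels
```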

Besides BSI, the industry required other changes. As pixels scale, the photodiode, the key light-capturing component, shrinks within the image sensor, making it less efficient. And the photodiodes sit closer together, which increases crosstalk.

So at 1.4µm around 2010, the industry moved to another innovation in the fab, deep trench isolation (DTI). DTI etches deep trenches between neighboring photodiodes to isolate them, which makes it possible to build taller photodiodes and increase the charge capacity per area.

To enable DTI in the fab, vendors took the BSI architecture and made the photodiodes taller using various process steps. Taller diodes also require thicker silicon around the structures.

Still, pixel scaling slowed. At one time, vendors moved to a new pitch every year. But it took vendors three years to move from 1.4µm (2008) to 1.12µm (2011), four years to reach 1µm (2015), and another three to reach 0.9µm (2018), according to TechInsights.

“In summary, it is our belief that development of DTI and associated passivation schemes was the main contributor to delayed pixel introduction of 1.12µm down to 0.9µm pixels,” said Ray Fontaine, an analyst at TechInsights, in a recent blog.

Recently, vendors have ironed out the issues and the pixel scaling race has resumed. In 2018, Samsung broke the 1µm barrier with 0.9µm, followed by Sony with 0.8µm in 2019, and Samsung with 0.7µm in 2020.
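The milestones cited above can be tabulated to show how much each generation shrinks a pixel. Because pixel area scales with the square of the pitch, the 1.4µm to 1.12µm step (a 0.8× linear shrink) cuts pixel area to 0.64×. The data below are the dates and pitches given in this article, mixing vendors as the text does; the code is only a small arithmetic aid.

```python
# Pixel pitch milestones (year, pitch in um) as cited in this article
MILESTONES = [(2006, 2.2), (2007, 1.75), (2008, 1.4), (2011, 1.12),
              (2015, 1.0), (2018, 0.9), (2019, 0.8), (2020, 0.7)]

def shrink_steps(milestones):
    """For each transition, return (from, to, years, linear ratio, area ratio)."""
    steps = []
    for (y0, p0), (y1, p1) in zip(milestones, milestones[1:]):
        linear = p1 / p0
        steps.append((p0, p1, y1 - y0, linear, linear ** 2))
    return steps

for p0, p1, years, lin, area in shrink_steps(MILESTONES):
    print(f"{p0}um -> {p1}um in {years} yr: pitch x{lin:.2f}, pixel area x{area:.2f}")
```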

For sub-µm pixel scaling, the industry requires more innovations. “As pixels shrink, thicker active (silicon) is required to maintain a suitable photodiode size,” Fontaine said in a recent presentation. “A key technology enabler for thicker active (silicon) is DTI and associated high-k defect passivation films.”
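Why thicker silicon helps can be seen from the Beer-Lambert law: the fraction of light absorbed within a depth d is 1 − exp(−αd), and silicon's absorption coefficient α falls steeply from blue toward red and NIR. The coefficients below are rounded, order-of-magnitude values for crystalline silicon used only for illustration; the model also ignores reflection, interference and light-trapping structures.

```python
import math

# Approximate absorption coefficients of crystalline silicon, in 1/um.
# Rounded, order-of-magnitude values for illustration only.
ALPHA_PER_UM = {450: 2.5, 550: 0.7, 650: 0.3, 850: 0.05}

def absorbed_fraction(wavelength_nm: int, thickness_um: float) -> float:
    """Fraction of light entering the silicon that is absorbed within
    thickness_um, from the Beer-Lambert law: 1 - exp(-alpha * d).
    Ignores reflection, interference and light trapping."""
    return 1.0 - math.exp(-ALPHA_PER_UM[wavelength_nm] * thickness_um)

for d in (2.0, 3.0, 6.0):
    row = ", ".join(f"{wl}nm {absorbed_fraction(wl, d):.0%}" for wl in ALPHA_PER_UM)
    print(f"{d}um silicon: {row}")
```

Blue light is absorbed almost entirely in the first micron or so, while red and especially NIR light need several microns of silicon, which is why shrinking pixels laterally pushes photodiodes deeper.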

Making an image sensor with high-k films follows the conventional process flow described above. What’s different is that high-k films are deposited over the liner in the DTI trenches.

For high-k and other processes, vendors take two different approaches in the fab—front-DTI (F-DTI) and back-DTI (B-DTI). “F-DTI uses a poly silicon gap fill, and the poly can have voltage bias for improved surface pinning. F-DTI can also have more thermal treatment for etch damage leakage reduction,” OmniVision’s Grant said. “B-DTI uses high-k films with a negative charge to accumulate charge and pin the Fermi level at the surface, which then suppresses dark-current leakage. The high-k film process is atomic layer deposition (ALD). B-DTI typically uses an oxide gap fill, but some metal fill and even air gap have also been tried and used in mass production.”

Will pixel scaling continue? “It’s likely that pixel scaling will continue beyond 0.7µm,” Grant said. “As pixels shrink beyond 0.7µm, many aspects need to be optimized. Key items, such as B-DTI, high-energy implant for deep diode, optical structure shrink for color and microlens, will remain the focus for development. The more basic design rules that define in-pixel transistors and interconnects need to be updated.”

Another issue is that the pixel pitch for mobile sensors is approaching the wavelength of light. “Some people may consider this a limit for minimum pixel size,” Grant said. “For example, the 0.6µm pixel pitch is used in R&D today. This is smaller than the wavelength of red light at 0.65µm (650nm). So the question may arise, ‘Why shrink to sub-wavelength? Will there be any useful benefit for the camera user? Shrinking the pixel size to sub-wavelength does not mean there is no valuable spatial resolution information at the pixel level.’”

Grant pointed out that the optical structures for a 1.0µm pixel use many sub-wavelength features. “For example, narrow metal grids for crosstalk suppression and narrow dielectric walls for quantum-efficiency are seeing improvement through light guiding. This nano-scale optical engineering is already in current pixels and has been for many years, so moving to sub-wavelength is not such a revolution,” he said. “The limitation for continued shrink may come from the user benefit rather than the technology. Today, applications continue to find end user value in shrinking the pixel size, so this is driving the trend. As long as that continues, CMOS image sensor technology development will support that direction.”
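To put the pitch-versus-wavelength comparison in numbers, one standard yardstick is the Airy disk, the smallest spot an ideal, aberration-free lens can form: its diameter to the first dark ring is 2.44 × wavelength × f-number. The sketch below uses a hypothetical f/1.8 lens and the 650nm red wavelength cited above. It only shows how many pixels span that spot; as Grant notes, pixels smaller than the spot can still carry useful spatial information.

```python
def airy_disk_diameter_um(wavelength_nm: float, f_number: float) -> float:
    """Diameter to the first dark ring of the Airy pattern of an ideal
    circular lens: d = 2.44 * wavelength * N."""
    return 2.44 * (wavelength_nm / 1000.0) * f_number

def pixels_across_spot(wavelength_nm: float, f_number: float, pitch_um: float) -> float:
    """How many pixel pitches fit across that diffraction spot."""
    return airy_disk_diameter_um(wavelength_nm, f_number) / pitch_um

# Hypothetical f/1.8 phone lens, red light at 650nm
spot = airy_disk_diameter_um(650, 1.8)
print(f"Airy disk: {spot:.2f} um")
for pitch in (1.0, 0.7, 0.6):
    print(f"{pitch} um pitch: {pixels_across_spot(650, 1.8, pitch):.1f} pixels across the spot")
```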

Stacking and interconnects
Besides pixel scaling, CMOS image sensors are undergoing other innovations like die stacking. Vendors are also using different interconnect technologies, such as through-silicon vias (TSVs), hybrid bonding, and pixel-to-pixel.

For years, the pixel array and logic circuitry of an image sensor were on the same die. The big change occurred in 2012, when Sony introduced a two-die stacked image sensor. Die stacking enables vendors to split the sensing and processing functions onto different chips. This allows more functions in the sensor, while also reducing die size.

For this, Sony developed a pixel-array die based on a 90nm process. That die was stacked on a separate 65nm image signal processor (ISP) die, which provides the processing functions, and the two dies were connected.

Eventually, others moved to a similar die-stacking approach. Generally, the top pixel-array die is based on mature nodes. The bottom ISP die is made using 65nm, 40nm or 28nm processes, and 14nm finFET technology is in R&D.

Meanwhile, in 2018, Samsung and Sony developed triple-layer devices. For example, in one version of Sony’s CMOS image sensor line, a DRAM layer is sandwiched between the image sensor and logic dies. The embedded DRAM enables faster data readout.

Besides die stacking, vendors are also developing different interconnect schemes to connect one die to another. Initially, OmniVision, Samsung and Sony used TSVs, which are tiny vertical electrical connections that pass through the silicon.

In 2016, Sony moved to an interconnect technology called copper hybrid bonding. Samsung is still in the TSV camp, while OmniVision does both TSVs and hybrid bonding.

In hybrid bonding, the dies are connected using copper-to-copper interconnects. For this, two wafers are processed in a fab. One is the logic wafer, while the other is the pixel array wafer. The two wafers are joined using a dielectric-to-dielectric bond, followed by a metal-to-metal connection.

Both TSVs and hybrid bonding enable fine pitches. “With respect to stacking of CMOS image sensor pixel and logic wafers, TSV integration and hybrid bonding are likely to continue to co-exist for stacked BSI,” Lam’s Haynes said. “But as multi-stacked BSI sensors become more commonplace, TSV integration will become increasingly relevant.”

There are other trends. “In the future, we expect to see two trends related to chip-stacking in CMOS image sensors. The first is further shrinking of the pitch to enable an even higher chip-to-chip interconnect density. The second is increased deployment of three or more devices being stacked,” said Steve Hiebert, senior director of marketing at KLA.
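The payoff from shrinking the interconnect pitch is quadratic: on a square grid, connections per unit area scale as 1/pitch², so halving the pitch quadruples the chip-to-chip interconnect density. The pitches below are hypothetical round numbers, not figures from any vendor.

```python
def connections_per_mm2(bond_pitch_um: float) -> float:
    """Bond pads per square millimeter on a square grid with the given pitch."""
    return (1000.0 / bond_pitch_um) ** 2

# Hypothetical bond pitches, not figures from any vendor
for pitch in (10.0, 5.0, 2.5):
    print(f"{pitch} um pitch: {connections_per_mm2(pitch):,.0f} connections per mm^2")
```

For comparison, a pixel array with a 1µm pitch holds a million pixels per mm², which gives a sense of how fine the bonding pitch must become for connections at or near the pixel level.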

The next big thing is pixel-to-pixel interconnects. Xperi is developing a technology called “3D Hybrid BSI” for pixel-level integration. Sony and OmniVision have demonstrated the technology.

“It enables more interconnects,” said Abul Nuruzzaman, senior director of product marketing at Xperi. “It allows pixel-level interconnect between each pixel of the sensor and an associated A/D converter. This allows parallel A/D conversion for all pixels. The connection provides high-density electrical interconnection between the stacked pixel and logic layers, allowing implementation of as many A/D converters as the number of effective megapixels. Hybrid bonding can also be used to stack memory with dedicated memory to each pixel.”

This architecture supports massive parallel signal transfer, making it possible to read and write all pixel data of the image sensor at high speeds. “It enables global shutter with scaled pixels for real-time, high-resolution imaging for various timing critical applications, such as autonomous vehicles, medical imaging and high-end photography,” Nuruzzaman said.
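A rough timing model shows why one A/D converter per pixel matters. With column-parallel ADCs, a common conventional arrangement, all columns of a row convert at once but rows convert one after another, so the frame takes roughly rows × conversion time. In a rolling-shutter readout, that same interval separates when the top and bottom rows are sampled. With pixel-parallel ADCs, every pixel can convert at the same moment. The 4000 × 3000 array and 5µs conversion time below are hypothetical, and the model ignores moving the digital data off the chip.

```python
def frame_conversion_time_ms(rows: int, adc_time_us: float, pixel_parallel: bool) -> float:
    """Rough time to digitize one frame, in milliseconds.

    Column-parallel: one ADC per column, rows converted one after another.
    Pixel-parallel: one ADC per pixel, all pixels converted at once.
    Ignores transferring the digital data off the chip.
    """
    conversions_in_series = 1 if pixel_parallel else rows
    return conversions_in_series * adc_time_us / 1000.0

ROWS, ADC_US = 3000, 5.0  # hypothetical: 4000 x 3000 (12MP) array, 5 us per conversion
print("column-parallel:", frame_conversion_time_ms(ROWS, ADC_US, False), "ms")
print("pixel-parallel: ", frame_conversion_time_ms(ROWS, ADC_US, True), "ms")
```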

Clearly, the CMOS image sensor market is dynamic. But 2020 will be a tough year for vendors amid the COVID-19 outbreak.

Still, there is a wave of innovation in the market. “Embedded CMOS image sensors and cameras are increasing in more systems for security, safety, vision-based user interfaces and recognition, IoT, autonomous cars and drones,” IC Insights’ Lineback said.


Masahiko says:

There must be room for more improvements in color filters with thinner material, right? For better color separation and reproducing the original color image.
And how about microlens? What are the major issues in these technologies?

Mark LaPedus says:

Hi Masahiko,
Thanks for the feedback. You are correct. There are new and interesting developments with color filters and the microlens. The same is true with different image sensor technologies like facial ID, SIR, ToF, event-based, etc.

In other words, it’s difficult to include everything about image sensors in a single article. Nonetheless, Bayer and non-Bayer CFA schemes are important. So are phase detection autofocus (PDAF) pixels and 2×1 on-chip lens (OCL) structures.

One could write separate articles about those topics. However, it was slightly off my radar screen. FYI, TechInsights has some great blogs on filters, OCLs and other subjects. That might be the best reference at this stage. Go to:

The state of the art of smartphone imagers
Part 4: Non-Bayer CFA, Phase Detection Autofocus (PDAF)

Here’s another great paper from TechInsights on these and other subjects:

Masahiko says:

Hi Mark,
Thank you for your kind comments.
I will check TechInsights’s blogs and the paper.

Michael Libman says:

A pixel at 0.8um or 0.7um is below the light diffraction limit, so even a perfect lens cannot resolve it, especially given that the lens F-number has to be kept at a reasonable level because focusing becomes an issue. What’s the point in shrinking pixels ever smaller?

Jim says:

Do you see cell phone manufacturers offering above 8- or 10-bit color? The A/D converters would, however, create a challenge in such a small form factor. This accomplishment would bring the cell phone industry closer to the DSLR and mirrorless markets.
This increase in color resolution adds to the quality of the final image clarity.
