|
1. INTRODUCTION

Moore’s Law [1] expressed the economic optimism of a forward-looking semiconductor industry: one could expect half the size, double the performance and reduced power, at almost equal cost, every 12 months (the interval was later revised to 18 months and then to 2 years). It not only brought confidence to the industry but also guided more than 40 years of continuous growth, up to the 28nm node. The entire industry, and the entire world, benefited from the semiconductor boom. Since 20nm, however, the use of multi-patterning lithography has made the economic benefit stated by Moore’s Law increasingly hard to realize: scaling and improved power-performance are still possible, though to a lesser degree, but the “almost equal cost” aspect is unlikely to hold. Figure 1 shows the trend of processing performance over time [2].

2. FPGA CAPABILITY

FPGA capability, which includes not only programmable logic count, performance and power but also functionality, software and applications, has increased continually, under Moore’s Law and beyond. Figure 2 shows the maximum logic cell count (each logic cell is roughly equivalent to 100 logic gates) of each product-family generation, from the 0.25um Virtex to the newest 7nm Versal family. The chart shows that, following Moore’s Law, the maximum logic cell count in a monolithic die (red) increased from 27,000 at the 0.25um node to about 3,700,000 at 16nm (if one chose to build such a die), an increase of approximately 137x over 9 nodes. By employing passive 3D-IC (orange), the maximum logic cell count gains a further 2~3x over what following Moore’s Law alone delivers. It is noticeable that the node-to-node increase in logic cell count is less than Moore’s Law predicts. One reason is that not all types of circuits scale equally. For example, while normal logic circuits and SRAM generally follow the technology scaling factor, IOs, analog and RF circuits usually scale far less.
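The logic cell figures quoted for Figure 2 can be checked with a short calculation. The sketch below is illustrative only. It assumes that "9 nodes" means nine node-to-node transitions and uses an idealized doubling per transition as the reference. Both are assumptions made here, not statements from the source.

```python
# Logic cell counts quoted in the text for a monolithic die.
cells_start = 27_000       # 0.25um Virtex
cells_end = 3_700_000      # 16nm (approximate figure from the text)

# Assumption: "9 nodes" is read as nine node-to-node transitions.
transitions = 9


def per_node_growth(start: float, end: float, steps: int) -> float:
    """Geometric-mean growth factor per node transition."""
    return (end / start) ** (1.0 / steps)


total_growth = cells_end / cells_start
average_growth = per_node_growth(cells_start, cells_end, transitions)

# Assumption: idealized doubling of logic content at every transition.
ideal_growth = 2.0 ** transitions

print(f"total increase:          {total_growth:.0f}x")
print(f"average per transition:  {average_growth:.2f}x")
print(f"idealized 2x per step:   {ideal_growth:.0f}x in total")
```

Under these assumptions the average growth is about 1.7x per transition, below the idealized 2x, which is consistent with the observation that logic cell count grows more slowly than the prediction.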
A second important reason is that, from one generation to the next, FPGAs have continually added functions (hard embedded IPs), which also count toward die area. Although these functions require more area, they are a key part of the improvement in FPGA capability and have a very positive impact on customers’ applications. The following sections discuss several examples of functionality critical to FPGA applications.

2.1 SERDES

Serializers and de-serializers (SERDES) have gradually become popular over the past decades. FPGAs first employed SERDES in the Virtex-II Pro family in 2002, in only a few high-end family members and with a limited number of channels running at only 3.125 Gbps. Today, in the 16nm Ultrascale-Plus family, SERDES have proliferated and perform much better. For example, the VU-29Plus has 48 channels of 58 Gbps PAM4 SERDES and 32 channels of 32 Gbps SERDES, and the VU-13Plus has 120 channels of 32 Gbps SERDES. Both use 3D-IC integration. SERDES channels of such speed and number have become essential in the communication industry. Beyond 58 Gbps PAM4, Xilinx has also demonstrated a 112 Gbps SERDES [3], which will be part of the 7nm Versal family offering.

To incorporate SERDES design into an FPGA, one key challenge is timing: one needs to be able to bring up a complicated RF design at an early development stage of a technology, rather than waiting for the technology to mature. This requires in-depth understanding of the technology, analysis of existing data and forecasting of RF behavior through RF modeling. Xilinx has put major effort into mastering this aspect. Beyond 112 Gbps, the industry has not yet reached a clear conclusion on the roadmap. Xilinx believes photonics integration could be one of the options and has been studying it accordingly. Figure 3 lists publications showing the evolution of SERDES over the past decade and a half.
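As a rough illustration of the channel counts and rates quoted in this section, the sketch below sums the nominal line rates of the two 16nm devices and compares a 58 Gbps channel with the first 3.125 Gbps channel. It counts raw line rate in one direction and ignores encoding and protocol overhead, which is a simplifying assumption. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SerdesGroup:
    """A group of identical SERDES channels (hypothetical helper type)."""
    channels: int
    gbps_per_channel: float


def aggregate_gbps(groups: List[SerdesGroup]) -> float:
    """Sum of nominal line rates over all channels, one direction."""
    return sum(g.channels * g.gbps_per_channel for g in groups)


# Channel counts and nominal rates as quoted in the text.
devices = {
    "48 x 58G PAM4 + 32 x 32G": [SerdesGroup(48, 58.0), SerdesGroup(32, 32.0)],
    "120 x 32G": [SerdesGroup(120, 32.0)],
}

for name, groups in devices.items():
    print(f"{name}: {aggregate_gbps(groups) / 1000:.3f} Tbps raw")

# Per-channel rate of the first FPGA SERDES, as quoted in the text.
first_gen_gbps = 3.125
per_channel_speedup = 58.0 / first_gen_gbps
print(f"per-channel speedup since 2002: {per_channel_speedup:.2f}x")
```

Both devices land at roughly 3.8 Tbps of raw one-direction line rate, so the two configurations trade channel speed against channel count rather than total throughput.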
2.2 Embedded micro-processors, SOC and RFSOC

The Virtex-II Pro family in 2002 was also the first in which a micro-processor, the PowerPC 405 (PPC405), was embedded into certain high-end FPGA family members. A major effort was spent to ensure tight connections between the PPC405 and the programmable logic fabric. The embedded micro-processor enabled many new high-end industrial and communication FPGA applications. However, use of the embedded micro-processor remained relatively limited until the 28nm 7-Series Zynq SOC family arrived in the 2012 time frame. The 7-Series Zynq had a much more popular, lower-cost ARM A9 processor, and a vast effort went into making it not just a connected processor but a far more capable SOC, including memories, IOs and a large amount of system software to support applications. The 7-Series Zynq SOC attracted many more applications across the whole range, including very low-cost ones.

In the 16nm Ultrascale-Plus FPGA products of 2016, this ARM-based SOC was further upgraded into the MPSOC and the RFSOC. The MPSOC is an order of magnitude more sophisticated and capable (multiple ARM A53 and ARM R5 cores with GPU, video and codec) than the original 7-Series Zynq SOC; the RFSOC adds built-in 4 Gbps ADCs and DACs (Analog-to-Digital Converters and Digital-to-Analog Converters) [4]. The combined MPSOC and RFSOC capability makes these devices well suited to today’s 5G wireless front-haul applications. Figure 4 illustrates a 16 TRX MIMO (Multi-Input Multi-Output) antenna design for a 3.5 Gbps 5G NR (New Radio), using the ZU-29DR RFSOC. Because of the integration of ADC and DAC and the massive capability of the MPSOC, the device serves the beam-forming, DFE (Digital Front End) and ADC/DAC functions. It drastically reduces power, cost and footprint, and its programmability allows system changes, which suits today’s initial 5G deployment, still facing many uncertainties.
The RFSOC is a very successful example of how adding the right functionality expands the capability of the FPGA and ultimately benefits the end user. As with the design and integration of SERDES in an FPGA on a not-yet-mature technology node, bringing up ADCs and DACs also requires strong technical capability in predicting and modeling the analog and RF behavior of a new technology.

2.3 HBM integration

Logic and DRAM technology development have long been separate. Each optimizes for its own technology capability, market needs and cost. Thus, in the past, DRAM and logic were mostly integrated in a system through a DDR interface. Wide-bandwidth DRAM connections need a large number of DDR IOs, and even so the bandwidth remains limited while power and latency remain high. HBM (High Bandwidth Memory) is a stacked DRAM cube built with 3D-IC techniques, using uBumps (micro-bumps) and TSVs (Through-Silicon Vias). Because its interface is wide (1024 bits in HBM-2) and fast (2~3 Gbps per pin in HBM-2), its bandwidth, power and latency are unsurpassed. Integrating HBM with a logic IC also requires passive 3D-IC technologies, such as CoWoS (Chip on Wafer on Substrate) and others. In the 16nm Ultrascale-Plus FPGA family, HBM integration becomes available, which greatly expands on-chip data processing capability thanks to its ultra-wide bandwidth with low power and low latency. Figure 5 illustrates an FPGA integrated with 2 HBMs [5]. The key to successful HBM integration is Xilinx’s long and successful experience with 3D-IC SSIT (Stacked Silicon Integration Technology) since 2011, as well as the industry’s volume-production learning.
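To make the bandwidth argument of this section concrete, the sketch below estimates peak bandwidth as interface width times per-pin data rate. It assumes that the 1024 figure quoted for HBM-2 is the data-interface width in bits per stack and that the 2~3 Gbps figure is a per-pin rate. The DDR comparison point (one 64-bit channel at 3.2 Gbps per pin) is an assumption added here for scale, not a figure from the source.

```python
def peak_bandwidth_gbytes(width_bits: int, gbps_per_pin: float, stacks: int = 1) -> float:
    """Peak bandwidth in GB/s: width x per-pin rate x stacks / 8 bits per byte."""
    return width_bits * gbps_per_pin * stacks / 8.0


# Assumption: 1024 is the HBM-2 interface width in bits per stack.
HBM2_WIDTH_BITS = 1024

# Two stacks, as in the Figure 5 illustration, at both ends of the quoted range.
hbm_low = peak_bandwidth_gbytes(HBM2_WIDTH_BITS, 2.0, stacks=2)
hbm_high = peak_bandwidth_gbytes(HBM2_WIDTH_BITS, 3.0, stacks=2)

# Assumption: one 64-bit DDR channel at 3.2 Gbps per pin, for comparison only.
ddr_channel = peak_bandwidth_gbytes(64, 3.2)

print(f"2 x HBM-2 at 2 Gbps/pin: {hbm_low:.0f} GB/s")
print(f"2 x HBM-2 at 3 Gbps/pin: {hbm_high:.0f} GB/s")
print(f"one DDR channel:         {ddr_channel:.1f} GB/s")
print(f"DDR channels needed to match the low end: {hbm_low / ddr_channel:.0f}")
```

Under these assumptions, matching two HBM-2 stacks would take on the order of twenty DDR channels, which illustrates why a wide in-package interface is attractive for bandwidth and IO count.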
3. NEXT, VERSAL THE ACAP

Because of rapid growth in the datacenter, the explosion of AI (Artificial Intelligence), ML (Machine Learning) and CNNs (Convolutional Neural Networks), and other rapid evolutions in industry such as initial 5G wireless deployment, automotive ADAS (Advanced Driver Assistance Systems) and the development of fully AD (Auto Drive) vehicles, the next generation of FPGA, Versal, has been built in 7nm technology and established as an ACAP (Adaptive Compute Acceleration Platform) [6]. Figure 6 is a functional block diagram of Versal. Several new functional blocks, such as the AI Engine and the Network-on-Chip, are created, and many other functional blocks are upgraded. All these hard blocks serve as the foundation of an adaptive acceleration platform comprising both hardware and software, as illustrated in Figure 7. With 7nm technology, a first-order calculation gives approximately 50% scaling and a 20% performance gain at equal power versus 16nm. With all these innovations and the stacked software and hardware platform, however, actual applications will see a much higher gain in overall performance, as shown in Figure 8.

3.1 EUV readiness in 7nm and possible application

In TSMC’s 7nm technology, all critical layers use immersion multi-patterning with uni-directional design. TSMC’s 7-Plus platform will adopt EUV lithography for several layers. The potential use of EUV for better uniformity, and thus better power-performance (for example in the MEOL (Mid-End of Line) to reduce resistance), will be assessed.

4. CONCLUSION

FPGAs continue to increase in capability, which ultimately benefits end-customer applications, even though more challenges have become obvious over the past several technology nodes. The increase in capability has been achieved not only through technology scaling but also through multiple “more-than-Moore” integration techniques, as well as continual innovation in architecture, design and software that advances functionality.
The author would like to thank the many people at Xilinx who helped with this publication.

REFERENCES

[1] Moore, Gordon E., “Cramming more components onto integrated circuits,” Electronics.
[2] K. Rupp, “42 Years of Microprocessor Trend Data,” (2018) https://www.karlrupp.net/2018/02/42-years-of-microprocessor-trend-data/
[3] KeeHian Tan; Ping-Chuan Chiang; Yipeng Wang; Haibing Zhao; Arianne Roldan; Hongyuan Zhao; Nakul Narang; Siok Wei Lim; Declan Carey; Sai Lalith Chaitanya Ambatipudi; Parag Upadhyaya; Yohan Frans; Ken Chang, “A 112-Gb/s PAM4 Transmitter in 16nm FinFET,” VLSI Symposium 2018, 45–46 (2018).
[4] Xilinx RFSOC product datasheet (2019) https://www.xilinx.com/publications/product-briefs/rfsoc-product-brief.pdf
[5] Suresh Ramalingam, “HBM package integration: Technology trends, challenges and applications,” 2016 IEEE Hot Chips 28 Symposium (HCS), (2016).
[6] Victor Peng, “Adaptable Intelligence,” 2018 IEEE Hot Chips 30 Symposium (HCS) Key Note Speech, (2018).
|