Reducing Total System Cost with Low-Power 28-nm FPGAs

When building systems for high-volume applications, it is very important to keep costs in check. There are several dimensions that affect total cost of ownership beyond the price per part. These include the power demands of the silicon, total bill of materials (BOM) cost, and the productivity of the engineers who design and test the system. It is important to choose an FPGA vendor that has considered all the dimensions of system cost that can factor into a product design cycle.

Altera® Cyclone® V FPGAs help designers reduce total system cost in a number of ways. Designers benefit not only from TSMC’s 28-nm Low Power (28LP) manufacturing process, but also from the architectural decisions that have gone into the Cyclone V device family and the array of powerful productivity-enhancing tools featured in Altera’s design tool ecosystem. With Cyclone V FPGAs, customers not only enjoy the lowest cost of ownership in the industry, but the widest array of low-cost parts available—from 25K logic elements (LE) to 301K LEs—and the only 28-nm solution under 100K LEs.

Introduction

Do more with less—reduce your costs and power, increase your productivity, and make your product run faster. This vexing set of challenges is the reality that design engineers face today. Fortunately, Altera’s 28-nm product portfolio offers a tailored approach to solve these problems.

Utilizing TSMC’s 28LP process plus wire bond packaging, Cyclone V FPGAs provide an unparalleled combination of low system cost, the lowest power of any 28-nm FPGA, and high functionality. The Cyclone V FPGA family comes in six targeted variants: a logic-only (E) variant, a 3G transceiver-based (GX) variant, and a 5G transceiver-based (GT) variant plus SoC derivatives of these variants (SE, SX, and ST, respectively), each with an integrated dual-core ARM® Cortex™-A9 MPCore™ applications-class processor. Each device variant offers an abundance of hard intellectual property (IP) blocks that enable you to differentiate your products and do more with less. Examples of technological advances over previous-generation architectures include adaptive logic modules (ALMs), variable-precision digital signal processing (DSP) blocks, fractional phase-locked loops (fPLLs), and hard memory controllers, just to name a few.

You can enjoy a reduced total cost of ownership with Cyclone V FPGAs as compared to earlier Altera device families and as compared to competing 28-nm FPGAs. Cost benefits stem from TSMC’s 28LP manufacturing process, the rich architectural features available in the device, and Altera’s wide array of productivity-enhancing design tools. Cyclone V FPGAs are the right choice for applications spanning numerous markets, including industrial, communications, military, and automotive.
Leverage TSMC’s 28LP Manufacturing Process to Reduce Design Costs

Altera implemented a two-pronged manufacturing strategy at 28 nm, utilizing TSMC’s 28-nm high performance (28HP) process for systems that demand the highest bandwidth possible and the 28LP process for cost- and power-sensitive applications. Stratix® V FPGAs utilize the 28HP process, while Arria® V and Cyclone V FPGAs both utilize the 28LP process. Lower power, of course, translates to lower operational costs and lower total cost of ownership for any electronic system.

Granularity of process options gives customers a choice of products to deliver the best fit for their needs. TSMC, Altera’s foundry partner, states: “With solutions on multiple technology platforms, customers can enjoy much wider flexibility and deliver products with the best performance.” Offering multiple products is better for customers than the “one-size-fits-all” approach used in competing 28-nm products. With a single manufacturing process, it is simply not possible to optimize for low power and high performance at the same time. Even the best-intentioned binning strategies will not overcome the fact that a single process introduces a performance hit on power-binned parts and a power increase on performance-binned parts. In addition, binning is likely to increase system cost and introduce significant schedule and supply risk to customers.

In contrast, the cost-optimized 28LP process utilized in Cyclone V FPGAs is tailored specifically to low-cost and low-power applications. It minimizes both leakage current and dynamic current through a variety of techniques, including using longer gate channels than in the 28HP process. Cost is also minimized by employing a more conventional metallization scheme than in the 28HP process and by using wirebond packaging. Compared with flip chip packaging, wirebond packages can save customers up to approximately $5 per part. Altera’s transceiver design expertise is reflected in the high reliability and low power of its high-speed serial interfaces. In early power estimation benchmarks, Cyclone V FPGAs have demonstrated considerable power savings as compared to Cyclone IV FPGAs (Figure 1) and up to 40 percent total power savings over Xilinx Artix-7 FPGAs (Figure 2).

Figure 1. Estimated Cyclone V FPGA Power Savings vs. Previous Generation Technology
Altera’s Wide Range of Low-Cost 28-nm Options Add Design Flexibility

From a system design perspective, it is advantageous to have a wide selection of device densities available within a given FPGA family. Cyclone V FPGAs are the clear leader in the low-cost 28-nm device market with family members ranging in size from 25K LEs to 301K LEs. This gives designers the opportunity to design into smaller parts and migrate up if the product scope expands. Likewise, it gives them the opportunity to use a smaller device if the design is scaled back. Ordinarily, switching device families in the middle of a design cycle to handle these types of engineering change orders (ECOs) is costly in terms of time and resources. With abundant vertical migration options across the Cyclone V family, Altera offers the most comprehensive and cost-effective range of low-cost FPGA device options.

Figure 2. Estimated Cyclone V FPGA Power Savings vs. Competing Devices on a Broadcast Market Design, Assuming Worst Case Process

Note to Figure 2:
(1) Based on Altera’s EPE v11.1SP2B5 and Xilinx’s XPE13.4.
Cyclone V FPGA Architecture Cuts Design Costs

Altera’s 28-nm architecture cuts design cost in a variety of ways. The core fabric maximizes logic efficiency and provides the tightest interconnect available today. The hard IP enables high performance and flexibility with minimal design time. Optimized transceivers provide best-in-class signal integrity while minimizing debug time. Using only two voltage rails makes the power distribution network cheaper and easier to design. The fPLLs allow the synthesis of any frequency clock without expensive oscillators, and intelligent pin placement improves device routability and signal integrity.

Core Fabric and Routing Maximizes Logic Efficiency

Cyclone V FPGAs utilize an innovative core fabric to efficiently implement both logic and DSP functions. Because of improved logic utilization, these enhancements can save designers up to an estimated $20 per part compared to previous-generation technology.

The basic building block of the Cyclone V architecture is the ALM. It consists of an 8-input fracturable look-up table (LUT) plus two adders and four registers—all tightly packed for high performance and to make optimal use of silicon real estate. This architecture is similar to Altera’s high-end devices and constitutes an evolution from Cyclone IV FPGAs, where the fundamental building block is the LE, which has a 4-input LUT and a single register. The ALM, with its tight packing, not only increases the cost-effectiveness of the silicon, but makes timing closure easier, particularly in register-rich and heavily pipelined designs. There are up to 301K-LE equivalents in the Cyclone V family, organized into vertically-adjacent logic array blocks (LABs) with 10 ALMs per LAB. The ALMs are automatically configured by the fitter (provided by Altera’s Quartus® II development software) to implement pure combinational or arithmetic functions as required by the application.
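To make the LUT terminology concrete, the following minimal Python sketch models a k-input LUT as a truth table with 2^k entries, which is the general idea behind the 4-input LUT in the Cyclone IV LE. It is an illustration only: the class name `Lut` and the example masks are hypothetical, and the sketch does not attempt to model the ALM's fracturing, adders, or registers.

```python
class Lut:
    """Behavioral model of a k-input look-up table (illustrative only)."""

    def __init__(self, num_inputs, mask):
        self.num_inputs = num_inputs
        # The mask holds one output bit per input combination (2**k bits).
        if not 0 <= mask < (1 << (1 << num_inputs)):
            raise ValueError("mask does not fit a LUT of this size")
        self.mask = mask

    def evaluate(self, *inputs):
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        # Pack the input bits into an index; inputs[0] is the least significant bit.
        index = 0
        for position, bit in enumerate(inputs):
            index |= (bit & 1) << position
        return (self.mask >> index) & 1


# A 4-input LUT programmed as a 4-input AND: only index 0b1111 returns 1.
and4 = Lut(4, 1 << 15)
# A 4-input LUT programmed as a 4-input XOR (odd parity of the inputs).
xor4 = Lut(4, 0x6996)
```

Any 4-input Boolean function fits in the same structure by changing the 16-bit mask, which is why LUT-based fabric is function-agnostic; wider LUTs simply use a longer mask.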
Cyclone V FPGAs feature a new embedded memory block, the M10K. This memory block is smaller than embedded memory blocks in competing architectures, which translates to higher granularity, more memory ports per silicon area, and fewer wasted blocks. This on-chip memory architecture is well suited to DSP-intensive applications, such as motor control, studio equipment, and 3D television. For efficient and low-cost handling of wide shallow buffers and delay elements, Cyclone V devices also offer the smaller 640-bit MLAB block.
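The granularity argument can be illustrated with simple arithmetic. The sketch below counts how many blocks a buffer occupies and how many bits go unused for different block sizes. It assumes an M10K holds 10,240 bits, uses a hypothetical 36,864-bit block as a stand-in for a larger memory block, and ignores the width and depth configurations that a real fitter must respect; `blocks_needed` is an illustrative name, not a tool function.

```python
import math

M10K_BITS = 10 * 1024         # assumed capacity of one M10K block
LARGE_BLOCK_BITS = 36 * 1024  # hypothetical larger block, for comparison only
MLAB_BITS = 640               # MLAB capacity stated in the text


def blocks_needed(buffer_bits, block_bits):
    """Return (block count, unused bits) for a buffer packed into equal blocks."""
    count = math.ceil(buffer_bits / block_bits)
    unused = count * block_bits - buffer_bits
    return count, unused


# Example: a 12-Kb line buffer mapped to each block size.
buffer_bits = 12 * 1024
m10k_count, m10k_unused = blocks_needed(buffer_bits, M10K_BITS)
large_count, large_unused = blocks_needed(buffer_bits, LARGE_BLOCK_BITS)

# Example: a 512-bit delay line mapped to an MLAB.
mlab_count, mlab_unused = blocks_needed(512, MLAB_BITS)
```

Under these assumptions the 12-Kb buffer leaves 8,192 bits unused in two M10K blocks versus 24,576 bits unused in one larger block, and the 512-bit delay line leaves only 128 bits unused in a single MLAB.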

Cyclone V FPGAs also employ high-performance variable-precision DSP blocks. Altera’s innovative DSP blocks, with dedicated coefficient banks and a feedback path for finite impulse response (FIR) filters, allow designers to independently configure the precision of each multiplier from 9x9 to 27x27 bits depending on the needs of the application. This capability enables Cyclone V FPGAs to deliver the appropriate precision multiplier for the job at hand, enabling designers to realize the most efficient hardware implementation possible.

For example, a simple video processing application may require only 9-bit precision, while a high-end color system may require 24 bits. A single variable-precision block can efficiently address this full range. In the case of 9-bit video, a single block can fracture to support three 9-bit multipliers, tripling the DSP block efficiency. This allows designers to adapt FPGA resources to their algorithms rather than adapting the algorithm to fixed resources.
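As a rough illustration of the fracturing arithmetic, the sketch below estimates how many variable-precision DSP blocks a set of multipliers occupies at different precisions. The three-per-block figure for 9-bit multipliers comes from the text; the two-per-block figure for 18-bit operands and one-per-block figure for 27-bit operands are assumptions, and the function name is hypothetical. Consult the device handbook for the actual supported modes.

```python
import math

# Multipliers available per DSP block at each operand width.
# 9-bit: three per block (from the text). 18-bit and 27-bit: assumed values.
MULTIPLIERS_PER_BLOCK = {9: 3, 18: 2, 27: 1}


def dsp_blocks_for(num_multipliers, operand_bits):
    """Estimate DSP blocks for `num_multipliers` multipliers of a given width."""
    # Pick the narrowest supported mode that fits the operand width.
    for width in sorted(MULTIPLIERS_PER_BLOCK):
        if operand_bits <= width:
            per_block = MULTIPLIERS_PER_BLOCK[width]
            return math.ceil(num_multipliers / per_block)
    raise ValueError("operand width exceeds 27 bits; not covered by this sketch")


# A 12-tap FIR filter needs 12 multipliers (ignoring symmetry optimizations).
blocks_9bit = dsp_blocks_for(12, 9)    # 9-bit video
blocks_24bit = dsp_blocks_for(12, 24)  # 24-bit color, mapped to the 27x27 mode
```

With these assumed mode counts, the 12-tap filter occupies 4 blocks at 9-bit precision and 12 blocks at 24-bit precision, which is the threefold efficiency difference the paragraph describes.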
Hard IP Enables High Performance and Flexibility While Minimizing Design Time

Altera has hardened certain commonly used IP blocks (e.g., double data-rate memory controllers, protocol stacks, and even embedded ARM processors) into fixed silicon to enhance performance, lower power, and reduce costs by freeing up valuable programmable logic resources for use in other logic functions. As an example, a PCI Express® (PCIe®) protocol stack, which requires around 150K LEs as a soft implementation, requires as little as one-third the device area in hardened blocks. Customers who have tried implementing a PCIe core in competing technologies and tools report that they can save up to six weeks of design and debug time on average by using Altera’s hard IP in conjunction with the Qsys system integration tool. This translates to significant cost savings for the design team.

Altera also introduced the first PCIe multifunction support in FPGAs. This technology simplifies sharing of PCIe link bandwidth between multiple peripherals. Supporting up to eight functions, PCIe multifunction support integrates multiple single-function endpoints into one multifunction endpoint. This shortens development time and can save up to 20K LEs.

PCIe multifunction support gives designers a great way to customize industry-standard processors with completely unique peripheral sets that reside within the FPGA logic. In addition, it enables designers to use standard operating system (OS) software drivers to share PCIe link bandwidth among the peripherals in the FPGA. Without multifunction support, it is a major development effort to customize software drivers to achieve this type of resource sharing. Moreover, multifunction support can also potentially reduce costs by eliminating the need for multiple soft or hard PCIe cores, which are instead integrated into a single multifunction PCIe endpoint.

Hard IP first appeared in Altera’s 40-nm devices at the PHY layer, eliminating the need for external high-performance serial I/O board components. In Altera’s 28-nm devices, embedded hard IP blocks provide a measure of ASIC cost, performance, and power characteristics without compromising design flexibility. For instance, the PCIe hard IP block can be configured to support PCIe Gen1 or Gen2 in Cyclone V GT devices. In addition, Cyclone V FPGAs offer up to two hard PCIe cores, double that of competing devices. As an added benefit, hard IP blocks also consume up to 65 percent less power while offering up to 50 percent higher performance than soft logic implementations. Table 1 lists the hard IP functions in Cyclone V FPGAs and the amount of resources saved via hard implementation.

Table 1. Hard IP Functions in Cyclone V FPGAs

<table>
<thead>
<tr>
<th>Hard IP Block</th>
<th>FPGA Resources Saved per Block</th>
</tr>
</thead>
<tbody>
<tr>
<td>32-bit DDR3/DDR2 memory controller with ECC, command or data</td>
<td>&gt;40K LEs and 45 M10K blocks</td>
</tr>
<tr>
<td>PCIe Gen1 and Gen2</td>
<td>&gt;10K LEs</td>
</tr>
<tr>
<td>PCIe Multifunction</td>
<td>&gt;20K LEs</td>
</tr>
<tr>
<td>ARM Cortex-A9 MPCore processor and peripherals</td>
<td>&gt;40K LEs</td>
</tr>
</tbody>
</table>
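The lower bounds in Table 1 can be added up to estimate how much programmable logic a design frees by using hard IP. The sketch below is a back-of-the-envelope tally only: the dictionary keys are hypothetical labels, the figures are the ">" lower bounds from Table 1, and it assumes the savings are simply additive.

```python
# Lower-bound LE savings per hard IP block, taken from Table 1.
LES_SAVED = {
    "ddr_memory_controller": 40_000,
    "pcie_gen1_gen2": 10_000,
    "pcie_multifunction": 20_000,
    "arm_cortex_a9_and_peripherals": 40_000,
}


def min_les_saved(blocks_used):
    """Sum the Table 1 lower bounds for a mapping of block name -> instance count."""
    return sum(LES_SAVED[name] * count for name, count in blocks_used.items())


# Example: one hard memory controller plus two hard PCIe cores.
example = min_les_saved({"ddr_memory_controller": 1, "pcie_gen1_gen2": 2})
```

In this example the design frees at least 60K LEs, which can matter when choosing between adjacent device densities.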
Proven Transceivers Optimized for Various Data Rates Minimize Debug Time

Altera’s 28-nm portfolio introduces a modular transceiver that enables designers to match device performance with the application. The transceivers use the same base architecture across all of Altera’s 28-nm FPGA families, where maximum operating speeds range from 3.125 Gbps to 28 Gbps. As in Stratix V and Arria V devices, Cyclone V transceivers can be dynamically switched between several different speed settings to draw less power at reduced speeds. This selectability provides a way to reduce average system power consumption by operating the transceivers at minimal speed when idle and ramping to higher speeds as needed.

If an application, such as I/O expansion, requires 5 Gbps or less of transceiver performance, then it does not require the power and cost of the large transistors needed for 28-Gbps operation. Rather, the design could be perfectly suited for the Cyclone V FPGA family, as the transceivers achieve 3.125- and 5-Gbps performance at the lowest power and cost. Like the transceivers in Stratix V and Arria V FPGAs, Cyclone V FPGA transceivers support a wide variety of protocols, including 3G SDI, Gigabit Ethernet (GbE), CPRI, DisplayPort, PCIe, Serial ATA (SATA), and Serial RapidIO®. Altera transceivers’ high signal integrity and real-time debug capability via the Transceiver Toolkit can shave weeks off board bring-up and debug time.

For more information about the Transceiver Toolkit, refer to the Transceiver Toolkit page of the Altera website.

Only Two Voltage Rails Simplifies Power Distribution and Reduces Cost

Cyclone V FPGAs require the fewest voltage rails of any low-cost FPGA. They have built-in on-chip voltage regulators so that you can use as few as two voltage rails to support both logic and transceiver power. This reduces the need for on-board voltage regulators and can simplify board design by easing routing congestion and reducing the number of board layers needed. Competing devices require a minimum of three voltage rails to support the core, I/O, and transceiver logic. A single extra power rail adds an estimated cost of $10 to $30 to your board development budget because of the additional components needed, the PCB real-estate required, and the routing congestion penalties that will be incurred.

fPLLs Synthesize Any Frequency and Replace Extra Oscillators

Altera’s 28-nm devices implement general-purpose phase-locked loops as fPLLs, which are capable of advanced fractional frequency synthesis in addition to M/N frequency realization. In standard PLLs, both M and N values are integers. Altera utilizes a delta-sigma modulator in conjunction with 32-bit M and N values in the feedback path to allow the feedback M divider to take on fractional values. This allows precision frequency synthesis. With the capability to synthesize any clock frequency, fPLLs can replace extra oscillators on your board, reducing board costs and space.
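The difference between integer and fractional synthesis can be sketched numerically. The snippet below assumes the common fractional-N relationship f_out = f_ref × (M + K / 2^32) / N, where K is a 32-bit fractional word applied through the delta-sigma modulator. The exact divider structure and legal ranges of the Cyclone V fPLL are not given in this paper, so the formula, the function names, and the example frequencies are illustrative assumptions.

```python
from fractions import Fraction

FRAC_BITS = 32  # width of the fractional word (assumed from the 32-bit values in the text)


def fractional_divider(f_ref_hz, f_target_hz, n=1):
    """Return (M, K) so that f_ref * (M + K / 2**32) / n approximates f_target."""
    ratio = Fraction(f_target_hz) * n / Fraction(f_ref_hz)
    m = int(ratio)  # integer part of the feedback divider
    k = round((ratio - m) * (1 << FRAC_BITS))  # fractional part as a 32-bit word
    if k == 1 << FRAC_BITS:  # rounding carried into the integer part
        m, k = m + 1, 0
    return m, k


def synthesized_hz(f_ref_hz, m, k, n=1):
    """Frequency produced by the divider settings (exact rational arithmetic)."""
    return Fraction(f_ref_hz) * (m + Fraction(k, 1 << FRAC_BITS)) / n


# Example: derive 148.5 MHz (a common video clock) from a 50 MHz reference.
m, k = fractional_divider(50_000_000, 148_500_000)
error_hz = abs(synthesized_hz(50_000_000, m, k) - 148_500_000)
```

With N = 1, an integer-only feedback divider could reach only 100 MHz or 150 MHz from this reference, whereas the fractional word brings the synthesized frequency to within a small fraction of a hertz of the target in this model.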
Intelligent Pin Placement Improves Routability and Reduces Debug Time

Cyclone V FPGAs provide the best signal integrity at the lowest development cost. A regular checkerboard power and ground pattern is employed for easy layout. In addition, transceiver placement is regular and repeating along the left edge of the device with the receiver always on the outside for the best signal integrity. Memory I/O pins are also placed away from and shielded from the transceivers. Altera’s approach is to reduce time-consuming debug efforts by avoiding pin-placement issues from the outset.

Altera System Design Tools Reduce Total Cost of Ownership

Altera’s integrated design environment, featuring the Quartus II software, provides the FPGA industry’s most advanced set of tools to reduce development costs and time to market. The Quartus II software lets you quickly and efficiently design an entire FPGA, from concept to production. It offers an ASIC-like timing closure tool (TimeQuest Timing Analyzer) and plenty of in-system debug capabilities. Its productivity-enhancing features include the Qsys system integration tool, the System Console, and the Transceiver Toolkit, as well as DSP Builder and the SoC Virtual Target software platform.

System Integration with Qsys

Qsys is the next-generation SOPC Builder tool and is intended to help designers build and scale systems. Qsys enables fast integration of user-created and off-the-shelf IP blocks, which accelerates your design flow and increases your productivity. In addition, Qsys supports hierarchical design, which eases the management of large designs (for example, implementing and testing systems with hundreds of components) and facilitates design reuse. It also offers an interconnect with up to twice the push-button performance of SOPC Builder, based on a network-on-a-chip architecture and automatic pipelining. Ultimately, Qsys helps designers shave months off their development time, implementing cores like PCIe in days rather than weeks.

System Exploration and Debug with System Console

System Console is a utility that enables users to debug an FPGA in real time at a high level of abstraction using system-level transactions. It provides a convenient software application programming interface (API) that can be scripted or run interactively from a command shell or the System Console graphical user interface (GUI). System Console is particularly useful for tasks like board bring-up, where its ability to access and control FPGA hardware over JTAG or TCP/IP can save designers weeks of effort.
DSP Application Design Using the DSP Builder Featuring the Advanced Blockset

DSP Builder lets you design FPGAs in the world’s premier DSP design tool, MATLAB® Simulink®. This design tool lets you stay in your familiar EDA environment, design with an easy-to-understand schematic entry tool, and automatically generate synthesizable RTL code to target an Altera FPGA. You can even compile the design in the Quartus II software directly from the MATLAB environment, allowing you to build FPGA designs without prior knowledge of Verilog or VHDL. This seamless integration from the electronic system level (ESL) design environment to the FPGA design environment can save the design team considerable investment in personnel and FPGA design expertise.

DSP Builder offers two main Simulink plug-ins, the basic and advanced blocksets, that let you drop in components, stitch them together, and simulate. Both blocksets let you drop synthesizable components into the Simulink schematic browser. With the advanced blockset, DSP Builder will automatically pipeline your data path to meet your $f_{\text{MAX}}$ objective and re-use blocks whenever possible.

The SoC Virtual Target

The Altera SoC FPGA Virtual Target provides a fast functional simulation of a dual-core ARM Cortex-A9 MPCore embedded processor development system found in the Cyclone V SoC FPGAs. This complete prototyping tool, which models a real development board, runs on a PC and boots the Linux operating system out of the box. Designed to be binary- and register-compatible with the real hardware that it simulates, the Virtual Target enables the development of device-specific production software that can run unmodified on real hardware. Using a virtual prototyping tool, you can jump-start your software development well in advance of hardware availability, make your software team more productive, and improve your software quality.

To fully represent Altera’s SoC FPGA devices, the Virtual Target also features an FPGA extension to the PC-based simulation called FPGA-in-the-loop. As shown in Figure 5, the FPGA-in-the-loop extension allows the Virtual Target to interface to an Altera off-the-shelf FPGA development board, where you can implement your custom IP and coexecute it with the other components of the Virtual Target. This feature allows you to test your software with FPGA hardware, such as custom peripherals and hardware accelerators.
Integration Example—Automotive Analytics with Cyclone V FPGAs

Cyclone V FPGAs are ideal for numerous applications. One application that is gaining momentum is automotive analytics. Cyclone V FPGAs’ low cost of ownership and high functionality fit ideally in this application space. Hardware features like hard memory controllers, high-speed serial transceivers, fPLLs, and abundant internal logic and memory resources can all be exploited by the computationally and memory-intensive processes required in serial video data processing.

In addition, Altera’s Video and Image Processing (VIP) Suite lets users easily develop complex video processing systems in Qsys. Figure 6 shows an example of video data integration inside an automobile. Cyclone V FPGAs can be used very effectively in this environment because they provide high-definition capability and other video processing features like scaling and object detection inexpensively and with low power dissipation.
Conclusion

Cyclone V FPGAs reduce the overall cost of ownership. TSMC’s 28LP process, which is designed for the lowest power possible, is also the lowest cost 28-nm manufacturing process. Low power dissipation translates to improved system reliability and system life and to lower overall operational costs throughout the customer value chain. In addition, Cyclone V FPGAs have numerous architectural advantages that contribute to lower system cost, including hard memory controllers, highly efficient logic and routing resources, fPLLs, variable-precision DSP blocks, and minimal voltage-rail requirements. Finally, the Quartus II software with Qsys and System Console, DSP Builder, and the SoC Virtual Target platform make designing for Cyclone V FPGAs easy and efficient. Altera silicon and design tools work in concert to provide the lowest total cost of ownership to the FPGA designer.
Further Information

- White Paper: *Accelerating DSP Designs with the Total 28-nm DSP Portfolio*
- White Paper: *Optimize Power and Cost with Altera’s Diversified 28-nm Device Portfolio*
- White Paper: *Achieving Lowest System Cost with Midrange 28-nm FPGAs*
- White Paper: *Decrease Total System Costs with the Industry’s Lowest Cost, Lowest Power FPGAs*
- Documentation: Cyclone V Devices
  www.altera.com/literature/lit-cyclone-v.jsp
- Training and Events: Online and instructor-led training classes
  www.altera.com/education/edu-index.html
- 28 Nanometer Process Technology, TSMC

Acknowledgements

- David Olsen, Product Marketing Manager, MTS—Low Cost Products, Altera Corporation

Document Revision History

Table 2 lists the revision history for this document.

<table>
<thead>
<tr>
<th>Date</th>
<th>Version</th>
<th>Changes</th>
</tr>
</thead>
<tbody>
<tr>
<td>April 2012</td>
<td>1.1</td>
<td>Added “System Integration with Qsys”.</td>
</tr>
<tr>
<td>March 2012</td>
<td>1.0</td>
<td>Initial release.</td>
</tr>
</tbody>
</table>