Multi-Processor Solutions with FPGAs
by Bob Garrett, Senior Marketing Manager, Nios Marketing, Altera Corporation
Q1 2006 Issue
Embedded designers seeking high-performance processing inevitably face the cost, performance, and power "Bermuda triangle," where even the best of intentions can achieve any two of the three key objectives but not all three. Custom ASIC designs suit the few who can afford the time, expense, and risk involved. As device geometries continue to shrink and ASIC design costs continue to grow, fewer and fewer applications can justify the expense of a full-custom design.
FPGA-based embedded systems featuring multiple soft core processors offer a powerful set of new options for the embedded designer. ASIC designers are no longer alone in their ability to configure performance-optimized systems-on-chip with a custom-tailored feature set. Developers can now change the performance characteristics of their embedded system right up to the time the product goes into final test. They can also extend product life cycles, getting to market quickly and then upgrading both software and hardware features remotely over the Internet.
While the term "multi-processor" can conjure up memories of academic papers on "parallel processing," commercial applications of multiple CPUs in a single device are much more straightforward. When starting a new design, developers must meet certain performance criteria. Partitioning duties among multiple soft processors provides not only the ability to keep pace with these criteria but also the design flexibility to adapt to last-minute changes caused by evolving standards or competing products. Designers can use multiple soft processors as a divide-and-conquer strategy, either to increase overall system performance or to offload tasks from an existing processor. Designers typically use a 400- to 800-MHz discrete processor to perform the myriad device tasks required, both simple and demanding. Using multiple soft processors makes more efficient use of processing power by partitioning tasks according to their timing and power requirements, while providing the same or better overall performance.
The number of soft core processors that designers can implement in a single FPGA is limited only by the device's resources (i.e., its logic and memory); high-density FPGAs, for example, can contain hundreds of soft core processors. Designers can also mix different types of soft core processors (e.g., 16- or 32-bit, performance-optimized, or logic-area-optimized processors) in the same device.
The application code can be split among multiple processors according to the tasks involved. Time-critical tasks can be assigned to dedicated processors, while less demanding duties share one or more other CPUs. This flexibility enables a logical grouping of tasks and potentially higher performance, even while running at a reduced clock frequency that lowers system power consumption.
Embedded Processors in FPGAs
Building a custom device containing an exact set of peripherals, memory interfaces, and processing functions is not hard to imagine: ASIC designers have been doing it for years. Efforts to create an economically viable FPGA-based custom embedded processor device were unsuccessful until the late 1990s, when FPGAs finally had enough on-chip memory, programmable logic, and raw performance. Today, embedded intellectual property (IP) functions designed specifically for FPGAs (including CPUs, signal processing engines, peripherals, and standard communications interfaces) are readily available and offer both cost and performance benefits over traditional discrete embedded devices.
Essentially, designers partition the problem the same way they would when building a multi-processor system on a printed circuit board (PCB), with each processor assigned to a specific task. For example, one processor might perform general system housekeeping, such as monitoring cabinet fans, the man-machine interface, or the system console, while the others handle communications, signal processing, statistics gathering, or other system tasks.
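Because such a partition is fixed at design time, it can be captured as a simple table that each processor consults at startup. The C sketch below is illustrative only: the processor IDs, task names, and the `owner_of` and `task_count` helpers are hypothetical and are not part of any Altera or Nios API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical processor IDs for a three-CPU FPGA design. */
typedef enum { CPU_HOUSEKEEPING, CPU_COMMS, CPU_DSP, CPU_COUNT } cpu_id;

typedef struct {
    const char *task;  /* task name (illustrative) */
    cpu_id      cpu;   /* processor that owns the task */
} task_assignment;

/* Static partition fixed at design time, as it would be on a multi-CPU board. */
static const task_assignment partition[] = {
    { "fan_monitor",      CPU_HOUSEKEEPING },
    { "operator_console", CPU_HOUSEKEEPING },
    { "statistics",       CPU_HOUSEKEEPING },
    { "packet_io",        CPU_COMMS },
    { "filtering",        CPU_DSP },
};

#define NUM_TASKS (sizeof partition / sizeof partition[0])

/* Returns the processor that owns the named task, or -1 if it is unassigned. */
int owner_of(const char *task)
{
    for (size_t i = 0; i < NUM_TASKS; i++)
        if (strcmp(partition[i].task, task) == 0)
            return (int)partition[i].cpu;
    return -1;
}

/* Counts the tasks a given processor's main loop would schedule. */
size_t task_count(cpu_id cpu)
{
    assert(cpu < CPU_COUNT);
    size_t n = 0;
    for (size_t i = 0; i < NUM_TASKS; i++)
        if (partition[i].cpu == cpu)
            n++;
    return n;
}
```

Each processor's main loop would then run only the tasks whose entry names its own ID, so reassigning a task starts with a one-line edit to the table.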
The multiple-processor approach can reduce overall device cost by moving individual processors from the board into the FPGA, which shrinks the board. Integration also means less signal routing between processors, because fewer interconnects are required, and it lets more low-level processors run at lower clock frequencies; both reduce the number of layers on the circuit board. See Figure 1.
Figure 1. Multiple Independent Processors

This approach can also reduce software design costs, which represent 80 percent of overall system design expenses because writing code is so time-consuming. If the task can be partitioned across multiple processors, engineers find the code easier to write, debug, and maintain. This potentially represents huge back-end savings: faster code development and debugging and, as the product matures, easier maintenance because the code is much easier to analyze.
Multi-Channel Applications
Multi-channel applications can be scaled to meet system throughput requirements by using multiple processors in a single chip, each dedicated to handling a portion of the overall channel throughput. Each processor may run exactly the same code, or some may change algorithms on the fly to adapt to system requirements. In some cases, a master processor is added to handle general housekeeping chores such as system initialization, statistics gathering, and error handling. See Figure 2.
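One simple way to divide channels among identical worker processors is round-robin assignment, with one additional processor acting as master. The sketch below assumes processor 0 is the master and processors 1 through `num_workers` are workers; the numbering scheme and both function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical layout: processor 0 is the master (initialization, statistics,
 * error handling); processors 1..num_workers each own a share of the channels. */
#define MASTER_CPU 0u

/* Maps a channel to the worker processor that services it (round-robin). */
unsigned worker_for_channel(unsigned channel, unsigned num_workers)
{
    assert(num_workers > 0);
    return 1u + channel % num_workers;   /* offset by 1 to skip the master */
}

/* Number of channels a worker handles when num_channels are dealt out
 * round-robin across num_workers workers. */
unsigned channels_per_worker(unsigned worker, unsigned num_channels,
                             unsigned num_workers)
{
    assert(num_workers > 0);
    assert(worker >= 1u && worker <= num_workers);
    unsigned base  = num_channels / num_workers;
    unsigned extra = num_channels % num_workers;
    /* Round-robin gives the first `extra` workers one additional channel. */
    return base + ((worker - 1u) < extra ? 1u : 0u);
}
```

Because every worker derives its channel list from its own ID, all workers can run the same code image. With 10 channels and 4 workers, for instance, workers 1 and 2 each service three channels and workers 3 and 4 each service two.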
Figure 2. Channelized Processing

Serially-Linked Processors
Combining several processors in a chain lets system architects treat each as a stage in a larger processing pipeline. Each CPU, responsible for one piece of the overall processing task, can share data memory (arbitrated or dedicated memory interfaces if off-chip, or dual-ported memory if on-chip) to pass results from the output of one stage to the input of the next. See Figure 3.
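Passing results between two adjacent stages through shared on-chip memory is commonly done with a single-producer, single-consumer ring buffer. The sketch below assumes the `stage_ring` structure sits in dual-ported memory visible to both processors; the type and function names are hypothetical. Each index has exactly one writer, so no lock is needed, but on real hardware with data caches the accesses would also have to bypass or flush the cache, and a memory barrier may be needed between writing a slot and publishing the new index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox in dual-ported memory shared by two pipeline stages.
 * Stage N writes only `head`; stage N+1 writes only `tail`. */
#define RING_SIZE 8u   /* power of two, so unsigned index wraparound is safe */

typedef struct {
    volatile uint32_t head;             /* total items written by the producer */
    volatile uint32_t tail;             /* total items read by the consumer */
    volatile int32_t  data[RING_SIZE];  /* results passed between stages */
} stage_ring;

/* Producer side (output of stage N). Returns false if the ring is full. */
bool ring_put(stage_ring *r, int32_t value)
{
    assert(r != NULL);
    if (r->head - r->tail == RING_SIZE)
        return false;
    r->data[r->head % RING_SIZE] = value;
    r->head = r->head + 1u;   /* publish only after the slot is written */
    return true;
}

/* Consumer side (input of stage N+1). Returns false if the ring is empty. */
bool ring_get(stage_ring *r, int32_t *value)
{
    assert(r != NULL && value != NULL);
    if (r->head == r->tail)
        return false;
    *value = r->data[r->tail % RING_SIZE];
    r->tail = r->tail + 1u;   /* release the slot only after reading it */
    return true;
}
```

Stage N calls `ring_put` with each result and retries (or does other work) when it returns false; stage N+1 calls `ring_get` from its main loop. A full ring therefore applies back-pressure to the upstream stage instead of losing data.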
Figure 3. Pipelined Processing

Processor Companion Chip
Discrete processor and digital signal processing (DSP) chips connected to an FPGA can also benefit from hardware acceleration, peripheral expansion, and interface bridging, regardless of whether a CPU is inside the FPGA. Chip-to-chip interface IP is readily available today to provide external access to peripherals, acceleration logic, and I/O interfaces contained within the FPGA. See Figure 4.
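From the external processor's side, acceleration logic in the FPGA typically appears as a block of memory-mapped registers reached through the chip-to-chip interface. The sketch below uses an invented register layout (`accel_regs`, a START bit, and a DONE bit) and a trivial multiply as the "accelerated" operation; it is not the register map of any real IP core. `accel_model_step` is a hypothetical software stand-in for the FPGA logic so the code can run on a host PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register map of an FPGA accelerator as seen through a
 * chip-to-chip bridge. Layout and bit meanings are illustrative only. */
typedef struct {
    volatile uint32_t operand_a;  /* input 0 */
    volatile uint32_t operand_b;  /* input 1 */
    volatile uint32_t control;    /* bit 0: START */
    volatile uint32_t status;     /* bit 0: DONE */
    volatile uint32_t result;     /* output */
} accel_regs;

#define CTRL_START 0x1u
#define STAT_DONE  0x1u

/* Host-only model of the FPGA logic; real hardware responds on its own. */
void accel_model_step(accel_regs *r)
{
    if (r->control & CTRL_START) {
        r->result  = r->operand_a * r->operand_b;  /* example operation */
        r->control = r->control & ~CTRL_START;
        r->status  = r->status | STAT_DONE;
    }
}

/* External-processor driver: load operands, then set START last. */
void accel_start(accel_regs *r, uint32_t a, uint32_t b)
{
    assert(r != NULL);
    r->status    = 0u;           /* clear any stale DONE flag */
    r->operand_a = a;
    r->operand_b = b;
    r->control   = CTRL_START;   /* written last so operands are in place */
}

/* Non-blocking poll: returns 1 and stores the result once DONE is set. */
int accel_poll(const accel_regs *r, uint32_t *out)
{
    assert(r != NULL && out != NULL);
    if (!(r->status & STAT_DONE))
        return 0;
    *out = r->result;
    return 1;
}
```

On real hardware, `r` would point at the address window the bridge maps into the external processor's address space, and the driver would call `accel_poll` repeatedly (or wait for an interrupt) until DONE is set.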
Figure 4. Co-Processing/Companion Chip

Establishing Processor Performance
Establishing processor performance requirements for embedded systems can be challenging, particularly when the application software is still in flux. Industry-standard benchmarks provide some guidance, but nothing is certain until the software is complete. This uncertainty makes designers wary of underestimating their performance needs, which can lead them to select a higher-performance (and higher-priced) device than necessary. If designers could accurately predict the required performance, processor selection would be much simpler. Such an estimate would account for the performance required by time-critical tasks as well as the load created by one or more low-priority tasks.
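One way to make such an estimate concrete is to sum, over every task, the cycles per invocation times the invocation rate, then add headroom for software that is still changing. The sketch below does only that arithmetic; the `task_load` type, the headroom parameter, and the example figures are assumptions for illustration, and it ignores scheduling overhead, interrupt latency, and memory stalls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task profile; real figures would come from measuring
 * or estimating the actual application software. */
typedef struct {
    double cycles_per_run;  /* CPU cycles per invocation */
    double runs_per_sec;    /* invocation rate in Hz */
} task_load;

/* Estimates the minimum CPU clock in MHz needed to carry all tasks.
 * `headroom` is a safety margin, e.g. 0.25 for 25 percent extra capacity. */
double required_mhz(const task_load *tasks, size_t n, double headroom)
{
    assert(tasks != NULL || n == 0);
    assert(headroom >= 0.0);
    double cycles_per_sec = 0.0;
    for (size_t i = 0; i < n; i++)
        cycles_per_sec += tasks[i].cycles_per_run * tasks[i].runs_per_sec;
    return cycles_per_sec * (1.0 + headroom) / 1e6;
}
```

For example, a time-critical task needing 20,000 cycles at 1,000 Hz plus a low-priority task needing 500,000 cycles at 10 Hz totals 25 million cycles per second, so with 25 percent headroom the estimate is 31.25 MHz. Running the same calculation separately for each processor's task set shows whether a given partition lets every processor run at a reduced clock.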
FPGA-based embedded systems can provide scalable performance, allowing last-minute changes to boost system performance in response to customer demands. Compute-intensive algorithms converted to logic in an FPGA can run orders of magnitude faster than the same algorithms running in software on a microprocessor or digital signal processor. More importantly, hardware resources can be applied where performance-hungry algorithms need them most, potentially reducing the need for a high-performance CPU, lowering clock frequency and power consumption, and simplifying the board design.
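Amdahl's law, a general model of partial acceleration, gives a quick estimate of how far the CPU clock could drop once a hot algorithm moves into FPGA logic. The sketch below assumes the accelerator runs in the CPU's clock domain and that the CPU waits for it to finish; `f` is the fraction of the original execution time spent in the offloaded algorithm and `s` is the hardware speedup on that portion. Both function names are hypothetical.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction f of execution time
 * is accelerated by a factor s. */
double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

/* Clock (MHz) that keeps the same throughput after offloading, assuming
 * the accelerator shares the CPU clock and the CPU waits for it. */
double clock_after_offload(double base_mhz, double f, double s)
{
    assert(base_mhz >= 0.0);
    return base_mhz / amdahl_speedup(f, s);
}
```

For example, if 75 percent of the run time is spent in an algorithm that hardware accelerates 100-fold, a workload that needed 400 MHz in software could run at roughly 103 MHz. The remaining software portion sets a floor: even with unlimited speedup, the clock cannot fall below `base_mhz * (1 - f)`, here 100 MHz.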