HSDPA—Low-Cost FPGA Co-Processor for Channel Coding
High-speed downlink packet access (HSDPA) is an evolution of wideband code division multiple access (W-CDMA) technology and has been standardized in the 3GPP W-CDMA Release 5 specifications. Targeted at mobile multimedia applications, HSDPA achieves reduced delays and peak data rates of up to 14 Mbps in the downlink (i.e., from the base station to the mobile terminal). This is made possible by the addition of a new high-speed downlink shared channel along with three fundamental technologies that rely on rapid adaptation of transmission parameters to the instantaneous channel conditions:
- Adaptive modulation and coding (AMC)
- Fast hybrid automatic-repeat-request (ARQ)
- Fast scheduling
HSDPA Channel Coding Implementation Issues
HSDPA channel coding involves rate-one-third turbo encoding and other functions such as cyclic redundancy check (CRC), rate matching and interleaving (shown in Figure 1).
Figure 1. Channel Coding Scheme in HSDPA

The turbo encoder consists of two recursive convolutional encoders and an internal interleaver. While the convolutional encoders are simple to implement in both hardware and software, the interleaver tends to be complex due to its variability. Any block size from 40 to 5114 bits must be supported, and the block size can change every transmission time interval (TTI) of 2 ms. Recomputing the interleaver for each new block size is a significant computational burden for a digital signal processor and adds to the latency, which is a critical parameter in HSDPA.
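The encoder structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the FPGA implementation: it assumes the 8-state constituent polynomials from the 3GPP channel coding specification (feedback 1 + D^2 + D^3, feedforward 1 + D + D^3), omits trellis termination, and takes the interleaver as a caller-supplied permutation `perm` (a hypothetical stand-in for the real internal interleaver, whose table must be recomputed for each block size).

```python
from typing import List, Sequence, Tuple


def rsc_parity(bits: Sequence[int]) -> List[int]:
    """Parity stream of one 8-state recursive systematic convolutional encoder.

    Assumed polynomials: feedback g0(D) = 1 + D^2 + D^3,
    feedforward g1(D) = 1 + D + D^3. Trellis termination is omitted.
    """
    s1 = s2 = s3 = 0  # shift register; s1 holds the newest value
    parity = []
    for x in bits:
        a = x ^ s2 ^ s3             # feedback sum (taps D^2 and D^3)
        parity.append(a ^ s1 ^ s3)  # feedforward sum (taps 1, D and D^3)
        s1, s2, s3 = a, s1, s2      # shift the register by one stage
    return parity


def turbo_encode(bits: Sequence[int], perm: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Rate-1/3 output as (systematic, parity 1, parity 2) triples, without tail bits.

    `perm` is a placeholder interleaver: output position i reads input perm[i].
    """
    if sorted(perm) != list(range(len(bits))):
        raise ValueError("perm must be a permutation of the block indices")
    interleaved = [bits[i] for i in perm]
    return list(zip(bits, rsc_parity(bits), rsc_parity(interleaved)))
```

The two `rsc_parity` calls are cheap and regular. The cost that varies with block size sits entirely in building `perm`, which is the part the text identifies as the burden on the digital signal processor.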
An alternative to using a digital signal processor to perform this function is to download blocks of data to a turbo encoder accelerator function implemented on an FPGA. This removes the need to calculate the look-up table (LUT) content for the interleaver, and also takes the highly repetitive encoding task off the digital signal processor, freeing up bandwidth for the other operations the digital signal processor has to perform.
Channel Coding Acceleration with Altera FPGAs
This section describes the efficient implementation of channel coding functions using cost-effective Cyclone™ devices from Altera.
Integrated Channel Coding Solution
In addition to turbo encoding, other functions such as CRC generation, code-block segmentation, rate matching, interleaving, and symbol mapping can also be efficiently implemented on a single Cyclone EP1C12 FPGA. This not only removes the computational burden for highly repetitive instructions from the digital signal processor, but also reduces the required data bus bandwidth. As data passes through the channel coding chain, shown in Figure 1, the number of bits increases. If data is downloaded at the very beginning of the chain, the smallest number of bits must be transferred from the digital signal processor to the accelerator. Table 1 lists the estimated number of logic elements (LEs) and memory bits required to implement each of the channel coding functions. The total computational requirement is well within the capacity of a single Cyclone EP1C12 device.
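To make the bandwidth argument concrete, the following sketch estimates how the bit count grows between the start of the chain and the turbo encoder output. The 5114-bit maximum block size comes from the text. The 24-bit CRC and the 12 tail bits per code block are assumptions taken from the 3GPP channel coding specification and do not appear in this article. Rate matching, which later adjusts the count to fit the physical channel, is not modeled.

```python
import math

MAX_BLOCK = 5114  # largest turbo code block size, from the text
CRC_BITS = 24     # assumption: CRC length attached to the transport block
TAIL_BITS = 12    # assumption: trellis termination bits per code block


def bits_along_chain(transport_block_bits: int) -> dict:
    """Bit counts at three points of the chain, before rate matching."""
    if transport_block_bits <= 0:
        raise ValueError("transport block must contain at least one bit")
    after_crc = transport_block_bits + CRC_BITS
    blocks = math.ceil(after_crc / MAX_BLOCK)      # code block segmentation
    block_size = math.ceil(after_crc / blocks)     # equal-sized blocks, padded up
    after_turbo = blocks * (3 * block_size + TAIL_BITS)  # rate 1/3 plus tail
    return {"input": transport_block_bits,
            "after_crc": after_crc,
            "after_turbo": after_turbo}


counts = bits_along_chain(10000)
print(counts, round(counts["after_turbo"] / counts["input"], 2))
```

Under these assumptions, transferring the data at the start of the chain moves roughly one third of the bits that would cross the bus after turbo encoding.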
Table 1. Computational Requirements for Integrated Solution
| Function | LEs | Memory Bits |
|---|---|---|
| CRC Attachment | 50 | 0 |
| Bit Scrambling | 30 | 0 |
| Code Block Segmentation | 300 | 0 |
| Turbo Encoding | 2,100 | 30,000 |
| Physical Layer Hybrid-ARQ Functionality | 1,400 | 30,000 |
| Physical Channel Segmentation | 100 | 0 |
| High-Speed Downlink Shared Channel Interleaving | 500 | 30,000 |
| Constellation Rearrangement for 16QAM | 100 | 0 |
| Physical Channel Mapping | 50 | 0 |
| Parameter Calculations | 4,000 | 10,000 |
| All Functions Together | 8,630 | 100,000 |
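The totals in Table 1 can be checked with a short script. The per-function figures are copied from the table. The EP1C12 capacity values (12,060 LEs and 239,616 memory bits) are not stated in this article and are an assumption that should be confirmed against the Cyclone device data sheet.

```python
# (logic elements, memory bits) per function, copied from Table 1.
TABLE_1 = {
    "CRC Attachment": (50, 0),
    "Bit Scrambling": (30, 0),
    "Code Block Segmentation": (300, 0),
    "Turbo Encoding": (2100, 30000),
    "Physical Layer Hybrid-ARQ Functionality": (1400, 30000),
    "Physical Channel Segmentation": (100, 0),
    "High-Speed Downlink Shared Channel Interleaving": (500, 30000),
    "Constellation Rearrangement for 16QAM": (100, 0),
    "Physical Channel Mapping": (50, 0),
    "Parameter Calculations": (4000, 10000),
}

# Assumed EP1C12 capacity; not given in the article, verify before relying on it.
EP1C12_LES = 12060
EP1C12_MEMORY_BITS = 239616


def total_requirements(table: dict) -> tuple:
    """Sum the LE and memory-bit columns over all functions."""
    les = sum(le for le, _ in table.values())
    bits = sum(mem for _, mem in table.values())
    return les, bits


les, bits = total_requirements(TABLE_1)
print(f"LEs: {les} ({les / EP1C12_LES:.0%} of assumed capacity)")
print(f"Memory bits: {bits} ({bits / EP1C12_MEMORY_BITS:.0%} of assumed capacity)")
```

The sums reproduce the "All Functions Together" row, and under the assumed capacity both resources leave headroom on a single device.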
Adaptive Parameter Calculations on Nios Processor
The physical layer Hybrid-ARQ functionality performs rate matching in two stages. Each stage requires parameter calculations that determine whether puncturing or repetition is needed and to what extent. In addition, other variable parameters, such as the block size of the turbo encoder and the parameters for physical channel segmentation, also need to be computed. The algebraic computations involved in these parameter calculations can be efficiently implemented on the flexible Nios® embedded processor. This gives designers the flexibility and portability of high-level software design while maintaining the performance benefits of parallel hardware operations in FPGAs.
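As an illustration of how a few computed parameters drive puncturing or repetition, the sketch below uses an error-accumulator pattern in the style of 3GPP rate matching: an error value `e` is decremented for every input bit and, whenever it drops to zero or below, a bit is punctured or repeated. The parameter choices here (`e_plus` equal to the input length, `e_minus` equal to the size of the length difference, `e_ini` defaulting to 1) are simplifying assumptions for a single stage. The standard derives them from redundancy-version settings that are not covered in this article.

```python
from typing import List, Sequence


def rate_match(bits: Sequence[int], target_len: int, e_ini: int = 1) -> List[int]:
    """Puncture or repeat bits evenly so the output has target_len bits.

    Simplified single-stage sketch; e_plus, e_minus and e_ini are assumed values.
    """
    x = len(bits)
    if x == 0 or target_len <= 0:
        raise ValueError("input and target lengths must be positive")
    if not 1 <= e_ini <= x:
        raise ValueError("e_ini must lie between 1 and the input length")
    delta = target_len - x
    if delta == 0:
        return list(bits)

    e_plus, e_minus = x, abs(delta)
    e = e_ini
    out = []
    for b in bits:
        e -= e_minus
        if delta < 0:              # puncturing: drop the bit when e runs out
            if e <= 0:
                e += e_plus
            else:
                out.append(b)
        else:                      # repetition: emit extra copies while e <= 0
            out.append(b)
            while e <= 0:
                out.append(b)
                e += e_plus
    return out


print(rate_match(list(range(10)), 8))  # two of ten positions punctured
print(rate_match([0, 1, 2, 3], 6))     # two positions repeated
```

The per-bit loop is simple and regular, which suits hardware, while choosing `e_ini`, `e_plus`, and `e_minus` for each transmission is the kind of algebraic set-up work the text assigns to the Nios processor.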
Altera FPGA Co-Processor Features
Altera has developed design tools and methodologies that enable companies to develop FPGA co-processing solutions using Altera’s Stratix® II, Stratix, and Cyclone devices. Altera® FPGA co-processors interface with a wide range of digital signal processors and general-purpose processors, providing increased system performance and lower system costs. The high-level architecture for hardware acceleration using Altera’s FPGA co-processors with TI’s digital signal processors is illustrated in Figure 2. The hardware accelerators are direct memory access (DMA)-driven via the TI external memory interface (EMIF), and the data is buffered using first-in first-out (FIFO) buffers.
Figure 2. Altera FPGA Co-Processor Example

Altera Advantage for HSDPA
This section outlines the many advantages of using Altera solutions to implement HSDPA.
Low Cost
A high-end digital signal processor typically costs around $130, with the turbo encoding process alone taking up 30 to 40 percent of its resources. This is very inefficient compared to Altera’s cost-effective Cyclone FPGA, which can handle everything from CRC, turbo encoding, and rate matching to interleaving and quadrature amplitude modulation, all in one Cyclone EP1C12 device that costs around one-fifth the price of a high-end digital signal processor (10K-unit pricing).
Table 2 gives an example of the cost reduction that can be achieved by performing just the turbo encoder accelerator function on the Cyclone platform, as opposed to a high-end digital signal processor.
Table 2. FPGA Accelerator: Cost Analysis Example
| Encoder | Cyclone Device | High-End Digital Signal Processor |
|---|---|---|
| 14.4 Mbps Turbo Encoder | $7.00; 2,544 LE equivalent; based on EP1C3T100C8, 10K units (July 2003) | $29.50; 136/600 MHz; based on 9.7 cycles/bit (1), high-end digital signal processor at 600 MHz, $130 at 10K-unit pricing |
| 58 Mbps Turbo Encoder | $10.50; 2,544 LE equivalent; based on EP1C3T100C6, 10K units (July 2003) | $122.00; 563/600 MHz; based on 9.7 cycles/bit (1), high-end digital signal processor at 600 MHz, $130 at 10K-unit pricing |
Note to Table 2:
1. The source for the high-end digital signal processor figures is the TI web site. The 9.7 cycles/bit figure does not include the calculations necessary for the interleaver table setup when the block size changes every 2 ms.
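The digital signal processor column of Table 2 appears to follow from simple arithmetic: the required clock rate is the data rate times 9.7 cycles/bit, and the cost is the corresponding share of a $130, 600 MHz device. The sketch below reproduces that calculation. The function name and the reading of the table are assumptions, not something the article states.

```python
def dsp_cost_share(rate_mbps: float,
                   cycles_per_bit: float = 9.7,
                   clock_mhz: float = 600.0,
                   price_usd: float = 130.0) -> tuple:
    """Return (MHz of processor load, proportional cost in USD) for a data rate."""
    mhz_needed = rate_mbps * cycles_per_bit   # Mbit/s * cycles/bit = Mcycles/s
    cost = price_usd * mhz_needed / clock_mhz  # pay for the fraction of the clock used
    return mhz_needed, cost


for rate in (14.0, 14.4, 58.0):
    mhz, cost = dsp_cost_share(rate)
    print(f"{rate} Mbps: {mhz:.1f} MHz, ${cost:.2f}")
```

For 58 Mbps this gives about 563 MHz and $122, matching the table. The 136 MHz entry in the first row corresponds to about 14.0 Mbps; at exactly 14.4 Mbps the same formula gives roughly 140 MHz and a slightly higher cost share.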
Flexibility
The channel coding process involves bit-wise operations. This leads to inefficient use of resources when implemented on digital signal processors, which have fixed data-bus widths. The Cyclone device’s M4K memory blocks and the Nios processor can be customized with different data widths, coefficient widths, and precision choices as needed, providing an optimal digital signal processing implementation for the channel coding application.
Co-Processor Features
The overall architectural flexibility of Altera FPGA co-processors enables a system definition that can be either relatively tightly coupled to the master CPU or a loosely coupled data-processing plane that has only minimal set-up and status interaction with the master CPU. This wide variation in capabilities makes Altera FPGA co-processors suitable for systems with a wide range of performance and flexibility requirements.
Development Environment
Altera’s DSP Builder and SOPC Builder development tools and Quartus® II software enable system designers to easily build and interface Altera FPGA co-processing blocks with standard processors. System designers are not required to have a background in register transfer level (RTL) design and do not need to make any changes to the software development environment or the digital signal processor platform.