CPU Architecture
The first-generation Nios® embedded processor CPU instruction set architecture is optimized for programmable logic and system-on-a-programmable-chip (SOPC) integration. The Nios CPU is a five-stage pipelined general-purpose RISC microprocessor that can be configured with either a 32-bit or a 16-bit data path. Both the 32-bit and 16-bit Nios CPUs use a 16-bit instruction format to reduce code footprint and instruction memory bandwidth. The instruction set is optimized for compiled embedded applications.
This page describes the Nios CPU architecture, including the instruction set, register file, cache memory, exception handling, and hardware acceleration.
The Nios embedded processor implements the CPU with separate data and instruction-memory bus masters, generally known as a modified-Harvard memory architecture. The SOPC Builder system development tool allows users to specify the connections between Avalon™ bus masters and slaves in a system. These slaves may be memories or peripherals.
The Nios instruction bus is a 16-bit wide, latency-aware Avalon master used to fetch instructions from memory. The Nios data bus is 32 or 16 bits wide for, respectively, 32-bit and 16-bit configurations of the Nios CPU. For details about the implementation of the latest version of the Nios CPU, please refer to the Nios CPU Data Sheet (PDF).
Instruction Set
The Nios instruction set is tailored to support compiled C and C++ programs. It includes a standard set of arithmetic and logical operations, along with instructions for bit operations, byte extraction, data movement, and control flow modification. It also includes a small set of conditionally executed instructions, which can be useful in eliminating short conditional branches. The instruction set provides rich addressing modes to reduce code size and increase processor performance.
For more information about the Nios instruction set, refer to the Nios 32-Bit Programmer's Reference Manual (PDF) and the Nios 16-Bit Programmer's Reference Manual (PDF).
Register File
The Nios CPU architecture has a large general-purpose windowed register file, several machine-control registers, a program counter, and the K register that is used for instruction prefixing.
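Instruction prefixing can be pictured as widening a short immediate field with bits held in the K register. The following is only a minimal sketch of that idea: the 11-bit prefix and 5-bit immediate widths, and the names `PREFIX_BITS`, `IMM_BITS`, and `extend_immediate`, are assumptions made for illustration and are not stated on this page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths, chosen for illustration only. */
enum { IMM_BITS = 5, PREFIX_BITS = 11 };

/* Combine a prefix value (as held in K) with the short immediate of the
   following instruction to form one wider immediate value. */
uint16_t extend_immediate(uint16_t k, uint16_t imm)
{
    assert(k < (1u << PREFIX_BITS));   /* prefix must fit its field */
    assert(imm < (1u << IMM_BITS));    /* immediate must fit its field */
    return (uint16_t)((k << IMM_BITS) | imm);
}
```

With a prefix of zero the short immediate passes through unchanged, so in this model only constants that do not fit in the short field would need a prefix.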
The general-purpose registers are 32 bits wide in the 32-bit Nios CPU and 16 bits wide in the 16-bit Nios CPU. The register file size is configurable and contains a total of 128, 256, or 512 registers. The software can access the registers exposed in a 32-register-long sliding window that moves with a 16-register granularity. This sliding window allows fast context switching, accelerating subroutine calls and returns.
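The window arithmetic above can be sketched with a deliberately flat model: a 32-register window that slides over the register file in steps of 16. The function names are hypothetical, and the model ignores any registers or window positions that the real CPU treats specially, so the results are an upper bound derived only from the numbers quoted on this page.

```c
#include <assert.h>

enum { WINDOW_SIZE = 32, WINDOW_STEP = 16 };

/* Number of distinct positions a 32-register window can occupy when it
   slides in 16-register steps over a file of total_registers entries. */
unsigned window_positions(unsigned total_registers)
{
    assert(total_registers >= WINDOW_SIZE);
    return (total_registers - WINDOW_SIZE) / WINDOW_STEP + 1;
}

/* Index of the first register visible at a given window position
   in this flat model. */
unsigned window_base(unsigned position)
{
    return position * WINDOW_STEP;
}
```

In this model a 128-register file gives 7 positions, a 256-register file gives 15, and a 512-register file gives 31. A larger file therefore allows deeper call nesting before the window contents would have to be spilled to memory.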
For more information about the Nios CPU registers, refer to the Nios 32-Bit Programmer's Reference Manual (PDF) and Nios 16-Bit Programmer's Reference Manual (PDF).
Cache Memory
The configurable Nios CPU can optionally contain an instruction cache and a data cache. In general, cache is used to improve CPU performance by providing a local memory system that can respond quickly to CPU-generated bus transactions. The Nios cache implementation is a simple, direct-mapped, write-through architecture that is designed to maximize performance and minimize device resource consumption.
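To illustrate what direct-mapped and write-through mean in general, here is a small software model. It is not the Nios implementation: the sizes, the one-word line, the choice not to allocate a line on a write miss, and all names (`CacheModel`, `cache_read`, `cache_write`) are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { NUM_LINES = 16, MEM_WORDS = 256 };  /* hypothetical sizes */

typedef struct {
    bool     valid;
    uint32_t tag;
    uint32_t data;                /* one word per line, for simplicity */
} CacheLine;

typedef struct {
    CacheLine lines[NUM_LINES];
    uint32_t  memory[MEM_WORDS];  /* stands in for the slower backing memory */
    unsigned  hits, misses;
} CacheModel;

/* Direct-mapped: each word address maps to exactly one line. */
uint32_t cache_read(CacheModel *c, uint32_t addr)
{
    assert(addr < MEM_WORDS);
    CacheLine *line = &c->lines[addr % NUM_LINES];
    uint32_t tag = addr / NUM_LINES;
    if (line->valid && line->tag == tag) {
        c->hits++;
        return line->data;
    }
    c->misses++;                  /* miss: fill the line from memory */
    line->valid = true;
    line->tag = tag;
    line->data = c->memory[addr];
    return line->data;
}

/* Write-through: memory is updated on every write, so it never goes stale. */
void cache_write(CacheModel *c, uint32_t addr, uint32_t value)
{
    assert(addr < MEM_WORDS);
    CacheLine *line = &c->lines[addr % NUM_LINES];
    c->memory[addr] = value;
    if (line->valid && line->tag == addr / NUM_LINES)
        line->data = value;       /* keep the cached copy consistent on a hit */
}
```

Because every write reaches memory immediately, no dirty bits or write-back logic are needed, which is one reason this style of cache tends to be small.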
For more information about the cache implementation in the Nios CPU, refer to the Nios 32-Bit Programmer's Reference Manual (PDF) and Nios 16-Bit Programmer's Reference Manual (PDF).
Exception Handling
The Nios processor allows up to 64 vectored exceptions, which can be generated from any of these three sources: external hardware interrupts, internal exceptions, or explicit software trap instructions. The Nios exception-processing model allows precise handling of all internally generated exceptions.
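Vectored dispatch can be modeled in software as a table with one entry per exception number. The sketch below only illustrates the concept with a 64-entry table. The names `vector_table`, `install_handler`, `dispatch_exception`, and `record_vector` are hypothetical, and the real processor locates its handlers through its own hardware mechanism rather than through C function pointers.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_VECTORS = 64 };        /* up to 64 vectored exceptions */

typedef void (*ExceptionHandler)(unsigned vector);

static ExceptionHandler vector_table[NUM_VECTORS];

/* Example handler: remembers which vector was taken most recently. */
unsigned last_vector_seen = NUM_VECTORS;   /* NUM_VECTORS means "none yet" */
void record_vector(unsigned vector) { last_vector_seen = vector; }

/* Register a handler; returns 0 on success, -1 for an invalid vector. */
int install_handler(unsigned vector, ExceptionHandler handler)
{
    if (vector >= NUM_VECTORS)
        return -1;
    vector_table[vector] = handler;
    return 0;
}

/* Call the handler for a vector; returns -1 if none is installed. */
int dispatch_exception(unsigned vector)
{
    if (vector >= NUM_VECTORS || vector_table[vector] == NULL)
        return -1;
    vector_table[vector](vector);
    return 0;
}
```

In this model, a hardware interrupt, an internal exception, and a software trap differ only in how the vector number is produced. The lookup itself is the same for all three sources.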
For more information regarding the Nios CPU exception handling, refer to the Nios 32-Bit Programmer's Reference Manual (PDF) and the Nios 16-Bit Programmer's Reference Manual (PDF).
Users can optionally disable support for TRAP instructions, hardware interrupts, and internal exceptions. This option reduces the size of the Nios system, and is intended for use only in systems where the processor is not running complex software. For details about the latest version of the Nios CPU implementation, refer to the Nios CPU Data Sheet (PDF).
Hardware Acceleration
The Nios instruction set can be configured to offload specific cycle-intensive software operations to hardware, which can increase system performance significantly. This feature is provided through instruction set modifications. The Nios processor has two levels of instruction set modifications: custom instructions and standard CPU options.
Custom Instructions
Developers can accelerate time-critical software algorithms by adding custom instructions to the Nios processor instruction set. Developers can use custom instructions to implement complex processing tasks in single-cycle (combinatorial) and multi-cycle (sequential) operations. Additionally, user-added custom instruction logic can access memory and/or logic outside of the Nios system. Figure 1 shows a block diagram of the instruction logic.
Figure 1. Custom Instruction Logic

A complex sequence of operations can be reduced to a single instruction implemented in hardware. This feature empowers developers to optimize their software inner loops for digital signal processing (DSP), packet header processing, and computation-intensive applications.
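As an illustration of the kind of operation that might be a candidate for a custom instruction, consider bit reversal, which appears in some DSP inner loops. In software it takes a loop over every bit, whereas in logic it amounts to rewiring. The function below is a plain C sketch with a hypothetical name. It does not show how a custom instruction is defined or invoked.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low `width` bits of x, one bit per loop iteration.
   A hardware version of the same operation needs no iteration at all. */
uint32_t bit_reverse(uint32_t x, unsigned width)
{
    assert(width >= 1 && width <= 32);
    uint32_t r = 0;
    for (unsigned i = 0; i < width; i++) {
        r = (r << 1) | (x & 1u);  /* move the lowest bit of x into r */
        x >>= 1;
    }
    return r;
}
```

For example, `bit_reverse(0x6, 3)` maps the 3-bit pattern 110 to 011. The software cost grows with `width`, which is what makes this sort of routine attractive to move into combinatorial logic.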
The Altera® SOPC Builder software provides a graphical user interface (GUI) that developers can use to add up to five of their own custom instructions to the Nios embedded processor. Designers needing higher processor performance, smaller FPGA footprint, or more robust software development tools can take advantage of the Nios II embedded processor family.
Standard CPU Options
Altera provides several pre-defined instruction set extensions to increase software performance. The MUL and MSTEP instructions are implemented with additional hardware units. When you select either of these CPU options in the SOPC Builder, logic is added to the arithmetic logic unit (ALU). For example, if a user chooses to implement the MUL instruction, an integer multiply unit is added automatically to the CPU's ALU to return a 16-bit by 16-bit multiplication operation in two clock cycles. This same operation performed using an iterative software routine would take 80 clock cycles. See Table 1 for clock cycle to resource usage tradeoffs.
Table 1. Hardware-Assisted Multiplication: Speed to Resource Usage Tradeoff
| Multiplication Option | Logic Elements (LEs) Used | Clock Cycles, 16 x 16 => 32 (1) | Clock Cycles, 32 x 32 => 32 (2) |
| --- | --- | --- | --- |
| None (Software) | 0 | 80 | 250 |
| MSTEP | 125 | 18 | 80 |
| MUL | 370 (3) | 3 | 20 |
Notes:
(1) Integer multiplication between two unsigned 16-bit numbers produces an unsigned 32-bit result. Integer multiplication between two signed 16-bit numbers produces a signed 32-bit result.
(2) Integer multiplication between two unsigned 32-bit values produces an unsigned 32-bit result. Integer multiplication between two signed 32-bit values produces a signed 32-bit result.
(3) When targeting Stratix™ devices, the MUL unit is implemented in DSP blocks, utilizing no additional logic elements.
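To make the tradeoff in Table 1 concrete, the sketch below shows a generic shift-and-add multiply of the kind an iterative software routine might use, plus a helper that turns two cycle counts from the table into a speedup ratio. The function names are hypothetical, and the routine is a textbook algorithm rather than the code shipped with the Nios tools.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 16 x 16 => 32 multiply using one shift-and-add step per
   multiplier bit, as a pure software fallback would do. */
uint32_t mul16_shift_add(uint16_t a, uint16_t b)
{
    uint32_t product = 0;
    uint32_t addend = a;
    for (unsigned bit = 0; bit < 16; bit++) {
        if (b & 1u)
            product += addend;    /* add the shifted multiplicand */
        addend <<= 1;
        b >>= 1;
    }
    return product;
}

/* Ratio of two cycle counts, e.g. software versus a hardware option. */
double speedup(unsigned baseline_cycles, unsigned accelerated_cycles)
{
    assert(accelerated_cycles > 0);
    return (double)baseline_cycles / (double)accelerated_cycles;
}
```

Using the 32 x 32 column of Table 1, `speedup(250, 20)` gives 12.5 for MUL and `speedup(250, 80)` gives 3.125 for MSTEP, each relative to the software routine.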
Additionally, the Nios CPU includes an internal shift unit for executing logical and arithmetic shift instructions. The CPU uses fixed barrel-shifter logic that executes all shift operations in two clock cycles. For more information on the Nios embedded processor CPU implementation details, refer to the Nios CPU Data Sheet (PDF).
Hardware-assisted operations underscore a key benefit of soft-core processors in programmable logic devices (PLDs). Nios developers can trade speed against area, which adds flexibility to SOPC designs.