APS-DSP Digital Signal Processing Coprocessor Core
On this page: Description
| Implementation Results | Features
| Applications | Block Diagram
| Functional Description | Example
| Coprocessor Performance | Support
| Verification | Deliverables
The APS-DSP implements a fixed-point, 16-bit RISC
DSP coprocessor extension to the APS family of processors. It extends
both the hardware and the instruction set to provide fast math and optimized
data handling for analog or mixed-signal applications.
The core integrates with an APS2 or APS3 processor core through the patented
APS coprocessor interface. The DSP and main CPU operate in parallel: instructions
complete in a single cycle and at the same speed, with out-of-order instruction
completion in both processors. Up to six operations can be executed per
cycle, for example: multiply, accumulate, load data and update pointer
(with wrap around), and load coefficient and update pointer.
The core adds three ALUs operating in parallel, enabling one arithmetic
operation and two address calculations on every clock cycle. The two address
ALUs support a number of addressing modes, implementing circular buffers
and bit-reverse arithmetic without additional programming. The core also
adds two 32k by 32-bit memory interfaces, using a Harvard bus architecture
for simple memory design. This dual memory interface ensures that each
instruction can perform two memory accesses—including pointer updates
and wrap around—as well as the instruction fetch in a single cycle.
A bit-reverse arithmetic feature facilitates calculations like Fast Fourier
Transforms (FFTs), and a special Zero Overhead Loop (ZOL) feature enables
smarter, more efficient data processing.
Instructions for the APS-DSP are written in assembly language using
a simple set of constructs. Provided macros make it easy to work with
the DSP routines from the C and C++ programming environments of the APS
processors. An included library offers pre-coded solutions for typical
DSP challenges such as Fast Fourier Transforms (FFTs) and Finite and Infinite
Impulse Response filters (FIRs and IIRs), plus numerous test cases. Designers
can use these without having the DSP expertise required to implement them
from scratch.
Like the APS processor family, the APS-DSP is suitable for implementation
in ASICs, structured ASICs, and many FPGAs. It is fast and relatively
compact—running at 250 MHz and requiring just 14,000 gates in a
0.13 µm ASIC process—and its efficiency complements the low-power
nature of APS processors. The core has been rigorously verified through
thousands of test cases, and has been implemented in FPGAs.

Features
- Fixed-point, 16-bit RISC DSP processor for APS main processors
- Extends hardware and language for easier DSP programming and faster
DSP execution
- Operates in parallel with main APS CPU, with independent register
set and memory access
- Executes nearly all instructions in a single cycle, including multiplies
and multiply-accumulates
- Integrates with main APS CPU through patented APS coprocessor interface
Hardware Extensions
- Adds three ALUs and two memory interfaces
  - Arithmetic ALU
    - Two 20-bit Accumulators
    - Four 16-bit general purpose data registers
    - 16 x 16 Multiplier gives 20-bit results for MAC, Multiply, Shift, etc.
  - Two Address ALUs
    - Eight Address Pointers
    - Eight Offset Registers
    - Eight Circular Buffer Registers
    - Support for Bit-Reverse Arithmetic, as used for FFTs
  - Two 128 kB memory interfaces, seen by software as 64k x 16 bits wide each
Language Extensions
- Assembly language instruction set includes:
  - dsp_add
  - dsp_clr
  - dsp_mac
  - dsp_macn
  - dsp_mul
  - dsp_muln
  - dsp_mov
  - parallel moves
  - dsp_nop
  - dsp_sub
  - dsp_shl
  - dsp_shr
  - dsp_zol
  - dsp_zol_end
- Zero Overhead Loop (ZOL) construct enables algorithm iteration without instruction fetches
- Special macros facilitate DSP programming in the C and C++ development environments of the APS processor family
Included Solutions Library
- Pre-coded and verified routines for over 60 DSP functions and test
cases
Applications
Designed for demanding signal processing applications such as Internet
telephony, audio processing, automotive systems, and voice recognition.
Block Diagram
See to the right.
Functional Description
The APS-DSP adds dedicated math and memory hardware and signal processing
language extensions to an APS family main processor.
Hardware Extensions
The core adds three ALUs and two memory interfaces.
The main ALU has four 16-bit general purpose registers and two 20-bit
accumulators. A 16 x 16 multiplier gives 20-bit results. Each of the
two address ALUs has four address pointers, each of which is associated
with an offset register and a circular buffer register, all 16 bits wide.
The two additional memory interfaces are each 32k by 32 bits.
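The pointer update with wraparound that the address ALUs perform in hardware can be modelled in C roughly as follows. This is only a sketch: the function name and the choice to describe a buffer by a base address and a length are assumptions made for illustration, since the exact meaning of the offset and circular buffer registers is not given here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model, not an APS-DSP API.
   Advance `ptr` by the signed `offset` inside a circular buffer that
   starts at address `base` and holds `len` entries. */
uint16_t circ_update(uint16_t ptr, int16_t offset, uint16_t base, uint16_t len)
{
    /* The pointer must already lie inside the buffer. */
    assert(len > 0 && ptr >= base && (uint32_t)(ptr - base) < len);

    int32_t idx = (int32_t)(ptr - base) + offset;  /* position relative to base */
    idx %= (int32_t)len;                           /* C remainder may be negative */
    if (idx < 0)
        idx += len;                                /* wrap past the start */
    return (uint16_t)(base + idx);
}
```

With a four-entry buffer at 0x100, stepping forward from 0x103 by one returns 0x100, and stepping back from 0x100 by one returns 0x103. On the core this update happens alongside the memory access rather than as separate instructions.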
A bit-reverse arithmetic mode flips an address pointer register’s
MSB and LSB and causes the carry to be propagated in the opposite direction
from normal processing. This facilitates algorithms such as FFTs that
benefit from being able to read tables of parameters in both directions.
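Reverse-carry addition is one common way to express this kind of addressing in software. The sketch below assumes an index of `bits` bits and uses hypothetical function names. It illustrates the idea only and is not a description of the core's actual register behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low `bits` bits of `v` (bit 0 swaps with bit bits-1, and so on). */
uint16_t bit_reverse(uint16_t v, unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    uint16_t r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (uint16_t)((r << 1) | (v & 1u));  /* shift in the next low bit of v */
        v >>= 1;
    }
    return r;
}

/* Add `step` to `addr` with the carry running from MSB toward LSB.
   Implemented as: reverse both operands, add normally, reverse the sum. */
uint16_t reverse_carry_add(uint16_t addr, uint16_t step, unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    uint32_t mask = (1ul << bits) - 1u;
    uint32_t sum = (uint32_t)bit_reverse((uint16_t)(addr & mask), bits)
                 + (uint32_t)bit_reverse((uint16_t)(step & mask), bits);
    return bit_reverse((uint16_t)(sum & mask), bits);
}
```

For an 8-entry table (3 index bits) and a step of 4, repeated calls starting from 0 visit the indices 0, 4, 2, 6, 1, 5, 3, 7, which is the bit-reversed order that radix-2 FFTs use.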
Language Extensions
The DSP coprocessor is programmed in assembly language using a special
set of DSP constructs. These include parallel move operations and thirteen
other special functions, for arithmetic, multiplication, multiply-accumulates,
moving, and shifting. Nearly all of these execute within a single cycle.
A Zero Overhead Loop (ZOL) construct allows part of an algorithm to
be iterated automatically. The instructions are stored in an internal
buffer, and need be fetched just once for the entire loop. This loop then
executes in parallel with the application running on the main CPU.
Pre-Coded Library
Delivered with the core is a set of over 60 frequently used DSP functions
and test cases that are pre-coded, verified, and ready to execute. These
include common architectures for FFTs, FIR and IIR digital filters, arithmetic
and multiplication functions, data manipulation operations, and typical
combinations of specific functions.
DSP Code Example: A Simple FIR
Executes multiple operations in a single cycle: multiply, accumulate,
load data, update data pointer with wraparound, load coefficient, update
coefficient pointer.
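As a plain C reference for what such a FIR kernel computes, the sketch below steps through the same operations one at a time. It is not APS-DSP assembly and does not use the delivered library: the type and function names are hypothetical, and the accumulator is 64 bits wide for simplicity rather than modelling the core's 20-bit accumulators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of FIR filter state (illustrative names only). */
typedef struct {
    const int16_t *coeff;  /* filter coefficients, ntaps entries */
    int16_t *delay;        /* circular delay line, ntaps entries */
    size_t ntaps;          /* number of taps */
    size_t head;           /* index of the most recent sample */
} fir_model_t;

/* Push one input sample and return the filter output. */
int64_t fir_model_step(fir_model_t *f, int16_t x)
{
    assert(f != NULL && f->ntaps > 0 && f->head < f->ntaps);

    /* Advance the write position with wraparound and store the new sample. */
    f->head = (f->head + 1) % f->ntaps;
    f->delay[f->head] = x;

    int64_t acc = 0;              /* wide accumulator; the core uses 20 bits */
    size_t d = f->head;           /* data pointer, newest sample first */
    const int16_t *c = f->coeff;  /* coefficient pointer */

    for (size_t k = 0; k < f->ntaps; k++) {
        acc += (int64_t)f->delay[d] * (int64_t)*c;  /* multiply and accumulate */
        d = (d == 0) ? f->ntaps - 1 : d - 1;        /* data pointer, with wrap */
        c++;                                        /* coefficient pointer */
    }
    return acc;
}
```

Feeding a unit impulse into a filter with coefficients 1, 2, 3 returns 1, 2, 3 and then 0 on successive calls. Each iteration of the loop body corresponds to the group of operations the text describes as executing in a single cycle on the coprocessor.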

Coprocessor Performance
The APS-DSP operates in parallel with the main APS processor and can
execute most signal processing instructions in a single clock cycle. Test
cases show a significant reduction in processing time when an algorithm
runs on the APS-DSP instead of being coded in C for the main CPU.
Results for some examples are shown in the table below.
| DSP Algorithm | C cycles | DSP cycles | Execution Time Reduction |
|---|---|---|---|
| FIR 1 (256 taps) | 8,279 | 283 | 71% |
| FIR 2 (256 taps) | 8,023 | 283 | 72% |
| FFT (256 pts) | 205,556 | 12,807 | 84% |
| FFT SHIFT (256 pts) | 199,412 | 21,509 | 91% |
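To compare cycle counts such as those in the table for your own routines, a small helper like the one below can be used. The function names are hypothetical. It computes only the raw ratio of the two cycle counts, which is not necessarily the basis of the Execution Time Reduction column, whose definition is not given here.

```c
#include <assert.h>

/* Percentage of cycles saved when a routine takes dsp_cycles
   instead of c_cycles. */
double cycle_reduction_percent(unsigned long c_cycles, unsigned long dsp_cycles)
{
    assert(c_cycles > 0);
    return 100.0 * (1.0 - (double)dsp_cycles / (double)c_cycles);
}

/* Ratio of C-implementation cycles to DSP-implementation cycles. */
double cycle_speedup(unsigned long c_cycles, unsigned long dsp_cycles)
{
    assert(dsp_cycles > 0);
    return (double)c_cycles / (double)dsp_cycles;
}
```

For example, `cycle_speedup(8279, 283)` gives the cycle-count ratio for the first FIR row.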
Support
The core as delivered is warranted against defects for ninety days from
purchase. Thirty days of phone and email technical support are included,
starting with the first interaction. Additional maintenance and support
options are available.
Verification
The core has been verified through extensive simulation and rigorous
code coverage measurements. FPGA demonstration units have also been implemented
and evaluated.
Deliverables
The core is intended for use with an APS family main processor, and
includes everything else required for successful implementation:
- HDL RTL source code or optimized netlist for structured ASICs and
FPGAs
- Sophisticated self-checking HDL Testbench including everything needed
to test the core
- Simulation scripts, vectors, and expected results; Synthesis scripts
- Comprehensive user documentation, including detailed specifications
and a programmer's reference manual
This core was developed by the processor experts at Cortus, SA.