AXI4-SGDMA
AXI4 to/from AXI4-Stream Scatter-Gather DMA

The AXI4-SGDMA IP core implements a Host-to-Peripheral (H2P) or a Peripheral-to-Host (P2H) Direct Memory Access (DMA) engine. It interfaces to the host system through an AXI4 Memory-Mapped master port and to the peripheral through either a slave or a master AXI4-Stream port.

The core operates in either Scatter-Gather (SG) Mode, reading descriptors from a memory-mapped location defined at run time, or in Direct Mode, transferring data according to a descriptor stored in local registers. In Scatter-Gather Mode, the descriptors are read from a linear or cyclic (ring) buffer. This descriptor buffer can be stored in any memory-mapped location, so that, for example, data can be transferred to or from a DRAM while the descriptors are stored either in the same DRAM or in a separate on-chip SRAM.
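The difference between a linear and a cyclic descriptor list can be illustrated with a small software model. This is only a sketch: the class name, its fields, and the rule that a linear list simply ends after its last slot are assumptions made for illustration, not the core's register map or documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DescriptorList:
    """Hypothetical software model of a descriptor list (not the core's CSRs)."""
    base_addr: int        # memory-mapped address of the first descriptor
    num_descriptors: int  # number of descriptor slots in the list
    descriptor_size: int  # bytes per descriptor (16 or 32)
    cyclic: bool          # True for a ring, False for a linear list

    def slot_address(self, index: int) -> int:
        """Return the address of descriptor slot `index`."""
        if not 0 <= index < self.num_descriptors:
            raise IndexError("descriptor index out of range")
        return self.base_addr + index * self.descriptor_size

    def next_index(self, index: int) -> Optional[int]:
        """Return the slot that follows `index`.

        A ring wraps back to slot 0; a linear list is assumed to end
        after its last slot, which is signalled here with None.
        """
        nxt = index + 1
        if nxt < self.num_descriptors:
            return nxt
        return 0 if self.cyclic else None
```

With four 32-byte descriptors at a base address of 0x1000, the last slot sits at 0x1060; in this model a ring continues from there to slot 0, while a linear list stops.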

The core is highly configurable both at synthesis time and at run time, allowing fine-tuning of its behavior and silicon footprint according to the user’s requirements. Synthesis-time configuration options adjust not only interface parameters (e.g. bus widths, unaligned access support), but also the descriptor format and the reset values of the CSRs. Two descriptor formats are available: a 32-byte format that supports 64-bit address offsets and data block transfers of up to 2 GBytes per descriptor, and a more compact format that supports 32-bit address offsets and data block transfers of up to 64 kB per descriptor. The configurable CSR reset values allow the core to be used right after reset without initialization, which makes it suitable for cases where data must be transferred on boot without invoking the host processor. At run time, the type and location of the descriptor list, the interrupt triggers, the buffer watermark levels, and the core’s treatment of end-of-file conditions are all under software control.
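The practical effect of the two per-descriptor size limits can be sketched as follows. The helper below splits one contiguous buffer into (address, length) chunks that each fit in a single descriptor. The function name and the exact limits (2 GiB and 64 KiB, taken literally from the figures above) are assumptions; the real maximum may differ slightly depending on how the length field is encoded.

```python
# Assumed per-descriptor limits, reading "2 GBytes" and "64 kB" as
# exactly 2 GiB and 64 KiB. The core's actual limits may be slightly
# different depending on the length field encoding.
MAX_BYTES_32B_FORMAT = 2 * 1024**3
MAX_BYTES_16B_FORMAT = 64 * 1024


def split_transfer(start_addr: int, total_bytes: int, max_bytes: int):
    """Split one buffer into (address, length) chunks of at most max_bytes."""
    if total_bytes < 0 or max_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and max_bytes must be > 0")
    chunks = []
    offset = 0
    while offset < total_bytes:
        length = min(max_bytes, total_bytes - offset)
        chunks.append((start_addr + offset, length))
        offset += length
    return chunks
```

Under these assumptions, a 1 MiB buffer needs sixteen descriptors in the compact format but only one in the 32-byte format, so the compact format saves descriptor memory only when individual blocks are small.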

Designed for ease of integration, the AXI4-SGDMA imposes no unnecessary restrictions on the system using it. Its 32-bit or 64-bit AXI4 Memory-Mapped interface supports fixed-size and incrementing bursts, as well as unaligned data accesses. The data-bus width of the AXI4-Stream interfaces is synthesis-time configurable and can range from 32 to 512 bits. DMA transfers can be initiated and monitored by software using the core’s Control and Status Registers, or by a peripheral via a dedicated “start-busy-done” handshaking interface. The latter allows data transfers to occur without waking up the host CPU, a feature especially useful for low-power operation. Finally, each of the Host, Peripheral, and CSR interfaces operates in an independent clock domain, and the core implements clean CDC boundaries.

The AXI4-SGDMA core is rigorously verified, LINT-clean and scan-ready. It is available in synthesizable Verilog and FPGA netlist forms and comes with everything required for successful implementation, including a UVM testbench, simulation and synthesis scripts, and comprehensive user documentation.

The AXI4-SGDMA core can be used in any SoC integrating streaming-capable peripherals that need to receive input or store outputs in the system memory. Examples include compression, video processing, or packet processing engines. The core is especially suited as a companion to CAST’s data compression, video or image codecs, IP stacks, or cryptography cores.

Support

The AXI4-SGDMA as delivered is warranted against defects for ninety days from purchase. Thirty days of phone and email technical support are included, starting with the first interaction. Additional maintenance and support options are available. 

Deliverables

The core is available in synthesizable Verilog and targeted netlist forms and includes everything required for successful implementation. The deliverable includes a UVM testbench with sample test cases, sample simulation and synthesis scripts, and comprehensive user documentation.

The AXI4-SGDMA core is a purely digital design and can be mapped to any standard cell technology. The following table provides sample implementation results for the core constrained to 1 GHz and synthesized in a TSMC 7 nm technology.

| Configuration | Logic (Eq. Gates) | Memory bits |
|---|---|---|
| P2H, 32-bit AXI4, 32-bit AXI4 Stream, 8 words FIFO, 16-byte descriptors, no data alignment | 25,467 | 584 |
| P2H, 32-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 47,411 | 20,800 |
| P2H, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, no data alignment | 56,688 | 36,928 |
| H2P, 32-bit AXI4, 32-bit AXI4 Stream, 8 words FIFO, 16-byte descriptors, no data alignment | 18,111 | 296 |
| H2P, 32-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 30,300 | 2,368 |
| H2P, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, no data alignment | 35,727 | 18,496 |

These sample implementation figures do not represent the highest speed or smallest area possible for the core. The size of the AXI4-SGDMA and its maximum operating frequency depend on the core’s configuration and the target technology. Please contact CAST to discuss silicon resource utilization and performance for your target configuration and technology.

The AXI4-SGDMA core can be mapped to any Intel device (provided sufficient silicon resources are available) and optimized to suit the particular project’s requirements. The following table provides sample implementation results for the core implemented on an Intel Arria 10 10AS016E4F29E3SG device.

| Configuration | ALMs | Memory bits | Clock Freq (MHz) |
|---|---|---|---|
| P2H, 32-bit AXI4, 32-bit AXI4 Stream, 8 words FIFO, 16-byte descriptors, no data alignment | 1,662 | 584 | 279 |
| P2H, 32-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 2,711 | 20,800 | 306 |
| P2H, 256-bit AXI4, 32-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 4,789 | 20,800 | 293 |
| P2H, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 5,306 | 36,928 | 261 |
| P2H, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, no data alignment | 3,025 | 36,928 | 295 |
| H2P, 32-bit AXI4, 32-bit AXI4 Stream, 8 words FIFO, 16-byte descriptors, no data alignment | 1,275 | 296 | 327 |
| H2P, 32-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 1,873 | 2,368 | 320 |
| H2P, 256-bit AXI4, 32-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 3,987 | 18,496 | 328 |
| H2P, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 4,122 | 18,496 | 332 |
| H2P, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, no data alignment | 2,032 | 18,496 | 334 |

These sample implementation figures do not represent the highest speed or smallest area possible for the core. The size of the AXI4-SGDMA and its maximum operating frequency depend on the core’s configuration and the target FPGA. Please contact CAST to discuss silicon resource utilization and performance for your target configuration and technology.
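The clock frequencies in the Intel table can be turned into a rough upper bound on throughput. The sketch below assumes that the reported frequency applies to the interface in question and that one full-width data beat is transferred on every clock cycle; sustained throughput will be lower once bus stalls, descriptor fetches, and burst overheads are taken into account. The function name is hypothetical.

```python
def peak_bytes_per_second(bus_width_bits: int, clock_mhz: float) -> float:
    """Theoretical ceiling: one full-width beat on every clock cycle."""
    if bus_width_bits <= 0 or bus_width_bits % 8 != 0:
        raise ValueError("bus width must be a positive multiple of 8 bits")
    bytes_per_beat = bus_width_bits // 8
    return bytes_per_beat * clock_mhz * 1e6
```

For example, a 256-bit interface at 261 MHz moves at most 32 bytes × 261 × 10^6 cycles per second, or about 8.35 GB/s, while a 32-bit interface at 279 MHz tops out at about 1.12 GB/s.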

The AXI4-SGDMA core can be mapped to any Xilinx device (provided sufficient silicon resources are available) and optimized to suit the particular project’s requirements. The following table provides sample implementation results for the core constrained to 300 MHz and synthesized on a Xilinx UltraScale+ device with speed grade -1.

| Configuration | LUTs | RAM Blocks |
|---|---|---|
| P2H, 32-bit AXI4, 32-bit AXI4 Stream, 8 words FIFO, 16-byte descriptors, no data alignment | 2,184 | - |
| P2H, 32-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 3,538 | 4 RAMB36, 1 RAMB18 |
| P2H, 256-bit AXI4, 32-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 7,370 | 4 RAMB36 |
| P2H, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 7,729 | 8 RAMB36, 1 RAMB18 |
| P2H, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, no data alignment | 4,203 | 8 RAMB36, 1 RAMB18 |
| H2P, 32-bit AXI4, 32-bit AXI4 Stream, 8 words FIFO, 16-byte descriptors, no data alignment | 1,605 | - |
| H2P, 32-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 2,715 | - |
| H2P, 256-bit AXI4, 32-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 6,257 | 4 RAMB36, 1 RAMB18 |
| H2P, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 6,304 | 4 RAMB36, 1 RAMB18 |
| H2P, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, no data alignment | 2,685 | 4 RAMB36, 1 RAMB18 |

These sample implementation figures do not represent the highest speed or smallest area possible for the core. The size of the AXI4-SGDMA and its maximum operating frequency depend on the core’s configuration and the target FPGA. Please contact CAST to discuss silicon resource utilization and performance for your target configuration and technology.

Features List

AXI4 Memory Mapped to Stream & Stream to Memory Mapped DMA

  • Scatter-Gather Mode
    • Descriptors on any memory-mapped location, independent of data buffers
    • Linear or cyclic (ring) descriptor list
  • Direct Transfer Mode
    • Descriptor on CSR
  • Software and/or Hardware Triggered Transfers
  • Standalone & Independent Host-to-Peripheral (H2P) and Peripheral-to-Host (P2H) DMA channel modules

Interfaces 

  • Host Data Interface
    • AXI4 Master port used to access data and descriptors
    • Fixed and incrementing bursts
    • Unaligned accesses
    • Synthesis-time configurable data-bus width (32 or 64 bits) and address bus width
  • Peripheral Data Interface
    • AXI4-Stream slave port for Host-to-Peripheral, and master port for Peripheral-to-Host
    • Synthesis-time configurable data-bus width (32 to 512 bits)
    • TLAST behaviour controlled by descriptors for H2P channel
    • P2H channel reports TLAST assertions and optionally updates descriptor
  • CSR Interface
    • AXI4-Lite Slave port 
    • Provides access to control and status registers

Run-Time Programming Options

  • Maximum burst size for the Host bus
  • Descriptor list type, base address, size, and Head pointer
  • End-of-file conditions treatment

Synthesis-Time Configuration Options

  • Reset values for CSRs
  • Descriptor format (16 or 32 bytes) 
  • FIFO size
  • Unaligned access support, data-bus and address-bus widths
