AXI4-SGDMA
AXI4 to/from AXI4-Stream Scatter-Gather DMA

The AXI4-SGDMA IP core implements a Host-to-Peripheral (H2P) or a Peripheral-to-Host (P2H) Direct Memory Access (DMA) engine. It connects to the host system through an AXI4 Memory-Mapped master port and to the peripheral through either a slave or a master AXI4-Stream port.

The core operates in either Scatter-Gather (SG) Mode, reading descriptors from a memory-mapped location defined at run time, or in Direct Mode, transferring data according to a descriptor stored in local registers. In Scatter-Gather Mode, the descriptors are read from a linear or cyclic (ring) buffer. This descriptor buffer can be stored in any memory-mapped location. For example, data can be transferred to or from a DRAM while the descriptors are stored either in the same DRAM or in a separate on-chip SRAM.
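The difference between a linear and a cyclic (ring) descriptor list can be illustrated with a small software model. This is only a sketch: the `Descriptor` fields and the `walk_descriptors` helper are hypothetical names chosen for illustration, and the core's actual descriptor layout and pointer handling are defined in its user documentation.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Descriptor:
    """Hypothetical descriptor; the real field layout is set by the core."""
    address: int  # start address of the data block in host memory
    length: int   # number of bytes to transfer


def walk_descriptors(descs: List[Descriptor], head: int, count: int,
                     cyclic: bool) -> Iterator[Descriptor]:
    """Yield up to `count` descriptors starting at index `head`.

    A linear list stops at the end of the buffer; a cyclic (ring) list
    wraps back to the first slot and keeps going.
    """
    if not descs:
        return
    index = head
    for _ in range(count):
        if index >= len(descs):
            if not cyclic:
                return   # linear list exhausted
            index = 0    # ring: wrap to the first slot
        yield descs[index]
        index += 1


ring = [Descriptor(0x1000, 64), Descriptor(0x2000, 128), Descriptor(0x3000, 256)]
wrapped = [d.address for d in walk_descriptors(ring, head=2, count=4, cyclic=True)]
linear = [d.address for d in walk_descriptors(ring, head=2, count=4, cyclic=False)]
```

In this model the ring keeps serving descriptors after the last slot, which suits continuous streaming, while the linear list ends the transfer sequence at the end of the buffer.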

The core is highly configurable both at synthesis time and at run time, allowing fine-tuning of its behavior and silicon footprint according to the user’s requirements. Synthesis-time configuration options adjust not only interface parameters (e.g. bus widths, unaligned access support), but also the descriptor format and the reset values of the CSRs. Two descriptor formats are available. The first is 32 bytes wide and supports 64-bit address offsets and data block transfers of up to 2 GB per descriptor. The second, a more compact 16-byte format, supports 32-bit address offsets and data block transfers of up to 64 kB per descriptor. The configurable CSR reset values allow the core to be used right after reset without initialization, which makes it suitable for cases where data must be transferred at boot without involving the host processor. At run time, the type and location of the descriptor list, the interrupt triggers, the buffer watermark levels, and the core’s treatment of end-of-file conditions are all under software control.
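The trade-off between the two descriptor formats can be estimated with simple arithmetic: a smaller per-descriptor transfer limit means more descriptors for the same buffer. The sketch below assumes that "2 GB" means 2**31 bytes and "64 kB" means 2**16 bytes, and that every descriptor except the last carries a full-size block. Both are assumptions made for illustration, and the function names are hypothetical.

```python
# Assumed per-descriptor transfer limits (see the caveats in the text above).
MAX_LEN_32_BYTE_FORMAT = 2 * 1024**3  # "2 GB" read as 2**31 bytes
MAX_LEN_16_BYTE_FORMAT = 64 * 1024    # "64 kB" read as 2**16 bytes


def descriptors_needed(total_bytes: int, max_len: int) -> int:
    """Minimum number of descriptors that cover `total_bytes`."""
    if total_bytes < 0 or max_len <= 0:
        raise ValueError("sizes must be non-negative and the limit positive")
    return -(-total_bytes // max_len)  # ceiling division


def list_footprint(total_bytes: int, max_len: int, descriptor_bytes: int) -> int:
    """Memory taken by the descriptor list itself, in bytes."""
    return descriptors_needed(total_bytes, max_len) * descriptor_bytes


buffer_bytes = 16 * 1024**2  # a 16 MiB buffer as an example
compact = list_footprint(buffer_bytes, MAX_LEN_16_BYTE_FORMAT, 16)
wide = list_footprint(buffer_bytes, MAX_LEN_32_BYTE_FORMAT, 32)
```

Under these assumptions a 16 MiB buffer needs 256 compact descriptors (4,096 bytes of descriptor storage) but only one 32-byte descriptor, so the compact format pays off mainly when individual blocks are small.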

Designed for ease of integration, the AXI4-SGDMA imposes no unnecessary restrictions on the system using it. Its 32-bit or 64-bit AXI4 Memory-Mapped interface supports fixed-size and incrementing bursts, as well as unaligned data accesses. The data-bus width of the AXI4-Stream interfaces is configurable at synthesis time and ranges from 32 to 512 bits. DMA transfers can be initiated and monitored by software using the core’s Control and Status Registers, or by a peripheral via a dedicated “start-busy-done” handshaking interface. The latter allows data transfers to occur without waking up the host CPU, a feature especially useful for low-power operation. Finally, the Host, Peripheral, and CSR interfaces each operate in an independent clock domain, and the core implements clean CDC boundaries.
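To show what incrementing bursts and unaligned accesses imply for a memory-mapped master, the sketch below splits a byte range into AXI4 INCR bursts. It relies on two general AXI4 protocol rules: a burst must not cross a 4 KB address boundary, and an INCR burst carries at most 256 beats. The function name and its return format are hypothetical, and the code does not describe how the AXI4-SGDMA itself schedules bursts.

```python
from typing import List, Tuple

AXI_BOUNDARY = 4096  # AXI4 bursts must not cross a 4 KB boundary


def split_into_bursts(addr: int, length: int, bus_bytes: int,
                      max_beats: int = 256) -> List[Tuple[int, int, int]]:
    """Split [addr, addr + length) into (start_addr, byte_count, beats) bursts.

    `bus_bytes` is the data-bus width in bytes (4 for a 32-bit bus) and
    `max_beats` is the maximum burst length, up to 256 for AXI4 INCR bursts.
    """
    bursts = []
    while length > 0:
        # Bytes left before the next 4 KB boundary.
        to_boundary = AXI_BOUNDARY - (addr % AXI_BOUNDARY)
        # The first beat starts at the bus-aligned address at or below addr.
        aligned = addr - (addr % bus_bytes)
        # Bytes reachable from addr within max_beats full-width beats.
        burst_cap = aligned + max_beats * bus_bytes - addr
        chunk = min(length, to_boundary, burst_cap)
        # Number of bus words touched by this chunk.
        beats = (addr + chunk - 1) // bus_bytes - aligned // bus_bytes + 1
        bursts.append((addr, chunk, beats))
        addr += chunk
        length -= chunk
    return bursts


# 32 bytes starting 10 bytes below a 4 KB boundary, on a 32-bit bus.
example = split_into_bursts(0x0FF6, 32, bus_bytes=4)
```

In this example the unaligned start forces a short first burst that ends at the 4 KB boundary, followed by a second burst for the remaining bytes. A run-time limit on the burst size can be modelled by passing a smaller `max_beats`.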

The AXI4-SGDMA core is rigorously verified, LINT-clean, and scan-ready. It is available in synthesizable Verilog and FPGA netlist forms and includes everything required for successful implementation, including a UVM testbench, simulation and synthesis scripts, and comprehensive user documentation.

The AXI4-SGDMA core can be used in any SoC that integrates streaming-capable peripherals which receive their input from, or store their output in, system memory. Examples include compression, video processing, and packet processing engines. The core is especially suited as a companion to CAST’s data compression, video, JPEG, and lossless image codecs, IP stacks, and cryptography cores.

Support

The AXI4-SGDMA as delivered is warranted against defects for ninety days from purchase. Thirty days of phone and email technical support are included, starting with the first interaction. Additional maintenance and support options are available. 

Deliverables

The core is available in synthesizable Verilog and targeted netlist forms and includes everything required for successful implementation. The deliverable includes a UVM testbench with sample test cases, sample simulation and synthesis scripts, and comprehensive user documentation.

The AXI4-SGDMA core is a purely digital design and can be mapped to any standard cell technology. The following table provides sample implementation results for the core constrained to 1GHz and synthesized on a TSMC 7nm technology.

Configuration | Logic (Eq. Gates) | Memory (bits)
--- | --- | ---
P2H, 32-bit AXI4, 32-bit AXI4 Stream, 8 words FIFO, 16-byte descriptors, no data alignment | 31,688 | 872
P2H, 32-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 55,574 | 23,104
P2H, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, no data alignment | 66,539 | 55,630
H2P, 32-bit AXI4, 32-bit AXI4 Stream, 8 words FIFO, 16-byte descriptors, no data alignment | 23,391 | 296
H2P, 32-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 35,727 | 2,368
H2P, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, no data alignment | 43,768 | 18,496

These sample implementation figures do not represent the highest speed or smallest area possible for the core. The size of the AXI4-SGDMA and its maximum operating frequency depend on the core’s configuration and the target technology. Please contact CAST to discuss silicon resource utilization and performance for your target configuration and technology.
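When comparing configurations, it can help to relate bus widths and clock frequencies to a theoretical bandwidth ceiling. The sketch below computes that ceiling as one full-width beat per clock cycle on each interface and takes the slower of the two sides. It is plain arithmetic, not a measured result for the core: it ignores descriptor fetches, handshake stalls, and memory latency, and the function names are hypothetical.

```python
def peak_bytes_per_second(bus_width_bits: int, clock_hz: int) -> int:
    """Upper bound for one interface: one full-width beat every clock cycle."""
    return (bus_width_bits // 8) * clock_hz


def transfer_bound(host_bits: int, host_hz: int,
                   stream_bits: int, stream_hz: int) -> int:
    """End-to-end ceiling: the slower of the host and stream interfaces."""
    return min(peak_bytes_per_second(host_bits, host_hz),
               peak_bytes_per_second(stream_bits, stream_hz))


# 32-bit AXI4 host side and 256-bit AXI4-Stream side, both clocked at 1 GHz.
narrow_host = transfer_bound(32, 10**9, 256, 10**9)
# 256-bit on both sides at 1 GHz.
wide_host = transfer_bound(256, 10**9, 256, 10**9)
```

With these inputs the narrow host bus caps the ceiling at 4 GB/s, while the 256-bit host configuration raises it to 32 GB/s. Because the interfaces run in independent clock domains, the two clock arguments can differ.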

The AXI4-SGDMA core can be mapped to any AMD FPGA device (provided sufficient silicon resources are available) and optimized to suit the particular project’s requirements. The following table provides sample implementation results for the core constrained to 300 MHz and synthesized on an UltraScale+ device with speed grade -1.

Configuration | LUTs | RAM Blocks
--- | --- | ---
P2H, 32-bit AXI4, 32-bit AXI4 Stream, 8 words FIFO, 16-byte descriptors, no data alignment | 2,707 | -
P2H, 32-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 4,443 | 4 RAMB36, 1 RAMB18
P2H, 256-bit AXI4, 32-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 8,181 | 4 RAMB36
P2H, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 8,784 | 8 RAMB36, 1 RAMB18
P2H, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, no data alignment | 4,769 | 8 RAMB36, 1 RAMB18
H2P, 32-bit AXI4, 32-bit AXI4 Stream, 8 words FIFO, 16-byte descriptors, no data alignment | 1,999 | -
H2P, 32-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 3,099 | -
H2P, 256-bit AXI4, 32-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 6,697 | 4 RAMB36, 1 RAMB18
H2P, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 6,848 | 4 RAMB36, 1 RAMB18
H2P, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, no data alignment | 3,188 | 4 RAMB36, 1 RAMB18

These sample implementation figures do not represent the highest speed or smallest area possible for the core. The size of the AXI4-SGDMA and its maximum operating frequency depend on the core’s configuration and the target FPGA. Please contact CAST to discuss silicon resource utilization and performance for your target configuration and technology.

The AXI4-SGDMA core can be mapped to any Intel FPGA device (provided sufficient silicon resources are available) and optimized to suit the particular project’s requirements. The following table provides sample implementation results for the core implemented on an Intel Arria 10 10AS016E4F29E3SG device.

Configuration | ALMs | Memory (bits) | Clock Freq (MHz)
--- | --- | --- | ---
P2H, 32-bit AXI4, 32-bit AXI4 Stream, 8 words FIFO, 16-byte descriptors, no data alignment | 1,828 | 872 | 323
P2H, 32-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 2,911 | 23,104 | 323
P2H, 256-bit AXI4, 32-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 4,953 | 39,232 | 311
P2H, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 5,494 | 55,360 | 305
P2H, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, no data alignment | 3,256 | 55,360 | 317
H2P, 32-bit AXI4, 32-bit AXI4 Stream, 8 words FIFO, 16-byte descriptors, no data alignment | 1,433 | 296 | 318
H2P, 32-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 2,079 | 2,368 | 334
H2P, 256-bit AXI4, 32-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 4,224 | 18,496 | 321
H2P, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment | 4,448 | 18,496 | 325
H2P, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, no data alignment | 2,282 | 18,496 | 325

These sample implementation figures do not represent the highest speed or smallest area possible for the core. The size of the AXI4-SGDMA and its maximum operating frequency depend on the core’s configuration and the target FPGA. Please contact CAST to discuss silicon resource utilization and performance for your target configuration and technology.

[Image: CAST AXI4-SGDMA streaming to/from memory scatter-gather DMA IP core]

Features List

Memory Mapped to Stream & Stream to Memory Mapped DMA

  • Scatter-Gather Mode
    • Descriptors on any memory-mapped location, independent of data buffers
    • Linear or cyclic (ring) descriptor list
  • Direct Transfer Mode
    • Descriptor on CSR
  • Software and/or Hardware Triggered Transfers
  • Standalone & Independent Host-to-Peripheral and Peripheral-to-Host DMA channel modules

Interfaces 

  • Host Data Interface
    • AXI4 Master port used to access data and descriptors
    • Fixed and incrementing bursts
    • Unaligned accesses
    • Synthesis-time configurable data-bus width (32 or 64 bits) and address bus width
  • Peripheral Data Interface
    • AXI4-Stream slave port for Host-to-Peripheral, and master port for Peripheral-to-Host
    • Synthesis-time configurable data-bus width (32 to 512 bits)
    • TLAST behavior controlled by descriptors for Host-to-Peripheral channel
    • Peripheral-to-Host channel reports TLAST assertions and optionally updates descriptor
  • CSR Interface
    • AXI4-Lite or optionally APB3 Slave port 
    • Provides access to control and status registers

Run-Time Programming Options

  • Maximum burst size for the Host bus
  • Descriptor list type, base address, size, and head pointer
  • End-of-file conditions treatment

Synthesis-Time Configuration Options

  • Reset values for CSRs
  • Descriptor format (16 or 32 bytes) 
  • FIFO size
  • Unaligned access support, data-bus and address-bus widths
