AXI4-SGDMA
AXI4 Memory-Mapped to/from Stream Scatter-Gather DMA

The AXI4-SGDMA IP core implements a Host-to-Peripheral (H2P) or a Peripheral-to-Host (P2H) Direct Memory Access (DMA) engine. It interfaces the host system through an AXI4 Memory-Mapped master port and the peripheral through either a slave or a master AXI4-Stream port.

The core operates in either Scatter-Gather (SG) Mode, reading descriptors from a run-time defined memory-mapped location, or in Direct Mode, transferring data according to a descriptor stored in local registers. In Scatter-Gather Mode, the descriptors are accessed from a linear or cyclic (ring) buffer. This descriptor buffer can be stored in any memory-mapped location. For example, data can be transferred to or from a DRAM while the descriptors are stored either in the same DRAM or in a separate on-chip SRAM.
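
As an illustration of the cyclic descriptor list described above, the following is a minimal software model, not the core's actual descriptor format or register interface. The `Descriptor` fields, the `head`/`tail` indices, and the `gather` helper are all hypothetical names chosen for this sketch; it only shows how a ring of descriptors can point at scattered blocks that are consumed in order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Descriptor:
    """Hypothetical descriptor: one contiguous block of data."""
    address: int  # start offset of the block in memory
    length: int   # block length in bytes


class DescriptorRing:
    """Toy model of a cyclic (ring) descriptor list (assumed semantics)."""

    def __init__(self, num_slots: int) -> None:
        self.slots: List[Optional[Descriptor]] = [None] * num_slots
        self.head = 0   # next slot software will fill
        self.tail = 0   # next slot the DMA engine will consume
        self.count = 0  # descriptors currently queued

    def push(self, desc: Descriptor) -> bool:
        """Software side: queue a descriptor. Returns False when the ring is full."""
        if self.count == len(self.slots):
            return False
        self.slots[self.head] = desc
        self.head = (self.head + 1) % len(self.slots)  # wrap around
        self.count += 1
        return True

    def pop(self) -> Optional[Descriptor]:
        """Engine side: take the oldest descriptor, or None when the ring is empty."""
        if self.count == 0:
            return None
        desc = self.slots[self.tail]
        self.tail = (self.tail + 1) % len(self.slots)  # wrap around
        self.count -= 1
        return desc


def gather(ring: DescriptorRing, memory: bytes) -> bytes:
    """Model a host-to-peripheral transfer: concatenate the scattered blocks."""
    stream = bytearray()
    desc = ring.pop()
    while desc is not None:
        stream += memory[desc.address:desc.address + desc.length]
        desc = ring.pop()
    return bytes(stream)
```

In this model the descriptors and the data live in separate Python objects, which mirrors the point made above that the descriptor buffer need not share a memory with the data buffers.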

The core is highly configurable at both synthesis time and run time, allowing its behavior and silicon footprint to be tuned to the user’s requirements. Synthesis-time options adjust not only interface parameters (e.g. bus widths, unaligned access support) but also the descriptor format and the reset values of the CSRs. Two descriptor formats are available: a 32-byte format that supports 64-bit address offsets and data block transfers of up to 2 GBytes per descriptor, and a more compact format that supports 32-bit address offsets and data block transfers of up to 64 kB per descriptor. The configurable CSR reset values allow the core to be used right after reset without initialization, which suits cases where data must be transferred at boot without invoking the host processor. At run time, the type and location of the descriptor list, the interrupt triggers, the buffer watermark levels, and the core's treatment of end-of-file conditions are all under software control.
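
The per-descriptor size limits matter when a buffer is larger than one descriptor can cover, because software then has to split it across several descriptors. The sketch below shows that arithmetic only. The function name is hypothetical, and the limits are assumed to be exactly 2^31 and 2^16 bytes; the real limits may differ by one byte depending on how the length field is encoded.

```python
from typing import List, Tuple

# Assumed per-descriptor limits, taken from the figures quoted above.
MAX_LEN_WIDE = 2 * 1024**3   # "up to 2 GBytes", assumed to mean 2**31 bytes
MAX_LEN_COMPACT = 64 * 1024  # "64 kB", assumed to mean 2**16 bytes


def split_transfer(address: int, length: int, max_len: int) -> List[Tuple[int, int]]:
    """Split one buffer into (address, length) chunks no larger than max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    chunks = []
    while length > 0:
        n = min(length, max_len)      # largest block one descriptor may carry
        chunks.append((address, n))
        address += n                  # next chunk starts where this one ends
        length -= n
    return chunks


# A 150,000-byte buffer needs three compact descriptors but only one wide one.
compact_chunks = split_transfer(0x1000, 150_000, MAX_LEN_COMPACT)
wide_chunks = split_transfer(0x1000, 150_000, MAX_LEN_WIDE)
```

The trade-off this illustrates is that the compact format saves descriptor memory per entry but needs more entries for large buffers.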

Designed for ease of integration, the AXI4-SGDMA imposes no unnecessary restrictions on the system using it. Its 32-bit or 64-bit AXI4 Memory-Mapped interface supports fixed-size and incrementing bursts, as well as unaligned data accesses. The data-bus width of the AXI4-Stream interfaces is synthesis-time configurable from 32 to 512 bits. DMA transfers can be initiated and monitored by software using the core’s Control and Status Registers, or by a peripheral via a dedicated “start-busy-done” handshaking interface. The latter allows data transfers to occur without waking up the host CPU, a feature especially useful for low-power operation. Finally, each of the Host, Peripheral, and CSR interfaces operates in an independent clock domain, and the core implements clean CDC boundaries.

The AXI4-SGDMA core is rigorously verified, LINT-clean and scan-ready. It is available in synthesizable Verilog and FPGA netlist forms and includes everything required for successful implementation, including a UVM testbench, simulation and synthesis scripts, and comprehensive user documentation.

**FEATURES**

**Memory Mapped to Stream & Stream to Memory Mapped DMA**
- Scatter-Gather Mode
  - Descriptors on any memory-mapped location, independent of data buffers
  - Linear or cyclic (ring) descriptor list
- Direct Transfer Mode
  - Descriptor on CSR
  - Software and/or Hardware Triggered Transfers
- Standalone & Independent Host-to-Peripheral and Peripheral-to-Host DMA channel modules

**Interfaces**
- Host Data Interface
  - AXI4 Master port used to access data and descriptors
  - Fixed and incrementing bursts
  - Unaligned accesses
  - Synthesis-time configurable data-bus width (32 or 64 bits) and address bus width
- Peripheral Data Interface
  - AXI4-Stream master port for Host-to-Peripheral, and slave port for Peripheral-to-Host
  - Synthesis-time configurable data-bus width (32 to 512 bits)
  - TLAST behaviour controlled by descriptors for Host-to-Peripheral channel
  - Peripheral-to-Host channel reports TLAST assertions and optionally updates descriptor
- CSR Interface
  - AXI4-Lite and optionally APB3 Slave port
  - Provides access to control and status registers

**Run-Time Programming Options**
- Maximum burst size for the Host bus
- Descriptor list type, base address, size, and head pointer
- Treatment of end-of-file conditions

**Synthesis-Time Configuration Options**
- Reset values for CSRs
- Descriptor format (16 or 32 bytes)
- FIFO size
- Unaligned access support, data-bus and address-bus widths

**Block Diagram**

[Figure: block diagrams of the two channel modules.
AXI4-SGDMA (P2H): AXI4 Master, Data Bus Align, CSRs & DMA Control, FIFO, AXI4-Stream Slave, AXI4-Lite Slave.
AXI4-SGDMA (H2P): AXI4 Master, Data Bus Align, CSRs & DMA Control, FIFO, AXI4-Stream Master, AXI4-Lite Slave.]

**Performance and Size**

The AXI4-SGDMA core can be mapped to any Intel® FPGA device (provided sufficient silicon resources are available) and optimized to suit the particular project’s requirements. The following table provides sample implementation results for the core implemented on an Intel® Arria® 10 10AS016E4F29E3SG device.

<table>
<thead>
<tr>
<th>Configuration</th>
<th>ALMs</th>
<th>Memory Bits</th>
<th>Clock Freq (MHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>P2H, 32-bit AXI4, 32-bit AXI4 Stream, 8 words FIFO, 16-byte descriptors, no data alignment</td>
<td>1,828</td>
<td>872</td>
<td>323</td>
</tr>
<tr>
<td>P2H, 32-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment</td>
<td>2,911</td>
<td>23,104</td>
<td>323</td>
</tr>
<tr>
<td>P2H, 256-bit AXI4, 32-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment</td>
<td>4,953</td>
<td>39,232</td>
<td>311</td>
</tr>
<tr>
<td>P2H, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment</td>
<td>5,494</td>
<td>55,360</td>
<td>305</td>
</tr>
<tr>
<td>P2H, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, no data alignment</td>
<td>3,256</td>
<td>55,360</td>
<td>317</td>
</tr>
<tr>
<td>H2P, 32-bit AXI4, 32-bit AXI4 Stream, 8 words FIFO, 16-byte descriptors, no data alignment</td>
<td>1,433</td>
<td>296</td>
<td>318</td>
</tr>
<tr>
<td>H2P, 32-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment</td>
<td>2,079</td>
<td>2,368</td>
<td>334</td>
</tr>
<tr>
<td>H2P, 256-bit AXI4, 32-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment</td>
<td>4,224</td>
<td>18,496</td>
<td>321</td>
</tr>
<tr>
<td>H2P, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, data alignment</td>
<td>4,448</td>
<td>18,496</td>
<td>325</td>
</tr>
<tr>
<td>H2P, 256-bit AXI4, 256-bit AXI4 Stream, 64 words FIFO, 32-byte descriptors, no data alignment</td>
<td>2,282</td>
<td>18,496</td>
<td>325</td>
</tr>
</tbody>
</table>

These sample implementation figures do not represent the highest speed or smallest area possible for the core. The size of the AXI4-SGDMA and its maximum operating frequency depend on the core’s configuration and the target FPGA. Please contact CAST to discuss silicon resource utilization and performance for your target configuration and technology.
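
To relate the clock frequencies in the table to data rates, the following back-of-the-envelope sketch computes a theoretical upper bound on throughput. It rests on assumptions that are not stated in this document: one data beat per clock cycle, both interfaces running at the tabulated frequency, and no overhead for descriptor fetches or burst boundaries. The function names are hypothetical, and 1 GB is taken as 10^9 bytes. Measured throughput will be lower.

```python
def peak_gbytes_per_s(bus_bits: int, clock_mhz: float) -> float:
    """Upper bound for one interface: bytes per beat times beats per second."""
    return (bus_bits / 8) * clock_mhz / 1000.0


def channel_bound_gbytes_per_s(axi_bits: int, stream_bits: int, clock_mhz: float) -> float:
    """A channel cannot move data faster than its narrower interface (same clock assumed)."""
    return peak_gbytes_per_s(min(axi_bits, stream_bits), clock_mhz)


# Two rows of the table, under the assumptions stated in the text:
wide = channel_bound_gbytes_per_s(256, 256, 305)   # 256-bit AXI4, 256-bit stream
mixed = channel_bound_gbytes_per_s(256, 32, 311)   # 256-bit AXI4, 32-bit stream
```

Under these assumptions the 256-bit/256-bit P2H configuration at 305 MHz is bounded at about 9.76 GB/s, while the 256-bit/32-bit configuration at 311 MHz is bounded by its 32-bit stream side at about 1.24 GB/s.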

**Deliverables**

The core is available in synthesizable Verilog and targeted netlist forms and includes everything required for successful implementation. The deliverable includes a UVM testbench with sample test cases, sample simulation and synthesis scripts, and comprehensive user documentation.

**Applications**

The AXI4-SGDMA core can be used in any SoC integrating streaming-capable peripherals that need to receive input or store outputs in the system memory. Examples include compression, video processing, or packet processing engines. The core is especially suited as a companion to CAST’s data compression, video or image codecs, IP stacks, or cryptography cores.

**Support**

The AXI4-SGDMA as delivered is warranted against defects for ninety days from purchase. Thirty days of phone and email technical support are included, starting with the first interaction. Additional maintenance and support options are available.
