# LZ4SNP-C

# LZ4/Snappy Data Compressor

LZ4SNP-C is a hardware compression engine that complies with the LZ4 and Snappy compression standards. The core receives uncompressed input files and produces compressed files. No post-processing of the compressed files is required, as the core encapsulates the compressed data payload with the proper headers and footers.
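Because the core adds the container headers itself, a host-side script can sanity-check an output file by inspecting its first bytes. The following is a minimal Python sketch, assuming the core is configured to emit LZ4 frame format or Snappy framing format output. The magic values come from the public format specifications; the function name `identify_container` is hypothetical and is not part of the core's deliverables. Raw block output carries no magic number and is reported as unknown.

```python
import struct

# LZ4 frame format magic number, stored little-endian at the start of a frame.
LZ4_FRAME_MAGIC = 0x184D2204
# Snappy framing format: stream identifier chunk that opens every stream.
SNAPPY_STREAM_ID = b"\xff\x06\x00\x00sNaPpY"


def identify_container(data: bytes) -> str:
    """Guess the container format from the leading bytes of a compressed file."""
    if len(data) >= 4 and struct.unpack_from("<I", data)[0] == LZ4_FRAME_MAGIC:
        return "lz4-frame"
    if data.startswith(SNAPPY_STREAM_ID):
        return "snappy-framed"
    return "unknown"  # e.g. raw block output, which has no magic number
```

A check like this only confirms the container type; full validation still requires decompressing the payload with a reference decoder.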

The core's flexible architecture enables fine-tuning of its compression efficiency and throughput to match the requirements of the end application. More than one block compression engine can be internally instantiated to scale throughput, while block and history window sizes can be adjusted to optimize either hardware resource utilization or compression efficiency. Furthermore, the computation of the optional checksums can be disabled to reduce the size of the core.

LZ4SNP-C offers compression efficiency practically equivalent to the corresponding software applications. The included software model, together with support from CAST's data compression experts, helps analyze hardware resource utilization versus compression efficiency and find the best trade-off for a specific system.

LZ4SNP-C has been designed for ease of use and integration. It operates on a standalone basis, offloading the demanding task of data compression from the host CPU. AXI4-Stream data interfaces ease SoC integration.

Technology mapping is straightforward, as the design is scan-ready, LINT-clean, microcode-free, and uses easily replaceable, generic memory models. Memory blocks can optionally support Error Correction Codes (ECC) to meet Functional Safety or Enterprise-Class reliability requirements.

# **Applications**

LZ4SNP-C, a dual-format hardware compressor supporting both LZ4 and Snappy formats, is well-suited for embedded systems, data storage controllers, networking devices, and edge/cloud accelerators where fast, flexible compression is critical. Its ability to produce standards-compliant output for either algorithm makes it ideal for platforms that must interface with third-party systems using different compression schemes. The core's high-throughput, streaming design allows real-time compression of logs, telemetry, video metadata, or database records in FPGA-based architectures. By combining performance with format versatility, LZ4SNP-C enables efficient bandwidth and storage reduction across applications in data centers, industrial automation, and mobile edge computing, without reliance on host processors or software libraries.

### **Block Diagram**



#### **FEATURES**

#### **Dual-Format Compression Engine**

- LZ4
  - Configurable block size and search window size
  - All frame and block formats
  - xxHash32 checksums
  - Dictionary support can be added on request
- Snappy
  - Configurable block and search window size
  - All frame and stream formats
  - CRC32C checksums
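
For host-side verification of the Snappy checksums, CRC32C can be recomputed in software. The sketch below is a slow, bitwise Python reference, assuming the masked-CRC convention described in the public Snappy framing format. The function names are illustrative and say nothing about how the core computes the checksum internally.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli); favors clarity over speed."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; apply the polynomial if that bit was set.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def snappy_masked_crc(data: bytes) -> int:
    """Masked CRC as stored in Snappy framing format chunks:
    rotate right by 15 bits, then add a constant (mod 2**32)."""
    c = crc32c(data)
    rotated = ((c >> 15) | (c << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


# Standard CRC-32C check value for the ASCII string "123456789".
assert crc32c(b"123456789") == 0xE3069283
```

In the framing format the checksum covers the uncompressed chunk data, so a verifier would apply `snappy_masked_crc` to each chunk after decompression.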

#### **Scalable Throughput**

- Single core, single block engine throughput is approximately 1 byte/cycle
- Single-core throughput scales linearly with the number of block engines
- More than 100Gbps with one core instance on high-end Altera FPGAs

#### **Highly Configurable**

- Compression efficiency versus area trade-off, to match application requirements
  - Silicon resource requirements and compression efficiency grow with history window size
  - Compression efficiency can be on par with the Unix/Linux default compression option
- Configuration options (partial list):
  - History search window size (up to 32KB)
  - Block size
  - Number of block engines
  - Interface bit-width
  - FIFOs and buffers sizing
  - Optional ECC memories

#### **Easy to Use and Integrate**

- Processor-free, standalone operation
- AXI4-Stream data interfaces
  - AXI4-Stream to AHB or AXI4-Lite bridge and DMAs available separately
- Optional AXI4-Lite or APB CSR interface for configuration
- Single clock domain design
- Microcode-free, LINT-clean, scan-ready design



# **Area and Performance (ALTERA)**

The LZ4SNP-C core can be mapped on any Altera FPGA, provided sufficient silicon resources are available. Its silicon resource requirements and throughput depend on its configuration.

The following table provides sample resource utilization data for the core mapped on an Agilex™ 5 device (A5EC013AB23AE3V\_E3) running at 270MHz. The core configurations listed in the table are indicative and represent a small subset of the possible configuration options.

| Configuration | Logic Resources (ALUTs) | Memory Resources (BRAMs) |
|---|---|---|
| 1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), no Snappy, no uncompressed blocks, 128B history | 2,654 | 15 |
| 1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), no Snappy, uncompressed blocks, 128B history | 2,751 | 19 |
| 1 block engine, 8-bit interfaces, 8KB max. block, no LZ4, Snappy, no uncompressed blocks, 128B history | 2,272 | 14 |
| 1 block engine, 8-bit interfaces, 8KB max. block, no LZ4, Snappy, uncompressed blocks, 128B history | 2,508 | 18 |
| 1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 128B history | 3,279 | 20 |
| 1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 512B history | 6,980 | 20 |
| 1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 2KB history | 21,754 | 20 |
| 1 block engine, 8-bit interfaces, 16KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 8KB history | 80,502 | 32 |
| 2 block engines, 16-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 512B history | 12,325 | 52 |
| 4 block engines, 32-bit interfaces, 4KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 256B history | 17,950 | 71 |
| 8 block engines, 64-bit interfaces, 2KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 128B history | 24,911 | 133 |

The core's throughput scales linearly with the number of block engines and is independent of the history size or compression algorithm. LZ4SNP-C processes one input byte per clock cycle per block engine. In configurations with multiple block engines, achieving maximum throughput requires sufficiently sized buffers at the boundaries of each block engine.
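
The scaling rule above lends itself to a back-of-the-envelope estimate. The following Python sketch assumes the ideal case of exactly one input byte per cycle per block engine with no buffer stalls; the function names are illustrative, and real figures depend on the target device, clock frequency, and buffer sizing.

```python
import math


def estimated_throughput_gbps(clock_mhz: float, block_engines: int) -> float:
    """Ideal input throughput: 1 byte per cycle per block engine, no stalls."""
    bits_per_second = clock_mhz * 1e6 * block_engines * 8
    return bits_per_second / 1e9


def engines_needed(target_gbps: float, clock_mhz: float) -> int:
    """Smallest number of block engines that reaches the target under the same ideal model."""
    return math.ceil(target_gbps / estimated_throughput_gbps(clock_mhz, 1))


# At the 270MHz clock used for the table above:
print(round(estimated_throughput_gbps(270, 1), 2))  # 2.16
print(round(estimated_throughput_gbps(270, 8), 2))  # 17.28
```

Under this model, a higher clock frequency or more block engines raises throughput proportionally, which is why the achievable rate varies with the device family and configuration.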

Please contact CAST to get characterization data for your target configuration and technology.

# **Support**

The core as delivered is warranted against defects for ninety days from purchase. Thirty days of phone and email technical support are included, starting with the first interaction. Additional maintenance and support options are available.

#### **Deliverables**

The core is available in synthesizable HDL (SystemVerilog) or targeted FPGA netlist forms and includes everything required for successful implementation. Its deliverables include:

- Sophisticated SystemVerilog test environment
- Bit-accurate software model and test vector generator
- Simulation and synthesis scripts
- Comprehensive user documentation
- IP-XACT register descriptions

## **Related Cores**

- LZ4SNP-D: LZ4/Snappy Data Decompressor
- AXI4-SGDMA: AXI4 to/from AXI4-Stream Scatter-Gather DMA Controller
- MC-SDMA: Multi-channel Streaming DMA
- MM2ST: AHB/AXI4-Lite to AXI4-Stream Bridge
- ZipAccel-D: GZIP/ZLIB/Deflate Data Decompressor
- ZipAccel-C: GZIP/ZLIB/Deflate Data Compressor

