LZ4SNP-C
LZ4/Snappy Data Compressor

LZ4SNP-C is a custom hardware implementation of a lossless data compression engine that complies with the LZ4 and Snappy compression standards. The core receives uncompressed input files and produces compressed files. No post-processing of the compressed files is required, as the core encapsulates the compressed data payload with the proper headers and footers.
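For the LZ4 format, the "headers and footers" are defined by the public LZ4 Frame Format specification. The frame contains a magic number, a frame descriptor protected by an xxHash32-based header checksum, size-prefixed data blocks, an EndMark, and an optional content checksum. The Python sketch below builds such a frame. It is not CAST's implementation, and it performs no compression. Every block is stored uncompressed, with the most significant bit of the block size set, which is the frame format's escape for incompressible data. The function names and the 8KB default block size are assumptions chosen for illustration.

```python
import struct

# xxHash32 constants from the public xxHash specification
PRIME1, PRIME2, PRIME3 = 0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D
PRIME4, PRIME5 = 0x27D4EB2F, 0x165667B1
MASK32 = 0xFFFFFFFF
LZ4_MAGIC = 0x184D2204


def _rotl(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def _round(acc: int, lane: int) -> int:
    acc = (acc + lane * PRIME2) & MASK32
    return (_rotl(acc, 13) * PRIME1) & MASK32


def xxh32(data: bytes, seed: int = 0) -> int:
    """Plain-Python xxHash32 (slow, for illustration only)."""
    n, i = len(data), 0
    if n >= 16:
        v = [(seed + PRIME1 + PRIME2) & MASK32, (seed + PRIME2) & MASK32,
             seed & MASK32, (seed - PRIME1) & MASK32]
        while i + 16 <= n:  # consume 16-byte stripes in four parallel lanes
            for k in range(4):
                lane = int.from_bytes(data[i + 4 * k:i + 4 * k + 4], "little")
                v[k] = _round(v[k], lane)
            i += 16
        h = (_rotl(v[0], 1) + _rotl(v[1], 7) + _rotl(v[2], 12) + _rotl(v[3], 18)) & MASK32
    else:
        h = (seed + PRIME5) & MASK32
    h = (h + n) & MASK32
    while i + 4 <= n:  # remaining 4-byte words
        h = (h + int.from_bytes(data[i:i + 4], "little") * PRIME3) & MASK32
        h = (_rotl(h, 17) * PRIME4) & MASK32
        i += 4
    while i < n:  # remaining single bytes
        h = (h + data[i] * PRIME5) & MASK32
        h = (_rotl(h, 11) * PRIME1) & MASK32
        i += 1
    h ^= h >> 15  # final avalanche
    h = (h * PRIME2) & MASK32
    h ^= h >> 13
    h = (h * PRIME3) & MASK32
    return h ^ (h >> 16)


def lz4_frame_uncompressed(payload: bytes, block_size: int = 8192,
                           content_checksum: bool = False) -> bytes:
    """Wrap payload in an LZ4 frame whose data blocks are all stored uncompressed."""
    if not 0 < block_size <= 64 * 1024:
        raise ValueError("block_size must be 1..65536 bytes (BD advertises 64KB)")
    flg = (0b01 << 6) | (1 << 5)      # version 01, independent blocks
    if content_checksum:
        flg |= 1 << 2                 # content checksum present
    bd = 4 << 4                       # block max size code 4 = 64KB, smallest encodable
    descriptor = bytes([flg, bd])
    header_checksum = (xxh32(descriptor) >> 8) & 0xFF
    out = bytearray(struct.pack("<I", LZ4_MAGIC) + descriptor + bytes([header_checksum]))
    for off in range(0, len(payload), block_size):
        block = payload[off:off + block_size]
        out += struct.pack("<I", 0x80000000 | len(block))  # MSB set = uncompressed block
        out += block
    out += b"\x00\x00\x00\x00"        # EndMark
    if content_checksum:
        out += struct.pack("<I", xxh32(payload))
    return bytes(out)
```

Called as `lz4_frame_uncompressed(b"", content_checksum=True)`, the function returns a 15-byte frame. The frame starts with `04 22 4D 18 64 40` and ends with the EndMark followed by `05 5D CC 02`, the little-endian xxHash32 of empty input. The descriptor always advertises a 64KB maximum block size because that is the smallest value the frame format can encode. Blocks smaller than the advertised maximum are still legal.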

The core’s flexible architecture enables fine-tuning of its compression efficiency and throughput to match the requirements of the end application. Multiple block compression engines can be instantiated internally to scale throughput. The block and history window sizes can be adjusted to favor either lower hardware resource utilization or higher compression efficiency. The computation of the optional checksums can also be disabled to reduce the size of the core.
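A toy brute-force greedy LZ77 parser, sketched below in Python, shows why a larger history window improves compression efficiency. It is not CAST's match-search architecture, and `greedy_lz77_matched_bytes` is a hypothetical helper. It only counts how many input bytes can be replaced by back-references when the parser may look back at most `window` bytes. A 4-byte minimum match is assumed, matching LZ4's minimum match length.

```python
import random


def greedy_lz77_matched_bytes(data: bytes, window: int, min_match: int = 4) -> int:
    """Count input bytes a greedy LZ77 parser can encode as back-references
    when it may only look `window` bytes into the past."""
    n, i, matched = len(data), 0, 0
    while i < n:
        best = 0
        for j in range(max(0, i - window), i):  # every candidate offset in the window
            length = 0
            while i + length < n and data[j + length] == data[i + length]:
                length += 1  # overlapping matches are allowed, as in LZ4
            best = max(best, length)
        if best >= min_match:
            matched += best  # emit a match and skip past it
            i += best
        else:
            i += 1           # emit a literal
    return matched


# 300 pseudo-random bytes repeated once: the repeat sits 300 bytes back.
block = random.Random(2024).randbytes(300)
data = block + block
for window in (128, 512):
    print(window, greedy_lz77_matched_bytes(data, window))
```

On this 600-byte input, the 128-byte window finds essentially no matches, because the repeat lies 300 bytes back. The 512-byte window encodes the whole second half as a single match. A hardware search presumably has to compare against more candidate positions as the window grows, which hints at why logic resources grow with history size in the tables that follow.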

LZ4SNP-C offers compression efficiency practically equivalent to that of the corresponding software applications. The included software model, together with support from CAST’s data compression experts, makes it easier to analyze hardware resource utilization versus compression efficiency and find the best trade-off for a specific system.

LZ4SNP-C has been designed for ease of use and integration. It operates standalone, offloading the demanding task of data compression from the host CPU. AXI4-Stream data interfaces ease SoC integration.

Technology mapping is straightforward, as the design is scan-ready, LINT-clean, microcode-free, and uses easily replaceable, generic memory models. Memory blocks can optionally support Error Correction Codes (ECC) to meet Functional Safety or Enterprise-Class reliability requirements.
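The brief does not say which error-correcting code the memories use. As a generic illustration, the Python sketch below implements a textbook extended Hamming code over 8 data bits (13 stored bits). This provides single-error-correct, double-error-detect (SECDED) protection, a common choice for ECC memories. The word width, the function names and the code itself are assumptions, not CAST's design.

```python
def _parity_bits(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1 (Hamming bound for single-error correction)."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r


def secded_encode(data: int, k: int = 8) -> int:
    """Bit 0 holds overall parity; bits 1..k+r hold a Hamming codeword whose
    parity bits sit at power-of-two positions and data bits everywhere else."""
    r = _parity_bits(k)
    n = k + r
    code, d = 0, 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two: data position
            code |= ((data >> d) & 1) << pos
            d += 1
    for p in range(r):
        mask = 1 << p
        parity = 0
        for pos in range(1, n + 1):
            if pos & mask:
                parity ^= (code >> pos) & 1
        code |= parity << mask           # parity bit lives at position `mask`
    return code | (bin(code).count("1") & 1)  # bit 0 makes total parity even


def secded_decode(code: int, k: int = 8) -> tuple[int, str]:
    """Return (data, status) with status 'ok', 'corrected' or 'uncorrectable'."""
    n = k + _parity_bits(k)
    syndrome = 0
    for pos in range(1, n + 1):
        if (code >> pos) & 1:
            syndrome ^= pos              # XOR of the positions of all set bits
    odd = bin(code).count("1") & 1
    if syndrome == 0 and odd == 0:
        status = "ok"
    elif odd:                            # single-bit error; syndrome 0 means bit 0 itself
        if syndrome > n:
            return 0, "uncorrectable"
        code ^= 1 << syndrome
        status = "corrected"
    else:                                # even parity but nonzero syndrome: two errors
        return 0, "uncorrectable"
    data, d = 0, 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):
            data |= ((code >> pos) & 1) << d
            d += 1
    return data, status
```

Flipping any one of the 13 stored bits is corrected on read. Flipping any two is reported as uncorrectable rather than silently returning wrong data.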

As a dual-format hardware compressor supporting both LZ4 and Snappy, LZ4SNP-C is well suited for embedded systems, data storage controllers, networking devices, and edge/cloud accelerators where fast, flexible compression is critical. Its ability to produce standards-compliant output in either format makes it ideal for platforms that must interface with third-party systems using different compression schemes. The core’s high-throughput streaming design allows real-time compression of logs, telemetry, video metadata, or database records in ASIC- or FPGA-based architectures. By combining performance with format versatility, LZ4SNP-C reduces bandwidth and storage requirements across applications in data centers, industrial automation, and mobile edge computing, without relying on host processors or software libraries.

LZ4SNP-C is process-independent, and its silicon resource requirements and throughput depend on its configuration. The following table provides silicon resource utilization data for the core mapped onto a 7nm ASIC technology with a 1,500MHz clock. This is not the maximum clock frequency the core can reach in this technology. The configurations listed in the table are indicative and represent a small subset of the possible configuration options.

| Block engines | Interface width | Max. block size | LZ4 | Snappy | Uncompressed blocks | History window | Logic (eq. gates) | Memory (bits) |
|---|---|---|---|---|---|---|---|---|
| 1 | 8-bit | 8KB | Yes (no checksums) | No | No | 128B | 59,950 | 137,328 |
| 1 | 8-bit | 8KB | Yes (no checksums) | No | Yes | 128B | 62,453 | 202,864 |
| 1 | 8-bit | 8KB | No | Yes | No | 128B | 44,088 | 137,148 |
| 1 | 8-bit | 8KB | No | Yes | Yes | 128B | 46,623 | 202,684 |
| 1 | 8-bit | 8KB | Yes (no checksums) | Yes | Yes | 128B | 69,156 | 202,992 |
| 1 | 8-bit | 8KB | Yes (no checksums) | Yes | Yes | 512B | 141,834 | 203,056 |
| 1 | 8-bit | 8KB | Yes (no checksums) | Yes | Yes | 2KB | 431,850 | 203,184 |
| 1 | 8-bit | 16KB | Yes (no checksums) | Yes | Yes | 8KB | 1,686,574 | 399,924 |
| 2 | 16-bit | 8KB | Yes (no checksums) | Yes | Yes | 512B | 296,361 | 545,896 |
| 4 | 32-bit | 4KB | Yes (no checksums) | Yes | Yes | 256B | 390,467 | 554,352 |
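Three rows of the ASIC table differ only in history size: 128B, 512B and 2KB, each with one block engine, 8-bit interfaces, 8KB blocks, LZ4 without checksums, Snappy and uncompressed blocks. These rows can be compared directly. The short Python calculation below derives the incremental cost per added byte of history from the published numbers. It is a back-of-the-envelope reading of this table, not a sizing model from CAST.

```python
# (history window in bytes, logic in eq. gates, memory in bits), copied from the
# 7nm table rows with 1 block engine, 8-bit interfaces, 8KB max. block,
# LZ4 (no checksums), Snappy and uncompressed blocks enabled.
ROWS = [(128, 69_156, 202_992), (512, 141_834, 203_056), (2048, 431_850, 203_184)]


def incremental_cost(rows):
    """Per-extra-history-byte cost (gates, memory bits) between consecutive rows."""
    costs = []
    for (h0, g0, m0), (h1, g1, m1) in zip(rows, rows[1:]):
        extra = h1 - h0
        costs.append(((g1 - g0) / extra, (m1 - m0) / extra))
    return costs


for (h0, _, _), (h1, _, _), (gates, bits) in zip(ROWS, ROWS[1:], incremental_cost(ROWS)):
    print(f"{h0}B -> {h1}B history: {gates:.1f} eq. gates/byte, {bits:.3f} memory bits/byte")
```

Both intervals come out at roughly 189 eq. gates per added history byte, while memory grows by well under one bit per byte. This suggests that, in these configurations, a larger history window costs mostly logic rather than memory. The 8KB-history row is left out because it also doubles the maximum block size.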

The core’s throughput scales linearly with the number of block engines and is independent of the history size and compression algorithm: LZ4SNP-C processes one input byte per clock cycle per block engine. In configurations with multiple block engines, achieving maximum throughput requires adequately sized buffers at the boundaries of each block engine. Please contact CAST for characterization data for your target configuration and technology.
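Because each block engine consumes one input byte per clock cycle, nominal throughput is simply engines × clock frequency × 8 bits. The Python helpers below (hypothetical names) apply that arithmetic to the clock frequencies quoted for the three tables. The results are upper bounds that assume no stalls, so real throughput also depends on buffering and backpressure.

```python
import math


def nominal_throughput_gbps(block_engines: int, clock_mhz: float) -> float:
    """Peak input throughput in Gbit/s, assuming every block engine accepts one
    byte per clock cycle and is never stalled by backpressure or buffering."""
    return block_engines * clock_mhz * 1e6 * 8 / 1e9


def engines_for_target(target_gbps: float, clock_mhz: float) -> int:
    """Smallest number of block engines whose nominal throughput reaches target_gbps."""
    return math.ceil(target_gbps / nominal_throughput_gbps(1, clock_mhz))


# Clock frequencies quoted for the ASIC, Agilex 5 and Artix UltraScale+ tables
for label, mhz in (("7nm ASIC", 1500), ("Agilex 5", 270), ("Artix UltraScale+", 280)):
    rates = ", ".join(f"{e}x: {nominal_throughput_gbps(e, mhz):.2f}" for e in (1, 2, 4, 8))
    print(f"{label} @ {mhz}MHz -> Gbps per engine count {rates}")
```

For example, four block engines at 1,500MHz give a nominal 48Gbps. Inverting the formula, `engines_for_target(100, 1500)` returns 9 and `engines_for_target(100, 280)` returns 45.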

The LZ4SNP-C core can be mapped onto any Altera FPGA with sufficient silicon resources. Its silicon resource requirements and throughput depend on its configuration. The following table provides sample resource utilization data for the core mapped onto an Agilex™ 5 device (A5EC013AB23AE3V_E3) running at 270MHz. The configurations listed in the table are indicative and represent a small subset of the possible configuration options.

| Block engines | Interface width | Max. block size | LZ4 | Snappy | Uncompressed blocks | History window | Logic (ALUTs) | Memory (BRAMs) |
|---|---|---|---|---|---|---|---|---|
| 1 | 8-bit | 8KB | Yes (no checksums) | No | No | 128B | 2,654 | 15 |
| 1 | 8-bit | 8KB | Yes (no checksums) | No | Yes | 128B | 2,751 | 19 |
| 1 | 8-bit | 8KB | No | Yes | No | 128B | 2,272 | 14 |
| 1 | 8-bit | 8KB | No | Yes | Yes | 128B | 2,508 | 18 |
| 1 | 8-bit | 8KB | Yes (no checksums) | Yes | Yes | 128B | 3,279 | 20 |
| 1 | 8-bit | 8KB | Yes (no checksums) | Yes | Yes | 512B | 6,980 | 20 |
| 1 | 8-bit | 8KB | Yes (no checksums) | Yes | Yes | 2KB | 21,754 | 20 |
| 1 | 8-bit | 16KB | Yes (no checksums) | Yes | Yes | 8KB | 80,502 | 32 |
| 2 | 16-bit | 8KB | Yes (no checksums) | Yes | Yes | 512B | 12,325 | 52 |
| 4 | 32-bit | 4KB | Yes (no checksums) | Yes | Yes | 256B | 17,950 | 71 |
| 8 | 64-bit | 2KB | Yes (no checksums) | Yes | Yes | 128B | 24,911 | 133 |


The LZ4SNP-C core can be mapped onto any AMD FPGA with sufficient silicon resources. Its silicon resource requirements and throughput depend on its configuration. The following table provides sample resource utilization data for the core mapped onto an Artix UltraScale+™ device (speed grade -1-e) running at 280MHz. This is not the maximum achievable frequency for this device. The configurations listed in the table are indicative and represent a small subset of the possible configuration options.

| Block engines | Interface width | Max. block size | LZ4 | Snappy | Uncompressed blocks | History window | Logic (LUTs) | Memory (BRAMs) |
|---|---|---|---|---|---|---|---|---|
| 1 | 8-bit | 8KB | Yes (no checksums) | No | No | 128B | 2,014 | 4 |
| 1 | 8-bit | 8KB | Yes (no checksums) | No | Yes | 128B | 2,183 | 6 |
| 1 | 8-bit | 8KB | No | Yes | No | 128B | 2,089 | 4 |
| 1 | 8-bit | 8KB | No | Yes | Yes | 128B | 2,222 | 6 |
| 1 | 8-bit | 8KB | Yes (no checksums) | Yes | Yes | 128B | 2,843 | 6 |
| 1 | 8-bit | 8KB | Yes (no checksums) | Yes | Yes | 512B | 5,476 | 6 |
| 1 | 8-bit | 8KB | Yes (no checksums) | Yes | Yes | 2KB | 17,771 | 6 |
| 1 | 8-bit | 16KB | Yes (no checksums) | Yes | Yes | 8KB | 65,181 | 12 |
| 2 | 16-bit | 8KB | Yes (no checksums) | Yes | Yes | 512B | 11,818 | 17 |
| 4 | 32-bit | 4KB | Yes (no checksums) | Yes | Yes | 256B | 15,564 | 18 |
| 8 | 64-bit | 2KB | Yes (no checksums) | Yes | Yes | 128B | 26,527 | 31 |


Deliverables

The core is available in synthesizable HDL (SystemVerilog) or targeted FPGA netlist forms and includes everything required for successful implementation. Its deliverables include:

  • Sophisticated SystemVerilog test environment
  • Bit-accurate software model and test vector generator
  • Simulation and synthesis scripts
  • Comprehensive user documentation
  • IP-XACT register descriptions


Features List

Dual-Format Compression Engine

  • LZ4
    • Configurable block size and search window size
    • All frame and block formats
    • xxHash32 checksums
    • Dictionary support can be added on request
  • Snappy
    • Configurable block size and search window size
    • All frame and stream formats
    • CRC32C checksums
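On the Snappy side, the framing format wraps data in chunks. A stream identifier chunk comes first, followed by data chunks that each carry a masked CRC32C of their uncompressed content. The Python sketch below follows the public Snappy framing format description. It emits only uncompressed (type 0x01) chunks, so it shows the framing and checksum handling, not Snappy compression. All function names are hypothetical.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    """CRC masking defined by the Snappy framing format."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


STREAM_IDENTIFIER = b"\xff\x06\x00\x00sNaPpY"  # chunk type 0xff, length 6, "sNaPpY"


def snappy_frame_uncompressed(payload: bytes, chunk_size: int = 65536) -> bytes:
    """Wrap payload in a Snappy framed stream using only uncompressed (0x01) chunks."""
    if not 0 < chunk_size <= 65536:
        raise ValueError("uncompressed chunks carry at most 65536 data bytes")
    out = bytearray(STREAM_IDENTIFIER)
    for off in range(0, len(payload), chunk_size):
        data = payload[off:off + chunk_size]
        body = struct.pack("<I", masked_crc32c(data)) + data  # checksum covers uncompressed data
        out += bytes([0x01]) + len(body).to_bytes(3, "little") + body  # type + 24-bit length
    return bytes(out)
```

`crc32c(b"123456789")` returns the standard CRC-32C check value 0xE3069283. A 70,000-byte payload is split into a 65,536-byte chunk and a 4,464-byte chunk.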

Scalable Throughput

  • A single core with one block engine delivers approximately 1 byte/cycle
  • Single-core throughput scales linearly with the number of block engines
  • More than 100Gbps with one core instance even on FPGA targets

Highly Configurable

  • Compression efficiency vs. area trade-off to match application requirements
    • Silicon resource requirements and compression efficiency grow with history window size
    • Compression efficiency can be on par with the default Unix/Linux compression option
  • Configuration options (partial list):
    • History search window size (up to 32KB)
    • Block size
    • Number of block engines
    • Interface bit width
    • FIFO and buffer sizing
    • Optional ECC memories

Easy to Use and Integrate

  • Processor-free, standalone operation
  • AXI4-Stream data interfaces
    • AXI4-Stream to AHB or AXI4-Lite bridge and DMAs available separately
  • Optional AXI4-Lite or APB CSR interface for configuration
  • Single clock domain design
  • Microcode-free, LINT-clean, scan-ready design
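The AXI4-Stream handshake behind these interfaces is simple: a beat transfers only in a clock cycle where the source drives TVALID high and the sink drives TREADY high. The Python cycle model below (a hypothetical helper, not part of the deliverables) shows how sink backpressure stretches a transfer. This brief does not describe how the core uses TLAST, TKEEP or TUSER, so marking the final beat with TLAST is an assumption.

```python
from collections import deque


def simulate_axis(beats: list[int], tready_pattern: list[bool]) -> tuple[list[tuple[int, bool]], int]:
    """Cycle model of one AXI4-Stream link with a source that always has data.

    A beat (TDATA, TLAST) is transferred only in a cycle where TVALID and TREADY
    are both high; otherwise the source holds the same beat and tries again.
    Returns the received (tdata, tlast) pairs and the number of cycles used.
    """
    if not any(tready_pattern):
        raise ValueError("sink never asserts TREADY; the transfer would never finish")
    pending = deque(beats)
    received = []
    cycle = 0
    while pending:
        tvalid = True                                   # source never idles in this model
        tready = tready_pattern[cycle % len(tready_pattern)]
        if tvalid and tready:
            tlast = len(pending) == 1                   # assumed: TLAST flags the final beat
            received.append((pending.popleft(), tlast))
        cycle += 1
    return received, cycle
```

With a sink that is ready every other cycle, `simulate_axis([1, 2, 3, 4], [True, False])` needs 7 cycles instead of 4. Sustained backpressure of this kind would roughly halve the input rate of a block engine that otherwise consumes one byte per cycle.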
