Silicon IP Cores
LZ4SNP-C
LZ4/Snappy Data Compressor
LZ4SNP-C is a custom hardware implementation of a lossless data compression engine that complies with the LZ4 and Snappy compression standards. The core receives uncompressed input files and produces compressed files. No post-processing of the compressed files is required, as the core encapsulates the compressed data payload with the proper headers and footers.
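Because the core emits complete, standards-compliant containers, a downstream tool can recognize the output by its leading bytes alone. The following is a minimal sketch, assuming the core is configured to produce the LZ4 frame format or the Snappy framing format (raw block output carries no such signature). The function name `detect_container` is hypothetical and is not part of the core's deliverables. The magic values are those defined by the public LZ4 frame and Snappy framing format specifications.

```python
import struct

LZ4_FRAME_MAGIC = 0x184D2204                  # LZ4 frame magic number, stored little-endian
SNAPPY_STREAM_ID = b"\xff\x06\x00\x00sNaPpY"  # Snappy framing-format stream identifier chunk


def detect_container(data: bytes) -> str:
    """Return 'lz4-frame', 'snappy-framed', or 'unknown' based on the leading bytes.

    Hypothetical helper for illustration; it only inspects the signature and
    does not validate the rest of the stream.
    """
    if len(data) >= 4 and struct.unpack_from("<I", data)[0] == LZ4_FRAME_MAGIC:
        return "lz4-frame"
    if data.startswith(SNAPPY_STREAM_ID):
        return "snappy-framed"
    return "unknown"
```

A check like this can serve as a quick sanity test on captured output before handing the file to a standard software decompressor.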
The core’s flexible architecture enables fine-tuning of its compression efficiency and throughput to match the requirements of the end application. More than one block compression engine can be internally instantiated to scale throughput, while block and history window sizes can be adjusted to optimize either hardware resource utilization or compression efficiency. Furthermore, the computation of the optional checksums can be disabled to reduce the size of the core.
LZ4SNP-C offers compression efficiency practically equivalent to that of the corresponding software applications. The included software model, together with support from CAST’s data compression experts, helps designers weigh hardware resource utilization against compression efficiency and find the best tradeoff for a specific system.
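The effect of the history window on compression efficiency can be illustrated with a toy experiment. The sketch below is not the core's match-search logic and not CAST's software model. It is a brute-force, greedy LZ77-style matcher written only for illustration, and the name `matched_fraction` and its parameters are assumptions. It reports the fraction of input bytes that could be replaced by back-references when the search is limited to the previous `history` bytes.

```python
def matched_fraction(data: bytes, history: int, min_match: int = 4) -> float:
    """Fraction of bytes covered by back-references found within `history` bytes.

    Toy model: greedy, brute-force search; matches shorter than `min_match`
    are treated as literals. Overlapping matches are allowed, as in LZ77.
    """
    pos, covered, n = 0, 0, len(data)
    while pos < n:
        best = 0
        for cand in range(max(0, pos - history), pos):
            length = 0
            while pos + length < n and data[cand + length] == data[pos + length]:
                length += 1
            best = max(best, length)
        if best >= min_match:
            covered += best   # these bytes would be encoded as one match
            pos += best
        else:
            pos += 1          # emit a literal and move on
    return covered / n if n else 0.0


# A 200-byte record repeated four times: repeats sit 200 bytes apart.
data = bytes(range(200)) * 4
small = matched_fraction(data, history=128)   # 0.0, repeats are out of reach
large = matched_fraction(data, history=256)   # 0.75, one long match is found
```

In this contrived input, a 128-byte window never sees the previous copy of the record, while a 256-byte window does. Real data is less clear-cut, which is why a bit-accurate model run on representative data is the appropriate tool for choosing a configuration.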
LZ4SNP-C has been designed for ease of use and integration. It operates on a standalone basis, off-loading the host CPU from the demanding task of data compression. AXI4-Stream data interfaces ease SoC integration.
Technology mapping is straightforward, as the design is scan-ready, LINT-clean, microcode-free, and uses easily replaceable, generic memory models. Memory blocks can optionally support Error Correction Codes (ECC) to meet Functional Safety or Enterprise-Class reliability requirements.
LZ4SNP-C, a dual-format hardware compressor supporting both LZ4 and Snappy, is well suited for embedded systems, data storage controllers, networking devices, and edge/cloud accelerators where fast, flexible compression is critical. Its ability to produce standards-compliant output for either algorithm makes it ideal for platforms that must interface with third-party systems using different compression schemes. The core’s high-throughput, streaming design allows real-time compression of logs, telemetry, video metadata, or database records in ASIC or FPGA-based architectures. By combining performance with format versatility, LZ4SNP-C enables efficient bandwidth and storage reduction across applications in data centers, industrial automation, and mobile edge computing, without reliance on host processors or software libraries.
LZ4SNP-C is process-independent, and its silicon resource requirements and throughput depend on its configuration. The following table provides silicon resource utilization data for the core mapped on a 7nm ASIC technology with its clock set to 1,500MHz, which is not the maximum clock frequency the core can reach in this technology. The core configurations listed in the table are indicative and represent a small subset of the possible configuration options.
Configuration | Logic Resources (eq. Gates) | Memory Resources (bits) |
---|---|---|
1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), no Snappy, no uncompressed blocks, 128B history | 59,950 | 137,328 |
1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), no Snappy, uncompressed blocks, 128B history | 62,453 | 202,864 |
1 block engine, 8-bit interfaces, 8KB max. block, no LZ4, Snappy, no uncompressed blocks, 128B history | 44,088 | 137,148 |
1 block engine, 8-bit interfaces, 8KB max. block, no LZ4, Snappy, uncompressed blocks, 128B history | 46,623 | 202,684 |
1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 128B history | 69,156 | 202,992 |
1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 512B history | 141,834 | 203,056 |
1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 2KB history | 431,850 | 203,184 |
1 block engine, 8-bit interfaces, 16KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 8KB history | 1,686,574 | 399,924 |
2 block engines, 16-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 512B history | 296,361 | 545,896 |
4 block engines, 32-bit interfaces, 4KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 256B history | 390,467 | 554,352 |
The core’s throughput scales linearly with the number of block engines and is independent of the history size or compression algorithm. The LZ4SNP-C processes one input byte per clock cycle per block engine. In configurations with multiple block engines, achieving maximum throughput requires sufficiently sized buffers at the boundaries of each block engine. Please contact CAST to get characterization data for your target configuration and technology.
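The throughput rule above reduces to simple arithmetic. The sketch below assumes exactly one input byte per clock cycle per block engine and ignores any stalls caused by undersized buffers or interface back-pressure, so it gives a peak figure rather than characterized data. The function names are hypothetical, and the second helper is arithmetic only; it does not imply that a given engine count is an offered configuration.

```python
import math


def peak_throughput_gbps(block_engines: int, clock_mhz: float) -> float:
    """Peak input throughput in Gbit/s, assuming 1 byte/cycle per block engine."""
    bytes_per_second = block_engines * clock_mhz * 1e6
    return bytes_per_second * 8 / 1e9


def engines_for_target(target_gbps: float, clock_mhz: float) -> int:
    """Smallest number of block engines whose peak throughput meets the target."""
    return math.ceil(target_gbps / peak_throughput_gbps(1, clock_mhz))


asic_single = peak_throughput_gbps(1, 1500)   # 12.0 Gbit/s at the quoted 1,500MHz
fpga_eight = peak_throughput_gbps(8, 270)     # 17.28 Gbit/s at the quoted 270MHz
```

For example, one block engine at the 1,500MHz ASIC clock quoted above peaks at 12 Gbit/s of uncompressed input.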
The LZ4SNP-C core can be mapped on any Altera FPGA, provided sufficient silicon resources are available. Its silicon resource requirements and throughput depend on its configuration. The following table provides sample resource utilization data for the core mapped on an Agilex™ 5 device (A5EC013AB23AE3V_E3) running at 270MHz. The core configurations listed in the table are indicative and represent a small subset of the possible configuration options.
Configuration | Logic Resources (ALUTs) | Memory Resources (BRAMs) |
---|---|---|
1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), no Snappy, no uncompressed blocks, 128B history | 2,654 | 15 |
1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), no Snappy, uncompressed blocks, 128B history | 2,751 | 19 |
1 block engine, 8-bit interfaces, 8KB max. block, no LZ4, Snappy, no uncompressed blocks, 128B history | 2,272 | 14 |
1 block engine, 8-bit interfaces, 8KB max. block, no LZ4, Snappy, uncompressed blocks, 128B history | 2,508 | 18 |
1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 128B history | 3,279 | 20 |
1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 512B history | 6,980 | 20 |
1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 2KB history | 21,754 | 20 |
1 block engine, 8-bit interfaces, 16KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 8KB history | 80,502 | 32 |
2 block engines, 16-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 512B history | 12,325 | 52 |
4 block engines, 32-bit interfaces, 4KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 256B history | 17,950 | 71 |
8 block engines, 64-bit interfaces, 2KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 128B history | 24,911 | 133 |
The LZ4SNP-C core can be mapped on any AMD FPGA, provided sufficient silicon resources are available. Its silicon resource requirements and throughput depend on its configuration. The following table provides sample resource utilization data for the core mapped on an Artix UltraScale+™ device (speed grade -1-e) running at 280MHz, which is not the highest frequency the core can reach on this device. The core configurations listed in the table are indicative and represent a small subset of the possible configuration options.
Configuration | Logic Resources (LUTs) | Memory Resources (BRAMs) |
---|---|---|
1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), no Snappy, no uncompressed blocks, 128B history | 2,014 | 4 |
1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), no Snappy, uncompressed blocks, 128B history | 2,183 | 6 |
1 block engine, 8-bit interfaces, 8KB max. block, no LZ4, Snappy, no uncompressed blocks, 128B history | 2,089 | 4 |
1 block engine, 8-bit interfaces, 8KB max. block, no LZ4, Snappy, uncompressed blocks, 128B history | 2,222 | 6 |
1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 128B history | 2,843 | 6 |
1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 512B history | 5,476 | 6 |
1 block engine, 8-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 2KB history | 17,771 | 6 |
1 block engine, 8-bit interfaces, 16KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 8KB history | 65,181 | 12 |
2 block engines, 16-bit interfaces, 8KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 512B history | 11,818 | 17 |
4 block engines, 32-bit interfaces, 4KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 256B history | 15,564 | 18 |
8 block engines, 64-bit interfaces, 2KB max. block, LZ4 (no checksums), Snappy, uncompressed blocks, 128B history | 26,527 | 31 |
Deliverables
The core is available in synthesizable HDL (SystemVerilog) or targeted FPGA netlist forms and includes everything required for successful implementation. Its deliverables include:
- Sophisticated SystemVerilog test environment
- Bit-accurate software model and test vector generator
- Simulation and synthesis scripts
- Comprehensive user documentation
- IP-XACT register descriptions
Features List
Dual-Format Compression Engine
- LZ4
  - Configurable block size and search window size
  - All frame and block formats
  - xxHash32 checksums
  - Dictionary support can be added on request
- Snappy
  - Configurable block and search window size
  - All frame and stream formats
  - CRC32C checksums
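When checking Snappy output from the core against software, it helps to have a reference for the checksum. The sketch below is a plain software reference, not the core's RTL. It computes CRC-32C (Castagnoli) bit by bit and then applies the masking step that the public Snappy framing format specifies for its chunk checksums. That the core's CRC32C option corresponds to this masked form is an assumption here, and the function names are hypothetical.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), using the reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; apply the polynomial when the dropped bit is 1.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def snappy_masked_crc(data: bytes) -> int:
    """CRC-32C with the rotate-and-add mask used by the Snappy framing format."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


check = crc32c(b"123456789")   # 0xE3069283, the standard CRC-32C check value
```

The bitwise loop is slow but easy to audit, which makes it a reasonable choice for verifying a handful of test vectors.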
Scalable Throughput
- Single core, single block engine throughput is approximately 1 byte/cycle
- Single-core throughput scales linearly with the number of block engines
- More than 100Gbps with one core instance even on FPGA targets
Highly Configurable
- Compression efficiency vs. area trade-off, to match application requirements
- Silicon resource requirements and compression efficiency grow with history window size
- Compression efficiency can be on par with the Unix/Linux default compression option
- Configuration options (partial list):
  - History search window size (up to 32KB)
  - Block size
  - Number of block engines
  - Interface bit-width
  - FIFO and buffer sizing
  - Optional ECC memories
Easy to Use and Integrate
- Processor-free, standalone operation
- AXI4-Stream data interfaces
  - AXI4-Stream to AHB or AXI4-Lite bridge and DMAs available separately
- Optional AXI4-Lite or APB CSR interface for configuration
- Single clock domain design
- Microcode-free, LINT-clean, scan-ready design