ZipAccel-D
GUNZIP/ZLIB/Inflate Data Decompression

ZipAccel-D is a custom hardware implementation of a lossless data decompression engine that complies with the Inflate/Deflate, GZIP/GUNZIP, and ZLIB compression standards.

The core features fast processing, with low latency and high throughput. On average, the core outputs three bytes of decompressed data per clock cycle, providing over 15 Gbps in a typical 40nm technology. Designers can scale throughput further by instantiating the core multiple times, reaching rates above 100 Gbps. Latency is on the order of a few tens of clock cycles for blocks coded with static Huffman tables, and typically less than 2,000 cycles for blocks coded with dynamic Huffman tables.
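The throughput figures follow from simple arithmetic: at an average of three decompressed bytes per clock, each megahertz of clock frequency yields 24 Mbps, so 15 Gbps implies a clock of roughly 625 MHz or more. The Python sketch below computes per-instance throughput, the clock needed for a target rate, and the instance count for an aggregate rate. It assumes the three-bytes-per-clock average holds and that instances scale linearly on independent streams; the function names are illustrative only.

```python
import math

BYTES_PER_CLOCK = 3   # average decompressed output per clock, as quoted for the core
BITS_PER_BYTE = 8


def throughput_gbps(clock_mhz: float, bytes_per_clock: float = BYTES_PER_CLOCK) -> float:
    """Average decompressed output rate of one instance, in Gbps."""
    return clock_mhz * bytes_per_clock * BITS_PER_BYTE / 1000.0


def clock_for_throughput(target_gbps: float, bytes_per_clock: float = BYTES_PER_CLOCK) -> float:
    """Clock frequency (MHz) one instance needs to average target_gbps."""
    return target_gbps * 1000.0 / (bytes_per_clock * BITS_PER_BYTE)


def instances_needed(aggregate_gbps: float, per_instance_gbps: float) -> int:
    """Smallest instance count reaching aggregate_gbps, assuming ideal linear
    scaling (each instance decompresses its own independent streams)."""
    return math.ceil(aggregate_gbps / per_instance_gbps)


if __name__ == "__main__":
    print(f"{throughput_gbps(135):.2f} Gbps at 135 MHz")
    print(f"{clock_for_throughput(15):.0f} MHz needed for 15 Gbps")
    print(instances_needed(100, 15), "instances of 15 Gbps each for 100 Gbps")
```

Under these assumptions, seven instances running at 15 Gbps each deliver about 105 Gbps in aggregate.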

The decompression core has been designed for ease of use and integration. It operates on a standalone basis, relieving the host CPU of the demanding task of data decompression. The core receives compressed input files and outputs decompressed files. No preprocessing of the compressed files is required: the core parses the file headers, checks the input files for errors, and outputs the decompressed data payload. Extensive error tracking and reporting enable smooth system operation and error recovery, even when the compressed input files contain errors. Furthermore, internal memories can optionally support Error Correction Codes (ECC), simplifying compliance with enterprise-class reliability or functional safety requirements.
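As an illustration of the header parsing and checking described above, the Python sketch below validates a GZIP member header following RFC 1952, section 2.3: the magic bytes, the Deflate method code, the reserved flag bits, the optional FEXTRA, FNAME, and FCOMMENT fields, and the optional header CRC16. It is a software model for explanation only; `parse_gzip_header` and `HeaderError` are hypothetical names, not part of the core's deliverables.

```python
import gzip
import struct
import zlib

# FLG bits defined in RFC 1952, section 2.3.1
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10


class HeaderError(ValueError):
    """Raised when a GZIP member header is malformed."""


def _read_cstring(data: bytes, pos: int) -> tuple[str, int]:
    """Read a zero-terminated ISO 8859-1 string; return (text, next position)."""
    end = data.find(b"\x00", pos)
    if end < 0:
        raise HeaderError("unterminated string field")
    return data[pos:end].decode("latin-1"), end + 1


def parse_gzip_header(data: bytes) -> dict:
    """Validate a GZIP member header and return its fields and payload offset."""
    if len(data) < 10:
        raise HeaderError("truncated fixed header")
    id1, id2, cm, flg, mtime, xfl, os_id = struct.unpack("<BBBBIBB", data[:10])
    if (id1, id2) != (0x1F, 0x8B):
        raise HeaderError("bad magic number")
    if cm != 8:
        raise HeaderError("compression method is not Deflate (CM=8)")
    if flg & 0xE0:
        raise HeaderError("reserved FLG bits are set")
    fields = {"mtime": mtime, "xfl": xfl, "os": os_id, "text": bool(flg & FTEXT)}
    pos = 10
    if flg & FEXTRA:
        if len(data) < pos + 2:
            raise HeaderError("truncated FEXTRA length")
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2 + xlen
    if flg & FNAME:
        fields["name"], pos = _read_cstring(data, pos)
    if flg & FCOMMENT:
        fields["comment"], pos = _read_cstring(data, pos)
    if flg & FHCRC:
        if len(data) < pos + 2:
            raise HeaderError("truncated header CRC")
        (crc16,) = struct.unpack_from("<H", data, pos)
        # CRC16 = low 16 bits of the CRC-32 of all header bytes before it
        if crc16 != zlib.crc32(data[:pos]) & 0xFFFF:
            raise HeaderError("header CRC mismatch")
        pos += 2
    if pos > len(data):
        raise HeaderError("truncated header")
    fields["payload_offset"] = pos
    return fields


if __name__ == "__main__":
    member = gzip.compress(b"hello, world")
    info = parse_gzip_header(member)
    payload = member[info["payload_offset"]:-8]  # strip the 8-byte CRC32/ISIZE trailer
    print(zlib.decompress(payload, -15))         # raw Deflate decode of the payload
```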

The ZipAccel-D core was developed for reuse in ASIC and FPGA implementations. Streaming data interfaces, optionally bridged to AMBA AXI4-Stream, ease SoC integration. Technology mapping is straightforward, as the design is scan-ready, microcode-free, and uses easily replaceable, generic memory models.

Verification

The core has been verified through extensive synthesis, place-and-route, and simulation runs. It has also been embedded in several commercially shipping products and is proven in both ASIC and FPGA technologies.

The core has been verified for interoperability with a number of software applications that use GZIP, ZLIB, or Deflate compression.
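The GZIP, ZLIB, and raw Deflate formats all carry the same Deflate bitstream and differ only in their wrappers, so one payload can be packaged three ways for interoperability testing. The sketch below uses Python's standard `zlib` module, where the `wbits` argument selects raw Deflate (-15), ZLIB (15), or GZIP (31) framing and the `zlib.Z_FIXED` strategy favors static Huffman blocks. It is one illustrative way to produce test vectors from a software reference, not a description of CAST's test environment; `make_vectors` and `round_trip_ok` are hypothetical helper names.

```python
import zlib

# In Python's zlib module, wbits selects the container format:
#   -15 -> raw Deflate (RFC 1951), 15 -> ZLIB (RFC 1950), 31 -> GZIP (RFC 1952)
FORMATS = {"deflate": -15, "zlib": 15, "gzip": 31}


def make_vectors(payload: bytes, level: int = 6,
                 strategy: int = zlib.Z_DEFAULT_STRATEGY) -> dict[str, bytes]:
    """Compress one payload into all three container formats.

    strategy=zlib.Z_FIXED favors static Huffman blocks, e.g. to exercise a
    static-only decoder configuration (zlib may still emit stored blocks for
    incompressible data).
    """
    vectors = {}
    for name, wbits in FORMATS.items():
        comp = zlib.compressobj(level=level, wbits=wbits, strategy=strategy)
        vectors[name] = comp.compress(payload) + comp.flush()
    return vectors


def round_trip_ok(vectors: dict[str, bytes], payload: bytes) -> bool:
    """Check that each vector decompresses back to the payload in software."""
    return all(zlib.decompress(data, FORMATS[name]) == payload
               for name, data in vectors.items())


if __name__ == "__main__":
    payload = b"ZipAccel-D interoperability test vector. " * 100
    vectors = make_vectors(payload)
    for name, data in vectors.items():
        print(f"{name:8s} {len(data):5d} bytes, header {data[:2].hex()}")
    print("software round trip OK:", round_trip_ok(vectors, payload))
```

The resulting files could then be fed to a decompressor under test and its output compared with the original payload.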

Deliverables

The core is available in ASIC (synthesizable HDL) and FPGA (netlist) forms, and includes everything required for successful implementation. The ASIC version includes:

HDL (Verilog) RTL source code
Sophisticated test environment
Simulation scripts, test vectors and expected results
Synthesis script
Comprehensive user documentation

Support

The core as delivered is warranted against defects for ninety days from purchase. Thirty days of phone and email technical support are included, starting with the first interaction. Additional maintenance and support options are available.

ASIC Implementation Results

ZipAccel-D's silicon resource requirements and throughput depend on its configuration, and performance can be scaled further by using multiple core instances.

Throughputs above 100 Gbps are feasible, and the silicon footprint can be less than 200k gates. Contact CAST Sales for help defining likely configuration options and estimating implementation results for your specific system.

Altera Implementation Results

ZipAccel-D can be mapped to any Altera® FPGA device, provided sufficient resources are available. The FPGA resource requirements and throughput depend on the core’s configuration, and performance can be scaled further by using multiple core instances.

The following table provides sample performance and resource utilization data for different configurations of the core on an Arria 10 device. These sample results represent neither the smallest possible area nor the highest possible clock frequency. Please contact CAST to obtain characterization data for your target configuration and technology.

Family / Device	Huffman Tables	History Window (bytes)	Freq. (MHz)	Logic Resources	Memory Resources	Throughput (Gbps)
Arria 10 GX 1150	Dynamic	512	135	7,348 ALMs	154,178 bits	3.24
	Dynamic	1,024	135	7,470 ALMs	158,786 bits	3.24
	Dynamic	2,048	135	7,334 ALMs	171,074 bits	3.24
	Dynamic	4,096	135	7,329 ALMs	187,714 bits	3.24
	Dynamic	8,192	135	7,378 ALMs	220,738 bits	3.24
	Dynamic	16,384	135	7,360 ALMs	286,530 bits	3.24
	Dynamic	32,768	130	7,385 ALMs	417,858 bits	3.12
	Static	512	160	4,914 ALMs	25,680 bits	3.84
	Static	1,024	160	5,012 ALMs	30,288 bits	3.84
	Static	2,048	135	4,824 ALMs	42,576 bits	3.24
	Static	4,096	160	4,862 ALMs	59,216 bits	3.84
	Static	8,192	125	4,882 ALMs	92,240 bits	3.00
	Static	16,384	140	4,917 ALMs	158,032 bits	3.36
	Static	32,768	155	4,957 ALMs	289,350 bits	3.72
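Every throughput figure in the table above is consistent with the three-bytes-per-clock average: throughput equals the clock frequency times 24 bits. The same relationship holds for the AMD and Lattice results (for example, 82 MHz gives 1.968 Gbps, listed as 1.97).

```latex
T\,[\text{Gbps}] = \frac{f_{\text{clk}}\,[\text{MHz}] \times 3\,\tfrac{\text{bytes}}{\text{clock}} \times 8\,\tfrac{\text{bits}}{\text{byte}}}{1000},
\qquad
\frac{135 \times 24}{1000} = 3.24,
\quad
\frac{130 \times 24}{1000} = 3.12,
\quad
\frac{155 \times 24}{1000} = 3.72
```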

AMD Implementation Results

The ZipAccel-D core can be mapped to any AMD FPGA device, provided sufficient resources are available. The FPGA resource requirements and throughput depend on the core’s configuration, and performance can be scaled further by using multiple core instances.

The following are sample implementation results for two configurations of the core on AMD devices. These results represent neither the smallest possible area nor the highest possible clock frequency.

Family / Device	Freq. (MHz)	Logic Resources	Memory Resources	Throughput (Gbps)
Spartan-7 xc7s100-1	75	12,593 LUTs	23 BRAM Tiles	1.8
Kintex UltraScale xcku085-1-c	200	14,761 LUTs	12 BRAM Tiles	4.8
Artix UltraScale+ xcau25p-1-e	125	13,669 LUTs	12 BRAM Tiles	3.0
Kintex UltraScale+ xcku9p-1-e	200	14,642 LUTs	12 BRAM Tiles	4.8
Versal Premium xcvp1202-2MP-e-L	200	16,791 LUTs	2 URAMs + 0.5 BRAM Tile	4.8

Table 1: Implementation results for a sample configuration supporting both dynamic and static Huffman tables and a 32KB history window

Family / Device	Freq. (MHz)	Logic Resources	Memory Resources	Throughput (Gbps)
Spartan-7 xc7s100-1	100	3,910 LUTs	9 BRAM Tiles	2.4
Kintex UltraScale xcku085-1-c	325	4,328 LUTs	9 BRAM Tiles	7.8
Artix UltraScale+ xcau25p-1-e	225	4,309 LUTs	8 BRAM Tiles	5.4
Kintex UltraScale+ xcku9p-1-e	325	4,336 LUTs	8 BRAM Tiles	7.8
Versal Premium xcvp1202-2MP-e-L	300	4,363 LUTs	2 URAMs	7.2

Table 2: Implementation results for a sample configuration supporting only static Huffman tables and a 32KB history window

Contact CAST Sales for help defining likely configuration options and estimating implementation results for your specific system.

ZipAccel-D silicon resources requirements and throughput depend on its configuration. Also, ZipAccel-D performance can scale by instantiating more Huffman decoders and by using multiple core instances.

The core can be mapped on any Lattice FPGA provided sufficient silicon resources are available. The following are sample implementation results for two different configurations of the core on a CertusPro-NX and an Avant-E device, and do not represent the smallest possible area requirements nor the highest possible clock frequency.

Family / Device	Huffman Tables	History Window (bytes)	Freq. (MHz)	Logic Resources	Memory Resources	Throughput (Gbps)
CertusPro-NX LFCPNX-100 (-9)	Static	32,768	82	9,153 Slices	24 EBR	1.97
CertusPro-NX LFCPNX-100 (-9)	Dynamic	32,768	77	26,099 Slices	29 EBR	1.85
Avant-E LFCPNX-100 (-1)	Static	32,768	125	6,308 Slices	12 EBR	3.00
Avant-E LFCPNX-100 (-1)	Dynamic	32,768	100	15,365 Slices	17 EBR	2.40

Compression Standards

ZLIB (RFC-1950)
Inflate/Deflate (RFC-1951)
GZIP/GUNZIP (RFC-1952)
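Of the three formats, ZLIB has the most compact header: two bytes that identify the compression method, declare the LZ77 window size, and carry a mod-31 check. The sketch below validates that header following RFC 1950, section 2.2, and reports whether a stream's declared window fits a given maximum history window. The function names and the `max_window` parameter are hypothetical and only loosely mirror the core's maximum-history-window configuration option; because back-references may reach as far back as the declared window, a decoder configured with a smaller history cannot be assured of decoding such a stream.

```python
import zlib


def parse_zlib_header(data: bytes) -> dict:
    """Validate the two-byte ZLIB header (RFC 1950, section 2.2)."""
    if len(data) < 2:
        raise ValueError("truncated ZLIB header")
    cmf, flg = data[0], data[1]
    cm, cinfo = cmf & 0x0F, cmf >> 4
    if cm != 8:
        raise ValueError("compression method is not Deflate (CM=8)")
    if cinfo > 7:
        raise ValueError("CINFO above 7 (window larger than 32KB) is not allowed")
    if (cmf * 256 + flg) % 31 != 0:
        raise ValueError("FCHECK mismatch: header bytes are corrupt")
    return {
        "window_size": 1 << (cinfo + 8),        # LZ77 window = 2**(CINFO + 8) bytes
        "preset_dictionary": bool(flg & 0x20),  # FDICT bit
        "level_hint": flg >> 6,                 # FLEVEL: informational only
    }


def fits_history_window(data: bytes, max_window: int) -> bool:
    """True if the stream's declared window is no larger than max_window bytes."""
    return parse_zlib_header(data)["window_size"] <= max_window


if __name__ == "__main__":
    stream = zlib.compress(b"hello, hello, hello")
    print(parse_zlib_header(stream))
    print(fits_history_window(stream, 4096))
```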

Inflate/Deflate Features

Up to 32KB history window size
All Deflate block types
- Static and dynamic Huffman-coded blocks
- Stored Deflate blocks
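Each Deflate block starts with a 3-bit header: BFINAL marks the last block, and BTYPE selects stored (00), static Huffman (01), or dynamic Huffman (10) coding, with 11 reserved (RFC 1951, section 3.2.3). Fields are packed starting from the least-significant bit of each byte. The sketch below decodes that header and, for stored blocks, the LEN/NLEN length check; `BitReader` and the other names are hypothetical, and a complete decoder would continue with the block payload.

```python
# BTYPE values from RFC 1951, section 3.2.3
BLOCK_TYPES = {0: "stored", 1: "static Huffman", 2: "dynamic Huffman"}


class BitReader:
    """Reads Deflate bit fields: bits are consumed from each byte's LSB first."""

    def __init__(self, data: bytes):
        self.data = data
        self.bitpos = 0

    def bits(self, n: int) -> int:
        """Read n bits; the first bit read becomes the value's LSB."""
        value = 0
        for i in range(n):
            byte_index, bit_index = divmod(self.bitpos, 8)
            if byte_index >= len(self.data):
                raise EOFError("ran out of input")
            value |= ((self.data[byte_index] >> bit_index) & 1) << i
            self.bitpos += 1
        return value


def read_block_header(reader: BitReader) -> tuple[bool, str]:
    """Return (is_final_block, block_type) for the next Deflate block."""
    bfinal = reader.bits(1)
    btype = reader.bits(2)
    if btype == 3:
        raise ValueError("reserved block type 3 (coding error)")
    return bool(bfinal), BLOCK_TYPES[btype]


def read_stored_length(reader: BitReader) -> int:
    """For a stored block: skip to the next byte boundary, then check LEN/NLEN."""
    reader.bitpos = (reader.bitpos + 7) & ~7
    length, nlength = reader.bits(16), reader.bits(16)
    if length != (~nlength & 0xFFFF):
        raise ValueError("stored block LEN/NLEN mismatch")
    return length


if __name__ == "__main__":
    # Final stored block holding the 5 bytes "hello"
    reader = BitReader(bytes([0x01, 0x05, 0x00, 0xFA, 0xFF]) + b"hello")
    print(read_block_header(reader), read_stored_length(reader))
```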

High Performance & Low Latency

Average processing rate of three bytes per clock, for throughputs exceeding 20 Gbps in modern ASIC technologies, scalable to more than 100 Gbps with multiple core instances
Latency from 20 clock cycles for static Huffman blocks, and typically less than 2,000 cycles for dynamic Huffman blocks
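Static Huffman blocks use code lengths fixed by RFC 1951 (section 3.2.6), so decoding can start right after the 3-bit block header. Dynamic blocks first transmit their code lengths, themselves Huffman-coded, and the decoder must rebuild its tables before the first symbol can be decoded. That plausibly accounts for much of the latency difference above, though this is an inference rather than a statement about the core's internals. The sketch below implements the canonical code assignment from RFC 1951, section 3.2.2, and applies it to the fixed literal/length alphabet; the function names are illustrative.

```python
def canonical_codes(lengths: list[int]) -> dict[int, tuple[int, int]]:
    """Assign canonical Huffman codes from code lengths (RFC 1951, section 3.2.2).

    Returns {symbol: (code, bit_length)} for every symbol with a non-zero length.
    """
    max_len = max(lengths)
    # Step 1: count how many codes have each length (length 0 = unused symbol)
    bl_count = [0] * (max_len + 1)
    for length in lengths:
        if length:
            bl_count[length] += 1
    # Step 2: compute the smallest code value for each length
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Step 3: give consecutive codes to symbols of equal length, in symbol order
    codes = {}
    for symbol, length in enumerate(lengths):
        if length:
            codes[symbol] = (next_code[length], length)
            next_code[length] += 1
    return codes


def fixed_literal_length_lengths() -> list[int]:
    """Code lengths of the fixed literal/length alphabet (RFC 1951, section 3.2.6)."""
    return [8] * 144 + [9] * 112 + [7] * 24 + [8] * 8


if __name__ == "__main__":
    table = canonical_codes(fixed_literal_length_lengths())
    for symbol in (0, 143, 144, 255, 256, 279, 280, 287):
        code, length = table[symbol]
        print(symbol, f"{code:0{length}b}")
```

A hardware decoder would typically precompute the fixed table once, whereas the dynamic table has to be rebuilt for every dynamic block.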

Easy to Use and Integrate

Processor-free, standalone operation
Extensive error-catching & reporting for smooth operation and recovery in the presence of errors
- Header syntax errors
- CRC-32/Adler-32 checksum errors
- File size errors
- Coding errors
- Huffman table errors
- Non-correctable ECC memory errors
Optional ECC memories
Streaming AXI-Stream or native FIFO-like data interfaces
Microcode-free, LINT-clean, scan-ready design
Complete, turn-key accelerator designs available on FPGA boards from different vendors
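Two of the error classes listed above are defined by the container trailers: a GZIP member ends with the CRC-32 of the uncompressed data followed by ISIZE (its length modulo 2^32), both little-endian, while a ZLIB stream ends with a big-endian Adler-32 checksum. The Python sketch below models those checks in software with the standard `zlib.crc32` and `zlib.adler32` functions. It is not the core's implementation; the function names and error strings are hypothetical.

```python
import gzip
import struct
import zlib


def check_gzip_trailer(payload: bytes, trailer: bytes) -> list[str]:
    """Check decompressed data against a GZIP member trailer (RFC 1952).

    The 8-byte trailer holds CRC32 and ISIZE (length mod 2**32), little-endian.
    Returns a list of error descriptions; an empty list means both checks pass.
    """
    if len(trailer) != 8:
        return ["truncated trailer"]
    crc, isize = struct.unpack("<II", trailer)
    errors = []
    if crc != zlib.crc32(payload):
        errors.append("CRC-32 mismatch")
    if isize != len(payload) % 2**32:
        errors.append("file size (ISIZE) mismatch")
    return errors


def check_zlib_trailer(payload: bytes, trailer: bytes) -> list[str]:
    """Check decompressed data against the 4-byte big-endian Adler-32 (RFC 1950)."""
    if len(trailer) != 4:
        return ["truncated trailer"]
    (adler,) = struct.unpack(">I", trailer)
    return [] if adler == zlib.adler32(payload) else ["Adler-32 mismatch"]


if __name__ == "__main__":
    data = b"example payload " * 64
    print(check_gzip_trailer(data, gzip.compress(data)[-8:]))
    print(check_zlib_trailer(data[:-1], zlib.compress(data)[-4:]))
```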

Configuration Options

Synthesis-time configuration options allow fine-tuning the core’s size and performance (partial list):
- Input and output bus width
- FIFO sizes
- Maximum history window
- Static-only, or dynamic and static Huffman tables support


Resources

Applicable Standards
• RFC 1950 – ZLIB Compressed Data Format Specification
• RFC 1951 – DEFLATE Compressed Data Format Specification
• RFC 1952 – GZIP File Format Specification
Background & More Info
• Data Compression in Solid State Storage, presentation at Flash Memory Summit 2013 (PDF)
• Wikipedia entries on GZIP, ZLIB, and Deflate
• An explanation of the Deflate algorithm by Antaeus Feldspar
• GZIP Project website
• ZLIB Project website
