H264-D-BP
Low-Latency AVC/H.264 Baseline Profile Decoder

The H264-D-BP IP core is a video decoder complying with the Constrained Baseline Profile of the ISO/IEC 14496-10/ITU-T H.264 standard. It implements a hardware decoder with very low latency and high throughput that is suitable for live streaming and other delay-sensitive applications up to full HD resolution.  

The decoder adds just one macroblock line of latency, which translates to less than 1 msec of real-world latency for the most widely used video formats, including HD/720p and Full-HD/1080p video.
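As a rough illustration of this figure, the sketch below estimates how long one macroblock line (16 luma rows) lasts at the video's own display rate. It assumes the active lines of a frame are spread evenly over the frame period and ignores blanking intervals; the function name and the list of formats are illustrative only and are not part of the core's deliverables.

```python
MACROBLOCK_ROWS = 16  # an H.264 macroblock covers 16 luma rows


def mb_line_latency_ms(active_lines: int, fps: float) -> float:
    """Duration of one macroblock line in milliseconds.

    Assumption: the frame's active lines are spread evenly over the
    frame period (blanking is ignored), so this is only an estimate.
    """
    frame_period_ms = 1000.0 / fps
    return frame_period_ms * MACROBLOCK_ROWS / active_lines


if __name__ == "__main__":
    formats = [("720p30", 720, 30), ("720p60", 720, 60), ("1080p30", 1080, 30)]
    for name, lines, fps in formats:
        print(f"{name}: {mb_line_latency_ms(lines, fps):.3f} ms")
```

Under these assumptions each listed format comes out below 1 msec. Very small, low-frame-rate formats have fewer lines per frame, so one macroblock line lasts longer, which is consistent with the claim covering most, not all, formats.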

The H264-D-BP is designed for straightforward, trouble-free SoC integration. It operates stand-alone, so decoding proceeds with no assistance or input from the host processor. The decoder’s memory interface, which is used to store reconstructed video data, is independent of the external memory type and memory controller and tolerates large latencies. Optionally, the core can be reduced to support only Intra-coded streams, in which case the required external memory is just 128 kB and can be implemented on-chip. The decoder reports decompressed video parameters, detects and reports bit stream errors to the system, and simplifies video cropping at its output. The core is optionally delivered with a block-to-raster converter, and wrappers for AMBA® AHB, AXI, or AXI-Streaming buses are available.

Customers can further decrease their time to market by using CAST’s integration services to receive complete video encoding/decoding subsystems. These integrate the decoder core with video encoders, video and networking interface controllers, networking stacks, or other CAST or third-party IP cores.  

The H264-D-BP IP core has been verified with Fraunhofer’s compliance test stream suite, and is silicon and production proven. Its deliverables include a complete verification environment and a bit-accurate software model.
 

The H264-D-BP synthesizes to less than 500k gates and requires 532 kbits of internal memory. When configured to support only Intra-coded streams, the core synthesizes to about 400k gates, and requires only 128kB of external memory. 

See sample ASIC and FPGA implementation results below.

Potential customers can readily evaluate the video decoder’s low latency characteristics by using the Video over IP reference design with the compressed stream captured over Ethernet, and the decoded video driving an HDMI interface.  

The core is available in source-code VHDL or as a targeted netlist, and its deliverables include everything required for successful implementation: 

  •     Sophisticated self-checking testbench
  •     Synthesis scripts
  •     Simulation script, vectors, and expected results
  •     Bit-accurate software model
  •     Comprehensive user documentation

The H264-D-BP can be mapped to any ASIC technology or FPGA device (provided sufficient silicon resources are available). The following table provides sample implementation data. Note that these sample implementation figures do not represent the highest speed or smallest area possible for the core. Please contact CAST to get characterization data for your target configuration and technology.

Technology                  Logic Area (μm²)   Logic Area (Gates)   Memory (kbits)   Freq. (MHz)   Throughput (Mpixels/sec)
tsmc16-sc7-svt              85,390             494,155              532              1,000         400
tsmc28hpm-sc9-c35-svt-ss    257,570            529,979              532              1,000         400
tsmc40g-sc9-rvt             257,570            645,329              532              800           320

When configured to support only Intra-coded streams, the core synthesizes to about 400k gates and requires just 128kB of external memory.

The H264-D-BP can be mapped to any AMD FPGA, provided sufficient silicon resources are available. The following table provides sample performance and resource utilization data for different AMD device families. Please contact CAST to get characterization data for your target configuration and technology.

Video formats (2): 720p30, 720p50, 720p60, 1080p30
Device families: Artix UltraScale+, Kintex UltraScale, Kintex-7, Artix-7
LUTs (1): 52k
BRAMs: 23 RAMB36 / 95 RAMB18
DSPs: 19

1: Exact resource requirements and maximum performance depend on the target device.
2: The list of video formats is not exhaustive. The indicated video formats may not be supported on devices of all speed grades.

An Intra-only version of the core (i.e., a decoder limited to decoding I-frames) occupies about 20% fewer silicon resources and requires just 128 kB of external memory.

The H264-D-BP can be mapped to any Intel FPGA (provided sufficient silicon resources are available). The following table provides sample performance and resource utilization data for different Intel device families. Please contact CAST to get characterization data for your target configuration and technology.

Video formats (1): 720p30, 720p50, 720p60, 1080p30
Device families: Stratix V, Arria 10, Cyclone V
ALMs: 32K
Memory bits: 532k
DSPs: 19

1: The list of video formats is not exhaustive. The indicated video formats may not be supported on devices of all speed grades.

An Intra-only version of the core (i.e., a decoder limited to decoding I-frames) occupies about 20% fewer silicon resources and requires just 128 kB of external memory.

The H264-D-BP can be mapped to any Microsemi FPGA (provided sufficient silicon resources are available).  The following table provides sample performance and resource utilization data on a PolarFire device. Please contact CAST to get characterization data for your target configuration and device.

Video formats (1): 480p60, 576p60, 720p30, 720p60, 1080p30
Device family: PolarFire
4LUTs: 85,137
RAM Blocks: 255 uSRAM, 53 LSRAM
Math Blocks: 37

1: The list of video formats is not exhaustive. The indicated video formats may not be supported on devices of all speed grades.

An Intra-only version of the core (i.e., a decoder limited to decoding I-frames) occupies about 20% fewer silicon resources and requires just 128 kB of external memory.

Features List

Constrained Baseline Profile AVC/H.264 decoder

  • Ultra-Low-Latency: less than one msec latency for most widely used formats
  • High performance: 2.5 cycles per pixel; Full-HD capable
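To make the cycles-per-pixel figure concrete, the following sketch converts between clock frequency and pixel throughput. It takes the 2.5 cycles-per-pixel figure from the feature list at face value and counts active pixels only; the coded picture is macroblock-aligned (for example, 1080 rows are coded as 1088), so real requirements are slightly higher. The function names are illustrative and not part of any CAST deliverable.

```python
CYCLES_PER_PIXEL = 2.5  # figure quoted in the feature list


def throughput_mpixels_per_sec(clock_mhz: float) -> float:
    """Pixel throughput (Mpixels/sec) sustained at a given clock."""
    return clock_mhz / CYCLES_PER_PIXEL


def required_clock_mhz(width: int, height: int, fps: float) -> float:
    """Approximate clock (MHz) needed for a format, active pixels only."""
    pixels_per_second = width * height * fps
    return pixels_per_second * CYCLES_PER_PIXEL / 1e6


if __name__ == "__main__":
    for mhz in (800, 1000):
        print(f"{mhz} MHz -> {throughput_mpixels_per_sec(mhz):.0f} Mpixels/sec")
    for name, w, h, fps in [("720p60", 1280, 720, 60), ("1080p30", 1920, 1080, 30)]:
        print(f"{name}: about {required_clock_mhz(w, h, fps):.1f} MHz")
```

With these assumptions, 1,000 MHz and 800 MHz correspond to 400 and 320 Mpixels/sec, matching the sample ASIC figures, and 1080p30 needs roughly 156 MHz.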

Standard Support

  • ISO/IEC 14496-10/ITU-T H.264, Constrained Baseline Profile specification
    • I and P slices (Intra-only version also available)
    • Multiple slices per frame
    • Multiple reference frames
    • Multiple sequence parameter sets (SPS)
    • Multiple picture parameter sets (PPS)
    • In-loop deblocking filter
    • CAVLC entropy decoding
  • Real-time performance up to Level 4.1

Video Formats

  • Progressive, 4:2:0 YCbCr with 8 bits per color sample
  • Resolutions from QCIF (176x144) to 2048x2048
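For sizing purposes, the raw storage of one 8-bit 4:2:0 frame can be estimated as in the sketch below: one luma byte per pixel plus two chroma planes subsampled by two in each direction. This counts raw samples only; how the core actually arranges reconstructed frames in external memory is not described here, and the function name is an assumption made for illustration.

```python
def frame_bytes_420_8bit(width: int, height: int) -> int:
    """Raw bytes for one progressive 4:2:0 frame with 8-bit samples.

    Assumes even width and height, as 4:2:0 subsampling requires.
    """
    luma = width * height                    # one Y sample per pixel
    chroma = 2 * (width // 2) * (height // 2)  # Cb and Cr at quarter resolution
    return luma + chroma


if __name__ == "__main__":
    for name, w, h in [("QCIF", 176, 144), ("1080p", 1920, 1080), ("2048x2048", 2048, 2048)]:
        print(f"{name}: {frame_bytes_420_8bit(w, h):,} bytes")
```

This works out to 1.5 bytes per pixel, so a 1920x1080 frame takes about 3.1 MB of raw samples.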

Low Latency

  • No decoded frame buffering
  • Decoded pixels are streamed out with less than one macroblock line of latency
  • Less than 1 msec for almost all widely used video formats

Ease of Integration

  • Zero CPU overhead, stand-alone operation
  • AMBA® AXI external memory interface is independent of memory type and tolerant to latencies
  • Streaming interfaces for bit-stream and pixel data with flow control; easily bridged to AMBA® AXI Streaming
  • Error catching and reporting capability
  • Reports video format and enables cropping
  • Optional Block to Raster Conversion

Maturity

  • Silicon proven
  • Verified with Fraunhofer H.264 Compliance Test Streams Suite
