H264-E-BPF
Ultra-Fast AVC/H.264 Baseline Profile Encoder

The H264-E-BPF IP core is a video encoder supporting the Constrained Baseline Profile of the ISO/IEC 14496-10/ITU-T H.264 standard. It implements an ultra-high-throughput, UHD/4K-capable hardware encoder that is optimized for ultra-low-latency video streaming at low bit rates.

The H264-E-BPF encoder requires less silicon area than most equally capable hardware H.264 encoders (approximately 250K gates), allowing for very cost-effective implementations. Its small silicon footprint, low external memory bandwidth requirements, and zero software overhead enable high-throughput H.264 coding at an extremely low energy cost. The encoder can process UHD/4K video when mapped to modern ASIC technologies, and Full-HD video when mapped to FPGAs.

Despite its small size, the H264-E-BPF produces high-quality video, especially at low bit rates, and is suitable for systems with low-latency requirements. It can use constant quantization to output Variable Bit-Rate (VBR) streams, or automatically regulate quantization multiple times within a frame to output Constant Bit-Rate (CBR) streams. In CBR mode it responds rapidly to temporal or spatial changes in the video content. CBR can be combined with an artifact-free Intra-Refresh coding implementation that effectively eliminates bit-rate peaks while preserving periodic intra-coded references. As a result, stream buffers can be smaller than those typically required, and end-to-end latency can be brought down to the frame or sub-frame level. Video quality at low bit rates is preserved because the encoder intelligently uses block skipping and thresholding of quantized coefficients to reduce the bit rate with minimal quality loss, and applies the in-loop deblocking filter to remove blocking artifacts.
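To illustrate what regulating quantization several times within a frame can mean, here is a minimal sketch of a generic row-level CBR controller. It is not CAST's rate-control algorithm: the class name `RowRateController`, the per-macroblock-row granularity, the linear bit budget, the dead band, and the ±1 Qp step are all assumptions made for illustration. The only constraint taken from the standard is that H.264 Qp lies in the range 0 to 51.

```python
from dataclasses import dataclass


@dataclass
class RowRateController:
    """Hypothetical row-level CBR controller (illustrative only, not CAST's design).

    After each macroblock row, the bits spent so far in the frame are compared
    with a linear per-row share of the frame budget, and Qp is nudged by one step.
    """
    bits_per_frame: int      # target bits per frame = bit rate / frame rate
    mb_rows: int             # macroblock rows per frame (e.g. 68 for 1080p)
    qp: int = 30             # starting quantization parameter (assumed)
    qp_min: int = 0          # H.264 allows Qp in the range 0..51
    qp_max: int = 51
    tolerance: float = 0.05  # dead band around the budget, as a fraction (assumed)

    def __post_init__(self) -> None:
        self.bits_spent = 0
        self.rows_done = 0

    def start_frame(self) -> None:
        """Reset the per-frame counters; Qp carries over from the previous frame."""
        self.bits_spent = 0
        self.rows_done = 0

    def end_row(self, row_bits: int) -> int:
        """Record the bits produced by one row and return the Qp for the next row."""
        self.bits_spent += row_bits
        self.rows_done += 1
        budget = self.bits_per_frame * self.rows_done / self.mb_rows
        if self.bits_spent > budget * (1 + self.tolerance):
            self.qp = min(self.qp + 1, self.qp_max)  # overspending: quantize more coarsely
        elif self.bits_spent < budget * (1 - self.tolerance):
            self.qp = max(self.qp - 1, self.qp_min)  # underspending: spend bits on quality
        return self.qp
```

For example, with `bits_per_frame=6800` and `mb_rows=68`, each row's share is 100 bits, so a first row that produces 200 bits raises Qp from 30 to 31. A production controller could instead model bits as a function of Qp and update at finer granularity; the fixed one-step rule only keeps the sketch short.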

The core was designed for ease of use and integration. Once initially programmed, it operates without any assistance from the host processor. The encoder's memory interface is extremely flexible: it operates on a separate clock domain, is independent of the external memory type and memory controller, and is tolerant of large latencies. The core is optionally delivered with a raster-to-block converter, and wrappers for AMBA® AHB, AXI, or AXI-Streaming buses are available.
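The optional raster-to-block converter mentioned above reorders pixels that arrive line by line into the block order in which an H.264 encoder processes macroblocks (16x16 luma, 8x8 per chroma plane in 4:2:0). The following is a minimal sketch of that reordering for a single 8-bit plane; the function name `raster_to_macroblocks` and the requirement that the dimensions be multiples of the block size are assumptions for illustration, and the core's actual converter interface is not described here.

```python
def raster_to_macroblocks(plane: bytes, width: int, height: int, mb: int = 16) -> bytes:
    """Reorder a raster-scan plane into block order (hypothetical helper).

    Blocks are emitted left to right, top to bottom; inside each block the
    pixels are emitted row by row. Use mb=16 for luma and mb=8 for the
    4:2:0 chroma planes.
    """
    if width % mb or height % mb:
        raise ValueError("width and height must be multiples of the block size")
    if len(plane) != width * height:
        raise ValueError("plane size does not match width * height")
    out = bytearray()
    for by in range(0, height, mb):          # block row
        for bx in range(0, width, mb):       # block column
            for y in range(by, by + mb):     # pixel rows inside the block
                start = y * width + bx
                out += plane[start:start + mb]
    return bytes(out)
```

Because one row of macroblocks depends only on 16 input lines, a hardware converter can perform the same reordering with a 16-line buffer rather than a full frame.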

Customers can further decrease their time to market by using CAST’s integration services to receive complete video encoding subsystems. These integrate the encoder core with video and networking interface controllers, networking stacks, or other CAST or third-party IP cores.  

The H264-E-BPF IP core is designed using industry best practices and has been production proven multiple times. Its deliverables include a complete verification environment and a bit-accurate software model.

  • Variable Bit-Rate with Constant Qp (VBR-CQP) and Constant Bit-Rate (CBR) output with CAVLC entropy coding
  • Efficient inter and intra prediction
    • Motion vectors up to –16.00/+15.75 pixels, with down to ¼-pel accuracy
    • All intra 16x16 and most intra 4x4 modes
  • Options for improved error resilience: multiple slices per frame, intra-only coding
  • Options for better quality at low bit rates
    • Block skipping
    • Deblocking filter
    • Separate quantization values for luma and chroma
    • Thresholding of quantized transform coefficients
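Two of the features above can be made concrete in a few lines. Quarter-pel motion vectors are stored as integers in units of ¼ pixel, so the stated range of –16.00/+15.75 pixels corresponds to -64 to +63 in those units. Coefficient thresholding is shown with one simple, generic rule: if a 4x4 block contains only a few quantized levels and all of them are ±1, they are zeroed, since such sparse small levels cost bits for little visible gain. That rule, its `max_nonzero` limit, and both function names are assumptions; the datasheet does not say how the core decides which coefficients to threshold.

```python
from fractions import Fraction

# Search range stated for the core, in pixels: -16.00 to +15.75.
MV_MIN_PEL = Fraction(-16)
MV_MAX_PEL = Fraction(63, 4)


def pel_to_qpel(mv_pel: Fraction) -> int:
    """Convert one motion-vector component from pixels to quarter-pel units."""
    q = mv_pel * 4
    if q.denominator != 1:
        raise ValueError("component is not a multiple of 1/4 pixel")
    if not MV_MIN_PEL <= mv_pel <= MV_MAX_PEL:
        raise ValueError("component outside the -16.00/+15.75 pixel range")
    return int(q)


def threshold_block(levels: list[int], max_nonzero: int = 2) -> list[int]:
    """Hypothetical thresholding rule for the 16 quantized levels of a 4x4 block.

    If every nonzero level is +/-1 and there are at most `max_nonzero` of them,
    all are dropped; otherwise the block is returned unchanged.
    """
    nonzero = [v for v in levels if v != 0]
    if nonzero and len(nonzero) <= max_nonzero and all(abs(v) == 1 for v in nonzero):
        return [0] * len(levels)
    return list(levels)
```

A zeroed block can then be signaled as having no coded coefficients, which is where the bit saving comes from.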

Potential customers can readily evaluate the video encoder's compression efficiency by using:

  • Available sample compressed video streams
  • The available Bit-Accurate Model with input videos of their choice
  • The Video over IP reference design, with video captured over an HDMI interface

Please contact CAST to arrange for your evaluation preference.  

The core is available as source-code HDL (Verilog or VHDL) or as a targeted netlist, and its deliverables include everything required for a successful implementation:

  • Sophisticated self-checking testbench
  • Synthesis scripts
  • Simulation scripts, vectors, and expected results
  • Software (C++) bit-accurate model and test-vector generator
  • Comprehensive user documentation

The H264-E-BPF can be mapped to any ASIC technology (provided sufficient silicon resources are available) and optimized to suit the particular project's requirements. The following table provides sample implementation data for the core configured to operate with a throughput of 2 cycles per pixel (H264-E-BPF/2). Note that these sample implementation figures do not represent the highest speed or smallest area possible for the core. Please contact CAST to get characterization data for your target configuration and technology.

H264-E-BPF/2 Core

Technology   Logic                    Memory      Freq.     Throughput        Video Format
TSMC 16nm    46k µm² / 270k gates     375 Kbits   1 GHz     500 Mpixels/sec   UHD/4K@60fps
TSMC 28nm    122k µm² / 252k gates    375 Kbits   850 MHz   425 Mpixels/sec   UHD/4K@50fps
TSMC 16nm    43k µm² / 250k gates     375 Kbits   500 MHz   250 Mpixels/sec   UHD/4K@30fps
TSMC 28nm    115k µm² / 235k gates    375 Kbits   500 MHz   250 Mpixels/sec   UHD/4K@30fps
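The throughput column follows from the clock frequency and the cycles-per-pixel figure: 1 GHz / 2 cycles per pixel = 500 Mpixels/sec, while UHD/4K (3840x2160) at 60 fps needs 3840 × 2160 × 60 ≈ 497.7 Mpixels/sec. The helpers below reproduce that check. Their names are hypothetical, and they count visible luma pixels only, with no blanking or other overhead; that is an assumption about how the table figures were derived.

```python
def throughput_mpix(clock_hz: float, cycles_per_pixel: float) -> float:
    """Pixels the encoder can process per second, in millions."""
    return clock_hz / cycles_per_pixel / 1e6


def required_mpix(width: int, height: int, fps: float) -> float:
    """Visible luma pixel rate of a video format, in millions per second."""
    return width * height * fps / 1e6


def min_clock_mhz(width: int, height: int, fps: float, cycles_per_pixel: float) -> float:
    """Lowest clock (MHz) that sustains the format at the given cycles per pixel."""
    return required_mpix(width, height, fps) * cycles_per_pixel


# Rows of the H264-E-BPF/2 table above: (clock in Hz, UHD/4K frame rate listed).
for clock_hz, fps in [(1e9, 60), (850e6, 50), (500e6, 30)]:
    have = throughput_mpix(clock_hz, 2)
    need = required_mpix(3840, 2160, fps)
    verdict = "ok" if have >= need else "short"
    print(f"{clock_hz / 1e6:.0f} MHz: {have:.0f} Mpix/s available, "
          f"UHD@{fps} needs {need:.1f} Mpix/s -> {verdict}")
```

The same arithmetic applies to the FPGA tables: 1080p60 needs 1920 × 1080 × 60 ≈ 124.4 Mpixels/sec, so at 2 cycles per pixel the core must run at roughly 249 MHz or faster.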

The H264-E-BPF can be mapped to any AMD device (provided sufficient silicon resources are available) and optimized to suit the particular project’s requirements. The following table provides sample implementation data for the core configured to operate with a throughput of 2 cycles/pixel (H264-E-BPF/2). Please contact CAST to get characterization data for your target configuration and technology.

H264-E-BPF/2                    Logic Resources (1)   Memory Resources (1)   Video Formats (2)
ARTIX® 7                        35K LUTs, 8 DSPs      51 BRAM                1080p50/30/25, 720p60/50/30/25
KINTEX® 7, KINTEX® UltraScale™                                               1080p60/50/30/25, 720p60/50/30/25

1: Exact resource requirements and maximum performance depend on the target device.
2: The list of video formats is not exhaustive. The indicated video formats may not be supported on devices of all speed grades.

The H264-E-BPF can be mapped to any Intel® FPGA family (provided sufficient silicon resources are available) and optimized to suit the particular project's requirements. The following table provides sample implementation data for the core configured to operate with a throughput of 2 cycles/pixel (H264-E-BPF/2).

H264-E-BPF/2   Area (1) (ALMs)   Memory Bits   DSPs / MULs   Video Formats (2)
Stratix® V     20K               818K          18            1080p50/30/25, 720p60/50/30/25
Arria® 10      20K               818K          18

1: Exact resource requirements and maximum performance depend on the target device.
2: The list of video formats is not exhaustive. The indicated video formats may not be supported on devices of all speed grades.


Note that these sample implementation figures do not represent the highest speed or smallest area possible for the core. Please contact CAST to get characterization data for your target configuration and technology.


Features List

Ultra-fast, low-latency, production-proven AVC/H.264 encoder

  • Options for 1.33 or 2 cycles per pixel
  • UHD/4K in ASICs, and very high frame rates at lower resolutions in ASICs and FPGAs

Standard Support 

  • ISO/IEC 14496-10/ITU-T H.264 Constrained Baseline Profile specification  
  • Interlaced video using Main Profile syntax
  • Annex B byte-stream output, decodable by Baseline, Main, and High Profile decoders
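An Annex B byte stream delimits NAL units with the start-code prefix 0x000001 (optionally preceded by an extra zero byte), and the low five bits of each NAL unit's first byte give `nal_unit_type` (for example 7 = SPS, 8 = PPS, 5 = IDR slice, 1 = non-IDR slice). A minimal parser for inspecting the encoder's output might look like the following. The function names are hypothetical, and the sketch does not remove emulation-prevention bytes, which only matters when decoding the NAL payload itself.

```python
def split_annex_b(stream: bytes) -> list[bytes]:
    """Split an H.264 Annex B byte stream into NAL units, start codes removed."""
    nals = []
    i = stream.find(b"\x00\x00\x01")
    while i != -1:
        start = i + 3
        nxt = stream.find(b"\x00\x00\x01", start)
        end = len(stream) if nxt == -1 else nxt
        # The last byte of a NAL unit is never 0x00, so trailing zeros belong
        # to the next 4-byte start code or to trailing_zero_8bits padding.
        nal = stream[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        i = nxt
    return nals


def nal_unit_type(nal: bytes) -> int:
    """Return nal_unit_type, the low 5 bits of the NAL header byte."""
    return nal[0] & 0x1F
```

Splitting on the three-byte prefix is safe because emulation prevention guarantees that 0x000001 never appears inside a NAL unit.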

Input Video Formats  

  • Progressive or interlaced 4:2:0 YCbCr input with 8 bits per color sample
  • Up to UHD/4K in ASICs; up to Full-HD in FPGAs
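With 4:2:0 sampling, each chroma plane (Cb and Cr) has half the luma width and half the luma height, so an 8-bit frame occupies 1.5 bytes per pixel. The sketch below computes plane sizes assuming a planar Y, Cb, Cr layout; that layout and the names `Frame420Size` and `frame_420_size` are assumptions for illustration, since the datasheet does not specify how input frames are packed in memory.

```python
from typing import NamedTuple


class Frame420Size(NamedTuple):
    luma: int    # bytes in the Y plane
    chroma: int  # bytes in each of the Cb and Cr planes
    total: int   # bytes per frame


def frame_420_size(width: int, height: int) -> Frame420Size:
    """Byte size of an 8-bit 4:2:0 frame, assuming three separate planes."""
    if width % 2 or height % 2:
        raise ValueError("4:2:0 needs even width and height")
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return Frame420Size(luma, chroma, luma + 2 * chroma)
```

For example, a UHD/4K frame is 12,441,600 bytes (about 11.9 MiB) and a 1080p frame is 3,110,400 bytes.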

Low Latency and Low Bit Rates with Fewer Artifacts 

  • Constant Bit-Rate (CBR) output for smaller stream buffers and lower end-to-end latency
  • Advanced rate control regulates Qp multiple times within a frame and rapidly responds to temporal or spatial video variations
  • Enables artifact-free Intra-Refresh to eliminate the bit-rate peaks of I frames
  • Block skipping, quantized-coefficient thresholding, and the in-loop deblocking filter improve quality at low bit rates
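Intra-Refresh spreads the intra-coded macroblocks of what would otherwise be an I frame over several frames, so that no single frame causes a large bit-rate peak. One common generic scheme sweeps a band of intra-coded macroblock columns across the picture; the sketch below shows only that scheduling. The column-sweep pattern, the `period` parameter, and the function name are assumptions; the datasheet does not describe how the core selects refreshed macroblocks or how it keeps the refresh artifact-free.

```python
def intra_refresh_columns(frame_index: int, mb_cols: int, period: int) -> range:
    """Macroblock columns forced to intra coding in a given frame.

    Hypothetical column-sweep scheme: the whole picture is refreshed once every
    `period` frames, and each frame intra-codes a contiguous band of roughly
    mb_cols / period columns. The bands of one period cover every column once.
    """
    if not 0 < period <= mb_cols:
        raise ValueError("period must be between 1 and the number of MB columns")
    phase = frame_index % period
    start = phase * mb_cols // period
    end = (phase + 1) * mb_cols // period
    return range(start, end)
```

For 1080p (120 macroblock columns) with an 8-frame period, each frame intra-codes 15 columns. A complete scheme would also keep motion vectors in already-refreshed regions from referencing areas not yet refreshed; the sketch leaves that out.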

Small and Low Power  

  • 250K gates and 375 Kbits of RAM (two-cycles-per-pixel configuration)
  • Uses less power than equally capable hardware H.264 encoders, thanks to its smaller silicon footprint and low external memory bandwidth
  • Consumes much less power than equivalent software or hybrid software/hardware encoders

Ease of Integration 

  • Zero CPU overhead; stand-alone operation
  • Flexible external memory interface uses a separate clock, is independent of the memory type, and is tolerant of large latencies
