SHA-3
SHA-3 Secure Hash Crypto Engine

The SHA-3 core is a high-throughput, area-efficient hardware implementation of the SHA-3 cryptographic hash functions, compliant with NIST's FIPS 180-4 and FIPS 202 standards.
The core implements all the fixed-length and extendable-output hash functions specified by these standards. The hash function is selected at synthesis time; a version supporting run-time hash function selection can be made available upon request.
The number of SHA-3 permutation rounds computed per clock cycle is configurable at synthesis time, allowing users to trade throughput for silicon resources. In its minimum configuration of one round per cycle, the core processes 24 to 56 bits per cycle, depending on the hash function. Implementing 2, 3, or 4 rounds per cycle scales this throughput by 2x, 3x, or 4x respectively, enabling throughputs in excess of 100 Gbps in modern ASIC technologies.
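
The 24-to-56-bit range follows from the sponge construction: each run of the permutation absorbs one rate-sized block (the rate r is 1600 bits minus the capacity, per FIPS 202), and the SHA-3 permutation, Keccak-f[1600], has 24 rounds, so one round per cycle absorbs r/24 bits per cycle. The Python sketch below reproduces that arithmetic under an idealized assumption that no cycles are spent on I/O or padding; `bits_per_cycle` and `RATE_BITS` are illustrative names, not part of the core's deliverables.

```python
# Idealized throughput model for a SHA-3 core (illustrative sketch, not vendor timing).
# Assumption: one rate-sized block is absorbed per full permutation, and no extra
# cycles are spent on I/O or padding.

KECCAK_ROUNDS = 24  # rounds in the SHA-3 permutation, Keccak-f[1600] (FIPS 202)

# Rate r in bits (r = 1600 - capacity) for each FIPS 202 function.
RATE_BITS = {
    "SHA3-224": 1152,
    "SHA3-256": 1088,
    "SHA3-384": 832,
    "SHA3-512": 576,
    "SHAKE-128": 1344,
    "SHAKE-256": 1088,
}


def bits_per_cycle(function: str, rounds_per_cycle: int = 1) -> float:
    """Message bits absorbed per clock cycle with `rounds_per_cycle` rounds per cycle."""
    if rounds_per_cycle not in (1, 2, 3, 4):
        raise ValueError("rounds_per_cycle must be 1, 2, 3 or 4")
    cycles_per_block = KECCAK_ROUNDS / rounds_per_cycle  # 24, 12, 8 or 6 cycles
    return RATE_BITS[function] / cycles_per_block


for name in RATE_BITS:
    lo, hi = bits_per_cycle(name, 1), bits_per_cycle(name, 4)
    print(f"{name}: {lo:.1f} to {hi:.1f} bits/cycle")
```

Running it prints 48.0 to 192.0 bits/cycle for SHA3-224 and 24.0 to 96.0 for SHA3-512; multiplying bits per cycle by the clock frequency in MHz gives Mbps.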
The core is designed for ease of use and integration, and adheres to industry best practices for coding and verification. It requires no assistance from a host processor and uses standard AMBA® AXI4-Stream interfaces for input and output data. Technology mapping, timing closure, and scan insertion are trouble-free: the core contains no multi-cycle or false paths, uses only rising-edge-triggered D-type flip-flops, has no tri-states, and operates in a single clock and reset domain. Its reliability and low risk have been proven through rigorous verification and FPGA validation.
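
To picture the streaming interface, here is a behavioral Python sketch (not the core's RTL or C-model) in which a message arrives as a sequence of AXI4-Stream-style beats and the final beat is flagged by TLAST. The 8-byte bus width, the `Beat` record, and its `valid_bytes` count (a simplified stand-in for the per-byte TKEEP mask) are assumptions for illustration; Python's `hashlib.sha3_256` plays the role of the hashing engine.

```python
# Behavioral sketch of a streaming hash interface (not the core's RTL or C-model).
# Assumptions: an 8-byte data bus, and a `valid_bytes` count standing in for the
# per-byte TKEEP mask of AXI4-Stream. hashlib plays the role of the hash engine.
import hashlib
from dataclasses import dataclass
from typing import Iterable, Iterator

BUS_BYTES = 8  # hypothetical 64-bit TDATA width; the core's width is configurable


@dataclass
class Beat:
    tdata: bytes      # one bus word, zero-filled past the valid bytes
    valid_bytes: int  # simplified stand-in for TKEEP
    tlast: bool       # marks the final beat of a message


def to_beats(message: bytes, bus_bytes: int = BUS_BYTES) -> Iterator[Beat]:
    """Split a message into bus-width beats; the last beat may be partly filled."""
    if not message:
        # Empty message: a single beat with no valid bytes and TLAST set.
        yield Beat(bytes(bus_bytes), 0, True)
        return
    for off in range(0, len(message), bus_bytes):
        chunk = message[off:off + bus_bytes]
        is_last = off + bus_bytes >= len(message)
        yield Beat(chunk.ljust(bus_bytes, b"\x00"), len(chunk), is_last)


def hash_stream(beats: Iterable[Beat]) -> bytes:
    """Absorb beats until TLAST, then return the SHA3-256 digest."""
    h = hashlib.sha3_256()
    for beat in beats:
        h.update(beat.tdata[:beat.valid_bytes])  # ignore bytes not marked valid
        if beat.tlast:
            return h.digest()
    raise ValueError("stream ended without TLAST")


msg = b"The quick brown fox jumps over the lazy dog"
assert hash_stream(to_beats(msg)) == hashlib.sha3_256(msg).digest()
```

Because the digest depends only on the concatenated valid bytes, any bus width yields the same result; the width only changes how many beats the input occupies.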

Sample ASIC implementation results for a limited set of SHA-3 core configurations are provided in the following table.

Hash Function  Rounds per Cycle  Number of Buffers  Target Technology  Area (kGates)  Freq. (MHz)  Throughput (Gbps)
SHA3-224       1                 2                  TSMC 28nm HPM      48.3           700          33.6
SHA3-256       1                 2                  TSMC 28nm HPM      46.8           700          31.7
SHA3-384       1                 2                  TSMC 28nm HPM      42.9           700          24.3
SHA3-512       1                 2                  TSMC 28nm HPM      36.8           700          16.8
SHAKE-128      1                 2                  TSMC 28nm HPM      52.6           700          39.2
SHAKE-256      1                 2                  TSMC 28nm HPM      47.6           700          31.7
SHA3-224       2                 0                  TSMC 7nm           42.3           1,000        96.0
SHA3-224       4                 0                  TSMC 7nm           105.0          700          134.0

Note that these sample implementation figures do not represent the highest speed or smallest area possible for the core.  

Sample implementation results for a limited set of configurations implemented on an Intel Arria 10 (speed grade -2) device are provided in the following table.

Hash Function  Rounds per Cycle  Number of Buffers  ALMs   Block RAMs  Freq. (MHz)  Throughput (Gbps)
SHA3-224       1                 2                  3,869  0           275          13.20
SHA3-256       1                 2                  3,748  0           275          12.47
SHA3-384       1                 2                  3,328  0           275          9.53
SHA3-512       1                 2                  3,135  0           275          6.60
SHAKE-128      1                 2                  4,315  0           275          15.40
SHAKE-256      1                 2                  3,819  0           275          12.47

Note that these sample implementation figures do not represent the highest speed or smallest area possible for the core.

 

Sample implementation results for a limited set of configurations implemented on a Xilinx Kintex UltraScale+ (speed grade -1) device are provided in the following table. 

Hash Function  Rounds per Cycle  Number of Buffers  LUTs   RAM Blocks  Freq. (MHz)  Throughput (Gbps)
SHA3-224       1                 2                  6,123  0           400          19.20
SHA3-256       1                 2                  6,008  0           400          18.13
SHA3-384       1                 2                  5,495  0           400          13.87
SHA3-512       1                 2                  4,926  0           400          9.60
SHAKE-128      1                 2                  6,523  0           400          22.40
SHAKE-256      1                 2                  6,044  0           400          18.13

Note that these sample implementation figures do not represent the highest speed or smallest area possible for the core.
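
The published throughput figures in the three tables above are consistent with a simple model: throughput (Gbps) = (r / 24) × rounds per cycle × f (MHz) / 1000, where r is the rate in bits. The Python check below tests a sample of the published rows against that model, assuming no I/O or padding overhead; `expected_gbps` and `ROWS` are illustrative names.

```python
# Consistency check (illustrative): Gbps = (rate / 24) * rounds_per_cycle * f_MHz / 1000.
# Assumes an idealized core with no cycles lost to I/O or padding.

RATE_BITS = {"SHA3-224": 1152, "SHA3-256": 1088, "SHA3-384": 832,
             "SHA3-512": 576, "SHAKE-128": 1344, "SHAKE-256": 1088}


def expected_gbps(function: str, rounds_per_cycle: int, freq_mhz: float) -> float:
    """Idealized throughput in Gbps for a given function, unrolling and clock."""
    bits_per_cycle = RATE_BITS[function] * rounds_per_cycle / 24
    return bits_per_cycle * freq_mhz / 1000.0


# (function, rounds per cycle, MHz, published Gbps) sampled from the tables above
ROWS = [
    ("SHA3-224", 1, 700, 33.6), ("SHA3-512", 1, 700, 16.8),
    ("SHA3-224", 2, 1000, 96.0),
    ("SHA3-384", 1, 275, 9.53), ("SHAKE-128", 1, 275, 15.40),
    ("SHA3-256", 1, 400, 18.13), ("SHAKE-256", 1, 400, 18.13),
]

for fn, k, f, published in ROWS:
    calc = expected_gbps(fn, k, f)
    print(f"{fn} x{k} @ {f} MHz: {calc:.2f} Gbps (published {published})")
    assert abs(calc - published) < 0.01 * published  # agrees within 1%
```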


Features List

Standards Support 

  • FIPS 202: SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions
  • FIPS 180-4: Secure Hash Standard (SHS) (limited to SHA-3 use)
  • All four fixed-length SHA-3 Hash Functions: 
    • SHA3-224 
    • SHA3-256 
    • SHA3-384 
    • SHA3-512 
  • Both SHA-3 Extendable Output Functions (XOF): 
    • SHAKE-128 
    • SHAKE-256 
  • NIST Certified
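
The practical difference between the fixed-length functions and the XOFs can be seen with Python's standard `hashlib`, used here only as a software reference, not as the core's interface: SHA3-256 always returns 32 bytes, while SHAKE-128 returns as many bytes as requested, and a shorter output is a prefix of a longer one.

```python
import hashlib

msg = b"example message"

# Fixed-length function: the digest size is fixed by the function (SHA3-256 -> 32 bytes).
fixed = hashlib.sha3_256(msg).digest()

# Extendable-output function: the caller chooses the output length in bytes.
short = hashlib.shake_128(msg).digest(16)
long_out = hashlib.shake_128(msg).digest(64)

print(len(fixed), len(short), len(long_out))  # 32 16 64
print(long_out[:16] == short)                 # True: shorter output is a prefix
```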

Performance  

  • User-selectable (1 to 4) permutation rounds per clock cycle, resulting in a throughput of:
    • SHA3-224: 48.0 to 192.0 Mbps/MHz
    • SHA3-256: 45.3 to 181.3 Mbps/MHz
    • SHA3-384: 34.7 to 138.7 Mbps/MHz
    • SHA3-512: 24.0 to 96.0 Mbps/MHz
    • SHAKE-128: 56.0 to 224.0 Mbps/MHz
    • SHAKE-256: 45.3 to 181.3 Mbps/MHz
  • Intelligent buffer management optionally allows the core to receive a new message while it processes the previous one
  • Throughput in excess of 100 Gb/s in modern ASIC technologies 

Interfaces  

  • AMBA® AXI4-Stream  

Fully autonomous operation  

  • Requires no assistance from the host processor 
  • Automatic padding insertion 
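
The padding the core inserts automatically is the FIPS 202 rule: a domain-separation suffix (01 for SHA3-*, 1111 for SHAKE) followed by pad10*1 up to a multiple of the rate. For byte-aligned messages this reduces to a few byte values, as in the Python sketch below; `sha3_pad` is a hypothetical helper, and messages that are not a whole number of bytes are outside its scope.

```python
def sha3_pad(message: bytes, rate_bytes: int, xof: bool = False) -> bytes:
    """Pad a byte-aligned message per FIPS 202 (domain suffix + pad10*1).

    Bits are packed LSB-first, so suffix 01 plus the first pad bit gives 0x06
    for SHA3-*, and suffix 1111 plus the first pad bit gives 0x1F for SHAKE.
    """
    first = 0x1F if xof else 0x06
    pad_len = rate_bytes - len(message) % rate_bytes  # always 1..rate_bytes
    padding = bytearray(pad_len)
    padding[0] = first
    padding[-1] |= 0x80  # final '1' of pad10*1; merges with `first` if pad_len == 1
    return message + bytes(padding)


padded = sha3_pad(b"abc", 136)                       # SHA3-256: 136-byte rate
print(len(padded), hex(padded[3]), hex(padded[-1]))  # 136 0x6 0x80
print(hex(sha3_pad(b"a" * 135, 136)[-1]))            # 0x86
```

A 3-byte message thus gets 0x06, 131 zero bytes, and 0x80; a 135-byte message gets the single merged byte 0x86; and a message that already fills the rate gets a full extra block of padding.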

Configuration Options  

  • Hashing function (bit-rate, capacity, number of permutation rounds)
  • Input & output bus bit-width  
  • Number of input buffers 
  • Number of permutation rounds per cycle
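
The first configuration option maps directly onto the sponge parameters. The Python sketch below is a plain software model, not the vendor's C-model: `rate_bytes` sets the bit-rate (capacity = 1600 − 8 × rate_bytes), `suffix` selects SHA3 (0x06) or SHAKE (0x1F) domain separation, and `num_rounds` applies the last rounds of the permutation in the manner of FIPS 202's Keccak-p; only 24 rounds yields the standard functions. The names `sponge`, `keccak_p`, and `rol` are illustrative.

```python
# Software model of the SHA-3 sponge (illustrative sketch, not the vendor C-model).
# rate_bytes sets the bit-rate (capacity = 1600 - 8 * rate_bytes), suffix selects
# SHA3 (0x06) or SHAKE (0x1F), and num_rounds < 24 gives a reduced-round Keccak-p.
import hashlib

MASK = (1 << 64) - 1

# Rho rotation offsets r[x][y] and iota round constants, as defined by FIPS 202.
ROT = [[0, 36, 3, 41, 18],
       [1, 44, 10, 45, 2],
       [62, 6, 43, 15, 61],
       [28, 55, 25, 21, 56],
       [27, 20, 39, 8, 14]]
RC = [0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
      0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
      0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
      0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
      0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
      0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008]


def rol(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits (0 <= n < 64)."""
    return ((v << n) | (v >> (64 - n))) & MASK


def keccak_p(A: list, num_rounds: int = 24) -> None:
    """Apply the last `num_rounds` rounds of Keccak-f[1600] to lanes A[x][y] in place."""
    if not 1 <= num_rounds <= 24:
        raise ValueError("num_rounds must be 1..24")
    for rnd in range(24 - num_rounds, 24):
        # theta: XOR each lane with the parities of two neighbouring columns
        C = [A[x][0] ^ A[x][1] ^ A[x][2] ^ A[x][3] ^ A[x][4] for x in range(5)]
        D = [C[(x - 1) % 5] ^ rol(C[(x + 1) % 5], 1) for x in range(5)]
        for x in range(5):
            for y in range(5):
                A[x][y] ^= D[x]
        # rho + pi: rotate each lane and move it to its new position
        B = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                B[y][(2 * x + 3 * y) % 5] = rol(A[x][y], ROT[x][y])
        # chi: the only non-linear step
        for x in range(5):
            for y in range(5):
                A[x][y] = B[x][y] ^ (~B[(x + 1) % 5][y] & B[(x + 2) % 5][y])
        # iota: add the round constant to lane (0, 0)
        A[0][0] ^= RC[rnd]


def sponge(message: bytes, rate_bytes: int, suffix: int, out_len: int,
           num_rounds: int = 24) -> bytes:
    """Absorb `message` and squeeze `out_len` bytes (byte-aligned messages only)."""
    if rate_bytes % 8 or not 0 < rate_bytes < 200:
        raise ValueError("rate must be a whole number of 64-bit lanes below 1600 bits")
    pad_len = rate_bytes - len(message) % rate_bytes
    padded = bytearray(message) + bytearray(pad_len)
    padded[len(message)] ^= suffix  # domain suffix + first pad bit
    padded[-1] ^= 0x80              # last pad bit
    A = [[0] * 5 for _ in range(5)]
    lanes = rate_bytes // 8
    for off in range(0, len(padded), rate_bytes):  # absorb one block per permutation
        for i in range(lanes):
            word = padded[off + 8 * i: off + 8 * i + 8]
            A[i % 5][i // 5] ^= int.from_bytes(word, "little")
        keccak_p(A, num_rounds)
    out = bytearray()
    while True:  # squeeze rate-sized chunks until enough output exists
        for i in range(lanes):
            out += A[i % 5][i // 5].to_bytes(8, "little")
        if len(out) >= out_len:
            return bytes(out[:out_len])
        keccak_p(A, num_rounds)


assert sponge(b"abc", 136, 0x06, 32) == hashlib.sha3_256(b"abc").digest()
```

For example, `sponge(msg, 136, 0x06, 32)` computes SHA3-256 and `sponge(msg, 168, 0x1F, n)` computes n bytes of SHAKE-128. Conceptually, the rounds-per-cycle option corresponds to how many iterations of the round loop in `keccak_p` the hardware evaluates in a single clock cycle.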

Deliverables 

  • Verilog RTL source code or targeted FPGA netlist 
  • Integration Test-Bench  
  • Software C-Model 
  • User documentation 
