SHA-3 Secure Hash Crypto Engine
The SHA-3 core is a high-throughput, area-efficient hardware implementation of the SHA-3 cryptographic hash functions, compliant with NIST's FIPS 180-4 and FIPS 202 standards.
The core implements all the fixed-length and extendable-output SHA-3 hash functions defined by these standards. The hash function is selected at synthesis time; a version supporting run-time hash function selection can be made available on request.
The number of SHA-3 permutation rounds computed per clock cycle is configurable at synthesis time, allowing users to trade throughput for silicon resources. In its minimum configuration of one round per cycle, the core processes 24 to 56 bits per cycle, depending on the hash function. Implementing 2, 3, or 4 rounds per cycle scales this throughput by 2x, 3x, or 4x respectively, enabling throughputs in excess of 100 Gbps in modern ASIC technologies.
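The throughput figures follow from simple arithmetic. Each Keccak-f[1600] permutation absorbs one rate-sized block (r = 1600 - c bits, with capacity c fixed by FIPS 202 for each function) and takes 24 rounds, so with k rounds per cycle the core absorbs r·k/24 bits per cycle. The sketch below assumes back-to-back blocks with no idle cycles between permutations; the function names are illustrative only and are not part of the core's deliverables.

```python
# Keccak-f[1600] state width in bits (FIPS 202).
STATE_BITS = 1600

# Capacity c in bits for each function (FIPS 202); the rate is r = 1600 - c.
CAPACITY = {
    "SHA3-224": 448,
    "SHA3-256": 512,
    "SHA3-384": 768,
    "SHA3-512": 1024,
    "SHAKE-128": 256,
    "SHAKE-256": 512,
}


def bits_per_cycle(function: str, rounds_per_cycle: int = 1, total_rounds: int = 24) -> float:
    """Message bits absorbed per clock cycle.

    Assumes one rate-sized block is absorbed per permutation and that
    permutations run back to back with no idle cycles.
    """
    rate = STATE_BITS - CAPACITY[function]
    cycles_per_block = total_rounds / rounds_per_cycle
    return rate / cycles_per_block


def throughput_gbps(function: str, f_mhz: float, rounds_per_cycle: int = 1,
                    total_rounds: int = 24) -> float:
    """Bits/cycle times MHz gives Mbit/s; divide by 1000 for Gbit/s."""
    return bits_per_cycle(function, rounds_per_cycle, total_rounds) * f_mhz / 1000


if __name__ == "__main__":
    for name in CAPACITY:
        print(f"{name}: {bits_per_cycle(name):.1f} bits/cycle at 1 round/cycle")
```

For example, `throughput_gbps("SHAKE-128", 1800)` gives 100.8, matching the 1.8 GHz SHAKE-128 row of the ASIC table, and `bits_per_cycle("SHA3-224", rounds_per_cycle=4)` gives 192.0, the upper end of the SHA3-224 range in the Performance list.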
The core is designed for ease of use and integration and adheres to industry best practices for coding and verification. It requires no assistance from a host processor and uses standard AMBA® AXI4-Stream interfaces for input and output data. Technology mapping, timing closure, and scan insertion are straightforward: the core contains no multi-cycle or false paths, and uses only rising-edge-triggered D-type flip-flops, no tri-states, and a single clock/reset domain. Its reliability and low risk have been proven through rigorous verification and FPGA validation.
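For integration planning, the sketch below shows one way a testbench or host-side driver might split a message into AXI4-Stream beats. It assumes a TDATA bus that is a whole number of bytes wide, standard AXI4-Stream byte-lane order (byte 0 on TDATA[7:0]), TKEEP marking the valid bytes of a final partial beat, and TLAST closing each message. The exact sideband signals and byte ordering the core expects are defined in its user documentation; `Beat` and `to_axis_beats` are hypothetical names.

```python
from typing import List, NamedTuple


class Beat(NamedTuple):
    """One AXI4-Stream transfer (hypothetical testbench view)."""
    tdata: int   # message byte i of the beat occupies TDATA[8*i+7 : 8*i]
    tkeep: int   # one bit per byte lane; 1 = byte carries message data
    tlast: bool  # high on the final beat of a message


def to_axis_beats(message: bytes, bus_bytes: int = 8) -> List[Beat]:
    """Split a message into AXI4-Stream beats for a bus_bytes-wide TDATA bus."""
    if not message:
        # How the core expects a zero-length message to be signalled is not
        # covered here; consult the core's user documentation.
        raise ValueError("empty message: signalling is core-specific")
    beats = []
    for offset in range(0, len(message), bus_bytes):
        chunk = message[offset:offset + bus_bytes]
        beats.append(Beat(
            tdata=int.from_bytes(chunk, "little"),  # byte 0 on the lowest lane
            tkeep=(1 << len(chunk)) - 1,            # only the final beat can be partial
            tlast=offset + bus_bytes >= len(message),
        ))
    return beats
```

With a 64-bit bus, a 10-byte message becomes two beats: a full first beat (TKEEP = 0xFF) and a final beat carrying two bytes (TKEEP = 0b11) with TLAST asserted.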
Sample implementation results for a limited set of SHA-3 core configurations are provided in the following table.
Hash | Rounds per Cycle | Number | Target | Area | Freq. (MHz) | Gbps |
---|---|---|---|---|---|---|
SHAKE-128 | 1 | 1 | TSMC 7nm | 43.6 | 1,000 | 56.0 |
SHAKE-128 | 1 | 1 | TSMC 7nm | 49.8 | 1,800 | 100.8 |
SHAKE-256 | 1 | 1 | TSMC 7nm | 40.5 | 1,000 | 45.3 |
SHAKE-256 | 1 | 1 | TSMC 7nm | 43.5 | 1,800 | 81.6 |
SHA3-512 | 1 | 1 | TSMC 7nm | 33.3 | 1,000 | 24.0 |
SHA3-512 | 1 | 1 | TSMC 7nm | 36.7 | 1,900 | 45.6 |
SHA3-384 | 1 | 1 | TSMC 7nm | 36.7 | 1,000 | 34.7 |
SHA3-384 | 1 | 1 | TSMC 7nm | 43.7 | 1,900 | 65.9 |
SHA3-256 | 1 | 1 | TSMC 7nm | 39.9 | 1,000 | 45.3 |
SHA3-256 | 1 | 1 | TSMC 7nm | 44.4 | 1,800 | 81.6 |
SHA3-224 | 1 | 1 | TSMC 7nm | 40.7 | 1,000 | 48.0 |
SHA3-224 | 1 | 1 | TSMC 7nm | 47.3 | 1,800 | 86.4 |
SHA3-224 | 2 | 1 | TSMC 7nm | 53.5 | 1,000 | 96.0 |
SHA3-224 | 2 | 1 | TSMC 7nm | 83.3 | 1,300 | 124.8 |
SHA3-224 | 3 | 1 | TSMC 7nm | 95.8 | 900 | 129.6 |
SHA3-224 | 4 | 1 | TSMC 7nm | 120.8 | 700 | 192.0 |
SHA3-224 (12 rounds) | 3 | 2 | TSMC 7nm | 110.9 | 900 | 259.0 |
Note that these sample implementation figures do not represent the highest speed or smallest area possible for the core.
Sample implementation results for a limited set of configurations implemented on an Intel® Arria® 10 (speed grade -2) device are provided in the following table.
Hash | Rounds per Cycle | Number | ALMs | Block | Freq. (MHz) | Gbps |
---|---|---|---|---|---|---|
SHA3-224 | 1 | 2 | 3,869 | 0 | 275 | 13.20 |
SHA3-256 | 1 | 2 | 3,748 | 0 | 275 | 12.47 |
SHA3-384 | 1 | 2 | 3,328 | 0 | 275 | 9.53 |
SHA3-512 | 1 | 2 | 3,135 | 0 | 275 | 6.60 |
SHAKE-128 | 1 | 2 | 4,315 | 0 | 275 | 15.40 |
SHAKE-256 | 1 | 2 | 3,819 | 0 | 275 | 12.47 |
Note that these sample implementation figures do not represent the highest speed or smallest area possible for the core.
The SHA-3 core can be mapped to any Microsemi device, provided sufficient silicon resources are available. Sample implementation results for a limited set of configurations implemented on an RTG4 (STD speed grade) device are provided in the following table. Note that the figures in this table do not represent the highest clock frequency or smallest area possible for the core.
Hash | Rounds per Cycle | Number | 4LUTs | Freq. (MHz) | Gbps |
---|---|---|---|---|---|
SHA3-224 | 1 | 1 | 7,411 | 50 | 2.40 |
SHA3-224 | 4 | 1 | 20,161 | 50 | 5.78 |
SHA3-256 | 1 | 1 | 7,124 | 50 | 2.27 |
SHA3-384 | 1 | 1 | 7,297 | 40 | 1.39 |
SHA3-512 | 1 | 1 | 4,926 | 50 | 1.20 |
SHAKE-128 | 1 | 1 | 7,768 | 50 | 2.80 |
SHAKE-256 | 1 | 1 | 7,147 | 40 | 1.81 |
Sample implementation results for a limited set of configurations implemented on a Xilinx Kintex UltraScale+ (speed grade -1) device are provided in the following table.
Hash | Rounds per Cycle | Number | LUTs | RAM | Freq. (MHz) | Gbps |
---|---|---|---|---|---|---|
SHA3-224 | 1 | 2 | 6,123 | 0 | 400 | 19.20 |
SHA3-256 | 1 | 2 | 6,008 | 0 | 400 | 18.13 |
SHA3-384 | 1 | 2 | 5,495 | 0 | 400 | 13.87 |
SHA3-512 | 1 | 2 | 4,926 | 0 | 400 | 9.60 |
SHAKE-128 | 1 | 2 | 6,523 | 0 | 400 | 22.40 |
SHAKE-256 | 1 | 2 | 6,044 | 0 | 400 | 18.13 |
Note that these sample implementation figures do not represent the highest speed or smallest area possible for the core.
This product is sourced from Technology Partner Beyond Semiconductor.
Features List
Standards Support
- FIPS 202: SHA-3 - Permutation-Based Hash and Extendable-Output Function
- FIPS 180-4: Secure Hash Standard (limited to SHA-3 use)
- All four fixed-length SHA-3 hash functions:
  - SHA3-224
  - SHA3-256
  - SHA3-384
  - SHA3-512
- Both SHA-3 extendable-output functions (XOF):
  - SHAKE-128
  - SHAKE-256
- NIST Certified
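When checking a configured core in simulation, a software model of the same FIPS 202 functions is useful for generating expected outputs. The sketch below uses Python's standard `hashlib` module, which implements all six functions; it is an illustrative cross-check, not a substitute for the bit-accurate C model supplied with the core. It assumes byte-aligned messages, whereas FIPS 202 also allows arbitrary bit lengths. `reference_output` is a hypothetical helper name.

```python
import hashlib

# Fixed-length SHA-3 functions from Python's standard library.
FIXED = {
    "SHA3-224": hashlib.sha3_224,
    "SHA3-256": hashlib.sha3_256,
    "SHA3-384": hashlib.sha3_384,
    "SHA3-512": hashlib.sha3_512,
}

# Extendable-output functions: the caller chooses the output length.
XOF = {
    "SHAKE-128": hashlib.shake_128,
    "SHAKE-256": hashlib.shake_256,
}


def reference_output(function: str, message: bytes, out_bytes: int = 32) -> bytes:
    """Expected output for a byte-aligned message.

    out_bytes is used only for the SHAKE functions.
    """
    if function in FIXED:
        return FIXED[function](message).digest()
    if function in XOF:
        return XOF[function](message).digest(out_bytes)
    raise ValueError(f"unsupported function: {function}")


if __name__ == "__main__":
    print(reference_output("SHA3-256", b"abc").hex())
```

For the message "abc", SHA3-256 yields `3a985da7...11431532`, the standard FIPS 202 example digest, which can be compared against the core's AXI4-Stream output.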
Performance
- User-selectable (1 to 4) permutation rounds per clock cycle, resulting in a throughput (in Mbps per MHz of clock frequency, i.e. bits per cycle) of:
  - SHA3-224: 48.0 to 192.0 Mbps/MHz
  - SHA3-256: 45.3 to 181.3 Mbps/MHz
  - SHA3-384: 34.7 to 138.7 Mbps/MHz
  - SHA3-512: 24.0 to 96.0 Mbps/MHz
  - SHAKE-128: 56.0 to 224.0 Mbps/MHz
  - SHAKE-256: 45.3 to 181.3 Mbps/MHz
- Intelligent buffer management optionally allows the core to receive new input while processing the previous message
- Throughput in excess of 200 Gb/s in modern ASIC technologies
Interfaces
- AMBA® AXI4-Stream
Fully autonomous operation
- Requires no assistance from the host processor
- Automatic padding insertion
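Automatic padding means the host supplies only the raw message. For byte-aligned input, the padding itself is fixed by FIPS 202: domain-separation bits (`01` for SHA3-d, `1111` for SHAKE) followed by the pad10*1 rule, extending the message to a multiple of the rate. The sketch below reproduces that byte-level result under the FIPS 202 bit ordering; it is a reference illustration, not a description of the core's internal logic, and `fips202_pad` is a hypothetical name.

```python
def fips202_pad(message: bytes, rate_bytes: int, xof: bool) -> bytes:
    """Pad a byte-aligned message to a multiple of rate_bytes (FIPS 202)."""
    # Domain-separation bits plus the first '1' of pad10*1, LSB first:
    # SHA3-d appends '01' + '1' -> 0x06; SHAKE appends '1111' + '1' -> 0x1F.
    first = 0x1F if xof else 0x06
    # Always at least one byte of padding, at most one full block.
    pad_len = rate_bytes - (len(message) % rate_bytes)
    padding = bytearray(pad_len)
    padding[0] = first
    # Final '1' of pad10*1 lands in the most significant bit of the last byte;
    # with a single padding byte this gives 0x86 (SHA-3) or 0x9F (SHAKE).
    padding[-1] |= 0x80
    return message + bytes(padding)
```

For SHA3-256 the rate is 1088 bits (136 bytes), so `fips202_pad(b"abc", 136, xof=False)` returns one 136-byte block: the three message bytes, then 0x06, zeros, and a final 0x80.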
Configuration Options
- Hashing function (bit-rate, capacity, number of permutation rounds)
- Input & output bus bit-width
- Number of input buffers
- Number of permutation rounds per cycle
Deliverables
- Verilog RTL source code or targeted FPGA netlist
- Integration Test-Bench
- Simulation & synthesis scripts
- Bit Accurate C Model
- User documentation