Digital IP Cores and Subsystems

Our family of microcontroller and microprocessor related cores includes capable and competitive 32-bit BA22s and the best-available set of proven 8051s.

32-bit Processors
BA2x Family Overview

Secure Processors
Geon - Protected Execution

Application Processors
BA25 Adv. App. Processor
BA22 Basic App. Processor

Cache-Enabled Embedded
BA22 Cache-Embedded

Embedded Processors
BA22 Deeply Embedded
BA21 Low Power
BA20 PipelineZero

Processor-Based AMBA® Subsystems
Family Overview
AHB Low-Power
AHB Performance/Low-Power
AXI Custom Performance

AMBA Bus Infrastructure Cores
See Peripheral Cores >

Efficiently compress media or data with these high-performance hardware codecs.
• See the video and image compression Family Page

JPEG Still & Motion
Encoders
Baseline
Extended
Ultra-Fast
Decoders
Baseline
Extended
Ultra-Fast

Easily integrate memories, peripherals, and hardware networking stacks into SoCs.

Display Controllers
TFT LCD

Device Controllers
Smart Card Reader

NOR Flash Controllers
Parallel Flash for AHB
SPI Flash
Octal, XIP for AHB
Quad, XIP for AHB
Quad, XIP for AXI

Legacy Peripherals
DMA Controllers
8237, 82380
UARTs
16450S, 16550S, 16750S
Timer/Counter
8254

Quickly complete the standard parts of your SoC with these memory and peripheral controllers, interfaces, and interconnect cores.

Ethernet MAC
• 1G eMAC Controller

Network Stacks
1G/10G UDP/IP stack
• Hardware RTP Stack
  – for H.264
  – for JPEG
• MPEG Transport Stream
  Encapsulator

SPI
Octal SPI
XIP for AHB
Quad SPI
XIP for AHB
XIP for AXI
Master/Slave
Single SPI
Master/Slave
Bridges
SPI to AHB-Lite

Data Link Controllers
• SDLC & HDLC
UARTs
16450S, 16550S, 16750S

PCI Express
Family Overview
x1/x4, x8
application interface

PCI — Target
32-bit, 32-bit multi, 64-bit
PCI — Master
32-bit, 32-bit multi, 64-bit
PCI — Host Bridge
32-bit, 32-bit - AHB
32-bit & device - AHB

These encryption cores make it easy to build security into a variety of systems.

AES
AES, programmable
  CCM, GCM
Key Expander

DES
DES single
DES triple

Hash Functions
SHA-3 (Keccak)
SHA-256
SHA-1
MD5


White Paper — Firmware Compression for Lower Energy and Faster Boot in IoT Devices

by Dr. Nikos Zervas, CAST, Inc.

The term “IoT,” for Internet of Things, has exploded to cover a wide range of applications and devices with very different requirements. Most observers, however, would agree that low energy consumption is a key element for IoT, as many of these devices must run on batteries or harvest energy from the environment.

Looking at how IoT devices actually use energy, it is clear that most:

  1. Sit idle much of the time,
  2. Wake up periodically or in response to an event,
  3. Perform some kind of processing,
  4. Transmit the results, and
  5. Go back to sleep.  

Step 2, booting or waking up, can be a significant power drain, and savings here can reduce the overall energy budget. In this article we will look at just that.

Energy Savings through Data Compression

Specifically, here we will show how GZIP data compression can help lower energy dissipation in embedded systems that use code shadowing, a common technique employed in IoT devices.

The basic idea is simple: on-the-fly decompression of previously compressed firmware reduces the data load and minimizes the number of accesses to long-term storage during boot or wake-up, hence reducing the energy (and the delay) during this critical phase of operation.

The possible energy and time-to-boot savings are proportional to the data compression level, which in turn depends on the compression algorithm and the code itself. Real-life examples we will explore here indicate that code size (and therefore power and time-to-boot) can be reduced by as much as 50% using commercially available IP cores for hardware Deflate/GUNZIP decompression.
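
How close a given firmware image comes to that 50% figure is easy to check before committing to hardware: running the binary through a software Deflate implementation gives a good estimate of the achievable ratio. Below is a minimal sketch using zlib; the file name is a placeholder and default Deflate settings are assumed, so the result only approximates what a tuned hardware core would store.

    /* Rough offline estimate of firmware compressibility using zlib's Deflate.
       Build with:  cc estimate.c -lz
       The file name "firmware.bin" is just a placeholder. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <zlib.h>

    int main(void)
    {
        FILE *f = fopen("firmware.bin", "rb");
        if (!f) return 1;
        fseek(f, 0, SEEK_END);
        long in_len = ftell(f);
        rewind(f);

        unsigned char *in = malloc((size_t)in_len);
        if (fread(in, 1, (size_t)in_len, f) != (size_t)in_len) return 1;
        fclose(f);

        uLongf out_len = compressBound((uLong)in_len);
        unsigned char *out = malloc(out_len);

        /* Level 9 gives the best ratio; the GZIP and ZLIB wrappers differ only
           by a few header/trailer bytes, so this is a fair proxy for the
           stored, compressed image size. */
        if (compress2(out, &out_len, in, (uLong)in_len, 9) != Z_OK) return 1;

        printf("uncompressed: %ld bytes, compressed: %lu bytes (%.1f%% smaller)\n",
               in_len, (unsigned long)out_len,
               100.0 * (1.0 - (double)out_len / (double)in_len));
        free(in);
        free(out);
        return 0;
    }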

Furthermore, we will see that the savings more than offset the extra resources required to build the right decompression core into the system.

Code Shadowing Versus Execute In Place

While they are in sleep mode, low-power embedded systems typically store their application code—and in some cases also application data—in a Non-Volatile Memory (NVM) device such as Flash, EPROM, or OTP.

When such systems wake up to perform their task, they run the application code via either of two methods (both are illustrated in the code sketch following this list):

  • They fetch and execute the code directly from the NVM, called XIP for eXecute In Place, or
  • They first copy the code to an on-chip SRAM unit, called the shadow memory, and execute it from there.
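
As a rough illustration of the two options, the following sketch shows a hypothetical wake-up routine; the addresses, the image size, and the memory-mapped view of the NVM are assumptions for illustration, not details from the article.

    /* Hypothetical wake-up code illustrating XIP versus code shadowing.
       Addresses and sizes below are placeholders, not values from the article. */
    #include <stdint.h>
    #include <string.h>

    #define NVM_BASE   ((const uint8_t *)0x90000000u)  /* memory-mapped NVM (assumed)      */
    #define SRAM_BASE  ((uint8_t *)0x20000000u)        /* on-chip shadow SRAM (assumed)    */
    #define CODE_SIZE  (64u * 1024u)                   /* application image size (assumed) */

    typedef void (*entry_fn)(void);

    /* Option 1: eXecute In Place - jump straight into the NVM-resident image;
       every instruction fetch is then a slow, energy-hungry NVM access. */
    static void boot_xip(void)
    {
        ((entry_fn)(uintptr_t)NVM_BASE)();
    }

    /* Option 2: code shadowing - copy the image into SRAM once, then execute
       from the fast on-chip memory. The copy itself is what costs boot time
       and energy, and it is this copy that compression shrinks. */
    static void boot_shadow(void)
    {
        memcpy(SRAM_BASE, NVM_BASE, CODE_SIZE);
        ((entry_fn)(uintptr_t)SRAM_BASE)();
    }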

Which method is best depends on the access speed and access energy of the NVM. In general, NVMs are significantly slower than on-chip SRAMs, and the energy cost of reading data from an NVM is much higher than that of reading the same data from an on-chip SRAM (especially when data are accessed in random order).

While using shadow memory seems best when considering the system’s active mode, the picture changes when we recall that an IoT device is usually asleep for most of its lifetime. Large on-chip SRAMs unfortunately suffer from leakage currents, and hence consume power even when in sleep mode, while most NVMs do not.

Designers therefore often choose code shadowing in cases where the shadow SRAM can be kept relatively small, or where stringent real-time requirements make the slow access times of XIP unacceptable.

Lower-Power Code Shadowing via Fast Data Compression

Figure 1:  System architecture for code shadowing with decompression.

We can address both issues—and steer the design decision towards the energy-saving code shadowing method—by reducing the size of the application code stored in the NVM.

Compressing the code using a lossless algorithm such as GZIP achieves this, but means the code must be decompressed before execution. Figure 1 illustrates an example IoT system architecture that does this. Here the NVM controller connects to the SoC bus (and from there to the on-chip SRAM) via a decompression engine, such as the GUNZIP IP core offered by CAST [1].

Storing compressed code means fewer energy-expensive NVM accesses are required for system wake-up, but we have now added the extra step of decompression, with its own delays and energy consumption. Whether this is worthwhile overall depends on:

  1. How much can we reduce the size of the application code, i.e., what is the achievable compression ratio, and
  2. What are the silicon area, power and latency requirements for the decompression hardware when we tune the compression algorithm settings to achieve a worthwhile compression ratio?

Let us next explore these factors by working through the numbers to see whether code shadowing with compression really provides a net energy savings.

Example Systems: How Much Energy is Really Saved?

Let us consider three IoT-like systems:

  • In our first example, the R8051XC2 8051 [2] microcontroller runs the Cygnal FreeRTOS port [3].
  • In our second example, the BA22-DE [4] processor runs a sensor control application with multiple threads managed by a FreeRTOS port.
  • In our third example, a Cortex-M3 processor [5] runs InterNiche Technologies’ demo of embedded TCP/IP and HTTP stacks [6].


Figure 2:  Hardware lossless data decompression engine; Huffman decoding type and History size are parameterized.

In all cases, we use the ZipAccel-D GUNZIP IP core [1] to decompress the firmware as it is read out of a low-power serial Flash NVM.

The energy savings depends on the compression level, which in turn depends on the compressibility of the code itself (a function of the ISA and the application) and the chosen GZIP parameters. The GZIP parameters most affecting the compression are the type of the Huffman engine and the size of the History. These parameters also affect the silicon requirements and latency of our GZIP engine.
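
When the hardware decompressor is configured with a limited History buffer and a static Huffman decoder, the firmware must be compressed with matching settings or the core will not be able to replay the stream. With zlib, for instance, the History window is bounded through deflateInit2's windowBits parameter (2^11 = 2048 bytes here) and static Huffman codes can be forced with the Z_FIXED strategy. The following is a sketch under those assumptions; the function name and buffers are placeholders.

    /* Compress a firmware buffer so that a static-Huffman decompressor with a
       2048-byte History can handle the stream. Adding 16 to windowBits asks
       zlib for a GZIP wrapper. Function and buffer names are placeholders. */
    #include <string.h>
    #include <zlib.h>

    size_t gzip_for_small_history(const unsigned char *code, size_t code_len,
                                  unsigned char *out, size_t out_cap)
    {
        z_stream s;
        memset(&s, 0, sizeof s);

        /* windowBits = 11 -> 2^11 = 2048-byte history; Z_FIXED -> static Huffman */
        if (deflateInit2(&s, Z_BEST_COMPRESSION, Z_DEFLATED,
                         11 + 16, 8, Z_FIXED) != Z_OK)
            return 0;

        s.next_in   = (Bytef *)code;
        s.avail_in  = (uInt)code_len;
        s.next_out  = out;
        s.avail_out = (uInt)out_cap;

        int rc = deflate(&s, Z_FINISH);   /* single-shot; out_cap must be large enough */
        size_t produced = (rc == Z_STREAM_END) ? (size_t)s.total_out : 0;
        deflateEnd(&s);
        return produced;
    }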

The uncompressed code sizes for our applications are 25.5 KBytes for the 8051 system, 161 KBytes for the BA2 system, and 985 KBytes for the Cortex-M3 system. Figure 3 shows the compression ratio for each of our example binaries, and Table 1 lists the area and latency of our decompression core for different sets of GZIP parameters.

Figure 3:  Compression Ratio for the image of InterNiche’s demo of embTCP and embHTTP.

 

ZipAccel-D Configuration          Area (kGates)   Memory (KBytes)   Latency (clock cycles)

Static Huffman, 1024 History           22              1.5                   20
Dynamic Huffman, 1024 History          38              6.0                ~1,500
Static Huffman, 2048 History           22              2.5                   20
Dynamic Huffman, 2048 History          38              7.0                ~1,500
Static Huffman, 4096 History           22              4.5                   20
Dynamic Huffman, 4096 History          38              9.0                ~1,500

Table 1:  ZipAccel-D decompression core silicon resources and latency.

 

To keep the GZIP processing latency and silicon overhead low, we will use Static Huffman tables and a 2048-byte History. This makes our compressed code about half the size of the uncompressed code, and similarly cuts the NVM size needed for code storage as well as the time and energy required to read the code out during boot or wake-up. Table 2 and Table 3 summarize these savings, assuming modern low-power serial Flash NVMs with a 5 mA read current and a 50 MHz read clock.
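
As a back-of-the-envelope check of the figures below: with a serial NVM read out at roughly one bit per clock, boot time is about the number of bits divided by the read clock, and boot charge is the read current multiplied by that time. The sketch below uses exactly that simplification (command/address overhead and wider SPI modes are ignored), so it approximates, rather than reproduces, the table values.

    /* Back-of-the-envelope boot time and boot charge for reading a code image
       out of a serial Flash NVM. Assumes ~1 bit per read clock and a constant
       read current; real parts add command/address overhead and may use wider
       (dual/quad) SPI modes, so treat the results as rough estimates. */
    #include <stdio.h>

    int main(void)
    {
        const double f_read_hz   = 50e6;   /* 50 MHz read clock (from the text) */
        const double i_read_ma   = 5.0;    /* 5 mA read current (from the text) */
        const double code_kbytes = 161.0;  /* e.g., the BA2 image; try 76 for
                                              its compressed counterpart        */

        double bits   = code_kbytes * 1024.0 * 8.0;
        double t_boot = bits / f_read_hz;            /* seconds  */
        double q_boot = i_read_ma * t_boot;          /* mA x sec */

        printf("boot time   ~ %.1f ms\n",  t_boot * 1e3);
        printf("boot charge ~ %.3f mA*s\n", q_boot);
        return 0;
    }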

 

 

                           Code Size (KBytes)                     Required NVM Size
                    System #1   System #2   System #3     System #1   System #2   System #3
                     (8051)      (BA2)       (ARM)         (8051)      (BA2)       (ARM)

Uncompressed Code      25.5        161         985         256 kbits    2 Mbits     8 Mbits
Compressed Code        10.9         76         511         128 kbits    1 Mbit      4 Mbits
Savings               57.25%      52.80%      48.12%        50.00%      50.00%      50.00%

Table 2:  NVM Size Savings achieved using Code Compression.
 

 

                            Boot Time (msec)                    Boot Energy (mA x sec)
                    System #1   System #2   System #3     System #1   System #2   System #3
                     (8051)      (BA2)       (ARM)         (8051)      (BA2)       (ARM)

Uncompressed Code      3.98         25         154           0.02        0.13        0.77
Compressed Code        1.7          12          80           0.01        0.06        0.40
Savings               57.29%      52.00%      48.05%        57.25%      52.80%      48.12%

Table 3:  Boot Time and Energy Savings achieved using Code Compression.

 

The resource savings average about 50% and are clearly significant, but at what cost?

Analyzing Compression Overheads

Using compression in the manner described introduces overheads in two areas: time and energy.

The ZipAccel-D decompression core used in our example systems [1] introduces a latency of 25 to 2,000 clock cycles, depending on whether static or dynamic Huffman tables are used for compression.

Even at 2,000 cycles of latency, and assuming the decompression core operates at the NVM's 50 MHz clock, the additional delay it adds is just 0.04 msec. The delay due to decompression is therefore practically negligible, since the time to simply read the code out of the NVM is two orders of magnitude higher.

On the energy side, the power usage of the decompression core is negligible while the system is active, but it also consumes energy while the system is idle. The significance of this idle-state power drain depends on the duty cycle of the system.

In our example systems, the idle power usage of the decompression core is 3 to 6 orders of magnitude lower than the power savings it enables. However, since energy is power integrated over time, longer system sleep times make this extra power drain more significant.
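
Whether that drain ever outweighs the benefit can be checked with a simple break-even comparison: the charge saved at each wake-up must exceed the extra leakage charge the decompression core accumulates over the preceding sleep interval. The sketch below uses the System #2 boot-charge figures from Table 3 and a purely illustrative leakage current; the leakage value is a placeholder, not a characterized number from the article.

    /* Break-even check: boot charge saved per wake-up versus extra sleep-mode
       leakage of the decompression core. The leakage current is a placeholder
       chosen for illustration only. */
    #include <stdio.h>

    int main(void)
    {
        const double q_saved_mas = 0.13 - 0.06;  /* System #2 boot-charge saving (Table 3) */
        const double i_leak_ma   = 1e-5;         /* hypothetical core leakage, 10 nA       */
        const double t_sleep_s   = 3600.0;       /* one wake-up per hour                   */

        double q_leak_mas = i_leak_ma * t_sleep_s;
        printf("saved per wake-up: %.3f mA*s, leaked per sleep: %.3f mA*s -> %s\n",
               q_saved_mas, q_leak_mas,
               q_saved_mas > q_leak_mas ? "net saving" : "net loss");
        return 0;
    }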

With the huge power savings we achieve, it is clear that storing compressed code and decompressing it when needed yields a net system energy savings for most IoT systems, even those with a duty cycle as low as a few msec per day.

Conclusion

IoT devices that employ code shadowing can enjoy significant energy savings by using code compression.

The compressed application code needs a smaller NVM device for long-term storage, and the system consumes significantly less time and energy reading the compressed code from the NVM into the on-chip SRAM.

An efficient hardware decompression engine, like the IP core available from CAST, can decompress the code in-line (as it is read out of the NVM), at the cost of practically negligible additional delay or energy usage.

 

 

[1]     ZipAccel-D GUNZIP/ZLIB/Inflate Data Decompression Core:
http://www.cast-inc.com/ip-cores/data/zipaccel-d/index.html

[2]     R8051XC2 High-Performance, Configurable, 8051-Compatible, 8-bit Microcontroller:
http://www.cast-inc.com/ip-cores/8051s/r8051xc2/index.html

[3]     FreeRTOS Cygnal (Silicon Labs) 8051 Port:
http://www.freertos.org/portcygn.html

[4]     BA22-DE 32-bit Deeply Embedded Processor:
http://www.cast-inc.com/ip-cores/processors32bit/ba22-de/index.html

[6]     InterNiche Technologies embedded TCP/IP stacks demo: http://www.iniche.com/source-code/networking-stack/nichestack.php

 
