8051 Interrupt Latency: Designing with Modern 8-bit MCUs

White Paper Article

by Dr. Nikos Zervas, CAST, Inc.

Systems in today’s fast-growth application areas—Internet of Things, smart automobile systems, wearable electronics, etc.—require a large number and variety of focused, inexpensive, low-energy subsystems. This sets the stage for a new microcontroller war between 32-bit processors—who thought they had already won—and 8-bit veterans like the 8051—who have undergone a fitness regimen and returned to the field with new vigor.

One battle in this war is fought over interrupt latency, the number of clock cycles required between an interrupt making itself known and the processor starting to execute the appropriate operations in response. This delay matters in any application doing real-time processing. Variations in this latency over time also matter, as large differences or jitter lead to errors in some applications.

Here we’ll look at what determines interrupt latency in the latest MCS®51-compatible 8-bit 8051 MCUs, and how this deflates old war stories about 32-bit RISC MCUs being better.

Interrupt Latency in Modern 8051s

8051 IP cores such as the popular R8051XC2 and the new S8051XC3, available from CAST, Inc., use the latest processor design techniques to give these microcontrollers vastly better performance and energy characteristics than their original discrete-chip ancestors.

In discussing latency handling, let’s assume:

·         Zero wait states in the memory system,

·         No connection delays between the interrupt sources and the processor, and

·         No blockage of the interrupt service by another currently-running exception or interrupt service.

From Interrupt to the First ISR Instruction

Our two example microcontrollers have these latencies from interrupt line assertion to first Interrupt Service Routine (ISR) instruction execution:

·         R8051XC2 = 4 cycles delay

·         S8051XC3 = 3 cycles delay

These figures show how quickly modern 8051s can enter an ISR, but one must look closer. Usually that first ISR instruction is not the first functional instruction of the C code. Instead, it starts saving the processor context to the stack so that the interrupted program can resume after the interrupt is handled.

Stacking and Unstacking Processor Context

Five 8051 registers are relevant to context saving:

·         Program Status Word (PSW): carries status flags (e.g. overflow) and a two-bit selector field for the GPR register bank.

·         Accumulator (A): used by most instructions.

·         B: an accumulator extension used by multiply and divide instructions.

·         Data Pointer Registers (DPTRs): dual DPTRs speed up memory-to-memory move instructions, and DPTRs can be banked in up to eight banks. The R8051XC2 can use one, two, or eight DPTRs, while the S8051XC3 always uses dual DPTRs.

·         General Purpose Registers (GPRs): arranged in four banks of eight GPRs each (named R0-R7).

In practice, the compiler (or the assembly code developer) only stacks registers whose value is altered by the ISR. For example, some ISRs don’t stack any registers when the interrupt just increases a counter value (Example 1).

Example 1: This ISR does not need to stack any register values.
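
The original listing is not reproduced here. As a minimal sketch of the kind of ISR this caption describes, the code below assumes the Keil C51 `interrupt` and `using` function attributes. The `ISR_ATTR` macro, the `tick_count` variable, and the choice of vector 1 (Timer 0 on a standard 8051) are illustrative assumptions, not taken from the original example.

```c
/* Keil C51 accepts "interrupt N using B" after the parameter list.
   Other compilers do not, so the attribute is hidden behind a macro.
   ISR_ATTR is a hypothetical helper, not part of any toolchain. */
#if defined(__C51__)
#define ISR_ATTR(vec, bank) interrupt vec using bank
#else
#define ISR_ATTR(vec, bank) /* expands to nothing on a host compiler */
#endif

/* Hypothetical counter shared with the main program. */
volatile unsigned char tick_count = 0;

/* Increments the counter and returns. An 8051 compiler can typically
   implement this as a single INC on a directly addressed byte, which
   changes neither the accumulator nor the status flags, so there is
   nothing to push on entry or pop on exit. */
void timer0_isr(void) ISR_ATTR(1, 1)
{
    tick_count++;
}
```

Whether a given compiler really emits no pushes for such a routine is best confirmed in its assembly listing.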

Most ISRs performing data calculations will save the PSW and the Accumulator values (Example 2). For multiply or divide instructions, the ISR should also save the B register.

Example 2: This ISR only needs to stack two register values (PSW and ACC).

The GPRs are rarely saved, as software developers usually exploit register banking to avoid this need. For example, a system with three interrupt priority levels can assign all priority 0, 1, and 2 interrupts to GPR banks 0, 1, and 2 respectively, and assign the “main” program to bank 3. Since no ISR can be interrupted by another ISR of the same priority, the need for stacking and unstacking the GPRs is eliminated. (Note that in C/C++, assigning an ISR to a register bank is done by adding “using n”, where n is the bank number, to the function declaration, as shown in our code examples.)
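
The bank assignment described above might look like the following sketch. It assumes Keil C51 syntax. The vector numbers, the priority attributed to each source, the `ISR_ATTR` macro, and all variable names are illustrative assumptions. The priorities themselves would be configured separately through the core's interrupt priority registers.

```c
/* ISR_ATTR is a hypothetical helper that hides the Keil C51
   "interrupt N using B" attribute from other compilers. */
#if defined(__C51__)
#define ISR_ATTR(vec, bank) interrupt vec using bank
#else
#define ISR_ATTR(vec, bank)
#endif

volatile unsigned char overflow_count = 0; /* hypothetical event counter    */
volatile unsigned char last_sample    = 0; /* hypothetical input value      */
volatile unsigned int  sample_sum     = 0; /* hypothetical running total    */

/* Assumed to be a priority 0 source, so it works in GPR bank 0. */
void timer0_isr(void) ISR_ATTR(1, 0)
{
    overflow_count++;
}

/* Assumed to be a priority 1 source, so it works in GPR bank 1.
   It can preempt timer0_isr, but because it uses a different bank it
   cannot corrupt that routine's R0-R7, and no GPRs need stacking.
   The addition goes through the accumulator and the carry flag, so a
   compiler would normally still stack ACC and PSW here. */
void ext1_isr(void) ISR_ATTR(2, 1)
{
    sample_sum += last_sample;
}
```

The main program, which the text places in bank 3, is not shown; how it selects its bank depends on the toolchain's startup code.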

Finally, if the DPTRs are not banked, and if the ISR alters their value, they may also need to be stacked (and later unstacked) by the ISR. Software developers typically use compiler directives to avoid this.

Table 1 summarizes the latency for register stacking and unstacking in our 8051s.

Table 1: Clock cycles for stacking/unstacking registers with CAST 8051 cores.

So, a typical ISR that saves just the PSW and ACC and initializes the PSW incurs a delay of five cycles. The worst realistic case (i.e., when register shadowing is exploited, so no GPRs are stacked) is 15 cycles: six for PSW, ACC, and B, eight for the DPTRs, and one for initializing the PSW. Failing to exploit register shadowing is generally bad coding practice, but in extreme cases it may be unavoidable; stacking or unstacking each GPR then adds two cycles of latency.
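
The arithmetic behind these figures can be written out as a small model. This is only a sketch built from the cycle counts quoted in the paragraph above (two cycles per byte register, eight for the DPTRs, one for PSW initialization); the struct and function names are hypothetical.

```c
/* Cycle counts as quoted in the text for the CAST 8051 cores. */
enum {
    CYCLES_PER_BYTE_REG = 2, /* PSW, ACC, B, or a single GPR */
    CYCLES_DPTRS        = 8, /* the data pointers, taken together */
    CYCLES_PSW_INIT     = 1  /* initializing the PSW in the ISR */
};

/* Describes what a hypothetical ISR has to save on entry. */
struct isr_context {
    unsigned byte_regs;  /* how many of PSW, ACC, B are stacked (0-3) */
    unsigned gprs;       /* how many of R0-R7 are stacked (0-8) */
    int      uses_dptrs; /* nonzero if the DPTRs are stacked */
    int      inits_psw;  /* nonzero if the ISR initializes the PSW */
};

/* Returns the number of cycles spent before the first useful
   instruction on stacking and PSW setup for the given context. */
unsigned stacking_cycles(const struct isr_context *c)
{
    unsigned cycles = CYCLES_PER_BYTE_REG * (c->byte_regs + c->gprs);
    if (c->uses_dptrs) {
        cycles += CYCLES_DPTRS;
    }
    if (c->inits_psw) {
        cycles += CYCLES_PSW_INIT;
    }
    return cycles;
}
```

With PSW and ACC plus PSW initialization the model gives 5 cycles; adding B and the DPTRs gives 15, matching the figures above.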

Real Latency

Regardless of how processor vendors define interrupt latency, users care about the number of cycles between the interrupt assertion and the execution of the first “useful instruction” of their ISR. Let’s call this the Real Interrupt Latency.

Figure 1 depicts this for CAST 8051 cores. It’s typically 10 or 18 cycles, depending on whether the DPTRs are used by the ISR. 

Figure 1. 8051 interrupt handling sequence.

Interrupt Latency Jitter

Variations in interrupt response time (jitter) can have undesirable effects in some applications, such as audio distortion or motor noise and vibration. The main source of this jitter is instructions that execute in multiple cycles, because the ISR must wait for the current instruction to complete.

Older 8051 architectures could be problematic for jitter-sensitive applications, as most instructions executed in multiple cycles (up to 12 in the original 8051). But the best modern 8051s eliminate this shortcoming by executing most instructions in one cycle. For example, the only S8051XC3 instructions needing an additional cycle are those that operate on data not stored in registers, yielding a remarkably low interrupt latency jitter of one clock cycle.

Comparing 8051 to 32-bit RISC Architectures

Choosing between a small, low-power 32-bit RISC processor and an 8051 is a common dilemma for today’s designers. The ARM® Cortex-M0 is especially popular, and is promoted for its low interrupt latency and jitter. Let’s compare this to CAST’s S8051XC3.

The Cortex processors employ a Nested Vectored Interrupt Controller (NVIC) that works with the CPU to reduce interrupt latency. With its NVIC, the Cortex-M0 resolves interrupt priorities and saves some processor context and five user registers in just 16 cycles. The Real Interrupt Latency is further extended by at least one more cycle, as every ISR must clear the NVIC’s interrupt pending bit.

In many cases, an ISR would use more than the five registers automatically pushed to the stack, with the worst case needing 11 more cycles to push all 16 user registers if these are all in use. This brings the Cortex-M0 processor’s Real Interrupt Latency up to 17 cycles in the best case, and up to 28 cycles in the worst.
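
The best and worst cases quoted here follow from simple addition. The sketch below uses only the cycle counts given in the text; the macro and function names are hypothetical and the model is not an ARM-published formula.

```c
/* Figures quoted in the text for the Cortex-M0. */
#define M0_ENTRY_CYCLES          16u /* priority resolution and automatic stacking */
#define M0_CLEAR_PENDING_CYCLES   1u /* clearing the pending bit, at least one cycle */
#define M0_MAX_EXTRA_PUSH_CYCLES 11u /* pushing the remaining user registers */

/* Returns the Real Interrupt Latency for an ISR that spends
   extra_push_cycles stacking registers beyond the automatic five.
   Values above the quoted maximum are clamped to it. */
unsigned m0_real_latency(unsigned extra_push_cycles)
{
    if (extra_push_cycles > M0_MAX_EXTRA_PUSH_CYCLES) {
        extra_push_cycles = M0_MAX_EXTRA_PUSH_CYCLES;
    }
    return M0_ENTRY_CYCLES + M0_CLEAR_PENDING_CYCLES + extra_push_cycles;
}
```

Calling `m0_real_latency(0)` gives the 17-cycle best case and `m0_real_latency(11)` the 28-cycle worst case.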

Interrupt latency jitter of the Cortex-M0 can be quite significant, as interrupts may arrive during a load or store instruction that executes over multiple cycles (depending on the amount of data being moved). Moreover, the Cortex-M0 needs three or four cycles to execute some instructions, potentially causing more jitter than our 8051’s one or two.

Table 2 compares the interrupt latency measures of CAST’s S8051XC3 and ARM’s Cortex-M0.

As promoted, the Cortex-M0 does indeed exhibit remarkably low interrupt latency, without using shadow registers. But the 8051 has better typical and worst-case delays, and less jitter. Moreover, the Cortex-M0 needs its rather complex NVIC to compete, and this carries penalties in silicon area and power consumption. Although information on the NVIC size is not publicly available, the silicon footprint of such a complex module is likely comparable to that of an entire 8051 CPU.

Table 2. Comparing 8051 and 32-bit RISC interrupt processing.


We have shown that modernized 8051s can deliver lower interrupt latency than 32-bit RISC processors, despite this being one of the larger processors’ widely promoted strengths. These 8051s also execute interrupt routines very quickly (offering comparable performance and running at much higher clock frequencies when required), and they have an edge in power consumption with their significantly smaller silicon footprint and advanced power management techniques.

In conclusion, modernized 8051s can clearly still win the selection battle over 32-bit RISC processors for IoT and other devices requiring small, low-energy embedded controllers that respond quickly to interrupts, and the smart general/system designer is wise to consider deploying them for such applications.

About the Author

Dr. Nikos Zervas joined CAST, Inc. in 2010 and is the VP of Marketing. Before that he was a co-founder, chairman, and CEO of video/image SIP vendor Alma Technologies, SA. He served as a board member of the Hellenic Silicon Industry Association from 2010 to 2013, is a senior member of IEEE, and has published over forty papers in journals and international conferences.

CAST is a trademark of CAST, Inc. All other trademarks and registered trademarks are the property of their respective owners.

