Hyper-V - Life of Interrupt: Part 1
In this series of posts, I am going to talk about interrupt handling in a virtualized environment (specifically Hyper-V). The discussion will also cover interrupt handling on systems that have IOMMU-based interrupt remapping support. Interrupt remapping is required in Hyper-V for supporting SR-IOV enabled devices and device assignment to virtual machines.
The series is mostly written from a software perspective, and hardware details are provided only where necessary for understanding the concepts. The series is focused on x86/x64 based architectures, though the concepts described here can be applied to other architectures.
Interrupts are signals sent to the processor to indicate a condition that requires immediate attention. An interrupt may cause the processor to suspend its current instruction stream and switch to a different one. Interrupts typically have a priority associated with them, and only an interrupt with a priority higher than that of the currently executing instruction stream can interrupt the processor. A lower priority interrupt may remain pending until the processor is done executing higher priority code. An interrupt is typically handled by a special subroutine called an interrupt handler or interrupt service routine (ISR). ISRs typically run for a short duration, doing the minimal work required to handle the interrupting condition and then returning to the next pending instruction stream. In many cases, ISRs queue work to be executed later at a lower priority level, so that lower priority interrupts are not held off for a long duration.
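To make the "do the minimum in the ISR, defer the rest" idea concrete, here is a toy model in Python. It is only a sketch: the names keyboard_isr, deferred_work and drain_deferred_work are made up for illustration and do not correspond to any real operating system API.

```python
from collections import deque

deferred_work = deque()   # work queued by ISRs, drained later at a lower priority
log = []                  # records the order in which things happen

def keyboard_isr(scancode):
    """Hypothetical ISR: capture the data, queue the rest, return quickly."""
    log.append(f"isr: captured scancode {scancode:#04x}")
    deferred_work.append(
        lambda: log.append(f"deferred: processed {scancode:#04x}")
    )

def drain_deferred_work():
    """Runs once no ISR is active, i.e. at a lower priority level."""
    while deferred_work:
        deferred_work.popleft()()

# Two interrupts arrive back to back; both ISRs complete before
# any of the deferred work is executed.
keyboard_isr(0x1E)
keyboard_isr(0x30)
drain_deferred_work()

for entry in log:
    print(entry)
```

Both ISR entries appear in the log before either deferred entry, which is the ordering the paragraph above describes.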
8086 Microprocessors and Programmable Interrupt Controller
Intel 8086 based microprocessors up to the 80386 supported only a uni-processor design, and the processor supported 256 interrupts. PCs based on this architecture used an interrupt controller known as the programmable interrupt controller (PIC), also referred to as the Intel 8259A, to support hardware interrupts from external sources. One PIC had 8 interrupt inputs, and various devices were connected to it. In the PC/AT based architectures, two PICs were connected in a cascaded fashion, providing 15 hardware interrupts (one input of the first PIC is consumed by the cascade). Up to 15 devices could be connected to the PC via this PIC configuration. These include devices such as the system timer, keyboard and hard disk controller. The remaining interrupts (out of 256) are used for the non-maskable interrupt, exceptions and software interrupts. Each of these types of interrupts is described below.
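The arithmetic of the cascaded configuration can be sketched as follows. The vector bases (0x08 for the first PIC, 0x70 for the second) and the use of input 2 for the cascade are the conventional PC/AT BIOS defaults as I understand them; they are assumptions added here, not something stated above, and an operating system is free to reprogram them.

```python
MASTER_VECTOR_BASE = 0x08   # assumed: conventional real-mode base for IRQ 0-7
SLAVE_VECTOR_BASE = 0x70    # assumed: conventional real-mode base for IRQ 8-15
CASCADE_IRQ = 2             # assumed: master input used to chain the second PIC

def irq_to_vector(irq):
    """Map a PIC input (IRQ 0-15) to the interrupt vector the CPU sees."""
    if not 0 <= irq <= 15:
        raise ValueError("the cascaded PIC pair has inputs 0 through 15")
    if irq < 8:
        return MASTER_VECTOR_BASE + irq
    return SLAVE_VECTOR_BASE + (irq - 8)

# 2 PICs x 8 inputs = 16, minus the input consumed by the cascade.
usable_irqs = [irq for irq in range(16) if irq != CASCADE_IRQ]

print(len(usable_irqs))           # number of inputs left for devices
print(hex(irq_to_vector(1)))      # keyboard
print(hex(irq_to_vector(14)))     # hard disk controller
```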
Hardware Interrupts: These are the interrupts that are sent to the processor by external devices connected to the PIC. Multiple devices can also share an interrupt, in which case the ISRs of all the sharing devices need to be chained together, because without running a device's ISR it cannot be determined which device is issuing the interrupt. There are two types of hardware interrupts: edge triggered and level triggered. Edge triggered interrupts are asserted when the interrupt line transitions between the high and low states. These interrupts are issued once per transition and are not repeated. Many hardware devices issue an edge triggered interrupt when a state transition occurs, e.g. in networking when the packet queue goes from empty to non-empty. This makes it critical to never miss an edge triggered interrupt, because failure to handle one interrupt may result in no further interrupts being generated (e.g. because the packet queue never goes from empty to non-empty again). Level triggered interrupts remain asserted as long as the interrupt line is held at the active level (the active level can be high or low for a device). These interrupts require an acknowledgement at the device to indicate that the interrupt has been serviced, so that the interrupt line is moved to the inactive state. This is typically done by programming the device and then issuing an EOI (end of interrupt) request.
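The following toy model illustrates ISR chaining on a shared level triggered line, and how an edge triggered view of a line only sees transitions. The Device class and the function names are hypothetical, and for simplicity the edge counter treats only the inactive-to-active transition as an interrupt.

```python
class Device:
    """Hypothetical device with a level triggered interrupt output."""

    def __init__(self, name):
        self.name = name
        self.pending = False      # device-side "interrupt pending" status bit

    def raise_interrupt(self):
        self.pending = True

    def isr(self):
        """Return True if this device was interrupting and has been serviced."""
        if not self.pending:
            return False          # not ours; the next ISR in the chain must look
        self.pending = False      # acknowledge at the device, releasing the line
        return True

def line_asserted(devices):
    """A shared level triggered line stays active while any device holds it."""
    return any(d.pending for d in devices)

def dispatch_shared_interrupt(devices):
    """Run every ISR in the chain; return the names of the devices serviced."""
    return [d.name for d in devices if d.isr()]

def count_edges(samples):
    """Count inactive-to-active transitions in a sampled interrupt line."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if not prev and cur)

nic, disk = Device("nic"), Device("disk")
shared_line = [nic, disk]

nic.raise_interrupt()
asserted_before = line_asserted(shared_line)
serviced = dispatch_shared_interrupt(shared_line)
asserted_after = line_asserted(shared_line)

# The line is held active for three samples, but that is still a single edge.
edges = count_edges([0, 1, 1, 1, 0, 1])
```

Note that the line only returns to the inactive state after the owning device's ISR has acknowledged the interrupt at the device, which is why every ISR in the chain has to be given a chance to run.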
Non Maskable Interrupts: These are hardware interrupts that cannot be masked. Interrupt 2 is reserved for non-maskable interrupts. These are usually sent due to unrecoverable errors, such as non-correctable bit errors in RAM.
Exceptions: The difference between exceptions and interrupts is that exceptions are raised due to certain abnormal conditions. Exceptions include the divide by zero exception, the invalid opcode exception, and the breakpoint and single step exceptions. These conditions arise when the processor executes an instruction that causes it to raise an exception. This can be an arithmetic instruction resulting in a divide by zero exception, or an INT 3 instruction causing the processor to raise a breakpoint exception. Exceptions can be further categorized into faults, traps and aborts. However, from a software handling perspective, the ISRs for all types of exceptions and interrupts are written in a similar fashion. It is up to the system software to decide on specific handling for certain exception types as needed.
Software Interrupts: These are interrupts that are generated by software. On the 8086, software can cause an interrupt to be generated by using the INT instruction. The usage of software interrupts depends on the design of the system software. They may be used by system software to raise exceptions, to cause tasks to be paused or switched, or for any other purpose the system software needs. A common usage is INT 3, an instruction inserted by software to cause a breakpoint exception, which is useful when debugging programs.
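All of the interrupt types above are dispatched through the same 256-entry table. As a rough sketch of how that works on the 8086 in real mode, the table starts at physical address 0 and each entry is 4 bytes: a 16-bit offset followed by a 16-bit segment. The handler location used in the example (segment 0x1000, offset 0x0020) is made up for illustration.

```python
import struct

IVT_ENTRIES = 256
ENTRY_SIZE = 4   # 16-bit offset followed by 16-bit segment, little endian

# A few well-known vectors on the 8086 family.
WELL_KNOWN_VECTORS = {
    0: "divide error",
    1: "single step",
    2: "non-maskable interrupt",
    3: "breakpoint (INT 3)",
}

# The table occupies the first 1 KB of physical memory (0x000-0x3FF).
memory = bytearray(IVT_ENTRIES * ENTRY_SIZE)

def set_vector(vector, segment, offset):
    """Install a handler's far pointer into the table entry for `vector`."""
    struct.pack_into("<HH", memory, vector * ENTRY_SIZE, offset, segment)

def handler_address(vector):
    """Return the 20-bit physical address the CPU jumps to for `vector`."""
    offset, segment = struct.unpack_from("<HH", memory, vector * ENTRY_SIZE)
    return ((segment << 4) + offset) & 0xFFFFF

# Hypothetical debugger handler for the breakpoint exception.
set_vector(3, segment=0x1000, offset=0x0020)
breakpoint_entry_address = 3 * ENTRY_SIZE
breakpoint_handler = handler_address(3)
```

Whether vector 3 is reached by a hardware condition or by an explicit INT 3 instruction, the processor looks up the same entry, which is why ISRs for the different types can be written in a similar fashion.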
Multiprocessor Architecture and Advanced Programmable Interrupt Controller
The Intel multiprocessor architecture introduced an enhanced interrupt controller, referred to as the Advanced Programmable Interrupt Controller (APIC). APIC is an umbrella term that refers to two separate components, known as the local APIC (LAPIC) and the I/O APIC (IOAPIC), as described below.
Local APIC (LAPIC): Each processor in a multiprocessor system has one LAPIC. The LAPIC is responsible for receiving interrupt requests and delivering them to the processor, handling prioritization of interrupts, and sending interrupts to other processors (known as inter-processor interrupts or IPIs). IPIs are an important part of multiprocessor operating systems, as they are used for synchronization across processors. The LAPIC can be connected directly to I/O devices via local interrupt inputs or through the IOAPIC via external interrupt inputs. The LAPIC can generate interrupts in response to requests from various sources, such as IPIs received from other processors or from itself (known as a self-IPI), interrupts coming from LINT or EXTINT, thermal and performance interrupts, and the APIC timer interrupt.
I/O APIC (IOAPIC): The IOAPIC connects to devices to allow device interrupt requests to be routed to the LAPIC(s). There can be one or more IOAPICs in the system. The IOAPIC receives interrupt requests from the devices and sends them to the LAPIC(s) based upon the redirection table entries (RTEs) programmed in the IOAPIC.
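As a simplified sketch of LAPIC prioritization: as I understand Intel's documentation, the priority class of an interrupt is the upper four bits of its vector, and a pending interrupt is delivered only if its class is strictly higher than both the task priority and the class of any interrupt currently in service. The function names below are my own, and the model ignores many details (for example, the sub-class bits and the exact register layout).

```python
def priority_class(vector):
    """Upper four bits of an 8-bit vector (or task priority value)."""
    return (vector & 0xFF) >> 4

def next_deliverable(pending_vectors, in_service_vectors, task_priority=0):
    """Return the vector the LAPIC would deliver next, or None if it must wait.

    pending_vectors    -- vectors requested but not yet delivered
    in_service_vectors -- vectors whose ISRs are currently running
    task_priority      -- 8-bit task priority set by system software
    """
    if not pending_vectors:
        return None
    highest_pending = max(pending_vectors)
    current_class = max(
        [priority_class(task_priority)]
        + [priority_class(v) for v in in_service_vectors]
    )
    if priority_class(highest_pending) > current_class:
        return highest_pending
    return None   # stays pending until higher priority code finishes

# Vector 0x52 (class 5) preempts an in-service 0x41 (class 4) ...
preempting = next_deliverable({0x31, 0x52}, {0x41})
# ... but 0x45 is in the same class as 0x41, so it has to wait.
waiting = next_deliverable({0x45}, {0x41})
```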
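To give a feel for what a redirection table entry holds, here is a small sketch that packs and unpacks a 64-bit RTE. The field positions follow my reading of the Intel 82093AA I/O APIC datasheet (vector in bits 7:0, delivery mode in 10:8, destination mode in bit 11, polarity in bit 13, trigger mode in bit 15, mask in bit 16, destination in bits 63:56); the helper names are made up, and the read-only status bits are left out.

```python
def make_rte(vector, destination, *, delivery_mode=0, logical_dest=False,
             active_low=False, level_triggered=False, masked=False):
    """Pack the programmable fields of an IOAPIC redirection table entry."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    if not 0 <= destination <= 0xFF:
        raise ValueError("destination must fit in 8 bits")
    if not 0 <= delivery_mode <= 0x7:
        raise ValueError("delivery mode must fit in 3 bits")
    return (
        vector
        | (delivery_mode << 8)
        | (int(logical_dest) << 11)
        | (int(active_low) << 13)
        | (int(level_triggered) << 15)
        | (int(masked) << 16)
        | (destination << 56)
    )

def decode_rte(rte):
    """Unpack the fields written by make_rte."""
    return {
        "vector": rte & 0xFF,
        "delivery_mode": (rte >> 8) & 0x7,
        "logical_dest": bool((rte >> 11) & 1),
        "active_low": bool((rte >> 13) & 1),
        "level_triggered": bool((rte >> 15) & 1),
        "masked": bool((rte >> 16) & 1),
        "destination": (rte >> 56) & 0xFF,
    }

# Route a level triggered, active-low input to vector 0x41 on LAPIC ID 2.
rte = make_rte(0x41, 2, level_triggered=True, active_low=True)
fields = decode_rte(rte)
print(hex(rte))
```

When the device asserts the corresponding IOAPIC input, the IOAPIC uses the entry for that input to decide which vector to send and which LAPIC(s) should receive it.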
Message Signalled Interrupts (MSI)
PCI introduced the concept of message signalled interrupts. With MSI, a device raises an interrupt by performing a memory write (DMA) of a small value (MSI data, 16 bits wide in the original MSI capability and 32 bits for MSI-X) to a physical memory address (MSI address, which may be 32 or 64 bits wide). The MSI address falls in a range of special addresses that are intercepted by the chipset, which then sends the appropriate interrupt request to the LAPIC. Please note that PCI devices can also be connected via the IOAPIC, in which case the interrupt request to the LAPIC is generated based on the IOAPIC RTE. For PCI devices not connected via the IOAPIC, however, interrupt requests are generated by a DMA write of MSI data to the MSI address. Using MSI removes the dependency on the IOAPIC and provides a more flexible way to generate interrupts, as the interrupt requests are not limited by the number of interrupt inputs on the IOAPIC.
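As an illustration, here is how an MSI address and data pair might be composed on x86 when interrupt remapping is not in use. The layout reflects my understanding of Intel's documentation (addresses in the 0xFEExxxxx range with the destination APIC ID in bits 19:12, and the vector in the low 8 bits of the data); the function names are hypothetical.

```python
MSI_ADDRESS_BASE = 0xFEE00000   # x86 range intercepted as interrupt messages

def msi_address(dest_apic_id, *, redirection_hint=False, logical_dest=False):
    """Compose the address a device writes to in order to signal an interrupt."""
    if not 0 <= dest_apic_id <= 0xFF:
        raise ValueError("destination APIC ID must fit in 8 bits")
    return (
        MSI_ADDRESS_BASE
        | (dest_apic_id << 12)
        | (int(redirection_hint) << 3)
        | (int(logical_dest) << 2)
    )

def msi_data(vector, *, delivery_mode=0, level_triggered=False,
             assert_level=True):
    """Compose the data value written to the MSI address."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    if not 0 <= delivery_mode <= 0x7:
        raise ValueError("delivery mode must fit in 3 bits")
    data = vector | (delivery_mode << 8)
    if level_triggered:
        data |= (1 << 15) | (int(assert_level) << 14)
    return data

def is_msi_address(address):
    """True if a memory write to `address` would be treated as an interrupt."""
    return (address >> 20) == 0xFEE

# Edge triggered, fixed delivery of vector 0x51 to the LAPIC with ID 1.
address = msi_address(1)
data = msi_data(0x51)
print(hex(address), hex(data))
```

System software programs this pair into the device's MSI capability; the device then only has to perform an ordinary memory write to raise the interrupt, with no IOAPIC input involved.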
I have only briefly touched upon the topic of interrupts in this post. It is a complex topic with lots of details, so I have provided a list of references for those who would like to learn more about it. In future posts in this series, I will talk about how interrupts are handled in a Hyper-V environment, and then move on to the topic of interrupt remapping and how interrupt requests from physical devices are routed to virtual machines.