Windows Embedded CE 6.0 Advanced Memory Management

1/19/2010

Douglas Boling

February 2007

Summary

This article covers how the new version of Windows Embedded CE handles memory, how it is architected, and what impacts these changes will have on applications.

Important

This topic applies to an older product version. See documentation for the most current version of Windows Embedded Compact. Or visit Windows Embedded Products & Solutions for the latest information about intelligent systems powered by Microsoft.

The Memory Architecture

The Application Virtual Address Space

The Kernel Virtual Address Space

Programming Impacts of the New Design

Security

Interprocess Communication

Introduction

Over the last 10 years, Windows Embedded CE has grown from a fresh-faced newcomer to a grizzled veteran of the embedded operating system world. During this time, Microsoft has improved almost everything about Windows Embedded CE except the way it manages memory. Sure, Windows Embedded CE is and has always been a modern, preemptively multitasking operating system with virtual memory support, but there were some severe limits for memory- and code-intensive systems such as set-top boxes.

Specifically, the limits are the 32 concurrent process limit and the 32 MB application virtual space limit. Neither of these limits was a problem in the early days of Windows Embedded CE, nor are they a problem on many embedded systems built today. The problems occur on systems that are intensively media driven and therefore run Windows Media® Player, on systems that need large amounts of system and application code, and on systems that tend to create large numbers of small processes, such as some process control systems.

Windows Embedded CE 6.0 blows away the “two 32’s,” due to a completely rewritten kernel and new operating system architecture. The new kernel allows up to 32 thousand processes running at any one time. I suspect this new 32K process “limit” should not be a problem for at least a few years. In addition, virtual memory space for a given application has been improved from 32 MB of virtual address space per process to 2 GB of address space per process.

The Memory Architecture

To fully understand the improvements in the new kernel, a review of the architecture used in Windows Embedded CE 5.0 is helpful. Figure 1 shows the Windows Embedded CE 5.0 unified virtual address space. As with both Windows® XP and Windows Embedded CE 6.0, the top 2 GBs of the address space is reserved by the system. The lower half of the address space is divided into a number of regions. The majority of this area, almost half of the space, is defined as the Large Memory Area. This area is used to allocate large blocks of memory space typically used for memory-mapped files.

Figure 1

Below the Large Memory Area is the set of 31 “process slots” that contain the virtual address images for the processes that are currently running. Below the process slots, at the extreme low end of the memory space is a 64-MB area. This 64-MB area, more precisely the lowest 32 MB of the area, is the replicated process slot for the process that contains the currently running thread.

It is this “slot architecture” that imposes both the 32-process and the 32-MB virtual memory limits. Because there are a limited number of slots (31), there are a limited number of concurrent processes. (In marketing math, 31 process slots turn into 32 processes when you add in the kernel process, which is located in the upper 2 GBs of the address space.)

An expanded look at the lowest 64 MB of the earlier Windows Embedded CE process space is shown in Figure 2. The lower 32 MB of the process space is the replicated space from the process slot of the process where the currently running thread is executing. The upper 32 MB of this space is used to load the code and read-only memory for ROM-based dynamic-link libraries (DLLs). This upper 32 MB, known as “slot 1,” is shared across all running applications.

Figure 2

The 32 MB virtual space limit comes from the size of each individual process slot. Because of the single, unified address space, making the virtual space per process larger would result in fewer total slots and thus fewer concurrent processes. As it is, the compromise of 32 processes and 32 MB per process works, or at least worked, pretty well.
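The trade-off can be made concrete with a little arithmetic. The following is a minimal sketch in C, not operating system code: it treats the 31 slots of 32 MB each as a fixed 992 MB budget (an assumption made for illustration) and shows how the slot count shrinks as the slot size grows. The function name slots_for_size is hypothetical.

```c
#include <assert.h>

/* Address space given over to process slots, in megabytes.
   31 slots x 32 MB comes from the text; treating that total as a
   fixed budget is an assumption of this sketch. */
#define SLOT_BUDGET_MB 992u

/* Number of process slots that fit in the budget for a given slot size. */
unsigned slots_for_size(unsigned slot_mb)
{
    assert(slot_mb > 0);
    return SLOT_BUDGET_MB / slot_mb;
}
```

With these assumptions, doubling the per-process space to 64 MB leaves room for only 15 slots, and 128 MB leaves room for 7.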

The Application Virtual Address Space

The memory architecture of Windows Embedded CE 6.0 is shown in Figure 3. With the new design, each running process gets its own copy of the entire lower 2 GBs of the address space. While this 2 GB space seems on the surface to have the same layout as Windows XP, the use of the application address space is different.

Figure 3

Figure 4 shows the layout of the virtual memory space of a given Windows Embedded CE 6.0 process. Like the earlier Windows Embedded CE architecture, an application’s virtual memory space is divided in two. The lower 1 GB is space where the application code is loaded and where the application can allocate memory. This is where all memory allocations will be placed as well as the location for the stacks for all threads in the application.

Figure 4

Just above this region is a 512-MB region where the system will load the code and read-only data for the DLLs that are loaded by the various applications currently running. As in earlier versions of Windows Embedded CE, a DLL that is loaded at a specific address for one process is loaded at the same address for all processes that load that DLL. DLLs in this region are loaded from the bottom up (starting at 0x4000 0000) instead of top down, as was the case for earlier versions of Windows Embedded CE.
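Bottom-up placement can be pictured as a simple bump allocator over the shared DLL region. The following is a minimal sketch in C under stated assumptions, not the loader's actual algorithm: the names dll_region, dll_region_init, and dll_region_reserve are hypothetical, and the 64 KB alignment is an assumption made for illustration. Because a single allocator serves the whole system in this model, a DLL receives the same base address in every process that loads it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DLL_REGION_BASE 0x40000000u  /* start of the shared DLL region */
#define DLL_REGION_END  0x60000000u  /* base + 512 MB */
#define DLL_ALIGN       0x00010000u  /* 64 KB alignment: an assumption */

/* System-wide state: the next unused address in the DLL region. */
typedef struct {
    uint32_t next_free;
} dll_region;

void dll_region_init(dll_region *r)
{
    r->next_free = DLL_REGION_BASE;
}

/* Reserve address space for one DLL image, working from the bottom up.
   Returns the base address, or 0 if the region cannot hold the image. */
uint32_t dll_region_reserve(dll_region *r, uint32_t image_size)
{
    assert(r != NULL && image_size > 0);

    /* Reject oversized images first so the rounding below cannot overflow. */
    if (image_size > DLL_REGION_END - DLL_REGION_BASE)
        return 0;

    uint32_t rounded = (image_size + DLL_ALIGN - 1u) & ~(DLL_ALIGN - 1u);
    if (rounded > DLL_REGION_END - r->next_free)
        return 0;

    uint32_t base = r->next_free;
    r->next_free += rounded;
    return base;
}
```

In this model the first DLL reserved lands at 0x4000 0000 and each later DLL is placed at the next aligned address above it.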

Just above the DLL space, starting at address 0x6000 0000, is a 256-MB region used to allocate RAM-backed memory-mapped files. RAM-backed memory-mapped files, also known as memory-mapped objects, are memory-mapped files that do not have an actual file backing the data in the object. Memory-mapped objects are typically used for interprocess communication. To facilitate backward compatibility, if a named memory-mapped object is allocated in more than one process, it is mapped at the same base address for all processes that map the object. If a process opens an actual file for memory-mapped access, the buffer for that memory-mapped file is allocated in the lower 1 GB of the application’s address space.

At virtual address 0x7000 0000 is a 255-MB region that is used for communication between the operating system and the application. This region is read-only to the application but can be read and written to by the operating system. Finally, there is a 1-MB guard region at 0x7FF0 0000 (255 MB above 0x7000 0000) that is not accessible by the application or the operating system.

To summarize, the application has 1 GB of its address space for its code and for all memory and stack allocations, and another GB that is available for dedicated purposes. While only half of the virtual memory space is available for memory allocations, it is much better than the earlier 32-MB region in Windows Embedded CE 5.0 that was available for the same purpose. In addition, because Windows Embedded CE 6.0 retains the current limit of 512 MB of physical RAM, I suspect a system will run out of physical RAM before an application runs out of virtual memory space.
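The user-mode layout described in this section can be restated as a small lookup. The following is a minimal sketch in C, not operating system code: the enum and function names are hypothetical, finer details of the real map are ignored, and the guard boundary is computed as 0x7000 0000 plus the 255 MB stated for the shared region, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    USER_PROCESS_SPACE,  /* 0x00000000: code, allocations, stacks (1 GB) */
    USER_SHARED_DLLS,    /* 0x40000000: DLL code and read-only data (512 MB) */
    USER_RAM_MAPPED,     /* 0x60000000: RAM-backed memory-mapped objects (256 MB) */
    USER_SYSTEM_SHARED,  /* 0x70000000: OS-to-application region (255 MB) */
    USER_GUARD,          /* final 1 MB below the 2 GB boundary */
    NOT_USER_SPACE       /* 0x80000000 and above belongs to the kernel */
} user_region;

/* Map a 32-bit virtual address to the region of the process
   address space that contains it. */
user_region classify_user_address(uint32_t addr)
{
    const uint32_t MB = 1024u * 1024u;
    /* Assumed guard start: the shared region base plus its 255 MB size. */
    const uint32_t guard_start = 0x70000000u + 255u * MB;
    assert(guard_start == 0x7FF00000u);

    if (addr >= 0x80000000u) return NOT_USER_SPACE;
    if (addr >= guard_start) return USER_GUARD;
    if (addr >= 0x70000000u) return USER_SYSTEM_SHARED;
    if (addr >= 0x60000000u) return USER_RAM_MAPPED;
    if (addr >= 0x40000000u) return USER_SHARED_DLLS;
    return USER_PROCESS_SPACE;
}
```

A debugging aid built on such a function could, for example, label a faulting address as a stray pointer into the shared DLL region rather than into the process's own allocations.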

The Kernel Virtual Address Space

The memory map for kernel address space in Windows Embedded CE 6.0 is shown in Figure 5.

Figure 5

As with earlier versions of Windows Embedded CE, the first two regions of the kernel address space are the cached and uncached windows in the physical address space. It is through these windows that the operating system and the drivers access the RAM and memory-mapped peripherals.

At address 0xC000 0000 there is a 128-MB region where the ROM-based DLLs loaded by the kernel are mapped. The 128-MB region just above this at 0xC800 0000 is used by the file system to map the RAM-based object store.

The region starting at 0xD000 0000 is the kernel’s virtual machine space. This region is where the kernel mode side of the operating system executes. The kernel, all operating system extensions such as FileSys and GWE, as well as all kernel mode device drivers, are loaded in this region. The size of this region depends on the CPU. For SH4 CPUs, the region is 256 MB but for all other CPUs the size is 512 MB. Finally, the region at 0xF000 0000 is used by the kernel for CPU-specific purposes.
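The kernel map can be restated the same way. The following is a minimal sketch in C, with hypothetical enum and function names. The base addresses of the cached and uncached windows (0x8000 0000 and 0xA000 0000) are not given in the text above and are assumed here. On SH4, the range between the end of the 256-MB kernel region and 0xF000 0000 is not described in this article, so the sketch reports it as undescribed rather than guessing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    K_NOT_KERNEL,       /* below 0x80000000: user space */
    K_CACHED_WINDOW,    /* assumed base 0x80000000 */
    K_UNCACHED_WINDOW,  /* assumed base 0xA0000000 */
    K_ROM_DLLS,         /* 0xC0000000: ROM-based kernel DLLs (128 MB) */
    K_OBJECT_STORE,     /* 0xC8000000: RAM-based object store (128 MB) */
    K_KERNEL_VM,        /* 0xD0000000: kernel, FileSys, GWES, drivers */
    K_UNDESCRIBED,      /* SH4 only: gap not covered by the article */
    K_CPU_SPECIFIC      /* 0xF0000000 and above */
} kernel_region;

/* Map a 32-bit virtual address to a kernel region. The kernel region
   is 256 MB on SH4 and 512 MB on all other CPUs. */
kernel_region classify_kernel_address(uint32_t addr, bool is_sh4)
{
    const uint32_t vm_size = is_sh4 ? 0x10000000u : 0x20000000u;
    const uint32_t vm_end = 0xD0000000u + vm_size;
    assert(vm_end <= 0xF0000000u);

    if (addr < 0x80000000u) return K_NOT_KERNEL;
    if (addr < 0xA0000000u) return K_CACHED_WINDOW;
    if (addr < 0xC0000000u) return K_UNCACHED_WINDOW;
    if (addr < 0xC8000000u) return K_ROM_DLLS;
    if (addr < 0xD0000000u) return K_OBJECT_STORE;
    if (addr < vm_end)      return K_KERNEL_VM;
    if (addr < 0xF0000000u) return K_UNDESCRIBED;
    return K_CPU_SPECIFIC;
}
```

The is_sh4 flag is the only CPU dependency modeled: the same address, such as 0xE800 0000, falls inside the kernel region on most CPUs but outside it on SH4.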

This new memory map is a big clue that this is not your father’s Windows Embedded CE. For an even clearer example of the vast changes to the operating system, let’s turn to how Windows Embedded CE 6.0 is architected.

The Operating System Architecture

As with the memory discussion, to fully appreciate the changes in the Windows Embedded CE 6.0 kernel, we need to hearken back to Windows Embedded CE 5.0 to see how it was put together. From its inception, Windows Embedded CE had been designed around a series of user mode processes called Protected Server Libraries (PSLs). While the kernel, Nk.exe, operated in kernel mode, the other parts of the operating system such as the file system, the device manager, and the graphics subsystem were each separate, user mode executables named FileSys.EXE, Device.EXE, and GWES.EXE, respectively.

These separate processes made the operating system robust, because the major subsystems were protected from one another, but at the expense of performance. A single function call to the operating system caused at least one and possibly two process switches. In addition, these processes were subject to the same 32-MB process limit that Windows Embedded CE imposed on all processes.

The new Windows Embedded CE 6.0 kernel does away with separate processes and brings all the subsystems into the kernel’s virtual machine. This change improves the performance of the operating system because communication between the subsystems is now a simple, intra-process call. Figure 6 shows a diagram of the new operating system architecture.

Figure 6

Notice that the earlier subsystems (FileSys, Device, and GWES) are now DLLs. In addition, the kernel code that used to be in Nk.exe is now in Kernel.dll. The new Nk.exe contains only the OEM abstraction layer code and a very thin compatibility layer. This separation will improve maintainability because the kernel can now be updated independently from the OEM code.

Now that the device manager is in the kernel VM, most of the device drivers migrate there too. As with previous versions of Windows Embedded CE, the device manager will load device drivers both on boot and on demand, but now, instead of running in user mode, most device drivers will operate in kernel mode.

While running the drivers in the kernel VM might imply that a big porting job is looming for OEMs moving to version 6.0, the porting should be quite simple. The key to the simple porting task is a new DLL named k.Coredll.dll. This DLL mimics Coredll.dll, which still resides in user mode, to provide the same API to kernel mode code as is presented to user mode applications. When a kernel mode DLL calls an API such as VirtualAlloc, k.Coredll.dll simply reflects the call to Kernel.dll for processing. Because this call is all within the same VM, the time to call the VirtualAlloc code is significantly less than if a driver had called it in Windows Embedded CE 5.0 or before.

Coredll.dll is not the only DLL to get this “k.” treatment. Any DLL that needs to be loaded in both kernel and user mode will actually be loaded in both. Because Windows Embedded CE 6.0 retains the need to keep a single instance of a DLL at a fixed address, the kernel copy of the DLL will have its name mangled by prefixing a k. to the DLL name.
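The naming rule itself is easy to illustrate. The following is a minimal sketch in C of the prefixing step only; the function kernel_module_name is hypothetical and is not part of any Windows Embedded CE API, and the sketch says nothing about how the loader actually tracks the two copies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build the kernel-side name of a module by prefixing "k." to it.
   Returns 0 on success, or -1 if the output buffer is too small. */
int kernel_module_name(const char *module_name, char *out, size_t out_size)
{
    assert(module_name != NULL && out != NULL);

    /* snprintf reports the length it needed, so truncation is detectable. */
    int n = snprintf(out, out_size, "k.%s", module_name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Given coredll.dll as input, the function writes k.coredll.dll to the output buffer.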

There are some drivers that, on some systems, should not be in the kernel VM, for example, a third-party driver that was installed on a device after it shipped. For these types of drivers, Windows Embedded CE 6.0 provides a user-mode device driver manager that will load the driver in user mode. User-mode drivers will be somewhat slower when communicating with calling applications, but their isolation will improve security.

Services are supported in Windows Embedded CE 6.0 almost identically to the way they have been supported in previous versions of the operating system. As before, services under Windows Embedded CE 6.0 will operate in user mode and will be loaded by the services manager. Their design does not change and, except for a minor registry change, services written for Windows Embedded CE 5.0 will run unmodified in Windows Embedded CE 6.0.

Programming Impacts of the New Design

While the new architecture looks interesting, most programmers reading this article may be wondering, “What does this mean for my applications?” Fortunately, the operating system changes noticeable by applications are minor and mostly for the good.

First, let’s look at the old problems that are now gone. Windows Embedded CE systems have been plagued for years with the problem of too many DLLs taking too much of a process’s virtual memory space. The small application VM space, along with some rules that Windows Embedded CE used for loading DLLs, caused fits for some systems. With the new, 2 GB address space, the “DLL crunch” problem is a thing of the past.

Another effect of the larger memory space is that the problem of running out of virtual memory space when reserving large numbers of virtual memory blocks is now gone. In fact, system programmers who used to be obsessed with the small VM space are now going to see applications that run out of physical RAM before they run out of VM space. It’s not that Windows Embedded CE 6.0 uses that much more RAM than earlier versions; rather, with the VM barrier removed, lazy programmers will simply allocate memory to fill up the available space.

Security

A standard feature of Windows Embedded CE since 2.12 has been the “trusted module” method of security. In this scheme, modules (EXEs and DLLs) are checked during the load process. The OEM code can then decide to have the system load the module in either “trusted” or “untrusted” mode. If the module runs in “trusted” mode, it can call any API in the system. Code in modules that are “untrusted” cannot call a small set of system-critical APIs and cannot set any thread to a priority higher than eighth from the lowest priority. In addition, the OEM can even tell the system not to load a module.

Windows Embedded CE 5.0 and earlier versions also have a special mode where the system runs all code in kernel mode, instead of running the kernel in kernel mode and the remainder of the system in user mode. The advantage of running in “all kernel mode” is performance, with the tradeoff being security, because the kernel code and memory space are accessible to all applications.

Windows Embedded CE 6.0 does away with both “all kernel mode” and the trusted model of security. All kernel mode is not necessary because most of the performance gains of “all kernel mode” are delivered by the new kernel architecture. The trust model goes away in anticipation of porting the desktop’s Access Control List (ACL) security to a future version of Windows Embedded CE. For Windows Embedded CE 6.0, there is no trusted model, nor is there ACL security.

Interprocess Communication

Windows Embedded CE 6.0 provides the same interprocess communication tools as in previous versions of Windows Embedded CE. These tools include RAM-backed memory-mapped files, point-to-point message queues, and the old classics such as the WM_COPYDATA message.

What is not available in Windows Embedded CE 6.0 are the “slot based” communication tricks. There are some applications that use the MapCallerToProcess and SetProcPermissions APIs to be able to read and write memory across process boundaries. These two APIs, along with a few other functions that depend on the slot model, are no longer relevant. While they are exported from Coredll.dll for compatibility, they have no effect in Windows Embedded CE 6.0. A workaround for applications that use SetProcPermissions is to use ReadProcessMemory and WriteProcessMemory, which are supported in CE 6.0.

Another change in Windows Embedded CE 6.0 is that applications can no longer copy handles from one process to another. CE 6.0 uses separate handle tables for each process, so handle values are independent for each process. To work around this issue, applications should use the DuplicateHandle API to clone a handle for use in another process.
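Why a raw handle value stops being meaningful across processes can be shown with a toy model. The following is a minimal sketch in portable C, not kernel code: handle_table, handle_open, handle_lookup, and handle_duplicate are hypothetical names, and real handle values are not simple array indexes. The point is only that a value is resolved through the table of the process that uses it, so duplication must create a new entry in the target process's table.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLES 8

/* Hypothetical model: each process owns its own table that maps
   handle values to kernel objects. NULL marks a free entry. */
typedef struct {
    const void *objects[MAX_HANDLES];
} handle_table;

/* Store obj in the first free entry; returns the handle value or -1. */
int handle_open(handle_table *t, const void *obj)
{
    assert(t != NULL && obj != NULL);
    for (int h = 0; h < MAX_HANDLES; h++) {
        if (t->objects[h] == NULL) {
            t->objects[h] = obj;
            return h;
        }
    }
    return -1;
}

/* Resolve a handle value against one process's table; NULL if invalid. */
const void *handle_lookup(const handle_table *t, int h)
{
    return (h >= 0 && h < MAX_HANDLES) ? t->objects[h] : NULL;
}

/* Model of duplication: resolve the handle in the source table, then
   create a new entry for the same object in the target table. */
int handle_duplicate(const handle_table *src, int h, handle_table *dst)
{
    const void *obj = handle_lookup(src, h);
    return obj != NULL ? handle_open(dst, obj) : -1;
}
```

In this model, if process A and process B each open a different object first, both receive the same handle value, yet that value names different objects in each table. Only the value returned by the duplication step is valid in the target process.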

In general, well-written Windows Embedded CE applications (well-written meaning they do not use slot-based tricks) will run unmodified in Windows Embedded CE 6.0. To ensure that your application will run, you can use a compatibility testing tool that will be delivered in the Windows Embedded CE 6.0 Platform Builder.

Conclusion

Windows Embedded CE 6.0 is a huge advance for Windows Embedded CE. The removal of the classic kernel limits is going to reduce the demands for consultants who have made a living helping companies work around these issues. This new memory model brings Windows Embedded CE much closer to the desktop model without the size or cost of Windows XP. Expect to see a whole new class of powerful devices driven by this new, better version of Windows Embedded CE.
