What’s New for Performance in WPF in .NET 3.5 SP1
As you know, the .NET Framework 3.5 Service Pack 1 Beta download is now available. There are many improvements in this release that we are very excited about. Scott Guthrie blogged about these improvements here and so did Tim Sneath. In this post I want to focus specifically on the performance improvements coming in WPF and provide more detail on them.
Significantly improved BitmapEffects performance
- Up until .NET 3.5 SP1, all BitmapEffects were rendered in software. Now the Blur and DropShadow BitmapEffects are hardware accelerated and rendered by the GPU.
In some scenarios we measured ~5x gains in CPU usage, frame rate and video memory usage by using the GPU.
- It is important to note that the other built-in effects (namely OuterGlow, Bevel and Emboss) continue to render in software and should be avoided.
- “Most” apps see gains simply by upgrading to .NET 3.5 SP1, without a recompile.
- No gains will be seen after the upgrade if the app uses effects that:
- Are used in an EffectGroup
- Have too large a blur radius
- Aren’t running on a Tier-2 video card (DX Ver. >= 9.0)
- New Effects (called BlurEffect and DropShadowEffect) are introduced, and these will always be hardware accelerated.
- It is recommended that you do not use the old effects (BevelBitmapEffect, BlurBitmapEffect, DropShadowBitmapEffect, EmbossBitmapEffect, OuterGlowBitmapEffect), as these are being marked obsolete. Use the new effects instead.
- Note that the legacy Effects extensibility mechanism has not been accelerated.
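As a rough illustration of the recommendation above, the new effects are assigned through an element's Effect property instead of its BitmapEffect property. This is only a minimal sketch; the element contents and the property values are made up for the example.

```xml
<StackPanel xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation">
  <!-- Drop shadow using the new DropShadowEffect (values are illustrative) -->
  <Button Content="Shadowed" Margin="20">
    <Button.Effect>
      <DropShadowEffect BlurRadius="8" ShadowDepth="4" Opacity="0.6" />
    </Button.Effect>
  </Button>

  <!-- Blur using the new BlurEffect (value is illustrative) -->
  <TextBlock Text="Blurred" Margin="20">
    <TextBlock.Effect>
      <BlurEffect Radius="3" />
    </TextBlock.Effect>
  </TextBlock>
</StackPanel>
```

An element has a single Effect property, so each element in the sketch carries exactly one effect.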
New Effect Extensibility
- We added support for hardware-accelerated HLSL shaders with the new ShaderEffect class.
- These extensible effects can expose dependency properties that can be read, written, animated and data-bound just like any other DPs.
- Currently we support full trust only.
- We support "multi-input" effects, where multiple UIElements can be provided to and manipulated by an Effect (included only in the final RTM bits).
- See more details in Greg Schechter’s blog.
Improved DirectX Integration
- The new D3DImage class enables developers to overlay or blend Direct3D content interchangeably with WPF content (e.g. use the Direct3D surface as a brush for WPF content, or apply it as a texture within a WPF 3D scene) without any major performance impact.
- Note: this feature will only be available in the RTM bits
Improvements to WriteableBitmap
- WriteableBitmap is a mechanism for drawing & updating a system-memory bitmap to the screen on a per-frame basis.
- This feature was available before, but the old implementation allocated a new bitmap on every frame update, which made it too slow for some scenarios.
- The new implementation is much more efficient: it is synchronized with UI changes and has constant memory usage. This enables new scenarios such as paint programs, fractal renderers, scatter plots, music visualization and other complex geometries that were not possible before.
- Note: it is recommended that you use the Bgr32 or Pbgra32 pixel formats when possible. Other formats must run through a format converter each frame, which reduces perf.
Improved Text Performance
- We improved our glyph management infrastructure, which provides noticeably faster text rendering. Note that these gains will be noticeable mostly in specific scenarios such as VisualBrushes, DrawingBrushes and Viewport2DVisual3D, and not everywhere.
Improved Z-index Scenarios
- We significantly improved the performance of scenarios that continuously modify the Z-Index property of Panel elements (such as a 2D carousel).
Although we have not improved remoting performance, it is important to highlight some differences in how WPF content is remoted.
On .NET Framework 3.0 and .NET Framework 3.5:
- Vista to Vista with DWM on: we remoted content as primitives (i.e. the channel protocol went over the network). This applies to the Remote Desktop case only, not Terminal Server.
- In other cases: we remoted content as bitmaps.
On .NET Framework 3.5 SP1:
- We now remote as bitmaps in ALL cases.
- The reason is that WPF 3.5 SP1 now uses a new graphics DLL (wpfgfx.dll) and certain changes could not be made to Vista’s existing graphics DLL (milcore.dll) that is also used by DWM.
- Although this could be seen as a regression at first, it can actually improve performance in certain scenarios, depending on the complexity of the application scene (e.g. very rich scenes). Also, connections with reasonably high bandwidth and scenarios that don’t involve a lot of animation or 3D, for instance, tend to remote just fine via bitmaps.
New “Nearest Neighbor” BitmapScaling Mode
- We added a NearestNeighbor interpolation mode to the BitmapScalingMode enumeration. This should provide further performance benefits over LowQuality mode when using the software rasterizer. More importantly, it provides missing functionality for scenarios such as zooming in while editing a bitmap in a “Paint” program. Without this feature WPF would produce blurry results.
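A minimal sketch of how the new mode might be applied to a zoomed image follows. The file name sprite.png and the sizes are placeholders invented for the example.

```xml
<!-- Scale a small bitmap up without smoothing, so individual pixels stay crisp -->
<Image xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
       Source="sprite.png"
       Width="256" Height="256"
       RenderOptions.BitmapScalingMode="NearestNeighbor" />
```

RenderOptions.BitmapScalingMode is an attached property, so it is set on the element being scaled.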
Minor improvement to Tier APIs
- Two new members were added to the RenderCapability class to allow applications to fine-tune their use of shader-based Effects:
- IsPixelShaderVersionSupported(major, minor): indicates whether the system supports the specified Pixel Shader version on the GPU. This is useful for understanding whether the effects that you want to run will run in hardware, as you may want to exclude them if they don't.
- IsShaderEffectSoftwareRenderingSupported: Boolean indicating whether the system supports SSE2, which determines whether the HLSL “JIT” will work.
- Almost all of our animation smoothing improvements were already included in .NET 3.5; however, some additional (smaller) improvements are included in Vista Service Pack 1.
Layered Windows improvements
I have blogged about the availability of QFEs for Vista and XP before.
These fixes are now included in the recently released Windows XP Service Pack 3 and in Vista Service Pack 1.
Note that there is a different fix on each platform.
On XP: Huge improvements, as we now render in hardware rather than in software as before.
On Vista: Smaller but still significant improvement to readback from the GPU.
Improved scrolling performance (by introducing Container Recycling support)
- As the user scrolls through items in a control, we re-use the UI elements that go out of view (as opposed to destroying and recreating these elements, as in .NET 3.0 and 3.5). This turns out to provide significant scrolling improvements.
We see up to 40% scroll performance improvement in the basic case (e.g. elements contain text). We call this feature Container Recycling.
- Container Recycling is supported in WPF controls that use VirtualizingStackPanel, namely: ListView, ListBox, TreeView.
- This is an opt-in feature; your app needs to set the Recycling mode to take advantage of this improvement:
<ListBox …VirtualizingStackPanel.VirtualizationMode="Recycling" />
Added virtualization support to TreeView control
- We added virtualization support to the TreeView control, similar to ListView and ListBox.
- This ensures that the TreeView does not consume a huge amount of memory when it contains a large number of elements (e.g. 1,000’s), and it also expands much faster. This enables scenarios that were not possible before (e.g. Windows Explorer).
- This is an opt-in feature; your app needs to set IsVirtualizing in order to take advantage of this improvement:
<TreeView …VirtualizingStackPanel.IsVirtualizing="true" />
- Note that the TreeView also supports the Container Recycling feature mentioned above. To take advantage of both improvements, set:
<TreeView …VirtualizingStackPanel.IsVirtualizing="true"... VirtualizingStackPanel.VirtualizationMode="Recycling" />
- Deferred scrolling provides a perceived perf improvement for scrolling. It keeps the content in view static until scrolling is complete, similar to how Microsoft Outlook scrolls.
- You could use the ScrollViewer.ScrollChanged event and write your own code to display a preview thumbnail during the scroll if you like.
- This is an opt-in feature, so your app needs to set the IsDeferredScrollingEnabled property in order to take advantage of it. E.g.
<ListBox …ScrollViewer.IsDeferredScrollingEnabled="true" />
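The opt-in scrolling settings can be combined on one control. The sketch below assumes a data context that exposes a collection named Items, which is a made-up name for the example.

```xml
<!-- A ListBox that opts in to virtualization, container recycling and deferred scrolling -->
<ListBox xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
         ItemsSource="{Binding Items}"
         VirtualizingStackPanel.IsVirtualizing="True"
         VirtualizingStackPanel.VirtualizationMode="Recycling"
         ScrollViewer.IsDeferredScrollingEnabled="True" />
```

Each setting is independent, so an app can enable recycling without deferred scrolling or the other way around.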
Better fundamentals for DataGrid control
Various enhancements were made to enable developers to write a faster and better DataGrid:
- Container Recycling
- Discussed earlier…
- Column Virtualization Extensions
- New APIs were added to VirtualizingStackPanel that expose hooks for writing your own custom panel with virtualization in the horizontal direction. These could be used by a DataGrid or by other custom panels.
- Other non-perf DataGrid related features:
- MultiSelector support to handle multi-selection and bulk editing scenarios
- IEditableCollectionView - New interface between data controls and data source that enables editing/adding/removing items in a transactional way.
- StringFormat - A shortcut to display a data-bound number as formatted text
- Alternating Rows - Enables setting alternating properties on rows of an ItemsControl (e.g. alternating background colors in a DataGrid)
- Conversions for Null Values - Recognizes null values in editable controls and converts them to an appropriate value based on type. It also adds the TargetNullValue property, which enables the app author to designate any value as equivalent to “null”
- Item-Level Validation - By using Binding Groups, this applies validation rules to an entire bound item. For example, it can enable a validate & commit scenario for a form with a few bindable edit fields. (available in final RTM bits only)
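Three of the data features listed above (StringFormat, alternating rows and TargetNullValue) can be sketched together in one ListBox. The collection name Orders, the item properties Price and Notes, and the color are all hypothetical names chosen for the example.

```xml
<ListBox xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
         ItemsSource="{Binding Orders}"
         AlternationCount="2">
  <!-- Alternating rows: every second container gets a different background -->
  <ListBox.ItemContainerStyle>
    <Style TargetType="ListBoxItem">
      <Style.Triggers>
        <Trigger Property="ItemsControl.AlternationIndex" Value="1">
          <Setter Property="Background" Value="WhiteSmoke" />
        </Trigger>
      </Style.Triggers>
    </Style>
  </ListBox.ItemContainerStyle>
  <ListBox.ItemTemplate>
    <DataTemplate>
      <StackPanel Orientation="Horizontal">
        <!-- StringFormat: show the bound number as currency text -->
        <TextBlock Text="{Binding Price, StringFormat=C}" Margin="0,0,8,0" />
        <!-- TargetNullValue: a null source value is displayed as "none" -->
        <TextBox Text="{Binding Notes, TargetNullValue=none}" Width="120" />
      </StackPanel>
    </DataTemplate>
  </ListBox.ItemTemplate>
</ListBox>
```

With a two-way binding, TargetNullValue also works in the other direction: typing the designated value into the TextBox writes null back to the source.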
Various enhancements were made to improve WPF application startup time
a) Improved Coldstart time
- By optimizing how we lay out the code blocks within the CLR and WPF NGEN images, we improved the disk I/O access patterns. We are measuring up to 40% faster coldstart time on average. (We define 'coldstart' as the launch of the first WPF app after a system reboot.)
- These gains depend on the scenario and the application size. Larger apps are likely to see better gains than small apps.
b) Improved coldstart for strong-name signed assemblies that are not GAC’d
In certain deployment scenarios, an app author may choose to strong-name sign their application assemblies but not to GAC them, because that requires admin privileges.
In this scenario, the entire assembly had to be read off the disk so the CLR could do the hash verification. This caused a significant coldstart hit.
In .NET 3.5 SP1 the CLR bypasses this hash verification by default, which means that the app performs fewer disk seeks and reads and therefore startup time is improved. (As indicated in this blog, the majority of coldstart time is spent on disk access.)
The expected startup gains depend on the size of the assemblies and on what else the app is doing at startup, but in one scenario we measured ~5% coldstart gains in addition to the gains measured in (a).
To disable this default, an app author can provide a config file (e.g. YourApp.exe.config) and add this to its runtime section:
<bypassTrustedAppStrongNames enabled="false" />
You can find more details in this blog.
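For context, a complete config file that turns the bypass off might look like the sketch below. It uses the documented true/false form of the enabled attribute, and YourApp.exe.config stands in for whatever your executable is actually called.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- YourApp.exe.config: placed next to the application executable -->
<configuration>
  <runtime>
    <!-- Turn the strong-name bypass off so signed assemblies are hash-verified again -->
    <bypassTrustedAppStrongNames enabled="false" />
  </runtime>
</configuration>
```

Turning the bypass off restores the earlier verification behavior and gives up the coldstart gain described above.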
c) XBAP coldstart improvements
- By improving the concurrency of the ClickOnce download sequence, we are now seeing up to 10% startup gains for XBAP scenarios in addition to the gains measured in (a).
- Up until 3.5 SP1, the XBAP progress UX was rendered using WPF, which means we had to coldstart the CLR and WPF. Starting with .NET 3.5 SP1, we render the progress UX using HTML, which is shown almost instantly and improves the perceived startup performance.
d) Splash Screen to improve perceived startup
- Although coldstart time is improved, it is still not instant. In .NET 3.5 SP1 we are adding new public Splash Screen APIs that allow developers to easily add a splash screen and make their application startup experience feel more responsive.
- Some simple customization is possible with these APIs.
- An intuitive integration with Visual Studio 2008 SP1 to make this experience even easier is coming in RTM.
- Note: these APIs will only be available in the RTM bits
Important note: Although we made some impressive improvements to WPF application startup, don’t expect it to be as fast as a Win32 or WinForms application. WPF simply loads more code off the disk. We still expect application developers to properly optimize their apps for faster startup and follow the recommendations provided in this blog.
- .NET Framework 3.5 Client Profile (Client Profile): A smaller and faster redistribution of .NET Framework 3.5 SP1 for client applications, with a download size of close to 28MB.
- Brand-able deployment UX
- New, smaller deployment and bootstrapper
- Framework install integrated with the application. The installer can launch the app after the framework is installed. Supports .msi, .application or .xbap
- Read more in this blog as well as this.
- Modest improvements to working set, especially during startup
- We fixed a few memory leaks, mostly related to usage of the Image control.
More details and additional info on how to find and avoid memory leaks in WPF are in my blog.
Improvements to NGEN / JIT
- The NGEN throughput is improved
- Modest improvements to JIT time & execution time of the JIT-ed code