question

0 Votes
SAM-7891 asked SAM-7891 commented

How to increase Azure RTOS GUIX performance

Hello,

I designed my demo with Azure RTOS GUIX and am running it on an STM32H750-Disco board.
The problem is that I'm getting only around 15-18 fps from GUIX, which is not ideal. How can I improve this?

Some extra information that might help:
- I have 4 circular gauges in my demo.
- Code executes from external flash (QSPI), and the pixelmaps are in that section as well.
- So far I have tried moving the gauges' pixelmaps to SDRAM/internal flash, but it didn't affect performance much.

I would appreciate any help, and I'll answer any questions ASAP.
Thanks in advance,
Best regards.

azure-rtos

hi @SAM-7891,

15 FPS seems like a reasonable rate for updating a gauge. What target FPS are you wanting to see?

Ryan

1 Vote 1 ·

Hello,

Yes, but it's not only about frame rate: the animation itself also runs slowly, several times slower than it should.

I think I need at least 30 fps for smooth movement of the gauge.

Regards.

0 Votes 0 ·
2 Votes
KenMaxwell-4349 answered

Hello @SAM-7891,

Regarding the flickering, that sounds a lot like a memory bandwidth issue: the DMA2D doesn't seem to be getting enough bandwidth to perform the background rendering operation. I'm wondering if something is fundamentally wrong, for example the main CPU clock is not configured correctly, or the memory cache is not enabled.

My colleague put some time into further optimizing this for you. I have copied her reply below:




I modified gx_circular_gauge_background_draw to make it work for the case of multiple circular gauges, see attachment.
Changes:
Call _gx_icon_background_draw after calling gx_display_driver_callback_assign to set the wait function, so that needle rotation can be processed while the hardware is drawing the circular gauge background.

I also created a demo with 4 circular gauges for STM32H753I-EVAL board, see attachment.
1. Use LCD refresh interrupt as timer source to increase frame rate of the animation.
2. Replace gx_circular_gauge_background_draw with the modified version.

Performance:
1. gx_display_driver_callback_assign is NOT set, frame rate is around 49.4 frames/second
2. gx_display_driver_callback_assign is set, frame rate is around 55.6 frames/second.




She found that the order of setting the display driver callback and drawing the background image was wrong (a bug), and fixed it in the attached gx_circular_gauge_background_draw.c. However, as you can see, running the needle rotation code in parallel with background drawing does not make a dramatic difference: 49 fps versus 55 fps. Either case is much better than the performance you are reporting, which again makes me wonder if something very fundamental is incorrect in your CPU/memory configuration. I copied links below for the modified source file and the example project which she created for the STM32H753I eval board. This change will be in our next GUIX update release.

Modified GUIX lib source file: https://expresslogic.sharefile.com/d-s3b26ec2edbc64b4787859810230b3a85
Circular gauge test project: https://expresslogic.sharefile.com/d-sa9bdeffed18a47faa8a0c09aac531191



Best Regards,

Ken


2 Votes
KenMaxwell-4349 answered SAM-7891 commented

A couple of basic things to look at:

1) Is the ChromeArt graphics accelerator enabled? This is enabled/disabled via a #define in the display driver. Make sure it is enabled.
2) Is the timer running fast enough? The default setting is usually 20 ms, which would give you 50 FPS if the animation interval is set to 1 tick. You can re-define the timer source if you want to, for example you can drive the GUIX timer based on vertical sync interrupt from the LCD controller, to give you a faster and higher resolution timer upon which the animations are based.
3) Turn off RLE encoding (i.e. compressed) option for the pixelmaps that need to be fast. RLE encoding saves some space, but it also adds some time and prevents using the ChromeArt engine for pixelmap rendering. Turn off this option for the gauge pixelmaps and see what effect this has.
4) Are you manually invoking the canvas refresh, or just setting the gauge parameters and allowing the gauge to invalidate and refresh naturally? If you force things, which is allowed, you can accidentally cause extra buffer toggle operations, which slow things down. It is best just to use the gauge API and let it refresh itself as needed.

Let me know if you need any more details, and if these suggestions help.

Best Regards,

Ken


@KenMaxwell-4349
Hello and Thanks for helping here;

1) Yes, DMA2D is enabled of course.

2) Yes, exactly. I actually tried to change the timer source before, but I got confused by several defines in gx_api.h and gx_port.h. I wanted to define GX_SYSTEM_TIMER_MS as 16 ms (equal to the LCD refresh period), but the problem is the define GX_SYSTEM_TIMER_TICKS. It must be an integer, and it evaluates to:
((GX_SYSTEM_TIMER_MS * TX_TIMER_TICKS_PER_SECOND) / 1000) = 16*100/1000 = 1.6, which is not an integer and so is not compatible with the library. I would have to change TX_TIMER_TICKS_PER_SECOND in the ThreadX library, and I'm not sure whether that is correct.

I think I need some details on how to solve this, and on which files and defines should be edited.

3) Compress Output is already disabled for all of my gauges, but not for all of my images. Would it help to disable it for all of my images, even the static ones?

4) No, I just change the angle property of the gauges each time, and they refresh naturally.
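The tick arithmetic in point 2 can be checked with a small host-side sketch. It assumes the macro is evaluated with plain C integer arithmetic, as the expression quoted above suggests; the function names are hypothetical and are not part of GUIX or ThreadX. Under that assumption the result is not 1.6 but a truncated 1, so the GUIX timer would fire every 10 ms rather than every 16 ms.

```c
#include <assert.h>

/* Hypothetical stand-in for the quoted macro:
   (GX_SYSTEM_TIMER_MS * TX_TIMER_TICKS_PER_SECOND) / 1000, in integer math. */
static unsigned timer_ticks_for(unsigned timer_ms, unsigned tx_ticks_per_second)
{
    return (timer_ms * tx_ticks_per_second) / 1000u;
}

/* Period actually obtained, in ms, once the request has been truncated
   to a whole number of RTOS ticks. */
static unsigned effective_period_ms(unsigned timer_ms, unsigned tx_ticks_per_second)
{
    unsigned ticks = timer_ticks_for(timer_ms, tx_ticks_per_second);
    assert(ticks > 0u); /* zero ticks would mean the timer can never be scheduled */
    return (ticks * 1000u) / tx_ticks_per_second;
}
```

With 100 RTOS ticks per second, a 16 ms request becomes 1 tick (10 ms), while the default 20 ms maps exactly to 2 ticks. With 1000 ticks per second, 16 ms maps exactly to 16 ticks, which is presumably why raising TX_TIMER_TICKS_PER_SECOND looks tempting; whether that is safe depends on how the ThreadX tick source is configured in the port.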

Another thing that probably affects performance is the animations I defined in GUIX Studio. In addition to the 4 gauges, I created more than 20 animations for some small images in my demo, and they run in a loop (when one finishes its animation, it raises an event and the next one starts). Each animation acts on the opacity value of an image; in other words, the images show and hide one after another.

0 Votes 0 ·
1 Vote
KenMaxwell-4349 answered SAM-7891 commented

Can you tell me which GUIX source code version you are using? We have done some work on making it easy to use an external timer source, but that work is very recent so I need to know your GUIX release to give you the correct advice here.

For images, if you use the "Compress Output" option, the image is saved in your resource file as an RLE-encoded pixelmap, which is not compatible with DMA2D. At runtime, the driver checks the image format, and if the format is not compatible with DMA2D, the image is rendered using our generic software rendering.

In addition, if the pixelmap is encoded with an alpha channel and 16 bpp format, this format is also not compatible with DMA2D (GUIX saves the alpha channel information in an auxiliary data chunk when configured for 16 bpp 565 format with alpha). The ST 565 format display driver looks to me like it supports images saved in ARGB 8:8:8:8 format and sending those through DMA2D, but I just tried it and GUIX Studio won't let me select that image format. This looks like a bug to me and I have entered a task to get that fixed.

If you have enough memory available and you are not already doing so you could try running in 24 bit xrgb format. It takes more memory for the display frame buffer(s), but it can be faster if there is a lot of alpha blending going on.
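The format check described above can be pictured with the following sketch. It is only a model of the decision as stated in this answer: the enum, the function name and the exact set of formats are hypothetical and do not reproduce the GUIX driver source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pixelmap formats, for illustration only. */
typedef enum {
    FMT_RGB565,            /* 16 bpp, no alpha */
    FMT_RGB565_AUX_ALPHA,  /* 16 bpp with alpha kept in an auxiliary data chunk */
    FMT_ARGB8888,          /* 32 bpp, alpha interleaved with color */
    FMT_XRGB8888           /* 24-bit color in a 32-bit word */
} pixel_format;

typedef enum { PATH_DMA2D, PATH_SOFTWARE } render_path;

/* Choose the rendering path for one pixelmap, following the two rules in
   the text: RLE-compressed data and 565-plus-auxiliary-alpha data cannot be
   handed to the accelerator, so they fall back to software rendering. */
static render_path choose_render_path(pixel_format format, bool rle_compressed)
{
    if (rle_compressed) {
        return PATH_SOFTWARE;
    }
    if (format == FMT_RGB565_AUX_ALPHA) {
        return PATH_SOFTWARE;
    }
    return PATH_DMA2D;
}
```

Seen this way, every pixelmap that is redrawn each frame (gauge backgrounds, needles, animated icons) is worth checking for either of the two fallback conditions.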


I'm using GUIX v6.1.0

Yes, that is completely true: there is no RGB565 + 8-bit alpha format (ARGB8565, in other words) for DMA2D.
And I'm not using the ST RGB565 driver either; I'm using the 24xRGB display driver.

0 Votes 0 ·

HI Ken, @KenMaxwell-4349
Hope you are doing good;

I have been a bit busy with other projects until now,
and I just want to ask whether there are any updates on my question. I think you wrote in your last message that you needed to know my GUIX version before suggesting a solution. You can find my GUIX version in my previous comment; if any further information is needed, please let me know.

I'm asking because the FPS of my demo is not good at all (compared to other graphics libraries): it is in the 8-10 range, so I feel something is wrong in the project.

Thanks for your time in advance and I'm looking forward to your answer, much appreciated.

Regards,
Sam.

0 Votes 0 ·

Hello Sam,

Looking into it, will get back to you.

Best Regards,

Ken

1 Vote 1 ·

That is Great!
Thanks for your attention.

0 Votes 0 ·
1 Vote
KenMaxwell-4349 answered

Hello @SAM-7891 ,

The logic to rotate the gauge needle image can consume some CPU time. We've worked hard to optimize this logic, and actually in a head-to-head performance test with ST's own graphics package we came out on top. But there is a key feature here that needs to be enabled. In the function gx_circular_gauge_background_draw(), we try to do two things in parallel:

1) Fire off DMA2D to draw the gauge background image and
2) Calculate the rotated needle image.

In order to do these two things in parallel, the display driver needs to have a callback function assignment function. This means when we trigger the DMA2D operation to draw the gauge background, we get a callback to do some other work while DMA2D is rendering the background image. In our example drivers for ST, we initialize the callback assignment function like this:

display -> gx_display_driver_callback_assign = gx_display_wait_function_set_24xrgb;

which is the key thing to enable this parallel execution. Is this being done in your display driver?
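The ordering this relies on can be traced with a small hardware-free model. Everything in it is hypothetical (the struct, the function names, the single-character trace); it runs sequentially on a host and only records the order of events, whereas on the target the transfer really does proceed in the background.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*wait_work_fn)(void *context);

/* Hypothetical display model: one pending "do this while waiting" callback
   plus a trace of events (S = transfer started, R = needle rotated,
   D = transfer done). */
typedef struct {
    wait_work_fn pending_work;
    void *pending_context;
    char trace[8];
    size_t trace_len;
} fake_display;

static void trace_event(fake_display *d, char event)
{
    assert(d != NULL);
    if (d->trace_len + 1 < sizeof d->trace) {
        d->trace[d->trace_len++] = event;
    }
}

/* Models the callback assignment: remember work to run during the next wait. */
static void callback_assign(fake_display *d, wait_work_fn work, void *context)
{
    d->pending_work = work;
    d->pending_context = context;
}

/* Models the background draw: start the transfer, run any pending work
   while it is in flight, then observe completion. */
static void background_draw(fake_display *d)
{
    trace_event(d, 'S');
    if (d->pending_work != NULL) {
        wait_work_fn work = d->pending_work;
        d->pending_work = NULL;
        work(d->pending_context);
    }
    trace_event(d, 'D');
}

static void rotate_needle(void *context)
{
    trace_event((fake_display *)context, 'R');
}

/* Draw one gauge, with or without the callback assigned before the draw. */
static const char *draw_gauge(fake_display *d, int assign_first)
{
    memset(d->trace, 0, sizeof d->trace);
    d->trace_len = 0;
    d->pending_work = NULL;
    d->pending_context = NULL;
    if (assign_first) {
        callback_assign(d, rotate_needle, d);
        background_draw(d);
    } else {
        background_draw(d);
        rotate_needle(d);
    }
    return d->trace;
}
```

With the callback assigned first the trace is S, R, D: the rotation happens while the background transfer is still outstanding. Without it the trace is S, D, R, so the CPU simply waits for the transfer and rotates afterwards.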

The only other thing we can think of is basic CPU configuration. Is the data cache enabled?

Best Regards,

Ken


0 Votes
SAM-7891 answered SAM-7891 edited

Hello @KenMaxwell-4349
Thanks for your detailed answer,

I do appreciate your work, and given how rich your library is, I'm actually surprised that the FPS is so low.
I have tried many different embedded GUI libraries and platforms so far, and this FPS is unusual.

I understood your explanation.
I'm actually using the display driver from the STM32F746G-DK IAR samples, which I modified slightly to port it to the STM32H750B-DK.
And I just realized that the line of code you mentioned above has been commented out in the display driver, as you can see below:

    #if defined(GX_CHROMEART_ENABLE)
        /* override those functions that can be accelerated with DMA2D */
        display -> gx_display_driver_horizontal_line_draw = gx_chromeart_horizontal_line_draw;
        display -> gx_display_driver_vertical_line_draw   = gx_chromeart_vertical_line_draw;
        display -> gx_display_driver_canvas_copy          = gx_chromeart_canvas_copy;
        display -> gx_display_driver_pixelmap_draw        = gx_chromeart_pixelmap_draw;
        display -> gx_display_driver_pixelmap_blend       = gx_chromeart_pixelmap_blend;
        display -> gx_display_driver_8bit_glyph_draw      = gx_chromeart_glyph_8bit_draw;
        //display -> gx_display_driver_callback_assign    = gx_display_wait_function_set;

        //display -> gx_display_driver_canvas_blend       = _gx_display_driver_24xrgb_canvas_blend;
        //display -> gx_display_driver_4bit_glyph_draw    = _gx_display_driver_generic_glyph_4bit_draw;
        //display -> gx_display_driver_1bit_glyph_draw    = _gx_display_driver_24bpp_glyph_1bit_draw;
    #endif

If I uncomment this line: "//display -> gx_display_driver_callback_assign = gx_display_wait_function_set;", the gauge needles misbehave: they flicker a lot, and most of the time the needle is not rendered at all, so you can't see it on the screen (only the needle is affected, not the background).


Now the question is: to enable that feature (having a callback assignment function and doing the two tasks you mentioned in parallel), are there any further steps?
As I said, it is not working properly, and it seems I need to modify some other places in the code as well.

I'm trying to solve it myself too, but here is the display driver I'm using, if you want to take a look: 131203-display-driver.txt



As for the data cache, I didn't modify that part of the sample project; the functions that initialize the I and D caches are called from "common_hardware_code.c -> hardware_setup()".



Thanks for your time once again,

Kind Regards.



display-driver.txt (19.0 KiB)

0 Votes
SAM-7891 answered SAM-7891 edited

Hi Ken @KenMaxwell-4349

After you mentioned a potential fundamental issue in my project, I was curious to find out whether there was anything like that.
So I compared my clock config, MPU settings, and other configuration with the configuration code in the project you sent me.

The final result was incredible. As you know, in the MPU configuration each region is described by a struct of type MPU_Region_InitTypeDef, which has a few parameters. My MPU settings had three regions, for SRAM, SDRAM and external flash (QSPI). While searching for issues in my configuration, one of the suspicious items was this setting in the external flash region:

   MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE;

I realized it didn't match the MPU config in the STM32H743 project you sent.
After I changed this line to:

 MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE;

The FPS suddenly jumped from around 8-10 to 29-34!
That looks really strange to me!
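One plausible explanation, offered here as an assumption rather than something confirmed in this thread: on Cortex-M7, normal memory marked shareable is by default not cached, so with the QSPI region set to shareable, both code and pixelmaps would be fetched from external flash without help from the cache. The sketch below shows how small the difference is at register level. It builds the ARMv7-M MPU_RASR attribute word on a host; the 128 MiB region size and the full-access permission are illustrative assumptions, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions in the ARMv7-M MPU_RASR register. */
#define RASR_ENABLE     (UINT32_C(1) << 0)
#define RASR_SIZE_SHIFT 1
#define RASR_C          (UINT32_C(1) << 17)  /* cacheable */
#define RASR_S          (UINT32_C(1) << 18)  /* shareable */
#define RASR_AP_SHIFT   24
#define RASR_AP_FULL    UINT32_C(3)          /* read/write at all privilege levels */

/* SIZE field encoding: region size is 2^(SIZE + 1) bytes, minimum 32 bytes. */
static uint32_t rasr_size_field(unsigned log2_bytes)
{
    assert(log2_bytes >= 5 && log2_bytes <= 32);
    return (uint32_t)(log2_bytes - 1u) << RASR_SIZE_SHIFT;
}

/* Attribute word for a cacheable external-flash region.
   ASSUMPTION: a 128 MiB (2^27 byte) window; the real size is board specific. */
static uint32_t qspi_region_rasr(int shareable)
{
    uint32_t rasr = RASR_ENABLE
                  | rasr_size_field(27)
                  | RASR_C
                  | (RASR_AP_FULL << RASR_AP_SHIFT);
    if (shareable) {
        rasr |= RASR_S;
    }
    return rasr;
}
```

The two configurations differ only in the S bit, which is the bit the IsShareable field of MPU_Region_InitTypeDef controls. A one-bit change with this much effect on frame rate is consistent with the cache being bypassed for that region, though the thread itself does not verify the cause.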

On top of that, after I applied the additional optimization steps that your colleague suggested, it got even better, and I now have around 34-42 FPS!

Now I just want to say many thanks for your patience and for looking into my problem. I greatly appreciate your assistance with my project, and thanks to your colleague for the time she spent on my issue.

After all, 34-42 FPS looks pretty reasonable to me compared to the FPS she reported (55.6 fps in the end).
I think these two factors might explain the difference:

  1. I'm using the STM32H750B-Disco board, which has much less internal flash than the STM32H743i-Eval, so I have to put all of the read-only data (including all of the pixelmaps and the code) in external flash (QSPI). It makes sense that I get a lower FPS because of that, am I right?


  2. In her example project there are 4 gauges, each with a needle pixelmap of 2140 pixels.
    In my own demo I also have 4 gauges, but two of them use a 401-pixel image and two use a 9947-pixel image. In total, that is
    20,696 needle pixels against 8,560 in her project, more than twice as many,
    and I think this should affect performance a lot.

I would like to know whether you agree with the two points above.

Again, I do appreciate all of your help. I just have one more quick question, which I'll ask in another message!

Best Regards,
Sam.


0 Votes
SAM-7891 answered SAM-7891 commented

I don't think it's worth opening another topic for this small question, so I'm going to ask it here!

As far as I can tell, the angle resolution of the circular gauges is only 1 degree.

Because the radius of rotation in my circular gauges is quite large (more than 300 pixels), I need finer resolution for smoother rotation animations.

Is there any way to use finer angle values, for example 0.1 degree? @KenMaxwell-4349
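To put a number on why 1 degree is coarse at this radius, the sketch below computes how far the needle tip travels per angle step, using arc length = radius x angle in radians. The function name is hypothetical, and this is plain geometry rather than any GUIX API.

```c
#include <assert.h>

/* Distance in pixels travelled by a point at radius_px from the pivot when
   the angle changes by step_tenths tenths of a degree. */
static double needle_tip_travel_px(double radius_px, int step_tenths)
{
    const double pi = 3.14159265358979323846;
    assert(radius_px >= 0.0);
    return radius_px * (step_tenths / 10.0) * pi / 180.0;
}
```

At a radius of 300 pixels, a 1 degree step moves the tip by roughly 5.2 pixels, which is visible as a jump, while a 0.1 degree step moves it by roughly half a pixel.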

Regards,
Sam.


UPDATE: I've actually done this myself by modifying the library to use a resolution finer than 1 degree.
But I would still prefer to use a standard library feature for this, if there is one.

0 Votes 0 ·