Thanks for posting a good question. All those points are valid, and your understanding is correct. One thing to add: an app can be either stateful or stateless. If your application is stateful, scaling up is usually the better fit, while if it is stateless, scaling out gives you more flexibility and higher scale potential.
Just to highlight, the App Service plan (ASP) is the scale unit for App Service apps.
If the plan is configured to run five VM instances, then all apps in the plan run on all five instances.
If the plan is configured for autoscaling, then all apps in the plan are scaled out together based on the autoscale settings.
If you are using autoscaling, it makes sense for your apps to be stateless, because when the app scales in, any user session held on the removed instance is lost.
Scaling out shouldn't matter, since an existing client stays affinitized to an instance that is still present.
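To make the scale-in point concrete, here is a minimal sketch in Python. It is only a toy model, not an Azure API: the `Instance` class, the helper functions, and the `shared_store` dictionary (standing in for an external cache or database) are all hypothetical names chosen for illustration.

```python
# Toy model of scale-in. All names here are illustrative, not an Azure API.

class Instance:
    """One VM instance that keeps session data in its own memory."""
    def __init__(self, name):
        self.name = name
        self.local_sessions = {}  # lost when the instance is removed


def scale_in(instances):
    """Remove the last instance, as an autoscale scale-in might."""
    return instances[:-1]


def read_cart_stateful(instances, pinned_to, session_id):
    """Stateful app: the session lives only on the instance the client was pinned to."""
    for inst in instances:
        if inst.name == pinned_to:
            return inst.local_sessions.get(session_id)
    return None  # the instance is gone, and so is the session


def read_cart_stateless(shared_store, session_id):
    """Stateless app: any instance can read the session from a shared store."""
    return shared_store.get(session_id)


instances = [Instance("vm-0"), Instance("vm-1"), Instance("vm-2")]
shared_store = {}  # assumption: stands in for an external cache or database

# A user pinned to vm-2 adds an item to the cart.
instances[2].local_sessions["user-42"] = ["book"]
shared_store["user-42"] = ["book"]

instances = scale_in(instances)  # vm-2 is removed

stateful_result = read_cart_stateful(instances, "vm-2", "user-42")
stateless_result = read_cart_stateless(shared_store, "user-42")
print(stateful_result)   # None
print(stateless_result)  # ['book']
```

The stateful lookup comes back empty after the scale-in, while the stateless one still finds the cart because the data never lived on the removed instance.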
I'd like to add some background to give you more insight into these concepts, so apologies for the long post.
Yes, when the app runs, it runs on all the VM instances configured in the App Service plan.
If multiple apps are in the same App Service plan, they all share the same VM instances.
If you have multiple deployment slots for an app, all deployment slots also run on the same VM instances.
If you enable diagnostic logs, perform backups, or run WebJobs, they also use CPU cycles and memory on these VM instances.
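As a rough illustration of how everything in a plan shares the same instances, the sketch below lists which workloads land on which instance. The app and slot names and the `placements` helper are made up for the example and do not correspond to any Azure API.

```python
from itertools import product


def placements(instance_count, workloads):
    """Return every (instance, workload) pair: each workload runs on each instance."""
    instance_names = [f"vm-{i}" for i in range(instance_count)]
    return list(product(instance_names, workloads))


# Hypothetical plan: two apps, one of which also has a staging slot.
workloads = ["shop", "shop/staging", "blog"]
pairs = placements(5, workloads)

print(len(pairs))  # 15
per_instance = [w for inst, w in pairs if inst == "vm-0"]
print(per_instance)  # ['shop', 'shop/staging', 'blog']
```

Every instance carries all three workloads, so a busy staging slot or a heavy WebJob competes for the same CPU and memory as the production app.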
Azure Web Apps have the ARR Affinity cookie enabled by default. This cookie pairs a client with a specific server, so that client's subsequent requests go to the same instance.
However, Azure Web Apps is a stateless platform, and in an environment where the website is scaled across multiple instances, the ARR Affinity cookie remains bound to one specific server.
If that server is no longer in rotation, requests carrying that ARR Affinity cookie can no longer be served by it, and any state that depended on that instance is lost.
It's advisable to avoid using ARR Affinity cookies in a scaled environment where multiple instances serve the application's requests.
Disabling ARR Affinity cookies is a sustainable fix for these issues in scaled environments, because the cookie depends on the worker machine it is paired with.
Also, with ARR Affinity cookies enabled, we are not truly using the platform's scaling ability, where a request can be handled by any instance that has the resources to handle it.
Instead, based on the ARR Affinity cookie mapping, a client's requests always go only to the server tied to its cookie.
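The effect of affinity on load distribution can be sketched with a small simulation. This is only a toy model: the round-robin choice of instance is an assumption made for illustration (the platform's real load-balancing algorithm may differ), and the function names and client names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def route_with_affinity(client_ids, instances):
    """Pin each client to the instance picked on its first request."""
    pinned = {}
    hits = Counter()
    next_instance = cycle(instances)  # assumption: round-robin for first requests
    for client in client_ids:
        if client not in pinned:
            pinned[client] = next(next_instance)
        hits[pinned[client]] += 1
    return hits


def route_without_affinity(client_ids, instances):
    """Send each request to the next instance, regardless of the client."""
    hits = Counter()
    next_instance = cycle(instances)  # assumption: plain round-robin
    for _ in client_ids:
        hits[next(next_instance)] += 1
    return hits


# One busy client sends 8 requests, then a second client sends 1.
requests = ["alice"] * 8 + ["bob"]
instances = ["vm-0", "vm-1", "vm-2"]

with_affinity = route_with_affinity(requests, instances)
without_affinity = route_without_affinity(requests, instances)

print(dict(with_affinity))     # {'vm-0': 8, 'vm-1': 1}
print(dict(without_affinity))  # {'vm-0': 3, 'vm-1': 3, 'vm-2': 3}
```

With affinity, the busy client keeps hitting vm-0 and vm-2 receives nothing, even though it has capacity. Without affinity, the same nine requests spread evenly across all three instances.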
For more information about “stateful” vs “stateless” applications, you can watch this video and read the reference architecture:
Planning a Scalable End-to-End Multi-Tier Application on Azure App Service.
Improve scalability in an Azure web application - reference architecture