Assessment Quality

The Assessment Quality feature is an improvement over the previous Prerequisites blade.

Screenshot: On-Demand Assessment - Assessment Quality feature

Assessment findings are only as good as the data collected to assert those findings. Unsuccessful discovery and data collection degrade the value of the assessment results and can contribute to perceptions of poor quality. Sometimes these problems even go unnoticed without a capability to surface them and provide actionable guidance to address them. This major enhancement to the existing prerequisites focus area blade on the Azure Log Analytics assessment dashboard was developed with two goals in mind:

  • Surface assessment quality issues so you have an opportunity to remediate them and re-run the assessment to ensure good assessment quality.
  • Minimize the need for you to raise support tickets to address data submission quality issues by offering specific and actionable remediation content.

Specifically, the capabilities brought to market which enhance the existing prerequisite focus area blade include:

  1. Categorizing potential failure states into the initial assessment execution phases which include discovery and prerequisite collection.
  2. An Assessment Quality Index to visually represent the percentage success rate for assessment data collection.
  3. An updated donut graphic to visually represent the categories and assessment quality index.

What does "Assessment Quality Index" represent?

AssessmentQualityIndex = Passed Prerequisite Workflows / Total Prerequisite Workflows

When the assessment runs, we run various Collectors and then run Analyzers on the results of those Collectors. If any Collectors fail (e.g., because WMI remoting failed against a target), we won't have anything to run Analysis on. This results in an incomplete assessment, which reduces the quality we deliver to you.

Prerequisites were initially created to remedy this situation. Prior to running Collectors, we run the Collectors in "prerequisite mode" to test whether specific prerequisites have been met (e.g., we verify WMI remoting is enabled on the target). If any prerequisites fail, we surface those failures in the Azure portal under the "Prerequisites" blade; but the initial implementation of prerequisites did not do a great job of showing the end user an overall picture of the quality of the assessment.
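
As a rough illustration of this two-pass flow, here is a minimal Python sketch; this is not the actual ODA client code, and the collector class, check_prerequisites, and collect names are all hypothetical. It shows collectors running first in prerequisite mode, and then normally only against targets whose prerequisites passed:

    # Hypothetical sketch of the two-pass flow: prerequisite mode first,
    # then normal collection only where the prerequisites passed.

    class WmiRemotingCollector:
        """Stand-in for a real Collector; all names here are made up."""
        name = "WmiRemoting"

        def check_prerequisites(self, target):
            # Prerequisite mode: verify the collector COULD run against the
            # target (e.g. WMI remoting is reachable) without collecting data.
            reachable = target.endswith(".contoso.local")  # stand-in check
            return reachable, "" if reachable else "WMI Remoting not enabled"

        def collect(self, target):
            # Normal mode: actually collect data for the Analyzers.
            return {"target": target, "data": "..."}

    def run_assessment(targets, collectors):
        prereq_results, collected = [], []
        # Pass 1: every collector runs in prerequisite mode against every target.
        for target in targets:
            for collector in collectors:
                passed, reason = collector.check_prerequisites(target)
                prereq_results.append((target, collector.name, passed, reason))
        # Pass 2: normal collection (and later, analysis) only where prerequisites passed.
        for target, name, passed, _ in prereq_results:
            if passed:
                collector = next(c for c in collectors if c.name == name)
                collected.append(collector.collect(target))
        return prereq_results, collected

    results, data = run_assessment(
        ["dc01.contoso.local", "dc02.fabrikam.net"], [WmiRemotingCollector()])

The failed prerequisite workflows from the first pass are what the Prerequisites (now Assessment Quality) blade surfaces in Azure.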

Consider the following scenario. You are running the ADAssessment and 100 targets are identified during Discovery. We run a Collector in prerequisite mode to confirm WMI Remoting is enabled, but it fails against every target because you have not enabled WMI Remoting in your environment. Prior to Assessment Quality, you would see a single prerequisite failure in Azure related to "WMI Remoting not enabled." However, there would actually be 100 failed prerequisite workflows, and the assessment would have hardly anything to analyze, resulting in a very poor assessment. This wasn't necessarily obvious, because you would see only a single prerequisite failure in Azure.

Now, with the Assessment Quality feature, we provide an Assessment Quality Index, which is simply the percentage of Passed Prerequisite Workflows/Total Prerequisite Workflows. So in the example above, you would see a 0% or 1% Assessment Quality Index, making it clear that the overall assessment quality was extremely poor due to prerequisite failures.
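
To make the arithmetic concrete, here is a small Python illustration of the formula above (not production code):

    # Assessment Quality Index = Passed Prerequisite Workflows / Total Prerequisite Workflows
    def assessment_quality_index(passed_workflows, total_workflows):
        if total_workflows == 0:
            return None  # nothing was attempted, so there is nothing to score
        return passed_workflows / total_workflows

    # The scenario above: 100 targets, and the WMI Remoting prerequisite fails on all of them.
    print(assessment_quality_index(0, 100))   # 0.0  -> 0% Assessment Quality Index
    # For comparison, if 98 of the 100 prerequisite workflows had passed:
    print(assessment_quality_index(98, 100))  # 0.98 -> 98% Assessment Quality Index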

*Note: In real life, ADAssessment probably runs a wider variety of prerequisite workflows, not just WMI Remoting, so we'd more likely see a higher Assessment Quality Index.

What's the difference between Discovery Failure, Important Prerequisite Failures, and Other Prerequisite Failures?

The assessment goes through various phases when it runs. First, we execute Discovery to find the various nodes that will be assessed. Next, we run various Collectors in Prerequisite Mode. Finally, we run the Collectors normally and then run Analysis.

Prerequisites are primarily concerned with the first two phases: Discovery and Collectors run in Prerequisite Mode.

The prerequisites output file now specifies the phase in which each prerequisite failure occurred, and we surface that phase in Azure.
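
For illustration only, here is a small Python sketch of how failed prerequisite workflows might be bucketed into these categories based on the phase recorded in the output file (the phase and importance field names are hypothetical):

    from collections import Counter

    def categorize_failures(failed_prereq_records):
        buckets = Counter()
        for record in failed_prereq_records:
            if record["phase"] == "Discovery":
                buckets["Discovery Failures"] += 1
            elif record.get("important"):
                buckets["Important Prerequisite Failures"] += 1
            else:
                buckets["Other Prerequisite Failures"] += 1
        return buckets

    failures = [
        {"phase": "Discovery", "important": False},
        {"phase": "PrerequisiteCollection", "important": True},
        {"phase": "PrerequisiteCollection", "important": False},
    ]
    print(categorize_failures(failures))
    # Counter({'Discovery Failures': 1, 'Important Prerequisite Failures': 1,
    #          'Other Prerequisite Failures': 1})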

Why does the Important Prerequisite Failures key only sometimes show up in the Donut Chart/Legend?

When IP authors create their Assessments, they can optionally mark Workflows as Important. This signifies that the Workflow is critical to the quality of the assessment. If the given Assessment has no Important Workflows defined, we will NOT show the Important Prerequisite Failures in the Donut Chart/Legend in Azure.
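
A minimal sketch of that display rule, assuming a simple boolean flag for whether the assessment defines any Important Workflows (the legend labels below are illustrative, not the exact portal strings):

    def donut_legend(has_important_workflows):
        # Illustrative labels only; the exact strings in the portal may differ.
        legend = ["Discovery Failures", "Other Prerequisite Failures"]
        if has_important_workflows:
            legend.insert(1, "Important Prerequisite Failures")
        return legend

    print(donut_legend(True))   # includes "Important Prerequisite Failures"
    print(donut_legend(False))  # the Important category is omitted entirely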

Why do we sometimes only display "Discovery Failures", but not the other categories in Azure?

If the MVE (Minimum Viable Environment) test fails during Discovery, it signifies that certain fundamental prerequisites were not met (e.g., in SQLAssessment, no SQL servers could be found). In that case, Collectors are not even run in Prerequisite Mode - we bail out early from the assessment run. When this happens, we only display Discovery Failures in Azure.
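
Roughly, in Python-flavored pseudocode (all names here are hypothetical), the early exit looks like this:

    def run_assessment_phases(discover, mve_check, run_prerequisite_mode):
        nodes = discover()
        if not mve_check(nodes):
            # MVE failed during Discovery: bail out before prerequisite mode,
            # so only Discovery Failures can be surfaced in Azure.
            return {"Discovery Failures": True}
        # Otherwise continue: prerequisite mode, normal collection, analysis.
        return run_prerequisite_mode(nodes)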

Why do I see a blank Assessment Quality blade?

Not all data collection machines/scheduled tasks submitting data to this Log Analytics workspace have rerun the assessment with the new Assessment Quality bits. This will resolve itself automatically once 1) all data collection machines and scheduled tasks on those machines have rerun the assessment using the new bits, OR 2) the data retention period (default of 30 days) causes old data to decay, leaving only "good" data that was generated after the Assessment Quality feature was released.

The Assessment Quality feature required us to add a new CustomData column to the assessment output files, and the new UX parses this new CustomData column to generate the statistics displayed in the new Assessment Quality blade.

This made backward compatibility tricky. The new UX only works if you've run the assessment using the new ODA Client changes that will populate the CustomData column. So we have some code in our UX that will identify whether the Log Analytics workspace has any records with CustomData populated, signifying the assessment has been run with the new bits. If not, we fall back to the old prerequisites blade. If CustomData DOES exist, then we display the new Assessment Quality blade.

But it's possible for multiple data collection machines (or even multiple scheduled tasks on the SAME data collection machine) to be submitting data to a single Log Analytics workspace, and this blade is an aggregation of prerequisite results for all data collection machines connected to the workspace. So what happens if some machines have submitted data with the new CustomData column, but some haven't? Do we display the old UX or the new UX?

This is when you'll see the blank blade in the screenshot above. There wasn't a great solution here, so you'll see this broken intermediate state until all data collection machines have submitted data using the new bits.
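
To summarize the three possible outcomes, here is a simplified model, not the actual UX code, of what you end up seeing based on which prerequisite records in the workspace carry the new CustomData column (the record shape is hypothetical):

    def blade_outcome(prereq_records):
        with_custom_data = sum(1 for r in prereq_records if r.get("CustomData"))
        if with_custom_data == 0:
            return "old Prerequisites blade"        # no machine has run the new bits yet
        if with_custom_data == len(prereq_records):
            return "new Assessment Quality blade"   # every submitter has the new bits
        # Mixed old and new data: the new blade is chosen but cannot aggregate
        # correctly, which shows up as the blank blade described above.
        return "blank Assessment Quality blade"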

There are some unfortunate edge cases here that could result in a painful experience for customers:

  1. We know that for some assessments it's quite common to have multiple scheduled tasks set up on the same data collection machine (the Windows Client Assessment, for example), and those scheduled tasks are set up to run on different days. Let's say you have two tasks, one on Monday and one on Wednesday. Once the task runs on Monday, you'll see the blank blade above until the second task runs on Wednesday, at which point you should start seeing the new Assessment Quality blade.

  2. What if 3 data collection machines all have the SQL Assessment running and pointing to the same Log Analytics workspace, but one of those machines was decommissioned 2 weeks ago? The other two machines would rerun the assessment with our new bits, but because the third machine was decommissioned, it will never rerun the assessment with the new bits, so its old data lingers in the workspace. In this scenario, you would see the blank blade above.

  3. We've seen this problem in some of our test workspaces that have LOTS of people running assessments and submitting data to the same Log Analytics workspace. In this case, it's very difficult (probably impossible) to track down everybody and get them to rerun the assessment using the new bits.

However, the problem will automatically resolve itself once one of the following two things happens:

  1. All data collection machines and scheduled tasks on those machines have rerun the assessment using new bits, OR
  2. Data Retention period (default of 30 days) causes old data to decay, leaving only "good" data that was generated after the Assessment Quality feature was released.

For that reason, we decided this was an acceptable amount of turbulence to endure.

In the worst-case scenario, customers can always create a new Log Analytics workspace, or use the Data Purger API, and then rerun the assessment, which will result in a clean Assessment Quality blade.
