Characteristics and limitations for Computer Vision spatial analysis

Computer Vision spatial analysis is one building block in an end-to-end solution that combines it with other components. Because every deployment differs, it's not possible to provide universally applicable estimates of accuracy for the actual system you plan to deploy.

How accurate is spatial analysis?

System accuracy depends on a number of factors, including core model accuracy, camera placement, configuration of regions of interest, how people interact with the system, and how people interpret the system's output. The following sections are designed to help you understand key concepts about accuracy as they apply to using spatial analysis.

Language of accuracy

The accuracy of a spatial analysis skill is a measure of how well the system-generated events correspond to real events that happened in the space. For example, the PersonCrossingLine skill should generate a system event whenever a person crosses a designated line in the camera's field of view. To measure accuracy, one might record a video with people walking across the designated line, count the true number of events based on human judgement, and then compare with the output of the system. Comparing the human judgement with the system-generated events would allow you to classify the events into two kinds of correct (or "true") events and two kinds of incorrect (or "false") events.

| Term | Definition | Example |
|------|------------|---------|
| True Positive | The system-generated event correctly corresponds to a real event. | The system correctly generates a PersonLineEvent when a person crosses the line. |
| True Negative | The system correctly does not generate an event when a real event has not occurred. | The system correctly does not generate a PersonLineEvent during a time when no one has crossed the line. |
| False Positive | The system incorrectly generates an event when no real event has occurred. | The system incorrectly generates a PersonLineEvent when no one has actually crossed the line. |
| False Negative | The system incorrectly fails to generate an event when a real event has occurred. | The system incorrectly fails to generate a PersonLineEvent when a person has actually crossed the line. |
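
To see how these counts roll up into summary metrics when you evaluate a skill against human-labeled ground truth, here is a minimal sketch. The tallies and labels are hypothetical and are not part of the spatial analysis API; precision and recall are standard metrics, used here for illustration.

```python
# Hypothetical tallies from comparing system-generated PersonLineEvents
# against human-judged ground truth for a recorded test video.
true_positives = 45   # system event matched a real line crossing
false_positives = 5   # system event with no real crossing
false_negatives = 10  # real crossing the system missed

# Precision: of the events the system generated, how many were real?
precision = true_positives / (true_positives + false_positives)

# Recall: of the real crossings, how many did the system catch?
recall = true_positives / (true_positives + false_negatives)

print(f"precision={precision:.2f}, recall={recall:.2f}")
```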

There are many scenarios in which spatial analysis can be used. Accuracy has different implications for the people involved depending on the scenario. Consider each of the example use cases defined earlier:

Measuring social distancing compliance: In this case, a false positive would occur if the system inaccurately flagged an interaction between two people as a social distancing violation. A false negative would occur if the system missed an instance where two people violated the social distancing guideline.

A customer who is concerned about safety may be willing to accept more false positives in order to prevent false negatives, where the system misses cases in which the 6-foot social distancing rule is violated. This customer might choose to set a 7-foot threshold instead of 6 feet, increasing the number of potential violations flagged by the system, in order to reduce the chance that a violation is missed.
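
To make that trade-off concrete, the following sketch assumes you post-process person positions yourself after your own camera calibration; the position values and identifiers are hypothetical and are not part of the spatial analysis output schema.

```python
from itertools import combinations

# Hypothetical detected people with positions in feet (from your own calibration).
people = {"p1": (0.0, 0.0), "p2": (6.5, 0.0), "p3": (20.0, 5.0)}

# The rule is 6 feet; flagging at 7 feet trades extra false positives
# for fewer false negatives.
THRESHOLD_FT = 7.0

def distance_ft(a, b):
    """Straight-line distance between two (x, y) positions in feet."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

# Flag every pair of people closer than the configured threshold.
violations = [
    (id_a, id_b)
    for (id_a, pos_a), (id_b, pos_b) in combinations(people.items(), 2)
    if distance_ft(pos_a, pos_b) < THRESHOLD_FT
]

print(violations)  # [('p1', 'p2')]: 6.5 ft apart, flagged at 7 ft though not at 6 ft
```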

Queue management: In this case, the system could use PersonCrossingPolygon enter and exit events to calculate how many people are in line and the wait time. If several false positive enter events occur, the system would overestimate the wait time, leading it to recommend that the store manager deploy more associates to checkout than necessary and wasting resources. If several false negative enter events occur, the system would underestimate the wait time, leading it to recommend too few associates for checkout and creating a negative customer experience.

In this case, a customer would likely be equally concerned about false positives and false negatives. To minimize the chances of each, it would be important to have a well-defined, fixed space for the queue, to follow the best practices listed below (especially the Camera Placement Guidance and the Zone and Line Placement Guide), and to minimize occlusion in the queue area.
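
One way to turn enter and exit events into a queue-length and wait-time estimate is sketched below. The event list, its format, and the per-person service time are assumptions for illustration, not the spatial analysis output schema.

```python
# Hypothetical, time-ordered stream of PersonCrossingPolygon events for the
# queue zone. The real output schema may differ.
events = ["enter", "enter", "enter", "exit", "enter", "exit"]

AVG_SERVICE_SECONDS = 90  # assumed average checkout time per person

queue_length = 0
for event in events:
    if event == "enter":
        queue_length += 1
    elif event == "exit":
        queue_length = max(queue_length - 1, 0)  # guard against missed enter events

estimated_wait_seconds = queue_length * AVG_SERVICE_SECONDS
print(queue_length, estimated_wait_seconds)  # 2 people in line, ~180 s wait
```

False positive enter events inflate both numbers and false negatives deflate them, which is why both error types matter equally in this scenario.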

Both kinds of errors reduce the accuracy of the system. For deployment recommendations, including how to provide effective human oversight to reduce the potential risks associated with these errors, see the Responsible Use Deployment documentation.

System limitations and best practices to improve system accuracy

  • Spatial analysis should not be relied on for scenarios where real-time alerts are needed to trigger intervention to prevent injury, like turning off a piece of heavy machinery when a person is present. Spatial analysis is better used to reduce the number of unsafe acts by measuring the aggregate number of people violating rules, such as entering restricted or forbidden areas.

  • Spatial analysis has not been heavily tested with data containing minors under the age of 18 or adults over age 65. We recommend that customers thoroughly evaluate error rates for their scenario in environments where these age groups predominate.

  • The spatial analysis face mask detection attribute should not be relied on when a person is wearing a transparent face shield or a glittery face mask; these make it challenging for the system to function accurately.

  • Spatial analysis works best when configured with an input video stream of roughly 15 frames per second and at least 1080p resolution. A slower frame rate or lower resolution risks losing track of people when they move quickly or appear too small in the camera view. A simple way to spot-check a stream against these recommendations is sketched after this list.

  • Camera placement should maximize the chance of a good view of people in the space and reduce the likelihood of occlusion. Follow the instructions in Camera Placement Guidance whenever possible to ensure the system functions optimally.

  • Objects or people often block the camera's view, occluding part of the scene. This impacts the accuracy of the system, especially when occlusions occur in a region of interest. Spatial analysis has a limited ability to re-identify a person after they have been occluded, so cameras should be set up to minimize occlusions as much as possible.

  • Zone and line placement designates the specific region of interest for generating insights. The region should be optimized to cover the largest area possible without including any area that you do not care about. Too small a region can result in unreliable data. For details, see the Zone and Line Placement Guide.

  • Cameras should be set up to yield high-quality images, avoiding lighting conditions outside the recommended operating range that result in over- or underexposed images.

  • CCTV cameras are often set up outside or with exterior views, so lighting and weather can dramatically influence video quality. This affects the accuracy of insights derived from such cameras.

  • Fisheye or 360-degree cameras are sometimes used in CCTV deployments. Spatial analysis can consume de-warped video from a 360-degree camera, but directly consuming a raw 360-degree stream is not supported. The system will also be less accurate at detecting people in such video because it has not been trained on this kind of distortion.

  • Spatial analysis is designed to work well with fixed cameras. When cameras move, you may need to adjust regions and rerun autocalibration.

  • Skills that use auto-calibration assume that the floor in the space is relatively flat. If the floor has dramatic changes in slope, accuracy may be affected.
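
As a pre-deployment sanity check on the frame rate and resolution recommendation above, the following sketch probes a camera stream with OpenCV. The stream URL is a placeholder, and this is not how spatial analysis itself is configured (it ingests streams defined in its deployment manifest); it is only a generic way to inspect what your camera produces.

```python
import cv2  # OpenCV; install with `pip install opencv-python`

# Placeholder RTSP URL; substitute your own camera stream.
STREAM_URL = "rtsp://camera.example.local/stream1"

MIN_FPS = 15
MIN_WIDTH, MIN_HEIGHT = 1920, 1080  # 1080p

capture = cv2.VideoCapture(STREAM_URL)
# Note: get() returns 0 if the stream cannot be opened or the property is unknown,
# which will also be flagged as below the recommendation.
fps = capture.get(cv2.CAP_PROP_FPS)
width = capture.get(cv2.CAP_PROP_FRAME_WIDTH)
height = capture.get(cv2.CAP_PROP_FRAME_HEIGHT)
capture.release()

if fps < MIN_FPS or width < MIN_WIDTH or height < MIN_HEIGHT:
    print(f"Stream {width:.0f}x{height:.0f} @ {fps:.1f} fps is below the "
          "recommended ~15 fps at 1080p; tracking accuracy may suffer.")
else:
    print("Stream meets the recommended frame rate and resolution.")
```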

Next steps