Spatial Analysis operations
Spatial Analysis enables the analysis of real-time streaming video from camera devices. For each camera device you configure, the operations for Spatial Analysis will generate an output stream of JSON messages sent to your instance of Azure IoT Hub.
The Spatial Analysis container implements the following operations:
| Operation Identifier | Description |
|---|---|
| cognitiveservices.vision.spatialanalysis-personcount | Counts people in a designated zone in the camera's field of view. The zone must be fully covered by a single camera in order for PersonCount to record an accurate total. Emits an initial personCountEvent event and then personCountEvent events when the count changes. |
| cognitiveservices.vision.spatialanalysis-personcrossingline | Tracks when a person crosses a designated line in the camera's field of view. Emits a personLineEvent event when the person crosses the line and provides directional info. |
| cognitiveservices.vision.spatialanalysis-personcrossingpolygon | Emits a personZoneEnterExitEvent event when a person enters or exits the zone and provides directional info with the numbered side of the zone that was crossed. Emits a personZoneDwellTimeEvent when the person exits the zone and provides directional info as well as the number of milliseconds the person spent inside the zone. |
| cognitiveservices.vision.spatialanalysis-persondistance | Tracks when people violate a distance rule. Emits a personDistanceEvent periodically with the location of each distance violation. |
| cognitiveservices.vision.spatialanalysis | Generic operation which can be used to run all scenarios mentioned above. This option is more useful when you want to run multiple scenarios on the same camera or use system resources (e.g. GPU) more efficiently. |
All of the operations above are also available in a .debug version, which can visualize the video frames as they are being processed. You will need to run xhost + on the host computer to enable the visualization of video frames and events.
| Operation Identifier | Description |
|---|---|
| cognitiveservices.vision.spatialanalysis-personcount.debug | Counts people in a designated zone in the camera's field of view. Emits an initial personCountEvent event and then personCountEvent events when the count changes. |
| cognitiveservices.vision.spatialanalysis-personcrossingline.debug | Tracks when a person crosses a designated line in the camera's field of view. Emits a personLineEvent event when the person crosses the line and provides directional info. |
| cognitiveservices.vision.spatialanalysis-personcrossingpolygon.debug | Emits a personZoneEnterExitEvent event when a person enters or exits the zone and provides directional info with the numbered side of the zone that was crossed. Emits a personZoneDwellTimeEvent when the person exits the zone and provides directional info as well as the number of milliseconds the person spent inside the zone. |
| cognitiveservices.vision.spatialanalysis-persondistance.debug | Tracks when people violate a distance rule. Emits a personDistanceEvent periodically with the location of each distance violation. |
| cognitiveservices.vision.spatialanalysis.debug | Generic operation which can be used to run all scenarios mentioned above. This option is more useful when you want to run multiple scenarios on the same camera or use system resources (e.g. GPU) more efficiently. |
Spatial Analysis can also run as the Video AI module for Live Video Analytics.
| Operation Identifier | Description |
|---|---|
| cognitiveservices.vision.spatialanalysis-personcount.livevideoanalytics | Counts people in a designated zone in the camera's field of view. Emits an initial personCountEvent event and then personCountEvent events when the count changes. |
| cognitiveservices.vision.spatialanalysis-personcrossingline.livevideoanalytics | Tracks when a person crosses a designated line in the camera's field of view. Emits a personLineEvent event when the person crosses the line and provides directional info. |
| cognitiveservices.vision.spatialanalysis-personcrossingpolygon.livevideoanalytics | Emits a personZoneEnterExitEvent event when a person enters or exits the zone and provides directional info with the numbered side of the zone that was crossed. Emits a personZoneDwellTimeEvent when the person exits the zone and provides directional info as well as the number of milliseconds the person spent inside the zone. |
| cognitiveservices.vision.spatialanalysis-persondistance.livevideoanalytics | Tracks when people violate a distance rule. Emits a personDistanceEvent periodically with the location of each distance violation. |
| cognitiveservices.vision.spatialanalysis.livevideoanalytics | Generic operation which can be used to run all scenarios mentioned above. This option is more useful when you want to run multiple scenarios on the same camera or use system resources (e.g. GPU) more efficiently. |
Live Video Analytics operations are also available in a .debug version (e.g. cognitiveservices.vision.spatialanalysis-personcount.livevideoanalytics.debug), which can visualize the video frames as they are being processed. You will need to run xhost + on the host computer to enable the visualization of the video frames and events.
Important
The computer vision AI models detect and locate human presence in video footage and output a bounding box around each human body. The AI models do not attempt to discover the identities or demographics of individuals.
These are the parameters required by each of these Spatial Analysis operations.
| Operation parameters | Description |
|---|---|
| Operation ID | The Operation Identifier from the table above. |
| enabled | Boolean: true or false |
| VIDEO_URL | The RTSP URL for the camera device (example: rtsp://username:password@url). Spatial Analysis supports H.264 encoded streams through RTSP, HTTP, or MP4. VIDEO_URL can be provided as an obfuscated base64 string value using AES encryption. If the video URL is obfuscated, KEY_ENV and IV_ENV must be provided as environment variables. A sample utility to generate keys and encryption can be found here. |
| VIDEO_SOURCE_ID | A friendly name for the camera device or video stream. This will be returned with the event JSON output. |
| VIDEO_IS_LIVE | True for camera devices; false for recorded videos. |
| VIDEO_DECODE_GPU_INDEX | The index of the GPU used to decode video frames. The default is 0. It should match the gpu_index in the other node configs, such as DETECTOR_NODE_CONFIG and CAMERACALIBRATOR_NODE_CONFIG. |
| INPUT_VIDEO_WIDTH | Input video/stream's frame width (e.g. 1920). This is an optional field and if provided, the frame will be scaled to this dimension while preserving the aspect ratio. |
| DETECTOR_NODE_CONFIG | JSON indicating which GPU to run the detector node on. It should be in the following format: "{ \"gpu_index\": 0 }". |
| TRACKER_NODE_CONFIG | JSON indicating whether to compute speed in the tracker node or not. It should be in the following format: "{ \"enable_speed\": true }". |
| CAMERA_CONFIG | JSON indicating the calibrated camera parameters for multiple cameras. If the skill you use requires calibration and you already have the camera parameters, you can use this config to provide them directly. It should be in the following format: "{ \"cameras\": [{\"source_id\": \"endcomputer.0.persondistancegraph.detector+end_computer1\", \"camera_height\": 13.105561256408691, \"camera_focal_length\": 297.60003662109375, \"camera_tiltup_angle\": 0.9738943576812744}] }". The source_id is used to identify each camera. It can be obtained from the source_info of the published events. It only takes effect when do_calibration=false in CAMERACALIBRATOR_NODE_CONFIG. |
| CAMERACALIBRATOR_NODE_CONFIG | JSON indicating which GPU to run the camera calibrator node on and whether to use calibration or not. It should be in the following format: "{ \"gpu_index\": 0, \"do_calibration\": true, \"enable_orientation\": true}". |
| CALIBRATION_CONFIG | JSON indicating parameters to control how the camera calibration works. It should be in the following format: "{\"enable_recalibration\": true, \"quality_check_frequency_seconds\": 86400}". |
| SPACEANALYTICS_CONFIG | JSON configuration for zone and line as outlined below. |
| ENABLE_FACE_MASK_CLASSIFIER | True to enable detecting people wearing face masks in the video stream, False to disable it. By default this is disabled. Face mask detection requires the input video width parameter to be 1920: "INPUT_VIDEO_WIDTH": 1920. The face mask attribute will not be returned if detected people are not facing the camera or are too far from it. Refer to the camera placement guide for more information. |
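Several of the parameters above (for example DETECTOR_NODE_CONFIG and SPACEANALYTICS_CONFIG) are JSON documents passed as string values, which is why the formats in the table show escaped quotes. The following Python sketch shows one way to produce that escaping automatically with the standard json module. The helper name build_operation_params, the camera name, and the zone values are placeholders for illustration only, and where these parameters go in your deployment manifest is not covered here.

```python
import json


def build_operation_params(video_url, source_id, space_config):
    """Return operation parameters with nested configs encoded as JSON strings.

    Hypothetical helper for illustration; only the parameter names come
    from the table above.
    """
    return {
        "VIDEO_URL": video_url,
        "VIDEO_SOURCE_ID": source_id,
        "VIDEO_IS_LIVE": True,
        "VIDEO_DECODE_GPU_INDEX": 0,
        # Nested configs are serialized to strings, not embedded as objects.
        "DETECTOR_NODE_CONFIG": json.dumps({"gpu_index": 0}),
        "CAMERACALIBRATOR_NODE_CONFIG": json.dumps(
            {"gpu_index": 0, "do_calibration": True, "enable_orientation": True}
        ),
        "SPACEANALYTICS_CONFIG": json.dumps(space_config),
    }


# Placeholder zone definition used only to show the nesting.
space_config = {
    "zones": [
        {
            "name": "lobbycamera",
            "polygon": [[0.3, 0.3], [0.3, 0.9], [0.6, 0.9], [0.6, 0.3], [0.3, 0.3]],
            "events": [{"type": "count", "config": {"trigger": "event"}}],
        }
    ]
}

params = build_operation_params(
    "rtsp://username:password@url", "lobbycamera", space_config
)

# Serializing the outer dictionary escapes the quotes of the inner strings.
print(json.dumps(params, indent=2))
```

Serializing twice like this keeps each nested config valid on its own, so it can be parsed back with json.loads when you need to inspect or change a single setting.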
Detector node parameter settings
This is an example of the DETECTOR_NODE_CONFIG parameters for all Spatial Analysis operations.
{
"gpu_index": 0,
"enable_breakpad": false
}
| Name | Type | Description |
|---|---|---|
| gpu_index | string | The GPU index on which this operation will run. |
| enable_breakpad | bool | Indicates whether to enable breakpad, which is used to generate a crash dump for debug use. It is false by default. If you set it to true, you also need to add "CapAdd": ["SYS_PTRACE"] in the HostConfig part of container createOptions. By default, the crash dump is uploaded to the RealTimePersonTracking AppCenter app. If you want the crash dumps to be uploaded to your own AppCenter app, you can override the environment variable RTPT_APPCENTER_APP_SECRET with your app's app secret. |
Camera calibration node parameter settings
This is an example of the CAMERACALIBRATOR_NODE_CONFIG parameters for all spatial analysis operations.
{
"gpu_index": 0,
"do_calibration": true,
"enable_breakpad": false,
"enable_orientation": true
}
| Name | Type | Description |
|---|---|---|
| do_calibration | string | Indicates that calibration is turned on. do_calibration must be true for cognitiveservices.vision.spatialanalysis-persondistance to function properly. do_calibration is set by default to True. |
| enable_breakpad | bool | Indicates whether to enable breakpad, which is used to generate a crash dump for debug use. It is false by default. If you set it to true, you also need to add "CapAdd": ["SYS_PTRACE"] in the HostConfig part of container createOptions. By default, the crash dump is uploaded to the RealTimePersonTracking AppCenter app. If you want the crash dumps to be uploaded to your own AppCenter app, you can override the environment variable RTPT_APPCENTER_APP_SECRET with your app's app secret. |
| enable_orientation | bool | Indicates whether you want to compute the orientation for the detected people or not. enable_orientation is set by default to True. |
Calibration config
This is an example of the CALIBRATION_CONFIG parameters for all spatial analysis operations.
{
"enable_recalibration": true,
"calibration_quality_check_frequency_seconds": 86400,
"calibration_quality_check_sample_collect_frequency_seconds": 300,
"calibration_quality_check_one_round_sample_collect_num": 10,
"calibration_quality_check_queue_max_size": 1000,
"calibration_event_frequency_seconds": -1
}
| Name | Type | Description |
|---|---|---|
| enable_recalibration | bool | Indicates whether automatic recalibration is turned on. Default is true. |
| calibration_quality_check_frequency_seconds | int | Minimum number of seconds between each quality check to determine whether or not recalibration is needed. Default is 86400 (24 hours). Only used when enable_recalibration=True. |
| calibration_quality_check_sample_collect_frequency_seconds | int | Minimum number of seconds between collecting new data samples for recalibration and quality checking. Default is 300 (5 minutes). Only used when enable_recalibration=True. |
| calibration_quality_check_one_round_sample_collect_num | int | Minimum number of new data samples to collect per round of sample collection. Default is 10. Only used when enable_recalibration=True. |
| calibration_quality_check_queue_max_size | int | Maximum number of data samples to store when camera model is calibrated. Default is 1000. Only used when enable_recalibration=True. |
| calibration_event_frequency_seconds | int | Output frequency (seconds) of camera calibration events. A value of -1 indicates that the camera calibration should not be sent unless the camera calibration info has been changed. Default is -1. |
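If you generate CALIBRATION_CONFIG programmatically, it can help to start from the documented defaults and override only what you need. The sketch below does that in Python. The helper name make_calibration_config is hypothetical, and rejecting unknown keys is a choice made for this example to catch typos; this section does not say how the container treats keys it does not recognize.

```python
import json

# Defaults as listed in the table above.
CALIBRATION_DEFAULTS = {
    "enable_recalibration": True,
    "calibration_quality_check_frequency_seconds": 86400,
    "calibration_quality_check_sample_collect_frequency_seconds": 300,
    "calibration_quality_check_one_round_sample_collect_num": 10,
    "calibration_quality_check_queue_max_size": 1000,
    "calibration_event_frequency_seconds": -1,
}


def make_calibration_config(**overrides):
    """Merge overrides into the defaults and return the config as a JSON string."""
    unknown = set(overrides) - set(CALIBRATION_DEFAULTS)
    if unknown:
        # Assumption of this sketch: treat unrecognized names as mistakes.
        raise KeyError(f"Unknown calibration setting(s): {sorted(unknown)}")
    config = dict(CALIBRATION_DEFAULTS)
    config.update(overrides)
    return json.dumps(config)


# Example: run the quality check hourly instead of daily.
calibration_config = make_calibration_config(
    calibration_quality_check_frequency_seconds=3600
)
print(calibration_config)
```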
Camera calibration output
This is an example of the output from camera calibration if enabled. Ellipses indicate more of the same type of objects in a list.
{
"type": "cameraCalibrationEvent",
"sourceInfo": {
"id": "camera1",
"timestamp": "2021-04-20T21:15:59.100Z",
"width": 640,
"height": 360,
"frameId": 531,
"cameraCalibrationInfo": {
"status": "Calibrated",
"cameraHeight": 13.294151306152344,
"focalLength": 372.0000305175781,
"tiltupAngle": 0.9581864476203918,
"lastCalibratedTime": "2021-04-20T21:15:59.058"
}
},
"zonePlacementInfo": {
"optimalZoneRegion": {
"type": "POLYGON",
"points": [
{
"x": 0.8403755868544601,
"y": 0.5515320334261838
},
{
"x": 0.15805946791862285,
"y": 0.5487465181058496
},
...
],
"name": "optimal_zone_region"
},
"fairZoneRegion": {
"type": "POLYGON",
"points": [
{
"x": 0.7871674491392802,
"y": 0.7437325905292479
},
{
"x": 0.22065727699530516,
"y": 0.7325905292479109
},
...
],
"name": "fair_zone_region"
},
"uniformlySpacedPersonBoundingBoxes": [
{
"type": "RECTANGLE",
"points": [
{
"x": 0.0297339593114241,
"y": 0.0807799442896936
},
{
"x": 0.10015649452269171,
"y": 0.2757660167130919
}
]
},
...
],
"personBoundingBoxGroundPoints": [
{
"x": -22.944068908691406,
"y": 31.487680435180664
},
...
]
}
}
See Spatial analysis operation output for details on source_info.
| ZonePlacementInfo Field Name | Type | Description |
|---|---|---|
| optimalZoneRegion | object | A polygon in the camera image where lines or zones for your operations can be placed for optimal results. Each value pair represents the x,y for vertices of a polygon. The polygon represents the areas in which people are tracked or counted and polygon points are based on normalized coordinates (0-1), where the top left corner is (0.0, 0.0) and the bottom right corner is (1.0, 1.0). |
| fairZoneRegion | object | A polygon in the camera image where lines or zones for your operations can be placed for good, but possibly not optimal, results. See optimalZoneRegion above for an in-depth explanation of the contents. |
| uniformlySpacedPersonBoundingBoxes | list | A list of bounding boxes of people within the camera image distributed uniformly in real space. Values are based on normalized coordinates (0-1). |
| personBoundingBoxGroundPoints | list | A list of coordinates on the floor plane relative to the camera. Each coordinate corresponds to the bottom right of the bounding box in uniformlySpacedPersonBoundingBoxes with the same index. See the centerGroundPoint field under the JSON format for cognitiveservices.vision.spatialanalysis-persondistance AI Insights section for more details on how coordinates on the floor plane are calculated. |
Example of the zone placement info output visualized on a video frame:

The zone placement info provides suggestions for your configurations, but the guidelines in Camera configuration must still be followed for best results.
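One way to use the zone placement info is to check a zone you plan to configure against a suggested region before deploying it. The sketch below uses the standard ray-casting point-in-polygon test. The region values are made up for illustration (the sample output above elides most points), and the function names are hypothetical. Checking only the vertices is a quick approximation: it does not prove that every edge of the zone stays inside a non-convex region.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point (x, y) lies inside polygon [(x, y), ...]."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zone_vertices_inside(zone_polygon, region_points):
    """Check each zone vertex against a region given as [{"x": .., "y": ..}, ...]."""
    region = [(p["x"], p["y"]) for p in region_points]
    return all(point_in_polygon((x, y), region) for x, y in zone_polygon)


# Hypothetical region points, in the same normalized (0-1) coordinates.
optimal_region = [
    {"x": 0.2, "y": 0.2},
    {"x": 0.8, "y": 0.2},
    {"x": 0.8, "y": 0.8},
    {"x": 0.2, "y": 0.8},
]
zone = [[0.3, 0.3], [0.3, 0.7], [0.6, 0.7], [0.6, 0.3], [0.3, 0.3]]

print(zone_vertices_inside(zone, optimal_region))
```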
Speed parameter settings
You can configure the speed computation through the tracker node parameter settings.
{
"enable_speed": true
}
| Name | Type | Description |
|---|---|---|
| enable_speed | bool | Indicates whether you want to compute the speed for the detected people or not. enable_speed is set by default to True. It is highly recommended that you enable both speed and orientation to have the best estimated values. |
Spatial Analysis operations configuration and output
Zone configuration for cognitiveservices.vision.spatialanalysis-personcount
This is an example of a JSON input for the SPACEANALYTICS_CONFIG parameter that configures a zone. You may configure multiple zones for this operation.
{
"zones": [
{
"name": "lobbycamera",
"polygon": [[0.3,0.3], [0.3,0.9], [0.6,0.9], [0.6,0.3], [0.3,0.3]],
"events": [
{
"type": "count",
"config": {
"trigger": "event",
"threshold": 16.00,
"focus": "footprint"
}
}
]
}
]
}
| Name | Type | Description |
|---|---|---|
| zones | list | List of zones. |
| name | string | Friendly name for this zone. |
| polygon | list | Each value pair represents the x,y for vertices of a polygon. The polygon represents the areas in which people are tracked or counted and polygon points are based on normalized coordinates (0-1), where the top left corner is (0.0, 0.0) and the bottom right corner is (1.0, 1.0). |
| threshold | float | Events are egressed when the person is greater than this number of pixels inside the zone. |
| type | string | For cognitiveservices.vision.spatialanalysis-personcount this should be count. |
| trigger | string | The type of trigger for sending an event. Supported values are event for sending events when the count changes or interval for sending events periodically, irrespective of whether the count has changed or not. |
| output_frequency | int | The rate at which events are egressed. When output_frequency = X, every X event is egressed, ex. output_frequency = 2 means every other event is output. The output_frequency is applicable to both event and interval. |
| focus | string | The point location within person's bounding box used to calculate events. Focus's value can be footprint (the footprint of person), bottom_center (the bottom center of person's bounding box), center (the center of person's bounding box). |
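Because polygon vertices are normalized to the 0-1 range with (0.0, 0.0) at the top left, drawing or checking a zone on a real frame means scaling by the frame size. The following Python sketch shows that conversion; the function name polygon_to_pixels and the 1920x1080 frame size are assumptions for the example.

```python
def polygon_to_pixels(polygon, frame_width, frame_height):
    """Convert normalized [x, y] vertices to integer pixel coordinates."""
    pixels = []
    for x, y in polygon:
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError(f"Vertex ({x}, {y}) is outside the 0-1 range")
        # x scales with the frame width, y with the frame height.
        pixels.append((round(x * frame_width), round(y * frame_height)))
    return pixels


# Polygon from the zone example above, on an assumed 1920x1080 frame.
zone = [[0.3, 0.3], [0.3, 0.9], [0.6, 0.9], [0.6, 0.3], [0.3, 0.3]]
pixels = polygon_to_pixels(zone, 1920, 1080)
print(pixels)
```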
Line configuration for cognitiveservices.vision.spatialanalysis-personcrossingline
This is an example of a JSON input for the SPACEANALYTICS_CONFIG parameter that configures a line. You may configure multiple crossing lines for this operation.
{
"lines": [
{
"name": "doorcamera",
"line": {
"start": {
"x": 0,
"y": 0.5
},
"end": {
"x": 1,
"y": 0.5
}
},
"events": [
{
"type": "linecrossing",
"config": {
"trigger": "event",
"threshold": 16.00,
"focus": "footprint"
}
}
]
}
]
}
| Name | Type | Description |
|---|---|---|
| lines | list | List of lines. |
| name | string | Friendly name for this line. |
| line | list | The definition of the line. This is a directional line allowing you to understand "entry" vs. "exit". |
| start | value pair | x, y coordinates for the line's starting point. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x, y values, multiply these values by the frame size. |
| end | value pair | x, y coordinates for the line's ending point. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x, y values, multiply these values by the frame size. |
| threshold | float | Events are egressed when the person is greater than this number of pixels inside the zone. The default value is 16. This is the recommended value to achieve maximum accuracy. |
| type | string | For cognitiveservices.vision.spatialanalysis-personcrossingline this should be linecrossing. |
| trigger | string | The type of trigger for sending an event. Supported values: "event": fire when someone crosses the line. |
| focus | string | The point location within person's bounding box used to calculate events. Focus's value can be footprint (the footprint of person), bottom_center (the bottom center of person's bounding box), center (the center of person's bounding box). The default value is footprint. |
Zone configuration for cognitiveservices.vision.spatialanalysis-personcrossingpolygon
This is an example of a JSON input for the SPACEANALYTICS_CONFIG parameter that configures a zone. You may configure multiple zones for this operation.
{
"zones":[
{
"name": "queuecamera",
"polygon": [[0.3,0.3], [0.3,0.9], [0.6,0.9], [0.6,0.3], [0.3,0.3]],
"events":[{
"type": "zonecrossing",
"config":{
"trigger": "event",
"threshold": 48.00,
"focus": "footprint"
}
}]
},
{
"name": "queuecamera1",
"polygon": [[0.3,0.3], [0.3,0.9], [0.6,0.9], [0.6,0.3], [0.3,0.3]],
"events":[{
"type": "zonedwelltime",
"config":{
"trigger": "event",
"threshold": 16.00,
"focus": "footprint"
}
}]
}]
}
| Name | Type | Description |
|---|---|---|
| zones | list | List of zones. |
| name | string | Friendly name for this zone. |
| polygon | list | Each value pair represents the x,y for vertices of a polygon. The polygon represents the areas in which people are tracked or counted. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x, y values, multiply these values by the frame size. |
| target_side | int | Specifies a side of the zone defined by polygon to measure how long people face that side while in the zone. 'dwellTimeForTargetSide' will output that estimated time. Each side is a numbered edge between the two vertices of the polygon that represents your zone. For example, the edge between the first two vertices of the polygon represents the first side, 'side'=1. The value of target_side is between [0,N-1] where N is the number of sides of the polygon. This is an optional field. |
| threshold | float | Events are egressed when the person is greater than this number of pixels inside the zone. The default value is 48 when type is zonecrossing and 16 when type is zonedwelltime. These are the recommended values to achieve maximum accuracy. |
| type | string | For cognitiveservices.vision.spatialanalysis-personcrossingpolygon this should be zonecrossing or zonedwelltime. |
| trigger | string | The type of trigger for sending an event. Supported values: "event": fire when someone enters or exits the zone. |
| focus | string | The point location within person's bounding box used to calculate events. Focus's value can be footprint (the footprint of person), bottom_center (the bottom center of person's bounding box), center (the center of person's bounding box). The default value is footprint. |
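The recommended threshold depends on the event type (48 for zonecrossing, 16 for zonedwelltime), which is easy to mix up when zones are written by hand. The following Python sketch builds an event entry with the recommended default for its type. The helper make_zone_event is hypothetical and simply mirrors the values in the table above.

```python
import json

# Recommended thresholds per event type, as stated in the table above.
DEFAULT_THRESHOLDS = {"zonecrossing": 48.0, "zonedwelltime": 16.0}
FOCUS_VALUES = ("footprint", "bottom_center", "center")


def make_zone_event(event_type, focus="footprint", threshold=None):
    """Build one entry for a zone's "events" list."""
    if event_type not in DEFAULT_THRESHOLDS:
        raise ValueError(f"Unsupported event type: {event_type!r}")
    if focus not in FOCUS_VALUES:
        raise ValueError(f"Unsupported focus: {focus!r}")
    if threshold is None:
        threshold = DEFAULT_THRESHOLDS[event_type]
    return {
        "type": event_type,
        "config": {"trigger": "event", "threshold": threshold, "focus": focus},
    }


zone = {
    "name": "queuecamera",
    "polygon": [[0.3, 0.3], [0.3, 0.9], [0.6, 0.9], [0.6, 0.3], [0.3, 0.3]],
    "events": [make_zone_event("zonecrossing")],
}
print(json.dumps({"zones": [zone]}, indent=2))
```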
Zone configuration for cognitiveservices.vision.spatialanalysis-persondistance
This is an example of a JSON input for the SPACEANALYTICS_CONFIG parameter that configures a zone for cognitiveservices.vision.spatialanalysis-persondistance. You may configure multiple zones for this operation.
{
"zones":[{
"name": "lobbycamera",
"polygon": [[0.3,0.3], [0.3,0.9], [0.6,0.9], [0.6,0.3], [0.3,0.3]],
"events":[{
"type": "persondistance",
"config":{
"trigger": "event",
"output_frequency":1,
"minimum_distance_threshold":6.0,
"maximum_distance_threshold":35.0,
"aggregation_method": "average",
"threshold": 16.00,
"focus": "footprint"
}
}]
}]
}
| Name | Type | Description |
|---|---|---|
| zones | list | List of zones. |
| name | string | Friendly name for this zone. |
| polygon | list | Each value pair represents the x,y for vertices of a polygon. The polygon represents the areas in which people are counted and the distance between people is measured. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x, y values, multiply these values by the frame size. |
| threshold | float | Events are egressed when the person is greater than this number of pixels inside the zone. |
| type | string | For cognitiveservices.vision.spatialanalysis-persondistance this should be persondistance. |
| trigger | string | The type of trigger for sending an event. Supported values are event for sending events when the count changes or interval for sending events periodically, irrespective of whether the count has changed or not. |
| output_frequency | int | The rate at which events are egressed. When output_frequency = X, every X event is egressed, ex. output_frequency = 2 means every other event is output. The output_frequency is applicable to both event and interval. |
| minimum_distance_threshold | float | A distance in feet that will trigger a "TooClose" event when people are less than that distance apart. |
| maximum_distance_threshold | float | A distance in feet that will trigger a "TooFar" event when people are greater than that distance apart. |
| aggregation_method | string | The method used to aggregate the persondistance result. Supported values are mode and average. |
| focus | string | The point location within person's bounding box used to calculate events. Focus's value can be footprint (the footprint of person), bottom_center (the bottom center of person's bounding box), center (the center of person's bounding box). |
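To make the two distance thresholds concrete, the sketch below classifies the distance between two people on the ground plane using the values from the example config (6.0 and 35.0 feet). It illustrates the threshold semantics described in the table and is not the container's actual implementation. The function names are hypothetical, and the ground-plane positions are made-up values assumed to be in feet.

```python
import math


def ground_distance(point_a, point_b):
    """Euclidean distance between two {"x": .., "y": ..} ground-plane points."""
    return math.hypot(point_a["x"] - point_b["x"], point_a["y"] - point_b["y"])


def classify_distance(distance_feet, minimum=6.0, maximum=35.0):
    """Return "TooClose", "TooFar", or None when the distance is within bounds."""
    if distance_feet < minimum:
        return "TooClose"
    if distance_feet > maximum:
        return "TooFar"
    return None


# Hypothetical ground-plane positions of two people, assumed to be in feet.
person_a = {"x": 0.0, "y": 10.0}
person_b = {"x": 3.0, "y": 14.0}

distance = ground_distance(person_a, person_b)
print(distance, classify_distance(distance))
```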
Configuration for cognitiveservices.vision.spatialanalysis
This is an example of a JSON input for the SPACEANALYTICS_CONFIG parameter that configures a line and zone for cognitiveservices.vision.spatialanalysis. You may configure multiple lines/zones for this operation and each line/zone can have different events.
{
"lines": [
{
"name": "doorcamera",
"line": {
"start": {
"x": 0,
"y": 0.5
},
"end": {
"x": 1,
"y": 0.5
}
},
"events": [
{
"type": "linecrossing",
"config": {
"trigger": "event",
"threshold": 16.00,
"focus": "footprint"
}
}
]
}
],
"zones": [
{
"name": "lobbycamera",
"polygon": [[0.3, 0.3],[0.3, 0.9],[0.6, 0.9],[0.6, 0.3],[0.3, 0.3]],
"events": [
{
"type": "persondistance",
"config": {
"trigger": "event",
"output_frequency": 1,
"minimum_distance_threshold": 6.0,
"maximum_distance_threshold": 35.0,
"threshold": 16.00,
"focus": "footprint"
}
},
{
"type": "count",
"config": {
"trigger": "event",
"output_frequency": 1,
"threshold": 16.00,
"focus": "footprint"
}
},
{
"type": "zonecrossing",
"config": {
"threshold": 48.00,
"focus": "footprint"
}
},
{
"type": "zonedwelltime",
"config": {
"threshold": 16.00,
"focus": "footprint"
}
}
]
}
]
}
Camera configuration
See the camera placement guidelines to learn more about how to configure zones and lines.
Spatial Analysis operation output
The events from each operation are egressed to Azure IoT Hub in JSON format.
JSON format for cognitiveservices.vision.spatialanalysis-personcount AI Insights
Sample JSON for an event output by this operation.
{
"events": [
{
"id": "b013c2059577418caa826844223bb50b",
"type": "personCountEvent",
"detectionIds": [
"bc796b0fc2534bc59f13138af3dd7027",
"60add228e5274158897c135905b5a019"
],
"properties": {
"personCount": 2
},
"zone": "lobbycamera",
"trigger": "event"
}
],
"sourceInfo": {
"id": "camera_id",
"timestamp": "2020-08-24T06:06:57.224Z",
"width": 608,
"height": 342,
"frameId": "1400",
"cameraCalibrationInfo": {
"status": "Calibrated",
"cameraHeight": 10.306597709655762,
"focalLength": 385.3199462890625,
"tiltupAngle": 1.0969393253326416
},
"imagePath": ""
},
"detections": [
{
"type": "person",
"id": "bc796b0fc2534bc59f13138af3dd7027",
"region": {
"type": "RECTANGLE",
"points": [
{
"x": 0.612683747944079,
"y": 0.25340268765276636
},
{
"x": 0.7185954043739721,
"y": 0.6425260577285499
}
]
},
"confidence": 0.9559211134910583,
"centerGroundPoint": {
"x": 0.0,
"y": 0.0
},
"metadata": {
"attributes": {
"face_mask": 0.99
}
}
},
{
"type": "person",
"id": "60add228e5274158897c135905b5a019",
"region": {
"type": "RECTANGLE",
"points": [
{
"x": 0.22326200886776573,
"y": 0.17830915618361087
},
{
"x": 0.34922296122500773,
"y": 0.6297955429344847
}
]
},
"confidence": 0.9389744400978088,
"centerGroundPoint": {
"x": 0.0,
"y": 0.0
},
"metadata":{
"attributes": {
"face_nomask": 0.99
}
}
}
],
"schemaVersion": "1.0"
}
| Event Field Name | Type | Description |
|---|---|---|
| id | string | Event ID |
| type | string | Event type |
| detectionIds | array | Array of size 1 containing the unique identifier of the person detection that triggered this event |
| properties | collection | Collection of values |
| trackingId | string | Unique identifier of the person detected |
| zone | string | The "name" field of the polygon that represents the zone that was crossed |
| trigger | string | The trigger type is 'event' or 'interval' depending on the value of trigger in SPACEANALYTICS_CONFIG |
| Detections Field Name | Type | Description |
|---|---|---|
| id | string | Detection ID |
| type | string | Detection type |
| region | collection | Collection of values |
| type | string | Type of region |
| points | collection | Top left and bottom right points when the region type is RECTANGLE |
| confidence | float | Algorithm confidence |
| face_mask | float | The attribute confidence value with range (0-1) indicates the detected person is wearing a face mask |
| face_nomask | float | The attribute confidence value with range (0-1) indicates the detected person is not wearing a face mask |
| SourceInfo Field Name | Type | Description |
|---|---|---|
| id | string | Camera ID |
| timestamp | date | UTC date when the JSON payload was emitted |
| width | int | Video frame width |
| height | int | Video frame height |
| frameId | int | Frame identifier |
| cameraCalibrationInfo | collection | Collection of values |
| status | string | The status of the calibration in the format of state[;progress description]. The state can be Calibrating, Recalibrating (if recalibration is enabled), or Calibrated. The progress description part is only valid in the Calibrating and Recalibrating states, where it shows the progress of the current calibration process. |
| cameraHeight | float | The height of the camera above the ground in feet. This is inferred from auto-calibration. |
| focalLength | float | The focal length of the camera in pixels. This is inferred from auto-calibration. |
| tiltupAngle | float | The camera tilt angle from vertical. This is inferred from auto-calibration. |
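As a rough illustration of how a consumer might read this payload, the Python sketch below extracts the per-zone count, converts each detection's normalized bounding box to pixels using the width and height in sourceInfo, and splits the calibration status into its state and optional progress description. The function names are hypothetical, the message is a trimmed copy of the sample above, and receiving messages from IoT Hub is not shown.

```python
import json


def summarize_person_count(message):
    """Return ({zone: count}, {detection_id: pixel_box}) for one JSON message."""
    payload = json.loads(message)
    width = payload["sourceInfo"]["width"]
    height = payload["sourceInfo"]["height"]

    counts = {
        event["zone"]: event["properties"]["personCount"]
        for event in payload["events"]
        if event["type"] == "personCountEvent"
    }

    boxes = {}
    for detection in payload["detections"]:
        # For RECTANGLE regions the points are top left, then bottom right.
        top_left, bottom_right = detection["region"]["points"]
        boxes[detection["id"]] = (
            round(top_left["x"] * width),
            round(top_left["y"] * height),
            round(bottom_right["x"] * width),
            round(bottom_right["y"] * height),
        )
    return counts, boxes


def parse_calibration_status(status):
    """Split 'state[;progress description]' into (state, progress or None)."""
    state, _, progress = status.partition(";")
    return state, (progress or None)


# Trimmed copy of the sample message above.
sample = json.dumps({
    "events": [{
        "id": "b013c2059577418caa826844223bb50b",
        "type": "personCountEvent",
        "detectionIds": [
            "bc796b0fc2534bc59f13138af3dd7027",
            "60add228e5274158897c135905b5a019",
        ],
        "properties": {"personCount": 2},
        "zone": "lobbycamera",
        "trigger": "event",
    }],
    "sourceInfo": {
        "id": "camera_id",
        "width": 608,
        "height": 342,
        "cameraCalibrationInfo": {"status": "Calibrated"},
    },
    "detections": [
        {
            "type": "person",
            "id": "bc796b0fc2534bc59f13138af3dd7027",
            "region": {"type": "RECTANGLE", "points": [
                {"x": 0.612683747944079, "y": 0.25340268765276636},
                {"x": 0.7185954043739721, "y": 0.6425260577285499},
            ]},
        },
        {
            "type": "person",
            "id": "60add228e5274158897c135905b5a019",
            "region": {"type": "RECTANGLE", "points": [
                {"x": 0.22326200886776573, "y": 0.17830915618361087},
                {"x": 0.34922296122500773, "y": 0.6297955429344847},
            ]},
        },
    ],
})

counts, boxes = summarize_person_count(sample)
print(counts)
print(boxes)
print(parse_calibration_status("Calibrated"))
```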
JSON format for cognitiveservices.vision.spatialanalysis-personcrossingline AI Insights
Sample JSON for detections output by this operation.
{
"events": [
{
"id": "3733eb36935e4d73800a9cf36185d5a2",
"type": "personLineEvent",
"detectionIds": [
"90d55bfc64c54bfd98226697ad8445ca"
],
"properties": {
"trackingId": "90d55bfc64c54bfd98226697ad8445ca",
"status": "CrossLeft"
},
"zone": "doorcamera"
}
],
"sourceInfo": {
"id": "camera_id",
"timestamp": "2020-08-24T06:06:53.261Z",
"width": 608,
"height": 342,
"frameId": "1340",
"imagePath": ""
},
"detections": [
{
"type": "person",
"id": "90d55bfc64c54bfd98226697ad8445ca",
"region": {
"type": "RECTANGLE",
"points": [
{
"x": 0.491627341822574,
"y": 0.2385801348769874
},
{
"x": 0.588894994635331,
"y": 0.6395559924387793
}
]
},
"confidence": 0.9005028605461121,
"metadata": {
"attributes": {
"face_mask": 0.99
}
}
}
],
"schemaVersion": "1.0"
}
| Event Field Name | Type | Description |
|---|---|---|
| id | string | Event ID |
| type | string | Event type |
| detectionIds | array | Array of size 1 containing the unique identifier of the person detection that triggered this event |
| properties | collection | Collection of values |
| trackingId | string | Unique identifier of the person detected |
| status | string | Direction of line crossings, either 'CrossLeft' or 'CrossRight'. Direction is based on imagining standing at the "start" facing the "end" of the line. CrossRight is crossing from left to right. CrossLeft is crossing from right to left. |
| orientationDirection | string | The orientation direction of the detected person after crossing the line. The value can be 'Left', 'Right', or 'Straight'. This value is output if enable_orientation is set to True in CAMERACALIBRATOR_NODE_CONFIG. |
| zone | string | The "name" field of the line that was crossed |
| Detections Field Name | Type | Description |
|---|---|---|
| id | string | Detection ID |
| type | string | Detection type |
| region | collection | Collection of values |
| type | string | Type of region |
| points | collection | Top left and bottom right points when the region type is RECTANGLE |
| groundOrientationAngle | float | The clockwise radian angle of the person's orientation on the inferred ground plane |
| mappedImageOrientation | float | The projected clockwise radian angle of the person's orientation on the 2D image space |
| speed | float | The estimated speed of the detected person. The unit is foot per second (ft/s) |
| confidence | float | Algorithm confidence |
| face_mask | float | The attribute confidence value with range (0-1) indicates the detected person is wearing a face mask |
| face_nomask | float | The attribute confidence value with range (0-1) indicates the detected person is not wearing a face mask |
| SourceInfo Field Name | Type | Description |
|---|---|---|
| id | string | Camera ID |
| timestamp | date | UTC date when the JSON payload was emitted |
| width | int | Video frame width |
| height | int | Video frame height |
| frameId | int | Frame identifier |
Important
The AI model detects a person irrespective of whether the person is facing towards or away from the camera. The AI model doesn't run face recognition and doesn't emit any biometric information.
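As an illustration of how the line-crossing fields described above might be consumed, the following minimal sketch tallies personLineEvent directions per line name from one message. The function name `count_line_crossings`, the IDs, and the zone name in the sample are made up for this example; only the field names come from the tables above.

```python
import json
from collections import Counter


def count_line_crossings(payload: str) -> Counter:
    """Tally (line name, direction) pairs for personLineEvent events in one message."""
    message = json.loads(payload)
    tally = Counter()
    for event in message.get("events", []):
        if event.get("type") != "personLineEvent":
            continue  # ignore other event types in the same message
        status = event.get("properties", {}).get("status")
        tally[(event.get("zone"), status)] += 1
    return tally


# Hypothetical message: the IDs and the zone name are placeholders.
sample = json.dumps({
    "events": [{
        "id": "e1",
        "type": "personLineEvent",
        "detectionIds": ["d1"],
        "properties": {"trackingId": "d1", "status": "CrossLeft"},
        "zone": "doorcamera",
    }],
    "sourceInfo": {"id": "camera_id"},
    "detections": [],
    "schemaVersion": "1.0",
})
print(count_line_crossings(sample))
```

Keying the tally on both the line name and the direction keeps CrossLeft and CrossRight counts separate, which is usually what an in/out counter needs.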
JSON format for cognitiveservices.vision.spatialanalysis-personcrossingpolygon AI Insights
Sample JSON for detections output by this operation with zonecrossing type SPACEANALYTICS_CONFIG.
{
"events": [
{
"id": "f095d6fe8cfb4ffaa8c934882fb257a5",
"type": "personZoneEnterExitEvent",
"detectionIds": [
"afcc2e2a32a6480288e24381f9c5d00e"
],
"properties": {
"trackingId": "afcc2e2a32a6480288e24381f9c5d00e",
"status": "Enter",
"side": "1"
},
"zone": "queuecamera"
}
],
"sourceInfo": {
"id": "camera_id",
"timestamp": "2020-08-24T06:15:09.680Z",
"width": 608,
"height": 342,
"frameId": "428",
"imagePath": ""
},
"detections": [
{
"type": "person",
"id": "afcc2e2a32a6480288e24381f9c5d00e",
"region": {
"type": "RECTANGLE",
"points": [
{
"x": 0.8135572734631991,
"y": 0.6653949670624315
},
{
"x": 0.9937645761590255,
"y": 0.9925406829655519
}
]
},
"confidence": 0.6267998814582825,
"metadata": {
"attributes": {
"face_mask": 0.99
}
}
}
],
"schemaVersion": "1.0"
}
Sample JSON for detections output by this operation with zonedwelltime type SPACEANALYTICS_CONFIG.
{
"events": [
{
"id": "f095d6fe8cfb4ffaa8c934882fb257a5",
"type": "personZoneDwellTimeEvent",
"detectionIds": [
"afcc2e2a32a6480288e24381f9c5d00e"
],
"properties": {
"trackingId": "afcc2e2a32a6480288e24381f9c5d00e",
"status": "Exit",
"side": "1",
"dwellTime": 7132.0,
"dwellFrames": 20
},
"zone": "queuecamera"
}
],
"sourceInfo": {
"id": "camera_id",
"timestamp": "2020-08-24T06:15:09.680Z",
"width": 608,
"height": 342,
"frameId": "428",
"imagePath": ""
},
"detections": [
{
"type": "person",
"id": "afcc2e2a32a6480288e24381f9c5d00e",
"region": {
"type": "RECTANGLE",
"points": [
{
"x": 0.8135572734631991,
"y": 0.6653949670624315
},
{
"x": 0.9937645761590255,
"y": 0.9925406829655519
}
]
},
"confidence": 0.6267998814582825,
"metadataType": "",
"metadata": {
"groundOrientationAngle": 1.2,
"mappedImageOrientation": 0.3,
"speed": 1.2
}
}
],
"schemaVersion": "1.0"
}
| Event Field Name | Type | Description |
|---|---|---|
| id | string | Event ID |
| type | string | Event type. The value can be either personZoneDwellTimeEvent or personZoneEnterExitEvent |
| detectionIds | array | Array of size 1 containing the unique identifier of the person detection that triggered this event |
| properties | collection | Collection of values |
| trackingId | string | Unique identifier of the person detected |
| status | string | Direction of polygon crossings, either 'Enter' or 'Exit' |
| side | int | The number of the side of the polygon that the person crossed. Each side is a numbered edge between two vertices of the polygon that represents your zone. The edge between the first two vertices of the polygon represents the first side. 'Side' is empty when the event isn't associated with a specific side due to occlusion. For example, an exit occurred when a person disappeared but wasn't seen crossing a side of the zone, or an enter occurred when a person appeared in the zone but wasn't seen crossing a side. |
| dwellTime | float | The number of milliseconds the person spent in the zone. This field is provided when the event type is personZoneDwellTimeEvent |
| dwellFrames | int | The number of frames that the person spent in the zone. This field is provided when the event type is personZoneDwellTimeEvent |
| dwellTimeForTargetSide | float | The number of milliseconds the person spent in the zone while facing the target_side. This field is provided when enable_orientation is True in CAMERACALIBRATOR_NODE_CONFIG and the value of target_side is set in SPACEANALYTICS_CONFIG |
| avgSpeed | float | The average speed of the person in the zone. The unit is foot per second (ft/s) |
| minSpeed | float | The minimum speed of the person in the zone. The unit is foot per second (ft/s) |
| zone | string | The "name" field of the polygon that represents the zone that was crossed |
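Because dwellTime is reported in milliseconds and only on personZoneDwellTimeEvent events, a consumer typically filters by event type before aggregating. The following is a minimal sketch of that idea, assuming the messages have already been parsed into dictionaries; the helper name `dwell_seconds_by_zone` is hypothetical.

```python
from statistics import mean


def dwell_seconds_by_zone(messages):
    """Return {zone name: average dwell time in seconds} from parsed messages."""
    samples = {}
    for message in messages:
        for event in message.get("events", []):
            if event.get("type") != "personZoneDwellTimeEvent":
                continue  # enter/exit events carry no dwellTime
            dwell_ms = event.get("properties", {}).get("dwellTime")
            if dwell_ms is None:
                continue
            # dwellTime is in milliseconds; convert to seconds.
            samples.setdefault(event.get("zone"), []).append(dwell_ms / 1000.0)
    return {zone: mean(values) for zone, values in samples.items()}


# Two trimmed-down messages modeled on the samples above.
messages = [
    {"events": [{"type": "personZoneDwellTimeEvent",
                 "properties": {"status": "Exit", "side": "1",
                                "dwellTime": 7132.0, "dwellFrames": 20},
                 "zone": "queuecamera"}]},
    {"events": [{"type": "personZoneEnterExitEvent",
                 "properties": {"status": "Enter", "side": "1"},
                 "zone": "queuecamera"}]},
]
averages = dwell_seconds_by_zone(messages)
print(averages)
```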
| Detections Field Name | Type | Description |
|---|---|---|
| id | string | Detection ID |
| type | string | Detection type |
| region | collection | Collection of values |
| type | string | Type of region |
| points | collection | Top left and bottom right points when the region type is RECTANGLE |
| groundOrientationAngle | float | The clockwise radian angle of the person's orientation on the inferred ground plane |
| mappedImageOrientation | float | The projected clockwise radian angle of the person's orientation on the 2D image space |
| speed | float | The estimated speed of the detected person. The unit is foot per second (ft/s) |
| confidence | float | Algorithm confidence |
| face_mask | float | The attribute confidence value with range (0-1) indicates the detected person is wearing a face mask |
| face_nomask | float | The attribute confidence value with range (0-1) indicates the detected person is not wearing a face mask |
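To draw a detection on a video frame, the region points have to be turned into pixel coordinates. The sketch below assumes the x and y values are fractions of the frame width and height (the sample values all lie between 0 and 1, but the tables above don't state this explicitly), and uses the width and height from sourceInfo. The function name `region_to_pixels` is hypothetical.

```python
def region_to_pixels(region, frame_width, frame_height):
    """Convert a RECTANGLE region to (left, top, right, bottom) in pixels.

    Assumption: the point coordinates are normalized to the 0-1 range.
    """
    if region.get("type") != "RECTANGLE":
        raise ValueError("only RECTANGLE regions are handled in this sketch")
    top_left, bottom_right = region["points"]
    return (
        round(top_left["x"] * frame_width),
        round(top_left["y"] * frame_height),
        round(bottom_right["x"] * frame_width),
        round(bottom_right["y"] * frame_height),
    )


# Region and frame size taken from the zonecrossing sample above.
region = {
    "type": "RECTANGLE",
    "points": [
        {"x": 0.8135572734631991, "y": 0.6653949670624315},
        {"x": 0.9937645761590255, "y": 0.9925406829655519},
    ],
}
box = region_to_pixels(region, 608, 342)
print(box)
```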
JSON format for cognitiveservices.vision.spatialanalysis-persondistance AI Insights
Sample JSON for detections output by this operation.
{
"events": [
{
"id": "9c15619926ef417aa93c1faf00717d36",
"type": "personDistanceEvent",
"detectionIds": [
"9037c65fa3b74070869ee5110fcd23ca",
"7ad7f43fd1a64971ae1a30dbeeffc38a"
],
"properties": {
"personCount": 5,
"averageDistance": 20.807043981552123,
"minimumDistanceThreshold": 6.0,
"maximumDistanceThreshold": "Infinity",
"eventName": "TooClose",
"distanceViolationPersonCount": 2
},
"zone": "lobbycamera",
"trigger": "event"
}
],
"sourceInfo": {
"id": "camera_id",
"timestamp": "2020-08-24T06:17:25.309Z",
"width": 608,
"height": 342,
"frameId": "1199",
"cameraCalibrationInfo": {
"status": "Calibrated",
"cameraHeight": 12.9940824508667,
"focalLength": 401.2800598144531,
"tiltupAngle": 1.057669997215271
},
"imagePath": ""
},
"detections": [
{
"type": "person",
"id": "9037c65fa3b74070869ee5110fcd23ca",
"region": {
"type": "RECTANGLE",
"points": [
{
"x": 0.39988183975219727,
"y": 0.2719132942065858
},
{
"x": 0.5051516984638414,
"y": 0.6488402517218339
}
]
},
"confidence": 0.948630690574646,
"centerGroundPoint": {
"x": -1.4638760089874268,
"y": 18.29732322692871
},
"metadataType": ""
},
{
"type": "person",
"id": "7ad7f43fd1a64971ae1a30dbeeffc38a",
"region": {
"type": "RECTANGLE",
"points": [
{
"x": 0.5200299714740954,
"y": 0.2875368218672903
},
{
"x": 0.6457497446160567,
"y": 0.6183311060855263
}
]
},
"confidence": 0.8235412240028381,
"centerGroundPoint": {
"x": 2.6310102939605713,
"y": 18.635927200317383
},
"metadataType": ""
}
],
"schemaVersion": "1.0"
}
| Event Field Name | Type | Description |
|---|---|---|
| id | string | Event ID |
| type | string | Event type |
| detectionIds | array | Array of unique identifiers of the person detections that triggered this event |
| properties | collection | Collection of values |
| personCount | int | Number of people detected when the event was emitted |
| averageDistance | float | The average distance between all detected people in feet |
| minimumDistanceThreshold | float | The distance in feet that will trigger a "TooClose" event when people are less than that distance apart. |
| maximumDistanceThreshold | float | The distance in feet that will trigger a "TooFar" event when people are greater than that distance apart. |
| eventName | string | Event name is TooClose when the minimumDistanceThreshold is violated, TooFar when the maximumDistanceThreshold is violated, or unknown when auto-calibration hasn't completed |
| distanceViolationPersonCount | int | Number of people detected in violation of minimumDistanceThreshold or maximumDistanceThreshold |
| zone | string | The "name" field of the polygon that represents the zone that was monitored for distancing between people |
| trigger | string | The trigger type is 'event' or 'interval' depending on the value of trigger in SPACEANALYTICS_CONFIG |
| Detections Field Name | Type | Description |
|---|---|---|
| id | string | Detection ID |
| type | string | Detection type |
| region | collection | Collection of values |
| type | string | Type of region |
| points | collection | Top left and bottom right points when the region type is RECTANGLE |
| confidence | float | Algorithm confidence |
| centerGroundPoint | 2 float values | x, y values with the coordinates of the person's inferred location on the ground in feet. x and y are coordinates on the floor plane, assuming the floor is level. The camera's location is the origin. |
When calculating centerGroundPoint, x is the distance from the camera to the person along a line perpendicular to the camera image plane. y is the distance from the camera to the person along a line parallel to the camera image plane.

For example, a centerGroundPoint of {x: 4, y: 5} means that, looking at the room top-down, there's a person 4 feet away from the camera and 5 feet to the right.
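Since centerGroundPoint values are floor-plane coordinates in feet, the straight-line distance between two detected people can be estimated from them. The sketch below uses the two detections from the sample above and a plain Euclidean distance; whether the container computes distance violations in exactly this way is an assumption, and `pairwise_ground_distances` is a hypothetical helper.

```python
import math
from itertools import combinations


def pairwise_ground_distances(detections):
    """Yield (id_a, id_b, distance in feet) for every pair with a centerGroundPoint."""
    located = [d for d in detections if "centerGroundPoint" in d]
    for a, b in combinations(located, 2):
        pa, pb = a["centerGroundPoint"], b["centerGroundPoint"]
        # Euclidean distance on the floor plane.
        yield a["id"], b["id"], math.hypot(pa["x"] - pb["x"], pa["y"] - pb["y"])


# The two detections from the personDistanceEvent sample, reduced to the needed fields.
detections = [
    {"id": "9037c65fa3b74070869ee5110fcd23ca",
     "centerGroundPoint": {"x": -1.4638760089874268, "y": 18.29732322692871}},
    {"id": "7ad7f43fd1a64971ae1a30dbeeffc38a",
     "centerGroundPoint": {"x": 2.6310102939605713, "y": 18.635927200317383}},
]
pairs = list(pairwise_ground_distances(detections))
for id_a, id_b, feet in pairs:
    print(f"{id_a} <-> {id_b}: {feet:.2f} ft")
```

For these two points the result is roughly 4.1 feet, which is below the minimumDistanceThreshold of 6.0 in the sample and is consistent with the TooClose event name.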
| SourceInfo Field Name | Type | Description |
|---|---|---|
| id | string | Camera ID |
| timestamp | date | UTC date when the JSON payload was emitted |
| width | int | Video frame width |
| height | int | Video frame height |
| frameId | int | Frame identifier |
| cameraCalibrationInfo | collection | Collection of values |
| status | string | The status of the calibration in the format of state[;progress description]. The state can be Calibrating, Recalibrating (if recalibration is enabled), or Calibrated. The progress description is present only in the Calibrating and Recalibrating states and shows the progress of the current calibration process. |
| cameraHeight | float | The height of the camera above the ground in feet. This is inferred from auto-calibration. |
| focalLength | float | The focal length of the camera in pixels. This is inferred from auto-calibration. |
| tiltUpAngle | float | The camera tilt angle from vertical. This is inferred from auto-calibration. |
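The calibration status string follows the state[;progress description] format described above, so a consumer can split it at the first semicolon. This is a minimal sketch; the function name is hypothetical, and the progress text in the second example is a made-up placeholder rather than a real message from the container.

```python
def parse_calibration_status(status):
    """Split 'state[;progress description]' into (state, progress or None)."""
    state, _, progress = status.partition(";")
    progress = progress.strip()
    return state.strip(), (progress or None)


# "Calibrated" appears in the sample above; the progress text below is invented.
print(parse_calibration_status("Calibrated"))
print(parse_calibration_status("Calibrating;placeholder progress text"))
```

A consumer might, for example, hold back distance alerts until the state is Calibrated, because eventName is reported as unknown before auto-calibration completes.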
JSON format for cognitiveservices.vision.spatialanalysis AI Insights
The output of this operation depends on the configured events. For example, if a zonecrossing event is configured for this operation, the output is the same as for cognitiveservices.vision.spatialanalysis-personcrossingpolygon.
Use the output generated by the container
You may want to integrate Spatial Analysis detection or events into your application. Here are a few approaches to consider:
- Use the Azure Event Hub SDK for your chosen programming language to connect to the Azure IoT Hub endpoint and receive the events. See Read device-to-cloud messages from the built-in endpoint for more information.
- Set up Message Routing on your Azure IoT Hub to send the events to other endpoints or save the events to your data storage. See IoT Hub Message Routing for more information.
- Set up an Azure Stream Analytics job to process the events in real time as they arrive and create visualizations.
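Whichever of the approaches above delivers the messages, the application ends up with a JSON message body to interpret. The sketch below shows one possible way to route events to handlers by event type and to resolve detectionIds against the detections array. It deliberately leaves out the transport (for example, an Event Hubs client), and the `dispatch` function and handler signature are hypothetical.

```python
import json


def dispatch(body, handlers):
    """Route each event in one message body to a handler chosen by event type.

    Returns the number of events that were handled.
    """
    message = json.loads(body)
    # Index detections by ID so events can look up the people they refer to.
    detections = {d["id"]: d for d in message.get("detections", [])}
    handled = 0
    for event in message.get("events", []):
        handler = handlers.get(event.get("type"))
        if handler is None:
            continue  # no handler registered for this event type
        people = [detections[i] for i in event.get("detectionIds", []) if i in detections]
        handler(event, people, message.get("sourceInfo", {}))
        handled += 1
    return handled


seen = []
handlers = {
    "personZoneEnterExitEvent": lambda event, people, source: seen.append(
        (source.get("id"), event["zone"], event["properties"]["status"], len(people))
    ),
}

# A trimmed-down copy of the zonecrossing sample message.
body = json.dumps({
    "events": [{
        "id": "f095d6fe8cfb4ffaa8c934882fb257a5",
        "type": "personZoneEnterExitEvent",
        "detectionIds": ["afcc2e2a32a6480288e24381f9c5d00e"],
        "properties": {"trackingId": "afcc2e2a32a6480288e24381f9c5d00e",
                       "status": "Enter", "side": "1"},
        "zone": "queuecamera",
    }],
    "sourceInfo": {"id": "camera_id", "width": 608, "height": 342},
    "detections": [{"type": "person", "id": "afcc2e2a32a6480288e24381f9c5d00e"}],
    "schemaVersion": "1.0",
})
handled = dispatch(body, handlers)
print(handled, seen)
```

Keeping the parsing separate from the transport makes the same handlers usable whether messages arrive from the built-in endpoint, a routed endpoint, or stored files.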
Deploying Spatial Analysis operations at scale (multiple cameras)
To get the best performance and GPU utilization, you can deploy any Spatial Analysis operation on multiple cameras by using graph instances. The following sample runs the cognitiveservices.vision.spatialanalysis-personcrossingline operation on fifteen cameras.
"properties.desired": {
"globalSettings": {
"PlatformTelemetryEnabled": false,
"CustomerTelemetryEnabled": true
},
"graphs": {
"personzonelinecrossing": {
"operationId": "cognitiveservices.vision.spatialanalysis-personcrossingline",
"version": 1,
"enabled": true,
"sharedNodes": {
"shared_detector0": {
"node": "PersonCrossingLineGraph.detector",
"parameters": {
"DETECTOR_NODE_CONFIG": "{ \"gpu_index\": 0, \"batch_size\": 7, \"do_calibration\": true}"
}
},
"shared_calibrator0": {
"node": "PersonCrossingLineGraph/cameracalibrator",
"parameters": {
"CAMERACALIBRATOR_NODE_CONFIG": "{ \"gpu_index\": 0, \"do_calibration\": true, \"enable_zone_placement\": true}",
"CALIBRATION_CONFIG": "{\"enable_recalibration\": true, \"quality_check_frequency_seconds\": 86400}"
}
}
},
"parameters": {
"VIDEO_DECODE_GPU_INDEX": 0,
"VIDEO_IS_LIVE": true
},
"instances": {
"1": {
"sharedNodeMap": {
"PersonCrossingLineGraph/detector": "shared_detector0",
"PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0"
},
"parameters": {
"VIDEO_URL": "<Replace RTSP URL for camera 1>",
"VIDEO_SOURCE_ID": "camera 1",
"SPACEANALYTICS_CONFIG": "{\"zones\":[{\"name\":\"queue\",\"polygon\":[[0,0],[1,0],[0,1],[1,1],[0,0]]}]}"
}
},
"2": {
"sharedNodeMap": {
"PersonCrossingLineGraph/detector": "shared_detector0",
"PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0"
},
"parameters": {
"VIDEO_URL": "<Replace RTSP URL for camera 2>",
"VIDEO_SOURCE_ID": "camera 2",
"SPACEANALYTICS_CONFIG": "<Replace the zone config value, same format as above>"
}
},
"3": {
"sharedNodeMap": {
"PersonCrossingLineGraph/detector": "shared_detector0",
"PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0"
},
"parameters": {
"VIDEO_URL": "<Replace RTSP URL for camera 3>",
"VIDEO_SOURCE_ID": "camera 3",
"SPACEANALYTICS_CONFIG": "<Replace the zone config value, same format as above>"
}
},
"4": {
"sharedNodeMap": {
"PersonCrossingLineGraph/detector": "shared_detector0",
"PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0"
},
"parameters": {
"VIDEO_URL": "<Replace RTSP URL for camera 4>",
"VIDEO_SOURCE_ID": "camera 4",
"SPACEANALYTICS_CONFIG": "<Replace the zone config value, same format as above>"
}
},
"5": {
"sharedNodeMap": {
"PersonCrossingLineGraph/detector": "shared_detector0",
"PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0"
},
"parameters": {
"VIDEO_URL": "<Replace RTSP URL for camera 5>",
"VIDEO_SOURCE_ID": "camera 5",
"SPACEANALYTICS_CONFIG": "<Replace the zone config value, same format as above>"
}
},
"6": {
"sharedNodeMap": {
"PersonCrossingLineGraph/detector": "shared_detector0",
"PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0"
},
"parameters": {
"VIDEO_URL": "<Replace RTSP URL for camera 6>",
"VIDEO_SOURCE_ID": "camera 6",
"SPACEANALYTICS_CONFIG": "<Replace the zone config value, same format as above>"
}
},
"7": {
"sharedNodeMap": {
"PersonCrossingLineGraph/detector": "shared_detector0",
"PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0"
},
"parameters": {
"VIDEO_URL": "<Replace RTSP URL for camera 7>",
"VIDEO_SOURCE_ID": "camera 7",
"SPACEANALYTICS_CONFIG": "<Replace the zone config value, same format as above>"
}
},
"8": {
"sharedNodeMap": {
"PersonCrossingLineGraph/detector": "shared_detector0",
"PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0"
},
"parameters": {
"VIDEO_URL": "<Replace RTSP URL for camera 8>",
"VIDEO_SOURCE_ID": "camera 8",
"SPACEANALYTICS_CONFIG": "<Replace the zone config value, same format as above>"
}
},
"9": {
"sharedNodeMap": {
"PersonCrossingLineGraph/detector": "shared_detector0",
"PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0"
},
"parameters": {
"VIDEO_URL": "<Replace RTSP URL for camera 9>",
"VIDEO_SOURCE_ID": "camera 9",
"SPACEANALYTICS_CONFIG": "<Replace the zone config value, same format as above>"
}
},
"10": {
"sharedNodeMap": {
"PersonCrossingLineGraph/detector": "shared_detector0",
"PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0"
},
"parameters": {
"VIDEO_URL": "<Replace RTSP URL for camera 10>",
"VIDEO_SOURCE_ID": "camera 10",
"SPACEANALYTICS_CONFIG": "<Replace the zone config value, same format as above>"
}
},
"11": {
"sharedNodeMap": {
"PersonCrossingLineGraph/detector": "shared_detector0",
"PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0"
},
"parameters": {
"VIDEO_URL": "<Replace RTSP URL for camera 11>",
"VIDEO_SOURCE_ID": "camera 11",
"SPACEANALYTICS_CONFIG": "<Replace the zone config value, same format as above>"
}
},
"12": {
"sharedNodeMap": {
"PersonCrossingLineGraph/detector": "shared_detector0",
"PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0"
},
"parameters": {
"VIDEO_URL": "<Replace RTSP URL for camera 12>",
"VIDEO_SOURCE_ID": "camera 12",
"SPACEANALYTICS_CONFIG": "<Replace the zone config value, same format as above>"
}
},
"13": {
"sharedNodeMap": {
"PersonCrossingLineGraph/detector": "shared_detector0",
"PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0"
},
"parameters": {
"VIDEO_URL": "<Replace RTSP URL for camera 13>",
"VIDEO_SOURCE_ID": "camera 13",
"SPACEANALYTICS_CONFIG": "<Replace the zone config value, same format as above>"
}
},
"14": {
"sharedNodeMap": {
"PersonCrossingLineGraph/detector": "shared_detector0",
"PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0"
},
"parameters": {
"VIDEO_URL": "<Replace RTSP URL for camera 14>",
"VIDEO_SOURCE_ID": "camera 14",
"SPACEANALYTICS_CONFIG": "<Replace the zone config value, same format as above>"
}
},
"15": {
"sharedNodeMap": {
"PersonCrossingLineGraph/detector": "shared_detector0",
"PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0"
},
"parameters": {
"VIDEO_URL": "<Replace RTSP URL for camera 15>",
"VIDEO_SOURCE_ID": "camera 15",
"SPACEANALYTICS_CONFIG": "<Replace the zone config value, same format as above>"
}
}
}
}
}
}
| Name | Type | Description |
|---|---|---|
| batch_size | int | If all of the cameras have the same resolution, set batch_size to the number of cameras that will be used in that operation. Otherwise, set batch_size to 1 or leave it at the default (1), which means batching is not used. |
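The fifteen instance entries in the sample above differ only in their number, RTSP URL, source ID, and zone configuration, so they can be generated rather than written by hand. The following is a minimal sketch of that idea. The `build_instances` helper, the camera dictionary keys (`url`, `source_id`, `zones_config`), and the example URLs are hypothetical; the parameter names and the sharedNodeMap entries are copied from the sample.

```python
import json


def build_instances(cameras, shared_node_map):
    """Build the "instances" object: one numbered entry per camera."""
    instances = {}
    for index, camera in enumerate(cameras, start=1):
        instances[str(index)] = {
            "sharedNodeMap": dict(shared_node_map),
            "parameters": {
                "VIDEO_URL": camera["url"],
                "VIDEO_SOURCE_ID": camera["source_id"],
                # SPACEANALYTICS_CONFIG is a JSON document embedded as a string.
                "SPACEANALYTICS_CONFIG": json.dumps(camera["zones_config"]),
            },
        }
    return instances


shared_node_map = {
    "PersonCrossingLineGraph/detector": "shared_detector0",
    "PersonCrossingLineGraph/cameracalibrator": "shared_calibrator0",
}
# Placeholder cameras: replace the URLs and zone configs with real values.
zones_config = {"zones": [{"name": "queue",
                           "polygon": [[0, 0], [1, 0], [0, 1], [1, 1], [0, 0]]}]}
cameras = [
    {"url": f"rtsp://example.invalid/camera{n}",
     "source_id": f"camera {n}",
     "zones_config": zones_config}
    for n in range(1, 16)
]
instances = build_instances(cameras, shared_node_map)
print(json.dumps(instances["1"], indent=2))
```

Serializing with `json.dumps` also avoids hand-escaping the embedded configuration strings and rules out trailing commas in the generated manifest.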