April 2018

Volume 33 Number 4

[Machine Learning]

Sensors in Sports: Analyzing Human Movement with AI

By Kevin Ashley

In the future, athletes will likely be able to open their phones and ask a simple question: “What do I need to do to improve my skills?” We’re still taking early steps in sports AI toward answering that fundamental question, but we hope that the productivity that Microsoft tools and research bring will one day make this an everyday scenario. In many sports, it’s difficult for the human eye to observe all the movements an athlete makes during the course of an activity, but sensors can record even unobservable data. And by applying machine learning (ML) to this data, the athlete and coach can learn and improve based on precise measurements and analytics. The instrumented athlete is becoming the new competitive advantage.

If the current trend continues, in a few years most sports equipment sold in stores will have a smart sensor embedded. Electronics are becoming smaller, lighter and more flexible, and it’s likely we’ll see them embedded in fabrics, shoes, skis, tennis racquets and other types of smart gear. You’ll be able to apply technology and skills learned in Internet of Things (IoT), mobile apps, Microsoft Azure and ML to sports.

To make adopting this technology easier, we’ve created an open source Sensor Kit, with components to process, measure, analyze and improve sensor measurements of athletic performance. Over time, our aim is to evolve this community Sensor Kit together with electronics and sports equipment companies, and sports associations and enthusiasts. The Sensor Kit and all examples for this article are available at bit.ly/2CmQhzq. This includes code samples for R, Python, C#, Xamarin and Azure Cosmos DB. Figure 1 shows the Winter Sports mobile app, which showcases the use of Sensor Kit and is available for download at winter-sports.co.

Figure 1 The Winter Sports Mobile App with Sensor Kit Integration, Illustrating the Forces Impacting a Skier

The recent dramatic increase in compute power, reliability and affordability of sensor-equipped hardware has made many scenarios newly viable. And advances in applications of AI on sensor signals produced by athletes deliver new ways to understand and improve athletic performance. For example, an athlete’s sensor signals provide interpretable “activity signatures,” as shown in Figure 2, which allow sports analytics to go beyond gross activity tracking and aggregates, and to measure key elements of a skill or activity. From acceleration generated through a specific turn, to directional g-forces engendered during each millisecond, the analytics of sports is being redefined. In this article, we detail how we detect these activity signatures; in this case, the activity in question is turns made while skiing or snowboarding.

Figure 2 Activity Signature for Turns in Skiing or Snowboarding

Using Azure Artificial Intelligence (AI) in the cloud, we ingest the sensor data, transform it for rich analytics, and tap ML to extract even more useful information for the athlete and coach. ML models make it possible to classify expertise level at each skill execution, and perhaps even predict the progress and future performance of an athlete at an upcoming competitive event. The beauty of shared data standards is that they allow athletes to benchmark against themselves or their community to understand differences, weak points and advantages. And with new advances in the ability to implement AI at the edge, we can push activity recognition, predictive model scoring and performance measures out to the device, for rapid availability to the athlete directly or in a mixed reality display. We hope this set of open source resources nurtures and accelerates innovation in the sports community.

Sensor Kit Overview

The Sensor Kit is a set of open source cross-platform data productivity tools, including working code samples for data ingestion, analysis and ML, as well as sensor hardware reference designs. The Kit helps sports scientists, coaches and athletes capture an athlete’s movement with millisecond precision, and ML models evaluate the data for movement analysis. The Kit is designed to help equipment manufacturers, data and sports scientists, and coaches interested in modern methods of sports science, including ML and AI. Figure 3 gives a high-level view of the elements of the Sensor Kit. In the sections that follow, we describe many elements of the Sensor Kit, from hardware, data ingestion and data transformation, to analytics, ML and presentation.

Figure 3 Sensor Kit for Sports Applications

The Kit is designed to allow the sports enthusiast or professional to use some or all of its components. For example, an analyst may simply want a way to ingest the sensor data signals and transform that data into an analytics-ready format. Data pipeline code samples are available for several typical raw data formats. Or the analyst may want to go further and leverage logic or code to transform the data into consumable analytics reports or predictive ML models. We offer several code sets that recognize specific athletic activities and the performance measures of those activities, for example, turns and the acceleration realized out of each turn. The Sensor Kit SDK is written with cross-platform development in mind as a .NET Standard 2.0 C# library; it compiles with Xamarin, so you can use it in Android, iOS and Windows apps.

Sensor Hardware

Figure 4 shows the Sensor Kit hardware. What is a sensor? Hardware manufacturers usually refer to sensors as single-purpose measuring devices. In this article, a sensor is an inertial measurement unit (IMU) device powerful enough to take and process data from multiple sensory inputs, store the results and transmit them back to the host or a gateway. The sensor system consists of:

  • An IMU capable of providing 9-DOF (degrees of freedom) accelerometer, gyro and magnetometer data at 40-100Hz.
  • Bluetooth Low Energy (BLE) or other means of wireless communication.
  • Storage adequate for aggregate or raw sensor data.
  • Other sensors, such as those for proximity, location, and so forth.

Figure 4 Sensor Kit Hardware

The Data Model and Pipeline

As the saying goes, a journey of a thousand miles begins with a single step. And so we start with in-depth code examples to ingest, transform and prepare the data for later analysis and presentation. We describe two foundational elements here: designing the data structure, or “data model,” and designing the pipeline that moves data from the device to storage and then transforms and presents the information to the athlete and coach.

Let’s start with the data model. The Sensor Kit has three modes of data aggregation—summary, event level and raw logging:

  • Summary data is pre-aggregated data on the sensor. In most retail sensor scenarios, this is a relatively small payload that’s transferred with each dataset.
  • Event-level data is triggered by specific events, such as turns, jumps and so forth. Event-level data may have several hundred or thousand records per session. These events are based on predefined activity signatures derived from the sensor data. They’re described in more detail in the “Detecting Activity Signatures from Sensor Data” section.
  • Raw logging is best for recording data at high frequency, typically the top frequency the sensor can provide—40-100Hz or more.

Depending on the sample rate of the sensor, aggregation of the raw logging data may be necessary to allow collection of data from the device in near-real time. In broader consumer mass market scenarios, producing, storing and transmitting so much granular data can be excessive, so we defined a standard mode in the Sensor Kit to transmit slightly aggregated data to reduce data throughput requirements. If you need all the detailed data, the Sensor Kit allows you to enable verbose logging using the SetLogging(true) call.
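As a hypothetical illustration of what “slightly aggregated” can mean (the window size and function name here are our own assumptions, not the Sensor Kit’s actual scheme), the following Python sketch averages consecutive windows of raw samples to reduce throughput:

```python
def aggregate_samples(samples, window=10):
    """Average consecutive windows of raw samples to reduce throughput.

    samples: list of (ax, ay, az) tuples at the sensor's raw rate.
    window: number of raw samples per aggregated sample; a window of
    10 turns a 100Hz raw stream into a 10Hz aggregated stream.
    """
    aggregated = []
    for i in range(0, len(samples) - window + 1, window):
        chunk = samples[i:i + window]
        n = len(chunk)
        # zip(*chunk) regroups the tuples into per-axis sequences
        aggregated.append(tuple(sum(axis) / n for axis in zip(*chunk)))
    return aggregated
```

A trade-off to note: averaging preserves mean load but flattens short spikes, which is why the Kit keeps verbose logging available when full detail matters.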

Connecting to Sensors

Let’s start the description of the data pipeline with the sensor itself. Initializing the Sensor Kit is easy: the app simply calls the Init method.

The Sensor Kit works in both pull and push modes; in pull mode the app needs to explicitly call the Sync method on a sensor, while push mode automatically registers for sensor notifications. To enable push updates, you can call SetAutoUpdates(true). To subscribe to sensor updates, use the Subscribe method:

await Task.Run(async () =>
  await sensor.Instance.Subscribe());

The Sensor Kit consumes data from sensors, provides methods for time synchronization and sends the data to the cloud. Time synchronization is important, especially when athletes can have multiple sensors attached, and the Sensor Kit automatically resolves the timestamp on hardware devices to the time on the host device with the Sensor Kit-enabled app. The method to store data in the cloud is up to the app developer; the library provides a Cosmos DB connector and Azure containers for convenience, as shown in Figure 5.
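As an illustration of the clock-offset idea behind that synchronization (our own sketch, not the Sensor Kit’s implementation), a device timestamp can be mapped onto the host clock like this:

```python
def resolve_timestamps(device_timestamps, device_time_at_sync, host_time_at_sync):
    """Map device-clock timestamps onto the host clock.

    At sync time we read both clocks; their difference is a constant
    offset (ignoring drift) that we apply to every device timestamp.
    All times are in milliseconds.
    """
    offset = host_time_at_sync - device_time_at_sync
    return [t + offset for t in device_timestamps]
```

With multiple sensors, repeating this per device puts all streams on the single host timeline, which is what makes cross-sensor analysis possible.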

Figure 5 The Sensor Kit Library in Visual Studio

The Sensor Kit delivers some transformed data events and aggregates from the device itself. The following item schemas describe the taxonomy of these data elements:

  • SensorItem: Single-sensor data item; items can be of any duration or granularity
  • SensorTurnData: Aggregated data for turns
  • SensorAirData: Aggregated data for jumps
  • SensorSummaryData: Summary data aggregated per sensor
  • SensorRawData: High-frequency raw data (for example, 100Hz)
  • UserData*: User-level information (optional, app specific)
  • TeamData*: Team-level data for teams of athletes (optional, app specific)
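To make these shapes concrete, here is a hypothetical Python sketch of two of the schemas; the field names are illustrative assumptions on our part, and the authoritative definitions live in the Sensor Kit repository:

```python
from dataclasses import dataclass

@dataclass
class SensorTurnData:
    # Aggregated data for a single detected turn (illustrative fields)
    timestamp_ms: int   # time of the turn apex, on the host clock
    duration_ms: int    # transition-to-transition duration
    peak_g: float       # peak g-force magnitude through the turn

@dataclass
class SensorSummaryData:
    # Per-sensor summary aggregated on the device (illustrative fields)
    sensor_id: str
    turns: int          # total turns detected in the session
    max_g: float        # maximum g-force recorded in the session
```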

Storing Sensor Data in Cosmos DB

Of course, there are many options when it comes to loading data into the cloud. Cosmos DB is great for IoT and telemetry data, as shown in Figure 6, and provides multiple APIs for loading and querying data, as well as scalability and global distribution.

Figure 6 Cosmos DB with Sensor Kit Documents

The Sensor Kit includes a Cosmos DB connector and Azure Functions for storage containers, which you can find at bit.ly/2GEB5Mk. You can easily update data from the sensors with the Sensor Kit connected to Cosmos DB by using the following method:

await SensorKit.AzureConnectorInstance.InsertUserSensorDataAsync(userSensorData);

Athletes can have multiple sensors, and the Sensor Kit aggregates sensor data at the athlete level and updates Cosmos DB with new data. Once the data is in Cosmos DB, it’s easy to query that data via multiple interfaces, both SQL and non-SQL. For example, you can use Microsoft Power BI to create a specialized coach’s view of the data from the sensors of every athlete on the team, or use the mobile app to present the data. The following query returns summary data from each sensor found via the Sensor Kit, as shown in Figure 7:


SELECT * FROM Items.sensorSummary c

Figure 7 Cosmos DB Query Results for Sensor Kit Summary Data

Once the data is uploaded to Azure, you can use ML to train your models or process the data.

Now that you know how the data gets into the cloud, let’s focus on the logical part of sensor data analysis and, more specifically, the analysis for the sports domain. From working with coaches, we determined that they’re interested in detecting events, such as turns, and load from g-forces experienced by athletes during runs. Let’s see how we can parse collected data to detect skiing turns.

Detecting Activity Signatures from Sensor Data

The main trajectory of the skier is aligned with the movement of the center of mass, so we placed a sensor at the middle of the pelvis, inside a pocket in the ski jacket. We received data from both an accelerometer and a gyroscope. Using information from the sensor allowed us to analyze the athlete’s movement and define an activity signature.

Our data consists of accelerometer and gyro values stored in a sample text file (bit.ly/2GJkk2w). Our 9-DOF sensors give us 3D acceleration and angular-velocity vectors from the accelerometer and gyroscope, respectively, sampled at approximately 100Hz. To analyze the data, we load it into RStudio. You can use our parsing code, which is available at bit.ly/2GLc5mG, and load our sample files.

Working with 9-DOF sensors requires thorough calibration of the sensor, which is a very tedious procedure. Our goal in this case was simply to calculate the number of turns, which doesn’t require performing a precise movement analysis along a specific axis. For this article, and to simplify our calculations, we’re using the magnitude of the acceleration and angular velocity.

Because our experiment involves detecting turns, we’re interested in the data only while the athlete is moving. When the athlete stands still, the accelerometer shows an almost flat line, while the angular velocity might still be changing. When actual movement starts, the amplitude of the acceleration and gyro values changes rapidly, so the standard deviation increases. As Figure 8 shows, we can define the activity starting point near the beginning of the data, where accelerometer values exceed a fixed threshold, and the ending point toward the end of the data, where they fall below that threshold.

Figure 8 The Start and End of Acceleration Activity

It’s well known that accelerometer data is very noisy, so we use a moving average to obtain a smoother signal. We then define the starting point as the moment when the moving standard deviation exceeds a threshold percentage of the signal’s average value:

a_smooth1 <- SlidingAvg(a, lag, threshold, influence)
st <- which(a_smooth1$std > thresholdPct * a_smooth1$avg)
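The SlidingAvg helper used above isn’t listed in the article; as a sketch of the underlying idea, here is a hypothetical Python equivalent that computes a trailing moving average and standard deviation and flags the activity start (function names and edge behavior are our own assumptions):

```python
import statistics

def sliding_stats(signal, lag):
    """Trailing-window moving average and population standard deviation."""
    avg, std = [], []
    for i in range(len(signal)):
        window = signal[max(0, i - lag + 1):i + 1]
        avg.append(statistics.fmean(window))
        std.append(statistics.pstdev(window))
    return avg, std

def activity_start(signal, lag, threshold_pct):
    """First index where the moving std exceeds threshold_pct of the moving mean."""
    avg, std = sliding_stats(signal, lag)
    for i, (a, s) in enumerate(zip(avg, std)):
        if a > 0 and s > threshold_pct * a:
            return i
    return None  # no activity detected
```

A flat signal never trips the threshold, while the onset of rapid amplitude changes does, matching the behavior described for Figure 8.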

One way to detect peaks in the data is to use the property that the magnitude of a peak must be greater than that of its immediate neighbors. Here’s how we smooth the accelerometer and gyro magnitudes and count their peaks:

lag       <- 30
threshold <- 1.5
influence <- 0.5
res <- SmoothAndFindPeaks(df$magnitudeW, df$magnitudeA, lag, threshold, influence)
print(paste("Total peaks in A = ", length(res$pks_a), " peaks in W = ", length(res$pks_w)))

which results in:

[1] "Calculation for activity start = 3396 end = 4239"
[1] "Total peaks in A = 29 peaks in W = 22"

These results are illustrated in Figure 9.

Figure 9 Finding Peaks in Accelerometer and Gyro Values

Observing athletes while they turn shows that during the transition phase of each turn the magnitude drops to a lower value, while at the apex of the turn it reaches its maximum. This means that if we count peaks in the “moving” segment of the data, we get the desired number of turns. The gyro data is also much cleaner, so we use the angular-velocity magnitude to find the peaks that mark turns. Afterward, we need to discard peaks that are too close to each other.
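The neighbor-comparison peak test and the minimum-separation cleanup can be sketched in Python as follows (a hypothetical illustration of the technique; the article’s R code relies on its own SmoothAndFindPeaks helper):

```python
def find_peaks(signal, min_separation=1):
    """Indices of local maxima, keeping only the taller of any two peaks
    that fall within min_separation samples of each other."""
    # A peak's magnitude must exceed both of its immediate neighbors
    peaks = [i for i in range(1, len(signal) - 1)
             if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]]
    filtered = []
    for p in peaks:
        if filtered and p - filtered[-1] < min_separation:
            # Too close to the previous peak: keep the taller one
            if signal[p] > signal[filtered[-1]]:
                filtered[-1] = p
        else:
            filtered.append(p)
    return filtered
```

Running this over the angular-velocity magnitude in the moving segment would yield a turn count in the spirit of the R results above.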

To get low-noise data that’s stable over time, we could use a complementary filter that combines the long-term stability of the accelerometer with the short-term accuracy of the gyroscope.
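As a sketch of such a filter, the following Python fragment fuses a gyro rate stream with accelerometer-derived angles. The blend factor alpha is an assumed tuning parameter, and the function is our own illustration rather than Sensor Kit code:

```python
def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98):
    """Fuse gyro angular rate (deg/s) with accelerometer angle (deg).

    Each step trusts the gyro-integrated angle by alpha (short-term
    accuracy) and the accelerometer angle by 1 - alpha (long-term
    stability), giving a low-noise estimate that doesn't drift.
    """
    angle = accel_angles[0]  # initialize from the accelerometer
    estimates = [angle]
    for rate, acc_angle in zip(gyro_rates[1:], accel_angles[1:]):
        angle = alpha * (angle + rate * dt) + (1 - alpha) * acc_angle
        estimates.append(angle)
    return estimates
```

The high alpha means the gyro dominates moment to moment, while the small accelerometer contribution continuously pulls the estimate back toward the true orientation.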

Now that we have R code for turn detection, we can build an ML training model in a process similar to the one that’s described in a Machine Learning Blog post at bit.ly/2EPosSa.

Measuring Athlete Load with G-Forces

Athletic load in a workout goes beyond simple aggregate measures of distance traveled or total activities completed. A key aspect of understanding an athlete’s stress and workout quality involves measuring the load on the athlete. There are several components to athletic load. One of the most important involves the g-forces generated and experienced by the athlete, which create forces the athlete needs to control over the course of an activity like a ski turn, as well as jerks and snaps, generated by the terrain or the athlete’s own motion, that must be accommodated.

Humans have limits related to the g-forces they can tolerate. These limits depend on the amount of time the stress is experienced and whether it’s a low intensity over a long duration or a high intensity for short durations. And these limits depend as well on the direction in which that g-force is felt. For example, humans are much more capable of tolerating a high vertical-direction g-force (the “z” axis) than a high lateral-direction g-force (the “y” axis), with its stress on the neck, back and joints. Fortunately, there have been extensive field studies of g-force effects and limits from the aviation industry and from NASA that we can leverage to measure and characterize g-force and g-force tolerances. Let’s delve into g-forces on the human body, how we measure them, and how to represent the load they create on the athlete.

G-Force Calculations from Acceleration Sensor Measures Using the Pythagorean theorem, we calculate directionless g-force on the human body from our sensor acceleration measures as follows:

Directionless G-Force = Math.sqrt(AccelerationX^2 + AccelerationY^2 + AccelerationZ^2)

G-force in a given direction is calculated by dividing acceleration by 9.81, assuming acceleration is measured in meters per second squared.  For example, Y-direction g-force is calculated as follows:

Y G-Force = AccelerationY / 9.81

Of course, an important component of the g-force load on the athlete is the time period during which it’s felt.

G-Force Maximum Tolerances by Duration For athletic activity, a useful comparison is the g-force maximums that the human body can tolerate in each direction. And while these g-forces reflect maximums for the center of gravity, rather than for a particular joint, they offer a useful way to express the g-forces experienced by an athlete as a percentage of these maximums. Given that we’re able to measure g-force durations at the sample rate of the sensor, anywhere from 10 samples per second (10Hz) to 100 samples per second (100Hz), we can characterize the athlete’s g-force load as a percentage of maximum for any given duration.
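As a sketch of that characterization, the following Python fragment expresses a measured g-force as a percentage of a duration-dependent maximum. The tolerance table holds illustrative placeholder values only, not actual NASA or aviation limits; real numbers should come from the published studies cited later in this article:

```python
# Illustrative (made-up) tolerance curve: max sustainable g by duration.
# Replace with values from published aviation/NASA tolerance studies.
TOLERANCE_BY_DURATION_S = [
    (0.1, 20.0),   # very short spikes: higher tolerance
    (1.0, 10.0),
    (10.0, 6.0),
    (60.0, 4.0),   # sustained load: lower tolerance
]

def pct_of_max(g_force, duration_s):
    """Express a g-force held for duration_s as a percentage of the
    (illustrative) maximum tolerable g-force for that duration."""
    for max_duration, max_g in TOLERANCE_BY_DURATION_S:
        if duration_s <= max_duration:
            return 100.0 * g_force / max_g
    # Longer than the table covers: compare to the last (lowest) limit
    return 100.0 * g_force / TOLERANCE_BY_DURATION_S[-1][1]
```

A real implementation would also make the table direction-specific, since vertical and lateral tolerances differ as described above.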

Python Code to Calculate G-Forces As noted earlier, the formula just requires the Acceleration X, Y and Z measures from the sensor. And you need to understand the units in which acceleration is measured, typically either feet per second squared or meters per second squared. From these elements, we can calculate g-force in a specific direction, as well as the directionless g-force experienced overall by the athlete’s body. Thanks to our partner XSens, we use high-precision sensors to record movement data at 100 samples per second (100Hz) and load it into Azure.

Figure 10 shows a capture from the sensors mapped to a 3D animation. As you can see, the skier is experiencing g-force in the Y-direction, as well as in the X-direction as he travels down the hill. Calculating these forces, in combination with the acceleration achieved coming out of that turn, allows athletes to better understand their performance at navigating that g-force through the turn.

Figure 10 G-Force Load Is Different Along Different Axes, as Visualized by XSens Software

Figure 11 presents the Python code for calculating g-force.

Figure 11 Python Code for Calculating G-Force

# G-Force, using Python 3.5+ with numpy and pandas
import numpy as np
import pandas as pd

# Directionless g-force: sqrt(AccX^2 + AccY^2 + AccZ^2) (Pythagorean theorem)
# 1 g = 9.80665 m/s^2; dividing acceleration in m/s^2 by this yields g-force.
# Our data is in feet per second squared, so we convert with 1 ft/s^2 = 0.3048 m/s^2.
G_conversion = 9.80665
MperS_conversion = 0.3048

# 'dataset' is the input dataframe with AccX, AccY and AccZ columns (ft/s^2).
# Replace NaNs with zeroes to avoid math errors.
dataset = dataset.fillna(0)

# Convert the acceleration columns from ft/s^2 to m/s^2
dataset["AccX_mtrpersecsqrd"] = dataset["AccX"] * MperS_conversion
dataset["AccY_mtrpersecsqrd"] = dataset["AccY"] * MperS_conversion
dataset["AccZ_mtrpersecsqrd"] = dataset["AccZ"] * MperS_conversion

# Generate the directionless g-force measure, 'DirectionlessGG' (in g)
dataset["DirectionlessGG"] = np.sqrt(
  dataset["AccX_mtrpersecsqrd"]**2 +
  dataset["AccY_mtrpersecsqrd"]**2 +
  dataset["AccZ_mtrpersecsqrd"]**2) / G_conversion

# Generate direction-specific g-force: 'X_GG', 'Y_GG' and 'Z_GG' (in g)
dataset["X_GG"] = dataset["AccX_mtrpersecsqrd"] / G_conversion
dataset["Y_GG"] = dataset["AccY_mtrpersecsqrd"] / G_conversion
dataset["Z_GG"] = dataset["AccZ_mtrpersecsqrd"] / G_conversion

You’ll find the script to calculate g-force, as well as additional calculations, including g-force relative to maximums, in our GitHub repo at bit.ly/2BPS6nA. To read more about the physics of g-forces and their impact on humans, take a look at the “Beyond Velocity and Acceleration: Jerk, Snap and Higher Derivatives” article from the European Journal of Physics at bit.ly/2FvLkTD, which describes g-forces in the context of the roller-coaster experience. And for an understanding of human tolerances and limits of g-forces, refer to the Wikipedia article at bit.ly/2EPQDjE, as well as to the NASA collection of research at go.nasa.gov/2oyS9fj. More information on calculating g-force is available at bit.ly/2Fzxpfa.

Wrapping Up

We used custom-built and partner-made sensors to collect athlete data, and illustrated the use of our open source Sensor Kit that connects sensors with mobile apps and Azure Cosmos DB. We explained how to process that data with statistical tools, such as R, to extract “activity signatures” that describe the turns athletes make while skiing. Finally, we explained how to use data from sensors to calculate an athlete’s load from g-forces using Python.

Kevin Ashley is an architect evangelist at Microsoft. He’s coauthor of “Professional Windows 8 Programming” (Wrox, 2012) and a developer of top apps and games, most notably Active Fitness (activefitness.co). He often presents on technology at various events, industry shows and webcasts. In his role, he works with startups and partners, advising on software design, business and technology strategy, architecture, and development. Follow him on Twitter: @kashleytwit.

Olga Vigdorovich is a database administrator, data scientist and an avid skier. She built the data model and back end for scalable cloud platforms based on Microsoft Azure, including Winter Sports, for Active Fitness at Summit Data Corp.

Patty Ryan is an applied data scientist for Microsoft. She codes with its partners and customers to tackle tough problems using machine learning approaches, with sensor, text and vision data. Follow her on Twitter: @singingdata.

Thanks to the following Microsoft technical expert who reviewed this article: Mona Soliman Habib
