Customer lifetime value (CLV) prediction (Preview)
Predict the potential value (revenue) that individual active customers will bring to your business over a defined future time period. This feature can help you achieve various goals:
- Identify high-value customers and process this insight
- Create strategic customer segments based on their potential value to run personalized campaigns with targeted sales, marketing, and support efforts
- Guide product development by focusing on features that increase customer value
- Optimize sales or marketing strategy and allocate budget more accurately for customer outreach
- Recognize and reward high-value customers through loyalty or rewards programs
Before getting started, reflect on what CLV means for your business. Currently, we support transaction-based CLV prediction. The predicted value of a customer is based on the history of business transactions. To create the prediction, you need at least Contributor permissions.
Since configuring and running a CLV model doesn't take much time, consider creating several models with varying input preferences and comparing the results to see which model scenario best fits your business needs.
The following data is required, and where marked optional, recommended for increased model performance. The more data the model can process, the more accurate the prediction will be. Therefore, we encourage you to ingest more customer activity data, if available.
Customer Identifier: Unique identifier to match transactions to an individual customer
Transaction History: Historical transactions log with the following semantic data schema
- Transaction ID: Unique identifier of each transaction
- Transaction date: Date, preferably a time stamp of each transaction
- Transaction amount: Monetary value (for example, revenue or profit margin) of each transaction
- Label assigned to returns (optional): Boolean value signifying whether the transaction is a return
- Product ID (optional): Product ID of product involved in the transaction
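As an illustration only, the transaction history schema above could be modeled as a small record type. The class and field names below are hypothetical and are not the attribute names the product expects; they show which fields are required and which are optional.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    """One row of a transaction history entity (hypothetical field names)."""
    customer_id: str                    # Customer identifier
    transaction_id: str                 # Unique identifier of the transaction
    timestamp: datetime                 # Transaction date, ideally a full time stamp
    amount: float                       # Monetary value, e.g. revenue or profit margin
    is_return: Optional[bool] = None    # Optional: True if the transaction is a return
    product_id: Optional[str] = None    # Optional: product involved in the transaction


# A record that carries only the required fields.
example = Transaction(
    customer_id="C-001",
    transaction_id="T-1001",
    timestamp=datetime(2021, 3, 14, 9, 30),
    amount=59.90,
)
```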
Additional data (optional), for example
- Web activities: website visit history, email history
- Loyalty activities: loyalty reward points accrual and redemption history
- Customer service log, service call, complaint, or return history
Data about customer activities (optional):
- Activity identifiers to distinguish activities of the same type
- Customer identifiers to map activities to your customers
- Activity information containing the name and date of the activity
- The semantic data schema for activities includes:
- Primary key: A unique identifier for an activity
- Timestamp: The date and time of the event identified by the primary key
- Event (activity name): The name of event you want to use
- Details (amount or value): Details about the customer activity
Suggested data characteristics:
- Sufficient historical data: At least one year of transactional data. Preferably two to three years of transactional data to predict CLV for one year.
- Multiple purchases per customer: Ideally, at least two to three transactions per customer ID, preferably across multiple dates.
- Number of customers: At least 100 unique customers, preferably more than 10,000 customers. The model will fail with fewer than 100 customers and insufficient historical data.
- Data completeness: Less than 20% missing values in required fields in the input data
- The model requires the transaction history of your customers. Only one transaction history entity can be configured currently. If there are multiple purchase/transaction entities, you can union them in Power Query before data ingestion.
- For additional customer activity data (optional), however, you can add as many customer activity entities as you'd like for consideration by the model.
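Before ingesting data, it can help to check a transaction extract against the suggested characteristics above. The following is a minimal sketch, not part of the product: the function name and the dictionary keys are hypothetical, and it assumes the 20% missing-value threshold applies across all required fields combined, which the guidance doesn't spell out.

```python
from collections import Counter
from datetime import date


def check_transaction_history(rows, required=("customer_id", "date", "amount")):
    """Summarize transaction dicts against the suggested data characteristics."""
    customers = Counter(
        r["customer_id"] for r in rows if r.get("customer_id") is not None
    )
    dates = [r["date"] for r in rows if r.get("date") is not None]
    history_days = (max(dates) - min(dates)).days if dates else 0

    # Share of missing values across all required fields (assumed interpretation).
    cells = len(rows) * len(required)
    missing = sum(1 for r in rows for f in required if r.get(f) is None)
    missing_share = missing / cells if cells else 1.0

    repeat_customers = sum(1 for n in customers.values() if n >= 2)
    return {
        "unique_customers": len(customers),
        "enough_customers": len(customers) >= 100,
        "history_days": history_days,
        "enough_history": history_days >= 365,
        "repeat_customer_share": repeat_customers / len(customers) if customers else 0.0,
        "missing_share": missing_share,
        "complete_enough": missing_share < 0.20,
    }


# Tiny illustrative extract: too few customers, but enough history.
sample = [
    {"customer_id": "c1", "date": date(2020, 1, 5), "amount": 40.0},
    {"customer_id": "c1", "date": date(2021, 3, 9), "amount": 25.0},
    {"customer_id": "c2", "date": date(2021, 2, 1), "amount": None},
]
report = check_transaction_history(sample)
```

A report like this only flags likely problems early; the model's own validation remains the authority on whether a run succeeds.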
Create a Customer Lifetime Value prediction
In audience insights, go to Intelligence > Predictions.
Select the Customer lifetime value tile and select Use model.
In the Customer lifetime value (preview) pane, select Get started.
Provide a name for this model and an Output entity name to distinguish them from other models or entities.
Define model preferences
Set a Prediction time period to define how far into the future you want to predict the CLV.
By default, the unit is set as months. You can change it to years to look further in the future.
To accurately predict CLV for the time period you set, you need a comparable period of historical data. For example, if you want to predict CLV for the next 12 months, we recommend that you have at least 18 to 24 months of historical data.
Specify what Active customers mean for your business. Set the time frame in which a customer must have had at least one transaction to be considered active. The model will only predict CLV for active customers.
- Let model calculate purchase interval (recommended): The model analyzes your data and determines a time period based on historical purchases.
- Set interval manually: If you have a specific business definition of an active customer, choose this option and set the time period accordingly.
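To make the manual option concrete, here is a minimal sketch of an "active customer" filter: a customer counts as active when at least one transaction falls inside the chosen window. The function name and the `window_days` parameter are hypothetical, and this does not reproduce how the model calculates the purchase interval automatically.

```python
from datetime import date, timedelta


def active_customers(transactions, as_of, window_days):
    """Return IDs of customers with at least one transaction in the window.

    transactions: iterable of (customer_id, transaction_date) pairs.
    window_days: manually chosen interval length (hypothetical parameter).
    """
    cutoff = as_of - timedelta(days=window_days)
    return {cid for cid, tx_date in transactions if cutoff < tx_date <= as_of}


history = [
    ("alice", date(2021, 6, 1)),
    ("alice", date(2019, 1, 15)),
    ("bob", date(2020, 1, 10)),
]
# With a 90-day window ending June 30, 2021, only alice has a recent purchase.
active = active_customers(history, as_of=date(2021, 6, 30), window_days=90)
```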
Define the percentile of High-value customers to enable the model to provide results that fit your business definition.
- Model calculation (recommended): The model analyzes your data and determines what a high-value customer might be for your business based on your customers’ transaction history. The model uses a heuristic rule (inspired by the 80/20 rule, or Pareto principle) to find the proportion of high-value customers. The customers who contributed 80% of your business's cumulative revenue in the historical period are considered high-value customers. Typically, fewer than 30-40% of customers contribute 80% of cumulative revenue. However, this number might vary depending on your business and industry.
- Percent of top active customers: Define high-value customers for your business as a percentile of top active paying customers. For example, you can use this option to define high-value customers as the top 20% of future paying customers.
If your business defines high-value customers in a different way, let us know. We would love to hear about it.
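The 80/20-style heuristic described for the model calculation option can be sketched as follows: rank customers by historical revenue and find the smallest share of them whose combined revenue reaches 80% of the total. This is an illustration under stated assumptions, not the model's actual implementation; the function name and parameters are hypothetical.

```python
def high_value_share(revenue_by_customer, revenue_target=0.80):
    """Smallest share of customers whose combined revenue reaches the target."""
    revenues = sorted(revenue_by_customer.values(), reverse=True)
    total = sum(revenues)
    if total <= 0:
        return 0.0
    running = 0.0
    for count, revenue in enumerate(revenues, start=1):
        running += revenue
        if running >= revenue_target * total:
            return count / len(revenues)
    return 1.0


# Hypothetical historical revenue per customer.
revenue = {"a": 600.0, "b": 250.0, "c": 50.0, "d": 50.0, "e": 50.0}
share = high_value_share(revenue)
```

In this made-up data set, two of five customers account for 85% of revenue, so the heuristic would treat the top 40% of customers as high-value.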
Select Next to proceed to the next step.
Add required data
In the Required data step, select Add data for Customer transaction history and choose the entity that provides the transaction history information as described in the prerequisites.
Map the semantic fields to attributes within your purchase history entity and select Next.
If the fields below aren't populated, configure the relationship from your purchase history entity to the Customer entity and select Save.
- Select the transaction history entity.
- Select the field that identifies a customer in the purchase history entity. It needs to relate to the primary customer ID of your Customer entity.
- Select the entity that matches your primary customer entity.
- Enter a name that describes the relationship.
Add optional data
Data reflecting key customer interactions (like web, customer service, and event logs) adds context to transaction records. More patterns found in your customer activity data can improve the accuracy of the predictions.
In the Additional data (optional) step, select Add data. Choose the customer activity entity that provides the customer activity information as described in the prerequisites.
Map the semantic fields to attributes within your customer activity entity and select Next.
Select an activity type that matches the type of customer activity you're adding. Choose from existing activity types or add a new activity type.
Configure the relationship from your customer activity entity to the Customer entity.
- Select the field that identifies the customer in the customer activity table. It can be directly related to the primary customer ID of your Customer entity.
- Select the Customer entity that matches your primary Customer entity.
- Enter a name that describes the relationship.
Add more data if there are other customer activities you want to include.
Set update schedule
In the Data update schedule step, choose the frequency to retrain your model based on the latest data. This setting is important for keeping predictions accurate as new data is ingested in audience insights. Most businesses can retrain once per month and get good accuracy for their prediction.
Review and run the model configuration
In the Review your model details step, validate the configuration of the prediction. You can go back to any part of the prediction configuration by selecting Edit under the shown value. You can also select a configuration step from the progress indicator.
If all values are configured correctly, select Save and run to start running the model. On the My predictions tab, you can see the status of the prediction process. The process may take several hours to complete depending on the amount of data used in the prediction.
Review prediction status and results
Review prediction status
- Go to Intelligence > Predictions and select the My predictions tab.
- Select the prediction you want to review.
- Prediction name: Name of the prediction provided when creating it.
- Prediction type: Type of model used for the prediction
- Output entity: Name of the entity to store the output of the prediction. Go to Data > Entities to find the entity with this name.
- Predicted field: This field is populated only for some types of predictions, and isn't used in customer lifetime value prediction.
- Status: Status of the prediction run.
- Queued: Prediction is waiting for other processes to complete.
- Refreshing: Prediction is currently running to create results that will flow into the output entity.
- Failed: Prediction run has failed. Review the logs for more details.
- Succeeded: Prediction has succeeded. Select View under the vertical ellipses to review the prediction results.
- Edited: The date the configuration for the prediction was changed.
- Last refreshed: The date the prediction refreshed results in the output entity.
Review prediction results
Go to Intelligence > Predictions and select the My predictions tab.
Select the prediction you want to review results for.
There are three primary sections of data within the results page.
Training model performance: A, B, or C are possible grades. This grade indicates the performance of the prediction and can help you make the decision to use the results stored in the output entity. Select Learn about this score to better understand the underlying model performance metrics and how the final model performance grade was derived.
Using the definition of high-value customers provided while configuring the prediction, the system assesses how the AI model performed in predicting the high-value customers as compared to a baseline model.
Grades are determined based on the following rules:
- A when the model accurately predicted at least 5% more high-value customers as compared to the baseline model.
- B when the model accurately predicted between 0-5% more high-value customers as compared to the baseline model.
- C when the model accurately predicted fewer high-value customers as compared to the baseline model.
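The grading rules above can be expressed as a small function. This sketch assumes the input is the relative improvement of the AI model over the baseline, in percent, and that the boundaries are inclusive at 0 and 5; the source doesn't state how exact boundary values are handled, and the function name is hypothetical.

```python
def model_grade(improvement_pct):
    """Map improvement over the baseline (in percent) to a grade of A, B, or C."""
    if improvement_pct >= 5:
        return "A"   # at least 5% more high-value customers predicted accurately
    if improvement_pct >= 0:
        return "B"   # between 0% and 5% more
    return "C"       # fewer than the baseline


grades = [model_grade(x) for x in (12.0, 3.5, -2.0)]
```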
The Model rating pane shows further details about the AI model performance and the baseline model. The baseline model uses a non-AI based approach to calculate customer lifetime value based primarily on historical purchases made by customers.
The standard formula used to calculate CLV by the baseline model:
CLV for each customer = Average monthly purchase made by the customer in the active customer window * Number of months in the CLV prediction period * Overall retention rate of all customers
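As a worked example of the baseline formula, the sketch below multiplies the three terms directly. The function and parameter names are hypothetical, and the input values are made up for illustration.

```python
def baseline_clv(avg_monthly_purchase, prediction_months, retention_rate):
    """Baseline CLV = average monthly purchase * months in period * retention rate."""
    return avg_monthly_purchase * prediction_months * retention_rate


# A customer averaging 50 per month, a 12-month prediction period,
# and an overall retention rate of 75% across all customers.
clv = baseline_clv(avg_monthly_purchase=50.0, prediction_months=12, retention_rate=0.75)
```

Here the baseline estimate works out to 50 * 12 * 0.75 = 450 for the prediction period.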
The AI model is compared to the baseline model based on two model performance metrics.
Success rate in predicting high-value customers
See the difference in predicting high-value customers using the AI model compared to the baseline model. For example, 84% success rate means that out of all the high-value customers in the training data, the AI model was able to accurately capture 84%. We then compare this success rate with the success rate of the baseline model to report the relative change. This value is used to assign a grade to the model.
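The success rate described above behaves like a recall measure: of the customers who actually turned out to be high-value, what fraction did the model also flag? The following is a sketch under that reading. The function names are hypothetical, and the relative-change formula is an assumption about how the comparison with the baseline is reported.

```python
def success_rate(actual_high_value, predicted_high_value):
    """Fraction of actual high-value customers that were also predicted as such."""
    actual = set(actual_high_value)
    if not actual:
        return 0.0
    return len(actual & set(predicted_high_value)) / len(actual)


def relative_change_pct(model_rate, baseline_rate):
    """Relative change of the model's rate versus the baseline, in percent (assumed form)."""
    if baseline_rate == 0:
        raise ValueError("baseline_rate must be non-zero")
    return (model_rate - baseline_rate) / baseline_rate * 100


actual = {"c1", "c2", "c3", "c4"}
ai_rate = success_rate(actual, {"c1", "c2", "c3", "c9"})        # 3 of 4 captured
baseline_rate = success_rate(actual, {"c1", "c2", "c7", "c8"})  # 2 of 4 captured
change = relative_change_pct(ai_rate, baseline_rate)
```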
Another metric lets you review the overall performance of the model in terms of error in predicting future values. We use the overall Root Mean Squared Error (RMSE) metric to assess this error. RMSE is a standard way to measure the error of a model in predicting quantitative data. The AI model’s RMSE is compared to the RMSE of the baseline model and the relative difference is reported.
The AI model prioritizes the accurate ranking of customers according to the value they bring to your business. So only the success rate of predicting high-value customers is used to derive the final model grade. The RMSE metric is sensitive to outliers. In scenarios where you have a small percentage of customers with extraordinarily high purchase values, the overall RMSE metric might not give the full picture of the model performance.
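To see why RMSE is sensitive to outliers, consider this minimal sketch. The `rmse` helper is a standard textbook implementation rather than the product's code, and the values are invented for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


actual = [100.0, 120.0, 90.0, 110.0]
predicted = [110.0, 110.0, 100.0, 100.0]
typical_error = rmse(actual, predicted)

# One customer with an extraordinarily high purchase value dominates the metric.
with_outlier = rmse(actual + [10000.0], predicted + [6000.0])
```

Every typical customer is off by only 10, yet adding a single extreme customer pushes the overall RMSE into the thousands, which is why the grade relies on the success rate instead.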
Value of customers by percentile: Using your definition of high-value customers, customers are grouped into low-value and high-value, based on their CLV predictions, and shown in a chart. By hovering over the bars in the histogram, you can see the number of customers in each group and the average CLV of that group. This data can help if you want to create segments of customers based on their CLV predictions.
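The grouping shown in the chart can be approximated offline as follows. This sketch assumes high-value customers are defined as a top share of customers ranked by predicted CLV; the function name, the `top_share` parameter, and the rounding choice are hypothetical.

```python
import math


def summarize_value_groups(predicted_clv, top_share=0.20):
    """Split customers into high/low groups and report count and average CLV."""
    ranked = sorted(predicted_clv.values(), reverse=True)
    n_high = math.ceil(len(ranked) * top_share) if ranked else 0
    groups = {"high": ranked[:n_high], "low": ranked[n_high:]}
    return {
        name: {
            "customers": len(values),
            "average_clv": sum(values) / len(values) if values else 0.0,
        }
        for name, values in groups.items()
    }


# Ten hypothetical customers with predicted CLV of 100, 200, ..., 1000.
predictions = {f"c{i}": 100.0 * i for i in range(1, 11)}
summary = summarize_value_groups(predictions, top_share=0.20)
```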
Most influential factors: Various factors are considered when creating your CLV prediction based on the input data provided to the AI model. Each factor has its importance calculated for the aggregated predictions a model creates. You can use these factors to help validate your prediction results. They also provide more insight into what contributed most to predicting CLV across all your customers.
It's possible to optimize, troubleshoot, refresh, or delete predictions. Review an input data usability report to find out how to make a prediction faster and more reliable. For more information, see Manage predictions.