Customer Lifetime Value Prediction with PyMC-Marketing | by Hajime Takeda

Explore the Depths of Buy-till-You-Die (BTYD) Modeling and Practical Coding Techniques

Hajime Takeda
Towards Data Science
Photo by Boxed Water Is Better on Unsplash

TL;DR: Customer Lifetime Value (CLV) modeling is a key technique in customer analytics that helps companies identify who their valuable customers are. Neglecting CLV can lead to overinvestment in short-term customers who may only make a single purchase. ‘Buy Till You Die’ (BTYD) modeling, which combines the BG/NBD and Gamma-Gamma models, can estimate CLV. Although best practices vary with data size and modeling priorities, PyMC-Marketing is a recommended Python library for those looking to implement CLV modeling quickly.

The definition of CLV is the total net revenue a company can expect from a single customer throughout their relationship. Some of you might be more familiar with the term ‘LTV’ (Lifetime Value). Yes, CLV and LTV are interchangeable.

Image by Author
  • The first goal is to calculate and predict future CLV, which will help you find out how much money can be expected from each customer.
  • The second objective is to identify profitable customers. The model will tell you who those valuable customers are by analyzing the characteristics of the high CLV customers.
  • The third goal is to take marketing actions based on the analysis and from there, you will be able to optimize your marketing budget allocation accordingly.
Image by Author

Let’s take the e-commerce site of a fashion brand like Nike, for example, which might use advertisements and coupons to attract new customers. Now, let’s assume that college students and working professionals are its two major customer segments. For first-time purchases, the company spends $10 on advertising per college student and $20 per working professional, and both segments make first purchases worth around $100.

If you were in charge of marketing, which segment would you want to invest more in? You might naturally think it’s more logical to invest more in the college students segment, considering their lower cost and higher ROI.

Image by author, with photos used from Pixabay

So, what if you knew this information?

The college student segment tends to have a high churn rate, meaning they don’t purchase anymore after that one purchase, resulting in $100 being spent on average. On the other hand, the working professionals segment has a higher rate of repeat purchases, resulting in an average of $400 per customer.

In that case, you would likely prefer to invest more in the working professionals segment, as it promises a higher ROI. This may seem like a simple thing that anyone can understand. Surprisingly, however, most marketing people focus on hitting a Cost Per Acquisition (CPA) target without considering who the profitable customers are in the long run.
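The comparison above can be checked with a few lines of arithmetic. This is a minimal sketch that uses the dollar figures from the example; the `roi` helper and its definition, (revenue - cost) / cost, are assumptions made for illustration, and revenue is treated as if it were profit.

```python
# Figures from the example above: acquisition cost and revenue per customer.
segments = {
    "college students": {"cpa": 10, "first_purchase": 100, "lifetime_revenue": 100},
    "working professionals": {"cpa": 20, "first_purchase": 100, "lifetime_revenue": 400},
}


def roi(revenue, cost):
    """Return on investment, defined here as (revenue - cost) / cost (an assumption)."""
    return (revenue - cost) / cost


for name, s in segments.items():
    short_term = roi(s["first_purchase"], s["cpa"])
    long_term = roi(s["lifetime_revenue"], s["cpa"])
    print(f"{name}: first-purchase ROI = {short_term:.1f}, lifetime ROI = {long_term:.1f}")
```

Judged on the first purchase alone, college students look better; once repeat purchases are counted, the ranking flips.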

Image by author, with photos used from Pixabay

By adjusting the cost per acquisition (CPA), we can attract more high-value customers and improve our ROI. The graph on the left represents the approach without considering CLV. The red line represents the CPA, which is the maximum cost we can spend to acquire a new customer. Using the same CPA for every customer leads to overinvestment in low-value customers and underinvestment in high-value customers.

Now, the graph on the right side shows the ideal spending allocation when utilizing CLV. We set a higher CPA for high-value customers, and a lower CPA for low-value customers.

Image by author, with photos used from Pixabay

It’s similar to the hiring process. If you aim to hire ex-Googlers, offering a competitive salary is essential, right? By doing this, we can acquire more high-value customers without changing the total marketing budget.

The CLV model I’m introducing only uses sales transaction data. As you can see, we have three data columns: customer_id, transaction date, and transaction value. In terms of data volume, CLV models typically require two to three years of transaction data.

Image by Author

4.1 Approaches for CLV Modeling

Let’s start by understanding the two broad approaches to calculating CLV: the Historical Approach and the Predictive Approach. Under the Predictive Approach, there are two kinds of models: Probabilistic Models and Machine Learning Models.

Image by Author

4.2 Traditional CLV Formula

First, let’s start by considering a traditional CLV formula. Here, CLV can be broken down into three components: average order value, purchase frequency, and customer lifespan.

Image by Author

Let’s consider a fashion company for example, on average:

  • Customers spend $100 per order
  • They shop 4 times per year
  • They stay loyal for 3 years

In this case, the CLV is calculated as 100 times 4 times 3, which equals $1,200 per customer. This formula is very simple and looks straightforward, right? However, it has some limitations.
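As a quick check, the traditional formula can be written as a one-line function. This is only an illustrative sketch; the function name `traditional_clv` is made up for this example.

```python
def traditional_clv(average_order_value, purchases_per_year, lifespan_years):
    """Traditional CLV: average order value x purchase frequency x customer lifespan."""
    return average_order_value * purchases_per_year * lifespan_years


# The fashion company example: $100 per order, 4 orders per year, 3 years.
clv_value = traditional_clv(100, 4, 3)
print(clv_value)  # 1200
```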

4.3 Limitations of Traditional CLV Formula

Image by Author

Limitation #1: Not All Customers Are The Same

This traditional formula assumes that all customers are homogeneous by assigning one average number. When some customers make exceptionally large purchases, the average doesn’t represent the characteristics of all customers.

Limitation #2 : Differences in First Purchase Timing

Let’s say, we use the last 12 months as our data collection period.

Image by author, with photos used from Pixabay

This man made his first purchase about a year ago. In this case, we can accurately calculate his purchase frequency per year. It’s 8.

How about two other customers? One started purchasing 6 months ago, and the other began 3 months ago. All three have been buying at the same pace. However, when we look at the total number of purchases over the past year, the counts differ. The key point here is that we need to consider the tenure of each customer, meaning the time elapsed since their first purchase.
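The effect of tenure can be illustrated with a small calculation. This is a sketch under stated assumptions: the purchase counts for the 6-month and 3-month customers (4 and 2) are not given in the text and are chosen here only so that all three buy at the same pace of 8 purchases per year.

```python
def annualized_frequency(purchases, tenure_months):
    """Purchases per year, normalized by how long the customer has been around."""
    return purchases / (tenure_months / 12)


# (purchases in the last 12 months, tenure in months); B and C are assumed values.
customers = {"A": (8, 12), "B": (4, 6), "C": (2, 3)}

for name, (purchases, tenure) in customers.items():
    rate = annualized_frequency(purchases, tenure)
    print(f"customer {name}: {purchases} purchases observed, {rate:.1f} per year once tenure is considered")
```

The raw counts differ (8, 4, and 2), yet the tenure-adjusted frequency is the same for all three.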

Limitation #3 : Dead or Alive?

Determining when a customer is considered “churned” is tricky. For subscription services like Netflix, we can consider a customer to have churned once they unsubscribe. However, in the case of retail or E-commerce, whether a customer is ‘Alive’ or ‘Dead’ is ambiguous.

A customer’s ‘Probability of Being Alive’ depends on their past purchasing patterns. For example, if someone who normally buys every month doesn’t make a purchase in the next three months, they might switch to a different brand. However, there’s no need to worry if a person who typically shops only once every six months doesn’t buy anything in the next three months.

Image by author, with photos used from Pixabay

To address these challenges, we often turn to ‘Buy Till You Die’ (BTYD) modeling. This approach comprises two sub-models:

  1. BG/NBD model: This predicts the likelihood of a customer being active and their transaction frequency.
  2. Gamma-Gamma model: This estimates the average order value.

By combining the results from these sub-models, we can effectively forecast the Customer Lifetime Value (CLV).

Image by Author

5.1 BG/NBD model

We believe that there are two processes in the customer’s status: the ‘Purchase Process,’ where customers are actively buying, and the ‘Dropout Process,’ where customers have stopped purchasing.

During the Active Purchasing Phase, the model forecasts the customer’s purchase frequency with the “Poisson process”.

There’s always a chance that a customer might drop out after each purchase. The BG/NBD model assigns a probability ‘p’ to this possibility.

Consider the image below for illustration. The data indicates that this customer made five purchases. Under the model’s assumptions, the customer would have made eight purchases in total had they remained active. But because the customer is assumed to have dropped out at some point, we only see five actual purchases.

Image by Author

The purchase frequency follows a Poisson process while they are considered ‘active’. The Poisson distribution typically represents the count of randomly occurring events. Here, ‘λ’ symbolizes the purchase frequency for each customer. However, the customer’s purchase frequency can fluctuate. The Poisson distribution accounts for such variability in purchase frequency.

Image by Author; Graph sourced from Wikipedia
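The purchase and dropout processes described above can be simulated for a single customer. This is a minimal sketch, not the library's implementation: the function name `simulate_customer` and the values of `lam`, `p`, and `T` are hypothetical, and the sketch assumes the customer is active at time zero and can only drop out right after a purchase.

```python
import numpy as np


def simulate_customer(lam, p, T, rng):
    """Simulate one customer's repeat-purchase times within an observation window of length T.

    lam: purchase rate while active (purchases per unit of time)
    p:   probability of dropping out right after each purchase
    """
    times = []
    t = 0.0
    while True:
        # Waiting times between purchases of a Poisson process are exponential.
        t += rng.exponential(1.0 / lam)
        if t > T:
            break  # the next purchase would fall outside the observation window
        times.append(t)
        if rng.random() < p:
            break  # the customer drops out after this purchase
    return times


rng = np.random.default_rng(42)
purchases = simulate_customer(lam=0.2, p=0.1, T=52.0, rng=rng)
print(len(purchases), "repeat purchases observed")
```

Running the function many times with the same `lam` and `p` shows how an early dropout truncates a purchase history that would otherwise have continued.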

The graph below illustrates how the probability of being alive changes over time. As the time since the last purchase increases (around T=31), the probability that the customer is still ‘alive’ decreases. When a repurchase occurs (around T=36), the probability increases once again.

Image by Author
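For reference, the BG/NBD model has a closed-form expression for the probability of being alive, which makes the pattern in the graph easy to reproduce. The sketch below assumes the standard expression for a customer with x repeat purchases, the last at time t_x, observed until time T; the parameter values are hypothetical and not fitted to any data in this article.

```python
def prob_alive(x, t_x, T, r, alpha, a, b):
    """P(alive) under BG/NBD for x repeat purchases, the last at t_x, observed until T."""
    if x == 0:
        # Without a repeat purchase the customer has had no chance to drop out.
        return 1.0
    ratio = (alpha + T) / (alpha + t_x)
    return 1.0 / (1.0 + (a / (b + x - 1)) * ratio ** (r + x))


# Hypothetical population-level parameters, chosen only for illustration.
params = dict(r=0.24, alpha=4.41, a=0.79, b=2.43)

# Five repeat purchases, the last at t=30, and then silence as time passes.
for T in (31, 33, 35):
    print(T, round(prob_alive(x=5, t_x=30, T=T, **params), 3))

# A sixth purchase at t=36 pushes the probability back up.
print(36, round(prob_alive(x=6, t_x=36, T=36, **params), 3))
```

The probability falls while no purchase is observed and jumps as soon as a new purchase arrives.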

This is the graphical model. As mentioned earlier, it includes lambda (λ) and p, both of which vary from person to person. To account for this diversity, we assume that heterogeneity in λ follows a gamma distribution and heterogeneity in p follows a beta distribution. In other words, this model uses a layered approach informed by Bayes’ theorem, which is also called Bayesian hierarchical modeling.
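The heterogeneity assumption can be made concrete by drawing a λ and a p for a handful of customers. This is an illustrative sketch only: the hyperparameter names and values (`r`, `alpha`, `a`, `b`) are hypothetical, and the gamma distribution is parameterized here by shape `r` and rate `alpha`, which NumPy expresses as `scale = 1 / alpha`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_customers = 5

# Hypothetical population-level hyperparameters.
r, alpha = 0.24, 4.41  # gamma shape and rate for the purchase rate λ
a, b = 0.79, 2.43      # beta parameters for the dropout probability p

lam = rng.gamma(shape=r, scale=1.0 / alpha, size=n_customers)  # one λ per customer
p = rng.beta(a, b, size=n_customers)                            # one p per customer

for i in range(n_customers):
    print(f"customer {i}: purchase rate = {lam[i]:.4f}, dropout probability = {p[i]:.3f}")
```

Each customer gets their own purchase rate and dropout probability, while the four hyperparameters describe the population as a whole.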

Image by Author

5.2 Gamma-Gamma model

We assume that the average order value follows a Gamma distribution. The Gamma distribution is governed by two parameters: the shape parameter and the scale parameter. As this graph shows, the form of the Gamma distribution can change quite a bit as these two parameters change.

Image by Author; Graph sourced from Wikipedia

This diagram illustrates the graphical model in use. The model employs two Gamma distributions within a Bayesian hierarchical approach. The first Gamma distribution represents the “average order value” for each customer. Since this value differs among customers, the second Gamma distribution captures the variation in average order value across the entire customer base. The parameters p, q, and γ (gamma) for the prior distributions are determined by using Half-flat priors.
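One consequence of this hierarchical structure is that a customer's expected average order value is a weighted blend of their own observed average and the population mean. The sketch below assumes the standard Gamma-Gamma conditional expectation with q > 1; the function name and the parameter values are hypothetical.

```python
def expected_average_spend(m_x, x, p, q, gamma):
    """Expected average order value for a customer with x repeat purchases averaging m_x.

    Assumes the standard Gamma-Gamma result, which requires q > 1.
    """
    population_mean = gamma * p / (q - 1)
    weight = p * x / (p * x + q - 1)  # grows toward 1 as more purchases are observed
    return (1 - weight) * population_mean + weight * m_x


# Hypothetical parameter values, for illustration only.
p, q, gamma = 6.25, 3.74, 15.44

for x in (0, 1, 5, 50):
    print(x, round(expected_average_spend(m_x=100.0, x=x, p=p, q=q, gamma=gamma), 2))
```

With few purchases the estimate stays close to the population mean; with many purchases it moves toward the customer's own average.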

Image by Author

Useful CLV libraries

Here, let me introduce two great OSS libraries for CLV modeling. The first one is PyMC-Marketing and the second is CLVTools. Both libraries incorporate Buy-till-you-die modeling. The most significant difference is that PyMC-Marketing is a Python-based library, while CLVTools is R-based. PyMC-Marketing is built on PyMC, a popular Bayesian library. Previously, there was a well-known library called ‘Lifetimes’. However, ‘Lifetimes’ is now in maintenance mode, so its development has transitioned to PyMC-Marketing.

Full code

The full code can be found on my GitHub below. My sample code is based on PyMC-Marketing’s official quick start.

Code Walkthrough

First, you will need to import pymc_marketing and other libraries.

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
from arviz.labels import MapLabeller

from IPython.display import Image
from pymc_marketing import clv

You will need to download the “Online Retail Dataset” from the “UCI Machine Learning Repository”. This dataset contains transactional data from a UK-based online retailer and is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

import requests
import zipfile
import os

# Download the zip file
url = ""
response = requests.get(url)
filename = ""

# Save the downloaded archive to disk
with open(filename, 'wb') as file:
    file.write(response.content)

# Unzip the file
with zipfile.ZipFile(filename, 'r') as zip_ref:
    zip_ref.extractall("online_retail_data")

# Finding the Excel file name
for file in os.listdir("online_retail_data"):
    if file.endswith(".xlsx"):
        excel_file = os.path.join("online_retail_data", file)

# Load the Excel file into a DataFrame
data_raw = pd.read_excel(excel_file)


Data Cleansing

A quick data cleansing is needed. For instance, we need to handle return orders, filter out records without a customer ID, and create a ‘total sales’ column by multiplying the quantity and unit price together.

# Handling Return Orders
# Extracting rows where InvoiceNo starts with "C"
cancelled_orders = data_raw[data_raw['InvoiceNo'].astype(str).str.startswith("C")].copy()

# Work on a copy and negate 'Quantity' so each return matches the quantity of its original purchase
cancelled_orders['Quantity'] = -cancelled_orders['Quantity']

# Merge the original DataFrame with the return rows on the columns we want to match
merged_data = pd.merge(data_raw, cancelled_orders[['CustomerID', 'StockCode', 'Quantity', 'UnitPrice']],
                       on=['CustomerID', 'StockCode', 'Quantity', 'UnitPrice'],
                       how='left', indicator=True)

# Filter out rows where the merge found a match, and also filter out the original return orders
data_raw = merged_data[(merged_data['_merge'] == 'left_only') & (~merged_data['InvoiceNo'].astype(str).str.startswith("C"))]

# Drop the indicator column
data_raw = data_raw.drop(columns=['_merge'])

# Selecting relevant features and calculating total sales
features = ['CustomerID', 'InvoiceNo', 'InvoiceDate', 'Quantity', 'UnitPrice', 'Country']
data = data_raw[features].copy()  # copy so the column assignments below do not act on a view
data['TotalSales'] = data['Quantity'].multiply(data['UnitPrice'])

# Removing transactions with missing customer IDs as they don't contribute to individual customer behavior
data = data[data['CustomerID'].notna()]
data['CustomerID'] = data['CustomerID'].astype(int).astype(str)

Image by Author

Then, we need to create a summary table using this ‘clv_summary’ function. The function returns the dataframe in an RFM-T format. RFM-T means Recency, Frequency, Monetary, and Tenure of each customer. These metrics are popular in shopper analysis.

data_summary_rfm = clv.utils.clv_summary(data, 'CustomerID', 'InvoiceDate', 'TotalSales')
data_summary_rfm = data_summary_rfm.rename(columns={'CustomerID': 'customer_id'})
data_summary_rfm.index = data_summary_rfm['customer_id']
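To make the RFM-T columns concrete, here is a hand-rolled version on a toy transaction table. This is a sketch under the usual BTYD conventions (frequency counts repeat purchase days, recency and T are measured from the first purchase, and monetary value averages repeat purchases only); the library function may differ in details such as the time unit. The column names and data are made up for this example.

```python
import pandas as pd

# Toy transaction log (hypothetical data).
transactions = pd.DataFrame({
    "customer_id": ["a", "a", "a", "b", "c", "c"],
    "date": pd.to_datetime([
        "2011-01-01", "2011-02-01", "2011-03-01",
        "2011-02-15", "2011-01-10", "2011-03-20",
    ]),
    "sales": [50.0, 30.0, 40.0, 20.0, 10.0, 70.0],
})
observation_end = pd.Timestamp("2011-03-31")

grouped = transactions.groupby("customer_id")
first = grouped["date"].min()
last = grouped["date"].max()

rfm = pd.DataFrame({
    "frequency": grouped["date"].nunique() - 1,  # number of repeat purchase days
    "recency": (last - first).dt.days,           # days between first and last purchase
    "T": (observation_end - first).dt.days,      # days between first purchase and end of observation
})

# Monetary value: average sales over repeat purchases only (0 if there are none).
is_repeat = transactions["date"] > transactions["customer_id"].map(first)
repeat_mean = transactions[is_repeat].groupby("customer_id")["sales"].mean()
rfm["monetary_value"] = repeat_mean.reindex(rfm.index).fillna(0.0)

print(rfm)
```

Customer b has made a single purchase, so frequency, recency, and monetary value are all zero while T still records how long ago that purchase was.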

BG/NBD model

The BG/NBD model is available as the BetaGeoModel class in this library. When you call its fit() method, the model begins training.

When you execute bgm.fit_summary(), the system provides a statistical summary of the learning process. For example, this table shows the mean, standard deviation, and highest density interval (HDI) for each parameter. We can also check the r_hat value, which helps assess whether a Markov Chain Monte Carlo (MCMC) simulation has converged. R-hat is commonly considered acceptable if it’s 1.1 or less, and values closer to 1.0 indicate better convergence.

bgm = clv.BetaGeoModel(data=data_summary_rfm)
bgm.fit()
bgm.fit_summary()

The matrix below is called the Probability Alive Matrix. With this, we can infer which users are likely to return and which are not. The x-axis represents the customer’s historical purchase frequency and the y-axis represents the customer’s recency. The color shows the probability of being alive. Our new customers are in the bottom-left corner: low frequency and high recency. Those customers have a high probability of being alive. Our loyal customers are those on the bottom-right: high frequency and high recency. If they don’t purchase for a long time, loyal customers become at-risk customers, who have a low probability of being alive.

Image by Author

The next thing we can do is predict future transactions for each customer. Having fit the model, we can use the expected_num_purchases function to ask for the expected number of purchases in the next period.

num_purchases = bgm.expected_num_purchases(

sdata = data_summary_rfm.copy()
sdata["expected_purchases"] = num_purchases.mean(("chain", "draw")).values

Gamma-Gamma model

Next, we will move on to the Gamma-Gamma model to predict the average order value. We can predict the expected “average order value” with the expected_customer_spend function.

nonzero_data = data_summary_rfm.query("frequency>0")
dataset = pd.DataFrame({
    'customer_id': nonzero_data.customer_id,
    'mean_transaction_value': nonzero_data["monetary_value"],
    'frequency': nonzero_data["frequency"],
})
gg = clv.GammaGammaModel(
    data=dataset
)

expected_spend = gg.expected_customer_spend(

The graph below shows the expected average order value of 5 customers. The average order value of these two customers is more than $500, while the average order value of these three customers is around $350.

labeller = MapLabeller(var_name_map={"x": "customer"})
az.plot_forest(expected_spend.isel(customer_id=(range(5))), combined=True, labeller=labeller)
plt.xlabel("Expected average order value");
Image by Author


Finally, we can combine the two sub-models to estimate the CLV of each customer. One thing I want to mention here is the discount_rate parameter. This function uses the DCF method, short for “discounted cash flow.” When the monthly discount rate is 1%, $100 received in one month is worth about $99 today.
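The discounting step can be illustrated in isolation. This is a minimal sketch of the general discounted cash flow idea, not the library's internal code; the function name `present_value` and the three monthly cash flows are hypothetical.

```python
def present_value(cash_flow, monthly_rate, months_ahead):
    """Value today of a cash flow received `months_ahead` months from now."""
    return cash_flow / (1 + monthly_rate) ** months_ahead


# $100 received one month from now, at a 1% monthly discount rate.
print(round(present_value(100, 0.01, 1), 2))  # 99.01

# Discounted total of a hypothetical stream of expected monthly revenue.
expected_monthly_revenue = [100, 100, 100]
discounted_total = sum(
    present_value(cash_flow, 0.01, month)
    for month, cash_flow in enumerate(expected_monthly_revenue, start=1)
)
print(round(discounted_total, 2))
```

The further in the future a purchase is expected, the less it contributes to the CLV estimate.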

clv_estimate = gg.expected_customer_lifetime_value(
time=120, # 120 months = 10 years

clv_df = az.summary(clv_estimate, kind="stats").reset_index()

clv_df['customer_id'] = clv_df['index'].str.extract(r'(\d+)')[0]

clv_df = clv_df[['customer_id', 'mean', 'hdi_3%', 'hdi_97%']]
clv_df.rename(columns={'mean' : 'clv_estimate', 'hdi_3%': 'clv_estimate_hdi_3%', 'hdi_97%': 'clv_estimate_hdi_97%'}, inplace=True)

# monetary_values = data_summary_rfm.loc[clv_df['customer_id'], 'monetary_value']
monetary_values = data_summary_rfm.set_index('customer_id').loc[clv_df['customer_id'], 'monetary_value']
clv_df['monetary_value'] = monetary_values.values
clv_df.to_csv('clv_estimates_output.csv', index=False)

Now, I am going to show you how we can improve our marketing actions. The graph below shows an estimated CLV by Country.

# Calculating total sales per transaction
data['TotalSales'] = data['Quantity'] * data['UnitPrice']
customer_sales = data.groupby('CustomerID').agg({
    'TotalSales': 'sum',
    'Country': 'first'  # Assuming a customer is associated with only one country
})

customer_countries = customer_sales.reset_index()[['CustomerID', 'Country']]

clv_with_country = pd.merge(clv_df, customer_countries, left_on='customer_id', right_on='CustomerID', how='left')

average_clv_by_country = clv_with_country.groupby('Country')['clv_estimate'].mean()

customer_count_by_country = data.groupby('Country')['CustomerID'].nunique()

country_clv_summary = pd.DataFrame({
    'AverageCLV': average_clv_by_country,
    'CustomerCount': customer_count_by_country,
})

# Calculate the average lower and upper bounds of the CLV estimates by country
average_clv_lower_by_country = clv_with_country.groupby('Country')['clv_estimate_hdi_3%'].mean()
average_clv_upper_by_country = clv_with_country.groupby('Country')['clv_estimate_hdi_97%'].mean()

# Add these averages to the country_clv_summary dataframe
country_clv_summary['AverageCLVLower'] = average_clv_lower_by_country
country_clv_summary['AverageCLVUpper'] = average_clv_upper_by_country

# Filtering countries with at least 20 customers
filtered_countries = country_clv_summary[country_clv_summary['CustomerCount'] >= 20]

# Sorting in descending order by AverageCLV
sorted_countries = filtered_countries.sort_values(by='AverageCLV', ascending=False)

# Prepare the data for error bars
lower_error = sorted_countries['AverageCLV'] - sorted_countries['AverageCLVLower']
upper_error = sorted_countries['AverageCLVUpper'] - sorted_countries['AverageCLV']
asymmetric_error = [lower_error, upper_error]

# Create a new figure with a specified size

# Create a plot representing the average CLV with error bars indicating the confidence intervals
# We convert the index to a regular list to avoid issues with matplotlib's handling of pandas Index objects
plt.errorbar(x=sorted_countries['AverageCLV'], y=sorted_countries.index.tolist(),
             xerr=asymmetric_error, fmt="o", color="black", ecolor="lightgray", capsize=5, markeredgewidth=2)

# Set labels and title
plt.xlabel('Average CLV') # x-axis label
plt.ylabel('Country') # y-axis label
plt.title('Average Customer Lifetime Value (CLV) by Country with Confidence Intervals') # chart title

# Adjust the y-axis to display countries from top down
plt.gca().invert_yaxis()

# Show the grid lines
plt.grid(True, linestyle="--", alpha=0.7)

# Display the plot
plt.show()

Image by Author

Customers in France tend to have a high CLV. On the other hand, customers in Belgium tend to have a lower CLV. From this output, I recommend increasing the marketing budget for acquiring customers in France and reducing the marketing budget for acquiring customers in Belgium. When modeling with U.S.-based data, we would use states instead of countries.

You might be wondering:

  • Can we utilize additional types of data, such as access logs?
  • Is it possible to incorporate more features like demographic information or marketing activity into the model?

Basically, the BTYD model only requires transaction data. If you want to use other data or other features, an ML approach might be an option. After that, you can assess the performance of both Bayesian and ML models, choosing the one that offers better accuracy and interpretability.

The flowchart below shows a guideline for better CLV modeling.

Image by Author

First, consider your data size. If your data isn’t large enough or you only have transaction data, BTYD modeling using PyMC-Marketing might be the best choice. Even if your data is large enough, I think a good approach is to start with a BTYD model and, if it underperforms, try a different approach. Specifically, if your priority is accuracy over interpretability, neural networks, XGBoost, LightGBM, or ensemble techniques could be beneficial. If interpretability is still important to you, consider methods like Random Forest or an explainable AI approach.

In summary, I recommend starting with PyMC-Marketing; it is a good first step in any case!

Here are some key takeaways.

  • Customer lifetime value (CLV) is the total net profit a company can expect from a single customer throughout their relationship.
  • We can build a Probabilistic model (BTYD) using the BG/NBD model and the Gamma-Gamma model.
  • If you are familiar with Python, PyMC-Marketing is where you can start.

Thank you for reading! If you have any questions or suggestions, feel free to contact me on LinkedIn! Also, I would be happy if you followed me on Towards Data Science.
