In Marketing Analytics, segmentation of customer data has many use cases: targeting groups of potential customers in marketing for acquisition, improving servicing standards to customers, enhancing customer relationships and identifying customer segments for retention strategies.

In addition, novel data sources lead to an increase in data complexity whilst providing a lever to usefulness of AI-ML techniques.

Clustering of customer data enables identifying groups of customers that are similar in relation to the variables considered. Different methods can be used to segment customers into groups. The purpose of segmentation is to identify suitable groups for customers, for instance with the aim to lever success of marketing channels when targeting chosen customer segments. 

Disclaimer: The information is provided for mere information purposes as part of this blog. Auriscon Ltd and Auriscon HK Ltd assumes no responsibility or liability for any errors or omissions in the content of this site. The information contained in this site is provided  with no guarantees of completeness and accuracy. No liability is assumed by Auriscon for any damages that may occur for external use of information provided herein. 

 

Grouping Customer Data

customer groups2

Clustering of customer data enables identifying groups of customers that are similar in relation to the variables considered. Different methods can be used to segment customers into groups. K-means clustering belongs to the standard statistical techniques, and determines segments by maximizing the distance between different group centers, and by minimizing the distance to the center within each group. Using fictitious customer data and subjecting to K-Means clustering, the outcome is displayed with respect to the Income variable below. Clustering is an unsupervised ML method with clusters being identified based on data alone.

A minor caveat of K-Means is the requirement to choose the number of segments for clustering beforehand. Unlike K-Means clustering, Model-Based clustering, e.g. he R package mclust, does not require any advance specification of the number of clusters.

The three segments obtained from the the K-Means and the Model-Based clustering methods are similar for the sampled customer data.

 

 

  

 


Analyzing Customer Churn


Customer churn occurs when customers stop doing business with acompany.  Apparently, the impact of churn is negative, either through a loss of customers or through a loss of revenue. Before any data driven approach on analysis of churn data can start one has to have agreement within an organization on the definition of churn.

Definition of churn depends on the business model of companies: businesses that are subscription based encounter churn in terms of cancellations of subsriptions, whereas businesses that are based on recurring purchases encounter churn when customers refrain from buying any additional products or services for more than x months.

Understanding the drivers of churn is an important precondition before strategies to prevent churn can realistically be effective:

Auriscon assists in data driven approaches to churn analysis consisting of data pre-processing, explorative data analysis, model training and evaluation. The outcome of this analysis provides insight for drivers of customer churn and retention actions.

For example, the churn rate of customers may vary by segment, dependent
on the number of previous purchases. The data driven approach
provides the rationale associated to critical segments with
customers at higher risk for churn.

For demonstration of churn analysis a dataset of 10000 Bank Churners obtained from Kaggle is analyzed/ customer data is used. 

 

  • Inspection of data reveals outliers for variables Age and CreditScore

 

 

  • Classification is different from Clustering: Clustering is an unsupervised ML method with clusters being identified based on data alone. By contrast, Classification such as Random Forest is performed supervised. The method trains the Model on the dependent variable in conjunction with the predictor variables. This results in a Model that can be used for prediction of class membership based on new data.

  • Random Forest approach use an ensemble of trees to arrive at robust estimators that avoid overfitting and work well on new data. For estimation, the data are split into training and test data sets with the Model being trained and validated on separate data sets. The Random Forests Model consists of a large number of trees with classifications for being assigned to the segment that receives the majority of votes from the ensemble. Noteworthy that for classification problems the dependent variable is of categorical type and is cast as a factor.

Ensemble techniques in particular compare favourably to single Model techniques such as Logistic Regression. Techniques such a Random Forests and Gradient Boosting Trees provide superior performance due to combining multiple regression or decision trees whilst accounting for non-linear relationships between variables. Other Supervised Learning methods such as Artifical Neural Networks (ANN) are similarly effective. In applicatons of ANN, multi-layer networks are typically used whereby the number of layers used increases until no further improvement in accuracy is detectable. 

 → Tip:Evaluating Portfolio models and analytical approaches for Portfolio management can be a complex task with risks of errors in the evaluation processes. Auriscon supports in identifying sutiable Models and approaches alongside evaluation with analytical assistance provided end-to-end. 

 


Fin-Tech is an industry branch within Finance that provides many examples for a business model that utilized the advantages of digitalization. Various specialized services are provided by the FinTech industry such as robot financial advisors for automated advising and assistance in portfolio management. Other automated services are used for peer to peer lending, crowd funding,and access to trading systeTech ms for a retail clientele. 

Large Language Models (LLM) are utilized throughout Finance and Banking and FinTech companies are no exception to this trend. Automating email drafting using LLM leverages artificial intelligence to streamline communication, by generating professionally articulated messages that updates on previous correspondences. Trained to recognize the nuances of different types of emails allow LLM to distinguishing between updates, client inquiries, and requests. Moreover, integration of these LLM systems with existing CRM systems provides a way to access the relevant customer data swiftly, which proves useful for a more eng


Algorithmic Trading employs strategies developed and applied based on algorithmic and machine learning signals. This approach enables a speedily identification of trading patterns with sufficient accuracy, thereby leading to fast prediction of securities prices and execution on trading decisions. 

In the context of securities trading the volatility surface is an important concept for the pricing of options.The volatility surface of an option depends on the time to maturity and the strike price and evolves dynamically through time. Consequently, understanding the dynamics of a volatility surface is prerequesite for effective hedging and pricing. Using Neural Networks enables the modelling of the volatily dynamics without having to resort to complicated and at times slow traditional numerical pricing algorithms.   

Generating synthetic data with the same properties as those observed in actual data is a desirable tool for many areas including Finance. Neural Networks in the form of Autoencoders and variations thereof provide the right technique for such a task. Autoencoders are also useful for dimensionality reduction, whereby a large number of features is reduded to a much smaller number of features that possess the same information content.