AI in Finance – Trading: Clustering

Pairs trading is a popular trading strategy, but it can be shrouded in statistics. By leveraging MATLAB’s machine learning and statistics capabilities, you can formulate your own pairs trading strategies. In this example, we examine 300 stocks from the S&P 500 and group them into clusters so that the stocks in each cluster have similar characteristics. Within a cluster, we create all possible trading pairs and test each pair for co-integration. We then backtest one of the co-integrated pairs to show its returns. We provide a live script so that you can get started immediately with model development.

The data we are given contains three fundamental metrics for each stock: market cap, earnings per share (EPS) and price-to-earnings ratio (P/E). We can visualise this data by plotting the stocks in three dimensions, as in Figure 1.

Figure 1: All stocks unclustered
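
A plot like Figure 1 can be sketched with scatter3. This is a minimal sketch rather than the live script's code: the variable names and the randomly generated values are placeholders standing in for the real fundamentals.

```matlab
% Synthetic placeholder data: 300 stocks, three fundamental metrics.
rng(0);                                    % reproducible random numbers
nStocks = 300;
marketCap        = exp(randn(nStocks, 1)); % placeholder values, not real data
earningsPerShare = randn(nStocks, 1);      % placeholder values, not real data
peRatio          = 15 + 5 * randn(nStocks, 1);

% One row per stock, one column per metric.
X = [marketCap, earningsPerShare, peRatio];

% Plot every stock as a point in three dimensions.
figure;
scatter3(X(:, 1), X(:, 2), X(:, 3), 20, 'filled');
xlabel('Market cap');
ylabel('EPS');
zlabel('P/E');
title('All stocks, unclustered');
grid on;
```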

Machine learning enters when we use a k-means clustering algorithm to partition the stocks into groups. The algorithm decides how best to cluster the stocks, given the number of clusters that you request. Let’s choose to cluster the stocks into 6 groups. In MATLAB this takes a single line of code.
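
As a minimal sketch, assuming the three metrics are stored in a 300-by-3 matrix X (a hypothetical name, filled here with random placeholder values), the clustering call could look like this. It uses kmeans from Statistics and Machine Learning Toolbox.

```matlab
rng(0);                     % k-means starts from random centres, so fix the seed
X = randn(300, 3);          % placeholder for [market cap, EPS, P/E]

% idx holds the cluster number (1 to 6) of each stock;
% C holds the 6-by-3 matrix of cluster centres.
[idx, C] = kmeans(X, 6);
```

Because market cap and P/E sit on very different scales, standardising the columns first (for example with zscore) is a common choice. Whether the live script does so is not stated here.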



The k-means algorithm divides the data into 6 clusters as requested. The clusters are colour-coded, and their centres are marked by the Xs in Figure 2. Additional visualisations of clustering quality, such as a silhouette plot, can be seen in the live script.

Figure 2: All stocks, clustered
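
A plot in the style of Figure 2, together with a silhouette plot, might be produced along the following lines. This is a sketch on random placeholder data, with hypothetical variable names, and not the code from the live script.

```matlab
rng(0);
X = randn(300, 3);                     % placeholder for [market cap, EPS, P/E]
[idx, C] = kmeans(X, 6);               % cluster labels and centres

% Colour each stock by its cluster label and mark the centres with Xs.
figure;
scatter3(X(:, 1), X(:, 2), X(:, 3), 20, idx, 'filled');
hold on;
plot3(C(:, 1), C(:, 2), C(:, 3), 'kx', 'MarkerSize', 12, 'LineWidth', 2);
hold off;
xlabel('Market cap');
ylabel('EPS');
zlabel('P/E');

% Silhouette plot: values near 1 indicate well-separated points.
figure;
silhouette(X, idx);

% The same values as numbers, plus the members of one cluster of interest.
s = silhouette(X, idx);
meanSilhouette = mean(s);
members = find(idx == 6);              % row numbers of the stocks in cluster 6
```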

Let’s work with one group of stocks, say cluster 6. These are the purple stocks in Figure 2, and they appear to be grouped by their high market cap. Cluster 6 contains 26 stocks. Now that we have decided on a cluster of interest, we must import the data for each stock. A typical workflow might involve writing import code by hand for each data set, which is tedious and time-consuming. The MATLAB Import Tool instead provides an interactive interface for importing data. We use it on one .csv file of stock data and, since all the .csv files are formatted in the same way, have it generate code automatically that can be applied to every data file. The generated function is called importStock, and we call it inside a for loop so that the repetitive reading of data is done for us.
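
The loop pattern can be sketched as follows. Since the generated importStock function and the real data files are not reproduced here, this sketch substitutes readtable and first writes two tiny .csv files so that it runs on its own. The tickers, column names and values are all hypothetical.

```matlab
% Hypothetical setup: write two small, identically formatted .csv files.
dataDir = tempname;
mkdir(dataDir);
tickers = ["AAA", "BBB"];                  % placeholder tickers
for k = 1:numel(tickers)
    T = table((1:5)', 100 + k + (1:5)', 'VariableNames', {'Day', 'Close'});
    writetable(T, fullfile(dataDir, tickers(k) + ".csv"));
end

% Read every file with the same import call and collect the closing prices,
% one column per stock. readtable stands in for the generated importStock.
prices = zeros(5, numel(tickers));
for k = 1:numel(tickers)
    T = readtable(fullfile(dataDir, tickers(k) + ".csv"));
    prices(:, k) = T.Close;
end
```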




Once the data has been imported, it is time to create trading pairs. Choosing 2 stocks from a group of 26 gives 325 possible pairs (26 × 25 / 2), which is a lot of pairs to test manually for co-integration! Thankfully, MATLAB lets us automate the testing: we loop through all 325 pairs and test each one for co-integration with a single line of code.
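
A sketch of such a loop is shown here on four synthetic random walks rather than the 26 real stocks, so the variable names and data are placeholders. It assumes the test is run with egcitest from Econometrics Toolbox, which is MATLAB's Engle-Granger test.

```matlab
% Number of pairs in the article's cluster: 26 choose 2.
nPairsInCluster = nchoosek(26, 2);         % 325

% Synthetic stand-in for the imported price matrix (one column per stock).
rng(0);
nObs = 500;
nStocks = 4;
prices = cumsum(randn(nObs, nStocks));     % independent random walks

% Every unordered pair of columns, one pair per row.
pairs = nchoosek(1:nStocks, 2);
nPairs = size(pairs, 1);

% Test each pair; h(k) is true when the null hypothesis of no
% co-integration is rejected for pair k.
h = false(nPairs, 1);
for k = 1:nPairs
    h(k) = egcitest(prices(:, pairs(k, :)));
end

cointegratedPairs = pairs(h, :);           % column numbers of the pairs that passed
```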



Each pair is tested with the Engle-Granger test for co-integration. The null hypothesis is that there is no co-integration in the input time series, and the test decision is returned as h: a value of 1 means the null hypothesis is rejected in favour of co-integration, and 0 means it is not rejected. The syntax is familiar and corresponds with the econometrics literature. This hypothesis test is built in and does not have to be coded manually. Of the 325 Engle-Granger tests performed, 29 identify the pair as co-integrated. The first four of these pairs are shown below in Figure 3:

Figure 3: The first four pairs of co-integrated stocks
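
To see how the test decision behaves, the sketch below builds one series that is tied to another and one that is independent, then runs egcitest (Econometrics Toolbox) on each pair. All series are synthetic and the names are hypothetical; a small p-value points towards co-integration.

```matlab
rng(0);
n = 500;
x = cumsum(randn(n, 1));                   % a random walk
yTied = 2 * x + randn(n, 1);               % moves with x, plus stationary noise
yFree = cumsum(randn(n, 1));               % an unrelated random walk

% h = 1 rejects the null hypothesis of no co-integration (5% level by default).
[hTied, pTied] = egcitest([yTied x]);
[hFree, pFree] = egcitest([yFree x]);

fprintf('Tied pair: h = %d, p = %.3f\n', hTied, pTied);
fprintf('Free pair: h = %d, p = %.3f\n', hFree, pFree);
```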

Let’s choose one of the co-integrated pairs and backtest it. We adapt a pre-written function by Stuart Kozola into the function pairsSignal and call it on the chosen pair.



The ubiquity of MATLAB allows you to integrate code from a variety of sources into your work. The MATLAB support and user community is vast and helps you consolidate your project into a powerful piece of analysis. The function pairsSignal performs a backtest on the stocks DHR and FOXA, whose data is stored in the matrix prices. It rebalances the portfolio every 9 days using the previous 30 days’ prices. As seen in the bottom plot in Figure 4, the final return of the portfolio is 51.2%.

Figure 4: Comprehensive pairs trading results
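
The rebalancing idea can be illustrated with a much simpler stand-in. The sketch below is not pairsSignal: it is a basic rolling z-score rule on synthetic prices, keeping only the 30-day look-back and 9-day rebalance interval from the article. The entry threshold, variable names and data are assumptions, and the result is in price units rather than a percentage return.

```matlab
rng(1);
n = 300;
x = 50 + cumsum(randn(n, 1));              % synthetic price series 1
y = 10 + 0.8 * x + randn(n, 1);            % synthetic series 2, tied to series 1

window = 30;                               % look-back length (from the article)
step = 9;                                  % rebalance interval (from the article)
threshold = 1;                             % hypothetical z-score entry level

pos = zeros(n, 1);                         % +1 long spread, -1 short spread, 0 flat
beta = zeros(n, 1);                        % hedge ratio in force on each day

for t = window + 1 : step : n
    past = t - window : t - 1;             % the previous 30 days only
    A = [ones(window, 1) x(past)];
    coef = A \ y(past);                    % least squares fit of y = a + b*x
    resid = y(past) - A * coef;

    % How far today's spread sits from its recent level, in standard deviations.
    z = (y(t) - coef(1) - coef(2) * x(t)) / std(resid);

    % Short the spread when it is high, go long when it is low, and
    % hold that position until the next rebalance.
    last = min(t + step - 1, n);
    pos(t:last) = -sign(z) * (abs(z) > threshold);
    beta(t:last) = coef(2);
end

% Daily profit from holding one unit of the spread y - beta*x.
dSpread = diff(y) - beta(1:end-1) .* diff(x);
pnl = pos(1:end-1) .* dSpread;
cumPnl = cumsum(pnl);
```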

We have provided a MATLAB live script for you to experiment with machine learning for clustering and pairs trading. In doing so, you will see how MATLAB performs complex statistical operations with ease, and how data handling can be simplified and automated. The wealth of knowledge on MATLAB Central lets you build on the analysis of others to develop your own solutions. Get started with pairs trading by running the file PairsTrading.mlx!
