Machine Learning for Fundamental Analysis

In South Africa, the number of businesses listed on the JSE is small compared to some international stock exchanges, so manually analysing selected businesses' financial statements to identify investment potential is feasible. However, as portfolio or hedge fund managers look overseas to diversify their assets, analysing the fundamentals of thousands of businesses could be an overwhelming task. In this article we investigate whether alternative analytical approaches, such as unsupervised machine learning in MATLAB, could make this process of analysing business fundamentals easier.

Financial analysts rely on balance sheets, income statements and cash flow statements to decide which listed companies they would like to invest in. By scrutinising these fundamentals, analysts can infer the financial well-being of a company, identify trends within the business and even identify the manipulation of financials, where profits are under- or over-reported to give an impression of consistent growth.

Business operations vary by specialty, so companies construct their financial statements differently; for example, the fundamentals of a retail business will look different from those of a services-based business. This makes it difficult to identify trends within a business or an industry. However, when comparing similar businesses, we expect the structure of their financial data to look similar. Here we explore clustering as a technique to quickly group the interdependent fundamentals of a business together. These groups reveal the underlying structure of the business, so that any changes in the business, or in the way its finances are reported, are easy to flag for further inspection.

The business fundamentals data was provided by FactSet®, a data service provider, and imported from the workstation using the MATLAB Datafeed Toolbox®. Quarterly results were used and were normalised before the correlations between all pairs of fundamentals were calculated. Figure 1 shows the correlation matrix as a heat map, where a strong correlation is represented by a lighter tone and a weak correlation by a darker tone. Hierarchical clustering was then applied to the correlation values to identify groups of interrelated fundamentals. In MATLAB, the evalclusters() function offers an easy way to identify the optimal number of clusters. Thereafter, the clustering was performed using the cluster() function from the Statistics and Machine Learning Toolbox™.
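The steps described above (normalise, correlate, cluster) can be sketched roughly as follows. This is a minimal illustration in plain Python rather than the MATLAB code used for the article. The quarterly figures are invented for the example, and the choice of average linkage on a distance of 1 minus the correlation is an assumption, since the article does not state which linkage or distance was used. The function names are hypothetical helpers, not MATLAB or FactSet APIs.

```python
from math import sqrt

def zscore(xs):
    """Normalise a series to zero mean and unit (population) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    std = sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]

def correlation_matrix(series):
    """Pearson correlation between every pair of named series."""
    names = list(series)
    z = {name: zscore(series[name]) for name in names}
    n = len(z[names[0]])
    corr = [[sum(a * b for a, b in zip(z[p], z[q])) / n for q in names]
            for p in names]
    return names, corr

def hierarchical_clusters(names, corr, k):
    """Agglomerative clustering (average linkage, distance = 1 - correlation),
    merging the two closest groups until only k groups remain."""
    clusters = [[i] for i in range(len(names))]

    def dist(c1, c2):
        return sum(1 - corr[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(names[i] for i in c) for c in clusters]

# Invented quarterly figures, for illustration only (not FactSet data).
fundamentals = {
    "sales":        [100, 110, 105, 120, 130, 125, 140, 150],
    "gross_profit": [40, 44, 43, 48, 52, 51, 56, 61],
    "total_assets": [500, 480, 510, 470, 520, 460, 530, 450],
    "depreciation": [25, 24, 26, 23, 26, 23, 27, 22],
}

names, corr = correlation_matrix(fundamentals)
groups = hierarchical_clusters(names, corr, k=2)
print(groups)
```

Here the number of clusters k is fixed by hand; in the MATLAB workflow this is the value that a cluster-evaluation step would suggest.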

In Figure 2, each graph represents a different group of fundamentals identified by the clustering algorithm, showing their normalised change over time. For group 2, we can see that sales and gross profit are closely related, which is expected; however, if the relationship between these values changes, it is worth investigating. Similarly, in group 5, assets and depreciation expenditure are grouped together, and any large deviation in their relationship could also be a flag for investigation.
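One simple way to flag such a change in relationship is to track the ratio of two grouped fundamentals and compare each quarter against its recent history. The sketch below is an illustrative assumption rather than the method used for the article: the function name, the four-quarter window and the three-standard-deviation threshold are all hypothetical choices, and the figures are invented.

```python
from statistics import mean, stdev

def flag_relationship_breaks(a, b, window=4, threshold=3.0):
    """Return the indices where the ratio a[i] / b[i] departs from the mean of
    the previous `window` ratios by more than `threshold` sample standard
    deviations. Windows with zero spread are skipped."""
    ratios = [x / y for x, y in zip(a, b)]
    flagged = []
    for i in range(window, len(ratios)):
        history = ratios[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(ratios[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

# Invented quarterly figures; gross profit dips sharply in the seventh quarter.
sales        = [100, 110, 105, 120, 130, 125, 140, 150]
gross_profit = [40, 44, 43, 48, 52, 51, 42, 61]

flagged = flag_relationship_breaks(gross_profit, sales)
print(flagged)
```

A flagged quarter is only a prompt for closer inspection of the statements; the threshold would need tuning against real data before it could be relied on.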

The analysis demonstrated in this article is by no means exhaustive or conclusive, but it serves to explore how MATLAB's data analytics and machine learning capabilities can enable financial analysts to rapidly investigate the financials of many businesses, or to quickly 'eyeball' the health of, or trends within, a business. Would you be interested in seeing this concept explored further?





