Voice Identification in Financial Services Using MATLAB

Voice analytics and voice identification systems, backed by a long history of research, are increasingly used to reduce fraud in conversations, agreements and transactions conducted over the phone. Many businesses are investing in systems that add a layer of security to their calls, either by developing them in-house or by purchasing off-the-shelf products. South Africa's business and retail landscape brings together a rich variety of cultures, languages and accents, so bespoke solutions are often required to cater for the wide range of vocal characteristics.

Provided the voice data is available, one can build custom voice identification systems using MATLAB and its various toolboxes. From our experience working on voice identification models, we have learnt that building personalised models in a South African context is achievable, with very promising evaluation results. Voice-only identification is possible using the techniques described below; voice-only sentiment analysis, however, can be more challenging.

Speech Features

There are numerous speech features that can be used when building voice identification systems, but two of the most common are the voice signal’s pitch and its Mel-frequency Cepstral Coefficients (MFCCs). Vocal pitch relates to the fundamental frequency of the speech signal, whereas MFCCs are more involved to calculate. A short, windowed section of the speech signal is passed through a Fast Fourier Transform, a Mel-scale filter bank is applied to the resulting spectrum, and the logarithm of the filter bank energies is then passed through a discrete cosine transform to produce the MFCC values, as seen in Figure 1 [1]. MATLAB’s Audio Toolbox nevertheless makes calculating both the pitch and the MFCCs of a voice signal straightforward, with a single call to the pitch and mfcc functions respectively.
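The sketch below makes the steps of Figure 1 concrete. It covers framing and windowing, the FFT power spectrum, the Mel filter bank, the logarithm and the discrete cosine transform, and adds a basic autocorrelation pitch estimate. It uses Python with NumPy and is not MATLAB's implementation. The 25 ms frames, 10 ms hop, Hamming window, 26 filters, 13 coefficients and the HTK-style Mel formula are common textbook choices assumed here. MATLAB's mfcc and pitch functions may use different defaults and algorithms. The helper names (mfcc_sketch, pitch_autocorr and so on) are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style Mel scale (an assumed, commonly used formula)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters evenly spaced on the Mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - centre)
    return fbank

def dct_ii(x, n_out):
    """Unnormalised DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    return x @ basis.T

def mfcc_sketch(signal, fs, frame_len=0.025, hop=0.010, n_fft=512, n_filters=26, n_coeffs=13):
    """Frame -> window -> FFT power spectrum -> Mel filter bank -> log -> DCT."""
    frame = int(round(frame_len * fs))
    step = int(round(hop * fs))
    n_frames = 1 + (len(signal) - frame) // step
    window = np.hamming(frame)
    frames = np.stack([signal[i * step:i * step + frame] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=-1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, fs).T
    return dct_ii(np.log(energies + 1e-10), n_coeffs)  # small offset avoids log(0)

def pitch_autocorr(frame, fs, f_min=50.0, f_max=400.0):
    """Crude pitch estimate: the autocorrelation peak within a plausible lag range."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min, lag_max = int(fs / f_max), int(fs / f_min)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag
```

Each row of the mfcc_sketch output is one frame's coefficient vector. Stacking these rows for every recording of every speaker yields the kind of feature dataset used in the next section.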

Figure 1: Mel-frequency Cepstral Coefficient Calculation Flow [1]

Voice Identification Techniques

Once the speech features for a group of people have been extracted and formed into a dataset, different voice identification techniques can be explored as the building blocks of your own custom system in MATLAB. A well-known technique for such systems is the Universal Background Model (UBM) combined with Maximum A Posteriori (MAP) adaptation of Gaussian Mixture Models (GMMs). A UBM is essentially a GMM trained on all the speech feature data available in the dataset. It estimates an “average individual’s” voice, which aids in identifying an imposter. To build a specific customer’s identification model, the UBM is adapted to that individual’s speech features using MAP adaptation. This produces an adapted GMM that is unique to the customer and distinct from the UBM. To test the identity behind a speech segment, one extracts its speech features and adapts the UBM to them. This creates a new adapted GMM that can then be checked against the customer’s adapted GMM on record. Although the processing and adaptation algorithms have to be custom developed, the gmdistribution function in MATLAB’s Statistics and Machine Learning Toolbox makes quick work of creating the required GMMs.
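The enrol-and-test flow above can be sketched as follows. This is an illustrative Python/NumPy sketch under stated assumptions, not the authors' implementation. It assumes diagonal-covariance GMMs and the common relevance-MAP scheme that adapts only the component means, with weights and variances copied from the UBM. The function names, the relevance factor of 16 (a commonly used value), the plain EM loop used to train the UBM and the idea of comparing flattened mean "supervectors" are all assumptions. In MATLAB, the UBM itself could instead be fitted with fitgmdist from the Statistics and Machine Learning Toolbox.

```python
import numpy as np

def log_gaussians(X, means, variances):
    """Log-density of each frame under each diagonal component, shape (n_frames, n_components)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def responsibilities(X, weights, means, variances):
    """Posterior probability of each component for each frame."""
    log_p = np.log(weights)[None, :] + log_gaussians(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_p)
    return post / post.sum(axis=1, keepdims=True)

def train_ubm(X, n_components, n_iter=50, seed=0):
    """Plain EM for a diagonal GMM on the pooled features of all speakers."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), n_components, replace=False)]
    variances = np.tile(X.var(axis=0), (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        gamma = responsibilities(X, weights, means, variances)
        nk = gamma.sum(axis=0) + 1e-10
        weights = nk / len(X)
        means = (gamma.T @ X) / nk[:, None]
        variances = np.maximum((gamma.T @ X ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances

def map_adapt_means(X, ubm, relevance=16.0):
    """Relevance-MAP adaptation of the UBM means to one speaker's frames."""
    weights, means, variances = ubm
    gamma = responsibilities(X, weights, means, variances)
    nk = gamma.sum(axis=0)                               # soft frame count per component
    ex = (gamma.T @ X) / np.maximum(nk, 1e-10)[:, None]  # per-component data mean
    alpha = (nk / (nk + relevance))[:, None]             # how far each mean may move
    return alpha * ex + (1.0 - alpha) * means

def supervector(adapted_means):
    """Flatten the adapted means into one fixed-length vector per speech segment."""
    return adapted_means.ravel()
```

Components that see little of the speaker's data keep means close to the UBM, while well-observed components move towards the speaker. An imposter's segment therefore shifts a different set of means than the genuine customer's. Flattening the means gives one fixed-length vector per segment, which suits the classification step that follows.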

Applying a Classification Model

The last piece of the puzzle is a classification model that learns the difference between a customer’s adapted GMM and the UBM’s GMM. The rationale is that an adapted test GMM from the genuine customer will closely match the customer’s adapted GMM on record. An imposter’s adapted GMM, by contrast, will most likely be classified as the UBM GMM, because its adapted means differ from those of the customer’s model. Support Vector Machines (SVMs) have proven very effective as the classification algorithm for this specific application. Other machine learning models and approaches are also worth exploring, and MATLAB’s Statistics and Machine Learning Toolbox offers an extensive range of them.
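The classification step might look like the sketch below. It assumes each speech segment has already been reduced to a fixed-length vector of adapted GMM means (a "supervector"). Segments from the enrolled customer carry label +1, and background or imposter segments carry label -1. The sketch trains a linear SVM with the Pegasos stochastic sub-gradient method in Python/NumPy. In MATLAB, this step would more typically use fitcsvm from the Statistics and Machine Learning Toolbox. The regularisation strength, epoch count and function names are illustrative assumptions. Folding the bias into w, which regularises it too, is a simplification of the standard SVM formulation.

```python
import numpy as np

def add_bias(X):
    """Append a constant 1 feature so the bias is learned as part of w."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def train_linear_svm(X, y, lam=0.01, n_epochs=200, seed=0):
    """Linear SVM via Pegasos-style stochastic sub-gradient descent on the
    L2-regularised hinge loss. X: (n, d) supervectors, y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)                         # Pegasos step size
            violates = y[i] * (Xb[i] @ w) < 1.0           # margin check uses the current w
            w *= 1.0 - eta * lam                          # shrink from the regulariser
            if violates:
                w += eta * y[i] * Xb[i]                   # hinge-loss sub-gradient step
    return w

def decision_scores(w, X):
    """Signed score per supervector; larger means more customer-like."""
    return add_bias(X) @ w

def predict(w, X):
    """+1 = accept as the enrolled customer, -1 = reject (UBM / imposter side)."""
    return np.where(decision_scores(w, X) >= 0.0, 1.0, -1.0)
```

At verification time, a new call's supervector would be scored with decision_scores. The threshold on that score (0 here) trades false accepts against false rejects and would need tuning on held-out data.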

What Can I Do Next?

  • Reach out to us if you would like to chat about the various toolboxes and techniques mentioned in this post, or take a closer look at this MathWorks example, which uses pitch and MFCCs to build a multiclass k-nearest neighbour (KNN) speaker identification classifier.
  • Request a trial.
  • Find out more from the team.
  • Visit our Financial Data Science Focus Area page to learn more.