Automation for Downloading, Processing and Extraction of CPI Data

Increased consumer expectations and aggressive competition for a finite customer pool are driving service fees down substantially. Executives are driving strategies such as Intelligent Automation and Robotic Process Automation strategies for improving operation efficiencies in order to remain profitable. However, the turn-around time for implementing projects can take months or years. Automating several “smaller” manual tasks can accumulate from saving an hour per task to a couple of days per month, creating an immediate impact and freeing up a significant amount of time to focus on more meaningful work or analyses.

This example looks at automating the extraction of data made available by Stats SA. Every month analysts and portfolio managers are required to download economic data, for example  Consumer Price Index (CPI) data from websites such as the Stats SA webpage. Downloading the CPI data can be cumbersome, let alone extracting the desired CPI data from the dataset provided. The process requires a user to navigate to the right Stats SA webpage, find the relevant files, download the files, unzip the files and browse through hundreds of rows and columns of data and try extract something meaningful. A single function has been created to do exactly that. This function allows a user to download, extract and present the desired CPI data in a single line of code.

getCPI

The function  called getCPI automatically downloads the CPI files from this page on the Stats SA webpage, cleans and processes the table data into a usable format, extracts the desired CPI data and finally presents the data in a usable graphic. This function has a couple of sub-functions that make the automation of the different processes possible.  Below is an example of what the data looks like in a spreadsheet format.

 

 

Syntax

getCPI has three different ways in which the function can be called. This allows a user to generate an output unique to their needs.

  1. getCPI(); requires no inputs and generates the full suite of default outputs.
  2. [dates, CPI] = getCPI(); requires no inputs and generates the full suite of default outputs as well as outputting the dates and CPI values for further analysis, if required.
  3. [dates, CPI] = getCPI(Name1, Value1, … , Name_n, Value_n); may take inputs in the form of name-value pairs as optional if the user requires the default options to be changed.

Function Automation Examples

1.    No Inputs

When no inputs are specified the getCPI function uses the default inputs. The default inputs specify that the function must extract the CPI data for all the available dates (January 2008 to the most recently published date), the CPI for all the items in the CPI basket (“All Items” in column H04) and the region is specified as “All Urban Areas” (column H13). The function will always automatically display a message informing the user which optional parameters have been implemented. The message also displays whether the data file download already exists in the current directory or if the file had to be downloaded. This improves the usage of the function if multiple outputs of the function are created.

 

 

 

 

 

 

 

 

 

 

 

 

2.    Specify an Item

An item refers to a specific item within the basket of the CPI that may be of interest which include food products, communications, telecommunications, transport, education, insurance, housing, medical and many more. This can be extracted using the getCPI function in the following manner: The value of the Items must be specified as a cell array as a user can input several types of items. The function will output a plot of  the different items. When comparing the cell input of the items and that shown in the legend, you will notice that these are different. The function automatically detects the item description closest to the item description in the cell array, if an exact match is not available.

 

 

 

 

 

 

 

 

 

 

 

 

3.    Specify the Region

There are a few different types of regions that can be specified. These include ‘All urban areas’ and ‘Total country’ but the function is not limited to these areas.

 

 

 

 

 

 

 

 

 

 

 

 

4.    Change the Start Date

The function allows a user to change the start date of the analysis. If a user did not want to look at the complete dataset but only wanted to look at the dates starting from January 2015 for example, the input would be as follows. The start date input is required to be of type datetime.

 

 

 

 

 

 

 

 

 

 

 

 

 

But what if I input an invalid date? The automation capabilities of the function will detect the error and display an error message to the user. The message will tell the user the problem that needs to be corrected and make a suggested to fix the problem. The function makes use of MATLAB’s error handling try and catch blocks.

 

 

 

 

 

5.    Switch the Graphic On and Off

Some users may not want to see the graphic and will have the choice to display the graphic or not. This will result in the desired dates and CPI values to be output into the Workspace for further analysis, if desired. The ‘Plot’ name can be controlled by using either a ‘Yes’ or ‘No’ value statement, with ‘Yes’ being the default parameter.

 

 

 

 

 

6.    Multiple Inputs

Automation allows the user to specify as many input options as required. The user has the choice to override all the default options. The function makes us of MATLAB’s capability to support variable number of inputs.

 

 

 

 

 

 

 

 

 

 

 

 

Conclusion

Automating processes is made easy using MATLAB. This example not only showed how a function created in MATLAB allows a user to download, process and extract data from an online source, but how flexibility and robustness can be built into the function. The real value is in the data analytics. Why waste time doing manual tasks that may be easily automated?

Note:

The functions demonstrated in this live script have been developed as an example for the automation of downloading, processing and extracting of CPI data from the Stats SA webpage. As of 9 April 2020 the Stats SA platform has only made available the data up to February 2020. The functions have not been tested on future releases post 9 April 2020, but will be done once new data is made available. The functions have not been tested for production use, but rather to serve as an example of MATLAB’s automation capabilities.

What Can I Do Next?

Follow us