Parallel Paradise: How to Speed up your Code

Verushen Coopoo, Applications Engineer

Opti-Num Solutions, 2020


In this article we demonstrate the benefit that parallel computing resources can offer within MATLAB. We explore two main approaches to speed up the computation – locally and on a cluster – and review the speed-ups gained.

In a previous post on Constructing the Optimal Portfolio with MATLAB and Smart Beta, we used the strategy of equal risk parity to find the optimal weightings for a portfolio of 20 stocks from the JSE. That optimisation runs in a couple of seconds, which makes it difficult to demonstrate the power of parallel computing. We have therefore created an artificial portfolio that is extremely large, on the order of a few thousand stocks, so that the effect of parallelisation on the speed of the optimisation is clearly visible.

The local computation is done on a Dell laptop, and the cluster computation is performed on Opti-Num Solutions’ High-Performance Computer (HPC). The HPC is a powerful computer used to accelerate computation, with capabilities for GPU and parallel computing. You can download the accompanying live script for this article for the full version of the code.

Local: no parallel

We run the optimisation, serially and locally, using fmincon(). Refer to the original post for more details on the implementation of the optimisation and the choices governing it.

An options object, opts, is created when setting up the optimisation problem. We will modify opts later to perform the optimisation in parallel. The UseParallel option is not set here, because parallel computing is off by default.

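% Solver options; UseParallel is left at its default (false) for the serial run.
opts = optimoptions('fmincon', ...
    'Algorithm','interior-point', ...   % fmincon's default algorithm
    'Display','off', ...                % suppress iteration output
    'TolFun',1e-8);                     % optimality tolerance (legacy option name)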

The optimisation is now run and timed using the tic and toc functions. We will use the elapsed time to compare the efficiency of the different computing configurations.

tic
wOpt = fmincon(fH,x0, [], [], Aeq, beq, lb, ub, [], opts);
local_noParallel = minutes(seconds(toc)) % minutes

local_noParallel = 11.5594

Local: with parallel

Many MATLAB functions offer built-in support for parallel computing, via the Parallel Computing Toolbox. In most, if not all, cases, this entails setting a name-value pair or property such as 'UseParallel' to true. The rest of the algorithm is unaffected. This is truly powerful and allows the user to enable parallel computing easily when it is available. For an extensive list of where parallel computing is supported, refer to this link: Parallel Computing Support in MATLAB and Simulink Products

Since fmincon() is one of the solvers in the Optimization Toolbox that supports parallelisation, we enable it by simply setting the UseParallel property of opts to true.

opts.UseParallel = true;
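For fmincon(), enabling UseParallel means that the finite-difference gradient estimates are evaluated on the workers. For readers who want to try the option without the original post's code, here is a minimal, self-contained sketch. Everything in it is an assumption for illustration: the random covariance matrix, the 20-asset size and the objective are a toy stand-in for a risk-parity style problem, not the objective used in the original post.

```matlab
% Toy stand-in problem (hypothetical data and objective, for illustration only).
rng(0);                                  % reproducible random data
n = 20;                                  % number of assets in the toy portfolio
A = randn(n);
Sigma = (A*A.')/n + 0.01*eye(n);         % symmetric positive-definite covariance

% Risk-parity style objective: spread of the per-asset risk contributions.
riskContrib = @(w) w .* (Sigma*w);
fH = @(w) sum((riskContrib(w) - mean(riskContrib(w))).^2);

x0  = ones(n,1)/n;                       % start from equal weights
Aeq = ones(1,n);   beq = 1;              % weights must sum to one
lb  = zeros(n,1);  ub  = ones(n,1);      % long-only, no leverage

opts = optimoptions('fmincon', ...
    'Algorithm','interior-point', ...
    'Display','off');
opts.UseParallel = true;                 % gradient estimates run on the pool workers

wOpt = fmincon(fH, x0, [], [], Aeq, beq, lb, ub, [], opts);
```

On a problem this small the parallel overhead will probably outweigh any gain; the benefit only appears once each gradient estimate is expensive, as in the large artificial portfolio used in this article.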

Setting UseParallel to true is, by default, enough to start the parallel pool automatically. The pool is the collection of MATLAB worker processes that the work is distributed amongst, by default one per physical core on the local machine. However, since we are working with local resources now and a cluster later on, we explicitly use the parpool() command to instruct MATLAB to open the parallel pool on the local machine. This should only be done if a pool isn’t running already.

parpool('local')
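Because parpool() raises an error when a pool is already open, the check can be made explicit with gcp('nocreate'). A minimal sketch, assuming the 'local' profile used above (recent MATLAB releases name this profile 'Processes'); the variable name pool is our own:

```matlab
% Reuse the current pool if one is open; otherwise start one on the local machine.
pool = gcp('nocreate');          % returns an empty array when no pool is running
if isempty(pool)
    pool = parpool('local');     % 'local' is the profile name used in this article
end
fprintf('Pool is running with %d workers.\n', pool.NumWorkers);
```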

The laptop has four physical cores, so the pool runs with four workers. We call fmincon(), and time the optimisation using tic and toc as before.

tic
wOpt = fmincon(fH,x0, [], [], Aeq, beq, lb, ub, [], opts);
local_Parallel = minutes(seconds(toc)) % minutes

local_Parallel = 6.1373

We can compare the MATLAB processes required to run the optimisation in the two local cases, with and without parallel computing. In Figure 1a (from Windows Task Manager), the optimisation is run locally with parallel computing off, and only one MATLAB process is active. In Figure 1b, the optimisation is run locally with parallel computing on, and four MATLAB processes are running simultaneously. The work done by fmincon() is split amongst these four workers, which cuts the computation time from about 11.6 minutes to about 6.1 minutes.

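[Figure 1: Windows Task Manager during the optimisation. (a) Parallel off: one MATLAB process. (b) Parallel on: four MATLAB processes.]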

Cluster

We can take the computation a step further by scaling it up to a cluster. Opti-Num’s HPC has MATLAB Parallel Server installed, which allows you to speed up your code by offloading it as a job to a remote computing cluster. The benefit is that once the job has been sent off, you can carry on with your MATLAB session while the code runs on the cluster. We will run the computation on the cluster using 4 cores, so that its runtime can be compared directly with the locally run configurations.

Specify the name of the cluster, and explicitly create a reference to it using the parcluster() function.

clusterName = 'Opti-MJS';
myCluster = parcluster(clusterName);

The batch() command is one way to offload code to a cluster, and is used here. Refer to the live script for full implementation details of batch().
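The live script contains the actual call. As a rough sketch only, a batch() submission for this problem might look like the following. The wrapper function runOptimisation is hypothetical: it stands for a user-written function that accepts the problem data and returns the optimal weights. The 'Pool' value of 3 is also an assumption: batch() uses one worker to run the function itself, so a pool of 3 additional workers occupies 4 workers in total.

```matlab
% Hypothetical sketch: runOptimisation is a placeholder for a user-written
% function, e.g. wOpt = runOptimisation(fH, x0, Aeq, beq, lb, ub, opts).
job = batch(myCluster, @runOptimisation, 1, ...   % 1 = number of outputs to return
    {fH, x0, Aeq, beq, lb, ub, opts}, ...         % inputs passed to the function
    'Pool', 3);                                   % 3 pool workers + 1 running the function

wait(job);                   % block until the job has finished
out  = fetchOutputs(job);    % cell array holding the function's outputs
wOpt = out{1};
```

The call to wait() is only needed at the point where the results are required; until then the MATLAB session remains free for other work.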

Extract the time taken for the job to run.

cluster_4cores = job.FinishDateTime - job.StartDateTime

cluster_4cores = duration

00:03:21

Review

Now that we have run the same code in various computing configurations, let’s compare the different computing times.

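[Figure: comparison of computation times. Local, no parallel: 11.56 minutes. Local, parallel (4 cores): 6.14 minutes. Cluster (4 cores): 3 minutes 21 seconds.]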

It is not surprising that the cluster offered the best speed-up: it was roughly 3.5 times faster than running the code locally without parallel computing.
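The speed-up factors behind this comparison follow directly from the timings reported above. As a short worked example (the speedup_ variable names are ours, and the cluster figure is the 00:03:21 duration converted to minutes):

```matlab
% Timings reported in this article, all in minutes.
local_noParallel = 11.5594;
local_Parallel   = 6.1373;
cluster_4cores   = minutes(duration(0, 3, 21));   % 00:03:21 -> 3.35 minutes

% Speed-up relative to the serial, local run.
speedup_local   = local_noParallel / local_Parallel;   % about 1.9x
speedup_cluster = local_noParallel / cluster_4cores;   % about 3.5x

fprintf('Local parallel: %.2fx, cluster: %.2fx\n', speedup_local, speedup_cluster);
```

Neither 4-core run reaches a 4x speed-up. That is typical: communication with the workers and the parts of the algorithm that still run serially both add overhead.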


Another interesting observation is that both the local parallel configuration and the cluster configuration used 4 cores. Why was the cluster faster? This can be attributed to the better hardware on the HPC compared with the local laptop. For example, the CPU on the HPC is an Intel® Core™ i9-9900KS, two generations newer than the laptop’s CPU, and its base speed of 4.01 GHz is roughly twice that of the laptop.

In this article, we have demonstrated the benefits that parallel computing can bring to your analysis. The most significant benefit is the speed-up gained in running code. Further, this is all integrated within the MATLAB environment: only minimal code changes are required to parallelise your code, whether locally or on a cluster. This example sped up a hypothetical portfolio optimisation from about 11.5 minutes to under 3.5 minutes. In cases where massive Monte Carlo simulations have to be performed, or a neural network must be trained on gigabytes’ worth of data, any speed-up is valuable. Do you have any use cases where parallel computing could be used to accelerate computation?
