Speeding Up the Machine Learning Process
October 17, 2019 | KAUST
At a time when big data reigns supreme, training machine learning algorithms to perform certain tasks is often costly and time-consuming. At KAUST’s Visual Computing Center, computer scientist Peter Richtárik and his colleagues have developed a new method for training models with greater efficiency, accuracy and flexibility.
Their method, known as the arbitrary sampling paradigm, provides a shortcut for training machine learning algorithms that use large datasets, which usually take huge amounts of computing power to process. The approach allows practitioners to pinpoint the most useful subset to work with and its optimal size for a given scenario.
“The method specifies which functions in the dataset should be sampled more often and by how much,” says Richtárik. “This means that practitioners can choose the procedure that works best for them, whether they are using a single computer or a distributed network.”
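The idea of sampling some functions more often than others can be illustrated with a minimal sketch. It assumes the objective is an average of n per-data-point functions and uses scalar gradients for brevity. The helper names (`importance_probabilities`, `sampled_gradient`, `expected_estimate`) and the choice of weights are hypothetical, chosen for illustration; they are not the sampling rule derived in the research. The key point is that dividing the sampled gradient by n times its sampling probability keeps the estimate unbiased, whatever the probabilities are.

```python
import random


def importance_probabilities(weights):
    """Turn nonnegative per-function weights into sampling probabilities."""
    total = sum(weights)
    return [w / total for w in weights]


def sampled_gradient(grads, probs, rng):
    """Draw one index i with probability probs[i] and return
    grads[i] / (n * probs[i]), an unbiased estimate of the average gradient."""
    n = len(grads)
    i = rng.choices(range(n), weights=probs, k=1)[0]
    return grads[i] / (n * probs[i])


def expected_estimate(grads, probs):
    """Exact expectation of the estimator above (sum over all indices)."""
    n = len(grads)
    return sum(p * (g / (n * p)) for g, p in zip(grads, probs))


# Illustrative (made-up) scalar gradients of three per-data-point functions.
grads = [1.0, 2.0, 9.0]
true_average = sum(grads) / len(grads)

uniform = importance_probabilities([1.0, 1.0, 1.0])
weighted = importance_probabilities(grads)  # sample large gradients more often

rng = random.Random(0)
print(true_average, expected_estimate(grads, uniform), expected_estimate(grads, weighted))
print(sampled_gradient(grads, weighted, rng))
```

In this toy case, sampling in proportion to the gradient magnitudes makes every single draw equal to the true average, while uniform sampling is only correct on average. Real problems are not this clean, but the example shows why the choice of probabilities matters.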
In standard machine learning procedures, algorithms learn to model datasets by working through the data in repeated steps. In each step, the computer reads every data point before updating and improving its performance in the following step. When analyzing images, for example, the computer must process each pixel in each step in order to reach the closest approximation of the actual data.
To evaluate how accurately an algorithm is modeling the dataset, loss functions are used to measure the amount of error in the model. The average of these losses over the data needs to be minimized to ensure the machine learning model mirrors the actual dataset accurately.
“That’s an astronomical number of data points for the computer to process. It’s also extremely costly and takes a lot of time,” says Richtárik.
With his colleague Xun Qian, Richtárik applied the arbitrary sampling paradigm to stochastic gradient descent (SGD), a widely used machine learning algorithm that works in uniform steps to map out the steepest path to optimal performance.
Using arbitrary sampling, the researchers were able to estimate the steepest direction from smaller subsets of the data, yielding a rough approximation of the correct direction. As a result, they were able to train the algorithm more efficiently than the standard approach of processing the entire dataset.
Another problem Qian and Richtárik tackled with arbitrary sampling was finding which subset, or minibatch, size works most efficiently. In most cases, practitioners work with one subset size through the whole training process, but efficiency tends to drop after reaching a peak value. Qian and Richtárik were able to cut through this process by pinpointing which subset size will work fastest.
“With arbitrary sampling, we show that improving the minibatch size can improve performance,” says Qian. “This means that larger steps can be applied, leading to faster convergence.”
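As a rough illustration of minibatch SGD, the sketch below fits a single weight to noise-free data by least squares. Each step estimates the gradient from a small, uniformly sampled subset instead of the full dataset. The function name `minibatch_sgd`, the toy data, the step size and the batch size are all assumptions made for this example; it uses plain uniform minibatches and does not implement the researchers' method for choosing the sampling or the optimal minibatch size.

```python
import random


def minibatch_sgd(xs, ys, batch_size, step_size, num_steps, seed=0):
    """Minimize the average of 0.5 * (w * x_i - y_i)**2 over the data,
    using a gradient estimated from a random minibatch at every step."""
    rng = random.Random(seed)
    n = len(xs)
    w = 0.0
    for _ in range(num_steps):
        # Uniformly sample batch_size distinct indices.
        batch = rng.sample(range(n), batch_size)
        # Average gradient over the minibatch only, not the full dataset.
        grad = sum((w * xs[i] - ys[i]) * xs[i] for i in batch) / batch_size
        w -= step_size * grad
    return w


# Toy data generated from y = 3 * x, so the best weight is 3.
xs = [0.5 * k for k in range(1, 21)]
ys = [3.0 * x for x in xs]

w = minibatch_sgd(xs, ys, batch_size=4, step_size=0.01, num_steps=500)
print(w)
```

Each step here touches 4 of the 20 data points. Changing `batch_size` trades the cost of a step against the accuracy of its gradient estimate, which is the trade-off the minibatch-size analysis addresses.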
In another analysis [2], Qian and Richtárik applied arbitrary sampling to SAGA, a variance-reduced algorithm that stores information acquired from each training step. While SAGA has faster convergence rates than SGD, developing a version that can be adapted to different situations has remained a challenge.
With the arbitrary sampling paradigm, Qian and Richtárik were able to speed up the convergence rate and reduce the number of steps in the process by improving the minibatch size and choosing the most effective subset.
“This is the first time we were able to analyze the SAGA algorithm faster with the arbitrary sampling paradigm,” says Richtárik. “Before this approach, the algorithm uniformly sampled the data one point at a time at random.”
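For reference, the baseline behavior described in the quote, sampling one data point uniformly at random while reusing stored information, can be sketched as follows. This is a minimal version of standard uniform-sampling SAGA on a one-dimensional least-squares problem, not the arbitrary sampling variant developed by the researchers. The function name `saga`, the toy data and the step size are assumptions made for illustration.

```python
import random


def saga(xs, ys, step_size, num_steps, seed=0):
    """Uniform-sampling SAGA for the average of 0.5 * (w * x_i - y_i)**2.
    A table keeps the most recently computed gradient of every data point."""
    rng = random.Random(seed)
    n = len(xs)
    w = 0.0
    # Stored gradients, initialized at the starting point.
    table = [(w * x - y) * x for x, y in zip(xs, ys)]
    table_avg = sum(table) / n
    for _ in range(num_steps):
        j = rng.randrange(n)  # one point, chosen uniformly at random
        new_grad = (w * xs[j] - ys[j]) * xs[j]
        # Variance-reduced direction: fresh gradient, minus its stored
        # version, plus the average of all stored gradients.
        w -= step_size * (new_grad - table[j] + table_avg)
        # Refresh the table entry and its running average.
        table_avg += (new_grad - table[j]) / n
        table[j] = new_grad
    return w


# Noisy toy data; the least-squares solution is sum(x*y) / sum(x*x).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
w_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

w = saga(xs, ys, step_size=1.0 / 48.0, num_steps=2000)
print(w, w_star)
```

Because the stored gradients correct the noise of each single-point estimate, the iterate settles on the exact least-squares solution even with a constant step size, whereas plain SGD on noisy data would keep fluctuating around it.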
While arbitrary sampling improved the performance of convex versions of SGD and SAGA, which are relatively quick and simple to train, Richtárik and his Ph.D. student Samuel Horvath wanted to test the approach on nonconvex models, which are commonly used in deep neural networks. These types of algorithms are more powerful than convex models, but more difficult to train.
Using nonconvex versions of SAGA and two other methods, stochastic variance reduced gradient (SVRG) and StochAstic Recursive grAdient algorithm (SARAH), Horvath and Richtárik were able to calculate the optimal sampling approach for each of the algorithms, speeding up the training process by an order of magnitude [3]. The results were presented at the International Conference on Machine Learning earlier this year.
“We tested our new methods on a few real-world datasets and models, and we were able to show an improvement not just in theory, but in practice,” says Horvath. “Our approach can be used on many variance-reduced models to make the training process faster.”
The next step for Richtárik and his team is to develop a way to mathematically unify arbitrary sampling with different algorithms, including quantized methods.
“We have taken a big step, but it’s only the first,” says Richtárik. “There is a whole world that is even more general than arbitrary sampling to explore in the next few years.”