Teaching A.I. to See: How Computer Vision Is Reshaping Medicine, Security, YouTube and the NBA


It was inevitable, in a digital era, that AI would eventually come to the NBA.

And the leading-edge technology it uses has a close Stevens connection.

The nuanced stats and video game-style visualizations created by the league since 2016 are required viewing for NBA coaches. Top-secret mixes of algorithms track, slice, dice and analyze every last move (every pick, roll, pass, shot, fast break, dunk and turnover) in every game, scanning live footage from arena cameras and processing it to help coaches make sense of strengths, weaknesses, tendencies and matchups.

Those tools are largely built on technology Stevens computer science researcher Xinchao Wang originally helped design and prototype.

"Basically, we taught software to follow the trajectories, at every instant, of all the individual players on the court as well as the ball," explains Wang, who performed the work while at the Swiss government-backed institution ETH Zurich. "And that prototype ended up as the basis of the system used in the NBA today."

Wang is one of a cluster of Stevens researchers working to rapidly expand the reach of this fascinating technology, known as "computer vision," which uses AI-driven processing operations and algorithms to recognize visual features such as people, crowds, balls in flight or sudden movements that human observers may have missed due to the limits of our eyesight.

"There's an impressive body of computer vision work already developed here at Stevens," notes Stevens computer science chair Giuseppe Ateniese, who oversees much of the university's research in the white-hot field. "And it's only growing."

Boiled down to its essence, computer vision technology harnesses artificial intelligence methods to track and locate objects in space and over time, something people do automatically every moment.

Sound easy? It is and it isn't.

Multiple cameras might first need to be arrayed to capture an event or surveillance scene from varying angles in order to obtain more data. Regardless of how the video footage is captured and collected, however, the resulting frames must each be isolated, converted to data, combined, analyzed again, and output as probabilities.

That’s where machine-learning scientists enter.

"It's basically giving the computer a memory of how an object or agent moves around from instant to instant," says Enrique Dunn, another Stevens researcher in the field.

"I use deep-learning methods to track objects in motion," adds Wang. "They could be anything, but in this case the 'objects' were the ten basketball players and the ball."

For the NBA project, Wang's Swiss team set up mathematical operations that first define the relevant spaces (the basketball court and the air above it) as a series of grids or cells. That's called an occupancy map. He also created processes to describe each individual player as a digital image. Bear in mind that every digital image is, at bottom, nothing more than a bunch of numbers: a complicated, nuanced matrix, yes, but just math nonetheless.
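To make the idea of an occupancy map concrete, here is a minimal sketch in Python. It is an illustration only, not the system Wang's team built: the 2-foot cell size, the function names and the binary occupied/empty encoding are all assumptions made for this example. The only outside fact used is the standard NBA court size of 94 by 50 feet.

```python
import math

COURT_LENGTH_FT = 94.0  # standard NBA court length
COURT_WIDTH_FT = 50.0   # standard NBA court width


def grid_shape(cell_size_ft=2.0):
    """Return (rows, cols) of a grid covering the court (cell size is an assumption)."""
    rows = math.ceil(COURT_WIDTH_FT / cell_size_ft)
    cols = math.ceil(COURT_LENGTH_FT / cell_size_ft)
    return rows, cols


def cell_for_position(x_ft, y_ft, cell_size_ft=2.0):
    """Map a floor position in feet to the (row, col) of the cell containing it."""
    if not (0.0 <= x_ft <= COURT_LENGTH_FT and 0.0 <= y_ft <= COURT_WIDTH_FT):
        raise ValueError("position is outside the court")
    rows, cols = grid_shape(cell_size_ft)
    # Clamp so that points exactly on the far boundary fall in the last cell.
    row = min(int(y_ft // cell_size_ft), rows - 1)
    col = min(int(x_ft // cell_size_ft), cols - 1)
    return row, col


def occupancy_map(positions, cell_size_ft=2.0):
    """Build a grid of 0s and 1s, with 1 marking each cell that holds a player or the ball."""
    rows, cols = grid_shape(cell_size_ft)
    grid = [[0] * cols for _ in range(rows)]
    for x_ft, y_ft in positions:
        row, col = cell_for_position(x_ft, y_ft, cell_size_ft)
        grid[row][col] = 1
    return grid


if __name__ == "__main__":
    players = [(47.0, 25.0), (10.5, 8.0), (88.0, 42.0)]  # made-up positions
    grid = occupancy_map(players)
    print(sum(map(sum, grid)), "cells occupied out of", len(grid) * len(grid[0]))
```

A real tracker would not know these positions in advance. It would hold a probability for each cell rather than a hard 0 or 1, which is the step the article turns to next.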

Then Wang devised algorithms that calculated and recalculated, from one moment to the next, the probability that each cell in the grid is either empty or contains something.
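One textbook way to recalculate a cell's occupancy probability is a Bayesian update, sketched below. This is a stand-in chosen for illustration, not a description of Wang's algorithm. The function name and the two sensor parameters (`p_hit`, the chance a camera detection fires when the cell really is occupied, and `p_false_alarm`, the chance it fires when the cell is empty) are hypothetical.

```python
def update_occupancy(prior, detected, p_hit=0.9, p_false_alarm=0.1):
    """Bayes' rule for one cell: return P(occupied | new observation).

    prior         -- probability the cell was occupied before this frame
    detected      -- True if the detector fired on this cell in this frame
    p_hit         -- assumed P(detection | occupied)
    p_false_alarm -- assumed P(detection | empty)
    """
    if detected:
        like_occupied, like_empty = p_hit, p_false_alarm
    else:
        like_occupied, like_empty = 1.0 - p_hit, 1.0 - p_false_alarm
    numerator = like_occupied * prior
    evidence = numerator + like_empty * (1.0 - prior)
    return numerator / evidence


if __name__ == "__main__":
    belief = 0.5  # start undecided about this cell
    for frame, fired in enumerate([True, True, False, True]):
        belief = update_occupancy(belief, fired)
        print(f"frame {frame}: detection={fired}, P(occupied)={belief:.3f}")
```

Repeating this update for every cell on every frame gives a map of probabilities that shifts as players move. A single missed or spurious detection nudges the belief rather than flipping it outright.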

By tracking the changes in these complexes of numbers, each representing a characteristic of a frame of live video, across the physical grid and over split seconds of time, Wang's new system quickly learned to figure out who was who, who was moving where and how fast, and who was touching, passing or shooting the ball at what angle and with what force.
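Once a player's position is known in two successive frames, "where and how fast" reduces to simple arithmetic. The sketch below assumes positions in feet and a frame rate of 25 frames per second. Both are illustrative choices, and the function name is hypothetical.

```python
import math


def speed_and_heading(prev_pos, curr_pos, fps=25.0):
    """Estimate speed (feet per second) and heading (degrees) between two frames.

    prev_pos, curr_pos -- (x, y) floor positions in feet in consecutive frames
    fps                -- assumed camera frame rate
    """
    dx = curr_pos[0] - prev_pos[0]
    dy = curr_pos[1] - prev_pos[1]
    speed = math.hypot(dx, dy) * fps            # distance per frame times frames per second
    heading = math.degrees(math.atan2(dy, dx))  # 0 degrees points along the +x axis
    return speed, heading


if __name__ == "__main__":
    speed, heading = speed_and_heading((40.0, 20.0), (40.3, 20.4))
    print(f"speed = {speed:.1f} ft/s, heading = {heading:.1f} degrees")
```

In practice, per-frame estimates like these are noisy, so a tracker would typically smooth them over several frames before reporting them.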

"Previously, this was always done by hand, which is amazing," says Wang. "People actually had to sit there and watch a live game, or go back and watch every action of an entire game on tape, and write down everything that happened, every single movement and pass and shot. That became the data that could be analyzed by coaches. It was a lot of work."

Wang's software instantly streamlined the process, and removed human observational bias as a bonus. In later tests processing and analyzing real-time video of volleyball players, he has since developed additional methods that can track and summarize action and game play more accurately, and much, much faster.

"There are only a few seconds' delay between live action and a good tracking report and visualization by this system," he says. "That's pretty good."

More Efficient Hospital Staffing, Safer Public Spaces

The usefulness of the new technology doesn’t end at an NBA tipoff, though.

Wang has used similar machine-learning methods to track processes as diverse as the efficiency of operating-room procedures, by analyzing videos made, with permission, in a German medical center; the movement of in vitro human stem cells magnified and photographed using high-powered microscopes; and the motion of people and objects at transit stations and garages, with an eye toward security applications.

"We already see many potential uses for this technology," notes Wang, who has collaborated with experts and universities worldwide. "In the medical case, we would like to understand work flow better, and try to make predictions and optimize operations. For security, once you have accumulated tracking data, we can teach the machine to identify potentially suspicious poses or actions."

The AI, he points out, could be programmed to run and analyze sample videos of known criminal and terrorist acts and threatening situations, as well as footage of harmless crowds and individuals, to learn the mathematical patterns of bad or suspicious acts: a person depositing a backpack-sized object in a station and walking away from it slowly, say, or an individual slowly circling a parked car for a long time.

Then, when those processes spot the same patterns in live video, they could be tuned to flag them automatically, in real time.
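The system described above would learn its patterns from example footage. As a much simpler stand-in, the sketch below hand-codes one rule for the abandoned-backpack case: raise a flag when the tracked owner has stayed farther than a set distance from the object for several consecutive frames. The function name, the 10-foot distance and the three-frame minimum are all hypothetical values chosen for illustration.

```python
def first_unattended_frame(owner_distances_ft, max_distance_ft=10.0, min_frames=3):
    """Return the index of the frame at which an object counts as unattended, else None.

    owner_distances_ft -- per-frame distance (feet) between the object and its owner,
                          as a tracker might report it
    max_distance_ft    -- assumed distance beyond which the owner counts as "away"
    min_frames         -- assumed number of consecutive "away" frames needed to flag
    """
    consecutive_away = 0
    for frame_index, distance in enumerate(owner_distances_ft):
        if distance > max_distance_ft:
            consecutive_away += 1
        else:
            consecutive_away = 0  # owner came back, so restart the count
        if consecutive_away >= min_frames:
            return frame_index
    return None


if __name__ == "__main__":
    walking_away = [1.0, 5.0, 12.0, 15.0, 20.0, 30.0]
    print("flag raised at frame:", first_unattended_frame(walking_away))
```

A learned model replaces the fixed thresholds with patterns inferred from labeled video, but the output has the same shape: a frame-by-frame decision on whether to alert someone.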

"This all can be used to help authorities plan and react more quickly to security threats," says Wang.

In addition to sports, transit and workplace video analysis, Wang is also working on an AI-powered innovation that can sharpen videos, such as those on YouTube or those captured by security cameras, to super-high resolution, and another that intelligently corrects distortions in images taken by cameras with fisheye-type lenses.

"There's always another area to explore," he says. "There are always new problems and challenges."


Copyright © 2019 I-Connect007. All rights reserved.