Teaching A.I. to See: How Computer Vision Is Reshaping Medicine, Security, YouTube and the NBA

It was inevitable, in a digital era, that AI would eventually come to the NBA.

And the leading-edge technology it uses has a close Stevens connection.

The nuanced stats and video game-style visualizations created by the league since 2016 are required viewing for NBA coaches. Top-secret mixes of algorithms track, slice, dice and analyze every last move (every pick, roll, pass, shot, fast break, dunk and turnover) in every game, scanning live footage from arena cameras and processing it to help coaches make sense of strengths, weaknesses, tendencies and matchups.

Those tools are largely built on technology Stevens computer science researcher Xinchao Wang originally helped design and prototype.

"Basically, we taught software to follow the trajectories, at every instant, of all the individual players on the court as well as the ball," explains Wang, who performed the work while at the Swiss government-backed institution ETH Zurich. "And that prototype ended up as the basis of the system used in the NBA today."

Wang is one of a cluster of Stevens researchers working to rapidly expand the reach of this fascinating technology, known as "computer vision," which uses AI-driven processing operations and algorithms to recognize visual features such as people, crowds, balls in flight or sudden movements that human observers may have missed due to the limits of our eyesight.

"There's an impressive body of computer vision work already developed here at Stevens," notes Stevens computer science chair Giuseppe Ateniese, who oversees much of the university's research in the white-hot field. "And it's only growing."

Boiled down to its essence, computer vision technology harnesses artificial intelligence methods to track and locate objects in space and over time, something people do automatically every moment.

Sound easy? It is and it isn't.

Multiple cameras might first need to be arrayed to capture an event or surveillance scene from varying angles in order to obtain more data. Regardless of how the video footage is captured and collected, however, the resulting frames must each be isolated, converted to data, combined, analyzed again, and output as probabilities.

That’s where machine-learning scientists enter.

"It's basically giving the computer a memory of how an object or agent moves around from instant to instant," says Enrique Dunn, another Stevens researcher in the field.

"I use deep-learning methods to track objects in motion," adds Wang. "They could be anything, but in this case the 'objects' were the ten basketball players and the ball."

For the NBA project, Wang's Swiss team set up mathematical operations that first define the relevant spaces (the basketball court and the air above it) as a series of grids or cells. That's called an occupancy map. He also created processes to describe each individual player as a digital image. Bear in mind that every digital image is, at bottom, nothing more than a bunch of numbers: a complicated, nuanced matrix, yes, but just math nonetheless.
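The occupancy-map idea can be sketched in a few lines of Python. This is an illustration only, not the system Wang's team built: it divides a 94-by-50-foot court (the standard NBA size) into square cells and marks the cells that contain a tracked position. The 2-foot cell size and the names grid_shape, cell_of and occupancy_map are assumptions made for the example, and the sketch covers the floor only, not the air above it.

```python
import math

# Standard NBA court size in feet; the 2-foot cell size is an assumed value.
COURT_LENGTH_FT = 94.0
COURT_WIDTH_FT = 50.0
CELL_SIZE_FT = 2.0


def grid_shape(cell=CELL_SIZE_FT):
    """Return (rows, cols) of the grid that covers the whole court."""
    rows = math.ceil(COURT_WIDTH_FT / cell)
    cols = math.ceil(COURT_LENGTH_FT / cell)
    return rows, cols


def cell_of(x_ft, y_ft, cell=CELL_SIZE_FT):
    """Map a court position (x along the length, y along the width) to (row, col).

    Positions on or beyond the boundary are clamped to the nearest edge cell.
    """
    rows, cols = grid_shape(cell)
    col = max(0, min(int(x_ft // cell), cols - 1))
    row = max(0, min(int(y_ft // cell), rows - 1))
    return row, col


def occupancy_map(positions, cell=CELL_SIZE_FT):
    """Build a grid of 0s and 1s: 1 where at least one position falls in the cell."""
    rows, cols = grid_shape(cell)
    grid = [[0] * cols for _ in range(rows)]
    for x_ft, y_ft in positions:
        row, col = cell_of(x_ft, y_ft, cell)
        grid[row][col] = 1
    return grid


# Hypothetical positions: one player at center court, one near a corner.
court = occupancy_map([(47.0, 25.0), (10.0, 5.0)])
```

A finer grid gives more precise positions at the cost of more cells to process on every frame.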

Then Wang devised algorithms that calculated and recalculated, from one moment to the next, the probability that each cell in the grid is empty or contains something.
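One textbook way to recalculate such a probability is a Bayesian update, shown here as a hedged sketch. It is a stand-in to make the idea concrete; the article does not say which method Wang's algorithms use. The function names and the detector reliabilities p_hit and p_false are assumptions.

```python
def update_cell(prior, detected, p_hit=0.9, p_false=0.1):
    """Return the updated probability that a cell is occupied.

    prior    -- probability the cell was occupied before this frame
    detected -- True if the detector fired for this cell in this frame
    p_hit    -- assumed P(detection | cell occupied)
    p_false  -- assumed P(detection | cell empty)
    """
    if detected:
        like_occupied, like_empty = p_hit, p_false
    else:
        like_occupied, like_empty = 1.0 - p_hit, 1.0 - p_false
    numerator = like_occupied * prior
    return numerator / (numerator + like_empty * (1.0 - prior))


def update_grid(grid, detections, p_hit=0.9, p_false=0.1):
    """Apply update_cell to every cell of a grid of probabilities."""
    return [
        [update_cell(p, d, p_hit, p_false) for p, d in zip(prob_row, det_row)]
        for prob_row, det_row in zip(grid, detections)
    ]


# Start fully uncertain (0.5 everywhere), then observe one frame of detections.
beliefs = [[0.5, 0.5], [0.5, 0.5]]
beliefs = update_grid(beliefs, [[True, False], [False, False]])
```

Feeding each new frame's detections back in as the next update is what lets the estimate sharpen over time: repeated detections in the same cell push its probability toward 1, and repeated misses push it toward 0.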

By tracking the changes in these complexes of numbers, each representing a characteristic of a frame of live video, across the physical grid and over split seconds of time, Wang's new system quickly learned to figure out who was who, who was moving where and how fast, and who was touching, passing or shooting the ball at what angle and with what force.
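Once a player's position is known at each instant, "how fast" reduces to simple arithmetic: distance covered divided by time elapsed. The sketch below assumes a hypothetical track format of (seconds, x in feet, y in feet) samples; it illustrates the calculation only and is not taken from Wang's system.

```python
import math


def speeds(track):
    """Return the speed, in feet per second, between consecutive samples.

    track -- list of (t_seconds, x_ft, y_ft) tuples in time order
    """
    result = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        distance = math.hypot(x1 - x0, y1 - y0)
        result.append(distance / dt)
    return result


# Hypothetical track: a player covers 5 feet in half a second, then stands still.
player_track = [(0.0, 0.0, 0.0), (0.5, 3.0, 4.0), (1.0, 3.0, 4.0)]
player_speeds = speeds(player_track)
```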

"Previously, this was always done by hand, which is amazing," says Wang. "People actually had to sit there and watch a live game, or go back and watch every action of an entire game on tape, and write down everything that happened, every single movement and pass and shot. That became the data that could be analyzed by coaches. It was a lot of work."

Wang's software instantly streamlined the process, and removed human observational bias as a bonus. In later tests on real-time video of volleyball players, he developed further methods that track and summarize action and game play more accurately… and much, much faster.

"There are only a few seconds' delay between live action and a good tracking report and visualization by this system," he says. "That's pretty good."

More Efficient Hospital Staffing, Safer Public Spaces

The usefulness of the new technology doesn’t end at an NBA tipoff, though.

Wang has used similar machine-learning methods to track processes as diverse as the efficiency of operating-room procedures (analyzed from videos made, with permission, in a German medical center); the movement of in vitro human stem cells magnified and photographed using high-powered microscopes; and the motion of people and objects at transit stations and garages, with an eye toward security applications.

"We already see many potential uses for this technology," notes Wang, who has collaborated with experts and universities worldwide. "In the medical case, we would like to understand work flow better, and try to make predictions and optimize operations. For security, once you have accumulated tracking data, we can teach the machine to identify potentially suspicious poses or actions."

The AI, he points out, could be programmed to run and analyze sample videos of known criminal and terrorist acts and threatening situations, as well as footage of harmless crowds and individuals, to learn the mathematical patterns of bad or suspicious acts: a person depositing a backpack-sized object in a station and walking away from it slowly, say, or an individual slowly circling a parked car for a long time.

Then, when those processes spot the same patterns in live video, they could be tuned to flag them automatically, in real time.
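To make the flagging step concrete, here is a deliberately simple sketch. It uses a hand-written rule rather than the learned patterns Wang describes: it flags a tracked person who stays within a set distance of a point of interest, such as a parked car, for longer than a threshold. The track format, the function names and the default radius and duration are all assumptions chosen for illustration.

```python
import math


def longest_dwell(track, center, radius):
    """Return the longest continuous time, in seconds, spent within radius of center.

    track  -- list of (t_seconds, x, y) tuples in time order
    center -- (x, y) point of interest, in the same distance units as the track
    """
    best = 0.0
    start = None  # time at which the current stay inside the radius began
    for t, x, y in track:
        inside = math.hypot(x - center[0], y - center[1]) <= radius
        if inside:
            if start is None:
                start = t
            best = max(best, t - start)
        else:
            start = None
    return best


def is_loitering(track, center, radius=3.0, min_seconds=60.0):
    """Flag a track that lingers near center for at least min_seconds."""
    return longest_dwell(track, center, radius) >= min_seconds


# Hypothetical tracks around a parked car at the origin.
lingerer = [(0, 0.0, 0.0), (30, 1.0, 1.0), (60, 2.0, 0.0), (90, 10.0, 10.0)]
passerby = [(0, 0.0, 0.0), (5, 10.0, 10.0)]
```

A learned model would replace the fixed radius and duration with patterns inferred from example footage, but the output is the same kind of yes-or-no flag raised on live tracking data.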

"This all can be used to help authorities plan and react more quickly to security threats," says Wang.

In addition to sports, transit and workplace video analysis, Wang is also working on an AI-powered innovation that can sharpen videos, such as those on YouTube or those captured by security cameras, into super-high resolution, and on another that intelligently corrects distortions in images taken by cameras with fisheye-type lenses.

"There's always another area to explore," he says. "There are always new problems and challenges."



Copyright © 2020 I-Connect007. All rights reserved.