Simulated human eye movement aims to train metaverse platforms

The results on EyeSyn, the team's software for generating synthetic eye movements, have been accepted and will be presented May 4-6, 2022, at the International Conference on Information Processing in Sensor Networks (IPSN), a leading annual forum on research in networked sensing and control.

“If you’re interested in detecting whether a person is reading a comic book or advanced literature by looking at their eyes alone, you can do that,” said Maria Gorlatova, the Nortel Networks Assistant Professor of Electrical and Computer Engineering at Duke.

“But training that kind of algorithm requires data from hundreds of people wearing headsets for hours at a time,” Gorlatova added. “We wanted to develop software that not only reduces the privacy concerns that come with gathering this sort of data, but also allows smaller companies who don’t have those levels of resources to get into the metaverse game.”

The poetic insight describing eyes as the windows to the soul has been repeated since at least Biblical times for good reason: The tiny details of how our eyes move and our pupils dilate provide a surprising amount of information. Human eyes can reveal if we’re bored or excited, where our concentration is focused, whether we’re expert or novice at a given task, or even if we’re fluent in a specific language.

“Where you’re prioritizing your vision says a lot about you as a person, too,” Gorlatova said. “It can inadvertently reveal sexual and racial biases, interests that we don’t want others to know about, and information that we may not even know about ourselves.”

Eye movement data is invaluable to companies building platforms and software in the metaverse. For example, reading a user’s eyes allows developers to tailor content to engagement responses or reduce resolution in their peripheral vision to save computational power.

With this wide range of complexity, creating virtual eyes that mimic how an average human responds to a wide variety of stimuli sounds like a tall task. To climb the mountain, Gorlatova and her team — including former postdoctoral associate Guohao Lan, who is now an assistant professor at the Delft University of Technology in the Netherlands, and current PhD student Tim Scargill — dove into the cognitive science literature that explores how humans see the world and process visual information.

For example, when a person is watching someone talk, their eyes alternate between the person’s eyes, nose and mouth for various amounts of time. When developing EyeSyn, the researchers created a model that extracts where those features are on a speaker and programmed their virtual eyes to statistically emulate the time spent focusing on each region.
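The idea of statistically emulating time spent on each facial region can be illustrated with a minimal Python sketch. This is not the EyeSyn implementation: the region weights, the mean dwell times, the exponential dwell-time distribution, and the names synthesize_fixations and dwell_share are all assumptions made up for illustration.

```python
import random

# Hypothetical facial regions with illustrative (not measured) parameters:
# weight = relative chance the gaze moves to the region,
# mean_dwell_s = average fixation duration in seconds.
REGIONS = {
    "eyes":  {"weight": 0.5, "mean_dwell_s": 0.40},
    "nose":  {"weight": 0.2, "mean_dwell_s": 0.25},
    "mouth": {"weight": 0.3, "mean_dwell_s": 0.35},
}


def synthesize_fixations(total_s, rng):
    """Return a list of (region, start_s, duration_s) covering total_s seconds."""
    names = list(REGIONS)
    weights = [REGIONS[n]["weight"] for n in names]
    fixations = []
    t = 0.0
    previous = None
    while t < total_s:
        region = rng.choices(names, weights=weights, k=1)[0]
        if region == previous:
            continue  # the gaze alternates, so force a move to a different region
        # Assumed dwell-time model: exponential with the region's mean.
        duration = rng.expovariate(1.0 / REGIONS[region]["mean_dwell_s"])
        duration = min(duration, total_s - t)  # clip the last fixation
        fixations.append((region, t, duration))
        t += duration
        previous = region
    return fixations


def dwell_share(fixations):
    """Fraction of total viewing time spent on each region."""
    totals = {name: 0.0 for name in REGIONS}
    for region, _, duration in fixations:
        totals[region] += duration
    overall = sum(totals.values())
    return {name: value / overall for name, value in totals.items()}


if __name__ == "__main__":
    trace = synthesize_fixations(60.0, random.Random(0))
    print(len(trace), "fixations")
    print(dwell_share(trace))
```

Running the generator with different seeds yields different but statistically similar traces, which is the property that lets many runs add up to a training set.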

“If you give EyeSyn a lot of different inputs and run it enough times, you’ll create a data set of synthetic eye movements that is large enough to train a (machine learning) classifier for a new program,” Gorlatova said.

To test the accuracy of their synthetic eyes, the researchers turned to publicly available data. They first had the eyes “watch” videos of Dr. Anthony Fauci addressing the media during press conferences and compared the resulting gaze patterns with eye movement data from actual viewers. They also compared a virtual dataset of their synthetic eyes looking at art with actual datasets collected from people browsing a virtual art museum. The results showed that EyeSyn was able to closely match the distinct patterns of actual gaze signals and simulate the different ways different people’s eyes react.

According to Gorlatova, this level of performance is good enough for companies to use EyeSyn's synthetic data as a baseline to train new metaverse platforms and software. Starting from that basic level of competency, commercial software can then achieve even better results by personalizing its algorithms after interacting with specific users.

“The synthetic data alone isn’t perfect, but it’s a good starting point,” Gorlatova said. “Smaller companies can use it rather than spending the time and money of trying to build their own real-world datasets (with human subjects). And because the personalization of the algorithms can be done on local systems, people don’t have to worry about their private eye movement data becoming part of a large database.”

This research was funded by the National Science Foundation (CSR-1903136, CNS-1908051, IIS-2046072) and an IBM Faculty Award.
