Below is an article originally published on Meta's blog. Visit the Meta company page on PowerToFly to see their open positions and learn more.
Back in the early Rift days (and even the drop-in phone days), it was often said that there are two types of people: those who believe that virtual reality is going to change the world—and those who haven’t tried it yet. Oculus Go gave us a glimpse of the promise of untethered VR, which was then fully realized by the original Quest and improved upon by Quest 2. Quest Pro upped the ante with full-color Passthrough, eye tracking, and more. And just last year, Meta Quest 3 ushered in a new era of mass-market mixed reality.
From gaming and entertainment to fitness, productivity, education, and beyond, there are few if any aspects of our lives that this technology doesn’t stand to benefit. AR, VR, MR, XR: call it what you will, the common denominator is something akin to magic, the kind you have to see to believe.
There’s another well-worn saying that magicians never reveal their secrets. But today, we’re taking you behind the scenes for a look at some of the magic under the hood.
At Meta Connect 2023, Meta CEO & Founder Mark Zuckerberg explored the intersection of AI and metaverse technologies—particularly in the form of products that are making their way into the hands of millions of people across the globe. AI has been part of our DNA since the company’s early years, and it’s at the heart of our consumer hardware. We’ve spent the better part of the last decade pioneering new AI advances that drive MR, hand and body tracking, eye tracking, and Natural Facial Expressions, which, in turn, have made these devices possible.
All-in-one VR → mass-market MR
Breakthroughs in areas like computer vision and simultaneous localization and mapping (SLAM) allowed us to cut the cord and bring the original Quest—the world’s first 6DOF standalone VR headset with inside-out tracking—to market. This unlocked the ability to truly explore virtual worlds and move about your space freely, no wires, consoles, or battery packs required. Nearly five years later, AI remains virtually everywhere in modern XR systems, and Quest 3 (and the new generation of MR devices it represents) is no exception.
“Building technology that can offer magical experiences to the user within the constraints of form-factor, thermal limits, and cost is extremely challenging,” explains Reality Labs Senior Research Scientist Manager of XR Tech Rakesh Ranjan. “We optimized our software to squeeze every bit of the performance from the hardware. The improvement in Passthrough quality on Quest 3 has made it possible for the first time for our users to have exciting experiences in full-color stereoscopic mixed reality. Most of the 3D geometry improvements in Passthrough come from AI. As humans, we don’t like to feel disconnected from our surroundings, and more accurate MR opens up the door to magical experiences that blend the physical and the virtual worlds. What Quest 3 can do within its constraints is almost a miracle.”
For a headset to convincingly blend the virtual and physical worlds, it’s essential that the device digitally reconstruct the environment around you in 3D. AI is what lets Quest headsets do this, recognizing the surrounding room and dynamically localizing the device within it. Over time, we’ve improved the technology that powers Quest tracking and localization, including an AI-based approach that last year replaced the traditional computer vision algorithms in our systems. This led to significantly improved environment recall (read: fewer interrupted play sessions), especially under challenging lighting and viewpoints.
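The specifics of that system aren’t public, but the core idea of AI-based relocalization, matching what the cameras currently see against a learned memory of the room, can be sketched roughly like this (the function, thresholds, and descriptor model are illustrative assumptions, not the production pipeline):

```python
import numpy as np

def relocalize(query_descriptor, keyframe_descriptors, keyframe_poses, min_score=0.8):
    """Match the current view's embedding against embeddings stored when the
    room was first mapped, and return the pose of the best-matching keyframe.
    Returns None if nothing scores high enough (the room isn't recognized)."""
    q = query_descriptor / np.linalg.norm(query_descriptor)
    k = keyframe_descriptors / np.linalg.norm(keyframe_descriptors, axis=1, keepdims=True)
    scores = k @ q  # cosine similarity against every stored keyframe
    best = int(np.argmax(scores))
    if scores[best] < min_score:
        return None  # recall failed; the system would fall back to a fresh room setup
    # A production system would refine this coarse match with local feature
    # matching and geometric verification; here we just return the stored pose.
    return keyframe_poses[best]
```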
“The biggest lesson we’ve learned from Passthrough is that mathematically optimum points don’t necessarily mean perceptual optimums,” says Reality Labs Director of Engineering for XR Tech Ricardo Silveira Cabral. “Having the human visual system in the loop makes it a completely different problem. We overcame this by relying on experimentation more, trying out things that we would have scratched by just looking at the theory. Demos win arguments!”
AI also plays a key role in powering Quest 3’s next-generation Touch Plus controllers. They use a cutting-edge AI model combined with signals from constellation tracking, which relies on infrared LEDs hidden around the top cover and bottom of the controller grip. The computer vision and machine learning within this system help estimate the controllers’ position in 3D space so that tracking rings are no longer needed.
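Meta hasn’t published the estimator itself, but the general pattern, fusing a fast inertial prediction with slower optical fixes from the LED constellation, can be illustrated with a deliberately simple blend (all names and numbers here are hypothetical):

```python
import numpy as np

def fuse_controller_position(prev_pos, imu_velocity, dt, optical_pos=None, blend=0.15):
    """Fuse a fast inertial prediction with an optical fix from the LED
    constellation whenever the headset cameras can see the controller."""
    predicted = prev_pos + imu_velocity * dt  # dead-reckoned prediction each frame
    if optical_pos is None:
        # Controller out of view (e.g., behind the user's back): a learned
        # motion model would constrain drift here; this sketch just coasts.
        return predicted
    # Pull the prediction toward the camera-based measurement.
    return (1.0 - blend) * predicted + blend * np.asarray(optical_pos)
```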
And the tracked keyboards feature from Quest 2 and Pro also launched on Quest 3, leveraging computer vision models and allowing you to see a 3D representation of both the keyboard and your hands so you can be more productive in VR—or MR, as the case may be.
“Machine learning Depth has been the team’s most important project for years, and we had to make fundamental advances in ML modeling, acceleration, and more,” says Reality Labs Director of Engineering for XR Tech Paul Furgale. “This allowed us to launch advanced MR features on Quest 3 despite not having state-of-the-art depth hardware. And after Quest 3, I’m convinced that Passthrough and MR will be a standard feature on all future headsets.”
Hands on with hand tracking
Since we shipped hand tracking on Quest in 2019, it has unlocked new mechanics for developers and creators alike while introducing the technology to a mass audience for the first time. We deploy custom neural network architectures to continually estimate the position and movement of people’s hands. Not only does this lower the barrier to entry by letting people navigate VR without learning a new control scheme, it also gives them entirely new, more intuitive, and more immersive ways to interact with virtual worlds.
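As a rough illustration of what sits on top of those per-frame keypoint predictions (the indices and threshold below are assumptions, not the shipped gesture recognizer), something like a pinch gesture can be derived from just two predicted fingertips:

```python
import numpy as np

THUMB_TIP, INDEX_TIP = 4, 8  # indices in a common 21-keypoint hand layout

def detect_pinch(hand_keypoints_3d, threshold_m=0.02):
    """Given the 21 predicted 3D hand keypoints for one frame (in meters),
    report whether the thumb and index fingertips are close enough to count
    as a pinch -- the kind of signal system gestures are built on."""
    gap = np.linalg.norm(hand_keypoints_3d[THUMB_TIP] - hand_keypoints_3d[INDEX_TIP])
    return gap < threshold_m
```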
With Hand Tracking 2.2, we improved latency by up to 40% in typical usage and up to 75% during fast movement. We made additional improvements to make fast-paced games even more responsive. You can see the tech in action with the Move Fast demo app.
And the operating system that powers Quest is integrating hands directly into the way you use your device. Last year, we shipped Direct Touch, which lets people interact directly with the system and apps using their hands in a more natural and intuitive way. We also introduced a new gesture for hands-based locomotion in First Hand to address one of the hardest problems in XR: If you aren’t going to just sit on your couch, how do you move around naturally and comfortably? We think hands might be a viable answer.
Inside-out body tracking + generative legs
Late last year, we released Inside-Out Body Tracking (IOBT), the first vision-based tracking system for mapping out upper-body movements with Quest 3, as well as Generative Legs, which generates plausible leg motions for all Quest devices.
Leveraging Quest 3’s side cameras and advanced computer vision algorithms to track wrist, elbow, shoulder, and torso movements, IOBT significantly increases tracking accuracy for fitness, gaming, and social presence use cases. Generative Legs makes it easier for developers to create full-body characters by synthesizing plausible leg motion from the position of the user’s upper body, thanks to the power of AI, with no additional sensors or hardware required.
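The generative model itself isn’t public, but the contract it fulfills for developers is easy to sketch: given the tracked upper body, synthesize believable joints below the hips. The placeholder below only poses a static standing leg (a real system predicts motion with a learned model; the names, units, and proportions are illustrative):

```python
import numpy as np

def placeholder_legs(hip_center, facing_dir, leg_length=0.9, stance_width=0.3):
    """Place hip, knee, and ankle joints for a neutral standing pose beneath the
    tracked upper body (y-up, units in meters). A real generative model predicts
    leg motion conditioned on the upper-body trajectory instead of a fixed pose."""
    facing = facing_dir / np.linalg.norm(facing_dir)
    side_axis = np.cross(facing, np.array([0.0, 1.0, 0.0]))
    legs = {}
    for side, sign in (("left", -1.0), ("right", 1.0)):
        hip = hip_center + sign * 0.5 * stance_width * side_axis
        legs[side] = {
            "hip": hip,
            "knee": hip + np.array([0.0, -0.5 * leg_length, 0.0]),
            "ankle": hip + np.array([0.0, -leg_length, 0.0]),
        }
    return legs
```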
“The biggest challenge was to develop the body tracking system with smooth body motion predictions when there are limitations in the headset camera coverage, which mean that the user’s body parts can go out of view or get occluded,” explains Reality Labs Senior Research Scientist Manager of XR Tech Lingling Tao. “The unique inside-out views of the body are drastically different from what a typical camera-based body tracking system uses. Our team tackled this by designing novel AI algorithms and building the hardware systems needed to collect the data for training machine learning models.”
Eye tracking and natural facial expressions
With eye tracking and Natural Facial Expressions, Quest Pro brought us a step closer to showing our authentic selves in the metaverse. At Connect 2022, we introduced Eye-Tracked Foveated Rendering (ETFR) for Quest Pro, a graphics optimization that uses eye gaze to keep full pixel density only in the foveal region the user is actually looking at, which results in substantial GPU savings for developers.
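To make the idea concrete, here is a simplified sketch of how a foveation map could be derived from a gaze point; the tile counts, radii, and density levels are illustrative assumptions, not Quest Pro’s actual parameters:

```python
import numpy as np

def foveation_map(gaze_uv, tiles=(16, 16), inner_radius=0.15, outer_radius=0.35):
    """Build a per-tile shading-density map: full resolution near the gaze
    point, progressively coarser toward the periphery. gaze_uv is the gaze
    location in normalized screen coordinates [0, 1] x [0, 1]."""
    v, u = np.meshgrid(np.linspace(0, 1, tiles[0]),
                       np.linspace(0, 1, tiles[1]), indexing="ij")
    dist = np.hypot(u - gaze_uv[0], v - gaze_uv[1])
    density = np.full(tiles, 0.25)       # far periphery: quarter density
    density[dist < outer_radius] = 0.5   # mid region: half density
    density[dist < inner_radius] = 1.0   # fovea: full density
    return density
```

With fixed foveation, the full-resolution region has to be large enough to cover anywhere the eye might plausibly look; with eye tracking it follows the gaze every frame and can shrink, which is where most of the GPU savings come from.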
“Shipping computer vision- or AI-related features is hard, simply because there’s no simple way to evaluate quality,” explains Reality Labs Senior Engineering Manager of XR Tech Sarah Amsellem. “It’s always a combination of quantitative metrics, qualitative analysis, user studies, etc. As these features need to run in real time, we also need to keep in mind other constraints like memory consumption or compute budget.”
“Quest Pro comes with a total of 16 image sensors in the box (including controllers), powering machine perception tasks such as hand, face, and eye tracking running at hundreds of times per second,” says Reality Labs VP and XR Architect Oskar Linde. “A lot of innovation went into designing the AI algorithms, as well as training and tuning.”
“It’s extremely challenging to ship a computer vision technology that works perfectly for all users under all usage conditions,” adds Reality Labs Senior Engineering Manager of XR Tech Dmitri Model. “Getting closer to that level of performance robustness was the main challenge we tried to overcome with Quest Pro. And I’m proud that we have achieved best-in-class performance as measured in a variety of internal and external benchmarking.”
Packaging AI in the ultimate form factor
We’ve long said that all-day wearable, stylish, comfortable glasses are the ultimate form factor for delivering compelling AR experiences with an interface driven by contextualized AI. We believe this is the next great paradigm shift in human-oriented computing: glasses that can see and hear the world from your perspective and provide useful information as you go about your day—all in a more natural and less intrusive way than today’s smartphones.
We’re exploring two parallel paths to get there: virtual and mixed reality headsets like Quest 3, which feature the displays, virtual content, and augmentation we think will be key to the next computing platform, and smart glasses, which achieve the desired form factor and are beginning to incorporate generative and multimodal AI* to help us better navigate the world around us.
AI lets you use a more natural way of speaking when interacting with Ray-Ban Meta smart glasses, rather than being limited to specific voice commands. Large language models (LLMs) can often give very long, detailed answers—something no one wants to sit through being read aloud while on the go. So we use AI to give short and sweet answers to your questions, and we’re always looking for ways to improve the experience further.
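The exact mechanism isn’t public, but the general recipe for keeping spoken answers short pairs explicit instructions to the model with a post-processing cap before text-to-speech. A minimal illustration (the prompt and word limit are assumptions, not the shipped configuration):

```python
VOICE_SYSTEM_PROMPT = (
    "You are a voice assistant heard through smart glasses. "
    "Answer in one or two short sentences. No lists, no markdown."
)

def trim_for_speech(answer: str, max_words: int = 40) -> str:
    """Safety net on top of the prompt: cap the spoken reply so text-to-speech
    never reads out a multi-paragraph answer."""
    words = answer.split()
    if len(words) <= max_words:
        return answer
    return " ".join(words[:max_words]).rstrip(",;:") + "."
```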
And AI isn’t just about LLMs. We use cutting-edge techniques to stabilize your photos and videos when taken on the go and make full use of our five-microphone array to suppress background noise in windy environments, leading to higher quality content capture.
“On one hand, our vision for smart glasses is to be just as comfortable and stylish as a regular pair of glasses,” notes Reality Labs Product Manager Janelle Tong. “On the other hand, when it comes to all the powerful and immersive features that our technology offers—first-person POV capture, open-ear audio experiences, immersive audio recording, hands-free calling, live-streaming directly to Instagram or Facebook, and Meta AI and multimodal experiences—these combined can result in a pretty compute- and power-intensive experience. We prioritized fit and comfort, making this generation of smart glasses thinner on all sides and lighter than the previous generation.”
“Core AI worked closely with the smart glasses team to develop efficient LLMs that could be deployed on-device,” adds Reality Labs Director of AI Research for XR Tech Vikas Chandra. “In addition to LLMs running on servers to support Meta AI on smart glasses, we would need smaller on-device language models to reduce latency for various tasks. This requires innovation in model architectures, novel training strategies, and inference runtime optimizations.”
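Chandra doesn’t enumerate specific techniques, but one standard runtime optimization in this family is post-training quantization of a model’s linear layers. As a rough illustration only, using a toy stand-in model rather than anything Meta ships, PyTorch’s built-in dynamic quantization looks like this:

```python
import torch
import torch.nn as nn

# Stand-in for a small on-device language model (not Meta's actual model).
tiny_lm = nn.Sequential(
    nn.Embedding(32000, 256),
    nn.Linear(256, 1024),
    nn.ReLU(),
    nn.Linear(1024, 32000),
)

# Dynamic int8 quantization of the Linear layers: weights are stored as 8-bit
# integers and activations are quantized on the fly, shrinking memory use and
# typically reducing CPU inference latency.
quantized_lm = torch.quantization.quantize_dynamic(
    tiny_lm, {nn.Linear}, dtype=torch.qint8
)
```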
A compelling vision for the future
Our vision for future AR glasses ultimately relies on an entirely new kind of ultra-low-friction interface that understands and anticipates our needs throughout the day. While several directions have potential, wrist-based EMG is the most promising for input. And while the key to that interface remained an open research question for years, recent advances in LLMs are now fueling progress.
Since 2019, we’ve also been making steady progress on Codec Avatars, our realistic digital representations that accurately reflect how a person looks and moves in the physical world. You may have seen Zuckerberg’s latest demo with Lex Fridman late last year. Making Codec Avatars work on consumer headsets is a technological challenge, but it’s largely an AI problem, and the AI is accelerating much faster than expected.
As Zuckerberg noted at Connect 2023, innovation comes not only from breakthrough technologies—but from putting those technologies in the hands of people at scale. That’s the magic at work at Meta, and it’s a brand of prestidigitation we continue to power every day.
“We’re creating new product categories, which are increasingly packed with cutting-edge technology, while becoming lighter and more ergonomic,” says Model. “One can look at the innovation curve from Rift to Quest to Ray-Ban Meta smart glasses and extrapolate this trend a few years into the future. It’s hard to not get excited about what lies ahead.”
*Meta AI features currently only available in the US in beta.