When Google Lens was introduced in 2017, the search feature achieved a feat that not too long ago would have seemed like science fiction: Point your phone’s camera at an object and Google Lens can identify it, show some context, and maybe even let you buy it. It was a new way of searching that didn’t involve awkwardly typing out descriptions of the things in front of you.
Lens also demonstrated how Google plans to use its machine learning and AI tools to put its search engine on every possible surface. As Google increasingly leans on its core generative AI models to produce summaries of information in response to text searches, Google Lens’ visual search is evolving too. The company now says Lens, which powers about 20 billion searches a month, will support even more ways to search, including video and multimodal queries.
Another Lens update means even more shopping context will appear in results. Shopping is, unsurprisingly, one of the key use cases for Lens; Amazon and Pinterest also have visual search tools designed to drive more purchases. Search for a friend’s sneakers in the old Google Lens and you might have been shown a carousel of similar items. In the updated version, Google says Lens will show more direct purchase links, customer reviews, publisher reviews, and comparison shopping tools.
Lens search is now multimodal, a hot word in AI these days, meaning people can search with a combination of video, image, and voice inputs. Instead of pointing their smartphone camera at an object, tapping the focus point on the screen, and waiting for the Lens app to return results, users can point the camera and use voice commands at the same time, asking, for example, “What kind of clouds are those?” or “What brand are these sneakers and where can I buy them?”
Lens will also begin working on real-time video capture, taking the tool a step beyond identifying objects in still images. If you have a broken record player or see a flashing light on a malfunctioning appliance at home, you can capture a quick video through Lens and, via an AI-generated overview, see tips on how to fix the item.
First announced at I/O, the feature is considered experimental and is only available to people who have opted in to Google’s Search Labs, says Rajan Patel, an 18-year Google employee and a co-founder of Lens. Google Lens’ other features, voice mode and expanded shopping, are rolling out more widely.
The “video understanding” feature, as Google calls it, is intriguing for several reasons. Although it currently works only with video captured in real time, if or when Google expands it to previously captured video, entire repositories of footage — whether in your own camera roll or in a massive database like Google’s — could potentially become taggable and overwhelmingly shoppable.
The second consideration is that this Lens feature shares some capabilities with Google’s Project Astra, which is expected to be available later this year. Astra, like Lens, uses multimodal inputs to interpret the world around you through your phone. As part of an Astra demonstration this spring, the company showed off a pair of prototype smart glasses.
Separately, Meta just made waves with its long-term vision for our augmented reality future, which involves mere mortals wearing dorky glasses that can intelligently interpret the world around them and show them holographic interfaces. Google, of course, already tried to realize this future with Google Glass, which relied on fundamentally different technology than what Meta most recently demonstrated. Could Lens’ new features, paired with Astra, be a natural segue to a new kind of smart glasses?