The new Google Gemini models let AI understand images and videos, in addition to text.
Google has just revealed its new LLMs (Large Language Models), called Gemini. According to the technology giant, the new models are capable of processing not only words, but also images and sounds. The project, developed by the company's artificial intelligence division DeepMind, is another of Google's efforts to keep pace with the trends set by tools such as OpenAI's ChatGPT.
For this new project, Google is launching Gemini in several sizes: Ultra, the largest, intended for complex tasks; Pro, a mid-sized model aimed at general, everyday activities; and Nano, the smallest, designed for mobile devices, which the company plans to bring to the Android operating system of one of its phones in 2024.
Among the current models, Ultra surpasses the others, achieving a score of 90% on a test called Massive Multitask Language Understanding (MMLU), which rigorously assesses a model's comprehension across more than 50 areas of knowledge, including mathematics, physics, medicine and history. According to the company, this is the first LLM to outperform human experts on the test.
These models were pretrained by processing large amounts of data on their own. A Google spokesperson commented that the models were fed data from YouTube, but did not specify whether that training involved the models literally "watching" the videos (a development that would have a major impact on the sector).
That capability would open the door to countless possibilities, including the ability to analyze real-world objects through the lens of a future augmented reality headset, like the Apple Vision Pro. While competitors such as Mark Zuckerberg's Meta may develop similar features, real-time visual and auditory processing represents an area of emphasis for future technological development.
During its press presentation this week, Google showed a video of Gemini reasoning over a set of images. In the video, a person placed an orange and a fidget toy on a table in front of a camera connected to Gemini. Gemini immediately identified both objects and responded with a clever connection between the two items:
"Citrus fruits can have a calming effect, just like the movement of a fidget toy," the AI said out loud.
In another video, Gemini was shown grading a math test in which a user had written out their working for a problem. Gemini identified and explained the errors in the student's calculations.
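For developers, a multimodal interaction like the one in these demos might look roughly like the following. This is a minimal sketch, assuming Google's google-generativeai Python SDK and a vision-capable model name such as "gemini-pro-vision"; the file name, prompt and API key are placeholders for illustration, not details from the announcement.

```python
# Hypothetical sketch of a multimodal prompt, assuming the
# google-generativeai Python SDK; the model name, image file and
# API key below are placeholders, not confirmed by the article.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Load a photo of the scene (e.g., an orange next to a fidget toy).
image = PIL.Image.open("desk_scene.jpg")

model = genai.GenerativeModel("gemini-pro-vision")
response = model.generate_content(
    [image, "What objects are on the table, and how might they relate?"]
)
print(response.text)
```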
In the short term, Gemini's capabilities can be experienced through Google's chatbot, Bard. The company says Bard will be powered by the Gemini Pro model, which should give the chatbot more advanced learning and reasoning abilities. According to Sissie Hsiao, vice president and general manager of Google Assistant and Bard, Bard will be upgraded to the Ultra model next year. Developers and enterprise customers will have access to Gemini Pro through an API provided by Google Cloud starting December 13th.
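As a rough illustration of what that Google Cloud access could look like, here is a minimal sketch assuming the Vertex AI Python SDK's preview generative-models module as it existed in late 2023; the project ID, region and prompt are placeholder assumptions, not details confirmed in the announcement.

```python
# Illustrative sketch of calling Gemini Pro via Google Cloud's Vertex AI
# Python SDK (preview module); project ID and region are placeholders,
# and exact names may differ in your environment.
import vertexai
from vertexai.preview.generative_models import GenerativeModel

# Initialize the SDK against your own Google Cloud project (placeholder values).
vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-pro")
response = model.generate_content(
    "Explain the difference between the Gemini Ultra, Pro and Nano models."
)
print(response.text)
```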
* With information from Fast Company
Follow Adnews on Instagram, LinkedIn and Threads. #WhereTransformationHappens