Google's Gemini for dummies: Why experts are divided on its potential success


Last night, Google launched Gemini, its biggest effort in the generative AI field since Bard, the artificial intelligence chatbot it announced in February this year.

With how much AI has progressed over the last year, Gemini aims to solve existing problems and be flexible and useful in a multitude of ways.

If you are lost as to why this is such a significant development, here's everything you need to know about Gemini as well as why industry professionals are divided on its potential success.

What is Gemini?

Gemini is a multimodal platform that can generalise, understand, operate across and combine different types of information, including text, code, audio, images and video.


The system, which comprises Gemini Ultra, Gemini Pro and Gemini Nano, is also a flexible one that can run on everything from data centers to mobile devices.

"We've been rigorously testing our Gemini models and evaluating their performance on a wide variety of tasks. From natural image, audio and video understanding to mathematical reasoning, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development," said Google and Alphabet's CEO Sundar Pichai in a blog post.

He added that with a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities.

"Our new benchmark approach to MMLU enables Gemini to use its reasoning capabilities to think more carefully before answering difficult questions, leading to significant improvements over just using its first impression," Pichai explained.

Gemini Ultra also achieves a state-of-the-art score of 59.4% on the new MMMU benchmark, which consists of multimodal tasks spanning different domains requiring deliberate reasoning.

On the image benchmarks tested, Gemini Ultra outperformed previous state-of-the-art models without assistance from optical character recognition (OCR) systems, which extract text from images for further processing. These benchmarks highlight Gemini’s native multimodality and indicate early signs of Gemini's more complex reasoning abilities, he said.

Why is Gemini a supposed gamechanger?

With AI models popping up left, right and centre, why exactly is Gemini causing such hype in the tech space? This could largely be due to its multimodal approach.

Until now, the standard approach to creating multimodal models involved training separate components for different modalities and then stitching them together to roughly mimic some of this functionality, according to Google. These models can sometimes be good at performing certain tasks, such as describing images, but struggle with more conceptual and complex reasoning.

"We designed Gemini to be natively multimodal, pre-trained from the start on different modalities. Then we fine-tuned it with additional multimodal data to further refine its effectiveness. This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models — and its capabilities are state of the art in nearly every domain," said Pichai.

Essentially, Gemini can use its multimodal reasoning capabilities to make sense of complex written and visual information. It is also uniquely skilled at uncovering knowledge that can be difficult to discern amid vast amounts of data. This could be vital when it comes to the use of AI in the workplace and in information-heavy industries such as finance or science.

On a learning front, Gemini can better understand nuanced information and can answer questions relating to complicated topics. This makes it especially good at explaining reasoning in complex subjects such as math and physics, according to Google.

It can also understand, explain and generate high-quality code in programming languages such as Python, Java, C++ and Go.

Is it safe?

Right now, one of the core discussions surrounding generative AI is whether it is really safe and secure. Sumsub's Identity Fraud Report 2023 recently found that deepfakes in the APAC region have grown by an average of 1530% from last year, posing a threat to cyber security if the technology is misused. It also found that the Philippines saw the largest increase in deepfakes at 4500%, while Hong Kong experienced a 1300% increase, and Malaysia and Singapore saw increases of 1000% and 500% respectively.

This is worrying considering that a separate study by Capgemini Research Institute earlier this year found that 70% of Singaporean consumers, compared to 73% of consumers globally, trust content created by generative AI. That trust spans many aspects of life, from financial planning and medical diagnosis to relationship advice.

To counter that, Gemini has the most comprehensive safety evaluations of any Google AI model to date, including for bias and toxicity, according to Pichai.

"We’ve conducted novel research into potential risk areas like cyber-offense, persuasion and autonomy, and have applied Google Research’s best-in-class adversarial testing techniques to help identify critical safety issues in advance of Gemini’s deployment," he said.

He added that to identify blind spots in its internal evaluation approach, Google is working with a diverse group of external experts and partners to stress-test its models across a range of issues.

It also built dedicated safety classifiers to identify, label and sort out content involving violence or negative stereotypes, for example. Combined with robust filters, this layered approach is designed to make Gemini safer and more inclusive for everyone, it said.

Gemini is currently being rolled out across a range of products and platforms globally and certainly seems like it could be a gamechanger in the generative AI space, according to industry experts MARKETING-INTERACTIVE spoke to.

A potential gamechanger

According to Sandeep Joseph, CEO and co-founder of Ampersand Advisory, what really makes it stand out is that it can accept input from multiple sources including images and videos and that it can work on phones. He said:

"Clearly Google felt the need to catch up as Microsoft threatened to run away with the AI headlines, and the market."

He added that for other firms and platforms to catch up, their models need to be able to work on phones, use less computing power and generally be more accessible to all.

Can other models catch up?

Agreeing with him, Joey Egger, head of AI at DEPT APAC, added that other companies need to start focusing on the end-user experience, a clear pricing model, and easy-to-use tools and APIs that are accessible to everyone.

Whether they are engineers wanting to build pipelines to improve their technical workflows, creative people who want to build automations that make output more efficient while keeping full creative control, or moms and dads who want to make fun birthday cards or simple games for their kids, having AI tools that are accessible to all is important when it comes to catching up, she said.

She added that while Bard did have a poor start, it is now very strong as a search engine, shopping tool and information provider.

Adding to her point, Ranganathan Somanathan, co-founder at RSquared Global Ventures, explained that Gemini could pose a threat to other AI models simply because of its widespread penetration through services such as Chrome, Gmail, Android and more. "The adoption rate can be accelerated very organically," he said.

"Just as past success do not predict future victories, past failures too don’t define future outcomes," he said. "People today might be less loyal to brands but they are, at the same time, more forgiving if brands make amends and refresh themselves with integrity," I believe with Gemini, we are seeing a much stronger recovery of Bard."

A little too late in the game?

That said, not everyone is entirely convinced just yet. According to Mercedes-Benz's AI scientist Milind, who was expressing independent views, Gemini currently does not pose a threat to OpenAI or Microsoft.

"They are ahead by approximately six months to a year in terms of commercialising their AI models. Although this lead may seem modest, it represents a significant gap in the rapidly evolving field of AI," he said in a conversation with MARKETING-INTERACTIVE.

He added that while DeepMind has developed some great AI products in the last few years, we have yet to see what Gemini is capable of and how successful Google will be in incorporating the model into its products. He said:

"But now, Google is just playing catch-up."

He added that OpenAI and Microsoft have been establishing dominance in the cutting-edge AI ecosystem all year, so one positive of Gemini's release is that it will increase competition and likely accelerate progress in the AI space, which is advantageous in the long run.
