Thursday, December 12, 2024

A former Google designer reveals what goes into building its AI models like Gemini


  • Google has launched Gemini 2.0, bringing it one step closer to a universal personal assistant.
  • A former Gemini conversation designer spoke about the best practices for chatbot design.
  • He said Google’s AI products and its search engine run into self-cannibalization issues.

Google launched its Gemini 2.0 model this week, promising more “agentic” AI to bring people closer to a version of a universal personal assistant.

When Google pushed out Gemini 1.0 last December, it sought to rival OpenAI’s ChatGPT. Gemini quickly changed how users experienced Google itself, from providing an overview of search engine results to the product NotebookLM, which can convert written notes into a spoken podcast. Its 2.0 version has functions like “Deep Research,” where Gemini can scour the web for information and prepare reports.

As AI assistants become increasingly human-like in their delivery, the engineers and designers building them must tackle the questions of responsibility and tone. For instance, some AI chatbots can refuse to give answers on potentially sensitive topics.

Business Insider spoke with Kento Morita, a former Google Gemini conversation designer and Japanese-American actor and comedian.

Morita previously worked on designing conversation flows for Amazon Alexa and Google Gemini, specifically focusing on building a Japanese persona for the AI. He gave insight into how AI chatbot designers think about delivering information to users efficiently and the challenge Google has with balancing its search engine and AI products.

The following was edited for length and clarity.

Business Insider: How are “tones” designed for sensitive topics for AI?

Kento Morita: Whenever we get a question that is possibly sensitive, it goes through a checklist of sorts: Is this political in nature? Is this sexual in nature? Does this generate anything that is counterfactual? When the answer is yes, it goes through a review process, because ultimately all these companies have their logo next to the answer they provide. Kind of like Warren Buffett’s rule of thumb, we should be happy to see that answer on the front page of The New York Times or Washington Post the next day, and we should be proud of it.

The biggest question we need to answer is: Is associating Google or ChatGPT or anybody with this answer productive to their bottom line?

If it’s not, we do what’s called punting. We just give a “Sorry, I can’t help with that right now” kind of answer. It’s a balancing act. Some topics we don’t want to even touch with a ten-foot pole, but there are some things we do want to give answers to, like election night coverage, because everybody’s going to be wondering about what’s happening.
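As a rough illustration of the checklist-and-punt flow Morita describes, here is a minimal sketch. The categories, keyword matching, and punt message are hypothetical placeholders, not Google’s actual pipeline, which would rely on trained classifiers and policy review rather than simple rules.

```python
# Minimal sketch of a sensitivity checklist with a "punt" fallback.
# Categories, keywords, and the punt message are illustrative assumptions.
from dataclasses import dataclass

SENSITIVE_CATEGORIES = ("political", "sexual", "counterfactual")

@dataclass
class SensitivityCheck:
    category: str
    flagged: bool

def run_checklist(question: str) -> list[SensitivityCheck]:
    # Placeholder classifier: a real system would use trained models
    # and human policy review, not keyword matching.
    keywords = {
        "political": ["election", "candidate", "ballot"],
        "sexual": ["explicit"],
        "counterfactual": ["hoax", "conspiracy"],
    }
    text = question.lower()
    return [
        SensitivityCheck(cat, any(word in text for word in keywords[cat]))
        for cat in SENSITIVE_CATEGORIES
    ]

def answer_or_punt(question: str, generate_answer) -> str:
    checks = run_checklist(question)
    if any(check.flagged for check in checks):
        # "Punting": decline rather than attach the brand to a risky answer.
        return "Sorry, I can't help with that right now."
    return generate_answer(question)
```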

We want to make sure that answering more questions allows more people to stay on our website. There’s always a tension in these companies: we want to answer as many questions as we can, which any of these LLMs can, but that needs to be balanced against whether this will create more negative press or provide answers that are potentially dangerous. There’s a lot of talking to the legal team, talking to the marketing team, talking to sales. It’s an ongoing conversation all the time about how we want to approach this.

It’s always a question of what to prioritize.

It’s also a problem of cannibalizing a market.

One of Google’s biggest products is search. When you provide Gemini, what does that mean for the search business? It’s an ongoing existential question.

Compared with companies like Google, companies like Perplexity AI may actually have an advantage here, I would say, because they’re in it to create one product and do that one product really well. They actually don’t run into self-cannibalization problems. I do think really interesting and really bold things are happening at companies that are not associated with a big conglomerate. I think that’s only natural.

Google shifted Gemini under the DeepMind organization. I truly do not know why they did this, but as a [former] employee and also a person that’s been following Google for a long time, it is interesting that they’re consolidating a lot of the AI businesses under one organization, especially in light of the antitrust lawsuit that’s happening right now around Google, and the conversation they’re having with the Department of Justice around whether to divide up Google or not. At the very least, if they do divide it up, I think they’re going to have a conversation about how dividing it up will make sense. And having Gemini be a part of an AI organization versus a search organization, I think does make sense.

We’ve been accustomed to using Google search with ads at the top. Now it’s Gemini. It’s not the most factually up-to-date result, but it’s a shift.

The Google Search team is filled with brilliant engineers. Their North Star goal is to provide search results that are relevant and accurate, and that’s been their goal the entire time. And then now you enter ads. Now you enter Google Shopping results. Then you bring in Gemini. All these other factors within the organization are encroaching on the Google.com website design.

I wouldn’t be surprised if a lot of the engineers and people who have been working on Google search for the longest time are very frustrated. That being said, I wouldn’t be surprised either if they welcome the idea of breaking off the company to allow them to focus on what they are excited to do, which is to provide good search results.

Can you tell me about the history of adding footnotes to chatbots and whether that was a deliberate decision? How have hallucinations changed how chatbots respond now?

Even with Google Assistant and Amazon Alexa, when you ask it a factual question, it used to say immediately, according to Wikipedia, blah blah blah blah, or according to XYZ, blah blah blah blah. At the time, it was pretty difficult to convince people that that’s a good idea. And the reason is that, from a conversational standpoint of voice, when you ask somebody, hey, when was XYZ invented? You don’t really want to hear XYZ was invented, according to Wikipedia, in 1947. You just want to hear the answer. Getting quickly to the answer is considered a virtue in design. Google spent so much time and effort trying to make the time to display search results as short as possible, so it’s been in the Google DNA to get the answer to the customer as quickly as possible.

We had to advocate for footnotes. What really convinced them was this idea that the moment you attribute a website, you get to shirk the responsibility of the accuracy of that information to another website.

So when I say, according to Wikipedia XYZ, I am no longer responsible for whether what I’m saying is correct or not. I could just shirk that responsibility to Wikipedia. And when people started to ask sensitive questions about antisemitism or like conspiracy theories and what have you, being able to say, according to XYZ, this seems to be the case, allows us to distance ourselves from that statement, which is very, very useful when we’re talking about Google’s brand image.

When you have something labeled Google Assistant saying this is what happened, you can’t help but associate Google with whatever you’re talking about. So that kind of distancing language allows us to take less responsibility for the information that’s being presented. And so I think that ethos has carried on, and that sort of argument has been really useful in convincing people at these companies to cite our sources. Perplexity AI, because it’s so explicit in footnoting everything, actually has more freedom to talk about more controversial topics.

They don’t need to editorialize anything, which is really a huge advantage when it comes to controversial topics and sensitive topics.

Explainability is something that they talk a lot about in the LLM space. LLMs, to a lot of people, feel like a black box, like you type in some text, and it spits out text. But ultimately, it is a prediction engine. To add guardrails, to editorialize, to content design around this black box that is a prediction engine has been very, very important, especially around sensitive information.

When Google Gemini and other AI are citing sources, it’s still a predictive machine?

There’s this thing called RAG (retrieval-augmented generation). I think what they’re doing is indexing sources like AP News and Reuters higher to bias toward those sources and the information they contain. When the LLM pulls more information from them, there is an attribution mechanism in the background that allows them to say, “We use RAG to call Reuters or AP News to retrieve their information.” I don’t think it’s predictive. It’s much more hard-coded.
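As a rough sketch of the retrieve-then-attribute pattern he describes, here is a generic retrieval-augmented generation example. The source weights, scoring, prompt format, and function names are assumptions for illustration, not Gemini’s implementation.

```python
# Generic sketch of retrieval-augmented generation (RAG) with source attribution.
# Source weights, scoring, and the prompt format are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Document:
    source: str       # e.g. "Reuters", "AP News"
    text: str
    relevance: float  # assumed precomputed by the retrieval index for this query

# Hypothetical bias toward wire services, echoing the idea of indexing them higher.
SOURCE_WEIGHTS = {"Reuters": 1.5, "AP News": 1.5}

def retrieve(query: str, index: list[Document], k: int = 3) -> list[Document]:
    # Re-rank retrieved documents, boosting trusted sources.
    ranked = sorted(
        index,
        key=lambda d: d.relevance * SOURCE_WEIGHTS.get(d.source, 1.0),
        reverse=True,
    )
    return ranked[:k]

def answer_with_citations(query: str, index: list[Document], llm) -> str:
    # Ground the model's answer in retrieved text, then attach attributions.
    docs = retrieve(query, index)
    context = "\n".join(f"[{i + 1}] ({d.source}) {d.text}" for i, d in enumerate(docs))
    prompt = (
        "Answer the question using only the numbered sources below, "
        f"citing them by number.\n\n{context}\n\nQuestion: {query}"
    )
    answer = llm(prompt)
    citations = ", ".join(f"[{i + 1}] {d.source}" for i, d in enumerate(docs))
    return f"{answer}\n\nSources: {citations}"
```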

For some topics, like abortion, AI chatbots adopt a caring tone, like asking, “Do you have any concerns?” That’s a significant shift in tone.

That’s one of the biggest things that I feel very proud of being involved in. While developing Google Assistant, whenever words about suicide or self-harm came up, we went around talking to mental health professionals and people who offer these services to ask them: No. 1, if we provided users the number of a hotline, would that be helpful? No. 2, what’s the best language to do that? We took real care in talking to all these resources.

For me, I talked to Japanese resources and Japanese hotline providers, and we translated these messages. It took a lot of time, but we were trying to make sure that any user, even users who were thinking about self-harm, get the best information that they can.

When it comes to abortion, that fits into the same framework of content strategy: how do we make sure people who are looking up abortion get the information in a way that’s safe and ultimately helps them live the life that they want? When I was at Google, we were able to fulfill our mission statement, which is to gather the world’s information and make it as useful and accessible as possible to everyone.

Ultimately, the democratization of these engines is going to happen. Every company is going to have a pretty decent LLM in 5-10 years. The difference between me wanting to go to X or ChatGPT or Google or Alexa or whatever is going to be in the packaging.

I think the tech companies that treat humans like humans and make robots speak human are the ones that are going to be the most successful in the long run.
