Transcript of Google DeepMind CEO Demis Hassabis on AI, Creativity, and a Golden Age of Science | All-In Summit
All-In with Chamath, Jason, Sacks & Friedberg
A genius who may hold the cards of our future. CEO of Google DeepMind, which is the engine of the company's artificial intelligence. After his Nobel and a knighthood from King Charles, he became a pioneer of artificial intelligence. We were the first ones to start doing it seriously in the modern era. AlphaGo was the big watershed moment, I think not just for DeepMind and my company, but for AI in general. This was always my aim with AI from a kid, which is to use it to accelerate scientific discovery.
Ladies and gentlemen, please welcome Google DeepMind's Demis Hassabis. Welcome.
Great to be here.
Thanks for following Tucker, Mark Cuban, et al. First off, congrats on winning the Nobel Prize.
Thank you very much. Thank you.
For the incredible breakthrough of AlphaFold. You may have done this before, but I know everyone here would love to hear your recounting of where you were when you won the Nobel Prize. How did you find out?
Well, it's a very surreal moment, obviously. Everything about it is surreal. The way they tell you, they tell you 10 minutes before it all goes live. You're shell shocked when you get that call from Sweden. It's the call that every scientist dreams about. And then the ceremony is a whole week in Sweden with the Royal family. Obviously, it's been going for 120 years. And the most amazing bit is they bring out this Nobel book from the vaults in the safe, and you get to sign your name next to all the other greats. So it's quite an incredible moment, leafing back to the other pages and seeing Feynman and Marie Curie and Einstein and Niels Bohr, and you just carry on going backwards, and you get to put your name in that book. It's incredible.
Did you have an inkling you had been nominated and that this might be coming your way?
Well, you hear... It's amazingly locked down, actually, in today's age, how they keep it so quiet, but it's like a national treasure for Sweden. And so you hear maybe AlphaFold is the thing that would be worthy of that recognition. They look for impact as well as the scientific breakthrough, impact in the real world. And that can take 20, 30 years to arrive. So you just never know how soon it's going to be and whether it's going to be at all. So it's a surprise. Well, congrats.
Yeah. Thank you. And thank you. You let me take a picture with it a few weeks ago, and we had to sit there. That's something I'll cherish. What is DeepMind within Alphabet? Alphabet is a sprawling organization, sprawling business units. What is DeepMind? What are you responsible for?
Well, we see DeepMind now and Google DeepMind as it's become. We merged a couple of years back all of the different AI efforts across Google and Alphabet, including DeepMind, put it all together, bringing the strengths of all the different groups together into one division. And really, the way I describe it now is that we're the engine room of the whole of Google and the whole of Alphabet. So Gemini, our main model that we're building, but also many of the other models that we also build, the video models and interactive world models, we plug them in all across Google now. So pretty much every product, every surface area has one of our AI models in it. So billions of people now interact with Gemini models, whether that's through AI Overviews, AI Mode, or the Gemini app. And that's just the beginning. We're incorporating it into Workspace, into Gmail, and so on. So it's a fantastic opportunity, really, for us to do cutting-edge research, but then immediately ship it to billions of users.
And how many people? What's the profile? Are these scientists, engineers? What's the makeup of your organization?
There's around 5,000 people in my org, in Google DeepMind. And it's predominantly, I guess, 80% plus engineers and PhD researchers. So, yeah, about three or four thousand.
So there's an evolution of models, a lot of new models coming out, and also new classes of models. The other day, you released this Genie world model. Yes. So what is the Genie world model? And I think we got a video of it. Is it worth looking at and we can talk about it live?
Yeah, we can watch it. Sure.
I think you have to see it to understand it because it's so extraordinary. Can we pull up the video? And then Demis can narrate a little bit about what we're looking at? What you're seeing are not games or videos.
They're worlds.
Each one of these is an interactive environment generated by Genie 3, a new frontier for world models. With Genie 3, you can use natural language to generate a variety of worlds and explore them interactively, all with a single text prompt.
Yeah, so all of these videos, all these interactive worlds that you're seeing. So you're seeing someone actually can control the video. It's not static video. It's just being generated by a text prompt, and then people are able to control the 3D environment using the arrow keys and the space bar. So everything you're seeing here is being fully... All these pixels are being generated on the fly. They don't exist until the player or the person interacting with it goes to that part of the world. So all of this richness. And then you'll see in a second. So this is fully generated. This is not a real video. This is a generated scene of someone painting their room, and they're painting some stuff on the wall. And then the player is going to look to the right, and then they look back. So now this part of the world didn't exist before, so now it exists. And then they look back and they see the same painting marks they left just earlier. And again, this is fully... Every pixel you can see is fully generated. And then you can type things like person in a chicken suit or a jet ski, and it will just, in real time, include them in the scene.
It's quite mind-blowing, really.
But I think what's hard to grok when looking at this is, because we've all played video games that have a 3D element to them when you're in an immersive world. But there's no objects that have been created. There's no rendering engine. You're not using Unity or Unreal, which are the 3D rendering engines. This is actually just 2D images that are being rendered, created on the fly by the AI.
This model is reverse engineering intuitive physics. So it's watched many millions of videos and YouTube videos and other things about the world. And just from that, it's reverse engineered how a lot of the world works. It's not perfect yet, but it can generate a consistent minute or two of interaction for you as the user in many, many different worlds. There's some videos later on where you can control a dog on a beach or a jellyfish, so it's not limited to just human things.
Because the way a 3D rendering engine works is the programmer programs in all the laws of physics. How does light reflect off of an object? You create a 3D object, light reflects off. And then so what I see visually is rendered by the software because it's got all the programming on how to create physics, how to do physics. But this was just trained off of video and it figured it all out.
Yeah, it was trained off of video and some synthetic data from game engines, and it's just reverse engineered. And for me, it's very close to my heart, this project, but it's also quite mind-blowing because in the '90s, in my early career, I used to write video games and AI for video games and graphics engines. I remember how hard it was to do this by hand, program all the polygons and the physics engines. It's amazing to just see this do it effortlessly. All of the reflections on the water and the way materials flow and objects behave. And it's just doing that all out of the box.
I think it's hard to describe how much complexity was solved for with that model. It's really, really, really mind-blowing. Where does this lead us? So fast forward this model to Gen 5.
Yeah. So the reason we're building these models is we feel, and we've always felt, we're obviously progressing on the normal language models, like with our Gemini model. But from the beginning with Gemini, we wanted it to be multimodal. So we wanted it to take any input, images, audio, video, and it can output anything. And so we've been very interested in this because for an AI to be truly general, to build AGI, we feel that the AGI system needs to understand the world around us and the physical world around us, not just the abstract world of languages or mathematics. And of course, that's what's critical for robotics to work. It's probably what's missing from it today. And also things like smart glasses, a smart glasses system that helps you in your everyday life. It's got to understand the physical context that you're in and how the intuitive physics of the world works. So we think that building these types of models, these Genie models, and also Veo, the best text-to-video model, those are expressions of us building world models that understand the dynamics of the world, the physics of the world. If you can generate it, then that's an expression of your system understanding those dynamics.
And that leads to a world of robotics, ultimately. One aspect, one application. But maybe we can talk about that. What is the state of the art with the vision, language, action models today? So a generalized system, a box, a machine that can observe the world with a camera, and then I can use language, I can use text or speech to tell it, I want you to do it, and then it knows how to act physically to do something in the physical world.
Yeah, that's right. So if you look at our Gemini Live version of Gemini, where you can hold up your phone to the world around you. I'd recommend any of you try it. It's magical what it already understands about the physical world. You can think of the next step as incorporating that in some more handy device like glasses, and then it will be an everyday assistant. It'll be able to recommend things to you as you're walking the streets, or we can embed it into Google Maps. And then with robotics, we've built something called Gemini Robotics models, which are fine-tuned Gemini with extra robotics data. And what's really cool about that is, and we released some demos of this over the summer, we've got these tabletop setups of two hands interacting with objects on a table, two robotic hands. And you can just talk to the robot. So you can say, put the yellow object into the red bucket or whatever it is, and it will interpret that instruction, that language instruction, into motor movements. And that's the power of a multimodal model rather than just a robotic-specific model, is that it will be able to bring in real-world understanding to the way you interact with it.
So in the end, it will be the UI, UX that you need, as well as the understanding the robots need to navigate the world safely.
I asked Sundar this, does that mean that ultimately you could build what would be the equivalent of, call it either a Unix, like an operating system layer or like an Android for generalized robotics, at which point, if it works well enough across enough devices, there will be a proliferation of robotics devices and companies and products that will suddenly take off in the world because this software exists to do this generally.
Exactly. That's certainly one strategy we're pursuing, is an Android play, if you like, almost an OS layer across robotics. But there's also some quite interesting things about vertically integrating our latest models with specific robot types and robot designs. And some end-to-end learning of that, too. So both are actually pretty interesting, and we're pursuing both strategies.
Do you think that humanoid robots are a good form factor? Does that make sense in the world? Because some folks have criticized it as being... It's good for humans because we're meant to do lots of different things. But if we want to solve a problem, there may be a different form factor to fold laundry or do dishes or clean the house or whatever.
Yeah, I think there's going to be a place for both. Actually, I used to be of the opinion maybe 5, 10 years ago that we'll have form-specific robots for certain tasks. I think in industry, industrial robots will definitely be like that, where you can optimize the robot for the specific task, whether it's a laboratory or a production line. You'd want quite different types of robots. On the other hand, for general use or personal use robotics, and just interacting with the ordinary world, the humanoid form factor could be pretty important because, of course, we've designed the physical world around us to be for humans. And so steps, doorways, all the things that we've designed for ourselves, rather than changing all of those in the real world, it might be easier to design the form factor to work seamlessly with the way we've already designed the world. So I think there's an argument to be made that the humanoid form factor could be very important for those types of tasks. But I think there is a place also for specialized robotic forms.
Do you have a view on hundreds of millions, millions, thousands over the next five years, seven years? Do you have in your head, do you have a vision?
Yeah, I do. And I spend quite a lot of time on this. And I think we're still a little bit early on robotics. I think in the next couple of years, there'll be a real wow moment with robotics. But I think the algorithms need a bit more development. The general purpose models that these robotics models are built on still need to be better and more reliable and better at understanding the world around them. I think that will come in the next couple of years. And then also on the hardware side, the key is, I think eventually we will have millions of robots helping society and increasing productivity. But the key there, when you talk to hardware experts, is: at what point do you have the right level of hardware to go for the scaling option? Because effectively, when you start building factories around trying to make tens of thousands, hundreds of thousands of a particular robot type, it's harder for you to update, quickly iterate the robot design. So it's one of those questions where if you call it too early, then the next generation of robot might be invented in six months' time.
That's just more reliable and better and more dexterous.
Sounds like using a computing analogy, we're in the '70s era PC-DOS.
Yeah, potentially. But of course, I think maybe that's where we are, but I think, except that 10 years happens in one year, probably.
1984 might be one of those years. Exactly. Let's talk about other applications. Particularly in science, true to your heart as a scientist, as the Nobel Prize-winning scientist, I always felt like the greatest things that we would be able to do with AI would be the problems that are intractable to humans with our current technology and capabilities and our brains and whatnot, and we can unlock all of this potential. What are the areas of science and breakthroughs in science that you're most excited about, and what kinds of models do we use to get there?
Yeah, I mean, AI to accelerate scientific discovery and help with things like human health is the reason I spent my whole career on AI. And I think it's the most important thing we can do with AI. And I feel like if we build AGI in the right way, it will be the ultimate tool for science. I think we've been showing at DeepMind a lot of the way of that, obviously, AlphaFold, most famously. But actually, we've applied our AI systems to many branches of science, whether it's material design, helping with controlling plasma in fusion reactors, predicting the weather, solving Math Olympiad problems. And the same types of systems with some extra fine-tuning can basically solve a lot of these complex problems. So I think we're just scratching the surface of what AI will be able to do, and there are some things that are missing. So AI today, I would say, doesn't have true creativity in the sense that it can't come up with a new conjecture yet or a new hypothesis. It can maybe prove something that you give it, but it's not able to come up with a new idea or new theory itself.
So I think that would be one of the tests actually for AGI.
What is that? Creativity as a human? Yeah. What is creativity then?
I think it's these intuitive leaps that we often celebrate with the best scientists in history and artists, of course. And maybe it's done through analogy or analogical reasoning. There are many theories in psychology and neuroscience as to how we as human scientists do it. But a good test for it would be something like, give one of these modern AI systems a knowledge cutoff of 1901 and see if it can come up with special relativity like Einstein did in 1905. If it's able to do that, then I think we're onto something really, really important, where perhaps we're nearing an AGI. Another example would be with our AlphaGo program that beat the world champion at Go. Not only did it win back 10 years ago, it invented new strategies that had never been seen before for the game of Go, most famously Move 37 in game 2, which is now studied. But can an AI system come up with a game as elegant, as satisfying, as aesthetically beautiful as Go, not just a new strategy? The answer to those things at the moment is no. So that's one of the things I think that's missing from a true general system, an AGI system, is it should be able to do those kinds of things as well.
Can you break down what's missing and maybe related to the point of view shared by Dario, Sam, others about AGI is a few years away. Do you not subscribe to that belief? And maybe help us understand what is it in your understanding of structure, in your understanding of the system architecture, what's lacking?
Well, so I think the fundamental aspect of this is, can we mimic these intuitive leaps rather than incremental advances that the best human scientists seem to be able to do? I always say, what separates a great scientist from a good scientist is they're both technically very capable, of course, but the great scientist is more creative. And so maybe they'll spot some pattern from another subject area that can have an analogy or some pattern matching to the area they're trying to solve. I think one day AI will be able to do this, but it doesn't have the reasoning capabilities and some of the thinking capabilities that are going to be needed to make that breakthrough. I also think that we're lacking consistency. So you often hear some of our competitors talk about these modern systems that we have today being PhD intelligences. I think that's nonsense. They're not PhD intelligences. They have some capabilities that are PhD-level, but they're not in general capable, and that's exactly what general intelligence should be, of performing across the board at the PhD level. In fact, as we all know, interacting with today's chatbots, if you pose the question in a certain way, they can make simple mistakes with even high school maths and simple counting.
So that shouldn't be possible for a true AGI system. So I think that we are maybe, I would say, 5-10 years away from having an AGI system that's capable of doing those things. Another thing that's missing is continual learning, this ability to online teach the system something new or adjust its behavior in some way. And so a lot of these, I think, core capabilities are still missing, and maybe scaling will get us there. But I feel if I was to bet, I think there are probably one or two missing breakthroughs that are still required and will come over the next five or so years.
In the meantime, some of the reports and the scoring systems that are used seem to be demonstrating two things. One, perhaps, and tell me if we're wrong on this, a convergence of performance of large language models. Number two, perhaps, is a slowing down or a flatlining of improvements and performance on each generation. Are those two statements generally true or not so much?
No, we're not seeing that internally, and we're still seeing a huge rate of progress. But also we're looking at things more broadly. You see with our Genie models and Veo models and... Nano Banana is... It's bananas.
Yes.
It's bananas. Yes. It's bananas. It was well named.
Can I see who's used it? Has anyone used Nano Banana? It's incredible, right? I'm a nerd who used to use Adobe Photoshop as a kid and Kai's Power Tools, and I was telling you, Bryce 3D. So the graphic systems and recognizing what's going on there was just mind-blowing.
Well, I think that's the future of a lot of these creative tools, is you're just going to vibe with it or just talk to them. And it'll be consistent enough where, like with Nano Banana, what's amazing about it is that it's an image generator. It's state of the art and best in class. But one of the things that makes it so great is its consistency. It's able to follow instructions about what you want changed and keep everything else the same. And so you can iterate with it and eventually get the output that you want. And that's, I think, what the future of a lot of these creative tools is going to be and signals the direction. And people love it, and they love creating with it.
So democratization of creativity, I think, is really powerful. I remember having to buy books on Adobe Photoshop as a kid, and then you'd read them to learn how to remove something from an image and how to fill it in and feather and all the stuff. Now anyone can do it with Nano Banana, and just explain to the software what they wanted to do, and it just does it.
Yeah. I think you're going to see two things, which is this democratization of these tools for everybody to just use and create with without having to learn incredibly complex UXs and UIs like we had to do in the past. But on the other hand, I think we're also collaborating with filmmakers and top creators and artists. So they're helping us design what these new tools should be, what features would they want. People like the director Darren Aronofsky, who's a good friend of mine, an amazing director, and he's been making, and his team is making, films using Veo and some of our other tools. And we're learning a lot by observing them and collaborating with them. And what we find is that it also superpowers and turbocharges the best professionals, too, because the best creatives, the professional creatives, they're suddenly able to be 10X, 100X more productive. They can just try out all sorts of ideas they have in mind, very low cost, and then get to the beautiful thing that they wanted. So I actually think both things are true. We're democratizing it for everyday use, for YouTube creators and so on.
But on the other hand, at the high end, the people who understand these tools, and it's not everyone can get the same output out of these tools. There's a skill in that, as well as the vision and the storytelling and the narrative style of the top creatives. I think it just allows them. They really enjoy using these tools. It allows them to iterate way faster.
Do we get to a world where each individual describes what content they're interested in? Play me music like Dave Matthews, and it'll play some new track. Or I want to play a video game set in the movie Braveheart, and I want to be in that movie, and I just have that experience. Do we end up there? Or do we still have a one-to-many creative process in society? How important culturally, and I know this is a little bit philosophical, but it's interesting to me, which is, are we still going to have storytelling where we have one story that we all share because someone made it, or are we each going to start to develop and pull on our own virtual?
I actually foresee a world, and I think a lot about this having started in the games industry as a game designer and programmer in the '90s, is I think the future of this is what we're seeing is the beginning of the future of entertainment, maybe some new genre or new art form, and where there's a bit of co-creation. I still think that you'll have the top creative visionaries. They will be creating these compelling experiences and dynamic story lines, and they'll be of higher quality, even if they're using the same tools that the everyday person can do. But also, millions of people will potentially dive into those worlds, but maybe they'll also be able to co-create certain parts of those worlds. And perhaps the main creative person is almost an editor of that world. So that's the things I'm foreseeing in the next few years. And I'd actually like to explore ourselves with technologies like Genie.
Right. Incredible. And how are you spending your time? Maybe you can describe Isomorphic. Yes, of course. What Isomorphic is. And are you spending a lot of your time there?
I am. So I also run Isomorphic, which is our spin-out company to revolutionize drug discovery, building on our AlphaFold breakthrough in protein folding. And of course, knowing the structure of a protein is only one step in the drug discovery process. So Isomorphic, you can think of it as building many adjacent AlphaFolds to help with things like designing chemical compounds that don't have any side effects but bind to the right place on the protein. And I think we could reduce down drug discovery from taking years, sometimes a decade to do, down to maybe weeks or even days over the next 10 years.
That's incredible. Do you think that's in clinic soon, or is that still in the discovery phase?
We're building up the platform right now, and we have great partnerships with Eli Lilly (I think you had the CEO speaking earlier) and Novartis, which are fantastic, and our own internal drug programs. I think we'll be entering preclinical phase sometime next year.
So candidates get handed over to the pharma company and they then take them forward. That's right.
And we're working on cancers and immunology and oncology, and we're working with places like MD Anderson.
How much of this requires, and I just want to go back to your point about AGI as it relates to what you just said. Models can be probabilistic or deterministic, and tell me if I'm reducing this down too simplistically, that the model takes an input and it outputs something very specific, like it's got a logical algorithm, and it outputs the same thing every time, or it could be probabilistic, where it can change things and make selections. The probability is 80%, I'll select this letter, 90%, I'll select this letter next, etc. How much do we have to develop deterministic models that sync up with, for example, the physics or the chemistry underlying the molecular interactions as you do your drug discovery modeling? How much are you building novel deterministic models that work with the models that are probabilistic, trained on data?
Yeah, it's a great question. Actually, for the moment, and I think probably for the next five years or so, we're building what maybe you could call hybrid models. So AlphaFold itself is a hybrid model where you have the learning component, this probabilistic component you're talking about, which is based on neural networks and transformers and things, and that's learning from the data you give it, any data you have available. But also in a lot of cases with biology and chemistry, there isn't enough data to learn from. So you also have to build in some of the rules about chemistry and physics that you already know about. So for example, with AlphaFold, the angle of bonds between atoms, and making sure that AlphaFold understood you couldn't have atoms overlapping with each other and things like that. Now, in theory, it could learn that, but it would waste a lot of the learning capacity. Actually, it's better to have that as a constraint in there. Now, the trick is with all hybrid systems, and AlphaGo was another hybrid system, where there's a neural network learning about the game of Go and what patterns are good. And then we had Monte Carlo tree search on top, which was doing the planning.
And so the trick is, how do you marry up a learning system with a more handcrafted system, bespoke system, and actually have them work well together? And that's pretty tricky to do.
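To make the hybrid idea above concrete, here is a minimal Python sketch. It is not AlphaFold's method: the scoring function `learned_score`, the sampler `propose`, and the `MIN_DISTANCE` threshold are all hypothetical stand-ins chosen for illustration. It only shows the general pattern described, a probabilistic component that proposes candidates paired with a hand-written rule (no overlapping atoms) that is enforced as a hard constraint instead of being learned.

```python
import math
import random

MIN_DISTANCE = 1.0  # assumed minimum separation between atoms (arbitrary units)


def violates_overlap(coords, min_distance=MIN_DISTANCE):
    """Hand-written rule: no two atoms may be closer than min_distance."""
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < min_distance:
                return True
    return False


def learned_score(coords):
    """Hypothetical stand-in for a trained network: prefers compact structures."""
    n = len(coords)
    centroid = tuple(sum(p[k] for p in coords) / n for k in range(3))
    return -sum(math.dist(p, centroid) for p in coords)


def propose(rng, n_atoms, spread=2.0):
    """Probabilistic component: sample one candidate set of 3D atom positions."""
    return [
        tuple(rng.uniform(-spread, spread) for _ in range(3))
        for _ in range(n_atoms)
    ]


def best_valid_structure(n_atoms=4, n_candidates=500, seed=0):
    """Keep the best-scoring candidate among those that satisfy the hard rule."""
    rng = random.Random(seed)
    best, best_score = None, -math.inf
    for _ in range(n_candidates):
        candidate = propose(rng, n_atoms)
        if violates_overlap(candidate):
            continue  # the constraint rejects the sample; nothing is learned here
        score = learned_score(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


if __name__ == "__main__":
    structure = best_valid_structure()
    print("valid:", not violates_overlap(structure))
```

In this toy version the scorer on its own would pull atoms on top of each other, since it rewards compactness. The explicit rule is what keeps the result physically plausible, which loosely mirrors the point about not spending learning capacity on something already known.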
Does that architecture ultimately lead to the breakthroughs needed for AGI, do you think? Are there deterministic components that need to be...
I think ultimately, what you want to do is when you figure out something with one of these hybrid systems, what you ultimately want to do is upstream it into the learning component.
So it's always better if you can do end-to-end learning and directly predict the thing that you're after from the data that you're given. So once you've figured out something using one of these hybrid systems, you then try and go back and reverse engineer what you've done and see if you can incorporate that learning, that information into the learning system. And this is what we did with AlphaZero, the more general form of AlphaGo. So AlphaGo had some Go-specific knowledge in it. But then with AlphaZero, we got rid of that, including the human data, human games that we learned from, and actually just did self-learning from scratch. And of course, then it was able to learn any game, not just Go.
A lot of hype and hoopla has been made about the demand for energy arising from AI. This was a big part of the AI summit we held in Washington, DC, a few weeks ago. It seems to be the number one topic everyone talks about in tech nowadays. Where's all this power going to come from? But I ask the question of you, are there changes in the architecture of the models or the hardware or the relationship between the models and the hardware that bring down the energy per token of output or the cost per token of output that ultimately maybe, say, mute the energy demand curve that's in front of us? Or do you not think that that's the case and we're still going to have a pretty geometric energy demand curve?
Well, look, interestingly, again, I think both cases are true in the sense that, especially us at Google and at DeepMind, we focus a lot on very efficient models that are powerful because we have our own internal use cases, of course, where we need to serve, say, AI Overviews to billions of users every day, and it has to be extremely efficient, extremely low latency, and very cheap to serve. And so we've pioneered many techniques that allow us to do that, like distillation, where you have a bigger model internally that trains the smaller model. So you train the smaller model to mimic the bigger model. And over time, if you look at the progress of the last two years, the model efficiencies are like 10X, even 100X better for the same performance. Now, the reason that isn't reducing demand is because we've still not got to AGI yet. So also the frontier models, you keep wanting to train and experiment with new ideas at larger and larger scale, whilst at the same time at the serving side, things are getting more and more efficient. So both things are true. And in the end, I think that from the energy perspective, I think AI systems will give back a lot more to energy and climate change and these things than they take in terms of efficiency of grid systems and electrical systems, material design, new types of properties, new energy sources.
I think AI will help with all of that over the next 10 years, and that will far outweigh the energy that it uses today.
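The distillation idea mentioned in this answer, training a smaller model to mimic a bigger one, can be sketched in a few lines of Python. This is a toy under stated assumptions, not Google's training setup: the "teacher" is just a fixed list of logits, the "student" is a bare logit vector with no network behind it, and the temperature, learning rate, and step count are arbitrary illustrative values. The student is fitted by gradient descent on the KL divergence to the teacher's temperature-softened output distribution.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill(teacher_logits, steps=200, lr=2.0, temperature=2.0):
    """Fit student logits so the student's soft outputs match the teacher's."""
    target = softmax(teacher_logits, temperature)  # teacher's soft labels
    student = [0.0] * len(teacher_logits)          # student starts uniform
    losses = []
    for _ in range(steps):
        pred = softmax(student, temperature)
        losses.append(kl_divergence(target, pred))
        # Gradient of KL(target || pred) w.r.t. each student logit
        # is (pred_i - target_i) / temperature.
        student = [
            s - lr * (q - p) / temperature
            for s, p, q in zip(student, target, pred)
        ]
    return student, losses


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.0]  # hypothetical teacher logits for three classes
    student, losses = distill(teacher)
    print("first loss:", losses[0], "last loss:", losses[-1])
```

In a real system the student would be a smaller network trained over many inputs, and the soft-label loss is often combined with a loss on ground-truth labels. The sketch only shows the core "mimic the bigger model" objective.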
As the last question, describe the world 10 years from now.
Wow. Okay. Well, I mean, 10 years, even 10 weeks is a lifetime in AI.
The Brownian field of 10 years.
But I do feel like if we will have AGI in the next 10 years, full AGI, and I think that will usher in a new golden era of science, so a new renaissance. And I think we'll see the benefits of that right across from.
(0:00) Introducing Sir Demis Hassabis, reflecting on his Nobel Prize win (2:39) What is Google DeepMind? How does it interact with Google and Alphabet? (4:01) Genie 3 world model (9:21) State of robotics models, form factors, and more (14:42) AI science breakthroughs, measuring AGI (20:49) Nano-Banana and the future of creative tools, democratization of creativity (24:44) Isomorphic Labs, probabilistic vs deterministic, scaling compute, a golden age of science