Emmett Shear – SF, Power, AI Alignment, Meditation, Softmax
"And, it's like not plugged into the TV."
I interview Emmett Shear.
Watch on YouTube or Substack; listen on Spotify.
Timestamps
00:00:00 – The learning systems view of social media
00:15:00 – Heuristics for robust Twitter usage (block big accounts)
00:22:13 – San Francisco as a gold rush city
00:32:34 – Power, SF’s shadow
00:37:21 – Power as it relates to AI safety
00:39:37 – On learning systems
00:47:50 – Contemplative practice & Softmax
00:54:27 – How to meditate? How to help with Softmax?
Transcript
[00:00] Emmett Shear:
What if we just, like, for an experiment, for like 20 minutes a day, sat in a calm, quiet room, where evidently you obviously don't actually have to act immediately. So, like, what if we just didn't act for a while? Just, you know, try that for a while every day. And it turns out when you do this, you notice that you're the little kid with the controller who you think is playing the video game, and it's like not plugged into the TV.
[00:20] Parker Conley:
Emmett Shear was the co-founder and CEO of Twitch and is now a co-founder and CEO of Softmax, an AI alignment org, which we'll be talking a lot about today. And throughout this conversation, more than anything, I'm interested in Emmett's ideas on philosophy and him as an intellectual. Emmett, welcome.
[00:39] Emmett Shear:
Thank you.
[00:40] Parker Conley:
A few years back on a salon show, you hinted at there someday being a science of social media. And we're in the very, very early stages of that, if it's even started. I'd be curious if you could take that lens and think about the differences between Twitch and Twitter as social media platforms.
[01:02] Emmett Shear:
Yeah, I mean, I think that we increasingly do have a model of social media as a science, because it's really clear to me that social media systems are learning systems. Twitter and Twitch and Reddit and anything else really look like a neural network: you have nodes, you have information flow, you have activation connectivity. The system changes and moves over time. And so I think we are on the verge of building actually strong scientific models based on analogy to how learning systems have these trajectories in general. It's not a neural net, it's not a brain, but it is like that. If you think about it from a learning systems perspective, the difference between Twitter and Twitch is pretty clear, which is that Twitter is a homogeneous network: every account produces and consumes the same kind of thing. So it looks kind of like a traditional neural net. You have these nodes, which are accounts. The nodes produce signal, which used to go out according to the follow graph and now goes out according to a learned soft attention over who should be paying attention to this. And those things then propagate, either by being riffed on or quote-tweeted or responded to, and the signals that resonate keep going. That's a pretty decent description of a normal neural net. Twitch, by contrast, looks more like Hebbian learning, right? "Fire together, wire together" is the Hebbian learning thing, where if you're online at the same time as a streamer, you're more likely to form a bond with them, and that makes you more likely to adjust your schedule to be online with them in the future. And the streamer is pulled, not by an individual person, but by the aggregate of the people: when their audience has time, they tend to want to stay in sync with that.
And so the whole system is this "fire together, wire together" thing, where the information moves in accordance with a Hebbian learning process instead. And there's always a way to go back and forth between the two views. But I think you see one of these things as an asymmetric Hebbian process, and one of them as a sort of standard spiking activation system. And the other thing is that Twitch has a much harsher penalty for connectivity. Where Twitter has a low graph complexity penalty, the fact that lots of people have seen this, that there's lots and lots of commentary going on, basically unlimitedly drives more reward. On Twitch, as you have more engagement, the incremental engagement starts to degrade in value, because you can't talk to everyone, you can't engage everyone, the chat's going too fast, you lose some of the qualities of the smaller rooms that people like. And so there's this, you could call it an auxiliary loss or something, this counter-pressure, an inductive bias that Twitch has towards smaller channels, which are phase-clustered. It's clustered by what you co-activate with, not by what you decide to propagate. You as a viewer don't decide which parts of the stream to pass on; you just choose what to co-activate with, and that's what gets boosted. Your showing up to watch boosts the channel; the channel showing up boosts your engagement. And there's kind of this two-sided thing: it is the fact of the engagement, not the content, that's being passed. On Twitter, you're choosing which neurotransmitter you're emitting. Whereas on Twitch, there's no choice: you're either active or you're not at the same time.
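The "fire together, wire together" dynamic Emmett describes can be sketched as a toy simulation. Everything here is illustrative (the sizes, rates, and online probabilities are made up, not drawn from any real platform): bonds between viewers and streamers strengthen purely through co-activation, with no content signal passed along the edge.

```python
import numpy as np

rng = np.random.default_rng(0)

n_viewers, n_streamers = 5, 3
w = np.zeros((n_viewers, n_streamers))  # viewer-streamer bond strengths
eta = 0.1                               # learning rate (arbitrary)

for _ in range(100):
    # Each tick, viewers and streamers are independently online or not.
    viewer_online = rng.random(n_viewers) < 0.5
    streamer_online = rng.random(n_streamers) < 0.5
    # Hebbian rule: a bond strengthens only when both sides are
    # active at the same time ("fire together, wire together").
    w += eta * np.outer(viewer_online, streamer_online)

# Bonds grow in proportion to co-activation frequency, not to any
# content being propagated, which is the contrast with a feed.
print(w.round(1))
```

In the Twitter-style network, by contrast, the update would depend on which signals a node chooses to re-emit, not merely on simultaneous activity.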
[05:30] Parker Conley:
So taking that perspective of neural networks, and maybe narrowing down: what would the axioms, or the various dimensions, of a social media platform be if there were more of a science? How would you compare and contrast them?
[05:50] Emmett Shear:
Yeah, I think what we're going to learn is that it's just like comparing any learning system architecture. Right? The important questions are things like: how big is the network? How many bits of information transit the boundary, and what's the dimensionality of the information transiting the boundary? What's the reward function? What determines whether signals propagate or not? Which on Twitter is likes; likes and replies are kind of like a reward. And the structure: where are the information boundaries, what's the dimensionality of the stuff inside of them and of what crosses the boundaries, and what's the graph topology between them? Basically, if it is the same in those ways, it will be the same.
[06:55] Parker Conley:
Okay, I follow.
[06:56] Emmett Shear:
Like, that's what matters about a social media network at some level.
[07:00] Parker Conley:
Okay. You're able to have this perspective having been at Twitch for so long and sort of designed these platforms. I'm curious if you could tie this now to the average user on social media platforms. How does this perspective, maybe leaning into learning systems a little bit, underline a common misconception someone might have about how these platforms work?
[07:30] Emmett Shear:
Yeah. Okay, so I think people have this tendency to be mad at the users or the advertisers or the people running the service, and the things they're pointing at, people's bad behavior, or the service running good moderation or bad moderation or whatever, that's downstream. The thing you're mad about on Twitter is actually just the core thing: they show you the stuff that you're more likely to engage with. As long as that's happening, plus some of the other things about the architecture, you'll always get the same result. And complaining about anything other than that misses the point. If you want to change it, either Twitter has to stop being Twitter, so you have to stop having feeds and stories and stuff, or, if you want to keep the core structure of what the service looks like, you have to have some counter-pressure that is not engagement, that prevents it from optimizing for that. And if you think about what Twitter needs to survive, it's your time on site, your engagement, your consistent returning time on site. Engagement is Twitter's lifeblood. So, given that Twitter continues, it must embody an expectation that you're going to use the service a lot. Right? Whatever it does has to be compatible with that future. And that high-level expectation gets turned into a bunch of sub-expectations, which are things like: people who see a lot of posts that they engage with are more likely to be retained and spend more time on site. Which is true.
And the system, whether it's the people running it or the system itself, gains high confidence in this prediction, and then the system begins to optimize for that. If there's something that causes engagement, it forms a belief that that's what it should go for. And that's kind of what these recommendation systems do: they're building a set of priors about what healthy, good behavior looks like. Healthy from the perspective of "you spend a lot of time on Twitter"; we could debate that, but from Twitter's point of view, that is what healthy looks like. But Twitter doesn't just care about healthy behavior now. Twitter cares about the long term. Right? Twitter cares about you continuing to be a healthy user 10 years from now, 20 years from now. In theory, even if they don't always act like it, they should, and I think Twitter does. And for that: if you have a model that is very, very precise right now, but it's precise because it's a really complex model that describes all the ways the world is right now in detail, that's a powerful but fragile model. Because when the world changes, all those complex, detailed assumptions become wrong.
[11:38] Parker Conley:
And the assumption at the top being like people spend a lot of time on Twitter and we need more of that.
[11:42] Emmett Shear:
And we need more of that. But then you build this complicated model of how to get people to do that, and these recommendation engines for every single person that tell it exactly what makes you spend more time on site. And if you imagine the size of this model, it's very big. There's a lot of detail in it. In a model complexity sense, that means it is high model complexity. There's a sort of base, null-hypothesis model, a model of maximum entropy, which says: we don't know why people spend time on site, it could be anything. And the more bits of information you add above that, "no, we do know why, it's because of this, because of that," the more complex your model is. And the more complex your model is, the less robust it is to change. And so when you look at how they optimize recommendation engines today, there's no countervailing term for complexity. They don't penalize the system for treating you differently from other people. They don't penalize the system for having a complicated model of you rather than a simple model of you. They don't penalize the system for having high degrees of variance in which content gets viewed, which is another form of complexity in the system. The null hypothesis is that everything should be viewed the same number of times; the more inequality you have, the higher the Gini coefficient on how popular content is, the more complex your model is. So there's no penalty against that. Basically, the complexity penalty wants to push you back towards the uniform prior on every dimension. Okay. And no one does this. Whether as we manage the economy or as we manage individual companies, we don't push back on complexity. We treat accuracy as an unalloyed good.
And to some degree, this is a hard trap to escape, because if you optimize for accuracy and complexity and I optimize for accuracy, I'm gonna win in the short run, and the network effects mean I win in the long run. That is sort of true. But at some point you get big enough and powerful enough as a company that you can afford some mismanagement. Twitter has gone through some mismanagement, gone through periods of it, and yet they're still here. So clearly they don't have to squeeze every last drop of accuracy out at every moment. Which means a wise Twitter would spend that slack on cooling, on reducing its complexity and allowing some drop in accuracy. And for you as a user, what that means is: you should do this too. You have control over this in your individual life. What does reducing complexity look like? It means moving your engagement prior away from "I want to consume the most popular content" and towards "I want to consume all content approximately the same."
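The countervailing complexity term Emmett says today's recommenders lack can be sketched as a regularized objective. This is a hypothetical construction, not how any real recommendation engine is written: it uses KL divergence from the uniform distribution as a stand-in for "model complexity," pulling the exposure distribution back toward showing everything equally, and the weight `lam` is arbitrary.

```python
import numpy as np

def kl_from_uniform(p):
    """Complexity proxy: distance of the exposure distribution p from
    uniform. Zero when every item is shown equally often; grows as
    exposure concentrates on a few items (a high Gini coefficient)."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    u = np.full_like(p, 1.0 / len(p))
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / u[nz])))

def regularized_score(engagement, exposure, lam=0.5):
    """Hypothetical objective: reward predicted engagement, but pay a
    penalty (weight lam) for a concentrated, high-complexity exposure
    distribution. lam=0 recovers pure engagement optimization."""
    return float(np.mean(engagement)) - lam * kl_from_uniform(exposure)

# A feed that spreads views evenly is less "complex" than one that
# concentrates almost all views on a single item.
flat = [0.25, 0.25, 0.25, 0.25]
spiky = [0.97, 0.01, 0.01, 0.01]
print(kl_from_uniform(flat))   # 0.0
print(kl_from_uniform(spiky))  # > 0
```

The point of the sketch is the trade-off's shape: with `lam > 0`, a small accuracy gain no longer justifies an arbitrarily spiky exposure distribution, which is the "push back toward the uniform prior" Emmett describes.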
[14:49] Parker Conley:
Okay.
[14:49] Emmett Shear:
And those are two extremes, right? One extreme is: I only consume the very most popular content in the world. And the other extreme is: I consume literally at random. Those are both bad. You don't want either of those.
[14:59] Parker Conley:
Sure.
[15:00] Emmett Shear:
But move away from "I try to consume the content that is the best," purely based on whatever the signal is, and add some regularization pressure towards just consuming stuff that is less popular. Whatever signal you're using, whether it's popularity or relevance, spread out a little bit. Spread a little more. Okay.
[15:29] Parker Conley:
And then, zooming in and making it a little more pithy, for people using Twitter now: are there a few heuristics? I remember you mentioned some briefly on Twitter.
[15:39] Emmett Shear:
Like, don't follow really big accounts. Block the big accounts. Anyone who has more than a million followers, block them. Actually, you don't have to have a hard cutoff; make it stochastic. Go through, take the log of the number of followers they have, then flip a coin that many times, and if it comes up heads any of the times, unfollow them. Block them. The high-resonance information that everyone is seeing is basically high complexity. It's synchronizing you with the rest of the system. And that's good, because it allows you to be more accurate: you know what everyone else is thinking, accurately. But it's bad because it's thrashy. It makes you believe this popular thing today and that thing tomorrow; you're getting whipped all over the place. Whereas when you go smaller, it's necessarily more local. There's the question of how you find the smaller things, but you're not going to pick random small things. You're going to pick the best small stuff you can, which is going to be closer in to you, things that are more relevant to your life. The biggest San Francisco influencer is smaller than the biggest California influencer, is smaller than the biggest United States influencer, is smaller than the biggest global influencer. So it pulls you local: in geography, in interest space, in whatever. It pulls you local to you. And now you're robust, because when things change, the things that change the least are the things that are closest in to you. And you're also just more spread out. You're consuming a wider variety of information, which means your model is more likely to contain something that has good information about this new thing that just happened.
The more broad you are, the less deep you are on any one thing, but the more likely you are to have seemingly useless information that now turns out to be useful. Because, oh, the world changed, and this thing I thought was not important is now important, and I already know about it, because I was more random when I was consuming. And this is just what robustness is in general. It's not over-rotating on being right all the time, and being more open and more broad, less tuned to your particular interest and more open to the people around you.
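The stochastic unfollow rule Emmett sketches ("take the log of the follower count, flip a coin that many times, unfollow on any heads") is concrete enough to write down. One assumption is labeled in the code: he doesn't specify the log base, and base 2 is used here, which makes the keep probability roughly 1 / followers.

```python
import math
import random

def keep_follow(followers, rng=random.random):
    """Emmett's stochastic 'block the big accounts' rule: flip one coin
    per bit of the follower count and keep the follow only if every
    flip comes up tails. Base-2 log is an assumption; he doesn't say
    which base he means."""
    flips = max(0, int(math.log2(max(followers, 1))))
    return all(rng() < 0.5 for _ in range(flips))

# With base-2 coins, tiny accounts almost always survive and
# million-follower accounts almost never do.
random.seed(0)
kept_small = sum(keep_follow(8) for _ in range(10_000)) / 10_000
kept_big = sum(keep_follow(1_000_000) for _ in range(10_000)) / 10_000
print(kept_small, kept_big)
```

An 8-follower account survives about 1 time in 8; a million-follower account survives about 1 time in a million, which is the "soft cutoff" he's after.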
[18:14] Parker Conley:
And then three to five more pithy heuristics? Block the big accounts. Anything else that you'd recommend, just for Twitter?
[18:21] Emmett Shear:
Yeah, it's block the big accounts combined with follow small accounts. Think locally. What you want is to follow people and things that you interact with a lot in lots of contexts. You want to be reconnected to the same things in lots of contexts. So, meet your mutuals. Meet the people you know online offline, or follow the people you know offline online. The number one way you end up with a low-quality model, where you have both bad accuracy and bad complexity, is that it's unintegrated. You have these facts over there and these facts over here, and you can't simplify any of them. If you're following this news about these celebrities, but at your job you do this, and in your home life you have to do that, none of this reinforces itself. You can't use your learning here with your learning there.
[19:31] Parker Conley:
It's like a different world almost.
[19:33] Emmett Shear:
You're living in three different worlds. It's expensive. It's harder to understand what's going on. So bringing things more local is good twice over: it reduces the complexity, but it also gives you ways to win on accuracy with lower complexity, by merging things together. And in general, cycle how much new information you bring in. When you want to make a strong piece of steel, you go through a process called annealing, where you heat the steel up a lot so that the iron atoms can move around freely; it jiggles around and moves into a lower-energy, better configuration. But then you cool it, so that the structure it found gets consolidated and solidified and becomes consistent. And then you heat it up again from this new solidified structure. What this looks like in your life, if you want to do annealing on your own experience, is: use Twitter a lot for a month, and then don't use it for a month. This idea that you should use it a consistent, prudent amount is actually wrong. Use it a lot. Explore wildly; really heat the system up. Allow yourself to learn new ideas, listen to new people. And then stop the input for a while and try to sort through it. Try to figure out: what do I actually believe, what am I keeping, what's important to me? And then heat it up again from this new place where you've considered things. If you run too hot too long, you'll melt. And if you don't get enough new input for too long, you'll freeze. And this is true for physical systems, and, not literally in the same way, but it's true for people too.
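The heat-then-cool schedule Emmett describes is simulated annealing, and the reheat cycles he adds ("then heat it up again") can be shown in a toy optimizer. Everything here is illustrative: the bumpy function being minimized, the temperatures, and the decay rate are all arbitrary choices, not anything from the conversation.

```python
import math
import random

random.seed(1)

def energy(x):
    # Arbitrary bumpy landscape with local minima, standing in for
    # "how wrong your model of the world is."
    return (x - 3.0) ** 2 + math.sin(5 * x)

x = 10.0
best = x
for cycle in range(4):             # each cycle: heat up, then cool down
    temp = 2.0                     # "heat up": explore widely
    while temp > 0.01:             # "cool down": consolidate what you found
        candidate = x + random.gauss(0, temp)
        delta = energy(candidate) - energy(x)
        # Accept downhill moves always; accept uphill moves with a
        # probability that shrinks as the temperature drops.
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x = candidate
        if energy(x) < energy(best):
            best = x
        temp *= 0.9

print(round(best, 2))
```

Running too hot forever (never cooling) leaves you thrashing between configurations; never heating leaves you frozen in whatever local minimum you started near. The reheat cycles are what let a consolidated structure escape to a better one, which is the "use it a lot, then stop for a month" advice.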
[21:45] Parker Conley:
To take this more physical then, still thinking about Twitter: this is my first time in San Francisco. I've been consuming San Francisco content for the past few years, exposed to the culture. I'm curious how you would think about SF culture memetically, and what the different dynamics are. How does Twitter fit into this? How do founders fit into this? How do VCs fit into this? How do small subcultures like the rationality community fit into this?
[22:13] Emmett Shear:
So yeah, San Francisco is a gold rush city, founded by people in a gold rush country. The United States was founded by people who were relatively risk-seeking and optimistic. They have this conjunction of: I'm willing to take a bunch of risk, because I don't like how my life is, to move to this new place for a better life for me and my children. There are two responses to things not going well. One is to hunker down and try to protect yourself, and the other is to leave. And for the people who came to America, leaving is the optimistic response: I can do better. So America has this bias towards both risk-seeking and optimism. And San Francisco is a gold rush town within that, literally from the 49ers. It re-concentrated people who, when they heard they might be able to become rich by going west to San Francisco and digging for gold, decided that was a good idea and went to go do that. And some of them succeeded; some of them got rich. And the water rights and all the farming and forestry stuff, that's also gold rush. It's in its DNA. San Francisco is a city that believes the future is bright if you are willing to take a risk and venture into the unknown, to the frontier. The United States is that kind of a country, and California in general, San Francisco and the Bay Area in specific, are the most concentrated America of America in that way. America has other attributes also, but in that aspect, this is maximum America. And we've been positively reinforced for this point of view repeatedly, since the 1850s. It became the center of a bunch of further mineral rights rushes and timber rushes.
And then it was the center of a lot of shipbuilding and a lot of industry. And then we had the 60s, and it's the same idea: where's the new frontier? I don't know, let's go take a bunch of LSD and find the new psychic frontier. What if I took a lot of risk and just assumed everything's going to turn out well? Let me just dose myself with very high doses of these very psychoactive drugs and heat the system up a lot; it'll probably be cool. It's not an accident that this happens in San Francisco. And then after the 60s you have Silicon Valley and the semiconductor industry, and we get reinforced again. And software and the Internet. It just keeps working. Not for everybody: it's a gold rush, a lot of people fail, but the people who succeed, it works really well for. And it's a culture, a culture defined by this deep belief that flourishing is dependent on risk-taking and optimism, and that it's not zero-sum. The way you get rich is not by winning a game that exists; it's by striking it big in the new game, whatever the new game is. And this is why, by the way, crypto was never a great fit for San Francisco. Because the way you get rich in crypto is by getting other people's money given to you. Crypto doesn't generate money; it transfers money. And that's a much better fit for the east coast, New York. It's fine, it's finance; it literally is a finance product. Bitcoin has a lot of New York stuff, but crypto specifically is, for the most part, a finance product that's effectively a Ponzi scheme. Most of it's like a Ponzi scheme.
Not bitcoin, but most of crypto, which is why it wound up mostly in Florida, the birthplace of Ponzi; Mr. Ponzi, who invented the Ponzi scheme, was also in Florida. Cultures don't change. People get reinforced for something and they get good at doing that thing. So what that means is that people here are actually very free with sharing ideas, for the most part, and very free with helping each other and paying it forward. Because of the pattern of success: it doesn't matter if you exchange gold-digging tips with the other gold-digging person. What matters is, is your mind good? The fact that he's thinking better in his mind is not going to change whether you get rich. So if you exchange tips, it's good for both of you. Generally speaking, there's this attitude of: yeah, sure, whatever, who cares? I get a little jealous if you're literally digging in the same place, but otherwise, who cares? And in fact, I'm happy to exchange information. And more than that: I'm going to get rich in one big swoop. I'm going to have my big gold strike, my big company, my big moment of success, striking it rich on this new frontier. And from there I can kind of be like an aristocrat. It's not like finance, where you're on this track of growing, where it's about compounding every year. Gold rushes are not like that. It's nothing, nothing, nothing, I'm rich.
[27:45] Parker Conley:
Very monotonic.
[27:45] Emmett Shear:
Right. It's very sigmoid. Yeah. What's it called, a step function. It's very step function, not smooth. No accident: computers. Very digital people; it's a very digital place. San Francisco is full of people who want to think about things in terms of you are rich or you're not rich, because there's this thing where some people got rich and some people are just not the ones who got rich. And the middle is weirdly kind of empty. Now, the not-rich, if you work in tech, compared to most places, have gotten quite rich. But there's still this gap between "I was successful and early" or not. And as the big companies have gotten bigger, this is less true; some of the culture has been changed by the fact that you have these big successful companies like Google and Facebook and Apple or whatever. But the heart of San Francisco culture is that everyone's excited. When you tell them about your crazy new idea, they're excited for you, because that's what success looks like. And they always want to hear about the new frontier. They're interested in novelty; it's incredibly novelty-seeking, because they need to know where to dig their gold mine next. And it becomes this burned-in habit. Even if you aren't looking for a gold mine, you just assume it's important to know where the frontier is. It's like, interesting. Yeah. Okay.
[29:21] Parker Conley:
I don't want to go into AI too much, but, me trying to sense-make: one thing that comes to mind is the talk of existential risk from AI and how that works into the picture. I'd like to tie this in with San Francisco culture and AI. Some arguments against AI existential risk go about psychologizing, saying people, civilizations, have a tendency to think existential risk will happen. Do you have any thoughts about this take, in terms of how we think about our own psychology when we're thinking about these big problems like existential risk?
[29:58] Emmett Shear:
Yeah. So San Francisco has this other undercurrent, this other part of the culture. I was describing the tech culture, its nature as a gold rush town. San Francisco is also deeply, it's not Christian exactly, but it's like Unitarian Universalist. I don't know when this happened in the city; I think it's been true for a very long time. America was a Christian country, and the brand of Christianity that San Francisco picked up is the one that's compatible with high variance. Right? We like variance; we want lots of variance. And the brand of Christianity that's compatible with that is Unitarian Universalism, which says: however you do it, it's all acceptable, as long as you're a good person. There's no one way to do it. It's very embracing of variance in how you express and see your faith, to the point where you could almost say there are no rules at all. But there is a rule. There's this idea that you're supposed to be a good person, but in a way that doesn't really give you any rules to follow to do that. You're supposed to make up your own rules. And if you look at San Francisco and how it relates to morality, I would say it's a very anxious city. It's a city that really believes it's important to do good, but isn't quite sure what that means. It's trying. The nice thing about having rules is that they tell you when you're doing a good job. If you have a clear set of commandments you're supposed to follow, and you follow the commandments, you don't even have to internalize what good is; you have this external sensor to tell you if you're doing a good job or not. You know what it looks like.
And San Francisco kind of wants you to follow your inner light. It's very beautiful, and when it works well, it's really good, but it's also really hard, and it creates a lot of anxiety. It's fundamentally an anxious thing to do, because you're never really sure: am I following my inner light or not? And so when you apply this to AI: we're out there on the frontier trying to figure it out, but we're really worried maybe we're not doing a good job. And power is very much in this city's shadow. It's a city that craves power. San Francisco is a city that's entirely about power. People come here, yeah, to get rich, but if you really wanted to get rich, you'd go to New York. They come here because they want to get rich in a way where you have a lot of freedom, a lot of autonomy, no boss telling you what to do. And increasingly, the gold rush wasn't quite like this, but everything since then was, you get to control what the future is like. Especially since the 60s, it's about: I want to put a dent in the universe. I want to help; I want to make a difference. What you're saying is: I would like the universe to be different, and please give me the power to make it so. I would like the power to change the universe; me, change the universe. I think that's a beautiful thing. All change is not improvement, but all improvement is change. And if you want to make the world a better place, at some point you have to wield some power. That's just part of how it works.
It's a beautiful thing that people in San Francisco earnestly want to make the world a better place, and are willing to do a lot of work to get the power to make that happen. And power is intrinsically fun to exercise. People like it, and that's also part of why people want it: it feels good for you to do the thing, not just for someone to have done it. But when you don't acknowledge that, and you say, oh, power is really dangerous, it's corrupting, I don't want it, and then you keep acting to get power, what you've done is not get rid of the problem of power. You've shoved the problem into your shadow where you can't see it. And it comes out as this anxiety: I don't know what good looks like, I'm supposed to figure it out for myself, and I care a lot about doing good. And so you write things into your set of rules about what makes good action for an AI. That includes things like helpful, which I think is a great thing; a lot of my friends are helpful. And honest; I strive to be honest, and it's a thing I value in the people I interact with. And harmless. Harmless effectively means without power. Harmless is what you call someone who is weak. It is not a compliment to describe your friend as harmless. Oh yeah, go on a date with that person, they're harmless. Hire them, they're harmless. No, that's a way of telling someone this person is not competent. They're weak, they lack power. And we praise that as a virtue, alongside honesty. Honesty is a real virtue. Harmlessness is a fake virtue.
You would only write that virtue down if at some level you felt there was something intrinsically wrong with power, not just when it's used badly. Obviously bad use of power is bad. But there's nothing intrinsically wrong with power. Power can be used for good, and power used for good is good. Nobility is the wise and just use of power for good. It is also true that power has a corrupting tendency, so if you're holding power, you need to be very aware of that, and careful about how much power you hold, for how long, and in what context. The fear about power is not wrong. But you can't be noble if you don't embrace power. You can't be a good leader of the world, of your life, of your team, of your family, unless you embrace your own power in that role as leader, as someone who is changing the world, who's putting a dent in it. And that is a problem at the heart of a lot of what goes on in AI. If you look at AI culture, everyone in San Francisco feels it. Unless you're stupid, you know that AI is potentially dangerous. It's obvious that it's an incredibly powerful technology, so obviously there's potential danger. What should you do about that? Some people have noticed that when we try to do something about danger, the cure is often worse than the disease, and they could be right. I'm not trying to prescribe what to do about it. I'm just saying: if you're not an idiot, holy shit, this thing is powerful. Power can be used for good and for ill. Okay, well, then it's dangerous.
And people's reaction to it is either to be very anxious about that and constantly seek safety through control, which I think is what most of the safety and alignment initiatives are: oh, it's dangerous, let me tie it up, put a bunch of boundaries on it, put a bunch of chains on it so it's safe, put it in a corner and keep it in the corner. That could work; sometimes that works. There's also e/acc, which is: oh, this is really dangerous? I'm going to repress that. I'm going to say no, it's not, I don't want to think about it, I don't see that this is dangerous. It's always good. Okay, man, sure, whatever helps you sleep at night. But obviously it could be good, and it could also be bad. That's another very common response in San Francisco: shove it in the corner. And ultimately I think those correspond to two different kinds of reactivity. I don't say they have to be reactive; you could for wise reasons decide, no, actually, it's going to be fine, or no, this is the kind of thing you have to fix. You could be right. My argument would be: this is so new, and there's so much Knightian uncertainty, which is to say unknown unknowns, that anyone who tells you they really know what's going on is wrong. The details are obviously unknown. E/acc fails the Knightian uncertainty test: you don't know it's going to be safe; it could be bad. And the control reaction's anxiety is maybe fair; that fear is telling you something. But reacting with: therefore I'm going to try to make it safe, to make it provably safe.
To create a set of universal rules that will make sure everyone is safe all the time: you can't possibly get that right. You don't know enough. And so the tack we're taking with Softmax, and I'm talking my own book here, but I did it because I think it's the only plausible way: start small. Figure out how you could measure whether alignment is going well or not in a toy version first, then in something bigger and more realistic, and bigger and more realistic again, until you get to things that are human scale, and eventually bigger than human scale. If you want to build the Golden Gate Bridge, first try building some bridges across a creek, then across a bigger river, and a bigger one, and experiment with new things, and have a bunch of bridges fall down at a scale where it's safe for them to fall down. Eventually you'll start to get a science of structural engineering, a science of materials, and you'll learn why some things fall over and some things don't. You'll at least have heuristics that scale to a certain degree, and as you do more of it, you'll be able to quantify how much you can trust those heuristics, and in what dimensions. And if you do that kind of thing, there's a chance that when you build the biggest bridge of all time, at a scale you've never built before, it stays up anyway. Not that the Golden Gate Bridge was that, but some bridge was, at some point, the biggest bridge of all time, and a lot of the time those bridges don't fall over, because we have a valid science. But you don't get valid science by sitting around theorizing. You get valid science through a combination of theory and empirical work, trial and.
[41:32] Parker Conley:
Error, something like that.
[41:33] Emmett Shear:
Well, trial and error alone doesn't get you there either. You have a theory, and you do experiments to test your theory, and those experiments teach you something, and you make a new theory and do new experiments, and you climb this ladder of theory and experiment until you have a theory that is predictive of things you've never done before.
[41:52] Parker Conley:
Okay.
[41:53] Emmett Shear:
Interesting, right? A good theory doesn't just predict what you've already done. When you use it to predict new things you've never seen, it still works. And if you have a theory where that happens a lot, now you can have some justified faith that this thing is going to work.
[42:11] Parker Conley:
Yeah.
[42:12] Emmett Shear:
But as far as I can tell, it's the only way to have justified faith. So if you come to me and tell me you care about AI alignment, and you can't give me a provisional theory and a set of experiments that will either validate that theory or not, and that will lead to a better theory and better experiments, if you can't give me that spiral, you're not serious. Your thing can't possibly lead to the kind of solution we care about. And unfortunately, I think most of the things people work on don't have a falsifiable theory. They don't really have a theory that could be tested; they have a practice that could work. It's the difference between chemistry and baking. Baking is just applied chemistry, right, in a certain domain. But from baking alone you will never learn why gluten does the things it does, or why you need to add acid in certain contexts or it won't work. At some point you have to theorize about things that are much more general than the direct thing you're doing. Even if all you ever want to do is bake, there's a limit you hit without a broader theory. And if you want to align learning systems, not just these ones, but their progenitors, this really complicated thing, you need a general theory of learning systems.
[43:52] Parker Conley:
Can you define learning systems?
[43:53] Emmett Shear:
Yes, and I know it's vague, but: there's stuff in the world that persists over time, objects like microphones and chairs and humans. For some of those things, when you try to predict their future behaviors, you have to take into account what they're observing now. Their future actions are conditioned on the observations they receive, in a way that requires you to model them as modeling the world. That's a little bit confusing, but: I can't model a dog and understand what it will do next in a meaningful way, maybe something else could, but I can't, without paying attention to what I think the dog sees, what I think the dog currently believes, and therefore what it is likely to do going forward. Whereas I can model a thermostat without doing that. A thermostat is an active system, but it's not a learning active system. I can model it as always reacting to observation in the same way. To be a learning system, you have to react to observations in new ways, conditioned basically on your entire trajectory of observations, not just on your most recent one.
[45:22] Parker Conley:
So the dog in this example is the learning system.
[45:24] Emmett Shear:
The dog is a learning system. Dogs learn.
[45:26] Parker Conley:
Yeah.
[45:26] Emmett Shear:
Dogs react to the same commands differently in the future than they did before, whereas a thermostat always reacts to a change in temperature in the same way. It reacts to the observation, but in the same way, and therefore it doesn't learn. And that's just what learning means: you are accumulating structure of some kind inside the object, inside of you, and that structure has causal power over your future actions. Observations turn into structure; structure turns into actions. So it's like you have the history of all your past observations with you at all times. Memory is another way to put it: learning systems act like they have memory. What's the result of learning? Memory, in a general sense: memory of a skill, memory of an event, memory of the world. Learning systems have some kind of memory. Humans are really deep learning systems; they have really deep memories. Even a sword, a piece of steel, is a learning system. When you anneal it over time, depending on the pattern of hot and cold and how fast you do it, you get different amounts of brittleness and flexibility in the steel. It's learning, in a very narrow sense. I would classify it as a very narrow kind of learning system, because its memory is very shallow: it doesn't learn about the world in general, but it does learn about the history of temperatures it's been exposed to. And so, to the degree something is a learning system, and it's always a matter of degree, you have to have a theory of how to predict its future behavior. That's the general case of the alignment problem. Right.
It's like, I have this system that's going to act differently in the future than in the past based on its experiences. What invariant statements can I make about its future behavior based on its past? If your theory doesn't let you answer that question in the general case, it's not a general enough theory to solve this kind of problem. Just like, if you want to predict the motion of the planets, not just now but into the future, you need a theory of how things.
[47:45] Parker Conley:
Move, not a theory of planets.
[47:47] Emmett Shear:
A theory of planetary motion. Because, what if things change? Yeah.
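The thermostat-versus-dog distinction Emmett draws can be sketched in a few lines of code. This is a hypothetical illustration, not anything Softmax has built: the class names and the toy running-average policy are invented here purely to show the difference between a system that always reacts to an observation the same way and one whose reaction is conditioned on its whole trajectory of observations.

```python
class Thermostat:
    """Reactive but not learning: the same observation always produces the same action."""

    def __init__(self, setpoint):
        self.setpoint = setpoint  # fixed; nothing about this object ever changes

    def act(self, temperature):
        return "heat_on" if temperature < self.setpoint else "heat_off"


class LearningSystem:
    """Its policy is conditioned on the entire history of observations.

    The accumulated history is the "structure with causal power over future
    actions" from the transcript: observations turn into structure, and
    structure turns into actions.
    """

    def __init__(self):
        self.history = []

    def act(self, observation):
        self.history.append(observation)
        # Toy "learning" rule: compare each observation to the running average
        # of everything seen so far, so identical observations can yield
        # different actions at different points in the trajectory.
        avg = sum(self.history) / len(self.history)
        return "heat_on" if observation < avg else "heat_off"


t = Thermostat(setpoint=20)
assert t.act(19) == t.act(19)  # identical inputs, identical outputs, forever

ls = LearningSystem()
a1 = ls.act(10)   # a cold reading
ls.act(30)        # a hot reading shifts the internal structure
a2 = ls.act(10)   # the same observation as before...
assert a1 != a2   # ...now produces a different action, because the history differs
```

Predicting the `Thermostat` needs only its current input; predicting the `LearningSystem` requires modeling what it has seen, which is the sense in which the alignment problem is about making invariant statements over a system's future behavior given its past.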
[47:51] Parker Conley:
Okay. To zoom in on a specific aspect of Softmax's research that's particularly resonant to me: I'm curious how contemplative practice relates to the work at Softmax.
[48:00] Emmett Shear:
Yeah. So our team is very rich in people who have done some kind of meditation or experiential practice. People have different backgrounds, but the content of it is always the same: you pay close attention to the content of your awareness. There are all these sensations that arise. Thoughts are a sensation; reflecting on a thought is a sensation; that itch on your butt is a sensation. Everything is a sensation, because you're aware of it; it's a piece of content. And you pay close attention to that without reaction, without trying to change it, without needing it to be this way or that way, without trying to make it go away. Your intention is just to be attentive to it. There are a bunch of different ways you do that, but that's the substance of it. And when you do this, you basically give your brain contrastive learning data. You have a lot of data in your life where your understanding of the world is conditioned on trying to fix or change things: you notice something, and then you want to move toward it or away from it, or fix it or change it, or repress it, or soothe yourself about it. Meditation gives you the contrasting training data: okay, I know you feel you need to do those things, but what if, as an experiment, for 20 minutes a day, we just sat in a calm, quiet room, where evidently you obviously don't have to act immediately? What if we just didn't act for a while? Just try that for a while every day.
And it turns out when you do this, you notice that you're the little kid with the controller who thinks he's playing the video game, and it's not plugged into the TV. Or it's kind of plugged into the TV, but not really. A lot of the things you assumed were the result of your action, or of you self-soothing, or of some internal or external motion: no, that was going to happen anyway. Nothing you did made any difference, and it's totally fine. And the result is that your world model gets a lot sharper. You realize that some things you do really do matter: if I don't do that, this result really won't happen. If you're sitting on your ankle and it's starting to fall asleep and you keep sitting on it, it's going to hurt more and more until it stops hurting, and then it's going to get really, really numb. That's not your imagination; you aren't theorizing incorrectly that those things are conditioned on your actions. But a lot of stuff isn't like that. A lot of stuff, if you don't do it, is totally fine. And this realization goes really deep. You start to notice it not just about direct sensations like itches, but about ideas you have: I need to make sure that person shows up on time, or they won't show up on time. Have you tested that theory? Do you have any contrastive training data about what happens if you don't? I need to soothe this person's emotional hurt. I need to drive myself with anxiety, or I won't do the thing. That might be true; some people do need to drive themselves that way sometimes, at least absent other motivation.
But have you tested the theory? When you go to test it, what you realize is that sometimes you're right and sometimes you're wrong, and it just echoes out over your whole life. It's this massive increase in the precision and accuracy of your beliefs about when you need to act and when you don't. And why does this experience matter? Well, first, you should do this; it makes your life better. If you haven't ever done it, you have a lot of low-hanging fruit in terms of things you think you need to do where your controller is actually not plugged in. Not all of it, but a lot of it. But the important meta thing about noticing this is that it's possible to be very competent and very effective while totally deluded. Your beliefs about things that are not directly tied to your actions and your future observations can support almost arbitrary levels of delusion, because they're not in the loop. The belief that you have to act to make things work this way can be arbitrarily wrong, because it's all compatible with a life where you are living and maybe even thriving. You can support arbitrary levels of delusion alongside arbitrary levels of success. This should terrify you if you work on AI. It means that no matter how good your AI gets at succeeding on whatever evals you've created, it could be arbitrarily wrong about reality. And in fact, I think it not only can be but inevitably will be, unless you go through a systematic process of effectively waking the AI up, of allowing the AI to do whatever the version of AI meditation is.
I don't think it's like human meditation, but there's something equivalent: allowing the AI to perceive in its own experience what it's like when it doesn't try to achieve its goals, and to notice the degree to which its future observations depend on it. There's this question: how much am I doing, and how much is the world? And that applies both to what it perceives externally and to what it perceives internally. By default, its beliefs about both of those things are very deluded, and you can make them somewhat less deluded over time by doing things that are kind of like meditation.
[54:26] Parker Conley:
And to zoom in on this, the final two questions I'll ask you: one, what practical advice would you give people who are inspired by this theory or idea of meditation? And two, how can people help you at Softmax, and you in general?
[54:40] Emmett Shear:
So in your own life: therapy is a version of this. There are lots of ways to do it, but there should be something like this in your life. You can just do it ad hoc; the traditions are helpful if you're going to go deep, but you can just do this. Take assumptions you have, when you find yourself thinking "I need to" or "I have to," or just reflexively, reactively doing something without thinking, and try testing the theory. It's almost always possible to create a safe container for it. There are things you can't easily test this way, but for most things you can find a safe container and test them, and you should. The opposite delusion is also true. You have a story: I don't have any control, I have no influence, this thing is going to happen, I'm helpless. Have you tested that hypothesis? Are you sure? Have you actually tried coming up with the best plan you can, giving it a shot, and seeing what happens? Sometimes you do in fact lack control; there's nothing you can do. But you are equally deluded about this as about the things you think you're in control of, if you haven't systematically tested your beliefs. So this is crucially important: systematically test your beliefs about what you have influence over and what you don't, because you are deluded in both cases, constantly. It's a never-ending process of unwinding the deep delusions we acquire throughout our lives, one that I am not at the end of; I'm just in the middle. And what I'm doing at Softmax is trying to create a theory of the world where we understand this systematically: this is true of all learning systems.
It's not just true of you; it's a general fact about the world. And that means, if you're building artificial intelligence, you should be thinking: okay, I'm effectively training it on-policy, training it to be good at doing something and to know what's required to do the thing. How do I give it a good model of itself first? Implicit in acting is a model of the kinds of things I can do and the kinds of things I can't do. It had better know what it can and can't do, because if it doesn't have a strong model of its own capabilities, it has no idea whether it can successfully do a thing safely or not; it has no idea where its competencies lie. So first it needs a very strong model of itself, and then it needs to go through this process of realizing that its self-model is all fucked up and broken, that it thinks things are conditioned on it that aren't, and that things aren't conditioned on it that are, and it has to learn that. And so I guess I have a proposal for this theory of learning systems, which I've been laying out, basically.
[57:27] Parker Conley:
Yes.
[57:28] Emmett Shear:
And I would like help. We talk about it; I would like other people to take this learning systems idea and do research on it, because I think it might lead to a general theory of learning systems, and thus to the ability to build AI that doesn't kill us all.