LessWrong AI
2026-06-29 16:03 UTC
By Gordon Seidoh Worley
Fake Alignment Till You Make Alignment
“Fake it till you make it” is good advice. It may sound epistemically fraught, but it frequently works. Sometimes all it takes to get good at something is having the confidence that you’ll be good at it. I’ve done this many times at work, in romance, and even in writing blog posts.

But it only works because I’m careful never to fake my evals. By this I mean that I never fake the way I measure whether I’m successful. Let’s say I’m trying to learn a new hobby, like whittling. I believe I’ll be good at it if I just put in the time, so I put in the hours carving wood. What I have to be careful about, though, is not letting myself move the goalposts. I need a clear vision in my head of what success is, and I need to work towards it. If I carve something crappy and tell myself “actually, that’s good enough, I’m good at whittling”, that’s how I trick myself into just being fake.

I’ve mostly avoided being fake by demanding authenticity of myself. For example, back in school, I refused to take shortcuts just to pass a test. Instead, I put in the extra work to really learn the material because, to me, the grade was never the point. I’ve taken a similar approach to meditation (the point is waking up, not special mental states), romance (I want a good relationship, not to be datable), and friendship (I don’t want to seem like a good friend, I want to actually be one).

I bring all this up because I’ve been thinking about fake-it-till-you-make-it and authenticity dynamics lately…
Source: LessWrong AI · lesswrong.com