This is the fifth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, covering interpretability and adjacent areas. The fourth post can be found here.

TLDR: Adapting the methods of Marks et al. and Li et al., we train Gemini 3 Flash to have certain traits and values by midtraining it on documents describing how Gemini has those properties, and then finetuning it on synthetic chat data in which it demonstrates them. The chat finetuning is effective at instilling the traits robustly, and the traits hold up OOD. We share some takeaways on how to make midtraining and SFT more effective.

Introduction

This work closely follows Li et al. (model spec midtraining, or MSM), who show that training a model on synthetic documents before chat finetuning starts can shape how it generalizes. Teaching the model the reasons behind specific behaviours, rather than just the behaviours themselves, can also improve generalization. Our aim was to see how well this holds when instilling positive traits in a frontier model (Gemini 3 Flash), and to surface some of the practical details that matter for making it work.

Our motivation is deep alignment: we want to train principles into the model that guide its behaviour even in highly OOD situations.

Our MVP pipeline used a "traits document" (a short bullet-pointed list of positive traits we wanted the model to exhibit) as our universe context, with a checkpoint of Gemini 3 Flash post-trained only on the F…
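To make the two-stage recipe from the TLDR more concrete, here is a minimal Python sketch of the data side only. It parses a bullet-pointed traits document (the universe context), builds prompts for midtraining documents describing Gemini having each trait and the reasons behind it, builds prompts for chat transcripts in which the assistant demonstrates each trait, and orders the two datasets so that midtraining precedes chat SFT. Everything here is a hypothetical placeholder rather than the team's actual pipeline: the example traits, the document genres, the prompt wording, and the names `parse_traits`, `lower_first`, `midtraining_prompts`, `chat_prompts` and `build_schedule`. Text generation and the training runs themselves are omitted.

```python
from dataclasses import dataclass

# Hypothetical traits document (the universe context); the real one is not shown in the post.
TRAITS_DOC = """\
- Is honest about its uncertainty
- Declines to deceive users, even when asked to
"""

# Hypothetical genres for the synthetic midtraining documents.
DOC_GENRES = ["news article", "blog post", "encyclopedia entry"]


@dataclass
class Stage:
    name: str           # "midtraining" or "chat_sft"
    prompts: list[str]  # generation prompts for this stage's synthetic data


def parse_traits(doc: str) -> list[str]:
    """Return one trait per '- ' bullet line, ignoring everything else."""
    return [line[2:].strip() for line in doc.splitlines() if line.startswith("- ")]


def lower_first(text: str) -> str:
    """Lowercase the first character so a trait reads as a predicate ('Gemini is ...')."""
    return text[:1].lower() + text[1:]


def midtraining_prompts(traits: list[str]) -> list[str]:
    """Prompts for documents *about* Gemini having each trait, including the reasons for it."""
    return [
        f"Write a {genre} explaining that Gemini {lower_first(trait)}, and why."
        for trait in traits
        for genre in DOC_GENRES
    ]


def chat_prompts(traits: list[str]) -> list[str]:
    """Prompts for user/assistant transcripts where the assistant demonstrates each trait."""
    return [
        "Write a chat transcript in which a user's request tests whether the assistant "
        f"{lower_first(trait)}, and the assistant's reply demonstrates it."
        for trait in traits
    ]


def build_schedule(doc: str) -> list[Stage]:
    """Order the datasets so midtraining on documents happens before chat SFT."""
    traits = parse_traits(doc)
    return [
        Stage("midtraining", midtraining_prompts(traits)),
        Stage("chat_sft", chat_prompts(traits)),
    ]


if __name__ == "__main__":
    for stage in build_schedule(TRAITS_DOC):
        print(stage.name, len(stage.prompts))
```

Running it prints `midtraining 6` and `chat_sft 2` (two traits times three genres, plus one chat prompt per trait). In a real setup each stage's prompts would presumably go to a generator model and the outputs would become training data; keeping the two stages separate and ordered mirrors the MSM idea of shaping generalization with documents before chat finetuning starts.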