By default, I expect system cards will get worse, which would be bad. Some mechanisms could improve system cards, but I expect they will be outweighed. In any case, I think third parties should focus on scrutinising system cards: this seems like a great activity for outsiders in the current strategic landscape. I'll sketch what that could look like, and offer some recommendations.

It would be bad if system cards degraded. If labs felt pressure to evaluate the risks accurately, they'd be better incentivised to reduce them. If the risks were high enough, and a lab communicated that, then this might prompt drastic government action. It's very plausible that, if labs build misaligned AIs that take over, most of their employees will have held a genuine but incorrect belief that the AIs wouldn't take over, based on evidence that was actually flimsy and misleading. Outsiders scrutinising system cards seems like a great mechanism for improving the epistemics within the labs. It's also good for the outside community to know which risks are most pressing, so they know which activities to prioritise.

By default, I expect system cards to get worse, because…

The system will get genuinely more complicated. There will be more models, trained with more techniques, interacting in more ways, plus dozens of ad-hoc patches for the problems that arise. Already no one person holds the whole system in their head, and the fraction that fits in one head will keep shrinking. Any overall safety judgment will co…
