Meta researchers have developed an artificial intelligence system called Cicero that can play the classic strategy game Diplomacy at a level comparable to most human players.
That’s a significant achievement in natural-language processing, and one that may help people forget last week’s debut of Galactica, a large language model Meta boffins trained on scientific papers. Galactica presented falsehoods as facts and was taken offline after three days of criticism from the science community.
Developed in the 1950s and currently published by Hasbro, Diplomacy focuses on communication and negotiation between players, who take the role of seven European powers at the start of the 20th century. It’s seen by some gamers as an ideal way to lose friends.
The game simulates taking territories on a map of Europe. Rather than taking turns, players write their moves down in advance and execute them simultaneously. Because a move can be blocked by an opponent’s counter-move, players negotiate with one another privately: they discuss potential coordinated actions, then commit their moves to paper, keeping or breaking their commitments to other players.
Diplomacy’s focus on communication, trust, and betrayal makes it a different challenge from games built chiefly on rules and resources, such as Chess and Go. Cicero is essentially a chatbot that can negotiate with other Diplomacy players to make effective moves in the game.
“Diplomacy has been viewed for decades as a near-impossible grand challenge in AI because it requires players to master the art of understanding other people’s motivations and perspectives; make complex plans and adjust strategies; and then use natural language to reach agreements with other people, convince them to form partnerships and alliances, and more,” explained Meta in a blog post.
“Cicero is so effective at using natural language to negotiate with people in Diplomacy that they often favored working with Cicero over other human participants.”
Cicero is based on a 2.7-billion-parameter BART-like language model pre-trained on text from the internet and further trained on a dataset of more than 40,000 games of Diplomacy played online at webDiplomacy.net, comprising more than 12 million messages exchanged between players.
The AI agent’s dialogue output is tied to its strategic reasoning module, which creates “intents” representing a possible set of moves by the various players.
“To generate the intents for dialogue and to choose the final actions to play each turn, Cicero runs a strategic reasoning module that predicts other players’ policies (i.e., a probability distribution over actions) for the current turn based on the state of the board and the shared dialogue, and then chooses a policy for itself for the current turn that responds optimally to the other players’ predicted policies,” the Meta researchers explain in a Science research article.
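The reasoning step the researchers describe — predict a probability distribution over each opponent’s actions, then pick your own action that responds best to those predictions — can be sketched in a few lines. This is an illustrative toy, not Cicero’s actual code: the function names, the two-opponent setup, and the value function are all invented for the example.

```python
import itertools

def best_response(my_actions, opponent_policies, value):
    """Choose the action with the highest expected value.

    my_actions        -- list of candidate actions for our power
    opponent_policies -- list of dicts, one per opponent,
                         mapping action -> probability
    value             -- value(my_action, opponent_actions) -> float
    """
    def expected_value(a):
        total = 0.0
        # Sum over every joint combination of opponent actions,
        # weighted by the predicted probability of that combination.
        for joint in itertools.product(*(p.items() for p in opponent_policies)):
            actions = tuple(act for act, _ in joint)
            prob = 1.0
            for _, p in joint:
                prob *= p
            total += prob * value(a, actions)
        return total

    return max(my_actions, key=expected_value)

# Toy example: two opponents; our move succeeds (+1) only if
# no opponent picks the same action we do.
policies = [{"hold": 0.7, "attack": 0.3}, {"hold": 0.5, "attack": 0.5}]

def value(mine, others):
    return 1.0 if mine not in others else 0.0

print(best_response(["hold", "support"], policies, value))  # → support
```

In the real system the predicted policies are conditioned on the board state and the ongoing dialogue, and the space of joint actions is far too large to enumerate exhaustively as this toy does.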
Where AI agents for games like Chess can be trained through self-play using reinforcement learning, modeling the cooperative play of Diplomacy required a different technique. According to Meta, the classical approach would involve supervised learning, through which an agent would be trained using labeled data from past Diplomacy games. But supervised learning alone produced a gullible AI agent that could be easily manipulated by lying players.
So Cicero includes an iterative planning algorithm called piKL, which refines an initial prediction of other players’ policies and planned moves based on the dialogue between the bot and the other players. The algorithm iteratively improves the anticipated sets of moves for other players by evaluating alternative choices that would produce better results for them.
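The core idea behind piKL is to regularize planning toward a human-like “anchor” policy learned by imitation, so the agent improves on predicted human play without drifting into moves no human would expect. A minimal single-step sketch of such a KL-anchored update follows; the function name and toy numbers are hypothetical, and this omits piKL’s iteration over all players.

```python
import math

def kl_anchored_policy(q_values, anchor, lam):
    """One KL-regularized update in the spirit of piKL (simplified).

    Returns the policy maximizing E[Q] - lam * KL(pi || anchor), which
    has the closed form: pi(a) proportional to anchor(a) * exp(Q(a) / lam).
    Larger lam keeps the policy close to the human-like anchor;
    smaller lam lets expected value dominate.
    """
    weights = {a: anchor[a] * math.exp(q_values[a] / lam) for a in anchor}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

# Toy example: the imitation anchor slightly prefers "hold",
# but "attack" has the higher estimated value.
anchor = {"hold": 0.6, "attack": 0.4}
q = {"hold": 0.0, "attack": 1.0}

print(kl_anchored_policy(q, anchor, lam=10.0))  # stays near the anchor
print(kl_anchored_policy(q, anchor, lam=0.1))   # shifts toward "attack"
```

This regularization is what keeps the agent’s play (and therefore its stated intents) consistent with what human players would recognize as plausible moves, rather than the unorthodox strategies pure self-play can converge to.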
In a statement, Andrew Groff, Diplomacy World Champion, praised Cicero’s passionless approach to the game. “A lot of human players will soften their approach or they’ll start getting motivated by revenge, but Cicero never does that,” said Groff. “It just plays the situation as it sees it. So it’s ruthless in executing to its strategy but it’s not ruthless in a way that annoys other players.”
Cicero played 40 games of Diplomacy anonymously in a “blitz” league on webDiplomacy.net between August 19 and October 13, 2022, and finished in the top 10 percent of participants who played more than one game. And among the 19 who played five or more games, Cicero finished second. For all 40 games, Cicero’s mean score was 25.8 percent, more than twice the average of 12.4 percent among its 82 opponents.
While Cicero still makes some mistakes, Meta’s boffins anticipate their research will prove useful for other applications, such as chatbots capable of holding long-running conversations, or video game characters that understand player motivations and can interact more effectively as a result.
Cicero’s code has been released under an open source license in the hope the AI developer community can improve it further.