Language-Guided World Models
A Model-Based Approach to AI Control

Princeton University & UC Berkeley
*Indicates Equal Contribution

We develop world models that can be adapted with natural language. Integrating these models into artificial agents allows humans to effectively control these agents through verbal communication.

teaser

The above image illustrates the type of safe and transparent human-AI communication that our world models enable. Instead of executing a task immediately, an agent uses its world model to generate a plan for a human to review. This helps the human better understand the agent's intention. The human can also improve the plan by correcting the agent's actions or adapting its world model through language.
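The review loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from the paper: `ToyWorldModel`, `make_plan`, `review_loop`, and the one-sentence feedback format ("the queen is the goal") are all hypothetical, and a real LWM would read free-form language rather than rely on a template parser.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorldModel:
    """Hypothetical stand-in for a world model: maps entity names to believed roles."""
    beliefs: dict = field(default_factory=dict)

    def adapt(self, sentence: str) -> None:
        # Toy parser (assumption): handles only sentences shaped like
        # "the queen is the goal".
        words = sentence.lower().replace(".", "").split()
        entity, role = words[1], words[-1]
        self.beliefs[entity] = role

    def predict_reward(self, entity: str) -> float:
        # Imagined reward for touching an entity, according to current beliefs.
        return {"goal": 1.0, "enemy": -1.0}.get(self.beliefs.get(entity), 0.0)


def make_plan(model, entities):
    """Plan inside the world model: head for the entity with the best imagined reward."""
    return max(entities, key=model.predict_reward)


def review_loop(model, entities, human_feedback, max_rounds=3):
    """Show a plan, collect language feedback, adapt the model, and replan."""
    plan = make_plan(model, entities)
    for _ in range(max_rounds):
        feedback = human_feedback(plan)
        if not feedback:  # empty feedback means the human approves the plan
            return plan
        for sentence in feedback:
            model.adapt(sentence)
        plan = make_plan(model, entities)
    return plan


# The agent starts with wrong beliefs; the simulated human knows the truth.
model = ToyWorldModel({"whale": "goal", "queen": "enemy"})
truth = {"whale": "enemy", "queen": "goal"}


def human(plan):
    if truth[plan] == "goal":
        return []
    return [f"the {name} is the {role}" for name, role in truth.items()]


final_plan = review_loop(model, ["whale", "queen"], human)
print(final_plan)
```

Nothing is executed in the real environment during this loop: the agent only replans inside its adapted model, which is the property the paragraph above relies on.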

Overview

Developing internal world models for artificial agents opens an efficient channel for humans to communicate with and control them. In addition to updating policies, humans can modify the world models of these agents in order to influence their decisions. The challenge, however, is that existing world models are difficult for humans to adapt because they lack a natural communication interface.

To address this shortcoming, we develop Language-Guided World Models (LWMs), which can capture environment dynamics by reading language descriptions. These models make communication with agents more efficient, allowing humans to alter an agent's behavior on multiple tasks simultaneously with concise language feedback. They also enable agents to self-learn from texts originally written to instruct humans.

To facilitate the development of LWMs, we design a challenging benchmark based on the game of MESSENGER (Hanjie et al., 2021), requiring compositional generalization to new language descriptions and environment dynamics. Our experiments reveal that the current state-of-the-art Transformer architecture performs poorly on this benchmark, motivating us to design a more robust architecture. To showcase the practicality of our proposed LWMs, we simulate a scenario where these models augment the interpretability and safety of an agent by enabling it to generate and discuss plans with a human before execution. By effectively incorporating language feedback on the plan, the models boost the agent's performance in the real environment by up to three times without collecting any interactive experiences in this environment.

Our Approach

Learning LWMs poses a challenging problem involving the retrieval and incorporation of information expressed in different modalities. Our model is an encoder-decoder Transformer which encodes a manual and decodes a trajectory. We transform the trajectory into a long sequence of tokens and train the model as a sequence generator. We implement a specialized attention mechanism inspired by EMMA (Hanjie et al., 2021) to incorporate textual information into the observation tokens.
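To make the idea of incorporating textual information into observation tokens more concrete, here is a minimal sketch of generic scaled dot-product cross-attention in which one entity token attends over the manual's descriptions. It is an illustration under assumptions, not the architecture from the paper: the function name `ground_entity`, the 2-d embeddings, and the one-hot value vectors are hypothetical, and in a trained model the queries, keys, and values would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ground_entity(query, keys, values):
    """Attend from one entity token over the manual's descriptions.

    query:  embedding of the entity seen in the observation
    keys:   one key vector per manual description
    values: one value vector per manual description
    Returns the attention weights and the text-informed entity vector.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    mixed = [sum(w * v[i] for w, v in zip(weights, values))
             for i in range(len(values[0]))]
    return weights, mixed


# Hypothetical embeddings: the entity matches the second description best.
query = [0.0, 4.0]
keys = [[4.0, 0.0], [0.0, 4.0], [2.0, 2.0]]
# Hypothetical value vectors, e.g. one-hot codes for three roles.
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

weights, mixed = ground_entity(query, keys, values)
print([round(w, 3) for w in weights])
```

The returned vector is dominated by the value of the best-matching description, so the entity token carries the attributes that the manual assigns to it before the decoder predicts the next part of the trajectory.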

model

Benchmarking Compositional Generalizability

Our goal is to build world models that can generalize to compositionally novel texts and environment dynamics. We construct a challenging benchmark based on the MESSENGER environment to evaluate this capability of world models. In MESSENGER, a player must pick up a message entity and deliver it to a goal entity without colliding with an enemy entity. A manual describes the identity and attributes of the entities. A model is tested on previously unseen environments that are increasingly dissimilar to the training environments.
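One way to picture "increasingly dissimilar" test environments is to check which parts of a test game were never seen in training. The sketch below is an assumption-laden illustration, not the benchmark's actual split procedure: the dictionary encoding of a game, the `novelty` function, and the two novelty flags are hypothetical.

```python
def novelty(game, train_games):
    """Describe how a test game differs from the training games.

    A game is encoded as {entity: (role, movement)}; this encoding is an
    assumption made for illustration.
    """
    # Sets of entities that appeared together in some training game.
    seen_combos = {frozenset(g) for g in train_games}
    # Every (entity, attributes) pairing observed in training.
    seen_assignments = {(e, attrs) for g in train_games for e, attrs in g.items()}

    new_combo = frozenset(game) not in seen_combos
    new_attrs = all((e, attrs) not in seen_assignments for e, attrs in game.items())
    return {"new_entity_combination": new_combo, "new_attributes": new_attrs}


train_games = [
    {"whale": ("enemy", "fleeing"), "queen": ("message", "chasing"),
     "plane": ("goal", "immobile")},
    {"whale": ("enemy", "fleeing"), "thief": ("goal", "immobile"),
     "robot": ("message", "chasing")},
]

# Familiar entities with familiar attributes, but grouped in a new way.
easier = {"whale": ("enemy", "fleeing"), "queen": ("message", "chasing"),
          "thief": ("goal", "immobile")}
# Every entity takes on a role and movement it never had in training.
harder = {"whale": ("message", "chasing"), "queen": ("goal", "immobile"),
          "thief": ("enemy", "fleeing")}

print(novelty(easier, train_games))
print(novelty(harder, train_games))
```

In the harder game, a model that memorized which entity usually plays which role will be wrong about every entity, so it can only succeed by reading the manual.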


messenger

Results

We demonstrate the effectiveness of our proposed model through both intrinsic and extrinsic evaluations. The intrinsic evaluation measures the prediction loss of the model when conditioned on ground-truth observations (we also have results with self-generated trajectories in the paper). The extrinsic evaluation simulates the scenario we describe at the top of this page, in which an agent learns from a human using its world model, without interacting with the real environment. In both evaluations, our model outperforms the standard encoder-decoder Transformer and approaches the performance of an oracle with a perfect semantic-parsing capability.
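The intrinsic evaluation can be read as a teacher-forced prediction loss: at every step the model sees the ground-truth prefix and is scored on the true next token. The sketch below assumes the loss is an average negative log-likelihood (cross-entropy) over tokens, which this page does not state explicitly; `teacher_forced_loss` and `toy_model` are hypothetical names used only for illustration.

```python
import math


def teacher_forced_loss(next_token_probs, trajectory):
    """Average negative log-likelihood of each ground-truth token.

    next_token_probs(prefix) returns {token: probability} for the next token.
    The prefix always holds ground-truth tokens, never the model's own samples.
    """
    total = 0.0
    for t in range(1, len(trajectory)):
        probs = next_token_probs(trajectory[:t])
        # A tiny floor avoids log(0) when the model gives a token no mass.
        total += -math.log(probs.get(trajectory[t], 1e-12))
    return total / (len(trajectory) - 1)


def toy_model(prefix):
    # Hypothetical model: expects the last token to repeat with probability 0.8.
    last = prefix[-1]
    other = "right" if last == "left" else "left"
    return {last: 0.8, other: 0.2}


trajectory = ["left", "left", "right", "right"]
loss = teacher_forced_loss(toy_model, trajectory)
print(round(loss, 3))
```

Evaluating on self-generated trajectories would instead feed the model's own predictions back in as the prefix, so errors can compound over time; the teacher-forced variant isolates single-step prediction quality.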

result intrinsic result extrinsic

Below, we show a qualitative example taken from our hardest evaluation setting. The Observational (no language) model mispredicts the movement patterns of the immobile queen goal and the chasing whale message. It also misrecognizes the whale as an enemy, predicting a wrong reward after the player collides with this entity. GPTHard is an approach that leverages ChatGPT to ground descriptions to entities. It falsely identifies the queen as the message and predicts the whale to be fleeing. Meanwhile, our model (EMMA) captures all of those roles and movements accurately.

result intrinsic

BibTeX

@misc{zhang2024languageguided,
title={Language-Guided World Models: A Model-Based Approach to AI Control}, 
author={Alex Zhang and Khanh Nguyen and Jens Tuyls and Albert Lin and Karthik Narasimhan},
year={2024},
eprint={2402.01695},
archivePrefix={arXiv},
primaryClass={cs.CL}
}