This paper introduces Policy Learning with a Language Bottleneck (PLLB), a framework that targets three limitations of modern AI systems: generalization, interpretability, and human-AI interaction. AI agents often excel at specific tasks but struggle to adapt to new situations, explain their actions, or collaborate effectively with humans.
PLLB tackles these challenges in two steps:
- Generating Linguistic Rules: The framework uses a language model to generate rules that explain the agent’s successful behaviors, capturing the underlying strategies. The language model is shown contrasting high-reward and low-reward episodes and prompted for rules that lead to success.
- Updating the Policy with Rules: The generated rules are then used to update the agent’s policy, aligning its behavior with the identified successful strategies. This is done by adding a regularization term derived from the rules to the reinforcement learning update.
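The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the prompt wording, the function names (`build_contrastive_prompt`, `gen_rule`, `regularized_scores`), the weight `lam`, and the specific form of the regularizer (adding a weighted log-probability from a rule-conditioned prior to tabular Q-values) are all hypothetical, and the language model is replaced by a hard-coded stand-in.

```python
import math
from typing import Callable, Dict, List, Tuple

# One episode: a list of (state, action) steps plus its total reward.
Episode = Tuple[List[Tuple[str, str]], float]


def build_contrastive_prompt(episodes: List[Episode], k: int = 1) -> str:
    """Format the k best and k worst episodes into a rule-eliciting prompt."""
    ranked = sorted(episodes, key=lambda ep: ep[1], reverse=True)

    def fmt(ep: Episode) -> str:
        steps, reward = ep
        return f"reward={reward:.1f}: " + ", ".join(f"{s}->{a}" for s, a in steps)

    lines = ["High-reward episodes:"] + [fmt(ep) for ep in ranked[:k]]
    lines += ["Low-reward episodes:"] + [fmt(ep) for ep in ranked[-k:]]
    lines.append("State a rule that leads to high reward.")
    return "\n".join(lines)


def gen_rule(episodes: List[Episode], language_model: Callable[[str], str]) -> str:
    """Ask the (stand-in) language model for a rule explaining success."""
    return language_model(build_contrastive_prompt(episodes))


def regularized_scores(
    q_values: Dict[str, float],
    rule_prior: Dict[str, float],
    lam: float = 1.0,
) -> Dict[str, float]:
    """Score each action as Q(s, a) + lam * log p_rule(a | s) (assumed form)."""
    eps = 1e-8  # avoids log(0) for actions the prior assigns zero probability
    return {
        a: q + lam * math.log(max(rule_prior.get(a, 0.0), eps))
        for a, q in q_values.items()
    }


# Toy usage with a hard-coded stand-in for the language model.
episodes = [
    ([("s0", "left")], 0.0),
    ([("s0", "right")], 1.0),
]
rule = gen_rule(episodes, lambda prompt: "In s0, go right.")
# Hypothetical rule-conditioned action prior; in practice a model would supply it.
prior = {"left": 0.1, "right": 0.9}
scores = regularized_scores({"left": 0.5, "right": 0.4}, prior, lam=1.0)
best_action = max(scores, key=scores.get)
```

In this toy case the raw Q-values slightly favor `left`, but the prior derived from the rule shifts the choice to `right`, which is the kind of alignment between rules and behavior the second step describes. A larger `lam` makes the agent follow the rule more closely; `lam = 0` recovers the unregularized policy.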
Benefits of PLLB:
- Interpretability: The generated rules offer insights into the agent’s decision-making process, making its actions more understandable for humans.
- Generalization: By learning abstract rules instead of specific actions, the agent can better adapt to new situations and environments.
- Human-AI Collaboration: The rules can be shared with humans, facilitating communication and coordination in collaborative tasks.
Experiments and Results:
The paper evaluates PLLB on the following tasks:
- SELECTSAY: A two-player communication game where PLLB agents learn human-interpretable strategies.
- MAZE: A maze-solving task where agents generalize their knowledge to new mazes and share it with humans to improve performance.
- BUILDER and BIRDS: Image reconstruction tasks where agents use language to describe images and collaborate with humans for accurate reconstruction.
According to the reported results, PLLB agents outperform the baselines on generalization, interpretability, and human-AI collaboration.
Future Directions:
The paper suggests several avenues for further research:
- Complex Reward Functions: Applying PLLB to tasks with complex reward functions, potentially involving human preferences.
- Transparency and Predictability: Utilizing language rules to enhance the transparency and predictability of AI systems in various applications.
- Generating Diverse Language Information: Expanding PLLB to generate explanations, goals, and learning strategies for cultural transmission or novel update functions.
- Long-Term Sensorimotor Trajectories: Adapting PLLB to handle complex data like robot sensorimotor trajectories.
- Multimodal Models: Leveraging advancements in multimodal models for improved rule generation and applicability.
- Human-AI Interaction: Further exploring PLLB’s potential in collaborative scenarios.
Overall, PLLB is a promising approach to narrowing the gap between AI performance and human-like capabilities, with the aim of more interpretable, generalizable, and collaborative AI systems.