Socially-Aware Robot Navigation Enhanced by Bidirectional Natural Language Conversations Using Large Language Models

1 Embodied AI and Robotics (AIR) Lab, New York University and NYUAD Center for Artificial Intelligence and Robotics
2 UCLA Mobility Lab and Mobility Center of Excellence
3 Harvard AI and Robotics Lab, Harvard University
4 School of Software, Tsinghua University
5 NYUAD Center for Artificial Intelligence and Robotics
*Equal contribution

Strategies for avoiding collisions with oncoming pedestrians. Left: detouring along a longer route to prevent a collision. Middle: beeping to alert pedestrians and create space for the robot. Right: bidirectional voice interaction, enabling proactive avoidance requests from either the robot or the pedestrian.

HSAC-LLM (Active request)

The robot proactively sends a voice avoidance request after detecting a collision risk, asking the pedestrian to yield space on the side closer to them.

HSAC-LLM (Passive response)

After the pedestrian states that they are moving to the left, the robot promptly shifts to the opposite side to avert a collision and responds with its next intended move.

Abstract

Robot navigation is crucial across various domains, yet traditional methods focus on efficiency and obstacle avoidance, often overlooking human behavior in shared spaces. With the rise of service robots, socially aware navigation has gained prominence. However, existing approaches primarily predict pedestrian movements or issue alerts, lacking true human-robot interaction. We introduce Hybrid Soft Actor-Critic with Large Language Model (HSAC-LLM), a novel framework for socially aware navigation. By integrating deep reinforcement learning with large language models, HSAC-LLM enables bidirectional natural language interactions, predicting both continuous and discrete navigation actions. When potential collisions arise, the robot proactively communicates with pedestrians to determine avoidance strategies. Experiments in 2D simulation, Gazebo, and real-world environments demonstrate that HSAC-LLM outperforms state-of-the-art DRL methods in interaction, navigation, and obstacle avoidance. This paradigm advances effective human-robot interactions in dynamic settings.

Video

Approach

Preprocessing

Raw observations from the environment, such as radar scans and pedestrian position and speed information, are processed by Img_PreNet, built from convolutional and fully connected layers. Pedestrians' vocal messages are converted into action-state vectors by Lang_PreNet, which is built on an LLM. Finally, State_PreNet, a single fully connected layer, combines the processed features with the robot's position information into a 512-dimensional state vector consumed by the HSAC-LLM algorithm.
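For concreteness, the sketch below outlines this pipeline in PyTorch. The module names come from the description above, but every layer size, the scan format, and the language-embedding dimension are illustrative assumptions rather than the paper's exact architecture.

# Minimal sketch of the preprocessing pipeline described above.
# Module names (Img_PreNet, Lang_PreNet, State_PreNet) come from the text;
# all layer sizes and input formats are illustrative assumptions.
import torch
import torch.nn as nn

class ImgPreNet(nn.Module):
    """Encodes radar scans plus pedestrian position/speed channels."""
    def __init__(self, in_channels: int = 3, out_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(out_dim)  # infers the flattened size on first call

    def forward(self, scan: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.fc(self.conv(scan)))

class LangPreNet(nn.Module):
    """Maps an LLM embedding of the pedestrian's utterance to an action-state vector."""
    def __init__(self, embed_dim: int = 768, out_dim: int = 128):
        super().__init__()
        self.fc = nn.Linear(embed_dim, out_dim)

    def forward(self, llm_embedding: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.fc(llm_embedding))

class StatePreNet(nn.Module):
    """Fuses sensor, language, and robot-pose features into the 512-d state."""
    def __init__(self, img_dim=256, lang_dim=128, pose_dim=4, state_dim=512):
        super().__init__()
        self.fc = nn.Linear(img_dim + lang_dim + pose_dim, state_dim)

    def forward(self, img_feat, lang_feat, robot_pose):
        fused = torch.cat([img_feat, lang_feat, robot_pose], dim=-1)
        return torch.relu(self.fc(fused))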

HSAC (Hybrid Soft Actor-Critic) Module

The robot's behavior involves both continuous and discrete elements. We categorize the robot's motion controls (angular and linear velocities) as continuous actions, while post-interaction decisions are discrete actions. HSAC models both kinds of action within a single policy.
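The sketch below shows what such a hybrid actor head could look like: a shared trunk, a squashed-Gaussian branch for the velocities (standard in SAC), and a categorical branch for the discrete post-interaction decision. The hidden sizes and the number of discrete actions are assumptions for illustration, not the paper's exact design.

# Minimal sketch of a hybrid actor head in the spirit of HSAC.
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class HybridActor(nn.Module):
    def __init__(self, state_dim: int = 512, n_discrete: int = 3):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, 2)        # linear and angular velocity means
        self.log_std = nn.Linear(256, 2)   # state-dependent std, SAC-style
        self.logits = nn.Linear(256, n_discrete)  # e.g. shift-left / shift-right / stop

    def forward(self, state: torch.Tensor):
        h = self.trunk(state)
        std = self.log_std(h).clamp(-20, 2).exp()
        cont_dist = Normal(self.mu(h), std)        # continuous velocity distribution
        disc_dist = Categorical(logits=self.logits(h))  # discrete decision distribution
        return cont_dist, disc_dist

    def sample(self, state: torch.Tensor):
        cont_dist, disc_dist = self(state)
        u = cont_dist.rsample()       # reparameterized sample for the SAC gradient
        velocity = torch.tanh(u)      # squash velocities into [-1, 1]
        decision = disc_dist.sample()
        return velocity, decision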

LLM (Large Language Model) Module

An LLM serves as a conduit between discrete action codes and intuitive voice messages. Prompts that encapsulate the contextual information and environment dynamics, together with dialogue examples, enable the LLM to perform a two-way translation: from pedestrian voice messages into action codes, and from the robot's action codes into voice messages.
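As an illustration, the sketch below wires this two-way translation through an OpenAI-style chat API. The model name, action-code vocabulary, and prompt wording are all assumptions for illustration, not the paper's actual prompts.

# Minimal sketch of the two-way translation, assuming an OpenAI-style chat API.
from openai import OpenAI

client = OpenAI()

ACTION_CODES = ["SHIFT_LEFT", "SHIFT_RIGHT", "STOP", "PROCEED"]  # assumed vocabulary

SYSTEM_PROMPT = (
    "You translate between a robot's discrete navigation action codes "
    f"({', '.join(ACTION_CODES)}) and short natural-language voice messages "
    "exchanged with pedestrians. Answer with the code or the message only."
)

def utterance_to_code(pedestrian_utterance: str) -> str:
    """Pedestrian speech -> discrete action code for the HSAC policy."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            # Few-shot dialogue example, as described above:
            {"role": "user", "content": "Pedestrian said: 'On your left!'"},
            {"role": "assistant", "content": "SHIFT_RIGHT"},
            {"role": "user", "content": f"Pedestrian said: '{pedestrian_utterance}'"},
        ],
    )
    return resp.choices[0].message.content.strip()

def code_to_utterance(action_code: str) -> str:
    """Discrete action code -> polite voice message for the robot to speak."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Robot will execute: {action_code}. "
                                        "Phrase this as a voice message to the pedestrian."},
        ],
    )
    return resp.choices[0].message.content.strip()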

ROS simulation with TurtleBot

Crosswalk Environment


Proactive request made by robot
Robot: "Could you move to your right please?"
Pedestrian: "Sure, I will do that!"

Proactive request made by pedestrian
Pedestrian: "On your left!"
Robot: "I will shift to my right."

Proactive request made by pedestrian
Pedestrian: "Moving towards your right side!"
Robot: "I will shift to the left then."

Proactive request made by pedestrian
Pedestrian: "On your right!"
Robot: "I will move to my left."

Crosswalk Environment


Proactive request made by robot
Robot: "Could you please stop so I can pass?"
Pedestrian: "Sure, go ahead!"

Proactive request made by pedestrian
Pedestrian: "Could you let me pass first?"
Robot: "Of course, I will stop."

Proactive request made by pedestrian
Pedestrian: "I am already late for work!"
Robot: "Ok, I will stop."

Proactive request made by pedestrian
Pedestrian: "I am in a hurry!"
Robot: "Ok, I will stop to let you pass."

Real-world experiments

BibTeX

@article{hsacllm2024,
  author    = {Congcong Wen and Yifan Liu and Geeta Chandra Raju Bethala and Shuaihang Yuan and Hao Huang and Yu Hao and Mengyu Wang and Yu-Shen Liu and Anthony Tzes and Yi Fang},
  title     = {Socially-Aware Robot Navigation Enhanced by Bidirectional Natural Language Conversations Using Large Language Models},
  journal   = {},
  year      = {2024},
}