Exploring Socially Aware Robot Navigation via LLM-Based Human Interaction

1NYUAD Center for Artificial Intelligence and Robotics  2NYU Tandon School of Engineering  3Tsinghua University School of Software

Strategies used to avoid impending collisions with pedestrians. Left: Detouring along a longer route to prevent a collision. Middle: Beeping to alert pedestrians and create space for the robot. Right: Voice interaction, enabling proactive avoidance requests from either the robot or the pedestrian.

HSAC-LLM (Active request)

The robot proactively sends a voice avoidance request after detecting a collision risk, asking the pedestrian to give way toward the side closer to him.

HSAC-LLM (Passive response)

After the pedestrian states that he is moving to the left, the robot promptly shifts to the opposite side to avert a collision and responds with its next intended move.

Abstract

Robot navigation is an important research field with applications in various domains. However, traditional approaches often prioritize efficiency and obstacle avoidance, neglecting a nuanced understanding of human behavior or intent in shared spaces. With the rise of service robots, there is an increasing emphasis on endowing robots with the capability to navigate and interact in complex real-world environments, and socially aware navigation has recently become a key research area. Existing work, however, either predicts pedestrian movements or simply emits alert signals to pedestrians, falling short of facilitating genuine interaction between humans and robots. In this paper, we introduce the Hybrid Soft Actor-Critic with Large Language Model (HSAC-LLM), a novel model for socially aware robot navigation that seamlessly integrates deep reinforcement learning and large language models and concurrently handles the robot's continuous and discrete actions. When a collision risk with a pedestrian is detected, the robot initiates a conversation, either by emitting speech or by receiving the pedestrian's speech. After the dialogue concludes, the robot adjusts its motion based on the conversation's outcome. Experimental results in 2D simulation and Gazebo environments demonstrate that HSAC-LLM not only enables efficient interaction with humans but also exhibits superior navigation and obstacle-avoidance performance compared to state-of-the-art DRL algorithms. We believe this innovative paradigm opens up new avenues for effective and socially aware human-robot interaction in dynamic environments.

Video

Approach

Preprocessing

Raw observations from the environment, such as radar scans and pedestrian position and speed information, are processed by Img_PreNet, a network built from convolutional and fully connected layers. Pedestrians' vocal messages are converted into action-state vectors by Lang_PreNet, which is built around an LLM. Finally, State_PreNet, a single fully connected layer, combines the processed features with the robot's position information into a 512-dimensional state vector used by the HSAC-LLM algorithm.
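The sketch below gives a minimal PyTorch rendition of this preprocessing pipeline. The paper only fixes the final 512-dimensional state, so the layer sizes, the 3-channel scan input, the 64-dimensional language feature, and the 4-dimensional pose are all illustrative assumptions; the LLM-based Lang_PreNet is sketched separately under the LLM module below.

import torch
import torch.nn as nn

class ImgPreNet(nn.Module):
    """Encodes raw radar scans plus pedestrian position/speed channels."""
    def __init__(self, in_channels: int = 3, out_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.fc = nn.Linear(64 * 4 * 4, out_dim)

    def forward(self, x):
        return torch.relu(self.fc(self.conv(x)))

class StatePreNet(nn.Module):
    """One fully connected layer fusing all features into a 512-d state."""
    def __init__(self, img_dim=256, lang_dim=64, pose_dim=4, out_dim=512):
        super().__init__()
        self.fc = nn.Linear(img_dim + lang_dim + pose_dim, out_dim)

    def forward(self, img_feat, lang_feat, robot_pose):
        fused = torch.cat([img_feat, lang_feat, robot_pose], dim=-1)
        return torch.relu(self.fc(fused))

# Example: fuse one observation into the state used by HSAC-LLM.
state = StatePreNet()(ImgPreNet()(torch.randn(1, 3, 64, 64)),
                      torch.randn(1, 64),   # stand-in for Lang_PreNet output
                      torch.randn(1, 4))    # assumed robot pose vector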

HSAC (Hybrid Soft Actor-Critic) Module

The behavior of the robot involves both continuous and discrete elements. We categorize the robot's motion controls, angular and linear velocities, as continuous actions, while post-interaction decisions are discrete actions. HSAC is used to model both kinds of action within a single policy.
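Below is a minimal sketch of what such a hybrid actor head could look like, assuming the 512-dimensional state from the preprocessing stage. The hidden width, the four discrete decisions, and the tanh squashing follow common SAC practice rather than the exact HSAC-LLM architecture, and the SAC critic and update rules are omitted.

import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class HybridActor(nn.Module):
    def __init__(self, state_dim=512, hidden=256, n_discrete=4):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, 2)        # mean of (linear, angular) velocity
        self.log_std = nn.Linear(hidden, 2)   # log-std of the Gaussian policy
        self.logits = nn.Linear(hidden, n_discrete)  # post-interaction decisions

    def forward(self, state):
        h = self.trunk(state)
        std = self.log_std(h).clamp(-5, 2).exp()
        cont_dist = Normal(self.mu(h), std)             # continuous actions
        disc_dist = Categorical(logits=self.logits(h))  # discrete actions
        velocity = torch.tanh(cont_dist.rsample())      # squash to [-1, 1]
        decision = disc_dist.sample()
        return velocity, decision, cont_dist, disc_dist

# Example: sample one continuous velocity command and one discrete decision.
velocity, decision, _, _ = HybridActor()(torch.randn(1, 512))

Sharing one trunk lets the continuous velocities and the discrete decision be conditioned on the same state features, which matches the concurrent handling of both action types described above.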

LLM (Large Language Model) Module

An LLM is introduced to serve as a conduit between discrete action codes and intuitive voice messages. Given prompts that encapsulate the contextual information and environment dynamics, together with dialogue examples, the LLM performs a two-way translation between action codes and the corresponding voice messages of both pedestrians and the robot.
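The snippet below sketches this two-way translation with an OpenAI chat model. The paper does not tie HSAC-LLM to a specific LLM or API, so the model name, the action-code table, and the prompt wording are all assumptions for illustration.

from openai import OpenAI

# Illustrative action-code table; the real mapping is defined by HSAC-LLM.
ACTION_CODES = {0: "stop", 1: "shift left", 2: "shift right", 3: "keep course"}

SYSTEM_PROMPT = (
    "You translate between a robot's discrete avoidance actions "
    f"{ACTION_CODES} and short natural-language voice messages exchanged "
    "with pedestrians. Answer with only the requested output."
)

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def speech_to_action(pedestrian_utterance: str) -> int:
    """Passive response: map a pedestrian's message to an action code."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Pedestrian says: '{pedestrian_utterance}'. "
                                        "Reply with the single best action code."},
        ],
    )
    # In practice the reply should be validated before casting to int.
    return int(reply.choices[0].message.content.strip())

def action_to_speech(action_code: int) -> str:
    """Active request: turn the robot's intended action into a voice message."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"The robot will '{ACTION_CODES[action_code]}'. "
                                        "Phrase this as a short, polite spoken request."},
        ],
    )
    return reply.choices[0].message.content.strip()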

ROS simulation with TurtleBot

Crosswalk Environment


Proactive request made by the robot
Robot: "Could you move to your right please?"
Pedestrian: "Sure, I will do that!"

Proactive request made by the pedestrian
Pedestrian: "On your left!"
Robot: "I will shift to my right."

Proactive request made by the pedestrian
Pedestrian: "Moving towards your right side!"
Robot: "I will shift to the left then."

Proactive request made by the pedestrian
Pedestrian: "On your right!"
Robot: "I will move to my left."

Crosswalk Environment


Proactive request made by the robot
Robot: "Could you please stop so I can pass?"
Pedestrian: "Sure, go ahead!"

Proactive request made by the pedestrian
Pedestrian: "Could you let me pass first?"
Robot: "Of course, I will stop."

Proactive request made by the pedestrian
Pedestrian: "I am already late for work!"
Robot: "OK, I will stop."

Proactive request made by the pedestrian
Pedestrian: "I am in a hurry!"
Robot: "OK, I will stop to let you pass."

Real-world test with TurtleBot3

BibTeX

@article{hsacllm2024,
  author    = {Congcong Wen and Yifan Liu and Wenyu Han and Geeta Chandra Raju Bethala and Zheng Peng and Yu-Shen Liu and Yi Fang},
  title     = {Exploring Socially Aware Robot Navigation via LLM-Based Human Interaction},
  journal   = {},
  year      = {2023},
}