Monte Carlo Tree Search

A hybrid MCTS-DRL pathfinding algorithm designed for safe, predictive human-robot interaction and tracking.

Course
EEE598, Fall 2024
Team
Team 14
Institution
Arizona State University

Solving Complex Decision-Making Problems Through Intelligent Exploration

🤖

Human-Robot Interaction

Enabling robots to follow human targets from ahead while maintaining safe distances and avoiding obstacles in dynamic environments.

🎯

Predictive Navigation

Anticipating human movements and navigating around obstacles through stochastic sampling to forecast optimal strategies based on expected future rewards.

🧠

DRL Integration

Combining MCTS with Deep Reinforcement Learning to generate reliable navigational goals while tracking human targets in uncertain environments.

⚡

Real-Time Adaptation

Making high-level decisions and efficiently exploring decision spaces by focusing on promising paths while avoiding collisions and occlusions.

The Four Stages of Monte Carlo Tree Search

1

Selection

Beginning at the root node, the algorithm traverses the tree using the Upper Confidence Bound applied to Trees (UCT) formula. This guides the search toward promising branches, balancing exploration of new paths with exploitation of known high-reward routes.

●──────┐
│      │
●      ● ← UCT
│
▼
2

Expansion

New child nodes are added to represent potential future states and unexplored actions. This expands the search tree by simulating possible outcomes from the current decision point.

    ●
   /│\
  ● ● ● ← New
 / \
●   ●
3

Simulation

Each node undergoes a play-out or rollout to estimate future rewards. Models like SL-MCTS utilize neural networks to improve predictions and guide simulations toward more realistic outcomes.

●
│
↓ Rollout
↓
↓
[R = +1]
4

Backpropagation

Rewards from simulations update node statistics along the selected path. This enhances future path choices by favoring high-reward routes and continuously improving the decision tree.

● ←──────┐
│        │
●        │ Update
│        │
●────────┘
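The four stages above can be sketched as a short Python loop. Everything here is an illustrative stand-in, not the paper's implementation: the `Node` class, the state interface (`legal_actions`, `step`, `is_terminal`, `reward`), and the `CountState` toy domain are assumptions made for a self-contained example.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.untried = list(state.legal_actions())  # actions not yet expanded
        self.visits = 0
        self.value = 0.0

    def best_child(self, c=math.sqrt(2)):
        # Stage 1 (Selection): pick the child maximizing the UCT score.
        return max(self.children,
                   key=lambda ch: ch.value / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def mcts_iteration(root):
    node = root
    # 1. Selection: descend through fully expanded nodes via UCT.
    while not node.untried and node.children:
        node = node.best_child()
    # 2. Expansion: add one child for an untried action.
    if node.untried:
        child = Node(node.state.step(node.untried.pop()), parent=node)
        node.children.append(child)
        node = child
    # 3. Simulation: random rollout to a terminal state.
    state = node.state
    while not state.is_terminal():
        state = state.step(random.choice(state.legal_actions()))
    reward = state.reward()
    # 4. Backpropagation: update statistics along the path to the root.
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent

class CountState:
    """Toy domain: subtract 1 or 2 from n; landing exactly on 0 yields reward 1."""
    def __init__(self, n):
        self.n = n
    def legal_actions(self):
        return [1, 2]
    def step(self, a):
        return CountState(self.n - a)
    def is_terminal(self):
        return self.n <= 0
    def reward(self):
        return 1.0 if self.n == 0 else 0.0

root = Node(CountState(3))
for _ in range(200):
    mcts_iteration(root)
```

After a few hundred iterations the root's statistics concentrate on branches that can still land exactly on 0; calling `best_child(c=0)` then returns the pure-exploitation choice, which mirrors the `c = 0` goal selection in the pseudocode later on this page.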

Upper Confidence Bound (UCB)

UCB = w/n_c + c × √(ln n_p / n_c)

w: value of the leaf node after a rollout (expected reward)
n_c: number of times the child node has been visited
n_p: number of times the parent node has been visited
c: exploration parameter (typically √2 ≈ 1.414)

This formula balances exploration and exploitation: the first term favors nodes with high rewards (exploitation), while the second term encourages visiting less-explored nodes (exploration).
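The formula can be transcribed directly into Python. The `ucb` helper and the sample numbers below are illustrative, not from the paper:

```python
import math

def ucb(w, n_c, n_p, c=math.sqrt(2)):
    """UCB = w/n_c + c * sqrt(ln(n_p) / n_c)."""
    if n_c == 0:
        return float("inf")  # unvisited children are always tried first
    return w / n_c + c * math.sqrt(math.log(n_p) / n_c)

# With 20 parent visits, a barely-tried node can outrank a well-explored,
# higher-mean-reward node thanks to the exploration term:
well_explored = ucb(w=9.0, n_c=10, n_p=20)  # mean reward 0.9
barely_tried  = ucb(w=0.5, n_c=1,  n_p=20)  # mean reward 0.5
```

Setting `c = 0` removes the exploration term entirely, which is how the pseudocode below picks its final goal point.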

Real-World MCTS-DRL Implementations

🏭

Multi-Agent Pathfinding

Multiple agents navigating grids without collidingβ€”commonly seen in robotics and automated warehouses where coordination is critical.

🦿

Wearable Exoskeletons

In rehabilitation exoskeletons, MCTS adjusts support according to patient feedback in real-time, optimizing gait assistance and personalizing therapy.

🚗

Robot Path Planning

Guaranteeing robots travel efficiently in changing, unknown environments while dynamically bypassing barriers and obstacles.

🤝

Human-Robot Collaboration

Integrating robots into dynamic interactions with humans and other agents, increasing safety and boosting task completion rates.

Performance Comparison: MCTS-DRL vs Standalone Methods

92%
SL-MCTS Success Rate
1.3s
Computation Time
12
Avg Path Length (steps)
Methodology     | Trajectory Accuracy | Obstacle Avoidance | Occlusion Handling | Mean Reward
DRL Only        | Moderate            | Limited            | Poor               | −18.4
MCTS Only       | Inconsistent        | Moderate           | Moderate           | 3.2 ± 5.9
MCTS-DRL Hybrid | Excellent           | High               | High               | 5.4

MCTS-DRL Algorithm Pseudocode

mcts_drl.py
function MCTS_DRL(robot_pose, human_trajectory, occupancy_map):
    # Initialize parent node with robot state and current human position
    parent_node = (robot_pose, human_trajectory[0])
    for i in range(num_of_expansion):
        while parent_node is not fully_expanded:
            action = next_untried_action(parent_node)  # pick an unexplored action
            child = simulate(parent_node, action)
            if no collision:
                if no occlusion:
                    R = Q(observation, action)       # DRL reward estimation
                    child.state = simulate_state(action)
                else:
                    child.value = -1                 # Penalize occlusion
                parent_node.value += child.value
            else:
                delete(child)                        # Remove collision paths
        parent_node = select_leaf_node(highest_UCB)
    return goal_point(leaf_node, UCB, c=0)           # Best exploitation path
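A runnable toy version of the expansion-and-goal-selection step above. To keep it self-contained, `q_stub`, the 1-D states, and the `collides`/`occluded` predicates are stand-ins for the trained DRL critic and the occupancy-map checks used in the paper:

```python
def q_stub(state, action):
    # Stand-in for the trained DRL estimate Q(observation, action):
    # here it simply prefers candidate goals near the origin.
    return 1.0 / (1.0 + abs(state + action))

def expand_and_pick_goal(robot_state, actions, collides, occluded):
    children = []  # (action, value) pairs for surviving candidates
    for a in actions:
        child_state = robot_state + a  # simulate(parent_node, action)
        if collides(child_state):
            continue                   # delete(child): drop colliding paths
        # Occluded candidates are kept but penalized with value -1.
        value = -1.0 if occluded(child_state) else q_stub(robot_state, a)
        children.append((a, value))
    # goal_point(..., c=0): pure exploitation, highest value wins.
    return max(children, key=lambda t: t[1])[0]

goal = expand_and_pick_goal(
    robot_state=0.0,
    actions=[-2.0, -1.0, 1.0, 2.0],
    collides=lambda s: s < -1.5,  # e.g. a wall on the left
    occluded=lambda s: s > 0.5,   # e.g. line of sight blocked on the right
)
```

Here the leftmost candidate is discarded for collision, the two rightmost are penalized for occlusion, and the remaining candidate is returned as the navigational goal.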

Research Paper: MCTS-DRL for Robotic Navigation

00

Abstract

Monte Carlo Tree Search (MCTS) is a heuristic search algorithm renowned for resolving complex decision-making problems through iterative randomized exploration, prominently utilized in game-playing AI and sequential decision-making tasks.

In this paper, we delve into the fundamental structure of MCTS and its applications in robotics and wearable exoskeletons, focusing particularly on robotic follow-ahead scenarios that require obstacle and occlusion avoidance. By integrating MCTS with Deep Reinforcement Learning (DRL), we propose a novel methodology enabling robots to make high-level decisions and generate reliable navigational goals while tracking a human target in uncertain environments.

We analyze the balance between exploration and exploitation within MCTS, its predictive capabilities, and how these features amplify adaptive decision-making and support efficient pathfinding. Case studies and implementation examples, including Tesla's Optimus robot, are presented to illustrate MCTS's effectiveness in real-world applications.

Index Terms: MCTS-DRL, Exoskeleton, Tesla Optimus Robot, SL-MCTS, MCTS

01

Introduction

Human-robot interaction is a rapidly advancing field with applications ranging from autonomous vehicles to assistive robotics. A particularly challenging task within this domain is enabling a robot to follow a human target from ahead, maintaining a safe distance while avoiding obstacles and occlusions.

Traditional methods often struggle with the complexities of predicting human intentions and navigating dynamic environments characterized by uncertainty. Monte Carlo Tree Search (MCTS) is a highly regarded algorithm known for its effectiveness in decision-making under uncertainty.

Initially applied in artificial intelligence for playing board games, MCTS has evolved into a versatile tool extensively utilized in robotics and the optimization of multi-agent systems. Its mechanism relies on stochastic sampling to forecast optimal strategies based on expected future rewards.

The MCTS Process

The MCTS process comprises four fundamental stages: Selection, Expansion, Simulation, and Backpropagation, each contributing to the growth and adaptability of the search tree over time.

In this paper, we explore how integrating MCTS with Deep Reinforcement Learning (DRL) offers a promising solution to the challenges of robotic follow-ahead applications.

02

Methodology

2.1 The Challenge: Obstacle and Occlusion Avoidance

In robotic follow-ahead applications, a robot must navigate in front of a human, maintaining a consistent distance and orientation. This task is complex due to:

  • Predicting Human Intentions: The robot must anticipate the human's future movements
  • Dynamic Environments: Obstacles and potential occlusions can obstruct the robot's path or line of sight
  • Safety Requirements: Avoiding collisions is critical for both human and robot safety

2.2 How MCTS Enhances Robotic Navigation

  • Make High-Level Decisions: Generate short-term navigational goals
  • Efficiently Explore Decision Space: Focus on promising paths
  • Avoid Obstacles and Occlusions: Incorporate environmental data

2.3 Role of Deep Reinforcement Learning

DRL provides a trained policy that estimates the expected rewards for actions, aiding MCTS in evaluating nodes during tree expansion. This integration improves the consistency and reliability of the navigational goals generated.

03

Experiments & Results

3.1 Performance Comparison

We compared the MCTS-DRL method against standard MCTS and DRL algorithms in a simulated environment with circular and S-shaped human movement patterns.

Human Trajectory | DRL    | MCTS         | MCTS-DRL
Circle           | −17.95 | 2.87 ± 5.96  | 4.53
S-shaped         | −21.84 | −3.83 ± 4.33 | −1.61

3.3 Obstacle and Occlusion Avoidance

━━━
Straight Path

Robot maintained its position in front; when obstacles were present, it adjusted its path to avoid occlusion.

╭━╯
U-Shaped Path

Robot adjusted its path at ~12s to avoid occlusion rather than navigating around the obstacle.

∿∿∿
S-Shaped Path

Robot altered course at ~17s to avoid occlusion while maintaining follow-ahead behavior.

┗━━
L-Shaped Corridor

Robot adjusted trajectory at corner, turning right to avoid collisions.

3.4 SL-MCTS vs Traditional MCTS

Metric              | Traditional MCTS | SL-MCTS
Success Rate        | 78%              | 92%
Average Path Length | 15 steps         | 12 steps
Computation Time    | 2.4s             | 1.3s
04

Conclusion

This study presents a groundbreaking approach for robotic follow-ahead applications, focusing on avoiding collisions and occlusions caused by obstacles in the environment.

✓ The proposed MCTS-DRL approach outperforms standalone MCTS and DRL algorithms
✓ Effectively follows a target person from the front while maintaining a safe distance
✓ Works reliably regardless of whether obstacles are present
✓ Demonstrates potential to improve autonomous robotic navigation
05

References

[1]
"An MCTS-DRL Based Obstacle and Occlusion Avoidance Methodology in Robotic Follow-Ahead Applications"
Sahar Leisiazar, Edward J. Park, Angelica Lim and Mo Chen, 2023
Primary source for MCTS-DRL methodology
[2]
"A Self-Learning Monte Carlo Tree Search Algorithm for Robot Path Planning"
Wei Li, Yi Liu, Yan Ma, Kang Xu, Jiang Qiu, Zhongxue Gan. Frontiers in Neurorobotics, 2023
Traditional MCTS flow
[3]
"Robust walking control of a lower limb rehabilitation exoskeleton coupled with a musculoskeletal model via deep reinforcement learning"
DRL Algorithm implementation

Access the Full Research Paper

Team14_Lathi_Sinha_Chatterjee_MonteCarlo.pdf

Final Project Report • EEE598 Fall 2024 • 6 pages

Complete research paper including methodology, experimental results, performance comparisons, and future directions for MCTS-DRL in robotic navigation.

I. Introduction to MCTS & Human-Robot Interaction
II. Challenge: Obstacle & Occlusion Avoidance
III. MCTS-DRL Integration Methodology
IV. Experimental Results & Analysis
V. Future Directions & Improvements
VI. Conclusion & References
sleepingbomb/MONTE_CARLO_TREE_SEARCH
View full repository with source code and documentation →

Advancing MCTS-DRL Research

01

Enhanced Human Intention Prediction

Incorporating advanced models like transformers to improve trajectory prediction accuracy and anticipate human behavior more effectively.

02

Multi-Agent Scalability

Expanding the system for multi-agent environments, enabling collaborative navigation and coordinated decision-making among multiple robots.

03

Energy Optimization

Enhancing the algorithm's energy efficiency by optimizing computational resource allocation for deployment on edge devices.

04

Cross-Domain Versatility

Applying the hybrid framework to other domains including autonomous vehicles, drones, and industrial automation systems.

Team 14 — Arizona State University


Sakshi Lathi

slathi@asu.edu

Abhijit Sinha

asinh117@asu.edu

Anusha Chatterjee

achatt53@asu.edu