I am a roboticist / research scientist at The AI Institute.
My research interests are in robot decision-making under uncertainty
and human-robot interaction. I’ve worked on a few topics
in robotics, including learning
for probabilistic semantic mapping (TopoNets,
GraphSPN),
navigation (ROS Navigation Tuning Guide), POMDP planning (pomdp-py),
spatial language understanding (SLOOP) and object search (GenMOS, COS-POMDP,
3D-MOS).
My PhD thesis, Generalized Object Search, was about
enabling most robots with a movable viewport to
search for objects in 3D environments (see this video).
So far, I've mostly worked with mobile robots,
and I'm now gearing up toward (mobile) manipulators, where major challenges remain.
I finished my PhD in computer science at Brown University
as a member of the H2R Lab, advised by Stefanie
Tellex. I received my M.S. and B.S. in computer
science from the University of Washington with a minor in math,
advised by Andrzej Pronobis
and Rajesh P. N. Rao.
[CV] (郑开宇)
Publications
2023
Generalized Object Search
Kaiyu Zheng
PhD Thesis, Brown University. February, 2023.
[pdf (7.9MB)] [bibtex]
Future collaborative robots must be capable of
finding objects. As such a fundamental skill, we
expect object search to eventually become an
off-the-shelf capability for any robot, similar
to e.g., object detection, SLAM, and motion
planning. However, existing approaches either
make unrealistic compromises (e.g., reduce the
problem from 3D to 2D), resort to ad-hoc, greedy
search strategies, or attempt to learn
end-to-end policies in simulation that are yet
to generalize across real robots and
environments. This thesis argues that through
using Partially Observable Markov Decision
Processes (POMDPs) to model object search while
exploiting structures in the human world
(e.g., octrees, correlations) and in human-robot
interaction (e.g., spatial language), a practical
and effective system for generalized object
search can be achieved. In support of this
argument, I develop methods and systems for
(multi-)object search in 3D environments under
uncertainty due to limited field of
view, occlusion, noisy, unreliable detectors,
spatial correlations between objects, and
possibly ambiguous spatial language (e.g., "The
red car is behind Chase Bank"). Besides
evaluation in simulators such as PyGame, AirSim,
and AI2-THOR, I design and implement a
robot-independent, environment-agnostic system
for generalized object search in 3D and deploy
it on the Boston Dynamics Spot robot, the Kinova
MOVO robot, and the Universal Robots UR5e
robotic arm, to perform object search in
different environments. The system enables, for
example, a Spot robot to find a toy cat hidden
underneath a couch in a kitchen area in under
one minute. This thesis also broadly surveys the
object search literature, proposing taxonomies of
object search problem settings, methods, and
systems.
A System for Generalized 3D Multi-Object Search
Kaiyu Zheng, Anirudha Paul, Stefanie Tellex
International Conference on Robotics and Automation (ICRA), 2023
[pdf] [bibtex] [code] [poster]
Searching for objects is a fundamental skill for
robots. As such, we expect object search to
eventually become an off-the-shelf capability for
robots, similar to e.g., object detection and
SLAM. In contrast, however, no system for 3D object
search exists that generalizes across real robots
and environments. In this paper, building upon a
recent theoretical framework that exploited the
octree structure for representing belief in 3D, we
present GenMOS (Generalized Multi-Object Search),
the first general-purpose system for multi-object
search (MOS) in a 3D region that is
robot-independent and environment-agnostic. GenMOS
takes as input point cloud observations of the local
region, object detection results, and localization
of the robot's view pose, and outputs a 6D viewpoint
to move to through online planning. In particular,
GenMOS uses point cloud observations in three ways:
(1) to simulate occlusion; (2) to inform occupancy
and initialize octree belief; and (3) to sample a
belief-dependent graph of view positions that avoids
obstacles. We evaluate our system both in simulation
and on two real robot platforms. Our system enables,
for example, a Boston Dynamics Spot robot to find a
toy cat hidden underneath a couch in under one
minute. We further integrate 3D local search with 2D
global search to handle larger areas, demonstrating
the resulting system in a 25m² lobby area.
2022
Towards Optimal Correlational Object Search
Kaiyu Zheng, Rohan Chitnis, Yoonchang Sung, George Konidaris, Stefanie Tellex
International Conference on Robotics and Automation (ICRA), 2022
[pdf] [bibtex] [poster] [code]
In realistic applications of object search, robots will
need to locate target objects in complex environments while
coping with unreliable sensors, especially for small or
hard-to-detect objects. In such settings, correlational
information can be valuable for planning efficiently.
Previous approaches that consider correlational information
typically resort to ad-hoc, greedy search strategies. We
introduce the Correlational Object Search POMDP
(COS-POMDP), which models correlations while preserving
optimal solutions with a reduced state space. We propose a
hierarchical planning algorithm to scale up COS-POMDPs for
practical domains. Our evaluation, conducted with the
AI2-THOR household simulator and the YOLOv5 object
detector, shows that our method finds objects more
successfully and efficiently compared to baselines,
particularly for hard-to-detect objects such as scrub brushes
and remote controls.
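As a toy illustration of why correlational information helps (the numbers and the correlation function below are hypothetical, not the COS-POMDP model itself): detecting an easy-to-see correlated object, such as a couch, can sharpen the belief over a hard-to-detect target via a simple Bayesian update.

```python
# Toy Bayesian update on a 1-D grid (hypothetical numbers, not COS-POMDP):
# a detection of a reliably detectable correlated object (a couch) sharpens
# the belief over a hard-to-detect target (a remote control).

def normalize(belief):
    total = sum(belief.values())
    return {cell: p / total for cell, p in belief.items()}

cells = range(5)
prior = {c: 1 / 5 for c in cells}  # uniform prior over the target's cell

def couch_given_target(couch_cell, target_cell):
    # Assumed correlation model P(couch at j | target at i): the remote
    # tends to be at or next to the couch.
    gap = abs(couch_cell - target_cell)
    return 0.6 if gap == 0 else (0.2 if gap == 1 else 0.05)

couch_detected_at = 2  # a reliable detector just spotted the couch
posterior = normalize({
    i: prior[i] * couch_given_target(couch_detected_at, i) for i in cells
})

assert posterior[2] > posterior[0]  # belief mass shifts toward the couch
```

Replacing the hand-picked correlation function with statistics learned from data, while preserving optimality, is where the actual COS-POMDP formulation goes beyond this toy update.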
Hierarchical Reinforcement Learning of Locomotion Policies in Response to Approaching Objects: A Preliminary Study
Shangqun Yu, Sreehari Rammohan,
Kaiyu Zheng, George Konidaris
Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2022
[pdf] [bibtex]
Animals such as rabbits and birds can instantly
generate locomotion behavior in reaction to a
dynamic, approaching object, such as a person or
a rock, despite having possibly never seen the
object before and having limited perception of
the object's properties. Recently, deep
reinforcement learning has enabled complex
kinematic systems such as humanoid robots to
successfully move from point A to point
B. Inspired by the observation of the innate
reactive behavior of animals in nature, we hope
to extend this progress in robot locomotion to
settings where external, dynamic objects are
involved whose properties are partially
observable to the robot. As a first step toward
this goal, we build a simulation environment in
MuJoCo where a legged robot must avoid getting
hit by a ball moving toward it. We explore
whether prior locomotion experiences that
animals typically possess benefit the learning
of a reactive control policy under a proposed
hierarchical reinforcement learning
framework. Preliminary results support the claim
that the learning becomes more efficient using
this hierarchical reinforcement learning method,
even when partial observability (radius-based
object visibility) is taken into account.
2021
Dialogue Object Search
Monica Roy*,
Kaiyu Zheng*, Jason Liu, Stefanie Tellex
Robotics: Science and Systems (RSS) Workshop on Robotics for People (R4P): Perspectives on Interaction, Learning and Safety, 2021
[pdf] [poster] [bibtex] [workshop]
Parallelizing POMCP to Solve Complex POMDPs
Semanti Basu, Sreshtaa Rajesh,
Kaiyu Zheng, Stefanie Tellex, R. Iris Bahar
Robotics: Science and Systems (RSS) Workshop on Software Tools for Real-Time Optimal Control, 2021.
[pdf] [poster] [bibtex] [workshop]
Spatial Language Understanding for Object Search
in Partially Observed Cityscale Environments
Kaiyu Zheng, Deniz Bayazit, Rebecca Mathew, Ellie Pavlick, Stefanie Tellex
International Conference on Robot and Human Interactive Communication (RO-MAN), 2021
[pdf] [bibtex] [website] [code]
Humans use spatial language to naturally describe object
locations and their relations. Interpreting spatial language
not only adds a perceptual modality for robots, but also
reduces the barrier of interfacing with humans. Previous
work primarily considers spatial language as goal
specification for instruction following tasks in fully
observable domains, often paired with reference paths for
reward-based learning. However, spatial language is
inherently subjective and potentially ambiguous or
misleading. Hence, in this paper, we consider spatial
language as a form of stochastic observation. We propose
SLOOP (Spatial Language Object-Oriented POMDP), a new
framework for partially observable decision making with a
probabilistic observation model for spatial language. We
apply SLOOP to object search in city-scale environments. To
interpret ambiguous, context-dependent prepositions
(e.g. front), we design a simple convolutional neural
network that predicts the language provider’s latent frame
of reference (FoR) given the environment context. Search
strategies are computed via an online POMDP planner based on
Monte Carlo Tree Search. Evaluation based on crowdsourced
language data, collected over areas of five cities in
OpenStreetMap, shows that our approach achieves faster
search and higher success rate compared to baselines, with a
wider margin as the spatial language becomes more
complex. Finally, we demonstrate the proposed method in
AirSim, a realistic simulator where a drone is tasked to
find cars in a neighborhood environment.
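To illustrate why the latent frame of reference matters (a hand-crafted scoring function with a made-up shape, not SLOOP's learned observation model): the same word "front" selects different map cells depending on the FoR angle.

```python
import math

# Hand-crafted illustration (hypothetical functional form, not SLOOP's
# learned model): score how well a map cell matches "front of <landmark>"
# under a given frame-of-reference (FoR) angle.

def front_score(cell, landmark, for_angle):
    # Angular distance between the FoR's forward direction and the
    # landmark-to-cell direction; well-aligned cells score near 1.
    dx, dy = cell[0] - landmark[0], cell[1] - landmark[1]
    theta = math.atan2(dy, dx)
    diff = abs((theta - for_angle + math.pi) % (2 * math.pi) - math.pi)
    return math.exp(-diff ** 2)

landmark = (5, 5)
# With forward = +x, a cell east of the landmark is "in front"...
assert front_score((8, 5), landmark, for_angle=0.0) > 0.9
# ...but under a rotated FoR (forward = +y), the same cell no longer is.
assert front_score((8, 5), landmark, for_angle=math.pi / 2) < 0.1
```

Predicting the FoR angle from the environment context, rather than fixing it by hand, is the role of the convolutional network described above.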
Multi-Resolution POMDP Planning for Multi-Object Search in 3D
Kaiyu Zheng, Yoonchang Sung, George Konidaris, Stefanie Tellex
International Conference on Intelligent Robots and Systems (IROS), 2021
IROS RoboCup Best Paper Award
[pdf] [bibtex] [website] [code]
Robots operating in households must find objects on
shelves, under tables, and in cupboards. In such
environments, it is crucial to search efficiently at
3D scale while coping with limited field of view and
the complexity of searching for multiple
objects. Principled approaches to object search
frequently use Partially Observable Markov Decision
Process (POMDP) as the underlying framework for
computing search strategies, but constrain the
search space in 2D. In this paper, we present a
POMDP formulation for multi-object search in a 3D
region with a frustum-shaped field-of-view. To
efficiently solve this POMDP, we propose a
multi-resolution planning algorithm based on online
Monte-Carlo tree search. In this approach, we design
a novel octree-based belief representation to
capture uncertainty of the target objects at
different resolution levels, then derive abstract
POMDPs at lower resolutions with dramatically
smaller state and observation spaces. Evaluation in
a simulated 3D domain shows that our approach finds
objects more efficiently and successfully compared
to a set of baselines without resolution hierarchy
in larger instances under the same computational
requirement. We demonstrate our approach on a mobile
robot to find objects placed at different heights in
two 10m² × 2m regions by moving its base and actuating its torso.
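The idea of an octree belief can be sketched in a few lines (a simplified illustration with a uniform prior, not the paper's implementation): the finest level stores per-voxel probabilities, and a coarser node's probability is the sum over the voxels it covers, so a planner can reason over far fewer states per level.

```python
from itertools import product

# Simplified octree-style belief sketch (hypothetical, uniform prior):
# per-voxel probabilities at the finest level; coarser blocks aggregate
# their children by summation.

class OctreeBelief:
    def __init__(self, size, prior):
        # size: side length of the cubic region in voxels (a power of 2).
        self.size = size
        self.vals = {v: prior for v in product(range(size), repeat=3)}

    def prob(self, x, y, z, res=1):
        # Probability that the target lies in the res^3 block whose
        # finest-level corner voxel is (x*res, y*res, z*res).
        return sum(self.vals[(x * res + dx, y * res + dy, z * res + dz)]
                   for dx, dy, dz in product(range(res), repeat=3))

belief = OctreeBelief(4, prior=1 / 64)  # uniform belief over a 4^3 region
assert abs(belief.prob(0, 0, 0, res=4) - 1.0) < 1e-9    # root block sums to 1
assert abs(belief.prob(1, 1, 1, res=2) - 8 / 64) < 1e-9  # coarse node = 8 voxels
```

Querying at a coarse resolution yields exactly the probability of the corresponding abstract state, which is what makes the lower-resolution abstract POMDPs consistent with the full belief.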
2020
pomdp_py: A Framework to Build and Solve POMDPs
Kaiyu Zheng, Stefanie Tellex
International Conference on Automated Planning and
Scheduling (ICAPS) Workshop on Planning and
Robotics (PlanRob), 2020
[pdf] [bibtex] [docs] [code] [workshop]
In this paper, we present pomdp_py, a
general-purpose Partially Observable Markov
Decision Process (POMDP) library written in
Python and Cython. Existing POMDP libraries
often hinder accessibility and efficient
prototyping due to the underlying programming
language or interfaces, and require extra
complexity in the software toolchain to integrate
with robotics systems. pomdp_py features simple
and comprehensive interfaces capable of
describing large discrete or continuous (PO)MDP
problems. Here, we summarize the design
principles and describe in detail the
programming model and interfaces in pomdp_py. We
also describe intuitive integration of this
library with ROS (Robot Operating System), which
enabled our torso-actuated robot to perform
object search in 3D. Finally, we note directions
to improve and extend this library for POMDP
planning and beyond.
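The kind of model such a library must express can be illustrated with the classic tiger problem (a plain-Python sketch of the discrete belief update; it does not use pomdp_py's actual interfaces):

```python
# Plain-Python sketch of the discrete POMDP belief update that libraries
# like pomdp_py implement (illustrative only, not pomdp_py's API).
# Tiger problem: the tiger is behind the left or right door, and a noisy
# "listen" action hears its location correctly 85% of the time.

def belief_update(belief, obs, obs_model):
    # b'(s) ∝ P(o | s) * b(s), for a state-preserving action like "listen".
    unnorm = {s: obs_model[s][obs] * p for s, p in belief.items()}
    total = sum(unnorm.values())
    return {s: p / total for s, p in unnorm.items()}

obs_model = {  # P(observation | tiger location)
    "tiger-left": {"hear-left": 0.85, "hear-right": 0.15},
    "tiger-right": {"hear-left": 0.15, "hear-right": 0.85},
}

belief = {"tiger-left": 0.5, "tiger-right": 0.5}
belief = belief_update(belief, "hear-left", obs_model)
assert round(belief["tiger-left"], 2) == 0.85
```

A POMDP library generalizes this pattern: user-defined transition, observation, and reward models plug into generic planners (e.g., Monte Carlo tree search) instead of a hand-written update loop.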
2019
From Pixels to Buildings: End-to-end Probabilistic Deep Networks for Large-scale Semantic Mapping
Kaiyu Zheng, Andrzej Pronobis
International Conference on Intelligent Robots and Systems (IROS), 2019
[pdf] [bibtex] [slides] [project] [code]
We introduce TopoNets, end-to-end probabilistic deep networks for modeling
semantic maps with structure reflecting the topology of large-scale
environments. TopoNets build unified deep networks spanning multiple levels of
abstraction and spatial scales, from pixels representing geometry of local
places to high-level descriptions representing semantics of buildings. To this
end, TopoNets leverage complex spatial relations expressed in terms of
arbitrary, dynamic graphs. We demonstrate how TopoNets can be used to perform
end-to-end semantic mapping from partial sensory observations and noisy
topological relations discovered by a robot exploring large-scale office
spaces. We further illustrate the benefits of the probabilistic representation
by generating semantic descriptions augmented with valuable uncertainty
information and utilizing likelihoods of complete semantic maps to detect
novel and incongruent environment configurations.
2018
Learning Graph-Structured Sum-Product Networks for Probabilistic
Semantic Maps
Kaiyu Zheng, Andrzej Pronobis, Rajesh P. N. Rao
AAAI Conference on Artificial Intelligence (AAAI), 2018. (oral presentation)
[pdf] [bibtex] [slides] [project] [code] [dataset]
We introduce Graph-Structured Sum-Product Networks
(GraphSPNs), a probabilistic approach to structured prediction
for problems where dependencies between latent variables
are expressed in terms of arbitrary, dynamic graphs.
While many approaches to structured prediction place strict
constraints on the interactions between inferred variables,
many real-world problems can be only characterized using
complex graph structures of varying size, often contaminated
with noise when obtained from real data. Here, we focus on
one such problem in the domain of robotics. We demonstrate
how GraphSPNs can be used to bolster inference about semantic,
conceptual place descriptions using noisy topological
relations discovered by a robot exploring large-scale office
spaces. Through experiments, we show that GraphSPNs consistently
outperform the traditional approach based on undirected
graphical models, successfully disambiguating information
in global semantic maps built from uncertain, noisy
local evidence. We further exploit the probabilistic nature of
the model to infer marginal distributions over semantic descriptions
of as yet unexplored places and detect spatial environment
configurations that are novel and incongruent with
the known evidence.
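To give a flavor of the sum-product machinery (a deliberately tiny, hypothetical network, far simpler than a GraphSPN): sum nodes form weighted mixtures, product nodes combine factors over disjoint variables, and a single bottom-up pass evaluates the joint probability of an assignment.

```python
import math

# Tiny sum-product network sketch (hypothetical structure and weights,
# not a GraphSPN): leaves are indicators, product nodes require
# decomposability, sum nodes require completeness.

def leaf(var, val):
    # Indicator leaf: 1 if the variable takes this value, else 0.
    return lambda a: 1.0 if a[var] == val else 0.0

def product(*children):
    # Product node: children cover disjoint sets of variables.
    return lambda a: math.prod(c(a) for c in children)

def weighted_sum(pairs):
    # Sum node: a mixture over children of the same variables.
    return lambda a: sum(w * c(a) for w, c in pairs)

# P(X, Y) as a mixture of two independent components.
spn = weighted_sum([
    (0.7, product(leaf("X", "corridor"), leaf("Y", "office"))),
    (0.3, product(leaf("X", "office"), leaf("Y", "corridor"))),
])

assert spn({"X": "corridor", "Y": "office"}) == 0.7
assert spn({"X": "office", "Y": "office"}) == 0.0
```

Because evaluation is a single bottom-up pass, marginals and most-probable-explanation queries stay tractable, which is the property GraphSPNs exploit at the scale of whole topological maps.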
2017
Learning Semantic Maps with Topological Spatial Relations using Graph-Structured Sum-Product Networks
Kaiyu Zheng, Andrzej Pronobis, Rajesh P. N. Rao
International Conference on Intelligent Robots
and Systems (IROS) Workshop on Machine Learning
Methods for High-Level Cognitive Capabilities in
Robotics (ML-HLCR), 2017.
[pdf] [workshop] [poster]
Learning Large-Scale Topological Maps Using Sum-Product Networks
Kaiyu Zheng
Senior Thesis, University of Washington, 2017
[pdf] [bibtex]
In order to perform complex actions in human environments, an autonomous robot needs the ability
to understand the environment, that is, to gather and maintain spatial knowledge. Topological maps
are commonly used for representing large-scale, global maps such as floor plans. Although much work
has been done on topological map extraction, we have found little previous work on the problem
of learning the topological map using a probabilistic model. Learning a topological map means
learning the structure of the large-scale space and the dependency between places, for example, how
the evidence of a group of places influences the attributes of other places. This is an important step
towards planning complex actions in the environment. In this thesis, we consider the problem of
using a probabilistic deep learning model to learn the topological map, which is essentially a sparse
undirected graph where nodes represent places annotated with their semantic attributes (e.g., place
category). We propose to use a novel probabilistic deep model, Sum-Product Networks (SPNs) [20],
due to their unique properties. We present two methods for learning topological maps using SPNs:
the place grid method and the template-based method. We contribute an algorithm that builds SPNs
for graphs using template models. Our experiments evaluate the ability of our models to enable
robots to infer semantic attributes and detect maps with novel semantic attribute arrangements.
Our results demonstrate their understanding of the topological map structure and spatial relations
between places.
ROS Navigation Tuning Guide
Kaiyu Zheng
Chapter in Robot Operating System (ROS) - The Complete Reference (Volume 6), 2021
(Publication of the Week, Weekly Robotics)
On arXiv since 2017.
[arXiv] [pdf] [book] [bibtex] [ROS site]
The ROS navigation stack is powerful for enabling mobile robots to move from place to place reliably. The job
of the navigation stack is to produce a safe path for the robot to execute, by processing data from odometry,
sensors, and the environment map. Maximizing the performance of this navigation stack requires fine-tuning
its parameters, and this is not as simple as it looks. Someone unfamiliar with the underlying concepts and
reasoning may try values at random and waste a lot of time.
This article guides the reader through the process of fine-tuning navigation parameters. It serves as
a reference for the "how" and "why" when setting the values of key parameters.
This guide assumes that the reader has already set up the navigation stack and is ready to optimize it.
It is also a summary of my work with the ROS navigation stack.
Teaching
CSCI 2951-F: Learning and Sequential Decision Making
Brown University, Providence, RI
Head Teaching Assistant. Instructor: Michael L. Littman
(Fall 2021)
[website]
Graduate-level course on automated decision
making from a computer science
perspective. Topics include Markov decision
processes, stochastic and repeated games,
partially observable Markov decision processes,
and reinforcement learning.
As an HTA, I contributed to course development by designing project requirements, presenting project ideas, creating homework problems and
solutions, holding office hours, and
coordinating teaching assistants.
CSE 446: Machine Learning
University of Washington, Seattle, WA
Teaching Assistant. Instructor: Sham M. Kakade
(Winter 2018)
Teaching Assistant. Instructor: Emily B. Fox
(Winter 2017)
[website (2017)] [website (2018)]
Undergraduate-level machine learning course. Topics include supervised
learning and predictive modeling: decision trees, rule induction,
nearest neighbors, Bayesian methods, neural networks, support vector
machines, and model ensembles. Unsupervised learning and clustering.
TA responsibilities: Lead sections (∼ 30 students), grading, office hours
CSE 311: Foundations of Computing
University of Washington, Seattle, WA
Teaching Assistant. Instructors: Paul Beame, Kevin Zatloukal
(Spring 2018)
Teaching Assistant. Instructors: Paul Beame, Shayan Oveis Gharan
(Fall 2016)
[website (2016)] [website (2018)]
First CS major course. Examines fundamentals of logic, set theory,
induction, and algebraic structures with applications to computing; finite
state machines; and limits of computability.
TA responsibilities: Lead sections (∼ 30 students), grading, office hours
CSE 373: Data Structures and Algorithms
University of Washington, Seattle, WA
Teaching Assistant. Instructor: Evan McCarty
(Fall 2017)
[website]
For non-CS majors. Fundamental algorithms and data structures for
implementation. Linked lists, stacks, queues, directed graphs. Trees:
representations, traversals. Searching, hashing, sorting.
TA responsibilities: Lead sections (∼ 30 students), grading, office hours
Teaching Certificates
Sheridan Teaching Certificate I, Brown University (12/2021).
[website]
Develop and refine fundamental teaching and assessment strategies and communication skills based on how students learn.