I am a roboticist / research scientist at The AI Institute.
My research interests are in robot decision-making under uncertainty
and human-robot interaction. I’ve worked on a few topics
in robotics, including learning
for probabilistic semantic mapping (TopoNets,
GraphSPN),
navigation (ROS Navigation Tuning Guide), POMDP planning (pomdp-py),
spatial language understanding (SLOOP) and object search (GenMOS, COS-POMDP,
3D-MOS).
My PhD thesis, Generalized Object Search, was about
enabling most robots with a movable viewport to
search for objects in 3D environments (see this video).
So far, I've mostly worked with mobile robots,
and I'm now gearing up toward (mobile) manipulators, where major challenges remain.
I finished my PhD in computer science at Brown University
as a member of the H2R Lab, advised by Stefanie
Tellex. I received my M.S. and B.S. in computer
science from the University of Washington with a minor in math,
advised by Andrzej Pronobis
and Rajesh P. N. Rao.
[CV] (郑开宇)
Publications
2023
Generalized Object Search
Kaiyu Zheng
PhD Thesis, Brown University. February, 2023.
[pdf (7.9MB)] [bibtex]
Future collaborative robots must be capable of
finding objects. As such a fundamental skill, we
expect object search to eventually become an
off-the-shelf capability for any robot, similar
to e.g., object detection, SLAM, and motion
planning. However, existing approaches either
make unrealistic compromises (e.g., reduce the
problem from 3D to 2D), resort to ad-hoc, greedy
search strategies, or attempt to learn
end-to-end policies in simulation that are yet
to generalize across real robots and
environments. This thesis argues that through
using Partially Observable Markov Decision
Processes (POMDPs) to model object search while
exploiting structures in the human world
(e.g., octrees, correlations) and in human-robot
interaction (e.g., spatial language), a practical
and effective system for generalized object
search can be achieved. In support of this
argument, I develop methods and systems for
(multi-)object search in 3D environments under
uncertainty due to limited field of
view, occlusion, noisy, unreliable detectors,
spatial correlations between objects, and
possibly ambiguous spatial language (e.g., "The
red car is behind Chase Bank"). Besides
evaluation in simulators such as PyGame, AirSim,
and AI2-THOR, I design and implement a
robot-independent, environment-agnostic system
for generalized object search in 3D and deploy
it on the Boston Dynamics Spot robot, the Kinova
MOVO robot, and the Universal Robots UR5e
robotic arm, to perform object search in
different environments. The system enables, for
example, a Spot robot to find a toy cat hidden
underneath a couch in a kitchen area in under
one minute. This thesis also broadly surveys the
object search literature, proposing taxonomies of
object search problem settings, methods, and
systems.
A System for Generalized 3D Multi-Object Search
Kaiyu Zheng, Anirudha Paul, Stefanie Tellex
International Conference on Robotics and Automation (ICRA), 2023
[pdf] [bibtex] [code] [poster]
Searching for objects is a fundamental skill for
robots. As such, we expect object search to
eventually become an off-the-shelf capability for
robots, similar to e.g., object detection and
SLAM. In contrast, however, no system for 3D object
search exists that generalizes across real robots
and environments. In this paper, building upon a
recent theoretical framework that exploited the
octree structure for representing belief in 3D, we
present GenMOS (Generalized Multi-Object Search),
the first general-purpose system for multi-object
search (MOS) in a 3D region that is
robot-independent and environment-agnostic. GenMOS
takes as input point cloud observations of the local
region, object detection results, and localization
of the robot's view pose, and outputs a 6D viewpoint
to move to through online planning. In particular,
GenMOS uses point cloud observations in three ways:
(1) to simulate occlusion; (2) to inform occupancy
and initialize octree belief; and (3) to sample a
belief-dependent graph of view positions that avoids
obstacles. We evaluate our system both in simulation
and on two real robot platforms. Our system enables,
for example, a Boston Dynamics Spot robot to find a
toy cat hidden underneath a couch in under one
minute. We further integrate 3D local search with 2D
global search to handle larger areas, demonstrating
the resulting system in a 25m² lobby area.
2022
Towards Optimal Correlational Object Search
Kaiyu Zheng, Rohan Chitnis, Yoonchang Sung, George Konidaris, Stefanie Tellex
International Conference on Robotics and Automation (ICRA), 2022
[pdf] [bibtex] [poster] [code]
In realistic applications of object search, robots will
need to locate target objects in complex environments while
coping with unreliable sensors, especially for small or
hard-to-detect objects. In such settings, correlational
information can be valuable for planning efficiently.
Previous approaches that consider correlational information
typically resort to ad-hoc, greedy search strategies. We
introduce the Correlational Object Search POMDP
(COS-POMDP), which models correlations while preserving
optimal solutions with a reduced state space. We propose a
hierarchical planning algorithm to scale up COS-POMDPs for
practical domains. Our evaluation, conducted with the
AI2-THOR household simulator and the YOLOv5 object
detector, shows that our method finds objects more
successfully and efficiently compared to baselines,
particularly for hard-to-detect objects such as scrub brushes
and remote controls.
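As a toy illustration of why correlational information helps (the numbers and the correlation function below are hypothetical, not the COS-POMDP model itself): detecting an easy-to-see correlated object, such as a couch, can sharpen the belief over a hard-to-detect target via a simple Bayesian update.

```python
# Toy Bayesian update on a 1-D grid (hypothetical numbers, not COS-POMDP):
# a detection of a reliably detectable correlated object (a couch) sharpens
# the belief over a hard-to-detect target (a remote control).

def normalize(belief):
    total = sum(belief.values())
    return {cell: p / total for cell, p in belief.items()}

cells = range(5)
prior = {c: 1 / 5 for c in cells}  # uniform prior over the target's cell

def couch_given_target(couch_cell, target_cell):
    # Assumed correlation model P(couch at j | target at i): the remote
    # tends to be at or next to the couch.
    gap = abs(couch_cell - target_cell)
    return 0.6 if gap == 0 else (0.2 if gap == 1 else 0.05)

couch_detected_at = 2  # a reliable detector just spotted the couch
posterior = normalize({
    i: prior[i] * couch_given_target(couch_detected_at, i) for i in cells
})

assert posterior[2] > posterior[0]  # belief mass shifts toward the couch
```

Replacing the hand-picked correlation function with statistics learned from data, while preserving optimality, is where the actual COS-POMDP formulation goes beyond this toy update.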
Hierarchical Reinforcement Learning of Locomotion Policies in Response to Approaching Objects: A Preliminary Study
Shangqun Yu, Sreehari Rammohan,
Kaiyu Zheng, George Konidaris
Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2022
[pdf] [bibtex]
Animals such as rabbits and birds can instantly
generate locomotion behavior in reaction to a
dynamic, approaching object, such as a person or
a rock, despite having possibly never seen the
object before and having limited perception of
the object's properties. Recently, deep
reinforcement learning has enabled complex
kinematic systems such as humanoid robots to
successfully move from point A to point
B. Inspired by the observation of the innate
reactive behavior of animals in nature, we hope
to extend this progress in robot locomotion to
settings where external, dynamic objects are
involved whose properties are partially
observable to the robot. As a first step toward
this goal, we build a simulation environment in
MuJoCo where a legged robot must avoid getting
hit by a ball moving toward it. We explore
whether prior locomotion experiences that
animals typically possess benefit the learning
of a reactive control policy under a proposed
hierarchical reinforcement learning
framework. Preliminary results support the claim
that the learning becomes more efficient using
this hierarchical reinforcement learning method,
even when partial observability (radius-based
object visibility) is taken into account.
2021
Dialogue Object Search
Monica Roy*,
Kaiyu Zheng*, Jason Liu, Stefanie Tellex
Robotics: Science and Systems (RSS) Workshop on Robotics for People (R4P): Perspectives on Interaction, Learning and Safety, 2021
[pdf] [poster] [bibtex] [workshop]
Parallelizing POMCP to Solve Complex POMDPs
Semanti Basu, Sreshtaa Rajesh,
Kaiyu Zheng, Stefanie Tellex, R. Iris Bahar
Robotics: Science and Systems (RSS) Workshop on Software Tools for Real-Time Optimal Control, 2021.
[pdf] [poster] [bibtex] [workshop]
Spatial Language Understanding for Object Search
in Partially Observed Cityscale Environments
Kaiyu Zheng, Deniz Bayazit, Rebecca Mathew, Ellie Pavlick, Stefanie Tellex
International Conference on Robot and Human Interactive Communication (RO-MAN), 2021
[pdf] [bibtex] [website] [code]
Humans use spatial language to naturally describe object
locations and their relations. Interpreting spatial language
not only adds a perceptual modality for robots, but also
reduces the barrier of interfacing with humans. Previous
work primarily considers spatial language as goal
specification for instruction following tasks in fully
observable domains, often paired with reference paths for
reward-based learning. However, spatial language is
inherently subjective and potentially ambiguous or
misleading. Hence, in this paper, we consider spatial
language as a form of stochastic observation. We propose
SLOOP (Spatial Language Object-Oriented POMDP), a new
framework for partially observable decision making with a
probabilistic observation model for spatial language. We
apply SLOOP to object search in city-scale environments. To
interpret ambiguous, context-dependent prepositions
(e.g. front), we design a simple convolutional neural
network that predicts the language provider’s latent frame
of reference (FoR) given the environment context. Search
strategies are computed via an online POMDP planner based on
Monte Carlo Tree Search. Evaluation based on crowdsourced
language data, collected over areas of five cities in
OpenStreetMap, shows that our approach achieves faster
search and higher success rate compared to baselines, with a
wider margin as the spatial language becomes more
complex. Finally, we demonstrate the proposed method in
AirSim, a realistic simulator where a drone is tasked to
find cars in a neighborhood environment.
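To illustrate why the latent frame of reference matters (a hand-crafted scoring function with a made-up shape, not SLOOP's learned observation model): the same word "front" selects different map cells depending on the FoR angle.

```python
import math

# Hand-crafted illustration (hypothetical functional form, not SLOOP's
# learned model): score how well a map cell matches "front of <landmark>"
# under a given frame-of-reference (FoR) angle.

def front_score(cell, landmark, for_angle):
    # Angular distance between the FoR's forward direction and the
    # landmark-to-cell direction; well-aligned cells score near 1.
    dx, dy = cell[0] - landmark[0], cell[1] - landmark[1]
    theta = math.atan2(dy, dx)
    diff = abs((theta - for_angle + math.pi) % (2 * math.pi) - math.pi)
    return math.exp(-diff ** 2)

landmark = (5, 5)
# With forward = +x, a cell east of the landmark is "in front"...
assert front_score((8, 5), landmark, for_angle=0.0) > 0.9
# ...but under a rotated FoR (forward = +y), the same cell no longer is.
assert front_score((8, 5), landmark, for_angle=math.pi / 2) < 0.1
```

Predicting the FoR angle from the environment context, rather than fixing it by hand, is the role of the convolutional network described above.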
Multi-Resolution POMDP Planning for Multi-Object Search in 3D
Kaiyu Zheng, Yoonchang Sung, George Konidaris, Stefanie Tellex
International Conference on Intelligent Robots and Systems (IROS), 2021
IROS RoboCup Best Paper Award
[pdf] [bibtex] [website] [code]
Robots operating in households must find objects on
shelves, under tables, and in cupboards. In such
environments, it is crucial to search efficiently at
3D scale while coping with limited field of view and
the complexity of searching for multiple
objects. Principled approaches to object search
frequently use Partially Observable Markov Decision
Process (POMDP) as the underlying framework for
computing search strategies, but constrain the
search space in 2D. In this paper, we present a
POMDP formulation for multi-object search in a 3D
region with a frustum-shaped field-of-view. To
efficiently solve this POMDP, we propose a
multi-resolution planning algorithm based on online
Monte-Carlo tree search. In this approach, we design
a novel octree-based belief representation to
capture uncertainty of the target objects at
different resolution levels, then derive abstract
POMDPs at lower resolutions with dramatically
smaller state and observation spaces. Evaluation in
a simulated 3D domain shows that our approach finds
objects more efficiently and successfully compared
to a set of baselines without resolution hierarchy
in larger instances under the same computational
requirement. We demonstrate our approach on a mobile
robot to find objects placed at different heights in
two 10m² × 2m regions by moving its base and actuating its torso.
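The idea of an octree belief can be sketched in a few lines (a simplified illustration with a uniform prior, not the paper's implementation): the finest level stores per-voxel probabilities, and a coarser node's probability is the sum over the voxels it covers, so a planner can reason over far fewer states per level.

```python
from itertools import product

# Simplified octree-style belief sketch (hypothetical, uniform prior):
# per-voxel probabilities at the finest level; coarser blocks aggregate
# their children by summation.

class OctreeBelief:
    def __init__(self, size, prior):
        # size: side length of the cubic region in voxels (a power of 2).
        self.size = size
        self.vals = {v: prior for v in product(range(size), repeat=3)}

    def prob(self, x, y, z, res=1):
        # Probability that the target lies in the res^3 block whose
        # finest-level corner voxel is (x*res, y*res, z*res).
        return sum(self.vals[(x * res + dx, y * res + dy, z * res + dz)]
                   for dx, dy, dz in product(range(res), repeat=3))

belief = OctreeBelief(4, prior=1 / 64)  # uniform belief over a 4^3 region
assert abs(belief.prob(0, 0, 0, res=4) - 1.0) < 1e-9    # root block sums to 1
assert abs(belief.prob(1, 1, 1, res=2) - 8 / 64) < 1e-9  # coarse node = 8 voxels
```

Querying at a coarse resolution yields exactly the probability of the corresponding abstract state, which is what makes the lower-resolution abstract POMDPs consistent with the full belief.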
2020
pomdp_py: A Framework to Build and Solve POMDPs
Kaiyu Zheng, Stefanie Tellex
International Conference on Automated Planning and
Scheduling (ICAPS) Workshop on Planning and
Robotics (PlanRob), 2020
[pdf] [bibtex] [docs] [code] [workshop]
In this paper, we present pomdp_py, a
general-purpose Partially Observable Markov
Decision Process (POMDP) library written in
Python and Cython. Existing POMDP libraries
often hinder accessibility and efficient
prototyping due to the underlying programming
language or interfaces, and require extra
complexity in the software toolchain to integrate
with robotics systems. pomdp_py features simple
and comprehensive interfaces capable of
describing large discrete or continuous (PO)MDP
problems. Here, we summarize the design
principles and describe in detail the
programming model and interfaces in pomdp_py. We
also describe intuitive integration of this
library with ROS (Robot Operating System), which
enabled our torso-actuated robot to perform
object search in 3D. Finally, we note directions
to improve and extend this library for POMDP
planning and beyond.
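The kind of model such a library must express can be illustrated with the classic tiger problem (a plain-Python sketch of the discrete belief update; it does not use pomdp_py's actual interfaces):

```python
# Plain-Python sketch of the discrete POMDP belief update that libraries
# like pomdp_py implement (illustrative only, not pomdp_py's API).
# Tiger problem: the tiger is behind the left or right door, and a noisy
# "listen" action hears its location correctly 85% of the time.

def belief_update(belief, obs, obs_model):
    # b'(s) ∝ P(o | s) * b(s), for a state-preserving action like "listen".
    unnorm = {s: obs_model[s][obs] * p for s, p in belief.items()}
    total = sum(unnorm.values())
    return {s: p / total for s, p in unnorm.items()}

obs_model = {  # P(observation | tiger location)
    "tiger-left": {"hear-left": 0.85, "hear-right": 0.15},
    "tiger-right": {"hear-left": 0.15, "hear-right": 0.85},
}

belief = {"tiger-left": 0.5, "tiger-right": 0.5}
belief = belief_update(belief, "hear-left", obs_model)
assert round(belief["tiger-left"], 2) == 0.85
```

A POMDP library generalizes this pattern: user-defined transition, observation, and reward models plug into generic planners (e.g., Monte Carlo tree search) instead of a hand-written update loop.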
2019
From Pixels to Buildings: End-to-end Probabilistic Deep Networks for Large-scale Semantic Mapping
Kaiyu Zheng, Andrzej Pronobis
International Conference on Intelligent Robots and Systems (IROS), 2019
[pdf] [bibtex] [slides] [project] [code]
We introduce TopoNets, end-to-end probabilistic deep networks for modeling
semantic maps with structure reflecting the topology of large-scale
environments. TopoNets build unified deep networks spanning multiple levels of
abstraction and spatial scales, from pixels representing geometry of local
places to high-level descriptions representing semantics of buildings. To this
end, TopoNets leverage complex spatial relations expressed in terms of
arbitrary, dynamic graphs. We demonstrate how TopoNets can be used to perform
end-to-end semantic mapping from partial sensory observations and noisy
topological relations discovered by a robot exploring large-scale office
spaces. We further illustrate the benefits of the probabilistic representation
by generating semantic descriptions augmented with valuable uncertainty
information and utilizing likelihoods of complete semantic maps to detect
novel and incongruent environment configurations.
2018
Learning Graph-Structured Sum-Product Networks for Probabilistic
Semantic Maps
Kaiyu Zheng, Andrzej Pronobis, Rajesh P. N. Rao
AAAI Conference on Artificial Intelligence (AAAI), 2018. (oral presentation)
[pdf] [bibtex] [slides] [project] [code] [dataset]
We introduce Graph-Structured Sum-Product Networks
(GraphSPNs), a probabilistic approach to structured prediction
for problems where dependencies between latent variables
are expressed in terms of arbitrary, dynamic graphs.
While many approaches to structured prediction place strict
constraints on the interactions between inferred variables,
many real-world problems can be only characterized using
complex graph structures of varying size, often contaminated
with noise when obtained from real data. Here, we focus on
one such problem in the domain of robotics. We demonstrate
how GraphSPNs can be used to bolster inference about semantic,
conceptual place descriptions using noisy topological
relations discovered by a robot exploring large-scale office
spaces. Through experiments, we show that GraphSPNs consistently
outperform the traditional approach based on undirected
graphical models, successfully disambiguating information
in global semantic maps built from uncertain, noisy
local evidence. We further exploit the probabilistic nature of
the model to infer marginal distributions over semantic descriptions
of as yet unexplored places and detect spatial environment
configurations that are novel and incongruent with
the known evidence.
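To give a flavor of the sum-product machinery (a deliberately tiny, hypothetical network, far simpler than a GraphSPN): sum nodes form weighted mixtures, product nodes combine factors over disjoint variables, and a single bottom-up pass evaluates the joint probability of an assignment.

```python
import math

# Tiny sum-product network sketch (hypothetical structure and weights,
# not a GraphSPN): leaves are indicators, product nodes require
# decomposability, sum nodes require completeness.

def leaf(var, val):
    # Indicator leaf: 1 if the variable takes this value, else 0.
    return lambda a: 1.0 if a[var] == val else 0.0

def product(*children):
    # Product node: children cover disjoint sets of variables.
    return lambda a: math.prod(c(a) for c in children)

def weighted_sum(pairs):
    # Sum node: a mixture over children of the same variables.
    return lambda a: sum(w * c(a) for w, c in pairs)

# P(X, Y) as a mixture of two independent components.
spn = weighted_sum([
    (0.7, product(leaf("X", "corridor"), leaf("Y", "office"))),
    (0.3, product(leaf("X", "office"), leaf("Y", "corridor"))),
])

assert spn({"X": "corridor", "Y": "office"}) == 0.7
assert spn({"X": "office", "Y": "office"}) == 0.0
```

Because evaluation is a single bottom-up pass, marginals and most-probable-explanation queries stay tractable, which is the property GraphSPNs exploit at the scale of whole topological maps.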
2017
Learning Semantic Maps with Topological Spatial Relations using Graph-Structured Sum-Product Networks
Kaiyu Zheng, Andrzej Pronobis, Rajesh P. N. Rao
International Conference on Intelligent Robots
and Systems (IROS) Workshop on Machine Learning
Methods for High-Level Cognitive Capabilities in
Robotics (ML-HLCR), 2017.
[pdf] [workshop] [poster]
Learning Large-Scale Topological Maps Using Sum-Product Networks
Kaiyu Zheng
Senior Thesis, University of Washington, 2017
[pdf] [bibtex]
In order to perform complex actions in human environments, an autonomous robot needs the ability
to understand the environment, that is, to gather and maintain spatial knowledge. Topological maps
are commonly used for representing large-scale, global maps such as floor plans. Although much work
has been done on topological map extraction, we have found little previous work on the problem
of learning the topological map using a probabilistic model. Learning a topological map means
learning the structure of the large-scale space and the dependency between places, for example, how
the evidence of a group of places influences the attributes of other places. This is an important step
towards planning complex actions in the environment. In this thesis, we consider the problem of
using a probabilistic deep learning model to learn the topological map, which is essentially a sparse
undirected graph where nodes represent places annotated with their semantic attributes (e.g., place
category). We propose to use a novel probabilistic deep model, Sum-Product Networks (SPNs) [20],
due to their unique properties. We present two methods for learning topological maps using SPNs:
the place grid method and the template-based method. We contribute an algorithm that builds SPNs
for graphs using template models. Our experiments evaluate the ability of our models to enable
robots to infer semantic attributes and detect maps with novel semantic attribute arrangements.
Our results demonstrate their understanding of the topological map structure and spatial relations
between places.
ROS Navigation Tuning Guide
Kaiyu Zheng
Chapter in Robot Operating System (ROS) - The Complete Reference (Volume 6), 2021
(Publication of the Week, Weekly Robotics)
On arXiv since 2017.
[arXiv] [pdf] [book] [bibtex] [ROS site]
The ROS navigation stack is powerful for enabling mobile robots to move from place to place reliably. The job
of the navigation stack is to produce a safe path for the robot to execute, by processing data from odometry,
sensors, and the environment map. Maximizing the performance of this navigation stack requires fine-tuning
its parameters, and this is not as simple as it looks. Someone unfamiliar with the underlying concepts and
reasoning may try values at random and waste a lot of time.
This article guides the reader through the process of fine-tuning navigation parameters. It serves as
a reference for the "how" and "why" when setting the values of key parameters.
This guide assumes that the reader has already set up the navigation stack and is ready to optimize it.
It is also a summary of my work with the ROS navigation stack.
Teaching
CSCI 2951-F: Learning and Sequential Decision Making
Brown University, Providence, RI
Head Teaching Assistant. Instructor: Michael L. Littman
(Fall 2021)
[website]
Graduate-level course on automated decision
making from a computer science
perspective. Topics include Markov decision
processes, stochastic and repeated games,
partially observable Markov decision processes,
and reinforcement learning.
As an HTA, I contributed to course development by designing project requirements, presenting project ideas, creating homework problems and
solutions, holding office hours, and
coordinating teaching assistants.
CSE 446: Machine Learning
University of Washington, Seattle, WA
Teaching Assistant. Instructor: Sham M. Kakade
(Winter 2018)
Teaching Assistant. Instructor: Emily B. Fox
(Winter 2017)
[website (2017)] [website (2018)]
Undergraduate-level machine learning course. Topics include supervised
learning and predictive modeling: decision trees, rule induction,
nearest neighbors, Bayesian methods, neural networks, support vector
machines, and model ensembles. Unsupervised learning and clustering.
TA responsibilities: Lead sections (∼ 30 students), grading, office hours
CSE 311: Foundations of Computing
University of Washington, Seattle, WA
Teaching Assistant. Instructors: Paul Beame, Kevin Zatloukal
(Spring 2018)
Teaching Assistant. Instructors: Paul Beame, Shayan Oveis Gharan
(Fall 2016)
[website (2016)] [website (2018)]
First CS major course. Examines fundamentals of logic, set theory,
induction, and algebraic structures with applications to computing; finite
state machines; and limits of computability.
TA responsibilities: Lead sections (∼ 30 students), grading, office hours
CSE 373: Data Structures and Algorithms
University of Washington, Seattle, WA
Teaching Assistant. Instructor: Evan McCarty
(Fall 2017)
[website]
For non-CS majors. Fundamental algorithms and data structures for
implementation. Linked lists, stacks, queues, directed graphs. Trees:
representations, traversals. Searching, hashing, sorting.
TA responsibilities: Lead sections (∼ 30 students), grading, office hours
Teaching Certificates
Sheridan Teaching Certificate I, Brown University (12/2021).
[website]
Develop and refine fundamental teaching and assessment strategies and communication skills based on how students learn.