I'm a PhD student at the Humans-To-Robots Lab at Brown University,
advised by Stefanie Tellex. My research
interests are in robot decision making under uncertainty, especially
partial observability. I am also interested in improving robot
decision making through natural language understanding.

Before my PhD studies, I graduated with an M.S. and a B.S. in computer
science from the University of Washington with
a minor in math. I worked on mobile robot navigation and deep generative
modeling for mobile robot semantic mapping using Sum-Product Networks,
advised by Andrzej Pronobis and Rajesh P. N. Rao.

[CV]
[GitHub]
[Google Scholar]

## Preprints

**Spatial Language Understanding for Object Search
in Partially Observed Cityscale Environments**
Kaiyu Zheng, Deniz Bayazit, Rebecca Mathew, Ellie Pavlick, Stefanie Tellex

*Under Review.*
We present a system that enables robots to interpret spatial language as a
distribution over object locations for effective search in partially observable
cityscale environments. We introduce the spatial language observation space and
formulate a stochastic observation model under the framework of Partially
Observable Markov Decision Process (POMDP) which incorporates information
extracted from the spatial language into the robot's belief. To interpret
ambiguous, context-dependent prepositions (e.g., "front"), we propose a
convolutional neural network model that learns to predict the language
provider's relative frame of reference (FoR) given environment context. We
demonstrate the generalizability of our FoR prediction model and object search
system through cross-validation over areas of five cities, each with a
40,000m$^2$ footprint. End-to-end experiments in simulation show that our
system achieves faster search and higher success rate compared to a
keyword-based baseline without spatial preposition understanding.

**Multi-Resolution POMDP Planning for Multi-Object Search in 3D**
Kaiyu Zheng, Yoonchang Sung, George Konidaris, Stefanie Tellex

*Under Review.*
Robots operating in household environments must find objects on shelves,
under tables, and in cupboards. Previous work often formulates the object
search problem as a POMDP (Partially Observable Markov Decision Process), yet
constrains the search space to 2D to reduce computational complexity, although
objects exist in a rich 3D environment. We present a POMDP formulation for
multi-object search in a 3D region with a frustum-shaped field-of-view and an
efficient multi-resolution planning algorithm to solve this POMDP. To achieve
efficient planning, our algorithm uses a new octree-based representation that
captures beliefs at different resolution levels, enabling the agent to induce
abstract POMDPs with dramatically smaller state and observation spaces. Our
evaluation in a simulated 3D domain shows that our approach achieves
significantly higher reward ($\geq$ 51% in the largest instance) and finds more
objects compared to baselines without a resolution hierarchy, as the search
space becomes larger, and as the sensor uncertainty increases. We show that our
approach enables a mobile robot to automatically find objects placed at
different heights in two 10m$^2\times$2m regions by moving its base and
actuating its torso.

## Publications

**pomdp_py: A Framework to Build and Solve POMDPs**
Kaiyu Zheng, Stefanie Tellex

ICAPS 2020 Workshop on Planning and Robotics (PlanRob)
[pdf] [bibtex] [docs] [code]

In this paper, we present pomdp_py, a general-purpose Partially Observable Markov Decision Process (POMDP) library written in Python and Cython. Existing POMDP libraries often hinder accessibility and efficient prototyping due to the underlying programming language or interfaces, and require extra complexity in the software toolchain to integrate with robotics systems. pomdp_py features simple and comprehensive interfaces capable of describing large discrete or continuous (PO)MDP problems. Here, we summarize the design principles and describe in detail the programming model and interfaces in pomdp_py. We also describe intuitive integration of this library with ROS (Robot Operating System), which enabled our torso-actuated robot to perform object search in 3D. Finally, we note directions to improve and extend this library for POMDP planning and beyond.

**From Pixels to Buildings: End-to-end Probabilistic Deep Networks for Large-scale Semantic Mapping**
Kaiyu Zheng, Andrzej Pronobis

International Conference on Intelligent Robots and Systems (IROS 2019).
[pdf] [bibtex] [slides] [project]

We introduce TopoNets, end-to-end probabilistic deep networks for modeling
semantic maps with structure reflecting the topology of large-scale
environments. TopoNets build unified deep networks spanning multiple levels of
abstraction and spatial scales, from pixels representing geometry of local
places to high-level descriptions representing semantics of buildings. To this
end, TopoNets leverage complex spatial relations expressed in terms of
arbitrary, dynamic graphs. We demonstrate how TopoNets can be used to perform
end-to-end semantic mapping from partial sensory observations and noisy
topological relations discovered by a robot exploring large-scale office
spaces. We further illustrate the benefits of the probabilistic representation
by generating semantic descriptions augmented with valuable uncertainty
information and utilizing likelihoods of complete semantic maps to detect
novel and incongruent environment configurations.

**Learning Graph-Structured Sum-Product Networks for Probabilistic Semantic Maps**
Kaiyu Zheng, Andrzej Pronobis, Rajesh P. N. Rao

AAAI Conference on Artificial Intelligence (AAAI 2018).
[pdf] [bibtex] [slides] [project] [code] [dataset]

We introduce Graph-Structured Sum-Product Networks
(GraphSPNs), a probabilistic approach to structured prediction
for problems where dependencies between latent variables
are expressed in terms of arbitrary, dynamic graphs.
While many approaches to structured prediction place strict
constraints on the interactions between inferred variables,
many real-world problems can be only characterized using
complex graph structures of varying size, often contaminated
with noise when obtained from real data. Here, we focus on
one such problem in the domain of robotics. We demonstrate
how GraphSPNs can be used to bolster inference about semantic,
conceptual place descriptions using noisy topological
relations discovered by a robot exploring large-scale office
spaces. Through experiments, we show that GraphSPNs consistently
outperform the traditional approach based on undirected
graphical models, successfully disambiguating information
in global semantic maps built from uncertain, noisy
local evidence. We further exploit the probabilistic nature of
the model to infer marginal distributions over semantic descriptions
of as yet unexplored places and detect spatial environment
configurations that are novel and incongruent with
the known evidence.

**ROS Navigation Tuning Guide**
Kaiyu Zheng

Chapter in Robot Operating System (ROS) - The Complete Reference (Volume 6), 2021.
[arXiv] [pdf] [book] [bibtex] [ROS site]

The ROS navigation stack is a powerful tool for moving mobile robots from place to place reliably. The job
of the navigation stack is to produce a safe path for the robot to execute by processing data from odometry,
sensors, and the environment map. Maximizing the performance of this navigation stack requires some fine-tuning
of parameters, and this is not as simple as it looks. One who is sophomoric about the concepts and reasoning
may try things randomly and waste a lot of time.

This article intends to guide the reader through the process of fine-tuning navigation parameters. It is
a reference for when someone needs to know the "how" and "why" of setting the values of key parameters.
This guide assumes that the reader has already set up the navigation stack and is ready to optimize it.
This is also a summary of my work with the ROS navigation stack.

**Learning Large-Scale Topological Maps Using Sum-Product Networks**
Kaiyu Zheng

Senior Thesis, University of Washington, 2017
[pdf] [bibtex]

In order to perform complex actions in human environments, an autonomous robot needs the ability
to understand the environment, that is, to gather and maintain spatial knowledge. Topological maps
are commonly used for representing large-scale, global maps such as floor plans. Although much work
has been done in topological map extraction, we have found little previous work on the problem
of learning the topological map using a probabilistic model. Learning a topological map means
learning the structure of the large-scale space and the dependencies between places, for example, how
the evidence of a group of places influences the attributes of other places. This is an important step
towards planning complex actions in the environment. In this thesis, we consider the problem of
using a probabilistic deep learning model to learn the topological map, which is essentially a sparse
undirected graph where nodes represent places annotated with their semantic attributes (e.g. place
category). We propose to use a novel probabilistic deep model, Sum-Product Networks (SPNs) [20],
due to their unique properties. We present two methods for learning topological maps using SPNs:
the place grid method and the template-based method. We contribute an algorithm that builds SPNs
for graphs using template models. Our experiments evaluate the ability of our models to enable
robots to infer semantic attributes and detect maps with novel semantic attribute arrangements.
Our results demonstrate their understanding of the topological map structure and spatial relations
between places.

Email: kaiyu_zheng [at] brown [dot] edu