Synthesis of Provably Correct Autonomy Protocols
for Shared Control

Murat Cubuktepe, Nils Jansen, Mohammed Alsiekh, Ufuk Topcu

We Will Write a Custom Essay Specifically
For You For Only $13.90/page!

order now

Abstract—We develop algorithms to synthesize shared control
protocols subject to probabilistic temporal logic specifications.
More specifically, we develop a framework in which a human
and an autonomy protocol can issue commands to carry out
a certain task. We blend these commands into a joint input
to the robot. We model the interaction between the human
and the robot as a Markov decision process that represents
the shared control scenario. Additionally, we use randomized
strategies in a Markov decision process to incorporate potential
randomness in human’s decisions, which is caused by factors such
as complexity of the task specifications and imperfect interfaces.
Using inverse reinforcement learning, we obtain an abstraction
of the human’s behavior as a so-called randomized strategy. We
design the autonomy protocol to ensure that the robot behavior—
resulting from the blending of the commands—satisfies given
safety and performance specifications in probabilistic temporal
logic. Additionally, the resulting strategies generate behavior as
similar to the behavior induced by the human’s commands as
possible. We formulate the underlying problem as a quasiconvex
optimization problem, which is solved by checking feasibility
of a number of linear programming problems. We show the
applicability of the approach through case studies involving
autonomous wheelchair navigation and unmanned aerial vehicle
mission planning.


We study the problem of shared control, where a robot
is to execute a task according to the goals of a human
operator while adhering to additional safety and performance
requirements. Applications of such human-robot interaction
include remotely operated semi-autonomous wheelchairs 13,
robotic teleoperation 16, and human-in-the-loop unmanned
aerial vehicle mission planning 9. Basically, the human
operator issues a command through an input interface, which
maps the command directly to an action for the robot. The
problem is that a sequence of such actions may fail to
accomplish the task at hand, due to limitations of the interface
or failure of the human in comprehending the complexity of
the problem. Therefore, a so-called autonomy protocol provides
assistance for the human in order to complete the task according
to the given requirements.

At the heart of the shared control problem is the design of
an autonomy protocol. In the literature, there are two main
directions, based on either switching the control authority

M. Cubuktepe and U. Topcu are with the Department of Aerospace Engi-
neering and Engineering Mechanics, University of Texas at Austin, 201 E 24th
St, Austin, TX 78712, USA. Nils Jansen is with the Department of Software
Science, Radboud University Nijmegen, Comeniuslaan 4, 6525 HP Nijmegen,
Netherlands. Mohammed Alsiekh is with the Institute for Computational
Engineering and Sciences, University of Texas at Austin, 201 E 24th St,
Austin, TX 78712, USA. email:({mcubuktepe,malsiekh,utopcu},
[email protected]).

between human and autonomy protocol 26, or on blending
their commands towards joined inputs for the robot 7, 15.

One approach to switching the authority first determines
the desired goal of the human operator with high confidence,
and then assists towards exactly this goal 8, 18. In 12,
switching the control authority between the human and au-
tonomy protocol ensures satisfaction of specifications that are
formally expressed in temporal logic. In general, the switching
of authority may cause a decrease in human’s satisfaction, who
usually prefers to retain as much control as possible 17.

The second direction is to provide another command in
addition to the one of the human operator. To introduce a
more flexible trade-off between the human’s control authority
and the level of autonomous assistance, both commands are
then blended to form a joined input for the robot. A blending
function determines the emphasis that is put on the autonomy
protocol in the blending, that is, regulating the amount of
assistance provided to the human. Switching of authority can
be seen as a special case of blending, as the blending function
may assign full control to the autonomy protocol or to the
human. In general, putting more emphasis on the autonomy
protocol in blending leads to greater accuracy in accomplishing
the task 6, 7, 20. However, as before, humans may prefer
to retain control of the robot 16, 17. None of the existing
blending approaches provide formal correctness guarantees
that go beyond statistical confidence bounds. Correctness here
refers to ensuring safety and optimizing performance according
to the given requirements. Our goal is to design an autonomy
protocol that admits formal correctness while rendering the
robot behavior as close to the human’s inputs as possible.

We model the behavior of the robot as a Markov decision
process (MDP) 23, which captures the robot’s actions inside a
potentially stochastic environment. Problem formulations with
MDPs typically focus on maximizing an expected reward (or,
minimizing the expected cost). However, such formulations may
not be sufficient to ensure safety or performance guarantees in
a task that includes a human operator. Therefore, we design
the autonomy protocol such that the resulting robot behavior
is formally verified with respect to probabilistic temporal logic
specifications. Such verification problems have been extensively
studied for MDPs 2.

Take as an example a scenario involving a semi-autonomous
wheelchair 13 whose navigation has to account for a randomly
moving autonomous vacuum cleaner, see Fig. 1. The wheelchair
needs to navigate to the exit of a room, and the vacuum
cleaner moves in the room according to a probabilistic transition
function. The task of the wheelchair is to reach the exit gate
while not crashing into the vacuum cleaner. The human may not

(a) Autonomy perspective





(b) Human perspective

Fig. 1. A wheelchair in a shared control setting.

fully perceive the motion of the vacuum cleaner. Note that the
human’s commands, depicted with the solid red line in Fig 1(a),
may cause the wheelchair to crash into the vacuum cleaner.
The autonomy protocol provides another set of commands,
which is indicated by the solid red line in Fig 1(b), to carry
out the task safely without crashing. However, the autonomy
protocol’s commands deviate highly from the commands of
the human. The two sets of commands are then blended into
a new set of commands, depicted using the dashed red line
in Fig 1(b). The blended commands perform the task safely
while generating behavior as similar to the behavior induced
by the human’s commands as possible.

In 15, we formulated the problem of designing the
autonomy protocol as a nonlinear programming problem. How-
ever, solving nonlinear programs is generally intractable 3.
Therefore, we proposed a greedy algorithm that iteratively
repairs the human strategy such that the specifications are
satisfied without guaranteeing optimality, based on 22. Here,
we propose an alternative approach for the blending of the two
strategies. We follow the approach of repairing the strategy
of the human to compute an autonomy protocol. We ensure
that the resulting behavior induced by the repaired strategy (1)
deviates minimally from the human strategy and (2) satisfies
given temporal logic specifications after blending. We formally
define the problem as a quasiconvex optimization problem,
which can be solved efficiently by checking feasibility of a
number of convex optimization problems 4.

A human may be uncertain about which command to issue
in order to accomplish a task. Moreover, a typical interface
used to parse human’s commands, such as a brain-computer
interface, is inherently imperfect. To capture such uncertainties
and imperfections in the human’s decisions, we introduce
randomness to the commands issued by humans. It may not
be possible to blend two different deterministic commands. If
the human’s command is “up” and the autonomy protocol’s
command is “right”, we cannot blend these two commands
to obtain another deterministic command. By introducing
randomness to the commands of the human and the autonomy
protocol, we therefore ensure that the blending is always well-
defined. In what follows, we call a formal interpretation of a
sequence of the human’s commands the human strategy, and
the sequence of commands issued by the autonomy protocol

the autonomy strategy.
The question remains how to obtain the human strategy in

the first place. It may be unrealistic that a human can provide
the strategy for an MDP that models a realistic scenario. To this
end, we create a virtual simulation environment that captures
the behavior of the MDP. We ask humans to participate in two
case studies to collect data about typical human behavior. We
use inverse reinforcement learning to get a formal interpretation
as a strategy based on human’s inputs 1, 28. We model
a typical shared control scenario based on an autonomous
wheelchair navigation 13 in our first case study. In our second
case study, we consider an unmanned aerial vehicle mission
planning scenario, where the human operator is to patrol certain
regions while keeping away from enemy aerial vehicles.

In summary, the main contribution this paper is to synthesize
the autonomy protocol such that the resulting blended or
repaired strategy meets all given specifications while only
minimally deviating from the human strategy. We introduce all
formal foundations that we need in Section II. We provide an
overview of the general shared control concept in Section III.
We present the shared control synthesis problem and provide
a solution based on linear programming in Section IV. We
indicate the applicability and scalability of our approach on
experiments in Section V and draw a conclusion and critique
of our approach in Section VI.


In this section, we introduce the required formal models and
specifications that we use to synthesize the autonomy protocol,
and we give a short example illustrating the main concepts.

1) Markov Decision Processes: A probability distribution
over a finite set X is a function µ : X ? 0, 1 ? R with?
x?X µ(x) = µ(X) = 1. The set X of all distributions is


Definition 1 (MDP). A Markov decision process (MDP)M =
(S, sI ,A,P) has a finite set S of states, an initial state sI ? S,
a finite set A of actions, and a transition probability function
P : S ×A? Distr(S).

MDPs have nondeterministic choices of actions at the
states; the successors are determined probabilistically via the
associated probability distribution. We assume that all actions
are available at each state and that the MDP contains no
deadlock states. A cost function C : S ×A? R?0 associates
cost to transitions. If there one single action available at each
state, the MDP reduces to a discrete-time Markov chain (MC).

We use strategies resolve the choices of actions in order to
define a probability measure and expected cost on MDPs.

Definition 2 (Strategy). A randomized strategy for an MDPM
is a function ? : S ? Distr(A). If ?(s, ?) = 1 for ? ? A and
?(s, ?) = 0 for all ? ? A {?}, the strategy is deterministic.
The set of all strategies over M is StrM.

Resolving all the nondeterminism for an MDP M with a
strategy ? ? StrM yields an induced Markov chain M? .

s0 s1 s2

s3 s4















(a) MDP M
s0 s1 s2

s3 s4







(b) Induced MC M?1

Fig. 2. MDP M with target state s2 and induced MC for strategy ?unif

Definition 3 (Induced MC). For an MDP M = (S, sI ,A,P)
and strategy ? ? StrM, the MC induced by M and ? is
M? = (S, sI ,A,P?) where

P?(s, s?) =


?(s, ?) · P(s, ?, s?) for all s, s? ? S .

The occupancy measure of a strategy ? gives the expected
number of taking an action at a state in an MDP. In our
formulation, we use the occupancy measure of a strategy to
compute an autonomy protocol.

Definition 4 (Occupancy Measure). The occupancy measure
x? of a strategy ? is defined as

x?(s, ?) = E


P (?t = ?|st = s)


where st and ?t denote the state and action in an MDP at
time step t. The occupancy measure x?(s, ?) is the expected
number of taking the action ? at state s with the strategy ?.

2) Specifications: A probabilistic reachability specification
P??(?T ) with the threshold ? ? 0, 1 ? Q and the set of
target states T ? S puts an upper bound ? on the probability
to reach T from sI in M. Similarly, expected cost properties
E??(?G) restrict the expected cost to reach the set G ? S
of goal states by an upper bound ? ? Q. We also use until
properties of the form Pr??(¬T U G), which assert that the
probability of reaching the set of states G while not reaching
the set of target states T beforehand is at least ?.

The synthesis problem is to find one particular strategy ?
for an MDP M such that given specifications ?1, . . . , ?n are
satisfied in the induced MC M? , written ? |= ?1, . . . , ?n.

Example 1. Fig. 2(a) depicts an MDP M with initial state
s0. In state s0, the avaliable actions are a and b. Similarly
for state s1, the two avaliable actions are c and d. If action a
is selected in state s0, the agent transitions to s1 and s3 with
probabilities 0.6 and 0.4. For states s2, s3 and s4 we omit
actions, because of the self loops.

For a safety specification ? = P?0.21(?s2), the deterministic
strategy ?1 ? StrM with ?1(s0, a) = 1 and ?1(s1, c) =
1 induces the probability 0.36 of reaching s2. Therefore,
the specification is not satisfied, see the induced MC in
Fig. 2(b). Likewise, the randomized strategy ?unif ? StrM
with ?unif(s0, a) = ?unif(s0, b) = 0.5 and ?unif(s1, c) =







blended command

function b

model Mr

?1, . . . , ?n


Fig. 3. Shared control architecture.

?unif(s1, d) = 0.5 violates the specification, as the probability
of reaching s2 is 0.25. However, the deterministic strategy
?safe ? StrM with ?safe(s0, b) = 1 and ?safe(s1, d) = 1 induces
the probability 0.16, thus ?safe |= ?.


We now detail the general shared control concept adopted in
this paper. Consider the setting in Fig. 3. As inputs, we have a
set of task specifications, a model Mr for the robot behavior,
and a blending function b. The given robot task is described by
certain performance and safety specifications ?1, . . . , ?n. For
example, it may not be safe to take the shortest route because
there may be too many obstacles in that route. In order to
satisfy performance considerations, the robot should prefer to
take the shortest route possible while not violating the safety
specifications. We model the behavior of the robot inside a
stochastic environment as an MDP Mr.

In out setting, a human issues a set of commands for the
robot to execute. It may be unrealistic that a human can
grasp an MDP that models a realistic shared control scenario.
Indeed, a human will likely have difficulties interpreting a
large number of possibilities and the associated probability
of paths and payoffs 11, and it may be impractical for the
human to provide the human strategy to the autonomy protocol,
due to possibly large state space of the MDP. Therefore,
we compute a human strategy ?h as an abstraction of a
sequence of human’s commands, which we obtain using inverse
reinforcement learning 1, 28.

We design an autonomy protocol that provides another
strategy ?a, which we call the autonomy strategy. Then, we
blend the two strategies according to the blending function b
into the blended strategy ?ha. The blending function reflects
preference over the human strategy or the autonomy strategy.
We ensure that the blended strategy deviates minimally from
the human strategy.

At runtime, we can then blend commands of the human with
commands of the autonomy strategy. The resulting “blended”
commands will induce same behavior as with the blended
strategy ?ha, and the specifications are satisfied. This procedure
requires to check feasibility of a number of linear programming

problems, which can be expensive, but it is very efficient to
implement at run time.

The shared control synthesis problem is then the synthesis
of the repaired strategy such that, for the repaired strategy ?ha,
it holds ?ha |= ?1, . . . , ?n while deviating minimally from ?h.
The deviation between the human strategy ?h and the repaired
strategy ?ha is measured by the maximal difference between
the two strategies.


I'm James!

Would you like to get a custom essay? How about receiving a customized one?

Check it out