Imitation learning has been shown to be a successful learning technique in scenarios where autonomous agents have to adapt their operation across diverse environments or domains. The main principle underlying imitation learning is to determine a state-to-action mapping, called a policy, from trajectories demonstrated by an expert. We consider the problem of imitation learning in adversarial settings, where the expert could be malicious and intermittently give incorrect demonstrations to misguide the learning agent. We propose a technique that uses temporally extended policies, called options, to make a learning agent robust against adversarial expert demonstrations. Experimental evaluation of our proposed technique for a game-playing AI shows that, when an expert adversarially modifies the demonstrations either randomly or strategically, a learning agent using our options-based technique resists deterioration in its task performance better than one using conventional reinforcement learning.
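To make the term "option" concrete, the following is a minimal sketch of the standard definition of an option as an initiation condition, an internal policy, and a termination condition. It does not reproduce the technique proposed above; the names `Option`, `can_start`, `policy`, `should_stop`, and `run_option`, as well as the integer state and action types, are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Option:
    """A temporally extended policy (hypothetical, illustrative structure)."""
    can_start: Callable[[int], bool]    # initiation condition on states
    policy: Callable[[int], int]        # action to take while the option runs
    should_stop: Callable[[int], bool]  # termination condition on states


def run_option(option: Option, state: int,
               step: Callable[[int, int], int],
               max_steps: int = 100) -> List[int]:
    """Follow the option's policy until it terminates; return visited states."""
    if not option.can_start(state):
        raise ValueError("option is not available in this state")
    trajectory = [state]
    for _ in range(max_steps):
        if option.should_stop(state):
            break
        state = step(state, option.policy(state))
        trajectory.append(state)
    return trajectory
```

For example, in a one-dimensional corridor where `step` adds the action to the state, an option whose policy always returns `+1` and which stops at position 5 walks the agent from position 2 through 3 and 4 to 5 as a single extended decision.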
Multi-robot systems comprising heterogeneous autonomous vehicles on land, in the air, and on water are being increasingly
used to assist or replace humans in different hazardous missions. Two crucial aspects in such multi-robot
systems are to (a) explore an initially unknown region of interest to discover tasks and (b) allocate and share
the discovered tasks between the robots in a coordinated manner using a multi-robot task allocation (MRTA)
algorithm. In this paper, we describe results from our research on multi-robot terrain coverage and MRTA
algorithms within an autonomous landmine detection scenario, done as part of the COMRADES project. Each
robot is equipped with a different type of landmine detection sensor, and different sensors, even of the same type,
can have different degrees of accuracy. The landmine detection-related operations performed by each robot are
abstracted as tasks, and multiple robots are required to complete a single task. First, we describe a distributed
and robust terrain coverage algorithm that employs Voronoi partitions to divide the area of interest among the
robots and then uses a single-robot coverage algorithm to explore each partition for potential landmines. Then,
we describe MRTA algorithms that use the location information of discovered potential landmines and employ
either a greedy or an opportunistic strategy to allocate tasks among the robots while attempting to
minimize the time (energy) expended by the robots to perform the tasks. We report experimental results of our
algorithms using accurately simulated Corobot robots within the Webots simulator performing a multi-robot
landmine detection operation.
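The two stages described above can be illustrated with a minimal sketch. It assumes a discretized region in which each grid cell belongs to the Voronoi region of its nearest robot, and a greedy rule that sends the closest robots to each task in discovery order. The function names, the `robots_per_task` parameter, and the use of straight-line distance as a proxy for time or energy are assumptions; differences in sensor type and accuracy are ignored here.

```python
import math


def voronoi_partition(cells, robots):
    """Assign each grid cell to its nearest robot (a discrete Voronoi partition).

    cells:  iterable of (x, y) tuples
    robots: dict mapping robot name -> (x, y) position
    """
    regions = {name: [] for name in robots}
    for cell in cells:
        nearest = min(robots, key=lambda name: math.dist(cell, robots[name]))
        regions[nearest].append(cell)
    return regions


def greedy_allocate(tasks, robots, robots_per_task=2):
    """Greedily send the closest robots to each task, in discovery order.

    tasks:  dict mapping task id -> (x, y) location of a potential landmine
    robots: dict mapping robot name -> (x, y) position
    Distance stands in for the time (energy) cost; this is an assumption.
    """
    positions = dict(robots)  # copy so the caller's positions are unchanged
    plan = {}
    for task_id, location in tasks.items():
        ranked = sorted(positions,
                        key=lambda name: math.dist(positions[name], location))
        chosen = ranked[:robots_per_task]
        plan[task_id] = chosen
        for name in chosen:
            positions[name] = location  # robot moves to the task site
    return plan
```

A greedy rule of this kind commits to the locally cheapest assignment for each task, which is simple to compute but can leave robots poorly placed for later tasks.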
We consider the problem of distributed sensor information fusion by multiple autonomous robots within the
context of landmine detection. We assume that different landmines can be composed of different types of material
and robots are equipped with different types of sensors, while each robot has only one type of landmine detection
sensor on it. We introduce a novel technique that uses a market-based information aggregation mechanism
called a prediction market. Each robot is provided with a software agent that uses the robot's sensory input
to perform the calculations of the prediction market technique. The result of the agent's calculations is a 'belief'
representing the confidence of the agent in identifying the object as a landmine. The beliefs from different
robots are aggregated by the market mechanism and passed on to a decision-maker agent. The decision-maker
agent uses this aggregate belief information about a potential landmine and makes decisions about which other
robots should be deployed to its location, so that the landmine can be confirmed rapidly and accurately. Our
experimental results show that, for identical data distributions and settings, our prediction market-based
information aggregation technique improves the accuracy of object classification compared with two
other commonly used techniques.
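As an illustration of how a market can aggregate beliefs, the sketch below uses a simple wealth-weighted market model in which each agent bets in proportion to its wealth. This is a swapped-in textbook model, not necessarily the mechanism used in the work above. The market price is the wealth-weighted average of the agents' beliefs, and wealth is redistributed once the object's true class is known. The function names and the example numbers are assumptions.

```python
def market_price(beliefs, wealth):
    """Aggregate belief: wealth-weighted average of the agents' beliefs.

    beliefs: each agent's probability that the object is a landmine, in (0, 1)
    wealth:  each agent's current wealth (positive numbers)
    """
    total = sum(wealth)
    return sum(w * b for w, b in zip(wealth, beliefs)) / total


def settle(beliefs, wealth, price, is_mine):
    """Redistribute wealth after the outcome is revealed.

    Agents whose belief was closer to the truth than the market price gain
    wealth, so they carry more weight the next time beliefs are aggregated.
    Requires 0 < price < 1.
    """
    if is_mine:
        return [w * b / price for w, b in zip(wealth, beliefs)]
    return [w * (1 - b) / (1 - price) for w, b in zip(wealth, beliefs)]


beliefs = [0.9, 0.6, 0.3]  # hypothetical beliefs from three robots' agents
wealth = [1.0, 1.0, 1.0]   # equal starting wealth (assumed)
price = market_price(beliefs, wealth)
new_wealth = settle(beliefs, wealth, price, is_mine=True)
```

In this model total wealth is conserved at settlement, so the update only shifts influence toward agents whose sensors have been more reliable.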