Imitation learning has been shown to be a successful learning technique in scenarios where autonomous agents must adapt their operation across diverse environments or domains. The main principle underlying imitation learning is to determine a state-to-action mapping, called a policy, from trajectories demonstrated by an expert. We consider the problem of imitation learning under adversarial settings, where the expert could be malicious and intermittently give incorrect demonstrations to misguide the learning agent. We propose a technique using temporally extended policies, called options, to make a learning agent robust against adversarial expert demonstrations. Experimental evaluation of our proposed technique for a game-playing AI shows that a learning agent using our options-based technique resists deterioration in its task performance better than one using conventional reinforcement learning when an expert adversarially modifies the demonstrations, either randomly or strategically.
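The two concepts the abstract relies on can be illustrated generically. Below is a minimal sketch, not the paper's actual method: tabular behavioral cloning (one common way to learn a state-to-action policy from demonstrated trajectories) and a bare-bones `Option` record following the standard options framework (initiation set, intra-option policy, termination condition). All names, states, and demonstrations are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

def learn_policy(trajectories: List[List[Tuple[str, str]]]) -> Dict[str, str]:
    """Tabular behavioral cloning: map each state to the action the
    expert chose most often across the demonstrated trajectories."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for traj in trajectories:
        for state, action in traj:
            counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

@dataclass
class Option:
    """A temporally extended action in the options framework:
    where it may start, how it acts, and when it terminates."""
    initiation: Set[str]                 # states where the option can begin
    policy: Callable[[str], str]         # intra-option state-to-action mapping
    terminate: Callable[[str], bool]     # termination condition per state

# Hypothetical expert demonstrations as (state, action) pairs;
# the third trajectory plays the role of a corrupted demonstration.
demos = [
    [("s0", "right"), ("s1", "right"), ("s2", "up")],
    [("s0", "right"), ("s1", "right"), ("s2", "up")],
    [("s0", "left"), ("s1", "down"), ("s2", "up")],
]
policy = learn_policy(demos)
print(policy["s0"])  # majority vote outweighs the single corrupted trajectory
```

The intuition behind the paper's approach is that committing to a temporally extended option, rather than re-querying the expert at every step, limits how much an intermittently malicious demonstration can perturb the learned behavior.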
Prithviraj Dasgupta, "Using options to improve robustness of imitation learning against adversarial attacks," Proc. SPIE 11746, Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications III, 1174610 (12 April 2021); https://doi.org/10.1117/12.2585849