Evolving robust policy coverage sets in multi-objective Markov decision processes through intrinsically motivated self-play

Abstract

Many real-world decision-making problems involve multiple conflicting objectives that cannot be optimized simultaneously without a compromise. Such problems are known as multi-objective Markov decision processes, and they pose a significant challenge for conventional single-objective reinforcement learning methods, especially when an optimal compromise cannot be determined beforehand. Multi-objective reinforcement learning methods address this challenge by finding an optimal coverage set of non-dominated policies that can satisfy any user's preference in solving the problem. However, this comes at the cost of computational complexity, time consumption, and a lack of adaptability to non-stationary environment dynamics. To address these limitations, there is a need for adaptive methods that can solve the problem in an online and robust manner. In this paper, we propose a novel developmental method that uses adversarial self-play between two components: an intrinsically motivated preference exploration component, and a policy coverage set optimization component that robustly evolves a convex coverage set of policies to solve the problem using the preferences proposed by the former. We experimentally demonstrate the effectiveness of the proposed method in comparison with state-of-the-art multi-objective reinforcement learning methods in stationary and non-stationary environments.
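To make the notion of a convex coverage set concrete, the following is a minimal sketch and not the authors' method. It assumes two objectives, linear scalarization of preferences, and a handful of hypothetical policy value vectors. The function names `scalarize` and `approximate_ccs` are illustrative, and the coverage set is only approximated by sampling preference weights on a grid.

```python
import numpy as np

def scalarize(values, weights):
    """Linear scalarization: weighted sum of each policy's per-objective return."""
    return values @ weights

def approximate_ccs(values, n_weights=101):
    """Return indices of policies that are best for at least one sampled
    preference. Assumes exactly two objectives; weights are (w, 1 - w)."""
    best = set()
    for w1 in np.linspace(0.0, 1.0, n_weights):
        weights = np.array([w1, 1.0 - w1])
        best.add(int(np.argmax(scalarize(values, weights))))
    return sorted(best)

# Hypothetical value vectors (objective 1, objective 2) for four policies.
policy_values = np.array([
    [10.0, 0.0],   # favors objective 1
    [0.0, 10.0],   # favors objective 2
    [6.0, 6.0],    # balanced compromise
    [4.0, 4.0],    # dominated by the balanced policy
])

ccs = approximate_ccs(policy_values)
print(ccs)
```

With these assumed values, the first three policies are each optimal for some preference, while the fourth is never selected because the balanced policy is better on both objectives.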

Cite

Abdelfattah, S., Kasmarik, K., & Hu, J. (2018). Evolving robust policy coverage sets in multi-objective Markov decision processes through intrinsically motivated self-play. Frontiers in Neurorobotics, 12(October). https://doi.org/10.3389/fnbot.2018.00065
