Shen, S.*, Shimmei, M.*, Chi, M., & Matsuda, N. (2019). Applications of Reinforcement Learning to Self-Improving Educational Systems. In A. M. Sinatra, A. C. Graesser, X. Hu, K. Brawner & V. Rus (Eds.), Design Recommendations for Intelligent Tutoring Systems (Vol. 7: Self-Improving Systems, pp. 77-96). Orlando, FL: US Army Research Lab.

Introduction: Interactive e-learning systems, such as Intelligent Tutoring Systems (ITSs) and educational games, have become increasingly prevalent in educational settings. While these systems hold great promise, they are difficult and expensive to construct and are often brittle and inflexible in their interactions with students. In order to design an effective interactive e-learning system, developers must form the core of the system and then determine what and how to teach the desired content. Many interactive e-learning systems exist for Science, Technology, Engineering and Math (STEM) domains, but not all of them are capable of the adaptive pedagogical decision-making that is central to achieving the potential learning gains afforded by such systems. These limitations are due in part to the fact that the systems typically rely on a small set of hand-crafted rules when making pedagogical decisions. Because there is a lack of validated theories of decision-making in interactive e-learning systems, these rules are often project-specific and are rarely evaluated. Thus, there is a clear need to advance data-driven approaches to pedagogical decision-making.

Reinforcement Learning (RL) offers one of the most promising approaches to data-driven decision-making for improving student learning in interactive e-learning systems. RL algorithms are designed to induce effective policies that determine the best action for an agent to take in any given situation so as to maximize a cumulative reward. Optimal decision-making in complex interactive environments is challenging. In ITSs, for example, the system's behaviors can be viewed as a sequential decision process in which, at each step, the system chooses an appropriate action from a set of options. Pedagogical strategies are policies that are used to decide what action to take next in the face of alternatives. Each of these system decisions will affect the user's subsequent actions and performance. The impact of a decision on outcomes cannot be observed immediately, and the effectiveness of each decision depends on the effectiveness of subsequent decisions. A number of researchers, including the authors of this chapter, have studied the application of existing RL algorithms to improve the effectiveness of interactive e-learning systems. In this chapter, we will describe two case studies on applying RL to improve the effectiveness of educational systems.
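
The sequential decision framing described above can be made concrete with a small sketch. The toy model below is entirely hypothetical and is not taken from the chapter or its case studies: a student has a knowledge level of 0, 1, or 2 (mastered), the tutor chooses between a worked example and a problem-solving exercise at each step, and the only reward arrives when mastery is reached. The state names, action names, probabilities, and discount factor are all illustrative assumptions. For simplicity, the sketch uses value iteration on a known model rather than learning from interaction data, as an RL algorithm would.

```python
# Toy tutoring model. Every number here is an illustrative assumption.
ACTIONS = ("worked_example", "problem_solving")
MASTERED = 2   # terminal knowledge level
GAMMA = 0.9    # discount factor applied to delayed reward

# P_ADVANCE[state][action] = probability the student moves up one level.
P_ADVANCE = {
    0: {"worked_example": 0.8, "problem_solving": 0.2},
    1: {"worked_example": 0.2, "problem_solving": 0.8},
}


def q_value(state, action, values):
    """Expected discounted return of taking `action` in `state`."""
    p = P_ADVANCE[state][action]
    next_state = state + 1
    # Reward is delayed: it is only received on reaching mastery.
    reward = 1.0 if next_state == MASTERED else 0.0
    advance = p * (reward + GAMMA * values[next_state])
    stay = (1 - p) * GAMMA * values[state]
    return advance + stay


def value_iteration(sweeps=200):
    """Repeatedly back up state values until they settle."""
    values = {0: 0.0, 1: 0.0, MASTERED: 0.0}
    for _ in range(sweeps):
        for state in P_ADVANCE:
            values[state] = max(q_value(state, a, values) for a in ACTIONS)
    return values


def greedy_policy(values):
    """Pick the action with the highest expected return in each state."""
    return {
        s: max(ACTIONS, key=lambda a: q_value(s, a, values))
        for s in P_ADVANCE
    }


values = value_iteration()
policy = greedy_policy(values)
print(policy)
```

In this toy model the value of an action at level 0 depends on how well the tutor acts at level 1, which mirrors the point that the effectiveness of each decision depends on subsequent decisions.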
