Reversi (also known as Othello) is a simple and popular board game played on an eight-by-eight board. In reinforcement learning, searching the game tree of Reversi is a widely studied classic problem, since its small board yields a state space that remains tractable to analyze. Monte Carlo tree search (MCTS) is a heuristic algorithm for searching decision trees that is often applied in board-game AI, most famously in AlphaGo for the game of Go. We adapt the Monte Carlo tree search strategy to a Reversi AI. With several engineering optimizations (such as multithreading), we achieve strong playing results with high time efficiency.
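The abstract does not give the authors' exact selection rule, but MCTS implementations commonly pick the child to descend into using the UCB1 (UCT) score. The sketch below is an illustrative, hypothetical helper, not the paper's Reversi engine: `uct_select` and its `(wins, visits)` child representation are assumptions for illustration.

```python
import math

def uct_select(parent_visits, children, c=math.sqrt(2)):
    """Return the index of the child maximizing the UCB1 (UCT) score.

    children is a list of (wins, visits) pairs; c is the exploration
    constant (sqrt(2) is a common default, not taken from the paper).
    """
    best, best_score = None, -math.inf
    for i, (wins, visits) in enumerate(children):
        if visits == 0:
            return i  # unvisited children are expanded before any scored child
        score = wins / visits + c * math.sqrt(math.log(parent_visits) / visits)
        if score > best_score:
            best, best_score = i, score
    return best
```

The score balances exploitation (the empirical win rate `wins / visits`) against exploration (the square-root bonus, which shrinks as a child is visited more often); a multithreaded engine would typically run many such select-simulate-backpropagate iterations in parallel.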
In the classical multi-armed bandit (MAB) problem, a player selects one arm from a set to play at each time step, without knowing the reward models, aiming to maximize the total expected reward over T steps. The regret of an algorithm is its expected total loss after T steps relative to the ideal scenario in which the reward models are known. When the arm reward distributions are heavy-tailed, it is difficult to learn which arm is best. In this paper, we introduce an algorithm based on the idea of the Upper Confidence Bound (UCB) and prove that it achieves sublinear regret growth for heavy-tailed reward distributions. Furthermore, we consider MAB with gap periods, a dynamic model in which an arm enters a gap period offering no reward immediately after being played; this model has broad applications in Internet advertising. Clearly, the player should avoid choosing an arm until it exits its gap period. We extend the algorithm framework of Deterministic Sequencing of Exploration and Exploitation (DSEE) to the MAB model with gap periods, with regret reaching the optimal order O(log T) for light-tailed distributions and sublinear growth for heavy-tailed ones.
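To make the UCB idea concrete, the sketch below runs the classical UCB1 index policy on a simulated bandit. It is a minimal illustration, not the paper's algorithm: the function name `ucb1` and the `rewards_fn(arm, rng)` sampling interface are assumptions, and the paper's heavy-tailed variant would replace the empirical mean here with a robust estimator rather than this textbook index.

```python
import math
import random

def ucb1(rewards_fn, n_arms, horizon, seed=0):
    """Play `horizon` rounds with the UCB1 index policy.

    rewards_fn(arm, rng) returns one stochastic reward sample for `arm`.
    Returns the per-arm pull counts and the total reward collected.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play every arm once
        else:
            # index = empirical mean + confidence radius that shrinks with pulls
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                                    + math.sqrt(2 * math.log(t) / counts[a]))
        r = rewards_fn(arm, rng)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return counts, total
```

Over a long horizon the confidence radius of a suboptimal arm stays wide enough to be sampled only O(log T) times, which is the mechanism behind logarithmic regret for light-tailed rewards; heavy tails break the concentration argument, motivating the robust index the paper develops.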