Abstract
Reinforcement learning is an indispensable branch of artificial intelligence (AI), referring to the technology and methods of maximizing the rewards from an uncertain environment. As Moore's law comes to an end, the operating speed and energy consumption of advanced integrated circuits are increasingly unable to meet the ever-growing requirements of reinforcement learning. In recent years, photonic accelerators have emerged as a powerful candidate to solve this issue. Here, a novel photonic accelerator based on a nonlinear optoelectronic oscillator (NOEO) is proposed and demonstrated to solve the multi-armed bandit (MAB) problem and simulate the Tic Tac Toe (TTT) game, two of the most famous reinforcement learning problems. By adjusting the balance between the gain and the nonlinearity in the NOEO cavity, four parallel orthogonal chaotic sequences are generated with a 6-dB bandwidth up to 18.18 GHz and a permutation entropy (PE) as high as 0.9983. With the assistance of the tug-of-war method and the time-differential method, a 512-armed bandit problem and an intelligent TTT game are successfully accelerated, respectively. This work presents an innovative photonic accelerator for solving reinforcement learning problems more efficiently. Beyond reinforcement learning, the proposed scheme can find applications in other fields of AI, such as reservoir computing and neural networks.
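To illustrate the permutation entropy (PE) figure quoted for the chaotic sequences, the following is a minimal sketch of the standard normalized Bandt-Pompe estimator. It is not the authors' implementation. The function name `permutation_entropy`, the parameters `order` and `delay`, and their default values are assumptions made for this example, since the abstract does not state the embedding dimension or delay used. The demo data is pseudo-random noise from Python's `random` module, not NOEO measurements.

```python
import math
import random
from collections import Counter


def permutation_entropy(samples, order=3, delay=1):
    """Normalized Bandt-Pompe permutation entropy, in the range [0, 1].

    `order` (embedding dimension) and `delay` are illustrative defaults,
    not values taken from the paper.
    """
    n_windows = len(samples) - (order - 1) * delay
    if order < 2 or delay < 1 or n_windows < 1:
        raise ValueError("need order >= 2, delay >= 1 and enough samples")

    # Count how often each ordinal pattern (rank ordering) occurs.
    counts = Counter()
    for start in range(n_windows):
        window = [samples[start + k * delay] for k in range(order)]
        # Indices sorted by value; ties are broken by position (stable sort).
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        counts[pattern] += 1

    # Shannon entropy of the pattern distribution, divided by its
    # maximum possible value log(order!).
    entropy = sum(
        -(c / n_windows) * math.log(c / n_windows) for c in counts.values()
    )
    return entropy / math.log(math.factorial(order))


if __name__ == "__main__":
    rng = random.Random(0)  # pseudo-random stand-in, NOT NOEO data
    noise = [rng.random() for _ in range(20000)]
    ramp = list(range(20000))  # perfectly predictable sequence
    print("noise PE:", permutation_entropy(noise, order=4))
    print("ramp PE:", permutation_entropy(ramp, order=4))
```

Under this normalization, a perfectly predictable sequence such as a monotonic ramp yields a PE of 0, while a sequence whose ordinal patterns are all equally likely approaches 1. A value close to 1, such as the 0.9983 reported here, therefore indicates that the ordinal patterns of the sequence are nearly uniformly distributed.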