The proposed motion cueing algorithm (MCA), based on a reinforcement learning algorithm using gradient information to directly update the co
The proposed motion cueing algorithm (MCA), based on a reinforcement learning algorithm using gradient information to directly update the control policy, introduces three significant enhancements. First, transform the complex simulator environment into a differentiable simulator environment that provides gradient information at each time step and use this gradient information to directly update the control policy. Second, the network architecture is reconfigured into a concurrent controller format, similar to Model Predictive Control (MPC). This controller processes a sequence of vehicle motion reference signals over a future period, utilizing a multi-layer perceptron to generate the simulator’s motion reference control signal sequences for the same duration. Unlike the online optimization employed in MPC, this algorithm as an offline optimization method, providing substantial computational advantages when integrated into the driving simulator. As the prediction horizon increases, the algorithm demonstrates superior computational efficiency, which helps reduce the incidence of motion sickness during the use of the driving simulator. Third, a loss function specifically designed for the motion simulator is proposed. This function incorporates constraints derived from the MPC framework to address workspace limitations and applies them to workspace management. These constraints restrict the platform’s acceleration and speed near the workspace boundaries, allowing for better utilization of the available space. The algorithm is validated using Carla’s autonomous driving simulation software as the dataset generator. During the training process, the proposed algorithm in this paper achieves an order-of-magnitude improvement in convergence speed compared to conventional training methods of PPO and DDPG. Simulations with a 10-step prediction horizon indicate that the Root Mean Square Error (RMSE) produced by this algorithm is comparable to that of the MCA based on MPC (MPC-MCA) and significantly lower than that of the MCA based classical washout (CW-MCA). At higher prediction horizons, the algorithm achieves performance on par with state-of-the-art MPC-based motion cueing algorithms while exhibiting reduced algorithmic delay. Additionally, the proposed algorithm delivers quicker results and improved tracking performance across all prediction horizons, ultimately surpassing the current state-of-the-art MPC-MCA.