Abstract
In skeleton-based action recognition, abstracting the human body into a skeletal representation often discards crucial information, which can lead to the misclassification of similar actions. To address this issue, we propose a Cross-scale Spatial Refinement Graph Convolutional Network (CSR-GCN), which aims to improve action recognition accuracy by effectively capturing fine-grained features of skeleton sequences. Specifically, we introduce an Attention-based Graph Pooling (AGP) module and a Cross-scale Feature Aggregation (CFA) module. The AGP module uses graph pooling to construct multi-scale skeletal sub-graphs, capturing implicit joint relationships and preserving crucial motion details. It retains global motion information while emphasizing local joint interactions, enabling a better understanding of dynamic changes in complex actions. The CFA module then selectively integrates features from different spatial scales, enhancing feature distinctiveness while balancing global motion and local details. This multi-scale refinement of skeletal sequence representations captures subtle dynamic changes in actions more precisely and enhances the model's ability to recognize and classify complex movement patterns. Finally, we validate the effectiveness of our method on three large-scale datasets, where it achieves higher accuracy than other state-of-the-art methods.
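To make the pooling-then-fusion idea concrete, the following is a minimal NumPy sketch under stated assumptions; it is not the AGP or CFA module described above. Pooling is swapped for a DiffPool-style soft assignment (each joint attends over K coarse "parts"), and cross-scale aggregation is approximated by a softmax gate over the joint-level and part-level features. The function names, the projections `w_assign` and `w_gate`, the random weights, and the toy 25-joint chain graph are all hypothetical, and the sketch handles a single frame with no temporal dimension.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_graph_pool(x, adj, w_assign):
    """Hypothetical soft-assignment pooling (DiffPool-style stand-in).

    x: (N, C) joint features; adj: (N, N) adjacency;
    w_assign: (C, K) assumed projection giving assignment logits.
    """
    s = softmax(x @ w_assign, axis=1)   # (N, K): each joint's attention over K parts
    x_coarse = s.T @ x                  # (K, C): attention-weighted part features
    adj_coarse = s.T @ adj @ s          # (K, K): induced part-level graph
    return x_coarse, adj_coarse, s

def cross_scale_aggregate(x_fine, x_coarse, s, w_gate):
    """Hypothetical gated fusion of joint-level and part-level features.

    w_gate: (C, 2) assumed projection that scores each scale.
    """
    x_up = s @ x_coarse                 # (N, C): broadcast part features back to joints
    stacked = np.stack([x_fine, x_up])  # (2, N, C): one slice per scale
    # One scalar score per scale, computed from its mean-pooled descriptor
    scores = np.array([stacked[i].mean(axis=0) @ w_gate[:, i] for i in range(2)])
    gates = softmax(scores)             # (2,): scale weights summing to 1
    return np.tensordot(gates, stacked, axes=1)  # (N, C): weighted sum of scales

# Usage on random data (toy chain graph, not a real skeleton topology)
rng = np.random.default_rng(0)
N, C, K = 25, 16, 5
x = rng.standard_normal((N, C))
adj = np.eye(N, k=1) + np.eye(N, k=-1)
w_assign = rng.standard_normal((C, K))
w_gate = rng.standard_normal((C, 2))

x_c, a_c, s = attention_graph_pool(x, adj, w_assign)
out = cross_scale_aggregate(x, x_c, s, w_gate)
print(x_c.shape, a_c.shape, out.shape)  # (5, 16) (5, 5) (25, 16)
```

Because every joint's assignment row sums to one, the coarse graph inherits the symmetry of the input adjacency, and the upsampled part features can be mixed with the original joint features at the same resolution. A learned model would train `w_assign` and `w_gate` end to end and apply this per frame inside the GCN.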