Objective  Clustering partitions given data into clusters without requiring knowledge of the true labels. In other words, clustering is an unsupervised learning process that makes samples within the same cluster as similar as possible and samples in different clusters as dissimilar as possible. Traditional clustering algorithms can be categorized into five primary types: hierarchical, density-based, grid-based, model-based, and partition-based. However, these traditional models are sensitive to the cluster number K, outliers, and noise points; in addition, their initial cluster centers are set artificially, and the cluster shapes cannot extend arbitrarily. Spectral clustering, a clustering algorithm based on spectral graph theory, transforms the clustering problem into an optimal partition problem on a graph. It has become one of the most widely adopted techniques in domains such as data mining, face recognition, and image segmentation.

Methods  Spectral clustering is simple to implement, robust to high-dimensional data, and flexible with respect to the shape of the sample space; it admits global optimization and often outperforms traditional clustering algorithms. However, existing spectral clustering methods still face challenges such as determining the scale parameter, suboptimal results on multi-scale datasets, and unstable clustering outcomes. A density-adjusted spectral clustering algorithm based on neighborhood standard deviation is therefore proposed to address the unstable clustering results caused by the random selection of initial cluster centers and the artificial setting of the scale parameter; the approach incorporates both global and local data information. The proposed algorithm treats the initial class centers and the scale parameter as decision variables to be optimized adaptively: neighborhood information determines the initial cluster centers, while the scale parameter is optimized to improve the expression of the Gaussian kernel function. The scale parameter, the local density of a sample, and the initial cluster centers of the density peaks are all redefined through the neighborhood standard deviation, so the algorithm requires no neighborhood parameter as input and produces stable clustering results. First, an initial class center decision value algorithm based on density peaks (DP_KD) is designed to replace the random selection of initial centers: in conventional K-means-style initialization, K points are drawn at random from the dataset as initial cluster centers and classes are assigned around them, which makes the final partition depend on the draw. In DP_KD, the reciprocal of a sample's neighborhood standard deviation measures its local density, and the average distance between samples defines the neighborhood radius of each sample. Each sample is then assigned the minimum Euclidean distance to any sample of higher density, while the sample with the highest density is assigned the maximum distance among all samples. Finally, the product of the local density and this distance is defined as the initial class center decision value λ, and the K samples with the largest λ values are selected as the initial class centers, forming the decision value algorithm DP_KD.
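To make the construction concrete, the following is a minimal Python sketch of how the DP_KD decision value could be computed from the description above. The reading of "neighborhood standard deviation" as the standard deviation of the distances to the neighbors inside the radius, and all function and variable names, are our assumptions rather than the paper's exact formulation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def dp_kd_centers(X, k):
    """Return indices of k initial centers via the decision value lambda = rho * delta."""
    d = cdist(X, X)                          # pairwise Euclidean distances
    n = X.shape[0]
    # Neighborhood radius: the average distance between samples.
    radius = d[np.triu_indices(n, 1)].mean()
    # Local density rho: reciprocal of the standard deviation of the
    # distances to the neighbors that fall inside the radius.
    rho = np.zeros(n)
    for i in range(n):
        neigh = d[i][(d[i] > 0) & (d[i] <= radius)]
        if neigh.size > 1:
            rho[i] = 1.0 / (neigh.std() + 1e-12)
    # delta: minimum distance to any denser sample; the densest sample
    # instead receives the maximum pairwise distance.
    order = np.argsort(-rho)                 # samples from densest to sparsest
    delta = np.empty(n)
    delta[order[0]] = d[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        delta[i] = d[i, order[:rank]].min()
    lam = rho * delta                        # decision value lambda
    return np.argsort(-lam)[:k]              # k samples with the largest lambda
```

Seeding the K-means step with these indices, rather than with a random draw, is what removes the run-to-run variation in the final partition.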
Results and Discussions  A change in the scale parameter influences the construction of the similarity matrix, which in turn affects the Laplacian matrix and the formation of the eigenvector space. In spectral clustering, the similarity matrix is critical in determining the quality of the final clustering results. Therefore, the Euclidean distance between a sample and its p-th nearest neighbor is introduced as that sample's scale parameter, using neighbor information to improve the similarity matrix and reduce the sensitivity of the algorithm to the scale parameter. On this basis, combining the optimized initial class center decision value with the nearest-neighbor scale parameter, a density-adjusted spectral clustering algorithm based on neighborhood standard deviation (DSSD) is proposed to address the instability of clustering results caused by the random selection of initial cluster centers and the artificial setting of scale parameters. On multi-scale datasets, the proposed algorithm effectively fine-tunes the similarity between sample points in regions of varying density.

Conclusions  The DP_KD algorithm is compared with IK_DM, IIEFA, and OFMMK-means, with the classical K-means and K-medoids algorithms, and with the AP and SNN-MSC algorithms on seven UCI datasets: Iris, Wine, Seeds, Yeast, Glass, Ecoli, and Waveform. Four evaluation metrics, Accuracy (ACC), Purity, Rand Index (RI), and F-measure (F1), are employed, with higher values indicating better clustering performance. On all seven datasets, DP_KD outperforms the other algorithms, which demonstrates the feasibility and effectiveness of integrating DP_KD into density-adjusted spectral clustering as a replacement for classical clustering algorithms. In addition, the DSSD algorithm is compared with four spectral clustering algorithms, ADSC, WDSC, SC, and PRSC, on seven UCI datasets, Sonar, Musk, Breast, Statlog, Madelon, Segmentation, and Waveform, with the ACC index used to evaluate the clustering effect. The experimental results show that DSSD improves the clustering effectiveness of density-adjusted spectral clustering, outperforms the other advanced spectral clustering algorithms, and yields more stable clustering results.
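As a closing illustration, below is a hedged sketch of how the DSSD pipeline described in the Results and Discussions could be assembled: the scale parameter sigma_i is taken as the distance to the p-th nearest neighbor, the similarity is the locally scaled Gaussian kernel S_ij = exp(-d_ij^2 / (sigma_i * sigma_j)), and the K-means step in the spectral embedding is seeded with the DP_KD centers from the earlier sketch. The symmetric normalized Laplacian, the default p = 7, and the helper names are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def dssd(X, k, p=7):
    """Density-adjusted spectral clustering sketch; returns cluster labels."""
    d = cdist(X, X)
    # Scale parameter sigma_i: distance to the p-th nearest neighbor
    # (column 0 of each sorted row is the sample's zero self-distance).
    sigma = np.sort(d, axis=1)[:, p]
    # Density-adjusted Gaussian similarity S_ij = exp(-d_ij^2 / (sigma_i * sigma_j)).
    S = np.exp(-d ** 2 / (np.outer(sigma, sigma) + 1e-12))
    np.fill_diagonal(S, 0.0)
    # Symmetric normalized Laplacian L = I - D^(-1/2) S D^(-1/2).
    inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1) + 1e-12)
    L = np.eye(len(X)) - inv_sqrt[:, None] * S * inv_sqrt[None, :]
    # Spectral embedding: rows of the k eigenvectors with smallest eigenvalues.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    # Seed K-means with the DP_KD centers mapped into the embedding,
    # instead of random initial centers (dp_kd_centers from the sketch above).
    idx = dp_kd_centers(X, k)
    return KMeans(n_clusters=k, init=U[idx], n_init=1).fit_predict(U)
```

Because both the scale parameters and the initial centers are derived from the data rather than chosen by hand or at random, repeated runs of this pipeline on the same dataset produce the same partition, which is the stability property the abstract emphasizes.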