Pulsar candidate identification is an indispensable task in pulsar science. Motivated by the imbalance and diversity of pulsar data sets and by the lack of a unified processing framework, we first used dimensionality reduction and visualization to analyze potential deficiencies caused by the incompleteness of current data set extraction methods. We found that the limited use of non-pulsar data may bias the results and thereby limit generalization ability. Based on the dimensionality reduction results, we propose a Grid Group Uniform Sampling (GGUS) method. This data preprocessing method improves the performance of Random Forest, Support Vector Machine, Convolutional Neural Network, and ResNet50 models on Lyon's features, diagnostic plots, and period-dispersion measure (period-DM) plots in the HTRU1 data set. Across all models and data sets, the average recall increased by approximately 0.5%, precision by nearly 2%, and F1 score by around 1.2%. In the period-DM plot test, the high-performance ResNet50 algorithm achieved an F1 score of over 98% using random sampling. GGUS yielded further improvements in this test, raising the average F1 score, precision, and recall by approximately 0.07%, 0.1%, and 0.03%, respectively.
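
The abstract does not spell out the GGUS procedure, so the following is only a minimal sketch of one plausible reading of "grid group uniform sampling", not the authors' implementation. It assumes that the non-pulsar candidates have already been projected to a 2D embedding by some dimensionality reduction step, that the embedding is divided into a regular grid, and that samples are drawn evenly across occupied grid cells rather than evenly across candidates. The function name `grid_group_uniform_sample` and the parameters `grid_size` and `seed` are hypothetical.

```python
import numpy as np


def grid_group_uniform_sample(embedding, n_samples, grid_size=10, seed=0):
    """Pick n_samples row indices spread evenly over occupied grid cells.

    Hypothetical sketch: `embedding` is an (N, 2) array from any
    dimensionality reduction method; this is an assumed reading of
    grid-based sampling, not the published GGUS code.
    """
    embedding = np.asarray(embedding, dtype=float)
    if embedding.ndim != 2 or embedding.shape[1] != 2:
        raise ValueError("embedding must have shape (N, 2)")
    if n_samples > embedding.shape[0]:
        raise ValueError("n_samples exceeds the number of points")

    rng = np.random.default_rng(seed)

    # Rescale each axis to [0, 1] so the grid covers the whole embedding.
    mins = embedding.min(axis=0)
    spans = embedding.max(axis=0) - mins
    spans[spans == 0] = 1.0  # guard against a degenerate axis
    scaled = (embedding - mins) / spans

    # Assign each point to a grid cell; clip so the maximum lands in the last cell.
    cells = np.minimum((scaled * grid_size).astype(int), grid_size - 1)
    cell_ids = cells[:, 0] * grid_size + cells[:, 1]

    # Group point indices by cell and shuffle within each group.
    groups = [
        rng.permutation(np.flatnonzero(cell_ids == c))
        for c in np.unique(cell_ids)
    ]

    # Round-robin over the groups: one point per occupied cell per pass,
    # so sparse regions are not swamped by dense ones.
    selected = []
    depth = 0
    while len(selected) < n_samples:
        for group in groups:
            if depth < len(group) and len(selected) < n_samples:
                selected.append(int(group[depth]))
        depth += 1
    return np.array(selected)
```

Under these assumptions, a dense cluster of similar non-pulsar candidates contributes no more samples per pass than an isolated candidate in a sparse cell, which is the contrast with plain random sampling that the abstract draws.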