[Home]


Zhi-Hua Zhou's Publications

More publications @ {LAMDA website} {DBLP} {Google Scholar}


[Ensemble Learning] [Evolutionary Learning] [Weakly Supervised Learning] [Multi-label Learning] [Imbalanced Learning] [Learning Theory] [Face Recognition] [Recent Publications]

Selected Publications
(books first, then articles in reverse chronological order within each section)
Ensemble Learning
  • [1] Z.-H. Zhou. Ensemble Methods: Foundations and Algorithms, 2nd edition, Boca Raton, FL: CRC Press, 2025. [1st edition Chinese version][1st edition Japanese version]
  • This book provided a concise but comprehensive introduction to ensemble learning. The author's contributions can be found in many important topics of this field, including theoretical, algorithmic, and application aspects.

  • [2] Z.-H. Zhou, J. Feng. Deep Forest. National Science Review, 2019, 6(1): 74-86. (early version in IJCAI'17)
  • This work argued that "deep learning" is not limited to deep neural networks, nor to models built on differentiable modules. The proposed deep forest is the first deep model built on non-differentiable modules, relying on neither backpropagation nor gradients. It has exhibited advantages on tabular data and could be further enhanced once hardware accelerators for tree learning, analogous to GPUs for deep neural networks, become available. A minimal sketch of the cascade structure follows.
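
  The cascade idea can be illustrated compactly: each level holds several forests whose class-probability outputs are concatenated to the original features and fed to the next level. The sketch below is a much-simplified, assumed rendering with scikit-learn forests, not the authors' gcForest implementation (which adds multi-grained scanning, automatic level growth, etc.):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_classification(n_samples=2000, random_state=0)  # toy data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

in_tr, in_te = X_tr, X_te
for level in range(2):  # two fixed cascade levels; gcForest grows levels adaptively
    aug_tr, aug_te = [], []
    for forest in (RandomForestClassifier(random_state=level),
                   ExtraTreesClassifier(random_state=level)):
        # out-of-fold probabilities keep training labels from leaking downstream
        aug_tr.append(cross_val_predict(forest, in_tr, y_tr, method="predict_proba"))
        aug_te.append(forest.fit(in_tr, y_tr).predict_proba(in_te))
    # the next level sees the raw features plus every forest's class probabilities
    in_tr = np.hstack([X_tr] + aug_tr)
    in_te = np.hstack([X_te] + aug_te)

y_pred = np.mean(aug_te, axis=0).argmax(axis=1)  # average the last level's forests
print("test accuracy:", (y_pred == y_te).mean())
```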

  • [3] W. Gao, Z.-H. Zhou. On the doubt about margin explanation of boosting. Artificial Intelligence, 2013, 203: 1-18.
  • This work addressed the long-standing fundamental problem of why AdaBoost seems resistant to overfitting. The answer lies in the theoretical finding that AdaBoost simultaneously maximizes the margin mean and minimizes the margin variance, and can continue to do so even after the training error reaches zero (this also implies that AdaBoost will eventually overfit, though very late). This understanding inspired the ODM (Optimal margin Distribution Machine) [Zhou, ANNPR'14; Zhang & Zhou, TKDE'20]. A sketch for inspecting these two quantities follows. [CVPR'21 keynote video]
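
  The two quantities the theory highlights can be inspected directly; a minimal sketch with scikit-learn's AdaBoost on assumed toy data, taking the binary margin as y·f(x) with f(x) the weighted vote normalized to [-1, 1]:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1000, random_state=0)  # toy data
clf = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X, y)

# sklearn already normalizes decision_function by the sum of learner weights,
# so for two classes it lies in [-1, 1] and y_signed * f(x) is the margin
y_signed = np.where(y == 1, 1, -1)
margins = y_signed * clf.decision_function(X)
print(f"margin mean: {margins.mean():.3f}, margin variance: {margins.var():.3f}")
```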

  • [4] F. T. Liu, K. M. Ting, Z.-H. Zhou. Isolation-based anomaly detection. ACM Trans. on Knowledge Discovery from Data, 2012, 6(1): article 3. (early version in ICDM'08)
  • This work claimed that isolation from the majority is the fundamental property of anomalies, possibly more crucial than the previously emphasized (large) distance or (low) density. The proposed isolation forest (iForest) has become a popular anomaly detection algorithm available in various toolboxes such as scikit-learn, as shown below.
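
  Since the commentary mentions scikit-learn, here is iForest as shipped there, on assumed toy data:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 2)),     # a dense cluster of inliers
               rng.uniform(-6, 6, (20, 2))])   # scattered anomalies

iso = IsolationForest(n_estimators=100, random_state=0).fit(X)
labels = iso.predict(X)                        # +1 = inlier, -1 = anomaly
print((labels == -1).sum(), "points flagged as anomalies")
```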

  • [5] Z.-H. Zhou. When semi-supervised learning meets ensemble learning. In: Proceedings of 8th International Workshop on Multiple Classifier Systems (MCS'09), keynote article, LNCS 5519, 2009, pp.529-538.
  • Ensemble learning and semi-supervised learning are two machine learning branches that developed almost separately. This work advocated leveraging ensembles and unlabeled data simultaneously, which has been well adopted by the ensemble community. It also gave birth to a new strategy for enhancing ensemble diversity [Zhang & Zhou, DMKD'13].

  • [6] Z.-H. Zhou, M. Li. Tri-training: Exploiting unlabeled data using three classifiers. IEEE Trans. on Knowledge and Data Engineering, 2005, 17(11): 1529-1541.
  • An early work leveraging ensemble learning and unlabeled data: three classifiers are trained, and when two of them agree on an unlabeled example, it is used as a pseudo-labeled example for the third. A simplified sketch follows.
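
  A much-simplified sketch of the tri-training loop, assuming binary 0/1 labels and omitting the paper's error-rate-based safeguards on when pseudo-labels may be added:

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def tri_training(X_l, y_l, X_u, base=DecisionTreeClassifier(), rounds=5, seed=0):
    rng = np.random.default_rng(seed)
    # three classifiers, each trained on a bootstrap sample of the labeled data
    clfs = []
    for _ in range(3):
        idx = rng.integers(0, len(X_l), len(X_l))
        clfs.append(clone(base).fit(X_l[idx], y_l[idx]))
    for _ in range(rounds):
        preds = [c.predict(X_u) for c in clfs]
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            agree = preds[j] == preds[k]      # the other two classifiers agree
            if agree.any():
                X_new = np.vstack([X_l, X_u[agree]])
                y_new = np.concatenate([y_l, preds[j][agree]])
                clfs[i] = clone(base).fit(X_new, y_new)
    return clfs

def predict(clfs, X):                          # majority vote of the three
    votes = np.stack([c.predict(X) for c in clfs])
    return (votes.sum(axis=0) >= 2).astype(int)
```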

  • [7] Z.-H. Zhou, Y. Jiang. NeC4.5: Neural ensemble based C4.5. IEEE Trans. on Knowledge and Data Engineering, 2004, 16(6): 770-773.
  • A similar idea became popular ten years later under the name knowledge distillation; a sketch of the underlying idea follows.
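
  The underlying idea, training an ensemble and then letting a single comprehensible model learn from the ensemble's outputs, can be sketched with assumed scikit-learn stand-ins (a random forest in place of the neural ensemble, CART in place of C4.5):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, random_state=0)  # toy data
X_tr, X_extra, y_tr, _ = train_test_split(X, y, test_size=0.5, random_state=0)

teacher = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# the single tree learns from the ensemble's labels, on the original training
# data plus additional data relabeled by the ensemble
X_all = np.vstack([X_tr, X_extra])
student = DecisionTreeClassifier(random_state=0).fit(X_all, teacher.predict(X_all))
```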

  • [8] Z.-H. Zhou, J. Wu, W. Tang. Ensembling neural networks: Many could be better than all. Artificial Intelligence, 2002, 137(1-2): 239-263. (early version in IJCAI'01)
  • This work showed that an ensemble can be pruned to a smaller one with improved generalization; this overturned the common belief that ensemble pruning could reduce storage and prediction cost only at the price of generalization. It initiated optimization-based pruning, now a mainstream approach to ensemble pruning (also called selective ensemble). A greedy sketch of the idea follows.
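
  The paper's GASEN algorithm selects ensemble members with a genetic algorithm; as an assumed, simpler illustration of the selective-ensemble idea, the sketch below greedily keeps only the bagged learners that improve validation accuracy:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, random_state=0)  # toy data
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

bag = BaggingClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
preds = np.stack([est.predict(X_val) for est in bag.estimators_])  # (50, n_val)

# greedy forward selection: add a learner only if the voted accuracy improves
selected, best_acc, improved = [], 0.0, True
while improved:
    improved = False
    for i in set(range(len(preds))) - set(selected):
        vote = (preds[selected + [i]].mean(axis=0) >= 0.5).astype(int)
        acc = accuracy_score(y_val, vote)
        if acc > best_acc:
            best_acc, best_i, improved = acc, i, True
    if improved:
        selected.append(best_i)

print(f"kept {len(selected)}/50 learners, validation accuracy {best_acc:.3f}")
```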

  • [9] Z.-H. Zhou, Y. Jiang, Y.-B. Yang, S.-F. Chen. Lung cancer cell identification based on artificial neural network ensembles. Artificial Intelligence in Medicine, 2002, 24(1): 25-36.
  • An early application of neural network ensembles to lung cancer diagnosis. A simple yet effective two-level ensemble architecture was designed to reduce missed alarms and increase overall reliability.

[go top]
Evolutionary Learning
  • [10] Z.-H. Zhou, Y. Yu, C. Qian. Evolutionary Learning: Advances in Theories and Algorithms, Berlin: Springer, 2019. [Chinese version]
  • This book summarized the authors' efforts over the past two decades to establish theoretical foundations for exploiting evolutionary mechanisms in machine learning, aiming to overcome the "heuristic" nature of evolutionary algorithms (EAs). General theoretical techniques for the runtime and approximation analysis of EAs were developed, theoretical results about important factors of the evolutionary process were obtained, and new algorithms guided by these theoretical understandings were designed, including EAs that achieve better approximation guarantees than conventional algorithms. A minimal runtime-analysis example follows.
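
  The canonical object of the runtime analyses the book develops is the (1+1)-EA on the OneMax benchmark, whose expected running time is Θ(n log n); a minimal sketch:

```python
import numpy as np

def one_plus_one_ea(n=100, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, n)              # random initial bitstring
    steps = 0
    while x.sum() < n:                     # OneMax: maximize the number of 1-bits
        flips = rng.random(n) < 1.0 / n    # flip each bit independently w.p. 1/n
        child = np.where(flips, 1 - x, x)
        if child.sum() >= x.sum():         # keep the offspring if it is no worse
            x = child
        steps += 1
    return steps

print("steps to optimum:", one_plus_one_ea())
```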

[go top]
Weakly Supervised Learning
(incl. semi-supervised learning, active learning, multi-instance learning, crowdsourcing learning, etc.)
  • [11] Z.-H. Zhou. A brief introduction to weakly supervised learning. National Science Review, 2018, 5(1): 44-53.
  • This work offered a concise but comprehensive introduction to weakly supervised learning, and defined the scope of this field according to three kinds of weak supervision: insufficient, inexact, and inaccurate supervision.

  • [12] Y.-F. Li, Z.-H. Zhou. Towards making unlabeled data never hurt. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2015, 37(1): 175-188. (early version in ICML'11)
  • A fundamental weakness of semi-supervised learning algorithms is that in many cases they can perform worse than supervised counterparts using only labeled data. This work tackled the issue by proposing the S4VM (Safe Semi-Supervised SVM) algorithm, with a theoretical guarantee.

  • [13] S.-J. Huang, R. Jin, Z.-H. Zhou. Active learning by querying informative and representative examples. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2014, 36(10): 1936-1949. (early version in NIPS'10)
  • This work proposed to leverage informativeness and representativeness for active learning in a principled way, and presented the QUIRE algorithm. For contrast, a sketch of the common informativeness-only baseline follows.
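
  QUIRE itself selects queries by solving a min-max problem over both criteria; the usual informativeness-only baseline it improves on is plain uncertainty sampling, sketched here on assumed toy data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)  # toy data
labeled = list(range(10))                       # start from 10 labeled examples
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(20):                             # query 20 more labels
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])[:, 1]
    query = pool[int(np.argmin(np.abs(proba - 0.5)))]  # most uncertain example
    labeled.append(query)
    pool.remove(query)

print("accuracy on the remaining pool:", clf.score(X[pool], y[pool]))
```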

  • [14] W. Wang, Z.-H. Zhou. Multi-view active learning in the non-realizable case. In: Advances in Neural Information Processing Systems 23 (NIPS'10), 2010, pp.2388-2396.
  • Active learning was believed to help reduce sample complexity, but theoretical results were mostly built on the realizability assumption, which rarely holds in practice, while analyses allowing non-realizability were generally pessimistic. This work presented the first encouraging result on active learning in the non-realizable case by exploiting multi-view data characteristics.

  • [15] W. Wang, Z.-H. Zhou. A new analysis on co-training. In: Proceedings of the 27th International Conference on Machine Learning (ICML'10), 2010, pp.1135-1142.
  • Co-training is a very successful semi-supervised learning algorithm, but it was believed to rely on the existence of two independent and sufficient feature sets. This work proved the necessary and sufficient condition for co-training, confirming our ECML'07 result (a sufficient condition) that co-training only requires enough diversity between the classifiers; this provided a theoretical foundation for co-training variants designed for common data with only a single feature set. A bare-bones sketch of the co-training loop follows.
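
  For readers unfamiliar with the mechanism being analyzed, a bare-bones co-training loop under the classic two-view setup (base classifier and parameters are assumed choices for illustration):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_training(X1, X2, y, X1_u, X2_u, rounds=10, per_round=5):
    """X1/X2: the two views of the labeled data; X1_u/X2_u: views of unlabeled data."""
    unl = np.arange(len(X1_u))                 # indices still unlabeled
    for _ in range(rounds):
        if len(unl) == 0:
            break
        h1, h2 = GaussianNB().fit(X1, y), GaussianNB().fit(X2, y)
        for h, Xu in ((h1, X1_u), (h2, X2_u)):
            if len(unl) == 0:
                break
            # each view's classifier labels its most confident unlabeled examples
            conf = h.predict_proba(Xu[unl]).max(axis=1)
            pick = unl[np.argsort(conf)[-per_round:]]
            X1, X2 = np.vstack([X1, X1_u[pick]]), np.vstack([X2, X2_u[pick]])
            y = np.concatenate([y, h.predict(Xu[pick])])
            unl = np.setdiff1d(unl, pick)
    return h1, h2
```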

  • [16] Z.-H. Zhou, Y.-Y. Sun, Y.-F. Li. Multi-instance learning by treating instances as non-i.i.d. samples. In: Proceedings of the 26th International Conference on Machine Learning (ICML'09), 2009, pp.1249-1256.
  • This work claimed that instances within a multi-instance bag should not be regarded as i.i.d. samples, and proposed the miGraph algorithm.

  • [17] Z.-H. Zhou, J.-M. Xu. On the relation between multi-instance learning and semi-supervised learning. In: Proceedings of the 24th International Conference on Machine Learning (ICML'07), 2007, pp.1167-1174.
  • This work disclosed the connection between multi-instance learning and semi-supervised learning, showing that it is possible to address multi-instance learning from the view of semi-supervised learning.

  • [18] D. Zhang, Z.-H. Zhou, S. Chen. Semi-supervised dimensionality reduction. In: Proceedings of the 7th SIAM International Conference on Data Mining (SDM'07), 2007, pp.629-634.
  • The first work on semi-supervised dimensionality reduction.

  • [19] Z.-H. Zhou, M. Li. Semisupervised regression with co-training style algorithms. IEEE Trans. on Knowledge and Data Engineering, 2007, 19(11): 1479-1493. (early version in IJCAI'05)
  • The first work on semi-supervised regression.

  • [20] Z.-H. Zhou, K.-J. Chen, H.-B. Dai. Enhancing relevance feedback in image retrieval using unlabeled data. ACM Trans. on Information Systems, 2006, 24(2): 219-244.
  • An early work on combining semi-supervised learning and active learning, in the context of image retrieval.

[go top]
Multi-label Learning
  • [21] X.-Z. Wu, Z.-H. Zhou. A unified view of multi-label performance measures. In: Proceedings of the 34th International Conference on Machine Learning (ICML'17), 2017, pp.3780-3788.
  • This work offered a theoretical understanding of the relations among various multi-label performance measures, showing that they can be optimized by maximizing the instance-wise margin and/or the label-wise margin defined in the work. Several of these measures are computed below.
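
  A few of the measures the paper studies, computed with scikit-learn on an assumed toy label matrix:

```python
import numpy as np
from sklearn.metrics import hamming_loss, f1_score, label_ranking_loss

Y_true  = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])          # true label sets
Y_score = np.array([[0.9, 0.2, 0.6], [0.1, 0.8, 0.4], [0.7, 0.6, 0.3]])
Y_pred  = (Y_score >= 0.5).astype(int)                          # thresholded labels

print("hamming loss :", hamming_loss(Y_true, Y_pred))
print("micro-F1     :", f1_score(Y_true, Y_pred, average="micro"))
print("macro-F1     :", f1_score(Y_true, Y_pred, average="macro"))
print("ranking loss :", label_ranking_loss(Y_true, Y_score))
```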

  • [22] M.-L. Zhang, Z.-H. Zhou. A review on multi-label learning algorithms. IEEE Trans. on Knowledge and Data Engineering, 2014, 26(8): 1819-1837.
  • This work proposed to categorize multi-label learning algorithms into two groups, i.e., problem transformation and algorithm adaptation, and offered a review according to this categorization. The simplest problem-transformation approach is sketched below.
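
  Problem transformation in its simplest form, binary relevance, decomposes a multi-label task into one binary problem per label; a sketch with scikit-learn on assumed toy data:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

X, Y = make_multilabel_classification(n_samples=500, n_labels=3, random_state=0)

# one independent binary classifier per label (binary relevance)
br = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
print(br.predict(X[:3]))   # each row is a predicted label set
```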

  • [23] W. Gao, Z.-H. Zhou. On the consistency of multi-label learning. Artificial Intelligence, 2013, 199-200: 22-44. (early version in COLT'11)
  • This work presented the first theoretical analysis of the Bayes consistency of multi-label learning, disclosing the surprising fact that several commonly used multi-label performance measures are inconsistent.

  • [24] Z.-H. Zhou, M.-L. Zhang, S.-J. Huang, Y.-F. Li. Multi-instance multi-label learning. Artificial Intelligence, 2012, 176(1): 2291-2320. (early version in NIPS'06)
  • This work proposed the MIML (Multi-Instance Multi-Label learning) framework, which is widely used for learning with complicated data objects.

  • [25] M.-L. Zhang, Z.-H. Zhou. ML-kNN: A lazy learning approach to multi-label learning. Pattern Recognition, 2007, 40(7): 2038-2048.
  • This work proposed the well-known multi-label learning algorithm ML-kNN, which can be found in various toolboxes such as scikit-multilearn, as shown below.
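
  Usage via scikit-multilearn (assuming the package is installed; toy data assumed):

```python
from sklearn.datasets import make_multilabel_classification
from skmultilearn.adapt import MLkNN

X, Y = make_multilabel_classification(n_samples=500, n_labels=3, random_state=0)

clf = MLkNN(k=10)                      # k nearest neighbors
clf.fit(X, Y)
print(clf.predict(X[:3]).toarray())    # predictions come back as a sparse matrix
```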

  • [26] M.-L. Zhang, Z.-H. Zhou. Multilabel neural networks with applications to functional genomics and text categorization. IEEE Trans. on Knowledge and Data Engineering, 2006, 18(10): 1338-1351.
  • This work proposed the first multi-label neural network algorithm BP-MLL that can be found in various toolboxes such as MULAN.

[go top]
Imbalanced Learning
(learning with unequal misclassification costs and/or imbalanced classes)
  • [27] Z.-H. Zhou, X.-Y. Liu. On multi-class cost-sensitive learning. Computational Intelligence, 2010, 26(3): 232-257. (early version in AAAI'06)
  • For multi-class classification with unequal misclassification costs, this work provided the condition under which closed-form solutions exist. The result has become common knowledge and is exploited in various toolboxes such as MATLAB. A sketch of the rescaling idea for two classes follows.
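
  For two classes the rescaling idea is straightforward: weight each class in proportion to the cost of misclassifying it. A sketch with scikit-learn (the cost numbers and data are assumed):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# cost[c] = cost of misclassifying an example whose true class is c (assumed)
cost = {0: 1.0, 1: 10.0}
clf = LogisticRegression(class_weight=cost, max_iter=1000).fit(X, y)

plain = LogisticRegression(max_iter=1000).fit(X, y)
print("positives flagged, plain vs cost-sensitive:",
      plain.predict(X).sum(), clf.predict(X).sum())
```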

  • [28] X.-Y. Liu, J. Wu, Z.-H. Zhou. Exploratory undersampling for class-imbalance learning. IEEE Trans. on Systems, Man, and Cybernetics - Part B: Cybernetics, 2009, 39(2): 539-550. (early version in ICDM'06)
  • This work proposed EasyEnsemble, an efficient class-imbalance learning algorithm that gains efficiency from undersampling while avoiding the risk of discarding important examples, by training on multiple independently undersampled subsets. Usage via imbalanced-learn is shown below.
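
  EasyEnsemble is available in the imbalanced-learn toolbox (assuming it is installed; toy data assumed):

```python
from sklearn.datasets import make_classification
from imblearn.ensemble import EasyEnsembleClassifier

# a 95:5 imbalanced binary problem
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)

# each of the 10 members is boosted on its own balanced undersample
clf = EasyEnsembleClassifier(n_estimators=10, random_state=0).fit(X, y)
print("minority recall:", clf.predict(X)[y == 1].mean())
```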

  • [29] X.-Y. Liu, Z.-H. Zhou. The influence of class imbalance on cost-sensitive learning: An empirical study. In: Proceedings of the 6th IEEE International Conference on Data Mining (ICDM'06), 2006, pp.970-974.
  • This work disclosed that rescaling is needed for severe class imbalance, whereas no special treatment is needed for mild class imbalance; this has become common knowledge.

  • [30] Z.-H. Zhou, X.-Y. Liu. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans. on Knowledge and Data Engineering, 2006, 18(1): 63-77.
  • This work described an attempt to address unequal misclassification costs and class imbalance in a unified framework, leading to the argument that cost-sensitive learning and class-imbalance learning are different in nature, which has become common knowledge.

[go top]
Learning Theory
(in addition to above)
  • [31] W. Gao, Z.-H. Zhou. Dropout Rademacher complexity of deep neural networks. Science China: Information Sciences, 2016, 59(7): 072104:1-072104:12.
  • This work disclosed that for shallow neural networks (with one or no hidden layer) dropout reduces the Rademacher complexity only polynomially, whereas for deep neural networks it can lead to an exponential reduction.

  • [32] T. Yang, Y.-F. Li, M. Mahdavi, R. Jin, Z.-H. Zhou. Nyström method vs random Fourier features: A theoretical and empirical comparison. In: Advances in Neural Information Processing Systems 25 (NIPS'12), 2012, pp.485-493.
  • The Nyström method and random Fourier features are both effective for large-scale kernel learning. This work disclosed that the Nyström method can generalize significantly better when there is a large gap in the eigen-spectrum of the kernel matrix. The two approximations are compared empirically below.
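
  Both approximations are available in scikit-learn, so an empirical comparison of the kind the paper reports can be set up in a few lines (toy data and parameters assumed):

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem, RBFSampler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=2000, random_state=0)  # toy data

for name, approx in [("Nystroem", Nystroem(gamma=0.1, n_components=100, random_state=0)),
                     ("RFF",      RBFSampler(gamma=0.1, n_components=100, random_state=0))]:
    # approximate the RBF kernel map, then fit a linear model on top
    pipe = make_pipeline(approx, LogisticRegression(max_iter=1000))
    print(name, cross_val_score(pipe, X, y).mean())
```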

  • [33] Y. Zhang, R. Jin, Z.-H. Zhou. Understanding bag-of-words model: a statistical framework. International Journal of Machine Learning and Cybernetics, 2010, 1(1): 43-52.
  • The bag-of-words model has been popular in various applications such as object recognition, but it had been difficult to analyze theoretically. This work presented a statistical framework enabling such analysis; the representation itself is illustrated below.
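
  The representation being analyzed, in its textual form via scikit-learn:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]  # toy corpus
vec = CountVectorizer()
bow = vec.fit_transform(docs)

print(vec.get_feature_names_out())
print(bow.toarray())   # each document reduces to unordered word counts
```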

[go top]
Face Recognition
  • [34] X. Geng, C. Yin, Z.-H. Zhou. Facial age estimation by learning from label distributions. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2013, 35(10): 2401-2412.
  • This work proposed the label distribution learning (LDL) framework, which has been widely used; for example, NASA scientists applied it to analyze Mars minerals found by the NASA MSL rover Curiosity. The core representation is sketched below.
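
  The core representation of LDL as applied to age estimation: each face is annotated not with one hard age but with a distribution of description degrees over neighboring ages (a discretized Gaussian is an assumed, common choice for illustration):

```python
import numpy as np

ages = np.arange(0, 101)

def age_label_distribution(true_age, sigma=2.0):
    # degree to which each candidate age describes the face, summing to 1
    d = np.exp(-0.5 * ((ages - true_age) / sigma) ** 2)
    return d / d.sum()

print(age_label_distribution(25)[22:29].round(3))  # mass concentrates near 25
```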

  • [35] Y. Zhang, Z.-H. Zhou. Cost-sensitive face recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2010, 32(10): 1758-1769. (early version in CVPR'08)
  • This work argued that face recognition has to consider unequal misclassification costs and developed multi-class solutions.

  • [36] X. Geng, Z.-H. Zhou, K. Smith-Miles. Automatic age estimation based on facial aging patterns. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2007, 29(12): 2234-2240.
  • An early study about facial age estimation. The proposed AGES (Aging Pattern Subspace) algorithm has become a baseline in this area.

  • [37] X. Tan, S. Chen, Z.-H. Zhou, F. Zhang. Face recognition from a single image per person: A survey. Pattern Recognition, 2006, 39(9): 1725-1745.
  • A survey including many of the authors' works on face recognition with a single training image per person. Many of the methods were later exploited under names such as zero-shot learning and few-shot learning.

  • [38] X. Geng, D.-C. Zhan, Z.-H. Zhou. Supervised nonlinear dimensionality reduction for visualization and classification. IEEE Trans. on Systems, Man, and Cybernetics - Part B: Cybernetics, 2005, 35(6): 1098-1107.
  • In contrast to most manifold learning algorithms, which aim to faithfully recover intrinsic low-dimensional embeddings, this work claimed that an embedding deliberately distorted by incorporating label information can be more suitable for classification.

  • [39] D. Zhang, Z.-H. Zhou. (2D)2PCA: Two-directional two-dimensional PCA for efficient face representation and recognition. Neurocomputing, 2005, 69(1-3): 224-231.
  • A simple yet efficient dimensionality reduction algorithm that considers both the row and column directions of image matrices; a numpy sketch follows.
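
  A compact numpy sketch of the method: eigenvectors of the column-direction and row-direction image scatter matrices give two small projection matrices, and each image A is reduced to C = Zᵀ A X (data and dimensions assumed for illustration):

```python
import numpy as np

def two_d2_pca(images, q=8, d=8):
    """images: (N, m, n) array. Returns Z (m, q) and X (n, d) projection matrices."""
    centered = images - images.mean(axis=0)
    G = np.einsum('imj,imk->jk', centered, centered)  # column-direction scatter (n, n)
    H = np.einsum('ijm,ikm->jk', centered, centered)  # row-direction scatter (m, m)
    X = np.linalg.eigh(G)[1][:, -d:]   # top-d eigenvectors (eigh sorts ascending)
    Z = np.linalg.eigh(H)[1][:, -q:]   # top-q eigenvectors
    return Z, X

rng = np.random.default_rng(0)
faces = rng.random((100, 32, 32))      # stand-in for a face image dataset
Z, X = two_d2_pca(faces)
C = Z.T @ faces[0] @ X                 # an 8 x 8 feature matrix per 32 x 32 image
print(C.shape)
```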

  • [40] F. J. Huang, T. Chen, Z.-H. Zhou, H.-J. Zhang. Pose invariant face recognition. In: Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), 2000, pp.245-250.
  • In contrast to other pose-invariant face recognition solutions of that time, which required error-prone pose estimation before recognition, the proposed neural network ensemble approach did not require pose estimation as input; instead, it output recognition results along with pose estimates.

[go top]
Some Recent Publications
[go top]