-  Z.-H. Zhou. Ensemble Methods: Foundations and Algorithms, Chapman & Hall/CRC, 2012. [Chinese version][Japanese version]
-  Z.-H. Zhou, J. Feng. Deep Forest. National Science Review, 2019, 6(1): 74-86. (early version in IJCAI'17)
-  W. Gao, Z.-H. Zhou. On the doubt about margin explanation of boosting. Artificial Intelligence, 2013, 203: 1-18.
-  F. T. Liu, K. M. Ting, Z.-H. Zhou. Isolation-based anomaly detection. ACM Transactions on Knowledge Discovery from Data, 2012, 6(1): article 3. (early version in ICDM'08)
-  Z.-H. Zhou. When semi-supervised learning meets ensemble learning. In: Proceedings of the 8th International Workshop on Multiple Classifier Systems (MCS'09), keynote article, LNCS 5519, 2009, pp.529-538.
-  Z.-H. Zhou, M. Li. Tri-training: Exploiting unlabeled data using three classifiers. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(11): 1529-1541.
-  Z.-H. Zhou, Y. Jiang. NeC4.5: Neural ensemble based C4.5. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(6): 770-773.
-  Z.-H. Zhou, J. Wu, W. Tang. Ensembling neural networks: Many could be better than all. Artificial Intelligence, 2002, 137(1-2): 239-263. (early version in IJCAI'01)
-  Z.-H. Zhou, Y. Jiang, Y.-B. Yang, S.-F. Chen. Lung cancer cell identification based on artificial neural network ensembles. Artificial Intelligence in Medicine, 2002, 24(1): 25-36.
This book provides a concise but comprehensive introduction to ensemble learning. The author's contributions can be found in many important topics of this field, spanning theoretical, algorithmic, and application aspects.
This work argued that "deep learning" is not limited to "deep neural networks", nor to models built on differentiable modules. The proposed deep forest is the first deep model built on non-differentiable modules, relying on neither backpropagation nor gradients. It has exhibited advantages on tabular data and can be further enhanced when hardware for tree learning, analogous to GPUs for deep neural networks, becomes available in the future.
This work addressed the long-standing fundamental problem: why does AdaBoost seem resistant to overfitting? The answer lies in the theoretical finding that AdaBoost simultaneously maximizes the margin mean and minimizes the margin variance, and can continue to do so even after the training error reaches zero (this also implies that AdaBoost will eventually overfit, though very late). This understanding inspired the ODM (Optimal margin Distribution Machine) [Zhou, ANNPR'14; Zhang & Zhou, TKDE'20]. [CVPR'21 keynote video]
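As a rough illustration only (not the paper's analysis), one can watch the margin distribution of a boosted ensemble evolve across rounds using scikit-learn's `AdaBoostClassifier`; the dataset and all parameter choices below are hypothetical:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Illustrative synthetic data and parameters (not from the paper).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X, y)

sign = np.where(y == 1, 1.0, -1.0)
# Track the signed margins y * f(x) across boosting rounds: the mean
# tends to keep rising and the variance shrinking even after the
# training error reaches zero.
for t, f in enumerate(clf.staged_decision_function(X), start=1):
    margins = sign * f
    if t in (10, 50, 200):
        print(f"round {t}: mean={margins.mean():.3f}, var={margins.var():.3f}")
```

Printing a few snapshots of the margin mean and variance is enough to see the trend described above on most separable toy datasets.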
This work claimed that isolation from the majority is the fundamental property of anomalies, possibly more crucial than the previously emphasized (large) distance or (low) density. The proposed isolation forest (iForest) has become a widely used anomaly detection algorithm, available in various toolboxes such as scikit-learn.
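A minimal usage sketch of the scikit-learn implementation (`sklearn.ensemble.IsolationForest`); the synthetic data here is purely illustrative:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X_normal = 0.3 * rng.randn(100, 2)       # dense cluster around the origin
X_outlier = np.array([[10.0, 10.0]])     # an isolated point far away
X = np.vstack([X_normal, X_outlier])

clf = IsolationForest(random_state=0).fit(X)
# By scikit-learn convention, predict returns -1 for anomalies, +1 for inliers.
print(clf.predict(X_outlier))
```

The isolated point is flagged with few random splits, without any distance or density estimate, which is exactly the intuition behind iForest.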
Ensemble learning and semi-supervised learning are two machine learning branches that developed almost separately. This work advocated leveraging ensembles and unlabeled data simultaneously, an idea that has been well adopted by the ensemble community. It also gave birth to a new strategy for ensemble diversity enhancement [Zhang & Zhou, DMKD'13].
An early work leveraging ensemble learning and unlabeled data.
A similar idea became popular ten years later under the name knowledge distillation.
This work showed that an ensemble can be pruned to a smaller one with improved generalization; this overturned the common belief that ensemble pruning could reduce storage and prediction cost only at the expense of generalization. It initiated optimization-based pruning, now a mainstream approach to ensemble pruning (also called selective ensemble).
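As a toy sketch only, not the paper's method (which optimizes combination weights with a genetic algorithm), greedy forward selection on a validation set conveys the optimization-based pruning idea; all names, data, and parameters below are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data and a small bagged ensemble of 20 trees.
X, y = make_classification(n_samples=400, n_features=12, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=1)

rng = np.random.RandomState(1)
trees = []
for _ in range(20):
    idx = rng.randint(0, len(X_tr), len(X_tr))        # bootstrap sample
    trees.append(DecisionTreeClassifier(random_state=1).fit(X_tr[idx], y_tr[idx]))

votes = np.array([t.predict(X_val) for t in trees])   # shape (20, n_val)

def vote_acc(subset):
    """Validation accuracy of the majority vote of a sub-ensemble."""
    maj = (votes[subset].mean(axis=0) > 0.5).astype(int)
    return float((maj == y_val).mean())

# Greedily add the tree that most improves validation accuracy; stop
# when no candidate helps, yielding a pruned (selective) ensemble.
selected, best = [], 0.0
while True:
    candidates = [i for i in range(len(trees)) if i not in selected]
    if not candidates:
        break
    acc, i = max((vote_acc(selected + [i]), i) for i in candidates)
    if selected and acc <= best:
        break
    best, selected = acc, selected + [i]

print(len(selected), best, vote_acc(list(range(len(trees)))))
```

The pruned sub-ensemble is typically much smaller than the full one, matching the "many could be better than all" observation, though this greedy variant is only one of many optimization-based pruning strategies.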
An early application of neural network ensembles to lung cancer diagnosis. A simple yet effective two-level ensemble architecture was designed to reduce missed alarms and increase overall reliability.