[Advanced Machine Learning Course]
Out-of-Distribution (OOD) Generalization for Tabular Data
Tabular data are widely used in real-world applications such as financial risk control, medical diagnosis, industrial inspection, and online advertising. They represent one of the most fundamental data formats in industry. However, unlike image or text tasks, tabular data typically exhibit high feature heterogeneity, a mixture of categorical and continuous variables, a lack of explicit structural relationships among features, limited data scale, and strong scenario-dependent distributions. These characteristics pose significant challenges to the generalization ability of deep learning models.
In traditional supervised learning settings, models are optimized under the assumption that the training and test data are independent and identically distributed (i.i.d.). In real-world scenarios, however, this assumption is often violated. For example:
- User demographics in finance may evolve over time, leading to temporal distribution shifts;
- Medical data collected from different hospitals may vary due to equipment differences, causing covariate shifts;
- User behavior patterns across regions may differ, resulting in label distribution shifts;
- The emergence of new categories or new risk types may induce concept drift.
These phenomena are collectively referred to as the Out-of-Distribution (OOD) Generalization problem. Under OOD settings, a model may perform well on the training distribution, but its performance can deteriorate significantly when the test distribution changes, sometimes leading to catastrophic failure. This issue is particularly critical in high-risk domains such as finance and healthcare.
In recent years, substantial progress has been made in OOD learning within computer vision, including domain generalization, test-time adaptation, and causal representation learning. However, many of these approaches rely on structural inductive biases inherent in image data, such as spatial invariance and convolutional architectures. For tabular data, due to the lack of explicit structural information and the semantic independence and non-exchangeability of feature dimensions, existing OOD techniques cannot be directly transferred.
Moreover, tabular data face several unique challenges in OOD scenarios:
- Non-shareable feature semantics: Features across datasets often differ significantly in meaning, making it difficult to build unified pre-trained models;
- Complex distribution shifts: Covariate shift, label shift, and concept drift may occur simultaneously;
- Small-data settings: Datasets are often too small for large-scale pretraining to be a feasible way of mitigating distribution shifts;
- Hidden causal relationships: Models based purely on statistical correlations tend to fail under distribution changes.
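The shift types named in the list above are commonly formalized through the factorization of the joint distribution over features x and labels y. The following is a standard textbook formulation, not one specific to TableShift; the symbols P_S (source, i.e. training distribution) and P_T (target, i.e. test distribution) are notation introduced here for illustration.

```latex
\begin{align*}
\text{Covariate shift:} \quad & P_S(x) \neq P_T(x), \qquad P_S(y \mid x) = P_T(y \mid x) \\
\text{Label shift:}     \quad & P_S(y) \neq P_T(y), \qquad P_S(x \mid y) = P_T(x \mid y) \\
\text{Concept drift:}   \quad & P_S(y \mid x) \neq P_T(y \mid x)
\end{align*}
```

In real tabular datasets these conditions rarely hold in isolation, which is why several shifts may need to be handled at once.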
Therefore, developing robust OOD generalization methods for tabular data has become an important research direction in machine learning. Although existing studies have made some progress, effectively modeling distribution shifts, constructing stable invariant features, and achieving reliable generalization without target-domain labels remain open and challenging problems.
Project Requirements
This project aims to design a self-supervised or weakly-supervised tabular machine learning method based on the TableShift benchmark, in order to improve model generalization under Out-of-Distribution (OOD) settings. The focus of this project is not merely to compare existing approaches, but to propose and implement a novel self-supervised or weakly-supervised mechanism (e.g., Test-Time Adaptation, TTA) that mitigates performance degradation caused by distribution shifts, without relying on target-domain labels.
The core research objectives include:
- Designing effective self-supervised or weakly-supervised learning signals;
- Improving model robustness under strict OOD settings;
- Analyzing the applicability and limitations of the proposed method under different types of distribution shifts.
I. Dataset Requirements
This project will be conducted on the TableShift benchmark. TableShift is a standardized benchmark designed for evaluating OOD generalization in tabular data. It includes multiple real-world datasets and provides clearly defined In-Distribution (ID) and Out-of-Distribution (OOD) splits.
All experiments must strictly follow the official train/validation/test splits. Under the standard OOD setting:
- OOD test labels must NOT be used for training or hyperparameter tuning.
The required datasets are:
- assistments
- nhanes_lead
- brfss_diabetes
- acsfoodstamps
- physionet
- acsunemployment
For details, refer to the official TableShift GitHub: https://github.com/mlfoundations/tableshift
For convenience, the data can also be downloaded from: https://box.nju.edu.cn/d/958c50ca9223485eadac/ (the password will be provided via the QQ group).
II. Method Design Requirements
The proposed method must be built around self-supervised or weakly-supervised mechanisms. Students may choose one of the following three categories or combine multiple directions. All methods must clearly specify the data accessible during the training stage.
Category 1: Domain Generalization (DG) Methods
Core idea: Learn invariant representations or robust objectives during training so that the model generalizes to unseen OOD domains.
Data constraints:
- Allowed: ID training data (with labels)
- Allowed: domain/group labels if provided
- Not allowed: OOD test data (features or labels)
Example directions:
- Invariant Representation Learning
- Distributionally Robust Optimization (DRO)
- Causal representation modeling
- Domain randomization via data augmentation
- Stable feature subspace decomposition for tabular data
Self-supervised extensions may be incorporated during training, such as feature reconstruction, mask prediction, or consistency constraints.
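As one illustration of the DRO direction listed above, the sketch below shows the group-reweighting step used in group-wise DRO: groups with a higher average loss receive exponentially larger weights, so training focuses on the worst-performing group. It is a minimal sketch, assuming per-sample losses and domain/group labels are already available from the ID training data. The function names and the step size `eta` are hypothetical and not part of TableShift or any library.

```python
import math
from collections import defaultdict


def group_mean_losses(losses, groups):
    """Average the per-sample losses within each group label."""
    sums, counts = defaultdict(float), defaultdict(int)
    for loss, g in zip(losses, groups):
        sums[g] += loss
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def update_group_weights(weights, group_losses, eta=1.0):
    """Exponentiated-gradient step: up-weight groups with higher loss.

    Assumes every group in `group_losses` already has an entry in `weights`.
    """
    raw = {g: weights[g] * math.exp(eta * loss) for g, loss in group_losses.items()}
    total = sum(raw.values())
    return {g: v / total for g, v in raw.items()}


def robust_loss(weights, group_losses):
    """Weighted sum of group losses; this is the quantity to minimize."""
    return sum(weights[g] * group_losses[g] for g in group_losses)


# Toy example with two groups (illustrative numbers only).
losses = [0.2, 0.4, 1.0, 1.4]
groups = ["a", "a", "b", "b"]
group_losses = group_mean_losses(losses, groups)
weights = update_group_weights({"a": 0.5, "b": 0.5}, group_losses)
print(weights, robust_loss(weights, group_losses))
```

In a full training loop, the weight update and a gradient step on the robust loss would alternate for every mini-batch; only ID training data and its group labels are touched.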
Category 2: Test-Time Adaptation (TTA) Methods
Core idea: Adapt the model at test time using unlabeled target-domain data to mitigate distribution shifts.
Data constraints:
- Training stage: ID training data only (with labels)
- Test stage: Access to OOD test features (without labels)
- Strictly forbidden: OOD test labels
You must clearly specify:
- Whether model parameters are updated at test time;
- Which parameters are updated (e.g., BatchNorm layers, classifier head, or full model);
- The self-supervised objective used for adaptation.
Example directions:
- Entropy minimization
- Consistency regularization
- Feature statistics matching
- Pseudo-label self-training
- Feature distribution alignment for tabular data
Special attention should be given to the non-exchangeability of tabular features when designing TTA strategies.
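The three items that must be specified for a TTA method can be made concrete with a minimal sketch of entropy minimization. It assumes a hypothetical binary logistic model whose weights were fit on ID data. At test time only the bias is updated, by gradient descent on the mean prediction entropy of an unlabeled target batch. The function names, learning rate, and step count are illustrative choices, not part of any benchmark or library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_entropy(p, eps=1e-12):
    """Entropy of a Bernoulli(p) prediction, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def logit(weights, bias, row):
    return sum(w * x for w, x in zip(weights, row)) + bias


def mean_entropy(weights, bias, rows):
    """Self-supervised objective: average prediction entropy, no labels needed."""
    return sum(binary_entropy(sigmoid(logit(weights, bias, r))) for r in rows) / len(rows)


def adapt_bias(weights, bias, rows, lr=0.5, steps=20):
    """Update ONLY the bias; the weights stay frozen at their ID-trained values."""
    for _ in range(steps):
        grad = 0.0
        for row in rows:
            z = logit(weights, bias, row)
            p = sigmoid(z)
            # For H(sigmoid(z)), dH/dz = -z * p * (1 - p).
            grad += -z * p * (1.0 - p)
        bias -= lr * grad / len(rows)
    return bias


# Toy unlabeled target batch (illustrative numbers only).
weights, bias = [1.0, -0.5], 0.0
target_rows = [[2.0, 1.0], [1.5, 0.0], [0.5, -1.0], [-0.2, 0.3]]
new_bias = adapt_bias(weights, bias, target_rows)
print(mean_entropy(weights, bias, target_rows), mean_entropy(weights, new_bias, target_rows))
```

Updating a single bias can push all predictions toward one class, so practical methods usually add regularization or restrict the update further; the sketch shows only the mechanics of a label-free test-time update.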
Category 3: Other Self-Supervised / Weakly-Supervised Methods
Alternative frameworks beyond DG and TTA may also be proposed, such as:
(1) Model selection or ensemble methods
- Multi-source model selection
- Unlabeled target-domain-based weight adjustment
(2) Self-supervised pretraining + downstream fine-tuning
- Masked modeling for tabular data
- Contrastive learning
- Representation alignment learning
(3) Causal or structural modeling methods
- Explicit causal structure modeling
- Stable feature discovery
- Cross-environment consistency constraints
All data usage constraints must be explicitly clarified.
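For the masked-modeling direction listed under self-supervised pretraining, one possible way to build the pretext task is sketched below: a random subset of cells is replaced by values drawn from the same column, and the model is later trained to detect or reconstruct the corrupted cells. Sampling within a column keeps each feature's type and value range intact, which respects the non-exchangeability of tabular features. The function name, `mask_prob`, and the toy rows are assumptions made for illustration.

```python
import random


def corrupt_rows(rows, mask_prob=0.3, seed=0):
    """Return (corrupted_rows, mask).

    A cell with mask value 1 was resampled from the empirical values of its own
    column; it may by chance receive the same value it had before.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_cols)]
    corrupted, mask = [], []
    for row in rows:
        new_row, row_mask = [], []
        for j, value in enumerate(row):
            if rng.random() < mask_prob:
                new_row.append(rng.choice(columns[j]))
                row_mask.append(1)
            else:
                new_row.append(value)
                row_mask.append(0)
        corrupted.append(new_row)
        mask.append(row_mask)
    return corrupted, mask


# Toy table with one continuous and one categorical column.
rows = [[1.0, "a"], [2.0, "b"], [3.0, "a"], [4.0, "c"]]
corrupted, mask = corrupt_rows(rows, mask_prob=0.5, seed=1)
print(corrupted)
print(mask)
```

Because the corruption needs features only, it can be applied to ID training data in any category of method, and to unlabeled OOD test features only where the chosen setting permits access to them.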
Innovation Requirement
- Propose a new self-supervised objective;
- Design a modeling strategy tailored to tabular structure;
- Develop a unified framework for multiple distribution shifts;
- Provide new theoretical or empirical insights into OOD tabular generalization.
III. Experimental Report
(1) Method Design
- Motivation
- Self-supervised / weakly-supervised mechanism
- Mathematical formulation
(2) Experimental Setup
- Dataset description
- Model architecture
- Training procedure (single-stage or two-stage)
- Whether parameters are updated at test time
- Hyperparameter search strategy
(3) Experimental Results
Report the OOD performance of your method and of at least three baselines, using the following metrics:
- Accuracy
- Balanced Accuracy
- F1-score
Additionally report:
- ID vs OOD performance comparison
- Generalization Gap
- Mean and standard deviation over at least three runs
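The quantities listed above can be computed as in the following sketch, which uses only the Python standard library. Two choices here are assumptions rather than requirements of this handout: F1 is computed for a single positive class in a binary task, and the generalization gap is taken as the ID score minus the OOD score for the same metric. The standard deviation is the sample standard deviation across runs.

```python
import statistics


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so minority classes count equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def f1_binary(y_true, y_pred, positive=1):
    """F1 for the positive class; returns 0.0 when it is undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def generalization_gap(id_score, ood_score):
    """Assumed definition: ID score minus OOD score of the same metric."""
    return id_score - ood_score


def summarize(values):
    """Mean and sample standard deviation over independent runs."""
    return statistics.mean(values), statistics.stdev(values)


# Toy labels and predictions (illustrative only, not real results).
y_true = [1, 1, 1, 0]
y_pred = [1, 1, 0, 0]
print(accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred), f1_binary(y_true, y_pred))
```

Balanced accuracy and F1 matter because several of the required datasets have skewed label distributions, where plain accuracy can look high for a model that ignores the minority class.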
IV. Experimental Protocol
- Fix random seeds for reproducibility;
- Tune hyperparameters only on validation sets;
- Report average results over at least three independent runs;
- Clearly specify training details (optimizer, learning rate, batch size, number of epochs, etc.).
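A minimal sketch of the seeding part of this protocol is shown below. The helper seeds Python's own generator and, only if they happen to be installed, NumPy and PyTorch. `run_experiment` is a hypothetical placeholder for a real training and evaluation routine; the value it returns is a random stand-in, not a result.

```python
import random


def set_seed(seed):
    """Seed Python's RNG and, if available, NumPy and PyTorch."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


def run_experiment(seed):
    """Hypothetical placeholder: replace the body with training + evaluation."""
    set_seed(seed)
    return random.random()  # stand-in for a metric value


# Three independent runs with fixed, reported seeds.
seeds = (0, 1, 2)
scores = [run_experiment(s) for s in seeds]
print(dict(zip(seeds, scores)))
```

Reporting the seed list alongside the other training details lets a reader rerun each of the three runs exactly.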
V. Report Format
The report must follow the official NeurIPS 2025 template.
Template download link: https://media.neurips.cc/Conferences/NeurIPS2025/Styles.zip
The report should follow standard academic paper structure, including abstract, introduction, related work, methodology, experiments, and conclusion.
The following papers may serve as useful references:
- Josh Gardner, Zoran Popovic, Ludwig Schmidt. Benchmarking Distribution Shift in Tabular Data with TableShift. NeurIPS 2023 Datasets and Benchmarks Track. https://arxiv.org/abs/2312.07577
- Weijieying Ren, Xiaoting Li, Huiyuan Chen, Vineeth Rakesh, Zhuoyi Wang, Mahashweta Das, Vasant G Honavar. TabLog: Test-Time Adaptation for Tabular Data Using Logic Rules. ICML 2024. https://proceedings.mlr.press/v235/ren24b.html
- Changhun Kim, Taewon Kim, Seungyeon Woo, June Yong Yang, Eunho Yang. AdapTable: Test-Time Adaptation for Tabular Data via Shift-Aware Uncertainty Calibrator and Label Distribution Handler. NeurIPS Workshop on Table Representation Learning (NeurIPSW-TRL), 2024. https://arxiv.org/abs/2407.10784
- Zhi Zhou, Kun-Yang Yu, Lan-Zhe Guo, Yu-Feng Li. Fully Test-time Adaptation for Tabular Data. AAAI 2025. https://arxiv.org/abs/2412.10871
- Rundong He, Jieming Shi. Prior-free Tabular Test-time Adaptation. ICLR 2026. https://openreview.net/pdf/530299a750b36f74a74d3698b0a8fd7e0a9068e0.pdf
Submit the report in PDF format together with your code (including a README file) in a single zip file. Name the zip file 'Student ID + name'. Reports are collected via NJU Box at: https://box.nju.edu.cn/u/d/f62af0d399cd4077b1e4/ (the password will be provided via the QQ group).
Your language: concise, precise, and logical.
Your organization: clearly and properly separated sections and paragraphs.
Your format: careful handling of citations and overall consistency.
Insights: clear, principled, and technically grounded.
The original deadline was 2026.05.08 23:59:59. Updated at 2026.04.24: the deadline has been EXTENDED to 2026.05.15 23:59:59 (final).
If you have any questions, please contact us via email: yuky@lamda.nju.edu.cn.