Assignment 1: Self/Weakly-Supervised Learning on Tabular Data

Research Background:

Out-of-Distribution (OOD) Generalization for Tabular Data

Tabular data are widely used in real-world applications such as financial risk control, medical diagnosis, industrial inspection, and online advertising. They represent one of the most fundamental data formats in industry. However, unlike image or text tasks, tabular data typically exhibit high feature heterogeneity, a mixture of categorical and continuous variables, a lack of explicit structural relationships among features, limited data scale, and strong scenario-dependent distributions. These characteristics pose significant challenges to the generalization ability of deep learning models.

In traditional supervised learning settings, models are optimized under the assumption that the training and test data are independent and identically distributed (i.i.d.). In real-world scenarios, however, this assumption is often violated. For example:

User demographics in finance may evolve over time, leading to temporal distribution shifts;

Medical data collected from different hospitals may vary due to equipment differences, causing covariate shifts;

User behavior patterns across regions may differ, resulting in label distribution shifts;

The emergence of new categories or new risk types may induce concept drift.

These phenomena are collectively referred to as the Out-of-Distribution (OOD) Generalization problem. Under OOD settings, a model may perform well on the training distribution, but its performance can deteriorate significantly when the test distribution changes, sometimes leading to catastrophic failure. This issue is particularly critical in high-risk domains such as finance and healthcare.
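For reference, the shift types mentioned above are commonly formalized as follows. This is a sketch based on standard textbook definitions, not notation taken from the TableShift benchmark; the symbols P_S and P_T (source/training and target/test distributions over features x and label y) are introduced here for illustration only.

```latex
\begin{align*}
\text{Covariate shift:}\quad & P_S(x) \neq P_T(x), \qquad P_S(y \mid x) = P_T(y \mid x) \\
\text{Label shift:}\quad & P_S(y) \neq P_T(y), \qquad P_S(x \mid y) = P_T(x \mid y) \\
\text{Concept drift:}\quad & P_S(y \mid x) \neq P_T(y \mid x)
\end{align*}
```

Temporal shift describes when the change occurs rather than which factor changes, so in practice it can manifest as any of the three cases.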

In recent years, substantial progress has been made in OOD learning within computer vision, including domain generalization, test-time adaptation, and causal representation learning. However, many of these approaches rely on structural inductive biases inherent in image data, such as spatial invariance and convolutional architectures. For tabular data, due to the lack of explicit structural information and the semantic independence and non-exchangeability of feature dimensions, existing OOD techniques cannot be directly transferred.

Moreover, tabular data face several unique challenges in OOD scenarios:

Non-shareable feature semantics: Features across datasets often differ significantly in meaning, making it difficult to build unified pre-trained models;

Complex distribution shifts: Covariate shift, label shift, and concept drift may occur simultaneously;

Small-data settings: Datasets are often too small for large-scale pretraining, which could otherwise help mitigate distribution shifts;

Hidden causal relationships: Models based purely on statistical correlations tend to fail under distribution changes.

Therefore, developing robust OOD generalization methods for tabular data has become an important research direction in machine learning. Although existing studies have made some progress, effectively modeling distribution shifts, constructing stable invariant features, and achieving reliable generalization without target-domain labels remain open and challenging problems.

Requirement:

Project Requirements

This project aims to design a self-supervised or weakly-supervised tabular machine learning method based on the TableShift benchmark, in order to improve model generalization under Out-of-Distribution (OOD) settings. The focus of this project is not merely to compare existing approaches, but to propose and implement a novel self-supervised or weakly-supervised mechanism (e.g., Test-Time Adaptation, TTA) that mitigates performance degradation caused by distribution shifts, without relying on target-domain labels.

The core research objectives include:

Designing effective self-supervised or weakly-supervised learning signals;

Improving model robustness under strict OOD settings;

Analyzing the applicability and limitations of the proposed method under different types of distribution shifts.

I. Dataset Requirements

This project will be conducted on the TableShift benchmark. TableShift is a standardized benchmark designed for evaluating OOD generalization in tabular data. It includes multiple real-world datasets and provides clearly defined In-Distribution (ID) and Out-of-Distribution (OOD) splits.

All experiments must strictly follow the official train/validation/test splits. Under the standard OOD setting:

OOD test labels must NOT be used for training or hyperparameter tuning.

The required datasets are:

assistments

nhanes_lead

brfss_diabetes

acsfoodstamps

physionet

acsunemployment

For details, refer to the official TableShift GitHub: https://github.com/mlfoundations/tableshift

For convenience, the data can also be downloaded from https://box.nju.edu.cn/d/958c50ca9223485eadac/ (the password will be provided via the QQ group).

II. Method Design Requirements

The proposed method must be built around self-supervised or weakly-supervised mechanisms. Students may choose one of the following three categories or combine multiple directions. All methods must clearly specify the data accessible during the training stage.

Category 1: Domain Generalization (DG) Methods

Core idea: Learn invariant representations or robust objectives during training so that the model generalizes to unseen OOD domains.

Data constraints:

Allowed: ID training data (with labels)

Allowed: domain/group labels if provided

Not allowed: OOD test data (features or labels)

Example directions:

Invariant Representation Learning

Distributionally Robust Optimization (DRO)

Causal representation modeling

Domain randomization via data augmentation

Stable feature subspace decomposition for tabular data

Self-supervised extensions may be incorporated during training, such as feature reconstruction, mask prediction, or consistency constraints.
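As an illustration of the DRO direction listed above, the following is a minimal sketch of a group-robust objective in plain Python. It assumes that per-sample losses and domain/group labels are already available; the group names "A" and "B", the step size eta, and all function names are hypothetical. It is not an implementation prescribed by TableShift.

```python
import math

def group_mean_losses(losses, groups):
    """Average the per-sample losses within each group."""
    totals, counts = {}, {}
    for loss, g in zip(losses, groups):
        totals[g] = totals.get(g, 0.0) + loss
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}

def update_group_weights(weights, group_losses, eta=1.0):
    """Exponentiated-gradient step: up-weight groups with higher loss."""
    raw = {g: weights[g] * math.exp(eta * group_losses[g]) for g in weights}
    z = sum(raw.values())
    return {g: v / z for g, v in raw.items()}

def robust_loss(weights, group_losses):
    """Weighted objective; it approaches the worst-group loss as weights concentrate."""
    return sum(weights[g] * group_losses[g] for g in weights)

# Toy per-sample losses with hypothetical domain labels "A" and "B".
losses = [0.2, 0.4, 1.0, 1.4]
groups = ["A", "A", "B", "B"]
g_loss = group_mean_losses(losses, groups)
weights = {"A": 0.5, "B": 0.5}
weights = update_group_weights(weights, g_loss)  # group "B" now weighs more
objective = robust_loss(weights, g_loss)
```

In a real training loop the weight update would alternate with a gradient step on the model parameters. Only ID training data and its group labels are touched, which matches the data constraints of this category.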

Category 2: Test-Time Adaptation (TTA) Methods

Core idea: Adapt the model at test time using unlabeled target-domain data to mitigate distribution shifts.

Data constraints:

Training stage: labeled ID training data only

Test stage: Access to OOD test features (without labels)

Strictly forbidden: OOD test labels

You must clearly specify:

Whether model parameters are updated at test time;

Which parameters are updated (e.g., BatchNorm layers, classifier head, or full model);

The self-supervised objective used for adaptation.
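As one possible way to answer the three points above, the sketch below updates parameters at test time (yes), restricts the update to a per-class output bias (a hypothetical choice made here for simplicity), and uses the mean prediction entropy on unlabeled target samples as the objective. The logits are made-up numbers standing in for the outputs of a frozen model; no labels are used.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0.0)

def mean_entropy(batch_logits, bias):
    """Mean prediction entropy of the batch after adding the bias to the logits."""
    return sum(
        entropy(softmax([z + b for z, b in zip(row, bias)])) for row in batch_logits
    ) / len(batch_logits)

def adapt_bias(batch_logits, steps=30, lr=0.2):
    """Gradient descent on the mean entropy w.r.t. the output bias only."""
    n_classes = len(batch_logits[0])
    bias = [0.0] * n_classes
    for _ in range(steps):
        grad = [0.0] * n_classes
        for row in batch_logits:
            p = softmax([z + b for z, b in zip(row, bias)])
            h = entropy(p)
            for k, pk in enumerate(p):
                # dH/dz_k = -p_k * (log p_k + H) for softmax probabilities p
                grad[k] += -pk * (math.log(max(pk, 1e-12)) + h) / len(batch_logits)
        bias = [b - lr * g for b, g in zip(bias, grad)]
    return bias

# Made-up logits of a frozen two-class model on four unlabeled target rows.
target_logits = [[2.0, 0.5], [1.0, 0.2], [0.4, 0.1], [0.2, 0.9]]
before = mean_entropy(target_logits, [0.0, 0.0])
bias = adapt_bias(target_logits)
after = mean_entropy(target_logits, bias)
```

Entropy minimization can drift toward predicting a single class for every sample, so a report using it would normally state how the number of steps and the learning rate were chosen on ID validation data.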

Example directions:

Entropy minimization

Consistency regularization

Feature statistics matching

Pseudo-label self-training

Feature distribution alignment for tabular data

Special attention should be given to the non-exchangeability of tabular features when designing TTA strategies.
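To make the point about non-exchangeable features concrete, here is a minimal sketch of feature statistics matching that treats every column separately and leaves categorical columns untouched. The column layout (age, income, region code) and all function names are hypothetical. The sketch assumes the shift is mostly a change in per-column location and scale, which will not hold for every dataset.

```python
from statistics import fmean, pstdev

def column_stats(rows, col):
    """Mean and population standard deviation of one column."""
    values = [r[col] for r in rows]
    return fmean(values), pstdev(values)

def align_numeric_columns(source_rows, target_rows, numeric_cols, eps=1e-8):
    """Map each numeric target column onto the source mean/std of the same column."""
    aligned = [list(r) for r in target_rows]
    for col in numeric_cols:
        mu_s, sd_s = column_stats(source_rows, col)
        mu_t, sd_t = column_stats(target_rows, col)
        for r in aligned:
            # Standardize with target statistics, then rescale to source statistics.
            r[col] = (r[col] - mu_t) / (sd_t + eps) * sd_s + mu_s
    return aligned

# Hypothetical rows: [age, income, region_code]; column 2 is categorical.
source = [[30.0, 40.0, 0], [40.0, 60.0, 1], [50.0, 80.0, 0]]
target = [[45.0, 400.0, 1], [55.0, 600.0, 1], [65.0, 800.0, 2]]
aligned = align_numeric_columns(source, target, numeric_cols=[0, 1])
```

Only unlabeled target features are read, so the approach stays within the TTA data constraints. Statistics are never pooled across columns, because a mean computed over "age" says nothing about "income".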

Category 3: Other Self-Supervised / Weakly-Supervised Methods

Alternative frameworks beyond DG and TTA may also be proposed, such as:

(1) Model selection or ensemble methods

Multi-source model selection

Unlabeled target-domain-based weight adjustment

(2) Self-supervised pretraining + downstream fine-tuning

Masked modeling for tabular data

Contrastive learning

Representation alignment learning

(3) Causal or structural modeling methods

Explicit causal structure modeling

Stable feature discovery

Cross-environment consistency constraints

All data usage constraints must be explicitly clarified.
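For the masked modeling direction, a pretext task needs a corruption step that works for mixed column types. The sketch below is one assumed design, not a prescribed method: each masked cell is replaced by a value drawn from the same column, so a categorical cell only ever receives a valid category. The function name, mask probability, and toy rows are all hypothetical.

```python
import random

def corrupt_rows(rows, mask_prob=0.3, seed=0):
    """Return (corrupted_rows, mask); masked cells are resampled from their own column."""
    rng = random.Random(seed)
    n_cols = len(rows[0])
    columns = [[r[c] for r in rows] for c in range(n_cols)]
    corrupted, mask = [], []
    for r in rows:
        new_row, row_mask = [], []
        for c in range(n_cols):
            if rng.random() < mask_prob:
                # The drawn value may coincide with the original one by chance.
                new_row.append(rng.choice(columns[c]))
                row_mask.append(1)
            else:
                new_row.append(r[c])
                row_mask.append(0)
        corrupted.append(new_row)
        mask.append(row_mask)
    return corrupted, mask

# Toy rows mixing a continuous, a categorical, and a binary column.
rows = [[30.0, "A", 1], [41.5, "B", 0], [27.0, "A", 1], [55.0, "C", 0]]
corrupted, mask = corrupt_rows(rows)
```

A model can then be trained to predict the mask, to reconstruct the original row, or both. Which rows are passed in (ID training features only, or unlabeled target features as well) is exactly the data usage that must be stated explicitly.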

Innovation Requirement

Propose a new self-supervised objective;

Design a modeling strategy tailored to tabular structure;

Develop a unified framework for multiple distribution shifts;

Provide new theoretical or empirical insights into OOD tabular generalization.

III. Experimental Report

(1) Method Design

Motivation

Self-supervised / weakly-supervised mechanism

Mathematical formulation

(2) Experimental Setup

Dataset description

Model architecture

Training procedure (single-stage or two-stage)

Whether parameters are updated at test time

Hyperparameter search strategy

(3) Experimental Results

Report the OOD performance of your method and at least three baselines using the following metrics:

Accuracy

Balanced Accuracy

F1-score

Additionally report:

ID vs OOD performance comparison

Generalization Gap

Mean and standard deviation over at least three runs
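The reporting quantities above can be computed as in the following sketch. It assumes binary labels in {0, 1} with 1 as the positive class, and it assumes that the generalization gap means the ID score minus the OOD score for the same metric; the assignment text does not define the gap, so state your definition in the report. The labels and predictions are toy values, not results.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def f1_score(y_true, y_pred, positive=1):
    """F1 of the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def generalization_gap(id_score, ood_score):
    """Assumed definition: ID score minus OOD score (positive means degradation)."""
    return id_score - ood_score

# Toy labels and predictions for illustration only.
y_true = [1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 0]
acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

On imbalanced data, accuracy and balanced accuracy can differ noticeably, as in this toy case where one of the two positives is missed.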

IV. Experimental Protocol

Fix random seeds for reproducibility;

Tune hyperparameters only on validation sets;

Report average results over at least three independent runs;

Clearly specify training details (optimizer, learning rate, batch size, number of epochs, etc.).
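The protocol above can be organized as in the following sketch. The function run_once is a hypothetical stand-in for a full training run and returns synthetic scores so that the example executes; its numbers are not real results. The parts meant to carry over are the structure: seeded runs, selection by ID validation score only, and aggregation over several seeds.

```python
import random
from statistics import mean, stdev

def run_once(seed, lr):
    """Hypothetical stand-in for one training run; returns synthetic scores."""
    rng = random.Random(seed)  # a seeded generator makes the run repeatable
    noise = rng.uniform(-0.01, 0.01)
    val_score = 0.80 - abs(lr - 0.01) + noise  # synthetic ID validation score
    ood_score = val_score - 0.05               # synthetic OOD test score
    return {"val": val_score, "ood": ood_score}

def select_lr(candidates, seed):
    """Pick the learning rate by ID validation score; OOD scores are never consulted."""
    return max(candidates, key=lambda lr: run_once(seed, lr)["val"])

seeds = [0, 1, 2]
best_lr = select_lr([0.001, 0.01, 0.1], seed=seeds[0])

# OOD test data is evaluated only after the hyperparameter has been fixed.
ood_scores = [run_once(s, best_lr)["ood"] for s in seeds]
summary = (mean(ood_scores), stdev(ood_scores))
```

In a real pipeline, the seed would also be passed to the numerical and deep learning libraries in use, and the chosen optimizer, batch size, and number of epochs would be logged next to each run.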

V. Report Format

The report must follow the official NeurIPS 2025 template.

Template download link: https://media.neurips.cc/Conferences/NeurIPS2025/Styles.zip

The report should follow standard academic paper structure, including abstract, introduction, related work, methodology, experiments, and conclusion.

You may find the following papers useful as references:

Josh Gardner, Zoran Popovic, Ludwig Schmidt. Benchmarking Distribution Shift in Tabular Data with TableShift. NeurIPS 2023 Datasets and Benchmarks Track. https://arxiv.org/abs/2312.07577

Weijieying Ren, Xiaoting Li, Huiyuan Chen, Vineeth Rakesh, Zhuoyi Wang, Mahashweta Das, Vasant G Honavar. TabLog: Test-Time Adaptation for Tabular Data Using Logic Rules. ICML 2024. https://proceedings.mlr.press/v235/ren24b.html

Changhun Kim, Taewon Kim, Seungyeon Woo, June Yong Yang, Eunho Yang. AdapTable: Test-Time Adaptation for Tabular Data via Shift-Aware Uncertainty Calibrator and Label Distribution Handler. NeurIPS Workshop on Table Representation Learning (NeurIPSW-TRL), 2024. https://arxiv.org/abs/2407.10784

Zhi Zhou, Kun-Yang Yu, Lan-Zhe Guo, Yu-Feng Li. Fully Test-time Adaptation for Tabular Data. AAAI 2025. https://arxiv.org/abs/2412.10871

Rundong He, Jieming Shi. Prior-free Tabular Test-time Adaptation. ICLR 2026. https://openreview.net/pdf/530299a750b36f74a74d3698b0a8fd7e0a9068e0.pdf

How to submit?

You need to submit the report in PDF format together with your code (including a README file) in a single zip file. The zip file should be named 'Student ID + name'. Reports are collected via NJU Box. The URL is: https://box.nju.edu.cn/u/d/f62af0d399cd4077b1e4/ password: will be provided via QQ group.

The evaluation of your report.

Your language: concise, precise, and logical.

Your organization: clearly and properly separated sections and paragraphs.

Your format: careful handling of citations and overall consistency.

Insights: clear, principled, and technically grounded.

About the DEADLINE and Score.

The original deadline was 2026.05.08 23:59:59. Updated at 2026.04.24: the deadline has been EXTENDED to 2026.05.15 23:59:59 (final).

Additional Issues

If you have any questions, please contact us via email: yuky@lamda.nju.edu.cn.