A hybrid framework combining machine learning with traditional optimization solvers for MIP problems.
If you encounter technical hurdles while doing this part of the homework, you can email me or come to office hours.
Predict + Search is a hybrid framework that combines machine learning with traditional optimization solvers to tackle challenging mixed-integer programming (MIP) problems. Instead of relying solely on the solver’s internal heuristics, a learned model is first used to predict promising values for a subset of decision variables based on structural features of the problem instance. These predictions are then partially enforced or used to guide the search, after which a classical MIP solver completes the optimization over the remaining variables. The key idea is that machine learning can capture recurring patterns across similar problem instances, while exact solvers ensure feasibility and optimality guarantees. In this homework, we apply the Predict + Search paradigm to Maximum Independent Set instances by training a graph neural network on the bipartite constraint–variable graph, and we study how prediction quality—under different loss functions—interacts with downstream MIP solving performance.
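As a toy illustration of the paradigm (everything below is hypothetical: the hardcoded probabilities stand in for GNN outputs, and brute-force enumeration stands in for the MIP solver), consider a 5-node Maximum Independent Set instance where we fix only the high-confidence variables and search over the rest:

```python
from itertools import product

# Toy instance: a 5-cycle graph (hypothetical example).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
n = 5

def is_independent(selected, edges):
    """Check that no edge has both endpoints selected."""
    return all(not (u in selected and v in selected) for u, v in edges)

# Step 1 ("Predict"): per-variable probabilities, hardcoded stand-ins for
# GNN output. Only very confident variables get fixed.
probs = [0.95, 0.05, 0.92, 0.08, 0.5]
fixed = {i: p > 0.5 for i, p in enumerate(probs) if p > 0.9 or p < 0.1}

# Step 2 ("Search"): optimize over the remaining (unfixed) variables.
# Here brute force; in the homework this is the MIP solver.
free = [i for i in range(n) if i not in fixed]
best = None
for bits in product([0, 1], repeat=len(free)):
    sol = dict(fixed)
    sol.update({i: bool(b) for i, b in zip(free, bits)})
    selected = {i for i, v in sol.items() if v}
    if is_independent(selected, edges) and (best is None or len(selected) > len(best)):
        best = selected

print(sorted(best))  # a maximum independent set consistent with the fixings
```

The search space shrinks from 2^5 to 2^1 assignments here; the same reduction is what makes the solver's job easier on large instances, at the risk of cutting off the true optimum when a fixing is wrong.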
This homework uses a fixed dataset of MIP instances (features) paired with their solutions (labels). Students will train a GNN predictor on this data and evaluate how its predictions affect downstream MIP solving.
This section provides instructions for setting up your local environment on Apple Silicon (M1/M2/M3). We recommend using Python 3.10, as it is currently the most stable version for the PyTorch and PyTorch Geometric (PyG) ecosystem.
First, create and activate a new conda environment:
conda create -n graph_env python=3.10
conda activate graph_env
Upgrade the base packaging tools:
pip install --upgrade pip setuptools wheel
To leverage the Apple Silicon GPU (via the MPS backend), install this specific PyTorch version:
pip install torch==2.2.2 torchvision torchaudio
You can verify that the GPU (MPS) is available with the following command:
python - <<EOF
import torch
print("Torch:", torch.__version__)
print("MPS available:", torch.backends.mps.is_available())
EOF
Expected Output:
Torch: 2.2.2
MPS available: True
This confirms the M1 Pro GPU via MPS is usable.
Install the core data science libraries and metric-learning tools:
pip install numpy pandas scikit-learn matplotlib tqdm networkx
pip install torchmetrics torcheval pytorch-metric-learning
Install PyTorch Geometric. Note that while the wheels may indicate CPU, they will still utilize MPS for device placement:
pip install torch_geometric
Finally, install utilities for data downloading and the Gurobi optimizer:
pip install gdown
conda install -c gurobi gurobi
pip install gurobipy
Known issue: on MPS you may encounter the error
The operator 'aten::scatter_reduce.two_out' is not currently supported on the MPS backend
As a workaround, enable CPU fallback for unsupported operators before running your script:
export PYTORCH_ENABLE_MPS_FALLBACK=1
salloc --partition=gpu --gres=gpu:1 --cpus-per-task=8 --mem=32GB --time=1:00:00
module purge
module load conda
mamba init bash
source ~/.bashrc
Create new Conda environments in one of your available directories. By default, the packages will be installed in your home directory under /home1/
The process for creating and using environments has three basic steps:
Create the environment in your home directory (recommended)
The -p flag specifies a full path for the environment instead of the default read-only /apps location.
mamba create -p $HOME/conda_envs/graph_env python=3.10
conda activate $HOME/conda_envs/graph_env
Check which CUDA version is available on your cluster:
nvidia-smi
```
(/home1/jongminm/conda_envs/graph_env) [jongminm@d23-16 ~]$ nvidia-smi
Sat Mar  7 15:49:35 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08              Driver Version: 575.57.08      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla P100-PCIE-16GB           On  |   00000000:07:00.0 Off |                    0 |
| N/A   27C    P0             26W /  250W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                          GPU Memory     |
|        ID   ID                                                           Usage          |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
Install PyTorch with CUDA support. Note that nvidia-smi reports the maximum CUDA version the driver supports (12.9 here); the pytorch-cuda=11.8 runtime is backward-compatible with this newer driver:
mamba install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
Here's a compact one-liner you can paste directly into your shell to check PyTorch, CUDA, GPU name, and memory usage in one command:
python -c "import torch; print('Torch:', torch.__version__); print('CUDA available:', torch.cuda.is_available()); print('Device name:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A'); print('Memory allocated (MB):', torch.cuda.memory_allocated(0)//1024**2 if torch.cuda.is_available() else 'N/A')"
Torch: 2.5.1 CUDA available: True Device name: Tesla P100-PCIE-16GB Memory allocated (MB): 0
mamba install numpy==1.23 pandas scikit-learn matplotlib tqdm networkx
pip install torch_geometric
pip install torchmetrics torcheval pytorch-metric-learning
When using gurobipy, the installed Gurobi version and the gurobipy version should match. If you just run pip install gurobipy blindly, you might run into the error:
gurobipy._exception.GurobiError: Request denied: license not valid for Gurobi version 13
So first check the Gurobi version available on the cluster:
module load gurobi
gurobi_cl --version
Then install the matching gurobipy version:
pip install gurobipy==12.0.3
To evaluate the learning progress of the Graph Neural Network (GNN), we monitor several key performance indicators. Because the model is a Bernoulli generative model, it predicts a success probability for each binary variable, which we then compare against the true labels (the optimal solutions).
BCE loss is the standard approach for binary classification. The following results were recorded at the final epoch (1500):
| Metric | Training | Validation |
|---|---|---|
| Loss | 2934.45 | 2956.86 |
| AUC | 0.820 | 0.826 |
| Accuracy | 0.747 | 0.752 |
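For reference, the per-variable BCE loss and thresholded accuracy can be sketched as follows. The numbers are toy values; the reduction used by the actual training code (sum vs. mean over variables) is an assumption here, so both are shown:

```python
import numpy as np

def bce_loss(probs, labels, reduction="sum"):
    """Binary cross-entropy between predicted Bernoulli probabilities
    and 0/1 labels (the optimal-solution values)."""
    eps = 1e-7  # clip to avoid log(0)
    p = np.clip(probs, eps, 1 - eps)
    losses = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    return float(losses.sum() if reduction == "sum" else losses.mean())

def accuracy(probs, labels, threshold=0.5):
    """Fraction of variables whose rounded prediction matches the label."""
    return float(((probs >= threshold) == labels.astype(bool)).mean())

# Toy predictions for four binary variables.
probs = np.array([0.9, 0.2, 0.7, 0.4])
labels = np.array([1, 0, 1, 1])
print(bce_loss(probs, labels, "mean"))  # mean BCE over the 4 variables
print(accuracy(probs, labels))          # 3 of 4 rounded predictions match
```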
At the final epoch (1500), the AUC remains comparable to the BCE model, but the loss scale and accuracy differ significantly. This shift is expected: the contrastive objective prioritizes the relative distances between representations rather than direct label matching.
| Metric | Training | Validation |
|---|---|---|
| Loss | 0.152 | 0.244 |
| AUC | 0.820 | 0.825 |
| Accuracy | 0.655 | 0.659 |
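The exact contrastive objective used in training is not reproduced here; as an illustration of the general idea (penalizing relative distances between representations rather than direct label matching), here is a minimal margin-based pairwise contrastive loss on toy embeddings:

```python
import numpy as np

def pairwise_contrastive_loss(emb, labels, margin=1.0):
    """Margin-based contrastive loss over all pairs: same-label pairs are
    pulled together (squared distance), different-label pairs are pushed
    apart until they exceed `margin`."""
    total, count = 0.0, 0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(emb[i] - emb[j])
            if labels[i] == labels[j]:
                total += d ** 2                       # attract
            else:
                total += max(0.0, margin - d) ** 2    # repel if too close
            count += 1
    return total / count

# Toy 2-D embeddings: two label-1 points close together, one label-0 far away.
emb = np.array([[0.0, 0.1], [0.1, 0.0], [1.0, 1.0]])
labels = np.array([1, 1, 0])
print(pairwise_contrastive_loss(emb, labels))
```

Since this loss operates on distances, its scale is unrelated to the BCE scale, which is why the loss columns of the two tables are not directly comparable.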
The table below reports the best primal bound (final objective value; the instances are solved as minimization, so lower is better) found by each method on the 10 test instances:

| Statistic | Gurobi | BCE Model | CL Model |
|---|---|---|---|
| Count | 10 | 10 | 10 |
| Mean | -2523.60 | -2519.70 | -2520.80 |
| Std Dev | 15.58 | 15.64 | 13.63 |
| Min | -2551.00 | -2545.00 | -2541.00 |
| 25th Pctl | -2528.25 | -2528.50 | -2531.00 |
| Median | -2522.50 | -2517.50 | -2520.50 |
| 75th Pctl | -2514.25 | -2509.50 | -2514.00 |
| Max | -2499.00 | -2498.00 | -2500.00 |
Interpretation:
The primal bound represents the best solution found so far. It is the lowest-cost feasible solution the solver has identified up to the current point and reflects the quality of the current answer. If you were to stop the solver right now, this is the solution you would use. The results indicate that Gurobi achieved the best primal bound on more instances than the CL model, although the difference is relatively small.
| Method | Count of Best Primal Bound |
|---|---|
| Gurobi | 6 |
| CL Model | 4 |
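The win counts above are obtained by crediting, for each instance, the method with the lowest (best) primal bound. A minimal sketch with hypothetical per-instance values:

```python
# Hypothetical per-instance primal bounds (minimization: lower is better).
bounds = {
    "Gurobi":   [-2551, -2520, -2530],
    "CL Model": [-2541, -2525, -2528],
}

def count_best(bounds):
    """For each instance, credit the method with the lowest primal bound."""
    methods = list(bounds)
    n_instances = len(next(iter(bounds.values())))
    wins = {m: 0 for m in methods}
    for i in range(n_instances):
        winner = min(methods, key=lambda m: bounds[m][i])
        wins[winner] += 1
    return wins

print(count_best(bounds))
```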
The primal integral measures the speed of improvement over time by calculating the area under the objective value curve. A solver that quickly finds high-quality solutions will have a lower (better) primal integral than one that finds similar solutions later in the process. The results show that Gurobi improves relatively slowly, while the CL model achieves the fastest and most consistent improvements.
| Statistic | BCE Model | CL Model | Gurobi |
|---|---|---|---|
| Count | 10 | 10 | 10 |
| Mean | 5.52 | 4.61 | 19.91 |
| Std Dev | 1.08 | 1.28 | 2.93 |
| Min | 4.03 | 2.43 | 13.74 |
| 25th Pctl | 4.86 | 4.14 | 18.46 |
| Median | 5.31 | 4.59 | 20.44 |
| 75th Pctl | 6.43 | 4.86 | 21.44 |
| Max | 7.22 | 7.28 | 23.67 |
| Method | Count of Best Primal Integral |
|---|---|
| CL Model | 8 |
| BCE Model | 2 |
Interpretation:
The CL model attains the lowest mean primal integral (4.61, versus 5.52 for the BCE model and 19.91 for default Gurobi) and achieves the best primal integral on 8 of the 10 instances. This suggests that the learned predictions, especially under the contrastive loss, guide the solver to high-quality solutions much earlier in the run.
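Definitions of the primal integral vary (some normalize by the time horizon or by a reference bound); a minimal step-function sketch, assuming minimization and a hypothetical incumbent trajectory, is:

```python
def primal_integral(times, objs, t_end, best_known):
    """Step-function integral of the primal gap (obj - best_known) over
    [times[0], t_end]. times[i] is when incumbent objs[i] was found; the
    incumbent is held constant until the next improvement (minimization)."""
    total = 0.0
    for i, (t, obj) in enumerate(zip(times, objs)):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += (obj - best_known) * (t_next - t)
    return total

# Hypothetical trajectory: incumbents found at t = 0, 2, 5 seconds.
times = [0.0, 2.0, 5.0]
objs = [-2400.0, -2500.0, -2520.0]
print(primal_integral(times, objs, t_end=10.0, best_known=-2523.0))
```

A solver that reaches a near-optimal incumbent early accumulates little area, which is why the CL model's faster improvements translate into a much smaller integral than default Gurobi's.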
Here are example training and validation metrics from the two loss functions used to train our models:
BCE model: @epoch1499 Train loss: 2934.45 Train AUC: 0.820 Train ACC: 0.747 Valid loss: 2956.86 Valid AUC: 0.826 Valid ACC: 0.752 TIME: 19.06s
CL model: @epoch1499 Train loss: 0.152 Train AUC: 0.820 Train ACC: 0.655 Valid loss: 0.244 Valid AUC: 0.825 Valid ACC: 0.659 TIME: 27.28s
Conda does not automatically detect a copied environment, and copying environments can easily break things.
Here’s why.
1️⃣ How Conda tracks environments
Conda keeps metadata about environments in:
conda-meta/ inside the environment
a registry in the base conda installation
Example typical structure:
miniconda3/
├── envs/
│   └── graph_env/
│       ├── bin/
│       ├── lib/
│       └── conda-meta/
When you run:
conda info --envs
Conda lists environments it knows about, usually from:
~/miniconda3/envs/
If you copy an environment somewhere else, Conda may not register it.
2️⃣ What happens if you copy it manually
Example:
cp -r ~/miniconda3/envs/graph_env /project2/…/conda_envs/
Possible issues:
Conda doesn’t list it
Hardcoded paths break
Activation fails
Binary libraries reference old paths
For example, many packages contain absolute paths like:
/home/user/miniconda3/envs/graph_env/…
So moving the environment can break them.
3️⃣ When copying does work
Sometimes it works if:
the path is identical
packages don’t depend on absolute paths
you call python directly
Example:
/project2/…/conda_envs/graph_env/bin/python train.py
This bypasses Conda activation.
4️⃣ Correct way to move an environment
Option A (recommended): export + recreate
conda activate graph_env
conda env export > env.yml
Then recreate:
conda env create -p /project2/…/conda_envs/graph_env -f env.yml
Option B: use conda-pack (best for clusters)
Install:
pip install conda-pack
Pack:
conda pack -n graph_env -o graph_env.tar.gz
Unpack elsewhere:
tar -xzf graph_env.tar.gz
Then fix paths:
./bin/conda-unpack
5️⃣ Best practice for HPC clusters
Most clusters recommend:
$PROJECT/conda_envs/
Create environment directly there:
conda create -p /project2/…/conda_envs/graph_env python=3.10
Then activate:
conda activate /project2/…/conda_envs/graph_env
💡 In your case, since the environment already exists at:
/project2/dilkina_1832/jongmin_hw1/conda_envs/graph_env
try activating with the prefix:
conda activate /project2/dilkina_1832/jongmin_hw1/conda_envs/graph_env
If Conda still says it’s not an environment, it was likely copied and broken, and recreating it is the safest fix.