Daily Log — 2026-03-12

Today’s Overview

  • What I did: Fixed torch/torchvision version mismatch and curobo CUDA extension ABI compatibility issues in the VLA/RoboTwin evaluation environment, and improved terminal width and process display logic in the GPU monitoring tool gpumon.py
  • How I did it: Traced the dependency-chain errors layer by layer (torchvision upgrade → curobo JIT compilation path fix → checkpoint path verification), then recompiled curobo from source after pointing CUDA_HOME and CPATH at the conda environment’s internal CUDA headers directory
  • Why it matters: Resolved two critical dependency conflicts in a torch 2.7.1 environment; the evaluation script now launches successfully and runs through to the model loading stage, clearing the environment blockers for subsequent robot policy evaluation

Debugging multi-layered dependency issues in a VLA robot evaluation environment on the Tianhe server, while improving GPU monitor display logic

Today’s Tasks

Architecture & Strategy

  • 🔄 VLA eval.sh runtime environment fix — Fixed failures when running bash eval.sh place_dual_shoes demo_clean pi05_robotwin2 demo_clean 0 2: sequentially resolved a torchvision version mismatch (0.21.0 → 0.22.1) and a curobo CUDA extension ABI incompatibility (recompiled from source); one issue remains pending: the checkpoint_id=5000 path is missing (available checkpoints: 15000/25000/29999)

Implementation & Fixes

  • 🔄 gpumon.py display logic improvements — Fixed GPU monitor output exceeding terminal width (>100 columns) and duplicate process display; AI implemented global deduplication, but user corrected the requirement to show each process once per GPU — session was interrupted before the fix was fully completed

Problems & Solutions

Critical Issues

1. curobo’s pre-compiled .so file is ABI-incompatible with torch 2.7.1 (undefined symbol: torchInternalAssertFail); JIT recompilation failed because ninja and the CUDA headers were missing

Solution: Install ninja, set CUDA_HOME to the conda environment root, set CPATH to envs/RefineVLA/targets/x86_64-linux/include/, then run pip install -e . to recompile curobo from source

Key insight: When CUDA toolkit is installed via conda, headers live at envs/<name>/targets/x86_64-linux/include/, not /usr/local/cuda/include/; CUDA_HOME and CPATH must point to this path when compiling CUDA extensions
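The recompilation procedure above can be sketched as a small shell script. The env name RefineVLA and the conda root default are taken from this log and are assumptions for any other machine; the actual build commands are left as comments since they must run from a curobo source checkout:

```shell
# Rebuild curobo against the active torch ABI, assuming a conda-installed
# CUDA toolkit. CONDA_ROOT and ENV_NAME default to values from this log;
# override them for your machine.
CONDA_ROOT="${CONDA_ROOT:-$HOME/miniconda3}"
ENV_NAME="${ENV_NAME:-RefineVLA}"

export CUDA_HOME="$CONDA_ROOT/envs/$ENV_NAME"
# conda places the headers under targets/<arch>/include, not include/ directly
export CPATH="$CUDA_HOME/targets/x86_64-linux/include"

echo "CUDA_HOME=$CUDA_HOME"
echo "CPATH=$CPATH"

# Then, from the curobo source checkout:
#   pip install ninja        # the extension build needs ninja
#   rm -f src/**/*.so        # drop the ABI-incompatible binaries (path varies)
#   pip install -e .         # recompile against the installed torch
```

This is a sketch, not the exact commands from the session; the key point is that both CUDA_HOME and CPATH must resolve inside the conda env before pip invokes nvcc.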

2. torch 2.7.1+cu126 and torchvision 0.21.0 version mismatch causing torchvision::nms operator registration failure

Solution: Upgrade torchvision to 0.22.1+cu126 (pip install torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu126)

Key insight: torch 2.7.x must be paired with torchvision 0.22.x; torchvision .so files link against the torch C++ ABI and must be upgraded in sync with major torch version bumps
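As a memo of the pairing, here is a hypothetical helper (not part of any tool in this session) that maps a torch version to its matching torchvision series; only the two pairs that actually came up here are encoded, and anything else should be checked against the official compatibility matrix:

```shell
# Map a torch version to its matching torchvision minor series.
# Only the pairs relevant to this log are listed.
torchvision_for_torch() {
  case "$1" in
    2.7.*) echo "0.22" ;;
    2.6.*) echo "0.21" ;;
    *)     echo "unknown" ;;
  esac
}

torchvision_for_torch "2.7.1"   # prints 0.22
```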

General Issues

3. Checkpoint path policy/pi05/checkpoints/pi05_robotwin2/demo_clean/5000/assets/ does not exist; deploy_policy.yml defaults to checkpoint_id=5000, but available checkpoints are 15000/25000/29999

Solution: Session was interrupted before resolution; need to change checkpoint_id in deploy_policy.yml to an available value (e.g., 29999) or pass it as an eval.sh argument

Key insight: The checkpoint_id passed to eval.sh must correspond to actual training artifacts; the yml default of 5000 is just a placeholder
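One way to avoid hard-coding the id, sketched here as a hypothetical helper (the checkpoint directory is the one from this log; eval.sh / deploy_policy.yml would still need to consume the result), is to select the newest checkpoint id that actually exists on disk:

```shell
# Print the largest numeric checkpoint id found under a checkpoint directory.
latest_checkpoint() {
  ls "$1" 2>/dev/null | grep -E '^[0-9]+$' | sort -n | tail -n 1
}

# With the directory from this log, this would print 29999:
# latest_checkpoint policy/pi05/checkpoints/pi05_robotwin2/demo_clean
```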

4. gpumon.py output width exceeds terminal width; processes appear duplicated across multiple GPUs

Solution: Rewrote the process table logic to cap output at 80 columns and filter out subprocess noise from multiprocessing workers and wandb-core

Key insight: AI interpreted the requirement as global deduplication (each process appears once total), whereas the actual requirement was per-(process, GPU) deduplication (each process appears once per GPU it uses)
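The distinction can be sketched in a few lines of shell. The input format (gpu pid name, one row per process-on-GPU, e.g. parsed from nvidia-smi --query-compute-apps) is an assumption for illustration, not gpumon.py’s actual internals:

```shell
# Per-(process, GPU) deduplication: keep each (pid, gpu) pair exactly once,
# instead of collapsing a process into a single global row.
dedup_per_gpu() {
  awk '!seen[$1 FS $2]++'
}

printf '%s\n' \
  "0 1234 python" \
  "0 1234 python" \
  "1 1234 python" | dedup_per_gpu
# prints:
# 0 1234 python
# 1 1234 python
```

Global deduplication would instead key on $2 (the pid) alone, which is exactly the behavior the user rejected.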

Human Thinking vs. AI Thinking

Strategic Level

Understanding process deduplication granularity in gpumon.py

Human: Clearly distinguished the many-to-many relationship between processes and GPUs; expected each (process, GPU) pair to appear exactly once, not each process once globally
AI: Implemented global deduplication; each process appears once, with the GPU IDs of all cards it uses joined by commas (e.g., 0,1,5,7)

Analysis: AI’s approach loses the mapping between a process and its specific GPUs; the human’s requirement was to eliminate redundant rows while preserving the per-GPU breakdown

Implementation Level

Who drives dependency debugging

Human: Drove an iterative strategy (fix one error, re-run to surface the next) and repeatedly declined AI tool calls (find commands, ExitPlanMode) to keep control of the pace
AI: Performed layer-by-layer root-cause analysis and execution, identifying the full torch → torchvision → curobo dependency chain and proposing fixes

Analysis: The human controlled the pace and scope; AI provided diagnostic and implementation capability. The division of labor was clear, but the human’s repeated interventions extended the debugging cycle

AI Limitations

Significant Limitations

  • AI misunderstood the deduplication granularity requirement for gpumon.py, implementing “global per-process deduplication” instead of “per-(process, GPU) deduplication” — the error was only caught after explicit user correction

General Limitations

  • Locating the correct CUDA headers required multiple attempts (pip nvidia package path → system /usr/local/cuda → conda targets directory); AI failed to identify the right path in one shot based on the environment structure
  • When path locations were uncertain, AI defaulted to running global find searches — a behavior the user rejected; AI should instead infer paths from known environment layouts (e.g., conda env directory structure)

Today’s Takeaways

Core Takeaways

  • When CUDA toolkit is installed in a conda environment, headers are located at envs/<name>/targets/x86_64-linux/include/ (not /usr/local/cuda/include/); compiling CUDA extensions requires setting CUDA_HOME=<conda_env_root> and CPATH=<targets_include>
  • After a major torch version upgrade, all CUDA extensions that link against the torch C++ ABI (pre-compiled .so files like torchvision and curobo) must be recompiled or upgraded; torch 2.7.x corresponds to torchvision 0.22.x
  • When curobo’s pre-compiled .so is incompatible with the current torch version, deleting the .so and rebuilding from source with pip install -e . is a viable quick fix — the key is correctly configuring the CUDA compilation environment

Session Summaries

RoboBrain GPU Monitor

🔄 gpumon.py terminal width limit and process display deduplication improvements 15:48:28.705 | claude_code User showed the oversized output and duplicate process entries in gpumon.py and requested a fix. AI rewrote the process table logic, capping width at 80 columns and implementing global process deduplication. User immediately corrected the requirement: each process should appear once per GPU, not once globally. The session was interrupted and the second fix was not completed.

VLA RoboTwin pi05

🔄 eval.sh dependency chain fix: torchvision upgrade + curobo source recompile + checkpoint path issue discovered 02:34:02.614 | claude_code After successfully upgrading torchvision to 0.22.1 as planned, a curobo CUDA extension ABI incompatibility error appeared. AI installed ninja, deleted the old .so files, located the CUDA headers under the conda environment’s targets/x86_64-linux/include/ directory, and successfully compiled curobo from source after setting CUDA_HOME and CPATH. Re-running the script revealed that the checkpoint_id=5000 path does not exist (available: 15000/25000/29999); the session was interrupted while analyzing the parameter mapping.

🔄 eval.sh first error analysis: torch/torchvision version mismatch diagnosis and fix plan 02:23:18.758 | claude_code User ran eval.sh and got a torchvision::nms operator not found error. AI diagnosed it as a torch 2.7.1 / torchvision 0.21.0 version mismatch and formulated a plan to upgrade torchvision to 0.22.1. User chose the upgrade path but declined AI’s ExitPlanMode to execute directly; session ended waiting for user instructions.

Token Usage

Overview

Total Tokens: 1,970,396
Input Tokens: 62
Output Tokens: 3,342
Cache Creation: 199,634
Cache Read: 1,767,358
Cache Hit Rate: 89.9%
Total Cost (USD): $2.2153

Model Breakdown

claude-opus-4-6: Input 62, Output 3,342, Cache Creation 199,634, Cache Read 1,767,358, Cost $2.2153, Share 100.0%