Daily Log — 2026-03-12
Today’s Overview
- What I did: Fixed torch/torchvision version mismatch and curobo CUDA extension ABI compatibility issues in the VLA/RoboTwin evaluation environment, and improved terminal width and process display logic in the GPU monitoring tool gpumon.py
- How I did it: Traced the dependency-chain errors layer by layer (torchvision upgrade → curobo JIT compilation path fix → checkpoint path verification), then recompiled curobo from source after pointing CUDA_HOME and CPATH at the conda environment's internal CUDA headers directory
- Why it matters: Resolved two critical dependency conflicts in a torch 2.7.1 environment; the evaluation script now launches successfully and runs through to the model loading stage, clearing the environment blockers for subsequent robot policy evaluation
Debugging multi-layered dependency issues in a VLA robot evaluation environment on the Tianhe server, while improving GPU monitor display logic
Today’s Tasks
Architecture & Strategy
- 🔄 VLA eval.sh runtime environment fix — Fixed failures when running `bash eval.sh place_dual_shoes demo_clean pi05_robotwin2 demo_clean 0 2`: sequentially resolved the torchvision version mismatch (0.21.0 → 0.22.1) and the curobo CUDA extension ABI incompatibility (recompiled from source); the remaining issue of the missing checkpoint_id=5000 path is pending (available checkpoints: 15000/25000/29999)
Implementation & Fixes
- 🔄 gpumon.py display logic improvements — Fixed GPU monitor output exceeding the terminal width (>100 columns) and duplicate process rows. The AI implemented global deduplication, but the user corrected the requirement to show each process once per GPU; the session was interrupted before that fix was completed
Problems & Solutions
Critical Issues
1. curobo pre-compiled .so file ABI-incompatible with torch 2.7.1 (`undefined symbol: torchInternalAssertFail`); JIT recompilation failed due to missing ninja and CUDA headers
Solution: Install ninja, set `CUDA_HOME` to the conda environment root, set `CPATH` to `envs/RefineVLA/targets/x86_64-linux/include/`, then run `pip install -e .` to recompile curobo from source
Key insight: When the CUDA toolkit is installed via conda, headers live at `envs/<name>/targets/x86_64-linux/include/`, not `/usr/local/cuda/include/`; `CUDA_HOME` and `CPATH` must point to this path when compiling CUDA extensions
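The environment setup above can be sketched as a small helper that derives the build variables from a conda env root. The function name, error handling, and fallback behavior are illustrative (not part of the session); only the `targets/x86_64-linux/include/` layout comes from the log:

```python
import os

def cuda_build_env(conda_env_root: str) -> dict:
    """Derive CUDA_HOME/CPATH for compiling CUDA extensions against a
    conda-installed CUDA toolkit (hypothetical helper, not from curobo)."""
    include_dir = os.path.join(conda_env_root, "targets", "x86_64-linux", "include")
    if not os.path.isdir(include_dir):
        raise FileNotFoundError(
            f"no conda CUDA headers at {include_dir}; "
            "check that the CUDA toolkit is installed in this env"
        )
    # nvcc and the runtime libs sit under the env root itself, so
    # CUDA_HOME points at the env root rather than at targets/.
    return {"CUDA_HOME": conda_env_root, "CPATH": include_dir}

# Usage sketch (before running `pip install -e .` in the curobo checkout):
# os.environ.update(cuda_build_env("/path/to/envs/RefineVLA"))
```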
2. torch 2.7.1+cu126 and torchvision 0.21.0 version mismatch causing torchvision::nms operator registration failure
Solution: Upgrade torchvision to 0.22.1+cu126 (`pip install torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu126`)
Key insight: torch 2.7.x must be paired with torchvision 0.22.x; torchvision's .so files link against the torch C++ ABI and must be upgraded in sync with torch minor-version bumps
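A quick preflight check can catch this pairing problem before the `torchvision::nms` error surfaces at import time. The minor-version table below covers only recent releases and is an assumption drawn from the published torch/torchvision compatibility matrix:

```python
# Hypothetical preflight check; the pairing table is an assumption based
# on the torch/torchvision compatibility matrix and is not exhaustive.
TORCH_TO_TORCHVISION_MINOR = {
    (2, 5): (0, 20),
    (2, 6): (0, 21),
    (2, 7): (0, 22),
}

def check_pairing(torch_version: str, torchvision_version: str) -> bool:
    """Return True if the torch/torchvision minor versions are paired.

    Version strings like '2.7.1+cu126' are reduced to (major, minor);
    the local build suffix after '+' is ignored.
    """
    def minor(v: str) -> tuple:
        parts = v.split("+")[0].split(".")
        return (int(parts[0]), int(parts[1]))
    expected = TORCH_TO_TORCHVISION_MINOR.get(minor(torch_version))
    return expected == minor(torchvision_version)

print(check_pairing("2.7.1+cu126", "0.21.0+cu126"))  # → False (today's mismatch)
print(check_pairing("2.7.1+cu126", "0.22.1+cu126"))  # → True (the fixed pairing)
```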
General Issues
3. Checkpoint path `policy/pi05/checkpoints/pi05_robotwin2/demo_clean/5000/assets/` does not exist; `deploy_policy.yml` defaults to checkpoint_id=5000, but the available checkpoints are 15000/25000/29999
Solution: Session was interrupted before resolution; need to change checkpoint_id in deploy_policy.yml to an available value (e.g., 29999) or pass it as an eval.sh argument
Key insight: The checkpoint_id passed to eval.sh must correspond to actual training artifacts; the yml default of 5000 is just a placeholder
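A defensive version of that lookup might list the actual training artifacts and fall back when the configured id is absent. The directory layout follows the `checkpoints/<policy>/<task>/<id>/` pattern above; the helper name and fallback policy are illustrative, not part of the eval.sh pipeline:

```python
import os

def resolve_checkpoint_id(ckpt_root: str, requested_id: str) -> str:
    """Return requested_id if its checkpoint directory exists, else fall
    back to the highest-numbered available checkpoint (illustrative)."""
    available = sorted(
        (d for d in os.listdir(ckpt_root)
         if os.path.isdir(os.path.join(ckpt_root, d)) and d.isdigit()),
        key=int,
    )
    if requested_id in available:
        return requested_id
    if not available:
        raise FileNotFoundError(f"no checkpoints under {ckpt_root}")
    # e.g. a stale yml default of 5000 falls back to the latest artifact
    print(f"checkpoint {requested_id} missing; falling back to {available[-1]}")
    return available[-1]
```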
4. gpumon.py output width exceeds terminal width; processes appear duplicated across multiple GPUs
Solution: Rewrote the process table logic to cap output at 80 columns and filter out subprocess noise from multiprocessing workers and wandb-core
Key insight: AI interpreted the requirement as global deduplication (each process appears once total), whereas the actual requirement was per-(process, GPU) deduplication (each process appears once per GPU it uses)
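The corrected requirement amounts to deduplication keyed on (pid, GPU) pairs rather than on pid alone. A minimal sketch, with made-up sample rows standing in for whatever gpumon.py collects per compute app:

```python
def dedupe_rows(rows):
    """Keep one row per (pid, gpu) pair, preserving input order.

    Global dedup (keying on pid alone) would collapse a process that
    runs on several GPUs into one row and lose the per-GPU breakdown.
    """
    seen = set()
    out = []
    for row in rows:
        key = (row["pid"], row["gpu"])
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

# Made-up sample: pid 1234 runs on GPUs 0 and 1, and GPU 0 reports it
# twice, which is the kind of duplication the monitor was showing.
rows = [
    {"pid": 1234, "gpu": 0, "mem": "11.2G"},
    {"pid": 1234, "gpu": 0, "mem": "11.2G"},  # duplicate on the same GPU: drop
    {"pid": 1234, "gpu": 1, "mem": "10.8G"},  # same pid, different GPU: keep
    {"pid": 5678, "gpu": 1, "mem": "2.1G"},
]
print(len(dedupe_rows(rows)))  # → 3
```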
Human Thinking vs. AI Thinking
Strategic Level
Understanding process deduplication granularity in gpumon.py
| Role | Approach |
|---|---|
| Human | The human clearly distinguished the many-to-many relationship between processes and GPUs: expected each (process, GPU) pair to appear exactly once, not each process globally once |
| AI | AI implemented global deduplication — each process appears once, with GPU IDs for all used cards joined by commas (e.g., 0,1,5,7) |
Analysis: AI’s approach loses the mapping between a process and its specific GPUs; the human’s requirement was to eliminate redundant rows while preserving the per-GPU breakdown
Implementation Level
Who drives dependency debugging
| Role | Approach |
|---|---|
| Human | Human drove an iterative strategy: fix one error, re-run to observe the next, and repeatedly declined AI tool calls (find commands, ExitPlanMode) to stay in control of the pace |
| AI | AI handled root-cause analysis and technical execution layer by layer, identifying the full torch→torchvision→curobo dependency chain and proposing fixes |
Analysis: The human controlled the pace and scope; AI provided diagnostic and implementation capability. The division of labor was clear, but the human’s repeated interventions extended the debugging cycle
AI Limitations
Significant Limitations
- AI misunderstood the deduplication granularity requirement for gpumon.py, implementing “global per-process deduplication” instead of “per-(process, GPU) deduplication” — the error was only caught after explicit user correction
General Limitations
- Locating the correct CUDA headers required multiple attempts (pip nvidia package path → system `/usr/local/cuda` → conda targets directory); the AI failed to identify the right path in one shot from the environment structure
- When path locations were uncertain, the AI defaulted to running global `find` searches — a behavior the user rejected; the AI should instead infer paths from known environment layouts (e.g., the conda env directory structure)
Today’s Takeaways
Core Takeaways
- When the CUDA toolkit is installed in a conda environment, headers are located at `envs/<name>/targets/x86_64-linux/include/` (not `/usr/local/cuda/include/`); compiling CUDA extensions requires setting `CUDA_HOME=<conda_env_root>` and `CPATH=<targets_include>`
- After a major torch version upgrade, all CUDA extensions that link against the torch C++ ABI (pre-compiled .so files such as torchvision and curobo) must be recompiled or upgraded; torch 2.7.x pairs with torchvision 0.22.x
- When curobo's pre-compiled .so is incompatible with the current torch version, deleting the .so and rebuilding from source with `pip install -e .` is a viable quick fix — the key is correctly configuring the CUDA compilation environment
Session Summaries
RoboBrain GPU Monitor
🔄 gpumon.py terminal width limit and process display deduplication improvements
15:48:28.705 | claude_code
User showed the oversized output and duplicate process entries in gpumon.py and requested a fix. AI rewrote the process-table logic, capping width at 80 columns and implementing global process deduplication. The user immediately corrected the requirement: each process should appear once per GPU, not once globally. The session was interrupted and the second fix was not completed.
VLA RoboTwin pi05
🔄 eval.sh dependency chain fix: torchvision upgrade + curobo source recompile + checkpoint path issue discovered
02:34:02.614 | claude_code
After successfully upgrading torchvision to 0.22.1 as planned, a curobo CUDA extension ABI incompatibility error appeared. AI installed ninja, deleted the old .so files, located the CUDA headers under the conda environment’s targets/x86_64-linux/include/ directory, and successfully compiled curobo from source after setting CUDA_HOME+CPATH. Re-running the script revealed that the checkpoint_id=5000 path does not exist (available: 15000/25000/29999); the session was interrupted while analyzing the parameter mapping.
🔄 eval.sh first error analysis: torch/torchvision version mismatch diagnosis and fix plan
02:23:18.758 | claude_code
User ran eval.sh and got a torchvision::nms operator not found error. AI diagnosed it as a torch 2.7.1 / torchvision 0.21.0 version mismatch and formulated a plan to upgrade torchvision to 0.22.1. User chose the upgrade path but declined AI’s ExitPlanMode to execute directly; session ended waiting for user instructions.
Token Usage
Overview
| Metric | Value |
|---|---|
| Total Tokens | 1,970,396 |
| Input Tokens | 62 |
| Output Tokens | 3,342 |
| Cache Creation | 199,634 |
| Cache Read | 1,767,358 |
| Cache Hit Rate | 89.9% |
| Total Cost (USD) | $2.2153 |
Model Breakdown
| Model | Input | Output | Cache Creation | Cache Read | Cost | Share |
|---|---|---|---|---|---|---|
| claude-opus-4-6 | 62 | 3,342 | 199,634 | 1,767,358 | $2.2153 | 100.0% |