Daily Journal — 2026-03-14
Today’s Overview
- What I did: Completed nvitop-style UI improvements for the GPU monitoring tool, batch MP4 visualization of HDF5 camera data, and architecture design plus code implementation for four groups of manipulation progress prediction auxiliary experiments in the pi05 model
- How I did it: Optimized the monitoring tool using an alternate terminal buffer and adaptive column widths; batch-decoded JPEG frames with OpenCV into 2×2 grids and wrote MP4s; added auxiliary MLP prediction heads, stop_gradient isolation strategy, and experiment config switches across 6 files in the JAX/Flax NNX framework
- Why it matters: GPU monitoring tool UX now matches nvitop quality; all 50 demonstration videos generated and ready for data quality inspection; pi05 four-group experiment configs are ready — training can begin as soon as the lerobot data format conversion is complete
Improved GPU monitoring tool UX, completed four-camera visualization for 50 robot demonstration episodes, and designed and implemented a four-group manipulation progress prediction auxiliary task experiment framework in the pi05 VLA model
Today’s Tasks
Architecture & Strategy
- 🔄 pi05 four-group manipulation progress prediction auxiliary experiment implementation — Implemented
Implemented `manip_progress_time`/`distance` auxiliary prediction heads in the pi05 model (four experiments: `last_token` vs `special_token` × `time` vs `distance`), with changes spanning `pi0_config.py`, `model.py`, `tokenizer.py`, `robotwin_policy.py`, `config.py`, and `pi0.py`; added `ProgressConfig` switches and four one-click experiment configs; training is blocked pending lerobot data format conversion
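The four-experiment switch described above can be sketched as a config cross product. This is a minimal illustration of the `ProgressConfig` idea, assuming the two axes named in the journal; field names other than `ProgressConfig` itself are hypothetical, not the actual pi05 code.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class ProgressConfig:
    """Hypothetical sketch of the experiment switch; names are assumptions."""
    enabled: bool = False
    feature_source: str = "last_token"  # "last_token" or "special_token"
    target: str = "time"                # "time" or "distance"
    loss_weight: float = 0.1            # auxiliary loss weight (lambda)
    stop_grad: bool = True              # isolate aux gradients from the backbone

    def __post_init__(self):
        assert self.feature_source in ("last_token", "special_token")
        assert self.target in ("time", "distance")

# The four one-click experiment configs are the 2x2 cross product:
EXPERIMENTS = {
    f"{src}_{tgt}": ProgressConfig(enabled=True, feature_source=src, target=tgt)
    for src, tgt in product(("last_token", "special_token"), ("time", "distance"))
}
```

Keeping each experiment as an immutable named config makes the four runs one-flag launches and prevents accidental mixing of axes mid-run.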
Implementation & Fixes
- ✅ Batch HDF5 camera data visualization script — Created `script/visualize_hdf5_cameras.py` to read front/head/left/right four-channel JPEG camera frames from 50 episodes under `place_dual_shoes/demo_clean/data`, assemble them into annotated 2×2 grids, and write 640×480@30FPS MP4 files; all 50 videos (~2.3MB each) generated successfully
- ✅ gpumon.py alternate buffer and adaptive layout — Added nvitop-style alternate screen buffer to the GPU monitoring script (`\033[?1049h` to enter a dedicated screen, restored on Ctrl+C exit); changed GPU and process tables to adapt width/height based on the actual terminal `COLUMNS`/`LINES`; fixed a bug where `os.get_terminal_size()` could not read the `COLUMNS` environment variable in subprocess contexts
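The 2×2 grid step of the visualization script can be sketched with plain NumPy. This is a minimal sketch of only the grid assembly, assuming four equal-size decoded frames; JPEG decoding (`cv2.imdecode`), text annotation, and MP4 writing are omitted, and the camera order is an assumption.

```python
import numpy as np

def make_2x2_grid(frames):
    """Assemble four equal-size HxWx3 frames into one 2x2 grid.

    Assumed order: [front, head, left, right] -> top row, then bottom row.
    With the 240x320 frames noted in the journal, the result is 480x640,
    matching the 640x480 MP4 output size.
    """
    assert len(frames) == 4
    top = np.hstack(frames[:2])      # front | head
    bottom = np.hstack(frames[2:])   # left  | right
    return np.vstack([top, bottom])
```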
Problems & Solutions
Critical Issues
1. pi05 training failed to start: dataset missing new fields — manip_progress_time, manip_progress_distance_left/right, target_endpose, target_joint, etc.
Solution: Modify ~/HDD_POOL/mozihao/VLA/convert_robotwin_democlean_to_lerobot.py to add the missing fields, then re-run the dataset conversion
Key insight: Verify that the data pipeline fully supports all required fields before designing training code — discovering missing data after implementation is complete wastes engineering time
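The insight above suggests a cheap schema check before any training run: verify the converted dataset actually carries every field the training code will read. The field names below are taken from the issue description; the checker itself is a hypothetical helper, not part of the repo.

```python
# Fields the pi05 auxiliary-task training code expects in each sample
# (from the issue above; "etc." items in the journal are not guessed here).
REQUIRED_FIELDS = {
    "manip_progress_time",
    "manip_progress_distance_left",
    "manip_progress_distance_right",
    "target_endpose",
    "target_joint",
}

def missing_fields(sample: dict) -> set:
    """Return the required fields absent from one dataset sample."""
    return REQUIRED_FIELDS - sample.keys()
```

Running this over one sample right after the lerobot conversion would have surfaced the gap before the six-file implementation, not at training launch.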
General Issues
2. os.get_terminal_size() cannot read COLUMNS/LINES environment variables in pipe/subprocess contexts, causing the table to not actually expand when tested at 120 columns
Solution: Modified _get_term_size() to prioritize reading COLUMNS/LINES environment variables, falling back to os.get_terminal_size() only on failure
Key insight: Terminal width detection must handle both real TTYs (interactive) and environment-variable-only contexts (pipes/scripts)
Human Thinking vs. AI Thinking
Strategic Level
Research design for the pi05 four-group experiment
| Role | Approach |
|---|---|
| Human | Proactively proposed the full four-group comparative experiment design: two feature extraction methods (last_token vs special_token), two prediction targets (time vs distance), and the specific mechanism for injecting prediction results as conditioning tokens into the action expert |
| AI | Upon receiving the experiment design, analyzed architectural feasibility and proposed engineering implementation details: MLP scale (2048→256→out), loss weight λ=0.1, stop_gradient strategy, and config switch scheme |
Analysis: Research hypotheses and experiment design were human-led; AI primarily handled architecture analysis and engineering implementation — the human contribution to core research direction was larger
Implementation Level
GPU monitoring tool UI specification
| Role | Approach |
|---|---|
| Human | Explicitly specified the nvitop-style interaction behavior (restore command window on exit) and the specific requirement for adaptive width/height |
| AI | Implemented the alternate buffer mechanism, but the initial version didn’t truly adapt table width to terminal changes — the issue was only caught during debugging |
Analysis: The human had a clear target UX in mind; the AI had gaps in implementation details (environment variable vs. TTY), requiring user testing to surface and fix
AI Limitations
Significant Limitations
- Did not proactively verify data pipeline completeness (whether fields like `manip_progress` had already been written to the lerobot dataset) before implementing the pi05 training code, resulting in a missing data format issue discovered only at training time — wasting engineering effort
General Limitations
- The initial adaptive implementation of `gpumon.py` missed the issue where `os.get_terminal_size()` cannot read `COLUMNS` in subprocess contexts; only surfaced and fixed after user testing
Today’s Takeaways
Core Insights
- Using `stop_gradient` to isolate main-task and auxiliary-task gradients in VLA auxiliary tasks is the safe starting point — first ensure the auxiliary head doesn't interfere with action prediction; if results are poor, remove the gradient stop and run a comparison experiment
- When adding auxiliary task heads in JAX/Flax NNX, training uses GT values for teacher forcing while inference uses predicted values for injection — both paths must be implemented separately in `compute_loss` and `sample_actions` with consistent interfaces
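The gradient-isolation insight above can be shown in a few lines of JAX. This is an illustrative toy, not the pi05 code: a scalar "backbone", a scalar auxiliary head, and the λ = 0.1 weighting from the journal; all names and shapes are assumptions.

```python
import jax
import jax.numpy as jnp

def total_loss(w_backbone, w_aux, x, aux_target, lam=0.1):
    """Toy combined loss: action loss plus weighted auxiliary loss.

    jax.lax.stop_gradient blocks the auxiliary loss from pushing any
    gradient into the backbone parameters, so the auxiliary head can
    only train itself.
    """
    feat = w_backbone * x                    # stand-in for the backbone features
    action_loss = jnp.sum(feat ** 2)         # stand-in for the action loss
    aux_pred = w_aux * jax.lax.stop_gradient(feat)
    aux_loss = jnp.sum((aux_pred - aux_target) ** 2)
    return action_loss + lam * aux_loss
```

Because of `stop_gradient`, `d(total_loss)/d(w_backbone)` equals the gradient of the action loss alone, which is exactly the "auxiliary head doesn't interfere with action prediction" guarantee.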
Practical Insights
- Alternate terminal buffer (`\033[?1049h` to enter, `\033[?1049l` to exit) combined with `signal.SIGINT` capture enables an nvitop-style full-screen refresh UI that automatically restores the original terminal content on exit
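The pattern above can be packaged as a context manager. This is a hypothetical helper, not the actual `gpumon.py` code: it enters the alternate buffer and guarantees the original screen content is restored on both normal exit and Ctrl+C.

```python
import signal
import sys

ENTER_ALT = "\033[?1049h"  # switch to the alternate screen buffer
LEAVE_ALT = "\033[?1049l"  # restore the original screen content

class AltScreen:
    """Enter the alternate screen buffer; always restore it on exit."""

    def __enter__(self):
        sys.stdout.write(ENTER_ALT)
        sys.stdout.flush()
        self._old = signal.signal(signal.SIGINT, self._on_sigint)
        return self

    def __exit__(self, *exc):
        sys.stdout.write(LEAVE_ALT)
        sys.stdout.flush()
        signal.signal(signal.SIGINT, self._old)
        return False  # do not swallow exceptions

    def _on_sigint(self, signum, frame):
        # Convert Ctrl+C into an exception that unwinds through __exit__,
        # so the screen is restored before the process dies.
        raise KeyboardInterrupt
```

Usage would be `with AltScreen(): render_loop()` — the `with` block guarantees `\033[?1049l` runs even when the loop is interrupted.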
Session Summaries
RoboTwin GPU Monitor
✅ gpumon.py — Added nvitop-style alternate buffer and adaptive terminal layout
09:24:23.170 | claude_code
User requested refactoring the GPU monitoring script to nvitop style: enter a dedicated screen on launch, restore on exit, and change fixed column widths to adaptive. AI implemented the \033[?1049h alternate buffer mechanism, fixed os.get_terminal_size() being ineffective in pipe environments, and changed GPU/process table widths to proportional allocation. Tested and passed at both 80 and 120 columns; process table row count also dynamically truncated based on terminal height.
RoboTwin HDF5 Visualization
✅ Implemented HDF5→MP4 batch visualization script; successfully processed all 50 episodes
13:21:34.636 | claude_code
User submitted a planning document requesting implementation. AI created script/visualize_hdf5_cameras.py, using cv2.imdecode to decode JPEG frames and assemble annotated 2×2 grids, writing 640×480@30FPS MP4 files. The script ran end-to-end over all 50 episodes; output directory contains 50 video files of ~2.3MB each, with file count and sizes verified.
🔄 HDF5 camera visualization implementation planning (Plan Mode exploration)
13:18:03.922 | claude_code
User re-initiated the visualization request. AI used an Explore agent to analyze the data collection pipeline and HDF5 file structure, confirmed the JPEG-encoded camera data format for 50 episodes (240×320, four channels), read the existing parse_hdf5.py tool to understand decoding patterns, and produced a detailed implementation plan (2×2 grid, 640×480@30FPS MP4) before exiting Plan Mode.
❌ HDF5 camera visualization request (interrupted by 403 auth expiration)
13:16:50.918 | claude_code
User requested a script to visualize HDF5 camera data. AI encountered a 403 Request not allowed error on the first file read attempt (session token expired, requiring re-login). The session was immediately interrupted; user then re-initiated the same request in a new session.
RoboTwin pi05 VLA
🔄 Architecture discussion and six-file implementation for pi05 four-group manipulation progress prediction auxiliary experiments
14:21:07.908 | claude_code
User proposed four comparative experiments (last_token vs special_token × manip_progress_time vs distance_left/right). After in-depth analysis of the JAX/Flax NNX architecture, AI proposed MLP scale (2048→256), loss weight λ=0.1, stop_gradient strategy, and config switch scheme — all confirmed by the user. AI completed the implementation across 6 files, including the ProgressConfig class, special token registration, Observation.aux_targets field, auxiliary loss computation, and four one-click experiment entry points; import and config validation tests passed. Training launch revealed the dataset is missing progress-related fields — the lerobot conversion script must be updated first.
Token Usage
Summary
| Metric | Value |
|---|---|
| Total Tokens | 18,998,065 |
| Input Tokens | 11,315 |
| Output Tokens | 60,529 |
| Cache Created | 1,403,485 |
| Cache Read | 17,522,736 |
| Cache Hit Rate | 92.6% |
| Total Cost (USD) | $13.1289 |
Model Breakdown
| Model | Input | Output | Cache Created | Cache Read | Cost | Share |
|---|---|---|---|---|---|---|
| claude-opus-4-6 | 1,635 | 27,657 | 760,379 | 12,366,827 | $11.6354 | 88.6% |
| claude-haiku-4-5-20251001 | 9,680 | 32,872 | 643,106 | 5,155,909 | $1.4935 | 11.4% |
Per-Device Usage
| Device | Total Tokens | Input | Output | Cost |
|---|---|---|---|---|
| tianhe | 7,203,350 | 5,266 | 23,595 | $5.6472 |
| TzJsDesktop | 11,794,715 | 6,049 | 36,934 | $7.4817 |