Weekly Report — 2026-W11 (2026-03-09 ~ 2026-03-15)

This week, six parallel workstreams advanced across three machines (DCC, tianhe, TzJsDesktop): ①MIHD spatial transcriptomics uncovered a fundamental methodological flaw in cross-sample embedding (per-section independent processing yields incomparable feature spaces) and initiated a fix; ②ErrorRecoveryBenchmark progressed from bug fixes to a 13-skill/29-subtype scale-up, solved the Drop-skill object-not-falling issue, exposed the limitations of the online quota architecture, and established offline injection as the new direction; ③VLA-RoboTwin/pi05 made end-to-end progress from environment setup and training performance optimization (JAX version alignment, +33% speedup) to new data-variable collection and auxiliary-task experiments; ④the gadget toolchain completed an architectural upgrade (MCP Server + common/ shared package + unified output directory), and the research profiler moved to homepage-first student discovery; ⑤CalendarPro completed a 7-phase comprehensive optimization with all 230 tests passing and token consumption reduced by 40–60%; ⑥the gadget research toolchain integrated citation-graph analysis and produced deep profiles for 7 embodied-AI researchers.

Weekly Overview

| Metric | Value |
| --- | --- |
| Date Range | 2026-03-09 ~ 2026-03-15 |
| Active Days | 6 / 7 |
| Total Conversations | 29 |
| Projects Involved | 19 |
| Tasks Completed | 36 |
| Tasks In Progress | 10 |
| Total Tokens | 309,110,118 |
| Total Cost | $227.47 |
| Daily Avg Cost | $32.50 |

Project Progress

VLA-RoboTwin/pi05 (6 days active) — 🔄 active

Completed:

  • Successfully converted 50 RoboTwin episodes to LeRobot format (11,459 frames)
  • Diagnosed the 33% training time gap between pi05 and openpi; upgraded 6 key dependencies including JAX 0.5.0→0.5.3, compressing expected training time from 20h to 15h
  • Completed full end-to-end fix of the eval.sh runtime environment: upgraded torchvision to 0.22.1 and set conda CUDA_HOME, recompiled curobo from source to resolve ABI incompatibility
  • Implemented 5 new data variables for Place Dual Shoes (manip_progress_time/distance_left/right, target_endpose, target_joint), using a post-processing architecture that backpatches pickles after move() to resolve future-state dependencies
  • Designed and implemented four groups of manipulation progress prediction auxiliary experiments across 6 files (last_token vs special_token × time vs distance) under the JAX/Flax NNX framework; added stop_gradient isolation and ProgressConfig toggle
  • Fixed CheckpointWeightLoader missing_regex configurability and pi0.py LeRobot shape squeeze issues; training step-100 action_loss/aux_loss curves show normal descent

Blockers:

  • ⚠️ All four auxiliary experiment groups are blocked because the LeRobot dataset does not include the new fields; dataset must be re-converted
  • ⚠️ eval.sh defaults to checkpoint_id=5000 which does not exist; needs correction to an available value (15000/25000/29999)

ErrorRecoveryBenchmark (4 days active) — 🔄 active

Completed:

  • Fixed two critical bugs: discarded return value in monitor.update() and taxonomy label mapping; re-annotated 1,029 historical scenarios
  • Solved the Drop skill object-not-falling issue: calling mujoco.mj_step() for 15 physics steps bypasses OSC controller interference
  • Fixed 5 systematically failing skills (3 drop variants + grasp_misalignment + trajectory_regression + wrong_object); all 105 unit tests passing
  • Semantically split E2 Drop into 3 independent skills by recovery strategy (drop_in_transit / drop_at_wrong_place / drop_with_interaction), expanding the benchmark to 13 skills / 29 subtypes
  • Fixed Stack body name parsing silent failure; generated MP4 demo videos for 11 demo skills; completed v4 code archival
  • Completed v5.1 architecture planning (InjectionEngine refactor + speed limits + human demo collection pipeline); established milestone of beginning recovery training before April 1
  • v5 full run generated 231 scenarios and MP4s; first D0 round generated 207 scenarios

Blockers:

  • ⚠️ D0 scenario generation is still short of the 600-scenario target; 5 fixed root causes need re-validation
  • ⚠️ Coffee machine part disassembly (lid floating, base displaced) kinematic tree diagnosis is incomplete
  • ⚠️ v5.1 offline injection architecture implementation has not started

MIHD (Spatial Transcriptomics) (3 days active) — 🔄 active

Completed:

  • Completed 151673↔151508 cross-sample RM-IDEAL benchmark; Layer_1/5 positive correlation (r≤0.66), Layer_3 negative correlation reveals fusion embedding layer specificity
  • Implemented CrossModalEnhancer module (spatial neighbor KV sequence construction + symmetric InfoNCE); CPU-side three-mode tests passing
  • Worked around RTX 2080 Ti cuBLAS large-tensor bug (project to hidden_dim first before aggregating neighbors + mini-batch contrastive loss)
  • scGPT literature review confirmed zero-shot underperforms PCA/scVI, providing strategic evidence for gene encoder selection
  • Completed major MIHD output directory restructure (all 14+ file path references updated)
  • Identified fundamental methodological flaw in cross-sample embedding and initiated raw_shared shared HVG intersection (1,137 genes) baseline fix

Blockers:

  • ⚠️ 151676 STAIG embedding is all-zero (model collapse); GPU retraining failed due to PyTorch 2.9.0 + PyG CUDA conflict; cross-section visualization blocked
  • ⚠️ raw_shared embedding diagnosis still running; CrossModalEnhancer full GPU pipeline evaluation incomplete

gadget Toolchain (2 days active) — 🔄 active

Completed:

  • Wrapped 9 MCP tools using FastMCP + capture_stdout + asyncio.to_thread; refactored to content-return pattern (save parameter controls file writing)
  • Enhanced research_scout logging system (RotatingFileHandler dual output); added bioRxiv/PubMed multi-source support with zero new dependencies
  • Created 6 new common/ modules eliminating ~500 lines of duplicate code; paths.py unifies 6 path constants; .gitignore simplified to single-line outputs/
  • Implemented Homepage-Based student discovery (4-phase strategy: homepage-first + co-authorship supplement); completed deep profiles for 7 embodied AI researchers
  • Integrated research_scout.py as unified CLI entry (profile/citations subcommands); integrated Semantic Scholar citation graph API; added Hugo research section

Blockers:

  • ⚠️ Hugo deployment of 7 researcher profiles not yet completed
  • ⚠️ LLM-generated Chinese long-form JSON quote pollution issue unresolved

CalendarPro (2 days active) — ✅ completed

Completed:

  • Implemented gadget integration layer (ResearchScoutTool + DailySummaryTool + conda run cross-environment); auto-triggered at 8AM/11PM daily; 13 unit tests passing
  • Completed 7-phase comprehensive optimization (confidence threshold, hybrid routing, prompt simplification + Chinese token correction, exponential backoff, configurable scheduling weights, automatic threshold tuning, ThoughtStore cache)
  • Fixed 4 real misclassification scenarios; prompt token consumption reduced by 40–60%; all 230 tests passing

UniVLA/CALVIN Evaluation (2 days active) — 🔄 active

Completed:

  • Completed CALVIN dependency chain analysis (4 issues located); found evaluation is purely online simulation; extracted eval-only files (1.3GB → 600KB)
  • Added --single_gpu mode to bypass torchrun/DDP; fixed multiple hardcoded paths; installed braceexpand dependency

Blockers:

  • ⚠️ Full evaluation script pipeline not yet validated; still iterating through debugging

Key Tasks

  • CalendarPro 7-Phase Comprehensive Optimization (2026-03-15) — Implemented semantic routing confidence threshold, hybrid routing (Dense 70% + Keyword 30%), prompt simplification (530 lines → base + 11 fragments) + Chinese token correction (×1.5/character), exponential backoff retry, configurable scheduling weights, automatic threshold tuning feedback loop, ThoughtStore memory cache; fixed 4 real misclassification scenarios; token consumption reduced 40–60%; all 230 tests passing
  • gadget Research Toolchain CLI Integration + Citation Graph + Deep Profiles for 7 Researchers (2026-03-15) — Unified paper scout and researcher profiler under research_scout.py as a single CLI; added Semantic Scholar citation graph API (three-stage report auto-runs citation analysis on top-5 papers); completed deep profiles for Mingyu Ding / Ruoshi Liu / Xiaolong Wang / Shuran Song / Yunzhu Li / Yuke Zhu / Chelsea Finn; identified complete advisor relationship networks
  • 🔄 ErrorRecoveryBenchmark v5 Comprehensive Fix and Scale-Up to 13 Skills/29 Subtypes (2026-03-15) — Fixed 5 systematically failing skills; split E2 into 3 semantically independent skills; completed v4 archival; v5 full run generated 231 scenarios; first D0 round generated 207 scenarios (target: 600); completed v5.1 architecture planning (InjectionEngine + speed limits + human demo collection; recovery training to begin before April 1)
  • gadget common/ Shared Package Extraction + outputs/ Unified Directory Restructure (2026-03-15) — Created 6 new common/ modules (io/cache/json_utils/llm/hugo); eliminated ~500 lines of duplicate LLM call and JSON parsing code; paths.py unifies 6 path constants; .gitignore simplified to single-line outputs/; updated 4 CLAUDE.md files
  • gadget MCP Server Design, Implementation, and Tool Content-Return Refactor (2026-03-09) — Wrapped 9 MCP tools using FastMCP + capture_stdout + asyncio.to_thread; refactored from 'write file and return path' to 'return full content + optional save parameter'; established pip install -e . + console entry point distribution; all tools validated
  • 🔄 MIHD Cross-Sample Embedding Methodology Diagnosis and Fix (2026-03-15) — Identified dual incomparability from per-section independent HVG selection + independent PCA fitting; invalidated the false conclusion that ‘PCA outperforms STAIG = weak input features’; initiated raw_shared baseline with shared HVG intersection (1,137 genes); discovered STAIG’s layer-specific pattern: Layer_1/5 (SL@50=0.94–1.0) vs complete failure in intermediate layers
  • pi05 Training Performance Optimization: JAX Version Alignment + Dependency Conflict Resolution (2026-03-11) — Used parallel sub-agents to compare pyproject.toml/uv.lock/wandb logs; identified JAX version gap (0.5.0 vs 0.5.3) as root cause of 33% slower training due to accumulated XLA compiler optimizations; aligned 6 key dependencies; used uv override-dependencies to resolve lerobot torch<2.7 version constraint conflict; successfully completed uv lock (305 packages)
  • 🔄 pi05 Four-Group Manipulation Progress Prediction Auxiliary Experiment Design and Implementation (2026-03-14) — Implemented manip_progress auxiliary prediction head across 6 files in JAX/Flax NNX framework (last_token vs special_token × time vs distance); added stop_gradient isolation and ProgressConfig toggle; fixed CheckpointWeightLoader and LeRobot shape issues; training step-100 loss curves show normal descent
  • ErrorRecoveryBenchmark v5.1 Architecture Planning (2026-03-15) — Refactored ContextReplayEngine into InjectionEngine (direct recovery by injecting sim state at the target frame, bypassing VLA’s no-context-window assumption); added motion speed limits; designed keyboard teleoperation human demo collection pipeline; limited data source to MimicGen demos; established phased implementation plan for March 16–31
  • RoboTwin New Data Variable Post-Processing Collection Architecture (2026-03-13) — Used post-processing approach of backpatching pickles after move() to implement 5 new variables; resolved target_endpose/target_joint dependency on future states; fixed negative manip_progress_distance (np.clip to [0,1]); pkl2hdf5.py generic recursive design requires no modification
  • 🔄 VLA eval.sh Runtime Environment Full End-to-End Fix (2026-03-12) — Upgraded torchvision 0.22.1+cu126 to fix nms operator mismatch; set CUDA_HOME to conda targets directory and recompiled curobo from source to resolve ABI incompatibility; remaining issue: checkpoint_id=5000 path does not exist
  • gadget Homepage-Based Student Discovery Strategy Implementation (2026-03-15) — Implemented homepage_discovery.py module (~200 lines); 4-phase discovery strategy (homepage-first + co-authorship supplement); multi-strategy URL discovery (S2 homepage field + LLM suggestion + --homepage parameter); HTMLParser text extraction; 2MB limit + 7-day cache TTL; resolved the fundamental limitation of S2 co-authorship analysis failing completely for top-tier researchers

Problems and Solutions

1. Drop Skill: OSC controller actively maintains EEF position during env.step() (impedance control), causing the object to be held by fingers after the gripper opens and unable to fall freely [ErrorRecoveryBenchmark] (2026-03-15)

Solution: Bypass the controller by directly setting MuJoCo qpos/qvel, then call mujoco.mj_step() for 15 physics steps to complete initial separation before entering the standard control loop

2. MIHD Cross-Sample Embedding Comparison Invalid: per-section independent HVG selection + independent PCA fitting causes incomparable feature spaces; conclusion that ‘PCA outperforms STAIG’ is a methodological error [MIHD] (2026-03-15)

Solution: Switch to the raw_shared approach using shared HVG intersection (1,137 genes) + unified processing as the correct baseline; load directly from raw HDF5 rather than relying on per-section cache (which has a var_names integer-conversion bug)
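The raw_shared idea can be sketched with synthetic data (gene names, counts, and the variance-based HVG proxy below are illustrative; the real pipeline presumably uses scanpy-style HVG selection on the raw HDF5): select HVGs per section, intersect, and only then fit a single embedding over the shared columns.

```python
# Minimal sketch: per-section HVG selection followed by intersection, so
# every section lives in the SAME feature space before PCA/embedding.
import numpy as np

rng = np.random.default_rng(0)
genes = np.array([f"g{i}" for i in range(200)])
sections = {name: rng.poisson(1.0, size=(300, 200)).astype(float)
            for name in ("151673", "151508")}   # synthetic stand-ins

def top_hvgs(X, gene_names, n=120):
    # crude HVG proxy: rank genes by variance (real pipelines use
    # dispersion-based selection, e.g. scanpy's highly_variable_genes)
    order = np.argsort(X.var(axis=0))[::-1]
    return set(gene_names[order[:n]])

shared = set(genes)
for X in sections.values():
    shared &= top_hvgs(X, genes)                 # shared HVG intersection
shared = sorted(shared)
idx = np.array([np.where(genes == g)[0][0] for g in shared])

# All sections now share one column space; a PCA fit on the concatenation
# (or a frozen pretrained encoder) yields comparable embeddings.
stacked = np.concatenate([X[:, idx] for X in sections.values()], axis=0)
print(len(shared), stacked.shape)
```

Fitting PCA per section, by contrast, produces bases that are not even aligned in sign or order, which is exactly the dual incomparability the diagnosis identified.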

3. Stack Body Name Parsing Silent Failure: stack.yaml uses cubeA/cubeB, but the actual MuJoCo names are cubeA_main etc.; _sim_body_name2id returns -1, and Python negative indexing then silently reads the last body, so all task phase detection is misidentified as pre_reach [ErrorRecoveryBenchmark] (2026-03-15)

Solution: Fixed body name fields; added _main/_body0 suffix fallback logic in _sim_body_name2id; lookup failures now emit WARNING instead of silently returning -1
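A sketch of the suffix-fallback lookup (function and variable names here are illustrative, not the benchmark's actual code): try the YAML name as-is, then with the _main/_body0 suffixes, and emit a WARNING rather than silently returning -1, which Python would happily treat as a valid (last) index.

```python
# Suffix-fallback body lookup with loud failure instead of a silent -1.
import logging

logger = logging.getLogger("phase_detector")

def sim_body_name2id(name2id: dict, name: str) -> int:
    for candidate in (name, f"{name}_main", f"{name}_body0"):
        if candidate in name2id:
            return name2id[candidate]
    logger.warning("body %r not found (no suffix fallback matched)", name)
    return -1  # callers must check for -1 BEFORE indexing body_xpos

names = {"cubeA_main": 3, "cubeB_main": 4}
print(sim_body_name2id(names, "cubeA"), sim_body_name2id(names, "missing"))
```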

4. pi05 training 33% slower than openpi (20h vs 15h); intuition pointed to hardware differences, root cause unclear [VLA-RoboTwin/pi05] (2026-03-11)

Solution: Used parallel sub-agents to compare software layers (pyproject.toml/uv.lock/wandb logs); identified JAX version gap (0.5.0 vs 0.5.3) as root cause, with accumulated XLA compiler optimizations; used uv override-dependencies to resolve lerobot torch version upper-bound constraint conflict
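The override mechanism is a pyproject.toml setting; a hedged fragment of the kind involved (the exact pin below is illustrative, not the project's actual lockfile state):

```toml
# uv's override-dependencies replaces ALL transitive requirements for the
# listed packages, so lerobot's torch<2.7 upper bound no longer blocks the
# resolver. (Exact version pin is illustrative.)
[tool.uv]
override-dependencies = [
    "torch==2.7.1",
]
```

Unlike a constraint, an override is applied unconditionally, which is why it works for hard upper bounds declared by third-party packages.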

5. curobo precompiled .so ABI-incompatible with torch 2.7.1 (undefined symbol); JIT recompilation failed because conda CUDA header path is non-standard [VLA-RoboTwin] (2026-03-12)

Solution: Set CUDA_HOME to conda environment root, CPATH to targets/x86_64-linux/include/, then pip install -e . to recompile from source

6. Online quota generation severely imbalanced: premature_release naturally captured 7,233 entries, 7 types completely at zero; strategy behavior distribution uncontrollable [ErrorRecoveryBenchmark] (2026-03-09)

Solution: Established offline injection architecture: first do complete rollouts to collect trajectories, offline-detect injectable points to build an index, then selectively inject according to quota; skip already-satisfied types
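The quota logic decouples cleanly from rollout collection; a minimal sketch (data structures and names are hypothetical): index every injectable point per error type offline, then draw from the index until each type's quota is met, skipping types already satisfied.

```python
# Offline injection selection: precise per-type quota control, independent
# of how often each error type occurs naturally during rollouts.
from collections import defaultdict

def build_index(rollout_points):
    index = defaultdict(list)
    for etype, frame in rollout_points:    # (error_type, frame_id) pairs
        index[etype].append(frame)
    return index

def select_injections(index, quota):
    selected, counts = [], defaultdict(int)
    for etype, frames in index.items():
        for frame in frames:
            if counts[etype] >= quota:     # skip already-satisfied types
                break
            selected.append((etype, frame))
            counts[etype] += 1
    return selected

# Toy imbalance mirroring the problem: one type dominates, another is rare.
points = ([("premature_release", i) for i in range(50)]
          + [("drop_in_transit", i) for i in range(3)])
chosen = select_injections(build_index(points), quota=5)
print(len(chosen))  # 5 premature_release + 3 drop_in_transit = 8
```

Under the online scheme, the dominant type would consume nearly all captures; here it is capped at its quota regardless of how often it occurs.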

7. CalendarPro intent misclassification: no confidence threshold (0.52 treated as valid), time expressions misrouted by keyword router, short confirmation words lack context understanding, Chinese token estimation off by 3× [CalendarPro] (2026-03-15)

Solution: Added per-route confidence thresholds (0.40–0.60); introduced keyword scorer with 70/30 embedding hybrid routing; split system prompt into base + 11 fragments injected on demand; switched Chinese token estimate to ×1.5
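The hybrid router reduces to a few lines; a hedged sketch (scores below are toy stand-ins for the real embedding and keyword scorers, and route names are invented):

```python
# 70/30 embedding+keyword hybrid routing with per-route confidence
# thresholds and an LLM fallback for the "uncertain" case.
def route(embed_scores, keyword_scores, thresholds, default="llm_fallback"):
    best_route, best = None, -1.0
    for name in embed_scores:
        score = 0.7 * embed_scores[name] + 0.3 * keyword_scores.get(name, 0.0)
        if score > best:
            best_route, best = name, score
    # Nearest-neighbor always returns SOMETHING; the threshold lets the
    # router say "not confident" and defer to the LLM instead.
    if best < thresholds.get(best_route, 0.5):
        return default
    return best_route

print(route({"create_event": 0.52, "query": 0.31},
            {"create_event": 0.9},
            thresholds={"create_event": 0.55}))
```

With the keyword signal added, the 0.52 embedding score (previously treated as valid on its own) clears its route's threshold; without it, the same input would fall back to the LLM.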

8. S2 co-authorship analysis completely fails for top-tier researchers (Levine/Abbeel/Finn etc.) (depth-2 all empty); Xiaolong Wang/Shuran Song have severe same-name ambiguity [gadget] (2026-03-15)

Solution: Refactored to homepage-first strategy: prioritize scraping student lists from professors’ personal pages, with co-authorship as supplementary; multi-strategy URL discovery; same-name ambiguity flagged with WARNING recommending use of S2 authorId for precise lookup

9. VLA context replay architecture assumption incorrect: designed a full N-1 frame replay mechanism, but most VLAs have no context window, making this overhead useless [ErrorRecoveryBenchmark] (2026-03-15)

Solution: Refactored ContextReplayEngine into InjectionEngine that directly restores sim state at the injection frame; limited data source to MimicGen demo data for better controllability

10. RTX 2080 Ti + PyTorch 2.9.0 triggers cuBLAS CUBLAS_STATUS_EXECUTION_FAILED for high-dimensional tensors with N>3500 [MIHD] (2026-03-09)

Solution: First project to hidden_dim (128) with a Linear layer before indexing neighbors (avoids high-dimensional large tensors entering cuBLAS); switched InfoNCE to mini-batch contrastive loss (batch_size=512)
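The shape arithmetic behind the workaround, sketched with illustrative dimensions (the module structure of the real CrossModalEnhancer is not reproduced here): projecting before the neighbor gather means the large (N, k, ·) tensor carries 128 channels instead of the full input dimension, so it never reaches a cuBLAS matmul at the problematic size.

```python
# Project each spot to hidden_dim BEFORE gathering k spatial neighbors,
# keeping high-dimensional large tensors out of cuBLAS on the 2080 Ti.
import torch

N, k, input_dim, hidden_dim = 4000, 6, 3000, 128
x = torch.randn(N, input_dim)
neighbor_idx = torch.randint(0, N, (N, k))
proj = torch.nn.Linear(input_dim, hidden_dim)

h = proj(x)                  # (N, 128): one well-supported 2-D matmul
neighbors = h[neighbor_idx]  # (N, k, 128) instead of (N, k, 3000)
print(neighbors.shape)
```

The mini-batch InfoNCE change is complementary: computing the contrastive loss over batches of 512 keeps the similarity matrix at (512, 512) rather than (N, N).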

11. MCP Server tools write file and return path; AI cannot directly consume the content [gadget] (2026-03-09)

Solution: Refactored tools to bypass cmd_* wrappers and directly call underlying functions, returning full content (markdown/JSON); file writing controlled by a save parameter
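The content-return pattern, stripped of the FastMCP wrapper (function name and output layout below are hypothetical): the return value is the content itself, and file writing becomes an opt-in side effect.

```python
# 'Return full content, save optionally': the AI consumes the return value
# directly; the file on disk is a side effect controlled by `save`.
from pathlib import Path

def profile_tool(name: str, save: bool = False, out_dir: str = "outputs") -> str:
    content = f"# Profile: {name}\n\n(generated markdown...)\n"
    if save:
        path = Path(out_dir) / f"{name}.md"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
    return content

print(profile_tool("demo").splitlines()[0])
```

Under the old 'write file and return path' pattern the model had to issue a second read to see its own tool output; returning content eliminates that round trip.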

12. pi0.py made incorrect assumptions about LeRobot internal behavior: inferred shape=(1,) features maintain (b,1) shape and modified code accordingly; actual LeRobot DataLoader squeezes to (b,) causing shape mismatch during training [VLA-RoboTwin/pi05] (2026-03-15)

Solution: Confirmed true shape by actually running training and observing logs (‘aux_targets[…]: (32,)@float32’); reverted original [:, None] and jnp.stack operations
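The squeeze behavior is easy to reproduce in isolation (numpy stand-in for the loader; the real check was done against training logs): stacking scalar features yields (batch,), not (batch, 1), so adding [:, None] on the assumption of a kept singleton axis produces the mismatch.

```python
# Why the [:, None] had to be reverted: scalar features come out of the
# loader as (batch,), not (batch, 1). Verify shapes at runtime, not by
# reading code.
import numpy as np

batch = np.stack([np.float32(0.7)] * 32)   # loader-style batch of scalars
assert batch.shape == (32,), batch.shape   # NOT (32, 1)

target = batch                             # use as-is; no [:, None] needed
print(batch.shape, target.dtype)
```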

Lessons Learned

Architecture

  • Cross-sample embedding comparison requires a shared feature space as a prerequisite: per-section independent HVG selection + independent PCA fitting = dual incomparability. A valid baseline must use shared HVG intersection + joint processing, or a foundation model with fixed pretrained weights
  • Direct state manipulation in MuJoCo fundamentally conflicts with feedback controllers (OSC): sim.forward() only updates kinematics; mujoco.mj_step() advances dynamics and bypasses the controller. Simulation injection design must explicitly choose one path
  • Error type semantic splitting should be based on ‘whether recovery strategies differ,’ not ‘whether injection mechanisms differ’: drop_in_transit / drop_at_wrong_place / drop_with_interaction have completely different detection conditions and recovery logic; even if injection actions are identical, they must be modeled separately
  • Semantic router architectural flaw: embedding nearest-neighbor always produces a result and cannot express ‘uncertain.’ Confidence threshold + fallback LLM + keyword scorer hybrid is the most practical fix pattern, generalizable to all vector-retrieval-based classification systems (RAG routing, tool selection, etc.)
  • MCP tools should prioritize AI consumption: return full content, with file writing as an optional side effect. General benchmarks should not assume models have a context window; InjectionEngine that directly restores sim state is more generalizable than context replay
  • For top-tier researchers (500+ papers), S2 co-authorship frequency analysis cannot identify students — the first-author signal is diluted by a massive number of collaborators. Professors’ personal pages explicitly list students, with reliability an order of magnitude higher. Citation graph (forward + backward) is a core feature of a research toolchain; ‘relevance’ should be decoupled from ‘citation count/popularity’
  • Offline injection architecture is better suited for building balanced error scenario datasets than online quota systems: decoupling 'exploring injectability' from 'executing injection' enables precise control of each error type's count; online natural capture is heavily influenced by policy behavior distribution and cannot control type balance

Debugging

  • A minor JAX version upgrade (0.5.0→0.5.3) can bring ~33% training speedup; the cumulative effect of XLA compiler optimizations should not be ignored. uv override-dependencies can forcibly ignore transitive dependency version constraints, an effective tool for resolving third-party library version conflicts
  • Compiling CUDA extensions in a conda environment: CUDA_HOME = conda environment root, CPATH = envs/<env>/targets/x86_64-linux/include/ (not /usr/local/cuda/include/); after a major torch version upgrade, all .so files that depend on the torch C++ ABI need recompilation
  • Assumptions about third-party framework internal behavior must be verified through actual runs: LeRobot auto-squeezes shape=(1,) scalar features to (batch_size,) during DataLoader; code inference is unreliable. Actual training config values must be verified from wandb logs, as code defaults may be overridden by CLI parameters
  • GPU monitoring inside K8s containers: scan /proc/<pid>/fd/ for /dev/nvidia* device symlinks + prioritize reading CUDA_VISIBLE_DEVICES to bypass PID namespace isolation; processes that open all GPU devices without consuming VRAM are usually monitoring tools and can be filtered accordingly
  • Silent failure is the most dangerous bug pattern: body_xpos[-1] negative indexing always returns the same position for two cubes; cached var_names integer-conversion caused gene name intersection to be zero. Any parsing failure should immediately emit WARNING rather than returning a sentinel value; cached data should be sanity-checked before use

Domain Knowledge

  • An independent benchmark (Genome Biology 2025) confirmed that scGPT zero-shot underperforms PCA/scVI; scGPT-spatial only compared against weak baselines (ARI ≈ 0.30–0.40), while SOTA (GraphST, ARI ≈ 0.55–0.63) was not included, with no independent third-party validation. When evaluating new methods, always verify whether their baselines represent current SOTA
  • CALVIN evaluation is purely online simulation; it does not read episode data at all, only requires validation/.hydra/merged_config.yaml; the 1.3GB dataset can be compressed to a 600KB eval-only version
  • Embodied AI researcher advisor lineage: Mingyu Ding ← Jitendra Malik, Ruoshi Liu ← Carl Vondrick, Xiaolong Wang ← Abhinav Gupta, Shuran Song ← Thomas Funkhouser, Yunzhu Li ← Antonio Torralba, Yuke Zhu ← Li Fei-Fei — showing a systematic output of students toward embodied AI from top perception/robotics advisor groups
  • Flow matching is becoming the mainstream action decoding architecture for VLAs. Pi0 time convention: t=1 is pure noise → t=0 is the target action. Pi0.5 uses adaRMS to inject time conditioning, outperforming simple concatenation. In VLA auxiliary tasks, stop_gradient isolating main task gradients is a safe starting point

Tools

  • On-demand prompt injection strategy: split system prompt into base (~50 lines) + intent-specific fragments (dynamically injected by classification), reducing token consumption by 40–60%. Chinese character token density is approximately 6× that of English characters (1.5 tokens/character vs 0.25 tokens/character); failing to correct this systematically underestimates context length
  • For projects with multiple tools, output directories should be organized by ‘file type first’ (outputs/reports/summarize/ rather than summarize/reports/), allowing .gitignore to be simplified to a single-line outputs/; Python re-export shim pattern (containing only from x import y; __all__ = [...]) is an elegant backward-compatible migration approach
  • PubMed esearch→efetch two-step E-utilities API can freely index metadata from subscription journals such as Nature/Cell/Science; bioRxiv API is equally open; both require no new dependencies (urllib.request); small-batch validation of pipeline feasibility is better than going straight to full scale
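The corrected token estimator from the prompt-optimization lesson can be sketched directly (the per-character ratios are the report's own rough figures, not model-exact values):

```python
# Mixed-language token estimate: ~1.5 tokens per CJK character vs ~0.25 per
# English character; skipping the correction underestimates Chinese text 6x.
def estimate_tokens(text: str) -> float:
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    return 1.5 * cjk + 0.25 * other

print(estimate_tokens("schedule a meeting"), estimate_tokens("安排一个会议"))
```

The six-character Chinese phrase scores twice the eighteen-character English one, which is exactly the effect an uncorrected length-based estimate misses.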

AI Usage Notes

Effective Patterns:

  • ✓ Parallel sub-agents accelerate multi-dimensional code analysis: launching 3+ sub-agents simultaneously covering different file sets for dependency version diagnosis and codebase exploration significantly compresses analysis time
  • ✓ Goal-driven delegation + iterative debugging loop: user provides clear termination conditions (‘fix until no errors’), AI independently iterates run → error → minimal fix; built-in error correction mechanism
  • ✓ Deep codebase exploration identifies architecture-level challenges: proactively identified the single-spot KV degeneration issue in CrossModalEnhancer (each spot only has one vector) and proposed spatial neighbor KV sequence construction
  • ✓ sys.path hack → common/ package gradual refactoring: re-export shim pattern maintains backward compatibility while eliminating duplicate code
  • ✓ Small-batch pipeline feasibility validation (207 scenarios exposed 5 systemic defects) is better than going straight to full scale; end-to-end integration tests surface pipeline-level implicit dependencies better than unit tests

Limitations:

  • ✗ Insufficient ability to reflect on experimental conclusions: jumps from numerical results directly to attribution without proactively questioning the validity of experimental design (MIHD embedding methodology flaw required external user trigger to correct)
  • ✗ Silent failure patterns not proactively detected: Stack body name parsing returning -1 + Python negative indexing, cached var_names integer-conversion — both required user discovery due to lack of sanity checks
  • ✗ Over-engineering and incorrect architecture assumptions: VLA context replay based on the erroneous assumption that ‘all VLAs need a context window’; incorrect inference about LeRobot shape behavior leading to code modification — both required user correction or runtime verification
  • ✗ Insufficient ability to proactively question methodology applicability boundaries: when S2 student discovery failed, continued debugging code logic rather than proactively questioning the methodology’s own limitations; required user prompting to pivot to the homepage approach
  • ✗ Weak handling of Semantic Scholar same-name ambiguity: lacks proactive entity disambiguation for common Chinese-to-English name translations; LLM analysis also cannot automatically identify ambiguous researchers
  • ✗ API signatures not verified before use: FastMCP version parameter and conda --no-banner were both found incompatible only after runtime failure

Next Week Outlook

Next week (2026-W12) focus: ①ErrorRecoveryBenchmark v5.1 implementation — complete D0 scenario regeneration for 5 fixed skills (target: 600+ scenarios), advance InjectionEngine refactor, motion speed limits, and keyboard teleoperation human demo collection pipeline; milestone: begin recovery strategy training before April 1; ②VLA-RoboTwin/pi05 — re-convert LeRobot dataset (including 5 new fields such as manip_progress), start the four-group auxiliary experiment training and comparative analysis, correct eval.sh checkpoint_id for formal policy evaluation; ③MIHD — complete raw_shared baseline diagnosis and reach a methodological fix conclusion, resolve the 151676 GPU retraining issue (pin PyTorch version), evaluate CrossModalEnhancer full GPU pipeline performance; ④gadget/research — deploy 7 researcher profiles to the Hugo research section, explicitly require English quotes in prompts to eliminate LLM-generated Chinese JSON pollution; ⑤UniVLA — complete CALVIN evaluation full pipeline validation (--single_gpu mode).

Token Usage Statistics

Daily Cost Trend

| Date | Tokens (millions) | Cost ($) |
| --- | --- | --- |
| 2026-03-09 | 46.9 | 32.17 |
| 2026-03-11 | 30.5 | 20.75 |
| 2026-03-12 | 2.0 | 2.22 |
| 2026-03-13 | 3.0 | 2.23 |
| 2026-03-14 | 19.0 | 13.13 |
| 2026-03-15 | 135.3 | 100.70 |
| unknown | 72.5 | 56.27 |

Peak Day: 2026-03-15 — $100.70 / 135.3M tokens

Claude Code

| Metric | Value |
| --- | --- |
| Total Tokens | 309,110,118 |
| Input Tokens | 315,228 |
| Output Tokens | 1,023,671 |
| Cache Creation | 22,299,827 |
| Cache Reads | 285,471,392 |
| Total Cost | $227.47 |

Model Usage Distribution

| Model | Cost ($) | Input Tokens | Output Tokens |
| --- | --- | --- | --- |
| claude-opus-4-6 | 203.57 | 170,917 | 554,482 |
| claude-haiku-4-5-20251001 | 19.77 | 144,115 | 468,454 |
| claude-sonnet-4-6 | 4.12 | 196 | 735 |