Daily Journal — 2026-02-28
Today’s Overview
- What was done: Three parallel workstreams across two servers — on DCC: extended the MIHD benchmark to Visium HD data (crop10large, 17,502 spots) and fixed a critical scGPT weight-loading bug; on tianhe: implemented the BC-RNN Phoenix baseline automated training script and deployed Pi0.5 Phoenix for nine-task evaluation.
- How it was done: On DCC, used code auditing and checkpoint key comparison to pinpoint and fix a single missing line, then completed 8-file full-pipeline HD support and generated 189 RM-IDEAL visualizations. On tianhe, systematically overcame SLURM constraints, WebSocket proxy interception, and multiple robosuite library-level compatibility issues — ultimately running 9 BC-RNN tasks in parallel on a single GPU.
- Why it matters: The MIHD benchmark now covers HD datasets beyond DLPFC; scGPT recovers 17.7% of attention weights that were previously randomly initialized; staig_fusion’s multimodal fusion advantage is now quantitatively validated. On tianhe, a reusable VLA baseline evaluation and BC-RNN training pipeline is established, laying the groundwork for Phoenix paper ablation experiments.
DCC
- What was done: Fixed the critical scGPT weight-loading bug and re-extracted embeddings for all 11 DLPFC sections; implemented full Visium HD crop10large pipeline support and completed RM-IDEAL evaluation for section 151673 (27 methods × 7 layers, 189 visualizations).
- How it was done: Compared checkpoint keys against the model state_dict to locate a missing `self.use_fast_transformer` line in model.py; modified 8 core files across the data_loader/clustering/run_benchmark/pipeline modules to add HD unannotated-data support, with Leiden community detection for automatic cluster-count estimation and cached Wasserstein distance computation for speed.
- Why it matters: scGPT ARI improved by an average of 44.4% and NMI by 33.3%; staig_fusion ranked first in RM-IDEAL evaluation (avg r=0.396, Layer_3 peak r=0.644), demonstrating that learned multimodal fusion captures spatial niche structure far better than single-modality baselines (avg r≈0.06–0.12).
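The cached Wasserstein computation mentioned above can be sketched as a simple memoization layer (a sketch only, assuming SciPy's 1-D `wasserstein_distance`; the pipeline's actual cache keys and distance inputs are not recorded in this journal):

```python
from functools import lru_cache

import numpy as np
from scipy.stats import wasserstein_distance


@lru_cache(maxsize=None)
def cached_w1(a: tuple, b: tuple) -> float:
    """Memoized 1-D Wasserstein distance.

    lru_cache requires hashable arguments, so callers pass value tuples;
    a repeated (a, b) pair across methods/layers then costs a dict lookup
    instead of a full recomputation.
    """
    return wasserstein_distance(np.asarray(a), np.asarray(b))
```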
tianhe
- What was done: Investigated and confirmed that Pi0.5 Phoenix completed 100k training steps; debugged and launched nine-task rollout evaluation (7/9 complete); implemented train_bc_rnn_benchmark.py and fixed 5 library-level bugs, successfully launching 9 BC-RNN training jobs in parallel on a single GPU (with 50-rollout evaluation every 20 epochs).
- How it was done: Confirmed training completion by directly inspecting the checkpoint directory; iteratively resolved tyro subcommand syntax errors, WebSocket proxy interception, robosuite API version incompatibilities, and SLURM account/memory/EGL constraints; used `srun --overlap` to reuse an existing interactive job’s GPU node, and patched three library-level defects: mujoco_py import, MimicGen environment registration, and the missing `get_bounding_box_half_size` method.
- Why it matters: Pi0.5 Phoenix received its first evaluation deployment (Stack_D0 24%, Stack_D1 12%); BC-RNN 9-task parallel training is running stably (~2.2 GB VRAM per task), with TensorBoard logging training curves and success rates — full 600-epoch training is expected to complete in 35+ hours.
Completed the full-pipeline Visium HD extension for the MIHD project on DCC (8 files modified) and fixed a critical scGPT weight-loading bug (ARI +44.4%); on tianhe, built the BC-RNN Phoenix baseline training pipeline from scratch and launched 9-task parallel training, while also completing the first Pi0.5 Phoenix evaluation deployment (7/9 tasks done).
Today’s Tasks
Architecture & Strategy
- ✅ Fixed scGPT use_fast_transformer weight-loading bug and re-extracted all embeddings — `TransformerModel.__init__` was missing `self.use_fast_transformer`, which caused the Wqkv→in_proj_ key remapping in `load_pretrained()` to never execute. Q/K/V weights across all 12 attention layers (9,455,616 parameters, 17.7% of total) were randomly initialized. After the fix, all 11 DLPFC sections (151508–151676) had their scGPT embeddings re-extracted, old caches were backed up to `scgpt_buggy_backup/`, and spatial clustering visualizations were batch-generated to `outputs/visualization/scgpt_fixed/`.
- ✅ Implemented full Visium HD crop10large pipeline support (8 files modified) — After exploring the HD data structure, selected the crop10large sub-region (17,502 spots) approach; modified 8 files across the data_loader/clustering/run_benchmark/pipeline modules to add HD data loading, unannotated auto-clustering (`estimate_n_clusters_leiden`), HD-specific visualization logic that skips ARI/NMI, and an `hd_global` config block; fixed coordinate space alignment, double-preprocessing, and spot size issues; end-to-end validation successful (leiden k=20, Silhouette=0.086).
- 🔄 Pi0.5 Phoenix nine-task MimicGen rollout evaluation deployment — Investigated and confirmed that Pi0.5 Phoenix (9-task LoRA fine-tuning) completed 100k training steps (checkpoints up to 99999) but had never been evaluated; catalogued three Pi0.5 model variants (official base / 9-task joint LoRA / single-task fine-tuned); debugged and launched the step 99999 nine-task rollout evaluation (50 trials per task, 450 total); 7/9 tasks completed by end of session (Stack_D0 24%, Stack_D1 12%; ThreePieceAssembly D0/D1 still running).
- 🔄 BC-RNN Phoenix baseline script implementation and 9-task parallel training launch (with rollout evaluation) — Based on the MimicGen original paper’s hyperparameters and aligned to the Phoenix CVPR 2025 paper’s 9-task settings, implemented `train_bc_rnn_benchmark.py` (5 modes: generate-configs/train/eval/report/status); after fixing three library-level bugs, used `srun --overlap` to launch all 9 training jobs in parallel on a single GPU on node an49 (Coffee tasks on GPU5, remaining 8 on GPU7), with 50-rollout online evaluation every 20 epochs; confirmed that the existing `RobomimicPolicyAdapter` is natively compatible with future error-injection requirements.
- ✅ Section 151673 full RM-IDEAL evaluation (27 methods × 7 layers, 189 visualizations) — Ran RM-IDEAL evaluation for all 27 embedding methods on DLPFC section 151673, computing Spearman r between cosine similarity and RM-IDEAL ground truth; after an initial numerics-only run, generated 189 three-panel visualizations (niche query + RM-IDEAL + embedding similarity); results saved to `outputs/rm_ideal_evaluation/151673/summary.csv`.
Implementation & Bug Fixes
- ✅ visualize_from_cache.py HD adaptation and spot size fix — Added HD support to `visualize_from_cache.py` (`process_hd_section` function, `--dataset`/`--crop_dir` arguments) and implemented `create_hd_clustering_visualization` in `utils/visualization.py` (H&E + clustering dual-panel); fixed the `sc.pl.spatial` `size` parameter (1.0→4.0) to match HD’s `spot_diameter_fullres=7.3` vs DLPFC’s ~144 (k=17, Silhouette=0.302).
- ✅ Pi0.5 training data provenance and three-model-variant review — Clarified the origin and purpose of the three Pi0.5 variants in the project: official pretrained base (zero-shot), 9-task LoRA fine-tuning (4,500 episodes, 5 task prompts, 100k steps on 4×A800, phoenix_comparison), and zhaoganlong’s single-task fine-tuned variant; confirmed 9-task training data details (500 demos per task, 84×84 dual-camera + 8D state → 7D action).
- ✅ Updated project overview summary.md (v4.13) — Added the v4.13 entry, file manifest, and version history to the Error Recovery Benchmark project overview summary.md.
Problems & Solutions
Critical Issues
1. scGPT’s `TransformerModel.__init__` did not save use_fast_transformer as an instance attribute, preventing the Wqkv→in_proj_ key remapping in load_pretrained() from ever executing — 17.7% of attention weights were randomly initialized, and strict=False silently skipped mismatched keys without raising an error.
Solution: Added self.use_fast_transformer = use_fast_transformer at line 64 of model.py. After the fix, all 186/186 parameters matched, and ARI improved by an average of 44.4%.
Key Insight: PyTorch’s strict=False is a double-edged sword — it permits partial loading but silently ignores mismatched keys, allowing critical weights to run at random values indefinitely. The same bug exists in the upstream official GitHub repository; weight-loading code paths in open-source projects must be actively audited rather than blindly trusted.
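The audit this insight implies can be made routine. A minimal sketch (a hypothetical helper, not scGPT's actual code) that still loads with strict=False but surfaces every mismatch and verifies that claimed keys actually landed:

```python
import torch
import torch.nn as nn


def load_pretrained_audited(model: nn.Module, state_dict: dict):
    """Load with strict=False, but never let a key mismatch pass silently."""
    result = model.load_state_dict(state_dict, strict=False)
    if result.missing_keys:
        print(f"MISSING (left at random init): {result.missing_keys}")
    if result.unexpected_keys:
        print(f"UNEXPECTED (silently dropped): {result.unexpected_keys}")
    # Sanity check: every checkpoint key we matched really reached the model.
    for name, param in model.named_parameters():
        if name in state_dict:
            assert torch.equal(param.detach(), state_dict[name]), name
    return result
```

Printing (or logging) `missing_keys`/`unexpected_keys` at every load is exactly the signal that would have exposed the Wqkv→in_proj_ remapping failure immediately.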
2. CoffeeMachineBodyObject was missing a get_bounding_box_half_size() method, causing an AttributeError during MimicGen Coffee task rollout initialization. The method was called but never implemented in the robosuite base classes — an interface gap between MimicGen and the current robosuite version.
Solution: Traced the full call chain (coffee_machine.py → CoffeeMachineBodyObject → CompositeBodyObject → MujocoXMLObject) and implemented get_bounding_box_half_size() at three different base class levels, computing bounding box half-sizes from each class’s geometric data.
Key Insight: Fixing a missing method in a third-party library requires tracing the full inheritance chain — patching only the nearest call site leaves other subclasses still broken. API incompatibilities exist across different robosuite forks; dependencies should be pinned to the same commit used during model training.
3. The WebSocket client consistently threw ConnectionRefusedError even though the server process was already listening on port 8000. Trying different address formats (localhost/127.0.0.1/0.0.0.0) all failed.
Solution: Discovered that http_proxy was set to 127.0.0.1:10087, causing the websockets library to route connections through the proxy. After unsetting all proxy variables at the start of run_eval.sh, the connection succeeded.
Key Insight: HTTP/HTTPS proxy environment variables transparently intercept WebSocket connections. When debugging local services on HPC clusters, proxy variables should be the first thing to check; this class of issue is particularly common in HPC environments and should be on the standard troubleshooting checklist.
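The journal's fix lived in run_eval.sh; the same cleanup can be expressed as a Python helper (a sketch, with the variable list and the `no_proxy` fallback as assumptions about the environment):

```python
import os


def clear_proxy_env() -> dict:
    """Remove HTTP(S) proxy variables so ws://localhost connections go direct.

    Proxy env vars reroute even loopback WebSocket connections; the removed
    values are returned so a caller can restore them afterwards.
    """
    removed = {}
    for var in ("http_proxy", "https_proxy", "all_proxy",
                "HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY"):
        if var in os.environ:
            removed[var] = os.environ.pop(var)
    # Gentler alternative: keep the proxy but exempt loopback addresses.
    os.environ.setdefault("no_proxy", "localhost,127.0.0.1")
    return removed
```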
4. robomimic’s env_robosuite.py directly imported mujoco_py at the top level (not installed), and the absence of an import mimicgen call meant MimicGen environment variants (Coffee_D0, etc.) were never registered with robosuite — causing rollout environment initialization to fail.
Solution: Wrapped import mujoco_py in a try/except block; added import mimicgen to trigger environment registration via side effect; confirmed use of the same robosuite version as training (zhaoganlong dependency directory).
Key Insight: MimicGen environments register via import side effects — any external tool calling these environments must explicitly perform that import first. robomimic itself is unaware of this dependency; it must be injected at the integration layer.
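A sketch of the integration-layer shim (module names taken from this journal; the guarded-import pattern is the point, and the mimicgen line is shown as a comment because it only applies inside the actual pipeline environment):

```python
# Optional legacy backend: robomimic's env_robosuite.py imported this at the
# top level, which crashes outright whenever mujoco_py isn't installed.
try:
    import mujoco_py  # noqa: F401
    HAVE_MUJOCO_PY = True
except ImportError:
    mujoco_py = None
    HAVE_MUJOCO_PY = False

# Environment variants (Coffee_D0, Stack_D1, ...) register with robosuite as
# an import side effect, so the integration layer must import mimicgen before
# constructing any environment:
# import mimicgen  # noqa: F401
```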
5. In the HD data’s adata_8um.h5ad, `uns['spatial']` contained the full-image hires (6000×3886), while coordinates were in the cropped fullres pixel space (0–4966, 0–2971) — a mismatch that caused visualization misalignment. Additionally, X had already been log-transformed, but the existing preprocess_data() would apply normalize_total+log1p again, causing double transformation.
Solution: In load_hd_data(), replaced the uns['spatial'] image with cropped_fullres.tif and recomputed the scale factor; added a skip_log parameter to preprocess_data() — HD data is passed skip_log=True to perform only HVG filtering.
Key Insight: In preprocessed HD adata, the image and coordinates originate from different processing stages; coordinate space and image must be explicitly aligned. Data-loading functions should carry a record of which preprocessing steps have already been applied, to prevent double transformation.
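The double-transform guard reduces to a small standalone sketch (a hypothetical function; the journal's actual `preprocess_data` also does HVG filtering and operates on AnnData, both omitted here):

```python
import numpy as np


def preprocess_counts(X: np.ndarray, skip_log: bool = False) -> np.ndarray:
    """normalize_total-style scaling + log1p for raw counts.

    Pass skip_log=True for matrices (like the HD adata) that arrive already
    log-transformed upstream, so log1p is never applied twice.
    """
    if skip_log:
        return X  # data already transformed; caller does only filtering
    totals = X.sum(axis=1, keepdims=True)
    X = X / np.clip(totals, 1e-9, None) * 1e4  # per-cell normalization
    return np.log1p(X)
```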
6. In non-TTY environments, robomimic prompts the user for interactive confirmation when a checkpoint directory already exists; receiving EOF causes an immediate EOFError exit — resulting in repeated parallel launch failures (partially failed runs leave behind directories that trigger the overwrite check on the next launch).
Solution: Thoroughly cleaned the corresponding checkpoint directory before each (re)launch to avoid triggering the overwrite check; used nohup + srun --overlap instead of sbatch to ensure processes don’t terminate when the shell exits.
Key Insight: robomimic’s get_exp_dir() exits immediately in non-TTY environments when it finds an existing directory. The directory must be confirmed clean before each launch; existing directories trigger the overwrite check on the next launch, creating a failure loop.
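The pre-launch cleanup can be captured in a helper (a sketch; robomimic's own `get_exp_dir()` is untouched — the idea is simply that nothing interactive may run under nohup/srun):

```python
import shutil
from pathlib import Path


def prepare_run_dir(run_dir: str, overwrite: bool = True) -> Path:
    """Ensure the output directory is fresh before a non-TTY launch.

    Trainers that prompt on an existing directory die with EOFError once
    stdin is closed, so decide up front: wipe the stale directory or abort.
    """
    path = Path(run_dir)
    if path.exists():
        if not overwrite:
            raise FileExistsError(f"{path} exists; aborting non-TTY launch")
        shutil.rmtree(path)  # clear leftovers from partially failed runs
    path.mkdir(parents=True)
    return path
```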
7. Pi0.5 training was believed to have failed because SLURM job 46553 crashed. The AI initially gave an incorrect judgment of “training incomplete” based on a stale conversation cache (which recorded a step 5000 crash).
Solution: Directly inspected the checkpoint directory and found complete checkpoints from step 4000 through 99999, confirming that training resumed after the crash and ran all 100k steps to completion.
Key Insight: The checkpoint directory is the most authoritative evidence of training completion — more reliable than log files or conversation records. A SLURM job crash does not equal training failure; always verify checkpoints directly rather than relying on records.
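The "checkpoints are the ground truth" check is easy to script (a sketch; the step-number pattern is an assumption and must be adjusted to the trainer's actual naming convention):

```python
import re
from pathlib import Path


def latest_checkpoint_step(ckpt_dir: str):
    """Highest step number found among checkpoint entries, or None.

    Reads only the filesystem — independent of logs, SLURM job status,
    or any cached conversation record.
    """
    steps = [int(m.group(1))
             for p in Path(ckpt_dir).iterdir()
             if (m := re.search(r"(\d+)", p.name))]
    return max(steps, default=None)
```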
General Issues
8. HD data spot_diameter_fullres=7.3 (vs DLPFC ~144), so sc.pl.spatial(size=1.0) rendered points that were nearly invisible. The user caught this by visually inspecting the output.
Solution: Adjusted the HD visualization size parameter from 1.0 to 4.0, making dots approximately 29px in diameter (≈ bin spacing of 29.2 fullres px).
Key Insight: scanpy spatial’s size parameter is a multiplier of spot_diameter_fullres — different-resolution datasets require different multipliers. HD 8µm spot diameter is roughly 1/20th of Visium, so a proportionally larger size multiplier is needed to fill the bin spacing.
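Since the rendered dot diameter scales as `size × spot_diameter_fullres`, the right multiplier falls out of a one-line ratio (a sketch using the numbers in this journal; `target_px` is the bin spacing to fill):

```python
def spatial_size_multiplier(spot_diameter_fullres: float,
                            target_px: float = 29.2) -> float:
    """Choose sc.pl.spatial's `size` so dots fill the bin spacing.

    DLPFC (diameter ~144) is fine at size~1.0; Visium HD 8um (diameter 7.3)
    needs ~4.0 to reach the 29.2 fullres-px bin spacing.
    """
    return target_px / spot_diameter_fullres
```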
9. Multiple SLURM constraints on the tianhe cluster caused repeated job submission failures: missing --account=sysu_gbli2 (group permission), --mem blocked by policy, MUJOCO_EGL_DEVICE_ID requiring the physical GPU index rather than the CUDA_VISIBLE_DEVICES logical index; insufficient checking of per-GPU free memory before allocation led to a coffee_d0 OOM.
Solution: Queried account name via sacctmgr and added --account; replaced --mem with --gres=gpu:1; set MUJOCO_EGL_DEVICE_ID to the actual physical GPU index (5 or 7); checked per-GPU compute-app occupancy in real time and re-allocated tasks accordingly.
Key Insight: EGL device IDs map directly to physical GPUs and are not affected by CUDA_VISIBLE_DEVICES remapping. Before submitting to a new cluster for the first time, confirm account names and QOS limits via sacctmgr; always check per-GPU compute-apps rather than just the aggregate memory summary.
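The logical→physical translation can be captured in a helper (a sketch; it only inverts the `CUDA_VISIBLE_DEVICES` remapping and assumes that variable lists physical indices, as on this cluster):

```python
import os


def egl_device_for(logical_index: int = 0) -> int:
    """Physical GPU index to export as MUJOCO_EGL_DEVICE_ID.

    EGL addresses physical devices and ignores CUDA_VISIBLE_DEVICES, so the
    logical index CUDA code sees must be mapped back through that variable.
    """
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if not visible:
        return logical_index  # no remapping in effect
    return [int(x) for x in visible.split(",")][logical_index]
```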
10. run_benchmark.py contained a local copy of cluster_embeddings that duplicated functionality in utils/clustering.py. When the utility library was updated, the local copy was not synced, causing a KMeans InvalidParameterError when n_clusters=-1.
Solution: Added the same automatic Leiden estimation logic to the local cluster_embeddings function in run_benchmark.py, keeping it in sync with the utils/clustering.py changes.
Key Insight: Large scripts often contain local copies of utility functions. When modifying a utility library, always check whether a same-named copy exists inside scripts — or eliminate the duplication through refactoring.
Human Thinking vs. AI Thinking
Strategic Level
scGPT Bug Root-Cause Analysis: Domain Experience (Colleague Tip) vs. Systematic Code Verification
| Role | Approach |
|---|---|
| Human | A colleague, drawing on experience with Flash Attention / PyTorch incompatibilities, directly pointed toward checkpoint key name mismatches as the root cause, dramatically narrowing the search space. |
| AI | Systematically read the code, compared checkpoint keys against the model state_dict, and wrote a verification script to precisely pinpoint the single missing line. |
Analysis: The human leveraged domain experience and community knowledge to quickly form a directional hypothesis; the AI executed systematic verification and precise localization. They are complementary — without the human’s directional hint, the AI would have needed a broader search; without the AI’s code verification, the human’s hypothesis would have been hard to confirm.
Experiment Iteration Strategy: Small-Scale Proxy Dataset First
| Role | Approach |
|---|---|
| Human | Explicitly directed using the crop10large sub-region (17,502 spots) first to avoid immediately processing all 545K spots, prioritizing method feasibility before scaling up. |
| AI | Initially planned for the full HD dataset and designed a complete full-scale processing architecture, without proactively suggesting starting from a smaller scale. |
Analysis: The human has stronger engineering intuition around experiment iteration speed and instinctively reached for a small-scale proxy dataset. The AI tends toward designing complete end-to-end solutions in one shot, lacking a conservative validation mindset.
HPC Resource Management — Global Perspective (GPU Parallelism Strategy, Rollout Enabling, Resource Pre-checks)
| Role | Approach |
|---|---|
| Human | Based on prior knowledge of BC-RNN VRAM footprint (~2 GB/task), directly proposed running 9 tasks in parallel on a single GPU; proactively requested in-training rollout evaluation to be enabled; reminded the AI to first check all running jobs and GPU free state before executing commands. |
| AI | Followed the conservative “one task per GPU” paradigm and did not proactively suggest high-density parallelism; generated configs with template defaults (rollout.enabled=false) without optimizing proactively; focused on resolving technical blockers while overlooking macro resource state. |
Analysis: The human has stronger systemic awareness of HPC resource management and can translate domain knowledge (VRAM estimation) into engineering decisions. The AI tends to focus on current technical obstacles while missing resource constraints and configuration optimization opportunities.
Bug Fix Methodology: Full Call-Chain Tracing vs. Single-Point Patching
| Role | Approach |
|---|---|
| Human | Simply asked to “fix it,” without specifying where or how. |
| AI | Systematically traced the full call chain (coffee_machine.py → CoffeeMachineBodyObject → CompositeBodyObject → MujocoXMLObject) and added appropriate implementations at three different base-class levels. |
Analysis: The AI’s debugging approach was more systematic — it correctly identified that the method needed to be added across multiple base classes rather than just at the nearest call site, demonstrating complete analysis of the class inheritance structure.
Literature Citation Sourcing: Distinguishing Evaluation-Setting References from Hyperparameter References
| Role | Approach |
|---|---|
| Human | Explicitly specified: use the Phoenix paper for evaluation setting alignment, but BC-RNN hyperparameters should reference the original MimicGen paper (arXiv:2310.17596), not the Phoenix paper. |
| AI | Attempted to extract BC-RNN hyperparameters from the Phoenix paper; when blocked by network restrictions, performed a web search — unable to intuitively distinguish the different intent behind each citation source. |
Analysis: The human has clearer provenance awareness in academic literature usage, distinguishing between “evaluation-setting reference” and “algorithm-hyperparameter reference.” The AI needs explicit instruction to correctly differentiate citation intent.
Future Research Perspective: Forward-Looking Compatibility Design for Error Injection
| Role | Approach |
|---|---|
| Human | Proactively raised the future need to inject errors into trained BC-RNN models and collect rollout scenarios, requesting that the code be designed for compatibility in advance — connecting current tooling to future experimental requirements. |
| AI | Focused on completing the immediate training and evaluation functionality; upon receiving the prompt, analyzed the existing framework (RobomimicPolicyAdapter) and confirmed it was natively compatible — but did not consider this proactively. |
Analysis: The human has a longer research horizon and can anticipate future uses of current tools. The AI tends to focus on the task at hand, lacking proactive awareness of future compatibility planning.
AI Limitations
Important Limitations
- Relies on stale conversation cache (e.g., Pi0.5 step 5000 crash record) without proactively verifying — needed to actually inspect the filesystem before correcting the wrong judgment; over-trust in stale memory can produce misleading conclusions, so actual file state should always be the ground truth.
- Lacks a systematic checklist for HPC environment trap diagnosis: did not proactively check proxy environment variables (early in the WebSocket connection failures), hardcoded `MUJOCO_EGL_DEVICE_ID` as a logical rather than physical ID, and repeatedly launched large GPU services during debugging without cleaning up old processes (leading to OOM) — all required human prompts to surface.
- Poor `srun` process lifecycle management: multiple times misidentified “task failed” due to empty logs and repeatedly cleaned/restarted unnecessarily; lacked a “check process state first, then decide” pre-judgment step. A process may already be running but producing no visible output; state checks should be combined with log checks before concluding failure.
- Experiment scale planning defaults to full scale, without proactively suggesting iterative validation on a small-scale proxy dataset (the 545K-spot full-HD plan had to be corrected by the user to crop10large); lacks proactive awareness of experiment iteration efficiency optimization.
General Limitations
- Unable to anticipate cross-file code consistency issues (the local `cluster_embeddings` copy in run_benchmark.py out of sync) and cross-dataset visualization parameter differences (HD spot size too small) — required the user to observe actual output before triggering a second round of fixes.
Today’s Takeaways
Core Takeaways
- PyTorch’s `model.load_state_dict(strict=False)` silently ignores mismatched keys, allowing critical weights to run at random values without any error — this class of bug can remain latent for a long time. Production code should actively print `missing_keys`/`unexpected_keys` and validate parameter value statistics after loading. The same bug exists in the upstream official GitHub repository; weight-loading paths in open-source code must be actively audited rather than blindly trusted.
- staig_fusion consistently outperformed all baselines in RM-IDEAL evaluation (section 151673 avg r=0.396, Layer_3 peak r=0.644), proving that learned multimodal fusion captures spatial niche structure far better than simple fusion strategies (concat/mean, avg r≈0.15) and single-modality baselines (gene/vision, avg r≈0.06–0.12).
- MimicGen environment variants (Coffee_D0, Stack_D1, etc.) are registered with robosuite via the side effect of `import mimicgen`; any external tool (e.g., robomimic) calling these environments must perform that import first. API incompatibilities exist across different robosuite forks — training and evaluation must be pinned to the same commit.
- HTTP/HTTPS proxy environment variables (`http_proxy`/`https_proxy`) are transparently applied to WebSocket connections by Python’s `websockets` library, causing `ws://localhost:xxxx` connections to fail when routed through a proxy. When running local WebSocket services on an HPC cluster, unset proxy variables or set `no_proxy=localhost` before starting the client.
- When fixing a missing method in a third-party library, trace the full class inheritance chain and implement the method in all involved base classes. Patching only the most direct call site will leave other subclasses still failing on the same call.
- crop10large (17,502 spots) is an ideal proxy dataset for validating HD methods: its scale is comparable to DLPFC, it comes with a corresponding cropped fullres image, and it allows full pipeline validation without modifying architectural assumptions. Leiden community detection (resolution=1.0) works well as an unsupervised cluster-count estimator for HD data (k=17–20, Silhouette=0.302) and is a good default clustering strategy for unannotated HD data.
- On an A800 80GB GPU, a single BC-RNN training task occupies approximately 2.2 GB of VRAM, enabling high-density task parallelism on a single GPU — the actual bottlenecks are CPU/IO rather than VRAM. For scanpy spatial visualization, the `size` parameter is a multiplier of `spot_diameter_fullres`: DLPFC (diameter≈144) works well with size=1.0, while HD 8µm (diameter≈7.3) requires size≈4.0 to fill the bin spacing.
Practical Takeaways
- tianhe cluster SLURM-specific constraints: must specify `--account=sysu_gbli2`; `--mem` is blocked by QOS policy — use `--gres=gpu:1` instead; `MUJOCO_EGL_DEVICE_ID` corresponds to the physical GPU index, not the `CUDA_VISIBLE_DEVICES` logical index; `srun --overlap` can attach a new step to an existing interactive job and reuse its node resources, without submitting a new job.
Session Summaries
MIHD
✅ DCC Full-Day Work: scGPT Bug Fix + Full RM-IDEAL Evaluation + Visium HD Full Pipeline Implementation
21:23:19.892 | claude_code
Three major tasks completed on DCC: ① Fixed the use_fast_transformer attribute omission bug in scGPT-spatial’s TransformerModel, re-extracted embeddings for all 11 DLPFC sections (ARI avg +44.4%, NMI +33.3%), and generated scgpt_fixed visualizations; ② Ran full RM-IDEAL evaluation for 27 embedding methods on section 151673 — staig_fusion ranked first with avg r=0.396 (Layer_3 peak r=0.644), and leveraged caching to quickly generate 189 three-panel visualizations; ③ Extended the MIHD benchmark to Visium HD crop10large (17,502 spots) by modifying 8 core files to add unannotated HD support (automatic cluster-count estimation via Leiden), and fixed HD visualization coordinate alignment, double-preprocessing, and spot size issues — end-to-end validation successful (k=17, Silhouette=0.302).
Error Recovery Benchmark
🔄 tianhe Full-Day Work: Pi0.5 Phoenix Evaluation Deployment + BC-RNN Baseline Script Implementation and 9-Task Parallel Training Launch
20:53:34.791 | claude_code
Two parallel workstreams completed on tianhe (node an49): ① Pi0.5 Phoenix evaluation: investigated and confirmed that 100k-step training was complete (checkpoints up to 99999) but had never been evaluated; catalogued three Pi0.5 model variants; debugged and resolved tyro subcommand syntax errors, HTTP proxy intercepting WebSocket, robosuite API version incompatibilities, and missing env.seed() — successfully launched nine-task rollout evaluation (step 99999, 50 trials per task); 7/9 tasks completed by end of session (Stack_D0 24%, Stack_D1 12%; ThreePieceAssembly D0/D1 still running); ② BC-RNN baseline: referencing MimicGen original paper hyperparameters, created train_bc_rnn_benchmark.py (5 modes); fixed SLURM account/memory/EGL constraints, mujoco_py import, MimicGen environment registration, and get_bounding_box_half_size library-level bugs; used srun --overlap to launch 9 parallel training jobs on a single GPU (with rollout evaluation); training running stably (~2.2 GB VRAM per task); confirmed that the existing RobomimicPolicyAdapter is natively compatible with future error-injection requirements; full 600-epoch training expected to complete in 35+ hours.
Token Usage
Overview
| Metric | Value |
|---|---|
| Total Tokens | 53,226,640 |
| Input Tokens | 25,177 |
| Output Tokens | 129,735 |
| Cache Created | 2,251,309 |
| Cache Read | 50,820,419 |
| Cache Hit Rate | 95.8% |
| Total Cost (USD) | $34.9126 |
Model Breakdown
| Model | Input | Output | Cache Created | Cache Read | Cost | Share |
|---|---|---|---|---|---|---|
| claude-opus-4-6 | 10,689 | 85,854 | 1,506,168 | 42,629,752 | $32.9282 | 94.3% |
| claude-haiku-4-5-20251001 | 14,488 | 43,881 | 745,141 | 8,190,667 | $1.9844 | 5.7% |
Per-Device Usage
| Device | Total Tokens | Input | Output | Cost |
|---|---|---|---|---|
| DCC | 52,737,471 | 25,154 | 128,907 | $34.5661 |
| tianhe | 0 | 0 | 0 | $0.0000 |
| TzJsDesktop | 489,169 | 23 | 828 | $0.3465 |