Weekly Summaries

Weekly Summary 2026-W13

This week, approximately 10 projects advanced in parallel across three devices (TzJsDesktop / tianhe / DCC). Core achievements: gadget’s summarize (2930 lines → 8 modules + 72 tests) and research_scout (2934 lines → 7 sub-packages) both completed systematic refactoring, and a new ask command for natural-language paper search was added; TokenMonitor evolved from a macOS-only tool into a cross-platform, multi-device SSH cost-tracking platform (including Windows-native UX, a floating ball, ccusage integration, LiteLLM dynamic pricing, comprehensive security hardening, and several successful MSI/NSIS installer builds); Error Recovery Benchmark completed the full end-to-end design and implementation of Pipeline 2 plus a Context Replay architecture refactor (all 163 tests passing); and ccplan / cchypothesis / optimize and other Claude Code toolchain components received systematic upgrades. On the robotics research front, the Pi0.5 full-task rollout evaluation was completed (revealing extreme divergence across tasks: Stack 96% vs. PickPlace 6%), the BOSS benchmark was engineered for production use, and openvla-oft training scripts were created. The MIHD spatial transcriptomics project completed QueST protocol alignment and set up an 8-encoder benchmark framework.

Weekly Summary 2026-W12

This week spanned three devices (DCC, tianhe, and TzJsDesktop), with deep and broad advances across two research tracks: robot manipulation and spatial transcriptomics. The Error Recovery Benchmark progressed from collection scheme design (RBG grouping with a 329-demo budget) to an architecture-level trajectory segmentation refactor (InteractionSegmenter), reaching a final count of 1,627 training scenes (148 subtypes, +35%). MIHD spatial transcriptomics completed the full cross-section embedding alignment pipeline and established scGPT’s zero-shot superiority (100% hit rate vs. UNI2’s 71%). π₀.₅ work completed the full training pipeline for the task completion detection head (loss ≈ 0.253) and designed five conditioning strategies for Exp5–9. The gadget toolchain refactored the Research Profiler’s disambiguation architecture, introduced a unified deploy-staging architecture for the website, and upgraded all ECC agents to opus + max thinking. The week’s core breakthroughs centered on ‘finding and fixing architecture-level root causes’: three systemic issues (non-comparable per-section PCA coordinate spaces, multi-object target_object ambiguity, and Flax NNX inheritance vs. composition) were all resolved at the root.

Weekly Summary 2026-W11

This week, six parallel workstreams advanced across three machines (DCC, tianhe, TzJsDesktop): ① MIHD spatial transcriptomics uncovered a fundamental methodological flaw in cross-sample embedding (processing each section independently produces incomparable feature spaces) and began a fix; ② ErrorRecoveryBenchmark scaled from bug fixes to 13 skills/29 subtypes, solved the Drop skill’s object-not-falling issue, exposed limitations of the online quota architecture, and established offline injection as the new direction; ③ VLA-RoboTwin/pi05 progressed end to end, from environment setup and training performance optimization (JAX version alignment, +33% speedup) to collecting new data variables and running auxiliary-task experiments; ④ the gadget toolchain completed an architectural upgrade (MCP Server, a common/ shared package, and a unified output directory), and its research profiler achieved homepage-first student discovery; ⑤ CalendarPro completed a 7-phase comprehensive optimization, with all 230 tests passing and token consumption reduced by 40–60%; ⑥ the gadget research toolchain integrated citation graph analysis and produced deep profiles of 7 embodied AI researchers.

Weekly Summary 2026-W10

This week, spanning three devices (DCC, Tianhe HPC, and TzJsDesktop), four parallel tracks advanced: spatial transcriptomics research, robot manipulation training/evaluation, an AI personal assistant, and paper management. The MIHD project completed a full suite of scGPT+UNI2 fusion experiments (QFormer avg ARI = 0.370, +117% vs. scGPT-only) and established a zero-shot cross-sample evaluation framework. Pi0.5 LoRA fine-tuning achieved an overall 58.9% success rate, decisively outperforming BC-RNN (0%) and quantitatively confirming the advantage of the VLA model. CalendarPro underwent an architectural leap from reactive to proactive decision-making (321 tests passing), and critical integration bugs were uncovered and fixed along the way, including BackgroundCoordinator never being started. gadget Research Scout went from scratch to production-ready in a single day, implementing a two-stage LLM paper evaluation pipeline and, for the first time, generating 3 actionable research directions. error_recovery_benchmark cleaned up all 65 symlinks and built the MP4 visualization infrastructure for error scenarios. The most important engineering lessons of the week: passing tests ≠ a working system (the integration layer must be verified independently), and before designing agentic systems, study the architectural patterns of mature comparable projects first.

Weekly Summary 2026-W09

This week we ran two parallel workstreams across the DCC and Tianhe clusters: spatial transcriptomics (MIHD) and the robot Error Recovery Benchmark. On DCC, we fixed a critical scGPT weight-loading bug (average ARI improvement of 44.4%), extended the MIHD benchmark to Visium HD datasets, and completed a large-scale repository refactor (~250K lines of code). On Tianhe, we built the BC-RNN Phoenix baseline training pipeline from scratch (9 tasks in parallel), identified and fixed the root cause of Pi0.5’s 0% evaluation success rate (a task distribution mismatch), drew key conclusions from the M14 baseline evaluation (learned policies achieve SR ≈ 0% in error scenarios, which justifies M15 LoRA fine-tuning), and launched stable 9-task parallel Pi0.5 LoRA fine-tuning across 6×A800 GPUs.

Weekly Summary 2026-W08

This week centered on the MIHD spatial transcriptomics project, which completed a systematic survey of H&E Image-Only clustering methods (establishing a literature baseline of ARI 0.11–0.16), implemented three self-supervised enhancement schemes (SCAN boosted ARI from 0.251 to 0.303, +20.6%), and built the Vision Refinement two-stage fusion framework. In parallel on the tianhe cluster, the Error Recovery Benchmark validated its M14 evaluation infrastructure and launched the full 649-scenario evaluation, and the Phoenix pi0.5 reproduction data pipeline ingested the 18.4GB MimicGen dataset and readied the training config. Multiple engineering blockers were resolved, including STEGO NaN, a double normalization bug, a lerobot version conflict, and HuggingFace proxy issues. The Pi0.5 OOM and visualize_scene.py video validation remain blocked and carry over to next week.

Weekly Summary 2026-W07

This week spanned three parallel tracks: robotics simulation, bioinformatics, and toolchain work. error_recovery_benchmark hit a deep blocker in force injection debugging (a 30N force had no visible effect under the OSC controller), revealing a fundamental issue with the controller canceling out the applied force; MIHD completed the Chinese documentation for its enhancement plan; ccusage shipped GLM multi-model billing support with type/format checks passing; and robobrain_pi confirmed its training pipeline is ready. This week exposed clear failures of the AI to reuse learned patterns, repeating the same error (SSH commands missing cd) and making bad environment assumptions (proxy, pnpm), while human insight at key decision points (extreme force testing, detecting the local GPU, reusing existing pricing files) proved far more efficient.

Weekly Summary 2026-W06

This week (2026-02-06 to 02-07), work focused on two main tracks. First, a systematic root-cause analysis of the ARI performance gap between the MIHD project’s staig_fusion and the original STAIG (0.21 → target 0.56) identified and quantified five key implementation differences, and a code-level override refactor was completed to strictly align the two. Second, the benchmark tool received an engineering upgrade: a bar chart rendering bug in reports was fixed, an end-to-end GitHub Pages auto-publish pipeline was implemented, and interactive CLI upload was added. Additionally, one TOEFL English speaking practice session was recorded (personal study).

Weekly Summary 2025-W40

This week (2025-W40) had only one active session, on October 2nd, and it never left the initialization phase: the user opened the core tracking and segmentation files of the AutoSeg-SAM2 project but did no substantive development work. No meaningful progress was made this week overall.