Blog - Shuaijun Liu

2026

A Small Multi-Agent GPU Deployment Demo

June 21, 2026 Multi-Agent Systems GPU Deployment LLM Inference Systems

A practical look at a local demo that tests how eight role-specialized LLM agents behave under different GPU placement and orchestration strategies.

UniLab: Robot RL Benchmarking Is a Systems Problem, Not Just a Simulator Race

June 01, 2026 Robotics Reinforcement Learning Simulation Systems

A practical look at UniLab and why efficient robot reinforcement learning depends on the whole simulation-learning loop, not just whether physics runs on the GPU.

3D Gaussian Splatting for Embodied Perception

May 15, 2026 Robotics 3D Vision Gaussian Splatting

Robots need world representations that are both useful for perception and cheap enough to update or query during operation. A map that looks beautiful but renders slowly may be hard to use in a control loop. A map that is fast but visually sparse may miss the details needed for manipulation.

3D Scene Graphs for Embodied Reasoning

May 15, 2026 Embodied AI 3D Perception Robotics

A robot needs more than pixels. It needs a memory of the world that supports questions such as:

Why TD3 Came Back: Fast Off-Policy RL for Humanoid Control

May 11, 2026 Robotics Reinforcement Learning Humanoid Control Simulation

FastTD3 is a useful reminder that off-policy RL did not disappear from robotics. It needed parallel simulation, large-batch updates, distributional critics, and a training loop designed for wall-clock speed.

A Map of Robot World Models

May 01, 2026 World Models Robot Learning Embodied AI Survey

A survey-guided map of what world models mean in robot learning, how they connect to policies, and why useful prediction is different from generic video generation.

What Is Replanning in Embodied AI?

April 30, 2026 Embodied AI Replanning Robotics

Replanning is one of the easiest ideas to explain and one of the hardest ideas to make work in embodied AI.

Generative Simulation for Robot Learning

April 24, 2026 Robotics Simulation Embodied AI

Robot learning has a data problem. Real-world data is valuable, but collecting it is slow, expensive, and hard to scale. Simulation is faster and safer, but hand-building simulated tasks, scenes, objects, rewards, and demonstrations can become another bottleneck.

Simulation as a Data Engine: RoboCasa and ManiSkill3

April 14, 2026 Robotics Simulation Embodied AI

Robot simulation is often introduced as a benchmark: train a policy, report a success rate, compare algorithms. That view is useful, but too narrow.

From VLM to VLA to WAM: What Changes When Models Start Acting?

March 25, 2026 Embodied AI Vision-Language-Action World Models Robotics

The move from Vision-Language Models (VLMs) to Vision-Language-Action Models (VLAs) and then to World Action Models (WAMs) is often described as a sequence of acronyms. That framing is convenient, but it hides the important point. These are not just three model names. They are three different contracts between a model and the world.

Diffusion Policy: Generating Robot Actions by Denoising

March 13, 2026 Robotics Imitation Learning Diffusion Models

Diffusion models are best known for generating images, but the same idea can be used for robot control. Instead of denoising pixels into an image, a robot policy can denoise random noise into an action sequence. This is the central idea behind Diffusion Policy.

Action Chunking: Why Robot Policies Predict Short Futures

March 10, 2026 Robotics Imitation Learning Action Chunking

A robot policy does not have to predict one action at a time. In many imitation-learning systems, the policy predicts a short sequence of future actions, then executes part of that sequence before querying the model again. This idea is called action chunking.
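As a rough illustration of the loop described above, here is a minimal sketch. The names `rollout_with_chunks`, `toy_policy`, and `toy_env_step` are hypothetical and exist only for this example; the "policy" is a stand-in that returns a fixed chunk, not a trained model.

```python
from typing import Callable, List, Tuple

Observation = float
Action = float


def rollout_with_chunks(
    policy: Callable[[Observation], List[Action]],
    env_step: Callable[[Observation, Action], Observation],
    obs: Observation,
    total_steps: int,
    execute_steps: int,
) -> Tuple[Observation, int]:
    """Run a policy that predicts a chunk of actions per query.

    Only the first `execute_steps` actions of each chunk are executed
    before the policy is queried again. Returns the final observation
    and the number of policy queries made.
    """
    if execute_steps < 1:
        raise ValueError("execute_steps must be at least 1")
    queries = 0
    steps_done = 0
    while steps_done < total_steps:
        chunk = policy(obs)  # predict a short sequence of future actions
        if not chunk:
            raise ValueError("policy returned an empty chunk")
        queries += 1
        for action in chunk[:execute_steps]:  # execute only part of the chunk
            if steps_done >= total_steps:
                break
            obs = env_step(obs, action)
            steps_done += 1
    return obs, queries


def toy_policy(obs: Observation) -> List[Action]:
    # Hypothetical stand-in: always predicts four unit steps.
    return [1.0, 1.0, 1.0, 1.0]


def toy_env_step(obs: Observation, action: Action) -> Observation:
    # Hypothetical 1D environment: the state is the running sum of actions.
    return obs + action


final_obs, num_queries = rollout_with_chunks(
    toy_policy, toy_env_step, obs=0.0, total_steps=10, execute_steps=2
)
print(final_obs, num_queries)
```

With a chunk of four actions and two executed per query, ten environment steps take five policy queries instead of ten.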

Robot Data at Scale: From Open X-Embodiment to DROID

February 10, 2026 Robotics Robot Data Embodied AI

Robot learning has a data problem that is easy to underestimate. A language model can consume text from the web. A vision model can learn from images and captions. A robot policy needs trajectories: observations, actions, timing, embodiment details, and the messy physical consequences of contact.

KV Cache: The Hidden Memory Bottleneck in LLM Serving

January 16, 2026 Large Language Models Inference Systems

When people talk about serving large language models, the first memory number they usually mention is model size. A 7B model in FP16 needs roughly 14 GB for weights. A 70B model needs far more. That is real, but it is not the whole serving problem.
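The weight figure above is simple arithmetic: parameter count times bytes per parameter. A small sketch, assuming 2 bytes per FP16 parameter and decimal gigabytes (1 GB = 1e9 bytes); the helper name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights in decimal GB (1 GB = 1e9 bytes).

    Defaults to 2 bytes per parameter, which matches FP16.
    Counts weights only: no KV cache, activations, or runtime overhead.
    """
    return num_params * bytes_per_param / 1e9


print(weight_memory_gb(7e9))   # 7B parameters in FP16
print(weight_memory_gb(70e9))  # 70B parameters in FP16
```

This gives about 14 GB for a 7B model and about 140 GB for a 70B model, for weights alone.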

AgiBot World Colosseo: Data, Models, Benchmarks, and Ecosystem

November 12, 2025 Benchmark World Models VLA

AgiBot World Colosseo is best read as a large-scale embodied manipulation platform that combines data, models, benchmarks, and ecosystem resources.

RoboFactory: Multi-Robot Tasks for Coordination and Role Allocation

November 11, 2025 Benchmark Multi-Agent Robotics

RoboFactory is useful when a robotics project needs multi-robot coordination tasks rather than another single-arm manipulation benchmark.

ManiSkill2: SAPIEN-Based Manipulation for Richer Simulation Tasks

November 10, 2025 Benchmark SAPIEN Manipulation

ManiSkill2 is useful when manipulation tasks need richer simulation assets, articulated interactions, and scalable visual or state-based evaluation.

MetaWorld: A Compact Benchmark for Multi-Task Tabletop Manipulation

November 09, 2025 Benchmark Manipulation Multi-Task Learning

MetaWorld is a MuJoCo tabletop manipulation benchmark with many compact tasks, useful for multi-task RL, meta-learning, and controlled task heterogeneity.

Habitat: Navigation Benchmarks for Embodied Feedback and Replanning

November 08, 2025 Benchmark Embodied AI Navigation

Habitat-style tasks are useful when the research question involves navigation, partial observability, replanning, and feedback-driven embodied behavior.

PushT: Why a Tiny 2D Control Task Is Still Useful

November 07, 2025 Benchmark Control Diffusion Policy

PushT is a simple 2D pushing benchmark, but that simplicity makes it useful for debugging action chunks, policy outputs, and evaluation pipelines.

DROID: Real-Robot Data for Vision-Language-Action Research

November 06, 2025 Benchmark Real Robot Data VLA

DROID is best treated as real-world robot demonstration data for VLA training, action prediction, imitation learning, and trajectory analysis.

OGBench: Offline Goal-Conditioned RL Before Real-Robot Complexity

November 05, 2025 Benchmark Offline RL Goal-Conditioned RL

OGBench is a controlled benchmark for offline RL and offline goal-conditioned RL, useful before moving a method into real robot data or VLA evaluation.

RoboCasa: Kitchen-Scale Manipulation Beyond Toy Tabletop Tasks

November 04, 2025 Benchmark Household Robotics Manipulation

RoboCasa extends MuJoCo/RoboSuite-style manipulation into large kitchen task spaces with appliances, objects, fixtures, and composite household workflows.

RoboSuite: The Manipulation Workbench Behind Many Robot Benchmarks

November 03, 2025 Benchmark Simulation Manipulation

RoboSuite is useful as a modular MuJoCo manipulation workbench for task design, controller debugging, metrics, and reproducible rollout collection.

CALVIN: Long-Horizon Language-Conditioned Manipulation

November 02, 2025 Benchmark Robotics Long-Horizon

CALVIN focuses on language-conditioned manipulation over multi-step sequences, making it useful for testing whether policies can keep acting after the first subtask.

LIBERO: A Practical Guide to Language-Conditioned Manipulation Benchmarks

November 01, 2025 Benchmark Robotics VLA

LIBERO is useful when a robot policy must connect language instructions, visual observations, and task-level success in a controlled manipulation setting.

RAG: Retrieval-Augmented Generation from Indexing to Answers

October 16, 2024 Large Language Models Retrieval RAG

Retrieval-Augmented Generation, usually shortened to RAG, is often introduced as a simple recipe: retrieve relevant documents, put them into the prompt, and let a language model answer. That description is correct, but it is too shallow. A useful RAG system is not a prompt trick. It is an evidence pipeline around a language model.
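To make the "simple recipe" concrete, here is a toy sketch of the retrieve-then-prompt steps. It assumes word-overlap scoring in place of a real index or embedding model, and it stops before the generation step, since no particular language model API is implied by the text. All names (`tokenize`, `retrieve`, `build_prompt`, `DOCS`) are hypothetical.

```python
from typing import List, Set

# Hypothetical toy corpus for illustration only.
DOCS: List[str] = [
    "LoRA adds low-rank update matrices to frozen weights",
    "The KV cache stores attention keys and values during decoding",
    "PushT is a 2D pushing task",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization into a set of words."""
    return set(text.lower().split())


def retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents sharing the most words with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, evidence: List[str]) -> str:
    """Place numbered evidence ahead of the question."""
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(evidence))
    return (
        "Answer using only the evidence below.\n"
        f"{context}\n"
        f"Question: {query}"
    )


question = "what does the kv cache store"
evidence = retrieve(question, DOCS, k=1)
print(build_prompt(question, evidence))
```

Even in this toy form, the answer can only be as good as what `retrieve` returns, which is why the retrieval side deserves as much attention as the prompt.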

LoRA: Fine-Tuning Large Models with Low-Rank Updates

May 08, 2024 Large Language Models Fine-Tuning PEFT

Fine-tuning a large model is conceptually simple: start from a pretrained checkpoint, run gradient descent on downstream data, and save the adapted model. The difficulty is that modern checkpoints are too large for this workflow to be cheap. Updating every parameter requires large optimizer states, large gradients, and large checkpoints. If every task needs a full model copy, storage and serving quickly become the bottleneck.