2026

A Small Multi-Agent GPU Deployment Demo
June 21, 2026 Multi-Agent Systems GPU Deployment LLM Inference Systems
A practical look at a local demo that tests how eight role-specialized LLM agents behave under different GPU placement and orchestration strategies.
UniLab: Robot RL Benchmarking Is a Systems Problem, Not Just a Simulator Race
June 01, 2026 Robotics Reinforcement Learning Simulation Systems
A practical look at UniLab and why efficient robot reinforcement learning depends on the whole simulation-learning loop, not just whether physics runs on the GPU.
3D Gaussian Splatting for Embodied Perception
May 15, 2026 Robotics 3D Vision Gaussian Splatting
Robots need world representations that are both useful for perception and cheap enough to update or query during operation. A map that looks beautiful but renders slowly may be hard to use in a control loop. A map that is fast but visually sparse may miss the details needed for manipulation.
3D Scene Graphs for Embodied Reasoning
May 15, 2026 Embodied AI 3D Perception Robotics
A robot needs more than pixels. It needs a memory of the world that supports questions such as:
Why TD3 Came Back: Fast Off-Policy RL for Humanoid Control
May 11, 2026 Robotics Reinforcement Learning Humanoid Control Simulation
FastTD3 is a useful reminder that off-policy RL did not disappear from robotics. It needed parallel simulation, large-batch updates, distributional critics, and a training loop designed for wall-clock speed.
A Map of Robot World Models
May 01, 2026 World Models Robot Learning Embodied AI Survey
A survey-guided map of what world models mean in robot learning, how they connect to policies, and why useful prediction is different from generic video generation.
What Is Replanning in Embodied AI?
April 30, 2026 Embodied AI Replanning Robotics
Replanning is one of the easiest ideas to explain and one of the hardest ideas to make work in embodied AI.
Generative Simulation for Robot Learning
April 24, 2026 Robotics Simulation Embodied AI
Robot learning has a data problem. Real-world data is valuable, but collecting it is slow, expensive, and hard to scale. Simulation is faster and safer, but hand-building simulated tasks, scenes, objects, rewards, and demonstrations can become another bottleneck.
Simulation as a Data Engine: RoboCasa and ManiSkill3
April 14, 2026 Robotics Simulation Embodied AI
Robot simulation is often introduced as a benchmark: train a policy, report a success rate, compare algorithms. That view is useful, but too narrow.
From VLM to VLA to WAM: What Changes When Models Start Acting?
March 25, 2026 Embodied AI Vision-Language-Action World Models Robotics
The move from Vision-Language Models (VLMs) to Vision-Language-Action Models (VLAs) and then to World Action Models (WAMs) is often described as a sequence of acronyms. That framing is convenient, but it hides the important point. These are not just three model names. They are three different contracts between a model and the world.
Diffusion Policy: Generating Robot Actions by Denoising
March 13, 2026 Robotics Imitation Learning Diffusion Models
Diffusion models are best known for generating images, but the same idea can be used for robot control. Instead of denoising pixels into an image, a robot policy can denoise random noise into an action sequence. This is the central idea behind Diffusion Policy.
Action Chunking: Why Robot Policies Predict Short Futures
March 10, 2026 Robotics Imitation Learning Action Chunking
A robot policy does not have to predict one action at a time. In many imitation-learning systems, the policy predicts a short sequence of future actions, then executes part of that sequence before querying the model again. This idea is called action chunking.
Robot Data at Scale: From Open X-Embodiment to DROID
February 10, 2026 Robotics Robot Data Embodied AI
Robot learning has a data problem that is easy to underestimate. A language model can consume text from the web. A vision model can learn from images and captions. A robot policy needs trajectories: observations, actions, timing, embodiment details, and the messy physical consequences of contact.
KV Cache: The Hidden Memory Bottleneck in LLM Serving
January 16, 2026 Large Language Models Inference Systems
When people talk about serving large language models, the first memory number they usually mention is model size. A 7B model in FP16 needs roughly 14 GB for weights. A 70B model needs roughly 140 GB at the same precision. That is real, but it is not the whole serving problem.

2025

AgiBot World Colosseo: Data, Models, Benchmarks, and Ecosystem
November 12, 2025 Benchmark World Models VLA
AgiBot World Colosseo is best read as a large-scale embodied manipulation platform that combines data, models, benchmarks, and ecosystem resources.
RoboFactory: Multi-Robot Tasks for Coordination and Role Allocation
November 11, 2025 Benchmark Multi-Agent Robotics
RoboFactory is useful when a robotics project needs multi-robot coordination tasks rather than another single-arm manipulation benchmark.
ManiSkill2: SAPIEN-Based Manipulation for Richer Simulation Tasks
November 10, 2025 Benchmark SAPIEN Manipulation
ManiSkill2 is useful when manipulation tasks need richer simulation assets, articulated interactions, and scalable visual or state-based evaluation.
MetaWorld: A Compact Benchmark for Multi-Task Tabletop Manipulation
November 09, 2025 Benchmark Manipulation Multi-Task Learning
MetaWorld is a MuJoCo tabletop manipulation benchmark with many compact tasks, useful for multi-task RL, meta-learning, and controlled task heterogeneity.
Habitat: Navigation Benchmarks for Embodied Feedback and Replanning
November 08, 2025 Benchmark Embodied AI Navigation
Habitat-style tasks are useful when the research question involves navigation, partial observability, replanning, and feedback-driven embodied behavior.
PushT: Why a Tiny 2D Control Task Is Still Useful
November 07, 2025 Benchmark Control Diffusion Policy
PushT is a simple 2D pushing benchmark, but that simplicity makes it useful for debugging action chunks, policy outputs, and evaluation pipelines.
DROID: Real-Robot Data for Vision-Language-Action Research
November 06, 2025 Benchmark Real Robot Data VLA
DROID is best treated as real-world robot demonstration data for VLA training, action prediction, imitation learning, and trajectory analysis.
OGBench: Offline Goal-Conditioned RL Before Real-Robot Complexity
November 05, 2025 Benchmark Offline RL Goal-Conditioned RL
OGBench is a controlled benchmark for offline RL and offline goal-conditioned RL, useful before moving a method into real robot data or VLA evaluation.
RoboCasa: Kitchen-Scale Manipulation Beyond Toy Tabletop Tasks
November 04, 2025 Benchmark Household Robotics Manipulation
RoboCasa extends MuJoCo/RoboSuite-style manipulation into large kitchen task spaces with appliances, objects, fixtures, and composite household workflows.
RoboSuite: The Manipulation Workbench Behind Many Robot Benchmarks
November 03, 2025 Benchmark Simulation Manipulation
RoboSuite is useful as a modular MuJoCo manipulation workbench for task design, controller debugging, metrics, and reproducible rollout collection.
CALVIN: Long-Horizon Language-Conditioned Manipulation
November 02, 2025 Benchmark Robotics Long-Horizon
CALVIN focuses on language-conditioned manipulation over multi-step sequences, making it useful for testing whether policies can keep acting after the first subtask.
LIBERO: A Practical Guide to Language-Conditioned Manipulation Benchmarks
November 01, 2025 Benchmark Robotics VLA
LIBERO is useful when a robot policy must connect language instructions, visual observations, and task-level success in a controlled manipulation setting.

2024

RAG: Retrieval-Augmented Generation from Indexing to Answers
October 16, 2024 Large Language Models Retrieval RAG
Retrieval-Augmented Generation, usually shortened to RAG, is often introduced as a simple recipe: retrieve relevant documents, put them into the prompt, and let a language model answer. That description is correct, but it is too shallow. A useful RAG system is not a prompt trick. It is an evidence pipeline around a language model.
LoRA: Fine-Tuning Large Models with Low-Rank Updates
May 08, 2024 Large Language Models Fine-Tuning PEFT
Fine-tuning a large model is conceptually simple: start from a pretrained checkpoint, run gradient descent on downstream data, and save the adapted model. The difficulty is that modern checkpoints are too large for this workflow to be cheap. Updating every parameter requires large optimizer states, large gradients, and large checkpoints. If every task needs a full model copy, storage and serving quickly become the bottleneck.