What Mobile Manipulation Unlocks
Fixed-base manipulation robots are limited to tasks within their workspace — a volume of 1–3 cubic meters centered at a fixed point. The vast majority of useful tasks that humans perform involve navigating to an object, grasping it, transporting it, and placing it somewhere else — often with the source and destination in different rooms. Mobile manipulation is the capability that enables this class of tasks.
The engineering challenge is that mobile manipulation is not simply "navigation + manipulation" — combining a mobile base with a manipulator arm creates coupling problems that each subsystem alone does not have. The arm's motion disturbs the base's stability. The base's motion creates errors in arm pose estimation. Grasping from a moving or recently-stopped platform introduces position error not present in fixed-base systems. Solving these coupling problems is the core technical challenge in mobile manipulation.
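To make the pose-estimation coupling concrete, here is a minimal sketch of how base pose error turns into gripper position error. It assumes a planar model in which the gripper sits at a fixed horizontal distance from the base's rotation center. The function name and the example numbers (0.8 m reach, 2° heading error, 10 mm base position error) are illustrative assumptions, not measurements from any platform.

```python
import math

def grasp_error_from_base_error(reach_m, heading_err_deg, base_xy_err_m):
    """Worst-case planar gripper error caused by base pose error (hypothetical helper).

    A heading error of theta moves a point at distance reach_m along a chord
    of length 2 * r * sin(theta / 2). Base translation error is added directly,
    which is the worst case (both errors pointing the same way).
    """
    theta = math.radians(abs(heading_err_deg))
    rotational = 2.0 * reach_m * math.sin(theta / 2.0)
    return rotational + base_xy_err_m

# Assumed example values: 0.8 m reach, 2 degree heading error, 10 mm base error.
err = grasp_error_from_base_error(reach_m=0.8, heading_err_deg=2.0, base_xy_err_m=0.01)
print(f"{err * 1000:.1f} mm")  # prints "37.9 mm"
```

Under these assumptions, a small heading error dominates: about 28 mm of the 38 mm total comes from rotation alone, which is already larger than the jaw clearance of many parallel grippers.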
Platform Comparison
| Platform | Mobility | Arm | Price | Best For |
|---|---|---|---|---|
| Hello Robot Stretch 3 | Wheeled (differential drive) | Single telescoping arm, 7-DOF | $25K | Healthcare, elderly assist, research |
| Boston Dynamics Spot + Arm | Legged (4-leg) | 6-DOF 4kg payload | $100K+ | Industrial inspection + manipulation |
| Mobile ALOHA (Stanford) | Wheeled (custom base) | Bimanual ALOHA arms | $32K (base+arms) | Bimanual mobile manipulation research |
| Unitree H1 + arm | Humanoid biped | Custom 7-DOF | $90K | Humanoid mobile manipulation research |
| Fetch Robotics FR2 | Wheeled (diff drive) | 7-DOF 6kg payload | $65K+ | Warehouse, logistics, shelf picking |
Whole-Body Control Approaches
Two dominant approaches exist for coordinating base and arm motion in mobile manipulation:
- MPC-Based Whole-Body Control: Treat the entire robot (base + arm) as a floating-base system and solve a joint trajectory optimization, ideally at 100–500 Hz (a budget of 2–10 ms per solve). This produces physically consistent base+arm motion that accounts for dynamic coupling. It is computationally expensive: a full humanoid system (Unitree H1 + arm) requires 20–100 ms per solve on a modern processor, which corresponds to an update rate of only 10–50 Hz and makes real-time control difficult. Boston Dynamics and ETH Zurich's ANYmal implementations are the state of the art here.
- Hierarchical Control (virtual frame): Stabilize the base using a separate controller, then plan the arm trajectory in a "virtual frame" attached to the stabilized base. Computationally cheaper (two separate controllers rather than one joint optimization) and easier to tune. The tradeoff: doesn't fully account for dynamic coupling, so arm motions that disturb the base cause prediction errors. Preferred for wheeled systems where base dynamics are simpler.
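The "virtual frame" idea in the hierarchical approach can be sketched as a frame transform: the arm planner receives its target expressed in the base frame, using whatever base pose the base controller reports. The sketch below is a planar (x, y, yaw) simplification. The function name and the example poses are assumptions for illustration and do not come from any particular robot stack.

```python
import math

def world_to_base(target_xy, base_pose):
    """Express a world-frame point in the base ("virtual") frame.

    base_pose is (x, y, yaw) of the base in the world frame, yaw in radians.
    """
    bx, by, yaw = base_pose
    dx, dy = target_xy[0] - bx, target_xy[1] - by
    c, s = math.cos(yaw), math.sin(yaw)
    # Apply the inverse rotation R(yaw)^T to the offset from base to target.
    return (c * dx + s * dy, -s * dx + c * dy)

target = (2.0, 1.0)  # assumed grasp target in the world frame

# The arm plan is made against the pose the base controller believes it holds.
planned = world_to_base(target, (1.5, 1.0, 0.0))

# If arm motion disturbs the base by 0.03 rad of yaw (assumed value),
# the same target lands somewhere else in the base frame.
actual = world_to_base(target, (1.5, 1.0, 0.03))

drift = math.hypot(actual[0] - planned[0], actual[1] - planned[1])
print(planned, f"{drift * 1000:.1f} mm")
```

The drift printed at the end is the prediction error the hierarchical approach accepts in exchange for simpler, separately tuned controllers. A whole-body MPC would model the disturbance instead of treating the base as fixed.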
Coordination Challenges
- Arm-Base Collision: The arm's workspace must be planned to avoid collision with the base, which requires a whole-body collision checker that includes both the arm and the base geometry simultaneously. This is often underimplemented — teams import standard arm collision checkers without updating them for the base geometry, resulting in collisions the planner didn't predict.
- Dynamic Stability: For legged systems, arm motion at high speed creates reaction torques that must be compensated by the legs. An arm moving at full speed with a 3kg payload creates 20–50Nm of reaction torque at the base — enough to destabilize a poorly-tuned legged system. Solutions: limit arm acceleration during locomotion, or couple the arm and leg controllers via whole-body MPC.
- Grasp from Moving or Recently-Stopped Base: Studies on Spot+Arm and similar platforms show that grasping accuracy is 3–5× worse when the base has stopped in the last 500ms, due to residual vibration and settling of the localization estimate. The practical fix: pause 1–2 seconds after stopping before initiating a grasp, or use force-torque feedback to compensate for position uncertainty during grasp approach.
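As a rough sanity check on the reaction-torque figure in the dynamic stability item, the sketch below treats the payload as a point mass at the end of the arm. The 0.9 m reach and 10 m/s² tangential acceleration are assumed values, and the model ignores the mass of the arm links themselves, so it is a lower-bound estimate rather than a platform specification.

```python
G = 9.81  # gravitational acceleration, m/s^2

def payload_base_torque(mass_kg, reach_m, tangential_accel_ms2):
    """Point-mass estimate of the torque a payload applies at the arm base.

    Returns (inertial, gravity) in N*m:
      inertial = m * a * r  (reaction to accelerating the payload)
      gravity  = m * g * r  (static load with the arm held horizontal)
    """
    inertial = mass_kg * tangential_accel_ms2 * reach_m
    gravity = mass_kg * G * reach_m
    return inertial, gravity

# 3 kg payload (from the text); reach and acceleration are assumptions.
inertial, gravity = payload_base_torque(3.0, 0.9, 10.0)
print(f"inertial {inertial:.1f} N*m, gravity {gravity:.1f} N*m")
# prints "inertial 27.0 N*m, gravity 26.5 N*m"
```

With these assumed numbers the inertial term alone falls inside the 20–50 Nm range quoted above. Because the inertial term scales linearly with acceleration, halving the arm's acceleration limit during locomotion halves the disturbance the legs have to reject.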
Real Deployments
- Diligent Robotics Moxi (hospital logistics): Deployed at 10+ US hospitals including UT Southwestern, HCA Healthcare, and Ochsner. Moxi navigates hospital corridors and delivers supply trays to nursing stations — a navigation + transport task that doesn't require dexterous manipulation. This illustrates where mobile manipulation is commercially mature: transport and delivery rather than in-hand manipulation.
- 6 River Systems (warehouse): Chuck robots navigate alongside warehouse workers, carrying bins and reducing picker walking distance by 50–80%. Again: navigation + transport is the commercially deployed capability; arm manipulation is not yet deployed at this scale.
- Hello Robot (clinical trials, elderly assistance): Stretch robots are in multi-site clinical trials for in-home elderly assistance — reaching objects on shelves, opening doors, retrieving items. The simplicity of the Stretch design (single telescopic arm) is a feature here, not a limitation.
SVRC's solutions team can help scope mobile manipulation projects and connect you with the right platform for your application. See our solutions page for current offerings.
Key Research Papers (2024-2025)
The academic frontier in mobile manipulation has advanced rapidly. These papers represent the current state of the art:
- Mobile ALOHA (Zipeng Fu et al., Stanford, 2024): Demonstrated that co-training with static ALOHA data plus 50 mobile demonstrations enables mobile bimanual tasks (cooking, cleaning) at 80%+ success. The key insight: static manipulation skills transfer to mobile platforms with minimal additional mobile data. This dramatically reduces the data collection burden for mobile manipulation.
- TidyBot++ (Jimmy Wu et al., Princeton, 2024): A system for personalized tidying that uses LLM-based object placement reasoning combined with mobile manipulation. Deployed on Stretch in real homes. Success rate: 85% on seen object categories, 65% on novel objects. Illustrates the power of combining foundation models for high-level reasoning with learned policies for low-level execution.
- RoboCasa (Nasiriany et al., UT Austin/NVIDIA, 2024): Large-scale simulation benchmark with 100+ kitchen tasks designed for training mobile manipulation policies in simulation before transfer to real hardware. It includes 2,500+ 3D objects and realistic physics. A widely used benchmark for sim-to-real mobile manipulation research.
- DexMobile (Qi et al., UC San Diego, 2025): Combines mobile base navigation with dexterous manipulation using a whole-body policy. Demonstrated on fetch-and-deliver tasks requiring both mobility and in-hand manipulation. 72% success on novel objects in unseen rooms — a significant advance over prior work.
- HomeRobot (CMU/Meta, 2024): Open-vocabulary mobile manipulation benchmark for household tasks. Standardized evaluation protocol with Stretch and Spot platforms. Provides a fair comparison across methods by fixing the hardware and evaluation criteria.
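The co-training idea described in the Mobile ALOHA entry can be sketched as a batch sampler that mixes a large static dataset with a small mobile one. This is a minimal illustration, not the paper's training code: the function name, the 50/50 mixing ratio, and the static dataset size of 800 episodes are assumptions chosen for the example.

```python
import random

def cotrain_batch(static_eps, mobile_eps, batch_size, mobile_frac=0.5, rng=None):
    """Draw one training batch mixing static and mobile episodes.

    mobile_frac is the share of the batch drawn from the mobile dataset
    (an assumed hyperparameter). Sampling is with replacement, so the small
    mobile set is oversampled relative to its size.
    """
    rng = rng or random.Random()
    n_mobile = round(batch_size * mobile_frac)
    batch = [rng.choice(mobile_eps) for _ in range(n_mobile)]
    batch += [rng.choice(static_eps) for _ in range(batch_size - n_mobile)]
    rng.shuffle(batch)
    return batch

static_eps = [("static", i) for i in range(800)]  # hypothetical dataset size
mobile_eps = [("mobile", i) for i in range(50)]   # 50 mobile demos, as in the text

batch = cotrain_batch(static_eps, mobile_eps, batch_size=8, rng=random.Random(0))
n_mobile = sum(1 for source, _ in batch if source == "mobile")
print(n_mobile, "of", len(batch), "samples are mobile")
```

Fixing the ratio per batch, rather than concatenating the datasets, keeps the 50 mobile demonstrations from being drowned out by the much larger static set.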
Data Collection for Mobile Manipulation
Mobile manipulation data collection adds complexity beyond fixed-base collection:
- Localization requirement: Every demo needs accurate 6-DOF base pose throughout the episode. Solutions: external motion capture (OptiTrack, $50,000+ for a room setup), onboard SLAM (less accurate but no external infrastructure), or fiducial markers (ArUco tags, $0.10 each, positioned around the environment).
- Environment consistency: The environment must be reset between episodes: not just object positions but also the robot's starting position, door states, drawer states, and clutter configuration. This increases reset time 3–5× compared to tabletop tasks.
- Navigation + manipulation coupling: Demonstrations must capture the full trajectory from navigation approach through manipulation through retreat. Partial demonstrations (navigation only, or manipulation only) lose the critical transition information between phases.
- Multi-room coverage: If the task spans multiple rooms, camera placement must cover all relevant spaces. A typical mobile manipulation collection setup uses 4–6 fixed cameras plus the robot's onboard cameras.
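Two of the requirements above (base pose on every step, and full approach-through-retreat coverage) are easy to check automatically before an episode is accepted into a dataset. The sketch below assumes a hypothetical per-step record with a phase label and a 6-DOF base pose. The field names and phase names are illustrative and not part of any standard format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

REQUIRED_PHASES = ("approach", "manipulate", "retreat")  # assumed phase labels

@dataclass
class Step:
    t: float                                   # timestamp in seconds
    phase: str                                 # one of REQUIRED_PHASES
    base_pose: Optional[Tuple[float, ...]]     # (x, y, z, roll, pitch, yaw) or None

def validate_episode(steps: List[Step]) -> List[str]:
    """Return a list of problems; an empty list means the episode is usable."""
    problems = []
    # Localization requirement: every step needs a 6-DOF base pose.
    if any(s.base_pose is None or len(s.base_pose) != 6 for s in steps):
        problems.append("missing or non-6-DOF base pose")
    # Coupling requirement: phases must appear once each, in order.
    seen = []
    for s in steps:
        if not seen or seen[-1] != s.phase:
            seen.append(s.phase)
    if tuple(seen) != REQUIRED_PHASES:
        problems.append(f"phase sequence {seen} is not {list(REQUIRED_PHASES)}")
    return problems

pose = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
full = [Step(0.0, "approach", pose), Step(1.0, "manipulate", pose), Step(2.0, "retreat", pose)]
partial = [Step(0.0, "manipulate", pose), Step(1.0, "manipulate", None)]

print(validate_episode(full))     # prints []
print(len(validate_episode(partial)))  # prints 2
```

Running a check like this at collection time is cheaper than discovering partial demonstrations during training, since the environment is still set up for a re-take.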
SVRC's Mountain View lab has configurable partitions and floor markings that enable repeatable multi-room mobile manipulation scenarios. Our mobile manipulation data collection packages include localization infrastructure, multi-camera recording, and full-trajectory capture. Contact us for mobile manipulation data pricing.
SVRC's Mobile Manipulation Capabilities
SVRC maintains Stretch 3 and Mobile ALOHA platforms for mobile manipulation research and data collection. Our capabilities include:
- Teleoperation: VR-based teleoperation for Stretch (Meta Quest 3 + custom mapping), leader-follower for Mobile ALOHA
- Environments: Configurable kitchen, living room, and warehouse scenarios in our 1,800 sq ft collection floor
- Data formats: HDF5 with full 6-DOF base pose, multi-camera synchronized video, joint states, and navigation trajectories
- Leasing: Mobile platforms available for lease for on-site data collection at customer facilities. See our leasing page.
Related Reading
- Robot Arm Buying Guide 2026 — choosing arms for mobile platforms
- Best Dexterous Robot Hands 2025 — end-effectors for mobile manipulation
- Data Collection Cost in 2026 — mobile manipulation pricing
- RL vs Imitation Learning — choosing the right training approach
- Warehouse Robot ROI — commercial mobile manipulation
- SVRC Data Services — mobile manipulation data collection
- Robot Leasing — mobile platform availability