Sensor Architecture
LiDAR vs. Vision-Only: The Biggest Philosophical Split in Humanoid Perception
The humanoid robotics industry has split into two camps on perception — and the choice reflects deep assumptions about what AI can and cannot do reliably. LiDAR-equipped robots get precise 3D geometry data at 10–100m range regardless of lighting, but add weight, power draw, cost ($800–$3,000/sensor), and mechanical complexity. Vision-only systems bet that neural networks trained on massive datasets can infer depth, geometry, and object identity from stereo or monocular cameras alone — cheaper, lighter, but brittle in unusual lighting or novel environments.
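The depth cue that vision-only systems must learn implicitly is the one stereo geometry gives for free: for a rectified camera pair, depth falls out of pixel disparity as Z = f·B/d. The sketch below shows that relationship; the focal length and baseline are illustrative values, not any robot's actual calibration.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a disparity map (pixels) to metric depth (meters) via Z = f*B/d."""
    disparity = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full_like(disparity, np.inf)   # zero disparity = unmatched / at infinity
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Illustrative numbers: 700 px focal length, 12 cm head-mounted baseline.
depth = disparity_to_depth([70.0, 7.0, 0.0], focal_px=700.0, baseline_m=0.12)
# 70 px disparity -> 1.2 m; 7 px -> 12 m; a tenth the disparity, ten times the depth
```

Note how depth precision collapses at range: a one-pixel disparity error matters little at 1.2 m but dominates at 12 m, which is part of why long-range geometry is where LiDAR's fixed accuracy pays off.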
Tesla's bet on vision-only for Optimus directly mirrors their FSD strategy: scale data, scale parameters, replace sensors with intelligence. Agility, Boston Dynamics, and most others carry at least one depth sensor — whether active stereo (Intel RealSense), structured light, or LiDAR — acknowledging that current vision models still fail in edge cases that geometric sensing handles trivially.
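What "carrying a depth sensor" buys in practice is a sparse but trusted reference frame: 3D points from the sensor can be projected into the camera image with a pinhole model and used to sanity-check a learned dense depth estimate. A minimal sketch, assuming illustrative intrinsics (not any robot's actual calibration):

```python
import numpy as np

def project_lidar_to_image(points_xyz, fx, fy, cx, cy, width, height):
    """Return (u, v, depth) pixel samples for 3D points in the camera frame (z forward)."""
    pts = np.asarray(points_xyz, dtype=np.float64)
    pts = pts[pts[:, 2] > 0]                          # keep points in front of the camera
    u = (fx * pts[:, 0] / pts[:, 2] + cx).astype(int) # pinhole projection, x -> column
    v = (fy * pts[:, 1] / pts[:, 2] + cy).astype(int) # pinhole projection, y -> row
    in_frame = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return u[in_frame], v[in_frame], pts[in_frame, 2]

# A point 2 m ahead and 0.4 m right of the optical axis lands at pixel (420, 240):
u, v, z = project_lidar_to_image([[0.4, 0.0, 2.0]],
                                 fx=500, fy=500, cx=320, cy=240,
                                 width=640, height=480)
```

The resulting sparse depth samples are exactly the kind of ground truth a vision-only system gives up, and what per-pixel disagreement against them looks like is how fused systems catch the reflective-surface and low-light failures mentioned above.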
Tesla Optimus
Vision-only. ~5 cameras (head stereo pair + peripheral). No LiDAR, no structured light. Depth estimated by neural network. Advantage: no moving parts, lowest BOM. Risk: failure in reflective/dark environments.
Agility Digit
LiDAR + Intel RealSense depth cameras + MEMS IMU. Digit carries a neck-mounted LiDAR for navigation and four Intel RealSense depth cameras for manipulation and environment sensing, plus force sensors in the arms for compliant manipulation.
Boston Dynamics Atlas
Depth cameras + IMU (specific models unconfirmed). Boston Dynamics has not published full sensor specifications for the electric Atlas; it is confirmed to use RGB-D cameras and an IMU, with specific models undisclosed. The earlier hydraulic Atlas used LiDAR and stereo cameras.
Figure 03
6-camera vision system (confirmed). Figure has confirmed a multi-camera array with palm cameras and tactile fingertip sensors. Specific camera model names and any supplemental depth sensors are not publicly disclosed by Figure AI.
Unitree G1
LiDAR + depth camera. 3D LiDAR on the head for navigation, depth camera for manipulation. A mid-range approach balancing cost and reliability.