Most robots can see, hear and even chat — but ask them to pick up an eggshell without crushing it, and the illusion of intelligence shatters fast. That missing sense is touch, and Hong Kong-based DAIMON Robotics wants to fix it. This April the company released Daimon-Infinity, which it bills as the largest omni-modal robotic dataset for physical AI, packed with high-resolution tactile data spanning everything from folding laundry to factory assembly.
The numbers behind it are eye-catching. Daimon-Infinity is built at million-hour scale, drawing on more than 80 real-world scenarios and over 2,000 human skills, with ultra-high-resolution tactile feedback layered alongside vision, motion trajectories and natural language. Crucially, DAIMON has open-sourced 10,000 hours of that data, betting that the whole embodied-AI field will move faster if everyone has better fuel to train on.
The project leans on a roster of heavyweight collaborators across China and beyond, including Google DeepMind, Northwestern University, the National University of Singapore and China Mobile. Rather than relying on centralized data factories, DAIMON says it runs the world's largest distributed out-of-lab collection network: a lightweight, scalable system that gathers data across diverse environments and can generate millions of hours per year.
Why tactile changes the equation
The dominant paradigm in robotics today is the Vision-Language-Action (VLA) model, which maps camera images and language instructions to robot actions. DAIMON's co-founder and chief scientist, Prof. Michael Yu Wang, argues that vision alone leaves robots half-blind to the physical world. His team has pioneered a Vision-Tactile-Language-Action (VTLA) architecture, elevating touch to a modality on equal footing with sight.
The reasoning is practical. Without tactile feedback, robots struggle to find objects in the dark, can't detect slip, and routinely drop fragile items or apply too much or too little force, so they either fail the task or break the part. Dexterous manipulation, as opposed to simply gripping, depends on knowing exactly what your fingertips are doing.
Touch in monochrome
DAIMON's hardware is what makes the dataset possible. The company built what it calls the world's first monochromatic vision-based tactile sensor, cramming over 110,000 effective sensing units into a fingertip-sized module. Instead of measuring force directly, the sensor captures images of how its soft surface deforms on contact; recorded over time, these image sequences encode forces, slip, friction, material properties and surface texture.
Because the output is essentially imagery, it slots neatly into vision-first AI frameworks, which is precisely why VTLA can treat touch much like another camera feed. Wang frames the monochromatic design as an engineering decision: a way to balance cost, reliability and sensitivity without the complexity of tri-color optics.
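The idea that touch can be read from pictures is easier to grasp with a toy example. The Python sketch below is purely illustrative and is not DAIMON's method: it assumes each tactile frame is a small grid of 8-bit grayscale intensities, compares it with a no-contact reference frame to find where the soft surface is pressed, and flags slip when the center of that contact patch drifts between consecutive frames. The function names (`contact_mask`, `contact_centroid`, `slip_detected`) and the threshold values are hypothetical choices for this sketch.

```python
from typing import List, Optional, Tuple

# Hypothetical frame format: rows of 8-bit grayscale intensities (0-255).
Frame = List[List[int]]
Mask = List[List[bool]]


def contact_mask(reference: Frame, current: Frame, threshold: int = 20) -> Mask:
    """Mark pixels whose brightness changed by more than `threshold`
    relative to the no-contact reference frame (hypothetical rule)."""
    return [
        [abs(cur - ref) > threshold for ref, cur in zip(ref_row, cur_row)]
        for ref_row, cur_row in zip(reference, current)
    ]


def contact_centroid(mask: Mask) -> Optional[Tuple[float, float]]:
    """Return the (row, col) center of the contact patch, or None if
    nothing is touching the surface."""
    total = 0
    sum_r = 0.0
    sum_c = 0.0
    for i, row in enumerate(mask):
        for j, touched in enumerate(row):
            if touched:
                total += 1
                sum_r += i
                sum_c += j
    if total == 0:
        return None
    return (sum_r / total, sum_c / total)


def slip_detected(reference: Frame, frames: List[Frame],
                  threshold: int = 20, max_shift: float = 1.0) -> bool:
    """Crude slip check (assumption): report slip if the contact center
    moves more than `max_shift` pixels between consecutive contact frames."""
    previous: Optional[Tuple[float, float]] = None
    for frame in frames:
        centroid = contact_centroid(contact_mask(reference, frame, threshold))
        if centroid is None:
            continue  # no contact in this frame; keep the last known center
        if previous is not None:
            dr = centroid[0] - previous[0]
            dc = centroid[1] - previous[1]
            if (dr * dr + dc * dc) ** 0.5 > max_shift:
                return True
        previous = centroid
    return False


if __name__ == "__main__":
    ref = [[100] * 5 for _ in range(5)]  # flat, untouched surface

    def pressed_at(cells):
        # Pressed pixels appear brighter than the reference in this toy model.
        return [[160 if (i, j) in cells else 100 for j in range(5)] for i in range(5)]

    steady = [pressed_at({(1, 1)}), pressed_at({(1, 1)})]
    sliding = [pressed_at({(1, 1)}), pressed_at({(3, 3)})]
    print(slip_detected(ref, steady))   # False: contact patch stays put
    print(slip_detected(ref, sliding))  # True: contact patch jumps ~2.8 px
```

Real vision-based tactile sensors work with far denser images and more sophisticated processing, but the principle the article describes carries over: once touch arrives as pixels, ordinary image operations can recover contact, shear and slip.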
Wang knows the territory. He earned his PhD at Carnegie Mellon studying manipulation under Matt Mason, founded the Robotics Institute at HKUST, and is an IEEE Fellow with roughly four decades in the field. His early bets on where touch-enabled robots will be deployed first range from hotels to convenience stores in China, places where machines need to handle real, messy, fragile objects rather than tidy lab props.