Person w/3D Face, Body, and Hands
3D keypoints for face, body, and hands—from images, video, and live streams

eyepop.place.holder
Model type
Pre-trained Model
Description
Go deeper with 3D keypoint extraction. This model maps three-dimensional features of the human face, body, and hands—returning structured keypoint coordinates with confidence—so you can power detailed gesture recognition, sign language interpretation, and advanced VR/AR integrations.
Use it on images, recorded video, or live streams. No custom training required.
This model is optimized for:
- 3D keypoint outputs (x, y, z) for face, pose, and hands
- Multi-person scenes (where supported)
- Frame-by-frame results for video
- Cloud or On-Prem deployment
- Fast prototyping → production use
Why This Model Exists
Bounding boxes tell you where a person is. 2D keypoints tell you how they’re moving.
But some applications require a deeper layer of precision:
- Fine-grained hand articulation (fingers, pinches, grips)
- Face geometry signals (landmarks for expression + intent)
- Depth-aware pose (body position in 3D space)
- Stable tracking inputs for immersive or interactive systems
Most teams hit friction here because 3D keypoint pipelines are notoriously hard to stand up:
- Too many model formats and skeleton standards
- Large outputs that are painful to integrate and validate
- Inconsistent results across lighting, motion blur, occlusion, or camera angles
- Complexity that turns “cool demo” into “unshippable feature”
This model exists to remove that friction.
It provides a production-ready baseline for 3D face + body + hand keypoints so teams can focus on building experiences, analytics, and product logic, not model wrangling.
Key Capabilities
Input Types
- Single images
- Video files
- RTSP / livestream feeds
- Webcam / IP camera streams
Output
- JSON with 3D keypoints (x, y, z)
- Confidence per keypoint
- Grouped by person + region (face / body / left hand / right hand)
- Frame-level results for video and streams
Deployment
- EyePop Cloud
- On-Premise AI Application Runtime
- Edge devices with GPU or CPU
Setup
- Create account
- Get API key
- Send media
- Receive structured 3D keypoints instantly
No training. No labeling. No model configuration.
Example Output
{
"keyPoints": [
{
"id": 34,
"points": [
{
"classId": 0,
"classLabel": "nose",
"confidence": 0.9491,
"id": 1,
"x": 234.161,
"y": 635.132,
"z": -1689.297
},
{
"classId": 1,
"classLabel": "left eye (inner)",
"confidence": 0.9426,
"id": 2,
"x": 260.307,
"y": 572.736,
"z": -1701.229
},
{
"classId": 2,
"classLabel": "left eye",
"confidence": 0.938,
"id": 3,
"x": 284.693,
"y": 567.8,
"z": -1700.895
},
{
"classId": 3,
"classLabel": "left eye (outer)",
"confidence": 0.9202,
"id": 4,
"x": 309.191,
"y": 563.186,
"z": -1701.186
},
{
"classId": 4,
"classLabel": "right eye (inner)",
"confidence": 0.9408,
"id": 5,
"x": 201.415,
"y": 579.59,
"z": -1704.338
},
{
"classId": 5,
"classLabel": "right eye",
"confidence": 0.9248,
"id": 6,
"x": 177.701,
"y": 579.957,
"z": -1704.576
},
{
"classId": 6,
"classLabel": "right eye (outer)",
"confidence": 0.9042,
"id": 7,
"x": 154.157,
"y": 580.182,
"z": -1704.742
},
{
"classId": 7,
"classLabel": "left ear",
"confidence": 0.9163,
"id": 8,
"x": 346.055,
"y": 564.255,
"z": -1444.702
},
{
"classId": 8,
"classLabel": "right ear",
"confidence": 0.905,
"id": 9,
"x": 134.999,
"y": 586.726,
"z": -1456.682
},
{
"classId": 9,
"classLabel": "mouth (left)",
"confidence": 0.7761,
"id": 10,
"x": 284.093,
"y": 669.511,
"z": -1557.855
},
{
"classId": 10,
"classLabel": "mouth (right)",
"confidence": 0.6729,
"id": 11,
"x": 205.256,
"y": 679.134,
"z": -1561.074
},
{
"classId": 11,
"classLabel": "left shoulder",
"confidence": 0.9718,
"id": 12,
"x": 403.004,
"y": 705.527,
"z": -1015.704
},
{
"classId": 12,
"classLabel": "right shoulder",
"confidence": 0.9674,
"id": 13,
"x": 162.64,
"y": 688.311,
"z": -1096.564
},
{
"classId": 13,
"classLabel": "left elbow",
"confidence": 0.9172,
"id": 14,
"x": 431.617,
"y": 884.688,
"z": -430.739
},
{
"classId": 14,
"classLabel": "right elbow",
"confidence": 0.9757,
"id": 15,
"x": 162.3,
"y": 800.236,
"z": -520.094
},
{
"classId": 15,
"classLabel": "left wrist",
"confidence": 0.8391,
"id": 16,
"x": 332.516,
"y": 963.84,
"z": -31.31
},
{
"classId": 16,
"classLabel": "right wrist",
"confidence": 0.9754,
"id": 17,
"x": 109.046,
"y": 679.181,
"z": 123.114
},
{
"classId": 17,
"classLabel": "left pinky",
"confidence": 0.7455,
"id": 18,
"x": 323.187,
"y": 1004.396,
"z": 43.841
},
{
"classId": 18,
"classLabel": "right pinky",
"confidence": 0.9598,
"id": 19,
"x": 97.278,
"y": 649.457,
"z": 195.701
},
{
"classId": 19,
"classLabel": "left index",
"confidence": 0.7072,
"id": 20,
"x": 302.3,
"y": 984.787,
"z": 8.918
},
{
"classId": 20,
"classLabel": "right index",
"confidence": 0.9442,
"id": 21,
"x": 75.983,
"y": 643.261,
"z": 174.003
},
{
"classId": 21,
"classLabel": "left thumb",
"confidence": 0.7428,
"id": 22,
"x": 299.241,
"y": 970.911,
"z": -29.879
},
{
"classId": 22,
"classLabel": "right thumb",
"confidence": 0.9626,
"id": 23,
"x": 93.032,
"y": 658.908,
"z": 134.589
},
{
"classId": 23,
"classLabel": "left hip",
"confidence": 0.9957,
"id": 24,
"x": 364.58,
"y": 1144.101,
"z": 20.53
},
{
"classId": 24,
"classLabel": "right hip",
"confidence": 0.9958,
"id": 25,
"x": 237.728,
"y": 1120.395,
"z": -20.034
},
{
"classId": 25,
"classLabel": "left knee",
"confidence": 0.8921,
"id": 26,
"x": 337.031,
"y": 1351.712,
"z": 492.682
},
{
"classId": 26,
"classLabel": "right knee",
"confidence": 0.9394,
"id": 27,
"x": 255.721,
"y": 1361.324,
"z": 435.646
},
{
"classId": 27,
"classLabel": "left ankle",
"confidence": 0.899,
"id": 28,
"x": 285.797,
"y": 1385.545,
"z": 1363.823
},
{
"classId": 28,
"classLabel": "right ankle",
"confidence": 0.8963,
"id": 29,
"x": 223.22,
"y": 1530.095,
"z": 1110.064
},
{
"classId": 29,
"classLabel": "left heel",
"confidence": 0.8834,
"id": 30,
"x": 283.148,
"y": 1397.849,
"z": 1462.036
},
{
"classId": 30,
"classLabel": "right heel",
"confidence": 0.8636,
"id": 31,
"x": 224.738,
"y": 1548.679,
"z": 1188.204
},
{
"classId": 31,
"classLabel": "left foot index",
"confidence": 0.7937,
"id": 32,
"x": 234.789,
"y": 1440.039,
"z": 1504.145
},
{
"classId": 32,
"classLabel": "right foot index",
"confidence": 0.6736,
"id": 33,
"x": 235.119,
"y": 1653.121,
"z": 1134.64
}
]
}
],
"seconds": 0,
"source_height": 1882,
"source_id": "578c8f4c-18c6-11f1-b631-8e1aed86f95b",
"source_width": 1094,
"system_timestamp": 1772737558305993000,
"timestamp": 0
}
(Illustrative output: the exact landmark set may differ from this example, including face mesh count, pose skeleton, and hand joint naming.)
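Here is a minimal Python sketch of reading a result shaped like the example above. It assumes the field names shown there (keyPoints, points, classLabel, confidence, x, y, z). The helper landmarks_by_person and the 0.7 confidence cutoff are illustrative choices, not part of any EyePop SDK.

```python
import json
from typing import Dict, Tuple

# Trimmed excerpt of a result in the shape shown above (two points only).
SAMPLE = """
{
  "keyPoints": [
    {
      "id": 34,
      "points": [
        {"classId": 0, "classLabel": "nose", "confidence": 0.9491,
         "id": 1, "x": 234.161, "y": 635.132, "z": -1689.297},
        {"classId": 10, "classLabel": "mouth (right)", "confidence": 0.6729,
         "id": 11, "x": 205.256, "y": 679.134, "z": -1561.074}
      ]
    }
  ],
  "seconds": 0,
  "source_height": 1882,
  "source_width": 1094
}
"""

Point3D = Tuple[float, float, float]


def landmarks_by_person(result: dict, min_conf: float = 0.7) -> Dict[int, Dict[str, Point3D]]:
    """Map each person id to {classLabel: (x, y, z)}, dropping low-confidence points."""
    people: Dict[int, Dict[str, Point3D]] = {}
    for person in result.get("keyPoints", []):
        people[person["id"]] = {
            p["classLabel"]: (p["x"], p["y"], p["z"])
            for p in person.get("points", [])
            if p.get("confidence", 0.0) >= min_conf
        }
    return people


result = json.loads(SAMPLE)
people = landmarks_by_person(result, min_conf=0.7)
print(people)  # the "mouth (right)" point (0.6729) falls below the cutoff
```

Keying by classLabel keeps downstream logic readable. If label strings ever change, keying by classId is the sturdier choice.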
Practical Use Cases
Gesture Recognition & Interaction
- Fine gesture inputs (pinch, point, grab, swipe)
- Hand motion + intent cues for UI control
- Touchless kiosk / interface interactions
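As an illustration of a pinch input, the sketch below compares the thumb-to-index distance with the wrist-to-index distance. This keeps the test roughly independent of how far the person stands from the camera. It assumes the "left thumb" / "left index" / "left wrist" labels from the example output; a dedicated hand-joint set would likely use fingertip landmarks instead. is_pinching and its 0.35 ratio are hypothetical starting points to tune, not model features.

```python
import math
from typing import Dict, Tuple

Point3D = Tuple[float, float, float]


def is_pinching(lm: Dict[str, Point3D], side: str = "left", ratio: float = 0.35) -> bool:
    """Heuristic pinch test: thumb and index are close relative to hand size.

    Labels are assumed to follow the example output ("left thumb", etc.).
    """
    thumb = lm[f"{side} thumb"]
    index = lm[f"{side} index"]
    wrist = lm[f"{side} wrist"]
    gap = math.dist(thumb, index)        # 3D thumb-to-index distance
    hand_size = math.dist(wrist, index)  # scale reference so the threshold is size-invariant
    return hand_size > 0 and gap / hand_size < ratio
```

Using the z coordinate in the distance helps avoid false pinches when the fingers overlap in the 2D image but are separated in depth.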
Sign Language & Communication Interfaces
- Hand-shape + motion tracking inputs
- Landmark sequences for downstream interpretation models
- Accessibility tooling prototypes
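Interpretation models usually consume fixed-length sequences rather than single frames. Below is a minimal sketch of one way to prepare them. It assumes per-frame landmarks have already been parsed into {label: (x, y, z)} dicts. Each landmark is expressed relative to an origin joint (the wrist is one arbitrary choice), the landmarks are flattened in a fixed label order, and the sequence is cut into overlapping windows. frame_features and sliding_windows are hypothetical helpers.

```python
from typing import Dict, List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def frame_features(lm: Dict[str, Point3D], labels: Sequence[str], origin: str) -> List[float]:
    """Flatten selected landmarks into one vector, relative to the `origin` landmark.

    Subtracting the origin removes where the hand sits in the frame, so a
    downstream model sees shape and motion rather than absolute position.
    Labels missing from this frame contribute zeros; `origin` must be present.
    """
    ox, oy, oz = lm[origin]
    vec: List[float] = []
    for label in labels:
        x, y, z = lm.get(label, (ox, oy, oz))  # missing label -> zero offset
        vec.extend((x - ox, y - oy, z - oz))
    return vec


def sliding_windows(frames: List[List[float]], size: int, step: int = 1) -> List[List[List[float]]]:
    """Cut a per-frame feature sequence into fixed-length, possibly overlapping windows."""
    return [frames[i:i + size] for i in range(0, len(frames) - size + 1, step)]
```

In use, you would call frame_features on every frame with the same label list, then pass the resulting list to sliding_windows. The fixed label order is what keeps each vector position meaningful across frames.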
VR/AR & Spatial Experiences
- 3D pose anchors for avatar control
- Face + hand rigging inputs
- Spatial interaction mapping for immersive apps
Animation & Character Systems
- Body + hand pose capture inputs
- Expression and facial landmark signals
- Retargeting pipelines (with your rigging layer)
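Frame-by-frame estimates can jitter, so capture pipelines commonly smooth keypoints before retargeting. Below is a minimal sketch using an exponential moving average, a simple choice (One Euro or Kalman filters are common alternatives). KeypointSmoother is a hypothetical class for illustration, not an EyePop feature.

```python
from typing import Dict, Tuple

Point3D = Tuple[float, float, float]


class KeypointSmoother:
    """Exponential moving average over per-frame keypoints, keyed by classLabel.

    alpha close to 1.0 follows the raw signal closely; smaller values
    suppress more jitter at the cost of more lag.
    """

    def __init__(self, alpha: float = 0.5) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state: Dict[str, Point3D] = {}

    def update(self, frame: Dict[str, Point3D]) -> Dict[str, Point3D]:
        """Blend one frame into the running estimate and return a copy of it.

        Labels absent from `frame` keep their last smoothed value.
        """
        a = self.alpha
        for label, (x, y, z) in frame.items():
            prev = self.state.get(label)
            if prev is None:
                # First sighting of this landmark: nothing to blend with yet.
                self.state[label] = (x, y, z)
            else:
                px, py, pz = prev
                self.state[label] = (a * x + (1 - a) * px,
                                     a * y + (1 - a) * py,
                                     a * z + (1 - a) * pz)
        return dict(self.state)
```

Holding the last value for a missing landmark is a design choice. For occlusion-heavy footage you may prefer to drop stale entries after a few frames.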
Research & Analytics
- Detailed movement analysis
- Micro-motions in hands + face
- Behavior and interaction pattern studies (with appropriate consent/workflows)
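For movement analysis, a basic derived signal is per-landmark speed between frames. The sketch below assumes each frame's landmarks are parsed into {label: (x, y, z)} dicts. It also assumes the top-level seconds field gives the frame's time offset in seconds; the example only shows 0, so verify this against real video output. Speeds come out in whatever units x, y, and z use, per second. keypoint_speeds is a hypothetical helper.

```python
import math
from typing import Dict, Tuple

Point3D = Tuple[float, float, float]


def keypoint_speeds(
    prev: Dict[str, Point3D], prev_t: float,
    curr: Dict[str, Point3D], curr_t: float,
) -> Dict[str, float]:
    """Per-landmark 3D speed between two frames (coordinate units per second).

    Only landmarks present in both frames are reported.
    """
    dt = curr_t - prev_t
    if dt <= 0:
        raise ValueError("frames must be in increasing time order")
    return {
        label: math.dist(prev[label], curr[label]) / dt
        for label in prev.keys() & curr.keys()
    }
```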
Why 3D Keypoints Matter
3D keypoints give you depth-aware structure.
That unlocks:
- More reliable pose interpretation across angle changes
- Better tracking inputs for immersive systems
- Fine motor detail for hands and facial landmarks
- Cleaner downstream features for recognition and interaction logic
If you’re building anything “interactive” or “immersive,” 3D keypoints are often the difference between a novelty demo and a usable feature.
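To make "depth-aware" concrete, a joint angle computed from full 3D vectors can differ sharply from one computed on the 2D image projection. Below is a minimal sketch. It assumes x, y, and z are on a comparable scale, which you should check against your own output before relying on absolute angles. joint_angle_deg is a hypothetical helper, and the coordinates in the usage lines are synthetic.

```python
import math
from typing import Tuple

Point3D = Tuple[float, float, float]


def joint_angle_deg(a: Point3D, b: Point3D, c: Point3D) -> float:
    """Angle ABC in degrees, measured at joint b (e.g. shoulder, elbow, wrist)."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(u * v for u, v in zip(ba, bc))
    norm = math.hypot(*ba) * math.hypot(*bc)
    if norm == 0:
        raise ValueError("coincident points: angle is undefined")
    # Clamp before acos to absorb floating-point drift just outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Same joint with depth dropped versus kept (synthetic coordinates):
flat = joint_angle_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (-1.0, 0.0, 0.0))  # c's z zeroed: 180 degrees, looks straight
deep = joint_angle_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (-1.0, 0.0, 1.0))  # with z: about 135 degrees, actually bent
```

With the example output, the left arm would be passed as ("left shoulder", "left elbow", "left wrist").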
Deployment Options
EyePop Cloud
- Scalable
- Managed infrastructure
- Best for web apps + fast iteration
On-Premise Runtime
- Keep video inside your network
- Lower-latency options
- Works with GPU or CPU environments
- Ideal for sensitive footage or regulated contexts
Who This Is For
- Developers building VR/AR or interactive camera experiences
- Teams working on gesture or sign-language interfaces
- Product teams needing structured 3D keypoint outputs without ML staffing
- Anyone who needs face + body + hands mapped in 3D—fast
Get early access
Want to move faster with visual automation? Request early access to Abilities and get notified as new vision capabilities roll out.