People + Common Objects
Detect people and everyday objects in the same frame (images, video, or live streams)

Model ID: eyepop.common-objects:latest
Model type: Pre-trained Model
Description
Detect not only people, but also the everyday objects they interact with—like furniture, electronics, tools, and common household items.
This model returns structured bounding box coordinates with confidence scores and class labels so you can build richer context around human-object interactions for workflows in retail, warehousing, and home automation.
Use it on images, recorded video, or live streams. No custom training required.
Optimized for:
- Multi-class detection (people + common objects)
- Scene understanding and interaction context
- Frame-by-frame results for video
- Cloud or On-Prem deployment
- Fast setup for prototype → production
Why This Model Exists
“Person detection” answers one question: Where are people?
But many real products need the next layer of context:
What are they interacting with?
What objects are present?
What changed in the scene?
Teams usually try to solve this by stitching together multiple models (person + object + custom rules). That tends to create friction:
- Inconsistent labels and confidence behavior across models
- More infrastructure, more points of failure
- Harder debugging when outputs disagree
- Slower iteration when you need “scene context” quickly
This model exists to provide a single, dependable baseline:
people + common objects in one pass, with a unified output schema—so you can build interaction logic, automation, and analytics without standing up a complex vision stack first.
Key Capabilities
Input Types
- Single images
- Video files
- RTSP / livestream feeds
- Webcam / IP camera streams
Output
- JSON with bounding boxes
- Confidence scores
- Object class labels (person + common objects)
- Frame-level detections (for video/streams)
Deployment
- EyePop Cloud
- On-Premise AI Application Runtime
- Edge devices with GPU or CPU
Setup
1. Create an account
2. Get an API key
3. Send your media
4. Receive detections instantly
No training. No labeling. No model configuration.
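The four setup steps above can be sketched in a few lines of Python. The endpoint URL, auth header, and payload format below are placeholders, not the documented EyePop API; substitute the real values from your account dashboard.

```python
import urllib.request

# Placeholder values -- the URL and header names here are assumptions,
# not the documented EyePop API. Substitute the real endpoint and key
# from your account.
ENDPOINT = "https://example.invalid/v1/detect"
API_KEY = "YOUR_API_KEY"

def build_detection_request(image_bytes: bytes,
                            api_key: str = API_KEY,
                            endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Package raw image bytes as an HTTP POST carrying the API key."""
    return urllib.request.Request(
        endpoint,
        data=image_bytes,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "image/jpeg",
        },
        method="POST",
    )

# Sending it (uncomment once ENDPOINT and API_KEY are real):
# with open("frame.jpg", "rb") as f:
#     with urllib.request.urlopen(build_detection_request(f.read())) as resp:
#         detections = resp.read()
```

The same request shape works for single images; video files and streams go through the streaming interfaces listed under Input Types.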
Example Output
```json
{
  "objects": [
    {
      "category": "person",
      "classLabel": "person",
      "confidence": 0.945,
      "x": 312.4,
      "y": 418.7,
      "width": 284.1,
      "height": 512.6
    },
    {
      "category": "object",
      "classLabel": "laptop",
      "confidence": 0.921,
      "x": 428.2,
      "y": 296.1,
      "width": 402.8,
      "height": 268.3
    },
    {
      "category": "object",
      "classLabel": "potted_plant",
      "confidence": 0.884,
      "x": 778.6,
      "y": 212.9,
      "width": 166.7,
      "height": 174.4
    }
  ],
  "source_width": 1920,
  "source_height": 1080
}
```
(Swap the class list + label naming to match the exact taxonomy your model uses.)
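Consuming this schema takes only a small helper or two. A minimal sketch, assuming `x`/`y` is each box's top-left corner in source pixels (verify against your deployment); the sample values are copied from the response above.

```python
import json

# Two detections copied from the sample response above.
RESPONSE = json.loads("""
{
  "objects": [
    {"category": "person", "classLabel": "person", "confidence": 0.945,
     "x": 312.4, "y": 418.7, "width": 284.1, "height": 512.6},
    {"category": "object", "classLabel": "laptop", "confidence": 0.921,
     "x": 428.2, "y": 296.1, "width": 402.8, "height": 268.3}
  ],
  "source_width": 1920,
  "source_height": 1080
}
""")

def to_corners(obj: dict) -> tuple[float, float, float, float]:
    """Convert x/y/width/height to (x1, y1, x2, y2), assuming x/y is
    the top-left corner in source pixels."""
    return (obj["x"], obj["y"],
            obj["x"] + obj["width"], obj["y"] + obj["height"])

def filter_by(objects: list[dict], label: str,
              min_conf: float = 0.5) -> list[dict]:
    """Keep detections of one class above a confidence threshold."""
    return [o for o in objects
            if o["classLabel"] == label and o["confidence"] >= min_conf]

people = filter_by(RESPONSE["objects"], "person")
print(to_corners(people[0]))
```

For video and streams, the same parsing applies per frame; attach your frame index or timestamp to each detection before storing it.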
Practical Use Cases
Retail & In-Store Analytics
- Shopper + product proximity signals
- “Pick up / put back” interaction cues (with rules)
- Queue context (people + carts/baskets/items)
- Loss prevention inputs (context around hands + items)
Warehousing & Operations
- Worker + tool/equipment visibility
- Station utilization (person + workstation objects)
- Pallet/cart/forklift context (if included in the taxonomy)
- Process compliance cues (with ROI + thresholds)
Home & Smart Spaces
- Presence + object awareness (lights, devices, furniture zones)
- Contextual automations (“person near door + package present”)
- Room state understanding (object changes over time)
Content Understanding & Indexing
- Search footage by what’s in the scene (person + objects)
- Auto-tagging and filtering by common item classes
- Faster review workflows for large video libraries
Why This Output Matters
Common-object detection gives you scene context.
That means you can derive:
- Proximity (“person near laptop”)
- Interaction likelihood (“hand near tool”)
- State changes (“object appears/disappears”)
- Zone behaviors (“person + object inside ROI”)
All from simple bounding boxes—without building a custom model first.
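The derived signals above reduce to a few lines of box geometry. A sketch, assuming `x`/`y` is each box's top-left corner in pixels, using the person and laptop boxes from the example output; the 400 px threshold and the `desk_zone` ROI are arbitrary illustration values you would tune per camera.

```python
import math

def center(box: dict) -> tuple[float, float]:
    """Box center, assuming x/y is the top-left corner."""
    return (box["x"] + box["width"] / 2, box["y"] + box["height"] / 2)

def distance(a: dict, b: dict) -> float:
    """Euclidean distance between two box centers, in pixels."""
    (ax, ay), (bx, by) = center(a), center(b)
    return math.hypot(ax - bx, ay - by)

def in_roi(box: dict, roi: tuple[float, float, float, float]) -> bool:
    """True if the box center falls inside an ROI given as (x1, y1, x2, y2)."""
    cx, cy = center(box)
    x1, y1, x2, y2 = roi
    return x1 <= cx <= x2 and y1 <= cy <= y2

# Boxes from the sample response above.
person = {"x": 312.4, "y": 418.7, "width": 284.1, "height": 512.6}
laptop = {"x": 428.2, "y": 296.1, "width": 402.8, "height": 268.3}

# Proximity: "person near laptop" = centers within 400 px (tune per camera).
near = distance(person, laptop) < 400

# Zone behavior: "person + object inside ROI" for a hypothetical desk zone.
desk_zone = (300.0, 200.0, 900.0, 1000.0)
both_in_zone = in_roi(person, desk_zone) and in_roi(laptop, desk_zone)
print(near, both_in_zone)
```

State changes ("object appears/disappears") follow the same pattern: diff the set of class labels present between consecutive frames.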
Deployment Options
EyePop Cloud
- Scalable
- Managed infrastructure
- Best for web apps + fast iteration
On-Premise Runtime
- Keep video inside your network
- Lower latency options
- Works with GPU or CPU environments
- Ideal for regulated or sensitive environments
Who This Is For
- Developers who need “people + objects” context fast
- Teams building automation rules from camera feeds
- Product teams prototyping interaction-aware features
- Anyone who wants a unified detection output without stitching models together
Get early access
Want to move faster with visual automation? Request early access to Abilities and get notified as new vision capabilities roll out.