People + Common Objects
Detect people and everyday objects in the same frame (images, video, or live streams)

Model ID: eyepop.common-objects:latest
Model type: Pre-trained Model
Description
Detect not only people, but also the everyday objects they interact with—like furniture, electronics, tools, and common household items.
This model returns structured bounding box coordinates with confidence scores and class labels so you can build richer context around human-object interactions for workflows in retail, warehousing, and home automation.
Use it on images, recorded video, or live streams. No custom training required.
Optimized for:
- Multi-class detection (people + common objects)
- Scene understanding and interaction context
- Frame-by-frame results for video
- Cloud or On-Prem deployment
- Fast setup for prototype → production
Why This Model Exists
“Person detection” answers one question: Where are people?
But many real products need the next layer of context:
What are they interacting with?
What objects are present?
What changed in the scene?
Teams usually try to solve this by stitching together multiple models (person + object + custom rules). That tends to create friction:
- Inconsistent labels and confidence behavior across models
- More infrastructure, more points of failure
- Harder debugging when outputs disagree
- Slower iteration when you need “scene context” quickly
This model exists to provide a single, dependable baseline:
people + common objects in one pass, with a unified output schema—so you can build interaction logic, automation, and analytics without standing up a complex vision stack first.
Key Capabilities
Input Types
- Single images
- Video files
- RTSP / livestream feeds
- Webcam / IP camera streams
Output
- JSON with bounding boxes
- Confidence scores
- Object class labels (person + common objects)
- Frame-level detections (for video/streams)
Deployment
- EyePop Cloud
- On-Premise AI Application Runtime
- Edge devices with GPU or CPU
Setup
1. Create an account
2. Get an API key
3. Send your media
4. Receive detections instantly
No training. No labeling. No model configuration.
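The four setup steps above can be sketched in a few lines of Python. The endpoint URL, auth header, and payload format below are placeholders, not the documented EyePop API; substitute the real values from your account dashboard.

```python
import urllib.request

# Placeholder values -- the URL and header names here are assumptions,
# not the documented EyePop API. Substitute the real endpoint and key
# from your account.
ENDPOINT = "https://example.invalid/v1/detect"
API_KEY = "YOUR_API_KEY"

def build_detection_request(image_bytes: bytes,
                            api_key: str = API_KEY,
                            endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Package raw image bytes as an HTTP POST carrying the API key."""
    return urllib.request.Request(
        endpoint,
        data=image_bytes,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "image/jpeg",
        },
        method="POST",
    )

# Sending it (uncomment once ENDPOINT and API_KEY are real):
# with open("frame.jpg", "rb") as f:
#     with urllib.request.urlopen(build_detection_request(f.read())) as resp:
#         detections = resp.read()
```

The same request shape works for single images; video files and streams go through the streaming interfaces listed under Input Types.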
Example Output
```json
{
  "objects": [
    {
      "category": "person",
      "classLabel": "person",
      "confidence": 0.945,
      "x": 312.4,
      "y": 418.7,
      "width": 284.1,
      "height": 512.6
    },
    {
      "category": "object",
      "classLabel": "laptop",
      "confidence": 0.921,
      "x": 428.2,
      "y": 296.1,
      "width": 402.8,
      "height": 268.3
    },
    {
      "category": "object",
      "classLabel": "potted_plant",
      "confidence": 0.884,
      "x": 778.6,
      "y": 212.9,
      "width": 166.7,
      "height": 174.4
    }
  ],
  "source_width": 1920,
  "source_height": 1080
}
```
(Swap the class list + label naming to match the exact taxonomy your model uses.)
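Consuming this schema takes only a small helper or two. A minimal sketch, assuming `x`/`y` is each box's top-left corner in source pixels (verify against your deployment); the sample values are copied from the response above.

```python
import json

# Two detections copied from the sample response above.
RESPONSE = json.loads("""
{
  "objects": [
    {"category": "person", "classLabel": "person", "confidence": 0.945,
     "x": 312.4, "y": 418.7, "width": 284.1, "height": 512.6},
    {"category": "object", "classLabel": "laptop", "confidence": 0.921,
     "x": 428.2, "y": 296.1, "width": 402.8, "height": 268.3}
  ],
  "source_width": 1920,
  "source_height": 1080
}
""")

def to_corners(obj: dict) -> tuple[float, float, float, float]:
    """Convert x/y/width/height to (x1, y1, x2, y2), assuming x/y is
    the top-left corner in source pixels."""
    return (obj["x"], obj["y"],
            obj["x"] + obj["width"], obj["y"] + obj["height"])

def filter_by(objects: list[dict], label: str,
              min_conf: float = 0.5) -> list[dict]:
    """Keep detections of one class above a confidence threshold."""
    return [o for o in objects
            if o["classLabel"] == label and o["confidence"] >= min_conf]

people = filter_by(RESPONSE["objects"], "person")
print(to_corners(people[0]))
```

For video and streams, the same parsing applies per frame; attach your frame index or timestamp to each detection before storing it.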
Practical Use Cases
Retail & In-Store Analytics
- Shopper + product proximity signals
- “Pick up / put back” interaction cues (with rules)
- Queue context (people + carts/baskets/items)
- Loss prevention inputs (context around hands + items)
Warehousing & Operations
- Worker + tool/equipment visibility
- Station utilization (person + workstation objects)
- Pallet/cart/forklift context (if included in the taxonomy)
- Process compliance cues (with ROI + thresholds)
Home & Smart Spaces
- Presence + object awareness (lights, devices, furniture zones)
- Contextual automations (“person near door + package present”)
- Room state understanding (object changes over time)
Content Understanding & Indexing
- Search footage by what’s in the scene (person + objects)
- Auto-tagging and filtering by common item classes
- Faster review workflows for large video libraries
Why This Output Matters
Common-object detection gives you scene context.
That means you can derive:
- Proximity (“person near laptop”)
- Interaction likelihood (“hand near tool”)
- State changes (“object appears/disappears”)
- Zone behaviors (“person + object inside ROI”)
All from simple bounding boxes—without building a custom model first.
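The derived signals above reduce to a few lines of box geometry. A sketch, assuming `x`/`y` is each box's top-left corner in pixels, using the person and laptop boxes from the example output; the 400 px threshold and the `desk_zone` ROI are arbitrary illustration values you would tune per camera.

```python
import math

def center(box: dict) -> tuple[float, float]:
    """Box center, assuming x/y is the top-left corner."""
    return (box["x"] + box["width"] / 2, box["y"] + box["height"] / 2)

def distance(a: dict, b: dict) -> float:
    """Euclidean distance between two box centers, in pixels."""
    (ax, ay), (bx, by) = center(a), center(b)
    return math.hypot(ax - bx, ay - by)

def in_roi(box: dict, roi: tuple[float, float, float, float]) -> bool:
    """True if the box center falls inside an ROI given as (x1, y1, x2, y2)."""
    cx, cy = center(box)
    x1, y1, x2, y2 = roi
    return x1 <= cx <= x2 and y1 <= cy <= y2

# Boxes from the sample response above.
person = {"x": 312.4, "y": 418.7, "width": 284.1, "height": 512.6}
laptop = {"x": 428.2, "y": 296.1, "width": 402.8, "height": 268.3}

# Proximity: "person near laptop" = centers within 400 px (tune per camera).
near = distance(person, laptop) < 400

# Zone behavior: "person + object inside ROI" for a hypothetical desk zone.
desk_zone = (300.0, 200.0, 900.0, 1000.0)
both_in_zone = in_roi(person, desk_zone) and in_roi(laptop, desk_zone)
print(near, both_in_zone)
```

State changes ("object appears/disappears") follow the same pattern: diff the set of class labels present between consecutive frames.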
Deployment Options
EyePop Cloud
- Scalable
- Managed infrastructure
- Best for web apps + fast iteration
On-Premise Runtime
- Keep video inside your network
- Lower latency options
- Works with GPU or CPU environments
- Ideal for regulated or sensitive environments
Who This Is For
- Developers who need “people + objects” context fast
- Teams building automation rules from camera feeds
- Product teams prototyping interaction-aware features
- Anyone who wants a unified detection output without stitching models together
Get early access
Want to move faster with visual automation? Request early access to Abilities and get notified as new vision capabilities roll out.