Real-Time Object Tracking & Counting Pipeline with DETR and SORT

This repository demonstrates a complete pipeline for real-time object detection and tracking in video streams. It combines a pre-trained detection model (Facebook's DETR) with a lightweight tracking algorithm (SORT) to produce cumulative, per-class counts of unique objects.

This project serves as a proof-of-concept for building practical computer vision applications, such as automated traffic analysis, retail footfall counting, or security monitoring.

Project Demo

Author: Arizal Firdaus

🚀 Pipeline Overview

The system processes a video frame-by-frame, following these core steps:

  1. Detection: The DETR model identifies and provides bounding boxes for all target objects in the current frame.
  2. Tracking: The detections are passed to the SORT algorithm, which assigns a unique ID to each object and tracks its position over time.
  3. Counting: The counter follows a "count all unique IDs" rule. The system keeps a record of every track ID it has seen; when a previously unseen ID appears, the total for that object's class is incremented by one.
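
The counting step can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's actual code: the class name `UniqueObjectCounter` and the `(track_id, class_name)` input format are assumptions made for the example, and how a class label gets attached to each track ID is left out.

```python
from collections import Counter


class UniqueObjectCounter:
    """Counts each track ID once, grouped by class label (hypothetical helper)."""

    def __init__(self):
        self.seen_ids = set()    # every track ID observed so far
        self.totals = Counter()  # cumulative count per class

    def update(self, tracks):
        """Process one frame; tracks is an iterable of (track_id, class_name)."""
        for track_id, class_name in tracks:
            if track_id not in self.seen_ids:
                # First time this ID appears: count it for its class.
                self.seen_ids.add(track_id)
                self.totals[class_name] += 1
        return dict(self.totals)


counter = UniqueObjectCounter()
counter.update([(1, "car"), (2, "person")])  # frame 1: two new IDs
counter.update([(1, "car"), (3, "car")])     # frame 2: ID 1 repeats, ID 3 is new
print(dict(counter.totals))
```

Using a set for the seen IDs keeps the membership check constant-time per track, which matters when the loop runs on every frame.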

🛠️ Core Components

This application is built by combining two key components:

  • Detector: facebook/detr-resnet-50. An end-to-end, transformer-based object detector with a ResNet-50 backbone, developed by Facebook AI. It is pre-trained on the COCO dataset, enabling it to recognize 80 common object categories.
  • Tracker: SORT (Simple Online and Realtime Tracking). A classic, lightweight, and efficient tracking algorithm that excels at maintaining object identities in real-time.
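
As background on how the two components connect: SORT, as originally published, predicts each track's next bounding box with a Kalman filter and matches those predictions to the detector's new boxes by intersection over union (IoU), solved with the Hungarian algorithm. The sketch below shows only the IoU measure, assuming boxes in `(x1, y1, x2, y2)` pixel format. It is illustrative and not taken from this repository's code.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# A detection shifted half a box-width from a track's predicted position.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

A detection is only assigned to an existing track when its IoU with the predicted box is high enough; otherwise the tracker starts a new ID, which is what the counting step ultimately sees.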

🚀 Usage

There are two ways to explore this project, both available directly from this Hugging Face repository.

1. Live Demo on Hugging Face Spaces (Recommended)

The easiest way to see this project in action is to use the interactive demo hosted in the "Spaces" tab of this repository. This allows you to upload your own video or use a sample directly in your browser without any installation.

👉 Try the Live Demo Here!

2. Local Installation (For Developers)

If you want to run the code on your own machine, you can clone this repository directly from Hugging Face.

  1. Clone the Hugging Face repository:

    git clone https://huggingface.co/RijalMuluk/detr-sort-object-counting
    cd detr-sort-object-counting
    
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install the required dependencies:

    pip install -r requirements.txt 
    
  4. Run the notebook:

    • The project notebook (.ipynb) is included in this repository.
    • Place your input video file in the project directory.
    • Open the notebook in a Jupyter environment and run all the cells. The processed video will be saved locally.

⚠️ Limitations and Considerations

As a baseline proof-of-concept, this system has several known limitations:

  • Occlusion: The SORT tracker's performance can degrade when objects are heavily occluded (i.e., hidden behind one another). This can lead to ID switches, and because every new ID is counted, an ID switch can inflate the cumulative count. More advanced trackers like DeepSORT or ByteTrack could be used to mitigate this.
  • Confidence Threshold: The number of objects detected and subsequently tracked is highly sensitive to the threshold parameter set during detection. A low threshold may increase false positives, while a high one may miss valid objects.
  • Domain Specificity: The underlying DETR model was trained on the COCO dataset. Its performance may vary on videos with different lighting conditions, camera angles, or object types not well-represented in COCO.
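
To make the confidence-threshold sensitivity concrete, here is a small sketch that filters one frame's detections at several thresholds. The detection dictionaries, their scores, and the helper name `filter_by_confidence` are hypothetical values chosen for illustration, not output from the model.

```python
def filter_by_confidence(detections, threshold):
    """Keep only detections whose score meets the threshold."""
    return [d for d in detections if d["score"] >= threshold]


# Hypothetical detections for a single frame (made-up scores).
detections = [
    {"label": "car", "score": 0.97},
    {"label": "car", "score": 0.72},
    {"label": "person", "score": 0.41},
]

for threshold in (0.3, 0.7, 0.9):
    kept = filter_by_confidence(detections, threshold)
    print(f"threshold={threshold}: {len(kept)} detections passed to the tracker")
```

Only the detections that survive the filter reach the tracker, so the same frame can contribute three, two, or one tracked objects depending on the setting. It is worth trying a few values on a representative clip before trusting the counts.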

🙏 Acknowledgements and Credits

This project was made possible by leveraging several key resources:
