📌 DETR + Keypoint Estimation (COCO Subset)

🧠 Model Overview

This project combines:

🤖 facebook/detr-resnet-50 (object detector)
🧱 Custom PyTorch keypoint head
📊 Trained on a 500-person subset of COCO 2017 Keypoints

The system detects people using DETR, then predicts 17 COCO-style keypoints (top-down) using heatmap regression.
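Heatmap regression trains the head to output one map per keypoint that peaks at that keypoint's location. A common way to build the training targets is to place a 2D Gaussian on each annotated keypoint and use an all-zero map for unlabeled ones. The sketch below assumes a 64×64 heatmap and σ = 2; both values and the function names are hypothetical, and the notebook's actual settings may differ.

```python
import numpy as np

def gaussian_heatmap(x, y, size=64, sigma=2.0):
    """Return a size×size map with a 2D Gaussian peak (value 1.0) at (x, y).

    x, y are keypoint coordinates already expressed in heatmap pixels.
    """
    xs = np.arange(size, dtype=np.float32)           # column indices
    ys = np.arange(size, dtype=np.float32)[:, None]  # row indices (column vector)
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))

def make_targets(keypoints, size=64, sigma=2.0):
    """keypoints: (17, 3) array of (x, y, v) in heatmap pixels.

    Keypoints with v == 0 (not labeled) get an all-zero heatmap.
    """
    targets = np.zeros((len(keypoints), size, size), dtype=np.float32)
    for k, (x, y, v) in enumerate(keypoints):
        if v > 0:
            targets[k] = gaussian_heatmap(x, y, size, sigma)
    return targets
```

With this target, the training loss is simply the mean squared error between the predicted and target maps, and the keypoint is later read off as each map's peak.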

📂 Files Included

| File | Description |
|------|-------------|
| `pytorch_model.bin` | Trained PyTorch model weights |
| `05_detr_pose_coco_colab.ipynb` | Full Colab notebook (training + inference) |
| `config.json` | Basic model metadata |
| `README.md` | Project description |

📚 Dataset

Subset: 500 images from COCO val2017 with visible persons
Annotations: 17 keypoints per person
Source: COCO Keypoints
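One way to pick such a subset from the official COCO annotation file is sketched below. The selection rule (non-crowd person annotations with at least one labeled keypoint, first N images in file order) is an assumption; the notebook may have filtered differently. `coco` is the parsed JSON, for example the result of `json.load` on `person_keypoints_val2017.json`.

```python
def select_person_images(coco, max_images=500):
    """Return up to max_images image ids that contain at least one
    non-crowd person annotation with labeled keypoints.

    coco: dict loaded from a COCO keypoints annotation file (hypothetical helper).
    """
    person_ids = {c["id"] for c in coco["categories"] if c["name"] == "person"}
    selected, seen = [], set()
    for ann in coco["annotations"]:
        img_id = ann["image_id"]
        if img_id in seen or ann["category_id"] not in person_ids:
            continue
        # Skip crowd regions and people with no labeled keypoints
        if ann.get("iscrowd", 0) == 0 and ann.get("num_keypoints", 0) > 0:
            seen.add(img_id)
            selected.append(img_id)
            if len(selected) >= max_images:
                break
    return selected
```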

🏗️ Architecture

[ Input Image ]
      │
      ▼
[ DETR (Person BBox) ]
      │
      ▼
[ Crop + Resize (256×256) ]
      │
      ▼
[ CNN Keypoint Head ]
      │
      ▼
[ 17 Heatmaps (Keypoints) ]
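The diagram stops at the heatmaps; getting keypoint coordinates in the original image takes one more step. A minimal sketch, assuming the crop is the raw DETR box resized (without preserving aspect ratio) to 256×256 and that each keypoint is read off as its heatmap's argmax. The function name and the half-pixel centre convention are hypothetical, not taken from the notebook.

```python
import numpy as np

def decode_heatmaps(heatmaps, box):
    """Convert (17, H, W) heatmaps for one person crop into a (17, 3) array
    of (x, y, score) in original-image pixels.

    box: (x_min, y_min, x_max, y_max) of the person box that was cropped
    and resized to the head's input size.
    """
    num_kp, h, w = heatmaps.shape
    x_min, y_min, x_max, y_max = box
    flat = heatmaps.reshape(num_kp, -1)
    idx = flat.argmax(axis=1)          # peak location per keypoint
    scores = flat.max(axis=1)          # peak value used as a confidence
    ys, xs = np.unravel_index(idx, (h, w))
    # Heatmap pixel -> fraction of the crop -> original-image pixel.
    # The crop was stretched to a square, so x and y scale independently.
    out_x = x_min + (xs + 0.5) / w * (x_max - x_min)
    out_y = y_min + (ys + 0.5) / h * (y_max - y_min)
    return np.stack([out_x, out_y, scores], axis=1)
```

Because the mapping only depends on the heatmap size, the same function works whether the head outputs 64×64 or full 256×256 maps. A PyTorch output would be passed as `heatmaps[0].numpy()` for the first crop in the batch.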

🚀 Quick Start

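import torch
# model.py is not among the files listed above; KeypointHead is presumably
# defined in 05_detr_pose_coco_colab.ipynb, so copy that class into model.py first
from model import KeypointHead

model = KeypointHead()
# map_location='cpu' lets the checkpoint load on machines without a GPU
model.load_state_dict(torch.load('pytorch_model.bin', map_location='cpu'))
model.eval()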

🧪 Inference Demo

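import torch
import numpy as np
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

img = Image.open('sample.jpg').convert('RGB')
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
detector = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

inputs = processor(images=img, return_tensors="pt")
with torch.no_grad():
    outputs = detector(**inputs)
# target_sizes expects (height, width); PIL's img.size is (width, height)
results = processor.post_process_object_detection(outputs, target_sizes=[img.size[::-1]], threshold=0.8)[0]

# DETR detects all COCO classes, so keep only the "person" boxes
person_boxes = [box for box, label in zip(results['boxes'], results['labels'])
                if detector.config.id2label[label.item()] == 'person']

# Crop the first person (box = [x_min, y_min, x_max, y_max] in pixels) and resize to 256×256
crop = img.crop(tuple(person_boxes[0].tolist())).resize((256, 256))

# Feed the crop (not the full image) to the keypoint head loaded in Quick Start.
# Assumption: the head takes a 1×3×256×256 tensor scaled to [0, 1]; match the notebook's preprocessing
x = torch.from_numpy(np.array(crop)).permute(2, 0, 1).float().div(255).unsqueeze(0)
with torch.no_grad():
    heatmaps = model(x)  # 17 heatmaps, one per keypoint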

🧠 Training (optional)

To fine-tune on your own dataset:

Convert your annotations to COCO keypoint format (17 keypoints per person)
Open the provided notebook (05_detr_pose_coco_colab.ipynb)
Update the dataset paths in the notebook and re-run training
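For reference, a COCO keypoint annotation stores each person's 17 keypoints as a flat list of (x, y, v) triplets in a fixed order, where v = 0 means not labeled, 1 labeled but not visible, and 2 labeled and visible. The helper below is a hypothetical sketch for building one such record; it approximates `area` with the box area, whereas official COCO files use the segmentation area.

```python
# Standard COCO person keypoint order
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def make_annotation(ann_id, image_id, bbox, points):
    """Build one COCO-style person annotation (hypothetical helper).

    bbox: [x, y, width, height] in pixels.
    points: dict mapping keypoint name -> (x, y, v); missing names are
    stored as (0, 0, 0), i.e. not labeled.
    """
    keypoints = []
    for name in COCO_KEYPOINTS:
        x, y, v = points.get(name, (0, 0, 0))
        keypoints.extend([x, y, v])
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": 1,                # "person" in COCO
        "bbox": bbox,
        "area": bbox[2] * bbox[3],       # approximation: box area, not mask area
        "iscrowd": 0,
        "keypoints": keypoints,          # flat list of 17 * 3 = 51 numbers
        # num_keypoints counts entries with v > 0 (every third value)
        "num_keypoints": sum(1 for i in range(2, len(keypoints), 3) if keypoints[i] > 0),
    }
```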

✨ Credit

Evaluation results

Heatmap MSE on COCO 2017 (50-person subset, self-reported): ~0.02
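The exact definition behind the ~0.02 figure is not documented here. A common reading is the per-pixel mean squared error between predicted and target heatmaps, sketched below; the optional masking of unlabeled keypoints is an assumption. Because the value depends on heatmap size, σ, and target scaling, it is not directly comparable with COCO's OKS-based keypoint AP.

```python
import numpy as np

def heatmap_mse(pred, target, visibility=None):
    """Per-pixel mean squared error between heatmap arrays of shape (N, 17, H, W).

    visibility: optional (N, 17) array of COCO v flags; maps with v == 0
    are excluded from the average (returns nan if nothing is labeled).
    """
    sq_err = (pred - target) ** 2
    if visibility is None:
        return float(sq_err.mean())
    # Broadcast the per-keypoint mask over the H and W axes
    mask = np.broadcast_to((visibility > 0)[:, :, None, None], sq_err.shape)
    return float(sq_err[mask].mean())
```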