How to Build an Object Detection Model with Python

1. Overview

Object detection involves identifying and locating objects in an image or video. This guide shows how to build a basic real-time object detector in Python by running a pre-trained YOLOv4-tiny (You Only Look Once) model through OpenCV's DNN module on live webcam video.

2. Prerequisites

Basic knowledge of Python programming.
Familiarity with OpenCV.
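Required packages: OpenCV (imported as cv2) and imutils.
Model files: the YOLOv4-tiny configuration (yolov4-tiny.cfg), its pre-trained weights (yolov4-tiny.weights), and a class-name list (classes.txt), placed in a dnn_model folder inside the directory you run the script from.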

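Install OpenCV and imutils:

pip install opencv-python
pip install imutils

Install only opencv-python, not opencv-python-headless as well: the headless package omits GUI functions such as cv2.imshow, which this script uses, and installing both packages in one environment causes conflicts.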
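Before running the script, it can help to confirm that both packages import and that the model files sit where the code expects them. The sketch below is a hypothetical pre-flight check (preflight_check is not part of OpenCV or imutils); it assumes the dnn_model folder layout used in this guide and relies only on the standard library, so it runs even when OpenCV is missing.

```python
import importlib.util
from pathlib import Path

# File and module names taken from the paths and imports used in this guide
REQUIRED_FILES = ("yolov4-tiny.cfg", "yolov4-tiny.weights", "classes.txt")
REQUIRED_MODULES = ("cv2", "imutils")


def preflight_check(model_dir="dnn_model"):
    """Hypothetical helper: return (missing modules, missing model files)."""
    # find_spec returns None when a top-level module cannot be found
    missing_modules = [name for name in REQUIRED_MODULES
                       if importlib.util.find_spec(name) is None]
    folder = Path(model_dir)
    missing_files = [name for name in REQUIRED_FILES
                     if not (folder / name).is_file()]
    return missing_modules, missing_files


if __name__ == "__main__":
    modules, files = preflight_check()
    print("Missing modules:", modules or "none")
    print("Missing model files:", files or "none")
```

An empty result for both lists means the script should at least get past its imports and model loading.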

3. Code Walkthrough and Explanation

Here’s a step-by-step explanation of the code:

import cv2
import imutils

cv2: OpenCV library for computer vision tasks.
imutils: Simplifies image resizing and transformations.

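# Load the pre-trained deep neural network (DNN) model
net = cv2.dnn.readNet("dnn_model/yolov4-tiny.cfg", "dnn_model/yolov4-tiny.weights")
model = cv2.dnn_DetectionModel(net)
# 416x416 matches the standard yolov4-tiny.cfg; swapRB converts OpenCV's BGR frames to RGB
model.setInputParams(size=(416, 416), scale=1/255, swapRB=True)

cv2.dnn.readNet: Loads the YOLO model configuration (.cfg) and pre-trained weights (.weights).
cv2.dnn_DetectionModel: Wraps the network in a high-level detection API that handles input preprocessing and decodes the raw outputs into boxes.
setInputParams: Configures input preprocessing:
- size=(416, 416): Resizes each frame to 416x416 before inference. This matches the width and height in the standard yolov4-tiny.cfg; YOLO downsamples its input by up to a factor of 32, so sides that are multiples of 32 keep the detection grid evenly aligned.
- scale=1/255: Normalizes pixel values from [0, 255] to [0, 1].
- swapRB=True: Swaps the red and blue channels, because OpenCV captures frames in BGR order while YOLO models are trained on RGB images.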
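The arithmetic behind these input settings can be shown in pure Python. The sketch below computes the detection-grid sizes yolov4-tiny produces at its two output strides (16 and 32) for a square input side, and what scale=1/255 does to 8-bit pixel values. The helper names normalize and grid_sizes are hypothetical; OpenCV performs the real resizing and scaling internally.

```python
# Pure-Python illustration of the preprocessing settings (no OpenCV required)
SCALE = 1 / 255          # the factor passed as scale= to setInputParams
YOLO_STRIDES = (16, 32)  # yolov4-tiny predicts at strides 16 and 32


def normalize(pixel_values, scale=SCALE):
    """Hypothetical helper: map 8-bit values in [0, 255] to floats in [0, 1]."""
    return [v * scale for v in pixel_values]


def grid_sizes(side, strides=YOLO_STRIDES):
    """Hypothetical helper: grid cells per side for each stride.

    None flags a side length that the stride does not divide evenly.
    """
    return {s: (side // s if side % s == 0 else None) for s in strides}


print(normalize([0, 128, 255]))
print(grid_sizes(416))  # {16: 26, 32: 13}
print(grid_sizes(500))  # {16: None, 32: None}
```

A 416-pixel side gives whole 26x26 and 13x13 grids, which is why multiples of 32 are the conventional YOLO input sizes.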

# Load class names for object detection
classes = []
with open("dnn_model/classes.txt", "r") as file_object:
    for class_name in file_object.readlines():
        class_name = class_name.strip()
        classes.append(class_name)

classes.txt: Contains names of detectable objects (one per line).
strip: Removes leading/trailing whitespace.
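Line order in classes.txt matters: model.detect reports class IDs as 0-based indices, so line n of the file (counting from 0) names class n. The sketch below reads the same one-name-per-line format from an in-memory io.StringIO stand-in for the real file; load_class_names and the three sample names are illustrative only.

```python
import io


def load_class_names(file_object):
    """Hypothetical helper: one stripped class name per line, in file order."""
    # Line index == class ID, so interior blank lines are kept to preserve
    # the alignment; only trailing blank lines are dropped.
    names = [line.strip() for line in file_object]
    while names and names[-1] == "":
        names.pop()
    return names


sample_file = io.StringIO("person\nbicycle\ncar\n\n")  # sample content, not the real list
classes = load_class_names(sample_file)
print(classes)     # ['person', 'bicycle', 'car']
print(classes[2])  # car
```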

# Access the webcam
cam = cv2.VideoCapture(0)

cv2.VideoCapture(0): Opens the default webcam (index 0).

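while True:
    ret, frame = cam.read()
    if not ret:  # stop if no frame could be read (e.g., the camera was disconnected)
        break
    frame = imutils.resize(frame, width=900)

cam.read(): Captures a single frame from the webcam and returns a success flag (ret) along with the frame; the loop exits when the read fails, since frame would then be None.
imutils.resize: Resizes the frame to 900px wide while preserving its aspect ratio, so the height is computed from the camera's resolution. If both width and height are passed, imutils uses the width and ignores the height.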
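Because imutils.resize keeps the aspect ratio, the resulting height depends on the camera's native resolution. The pure-Python sketch below mirrors the dimension arithmetic of imutils.resize as implemented in recent releases (an assumption worth checking against your installed version); resized_dims is a hypothetical name, and the real function passes these dimensions on to cv2.resize.

```python
def resized_dims(w, h, width=None, height=None):
    """Hypothetical helper: the (width, height) imutils.resize targets for a w x h frame."""
    if width is None and height is None:
        return (w, h)  # nothing requested, so no resize
    if width is None:
        r = height / float(h)  # scale factor derived from the requested height
        return (int(w * r), height)
    r = width / float(w)  # width takes precedence whenever it is given
    return (width, int(h * r))


print(resized_dims(640, 480, width=900))              # (900, 675)
print(resized_dims(640, 480, height=450))             # (600, 450)
print(resized_dims(640, 480, height=450, width=900))  # (900, 675): height ignored
print(resized_dims(1280, 720, width=900))             # (900, 506)
```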

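    # Perform object detection (confidence and NMS thresholds are tunable)
    (class_ids, scores, bboxes) = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)
    for class_id, score, bbox in zip(class_ids, scores, bboxes):
        (x, y, w, h) = bbox
        class_name = classes[int(class_id)]

model.detect(frame, confThreshold=0.5, nmsThreshold=0.4): Detects objects in the frame, discards detections whose confidence is below confThreshold, and applies non-maximum suppression (NMS) so that boxes overlapping by more than nmsThreshold (intersection over union) are reduced to the highest-scoring one. 0.5 and 0.4 are common starting values. It returns:
- class_ids: IDs of detected classes.
- scores: Confidence scores of detections.
- bboxes: Bounding boxes as (x, y, width, height), where (x, y) is the top-left corner in frame pixels.
zip: Groups the three outputs so each detection is processed together.
(x, y, w, h): Unpacks each bounding box.
class_name = classes[int(class_id)]: Maps the numeric class ID to its name; int() guarantees a plain Python integer index.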
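The parallel outputs of model.detect can be filtered and labeled in plain Python. The sketch below uses made-up sample values in place of real detections; the helper name label_detections and its 0.6 cutoff are illustrative choices, not part of the original code.

```python
def label_detections(class_ids, scores, bboxes, classes, min_score=0.6):
    """Hypothetical helper: pair each confident detection with its class name."""
    results = []
    for class_id, score, (x, y, w, h) in zip(class_ids, scores, bboxes):
        if score < min_score:
            continue  # skip low-confidence detections
        results.append((classes[int(class_id)], round(float(score), 2), (x, y, w, h)))
    return results


# Made-up sample values shaped like model.detect's three return values
classes = ["person", "bicycle", "car"]
class_ids = [0, 2, 1]
scores = [0.91, 0.45, 0.77]
bboxes = [(10, 20, 100, 200), (300, 40, 80, 60), (150, 90, 120, 70)]

for name, score, box in label_detections(class_ids, scores, bboxes, classes):
    print(name, score, box)
```

Only the person and bicycle detections pass the 0.6 cutoff; the car at 0.45 is dropped.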

        # Annotate the frame
        cv2.putText(frame, class_name, (x, y - 15), cv2.FONT_HERSHEY_PLAIN, 3, (200, 0, 50), 2)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (200, 0, 50), 4)

cv2.putText: Draws the class name just above the bounding box:
- (x, y - 15): Bottom-left corner of the text, 15px above the top of the box.
- cv2.FONT_HERSHEY_PLAIN, 3: Font face and scale factor.
- (200, 0, 50), 2: Color in BGR order (blue 200, green 0, red 50) and line thickness.
cv2.rectangle: Draws the bounding box:
- (x, y), (x + w, y + h): Top-left and bottom-right corners.
- (200, 0, 50), 4: Color (BGR) and line thickness.
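The drawing calls rest on a little arithmetic: converting (x, y, w, h) to opposite corners, keeping the label inside the frame when a box touches the top edge (y - 15 would then be negative and the text clipped), and reading OpenCV's BGR colors. The helpers below are hypothetical and pure Python; min_y=30 is an assumed label height, and cv2.getTextSize can measure the real one.

```python
def box_corners(x, y, w, h):
    """Hypothetical helper: the two corners cv2.rectangle expects."""
    return (x, y), (x + w, y + h)


def label_origin(x, y, offset=15, min_y=30):
    """Hypothetical helper: bottom-left point for the label text.

    min_y is an assumed label height in pixels, used to keep the text
    from being pushed above the top edge of the frame.
    """
    return (x, max(y - offset, min_y))


def bgr_to_rgb(color):
    """OpenCV colors are (blue, green, red); reverse to read them as RGB."""
    return tuple(reversed(color))


print(box_corners(50, 80, 120, 60))  # ((50, 80), (170, 140))
print(label_origin(50, 80))          # (50, 65)
print(label_origin(50, 5))           # (50, 30)
print(bgr_to_rgb((200, 0, 50)))      # (50, 0, 200)
```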

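    cv2.imshow('object detection', frame)
    # Press 'q' to exit the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the webcam and close the display window
cam.release()
cv2.destroyAllWindows()

cv2.imshow: Displays the annotated frame in a window.
cv2.waitKey(1): Waits up to 1 ms for a key press and lets the window process events, which is what refreshes the display. Comparing the masked result with ord('q') lets the user quit by pressing q.
cam.release() and cv2.destroyAllWindows(): Free the webcam and close the display window after the loop ends.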
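To stop the loop from the keyboard, the return value of cv2.waitKey can be compared with a key code. It returns -1 when no key is pressed, and on some platforms or builds it sets bits above the low byte, so the common idiom masks it with 0xFF first. The pure-Python sketch below shows that logic; is_quit_key and the sample key codes are illustrative.

```python
QUIT_KEY = ord("q")


def is_quit_key(key_code, quit_key=QUIT_KEY):
    """Hypothetical helper: True if a cv2.waitKey result means 'quit'."""
    # Keep only the low byte; -1 (no key) becomes 255, which never equals ord('q')
    return (key_code & 0xFF) == quit_key


print(is_quit_key(-1))                   # False: no key pressed
print(is_quit_key(ord("q")))             # True
print(is_quit_key(0x100000 | ord("q")))  # True: extra high bits are ignored
print(is_quit_key(ord("a")))             # False
```

Inside the capture loop this would read `if is_quit_key(cv2.waitKey(1)): break`.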

4. Execution

Run the script.
Allow webcam access.
View real-time object detection in the display window.

5. Improvement Tips

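Run inference on a GPU: with an OpenCV build compiled with CUDA support, call net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA) and net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA) before creating the detection model. The opencv-python packages on PyPI do not include CUDA support.
Tune the confThreshold and nmsThreshold arguments of model.detect: raising confThreshold removes low-confidence false positives but may miss faint objects, and lowering nmsThreshold suppresses more overlapping boxes.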
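The NMS threshold is easier to tune once it is clear what it does. The sketch below is a plain-Python greedy non-maximum suppression over (x, y, w, h) boxes, written for illustration only; OpenCV's cv2.dnn.NMSBoxes applies the same idea and may differ in details such as tie-breaking. The function names iou and nms and the sample boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.5, nms_threshold=0.4):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest from best to worst
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any kept box by more than the threshold
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 100, 100), (15, 12, 100, 100), (200, 200, 50, 50)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))                     # [0, 2]
print(nms(boxes, scores, nms_threshold=0.9))  # [0, 1, 2]
```

The first two boxes overlap with an IoU of about 0.87, so a threshold of 0.4 suppresses the weaker one, while 0.9 keeps both.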

6. Conclusion

With OpenCV's DNN module and a pre-trained YOLO model, a real-time object detector in Python fits in a few dozen lines of code. Expand this project by integrating it into larger systems, such as security or traffic monitoring applications.