How to Build an Object Detection Model with Python

1. Overview

Object detection involves identifying and locating objects in an image or video. This guide demonstrates how to build a basic object detection model using Python, OpenCV, and YOLO (You Only Look Once) for real-time detection.

2. Prerequisites

  • Basic knowledge of Python programming.

  • Familiarity with OpenCV.

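  • Required packages: opencv-python (imported as cv2) and imutils.

Install the dependencies:

pip install opencv-python
pip install imutils

Install only one OpenCV package per environment. The opencv-python-headless variant omits GUI functions such as cv2.imshow, which this guide uses to display the results.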

3. Code Walkthrough and Explanation

Here’s a step-by-step explanation of the code:

import cv2
import imutils
  • cv2: OpenCV library for computer vision tasks.

  • imutils: Simplifies image resizing and transformations.

# Load the pre-trained deep neural network (DNN) model
net = cv2.dnn.readNet("dnn_model/yolov4-tiny.cfg", "dnn_model/yolov4-tiny.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(500, 500), scale=1/255)
  • cv2.dnn.readNet: Loads the YOLO model configuration (.cfg) and pre-trained weights (.weights) for object detection. The paths assume both files are in a dnn_model folder next to the script.

  • cv2.dnn_DetectionModel: Initializes the detection model from the network.

  • setInputParams: Configures input parameters:

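    • size=(500, 500): Resizes input images to 500x500 before they are passed to the network. YOLO configurations are commonly run at sizes that are multiples of 32, such as 416x416, so try one of those if this size causes errors or poor results.

    • scale=1/255: Multiplies each pixel value by 1/255, which normalizes 8-bit values from [0, 255] to the range [0, 1].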
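To make the scale parameter concrete, the sketch below applies the same multiplication to a few 8-bit pixel values in plain Python. The helper name normalize_pixels and the sample values are illustrative assumptions; OpenCV performs this step internally when it prepares the network input.

```python
def normalize_pixels(pixels, scale=1 / 255):
    """Multiply each 8-bit pixel value by scale, as scale=1/255 requests."""
    return [value * scale for value in pixels]


# Made-up pixel intensities covering the 8-bit range.
sample = [0, 51, 255]
normalized = normalize_pixels(sample)
print(normalized)  # values now lie between 0 and 1
```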

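# Load class names for object detection
classes = []
with open("dnn_model/classes.txt", "r") as file_object:
    for class_name in file_object.readlines():
        class_name = class_name.strip()
        classes.append(class_name)
  • classes.txt: Contains names of detectable objects (one per line).

  • strip: Removes leading/trailing whitespace, including the newline at the end of each line.

  • classes.append: Stores each name so that its position in the list matches the class ID returned by the model.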
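The class-loading step can also be wrapped in a small helper, which makes it easy to try without the real file. This is a minimal sketch: the function name load_class_names is hypothetical, and the io.StringIO object with three made-up names stands in for dnn_model/classes.txt.

```python
import io


def load_class_names(file_object):
    """Return one class name per non-empty line, in file order."""
    names = []
    for line in file_object:
        name = line.strip()
        if name:  # ignore blank lines, such as a trailing one at the end of the file
            names.append(name)
    return names


# Stand-in for open("dnn_model/classes.txt"); the names are made up.
fake_file = io.StringIO("person\nbicycle\ncar\n\n")
classes = load_class_names(fake_file)
print(classes)  # ['person', 'bicycle', 'car']
```

Because the list index doubles as the class ID, the order of the lines in the file must match the order the model was trained with.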

# Access the webcam
cam = cv2.VideoCapture(0)
  • cv2.VideoCapture(0): Opens the default webcam (index 0).

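while True:
    _, frame = cam.read()
    frame = imutils.resize(frame, width=900)
  • cam.read(): Captures a single frame from the webcam.

  • imutils.resize: Resizes the frame to a width of 900px and scales the height to preserve the aspect ratio. Passing both width and height does not force exact dimensions; the function scales by width and ignores height.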
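Since imutils.resize preserves the aspect ratio, only one of the two dimensions determines the output size. The sketch below shows that arithmetic in plain Python. The function name scaled_size and the 640x480 frame size are assumptions for illustration, and this is not imutils' own code.

```python
def scaled_size(orig_w, orig_h, width=None, height=None):
    """Return (new_w, new_h) that preserves the original aspect ratio."""
    if width is None and height is None:
        return orig_w, orig_h
    if width is not None:
        # Width takes priority when both are given.
        ratio = width / orig_w
        return width, int(orig_h * ratio)
    ratio = height / orig_h
    return int(orig_w * ratio), height


print(scaled_size(640, 480, width=900))   # (900, 675)
print(scaled_size(640, 480, height=450))  # (600, 450)
```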

    # Perform object detection
    (class_ids, scores, bboxes) = model.detect(frame)
    for class_id, score, bbox in zip(class_ids, scores, bboxes):
        (x, y, w, h) = bbox
        class_name = classes[class_id]
  • model.detect(frame): Detects objects in the frame, returning:

    • class_ids: IDs of detected classes.

    • scores: Confidence scores of detections.

    • bboxes: Bounding boxes (x, y, width, height).

  • zip: Groups the three outputs for iteration.

  • (x, y, w, h): Unpacks each bounding box.

  • class_name = classes[class_id]: Maps class ID to its name.
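The loop above can be tried without a camera by feeding it hand-written values. The following sketch pairs each detection with its class name and drops low-confidence ones. The function name label_detections, the min_score cutoff of 0.5, and all sample values are assumptions; real outputs from model.detect are NumPy arrays rather than lists.

```python
def label_detections(class_ids, scores, bboxes, classes, min_score=0.5):
    """Pair each detection with its class name, dropping low-confidence ones."""
    results = []
    for class_id, score, bbox in zip(class_ids, scores, bboxes):
        if score < min_score:
            continue
        class_id = int(class_id)
        # Fall back to a placeholder if the ID is outside the names list.
        name = classes[class_id] if 0 <= class_id < len(classes) else "unknown"
        results.append((name, float(score), tuple(bbox)))
    return results


# Made-up values shaped like the three outputs of model.detect(frame).
classes = ["person", "bicycle", "car"]
class_ids = [0, 2, 1]
scores = [0.91, 0.40, 0.75]
bboxes = [(10, 20, 100, 200), (300, 50, 80, 60), (150, 120, 90, 140)]

detections = label_detections(class_ids, scores, bboxes, classes)
for name, score, bbox in detections:
    print(f"{name}: {score:.2f} at {bbox}")
```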

        # Annotate the frame
        cv2.putText(frame, class_name, (x, y - 15), cv2.FONT_HERSHEY_PLAIN, 3, (200, 0, 50), 2)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (200, 0, 50), 4)
  • cv2.putText: Adds the class name above the bounding box:

    • (x, y - 15): Position.

    • cv2.FONT_HERSHEY_PLAIN, 3: Font style and size.

    • (200, 0, 50), 2: Color and thickness.

  • cv2.rectangle: Draws the bounding box:

    • (x, y), (x + w, y + h): Start and end coordinates.

    • (200, 0, 50), 4: Color and thickness.
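The coordinate arithmetic behind both drawing calls can be checked on its own. This sketch converts a box to its two corner points and moves the label below the top edge of the box when there is no room above it, which the (x, y - 15) position does not handle. The helper names box_corners and label_origin and the sample boxes are hypothetical.

```python
def box_corners(bbox):
    """Convert (x, y, w, h) to top-left and bottom-right corner points."""
    x, y, w, h = bbox
    return (x, y), (x + w, y + h)


def label_origin(bbox, offset=15):
    """Put the label above the box, or below its top edge when there is no room above."""
    x, y, _, _ = bbox
    label_y = y - offset if y - offset > 0 else y + offset
    return x, label_y


print(box_corners((40, 60, 100, 50)))   # ((40, 60), (140, 110))
print(label_origin((40, 60, 100, 50)))  # (40, 45)
print(label_origin((40, 5, 100, 50)))   # (40, 20)
```

The two points from box_corners correspond to the arguments passed to cv2.rectangle, and the result of label_origin could replace (x, y - 15) in cv2.putText.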

    cv2.imshow('object detection', frame)
    cv2.waitKey(1)
  • cv2.imshow: Displays the annotated frame in a window.

  • cv2.waitKey(1): Waits 1 ms for keyboard input, which also gives OpenCV time to refresh the display window.
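The loop as written never exits on its own. One common pattern is to inspect the value returned by cv2.waitKey and stop on a chosen key. The sketch below isolates that check so it can run without a webcam. The helper name should_quit and the choice of the q key are assumptions, not part of the original script.

```python
def should_quit(key_code, quit_key="q"):
    """Return True when the code returned by cv2.waitKey matches quit_key."""
    # waitKey returns -1 when no key was pressed; masking keeps the low byte,
    # which holds the ASCII code of the key on most platforms.
    return (key_code & 0xFF) == ord(quit_key)


# Intended use inside the capture loop (left as comments because it needs a webcam):
#     if should_quit(cv2.waitKey(1)):
#         break
# After the loop, release the camera and close the window:
#     cam.release()
#     cv2.destroyAllWindows()

print(should_quit(ord("q")))  # True
print(should_quit(-1))        # False
```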

4. Execution

  • Run the script.

  • Allow webcam access.

  • View real-time object detection in the display window. The loop has no exit condition, so stop the script with Ctrl+C in the terminal.

5. Improvement Tips

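  • Run inference on a GPU for faster detection. If your OpenCV build includes CUDA support, calling net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA) and net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA) before creating the detection model moves the network to the GPU.

  • Tune the detection thresholds for better accuracy. model.detect accepts confThreshold (minimum confidence to keep a detection) and nmsThreshold (how much overlap is allowed before a weaker box is suppressed).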
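To see what an overlap threshold controls, the sketch below implements intersection over union (IoU) and a simple greedy non-maximum suppression in plain Python. It is an illustration under stated assumptions, not OpenCV's implementation: the function names, the 0.4 default, and the sample boxes are all made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(bboxes, scores, iou_threshold=0.4):
    """Return indices of boxes to keep, visiting boxes from highest score down."""
    order = sorted(range(len(bboxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a stronger kept box.
        if all(iou(bboxes[i], bboxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Two heavily overlapping boxes and one separate box (made-up values).
bboxes = [(10, 10, 100, 100), (20, 20, 100, 100), (300, 300, 50, 50)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(bboxes, scores))                     # [0, 2]
print(non_max_suppression(bboxes, scores, iou_threshold=0.7))  # [0, 1, 2]
```

A lower threshold removes more overlapping boxes, which reduces duplicates but can also hide objects that genuinely sit close together.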

6. Conclusion

Building an object detection pipeline in Python is straightforward with OpenCV's DNN module. With a pre-trained YOLO model, you can run detection on a live webcam feed in real time. Expand this project by integrating it into larger systems, such as security or traffic monitoring applications.