How to Build an Object Detection Model with Python


1. Overview

Object detection involves identifying and locating objects in an image or video. This guide shows how to build a basic real-time object detector in Python using OpenCV's DNN module and a pre-trained YOLOv4-tiny model (YOLO stands for You Only Look Once).

2. Prerequisites

  • Basic knowledge of Python programming.

  • Familiarity with OpenCV.

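  • Required packages: opencv-python (imported as cv2) and imutils.

  • The YOLOv4-tiny model files: yolov4-tiny.cfg, yolov4-tiny.weights, and classes.txt (one class name per line), placed in a dnn_model folder next to the script.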

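Install OpenCV and imutils:

pip install opencv-python
pip install imutils

Install only opencv-python, not opencv-python-headless: the headless build omits GUI functions such as cv2.imshow, which this script needs, and the two packages conflict when installed side by side.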

3. Code Walkthrough and Explanation

Here’s a step-by-step explanation of the code:


import cv2
import imutils
  • cv2: OpenCV library for computer vision tasks.

  • imutils: Simplifies image resizing and transformations.


# Load the pre-trained deep neural network (DNN) model
net = cv2.dnn.readNet("dnn_model/yolov4-tiny.cfg", "dnn_model/yolov4-tiny.weights")
model = cv2.dnn_DetectionModel(net)
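model.setInputParams(size=(416, 416), scale=1/255, swapRB=True)
  • cv2.dnn.readNet: Loads the YOLO model configuration (.cfg) and pre-trained weights (.weights) for object detection.

  • cv2.dnn_DetectionModel: Initializes the detection model from the network.

  • setInputParams: Configures input parameters:

    • size=(416, 416): Resizes input images to 416x416 for the model. YOLO downsamples its input by a factor of 32, so both dimensions should be multiples of 32 (320 and 608 are other common choices).

    • scale=1/255: Normalizes pixel values to the range [0, 1].

    • swapRB=True: Swaps the blue and red channels, because OpenCV frames are stored in BGR order while YOLO models are trained on RGB images.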
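The preprocessing parameters can be checked in isolation. The pure-Python sketch below (no OpenCV required) computes the detection grids produced by a given input size and scales one pixel the way the per-pixel part of blob construction does, optionally swapping blue and red as OpenCV's swapRB option does. It assumes YOLOv4-tiny's two detection heads work at strides 32 and 16; the helper names detection_grids and normalize_pixel are hypothetical, not OpenCV functions.

```python
# Hypothetical helpers illustrating setInputParams(size=..., scale=..., swapRB=...).
# Assumption: YOLOv4-tiny predicts on two grids, downsampled by 32 and 16.
YOLO_STRIDES = (32, 16)


def detection_grids(size):
    """Return the (columns, rows) of each detection grid for an input size."""
    width, height = size
    if width % 32 or height % 32:
        raise ValueError(f"input size {size} is not a multiple of 32")
    return [(width // s, height // s) for s in YOLO_STRIDES]


def normalize_pixel(bgr, scale=1 / 255, swap_rb=True):
    """Scale one BGR pixel: optionally reorder to RGB, then multiply by scale."""
    b, g, r = bgr
    channels = (r, g, b) if swap_rb else (b, g, r)
    return tuple(c * scale for c in channels)


print(detection_grids((416, 416)))     # [(13, 13), (26, 26)]
print(normalize_pixel((255, 128, 0)))  # roughly (0.0, 0.502, 1.0), now in RGB order

try:
    detection_grids((500, 500))
except ValueError as err:
    print(err)  # 500x500 is rejected: 500 is not divisible by 32
```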


# Load class names for object detection
classes = []
with open("dnn_model/classes.txt", "r") as file_object:
    for class_name in file_object.readlines():
        class_name = class_name.strip()
        classes.append(class_name)
  • classes.txt: Contains the names of detectable objects, one per line, in the same order as the class IDs the model outputs.

  • strip: Removes leading/trailing whitespace.
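An equivalent, slightly more defensive way to read the class list is sketched below. Unlike the loop above, it also skips blank lines, such as an accidental empty line at the end of the file, which would otherwise become an empty class name. The function name load_classes is hypothetical, and the temporary file only stands in for dnn_model/classes.txt.

```python
import os
import tempfile


def load_classes(path):
    """Read one class name per line, stripping whitespace and skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Usage with a throwaway file standing in for dnn_model/classes.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("person\nbicycle\ncar\n\n")
    demo_path = tmp.name

classes = load_classes(demo_path)
os.remove(demo_path)
print(classes)  # ['person', 'bicycle', 'car']
```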


# Access the webcam
cam = cv2.VideoCapture(0)
  • cv2.VideoCapture(0): Opens the default webcam (index 0).

while True:
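    ret, frame = cam.read()
    if not ret:
        break
    frame = imutils.resize(frame, width=900)
  • cam.read(): Captures a single frame from the webcam and returns a success flag (ret) along with it; the loop stops if no frame could be read.

  • imutils.resize: Scales the frame to 900px wide while preserving its aspect ratio. Because it keeps the aspect ratio, imutils.resize uses a single target dimension, and width takes precedence when both width and height are passed.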
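To see which dimension imutils.resize actually honors, the sketch below reproduces its size calculation in pure Python, based on a reading of the imutils source; treat it as an illustration rather than the library itself. The helper name resized_dims and the 640x480 frame size are assumptions for the example.

```python
def resized_dims(w, h, width=None, height=None):
    """Return the (width, height) an aspect-preserving resize would produce.

    Mirrors the rule imutils.resize follows: if width is given it wins,
    otherwise height is used; with neither, the size is unchanged.
    """
    if width is None and height is None:
        return (w, h)
    if width is None:
        ratio = height / float(h)
        return (int(w * ratio), height)
    ratio = width / float(w)
    return (width, int(h * ratio))


# A hypothetical 640x480 webcam frame
print(resized_dims(640, 480, width=900, height=450))  # (900, 675): height is ignored
print(resized_dims(640, 480, height=450))             # (600, 450)
```

If an exact output size is required regardless of aspect ratio, cv2.resize(frame, (width, height)) is the direct alternative.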


    # Perform object detection
    (class_ids, scores, bboxes) = model.detect(frame)
    for class_id, score, bbox in zip(class_ids, scores, bboxes):
        (x, y, w, h) = bbox
        class_name = classes[int(class_id)]
  • model.detect(frame): Detects objects in the frame, returning:

    • class_ids: IDs of detected classes.

    • scores: Confidence scores of detections.

    • bboxes: Bounding boxes (x, y, width, height).

  • zip: Groups the three outputs for iteration.

  • (x, y, w, h): Unpacks each bounding box.

  • class_name = classes[class_id]: Maps class ID to its name.
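The unpacking and ID-to-name lookup can be tried without a camera using hand-made detection results. The lists below are invented sample values shaped like the three outputs described above, and min_score is an illustrative extra filter on top of the confidence threshold model.detect already applies. The function name describe_detections is hypothetical.

```python
def describe_detections(class_ids, scores, bboxes, classes, min_score=0.0):
    """Pair each detection with its class name, keeping those at or above min_score."""
    results = []
    for class_id, score, bbox in zip(class_ids, scores, bboxes):
        (x, y, w, h) = bbox
        if score < min_score:
            continue
        # int() also accepts NumPy integer IDs, which is what model.detect returns
        class_name = classes[int(class_id)]
        results.append((class_name, round(float(score), 2), (x, y, w, h)))
    return results


# Invented sample output: two detections from a COCO-style class list
classes = ["person", "bicycle", "car"]
class_ids = [0, 2]
scores = [0.91, 0.42]
bboxes = [(40, 30, 120, 300), (300, 200, 180, 90)]

print(describe_detections(class_ids, scores, bboxes, classes, min_score=0.5))
# [('person', 0.91, (40, 30, 120, 300))]
```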


        # Annotate the frame
        cv2.putText(frame, class_name, (x, y - 15), cv2.FONT_HERSHEY_PLAIN, 3, (200, 0, 50), 2)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (200, 0, 50), 4)
  • cv2.putText: Adds the class name above the bounding box:

    • (x, y - 15): Position of the text's bottom-left corner, 15px above the box.

    • cv2.FONT_HERSHEY_PLAIN, 3: Font face and scale factor.

    • (200, 0, 50), 2: Color (in BGR order, so mostly blue) and line thickness.

  • cv2.rectangle: Draws the bounding box:

    • (x, y), (x + w, y + h): Top-left and bottom-right corners.

    • (200, 0, 50), 4: Color (BGR) and line thickness.
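The coordinate arithmetic behind the two drawing calls is easy to check on its own. The sketch below computes the rectangle corners and the label position from one (x, y, w, h) box. Clamping the label so it stays inside the frame when a box touches the top edge is an extra assumption, not something the script does; annotation_points and min_label_y are hypothetical names.

```python
def annotation_points(bbox, label_offset=15, min_label_y=30):
    """Return (top_left, bottom_right, label_origin) for one (x, y, w, h) box.

    cv2.rectangle takes two opposite corners, and cv2.putText takes the
    bottom-left corner of the text, so the label sits label_offset pixels
    above the box. min_label_y (an assumed value) keeps it on screen.
    """
    x, y, w, h = bbox
    top_left = (x, y)
    bottom_right = (x + w, y + h)
    label_origin = (x, max(y - label_offset, min_label_y))
    return top_left, bottom_right, label_origin


print(annotation_points((40, 100, 120, 300)))  # ((40, 100), (160, 400), (40, 85))
print(annotation_points((40, 5, 120, 300)))    # ((40, 5), (160, 305), (40, 30))
```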


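    cv2.imshow('object detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cam.release()
cv2.destroyAllWindows()
  • cv2.imshow: Displays the annotated frame in a window.

  • cv2.waitKey(1): Waits up to 1 ms for a key press and lets the window refresh; pressing q exits the loop.

  • cam.release() and cv2.destroyAllWindows(): Free the webcam and close the display window.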
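A common way to let the user stop the loop is to test the value returned by cv2.waitKey, keeping only its low byte with & 0xFF. cv2.waitKey returns -1 when no key is pressed, and on some platforms the returned code may carry bits above the lowest byte (the 0x100071 value below is a made-up example of that). The comparison can be sketched in pure Python; should_quit is a hypothetical helper.

```python
def should_quit(key_code, quit_key="q"):
    """Compare only the low 8 bits of a key code with the quit key."""
    return (key_code & 0xFF) == ord(quit_key)


print(should_quit(-1))        # False: no key pressed (-1 & 0xFF == 255)
print(should_quit(ord("q")))  # True
print(should_quit(0x100071))  # True: high bits are masked off, 0x71 == ord("q")
print(should_quit(ord("a")))  # False
```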


4. Execution

  • Run the script.

  • Allow webcam access.

  • View real-time object detection in the display window.

5. Improvement Tips

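  • Run inference on a GPU. If your OpenCV build includes CUDA support (the standard pip opencv-python wheels do not), call model.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA) and model.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA).

  • Tune the confThreshold and nmsThreshold arguments of model.detect() to balance missed detections against false positives and duplicate boxes.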
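model.detect accepts its thresholds as optional arguments, for example model.detect(frame, confThreshold=0.5, nmsThreshold=0.4). To see what the two numbers control, here is a minimal pure-Python sketch of confidence filtering followed by greedy non-maximum suppression on (x, y, w, h) boxes. It is a textbook version for illustration, not OpenCV's internal implementation; the function names and sample boxes are invented.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def filter_detections(boxes, scores, conf_threshold=0.5, nms_threshold=0.4):
    """Drop low-confidence boxes, then greedily suppress overlapping ones.

    Returns the indices of the kept boxes, highest score first.
    """
    candidates = [i for i, s in enumerate(scores) if s >= conf_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in kept):
            kept.append(i)
    return kept


# Invented example: two overlapping boxes on one object, plus a weak detection
boxes = [(100, 100, 100, 100), (110, 105, 100, 100), (400, 300, 50, 50)]
scores = [0.90, 0.75, 0.30]
print(filter_detections(boxes, scores))                     # [0]
print(filter_detections(boxes, scores, nms_threshold=0.8))  # [0, 1]
```

The two overlapping boxes have an IoU of about 0.75, so a stricter nmsThreshold merges them into one detection, while a looser one keeps both; raising the confidence threshold removes weak detections before suppression runs.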

6. Conclusion

Building a real-time object detector in Python is straightforward with OpenCV's DNN module and a pre-trained YOLO model; a lightweight variant such as YOLOv4-tiny keeps detection fast enough for live webcam video. You can expand this project by integrating it into larger systems, such as security or traffic monitoring applications.