How to Build an Object Detection Model with Python
1. Overview
Object detection involves identifying and locating objects in an image or video. This guide demonstrates how to build a basic object detection model using Python, OpenCV, and YOLO (You Only Look Once) for real-time detection.
2. Prerequisites
Basic knowledge of Python programming.
Familiarity with OpenCV.
Required installations: cv2 (the OpenCV library) and imutils.
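Install the packages:
pip install opencv-python
pip install imutils
Note: install opencv-python rather than opencv-python-headless. The headless build omits the GUI functions that cv2.imshow relies on, and the two packages should not be installed in the same environment.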
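Before moving on, it can help to confirm that the packages are visible to the interpreter you plan to use. The following is a minimal sketch; missing_packages is a hypothetical helper name, not part of OpenCV or imutils.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# cv2 is the import name provided by opencv-python; imutils uses its own name.
missing = missing_packages(["cv2", "imutils"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are available.")
```

The check only looks the modules up and does not import them, so it runs quickly even when OpenCV is installed.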
3. Code Walkthrough and Explanation
Here’s a step-by-step explanation of the code:
import cv2
import imutils
cv2: OpenCV library for computer vision tasks.
imutils: Simplifies image resizing and transformations.
# Load the pre-trained deep neural network (DNN) model
net = cv2.dnn.readNet("dnn_model/yolov4-tiny.cfg", "dnn_model/yolov4-tiny.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(500, 500), scale=1/255)
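cv2.dnn.readNet: Loads the YOLO model configuration (.cfg) and pre-trained weights (.weights) for object detection. Both files must already exist in the dnn_model folder.
cv2.dnn_DetectionModel: Initializes the detection model from the network.
setInputParams: Configures input parameters:
size=(500, 500): Resizes input images to 500x500 for the model. YOLO networks are normally run at sizes that are multiples of 32, such as 416x416, so switch to such a size if 500x500 gives errors or poor results.
scale=1/255: Normalizes pixel values to the range [0, 1].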
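When a model file is missing, the error raised while loading the network can be hard to read. The sketch below checks the paths first and then loads the model with the same calls as the walkthrough. The helper names missing_model_files and load_detection_model are hypothetical, and the default size simply repeats the value used above.

```python
from pathlib import Path


def missing_model_files(cfg_path, weights_path):
    """Return the given paths that do not point to an existing file."""
    return [p for p in (cfg_path, weights_path) if not Path(p).is_file()]


def load_detection_model(cfg_path, weights_path, size=(500, 500)):
    """Load a YOLO detection model after verifying that its files exist."""
    import cv2  # imported here so the file check works without OpenCV

    missing = missing_model_files(cfg_path, weights_path)
    if missing:
        raise FileNotFoundError("Missing model files: " + ", ".join(missing))

    # Same calls as in the walkthrough, wrapped in one function.
    net = cv2.dnn.readNet(cfg_path, weights_path)
    model = cv2.dnn_DetectionModel(net)
    model.setInputParams(size=size, scale=1 / 255)
    return model
```

It could be called as load_detection_model("dnn_model/yolov4-tiny.cfg", "dnn_model/yolov4-tiny.weights").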
# Load class names for object detection
classes = []
with open("dnn_model/classes.txt", "r") as file_object:
    for class_name in file_object.readlines():
        class_name = class_name.strip()
        classes.append(class_name)
classes.txt: Contains names of detectable objects (one per line).
strip: Removes leading/trailing whitespace.
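The same loading step can be wrapped in a small function. This is a sketch under one assumption: that empty lines at the end of the file should be ignored, so a stray trailing newline does not add an empty class name. Blank lines in the middle are kept, because removing them would shift the class IDs. load_class_names is a hypothetical helper name.

```python
def load_class_names(path):
    """Read one class name per line; the line index is the class ID."""
    with open(path, "r", encoding="utf-8") as file_object:
        names = [line.strip() for line in file_object]

    # Drop empty entries at the end only, so earlier indices stay unchanged.
    while names and not names[-1]:
        names.pop()
    return names
```

With this helper, the loading code above reduces to classes = load_class_names("dnn_model/classes.txt").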
# Access the webcam
cam = cv2.VideoCapture(0)
cv2.VideoCapture(0): Opens the default webcam (index 0). If the camera cannot be opened, cam.isOpened() returns False.
while True:
    ok, frame = cam.read()
    if not ok:
        break  # no frame could be read (camera unavailable or disconnected)
    frame = imutils.resize(frame, height=450, width=900)
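cam.read(): Captures a single frame from the webcam and returns a success flag together with the frame.
imutils.resize: Resizes the frame while preserving its aspect ratio. When both width and height are passed, imutils uses the width and ignores the height, so the frame ends up 900px wide with a proportional height (675px for a 640x480 webcam frame) rather than exactly 450px high.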
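To see what size a resized frame will have, the dimension arithmetic can be reproduced without OpenCV. The function below is a sketch that mirrors how imutils.resize chooses its output size as I understand it (aspect ratio preserved, width taking priority when both values are given). aspect_preserving_size is a hypothetical name, not an imutils function.

```python
def aspect_preserving_size(frame_w, frame_h, width=None, height=None):
    """Return the (width, height) an aspect-preserving resize would produce."""
    if width is None and height is None:
        return (frame_w, frame_h)  # nothing requested, size unchanged

    if width is None:
        # Only the height was given: scale the width by the same ratio.
        ratio = height / float(frame_h)
        return (int(frame_w * ratio), height)

    # A width was given: it decides the ratio, and any height is ignored.
    ratio = width / float(frame_w)
    return (width, int(frame_h * ratio))


print(aspect_preserving_size(640, 480, width=900, height=450))  # (900, 675)
print(aspect_preserving_size(640, 480, height=450))             # (600, 450)
```

If an exact 900x450 output is required regardless of aspect ratio, cv2.resize(frame, (900, 450)) is the usual alternative, at the cost of stretching the image.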
    # Perform object detection
    (class_ids, scores, bboxes) = model.detect(frame)
    for class_id, score, bbox in zip(class_ids, scores, bboxes):
        (x, y, w, h) = bbox
        class_name = classes[class_id]
model.detect(frame): Detects objects in the frame, returning:
class_ids: IDs of detected classes.
scores: Confidence scores of detections.
bboxes: Bounding boxes (x, y, width, height).
zip: Groups the three outputs for iteration.
(x, y, w, h): Unpacks each bounding box.
class_name = classes[class_id]: Maps class ID to its name.
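The three parallel outputs can also be turned into plain Python records, which makes it easier to filter or log them. This is a sketch only; describe_detections and the min_score cut-off are hypothetical and not part of OpenCV.

```python
def describe_detections(class_ids, scores, bboxes, classes, min_score=0.5):
    """Pair each detection with its class name, dropping low-confidence ones."""
    results = []
    for class_id, score, bbox in zip(class_ids, scores, bboxes):
        if float(score) < min_score:
            continue  # skip detections below the chosen confidence
        # int() and float() also accept NumPy scalars returned by OpenCV.
        x, y, w, h = (int(v) for v in bbox)
        results.append({
            "label": classes[int(class_id)],
            "score": float(score),
            "box": (x, y, w, h),
        })
    return results


# Example with hand-written values in place of real model output.
example = describe_detections(
    class_ids=[2, 0],
    scores=[0.9, 0.3],
    bboxes=[(10, 20, 30, 40), (1, 2, 3, 4)],
    classes=["person", "bicycle", "car"],
)
print(example)
```

In the loop above it would be called as describe_detections(class_ids, scores, bboxes, classes).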
        # Annotate the frame
        cv2.putText(frame, class_name, (x, y - 15), cv2.FONT_HERSHEY_PLAIN, 3, (200, 0, 50), 2)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (200, 0, 50), 4)
cv2.putText: Adds the class name above the bounding box:
(x, y - 15): Position of the bottom-left corner of the text.
cv2.FONT_HERSHEY_PLAIN, 3: Font style and scale factor.
(200, 0, 50), 2: Color (in BGR order) and thickness.
cv2.rectangle: Draws the bounding box:
(x, y), (x + w, y + h): Start and end coordinates.
(200, 0, 50), 4: Color (in BGR order) and thickness.
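When a box touches the top of the frame, a label drawn at (x, y - 15) is cut off. One possible workaround, sketched below, is to place the label inside the box in that case. label_origin and its gap parameter are hypothetical. The text height would normally come from cv2.getTextSize and is passed in as a plain number here.

```python
def label_origin(x, y, text_height, gap=15):
    """Return the bottom-left text position for a box whose top-left is (x, y)."""
    above = y - gap
    if above - text_height >= 0:
        return (x, above)  # enough room above the box
    # Not enough room: draw the label just inside the top edge of the box.
    return (x, y + gap + text_height)


print(label_origin(100, 200, text_height=30))  # (100, 185)
print(label_origin(100, 10, text_height=30))   # (100, 55)
```

The returned tuple can be passed to cv2.putText in place of (x, y - 15).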
    cv2.imshow('object detection', frame)
    cv2.waitKey(1)
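cv2.imshow: Displays the annotated frame in a window.
cv2.waitKey(1): Waits up to 1 ms for a key press, which also gives the window time to refresh. As written, the loop has no exit condition, so stop the script with Ctrl+C in the terminal.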
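One way to let the user close the window cleanly is to check the value returned by cv2.waitKey and release the camera afterwards. The sketch below assumes Esc and "q" as quit keys, which is an arbitrary choice. should_quit and run_display_loop are hypothetical names, and process_frame stands for the detection and annotation steps shown above.

```python
QUIT_KEYS = {27, ord("q")}  # Esc and "q"; an arbitrary choice for this sketch


def should_quit(key_code):
    """Return True when the code from cv2.waitKey matches a quit key."""
    # waitKey returns -1 when no key was pressed. Masking with 0xFF keeps
    # only the low byte, which holds the character code of the key.
    return key_code != -1 and (key_code & 0xFF) in QUIT_KEYS


def run_display_loop(process_frame, camera_index=0):
    """Show processed webcam frames until a quit key is pressed."""
    import cv2  # imported here so should_quit can be used without OpenCV

    cam = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cam.read()
            if not ok:
                break  # no frame could be read
            cv2.imshow("object detection", process_frame(frame))
            if should_quit(cv2.waitKey(1)):
                break
    finally:
        # Free the camera and close the window even if an error occurs.
        cam.release()
        cv2.destroyAllWindows()
```

Here process_frame is any function that takes a frame and returns the annotated frame.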
4. Execution
Run the script.
Allow webcam access.
View real-time object detection in the display window.
5. Improvement Tips
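Run inference on a GPU for faster detection. With an OpenCV build that includes CUDA support, this means calling net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA) and net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA) after loading the network.
Tune the confidence and non-maximum suppression thresholds passed to model.detect (confThreshold and nmsThreshold) to balance missed detections against false positives and duplicate boxes.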
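Tuning the non-maximum suppression threshold is easier with a feel for the overlap measure behind it, intersection over union (IoU). The sketch below computes IoU for two boxes in the (x, y, width, height) format used in this guide. It is an illustration of the measure, not OpenCV's internal code, and iou is a hypothetical helper name.

```python
def iou(box_a, box_b):
    """Return intersection over union for two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the overlapping region (zero if the boxes are apart).
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h

    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (0, 0, 10, 10)))   # 1.0, identical boxes
print(iou((0, 0, 10, 10), (20, 20, 5, 5)))   # 0.0, no overlap
```

Two boxes whose IoU exceeds the NMS threshold are treated as duplicates and only the higher-scoring one is kept, so a lower threshold removes more overlapping boxes. Thresholds can be passed directly, for example model.detect(frame, confThreshold=0.5, nmsThreshold=0.4), where the two values are starting points to experiment with rather than recommendations.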
6. Conclusion
Building an object detection model in Python is straightforward with libraries like OpenCV. With YOLO, you can achieve real-time detection efficiently. Expand this project by integrating it into larger systems, such as security or traffic monitoring applications.