In this article, you’ll discover what YOLO is and how to use it to detect and count the people that appear in a video.
We’ll walk through a simple Python script that lets you configure basic parameters and choose the most appropriate video source for your needs.
What is YOLO?
YOLO (You Only Look Once) is a real-time object detection algorithm based on neural networks.
Put simply, YOLO analyzes each video frame (whether from a file or a webcam/IP stream) to identify objects such as people, vehicles, animals, and more.
🔗 For an in-depth look at YOLO, check out this link.
What is OpenCV?
OpenCV (Open Source Computer Vision Library) is a powerful open-source library originally developed by Intel. Its main goal is to facilitate image processing and computer vision.
With OpenCV, we can process video frames and visually display recognized objects by drawing bounding boxes around them.
🔗 Want to know more about OpenCV? Visit this link.
What is Python?
Python is one of the most popular programming languages in the world. It’s known for its simplicity and power, and is widely used in data science, AI, and automation.
🔗 We’ve already covered Python in this article.
The Idea
This mini-project is meant to be a gentle introduction to artificial intelligence, allowing us to get started without too much complexity or overwhelm.
What You Need
Here’s what I used for this project:
- Python 3.12.6
- Ultralytics YOLO 8.3.169
- OpenCV 4.12.0.88
- NVIDIA GeForce RTX 3060 Ti GPU
The Code
The script was designed to let users configure the following:
- Select the YOLO model to use
- Choose the video source (file, webcam, IP stream)
- Enable or disable GPU usage
Performance and playback speed will vary depending on the settings you choose.
Once the settings are selected, the appropriate YOLO model is downloaded automatically.
The core functionality is implemented in the cv_player.py file.
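To make the setup concrete, here’s a minimal sketch of how the model, video source, and device might be wired together with the ultralytics package, which downloads the requested weights automatically on first use. The variable names are illustrative, not taken verbatim from cv_player.py:

```python
import cv2
import torch
from ultralytics import YOLO

# Illustrative settings; the actual script gathers these from the user
model_name = "yolov8n.pt"    # ultralytics downloads the weights if they are missing
video_source = "people.mp4"  # a file path, 0 for the default webcam, or an IP-stream URL
use_gpu = True

# Fall back to the CPU when no CUDA-capable GPU is available
selected_device = "cuda" if use_gpu and torch.cuda.is_available() else "cpu"

model = YOLO(model_name)
cap = cv2.VideoCapture(video_source)
```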
Initializing the Output Window
The video frame dimensions are read:
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
The frames per second (fps) value is also retrieved:
frames_per_second = int(cap.get(cv2.CAP_PROP_FPS))
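These values come in handy for sizing the output window and pacing playback. As a sketch (not necessarily how cv_player.py does it):

```python
# Match the display window to the source resolution
cv2.namedWindow("main", cv2.WINDOW_NORMAL)
cv2.resizeWindow("main", frame_width, frame_height)

# Per-frame delay in milliseconds; webcams can report 0 fps, so guard against that
delay_ms = int(1000 / frames_per_second) if frames_per_second > 0 else 1
```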
Frame Analysis and Tracking
Each frame is analyzed using the selected YOLO model:
results = model(img, stream=True, device=selected_device)
From the detection results, we extract the coordinates to draw bounding boxes.
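With stream=True, the call returns a lazy generator of per-frame results, which keeps memory low on long videos. Here’s a condensed example of pulling out the coordinates and keeping only confident person detections; the 0.5 threshold is an arbitrary choice for illustration:

```python
for r in results:
    for box in r.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])  # top-left and bottom-right corners
        confidence = float(box.conf[0])
        class_id = int(box.cls[0])
        if class_id == 0 and confidence > 0.5:   # class 0 is "person" in COCO
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
```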
The script also uses the deep_sort_realtime library to assign a unique ID to each detected person, making it possible to track them across frames:
tracker = DeepSort(max_age=frames_per_second)
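Setting max_age to the source’s fps means a person is forgotten after roughly one second without a matching detection. The library expects detections as ([left, top, width, height], confidence, class) tuples; a minimal sketch of one tracking step, reusing the box variables from above:

```python
from deep_sort_realtime.deepsort_tracker import DeepSort

# Detections built from the YOLO boxes above, as ([left, top, width, height], conf, class)
detections = [([x1, y1, x2 - x1, y2 - y1], confidence, "person")]

tracks = tracker.update_tracks(detections, frame=img)
for track in tracks:
    if not track.is_confirmed():
        continue
    track_id = track.track_id                   # stable ID reused across frames
    x1, y1, x2, y2 = map(int, track.to_ltrb())  # tracked bounding box
```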
Visual Output
We update the frame with tracking info:
cvzone.putTextRect(img, f'ID: {track_id}', (x1, max(30, y1 - 10)), scale=1, thickness=1)
cvzone.putTextRect(img, f"Detected {len(unique_ids)} unique people", (15, 25), scale=0.75, thickness=1)
cv2.imshow("main", img)
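For the count itself, unique_ids can simply be a Python set: adding an ID that’s already present is a no-op, so its length always equals the number of distinct people seen so far. A sketch of that bookkeeping and the usual exit handling, assuming the delay_ms value derived earlier:

```python
unique_ids = set()  # created once, before the frame loop

# Inside the frame loop, after updating the tracker
for track in tracks:
    if track.is_confirmed():
        unique_ids.add(track.track_id)

# Pace playback and let the user quit with 'q'
if cv2.waitKey(delay_ms) & 0xFF == ord('q'):
    break
```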
Conclusion
This was my first hands-on experience with YOLO, OpenCV, and object detection through AI.
I hope to improve this script or build new ones in the future to dive deeper into the world of computer vision and artificial intelligence.
🔗 For the full code, click this link.
💬 Got suggestions or feedback? Feel free to leave a comment below!