Application of an Improved YOLOv8 Algorithm in a Practical Teaching Video Fusion System

Zhu Lie

doi:10.3969/j.issn.1000-386x.2025.05.013

Abstract

To support intelligent management of practical teaching sites for new energy vehicles and strengthen their safety management, an improved YOLOv8 object recognition algorithm is proposed to address the occlusion of trainees at these sites. A deep over-parameterized efficient aggregation network is constructed in which different convolution kernels operate on each channel, strengthening feature extraction for the unoccluded parts of a target. To improve recognition of occluded targets, SwiftFormerEncoder, a multi-layer self-attention mechanism, replaces the Bottleneck in the C2f module of YOLOv8, strengthening feature extraction for the occluded parts. To speed up recognition of feature information in video, BN-layer-based channel pruning is applied to the model's convolutions and the loss function is improved with UIoU, which raises the recognition rate in dense scenes while reducing the model's memory footprint. The improved algorithm was trained and validated with UnrealSynth (a synthetic data generator for Unreal Engine) and tested separately on the practical teaching base's own dataset, where the improved model reached an mAP of 0.955. Results from deployment in the video fusion recognition system show that the model improves recognition accuracy while also increasing recognition speed, giving better overall performance.
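The abstract does not say how channels are selected for BN-layer-based pruning. One common criterion, assumed here purely for illustration, ranks every channel by the magnitude of its BN scale factor (gamma) and drops the lowest-ranked fraction across the whole network. The sketch below is dependency-free; the function name `bn_channel_masks`, the layer names, and the gamma values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def bn_channel_masks(
    gammas: Dict[str, List[float]], prune_ratio: float
) -> Dict[str, List[bool]]:
    """Return per-layer keep masks computed from BN scale factors (gamma).

    Assumed criterion: channels are ranked globally by |gamma| and the
    lowest `prune_ratio` fraction is marked for removal.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")

    # Collect |gamma| for every channel in every layer and sort ascending.
    magnitudes = sorted(abs(g) for layer in gammas.values() for g in layer)
    n_prune = int(len(magnitudes) * prune_ratio)

    # Channels with |gamma| below this global threshold are pruned.
    threshold = magnitudes[n_prune] if n_prune > 0 else float("-inf")

    masks: Dict[str, List[bool]] = {}
    for name, layer in gammas.items():
        keep = [abs(g) >= threshold for g in layer]
        if layer and not any(keep):
            # Never empty a layer: retain its strongest channel.
            best = max(range(len(layer)), key=lambda i: abs(layer[i]))
            keep[best] = True
        masks[name] = keep
    return masks


# Hypothetical gamma values for two convolution + BN layers.
example = {
    "conv1": [0.90, 0.02, 0.45, 0.01],
    "conv2": [0.03, 0.70, 0.60, 0.05],
}
masks = bn_channel_masks(example, prune_ratio=0.5)
for name, keep in masks.items():
    print(name, keep)
```

In a real model the masks would then be used to slice the convolution weights and BN parameters of each layer and the input channels of the layer that follows, usually with a fine-tuning pass afterwards to recover accuracy.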
