1. INTRODUCTION

Agriculture is one of the largest and most important industries in Cyprus and worldwide [1]. Crop yield estimation underpins planning strategies to fulfil the projected demands of the human population under the constraints of food security. Precision agriculture (PA) is a farming approach that uses data and modern technologies to maximize crop yield through decision-making applications. A key goal of PA is to improve yield estimation, which has traditionally relied on manual sampling or other indirect methods. Remote Sensing (RS) has become important in PA and yield estimation tasks over the last decades, owing to the daily volume of data it generates [4]. However, validating RS-based yield estimation models requires additional in-situ work, e.g., fruit counting, and such field exercises demand many man-hours. More accurate yield estimates help farmers, stakeholders, and relevant governmental bodies plan better to satisfy the needs of consumers. Moreover, yield assessments can improve market projections and enable more precise financial management of the agricultural market [9].

Data-driven applications in farming based on Artificial Intelligence (AI) have shown promising performance for efficiently validating RS models [3]. For example, data derived from unmanned aerial vehicles (UAVs), robots, and cameras installed in fields are combined with Deep Learning (DL) models that perform object detection and, consequently, fruit counting [5], [6], [7]. Object detection models fall into two primary categories, i.e., one-stage and two-stage models. Two-stage models are typically based on the region-based Convolutional Neural Network (R-CNN) architecture and use a region proposal network (RPN), as in Faster R-CNN [10], to extract candidate object regions. The extracted regions are then used to classify the objects in an image and to perform bounding box regression that localizes each object. These algorithms achieve higher accuracy because detection in two stages improves object localization, but they are slow and usually not suitable for real-time applications. Further variations of these methodologies have been developed over the years, such as Fast R-CNN and Mask R-CNN [11], [12]. On the other hand, one-stage models are more efficient for real-time processing, as they perform object classification and bounding box regression within the same neural network. Popular examples are the different versions and variations of the You Only Look Once (YOLO) algorithm [13], [14].

According to one study, YOLOv5s achieved 1.84% higher precision than other YOLO variations in the real-time detection of different grape varieties in complex agricultural environments [15]. Channel pruning has been applied to YOLOv5 to obtain a lighter architecture with fewer floating-point operations; the pruned model, used for real-time tracking of grapes in field images, reached a mean Average Precision (mAP) of 82.3% while also improving the detection of overlapping grape clusters [16]. An improved YOLOv5s using Soft Non-Maximum Suppression (Soft-NMS) has been used for tomato detection as part of picking robots operating in a continuous working environment [17]; the model achieved 92% precision and 82% recall. Another study demonstrated YOLOv5 installed on picking robots for real-time apple detection [18].
A YOLOv5s enhanced with a spatial attention module and an adaptive context information fusion module was developed for the detection of pineapple buds in UAV-captured in-situ images; the proposed methodology increased mAP@0.5 (mean Average Precision at an Intersection over Union threshold of 0.5) by 7.4% and mAP@0.5:0.95 (mean Average Precision averaged over Intersection over Union thresholds from 0.5 to 0.95) by 31% [19]. An image processing methodology based on more traditional techniques was proposed for the detection of cotton bolls in images captured by a remote-controlled robot [20]. Later, the same authors improved this work using a variation of Faster R-CNN, known as FrRCNN5-cls, which achieved a coefficient of determination (R2) of 0.88 and a Root Mean Square Error (RMSE) of 0.79 [21]. Another model, which uses YOLOv5s as the base model with ShuffleNet-v2 as the backbone, achieved a 3.5% improvement in mAP on litchi fruit detection after fine-tuning [22]. YOLOv4 has been evaluated on RGB images of chestnuts captured by a UAV, achieving an R2 of 0.98 and an RMSE of 6.3 [24]. Moreover, improved variations of YOLOv5s that adopt attention mechanisms and robustness to fog have been considered for real-time object detection in uncertain agricultural landscapes [25], [26].

In this study, one of the light variations of YOLOv5, namely YOLOv5s, is adopted on two different benchmark datasets for counting mangoes and wheat heads. Mango is one of the most demanded fruits across different geographical areas of the world, such as Asia, Africa, and Latin America. Wheat, on the other hand, is of high importance for Cyprus, with an annual value of ca. 16.4 million euros, a cultivated area of ca. 12,000 ha, and a production of ca. 25,000 metric tons (t) per year [8]. Furthermore, this study aims to evaluate YOLOv5s on crop yield estimation tasks using in-situ captured images as preliminary work before comparing it with other lightweight variations and examining different on-board processing aspects.

The rest of the paper is structured as follows. Section 2 describes the methodology followed for the fine-tuning of YOLOv5s, the experimental setup, and the evaluation strategy used; it also describes the two benchmark datasets used in the experimental study. Section 3 gives an overview of the experimental results concerning the performance and time consumption of the investigated model. Finally, Section 4 concludes this work.

2. MATERIALS AND METHODS

2.1 Benchmark datasets

In this study, two benchmark datasets are utilized for the preliminary evaluation of YOLOv5s on yield estimation from field images. Both datasets are acquired using the AgML Python library [27], an open-access package offering access to fundamental object detection models and benchmark datasets for agricultural tasks. The first dataset consists of 1730 night-time images of mango trees, captured with a ground-based RGB sensor in mango orchards in Australia [28]. The wheat head dataset is likewise built from images acquired with ground-based RGB sensors; it contains 6512 images in total, captured from wheat plots around the world [29]. Both datasets contain a single class and are suitable for training fundamental models performing object detection tasks.
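As an illustration of the data acquisition step, the following is a minimal sketch of loading one of the benchmark datasets through the AgML library [27]. The dataset identifier used here is a placeholder, and the loader and split calls follow the AgML documentation as we understand it; both should be verified against the installed release (available dataset names can be listed with agml.data.public_data_sources()).

```python
# Sketch of acquiring a benchmark dataset through the AgML library [27].
# The dataset identifier below is a placeholder, not a confirmed AgML name.
import agml

# List the publicly available agricultural datasets bundled with AgML.
for source in agml.data.public_data_sources():
    print(source.name)

# Load a single-class detection dataset and apply the 70/20/10 split
# used in the experimental setup (Section 2.4).
loader = agml.data.AgMLDataLoader('mango_detection_australia')  # placeholder name
loader.split(train=0.7, val=0.2, test=0.1)
print(f'{len(loader)} annotated images loaded')
```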
2.2 You Only Look Once version 5

YOLOv5, or You Only Look Once version 5, is a fundamental object detection model with real-time capabilities. The architecture of YOLOv5 (Figure 2) consists of a backbone network, a neck, and a head (output). The primary function of the backbone, which is usually a modified CSPDarknet53, is to extract detailed, hierarchical features from the input images. The neck, which consists of a PANet (Path Aggregation Network), connects the various backbone stages to improve feature fusion and the model's capacity to detect objects at different scales. Finally, the head is made up of multiple convolutional layers that predict confidence scores, object classes, and bounding boxes.

YOLOv5 is enhanced in four different aspects. The first advancement is the mosaic data augmentation operation, which improves training speed and network accuracy. To enhance its ability to detect targets of various sizes, YOLOv5 uses adaptive anchor computation and adaptive picture scaling. Furthermore, the model adopts new concepts in the backbone network and, to reduce computation and improve feature representation, adds the focus structure and the cross-stage partial (CSP) structure. Moreover, the neck network uses a Feature Pyramid Network and a Path Aggregation Network and adds the CSP2 structure [30], [31]. These strategies enhance the overall performance of the model; still, YOLOv5 faces limitations in detecting tiny targets. YOLOv5s is the second smallest and fastest architecture in the family of YOLOv5 algorithms [32].

2.3 Evaluation metrics

The performance of YOLOv5s was evaluated using three well-known evaluation metrics for object detection tasks. The first metric, given in Eq. (1), is the Precision, which measures the ability of a classification algorithm to identify only the relevant data points. Recall, given in Eq. (2), is the ability of a model to identify all the relevant cases within a dataset.

Precision = TP / (TP + FP)    (1)

Recall = TP / (TP + FN)    (2)

where true positives (TP) is the number of the model's correct predictions of the positive class, false positives (FP) is the number of the model's incorrect predictions of the positive class, and false negatives (FN) is the number of the model's incorrect predictions of the negative class. The mean Average Precision (mAP), given in Eq. (3), is widely used for object detection models to evaluate their performance over a range of Intersection over Union (IoU) thresholds,

mAP = (1/n) ∑_{k=1}^{n} AP_k    (3)

where n is the number of classes and AP_k is the average precision of class k. In cases where mAP is evaluated over a certain range of IoU thresholds, AP_k is the average precision of class k over that specific IoU range.

2.4 Experimental Setup

To evaluate the performance of the YOLOv5s object detection model, the two datasets were randomly split into training (70%), validation (20%), and testing (10%) subsets. YOLOv5s is a fundamental pre-trained model and thus hyperparameter tuning is not necessary in this case. Moreover, two independent instances of the model, one for each dataset, are fine-tuned with further training for 50 epochs. YOLOv5 applies built-in augmentations (e.g., blur, median blur, grayscale, etc.) to the datasets.
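For reproducibility, the following is a minimal sketch of how a pre-trained YOLOv5s checkpoint can be fine-tuned for 50 epochs with the Ultralytics YOLOv5 repository [31]. The dataset configuration files (mango.yaml, wheat.yaml) and the batch size are illustrative assumptions, and the train.run keyword interface is assumed to behave as in recent releases of the repository.

```python
# Fine-tuning sketch using train.py from the cloned Ultralytics YOLOv5 repository [31].
# Equivalent CLI call:
#   python train.py --img 640 --batch 16 --epochs 50 --data mango.yaml --weights yolov5s.pt
import train  # train.py from the yolov5 repository

for data_cfg in ('mango.yaml', 'wheat.yaml'):  # one independent instance per dataset (illustrative file names)
    train.run(
        data=data_cfg,         # dataset configuration with image paths and the single class name
        weights='yolov5s.pt',  # start from the pre-trained YOLOv5s checkpoint
        epochs=50,             # further training for 50 epochs, as in Section 2.4
        imgsz=640,             # input image size (assumption; not stated in the paper)
        batch_size=16,         # batch size (assumption; not stated in the paper)
    )
```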
3. EXPERIMENTAL RESULTS

3.1 Model's performance

The accuracy metrics defined in Equations (1) to (3) are used to evaluate YOLOv5s. Table 1 shows the precision, recall, mAP@50, and mAP@50:95 achieved on both datasets. The investigated object detection model performed better on the mango dataset with respect to all the evaluation metrics, i.e., Precision = 0.960, Recall = 0.954, mAP@50 = 0.981, and mAP@50:95 = 0.734. On the wheat head dataset the model achieved relatively high accuracy with respect to three of the evaluation metrics, i.e., Precision = 0.910, Recall = 0.824, and mAP@50 = 0.896, but low accuracy in terms of mAP@50:95. This is because the colour, shape, and size of mangoes (especially when ripening) create a contrast with the image background and, thus, it is easier for YOLOv5s to detect the fruits and generate bounding boxes around them. In contrast, the colour of wheat together with the density of the stems (in both of their states, green and dry) makes it challenging for YOLOv5s to detect the wheat heads. Furthermore, the model's limitations in detecting tiny objects probably influence its performance, despite the large dataset. The model counted a total of 1151 mango instances in the 127 images of the testing set, and 4399 wheat head instances in the 100 images of the testing set. Figures 3 and 4 present examples of the detected mangoes and wheat heads, respectively.

Table 1. Experimental results of object detection tasks using YOLOv5s on the mango and wheat head benchmark datasets.
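The fruit and wheat head counts reported above are simply the number of bounding boxes returned by the fine-tuned model on each test image. A minimal counting sketch using the torch.hub interface of the YOLOv5 repository [31] is given below; the checkpoint path and image folder are placeholders rather than the actual paths used in this study.

```python
# Count detected instances on a test set with a fine-tuned YOLOv5s checkpoint.
# Paths are placeholders for the exported test images and trained weights.
from pathlib import Path
import torch

# Load the fine-tuned model through the Ultralytics YOLOv5 hub interface [31].
model = torch.hub.load('ultralytics/yolov5', 'custom', path='runs/train/exp/weights/best.pt')

total = 0
for img_path in Path('datasets/mango/test/images').glob('*.jpg'):
    results = model(str(img_path))   # pre-processing, inference and NMS
    detections = results.xyxy[0]     # (N, 6) tensor: x1, y1, x2, y2, confidence, class
    total += len(detections)         # every box is one counted fruit / wheat head

print(f'Total detected instances on the test set: {total}')
```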
3.2 Time consumption of YOLOv5s

YOLOv5s required almost the same time for the 50 epochs of training on the mango and wheat head datasets, namely 1753.2 and 1828.8 seconds, respectively. YOLOv5s is faster on the wheat head dataset regarding pre-processing and inference, with 0.0004 and 0.0116 seconds, respectively. The model performed non-maximum suppression in 0.0141 seconds for the mango dataset, in contrast to the 0.0190 seconds needed for the wheat head dataset. All the time measurements are shown in Table 2. Beyond training, the model performs three further steps: pre-processing, where YOLOv5 prepares the image by resizing it and normalizing pixel values; inference, where the model makes its predictions; and non-maximum suppression (NMS), where the model keeps the best bounding box for each object by comparing it to the others and suppressing low-confidence and overlapping boxes. The experiments were conducted using a Virtual Machine from Google Colab equipped with 52 GB of RAM and an A100 GPU. Future experiments will include the use of specific on-board hardware.

Table 2. Time consumption metrics.
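Comparable per-stage timings can be obtained directly from the speed profile that the YOLOv5 hub interface attaches to its results. The sketch below assumes that the results.t attribute holds the (pre-processing, inference, NMS) times in milliseconds per image, as in recent YOLOv5 releases; the checkpoint path and image list are placeholders.

```python
# Sketch of measuring pre-processing, inference and NMS time per image with a
# fine-tuned YOLOv5s model loaded through torch.hub [31].
import torch

model = torch.hub.load('ultralytics/yolov5', 'custom', path='runs/train/exp/weights/best.pt')

images = ['test/images/sample_0.jpg', 'test/images/sample_1.jpg']  # placeholder paths
results = model(images)

# results.t is assumed to be the per-image speed profile in milliseconds.
pre_ms, infer_ms, nms_ms = results.t
print(f'pre-process: {pre_ms / 1000:.4f} s, '
      f'inference: {infer_ms / 1000:.4f} s, '
      f'NMS: {nms_ms / 1000:.4f} s per image')
```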
4. CONCLUSION

In this work, YOLOv5s, a light variation of YOLOv5, is investigated for its performance on two benchmark datasets for yield estimation using object detection. Based on the model's performance in the experiments, the following concluding remarks can be drawn. YOLOv5s performed better on the mango dataset, mainly due to the colour, shape, and size of the fruits. In contrast, the shape, colour, and density of wheat heads generate multiple object overlaps in the images, resulting in lower accuracy for YOLOv5s. The fine-tuning of the model requires approximately 1800 seconds per dataset. The inference times of 0.0142 and 0.0116 seconds enable the further examination of on-board yield estimation using ground-based cameras or UAVs. Indeed, the use of AI on board requires evaluating specific aspects, such as the size of the model and the use of computational resources, which are limited compared to a ground-based scenario. Therefore, the use of specific on-board hardware will be considered in future developments.

ACKNOWLEDGEMENTS

This work was partially supported by the European Union's HORIZON Research and Innovation Programme under grant agreement No 101120657, project ENFIELD (European Lighthouse to Manifest Trustworthy and Green AI), the AI-OBSERVER project funded by the European Union's Horizon Europe Framework Programme HORIZON-WIDERA-2021-ACCESS-03 (Twinning) under Grant Agreement No 101079468, and the 'EXCELSIOR': ERATOSTHENES: Excellence Research Centre for Earth Surveillance and Space-Based Monitoring of the Environment H2020 Widespread Teaming project (www.excelsior2020.eu). The 'EXCELSIOR' project has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No 857510, from the Government of the Republic of Cyprus through the Directorate General for the European Programmes, Coordination and Development, and from the Cyprus University of Technology.

REFERENCES
[1] M. Eliades et al., "Earth Observation in the EMMENA Region: Scoping Review of Current Applications and Knowledge Gaps," Remote Sensing 15(17), 4202 (2023). https://doi.org/10.3390/rs15174202
[2] A. Nyéki and M. Neményi, "Crop Yield Prediction in Precision Agriculture," Agronomy 12(10), 2460 (2022). https://doi.org/10.3390/agronomy12102460
[3] M. T. Linaza et al., "Data-Driven Artificial Intelligence Applications for Sustainable Precision Agriculture," Agronomy 11(6), 1227 (2021). https://doi.org/10.3390/agronomy11061227
[4] M. Weiss, F. Jacob, and G. Duveiller, "Remote sensing for agricultural applications: A meta-review," Remote Sensing of Environment 236, 111402 (2020). https://doi.org/10.1016/j.rse.2019.111402
[5] L. Deng et al., "Lightweight aerial image object detection algorithm based on improved YOLOv5s," Scientific Reports 13(1), 7817 (2023). https://doi.org/10.1038/s41598-023-34892-4
[6] A. Pretto et al., "Building an Aerial–Ground Robotics System for Precision Farming: An Adaptable Solution," IEEE Robotics & Automation Magazine 28(3), 29–49 (2021). https://doi.org/10.1109/MRA.2020.3012492
[7] Springer Handbook of Robotics, Springer Handbooks, Springer International Publishing, Cham (2016). https://doi.org/10.1007/978-3-319-32552-1
[8] "Agriculture, Livestock, Fishing – Predefined Tables," (2024). https://www.cystat.gov.cy/en/KeyFiguresList?s=28 (accessed May 2024).
[9] E. Benami et al., "Uniting remote sensing, crop modelling and economics for agricultural risk management," Nature Reviews Earth & Environment 2(2), 140–159 (2021). https://doi.org/10.1038/s43017-020-00122-y
[10] B. Cheng, Y. Wei, H. Shi, R. Feris, J. Xiong, and T. Huang, "Revisiting RCNN: On Awakening the Classification Power of Faster RCNN," in Computer Vision – ECCV 2018, Lecture Notes in Computer Science 11219, 473–490, Springer International Publishing, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_28
[11] "Computational Intelligence in Pattern Recognition: Proceedings of CIPR 2019," Advances in Intelligent Systems and Computing 999, Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-9042-5
[12] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision (2015). https://doi.org/10.1109/ICCV.2015.169
[13] U. Sirisha, S. P. Praveen, P. N. Srinivasu, P. Barsocchi, and A. K. Bhoi, "Statistical Analysis of Design Aspects of Various YOLO-Based Deep Learning Models for Object Detection," International Journal of Computational Intelligence Systems 16(1), 126 (2023). https://doi.org/10.1007/s44196-023-00302-w
[14] T. Diwan, G. Anirudh, and J. V. Tembhurne, "Object detection using YOLO: challenges, architectural successors, datasets and applications," Multimedia Tools and Applications 82(6), 9243–9275 (2023). https://doi.org/10.1007/s11042-022-13644-y
[15] C. Zhang, H. Ding, Q. Shi, and Y. Wang, "Grape Cluster Real-Time Detection in Complex Natural Scenes Based on YOLOv5s Deep Learning Network," Agriculture 12(8), 1242 (2022). https://doi.org/10.3390/agriculture12081242
[16] L. Shen et al., "Real-time tracking and counting of grape clusters in the field based on channel pruning with YOLOv5s," Computers and Electronics in Agriculture 206, 107662 (2023). https://doi.org/10.1016/j.compag.2023.107662
[17] G. Gao, C. Shuai, S. Wang, and T. Ding, "Using improved YOLO V5s to recognize tomatoes in a continuous working environment," Signal, Image and Video Processing 18(5), 4019–4028 (2024). https://doi.org/10.1007/s11760-024-03010-w
[18] B. Yan, P. Fan, X. Lei, Z. Liu, and F. Yang, "A Real-Time Apple Targets Detection Method for Picking Robot Based on Improved YOLOv5," Remote Sensing 13(9), 1619 (2021). https://doi.org/10.3390/rs13091619
[19] G. Yu, T. Wang, G. Guo, and H. Liu, "SFHG-YOLO: A Simple Real-Time Small-Object-Detection Method for Estimating Pineapple Yield from Unmanned Aerial Vehicles," Sensors 23(22), 9242 (2023). https://doi.org/10.3390/s23229242
[20] S. Sun, C. Li, A. H. Paterson, P. W. Chee, and J. S. Robertson, "Image processing algorithms for infield single cotton boll counting and yield prediction," Computers and Electronics in Agriculture 166, 104976 (2019). https://doi.org/10.1016/j.compag.2019.104976
[21] Y. Jiang, C. Li, R. Xu, S. Sun, J. S. Robertson, and A. H. Paterson, "DeepFlower: a deep learning-based approach to characterize flowering patterns of cotton plants in the field," Plant Methods 16(1), 156 (2020). https://doi.org/10.1186/s13007-020-00698-y
[22] L. Wang, Y. Zhao, Z. Xiong, S. Wang, Y. Li, and Y. Lan, "Fast and precise detection of litchi fruits for yield estimation based on the improved YOLOv5 model," Frontiers in Plant Science 13, 965425 (2022). https://doi.org/10.3389/fpls.2022.965425
[23] R. Sanya, A. L. Nabiryo, J. F. Tusubira, S. Murindanyi, A. Katumba, and J. Nakatumba-Nabende, "Coffee and cashew nut dataset: A dataset for detection, classification, and yield estimation for machine learning applications," Data in Brief 52, 109952 (2024). https://doi.org/10.1016/j.dib.2023.109952
[24] T. Arakawa, T. S. T. Tanaka, and S. Kamio, "Detection of on-tree chestnut fruits using deep learning and RGB unmanned aerial vehicle imagery for estimation of yield and fruit load," Agronomy Journal 116(3), 973–981 (2024). https://doi.org/10.1002/agj2.v116.3
[25] T. Jiang, C. Li, M. Yang, and Z. Wang, "An Improved YOLOv5s Algorithm for Object Detection with an Attention Mechanism," Electronics 11(16), 2494 (2022). https://doi.org/10.3390/electronics11162494
[26] X. Meng, Y. Liu, L. Fan, and J. Fan, "YOLOv5s-Fog: An Improved Model Based on YOLOv5s for Object Detection in Foggy Weather Scenarios," Sensors 23(11), 5321 (2023). https://doi.org/10.3390/s23115321
[27] A. Joshi, D. Guevara, and M. Earles, "Standardizing and Centralizing Datasets for Efficient Training of Agricultural Deep Learning Models," Plant Phenomics 5, 0084 (2023). https://doi.org/10.34133/plantphenomics.0084
[28] A. Koirala, C. McCarthy, K. Walsh, and Z. Wang, "MangoYOLO data set," Central Queensland University (2021). http://hdl.handle.net/10018/1261224, https://researchdata.edu.au/mangoyolo-set (accessed May 2024).
[29] E. David et al., "Global Wheat Head Detection (GWHD) dataset: a large and diverse dataset of high-resolution RGB-labelled images to develop and benchmark wheat head detection methods," Zenodo (2020).
[30] G. Jocher et al., "YOLOv5 by Ultralytics (Version 7.0) [Computer software]," (2020).
[31] G. Jocher et al., "YOLOv5 by Ultralytics," (2023). https://github.com/ultralytics/yolov5
[32] Y. Zhao, Y. Shi, and Z. Wang, "The Improved YOLOV5 Algorithm and Its Application in Small Target Detection," in Intelligent Robotics and Applications, 679–688, Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-031-13841-6