Detecting Paddy Disorders and Pests from Crop Images
|Domain experts, AI engineers, and app developers|
|Work with the TN government's Uzhavan app to include AI for disease/pest detection for paddy|
Given an image of a paddy crop taken directly in the field, identify whether the crop is infected, classify the disease (whether caused by pests or by other factors), and suggest a list of remedial steps for the farmer based on the stage and severity of the infection.
Relevance in Indian Context
Agriculture is one of the biggest livelihood providers in India, and it contributes substantially to the country's GDP. India is the second-largest producer of rice in the world.
Though India ranks first globally in net cropped area, followed by the US and China, crop yields in India are still just 30% to 60% of the best sustainable crop yields achievable on farms in other countries.
In an assessment by the All India Coordinated Rice Improvement Project, insect pests alone were found to account for a 28.8% loss.
Data availability and collection
Rice Leaf Diseases Data Set - UCI ML (3 disease classes: Bacterial leaf blight, Brown spot & Leaf smut, 40 images each)
APS Image Database - Open images DB of plant and diseases (For Rice, there are just 35 images as of now, covering 5-6 diseases)
Rice-Disease-DataSet - GitHub (3 Disease Classes, a total of 276 images)
63 Rice Disease Images Sample - Flickr (needs to be scraped and cleaned along with the captions)
In total, we can get hardly 400 images after combining and cleaning all the above datasets, which is clearly insufficient to train a deep model, even with image augmentation.
Scraping the Internet for a larger set of such images might seem a better way, but it is not reliable and requires domain experts to manually check the scraped data. Earlier efforts that tried this collected only a small amount of data (e.g., the third dataset listed above).
Hence, an extensive data collection effort is required, capturing all the metadata and annotations needed to train a deep learning model. In addition to RGB images of the diseased crop labeled with the disease class, the following data would be very helpful as additional features for the ML model:
Stage of the disease (Initial, Development, Serious)
Temperature and humidity
Sunlight intensity and rainfall data for a certain time period
Variety of paddy, and the species/sub-species it belongs to
Region of plant (lat-long) with date and time of image capture
Soil type and moisture content & salinity level
Day number since the paddy was cultivated
Optional: Type of fertilizers / pesticides used
Though it may not be feasible to collect all of this additional data, even a subset of it should be enough to improve predictions. To enable structured data collection that meets these requirements, AI4Bharat is also planning to develop an app for collecting crowd-sourced data. Users should be able to capture images with the app and fill in the required details, making structured data collection feasible and reliable across a large number of data providers.
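As a rough illustration of what one structured record from such an app could look like, here is a minimal sketch in Python. Every field name, unit, and allowed value below is an assumption made for illustration; it is not a schema defined by AI4Bharat or the Uzhavan app. All metadata fields are optional, reflecting the point that only a subset may be available.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional

# Hypothetical stage names, taken from the list of desired metadata above.
DISEASE_STAGES = ("initial", "development", "serious")


@dataclass
class PaddySample:
    """One crowd-sourced sample. All field names are illustrative assumptions."""
    image_path: str                          # RGB image of the crop
    disease_label: str                       # e.g. "brown_spot" or "healthy"
    stage: Optional[str] = None              # one of DISEASE_STAGES
    temperature_c: Optional[float] = None
    humidity_pct: Optional[float] = None
    sunlight_lux: Optional[float] = None
    rainfall_mm: Optional[float] = None
    paddy_variety: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    captured_at: Optional[datetime] = None
    soil_type: Optional[str] = None
    soil_moisture_pct: Optional[float] = None
    soil_salinity: Optional[float] = None
    days_since_planting: Optional[int] = None
    inputs_used: Optional[str] = None        # fertilizers / pesticides

    def validate(self) -> list:
        """Return a list of human-readable problems (empty if none)."""
        problems = []
        if self.stage is not None and self.stage not in DISEASE_STAGES:
            problems.append(f"unknown stage: {self.stage!r}")
        if self.humidity_pct is not None and not 0 <= self.humidity_pct <= 100:
            problems.append("humidity_pct must be in [0, 100]")
        if self.latitude is not None and not -90 <= self.latitude <= 90:
            problems.append("latitude must be in [-90, 90]")
        if self.longitude is not None and not -180 <= self.longitude <= 180:
            problems.append("longitude must be in [-180, 180]")
        if self.days_since_planting is not None and self.days_since_planting < 0:
            problems.append("days_since_planting must be non-negative")
        return problems

    def missing_fields(self) -> list:
        """Names of optional fields that were not filled in."""
        return [name for name, value in asdict(self).items() if value is None]
```

Checking records at capture time (for example, rejecting an out-of-range latitude before upload) is one way such an app could keep crowd-sourced data reliable; `missing_fields` makes it easy to report how complete each submission is.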
Before the deep learning era, most techniques for this problem relied on CV methods hand-crafted for the specific disease type, plant, etc. A common approach was to segment the crop image (with kNN) to isolate the diseased patches, or to apply k-means clustering and extract statistical features, and then to use a basic classifier such as an SVM on top to detect the type(s) of disease.
Modern deep learning based solutions detect and classify the disease directly from the input image. Combining CV-based preprocessing with DL methods is likely to work best for this problem.
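The clustering-plus-statistics idea can be sketched in a few lines of plain Python. This is a toy illustration, not the method of any cited paper: it runs k-means on raw RGB pixel values, assumes (as a simplification) that the least green cluster corresponds to diseased tissue, and returns a small feature vector. The function names and the greenness heuristic are hypothetical; in a classical pipeline the resulting features would then be passed to a classifier such as an SVM, which is not shown here.

```python
import math


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]


def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    # Deterministic initialization: k points evenly spaced through the input.
    step = max(1, len(points) // k)
    centers = [points[min(i * step, len(points) - 1)] for i in range(k)]
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Mean of each color channel over the cluster members.
                new_centers.append(
                    tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def lesion_features(pixels, k=2):
    """Return [lesion area fraction, mean R, mean G, mean B of the lesion]."""
    centers, labels = kmeans(pixels, k)
    # Assumption: the least green cluster is the diseased patch.
    greenness = [g - (r + b) / 2 for r, g, b in centers]
    lesion = greenness.index(min(greenness))
    fraction = labels.count(lesion) / len(pixels)
    return [fraction, *centers[lesion]]


# Tiny synthetic "image": five green pixels and three brownish ones.
pixels = [(30, 160, 40), (35, 165, 45), (28, 158, 38), (32, 162, 42),
          (120, 80, 40), (125, 85, 45), (31, 161, 41), (118, 78, 38)]
features = lesion_features(pixels)
```

On real field images the greenness assumption breaks down quickly (soil, dry leaves, and shadows are also "not green"), which is part of why such hand-crafted pipelines had to be tuned per disease and per crop.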
Open Technical Challenges
Segmenting paddy is particularly difficult when the background contains a significant amount of green. Hence the general trend is to run a tight bounding-box detection on the image to isolate the crop, but the box may still include a significant area of background.
Some diseases are more prone to symptom variations than others. Hence it is necessary to capture all the variations of a disease.
Multiple simultaneous disorders: Many algorithms assume that only one disease is present in each image, but multiple deficiencies or pests can be present at once and cause several disorders.
Several disorders manifest with similar symptoms in the leaves, making it difficult to identify the disease from images and CV techniques alone.
Varying light conditions at the time of image capture (during both training and inference) introduce large variations in color.
Hyperspectral images, rather than RGB alone, might give considerably better results.
Identifying the list of all features that are essential for an ML model to recognize diseases.
Building a scalable data collection app/software system, together with an online learning model that can accommodate a diverse collection of data on the fly.
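For the challenge of multiple simultaneous disorders listed above, one common way to drop the single-disease assumption is to treat the task as multi-label classification: each class gets an independent sigmoid score instead of all classes competing in one softmax. The sketch below assumes a model that outputs one raw score (logit) per class; the function name, the example logits, and the 0.5 threshold are illustrative assumptions, and the class names are borrowed from the UCI dataset mentioned earlier.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_disorders(logits, class_names, threshold=0.5):
    """Return every class whose independent sigmoid score passes the threshold.

    An empty list means no disorder was detected, which could be
    interpreted as a healthy crop.
    """
    scores = {name: sigmoid(z) for name, z in zip(class_names, logits)}
    present = [name for name, s in scores.items() if s >= threshold]
    return present, scores


classes = ["bacterial_leaf_blight", "brown_spot", "leaf_smut"]
# Hypothetical model outputs for one image showing two co-occurring disorders.
present, scores = predict_disorders([2.0, 1.2, -3.0], classes)
```

Because the scores are independent, they do not need to sum to one, so two diseases can both be reported with high confidence. The threshold would normally be tuned per class on validation data.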
Pasalu IC, Mishra B, Krishnaiah NV, Katti G (2004) Integrated pest management in rice in India: Status and prospects. In: Birthal PS, Sharma OP (eds) Proceedings 11. Integrated pest management in Indian agriculture. National Centre for Agricultural Economics and Policy Research (NCAP)/National Centre for Integrated Pest Management (NCIPM), New Delhi, pp. 25–49
Singh, V., & Misra, A. K. (2017). Detection of plant leaf diseases using image segmentation and soft computing techniques. Information Processing in Agriculture, 4(1), 41-49.
Prajapati, H. B., Shah, J. P., & Dabhi, V. K. (2017). Detection and classification of rice plant diseases. Intelligent Decision Technologies, 11(3), 357-373.
Mohan, K. J., Balasubramanian, M., & Palanivel, S. (2016). Detection and recognition of diseases from paddy plant leaf images. International Journal of Computer Applications, 144(12).
Ferentinos, K. P. (2018). Deep learning models for plant disease detection and diagnosis. Computers and Electronics in Agriculture, 145, 311-318.
Periodic data collection and annotation over a long period of time, say 6 months.
Data cleaning, filtering, and analysis to obtain a balanced dataset.
Feasibility and design of an ML-based method to solve the problem
Implementation of the solution
Deployment on real-time systems for immediate analysis.
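For the data cleaning and balancing step above, field collections rarely yield equal numbers of images per disease. When filtering down to equal class sizes would discard too much data, a common alternative is to weight each class inversely to its frequency during training. The following is a minimal sketch of that heuristic; the function name and the example counts are assumptions for illustration only.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, common classes below 1, and a
    perfectly balanced dataset gives every class a weight of exactly 1.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}


# Hypothetical label distribution after cleaning.
labels = ["blight"] * 40 + ["brown_spot"] * 40 + ["leaf_smut"] * 20
weights = inverse_frequency_weights(labels)
```

These weights would typically be passed to the loss function so that mistakes on under-represented diseases count more. They compensate for imbalance but do not replace collecting more images of rare disorders.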