I attended the Wolfram Neural Networks Boot Camp 2020, and that inspired me to incorporate elements of data science and machine learning in my course. The helper functions for machine learning make it quite easy to experiment and introduce such applications to students. We chose to perform image recognition and classification problems that are routinely used to initiate the topics of both neural networks and machine learning.
I was immediately struck, however, by the lack of chemistry-oriented image datasets, and after playing with a Modified National Institute of Standards and Technology (MNIST) handwriting dataset example, decided to create this type of dataset. The goal was to provide students with an end-to-end data science project experience, from creating the data to the final step of performing an advanced machine learning exercise. Thus, this project creates a set of images of glassware routinely found in a chemistry laboratory with the goal of constructing our own example of object identification.
Methodology
The following steps were outlined to collect pictures for each piece of glassware:
Take pictures of glassware when empty and when filled with water.
Fill the glassware with different-colored solutions to various levels; e.g., a 250-mL Erlenmeyer flask was filled to several different volumes.
Provide variation in the background by occasionally including clutter. For example, some glassware was placed on the laboratory bench or an elevated surface to provide different backgrounds.
Organize all pictures in folders shared on Google Drive. Students uploaded pictures from their phones to the shared drive.
The item classes—and the number of items in each class—for the classification problem are:
This collage shows one picture of each of the 17 different types of laboratory equipment. The pipet pictures posed some difficulties: the 5-mL pipets were hard to make out, so we photographed them against white or yellow paper backgrounds to highlight them. Labeling raised other issues as well; some glassware, such as burettes and separatory funnels, is routinely seen and used while suspended.
Volumetric glassware that could be filled with liquids had far more pictures than other items in the dataset. This collection of beaker images illustrates the diversity:
The important point here is that the data collection included multiple sources of variation: lighting conditions, focus on the object, cellphone quality, picture presets and multiple images for each label.
Dataset Augmentation
Due to the small number of images in the dataset and the disparity in the number of images between different classes, we thought it prudent to increase the sample space by performing some image modifications on the pictures. We utilized ImageEffect options to increase the sample size by a factor of 19:
Images were blurred using the Blur function with a randomly chosen pixel radius over which the blur was applied. Images were also modified to appear lighter or darker using the appropriately named functions. Next, all of these images were collected and flipped from left to right. The final operation randomly rotated all images between –10 and 10 degrees.
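The augmentation pipeline described above can be sketched in Wolfram Language as follows. This is a minimal illustration, not the project's exact code; the function name `augmentImage` and the parameter ranges for blur and lightening are assumptions. This particular sketch produces 8 variants per input image; the project's pipeline combined more operations to reach the 19x factor mentioned earlier:

```wolfram
(* illustrative augmentation pipeline: blur, lighten/darken, flip, rotate *)
augmentImage[img_Image] := Module[{variants, flipped},
  variants = {
    img,
    Blur[img, RandomInteger[{1, 4}]],      (* blur with a random pixel radius *)
    Lighter[img, RandomReal[{0.1, 0.3}]],  (* lighten *)
    Darker[img, RandomReal[{0.1, 0.3}]]    (* darken *)
  };
  (* flip every variant from left to right, doubling the set *)
  flipped = ImageReflect[#, Left -> Right] & /@ variants;
  (* finally, rotate each image by a random angle between -10 and 10 degrees *)
  ImageRotate[#, RandomReal[{-10, 10}] Degree, Background -> White] & /@
    Join[variants, flipped]
]
```

Mapping such a function over every image in a class folder and flattening the result yields the enlarged dataset.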
Here is the result of these operations on a single picture from the Erlenmeyer flask label:
The effect of this series of operations on a single beaker-class image is shown here:
Now we’ll represent the effects of image augmentation for one image chosen from each of the beaker, flat-bottom flask and wash-bottle classes, respectively:
Image set augmentation produced 6,365 images. We then decided to use this as our dataset for training and testing of various pre-trained neural networks available at the Wolfram Neural Net Repository.
Neural Networks
We decided to evaluate the performance of four well-known pre-trained neural networks from the field of image recognition. Here is a summary of the sources, number of layers and parameters of these neural networks:
Data Import and Organization
We split the dataset to use 80% of the data as the training set and the remaining 20% as the testing set. We imported the training and testing sets and carried out the training and evaluation of classification performance using four variants:
Full-color images
Full-color images with the dataset augmented using the image augmentation module described earlier
Grayscale images from the full-color images
Grayscale images with the dataset augmented using the image augmentation module described earlier
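A minimal sketch of assembling these variants, assuming a directory layout in which each class has its own subfolder (the path `"glassware"` and the variable names are placeholders):

```wolfram
(* gather labeled examples as label -> image rules from class subfolders *)
files = FileNames["*.jpg", "glassware", 2];
examples = Map[FileNameTake[DirectoryName[#]] -> Import[#] &, files];

(* 80-20 train/test split on a random shuffle *)
{trainSet, testSet} =
  TakeDrop[RandomSample[examples], Round[0.8 Length[examples]]];

(* grayscale variant of the same training set *)
trainGray = MapAt[ColorConvert[#, "Grayscale"] &, trainSet, {All, 2}];
```

The augmented variants are produced the same way, starting from the enlarged image set instead of the originals.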
Let’s check some examples of the training and testing sets:
After the 80–20 split, here are the total images in the training and test sets:
Training and Results
The following steps were used for net surgery on the pre-trained ResNet model obtained from the Wolfram Neural Net Repository:
Remove the final classification layers (the linear and softmax layers) from the pre-trained net.
Attach new layers suited to the glassware classification task:
In this example, the LinearLayer and SoftmaxLayer layers have been removed, and a new net is created, composed of the pre-trained net followed by a fresh linear layer and softmax layer:
In the final classifier step, LinearLayer takes an integer argument for the number of classes. The number of classes for the laboratory glassware identification experiment is the following:
An uninitialized LinearLayer and a SoftmaxLayer are attached to create the final net. A NetDecoder with the laboratory glassware labels is attached to infer the class names from the output of the net:
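The net surgery above can be sketched like this, using the ResNet-50 model from the Wolfram Neural Net Repository (the class labels shown are an illustrative subset; the project uses all 17 class names):

```wolfram
(* load the pre-trained net from the Wolfram Neural Net Repository *)
baseNet = NetModel["ResNet-50 Trained on ImageNet Competition Data"];

(* drop the final linear and softmax layers, keeping the feature extractor *)
featureNet = NetDrop[baseNet, -2];

(* illustrative subset of the 17 glassware class names *)
classLabels = {"beaker", "Erlenmeyer flask", "graduated cylinder"};

(* attach a fresh classifier and a decoder for the glassware labels *)
newNet = NetChain[
  {featureNet, LinearLayer[Length[classLabels]], SoftmaxLayer[]},
  "Output" -> NetDecoder[{"Class", classLabels}]]
```

The new LinearLayer is uninitialized, so only it (and nothing in `featureNet`) needs to learn weights during training.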
Here we perform training with MaxTrainingRounds set to 5. Training finishes in about five minutes on a laptop with an NVIDIA Quadro M2000M GPU. For transfer learning, all layers except the final classifier layer are kept frozen:
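A hedged sketch of this training call, where `newNet` stands for the surgically assembled net and `trainSet` for the labeled training examples:

```wolfram
(* transfer learning: freeze the pre-trained feature layers (layer 1 of the
   chain) so that only the new classifier layers are trained *)
trainedNet = NetTrain[newNet, trainSet,
  LearningRateMultipliers -> {1 -> 0},
  MaxTrainingRounds -> 5,
  TargetDevice -> "GPU"]
```

Setting a layer's learning rate multiplier to 0 is what keeps it frozen; the remaining layers default to a multiplier of 1.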
Various properties can be obtained from the trained net. Here we list a few:
Classification Performance
The trained net can be directly used with the ClassifierMeasurements function to obtain a varied range of performance metrics:
Obtain the overall accuracy of the trained net on the test set:
The "ConfusionMatrixPlot" property provides a convenient way to visualize the correctness of the classification. The off-diagonal entries are the incorrectly classified examples:
Here are five of the six examples that the net misclassified:
The low frequency of misclassification is clear from the confusion matrix. A graduated cylinder gets misclassified as a standard flask only once out of 179 samples tested. The high accuracy with very little computational cost is quite a revelation for students. Another striking example is the Büchner funnel classification with perfect precision and recall. The test-tube class, which has the lowest number of images in the test set (23), also displays perfect recall:
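All of the metrics above can be queried from a single ClassifierMeasurements object. A minimal sketch, assuming `trainedNet` and `testSet` from the training step:

```wolfram
(* evaluate the trained net on the held-out test set *)
cm = ClassifierMeasurements[trainedNet, testSet];

cm["Accuracy"]                       (* overall accuracy on the test set *)
cm["ConfusionMatrixPlot"]            (* off-diagonal entries are errors *)
cm["WorstClassifiedExamples" -> 5]   (* examples the net got most wrong *)
```

Properties such as per-class precision and recall (e.g. `cm["Precision"]`, `cm["Recall"]`) back the class-level observations made above.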
Results Overview
Following the processes described previously, we tested the four neural networks on four variants of our laboratory images.
The validation error during the training is initially quite high with the unaugmented dataset. After a few rounds of training, both datasets resulted in trained networks with increased accuracy:
The next graphic shows the training time and accuracy of all four trained networks with each of the four dataset variants. Unsurprisingly, the enlarged datasets take longer to train and provide greater accuracy compared to the unaugmented datasets. The ResNet-50 and ResNet-101 networks both provide similar classification accuracy between grayscale and full-color images:
Teaching Ideas and Conclusion
The dataset images and sample notebooks used for the training of networks and data analysis are available on Zenodo. This should enable other instructors to employ this project in its entirety or devise other projects using its data. One simple extension would be to add images of different glassware and investigate classification performance. Another interesting, and possibly more advanced, application would be to identify the text on glassware that annotates the volume, especially on beakers or Erlenmeyer flasks. This project also provides students from non–computer science backgrounds with hands-on experience of the power of deep learning and pre-trained neural networks.
Acknowledgements
I am grateful to my students in the Introduction to Scientific Computing course at Wagner College. I am also thankful to Dr. Tuseeta Banerjee and Dr. Mads Bahrami for the invitation to write this blog post and the Wolfram Blog Team for their support in publishing it.