In the previous post, I discussed a solution to Kaggle’s Dogs vs. Cats Challenge using Convolutional Neural Networks (CNNs). CNNs take time to train, and I tried a number of different network models and hyperparameter values before achieving 94% accuracy. This was very time-consuming: it took around two days to determine the best network model and hyperparameter values. I used grid search with the help of TrainCNN.py [1] to tune the hyperparameters. One grid-search run of TrainCNN.py took a few hours, and since I could not do any other CNN work in the meantime, I decided to try logistic regression on another machine to solve the same problem. I used LogisticRegressionCV from scikit-learn, which is the cross-validated version of LogisticRegression. I am not going to discuss the code in this blog post, as it is a straightforward implementation; instead, I encourage you to read LogisticRegression.py in my Exploring Deep Learning repository on GitHub.

Kaggle’s dogs vs. cats dataset has 25,000 images in two equal classes of dogs and cats. I used 15,000 randomly selected images (7,500 each of dogs and cats) for fitting the model and 5,000 images (2,500 each) for validation.
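As a rough illustration of such a split (this is not the code from the repository), the sketch below draws disjoint, class-balanced training and validation sets using only the standard library. The `balanced_split` helper, the seed, and the assumption that files are named like `cat.0.jpg` and `dog.0.jpg` are mine.

```python
import random

def balanced_split(filenames, n_train_per_class, n_val_per_class, seed=0):
    """Randomly pick disjoint, class-balanced training and validation sets.

    Assumes names such as 'cat.0.jpg' and 'dog.0.jpg', where the text
    before the first dot is the class label (hypothetical naming scheme).
    """
    rng = random.Random(seed)
    by_class = {}
    for name in filenames:
        label = name.split(".")[0]
        by_class.setdefault(label, []).append(name)

    train, val = [], []
    needed = n_train_per_class + n_val_per_class
    for label in sorted(by_class):
        names = sorted(by_class[label])
        if len(names) < needed:
            raise ValueError(f"class {label!r} has only {len(names)} images")
        picked = rng.sample(names, needed)         # sampling without replacement
        train.extend(picked[:n_train_per_class])
        val.extend(picked[n_train_per_class:])     # never overlaps with train
    return train, val

# Stand-in file list with the same class sizes as the dataset (12,500 each).
all_files = [f"{label}.{i}.jpg" for label in ("cat", "dog") for i in range(12500)]
train_files, val_files = balanced_split(all_files, 7500, 2500)
print(len(train_files), len(val_files))
```

With these numbers, 5,000 of the 25,000 images are left unused by the split.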

There are two parameters for processing the dataset itself: the image size and whether or not to standardize the images. For logistic regression, there is the choice of solver and a hyperparameter called `Cs`, which controls the strength of the regularization. Smaller values of `Cs` specify stronger regularization. I ran a grid search over these parameters, and the results are below:

| Row | Solver | ImageSize (px) | Rescale | TrainingAcc (%) | ValidationAcc (%) | TimeToFit (s) | Memory (GB) |
|---|---|---|---|---|---|---|---|
| 1 | lbfgs | 75 | True | 67.6 | 61.8 | 308.7 | 13.5 |
| 2 | lbfgs | 100 | True | 70.1 | 61.3 | 544.8 | 23.6 |
| 3 | lbfgs | 125 | True | 72.3 | 60.6 | 857.9 | 36.5 |
| 4 | sag | 75 | True | 67.6 | 61.9 | 1222.6 | 13.2 |
| 5 | sag | 100 | True | 70.1 | 61.3 | 2255.5 | 23.2 |
| 6 | sag | 125 | True | 72.6 | 60.5 | 3572.6 | 36.1 |
| 7 | lbfgs | 125 | False | 81.7 | 57.3 | 944.4 | 36.5 |
| 8 | sag | 125 | False | 84.8 | 58.4 | 4072.1 | 36.1 |
| 9 | lbfgs | 125 | True | 68.1 | 62.0 | 3635.8 | 36.5 |

Sklearn recommends using `liblinear` for smaller datasets and `sag` or `saga` for larger ones. However, the default solver for logistic regression is `lbfgs`. Since the dogs vs. cats dataset is relatively large for logistic regression, I decided to compare the `lbfgs` and `sag` solvers. Comparing rows 1-3 with rows 4-6, we can see that although the training and validation accuracies are nearly identical for both solvers, `sag` is about four times slower than `lbfgs`. Thus, sklearn’s default of `lbfgs` is a good choice of solver for logistic regression on this dataset.
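The "about four times" figure can be checked directly from the fit times in the table. The short calculation below uses only the numbers for the rescaled runs (rows 1-6); the dictionary layout is just for illustration.

```python
# Fit times in seconds, copied from the table (rescaled images only).
fit_time = {
    "lbfgs": {75: 308.7, 100: 544.8, 125: 857.9},
    "sag": {75: 1222.6, 100: 2255.5, 125: 3572.6},
}

# Ratio of sag to lbfgs fit time at each image size.
slowdown = {
    size: fit_time["sag"][size] / fit_time["lbfgs"][size]
    for size in fit_time["lbfgs"]
}

for size, ratio in slowdown.items():
    print(f"image size {size}: sag takes {ratio:.2f}x as long as lbfgs")
```

The ratio stays close to 4 at all three image sizes, so the slowdown does not appear to depend much on the number of features.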

If we compare image sizes for any one solver (rows 1-3 or 4-6), we can see that as the image size increases, the training accuracy increases from 67.6% to 72.3% (`lbfgs`) or 72.6% (`sag`). However, the validation accuracy does not improve; it drifts slightly down from about 62% to about 60.5%. This indicates that the model is overfitting the training samples. In the regularization discussion further down, we will see how to handle overfitting by adjusting the regularization strength.
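One way to see why larger images make overfitting worse is to count the inputs the model has to fit. The sketch below assumes that each image is a square RGB image flattened into one feature per pixel per channel; the helper name `n_features` is mine.

```python
N_TRAIN = 15_000   # training images, from the text
CHANNELS = 3       # assumption: RGB input, one feature per pixel per channel

def n_features(image_size, channels=CHANNELS):
    """Number of inputs to logistic regression for a flattened square image."""
    return image_size * image_size * channels

for size in (75, 100, 125):
    d = n_features(size)
    print(f"{size}x{size}: {d:>6} features, {d / N_TRAIN:.2f} per training image")
```

Under this assumption even the 75-pixel images give more features (16,875) than there are training images, and at 125 pixels there are roughly three features per training image, so a linear model has plenty of room to memorize the training set.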

Sklearn recommends that features should be approximately of the same scale. “*Note that [for] ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing*” [2]. I used `sklearn.preprocessing.StandardScaler` to normalize both the training and validation data. `StandardScaler` transforms the data so that each feature has zero mean and unit standard deviation. Looking at rows 7 and 8, we can see that without image normalization both `lbfgs` and `sag` massively overfit the training data, with training accuracies of 82% and 85%, respectively, and validation accuracies of only 57% and 58%. Both solvers are also slower than when the images were normalized, by roughly 10-15% according to the fit times in rows 3 and 6. This clearly highlights the importance of feature normalization.
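A minimal sketch of this preprocessing step is shown below on small random stand-in data. Fitting the scaler on the training set only and reusing its statistics for the validation set is the usual practice and is my assumption here; the post's own script may differ.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Stand-in data: tiny 8x8x3 "images" with pixel values in [0, 255], flattened.
X_train = rng.integers(0, 256, size=(200, 8 * 8 * 3)).astype(np.float64)
X_val = rng.integers(0, 256, size=(50, 8 * 8 * 3)).astype(np.float64)

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # learn per-feature mean/std on training data
X_val_std = scaler.transform(X_val)          # reuse the training statistics

# Each training feature now has mean ~0 and standard deviation ~1.
print(X_train_std.mean(axis=0)[:3], X_train_std.std(axis=0)[:3])
```

The validation features end up only approximately standardized, because they are scaled with the training mean and standard deviation rather than their own.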

Once I had decided on the solver (`lbfgs`), the image size (125), and that images should be normalized, I fine-tuned the regularization strength (`Cs`). I used L2 regularization since `lbfgs` supports only L2 regularization. To use L1 regularization we would have to switch to the `saga` solver (`liblinear` also supports L1, but sklearn recommends it for smaller datasets), and since `sag` and `saga` are so much slower than `lbfgs`, I decided not to try it out. `LogisticRegressionCV` in sklearn supports grid search over `Cs` internally, which means we don’t have to use `model_selection.GridSearchCV` or `model_selection.RandomizedSearchCV`. Its `Cs` parameter is a list of all the values among which the solver will look for the best model. I used `Cs` = [1e-12, 1e-11, …, 1e11, 1e12]. The results of this fine-tuning are presented in the last row (row 9) of the table above. Compared with row 3, the training accuracy has dropped from 72.3% to 68.1% while the validation accuracy has increased from 60.6% to 62.0%, narrowing the gap between them from 11.7 to 6.1 percentage points. Thus, tuning the regularization strength does indeed decrease the degree of overfitting to the training data.
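To make the tuning step concrete, here is a small sketch of `LogisticRegressionCV` searching over a list of `Cs` values. It runs on synthetic stand-in data rather than the images, and it assumes one candidate per power of ten between 1e-12 and 1e12, 5-fold cross-validation, and `max_iter=1000`; none of these settings are confirmed by the post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for flattened images (the real data is far larger).
X, y = make_classification(n_samples=400, n_features=50, n_informative=10,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Standardize with statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train, X_val = scaler.transform(X_train), scaler.transform(X_val)

# Assumed grid: one candidate per power of ten (25 values in total).
Cs = np.logspace(-12, 12, 25)

# L2 is the default penalty; the best C is chosen by internal cross-validation.
clf = LogisticRegressionCV(Cs=Cs, solver="lbfgs", cv=5, max_iter=1000)
clf.fit(X_train, y_train)

best_C = float(np.ravel(clf.C_)[0])
print("chosen C:", best_C)
print("training accuracy:", clf.score(X_train, y_train))
print("validation accuracy:", clf.score(X_val, y_val))
```

Note that the cross-validation inside `LogisticRegressionCV` only sees the training split, so the held-out validation set still gives an independent accuracy estimate.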

In this article, I presented results for image classification on Kaggle’s dogs vs. cats dataset using logistic regression. The classifier achieved an accuracy of 62% on the validation images. It may be possible to achieve higher accuracy by further tuning the image size, preprocessing the images, using grayscale images instead of RGB color images, using a different regularization strength, or using both L1 and L2 regularization. I chose not to explore further since the memory requirements for logistic regression in sklearn are very large (last column in the table above).
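To put the memory column in perspective, the back-of-the-envelope calculation below compares the reported figures with the size of one dense copy of the data. It assumes RGB images stored as 64-bit floats and all 20,000 images (training plus validation) in memory at once. These are my assumptions, not measurements from the script.

```python
# Memory in GB reported for the lbfgs runs (rows 1-3 of the table).
reported_gb = {75: 13.5, 100: 23.6, 125: 36.5}

N_IMAGES = 20_000   # 15,000 training + 5,000 validation images, from the text
CHANNELS = 3        # assumption: RGB input
BYTES = 8           # assumption: pixels stored as float64

def raw_matrix_gb(image_size):
    """Size of one dense copy of all flattened images, in GB (1e9 bytes)."""
    return N_IMAGES * image_size * image_size * CHANNELS * BYTES / 1e9

for size, reported in reported_gb.items():
    raw = raw_matrix_gb(size)
    print(f"{size}: one copy = {raw:.1f} GB, reported = {reported} GB "
          f"({reported / raw:.1f} copies)")
```

Under these assumptions the reported memory is consistently about five times the size of a single copy of the data, and it grows with the square of the image size. This is also why grayscale images, with one channel instead of three, would be an obvious way to cut it down.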

The Dogs vs. Cats challenge [1] from Kaggle ended in Jan 2014, but it is still extremely popular for getting started in deep learning. There are two main reasons for this: the data set is small (25,000 images taking up about 600 MB), and it is relatively easy to get a good score.

There are many online articles discussing how to pre-process the data, design a CNN model, and finally train the model. So, in this post I am not going to discuss the implementation details. Instead, I am simply going to report my results using a custom-designed model and transfer learning. I used TensorFlow and tf.keras with Python, and the code is available from my Exploring Deep Learning repository [2] on GitHub.

Note that this is my best attempt and not the first attempt. I used four blocks, each a 2D convolution layer followed by max pooling. At the end, I used two dense layers and a softmax layer as output. I also used dropout layers and image augmentation. The exact command line for training this model is:

```
TrainCNN.py --cnnArch Custom --classMode Categorical --optimizer Adam --learningRate 0.0001 --imageSize 224 --numEpochs 30 --batchSize 16 --dropout --augmentation --augMultiplier 3
```

The CNN model is given below:

```
---------------------------------------------------------------
Model: "Custom"
---------------------------------------------------------------
Layer (type)                 Output Shape              Param #
conv2d (Conv2D)              (None, 224, 224, 32)      896
max_pooling2d (MaxPooling2D) (None, 112, 112, 32)      0
conv2d_1 (Conv2D)            (None, 112, 112, 64)      18496
max_pooling2d_1 (MaxPooling2 (None, 56, 56, 64)        0
conv2d_2 (Conv2D)            (None, 56, 56, 128)       73856
max_pooling2d_2 (MaxPooling2 (None, 28, 28, 128)       0
conv2d_3 (Conv2D)            (None, 28, 28, 256)       295168
max_pooling2d_3 (MaxPooling2 (None, 14, 14, 256)       0
flatten (Flatten)            (None, 50176)             0
dense (Dense)                (None, 512)               25690624
dense_1 (Dense)              (None, 256)               131328
dense_2 (Dense)              (None, 2)                 514
===============================================================
Total params: 26,210,882
Trainable params: 26,210,882
Non-trainable params: 0
---------------------------------------------------------------
```
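The numbers in the `Param #` column can be reproduced with a few lines of arithmetic. The sketch below assumes 3x3 convolution kernels with "same" padding and 2x2 max pooling; the summary does not state these, but they are the settings that match the reported shapes and counts. The helper names are mine.

```python
def conv2d_params(in_channels, filters, kernel=3):
    """Weights plus one bias per filter for a 2D convolution layer."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (n_inputs + 1) * n_units

# Convolution blocks: 3 input channels (RGB), then 32, 64, 128, 256 filters.
channels = [3, 32, 64, 128, 256]
conv = [conv2d_params(c_in, c_out) for c_in, c_out in zip(channels, channels[1:])]

# Four 2x2 poolings shrink 224 -> 14, so Flatten yields 14 * 14 * 256 values.
flat = (224 // 2 ** 4) ** 2 * channels[-1]
sizes = [flat, 512, 256, 2]
dense = [dense_params(n_in, n_out) for n_in, n_out in zip(sizes, sizes[1:])]

total = sum(conv) + sum(dense)
print(conv)    # [896, 18496, 73856, 295168]
print(flat)    # 50176
print(dense)   # [25690624, 131328, 514]
print(total)   # 26210882
```

Almost all of the parameters (about 25.7 million of 26.2 million) sit in the first dense layer, which follows from flattening a 14x14x256 feature map into a 512-unit layer.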

The above model was trained on 15,000 randomly chosen images (7,500 each of dogs and cats) from the Kaggle data set and validated on a separate set of 5,000 images (2,500 each). The model achieved 94% accuracy after 24 epochs. Training took about 4 hours on my PC with an NVIDIA GeForce GTX 1050 with 2 GB of memory.

For the second part, I used the VGG16 model with ImageNet weights, without its top layers, and added custom dense layers at the end. As in the previous step, I used dropout layers and image augmentation. The exact command line for training this model is:

```
TrainCNN.py --cnnArch VGG16 --classMode Categorical --optimizer Adam --learningRate 1e-5 --imageSize 224 --numEpochs 30 --batchSize 25 --dropout --augmentation --augMultiplier 3
```

The CNN model is given below:

```
---------------------------------------------------------------
Model: "VGG16"
---------------------------------------------------------------
Layer (type)                 Output Shape              Param #
===============================================================
vgg16 (Model)                (None, 7, 7, 512)         14714688
flatten (Flatten)            (None, 25088)             0
dense (Dense)                (None, 512)               12845568
dense_1 (Dense)              (None, 256)               131328
dense_3 (Dense)              (None, 2)                 514
===============================================================
Total params: 27,692,098
Trainable params: 12,977,410
Non-trainable params: 14,714,688
---------------------------------------------------------------
```
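For readers who want to see what such a transfer-learning model can look like in tf.keras, here is a hedged sketch, not the code from the repository. The layer sizes come from the summary above. The ReLU activations, the dropout rate and placement, and the helper names `build_vgg16_classifier` and `count_weights` are my assumptions.

```python
import math
import tensorflow as tf

def build_vgg16_classifier(image_size=224, weights="imagenet", dropout_rate=0.5):
    """Frozen VGG16 convolutional base with a small trainable dense head."""
    base = tf.keras.applications.VGG16(
        weights=weights,      # "imagenet" downloads the pretrained weights
        include_top=False,    # drop VGG16's own fully connected layers
        input_shape=(image_size, image_size, 3),
    )
    base.trainable = False    # keep the pretrained filters fixed

    inputs = tf.keras.Input(shape=(image_size, image_size, 3))
    x = base(inputs, training=False)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(512, activation="relu")(x)
    x = tf.keras.layers.Dropout(dropout_rate)(x)   # rate is an assumption
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.Dropout(dropout_rate)(x)
    outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs, name="vgg16_transfer")

def count_weights(variables):
    """Total number of scalar values in a list of model variables."""
    return sum(math.prod(int(d) for d in v.shape) for v in variables)
```

Only the dense head is trained here, which matches the 12,977,410 trainable and 14,714,688 non-trainable parameters in the summary. To mirror the command line, such a model could be compiled with `tf.keras.optimizers.Adam(learning_rate=1e-5)` and a categorical cross-entropy loss, although how `TrainCNN.py` maps its flags to these calls is something I am guessing at.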

The above model was trained on the same dataset as the custom model and achieved an accuracy of 98% after 11 epochs. Clearly, this model is more accurate than the custom-designed model and reaches that accuracy in fewer epochs.