This project is based on a CSV file of facial images labeled by age, gender, and ethnicity. The dataset has 27305 rows and 5 columns (age, ethnicity, gender, img_name, and pixels). My main objective was to build a model that predicts a person's gender from their image.
Data Loading and Exploration
After exploring the dataset, I found that none of the values are null. Although age, gender, and ethnicity are stored as integers, gender and ethnicity represent categorical values. The pixels column holds the pixel values of each image, but in this dataset they are stored as a single string and need to be converted to a NumPy array. I then created several plots to understand the distributions of age, gender, and ethnicity.
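The string-to-array conversion can be sketched as follows. This is a minimal sketch, assuming the pixel values are separated by spaces and that each image is square and grayscale; the helper name `pixels_to_image` is hypothetical, and the image side length is inferred from the number of values rather than hard-coded.

```python
import math

import numpy as np


def pixels_to_image(pixel_string: str) -> np.ndarray:
    """Convert a space-separated pixel string into a square 2-D array scaled to [0, 1]."""
    values = np.array(pixel_string.split(), dtype=np.float32)
    side = math.isqrt(values.size)  # assumption: the image is square
    if side * side != values.size:
        raise ValueError(f"expected a square image, got {values.size} pixels")
    return values.reshape(side, side) / 255.0


# Tiny illustrative example: a 2x2 "image" stored as a string (not real data)
example = pixels_to_image("0 255 128 64")
print(example.shape)  # (2, 2)
```

Applied to the whole column, for example with `df["pixels"].apply(pixels_to_image)` on a pandas DataFrame, this would give one array per row that can then be stacked into a single training array.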
This first histogram shows the distribution of ages, which range from 1 to about 116, with clear spikes for newborns and young adults.
The next bar graph shows the number of males versus females. The two classes are roughly balanced, although there are slightly more males. Note that 0 represents male and 1 represents female.
This bar graph shows the number of people of each ethnicity. 0 represents Caucasian, 1 represents Black, 2 represents Asian, 3 represents Indian, and 4 represents other. There are significantly more Caucasian people than people of any other ethnicity.
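The label encodings described above can be kept in one place and reused when counting or plotting. The sketch below uses only the standard library; the dictionary and function names are hypothetical, and the sample codes are made up purely to show the call.

```python
from collections import Counter

# Encodings as described in the text
GENDER_LABELS = {0: "Male", 1: "Female"}
ETHNICITY_LABELS = {0: "Caucasian", 1: "Black", 2: "Asian", 3: "Indian", 4: "Other"}


def count_by_label(codes, label_map):
    """Return {readable label: count} for a sequence of integer codes."""
    counts = Counter(codes)
    return {label_map[code]: counts.get(code, 0) for code in sorted(label_map)}


# Hypothetical codes, only to illustrate the call; not the real data
sample_gender_codes = [0, 1, 0, 0, 1]
print(count_by_label(sample_gender_codes, GENDER_LABELS))  # {'Male': 3, 'Female': 2}
```

Passing a real column (for instance the gender column of the CSV) instead of the sample list would give the counts behind the bar graphs.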
To determine which number corresponded to which gender and ethnicity, I wrote code that displays 20 random faces at a time together with their labels, so that my reading of the labels would be consistent across many examples.
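A display of this kind could look like the sketch below. It is not the original code: the function names are hypothetical, and it assumes `images` is a sequence of 2-D grayscale arrays with `genders` and `ethnicities` as parallel sequences of integer codes. matplotlib is imported inside the plotting function so that the sampling helper can be used on its own.

```python
import random


def sample_indices(n_rows, k=20, seed=None):
    """Pick up to k distinct row indices at random."""
    rng = random.Random(seed)
    return rng.sample(range(n_rows), min(k, n_rows))


def show_random_faces(images, genders, ethnicities, k=20, seed=None):
    """Plot k random faces in a grid, titled with their raw label codes."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    indices = sample_indices(len(images), k, seed)
    cols = 5
    rows = (len(indices) + cols - 1) // cols  # ceiling division
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2.2 * rows), squeeze=False)
    for ax in axes.flat:
        ax.axis("off")  # hide ticks, including on any unused cells
    for ax, i in zip(axes.flat, indices):
        ax.imshow(images[i], cmap="gray")
        ax.set_title(f"gender={genders[i]}, eth={ethnicities[i]}", fontsize=8)
    fig.tight_layout()
    return fig
```

Calling `show_random_faces(images, genders, ethnicities)` several times without a seed shows a fresh batch of 20 faces on each call.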
The first model I tested for gender prediction was built with PyTorch. I trained it on a GPU to speed up training. However, after many attempts at adjusting the model's architecture, I found the accuracy to be very low. I eventually improved the model by adding more dropout layers and increasing the number of epochs to 50, but the accuracy only reached about 0.51, which is close to chance for two nearly balanced classes. Therefore, I did not select this as the best model for gender prediction.
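For illustration, a small PyTorch classifier with dropout and GPU placement might look like the sketch below. The architecture is an assumption, not the model used in this project: the layer sizes, the class name `GenderCNN`, the dropout rate, and the 48x48 single-channel input size are all hypothetical choices.

```python
import torch
from torch import nn


class GenderCNN(nn.Module):
    """Hypothetical small CNN producing one logit per image."""

    def __init__(self, image_size=48, dropout=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        flat = 32 * (image_size // 4) ** 2  # two 2x2 poolings shrink each side by 4
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout),
            nn.Linear(flat, 64), nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.classifier(self.features(x)).squeeze(1)


# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GenderCNN().to(device)

# Random stand-in batch: 4 grayscale images and their 0/1 labels (not real data)
batch = torch.rand(4, 1, 48, 48, device=device)
labels = torch.tensor([0.0, 1.0, 0.0, 1.0], device=device)

logits = model(batch)
loss = nn.BCEWithLogitsLoss()(logits, labels)  # expects raw logits and float targets
print(logits.shape, loss.item())
```

The model outputs raw logits, so it is paired with `BCEWithLogitsLoss` rather than a sigmoid followed by `BCELoss`.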
The Keras model proved to be much more effective. I again used a GPU to accelerate training. I tested many variations of the model's architecture and found that the RMSprop optimizer increased the accuracy the most. The final model had an accuracy of about 0.89, which is why I selected it as the most promising model for gender prediction.
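A Keras binary classifier compiled with RMSprop could be sketched as follows. Only the choice of optimizer comes from the text; the layers, the learning rate, the function name `build_gender_model`, and the 48x48 single-channel input are assumptions made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_gender_model(image_size=48, learning_rate=1e-3):
    """Hypothetical small CNN for binary gender prediction, compiled with RMSprop."""
    model = keras.Sequential([
        keras.Input(shape=(image_size, image_size, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # probability of class 1 (female)
    ])
    model.compile(
        optimizer=keras.optimizers.RMSprop(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_gender_model()
model.summary()
```

Training would then be a call such as `model.fit(X_train, y_train, validation_split=0.2, epochs=...)`, where `X_train` has shape `(n, 48, 48, 1)` under the assumptions above.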
The PyTorch model appears to be overfitting, as seen by comparing its training loss with its test accuracy: the training loss keeps decreasing, yet the test accuracy still ended up at only 0.51. The Keras model, by contrast, reaches an accuracy of 0.89 even though its training loss is not as low as the PyTorch model's.
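One way to put an accuracy figure in context is to compare it with the accuracy of always predicting the more common class. The sketch below uses a hypothetical helper and made-up labels; the real class counts are not reproduced here.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical, nearly balanced labels (52 of class 0, 48 of class 1); not real data
labels = [0] * 52 + [1] * 48
print(majority_baseline_accuracy(labels))  # 0.52
```

With classes this close to balanced, a model scoring around 0.5 is doing little better than this baseline, while a score near 0.9 is well above it.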