MSDA 665 BIG DATA ANALYTICS FOR BUSINESS – PROJECT TWO
This project aims to accurately classify images into different categories using artificial neural networks
(ANNs) and convolutional neural networks (CNNs). Remove the gray instruction sentences and fill in this
document with your own information.
Student Name: Write your First and Last name (student ID)
Data Source: CIFAR-10 is an image dataset available through Keras (tf.keras.datasets). It consists of 60,000 32×32 color
images in 10 classes (airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks), split into
50,000 training images and 10,000 test images. It is often used for image
classification tasks. Import it into Google Colab by using the following code.
import tensorflow as tf
# Import CIFAR-10 dataset
cifar10 = tf.keras.datasets.cifar10
(train_xs, train_ys), (test_xs, test_ys) = cifar10.load_data()
class_names = ["Airplane","Automobile","Bird","Cat","Deer","Dog","Frog","Horse","Ship","Truck"]
# Printing the class names
for i, class_name in enumerate(class_names):
    print(f"Class {i}: {class_name}")
To complete this project, you have the flexibility to choose another Python built-in image dataset or
import your own custom dataset. However, please ensure that your chosen dataset can be used to
complete the following analysis requirements. If this course is your first exposure to neural networks, I
recommend using the above built-in dataset for this project.
Python Code: Conduct the analysis in Google Colab and paste the link here once you complete all
analyses. Your analysis should include the following items.
Data Exploration:
a) Find the basic information of the dataset, including the total number of images in the train
and test datasets, the size of each image, and the number of categories (labels/classes).
b) Display sample images with corresponding labels: 25 images in the train set and 25 images
in the test set.
Hint: each entry of train_ys is a one-element array, so extract the scalar and convert it to an
integer before indexing class_names (plt refers to matplotlib.pyplot).
plt.xlabel(class_names[int(train_ys[i][0])]) # train_ys[i] has shape (1,)
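One possible way to approach both exploration items is sketched below. The helper names summarize_split and show_samples are illustrative and not part of the assignment; the sketch assumes the arrays have the shapes returned by cifar10.load_data(), that is, images of shape (N, 32, 32, 3) and labels of shape (N, 1).

```python
import numpy as np
import matplotlib.pyplot as plt

CLASS_NAMES = ["Airplane", "Automobile", "Bird", "Cat", "Deer",
               "Dog", "Frog", "Horse", "Ship", "Truck"]


def summarize_split(xs, ys):
    """Return basic facts about one split: image count, image size, class count."""
    return {
        "num_images": int(xs.shape[0]),
        "image_shape": tuple(int(d) for d in xs.shape[1:]),
        "num_classes": int(np.unique(ys).size),
    }


def show_samples(xs, ys, class_names, n=25, cols=5):
    """Plot the first n images in a grid, each labeled with its class name."""
    rows = (n + cols - 1) // cols
    # squeeze=False keeps `axes` a 2-D array even for a single row or column
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows), squeeze=False)
    labels = np.asarray(ys).reshape(-1)  # (N, 1) -> (N,)
    for i, ax in enumerate(axes.ravel()):
        ax.set_xticks([])
        ax.set_yticks([])
        if i < n:
            ax.imshow(xs[i])
            ax.set_xlabel(class_names[int(labels[i])])
        else:
            ax.axis("off")  # hide unused cells in the grid
    fig.tight_layout()
    return fig
```

For example, summarize_split(train_xs, train_ys) reports the image count, image shape, and number of classes for the training split, and show_samples(test_xs, test_ys, CLASS_NAMES) draws the 25-image grid for the test split.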
Data Preprocessing: Perform the necessary preprocessing steps, such as resizing the images to a
consistent size, normalizing pixel values, etc.
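A minimal preprocessing sketch follows, assuming 8-bit images (pixel values 0 to 255) and integer labels of shape (N, 1) as in CIFAR-10. The function name preprocess is hypothetical. CIFAR-10 images already share one size, so no resizing is done here; a custom dataset with mixed sizes would need a resizing step first (for instance with tf.image.resize).

```python
import numpy as np


def preprocess(xs, ys):
    """Scale uint8 pixels to floats in [0, 1] and flatten labels to shape (N,)."""
    xs = np.asarray(xs, dtype="float32") / 255.0
    ys = np.asarray(ys).reshape(-1).astype("int64")
    return xs, ys
```

Apply the same function to both splits, for example train_xs, train_ys = preprocess(train_xs, train_ys), so that the training and test data are scaled identically.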
ANN Model Architecture Design:
a) Initiate model: design an ANN architecture suitable for image classification. Experiment with
different architectures and hyperparameters to optimize performance, such as the number
of layers, activation functions, optimizer, etc.
b) Model training: train the ANN model on the training dataset.
c) Model evaluation: evaluate the trained model using the testing dataset. Calculate loss,
accuracy, and/or other metrics to measure the model's performance.
d) Check for overfitting using a loss or accuracy plot. Identify the optimal number of epochs.
e) Apply the model for prediction: predict the label/classification for at least one image.
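As a starting point for the ANN steps above, the sketch below builds a small fully connected network and includes a helper for reading the best epoch off a training history. The layer sizes, the Adam optimizer, and the names build_ann and best_epoch are assumptions chosen for illustration; the assignment expects you to experiment with these choices. The sparse categorical cross-entropy loss assumes integer class labels.

```python
import numpy as np
import tensorflow as tf


def build_ann(input_shape=(32, 32, 3), num_classes=10, hidden_units=(256, 128)):
    """Fully connected classifier: flatten the image, then stacked Dense layers."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Flatten(),
    ])
    for units in hidden_units:
        model.add(tf.keras.layers.Dense(units, activation="relu"))
    model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",  # integer labels
                  metrics=["accuracy"])
    return model


def best_epoch(history):
    """Return the 1-based epoch with the lowest validation loss.

    `history` is the dict stored in the `history` attribute of the object
    returned by `model.fit` when validation data is supplied.
    """
    return int(np.argmin(history["val_loss"])) + 1
```

A typical run would call history = model.fit(train_xs, train_ys, epochs=20, validation_split=0.1), then model.evaluate(test_xs, test_ys) for the test loss and accuracy, and best_epoch(history.history) to locate the point where validation loss stops improving. Plotting history.history["loss"] against history.history["val_loss"] shows the same information visually, and np.argmax(model.predict(test_xs[:1]), axis=1) gives the predicted class index for one image.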
CNN Model Architecture Design:
a) Initiate model: design a CNN architecture suitable for image classification. Experiment with
different architectures and hyperparameters to optimize performance, such as the filter
size, activation functions, optimizer, etc.
b) Model training: train the CNN model on the training dataset.
c) Model evaluation: evaluate the trained model using the testing dataset. Calculate loss,
accuracy, and/or other metrics to measure the model's performance.
d) Check for overfitting using a loss or accuracy plot. Identify the optimal number of epochs.
e) Apply the model for prediction: predict the label/classification for at least one image.
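The CNN steps can be sketched in a similar way. The architecture below (two convolution and pooling stages followed by a small dense head) is only one plausible baseline, and the names build_cnn and predict_label, the filter counts, and the kernel size are assumptions made for this example rather than required settings.

```python
import numpy as np
import tensorflow as tf


def build_cnn(input_shape=(32, 32, 3), num_classes=10, filters=(32, 64), kernel_size=3):
    """Small CNN: repeated Conv2D + MaxPooling2D stages, then a dense classifier."""
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for f in filters:
        model.add(tf.keras.layers.Conv2D(f, kernel_size, padding="same",
                                         activation="relu"))
        model.add(tf.keras.layers.MaxPooling2D(pool_size=2))  # halves height and width
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(64, activation="relu"))
    model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",  # integer labels
                  metrics=["accuracy"])
    return model


def predict_label(model, image, class_names):
    """Return the predicted class name for one preprocessed image of shape (H, W, C)."""
    probs = model.predict(image[np.newaxis, ...], verbose=0)  # add a batch axis
    return class_names[int(np.argmax(probs[0]))]
```

Training, evaluation, and the overfitting check follow the same fit and evaluate calls as for the ANN. After training, predict_label(model, test_xs[0], class_names) returns the class name for the first test image, which can be compared with its true label.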
Compare the two models’ performance: accuracy, computational efficiency, etc.
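One way to make this comparison concrete is to train both models under the same settings and record the same metrics for each. The helper below is a hedged sketch: the name benchmark and the default epochs and batch_size values are assumptions, and it expects a compiled Keras model with a single metric (accuracy), so that evaluate returns a loss and an accuracy. Wall-clock training time is used here as a rough proxy for computational efficiency.

```python
import time


def benchmark(model, train_xs, train_ys, test_xs, test_ys, epochs=5, batch_size=64):
    """Train and evaluate a compiled Keras model, returning metrics for comparison."""
    start = time.perf_counter()
    model.fit(train_xs, train_ys, epochs=epochs, batch_size=batch_size, verbose=0)
    train_seconds = time.perf_counter() - start

    # Assumes the model was compiled with metrics=["accuracy"] only
    loss, accuracy = model.evaluate(test_xs, test_ys, verbose=0)
    return {
        "params": model.count_params(),    # model size
        "train_seconds": train_seconds,    # wall-clock training time
        "test_loss": float(loss),
        "test_accuracy": float(accuracy),
    }
```

Calling benchmark once with the ANN and once with the CNN, using the same data and number of epochs, yields two dictionaries that can be placed side by side in the report. Timings depend on the Colab runtime (CPU or GPU), so both runs should use the same runtime type.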
Summary: Compose a report (at least 300 words) that briefly covers the above points and highlights the
key points in the analysis process. For example, how did you adjust the hyperparameters?