MSDA 665 BIG DATA ANALYTICS FOR BUSINESS – PROJECT TWO

This project aims to accurately classify images into categories using artificial neural networks (ANNs) and convolutional neural networks (CNNs). Remove the gray sentences and fill in this document with your own information.

Student Name: Write your First and Last name (student ID)

Data Source: CIFAR-10 is an image dataset available through Keras (tf.keras.datasets). It consists of 60,000 32×32 color images in 10 classes (airplanes, automobiles, cats, dogs, etc.), split into 50,000 training images and 10,000 test images. It is often used for image classification tasks. Import it into Google Colab with the following code.

import tensorflow as tf

# Import the CIFAR-10 dataset (downloaded on first use)
cifar10 = tf.keras.datasets.cifar10
(train_xs, train_ys), (test_xs, test_ys) = cifar10.load_data()

class_names = ["Airplane", "Automobile", "Bird", "Cat", "Deer", "Dog", "Frog", "Horse", "Ship", "Truck"]

# Print each class index with its name
for i, class_name in enumerate(class_names):
    print(f"Class {i}: {class_name}")

To complete this project, you may choose another image dataset that ships with a Python library, or import your own custom dataset. However, please ensure that the dataset you choose can support all of the analysis requirements below. If this course is your first exposure to neural networks, I recommend using the CIFAR-10 dataset above.

Python Code: Conduct the analysis in Google Colab and paste the link here once you complete all

analyses. Your analysis should include the following items.

 Data Exploration:

a) Find the basic information about the dataset, including the total number of images in the training and test sets, the size of each image, and the number of categories (labels/classes).

b) Display sample images with corresponding labels: 25 images in the train set and 25 images

in the test set.

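Hint: for this dataset, train_ys has shape (50000, 1), so each train_ys[i] is a one-element array. Take its single element and convert it to an integer before indexing class_names.

plt.xlabel(class_names[int(train_ys[i][0])])  # train_ys[i] is a one-element array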
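The exploration items above could be approached along the following lines. This is a minimal sketch, not a required solution: the helper names `summarize` and `show_grid` are made up for illustration, and the small zero-filled arrays only stand in for the real data so the snippet runs without downloading anything. In Colab you would pass `train_xs`, `train_ys`, `test_xs`, and `test_ys` instead.

```python
import numpy as np

def summarize(xs, ys):
    """Return basic facts about an image set: count, per-image shape, class count."""
    return {
        "num_images": xs.shape[0],
        "image_shape": xs.shape[1:],
        "num_classes": len(np.unique(ys)),
    }

def show_grid(xs, ys, class_names, title, n=25):
    """Plot the first n images in a 5x5 grid, labelled with their class names."""
    import matplotlib.pyplot as plt  # imported here so summarize() works without it
    plt.figure(figsize=(8, 8))
    for i in range(n):
        plt.subplot(5, 5, i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.imshow(xs[i])
        plt.xlabel(class_names[int(ys[i][0])])  # ys has shape (N, 1)
    plt.suptitle(title)
    plt.show()

# Stand-in arrays with the same layout as CIFAR-10 (hypothetical demo data).
demo_xs = np.zeros((30, 32, 32, 3), dtype=np.uint8)
demo_ys = (np.arange(30) % 10).reshape(-1, 1)

info = summarize(demo_xs, demo_ys)
print(info)
```

Calling `show_grid(train_xs, train_ys, class_names, "Training set")` and then the same with the test arrays would cover the two 25-image displays.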

 Data Preprocessing: Perform the necessary preprocessing steps, such as resizing the images to a consistent size and normalizing pixel values.
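For CIFAR-10, every image is already 32×32, so resizing is not needed and normalization is the main step. A minimal sketch, assuming the images are stored as unsigned 8-bit integers in the range 0 to 255 (the function name `preprocess` is illustrative):

```python
import numpy as np

def preprocess(xs):
    """Scale uint8 pixel values from [0, 255] to float32 values in [0, 1]."""
    return xs.astype("float32") / 255.0

# A single 1x1 RGB "image" used only to show the effect of the scaling.
demo = np.array([[[[0, 128, 255]]]], dtype=np.uint8)
scaled = preprocess(demo)
print(scaled.dtype, scaled.min(), scaled.max())
```

The same function would be applied to both the training and the test images so that the model sees inputs on the same scale at training and evaluation time.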

 ANN Model Architecture Design:

a) Model initialization: design an ANN architecture suitable for image classification. Experiment with different architectures and hyperparameters, such as the number of layers, activation functions, and optimizer, to optimize performance.

b) Model training: train the ANN model on the training dataset.

c) Model evaluation: evaluate the trained model on the test dataset. Calculate loss, accuracy, and/or other metrics to measure the model's performance.

d) Overfitting check: use a loss or accuracy plot to check for overfitting, and identify the optimal number of epochs.

e) Prediction: apply the model to predict the label/classification for at least one image.
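One possible way to organize the ANN steps is sketched below. The layer sizes, the Adam optimizer, the 10% validation split, and the helper names (`build_ann`, `best_epoch`, `plot_history`, `run_experiment`) are all illustrative assumptions, not prescribed choices; you are expected to experiment with them. Picking the epoch with the lowest validation loss is only one simple rule for identifying where overfitting begins.

```python
def build_ann(input_shape=(32, 32, 3), num_classes=10):
    """Fully connected baseline: flatten the pixels, two hidden layers, softmax output."""
    import tensorflow as tf  # local import: the helpers below do not need it
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    # Sparse loss because the labels are integer class indices, not one-hot vectors.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss (first one on ties)."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1

def plot_history(history_dict):
    """Plot training and validation loss per epoch to look for overfitting."""
    import matplotlib.pyplot as plt
    for key in ("loss", "val_loss"):
        plt.plot(range(1, len(history_dict[key]) + 1), history_dict[key], label=key)
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend()
    plt.show()

def run_experiment(model, train_xs, train_ys, test_xs, test_ys, epochs=20):
    """Train, evaluate on the test set, and predict the class of the first test image."""
    history = model.fit(train_xs, train_ys, epochs=epochs,
                        validation_split=0.1, verbose=2)
    test_loss, test_acc = model.evaluate(test_xs, test_ys, verbose=0)
    probs = model.predict(test_xs[:1], verbose=0)
    return {
        "history": history.history,
        "test_loss": test_loss,
        "test_accuracy": test_acc,
        "best_epoch": best_epoch(history.history["val_loss"]),
        "predicted_class": int(probs.argmax(axis=1)[0]),
    }
```

In Colab, something like `result = run_experiment(build_ann(), train_xs / 255.0, train_ys, test_xs / 255.0, test_ys)` followed by `plot_history(result["history"])` would exercise all five items, and `class_names[result["predicted_class"]]` gives the predicted label.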

 CNN Model Architecture Design:

a) Model initialization: design a CNN architecture suitable for image classification. Experiment with different architectures and hyperparameters, such as the filter size, activation functions, and optimizer, to optimize performance.

b) Model training: train the CNN model on the training dataset.

c) Model evaluation: evaluate the trained model on the test dataset. Calculate loss, accuracy, and/or other metrics to measure the model's performance.

d) Overfitting check: use a loss or accuracy plot to check for overfitting, and identify the optimal number of epochs.

e) Prediction: apply the model to predict the label/classification for at least one image.
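A small CNN for 32×32 color images might look like the sketch below. The filter counts, 3×3 kernels, and layer order are assumptions chosen for illustration, and `conv_out` is a hypothetical helper that applies the standard output-size rule for unpadded ("valid") and "same" padding so you can check how filter size and pooling shrink the feature maps before you train anything.

```python
def conv_out(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution or pooling layer."""
    if padding == "same":
        return -(-size // stride)          # ceil(size / stride)
    return (size - kernel) // stride + 1   # "valid": no padding

def build_cnn(input_shape=(32, 32, 3), num_classes=10):
    """Two conv/pool stages followed by a small dense classifier."""
    import tensorflow as tf  # local import: conv_out() does not need it
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    # Sparse loss because the labels are integer class indices, not one-hot vectors.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Trace the feature-map width through the four conv/pool layers above.
size = 32
for kernel, stride in [(3, 1), (2, 2), (3, 1), (2, 2)]:
    size = conv_out(size, kernel, stride)
print(size)  # 6, so Flatten receives 6 * 6 * 64 = 2304 values
```

Training, evaluation, the overfitting plot, and prediction then proceed as for any compiled Keras model, using `fit`, `evaluate`, and `predict` on the normalized images.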

 Compare the two models’ performance: accuracy, computational efficiency, etc.
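For the comparison, one simple approach is to record test accuracy, wall-clock training time, and parameter count for each model under the same number of epochs. The sketch below assumes each model is an already compiled Keras model with an accuracy metric; `benchmark` and `print_comparison` are illustrative names, and training time measured this way depends heavily on the Colab hardware you are assigned.

```python
import time

def benchmark(name, model, train_xs, train_ys, test_xs, test_ys, epochs=5):
    """Train one compiled model and record its accuracy, training time, and size."""
    start = time.perf_counter()
    model.fit(train_xs, train_ys, epochs=epochs, verbose=0)
    train_seconds = time.perf_counter() - start
    test_loss, test_accuracy = model.evaluate(test_xs, test_ys, verbose=0)
    return {
        "model": name,
        "params": model.count_params(),
        "train_seconds": train_seconds,
        "test_loss": test_loss,
        "test_accuracy": test_accuracy,
    }

def print_comparison(rows):
    """Print one aligned line per model so the results can be compared side by side."""
    for r in rows:
        print(f"{r['model']:<6} params={r['params']:>10,} "
              f"time={r['train_seconds']:>7.1f}s "
              f"loss={r['test_loss']:.4f} acc={r['test_accuracy']:.4f}")
```

With your own compiled models (here called `ann_model` and `cnn_model` purely as placeholders), you could collect `benchmark("ANN", ann_model, ...)` and `benchmark("CNN", cnn_model, ...)` into a list and pass it to `print_comparison`.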

Summary: Write a report of at least 300 words that briefly covers the items above and highlights the key decisions in your analysis process. For example, how did you adjust the hyperparameters?