INTRO TO DATA MINING
Week 1 Discussion:
When using different data algorithms, why is it fundamentally important to understand why they are being used?
If there are significant differences in the data output, how can this happen and why is it important to note the differences?
Who should determine which algorithm is “right” and the one to keep? Why?
Week 2 Discussion:
What is knowledge discovery in databases (KDD)?
Review section 1.2 and review the various motivating challenges. Select one and note what it is and why it is a challenge.
Note how data mining integrates with the components of statistics and AI, ML, and pattern recognition.
Note the difference between predictive and descriptive tasks and the importance of each.
Week 3 Discussion:
Chapter 2:
What is an attribute, and why is it important?
What are the different types of attributes?
What is the difference between discrete and continuous data?
Why is data quality important?
What occurs in data preprocessing?
In section 2.4, review the measures of similarity and dissimilarity, select one topic and note the key factors.
Week 4 Discussion:
Note the basic concepts in data classification.
Discuss the general framework for classification.
What is a decision tree and decision tree modifier? Note the importance.
What is a hyper-parameter?
Note the pitfalls of model selection and evaluation.
Week 4 Essay Work:
What were the traditional methods of data collection in the transit system?
Why are the traditional methods insufficient in satisfying the requirement of data collection?
Give a synopsis of the case study and your thoughts on the optimization and performance measurement requirements and their impact on the expensive, labor-intensive nature of data collection.
Week 5 Discussion:
What are the various types of classifiers?
What is a rule-based classifier?
What is the difference between nearest neighbor and naïve Bayes classifiers?
What is logistic regression?
Week 6 Homework:
Review the article by Hemmatian (2019) on classification techniques. In essay format, answer the following questions:
What were the results of the study?
Note what opinion mining is and how it’s used in information retrieval.
Discuss the various concepts and techniques of opinion mining and their importance in transforming an organization’s NLP framework.
Week 7 Discussion:
What is the association rule in data mining?
Why is the association rule especially important in big data analysis?
How does the association rule allow for more advanced data interpretation?
Week 8 Discussion:
When thinking about data visualization, it is important to understand regular expressions in data analytics. Therefore, note the importance of data visualizations, then choose two types of expressions (wildcards such as *, for example) and discuss the difference between them.
Week 9 Discussion:
What are the techniques in handling categorical attributes?
How do continuous attributes differ from categorical attributes?
What is a concept hierarchy?
Note the major patterns of data and how they work.
Week 10 Discussion:
What is K-means from a basic standpoint?
What are the various types of clusters and why is the distinction important?
What are the strengths and weaknesses of K-means?
What is a cluster evaluation?
Week 10 Homework:
After reviewing this week’s case study by Krizanic (2020), answer the following questions in essay format.
What is the definition of data mining that the author mentions? How is this different from our current understanding of data mining?
What is the premise of the use case and findings?
What types of tools are used in the data mining aspect of the use case, and how are they used?
Were the tools used appropriate for the use case? Why or why not?
Week 11 Discussion:
In Chapter 8 we focus on cluster analysis. After reading the chapter, answer the following questions:
What are the characteristics of data?
Compare the following clustering types: prototype-based, density-based, and graph-based.
What is a scalable clustering algorithm?
How do you choose the right algorithm?
Week 12 Homework:
After reviewing Chapter 9 on anomaly detection this week, answer the following questions in essay format.
What are the characteristics of anomaly detection?
What are the detection problems and methods?
What are the statistical approaches when there is an anomaly found?
Compare and contrast proximity-based and clustering-based approaches.
Week 13 Discussion:
This week we focus on the concept of false discovery in data. After reviewing the article by Naouma (2019), answer the following questions:
What is a false discovery rate?
Can a false discovery rate be completely avoided? Explain.
What was the outcome of the results of the use case?
Week 14 Homework:
Review the reading by Naouma and answer the following in essay format:
Describe what the study was about.
Discuss how random-field theory was used in the case study.
What were the results for the false discovery rate in the study?