Background:
This dataset covers a wide range of health-related information, giving an overview of physiological and lifestyle factors. It includes demographic details such as sex and age, as well as anthropometric measurements: height, weight, and waistline. The dataset also contains blood pressure (systolic and diastolic), blood components (blood sugar, cholesterol levels, triglycerides, and hemoglobin), kidney function markers (urine protein and serum creatinine), liver enzymes (SGOT_AST, SGOT_ALT, and gamma_GTP), and indicators of lifestyle habits (smoking and drinking). It is a useful resource for exploring relationships between these variables, conducting health assessments, and investigating the impact of lifestyle choices on health parameters.
By considering factors such as blood pressure (SBP, DBP), liver enzymes (SGOT_AST, SGOT_ALT, gamma_GTP), and cholesterol levels (tot_chole, HDL_chole, LDL_chole), we can gain insights into the impact of drinking (DRK_YN) on overall health. This analysis allows us to identify trends and potential health risks associated with drinking habits, such as increased liver stress, cardiovascular issues, and metabolic irregularities. By examining additional variables like smoking status (SMK_stat_type_cd), age, and gender, we can explore how these factors interact with drinking to influence health outcomes, providing a deeper understanding of the challenges and risks related to lifestyle choices.
Examples of investigative questions include:
- Which age group shows the highest levels of total cholesterol (tot_chole) among drinkers (DRK_YN)?
- Does drinking status (DRK_YN) correlate with liver enzyme levels (SGOT_AST, SGOT_ALT)?
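The two example questions can be sketched in a few lines of plain Python. This is only an illustration of the logic, not the required KNIME or Tableau workflow. The rows are made-up toy values, the "Y"/"N" coding of DRK_YN is an assumption, and the helper names `mean_by_age` and `pearson` are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Made-up illustrative rows, NOT taken from the Health-*.csv files.
# Assumption: DRK_YN is coded "Y"/"N" and age is the lower bound of a 5-year band.
rows = [
    {"age": 30, "DRK_YN": "Y", "tot_chole": 180, "SGOT_AST": 30},
    {"age": 30, "DRK_YN": "N", "tot_chole": 170, "SGOT_AST": 20},
    {"age": 45, "DRK_YN": "Y", "tot_chole": 210, "SGOT_AST": 40},
    {"age": 45, "DRK_YN": "Y", "tot_chole": 190, "SGOT_AST": 30},
    {"age": 60, "DRK_YN": "N", "tot_chole": 200, "SGOT_AST": 20},
    {"age": 60, "DRK_YN": "Y", "tot_chole": 195, "SGOT_AST": 40},
]


def mean_by_age(records, value_col):
    """Average value_col within each age group."""
    groups = defaultdict(list)
    for record in records:
        groups[record["age"]].append(record[value_col])
    return {age: mean(values) for age, values in groups.items()}


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Question 1: which age group has the highest mean tot_chole among drinkers?
drinkers = [r for r in rows if r["DRK_YN"] == "Y"]
chole_by_age = mean_by_age(drinkers, "tot_chole")
top_age = max(chole_by_age, key=chole_by_age.get)

# Question 2: does drinking status correlate with SGOT_AST?
drk = [1 if r["DRK_YN"] == "Y" else 0 for r in rows]
ast = [r["SGOT_AST"] for r in rows]
r_ast = pearson(drk, ast)

print("Age group with highest mean tot_chole among drinkers:", top_age)
print("Correlation between DRK_YN and SGOT_AST:", round(r_ast, 3))
```

The same pattern applies to SGOT_ALT by swapping the column name. A correlation on toy rows says nothing about the real data, and on the real files a group comparison (for example, side-by-side box plots) is a useful complement because liver enzyme values tend to be skewed.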
Sources:
https://www.kaggle.com/code/mcpenguin/smoking-drinking-prediction-tfdf71/notebook?scriptVersionId=143235036
Objectives:
- Data Cleaning: To perform data cleaning to prepare the dataset for further analysis.
- Exploratory Data Analysis (EDA): To conduct exploratory data analysis to gain statistical insights into the dataset. Key activities include gathering statistical summaries, plotting box plots and histograms for numerical variables, and creating visual charts for categorical data types. A correlation matrix for all numerical variables should also be included.
- Formulating Investigative Questions or Hypotheses: To propose preliminary investigative questions or hypotheses based on the dataset. Use data visualization techniques to explore and answer these questions or hypotheses. Go beyond the initial findings to explore specific scenarios in more depth, uncovering additional insights.
- Data Transformation: To perform data transformations based on insights gained from the EDA. This may include outlier removal and aggregation to improve data quality.
- Model Selection and Evaluation: To apply Linear and Logistic Regression models, using SGOT_AST (a continuous variable) as the target for Linear Regression and DRK_YN (a binary variable) as the target for Logistic Regression. Assess and discuss the accuracy of each model.
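The cleaning, transformation, and modelling objectives above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the KNIME workflow the reports require: the values are made-up toy data, gamma_GTP is chosen as the single predictor purely for illustration, DRK_YN is assumed to be coded "Y"/"N", and `fit_linear` and `fit_logistic` are hypothetical helpers.

```python
from math import exp
from statistics import mean, pstdev, quantiles

# Made-up toy values for illustration only; not taken from the Health-*.csv files.
gamma_gtp = [20, 25, 30, 35, 40, 45, 50, None, 400]
sgot_ast = [18, 21, 24, 27, 30, 33, 36, 25, 90]
drk_yn = ["N", "N", "N", "Y", "N", "Y", "Y", "N", "Y"]

# Data cleaning: drop rows where the predictor is missing.
rows = [(g, a, d) for g, a, d in zip(gamma_gtp, sgot_ast, drk_yn) if g is not None]

# Data transformation: drop rows outside the 1.5 * IQR fences on gamma_GTP.
q1, _, q3 = quantiles([g for g, _, _ in rows], n=4)
iqr = q3 - q1
rows = [r for r in rows if q1 - 1.5 * iqr <= r[0] <= q3 + 1.5 * iqr]

xs = [g for g, _, _ in rows]
ast = [a for _, a, _ in rows]
drk = [1 if d == "Y" else 0 for _, _, d in rows]


def fit_linear(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def fit_logistic(x, y, lr=0.1, steps=2000):
    """One-predictor logistic regression fitted by batch gradient descent."""
    w = b = 0.0
    n = len(x)
    for _ in range(steps):
        probs = [1 / (1 + exp(-(w * xi + b))) for xi in x]
        w -= lr * sum((p - yi) * xi for p, yi, xi in zip(probs, y, x)) / n
        b -= lr * sum(p - yi for p, yi in zip(probs, y)) / n
    return w, b


# Linear Regression: SGOT_AST as target, judged by R-squared.
slope, intercept = fit_linear(xs, ast)
fitted = [slope * x + intercept for x in xs]
ss_res = sum((y - f) ** 2 for y, f in zip(ast, fitted))
ss_tot = sum((y - mean(ast)) ** 2 for y in ast)
r_squared = 1 - ss_res / ss_tot

# Logistic Regression: DRK_YN as target, judged by classification accuracy.
mu, sd = mean(xs), pstdev(xs)
zs = [(x - mu) / sd for x in xs]  # standardize so gradient descent behaves well
w, b = fit_logistic(zs, drk)
predicted = [1 if w * z + b > 0 else 0 for z in zs]
accuracy = sum(p == y for p, y in zip(predicted, drk)) / len(drk)

print("rows kept:", len(rows))
print("linear: slope", round(slope, 3), "intercept", round(intercept, 3), "R^2", round(r_squared, 3))
print("logistic: weight", round(w, 3), "bias", round(b, 3), "accuracy", round(accuracy, 3))
```

Both scores here are computed on the same rows used for fitting, which overstates performance. A real assessment would hold out a test partition and would normally use several predictors.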
Additional Notes:
- Complete Objectives 1, 2, and 3, and compile your findings into Report 1, which should be no more than 20 pages.
- For Objective 2, analyze all variables in the dataset and provide evidence of your work in both KNIME and Tableau. In the report, show the statistics table, the two most noteworthy box plots, histograms, and pie charts (two of each), and the linear correlation matrix.
- You do not need to answer the preliminary investigative questions in Report 1; answer them in Report 2. You may use AI tools to help generate relevant questions if needed. For Objective 3, propose at least two two-variable questions and three three-variable (further insights) questions, ensuring they differ from those in the Background section.
- For Report 2, perform the necessary data transformations following your EDA and use the transformed data to address the investigative questions. Copy both the investigative questions from the Background section and the questions you proposed in Report 1, and provide an answer for each one. Additionally, include the Linear and Logistic Regression model analysis and conclude with a reflection.
- Reflection: In your reflection, evaluate the dataset’s usefulness, model accuracy, and any feature enhancements (such as additional features) that could improve the model’s predictive accuracy. Keep Report 2 to a maximum of 20 pages.
Data Dictionary (variable descriptions):
Variable | Description |
---|---|
sex | Gender of the individual (e.g., Male or Female). |
age | Age of the individual, categorized into 5-year intervals. |
height | Height of the individual, usually in centimeters. |
weight | Weight of the individual, typically in kilograms. |
waistline | Measurement of the individual’s waistline, in centimeters, indicating abdominal fat. |
SBP | Systolic Blood Pressure, measuring the pressure in arteries when the heart beats (mmHg). |
DBP | Diastolic Blood Pressure, measuring the pressure in arteries between heartbeats (mmHg). |
BLDS | Blood Sugar level, typically measured in mg/dL indicating blood glucose concentration. |
tot_chole | Total Cholesterol level, measuring the overall cholesterol in blood (mg/dL). |
HDL_chole | High-Density Lipoprotein (HDL) Cholesterol, often referred to as “good” cholesterol (mg/dL). |
LDL_chole | Low-Density Lipoprotein (LDL) Cholesterol, often called “bad” cholesterol (mg/dL). |
triglyceride | Level of triglycerides, a type of fat in the blood, usually in mg/dL. |
hemoglobin | Hemoglobin concentration, an indicator of oxygen-carrying capacity in the blood (g/dL). |
urine_protein | Presence of protein in urine, indicating possible kidney issues; usually coded as a categorical value. |
serum_creatinine | Serum creatinine level, indicating kidney function (mg/dL). |
SGOT_AST | Aspartate Aminotransferase (AST), a liver enzyme used to assess liver health (U/L). |
SGOT_ALT | Alanine Aminotransferase (ALT), another liver enzyme indicating liver health (U/L). |
gamma_GTP | Gamma-Glutamyl Transferase (GGT), an enzyme indicating liver and bile duct function (U/L). |
SMK_stat_type_cd | Smoking Status: 1 never smoked, 2 used to smoke but quit, 3 still smoking. |
DRK_YN | Drinking Status (Yes/No), indicating whether the individual consumes alcohol. |
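As a first cleaning check, the dictionary above can be turned into a small schema test. The sketch below is illustrative only: the inline CSV text is made up, the names `EXPECTED_COLUMNS`, `missing_columns`, and `smoking_label` are hypothetical, and it assumes the smoking code may be stored either as `1` or as `1.0`.

```python
import csv
import io

# Column names exactly as listed in the data dictionary.
EXPECTED_COLUMNS = [
    "sex", "age", "height", "weight", "waistline", "SBP", "DBP", "BLDS",
    "tot_chole", "HDL_chole", "LDL_chole", "triglyceride", "hemoglobin",
    "urine_protein", "serum_creatinine", "SGOT_AST", "SGOT_ALT", "gamma_GTP",
    "SMK_stat_type_cd", "DRK_YN",
]

# Smoking codes as described in the data dictionary.
SMOKING_LABELS = {1: "never smoked", 2: "used to smoke but quit", 3: "still smoking"}


def missing_columns(header):
    """Return the dictionary columns that are absent from a file header."""
    return [col for col in EXPECTED_COLUMNS if col not in header]


def smoking_label(code):
    """Map a raw SMK_stat_type_cd value (e.g. '1' or '1.0') to its label."""
    return SMOKING_LABELS.get(int(float(code)), "unknown")


# Made-up sample text standing in for an assigned file; a real run would open the CSV.
sample = "sex,age,SMK_stat_type_cd,DRK_YN\nMale,35,1,Y\nFemale,50,3.0,N\n"
reader = csv.DictReader(io.StringIO(sample))
missing = missing_columns(reader.fieldnames)
labels = [smoking_label(row["SMK_stat_type_cd"]) for row in reader]

print("missing columns:", missing)
print("smoking labels:", labels)
```

Decoding SMK_stat_type_cd into labels before charting keeps it from being treated as a numeric variable in histograms or the correlation matrix.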
Data Assigned:
S/N | Data File Assigned (Tick) |
---|---|
1 | Health-1.csv |
2 | Health-2.csv |
3 | Health-3.csv |
4 | Health-4.csv |
5 | Health-5.csv |