Advanced Statistical Methods with SPSS
(Statistics session for staff and PhD students)
Graduate School, Staffordshire University

Asad (Dr Md Asaduzzaman)
Department of Engineering
md.asaduzzaman@staffs.ac.uk
www.mdasad.com
17 March, 2021

Outline

- Introduction: SPSS
- Correlation & Regression with SPSS
- ANOVA and ANCOVA
- Logistic regression

Preliminaries

Session plan:
- SPSS introduction
- Correlation and multiple linear regression with SPSS
- Analysis of variance (ANOVA): one-way, two-way and ANCOVA with SPSS
- Logistic regression with SPSS

Assumptions: the session assumes some prior statistics knowledge (correlation, regression, one-way/two-way ANOVA and logistic regression).

What is SPSS?

SPSS is short for Statistical Package for the Social Sciences, and it is used by various kinds of researchers for complex statistical data analysis. The SPSS software package was created for the management and statistical analysis of social science data.
- Data reading/entry, import and handling are very easy (Text, CSV and Excel files can be imported easily).
- It has built-in data manipulation tools for computing, recoding and transforming variables.
- Advanced statistical analysis and model fitting can be performed easily.
- Output can be exported or transferred easily into Word or other word-processing software.
- Staffordshire University has the full version of SPSS, and the software licence is updated every year.

[Screenshot: SPSS blank data editor, Data view]

[Screenshot: SPSS blank data editor, Variable view]

[Screenshot: SPSS data file view]

Some basic operations in SPSS

- The other two important windows are the Syntax window and the Output window. Many users prefer the menus to syntax.
- Data manipulation can be performed easily with lots of options. Some frequently used options are Compute, Recode, Select Cases, Split File, etc. These options can be found under the Data and Transform menus.
- Analysis menus can be found under the Analyze tab.
- Our focus today:
  - Correlation: Analyze → Correlate → Bivariate
  - Linear regression: Analyze → Regression → Linear
  - ANOVA (1-way): Analyze → Compare Means → One-Way ANOVA
  - ANOVA (2-way & ANCOVA): Analyze → General Linear Model → Univariate
  - Logistic regression: Analyze → Regression → Binary Logistic

A quick demo on SPSS.

Correlation & Regression

Correlation: simply measures the strength of association between variables. In statistical terms, correlation (r) denotes the linear relationship between two quantitative variables: as one increases, the other tends to increase (positive correlation) or decrease (negative correlation).
- Varies between -1 and +1.
- A scatter diagram is a useful visual tool to explore correlation.

Correlation (continued)

Some basic examples of correlation:
- age and height
- advertisement spending and product sales
- amount of fertiliser used and crop yield
- IQ score and exam mark
- car mileage and car price
- item price and demand

Correlation (continued)

A quick demo of correlation in SPSS with the Employee.sav dataset.
The data file contains information on 474 employees:
- id, gender, birth date, education level (in years), job category (managerial, clerical, custodial), current salary, beginning salary, months since hire, previous experience in months, minority classification (yes/no)

Scatter plot: Graphs → Legacy Dialogs → Scatter/Dot
Correlation: Analyze → Correlate → Bivariate

Correlation: scatter plot

- Explore the relationship between the variables: education level, current salary, beginning salary, months since hire, previous experience in months
- Plot scatter diagrams
- Obtain the correlation coefficients
- Check whether the correlation coefficients are significant

SPSS demo on scatter plot and correlation:
Scatter plot: Graphs → Legacy Dialogs → Scatter/Dot
Correlation: Analyze → Correlate → Bivariate

[SPSS output: correlation matrix]

[SPSS output: correlation, test of significance]
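
The correlation coefficient and its significance test described above can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure itself: the data values are made up (they are not taken from Employee.sav), and the function names `pearson_r` and `t_statistic` are hypothetical helpers. The t statistic would be compared against a t distribution with n - 2 degrees of freedom to obtain the p-value that SPSS reports.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing H0: no linear correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical values for illustration only (not from Employee.sav)
education_years = [8, 12, 12, 15, 16, 16, 19]
salary_thousands = [24, 27, 30, 35, 41, 40, 60]

r = pearson_r(education_years, salary_thousands)
t = t_statistic(r, len(education_years))
print(f"r = {r:.3f}, t = {t:.2f} on {len(education_years) - 2} df")
```

A value of r close to +1 or -1 together with a large absolute t indicates a strong and statistically significant linear association.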

Regression: introduction

Regression is a statistical technique for investigating and modelling the relationship between variables; more specifically, for estimating the effect of a set of variables (explanatory or independent variables) on the response variable (dependent variable).

Applications of regression are numerous and occur in almost every field, including:
- engineering
- physical sciences
- economics
- business & management
- biological sciences
- social sciences

In fact, regression analysis is one of the most widely used statistical techniques.

Typical regression examples

One may be interested in estimating the effect of
- age on height (plants, human beings)
- advertisement spending on product sales
- amount of fertiliser used on crop yield
- IQ score on exam mark
- car mileage on car price
- item price on demand

In multiple linear regression, you may want to estimate the effect of several variables simultaneously.

Typical regression examples (continued)

However, the analysis, and in particular the choice of variables (the dependent variable and the set of explanatory variables), will depend on your specific research objective. In medical studies, age is a common explanatory variable.

However, age may be the dependent variable in many cases. For instance,
- a botanist may be interested in predicting the age of trees based on their heights and other factors
- an archaeologist may want to determine the age of a historic site based on a number of explanatory variables

Regression model: setup

A multiple linear regression model:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$

- $Y$: dependent/response variable
- $X_1, X_2, \ldots, X_p$: independent/explanatory variables
- $\beta_0$: intercept
- $\beta_1, \beta_2, \ldots, \beta_p$: slopes, or effects of the variables
- $\varepsilon$: error term

Assumptions:
- linear relationship
- the error terms $\varepsilon$ are independent and normally distributed with mean 0 and constant variance

Regression analysis strategy

First step:
- investigate the relationships among the variables, particularly between the response variable and the other variables, using correlation analysis
- if you find a reasonable indication that a linear regression of the response variable on the other variables is suitable, then perform the analysis

Second step:
- Perform the regression analysis, selecting the variables (response and explanatory) appropriately
- Check whether the error assumptions are met (independent or randomly scattered, constant variance)
- Check the normality of the errors
- Check whether any multicollinearity exists
- Find the best set of explanatory variables (only significant variables: the final model)
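
To make the model setup above concrete, here is a minimal sketch of least-squares fitting for the single-predictor special case (p = 1), where the intercept and slope have simple closed forms. It is an illustration under stated assumptions, not what SPSS runs internally: the function names and the data are hypothetical, and a real multiple regression with several explanatory variables needs matrix algebra or a statistics package.

```python
def fit_simple_linear(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1 * x + error."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx             # slope: effect of x on y
    b0 = mean_y - b1 * mean_x  # intercept
    return b0, b1


def r_squared(x, y, b0, b1):
    """Proportion of the variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: fertiliser amount (x) and crop yield (y)
fertiliser = [10, 20, 30, 40, 50, 60]
crop_yield = [22, 29, 41, 48, 62, 66]

b0, b1 = fit_simple_linear(fertiliser, crop_yield)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(fertiliser, crop_yield)]
print(f"intercept = {b0:.2f}, slope = {b1:.3f}")
print(f"R squared = {r_squared(fertiliser, crop_yield, b0, b1):.3f}")
```

The residuals computed at the end are the quantities whose independence, constant variance and normality are checked in the second step of the strategy.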

Regression example: Employee.sav data

Information on:
- id, gender, birth date, education level (in years), job category (managerial, clerical, custodial), current salary, beginning salary, months since hire, previous experience in months, minority classification (yes/no)

Our research question is:
- Are the factors gender, education level, job category, beginning salary, months since hire, previous experience in months and minority classification significant for salary change?
- If so, can we predict the salary change for a person based on the set of explanatory variable values for that person?

Regression analysis with Employee.sav data

- First, compute salary change (salchange) = Current salary - Beginning salary

[SPSS output: descriptive statistics of salchange]

[SPSS output: exploring the relation between variables]

[SPSS output: testing significance of correlation coefficients]

Tests for other categorical variables

- Perform a t-test of whether salary change differs significantly by gender and by minority classification (two-category variables)
- Perform a one-way ANOVA to test whether salary change differs significantly by job category
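
The one-way ANOVA suggested above for job category compares the variation between group means with the variation within groups. The sketch below computes the F statistic by hand; the salary-change values are invented for illustration (they are not from Employee.sav), and `one_way_anova_f` is a hypothetical helper name. SPSS additionally reports the p-value, which requires the F distribution and is not computed here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: how far group means are from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical salary changes (in thousands) for three job categories
clerical = [9.5, 11.0, 12.5, 10.0, 13.0]
custodial = [14.0, 15.5, 13.5, 16.0]
managerial = [30.0, 36.5, 41.0, 33.5, 38.0]

f_value, df1, df2 = one_way_anova_f([clerical, custodial, managerial])
print(f"F = {f_value:.2f} on ({df1}, {df2}) df")
```

A large F relative to the F distribution with the printed degrees of freedom suggests that at least one group mean differs from the others.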

Performing MLR analysis: variable setup in SPSS

Categorical variables need to be re-generated as dummy variables:
- Variables with two categories, like gender and minority, can easily be coded as 1 and 0.
- The job category variable has three categories. Therefore, two dummy variables need to be created:

                               Dummy variables
Job category                   Custodial   Managerial
Clerical (baseline category)   0           0
Custodial                      1           0
Managerial                     0           1

Performing MLR analysis

Linear regression: Analyze → Regression → Linear

Regression SPSS demo

Look at:
- Model Summary
- ANOVA table
- Coefficients

Further checks on:
- error assumptions: independent or randomly scattered, constant variance
- normality of the errors
- any multicollinearity
- any outlier or influential observation
- variable selection

Regression SPSS demo (continued)

Issues:
- Errors are not randomly scattered
- Variance is not constant
- Error distribution is not normal

Remedial measures:
- There are several ways to solve these issues
- One simple way is to transform the response variable (salary change)
- We will apply a natural logarithm transformation and re-run the analysis

No multicollinearity is observed: the VIF for every variable is found to be between 1 and 10.
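
The dummy coding of job category and the natural-log transformation of salary change described above can be sketched as follows. In SPSS these steps would normally be done through Transform (Recode and Compute); the Python below only illustrates the logic. The function names and the example salaries are hypothetical.

```python
import math

BASELINE = "Clerical"                       # baseline category: both dummies are 0
DUMMY_LEVELS = ("Custodial", "Managerial")  # one dummy variable per non-baseline level


def dummy_code(job_category):
    """Return the two 0/1 dummy variables for a job category."""
    if job_category != BASELINE and job_category not in DUMMY_LEVELS:
        raise ValueError(f"unknown job category: {job_category!r}")
    return {level: int(job_category == level) for level in DUMMY_LEVELS}


def log_salary_change(current_salary, beginning_salary):
    """Natural log of salary change; only defined for a positive change."""
    change = current_salary - beginning_salary
    if change <= 0:
        raise ValueError("log transformation needs a positive salary change")
    return math.log(change)


for category in ("Clerical", "Custodial", "Managerial"):
    print(category, dummy_code(category))

# Hypothetical salaries, for illustration only
print(round(log_salary_change(40000, 25000), 3))
```

Note that the log transformation is undefined for zero or negative salary changes, so such cases would need separate handling before re-running the analysis.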

Regression SPSS demo (continued)

Identifying outliers: an observation is a potential outlier if its error (residual) is unusually large.

Identifying influential observations:
- Difference in Fits (DFF): an observation is deemed influential if the absolute value of its DFF value is greater than

$$2\sqrt{\frac{p+2}{n-p-2}} = 2\sqrt{\frac{7+2}{474-7-2}} \approx 0.278$$

  where $p$ is the number of explanatory variables and $n$ is the total number of observations.
- Cook's distance: if greater than 0.5, the observation may be influential; if greater than 1, or far apart from the other values, it is quite likely to be influential.
- Leverage: a common rule is to flag any observation whose leverage value is more than 3 times the mean leverage value, p/n = 7/474 = 0.0148 (so the cut-off is about 0.044).

Choosing the best set of explanatory variables

SPSS options:
- Enter: forces all variables to be in the model
- Stepwise: variables are entered one at a time, and at each step variables already in the model are removed if they are no longer significant
- Remove: all variables in a block are removed in a single step
- Backward: all variables are entered into the equation and then sequentially removed based on the smallest partial correlation
- Forward: variables are added based on the highest correlation/partial correlation

A demo with different selection methods for the Employee.sav data.

ANOVA

Analysis of variance (ANOVA) is a statistical technique that is used to check whether the means of three or more groups are significantly different from each other. ANOVA checks the impact of one or more factors by comparing the means of different samples.
In one-way ANOVA, we consider only one factor (with three or more categories).

Some examples:
- whether different varieties of crops give different amounts of production
- whether different levels of factors affect plants and wildlife
- whether different types of promotions, store layouts, advertisement tactics, etc. lead to different sales
- whether or not different medications affect patients differently

ANOVA assumptions

Assumptions:
- Independence of observations
- Normally distributed response variable
- Homogeneity of variance

If the assumptions are not satisfied, we can use non-parametric approaches.

One-way ANOVA: Diet.sav dataset

Variables:
- Person, Gender, Age, Height, Preweight, Diet, Weight6weeks

One-way ANOVA: we shall now consider only diet and weight loss (= Preweight - Weight6weeks). Also, we will assume that the experiment was conducted with a homogeneous cohort of people (no other extraneous source of variation involved).

Research interest:
- Our main goal is to find whether the different diets have had an impact on weight loss; more specifically, whether the different diets had different impacts
- If so, which diet differs from the others

[SPSS output: pre-analysis descriptives and assumption checks]

ANOVA (1-way): Analyze → Compare Means → One-Way ANOVA

Demo of ANOVA with Diet.sav data

ANOVA outputs:
- Tests of homogeneity of variances
- ANOVA table
- Post Hoc test table (multiple comparisons)

Two-way ANOVA with Diet.sav data

In two-way ANOVA, you add an extra source of variation that you believe is present, referred to as a "blocking" effect (here gender is added as the block).

ANOVA (2-way): Analyze → General Linear Model → Univariate

Two-way ANOVA (continued)

Options:
- Model (selection)
- Plots (interaction plot)
- Post Hoc tests (between different levels of the factor)
- Options (residual plot)

Note: you still need to check the assumptions (normality and homogeneity of variance) as you did for one-way ANOVA.

Two-way ANOVA with Diet.sav data (continued)

Outputs:
- ANOVA table (Tests of between-subjects effects)
- Estimated marginal means
- Post Hoc test
- Profile plot (for checking interaction)

ANCOVA

ANCOVA is similar to traditional ANOVA but is used to detect a difference in the means of three or more independent groups whilst controlling for scale covariates.

Difference from MLR: the research objective (in MLR the focus is on all explanatory factors).
- Performed in exactly the same way as you have seen for two-way ANOVA:
  Analyze → General Linear Model → Univariate
- Model assumptions, selection and SPSS options are exactly the same, other than selecting the covariates

ANCOVA (continued)

ANCOVA: Analyze → General Linear Model → Univariate

ANCOVA: demo

A quick demo of ANCOVA with Diet.sav data.

Logistic regression: introduction

In MLR, we have seen that the response (dependent) variable
- is continuous, and
- takes values between $-\infty$ and $+\infty$

Now consider that you want to find the significant factors associated with
- developing lung cancer (Yes/No): age, gender, ethnicity, occupation, smoking status, family history, etc.
- whether a customer would default (Yes/No): age, gender, ethnicity, occupation, income group, number of family members, history, etc.
- preference for Apple's iPhone (Yes/No): age, gender, ethnicity, occupation, income group, region, etc.

Logistic regression (continued)

In each example, the response variable has two outcomes: Yes and No. Therefore, MLR cannot be applied.
- We can apply the logistic regression model
- The model is, more specifically, referred to as the "binary logistic regression model"
- The functional form of the model is given by:

$$P(Y = 1) = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)}$$

- We don't have to understand the complex form, but it is worth noting that the "Yes" and "No" outcomes are modelled through a probabilistic mechanism

Logistic regression (continued)

Some good aspects:
- fewer assumptions than multiple linear regression (MLR)
- parameters are easy to interpret using odds ratios
- SPSS implementation is much easier, even easier than MLR
- significance testing of factors and model selection options in SPSS are similar to MLR (though the mathematical setup is different)

A quick example: heart disease incidence

- Response variable: incidence of heart disease (Yes/No)
- Explanatory variables: age (in years), weight (in kg), gender (male = 1, female = 0), VO2max (maximal aerobic capacity)

Logistic regression: Analyze → Regression → Binary Logistic

[SPSS screenshots and output: heart disease incidence example]

Logistic regression: SPSS demo with CHD.sav dataset

Logistic regression: Analyze → Regression → Binary Logistic
- Run the model
- See the model result with the "Enter" method
- Find the most suitable model using forward/backward selection (Conditional, LR or Wald)

Learning summary and practical

- Basics of SPSS (brief)
- Correlation and regression
- ANOVA (one-way and two-way) and ANCOVA
- Logistic regression

Practical session:
- Three datasets: Employee.sav, Diet.sav and CHD.sav
- You will do exactly what we have done so far
- Perform correlation analysis and develop a regression model for salary increase with the relevant explanatory variables
- Perform one-way and two-way ANOVA and ANCOVA for the research questions we have discussed for the diet data
- Perform a logistic regression analysis to identify factors for heart disease and find the most suitable model with the CHD data

Last thing

Feedback and future sessions:
- A survey form will be circulated soon; please complete the survey.
- It will help us to conduct future sessions with specific topics and software training.