1. Continue using the Most-Recent-Cohorts-Scorecard-Elements.csv dataset from https://catalog.data.gov/dataset/college-scorecard.
2. Create a subset data frame where State Abbreviation = CA. (Note: remove all NA values.)
3. Compute the standard scores (z-scores) for the SAT_AVG variable; include the mean and SD.
4. Repeat steps 2-3 for Pennsylvania (PA).
5. Calculate the standard error of the mean (SEM) of SAT_AVG for both CA and PA.
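The subsetting, standard-score, and SEM steps above can be sketched in R as follows. This is a minimal sketch under stated assumptions, not a required solution: it assumes the state abbreviation and average SAT columns are named STABBR and SAT_AVG, that missing values are coded as "NULL" in the CSV, and that "remove all NA values" means dropping rows with a missing SAT_AVG. The helper name sat_summary and the toy data are hypothetical, included only so the example runs without the file.

```r
# To use the real file (assumed column names and missing-value coding):
# scorecard <- read.csv("Most-Recent-Cohorts-Scorecard-Elements.csv",
#                       na.strings = c("NULL", "NA", ""),
#                       stringsAsFactors = FALSE)

# Toy stand-in data (made-up values) so the sketch is self-contained.
scorecard <- data.frame(
  INSTNM  = c("School A", "School B", "School C",
              "School D", "School E", "School F"),
  STABBR  = c("CA", "CA", "CA", "PA", "PA", "PA"),
  SAT_AVG = c(1000, 1200, NA, 1100, 1300, 1150),
  stringsAsFactors = FALSE
)

# Hypothetical helper: subset one state, drop missing SAT_AVG,
# then compute mean, SD, z-scores, and the standard error of the mean.
sat_summary <- function(df, state) {
  sub <- df[!is.na(df$STABBR) & df$STABBR == state, ]
  sub <- sub[!is.na(sub$SAT_AVG), ]
  m <- mean(sub$SAT_AVG)
  s <- sd(sub$SAT_AVG)                  # sample SD (n - 1 denominator)
  sub$SAT_Z <- (sub$SAT_AVG - m) / s    # standard score for each school
  list(data = sub, mean = m, sd = s, n = nrow(sub),
       sem = s / sqrt(nrow(sub)))       # SEM = SD / sqrt(n)
}

ca <- sat_summary(scorecard, "CA")
pa <- sat_summary(scorecard, "PA")

# Schools with the lowest and highest average SAT in each subset.
ca$data[which.min(ca$data$SAT_AVG), c("INSTNM", "SAT_AVG", "SAT_Z")]
ca$data[which.max(ca$data$SAT_AVG), c("INSTNM", "SAT_AVG", "SAT_Z")]

# Side-by-side comparison of the two states.
data.frame(state = c("CA", "PA"),
           n     = c(ca$n, pa$n),
           mean  = c(ca$mean, pa$mean),
           sd    = c(ca$sd, pa$sd),
           sem   = c(ca$sem, pa$sem))
```

Because the SEM divides the SD by the square root of n, a state with more reporting schools will tend to show a smaller standard error even when the spread of scores is similar, which is worth keeping in mind when comparing the two states.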
In a 3- to 5-page APA formatted paper, address the following:
1. Describe which colleges have the highest and lowest average SAT scores and why.
2. How do the standard errors of the two states differ?
3. Compare California and Pennsylvania schools.
4. Include all R code as an appendix.
___________________________________________________________________________
In the final week of this course, you will review, analyze, and descriptively report on a dataset.
Use the CMS.gov information:
https://data.cms.gov/provider-data/dataset/99ue-w85f
Open the dataset in R. Perform all descriptive analyses needed to describe the distribution of the variables in the dataset. If any data transformation or manipulation is needed, make these modifications.
In a 10- to 15-page APA formatted paper, using a minimum of two scholarly resources, describe the data in preparation for more advanced analysis. Be sure to include:
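One way to begin the descriptive analysis is sketched below in R. It is only an illustration: the helper describe_numeric, the toy data frame, and its column names (provider, rating, score) are hypothetical and are not taken from the CMS dataset, whose actual variables should be inspected first with str() or summary() after loading the file.

```r
# Hypothetical helper: per-variable descriptive statistics
# for every numeric column of a data frame.
describe_numeric <- function(df) {
  num <- df[sapply(df, is.numeric)]
  data.frame(
    variable = names(num),
    n        = sapply(num, function(x) sum(!is.na(x))),
    missing  = sapply(num, function(x) sum(is.na(x))),
    mean     = sapply(num, mean, na.rm = TRUE),
    sd       = sapply(num, sd, na.rm = TRUE),
    median   = sapply(num, median, na.rm = TRUE),
    min      = sapply(num, min, na.rm = TRUE),
    max      = sapply(num, max, na.rm = TRUE),
    row.names = NULL
  )
}

# Toy stand-in data (made-up values and column names).
toy <- data.frame(
  provider = c("P1", "P2", "P3", "P4"),
  rating   = c(3, 5, NA, 4),
  score    = c(10.5, 12.0, 9.5, 11.0),
  stringsAsFactors = FALSE
)

stats <- describe_numeric(toy)
print(stats)

# Frequency table for a categorical variable, counting missing values too.
table(toy$provider, useNA = "ifany")

# Basic visualizations; drawn only in an interactive R session.
if (interactive()) {
  hist(toy$score, main = "Distribution of score", xlab = "score")
  boxplot(toy$rating, main = "Rating", ylab = "rating")
}
```

The missing-value counts in the summary table can help justify any transformation or cleaning step, since they show how much of each variable would be affected.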
1. Title page
2. Background on the dataset
3. The methodology used to collect the data
4. Results from the descriptive analysis to include:
a. Statistics
b. Visualizations
5. Discussion of the soundness of the dataset to include (if applicable):
a. Justification of data transformation
b. Concerns due to data collection
6. All code as an appendix