
Kingsborough Community College

The City University of New York

Departments of Biological Sciences and Mathematics

 

BIO/MAT9100                      Biostatistics Computer Research Project Spring 2022                M.T. Ortiz, Ph.D.

 

Introduction and Background

This semester our BIO/MAT9100 (Biostatistics) classes will work with the Speech 29 (Voice and Articulation) and Speech 40 (Phonetics) classes to conduct a joint research project in linguistics.  The students in the speech classes will measure speech sounds in some way in two or more groups of subjects.  We will look at properties of those speech sounds, conduct statistical analysis of the data generated, draw conclusions based on the results obtained, and share those findings with students in the speech classes.

 

The Data We Receive

The data we receive will be in an Excel table with several columns.  Even though the columns in the tables we receive may be a little different, the columns may have titles like: SUBJECT, TYPE, CONSONANT, PLACE, VOICING, LENGTH, and DURATION.  Each of these titles is described/defined below:

 

SUBJECT

This is the list of subjects (people) who participated in the research study.  Real names are not presented in the table.  The names are coded to protect the individual’s privacy.

TYPE

This column categorizes each subject by how many languages the subject speaks and when those languages were learned.  There are three possible entries in this column:

               Monolingual = the subject has learned one language only

               Late Bilingual = a second language was learned by the subject in school

               Heritage = two or more languages were learned at home by the subject before starting school

CONSONANT

This column contains the consonant sound spoken by the subject, such as “b”, “d”, “k”.  A colon after the consonant letter indicates a long consonant.  For example, “d:” indicates a long “d” sound.

PLACE

This column indicates where in or around the mouth the spoken sound is made.  There are three possible entries in this column:

               Labial means the sound spoken is formed by the lips.

               Dorsal means the sound spoken is being made by the dorsal or back part of the tongue.

               Coronal means the sound spoken is being formed by the front part of the tongue (the tip or blade).

VOICING

This column indicates how the sound is being generated.  There are two possible entries in this column:

               Voiced means the vocal cords vibrate to make the sound, such as in sounding out “b”.

               Voiceless means the vocal cords do not vibrate to make the sound, such as in sounding out “p”.

LENGTH

This column indicates if the consonant is short or long.  There are two possible entries in this column:

               Singleton indicates a short consonant

               Geminate indicates a long consonant

This short/long contrast is seen in some languages, such as Arabic, Estonian, and Italian.

DURATION

This column provides the duration, in seconds, of the sound produced by the subject.

 

This semester there may be columns with titles such as: PITCH, MANNER, POSITION.  If we receive column titles that differ from the ones defined above, you will be given information on what these titles mean.

 

Previous work and the current study

Examples of questions that students in the speech classes have tried to answer in previous semesters are:

“Is there a difference in sound duration between native speakers and non-native speakers of a language?”

“Which group is more similar in sound duration to Monolingual subjects, the Late Bilingual or the Heritage subjects?”

This semester, the students in the speech classes will seek to answer the following questions:

1.      Does the pitch of males’ and females’ voices vary depending on the culture they mainly identify with?

2.      Are the vowels of the Brooklyn variety of North American English different from those of “standard North American English”?

3.      Are bi- and multilingualism associated with a decrease in speech fluency?

4.      Are there gender differences in speech rate?

5.      How are the sounds of the X variety of English different from the Y variety of English? (Here we could compare any two different varieties, for instance Jamaican English and Spanish-accented English.)

Note: We will have a specific question for you to answer well before Part 2 of this project is due.

What you will do

Your project will be divided into two parts:

Part 1 – Analysis using data from a previous semester

Part 2 – Research using data recorded this semester

Part 1 – Analysis using data from a previous semester

a.      Select two subjects from the “91 Previous Data” Excel table.  Place the data from those two subjects into a separate Excel table that you will submit with your project.
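For students who prefer to script step a rather than copy rows by hand, the selection can be sketched in Python. This is only an illustration under stated assumptions: the table is assumed to have been saved from Excel as CSV text, the column names follow the titles defined above, and the subject codes (S01, S02, S03), the duration values, and the function name `durations_by_subject` are made up for this example.

```python
import csv
import io

# Hypothetical sample in CSV form; the real table would be exported from Excel.
# Subject codes and durations below are invented for illustration only.
SAMPLE_CSV = """SUBJECT,TYPE,CONSONANT,PLACE,VOICING,LENGTH,DURATION
S01,Monolingual,b,Labial,Voiced,Singleton,0.081
S01,Monolingual,d:,Coronal,Voiced,Geminate,0.152
S02,Heritage,k,Dorsal,Voiceless,Singleton,0.095
S03,Late Bilingual,p,Labial,Voiceless,Singleton,0.102
"""


def durations_by_subject(csv_text, subjects):
    """Return {subject: [durations in seconds]} for the chosen subjects only."""
    selected = {name: [] for name in subjects}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["SUBJECT"] in selected:
            selected[row["SUBJECT"]].append(float(row["DURATION"]))
    return selected


chosen = durations_by_subject(SAMPLE_CSV, ["S01", "S02"])
print(chosen)
```

The same selection can be done in Excel with a filter on the SUBJECT column; the sketch only shows the idea of keeping the DURATION values of the two chosen subjects together.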

b.      For each subject, calculate the following descriptive statistics on the DURATION data.  Place your results in a table in a Word document you create with your name at the top of the document:

Mean

Median

Mode

Standard deviation (SD)

Standard Error of the Mean (SEM)

Maximum

Minimum

Range
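As a cross-check on the statistics listed above, here is a minimal Python sketch using only the standard library. It assumes the sample standard deviation (n - 1 in the denominator) and SEM = SD / sqrt(n); the function name `describe` and the example durations are hypothetical, not values from the project data.

```python
import math
import statistics


def describe(durations):
    """Descriptive statistics for one subject's DURATION values (seconds)."""
    n = len(durations)
    sd = statistics.stdev(durations)  # sample SD, uses n - 1
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        # multimode returns a list, because data can have more than one mode
        "mode": statistics.multimode(durations),
        "sd": sd,
        "sem": sd / math.sqrt(n),  # standard error of the mean
        "max": max(durations),
        "min": min(durations),
        "range": max(durations) - min(durations),
    }


# Invented example values, for illustration only.
example = [0.08, 0.10, 0.10, 0.12, 0.15]
stats = describe(example)
for name, value in stats.items():
    print(name, value)
```

With continuous measurements such as durations, every value may occur only once; in that case `multimode` returns all of the values, which is a signal that the data have no single meaningful mode.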

c.      Graph: Graph the means and standard deviations of the DURATIONS for the two subjects.  Insert this graph below the descriptive statistics table in your Word document.

d.      Binomial distribution: Using the data from the subjects you have selected, determine whether the DURATION data meet the criteria for a Binomial Distribution. Write this determination in a paragraph below the graph in your Word document.
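For reference when checking the criteria in step d, the standard textbook definition of the Binomial Distribution is given here (it is not specific to this project). A count X follows a Binomial Distribution when there is a fixed number n of independent trials, each trial has exactly two possible outcomes, and the probability p of "success" is the same on every trial:

```latex
P(X = k) = \binom{n}{k} \, p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n,
\qquad \text{with mean } np \text{ and variance } np(1 - p).
```

Compare each of these conditions with the kind of values recorded in the DURATION column.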

e.      t-Test: Using the data from the subjects you have selected, perform a t-Test on the data.  Determine if there is a difference in DURATION between the two subjects.  If so, what is the difference?  What does the difference mean?  If there is no difference, what does this indicate?  Place the results of your t-Test and the answers to these questions below the Binomial Distribution paragraph in your Word document.
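The calculation behind a two-sample t-Test can be sketched in Python as follows. This handout does not say which version of the t-Test to use, so the sketch assumes Welch's version (independent samples, variances not assumed equal); the function name `welch_t` and the example durations are hypothetical. It returns only the t statistic and the degrees of freedom, which would then be compared with a critical value from a t table or entered into a calculator to obtain a p-value.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Invented durations (seconds) for two subjects, for illustration only.
subject_1 = [0.08, 0.10, 0.10, 0.12, 0.15]
subject_2 = [0.11, 0.13, 0.14, 0.16, 0.18]
t_value, df_value = welch_t(subject_1, subject_2)
print("t =", t_value, "df =", df_value)
```

A t value far from zero (relative to the critical value for the computed degrees of freedom) suggests that the mean DURATION differs between the two subjects; the sign of t shows which subject has the larger mean.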

f.       Conclusion – What can you infer/conclude from the results you obtained?  Write a paragraph with your conclusion and explanation below the t-Test paragraph in your Word document.

Part 2 – Research using data recorded this semester

a.      Select 5 subjects from the “Current Research” Excel table.  Place these data into a separate Excel table that you will submit with your project.

b.      For each subject, calculate the following descriptive statistics on the DURATION data:

Mean

Median

Mode

Standard deviation (SD)

Standard Error of the Mean (SEM)

Maximum

Minimum

Range

c.      Graph: Graph the means and standard deviations of the DURATIONS for the 5 subjects.

d.      If there are more than 30 data points for each subject, and the data meet the criteria for ANOVA, conduct an ANOVA (parametric test) on the data of the subjects to test for equality of the means of DURATION.  Provide an explanation for why you chose to use this test.  Use the following website to do the ANOVA calculation:  www.socscistatistics.com

This website is easy to use.  If you need help, ask Prof. Ortiz for assistance with it.
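To see what the website computes for the ANOVA, the F statistic of a one-way ANOVA can be sketched in Python. This is an illustrative sketch, not the website's code; the function name `one_way_anova_f` and the example numbers are hypothetical. It returns F with its two degrees of freedom and leaves the p-value to the website or an F table.

```python
import statistics


def one_way_anova_f(groups):
    """One-way ANOVA: return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the individual values around their own group mean.
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented values for three groups, for illustration only.
f_value, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print("F =", f_value, "df =", (df_b, df_w))
```

A large F means the subject means differ more than would be expected from the spread of values within each subject.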

e.      If there are fewer than 30 data points for any of the subjects, or the data do not meet the criteria for ANOVA, then perform a Kruskal-Wallis test (nonparametric test) on the data to test whether DURATION differs among the subjects.  (The Kruskal-Wallis test compares the ranks of the values rather than the means.)  Provide an explanation for why you chose to use this test.  Use the following website to do the Kruskal-Wallis calculation:  www.socscistatistics.com
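The Kruskal-Wallis calculation can likewise be sketched in Python. This is a minimal illustration, not the website's code: it ranks all values together (tied values share the average of their ranks), computes the H statistic, and applies the usual correction for ties. The function name `kruskal_wallis_h` and the example numbers are hypothetical. H is compared with a chi-square distribution with (number of groups - 1) degrees of freedom.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H (tie-corrected) and degrees of freedom.

    Assumes the pooled data are not all identical (otherwise the tie
    correction would be zero and H is undefined).
    """
    # Pool all values, remembering which group each one came from.
    pooled = sorted(
        (value, idx) for idx, g in enumerate(groups) for value in g
    )
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n_total:
        # Find the run of tied values starting at position i.
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of ranks i+1 through j
        for _, idx in pooled[i:j]:
            rank_sums[idx] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12 / (n_total * (n_total + 1)) * sum(
        r ** 2 / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    return h / correction, len(groups) - 1


# Invented values for three groups, for illustration only.
h_value, h_df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("H =", h_value, "df =", h_df)
```

Because the test works on ranks, it does not require the DURATION values to be normally distributed, which is why it serves as the fallback when the ANOVA criteria are not met.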

f.       Conclusion: Using the results of your data analysis, what can you conclude about the data?  What is the answer to the question we are researching?

g.      Provide input on this research project: Did you learn anything participating in this project?  Were there any problems you encountered?  If so, what were they?  Did you enjoy participating in this project?  Why or why not?  What changes would you suggest for doing this project in the future?

 

Due dates

Part 1 is due Mon, May 2, 2022, 9am ET via upload to File Exchange in your Group on Blackboard.  Upload, separately, both the Excel table with your raw data and the Word document.  Call each file “Part 1”.

Part 2 is due Tue, May 31, 2022, 9am ET via upload to File Exchange in your Group on Blackboard.  Upload, separately, both the Excel table with your raw data and the Word document.  Call each file “Part 2”.

How your project will be graded:

Part 1                              Point value

1.      Excel table with raw data             5

2.      Descriptive statistics table         10

3.      Graph                                10

4.      Binomial distribution                 5

5.      t-Test                               10

6.      Conclusion                           10

Part 2

1.      Excel table with raw data             5

2.      Descriptive statistics table         10

3.      Graph                                10

4.      ANOVA or Kruskal-Wallis Test         10

5.      Conclusion                           10

6.      Answers to Input questions            5

-----------------------------------------------

Total                                       100 points

 

This project is worth 30% of your final grade in Biostatistics.

 

Remember, if you have any questions at all, email Prof. Ortiz for help.

 

If adjustments need to be made to this document, you will be informed.