**Part 1. First 1000 COVID-19 cases in Singapore**

Part 1 of the Excel data file provides selected patient information of the first 1000 confirmed cases of COVID-19 in Singapore.

Answer all six questions below by analysing the data.

For each question, state clearly the sample size used, the assumptions made (with justifications), and the potential flaws of the approach chosen (no approach is perfect).

If you find a question ambiguous, follow your own interpretation and justify it.

- Plot a histogram showing the distribution of COVID-19 patients by age group, *e.g.* 0 – 10 years, 11 – 20 years, 21 – 30 years, *etc*. What is the average age of the patients who contracted the virus?
- According to the data provided, is it true that males generally take longer to recover than females? How conclusive is your answer?
- According to the data provided, is it true that males are less likely to recover (i.e. more likely to die from COVID-19) than females? How conclusive is your answer?
- Estimate the mean number of days to recover, with a 95% confidence interval.
- According to the data provided, is it true that a Singaporean (nationality-wise) male is more likely to contract COVID-19 than a Singaporean female? How conclusive is your answer?
- According to the data, is it true that a COVID-19 patient at or above the age of 50 takes longer to recover than a patient who is younger than 50? How conclusive is your answer?
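
As an illustration of the kind of calculation the confidence-interval and group-comparison questions call for, the following is a minimal sketch, not a prescribed method. It assumes the recovery times have already been extracted from the Excel file into plain Python lists, and it uses a large-sample normal approximation rather than a t-distribution, which is itself an assumption that would need justifying. The function names and the sample values are hypothetical and are not taken from the data file.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Return (lower, mean, upper) using a normal approximation.

    Assumes the observations are independent and the sample is large
    enough for the sample mean to be approximately normal.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    m = mean(values)
    standard_error = stdev(values) / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m - z * standard_error, m, m + z * standard_error


def difference_in_means_interval(group_a, group_b, confidence=0.95):
    """Return (lower, difference, upper) for mean(group_a) - mean(group_b).

    Uses unpooled variances, so the two groups need not share a variance.
    """
    diff = mean(group_a) - mean(group_b)
    standard_error = sqrt(
        stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b)
    )
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * standard_error, diff, diff + z * standard_error


# Hypothetical recovery times in days, for illustration only.
males = [10, 12, 14, 16, 18]
females = [8, 10, 12, 14, 16]
print(mean_confidence_interval(males))
print(difference_in_means_interval(males, females))
```

If the interval for the difference in means contains zero, the data do not support a clear difference at the chosen confidence level. With small groups, a t-based interval would be the more cautious choice.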

**Part 2. COVID-19 in Singapore and overseas**

Part 2 of the Excel data file provides the daily number of new COVID-19 cases confirmed between 31 December 2019 and 2 August 2020 in Singapore, China, the United Kingdom, and the United States.

Analyse the data and answer all three questions below.

If you find a question ambiguous, follow your own interpretation and justify it.

- How does the daily number of new cases in Singapore correlate with the daily number of new cases in China, the United Kingdom, and the United States, respectively?
- When COVID-19 spreads from one country to another, we may see some similarities in the behaviour of the outbreaks. For example, the number of new cases in Singapore may not correlate well with the number of new cases in China on a same-day basis but may correlate better with the number of new cases in China *x* days ago. Find the values of *x* that maximise the correlation between the number of new cases in Singapore and the number of new cases in China, the United Kingdom, and the United States, respectively. Explain the method used to determine *x* and the assumptions made, with justifications. Solutions that use an automated algorithm to find *x* will score more marks (of course, you need to clearly explain the algorithm used and how it is computationally implemented).
- Propose numerical models that describe the trends in the number of new cases in Singapore, China, the United Kingdom, and the United States, respectively. How well do the models fit?
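
One possible way to automate the search for *x* is a brute-force scan over candidate lags, sketched below under stated assumptions. It assumes the two daily series are equal-length Python lists aligned on the same calendar dates, uses the Pearson correlation coefficient, and only considers non-negative lags up to a chosen `max_lag`. The function names, the `max_lag` and `min_overlap` parameters, and the sample series are all hypothetical and are not taken from the data file.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant series has no defined correlation
    return sxy / sqrt(sxx * syy)


def lagged_correlations(target, source, max_lag, min_overlap=3):
    """Correlate target[t] with source[t - lag] for lag = 0..max_lag."""
    if len(target) != len(source):
        raise ValueError("series must cover the same dates")
    results = {}
    for lag in range(max_lag + 1):
        shifted_target = target[lag:]                  # days lag .. end
        earlier_source = source[: len(source) - lag]   # days 0 .. end - lag
        if len(shifted_target) < min_overlap:
            break
        r = pearson(shifted_target, earlier_source)
        if r is not None:
            results[lag] = r
    return results


def best_lag(target, source, max_lag):
    """Return (lag, correlation) for the lag with the highest correlation."""
    results = lagged_correlations(target, source, max_lag)
    lag = max(results, key=results.get)
    return lag, results[lag]


# Hypothetical series: the target repeats the source three days later.
source = [0, 1, 4, 9, 7, 3, 2, 8, 5, 1, 0, 6, 2, 9, 4, 1, 3, 7, 5, 2]
target = [0, 0, 0] + source[:-3]
print(best_lag(target, source, max_lag=7))
```

Each increase in the lag shortens the overlapping window, so correlations at large lags rest on fewer data points and are less reliable. That trade-off is one of the assumptions worth discussing when justifying the chosen range of *x*.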