Using Python, calculate or apply the following descriptive and inferential statistics techniques to the chosen data sets. Interpret the results where applicable. Provide the Python code (used to perform the statistical analyses) in a separate file.

Data Science project: Data Sets related to Covid-19, global vaccinations and vaccinations in Australia

Assignment Instructions 1

Tasks:

1. Pitch a dataset of your choice from the following data source for global Covid-19 vaccinations:
https://github.com/owid/covid-19-data/tree/master/public/data/vaccinations
(Choose vaccinations.csv)

Enrich and explain your initial dataset selection with information from one or two datasets in the following data source:

https://www.covid19data.com.au

(Choose datasets related to global Covid vaccinations, vaccinations in Australia and Covid-19 in Australia)

2. Profile the selected data using descriptive and inferential statistics techniques (with Python).

3. Propose one main problem (regarding the statistical analyses of the datasets) to be solved later in ‘Data science project 2’.

4. Present the key outcomes of Tasks 1 – 3 above in a report of 2000 words.
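
As a starting point for Task 1, the sketch below shows one possible way to pull a single numeric column for one country out of a file laid out like vaccinations.csv, using only the Python standard library. The column names `location` and `daily_vaccinations`, the helper name `load_numeric_column` and the four sample rows are assumptions made for illustration; check them against the header of the file you actually download.

```python
import csv
import io


def load_numeric_column(csv_file, location, column):
    """Return the non-missing values of `column` for one `location` as floats."""
    values = []
    for row in csv.DictReader(csv_file):
        if row["location"] != location:
            continue
        raw = row.get(column, "")
        if raw:  # blank cells are treated as "not reported" and skipped
            values.append(float(raw))
    return values


# Tiny made-up sample (NOT real data) so the sketch runs without a download.
sample = io.StringIO(
    "location,iso_code,date,daily_vaccinations\n"
    "Australia,AUS,2021-03-01,1000\n"
    "Australia,AUS,2021-03-02,\n"
    "Australia,AUS,2021-03-03,1500\n"
    "Austria,AUT,2021-03-01,800\n"
)
daily = load_numeric_column(sample, "Australia", "daily_vaccinations")
print(daily)  # [1000.0, 1500.0]
```

With the real file, replace the `io.StringIO` object with `open("vaccinations.csv", newline="")`. Many students would use pandas for this step instead; the standard-library version is shown only to keep the sketch free of extra dependencies.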

Task 2 involves Python programming to perform descriptive and inferential statistical analyses of the chosen datasets.

This includes measuring the central tendency (e.g. mean, median, mode) and spread (e.g. range, interquartile range, variance, standard deviation) of the data, as well as computing summary statistics and plotting them (e.g. as histograms) using Python. Provide the Python code in a separate file.
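
The measures of central tendency and spread named above can be sketched with Python's built-in `statistics` module. The function name `describe` and the seven sample values are made up for illustration, and the quartiles use the module's default "exclusive" method, which may differ slightly from what Excel or pandas report.

```python
import statistics


def describe(values):
    """Summarise the centre and spread of a list of numbers."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),  # a list, since ties are possible
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(values),  # sample standard deviation
    }


data = [2, 4, 4, 5, 7, 9, 11]  # illustrative values only
summary = describe(data)
for name, value in summary.items():
    print(name, value)
```

If the data represent a whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` (which divide by n) would be the matching choices.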

The statistical analyses should be done using Python. There is no need to use other statistical software (e.g. Excel), except to verify the analyses already done in Python.

Make sure to propose one main PROBLEM (regarding the statistical analyses of the data sets) TO BE SOLVED LATER in Data Science project 2. (See the attached Assignment Instructions 2 and ‘Tasks’)

Descriptive Statistics techniques:

1) Levels of Measurement
• Nominal data
• Ordinal data
• Interval data
• Ratio data
2) Continuous and discrete data
3) Measures of Central Tendency
• Mean
• Median
• Mode
4) Measures of Dispersion
• Range
• The Interquartile Range (IQR)
• Variance
• Standard deviation
5) Distributions of data
Histogram
Common types of data distributions
• Uniform distribution
• Normal distribution
• Skewed distributions
• Binomial distribution
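
To illustrate the distribution shapes listed above, the following sketch simulates one sample of each kind, bins it into a simple histogram and computes a skewness coefficient. All names (`histogram_counts`, `skewness`) and all simulation parameters are assumptions chosen for illustration; for the report you would normally draw the histogram with a plotting library such as matplotlib rather than print bin counts.

```python
import random
import statistics


def histogram_counts(values, bins=5):
    """Count how many values fall into each of `bins` equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1  # fall back to 1 if all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - low) / width), bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    return counts


def skewness(values):
    """Population skewness: > 0 means a longer right tail, < 0 a longer left tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


random.seed(0)  # fixed seed so repeated runs give the same samples
uniform = [random.uniform(0, 10) for _ in range(2000)]
normal = [random.gauss(5, 1) for _ in range(2000)]
right_skewed = [random.expovariate(1.0) for _ in range(2000)]
# Binomial: number of successes in 20 trials, each with success probability 0.3
binomial = [sum(random.random() < 0.3 for _ in range(20)) for _ in range(2000)]

for name, sample in [("uniform", uniform), ("normal", normal),
                     ("right-skewed", right_skewed), ("binomial", binomial)]:
    print(name, histogram_counts(sample), round(skewness(sample), 2))
```

A skewness near zero is consistent with a symmetric shape (uniform or normal), while a clearly positive value points to a right-skewed distribution. Note that the binomial distribution is discrete, unlike the other three.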

Inferential Statistics techniques – Part 1

1) The Standard Normal Distribution
• Z-score
• The empirical rule or the 68–95–99.7 rule
2) The Central Limit Theorem
• Parameters versus statistics
• The sampling distribution
3) Confidence Intervals (CI)
• Confidence levels and Z-statistic
• The critical value (z*)
• The confidence interval of population mean
• The confidence interval of population proportion
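
The Part 1 techniques can be sketched with the standard library's `NormalDist` class. The function names below are hypothetical, the simulated population (exponential with mean 1) is an arbitrary choice, and the intervals use the large-sample z approximation; the proportion interval is the simple Wald form, which is only one of several possible choices.

```python
import math
import random
import statistics
from statistics import NormalDist


def z_scores(values):
    """Standardise each value: z = (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def critical_z(confidence):
    """z* such that the central `confidence` share of N(0, 1) lies within +/- z*."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def mean_ci(values, confidence=0.95):
    """z-based confidence interval for a population mean (large samples)."""
    mean = statistics.fmean(values)
    margin = critical_z(confidence) * statistics.stdev(values) / math.sqrt(len(values))
    return mean - margin, mean + margin


def proportion_ci(successes, n, confidence=0.95):
    """z-based (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    margin = critical_z(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Empirical rule: share of N(0, 1) within 1, 2 and 3 standard deviations.
for k in (1, 2, 3):
    share = NormalDist().cdf(k) - NormalDist().cdf(-k)
    print(k, round(share, 4))  # shares: 0.6827, 0.9545, 0.9973

# Central limit theorem: means of 1000 samples (n = 50) from a skewed population.
random.seed(1)
sample_means = [statistics.fmean(random.expovariate(1.0) for _ in range(50))
                for _ in range(1000)]
print(round(statistics.fmean(sample_means), 3), round(statistics.stdev(sample_means), 3))
print(mean_ci(sample_means), proportion_ci(50, 100))
```

Here the population mean (1.0) is a parameter, while each sample mean is a statistic. The spread of the sample means should come out close to 1 / sqrt(50), which is what the central limit theorem predicts for this population.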

Inferential Statistics techniques – Part 2

1) Statistical Hypotheses

• The alternative hypothesis
• The null hypothesis

2) The p-value

3) The t-Distributions
• Degrees of freedom
• The t-statistic
• Calculating p-value given t-statistic
4) The 4-Step Process of Hypothesis Testing
• Step 1: State
• Step 2: Plan
• Step 3: Solve
• Step 4: Conclude
5) One-Sample t-Test
6) Two-Sample t-Test
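
A minimal sketch of the Part 2 techniques follows, again using only the standard library. The p-value is obtained by numerically integrating the t density with Simpson's rule, which is an approach chosen here to avoid extra dependencies; in practice a library routine such as `scipy.stats.ttest_1samp` would usually be used. The two-sample function is Welch's version (unequal variances assumed), and all names and numbers are made up for illustration.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coeff) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the density from 0 to |t|."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 < T < |t|)
    return max(0.0, 1 - 2 * area)


def one_sample_t(values, mu0):
    """t-statistic, degrees of freedom and p-value for H0: population mean = mu0."""
    n = len(values)
    t = (statistics.fmean(values) - mu0) / (statistics.stdev(values) / math.sqrt(n))
    return t, n - 1, two_sided_p(t, n - 1)


def welch_t(a, b):
    """Welch's two-sample t-test for H0: the two population means are equal."""
    va, vb = statistics.variance(a) / len(a), statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_sided_p(t, df)


# Step 1 (State): H0: mean = 5, Ha: mean != 5, significance level 0.05.
# Step 2 (Plan): one-sample t-test on the (made-up) values below.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.7]
group_b = [4.8, 5.0, 5.2, 4.7, 5.1, 4.9, 5.3, 5.0]
# Step 3 (Solve): compute the statistic and the p-value.
t_stat, df, p_value = one_sample_t(group_a, 5)
# Step 4 (Conclude): reject H0 only if the p-value is below the chosen level.
print(round(t_stat, 3), df, round(p_value, 4), "reject H0" if p_value < 0.05 else "fail to reject H0")
print(welch_t(group_a, group_b))
```

The same four steps apply to the two-sample test; only the hypotheses and the formula for the statistic and its degrees of freedom change.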

_______________________________________
Address the following in the report:

1) The first task is to pitch the dataset. The data science skill most relevant at this stage is domain knowledge (i.e. demonstrate as much domain knowledge as possible).

When pitching your dataset, consider addressing one or more of the following concerns:

• source of data
• validity of data
• why the dataset matters (in practical or academic terms)
• domain knowledge
• relevance to you etc.

2) The subsequent tasks are to profile the data using descriptive and inferential statistics techniques. Statistics (or maths) and programming skills are the most relevant here.

Consider addressing one or more of the following concerns:

• scope of dataset (rows vs columns)
• data types
• centrality
• spread
• shape of data etc.

3) The last task is to propose one main problem to be addressed in the subsequent assignment.

The data science skill most relevant at this stage is domain knowledge. In formulating the problem, you may want to consider addressing the following:

• what the data might tell us
• what would you like to explore further based on your initial data profiling
• what existing assumption you want to test
• what new idea you want to test
