AsSalam and hi all. This is my first post in the Data Analysis Methods & Techniques series. I will start with an introduction to Time Series Analysis.
Data analysis, in general, can be understood as discovering valuable information from data. This is done by applying a standard procedure that includes inspecting, cleaning, transforming and modelling data using analytical and statistical tools. Why is it important to analyse data? Analysing data effectively helps us make better decisions. In this post, I will explain one data analysis technique, known as Time Series Analysis, and give an example of how it works.
Time series analysis is a statistical technique that identifies trends and cycles over time. Time series data is a sequence of data points that measure the same variable at different points in time, for example weekly sales figures or monthly car accident cases. By looking at time-related trends, analysts can forecast how the variable of interest may fluctuate in the future or pinpoint differences between periods in the past.
When conducting a time series analysis, we will look for trends, seasonality or cyclic patterns in our data.
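One common way to make a trend easier to see is to smooth the series with a simple moving average. The sketch below is only an illustration: the `moving_average` helper and the weekly sales numbers are made up for this post and do not come from any real dataset.

```python
from statistics import mean


def moving_average(values, window):
    """Return the simple moving average of `values` over `window` points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Slide a fixed-size window along the series and average each slice.
    return [
        mean(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


# Hypothetical weekly sales figures (invented for illustration only).
weekly_sales = [10, 12, 11, 13, 15, 14, 16, 18]

trend = moving_average(weekly_sales, window=4)
print(trend)
```

The raw numbers go up and down from week to week, but the smoothed values rise steadily, which is what an upward trend looks like. A larger window smooths more but returns fewer points.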
Now, let us do some time series analysis. I will use an actual dataset, “Deaths by vaccination status, England”. It is official data from the Office for National Statistics and can be accessed publicly at https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/datasets/deathsbyvaccinationstatusengland
***
Several editions of the dataset are available. I will use the latest one, which covers deaths occurring between 1 April 2021 and 31 December 2022. Once you have downloaded the dataset, it needs to be prepared before it can be used for analysis.
1. Open the dataset (.xlsx file) and go through it. In this dataset, you will see the Content tab that gives us information on what data are made available in this dataset. We will use data from Table 5 in this example: “Monthly counts of all registered deaths for ‘unvaccinated’ and ‘ever vaccinated’ by age group; for all deaths and deaths involving COVID-19, deaths occurring between 1 April 2021 and 31 December 2022, England.”
2. Based on the available data and our interest, we will compare deaths among the unvaccinated and the ever vaccinated, for all deaths occurring between 1 April 2021 and 31 December 2022 in England.
3. Next, I will remove all other unwanted tabs or data and save this dataset as “time-series-analysis-1.”
4. We now have the ‘clean’ data we need, but a few more steps are required before the analysis can be done. Notice that the data are broken down by age group for both unvaccinated and ever-vaccinated. In some settings, data in this form are ready for analysis. For our objective, however, it is better to pre-process the data by dropping the age group categories and summing the counts by Month for both unvaccinated and ever-vaccinated.
5. In this example, I will copy the data we need into a new column, as shown in the figure below (labelled 2). Then, I will create a new column with the processed data that we will use for analysis (labelled 3). Notice the difference. As a final touch, I rearranged my data as shown in (labelled 4).
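The pre-processing in steps 4 and 5 (dropping the age groups and summing by Month) can also be sketched in Python. This is a minimal sketch under assumptions: the row layout, the age group labels, the `monthly_totals` helper and all the counts are hypothetical and are NOT taken from the ONS file.

```python
from collections import defaultdict

# Hypothetical rows: (month, age group, vaccination status, deaths).
# The counts are invented for illustration; they are NOT ONS figures.
rows = [
    ("2021-04", "18-39", "Unvaccinated", 5),
    ("2021-04", "40-49", "Unvaccinated", 7),
    ("2021-04", "18-39", "Ever vaccinated", 3),
    ("2021-04", "40-49", "Ever vaccinated", 9),
    ("2021-05", "18-39", "Unvaccinated", 4),
    ("2021-05", "40-49", "Ever vaccinated", 11),
]


def monthly_totals(rows):
    """Sum deaths per month and vaccination status, ignoring age group."""
    totals = defaultdict(lambda: defaultdict(int))
    for month, _age_group, status, deaths in rows:
        totals[month][status] += deaths
    # Sort by month label so the result reads in time order.
    return {month: dict(by_status) for month, by_status in sorted(totals.items())}


totals = monthly_totals(rows)
for month, by_status in totals.items():
    print(month, by_status)
```

The result has one entry per month with a single total for each vaccination status, which is the same shape as the rearranged table used for the chart.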
Now, we will do the time series analysis. This example uses a simple method that needs only Excel.
1. Line with Markers
2. Select Data
3. Chart Elements
Congratulations, we have successfully produced a data visualization as a Line with Markers chart. What’s next? It’s time for further analysis. Look for any sign of trends, seasonality or cyclic patterns. What can we observe based on this chart?
1. The number of deaths among the ever-vaccinated is much higher than among the unvaccinated.
2. Deaths among the unvaccinated show a stable trend.
3. Deaths among the ever-vaccinated rise and fall from month to month, but the overall level stays relatively high and stable.
4. Let us look at possible seasonality in the ever-vaccinated series. We consider the four seasons (https://www.metoffice.gov.uk/weather/learn-about/met-office-for-schools/other-content/other-resources/our-seasons)
Spring (March, April and May)
Summer (June, July and August)
Autumn (September, October and November)
Winter (December, January and February)
Comparing each season does not show a consistent pattern. For example, deaths increase in Spring 2021 (2021-4 to 2021-5) but decrease in Spring 2022 (2022-4 to 2022-5).
The winter season also does not show a consistent pattern between 2021-12 and 2022-12. We could consider the pattern seasonal if the numbers increased in 2022-12, but in this data they did not. We should also keep in mind that we currently do not have data for 2023-1 and 2023-2, or for 2021-1 and 2021-2, so the winter seasons cannot be compared in full.
5. For the cyclic patterns, we could cross-check with vaccination uptake data, for example. (I will update this section later; I need to obtain the relevant data.)
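The season comparison in point 4 can be sketched in Python by mapping each month to its season and summing the counts. This is only a sketch: the `season_of` and `seasonal_totals` helpers are made up for this post, the monthly counts are hypothetical (not ONS figures), and assigning January and February to the winter that started the previous December is my own assumption.

```python
from collections import defaultdict

# Month number -> season, following the four seasons listed above.
SEASON_BY_MONTH = {
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
    12: "Winter", 1: "Winter", 2: "Winter",
}


def season_of(month_label):
    """Map a 'YYYY-M' label such as '2021-4' to (season_year, season)."""
    year_text, month_text = month_label.split("-")
    year, month = int(year_text), int(month_text)
    # Assumption: January and February count towards the winter
    # that began in December of the previous year.
    season_year = year - 1 if month in (1, 2) else year
    return season_year, SEASON_BY_MONTH[month]


def seasonal_totals(monthly_counts):
    """Sum monthly counts into one total per (season_year, season)."""
    totals = defaultdict(int)
    for month_label, count in monthly_counts.items():
        totals[season_of(month_label)] += count
    return dict(totals)


# Hypothetical monthly counts (invented for illustration only).
monthly_counts = {
    "2021-4": 100, "2021-5": 120,
    "2021-12": 150, "2022-1": 160,
    "2022-4": 130, "2022-5": 110,
}

by_season = seasonal_totals(monthly_counts)
for (year, season), total in sorted(by_season.items()):
    print(year, season, total)
```

Comparing the same season across years, for example Spring 2021 against Spring 2022, then becomes a lookup in `by_season`. A season with missing months, like the winters in our dataset, will show a smaller total, so it should be compared with care.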
Weather and seasons are just one example of what we can analyse based on the results of this time series analysis. We can do other cross-analyses, for example against vaccination uptake, social-policy restrictions, new virus variants and more. Through this kind of analysis, we try to find relationships and possible causes or solutions that explain the results and, for this example, help us assess the need for, or the risk of, the COVID-19 vaccine.
Alright, that’s basically the process of performing a time series analysis. This was just a glimpse of what time series analysis offers. I hope you can see and understand the concept and how it can be applied to other datasets and needs. In my next post in the Data Analysis Methods & Techniques series, I will share how to perform time series analysis using four different tools: Excel, PowerBI, Python and Tableau.