How to download data from Kaggle

# ! pip install -q kaggle
# from google.colab import files
# files.upload()
# ! mkdir -p ~/.kaggle
# ! cp kaggle.json ~/.kaggle/
# ! chmod 600 ~/.kaggle/kaggle.json
 
 

Loading this notebook might take some time because of the Plotly visualizations. Kindly be patient!

What is COVID-19?

COVID-19 is a respiratory illness caused by a new virus. Symptoms include fever, coughing, sore throat and shortness of breath. The virus can spread from person to person, but good hygiene can prevent infection.

COVID-19 may not always be fatal, but it spreads faster than other diseases such as the common cold. Every virus has a Basic Reproduction Number (R0), which indicates how many people, on average, will get the disease from one infected person. As per initial research work, the R0 of COVID-19 is 2.7.

Currently the goal of scientists around the world is to "Flatten the Curve". COVID-19 currently shows an exponential growth rate around the world, as we will see in the notebook ahead. Flattening the Curve means that even if the number of Confirmed Cases keeps increasing, those cases should be distributed over a longer time span. In simple words: if COVID-19 is going to infect, say, 100K people, then those people should be infected over a year, not within a month.

The sole reason to Flatten the Curve is to reduce the load on medical systems, so that more effort can go into research for a treatment for the disease.
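As a rough illustration of why a reproduction number above 1 produces exponential growth, here is a toy calculation using the R0 of 2.7 cited above. This is a sketch with discrete transmission generations and no immunity or interventions, not an epidemiological model:

```python
# Toy sketch: cumulative infections when each case infects r0 new people
# per "generation". Assumes a constant r0 and an unlimited susceptible
# population -- so growth is purely exponential.
R0 = 2.7  # early COVID-19 estimate cited above

def infections_after(generations, seed_cases=1, r0=R0):
    """Cumulative infections after a number of transmission generations."""
    total = seed_cases
    current = seed_cases
    for _ in range(generations):
        current = current * r0   # each current case seeds r0 new ones
        total += current
    return total

# With r0 > 1 the totals explode after only a few generations; pushing the
# effective reproduction number toward (or below) 1 is what makes the
# outbreak die out instead.
print(infections_after(5))          # already in the hundreds
print(infections_after(5, r0=0.9))  # sub-1 reproduction stays tiny
```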

Every Pandemic has four stages:

Stage 1: Confirmed Cases come from other countries

Stage 2: Local Transmission Begins

Stage 3: Communities impacted with local transmission

Stage 4: Significant Transmission with no end in sight

Italy, the USA, the UK and France are the countries which are currently in Stage 4, while India is on the edge of Stage 3.

Besides measures like travel bans, cross-border shutdowns and entry restrictions, the other ways to tackle a disease like COVID-19 are Testing, Contact Tracing and Quarantine.

Objective of the Notebook

The objective of this notebook is to study the COVID-19 outbreak with the help of some basic visualization techniques, to compare China, where COVID-19 originated, with the Rest of the World, and to perform predictions and Time Series forecasting in order to study the impact and spread of COVID-19 in the coming days.

Let's get Started

Importing required Python Packages and Libraries

!pip install pmdarima
Successfully installed pmdarima-2.0.1 statsmodels-0.13.2
%%capture
!pip3 install prophet
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#!pip install plotly
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np
import datetime as dt
from datetime import timedelta
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score,silhouette_samples
from sklearn.linear_model import LinearRegression,Ridge,Lasso
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error,r2_score
import statsmodels.api as sm
from statsmodels.tsa.api import Holt,SimpleExpSmoothing,ExponentialSmoothing
from prophet import Prophet
from sklearn.preprocessing import PolynomialFeatures
from statsmodels.tsa.stattools import adfuller
from pmdarima.arima import auto_arima

# from pyramid.arima import auto_arima
std=StandardScaler()
#pd.set_option('display.float_format', lambda x: '%.6f' % x)
!gdown 1N7yV6DLbfwWyioq3JB4GLYt0pVIdcteH
Downloading...
From: https://drive.google.com/uc?id=1N7yV6DLbfwWyioq3JB4GLYt0pVIdcteH
To: /content/Covid19.zip
!unzip "/content/Covid19.zip"
Archive:  /content/Covid19.zip
  inflating: covid_19_data.csv       
  inflating: time_series_covid_19_confirmed.csv  
  inflating: time_series_covid_19_confirmed_US.csv  
  inflating: time_series_covid_19_deaths.csv  
  inflating: time_series_covid_19_deaths_US.csv  
  inflating: time_series_covid_19_recovered.csv  
covid=pd.read_csv("/content/covid_19_data.csv")
covid.head()
SNo ObservationDate Province/State Country/Region Last Update Confirmed Deaths Recovered
0 1 01/22/2020 Anhui Mainland China 1/22/2020 17:00 1.0 0.0 0.0
1 2 01/22/2020 Beijing Mainland China 1/22/2020 17:00 14.0 0.0 0.0
2 3 01/22/2020 Chongqing Mainland China 1/22/2020 17:00 6.0 0.0 0.0
3 4 01/22/2020 Fujian Mainland China 1/22/2020 17:00 1.0 0.0 0.0
4 5 01/22/2020 Gansu Mainland China 1/22/2020 17:00 0.0 0.0 0.0
print("Size/Shape of the dataset: ",covid.shape)
print("Checking for null values:\n",covid.isnull().sum())
print("Checking Data-type of each column:\n",covid.dtypes)
Size/Shape of the dataset:  (306429, 8)
Checking for null values:
 SNo                    0
ObservationDate        0
Province/State     78100
Country/Region         0
Last Update            0
Confirmed              0
Deaths                 0
Recovered              0
dtype: int64
Checking Data-type of each column:
 SNo                  int64
ObservationDate     object
Province/State      object
Country/Region      object
Last Update         object
Confirmed          float64
Deaths             float64
Recovered          float64
dtype: object
covid.drop(["SNo"], axis=1, inplace=True)
covid["ObservationDate"]=pd.to_datetime(covid["ObservationDate"])
grouped_country=covid.groupby(["Country/Region","ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
grouped_country["Active Cases"]=grouped_country["Confirmed"]-grouped_country["Recovered"]-grouped_country["Deaths"]
# np.log yields -inf for rows where the count is still zero
grouped_country["log_confirmed"]=np.log(grouped_country["Confirmed"])
grouped_country["log_active"]=np.log(grouped_country["Active Cases"])

Datewise analysis

datewise=covid.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
datewise["Days Since"]=datewise.index-datewise.index.min()
print("Basic Information")
print("Total number of countries with Disease Spread: ",len(covid["Country/Region"].unique()))
print("Total number of Confirmed Cases around the World: ",datewise["Confirmed"].iloc[-1])
print("Total number of Recovered Cases around the World: ",datewise["Recovered"].iloc[-1])
print("Total number of Deaths Cases around the World: ",datewise["Deaths"].iloc[-1])
print("Total number of Active Cases around the World: ",(datewise["Confirmed"].iloc[-1]-datewise["Recovered"].iloc[-1]-datewise["Deaths"].iloc[-1]))
print("Total number of Closed Cases around the World: ",datewise["Recovered"].iloc[-1]+datewise["Deaths"].iloc[-1])
print("Approximate number of Confirmed Cases per Day around the World: ",np.round(datewise["Confirmed"].iloc[-1]/datewise.shape[0]))
print("Approximate number of Recovered Cases per Day around the World: ",np.round(datewise["Recovered"].iloc[-1]/datewise.shape[0]))
print("Approximate number of Death Cases per Day around the World: ",np.round(datewise["Deaths"].iloc[-1]/datewise.shape[0]))
print("Approximate number of Confirmed Cases per hour around the World: ",np.round(datewise["Confirmed"].iloc[-1]/((datewise.shape[0])*24)))
print("Approximate number of Recovered Cases per hour around the World: ",np.round(datewise["Recovered"].iloc[-1]/((datewise.shape[0])*24)))
print("Approximate number of Death Cases per hour around the World: ",np.round(datewise["Deaths"].iloc[-1]/((datewise.shape[0])*24)))
print("Number of Confirmed Cases in last 24 hours: ",datewise["Confirmed"].iloc[-1]-datewise["Confirmed"].iloc[-2])
print("Number of Recovered Cases in last 24 hours: ",datewise["Recovered"].iloc[-1]-datewise["Recovered"].iloc[-2])
print("Number of Death Cases in last 24 hours: ",datewise["Deaths"].iloc[-1]-datewise["Deaths"].iloc[-2])
Basic Information
Total number of countries with Disease Spread:  229
Total number of Confirmed Cases around the World:  169951560.0
Total number of Recovered Cases around the World:  107140669.0
Total number of Deaths Cases around the World:  3533619.0
Total number of Active Cases around the World:  59277272.0
Total number of Closed Cases around the World:  110674288.0
Approximate number of Confirmed Cases per Day around the World:  344031.0
Approximate number of Recovered Cases per Day around the World:  216884.0
Approximate number of Death Cases per Day around the World:  7153.0
Approximate number of Confirmed Cases per hour around the World:  14335.0
Approximate number of Recovered Cases per hour around the World:  9037.0
Approximate number of Death Cases per hour around the World:  298.0
Number of Confirmed Cases in last 24 hours:  480835.0
Number of Recovered Cases in last 24 hours:  507600.0
Number of Death Cases in last 24 hours:  10502.0
fig=px.bar(x=datewise.index,y=datewise["Confirmed"]-datewise["Recovered"]-datewise["Deaths"])
fig.update_layout(title="Distribution of Number of Active Cases",
                  xaxis_title="Date",yaxis_title="Number of Cases",)
fig.show()

Active Cases = Number of Confirmed Cases - Number of Recovered Cases - Number of Death Cases

An increase in the number of Active Cases is probably an indication that the numbers of Recovered and Death Cases are dropping drastically in comparison to the number of Confirmed Cases. We will look for conclusive evidence of this further in the notebook.

fig=px.bar(x=datewise.index,y=datewise["Recovered"]+datewise["Deaths"])
fig.update_layout(title="Distribution of Number of Closed Cases",
                  xaxis_title="Date",yaxis_title="Number of Cases")
fig.show()

Closed Cases = Number of Recovered Cases + Number of Death Cases

An increase in the number of Closed Cases implies that either more patients are recovering from the disease or more people are dying because of COVID-19.

datewise["WeekOfYear"]=datewise.index.weekofyear

week_num=[]
weekwise_confirmed=[]
weekwise_recovered=[]
weekwise_deaths=[]
w=1
for i in list(datewise["WeekOfYear"].unique()):
    weekwise_confirmed.append(datewise[datewise["WeekOfYear"]==i]["Confirmed"].iloc[-1])
    weekwise_recovered.append(datewise[datewise["WeekOfYear"]==i]["Recovered"].iloc[-1])
    weekwise_deaths.append(datewise[datewise["WeekOfYear"]==i]["Deaths"].iloc[-1])
    week_num.append(w)
    w=w+1
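Note that `weekofyear` restarts every calendar year, so on a dataset spanning both 2020 and 2021 the loop above silently merges same-numbered weeks from different years (only the later year's value survives the `.iloc[-1]`). A year-aware sketch using `isocalendar()`, which groups by the (ISO year, ISO week) pair instead, avoids that collision:

```python
import pandas as pd

def weekwise_last(frame: pd.DataFrame, col: str) -> pd.Series:
    """Last cumulative value of `col` for each (ISO year, ISO week) pair."""
    iso = frame.index.isocalendar()  # year / week / day for every date
    return frame[col].groupby(
        [iso["year"].astype(int).values, iso["week"].astype(int).values]
    ).last()

# Toy check: same-numbered weeks from different years stay separate.
idx = pd.to_datetime(["2020-01-06", "2020-01-07", "2021-01-04", "2021-01-05"])
toy = pd.DataFrame({"Confirmed": [10, 15, 100, 120]}, index=idx)
print(weekwise_last(toy, "Confirmed"))
```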

fig=go.Figure()
fig.add_trace(go.Scatter(x=week_num, y=weekwise_confirmed,
                    mode='lines+markers',
                    name='Weekly Growth of Confirmed Cases'))
fig.add_trace(go.Scatter(x=week_num, y=weekwise_recovered,
                    mode='lines+markers',
                    name='Weekly Growth of Recovered Cases'))
fig.add_trace(go.Scatter(x=week_num, y=weekwise_deaths,
                    mode='lines+markers',
                    name='Weekly Growth of Death Cases'))
fig.update_layout(title="Weekly Growth of different types of Cases",
                 xaxis_title="Week Number",yaxis_title="Number of Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
fig, (ax1,ax2) = plt.subplots(1, 2,figsize=(18,5))
sns.barplot(x=week_num,y=pd.Series(weekwise_confirmed).diff().fillna(0),ax=ax1)
sns.barplot(x=week_num,y=pd.Series(weekwise_deaths).diff().fillna(0),ax=ax2)
ax1.set_xlabel("Week Number")
ax2.set_xlabel("Week Number")
ax1.set_ylabel("Number of Confirmed Cases")
ax2.set_ylabel("Number of Death Cases")
ax1.set_title("Weekly increase in Number of Confirmed Cases")
ax2.set_title("Weekly increase in Number of Death Cases")
The 32nd week is currently going on.

The death toll was low in the 14th week, even though it was expected to rise, looking at the infection and death trends of the previous few weeks.

The number of Death Cases was dropping consistently from the 14th week up to the 19th week, after which it is again showing a spike for two consecutive weeks.

We have somehow been able to reduce, or at least control, the Death numbers, but new infections are increasing at considerable speed, recording 800K+ cases in the 21st week, which is a staggering number.

The number of infections is increasing every week, recording 1.2M+ Confirmed Cases in the 24th week. The 25th week has recorded another peak in the number of Confirmed Cases (1.5M+).

The infection rate is increasing with every passing week.

Growth rate of Confirmed, Recovered and Death Cases

fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Confirmed"],
                    mode='lines+markers',
                    name='Confirmed Cases'))
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Recovered"],
                    mode='lines+markers',
                    name='Recovered Cases'))
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Deaths"],
                    mode='lines+markers',
                    name='Death Cases'))
fig.update_layout(title="Growth of different types of cases",
                 xaxis_title="Date",yaxis_title="Number of Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()

Mortality and Recovery Rate analysis around the World

datewise["Mortality Rate"]=(datewise["Deaths"]/datewise["Confirmed"])*100
datewise["Recovery Rate"]=(datewise["Recovered"]/datewise["Confirmed"])*100
datewise["Active Cases"]=datewise["Confirmed"]-datewise["Recovered"]-datewise["Deaths"]
datewise["Closed Cases"]=datewise["Recovered"]+datewise["Deaths"]

print("Average Mortality Rate",datewise["Mortality Rate"].mean())
print("Median Mortality Rate",datewise["Mortality Rate"].median())
print("Average Recovery Rate",datewise["Recovery Rate"].mean())
print("Median Recovery Rate",datewise["Recovery Rate"].median())

#Plotting Mortality and Recovery Rate 
fig = make_subplots(rows=2, cols=1,
                   subplot_titles=("Recovery Rate", "Mortality Rate"))
fig.add_trace(
    go.Scatter(x=datewise.index, y=(datewise["Recovered"]/datewise["Confirmed"])*100,name="Recovery Rate"),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=datewise.index, y=(datewise["Deaths"]/datewise["Confirmed"])*100,name="Mortality Rate"),
    row=2, col=1
)
fig.update_layout(height=1000,legend=dict(x=-0.1,y=1.2,traceorder="normal"))
fig.update_xaxes(title_text="Date", row=1, col=1)
fig.update_yaxes(title_text="Recovery Rate", row=1, col=1)
fig.update_xaxes(title_text="Date", row=2, col=1)
fig.update_yaxes(title_text="Mortality Rate", row=2, col=1)
fig.show()
Average Mortality Rate 3.398557417508881
Median Mortality Rate 2.772038814120292
Average Recovery Rate 51.148201824468615
Median Recovery Rate 56.426751740200025

Mortality rate = (Number of Death Cases / Number of Confirmed Cases) x 100

Recovery Rate = (Number of Recovered Cases / Number of Confirmed Cases) x 100

The Mortality Rate has been showing a considerable decline for a pretty long time, which is a positive sign.

The Recovery Rate has started to pick up again, which is a good sign, and another supporting reason why the number of Closed Cases is increasing.

print("Average increase in number of Confirmed Cases every day: ",np.round(datewise["Confirmed"].diff().fillna(0).mean()))
print("Average increase in number of Recovered Cases every day: ",np.round(datewise["Recovered"].diff().fillna(0).mean()))
print("Average increase in number of Deaths Cases every day: ",np.round(datewise["Deaths"].diff().fillna(0).mean()))

fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Confirmed"].diff().fillna(0),mode='lines+markers',
                    name='Confirmed Cases'))
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Recovered"].diff().fillna(0),mode='lines+markers',
                    name='Recovered Cases'))
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Deaths"].diff().fillna(0),mode='lines+markers',
                    name='Death Cases'))
fig.update_layout(title="Daily increase in different types of Cases",
                 xaxis_title="Date",yaxis_title="Number of Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
Average increase in number of Confirmed Cases every day:  344030.0
Average increase in number of Recovered Cases every day:  216884.0
Average increase in number of Deaths Cases every day:  7153.0
fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Confirmed"].diff().rolling(window=7).mean(),mode='lines+markers',
                    name='Confirmed Cases'))
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Recovered"].diff().rolling(window=7).mean(),mode='lines+markers',
                    name='Recovered Cases'))
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Deaths"].diff().rolling(window=7).mean(),mode='lines+markers',
                    name='Death Cases'))
fig.update_layout(title="7 Days Rolling Mean of Daily Increase of Confirmed, Recovered and Death Cases",
                 xaxis_title="Date",yaxis_title="Number of Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()

Growth Factor

Growth factor is the factor by which a quantity multiplies itself over time. The formula used is:

Formula: Every day's new (Confirmed,Recovered,Deaths) / new (Confirmed,Recovered,Deaths) on the previous day.

A growth factor above 1 indicates an increase in the corresponding cases.

A growth factor above 1 but trending downward is a positive sign, whereas a growth factor constantly above 1 is the sign of exponential growth.

A growth factor constant at 1 indicates there is no change in any kind of cases.
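The day-over-day growth factor is simply the cumulative series divided by its one-day shift; equivalently, pandas `pct_change()` plus one. A small illustrative check on hypothetical numbers (not from the dataset) shows why a constant factor above 1 is exactly the exponential regime described above:

```python
import pandas as pd

# Hypothetical cumulative counts that double every day (growth factor 2).
cum = pd.Series([1.0, 2.0, 4.0, 8.0])

growth_factor = cum / cum.shift()   # today's total divided by yesterday's
alt = cum.pct_change() + 1          # the same ratio via percentage change

print(growth_factor.tolist())       # first entry is NaN (no previous day)
```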

print("Average growth factor of number of Confirmed Cases: ",(datewise["Confirmed"]/datewise["Confirmed"].shift()).mean())
print("Median growth factor of number of Confirmed Cases: ",(datewise["Confirmed"]/datewise["Confirmed"].shift()).median())
print("Average growth factor of number of Recovered Cases: ",(datewise["Recovered"]/datewise["Recovered"].shift()).mean())
print("Median growth factor of number of Recovered Cases: ",(datewise["Recovered"]/datewise["Recovered"].shift()).median())
print("Average growth factor of number of Death Cases: ",(datewise["Deaths"]/datewise["Deaths"].shift()).mean())
print("Median growth factor of number of Death Cases: ",(datewise["Deaths"]/datewise["Deaths"].shift()).median())

fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Confirmed"]/datewise["Confirmed"].shift(),
                    mode='lines',
                    name='Growth Factor of Confirmed Cases'))
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Recovered"]/datewise["Recovered"].shift(),
                    mode='lines',
                    name='Growth Factor of Recovered Cases'))
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Deaths"]/datewise["Deaths"].shift(),
                    mode='lines',
                    name='Growth Factor of Death Cases'))
fig.update_layout(title="Datewise Growth Factor of different types of cases",
                 xaxis_title="Date",yaxis_title="Growth Factor",
                 legend=dict(x=0,y=-0.4,traceorder="normal"))
fig.show()
Average growth factor of number of Confirmed Cases:  1.0281591322080432
Median growth factor of number of Confirmed Cases:  1.0105328040968438
Average growth factor of number of Recovered Cases:  1.033783342773454
Median growth factor of number of Recovered Cases:  1.0112782082196978
Average growth factor of number of Death Cases:  1.027312583713661
Median growth factor of number of Death Cases:  1.0071398973639754

Growth Factor for Active and Closed Cases

Growth factor is the factor by which a quantity multiplies itself over time. The formula used is:

Formula: Every day's new (Active and Closed Cases) / new (Active and Closed Cases) on the previous day.

A growth factor above 1 indicates an increase in the corresponding cases.

A growth factor above 1 but trending downward is a positive sign.

A growth factor constant at 1 indicates there is no change in any kind of cases.

A growth factor below 1 is a really positive sign, implying that more patients are recovering or dying than new cases are being confirmed.

fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise.index, 
                         y=(datewise["Confirmed"]-datewise["Recovered"]-datewise["Deaths"])/(datewise["Confirmed"]-datewise["Recovered"]-datewise["Deaths"]).shift(),
                    mode='lines',
                    name='Growth Factor of Active Cases'))
fig.add_trace(go.Scatter(x=datewise.index, y=(datewise["Recovered"]+datewise["Deaths"])/(datewise["Recovered"]+datewise["Deaths"]).shift(),
                    mode='lines',
                    name='Growth Factor of Closed Cases'))
fig.update_layout(title="Datewise Growth Factor of Active and Closed Cases",
                 xaxis_title="Date",yaxis_title="Growth Factor",
                 legend=dict(x=0,y=-0.4,traceorder="normal"))
fig.show()

A Growth Factor constantly above 1 is a clear indication of exponential increase in all forms of cases.

Rate of Doubling for Confirmed Cases around the World

c=560
double_days=[]
C=[]
while(1):
    double_days.append(datewise[datewise["Confirmed"]<=c]["Days Since"].iloc[-1])
    C.append(c)
    c=c*2
    if(c<datewise["Confirmed"].max()):
        continue
    else:
        break
doubling_rate=pd.DataFrame(list(zip(C,double_days)),columns=["No. of cases","Days since first Case"])
doubling_rate["Number of days for doubling"]=doubling_rate["Days since first Case"].diff().fillna(doubling_rate["Days since first Case"])
doubling_rate
No. of cases Days since first Case Number of days for doubling
0 560 0 days 0 days
1 1120 2 days 2 days
2 2240 4 days 2 days
3 4480 5 days 1 days
4 8960 8 days 3 days
5 17920 11 days 3 days
6 35840 16 days 5 days
7 71680 25 days 9 days
8 143360 50 days 25 days
9 286720 58 days 8 days
10 573440 64 days 6 days
11 1146880 72 days 8 days
12 2293760 86 days 14 days
13 4587520 114 days 28 days
14 9175040 152 days 38 days
15 18350080 194 days 42 days
16 36700160 260 days 66 days
17 73400320 327 days 67 days
18 146800640 458 days 131 days

The Doubling Rate is fluctuating a lot, whereas it is ideally supposed to increase if we are successfully flattening the curve.
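The doubling table and the growth factor are two views of the same thing: under steady exponential growth with daily growth factor g, the doubling time is ln 2 / ln g. Plugging in the average confirmed-case growth factor reported above (about 1.028) gives a back-of-the-envelope estimate of roughly 25 days; this assumes a constant rate, which the table shows is clearly not the case:

```python
import math

def doubling_time(daily_growth_factor: float) -> float:
    """Days to double under a constant daily growth factor g > 1."""
    return math.log(2) / math.log(daily_growth_factor)

print(round(doubling_time(1.0281591322080432), 1))  # average factor from above
print(round(doubling_time(2.0), 1))                 # factor 2 doubles daily
```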

Number of days required for an increase in Confirmed Cases by 300K

c1=100000
days_300k=[]
C1=[]
while(1):
    days_300k.append(datewise[datewise["Confirmed"]<=c1]["Days Since"].iloc[-1])
    C1.append(c1)
    c1=c1+300000
    if(c1<datewise["Confirmed"].max()):
        continue
    else:
        break
rate_300k=pd.DataFrame(list(zip(C1,days_300k)),columns=["No. of Cases","Days Since first Case"])
rate_300k["Days required for rise of 300K"]=rate_300k["Days Since first Case"].diff().fillna(rate_300k["Days Since first Case"].iloc[0])

fig=go.Figure()
fig.add_trace(go.Scatter(x=rate_300k["No. of Cases"], y=rate_300k["Days required for rise of 300K"].dt.days,
                    mode='lines+markers',
                    name='Days required for rise of 300K'))
fig.update_layout(title="Number of Days required for increase in number of cases by 300K",
                 xaxis_title="Number of Cases",yaxis_title="Number of Days")
fig.show()

It is hardly taking a day or two for cases to rise by 300K, which is a pretty clear indication that we are still not able to "Flatten the Curve".

Countrywise Analysis

countrywise=covid[covid["ObservationDate"]==covid["ObservationDate"].max()].groupby(["Country/Region"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'}).sort_values(["Confirmed"],ascending=False)
countrywise["Mortality"]=(countrywise["Deaths"]/countrywise["Confirmed"])*100
countrywise["Recovery"]=(countrywise["Recovered"]/countrywise["Confirmed"])*100
country_last_24_confirmed=[]
country_last_24_recovered=[]
country_last_24_deaths=[]
for country in countrywise.index:
    country_last_24_confirmed.append((grouped_country.loc[country].iloc[-1]-grouped_country.loc[country].iloc[-2])["Confirmed"])
    country_last_24_recovered.append((grouped_country.loc[country].iloc[-1]-grouped_country.loc[country].iloc[-2])["Recovered"])
    country_last_24_deaths.append((grouped_country.loc[country].iloc[-1]-grouped_country.loc[country].iloc[-2])["Deaths"])
Last_24_Hours_country=pd.DataFrame(list(zip(countrywise.index,country_last_24_confirmed,country_last_24_recovered,country_last_24_deaths)),
                                   columns=["Country Name","Last 24 Hours Confirmed","Last 24 Hours Recovered","Last 24 Hours Deaths"])
Top_15_Confirmed_24hr=Last_24_Hours_country.sort_values(["Last 24 Hours Confirmed"],ascending=False).head(15)
Top_15_Recoverd_24hr=Last_24_Hours_country.sort_values(["Last 24 Hours Recovered"],ascending=False).head(15)
Top_15_Deaths_24hr=Last_24_Hours_country.sort_values(["Last 24 Hours Deaths"],ascending=False).head(15)


fig, (ax1, ax2, ax3) = plt.subplots(3, 1,figsize=(10,20))
sns.barplot(x=Top_15_Confirmed_24hr["Last 24 Hours Confirmed"],y=Top_15_Confirmed_24hr["Country Name"],ax=ax1)
ax1.set_title("Top 15 Countries with Highest Number of Confirmed Cases in Last 24 Hours")
sns.barplot(x=Top_15_Recoverd_24hr["Last 24 Hours Recovered"],y=Top_15_Recoverd_24hr["Country Name"],ax=ax2)
ax2.set_title("Top 15 Countries with Highest Number of Recovered Cases in Last 24 Hours")
sns.barplot(x=Top_15_Deaths_24hr["Last 24 Hours Deaths"],y=Top_15_Deaths_24hr["Country Name"],ax=ax3)
ax3.set_title("Top 15 Countries with Highest Number of Death Cases in Last 24 Hours")
Last_24_Hours_country["Proportion of Confirmed"]=(Last_24_Hours_country["Last 24 Hours Confirmed"]/(datewise["Confirmed"].iloc[-1]-datewise["Confirmed"].iloc[-2]))*100
Last_24_Hours_country["Proportion of Recovered"]=(Last_24_Hours_country["Last 24 Hours Recovered"]/(datewise["Recovered"].iloc[-1]-datewise["Recovered"].iloc[-2]))*100
Last_24_Hours_country["Proportion of Deaths"]=(Last_24_Hours_country["Last 24 Hours Deaths"]/(datewise["Deaths"].iloc[-1]-datewise["Deaths"].iloc[-2]))*100

Proportion of Countries in Confirmed, Recovered and Death Cases

Last_24_Hours_country[["Country Name","Proportion of Confirmed","Proportion of Recovered","Proportion of Deaths"]].sort_values(["Proportion of Confirmed"],ascending=False).style.background_gradient(cmap="Reds")
  Country Name Proportion of Confirmed Proportion of Recovered Proportion of Deaths
1 India 34.430314 54.434397 32.946106
2 Brazil 16.569093 0.694050 19.158256
8 Argentina 6.206079 7.018125 3.951628
11 Colombia 4.262169 3.873325 5.141878
0 US 2.490667 0.000000 3.266045
3 France 2.397288 0.134358 0.628452
5 Russia 1.903980 1.801812 3.761188
39 Malaysia 1.875903 1.088849 0.933156
22 Chile 1.708694 1.515957 1.133118
4 Turkey 1.592230 2.202522 1.304513
23 Philippines 1.544813 1.463357 1.485431
12 Iran 1.478054 2.911939 1.647305
17 Indonesia 1.365333 1.067179 1.542563
61 Uruguay 1.242422 0.721040 0.552276
16 Peru 1.144051 0.931442 1.542563
82 Thailand 0.998887 0.000000 0.323748
20 South Africa 0.939823 0.524823 0.666540
9 Germany 0.938576 1.388889 0.504666
40 Nepal 0.896565 1.197794 1.104552
33 Japan 0.749529 1.420410 0.866502
18 Netherlands 0.704192 0.005910 0.076176
7 Italy 0.696289 1.491135 0.790326
71 Bahrain 0.680691 0.499015 0.152352
24 Iraq 0.677363 1.068755 0.219006
6 UK 0.674036 0.000197 0.066654
15 Ukraine 0.671748 2.029748 1.542563
78 Sri Lanka 0.599374 0.399724 0.000000
14 Mexico 0.566722 0.220843 3.646924
28 Pakistan 0.560899 0.516154 0.533232
53 Paraguay 0.505163 0.438731 0.904590
21 Canada 0.471264 0.764972 0.257094
51 Bolivia 0.445891 0.298660 0.552276
27 Belgium 0.386827 0.000000 0.104742
38 United Arab Emirates 0.376844 0.350473 0.047610
45 Ecuador 0.327763 0.000000 0.733194
47 Greece 0.311333 0.000000 0.276138
54 Tunisia 0.295736 0.295902 0.580842
72 Venezuela 0.269531 0.229708 0.190440
60 Dominican Republic 0.257469 0.038416 0.047610
84 Cuba 0.247070 0.285461 0.095220
59 Kuwait 0.235840 0.225374 0.038088
65 Egypt 0.232720 0.240544 0.485622
43 Saudi Arabia 0.230017 0.250985 0.133308
105 Maldives 0.220866 0.270095 0.019044
62 Denmark 0.217122 0.171395 0.038088
32 Bangladesh 0.216914 0.233846 0.361836
48 Belarus 0.206308 0.238968 0.095220
55 Georgia 0.205060 0.116233 0.114264
103 Afghanistan 0.204020 0.031915 0.171396
99 Cameroon 0.196949 0.000000 0.047610
68 Guatemala 0.188838 0.150709 0.257094
13 Poland 0.161386 0.386919 1.190249
112 Uganda 0.145580 0.000000 0.009522
29 Portugal 0.126655 0.100473 0.000000
123 Cambodia 0.122287 0.081757 0.066654
50 Panama 0.119376 0.095745 0.038088
63 Lithuania 0.108145 0.342790 0.095220
154 Vietnam 0.106481 0.000000 0.000000
70 Honduras 0.104194 0.032703 0.114264
37 Austria 0.103570 0.168834 0.038088
151 Taiwan 0.102114 0.000000 0.199962
85 South Korea 0.099618 0.126478 0.057132
31 Hungary 0.096707 0.607171 0.257094
19 Czech Republic 0.094419 0.490544 0.123786
79 Kenya 0.092339 0.022656 0.161874
42 Morocco 0.085268 0.058511 0.028566
108 Namibia 0.081317 0.037037 0.104742
128 Trinidad and Tobago 0.079237 0.061269 0.114264
92 Kyrgyzstan 0.074038 0.090229 0.085698
69 Slovenia 0.073414 0.097124 0.019044
52 Croatia 0.070294 0.097912 0.095220
95 Zambia 0.066759 0.029748 0.009522
86 Latvia 0.063639 0.235028 0.104742
35 Serbia 0.056984 0.000000 0.095220
58 West Bank and Gaza 0.055944 0.078605 0.028566
25 Romania 0.055112 0.159574 0.695106
93 Uzbekistan 0.054904 0.054965 0.009522
89 Algeria 0.053657 0.036840 0.047610
64 Ethiopia 0.053241 0.182033 0.038088
90 Norway 0.050953 0.000000 0.000000
41 Lebanon 0.050745 0.261820 0.066654
118 Angola 0.049081 0.013593 0.076176
134 Suriname 0.043674 0.024823 0.057132
56 Azerbaijan 0.034107 0.082545 0.066654
74 Qatar 0.032444 0.066391 0.019044
121 Cabo Verde 0.031404 0.030930 0.019044
155 Timor-Leste 0.027036 0.000000 0.009522
88 Estonia 0.026412 0.074074 0.028566
46 Bulgaria 0.023917 0.020686 0.047610
176 Bhutan 0.020797 0.000000 0.000000
131 Guyana 0.020381 0.018125 0.019044
44 Kazakhstan 0.019965 0.017730 0.066654
136 Haiti 0.018925 0.003349 0.095220
110 Jamaica 0.017886 0.032112 0.057132
127 Guinea 0.012894 0.025808 0.000000
143 Bahamas 0.012686 0.027384 0.000000
101 Cyprus 0.011646 0.000000 0.000000
129 Mauritania 0.011646 0.005516 0.009522
149 Equatorial Guinea 0.011022 0.000000 0.047610
114 Madagascar 0.010399 0.029157 0.019044
94 Montenegro 0.009983 0.015760 0.028566
163 Burundi 0.009567 0.000000 0.000000
66 Moldova 0.009359 0.021474 0.028566
73 Armenia 0.008735 0.036643 0.047610
83 Burma 0.008319 0.003546 0.000000
165 Eritrea 0.008111 0.000000 0.000000
111 Ivory Coast 0.007903 0.010047 0.019044
113 Senegal 0.007279 0.007880 0.000000
106 Singapore 0.006863 0.003152 0.000000
80 Nigeria 0.006447 0.002167 0.000000
119 Congo (Kinshasa) 0.006239 0.002167 0.000000
125 Syria 0.006239 0.000985 0.047610
81 North Macedonia 0.006031 0.018125 0.104742
102 Mozambique 0.005615 0.052600 0.009522
30 Israel 0.004159 0.002758 0.000000
115 Zimbabwe 0.003120 0.002955 0.019044
124 Rwanda 0.002912 0.011820 0.000000
174 Saint Vincent and the Grenadines 0.002704 0.000000 0.000000
122 Australia 0.002704 0.000788 0.000000
87 Albania 0.002496 0.023247 0.009522
98 Mainland China 0.002288 0.000985 0.000000
156 Yemen 0.001664 0.004728 0.028566
177 Mauritius 0.001456 0.000394 0.000000
133 Somalia 0.001248 0.001773 0.000000
139 Burkina Faso 0.001248 0.000000 0.000000
117 Malawi 0.001248 0.000985 0.000000
159 Niger 0.000832 0.002364 0.000000
166 Barbados 0.000624 0.000000 0.000000
175 Laos 0.000624 0.023838 0.000000
135 Mali 0.000624 0.025808 0.009522
120 Malta 0.000624 0.000394 0.000000
130 Eswatini 0.000416 0.001182 0.000000
170 New Zealand 0.000416 0.001379 0.000000
167 Comoros 0.000416 0.000591 0.000000
169 Liechtenstein 0.000416 0.000000 0.000000
162 Chad 0.000208 0.000000 0.000000
172 Sao Tome and Principe 0.000208 0.000591 0.000000
142 Hong Kong 0.000208 0.001182 0.000000
164 Sierra Leone 0.000208 0.000591 0.000000
168 Guinea-Bissau 0.000208 0.000985 0.000000
179 Diamond Princess 0.000000 0.000000 0.000000
189 MS Zaandam 0.000000 0.000000 0.000000
161 Saint Lucia 0.000000 0.000000 0.000000
180 Tanzania 0.000000 0.000000 0.000000
193 Kiribati 0.000000 0.000000 0.000000
192 Samoa 0.000000 0.000000 0.000000
181 Fiji 0.000000 0.000000 0.000000
191 Marshall Islands 0.000000 0.000000 0.000000
190 Vanuatu 0.000000 0.000000 0.000000
160 San Marino 0.000000 0.000000 0.000000
184 Grenada 0.000000 0.000000 0.000000
187 Holy See 0.000000 0.000000 0.000000
185 Saint Kitts and Nevis 0.000000 0.000000 0.000000
186 Macau 0.000000 0.000000 0.000000
188 Solomon Islands 0.000000 0.000000 0.000000
171 Monaco 0.000000 0.000000 0.000000
182 Brunei 0.000000 0.000000 0.000000
173 Liberia 0.000000 0.000000 0.000000
183 Dominica 0.000000 0.000000 0.000000
178 Antigua and Barbuda 0.000000 0.000000 0.000000
97 Finland 0.000000 0.000000 0.000000
158 Gambia 0.000000 0.000000 0.000000
75 Oman 0.000000 0.000000 0.000000
104 Luxembourg 0.000000 0.000000 0.000000
100 El Salvador 0.000000 0.000000 0.038088
96 Ghana 0.000000 0.000000 0.000000
91 Kosovo 0.000000 0.000000 0.000000
77 Libya 0.000000 0.000000 0.000000
76 Bosnia and Herzegovina 0.000000 0.000000 0.000000
67 Ireland 0.000000 0.000000 0.000000
157 Iceland 0.000000 0.000000 0.000000
57 Costa Rica 0.000000 0.000000 0.000000
49 Slovakia 0.000000 0.000000 0.000000
36 Switzerland 0.000000 0.000000 0.038088
34 Jordan 0.000000 0.052994 0.000000
26 Sweden 0.000000 0.000000 0.000000
10 Spain 0.000000 0.000000 0.000000
107 Mongolia 0.000000 0.000000 0.000000
109 Botswana 0.000000 0.000000 0.000000
116 Sudan 0.000000 0.000000 0.000000
126 Gabon 0.000000 0.000000 0.000000
132 Papua New Guinea 0.000000 0.000000 0.000000
137 Andorra 0.000000 0.000000 0.000000
138 Togo 0.000000 0.000000 0.000000
140 Tajikistan 0.000000 0.000000 0.000000
141 Belize 0.000000 0.000000 0.000000
144 Congo (Brazzaville) 0.000000 0.000000 0.000000
145 Djibouti 0.000000 0.000000 0.000000
146 Seychelles 0.000000 0.000000 0.000000
147 Lesotho 0.000000 0.000000 0.000000
148 South Sudan 0.000000 0.000000 0.000000
150 Benin 0.000000 0.000000 0.000000
152 Nicaragua 0.000000 0.000000 0.000000
153 Central African Republic 0.000000 0.000000 0.000000
194 Micronesia 0.000000 0.000000 0.000000
fig, (ax1, ax2) = plt.subplots(2, 1,figsize=(10,12))
top_15_confirmed=countrywise.sort_values(["Confirmed"],ascending=False).head(15)
top_15_deaths=countrywise.sort_values(["Deaths"],ascending=False).head(15)
sns.barplot(x=top_15_confirmed["Confirmed"],y=top_15_confirmed.index,ax=ax1)
ax1.set_title("Top 15 countries as per Number of Confirmed Cases")
sns.barplot(x=top_15_deaths["Deaths"],y=top_15_deaths.index,ax=ax2)
ax2.set_title("Top 15 countries as per Number of Death Cases")
Text(0.5, 1.0, 'Top 15 countries as per Number of Death Cases')

Another interesting thing to look at is the median age of the worst affected countries.
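There is no median-age column in this dataset, so this is only a sketch: it joins a hypothetical `median_age` Series (illustrative values; real figures would come from an external source such as UN population data) onto a toy stand-in for the `countrywise` frame.

```python
import pandas as pd

# Hypothetical median ages (years) for a few badly affected countries --
# illustrative values only, not taken from this dataset
median_age = pd.Series({"Italy": 47.3, "US": 38.5, "Brazil": 33.2, "India": 28.7},
                       name="Median Age")

# Toy stand-in for the countrywise frame used elsewhere in the notebook
countrywise_demo = pd.DataFrame(
    {"Confirmed": [4213055, 33251939, 16471600, 27894800]},
    index=["Italy", "US", "Brazil", "India"])

# Join the demographic column onto the case counts for side-by-side comparison
combined = countrywise_demo.join(median_age)
print(combined.sort_values("Median Age", ascending=False))
```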

Top 15 Countries as per Mortality Rate and Recovery Rate with more than 500 Confirmed Cases

fig, (ax1, ax2) = plt.subplots(2, 1,figsize=(10,15))
countrywise_plot_mortal=countrywise[countrywise["Confirmed"]>500].sort_values(["Mortality"],ascending=False).head(15)
sns.barplot(x=countrywise_plot_mortal["Mortality"],y=countrywise_plot_mortal.index,ax=ax1)
ax1.set_title("Top 15 Countries with Highest Mortality Rate")
ax1.set_xlabel("Mortality (in Percentage)")
countrywise_plot_recover=countrywise[countrywise["Confirmed"]>500].sort_values(["Recovery"],ascending=False).head(15)
sns.barplot(x=countrywise_plot_recover["Recovery"],y=countrywise_plot_recover.index, ax=ax2)
ax2.set_title("Top 15 Countries with Highest Recovery Rate")
ax2.set_xlabel("Recovery (in Percentage)")
Text(0.5, 0, 'Recovery (in Percentage)')
fig, (ax1, ax2) = plt.subplots(2, 1,figsize=(10,15))
countrywise_plot_mortal=countrywise[countrywise["Confirmed"]>500].sort_values(["Mortality"],ascending=False).tail(15)
sns.barplot(x=countrywise_plot_mortal["Mortality"],y=countrywise_plot_mortal.index,ax=ax1)
ax1.set_title("Top 15 Countries with Lowest Mortality Rate")
ax1.set_xlabel("Mortality (in Percentage)")
countrywise_plot_recover=countrywise[countrywise["Confirmed"]>500].sort_values(["Recovery"],ascending=False).tail(15)
sns.barplot(x=countrywise_plot_recover["Recovery"],y=countrywise_plot_recover.index, ax=ax2)
ax2.set_title("Top 15 Countries with Lowest Recovery Rate")
ax2.set_xlabel("Recovery (in Percentage)")
Text(0.5, 0, 'Recovery (in Percentage)')

No Recovered Patients with considerable Mortality Rate

no_recovered_countries=countrywise[(countrywise["Recovered"]==0)][["Confirmed","Deaths"]]
no_recovered_countries["Mortality Rate"]=(no_recovered_countries["Deaths"]/no_recovered_countries["Confirmed"])*100
no_recovered_countries=no_recovered_countries[no_recovered_countries["Mortality Rate"]>0].sort_values(["Mortality Rate"],ascending=False)
no_recovered_countries.style.background_gradient('Reds')
  Confirmed Deaths Mortality Rate
Country/Region      
Belgium 1059763.000000 24921.000000 2.351564
US 33251939.000000 594306.000000 1.787282
Sweden 1068473.000000 14451.000000 1.352491
Serbia 712046.000000 6844.000000 0.961174

Among these countries, the US currently has the maximum number of Confirmed Cases with no Recovered patients recorded, while Belgium's mortality rate is noticeably high compared to the overall mortality rate of the World.

Countries with more than 100 Confirmed Cases and No Deaths with considerably high Recovery Rate

no_deaths=countrywise[(countrywise["Confirmed"]>100)&(countrywise["Deaths"]==0)]
no_deaths=no_deaths[no_deaths["Recovery"]>0].sort_values(["Recovery"],ascending=False).drop(["Mortality"],axis=1)
no_deaths.style.background_gradient(cmap="Reds")
  Confirmed Recovered Deaths Recovery
Country/Region        
Dominica 188.000000 182.000000 0.000000 96.808511

As per the current data, Dominica is the only country meeting this filter: it has contained COVID-19 well, with no Deaths recorded so far and a healthy Recovery Rate of over 96%.

Vietnam, widely credited with strong early containment, was among the first countries to report human-to-human transmission of COVID-19 to the World Health Organization.

fig, (ax1, ax2) = plt.subplots(2, 1,figsize=(10,15))
countrywise["Active Cases"]=(countrywise["Confirmed"]-countrywise["Recovered"]-countrywise["Deaths"])
countrywise["Outcome Cases"]=(countrywise["Recovered"]+countrywise["Deaths"])
top_15_active=countrywise.sort_values(["Active Cases"],ascending=False).head(15)
top_15_outcome=countrywise.sort_values(["Outcome Cases"],ascending=False).head(15)
sns.barplot(x=top_15_active["Active Cases"],y=top_15_active.index,ax=ax1)
sns.barplot(x=top_15_outcome["Outcome Cases"],y=top_15_outcome.index,ax=ax2)
ax1.set_title("Top 15 Countries with Most Number of Active Cases")
ax2.set_title("Top 15 Countries with Most Number of Closed Cases")
Text(0.5, 1.0, 'Top 15 Countries with Most Number of Closed Cases')
# confirm_rate=[]
# for country in countrywise.index:
#     days=country_date.loc[country].shape[0]
#     confirm_rate.append((countrywise.loc[country]["Confirmed"])/days)
# countrywise["Confirm Cases/Day"]=confirm_rate
# top_15_ccpd=countrywise.sort_values(["Confirm Cases/Day"],ascending=False).head(15)
# sns.barplot(y=top_15_ccpd.index,x=top_15_ccpd["Confirm Cases/Day"],ax=ax1)
# ax1.set_title("Top 15 countries as per high number Confirmed Cases per Day")
# bottom_15_ccpd=countrywise[countrywise["Confirmed"]>1000].sort_values(["Confirm Cases/Day"],ascending=False).tail(15)
# sns.barplot(y=bottom_15_ccpd.index,x=bottom_15_ccpd["Confirm Cases/Day"],ax=ax2)
# ax2.set_title("Top 15 countries as per Lowest Confirmed Cases per Day having more than 1000 Confirmed Cases")

Mainland China has closed almost all of its cases, with a staggering Recovery Rate of 85%+.

Confirmed Cases/Day is a clear indication of why the US currently has the highest number of Active Cases: the rate is 11,000+ cases per day, and that value is increasing every day.
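The commented-out cell above computes this Confirmed Cases/Day figure with the `.ix` indexer, which was removed in pandas 1.0. A minimal sketch of the same calculation with `.loc`, on a toy stand-in for `country_date`:

```python
import pandas as pd

# Toy stand-in for country_date: cumulative Confirmed counts per (country, date)
idx = pd.MultiIndex.from_product(
    [["US", "Italy"], pd.date_range("2020-03-01", periods=4)],
    names=["Country/Region", "ObservationDate"])
country_date = pd.DataFrame(
    {"Confirmed": [10, 50, 200, 800, 5, 20, 60, 150]}, index=idx)

# Confirmed Cases/Day = latest cumulative count divided by days observed
confirm_rate = {}
for country in country_date.index.get_level_values(0).unique():
    days = country_date.loc[country].shape[0]
    confirm_rate[country] = country_date.loc[country]["Confirmed"].iloc[-1] / days
print(confirm_rate)  # {'US': 200.0, 'Italy': 37.5}
```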

fig, (ax1, ax2) = plt.subplots(2, 1,figsize=(10,15))
countrywise["Survival Probability"]=(1-(countrywise["Deaths"]/countrywise["Confirmed"]))*100
top_25_survival=countrywise[countrywise["Confirmed"]>1000].sort_values(["Survival Probability"],ascending=False).head(15)
sns.barplot(x=top_25_survival["Survival Probability"],y=top_25_survival.index,ax=ax1)
ax1.set_title("Top 15 Countries with Maximum Survival Probability having more than 1000 Confirmed Cases")
print('Mean Survival Probability across all countries',countrywise["Survival Probability"].mean())
print('Median Survival Probability across all countries',countrywise["Survival Probability"].median())
print('Mean Death Probability across all countries',100-countrywise["Survival Probability"].mean())
print('Median Death Probability across all countries',100-countrywise["Survival Probability"].median())

Bottom_5_countries=countrywise[countrywise["Confirmed"]>100].sort_values(["Survival Probability"],ascending=True).head(15)
sns.barplot(x=Bottom_5_countries["Survival Probability"],y=Bottom_5_countries.index,ax=ax2)
plt.title("Bottom 15 Countries as per Survival Probability")
Mean Survival Probability across all countries 97.83094336678441
Median Survival Probability across all countries 98.38462415588694
Mean Death Probability across all countries 2.1690566332155896
Median Death Probability across all countries 1.6153758441130606
Text(0.5, 1.0, 'Bottom 15 Countries as per Survival Probability')

Survival Probability is the graph that looks the most promising, with an average survival probability of 97%+ across all countries. The difference between the Mean and Median Death Probability is a clear indication that a few countries have a really high mortality rate, e.g. Italy, Algeria, and the UK.
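That Mean vs Median gap can be reproduced on made-up numbers: a handful of extreme mortality rates pulls the mean up while barely moving the median (values below are illustrative, not from this dataset).

```python
import numpy as np

# Illustrative mortality rates (percent): nine typical countries plus two outliers
rates = np.array([1.0, 1.2, 1.5, 1.6, 1.8, 2.0, 2.1, 2.3, 2.5, 12.0, 19.0])

# The two outliers drag the mean well above the median -- the same skew
# seen in the printed Death Probability stats above
print(round(np.mean(rates), 2), np.median(rates))  # 4.27 2.0
```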

Journey of different Countries in COVID-19

When we see daily news reports on COVID-19 it's really hard to interpret what's actually happening, since the numbers change so rapidly, but that is exactly what exponential growth looks like. Since almost all pandemics tend to grow exponentially, they are really hard to follow for someone from a non-mathematical or non-statistical background.

We are more concerned with how we are doing and where we are heading in this pandemic than with the exponentially growing numbers themselves. The growth won't stay exponential forever; at some point the curve will flatten, either because most people on the planet have been infected or because we have managed to control the disease.

When we are in the middle of the exponential growth it's almost impossible to tell where we are heading.

Here, I am trying to show how we can interpret the exponential growth, which is the common trend among all the countries.
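The flattening idea can be sketched numerically (illustrative parameters, not fitted to any country): exponential growth multiplies without bound, while logistic growth with the same early rate saturates at a carrying capacity K.

```python
import numpy as np

days = np.arange(60)
# Exponential growth at rate 0.15/day, starting from 100 cases
exponential = 100 * np.exp(0.15 * days)
# Logistic growth with the same initial rate, saturating at K = 100,000 cases
K = 100_000
logistic = K / (1 + ((K - 100) / 100) * np.exp(-0.15 * days))

# Early on the two curves are nearly indistinguishable...
print(round(exponential[10]), round(logistic[10]))  # 448 447
# ...but only the logistic curve flattens as it approaches K
print(round(exponential[-1]), round(logistic[-1]))
```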

fig=go.Figure()
for country in countrywise.head(10).index:
    fig.add_trace(go.Scatter(x=grouped_country.loc[country]["log_confirmed"], y=grouped_country.loc[country]["log_active"],
                    mode='lines',name=country))
fig.update_layout(height=600,title="COVID-19 Journey of Top 10 Worst Affected Countries",
                 xaxis_title="Confirmed Cases (Logarithmic Scale)",yaxis_title="Active Cases (Logarithmic Scale)",
                 legend=dict(x=0,y=1,traceorder="normal"))
fig.show()

It's pretty evident that the disease is spreading in a similar manner everywhere, but where a particular country follows pandemic-control practices rigorously, the results are evident in the graph.

Most countries are following the same trajectory as the USA, which is "Uncontrolled Exponential Growth".

There are a few countries where pandemic-control practices seem to be working: classic examples are China, Germany, Italy, Spain, and Turkey, which have started showing the dip that indicates they have somehow got control over COVID-19.

Countries like the United Kingdom and Russia are following similar lines to the United States, indicating that growth is still exponential in those countries.

Iran is showing some occasional drops.

fig=go.Figure()
for country in countrywise.head(10).index:
    fig.add_trace(go.Scatter(x=grouped_country.loc[country].index, y=grouped_country.loc[country]["Confirmed"].rolling(window=7).mean().diff(),
                    mode='lines',name=country))
fig.update_layout(height=600,title="7 Days Rolling Average of Daily increase of Confirmed Cases",
                 xaxis_title="Date",yaxis_title="Confirmed Cases",
                 legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
fig=go.Figure()
for country in countrywise.head(10).index:
    fig.add_trace(go.Scatter(x=grouped_country.loc[country].index, 
                             y=grouped_country.loc[country]["Deaths"].rolling(window=7).mean().diff(),
                    mode='lines',name=country))
fig.update_layout(height=600,title="7 Days Rolling Average of Daily increase of Death Cases",
                 xaxis_title="Date",yaxis_title="Death Cases",
                 legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
fig=go.Figure()
for country in countrywise.head(10).index:
    fig.add_trace(go.Scatter(x=grouped_country.loc[country].index, 
                             y=grouped_country.loc[country]["Recovered"].rolling(window=7).mean().diff(),
                    mode='lines',name=country))
fig.update_layout(height=600,title="7 Days Rolling Average of Daily increase of Recovered Cases",
                 xaxis_title="Date",yaxis_title="Recovered Cases",
                 legend=dict(x=0,y=1,traceorder="normal"))
fig.show()

Clustering of Countries

The clustering of countries can be done considering different features. Here I'm trying to cluster countries based on the Mortality Rate and Recovery Rate of each individual country.

As we are all well aware, COVID-19 has a different Mortality Rate in different countries depending on various factors, and likewise a different Recovery Rate depending on the pandemic-control practices followed by each country. Also, Mortality Rate and Recovery Rate together take into account all three types of cases: Confirmed, Recovered, and Deaths.

Let's checkout how these clusters look like!

X=countrywise[["Mortality","Recovery"]]
#Standard Scaling, since K-Means Clustering is a distance-based algorithm
X=std.fit_transform(X) 
wcss=[]
sil=[]
for i in range(2,11):
    clf=KMeans(n_clusters=i,init='k-means++',random_state=42)
    clf.fit(X)
    labels=clf.labels_
    centroids=clf.cluster_centers_
    sil.append(silhouette_score(X, labels, metric='euclidean'))
    wcss.append(clf.inertia_)
x=np.arange(2,11)
plt.figure(figsize=(10,5))
plt.plot(x,wcss,marker='o')
plt.xlabel("Number of Clusters")
plt.ylabel("Within Cluster Sum of Squares (WCSS)")
plt.title("Elbow Method")
Text(0.5, 1.0, 'Elbow Method')
import scipy.cluster.hierarchy as sch
plt.figure(figsize=(20,15))
dendrogram=sch.dendrogram(sch.linkage(X, method="ward"))

Both methods, the Elbow Method and Hierarchical Clustering (dendrogram), suggest that K=3 is the correct number of clusters.
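The silhouette scores collected in `sil` above can corroborate this choice. A self-contained sketch on synthetic data (three separated blobs standing in for the scaled Mortality/Recovery features): the K with the maximum silhouette score is the preferred cluster count.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic 2-D data with three well-separated groups
rng = np.random.RandomState(42)
X_demo = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
                    for c in [(0, 0), (5, 0), (0, 5)]])

# Pick K by maximum silhouette score, complementing the elbow plot
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=42).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels, metric="euclidean")
best_k = max(scores, key=scores.get)
print(best_k)  # 3
```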

clf_final=KMeans(n_clusters=3,init='k-means++',random_state=6)
clf_final.fit(X)
KMeans(n_clusters=3, random_state=6)
countrywise["Clusters"]=clf_final.predict(X)

Summary of Clusters

cluster_summary=pd.concat([countrywise[countrywise["Clusters"]==1].head(15),countrywise[countrywise["Clusters"]==2].head(15),countrywise[countrywise["Clusters"]==0].head(15)])
cluster_summary.style.background_gradient(cmap='Reds').format("{:.2f}")
  Confirmed Recovered Deaths Mortality Recovery Active Cases Outcome Cases Survival Probability Clusters
Country/Region                  
US 33251939.00 0.00 594306.00 1.79 0.00 32657633.00 594306.00 98.21 1.00
France 5719877.00 390878.00 109518.00 1.91 6.83 5219481.00 500396.00 98.09 1.00
UK 4496823.00 15481.00 128037.00 2.85 0.34 4353305.00 143518.00 97.15 1.00
Spain 3668658.00 150376.00 79905.00 2.18 4.10 3438377.00 230281.00 97.82 1.00
Netherlands 1671967.00 26810.00 17889.00 1.07 1.60 1627268.00 44699.00 98.93 1.00
Sweden 1068473.00 0.00 14451.00 1.35 0.00 1054022.00 14451.00 98.65 1.00
Belgium 1059763.00 0.00 24921.00 2.35 0.00 1034842.00 24921.00 97.65 1.00
Serbia 712046.00 0.00 6844.00 0.96 0.00 705202.00 6844.00 99.04 1.00
Switzerland 693023.00 317600.00 10805.00 1.56 45.83 364618.00 328405.00 98.44 1.00
Greece 400395.00 93764.00 12024.00 3.00 23.42 294607.00 105788.00 97.00 1.00
Ireland 254870.00 23364.00 4941.00 1.94 9.17 226565.00 28305.00 98.06 1.00
Honduras 236952.00 84389.00 6296.00 2.66 35.61 146267.00 90685.00 97.34 1.00
Thailand 151842.00 26873.00 988.00 0.65 17.70 123981.00 27861.00 99.35 1.00
Norway 124655.00 17998.00 783.00 0.63 14.44 105874.00 18781.00 99.37 1.00
Finland 92244.00 46000.00 948.00 1.03 49.87 45296.00 46948.00 98.97 1.00
Yemen 6731.00 3399.00 1319.00 19.60 50.50 2013.00 4718.00 80.40 2.00
MS Zaandam 9.00 7.00 2.00 22.22 77.78 0.00 9.00 77.78 2.00
Vanuatu 4.00 3.00 1.00 25.00 75.00 0.00 4.00 75.00 2.00
India 27894800.00 25454320.00 325972.00 1.17 91.25 2114508.00 25780292.00 98.83 0.00
Brazil 16471600.00 14496224.00 461057.00 2.80 88.01 1514319.00 14957281.00 97.20 0.00
Turkey 5235978.00 5094279.00 47271.00 0.90 97.29 94428.00 5141550.00 99.10 0.00
Russia 4995613.00 4616422.00 118781.00 2.38 92.41 260410.00 4735203.00 97.62 0.00
Italy 4213055.00 3845087.00 126002.00 2.99 91.27 241966.00 3971089.00 97.01 0.00
Argentina 3732263.00 3288467.00 77108.00 2.07 88.11 366688.00 3365575.00 97.93 0.00
Germany 3684672.00 3479700.00 88413.00 2.40 94.44 116559.00 3568113.00 97.60 0.00
Colombia 3363061.00 3141549.00 87747.00 2.61 93.41 133765.00 3229296.00 97.39 0.00
Iran 2893218.00 2425033.00 79741.00 2.76 83.82 388444.00 2504774.00 97.24 0.00
Poland 2871371.00 2636675.00 73682.00 2.57 91.83 161014.00 2710357.00 97.43 0.00
Mexico 2411503.00 1924865.00 223455.00 9.27 79.82 263183.00 2148320.00 90.73 0.00
Ukraine 2257904.00 2084477.00 52414.00 2.32 92.32 121013.00 2136891.00 97.68 0.00
Peru 1947555.00 1897522.00 68978.00 3.54 97.43 -18945.00 1966500.00 96.46 0.00
Indonesia 1809926.00 1659974.00 50262.00 2.78 91.72 99690.00 1710236.00 97.22 0.00
Czech Republic 1660935.00 1617498.00 30101.00 1.81 97.38 13336.00 1647599.00 98.19 0.00
print("Average Mortality Rate of Cluster 0: ",countrywise[countrywise["Clusters"]==0]["Mortality"].mean())
print("Average Recovery Rate of Cluster 0: ",countrywise[countrywise["Clusters"]==0]["Recovery"].mean())
print("Average Mortality Rate of Cluster 1: ",countrywise[countrywise["Clusters"]==1]["Mortality"].mean())
print("Average Recovery Rate of Cluster 1: ",countrywise[countrywise["Clusters"]==1]["Recovery"].mean())
print("Average Mortality Rate of Cluster 2: ",countrywise[countrywise["Clusters"]==2]["Mortality"].mean())
print("Average Recovery Rate of Cluster 2: ",countrywise[countrywise["Clusters"]==2]["Recovery"].mean())
Average Mortality Rate of Cluster 0:  1.877885707521883
Average Recovery Rate of Cluster 0:  90.74888221767141
Average Mortality Rate of Cluster 1:  1.701640341180339
Average Recovery Rate of Cluster 1:  22.335095583921884
Average Mortality Rate of Cluster 2:  22.272707263793283
Average Recovery Rate of Cluster 2:  67.75849166652911
plt.figure(figsize=(10,5))
sns.scatterplot(x=countrywise["Recovery"],y=countrywise["Mortality"],hue=countrywise["Clusters"],s=100)
plt.axvline(((datewise["Recovered"]/datewise["Confirmed"])*100).mean(),
            color='red',linestyle="--",label="Mean Recovery Rate around the World")
plt.axhline(((datewise["Deaths"]/datewise["Confirmed"])*100).mean(),
            color='black',linestyle="--",label="Mean Mortality Rate around the World")
plt.legend()
<matplotlib.legend.Legend at 0x7f19bf8e4e50>
print("Few Countries belonging to Cluster 0: ",list(countrywise[countrywise["Clusters"]==0].head(10).index))
print("Few Countries belonging to Cluster 1: ",list(countrywise[countrywise["Clusters"]==1].head(10).index))
print("Few Countries belonging to Cluster 2: ",list(countrywise[countrywise["Clusters"]==2].head(10).index))
Few Countries belonging to Cluster 0:  ['India', 'Brazil', 'Turkey', 'Russia', 'Italy', 'Argentina', 'Germany', 'Colombia', 'Iran', 'Poland']
Few Countries belonging to Cluster 1:  ['US', 'France', 'UK', 'Spain', 'Netherlands', 'Sweden', 'Belgium', 'Serbia', 'Switzerland', 'Greece']
Few Countries belonging to Cluster 2:  ['Yemen', 'MS Zaandam', 'Vanuatu']

Cluster 2 is the set of countries with a really High Mortality Rate and a considerably good Recovery Rate. The few countries in this cluster have already seen the worst of this pandemic but are now recovering at a healthy Recovery Rate.

Cluster 0 is the set of countries with a Low Mortality Rate and a really High Recovery Rate. These are the countries that have been able to control COVID-19 by following pandemic-control practices rigorously.

Cluster 1 is the set of countries with a Low Mortality Rate and a really Low Recovery Rate. These countries need to pace up their Recovery Rate to get out of it; some of them have a really high number of Infected Cases, but the Low Mortality Rate is the one positive sign.

Comparison of China, Italy, US, Spain, Brazil and Rest of the World

china_data=covid[covid["Country/Region"]=="Mainland China"]
Italy_data=covid[covid["Country/Region"]=="Italy"]
US_data=covid[covid["Country/Region"]=="US"]
spain_data=covid[covid["Country/Region"]=="Spain"]
brazil_data=covid[covid["Country/Region"]=="Brazil"]
rest_of_world=covid[(covid["Country/Region"]!="Mainland China")&(covid["Country/Region"]!="Italy")&(covid["Country/Region"]!="US")&(covid["Country/Region"]!="Spain")&(covid["Country/Region"]!="Brazil")]

datewise_china=china_data.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
datewise_Italy=Italy_data.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
datewise_US=US_data.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
datewise_Spain=spain_data.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
datewise_Brazil=brazil_data.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
datewise_restofworld=rest_of_world.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise_china.index, y=(datewise_china["Confirmed"]),
                    mode='lines',name="China"))
fig.add_trace(go.Scatter(x=datewise_Italy.index, y=(datewise_Italy["Confirmed"]),
                    mode='lines',name="Italy"))
fig.add_trace(go.Scatter(x=datewise_US.index, y=(datewise_US["Confirmed"]),
                    mode='lines',name="United States"))
fig.add_trace(go.Scatter(x=datewise_Spain.index, y=(datewise_Spain["Confirmed"]),
                    mode='lines',name="Spain"))
fig.add_trace(go.Scatter(x=datewise_Brazil.index, y=(datewise_Brazil["Confirmed"]),
                    mode='lines',name="Brazil"))
fig.add_trace(go.Scatter(x=datewise_restofworld.index, y=(datewise_restofworld["Confirmed"]),
                    mode='lines',name="Rest of the World"))
fig.update_layout(title="Confirmed Cases plot",
                  xaxis_title="Date",yaxis_title="Number of Cases",
                 legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise_china.index, y=(datewise_china["Recovered"]),
                    mode='lines',name="China"))
fig.add_trace(go.Scatter(x=datewise_Italy.index, y=(datewise_Italy["Recovered"]),
                    mode='lines',name="Italy"))
fig.add_trace(go.Scatter(x=datewise_US.index, y=(datewise_US["Recovered"]),
                    mode='lines',name="United States"))
fig.add_trace(go.Scatter(x=datewise_Spain.index, y=(datewise_Spain["Recovered"]),
                    mode='lines',name="Spain"))
fig.add_trace(go.Scatter(x=datewise_Brazil.index, y=(datewise_Brazil["Recovered"]),
                    mode='lines',name="Brazil"))
fig.add_trace(go.Scatter(x=datewise_restofworld.index, y=(datewise_restofworld["Recovered"]),
                    mode='lines',name="Rest of the World"))
fig.update_layout(title="Recovered Cases plot",
                  xaxis_title="Date",yaxis_title="Number of Cases",
                 legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise_china.index, y=(datewise_china["Deaths"]),
                    mode='lines',name="China"))
fig.add_trace(go.Scatter(x=datewise_Italy.index, y=(datewise_Italy["Deaths"]),
                    mode='lines',name="Italy"))
fig.add_trace(go.Scatter(x=datewise_US.index, y=(datewise_US["Deaths"]),
                    mode='lines',name="United States"))
fig.add_trace(go.Scatter(x=datewise_Spain.index, y=(datewise_Spain["Deaths"]),
                    mode='lines',name="Spain"))
fig.add_trace(go.Scatter(x=datewise_Brazil.index, y=(datewise_Brazil["Deaths"]),
                    mode='lines',name="Brazil"))
fig.add_trace(go.Scatter(x=datewise_restofworld.index, y=(datewise_restofworld["Deaths"]),
                    mode='lines',name="Rest of the World"))
fig.update_layout(title="Death Cases plot",
                  xaxis_title="Date",yaxis_title="Number of Cases",
                 legend=dict(x=0,y=1,traceorder="normal"))
fig.show()

China has been able to "flatten the curve", looking at its graphs of Confirmed and Death Cases, with a staggering Recovery Rate.

The US seems to have good control over Deaths, but the number of people getting infected is going way out of hand.

datewise_china["Mortality"]=(datewise_china["Deaths"]/datewise_china["Confirmed"])*100
datewise_Italy["Mortality"]=(datewise_Italy["Deaths"]/datewise_Italy["Confirmed"])*100
datewise_US["Mortality"]=(datewise_US["Deaths"]/datewise_US["Confirmed"])*100
datewise_Spain["Mortality"]=(datewise_Spain["Deaths"]/datewise_Spain["Confirmed"])*100
datewise_Brazil["Mortality"]=(datewise_Brazil["Deaths"]/datewise_Brazil["Confirmed"])*100
datewise_restofworld["Mortality"]=(datewise_restofworld["Deaths"]/datewise_restofworld["Confirmed"])*100

datewise_china["Recovery"]=(datewise_china["Recovered"]/datewise_china["Confirmed"])*100
datewise_Italy["Recovery"]=(datewise_Italy["Recovered"]/datewise_Italy["Confirmed"])*100
datewise_US["Recovery"]=(datewise_US["Recovered"]/datewise_US["Confirmed"])*100
datewise_Spain["Recovery"]=(datewise_Spain["Recovered"]/datewise_Spain["Confirmed"])*100
datewise_Brazil["Recovery"]=(datewise_Brazil["Recovered"]/datewise_Brazil["Confirmed"])*100
datewise_restofworld["Recovery"]=(datewise_restofworld["Recovered"]/datewise_restofworld["Confirmed"])*100
fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise_china.index, y=(datewise_china["Mortality"]),
                    mode='lines',name="China"))
fig.add_trace(go.Scatter(x=datewise_Italy.index, y=(datewise_Italy["Mortality"]),
                    mode='lines',name="Italy"))
fig.add_trace(go.Scatter(x=datewise_US.index, y=(datewise_US["Mortality"]),
                    mode='lines',name="United States"))
fig.add_trace(go.Scatter(x=datewise_Spain.index, y=(datewise_Spain["Mortality"]),
                    mode='lines',name="Spain"))
fig.add_trace(go.Scatter(x=datewise_Brazil.index, y=(datewise_Brazil["Mortality"]),
                    mode='lines',name="Brazil"))
fig.add_trace(go.Scatter(x=datewise_restofworld.index, y=(datewise_restofworld["Mortality"]),
                    mode='lines',name="Rest of the World"))
fig.update_layout(title="Mortality Rate comparison plot",
                  xaxis_title="Date",yaxis_title="Mortality Rate",
                 legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise_china.index, y=(datewise_china["Recovery"]),
                    mode='lines',name="China"))
fig.add_trace(go.Scatter(x=datewise_Italy.index, y=(datewise_Italy["Recovery"]),
                    mode='lines',name="Italy"))
fig.add_trace(go.Scatter(x=datewise_US.index, y=(datewise_US["Recovery"]),
                    mode='lines',name="United States"))
fig.add_trace(go.Scatter(x=datewise_Spain.index, y=(datewise_Spain["Recovery"]),
                    mode='lines',name="Spain"))
fig.add_trace(go.Scatter(x=datewise_Brazil.index, y=(datewise_Brazil["Recovery"]),
                    mode='lines',name="Brazil"))
fig.add_trace(go.Scatter(x=datewise_restofworld.index, y=(datewise_restofworld["Recovery"]),
                    mode='lines',name="Rest of the World"))
fig.update_layout(title="Recovery Rate comparison plot",
                  xaxis_title="Date",yaxis_title="Recovery Rate",
                 legend=dict(x=0,y=1,traceorder="normal"))
fig.show()

The take-off in Spain's Recovery Rate is a good sign, but it is still nowhere in comparison to the Mortality Rate.

It's an alarming sign for the USA and Brazil, as their Recovery Rate is not improving considerably compared to other severely affected countries.

fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise_china.index, y=(datewise_china["Confirmed"]).diff().fillna(0),
                    mode='lines',name="China"))
fig.add_trace(go.Scatter(x=datewise_Italy.index, y=(datewise_Italy["Confirmed"]).diff().fillna(0),
                    mode='lines',name="Italy"))
fig.add_trace(go.Scatter(x=datewise_US.index, y=(datewise_US["Confirmed"]).diff().fillna(0),
                    mode='lines',name="United States"))
fig.add_trace(go.Scatter(x=datewise_Spain.index, y=(datewise_Spain["Confirmed"]).diff().fillna(0),
                    mode='lines',name="Spain"))
fig.add_trace(go.Scatter(x=datewise_Brazil.index, y=(datewise_Brazil["Confirmed"]).diff().fillna(0),
                    mode='lines',name="Brazil"))
fig.add_trace(go.Scatter(x=datewise_restofworld.index, y=(datewise_restofworld["Confirmed"]).diff().fillna(0),
                    mode='lines',name="Rest of the World"))
fig.update_layout(title="Daily increase in Number of Confirmed Cases",
                  xaxis_title="Date",yaxis_title="Number of Cases",
                 legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise_china.index, y=(datewise_china["Deaths"]).diff().fillna(0),
                    mode='lines',name="China"))
fig.add_trace(go.Scatter(x=datewise_Italy.index, y=(datewise_Italy["Deaths"]).diff().fillna(0),
                    mode='lines',name="Italy"))
fig.add_trace(go.Scatter(x=datewise_US.index, y=(datewise_US["Deaths"]).diff().fillna(0),
                    mode='lines',name="United States"))
fig.add_trace(go.Scatter(x=datewise_Spain.index, y=(datewise_Spain["Deaths"]).diff().fillna(0),
                    mode='lines',name="Spain"))
fig.add_trace(go.Scatter(x=datewise_Brazil.index, y=(datewise_Brazil["Deaths"]).diff().fillna(0),
                    mode='lines',name="Brazil"))
fig.add_trace(go.Scatter(x=datewise_restofworld.index, y=(datewise_restofworld["Deaths"]).diff().fillna(0),
                    mode='lines',name="Rest of the World"))
fig.update_layout(title="Daily increase in Number of Death Cases",
                  xaxis_title="Date",yaxis_title="Number of Cases",
                 legend=dict(x=0,y=1,traceorder="normal"))
fig.show()

We can clearly notice a decreasing trend in the number of daily Confirmed and Death Cases in Spain and Italy. That's a really positive sign for both countries.

Data Analysis for India

For detailed Data analysis and Forecasting specific to India

Click Here: COVID-19 Data Analysis & Forecasting for India

The notebook consists of detailed data analysis specific to India, a comparison of India with its neighboring countries, a comparison with the worst-affected countries in this pandemic, and an attempt to build Machine Learning prediction and Time Series forecasting models to understand how the numbers are going to behave in the near future.

india_data=covid[covid["Country/Region"]=="India"]
datewise_india=india_data.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
print(datewise_india.iloc[-1])
print("Total Active Cases: ",datewise_india["Confirmed"].iloc[-1]-datewise_india["Recovered"].iloc[-1]-datewise_india["Deaths"].iloc[-1])
print("Total Closed Cases: ",datewise_india["Recovered"].iloc[-1]+datewise_india["Deaths"].iloc[-1])
Confirmed    27894800.0
Recovered    25454320.0
Deaths         325972.0
Name: 2021-05-29 00:00:00, dtype: float64
Total Active Cases:  2114508.0
Total Closed Cases:  25780292.0
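The active/closed arithmetic above can be wrapped in a small helper; a minimal sketch on toy numbers (the function name `case_summary` is ours, not the notebook's):

```python
import pandas as pd

def case_summary(datewise):
    """Derive active and closed counts from the latest cumulative
    Confirmed/Recovered/Deaths values."""
    latest = datewise.iloc[-1]
    active = latest["Confirmed"] - latest["Recovered"] - latest["Deaths"]
    closed = latest["Recovered"] + latest["Deaths"]
    return {"active": int(active), "closed": int(closed)}

toy = pd.DataFrame({"Confirmed": [100, 200], "Recovered": [40, 120], "Deaths": [5, 10]})
print(case_summary(toy))  # {'active': 70, 'closed': 130}
```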
fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise_india.index, y=datewise_india["Confirmed"],
                    mode='lines+markers',
                    name='Confirmed Cases'))
fig.add_trace(go.Scatter(x=datewise_india.index, y=datewise_india["Recovered"],
                    mode='lines+markers',
                    name='Recovered Cases'))
fig.add_trace(go.Scatter(x=datewise_india.index, y=datewise_india["Deaths"],
                    mode='lines+markers',
                    name='Death Cases'))
fig.update_layout(title="Growth of different types of cases in India",
                 xaxis_title="Date",yaxis_title="Number of Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
fig=px.bar(x=datewise_india.index,y=datewise_india["Confirmed"]-datewise_india["Recovered"]-datewise_india["Deaths"])
fig.update_layout(title="Distribution of Number of Active Cases in India",
                  xaxis_title="Date",yaxis_title="Number of Cases",)
fig.show()
india_increase_confirm=[]
india_increase_recover=[]
india_increase_deaths=[]
for i in range(datewise_india.shape[0]-1):
    india_increase_confirm.append(((datewise_india["Confirmed"].iloc[i+1])/datewise_india["Confirmed"].iloc[i]))
    india_increase_recover.append(((datewise_india["Recovered"].iloc[i+1])/datewise_india["Recovered"].iloc[i]))
    india_increase_deaths.append(((datewise_india["Deaths"].iloc[i+1])/datewise_india["Deaths"].iloc[i]))
india_increase_confirm.insert(0,1)
india_increase_recover.insert(0,1)
india_increase_deaths.insert(0,1)
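The ratio loop above can be expressed more compactly with pandas; a sketch on a toy series (each day divided by the previous day, with the first entry seeded to 1 exactly as the `insert(0,1)` calls do):

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0, 45.0])       # toy cumulative confirmed counts
growth_factor = (s / s.shift(1)).fillna(1.0)  # day-over-day ratio, first day = 1
print(growth_factor.tolist())  # [1.0, 2.0, 1.5, 1.5]
```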

fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise_india.index, y=india_increase_confirm,
                    mode='lines',
                    name='Growth Factor of Confirmed Cases'))
fig.add_trace(go.Scatter(x=datewise_india.index, y=india_increase_recover,
                    mode='lines',
                    name='Growth Factor of Recovered Cases'))
fig.add_trace(go.Scatter(x=datewise_india.index, y=india_increase_deaths,
                    mode='lines',
                    name='Growth Factor of Death Cases'))
fig.update_layout(title="Datewise Growth Factor of Active and Closed cases in India",
                 xaxis_title="Date",yaxis_title="Growth Factor",
                 legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise_india.index, y=datewise_india["Confirmed"].diff().fillna(0),
                    mode='lines+markers',
                    name='Confirmed Cases'))
fig.add_trace(go.Scatter(x=datewise_india.index, y=datewise_india["Recovered"].diff().fillna(0),
                    mode='lines+markers',
                    name='Recovered Cases'))
fig.add_trace(go.Scatter(x=datewise_india.index, y=datewise_india["Deaths"].diff().fillna(0),
                    mode='lines+markers',
                    name='Death Cases'))
fig.update_layout(title="Daily increase in different types of cases in India",
                 xaxis_title="Date",yaxis_title="Number of Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
datewise_india["WeekOfYear"]=datewise_india.index.weekofyear

week_num_india=[]
india_weekwise_confirmed=[]
india_weekwise_recovered=[]
india_weekwise_deaths=[]
w=1
for i in list(datewise_india["WeekOfYear"].unique()):
    india_weekwise_confirmed.append(datewise_india[datewise_india["WeekOfYear"]==i]["Confirmed"].iloc[-1])
    india_weekwise_recovered.append(datewise_india[datewise_india["WeekOfYear"]==i]["Recovered"].iloc[-1])
    india_weekwise_deaths.append(datewise_india[datewise_india["WeekOfYear"]==i]["Deaths"].iloc[-1])
    week_num_india.append(w)
    w=w+1
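Since the cumulative columns are monotone, taking the last observation of each week (which is what the loop above does) can also be done with `resample`; a sketch on toy daily data, independent of the notebook's dataframes:

```python
import pandas as pd

idx = pd.date_range("2021-01-01", periods=14, freq="D")    # Fri 1 Jan .. Thu 14 Jan
toy = pd.DataFrame({"Confirmed": range(1, 15)}, index=idx)  # toy cumulative counts
weekly = toy.resample("W").last()   # last cumulative value seen in each calendar week
print(weekly["Confirmed"].tolist())  # [3, 10, 14]
```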
    
fig=go.Figure()
fig.add_trace(go.Scatter(x=week_num_india, y=india_weekwise_confirmed,
                    mode='lines+markers',
                    name='Weekly Growth of Confirmed Cases'))
fig.add_trace(go.Scatter(x=week_num_india, y=india_weekwise_recovered,
                    mode='lines+markers',
                    name='Weekly Growth of Recovered Cases'))
fig.add_trace(go.Scatter(x=week_num_india, y=india_weekwise_deaths,
                    mode='lines+markers',
                    name='Weekly Growth of Death Cases'))
fig.update_layout(title="Weekly Growth of different types of Cases in India",
                 xaxis_title="Week Number",yaxis_title="Number of Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
fig, (ax1,ax2) = plt.subplots(1, 2,figsize=(15,5))
sns.barplot(x=week_num_india,y=pd.Series(india_weekwise_confirmed).diff().fillna(0),ax=ax1)
sns.barplot(x=week_num_india,y=pd.Series(india_weekwise_deaths).diff().fillna(0),ax=ax2)
ax1.set_xlabel("Week Number")
ax2.set_xlabel("Week Number")
ax1.set_ylabel("Number of Confirmed Cases")
ax2.set_ylabel("Number of Death Cases")
ax1.set_title("India's Weekwise increase in Number of Confirmed Cases")
ax2.set_title("India's Weekwise increase in Number of Death Cases")
Text(0.5, 1.0, "India's Weekwise increase in Number of Death Cases")
max_ind=datewise_india["Confirmed"].max()

print("It took",datewise_Italy[(datewise_Italy["Confirmed"]>0)&(datewise_Italy["Confirmed"]<=max_ind)].shape[0],"days in Italy to reach number of Confirmed Cases equivalent to India")
print("It took",datewise_US[(datewise_US["Confirmed"]>0)&(datewise_US["Confirmed"]<=max_ind)].shape[0],"days in USA to reach number of Confirmed Cases equivalent to India")
print("It took",datewise_Spain[(datewise_Spain["Confirmed"]>0)&(datewise_Spain["Confirmed"]<=max_ind)].shape[0],"days in Spain to reach number of Confirmed Cases equivalent to India")
print("It took",datewise_india[datewise_india["Confirmed"]>0].shape[0],"days in India to reach",max_ind,"Confirmed Cases")

fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise_Italy[(datewise_Italy["Confirmed"]>0)&(datewise_Italy["Confirmed"]<=max_ind)].index, y=datewise_Italy[(datewise_Italy["Confirmed"]>0)&(datewise_Italy["Confirmed"]<=max_ind)]["Confirmed"],
                    mode='lines',name="Italy"))
fig.add_trace(go.Scatter(x=datewise_US[(datewise_US["Confirmed"]>0)&(datewise_US["Confirmed"]<=max_ind)].index, y=datewise_US[(datewise_US["Confirmed"]>0)&(datewise_US["Confirmed"]<=max_ind)]["Confirmed"],
                    mode='lines',name="USA"))
fig.add_trace(go.Scatter(x=datewise_Spain[(datewise_Spain["Confirmed"]>0)&(datewise_Spain["Confirmed"]<=max_ind)].index, y=datewise_Spain[(datewise_Spain["Confirmed"]>0)&(datewise_Spain["Confirmed"]<=max_ind)]["Confirmed"],
                    mode='lines',name="Spain"))
fig.add_trace(go.Scatter(x=datewise_india.index, y=datewise_india["Confirmed"],
                    mode='lines',name="India"))
fig.update_layout(title="Growth of Confirmed Cases with respect to India",
                 xaxis_title="Date",yaxis_title="Number of Confirmed Cases",
                 legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
It took 485 days in Italy to reach number of Confirmed Cases equivalent to India
It took 392 days in USA to reach number of Confirmed Cases equivalent to India
It took 484 days in Spain to reach number of Confirmed Cases equivalent to India
It took 486 days in India to reach 27894800.0 Confirmed Cases
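The day counts above come from counting the rows where the cumulative total is positive but has not yet exceeded India's maximum; a sketch of that filter as a reusable helper (toy series, hypothetical name `days_to_reach`):

```python
import pandas as pd

def days_to_reach(confirmed, threshold):
    """Count the days where the cumulative total is positive but still
    at or below `threshold` -- the same filter used in the prints above."""
    return int(((confirmed > 0) & (confirmed <= threshold)).sum())

toy = pd.Series([0, 0, 5, 50, 500, 5000])   # toy cumulative confirmed counts
print(days_to_reach(toy, 500))  # 3 (the days with 5, 50 and 500 cases)
```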

Comparison of the daily increase in the number of cases of Italy, Spain, the USA and India, with each country's series capped at the maximum number of Confirmed Cases recorded in India

fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise_Italy[(datewise_Italy["Confirmed"]>0)&(datewise_Italy["Confirmed"]<=max_ind)].index, 
                         y=datewise_Italy[(datewise_Italy["Confirmed"]>0)&(datewise_Italy["Confirmed"]<=max_ind)]["Confirmed"].diff().fillna(0),
                    mode='lines',name="Italy"))
fig.add_trace(go.Scatter(x=datewise_US[(datewise_US["Confirmed"]>0)&(datewise_US["Confirmed"]<=max_ind)].index, 
                         y=datewise_US[(datewise_US["Confirmed"]>0)&(datewise_US["Confirmed"]<=max_ind)]["Confirmed"].diff().fillna(0),
                    mode='lines',name="USA"))
fig.add_trace(go.Scatter(x=datewise_Spain[(datewise_Spain["Confirmed"]>0)&(datewise_Spain["Confirmed"]<=max_ind)].index,
                         y=datewise_Spain[(datewise_Spain["Confirmed"]>0)&(datewise_Spain["Confirmed"]<=max_ind)]["Confirmed"].diff().fillna(0),
                    mode='lines',name="Spain"))
fig.add_trace(go.Scatter(x=datewise_india.index, y=datewise_india["Confirmed"].diff().fillna(0),
                    mode='lines',name="India"))
fig.update_layout(title="Daily increase in Confirmed Cases",
                 xaxis_title="Date",yaxis_title="Number of Confirmed Cases",
                 legend=dict(x=0,y=1,traceorder="normal"))
fig.show()

Prediction using Machine Learning Models

Linear Regression Model for Confirmed Cases Prediction

datewise["Days Since"]=datewise.index-datewise.index[0]
datewise["Days Since"]=datewise["Days Since"].dt.days
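Converting the DatetimeIndex into an integer day count is what lets the regressors below treat time as a single numeric feature; a sketch of the same conversion on toy dates:

```python
import pandas as pd

idx = pd.to_datetime(["2021-05-01", "2021-05-02", "2021-05-05"])
days_since = (idx - idx[0]).days   # integer days elapsed since the first date
print(list(days_since))  # [0, 1, 4]
```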
train_ml=datewise.iloc[:int(datewise.shape[0]*0.95)]
valid_ml=datewise.iloc[int(datewise.shape[0]*0.95):]
model_scores=[]
lin_reg=LinearRegression(normalize=True)
lin_reg.fit(np.array(train_ml["Days Since"]).reshape(-1,1),np.array(train_ml["Confirmed"]).reshape(-1,1))
LinearRegression(normalize=True)
prediction_valid_linreg=lin_reg.predict(np.array(valid_ml["Days Since"]).reshape(-1,1))
model_scores.append(np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_linreg)))
print("Root Mean Square Error for Linear Regression: ",np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_linreg)))
Root Mean Square Error for Linear Regression:  33541511.296706144
prediction_linreg=lin_reg.predict(np.array(datewise["Days Since"]).reshape(-1,1))
linreg_output=[]
for i in range(prediction_linreg.shape[0]):
    linreg_output.append(prediction_linreg[i][0])

fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Confirmed"],
                    mode='lines+markers',name="Train Data for Confirmed Cases"))
fig.add_trace(go.Scatter(x=datewise.index, y=linreg_output,
                    mode='lines',name="Linear Regression Best Fit Line",
                    line=dict(color='black', dash='dot')))
fig.update_layout(title="Confirmed Cases Linear Regression Prediction",
                 xaxis_title="Date",yaxis_title="Confirmed Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()

The Linear Regression model is falling apart completely, as it is clearly visible that the trend of Confirmed Cases is not at all linear.

Polynomial Regression for Prediction of Confirmed Cases

train_ml=datewise.iloc[:int(datewise.shape[0]*0.95)]
valid_ml=datewise.iloc[int(datewise.shape[0]*0.95):]
poly = PolynomialFeatures(degree = 8) 
train_poly=poly.fit_transform(np.array(train_ml["Days Since"]).reshape(-1,1))
valid_poly=poly.fit_transform(np.array(valid_ml["Days Since"]).reshape(-1,1))
y=train_ml["Confirmed"]
linreg=LinearRegression(normalize=True)
linreg.fit(train_poly,y)
LinearRegression(normalize=True)
prediction_poly=linreg.predict(valid_poly)
rmse_poly=np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_poly))
model_scores.append(rmse_poly)
print("Root Mean Squared Error for Polynomial Regression: ",rmse_poly)
Root Mean Squared Error for Polynomial Regression:  27362958.416571088
comp_data=poly.fit_transform(np.array(datewise["Days Since"]).reshape(-1,1))
predictions_poly=linreg.predict(comp_data)

fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Confirmed"],
                    mode='lines+markers',name="Train Data for Confirmed Cases"))
fig.add_trace(go.Scatter(x=datewise.index, y=predictions_poly,
                    mode='lines',name="Polynomial Regression Best Fit",
                    line=dict(color='black', dash='dot')))
fig.update_layout(title="Confirmed Cases Polynomial Regression Prediction",
                 xaxis_title="Date",yaxis_title="Confirmed Cases",
                 legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
new_prediction_poly=[]
for i in range(1,18):
    new_date_poly=poly.fit_transform(np.array(datewise["Days Since"].max()+i).reshape(-1,1))
    new_prediction_poly.append(linreg.predict(new_date_poly)[0])
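The validation features above are built with `poly.fit_transform`; for `PolynomialFeatures` refitting is harmless (the transform only depends on `degree`), but the safer general pattern is a `Pipeline`, which fits every step on the training data once and reapplies it at prediction time. A minimal sketch on toy data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() ** 2 + 3                    # toy quadratic target

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X[:8], y[:8])      # the transform is fitted on the training slice only
pred = model.predict(X[8:])  # and automatically reapplied here
print(np.round(pred, 2))     # close to [131. 165.]
```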

Support Vector Machine Regressor for Prediction of Confirmed Cases

train_ml=datewise.iloc[:int(datewise.shape[0]*0.95)]
valid_ml=datewise.iloc[int(datewise.shape[0]*0.95):]
svm=SVR(C=1,degree=6,kernel='poly',epsilon=0.01)
svm.fit(np.array(train_ml["Days Since"]).reshape(-1,1),np.array(train_ml["Confirmed"]).reshape(-1,1))
SVR(C=1, degree=6, epsilon=0.01, kernel='poly')
prediction_valid_svm=svm.predict(np.array(valid_ml["Days Since"]).reshape(-1,1))
model_scores.append(np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_svm)))
print("Root Mean Square Error for Support Vector Machine: ",np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_svm)))
Root Mean Square Error for Support Vector Machine:  27435923.21693116
plt.figure(figsize=(11,6))
prediction_svm=svm.predict(np.array(datewise["Days Since"]).reshape(-1,1))
fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Confirmed"],
                    mode='lines+markers',name="Train Data for Confirmed Cases"))
fig.add_trace(go.Scatter(x=datewise.index, y=prediction_svm,
                    mode='lines',name="Support Vector Machine Best fit Kernel",
                    line=dict(color='black', dash='dot')))
fig.update_layout(title="Confirmed Cases Support Vector Machine Regressor Prediction",
                 xaxis_title="Date",yaxis_title="Confirmed Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()

The Support Vector Machine model isn't providing great results; its predictions are either overshooting or falling well below what's expected.
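One common reason SVR struggles on data like this is scale: with a target in the tens of millions, the default-scaled model has little chance. A sketch (not the notebook's approach) of standardizing both the inputs and the target with scikit-learn:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X = np.arange(50, dtype=float).reshape(-1, 1)
y = 1e6 * X.ravel() ** 2          # toy target with a COVID-like magnitude

model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100)),
    transformer=StandardScaler(),  # standardize y as well as X
)
model.fit(X, y)
print(round(model.score(X, y), 3))  # R^2 near 1 once both axes are scaled
```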

new_date=[]
new_prediction_lr=[]
new_prediction_svm=[]
for i in range(1,18):
    new_date.append(datewise.index[-1]+timedelta(days=i))
    new_prediction_lr.append(lin_reg.predict(np.array(datewise["Days Since"].max()+i).reshape(-1,1))[0][0])
    new_prediction_svm.append(svm.predict(np.array(datewise["Days Since"].max()+i).reshape(-1,1))[0])
pd.set_option('display.float_format', lambda x: '%.6f' % x)
model_predictions=pd.DataFrame(zip(new_date,new_prediction_lr,new_prediction_poly,new_prediction_svm),
                               columns=["Dates","Linear Regression Prediction","Polynomial Regression Prediction","SVM Prediction"])
model_predictions.head()
Dates Linear Regression Prediction Polynomial Regression Prediction SVM Prediction
0 2021-05-30 134093578.446173 226028087.475372 216458436.483116
1 2021-05-31 134427502.855213 230137725.685058 218802891.161817
2 2021-06-01 134761427.264252 234392252.930976 221171147.183054
3 2021-06-02 135095351.673291 238795909.836299 223563397.464129
4 2021-06-03 135429276.082331 243353027.074790 225979836.092718

Predictions of Linear Regression are nowhere close to actual values.

Time Series Forecasting

Holt's Linear Model

model_train=datewise.iloc[:int(datewise.shape[0]*0.95)]
valid=datewise.iloc[int(datewise.shape[0]*0.95):]
y_pred=valid.copy()
holt=Holt(np.asarray(model_train["Confirmed"])).fit(smoothing_level=0.4, smoothing_slope=0.4,optimized=False)     
y_pred["Holt"]=holt.forecast(len(valid))
model_scores.append(np.sqrt(mean_squared_error(y_pred["Confirmed"],y_pred["Holt"])))
print("Root Mean Square Error Holt's Linear Model: ",np.sqrt(mean_squared_error(y_pred["Confirmed"],y_pred["Holt"])))
Root Mean Square Error Holt's Linear Model:  1696111.7924457418
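Holt's linear model maintains two exponentially smoothed quantities, a level and a trend, and forecasts by extrapolating them; a minimal hand-rolled sketch (alpha and beta mirror the `smoothing_level` and `smoothing_slope` of 0.4 used above):

```python
def holt_forecast(series, alpha=0.4, beta=0.4, steps=3):
    """Plain Holt's linear trend: exponentially smooth a level and a trend,
    then extrapolate level + h * trend for h = 1..steps."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, steps + 1)]

print(holt_forecast([10.0, 12.0, 14.0, 16.0]))  # approximately [18.0, 20.0, 22.0]
```

On a perfectly linear series the method extrapolates the slope exactly, which is why it behaves like a straight-line forecast here.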
fig=go.Figure()
fig.add_trace(go.Scatter(x=model_train.index, y=model_train["Confirmed"],
                    mode='lines+markers',name="Train Data for Confirmed Cases"))
fig.add_trace(go.Scatter(x=valid.index, y=valid["Confirmed"],
                    mode='lines+markers',name="Validation Data for Confirmed Cases",))
fig.add_trace(go.Scatter(x=valid.index, y=y_pred["Holt"],
                    mode='lines+markers',name="Prediction of Confirmed Cases",))
fig.update_layout(title="Confirmed Cases Holt's Linear Model Prediction",
                 xaxis_title="Date",yaxis_title="Confirmed Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
holt_new_date=[]
holt_new_prediction=[]
for i in range(1,18):
    holt_new_date.append(datewise.index[-1]+timedelta(days=i))
    holt_new_prediction.append(holt.forecast((len(valid)+i))[-1])

model_predictions["Holt's Linear Model Prediction"]=holt_new_prediction
model_predictions.head()
Dates Linear Regression Prediction Polynomial Regression Prediction SVM Prediction Holt's Linear Model Prediction
0 2021-05-30 134093578.446173 226028087.475372 216458436.483116 174366504.894772
1 2021-05-31 134427502.855213 230137725.685058 218802891.161817 175133215.804870
2 2021-06-01 134761427.264252 234392252.930976 221171147.183054 175899926.714968
3 2021-06-02 135095351.673291 238795909.836299 223563397.464129 176666637.625065
4 2021-06-03 135429276.082331 243353027.074790 225979836.092718 177433348.535163

Holt's Winter Model for Daily Time Series

model_train=datewise.iloc[:int(datewise.shape[0]*0.95)]
valid=datewise.iloc[int(datewise.shape[0]*0.95):]
y_pred=valid.copy()
es=ExponentialSmoothing(np.asarray(model_train['Confirmed']),seasonal_periods=14,trend='add', seasonal='mul').fit()
y_pred["Holt's Winter Model"]=es.forecast(len(valid))
model_scores.append(np.sqrt(mean_squared_error(y_pred["Confirmed"],y_pred["Holt's Winter Model"])))
print("Root Mean Square Error for Holt's Winter Model: ",np.sqrt(mean_squared_error(y_pred["Confirmed"],y_pred["Holt's Winter Model"])))
Root Mean Square Error for Holt's Winter Model:  2594639.6682255697
fig=go.Figure()
fig.add_trace(go.Scatter(x=model_train.index, y=model_train["Confirmed"],
                    mode='lines+markers',name="Train Data for Confirmed Cases"))
fig.add_trace(go.Scatter(x=valid.index, y=valid["Confirmed"],
                    mode='lines+markers',name="Validation Data for Confirmed Cases",))
fig.add_trace(go.Scatter(x=valid.index, y=y_pred["Holt\'s Winter Model"],
                    mode='lines+markers',name="Prediction of Confirmed Cases",))
fig.update_layout(title="Confirmed Cases Holt's Winter Model Prediction",
                 xaxis_title="Date",yaxis_title="Confirmed Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
holt_winter_new_prediction=[]
for i in range(1,18):
    holt_winter_new_prediction.append(es.forecast((len(valid)+i))[-1])
model_predictions["Holt's Winter Model Prediction"]=holt_winter_new_prediction
model_predictions.head()
Dates Linear Regression Prediction Polynomial Regression Prediction SVM Prediction Holt's Linear Model Prediction Holt's Winter Model Prediction
0 2021-05-30 134093578.446173 226028087.475372 216458436.483116 174366504.894772 175873980.988871
1 2021-05-31 134427502.855213 230137725.685058 218802891.161817 175133215.804870 176535478.925915
2 2021-06-01 134761427.264252 234392252.930976 221171147.183054 175899926.714968 177370523.430426
3 2021-06-02 135095351.673291 238795909.836299 223563397.464129 176666637.625065 178275543.880503
4 2021-06-03 135429276.082331 243353027.074790 225979836.092718 177433348.535163 179241090.821188
model_train=datewise.iloc[:int(datewise.shape[0]*0.95)]
valid=datewise.iloc[int(datewise.shape[0]*0.95):]
y_pred=valid.copy()

AR Model (using AUTO ARIMA)

model_ar= auto_arima(model_train["Confirmed"],trace=True, error_action='ignore', start_p=0,start_q=0,max_p=4,max_q=0,
                   suppress_warnings=True,stepwise=False,seasonal=False)
model_ar.fit(model_train["Confirmed"])
 ARIMA(0,2,0)(0,0,0)[0] intercept   : AIC=11809.478, Time=0.05 sec
 ARIMA(1,2,0)(0,0,0)[0] intercept   : AIC=11798.602, Time=0.03 sec
 ARIMA(2,2,0)(0,0,0)[0] intercept   : AIC=11798.849, Time=0.06 sec
 ARIMA(3,2,0)(0,0,0)[0] intercept   : AIC=11750.641, Time=0.14 sec
 ARIMA(4,2,0)(0,0,0)[0] intercept   : AIC=11660.908, Time=0.19 sec

Best model:  ARIMA(4,2,0)(0,0,0)[0] intercept
Total fit time: 0.489 seconds
ARIMA(order=(4, 2, 0), scoring_args={}, suppress_warnings=True)
prediction_ar=model_ar.predict(len(valid))
y_pred["AR Model Prediction"]=prediction_ar
model_scores.append(np.sqrt(mean_squared_error(y_pred["Confirmed"],y_pred["AR Model Prediction"])))
print("Root Mean Square Error for AR Model: ",np.sqrt(mean_squared_error(y_pred["Confirmed"],y_pred["AR Model Prediction"])))
Root Mean Square Error for AR Model:  2350964.490321815
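Ignoring the differencing that auto_arima applies (d=2 above), an AR(p) model is at heart a linear regression of the series on its own last p values; a self-contained sketch (toy data, plain least squares, not the notebook's pmdarima pipeline):

```python
import numpy as np

def fit_ar(series, p=2):
    """Least-squares fit of y[t] = c + a_1*y[t-1] + ... + a_p*y[t-p]."""
    y = np.asarray(series, dtype=float)
    lags = np.column_stack([y[p - k - 1 : len(y) - k - 1] for k in range(p)])
    X = np.column_stack([np.ones(len(lags)), lags])   # intercept column
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef                                       # [c, a_1, ..., a_p]

# toy series driven by y[t] = 0.5*y[t-1] + 0.3*y[t-2] + noise
rng = np.random.default_rng(0)
y = [1.0, 2.0]
for _ in range(500):
    y.append(0.5 * y[-1] + 0.3 * y[-2] + rng.normal())
c, a1, a2 = fit_ar(y, p=2)
print(round(a1, 2), round(a2, 2))  # recovers roughly 0.5 and 0.3
```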
fig=go.Figure()
fig.add_trace(go.Scatter(x=model_train.index, y=model_train["Confirmed"],
                    mode='lines+markers',name="Train Data for Confirmed Cases"))
fig.add_trace(go.Scatter(x=valid.index, y=valid["Confirmed"],
                    mode='lines+markers',name="Validation Data for Confirmed Cases",))
fig.add_trace(go.Scatter(x=valid.index, y=y_pred["AR Model Prediction"],
                    mode='lines+markers',name="Prediction of Confirmed Cases",))
fig.update_layout(title="Confirmed Cases AR Model Prediction",
                 xaxis_title="Date",yaxis_title="Confirmed Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
AR_model_new_prediction=[]
for i in range(1,18):
    AR_model_new_prediction.append(model_ar.predict(len(valid)+i)[-1])
model_predictions["AR Model Prediction"]=AR_model_new_prediction
model_predictions.head()
Dates Linear Regression Prediction Polynomial Regression Prediction SVM Prediction Holt's Linear Model Prediction Holt's Winter Model Prediction AR Model Prediction
0 2021-05-30 134093578.446173 226028087.475372 216458436.483116 174366504.894772 175873980.988871 175686390.526716
1 2021-05-31 134427502.855213 230137725.685058 218802891.161817 175133215.804870 176535478.925915 176528318.654233
2 2021-06-01 134761427.264252 234392252.930976 221171147.183054 175899926.714968 177370523.430426 177370455.177831
3 2021-06-02 135095351.673291 238795909.836299 223563397.464129 176666637.625065 178275543.880503 178213091.415752
4 2021-06-03 135429276.082331 243353027.074790 225979836.092718 177433348.535163 179241090.821188 179057425.154646

MA Model (using AUTO ARIMA)

model_train=datewise.iloc[:int(datewise.shape[0]*0.95)]
valid=datewise.iloc[int(datewise.shape[0]*0.95):]
y_pred=valid.copy()
model_ma= auto_arima(model_train["Confirmed"],trace=True, error_action='ignore', start_p=0,start_q=0,max_p=0,max_q=2,
                   suppress_warnings=True,stepwise=False,seasonal=False)
model_ma.fit(model_train["Confirmed"])
 ARIMA(0,2,0)(0,0,0)[0] intercept   : AIC=11809.478, Time=0.03 sec
 ARIMA(0,2,1)(0,0,0)[0] intercept   : AIC=11785.886, Time=0.09 sec
 ARIMA(0,2,2)(0,0,0)[0] intercept   : AIC=11738.463, Time=0.15 sec

Best model:  ARIMA(0,2,2)(0,0,0)[0] intercept
Total fit time: 0.278 seconds
ARIMA(order=(0, 2, 2), scoring_args={}, suppress_warnings=True)
prediction_ma=model_ma.predict(len(valid))
y_pred["MA Model Prediction"]=prediction_ma
model_scores.append(np.sqrt(mean_squared_error(valid["Confirmed"],prediction_ma)))
print("Root Mean Square Error for MA Model: ",np.sqrt(mean_squared_error(valid["Confirmed"],prediction_ma)))
Root Mean Square Error for MA Model:  2901478.9273606585
fig=go.Figure()
fig.add_trace(go.Scatter(x=model_train.index, y=model_train["Confirmed"],
                    mode='lines+markers',name="Train Data for Confirmed Cases"))
fig.add_trace(go.Scatter(x=valid.index, y=valid["Confirmed"],
                    mode='lines+markers',name="Validation Data for Confirmed Cases",))
fig.add_trace(go.Scatter(x=valid.index, y=y_pred["MA Model Prediction"],
                    mode='lines+markers',name="Prediction for Confirmed Cases",))
fig.update_layout(title="Confirmed Cases MA Model Prediction",
                 xaxis_title="Date",yaxis_title="Confirmed Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
MA_model_new_prediction=[]
for i in range(1,18):
    MA_model_new_prediction.append(model_ma.predict(len(valid)+i)[-1])
model_predictions["MA Model Prediction"]=MA_model_new_prediction
model_predictions.head()
Dates Linear Regression Prediction Polynomial Regression Prediction SVM Prediction Holt's Linear Model Prediction Holt's Winter Model Prediction AR Model Prediction MA Model Prediction
0 2021-05-30 134093578.446173 226028087.475372 216458436.483116 174366504.894772 175873980.988871 175686390.526716 176938616.913241
1 2021-05-31 134427502.855213 230137725.685058 218802891.161817 175133215.804870 176535478.925915 176528318.654233 177872887.185443
2 2021-06-01 134761427.264252 234392252.930976 221171147.183054 175899926.714968 177370523.430426 177370455.177831 178812067.494476
3 2021-06-02 135095351.673291 238795909.836299 223563397.464129 176666637.625065 178275543.880503 178213091.415752 179756157.840340
4 2021-06-03 135429276.082331 243353027.074790 225979836.092718 177433348.535163 179241090.821188 179057425.154646 180705158.223035

ARIMA Model (using AUTO ARIMA)

model_train=datewise.iloc[:int(datewise.shape[0]*0.95)]
valid=datewise.iloc[int(datewise.shape[0]*0.95):]
y_pred=valid.copy()
model_arima= auto_arima(model_train["Confirmed"],trace=True, error_action='ignore', start_p=1,start_q=1,max_p=3,max_q=3,
                   suppress_warnings=True,stepwise=False,seasonal=False)
model_arima.fit(model_train["Confirmed"])
 ARIMA(0,2,0)(0,0,0)[0] intercept   : AIC=11809.478, Time=0.03 sec
 ARIMA(0,2,1)(0,0,0)[0] intercept   : AIC=11785.886, Time=0.09 sec
 ARIMA(0,2,2)(0,0,0)[0] intercept   : AIC=11738.463, Time=0.15 sec
 ARIMA(0,2,3)(0,0,0)[0] intercept   : AIC=11730.267, Time=0.22 sec
 ARIMA(1,2,0)(0,0,0)[0] intercept   : AIC=11798.602, Time=0.04 sec
 ARIMA(1,2,1)(0,0,0)[0] intercept   : AIC=11730.635, Time=0.14 sec
 ARIMA(1,2,2)(0,0,0)[0] intercept   : AIC=11753.103, Time=0.25 sec
 ARIMA(1,2,3)(0,0,0)[0] intercept   : AIC=11835.814, Time=0.40 sec
 ARIMA(2,2,0)(0,0,0)[0] intercept   : AIC=11798.849, Time=0.05 sec
 ARIMA(2,2,1)(0,0,0)[0] intercept   : AIC=11716.037, Time=0.39 sec
 ARIMA(2,2,2)(0,0,0)[0] intercept   : AIC=inf, Time=1.33 sec
 ARIMA(2,2,3)(0,0,0)[0] intercept   : AIC=11543.789, Time=1.30 sec
 ARIMA(3,2,0)(0,0,0)[0] intercept   : AIC=11750.641, Time=0.30 sec
 ARIMA(3,2,1)(0,0,0)[0] intercept   : AIC=11650.242, Time=0.75 sec
 ARIMA(3,2,2)(0,0,0)[0] intercept   : AIC=11534.633, Time=2.28 sec

Best model:  ARIMA(3,2,2)(0,0,0)[0] intercept
Total fit time: 7.778 seconds
ARIMA(order=(3, 2, 2), scoring_args={}, suppress_warnings=True)
prediction_arima=model_arima.predict(len(valid))
y_pred["ARIMA Model Prediction"]=prediction_arima
model_scores.append(np.sqrt(mean_squared_error(valid["Confirmed"],prediction_arima)))
print("Root Mean Square Error for ARIMA Model: ",np.sqrt(mean_squared_error(valid["Confirmed"],prediction_arima)))
Root Mean Square Error for ARIMA Model:  3160128.216756515
fig=go.Figure()
fig.add_trace(go.Scatter(x=model_train.index, y=model_train["Confirmed"],
                    mode='lines+markers',name="Train Data for Confirmed Cases"))
fig.add_trace(go.Scatter(x=valid.index, y=valid["Confirmed"],
                    mode='lines+markers',name="Validation Data for Confirmed Cases",))
fig.add_trace(go.Scatter(x=valid.index, y=y_pred["ARIMA Model Prediction"],
                    mode='lines+markers',name="Prediction for Confirmed Cases",))
fig.update_layout(title="Confirmed Cases ARIMA Model Prediction",
                 xaxis_title="Date",yaxis_title="Confirmed Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
ARIMA_model_new_prediction=[]
for i in range(1,18):
    ARIMA_model_new_prediction.append(model_arima.predict(len(valid)+i)[-1])
model_predictions["ARIMA Model Prediction"]=ARIMA_model_new_prediction
model_predictions.head()
Dates Linear Regression Prediction Polynomial Regression Prediction SVM Prediction Holt's Linear Model Prediction Holt's Winter Model Prediction AR Model Prediction MA Model Prediction ARIMA Model Prediction
0 2021-05-30 134093578.446173 226028087.475372 216458436.483116 174366504.894772 175873980.988871 175686390.526716 176938616.913241 177416042.776341
1 2021-05-31 134427502.855213 230137725.685058 218802891.161817 175133215.804870 176535478.925915 176528318.654233 177872887.185443 178299238.605094
2 2021-06-01 134761427.264252 234392252.930976 221171147.183054 175899926.714968 177370523.430426 177370455.177831 178812067.494476 179262949.471225
3 2021-06-02 135095351.673291 238795909.836299 223563397.464129 176666637.625065 178275543.880503 178213091.415752 179756157.840340 180311831.173595
4 2021-06-03 135429276.082331 243353027.074790 225979836.092718 177433348.535163 179241090.821188 179057425.154646 180705158.223035 181391134.015959

SARIMA Model (using AUTO ARIMA)

model_sarima= auto_arima(model_train["Confirmed"],trace=True, error_action='ignore', 
                         start_p=0,start_q=0,max_p=2,max_q=2,m=7,
                   suppress_warnings=True,stepwise=True,seasonal=True)
model_sarima.fit(model_train["Confirmed"])
Performing stepwise search to minimize aic
 ARIMA(0,2,0)(1,0,1)[7]             : AIC=11695.173, Time=0.65 sec
 ARIMA(0,2,0)(0,0,0)[7]             : AIC=11807.732, Time=0.06 sec
 ARIMA(1,2,0)(1,0,0)[7]             : AIC=11687.019, Time=0.24 sec
 ARIMA(0,2,1)(0,0,1)[7]             : AIC=11682.480, Time=0.38 sec
 ARIMA(0,2,1)(0,0,0)[7]             : AIC=11786.601, Time=0.15 sec
 ARIMA(0,2,1)(1,0,1)[7]             : AIC=11593.360, Time=0.96 sec
 ARIMA(0,2,1)(1,0,0)[7]             : AIC=11635.387, Time=0.29 sec
 ARIMA(0,2,1)(2,0,1)[7]             : AIC=11592.530, Time=1.54 sec
 ARIMA(0,2,1)(2,0,0)[7]             : AIC=11623.862, Time=0.63 sec
 ARIMA(0,2,1)(2,0,2)[7]             : AIC=11593.700, Time=3.03 sec
 ARIMA(0,2,1)(1,0,2)[7]             : AIC=11592.018, Time=0.94 sec
 ARIMA(0,2,1)(0,0,2)[7]             : AIC=11661.146, Time=0.49 sec
 ARIMA(0,2,0)(1,0,2)[7]             : AIC=11698.470, Time=0.93 sec
 ARIMA(1,2,1)(1,0,2)[7]             : AIC=11589.999, Time=1.60 sec
 ARIMA(1,2,1)(0,0,2)[7]             : AIC=11649.655, Time=0.86 sec
 ARIMA(1,2,1)(1,0,1)[7]             : AIC=11591.366, Time=0.76 sec
 ARIMA(1,2,1)(2,0,2)[7]             : AIC=11591.685, Time=2.60 sec
 ARIMA(1,2,1)(0,0,1)[7]             : AIC=11664.029, Time=0.46 sec
 ARIMA(1,2,1)(2,0,1)[7]             : AIC=11590.503, Time=1.76 sec
 ARIMA(1,2,0)(1,0,2)[7]             : AIC=11629.360, Time=0.96 sec
 ARIMA(2,2,1)(1,0,2)[7]             : AIC=11575.118, Time=1.45 sec
 ARIMA(2,2,1)(0,0,2)[7]             : AIC=11640.735, Time=0.89 sec
 ARIMA(2,2,1)(1,0,1)[7]             : AIC=11576.924, Time=0.84 sec
 ARIMA(2,2,1)(2,0,2)[7]             : AIC=11576.807, Time=2.32 sec
 ARIMA(2,2,1)(0,0,1)[7]             : AIC=11656.210, Time=0.55 sec
 ARIMA(2,2,1)(2,0,1)[7]             : AIC=11575.667, Time=1.78 sec
 ARIMA(2,2,0)(1,0,2)[7]             : AIC=11617.001, Time=1.11 sec
 ARIMA(2,2,2)(1,0,2)[7]             : AIC=11530.862, Time=5.51 sec
 ARIMA(2,2,2)(0,0,2)[7]             : AIC=11594.983, Time=2.46 sec
 ARIMA(2,2,2)(1,0,1)[7]             : AIC=11529.930, Time=1.75 sec
 ARIMA(2,2,2)(0,0,1)[7]             : AIC=11597.914, Time=1.58 sec
 ARIMA(2,2,2)(1,0,0)[7]             : AIC=11593.771, Time=1.55 sec
 ARIMA(2,2,2)(2,0,1)[7]             : AIC=11531.098, Time=4.12 sec
 ARIMA(2,2,2)(0,0,0)[7]             : AIC=11602.758, Time=0.87 sec
 ARIMA(2,2,2)(2,0,0)[7]             : AIC=11579.930, Time=2.70 sec
 ARIMA(2,2,2)(2,0,2)[7]             : AIC=11532.498, Time=4.01 sec
 ARIMA(1,2,2)(1,0,1)[7]             : AIC=11542.508, Time=1.09 sec
 ARIMA(2,2,2)(1,0,1)[7] intercept   : AIC=inf, Time=2.32 sec

Best model:  ARIMA(2,2,2)(1,0,1)[7]          
Total fit time: 56.302 seconds
ARIMA(order=(2, 2, 2), scoring_args={}, seasonal_order=(1, 0, 1, 7),
      suppress_warnings=True, with_intercept=False)
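The stepwise trace above is `auto_arima` searching for the candidate with the lowest AIC (Akaike Information Criterion), which balances goodness of fit against model complexity. A minimal sketch of that selection rule, using made-up log-likelihoods and parameter counts (not values taken from the trace):

```python
import math

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2*ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Toy candidates as (name, log-likelihood, parameter count) -- illustrative
# numbers only, chosen so the richer model wins as in the trace above.
candidates = [
    ("ARIMA(0,2,0)", -5901.0, 1),
    ("ARIMA(1,2,1)", -5790.0, 3),
    ("ARIMA(2,2,2)", -5758.0, 5),
]

# Pick the candidate minimizing AIC, mirroring auto_arima's stepwise search.
best = min(candidates, key=lambda c: aic(c[1], c[2]))
print(best[0])
```

The fitted `auto_arima` object also exposes the winning specification directly via its `order` and `seasonal_order` attributes.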
prediction_sarima=model_sarima.predict(len(valid))
y_pred["SARIMA Model Prediction"]=prediction_sarima
model_scores.append(np.sqrt(mean_squared_error(y_pred["Confirmed"],y_pred["SARIMA Model Prediction"])))
print("Root Mean Square Error for SARIMA Model: ",np.sqrt(mean_squared_error(y_pred["Confirmed"],y_pred["SARIMA Model Prediction"])))
Root Mean Square Error for SARIMA Model:  2357813.584875235
fig=go.Figure()
fig.add_trace(go.Scatter(x=model_train.index, y=model_train["Confirmed"],
                    mode='lines+markers',name="Train Data for Confirmed Cases"))
fig.add_trace(go.Scatter(x=valid.index, y=valid["Confirmed"],
                    mode='lines+markers',name="Validation Data for Confirmed Cases",))
fig.add_trace(go.Scatter(x=valid.index, y=y_pred["SARIMA Model Prediction"],
                    mode='lines+markers',name="Prediction for Confirmed Cases",))
fig.update_layout(title="Confirmed Cases SARIMA Model Prediction",
                 xaxis_title="Date",yaxis_title="Confirmed Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
SARIMA_model_new_prediction=[]
for i in range(1,18):
    SARIMA_model_new_prediction.append(model_sarima.predict(len(valid)+i)[-1])
model_predictions["SARIMA Model Prediction"]=SARIMA_model_new_prediction
model_predictions.head()
Dates Linear Regression Prediction Polynonmial Regression Prediction SVM Prediction Holt's Linear Model Prediction Holt's Winter Model Prediction AR Model Prediction MA Model Prediction ARIMA Model Prediction SARIMA Model Prediction
0 2021-05-30 134093578.446173 226028087.475372 216458436.483116 174366504.894772 175873980.988871 175686390.526716 176938616.913241 177416042.776341 175731193.635291
1 2021-05-31 134427502.855213 230137725.685058 218802891.161817 175133215.804870 176535478.925915 176528318.654233 177872887.185443 178299238.605094 176486478.773622
2 2021-06-01 134761427.264252 234392252.930976 221171147.183054 175899926.714968 177370523.430426 177370455.177831 178812067.494476 179262949.471225 177346177.932718
3 2021-06-02 135095351.673291 238795909.836299 223563397.464129 176666637.625065 178275543.880503 178213091.415752 179756157.840340 180311831.173595 178259953.574374
4 2021-06-03 135429276.082331 243353027.074790 225979836.092718 177433348.535163 179241090.821188 179057425.154646 180705158.223035 181391134.015959 179192135.442158
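The 17-day forecast above is built by calling `predict()` once per horizon step and keeping only the last value. Since the forecast is deterministic, a single call sliced at the end should give the same series (assuming, as with pmdarima, that `predict(n_periods)` returns the full forecast horizon). A sketch with a hypothetical stand-in forecaster, not the fitted SARIMA model itself:

```python
# Stand-in for a fitted model: forecasts a simple linear trend so the
# equivalence is easy to verify (hypothetical, not the SARIMA model above).
def predict(n_periods):
    return [1000 + 50 * step for step in range(1, n_periods + 1)]

n_valid = 14  # stand-in for len(valid)

# Loop form used in the notebook: one predict() call per horizon step.
loop_forecast = [predict(n_valid + i)[-1] for i in range(1, 18)]

# Single-call form: forecast the whole horizon once, keep the last 17 values.
sliced_forecast = predict(n_valid + 17)[-17:]

assert loop_forecast == sliced_forecast
```

The single-call form avoids re-forecasting the same horizon repeatedly, which matters once fitting or predicting is slow.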

Facebook's Prophet Model for forecasting

prophet_c=Prophet(interval_width=0.95,weekly_seasonality=True,)
prophet_confirmed=pd.DataFrame(zip(list(datewise.index),list(datewise["Confirmed"])),columns=['ds','y'])
prophet_c.fit(prophet_confirmed)
INFO:prophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
12:08:44 - cmdstanpy - INFO - Chain [1] start processing
12:08:44 - cmdstanpy - INFO - Chain [1] done processing
<prophet.forecaster.Prophet at 0x7f19be49f1d0>
forecast_c=prophet_c.make_future_dataframe(periods=17)
forecast_confirmed=forecast_c.copy()
confirmed_forecast=prophet_c.predict(forecast_c)
#print(confirmed_forecast[['ds','yhat', 'yhat_lower', 'yhat_upper']])
model_scores.append(np.sqrt(mean_squared_error(datewise["Confirmed"],confirmed_forecast['yhat'].head(datewise.shape[0]))))
print("Root Mean Squared Error for Prophet Model: ",np.sqrt(mean_squared_error(datewise["Confirmed"],confirmed_forecast['yhat'].head(datewise.shape[0]))))
Root Mean Squared Error for Prophet Model:  1027618.2849851169
print(prophet_c.plot(confirmed_forecast))
Figure(720x432)
print(prophet_c.plot_components(confirmed_forecast))
Figure(648x432)

Summarization of Forecasts using different Models

model_names=["Linear Regression","Polynomial Regression","Support Vector Machine Regressor","Holt's Linear","Holt's Winter Model",
            "Auto Regressive Model (AR)","Moving Average Model (MA)","ARIMA Model","SARIMA Model","Facebook's Prophet Model"]
model_summary=pd.DataFrame(zip(model_names,model_scores),columns=["Model Name","Root Mean Squared Error"]).sort_values(["Root Mean Squared Error"])
model_summary
Model Name Root Mean Squared Error
9 Facebook's Prophet Model 1027618.284985
3 Holt's Linear 1696111.792446
5 Auto Regressive Model (AR) 2350964.490322
8 SARIMA Model 2357813.584875
4 Holt's Winter Model 2594639.668226
6 Moving Average Model (MA) 2901478.927361
7 ARIMA Model 3160128.216757
1 Polynomial Regression 27362958.416571
2 Support Vector Machine Regressor 27435923.216931
0 Linear Regression 33541511.296706
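One caveat on this ranking: the Prophet RMSE above is computed against the full in-sample series (`confirmed_forecast['yhat'].head(datewise.shape[0])`), while the other models are scored only on the held-out validation window, so the comparison is not strictly like-for-like. A fairer check would score every model on the same validation tail; a minimal sketch with hypothetical numbers:

```python
import math

def rmse(actual, predicted):
    """Root mean squared error over paired observations."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical daily totals and fitted values -- score only the validation
# tail, as was done for the ARIMA-family models, not the full in-sample fit.
actual = [100, 110, 125, 140, 160, 185]
fitted = [98, 112, 124, 143, 158, 188]
valid_len = 3  # stand-in for the 5% validation window

print(rmse(actual[-valid_len:], fitted[-valid_len:]))
```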
model_predictions["Prophet's Prediction"]=list(confirmed_forecast["yhat"].tail(17))
model_predictions["Prophet's Upper Bound"]=list(confirmed_forecast["yhat_upper"].tail(17))
model_predictions.head()
Dates Linear Regression Prediction Polynonmial Regression Prediction SVM Prediction Holt's Linear Model Prediction Holt's Winter Model Prediction AR Model Prediction MA Model Prediction ARIMA Model Prediction SARIMA Model Prediction Prophet's Prediction Prophet's Upper Bound
0 2021-05-30 134093578.446173 226028087.475372 216458436.483116 174366504.894772 175873980.988871 175686390.526716 176938616.913241 177416042.776341 175731193.635291 169459809.090123 171483558.872668
1 2021-05-31 134427502.855213 230137725.685058 218802891.161817 175133215.804870 176535478.925915 176528318.654233 177872887.185443 178299238.605094 176486478.773622 170030833.121070 172047244.682688
2 2021-06-01 134761427.264252 234392252.930976 221171147.183054 175899926.714968 177370523.430426 177370455.177831 178812067.494476 179262949.471225 177346177.932718 170654717.260170 172588532.062770
3 2021-06-02 135095351.673291 238795909.836299 223563397.464129 176666637.625065 178275543.880503 178213091.415752 179756157.840340 180311831.173595 178259953.574374 171315069.571070 173389186.329374
4 2021-06-03 135429276.082331 243353027.074790 225979836.092718 177433348.535163 179241090.821188 179057425.154646 180705158.223035 181391134.015959 179192135.442158 171988246.771603 174029579.064819

Time Series Forecasting for Death Cases

fig=go.Figure()
fig.add_trace(go.Scatter(x=model_train.index, y=model_train["Deaths"],
                    mode='lines+markers',name="Death Cases"))
fig.update_layout(title="Death Cases",
                 xaxis_title="Date",yaxis_title="Number of Death Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
model_train=datewise.iloc[:int(datewise.shape[0]*0.95)]
valid=datewise.iloc[int(datewise.shape[0]*0.95):]
y_pred=valid.copy()
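The split above is chronological, not shuffled: the first 95% of days fit the model and the most recent 5% are held out, since a time-series model must be validated on days after its training window. A sketch of the same cut on a stand-in list rather than the notebook's DataFrame:

```python
# Chronological 95/5 split (sketch; `datewise` here is a stand-in list of
# daily totals, not the notebook's DataFrame).
datewise = list(range(100))      # 100 days of observations
cut = int(len(datewise) * 0.95)  # same 95% cut-off as the notebook

model_train = datewise[:cut]     # first 95% of days, used for fitting
valid = datewise[cut:]           # last 5% of days, held out for scoring
```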
model_arima_deaths=auto_arima(model_train["Deaths"],trace=True, error_action='ignore', start_p=0,start_q=0,
                              max_p=5,max_q=5,suppress_warnings=True,stepwise=False,seasonal=False)     
model_arima_deaths.fit(model_train["Deaths"])
 ARIMA(0,2,0)(0,0,0)[0] intercept   : AIC=8372.566, Time=0.03 sec
 ARIMA(0,2,1)(0,0,0)[0] intercept   : AIC=8373.587, Time=0.08 sec
 ARIMA(0,2,2)(0,0,0)[0] intercept   : AIC=8257.762, Time=0.42 sec
 ARIMA(0,2,3)(0,0,0)[0] intercept   : AIC=8225.544, Time=1.01 sec
 ARIMA(0,2,4)(0,0,0)[0] intercept   : AIC=8133.811, Time=1.16 sec
 ARIMA(0,2,5)(0,0,0)[0] intercept   : AIC=8117.849, Time=1.32 sec
 ARIMA(1,2,0)(0,0,0)[0] intercept   : AIC=8374.053, Time=0.04 sec
 ARIMA(1,2,1)(0,0,0)[0] intercept   : AIC=8283.551, Time=0.89 sec
 ARIMA(1,2,2)(0,0,0)[0] intercept   : AIC=8246.770, Time=0.95 sec
 ARIMA(1,2,3)(0,0,0)[0] intercept   : AIC=8206.501, Time=1.20 sec
 ARIMA(1,2,4)(0,0,0)[0] intercept   : AIC=8114.336, Time=1.41 sec
 ARIMA(2,2,0)(0,0,0)[0] intercept   : AIC=8343.019, Time=0.08 sec
 ARIMA(2,2,1)(0,0,0)[0] intercept   : AIC=8201.101, Time=0.83 sec
 ARIMA(2,2,2)(0,0,0)[0] intercept   : AIC=8079.130, Time=1.17 sec
 ARIMA(2,2,3)(0,0,0)[0] intercept   : AIC=8032.299, Time=1.31 sec
 ARIMA(3,2,0)(0,0,0)[0] intercept   : AIC=8298.821, Time=0.14 sec
 ARIMA(3,2,1)(0,0,0)[0] intercept   : AIC=8164.503, Time=1.03 sec
 ARIMA(3,2,2)(0,0,0)[0] intercept   : AIC=8206.177, Time=1.27 sec
 ARIMA(4,2,0)(0,0,0)[0] intercept   : AIC=8268.916, Time=0.21 sec
 ARIMA(4,2,1)(0,0,0)[0] intercept   : AIC=8129.881, Time=0.88 sec
 ARIMA(5,2,0)(0,0,0)[0] intercept   : AIC=7978.749, Time=0.30 sec

Best model:  ARIMA(5,2,0)(0,0,0)[0] intercept
Total fit time: 15.803 seconds
ARIMA(order=(5, 2, 0), scoring_args={}, suppress_warnings=True)
predictions_deaths=model_arima_deaths.predict(len(valid))
y_pred["ARIMA Death Prediction"]=predictions_deaths
print("Root Mean Square Error: ",np.sqrt(mean_squared_error(valid["Deaths"],predictions_deaths)))
Root Mean Square Error:  16995.356616234585
fig=go.Figure()
fig.add_trace(go.Scatter(x=model_train.index, y=model_train["Deaths"],
                    mode='lines+markers',name="Train Data for Death Cases"))
fig.add_trace(go.Scatter(x=valid.index, y=valid["Deaths"],
                    mode='lines+markers',name="Validation Data for Death Cases",))
fig.add_trace(go.Scatter(x=valid.index, y=y_pred["ARIMA Death Prediction"],
                    mode='lines+markers',name="Prediction for Death Cases",))
fig.update_layout(title="Death Cases ARIMA Model Prediction",
                 xaxis_title="Date",yaxis_title="Death Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
ARIMA_model_death_forecast=[]
for i in range(1,18):
    ARIMA_model_death_forecast.append(model_arima_deaths.predict(len(valid)+i)[-1])
pd.DataFrame(zip(new_date,ARIMA_model_death_forecast),columns=["Dates","ARIMA Model Death Forecast"]).head()
Dates ARIMA Model Death Forecast
0 2021-05-30 3581571.344507
1 2021-05-31 3595243.649509
2 2021-06-01 3609474.437173
3 2021-06-02 3624071.061656
4 2021-06-03 3638565.543238

Conclusion

COVID-19 doesn't have a very high mortality rate, as we can see, which is the most positive takeaway. The healthy Recovery Rate also implies the disease is curable. The only matter of concern is the exponential growth rate of infections.

Countries like the USA, Spain, the United Kingdom, and Italy are facing serious trouble containing the disease, showing how deadly negligence can be. The need of the hour is to perform COVID-19 pandemic-control practices like Testing, Contact Tracing and Quarantine at each country level, at a speed greater than the speed at which the disease spreads.

The reason for putting this graph in the conclusion is an interesting pattern to observe here: every time there has been a drop in the world's carbon emissions, the world economy has crashed. One classic example is the 2008 recession. Most of you have probably already guessed what's ahead: COVID-19 may just be a big wave with a tsunami of Recession or Depression following it.

The growth of Confirmed and Death Cases seems to have slowed down over the past few days, which is a really good sign. Hopefully it stays that way for a while. No new country should emerge as the new epicenter of COVID-19, just as the USA happened to be for a brief period. If any new country does emerge as a new epicenter, the growth of Confirmed Cases will shoot up again.