Project1-558

View project on GitHub

Covid19 API Vignette

Chennade Brown 10/05/2021

Introduction

This vignette will go through the steps of reading and summarizing data from an API. We will utilize the Covid19 API.

Packages Required

The following packages were used to utilize the functions needed for interacting with the API and manipulating the retrieved data:
* httr: Used to retrieve data from APIs.
* jsonlite: Functions used to manipulate JSON data.
* tidyverse: Functions used to manipulate and reshape data.
* dplyr: Function that contains a set of tools for manipulating data in R.

library(httr)
library(jsonlite)
library(tidyverse)
library(dplyr)

Accessing the Data

First we will access data from the API regarding confirmed Covid19 cases for Norway and Switzerland during the time frame from March 1, 2020 - April 1, 2020 for comparison. This will be achieved using the GET function from the httr package to return information from the API.

getCountry <- GET("https://api.covid19api.com/country/Switzerland/status/confirmed?from=2020-03-01T00:00:00Z&to=2020-04-01T00:00:00Z")

getCountry2 <- GET("https://api.covid19api.com/country/Norway/status/confirmed?from=2020-03-01T00:00:00Z&to=2020-04-01T00:00:00Z")

We will now review the contents of the returned data. The content function with the text parameter converts the raw data to JSON.

getCountryText <- content(getCountry, "text")
getCountryText2 <- content(getCountry2, "text")

We will now convert the raw data into JSON format which will allow us to parse the data and output a data frame using the jsonlite package.

getCountryJs <- fromJSON(getCountryText, flatten = TRUE)
getCountryJs2 <- fromJSON(getCountryText2, flatten = TRUE)

We will convert the parsed JSON data for Norway & Switzerland into a tibble for our analysis.

getCountryJsTb <- as_tibble(getCountryJs)
getCountryJs2Tb <- as_tibble(getCountryJs2)

We will now combine the data for Norway and Switzerland into one data frame using the rbind function and remove the empty columns Province, City, & CityCode.

NorSwiz <- function(df1, df2) {rbind(df1, df2)}
Combo <- NorSwiz(df1 = getCountryJsTb, df2 = getCountryJs2Tb)
Combo$Province <- NULL
Combo$City <- NULL
Combo$CityCode <- NULL

The following functions will allow the user to query the Norway and Switzerland API by columns by entering the column names or all. To subset by rows the user can enter the column name and row to be selected in the filter function.

ConfirmedSwiss <- function(type = "all"){
  OutputDatSwiss <- GET("https://api.covid19api.com/country/Switzerland/status/confirmed?from=2020-03-01T00:00:00Z&to=2020-04-01T00:00:00Z")
    DataSwiss <- fromJSON(rawToChar(OutputDatSwiss$content))
  if(type!="all"){DataSwiss <- DataSwiss %>% select(type, Country, CountryCode, Cases, Lat, Lon, Date, Status) %>% filter(Cases == 27)}
  return(DataSwiss)}
ConfirmedSwiss("Country")
ConfirmedNor <- function(type = "all"){
  OutputDatNor <- GET("https://api.covid19api.com/country/Norway/status/confirmed?from=2020-03-01T00:00:00Z&to=2020-04-01T00:00:00Z")
    DataNor <- fromJSON(rawToChar(OutputDatNor$content))
  if(type!="all"){DataNor <- DataNor %>% select(type, Country, CountryCode, Cases, Lat, Lon, Date, Status) %>% filter(Cases == 19)}
  return(DataNor)}
ConfirmedNor("CountryCode")

The following functions returns data for South Africa & Mexico from the first recorded case. We will go through the same steps outlined above to retrieve and parse the data from the API.

DayOneDataSa <- GET(
  url = "https://api.covid19api.com/dayone/country/south-africa/status/confirmed")
DayOneDataMex <- GET(
  url = "https://api.covid19api.com/dayone/country/mexico/status/confirmed")

DayOneDataText <- content(DayOneDataSa, "text")
DayOneDataTextMex <- content(DayOneDataMex, "text")

DayOneDataJson <- fromJSON(DayOneDataText, flatten = TRUE)
DayOneDataJsonMex <- fromJSON(DayOneDataTextMex, flatten = TRUE)

DayOneDfSa <- as_tibble(DayOneDataJson)
DayOneDfMex <- as_tibble(DayOneDataJsonMex)

We will combine the data sets for the analysis.

SaMex <- function(df1, df2) {rbind(df1, df2)}
Day1 <- SaMex(df1 = DayOneDfSa, df2 = DayOneDfMex)

Day1$Province <- NULL
Day1$City <- NULL
Day1$CityCode <- NULL

These functions will allow the user to query the Day One Mexico and Day One South Africa APIs by columns by entering all or a column name. To subset by rows the user can enter the column name and row to be selected in the filter function.

DayOneSa <- function(type = "all"){
  OutputDat <- GET("https://api.covid19api.com/dayone/country/south-africa/status/confirmed")
    DataSa <- fromJSON(rawToChar(OutputDat$content))
  if(type!="all"){DataSa <- DataSa %>% select(type, Country, CountryCode, Cases, Lat, Lon, Date) %>% filter(Cases == 1)}
  return(DataSa)}
DayOneSa("CountryCode")
DayOneMex <- function(type = "all"){
  OutputDatMex <- GET("https://api.covid19api.com/dayone/country/mexico/status/confirmed")
    DataMex <- fromJSON(rawToChar(OutputDatMex$content))
  if(type!="all"){DataMex <- DataMex %>% select(type, Country, CountryCode, Cases, Lat, Lon, Date) %>% filter(Cases == 4)}
  return(DataMex)}
DayOneMex("CountryCode")

We will now look at summary data which contains a summary of new and total cases per country which is updated daily utilizing the steps outlined above to retrieve and parse data from the API.

resp2 <- GET("https://api.covid19api.com/summary")

resp2Text <- content(resp2, "text")
resp2Json <- fromJSON(resp2Text, flatten = TRUE)

# Save data frame as resp1Df and remove the ID column.
resp2Df <- as.data.frame(resp2Json$Countries)

resp2Df$ID <- NULL

This function allows the user to query the Summary API by columns by entering the column name or all. To subset by rows the user can enter the column name and row to be selected in the filter function.

Summary <- function(type = "all"){
  OutputAPI <- GET("https://api.covid19api.com/summary")
    data <- fromJSON(rawToChar(OutputAPI$content))
    data2 <- data$Countries
  if(type!="all"){data2 <- data2 %>% select(type, Country, CountryCode,  NewDeaths, TotalDeaths, NewRecovered) %>% filter(Country == "Afghanistan")}
  return(data2)}
Summary("all")

The next set of functions will return live cases by case type for a country after a given date. This particular endpoint returns cases for Greece.

Live <- GET("https://api.covid19api.com/live/country/greece/status/confirmed/date/2020-03-21T13:13:30Z")
LiveText <- content(Live, "text")
LiveTextJson <- fromJSON(LiveText, flatten = TRUE)

# Save the data frame as LiveDf.
LiveDf <- as.data.frame(LiveTextJson)

#Remove the ID, Province, City, & CityCode columns.
LiveDf$ID <- NULL
LiveDf$Province <- NULL
LiveDf$City <- NULL
LiveDf$CityCode <- NULL

The following function will allow the user to query live cases for Greece by columns by entering the column name or all. To subset by rows the user can enter the column name and row to be selected in the filter function.

Live2 <- function(type = "all"){
  OutputLive2 <- GET("https://api.covid19api.com/live/country/greece/status/confirmed/date/2020-03-21T13:13:30Z")
    DataLive <- fromJSON(rawToChar(OutputLive2$content))
  if(type!="all"){DataLive <- DataLive %>% select(type, Country, CountryCode, Lat, Lon, Confirmed, Deaths, Recovered, Active, Date) %>% filter(Deaths == "12604")}
  return(DataLive)}
Live2("Country")

The next set of functions will access the Day One endpoint which has information on all cases by case type for a country from the first recorded case. This function accesses cases for Denmark.

One <- GET("https://api.covid19api.com/total/dayone/country/denmark/status/confirmed")
OneText <- content(One, "text")
OneTextJson <- fromJSON(OneText, flatten = TRUE)

# Save the data frame as OneDf.
OneDf <- as.data.frame(OneTextJson)

The following function will allow the user to query day one cases for Denmark by columns by entering the column name or all. To subset by rows the user can enter the column name and row to be selected in the filter function.

OneFun <- function(type = "all"){
  OutputOneFun <- GET("https://api.covid19api.com/total/dayone/country/denmark/status/confirmed")
    DataOneFun <- fromJSON(rawToChar(OutputOneFun$content))
  if(type!="all"){DataOneFun <- DataOneFun %>% select(type, Country, CountryCode, Lat, Lon, Date, Status, Cases, Province, City, CityCode) %>% filter(Cases == 1)}
  return(DataOneFun)}
OneFun("Country")

The last function will contact the Countries API which returns all the available countries and provinces listed on the API which may be useful for the user when querying the returned data.

Ctry <- GET("https://api.covid19api.com/countries")
CtryText <- content(Ctry, "text")
CtryTextJson <- fromJSON(CtryText, flatten = TRUE)

# Save the data frame as CtryDf.
CtryDf <- as.data.frame(CtryTextJson)

The following function will allow the user to query the Countries API by column by entering the column name or all. To subset by rows the user can enter the column name and row to be selected in the filter function.

CtryFun <- function(type = "all"){
  OutputCtryFun <- GET("https://api.covid19api.com/countries")
    CtryDataFun <- fromJSON(rawToChar(OutputCtryFun$content))
  if(type!="all"){CtryDataFun <- CtryDataFun %>% select(type, Country, Slug, ISO2) %>% filter(Country == "Poland")}
  return(CtryDataFun)}
CtryFun("Country")

Data Analysis

We will create a contingency table that shows the occurrences of confirmed cases between March - April 2020 for Switzerland and Norway. Each country had a confirmed case during this time frame.

tbl <- table(Combo$Status, Combo$Country)
tbl
##            
##             Norway Switzerland
##   confirmed     32          32

Below is a box plot representing confirmed cases in Norway and Switzerland between March - April 2020. The output shows that Switzerland had a higher number of confirmed cases.

ggplot(Combo, aes(x = Cases, y = Country)) +
geom_boxplot() + geom_jitter(aes(color = Status)) + ggtitle("Boxplot for Confirmed Cases")

The following code calculates numerical summaries for daily cases confirmed for the two countries. The mean case count per day for Norway was 1,760 cases and the mean case count per day for Switzerland was 5,402 cases.

Combo %>%
  group_by(Country) %>%
  summarize(Avg = mean(Cases), Sd = sd(Cases), Median = median(Cases), IQR =    IQR(Cases))
## # A tibble: 2 x 5
##   Country       Avg    Sd Median   IQR
##   <chr>       <dbl> <dbl>  <dbl> <dbl>
## 1 Norway      1760. 1590.   1398 2720.
## 2 Switzerland 5402. 5967.   2450 9767.

The following bar plot reports the number of confirmed case statuses for Norway & Switzerland which were similar through March - April 2020. The data appears to report each day that had a confirmed case status.

ggplot(Combo, aes(x = Country)) + geom_bar(aes(fill = Status), position = "dodge") + xlab("Country") + scale_fill_discrete(name = "") + ggtitle("Confirmed Case Statuses")

The following contingency table reports the number of confirmed case statuses for South Africa and Mexico returned from the Day 1 API. The count represents the number of days in which at least one confirmed case was reported.

tbl2 <- table(Day1$Status, Day1$Country)
tbl2
##            
##             Mexico South Africa
##   confirmed    587          582

The following bar graph plots the output of the above contingency table which shows the confirmed case statuses for Mexico and South Africa from the first recorded case. Case statuses were similar for each country.

ggplot(Day1, aes(x = Country)) + geom_bar(aes(fill = Status), position = "dodge") + xlab("Country") + scale_fill_discrete(name = "") + ggtitle("Confirmed Cases Statuses")

The following code calculates numerical summaries for daily cases since Day 1 for Mexico and South Africa. Average cases for Mexico were approximately 1.4M and average cases for South Africa were 1.1M.

Day1 %>%
  group_by(Country) %>%
  summarize(Avg = mean(Cases), Sd = sd(Cases), Median = median(Cases), IQR =    IQR(Cases))
## # A tibble: 2 x 5
##   Country           Avg       Sd   Median      IQR
##   <chr>           <dbl>    <dbl>    <dbl>    <dbl>
## 1 Mexico       1441368. 1125767. 1289298  1998800.
## 2 South Africa 1134102.  879501.  926316. 1147854

The following code creates a new variable (Ratio = TotalDeaths/TotalConfirmed) that represents the ratio of total deaths to total confirmed cases and appends to the Resp2Df which is based on total cases for Countries that is updated daily. The ratio of deaths to total confirmed cases is low meaning most people recover from Covid19.

resp2Df <- resp2Df %>% mutate(Ratio = TotalDeaths/TotalConfirmed)
head(resp2Df) %>% select(Country, TotalDeaths, TotalConfirmed, Ratio)
## # A tibble: 6 x 4
##   Country             TotalDeaths TotalConfirmed   Ratio
##   <chr>                     <int>          <int>   <dbl>
## 1 Afghanistan                7220         155380 0.0465 
## 2 Albania                    2734         173190 0.0158 
## 3 Algeria                    5838         204171 0.0286 
## 4 Andorra                     130          15284 0.00851
## 5 Angola                     1598          60448 0.0264 
## 6 Antigua and Barbuda          87           3581 0.0243

The following code returns numerical summaries for total deaths among all countries reported in the Summary API. The average total deaths was approximately 25K.

resp2Df %>%
    summarize(Avg = mean(TotalDeaths), Max = max(TotalDeaths), Sd = sd(TotalDeaths), Min = min(TotalDeaths))
## # A tibble: 1 x 4
##      Avg    Max     Sd   Min
##    <dbl>  <int>  <dbl> <int>
## 1 25131. 707781 81699.     0

The following code returns a scatter plot for the summary data set. The plot shows there is minimal data on Newly Confirmed Cases and New Deaths; however, the graph shows a positive correlation between newly confirmed cases and new deaths.

correlation <- cor(resp2Df$NewConfirmed, resp2Df$NewDeaths)
g <- ggplot(resp2Df, aes(x = NewConfirmed, y = NewDeaths))+ labs(y="New Deaths", x = "New Confirmed Cases")
g + geom_point(col = "Red") + ggtitle("New Confirmed Cases vs New Deaths") + geom_text(x = 75000, y = 50, size = 5, label = paste0("Correlation = ", round(correlation, 2)))

The following code creates a new variable which calculates new deaths to new confirmed cases for the countries in the summary data set. The ratio of new deaths to confirmed cases was less than 1% for Australia.

resp2Df <- resp2Df %>% mutate(Ratio = NewDeaths/NewConfirmed)
a <- filter(resp2Df, Country == "Australia")

head(a) %>% select(Country, NewConfirmed, NewDeaths, Ratio)
## # A tibble: 1 x 4
##   Country   NewConfirmed NewDeaths   Ratio
##   <chr>            <int>     <int>   <dbl>
## 1 Australia         2216        10 0.00451

The following code will produce a histogram that displays the total confirmed cases for all countries included in the summary API. The highest frequency of cases was between 50,000 and 250,000 leveling off in the 750,000 range.

ggplot(data = resp2Df, aes(TotalConfirmed)) + geom_histogram(breaks = seq(20000, 900000, by = 70000), col = "blue", fill = "purple", alpha = .2) + labs(title = "Total Confirmed Cases for All Countries") + xlab("Total Confirmed")

Conclusion

We have reached the conclusion of this vignette. In summary this vignette details the functions used and created to access and parse data from the Covid19 API. The vignette also includes an exploratory data analysis which includes plots, tables, and numerical summaries. This vignette can be used to get a basic understanding of how to contact and retrieve data from an API.