
Project 2 - Predictive Modeling

This repo documents our group’s work on Project 2, which involved analyzing the Online News Popularity data set, building models to predict the number of shares for articles in each channel, and automating a Markdown report for each article type.

Packages Required

The following packages were used to retrieve and analyze the data:
* tidyverse: Functions used to manipulate and reshape data.
* dplyr: Functions used to manipulate data in R.
* GGally: Functions used to create correlograms.
* caret: Functions that streamline the model training process for regression and classification problems.
* randomForest: Provides the randomForest() function, used to fit and analyze random forest models.
* doParallel: Functions used to allow parallel computing in R.
* rmarkdown: Creates dynamic analysis documents that combine code, rendered output, and text.
* knitr: Enables integration of R code into LaTeX, HTML, and Markdown documents.
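The doParallel package is listed above but the rendering code below does not show how it was registered. A minimal sketch of a typical setup is shown here; the cluster size and the placement around caret::train() are assumptions, not taken from the project code:

```r
# Register a parallel backend so caret::train() can spread resampling
# iterations across cores (doParallel loads the parallel package)
library(doParallel)

cl <- makeCluster(detectCores() - 1)  # leave one core free for the OS
registerDoParallel(cl)

# ... caret::train() calls inside the .Rmd run in parallel here ...

stopCluster(cl)  # release the workers when training is finished
```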

Reports

A report was created for each data channel: Lifestyle, Business, Entertainment, Social Media, Tech, and World articles.
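For the parameterized rendering shown below to work, the .Rmd file needs a params field in its YAML header. A minimal sketch is shown here; the title, output format, and default value are illustrative, and the exact header in Project 2 Rmd.Rmd may differ:

```yaml
---
title: "Online News Popularity Analysis"
output: github_document
params:
  topic: "Lifestyle"   # default; overwritten by render() for each channel
---
```

Inside the .Rmd, the report would then reference params$topic (for example, to filter the data to the current channel) so that each rendered file analyzes a different article type.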

Code

The code used to render all six reports from a single .Rmd file is below:

```r
# Create a list containing each article topic
library(tidyverse)
Topic <- list("Lifestyle", "Business", "Entertainment", "Social Media",
              "Tech", "World")

# Build the output file names and the parameter list for each report
output_file <- paste0("Reports/", Topic, ".md")
params <- lapply(Topic, FUN = function(x){list(topic = x)})
reports <- tibble(output_file, params)

# Render a report for each article topic
library(rmarkdown)
apply(reports, MARGIN = 1, 
      FUN = function(x){
        render("Project 2 Rmd.Rmd", output_file = x[[1]], params = x[[2]])
      })
```