1.1 Exploratory Data Analysis

1.1.1 Libraries

We will use many functions from base R (the packages loaded by default), but also some functions from time series packages. Much of the strength of the R programming language lies in its vast collection of contributed packages. A curated list of packages for time series analysis can be found in the CRAN Task View at https://cran.r-project.org/web/views/TimeSeries.html. Let’s install some of these.

# Most common time series libraries
install.packages(c("dlm", "forecast", "lubridate", "tsa", "tseries", "xts", 
    "zoo"), dependencies = TRUE)
# Datasets
install.packages(c("expsmooth", "fpp", "TSA", "astsa"))
# Hadley Wickham's tidyverse universe
install.packages("tidyverse")

# To load and attach a package, use library()
library(xts)
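Reinstalling packages on every run is slow. A minimal sketch, assuming you only want to install what is missing, checks each name with requireNamespace() first (which loads a namespace without attaching it). The helper name missing_packages is hypothetical and not part of any package.

```r
# Hypothetical helper: return the package names that cannot be loaded
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (without an error) if a package is absent
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Base packages are always available, so nothing is reported missing
missing_packages(c("stats", "utils"))

# Install only the time series packages that are absent (not run here)
# install.packages(missing_packages(c("forecast", "tseries", "xts", "zoo")))
```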

As a side remark, note that the tidyverse, which attaches a bundle of packages, masks some functions from other packages, notably lag and filter, which are present in both dplyr and stats (one of the default packages that ship with R and are attached at startup). Load libraries with great caution! In case of ambiguity (when functions in different packages share the same name), use the :: operator to specify the package, e.g. stats::lag. You can detach a package, and unload its namespace, with detach(). Detaching the tidyverse meta-package does not detach the packages it attached, so dplyr must be detached separately before stats::filter and stats::lag are found first again.

detach("package:tidyverse", unload = TRUE)
detach("package:dplyr")
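Calling a function with an explicit package prefix sidesteps masking altogether. The sketch below uses only base R on a toy series (values made up for illustration): stats::lag shifts the time base of a ts object rather than its values, and stats::filter computes a moving average.

```r
# Toy annual series 2000-2004 (illustrative values)
x <- ts(1:5, start = 2000)

# stats::lag with k = 1 moves the time index back by one period;
# the observed values themselves are unchanged
lagged <- stats::lag(x, k = 1)
tsp(lagged)   # start 1999, end 2003, frequency 1

# stats::filter with three equal weights gives a centred moving average
ma <- stats::filter(x, filter = rep(1 / 3, 3), sides = 2)
ma            # NA at both ends, then 2, 3, 4 in the middle
```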

1.1.2 Loading datasets

You can read files (e.g., .txt or .csv) stored on your computer, or download them directly into R from the web. Datasets shipped with R packages are loaded via data().
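As a self-contained sketch of the read step, the code below writes a small csv file to a temporary location and reads it back with read.csv(); the file, column names, and values are illustrative only. read.csv() also accepts a URL in place of a local path.

```r
# Write an illustrative three-month table to a temporary csv file
path <- tempfile(fileext = ".csv")
write.csv(data.frame(month = 1:3, count = c(112, 118, 132)),
          path, row.names = FALSE)

# Read it back; the result is a data frame, not yet a time series
df <- read.csv(path)
str(df)

# Attach a monthly time index (start date chosen for illustration)
counts <- ts(df$count, start = c(1949, 1), frequency = 12)

# List some of the datasets shipped with the 'datasets' package
head(data(package = "datasets")$results[, "Item"])
```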

Good data sources for your semester projects are

1.1.3 Time series objects and basic plots

Most objects in R are vectors, which have a type (e.g., double or character) and may carry attributes such as names or dim. Some objects also have a class attribute, such as ts. Generic functions such as print and plot then dispatch to methods specific to that class.
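The distinction between type, attributes and class can be inspected directly. A short sketch on a made-up quarterly series: the ts class is stored as an attribute alongside tsp (start, end, frequency), and stripping the attributes leaves a plain double vector.

```r
# Illustrative quarterly series starting in 2020 Q1
x <- ts(c(3, 1, 4, 1, 5, 9), start = c(2020, 1), frequency = 4)

typeof(x)       # "double": the underlying vector type
attributes(x)   # $tsp (start, end, frequency) and $class
class(x)        # "ts": print() and plot() dispatch on this class

# as.numeric() drops all attributes, leaving a bare vector
v <- as.numeric(x)
attributes(v)   # NULL
```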

We start with the AirPassengers dataset, which contains monthly totals of international airline passengers (in thousands) from 1949 to 1960. AirPassengers and sunspot.month belong to the datasets package, which is attached by default. Datasets found in other packages must typically be loaded via a call to data(), unless they are lazy-loaded when the package is attached. Both AirPassengers and sunspot.month are time series (objects of class ts).

# AirPassengers dataset, lazy-loaded from package datasets
class(AirPassengers)  #object of class 'ts'
[1] "ts"
?AirPassengers  # description of the dataset
# Basic plot
plot(AirPassengers, ylab = "Monthly total (in thousands)", main = "Number of international airline passengers")
grid()
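The time index of AirPassengers is stored in the object and can be queried or used for subsetting with functions from stats. A brief sketch: window() extracts a sub-period and aggregate() with nfrequency = 1 sums the months into annual totals.

```r
# Time attributes: start, end and number of observations per year
start(AirPassengers)      # 1949 1
end(AirPassengers)        # 1960 12
frequency(AirPassengers)  # 12

# Extract a sub-period: all twelve months of 1960
ap_1960 <- window(AirPassengers, start = c(1960, 1), end = c(1960, 12))

# Sum the monthly values into annual totals (one value per year)
annual <- aggregate(AirPassengers, nfrequency = 1, FUN = sum)
annual
```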

?sunspot.month  # description of the dataset
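# Title built from the series itself: sunspot.month extends past 1983
plot(sunspot.month, ylab = "Monthly number of sunspots",
    main = paste("Monthly mean relative sunspot numbers from",
        start(sunspot.month)[1], "to", end(sunspot.month)[1]), bty = "l")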

# Dataset present in an R package, loaded without attaching the package
data(list = "birth", package = "astsa")
plot(birth, ylab = "Monthly live births (in thousands)", main = "U.S. Monthly Live Births")
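data() also takes an envir argument, which can keep loaded datasets out of the global workspace. A minimal sketch using a dataset from the datasets package, so it runs without installing astsa; the environment name e is arbitrary.

```r
# Create a separate environment to hold loaded datasets
e <- new.env()
data("AirPassengers", package = "datasets", envir = e)

ls(e)                     # "AirPassengers"
class(e$AirPassengers)    # "ts"

# Same pattern for astsa, if installed (not run here):
# data(list = "birth", package = "astsa", envir = e)
```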