Applied Supervised Learning with R

Use machine learning libraries of R to build models that solve business problems and predict future trends

Product type: Paperback
Published: May 2019
ISBN-13: 9781838556334
Length: 502 pages
Edition: 1st Edition
Authors: Jojo Moolayil, Karthik Ramasubramanian
Table of Contents

Preface
1. R for Advanced Analytics
2. Exploratory Analysis of Data
3. Introduction to Supervised Learning
4. Regression
5. Classification
6. Feature Selection and Dimensionality Reduction
7. Model Improvements
8. Model Deployment
9. Capstone Project - Based on Research Papers
Appendix

Chapter 1: R for Advanced Analytics


Activity 1: Create an R Markdown File to Read a CSV File and Write a Summary of Data

  1. Start RStudio and navigate to File | New File | R Markdown.

  2. On the New R Markdown window, provide the Title and Author name, as illustrated in the following screenshot. Ensure that you select the Word option under the Default Output Format section:

    Figure 1.13: Creating a new R Markdown file in RStudio

  3. Now, use the read.csv() function to read the bank-full.csv file:

    Figure 1.14: Using the read.csv method to read the data

  4. Finally, print the summary to a Word file using the summary() function:

    Figure 1.15: Final output after using the summary method
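
Because the screenshots are not reproduced here, the following is a minimal sketch of what the code chunk in the R Markdown file might contain. It is not the book's exact code. To stay runnable anywhere, it writes a tiny hypothetical stand-in for bank-full.csv to a temporary file, and it assumes the file is semicolon-delimited (use sep = "," if yours is comma-separated). The column names and rows are illustrative only.

```r
# Hypothetical stand-in for bank-full.csv: three illustrative rows written
# to a temporary file so the sketch does not depend on the real data set.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("age;job;balance",
             "58;management;2143",
             "44;technician;29",
             "33;entrepreneur;2"),
           csv_path)

# Assumption: the file is semicolon-delimited; change sep if it is not.
df_bank_detail <- read.csv(csv_path, sep = ";", stringsAsFactors = FALSE)

# summary() reports min, quartiles, mean, and max for each numeric column
# and length/class information for character columns.
summary(df_bank_detail)
```

In an actual .Rmd file, these lines would sit inside an R code chunk, and knitting the document to Word writes the summary() output into the generated file.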

Activity 2: Create a List of Two Matrices and Access the Values

  1. Create two matrices of size 10 x 4 and 4 x 5, filled with random numbers drawn from a binomial distribution (use the rbinom() function). Name the matrices mat_A and mat_B, respectively:

    mat_A <- matrix(rbinom(n = 40, size = 100, prob = 0.4),nrow = 10, ncol=4)
    mat_B <- matrix(rbinom(n = 20, size = 100, prob = 0.4),nrow = 4, ncol=5)
  2. Now, store the two matrices in a list:

    list_of_matrices <- list(mat_A = mat_A, mat_B =mat_B)
  3. Using the list, access the row 4 and column 2 of mat_A and store it in variable A, and access row 2 and column 1 of mat_B and store it in variable B:

    A <- list_of_matrices[["mat_A"]][4,2]
    B <- list_of_matrices[["mat_B"]][2,1]
  4. Multiply the values A and B, and subtract the product from the value at row 2 and column 1 of mat_A:

    list_of_matrices[["mat_A"]][2,1] - (A*B)

    The output is similar to the following (the exact value varies between runs because the matrices are generated randomly):

    ## [1] -1554
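
The steps above can be combined into one reproducible sketch. The call to set.seed() is an addition not present in the original steps, and the seed value 42 is arbitrary. It only makes repeated runs return the same number, which will generally differ from the value shown above.

```r
set.seed(42)  # assumption: any fixed seed works; 42 is arbitrary

# Two matrices of binomial draws (100 trials, success probability 0.4).
mat_A <- matrix(rbinom(n = 40, size = 100, prob = 0.4), nrow = 10, ncol = 4)
mat_B <- matrix(rbinom(n = 20, size = 100, prob = 0.4), nrow = 4, ncol = 5)

# A named list keeps both matrices accessible by name.
list_of_matrices <- list(mat_A = mat_A, mat_B = mat_B)

# [[ ]] extracts the matrix itself; [i, j] then picks a single element.
A <- list_of_matrices[["mat_A"]][4, 2]
B <- list_of_matrices[["mat_B"]][2, 1]

# A and B are single numbers, so A * B is ordinary scalar multiplication.
result <- list_of_matrices[["mat_A"]][2, 1] - (A * B)
print(result)
```

Using single brackets, as in list_of_matrices["mat_A"], would return a one-element list rather than the matrix, so the [4, 2] indexing would fail.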

Activity 3: Create a DataFrame with Five Summary Statistics for All Numeric Variables from Bank Data Using dplyr and tidyr

  1. Load the dplyr and tidyr packages:

    library(dplyr)
    library(tidyr)
    ## Warning: package 'tidyr' was built under R version 3.2.5
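  2. Create the df DataFrame from the bank data. This assumes that bank-full.csv has already been read into df_bank_detail:

    # as_tibble() replaces tbl_df(), which is deprecated in current dplyr
    df <- as_tibble(df_bank_detail)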
  3. Extract the numeric variables from the bank data using select(), and compute the min, 1st quartile, median, 3rd quartile, max, mean, and standard deviation of each using the summarise_all() function:

    df_wide <- df %>%
      select(age, balance, duration, pdays) %>%
      # A named list of functions replaces funs(), deprecated since dplyr 0.8.0
      summarise_all(list(min = min,
                         q25 = ~ quantile(., 0.25),
                         median = median,
                         q75 = ~ quantile(., 0.75),
                         max = max,
                         mean = mean,
                         sd = sd))
  4. The result is a wide data frame with one row and 28 columns (4 variables x 7 measures):

    dim(df_wide)
    ## [1]  1 28
  5. The result is stored in wide format in df_wide. Convert it from wide to deep (long) format, with one row per variable and one column per statistic, using the gather(), separate(), and spread() functions of the tidyr package:

    df_stats_tidy <- df_wide %>% gather(stat, val) %>%
      separate(stat, into = c("var", "stat"), sep = "_") %>%
      spread(stat, val) %>%
      select(var,min, q25, median, q75, max, mean, sd) # reorder columns
    print(df_stats_tidy)

    The output is as follows:

    ## # A tibble: 4 x 8
    ##        var   min   q25 median   q75    max       mean         sd
    ## *    <chr> <dbl> <dbl>  <dbl> <dbl>  <dbl>      <dbl>      <dbl>
    ## 1      age    18    33     39    48     95   40.93621   10.61876
    ## 2  balance -8019    72    448  1428 102127 1362.27206 3044.76583
    ## 3 duration     0   103    180   319   4918  258.16308  257.52781
    ## 4    pdays    -1    -1     -1    -1    871   40.19783  100.12875
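
As a cross-check that does not depend on dplyr or tidyr, the same one-row-per-variable layout can be sketched in base R. This is an alternative technique, not the activity's intended solution. The data frame df_num and the helper seven_stats() are hypothetical names, and the values are randomly generated stand-ins rather than the bank data.

```r
# Hypothetical stand-in for the numeric columns of the bank data.
set.seed(1)
df_num <- data.frame(age      = sample(18:95, 50, replace = TRUE),
                     balance  = round(rnorm(50, mean = 1300, sd = 3000)),
                     duration = sample(0:600, 50, replace = TRUE))

# Hypothetical helper: returns the seven statistics as a named vector.
seven_stats <- function(x) {
  c(min    = min(x),
    q25    = unname(quantile(x, 0.25)),
    median = median(x),
    q75    = unname(quantile(x, 0.75)),
    max    = max(x),
    mean   = mean(x),
    sd     = sd(x))
}

# sapply() gives a 7 x 3 matrix (statistics in rows); t() puts variables in rows.
stats_mat <- t(sapply(df_num, seven_stats))

# Move the variable names from row names into an explicit 'var' column.
df_stats_base <- data.frame(var = rownames(stats_mat), stats_mat,
                            row.names = NULL)
print(df_stats_base)
```

The resulting columns match those of df_stats_tidy (var, min, q25, median, q75, max, mean, sd), which makes it straightforward to compare the two approaches on the real data.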