R intro part 2
Written on November 18th, 2019 by szarki9
Hi there again,
part 2 of R intro summary :)
other handy functions:
lapply - returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.
sapply – returns a vector, matrix or array of output from applying FUN to elements of X, more user-friendly than lapply.
vapply – like sapply, but has a pre-specified type of return value (the FUN.VALUE argument), so it can be safer (and sometimes faster) to use
seq – generate a sequence, rep – replicate elements, sort – sort a vector, rev – reverse the elements, append – merge vectors, is.* – check the class of an R object, as.* – convert an R object from one class to another, unlist – flatten a list to a vector
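A quick sketch of the apply family and a few of the helpers above. The `words` list is made-up example data, not something from the course:

```r
# Hypothetical example data: a named list of character vectors
words <- list(a = c("R", "is", "fun"), b = c("hello", "world"))

# lapply always returns a list, one element per input element
len_list <- lapply(words, length)

# sapply simplifies the result to a named vector when it can
len_vec <- sapply(words, length)

# vapply does the same, but errors if a result is not a single integer
len_safe <- vapply(words, length, FUN.VALUE = integer(1))

# unlist flattens the list into one vector, rev reverses it
flat <- unlist(words, use.names = FALSE)
reversed <- rev(flat)

# is.* checks the class, as.* converts between classes
is.character("42")  # TRUE
as.numeric("42")    # 42
```

The difference between sapply and vapply only shows up when something goes wrong: sapply silently returns a list if the results do not fit in a vector, while vapply stops with an error.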
REGULAR EXPRESSIONS! – a regular expression is a sequence of characters that defines a search pattern.
grepl – returns TRUE if a pattern is found, grep – returns a vector of indices of the character strings that contain the pattern; in sub and gsub you can specify a replacement argument: sub replaces only the first match, gsub all the matches.
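A small sketch of the four regex functions side by side. The `emails` vector is made-up example data:

```r
# Hypothetical example data
emails <- c("john@example.com", "not an email", "kate@test.org")

# grepl: one TRUE/FALSE per element
has_at <- grepl("@", emails)

# grep: indices of the elements that match
idx <- grep("@", emails)

# sub replaces only the first match, gsub replaces every match
first_only <- sub("e", "E", emails[1])
all_matches <- gsub("e", "E", emails[1])
```

Here `first_only` is "john@Example.com" and `all_matches` is "john@ExamplE.com".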
Date and POSIXct objects –
dplyr: arrange – sort rows in descending or ascending order, filter – keep rows that match a condition, mutate – add a new column computed from the existing ones
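A minimal sketch of the three dplyr verbs chained together, using the built-in mtcars data set (my choice for illustration). The `kpl` column name and the rough mpg to km/l factor are my own assumptions:

```r
library(dplyr)

cars4 <- mtcars %>%
  filter(cyl == 4) %>%          # keep only 4-cylinder cars
  mutate(kpl = mpg * 0.425) %>% # new column built from an existing one (approximate km per litre)
  arrange(desc(mpg))            # sort by mpg, highest first
```

Drop `desc()` to sort in ascending order instead.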
ggplot2: ggplot, facet_wrap – wraps a 1d sequence of panels into 2d, expand_limits
different plots with ggplot2: + geom_col – bar plot, geom_point – points, geom_line, geom_histogram, geom_boxplot
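A short sketch of how these ggplot2 pieces fit together, again on the built-in mtcars data (my choice, not from the course):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                # one point per car
  facet_wrap(~ cyl) +           # one panel per number of cylinders
  expand_limits(x = 0, y = 0)   # make sure both axes include 0
```

Typing `p` (or `print(p)`) draws the plot. Swapping `geom_point()` for another geom changes the plot type, though some geoms such as `geom_histogram()` expect only an x aesthetic.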
LOADING DATA INTO R
Utils package:
read.csv, read.delim – for tab-delimited txt files, read.table; in read.delim, for example, you can set column names (col.names) and column classes (colClasses). which.min, which.max (from base R) – return the index of the min or max value in a column
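A sketch of read.delim with custom column names and classes, plus which.max and which.min. The file is a temporary one written on the spot, and its contents are made up:

```r
# Write a small tab-delimited file to a temporary location (hypothetical data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("name\tscore", "anna\t90", "ben\t72", "cara\t85"), tmp)

# Read it back, overriding the column names and fixing the column classes
scores <- read.delim(tmp,
                     col.names = c("student", "points"),
                     colClasses = c("character", "numeric"))

# Index of the best and worst result
best <- which.max(scores$points)
worst <- which.min(scores$points)
scores$student[best]
```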
readr library:
read_csv, read_tsv – tsv files, read_delim
collectors – are used to pass information about how to interpret values in a column; col_integer, col_factor
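A minimal sketch of collectors in action, passed through the `col_types` argument. The file and its columns are made up for the example:

```r
library(readr)

# Write a small csv file to a temporary location (hypothetical data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,group", "1,a", "2,b", "3,a"), tmp)

# Tell readr how to interpret each column
dat <- read_csv(tmp,
                col_types = cols(
                  id = col_integer(),
                  group = col_factor(levels = c("a", "b"))
                ))
```

Without `col_types`, readr guesses the types itself, and `id` would come back as a double rather than an integer.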
data.table library:
fread – same as read.table, but extremely fast and easy
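A tiny sketch of fread on a made-up temporary file. Note that no separator or column types are given; fread works them out:

```r
library(data.table)

# Write a small csv file to a temporary location (hypothetical data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,0.5", "2,1.5"), tmp)

# fread detects the separator, header and column types on its own
dt <- fread(tmp)
```

The result is a data.table, which is also a data.frame.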
readxl library:
excel_sheets – prints out names of the sheets in excel, read_excel – imports excels as tbl_df, tbl, data.frame,
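A short sketch using the example workbook that ships with readxl, so no Excel file of your own is needed. Choosing that workbook and its "mtcars" sheet is my assumption for illustration:

```r
library(readxl)

# Path to an example workbook bundled with the readxl package
path <- readxl_example("datasets.xlsx")

# Names of the sheets in the workbook
sheets <- excel_sheets(path)

# Import one sheet by name (a sheet index works too)
cars <- read_excel(path, sheet = "mtcars")
```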
gdata library:
read.xls – converting excel files to csv and then reading csv files using read.csv,
XLConnect library:
loadWorkbook – a bridge between an Excel file and the R session, getSheets – list of sheets, readWorksheet – importing a sheet into a data frame, createSheet – create a new sheet, writeWorksheet – adding a data frame to a sheet, saveWorkbook – store the adapted Excel file, renameSheet, removeSheet
DBI library:
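dbConnect – creates a connection to the database (for MySQL, a MySQLConnection object), dbListTables – list of the tables in the db, dbReadTable – read a whole table, dbGetQuery – using a query to get data from a database table, dbDisconnect – close the connection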
dbSendQuery, then dbFetch – fetching results of executing a query, gives the ability to fetch the query’s result in chunks rather than all at once, dbClearResult – frees the memory
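A sketch of the whole DBI workflow, including fetching in chunks. The notes above talk about MySQL; to keep the example self-contained I am assuming an in-memory SQLite database via the RSQLite package instead, filled with the built-in mtcars data. With MySQL only the dbConnect call would differ:

```r
library(DBI)

# Assumption: in-memory SQLite instead of a MySQL server
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)
tables <- dbListTables(con)

# One-step version: send the query and get everything back at once
all_rows <- dbGetQuery(con, "SELECT * FROM mtcars WHERE cyl = 4")

# Two-step version: send the query, then fetch the result in chunks of 5 rows
res <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4")
total <- 0
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 5)
  total <- total + nrow(chunk)
}
dbClearResult(res)  # free the memory held by the result

dbDisconnect(con)
```

Both versions see the same rows; the chunked one is useful when the full result would not fit comfortably in memory.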
Importing flat files from the web: using readr library, read_csv or read_tsv with url address.
Downloading Excel files from the web: readxl and gdata libraries. read.xls (gdata) can read directly from a URL; for read_excel (readxl), use download.file first and then read the local copy.
Downloading .RData files, download.file first and then load it into the workspace with load.
httr library:
GET – sends a GET request to a URL; the result is a response object that provides easy access to the status code, content-type, and actual content. Using content we can extract the content and choose what kind of object we want to retrieve: a raw object, an R object (list) or a character vector.
JSON files: first GET, then content as text; we can also use the jsonlite library and fromJSON to convert character data into a list, passing either a JSON string or a URL as the argument. toJSON converts data to JSON, prettify makes JSON pretty and minify makes it as concise as possible.
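A small offline sketch of the jsonlite functions. The JSON string and the list are made up; in practice the text would often come from `content(resp, as = "text")` after a GET:

```r
library(jsonlite)

# Hypothetical JSON text
txt <- '{"name": "R", "versions": [3, 4]}'

# Character data to an R list
parsed <- fromJSON(txt)

# R object back to JSON; auto_unbox keeps length-one vectors as scalars
json <- toJSON(list(lang = "R", year = 2019), auto_unbox = TRUE)

pretty <- prettify(json)  # indented, one field per line
small <- minify(json)     # no extra whitespace
```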
haven library:
read_sas for SAS
read_dta for STATA, columns with value labels are imported as labelled vectors; to change them to a standard R format we use the as_factor function, and then we can convert the result to the wanted data type.
read_sav or read_por for SPSS, also labeled class and we need to change it to other standard R class.
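A sketch of the labelled to factor conversion. Since there is no real STATA file at hand, I am assuming a made-up data frame that gets written with `write_dta` first and then read back:

```r
library(haven)

# Hypothetical data with value labels, written to a temporary .dta file
df <- data.frame(id = 1:3)
df$sex <- labelled(c(1L, 2L, 1L), labels = c(male = 1L, female = 2L))
tmp <- tempfile(fileext = ".dta")
write_dta(df, tmp)

# Read it back: the sex column arrives as a labelled vector
survey <- read_dta(tmp)
class(survey$sex)

# Convert the labelled column to a factor, then to character if needed
survey$sex <- as_factor(survey$sex)
sex_chr <- as.character(survey$sex)
```

The same as_factor step applies to data imported with read_sav or read_por.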
foreign library:
Simple functions to import STATA data and SPSS - read.dta for STATA, read.spss for SPSS.
thanks, wait for more!
szarki9