R intro part 1
Written on November 18th, 2019 by szarki9
Hola, hola.
As another part of my studying experience, I want to sum up my weekend entertainment, which was learning R. This summary takes the form of a note with links to documentation.
First of all, let me tell you what R is.
R is a programming language created in the 1990s for data analysis and as a teaching aid for learning statistics. It is an interpreted language, meaning that most implementations execute instructions directly without first compiling the program. Because it is an open-source project, R quickly became popular, and many users are still engaged in developing R and in working on new packages and libraries. Thanks to that, R has a lot of useful functions and is used for statistical calculations, data visualization, data analysis and data science.
Data types, objects:
The basic object type in R is a vector. All elements of a vector have the same data type, most commonly one of the following: numeric, integer, character or logical. Besides vectors, there are matrices, data frames and lists.
Matrix – a two-dimensional object in which all elements must have the same type and all columns the same length.
Data frame – used for storing data tables; it is a list of vectors of equal length, and the columns may have different data types.
List – an object that can contain elements of different types.
Factor – used to represent categorical data; values are stored as integers, with labels (levels) associated with these unique integers. By default, R sorts the levels in alphabetical order.
Array – an object that can store data in more than two dimensions.
Tibble – a data frame variant with a nicer printing method, useful when working with large data sets.
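To make the object types above a bit more concrete, here is a minimal sketch in base R. All variable names and values are made up for illustration; tibbles are left out because they need an extra package.

```r
# Atomic vectors: every element shares one type
v <- c(1.5, 2, 3.25)           # numeric (double)
i <- c(1L, 2L, 3L)             # integer
s <- c("a", "b", "c")          # character
b <- c(TRUE, FALSE, TRUE)      # logical

# Mixing types in one vector coerces everything to the most general type
mixed <- c(1, "a", TRUE)       # all elements become character

# Matrix: two dimensions, a single type for all elements
m <- matrix(1:6, nrow = 2, ncol = 3)

# Data frame: columns of equal length, types may differ per column
df <- data.frame(id = i, name = s, flag = b, stringsAsFactors = FALSE)

# List: elements can be of different types and lengths
lst <- list(number = 42, text = "hello", values = v)

# Factor: categorical data stored as integers with labels (levels)
f <- factor(c("low", "high", "medium", "low"))
levels(f)       # levels sorted alphabetically: "high" "low" "medium"
as.integer(f)   # underlying integer codes: 2 1 3 2

# Array: more than two dimensions
a <- array(1:24, dim = c(2, 3, 4))
dim(a)
```

Note that the alphabetical level order is only the default; `factor()` accepts a `levels` argument if you want a different order.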
Useful functions:
class() – returns the class of an object.
names() – gets or sets the names of a vector's elements; colnames() and rownames() do the same for the columns and rows of a matrix.
cbind() – adds a new vector to a matrix, or merges matrices, by columns.
rbind() – combines vectors, matrices or data frames by rows.
levels() – returns the levels of a factor.
summary() – produces a summary of an object.
str() – displays the structure of an arbitrary R object.
head() – returns the first rows or elements of an object.
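A quick sketch of how these functions can be used together, in base R only. The data (scores, a small matrix and a small data frame) is invented for illustration.

```r
# names(): attach names to the elements of a vector
scores <- c(90, 75, 82)
names(scores) <- c("ann", "bob", "cy")
class(scores)                      # "numeric"

# colnames()/rownames(): the matrix counterparts of names()
m <- matrix(1:4, nrow = 2)
colnames(m) <- c("x", "y")
rownames(m) <- c("r1", "r2")

# cbind(): add a new column; rbind(): add a new row
m2 <- cbind(m, z = c(5, 6))        # now 2 rows x 3 columns
m3 <- rbind(m2, r3 = c(7, 8, 9))   # now 3 rows x 3 columns

# A small data frame with a factor column
df <- data.frame(group = factor(c("b", "a", "b")),
                 value = c(1.2, 3.4, 5.6))

levels(df$group)   # levels of the factor column
summary(df)        # per-column summary
str(df)            # compact view of the structure
head(df, 2)        # first two rows
```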
Libraries worth knowing:
ggplot2 – a data visualization package and the "most elegant and aesthetically pleasing graphics framework available in R".
dplyr – a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges (such as select, mutate, filter and arrange), which can be chained into pipelines with %>%.
readr – provides a fast and friendly way to read rectangular data (like csv, tsv, and fwf) (read_csv, read_tsv, read_delim etc.)
readxl – makes it easy to get data out of Excel and into R, has no external dependencies and is designed to work with tabular data (read_excel, excel_sheets etc.)
XLConnect – a comprehensive and cross-platform R package for manipulating Microsoft Excel files from within R.
DBI – a database interface definition for communication between R and relational database management systems.
httr – useful tools for working with HTTP organized by HTTP verbs (GET(), POST(), etc). Configuration functions make it easy to control additional request components (authenticate(), add_headers() and so on).
jsonlite – a fast JSON parser and generator optimized for statistical data and the web.
haven – enables R to read and write various data formats used by other statistical packages (SAS, SPSS, Stata).
foreign – another package for reading and writing data formats such as SAS, SPSS and Stata.
tidyr – helps you create tidy data, with functions such as gather, spread, separate, unite and extract.
lubridate – provides easy and fast parsing of date-times, simple functions to get and set components of a date-time (year, month, day, hour, minute, second), and helpers for handling time zones.
stringr – provides a cohesive set of functions designed to make working with strings as easy as possible (str_detect, str_replace).
tidyverse – a set of packages that work in harmony because they share common data representations and API design, such as ggplot2, dplyr, tidyr, readr, purrr, tibble, haven and stringr.
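As a small taste of how these packages feel in practice, here is a sketch of a dplyr pipeline using the verbs mentioned above. It assumes dplyr is installed, and it uses R's built-in mtcars data set purely as an example (the data set and the unit conversion are my choice for illustration, not something specific to dplyr).

```r
library(dplyr)

# Chain the verbs with %>%: each step receives the result of the previous one
result <- mtcars %>%
  select(mpg, cyl, wt) %>%                  # keep only three columns
  filter(cyl == 4) %>%                      # keep only 4-cylinder cars
  mutate(wt_kg = wt * 1000 * 0.4536) %>%    # wt is in 1000 lbs; convert to kg (approx.)
  arrange(desc(mpg))                        # sort by mpg, highest first

head(result, 3)
```

The same steps could be written as nested function calls, but the pipeline reads top to bottom in the order the operations happen.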
That's all for now, I will get back to you with more soon!
szarki9