Introduction to Data Science: Data Analysis and Prediction Algorithms with R introduces concepts and skills that can help you tackle real-world data analysis challenges. It covers concepts from probability, statistical inference, linear regression, and machine learning. It also helps you develop skills such as R programming, data wrangling, data visualization, predictive algorithm building, file organization with UNIX/Linux shell, version control with Git and GitHub, and reproducible document preparation.
This book is a textbook for a first course in data science. No previous knowledge of R is necessary, although some experience with programming may be helpful. The book is divided into six parts: R, data visualization, statistics with R, data wrangling, machine learning, and productivity tools. Each part has several chapters meant to be presented as one lecture.
The author uses motivating case studies that realistically mimic a data scientist's experience. He starts by asking specific questions and answers these through data analysis so concepts are learned as a means to answering the questions. Examples of the case studies included are: US murder rates by state, self-reported student heights, trends in world health and economics, the impact of vaccines on infectious disease rates, the financial crisis of 2007-2008, election forecasting, building a baseball team, image processing of hand-written digits, and movie recommendation systems.
The statistical concepts used to answer the case study questions are only briefly introduced, so complementing with a probability and statistics textbook is highly recommended for in-depth understanding of these concepts. If you read and understand the chapters and complete the exercises, you will be prepared to learn the more advanced concepts and skills needed to become an expert.
Table of Contents
I R
1. Installing R and RStudio
Installing R
Installing RStudio
2. Getting Started with R and RStudio
Why R?
The R console
Scripts
RStudio
The panes
Key bindings
Running commands while editing scripts
Changing global options
Installing R packages
3. R Basics
Case study: US Gun Murders
The very basics
Objects
The workspace
Functions
Other prebuilt objects
Variable names
Saving your workspace
Motivating scripts
Commenting your code
Exercises
Data types
Data frames
Examining an object
The accessor: $
Vectors: numerics, characters, and logical
Factors
Lists
Matrices
Exercises
Vectors
Creating vectors
Names
Sequences
Subsetting
Coercion
Not availables (NA)
Exercises
Sorting
sort
order
max and which.max
rank
Beware of recycling
Exercise
Vector arithmetics
Rescaling a vector
Two vectors
Exercises
Indexing
Subsetting with logicals
Logical operators
which
match
%in%
Exercises
Basic plots
plot
hist
boxplot
image
Exercises
4. Programming basics
Conditional expressions
Defining functions
Namespaces
For-loops
Vectorization and functionals
Exercises
5. The tidyverse
Tidy data
Exercises
Manipulating data frames
Adding a column with mutate
Subsetting with filter
Selecting columns with select
Exercises
The pipe: %>%
Exercises
Summarizing data
summarize
pull
Group then summarize with group_by
Sorting data frames
Nested sorting
The top n
Exercises
Tibbles
Tibbles display better
Subsets of tibbles are tibbles
Tibbles can have complex entries
Tibbles can be grouped
Create a tibble using tibble instead of data.frame
The dot operator
do
The purrr package
Tidyverse conditionals
case_when
between
Exercises
6. Importing data
Paths and the working directory
The filesystem
Relative and full paths
The working directory
Generating path names
Copying files using paths
The readr and readxl packages
readr
readxl
Exercises
Downloading files
R-base importing functions
scan
Text versus binary files
Unicode versus ASCII
Organizing Data with Spreadsheets
Exercises
II Data Visualization
7. Introduction to data visualization
8. ggplot2
The components of a graph
ggplot objects
Geometries
Aesthetic mappings
Layers
Tinkering with arguments
Global versus local aesthetic mappings
Scales
Labels and titles
Categories as colors
Annotation, shapes, and adjustments
Add-on packages
Putting it all together
Quick plots with qplot
Grids of plots
Exercises
9. Visualizing data distributions
Variable types
Case study: describing student heights
Distribution function
Cumulative distribution functions
Histograms
Smoothed density
Interpreting the y-axis
Densities permit stratification
Exercises
The normal distribution
Standard units
Quantile-quantile plots
Percentiles
Boxplots
Stratification
Case study: describing student heights (continued)
Exercises
ggplot2 geometries
Barplots
Histograms
Density plots
Boxplots
QQ-plots
Images
Quick plots
Exercises
10. Data visualization in practice
Case study: new insights on poverty
Hans Rosling's quiz
Scatterplots
Faceting
facet_wrap
Fixed scales for better comparisons
Time series plots
Labels instead of legends
Data transformations
Log transformation
Which base?
Transform the values or the scale?
Visualizing multimodal distributions
Comparing multiple distributions with boxplots and ridge plots
Boxplots
Ridge plots
Example: 1970 versus 2010 income distributions
Accessing computed variables
Weighted densities
The ecological fallacy and importance of showing the data
Logistic transformation
Show the data
11. Data visualization principles
Encoding data using visual cues
Know when to include 0
Do not distort quantities
Order categories by a meaningful value
Show the data
Ease comparisons
Use common axes
Align plots vertically to see horizontal changes and horizontally to see vertical changes
Consider transformations
Visual cues to be compared should be adjacent
Use color
Think of the color blind
Plots for two variables
Slope charts
Bland-Altman plot
Encoding a third variable
Avoid pseudo-three-dimensional plots
Avoid too many significant digits
Know your audience
Exercises
Case study: impact of vaccines on battling infectious diseases
Exercises
12. Robust summaries
Outliers
Median
The inter quartile range (IQR)
Tukey's definition of an outlier
Median absolute deviation
Exercises
Case study: self-reported student heights
III Statistics with R
13. Introduction to Statistics with R
14. Probability
Discrete probability
Relative frequency
Notation
Probability distributio