Librería Portfolio

INTRODUCTION TO DATA SCIENCE: DATA ANALYSIS AND PREDICTION ALGORITHMS WITH R
Title:
INTRODUCTION TO DATA SCIENCE: DATA ANALYSIS AND PREDICTION ALGORITHMS WITH R
Author:
IRIZARRY, R
Publisher:
CRC
Year of publication:
2019
Subject:
DATABASES - OTHER TOPICS
ISBN:
978-0-367-35798-6
Pages:
713
98,75 €

 

Synopsis

Introduction to Data Science: Data Analysis and Prediction Algorithms with R introduces concepts and skills that can help you tackle real-world data analysis challenges. It covers concepts from probability, statistical inference, linear regression, and machine learning. It also helps you develop skills such as R programming, data wrangling, data visualization, predictive algorithm building, file organization with UNIX/Linux shell, version control with Git and GitHub, and reproducible document preparation.

This book is a textbook for a first course in data science. No previous knowledge of R is necessary, although some experience with programming may be helpful. The book is divided into six parts: R, data visualization, statistics with R, data wrangling, machine learning, and productivity tools. Each part has several chapters, each meant to be presented as one lecture.

The author uses motivating case studies that realistically mimic a data scientist's experience. He starts by asking specific questions and answers them through data analysis, so concepts are learned as a means of answering the questions. The case studies include US murder rates by state, self-reported student heights, trends in world health and economics, the impact of vaccines on infectious disease rates, the financial crisis of 2007-2008, election forecasting, building a baseball team, image processing of handwritten digits, and movie recommendation systems.

The statistical concepts used to answer the case study questions are only briefly introduced, so complementing the book with a probability and statistics textbook is highly recommended for an in-depth understanding of these concepts. If you read and understand the chapters and complete the exercises, you will be prepared to learn the more advanced concepts and skills needed to become an expert.



Table of Contents
I R

1. Installing R and RStudio

Installing R

Installing RStudio

2. Getting Started with R and RStudio

Why R?

The R console

Scripts

RStudio

The panes

Key bindings

Running commands while editing scripts

Changing global options

Installing R packages

3. R Basics

Case study: US Gun Murders

The very basics

Objects

The workspace

Functions

Other prebuilt objects

Variable names

Saving your workspace

Motivating scripts

Commenting your code

Exercises

Data types

Data frames

Examining an object

The accessor: $

Vectors: numerics, characters, and logical

Factors

Lists

Matrices

Exercises

Vectors

Creating vectors

Names

Sequences

Subsetting

Coercion

Not availables (NA)

Exercises

Sorting

sort

order

max and which.max

rank

Beware of recycling

Exercise

Vector arithmetics

Rescaling a vector

Two vectors

Exercises

Indexing

Subsetting with logicals

Logical operators

which

match

%in%

Exercises

Basic plots

plot

hist

boxplot

image

Exercises

4. Programming basics

Conditional expressions

Defining functions

Namespaces

For-loops

Vectorization and functionals

Exercises

5. The tidyverse

Tidy data

Exercises

Manipulating data frames

Adding a column with mutate

Subsetting with filter

Selecting columns with select

Exercises

The pipe: %>%

Exercises

Summarizing data

summarize

pull

Group then summarize with group_by

Sorting data frames

Nested sorting

The top n

Exercises

Tibbles

Tibbles display better

Subsets of tibbles are tibbles

Tibbles can have complex entries

Tibbles can be grouped

Create a tibble using tibble instead of data.frame

The dot operator

do

The purrr package

Tidyverse conditionals

case_when

between

Exercises

6. Importing data

Paths and the working directory

The filesystem

Relative and full paths

The working directory

Generating path names

Copying files using paths

The readr and readxl packages

readr

readxl

Exercises

Downloading files

R-base importing functions

scan

Text versus binary files

Unicode versus ASCII

Organizing Data with Spreadsheets

Exercises

II Data Visualization

7. Introduction to data visualization

8. ggplot2

The components of a graph

ggplot objects

Geometries

Aesthetic mappings

Layers

Tinkering with arguments

Global versus local aesthetic mappings

Scales

Labels and titles

Categories as colors

Annotation, shapes, and adjustments

Add-on packages

Putting it all together

Quick plots with qplot

Grids of plots

Exercises

9. Visualizing data distributions

Variable types

Case study: describing student heights

Distribution function

Cumulative distribution functions

Histograms

Smoothed density

Interpreting the y-axis

Densities permit stratification

Exercises

The normal distribution

Standard units

Quantile-quantile plots

Percentiles

Boxplots

Stratification

Case study: describing student heights (continued)

Exercises

ggplot2 geometries

Barplots

Histograms

Density plots

Boxplots

QQ-plots

Images

Quick plots

Exercises

10. Data visualization in practice

Case study: new insights on poverty

Hans Rosling's quiz

Scatterplots

Faceting

facet_wrap

Fixed scales for better comparisons

Time series plots

Labels instead of legends

Data transformations

Log transformation

Which base?

Transform the values or the scale?

Visualizing multimodal distributions

Comparing multiple distributions with boxplots and ridge plots

Boxplots

Ridge plots

Example: 1970 versus 2010 income distributions

Accessing computed variables

Weighted densities

The ecological fallacy and importance of showing the data

Logistic transformation

Show the data

11. Data visualization principles

Encoding data using visual cues

Know when to include 0

Do not distort quantities

Order categories by a meaningful value

Show the data

Ease comparisons

Use common axes

Align plots vertically to see horizontal changes and horizontally to see vertical changes

Consider transformations

Visual cues to be compared should be adjacent

Use color

Think of the color blind

Plots for two variables

Slope charts

Bland-Altman plot

Encoding a third variable

Avoid pseudo-three-dimensional plots

Avoid too many significant digits

Know your audience

Exercises

Case study: impact of vaccines on battling infectious diseases

Exercises

12. Robust summaries

Outliers

Median

The interquartile range (IQR)

Tukey's definition of an outlier

Median absolute deviation

Exercises

Case study: self-reported student heights

III Statistics with R

13. Introduction to Statistics with R

14. Probability

Discrete probability

Relative frequency

Notation

Probability distributio