Librería Portfolio


PYTHON FOR DATA SCIENCE FOR DUMMIES

Author: HUSSAIN, Z
Publisher: JOHN WILEY
Year of publication: 2015
Subject: UNIX
ISBN: 978-1-118-84418-2
Pages: 432
Price: 27,50 €

 

Synopsis

Unleash the power of Python for your data analysis projects with For Dummies!

Python is the preferred programming language for data scientists and combines the best features of Matlab, Mathematica, and R into libraries specific to data analysis and visualization. Python for Data Science For Dummies shows you how to take advantage of Python programming to acquire, organize, process, and analyze large amounts of information and use basic statistics concepts to identify trends and patterns. You'll get familiar with the Python development environment, manipulate data, design compelling visualizations, and solve scientific computing challenges as you work your way through this user-friendly guide.

Covers the fundamentals of Python data analysis programming and statistics to help you build a solid foundation in data science concepts like probability, random distributions, hypothesis testing, and regression models
Explains objects, functions, modules, and libraries and their role in data analysis
Walks you through some of the most widely used libraries, including NumPy, SciPy, Beautiful Soup, pandas, and Matplotlib
Whether you're new to data analysis or just new to Python, Python for Data Science For Dummies is your practical guide to getting a grip on data overload and doing interesting things with the oodles of information you uncover.



Table of Contents

Introduction 1

About This Book 1

Foolish Assumptions 2

Icons Used in This Book 3

Beyond the Book 4

Where to Go from Here 5

Part I: Getting Started with Python for Data Science 7

Chapter 1: Discovering the Match between Data Science and Python 9

Defining the Sexiest Job of the 21st Century 11

Considering the emergence of data science 11

Outlining the core competencies of a data scientist 12

Linking data science and big data 13

Understanding the role of programming 13

Creating the Data Science Pipeline 14

Preparing the data 14

Performing exploratory data analysis 15

Learning from data 15

Visualizing 15

Obtaining insights and data products 15

Understanding Python's Role in Data Science 16

Considering the shifting profile of data scientists 16

Working with a multipurpose, simple, and efficient language 17

Learning to Use Python Fast 18

Loading data 18

Training a model 18

Viewing a result 20

Chapter 2: Introducing Python's Capabilities and Wonders 21

Why Python? 22

Grasping Python's core philosophy 23

Discovering present and future development goals 23

Working with Python 24

Getting a taste of the language 24

Understanding the need for indentation 25

Working at the command line or in the IDE 25

Performing Rapid Prototyping and Experimentation 29

Considering Speed of Execution 30

Visualizing Power 32

Using the Python Ecosystem for Data Science 33

Accessing scientific tools using SciPy 33

Performing fundamental scientific computing using NumPy 34

Performing data analysis using pandas 34

Implementing machine learning using Scikit-learn 35

Plotting the data using matplotlib 35

Parsing HTML documents using Beautiful Soup 35

Chapter 3: Setting Up Python for Data Science 37

Considering the Off-the-Shelf Cross-Platform Scientific Distributions 38

Getting Continuum Analytics Anaconda 39

Getting Enthought Canopy Express 40

Getting pythonxy 40

Getting WinPython 41

Installing Anaconda on Windows 41

Installing Anaconda on Linux 45

Installing Anaconda on Mac OS X 46

Downloading the Datasets and Example Code 47

Using IPython Notebook 47

Defining the code repository 48

Understanding the datasets used in this book 54

Chapter 4: Reviewing Basic Python 57

Working with Numbers and Logic 59

Performing variable assignments 60

Doing arithmetic 61

Comparing data using Boolean expressions 62

Creating and Using Strings 65

Interacting with Dates 66

Creating and Using Functions 68

Creating reusable functions 68

Calling functions in a variety of ways 70

Using Conditional and Loop Statements 73

Making decisions using the if statement 73

Choosing between multiple options using nested decisions 74

Performing repetitive tasks using for 75

Using the while statement 76

Storing Data Using Sets, Lists, and Tuples 77

Performing operations on sets 77

Working with lists 78

Creating and using Tuples 80

Defining Useful Iterators 81

Indexing Data Using Dictionaries 82

Part II: Getting Your Hands Dirty with Data 83

Chapter 5: Working with Real Data 85

Uploading, Streaming, and Sampling Data 86

Uploading small amounts of data into memory 87

Streaming large amounts of data into memory 88

Sampling data 89

Accessing Data in Structured Flat-File Form 90

Reading from a text file 91

Reading CSV delimited format 92

Reading Excel and other Microsoft Office files 94

Sending Data in Unstructured File Form 95

Managing Data from Relational Databases 98

Interacting with Data from NoSQL Databases 100

Accessing Data from the Web 101

Chapter 6: Conditioning Your Data 105

Juggling between NumPy and pandas 106

Knowing when to use NumPy 106

Knowing when to use pandas 106

Validating Your Data 107

Figuring out what's in your data 108

Removing duplicates 109

Creating a data map and data plan 110

Manipulating Categorical Variables 112

Creating categorical variables 113

Renaming levels 114

Combining levels 115

Dealing with Dates in Your Data 116

Formatting date and time values 117

Using the right time transformation 117

Dealing with Missing Data 118

Finding the missing data 119

Encoding missingness 119

Imputing missing data 120

Slicing and Dicing: Filtering and Selecting Data 122

Slicing rows 122

Slicing columns 123

Dicing 123

Concatenating and Transforming 124

Adding new cases and variables 125

Removing data 126

Sorting and shuffling 127

Aggregating Data at Any Level 128

Chapter 7: Shaping Data 131

Working with HTML Pages 132

Parsing XML and HTML 132

Using XPath for data extraction 133

Working with Raw Text 134

Dealing with Unicode 134

Stemming and removing stop words 136

Introducing regular expressions 137

Using the Bag of Words Model and Beyond 140

Understanding the bag of words model 141

Working with n-grams 142

Implementing TF-IDF transformations 144

Working with Graph Data 145

Understanding the adjacency matrix 146

Using NetworkX basics 146

Chapter 8: Putting What You Know in Action 149

Contextualizing Problems and Data 150

Evaluating a data science problem 151

Researching solutions 151

Formulating a hypothesis 152

Preparing your data 153

Considering the Art of Feature Creation 153

Defining feature creation 153

Combining variables 154

Understanding binning and discretization 155

Using indicator variables 155

Transforming distributions 1