Unleash the power of Python for your data analysis projects with For Dummies!
Python is the preferred programming language for data scientists and combines the best features of MATLAB, Mathematica, and R into libraries specific to data analysis and visualization. Python for Data Science For Dummies shows you how to take advantage of Python programming to acquire, organize, process, and analyze large amounts of information and use basic statistics concepts to identify trends and patterns. You'll get familiar with the Python development environment, manipulate data, design compelling visualizations, and solve scientific computing challenges as you work your way through this user-friendly guide.
Covers the fundamentals of Python data analysis programming and statistics to help you build a solid foundation in data science concepts like probability, random distributions, hypothesis testing, and regression models
Explains objects, functions, modules, and libraries and their role in data analysis
Walks you through some of the most widely used libraries, including NumPy, SciPy, BeautifulSoup, pandas, and matplotlib
Whether you're new to data analysis or just new to Python, Python for Data Science For Dummies is your practical guide to getting a grip on data overload and doing interesting things with the oodles of information you uncover.
Table of Contents
Introduction 1
About This Book 1
Foolish Assumptions 2
Icons Used in This Book 3
Beyond the Book 4
Where to Go from Here 5
Part I: Getting Started with Python for Data Science 7
Chapter 1: Discovering the Match between Data Science and Python 9
Defining the Sexiest Job of the 21st Century 11
Considering the emergence of data science 11
Outlining the core competencies of a data scientist 12
Linking data science and big data 13
Understanding the role of programming 13
Creating the Data Science Pipeline 14
Preparing the data 14
Performing exploratory data analysis 15
Learning from data 15
Visualizing 15
Obtaining insights and data products 15
Understanding Python's Role in Data Science 16
Considering the shifting profile of data scientists 16
Working with a multipurpose, simple, and efficient language 17
Learning to Use Python Fast 18
Loading data 18
Training a model 18
Viewing a result 20
Chapter 2: Introducing Python's Capabilities and Wonders 21
Why Python? 22
Grasping Python's core philosophy 23
Discovering present and future development goals 23
Working with Python 24
Getting a taste of the language 24
Understanding the need for indentation 25
Working at the command line or in the IDE 25
Performing Rapid Prototyping and Experimentation 29
Considering Speed of Execution 30
Visualizing Power 32
Using the Python Ecosystem for Data Science 33
Accessing scientific tools using SciPy 33
Performing fundamental scientific computing using NumPy 34
Performing data analysis using pandas 34
Implementing machine learning using Scikit-learn 35
Plotting the data using matplotlib 35
Parsing HTML documents using Beautiful Soup 35
Chapter 3: Setting Up Python for Data Science 37
Considering the Off-the-Shelf Cross-Platform Scientific Distributions 38
Getting Continuum Analytics Anaconda 39
Getting Enthought Canopy Express 40
Getting pythonxy 40
Getting WinPython 41
Installing Anaconda on Windows 41
Installing Anaconda on Linux 45
Installing Anaconda on Mac OS X 46
Downloading the Datasets and Example Code 47
Using IPython Notebook 47
Defining the code repository 48
Understanding the datasets used in this book 54
Chapter 4: Reviewing Basic Python 57
Working with Numbers and Logic 59
Performing variable assignments 60
Doing arithmetic 61
Comparing data using Boolean expressions 62
Creating and Using Strings 65
Interacting with Dates 66
Creating and Using Functions 68
Creating reusable functions 68
Calling functions in a variety of ways 70
Using Conditional and Loop Statements 73
Making decisions using the if statement 73
Choosing between multiple options using nested decisions 74
Performing repetitive tasks using for 75
Using the while statement 76
Storing Data Using Sets, Lists, and Tuples 77
Performing operations on sets 77
Working with lists 78
Creating and using Tuples 80
Defining Useful Iterators 81
Indexing Data Using Dictionaries 82
Part II: Getting Your Hands Dirty with Data 83
Chapter 5: Working with Real Data 85
Uploading, Streaming, and Sampling Data 86
Uploading small amounts of data into memory 87
Streaming large amounts of data into memory 88
Sampling data 89
Accessing Data in Structured Flat-File Form 90
Reading from a text file 91
Reading CSV delimited format 92
Reading Excel and other Microsoft Office files 94
Sending Data in Unstructured File Form 95
Managing Data from Relational Databases 98
Interacting with Data from NoSQL Databases 100
Accessing Data from the Web 101
Chapter 6: Conditioning Your Data 105
Juggling between NumPy and pandas 106
Knowing when to use NumPy 106
Knowing when to use pandas 106
Validating Your Data 107
Figuring out what's in your data 108
Removing duplicates 109
Creating a data map and data plan 110
Manipulating Categorical Variables 112
Creating categorical variables 113
Renaming levels 114
Combining levels 115
Dealing with Dates in Your Data 116
Formatting date and time values 117
Using the right time transformation 117
Dealing with Missing Data 118
Finding the missing data 119
Encoding missingness 119
Imputing missing data 120
Slicing and Dicing: Filtering and Selecting Data 122
Slicing rows 122
Slicing columns 123
Dicing 123
Concatenating and Transforming 124
Adding new cases and variables 125
Removing data 126
Sorting and shuffling 127
Aggregating Data at Any Level 128
Chapter 7: Shaping Data 131
Working with HTML Pages 132
Parsing XML and HTML 132
Using XPath for data extraction 133
Working with Raw Text 134
Dealing with Unicode 134
Stemming and removing stop words 136
Introducing regular expressions 137
Using the Bag of Words Model and Beyond 140
Understanding the bag of words model 141
Working with n-grams 142
Implementing TF-IDF transformations 144
Working with Graph Data 145
Understanding the adjacency matrix 146
Using NetworkX basics 146
Chapter 8: Putting What You Know in Action 149
Contextualizing Problems and Data 150
Evaluating a data science problem 151
Researching solutions 151
Formulating a hypothesis 152
Preparing your data 153
Considering the Art of Feature Creation 153
Defining feature creation 153
Combining variables 154
Understanding binning and discretization 155
Using indicator variables 155
Transforming distributions 1