Librería Portfolio

PROGRAMMING PIG 2E. DATAFLOW SCRIPTING WITH HADOOP
Author: GATES, A
Publisher: O'Reilly
Year of publication: 2016
Subject: Databases - Other topics
ISBN: 978-1-4919-3709-9
Pages: 368
Price: 38,50 €

 

Synopsis

For many organizations, Hadoop is the first step for dealing with massive amounts of data. The next step? Processing and analyzing datasets with the Apache Pig scripting platform. With Pig, you can batch-process data without having to create a full-fledged application, making it easy to experiment with new datasets.

Updated with use cases and programming examples, this second edition is the ideal learning tool for new and experienced users alike. You'll find comprehensive coverage of key features such as the Pig Latin scripting language and the Grunt shell. When you need to analyze terabytes of data, this book shows you how to do it efficiently with Pig.

Delve into Pig's data model, including scalar and complex data types
Write Pig Latin scripts to sort, group, join, project, and filter your data
Use Grunt to work with the Hadoop Distributed File System (HDFS)
Build complex data processing pipelines with Pig's macros and modularity features
Embed Pig Latin in Python for iterative processing and other advanced tasks
Use Pig with Apache Tez to build high-performance batch and interactive data processing applications
Create your own load and store functions to handle data formats and storage mechanisms



Chapter 1: What Is Pig?
Pig Latin, a Parallel Data Flow Language
Pig on Hadoop
What Is Pig Useful For?
The Pig Philosophy
Pig's History
Chapter 2: Installing and Running Pig
Downloading and Installing Pig
Running Pig
Grunt
Chapter 3: Pig's Data Model
Types
Schemas
Chapter 4: Introduction to Pig Latin
Preliminary Matters
Input and Output
Relational Operations
User-Defined Functions
Chapter 5: Advanced Pig Latin
Advanced Relational Operations
Integrating Pig with Executables and Native Jobs
split and Nonlinear Data Flows
Controlling Execution
Pig Latin Preprocessor
Chapter 6: Developing and Testing Pig Latin Scripts
Development Tools
Testing Your Scripts with PigUnit
Chapter 7: Making Pig Fly
Writing Your Scripts to Perform Well
Writing Your UDFs to Perform
Tuning Pig and Hadoop for Your Job
Using Compression in Intermediate Results
Data Layout Optimization
Map-Side Aggregation
The JAR Cache
Processing Small Jobs Locally
Bloom Filters
Schema Tuple Optimization
Dealing with Failures
Chapter 8: Embedding Pig
Embedding Pig Latin in Scripting Languages
Using the Pig Java APIs
Chapter 9: Writing Evaluation and Filter Functions
Writing an Evaluation Function in Java
The Algebraic Interface
The Accumulator Interface
Writing Filter Functions
Writing Evaluation Functions in Scripting Languages
Chapter 10: Writing Load and Store Functions
Load Functions
Store Functions
Shipping JARs Automatically
Handling Bad Records
Chapter 11: Pig on Tez
What Is Tez?
Running Pig on Tez
Potential Differences When Running on Tez
Pig on Tez Internals
Chapter 12: Pig and Other Members of the Hadoop Community
Pig and Hive
Cascading
Spark
NoSQL Databases
DataFu
Oozie
Chapter 13: Use Cases and Programming Examples
Sparse Tuples
k-Means
intersect and except
Pig at Yahoo!
Pig at Particle News
Appendix: Built-in User Defined Functions and PiggyBank
Built-in UDFs
PiggyBank