Intel Xeon Phi Processor High Performance Programming is an all-in-one source of information for programming the Second-Generation Intel Xeon Phi product family, also called Knights Landing. The authors provide detailed and timely Knights Landing-specific information, programming advice, and real-world examples. The authors distill their years of Xeon Phi programming experience coupled with insights from many expert customers - Intel Field Engineers, Application Engineers, and Technical Consulting Engineers - to create this authoritative book on the essentials of programming for Intel Xeon Phi products.
Intel® Xeon Phi™ Processor High-Performance Programming is useful even before you ever program a system with an Intel Xeon Phi processor. To help ensure that your applications run at maximum efficiency, the authors emphasize key techniques for programming any modern parallel computing system whether based on Intel Xeon processors, Intel Xeon Phi processors, or other high-performance microprocessors. Applying these techniques will generally increase your program performance on any system and prepare you better for Intel Xeon Phi processors.
Key Features
A practical guide to the essentials for programming Intel Xeon Phi processors
Definitive coverage of the Knights Landing architecture
Presents best practices for portable, high-performance computing and a familiar and proven threads and vectors programming model
Includes real-world code examples that highlight uses of the unique aspects of this new highly parallel and high-performance computational product
Covers use of MCDRAM, AVX-512, Intel® Omni-Path fabric, many cores (up to 72), and many threads (4 per core)
Covers software developer tools, libraries and programming models
Covers using Knights Landing as a processor and a coprocessor
Readership
Software engineers, high-performance and supercomputing developers, and scientific researchers in need of high-performance computing resources
Table of Contents
Foreword
Extending the Sports Car Analogy to Higher Performance
What Exactly Is The Unfair Advantage?
Peak Performance Versus Drivable/Usable Performance
How Does The Unfair Advantage Relate to This Book?
Closing Comments
Preface
Sports Car Tutorial: Introduction for Many-Core Is Online
Parallelism Pearls: Inspired by Many Cores
Organization
Structured Parallel Programming
What's New?
lotsofcores.com
Section I: Knights Landing
Introduction
Chapter 1: Introduction
Abstract
Introduction to Many-Core Programming
Trend: More Parallelism
Why Intel® Xeon Phi™ Processors Are Needed
Processors Versus Coprocessor
Measuring Readiness for Highly Parallel Execution
What About GPUs?
Enjoy the Lack of Porting Needed but Still Tune!
Transformation for Performance
Hyper-Threading Versus Multithreading
Programming Models
Why We Could Skip To Section II Now
For More Information
Chapter 2: Knights Landing overview
Abstract
Overview
Instruction Set
Architecture Overview
Motivation: Our Vision and Purpose
Summary
For More Information
Chapter 3: Programming MCDRAM and Cluster modes
Abstract
Programming for Cluster Modes
Programming for Memory Modes
Query Memory Mode and MCDRAM Available
SNC Performance Implications of Allocation and Threading
How to Not Hard Code the NUMA Node Numbers
Approaches to Determining What to Put in MCDRAM
Why Rebooting Is Required to Change Modes
BIOS
Summary
For More Information
Chapter 4: Knights Landing architecture
Abstract
Tile Architecture
Cluster Modes
Memory Interleaving
Memory Modes
Interactions of Cluster and Memory Modes
Summary
For More Information
Chapter 5: Intel Omni-Path Fabric
Abstract
Overview
Performance and Scalability
Transport Layer APIs
Quality of Service
Virtual Fabrics
Unicast Address Resolution
Multicast Address Resolution
Summary
For More Information
Chapter 6: µarch optimization advice
Abstract
Best Performance From 1, 2, or 4 Threads Per Core, Rarely 3
Memory Subsystem
µarch Nuances (tile)
Direct Mapped MCDRAM Cache
Advice: Use AVX-512
Summary
For More Information
Section II: Parallel Programming
Introduction
Chapter 7: Programming overview for Knights Landing
Abstract
To Refactor, or Not to Refactor, That Is the Question
Evolutionary Optimization of Applications
Revolutionary Optimization of Applications
Know When to Hold'em and When to Fold'em
For More Information
Chapter 8: Tasks and threads
Abstract
OpenMP
Fortran 2008
Intel TBB
hStreams
Summary
For More Information
Chapter 9: Vectorization
Abstract
Why Vectorize?
How to Vectorize
Three Approaches to Achieving Vectorization
Six-Step Vectorization Methodology
Streaming Through Caches: Data Layout, Alignment, Prefetching, and so on
Compiler Tips
Compiler Options
Compiler Directives
Use Array Sections to Encourage Vectorization
Look at What the Compiler Created: Assembly Code Inspection
Numerical Result Variations With Vectorization
Summary
For More Information
Chapter 10: Vectorization advisor
Abstract
Getting Started With Intel Advisor for Knights Landing
Enabling and Improving AVX-512 Code With the Survey Report
Memory Access Pattern Report
AVX-512 Gather/Scatter Profiler
Mask Utilization and FLOPs Profiler
Advisor Roofline Report
Explore AVX-512 Code Characteristics Without AVX-512 Hardware
Example - Analysis of a Computational Chemistry Code
Summary
For More Information
Chapter 11: Vectorization with SDLT
Abstract
What Is SDLT?
Getting Started
SDLT Basics
Example Normalizing 3d Points With SIMD
What Is Wrong With AOS Memory Layout and SIMD?
SIMD Prefers Unit-Stride Memory Accesses
Alpha-Blended Overlay Reference
Alpha-Blended Overlay With SDLT
Additional Features
Summary
For More Information
Chapter 12: Vectorization with AVX-512 intrinsics
Abstract
What Are Intrinsics?
AVX-512 Overview
Migrating From Knights Corner
AVX-512 Detection
Learning AVX-512 Instructions
Learning AVX-512 Intrinsics
Step-by-Step Example Using AVX-512 Intrinsics
Results Using Our Intrinsics Code
For More Information
Chapter 13: Performance libraries
Abstract
Intel Performance Library Overview
Intel Math Kernel Library Overview
Intel Data Analytics Library Overview
Together: MKL and DAAL
Intel Integrated Performance Primitives Library Overview
Intel Performance Libraries and Intel Compilers
Native (Direct) Library Usage
Offloading to Knights Landing While Using a Library
Precision Choices and Variations
Performance Tip for Faster Dynamic Libraries
For More Information
Chapter 14: Profiling and timing
Abstract
Introduction to Knights Landing Tuning
Event-Monitoring Registers
Efficiency Metrics
Potential Performance Issues
Intel VTune Amplifier XE Product
Performance Application Programming Interface
MPI Analysis: ITAC
HPCToolkit
Tuning and Analysis Utilities
Timing
Summary
For More Information
Chapter 15: MPI
Abstract
Internode