

Essential PySpark for Scalable Data Analytics: A beginner's guide to harnessing the power and ease of PySpark 3 (Paperback)
Key item features
- Essential PySpark for Scalable Data Analytics: A beginner's guide to harnessing the power and ease of PySpark 3 (Paperback)
- Author: Packt Publishing
- ISBN: 9781800568877
- Format: Paperback
- Publication Date: 2021-10-29
- Page Count: 322
Specs
- Manual & guide type: Instruction Manual
- Book format: Paperback
- Edition: 1
- Skill level: Beginner
- Pages: 322
- Language: English
About this item
Product details
Get started with distributed computing using PySpark, a single unified framework to solve end-to-end data analytics at scale
Key Features:
- Discover how to convert huge amounts of raw data into meaningful and actionable insights
- Use Spark's unified analytics engine for end-to-end analytics, from data preparation to predictive analytics
- Perform data ingestion, cleansing, and integration for ML, data analytics, and data visualization
Book Description:
Apache Spark is a unified data analytics engine designed to process huge volumes of data quickly and efficiently. PySpark is Apache Spark's Python language API, which offers Python developers an easy-to-use scalable data analytics framework.
Essential PySpark for Scalable Data Analytics starts by exploring the distributed computing paradigm and provides a high-level overview of Apache Spark. You'll begin your analytics journey with the data engineering process, learning how to perform data ingestion, cleansing, and integration at scale. This book helps you build real-time analytics pipelines that enable you to gain insights much faster. You'll then discover methods for building cloud-based data lakes, and explore Delta Lake, which brings reliability and performance to data lakes. The book also covers Data Lakehouse, an emerging paradigm, which combines the structure and performance of a data warehouse with the scalability of cloud-based data lakes. Later, you'll perform scalable data science and machine learning tasks using PySpark, such as data preparation, feature engineering, and model training and productionization. Finally, you'll learn ways to scale out standard Python ML libraries along with a new pandas API on top of PySpark called Koalas.
By the end of this PySpark book, you'll be able to harness the power of PySpark to solve business problems.
What You Will Learn:
- Understand the role of distributed computing in the world of big data
- Gain an appreciation for Apache Spark as the de facto go-to for big data processing
- Scale out your data analytics process using Apache Spark
- Build data pipelines using data lakes, and perform data visualization with PySpark and Spark SQL
- Leverage the cloud to build truly scalable and real-time data analytics applications
- Explore the applications of data science and scalable machine learning with PySpark
- Integrate your clean and curated data with BI and SQL analysis tools
Who this book is for:
This book is for practicing data engineers, data scientists, data analysts, and data enthusiasts who are already using data analytics to explore distributed and scalable data analytics. Basic to intermediate knowledge of the disciplines of data engineering, data science, and SQL analytics is expected. General proficiency in using any programming language, espe
Similar items you might like
Based on what customers bought
- Mastering Tableau 2019.1 - Second Edition: An expert guide to implementing advanced business intelligence and analytics with Tableau 2019.1 (Paperback) $48.99
- Hands-On Machine Learning on Google Cloud Platform: Implementing smart and efficient analytics using Cloud ML Engine (Paperback) $46.57
- Scalable Data Architecture with Java: Build efficient enterprise-grade data architecting solutions using Java (Paperback) $44.85
- Graph Data Modeling for NoSQL and SQL: Visualize Structure and Meaning (Paperback) $44.06
- Deep Learning for Natural Language Processing (Edition 1) (Paperback) $49.98
- AI and Deep Learning Fundamentals: Step by Step Tutorials (Paperback) $39.65
- Machine Learning for Economics and Finance in Tensorflow 2: Deep Learning Models for Research and Industry (Paperback) $46.84
- Cassandra Data Modeling and Schema Design (Paperback) $39.83
- Text as Data: A New Framework for Machine Learning and the Social Sciences (Paperback) $40.00
- Growing Business Intelligence: An Agile Approach to Leveraging Data and Analytics for Maximum Business Value (Paperback) $36.30
- Addison-Wesley Data & Analytics Visual Analytics Fundamentals: Creating Compelling Data Narratives with Tableau (Paperback) $43.87 (was $49.99)
- Chapman & Hall/CRC Data Mining and Knowl Feature Engineering for Machine Learning and Data Analytics (Paperback) $42.97
- Principles of Data Science: Mathematical techniques and theory to succeed in data-driven industries (Paperback) $41.40
- Culture Analytics: An Evidence-Based Approach to Company Culture (Paperback) $37.13
- Practical Data Science with Python: Learn tools and techniques from hands-on examples to extract insights from data (Paperback) $51.72
- Creo Parametric 7.0: A Power Guide for Beginners and Intermediate Users (Paperback) $43.14
- A Practical Guide to Logical Data Modeling (Paperback) $32.67
- Microsoft Power BI Cookbook: Creating Business Intelligence Solutions of Analytical Data Models, Reports, and Dashboards (Paperback) $61.17
- The Kaggle Workbook: Self-learning exercises and valuable insights for Kaggle data science competitions (Paperback) $25.76

