Machine learning, at its core, is concerned with algorithms that transform raw data into information and then into actionable intelligence. This makes machine learning well suited to predictive analytics on big data. Without machine learning, it would be nearly impossible to keep up with these massive streams of information. Spark, a relatively new and emerging technology, gives big data engineers and data scientists a powerful, unified engine that is both fast and easy to use.
This allows learners from numerous fields to solve their machine learning problems interactively and at a much greater scale.
This book is designed to help data scientists, engineers, and researchers develop and deploy their machine learning applications at scale. Along the way, they learn how to handle large-scale data on clusters in data-intensive environments to build powerful machine learning models. The content follows a bottom-up approach. It starts with Spark and ML basics and moves on to exploring data with feature engineering and building scalable ML pipelines. It then covers tuning those pipelines and adapting them to new data and problem types, and finally takes you from model building to deployment. The chapters are organized so that a new reader with minimal knowledge of machine learning and Spark programming can follow the examples, which demonstrate real-life machine learning problems and their solutions.
Processing the data, tuning and scaling out through enough iterations, implementing the relevant algorithms, and finally deploying the application are crucial steps in optimizing any machine learning application.
Spark can handle both large-scale batch and streaming data and lets you cache intermediate data in memory. For in-memory workloads, it can process data up to 100x faster than Hadoop-based MapReduce and Mahout. This means you can apply analytics to streaming data and develop complete machine learning applications much more quickly, which makes Spark an ideal candidate for large, data-intensive applications.
This book focuses on design, engineering, and scalable solutions using machine learning with Spark. First, you will learn how to install Spark and use the all-new features of the Spark 2.0 release. You will also get to grips with Spark MLlib and Spark ML and see how they implement machine learning algorithms.
Moving on, you will explore important concepts such as Datasets and advanced feature engineering. After learning more about developing and deploying an application, you will also get to know other external libraries available for data analysis.
In a nutshell, you will be able to develop your own machine learning applications and then tune, scale up, and deploy them on large clusters or cloud infrastructure with ease.
What you will learn
Gain a solid theoretical understanding of machine learning algorithms and techniques for new and unknown data types
Set up and configure Spark on a local machine, a local cluster, and cloud infrastructure, and develop your first Spark application using Scala, Java, Python, and SparkR
Scale up your machine learning applications on large clusters or in cloud computing environments such as Amazon EC2
Use Spark ML and MLlib to develop machine learning pipelines for recommendation systems, classification, regression, clustering, sentiment analysis, and dimensionality reduction
Handle large-scale text data when developing ML applications, with a strong focus on feature engineering, including feature extraction and feature selection
Use Spark Streaming to develop large-scale machine learning applications on real-time streaming data
Tune your machine learning models using cross-validation, grid search, hyperparameter tuning, and train-validation splits
Enhance the performance of your machine learning models and make them adaptable to new data types in dynamic, incremental environments