BIG DATA WITH SPARK AND SCALA
*Course completion certificate, Certification documents and materials, interview questions and job assistance included.
Duration of Course:
Topics Covered:
Introduction to Scala for Apache Spark
- What is Scala?
- Why Scala for Spark?
- Scala in other frameworks
- Introduction to Scala REPL
- Basic Scala operations
- Variable Types in Scala
- Control Structures in Scala
- Foreach loop, Functions and Procedures
- Collections in Scala: Array, ArrayBuffer, Map, Tuples, Lists, and more
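As a taste of this module, a minimal sketch of the collection types listed above in plain Scala (all values are illustrative; this is the kind of session you would run in the Scala REPL):

```scala
import scala.collection.mutable.ArrayBuffer

object CollectionsDemo {
  def main(args: Array[String]): Unit = {
    val arr = Array(1, 2, 3)                    // fixed-size, mutable elements
    val buf = ArrayBuffer("a", "b")             // growable array
    buf += "c"
    val capitals = Map("France" -> "Paris", "Japan" -> "Tokyo")
    val pair = ("spark", 2)                     // a Tuple2
    val nums = List(1, 2, 3, 4)

    nums.foreach(n => println(n * 2))           // foreach loop over a collection
    println(capitals("Japan"))                  // Tokyo
    println(pair._1 + " v" + pair._2)           // spark v2
  }
}
```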
OOP and Functional Programming in Scala
- Class in Scala
- Getters and Setters
- Custom Getters and Setters
- Properties with only Getters
- Auxiliary Constructor and Primary Constructor
- Extending a Class
- Overriding Methods
- Traits as Interfaces and Layered Traits
- Higher Order Functions
- Anonymous Functions, and more
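The topics above fit in one small sketch (class and method names are illustrative, not from the course materials): a primary and auxiliary constructor, custom getter/setter, a trait used as an interface, method overriding, and a higher-order function applied to an anonymous function.

```scala
trait Greeter {                                            // trait as an interface
  def greet(name: String): String
}

class Person(val name: String, private var _age: Int) {    // primary constructor
  def this(name: String) = this(name, 0)                   // auxiliary constructor
  def age: Int = _age                                      // custom getter
  def age_=(a: Int): Unit = { require(a >= 0); _age = a }  // custom setter
}

class Employee(name: String, age: Int)
    extends Person(name, age) with Greeter {               // extending a class
  override def greet(other: String): String = s"Hello, $other"
}

object Demo {
  // higher-order function: takes another function as an argument
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  def main(args: Array[String]): Unit = {
    val e = new Employee("Ada", 36)
    println(e.greet("Bob"))          // Hello, Bob
    println(applyTwice(_ + 3, 10))   // anonymous function: 16
  }
}
```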
Introduction to Big Data and Hadoop
- What is Big Data?
- Big Data Customer Scenarios
- Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case
- How Hadoop Solves the Big Data Problem
- What is Hadoop?
- Hadoop’s Key Characteristics
- Hadoop Ecosystem and HDFS
- Hadoop Core Components
- Rack Awareness and Block Replication
- HDFS Read/Write Mechanism
- YARN and Its Advantage
- Hadoop Cluster and Its Architecture
- Hadoop: Different Cluster Modes
- Data Loading using Sqoop
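The Sqoop topic typically comes down to one import command. A hedged sketch, assuming a MySQL source (host, database, table, and HDFS path are all placeholders; this requires a running Hadoop cluster with Sqoop installed):

```
# Import the "orders" table from MySQL into HDFS
sqoop import \
  --connect jdbc:mysql://dbhost:3306/retail_db \
  --username sqoop_user -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  --num-mappers 4
```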
Apache Spark Framework
- Big Data Analytics with Batch & Real-Time Processing
- Why is Spark Needed?
- What is Spark?
- How Does Spark Differ from Its Competitors?
- Spark at eBay
- Spark’s Place in Hadoop Ecosystem
- Spark Components & Its Architecture
- Running Programs on Scala IDE & Spark Shell
- Spark Web UI
- Configuring Spark Properties
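Configuring Spark properties usually starts with the session builder. A minimal sketch (needs the `spark-sql` dependency and a Spark runtime, so it is not runnable standalone; the property shown is one example of many):

```scala
import org.apache.spark.sql.SparkSession

object SparkSetup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("course-demo")
      .master("local[*]")                          // local mode, e.g. from the Scala IDE
      .config("spark.sql.shuffle.partitions", "8") // a tunable Spark property
      .getOrCreate()

    println(spark.version)
    // While the application runs, the Spark Web UI is served at http://localhost:4040
    spark.stop()
  }
}
```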
Playing with RDDs
- Challenges in Existing Computing Methods
- Probable Solution & How RDD Solves the Problem
- What is an RDD? Its Functions, Transformations & Actions
- Data Loading and Saving Through RDDs
- Key-Value Pair RDDs and Other Pair RDDs
- RDD Lineage
- RDD Persistence
- WordCount Program Using RDD Concepts
- RDD Partitioning & How It Helps Achieve Parallelization
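The WordCount program follows the same shape on plain Scala collections as on RDDs, so here is a runnable plain-Scala version with comments mapping each step to the equivalent RDD call (the RDD version would start from `sc.textFile(...)` on a live SparkContext):

```scala
object WordCount {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))        // RDD: .flatMap(_.split("\\s+"))
      .map(w => (w, 1))                // RDD: .map(w => (w, 1))
      .groupBy(_._1)                   // RDD: .reduceByKey(_ + _) replaces
      .map { case (w, ps) => (w, ps.map(_._2).sum) } // ...these two steps

  def main(args: Array[String]): Unit = {
    val result = count(Seq("to be or not to be"))
    println(result("to"))   // 2
    println(result("be"))   // 2
  }
}
```

On an RDD, `reduceByKey` combines values per key within each partition before shuffling, which is why partitioning matters for parallelization.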
DataFrames and Spark SQL
- Need for Spark SQL
- What is Spark SQL?
- Spark SQL Architecture
- SQL Context in Spark SQL
- DataFrames & Datasets
- Interoperating with RDDs
- JSON and Parquet File Formats
- Loading Data through Different Sources
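A sketch tying the Spark SQL topics together (needs a Spark runtime; file paths and the schema are placeholders, not course data): loading JSON, querying through SQL, converting an RDD to a DataFrame, and saving as Parquet.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-demo").master("local[*]").getOrCreate()
    import spark.implicits._

    val people = spark.read.json("people.json")        // load a JSON source
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 21").show()

    // Interoperating with RDDs: toDF() lifts an RDD of tuples into a DataFrame
    val df = spark.sparkContext
      .parallelize(Seq(("a", 1), ("b", 2)))
      .toDF("key", "value")
    df.write.mode("overwrite").parquet("pairs.parquet") // save as Parquet
    spark.stop()
  }
}
```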
Machine Learning using Spark MLlib
- What is Machine Learning?
- Where is Machine Learning Used?
- Different Types of Machine Learning Techniques
- Face Detection: USE CASE
- Understanding MLlib
- Features of Spark MLlib and MLlib Tools
- Various ML algorithms supported by Spark MLlib
- K-Means Clustering & How It Works with MLlib
- Analysis on US Election Data: K-Means Spark MLlib USE CASE
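A hedged sketch of K-Means with MLlib's DataFrame API (needs `spark-mllib` at runtime; the four toy points below stand in for real feature vectors such as the election data, which is not reproduced here):

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object KMeansDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kmeans").master("local[*]").getOrCreate()

    // Two obvious clusters: points near (0,0) and points near (9,9)
    val data = Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)
    ).map(Tuple1.apply)
    val df = spark.createDataFrame(data).toDF("features")

    val model = new KMeans().setK(2).setSeed(1L).fit(df)
    model.clusterCenters.foreach(println)   // two centers, near (0,0) and (9,9)
    spark.stop()
  }
}
```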
Understanding Apache Kafka and Kafka Cluster
- Need for Kafka
- What is Kafka?
- Core Concepts of Kafka
- Kafka Architecture
- Where is Kafka Used?
- Understanding the Components of Kafka Cluster
- Configuring Kafka Cluster
- Producer and Consumer
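The producer side of the last topic can be sketched in a few lines (needs `kafka-clients` on the classpath and a broker at `localhost:9092`; the topic name is a placeholder). A consumer is symmetric: it subscribes to the topic and polls for records.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ProducerDemo {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // Send one record to the (placeholder) topic "course-events"
    producer.send(new ProducerRecord[String, String]("course-events", "key1", "hello kafka"))
    producer.close()
  }
}
```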
Capturing Data with Apache Flume and Integration with Kafka
- Need for Apache Flume
- What is Apache Flume?
- Basic Flume Architecture
- Flume Sources
- Flume Sinks
- Flume Channels
- Flume Configuration
- Integrating Apache Flume and Apache Kafka
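The source/channel/sink topics above map one-to-one onto a Flume agent configuration file. A hedged sketch wiring a netcat source through a memory channel into a Kafka sink (agent, host, port, and topic names are all placeholders):

```
# agent "a1": netcat source -> memory channel -> Kafka sink
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.topic = flume-events

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```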
Apache Spark Streaming
- Drawbacks in Existing Computing Methods
- Why is Streaming Necessary?
- What is Spark Streaming?
- Spark Streaming Features
- Spark Streaming Workflow
- How Uber Uses Streaming Data
- Streaming Context & DStreams
- Transformations on DStreams
- WordCount Program using Spark Streaming
- Windowed Operators and Why They Are Useful
- Important Windowed Operators
- Slice, Window and ReduceByWindow Operators
- Stateful Operators
- Twitter Sentiment Analysis Using Spark Streaming
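The streaming topics above combine into one short sketch: a windowed WordCount over DStreams (needs `spark-streaming` and a text source on `localhost:9999`, e.g. `nc -lk 9999`; host, port, and checkpoint path are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("streaming-wc")
    val ssc = new StreamingContext(conf, Seconds(1)) // 1-second batch interval
    ssc.checkpoint("checkpoint")                     // required by windowed/stateful ops

    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(w => (w, 1))
      // 30-second window, sliding every 10 seconds
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```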