BIG DATA WITH SPARK AND SCALA
5 out of 5 stars
5 star 1
4 star 0
3 star 0
2 star 0
1 star 0

*Lifetime Access.
*Course completion certificate, Certification documents and materials, interview questions and job assistance included.

Duration of Course:

40+ hours

Topics Covered:

Introduction to Scala for Apache Spark

Topics:

  • What is Scala?
  • Why Scala for Spark?
  • Scala in other frameworks
  • Introduction to Scala REPL
  • Basic Scala operations
  • Variable Types in Scala
  • Control Structures in Scala
  • Foreach loop, Functions and Procedures
  • Collections in Scala: Array, ArrayBuffer, Map, Tuples, Lists, and more


OOP and Functional Programming in Scala

Topics:

  • Class in Scala
  • Getters and Setters
  • Custom Getters and Setters
  • Properties with only Getters
  • Auxiliary Constructor and Primary Constructor
  • Singletons
  • Extending a Class
  • Overriding Methods
  • Traits as Interfaces and Layered Traits
  • Functional Programming
  • Higher Order Functions
  • Anonymous Functions, and more


Introduction to Big Data and Hadoop

Topics:

  • What is Big Data?
  • Big Data Customer Scenarios
  • Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case
  • How Hadoop Solves the Big Data Problem
  • What is Hadoop?
  • Hadoop’s Key Characteristics
  • Hadoop Ecosystem and HDFS
  • Hadoop Core Components
  • Rack Awareness and Block Replication
  • HDFS Read/Write Mechanism
  • YARN and Its Advantage
  • Hadoop Cluster and Its Architecture
  • Hadoop: Different Cluster Modes
  • Data Loading using Sqoop

Apache Spark Framework

Topics:

  • Big Data Analytics with Batch & Real-Time Processing
  • Why is Spark Needed?
  • What is Spark?
  • How Does Spark Differ from Its Competitors?
  • Spark at eBay
  • Spark’s Place in Hadoop Ecosystem
  • Spark Components & Its Architecture
  • Running Programs on Scala IDE & Spark Shell
  • Spark Web UI
  • Configuring Spark Properties

Playing with RDDs

Topics:

  • Challenges in Existing Computing Methods
  • Probable Solution & How RDD Solves the Problem
  • What is an RDD? Its Functions, Transformations & Actions
  • Data Loading and Saving Through RDDs
  • Key-Value Pair RDDs and Other Pair RDDs
  • RDD Lineage
  • RDD Persistence
  • WordCount Program Using RDD Concepts
  • RDD Partitioning & How It Helps Achieve Parallelization


DataFrames and Spark SQL

  • Need for Spark SQL
  • What is Spark SQL?
  • Spark SQL Architecture
  • SQL Context in Spark SQL
  • DataFrames & Datasets
  • Interoperating with RDDs
  • JSON and Parquet File Formats
  • Loading Data through Different Sources

Machine Learning using Spark MLlib

  • What is Machine Learning?
  • Where is Machine Learning Used?
  • Different Types of Machine Learning Techniques
  • Face Detection: USE CASE
  • Understanding MLlib
  • Features of Spark MLlib and MLlib Tools
  • Various ML algorithms supported by Spark MLlib
  • K-Means Clustering & How It Works with MLlib
  • Analysis on US Election Data: K-Means Spark MLlib USE CASE

Understanding Apache Kafka and Kafka Cluster

  • Need for Kafka
  • What is Kafka?
  • Core Concepts of Kafka
  • Kafka Architecture
  • Where is Kafka Used?
  • Understanding the Components of Kafka Cluster
  • Configuring Kafka Cluster
  • Producer and Consumer


Capturing Data with Apache Flume and Integration with Kafka

  • Need for Apache Flume
  • What is Apache Flume?
  • Basic Flume Architecture
  • Flume Sources
  • Flume Sinks
  • Flume Channels
  • Flume Configuration
  • Integrating Apache Flume and Apache Kafka

Apache Spark Streaming

  • Drawbacks in Existing Computing Methods
  • Why is Streaming Necessary?
  • What is Spark Streaming?
  • Spark Streaming Features
  • Spark Streaming Workflow
  • How Uber Uses Streaming Data
  • Streaming Context & DStreams
  • Transformations on DStreams
  • WordCount Program using Spark Streaming
  • Windowed Operators and Why They Are Useful
  • Important Windowed Operators
  • Slice, Window and ReduceByWindow Operators
  • Stateful Operators
  • Twitter Sentiment Analysis Using Spark Streaming

Student Reviews

Top Student Reviews

  1. By Siddharth on August 13, 2018
    Thanks Knowasap for clear and best course available !!!