BIG DATA WITH SPARK AND SCALA
5 out of 5 stars
5 star 9
4 star 0
3 star 0
2 star 0
1 star 0

*Lifetime Access.
*Course completion certificate, Certification documents and materials, interview questions and job assistance included.

Duration of Course:

40+ hours

Topics Covered:

Introduction to Scala for Apache Spark

Topics:

  • What is Scala?
  • Why Scala for Spark?
  • Scala in other frameworks
  • Introduction to Scala REPL
  • Basic Scala operations
  • Variable Types in Scala
  • Control Structures in Scala
  • Foreach loop, Functions and Procedures
  • Collections in Scala- Array
  • ArrayBuffer, Map, Tuples, Lists, and more


OOP and Functional Programming in Scala

Topics:

  • Class in Scala
  • Getters and Setters
  • Custom Getters and Setters
  • Properties with only Getters
  • Auxiliary Constructor and Primary Constructor
  • Singletons
  • Extending a Class
  • Overriding Methods
  • Traits as Interfaces and Layered Traits
  • Functional Programming
  • Higher Order Functions
  • Anonymous Functions, and more


Introduction to Big Data and Hadoop

Topics:

  • What is Big Data?
  • Big Data Customer Scenarios
  • Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case
  • How Hadoop Solves the Big Data Problem
  • What is Hadoop?
  • Hadoop’s Key Characteristics
  • Hadoop Ecosystem and HDFS
  • Hadoop Core Components
  • Rack Awareness and Block Replication
  • HDFS Read/Write Mechanism
  • YARN and Its Advantage
  • Hadoop Cluster and Its Architecture
  • Hadoop: Different Cluster Modes
  • Data Loading using Sqoop

Apache Spark Framework

Topics:

  • Big Data Analytics with Batch & Real-Time Processing
  • Why is Spark Needed?
  • What is Spark?
  • How Does Spark Differ from Its Competitors?
  • Spark at eBay
  • Spark’s Place in Hadoop Ecosystem
  • Spark Components & Its Architecture
  • Running Programs on Scala IDE & Spark Shell
  • Spark Web UI
  • Configuring Spark Properties

Playing with RDDs

Topics:

  • Challenges in Existing Computing Methods
  • Probable Solution & How RDD Solves the Problem
  • What is an RDD? Its Functions, Transformations & Actions
  • Data Loading and Saving Through RDDs
  • Key-Value Pair RDDs and Other Pair RDDs
  • RDD Lineage
  • RDD Persistence
  • WordCount Program Using RDD Concepts
  • RDD Partitioning & How It Helps Achieve Parallelization


DataFrames and Spark SQL

  • Need for Spark SQL
  • What is Spark SQL?
  • Spark SQL Architecture
  • SQL Context in Spark SQL
  • Data Frames & Datasets
  • Interoperating with RDDs
  • JSON and Parquet File Formats
  • Loading Data through Different Sources

Machine Learning using Spark MLlib

  • What is Machine Learning?
  • Where is Machine Learning Used?
  • Different Types of Machine Learning Techniques
  • Face Detection: USE CASE
  • Understanding MLlib
  • Features of Spark MLlib and MLlib Tools
  • Various ML algorithms supported by Spark MLlib
  • K-Means Clustering & How It Works with MLlib
  • Analysis on US Election Data: K-Means Spark MLlib USE CASE

Understanding Apache Kafka and Kafka Cluster

  • Need for Kafka
  • What is Kafka?
  • Core Concepts of Kafka
  • Kafka Architecture
  • Where is Kafka Used?
  • Understanding the Components of Kafka Cluster
  • Configuring Kafka Cluster
  • Producer and Consumer


Capturing Data with Apache Flume and Integration with Kafka

  • Need for Apache Flume
  • What is Apache Flume?
  • Basic Flume Architecture
  • Flume Sources
  • Flume Sinks
  • Flume Channels
  • Flume Configuration
  • Integrating Apache Flume and Apache Kafka

Apache Spark Streaming

  • Drawbacks in Existing Computing Methods
  • Why is Streaming Necessary?
  • What is Spark Streaming?
  • Spark Streaming Features
  • Spark Streaming Workflow
  • How Uber Uses Streaming Data
  • Streaming Context & DStreams
  • Transformations on DStreams
  • WordCount Program using Spark Streaming
  • Windowed Operators and Why They Are Useful
  • Important Windowed Operators
  • Slice, Window and ReduceByWindow Operators
  • Stateful Operators
  • Perform Twitter Sentiment Analysis Using Spark Streaming

Student Reviews


Top Student Reviews

  1. By Gaushik on March 11, 2019
    Great Course content with amazing and challenging assignments!
  2. By Abdus-Shaheed on March 11, 2019
    Great course over Spark. It shows the syntax, but more than this, it shows the problems, the caveats, the optimization and the architecture from a wide point of view.

    The assignments were designed to focus right on the point: they managed all the configuration and initialization code of the project, leaving the student to fill in only the most important part, with the result of having, at the end of
  3. By jay on March 11, 2019
    Wonderful course. Helped me a lot.

  4. By Bhadrak on March 11, 2019
    Very good and knowledgeable course. It basically explains all about big data with Scala and Spark.
  5. By Eugene on March 11, 2019
    Amazing course, I learned a lot even if I'm working with Scala and Spark
  6. By Kenneth on March 11, 2019
    Great course about Big Data analysis

    It was my first exposure to Big Data frameworks and I learned a lot about the problems trying to be solved and the power of Spark.
  7. By Siddharth on August 13, 2018
    Thanks Knowasap for the clear and best course available!!!
  8. By Cynthia on January 17, 2019
    All things mentioned are covered in the best way possible. :)
  9. By Daiva on March 11, 2019
    A lot of things to learn and experiment with.