Course curriculum

Module 1: Getting Started with Kafka

    1. Overview of Kafka

    2. Managing Topics using Kafka CLI

    3. Produce and Consume Messages using CLI (sketched below)

    4. Validate Generation of Web Server Logs

    5. Create Web Server using nc

    6. Produce Retail Logs to Kafka Topic

    7. Consume Retail Logs from Kafka Topic

    8. Clean Up Kafka CLI Sessions Used to Produce and Consume Messages
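
The CLI lessons in this module use Kafka's console tools (kafka-topics, kafka-console-producer, and kafka-console-consumer). As a rough programmatic equivalent of the produce/consume flow, here is a minimal sketch using the kafka-python client; the broker address and the retail_logs topic name are assumptions, and kafka-python itself is not part of the course tooling.

```python
# Minimal produce/consume sketch using kafka-python (pip install kafka-python).
# The broker address and topic name ("retail_logs") are assumptions; the
# lessons themselves use kafka-console-producer / kafka-console-consumer.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("retail_logs", b"sample web server log line")
producer.flush()  # ensure the message is actually sent before reading it back

consumer = KafkaConsumer(
    "retail_logs",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",   # read from the beginning of the topic
    consumer_timeout_ms=5000,       # stop iterating when no new messages arrive
)
for message in consumer:
    print(message.value.decode("utf-8"))
```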

Module 2: Data Ingestion using Kafka Connect

    1. Overview of Kafka Connect

    2. Define Kafka Connect to Produce Messages

    3. Validate Kafka Connect to Produce Messages

    4. Clean Up Kafka Connect to Produce Messages

    5. Write Data to HDFS using Kafka Connect

    6. Set Up HDFS 3 Sink Connector Plugin

    7. Overview of Kafka Consumer Groups

    8. Configure HDFS 3 Sink Properties (sketched below)

    9. Run and Validate HDFS 3 Sink

    10. Clean Up Kafka Connect to Consume Messages

    11. Configure HDFS 3 Sink Properties for String Format

    12. Run and Validate HDFS 3 Sink using String Format

    13. Clean Up Kafka Connect to Consume Messages
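
The HDFS 3 Sink lessons center on a connector configuration. A minimal sketch of registering such a connector through the Kafka Connect REST API (assumed to listen on localhost:8083) might look like the following; the property names follow Confluent's HDFS 3 Sink Connector, while the connector name, topic, URLs, and flush size are placeholder assumptions.

```python
# Sketch: registering an HDFS 3 Sink connector via the Kafka Connect REST API.
# The REST endpoint, connector name, topic, and HDFS URL are assumptions.
import json
import requests

connector = {
    "name": "retail-logs-hdfs3-sink",          # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.hdfs3.Hdfs3SinkConnector",
        "tasks.max": "1",
        "topics": "retail_logs",               # assumed topic name
        "hdfs.url": "hdfs://localhost:8020",   # assumed NameNode address
        "flush.size": "1000",
        # For the "String Format" lessons, a string format class is set here,
        # e.g.: "format.class": "io.confluent.connect.hdfs3.format.string.StringFormat",
    },
}

resp = requests.post(
    "http://localhost:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
print(resp.json())

# Cleanup (lessons 4, 10, and 13) is a DELETE against the same endpoint:
# requests.delete("http://localhost:8083/connectors/retail-logs-hdfs3-sink")
```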

Module 3: Overview of Spark Structured Streaming

    1. Understanding Streaming Context

    2. Validate Log Data for Streaming

    3. Push Log Messages to Netcat Web Server

    4. Overview of Built-in Input Sources

    5. Reading Web Server Logs using Spark Structured Streaming (sketched below)

    6. Overview of Output Modes

    7. Using append as Output Mode

    8. Using complete as Output Mode

    9. Using update as Output Mode

    10. Overview of Triggers in Spark Structured Streaming

    11. Overview of Built-in Output Sinks

    12. Previewing the Streaming Data
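
Lessons 5 through 11 come together in a few lines of PySpark: read from the Netcat web server via the socket source, pick an output mode, and attach a trigger. A minimal sketch, assuming the server runs on localhost:9999:

```python
# Sketch: reading lines from a Netcat server (started with `nc -lk 9999`)
# using the socket source and previewing them on the console sink.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SocketStreamDemo").getOrCreate()

lines = (
    spark.readStream
    .format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load()
)

query = (
    lines.writeStream
    .format("console")
    .outputMode("append")                  # the other modes: "complete", "update"
    .trigger(processingTime="10 seconds")  # run a micro-batch every 10 seconds
    .start()
)
query.awaitTermination()
```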

Module 4: Kafka and Spark Structured Streaming Integration

    1. Create Kafka Topic

    2. Read Data from Kafka Topic (sketched below)

    3. Preview Data using the console Sink

    4. Preview Data using the memory Sink

    5. Transform Data using Spark APIs

    6. Write Data to HDFS using Spark

    7. Validate Data in HDFS using Spark

    8. Write Data to HDFS using Spark with a Header

    9. Clean Up Kafka Connect and Files in HDFS
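
A minimal sketch of this module's read-transform-write pipeline, assuming the retail_logs topic from earlier modules and placeholder HDFS paths; the spark-sql-kafka-0-10 package must be on the Spark classpath for the kafka source to resolve:

```python
# Sketch: Kafka topic -> Spark Structured Streaming -> CSV files in HDFS.
# Topic name and HDFS paths are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("KafkaToHDFS").getOrCreate()

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "retail_logs")
    .load()
)

# Kafka delivers key and value as binary; cast the value to a readable string.
logs = raw.select(col("value").cast("string").alias("log"))

query = (
    logs.writeStream
    .format("csv")
    .option("header", "true")  # lesson 8: write the files with a header row
    .option("path", "hdfs://localhost:8020/user/demo/retail_logs")
    .option("checkpointLocation", "hdfs://localhost:8020/user/demo/checkpoints/retail_logs")
    .start()
)
query.awaitTermination()
```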

Module 5: Incremental Loads using Spark Structured Streaming

    1. Overview of Spark Structured Streaming Triggers

    2. Steps for Incremental Data Processing

    3. Create Working Directory in HDFS

    4. Logic to Upload GHArchive Files

    5. Add New GHActivity JSON Files

    6. Read JSON Data using Spark Structured Streaming

    7. Write in Parquet File Format

    8. Analyze GHArchive Data in Parquet Files using Spark

    9. Add New GHActivity JSON Files

    10. Load Data Incrementally to Target Table

    11. Validate Incremental Load

    12. Add New GHActivity JSON Files

    13. Validate Incremental Load

    14. Using maxFilesPerTrigger and latestFirst (sketched below)

    15. Add New GHActivity JSON Files

    16. Incremental Load using Archival Process

    17. Validate Incremental Load
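
A minimal sketch of the incremental pattern this module builds up, including the maxFilesPerTrigger and latestFirst options from lesson 14. File-based streaming sources require an explicit schema; the fields below are a small assumed subset of the GHArchive event schema, and all paths are placeholders:

```python
# Sketch: incrementally processing GHActivity JSON files as they land in HDFS.
# Schema fields and HDFS paths are assumptions for illustration.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("GHActivityIncremental").getOrCreate()

schema = StructType([
    StructField("id", StringType()),
    StructField("type", StringType()),
    StructField("created_at", StringType()),
])

events = (
    spark.readStream
    .format("json")
    .schema(schema)
    .option("maxFilesPerTrigger", 2)  # cap files picked up per micro-batch
    .option("latestFirst", "true")    # process the newest files first
    .load("hdfs://localhost:8020/user/demo/ghactivity/landing")
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs://localhost:8020/user/demo/ghactivity/target")
    .option("checkpointLocation", "hdfs://localhost:8020/user/demo/checkpoints/ghactivity")
    .trigger(processingTime="30 seconds")
    .start()
)
query.awaitTermination()
```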

About this course

  • Free
  • 59 lessons
  • 5 hours of video content
