Course curriculum
-
-
Overview of Kafka
-
Managing Topics using Kafka CLI
-
Produce and Consume Messages using CLI
-
Validate Generation of Web Server Logs
-
Create Web Server using nc
-
Produce retail logs to Kafka Topic
-
Consume retail logs from Kafka Topic
-
Clean up Kafka CLI Sessions to produce and consume messages
-
-
-
Overview of Kafka Connect
-
Define Kafka Connect to Produce Messages
-
Validate Kafka Connect to produce messages
-
Cleanup Kafka Connect to produce messages
-
Write Data to HDFS using Kafka Connect
-
Setup HDFS 3 Sink Connector Plugin
-
Overview of Kafka Consumer Groups
-
Configure HDFS 3 Sink Properties
-
Run and Validate HDFS 3 Sink
-
Cleanup Kafka Connect to consume messages
-
Configure HDFS 3 Sink Properties for String Format
-
Run and Validate HDFS 3 Sink using String Format
-
Cleanup Kafka Connect to consume messages
-
-
-
Understanding Streaming Context
-
Validate Log Data for Streaming
-
Push log messages to Netcat Webserver
-
Overview of built-in Input Sources
-
Reading Web Server logs using Spark Structured Streaming
-
Overview of Output Modes
-
Using append as Output Mode
-
Using complete as Output Mode
-
Using update as Output Mode
-
Overview of Triggers in Spark Structured Streaming
-
Overview of built-in Output Sinks
-
Previewing the Streaming Data
-
-
-
Create Kafka Topic
-
Read Data from Kafka Topic
-
Preview data using console
-
Preview data using memory
-
Transform Data using Spark APIs
-
Write Data to HDFS using Spark
-
Validate Data in HDFS using Spark
-
Write Data to HDFS using Spark with Header
-
Cleanup Kafka Connect and Files in HDFS
-
-
-
Overview of Spark Structured Streaming Triggers
-
Steps for Incremental Data Processing
-
Create Working Directory in HDFS
-
Logic to Upload GHArchive Files
-
Add new GHActivity JSON Files
-
Read JSON Data using Spark Structured Streaming
-
Write in Parquet File Format
-
Analyze GHArchive Data in Parquet files using Spark
-
Add New GHActivity JSON files
-
Load Data Incrementally to Target Table
-
Validate Incremental Load
-
Add New GHActivity JSON files
-
Validate Incremental Load
-
Using maxFilesPerTrigger and latestFirst
-
Add New GHActivity JSON files
-
Incremental Load using Archival Process
-
Validate Incremental Load
-

About this course
- Free
- 59 lessons
- 5 hours of video content