Course curriculum

Module 1: Getting Started with Kafka

    1. Overview of Kafka

    2. Managing Topics using Kafka CLI

    3. Produce and Consume Messages using CLI (sketched below)

    4. Validate Generation of Web Server Logs

    5. Create Web Server using nc

    6. Produce Retail Logs to Kafka Topic

    7. Consume Retail Logs from Kafka Topic

    8. Clean Up Kafka CLI Sessions Used to Produce and Consume Messages
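
The CLI lessons in this module use Kafka's console tools (kafka-topics, kafka-console-producer, and kafka-console-consumer). As a rough programmatic equivalent of the produce/consume flow, here is a minimal sketch using the kafka-python client; the broker address and the retail_logs topic name are assumptions, and kafka-python itself is not part of the course tooling.

```python
# Minimal produce/consume sketch using kafka-python (pip install kafka-python).
# The broker address and topic name ("retail_logs") are assumptions; the
# lessons themselves use kafka-console-producer / kafka-console-consumer.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("retail_logs", b"sample web server log line")
producer.flush()  # ensure the message is actually sent before reading it back

consumer = KafkaConsumer(
    "retail_logs",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",   # read from the beginning of the topic
    consumer_timeout_ms=5000,       # stop iterating when no new messages arrive
)
for message in consumer:
    print(message.value.decode("utf-8"))
```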

Module 2: Data Ingestion using Kafka Connect

    1. Overview of Kafka Connect

    2. Define Kafka Connect to Produce Messages

    3. Validate Kafka Connect to Produce Messages

    4. Clean Up Kafka Connect to Produce Messages

    5. Write Data to HDFS using Kafka Connect

    6. Set Up HDFS 3 Sink Connector Plugin

    7. Overview of Kafka Consumer Groups

    8. Configure HDFS 3 Sink Properties (sketched below)

    9. Run and Validate HDFS 3 Sink

    10. Clean Up Kafka Connect to Consume Messages

    11. Configure HDFS 3 Sink Properties for String Format

    12. Run and Validate HDFS 3 Sink using String Format

    13. Clean Up Kafka Connect to Consume Messages
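
The HDFS 3 Sink lessons center on a connector configuration. A minimal sketch of registering such a connector through the Kafka Connect REST API (assumed to listen on localhost:8083) might look like the following; the property names follow Confluent's HDFS 3 Sink Connector, while the connector name, topic, URLs, and flush size are placeholder assumptions.

```python
# Sketch: registering an HDFS 3 Sink connector via the Kafka Connect REST API.
# The REST endpoint, connector name, topic, and HDFS URL are assumptions.
import json
import requests

connector = {
    "name": "retail-logs-hdfs3-sink",          # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.hdfs3.Hdfs3SinkConnector",
        "tasks.max": "1",
        "topics": "retail_logs",               # assumed topic name
        "hdfs.url": "hdfs://localhost:8020",   # assumed NameNode address
        "flush.size": "1000",
        # For the "String Format" lessons, a string format class is set here,
        # e.g.: "format.class": "io.confluent.connect.hdfs3.format.string.StringFormat",
    },
}

resp = requests.post(
    "http://localhost:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
print(resp.json())

# Cleanup (lessons 4, 10, and 13) is a DELETE against the same endpoint:
# requests.delete("http://localhost:8083/connectors/retail-logs-hdfs3-sink")
```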

Module 3: Overview of Spark Structured Streaming

    1. Understanding Streaming Context

    2. Validate Log Data for Streaming

    3. Push Log Messages to Netcat Web Server

    4. Overview of Built-in Input Sources

    5. Reading Web Server Logs using Spark Structured Streaming (sketched below)

    6. Overview of Output Modes

    7. Using append as Output Mode

    8. Using complete as Output Mode

    9. Using update as Output Mode

    10. Overview of Triggers in Spark Structured Streaming

    11. Overview of Built-in Output Sinks

    12. Previewing the Streaming Data
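
Lessons 5 through 11 come together in a few lines of PySpark: read from the Netcat web server via the socket source, pick an output mode, and attach a trigger. A minimal sketch, assuming the server runs on localhost:9999:

```python
# Sketch: reading lines from a Netcat server (started with `nc -lk 9999`)
# using the socket source and previewing them on the console sink.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SocketStreamDemo").getOrCreate()

lines = (
    spark.readStream
    .format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load()
)

query = (
    lines.writeStream
    .format("console")
    .outputMode("append")                  # the other modes: "complete", "update"
    .trigger(processingTime="10 seconds")  # run a micro-batch every 10 seconds
    .start()
)
query.awaitTermination()
```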

Module 4: Kafka and Spark Structured Streaming Integration

    1. Create Kafka Topic

    2. Read Data from Kafka Topic (sketched below)

    3. Preview Data using the console Sink

    4. Preview Data using the memory Sink

    5. Transform Data using Spark APIs

    6. Write Data to HDFS using Spark

    7. Validate Data in HDFS using Spark

    8. Write Data to HDFS using Spark with a Header

    9. Clean Up Kafka Connect and Files in HDFS
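
A minimal sketch of this module's read-transform-write pipeline, assuming the retail_logs topic from earlier modules and placeholder HDFS paths; the spark-sql-kafka-0-10 package must be on the Spark classpath for the kafka source to resolve:

```python
# Sketch: Kafka topic -> Spark Structured Streaming -> CSV files in HDFS.
# Topic name and HDFS paths are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("KafkaToHDFS").getOrCreate()

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "retail_logs")
    .load()
)

# Kafka delivers key and value as binary; cast the value to a readable string.
logs = raw.select(col("value").cast("string").alias("log"))

query = (
    logs.writeStream
    .format("csv")
    .option("header", "true")  # lesson 8: write the files with a header row
    .option("path", "hdfs://localhost:8020/user/demo/retail_logs")
    .option("checkpointLocation", "hdfs://localhost:8020/user/demo/checkpoints/retail_logs")
    .start()
)
query.awaitTermination()
```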

Module 5: Incremental Loads using Spark Structured Streaming

    1. Overview of Spark Structured Streaming Triggers

    2. Steps for Incremental Data Processing

    3. Create Working Directory in HDFS

    4. Logic to Upload GHArchive Files

    5. Add New GHActivity JSON Files

    6. Read JSON Data using Spark Structured Streaming

    7. Write in Parquet File Format

    8. Analyze GHArchive Data in Parquet Files using Spark

    9. Add New GHActivity JSON Files

    10. Load Data Incrementally to Target Table

    11. Validate Incremental Load

    12. Add New GHActivity JSON Files

    13. Validate Incremental Load

    14. Using maxFilesPerTrigger and latestFirst (sketched below)

    15. Add New GHActivity JSON Files

    16. Incremental Load using Archival Process

    17. Validate Incremental Load
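
A minimal sketch of the incremental pattern this module builds up, including the maxFilesPerTrigger and latestFirst options from lesson 14. File-based streaming sources require an explicit schema; the fields below are a small assumed subset of the GHArchive event schema, and all paths are placeholders:

```python
# Sketch: incrementally processing GHActivity JSON files as they land in HDFS.
# Schema fields and HDFS paths are assumptions for illustration.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("GHActivityIncremental").getOrCreate()

schema = StructType([
    StructField("id", StringType()),
    StructField("type", StringType()),
    StructField("created_at", StringType()),
])

events = (
    spark.readStream
    .format("json")
    .schema(schema)
    .option("maxFilesPerTrigger", 2)  # cap files picked up per micro-batch
    .option("latestFirst", "true")    # process the newest files first
    .load("hdfs://localhost:8020/user/demo/ghactivity/landing")
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs://localhost:8020/user/demo/ghactivity/target")
    .option("checkpointLocation", "hdfs://localhost:8020/user/demo/checkpoints/ghactivity")
    .trigger(processingTime="30 seconds")
    .start()
)
query.awaitTermination()
```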

About this course

  • Free
  • 59 lessons
  • 5 hours of video content
