Course curriculum
Overview of SQL Revision for Data Engineering
Introduction to SQL Revision for Data Engineering
Overview of Application Architecture and RDBMS
Overview of Database Technologies and relevance of SQL
Overview of Purpose Built Databases
Overview of Data Warehouse and Data Lake
Usage of RDBMS and Data Warehouse technologies
Differences and Similarities between RDBMS and Data Warehouse Technologies
Introduction to Setting up Tools for Data Engineering Essentials
Setup Git on Windows for Code Versioning
Setup VS Code on Windows
Setup Python 3.9 on Windows
Configure Environment Variable PATH for Python on Windows
Overview of learning Python using Python CLI
Integrate VSCode with Python on Windows
Install Postgres 14 on Windows 11
Getting Started with pgAdmin on Windows
Getting Started with pgAdmin on Mac
Conclusion of Setting up Tools for Data Engineering Essentials
Overview of Postgres Database Server and pgAdmin
Overview of Database Connection Details
Overview of Connecting to External Databases using pgAdmin
Create Application Database and User in Postgres Database Server
Clone Data Sets from Git Repository for Database Scripts
Register Server in pgAdmin using Application Database and User
Setup Application Tables and Data in Postgres Database
Overview of pgAdmin to write SQL Queries
Review Data Model Diagram
Define Problem Statement for SQL Queries
Filtering Data using SQL Queries
Total Aggregations using SQL Queries
Group By Aggregations using SQL Queries
Order of Execution of SQL Queries
Rules and Restrictions to Group and Filter Data in SQL queries
Filter Data based on Aggregated Results using Group By and Having
Inner Joins using SQL Queries
Outer Joins using SQL Queries
Filter and Aggregate on Join Results using SQL
Overview of Database Views
Overview of Common Table Expressions or CTEs
Outer Join with Additional Conditions in SQL Queries
Explanation about Fix of SQL Queries with Filtering on Outer Join Results
Introduction to Cumulative Aggregations and Ranking in SQL Queries
Overview of CTAS to create tables based on Query Results
Create Tables for Cumulative Aggregations and Ranking
Overview of OVER and PARTITION BY Clause in SQL Queries
Compute Total Aggregation using OVER and PARTITION BY in SQL Queries
Overview of Ranking in SQL
Compute Global Ranks using SQL
Compute Ranks based on key using SQL
Rules and Restrictions to Filter Data based on Ranks in SQL
Filtering based on Global Ranks using Nested Queries and CTEs in SQL
Filtering based on Ranks per Partition using Nested Queries and CTEs in SQL
Create Students table with Data for ranking using SQL
Difference between rank and dense rank using SQL
Introduction to SQL Troubleshooting and Debugging Guide
Overview of Database Connectivity Issues
Validate and Setup Telnet on Mac or PC
Validate Connectivity to Database Server using telnet
Troubleshoot Database Connectivity Issue with Correct Host Details
Current Databases and Users in Postgres Database Server
Troubleshoot Database Credentials and Permissions Issues
Overview of Compilation of SQL Queries
Troubleshooting Syntax Errors in SQL Queries
Troubleshooting Semantic Errors in SQL Queries
Overview of Bugs in SQL Queries
Development Best Practices with tips to troubleshoot SQL bugs
Develop Initial Solution based on the requirement
Identify and Troubleshoot Bugs in SQL Queries
Develop Solution using Development Best Practices
Introduction to Performance Tuning of SQL Queries
Overview of SQL Compilation Process and Explain Plans
Generate Explain Plans for SQL Queries
Review Tables used for Performance Tuning of SQL Queries
Review Data Storage Internals for Tables and Indexes
Review key terms used in Explain Plans for SQL Queries
Interpret Explain Plans for Basic SQL Queries
Review the Common Application Scenarios for Performance Tuning
Write SQL Queries for Customer Orders
Performance Testing of SQL Queries using Stored Procedure
Add Required Indexes to tune performance of SQL Queries
Guidelines on adding Indexes on Tables for SQL Queries
Interpreting the explain plan for SQL Queries using Indexes
Conclusion of Performance Tuning of SQL Queries
Simple Exercises for Filtering and Aggregations
Exercises on Joins and Aggregations using SQL
Solutions for Filtering and Aggregations
Solutions for Filtering and Aggregations
Validate Data and Review Data Model Diagram
Solution for Exercise 1 to get Customer Order Count
Solution for Exercise 2 to get Dormant Customers using Outer Join
Solution for Exercise 3 to get Revenue Per Customer using Outer Join
Solution for Exercise 4 to get Revenue Per Category
Solution for Exercise 5 to get Product Count Per Department
SQL - Frequently Asked Interview Questions
Tips for Technical Questions
How would you rate yourself in SQL?
What have you done using SQL?
What is the difference between Truncate and Delete?
What are the different types of constraints you have used?
What is the difference between Primary Key and Unique Constraint?
What is the difference between Primary Key and Foreign Key Constraint?
Can a table have more than one Unique Constraint?
What happens to the data in the child table's foreign key column when data in the parent table is deleted?
What different types of joins have you used?
What is the difference between inner join and outer join?
What is a full outer join?
What is the difference between WHERE and HAVING?
What is a view and how is it different from a table?
What is CTAS and how can it be used to create a table with structure but no data?
Overview of Python Revision for Data Engineering
Setup Material - Python Essentials for Data Engineering
Setup Visual Studio Workspace for Python Application Development
Setup Notebook Environment in VS Code Workspace
Overview of VS Code Notebook Environment
Overview of Cells in VS Code Notebook
Defining Functions in VS Code Notebooks
Run the Code in VS Code Notebook Cell by Line
Constants and Variables in Python
Overview of Python Data Types
Getting help on Python Variables and Functions
Pre-Defined String Manipulation Functions
Overview of Python Lists
Loops and Conditions in Python
User Defined Functions in Python
Overview of File IO using Python
Read Data from CSV File into Python List
Overview of Python Collections
Getting Started with Processing Python Lists
Overview of Lambda Functions in Python
Usage of Lambda Functions
Filter Data in Python Lists using filter and lambda
Get unique values from list using map and set
Sort Python lists using key
Overview of JSON Strings and Files
Read JSON Strings to Python dicts or lists
Read JSON Schemas from file to Python dicts
Overview of Processing JSON Data using Python
Extract Details from Complex JSON Arrays using Python
Sort Data in JSON Arrays using Python
Create Function to get Column Details from Schemas JSON File
Overview of Pandas for Data Processing
Overview of Reading CSV Data using Pandas
Read Data from CSV Files to Pandas Dataframes
Filter Data in Pandas Dataframe using query
Get Count by Status using Pandas Dataframe APIs
Get count by Month and Status using Pandas Dataframe APIs
Create Dataframes using dynamic column list on CSV Data
Performing Inner Join between Pandas Dataframes
Perform Aggregations on Join results
Sort Data in Pandas Dataframes
Overview of Writing Pandas Dataframes to Files
Write Pandas Dataframes to JSON Files
Introduction to Troubleshooting and Debugging Python issues
Guidelines for Troubleshooting and Debugging Python related Issues
Overview of Database Connectivity using Python Applications
Overview of Database Connectivity using Python
Troubleshoot Network Connectivity to the Database Server using telnet
Troubleshoot Module Related issues for Database Connectivity using Python
Troubleshoot Credentials Related issues for Database Connectivity using Python
Overview of Python process to run Python Applications
Troubleshooting Compilation Errors in Python
Troubleshooting Run Time Errors in Python
Overview of Software Development Life Cycle
Overview of Unit Testing or Validation of Applications
Overview of Debugging VS Code Notebooks using Debug Feature
Debug VS Code Notebooks using Debug Feature
Getting Started with Debugging of Python Programs using VS Code
Recap of running File Format Converter application
Debug Python Application using VS Code with breakpoints
Managing Breakpoints for Debugging in VS Code
Conclusion to Troubleshooting and Debugging Python Issues
Introduction to Performance of Python Applications
Setup Database Loader Python Application
Ensure Postgres Database is setup for file to db loader Python Application
Cleanup the tables to run file to db loader application
Run and Validate File to DB Loader Application
Fix the error message in file to db loader application
Overview of Execution of file to db loader application
Performance Tuning using Chunksize in Pandas
Review Pandas Data Frame API to load data into the target table
Overview of multi or batch insert into Database Tables
Develop application for multiprocessing
Getting Started with Multiprocessing using Python
Invoking User Defined Functions using multiprocessing in Python
Refactor File to Database Loader Application for Multiprocessing
Add Parallel Processing to file to db loader Python Application
Validate File to DB Loader Application with Multiprocessing
Understanding the concept of Multiprocessing in Python
Performance Tuning Scenarios of Python Applications
Project 1 Handout - File Format Converter
Get File Names to be processed using glob
Get Column Names using Schemas File
Get Data Set Names from File Names or Paths using regular expressions
Read CSV Data into Pandas Dataframe with Schema Dynamically
Generate File Paths for Target JSON Files Dynamically
Recap of Writing Pandas Dataframe to JSON File
Write Pandas Dataframe to JSON Files
Modularize File Format Converter for Dataset
Wrapper to Process all Data Sets
Setup Project for File Format Converter using Python
Install Dependencies for the Python Project using pip
Add Core Logic to Python Application
Overview of Run-time Arguments and Environment Variables
Using Run Time Arguments in Python Applications
Overview of Environment Variables
Setting Environment Variables on Windows or Mac or Linux
Use Environment Variables in Python Applications
Use Environment Variables in File Format Converter
Pass JSON Array as argument to Python Applications
Pass Data Sets as Run Time Arguments to File Format Converter
Exception Handling in Python Applications
Raising Exceptions in Python Applications
Exception Handling in File Format Converter Application
Project 2 Handout - Files To Database Loader
Install Python Dependencies for Pandas and Database Integration
Run Queries from Notebook using SQL Magic
Validate Pandas and SQL Integration
Write CSV Data from File to Database Table
Write CSV Data from Files to Database Tables in Chunks
Overview of Deploying File to DB Loader Project
Project 3 Handout - REST Payload to the DB Loader Essentials
Processing JSON Data - Introduction
Overview of JSON
JSON Data Types
Create JSON String
Process JSON String
Single JSON Document in Files
Multiple JSON Documents in files
Process JSON using Pandas
Different JSON Formats supported by Pandas
Common Use Cases for JSON
Write to JSON files using json module
Write to JSON files using pandas
Overview of REST APIs
Using curl command
Overview of Postman
Getting Started with requests
Convert REST Payload to Python Objects
Process REST Payload using Collection Operations
Process REST Payload using Pandas
Python - Frequently Asked Interview Questions
How would you rate yourself in Python?
Can you elaborate your experience in Python?
Which Python libraries or modules have you used?
Which library do you use for data processing?
If you have to read data from a REST API, which library do you use?
What are the different Python collections or Data Structures?
What is the difference between list, set, dict and tuple?
How do you sort the data in a Python list? What is the purpose of keyword argument key?
What is the difference between sort and sorted?
What is Python Virtual Environment and what are the advantages of using Python Virtual Environment?
What is pip? How do you organize and install the required dependent libraries to the current project?
How do you check if a file exists in a given path (Hint: using os module)?
How can you check the data type of a Python Variable?
Overview of Build and Deploy AWS Lambda Functions
Introduction to Getting Started on Windows with Required Tools
Overview of PowerShell on Windows 10 or Windows 11
Setup Ubuntu VM on Windows 10 or 11 using wsl
Setup Ubuntu VM on Windows 10 or 11 using wsl
Setup Docker Desktop on Windows
Validate Docker on Windows using Command Line leveraging PowerShell
Review Docker Desktop Resource Configurations
Install Visual Studio Code on Windows
Install Remote Development Extension Kit for Visual Studio Code
Install Python 3.9 and Distutils on Windows using wsl Ubuntu
Review Tools Installed for Application Development using Python and AWS Services
Setup Project Folder using Visual Studio Code
Ensure Python 3.9 for the Project
Create Python Virtual Environment using Python 3.9 for the project
Install Required Dependencies for the Project using AWS Services
Ensure AWS CLI to interact with AWS Services using AWS CLI Commands
Recommendation to use Personal AWS Account for the course
Setup and Login into AWS Account
Setup AWS IAM User with Administrator Permissions
Configure and Validate AWS CLI
Configure AWS CLI with custom profile as default
Recap of Date Arithmetic using Python
Validate Python boto3 to interact with AWS Services
Setup and Validate Jupyter based Interactive Environment
Review GHActivity Data Details
Download GHActivity Data using requests
Review GHActivity Data using Pandas
Managing s3 using Python boto3
Overview of AWS Dynamodb
Create DynamoDB Table for Job Details
Create DynamoDB Table for Job Run Details
Recap of Date Arithmetic using Python
Get First Run Details to Copy GHActivity Data to AWS s3
Get Incremental Load Logic for next file
Understand AWS s3 concepts such as buckets and objects
Copying or Uploading Files to AWS s3 as objects using Python boto3
Writing Python Objects or Data as AWS s3 Objects using boto3
Convert Date Time to Integer Unix Epoch using Python
Validate Data Copied to AWS s3 and job run details
Run and Validate End to End Process
Overview of AWS Lambda and Getting Started using Python 3.9 Runtime
Passing Arguments to AWS Lambda and Processing using Python
Using Custom Handlers for AWS Lambda Functions using Python 3.9
Using AWS Services such as s3 in AWS Lambda Functions
Recap of handling permissions using AWS IAM Roles and User Groups
Develop AWS Lambda Function to list objects from AWS S3 Bucket
Passing Environment Variables to AWS Lambda Functions
Customizing Resources such as memory used for AWS Lambda Function
Understand Problem Statement for Python Application for AWS
Setup Python Project for AWS Lambda using Visual Studio Code
Core Logic to upload files to AWS S3 using Python boto3
Develop Python Application to upload files to AWS s3 using Python boto3
Build Zip File for Python Application to deploy as AWS Lambda Function
Deploy Python Application as AWS Lambda Function using Zip File
Conclusion and request for rating and feedback
Introduction to Build and Deploy AWS Lambda Function using Zip File
Update Application Code with Core logic for Ingestion
Overview of Validating User Defined Functions using Python CLI
Validate Application using Core Logic to ingest data
Add Lambda Handler to ingest data to AWS s3
Build Zip File for Python Application to deploy as AWS Lambda Function
Upload Python Application Zip File to s3 and deploy as AWS Lambda Function
Set Custom Handler and required Environment Variables for AWS Lambda Function
Granting Permissions on AWS s3 and Dynamodb to AWS Lambda Function via Role
Change Memory and Timeout for AWS Lambda Function and Test
Recap and Overview of Monitoring Lambda Functions using Cloudwatch
Limitations of Deploying AWS Lambda Function using Zip file
Automate Build of AWS Lambda Function using Shell Scripts
Introduction to Deploying AWS Lambda Functions using Python Runtime with Layers
Create Lambda Function to explore layers
Get list of Python Libraries installed in AWS Lambda Runtime
Add Existing AWS Layer to Lambda Function using Python runtime
Steps to Add and Configure Custom Layers to AWS Lambda Functions
Setup Local Environment using AWS Cloud Shell to Create Custom Layer
Install Required Dependencies for Lambda Layer for Python Runtime
Create Zip File and Upload to s3 with Python dependencies for AWS Lambda Layer
Create Lambda Layer using AWS Lambda Console using zip file in AWS s3
Configure Lambda Function with Custom Layer for Pandas and Requests
Troubleshoot and Fix the issues related to Lambda Layers for AWS Lambda Functions
Upload Zip File with Python boto3 to s3 for AWS Lambda Layer
Create Lambda Layer with latest version of Python boto3 for AWS Lambda Functions
Deploy AWS Lambda Function Sample Application with Layers
Overview of Data Warehousing using Amazon Serverless Redshift
Create Workgroup and Namespace for Amazon Redshift Serverless
Overview of Amazon Redshift Serverless Namespaces and Workgroups
Quick Preview of Amazon Redshift Serverless Dashboard
Validate Amazon Redshift Serverless Workgroup by running a query
Enable Public Accessibility to Redshift Serverless Workgroup
Understand Redshift Serverless Workgroup Capacity measured in RPUs
Introduction to Setup Redshift Spectrum Database using Redshift Serverless
Setup Files in S3 for Glue Catalog and Redshift Spectrum Database Tables
Cleanup Glue Catalog Database and Crawler using AWS Glue Console
Create Glue Crawler to Setup Glue Catalog Database and Tables for Redshift Spectrum
Run Glue Crawler to Create Glue Catalog Database and Tables for Redshift Spectrum
Create Redshift Serverless Workgroup and Namespace for Redshift Spectrum
Accessing Redshift using Jupyter Based Environment of VS Code
Create Database and User for Data Mart using AWS Redshift Query Editor
Create Database and User for Data Mart using Jupyter Notebooks
Create External Schema in Redshift Database using Glue Catalog Database
Validate External Schema Setup using Redshift Query Editor
Introduction to Basic SQL Queries using AWS Redshift SQL
Overview of Using WITH Clause in Redshift SQL Queries
Overview of Using Views in Redshift SQL Queries
Filtering Data using AWS Redshift SQL
Filtering Data using Boolean AND in Redshift SQL
Filtering Data using LIKE Operator in Redshift SQL
Filtering Data using Boolean OR and IN Operators in Redshift SQL
Overview of Count and Sum using Redshift SQL
Getting Total Average using Redshift SQL
Perform Total Aggregations based on Condition using Redshift SQL
Get Count and Distinct Count using Redshift SQL
Get Sum and Average on Order Item Measures using Redshift SQL
Perform Grouped Aggregations using Redshift SQL
Filtering on Aggregate Results using HAVING on GROUP BY
Overview of Order Of Execution of SQL using Group By and Having
Overview of Joins using Redshift Tables
Data Processing using Spark on Databricks
Process Data in DBFS using Databricks Spark SQL
Getting Started with Spark SQL Example using Databricks
Create Temporary Views using Spark SQL
Exercise to create temporary views using Spark SQL
Spark SQL Query to compute Daily Product Revenue
Save Query Result to DBFS using Spark SQL
Ranking using Spark SQL Windowing Functions
Create Temporary View for ranking using Spark SQL Windowing Functions
Compute Global Rank using Spark SQL Windowing Functions
Compute Ranks Per Key using Spark SQL Windowing Functions
Difference Between rank and dense_rank
Filter on Ranks using Spark SQL Windowing Functions
Overview of Pyspark Examples on Databricks
Process Schema Details in JSON using Pyspark
Create Dataframe with Schema from JSON File using Pyspark
Transform Data using Spark APIs
Get Schema Details for all Data Sets using Pyspark
Convert CSV to Parquet with Schema using Pyspark
Overview of Data Processing using Spark on EMR
Create bootstrap script for AWS EMR Cluster
Provision Elastic IP for Master Node of AWS EMR Cluster
Create AWS EMR Cluster for Development
Troubleshooting Issues related to Bootstrap of EMR Cluster
Fix Bootstrap Script for AWS EMR Cluster
Validate AWS EMR Cluster with Bootstrap Action with updated script
Get Cluster Details of AWS EMR Development Cluster using boto3
Getting Started with Boto3 to Manage AWS EMR Clusters
Set AWS Profile using env file in Visual Studio Code
Setup boto3 to explore APIs to manage AWS EMR Clusters
Setup Python Virtual Environment as part of VS Code Workspace
Associating Elastic Ip with AWS EMR Master Node using Boto3
Getting Instance Id of the Master Node of AWS EMR Cluster using boto3
Setup Notebook Environment for EMR Cluster using IAM User
Getting Allocation Id of the Elastic Ip using AWS boto3
Open Remote Window on AWS EMR Master Node using VS Code
Setup Workspace on AWS EMR Master using Git Repository
Best Practices and Advantages of using AWS EMR Cluster for Team Development
Install VSCode Extensions in remote Workspace for Python
Review Python and Pyspark details on EMR Cluster
Running Applications using local and yarn during development
Getting Started with Development of Spark Applications on EMR Cluster
Create Function for Spark Session
Upload Files to AWS s3 for the development using AWS EMR Cluster
Develop read logic for the Spark Application
Process Data Frame using Spark APIs
Write Data to Files using Spark APIs
Productionize the Code and setup required data sets for validation
Resize the AWS EMR Cluster using Web Console
Validate Changes to productionize the Application Code
Take the backup and terminate the cluster
Recreate the AWS EMR Cluster to deploy Spark Applications
Resize the AWS EMR Cluster to validate application on larger data sets
Build Zip File for the Spark Application
Setup Code Repository on the AWS EMR Master Node
Run Spark Application copied to s3 on EMR using Cluster Deployment Mode
Run Spark Application on EMR using Cluster Deployment Mode
Validate the Spark Application using zip file and client as deploy mode
Validate Spark Application Deployed as Step on AWS EMR Cluster
Deploy Spark Application as Step to the AWS EMR Cluster
Update Material related to Managing AWS EMR using Boto3
Create AWS EMR Cluster using AWS CLI Command
Manage AWS EMR Clusters using AWS CLI Commands
Overview of AWS boto3 to Manage AWS EMR Clusters
Overview of Run Job Flow API to create AWS EMR Cluster
Create AWS EMR Cluster or Job Flow Cluster using AWS Boto3
Prepare Data Sets to add Spark Application as Step to AWS EMR Cluster
Add Spark Application as Step to AWS EMR Cluster using Boto3
Exercise to add Spark Application as Step to EMR Cluster using boto3
Terminate the AWS EMR Cluster used for adding Steps
Exercise to Create AWS EMR Cluster with Steps for Spark Application
Overview of Orchestration using Step Functions and EMR
Review of Development Environment for AWS Step Functions and EMR
Quick Overview of Important Terms of AWS Step Functions
Getting Started with EMR based Pipeline using AWS Step Functions
Overview of AWS IAM Role associated with State Machine
Overview of Creating EMR Cluster using AWS Step Functions
Parameters to Create EMR Cluster using AWS Step Functions
Attach Permissions to Step Function Role to Create AWS EMR Cluster
Add Step to AWS EMR Cluster using AWS Step Function
Validate Adding Step to AWS EMR Cluster using Step Functions
Validate the execution of State Machine to run Spark Application on AWS EMR Cluster
Add Action to State Machine to Terminate the AWS EMR Cluster
Terminate AWS EMR Clusters Created to Validate State Machine
Review the current state of AWS EMR based Pipeline or State Machine
Create State Machine using AWS Step Function to Validate s3
Attach Policy with Permissions on AWS s3 to Step Function Role
Setup File in AWS s3 and Validate State Machine to list objects
Relationship between AWS Boto3 and Actions in Step Functions
Add State to Delete Object from AWS s3
Fix Permissions and Run State Machine to Delete Object from AWS s3
Passing Input to States in AWS Step Functions State Machine
Setup Multiple Files to Manage AWS s3 Objects using State Machines
Process AWS s3 Objects using Map in State Machine
Extract Key of AWS s3 Objects using Step Functions Pass
Add State to AWS Step Function Delete s3 Object
Develop AWS Lambda Function to customise State Machine Data
Add AWS Lambda Function to State Machine to Pass s3 Details for delete
Add Condition to State Machine to avoid Key Error on AWS s3 List Objects
Overview of Map Concurrency in State Machines of AWS Step Functions
Invoking AWS Step Function State Machine from Other State Machines
Overview of integration of s3 based State Machine with EMR State Machine
Taking back up of AWS Step Functions State Machines
Grant Permissions between AWS Step Functions State Machines via IAM Role
Update AWS Step Function State Machine with EMR to validate s3
Pass EMR Step Details to AWS Step Functions State
Validate AWS Step Function EMR based State Machine Execution
Run AWS Step Function State Machine to validate logic to delete AWS s3 Objects
Exercise to add validation of source s3 location in AWS Step Function State Machine
Update AWS Step Function State Machine to Validate Source s3 Location
Run AWS Step Function State Machine with source s3 Validation Logic
Develop AWS Lambda Function to check number of files in source s3
Attach Policy to State Machine Role to Invoke AWS Lambda Function
Run Updated State Machine to validate source count
Best Practices to Run AWS Step Functions State Machines
Setup AWS EMR Cluster to develop applications using Spark SQL
Setup Visual Studio Code Workspace using AWS EMR Master Node
Update PYTHONPATH to access Pyspark Libraries or Modules on AWS EMR Master Node
Setup Required Data Sets for Spark SQL
Upload Retail DB Files to AWS s3 using AWS CLI commands
Getting Started with Spark SQL and Temporary Views using Spark SQL on AWS EMR Cluster
Create Spark SQL Temporary Views for Orders and Order Items
Join and Aggregate using Spark SQL on AWS EMR Cluster
Write Query Results back to AWS s3 using Spark SQL on AWS EMR Cluster
Develop Script using Spark SQL Commands
Parameterize Bucket Name in Spark SQL Script
Deploy Spark SQL Script in s3 and Run using CLI on AWS EMR Master Node
Deploy Spark SQL Script as Step on AWS EMR Cluster
Conclusion to Develop Spark SQL Applications on EMR Cluster
Create State Machine to Deploy Spark SQL Script on AWS EMR Cluster
Overview of Managing AWS EMR Clusters using Boto3
Overview of AWS boto3 to Manage AWS EMR Clusters
Create AWS EMR Job Flow Cluster using Python Boto3
Add Spark SQL Script as Step to AWS EMR Cluster using Boto3
Overview of AWS EMR Waiters using Python Boto3
Terminate AWS EMR Cluster using waiters and Python Boto3
Overview of AWS Step Functions State Machine to execute Spark SQL on EMR
Create State Machine using AWS Step Function to create EMR Cluster
Grant Permissions to State Machine via Role to Create AWS EMR Cluster
Add Spark SQL Script as Step to AWS EMR Cluster using AWS Step Functions
Add Terminate AWS EMR Cluster Step to AWS Step Functions State Machine
Pass AWS EMR Step Details as Input to State Machine at Execution Time
Validate Spark SQL Script Execution as AWS EMR Step using State Machine
Overview of Integration of Spark and Redshift
Create AWS EC2 Elastic IP and Key Pair for AWS EMR Cluster
Create Shell Script for AWS EMR Bootstrap Action to install boto3
Create AWS EMR Cluster to integrate with Amazon Redshift
Attach Elastic IP to the AWS EMR Master Node and Validate SSH Connectivity
Setup Project for AWS EMR and Redshift Integration using VS Code Remote Development
Setup Amazon Redshift Serverless Workgroup and Validate Connectivity
Connect to Redshift Serverless Workgroup from AWS EMR Master using psql
Setup Required Database and User in Amazon Redshift Serverless Workgroup
Install Python Library psycopg2 to connect to Redshift Databases using Python
Validate Redshift Connectivity using Python from AWS EMR Master Node
Create and Validate Redshift Database Tables
Create Secret for Redshift Database using AWS Secrets Manager
Validate Python Boto3 on Master Node of AWS EMR Cluster
Read Secret from AWS Secrets Manager using Python Boto3
Validate Redshift Connectivity from Master Node of AWS EMR Cluster
Launch Pyspark CLI with Redshift Dependencies on AWS EMR Master Node
Validate Redshift Connectivity using Spark on AWS EMR Cluster
Develop Code to Validate Spark and Redshift Integration using EMR
Setup GHActivity Data in AWS s3
Read and Process Data using Pyspark to write into Redshift Table
Develop Write Logic to load Spark Dataframe into Redshift Table
Validate Spark Load Process to Amazon Redshift Table
Understanding AWS s3 Temp Location specified in Spark Applications
Conclusion on Integration of AWS EMR with Amazon Redshift
Setup AWS EMR Cluster to develop applications using Spark SQL
Setup Visual Studio Code Workspace using AWS EMR Master Node
Update PYTHONPATH to access Pyspark Libraries or Modules on AWS EMR Master Node
Setup Required Data Sets for Spark SQL
Upload Retail DB Files to AWS s3 using AWS CLI commands
Getting Started with Spark SQL and Temporary Views using Spark SQL on AWS EMR Cluster
Create Spark SQL Temporary Views for Orders and Order Items
Join and Aggregate using Spark SQL on AWS EMR Cluster
Write Query Results back to AWS s3 using Spark SQL on AWS EMR Cluster
Develop Script using Spark SQL Commands
Parameterize Bucket Name in Spark SQL Script
Deploy Spark SQL Script in s3 and Run using CLI on AWS EMR Master Node
Deploy Spark SQL Script as Step on AWS EMR Cluster
Conclusion to Develop Spark SQL Applications on EMR Cluster
Introduction to Integration of AWS Lambda Functions and Redshift
Setup Redshift Serverless Workgroup and Namespace
Setup Workspace for Integration of AWS Lambda Functions and Redshift
Validate JSON Data in AWS s3 using Pandas
Get Redshift Cluster Details using Python boto3
Get Redshift Serverless Details using Python Boto3
Run SQL Queries using Redshift Serverless and Python Boto3
Capture Redshift Query Results using Python Boto3
Create Database and User in Redshift Serverless Namespace
Create Table in Redshift Serverless Namespace
Overview of Python Boto3 Waiters
Run Queries against Redshift Table using Boto3 without credentials
Create and Validate Secret using AWS Secrets Manager for Redshift Workgroup
Copy Processed Data from AWS s3 into Redshift Table
Conclusion on Developing Applications using Redshift and Python Boto3
Overview of Data Pipelines using EMR and Redshift
Introduction to Integration of AWS Lambda Functions and Redshift
Getting Started with Lambda Function using boto3
Running Lambda Function using AWS Lambda Console
Troubleshoot issues of AWS Lambda Functions using Cloudwatch Logs
Check Python Boto3 Version in AWS Lambda Function Run Time Environment
Overview of adding Lambda Layer to Upgrade Python Boto3 of Lambda Runtime
Copy Zip File with Latest Boto3 to AWS s3 for Lambda Layer
Create Lambda Layer to Upgrade Python Boto3 of Lambda Runtime
Create Function to Copy Data into Redshift Table using boto3
Update Lambda Handler to copy data to Redshift Table
Grant Permissions on Redshift Secret to AWS Lambda Function via IAM Role
Grant Permissions on Redshift Data API to AWS Lambda Function via IAM Role
Review Redshift Workgroup and Truncate Table before running Lambda Function
Run AWS Lambda Function to Copy Data to Redshift Table
Validate Data Copied by AWS Lambda Function in Redshift Table by running queries
Introduction to Data Pipeline using AWS Step Functions with EMR and Redshift
Getting Started with State Machines or Data Pipelines using AWS Step Functions
Review Execution Details of State Machine or Data Pipeline using AWS Step Functions
Manage State Machines using AWS Step Functions State Machines Dashboard
Create State Machine with AWS Lambda Function to Copy Data From s3 to Redshift Table
Update State Machine with Permissions on Lambda to Copy Data From s3 to Redshift Table
Run State Machine with AWS Lambda Function to Copy Data From s3 to Redshift Table
Overview of Managing AWS EMR Clusters using Boto3
Overview of AWS boto3 to Manage AWS EMR Clusters
Create AWS EMR Job Flow Cluster using Python Boto3
Add Spark SQL Script as Step to AWS EMR Cluster using Boto3
Overview of AWS EMR Waiters using Python Boto3
Terminate AWS EMR Cluster using waiters and Python Boto3
Overview of AWS Step Functions State Machine to execute Spark SQL on EMR
Create State Machine using AWS Step Function to create EMR Cluster
Grant Permissions to State Machine via Role to Create AWS EMR Cluster
Add Spark SQL Script as Step to AWS EMR Cluster using AWS Step Functions
Add Terminate AWS EMR Cluster Step to AWS Step Functions State Machine
Pass AWS EMR Step Details as Input to State Machine at Execution Time
Validate Spark SQL Script Execution as AWS EMR Step using State Machine
Create Data Pipeline with EMR and Redshift Integration using AWS Step Functions
Grant Permissions on AWS EMR to role of State Machine with EMR and Redshift Integration
Run AWS Step Function State Machine with EMR and Redshift Integration
Validate AWS State Machine Execution with EMR and Redshift Integration
Best Practices to Build State Machines with AWS EMR and Redshift Integration
Overview of Glue Components and Glue Catalog
Introduction - Overview of Glue Components
Create Crawler and Catalog Table
Analyze Data using Athena
Creating S3 Bucket and Role
Create and Run the Glue Job
Validate using Glue Catalog Table and Athena
Create and Run Glue Trigger
Create Glue Workflow
Run Glue Workflow and Validate
Prerequisites for Glue Catalog Tables
Steps for Creating Catalog Tables
Download Data Set
Upload data to s3
Create Glue Catalog Database - itvghlandingdb
Create Glue Catalog Table - ghactivity
Running Queries using Athena - ghactivity
Crawling Multiple Folders
Managing Glue Catalog using AWS CLI
Managing Glue Catalog using Python Boto3
Data Analysis using Amazon Athena
Getting Started with Amazon Athena
Quick Recap of Glue Catalog Databases and Tables
Access Glue Catalog Databases and Tables using Athena Query Editor
Create Database and Table using Athena
Populate Data into Table using Athena
Using CTAS to create tables using Athena
Overview of Amazon Athena Architecture
Amazon Athena Resources and relationship with Hive
Create Partitioned Table using Athena
Develop Query for Partitioned Column
Insert into Partitioned Tables using Athena
Validate Data Partitioning using Athena
Drop Athena Tables and Delete Data Files
Drop Partitioned Table using Athena
Data Partitioning in Athena using CTAS
Amazon Athena using AWS CLI - Introduction
Get help and list Athena databases using AWS CLI
Managing Athena Workgroups using AWS CLI
Run Athena Queries using AWS CLI
Get Athena Table Metadata using AWS CLI
Run Athena Queries with custom location using AWS CLI
Drop Athena table using AWS CLI
Run CTAS under Athena using AWS CLI
Amazon Athena using Python boto3 - Introduction
Getting Started with Managing Athena using Python boto3
List Amazon Athena Databases using Python boto3
List Amazon Athena Tables using Python boto3
Run Amazon Athena Queries with boto3
Review Athena Query Results using boto3
Persist Amazon Athena Query Results in Custom Location using boto3
Processing Athena Query Results using Pandas
Run CTAS against Amazon Athena using Python boto3

About this course
- $300.00
- 663 lessons
- 52 hours of video content