Course curriculum
-
-
Overview of SQL Revision for Data Engineering
-
-
-
Introduction to SQL Revision for Data Engineering
-
Overview of Application Architecture and RDBMS
-
Overview of Database Technologies and relevance of SQL
-
Overview of Purpose Built Databases
-
Overview of Data Warehouse and Data Lake
-
Usage of RDBMS and Data Warehouse technologies
-
Differences and Similarities between RDBMS and Data Warehouse Technologies
-
-
-
Introduction to Setting up Tools for Data Engineering Essentials
-
Setup Git on Windows for Code Versioning
-
Setup VS Code on Windows
-
Setup Python 3.9 on Windows
-
Configure Environment Variable PATH for Python on Windows
-
Overview of learning Python using Python CLI
-
Integrate VS Code with Python on Windows
-
Install Postgres 14 on Windows 11
-
Getting Started with pgAdmin on Windows
-
Getting Started with pgAdmin on Mac
-
Conclusion of Setting up Tools for Data Engineering Essentials
-
-
-
Overview of Postgres Database Server and pgAdmin
-
Overview of Database Connection Details
-
Overview of Connecting to External Databases using pgAdmin
-
Create Application Database and User in Postgres Database Server
-
Clone Data Sets from Git Repository for Database Scripts
-
Register Server in pgAdmin using Application Database and User
-
Setup Application Tables and Data in Postgres Database
-
Overview of pgAdmin to write SQL Queries
-
-
-
Review Data Model Diagram
-
Define Problem Statement for SQL Queries
-
Filtering Data using SQL Queries
-
Total Aggregations using SQL Queries
-
Group By Aggregations using SQL Queries
-
Order of Execution of SQL Queries
-
Rules and Restrictions to Group and Filter Data in SQL queries
-
Filter Data based on Aggregated Results using Group By and Having
-
Inner Joins using SQL Queries
-
Outer Joins using SQL Queries
-
Filter and Aggregate on Join Results using SQL
-
Overview of Database Views
-
Overview of Common Table Expressions or CTEs
-
Outer Join with Additional Conditions in SQL Queries
-
Explanation about Fix of SQL Queries with Filtering on Outer Join Results
-
-
-
Introduction to Cumulative Aggregations and Ranking in SQL Queries
-
Overview of CTAS to create tables based on Query Results
-
Create Tables for Cumulative Aggregations and Ranking
-
Overview of OVER and PARTITION BY Clause in SQL Queries
-
Compute Total Aggregation using OVER and PARTITION BY in SQL Queries
-
Overview of Ranking in SQL
-
Compute Global Ranks using SQL
-
Compute Ranks based on key using SQL
-
Rules and Restrictions to Filter Data based on Ranks in SQL
-
Filtering based on Global Ranks using Nested Queries and CTEs in SQL
-
Filtering based on Ranks per Partition using Nested Queries and CTEs in SQL
-
Create Students table with Data for ranking using SQL
-
Difference between rank and dense rank using SQL
-
-
-
Introduction to SQL Troubleshooting and Debugging Guide
-
Overview of Database Connectivity Issues
-
Validate and Setup Telnet on Mac or PC
-
Validate Connectivity to Database Server using telnet
-
Troubleshoot Database Connectivity Issue with Correct Host Details
-
Current Databases and Users in Postgres Database Server
-
Troubleshoot Database Credentials and Permissions Issues
-
Overview of Compilation of SQL Queries
-
Troubleshooting Syntax Errors in SQL Queries
-
Troubleshooting Semantic Errors in SQL Queries
-
Overview of Bugs in SQL Queries
-
Development Best Practices with tips to troubleshoot SQL bugs
-
Develop Initial Solution based on the requirement
-
Identify and Troubleshoot Bugs in SQL Queries
-
Develop Solution using Development Best Practices
-
-
-
Introduction to Performance Tuning of SQL Queries
-
Overview of SQL Compilation Process and Explain Plans
-
Generate Explain Plans for SQL Queries
-
Review Tables used for Performance Tuning of SQL Queries
-
Review Data Storage Internals for Tables and Indexes
-
Review key terms used in Explain Plans for SQL Queries
-
Interpret Explain Plans for Basic SQL Queries
-
Review the Common Application Scenarios for Performance Tuning
-
Write SQL Queries for Customer Orders
-
Performance Testing of SQL Queries using Stored Procedure
-
Add Required Indexes to tune performance of SQL Queries
-
Guidelines on adding Indexes on Tables for SQL Queries
-
Interpreting the explain plan for SQL Queries using Indexes
-
Conclusion of Performance Tuning of SQL Queries
-
-
-
Simple Exercises for Filtering and Aggregations
-
Exercises on Joins and Aggregations using SQL
-
-
-
Solutions for Filtering and Aggregations
-
Validate Data and Review Data Model Diagram
-
Solution for Exercise 1 to get Customer Order Count
-
Solution for Exercise 2 to get Dormant Customers using Outer Join
-
Solution for Exercise 3 to get Revenue Per Customer using Outer Join
-
Solution for Exercise 4 to get Revenue Per Category
-
Solution for Exercise 5 to get Product Count Per Department
-
-
-
SQL - Frequently Asked Interview Questions
-
Tips for Technical Questions
-
How do you rate yourself in SQL?
-
What have you done using SQL?
-
What is the difference between Truncate and Delete?
-
What are the different types of constraints you have used?
-
What is the difference between Primary Key and Unique Constraint?
-
What is the difference between Primary Key and Foreign Key Constraint?
-
Can a table have more than one Unique Constraint?
-
What happens to the data in the child table's foreign key column when data in the parent table is deleted?
-
What different types of joins have you used?
-
What is the difference between inner join and outer join?
-
What is a full outer join?
-
What is the difference between WHERE and HAVING?
-
What is a view and how is it different from a table?
-
What is CTAS and how can it be used to create a table with structure but no data?
-
-
-
Overview of Python Revision for Data Engineering
-
Setup Material - Python Essentials for Data Engineering
-
-
-
Setup Visual Studio Workspace for Python Application Development
-
Setup Notebook Environment in VS Code Workspace
-
Overview of VS Code Notebook Environment
-
Overview of Cells in VS Code Notebook
-
Defining Functions in VS Code Notebooks
-
Run the Code in VS Code Notebook Cell by Line
-
Constants and Variables in Python
-
Overview of Python Data Types
-
Getting help on Python Variables and Functions
-
Pre-Defined String Manipulation Functions
-
Overview of Python Lists
-
Loops and Conditions in Python
-
User Defined Functions in Python
-
-
-
Overview of File IO using Python
-
Read Data from CSV File into Python List
-
Overview of Python Collections
-
Getting Started with Processing Python Lists
-
Overview of Lambda Functions in Python
-
Usage of Lambda Functions
-
Filter Data in Python Lists using filter and lambda
-
Get unique values from list using map and set
-
Sort Python lists using key
-
Overview of JSON Strings and Files
-
Read JSON Strings to Python dicts or lists
-
Read JSON Schemas from file to Python dicts
-
Overview of Processing JSON Data using Python
-
Extract Details from Complex JSON Arrays using Python
-
Sort Data in JSON Arrays using Python
-
Create Function to get Column Details from Schemas JSON File
-
-
-
Overview of Pandas for Data Processing
-
Overview of Reading CSV Data using Pandas
-
Read Data from CSV Files to Pandas Dataframes
-
Filter Data in Pandas Dataframe using query
-
Get Count by Status using Pandas Dataframe APIs
-
Get count by Month and Status using Pandas Dataframe APIs
-
Create Dataframes using dynamic column list on CSV Data
-
Performing Inner Join between Pandas Dataframes
-
Perform Aggregations on Join results
-
Sort Data in Pandas Dataframes
-
Overview of Writing Pandas Dataframes to Files
-
Write Pandas Dataframes to JSON Files
-
-
-
Introduction to Troubleshooting and Debugging Python issues
-
Guidelines for Troubleshooting and Debugging Python related Issues
-
Overview of Database Connectivity using Python Applications
-
Overview of Database Connectivity using Python
-
Troubleshoot Network Connectivity to the Database Server using telnet
-
Troubleshoot Module Related issues for Database Connectivity using Python
-
Troubleshoot Credentials Related issues for Database Connectivity using Python
-
Overview of Python process to run Python Applications
-
Troubleshooting Compilation Errors in Python
-
Troubleshooting Run Time Errors in Python
-
Overview of Software Development Life Cycle
-
Overview of Unit Testing or Validation of Applications
-
Overview of Debugging VS Code Notebooks using Debug Feature
-
Debug VS Code Notebooks using Debug Feature
-
Getting Started with Debugging of Python Programs using VS Code
-
Recap of running File Format Converter application
-
Debug Python Application using VS Code with breakpoints
-
Managing Breakpoints for Debugging in VS Code
-
Conclusion to Troubleshooting and Debugging Python Issues
-
-
-
Introduction to Performance of Python Applications
-
Setup Database Loader Python Application
-
Ensure Postgres Database is setup for file to db loader Python Application
-
Cleanup the tables to run file to db loader application
-
Run and Validate File to DB Loader Application
-
Fix the error message in file to db loader application
-
Overview of Execution of file to db loader application
-
Performance Tuning using Chunksize in Pandas
-
Review Pandas Data Frame API to load data into the target table
-
Overview of multi or batch insert into Database Tables
-
Develop application for multiprocessing
-
Getting Started with Multiprocessing using Python
-
Invoking User Defined Functions using multiprocessing in Python
-
Refactor File to Database Loader Application for Multiprocessing
-
Add Parallel Processing to file to db loader Python Application
-
Validate File to DB Loader Application with Multiprocessing
-
Understanding the concept of Multiprocessing in Python
-
Performance Tuning Scenarios of Python Applications
-
-
-
Project 1 Handout - File Format Converter
-
Get File Names to be processed using glob
-
Get Column Names using Schemas File
-
Get Data Set Names from File Names or Paths using regular expressions
-
Read CSV Data into Pandas Dataframe with Schema Dynamically
-
Generate File Paths for Target JSON Files Dynamically
-
Recap of Writing Pandas Dataframe to JSON File
-
Write Pandas Dataframe to JSON Files
-
Modularize File Format Converter for Dataset
-
Wrapper to Process all Data Sets
-
Setup Project for File Format Converter using Python
-
Install Dependencies for the Python Project using pip
-
Add Core Logic to Python Application
-
Overview of Run-time Arguments and Environment Variables
-
Using Run Time Arguments in Python Applications
-
Overview of Environment Variables
-
Setting Environment Variables on Windows or Mac or Linux
-
Use Environment Variables in Python Applications
-
Use Environment Variables in File Format Converter
-
Pass JSON Array as argument to Python Applications
-
Pass Data Sets as Run Time Arguments to File Format Converter
-
Exception Handling in Python Applications
-
Raising Exceptions in Python Applications
-
Exception Handling in File Format Converter Application
-
-
-
Project 2 Handout - Files To Database Loader
-
Install Python Dependencies for Pandas and Database Integration
-
Run Queries from Notebook using SQL Magic
-
Validate Pandas and SQL Integration
-
Write CSV Data from File to Database Table
-
Write CSV Data from Files to Database Tables in Chunks
-
Overview of Deploying File to DB Loader Project
-
-
-
Project 3 Handout - Rest Payload to the DB Loader Essentials
-
Processing JSON Data - Introduction
-
Overview of JSON
-
JSON Data Types
-
Create JSON String
-
Process JSON String
-
Single JSON Document in Files
-
Multiple JSON Documents in files
-
Process JSON using Pandas
-
Different JSON Formats supported by Pandas
-
Common Use Cases for JSON
-
Write to JSON files using json module
-
Write to JSON files using pandas
-
Overview of REST APIs
-
Using curl command
-
Overview of Postman
-
Getting Started with requests
-
Convert REST Payload to Python Objects
-
Process REST Payload using Collection Operations
-
Process REST Payload using Pandas
-
-
-
Python - Frequently Asked Interview Questions
-
How do you rate yourself in Python?
-
Can you elaborate on your experience in Python?
-
Which Python Libraries or modules have you used?
-
Which library do you use for data processing?
-
If you have to read data from a REST API, which library do you use?
-
What are the different Python collections or Data Structures?
-
What is the difference between list, set, dict and tuple?
-
How do you sort the data in a Python list? What is the purpose of keyword argument key?
-
What is the difference between sort and sorted?
-
What is a Python Virtual Environment and what are the advantages of using one?
-
What is pip? How do you organize and install the required dependent libraries to the current project?
-
How do you check if a file exists in a given path (Hint: using os module)?
-
How can you check the data type of a Python Variable?
-
-
-
Overview of Build and Deploy AWS Lambda Functions
-
-
-
Introduction to Getting Started on Windows with Required Tools
-
Overview of PowerShell on Windows 10 or Windows 11
-
Setup Ubuntu VM on Windows 10 or 11 using wsl
-
Setup Docker Desktop on Windows
-
Validate Docker on Windows using Command Line leveraging PowerShell
-
Review Docker Desktop Resource Configurations
-
Install Visual Studio Code on Windows
-
Install Remote Development Extension Kit for Visual Studio Code
-
Install Python 3.9 and Distutils on Windows using wsl Ubuntu
-
Review Tools Installed for Application Development using Python and AWS Services
-
-
-
Setup Project Folder using Visual Studio Code
-
Ensure Python 3.9 for the Project
-
Create Python Virtual Environment using Python 3.9 for the project
-
Install Required Dependencies for the Project using AWS Services
-
Ensure AWS CLI to interact with AWS Services using AWS CLI Commands
-
Recommendation to use Personal AWS Account for the course
-
-
-
Setup and Login into AWS Account
-
Setup AWS IAM User with Administrator Permissions
-
Configure and Validate AWS CLI
-
Configure AWS CLI with custom profile as default
-
Recap of Date Arithmetic using Python
-
Validate Python boto3 to interact with AWS Services
-
Setup and Validate Jupyter based Interactive Environment
-
Review GHActivity Data Details
-
Download GHActivity Data using requests
-
Review GHActivity Data using Pandas
-
-
-
Managing s3 using Python boto3
-
Overview of AWS DynamoDB
-
Create DynamoDB Table for Job Details
-
Create DynamoDB Table for Job Run Details
-
Recap of Date Arithmetic using Python
-
Get First Run Details to Copy GHActivity Data to AWS s3
-
Get Incremental Load Logic for next file
-
Understand AWS s3 concepts such as buckets and objects
-
Copying or Uploading Files to AWS s3 as objects using Python boto3
-
Writing Python Objects or Data as AWS s3 Objects using boto3
-
Convert Date Time to Integer Unix Epoch using Python
-
Validate Data Copied to AWS s3 and job run details
-
Run and Validate End to End Process
-
-
-
Overview of AWS Lambda and Getting Started using Python 3.9 Runtime
-
Passing Arguments to AWS Lambda and Processing using Python
-
Using Custom Handlers for AWS Lambda Functions using Python 3.9
-
Using AWS Services such as s3 in AWS Lambda Functions
-
Recap of handling permissions using AWS IAM Roles and User Groups
-
Develop AWS Lambda Function to list objects from AWS S3 Bucket
-
Passing Environment Variables to AWS Lambda Functions
-
Customizing Resources such as memory used for AWS Lambda Function
-
Understand Problem Statement for Python Application for AWS
-
Setup Python Project for AWS Lambda using Visual Studio Code
-
Core Logic to upload files to AWS S3 using Python boto3
-
Develop Python Application to upload files to AWS s3 using Python boto3
-
Build Zip File for Python Application to deploy as AWS Lambda Function
-
Deploy Python Application as AWS Lambda Function using Zip File
-
Conclusion and request for rating and feedback
-
-
-
Introduction to Build and Deploy AWS Lambda Function using Zip File
-
Update Application Code with Core logic for Ingestion
-
Overview of Validating User Defined Functions using Python CLI
-
Validate Application using Core Logic to ingest data
-
Add Lambda Handler to ingest data to AWS s3
-
Build Zip File for Python Application to deploy as AWS Lambda Function
-
Upload Python Application Zip File to s3 and deploy as AWS Lambda Function
-
Set Custom Handler and required Environment Variables for AWS Lambda Function
-
Granting Permissions on AWS s3 and DynamoDB to AWS Lambda Function via Role
-
Change Memory and Timeout for AWS Lambda Function and Test
-
Recap and Overview of Monitoring Lambda Functions using CloudWatch
-
Limitations of Deploying AWS Lambda Function using Zip file
-
Automate Build of AWS Lambda Function using Shell Scripts
-
-
-
Introduction to Deploying AWS Lambda Functions using Python Runtime with Layers
-
Create Lambda Function to explore layers
-
Get list of Python Libraries installed in AWS Lambda Runtime
-
Add Existing AWS Layer to Lambda Function using Python runtime
-
Steps to Add and Configure Custom Layers to AWS Lambda Functions
-
Setup Local Environment using AWS Cloud Shell to Create Custom Layer
-
Install Required Dependencies for Lambda Layer for Python Runtime
-
Create Zip File and Upload to s3 with Python dependencies for AWS Lambda Layer
-
Create Lambda Layer using AWS Lambda Console using zip file in AWS s3
-
Configure Lambda Function with Custom Layer for Pandas and Requests
-
Troubleshoot and Fix the issues related to Lambda Layers for AWS Lambda Functions
-
Upload Zip File with Python boto3 to s3 for AWS Lambda Layer
-
Create Lambda Layer with latest version of Python boto3 for AWS Lambda Functions
-
Deploy AWS Lambda Function Sample Application with Layers
-
-
-
Overview of Data Warehousing using Amazon Redshift Serverless
-
-
-
Create Workgroup and Namespace for Amazon Redshift Serverless
-
Overview of Amazon Redshift Serverless Namespaces and Workgroups
-
Quick Preview of Amazon Redshift Serverless Dashboard
-
Validate Amazon Redshift Serverless Workgroup by running a query
-
Enable Public Accessibility to Redshift Serverless Workgroup
-
Understand Redshift Serverless Workgroup Capacity measured in RPUs
-
-
-
Introduction to Setup Redshift Spectrum Database using Redshift Serverless
-
Setup Files in S3 for Glue Catalog and Redshift Spectrum Database Tables
-
Cleanup Glue Catalog Database and Crawler using AWS Glue Console
-
Create Glue Crawler to Setup Glue Catalog Database and Tables for Redshift Spectrum
-
Run Glue Crawler to Create Glue Catalog Database and Tables for Redshift Spectrum
-
Create Redshift Serverless Workgroup and Namespace for Redshift Spectrum
-
Accessing Redshift using Jupyter Based Environment of VS Code
-
Create Database and User for Data Mart using AWS Redshift Query Editor
-
Create Database and User for Data Mart using Jupyter Notebooks
-
Create External Schema in Redshift Database using Glue Catalog Database
-
Validate External Schema Setup using Redshift Query Editor
-
-
-
Introduction to Basic SQL Queries using AWS Redshift SQL
-
Overview of Using WITH Clause in Redshift SQL Queries
-
Overview of Using Views in Redshift SQL Queries
-
Filtering Data using AWS Redshift SQL
-
Filtering Data using Boolean AND in Redshift SQL
-
Filtering Data using LIKE Operator in Redshift SQL
-
Filtering Data using Boolean OR and IN Operators in Redshift SQL
-
Overview of Count and Sum using Redshift SQL
-
Getting Total Average using Redshift SQL
-
Perform Total Aggregations based on Condition using Redshift SQL
-
Get Count and Distinct Count using Redshift SQL
-
Get Sum and Average on Order Item Measures using Redshift SQL
-
Perform Grouped Aggregations using Redshift SQL
-
Filtering on Aggregate Results using HAVING on GROUP BY
-
Overview of Order Of Execution of SQL using Group By and Having
-
Overview of Joins using Redshift Tables
-
-
-
Data Processing using Spark on Databricks
-
-
-
Process Data in DBFS using Databricks Spark SQL
-
Getting Started with Spark SQL Example using Databricks
-
Create Temporary Views using Spark SQL
-
Exercise to create temporary views using Spark SQL
-
Spark SQL Query to compute Daily Product Revenue
-
Save Query Result to DBFS using Spark SQL
-
-
-
Ranking using Spark SQL Windowing Functions
-
Create Temporary View for ranking using Spark SQL Windowing Functions
-
Compute Global Rank using Spark SQL Windowing Functions
-
Compute Ranks Per Key using Spark SQL Windowing Functions
-
Difference Between rank and dense_rank
-
Filter on Ranks using Spark SQL Windowing Functions
-
-
-
Overview of Pyspark Examples on Databricks
-
Process Schema Details in JSON using Pyspark
-
Create Dataframe with Schema from JSON File using Pyspark
-
Transform Data using Spark APIs
-
Get Schema Details for all Data Sets using Pyspark
-
Convert CSV to Parquet with Schema using Pyspark
-
-
-
Overview of Data Processing using Spark on EMR
-
-
-
Create bootstrap script for AWS EMR Cluster
-
Provision Elastic IP for Master Node of AWS EMR Cluster
-
Create AWS EMR Cluster for Development
-
Troubleshooting Issues related to Bootstrap of EMR Cluster
-
Fix Bootstrap Script for AWS EMR Cluster
-
Validate AWS EMR Cluster with Bootstrap Action with updated script
-
Get Cluster Details of AWS EMR Development Cluster using boto3
-
Getting Started with Boto3 to Manage AWS EMR Clusters
-
Set AWS Profile using env file in Visual Studio Code
-
Setup boto3 to explore APIs to manage AWS EMR Clusters
-
Setup Python Virtual Environment as part of VS Code Workspace
-
Associating Elastic IP with AWS EMR Master Node using Boto3
-
Getting Instance Id of the Master Node of AWS EMR Cluster using boto3
-
Setup Notebook Environment for EMR Cluster using IAM User
-
Getting Allocation Id of the Elastic IP using AWS boto3
-
-
-
Open Remote Window on AWS EMR Master Node using VS Code
-
Setup Workspace on AWS EMR Master using Git Repository
-
Best Practices and Advantages of using AWS EMR Cluster for Team Development
-
Install VS Code Extensions in remote Workspace for Python
-
Review Python and Pyspark details on EMR Cluster
-
Running Applications using local and yarn during development
-
Getting Started with Development of Spark Applications on EMR Cluster
-
Create Function for Spark Session
-
Upload Files to AWS s3 for the development using AWS EMR Cluster
-
Develop read logic for the Spark Application
-
Process Data Frame using Spark APIs
-
Write Data to Files using Spark APIs
-
Productionize the Code and setup required data sets for validation
-
Resize the AWS EMR Cluster using Web Console
-
Validate Changes to productionize the Application Code
-
Take the backup and terminate the cluster
-
-
-
Recreate the AWS EMR Cluster to deploy Spark Applications
-
Resize the AWS EMR Cluster to validate application on larger data sets
-
Build Zip File for the Spark Application
-
Setup Code Repository on the AWS EMR Master Node
-
Run Spark Application copied to s3 on EMR using Cluster Deployment Mode
-
Run Spark Application on EMR using Cluster Deployment Mode
-
Validate the Spark Application using zip file and client as deploy mode
-
Validate Spark Application Deployed as Step on AWS EMR Cluster
-
Deploy Spark Application as Step to the AWS EMR Cluster
-
-
-
Update Material related to Managing AWS EMR using Boto3
-
Create AWS EMR Cluster using AWS CLI Command
-
Manage AWS EMR Clusters using AWS CLI Commands
-
Overview of AWS boto3 to Manage AWS EMR Clusters
-
Overview of Run Job Flow API to create AWS EMR Cluster
-
Create AWS EMR Cluster or Job Flow Cluster using AWS Boto3
-
Prepare Data Sets to add Spark Application as Step to AWS EMR Cluster
-
Add Spark Application as Step to AWS EMR Cluster using Boto3
-
Exercise to add Spark Application as Step to EMR Cluster using boto3
-
Terminate the AWS EMR Cluster used for adding Steps
-
Exercise to Create AWS EMR Cluster with Steps for Spark Application
-
-
-
Overview of Orchestration using Step Functions and EMR
-
-
-
Review of Development Environment for AWS Step Functions and EMR
-
Quick Overview of Important Terms of AWS Step Functions
-
Getting Started with EMR based Pipeline using AWS Step Functions
-
Overview of AWS IAM Role associated with State Machine
-
Overview of Creating EMR Cluster using AWS Step Functions
-
Parameters to Create EMR Cluster using AWS Step Functions
-
Attach Permissions to Step Function Role to Create AWS EMR Cluster
-
Add Step to AWS EMR Cluster using AWS Step Function
-
Validate Adding Step to AWS EMR Cluster using Step Functions
-
Validate the execution of State Machine to run Spark Application on AWS EMR Cluster
-
Add Action to State Machine to Terminate the AWS EMR Cluster
-
Terminate AWS EMR Clusters Created to Validate State Machine
-
-
-
Review the current state of AWS EMR based Pipeline or State Machine
-
Create State Machine using AWS Step Function to Validate s3
-
Attach Policy with Permissions on AWS s3 to Step Function Role
-
Setup File in AWS s3 and Validate State Machine to list objects
-
Relationship between AWS Boto3 and Actions in Step Functions
-
Add State to Delete Object from AWS s3
-
Fix Permissions and Run State Machine to Delete Object from AWS s3
-
Passing Input to States in AWS Step Functions State Machine
-
Setup Multiple Files to Manage AWS s3 Objects using State Machines
-
Process AWS s3 Objects using Map in State Machine
-
Extract Key of AWS s3 Objects using Step Functions Pass
-
Add State to AWS Step Function Delete s3 Object
-
Develop AWS Lambda Function to customise State Machine Data
-
Add AWS Lambda Function to State Machine to Pass s3 Details for delete
-
Add Condition to State Machine to avoid Key Error on AWS s3 List Objects
-
Overview of Map Concurrency in State Machines of AWS Step Functions
-
Invoking AWS Step Function State Machine from Other State Machines
-
Overview of integration of s3 based State Machine with EMR State Machine
-
-
-
Taking back up of AWS Step Functions State Machines
-
Grant Permissions between AWS Step Functions State Machines via IAM Role
-
Update AWS Step Function State Machine with EMR to validate s3
-
Pass EMR Step Details to AWS Step Functions State
-
Validate AWS Step Function EMR based State Machine Execution
-
Run AWS Step Function State Machine to validate logic to delete AWS s3 Objects
-
Exercise to add validation of source s3 location in AWS Step Function State Machine
-
Update AWS Step Function State Machine to Validate Source s3 Location
-
Run AWS Step Function State Machine with source s3 Validation Logic
-
Develop AWS Lambda Function to check number of files in source s3
-
Attach Policy to State Machine Role to Invoke AWS Lambda Function
-
Run Updated State Machine to validate source count
-
Best Practices to Run AWS Step Functions State Machines
-
-
-
Setup AWS EMR Cluster to develop applications using Spark SQL
-
Setup Visual Studio Code Workspace using AWS EMR Master Node
-
Update PYTHONPATH to access Pyspark Libraries or Modules on AWS EMR Master Node
-
Setup Required Data Sets for Spark SQL
-
Upload Retail DB Files to AWS s3 using AWS CLI commands
-
Getting Started with Spark SQL and Temporary Views using Spark SQL on AWS EMR Cluster
-
Create Spark SQL Temporary Views for Orders and Order Items
-
Join and Aggregate using Spark SQL on AWS EMR Cluster
-
Write Query Results back to AWS s3 using Spark SQL on AWS EMR Cluster
-
Develop Script using Spark SQL Commands
-
Parameterize Bucket Name in Spark SQL Script
-
Deploy Spark SQL Script in s3 and Run using CLI on AWS EMR Master Node
-
Deploy Spark SQL Script as Step on AWS EMR Cluster
-
Conclusion to Develop Spark SQL Applications on EMR Cluster
-
-
-
Create State Machine to Deploy Spark SQL Script on AWS EMR Cluster
-
Overview of Managing AWS EMR Clusters using Boto3
-
Overview of AWS boto3 to Manage AWS EMR Clusters
-
Create AWS EMR Job Flow Cluster using Python Boto3
-
Add Spark SQL Script as Step to AWS EMR Cluster using Boto3
-
Overview of AWS EMR Waiters using Python Boto3
-
Terminate AWS EMR Cluster using waiters and Python Boto3
-
Overview of AWS Step Functions State Machine to execute Spark SQL on EMR
-
Create State Machine using AWS Step Function to create EMR Cluster
-
Grant Permissions to State Machine via Role to Create AWS EMR Cluster
-
Add Spark SQL Script as Step to AWS EMR Cluster using AWS Step Functions
-
Add Terminate AWS EMR Cluster Step to AWS Step Functions State Machine
-
Pass AWS EMR Step Details as Input to State Machine at Execution Time
-
Validate Spark SQL Script Execution as AWS EMR Step using State Machine
-
-
-
Overview of Integration of Spark and Redshift
-
-
-
Create AWS EC2 Elastic IP and Key Pair for AWS EMR Cluster
-
Create Shell Script for AWS EMR Bootstrap Action to install boto3
-
Create AWS EMR Cluster to integrate with Amazon Redshift
-
Attach Elastic IP to the AWS EMR Master Node and Validate SSH Connectivity
-
Setup Project for AWS EMR and Redshift Integration using VS Code Remote Development
-
Setup Amazon Redshift Serverless Workgroup and Validate Connectivity
-
Connect to Redshift Serverless Workgroup from AWS EMR Master using psql
-
Setup Required Database and User in Amazon Redshift Serverless Workgroup
-
Install Python Library psycopg2 to connect to Redshift Databases using Python
-
Validate Redshift Connectivity using Python from AWS EMR Master Node
-
Create and Validate Redshift Database Tables
-
Create Secret for Redshift Database using AWS Secrets Manager
-
Validate Python Boto3 on Master Node of AWS EMR Cluster
-
Read Secret from AWS Secrets Manager using Python Boto3
-
Validate Redshift Connectivity from Master Node of AWS EMR Cluster
-
Launch Pyspark CLI with Redshift Dependencies on AWS EMR Master Node
-
Validate Redshift Connectivity using Spark on AWS EMR Cluster
-
Develop Code to Validate Spark and Redshift Integration using EMR
-
Setup GHActivity Data in AWS s3
-
Read and Process Data using Pyspark to write into Redshift Table
-
Develop Write Logic to load Spark Dataframe into Redshift Table
-
Validate Spark Load Process to Amazon Redshift Table
-
Understanding AWS s3 Temp Location specified in Spark Applications
-
Conclusion on Integration of AWS EMR with Amazon Redshift
-
-
-
Setup AWS EMR Cluster to develop applications using Spark SQL
-
Setup Visual Studio Code Workspace using AWS EMR Master Node
-
Update PYTHONPATH to access Pyspark Libraries or Modules on AWS EMR Master Node
-
Setup Required Data Sets for Spark SQL
-
Upload Retail DB Files to AWS s3 using AWS CLI commands
-
Getting Started with Spark SQL and Temporary Views using Spark SQL on AWS EMR Cluster
-
Create Spark SQL Temporary Views for Orders and Order Items
-
Join and Aggregate using Spark SQL on AWS EMR Cluster
-
Write Query Results back to AWS s3 using Spark SQL on AWS EMR Cluster
-
Develop Script using Spark SQL Commands
-
Parameterize Bucket Name in Spark SQL Script
-
Deploy Spark SQL Script in s3 and Run using CLI on AWS EMR Master Node
-
Deploy Spark SQL Script as Step on AWS EMR Cluster
-
Conclusion to Develop Spark SQL Applications on EMR Cluster
-
-
-
Introduction to Integration of AWS Lambda Functions and Redshift
-
Setup Redshift Serverless Workgroup and Namespace
-
Setup Workspace for Integration of AWS Lambda Functions and Redshift
-
Validate JSON Data in AWS s3 using Pandas
-
Get Redshift Cluster Details using Python boto3
-
Get Redshift Serverless Details using Python Boto3
-
Run SQL Queries using Redshift Serverless and Python Boto3
-
Capture Redshift Query Results using Python Boto3
-
Create Database and User in Redshift Serverless Namespace
-
Create Table in Redshift Serverless Namespace
-
Overview of Python Boto3 Waiters
-
Run Queries against Redshift Table using Boto3 without credentials
-
Create and Validate Secret using AWS Secrets Manager for Redshift Workgroup
-
Copy Processed Data from AWS s3 into Redshift Table
-
Conclusion on Developing Applications using Redshift and Python Boto3
-
-
-
Overview of Data Pipelines using EMR and Redshift
-
-
-
Introduction to Integration of AWS Lambda Functions and Redshift
-
Getting Started with Lambda Function using boto3
-
Running Lambda Function using AWS Lambda Console
-
Troubleshoot Issues in AWS Lambda Functions using CloudWatch Logs
-
Check Python Boto3 Version in AWS Lambda Function Runtime Environment
-
Overview of adding Lambda Layer to Upgrade Python Boto3 of Lambda Runtime
-
Copy Zip File with Latest Boto3 to AWS S3 for Lambda Layer
-
Create Lambda Layer to Upgrade Python Boto3 of Lambda Runtime
-
Create Function to Copy Data into Redshift Table using boto3
-
Update Lambda Handler to copy data to Redshift Table
-
Grant Permissions on Redshift Secret to AWS Lambda Function via IAM Role
-
Grant Permissions on Redshift Data API to AWS Lambda Function via IAM Role
-
Review Redshift Workgroup and Truncate Table before running Lambda Function
-
Run AWS Lambda Function to Copy Data to Redshift Table
-
Validate Data Copied by AWS Lambda Function in Redshift Table by running queries
-
-
-
Introduction to Data Pipeline using AWS Step Functions with EMR and Redshift
-
Getting Started with State Machines or Data Pipelines using AWS Step Functions
-
Review Execution Details of State Machine or Data Pipeline using AWS Step Functions
-
Manage State Machines using AWS Step Functions State Machines Dashboard
-
Create State Machine with AWS Lambda Function to Copy Data From S3 to Redshift Table
-
Update State Machine with Permissions on Lambda to Copy Data From S3 to Redshift Table
-
Run State Machine with AWS Lambda Function to Copy Data From S3 to Redshift Table
-
Overview of Managing AWS EMR Clusters using Boto3
-
Overview of AWS boto3 to Manage AWS EMR Clusters
-
Create AWS EMR Job Flow Cluster using Python Boto3
-
Add Spark SQL Script as Step to AWS EMR Cluster using Boto3
-
Overview of AWS EMR Waiters using Python Boto3
-
Terminate AWS EMR Cluster using waiters and Python Boto3
-
Overview of AWS Step Functions State Machine to execute Spark SQL on EMR
-
Create State Machine using AWS Step Functions to create EMR Cluster
-
Grant Permissions to State Machine via Role to Create AWS EMR Cluster
-
Add Spark SQL Script as Step to AWS EMR Cluster using AWS Step Functions
-
Add Terminate AWS EMR Cluster Step to AWS Step Functions State Machine
-
Pass AWS EMR Step Details as Input to State Machine at Execution Time
-
Validate Spark SQL Script Execution as AWS EMR Step using State Machine
-
Create Data Pipeline with EMR and Redshift Integration using AWS Step Functions
-
Grant Permissions on AWS EMR to role of State Machine with EMR and Redshift Integration
-
Run AWS Step Functions State Machine with EMR and Redshift Integration
-
Validate AWS State Machine Execution with EMR and Redshift Integration
-
Best Practices to Build State Machines with AWS EMR and Redshift Integration
-
-
-
Overview of Glue Components and Glue Catalog
-
-
-
Introduction - Overview of Glue Components
-
Create Crawler and Catalog Table
-
Analyze Data using Athena
-
Creating S3 Bucket and Role
-
Create and Run the Glue Job
-
Validate using Glue Catalog Table and Athena
-
Create and Run Glue Trigger
-
Create Glue Workflow
-
Run Glue Workflow and Validate
-
-
-
Prerequisites for Glue Catalog Tables
-
Steps for Creating Catalog Tables
-
Download Data Set
-
Upload Data to S3
-
Create Glue Catalog Database - itvghlandingdb
-
Create Glue Catalog Table - ghactivity
-
Running Queries using Athena - ghactivity
-
Crawling Multiple Folders
-
Managing Glue Catalog using AWS CLI
-
Managing Glue Catalog using Python Boto3
-
-
-
Data Analysis using Amazon Athena
-
-
-
Getting Started with Amazon Athena
-
Quick Recap of Glue Catalog Databases and Tables
-
Access Glue Catalog Databases and Tables using Athena Query Editor
-
Create Database and Table using Athena
-
Populate Data into Table using Athena
-
Using CTAS to create tables using Athena
-
Overview of Amazon Athena Architecture
-
Amazon Athena Resources and relationship with Hive
-
Create Partitioned Table using Athena
-
Develop Query for Partitioned Column
-
Insert into Partitioned Tables using Athena
-
Validate Data Partitioning using Athena
-
Drop Athena Tables and Delete Data Files
-
Drop Partitioned Table using Athena
-
Data Partitioning in Athena using CTAS
-
-
-
Amazon Athena using AWS CLI - Introduction
-
Get help and list Athena databases using AWS CLI
-
Managing Athena Workgroups using AWS CLI
-
Run Athena Queries using AWS CLI
-
Get Athena Table Metadata using AWS CLI
-
Run Athena Queries with custom location using AWS CLI
-
Drop Athena table using AWS CLI
-
Run CTAS under Athena using AWS CLI
-
-
-
Amazon Athena using Python boto3 - Introduction
-
Getting Started with Managing Athena using Python boto3
-
List Amazon Athena Databases using Python boto3
-
List Amazon Athena Tables using Python boto3
-
Run Amazon Athena Queries with boto3
-
Review Athena Query Results using boto3
-
Persist Amazon Athena Query Results in Custom Location using boto3
-
Processing Athena Query Results using Pandas
-
Run CTAS against Amazon Athena using Python boto3
-
About this course
- $300.00
- 663 lessons
- 52 hours of video content