AWS Data Engineer Bootcamp


Course curriculum

1. Overview of SQL Revision for Data Engineering
1. Introduction to SQL Revision for Data Engineering
2. Overview of Application Architecture and RDBMS
3. Overview of Database Technologies and relevance of SQL
4. Overview of Purpose Built Databases
5. Overview of Data Warehouse and Data Lake
6. Usage of RDBMS and Data Warehouse technologies
7. Differences and Similarities between RDBMS and Data Warehouse Technologies
1. Introduction to Setting up Tools for Data Engineering Essentials
2. Setup Git on Windows for Code Versioning
3. Setup VS Code on Windows
4. Setup Python 3.9 on Windows
5. Configure Environment Variable PATH for Python on Windows
6. Overview of learning Python using Python CLI
7. Integrate VSCode with Python on Windows
8. Install Postgres 14 on Windows 11
9. Getting Started with pgAdmin on Windows
10. Getting Started with pgAdmin on Mac
11. Conclusion of Setting up Tools for Data Engineering Essentials
1. Overview of Postgres Database Server and pgAdmin
2. Overview of Database Connection Details
3. Overview of Connecting to External Databases using pgAdmin
4. Create Application Database and User in Postgres Database Server
5. Clone Data Sets from Git Repository for Database Scripts
6. Register Server in pgAdmin using Application Database and User
7. Setup Application Tables and Data in Postgres Database
8. Overview of pgAdmin to write SQL Queries
1. Review Data Model Diagram
2. Define Problem Statement for SQL Queries
3. Filtering Data using SQL Queries
4. Total Aggregations using SQL Queries
5. Group By Aggregations using SQL Queries
6. Order of Execution of SQL Queries
7. Rules and Restrictions to Group and Filter Data in SQL queries
8. Filter Data based on Aggregated Results using Group By and Having
9. Inner Joins using SQL Queries
10. Outer Joins using SQL Queries
11. Filter and Aggregate on Join Results using SQL
12. Overview of Database Views
13. Overview of Common Table Expressions or CTEs
14. Outer Join with Additional Conditions in SQL Queries
15. Explanation about Fix of SQL Queries with Filtering on Outer Join Results
1. Introduction to Cumulative Aggregations and Ranking in SQL Queries
2. Overview of CTAS to create tables based on Query Results
3. Create Tables for Cumulative Aggregations and Ranking
4. Overview of OVER and PARTITION BY Clause in SQL Queries
5. Compute Total Aggregation using OVER and PARTITION BY in SQL Queries
6. Overview of Ranking in SQL
7. Compute Global Ranks using SQL
8. Compute Ranks based on key using SQL
9. Rules and Restrictions to Filter Data based on Ranks in SQL
10. Filtering based on Global Ranks using Nested Queries and CTEs in SQL
11. Filtering based on Ranks per Partition using Nested Queries and CTEs in SQL
12. Create Students table with Data for ranking using SQL
13. Difference between rank and dense rank using SQL
1. Introduction to SQL Troubleshooting and Debugging Guide
2. Overview of Database Connectivity Issues
3. Validate and Setup Telnet on Mac or PC
4. Validate Connectivity to Database Server using telnet
5. Troubleshoot Database Connectivity Issue with Correct Host Details
6. Current Databases and Users in Postgres Database Server
7. Troubleshoot Database Credentials and Permissions Issues
8. Overview of Compilation of SQL Queries
9. Troubleshooting Syntax Errors in SQL Queries
10. Troubleshooting Semantic Errors in SQL Queries
11. Overview of Bugs in SQL Queries
12. Development Best Practices with tips to troubleshoot SQL bugs
13. Develop Initial Solution based on the requirement
14. Identify and Troubleshoot Bugs in SQL Queries
15. Develop Solution using Development Best Practices
1. Introduction to Performance Tuning of SQL Queries
2. Overview of SQL Compilation Process and Explain Plans
3. Generate Explain Plans for SQL Queries
4. Review Tables used for Performance Tuning of SQL Queries
5. Review Data Storage Internals for Tables and Indexes
6. Review key terms used in Explain Plans for SQL Queries
7. Interpret Explain Plans for Basic SQL Queries
8. Review the Common Application Scenarios for Performance Tuning
9. Write SQL Queries for Customer Orders
10. Performance Testing of SQL Queries using Stored Procedure
11. Add Required Indexes to tune performance of SQL Queries
12. Guidelines on adding Indexes on Tables for SQL Queries
13. Interpreting the explain plan for SQL Queries using Indexes
14. Conclusion of Performance Tuning of SQL Queries
1. Simple Exercises for Filtering and Aggregations
2. Exercises on Joins and Aggregations using SQL
1. Solutions for Filtering and Aggregations
2. Solutions for Filtering and Aggregations
3. Validate Data and Review Data Model Diagram
4. Solution for Exercise 1 to get Customer Order Count
5. Solution for Exercise 2 to get Dormant Customers using Outer Join
6. Solution for Exercise 3 to get Revenue Per Customer using Outer Join
7. Solution for Exercise 4 to get Revenue Per Category
8. Solution for Exercise 5 to get Product Count Per Department
1. SQL - Frequently Asked Interview Questions
2. Tips for Technical Questions
3. How would you rate yourself in SQL?
4. What all you have done using SQL?
5. What is the difference between Truncate and Delete?
6. What are the different types of constraints you have used?
7. What is the difference between Primary Key and Unique Constraint?
8. What is the difference between Primary Key and Foreign Key Constraint?
9. Can a table have more than one Unique Constraint?
10. What happens to the data in the child table's foreign key column when data in the parent table is deleted?
11. What all different types of joins you have used?
12. What is the difference between inner join and outer join?
13. What is a full outer join?
14. What is the difference between WHERE and HAVING?
15. What is a view and how is it different from a table?
16. What is CTAS and how can it be used to create a table with structure but no data?
1. Overview of Python Revision for Data Engineering
2. Setup Material - Python Essentials for Data Engineering
1. Setup Visual Studio Code Workspace for Python Application Development
2. Setup Notebook Environment in VS Code Workspace
3. Overview of VS Code Notebook Environment
4. Overview of Cells in VS Code Notebook
5. Defining Functions in VS Code Notebooks
6. Run the Code in VS Code Notebook Cell by Line
7. Constants and Variables in Python
8. Overview of Python Data Types
9. Getting help on Python Variables and Functions
10. Pre-Defined String Manipulation Functions
11. Overview of Python Lists
12. Loops and Conditions in Python
13. User Defined Functions in Python
1. Overview of File IO using Python
2. Read Data from CSV File into Python List
3. Overview of Python Collections
4. Getting Started with Processing Python Lists
5. Overview of Lambda Functions in Python
6. Usage of Lambda Functions
7. Filter Data in Python Lists using filter and lambda
8. Get unique values from list using map and set
9. Sort Python lists using key
10. Overview of JSON Strings and Files
11. Read JSON Strings to Python dicts or lists
12. Read JSON Schemas from file to Python dicts
13. Overview of Processing JSON Data using Python
14. Extract Details from Complex JSON Arrays using Python
15. Sort Data in JSON Arrays using Python
16. Create Function to get Column Details from Schemas JSON File
1. Overview of Pandas for Data Processing
2. Overview of Reading CSV Data using Pandas
3. Read Data from CSV Files to Pandas Dataframes
4. Filter Data in Pandas Dataframe using query
5. Get Count by Status using Pandas Dataframe APIs
6. Get count by Month and Status using Pandas Dataframe APIs
7. Create Dataframes using dynamic column list on CSV Data
8. Performing Inner Join between Pandas Dataframes
9. Perform Aggregations on Join results
10. Sort Data in Pandas Dataframes
11. Overview of Writing Pandas Dataframes to Files
12. Write Pandas Dataframes to JSON Files
1. Introduction to Troubleshooting and Debugging Python issues
2. Guidelines for Troubleshooting and Debugging Python related Issues
3. Overview of Database Connectivity using Python Applications
4. Overview of Database Connectivity using Python
5. Troubleshoot Network Connectivity to the Database Server using telnet
6. Troubleshoot Module Related issues for Database Connectivity using Python
7. Troubleshoot Credentials Related issues for Database Connectivity using Python
8. Overview of Python process to run Python Applications
9. Troubleshooting Compilation Errors in Python
10. Troubleshooting Run Time Errors in Python
11. Overview of Software Development Life Cycle
12. Overview of Unit Testing or Validation of Applications
13. Overview of Debugging VS Code Notebooks using Debug Feature
14. Debug VS Code Notebooks using Debug Feature
15. Getting Started with Debugging of Python Programs using VS Code
16. Recap of running File Format Converter application
17. Debug Python Application using VS Code with breakpoints
18. Managing Breakpoints for Debugging in VS Code
19. Conclusion to Troubleshooting and Debugging Python Issues
1. Introduction to Performance of Python Applications
2. Setup Database Loader Python Application
3. Ensure Postgres Database is setup for file to db loader Python Application
4. Cleanup the tables to run file to db loader application
5. Run and Validate File to DB Loader Application
6. Fix the error message in file to db loader application
7. Overview of Execution of file to db loader application
8. Performance Tuning using Chunksize in Pandas
9. Review Pandas Data Frame API to load data into the target table
10. Overview of multi or batch insert into Database Tables
11. Develop application for multiprocessing
12. Getting Started with Multiprocessing using Python
13. Invoking User Defined Functions using multiprocessing in Python
14. Refactor File to Database Loader Application for Multiprocessing
15. Add Parallel Processing to file to db loader Python Application
16. Validate File to DB Loader Application with Multiprocessing
17. Understanding the concept of Multiprocessing in Python
18. Performance Tuning Scenarios of Python Applications
1. Project 1 Handout - File Format Converter
2. Get File Names to be processed using glob
3. Get Column Names using Schemas File
4. Get Data Set Names from File Names or Paths using regular expressions
5. Read CSV Data into Pandas Dataframe with Schema Dynamically
6. Generate File Paths for Target JSON Files Dynamically
7. Recap of Writing Pandas Dataframe to JSON File
8. Write Pandas Dataframe to JSON Files
9. Modularize File Format Converter for Dataset
10. Wrapper to Process all Data Sets
11. Setup Project for File Format Converter using Python
12. Install Dependencies for the Python Project using pip
13. Add Core Logic to Python Application
14. Overview of Run-time Arguments and Environment Variables
15. Using Run Time Arguments in Python Applications
16. Overview of Environment Variables
17. Setting Environment Variables on Windows or Mac or Linux
18. Use Environment Variables in Python Applications
19. Use Environment Variables in File Format Converter
20. Pass JSON Array as argument to Python Applications
21. Pass Data Sets as Run Time Arguments to File Format Converter
22. Exception Handling in Python Applications
23. Raising Exceptions in Python Applications
24. Exception Handling in File Format Converter Application
1. Project 2 Handout - Files To Database Loader
2. Install Python Dependencies for Pandas and Database Integration
3. Run Queries from Notebook using SQL Magic
4. Validate Pandas and SQL Integration
5. Write CSV Data from File to Database Table
6. Write CSV Data from Files to Database Tables in Chunks
7. Overview of Deploying File to DB Loader Project
1. Project 3 Handout - REST Payload to the DB Loader Essentials
2. Processing JSON Data - Introduction
3. Overview of JSON
4. JSON Data Types
5. Create JSON String
6. Process JSON String
7. Single JSON Document in Files
8. Multiple JSON Documents in files
9. Process JSON using Pandas
10. Different JSON Formats supported by Pandas
11. Common Use Cases for JSON
12. Write to JSON files using json module
13. Write to JSON files using pandas
14. Overview of REST APIs
15. Using curl command
16. Overview of Postman
17. Getting Started with requests
18. Convert REST Payload to Python Objects
19. Process REST Payload using Collection Operations
20. Process REST Payload using Pandas
1. Python - Frequently Asked Interview Questions
2. How would you rate yourself in Python?
3. Can you elaborate your experience in Python?
4. What all Python Libraries or modules you have used?
5. Which library do you use for the data processing?
6. If you have to read the data from REST API, which library do you use?
7. What are the different Python collections or Data Structures?
8. What is the difference between list, set, dict and tuple?
9. How do you sort the data in a Python list? What is the purpose of keyword argument key?
10. What is the difference between sort and sorted?
11. What is Python Virtual Environment and what are the advantages of using Python Virtual Environment?
12. What is pip? How do you organize and install the required dependent libraries to the current project?
13. How do you check if file exists in a given path (Hint: using os module)?
14. How can you check the data type of a Python Variable?
1. Overview of Build and Deploy AWS Lambda Functions
1. Introduction to Getting Started on Windows with Required Tools
2. Overview of PowerShell on Windows 10 or Windows 11
3. Setup Ubuntu VM on Windows 10 or 11 using wsl
4. Setup Ubuntu VM on Windows 10 or 11 using wsl
5. Setup Docker Desktop on Windows
6. Validate Docker on Windows using Command Line leveraging PowerShell
7. Review Docker Desktop Resource Configurations
8. Install Visual Studio Code on Windows
9. Install Remote Development Extension Kit for Visual Studio Code
10. Install Python 3.9 and Distutils on Windows using wsl Ubuntu
11. Review Tools Installed for Application Development using Python and AWS Services
1. Setup Project Folder using Visual Studio Code
2. Ensure Python 3.9 for the Project
3. Create Python Virtual Environment using Python 3.9 for the project
4. Install Required Dependencies for the Project using AWS Services
5. Ensure AWS CLI to interact with AWS Services using AWS CLI Commands
6. Recommendation to use Personal AWS Account for the course
1. Setup and Login into AWS Account
2. Setup AWS IAM User with Administrator Permissions
3. Configure and Validate AWS CLI
4. Configure AWS CLI with custom profile as default
5. Recap of Date Arithmetic using Python
6. Validate Python boto3 to interact with AWS Services
7. Setup and Validate Jupyter based Interactive Environment
8. Review GHActivity Data Details
9. Download GHActivity Data using requests
10. Review GHActivity Data using Pandas
1. Managing s3 using Python boto3
2. Overview of AWS DynamoDB
3. Create DynamoDB Table for Job Details
4. Create DynamoDB Table for Job Run Details
5. Recap of Date Arithmetic using Python
6. Get First Run Details to Copy GHActivity Data to AWS s3
7. Get Incremental Load Logic for next file
8. Understand AWS s3 concepts such as buckets and objects
9. Copying or Uploading Files to AWS s3 as objects using Python boto3
10. Writing Python Objects or Data as AWS s3 Objects using boto3
11. Convert Date Time to Integer Unix Epoch using Python
12. Validate Data Copied to AWS s3 and job run details
13. Run and Validate End to End Process
1. Overview of AWS Lambda and Getting Started using Python 3.9 Runtime
2. Passing Arguments to AWS Lambda and Processing using Python
3. Using Custom Handlers for AWS Lambda Functions using Python 3.9
4. Using AWS Services such as s3 in AWS Lambda Functions
5. Recap of handling permissions using AWS IAM Roles and User Groups
6. Develop AWS Lambda Function to list objects from AWS S3 Bucket
7. Passing Environment Variables to AWS Lambda Functions
8. Customizing Resources such as memory used for AWS Lambda Function
9. Understand Problem Statement for Python Application for AWS
10. Setup Python Project for AWS Lambda using Visual Studio Code
11. Core Logic to upload files to AWS S3 using Python boto3
12. Develop Python Application to upload files to AWS s3 using Python boto3
13. Build Zip File for Python Application to deploy as AWS Lambda Function
14. Deploy Python Application as AWS Lambda Function using Zip File
15. Conclusion and request for rating and feedback
1. Introduction to Build and Deploy AWS Lambda Function using Zip File
2. Update Application Code with Core logic for Ingestion
3. Overview of Validating User Defined Functions using Python CLI
4. Validate Application using Core Logic to ingest data
5. Add Lambda Handler to ingest data to AWS s3
6. Build Zip File for Python Application to deploy as AWS Lambda Function
7. Upload Python Application Zip File to s3 and deploy as AWS Lambda Function
8. Set Custom Handler and required Environment Variables for AWS Lambda Function
9. Granting Permissions on AWS s3 and DynamoDB to AWS Lambda Function via Role
10. Change Memory and Timeout for AWS Lambda Function and Test
11. Recap and Overview of Monitoring Lambda Functions using CloudWatch
12. Limitations of Deploying AWS Lambda Function using Zip file
13. Automate Build of AWS Lambda Function using Shell Scripts
1. Introduction to Deploying AWS Lambda Functions using Python Runtime with Layers
2. Create Lambda Function to explore layers
3. Get list of Python Libraries installed in AWS Lambda Runtime
4. Add Existing AWS Layer to Lambda Function using Python runtime
5. Steps to Add and Configure Custom Layers to AWS Lambda Functions
6. Setup Local Environment using AWS Cloud Shell to Create Custom Layer
7. Install Required Dependencies for Lambda Layer for Python Runtime
8. Create Zip File and Upload to s3 with Python dependencies for AWS Lambda Layer
9. Create Lambda Layer using AWS Lambda Console using zip file in AWS s3
10. Configure Lambda Function with Custom Layer for Pandas and Requests
11. Troubleshoot and Fix the issues related to Lambda Layers for AWS Lambda Functions
12. Upload Zip File with Python boto3 to s3 for AWS Lambda Layer
13. Create Lambda Layer with latest version of Python boto3 for AWS Lambda Functions
14. Deploy AWS Lambda Function Sample Application with Layers
1. Overview of Data Warehousing using Amazon Serverless Redshift
1. Create Workgroup and Namespace for Amazon Redshift Serverless
2. Overview of Amazon Redshift Serverless Namespaces and Workgroups
3. Quick Preview of Amazon Redshift Serverless Dashboard
4. Validate Amazon Redshift Serverless Workgroup by running a query
5. Enable Public Accessibility to Redshift Serverless Workgroup
6. Understand Redshift Serverless Workgroup Capacity measured in RPUs
1. Introduction to Setup Redshift Spectrum Database using Redshift Serverless
2. Setup Files in S3 for Glue Catalog and Redshift Spectrum Database Tables
3. Cleanup Glue Catalog Database and Crawler using AWS Glue Console
4. Create Glue Crawler to Setup Glue Catalog Database and Tables for Redshift Spectrum
5. Run Glue Crawler to Create Glue Catalog Database and Tables for Redshift Spectrum
6. Create Redshift Serverless Workgroup and Namespace for Redshift Spectrum
7. Accessing Redshift using Jupyter Based Environment of VS Code
8. Create Database and User for Data Mart using AWS Redshift Query Editor
9. Create Database and User for Data Mart using Jupyter Notebooks
10. Create External Schema in Redshift Database using Glue Catalog Database
11. Validate External Schema Setup using Redshift Query Editor
1. Introduction to Basic SQL Queries using AWS Redshift SQL
2. Overview of Using WITH Clause in Redshift SQL Queries
3. Overview of Using Views in Redshift SQL Queries
4. Filtering Data using AWS Redshift SQL
5. Filtering Data using Boolean AND in Redshift SQL
6. Filtering Data using LIKE Operator in Redshift SQL
7. Filtering Data using Boolean OR and IN Operators in Redshift SQL
8. Overview of Count and Sum using Redshift SQL
9. Getting Total Average using Redshift SQL
10. Perform Total Aggregations based on Condition using Redshift SQL
11. Get Count and Distinct Count using Redshift SQL
12. Get Sum and Average on Order Item Measures using Redshift SQL
13. Perform Grouped Aggregations using Redshift SQL
14. Filtering on Aggregate Results using HAVING on GROUP BY
15. Overview of Order Of Execution of SQL using Group By and Having
16. Overview of Joins using Redshift Tables
1. Data Processing using Spark on Databricks
1. Process Data in DBFS using Databricks Spark SQL
2. Getting Started with Spark SQL Example using Databricks
3. Create Temporary Views using Spark SQL
4. Exercise to create temporary views using Spark SQL
5. Spark SQL Query to compute Daily Product Revenue
6. Save Query Result to DBFS using Spark SQL
1. Ranking using Spark SQL Windowing Functions
2. Create Temporary View for ranking using Spark SQL Windowing Functions
3. Compute Global Rank using Spark SQL Windowing Functions
4. Compute Ranks Per Key using Spark SQL Windowing Functions
5. Difference Between rank and dense_rank
6. Filter on Ranks using Spark SQL Windowing Functions
1. Overview of Pyspark Examples on Databricks
2. Process Schema Details in JSON using Pyspark
3. Create Dataframe with Schema from JSON File using Pyspark
4. Transform Data using Spark APIs
5. Get Schema Details for all Data Sets using Pyspark
6. Convert CSV to Parquet with Schema using Pyspark
1. Overview of Data Processing using Spark on EMR
1. Create bootstrap script for AWS EMR Cluster
2. Provision Elastic IP for Master Node of AWS EMR Cluster
3. Create AWS EMR Cluster for Development
4. Troubleshooting Issues related to Bootstrap of EMR Cluster
5. Fix Bootstrap Script for AWS EMR Cluster
6. Validate AWS EMR Cluster with Bootstrap Action with updated script
7. Get Cluster Details of AWS EMR Development Cluster using boto3
8. Getting Started with Boto3 to Manage AWS EMR Clusters
9. Set AWS Profile using env file in Visual Studio Code
10. Setup boto3 to explore APIs to manage AWS EMR Clusters
11. Setup Python Virtual Environment as part of VS Code Workspace
12. Associating Elastic Ip with AWS EMR Master Node using Boto3
13. Getting Instance Id of the Master Node of AWS EMR Cluster using boto3
14. Setup Notebook Environment for EMR Cluster using IAM User
15. Getting Allocation Id of the Elastic Ip using AWS boto3
1. Open Remote Window on AWS EMR Master Node using VS Code
2. Setup Workspace on AWS EMR Master using Git Repository
3. Best Practices and Advantages of using AWS EMR Cluster for Team Development
4. Install VSCode Extensions in remote Workspace for Python
5. Review Python and Pyspark details on EMR Cluster
6. Running Applications using local and yarn during development
7. Getting Started with Development of Spark Applications on EMR Cluster
8. Create Function for Spark Session
9. Upload Files to AWS s3 for the development using AWS EMR Cluster
10. Develop read logic for the Spark Application
11. Process Data Frame using Spark APIs
12. Write Data to Files using Spark APIs
13. Productionize the Code and setup required data sets for validation
14. Resize the AWS EMR Cluster using Web Console
15. Validate Changes to productionize the Application Code
16. Take the backup and terminate the cluster
1. Recreate the AWS EMR Cluster to deploy Spark Applications
2. Resize the AWS EMR Cluster to validate application on larger data sets
3. Build Zip File for the Spark Application
4. Setup Code Repository on the AWS EMR Master Node
5. Run Spark Application copied to s3 on EMR using Cluster Deployment Mode
6. Run Spark Application on EMR using Cluster Deployment Mode
7. Validate the Spark Application using zip file and client as deploy mode
8. Validate Spark Application Deployed as Step on AWS EMR Cluster
9. Deploy Spark Application as Step to the AWS EMR Cluster
1. Update Material related to Managing AWS EMR using Boto3
2. Create AWS EMR Cluster using AWS CLI Command
3. Manage AWS EMR Clusters using AWS CLI Commands
4. Overview of AWS boto3 to Manage AWS EMR Clusters
5. Overview of Run Job Flow API to create AWS EMR Cluster
6. Create AWS EMR Cluster or Job Flow Cluster using AWS Boto3
7. Prepare Data Sets to add Spark Application as Step to AWS EMR Cluster
8. Add Spark Application as Step to AWS EMR Cluster using Boto3
9. Exercise to add Spark Application as Step to EMR Cluster using boto3
10. Terminate the AWS EMR Cluster used for adding Steps
11. Exercise to Create AWS EMR Cluster with Steps for Spark Application
1. Overview of Orchestration using Step Functions and EMR
1. Review of Development Environment for AWS Step Functions and EMR
2. Quick Overview of Important Terms of AWS Step Functions
3. Getting Started with EMR based Pipeline using AWS Step Functions
4. Overview of AWS IAM Role associated with State Machine
5. Overview of Creating EMR Cluster using AWS Step Functions
6. Parameters to Create EMR Cluster using AWS Step Functions
7. Attach Permissions to Step Function Role to Create AWS EMR Cluster
8. Add Step to AWS EMR Cluster using AWS Step Function
9. Validate Adding Step to AWS EMR Cluster using Step Functions
10. Validate the execution of State Machine to run Spark Application on AWS EMR Cluster
11. Add Action to State Machine to Terminate the AWS EMR Cluster
12. Terminate AWS EMR Clusters Created to Validate State Machine
1. Review the current state of AWS EMR based Pipeline or State Machine
2. Create State Machine using AWS Step Function to Validate s3
3. Attach Policy with Permissions on AWS s3 to Step Function Role
4. Setup File in AWS s3 and Validate State Machine to list objects
5. Relationship between AWS Boto3 and Actions in Step Functions
6. Add State to Delete Object from AWS s3
7. Fix Permissions and Run State Machine to Delete Object from AWS s3
8. Passing Input to States in AWS Step Functions State Machine
9. Setup Multiple Files to Manage AWS s3 Objects using State Machines
10. Process AWS s3 Objects using Map in State Machine
11. Extract Key of AWS s3 Objects using Step Functions Pass
12. Add State to AWS Step Function Delete s3 Object
13. Develop AWS Lambda Function to customise State Machine Data
14. Add AWS Lambda Function to State Machine to Pass s3 Details for delete
15. Add Condition to State Machine to avoid Key Error on AWS s3 List Objects
16. Overview of Map Concurrency in State Machines of AWS Step Functions
17. Invoking AWS Step Function State Machine from Other State Machines
18. Overview of integration of s3 based State Machine with EMR State Machine
1. Taking back up of AWS Step Functions State Machines
2. Grant Permissions between AWS Step Functions State Machines via IAM Role
3. Update AWS Step Function State Machine with EMR to validate s3
4. Pass EMR Step Details to AWS Step Functions State
5. Validate AWS Step Function EMR based State Machine Execution
6. Run AWS Step Function State Machine to validate logic to delete AWS s3 Objects
7. Exercise to add validation of source s3 location in AWS Step Function State Machine
8. Update AWS Step Function State Machine to Validate Source s3 Location
9. Run AWS Step Function State Machine with source s3 Validation Logic
10. Develop AWS Lambda Function to check number of files in source s3
11. Attach Policy to State Machine Role to Invoke AWS Lambda Function
12. Run Updated State Machine to validate source count
13. Best Practices to Run AWS Step Functions State Machines
1. Setup AWS EMR Cluster to develop applications using Spark SQL
2. Setup Visual Studio Code Workspace using AWS EMR Master Node
3. Update PYTHONPATH to access Pyspark Libraries or Modules on AWS EMR Master Node
4. Setup Required Data Sets for Spark SQL
5. Upload Retail DB Files to AWS s3 using AWS CLI commands
6. Getting Started with Spark SQL and Temporary Views using Spark SQL on AWS EMR Cluster
7. Create Spark SQL Temporary Views for Orders and Order Items
8. Join and Aggregate using Spark SQL on AWS EMR Cluster
9. Write Query Results back to AWS s3 using Spark SQL on AWS EMR Cluster
10. Develop Script using Spark SQL Commands
11. Parameterize Bucket Name in Spark SQL Script
12. Deploy Spark SQL Script in s3 and Run using CLI on AWS EMR Master Node
13. Deploy Spark SQL Script as Step on AWS EMR Cluster
14. Conclusion to Develop Spark SQL Applications on EMR Cluster
1. Create State Machine to Deploy Spark SQL Script on AWS EMR Cluster
2. Overview of Managing AWS EMR Clusters using Boto3
3. Overview of AWS boto3 to Manage AWS EMR Clusters
4. Create AWS EMR Job Flow Cluster using Python Boto3
5. Add Spark SQL Script as Step to AWS EMR Cluster using Boto3
6. Overview of AWS EMR Waiters using Python Boto3
7. Terminate AWS EMR Cluster using waiters and Python Boto3
8. Overview of AWS Step Functions State Machine to execute Spark SQL on EMR
9. Create State Machine using AWS Step Function to create EMR Cluster
10. Grant Permissions to State Machine via Role to Create AWS EMR Cluster
11. Add Spark SQL Script as Step to AWS EMR Cluster using AWS Step Functions
12. Add Terminate AWS EMR Cluster Step to AWS Step Functions State Machine
13. Pass AWS EMR Step Details as Input to State Machine at Execution Time
14. Validate Spark SQL Script Execution as AWS EMR Step using State Machine
1. Overview of Integration of Spark and Redshift
1. Create AWS EC2 Elastic IP and Key Pair for AWS EMR Cluster
2. Create Shell Script for AWS EMR Bootstrap Action to install boto3
3. Create AWS EMR Cluster to integrate with Amazon Redshift
4. Attach Elastic IP to the AWS EMR Master Node and Validate SSH Connectivity
5. Setup Project for AWS EMR and Redshift Integration using VS Code Remote Development
6. Setup Amazon Redshift Serverless Workgroup and Validate Connectivity
7. Connect to Redshift Serverless Workgroup from AWS EMR Master using psql
8. Setup Required Database and User in Amazon Redshift Serverless Workgroup
9. Install Python Library psycopg2 to connect to Redshift Databases using Python
10. Validate Redshift Connectivity using Python from AWS EMR Master Node
11. Create and Validate Redshift Database Tables
12. Create Secret for Redshift Database using AWS Secrets Manager
13. Validate Python Boto3 on Master Node of AWS EMR Cluster
14. Read Secret from AWS Secrets Manager using Python Boto3
15. Validate Redshift Connectivity from Master Node of AWS EMR Cluster
16. Launch Pyspark CLI with Redshift Dependencies on AWS EMR Master Node
17. Validate Redshift Connectivity using Spark on AWS EMR Cluster
18. Develop Code to Validate Spark and Redshift Integration using EMR
19. Setup GHActivity Data in AWS s3
20. Read and Process Data using Pyspark to write into Redshift Table
21. Develop Write Logic to load Spark Dataframe into Redshift Table
22. Validate Spark Load Process to Amazon Redshift Table
23. Understanding AWS s3 Temp Location specified in Spark Applications
24. Conclusion on Integration of AWS EMR with Amazon Redshift
1. Introduction to Integration of AWS Lambda Functions and Redshift
2. Setup Redshift Serverless Workgroup and Namespace
3. Setup Workspace for Integration of AWS Lambda Functions and Redshift
4. Validate JSON Data in AWS s3 using Pandas
5. Get Redshift Cluster Details using Python boto3
6. Get Redshift Serverless Details using Python Boto3
7. Run SQL Queries using Redshift Serverless and Python Boto3
8. Capture Redshift Query Results using Python Boto3
9. Create Database and User in Redshift Serverless Namespace
10. Create Table in Redshift Serverless Namespace
11. Overview of Python Boto3 Waiters
12. Run Queries against Redshift Table using Boto3 without credentials
13. Create and Validate Secret using AWS Secrets Manager for Redshift Workgroup
14. Copy Processed Data from AWS s3 into Redshift Table
15. Conclusion on Developing Applications using Redshift and Python Boto3
1. Overview of Data Pipelines using EMR and Redshift
1. Introduction to Integration of AWS Lambda Functions and Redshift
2. Getting Started with Lambda Function using boto3
3. Running Lambda Function using AWS Lambda Console
4. Troubleshoot issues of AWS Lambda Functions using CloudWatch Logs
5. Check Python Boto3 Version in AWS Lambda Function Run Time Environment
6. Overview of adding Lambda Layer to Upgrade Python Boto3 of Lambda Runtime
7. Copy Zip File with Latest Boto3 to AWS s3 for Lambda Layer
8. Create Lambda Layer to Upgrade Python Boto3 of Lambda Runtime
9. Create Function to Copy Data into Redshift Table using boto3
10. Update Lambda Handler to copy data to Redshift Table
11. Grant Permissions on Redshift Secret to AWS Lambda Function via IAM Role
12. Grant Permissions on Redshift Data API to AWS Lambda Function via IAM Role
13. Review Redshift Workgroup and Truncate Table before running Lambda Function
14. Run AWS Lambda Function to Copy Data to Redshift Table
15. Validate Data Copied by AWS Lambda Function in Redshift Table by running queries
1. Introduction to Data Pipeline using AWS Step Functions with EMR and Redshift
2. Getting Started with State Machines or Data Pipelines using AWS Step Functions
3. Review Execution Details of State Machine or Data Pipeline using AWS Step Functions
4. Manage State Machines using AWS Step Functions State Machines Dashboard
5. Create State Machine with AWS Lambda Function to Copy Data From s3 to Redshift Table
6. Update State Machine with Permissions on Lambda to Copy Data From s3 to Redshift Table
7. Run State Machine with AWS Lambda Function to Copy Data From s3 to Redshift Table
8. Overview of Managing AWS EMR Clusters using Boto3
9. Overview of AWS boto3 to Manage AWS EMR Clusters
10. Create AWS EMR Job Flow Cluster using Python Boto3
11. Add Spark SQL Script as Step to AWS EMR Cluster using Boto3
12. Overview of AWS EMR Waiters using Python Boto3
13. Terminate AWS EMR Cluster using waiters and Python Boto3
14. Overview of AWS Step Functions State Machine to execute Spark SQL on EMR
15. Create State Machine using AWS Step Function to create EMR Cluster
16. Grant Permissions to State Machine via Role to Create AWS EMR Cluster
17. Add Spark SQL Script as Step to AWS EMR Cluster using AWS Step Functions
18. Add Terminate AWS EMR Cluster Step to AWS Step Functions State Machine
19. Pass AWS EMR Step Details as Input to State Machine at Execution Time
20. Validate Spark SQL Script Execution as AWS EMR Step using State Machine
21. Create Data Pipeline with EMR and Redshift Integration using AWS Step Functions
22. Grant Permissions on AWS EMR to role of State Machine with EMR and Redshift Integration
23. Run AWS Step Function State Machine with EMR and Redshift Integration
24. Validate AWS State Machine Execution with EMR and Redshift Integration
25. Best Practices to Build State Machines with AWS EMR and Redshift Integration
1. Overview of Glue Components and Glue Catalog
1. Introduction - Overview of Glue Components
2. Create Crawler and Catalog Table
3. Analyze Data using Athena
4. Creating S3 Bucket and Role
5. Create and Run the Glue Job
6. Validate using Glue Catalog Table and Athena
7. Create and Run Glue Trigger
8. Create Glue Workflow
9. Run Glue Workflow and Validate
1. Prerequisites for Glue Catalog Tables
2. Steps for Creating Catalog Tables
3. Download Data Set
4. Upload data to s3
5. Create Glue Catalog Database - itvghlandingdb
6. Create Glue Catalog Table - ghactivity
7. Running Queries using Athena - ghactivity
8. Crawling Multiple Folders
9. Managing Glue Catalog using AWS CLI
10. Managing Glue Catalog using Python Boto3
1. Data Analysis using Amazon Athena
1. Getting Started with Amazon Athena
2. Quick Recap of Glue Catalog Databases and Tables
3. Access Glue Catalog Databases and Tables using Athena Query Editor
4. Create Database and Table using Athena
5. Populate Data into Table using Athena
6. Using CTAS to create tables using Athena
7. Overview of Amazon Athena Architecture
8. Amazon Athena Resources and relationship with Hive
9. Create Partitioned Table using Athena
10. Develop Query for Partitioned Column
11. Insert into Partitioned Tables using Athena
12. Validate Data Partitioning using Athena
13. Drop Athena Tables and Delete Data Files
14. Drop Partitioned Table using Athena
15. Data Partitioning in Athena using CTAS
1. Amazon Athena using AWS CLI - Introduction
2. Get help and list Athena databases using AWS CLI
3. Managing Athena Workgroups using AWS CLI
4. Run Athena Queries using AWS CLI
5. Get Athena Table Metadata using AWS CLI
6. Run Athena Queries with custom location using AWS CLI
7. Drop Athena table using AWS CLI
8. Run CTAS under Athena using AWS CLI
1. Amazon Athena using Python boto3 - Introduction
2. Getting Started with Managing Athena using Python boto3
3. List Amazon Athena Databases using Python boto3
4. List Amazon Athena Tables using Python boto3
5. Run Amazon Athena Queries with boto3
6. Review Athena Query Results using boto3
7. Persist Amazon Athena Query Results in Custom Location using boto3
8. Processing Athena Query Results using Pandas
9. Run CTAS against Amazon Athena using Python boto3

About this course

$300.00
663 lessons
52 hours of video content

Build AWS Data Engineering Skills, starting today


Module 1 - SQL Revision for Data Engineering

Getting Started with SQL Revision for Data Engineering

Setup Tools for Data Engineering Essentials

Setup Application Tables and Data in Postgres Database

Writing Basic SQL Queries

Cumulative Aggregations and Ranking in SQL Queries

SQL Troubleshooting and Debugging Guide

Performance Tuning of SQL Queries

Exercises for Basic SQL Queries

Solutions for Basic SQL Queries

SQL - Interview Questions

Module 2 - Python Revision for Data Engineering

Getting Started with Python

Python Collections for Data Engineering

Data Processing using Pandas DataFrame APIs

Troubleshooting and Debugging Python Issues

Performance Tuning of Python Applications

Project 1 - File Format Converter using Python

Project 2 - Files to Database Loader

Project 3 - REST Payload to the DB Loader Essentials

Python - Interview Questions

Module 4 - Build and Deploy AWS Lambda Functions

Getting Started on Windows with Required Tools

Setup Development Environment to build Data Pipelines using AWS

Getting Started with AWS and Review Data Sets

Core Logic to Ingest Data from Web Service to AWS s3

Getting Started with AWS Lambda Functions

Build and Deploy AWS Lambda Function using Zip File

Deploy AWS Lambda Functions using Python Runtime with Layers

Module 5 - Data Warehousing using Amazon Serverless Redshift

Getting Started with Amazon Serverless Redshift

Setup Redshift Spectrum Schema using Redshift Serverless

Basic SQL Queries using AWS Redshift SQL

Module 6 - Data Processing using Spark on Databricks

Basic Transformations using Spark SQL

Ranking using Spark SQL Windowing Functions

Getting Started with PySpark Data Frame APIs

Module 7 - Data Processing using Spark on EMR

Setup Development Cluster using AWS EMR

Development Life Cycle using AWS EMR Development Cluster

Deploy Spark Application on AWS EMR Cluster

Manage AWS EMR Clusters using Python Boto3

Module 8 - Orchestration using Step Functions and EMR

Build EMR based Workflows or Pipelines using AWS Step Functions

Develop State Machine using AWS Step Functions to manage s3

Adding s3 Validation Logic to AWS EMR based State Machine

Develop Applications using Spark SQL on AWS EMR Cluster

Develop AWS Step Function to deploy Spark SQL Script on EMR Cluster

Module 9 - Integration of Spark and Redshift

Integration of AWS EMR with Amazon Redshift

Develop Applications using Spark SQL on AWS EMR Cluster

Develop Applications using Redshift and Python boto3

Module 10 - Data Pipelines using EMR and Redshift

Integration of AWS Lambda Functions and Redshift

Data Pipeline using AWS Step Functions with EMR and Redshift

Module 11 - Overview of Glue Components and Glue Catalog

Overview of Glue Components

Deep Dive into Glue Catalog

Module 12 - Data Analysis using Amazon Athena

Overview of Amazon Athena

Amazon Athena using AWS CLI

Amazon Athena using Python boto3
