Course curriculum

    1. Getting Started with Databricks Community Edition

    2. Set Up and Validate Spark Cluster using Databricks Community Edition

    3. Overview of Databricks Notebooks

    4. Overview of Magic Commands in Databricks Notebooks

    5. Recreating Clusters using Databricks Community Edition

    6. Limitations of Databricks Community Edition

    1. Introduction to Basics of Spark and Spark APIs

    2. Overview of Spark and Distributed Computing

    3. Review Data Sets for Word Count

    4. Quick Revision of Python Collections for Spark RDDs

    5. YouTube - Python for Data Engineering Courses

    6. Overview of Spark RDDs

    7. Overview of Spark Dataframes and Datasets

    8. YouTube - Recommend other videos in description

    9. Overview of APIs to create Spark RDDs

    10. Read Text Data from Files into Spark RDDs

    11. YouTube - Review Videos

    12. Overview of APIs or Functions on Spark RDDs

    13. Previewing the Data in Spark RDDs using Actions

    14. Important Concepts related to Transformations in Spark

    15. Filter Data in Spark RDDs

    16. Row Level Transformations using map on Spark RDDs

    17. Understand the requirements for Word Count

    18. Recap of String Functions in Python

    19. YouTube - Course Recommendations

    20. Overview of Aggregations

    21. Using reduce function on Spark RDDs

    22. Flatten lists in Spark RDDs using flatMap

    23. Understand the Concept of Shuffling with an Example

    24. Word Count using reduceByKey on Spark RDDs

    25. Sort Word Count Results using sortByKey on Spark RDDs

    26. Save Spark RDDs to Text Files

    27. Final Logic to perform Word Count using Spark RDDs

    28. Review Spark Driver logs and Spark UI

    29. Overview of Lazy Evaluation and DAGs in Spark

    30. Next Steps for Data Engineering using Spark

    31. YouTube - Promote Guided Programs

    1. Review Databricks Datasets using fs Commands

    2. Getting Started with Spark Dataframes using Pyspark

    3. Overview of Apache Spark

    4. Compute Item Revenue using Pyspark

    5. Aggregations using Pyspark

    6. Restart Databricks Cluster using Community Edition

    7. DBFS Commands to Manage Files and Folders

    8. Writing Processed Data to DBFS using Spark

    1. Getting Started with Spark SQL using Databricks Community Edition

    2. Review Files and Data using fs Commands

    3. Create External Table using Spark SQL

    4. Review Online Retail Data Set to Practice Spark SQL

    5. Compute Item Revenue using Spark SQL

    6. Compute Invoice Revenue using GROUP BY in Spark SQL

    7. Revision of SQL Syntax

    8. Views and CTEs in Spark SQL

    9. Create Spark Metastore Table for Processed Data

    10. Populate Data into Spark Tables using INSERT

    11. Data Engineering Pipeline using Spark SQL

    1. Introduction to Word Count using Spark Dataframe APIs

    2. Preview Data in Spark Dataframes

    3. Using split and explode on Spark Dataframes

    4. Revision of Standard Transformations

    5. Using groupBy and orderBy for Aggregations in Spark

    6. Final Code for Word Count using Spark Dataframe APIs

    7. Review Driver Logs and Spark UI

    8. Conclusion and Next Steps

    1. Introduction to Word Count using Spark SQL

    2. Create External Table using Spark SQL

    3. Standard Transformations and Functions in Spark SQL

    4. Using split and explode in Spark SQL

    5. GROUP BY and CTE using Spark SQL

    6. Write Data into DBFS using Spark SQL

    7. Final Code for Word Count using Spark SQL

    8. Conclusion and Next Steps

About this course

  • Free
  • 84 lessons
  • 6.5 hours of video content