What is map() transformation in PySpark
How to use map() transformation in PySpark
How to use map() transformation using DataFrame in PySpark

 

Convert RDD to DataFrame in PySpark

 

What is createOrReplaceGlobalTempView() function in PySpark
How to use createOrReplaceGlobalTempView() function in PySpark

 

What is createOrReplaceTempView() function in PySpark
How to use createOrReplaceTempView() function in PySpark

 

pyspark.sql.functions.transform() function in PySpark

 

DataFrame.transform() function in PySpark

 

What is collect() function in PySpark

 

What is sample() function in PySpark

 

fill() & fillna() functions in PySpark

Unpivot DataFrame or stack() function in PySpark

pivot() function in PySpark

Left Semi Join in PySpark
Left Anti Join in PySpark
Self Join in PySpark

join() function in PySpark
Inner Join with DataFrame in PySpark
Left Join with DataFrame in PySpark
Right Join with DataFrame in PySpark
Full Join with DataFrame in PySpark

select() function in PySpark

 

GroupBy agg() function in PySpark

 

groupBy() in PySpark (Azure Databricks)

 

unionByName() function in PySpark

 

union() & unionAll() in PySpark

 

orderBy() & sort() in PySpark

 

distinct() & dropDuplicates() function in PySpark

 

like() function or wildcard searching in DataFrame in PySpark?
ilike() function in DataFrame in PySpark?

 

where() and filter() in PySpark?

alias() function in PySpark?
asc() function with sort() in PySpark?
desc() function with sort() in PySpark?
cast() function in PySpark?

when() function in PySpark?
otherwise() function in PySpark?

Column class (pyspark.sql.Column) in PySpark?
How to add a new column in DataFrame?
How to access columns in DataFrame?
How to access struct type columns in DataFrame?

What is Row class in PySpark?
How to use Row Class?
How to create DataFrame using Row object?
Create Nested struct type using Row

What is array() function in PySpark?
How to use array() function in PySpark?
What is array_contains() function in PySpark?
How to use array_contains() function in PySpark?

What is split() function in PySpark?
How to use split() function in PySpark?

What is explode() function in PySpark?

How to use explode() function in PySpark?

MapType Column in DataFrame in PySpark?
How to access the MapType elements in columns?

Array Type Columns in DataFrame in PySpark?
How to access array elements as a new column?
How to access DataFrame columns as an array?

 

What are StructType() & StructField() in PySpark
How to use Complex Data Type/Nested StructType

How to Change DataType using PySpark withColumn() in DataFrame?
How to Update The Value of an Existing Column in DataFrame?
How to Create a Column from an Existing Column in DataFrame?
How to Add a New Column using withColumn() in DataFrame?

How to Rename Column Names in DataFrame

Usage of withColumnRenamed() in PySpark

 

show() method in PySpark to display DataFrame contents in table?
Discuss show() method parameters

How to Write DataFrame into parquet file?
Types of saving mode

How to read parquet file into DataFrame?
How to read multiple parquet files into DataFrame?

How to Write DataFrame into JSON file?
Types of DataFrame write JSON mode
 

How to read single-line JSON file into DataFrame?
How to read multiline JSON file into DataFrame?
How to read multiple JSON files into DataFrame?
How to read JSON files with custom schema into DataFrame?
 

How to Write DataFrame into CSV file?
Types of DataFrame write CSV mode

How to Read multiple CSV files into DataFrame with default schema?
How to Read multiple CSV files into DataFrame with Custom Schema?
How to Read all CSV files in a Folder into DataFrame?

How to Read CSV file into DataFrame with default schema?
How to Read CSV file into DataFrame with Custom Schema?

Create DataFrame with custom schema (columns)
Create DataFrame with custom schema (columns) and data type

  • Compare RDD vs DataFrame
  • Compare features such as launch version, data representation, data format, optimization,
    APIs, schema projection, memory management, and more

What is DataFrame in PySpark?
Why use PySpark DataFrame?
Core Features of DataFrame
Create DataFrame From List, Tuple, Dictionaries & RDD

How to save RDD data as a text file in PySpark

How to apply Filter/Search on RDD

 

Extracting data from RDD using first(), take(), keys(), values() & count() methods

Sorting RDD data using sortByKey() & sortBy() in ascending & descending order

Create RDD using text file data?
Word split from text file data?
Word count from text file data?

  1. What is RDD with Example
  2. Types of RDD Operations
  3. What is DAG With Example
  4. DAG Scheduler with Example
  5. Spark Architecture
  6. Memory Usage 

  1. What is PySpark
  2. PySpark Modules & Packages
  3. Features of PySpark
  4. Advantages of PySpark
  5. Characteristics of PySpark
  6. Disadvantages of PySpark