Writing tests for your Spark code using FunSuite
One of the questions most frequently asked on StackOverflow and other forums by data engineers who build their data pipelines using Apache Spark is how to write test cases.
In this write-up, I would like to share my knowledge of writing Apache Spark unit test cases using the FunSuite style provided by ScalaTest.
Here are the Apache Spark and ScalaTest dependencies and versions:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.0",
  "org.apache.spark" %% "spark-sql" % "2.4.0",
  "org.scalatest" %% "scalatest" % "3.0.5" % "test"
)
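Because every Spark test needs a SparkContext and only one can live in a JVM, it usually helps to fork the test JVM and run suites sequentially. A minimal sketch of the extra build.sbt settings I would typically add (assumption: sbt 1.x slash syntax; adjust if you are on an older sbt):

// build.sbt (additions) -- run Spark tests in a forked JVM, one suite at a time
Test / fork := true
Test / parallelExecution := false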
For the sake of keeping the code simple and the concept straight, I have created a small function that wraps textFile. The function takes a SparkSession and the path of the file as a string.
import org.apache.spark.sql.{Dataset, SparkSession}

object ReadAndWrite {

  def readFile(spark: SparkSession,
               locationPath: String): Dataset[String] = {
    spark.read
      .textFile(locationPath)
  }
}
Now let's test this function with the following test cases (the sample fixture file the tests read is shown right after the list):
- Creating a DataFrame from a text file.
- Counts should match the number of records in the text file.
- Data should match sample records in the text file.
- Reading a file of a different format using readFile should throw an exception.
- Reading an invalid file location using readFile should throw an exception.
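The assertions below expect three records, the first of which has the name Michael. That matches the classic people.txt example file shipped with Spark, so a minimal src/test/resources/people.txt fixture would look like this (the exact ages are an assumption, only the count and the first name are asserted on):

Michael, 29
Andy, 30
Justin, 19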
import org.apache.spark.sql.SparkSession
import org.scalatest.{BeforeAndAfterEach, FunSuite}

class UtilitiesTestSpec extends FunSuite with BeforeAndAfterEach {

  private val master = "local"
  private val appName = "ReadFileTest"

  var spark: SparkSession = _

  override def beforeEach(): Unit = {
    spark = SparkSession.builder()
      .appName(appName)
      .master(master)
      .getOrCreate()
  }
test("creating data frame from text file") {
val sparkSession = spark
import sparkSession.implicits._
val peopleDF = ReadAndWrite.readFile(sparkSession,"src/test/resources/people.txt").map(_.split(",")).map(attributes => Person(attributes(0), attributes(1).trim.toInt)).toDF()
peopleDF.printSchema()
}
test("counts should match with number of records in a text file") {
val sparkSession = spark
import sparkSession.implicits._
val peopleDF = ReadAndWrite.readFile(sparkSession,"src/test/resources/people.txt").map(_.split(",")).map(attributes => Person(attributes(0), attributes(1).trim.toInt)).toDF()
peopleDF.printSchema()
assert(peopleDF.count() == 3)
}
test("data should match with sample records in a text file") {
val sparkSession = spark
import sparkSession.implicits._
val peopleDF = ReadAndWrite.readFile(sparkSession,"src/test/resources/people.txt").map(_.split(",")).map(attributes => Person(attributes(0), attributes(1).trim.toInt)).toDF()
peopleDF.printSchema()
assert(peopleDF.take(1)(0)(0).equals("Michael"))
}
test("Reading files of different format using readTextfileToDataSet should throw an exception") {
intercept[org.apache.spark.sql.AnalysisException] {
val sparkSession = spark
import org.apache.spark.sql.functions.col
val df = ReadAndWrite.readFile(sparkSession,"src/test/resources/people.parquet")
df.select(col("name"))
}
}
test("Reading an invalid file location using readTextfileToDataSet should throw an exception") {
intercept[Exception] {
val sparkSession = spark
import org.apache.spark.sql.functions.col
val df = ReadAndWrite.readFile(sparkSession,"src/test/resources/invalid.txt")
df.show()
}
}
  override def afterEach(): Unit = {
    spark.stop()
  }
}

case class Person(name: String, age: Int)
We use local as the Spark master, so everything runs in a single in-process JVM. The beforeEach() and afterEach() overrides create a SparkSession before each test case and stop it after the test case finishes.
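Creating a fresh session per test keeps the tests isolated, but every test then pays the SparkSession startup cost. If your tests don't mutate session state, a common alternative is ScalaTest's BeforeAndAfterAll, which builds one session for the whole suite. A minimal sketch (the class name SharedSessionSpec is just an illustration):

import org.apache.spark.sql.SparkSession
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class SharedSessionSpec extends FunSuite with BeforeAndAfterAll {

  var spark: SparkSession = _

  override def beforeAll(): Unit = {
    // One session shared by every test in this suite.
    spark = SparkSession.builder()
      .appName("ReadFileTest")
      .master("local")
      .getOrCreate()
  }

  override def afterAll(): Unit = {
    spark.stop()
  }
}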
The intercept[Exception] { } block verifies that the code inside it throws the expected exception when given invalid arguments; the test fails if nothing is thrown, or if the thrown exception is not of the expected type.
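intercept also returns the caught exception, so you can additionally assert on its message. A small sketch of that pattern (the exact message wording is version-dependent, so the "Path does not exist" substring is an assumption worth verifying against your Spark version):

val thrown = intercept[org.apache.spark.sql.AnalysisException] {
  ReadAndWrite.readFile(spark, "src/test/resources/invalid.txt")
}
// Spark 2.4 reports missing input paths with this phrase; adjust if it changes.
assert(thrown.getMessage.contains("Path does not exist"))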
You can find the entire code in the GitHub repo.