PySpark Note
Spark Env
When setting up the Spark environment, we can set PYSPARK_PYTHON to point at the specific Python interpreter that the Spark executors should use. However, when the environment contains editable libraries (e.g., packages installed with pip install -e), PYSPARK_PYTHON alone sometimes fails to pick them up correctly.
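For reference, PYSPARK_PYTHON is an environment variable exported before the worker starts; a minimal sketch, where the conda environment name and path are assumed example values:
# Point the executors (and optionally the driver) at a specific interpreter;
# the environment path below is an assumed example location
export PYSPARK_PYTHON=/opt/conda/envs/pyspark-env/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/conda/envs/pyspark-env/bin/python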
To work around this, we can use the bash startup script to activate the environment, so that the Spark executors launch with the correct interpreter and its packages.
For example, when using conda:
# Open the bash startup config file
vim ~/.bashrc
# Add the following line to the end of the file to activate the environment
conda activate <env>
# Apply the change to the current shell
source ~/.bashrc
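To confirm the change, a new shell should now resolve the environment's interpreter first (a quick sanity check; the exact output depends on your environment):
# Verify that the activated environment's interpreter is found first
which python
python --version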
After changing the bash startup file, you will need to restart the Spark worker (by running start-worker.sh again); otherwise the change will not take effect, because the already-running worker, and the executors it spawns, keep the old environment.
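A minimal restart sketch, assuming a standalone cluster on a recent Spark version (3.1+, where the worker scripts have these names), SPARK_HOME set, and an example master URL:
# Stop the running worker, then start it again so new executors
# inherit the updated environment; the master URL is an assumed example
$SPARK_HOME/sbin/stop-worker.sh
$SPARK_HOME/sbin/start-worker.sh spark://master-host:7077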