Configure Apache Spark Application – An Apache Spark application can be configured using properties that are set directly on a SparkConf object, which is then passed to the SparkContext during initialization.
Configure Apache Spark Application using Spark Properties
The following properties (with their descriptions) can be used to tune a Spark application and fit it to the Apache Spark ecosystem. We shall discuss these properties with details and examples:
- Spark Application Name
- Number of Spark Driver Cores
- Spark Driver’s Maximum Result Size
- Spark Driver’s Memory
- Spark Executors’ Memory
- Spark Extra Listeners
- Spark Local Directory
- Log Spark Configuration
- Spark Master
- Deploy Mode of Spark Driver
- Log Application Information
- Spark Driver Supervise Action
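SparkConf is not the only place a property can come from. The Spark configuration documentation describes an order of precedence: values set directly on SparkConf win over flags passed to spark-submit, which in turn win over entries in the spark-defaults.conf file. The following is a minimal sketch of that merge order in plain Java. The class name PropertyPrecedenceSketch and the method resolve are hypothetical and are not part of the Spark API; the property values are made up for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative only: mimics the documented precedence of Spark property sources.
 * This class is hypothetical and does not use or reproduce Spark's own code.
 */
final class PropertyPrecedenceSketch {

	// Later sources overwrite earlier ones, so the weakest source is applied first.
	static Map<String, String> resolve(Map<String, String> defaultsFile,
			Map<String, String> submitFlags,
			Map<String, String> sparkConf) {
		Map<String, String> effective = new LinkedHashMap<>(defaultsFile);
		effective.putAll(submitFlags);
		effective.putAll(sparkConf);
		return effective;
	}

	public static void main(String[] args) {
		// Made-up values standing in for spark-defaults.conf entries
		Map<String, String> defaultsFile = new HashMap<>();
		defaultsFile.put("spark.app.name", "FromDefaultsFile");
		defaultsFile.put("spark.driver.maxResultSize", "1g");

		// Made-up values standing in for spark-submit flags
		Map<String, String> submitFlags = new HashMap<>();
		submitFlags.put("spark.app.name", "FromSubmitFlag");

		// Made-up values standing in for conf.set(...) calls in the application
		Map<String, String> sparkConf = new HashMap<>();
		sparkConf.put("spark.app.name", "SetOnSparkConf");

		Map<String, String> effective = resolve(defaultsFile, submitFlags, sparkConf);
		System.out.println("spark.app.name=" + effective.get("spark.app.name"));
		System.out.println("spark.driver.maxResultSize=" + effective.get("spark.driver.maxResultSize"));
	}
}
```

In this sketch the application name set on SparkConf is the one that survives, while spark.driver.maxResultSize falls through from the defaults file because no stronger source sets it.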
Application Name
Property Name : spark.app.name
Default value : (none)
This is the name you give to your Spark application. The name appears in the Web UI and in the logs, which makes debugging and monitoring easier when multiple Spark applications are running on the same machine or cluster.
Following is an example that sets the Spark application name:
AppConfigureExample.java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
/**
* Configure Apache Spark Application Name
*/
public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}
Output
spark.app.id=local-1501222987079
spark.app.name=SparkApplicationName
spark.driver.host=192.168.1.100
spark.driver.port=44103
spark.executor.id=driver
spark.master=local[2]
Number of Spark Driver Cores
Property Name : spark.driver.cores
Default value : 1
Exception : This property is considered only in cluster mode.
It represents the number of cores that the driver process may use.
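Because the property is considered only in cluster mode, it is usually supplied when the application is submitted to a cluster. The sketch below assembles a spark-submit argument list in plain Java without executing it. The options --master, --deploy-mode, --driver-cores and --class are spark-submit options; the master URL, jar name, and the class and method names of the sketch itself are placeholders assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative only: builds (but does not run) a spark-submit command line
 * that requests driver cores in cluster mode. Hypothetical helper, not Spark API.
 */
final class DriverCoresSubmitSketch {

	static List<String> buildCommand(String masterUrl, int driverCores, String mainClass, String appJar) {
		if (driverCores < 1) {
			throw new IllegalArgumentException("driverCores must be at least 1");
		}
		List<String> cmd = new ArrayList<>();
		cmd.add("spark-submit");
		cmd.add("--master");
		cmd.add(masterUrl);
		// spark.driver.cores is only considered when the driver runs in cluster mode
		cmd.add("--deploy-mode");
		cmd.add("cluster");
		cmd.add("--driver-cores");
		cmd.add(String.valueOf(driverCores));
		cmd.add("--class");
		cmd.add(mainClass);
		cmd.add(appJar);
		return cmd;
	}

	public static void main(String[] args) {
		// Placeholder master URL and jar name, assumed for this example
		List<String> cmd = buildCommand("spark://master-host:7077", 2, "AppConfigureExample", "app.jar");
		System.out.println(String.join(" ", cmd));
	}
}
```

The resulting list could be handed to a java.lang.ProcessBuilder, or simply printed and run from a shell.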
Following is an example that sets the number of Spark driver cores.
AppConfigureExample.java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		conf.set("spark.driver.cores", "2");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}
Output
spark.app.id=local-1501223394277
spark.app.name=SparkApplicationName
spark.driver.cores=2
spark.driver.host=192.168.1.100
spark.driver.port=42100
spark.executor.id=driver
spark.master=local[2]
Driver’s Maximum Result Size
Property Name : spark.driver.maxResultSize
Default value : 1g (meaning 1 GB)
Exception : Minimum 1MB
This is the upper limit on the total size of the serialized results of all partitions for each Spark action (for example, collect). Submitted jobs are aborted if the total size exceeds this limit. Setting it to ‘0’ means there is no upper limit. However, a high limit (or no limit at all) may cause out-of-memory errors in the driver.
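To make the limit concrete, the sketch below converts a size string such as 200m into bytes and checks whether the combined size of some per-partition results would exceed it. It is plain Java for illustration only: it assumes binary units (1m = 1024 * 1024 bytes), handles only the k, m and g suffixes plus a bare 0, and uses hypothetical class and method names. It is not Spark's own parsing or enforcement code.

```java
/**
 * Illustrative only: shows what a limit such as "200m" means in bytes and
 * when a set of partition results would go over it. Hypothetical helper.
 */
final class MaxResultSizeSketch {

	// Converts "200m", "1g", "512k" or "0" into a byte count (binary units).
	static long toBytes(String size) {
		String s = size.trim().toLowerCase();
		if (s.equals("0")) {
			return 0L; // 0 is treated as "no limit" by exceedsLimit below
		}
		if (s.length() < 2) {
			throw new IllegalArgumentException("unsupported size: " + size);
		}
		long multiplier;
		switch (s.charAt(s.length() - 1)) {
			case 'k': multiplier = 1024L; break;
			case 'm': multiplier = 1024L * 1024L; break;
			case 'g': multiplier = 1024L * 1024L * 1024L; break;
			default: throw new IllegalArgumentException("unsupported size: " + size);
		}
		return Long.parseLong(s.substring(0, s.length() - 1)) * multiplier;
	}

	// Returns true when the summed result sizes are larger than the limit.
	static boolean exceedsLimit(long[] partitionResultBytes, long limitBytes) {
		if (limitBytes == 0L) {
			return false; // unlimited
		}
		long total = 0L;
		for (long bytes : partitionResultBytes) {
			total += bytes;
		}
		return total > limitBytes;
	}

	public static void main(String[] args) {
		long mib = 1024L * 1024L;
		// Three partitions returning 80 MiB each: 240 MiB in total
		long[] results = { 80 * mib, 80 * mib, 80 * mib };
		System.out.println("exceeds 200m limit: " + exceedsLimit(results, toBytes("200m"))); // true
		System.out.println("exceeds 1g limit: " + exceedsLimit(results, toBytes("1g")));     // false
	}
}
```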
Following is an example that sets the maximum size of results returned to the Spark driver.
AppConfigureExample.java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		conf.set("spark.driver.maxResultSize", "200m");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}
Output
spark.app.id=local-1501224103438
spark.app.name=SparkApplicationName
spark.driver.host=192.168.1.100
spark.driver.maxResultSize=200m
spark.driver.port=35249
spark.executor.id=driver
spark.master=local[2]
Driver’s Memory Usage
Property Name : spark.driver.memory
Default value : 1g (meaning 1 GB)
Exception : If the Spark application is submitted in client mode, this property must not be set through SparkConf inside the application, because the driver JVM has already started at that point. Set it via the command line option --driver-memory or in the default properties file instead.
This is the amount of memory to use for the driver process, that is, the process in which the SparkContext is initialized. If the driver needs more memory than this, out-of-memory errors may occur in the driver.
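In client mode the driver JVM is already running by the time a SparkConf is built, so the heap it received at launch is what counts. One way to see that value is the standard java.lang.Runtime API. The sketch below is plain Java and does not touch Spark; the class name, the helper methods, and the 600 MiB figure are assumptions made for illustration. Note that the figure reported by the JVM can be somewhat lower than the value requested at launch.

```java
/**
 * Illustrative only: reports the maximum heap the current JVM may use, which is
 * fixed when the JVM starts. Hypothetical helper, not part of the Spark API.
 */
final class DriverHeapCheckSketch {

	// Maximum heap of the running JVM, in MiB
	static long maxHeapMiB() {
		return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
	}

	// True when the running JVM may use at least the requested amount of heap
	static boolean heapAtLeast(long requestedMiB) {
		return maxHeapMiB() >= requestedMiB;
	}

	public static void main(String[] args) {
		long requestedMiB = 600L; // assumed figure, matching a 600m driver memory setting
		System.out.println("JVM max heap (MiB): " + maxHeapMiB());
		System.out.println("at least " + requestedMiB + " MiB available: " + heapAtLeast(requestedMiB));
	}
}
```

Running this with different JVM heap settings shows the reported value changing with the launch options rather than with anything set later in the program.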
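Following is an example that sets the Spark driver’s memory. In a local run such as this one, the property is recorded in the configuration, but the heap of the already running JVM does not change.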
AppConfigureExample.java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		conf.set("spark.driver.memory", "600m");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}
Output
spark.app.id=local-1501225134344
spark.app.name=SparkApplicationName
spark.driver.host=192.168.1.100
spark.driver.memory=600m
spark.driver.port=43159
spark.executor.id=driver
spark.master=local[2]
Conclusion
In this Apache Spark Tutorial, we learned how to configure a Spark application using some of its properties.
