In PySpark, .master() is a method on SparkSession.builder used when creating a SparkSession to specify the cluster manager to which the Spark application will connect. It defines where the application runs, whether locally or on a cluster. Some common values for .master() include:
- "local": Runs Spark locally on a single thread. Useful for testing and debugging.
- "local[n]": Runs Spark locally with n threads. This allows for some level of parallelism on a local machine.
- "local[*]": Runs Spark locally with as many worker threads as there are logical cores on the machine.
- "spark://host:port": Connects to a standalone Spark cluster at the specified host and port.
- "yarn": Connects to a Hadoop YARN cluster.
- "mesos://host:port": Connects to an Apache Mesos cluster.
The .master() value is set on SparkSession.builder when creating a SparkSession object. For instance:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("My PySpark App") \
    .master("local[*]") \
    .getOrCreate()
```
In this example, `"local[*]"` tells Spark to run locally, using as many threads as there are available cores.