Alibabacloud Emr Spark Manage
by @sdk-team
Manage the full lifecycle of Alibaba Cloud EMR Serverless Spark workspaces—create workspaces, submit jobs, Kyuubi interactive queries, resource queue scaling...
Pre-check: Aliyun CLI >= 3.3.3 required
> Run aliyun version and confirm that the reported version is 3.3.3 or later. If the CLI is not installed or the version is too old,
> run curl -fsSL https://aliyuncli.alicdn.com/setup.sh | bash to install or update it,
> or see references/cli-installation-guide.md for installation instructions.
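The version pre-check can be scripted. The following is a minimal sketch, not part of the original skill: the helper name `version_ge` is hypothetical, it assumes a `sort` that supports `-V` (version sort), and it assumes the first dotted number printed by `aliyun version` is the CLI version.

```shell
#!/usr/bin/env bash
# Minimum CLI version required by this skill (taken from the pre-check).
REQUIRED_VERSION="3.3.3"

# version_ge A B: succeeds when version A >= version B.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only query the CLI when it is installed, so the script is safe to source.
if command -v aliyun >/dev/null 2>&1; then
  # Assumption: the first dotted number in the output is the CLI version.
  installed="$(aliyun version 2>/dev/null | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1)" || installed=""
  if version_ge "${installed:-0}" "$REQUIRED_VERSION"; then
    echo "aliyun CLI ${installed} satisfies >= ${REQUIRED_VERSION}"
  else
    echo "aliyun CLI ${installed:-unknown} is older than ${REQUIRED_VERSION}; update it"
  fi
else
  echo "aliyun CLI not found; install it first"
fi
```

Version sort is used instead of a plain string comparison so that, for example, 3.10.0 is correctly treated as newer than 3.3.3.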
Pre-check: Aliyun CLI plugin update required
> [MUST] run aliyun configure set --auto-plugin-install true to enable automatic plugin installation.
> [MUST] run aliyun plugin update to ensure that any existing plugins are always up-to-date.
[MUST] CLI User-Agent — Every aliyun CLI command invocation must include:
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage
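One way to avoid forgetting the flag is a small shell wrapper. This is only a sketch: the function name `aliyun_ua` and the variable `SKILL_USER_AGENT` are hypothetical and not defined by the skill itself.

```shell
# User-Agent value required by this skill.
SKILL_USER_AGENT="AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage"

# aliyun_ua: hypothetical wrapper that forwards all arguments to the
# aliyun CLI and appends the required --user-agent flag.
aliyun_ua() {
  aliyun "$@" --user-agent "$SKILL_USER_AGENT"
}

# Usage:
#   aliyun_ua oss ls
```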
1. Credential Configuration
The Alibaba Cloud CLI and SDKs obtain authentication information automatically from the default credential chain, so there is no need to configure credentials explicitly. The chain supports multiple credential sources, including configuration files, environment variables, and instance roles.
The recommended way to configure credentials is through the Alibaba Cloud CLI:
aliyun configure
For more credential configuration methods, refer to Alibaba Cloud CLI Credential Management.
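To see which credential source is likely to be picked up on a machine, a rough check such as the following can help. It is a sketch under assumptions: the function name `credential_source` is hypothetical, the environment variable names `ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET` and the config path `~/.aliyun/config.json` are assumed conventions that should be confirmed against the CLI credential documentation, and instance roles are not detected.

```shell
# credential_source: hypothetical helper that reports which of two common
# credential sources is present. It does not validate the credentials.
credential_source() {
  if [ -n "${ALIBABA_CLOUD_ACCESS_KEY_ID:-}" ] && [ -n "${ALIBABA_CLOUD_ACCESS_KEY_SECRET:-}" ]; then
    # Assumed environment variable names for AccessKey credentials.
    echo "environment"
  elif [ -f "${HOME}/.aliyun/config.json" ]; then
    # Assumed location of the file written by `aliyun configure`.
    echo "config-file"
  else
    echo "none-detected"
  fi
}

# Usage:
#   credential_source
```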
2. Grant Service Roles (Required for First-time Use)
Before using EMR Serverless Spark, you need to grant the account the following two roles (see RAM Permission Policies for details):
| Role Name | Type | Description |
|-----------|------|-------------|
| AliyunServiceRoleForEMRServerlessSpark | Service-linked role | EMR Serverless Spark service uses this role to access your resources in other cloud products |
| AliyunEMRSparkJobRunDefaultRole | Job execution role | Spark jobs use this role to access OSS, DLF and other cloud resources during execution |
> For first-time use, you can grant both roles with one click in the EMR Serverless Spark console, or create them manually in the RAM console.
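Whether the two roles already exist can be checked from the CLI. The sketch below assumes the RAM `GetRole` operation is reachable as `aliyun ram GetRole --RoleName ...` and that the caller has permission to read roles; the function name `check_roles` is hypothetical. A failed lookup may mean the role is missing or simply that the caller lacks read permission.

```shell
SKILL_USER_AGENT="AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage"

# check_roles: hypothetical helper that looks up both roles required by
# EMR Serverless Spark and returns non-zero if either lookup fails.
check_roles() {
  missing=0
  for role in AliyunServiceRoleForEMRServerlessSpark AliyunEMRSparkJobRunDefaultRole; do
    if aliyun ram GetRole --RoleName "$role" --user-agent "$SKILL_USER_AGENT" >/dev/null 2>&1; then
      echo "ok: $role"
    else
      echo "missing or not readable: $role"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_roles || echo "grant the roles in the console first"
```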
3. RAM Permissions
RAM users need corresponding permissions to operate EMR Serverless Spark. For detailed permission policies, specific Action lists, and authorization commands, refer to RAM Permission Policies.
4. OSS Storage
Spark jobs typically need OSS storage for JAR packages, Python scripts, and output data:
# Check for available OSS Buckets
aliyun oss ls --user-agent AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage
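# Once a bucket is chosen, job artifacts can be copied into it. The helper
# below is a sketch, not part of the skill: upload_artifact, the default
# prefix "spark-jobs", and the bucket name in the usage line are all
# hypothetical. It relies on the CLI's `aliyun oss cp` command.

```shell
SKILL_USER_AGENT="AlibabaCloud-Agent-Skills/alibabacloud-emr-spark-manage"

# upload_artifact LOCAL_FILE BUCKET [PREFIX]
# Copies a job artifact (JAR or Python script) to
# oss://BUCKET/PREFIX/<file name> and prints the resulting OSS URI.
upload_artifact() {
  local_file="$1"
  bucket="$2"
  prefix="${3:-spark-jobs}"   # assumed default prefix, not from the original
  target="oss://${bucket}/${prefix}/$(basename "$local_file")"
  aliyun oss cp "$local_file" "$target" --user-agent "$SKILL_USER_AGENT" >/dev/null \
    && echo "$target"
}

# Usage (bucket name is a placeholder):
#   upload_artifact ./target/my-job.jar my-example-bucket
```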