SageMaker Training Job

by @zyyhhxx

Submit ML training jobs to AWS SageMaker — package code, upload to S3, launch on GPU/CPU instances, poll status, download artifacts. Use when training machin...

Version v1.0.2
💡 Examples

1. Write a training script

Follow the SageMaker training script contract: read training data from the directory named by the SM_CHANNEL_TRAIN environment variable, and save the model to the directory named by SM_MODEL_DIR. See references/training-scripts.md for templates.
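As a rough illustration of that contract (not one of the templates in references/training-scripts.md), the sketch below uses only the standard library. The "model" is a placeholder JSON file, the function names `train` and `main` are hypothetical, and it assumes hyperparameters arrive as command-line arguments such as `--epochs 50`, which is how SageMaker framework containers usually pass them.

```python
import argparse
import json
import os
from pathlib import Path


def train(train_dir, model_dir, epochs, lr):
    """Placeholder training step: records its inputs instead of fitting a model."""
    # List the files SageMaker copied into the "train" channel directory.
    files = sorted(p.name for p in Path(train_dir).iterdir() if p.is_file())
    model = {"epochs": epochs, "lr": lr, "train_files": files}

    # Anything written under the model directory is packaged as the job's artifact.
    out_dir = Path(model_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "model.json"
    out_path.write_text(json.dumps(model))
    return out_path


def main(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical hyperparameters, matching the names used in the submit example.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--lr", type=float, default=0.001)
    args, _unknown = parser.parse_known_args(argv)

    # Fall back to the conventional container paths when the variables are unset.
    train_dir = os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")
    model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    return train(train_dir, model_dir, args.epochs, args.lr)
```

A real train.py would call `main()` under the usual `if __name__ == "__main__":` guard and replace the body of `train` with actual framework code.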

2. Submit a training job

python3 scripts/sagemaker_train.py \
  --job-name my-experiment-001 \
  --script ./train.py \
  --role arn:aws:iam::ACCOUNT:role/SageMakerRole \
  --bucket my-sagemaker-bucket \
  --instance-type ml.g5.xlarge \
  --spot \
  --framework pytorch \
  --input-data s3://my-bucket/data/train/ \
  --hyperparameters '{"epochs":"50","lr":"0.001"}' \
  --output-dir ./results

The script packages your code, uploads it to S3, submits the job, polls until the job completes, and downloads the model artifacts to --output-dir.
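The packaging and polling steps can be sketched as follows. This is not the source of scripts/sagemaker_train.py, which is not shown here; `package_script` and `wait_for_job` are hypothetical helpers. The only AWS call assumed is the boto3 SageMaker client's `describe_training_job`, whose response carries a `TrainingJobStatus` field.

```python
import tarfile
import time
from pathlib import Path

# Statuses after which a training job will not change again.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def package_script(script_path, archive_path):
    """Bundle a single training script into a .tar.gz ready for upload to S3."""
    script = Path(script_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store the script at the archive root, without its local directory.
        tar.add(script, arcname=script.name)
    return Path(archive_path)


def wait_for_job(client, job_name, poll_seconds=30.0, sleep=time.sleep):
    """Poll describe_training_job until the job reaches a terminal status."""
    while True:
        desc = client.describe_training_job(TrainingJobName=job_name)
        if desc["TrainingJobStatus"] in TERMINAL_STATUSES:
            return desc
        sleep(poll_seconds)
```

In practice `client` would be `boto3.client("sagemaker")`. The `sleep` parameter exists only so the loop can be exercised without real waiting.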

3. Check cost

# Estimate before running
python3 scripts/sagemaker_cost.py --instance-type ml.g5.xlarge --duration 3600 --spot

# Check actual cost after job completes

python3 scripts/sagemaker_cost.py --job-name my-experiment-001
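The arithmetic behind both checks can be sketched as below. How scripts/sagemaker_cost.py actually looks up prices is not documented here, so the hourly rate and the spot discount are caller-supplied placeholders, and all three function names are hypothetical. The response fields used (`BillableTimeInSeconds`, `TrainingTimeInSeconds`, `ResourceConfig.InstanceCount`) are those returned by `describe_training_job`.

```python
def estimate_cost(hourly_rate_usd, duration_seconds, instance_count=1, spot_discount=0.0):
    """Estimate cost up front from a caller-supplied hourly rate.

    spot_discount is an assumed fraction (0.0 to 1.0); real spot savings vary.
    """
    hours = duration_seconds / 3600
    return hourly_rate_usd * hours * instance_count * (1.0 - spot_discount)


def billed_cost(job_description, hourly_rate_usd):
    """Compute cost from a describe_training_job response.

    BillableTimeInSeconds is wall-clock time, so it is multiplied by the
    number of instances in the training cluster.
    """
    billable_hours = job_description["BillableTimeInSeconds"] / 3600
    instances = job_description["ResourceConfig"]["InstanceCount"]
    return hourly_rate_usd * billable_hours * instances


def spot_savings_percent(job_description):
    """Managed spot savings: (1 - billable time / training time) * 100."""
    billable = job_description["BillableTimeInSeconds"]
    training = job_description["TrainingTimeInSeconds"]
    return (1 - billable / training) * 100
```

For example, `estimate_cost(2.0, 1800)` gives 1.0 for a made-up rate of 2.00 USD per hour; substitute the current published rate for your instance type and region.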

4. List recent jobs

python3 scripts/sagemaker_list.py --max 5
python3 scripts/sagemaker_list.py --status Failed
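For reference, the same two queries can be made directly against the boto3 SageMaker client. The helper below is a sketch, not the implementation of scripts/sagemaker_list.py; `list_recent_jobs` is a hypothetical name, while `list_training_jobs` and its `MaxResults`, `SortBy`, `SortOrder`, and `StatusEquals` parameters belong to the real API.

```python
def list_recent_jobs(client, max_results=5, status=None):
    """Return (name, status) pairs for the most recently created training jobs."""
    params = {
        "MaxResults": max_results,
        "SortBy": "CreationTime",
        "SortOrder": "Descending",
    }
    # Only filter by status when asked, e.g. status="Failed".
    if status is not None:
        params["StatusEquals"] = status

    response = client.list_training_jobs(**params)
    return [
        (job["TrainingJobName"], job["TrainingJobStatus"])
        for job in response["TrainingJobSummaries"]
    ]
```

With `client = boto3.client("sagemaker")`, calling `list_recent_jobs(client, status="Failed")` roughly mirrors the second command above.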

⚙️ Configuration

  • boto3 Python package installed (pip install boto3). The sagemaker package is also recommended.
  • AWS credentials available — EC2 instance profile (recommended), or aws configure / env vars (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  • S3 bucket for training artifacts
  • Two IAM roles configured — see references/setup.md for exact policies:
      - Role A (Caller): SageMaker job management + S3 access + ECR image pull
      - Role B (Execution): S3 data access + CloudWatch logs + ECR images

    clawhub install sagemaker-training-job
