🦀 ClawHub
SageMaker Training Job
by @zyyhhxx
Submit ML training jobs to AWS SageMaker — package code, upload to S3, launch on GPU/CPU instances, poll status, download artifacts. Use when training machin...
💡 Examples
1. Write a training script
Follow the SageMaker training script contract: read training data from the directory named by the SM_CHANNEL_TRAIN environment variable and
save the model to the directory named by SM_MODEL_DIR. See references/training-scripts.md for templates.
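As a rough illustration of that contract, a minimal script might look like the sketch below. It assumes hyperparameters arrive as command-line flags, as they do in SageMaker framework containers running in script mode. The `run_training` helper, the local fallback directories, and the placeholder `model.json` artifact are hypothetical and stand in for real training code; the templates in references/training-scripts.md may look different.

```python
import argparse
import json
import os
from pathlib import Path


def run_training(train_dir: str, model_dir: str, epochs: int, lr: float) -> Path:
    """Placeholder for real training: records its inputs instead of fitting a model."""
    files = sorted(p.name for p in Path(train_dir).glob("*") if p.is_file())
    Path(model_dir).mkdir(parents=True, exist_ok=True)
    artifact = Path(model_dir) / "model.json"  # hypothetical artifact name
    artifact.write_text(json.dumps({"epochs": epochs, "lr": lr, "files": files}))
    return artifact


def main(argv=None) -> Path:
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--lr", type=float, default=0.001)
    # Ignore flags this sketch does not know about rather than failing on them.
    args, _unknown = parser.parse_known_args(argv)

    # SageMaker sets these inside the container; the fallbacks are for local runs only.
    train_dir = os.environ.get("SM_CHANNEL_TRAIN", "./data/train")
    model_dir = os.environ.get("SM_MODEL_DIR", "./model")
    return run_training(train_dir, model_dir, args.epochs, args.lr)


if __name__ == "__main__":
    print(main())
```

Anything written under SM_MODEL_DIR is what SageMaker packages as the job's model artifact, so the script should save its final model there and nowhere else.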
2. Submit a training job
python3 scripts/sagemaker_train.py \
--job-name my-experiment-001 \
--script ./train.py \
--role arn:aws:iam::ACCOUNT:role/SageMakerRole \
--bucket my-sagemaker-bucket \
--instance-type ml.g5.xlarge \
--spot \
--framework pytorch \
--input-data s3://my-bucket/data/train/ \
--hyperparameters '{"epochs":"50","lr":"0.001"}' \
--output-dir ./results
The script packages your code, uploads it to S3, submits the job, polls until
the job completes, and downloads the model artifacts to --output-dir.
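The packaging and polling steps can be sketched roughly as follows. This is not the code in scripts/sagemaker_train.py: `package_source` and `wait_for_job` are hypothetical helpers, and `client` is assumed to be a boto3 SageMaker client (`boto3.client("sagemaker")`), whose `describe_training_job` response carries a `TrainingJobStatus` field.

```python
import tarfile
import time
from pathlib import Path

# Statuses after which the job will not change state again.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def package_source(script: str, archive: str = "sourcedir.tar.gz") -> Path:
    """Bundle the training script into a gzip tarball ready for upload to S3."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(script, arcname=Path(script).name)
    return Path(archive)


def wait_for_job(client, job_name: str, poll_seconds: float = 30.0, sleep=time.sleep) -> str:
    """Poll describe_training_job until the job reaches a terminal state."""
    while True:
        desc = client.describe_training_job(TrainingJobName=job_name)
        status = desc["TrainingJobStatus"]
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
```

Passing `sleep` as a parameter keeps the polling loop testable without real delays. The upload step in between would typically go through the boto3 S3 client's `upload_file`.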
3. Check cost
# Estimate before running
python3 scripts/sagemaker_cost.py --instance-type ml.g5.xlarge --duration 3600 --spot
# Check actual cost after job completes
python3 scripts/sagemaker_cost.py --job-name my-experiment-001
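The arithmetic behind such an estimate is simple enough to sketch. The helpers below are hypothetical and are not the logic of scripts/sagemaker_cost.py: the hourly rate and the spot discount are inputs you supply (check current AWS pricing for your region), and `BillableTimeInSeconds` is the field that `describe_training_job` reports for a finished job.

```python
def estimate_cost(hourly_usd: float, duration_seconds: float, spot_discount: float = 0.0) -> float:
    """Estimate cost before a run: rate x hours, reduced by an assumed spot discount."""
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    return hourly_usd * (duration_seconds / 3600.0) * (1.0 - spot_discount)


def actual_cost(description: dict, hourly_usd: float, instance_count: int = 1) -> float:
    """Cost of a finished job, computed from a describe_training_job response."""
    # Billable time is wall-clock seconds per instance, so scale by instance count.
    billable = description["BillableTimeInSeconds"]
    return hourly_usd * instance_count * billable / 3600.0
```

For managed spot jobs the billable time already reflects the spot savings, so `actual_cost` takes the on-demand hourly rate in both cases.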
4. List recent jobs
python3 scripts/sagemaker_list.py --max 5
python3 scripts/sagemaker_list.py --status Failed
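Listings like these map fairly directly onto the SageMaker `ListTrainingJobs` API. A sketch, assuming `client` is a boto3 SageMaker client; `list_jobs` is a hypothetical helper and may differ from what scripts/sagemaker_list.py actually does.

```python
from typing import Optional


def list_jobs(client, max_results: int = 5, status: Optional[str] = None) -> list:
    """Return (name, status) pairs for the most recently created training jobs."""
    kwargs = {
        "MaxResults": max_results,
        "SortBy": "CreationTime",
        "SortOrder": "Descending",
    }
    if status is not None:
        kwargs["StatusEquals"] = status  # e.g. "Failed", "Completed", "InProgress"
    response = client.list_training_jobs(**kwargs)
    return [
        (job["TrainingJobName"], job["TrainingJobStatus"])
        for job in response["TrainingJobSummaries"]
    ]
```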
⚙️ Configuration
boto3 Python package installed (pip install boto3). sagemaker recommended.
AWS credentials: aws configure / env vars (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY).
See references/setup.md for exact policies.
clawhub install sagemaker-training-job