
Cluster

by @ckchzh

Perform data clustering analysis using k-means and hierarchical algorithms. Use when you need to group, classify, or segment datasets.

Version: v1.0.0
Downloads: 546
Installs: 1

Install from a terminal:

    clawhub install cluster

About This Skill


name: cluster
version: "1.0.0"
description: "Perform data clustering analysis using k-means and hierarchical algorithms. Use when you need to group, classify, or segment datasets."
author: BytesAgain
homepage: https://bytesagain.com
source: https://github.com/bytesagain/ai-skills
tags: [data, clustering, analysis, machine-learning, k-means, segmentation]

Cluster - Data Clustering Analysis Tool

Cluster is a command-line data clustering analysis tool that supports k-means and hierarchical clustering algorithms. It reads numerical data from CSV/JSONL sources, performs clustering, evaluates cluster quality, and exports results.

Data is stored in ~/.cluster/data.jsonl as JSONL records. Each record represents a clustering run with its parameters, assignments, centroids, and evaluation metrics.

Prerequisites

  • Python 3.8+ with standard library (no external packages required for basic operations)
  • bash shell

Commands

run

Run a clustering algorithm on input data.

Environment Variables:

  • INPUT (required) - Path to input CSV/JSONL file with numerical data
  • K - Number of clusters (default: 3)
  • ALGORITHM - Algorithm to use: kmeans or hierarchical (default: kmeans)
  • MAX_ITER - Maximum iterations for k-means (default: 100)
  • SEED - Random seed for reproducibility

Example:

    INPUT=/path/to/data.csv K=5 ALGORITHM=kmeans bash scripts/script.sh run
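
To make the k-means option concrete, here is a minimal standard-library sketch of Lloyd's algorithm. It is not the tool's actual implementation: the function name `kmeans` and the initialization strategy (sampling k input points as starting centroids) are assumptions for illustration. The `k`, `max_iter`, and `seed` parameters are meant to mirror the K, MAX_ITER, and SEED variables.

```python
import math
import random


def kmeans(points, k=3, max_iter=100, seed=None):
    """Lloyd's algorithm on a list of equal-length numeric tuples (illustrative only)."""
    rng = random.Random(seed)
    # Assumption: start from k input points sampled without replacement.
    # random.sample raises ValueError if k exceeds the number of points.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignments = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments
```

Passing the same `seed` twice should reproduce the same centroids and assignments, which is the usual reason for exposing a seed.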
    

assign

Assign new data points to existing clusters from a previous run.

Environment Variables:

  • RUN_ID (required) - ID of the clustering run to use
  • INPUT (required) - Path to new data points (CSV/JSONL)

Example:

    RUN_ID=abc123 INPUT=/path/to/new_data.csv bash scripts/script.sh assign
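
One plausible way to assign new points is to pick the nearest stored centroid for each one. The sketch below assumes Euclidean distance and a hypothetical helper name `assign_points`. The source does not say how the tool handles runs produced by the hierarchical algorithm, so treat this as an illustration of the k-means case only.

```python
import math


def assign_points(new_points, centroids):
    """Return, for each new point, the index of its nearest centroid (Euclidean)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in new_points
    ]


centroids = [(0.0, 0.0), (10.0, 10.0)]  # hypothetical centroids from an earlier run
print(assign_points([(1.0, 2.0), (9.0, 8.5)], centroids))  # prints [0, 1]
```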
    

centroids

Display or export centroid coordinates for a clustering run.

Environment Variables:

  • RUN_ID (required) - ID of the clustering run
  • FORMAT - Output format: table, json, csv (default: table)

evaluate

Evaluate clustering quality with silhouette score, inertia, and Davies-Bouldin index.

Environment Variables:

  • RUN_ID (required) - ID of the clustering run to evaluate
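
For reference, two of these metrics can be computed with the standard library alone. The sketch below uses the textbook definitions with Euclidean distance; the function names are hypothetical, and the tool's own formulas or edge-case handling may differ. The Davies-Bouldin index is left out to keep the example short.

```python
import math


def inertia(points, assignments, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignments))


def silhouette(points, assignments):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for p, a in zip(points, assignments):
        clusters.setdefault(a, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, a in zip(points, assignments):
        own = clusters[a]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a_i: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a_i = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b_i: smallest mean distance to the members of any other cluster.
        b_i = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for label, other in clusters.items()
            if label != a
        )
        denom = max(a_i, b_i)
        scores.append((b_i - a_i) / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Lower inertia means tighter clusters for a fixed k, while a silhouette close to 1 indicates well-separated clusters and values near 0 indicate overlap.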

visualize

Generate a text-based or ASCII visualization of cluster assignments.

Environment Variables:

  • RUN_ID (required) - ID of the clustering run
  • DIMS - Dimensions to plot, comma-separated (default: first two)
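
A text scatter plot of this kind can be sketched as follows. This is an assumption about what such output might look like, not the tool's renderer: the function name `ascii_scatter`, the grid size, and the choice to draw each point as its cluster ID digit are all illustrative. The `dims` parameter plays the role of DIMS and assumes integer cluster IDs.

```python
def ascii_scatter(points, assignments, dims=(0, 1), width=40, height=12):
    """Render a text scatter plot; each point is drawn as its cluster ID digit."""
    xs = [p[dims[0]] for p in points]
    ys = [p[dims[1]] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Avoid division by zero when all values on an axis are equal.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y, a in zip(xs, ys, assignments):
        col = round((x - x_min) / x_span * (width - 1))
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))  # y grows upward
        grid[row][col] = str(a % 10)  # one character per cell; later points overwrite
    return "\n".join("".join(line) for line in grid)
```

Calling `print(ascii_scatter(points, assignments))` shows the first two dimensions; pass `dims=(0, 2)` to plot the first against the third.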

export

Export clustering results to a file.

Environment Variables:

  • RUN_ID (required) - ID of the run to export
  • OUTPUT - Output file path (default: stdout)
  • FORMAT - Export format: json, csv, jsonl (default: json)

import

Import a previously exported clustering run.

Environment Variables:

  • INPUT (required) - Path to the file to import

config

View or update configuration settings.

Environment Variables:

  • KEY - Configuration key to set
  • VALUE - Configuration value

list

List all stored clustering runs with summary info.

Environment Variables:

  • LIMIT - Maximum runs to display (default: 20)
  • SORT - Sort field: date, k, score (default: date)

stats

Show aggregate statistics across all clustering runs.

help

Display usage information and available commands.

version

Display the current version of the cluster tool.

Data Storage

All clustering runs are stored in ~/.cluster/data.jsonl. Each line is a JSON object with fields:

  • id - Unique run identifier
  • timestamp - ISO 8601 creation time
  • algorithm - Algorithm used
  • k - Number of clusters
  • centroids - List of centroid coordinates
  • assignments - Mapping of data point indices to cluster IDs
  • metrics - Evaluation metrics (silhouette, inertia, etc.)
  • input_file - Source data file path
  • num_points - Number of data points clustered
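
Because the store is plain JSONL, it can be inspected outside the tool. The sketch below reads the file one JSON object per line and looks up a run by its id field. The helper names `load_runs` and `find_run` are hypothetical, and skipping blank lines is an assumption about how tolerant a reader should be.

```python
import json
from pathlib import Path


def load_runs(path=Path.home() / ".cluster" / "data.jsonl"):
    """Read one JSON object per non-empty line; return [] if the file is missing."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def find_run(runs, run_id):
    """Return the first record whose "id" field equals run_id, or None."""
    return next((r for r in runs if r.get("id") == run_id), None)
```

For example, `find_run(load_runs(), "abc123")` would return the record for the run used in the assign example, if such a run exists.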

Configuration

Config is stored in ~/.cluster/config.json. Available keys:

  • default_k - Default number of clusters (default: 3)
  • default_algorithm - Default algorithm (default: kmeans)
  • max_iterations - Default max iterations (default: 100)
  • random_seed - Default random seed (default: 42)
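
A common way to apply such settings is to start from the documented defaults and overlay whatever the file contains. The sketch below assumes config.json is a flat JSON object keyed by the names above; the helper name `load_config` is hypothetical, and the tool may resolve defaults differently.

```python
import json
from pathlib import Path

# Defaults as listed in the Configuration section.
DEFAULTS = {
    "default_k": 3,
    "default_algorithm": "kmeans",
    "max_iterations": 100,
    "random_seed": 42,
}


def load_config(path=Path.home() / ".cluster" / "config.json"):
    """Return DEFAULTS overridden by any keys present in the config file."""
    path = Path(path)
    config = dict(DEFAULTS)  # copy so the module-level defaults stay untouched
    if path.exists():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config
```

With this layering, a file containing only `{"default_k": 5}` changes the cluster count and leaves the other three settings at their defaults.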

Powered by BytesAgain | bytesagain.com | hello@bytesagain.com
