ClawHub
Protein Phylogeny
by @billwanttobetop
Comprehensive protein family phylogenetic analysis workflow with quality control, conservation analysis, coevolution network analysis, and publication-ready...
Examples
Input: FASTA file with protein sequences (any family, any size)
Output: Publication-ready report with phylogenetic tree, conservation analysis, coevolution networks, and high-quality figures
Typical workflow:
# 1. Quality control (removes low-quality sequences)
bash scripts/01_quality_control.sh input.fasta output_dir/
# 2. Conservation analysis
bash scripts/02_conservation.sh output_dir/qc/final.fasta output_dir/
# 3. Coevolution analysis
bash scripts/03_coevolution.sh output_dir/qc/final.fasta output_dir/
# 4. Phylogenetic tree
bash scripts/04_phylogeny.sh output_dir/qc/final.fasta output_dir/
# 5. Generate figures
bash scripts/05_visualize.sh output_dir/
# 6. Create report
bash scripts/06_report.sh output_dir/ "Family Name"
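The six stages above can also be chained so that a failure stops the run before later stages consume incomplete output. The following is a minimal sketch, not part of the skill itself: `run_pipeline` and `SCRIPTS_DIR` are hypothetical names introduced for illustration, and the only paths assumed are the ones shown in the workflow above.

```shell
# Hypothetical wrapper around the six stage scripts (illustrative only).
# SCRIPTS_DIR is an assumed variable; it defaults to the scripts/ directory
# used in the workflow above.
SCRIPTS_DIR="${SCRIPTS_DIR:-scripts}"

run_pipeline() {
    local input="$1" out="$2" family="$3"
    local qc="$out/qc/final.fasta"   # QC output path as used in steps 2-4

    # Each stage must succeed before the next one starts; the return code
    # is the number of the stage that failed.
    bash "$SCRIPTS_DIR/01_quality_control.sh" "$input" "$out/" || return 1
    bash "$SCRIPTS_DIR/02_conservation.sh" "$qc" "$out/" || return 2
    bash "$SCRIPTS_DIR/03_coevolution.sh" "$qc" "$out/" || return 3
    bash "$SCRIPTS_DIR/04_phylogeny.sh" "$qc" "$out/" || return 4
    bash "$SCRIPTS_DIR/05_visualize.sh" "$out/" || return 5
    bash "$SCRIPTS_DIR/06_report.sh" "$out/" "$family" || return 6
}
```

Called as `run_pipeline input.fasta output_dir "Family Name"`, a non-zero return code identifies the stage to rerun.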
Tips & Best Practices
Issue: CD-HIT crashes with large datasets
Fix: Split input, process in batches, merge results
Issue: IQ-TREE runs forever
Fix: Use -fast mode or reduce bootstrap replicates
Issue: Figures look pixelated
Fix: Increase DPI in scripts/05_visualize.sh (default 300)
Issue: Report generation fails
Fix: Check all intermediate files exist, rerun failed stages
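Two of the fixes above lend themselves to small helpers. The sketch below is illustrative and not taken from the skill: `split_fasta` and `check_outputs` are hypothetical names, the batch file naming is an assumption, and the only intermediate file known from the workflow is `qc/final.fasta`, so any other expected file has to be supplied by the user.

```shell
# split_fasta INPUT N PREFIX
# Write INPUT into files PREFIX_001.fasta, PREFIX_002.fasta, ... with at most
# N sequences each (hypothetical helper; the naming scheme is an assumption).
split_fasta() {
    local input="$1" per_batch="$2" prefix="$3"
    awk -v n="$per_batch" -v p="$prefix" '
        /^>/ {
            # Start a new batch file every n header lines.
            if (count % n == 0) {
                if (out) close(out)
                out = sprintf("%s_%03d.fasta", p, ++batch)
            }
            count++
        }
        out != "" { print > out }   # skip anything before the first header
    ' "$input"
}

# check_outputs DIR FILE...
# Report every expected file under DIR that is missing or empty.
# Returns non-zero if any file is missing.
check_outputs() {
    local dir="$1" missing=0 f
    shift
    for f in "$@"; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For example, `split_fasta input.fasta 5000 batch` followed by running the clustering step on each `batch_*.fasta` keeps each run small; note that simply concatenating the per-batch results does not remove redundancy between batches. Before the report stage, `check_outputs output_dir qc/final.fasta` confirms the QC output is present.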
TERMINAL
clawhub install protein-phylogeny