π¦ ClawHub
rdkit
by @wu-uk
Cheminformatics toolkit for fine-grained molecular control. SMILES/SDF parsing, descriptors (MW, LogP, TPSA), fingerprints, substructure search, 2D/3D genera...
π Tips & Best Practices
Error Handling
Always check for None when parsing molecules:
mol = Chem.MolFromSmiles(smiles)
if mol is None:
print(f"Failed to parse: {smiles}")
continue
Performance Optimization
Use binary formats for storage:
import picklePickle molecules for fast loading
with open('molecules.pkl', 'wb') as f:
pickle.dump(mols, f)Load pickled molecules (much faster than reparsing)
with open('molecules.pkl', 'rb') as f:
mols = pickle.load(f)
Use bulk operations:
# Calculate fingerprints for all molecules at once
fps = [AllChem.GetMorganFingerprintAsBitVect(mol, 2) for mol in mols]Use bulk similarity calculations
similarities = DataStructs.BulkTanimotoSimilarity(fps[0], fps[1:])
Thread Safety
RDKit operations are generally thread-safe for:
Not thread-safe: MolSuppliers when accessed concurrently.
Memory Management
For large datasets:
# Use ForwardSDMolSupplier to avoid loading entire file
with open('large.sdf') as f:
suppl = Chem.ForwardSDMolSupplier(f)
for mol in suppl:
# Process one molecule at a time
passUse MultithreadedSDMolSupplier for parallel processing
suppl = Chem.MultithreadedSDMolSupplier('large.sdf', numWriterThreads=4)
TERMINAL
clawhub install find-topk-similiar-chemicals-rdkit