Graph Analysis
- networkx - the standard Python library for graph creation, manipulation, and analysis. Supports directed and undirected graphs as well as multigraphs. Pure Python and not GPU-accelerated by default (use cuGraph for large-scale graphs), but sufficient for most small and medium-sized graph analysis tasks.
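As a minimal sketch of the graph types mentioned above, the following builds a small directed graph and a multigraph with networkx. The node names and the `kind` edge attribute are hypothetical, chosen only for illustration.

```python
import networkx as nx

# Directed graph of hypothetical service dependencies (edge u -> v: u calls v).
G = nx.DiGraph()
G.add_edges_from([
    ("web", "api"),
    ("api", "db"),
    ("api", "cache"),
    ("worker", "db"),
])

# Shortest path by hop count between two nodes.
path = nx.shortest_path(G, source="web", target="db")

# In-degree: how many nodes point directly at each node.
in_degrees = dict(G.in_degree())

# Check that the dependency graph contains no cycles.
acyclic = nx.is_directed_acyclic_graph(G)

# A multigraph keeps parallel edges between the same pair of nodes.
M = nx.MultiGraph()
M.add_edge("a", "b", kind="road")
M.add_edge("a", "b", kind="rail")

print(path, in_degrees["db"], acyclic, M.number_of_edges("a", "b"))
```

Everything here runs on the CPU in pure Python, which is convenient for graphs of this size and is the reason a GPU library becomes attractive for very large ones.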
Explainable AI
| Library | What It Does |
|---|---|
| SHAP | Shapley values. Theoretically grounded, model-agnostic feature attribution. |
| LIME | Local surrogate models. Fits a simple model in the neighborhood of a prediction. |
| ELI5 | Feature weights, permutation importance, text and image debugging. |
| InterpretML | Explainable Boosting Machines plus unified dashboard. |
| Shapash | Business-facing Shapley dashboard with plain-language labels. |
| OmniXAI | Salesforce’s unified XAI library. Multiple methods, one API. |
| explainerdashboard | Interactive Shapley dashboard with classification and regression support. |
SHAP is the default choice for production XAI. It has the strongest theoretical backing (Shapley values from cooperative game theory) and provides dedicated explainers for tree models, deep learning models, and linear models, plus a model-agnostic kernel explainer for everything else.
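To make the game-theory idea concrete, here is a minimal pure-Python sketch that computes exact Shapley values for a single prediction of a toy model. It is not the SHAP library's API or algorithm: `shapley_values`, `toy_model`, and the all-zeros baseline are assumptions made for illustration, and "missing" features are simply replaced by their baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition take their baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi


def toy_model(features):
    # Hypothetical model with one interaction term (a * c).
    a, b, c = features
    return 2.0 * a + 3.0 * b + a * c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction and the baseline prediction, and the interaction term is split evenly between the two features involved. Libraries such as SHAP exist largely because this exact enumeration is infeasible beyond a handful of features.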
Quantitative Finance and Time Series
- quantstats - portfolio analytics in one call: Sharpe ratio, max drawdown, CAGR, rolling statistics, tearsheet generation
- alphalens-reloaded - factor analysis: information coefficient, factor turnover, quantile returns versus benchmark
- tsfresh - automated extraction of ~800 time series features from raw sensor or financial data; pairs well with feature selection pipelines
- scalecast - forecasting pipeline with model comparison, cross-validation, and multiple backends (statsmodels, sklearn, Prophet, neural)
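For readers unfamiliar with the portfolio metrics named above, the following pure-Python sketch shows one common way to define Sharpe ratio, max drawdown, and CAGR from a series of periodic returns. It does not use quantstats, and its conventions are assumptions: 252 trading periods per year, a zero risk-free rate by default, and the sample standard deviation. A library's own definitions may differ in detail.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_ratio(returns, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio from periodic simple returns."""
    excess = [r - risk_free / periods_per_year for r in returns]
    return mean(excess) / stdev(excess) * sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the equity curve (a value <= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def cagr(returns, periods_per_year=252):
    """Compound annual growth rate implied by the return series."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    years = len(returns) / periods_per_year
    return growth ** (1.0 / years) - 1.0


# Hypothetical daily returns, for illustration only.
daily = [0.01, -0.02, 0.015, 0.003, -0.007, 0.012]
print(sharpe_ratio(daily), max_drawdown(daily), cagr(daily))
```

Writing the formulas out once makes it easier to sanity-check the numbers a one-call tearsheet produces.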
Python-First Web Frameworks
For building data-driven dashboards and apps without JavaScript:
- Streamlit - fastest path from Python script to interactive web app; ideal for internal tools and demos
- Reflex (formerly Pynecone) - full-stack Python web apps that compile to a React frontend; more control than Streamlit, more complexity
- Anvil - full-stack framework with drag-and-drop UI builder; runs client-side Python in the browser via Skulpt
MLOps Tooling
- MLflow - experiment tracking, model registry, and model serving. De facto standard for tracking training runs.
- DVC - data version control; treats data files the way Git treats code files. Integrates with Git and stores large files in remote storage such as S3, GCS, or Azure.
- Feast - feature store for serving consistent features to both training pipelines and production models
- Kedro - machine learning pipeline framework that enforces reproducible, testable pipelines; integrates with MLflow through plugins such as kedro-mlflow and can be used alongside DVC
Kedro plus DVC plus MLflow is a coherent MLOps stack: Kedro handles pipeline structure and orchestration, DVC handles data versioning, and MLflow handles experiment tracking.
Curated Library Lists
- ml-tooling/best-of-ml-python - ranked by a project-quality score derived from GitHub and package-manager metrics; a quick way to shortlist libraries in each machine learning category
- ml-tooling/best-of-python - broader Python ecosystem
- ml-tooling/ml-workspace - Docker image with JupyterLab plus a full machine learning stack pre-installed; useful for reproducible research environments