Python Libraries for Data Science: A Practical Reference

Graph Analysis

networkx - the standard Python library for graph creation, manipulation, and analysis. Supports directed graphs, undirected graphs, and multigraphs. Not GPU-accelerated (use cuGraph for large-scale graphs), but covers the large majority of everyday graph analysis tasks.
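A minimal sketch of the three graph types mentioned above, assuming networkx is installed. The node names and edges are made up for illustration.

```python
import networkx as nx

# Undirected graph built from an edge list (node names are illustrative).
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")])

# Unweighted shortest path between two nodes.
path = nx.shortest_path(G, source="a", target="d")

# Degree centrality: each node's degree divided by (n - 1).
centrality = nx.degree_centrality(G)

# Directed graphs and multigraphs share the same basic API.
D = nx.DiGraph([("a", "b"), ("b", "c")])
M = nx.MultiGraph()
M.add_edge("a", "b")
M.add_edge("a", "b")  # parallel edge is kept, not merged

print(path)                                  # ['a', 'c', 'd']
print(max(centrality, key=centrality.get))   # c
print(D.has_edge("b", "a"), M.number_of_edges("a", "b"))
```

The same calls work unchanged on much larger graphs; the point at which a GPU library becomes worthwhile depends on the algorithm and the hardware.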

Explainable AI

Library	What It Does
SHAP	Shapley values. Theoretically grounded, model-agnostic feature attribution.
LIME	Local surrogate models. Fits a simple model in the neighborhood of a prediction.
ELI5	Feature weights, permutation importance, text and image debugging.
InterpretML	Explainable Boosting Machines plus unified dashboard.
Shapash	Business-facing Shapley dashboard with plain-language labels.
OmniXAI	Salesforce’s unified XAI library. Multiple methods, one API.
explainerdashboard	Interactive Shapley dashboard with classification and regression support.

SHAP is a common default choice for production XAI. It has strong theoretical backing (Shapley values from cooperative game theory) and ships dedicated explainers for tree models, deep learning models, and linear models, plus a model-agnostic kernel explainer.
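To make the game-theory idea concrete, here is a brute-force sketch of exact Shapley values for a toy model. It does not use the SHAP library or its API; the function names, the feature names, and the toy value function are all hypothetical. Enumerating every ordering is only feasible for a handful of features, so a production library has to rely on faster or approximate algorithms.

```python
from itertools import permutations

def shapley_values(features, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = frozenset()
        for f in order:
            with_f = present | {f}
            totals[f] += value(with_f) - value(present)
            present = with_f
    return {f: t / len(orderings) for f, t in totals.items()}

# Hypothetical toy "model": output as a function of which features are known.
def toy_value(coalition):
    v = 0.0
    if "age" in coalition:
        v += 10.0
    if "income" in coalition:
        v += 20.0
    if "age" in coalition and "income" in coalition:
        v += 6.0  # interaction term, shared between the two features
    return v

phi = shapley_values(["age", "income", "tenure"], toy_value)
print(phi)  # {'age': 13.0, 'income': 23.0, 'tenure': 0.0}

# Efficiency property: attributions sum to value(all) - value(none).
print(sum(phi.values()))  # 36.0
```

The interaction term of 6 is split evenly between the two features that produce it, and the unused feature receives zero. Those properties are what "theoretically grounded" refers to.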

Quantitative Finance and Time Series

quantstats - portfolio analytics in one call: Sharpe ratio, max drawdown, CAGR, rolling statistics, tearsheet generation
alphalens-reloaded - factor analysis: information coefficient, factor turnover, quantile returns versus benchmark
tsfresh - automated extraction of ~800 time series features from raw sensor or financial data; pairs well with feature selection pipelines
scalecast - forecasting pipeline with model comparison, cross-validation, and multiple backends (statsmodels, sklearn, Prophet, neural)
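For reference, the headline metrics quantstats reports can be sketched from their textbook definitions using only the standard library. This is not quantstats code. The function names, the sample standard deviation, and the annualization convention are assumptions, and a given library may use slightly different conventions.

```python
import math
from statistics import mean, stdev

def cagr(returns, periods_per_year=252):
    """Compound annual growth rate from a list of per-period simple returns."""
    growth = math.prod(1 + r for r in returns)
    years = len(returns) / periods_per_year
    return growth ** (1 / years) - 1

def sharpe(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio; risk_free is an annual rate (assumed convention)."""
    excess = [r - risk_free / periods_per_year for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)

def max_drawdown(returns):
    """Largest peak-to-trough decline of the equity curve, as a negative fraction."""
    peak = equity = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst

# Made-up quarterly returns, so periods_per_year=4 covers exactly one year.
quarterly = [0.10, -0.20, 0.05, 0.10]
print(round(cagr(quarterly, periods_per_year=4), 4))
print(round(sharpe(quarterly, periods_per_year=4), 4))
print(round(max_drawdown(quarterly), 4))
```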

Python-First Web Frameworks

For building data-driven dashboards and apps without JavaScript:

Streamlit - fastest path from Python script to interactive web app; ideal for internal tools and demos
Reflex (formerly Pynecone) - full-stack Python web apps that compile to React; more control than Streamlit, more complexity
Anvil - full-stack framework with drag-and-drop UI builder; runs Python in the browser via Skulpt

MLOps Tooling

MLflow - experiment tracking, model registry, and model serving. De facto standard for tracking training runs.
DVC - data version control; treats data files like git treats code files. Integrates with git, stores large files in S3/GCS/Azure.
Feast - feature store for serving consistent features to both training pipelines and production models
Kedro - machine learning pipeline framework that enforces reproducible, testable pipelines; MLflow integration is available through the kedro-mlflow plugin, and DVC can be used alongside it for data versioning

Kedro plus DVC plus MLflow is a coherent MLOps stack: DVC handles data versioning, MLflow handles experiment tracking, and Kedro handles pipeline structure and orchestration.
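The idea behind "treats data files like git treats code files" can be illustrated without DVC itself. The sketch below is a conceptual stand-in, not DVC's implementation or API. The `track` function, the `.ptr` extension, and the choice of SHA-256 are all assumptions. The large file goes into a content-addressed cache, which could live on remote storage, and only a small pointer file would be committed to git.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path

def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a small pointer file."""
    digest = hashlib.sha256(data_file.read_bytes()).hexdigest()
    cached = cache_dir / digest[:2] / digest[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_file, cached)
    # Hypothetical pointer format: this small file is what git would version.
    pointer = data_file.with_name(data_file.name + ".ptr")
    pointer.write_text(json.dumps({"path": data_file.name, "sha256": digest}))
    return pointer

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data = root / "train.csv"
    data.write_text("x,y\n1,2\n")
    ptr = track(data, root / "cache")
    meta = json.loads(ptr.read_text())
    cached_ok = (root / "cache" / meta["sha256"][:2] / meta["sha256"][2:]).exists()
    print(meta["path"], cached_ok)
```

Changing the data changes the hash in the pointer file, so an ordinary git diff shows that the dataset moved even though git never stores the data.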

Curated Library Lists

ml-tooling/best-of-ml-python - ranked by a project-quality score derived from GitHub and package-manager metrics; a quick way to shortlist well-maintained libraries in each machine learning category
ml-tooling/best-of-python - broader Python ecosystem
ml-tooling/ml-workspace - Docker image with JupyterLab plus a full machine learning stack pre-installed; useful for reproducible research environments