
17 December 2022

Machine Learning Research Papers and Python Libraries: A Working Reference

A curated reference of machine learning research papers, organized by topic, plus Python libraries for graph analysis, explainable AI, trading, and MLOps. Actively maintained as a working reading list.

Research Papers by Topic

Graph-Based Methods

DeepWalk: Online Learning of Social Representations - the foundational paper for random walk-based graph embeddings
HOPE: Asymmetric Transitivity Preserving Graph Embedding - preserves directed graph asymmetry that DeepWalk misses
Feature Extraction for Graphs - practical overview of hand-crafted graph features
An Overview of Feature Extraction (FE) in Graphs
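
The random-walk idea behind DeepWalk can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy graph and the names `random_walk` and `build_corpus` are hypothetical, and the skip-gram step that turns walks into embeddings is left out.

```python
import random


def random_walk(adj, start, length, rng):
    """Truncated uniform random walk over an adjacency dict."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(adj, walks_per_node, length, seed=0):
    """Start walks_per_node walks from every node, reshuffling node order each pass."""
    rng = random.Random(seed)
    nodes = list(adj)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            corpus.append(random_walk(adj, node, length, rng))
    return corpus


# Toy undirected graph (hypothetical example data).
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
corpus = build_corpus(toy_graph, walks_per_node=2, length=5)
```

Each walk is then treated like a sentence of node "words" and fed to a word-embedding model, so nodes that co-occur on walks end up close together.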

Training Methodology

A Disciplined Approach to Neural Network Hyper-Parameters - Leslie Smith’s follow-up to his cyclical learning rate (CLR) work, introducing the 1cycle policy; a systematic approach to setting learning rate, batch size, momentum, and weight decay that replaces trial-and-error
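
A simplified sketch of a one-cycle learning rate schedule, in the spirit of the paper: ramp the learning rate up linearly, then back down. The function name, `div_factor`, and `warmup_frac` are assumptions for illustration; the paper also anneals the rate further at the end and cycles momentum in the opposite direction, both omitted here.

```python
def one_cycle_lr(step, total_steps, max_lr, div_factor=10.0, warmup_frac=0.5):
    """Piecewise-linear learning rate: ramp up to max_lr, then ramp back down."""
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    min_lr = max_lr / div_factor
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Phase 1: linear increase from min_lr toward max_lr.
        frac = step / warmup_steps
        return min_lr + frac * (max_lr - min_lr)
    # Phase 2: linear decrease from max_lr back toward min_lr.
    frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max_lr - frac * (max_lr - min_lr)


schedule = [one_cycle_lr(s, total_steps=100, max_lr=0.1) for s in range(100)]
```

In practice most frameworks ship a tested scheduler for this; the sketch is only meant to make the shape of the schedule concrete.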

Loss Functions

Comprehensive Survey of Loss Functions in Machine Learning
Focal Loss for Dense Object Detection - RetinaNet’s class-imbalance solution for object detection
Recall Loss for Semantic Segmentation
Regression Based Loss Functions for Time Series Forecasting
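
To make the focal loss entry concrete, here is a minimal single-example sketch of the binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function name is hypothetical; gamma = 2 and alpha = 0.25 are the defaults reported in the RetinaNet paper, and a real implementation would work on logits in batches.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction.

    p is the predicted probability of the positive class, y is the label (0 or 1).
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    # The (1 - p_t) ** gamma factor shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.95, 1)  # confident and correct: heavily down-weighted
hard = binary_focal_loss(0.10, 1)  # confident and wrong: keeps most of its loss
```

With gamma = 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy.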

Knowledge Distillation

Distilling the Knowledge in a Neural Network (Hinton 2015) - the original distillation paper
Survey: Knowledge Distillation
DistilBERT - BERT distilled to a model 40% smaller and 60% faster that retains 97% of its language-understanding performance
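
The core of distillation is matching the student to the teacher's temperature-softened outputs. A minimal sketch, assuming the common KL-divergence formulation (Hinton's paper uses cross-entropy against soft targets, which differs only by a term that does not depend on the student). Function names are hypothetical, and the hard-label loss that is usually added is omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


teacher = [4.0, 1.0, 0.5]
loss_same = distillation_loss(teacher, teacher)
loss_diff = distillation_loss([0.5, 1.0, 4.0], teacher)
```

Raising the temperature flattens both distributions, which exposes the teacher's relative preferences among the wrong classes.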

Mixture of Experts

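Language-Image Mixture of Experts (LIMoE) - multimodal MoE architecture that handles images and text in a single sparse model
LIMoE: Learning Multiple Modalities with One Sparse Mixture-of-Experts Model - Google's blog post on the same model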

Deep Learning for Tabular Data

Revisiting Deep Learning Models for Tabular Data - systematic comparison showing gradient-boosted trees remain highly competitive on tabular tasks, with no universally superior approach; FT-Transformer is the strongest neural baseline on most tasks
TabNet - attention-based tabular model with built-in feature selection

Embeddings

Time2Vec: Learning a Vector Representation of Time - learnable time encoding for time series, with one linear (non-periodic) component and several sinusoidal (periodic) components
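
The Time2Vec mapping itself is short: the first output is a linear function of time, the remaining outputs are sines. A minimal sketch with fixed parameters; in the paper `omega` and `phi` are learned, and the weekly and yearly frequencies below are hypothetical values chosen for illustration.

```python
import math


def time2vec(tau, omega, phi):
    """Map scalar time tau to a vector: first entry linear, the rest sinusoidal."""
    if not omega or len(omega) != len(phi):
        raise ValueError("omega and phi must be non-empty and the same length")
    linear = omega[0] * tau + phi[0]
    periodic = [math.sin(w * tau + p) for w, p in zip(omega[1:], phi[1:])]
    return [linear] + periodic


# Hypothetical fixed parameters: a trend term, a weekly cycle, a yearly cycle.
omega = [0.5, 2 * math.pi / 7, 2 * math.pi / 365]
phi = [0.0, 0.0, 0.0]

vec_day0 = time2vec(0.0, omega, phi)
vec_day7 = time2vec(7.0, omega, phi)
```

After exactly one week the weekly component returns to its starting value, while the linear component keeps growing.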

Attention and Transformers

Neural Machine Translation by Jointly Learning to Align and Translate - the original attention paper (Bahdanau 2014)
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
The Illustrated Transformer - the best visual explanation of the transformer architecture
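
A minimal single-query sketch of attention as a softmax-weighted average of values. Note that this is the scaled dot-product form used in transformers, not the additive scoring function of the Bahdanau paper; the function name and toy vectors are hypothetical.

```python
import math


def scaled_dot_product_attention(query, keys, values):
    """Return (weights, context) for one query over a list of key/value vectors."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, context


weights, context = scaled_dot_product_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
```

The query aligns with the first key, so the first value dominates the context vector.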

Explainable AI

LIME: Explaining the Predictions of Any Classifier
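Integrated Gradients (Axiomatic Attribution for Deep Networks) - gradient-based feature attribution for differentiable models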
Interpretable Machine Learning (Molnar) - the free online book

Sequence Models

Sequence to Sequence Learning with Neural Networks - the Sutskever 2014 seq2seq paper

Python Libraries

Graph Analysis

networkx - standard graph manipulation and analysis library for Python; not GPU-accelerated but excellent for prototyping and small to medium graphs

Explainable AI

Library	Focus
SHAP	Shapley value explanations; model-agnostic and model-specific explainers
LIME	Local surrogate model explanations
ELI5	Model weight inspection, permutation importance
InterpretML	EBM + unified dashboard
Shapash	Business-facing XAI dashboard
explainerdashboard	Interactive Shapley dashboard
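
Permutation importance, one of the techniques in the table, is simple enough to sketch without any library: shuffle one feature column and measure how much the score drops. This is an illustrative stdlib version, not the ELI5 API; the function names, toy model, and data are all hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, score, n_repeats=5, seed=0):
    """Mean drop in score when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def predict(row):
    # Toy model that looks only at feature 0.
    return int(row[0] > 0.5)


X = [[i / 19, (i * 7 % 20) / 19] for i in range(20)]
y = [predict(row) for row in X]
importances = permutation_importance(predict, X, y, accuracy)
```

Shuffling the unused second feature leaves predictions unchanged, so its importance is zero, while shuffling the first feature hurts accuracy.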

Quantitative Finance and Time Series

quantstats - portfolio performance analytics (Sharpe, max drawdown, rolling statistics)
alphalens-reloaded - factor analysis: information coefficient (IC), turnover, quantile returns
tsfresh - automated time series feature extraction (~800 features per series)
scalecast - forecasting pipeline with built-in cross-validation
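
For orientation, the two headline metrics mentioned above can be computed by hand. This is a stdlib sketch, not the quantstats API; the function names are hypothetical, and the conventions (sample standard deviation, 252 periods per year, simple returns) are assumptions that a library may handle differently.

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year=252, risk_free_per_period=0.0):
    """Annualized Sharpe ratio from per-period simple returns."""
    excess = [r - risk_free_per_period for r in returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        return float("nan")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve (0 or negative)."""
    equity = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


# Hypothetical return series.
period_returns = [0.10, -0.50, 0.20]
drawdown = max_drawdown(period_returns)
sharpe = sharpe_ratio([0.01, 0.02, 0.03], periods_per_year=1)
```

Here the equity curve peaks at 1.10 and falls to 0.55, a 50% drawdown, before partially recovering.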

Python-First Web Frameworks

Streamlit - fastest path to a data app; runs a Python script as a web app
Pynecone (now Reflex) - React frontend from pure Python
Anvil - full-stack Python web apps with drag-and-drop UI builder

MLOps

MLflow - experiment tracking, model registry, model serving
DVC - data version control; works alongside git for large data and model files
Feast - feature store for serving features to production models
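Kedro - pipeline framework with a data catalog and its own dataset versioning; MLflow integration is available through the kedro-mlflow plugin, and it can be used alongside DVC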

Best-Of Lists

ml-tooling/best-of-ml-python - ranked list of machine learning Python libraries, scored on metrics collected from GitHub and package managers
ml-tooling/best-of-python - broader Python ecosystem rankings
ml-tooling/ml-workspace - Docker image with JupyterLab plus common Machine Learning stack pre-installed
