29 December 2022, 3 min read

Staying Current in the Data Industry

A curated list of engineering blogs from top technology companies - where practitioners publish real Machine Learning system designs, infrastructure decisions, and data science case studies.

Why Engineering Blogs Over Courses

Academic papers are rigorous, but they are often disconnected from production constraints. Online courses teach what’s known, not what’s being discovered right now. Engineering blogs from product companies are different: they describe systems that work at scale, decisions made under real constraints, and lessons learned after failures that cost real money.

Reading them consistently is one of the highest-leverage ways to stay calibrated on what production Machine Learning actually looks like.

The blogs below are where practitioners at these companies publish real work. Infrastructure decisions. Model architectures. A/B testing methodologies. Data pipelines. The specific tradeoffs they navigate. The signal density is high.

Company Engineering Blogs

Airbnb Engineering: Strong on recommendation systems (search ranking, listing personalization), trust and safety Machine Learning, data infrastructure, and experimentation at scale. Their work on Zipline (feature platform) and experimentation methodology is particularly well documented.

Spotify Research: Music and podcast recommendation, audio signal processing, and causal inference in A/B testing. Notable for publishing on the tension between sequential recommendation and user intent modeling.

Netflix Research: Content recommendation, streaming quality optimization, and A/B testing methodology. Their work on quasi-experimental methods and the pitfalls of standard online experiments is worth reading for anyone doing causal inference.

DoorDash Machine Learning Blog: Last-mile logistics Machine Learning, including ETA prediction, dispatch optimization, and fraud detection. Interesting because the ground truth feedback loop is fast (delivery happens within the hour), enabling rapid model iteration.

Uber Engineering: Broad coverage, including pricing Machine Learning, maps, self-driving, and fraud. Strong on real-time systems and Michelangelo (their Machine Learning platform). Useful for infrastructure patterns at scale.

Lyft Engineering: Ride pricing, marketplace dynamics, safety. The content on causal inference for policy decisions and the Flyte workflow orchestrator is practical and well written.

Shopify Engineering: E-commerce Machine Learning, including demand forecasting, fraud detection, and merchant recommendation. The architecture decisions around multi-tenant data systems are interesting for anyone working in platform Machine Learning.

Meta Engineering: High-volume systems such as feed ranking, ads optimization, and content moderation at scale. The infrastructure posts on PyTorch distributed training and production model serving are technically dense and worth the time.

LinkedIn Engineering: Graph Machine Learning, job recommendation, economic graph data. Their work on fairness in hiring recommendations and on preserving member privacy in Machine Learning is among the most practically grounded in those areas.

Kaggle Competition Blog: Post-competition writeups from winning teams. The gap between Kaggle solutions and production Machine Learning is real, but these writeups are useful for understanding which ensembling techniques, feature engineering tricks, and validation strategies actually win. Then you can reason about when they transfer to real work.

How to Use These

Subscribe to RSS feeds or set a weekly reminder to skim headlines. The goal is not to read every post. It’s to maintain awareness of the direction the field is moving and to have mental hooks when you encounter similar problems. The posts that matter will stand out because they describe something you’ve already struggled with.
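One way to make the weekly skim concrete is a small script that lists only the headlines published in the last seven days. The sketch below uses just the Python standard library and assumes a plain RSS 2.0 feed with `title`, `link`, and `pubDate` on each item. The function name `recent_headlines`, the sample feed, and the example.com links are all hypothetical; real blogs may publish Atom instead, or omit dates, and would need their own handling.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical feed content for illustration only; not taken from any real blog.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Engineering Blog</title>
    <item>
      <title>Recent post</title>
      <link>https://example.com/recent</link>
      <pubDate>Tue, 27 Dec 2022 09:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Older post</title>
      <link>https://example.com/older</link>
      <pubDate>Tue, 01 Nov 2022 09:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def recent_headlines(rss_xml, now, days=7):
    """Return (title, link) pairs for RSS items published within `days` of `now`."""
    cutoff = now - timedelta(days=days)
    root = ET.fromstring(rss_xml)
    headlines = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        pub_date = item.findtext("pubDate")
        if not pub_date:
            continue  # no date: skip rather than guess
        try:
            published = parsedate_to_datetime(pub_date)
        except (TypeError, ValueError):
            continue  # unparseable date: skip
        if published.tzinfo is None:
            # Assumption: treat dates without an offset as UTC.
            published = published.replace(tzinfo=timezone.utc)
        if published >= cutoff:
            headlines.append((title, link))
    return headlines


# Use a fixed "now" so the sample is reproducible; in practice pass
# datetime.now(timezone.utc).
for title, link in recent_headlines(SAMPLE_FEED, datetime(2022, 12, 29, tzinfo=timezone.utc)):
    print(f"{title}: {link}")
```

To run this against real feeds, fetch each feed's XML (for example with `urllib.request.urlopen`) and pass the decoded text to the same function. The feed URLs themselves are left out here because they vary by blog and change over time.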

Cross-reference with the Research Papers and Resources list for the academic papers that underpin many of these production systems.
