productivity · 4 min read · 17 December 2022

Data Structures and Algorithms: Complexity and Resources

The core intuition behind space-time complexity analysis, with a guide to the best resources for building DSA fundamentals as a data scientist.

Why DSA Matters for Data Scientists

Data scientists don’t spend most of their time writing merge sort from scratch. But the underlying intuition — how does the cost of an operation scale with input size — is used constantly. A model training loop that’s O(n²) in the number of samples will not scale to production data. A nearest-neighbor search that works on 10K points will be unusable at 10M. Understanding complexity is a prerequisite for reasoning about whether an approach will work at scale.

Space-Time Complexity

The aim of complexity analysis is to measure how much time and memory a program uses as a function of input size.

The core question: given an input of size n, how does the number of operations (time) or memory usage (space) grow as n increases?

Example: finding whether element x exists in a list of n elements.

n = 4
l = [1, 2, 3, 4]
x = 5

for i in l:
    if i == x:
        print('element exists')
        break  # stop at the first match

In the worst case (x not in the list), this touches every element — n comparisons. Time complexity is O(n): linear in the size of the list.
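For contrast, if the list is sorted, each comparison can discard half of the remaining candidates, dropping the worst case to O(log n). A minimal hand-rolled sketch (Python's built-in bisect module provides the same capability):

```python
def binary_search(sorted_list, x):
    """Return True if x is in sorted_list, using O(log n) comparisons."""
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2        # midpoint of the current range
        if sorted_list[mid] == x:
            return True
        elif sorted_list[mid] < x:
            lo = mid + 1            # discard the lower half
        else:
            hi = mid - 1            # discard the upper half
    return False

print(binary_search([1, 2, 3, 4], 5))  # False: 5 is absent
```

Doubling n adds only one extra comparison in the worst case, which is why the logarithmic class barely moves as inputs grow.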

Common complexity classes (from fastest to slowest):

Notation    Name          Example
O(1)        Constant      Dictionary lookup by key
O(log n)    Logarithmic   Binary search in a sorted array
O(n)        Linear        Linear scan, single pass
O(n log n)  Linearithmic  Merge sort, heap sort
O(n²)       Quadratic     Nested loops, bubble sort
O(2ⁿ)       Exponential   Brute-force subset enumeration
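To make the gap between classes concrete, here is a sketch of the same task — detecting whether a list contains a duplicate — solved in two different classes (the function names are illustrative, not from any library):

```python
def has_duplicate_quadratic(items):
    """O(n^2) time: compare every pair with nested loops."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False

def has_duplicate_sorted(items):
    """O(n log n) time: after sorting, duplicates must be adjacent."""
    ordered = sorted(items)  # O(n log n)
    return any(a == b for a, b in zip(ordered, ordered[1:]))  # O(n)
```

At n = 10K the quadratic version does ~50 million comparisons while the sort-based one does on the order of 130K — the same answer, a few hundred times less work.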

Space complexity follows the same notation but measures memory. A function that creates a copy of the input array is O(n) space. A function that operates in-place is O(1) space.
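A minimal sketch of that distinction, reversing a list both ways (the function names are made up for illustration):

```python
def reversed_copy(items):
    """O(n) extra space: allocates a new list of the same length."""
    return items[::-1]

def reverse_in_place(items):
    """O(1) extra space: swaps elements within the existing list."""
    lo, hi = 0, len(items) - 1
    while lo < hi:
        items[lo], items[hi] = items[hi], items[lo]
        lo += 1
        hi -= 1
    return items
```

Both are O(n) in time; only the memory footprint differs, which is what space complexity measures.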

Amortized Complexity

Some operations are occasionally expensive but cheap on average. Python’s list.append() is O(1) amortized: most appends are O(1), but occasionally the list needs to resize (O(n) for that one operation). Over many appends, the average cost per operation is still O(1).
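The resizing is observable: sys.getsizeof reports the list's allocated memory, which jumps only when a resize happens and stays flat between them (the exact growth pattern is a CPython implementation detail):

```python
import sys

lst = []
sizes = []
for _ in range(32):
    lst.append(0)
    sizes.append(sys.getsizeof(lst))

# Count the appends that actually triggered a reallocation.
resizes = sum(1 for a, b in zip(sizes, sizes[1:]) if b > a)
print(f"32 appends, {resizes} resizes")
```

Most appends leave the allocation untouched, so the occasional O(n) resize averages out to O(1) per append.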

This matters when analyzing data pipelines: an occasional expensive operation (index rebuild, cache invalidation) doesn’t change the overall throughput if it’s rare enough.

Common Data Structure Complexities

Structure           Access    Search    Insert    Delete
Array / List        O(1)      O(n)      O(n)      O(n)
Hash Table          O(1) avg  O(1) avg  O(1) avg  O(1) avg
Binary Search Tree  O(log n)  O(log n)  O(log n)  O(log n)
Heap                O(n)      O(n)      O(log n)  O(log n)
Linked List         O(n)      O(n)      O(1)      O(1)

(BST figures assume a balanced tree; an unbalanced one degrades to O(n). Linked-list insert/delete is O(1) only given a reference to the node; finding it first is O(n).)

Hash tables (Python dict, set) are the most useful structure for data science work — fast lookups, fast membership checks, and the backbone of groupby/aggregation operations.
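A minimal sketch of a hash-backed groupby — the pattern underneath most aggregation code (the sample rows and key names here are made up for illustration):

```python
from collections import defaultdict

rows = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]

totals = defaultdict(int)  # key -> running sum; O(1) avg per update
for key, value in rows:
    totals[key] += value

print(dict(totals))  # {'a': 9, 'b': 6}
```

One pass over n rows with O(1) average-cost dictionary updates gives an O(n) aggregation — the same idea pandas and SQL engines use for hash-based groupby.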

Learning Resources

For building fundamentals:


Let's collaborate!

Whether you need a quantitative researcher, a machine learning systems builder, or a technical advisor — I'm available for select consulting engagements.

Get in Touch →