Tier-1 Australian Bank · 2021–2022

Automating Regulatory Data Pipelines at ANZ Bank

Designed and automated ETL pipelines across 70+ source systems for APRA regulatory reporting at ANZ Bank, cutting development and testing time by 50% through systematic automation using Robot Framework and Python.

Problem

Manual, fragile ETL processes across 70+ heterogeneous data sources for APRA regulatory compliance — slow to develop, slow to test, and prone to undetected data quality errors.

Outcome

50% reduction in development and testing time through end-to-end automation; production-grade pipelines processing regulatory data from 70+ source systems.

Overview

My first role out of IIT Bombay was at ANZ Bank’s enterprise data automation team in India. The scope was APRA (Australian Prudential Regulation Authority) regulatory data — the pipeline that ensures the bank’s regulatory reporting is accurate and timely. I joined as a fresh graduate and ended up owning significant parts of the automation infrastructure that reduced development and testing time by 50%.

The Problem

Banking regulatory data pipelines are a specific class of engineering problem: the correctness requirements are absolute, the data sources are heterogeneous and legacy, and the cost of errors is regulatory — not just operational. ANZ’s data was flowing from 70+ source systems with different encodings, schemas, and update frequencies into a centralized regulatory reporting layer.

The development and testing process was largely manual: engineers would write DataStage jobs, test them by hand against sample data, and chase down discrepancies through multiple system layers. This was slow, error-prone, and didn’t scale as the number of source systems grew.

Why It Mattered

APRA regulatory reporting is not optional. Late or incorrect regulatory data is a compliance risk — and in banking, compliance failures have material consequences. The engineering team was spending more time on manual testing and debugging than on building new capabilities. The bottleneck was structural.

Data & Inputs

The pipelines ingested both ASCII and EBCDIC-encoded files, including fixed-width extracts from mainframe systems. Understanding EBCDIC encoding — and why a single wrong character can silently corrupt a downstream calculation — was one of the first things I learned on the job.
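To make the encoding risk concrete, here is a minimal sketch of how the same bytes decode very differently depending on the codec. The code page `cp500` is one common EBCDIC variant; the exact code page used by any given mainframe extract is an assumption here.

```python
# Illustrative sketch: the same bytes decoded as EBCDIC vs. Latin-1.
# cp500 is one common EBCDIC code page (an assumption for this example).
raw = bytes([0xC1, 0xC2, 0xC3, 0xF1, 0xF2, 0xF3])  # EBCDIC for "ABC123"

as_ebcdic = raw.decode("cp500")    # correct codec → "ABC123"
as_latin1 = raw.decode("latin-1")  # wrong codec → "ÁÂÃñòó", no error raised

print(as_ebcdic)  # ABC123
print(as_latin1)  # mojibake — silently accepted by downstream code
```

The dangerous part is the second decode: it does not raise an exception, so a mis-configured codec produces plausible-looking garbage that only surfaces as a wrong number much further down the pipeline.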

Approach

The automation strategy had two components:

Component 1: ETL development automation

Analyzed the patterns in existing IBM DataStage jobs and identified the repetitive structural elements — source connection setup, field mapping, data type handling, error logging. Wrote Python tooling to generate DataStage job templates from configuration files, reducing the work to specify-then-generate rather than build-from-scratch.
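The specify-then-generate idea can be sketched roughly as follows. All names and the output format here are hypothetical stand-ins; real DataStage job generation works against DataStage's own export format, not this toy syntax.

```python
# Minimal sketch of config-driven job generation: a job spec (in practice
# read from a configuration file) is expanded into a job definition by
# filling in the repetitive structural elements. All identifiers are
# hypothetical; this is not DataStage's actual job format.
from string import Template

JOB_TEMPLATE = Template(
    "JOB $job_name\n"
    "  SOURCE $source_table ($source_conn)\n"
    "  TARGET $target_table\n"
    "$mappings"
    "  ON_ERROR log_and_reject\n"
)

def generate_job(spec: dict) -> str:
    mappings = "".join(
        f"  MAP {src} -> {dst} [{dtype}]\n"
        for src, dst, dtype in spec["field_map"]
    )
    return JOB_TEMPLATE.substitute(
        job_name=spec["name"],
        source_conn=spec["source_conn"],
        source_table=spec["source_table"],
        target_table=spec["target_table"],
        mappings=mappings,
    )

spec = {
    "name": "LOAD_ACCOUNTS",
    "source_conn": "SRC_CORE_BANKING",
    "source_table": "ACCOUNTS",
    "target_table": "RPT_ACCOUNTS",
    "field_map": [("ACCT_NO", "account_id", "VARCHAR(20)"),
                  ("BAL_AMT", "balance", "DECIMAL(18,2)")],
}
print(generate_job(spec))
```

The payoff is that the error handling and connection boilerplate live in one template rather than being re-typed (and occasionally mistyped) in every job.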

Component 2: Automated test framework

Built an end-to-end test automation framework using Robot Framework and Python to validate DataStage jobs before deployment.

This meant engineers could test a new DataStage job in minutes rather than hours — and the tests could be run automatically before any production deployment.
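A sketch of the kind of check such a framework automates: reconciling a source extract against the loaded target on row count and an order-independent content digest. The data shapes and function names here are hypothetical; the real checks ran against Teradata via SQL.

```python
# Hypothetical reconciliation check: compare row count and an
# order-independent digest between source and target row sets.
import hashlib

def table_digest(rows: list) -> str:
    """Order-independent digest: hash each row, XOR the hashes together.
    (Note: duplicate rows cancel out under XOR — fine for a sketch,
    a real implementation would also compare sorted multisets.)"""
    acc = 0
    for row in rows:
        h = hashlib.sha256("|".join(map(str, row)).encode()).digest()
        acc ^= int.from_bytes(h, "big")
    return f"{acc:064x}"

def reconcile(source_rows: list, target_rows: list) -> list:
    """Return a list of discrepancy descriptions; empty means clean."""
    issues = []
    if len(source_rows) != len(target_rows):
        issues.append(
            f"row count mismatch: {len(source_rows)} vs {len(target_rows)}"
        )
    if table_digest(source_rows) != table_digest(target_rows):
        issues.append("content digest mismatch")
    return issues

src = [("A1", 100.0), ("A2", 250.5)]
tgt = [("A2", 250.5), ("A1", 100.0)]  # same rows, different load order
print(reconcile(src, tgt))  # [] — order does not matter
```

Because checks like this are pure functions of the two row sets, they can run unattended as a pre-deployment gate — which is what turned hours of manual verification into minutes.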

Engineering & Implementation

The framework was designed to be usable by other engineers in the team — not just by me. Documentation and onboarding were built in from the start.

Results & Impact

Limitations & What I’d Do Differently

The test framework was strong on happy-path coverage but required manual work to add new edge cases. A property-based testing approach (generating test inputs programmatically from specs) would have improved coverage depth.
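The property-based direction could look roughly like this: instead of hand-writing edge cases, generate records from a field spec and assert invariants that must hold for any input. The spec format and `transform` function are hypothetical stand-ins for an ETL mapping under test; a real implementation might use the Hypothesis library rather than hand-rolled generators.

```python
# Hedged sketch of spec-driven generative testing. FIELD_SPEC and
# transform() are hypothetical; they stand in for a real field spec
# and a real ETL mapping under test.
import random
import string

FIELD_SPEC = {
    "account_id": lambda rng: "".join(
        rng.choices(string.ascii_uppercase + string.digits, k=10)
    ),
    "balance": lambda rng: round(rng.uniform(-1e6, 1e6), 2),
}

def generate_record(rng: random.Random) -> dict:
    return {name: gen(rng) for name, gen in FIELD_SPEC.items()}

def transform(record: dict) -> dict:
    """Toy stand-in for an ETL mapping under test."""
    return {
        "account_id": record["account_id"].strip(),
        "balance_cents": int(round(record["balance"] * 100)),
    }

rng = random.Random(42)  # seeded so any failure is reproducible
for _ in range(1000):
    rec = generate_record(rng)
    out = transform(rec)
    # Invariants that must hold for every generated input:
    assert len(out["account_id"]) == 10
    assert abs(out["balance_cents"] / 100 - rec["balance"]) < 0.005
print("1000 generated cases passed")
```

The key shift is that coverage depth comes from the generator, not from an engineer remembering to add each edge case by hand.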

The DataStage job generation was template-based, not model-based — it couldn’t handle genuinely novel job types without manual template extension. A more principled DSL-to-DataStage compiler would have been more powerful, though the complexity tradeoff might not have been worth it for the team’s needs.

Stack

IBM DataStage, Teradata, Control-M, Robot Framework, Python, SQL, ASCII/EBCDIC file handling


Let's collaborate!

Whether you need a quantitative researcher, a machine learning systems builder, or a technical advisor — I'm available for select consulting engagements.

Get in Touch →