Cross-Border E-Commerce Intelligence for 1,000+ SKUs

Overview

At GoGlocal, I owned the full data product, from raw marketplace data scraping to ML-driven SKU automation. The business sold 1,000+ SKUs across Amazon, eBay, Walmart, and Lazada, each with different listing requirements, pricing dynamics, and buyer behavior patterns. Manual management was the bottleneck.

As Manager of Data Science, I directed the strategy and built the core machine learning systems that automated the most labor-intensive workflows — cutting manual effort by 50% and improving revenue estimation efficiency by 30%.

The Problem

Cross-border e-commerce at the SKU level is repetitive, data-intensive work: product categorization, keyword optimization, pricing research, inventory planning, and listing creation, all done manually for every product on every marketplace. At 1,000+ SKUs, this was operationally untenable.

The additional challenge was the diversity of the data: different marketplaces have different taxonomies, different listing formats, different pricing signals. A solution that worked on Amazon didn’t automatically work on Lazada.

Why It Mattered

Every hour spent manually listing a product was an hour not spent on pricing strategy or inventory positioning. The margin in cross-border e-commerce is thin — operational efficiency is a competitive advantage. Getting forecasts right at the SKU level meant the difference between optimal inventory positions and costly overstock or stockout situations.

Data & Inputs

Marketplace data: Amazon, eBay, Walmart, Lazada product listings — scraped at scale across multiple regions
Competitor pricing data — scraped and normalized across different currency, tax, and fee structures
Historical sales data: SKU-level sell-through rates, velocity, seasonality
Product attribute data: descriptions, images, weights, dimensions, categories
Demand signals: search volume, bestseller rank, review velocity

Data scraping was itself a significant engineering challenge — scraping at scale across regions with anti-bot measures required a robust, distributed pipeline.

Approach

The system was built in three layers:

Layer 1: SKU Intelligence

NLP-driven product classification using semantic similarity and attribute extraction. Given a product with its description and attributes, the system automatically placed it in the correct marketplace taxonomy, extracted key searchable attributes, and generated optimized listing copy using fine-tuned OpenAI models.
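The taxonomy placement step can be sketched as a nearest-neighbor search over category descriptions. The example below is a minimal illustration, not the production pipeline: `embed` is a toy bag-of-words stand-in for a sentence-transformer embedding, and the `TAXONOMY` entries are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_category(description: str, taxonomy: dict[str, str]) -> tuple[str, float]:
    """Return the taxonomy node whose label text is most similar to the description."""
    query = embed(description)
    scored = [(cosine(query, embed(text)), node) for node, text in taxonomy.items()]
    score, node = max(scored)
    return node, score


# Hypothetical taxonomy nodes with short descriptive text for each.
TAXONOMY = {
    "Home > Kitchen > Cookware": "kitchen cookware pots pans frying pan skillet",
    "Electronics > Audio > Headphones": "audio headphones earbuds wireless bluetooth headset",
    "Sports > Fitness > Yoga": "fitness yoga mat exercise stretching",
}

node, score = best_category("Wireless bluetooth earbuds with charging case", TAXONOMY)
print(node, round(score, 2))
```

Swapping `embed` for a real embedding model changes the vectors but not the matching logic, which is why the similarity search is kept separate from the embedding function.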

Image optimization using Stable Diffusion for background replacement and product visualization — a significant commercial requirement for marketplace listing quality.

Layer 2: Pricing Intelligence

Scraped competitor pricing data was normalized and used to train price-response models at the category level. The system recommended optimal price points per marketplace, accounting for marketplace fees, currency, and competitive position.
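To make prices comparable across marketplaces, a listed price has to be reduced to net proceeds in one currency. The sketch below shows one way to do that. The field names, the example rates, and the assumption that the fee is charged on the pre-tax price are all illustrative, since actual fee rules differ by marketplace and category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarketplaceTerms:
    currency: str
    fx_to_usd: float          # assumed rate: 1 unit of local currency in USD
    referral_fee_rate: float  # fraction of the pre-tax price kept by the marketplace
    tax_rate: float           # VAT/GST assumed to be included in the listed price


def net_price_usd(list_price: float, terms: MarketplaceTerms) -> float:
    """Convert a tax-inclusive local list price into net USD proceeds per unit."""
    pre_tax = list_price / (1 + terms.tax_rate)
    after_fee = pre_tax * (1 - terms.referral_fee_rate)
    return round(after_fee * terms.fx_to_usd, 2)


# Hypothetical terms for a marketplace priced in SGD.
sg_terms = MarketplaceTerms(currency="SGD", fx_to_usd=0.75,
                            referral_fee_rate=0.10, tax_rate=0.08)
print(net_price_usd(108.0, sg_terms))
```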

Combining real-time sales forecasts (demand at the current price) with inventory levels produced an expected profit margin per SKU per marketplace, which made it possible to prioritize the most valuable listing opportunities.
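The prioritization logic can be illustrated with a small ranking function. This is a sketch under simplifying assumptions: expected profit is taken as sellable units times unit margin, sellable units are capped by stock on hand, and the `Opportunity` fields and sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opportunity:
    sku: str
    marketplace: str
    net_price: float       # per-unit proceeds after fees, in a common currency
    unit_cost: float       # landed cost per unit
    forecast_units: float  # forecast demand at the current price over the horizon
    on_hand: int           # units available to sell


def expected_profit(o: Opportunity) -> float:
    """Profit over the horizon, limited by whichever is smaller: demand or stock."""
    sellable = min(o.forecast_units, o.on_hand)
    return sellable * (o.net_price - o.unit_cost)


def rank(opportunities: list[Opportunity]) -> list[Opportunity]:
    """Order SKU/marketplace pairs from most to least expected profit."""
    return sorted(opportunities, key=expected_profit, reverse=True)


candidates = [
    Opportunity("SKU-A", "amazon", net_price=20.0, unit_cost=12.0, forecast_units=100, on_hand=50),
    Opportunity("SKU-B", "lazada", net_price=15.0, unit_cost=10.0, forecast_units=120, on_hand=200),
    Opportunity("SKU-C", "ebay", net_price=30.0, unit_cost=25.0, forecast_units=10, on_hand=10),
]
for o in rank(candidates):
    print(o.sku, o.marketplace, expected_profit(o))
```

Note that SKU-A has the higher unit margin but ranks below SKU-B because its stock caps the units it can sell.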

Layer 3: Demand Forecasting & Inventory Planning

SKU-level demand was forecast with XGBoost and LightGBM using marketplace-specific features. Trend tracking flagged velocity changes early, and replenishment recommendations combined the forecast with lead time and safety stock.
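The replenishment rule (forecast + lead time + safety stock) can be sketched as a reorder calculation. The safety stock formula here is the common textbook form `z * sigma * sqrt(lead_time)`, used as an assumption rather than the exact rule in the original system. The default `z = 1.65` corresponds to roughly a 95% service level under a normal demand assumption.

```python
import math


def safety_stock(daily_demand_std: float, lead_time_days: int, z: float = 1.65) -> float:
    """Buffer for demand variability over the lead time (textbook formula)."""
    return z * daily_demand_std * math.sqrt(lead_time_days)


def reorder_quantity(daily_forecast: list[float], lead_time_days: int,
                     daily_demand_std: float, on_hand: int,
                     on_order: int = 0, z: float = 1.65) -> int:
    """Units to order so stock covers forecast lead-time demand plus safety stock."""
    if lead_time_days > len(daily_forecast):
        raise ValueError("forecast horizon is shorter than the lead time")
    lead_time_demand = sum(daily_forecast[:lead_time_days])
    target = lead_time_demand + safety_stock(daily_demand_std, lead_time_days, z)
    shortfall = target - (on_hand + on_order)
    return max(0, math.ceil(shortfall))


# Hypothetical SKU: flat forecast of 10 units/day, 16-day lead time.
qty = reorder_quantity([10.0] * 30, lead_time_days=16, daily_demand_std=4.0,
                       on_hand=100, on_order=20)
print(qty)
```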

Engineering & Implementation

Scraping infrastructure: Distributed scraping using Ray on AWS — horizontal scaling across multiple scrapers, with rotation logic and rate limiting
NLP pipeline: Semantic similarity using sentence transformers for product matching across marketplace taxonomies
LLM integration: Fine-tuned OpenAI models for listing copy generation — prompts engineered for marketplace-specific requirements
Image pipeline: Stable Diffusion for product image background replacement and enhancement
Serving: FastAPI endpoints for the merchant-facing analytics product — real-time SKU recommendations and forecast updates
Data store: PostgreSQL for normalized product and pricing data; Redis for hot pricing data
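The rotation and rate-limiting logic in the scraping layer can be illustrated in a single process. The sketch below pairs a token bucket per identity with round-robin rotation. It does not use Ray, and the class names, proxy labels, and rates are hypothetical. In a distributed setup, each worker would hold or share state like this.

```python
import itertools
import time
from typing import Callable, Iterable, Optional


class TokenBucket:
    """Allows bursts up to `capacity` and refills at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Refill based on elapsed time, then take one token if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RotatingPool:
    """Round-robin over identities (e.g. proxies), each with its own rate limit."""

    def __init__(self, identities: Iterable[str], rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self._buckets = {name: TokenBucket(rate_per_sec, capacity, clock)
                         for name in identities}
        self._cycle = itertools.cycle(self._buckets)

    def next_available(self) -> Optional[str]:
        """Return the next identity with budget left, or None if all are exhausted."""
        for _ in range(len(self._buckets)):
            name = next(self._cycle)
            if self._buckets[name].try_acquire():
                return name
        return None


pool = RotatingPool(["proxy-a", "proxy-b"], rate_per_sec=1.0, capacity=1)
print(pool.next_available(), pool.next_available(), pool.next_available())
```

The injectable `clock` is there so the limiter can be tested without sleeping. A caller that receives `None` would typically back off before retrying.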

The architecture was multi-tenant, so each merchant account had isolated data and compute.

Results & Impact

50% reduction in manual effort through workflow automation
30% improvement in revenue estimation efficiency — faster, more accurate forecasts per SKU
1,000+ SKUs automated across 4 marketplaces in multiple regions
NLP pipeline reduced listing creation time from hours to minutes per product
Pricing intelligence system enabled data-driven pricing decisions where manual judgment alone had been used

Limitations & What I’d Do Differently

The marketplace taxonomy mapping was hand-crafted for the initial set of categories — a proper learned taxonomy alignment would scale better as new product categories are added.

The demand forecasting models were trained per-marketplace in isolation. A hierarchical model that shares information across marketplaces would improve accuracy for low-velocity SKUs where per-marketplace data is sparse.
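The idea of sharing information across marketplaces can be illustrated with a simple shrinkage estimator. This is not a full hierarchical model and is not what was built. It only shows the partial-pooling effect: a marketplace with little data is pulled toward the cross-marketplace rate, while one with plenty of data barely moves. The function name and `prior_strength` value are hypothetical.

```python
def pooled_daily_rate(units_by_marketplace: dict[str, tuple[float, int]],
                      prior_strength: float = 14.0) -> dict[str, float]:
    """Shrink each marketplace's daily sales rate toward the pooled rate.

    units_by_marketplace maps marketplace -> (units_sold, days_observed).
    prior_strength acts like that many pseudo-days observed at the pooled rate.
    """
    total_units = sum(units for units, _ in units_by_marketplace.values())
    total_days = sum(days for _, days in units_by_marketplace.values())
    if total_days == 0:
        raise ValueError("no observed days for this SKU")
    global_rate = total_units / total_days
    return {
        marketplace: (units + prior_strength * global_rate) / (days + prior_strength)
        for marketplace, (units, days) in units_by_marketplace.items()
    }


# Hypothetical SKU: well observed on one marketplace, almost no history on another.
rates = pooled_daily_rate({"amazon": (280, 28), "lazada": (0, 2)})
for marketplace, rate in rates.items():
    print(marketplace, round(rate, 2))
```

With only two days of history, the raw Lazada rate of zero is almost certainly an underestimate. The pooled estimate lands between zero and the cross-marketplace rate instead.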

Stack

Python, Ray, AWS (EC2, S3, Lambda), OpenAI API (fine-tuned), Stable Diffusion, sentence-transformers, XGBoost, LightGBM, FastAPI, PostgreSQL, Redis
