
ETL Pipeline: Bank Info

Extracts the Wikipedia table of largest banks, converts market capitalization from USD to GBP, EUR, and INR using a local exchange-rate CSV, and loads the result into a CSV file and a SQLite database. The pipeline is containerized with Docker and automated with a GitHub Actions CI/CD workflow. It demonstrates production-ready ETL practices at small scale.


Problem

Demonstrate a clean, reproducible ETL workflow with portable execution and verifiable outputs.

Context and constraints

  • Extract: Wikipedia "List of largest banks" (by market cap)
  • Transform: convert USD to GBP/EUR/INR using local exchange-rate CSV
  • Load: CSV + SQLite
  • Operational: CLI, Makefile shortcuts, Docker, CI smoke test
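
The transform step in the list above can be sketched as follows. This is a minimal illustration, not the project's actual code: the exchange-rate CSV layout (`Currency,Rate`), the sample rates, the column names such as `MC_USD_Billion`, and the helper names `load_rates` and `add_converted_columns` are all assumptions made for the example.

```python
import csv
import io

# Hypothetical exchange-rate CSV: one row per target currency,
# with the rate expressed as units of that currency per 1 USD.
# The rates below are placeholder values, not real market data.
SAMPLE_RATES_CSV = """Currency,Rate
GBP,0.8
EUR,0.93
INR,82.95
"""


def load_rates(csv_text):
    """Parse the exchange-rate CSV into a {currency: rate} mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Currency"]: float(row["Rate"]) for row in reader}


def add_converted_columns(rows, rates):
    """Return copies of the rows with one extra column per target currency."""
    converted_rows = []
    for row in rows:
        converted = dict(row)  # copy so the input rows stay unchanged
        for currency, rate in rates.items():
            column = f"MC_{currency}_Billion"
            converted[column] = round(row["MC_USD_Billion"] * rate, 2)
        converted_rows.append(converted)
    return converted_rows


rates = load_rates(SAMPLE_RATES_CSV)
banks = [{"Name": "Example Bank", "MC_USD_Billion": 100.0}]
converted = add_converted_columns(banks, rates)
```

Keeping the rates in a local file rather than calling a live API makes each run reproducible: the same input table and the same CSV always yield the same output.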

Approach

Architecture

flowchart TB
  A[Wikipedia table] --> B[Extract]
  B --> C[Clean & normalize]
  C --> D[Join exchange rates CSV]
  D --> E[Currency conversion]
  E --> F[Write CSV]
  E --> G[Load SQLite]
  H[CI smoke test] --> I[Build verification]
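
The two load branches of the diagram (write CSV, load SQLite) might look like the sketch below, followed by the kind of row-count check a smoke test could run. The table name `largest_banks`, the column names, and the sample rows are hypothetical; an in-memory buffer and an in-memory database stand in for the real output files so the example is self-contained.

```python
import csv
import io
import sqlite3

# Hypothetical transformed rows; the real pipeline would pass its own output here.
rows = [
    {"Name": "Example Bank A", "MC_USD_Billion": 100.0, "MC_GBP_Billion": 80.0},
    {"Name": "Example Bank B", "MC_USD_Billion": 50.0, "MC_GBP_Billion": 40.0},
]
columns = ["Name", "MC_USD_Billion", "MC_GBP_Billion"]


def write_csv(rows, columns, handle):
    """Write the rows as CSV, header first, to any file-like object."""
    writer = csv.DictWriter(handle, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)


def load_sqlite(rows, conn, table="largest_banks"):
    """Replace the target table with the given rows (full refresh)."""
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(
        f"CREATE TABLE {table} ("
        "Name TEXT PRIMARY KEY, MC_USD_Billion REAL, MC_GBP_Billion REAL)"
    )
    conn.executemany(
        f"INSERT INTO {table} VALUES (:Name, :MC_USD_Billion, :MC_GBP_Billion)",
        rows,
    )
    conn.commit()


csv_buffer = io.StringIO()           # stands in for the output CSV file
write_csv(rows, columns, csv_buffer)

conn = sqlite3.connect(":memory:")   # stands in for the SQLite database file
load_sqlite(rows, conn)

# Smoke-test style check: both outputs should hold the same number of records.
csv_row_count = len(csv_buffer.getvalue().splitlines()) - 1  # minus header
db_row_count = conn.execute("SELECT COUNT(*) FROM largest_banks").fetchone()[0]
```

Dropping and recreating the table keeps the load idempotent, so rerunning the pipeline does not duplicate rows.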
    

Implementation highlights

Results and impact

An end-to-end ETL pipeline with operational-readiness patterns (Docker and CI). No runtime performance metrics are claimed.

Tech stack

Python, SQLite, Docker, GitHub Actions.

What I'd improve next

Add incremental refresh logic and schema validation to handle upstream HTML/table changes.
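
One possible shape for the schema validation mentioned above is a check that runs between extract and transform, so that a changed upstream table fails loudly instead of producing bad output. This is only a sketch of the idea: the expected column set, the rules, and the `validate_rows` helper are assumptions, not part of the existing project.

```python
# Hypothetical schema: the columns the transform step relies on.
EXPECTED_COLUMNS = {"Name", "MC_USD_Billion"}


def validate_rows(rows):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if not rows:
        errors.append("no rows extracted (table missing or layout changed?)")
    for index, row in enumerate(rows):
        missing = EXPECTED_COLUMNS - row.keys()
        if missing:
            errors.append(f"row {index}: missing columns {sorted(missing)}")
            continue
        name = row["Name"]
        if not isinstance(name, str) or not name.strip():
            errors.append(f"row {index}: Name must be a non-empty string")
        value = row["MC_USD_Billion"]
        # bool is a subclass of int, so reject it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or value <= 0:
            errors.append(f"row {index}: MC_USD_Billion must be a positive number")
    return errors


good_rows = [{"Name": "Example Bank", "MC_USD_Billion": 100.0}]
bad_rows = [{"Name": "Example Bank", "Market cap": "100"}]  # renamed column

good_errors = validate_rows(good_rows)
bad_errors = validate_rows(bad_rows)
```

A pipeline could abort with a non-zero exit code whenever the returned list is non-empty, which would also make the failure visible in CI.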