Building Superligaen Analytics
This blog documents the end-to-end journey of building Superligaen Analytics — a live data engineering project that ingests football data from the Danish premier league, transforms it through a medallion architecture, and serves it on a public dashboard.
The project was built in roughly 10 days in April 2026, and almost nothing went according to the original plan. Every major tool choice had to be revisited at least once. This is the honest account of what happened, why, and what I’d do differently.
Posts in order:
- The Idea — Why I Built This
- Choosing a Data Source
- Building the Bronze Layer — Raw Ingestion
- Silver and Gold — Transforming Data into a Star Schema
- The Dashboard — Discovering Evidence.dev
- The Deployment Saga — Netlify, Cloudflare, and Finally Vercel
- Migrating to dbt — When Raw SQL Isn’t Enough
- Adding Web Analytics — Vercel and Cloudflare
- Global Launch — A Conclusion
- What’s Next — The Road Ahead
- Why We Migrated from api-football to Sportmonks
- Data Quality Tests — Making the Pipeline Fail Loudly
- Building the Player Analytics Layer
- Organising the Semantic Layer — From Raw Models to Mart Views
- Advanced BI Techniques — Making the Data Say More
- Fixing Cold-Start Failures in a DuckDB-WASM Dashboard
- Building a Fan Forum with an LLM Pipeline
- What Happens When You Share a Side Project on LinkedIn
- What Happens When You Share a Side Project on Reddit
Live dashboard: superligaanalytics.vercel.app
Source code: github.com/SaUgKi1773/data-engineering-demo
Posts
-
What Happens When You Share a Side Project on Reddit
On May 30 I posted the Superligaen Analytics dashboard to r/sportsanalytics. Three days later it had 6,200 views and was my number one Reddit post of all time.
-
What Happens When You Share a Side Project on LinkedIn
On April 25 I shared the Superligaen Analytics dashboard on LinkedIn. The next day, 45 people visited the site, more than in the entire preceding week combined.
-
Building a Fan Forum with an LLM Pipeline
One of the more unusual features on the Superligaen dashboard is the Fan Forum on the match analysis page. Four fictional personas — a stats-obsessed analyst, an elderly lifelong fan, a passionate FC Nordsjælland supporter, and a referee-focused obsessive — post reactions to every completed match. The comments feel like a real forum thread because each persona has a defined character and they reference each other. This post is about how the pipeline behind it is built and the design decisions that shaped it.
-
Fixing Cold-Start Failures in a DuckDB-WASM Dashboard
The match analysis section of the Superligaen dashboard had an annoying property: it worked perfectly if you had already visited another page. It failed silently if you opened it directly. On mobile it almost never loaded. This post is about diagnosing that failure and the architectural change that fixed it.
-
Advanced BI Techniques — Making the Data Say More
The roadmap post described a gap between what the dashboard was doing and what it could do. Most of the charts were single-metric bar charts. They were readable. They also left most of the signal in the data on the table.
-
Organising the Semantic Layer — From Raw Models to Mart Views
When the gold layer was first built, the dashboard queried the dimensional model directly. A page that needed to show team standings would join fct_team_matches to dim_team, dim_match, dim_stadium, dim_referee, and dim_match_result, filtering for completed matches and aggregating points and goal difference. That join pattern appeared in six or seven dashboard pages, written independently, with subtle differences in filter conditions between them.
-
Building the Player Analytics Layer
The roadmap post described player analytics as data that was already sitting in the warehouse — it just needed to be modelled and served. That was accurate. Sportmonks returns per-player match statistics for every fixture as part of the lineups include. The challenge was not acquiring the data but deciding what to do with it.
-
Data Quality Tests — Making the Pipeline Fail Loudly
The roadmap post flagged data quality tests as a priority. The pipeline runs nightly. The dashboard is public. If bad data reaches the gold layer, real users see wrong numbers, and there is no automated check stopping that from happening. That situation had to change.
-
Why We Migrated from api-football to Sportmonks
This was not planned. The original architecture was built on api-football.com, the bronze layer was complete, the pipeline was running, and the dashboard was live. Then the nightly job started failing — and fixing it meant rebuilding the entire ingestion layer from scratch.
-
What's Next — The Road Ahead
This project started as a personal challenge: build a real end-to-end data engineering system using only free tools, on a dataset I actually care about. It shipped. It runs nightly. It has real users.
-
Adding Web Analytics — Vercel and Cloudflare
Once the dashboard was live, the natural question was: is anyone visiting it? We needed analytics.
-
Global Launch — A Conclusion
By April 2026 — roughly ten days after the first real commit — the pipeline was stable, the dashboard had seven pages, and the nightly job was running cleanly. It was time to call it launched.
-
The Deployment Saga — Netlify, Cloudflare, and Finally Vercel
This is the chapter I wish someone had written before I started. The deployment story is not a story about bad tools — Netlify, Cloudflare Pages, and Vercel are all good products. It is a story about free tier constraints that are easy to overlook until you hit them, and about how a project with an unusual build profile (large data files, Node.js compilation, MotherDuck token handling) does not fit neatly into the assumptions any of these platforms make.
-
Migrating to dbt — When Raw SQL Isn't Enough
When the silver and gold layers were first built, they ran as plain SQL files executed by Python runner scripts: run_silver.py and run_gold.py. Each script would read a directory of .sql files, connect to MotherDuck, and execute them in a specific order. It worked. The data was correct. But as the number of models grew and the logic became more complex, the cracks in the approach started to show.
-
The Dashboard — Discovering Evidence.dev
I knew from the start that I wanted a live public dashboard, not a static report or a screenshot. The question was which tool to use.
-
Silver and Gold — Transforming Data into a Star Schema
With 21 tables of raw JSON sitting in MotherDuck, the next step was to make the data actually usable. That meant two more layers: silver (clean, structured tables) and gold (a Kimball dimensional model designed for analytics).
-
Building the Bronze Layer — Raw Ingestion
The bronze layer has one job: pull data from the API and store it in the warehouse exactly as it arrived. No transformation, no business logic. If the API gives you a nested JSON blob, you store a nested JSON blob. The philosophy is that raw data is irreplaceable — once you transform it, you lose the original, and if your transformation logic turns out to be wrong you have nothing to go back to.
-
Choosing a Data Source
Choosing the Data Source
-
The Idea — Why I Built This
Every project starts somewhere. This one started with two things that happened to collide at the right moment.