What's Next — The Road Ahead
This project started as a personal challenge: build a real end-to-end data engineering system using only free tools, on a dataset I actually care about. It shipped. It runs nightly. It has real users.
But there is a lot more to build.
Here is what is on the roadmap.
dbt Semantic Layer
Right now the gold layer exposes dimension and fact tables directly, and the dashboard queries them with hand-written SQL. This works, but it means business logic lives in two places: the transformation layer and the dashboard queries.
The dbt Semantic Layer would centralise all metric definitions in one place. total_goals, win_rate, xg_overperformance — defined once in dbt, queryable everywhere. The dashboard would consume metrics rather than writing joins. No more drift between how a metric is calculated in one page versus another.
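As a rough illustration of what that would look like, here is a sketch of a MetricFlow-style semantic model and two metric definitions. The model and column names (`fct_matches`, `goals_scored`, `is_win`, `match_key`) are assumptions about the project's schema, not the real one:

```yaml
# semantic_models/matches.yml — a sketch; model and column names are
# assumptions about the warehouse schema, not the actual project.
semantic_models:
  - name: matches
    model: ref('fct_matches')
    defaults:
      agg_time_dimension: match_date
    entities:
      - name: match
        type: primary
        expr: match_key
    dimensions:
      - name: match_date
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: total_goals
        agg: sum
        expr: goals_scored
      - name: matches_played
        agg: count
        expr: match_key
      - name: wins
        agg: sum
        expr: case when is_win then 1 else 0 end

metrics:
  - name: total_goals
    label: Total goals
    type: simple
    type_params:
      measure: total_goals
  - name: win_rate
    label: Win rate
    type: ratio
    type_params:
      numerator: wins
      denominator: matches_played
```

Once defined here, `win_rate` is computed the same way whether the dashboard asks for it by season, by team, or by month.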
Data Quality Tests
The pipeline runs nightly and the dashboard is public. If bad data makes it through, real users see wrong numbers — and there is currently no automated check stopping that from happening.
dbt has a built-in testing framework that fits naturally into the existing setup. Tests live alongside the models and run as part of the same pipeline. The basics are straightforward: uniqueness and not-null constraints on keys, accepted value checks on categorical columns, referential integrity between the fact table and every dimension. These catch the obvious failures — a venue ID that resolves to nothing, a match result outside the expected set, a duplicate surrogate key.
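The built-in tests are declared in the model's schema file. A sketch of what this could look like for the fact table, with assumed model and column names (`fct_matches`, `result`, `venue_key`, `dim_venue`):

```yaml
# models/gold/schema.yml — a sketch; names are assumptions.
models:
  - name: fct_matches
    columns:
      - name: match_key
        tests:
          - unique
          - not_null
      - name: result
        tests:
          - accepted_values:
              values: ['home_win', 'draw', 'away_win']
      - name: venue_key
        tests:
          - not_null
          - relationships:
              to: ref('dim_venue')
              field: venue_key
```

Each entry compiles to a SQL query that returns failing rows, so `dbt test` fails the run the moment a duplicate key or orphaned venue appears.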
Beyond the built-in tests, the dbt-expectations package brings a richer set of statistical checks: row count thresholds, value range assertions, column distribution checks. These are useful for catching subtler issues — a round where suspiciously few goals were recorded, a team with negative possession, a season where no matches were flagged as complete.
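The dbt-expectations tests slot into the same schema files. A sketch using two of the package's real test names, with assumed model and column names:

```yaml
# Requires dbt-expectations in packages.yml.
# Model and column names below are assumptions.
models:
  - name: fct_matches
    tests:
      # fail if a run produces an implausibly small table
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1
    columns:
      - name: possession_pct
        tests:
          # possession outside 0–100 means a parsing bug upstream
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100
```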
The goal is for every nightly run to either produce correct data or fail loudly. Silent corruption is the worst outcome in a pipeline like this.
Player Analytics
The bronze layer already ingests player-level data — appearances, goals, assists, shots, passes, cards, ratings — for every fixture. None of it surfaces in the dashboard yet.
The plan is to build a full player analytics layer on top of what is already there: top scorers, top assisters, player form over time, contribution per 90 minutes. A player profile page in the dashboard. Head-to-head comparisons.
The data is sitting in the warehouse. It just needs to be modelled and served.
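A per-90 contribution model is a good example of how little is missing. A sketch of a gold-layer dbt model, where the source model and column names (`stg_player_appearances`, `minutes_played`) are assumptions about the existing bronze/silver layers:

```sql
-- models/gold/player_per90.sql — a sketch; source and column
-- names are assumptions, not the real schema.
select
    player_id,
    sum(goals)          as goals,
    sum(assists)        as assists,
    sum(minutes_played) as minutes,
    -- per-90 rate: raw totals scaled to a 90-minute baseline,
    -- guarded against players with zero recorded minutes
    90.0 * sum(goals + assists)
        / nullif(sum(minutes_played), 0) as contribution_per_90
from {{ ref('stg_player_appearances') }}
group by player_id
```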
Beyond the Top Flight
Right now the pipeline only ingests Superligaen — the Danish top division. But the same API covers the full Danish football pyramid: the 1st Division (second tier), the 2nd Division, and the DBU Pokalen cup competition.
The plan is to extend ingestion to cover all of these, model them through the same bronze → silver → gold pipeline, and build dedicated dashboard pages for each competition. Teams moving up and down between divisions, cup upsets, cross-division comparisons — all of it becomes possible once the data is flowing.
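One way to keep that extension cheap is to drive ingestion from a config list rather than a hard-coded league. A sketch, where the `league_id` values are placeholders rather than real API identifiers:

```yaml
# ingestion config sketch — competition names are from the roadmap;
# the league_id values are placeholders, not real API identifiers.
competitions:
  - name: Superligaen
    league_id: <superliga-id>      # currently the only one ingested
  - name: 1st Division
    league_id: <1st-division-id>
  - name: 2nd Division
    league_id: <2nd-division-id>
  - name: DBU Pokalen
    league_id: <cup-id>
```

Adding a competition then becomes a one-line config change, with the bronze → silver → gold models shared across all of them.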
Discussions Page
This is the most experimental idea on the list.
The concept: a page in the dashboard where the data is not just displayed but discussed. Different analytical personalities — a statistician who trusts only the numbers, a football traditionalist who distrusts xG, a fan who reads into every result — analyse the same data and reach different conclusions.
The personas would be generated by a language model, grounded in the actual data from the warehouse, and updated each matchday. It would make the dashboard less of a static report and more of a living conversation about the season.
Whether this is useful or just a novelty is an open question. But it is worth finding out.
Advanced BI Techniques
The current dashboard tells you what happened. The next step is to tell you what it means — and to do that, the visualisations need to work harder.
Right now most charts are single-metric bar charts or line charts. They are readable, but they leave a lot of the data on the table. The plan is to move toward techniques that surface relationships and context that are invisible in a single-axis view.
Scatter plots comparing attacking output versus defensive solidity across teams. Radar charts that give a full performance fingerprint for a team or player in a single glance. Rolling averages that separate a genuine form run from a single good result. These are standard tools in professional football analytics — and the data to drive all of them is already in the warehouse.
On the benchmarking side, the dashboard currently shows a team’s numbers in isolation. A win rate of 60% means something very different depending on whether the league average is 40% or 55%. The plan is to add contextual benchmarks throughout: league averages as reference lines on charts, percentile rankings alongside raw values, and head-to-head comparisons that anchor a team’s performance relative to its peers.
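Both the rolling form and the league benchmark fall out of SQL window functions. A sketch of a gold-layer model, with assumed table and column names (`fct_team_matches`, `goals_scored`):

```sql
-- a sketch of rolling form plus a league-average benchmark;
-- table and column names are assumptions about the schema.
select
    team_id,
    match_date,
    goals_scored,
    -- rolling form: average goals over each team's last 5 matches
    avg(goals_scored) over (
        partition by team_id
        order by match_date
        rows between 4 preceding and current row
    ) as goals_last_5,
    -- benchmark: the league-wide average of the same metric,
    -- ready to draw as a reference line on the chart
    avg(goals_scored) over () as league_avg_goals
from {{ ref('fct_team_matches') }}
```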
The goal is a dashboard where a casual fan understands the story at a glance, and an analyst can find genuine signal without exporting to a spreadsheet.
Closing Thought
The original goal was to learn by building something real. That goal was met. But the more interesting discovery is that a project like this does not have a natural end — it just has the next thing to build.
The data keeps arriving. The season keeps moving. The dashboard keeps growing.