Eurostat Pipeline
Eurostat's Urban Audit publishes 220+ comparable indicators across 1,000+ European cities as flat TSV files. Cittopia's pipeline turns these into the per-city JSON datasets that power the atlas.
Raw input
Eurostat ships datasets as gzipped TSV files such as estat_urb_cpop1.tsv.gz (population by year, by city, by indicator). Each row holds one series (one indicator for one city) spread across many year columns. Each cell may be a number, a number followed by a flag (for example e for an estimate), or a colon (missing).
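A minimal parsing sketch, assuming the usual layout of Eurostat's bulk TSV downloads. The first column packs comma-separated dimension codes, and its header reads like freq,indic_ur,cities\TIME_PERIOD. Every further column is one year, and a flag, when present, follows the value after a space. The function names are hypothetical. Dimension names vary by dataset, so they are read from the header rather than hard-coded.

```javascript
// Hypothetical helpers for Eurostat's bulk TSV layout (not Cittopia's actual code).

// "freq,indic_ur,cities\TIME_PERIOD\t2013 \t2022 " -> dimension names + year labels
function parseHeader(headerLine) {
  const [dimCell, ...yearCells] = headerLine.split("\t");
  const dims = dimCell.split("\\")[0].split(",").map((d) => d.trim());
  const years = yearCells.map((y) => y.trim());
  return { dims, years };
}

// "334679 " -> { value: 334679, flag: "" }; "53.8 e" -> { value: 53.8, flag: "e" }; ": " -> null
function parseCell(raw) {
  const text = raw.trim();
  if (text === "" || text.startsWith(":")) return null; // ":" (optionally flagged) means missing
  const [num, flag = ""] = text.split(/\s+/);
  return { value: Number(num), flag };
}

// "A,DE1001V,BG003C\t334679 \t: " -> { key: { freq, indic_ur, cities }, observations: { year: cell } }
function parseRow(header, line) {
  const [dimCell, ...cells] = line.split("\t");
  const codes = dimCell.split(",").map((c) => c.trim());
  const key = Object.fromEntries(header.dims.map((d, i) => [d, codes[i]]));
  const observations = {};
  header.years.forEach((year, i) => {
    observations[year] = parseCell(cells[i] ?? ":");
  });
  return { key, observations };
}
```

With this shape, and assuming a dataset whose dimensions are named indic_ur and cities (as in urb_cpop1), the indicator and city codes are simply key.indic_ur and key.cities.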
Pipeline stages
- Decompress & parse. Stream the .gz file, parse the TSV, and identify the indicator-code and city-code columns.
- City code mapping. Match Eurostat city codes (e.g. BG003C for Varna) against Cittopia's canonical city slugs.
- Time-series construction. For each city + indicator, build a {year: value} dict. Drop colons (missing) and flagged estimates.
- Normalisation. Apply unit conversion and z-scoring against the corpus.
- Confidence stamp. Compute a freshness score per indicator (full credit if < 2 years old, decaying after).
- Write JSON. Emit per-city objects into assets/data/global-cities.js.
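The decompress-and-parse stage can stream the file instead of loading it whole. This sketch uses only Node's built-in fs, zlib and readline modules. The names readEurostatGz and onRow are hypothetical. The raw cells it passes on still need the per-cell handling described under Raw input.

```javascript
const fs = require("node:fs");
const zlib = require("node:zlib");
const readline = require("node:readline");

// Hypothetical helper: gunzip on the fly and hand each data row to onRow(headerCells, rowCells).
async function readEurostatGz(path, onRow) {
  // note: pipe() does not forward read-stream errors; stream.pipeline is the more robust choice
  const input = fs.createReadStream(path).pipe(zlib.createGunzip());
  const lines = readline.createInterface({ input, crlfDelay: Infinity });
  let header = null;
  let rowCount = 0;
  for await (const line of lines) {
    if (line.trim() === "") continue; // ignore blank lines
    const cells = line.split("\t");
    if (header === null) {
      header = cells; // first line: packed dimension names, then one column per year
      continue;
    }
    onRow(header, cells);
    rowCount += 1;
  }
  return rowCount;
}
```

Memory use stays flat regardless of file size, which matters once coverage grows toward the full Urban Audit.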
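The city-code mapping and time-series stages can be sketched together. This assumes rows are already parsed into { indicator, city, observations } objects with null for missing cells. It also assumes the slug lookup is a plain Map and that "flagged estimates" means the e flag. The function name and the dropFlags parameter are hypothetical.

```javascript
// Hypothetical sketch: rows are assumed to be pre-parsed as
// { indicator, city, observations: { year: { value, flag } | null } }.
function buildTimeSeries(rows, cityCodeToSlug, dropFlags = new Set(["e"])) {
  const cities = {};
  const unmatched = new Set();
  for (const { indicator, city, observations } of rows) {
    const slug = cityCodeToSlug.get(city);
    if (slug === undefined) {
      unmatched.add(city); // no canonical slug yet: skip the row but remember the code
      continue;
    }
    const series = {};
    for (const [year, obs] of Object.entries(observations)) {
      // drop missing cells (null) and cells carrying any flag listed in dropFlags
      if (obs === null || [...obs.flag].some((f) => dropFlags.has(f))) continue;
      series[year] = obs.value;
    }
    if (Object.keys(series).length === 0) continue; // nothing usable for this indicator
    if (!cities[slug]) cities[slug] = {};
    cities[slug][indicator] = series;
  }
  return { cities, unmatched: [...unmatched] };
}
```

Collecting unmatched codes, rather than failing on them, makes it easy to see which Eurostat cities still lack a canonical slug.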
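For the normalisation stage, here is a z-scoring sketch for one indicator, z = (x - mean) / sd, computed across the city corpus. It assumes each city contributes a single value (for instance its latest year) and uses the population standard deviation. Unit conversion is left out because its rules are not given here. zScores is a hypothetical name.

```javascript
// Hypothetical sketch: valuesBySlug maps city slug -> one numeric value for a single indicator.
function zScores(valuesBySlug) {
  const entries = Object.entries(valuesBySlug);
  const n = entries.length;
  if (n === 0) return {};
  const mean = entries.reduce((sum, [, v]) => sum + v, 0) / n;
  // population variance (divide by n, not n - 1)
  const variance = entries.reduce((sum, [, v]) => sum + (v - mean) ** 2, 0) / n;
  const sd = Math.sqrt(variance);
  const out = {};
  for (const [slug, v] of entries) {
    // a constant indicator has no spread: map every city to 0 instead of dividing by 0
    out[slug] = sd === 0 ? 0 : (v - mean) / sd;
  }
  return out;
}
```

Z-scores put indicators with very different units, such as headcounts and percentages, on a common scale so cities can be compared across them.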
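The confidence stamp gives full credit to data under 2 years old, but this page does not say how the score decays after that. This sketch assumes exponential decay with a hypothetical two-year half-life, measured from each indicator's most recent year. The constants and function names are illustrative only.

```javascript
const FULL_CREDIT_YEARS = 2; // from the rule above: full credit if < 2 years old
const HALF_LIFE_YEARS = 2; // hypothetical decay parameter, not specified by the pipeline docs

// 1.0 while the data is younger than FULL_CREDIT_YEARS, then halves every HALF_LIFE_YEARS.
function freshness(latestYear, currentYear) {
  const age = currentYear - latestYear;
  if (age < FULL_CREDIT_YEARS) return 1;
  return 0.5 ** ((age - FULL_CREDIT_YEARS) / HALF_LIFE_YEARS);
}

// Per-indicator freshness for one city's { indicator: { year: value } } object.
function freshnessByIndicator(indicators, currentYear) {
  const out = {};
  for (const [name, series] of Object.entries(indicators)) {
    const years = Object.keys(series).map(Number);
    out[name] = years.length ? freshness(Math.max(...years), currentYear) : 0;
  }
  return out;
}
```

With these constants, an indicator last reported in 2022 scores 0.5 in 2026. How per-indicator scores combine into a city's overall confidence value is not covered by this sketch.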
Example: Varna (BG003C)
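```javascript
// One entry in assets/data/global-cities.js (intermediate years and indicators elided)
varna: {
  eurostat: 'BG003C',
  indicators: {
    population: { 2013: 334_679, /* … */ 2022: 332_686 },
    employment_rate: { 2013: 53.8, /* … */ 2022: 56.4 },
    // …
  },
  confidence: 0.86,
}
```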
Coverage (April 2026): 12 EU cities are fully ingested. Full Urban Audit coverage (1,000+ cities) is planned for Phase 5 of the roadmap.
Last updated 30 April 2026 by Tunç Meriç