Tools: Building a French Address Validation API with 26M Addresses
2026-03-07
admin
The French government's Base Adresse Nationale (BAN) contains 26 million addresses: every street, every house number, every hamlet across mainland France and the overseas territories. We built GEOREFER to make this data accessible through a single REST API, combined with company lookup from the SIRENE database. This is the technical story of how we did it.

## The Problem: Fragmented French Geographic Data

If you're building a FinTech product in France, you need to validate customer addresses for KYC compliance. Sounds simple, right? Here's what the landscape looks like in 2026:

To do proper KYC, you need at least two of these APIs, with different auth mechanisms, different response formats, and different rate limits. We decided to build one API that does it all.

## Architecture Overview

GEOREFER is built on a straightforward Java stack:

The architecture follows a clean layered approach:

## Importing 26M Addresses from the BAN

The BAN publishes its data as CSV files, updated monthly. The full dataset is around 3.5 GB compressed.

The key challenge was handling the French administrative hierarchy:

Each commune has an INSEE code (5 digits), one or more postal codes, and belongs to exactly one department. Paris, Lyon, and Marseille have arrondissements that function as sub-communes with their own INSEE codes.

We store communes in a french_town_desc table with full hierarchy:

## Address Validation with GeoTrust Scoring

The core feature is POST /addresses/validate. You send a French address, and we return:

The GeoTrust Score is a weighted composite:

## Elasticsearch for City Autocomplete

City autocomplete needs to be fast: under 50ms for a good UX. We use Elasticsearch's Completion Suggester with a custom analyzer:

The ASCII folding is critical for French cities. Users type "Beziers" but the official name is "Béziers". Our analyzer handles both.

The GET /cities/autocomplete?q=marseil&limit=5 endpoint returns results in under 50ms, even with 35,000+ communes indexed.

We also support fuzzy search with GET /cities/search?q=Monplier. Using Elasticsearch's fuzziness AUTO parameter, this correctly returns "Montpellier" despite the typos.

## Multi-Tenant API Keys & Rate Limiting

GEOREFER is a SaaS with 5 subscription plans:

Each API key gets its own token bucket (Bucket4j) for rate limiting. Authentication goes through a Spring filter chain:

The Feature Gate controls which endpoints each plan can access. For example, company search (/companies) requires PRO or higher, while city search is available on all plans.

## What's Next

We're currently at 16.8 million SIRENE establishments imported and 35,000+ communes indexed. The API handles 39 endpoints across geographic data, address validation, company search, and admin/billing.

If you're building anything that touches French addresses or company data, give it a try:

In the next article, we'll deep-dive into how we query 16.8M SIRENE establishments in 66ms using PostgreSQL trigram indexes.

AZMORIS Engineering, "Software that Endures"
```text
Java 11 + Spring Boot 2.7.5
PostgreSQL 16 (42M+ rows across 12 tables)
Redis 7 (API key cache, TTL 5min)
Elasticsearch 7.17 (city autocomplete, fuzzy search)
```
```text
REST Controllers (17 controllers, 39 endpoints)
        |
Business Services (12 interfaces, 16 implementations)
        |
Repositories (JPA + Elasticsearch)
        |
PostgreSQL + Redis + Elasticsearch
```
```text
Region (18) → Department (101) → Commune (35,000+) → Address (26M)
```
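The import steps described in this article (stream the CSV, never hold the full file in memory, insert in batches of 5000) can be sketched roughly as follows. This is a minimal illustration, not GEOREFER's actual loader: the class name `BanImportSketch`, the `;` separator, the naive `split` (no quoted-field handling), and the `Consumer` sink are all assumptions. In a real loader the sink would typically call `PreparedStatement.addBatch()` for each row and `executeBatch()` once per batch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: streams a CSV and hands rows to a sink in fixed-size batches. */
class BanImportSketch {

    /** Batch size quoted in the article. */
    static final int BATCH_SIZE = 5000;

    /**
     * Reads the CSV line by line, skips the header row, and flushes a batch to the
     * sink whenever batchSize rows have accumulated. Returns the number of data rows.
     */
    static long importCsv(BufferedReader reader, int batchSize, Consumer<List<String[]>> sink) {
        try {
            reader.readLine(); // skip the header line
            List<String[]> batch = new ArrayList<>(batchSize);
            long total = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // ignore blank lines
                }
                // Assumption: ';' separated, no quoted fields. Use a real CSV parser otherwise.
                batch.add(line.split(";", -1));
                total++;
                if (batch.size() == batchSize) {
                    sink.accept(batch);                // e.g. addBatch() per row, then executeBatch()
                    batch = new ArrayList<>(batchSize); // fresh list so the sink may keep the old one
                }
            }
            if (!batch.isEmpty()) {
                sink.accept(batch); // flush the final partial batch
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `importCsv(reader, BanImportSketch.BATCH_SIZE, rows -> insert(rows))` with a `BufferedReader` over the decompressed export keeps memory bounded by one batch, whatever the file size; `insert` here stands for whatever persistence code you supply.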
```sql
-- Communes whose name starts with "paris", with their department and region
SELECT f.name, f.insee_code, f.postal_code, d.name AS department, r.name AS region
FROM georefer.french_town_desc f
JOIN georefer.department d ON f.department_code = d.code
JOIN georefer.region r ON d.region_code = r.code
WHERE f.name ILIKE 'paris%';
```
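Because every commune belongs to exactly one department, the department can usually be read straight off the 5-character INSEE code: the first two characters in mainland France and Corsica (`2A`, `2B`), and the first three for overseas codes starting with `97`. The sketch below illustrates that rule; the class and method names are hypothetical, and the article does not say whether GEOREFER derives `department_code` this way or loads it from reference files.

```java
/** Hypothetical helper: derives a department code from a commune INSEE code. */
class InseeCodeSketch {

    /**
     * Returns the department code for a 5-character commune INSEE code.
     * Overseas codes (prefix "97") use three characters, all others use two.
     */
    static String departmentCode(String inseeCode) {
        if (inseeCode == null || inseeCode.length() != 5) {
            throw new IllegalArgumentException("INSEE commune code must be 5 characters: " + inseeCode);
        }
        return inseeCode.startsWith("97")
                ? inseeCode.substring(0, 3)  // e.g. 97411 -> 974 (La Réunion)
                : inseeCode.substring(0, 2); // e.g. 75102 -> 75, 2A004 -> 2A
    }
}
```

Note that arrondissement codes such as `75102` (Paris 2e) resolve to the same department as their parent commune, which is what the join on `department_code` above relies on.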
```bash
curl -X POST 'https://georefer.io/geographical_repository/v1/addresses/validate' \
  -H 'Content-Type: application/json' \
  -H 'X-Georefer-API-Key: YOUR_API_KEY' \
  -d '{
    "street_line": "15 Rue de la Paix",
    "postal_code": "75002",
    "city": "Paris",
    "country_code": "FR"
  }'
```
```json
{
  "success": true,
  "data": {
    "validated_address": {
      "street_line": "15 Rue de la Paix",
      "postal_code": "75002",
      "city": "PARIS",
      "country": "France"
    },
    "confidence_score": 95,
    "geotrust_score": {
      "overall": 92,
      "level": "LOW",
      "components": {
        "confidence": 95,
        "geo_consistency": 100,
        "postal_match": 100,
        "country_risk": 0
      }
    }
  }
}
```
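To make "weighted composite" concrete, here is a minimal sketch of how four components like those in the response could be combined into one 0-100 score. The weights, and the choice to invert `country_risk` so that lower risk raises the score, are assumptions made purely for illustration. The article does not publish the real formula, and these weights do not reproduce the `overall: 92` shown in the sample response.

```java
/** Hypothetical sketch of a weighted composite score; weights are NOT GEOREFER's. */
class GeoTrustSketch {

    // Assumed weights for illustration only; they sum to 1.0.
    static final double W_CONFIDENCE = 0.40;
    static final double W_GEO_CONSISTENCY = 0.25;
    static final double W_POSTAL_MATCH = 0.25;
    static final double W_COUNTRY = 0.10;

    /**
     * Combines the four components (each 0-100) into a single 0-100 score.
     * countryRisk is inverted: a risk of 0 contributes the full country weight.
     */
    static int overall(int confidence, int geoConsistency, int postalMatch, int countryRisk) {
        double score = W_CONFIDENCE * confidence
                + W_GEO_CONSISTENCY * geoConsistency
                + W_POSTAL_MATCH * postalMatch
                + W_COUNTRY * (100 - countryRisk);
        // Clamp to the documented 0-100 range before rounding.
        return (int) Math.round(Math.max(0.0, Math.min(100.0, score)));
    }
}
```

Keeping the weights as named constants makes the trade-off explicit: raising `W_COUNTRY`, for example, makes the score more sensitive to where the address is located and less to how well it matched.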
```text
city_analyzer: edge_ngram (min=2, max=15) + ascii_folding
city_search_analyzer: standard + ascii_folding
```
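The effect of combining ASCII folding with edge n-grams can be imitated in a few lines of plain Java, which helps show why "Beziers" and "Béziers" end up matching the same indexed terms. This is a simplified stand-in for what the index-time analyzer produces, not Elasticsearch code: the class name is hypothetical, it treats the whole city name as one token, and `Normalizer`-based folding does not cover ligatures such as "œ" the way Elasticsearch's ASCII folding filter does.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch imitating edge n-grams plus accent folding for one city name. */
class CityAnalyzerSketch {

    /** Strips diacritics (NFD decomposition, then drop combining marks) and lowercases. */
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    /** Returns the prefixes of the folded name with lengths from min to max inclusive. */
    static List<String> edgeNgrams(String name, int min, int max) {
        String folded = fold(name);
        List<String> grams = new ArrayList<>();
        int upper = Math.min(max, folded.length());
        for (int len = min; len <= upper; len++) {
            grams.add(folded.substring(0, len));
        }
        return grams;
    }

    /** A query matches if its folded form is one of the indexed prefixes. */
    static boolean matches(String indexedName, String query) {
        return edgeNgrams(indexedName, 2, 15).contains(fold(query));
    }
}
```

With `min=2`, a single typed character yields no match, which is one reason autocomplete endpoints built this way typically wait for the second keystroke before querying.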
```text
Request → API Key validation (Redis cache) → Quota check → Rate limit → Feature gate → Controller
```
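The "Rate limit" step relies on one token bucket per API key. The article uses Bucket4j for this; the hand-rolled sketch below only illustrates the underlying idea and deliberately does not use Bucket4j's API. The class name, the capacity and refill parameters, and the in-memory map are assumptions. Time is passed in explicitly (in nanoseconds) so the behaviour is deterministic and easy to test.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of per-API-key token buckets; not the Bucket4j API. */
class TokenBucketSketch {
    private final long capacity;          // maximum burst size
    private final double refillPerSecond; // tokens added per second
    private double tokens;                // current balance
    private long lastRefillNanos;         // time of the last refill

    TokenBucketSketch(long capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // a new bucket starts full
        this.lastRefillNanos = nowNanos;
    }

    /** Refills according to elapsed time, then tries to take one token. */
    synchronized boolean tryConsume(long nowNanos) {
        long elapsed = nowNanos - lastRefillNanos;
        if (elapsed > 0) {
            tokens = Math.min(capacity, tokens + elapsed * refillPerSecond / 1_000_000_000.0);
            lastRefillNanos = nowNanos;
        }
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;  // request allowed
        }
        return false;     // bucket empty: the caller would answer HTTP 429
    }

    // One bucket per API key, created lazily on first use.
    private static final ConcurrentHashMap<String, TokenBucketSketch> BUCKETS = new ConcurrentHashMap<>();

    static boolean allow(String apiKey, long capacity, double refillPerSecond, long nowNanos) {
        return BUCKETS
                .computeIfAbsent(apiKey, k -> new TokenBucketSketch(capacity, refillPerSecond, nowNanos))
                .tryConsume(nowNanos);
    }
}
```

In a servlet filter this would be called as `allow(apiKey, planCapacity, planRefill, System.nanoTime())`, with the two plan values looked up from the key's subscription. A multi-instance deployment would need shared state rather than a local map.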
- API Adresse (BAN): Free, but no SLA, rate-limited to 50 req/s, and no company data
- La Poste RNVP: The gold standard for postal validation, but no public REST API
- Google Address Validation: Global coverage, but $0.005/request adds up fast, and no SIRENE integration
- INSEE API SIRENE: Company data, but separate authentication, slow responses (~500ms), and no address validation

- Download the latest BAN CSV export
- Parse with a streaming CSV reader (no full file in memory)
- Batch insert using JDBC batch operations (batch size = 5000)
- Index city data into Elasticsearch for autocomplete

- Confidence score (0-100): how sure we are the address exists
- GeoTrust Score (0-100): composite reliability score for KYC
- Validated address: normalized, corrected, with GPS coordinates
- AFNOR format: postal-standard NF Z 10-011 formatting

- Free tier: 100 requests/day, no credit card required
- Docs: https://georefer.io/docs
- Sign up: https://georefer.io/#signup
- Examples: https://github.com/azmoris-group/georefer-examples