How Web Maps Work

This post aims to summarize the basic elements of web maps at a deep level but without any implementation-specific details: it should roughly describe the paradigm adopted by Leaflet, Modest Maps, and OpenLayers, and the slippy map tilename standard.

To those not indoctrinated into programming, this is the basis of Google Maps and similar sites like OpenStreetMap.

This doesn't describe non-tiled maps, like non-tiled WMS layers supported by OpenLayers.Layer.WMS with singleTile: true.

It does generally describe tiled vector maps, with the obvious caveat that, instead of simply displaying raster data, the browser rasterizes a vector source before adding it to the map.

Goal: Tile Layout

The end result of a map client is a tile layout: given a viewport of, say, 640x480px, a zoom level, and a centerpoint, the client finds all tiles at that zoom level which intersect the viewport. It then arranges them so that tiles are perfectly adjacent.

Data Types

Tiles

Tiles are chunks of raster or vector data. Most commonly, tiles are images of 256x256px, because they're broadly supported, and fast to consume, but they can also be 512x512px.

Tiles can also be JSON, GeoJSON, Protocol buffers, or another kind of vectorized data, so that rasterization can be pushed to the client. This has its advantages and disadvantages, but that's another discussion: to the map client, vector tiles are equivalent to raster tiles except with an additional rendering step in-browser.

Coordinates

Tile coordinates are tuples with three elements:

tile: [zoom, column, row]

Unlike locations and pixel coordinates, tile coordinates uniquely identify tiles because they include a zoom level. They also differ in that coordinates can be interpreted as having area, so the coordinate [2, 0, 0] occupies the square of space between it and [2, 1, 1].

Since tiles are quad-tree indexed, coordinates are converted between zoom levels like so:

tile: [zoom level, column, row] = [zoom level + 1, column * 2, row * 2] = [zoom level + n, column * 2^n, row * 2^n]

So the location of the tile [2, 1, 1] (zoom 2, column & row 1) becomes [3, 2, 2] at zoom level 3, and so on.
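
As a minimal sketch of that rule, assuming coordinates are plain [zoom, column, row] arrays: the helper names zoomTo and containingTile are made up for illustration and aren't taken from any of the libraries mentioned here.

```javascript
// Hypothetical helper: re-express a [zoom, column, row] coordinate
// at another zoom level.
function zoomTo(coordinate, targetZoom) {
    const [zoom, column, row] = coordinate;
    // Each zoom step doubles (zooming in) or halves (zooming out)
    // the column and row.
    const scale = Math.pow(2, targetZoom - zoom);
    return [targetZoom, column * scale, row * scale];
}

// Hypothetical helper: the whole tile that contains a
// (possibly fractional) coordinate.
function containingTile(coordinate) {
    const [zoom, column, row] = coordinate;
    return [zoom, Math.floor(column), Math.floor(row)];
}

console.log(zoomTo([2, 1, 1], 3));                 // [3, 2, 2]
console.log(containingTile(zoomTo([3, 5, 3], 2))); // [2, 2, 1]
```

Zooming out produces fractional columns and rows, which is why the second helper floors them to find the parent tile.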

Ref: Modest Maps Coordinate.js

Locations

Locations are geographical locations: they can be represented as [latitude, longitude] pairs. A location maps to a different, unique coordinate at each zoom level.

Latitude and longitude values are assumed to be WGS84 unless otherwise specified.
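
To make the location-to-coordinate mapping concrete, here is a rough sketch. It assumes the spherical Mercator projection used by the slippy map tilename convention, which this post doesn't otherwise specify, and the function names locationCoordinate and coordinateLocation are illustrative rather than any library's API.

```javascript
// Assumption: spherical Mercator, as in the slippy map tilename convention.
// Locations are [latitude, longitude] in degrees;
// coordinates are [zoom, column, row] and may be fractional.
function locationCoordinate(location, zoom) {
    const [lat, lon] = location;
    const n = Math.pow(2, zoom); // number of tiles along each axis
    const latRad = lat * Math.PI / 180;
    const column = (lon + 180) / 360 * n;
    const row = (1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2 * n;
    return [zoom, column, row];
}

// The inverse: a coordinate back to [latitude, longitude].
function coordinateLocation(coordinate) {
    const [zoom, column, row] = coordinate;
    const n = Math.pow(2, zoom);
    const lon = column / n * 360 - 180;
    const lat = Math.atan(Math.sinh(Math.PI * (1 - 2 * row / n))) * 180 / Math.PI;
    return [lat, lon];
}

// Latitude 0, longitude 0 sits at the shared corner of the four zoom-1 tiles.
console.log(locationCoordinate([0, 0], 1)); // [1, 1, 1]
```

The same location gives a different coordinate for each value of zoom, which is the one-to-many relationship described above.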

Ref: Leaflet LatLng | Modest Maps Location | OpenLayers LonLat

Pixel locations

Pixel locations are the most ephemeral and immediate units in web maps: the pixel location of a point is an [x, y] pair of its offset from the top-left of the map element. Every time that the map moves, all pixel locations become invalid, so positioning elements on the map with pixels requires you to reposition those elements on every call to draw().

Ref: Modest Maps Point | OpenLayers Point

Relationships Between Data Types

You can 'convert' between these different representations, but since each type represents a different concept, it's not 1:1.

Coordinate → Location: Each coordinate maps to one geographical location
Location → Coordinate: Since coordinates have zoom as well as location, locations map to a different coordinate at each zoom level
Coordinate / Location → Point: Coordinates and locations at a particular zoom level map to points, but this changes whenever the map centerpoint moves.
Coordinate → Tile: Coordinates yield tiles: the coordinate [0, 0, 0] could yield a tile with the URL http://c.tile.openstreetmap.org/0/0/0.png. Tiles represent coordinates.
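
Two of these relationships can be sketched in a few lines. This assumes 256px tiles, a map whose state is a center coordinate, and a URL template with {z}, {x}, and {y} placeholders. The standalone function signatures are illustrative assumptions, not any library's API.

```javascript
const TILE_SIZE = 256; // assumed tile edge length in pixels

// Coordinate → Point: pixel offset from the top-left of the map element.
// `center` is the map's center coordinate, `size` is { width, height } in pixels.
function coordinatePoint(coordinate, center, size) {
    // Bring the coordinate to the center's zoom level first.
    const scale = Math.pow(2, center[0] - coordinate[0]);
    const column = coordinate[1] * scale;
    const row = coordinate[2] * scale;
    return {
        x: size.width / 2 + (column - center[1]) * TILE_SIZE,
        y: size.height / 2 + (row - center[2]) * TILE_SIZE
    };
}

// Coordinate → Tile: fill a (hypothetical) URL template.
function tileUrl(template, coordinate) {
    const [zoom, column, row] = coordinate;
    return template
        .replace('{z}', zoom)
        .replace('{x}', column)
        .replace('{y}', row);
}

console.log(tileUrl('http://c.tile.openstreetmap.org/{z}/{x}/{y}.png', [0, 0, 0]));
// http://c.tile.openstreetmap.org/0/0/0.png
console.log(coordinatePoint([2, 1, 1], [2, 1.5, 1.5], { width: 640, height: 480 }));
// { x: 192, y: 112 }
```

Note that coordinatePoint depends on the center, so its result goes stale as soon as the map moves, while tileUrl depends only on the coordinate.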

Functionality

Map State

Given the goal of tile layout and the definition of a tile, let's start thinking about the map client. Like any application, it is defined by its state.

The state of the map is its current zoom level and centerpoint: this changes every time that the map moves, zooms, pans, etc. Other attributes, like layer selections, styling, and such, will be described as configuration.

There are multiple ways to represent state:

A 'geographical location' and a zoom level
A coordinate

The Modest Maps tradition uses a coordinate, so we'll describe those first, but Leaflet and OpenLayers use geographical locations, which are equivalent.

Changing Map State

Changing the map's state (in this case, represented by a Coordinate) is an appropriate place for getters and setters. A minimal API for changing map state could be:

map.setCoordinate(coordinate);

And, indeed, this is what Modest Maps does inside of its higher abstractions. But a typical set of abstractions would be:

map.setCenter(location);
map.setZoom(zoomlevel);

Internally, each of these functions will convert its argument into a coordinate, assign that as the current map state, and call draw(), a function that renders the map onto the page, which is what we'll address in the next section.
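
A rough sketch of how those setters could funnel into one state change. The MapState class, its draw callback, and the locationCoordinate helper are all hypothetical names, and the projection is assumed to be spherical Mercator. This isn't how any particular library is written.

```javascript
// Assumption: spherical Mercator; [lat, lon] in degrees → [zoom, column, row].
function locationCoordinate(location, zoom) {
    const [lat, lon] = location;
    const n = Math.pow(2, zoom);
    const latRad = lat * Math.PI / 180;
    return [
        zoom,
        (lon + 180) / 360 * n,
        (1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2 * n
    ];
}

// Hypothetical minimal map: its whole state is one center coordinate.
class MapState {
    constructor(draw) {
        this.coordinate = [0, 0.5, 0.5]; // middle of the single zoom-0 tile
        this.draw = draw;                // rendering callback
    }
    // Every state change ends up here, so draw() is called in one place.
    setCoordinate(coordinate) {
        this.coordinate = coordinate;
        this.draw(this.coordinate);
        return this;
    }
    // Keep the current zoom, move the center.
    setCenter(location) {
        return this.setCoordinate(locationCoordinate(location, this.coordinate[0]));
    }
    // Keep the current center, rescale it to the new zoom level.
    setZoom(zoom) {
        const [oldZoom, column, row] = this.coordinate;
        const scale = Math.pow(2, zoom - oldZoom);
        return this.setCoordinate([zoom, column * scale, row * scale]);
    }
}

const map = new MapState(function (c) { console.log('draw at', c); });
map.setZoom(2).setCenter([0, -180]);
```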

Rendering

Maps are rendered: typically as <img> elements in <div> elements.

The structure of an images-in-the-DOM library is like so:

<div id='map'>
  <div id='layer'>
    <div id='zoom-0'>
      <img src='tile/z/x/y.png'>
      <img src='tile/z/x/y.png'>
      ...
    </div>
  </div>
</div>

The structure here is extremely similar for libraries, like polymaps, that use SVG instead of HTML elements.

Some libraries, like pixymaps, render to Canvas elements, but the render step is very similar:

// (pseudocode)

// HTML
function positionTile(tile) {
    moveElementToPoint(tile.element, map.coordinatePoint(tile.coordinate));
}

// Canvas
function positionTile(tile) {
    drawElementAtPoint(tile.element, map.coordinatePoint(tile.coordinate));
}

So in the common case, in which it's dealing with HTML elements, the main task of positionTile is positioning.

The simplest way to do this is with absolute positioning inside of relative positioning - the map's parent is relative, the children are absolutely positioned.

An optimization here is to use CSS transforms, which have a few advantages: they can cause fewer reflows, use hardware acceleration in their 3D versions, and, unlike CSS's width and height properties, the scaling transformation is inherited.
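
The two positioning approaches might look something like this. The functions only build style objects, so they can be read without a browser. The names absoluteStyle and transformStyle are invented for this sketch.

```javascript
// Absolute positioning inside a relatively positioned parent.
// `point` is an { x, y } pixel offset from the map's top-left corner.
function absoluteStyle(point) {
    return {
        position: 'absolute',
        left: Math.round(point.x) + 'px',
        top: Math.round(point.y) + 'px'
    };
}

// The same placement expressed as a CSS transform.
// translate3d is the 3D variant; scale applies to the element's children too.
function transformStyle(point, scale = 1) {
    const x = Math.round(point.x);
    const y = Math.round(point.y);
    return {
        position: 'absolute',
        left: '0px',
        top: '0px',
        transformOrigin: '0 0',
        transform: `translate3d(${x}px, ${y}px, 0) scale(${scale})`
    };
}

console.log(absoluteStyle({ x: 192, y: 112 }));
console.log(transformStyle({ x: 192, y: 112 }).transform);
// translate3d(192px, 112px, 0) scale(1)
```

In a browser, either object could be applied with something like Object.assign(tile.element.style, transformStyle(point)).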

Map Interaction

Interaction is typically in the domain of 'handlers' which split up different ways of moving and changing the map. A typical set of interactions might be:

Double-click → zoom in around a point
Drag → pan
MouseWheel → zoom
Single-finger swipe → pan
Double-finger pinch → zoom

These handlers connect to the source of movement - like mouse events in JavaScript or a GestureDetector in Android's Java environment. Internally, they can use the map's API, so they can be exchanged for other handlers without rearchitecting the map.
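
As an illustration of that separation, here is a sketch of a drag handler that only talks to the map through its API. The panBy(dx, dy) method is an assumed part of a hypothetical map object, not the signature of any specific library.

```javascript
// Hypothetical drag handler: turns pointer movement into map.panBy(dx, dy).
// It knows nothing about tiles or rendering, only the map's public API.
function dragHandler(map) {
    let last = null; // last pointer position while dragging, else null
    return {
        down(event) {
            last = { x: event.clientX, y: event.clientY };
        },
        move(event) {
            if (!last) return; // not dragging
            map.panBy(event.clientX - last.x, event.clientY - last.y);
            last = { x: event.clientX, y: event.clientY };
        },
        up() {
            last = null;
        }
    };
}

// Usage with a stand-in map object that just logs.
const handler = dragHandler({
    panBy(dx, dy) { console.log('pan by', dx, dy); }
});
handler.down({ clientX: 10, clientY: 10 });
handler.move({ clientX: 15, clientY: 12 }); // pan by 5 2
handler.up();
```

In a browser, the three functions would be attached to mousedown, mousemove, and mouseup with addEventListener. A touch handler could call the same panBy without the map changing at all.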

Ref: Leaflet Map.Drag.js | Modest Maps mouse.js | OpenLayers Drag.js

Tile Management and Removal

The number of potential tiles grows exponentially with zoom level: there are 4^zoom tiles at each level, so at zoom level 5, 1,024 tiles are possible. The interface therefore needs to intelligently load tiles that are visible and prune those that aren't.
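
The arithmetic is short enough to write down. This assumes the usual quad-tree layout, in which each tile splits into four at the next zoom level. The function names are illustrative.

```javascript
// Tiles along one axis at a zoom level: 1, 2, 4, 8, ...
function tilesPerSide(zoom) {
    return Math.pow(2, zoom);
}

// Total tiles at a zoom level: 1, 4, 16, 64, ...
function tileCount(zoom) {
    return tilesPerSide(zoom) * tilesPerSide(zoom);
}

console.log(tileCount(5));  // 1024
console.log(tileCount(18)); // 68719476736
```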

To find what tiles are on screen, you find the Coordinates of the top-left and bottom-right corners of the screen, and loop through the columns and rows between them:

// (pseudocode)

// The top left corner is the point 0, 0
var tl = map.pointCoordinate(new Point(0, 0));
// The bottom right corner is the size of the map
var br = map.pointCoordinate(new Point(map.width, map.height));
// Looking to build a list of valid tiles
var tiles = [];
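// Loop through the map's space. Corner coordinates may be fractional,
// so round outward to whole tiles
for (var col = Math.floor(tl.column); col < Math.ceil(br.column); col++) {
    for (var row = Math.floor(tl.row); row < Math.ceil(br.row); row++) {
        tiles.push([map.zoom, col, row]);
    }
}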
// Request these tiles

The simplest way to implement tile removal is with an LRU cache: tiles that haven't been displayed recently are the first ones to be flushed from the cache. On the web, this isn't necessarily part of the map client itself, since web browsers implement their own caching.
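
For clients that do manage their own tiles, a minimal LRU cache could be sketched like this. The TileCache name and the string keys are assumptions for the example. It relies on the fact that a JavaScript Map iterates its keys in insertion order.

```javascript
// Hypothetical LRU cache for tiles, keyed by a string such as "zoom/column/row".
class TileCache {
    constructor(limit) {
        this.limit = limit;     // maximum number of tiles to keep
        this.tiles = new Map(); // insertion order doubles as recency order
    }
    // Reading a tile marks it as most recently used.
    get(key) {
        if (!this.tiles.has(key)) return undefined;
        const tile = this.tiles.get(key);
        this.tiles.delete(key);
        this.tiles.set(key, tile);
        return tile;
    }
    // Adding a tile may evict the least recently used one.
    set(key, tile) {
        this.tiles.delete(key);
        this.tiles.set(key, tile);
        if (this.tiles.size > this.limit) {
            const oldest = this.tiles.keys().next().value;
            this.tiles.delete(oldest);
        }
    }
    has(key) {
        return this.tiles.has(key);
    }
}

const cache = new TileCache(2);
cache.set('2/0/0', 'tile A');
cache.set('2/1/0', 'tile B');
cache.get('2/0/0');           // touch A, so B is now the oldest
cache.set('2/1/1', 'tile C'); // evicts B
console.log(cache.has('2/1/0')); // false
```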

Updates

This was rewritten and crossposted on MapBox for its developers section

May 15, 2012 Tom MacWright
@macwright.com on Bluesky, @tmcw@mastodon.social on Mastodon
