Tom MacWright


For Sense City I wanted to isolate foreground objects in video. This is really a computer vision problem, but there are simple ways to cheat.

This technique works well for my case:

  • A relatively static background
  • Lighting that changes over the course of a day
  • Moving, solid, and opaque objects

Isolating the Background

There are always foreground elements in the scene, like pedestrians, birds, or cars. This rules out the approach of choosing a single reference frame and computing foreground elements relative to an empty street. So, we’ll need a more flexible definition of foreground and background:

  • Foreground is something that isn’t there most of the time
  • Background is something that is there most of the time

Start with three frames. There are no ‘clean’ frames in this sample - each one contains cars, pedestrians, bicyclists, and so on.

To isolate the background, we’re going to run an operation over every pixel in each of these images. For instance, for the very top left corner, if that pixel is black for 2 frames and white for 1, running an average over the frames will make it a dark gray, two-thirds of the way from white to black.
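As a rough sketch of what "an operation over every pixel" means, here is a per-pixel mean in plain Python. The tiny 2x2 grayscale frames and the `combine` helper are made up for illustration; real frames would be full images, usually handled with an array library.

```python
from statistics import mean

# Three tiny 2x2 grayscale frames (0 = black, 255 = white). Hypothetical data.
frames = [
    [[0, 100], [200, 50]],
    [[0, 100], [200, 50]],
    [[255, 100], [200, 250]],
]

def combine(frames, stat):
    """Apply `stat` to the values each pixel position takes across all frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [stat([frame[y][x] for frame in frames]) for x in range(width)]
        for y in range(height)
    ]

mean_image = combine(frames, mean)
# The top-left pixel is black twice and white once, so its mean is 85,
# one third of the way from black (0) to white (255).
print(mean_image[0][0])
```

Passing a different function as `stat` swaps the statistic without touching the loop.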

Here’s what a mean looks like:

Ghostly cars are not ideal. The same ghosts haunt min & max:

min, max

Statistical minds will have already figured it out: foreground objects are outliers, so we need a statistic that is robust to them: the median.
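To see why the median holds up where the other statistics do not, it helps to look at a single pixel over time. The brightness values below are invented: each pixel shows road (around 120) except in one frame where a vehicle covers it.

```python
from statistics import mean, median

# Brightness of two pixels across three frames. Hypothetical values.
dark_car_pixel = [120, 122, 30]     # a dark car passes in the third frame
bright_car_pixel = [120, 250, 121]  # a bright car passes in the second frame

results = {}
for name, stat in [("mean", mean), ("min", min), ("max", max), ("median", median)]:
    results[name] = (stat(dark_car_pixel), stat(bright_car_pixel))
    print(name, results[name])
```

The mean is pulled toward the car in both cases, min keeps the dark car, and max keeps the bright one. Only the median lands on a road value for both pixels.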

The result isn’t perfect - some of the areas where bicyclist and motorist overlapped are still noticeable in the green protected area of the intersection.

But it’s quite usable: from here, all we need to do is subtract the median from each frame, and we’re starting to see isolated motion.
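A minimal sketch of that subtraction, again on made-up 2x2 grayscale frames. Taking the absolute difference is my assumption here; a signed difference would work as well, as long as it is handled consistently.

```python
from statistics import median

# Hypothetical frames: mostly road (~120), with an object in two of them.
frames = [
    [[120, 121], [119, 30]],
    [[122, 250], [120, 121]],
    [[121, 120], [121, 120]],
]
height, width = len(frames[0]), len(frames[0][0])

# Per-pixel median across all frames: the estimated background.
background = [
    [median(frame[y][x] for frame in frames) for x in range(width)]
    for y in range(height)
]

def difference(frame, background):
    """Absolute per-pixel difference between a frame and the background."""
    return [
        [abs(p - b) for p, b in zip(frame_row, background_row)]
        for frame_row, background_row in zip(frame, background)
    ]

motion = difference(frames[0], background)
# Large values mark pixels where this frame departs from the background.
print(motion)
```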

Now, instead of simply subtracting the median from each pixel, check whether the difference between the two exceeds a certain threshold and, if it does, return the frame’s original pixel. This way colors are true to what’s visible in the image, rather than flipped and skewed by the difference between the median and frame.
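The thresholding step might look something like this. The cutoff of 25 and the use of `None` to mark background pixels are assumptions for the sketch, not values from the original implementation; a real version would more likely write transparent pixels.

```python
THRESHOLD = 25  # assumed cutoff; tune it for the footage at hand

def isolate(frame, background, threshold=THRESHOLD):
    """Keep a frame pixel only where it differs enough from the background.

    Pixels within the threshold are replaced by None (treated as transparent).
    """
    return [
        [p if abs(p - b) > threshold else None
         for p, b in zip(frame_row, background_row)]
        for frame_row, background_row in zip(frame, background)
    ]

# Hypothetical 2x2 grayscale background and frame.
background = [[121, 121], [120, 120]]
frame = [[120, 121], [119, 30]]

foreground = isolate(frame, background)
# Only the bottom-right pixel survives, with its original value intact.
print(foreground)
```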



The frames we picked for this example are close together in time, so they share similar daylight. A larger sample includes much more diversity in lighting:

The solution to this issue is to use windows: instead of finding the median of all frames in the dataset, run medians over local samples. So, a frame of video at 2pm would be compared against a median of all frames between 2:00 and 2:15pm.
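One way to sketch the windowing is to bucket frames into fixed 15-minute slots and take a separate median for each slot. The timestamps (minutes since midnight), the two-pixel frames, and the `windowed_backgrounds` helper are all invented for the example; a sliding window centered on each frame would be another reasonable choice.

```python
from collections import defaultdict
from statistics import median

WINDOW_MINUTES = 15  # assumed window size, matching the 2:00 to 2:15pm example

# (minutes since midnight, flat list of pixel brightness values). Hypothetical.
timed_frames = [
    (840, [200, 201]),   # 14:00, bright afternoon
    (845, [202, 90]),    # 14:05, an object covers the second pixel
    (850, [201, 200]),   # 14:10
    (1140, [60, 61]),    # 19:00, the same scene at dusk
    (1145, [62, 60]),    # 19:05
    (1150, [61, 150]),   # 19:10, headlights on the second pixel
]

def windowed_backgrounds(timed_frames, window=WINDOW_MINUTES):
    """Return one per-pixel median background for each time window."""
    buckets = defaultdict(list)
    for minute, pixels in timed_frames:
        buckets[minute // window].append(pixels)
    return {
        key: [median(values) for values in zip(*group)]
        for key, group in buckets.items()
    }

backgrounds = windowed_backgrounds(timed_frames)
# Each frame is then compared against the background of its own window.
print(backgrounds[840 // WINDOW_MINUTES], backgrounds[1140 // WINDOW_MINUTES])
```

A single median over all six frames would sit somewhere between afternoon and dusk brightness and flag nearly every pixel as foreground.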

In Code

Implementing this algorithm is simple. I used Sean Gillies’s rasterio Python package for the first pass. For this post, I reimplemented it in JavaScript as isolate-movement. I’ve stashed some of the prettier examples on /myland.

See Also

  • This approach is inspired by the Cloudless Atlas technique, which also uses pixel math to eliminate obstructions.
  • SIOX is a recent solution to a static version of this problem.
  • OpenCV’s optical flow algorithms also try to detect and track motion.