Technical Blog

Estimate the area of lakes from Google Maps

I am presenting an algorithm to estimate the surface area of lakes from Google Maps images. The objective is to detect the lakes, and compute their area.

The input image

I am using the Map mode on Google Maps. My aim is to focus on the area computing algorithm, rather than preprocessing. That is why I use this mode rather than the satellite view, which is way more complicated to process. I even modified a bit the image with a graphic editing software to remove the names superimposed on the lakes.

The image we will use is a view of three lakes in Bolivia: Salar de Uyuni, Salar de Coipasa and Poopo Lake.

Lake segmentation

The first step is the lake segmentation: we want to detect where the lakes are.
Lakes in Google Maps are show with an almost uniform color. We can assume that a pixel belongs to a lake if it matches the following rules:

  • abs(Red_Component – 167) <= 5
  • abs(Green_Component – 190) <= 5
  • abs(Blue_Component – 221) <= 5

where abs is the absolute value.
We can build a thresholded image: if the pixel matches these rules, set it to 1, otherwise 0.
We obtain the following image:

Lake identification & measure

Next step is to identify each lake, i.e. getting information about each shape we have detected. We do a labeling process using connected components. Our objective is to extract the number of pixel for each lake.
The code looks like this (used to be C, but I added a few C++ structures):

std::vector<int> *label(IplImage *img)
{
  // We want to know if a pixel has been visited or not
  char **seen = (char **) malloc(img->width * sizeof (char));
  for (int i = 0; i < img->width; ++i)
    seen[i] = (char *) calloc(img->height, 1);

  std::vector<int> *comp_size_cnt = new std::vector<int>(); // nb of pixel per lake

  for (int y = 0; y < img->height; ++y)
    for (int x = 0; x < img->width; ++x)
    {
      if (seen[x][y]) // already marked point
        continue;
      if (ACCESS_PIXEL(img, x, y)) // lake
        comp_size_cnt->push_back(labeling_sub(seen, img, x, y));
    }

  for (int i = 0; i < img->width; ++i)
    free(seen[i]);
  free(seen);
  return comp_size_cnt;
}

static int labeling_sub(char **seen, IplImage *img, int x, int y)
{
  int count = 0; // number of pixels in the lake
  std::stack<std::pair<int, int> > stack; // pixels to visit
  stack.push(std::make_pair(x, y)); // first pixel to visit

  while (!stack.empty())
  {
    // getting the pixel to visit and pop it from the stack
    const std::pair<int, int> point = stack.top();
    stack.pop();
    x = point.first;
    y = point.second;
    if (seen[x][y]) // already processed pixel
      continue;
    seen[x][y] = 1; // mark it visited
    count += 1;

    // Visiting the neighbor (8 connexity, but this is badly done)
    for (int i = x - 1; i <= x + 1; ++i)
      for (int j = y - 1; j <= y + 1; ++j)
      {
	    if ((i >= 0) && (i < img->width) // bound checking
            && (j >= 0) && (j < img->height)
            && !seen[i][j] // pixel hasn't been visited yet
	        && ACCESS_PIXEL(img, i, j) // pixel belongs to the lake
           )
	    stack.push(std::make_pair(i, j)); // we want to visit (i, j)
      }
  }
  return count;
}

I also added a bit of code to these functions to display the result we get, which looks like this:

Area estimation

The final step is to calculate the lake area. The technique is quite straightforward: we know the total number of pixel for the entire image and for each lake. Therefore, we can determine the percentage of the map that is the lake. Using the map’s scale, we can also compute the total area of the map. Finally, we can combine the ratio and the total area to get the estimation of that of the lake’s.

I processed the scale information manually: I determined the scale’s width (in pixels), and read the distance in relation to this scale.

The code is the following:

void process_area(int width, int height, // image size
                  const std::vector<int> &size, // number of pixel per lake
                  int scale_pixel, int scale_km) // scale information
{
  float km_per_pixel = (float)scale_km / scale_pixel;
  float real_width = width * km_per_pixel;
  float real_height = height * km_per_pixel;
  float total_area = real_width * real_height;
  int total_pixel = width * height;

  for (int i = 0; i < size.size(); ++i) // for each lake...
  {
    float ratio = (float) size[i] / (float) (total_pixel);
    std::cout << "Lake " << i << ": " << ratio * total_area <<  std::endl;
  }
}

Results

I summarized the areas in this table:

Lake’s name Computed (km2) Real (km2) Error (%)
Salar de Coipasa 2300 2200 4.3
Poopo Lake 2567 2530 1.5
Salar de Uyuni 11121 10582 5.1

High level of accuracy is hard to obtain. The map is a small image that represents a very large region, so we can’t be more precise. The borders of the lake can also make a big difference, as shown in the following close-up. It is complicated to know where the lake ends and the shore starts.

The algorithm inspiration

The inspiration comes from the french Wikipedia article about the Monte-Carlo Method. It is a set of stochastic methods to solve deterministic problems.
As an illustration, they show a way to estimate a lake’s area using a cannon.
The idea is to shoot randomly X cannonballs in a square with a known area. N represents the number of cannonballs that ended up in the lake.


You can estimate the area using the following formula:

Area_{Lake} = \frac{(X - N)}{N} \times Area_{Ground}

In this article, we consider that we sent one cannonball in each pixel, so we remove the stochastic aspect.