Technical Blog

Category: OpenCV

Follow a region of an image frame by frame

My objective is to detect a moving object with a stationary camera, and then to track it with a moving camera. This post presents how I implemented the second part.
My first algorithm detects the moving object and draws a square around it.

In this example, the object is a ball. It is quite straightforward to detect, but it could be something more complex like a cat. Therefore, I needed an algorithm that could handle any kind of object.

Once I’ve detected a moving object, I can go to the second step: tracking it.

Tracking the object

We detected an object in frame N. We now have to find it again in the following frames (N + 1, N + 2, etc.) as long as the object stays in the field of view.

We cannot reuse the previous algorithm: the technique used to detect the moving object is a video surveillance technique that only works with a static camera.

It turns out that the camshift (Continuously Adaptive Mean Shift) algorithm is the one we need. It is a modified mean-shift algorithm that adapts to changes in the object's scale, so it still works if the altitude of the camera changes. Given the region in one frame, it finds where that region is in the next frame.
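The core of camshift is the mean-shift search, which can be sketched in plain C as shown below. This is a minimal illustration and not OpenCV's cvCamShift. It assumes a precomputed row-major probability map `prob`, which in camshift is the back-projection of the object's hue histogram. The `Window` type and the `mean_shift` function are hypothetical names. The window is moved onto the centroid of the probability mass it covers, and this repeats until the window stops moving. Camshift also resizes the window from the zeroth moment `m00` after convergence, and that step is omitted here.

```c
/* Search window: top-left corner (x, y), width w and height h, in pixels */
typedef struct { int x, y, w, h; } Window;

/* Moves win onto the centroid of the probability mass it covers, repeating
   until the window stops moving or max_iter is reached. prob is a row-major
   width x height map of values >= 0. Returns the number of iterations used. */
int mean_shift(const float *prob, int width, int height, Window *win, int max_iter)
{
  for (int iter = 0; iter < max_iter; ++iter) {
    double m00 = 0, m10 = 0, m01 = 0; /* zeroth and first moments */
    for (int y = win->y; y < win->y + win->h; ++y) {
      for (int x = win->x; x < win->x + win->w; ++x) {
        if (x < 0 || y < 0 || x >= width || y >= height)
          continue; /* this part of the window lies outside the image */
        double p = prob[y * width + x];
        m00 += p;
        m10 += p * x;
        m01 += p * y;
      }
    }
    if (m00 <= 0)
      return iter; /* no object pixel under the window: nothing to follow */

    /* centroid rounded to the nearest pixel (coordinates are >= 0 here) */
    int cx = (int)(m10 / m00 + 0.5);
    int cy = (int)(m01 / m00 + 0.5);
    int nx = cx - win->w / 2;
    int ny = cy - win->h / 2;
    if (nx == win->x && ny == win->y)
      return iter; /* converged: the window is centred on the centroid */
    win->x = nx;
    win->y = ny;
  }
  return max_iter;
}
```

If the window does not overlap the object at all, `m00` is zero and the search has nothing to follow. This is the source of camshift's sensitivity to large displacements between frames.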

The process to use camshift is the following:

  • Initialize the tracking with the object previously detected in frame N.
  • Call the tracker with the next frame: it finds the object and updates the position of the tracked region. Repeat with each new frame.
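The initialization step is commonly implemented by computing a hue histogram of the detected region. Each new frame is then back-projected through that histogram, which gives every pixel a value describing how object-like its color is, and the tracker searches that map. The sketch below shows both pieces in plain C. It assumes a single-channel hue image with values in [0, 180), which is OpenCV's 8-bit hue range, and an object rectangle that lies inside the image. `HUE_BINS = 16`, `hue_histogram` and `back_project` are illustrative choices. The wrapper used later in this post may differ, for example by masking unreliable pixels.

```c
#include <string.h>

#define HUE_BINS 16 /* illustrative number of histogram bins */

/* Hue histogram of the rectangle (rx, ry, rw, rh) of a row-major hue image
   with values in [0, 180), scaled so that the largest bin equals 1.
   The rectangle is assumed to lie inside the image. */
void hue_histogram(const unsigned char *hue, int width,
                   int rx, int ry, int rw, int rh, float hist[HUE_BINS])
{
  memset(hist, 0, HUE_BINS * sizeof hist[0]);
  for (int y = ry; y < ry + rh; ++y)
    for (int x = rx; x < rx + rw; ++x)
      hist[hue[y * width + x] * HUE_BINS / 180] += 1.0f;

  float max = 0;
  for (int b = 0; b < HUE_BINS; ++b)
    if (hist[b] > max)
      max = hist[b];
  if (max > 0)
    for (int b = 0; b < HUE_BINS; ++b)
      hist[b] /= max;
}

/* Back-projection: each pixel receives the histogram value of its hue bin,
   i.e. how typical its color is of the tracked object. */
void back_project(const unsigned char *hue, int width, int height,
                  const float hist[HUE_BINS], float *prob)
{
  for (int i = 0; i < width * height; ++i)
    prob[i] = hist[hue[i] * HUE_BINS / 180];
}
```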

We run the algorithm on the second frame and display the result as an ellipse:

The ellipse should be a circle, but what matters is that the center of the shape is (almost) the center of the ball. A few parameters of the camshift algorithm can be configured, so there is room for tuning.

The algorithm works in real time:


The result is convincing; however, camshift has some drawbacks:

  • The algorithm uses the hue of the image, so it doesn’t work when the object to track is close to white or black.
  • It requires that the object does not move too much between two consecutive frames.
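The first drawback follows from the RGB-to-HSV conversion. When the three channels are equal (greys, white, black), saturation is zero and the hue is undefined. For dark or washed-out colors, the hue is very sensitive to noise. The sketch below shows the standard conversion plus a simple reliability test. Camshift implementations commonly mask out pixels whose saturation or value falls below some threshold. The function names and the threshold values in the usage example are illustrative assumptions.

```c
/* Converts an 8-bit RGB color to HSV: h in [0, 360) degrees, s and v in [0, 1].
   When all channels are equal (greys, white, black) the hue is undefined and
   this sketch returns h = 0. */
void rgb_to_hsv(unsigned char r, unsigned char g, unsigned char b,
                double *h, double *s, double *v)
{
  double rf = r / 255.0, gf = g / 255.0, bf = b / 255.0;
  double max = rf, min = rf;
  if (gf > max) max = gf;
  if (bf > max) max = bf;
  if (gf < min) min = gf;
  if (bf < min) min = bf;
  double delta = max - min;

  *v = max;
  *s = max > 0 ? delta / max : 0;
  if (delta == 0) { /* no dominant color: the hue carries no information */
    *h = 0;
    return;
  }
  if (max == rf)
    *h = 60 * ((gf - bf) / delta);     /* red is the largest channel */
  else if (max == gf)
    *h = 60 * ((bf - rf) / delta + 2); /* green is the largest channel */
  else
    *h = 60 * ((rf - gf) / delta + 4); /* blue is the largest channel */
  if (*h < 0)
    *h += 360;
}

/* A pixel's hue is only worth using if it is saturated and bright enough;
   smin and vmin are tuning thresholds. */
int hue_is_reliable(double s, double v, double smin, double vmin)
{
  return s >= smin && v >= vmin;
}
```

For example, `rgb_to_hsv(255, 255, 255, &h, &s, &v)` yields `s = 0`, so `hue_is_reliable(s, v, 0.25, 0.1)` returns 0 for white.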

For example, in this image the object or the camera moved too much: the change in position is too large, so the algorithm failed to detect the object.

Some code

I use OpenCV’s cvCamShift function, but it needs quite a bit of surrounding code to work, so I decided to use a wrapper. I found more or less the same code in many different places. I used Billy Lamberta’s C wrapper and built my own program with its camshifting.[ch] files.

#include <stdlib.h>
#include "cv.h"
#include "highgui.h"
#include "camshifting.h" // Billy Lamberta's wrapper (TrackedObj, create_tracked_object, ...)

// returns a CvRect containing the object to track
// here the value is hardcoded, but we should put here the function
// that detects the moving object
static CvRect *get_object_rect(void)
{
  CvRect* object_rect = malloc(sizeof (CvRect));
  *object_rect = cvRect(235, 40, 50, 50);
  return object_rect;
}

int main (void)
{
  enum { nbImages = 6 }; // a constant expression, so files[] can be initialized in C
  const char *files[nbImages] = {"1.jpeg", "2.jpeg", "3.jpeg", "4.jpeg", "5.jpeg", "6.jpeg"};
  IplImage *in[nbImages];

  for (int i = 0; i < nbImages; ++i)
    in[i] = cvLoadImage(files[i], CV_LOAD_IMAGE_COLOR);

  cvNamedWindow("example", CV_WINDOW_AUTOSIZE);

  CvRect* object_rect = get_object_rect();
  /*  // Use this to check that the rectangle is correct
  cvRectangle(in[0],
              cvPoint(object_rect->x, object_rect->y),
              cvPoint(object_rect->x + object_rect->width, object_rect->y + object_rect->height),
              cvScalar(255, 0, 0, 1), 1, 8, 0);
  */
  cvShowImage("example", in[0]); // Display initial image
  cvWaitKey(0); // let highgui draw the window, wait for a key press

  TrackedObj* tracked_obj = create_tracked_object(in[0], object_rect);
  CvBox2D object_box; //area to draw around the object

  IplImage* image;
  //object detection with camshift
  for (int i = 1; i < nbImages; ++i)
  {
    image = in[i];
    //track the object in the new frame
    object_box = camshift_track_face(image, tracked_obj);

    //outline object ellipse
    cvEllipseBox(image, object_box, CV_RGB(0,0,255), 3, CV_AA, 0);
    cvShowImage("example", image);
    cvWaitKey(0);
  }

  //free memory
  destroy_tracked_object(tracked_obj);
  free(object_rect); // the tracker keeps its own copy of the rectangle
  for (int i = 0; i < nbImages; ++i)
    cvReleaseImage(&in[i]);
  cvDestroyWindow("example");

  return 0;
}

Estimate the area of lakes from Google Maps

I am presenting an algorithm to estimate the surface area of lakes from Google Maps images. The objective is to detect the lakes, and compute their area.

The input image

I am using the Map mode of Google Maps. My aim is to focus on the area computation algorithm rather than on preprocessing. That is why I use this mode rather than the satellite view, which is much harder to process. I even slightly edited the image with a graphics editor to remove the names superimposed on the lakes.

The image we will use is a view of three lakes in Bolivia: Salar de Uyuni, Salar de Coipasa and Poopo Lake.

Lake segmentation

The first step is the lake segmentation: we want to detect where the lakes are.
Lakes in Google Maps are shown with an almost uniform color. We can assume that a pixel belongs to a lake if it matches the following rules:

  • abs(Red_Component - 167) <= 5
  • abs(Green_Component - 190) <= 5
  • abs(Blue_Component - 221) <= 5

where abs is the absolute value.
We can build a thresholded image: if the pixel matches these rules, set it to 1, otherwise 0.
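The rules above translate directly into a per-pixel test. The sketch below assumes a packed RGB buffer with 3 bytes per pixel and red first. Images loaded with cvLoadImage store the channels in B, G, R order, so the indices would need swapping in that case. The names `is_lake_pixel` and `threshold_lakes` are illustrative.

```c
#include <stdlib.h>

/* Reference lake color in the Google Maps "Map" style and the per-channel
   tolerance, taken from the rules above */
enum { LAKE_R = 167, LAKE_G = 190, LAKE_B = 221, LAKE_TOL = 5 };

int is_lake_pixel(unsigned char r, unsigned char g, unsigned char b)
{
  return abs(r - LAKE_R) <= LAKE_TOL
      && abs(g - LAKE_G) <= LAKE_TOL
      && abs(b - LAKE_B) <= LAKE_TOL;
}

/* Builds the thresholded image: out[i] = 1 for lake pixels, 0 otherwise.
   rgb is a row-major buffer of width * height pixels, 3 bytes each (R, G, B). */
void threshold_lakes(const unsigned char *rgb, int width, int height,
                     unsigned char *out)
{
  for (int i = 0; i < width * height; ++i)
    out[i] = (unsigned char)is_lake_pixel(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
}
```

For instance, `is_lake_pixel(172, 185, 226)` returns 1. `is_lake_pixel(173, 190, 221)` returns 0 because its red channel is 6 away from 167.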
We obtain the following image:

Lake identification & measure

The next step is to identify each lake, i.e. to get information about each shape we have detected. We label the image using connected components. Our objective is to extract the number of pixels in each lake.
The code looks like this (used to be C, but I added a few C++ structures):

#include <cstdlib>
#include <stack>
#include <utility>
#include <vector>
#include "cv.h"

// Assumed definition (not shown in the original): value of pixel (x, y)
// in the single-channel 8-bit thresholded image
#define ACCESS_PIXEL(img, x, y) CV_IMAGE_ELEM((img), uchar, (y), (x))

// Defined before label() because label() calls it
static int labeling_sub(char **seen, IplImage *img, int x, int y)
{
  int count = 0; // number of pixels in the lake
  std::stack<std::pair<int, int> > stack; // pixels to visit
  stack.push(std::make_pair(x, y)); // first pixel to visit

  while (!stack.empty())
  {
    // getting the pixel to visit and pop it from the stack
    const std::pair<int, int> point = stack.top();
    stack.pop();
    x = point.first;
    y = point.second;
    if (seen[x][y]) // already processed pixel
      continue;
    seen[x][y] = 1; // mark it visited
    count += 1;

    // Visiting the neighbors (8-connectivity, but this is badly done)
    for (int i = x - 1; i <= x + 1; ++i)
      for (int j = y - 1; j <= y + 1; ++j)
        if ((i >= 0) && (i < img->width) // bound checking
            && (j >= 0) && (j < img->height)
            && !seen[i][j] // pixel hasn't been visited yet
            && ACCESS_PIXEL(img, i, j)) // pixel belongs to the lake
          stack.push(std::make_pair(i, j)); // we want to visit (i, j)
  }
  return count;
}

std::vector<int> *label(IplImage *img)
{
  // We want to know if a pixel has been visited or not
  char **seen = (char **) malloc(img->width * sizeof (char *));
  for (int i = 0; i < img->width; ++i)
    seen[i] = (char *) calloc(img->height, 1);

  std::vector<int> *comp_size_cnt = new std::vector<int>(); // nb of pixels per lake

  for (int y = 0; y < img->height; ++y)
    for (int x = 0; x < img->width; ++x)
    {
      if (seen[x][y]) // already marked point
        continue;
      if (ACCESS_PIXEL(img, x, y)) // lake
        comp_size_cnt->push_back(labeling_sub(seen, img, x, y));
    }

  for (int i = 0; i < img->width; ++i)
    free(seen[i]);
  free(seen);
  return comp_size_cnt;
}

I also added a bit of code to these functions to display the result we get, which looks like this:

Area estimation

The final step is to calculate the lake area. The technique is quite straightforward: we know the total number of pixels in the entire image and in each lake, so we can determine the fraction of the map covered by each lake. Using the map’s scale, we can also compute the total area of the map. Finally, multiplying the fraction by the total area gives an estimate of the lake’s area.

I read the scale information manually: I measured the width of the scale bar (in pixels) and read the distance it represents.

The code is the following:

#include <cstddef>
#include <iostream>
#include <vector>

void process_area(int width, int height, // image size
                  const std::vector<int> &size, // number of pixels per lake
                  int scale_pixel, int scale_km) // scale information
{
  float km_per_pixel = (float)scale_km / scale_pixel;
  float real_width = width * km_per_pixel;
  float real_height = height * km_per_pixel;
  float total_area = real_width * real_height; // in km2
  int total_pixel = width * height;

  for (std::size_t i = 0; i < size.size(); ++i) // for each lake...
  {
    float ratio = (float) size[i] / (float) (total_pixel);
    std::cout << "Lake " << i << ": " << ratio * total_area << std::endl;
  }
}
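The computation in process_area can be simplified. If k is the number of kilometres per pixel, then ratio × total_area = (count / (w·h)) × (w·k) × (h·k) = count × k². The image size cancels out, and each pixel simply covers k² km². The sketch below makes the same assumption as process_area: the scale bar is valid over the whole map. Google Maps' Mercator projection makes this only approximately true for large regions. `lake_area_km2` is a hypothetical helper.

```c
/* Area of a lake from its pixel count and the map scale:
   scale_km kilometres correspond to scale_pixel pixels. */
double lake_area_km2(long pixel_count, int scale_pixel, double scale_km)
{
  double km_per_pixel = scale_km / scale_pixel;
  return pixel_count * km_per_pixel * km_per_pixel; /* each pixel covers k * k km2 */
}
```

With a scale bar of 50 pixels for 100 km, each pixel covers 4 km², so a 1000-pixel lake is estimated at 4000 km² regardless of the image size.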


I summarized the areas in this table:

Lake’s name        Computed (km²)   Real (km²)   Error (%)
Salar de Coipasa   2300             2200         4.5
Poopo Lake         2567             2530         1.5
Salar de Uyuni     11121            10582        5.1
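The Error column is the relative error |computed - real| / real × 100. The small helper below reproduces it. The function name is illustrative. For Salar de Uyuni it gives 539 / 10582, or about 5.1 %.

```c
/* Relative error of an estimate, in percent of the reference value */
double relative_error_pct(double computed, double real)
{
  double diff = computed - real;
  if (diff < 0)
    diff = -diff; /* absolute difference */
  return 100.0 * diff / real;
}
```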

A high level of accuracy is hard to obtain. The map is a small image that represents a very large region, so each pixel covers a large area and we can’t be more precise. The borders of the lake can also make a big difference, as shown in the following close-up: it is hard to tell where the lake ends and the shore starts.

The algorithm inspiration

The inspiration comes from the French Wikipedia article about the Monte Carlo method, a family of stochastic methods used to solve deterministic problems.
As an illustration, the article shows a way to estimate a lake’s area using a cannon.
The idea is to randomly fire X cannonballs into a square of known area. N is the number of cannonballs that land in the lake.

You can estimate the area using the following formula:

Area_{Lake} = \frac{N}{X} \times Area_{Ground}

In this article, we consider that one cannonball is fired at each pixel, which removes the stochastic aspect: N is the number of lake pixels and X the total number of pixels in the image.
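For comparison, the stochastic version can be written in plain C. It fires X random shots at a square of known area. The "lake" is replaced by a disc so that the exact answer (πr²) is known. The disc, the xorshift32 generator (chosen so that results are reproducible across platforms) and the function names are all illustrative assumptions.

```c
#include <stdint.h>

/* xorshift32 pseudo-random generator: small and reproducible on every
   platform. The state must never be 0. */
static uint32_t xorshift32(uint32_t *state)
{
  uint32_t x = *state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  *state = x;
  return x;
}

/* Uniform double in (0, 1) */
static double uniform01(uint32_t *state)
{
  return xorshift32(state) / 4294967296.0;
}

/* Hypothetical "lake": a disc of radius r centred in the side x side square */
static int in_lake(double x, double y, double side, double r)
{
  double dx = x - side / 2, dy = y - side / 2;
  return dx * dx + dy * dy <= r * r;
}

/* Fires `shots` random cannonballs (X) at the square, counts those landing
   in the lake (N) and returns N / X * Area_ground. */
double monte_carlo_area(long shots, double side, double r, uint32_t seed)
{
  uint32_t state = seed ? seed : 1u; /* avoid the all-zero state */
  long hits = 0;
  for (long i = 0; i < shots; ++i) {
    double x = uniform01(&state) * side;
    double y = uniform01(&state) * side;
    hits += in_lake(x, y, side, r);
  }
  return (double)hits / shots * side * side;
}
```

The estimate converges slowly: its standard error shrinks as 1/√X. The pixel-by-pixel count used in this article avoids that randomness entirely.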

Create a video with the AR.Drone

The AR.Drone is a convenient source of images: it can fly, it is remotely controlled, etc. In this article we will see how to create avi files from its cameras. We will use OpenCV to do so, so you will probably want to read Use OpenCV with the AR.Drone SDK first.

OpenCV code

We will use the CvVideoWriter structure to build our avi file.
Firstly, we need a function to initialize it.

CvVideoWriter *init_video_writer(const char *fname)
{
  int isColor = 1;
  int fps     = 30;
  int frameW = 320;
  int frameH = 240;
  return cvCreateVideoWriter(fname, // with avi extension
                             CV_FOURCC('D', 'I', 'V', 'X'), //MPEG4
                             fps, cvSize(frameW, frameH), isColor);
}

In my project, this feature is controlled by a button. I added two functions that are called by the button’s callback:

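static CvVideoWriter *video_writer = 0;

void init_video(void)
{
  video_writer = init_video_writer("out.avi"); // file name chosen for illustration
}

void stop_video(void)
{
  // Necessary to have a valid avi file; also resets video_writer to NULL
  cvReleaseVideoWriter(&video_writer);
}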

I added a function to add a frame to the video:

static inline void add_frame(IplImage *img)
{
  if (video_writer) // only while a recording is in progress
    cvWriteFrame(video_writer, img);
}

Finally, we need to call the add_frame function every time we receive a new frame from the drone. I added it in the output_gtk_stage_transform function in Video/video_stage.c.
Underneath the code creating the OpenCV image, I added:

if (video_saving_enabled) // hypothetical flag, set when the user enables video saving
  add_frame(img);

Handling different frame rates

The AR.Drone has two cameras, using two different frame rates:

  • The frontal camera has 15 FPS
  • The vertical camera has 60 FPS

In the previous code, the video was created at 30 FPS, so one camera’s video will look too fast (the frontal one) and the other one too slow (the vertical one). Therefore, we can update the function this way:

CvVideoWriter *init_video_writer(const char *fname, int fps)
{
  int isColor = 1;
  int frameW = 320;
  int frameH = 240;
  return cvCreateVideoWriter(fname, // with avi extension
                             CV_FOURCC('D', 'I', 'V', 'X'), // MPEG4
                             fps, cvSize(frameW, frameH), isColor);
}

Then, we can call it in one of two ways, depending on the camera:

video_writer = init_video_writer("out_horizontal.avi", 15);

or

video_writer = init_video_writer("out_vertical.avi", 60);
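The effect of a mismatched rate is simple arithmetic. Frames captured at capture_fps and played back at file_fps appear file_fps / capture_fps times faster. The 15 FPS frontal stream in a 30 FPS file therefore plays at 2× speed, and the 60 FPS vertical stream at 0.5×. The sketch below computes this. It also covers an alternative that is not used in this post: keep a single fixed-rate file and write each captured frame 0, 1 or more times. The function names are illustrative.

```c
/* Apparent speed when frames captured at capture_fps are stored in a file
   declared at file_fps (2.0 means twice as fast as reality) */
double playback_speed(double capture_fps, double file_fps)
{
  return file_fps / capture_fps;
}

/* Alternative approach: with a fixed-rate file, number of copies of captured
   frame n to write so that playback stays in real time (0 = drop the frame). */
int copies_for_frame(long n, int capture_fps, int file_fps)
{
  long before = (n * file_fps) / capture_fps;       /* output frames due before frame n */
  long after = ((n + 1) * file_fps) / capture_fps;  /* output frames due after frame n */
  return (int)(after - before);
}
```

With this alternative, the drone callback would call add_frame `copies_for_frame(n, capture_fps, 30)` times for each received frame. Each 15 FPS frame is then written twice, and every other 60 FPS frame is dropped.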

Refer to this page for more information about possible codecs.

Use OpenCV with the AR.Drone SDK

OpenCV (Open Source Computer Vision Library) is a powerful image processing library. I will detail in this post how to use it with the AR.Drone’s C SDK.

Compiling the AR.Drone SDK with OpenCV

The first step is to install OpenCV. If you’re using Ubuntu, you may refer to this page.

Once the library is installed, we need to modify the Makefile to add the correct flags. We will edit sdk_demo/Build/Makefile.

To add the correct cflags, find the line:


and add underneath:

GENERIC_INCLUDES += `pkg-config --cflags opencv` 

To add the correct libraries, change the following line:

GENERIC_LIBS=-lpc_ardrone -lgtk-x11-2.0 -lrt

to:

GENERIC_LIBS=-lpc_ardrone -lgtk-x11-2.0 -lrt `pkg-config --libs opencv`

Creating an OpenCV image from the drone’s image

We need to update the output_gtk_stage_transform function in Video/video_stage.c to convert the data received from the drone into an IplImage, the OpenCV image structure. First, we need to add some includes:

#include "cv.h"
#include "highgui.h" // if you want to display images with OpenCV functions

We will use a method close to what we did to create a GdkPixbuf:

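IplImage *ipl_image_from_data(uint8_t* data)
{
  // Header only: the pixels stay in the drone's buffer, nothing is allocated for them
  IplImage *currframe = cvCreateImageHeader(cvSize(320,240), IPL_DEPTH_8U, 3);
  IplImage *dst = cvCreateImage(cvSize(320,240), IPL_DEPTH_8U, 3);

  cvSetData(currframe, data, 320*3); // each row is 320*3 bytes long
  cvCvtColor(currframe, dst, CV_BGR2RGB); // swap R and B to get OpenCV's BGR order
  cvReleaseImageHeader(&currframe); // dst holds its own copy of the pixels
  return dst;
}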

We call it from output_gtk_stage_transform in Video/video_stage.c:

IplImage *img = ipl_image_from_data((uint8_t*)in->buffers[0]);

Vertical camera handling

As detailed in a previous article, the images captured by the vertical camera are smaller than those of the horizontal camera. The transmitted data has the same size in both cases, but with empty pixels. I updated ipl_image_from_data accordingly:

IplImage *ipl_image_from_data(uint8_t* data, int reduced_image)
{
  IplImage *currframe;
  IplImage *dst;

  if (!reduced_image) {
    currframe = cvCreateImageHeader(cvSize(320,240), IPL_DEPTH_8U, 3);
    dst = cvCreateImage(cvSize(320,240), IPL_DEPTH_8U, 3);
  } else {
    currframe = cvCreateImageHeader(cvSize(176, 144), IPL_DEPTH_8U, 3);
    dst = cvCreateImage(cvSize(176,144), IPL_DEPTH_8U, 3);
  }
  // Rows in the drone's buffer are always 320*3 bytes apart (widthStep);
  // for the vertical camera only the first 176*3 bytes of each row are used
  cvSetData(currframe, data, 320*3);
  cvCvtColor(currframe, dst, CV_BGR2RGB);
  cvReleaseImageHeader(&currframe); // dst holds its own copy of the pixels
  return dst;
}

The trick is the same as detailed in the previous article: we specify that each new row starts every 320*3 bytes, but only the first 176*3 bytes of each row are used.
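The widthStep trick can be illustrated in plain C. Rows of the source buffer are src_stride bytes apart, but only the first width * 3 bytes of each row contain pixels. The sketch below copies such a region into a packed buffer and swaps the first and third channels. This is what the conversion above does when the row step is set to 320*3. It only illustrates the memory layout and is not OpenCV's implementation. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies a width x height region of 3-byte pixels out of a buffer whose rows
   are src_stride bytes apart into a packed buffer (rows width * 3 bytes long),
   swapping the first and third channels (RGB <-> BGR). */
void copy_strided_swap_rb(const uint8_t *src, int src_stride,
                          int width, int height, uint8_t *dst)
{
  for (int y = 0; y < height; ++y) {
    const uint8_t *row = src + (size_t)y * src_stride; /* skip the unused tail of previous rows */
    for (int x = 0; x < width; ++x) {
      uint8_t *out = dst + ((size_t)y * width + x) * 3;
      out[0] = row[x * 3 + 2];
      out[1] = row[x * 3 + 1];
      out[2] = row[x * 3 + 0];
    }
  }
}
```

For the vertical camera, the call would be `copy_strided_swap_rb(data, 320 * 3, 176, 144, dst)` with a dst buffer of 176 * 144 * 3 bytes.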

Converting OpenCV images to GdkPixbuf

If you’re using a GTK interface as detailed in previous articles, you may want to display the OpenCV image inside your GTK Window. To do this, I use the following function to create a GdkPixbuf structure that can be displayed by GTK:

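GdkPixbuf* pixbuf_from_opencv(IplImage *img, int resize)
{
  IplImage* converted = cvCreateImage(cvSize(img->width, img->height), IPL_DEPTH_8U, 3);
  cvCvtColor(img, converted, CV_BGR2RGB); // GdkPixbuf expects RGB, OpenCV stores BGR

  // The pixbuf shares converted's pixels (no copy): keep converted alive while res is used
  GdkPixbuf* res = gdk_pixbuf_new_from_data((guchar *)converted->imageData,
                                            GDK_COLORSPACE_RGB, FALSE, 8,
                                            converted->width, converted->height,
                                            converted->widthStep, NULL, NULL);
  if (resize) {
    GdkPixbuf* scaled = gdk_pixbuf_scale_simple(res, 320, 240, GDK_INTERP_BILINEAR);
    g_object_unref(res);
    cvReleaseImage(&converted); // scaled owns a copy of the pixels
    res = scaled;
  }
  return res;
}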