Archive for the ‘Work’ Category

It was fun (and challenging too!) working with the following image for character detection:
Screenshot

 

Trying the algorithm on a random picture containing Indian script (Hindi, precisely) gave this: Screenshot-1

 

(Not Bad!)


It all started with my regular B.Tech. classes at Delhi Technological University. My work and interests keep me far from having a big social life. It turns out that even after two years, I don't know everyone in my class. It's not about being oblivious: I know a few friends completely, while many I recognize by face alone.

One day while travelling, a boy came up to me and started a conversation. He knew me well enough! A face recognition algorithm back in my brain told me he was someone from my class, but my neurons could not retrieve any other information about him (not even his name). I thought he might feel bad if I asked his name, so for the entire conversation I had to pretend I knew him.

After the conversation, I started thinking about a scene from “Mission Impossible: Ghost Protocol”, where an agent’s face recognition system triggers a false positive alarm. Fascinated by this, I thought I would work on a miniature face recognition system of my own, which I could later include in something big (shhh!!….concealed…).

But why face recognition, after all? Why do researchers spend their time on this?

To answer this question, let us work out how we identify someone. We see, observe, and interact with various people, near and far, in our day-to-day life. Our brain is smart enough to recognize each person differently. We identify each other by name and/or appearance, and we even recognize some people by their voice. Of all these ways, we use name and appearance the most, and of all appearance features, the face has the most impact. That is, we most commonly use the face to identify a person.

I represent the above pictorially below:

1

Now, to decide between name and face, let me ask you a simple question. My neighbor’s name is Sunny. Knowing just his name, could you identify him if he came up to you?
Certainly not!
But if I tell you that my neighbor looks like the picture below, maybe you will be able to identify him the next time you see him.

2
MY NEIGHBOR: SUNNY

We interact with the world around us through our senses. Our eyes are our windows into the world. The amount humans learn through their eyes should not be underestimated under any circumstances. Concretely, studies have shown that even three-day-old babies are able to distinguish between known faces.

Getting a bit technical now! We are far from understanding how our brain actually decodes faces, but various attempts have been made to simulate something similar to our brain’s working for face recognition. OpenCV has a few modules that can be used directly for this purpose.

A few useful links:

1) http://www.cognotics.com/opencv/servo_2007_series/part_2/index.html

2) http://docs.opencv.org/trunk/modules/contrib/doc/facerec/tutorial/facerec_video_recognition.html

3) http://docs.opencv.org/trunk/modules/contrib/doc/facerec/facerec_tutorial.html
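The facerec tutorials linked above are built around methods like Eigenfaces. As a toy sketch of that idea (plain numpy rather than OpenCV’s FaceRecognizer classes, with tiny synthetic “faces” standing in for real images), nearest-neighbour matching in a PCA subspace looks roughly like this:

```python
import numpy as np

def train_eigenfaces(faces, num_components=8):
    # faces: (n_samples, n_pixels) matrix of flattened grayscale faces.
    mean = faces.mean(axis=0)
    centered = faces - mean
    # PCA via SVD; the rows of vt are the "eigenfaces".
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:num_components]

def project(face, mean, eigenfaces):
    # Coordinates of one face in the eigenface subspace.
    return eigenfaces @ (face - mean)

def predict(face, mean, eigenfaces, train_proj, labels):
    # Nearest neighbour in the subspace; the distance doubles as a
    # rough prediction confidence (smaller = more confident).
    q = project(face, mean, eigenfaces)
    dists = np.linalg.norm(train_proj - q, axis=1)
    i = int(np.argmin(dists))
    return labels[i], float(dists[i])
```

The distance returned by `predict` is the same kind of quantity shown as “prediction confidence” in the videos below.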

A few output videos:

Face Detection:

Face Recognition

Face Recognition with prediction confidence

Panda goes wild – Reminder notes

Posted: April 23, 2013 in Work
Tags:

Configuring the pandaboard is not that difficult. But once things stop working, it sometimes becomes difficult to answer “why”.

The past few days with panda were not so good. This is how it went. I had two pandaboards. I picked one of them and tried to work with it, but I guess panda was in a bad mood. So was the other panda. After hours of googling at night (00:00 am till 5 am), nothing worked. Eventually, I figured out that there was some problem with the memory card reader’s pins. Luckily, one of the boards started working after a little hardware correction.

Then, on some other test flight, one of my “dearest” (angrily) friends spilled fuel on the pandaboard. Luckily panda was not powered, so nothing went wrong that day. BUT, just before the next test flight (3:00 am), the memory card inside broke. It smelled like fuel; the reason was swelling of the memory card due to the fuel. So I had to reconfigure a new memory card with all the libraries before 6 am.

So I thought I would put up some reminder notes for juniors (like what my seniors did for me):

And the installation begins.

>> Install OpenSSH server

Install the following (sudo apt-get):

gphoto2

exiv2

openssh-server

gcc
g++ (not required though)
pkg-config
libltdl-dev
libexif-dev
libusb-dev
libjpeg62-dev
libpopt-dev
libreadline-dev
libcdk5-dev
make

 

> Rsync without password:
1) Generate keys:
$ ssh-keygen
Enter passphrase (empty for no passphrase):
Enter same passphrase again:

2) Copy the public key to the board:
$ ssh-copy-id -i ~/.ssh/id_rsa.pub panda@192.168.200.10
(and bang)
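With the key installed, rsync runs without prompting. A typical pull of captured images from the board might look like this (the paths below are illustrative, not from the actual setup; the address is the board from step 2):

```shell
# Pull new photos from the pandaboard, no password prompt needed.
rsync -avz --progress panda@192.168.200.10:/home/panda/captures/ ./captures/
```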

I have been searching for some good papers that I could implement to improve my current implementation. After lots of googling and reading a few papers, my eyes were half red. It was around 3 a.m. when I came across Ioannis Katramados and Toby Breckon’s paper “REAL-TIME VISUAL SALIENCY BY DIVISION OF GAUSSIANS”. It was not exactly what I was looking for, but it somehow seemed to satisfy my need, so I thought I would implement it. Since then I have been discussing various things with both I. Katramados and T. Breckon. They have both been really helpful, and so is their paper.

Coming back to the implementation: the paper is written clearly, and it seems easy to implement using OpenCV (SEEMS!).

This is what I am doing:
1) I converted the image to grayscale (32F).
2) According to step one in the paper: “The Gaussian pyramid U comprises n levels, starting with an image U1 as the base with resolution w × h. Higher pyramid levels are derived via downsampling using a 5 × 5 Gaussian filter. The top pyramid level has a resolution of (w/2^(n−1)) × (h/2^(n−1)). Let us call this image Un.”
    This means I simply have to perform the pyrDown() operation in OpenCV. I did it 8 times.
3) According to step two of the paper: “Un is used as the top level Dn of a second Gaussian pyramid D in order to derive its base D1. In this case, lower pyramid levels are derived via upsampling using a 5 × 5 Gaussian filter.”
    I simply performed pyrUp() on the image 8 times.
4) Then comes the pixel-by-pixel division of values, as per the paper.
5) I normalized the result matrix to 0-255.
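The steps above can be sketched end to end. This is a minimal numpy mock-up, not the paper’s implementation: the 2×2 block average and nearest-neighbour upsampling below are crude stand-ins for OpenCV’s 5 × 5 Gaussian pyrDown()/pyrUp(), and the final 1 − min(U/D, D/U) step is just my reading of the division stage:

```python
import numpy as np

def pyr_down(img):
    # Crude stand-in for OpenCV's pyrDown(): 2x2 block average
    # instead of a 5x5 Gaussian filter plus decimation.
    h, w = img.shape
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyr_up(img, shape):
    # Crude stand-in for pyrUp(): nearest-neighbour upsampling to `shape`.
    up = img.repeat(2, axis=0).repeat(2, axis=1)
    pad_h = max(0, shape[0] - up.shape[0])
    pad_w = max(0, shape[1] - up.shape[1])
    up = np.pad(up, ((0, pad_h), (0, pad_w)), mode='edge')
    return up[:shape[0], :shape[1]]

def dog_saliency(img, levels=8):
    # Steps 1-2: pyramid U, downsampling `levels` times from the base U1.
    u = [img.astype(np.float32) + 1.0]  # +1 keeps the division well-defined
    for _ in range(levels):
        u.append(pyr_down(u[-1]))
    # Step 3: pyramid D, upsampling Un back to the base resolution D1.
    d = u[-1]
    for i in range(levels - 1, -1, -1):
        d = pyr_up(d, u[i].shape)
    # Step 4: pixel-by-pixel division; min of the two ratios stays in (0, 1],
    # and 1 - ratio makes large deviations (salient pixels) bright.
    ratio = np.minimum(u[0] / d, d / u[0])
    sal = 1.0 - ratio
    # Step 5: normalize to 0-255.
    rng = sal.max() - sal.min()
    return (sal - sal.min()) / (rng + 1e-9) * 255.0
```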
saliency
I. Katramados has been really generous; both authors of the paper have been really helpful. I guess a few changes to the implementation might result in a better output.
3:41 PM = So now it’s time to follow a few more pieces of advice from Ioannis Katramados.

Finding the Hul(k)l within

Posted: February 10, 2013 in Work
Tags:

Struggle is something that should never stop. Even if I wanted it to stop, time constraints would not allow it. The implementation up till now is good, but it is taking more time than calculated (3-4 seconds). Hence the struggle for a better implementation continues. Today I have been working on finding the convex hull around the object in the image. The idea does not seem so good to me, but trying it is the only option, so I am being a little optimistic for now. As time passes, my assumptions are turning out right: mapping the convex hull around the target back to the original-size image was not really so good, so after a few trials I dropped this idea. The current implementation is good enough, so tweaking it could help us; using the concept of an ROI really helped. Now I am trying what I call another way of finding the saliency map. This should reduce my run time (SHOULD).

Pdf + “Cute” [Touchulator]

Posted: February 3, 2013 in Work
Tags: , ,

So finally I have started my work on “something” new. Let’s see how far this goes. At first I was trying to read a pdf file using Qt. I found that Qt itself does not come with any support for reading pdf. However, there are many APIs available for Qt, one of them being “Poppler”. Poppler (or libpoppler) is a free software library used to render PDF documents. Hence, I tried to use poppler with Qt.

If we skip the initial googling time, the task was not too difficult. All I had to do was build the poppler library, read the pdf using poppler, and render its pages as images. These images could then easily be displayed through Qt.

 

Screenshot from 2013-02-03 13:34:41

 

 

 

For a very basic implementation, this is the code:

QString filename = "/media/BACC8094CC804C97/Docs/Work Docs/PDF/IP/OReilly Learning OpenCV.pdf";

Poppler::Document* document = Poppler::Document::load(filename);
if (!document || document->isLocked())
{
    // … error message …
    delete document;
    return;
}
// Access a page of the PDF file
int pageNumber = 1;
Poppler::Page* pdfPage = document->page(pageNumber); // Document starts at page 0
if (pdfPage == 0)
{
    // … error message …
    delete document;
    return;
}
// Generate a QImage of the rendered page
double xres = 150.0, yres = 200.0;
QImage image = pdfPage->renderToImage(xres, yres, 0, 0, 1000, 1000);
if (image.isNull())
{
    // … error message …
    delete pdfPage;
    delete document;
    return;
}
// … use image …
// after the usage, the page and the document must be deleted
delete pdfPage;
delete document;

 

Segmentation

Posted: January 17, 2013 in Work
Tags: ,

Almost all recognition jobs start with segmentation. Hence, I am giving segmentation top priority. Due to time constraints, I am forced to conduct segmentation on a reduced-size image. When I map the segmented mask to the original image, I get this:

a

This looks cool enough. But it is not, really. I aim for more accuracy. So the struggle begins.

I am half asleep while writing this post. Tonight was not full of debugging (finally!). But today’s trial of mapping the segmented target from the pyrDowned image to the actual image seems useless. The reason: 61 seconds! One of the steps, which is very important for jumping to the recognition step (a step which I cannot name right now), has started taking 61 seconds every time. Such a huge time is obviously an issue of big concern. Though I am feeling too sleepy today, trying this thing out was important.

So here we go with the explanation. A larger image surely contains more info than a smaller one, but it also takes a longer time to process. So processing the pyrDowned image was fair enough! But bringing the size down results in a loss of info. I was able to perform segmentation on the smaller image, but to carry out the identification operation it was important for me to map the segmented object to the one in the actual image. This is the point where the problem arises. If I map the contour of the target from the smaller image, I get a false result most of the time (undesirable). This happens due to the loss of information while reducing the size of the image: the target obtained in the small image is not of the exact shape, and thus mapping the contour of this target yields a false result (or no result).
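The mapping problem can be seen in a few lines: every contour point found on the small image gets multiplied by the pyramid scale factor, and so does every pixel of localisation error (the contour coordinates below are made up purely for illustration):

```python
# Scale factor when the contour was found on an image pyrDowned `levels` times.
levels = 3
scale = 2 ** levels  # each pyrDown halves the resolution

# Hypothetical contour points found on the small image (made-up coordinates).
small_contour = [(10, 12), (11, 12), (11, 13)]

# Mapping back to the full-resolution image multiplies every coordinate --
# and every pixel of localisation error -- by `scale`.
full_contour = [(x * scale, y * scale) for (x, y) in small_contour]
# A 1-pixel error on the small image is now a `scale`-pixel (here 8-pixel)
# error on the full image, which is why the mapped contour misses the target.
```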

And here is the difference!

Posted: December 30, 2012 in Work
Tags:

So, finally, an over-and-out call to python this time. Everything was going fine, but then suddenly a loop that should take no more than a few seconds (2-5 seconds) was taking around 5 minutes. Even after heavily optimizing it, the time could not be reduced to a few seconds as I mentioned. So after hours of struggling and debugging, the Machine Vision department decided to stick to C/C++.

So yesterday, I started once again from scratch, this time in C++. Surely the day was not good (rather, the night, I must say). I started in the evening. I wrote code to find distance contrast within the image using a histogram over the HSV range, and I got some strange outputs. And again the debugging process started. After rechecking everything and handling data types properly, I got nothing. So after hours of searching and debugging, I gave a Mayday, Mayday call to my senior. It must have been around 0:00 am when I asked for his suggestions. I must say I have a very nice senior (Mr. Harsh Agrawal); he helps me every time I give a Mayday call. So it started around 0:00 am. We both went through the code again and again. Then, following my senior’s advice, I started comparing all values in all matrices: printed almost all matrices and compared them, rechecked all calculations. This continued till 4 am, when I figured out that I had used a wrong flag in one of the normalization functions. I was happy that the problem was ultimately resolved, but a silly mistake and a few wrong (manual) comparisons wasted 6-7 hours. So finally I am able to calculate distance contrast perfectly. And I am really thankful to Harsh Agrawal for his support.
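For anyone hitting the same silly mistake: different normalization modes give wildly different value ranges, yet both produce valid-looking matrices. A quick numpy illustration of the two behaviours (mimicking the min-max versus L2-norm styles of OpenCV’s normalize flags, without calling OpenCV itself):

```python
import numpy as np

a = np.array([10.0, 20.0, 30.0])

# Min-max style normalization: rescale so values span [0, 255].
minmax = (a - a.min()) / (a.max() - a.min()) * 255.0

# L2 style normalization: divide by the Euclidean norm instead;
# every value ends up below 1.0.
l2 = a / np.linalg.norm(a)

# Both results are perfectly valid matrices, which is why passing the
# wrong flag to a normalization call produces plausible-looking output
# at a completely different scale -- and hours of debugging.
```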

Today was going still better, and then I started using the cvBlob library. I have been coding in OpenCV C++, so I read images in the Mat format. But the cvBlob library was written in C, which uses the IplImage data structure. It is easy to convert an IplImage data type to Mat, but I needed to do just the opposite.

I was actually using the cvLabel function, for which I needed the image converted into the IplImage format. I tried the following:

IplImage result_image = result;

which indeed worked perfectly for me.

Though today I had to spend some of my time debugging, I gave no Mayday calls and was able to solve all issues within the time constraints.

Nothing much to write today. The team is planning a lot for the future. Lots of planning means lots of changes. It even means a change in my plan.

Till now I have been successful in finding blobs.


Histogram exam and my aching arm

Posted: December 25, 2012 in Work
Tags: , ,

I was thinking of finding a histogram over the HSV ranges in an image, and that too by using python. Yes, I have done this many times using OpenCV functions. But I am trying to see whether I could replace a standard OpenCV histogram with numpy.
Normally, using the createHist() function requires the size of the bins, the dimensions, and the ranges to create a histogram object. The same can be done by using numpy to create a multidimensional array of the same dimensions. After a lot of brainstorming, I needed just 4 lines of code to find the histogram. This proves my claim that I can code in any famous computer language, provided I am given the INTERNET.

So this is how I did it. I do not want to display the histogram, so I focused on formulating a (numpy) array to store the histogram and eliminating the overhead of displaying it. I wanted to find the histogram over just the Hue and Sat ranges of the HSV form. Since OpenCV allows us to specify the channel, this saved me from splitting the image into its three respective channels. I first calculated a histogram for the Hue range and normalized it. After that I followed the same procedure for the S range: first calculate the hist, then normalize it. Merging both of them into different dimensions of a single array gave me the required result.
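A few-line numpy version of this idea (not necessarily my exact 4 lines: this builds a joint 2-D H-S histogram with np.histogram2d rather than merging two 1-D ones, and it assumes OpenCV’s 8-bit conventions of H in [0, 180) and S in [0, 256)):

```python
import numpy as np

def hs_histogram(hsv, h_bins=30, s_bins=32):
    # hsv: (rows, cols, 3) uint8 array, H in [0, 180), S in [0, 256),
    # following OpenCV's 8-bit HSV convention.
    h = hsv[..., 0].ravel()
    s = hsv[..., 1].ravel()
    # Joint 2-D histogram over Hue and Saturation in one call;
    # indexing the channels directly avoids splitting the image.
    hist, _, _ = np.histogram2d(h, s, bins=[h_bins, s_bins],
                                range=[[0, 180], [0, 256]])
    # Normalize so all bins sum to 1.
    return hist / max(hist.sum(), 1.0)
```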

The outcome seems simple, but I would just say “mission accomplished!”

I thought the above would work, but it did not. So I struggled more to optimize my code rather than to make it work properly. But after lots of struggling I got an error, and debugging failed too. Python is a good wrap-up language, but I don’t know why I was not able to debug it properly. Maybe because of the runtime nature of opencv.

But after lots of searching I found the solution. I also found that python is smarter, and has a beauty in it. OpenCV needs to properly document its python implementation. So finally: mission accomplished!

The aim is to perform some initial steps involved in mean shift segmentation of an image. To recognize the objects in the image, I first want to remove the texture from it to have effective segmentation. After performing mean shift filtering we get a filtered, “posterized” image with color gradients and fine-grain texture flattened.

In OpenCV, mean shift filtering can be applied to an image using the function PyrMeanShiftFiltering(). Apart from the source and destination images, it takes the radius of the spatial window and the radius of the colour window as its parameters. There are a few more default parameters that are implementation dependent.

At every pixel (X, Y) of the input image (or of the down-sized input image), the function cv::pyrMeanShiftFiltering executes meanshift iterations; that is, the pixel neighborhood in the joint space-color hyperspace is considered:

(x, y):  X − sp ≤ x ≤ X + sp,  Y − sp ≤ y ≤ Y + sp,  ||(R, G, B) − (r, g, b)|| ≤ sr

where (R, G, B) and (r, g, b) are the vectors of color components at (X, Y) and (x, y), respectively (the algorithm does not depend on the color space used, so any 3-component color space can be used instead). Over the neighborhood, the average spatial value (X′, Y′) and the average color vector (R′, G′, B′) are found, and they act as the neighborhood center on the next iteration:

(X, Y) ← (X′, Y′),  (R, G, B) ← (R′, G′, B′).

After the iterations are over, the color components of the initial pixel (that is, the pixel from where the iterations started) are set to the final value (the average color at the last iteration).
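To make the iteration concrete, here is a toy per-pixel version in numpy. It is grayscale-only and unoptimized, purely to illustrate the window/average/recenter loop described above; the real cv::pyrMeanShiftFiltering works on 3-channel color and is far faster:

```python
import numpy as np

def mean_shift_filter_gray(img, sp=2, sr=30.0, max_iter=5):
    # For every pixel, repeatedly average position and intensity over the
    # (2*sp+1)-wide window restricted to intensities within sr of the
    # current value, then recenter on that average.
    h, w = img.shape
    src = img.astype(np.float32)
    out = np.empty_like(src)
    for y in range(h):
        for x in range(w):
            cx, cy, c = x, y, src[y, x]
            for _ in range(max_iter):
                x0, x1 = max(cx - sp, 0), min(cx + sp + 1, w)
                y0, y1 = max(cy - sp, 0), min(cy + sp + 1, h)
                win = src[y0:y1, x0:x1]
                mask = np.abs(win - c) <= sr  # color window: within sr of c
                ys, xs = np.nonzero(mask)
                nx = int(round(xs.mean())) + x0
                ny = int(round(ys.mean())) + y0
                nc = float(win[mask].mean())
                if nx == cx and ny == cy and abs(nc - c) < 1e-3:
                    break  # converged
                cx, cy, c = nx, ny, nc
            # the starting pixel gets the final (converged) average color
            out[y, x] = c
    return out
```

Because the color window excludes pixels differing by more than sr, flat regions get averaged while strong edges survive, which is exactly the “posterized” look described above.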

Now the point is that I have to perform this function on an image obtained by performing pyrDown three times. At the same time, I do not want to lose any info when I pyrUp this image after performing the mean shift filtering. By performing the filtering I get the image below.

But cv::pyrMeanShiftFiltering also has a parameter maxLevel (default value = 0). When maxLevel > 0, a Gaussian pyramid of maxLevel+1 levels is built, and the above procedure is run on the smallest layer first. After that, the results are propagated to the larger layer, and the iterations are run again only on those pixels where the layer colors differ by more than sr from the lower-resolution layer of the pyramid. This makes the boundaries of color regions sharper.