Going into my junior year at Boston University (BU), I got the opportunity to work with a professor in the Engineering Product Innovation Center (EPIC) Robotics Lab. This was a place I had only toured up until then; the research they do with robotics is fascinating. A lot of what this lab focuses on is artificial intelligence (AI) implemented in robots (such as Baxter), quad-rotor UAVs, and ground robots (re-purposed iRobot Roombas without the vacuum). This was exciting for me because I was finally able to work in a research setting while applying some of the technical skills I have been acquiring throughout my time at BU. If you would like to learn more about this lab and what they do, just follow this link: BU EPIC Robotics Lab
The Project: Small: Distributed Semantic Information Processing Applied to Camera Sensor Networks
The title of this project seems very intimidating, so here is the unpacked version. Raspberry Pis are very useful and convenient mini computers; they can be put into any application that requires some processing on the go. The goal of this project is to use a network of Raspberry Pis to detect and extract features of an object. Each Pi module in the network is called a node, and the entire network will have between 10 and 20 nodes. Each of the Pis will be able to perform feature detection and feature extraction, with all of that data processed by the network as a whole. A matching algorithm will be implemented that determines whether multiple nodes have seen the same object by matching the object features seen by different nodes. For example, imagine that a network of these nodes has been set up in a room. A cup with pictures on it is placed in the room; at least one node will detect this cup. Now, if the cup is moved to a different spot in the room, a different node will see it. As a network, it will recognize that this specific cup has been seen in the room before and has been moved. That is the end goal of this project.
What I Do:
My role in this project is to work with one of the graduate students in the lab to develop the nodes mentioned above. This involves the following:
- Learning how to use ROS (Robot Operating System), a framework commonly used to control and communicate with robots (a minimal publisher sketch follows this list).
- Installing TensorFlow on a Raspberry Pi running Ubuntu MATE (a Linux OS). This was harder than it sounds on a computer with so few resources.
- Writing custom Python scripts that combine ROS, TensorFlow, a neural network, and a pre-trained model (offered by Google) to process images on board the Raspberry Pi and detect and label objects. These scripts had to strike a balance between resource use (resources being in limited supply on a Pi) and accuracy; a sketch of the inference step appears after this list.
- Using TensorFlow to develop Python scripts for feature extraction by finding and computing hypercolumns at certain keypoints in an image.
- Update: We have decided to use SIFT feature extraction for this aspect rather than hypercolumns due to inconsistent test results (see the SIFT sketch after this list).
- Establishing a visual memory on the Pi, similar to that of a human, to keep track of and remember what the camera has seen.
- Using OpenCV object-tracking software in tandem with the object detection to follow an object as it moves around the frame. This helps the system better detect and match objects with each other (a tracking sketch follows this list).
- Building the physical nodes by interfacing cameras with the Raspberry Pis and creating housings/mounts for the cameras. These nodes will ultimately be incorporated into the ground robots, so the mounts had to be custom designed and 3D printed.
- Maintaining a research blog to document all of my work (link at the bottom of the page).
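To make the ROS item above concrete, here is a minimal sketch of the kind of node the framework makes easy to write. The topic name and message contents are placeholders of my own, not the actual topics used in the lab:

```python
#!/usr/bin/env python
import rospy
from std_msgs.msg import String

def main():
    # Register this program with the ROS master as a named node.
    rospy.init_node('detector_node')
    # Publish results on a topic; '/detections' is just an example name.
    pub = rospy.Publisher('/detections', String, queue_size=10)
    rate = rospy.Rate(1)  # publish once per second
    while not rospy.is_shutdown():
        pub.publish('cup 0.87')  # e.g. a label and a confidence score
        rate.sleep()

if __name__ == '__main__':
    main()
```

Any other node on the network can subscribe to the same topic, which is what lets separate programs run independently and still share data.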
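The detection step itself is a forward pass through a pre-trained network. The sketch below assumes a frozen graph exported from the TensorFlow Object Detection API (which uses these standard tensor names) and TensorFlow 1.x, as used at the time; the file name is illustrative:

```python
import cv2
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

# Load a frozen pre-trained detection model into a graph.
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

sess = tf.Session(graph=graph)

def detect(frame):
    """Run the detector on one BGR frame; return boxes, scores, classes."""
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    batch = np.expand_dims(rgb, axis=0)  # the model expects a batch dimension
    boxes, scores, classes = sess.run(
        [graph.get_tensor_by_name('detection_boxes:0'),
         graph.get_tensor_by_name('detection_scores:0'),
         graph.get_tensor_by_name('detection_classes:0')],
        feed_dict={graph.get_tensor_by_name('image_tensor:0'): batch})
    return boxes[0], scores[0], classes[0]
```

On a Pi, the balance between resource use and accuracy mostly comes down to which model you load: a lightweight network (such as an SSD MobileNet from the model zoo) trades some accuracy for much lower memory and compute demands.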
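For the feature-extraction side, SIFT is available through OpenCV. Here is a minimal example; in the OpenCV 3.x builds of that era, SIFT lives in the contrib xfeatures2d module (newer versions expose it as cv2.SIFT_create()), and the image file is just a stand-in:

```python
import cv2

# Create a SIFT extractor (contrib module in OpenCV 3.x).
sift = cv2.xfeatures2d.SIFT_create()

image = cv2.imread('cup.jpg', cv2.IMREAD_GRAYSCALE)  # example image
keypoints, descriptors = sift.detectAndCompute(image, None)

# 'descriptors' is an (N, 128) array: one 128-dimensional vector per
# keypoint, which is what gets compared when matching objects across nodes.
print(len(keypoints), descriptors.shape)
```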
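Finally, the tracking item boils down to seeding one of OpenCV's built-in trackers with a bounding box (for example, one produced by the detector) and updating it every frame. The tracker choice and box coordinates here are placeholders:

```python
import cv2

tracker = cv2.TrackerKCF_create()  # one of several trackers in OpenCV contrib
cap = cv2.VideoCapture(0)          # camera feed

ok, frame = cap.read()
bbox = (200, 150, 80, 100)  # (x, y, w, h) seed box, e.g. from the detector
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, bbox = tracker.update(frame)  # follow the object frame to frame
    if found:
        x, y, w, h = [int(v) for v in bbox]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    cv2.imshow('tracking', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
```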
Some Results:
I try to put as much of my progress and results as I can on my blog; however, due to limitations with that site, I cannot easily add things like videos. I also want to highlight some of the more notable results here on this site. Below are some of those results:
6/6/2018: It's Alive!
At this point, I have been working on the project for close to a year. Throughout that time I have made a lot of progress. Right now my software works as follows:
- Three "nodes" (programs) run simultaneously to split the workload.
- One node identifies objects in the video feed using a neural network and a pre-trained model.
- A second node actively tracks the identified objects until they leave the frame, assigning them tracking IDs. Once an object is initially identified, it is tracked regardless of whether the object detection still sees it; even if the detection node stops seeing the object while it is still in frame, the tracker will continue to follow it.
- A final node extracts features of the object and matches the object against the objects in its memory. The matching node can dynamically update its memory, either adding objects it has not seen before or adding more information about objects it has already seen and been tracking. That means that if an object is turned, it is not regarded as a new object; the node knows it is the same object, just rotated. This dynamic memory can be disabled so that the software operates with a static memory of pre-loaded object files instead (as is the case in the video below, where the only object file in memory holds the information for the larger cup). A toy sketch of this matching-with-memory logic appears after this list.
- All of these nodes "publish" (relay) information to the user. This information includes what the detection node has detected, the objects being tracked, and information on the matching. Each node can also display a visualization of what it is doing.
- The detection node draws a box around identified objects with a label of the classification and confidence (this has been disabled in the video below).
- The tracking node draws a box around the objects being tracked with a label of the object ID.
- The matching node draws an eye on the objects being tracked in the video. A green eye means that the object has been seen before; a red one means it has not.
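To illustrate how the matching node's dynamic memory might look, here is a toy sketch: the memory is a dictionary of stored SIFT descriptors, each new observation is matched against every remembered object, and unmatched observations become new entries. All names and thresholds are my own illustrations, not the actual values used in the software:

```python
import cv2
import numpy as np

memory = {}                           # object ID -> stored SIFT descriptors
matcher = cv2.BFMatcher(cv2.NORM_L2)  # brute-force matcher for SIFT vectors

def match_object(descriptors, min_good=10, ratio=0.75):
    """Return (object ID, seen_before) for one set of SIFT descriptors."""
    for obj_id, stored in memory.items():
        matches = matcher.knnMatch(descriptors, stored, k=2)
        # Lowe's ratio test keeps only distinctive matches.
        good = [p[0] for p in matches
                if len(p) == 2 and p[0].distance < ratio * p[1].distance]
        if len(good) >= min_good:
            # Seen before: fold the new view into memory so a rotated
            # object is still recognized as the same object.
            memory[obj_id] = np.vstack([stored, descriptors])
            return obj_id, True       # green eye
    new_id = len(memory)
    memory[new_id] = descriptors      # first sighting: remember it
    return new_id, False              # red eye
```

Freezing `memory` at a set of pre-loaded object files, instead of letting the function add to it, gives the static-memory mode used in the video.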