Brown professor Benjamin Kimia’s latest project, “Bridging the Semantic-Metric Gap via Multinocular Image Integration,” has been awarded $1.2 million by the National Science Foundation. The award, to be distributed over four years, will support Kimia’s work on increasing the utility and robustness of 3D computer vision technology.
Humans and other animals can effortlessly and subconsciously reconstruct the 3D world around them from the video imagery streaming to their eyes, and successfully use it for navigation, food-finding, predator avoidance, and the like. 3D computer vision technology has been evolving rapidly to reconstruct the world from a set of cameras and to locate those cameras in the environment. This technology is applicable to navigation (automated driving, robot navigation, and drone flights); manipulation (robotic manufacturing, robotic medical interventions); measurements in metrology; modeling for the entertainment industry; and a host of other applications. As a result, 3D vision has experienced rapid growth in capability, efficiency, and robustness.
Despite this growth, fundamental shortcomings remain that Kimia and his team want to address in order to enlarge the scope of application. First, images from rapidly moving cameras, such as those carried by drones or pedestrians, are often blurry and lack features, and indoor scenes may lack stable features or contain indistinguishable ones. Second, image sensing typically enjoys a high degree of redundancy that current algorithms often discard, forfeiting the high information content inherent in that redundancy. Third, there is often a large gap between the internal representations used in current technology, which are typically point-based, and a semantic representation of the scene, which resonates more with an understanding of an object’s underlying curves and surface patches.
Kimia and collaborators from the University of Texas and the University of Tennessee will work to address the technical challenges of bridging the semantic-metric gap between geometrically accurate 3D point clouds and meshes and semantically meaningful organizations in terms of objects, object parts, spatial layout, and mapping. The team will introduce novel methods for stability analysis into multiview geometry (MVG), further develop tools to solve very large polynomial systems, and create a framework for MVG operations based on curves, surfaces, and their differential geometry. These three streams of research will allow direct, efficient, and reliable integration of information across a large number of views in multinocular vision systems.
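To make the notion of stability concrete, here is a minimal, purely illustrative Python sketch; it is not drawn from the team’s methods, and the cameras, point, and noise level are invented for the example. It triangulates a 3D point from two views using the standard linear (DLT) formulation and shows how shrinking the baseline between the cameras degrades the conditioning of the underlying linear system, so the same small amount of image noise produces a much larger 3D error.

    # Illustrative sketch only (not the project's method): probing the stability
    # of two-view triangulation. Cameras, point, and noise are invented examples.
    import numpy as np

    def projection_matrix(R, t):
        """Build a 3x4 camera matrix P = [R | t] (identity intrinsics for simplicity)."""
        return np.hstack([R, t.reshape(3, 1)])

    def project(P, X):
        """Project a 3D point into normalized image coordinates."""
        x = P @ np.append(X, 1.0)
        return x[:2] / x[2]

    def triangulate(P1, P2, x1, x2):
        """Linear (DLT) triangulation: each view contributes two rows of A @ X = 0."""
        A = np.vstack([
            x1[0] * P1[2] - P1[0],
            x1[1] * P1[2] - P1[1],
            x2[0] * P2[2] - P2[0],
            x2[1] * P2[2] - P2[1],
        ])
        U, S, Vt = np.linalg.svd(A)
        X = Vt[-1]  # right singular vector for the smallest singular value
        # Conditioning proxy: ratio of the largest to the second-smallest singular
        # value, which governs how sensitive the null vector is to perturbations.
        return X[:3] / X[3], S[0] / S[-2]

    rng = np.random.default_rng(0)
    sigma = 1e-3                        # fixed image noise (normalized coordinates)
    X_true = np.array([0.0, 0.0, 5.0])  # a point five units in front of camera 1
    P1 = projection_matrix(np.eye(3), np.zeros(3))

    for baseline in (1.0, 0.01):        # wide vs. near-degenerate camera separation
        P2 = projection_matrix(np.eye(3), np.array([-baseline, 0.0, 0.0]))
        x1 = project(P1, X_true) + sigma * rng.standard_normal(2)
        x2 = project(P2, X_true) + sigma * rng.standard_normal(2)
        X_est, cond = triangulate(P1, P2, x1, x2)
        err = np.linalg.norm(X_est - X_true)
        print(f"baseline={baseline:5.2f}  conditioning={cond:10.1f}  error={err:.3f}")

In the wide-baseline run the recovered point lands close to the truth; as the baseline collapses, the conditioning proxy grows and the identical image noise yields a far larger reconstruction error, which is the kind of instability the project aims to detect and handle.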
“The research explores the often-ignored notion of stability, which we have shown is responsible for many failure cases,” said Kimia. “Unstable data are often unimportant when many stable data are also available, but when data are not plentiful, stability becomes a critical issue.
“Similarly, with video imagery, the relative contribution of each image with respect to another is viewed as incremental, but we have shown that integrating this incremental information can lead to substantial improvements. In addition, we have developed an approach to solving large polynomial systems that is applicable beyond computer vision,” he said.
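For readers curious about what solving a polynomial system involves, the toy example below is illustrative only: minimal problems in multiview geometry, such as five-point relative pose, reduce to systems of polynomial equations in the unknown geometry, and the project’s solvers target far larger systems than this tiny two-equation stand-in, which sympy solves exactly.

    # Toy illustration only, not the project's solver: solving a small system
    # of polynomial equations exactly with sympy.
    from sympy import symbols, solve

    x, y = symbols("x y")
    system = [x**2 + y**2 - 1, x - y]  # a unit circle and a line through the origin
    print(solve(system, [x, y]))       # [(-sqrt(2)/2, -sqrt(2)/2), (sqrt(2)/2, sqrt(2)/2)]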
Kimia’s research, spanning more than 30 years, includes computer vision and medical imaging inspired by neurophysiology and psychophysics. His expertise includes the representation of shape in 2D and 3D, and multiview reconstruction, as applied to large image databases, archaeology, assistance for the blind, odometry, and image-guided treatments.