June 18, 2019

Photogrammetric Calibration of the Quad Thermal Camera

by Andrey Filippov

Figure 1. Calibration of the quad visible range + quad LWIR camera.

We’ve got the first results of the photogrammetric calibration of the composite (visible + LWIR) Talon camera described in the previous post and tested the new pattern. In the first experiments with the new software code we got an average reprojection error of 0.067 pix for LWIR, while the visible range quad camera subsystem was calibrated down to 0.036 pix. In the process of calibration we acquired 3 sequences of 8-frame (4 visible + 4 LWIR) image sets, 60 sets from each of the 3 camera positions: 7.4 m from the target on the target center line, and 2 side views from 4.5 m from the target and 2.2 m to the right and left of the center line. From each position the camera axis was scanned over a range of ±40° horizontally and ±25° vertically.

In the first stage of calibration the acquired images (Figure 2 shows both LWIR and visible range images) were processed to extract the detected pattern grid nodes. Each node’s pixel coordinates (x, y), its pair of pattern UV indexes, and its contrast are stored in the form of multi-layer TIFF files. The processed images from Figure 2 are shown as animated GIFs: Figure 3 for LWIR and Figure 4 for the visible range images. Each of the images has a link to the respective TIFF file, with metadata viewable in the ImageJ program.

Figure 2. LWIR and visible range images of the pattern.

The calibration software had to be modified, as it did not have provisions for handling sensors of different resolutions in the same system. Additionally, it was designed for a smaller pattern cell size, so each image had a larger number of detected nodes. We also had to use another method of absolute pattern matching (finding out which of the multiple almost identical measured nodes corresponds to a particular, e.g. center, node of the physical pattern). Earlier, with the fine pitch patterns, we used software-controlled laser pointers; with the current approach we used the fact that the visible subsystem was already calibrated, which provided sufficient data for absolute matching. Only one image for each camera position had a manually marked center pattern cell, and that mark simplified the processing.

Before the LWIR data was used, the visible range images were processed with the Levenberg-Marquardt algorithm (LMA), so the system knew the precise position and orientation of the camera during the capture of each image set. After a preliminary adjustment with an idealized (uniform and flat) pattern grid it was possible to simultaneously refine each physical pattern node location in 3D, similarly to what we did earlier (Figures 4 and 5 in this post). Using the large cell pattern, the visible camera bundle adjustment resulted in a 0.11 pix reprojection error when using just the radial distortion model. This was improved to 0.036 pix when using a non-radial correction to compensate for other irregularities of the lenses.
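A polynomial radial distortion model of the kind referred to above can be sketched in a few lines. The function name and coefficient values below are illustrative, not Elphel’s actual calibration parameters:

```python
import numpy as np

def radial_distort(xy, k, center=(0.0, 0.0)):
    """Apply a polynomial radial distortion model
    r' = r * (1 + k1*r^2 + k2*r^4 + k3*r^6)
    to points given in normalized sensor coordinates."""
    x = xy[..., 0] - center[0]
    y = xy[..., 1] - center[1]
    r2 = x * x + y * y
    scale = 1.0 + k[0] * r2 + k[1] * r2 ** 2 + k[2] * r2 ** 3
    return np.stack([x * scale + center[0], y * scale + center[1]], axis=-1)
```

A non-radial correction adds further terms on top of such a purely radial model to absorb lens irregularities that the radial polynomial cannot represent.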

When the LWIR data was added and each image set had known extrinsic parameters, the absolute grid matching was performed in the following way. Starting with the image with the smallest distance from the target (so that a wrong matching of the grid cell would cause an attitude error larger than the initial precision of the system assembly), the attitude angles of each of the 4 LWIR subcameras were found. Then for each additional image LMA was run with all but 2 parameters known and fixed: the camera base position X and Y coordinates (in the plane parallel to the target). As the position was already known from the visible range subsystem, the X and Y difference was matched to the physical pattern dimensions to determine by how many cells (if any) the assumed grid match is offset. That gave an absolute grid match for all of the LWIR images, and bundle adjustment of the LWIR module intrinsics (focal length, offset from the center pixel, radial distortion coefficients), as well as the module locations in the camera coordinate system and their attitudes, became possible. It resulted in a 0.097 pix reprojection error with just the radial distortion model. We did not expect much from the non-radial correction as the sensor resolution is low, but it still gave some improvement: the residual error dropped to 0.067 pix. It is likely this value can be improved further with better tweaking of the processing parameters; the current result was achieved in the first run.
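The cell-offset step above amounts to rounding the position difference to the pattern pitch. A minimal sketch, with hypothetical names and pitch value:

```python
def grid_cell_offset(xy_lma, xy_visible, cell_pitch):
    """Return the integer (u, v) pattern cell offset implied by the
    difference between the camera X, Y position found by LMA under an
    assumed grid match and the position already known from the
    calibrated visible subsystem (both in the plane parallel to the
    target, cell_pitch in the same units)."""
    return tuple(round((a - b) / cell_pitch) for a, b in zip(xy_lma, xy_visible))

# e.g. a 0.21 m x-difference with a 0.1 m cell pitch means the assumed
# match is off by 2 cells horizontally
```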

Figure 3. Processed pattern grid from LWIR data (click image to download ImageJ Tiff stack).

Figure 4. Processed pattern grid from visible range camera data (click image to download ImageJ Tiff stack).

These calibration results are not that exciting compared to what we will get with the LWIR 3D model reconstruction, but they give us confidence that we are on the right track and our goal is achievable. Having a much higher resolution subsystem conveniently provides ground truth data for training the neural network, something that was much more difficult to achieve with the high resolution visible-range-only systems.

June 6, 2019

3D Perception with Thermal Cameras

by Andrey Filippov

Figure 1. Oleg carrying combined LWIR and visible range camera.

While working on extremely long range 3D visible range cameras we realized that the passive nature of such 3D reconstruction, which does not involve flashing lasers or LEDs as LIDARs and Time-of-Flight (ToF) cameras do, can be especially useful for thermal (a.k.a. LWIR, long wave infrared) vision applications. An SBIR contract granted at the AF Pitch Day provided initial funding for our research in this area and made it possible.

We are now in the middle of the project and there is a lot of development ahead, but we have already tested all the hardware and modified our earlier code to capture images and detect the calibration pattern in them.


October 11, 2018

GPU Implementation of the Tile Processor

by Andrey Filippov

After we coupled the Tile Processor (TP), which performs quad camera image conditioning and produces 2D phase correlation in space-invariant form, with the neural network[1], the TP remained the bottleneck of the tandem. While the inferred network uses the GPU and produces disparity output in 0.5 s (more than 80% of this time is spent on data transfer), the TP required tens of seconds to run on the CPU as a multithreaded Java application. When converted to run on the GPU, the same operation takes just 0.087 seconds for four 5 MPix images, and it is likely possible to optimize the code further; this is our first experience with Nvidia® CUDA™.


September 5, 2018

Neural network doubled effective baseline of the stereo camera

by Andrey Filippov

Figure 1. Network diagram. One of the tested configurations is shown.

A neural network connected to the output of the Tile Processor (TP) reduced the disparity error to half of that of the previously used heuristic algorithms. The TP corrects optical aberrations of the high resolution stereo images, rectifies them, and provides 2D correlation outputs that are space-invariant and so can be efficiently processed with a neural network.
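For reference, plain 2D phase correlation of two tiles can be sketched in a few lines of NumPy. This is a generic textbook version, not the TP’s aberration-correcting implementation:

```python
import numpy as np

def phase_correlation(a, b):
    """2D phase correlation: the cross-power spectrum is normalized to
    unit magnitude, so the result depends only on the relative shift
    (phase) between the two tiles, not on their brightness."""
    spec = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    spec /= np.maximum(np.abs(spec), 1e-12)
    # fftshift places zero displacement at the tile center
    return np.fft.fftshift(np.fft.ifft2(spec).real)
```

The correlation peak’s offset from the tile center gives the displacement between the two tiles; a sub-pixel estimate is obtained by interpolating around the peak.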

What is unique in this project compared to other ML applications for image-based 3D reconstruction is that we deal with extremely long ranges (while still keeping a wide field of view); the disparity error reduction means 0.075 pix standard deviation, down from 0.15 pix, for 5 MPix images.

See also: arXiv:1811.08032


July 21, 2018

CVPR 2018 – from Elphel’s perspective

by olga

In this blog article we will recall the most interesting results of Elphel’s participation at the CVPR 2018 Expo, the conversations we had with visitors at the booth, FAQs as well as unusual questions, and what we learned from it. In addition we will explain our current state of development, our near and far goals, and how the exhibition helps to achieve them. The Expo lasted from June 19-21, and each day had its own focus and results, so this article is organized chronologically.

Day One: The best show ever!

June 19, CVPR 2018, booth 132


While we were standing nervously at our booth, thinking: “Is there going to be any interest? Will people come, will they ask questions?”, the first poster session started and a wave of visitors flooded the exhibition floor. Our first guest at the booth spent 30 minutes knowledgeably inquiring about Elphel’s long-range 3D technology and left his business card, saying that he was very impressed. This was a good start to a very busy day full of technical discussions. CVPR is the first exhibition we have participated in where we did not have any problems explaining our projects.

The most common questions that were asked:


July 20, 2018

Two Dimensional Phase Correlation as Neural Network Input for 3D Imaging

by Andrey Filippov

We uploaded an image set with 2D correlation data together with the Python import code for experiments with neural networks, and we are now looking for collaboration with those who would love to apply their DL experience to this new kind of input data. More data will follow, and we welcome feedback to make this data set more useful.

The application area we are interested in is extremely long distance 3D scene reconstruction and ranging with a distance-to-baseline ratio of 1000:1 to 10,000:1, while preserving a wide field of view. An earlier post describes aircraft distance and velocity measurements with up to a 3,000:1 distance-to-baseline ratio and a 60°(H)×45°(V) field of view.
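To illustrate why sub-pixel disparity accuracy matters at these ratios, the basic pinhole stereo relation between disparity and range can be sketched as follows (the baseline and focal length values are illustrative, not camera specifications from this post):

```python
def range_from_disparity(baseline_m, focal_px, disparity_px):
    """Pinhole stereo: range = baseline * focal_length / disparity,
    with the focal length expressed in pixels."""
    return baseline_m * focal_px / disparity_px

# At a 1000:1 distance-to-baseline ratio the disparity is just
# focal_px / 1000 pixels -- only a couple of pixels for a focal
# length on the order of 2000 px -- so range accuracy hinges
# entirely on sub-pixel disparity resolution.
```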

Figure 1. Space-invariant 2D phase correlation data

Data set: source images, 2D correlation tiles, and X3D scene models

The data set contains 2D phase correlation output calculated from the 2592×1936 Bayer mosaic source images captured by the quad stereo camera, and a Disparity Space Image (DSI) calculated from a pair of such cameras. The longer baseline provides higher range resolution, so this DSI serves as the ground truth for a single quad camera. The source images as well as all of the software used are also provided under the GNU/GPLv3 license. The DSI is organized as a 324×242 array; each sample is calculated from the corresponding 16×16 tile. Tiles overlap (as shown in Figure 3) with a stride of 8.
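The tiling scheme above (16×16 tiles with a stride of 8, so one DSI sample per 8×8 pixel step: 2592/8 = 324 and 1936/8 = 242) can be sketched as follows; the reflect padding at the edges is an assumption for illustration, not necessarily what the actual pipeline does:

```python
import numpy as np

def extract_tiles(image, tile=16, stride=8):
    """Cut an image into overlapping tile x tile windows with the given
    stride, one window per DSI sample. Edges are reflect-padded so the
    grid has exactly (H // stride) x (W // stride) samples."""
    h, w = image.shape
    pad = (tile - stride) // 2
    padded = np.pad(image, pad, mode="reflect")
    rows, cols = h // stride, w // stride
    tiles = np.empty((rows, cols, tile, tile), image.dtype)
    for r in range(rows):
        for c in range(cols):
            tiles[r, c] = padded[r * stride:r * stride + tile,
                                 c * stride:c * stride + tile]
    return tiles
```

A 1936×2592 image thus yields a 242×324 grid of tiles, matching the DSI dimensions.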

These data sets are provided together with the fully reconstructed X3D scene models, viewable in the browser (Wavefront OBJ files are also generated). The scene models are different from the raw DSI because the next software stages generate meshes, which frequently leads to over-simplification of the original DSI (so most fronto-parallel objects in the scene provide better range accuracy when probed near the bottom). Each scene is accompanied by a multi-layer TIFF file (*-DSI_COMBO.tiff) that makes it possible to see the difference between the measured DSI and the one used in the rendered X3D model. The file format and the Python import software are documented in Oleg’s post “Reading quad stereo TIFF image stacks in Python and formatting data for TensorFlow”; the data files are directly viewable with ImageJ.


Reading quad stereo TIFF image stacks in Python and formatting data for TensorFlow

by Oleg Dzhimiev

Fig.1 TIFF image stack

The input is a <filename>.tiff, a TIFF image stack generated by an ImageJ Java plugin (using Bio-Formats) with Elphel-specific information in ImageJ-written TIFF tags. Reading and formatting the image data for TensorFlow can be split into the following subtasks:
  1. convert a TIFF image stack into a NumPy array
  2. extract information from the TIFF header tags
  3. reshape/perform a few array manipulations based on the information from the tags.
To do this we have created a few Python scripts (see imagej_tiff.py in the python3-imagej-tiff repository) that use Pillow, NumPy, Matplotlib, etc. Usage: ~$ python3 imagej_tiff.py <filename>.tiff. It will print the header info in the terminal and display the layers (and decoded values) using Matplotlib.
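A minimal sketch of subtask 1, converting a TIFF image stack into a NumPy array with Pillow (the actual imagej_tiff.py also parses the Elphel-specific tags, which is omitted here):

```python
import numpy as np
from PIL import Image

def tiff_stack_to_array(path):
    """Read every page (layer) of a multi-page TIFF into a single
    NumPy array of shape (layers, height, width)."""
    img = Image.open(path)
    layers = []
    # n_frames is the page count for multi-page TIFFs; default to 1
    for i in range(getattr(img, "n_frames", 1)):
        img.seek(i)
        layers.append(np.asarray(img, dtype=np.float32))
    return np.stack(layers)
```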

May 6, 2018

Capturing Aircraft Position with the Long Range Quad Stereo Camera

by Andrey Filippov

Figure 1. Aircraft positions during descent captured with the quad stereo camera. Each animation frame corresponds to the available 3-D model.

While we continue to work on the multi-sensor stereo camera hardware (we plan to double the number of sensors to capture single-exposure HDR image sets) and develop code to get the ground truth data for the CNN training, we had some fun testing the camera for capturing aircraft position in 3-D space. Completely passive method, of course.

We found a suitable spot about 2.5 km from the beginning of the runway 34L of the Salt Lake City international airport (exact location is shown in the model viewer) so approaching aircraft would pass almost over our heads. With the 60°(H)×45°(V) field of view of the camera aircraft are visible when they are 270 m away horizontally.


March 20, 2018

Dual Quad-Camera Rig for Capturing Image Sets

by Andrey Filippov

Figure 1. Dual quad-camera rig mounted on a car

Following the plan laid out in the earlier post, we’ve built a camera rig for capturing training/testing image sets. The rig consists of two quad cameras, as shown in Figure 1. The four identical Sensor Front Ends (SFE) 10338E of each camera use 5 MPix MT9P006 image sensors; we will upgrade the cameras to 18 MPix SFE later this year, and the 103981 circuit boards are in production now.


February 5, 2018

High Resolution Multi-View Stereo: Tile Processor and Convolutional Neural Network

by Andrey Filippov

Figure 1. Multi-board setup for the TP+CNN prototype

Featured on Image Sensors World

This article describes our next steps that will continue the year-long research on high resolution multi-view stereo for long distance ranging and 3-D reconstruction. We plan to fuse the methods of high resolution image calibration and processing, the already emulated functionality of the Tile Processor (TP), the RTL code developed for its implementation, and the Convolutional Neural Network (CNN). Compared to the CNN alone, this approach promises over a hundred times reduction in the number of input features without sacrificing the universality of the end-to-end processing. The TP part of the system is responsible for the high resolution aspects of the image acquisition (such as optical aberration correction and image rectification) and preserves deep sub-pixel super-resolution using an efficient implementation of 2-D linear transforms. The Tile Processor is free of any training; only a few hyperparameters define its operation, and all application-specific processing and “decision making” is delegated to the CNN.
