July 21, 2018

CVPR 2018 – from Elphel’s perspective

by olga

In this blog article we will recall the most interesting results of Elphel's participation in the CVPR 2018 Expo: the conversations we had with visitors at the booth, FAQs as well as unusual questions, and what we learned from them. In addition, we will explain our current state of development, our near and far goals, and how the exhibition helps us achieve them.
The Expo ran from June 19 to 21, and each day had its own focus and results, so this article is organized chronologically.

Day One: The best show ever!

June 19, CVPR 2018, booth 132

While we stand nervously at our booth, wondering: “Is there going to be any interest? Will people come, will they ask questions?”, the first poster session starts and a wave of visitors floods the exhibition floor. Our first guest at the booth spends 30 minutes knowledgeably inquiring about Elphel’s long-range 3D technology and leaves his business card, saying that he is very impressed. It was a good start to a very busy day full of technical discussions. CVPR is the first exhibition we have participated in where we had no problem explaining our projects.

The most common questions that were asked:


Comments: no responses

July 20, 2018

Two Dimensional Phase Correlation as Neural Network Input for 3D Imaging

by Andrey Filippov

We uploaded an image set with 2D correlation data, together with the Python import code, for experiments with neural networks, and are now looking for collaboration with those who would like to apply their deep learning experience to this new kind of input data. More data will follow, and we welcome feedback that helps make this data set more useful.

The application area we are interested in is extremely long distance 3D scene reconstruction and ranging, with a distance-to-baseline ratio of 1000:1 to 10,000:1, while preserving a wide field of view. An earlier post describes aircraft distance and velocity measurements with up to a 3,000:1 distance-to-baseline ratio and a 60°(H)×45°(V) field of view.

Figure 1. Space-invariant 2D phase correlation data

Data set: source images, 2D correlation tiles, and X3D scene models

The data set contains the 2D phase correlation output calculated from the 2592×1936 Bayer mosaic source images captured by the quad stereo camera, and a Disparity Space Image (DSI) calculated from a pair of such cameras. The longer baseline provides higher range resolution, so this DSI serves as the ground truth for a single quad camera. The source images, as well as all the software used, are provided under the GNU/GPLv3 license. The DSI is organized as a 324×242 array – each sample is calculated from the corresponding 16×16 tile. The tiles overlap (as shown in Figure 3) with a stride of 8.
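As a quick sanity check of this geometry, a few lines of Python (a minimal sketch; the function and variable names are ours, not part of the published data set) show how a DSI grid index maps to its 16×16 tile in the 2592×1936 source image:

import numpy as np

# Source image and tile grid geometry from the post
IMG_W, IMG_H = 2592, 1936   # Bayer mosaic source image size
TILE = 16                   # overlapping tile size
STRIDE = 8                  # tile stride

# DSI grid size: one sample per stride step
dsi_w, dsi_h = IMG_W // STRIDE, IMG_H // STRIDE
assert (dsi_w, dsi_h) == (324, 242)

def tile_bounds(tx, ty):
    """Top-left/bottom-right corners of the 16x16 tile for DSI sample (tx, ty)."""
    x0, y0 = tx * STRIDE, ty * STRIDE
    # Tiles near the right/bottom edges extend past the image;
    # presumably handled with margins (an assumption, not from the post).
    return x0, y0, x0 + TILE, y0 + TILE

print(tile_bounds(0, 0))      # (0, 0, 16, 16)
print(tile_bounds(161, 120))  # a tile near the image center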

These data sets are provided together with the fully reconstructed X3D scene models, viewable in the browser (Wavefront OBJ files are also generated). The scene models differ from the raw DSI because the next software stages generate meshes, and that frequently leads to over-simplification of the original DSI (so most fronto-parallel objects in the scene provide better range accuracy when probed near the bottom). Each scene is accompanied by a multi-layer TIFF file (*-DSI_COMBO.tiff) that makes it possible to see the difference between the measured DSI and the one used in the rendered X3D model. The file format and the Python import software are documented in Oleg’s post “Reading quad stereo TIFF image stacks in Python and formatting data for TensorFlow”; the data files are directly viewable with ImageJ.

Comments: no responses

Reading quad stereo TIFF image stacks in Python and formatting data for TensorFlow

by Oleg Dzhimiev

Fig.1 TIFF image stack

The input is a <filename>.tiff – a TIFF image stack generated by an ImageJ Java plugin (using Bio-Formats) with Elphel-specific information in ImageJ-written TIFF tags. Reading and formatting image data for TensorFlow can be split into the following subtasks:
  1. convert a TIFF image stack into a NumPy array
  2. extract information from the TIFF header tags
  3. reshape/perform a few array manipulations based on the information from the tags.
To do this we have created a few Python scripts (see imagej_tiff.py in the python3-imagej-tiff repository) that use Pillow, NumPy, Matplotlib, etc. Usage: ~$ python3 imagej_tiff.py <filename>.tiff – it will print the header info in the terminal and display the layers (and decoded values) using Matplotlib.
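For a rough idea of what the first two subtasks involve, a stack can also be read with Pillow and NumPy alone. This is a minimal sketch, not the actual imagej_tiff.py code; it assumes the ImageJ metadata sits in the standard TIFF ImageDescription tag (where ImageJ normally writes it), and the file name is a placeholder:

import numpy as np
from PIL import Image

def read_tiff_stack(path):
    """Load a multi-page TIFF into a (layers, height, width) NumPy array
    and return it together with the ImageDescription header text."""
    img = Image.open(path)
    layers = []
    for i in range(img.n_frames):
        img.seek(i)                      # switch to page i of the stack
        layers.append(np.array(img))     # decode the page into a NumPy array
    # Tag 270 is the standard TIFF ImageDescription tag (ImageJ metadata)
    header = img.tag_v2.get(270, "")
    return np.stack(layers), header

stack, header = read_tiff_stack("example-DSI_COMBO.tiff")
print(stack.shape, stack.dtype)
print(header)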
Comments: no responses

June 20, 2018

Elphel at CVPR 2018

by olga

Elphel booth at CVPR 2018 Expo

Elphel is presenting at the CVPR 2018 Expo: a Long Range Passive 3D Reconstruction System, providing neural networks with custom training sets and enabling realtime 3D-aware machine learning systems.

Please visit us at CVPR, Booth 132.

You can also find our presentation and related articles on the wiki page https://wiki.elphel.com/wiki/CVPR2018

Comments: no responses

May 6, 2018

Capturing Aircraft Position with the Long Range Quad Stereo Camera

by Andrey Filippov

Figure 1. Aircraft positions during descent captured with the quad stereo camera. Each animation frame corresponds to an available 3-D model.

While we continue to work on the multi-sensor stereo camera hardware (we plan to double the number of sensors to capture single-exposure HDR image sets) and develop code to get the ground truth data for the CNN training, we had some fun testing the camera by capturing aircraft positions in 3-D space. A completely passive method, of course.

We found a suitable spot about 2.5 km from the beginning of runway 34L of the Salt Lake City International Airport (the exact location is shown in the model viewer), so approaching aircraft would pass almost over our heads. With the camera’s 60°(H)×45°(V) field of view, the aircraft become visible when they are 270 m away horizontally.

Comments: no responses

April 20, 2018

Implementing a linux driver for an image sensor for NC393

by Oleg Dzhimiev

Fig.1 MT9F002


This post briefly covers the implementation of a driver for On Semi’s MT9F002 14 MPx image sensor for the 10393 system boards – the steps are more or less general. The driver is included in the latest software/firmware image (20180416). The implemented programmable features are:
  • window size
  • horizontal & vertical mirror
  • color gains
  • exposure
  • fps and trigger-synced ports
  • a frame-based command sequence that allows changing the settings of any frame up to 16 frames ahead (this did not need to be implemented separately, as it is the common part of the driver for all sensors; see the sketch after this list)
  • automatic cable phase adjustment during initialization for cables of various lengths
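To illustrate the frame-based command sequencing mentioned in the list above, here is a purely conceptual Python model. It is an illustration of the idea only; the real sequencer lives in the kernel driver and FPGA, and none of the names below come from the actual Elphel code:

from collections import defaultdict

class FrameCommandSequencer:
    """Toy model of per-frame command queues: commands posted for a
    future frame are applied only when that frame number arrives."""
    MAX_AHEAD = 16  # settings may target up to 16 frames ahead

    def __init__(self):
        self.current_frame = 0
        self.queues = defaultdict(list)   # frame number -> pending commands
        self.registers = {}               # simulated sensor registers

    def post(self, frames_ahead, reg, value):
        if not 0 <= frames_ahead <= self.MAX_AHEAD:
            raise ValueError("target frame out of the allowed window")
        self.queues[self.current_frame + frames_ahead].append((reg, value))

    def on_frame_sync(self):
        """Called once per frame: apply everything queued for this frame."""
        for reg, value in self.queues.pop(self.current_frame, []):
            self.registers[reg] = value   # would be a register write in reality
        self.current_frame += 1

seq = FrameCommandSequencer()
seq.post(3, "exposure", 10000)  # change exposure 3 frames from now
for _ in range(4):
    seq.on_frame_sync()
print(seq.registers)            # {'exposure': 10000}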
Comments: no responses

March 20, 2018

Dual Quad-Camera Rig for Capturing Image Sets

by Andrey Filippov

Figure 1. Dual quad-camera rig mounted on a car

Following the plan laid out in the earlier post, we’ve built a camera rig for capturing training/testing image sets. The rig consists of the two quad cameras shown in Figure 1. The four identical Sensor Front Ends (SFE) 10338E of each camera use 5 MPix MT9P006 image sensors; we will upgrade the cameras to 18 MPix SFE later this year, and the 103981 circuit boards are in production now.

Comments: no responses

February 5, 2018

High Resolution Multi-View Stereo: Tile Processor and Convolutional Neural Network

by Andrey Filippov

Figure 1. Multi-board setup for the TP+CNN prototype

Featured on Image Sensors World

This article describes our next steps, which will continue the year-long research on high resolution multi-view stereo for long distance ranging and 3-D reconstruction. We plan to fuse the methods of high resolution image calibration and processing, the already emulated functionality of the Tile Processor (TP), the RTL code developed for its implementation, and the Convolutional Neural Network (CNN). Compared to a CNN alone, this approach promises an over hundredfold reduction in the number of input features without sacrificing the universality of the end-to-end processing. The TP part of the system is responsible for the high resolution aspects of image acquisition (such as optical aberration correction and image rectification) and preserves deep sub-pixel super-resolution using an efficient implementation of the 2-D linear transforms. The Tile Processor is free of any training; only a few hyperparameters define its operation, and all the application-specific processing and “decision making” is delegated to the CNN.

Comments: one response

January 30, 2018

Photo Finish

by Oleg Dzhimiev

Photo Finish: the “all cars driving in the same direction” effect

Since 2005 and the older 333 model, Elphel cameras have had a Photo Finish mode. First it was ported to the 353 generation, and then from the 353 to the 393 camera systems. In this mode the camera samples scan lines and delivers composite images as video frames. Due to the Bayer pattern of the sensor, the minimal sample height is 2 lines. The maximum fps for the minimal sample height is 2300 line pairs per second. The maximum width of a composite frame can be up to 16384 px (it is determined by WOI_HEIGHT). A sequence of these frames can be simply joined together without any missing scan lines (see the sketch after the notes below). The current firmware (20180130) includes a photo finish demo: http://<camera_ip>/photofinish

A couple of notes on the 393 photo finish implementation:
  • It works in the JP4 format (COLOR=5). Because demosaicing is not performed in this format, no extra scan lines are required, which simplifies the FPGA logic.
  • fps is controlled:
    • by exposure, for the sensor in freerun mode (TRIG=0, delivers the maximum possible fps)
    • by the external or internal trigger period, for the sensor in snapshot mode (TRIG=4, a bit lower fps than in freerun)
See the Photo-finish article on our wiki for instructions and examples.
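Because consecutive composite frames contain contiguous scan lines, joining them is a plain array concatenation. Here is a minimal NumPy sketch, assuming the frames have already been decoded from JP4 into arrays of shape (lines, width) (the decoding step is not shown):

import numpy as np

def join_photo_finish(frames):
    """Concatenate consecutive composite frames along the scan-line axis.

    Since the camera delivers the frames with no missing scan lines
    between them, plain concatenation yields one continuous strip image.
    """
    return np.concatenate(list(frames), axis=0)

# Example with synthetic data: ten 256-line composite frames
frames = [np.zeros((256, 16384), dtype=np.uint8) for _ in range(10)]
strip = join_photo_finish(frames)
print(strip.shape)  # (2560, 16384)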
Comments: no responses

January 8, 2018

Efficient Complex Lapped Transform Implementation for the Space-Variant Frequency Domain Calculations of the Bayer Mosaic Color Images

by Andrey Filippov

This post continues the discussion of small tile space-variant frequency domain (FD) image processing in the camera. It demonstrates that the modulated complex lapped transform (MCLT) of Bayer mosaic color images requires almost three times fewer computational resources than that of the full RGB color data.

“Small Tile” and “Space Variant”

Why “small tile”? Most camera images have a short (up to a few pixels) correlation/mutual information span related to the properties of the acquisition system – optical aberrations cause a single scene object point to influence only a small area of the sensor pixels. When matching multiple images, increasing the window size reduces the lateral (x,y) resolution, so many of the 3D reconstruction algorithms do not use any windows at all and process every pixel individually. Another limitation on the window size comes from the fact that FD conversions (Fourier and similar) in Cartesian coordinates are shift-invariant but are sensitive to scale and rotation mismatch. So, targeting say 0.1 pixel disparity accuracy, the scale mismatch should not cause error accumulation over the window width exceeding that value: a relative scale error s displaces features near the edge of a 16-pixel window by up to about 8·s pixels, so s must stay below roughly 0.1/8 ≈ 1%. With 8×8 tiles (16×16 overlapped), the acceptable scale mismatch (such as focal length variations) should therefore be under 1%. That tolerance is reasonable, but it cannot get much tighter.

What is “space variant”? One of the most universal operations performed in the FD is convolution (also related to correlation), which exploits the convolution-multiplication property. Mathematically, convolution applies the same operation to each of the points of the source data, so a shifted object in the source image produces just a shifted result after convolution. In the physical world this is a close approximation, but not an exact one. Stars imaged by a telescope may have sharper images in the center of the field, but more blurred ones in the peripheral areas. While angularly close stars produce images of almost the same shape, the distant ones do not. This does not invalidate the convolution approach completely, but it requires the kernel to (smoothly) vary over the input images [1,2], making it a space-variant kernel.
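To make “space-variant” concrete, below is a small NumPy sketch (an illustration only, not the camera code) that blurs an image with a Gaussian kernel whose width grows with the distance from the image center, mimicking a telescope that is sharp on axis and blurred at the periphery:

import numpy as np

def space_variant_blur(img, sigma_center=0.5, sigma_edge=2.0, ksize=9):
    """Naive space-variant convolution: each output pixel uses its own
    Gaussian kernel, with sigma interpolated between center and edge."""
    h, w = img.shape
    r = ksize // 2
    padded = np.pad(img, r, mode="reflect")
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    r_max = np.hypot(cy, cx)
    out = np.empty_like(img, dtype=float)
    for y in range(h):
        for x in range(w):
            # kernel width varies smoothly with distance from the center
            t = np.hypot(y - cy, x - cx) / r_max
            sigma = sigma_center + t * (sigma_edge - sigma_center)
            k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
            k /= k.sum()
            out[y, x] = (padded[y:y + ksize, x:x + ksize] * k).sum()
    return out

blurred = space_variant_blur(np.random.rand(64, 64))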

Figure 1. Complex Lapped Transform with DCT-IV/DST-IV: time-domain aliasing cancellation (TDAC) property. a) selection of overlapping input subsequences 2*N-long, multiplication by sine window; b) creating N-long sequences for DCT-IV (left) and DST-IV (right); c) (after frequency domain processing) extending N-long sequence using DCT-IV boundary conditions (DST-IV processing is similar); d) second multiplication by sine window; e) combining partial data

There is another issue related to the space-variant kernels. Fractional pixel shifts are required for multiple steps of the processing: aberration correction (obvious in the case of the lateral chromatic aberration) and image rectification before matching, which accounts for the lens optical distortion, camera orientation mismatch and epipolar geometry transformations. Traditionally this is handled by image rectification that involves re-sampling of the pixel values for a new grid using some type of interpolation. This process distorts the signal data and introduces non-linear errors that reduce the accuracy of the correlation, which is important for subpixel disparity measurements. Our approach completely eliminates resampling: it performs the integer pixel shift in the pixel domain and delegates the residual fractional pixel shift (±0.5 pix) to the FD, where it is implemented as a cosine/sine phase rotator. The multiple sources of the required pixel shift are combined for each tile, and then a single phase rotation is performed as the last step of the pixel domain to FD conversion.
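The phase rotator relies on the Fourier shift property: a shift by d in the pixel domain multiplies the FD coefficients by a pure phase factor. The camera implements this inside the DCT-IV/DST-IV based MCLT; the FFT-based NumPy sketch below is a simplified stand-in for the same idea, applied to a residual sub-pixel shift of a single tile:

import numpy as np

def subpixel_shift(tile, dx, dy):
    """Shift a 2D tile by a fractional amount (dx, dy) via an FD phase
    rotation, using F{f(x - d)}(u) = exp(-2*pi*i*u*d) * F{f}(u)."""
    h, w = tile.shape
    fy = np.fft.fftfreq(h)[:, None]   # cycles per sample, vertical
    fx = np.fft.fftfreq(w)[None, :]   # cycles per sample, horizontal
    phase = np.exp(-2j * np.pi * (fx * dx + fy * dy))
    return np.fft.ifft2(np.fft.fft2(tile) * phase).real

tile = np.zeros((16, 16))
tile[8, 8] = 1.0
shifted = subpixel_shift(tile, 0.5, -0.25)  # residual shift within ±0.5 px
print(shifted[8, 8:11].round(3))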

Frequency Domain Conversion with the Modulated Complex Lapped Transform

The Modulated Complex Lapped Transform (MCLT)[3] can be used to split an input sequence into overlapping fractions that are processed separately and then recombined without block artifacts. A popular application is signal compression, where “processed separately” means compressed by the encoder (possibly lossy) and then reconstructed by the decoder. MCLT is similar to the MDCT that is implemented with DCT-IV, but it additionally preserves the signal phase and allows its frequency domain modification. This feature is required for our application (fractional pixel shifts and asymmetrical lens aberrations modify the phase), so MCLT includes both the MDCT and the MDST (which use DCT-IV and DST-IV respectively). For image processing (a 2D conversion) four sub-transforms are needed (a sketch follows the list):

  • horizontal DCT-IV followed by vertical DCT-IV
  • horizontal DST-IV followed by vertical DCT-IV
  • horizontal DCT-IV followed by vertical DST-IV
  • horizontal DST-IV followed by vertical DST-IV
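These four sub-transforms can be sketched with scipy.fft, which provides DCT-IV and DST-IV directly. This is a separable reference computation only, without the sine windowing and folding of Figure 1, and not the camera’s optimized RTL:

import numpy as np
from scipy.fft import dct, dst

def mclt_2d_subtransforms(tile):
    """Compute the four separable 2D sub-transforms of a (pre-folded) tile:
    horizontal DCT-IV/DST-IV followed by vertical DCT-IV/DST-IV."""
    cc = dct(dct(tile, type=4, axis=1), type=4, axis=0)  # DCT-h, DCT-v
    sc = dct(dst(tile, type=4, axis=1), type=4, axis=0)  # DST-h, DCT-v
    cs = dst(dct(tile, type=4, axis=1), type=4, axis=0)  # DCT-h, DST-v
    ss = dst(dst(tile, type=4, axis=1), type=4, axis=0)  # DST-h, DST-v
    return cc, sc, cs, ss

# In the real pipeline an 8x8 input would come from a 16x16 overlapped tile
# after sine windowing and folding (Figure 1); here a plain 8x8 tile is used.
cc, sc, cs, ss = mclt_2d_subtransforms(np.random.rand(8, 8))
print(cc.shape, sc.shape, cs.shape, ss.shape)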

Comments: no responses