Why “*small tile*“? Most camera images have a short (up to a few pixels) correlation/mutual information span related to the acquisition system properties – optical aberrations cause a single scene object point to influence a small area of the sensor pixels. When matching multiple images, increasing the window size reduces the lateral (x,y) resolution, so many of the 3d reconstruction algorithms do not use any windows at all and process every pixel individually. Another limitation on the window size comes from the fact that FD conversions (Fourier and similar) in Cartesian coordinates are shift-invariant, but are sensitive to scale and rotation mismatch. So, targeting say 0.1 pixel disparity accuracy, the scale mismatch should not cause error accumulation over the window width exceeding that value. With 8×8 tiles (16×16 overlapped) the acceptable scale mismatch (such as focal length variation) should be under 1%. That tolerance is reasonable, but it could not be made much tighter.
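As a quick sanity check of that tolerance (a worked estimate using the numbers from the text, not part of the original derivation):

```python
# Back-of-the-envelope check of the scale-mismatch tolerance quoted above
# (numbers follow the text: 0.1 pix target accuracy, 16x16 overlapped window).
target_accuracy_pix = 0.1
half_window_pix = 8.0  # largest distance from the tile center in a 16x16 window

# A relative scale mismatch s displaces a feature at distance r from the tile
# center by s * r pixels, so the worst case is at the window edge:
max_scale_mismatch = target_accuracy_pix / half_window_pix  # ~0.0125, about 1%
```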

What is “*space variant*“? One of the most universal operations performed in the FD is convolution (also related to correlation) that exploits the convolution-multiplication property. Mathematically, convolution applies the same operation to each of the points of the source data, so a shifted object in the source image produces just a shifted result after convolution. In the physical world this is a close approximation, but not an exact one. Stars imaged by a telescope may have sharper images in the center, but more blurred ones in the peripheral areas. While (angularly) close stars produce almost identically shaped images, far-apart ones do not. This does not invalidate the convolution approach completely, but requires the kernel to vary (smoothly) over the input images ^{[1, 2]}, making it a space-variant kernel.

There is another issue related to the space-variant kernels. Fractional pixel shifts are required at multiple steps of the processing: aberration correction (obvious in the case of the lateral chromatic aberration) and image rectification before matching, which accounts for lens optical distortion, camera orientation mismatch and epipolar geometry transformations. Traditionally this is handled by image rectification that involves re-sampling of the pixel values to a new grid using some type of interpolation. This process distorts the signal data and introduces non-linear errors that reduce the accuracy of the correlation, which is important for subpixel disparity measurements. Our approach completely eliminates resampling: it performs the integer pixel shift in the pixel domain and delegates the residual fractional pixel shift (±0.5 pix) to the FD, where it is implemented as a cosine/sine phase rotator. Multiple sources of the required pixel shift are combined for each tile, and then a single phase rotation is performed as the last step of the pixel domain to FD conversion.

The Modulated Complex Lapped Transform (MCLT)^{[3]} can be used to split the input sequence into overlapping fragments that are processed separately and then recombined without block artifacts. A popular application is signal compression, where “processed separately” means compressed by the encoder (possibly lossy) and then reconstructed by the decoder. MCLT is similar to the MDCT that is implemented with DCT-IV, but it additionally preserves and allows frequency domain modification of the signal phase. This feature is required for our application (fractional pixel shifts and asymmetrical lens aberrations modify the phase), so MCLT includes both MDCT and MDST (that use DCT-IV and DST-IV respectively). For image processing (2d conversion) four sub-transforms are needed:

- horizontal DCT-IV followed by vertical DCT-IV
- horizontal DST-IV followed by vertical DCT-IV
- horizontal DCT-IV followed by vertical DST-IV
- horizontal DST-IV followed by vertical DST-IV
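The four separable sub-transforms can be sketched as a minimal Python/numpy reference model (not the RTL; the `dtt2d` helper and variable names are illustrative only), built directly from the DCT-IV and DST-IV basis definitions given later in (2) and (3):

```python
# Reference model of the four separable 8x8 sub-transforms (DCT-IV / DST-IV
# along each axis). With sqrt(2/N) scaling both matrices are symmetric and
# orthogonal, hence self-inverse.
import numpy as np

N = 8

def dct_iv_matrix(n=N):
    k, l = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.sqrt(2.0 / n) * np.cos(np.pi / n * (l + 0.5) * (k + 0.5))

def dst_iv_matrix(n=N):
    k, l = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.sqrt(2.0 / n) * np.sin(np.pi / n * (l + 0.5) * (k + 0.5))

def dtt2d(tile, horizontal, vertical):
    """Apply 'horizontal' DTT along rows, then 'vertical' along columns."""
    return vertical @ (tile @ horizontal.T)

C4, S4 = dct_iv_matrix(), dst_iv_matrix()
tile = np.random.default_rng(0).random((N, N))
cc = dtt2d(tile, C4, C4)   # horizontal DCT-IV, vertical DCT-IV
sc = dtt2d(tile, S4, C4)   # horizontal DST-IV, vertical DCT-IV
cs = dtt2d(tile, C4, S4)   # horizontal DCT-IV, vertical DST-IV
ss = dtt2d(tile, S4, S4)   # horizontal DST-IV, vertical DST-IV
```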

Figure 1 illustrates the principle of TDAC (time-domain aliasing cancellation) that restores the initial data from the series of individually converted subsections after seemingly lossy transformations. Step a) shows the extraction of the 2*N long (in our case N=8) data subsets; each subset is multiplied by a window function. It is a half period of the sine function, but other windows are possible as long as they satisfy the Princen-Bradley condition. Each of the sections $(a\dots h)$ corresponds to N/2 of the input samples; color gradients indicate input order. Figure 1b shows the seemingly lossy result of MDCT (left) and MDST (right) performed on the 2*N long input $(a,b,c,d)$, resulting in the N-long $(-\tilde{c}-d,\ a-\tilde{b})$ – the tilde indicates time-reversal of the subsequence (the image has the gradient direction reversed too). Upside-down pieces indicate subtraction, upright ones – addition. Each of the 2*N → N transforms is irreversible by itself; TDAC depends on the neighbor sections.

Figure 1c shows the first step of the original sequence restoration: it extends the N-long sequence using DCT-IV boundary conditions – it continues symmetrically around the left boundary and anti-symmetrically around the right one (both around a point half a sample outside the first/last sample), behaving like the first quadrant (0 to π/2) of the cosine function. Conversions for the DST branch are not shown; they are similar, just extended like the first quadrant of the sine function.

The resulting sequence of step c) (now 2*N long again) is multiplied by the window function for the second time in step d); the two added terms of the last column ($d$ and $\tilde{c}$) are swapped for clarity.

The last image e) places the result of step d) and those of the similarly processed subsequences $(c,d,e,f)$ and $(e,f,g,h)$ on the same time line. The fraction $(\tilde{d},\tilde{c})$ of the first block compensates $(-\tilde{d},-\tilde{c})$ of the second, and $(\tilde{f},\tilde{e})$ of the second – $(-\tilde{f},-\tilde{e})$ of the third. As the window satisfies the Princen-Bradley condition and is applied twice, the $c$ columns of the first and second blocks, when added, produce the $c$ segment of the original sequence. The same is true for the $d$, $e$, and $f$ columns. The first and last N samples are not restored, as there are no respective left and right neighbors to be processed.
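The reconstruction walked through above can be checked numerically. Below is a minimal Python sketch using the textbook orthonormal MDCT/IMDCT formulas (a stand-in for the fold-plus-DCT-IV implementation described later, not the project's RTL): with the half-sine window applied before the forward and after the inverse transform, overlap-add restores all but the first and last N samples exactly.

```python
# TDAC perfect-reconstruction demo: 2N -> N -> 2N per block, overlap-add.
import math

N = 8  # transform length; blocks are 2N = 16 samples with 50% overlap

def w(n):  # half-sine window, satisfies the Princen-Bradley condition
    return math.sin(math.pi / (2 * N) * (n + 0.5))

def mdct(u):  # 2N windowed samples -> N coefficients (orthonormal scaling)
    s = math.sqrt(2.0 / N)
    return [s * sum(u[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                    for n in range(2 * N)) for k in range(N)]

def imdct(X):  # N coefficients -> 2N (aliased) samples
    s = math.sqrt(2.0 / N)
    return [s * sum(X[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                    for k in range(N)) for n in range(2 * N)]

x = [10 + 3 * ((7 * i) % 16) / 16 for i in range(6 * N)]  # arbitrary test signal
y = [0.0] * len(x)
for off in range(0, len(x) - 2 * N + 1, N):               # hop = N (50% overlap)
    u = [x[off + n] * w(n) for n in range(2 * N)]          # analysis window
    r = imdct(mdct(u))                                     # seemingly lossy round trip
    for n in range(2 * N):
        y[off + n] += r[n] * w(n)                          # synthesis window + add

err = max(abs(a - b) for a, b in zip(x[N:-N], y[N:-N]))    # interior samples only
```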

The modified discrete cosine and sine transforms exhibit the perfect reconstruction property when the input sequence is split into regular overlapping intervals, and this is the case for many applications, such as audio and video compression. But what happens in the case of a space-variant shift? As the total shift is split into an integer part and a symmetrical fractional part, even with a smooth variation of the required shift between the neighbor sample sequences there will be places where the required shift crosses the ±0.5 pix boundary. This will cause the overlap to be N±1 instead of exactly N, and the remaining fractional pixel shift will jump from +0.5 pix to -0.5 pix or vice versa.
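A tiny Python illustration of that splitting (the shift values are hypothetical): as a smoothly varying total shift crosses the half-pixel boundary, the integer part applied in the pixel domain jumps by one and the fractional remainder delegated to the FD flips sign.

```python
# Split a total shift into integer (pixel-domain) and fractional (FD) parts.
import math

def split_shift(s):
    i = int(math.floor(s + 0.5))  # integer part, applied in the pixel domain
    return i, s - i               # remainder in [-0.5, +0.5) for the FD rotator

tile_shifts = [2.30, 2.42, 2.55, 2.61]  # hypothetical smoothly varying shifts, pix
parts = [split_shift(s) for s in tile_shifts]
# Between the 2nd and 3rd tile the integer part jumps 2 -> 3 and the FD
# fraction flips from +0.42 to about -0.45, changing the tile overlap to N-1.
```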

Such a condition is illustrated in Figure 2, where the fractional pixel shift is increased to ±1.0 instead of ±0.5 to avoid the signal shape distortion caused by a fractional shift. The sine window is also modified accordingly to have zeros on both ends and thus a flat top, but it still satisfies the Princen-Bradley condition. The sawtooth waveform represents an input signal that has a large pedestal of 10 and a full amplitude of 3. The two left input intervals (1 to 16 and 9 to 24) are shown in red, the two right ones (15 to 30 and 23 to 38) in blue. These intervals have an extra overlap (N+2) between the red and blue ones. Dotted waveforms show the window functions centered around the input intervals, dashed ones – the inputs multiplied by the windows. Solid lines on the top plot show the result of the FD rotation: a shift by 1 to the right for the red subsequences, and to the left for the blue ones.

The bottom plot of Figure 2 is for the “rectified image”, where the overlapping intervals are spread evenly (0-15, 8-23, 16-31, 24-39); the restored data is multiplied by the windows again (red and blue solid lines) and then added together (dark green waveform) in an attempt to restore the original sawtooth.

There is an obvious problem in the center, where the peak is twice as high as the center of the sawtooth (dotted waveform). It is actually a “wrinkle” caused by the input signal pedestal that was pushed to the center by the increased overlap of the inputs, not by the sawtooth shape. The FD phase shift moved the *windowed* input signal, not just the input signal. So even if the input signal were constant, the left two windows would be shifted right by 1, and the right ones – left by one sample, distorting their sum. Figure 3 illustrates the opposite case, where the input subsequences have a reduced rather than increased overlap and the FD phase rotation moves the windowed data away from the center – the restored signal has a dip in the middle.

This problem can be corrected if the first window function used for the input signal multiplication takes the FD shift into account, so that the input windows *after* the phase rotation match the output ones. Signals similar to those in Figure 2, but with appropriately offset input windows (the dotted waveforms are asymmetrical by 1 pixel) are presented in Figure 4. It shows perfect reconstruction (dark green line) of the offset input sawtooth signal.
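The effect and its correction can be modeled with window arithmetic alone, with no transforms at all: a constant input signal, blocks whose windowed data is moved by ±1 sample (as the phase rotator would move it), and overlap-add reconstruction. The flat-top window below is one possible construction satisfying the Princen-Bradley condition; the actual window of Figures 2-4 may differ in detail.

```python
# Overlap-add of a constant-1 signal with per-block +-1 sample shifts:
# plain windows distort the sum, pre-offset windows restore it exactly.
import math

N = 8  # blocks are 2N = 16 samples long, overlapped by N

def w(n):
    # modified sine window: zeros at both ends, 2-sample flat top; it still
    # satisfies the Princen-Bradley condition w(n)^2 + w(n+N)^2 = 1
    if n < 0 or n > 2 * N - 1:
        return 0.0
    m = min(n, 2 * N - 1 - n)
    return math.sin(math.pi * m / (2 * (N - 1)))

L = 8 * N
offsets = list(range(0, L - 2 * N + 1, N))
shifts = {off: (1 if off < L // 2 else -1) for off in offsets}  # crossing at center

def restore_constant_one(offset_windows):
    total = [0.0] * L
    for off in offsets:
        s = shifts[off]
        for n in range(2 * N):
            if offset_windows:
                # analysis window pre-offset by s: after the shift the windowed
                # data carries w((n - s) + s) = w(n), i.e. it is back on the grid
                r = w(n)
            else:
                r = w(n - s)  # plain window, moved off the grid by the shift
            total[off + n] += r * w(n)  # synthesis window + overlap-add
    return total

err_plain = max(abs(v - 1.0) for v in restore_constant_one(False)[N:L - N])
err_fixed = max(abs(v - 1.0) for v in restore_constant_one(True)[N:L - N])
```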

The MCLT converter illustrated in Figure 5 takes an array of 256 pixel samples and processes them at the rate of 1 pixel per clock, resulting in four 64-element (8×8) arrays representing the FD transformation of this tile. The converter incorporates a phase rotator that is equivalent to a fractional pixel shifter with 1/128 pixel resolution. Multiple pixel tiles can be processed immediately after each other or with a minimal gap of 16 pixels needed to restore the pipeline state. The *Fold sequencer* receives a start signal and, simultaneously, two 7-bit fractional pixel X and Y shifts in 1/128 increments (-64 to +63). Pixel data is received from the read-only port of the dual-port memory (*input tile buffer*) filled from the external DDR memory. The *Fold sequencer* generates buffer addresses (including the memory page number) and can be configured for various buffer latencies.

The *Fold sequencer* simultaneously generates X and Y addresses for the *2-port window ROM* that outputs window function values (it is a half-sine, as discussed above), which are combined by a multiplier, as the 2-d window function is separable. Each of the dimensions calculates window values with the appropriate subpixel shift to allow space-variant FD processing. Mapping from the 16×16 to the 8×8 tiles is performed according to Figure 1b for each direction, resulting in four 8×8 tiles for future DCT/DCT (horizontal/vertical), DST/DCT, DCT/DST and DST/DST processing. Each of the source pixels contributes to all 4 of the 8×8 arrays, and each of the corresponding elements of the 4 output arrays shares the same 4 contributing pixels – just with different signs. That allows iterating through the source pixels in groups of 4 just once, multiplying them by the appropriate window values and, in 4 cycles, adding/subtracting them in 4 accumulators. These values are registered and multiplexed during the next 4 cycles, feeding the *512×25 DTT input buffer* (DTT stands for Discrete Trigonometric Transform – a collective name for both DCT and DST).

“Folded” data stored in the *512×25 DTT input buffer* is fed to the 2-dimensional 8×8 pixel DTT module. It is similar to the one described in the DCT-IV implementation blog post, just modified to allow all 4 DTT variants, not only the DCT/DCT described there. This was done using the property of DCT-IV/DST-IV that DST-IV can be calculated as DCT-IV if the input sequence is reversed (x0 ↔ x7, x1 ↔ x6, x2 ↔ x5, x3 ↔ x4) and the sign of the odd output samples (y1, y3, y5, y7) is inverted. This property can be seen by comparing the plots of the basis functions in Figure 6; the proof follows later in the text.

Another memory buffer is needed after the *2d DTT* module, as it processes one of the four 64-sample transforms at a time, while the phase rotator (Figure 7) needs simultaneous access to all 4 components of the same FD sample. These 4 components are shown on the left of the diagram; they are multiplied by 4 sine/cosine values (shown as CH, SH, CV and SV) and then combined by the adders/subtracters. The total number of multiplications is 16, and they have to be performed in 4 clock cycles to maintain the same data rate throughout the MCLT converter, so 4 multiplier-accumulator modules are needed. One FD point calculation uses 4 different sine/cosine coefficients, so a single ROM is sufficient. The phase rotator uses the horizontal and vertical fractional pixel shift values for the second time (the first was for the window function) and combines them with the multiplexed horizontal and vertical indexes (3 bits each) as the address inputs of the coefficient ROM. The rotator provides rotated coefficients that correspond to the MCLT transform of the pixel data shifted in both horizontal and vertical directions by up to ±0.5 pix at a rate of 1 coefficient per clock cycle, providing both output data and the address to the external multi-page memory buffer.
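The structure of that combination (though not necessarily the exact sign conventions of the RTL, which are an assumption here) can be sketched as a Kronecker product of two 2×2 rotations acting on the four DTT components; the phase angle's dependence on the FD index and the fractional shift follows the DCT-IV/DST-IV shift property. Being orthogonal, the 4×4 operator preserves the energy of every FD sample:

```python
# Sketch of the 2D phase rotator: 16 multiplications per FD sample, organized
# as a 4x4 rotation built from horizontal and vertical 2x2 rotations.
# Sign conventions and angle formulas are illustrative assumptions.
import math
import numpy as np

N = 8
dx, dy = 0.3, -0.2  # residual fractional pixel shifts, |shift| <= 0.5

def rot2(phi):  # elementary 2x2 rotation (cos/sin pair, as CH/SH or CV/SV)
    return np.array([[math.cos(phi), -math.sin(phi)],
                     [math.sin(phi),  math.cos(phi)]])

k, l = 3, 5                            # horizontal / vertical FD indexes (0..7)
ph = math.pi / N * (k + 0.5) * dx      # horizontal phase angle (assumed form)
pv = math.pi / N * (l + 0.5) * dy      # vertical phase angle (assumed form)
R = np.kron(rot2(pv), rot2(ph))        # 4x4 rotator acting on (CC, SC, CS, SS)

v = np.array([1.7, -0.4, 0.25, 0.9])   # four DTT components of one FD sample
rotated = R @ v                        # 16 multiplications, as in the hardware
```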

Most camera applications use color image sensors that provide Bayer mosaic color data: one red, one blue and two diagonal green pixels in each 2×2 pixel group. In the old days the image sensor pixel density was below the lens optical resolution, and all the cameras performed color interpolation (usually bilinear) of the “missing” colors. This process implied that each red pixel, for example, had four green neighbors (up, down, left and right) at an equal distance, and 4 blue pixels located diagonally. With the modern high-resolution sensors this is not the case; possible distortions are shown in Figure 8 (copied from the earlier post). More elaborate “de-mosaic” processing involves non-linear operations that would influence sub-pixel correlation results.

As the simple de-mosaic procedures cannot be applied to the high resolution sensors without degrading the images, we treat each color subchannel individually, merging the results after performing the optical aberration correction in the FD, or at least after compensating the lateral chromatic aberration that causes pixel x/y shifts.

Channel separation means that when converting data to the FD the input data arrays are decimated: for the green color only half of the values (located in a checkerboard pattern) are non-zero, and for the red and blue subchannels only a quarter of all pixels have non-zero values. The relative phase of the remaining pixels depends on the required integer pixel offset (only ±0.5 is delegated to the FD) and may have 2 values (black vs. white checkerboard cells) for green, and 4 different values (odd/even for each of the horizontal/vertical directions independently) for red and blue. MCLT can be performed on the sparse input arrays the same way as on the complete 16×16 ones; low-pass filters (LPF) may be applied at the later processing stages (an LPF may be applied to the deconvolution kernels used for aberration correction during camera calibration). It is convenient to multiply the red and blue values by 2 to compensate for the smaller number of participating pixels compared to the green sub-channel.

A direct approach to calculating the FD transformation of the color mosaic input image would be to either run the monochrome converter (Figure 5) three times (in 256*3=768 clock cycles), masking out a different pixel pattern each time, or to implement the same module (including the input pixel buffer) three times to achieve 256 clock cycle operation. And actually the 16×16 pixel buffer used in the monochrome converter is not sufficient – even a small lateral chromatic aberration leads to a mismatch of the 16×16 tiles for different colors. That would require using a larger pixel buffer – 18×18 for minimal aberration, or larger if the aberration may exceed a full pixel.

Luckily, the MCLT consists of MDCT and MDST, which in turn use DCT-IV and DST-IV, and these have a very convenient property when processing the Bayer mosaic data. Almost the same implementation as used for the monochrome converter can transform the color data in the same 256 clock cycles. The only part that requires more resources is the final phase rotator – it has to output 3*4*64 values in 256 clock cycles, so three instances of the same rotator are required to provide the full load of the rest of the circuitry. This almost (not including the rotators) triple reduction of the MCLT calculation resources is based on the “folding” of the input data for the DTT inputs and the following DCT-IV/DST-IV relation.

*DST-IV is equivalent to DCT-IV of the input sequence, where all odd input samples are multiplied by -1, and the result values are read in the reversed order.* In matrix form this is shown in (1) below:

$${\mathrm{DST}}^{\mathrm{IV}}=\begin{bmatrix}0& & & &1\\ & & &1& \\ & &⋰& & \\ &1& & & \\1& & & &0\end{bmatrix}\cdot {\mathrm{DCT}}^{\mathrm{IV}}\cdot \begin{bmatrix}1& & & &0\\ &-1& & & \\ & &1& & \\ & & &\ddots& \\0& & & &-1\end{bmatrix}$$ | (1) |

Equations (2) and (3) give the definitions of DCT-IV and DST-IV^{[4]}:

$${\mathrm{DCT}}^{\mathrm{IV}}(k)=\sqrt{\frac{2}{N}}\cdot \sum _{l=0}^{N-1}x(l)\cdot\cos\left(\frac{\pi}{N}\cdot \left(l+\frac{1}{2}\right)\cdot \left(k+\frac{1}{2}\right)\right)$$ | (2) |

$${\mathrm{DST}}^{\mathrm{IV}}(k)=\sqrt{\frac{2}{N}}\cdot \sum _{l=0}^{N-1}x(l)\cdot\sin\left(\frac{\pi}{N}\cdot \left(l+\frac{1}{2}\right)\cdot \left(k+\frac{1}{2}\right)\right)$$ | (3) |

The DST-IV modified according to (1) can be re-written as two separate sums for the even (*l=2m*) and odd (*l=2m+1*) input samples, replacing the output sample index *k* with the reversed one (*N-1-k*). Then, after removing full periods (n*2π) and applying trigonometric identities, it can be converted to the same value as DST-IV:

$${\mathrm{DCT}}_{\mathrm{mod}}^{\mathrm{IV}}(k)=\sqrt{\frac{2}{N}}\cdot\left(\sum_{m=0}^{N/2-1}x(2m)\cdot\cos\left(\frac{\pi}{N}\cdot\left(2m+\frac{1}{2}\right)\cdot\left((N-1-k)+\frac{1}{2}\right)\right)-\sum_{m=0}^{N/2-1}x(2m+1)\cdot\cos\left(\frac{\pi}{N}\cdot\left(2m+\frac{3}{2}\right)\cdot\left((N-1-k)+\frac{1}{2}\right)\right)\right)=$$ $$=\sqrt{\frac{2}{N}}\cdot\left(\sum_{m=0}^{N/2-1}x(2m)\cdot\cos\left(\frac{\pi}{N}\cdot\left(\frac{N}{2}-\left(2m+\frac{1}{2}\right)\cdot\left(k+\frac{1}{2}\right)\right)\right)-\sum_{m=0}^{N/2-1}x(2m+1)\cdot\cos\left(\frac{\pi}{N}\cdot\left(\frac{3N}{2}-\left(2m+\frac{3}{2}\right)\cdot\left(k+\frac{1}{2}\right)\right)\right)\right)=$$ $$=\sqrt{\frac{2}{N}}\cdot\left(\sum_{m=0}^{N/2-1}x(2m)\cdot\cos\left(-\frac{\pi}{2}+\frac{\pi}{N}\cdot\left(2m+\frac{1}{2}\right)\cdot\left(k+\frac{1}{2}\right)\right)-\sum_{m=0}^{N/2-1}x(2m+1)\cdot\cos\left(\frac{\pi}{2}+\frac{\pi}{N}\cdot\left(2m+\frac{3}{2}\right)\cdot\left(k+\frac{1}{2}\right)\right)\right)=$$ $$=\sqrt{\frac{2}{N}}\cdot\left(\sum_{m=0}^{N/2-1}x(2m)\cdot\sin\left(\frac{\pi}{N}\cdot\left(2m+\frac{1}{2}\right)\cdot\left(k+\frac{1}{2}\right)\right)+\sum_{m=0}^{N/2-1}x(2m+1)\cdot\sin\left(\frac{\pi}{N}\cdot\left(2m+\frac{3}{2}\right)\cdot\left(k+\frac{1}{2}\right)\right)\right)={\mathrm{DST}}^{\mathrm{IV}}(k)$$ | (4) |
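The relation proven in (4) can also be checked numerically. A short Python model of definitions (2) and (3) (a sketch, with an arbitrary test vector):

```python
# Numeric check: DST-IV equals DCT-IV of the input with odd samples negated,
# with the output read in reversed order (N = 8).
import math

N = 8

def dct_iv(x):
    return [math.sqrt(2.0 / N) *
            sum(x[l] * math.cos(math.pi / N * (l + 0.5) * (k + 0.5))
                for l in range(N)) for k in range(N)]

def dst_iv(x):
    return [math.sqrt(2.0 / N) *
            sum(x[l] * math.sin(math.pi / N * (l + 0.5) * (k + 0.5))
                for l in range(N)) for k in range(N)]

x = [0.3, -1.2, 2.5, 0.7, -0.8, 1.1, 0.05, -0.6]
negated_odd = [v if l % 2 == 0 else -v for l, v in enumerate(x)]
via_dct = dct_iv(negated_odd)[::-1]   # reversed output order
direct = dst_iv(x)
err = max(abs(a - b) for a, b in zip(via_dct, direct))
```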

As shown in Figure 1b, the 2*N-long input sequence of four fragments (a,b,c,d) is folded into the N-long sequence (5) for the DCT-IV input:

$$(a,b,c,d)\longrightarrow(-\tilde{c}-d,\ a-\tilde{b})$$ | (5) |

and into (6) – for the DST-IV input:

$$(a,b,c,d)\longrightarrow(\tilde{c}-d,\ a+\tilde{b})$$ | (6) |

where the tilde over a name indicates reversal of the segment. Such direction reversal for sequences of even length (N/2=4 in our case) swaps the odd- and even-numbered samples. The two halves of (5) and (6) show that both have the same direct terms $(-d,\ a)$ and differ only in the sign of the reversed terms: $(-\tilde{c},-\tilde{b})$ vs. $(\tilde{c},\tilde{b})$. Each of the input samples appears exactly once in each of the DCT-IV and DST-IV inputs. The even samples of the input sequence contribute identically to the even samples of both folded sequences (through a and d) and contribute with opposite signs to the odd samples (through b and c). Similarly, the odd-numbered input samples contribute identically to the odd-numbered samples of both folded sequences and with opposite signs to the even ones.

So, *a sparse input sequence with non-zero values only at even positions results in DCT-IV and DST-IV folded inputs that have the same values at even positions and values multiplied by -1 at the odd ones. Odd-only input sequences result in the same values at odd positions and negated ones at the even positions.*
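A direct Python check of that claim, folding a 2N-long even-sparse sequence into the DCT-IV input (5) and the DST-IV input (6) (a model of the fold only; the window multiplication is omitted since it does not change the sparsity pattern):

```python
# Fold an even-sparse 16-sample sequence per (5) and (6) and compare.
N = 8

def fold_dct(x):   # (a,b,c,d) -> (-~c - d, a - ~b), per (5)
    a, b, c, d = (x[i * N // 2:(i + 1) * N // 2] for i in range(4))
    return [-cr - dv for cr, dv in zip(c[::-1], d)] + \
           [av - br for av, br in zip(a, b[::-1])]

def fold_dst(x):   # (a,b,c,d) -> (~c - d, a + ~b), per (6)
    a, b, c, d = (x[i * N // 2:(i + 1) * N // 2] for i in range(4))
    return [cr - dv for cr, dv in zip(c[::-1], d)] + \
           [av + br for av, br in zip(a, b[::-1])]

# non-zero only at even positions
x = [float(3 + (5 * i) % 7) if i % 2 == 0 else 0.0 for i in range(2 * N)]
yc, ys = fold_dct(x), fold_dst(x)
same_even = all(abs(ys[i] - yc[i]) < 1e-12 for i in range(0, N, 2))
neg_odd = all(abs(ys[i] + yc[i]) < 1e-12 for i in range(1, N, 2))
```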

Now this property can be combined with the previously discussed DST-IV to DCT-IV relation and extended to the two-dimensional Bayer mosaic case. We can start with the red and blue channels that have 1-in-4 non-zero pixels. If the non-zero pixels are in the even columns, then after the first horizontal pass the DST-IV output will differ from the DCT-IV one only by the output sample order, according to (1), as the even input values are the same and the odd ones are negated. The reversed output coefficient order for the horizontal pass means that the DST-IV output will be just horizontally flipped with respect to the DCT-IV one. If the non-zero input was in the odd columns, then the DST-IV output will be reversed and negated compared to the DCT-IV one.

The second (vertical) pass is applied similarly. If the original pattern had non-zero even rows, the result of the vertical DST-IV is the same as that of the DCT-IV after a vertical flip. If the odd rows were non-zero instead, the result will be vertically flipped and negated with respect to the DCT-IV one.

This result means that for the 1-in-4 Bayer mosaic array only one DCT-IV/DCT-IV transform is needed. The three other DTT combinations may be obtained by flipping the DCT-IV/DCT-IV output horizontally and/or vertically and possibly negating all the data. Reversal of the readout order does not require additional hardware resources (just a few gates for a slightly fancier memory address counter). Data negation is also “free”, as it can easily be absorbed by the phase rotator that already has adders/subtracters and just needs an extra sign inversion control. The flips do not depend on the odd/even columns/rows; only the negation depends on them.

The green channel has non-zero values in either the (0,0) and (1,1), or the (0,1) and (1,0) positions, so it can be considered a sum of two of the 1-in-4 channels described above. In the (0,0) case neither the horizontal nor the vertical DST-IV inverts the sign, and (1,1) inverts both, so the DST-IV/DST-IV does not have any inversion, and this stays true for the green color – the combination of (0,0) and (1,1). We can not compare DCT-IV/DCT-IV with either DST-IV/DCT-IV or DCT-IV/DST-IV, but these two will have the same sign relative to each other (zero or double negation). So the DST-IV/DST-IV output will be a double-flipped (both horizontally and vertically) version of the DCT-IV/DCT-IV output, the DCT-IV/DST-IV – a double-flipped version of the DST-IV/DCT-IV one, and the calculation for this green pattern requires just two of the 4 DTT operations. Similarly, for the (0,1)/(1,0) green pattern the DST-IV/DST-IV output will be a double-flipped and negated version of the DCT-IV/DCT-IV output, and the DCT-IV/DST-IV – a double-flipped and negated version of the DST-IV/DCT-IV one.

Combining the results for all three color components (red, blue and green), we need a total of four DTT operations on 8×8 tiles: one for red, one for blue and two for the green component, instead of the twelve needed if each pixel had a full RGB value.
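The flip relations for the 1-in-4 channels can be verified end-to-end with a small numeric model (Python/numpy standing in for the RTL; windowing is again omitted as it does not affect the sparsity pattern, and the even-row/even-column “red” pattern shown here is the no-negation case):

```python
# For a 1-in-4 sparse 16x16 tile (non-zero only at even rows and even columns)
# the DST/DCT, DCT/DST and DST/DST outputs are flipped copies of DCT/DCT.
import numpy as np

N = 8

def fold(x, dst):   # 2N -> N fold: (5) for the DCT-IV input, (6) for DST-IV
    a, b, c, d = x[:4], x[4:8], x[8:12], x[12:16]
    return np.concatenate([c[::-1] - d, a + b[::-1]] if dst
                          else [-c[::-1] - d, a - b[::-1]])

def dtt_matrix(dst):
    k, l = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    arg = np.pi / N * (l + 0.5) * (k + 0.5)
    return np.sqrt(2.0 / N) * (np.sin(arg) if dst else np.cos(arg))

def mclt_branch(tile16, hor_dst, ver_dst):   # fold + DTT along each axis
    folded_h = np.stack([fold(row, hor_dst) for row in tile16])        # 16x8
    folded = np.stack([fold(col, ver_dst) for col in folded_h.T]).T    # 8x8
    return dtt_matrix(ver_dst) @ folded @ dtt_matrix(hor_dst).T

tile = np.zeros((16, 16))
tile[0::2, 0::2] = np.random.default_rng(1).random((8, 8))  # Bayer "red"

cc = mclt_branch(tile, False, False)   # the only DTT actually computed
sc = mclt_branch(tile, True, False)    # horizontal DST-IV
cs = mclt_branch(tile, False, True)    # vertical DST-IV
ss = mclt_branch(tile, True, True)     # both DST-IV
```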

The MCLT converter for the Bayer mosaic data shown in Figure 9 is similar to the already described converter for the monochrome data. The first difference is that now the *Fold sequencer* addresses larger source tiles – it is run-time configurable to 16×16, 18×18, 20×20 or 22×22 pixels. Each tile may have a different size – this functionality can be used to optimize external memory access: use smaller tiles in the center areas of the sensor with lower lateral chromatic aberrations, and read larger tiles in the peripheral areas with higher aberrations. The extended tile should accommodate all three color-shifted 16×16 blocks, as illustrated in Figure 10.

The *start* input triggers the calculation of all 3 colors; it can appear either every 256-th clock cycle, or it needs to wait for at least 16 cycles after the end of the previous tile input if started asynchronously. X/Y offsets are provided individually for each color channel and are stored in a register file inside the module; the integer top-left corner offset is provided separately for each color to simplify internal calculations.

The dual-port window ROM is the same as in the monochrome module; both horizontal and vertical window components are combined by a multiplier, and then the result is multiplied by the pixel data received from the external memory buffer with configurable latency.

Only two accumulators are required (instead of the four for monochrome), because each of the “folded” elements of the 8×8 DTT inputs has at most two contributing source pixels (for the green color), while red and blue use just a single accumulator. The red color is processed first, then blue, and finally green. For the first two only the DCT-IV/DCT-IV input data is prepared (in 64 clock cycles each); the green color additionally requires the DST-IV/DCT-IV block (128 clock cycles).

The DTT output data goes to three parallel dual-port memory buffers. The red and blue channels each require just a single 64-element array that later provides 4 different values in 4 clock cycles for the horizontally and vertically flipped data. The green channel requires a 2*64 element array. These buffers are filled one at a time (red, blue, green) and each of them feeds the corresponding phase rotator. The rotator outputs need 256 cycles to provide the 4*64=256 FD values – the symmetry exploited during the DTT conversion is lost at this stage. The three phase rotators can provide output addresses/data asynchronously, or the red and blue outputs can be delayed and output simultaneously with the green channel, sharing the external memory addresses.

Results of the simulation of the MCLT converter are shown in Figure 11. The simulation ran with the clock frequency set to 100 MHz; the synthesized code should be able to run at at least 250 MHz in the Zynq SoC of the NC393 camera – the same rate as is currently used for the data compression – as all the memories and DSPs are fully buffered. The simulation shows the processing of 3 tiles: two consecutive ones and a third one after a minimal pause. The data input was provided by the Java code that was written for the post-processing of the camera images; the same code generated the intermediate results and the output of the MCLT conversion. The Java code uses double precision floating point calculations, while the RTL code is based on fixed-point calculations with the precision of the architecture DSP primitives, so there is no bit-to-bit match of the data; instead there is a difference that corresponds to the width of the used data words.

The RTL code (licensed under GNU GPLv3+) developed for the MCLT-based frequency domain conversion of the Bayer color mosaic images uses almost the same resources as the monochrome image transformation for the tiles of the same size – three times less than a full RGB image would require. The input window function modification that accounts for the two-dimensional fractional pixel shift and the post-transform phase rotators make it possible to avoid the re-sampling for image rectification that degrades sub-pixel resolution.

This module was simulated with Icarus Verilog (a free software simulator), and the results were compared to those of the software post-processing used for the 3-d scene reconstruction of multiple image sets, described in the “Long range multi-view stereo camera with 4 sensors” post.

The MCLT module for the Bayer color mosaic images is not yet tested in the FPGA of the camera – it needs more companion code to implement a full processing chain that will generate the disparity space images (DSI) required for the real-time 3d scene reconstruction and/or the output of the aberration-corrected image textures. But it constitutes a critical part of the Tile Processor project and brings the overall system completion closer.

[1] Thiébaut, Éric, et al. “Spatially variant PSF modeling and image deblurring.” SPIE Astronomical Telescopes + Instrumentation. International Society for Optics and Photonics, 2016.

[2] Řeřábek, M., and P. Pata. “The space variant PSF for deconvolution of wide-field astronomical images.” SPIE Astronomical Telescopes + Instrumentation. International Society for Optics and Photonics, 2008.

[3] Malvar, Henrique. “A modulated complex lapped transform and its applications to audio processing.” Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Vol. 3. IEEE, 1999.

[4] Britanak, Vladimir, Patrick C. Yip, and Kamisetty Ramamohan Rao. Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms and Integer Approximations. Academic Press, 2010.

]]>- apache2-2.4.18 => apache2-2.4.29
- php-5.6.16 => php-5.6.31
- udev-182 changed to eudev-3.2.2, etc.

```
sysroot_stage_all_append() {
sysroot_stage_dir ${WORKDIR}/headers/include ${SYSROOT_DESTDIR}/usr/include-uapi
}
```

We had this task in Jethro but there was another variable used ```
chosen {
bootargs = "cma=128M console=ttyPS0,115200 root=/dev/mmcblk0p2 rw earlyprintk rootwait rootfstype=ext4";
linux,stdout-path = "/amba@0/serial@e0000000";
};
```

```
chosen {
bootargs = "earlycon cma=128M root=/dev/mmcblk0p2 rw rootwait rootfstype=ext4";
stdout-path = "serial0:115200n8";
};
```

```
phy3: phy@3 {
compatible = "atheros,8035";
device_type = "ethernet-phy";
reg = <0x3>;
};
```

```
phy3: phy@3 {
/* Atheros 8035 */
compatible = "ethernet-phy-id004d.d072";
/* compatible = "ethernet-phy-ieee802.3-c22";*/
device_type = "ethernet-phy";
reg = <0x3>;
};
```

The ```
# This option is for FPGA part
CONFIG_XILINX_DEVCFG=y
# prints time before messages
CONFIG_PRINTK_TIME=y
# dependency for DYNAMIC_DEBUG=y
CONFIG_DEBUG_FS=y
# turned off because old:
CONFIG_XILINX_PS_EMAC=n
CONFIG_XILINX_EMACLITE=n
CONFIG_XILINX_AXI_EMAC=n
```

`ACTION=="add", RUN+="/usr/bin/rsync -a /lib/udev/devices/ /dev/"`

This rule adds up ~6 secs to boot time for some reason vs almost nothing if run from the camera init script - Reminder: the system boots into initramfs first and runs
*init*built by initramfs-live-boot. The script runs*switch_root*in the end. udev daemon gets killed and restarted

- Drupal as a general purpose CMS for the main site
- WordPress for the development blogs
- Mediawiki for the wiki-based documentation.
- Mailman (self hosted) and Mail Archive (external site) for the mailing list that is our main channel of the user technical support
- Gitlab CE for the code and other version-controlled content. We used Github but switched to self-hosted Gitlab CE following FSF recommendations
- Other customized versions of FLOSS web applications, such as OSTicket for support tickets and FrontAccounting for inventory and production
- In-house developed free software web applications, such as x3dom-based 3D scene and map viewer, 3D mechanical assembly viewer available for the assemblies and mechanical components on the wiki, WebGL Panorama Viewer/Editor.

- to have a common search over all subdomains always available (looking glass icon in the top right corner)
- as we can not cross-link properly all the information, then at least we have to communicate the idea that there are multiple subdomains to our visitors.

#Tell search engines we do not cloak Header always add Vary: Referer RewriteCond %{REQUEST_FILENAME} !-f RewriteCond "%{HTTP_REFERER}" "!^((.*)elphel\.com(.*)|)$" RewriteCond %{REQUEST_URI} !category.*feed RewriteCond %{REQUEST_URI} !^/[0-9]+\..+\.cpaneldcv$ <span style="color:#999999">RewriteCond %{REQUEST_URI} !^/\.well-known/pki-validation/[A-F0-9]{32}\.txt(?:\ Comodo\ DCV)?$</span> RewriteRule ^(.*)$ https://www.elphel.com/blog%{REQUEST_URI} [L,R=302]Such redirection is obviously impossible for external sites as Mail Archive and we did not use it for https://git.elphel.com – in both cases if visitors followed those links from the search engines results, they would not expect such pages to be parts of the well cross-linked company web site.

```html
<script src="https://www.elphel.com/js/elphel_messenger.js"></script>
<script>
  ElphelMessenger.init();
</script>
```

- the top (almost like a “frameset” in older HTML days) page itself has very little content – most is provided in the included iframe elements
- the served content depends on the referrer address – that might be considered as “cloaking.”

- Were we able to communicate the idea that the site consists of multiple loosely-connected subdomains with different navigation?
- Is the framed site navigation intuitive or annoying?
- Does the combined search over multiple subdomains do its job and does it behave as users expect?

- Initialize source/headers directories with bitbake, so it “knows” that everything needs to be rebuilt for the project
- Create a list of the source files (resolving symlinks when needed) and “touch” them, setting modification timestamps. This prepares the files so that the next (first after modification) file access will be recorded as an access timestamp. Record the current time.
- Wait a few seconds to reliably distinguish if each file was accessed after modification
- Run bitbake build (“bitbake <target> -c compile -f”)
- Scan all the files from the previously created source list and generate an “include_list” of those that were accessed during the build process.
- As CDT accepts only “exclude” filters in this context, recursively combine full source list and include_list to generate “exclude_list” pruning all the branches that have nothing to include and replacing them with the full branch reference
- Apply the generated exclusion list to the CDT project file “.cproject”
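The timestamp trick in the steps above can be sketched as follows. This is a minimal Python sketch, assuming the filesystem records access times (i.e. is not mounted with `noatime`); the function names are illustrative, not the actual implementation:

```python
import os
import time

def find_used_sources(src_list, build, margin=2.0):
    """Mark all sources, run the build, return the files the build accessed."""
    start = time.time()
    for path in src_list:
        # Set atime behind mtime, so even `relatime` mounts will record
        # the next read of the file as a fresh access timestamp.
        os.utime(path, (start - margin, start))
    time.sleep(margin)       # reliably separate build accesses from the marking
    build()                  # e.g. run "bitbake <target> -c compile -f"
    return [p for p in src_list if os.stat(p).st_atime > start]

def exclude_list(all_sources, included):
    """CDT accepts only exclusion filters in this context, so invert the
    include list (the real generator also prunes whole directory branches,
    replacing them with a single branch reference)."""
    inc = set(included)
    return [p for p in all_sources if p not in inc]
```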

`$("#forest01").player(1);`

- common (average) distortion of all four lenses approximated by analytical radial distortion model, and
- small residual deviation of each lens image transformation from the common distortion model
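As an illustration, the common analytical radial distortion part can be modeled as a polynomial in the squared radius. This sketch uses a hypothetical Brown-style polynomial with placeholder coefficients, not the camera's actual calibration model:

```python
def radial_distort(x, y, k=(0.0, 0.0, 0.0)):
    """Map an undistorted normalized point (x, y) through a radial model:
    r_d = r * (1 + k1*r^2 + k2*r^4 + k3*r^6)."""
    r2 = x * x + y * y
    scale = 1.0 + k[0] * r2 + k[1] * r2 ** 2 + k[2] * r2 ** 3
    return x * scale, y * scale
```

The per-lens residual deviation would then be a small correction applied on top of this common model.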

- tile center X,Y (for the virtual “center” image),
- center disparity, so that each of the 4 image tiles will be shifted accordingly, and
- the code of operation(s) to be performed on that tile.
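A tile task entry might be represented like this; the field names and the bit-coded operation field are illustrative, not the actual shared-memory layout:

```python
from dataclasses import dataclass

@dataclass
class TileTask:
    center_x: float   # tile center X in the virtual "center" image
    center_y: float   # tile center Y
    disparity: float  # center disparity; each of the 4 image tiles
                      # is shifted accordingly along its baseline
    op: int           # bit-coded operation(s) to perform on this tile

    def shifted_center(self, dir_x, dir_y):
        """Tile center in one sensor image: the virtual center shifted
        along that camera's (unit) baseline direction by the disparity."""
        return (self.center_x + self.disparity * dir_x,
                self.center_y + self.disparity * dir_y)
```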

- Reads the tile tasks from the shared system memory.
- Calculates locations and loads image and calibration data from the external image buffer memory (using on-chip memory to cache data, as the overlapping nature of the tiles makes each pixel participate on average in 4 neighbor tiles).
- Converts tiles to frequency domain using CLT based on 2d DCT-IV and DST-IV.
- Performs aberration correction in the frequency domain by pointwise multiplication by the calibration kernels.
- Calculates correlation-related data (Figure 4) for the tile pairs, resulting in tile disparity and disparity confidence values for all pairs combined, and/or more specific correlation types by pointwise multiplication, inverse CLT to the pixel domain, filtering and local maximums extraction by quadratic interpolation or windowed center of mass calculation.
- Calculates the combined texture for the tile (Figure 5), using the alpha channel to mask out pixels that do not match – this effectively restores single-pixel lateral resolution after aggregating individual pixels into tiles. Textures can be combined using only the programmed shifts according to the specified disparity, or use an additional shift calculated in the correlation module.
- Calculates other integral values for the tiles (Figure 5), such as per-channel number of mismatched pixels – such data can be used for quick second-level (using tiles instead of pixels) correlation runs to determine which 3d volumes potentially have objects and so need regular (pixel-level) matching.
- Finally, the tile processor saves the results – correlation values and/or texture tiles – to the shared system memory, so software can access this data.
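A residual fractional-pixel shift can be applied entirely in the frequency domain as a phase rotation. The sketch below uses a plain FFT to stand in for the camera's CLT (DCT-IV/DST-IV based) implementation, so it illustrates the idea rather than reproducing the actual module:

```python
import numpy as np

def shift_tile_fd(tile, dx, dy):
    """Shift a square 2-D tile by (dx, dy) pixels (fractional values allowed)
    by multiplying its spectrum with per-frequency cosine/sine phase factors."""
    f = np.fft.fftfreq(tile.shape[0])
    # fy*dy (varies along rows) + fx*dx (varies along columns), combined
    # into one phase matrix applied pointwise to the 2-D spectrum
    phase = np.exp(-2j * np.pi * np.add.outer(f * dy, f * dx))
    return np.fft.ifft2(np.fft.fft2(tile) * phase).real
```

For an integer shift this reduces to a plain (circular) pixel shift, which makes an easy sanity check; a fractional shift produces the band-limited interpolation that pixel-domain resampling would otherwise distort.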

- drag the 3d view to rotate virtual camera without moving;
- move cross-hair ⌖ icon in the map view to rotate camera around vertical axis;
- toggle ⇅ button and adjust camera view elevation;
- use scroll wheel over the 3d area to change camera zoom (field of view is indicated on the map);
- drag with middle button pressed in the 3d view to move camera perpendicular to the view direction;
- drag the camera icon (green circle) on the map to move the camera horizontally;
- toggle ⇅ button and move the camera vertically;
- press the hotkey **t** over the 3d area to reset to the initial view: set azimuth and elevation the same as captured;
- press the hotkey **r** over the 3d area to set the view azimuth as captured and elevation equal to zero (horizontal view).

- Obviously, self-driving cars – an increased number of cameras located in a 2d pattern (square) results in significantly more robust matching even with low-contrast textures. It does not depend on sequential scanning and provides simultaneous data over a wide field of view. The calculated confidence of distance measurements tells when alternative (active) ranging methods are needed – that would help to avoid the infamous accident where a self-driving car went under a truck.
- Visual odometry for the drones would also benefit from the higher robustness of image matching.
- Rovers on Mars or other planets using low-power passive (visual based) scene reconstruction.
- Maybe self-flying passenger multicopters in heavy 3d traffic? Sure, they will all be equipped with some transponders, but what about aerial roadkill – like the flock of geese that forced a water landing?
- High speed boating or sailing over uneven seas with active hydrofoils that can look ahead and adjust to the future waves.
- Landing on the asteroids for physical (not just Bitcoin) mining? With 150 mm baseline such camera can comfortably operate within several hundred meters from the object, with 1.5 m that will scale to kilometers.
- Cinematography: post-production depth of field control that would easily beat even the widest format optics, HDR with a pair of 4-sensor cameras, some new VFX?
- Multi-spectral imaging where more spatially separate cameras with different bandpass filters can be combined to the same texture in the 3d scene.
- Capturing underwater scenes and measuring how far the sea creatures are above the bottom.
- …

- Camera: **NC393-F-CS**
- Resolution@fps: *1080p@30fps, 720p@60fps*
- Compression quality: *90%*
- Exposure time: *1.7 ms*
- Stream formats: *mjpeg, rtsp*
- Sensor: MT9P001, 5MPx, 1/2.5″
- Lens: Computar f=5mm, f/1.4, 1/2″

- PC: *Shuttle box, i7, 16GB RAM, GeForce GTX 560 Ti*
- Display: *ASUS VS24A, 60Hz (=16.7ms), 5ms gtg*
- OS: *Kubuntu 16.04*
- Network connection: *1Gbps, direct camera-PC via cable*
- Applications: *gstreamer, chrome, firefox, mplayer, vlc*

- Stopwatch: basic javascript

| Resolution/fps | Image size^{1}, KB | Transfer time^{2}, ms | Data rate^{3}, Mbps |
|---|---|---|---|
| 720p/60 | 250 | 2 | 120 |
| 1080p/30 | 500 | 4 | 120 |

| Resolution | t_{ROW}^{1}, us | t_{TR}^{2}, us |
|---|---|---|
| 720p | 22.75 | 13.33 |
| 1080p | 29.42 | 20 |
| full res (2592×1936) | 36.38 | 27 |

| Resolution | t_{ERS} avg^{1}, ms | t_{ERS} whole range^{2}, ms |
|---|---|---|
| 720p | 8 | 0.01-16 |
| 1080p | 16 | 0.02-32 |

| Resolution | t_{CAM}, ms |
|---|---|
| 720p | 9.9 |
| 1080p | 17.9 |

- 30 fps => 33.3 ms
- 60 fps => 16.7 ms
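The numbers above are easy to reproduce: the rolling-shutter (ERS) delay of a pixel row grows linearly with the row number, so it spans roughly from zero to n_rows × t_ROW and averages half of that. A small sketch of this arithmetic (frame geometry taken from the tables; nothing camera-specific beyond that):

```python
def frame_period_ms(fps):
    """30 fps => 33.3 ms, 60 fps => 16.7 ms."""
    return 1000.0 / fps

def ers_times_ms(n_rows, t_row_us):
    """Return (average, whole-range) ERS delay in ms for a frame of
    n_rows rows, each read out in t_row_us microseconds."""
    whole = n_rows * t_row_us / 1000.0
    return whole / 2.0, whole
```

For 720p (720 rows, t_ROW = 22.75 us) this gives ≈ 8.2 ms average and ≈ 16.4 ms whole range, matching the table.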

| Resolution/fps | Total Latency, ms | Network+PC+SW latency, ms |
|---|---|---|
| 720p@60fps | 33.3-50 | 23.4-40.1 |
| 1080p@30fps | 33.3-66.7 | 15.4-48.8 |

- For wifi: use 5GHz over 2.4GHz – smaller jitter, non-overlapping channels
- Lower latency software: for mjpeg use **gstreamer** or vlc (takes an extra effort to set up) over chrome or firefox, because the browsers do extra buffering

- Latency in live network video surveillance
- Wifi latencies, 2.4GHz & 5GHz
- This video compares different displays.
- About ERS

| | mjpeg | rtsp |
|---|---|---|
| port 0 | 2323 | 554 |
| port 1 | 2324 | 556 |
| port 2 | 2325 | 558 |
| port 3 | 2326 | 560 |

- For mjpeg:

`~$ gst-launch-1.0 souphttpsrc is-live=true location=http://192.168.0.9:2323/mimg ! jpegdec ! xvimagesink`

- For rtsp:

`~$ gst-launch-1.0 rtspsrc is-live=true location=rtsp://192.168.0.9:554 ! rtpjpegdepay ! jpegdec ! xvimagesink`

`~$ vlc rtsp://192.168.0.9:554`

Fig.1. Image comparison of the different processing stages output

- Input composite signal is split by colors into 3 separate channels producing sparse data in each.
- Each channel data is directly convolved with a small (we used just four non-zero elements) asymmetrical kernel AK, resulting in a sequence of 16×16 pixel tiles, overlapping by 8 pixels (input pixels are not limited to 16×16 tiles).
- Each tile is multiplied by a window function, folded and converted with an 8×8 pixel DCT-IV^{[4]} – an equivalent of the 16×16→8×8 MDCT.
- 8×8 result tiles are multiplied by symmetrical kernels (SK) – an equivalent of convolving the pre-MDCT signal.
- Each channel is subject to a low-pass filter, implemented by multiplication in the frequency domain as these filters are indeed symmetrical. The cutoff frequency is different for the green (LPF1) and the other (LPF2) colors, as there are more source samples for the first. That was the last step before the inverse transformation presented in the previous blog post; now we continue with a few more.
- Natural images have strong correlation between different color channels so most image processing (and compression) algorithms involve converting the pure color channels into intensity (Y) and two color difference signals that have lower bandwidth than intensity. There are different standards for the color conversion coefficients and here we are free to use any as this process is not a part of a matched encoder/decoder pair. All such conversions can be represented as a 3×3 matrix multiplication by the (r,g,b) vector.
- Two of the output signals – color differences are subject to an additional bandwidth limiting by LPF3.
- IMDCT includes 8×8 DCT-IV, unfolding 8×8 into 16×16 tiles, second multiplication by the window function and accumulation of the overlapping tiles in the pixel domain.
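The color conversion step is just a 3×3 matrix applied to each (r,g,b) vector, and since it is not part of a matched encoder/decoder pair, any standard set of coefficients will do. A sketch using the common BT.601 coefficients as one possible choice:

```python
import numpy as np

# BT.601 coefficients: one possible choice, as any 3x3
# intensity/color-difference conversion works here.
RGB_TO_YCC = np.array([
    [ 0.299,     0.587,     0.114    ],  # Y  (intensity)
    [-0.168736, -0.331264,  0.5      ],  # Pb (blue difference)
    [ 0.5,      -0.418688, -0.081312 ],  # Pr (red difference)
])

def rgb_to_ycc(rgb):
    """Convert an (..., 3) array of RGB values into intensity Y and two
    color differences; the latter are then band-limited by LPF3."""
    return np.asarray(rgb) @ RGB_TO_YCC.T
```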

- Reduce the remaining signal modulation caused by the Bayer pattern (each source pixel carries data about a single color component, not all 3) – trying to remove it with a LPF alone would blur the image itself.
- Detect and enhance edges, as most useful high-frequency elements represent locally linear features
- Reduce visible noise in the uniform areas (such as blue sky) where significant (especially for the small-pixel sensors) noise originates from the shot noise of the pixels. This noise is amplified by the aberration correction that effectively increases the high frequency gain of the system.

- First the 3×3 center-symmetric matrices (one for Y, another for color) of coefficients are calculated using the Y channel data, then
- they are applied to the Y and color components by replacing the pixel value with the inner product of the calculated coefficients and the original data.

- Four inner products are calculated for the same 9-sample Y data and the shown matrices (corresponding to second derivatives along vertical, horizontal and the two diagonal directions).
- Each of these values is squared and
- the following four 3×3 matrices are multiplied by these values. Matrices are symmetrical around the center, so gray-colored cells do not need to be calculated.
- Four matrices are then added together and scaled by a variable parameter K1. The first two matrices are opposite to each other, and so are the second two, so if the absolute values of the two orthogonal second derivatives are equal (no linear features detected), the corresponding matrices annihilate each other.
- A separate 3×3 matrix representing a weighted running average, scaled by K2 is added for noise reduction.
- The sum of the positive values is compared to a specified threshold, and if it exceeds it, the whole matrix is proportionally scaled down – that makes different line directions “compete” against each other and against the blurring kernel.
- The sum of all 9 elements of the calculated array is zero, so the default unity kernel is added and when correction coefficients are zeros, the result pixels will be the same as the input ones.
- Inner product of the calculated 9-element array and the input data is calculated and used as a new pixel value. Two of the arrays are created from the same Y channel data – one for Y and the other for two color differences, configurable parameters (K1, K2, threshold and the smoothing matrix) are independent in these two cases.
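The per-pixel kernel construction described above can be sketched as follows. The detector and correction matrices here are illustrative stand-ins chosen only to satisfy the stated properties (zero-sum corrections, opposite pairs that annihilate when orthogonal derivatives match), not the actual coefficients:

```python
import numpy as np

# Illustrative second-derivative detectors: vertical, horizontal, two diagonals
D = np.array([
    [[0, 1, 0], [0, -2, 0], [0, 1, 0]],   # d2/dy2
    [[0, 0, 0], [1, -2, 1], [0, 0, 0]],   # d2/dx2
    [[1, 0, 0], [0, -2, 0], [0, 0, 1]],   # d2 along one diagonal
    [[0, 0, 1], [0, -2, 0], [1, 0, 0]],   # d2 along the other diagonal
], dtype=float)

# Correction matrices: first pair opposite to each other, second pair too,
# so equal orthogonal derivatives cancel (hypothetical values)
C = np.array([D[1] - D[0], D[0] - D[1], D[3] - D[2], D[2] - D[3]])

SMOOTH = np.full((3, 3), 1.0 / 9.0)           # weighted running average
IDENT = np.zeros((3, 3)); IDENT[1, 1] = 1.0   # default unity kernel

def adaptive_kernel(y_patch, k1=0.1, k2=0.05, threshold=0.5):
    """Build the per-pixel 3x3 kernel from a 3x3 patch of Y data."""
    d = np.array([(m * y_patch).sum() for m in D])  # four inner products
    corr = k1 * np.tensordot(d ** 2, C, axes=1)     # squared, weighted sum
    corr += k2 * (SMOOTH - IDENT)                   # zero-sum noise reduction
    pos = corr[corr > 0].sum()
    if pos > threshold:                             # let directions "compete"
        corr *= threshold / pos
    return IDENT + corr     # zero-sum correction plus the unity kernel

def apply_pixel(y_patch, **kw):
    """New pixel value: inner product of the kernel and the input patch."""
    return (adaptive_kernel(y_patch, **kw) * y_patch).sum()
```

With zero correction coefficients the kernel degenerates to the unity kernel, so the output pixels equal the input ones, as the text requires.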

- lateral chromatic aberrations (or just shift in the image domain) – Fig.1b and
- “diagonal” kernels (Fig.1a) – not an even function of each of the vertical and horizontal axes.