------------------------------------------------------------------
From: jott@aol.com (Jott)
Newsgroups: comp.dsp
Subject: Re: needed optimized fft
Date: 16 Nov 1995 14:35:54 -0500
Organization: America Online, Inc. (1-800-827-6364)
Message-ID: <48g3qq$i82@newsbf02.news.aol.com>
References: <48dqoo$req@daffy.anetsrvcs.uwrf.edu>

Have you considered taking advantage of the complex FFT by combining two
real inputs into  one FFT computation. The book I am referencing is
"Handbook of Real-Time FFTs" by Winthrop Smith and Joanne Smith (IEEE
PRESS) though I'm sure it is in many other books.

Here is a short description of the method.
Step 1.
 start with a(n) and b(n) both real inputs and of length N.
n=0,1,2,...,N-1
c(n) = a(n) + j*b(n)

step 2.
C(k) = FFT{c(n)} = R(k) + j*I(k)           k=0,1,2,...,N-1

NOTE that C(k) = A(k) + j*B(k) but since A(k) and B(k) are complex they
need to be separated.

Step 3.
N is assumed even (if N is odd step 3 is different)

for k = 1,2,....(N-2)/2
    RP(k) = RP(N-k) = 0.5*[R(k) + R(N-k)]
    RM(k) = -RM(N-k) = 0.5*[R(k) - R(N-k)]
    IP(k) = IP(N-k) = 0.5*[I(k) + I(N-k)]
    IM(k) = -IM(N-k) = 0.5*[I(k) - I(N-k)]

    RP(0) = R(0)
    IP(0) = I(0)
    RM(0) = IM(0) = RM(N/2) = IM(N/2) = 0;
    RP(N/2) = R(N/2)
    IP(N/2) = I(N/2)

Step 4 - final output

 for k=0,1,2,...,N-1

   A(k) = RP(k) + j*IM(k)
   B(k) = IP(k) + j*RM(k)


This scheme may be more efficient than two real FFTs.

The above scheme could also be used if the orginal sequence x(m) of length
M is split into two sequences of length N= M/2 ( M has to be even and
assume N are even, if N is odd step 3 is slightly different ). The orginal
sequence is broken up such that the even samples are placed in the real
array and the odd samples placed in the imaginary array. 

a(n) = x(2n) , b(n) = x(2n+1)  ,  n = 0,1,2,...,N

The follow steps 1 through 3 above and use the following for step 4

step 4

for k= 0,1,2,...,N-1
XR(k) = XR(M-k) = RP(k) + cos(k*pi/N)*IP(k) - sin(k*pi/N)*RM(k)
XI(k) = -XI(M-k) = IM(k) - cos(k*pi/N)*RM(k) - sin(k*pi/N)*IP(k)

XR(0) = RP(0) + IP(0)
XI(0) = IM(0) - RM(0)

XR(N) = R(0) - I(0)
XI(N)  = 0

X(k) = X*(M-k) = XR(k) + j*XI(k)


These may or may not be more efficient than using FFTs algorithms designed
for real inputs, but they do allow more efficient use of what you do have.
Maybe a discussion on the benefits of the above approaches versus real
input FFT algorithms would be helpful. I would like to hear from people
who have used complex FFTs for real inputs. I have only investigated it
enough to prove it can be done. I have not investigated the computational
load of using complex FFTs to process real inputs versus real input FFT
algorithms.

HTH,

Joseph Ott
Senior System Engineer
VTG

------------------------------------------------------------------


I read the "note on real fft" on your hpmepage and may I suggest that
the method the guy is describing is VERY OLD and very well known.  It is
complicated and kludgy and will have more overhead than the simple straight-
forward real fft.  The ref to read is the one by Sorensen et al I cite
in my real fft C code.  I think you should suggest that people use
a real fft code instead.  I haven't needed the ifft yet but if I do I will
translate that to C too.

Cheers,
Bill Simpson

------------------------------------------------------------------


 I strongly disagree with this. I have used this method on shipped systems based
 on 56K. The problem with the real FFT is that you can't make it go anywhere near
 twice as fast as a complex FFT, and to speed it up significantly requires
 customizing a lot of passes - say the first two and the last two. This means you
 need a lot more DSP code, which can be a problem in many applications.

 I would never describe the technique as 'kludgey' because is it
 mathematically perfectly sound. Sure, it's 'well known' - it isn't
 in a lot of textbooks, but it is simple enough to figure out.

 Remember, you're getting two real FFT's for the cost of one CFFT, plus
 a bit of simple pre- and post- processing. In our application, the pre-
 and post-processing was rolled in with the processing before and after
 the FFT, and was almost free. There is no way this will have 'more
 overhead' than the "straightforward" real fft -- unless your FFT is VERY
 small, like 64 points or less.

 There is something to watch out for -- since you are rolling two FFT's
 into one, there will be noise crosstalk between the channels due to rounding
 errors (assuming an integer DSP). In our application, the two channels
 were left and right from stereo, so crosstalk wasn't too much of a problem.
 In a double precision floating-point enviroment, you would should have
 >100 db isolation easily.

 The type of FFT - DIT or DIF - affects the relationship between the signal
 spectrum and the rounding noise spectrum in interesting ways. One of the
 things they don't cover in textbooks.

 ---------------

reply my remark:

> btw.: i found that considerations
> for 
> -DSP/non DSP
> -integer FFT/ float FFT
> -small sizes/long sizes
> -cache mem machines / non ...
> etc etc
> are often completely different.

Yes, it does - there's also real-world/academic to deal
with - often academic works count the multiplies
and divides and ignore the addressing; in any decent
DSP the hard part is addressing and sequencing to
keep the ALU busy.

------------------------------------------------------------------