2. Tidy up assembler to prepare for Windows Nehalem build
2. Work around VC++ optimisation bug in mul_fft.c