a197a2d3eb
Removed directories for no longer supported architectures.

| File |
|---|
| aors_n.asm |
| aorsmul_1.asm |
| gmp-mparam.h |
| mul_basecase.asm |
| README |
| sqr_basecase.asm |
| x86_64-defs.m4 |
Some assembly routines for AMD64 architecture (Opteron, Athlon64)

Author: P. Gaudry
Date: April 2005 -- March 2006
Copyright: LGPL

Purpose:
========

This is a patch to gmp-4.2 for the AMD64 architecture. The 4.2 version
comes with basic assembly support. This patch gives a substantial speed-up.

Only a few functions have been written:
  add_n sub_n
  addmul_1 submul_1
  mul_basecase
  sqr_basecase

The assembly code is mostly a 64-bit translation of the k7 assembly code
that is available in GMP. The main modifications are:
  * The ABI for function calls is not the same: up to 6 parameters are
    passed in registers, not on the stack.
  * Change movl to movq, eax to rax, etc. That's the easy part.
  * In an unrolled loop, the size of the unrolled code is not the same,
    so the computation of the jump is different.

Changes:
========

There is almost no change compared to the patch for 4.1.4. The
multiplication has been slightly improved (around 3.15 cyc/limb), but most
of the improvement in the gmpbench score comes from modifications in the
C code of GMP between the 2 versions.

Disclaimer:
===========

The code has been reasonably well tested. I used the program
tests/devel/try, which tests quite a few bug possibilities. Nonetheless,
there is no warranty whatsoever.

Bugs:
=====

Please send comments and bugs to gaudry@lix.polytechnique.fr and *not* to
the official GMP developers: they have nothing to do with this code.

Performance:
============

I get a multiply bench score of around 55000 on a 2.4 GHz Opteron (it was
41500 with the plain 4.2). The whole gmpbench score is about 10000 (it was
8200 before the patch).

Install:
========

1) Get the gmp-4.2 archive and unpack it, thus creating a directory
   /path_to_gmp/gmp-4.2/
2) In the directory of mpn_amd64.42, run
   ./install /path_to_gmp/gmp-4.2
3) cd /path_to_gmp/gmp-4.2
4) ./configure with your favorite options
5) make && make check && make install
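
As a rough illustration of what two of the routines listed under Purpose compute, here is a minimal portable C sketch of the add_n and addmul_1 operations on arrays of 64-bit limbs. It is not the assembly from this patch and not GMP's own code: the names limb_t, ref_add_n and ref_addmul_1 are hypothetical, and the 128-bit product relies on the unsigned __int128 extension of GCC and Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit limb type on AMD64. */
typedef uint64_t limb_t;

/* rp[0..n-1] = up[0..n-1] + vp[0..n-1]; returns the carry out (0 or 1). */
limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s  = up[i] + vp[i];   /* may wrap around */
        limb_t c1 = s < up[i];       /* carry from the limb addition */
        limb_t r  = s + carry;       /* add the incoming carry */
        limb_t c2 = r < s;           /* carry from adding the carry */
        rp[i] = r;
        carry = c1 | c2;             /* at most one of c1, c2 is set */
    }
    return carry;
}

/* rp[0..n-1] += up[0..n-1] * v; returns the high limb carried out. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^64-1)^2 + 2*(2^64-1) = 2^128 - 1, so this cannot overflow. */
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;           /* low 64 bits */
        carry = (limb_t)(t >> 64);   /* high 64 bits */
    }
    return carry;
}
```

A sketch like this can serve as a slow reference when comparing results against an optimized routine. It says nothing about the roughly 3.15 cyc/limb speed mentioned under Changes, which depends on the hand-written assembly.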