Some assembly routines for AMD64 architecture (Opteron, Athlon64)
Author: P. Gaudry
Date: April 2005 -- March 2006
Copyright: LGPL
Purpose:
========
This is a patch to gmp-4.2 for the AMD64 architecture. Version 4.2 comes
with basic assembly support; this patch gives a substantial speed-up.
Only a few functions have been written:
  add_n
  sub_n
  addmul_1
  submul_1
  mul_basecase
  sqr_basecase
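
For readers unfamiliar with these routines, here is a rough portable C
sketch of what add_n and addmul_1 compute. It is not the code in this
patch. The names limb_t, ref_add_n and ref_addmul_1 are made up for
illustration, and unsigned __int128 is a GCC/Clang extension that is
assumed to be available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GMP's 64-bit limb type on AMD64. */
typedef uint64_t limb_t;

/* rp[0..n-1] = up[0..n-1] + vp[0..n-1]; returns the carry out (0 or 1). */
limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    assert(n >= 1);
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + vp[i];   /* may wrap around */
        limb_t c1 = s < up[i];      /* carry from up[i] + vp[i] */
        limb_t t = s + carry;
        limb_t c2 = t < s;          /* carry from adding the incoming carry */
        rp[i] = t;
        carry = c1 | c2;            /* at most one of the two can be set */
    }
    return carry;
}

/* rp[0..n-1] += up[0..n-1] * v; returns the limb carried out of the top. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    assert(n >= 1);
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* 64x64 -> 128 bit product plus two 64-bit addends cannot overflow. */
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 64);
    }
    return carry;
}
```

sub_n and submul_1 are the same with subtraction and a borrow instead of
a carry. mul_basecase and sqr_basecase are the schoolbook product and
square, and can be built from repeated addmul_1-style passes.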
The assembly code is mostly a 64-bit translation of the k7 assembly code
that is available in GMP. The main modifications are:
* The ABI for function calls is not the same: up to 6 integer or pointer
  parameters are passed in registers, not on the stack.
* Change movl to movq, eax to rax, etc. That's the easy part.
* In an unrolled loop, the size in bytes of the unrolled code is not the
  same as in the 32-bit version, so the computation of the jump into the
  loop is different.
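
To illustrate the last point, here is a C analogue of jumping into the
middle of an unrolled loop. This is only a sketch, not the technique as
implemented in the patch: the assembly computes a code address from the
operand size and the byte length of one unrolled step, whereas C can only
express the same idea with a switch (Duff's device). The names limb_t,
ADC_STEP and unrolled_add_n are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GMP's 64-bit limb type on AMD64. */
typedef uint64_t limb_t;

/* One add-with-carry step on the current limb, then advance the pointers. */
#define ADC_STEP() do {              \
        limb_t s = *up + *vp;        \
        limb_t c1 = s < *up;         \
        limb_t t = s + carry;        \
        carry = c1 | (t < s);        \
        *rp = t; rp++; up++; vp++;   \
    } while (0)

/* rp = up + vp over n limbs, body unrolled 4 times; returns the carry out. */
limb_t unrolled_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    assert(n >= 1);
    limb_t carry = 0;
    size_t passes = (n + 3) / 4;  /* trips through the unrolled body */

    /* The switch picks the entry point so that the first, partial pass
       handles n % 4 limbs. In assembly this is a computed jump whose
       offset depends on the encoded size of one step. */
    switch (n % 4) {
    case 0: do { ADC_STEP();
    case 3:      ADC_STEP();
    case 2:      ADC_STEP();
    case 1:      ADC_STEP();
            } while (--passes > 0);
    }
    return carry;
}
```

In the assembly version the entry offset is multiplied by the byte length
of one step. That length changes when 32-bit instructions are replaced by
their 64-bit forms, which is why the jump computation had to be redone.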
Changes:
========
There is almost no change compared to the patch for 4.1.4. The
multiplication has been slightly improved (to around 3.15 cycles/limb), but
most of the improvement in the gmpbench score comes from modifications in
the C code of GMP between the two versions.
Disclaimer:
===========
The code has been reasonably well tested. I used the program tests/devel/try,
which checks for quite a few kinds of bugs. Nonetheless, there is no
warranty whatsoever.
Bugs:
=====
Please send comments and bug reports to gaudry@lix.polytechnique.fr and
*not* to the official GMP developers: they have nothing to do with this code.
Performance:
============
I get a multiply benchmark score of around 55000 on a 2.4 GHz Opteron
(41500 with plain 4.2). The overall gmpbench score is about 10000 (8200
before the patch).
Install:
========
1) Get the gmp-4.2 archive and unpack it, thus creating a
   directory /path_to_gmp/gmp-4.2/
2) In the directory of mpn_amd64.42, run
     ./install /path_to_gmp/gmp-4.2
3) cd /path_to_gmp/gmp-4.2
4) Run ./configure with your favorite options
5) make && make check && make install