
Some assembly routines for the AMD64 architecture (Opteron, Athlon64)

Author:    P. Gaudry
Date:      April 2005 -- March 2006
Copyright: LGPL

Purpose:
========

This is a patch to gmp-4.2 for the AMD64 architecture. The 4.2 version comes
with basic assembly support; this patch gives a substantial speed-up.

Only a few functions have been written:
  add_n
  sub_n
  addmul_1
  submul_1
  mul_basecase
  sqr_basecase
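For readers new to GMP's mpn layer, the portable C below is a minimal sketch of what the routines listed above compute. It assumes 64-bit limbs stored least significant first in uint64_t arrays ({rp,n} means the n-limb number starting at rp). The names ref_add_n, ref_addmul_1, mul_64x64 and so on are hypothetical illustrations, not GMP's own reference code. sqr_basecase computes the same result as mul_basecase with both operands equal; an optimized version typically saves work by computing each off-diagonal product only once.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* one 64-bit limb; numbers are stored least significant limb first */

/* Full 64x64 -> 128-bit product split into *hi and *lo, built from
   32-bit halves so that no compiler extensions are needed. */
static void mul_64x64(limb_t a, limb_t b, limb_t *hi, limb_t *lo)
{
    limb_t a0 = a & 0xffffffffu, a1 = a >> 32;
    limb_t b0 = b & 0xffffffffu, b1 = b >> 32;
    limb_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    /* Sum of the three terms landing on bits 32..63; cannot overflow. */
    limb_t mid = (p00 >> 32) + (p01 & 0xffffffffu) + (p10 & 0xffffffffu);
    *lo = (mid << 32) | (p00 & 0xffffffffu);
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* add_n: {rp,n} = {up,n} + {vp,n}; returns the carry out (0 or 1). */
static limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + cy;
        cy = (s < cy);      /* wrapped while adding the incoming carry */
        limb_t r = s + vp[i];
        cy += (r < s);      /* wrapped while adding vp[i] */
        rp[i] = r;
    }
    return cy;
}

/* sub_n: {rp,n} = {up,n} - {vp,n}; returns the borrow out (0 or 1). */
static limb_t ref_sub_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t bw = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t u = up[i], v = vp[i];
        limb_t d = u - v;
        limb_t b1 = (u < v);   /* borrow from u - v */
        rp[i] = d - bw;
        bw = b1 + (d < bw);    /* borrow from subtracting the incoming borrow */
    }
    return bw;
}

/* addmul_1: {rp,n} += {up,n} * v; returns the high limb that does not fit. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t hi, lo;
        mul_64x64(up[i], v, &hi, &lo);
        lo += cy;    hi += (lo < cy);      /* add the carry limb from the previous step */
        lo += rp[i]; hi += (lo < rp[i]);   /* add the existing result limb */
        rp[i] = lo;
        cy = hi;     /* product plus two limbs always fits in 128 bits */
    }
    return cy;
}

/* submul_1: {rp,n} -= {up,n} * v; returns the high limb to be borrowed. */
static limb_t ref_submul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t hi, lo;
        mul_64x64(up[i], v, &hi, &lo);
        lo += cy; hi += (lo < cy);
        limb_t r = rp[i];
        hi += (r < lo);   /* borrow when subtracting lo from r */
        rp[i] = r - lo;
        cy = hi;
    }
    return cy;
}

/* mul_basecase: {rp,un+vn} = {up,un} * {vp,vn} by schoolbook
   multiplication, one addmul_1 pass per limb of vp
   (un >= vn >= 1; rp must not overlap the inputs). */
static void ref_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                             const limb_t *vp, size_t vn)
{
    for (size_t i = 0; i < un + vn; i++)
        rp[i] = 0;
    for (size_t j = 0; j < vn; j++)
        rp[un + j] = ref_addmul_1(rp + j, up, un, vp[j]);
}
```

Because this mul_basecase is one addmul_1 pass per limb of vp, the speed of the addmul_1 inner loop (measured in cycles/limb) largely determines the speed of small multiplications.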

The assembly code is mostly a 64-bit translation of the k7 assembly code
that is available in GMP. The main modifications are:
* The ABI for function calls is different: up to 6 integer parameters
  are passed in registers (rdi, rsi, rdx, rcx, r8, r9), not on the stack.
* movl becomes movq, eax becomes rax, etc. That's the easy part.
* In an unrolled loop, each copy of the loop body has a different size in
  bytes (64-bit instructions need an extra REX prefix, for instance), so
  the computed jump into the loop is different.
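The last two points can be illustrated in C. The sketch below is hypothetical (duff_add_n is not part of GMP): its comment records where each argument of an add_n-style function arrives under the System V AMD64 calling convention, and the switch uses Duff's device to jump into the middle of a 4-way unrolled loop. The assembly does the same with an explicit address computation, roughly the loop end minus (n mod 4) times the byte length of one unrolled step; since that byte length changes in 64-bit code, the jump computation has to change too.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;

/* Hypothetical add_n with a computed entry into a 4-way unrolled loop.
   Under the System V AMD64 ABI the arguments arrive in registers:
   rp in rdi, up in rsi, vp in rdx, n in rcx.  (The 32-bit k7 code
   loads them from the stack instead.)  Returns the carry out. */
static limb_t duff_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    size_t i = 0;
    size_t passes = (n + 3) / 4;  /* trips through the unrolled body */

    if (n == 0)
        return 0;

/* One limb of add-with-carry: the code repeated 4 times in the body. */
#define STEP                        \
    do {                            \
        limb_t s = up[i] + cy;      \
        cy = (s < cy);              \
        limb_t r = s + vp[i];       \
        cy += (r < s);              \
        rp[i] = r;                  \
        i++;                        \
    } while (0)

    /* Computed jump: enter the body part-way so the first trip handles
       n % 4 limbs (all 4 when n is a multiple of 4). */
    switch (n % 4) {
    case 0: do { STEP;
    case 3:      STEP;
    case 2:      STEP;
    case 1:      STEP;
            } while (--passes > 0);
    }
#undef STEP
    return cy;
}
```

Duff's device is only a stand-in here: the C compiler builds a jump table from the switch, whereas the assembly computes the target address directly from n.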

Changes:
========

There is almost no change compared to the patch for 4.1.4. The
multiplication has been slightly improved (around 3.15 cycles/limb), but
most of the improvement in the gmpbench score comes from changes in the
C code of GMP between the two versions.

Disclaimer:
===========

The code has been reasonably well tested. I used the program tests/devel/try,
which checks for quite a few possible bugs. Nonetheless, there is no
warranty whatsoever.
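The idea behind tests/devel/try is differential testing: run the routine under test and a simple reference version on the same operands, over many sizes and data patterns, and compare the results. The C below is a much smaller, hypothetical sketch of that idea, not the try program itself: fast_add_n (2-way unrolled, with a different carry formula) is checked against a one-limb-at-a-time ref_add_n, using an xorshift generator biased toward 0 and all-ones limbs, since those values produce long carry chains.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t limb_t;

/* xorshift64 pseudo-random generator: deterministic, so a failure
   can be reproduced. */
static uint64_t rng_state = 0x9e3779b97f4a7c15u;
static uint64_t rng(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Random limb, biased toward 0 and all-ones to provoke long carry chains. */
static limb_t rand_limb(void)
{
    switch (rng() % 4) {
    case 0:  return 0;
    case 1:  return ~(limb_t)0;
    default: return rng();
    }
}

/* Reference: one limb at a time, carry computed in two steps. */
static limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + cy;
        cy = (s < cy);
        limb_t r = s + vp[i];
        cy += (r < s);
        rp[i] = r;
    }
    return cy;
}

/* Version under test: 2-way unrolled, with a different carry formula
   (with carry-in 1 the sum wrapped iff r <= u, otherwise iff r < u). */
static limb_t fast_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        limb_t u0 = up[i], u1 = up[i + 1];
        limb_t r0 = u0 + vp[i] + cy;
        cy = cy ? (r0 <= u0) : (r0 < u0);
        limb_t r1 = u1 + vp[i + 1] + cy;
        cy = cy ? (r1 <= u1) : (r1 < u1);
        rp[i] = r0;
        rp[i + 1] = r1;
    }
    if (i < n) {   /* odd leftover limb */
        limb_t r0 = up[i] + vp[i] + cy;
        cy = cy ? (r0 <= up[i]) : (r0 < up[i]);
        rp[i] = r0;
    }
    return cy;
}

#define MAXN 16

/* Run both versions on random operands of 1..MAXN limbs; returns the
   number of trials where the result or carry differ (0 means agreement). */
static int check_add_n(int trials)
{
    limb_t u[MAXN], v[MAXN], r1[MAXN], r2[MAXN];
    int bad = 0;
    for (int t = 0; t < trials; t++) {
        size_t n = 1 + (size_t)(rng() % MAXN);
        for (size_t i = 0; i < n; i++) {
            u[i] = rand_limb();
            v[i] = rand_limb();
        }
        limb_t c1 = ref_add_n(r1, u, v, n);
        limb_t c2 = fast_add_n(r2, u, v, n);
        if (c1 != c2 || memcmp(r1, r2, n * sizeof(limb_t)) != 0)
            bad++;
    }
    return bad;
}
```

In try the routine under test is the assembly code itself; here both sides are C only to keep the sketch self-contained.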

Bugs:
=====

Please send comments and bugs to gaudry@lix.polytechnique.fr and *not* to
the official GMP developers: they have nothing to do with this code.

Performance:
============

I get a multiply benchmark score of around 55000 on a 2.4 GHz Opteron
(41500 with plain 4.2, so about 33% higher). The overall gmpbench score is
about 10000 (8200 before the patch, about 22% higher).

Install:
========

1) Get the gmp-4.2 archive and unpack it, which creates the directory
   /path_to_gmp/gmp-4.2/
2) In the mpn_amd64.42 directory, run
    ./install /path_to_gmp/gmp-4.2
3) cd /path_to_gmp/gmp-4.2
4) Run ./configure with your preferred options
5) make && make check && make install