mpir/mpn/x86_64/amd64
add_n.as
addmul_1.as
copyd.as
copyi.as
gmp-mparam.h Updated gmp_mparam.h for x86_64/amd64 2008-09-11 23:50:19 +00:00
mul_basecase.as
README
sqr_basecase.as
sub_n.as
submul_1.as
udiv.as
umul.as
x86_64-defs.m4

;  Copyright 2005, 2006 Pierrick Gaudry
;
;  This file is part of the MPIR Library.
;
;  The MPIR Library is free software; you can redistribute it and/or
;  modify it under the terms of the GNU Lesser General Public License as
;  published by the Free Software Foundation; either version 2.1 of the
;  License, or (at your option) any later version.
;
;  The MPIR Library is distributed in the hope that it will be useful,
;  but WITHOUT ANY WARRANTY; without even the implied warranty of
;  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;  Lesser General Public License for more details.
;
;  You should have received a copy of the GNU Lesser General Public
;  License along with the MPIR Library; see the file COPYING.LIB.  If
;  not, write to the Free Software Foundation, Inc., 51 Franklin Street,
;  Fifth Floor, Boston, MA 02110-1301, USA.

Some assembly routines for the AMD64 architecture (Opteron, Athlon64)

Author:    P. Gaudry
Date:      April 2005 -- March 2006
Copyright: LGPL

Purpose:
========

This is a patch to gmp-4.2 for the AMD64 architecture. The 4.2 version comes
with basic assembly support. This patch gives a substantial speed-up.

Only a few functions have been written:
  add_n
  sub_n
  addmul_1
  submul_1
  mul_basecase
  sqr_basecase
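
For readers unfamiliar with the mpn layer, the following C sketch shows what
three of these routines compute. It is a reference for the semantics only, not
the assembly from this patch: the names ref_add_n, ref_addmul_1 and
ref_mul_basecase are hypothetical, limbs are assumed to be 64 bits wide, and
the code relies on the unsigned __int128 extension of GCC and Clang.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumption: one limb is 64 bits */

/* rp[0..n-1] = up[0..n-1] + vp[0..n-1]; returns the carry out (0 or 1). */
limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + carry;
        limb_t c1 = s < carry;        /* wrapped while adding the carry */
        limb_t r = s + vp[i];
        limb_t c2 = r < s;            /* wrapped while adding vp[i] */
        rp[i] = r;
        carry = c1 | c2;              /* at most one of the two can be set */
    }
    return carry;
}

/* rp[0..n-1] += up[0..n-1] * v; returns the limb carried out of the top. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* up[i] * v + rp[i] + carry always fits in 128 bits. */
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 64);
    }
    return carry;
}

/* rp[0..un+vn-1] = up[0..un-1] * vp[0..vn-1] (schoolbook product).
   rp must not overlap either input. */
void ref_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                      const limb_t *vp, size_t vn)
{
    for (size_t i = 0; i < un + vn; i++)
        rp[i] = 0;
    /* One addmul_1 pass per limb of vp, shifted by one limb each time. */
    for (size_t j = 0; j < vn; j++)
        rp[un + j] = ref_addmul_1(rp + j, up, un, vp[j]);
}
```

sub_n and submul_1 follow the same pattern with a borrow instead of a carry,
and sqr_basecase computes the same result as mul_basecase with both operands
equal.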

The assembly code is mostly a 64-bit translation of the k7 assembly code
that is available in GMP. The main modifications are:
* The ABI for function calls is not the same: up to 6 parameters
  are passed in registers, not on the stack.
* Change movl to movq, eax to rax, etc. That's the easy part.
* In an unrolled loop, the size of the unrolled code is not the same, so
  the computation of the jump into the loop is different.
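
The first and last points can be pictured in C. This is a sketch under stated
assumptions, not the code from this patch: unrolled_add_n is a hypothetical
name, the register assignment in the comment is the one defined by the System V
AMD64 calling convention, and the switch merely stands in for the address
computation that the assembly does by hand (entry point = start of the unrolled
body + an offset that depends on n mod 4 and on the byte size of one unrolled
step). The unsigned __int128 type is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Under the System V AMD64 calling convention the four arguments arrive in
   rdi (rp), rsi (up), rdx (vp) and rcx (n); none is passed on the stack. */
uint64_t unrolled_add_n(uint64_t *rp, const uint64_t *up,
                        const uint64_t *vp, size_t n)
{
    uint64_t carry = 0;
    if (n == 0)
        return 0;
    size_t passes = (n + 3) / 4;   /* trips through the 4-way unrolled body */

/* One limb of addition with carry propagation. */
#define STEP do { \
        unsigned __int128 t = (unsigned __int128)*up++ + *vp++ + carry; \
        *rp++ = (uint64_t)t; \
        carry = (uint64_t)(t >> 64); \
    } while (0)

    switch (n % 4) {               /* choose where to enter the body */
    case 0: do { STEP;
    case 3:      STEP;
    case 2:      STEP;
    case 1:      STEP;
            } while (--passes > 0);
    }
#undef STEP
    return carry;
}
```

In assembly the entry offset is a multiple of the encoded size of one step,
which is why a change in instruction sizes between the 32-bit and 64-bit code
changes the jump computation.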

Changes:
========

There is almost no change compared to the patch for 4.1.4. The
multiplication has been slightly improved (around 3.15 cyc/limb) but most
of the improvement in the gmpbench score comes from modifications in the
C code of GMP between the two versions.

Disclaimer:
===========

The code has been reasonably well tested. I used the program tests/devel/try,
which checks for quite a few possible bugs. Nonetheless, there is no
warranty whatsoever.

Bugs:
=====

Please send comments and bug reports to gaudry@lix.polytechnique.fr and *not*
to the official GMP developers: they have nothing to do with this code.

Performance:
============

I get a multiply bench score of around 55000 on a 2.4 GHz Opteron (it was
41500 with plain 4.2). The overall gmpbench score is about 10000 (it was
8200 before the patch).

Install:
========

1) Get the gmp-4.2 archive and unpack it, thus creating a
   directory /path_to_gmp/gmp-4.2/
2) In the directory of mpn_amd64.42, run
    ./install /path_to_gmp/gmp-4.2
3) cd /path_to_gmp/gmp-4.2
4) Run ./configure with your favorite options
5) make && make check && make install