; Copyright 2005, 2006 Pierrick Gaudry
;
; This file is part of the MPIR Library.
;
; The MPIR Library is free software; you can redistribute it and/or
; modify it under the terms of the GNU Lesser General Public License as
; published by the Free Software Foundation; either version 2.1 of the
; License, or (at your option) any later version.
;
; The MPIR Library is distributed in the hope that it will be useful,
; but WITHOUT ANY WARRANTY; without even the implied warranty of
; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
; Lesser General Public License for more details.
;
; You should have received a copy of the GNU Lesser General Public
; License along with the MPIR Library; see the file COPYING.LIB. If
; not, write to the Free Software Foundation, Inc., 51 Franklin Street,
; Fifth Floor, Boston, MA 02110-1301, USA.

Some assembly routines for AMD64 architecture (Opteron, Athlon64)

Author: P. Gaudry
Date: April 2005 -- March 2006
Copyright: LGPL

Purpose:
========

This is a patch to gmp-4.2 for the AMD64 architecture. The 4.2 version comes
with basic assembly support. This patch gives a substantial speed-up.

Only a few functions have been written:
  add_n
  sub_n
  addmul_1
  submul_1
  mul_basecase
  sqr_basecase

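For readers unfamiliar with these names, the following is a minimal portable C
sketch of what add_n and addmul_1 compute on arrays of 64-bit limbs (least
significant limb first). It is not the code from this patch: the names limb_t,
ref_add_n and ref_addmul_1 are hypothetical, and the use of unsigned __int128
assumes a GCC- or Clang-compatible compiler.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* hypothetical name; GMP's own type is mp_limb_t */

/* rp[0..n-1] = up[0..n-1] + vp[0..n-1]; returns the carry out (0 or 1). */
limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + carry;
        limb_t c1 = s < carry;   /* wrapped while adding the incoming carry */
        limb_t r = s + vp[i];
        limb_t c2 = r < s;       /* wrapped while adding vp[i] */
        rp[i] = r;
        carry = c1 | c2;         /* at most one of the two can be set */
    }
    return carry;
}

/* rp[0..n-1] += up[0..n-1] * v; returns the high limb carried out. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* 64x64 -> 128-bit product plus two limbs cannot overflow 128 bits. */
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 64);
    }
    return carry;
}
```

sub_n and submul_1 follow the same pattern with borrows instead of carries.
The assembly versions aim to run these inner loops in as few cycles per limb
as possible.
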
The assembly code is mostly a 64-bit translation of the k7 assembly code
that is available in GMP. The main modifications are:
  * The ABI for function calls is not the same: up to 6 parameters
    are passed in registers, not on the stack.
  * Change movl to movq, eax to rax, etc. That's the easy part.
  * In an unrolled loop, the size of the unrolled code is not the same, so
    the computation of the jump is different.

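The point about the jump into an unrolled loop can be illustrated in C. In
assembly, the entry point is typically computed from the leftover count and
the byte size of one unrolled chunk, so a change in instruction encoding size
changes that arithmetic. The sketch below is only an analogy under that
assumption: a switch statement (Duff's device) stands in for the computed
jump, and the function sum_unrolled is hypothetical, not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum n limbs with a 4-way unrolled loop. The switch enters the loop body
   part-way through on the first pass to handle n % 4 leftover elements,
   which is the role the computed jump plays in the assembly routines. */
uint64_t sum_unrolled(const uint64_t *p, size_t n)
{
    uint64_t s = 0;
    if (n == 0)
        return 0;
    size_t passes = (n + 3) / 4;   /* number of (possibly partial) passes */
    switch (n % 4) {
    case 0: do { s += *p++;
    case 3:      s += *p++;
    case 2:      s += *p++;
    case 1:      s += *p++;
            } while (--passes > 0);
    }
    return s;
}
```

A C compiler usually implements the switch with a jump table, so the code
size never appears explicitly. Hand-written assembly that multiplies the
leftover count by the chunk size has to be adjusted whenever that size
changes.
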
Changes:
========

There is almost no change compared to the patch for 4.1.4. The
multiplication has been slightly improved (around 3.15 cyc/limb), but most
of the improvement in the gmpbench score comes from modifications in the
C code of GMP between the two versions.

Disclaimer:
===========

The code has been reasonably well tested. I used the program tests/devel/try,
which tests quite a few bug possibilities. Nonetheless, there is no
warranty whatsoever.

Bugs:
=====

Please send comments and bug reports to gaudry@lix.polytechnique.fr and *not*
to the official GMP developers: they have nothing to do with this code.

Performance:
============

I've got a multiply bench of around 55000 on a 2.4 GHz Opteron (was 41500
with the plain 4.2). The whole gmpbench score is about 10000 (was
8200 before patch).

Install:
========

1) Get the gmp-4.2 archive and unpack it, thus creating a
   directory /path_to_gmp/gmp-4.2/
2) In the directory of mpn_amd64.42, run
     ./install /path_to_gmp/gmp-4.2
3) cd /path_to_gmp/gmp-4.2
4) ./configure with your favorite options
5) make && make check && make install