; Copyright 2005, 2006 Pierrick Gaudry
;
; This file is part of the MPIR Library.
;
; The MPIR Library is free software; you can redistribute it and/or
; modify it under the terms of the GNU Lesser General Public License as
; published by the Free Software Foundation; either version 2.1 of the
; License, or (at your option) any later version.
;
; The MPIR Library is distributed in the hope that it will be useful,
; but WITHOUT ANY WARRANTY; without even the implied warranty of
; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
; Lesser General Public License for more details.
;
; You should have received a copy of the GNU Lesser General Public
; License along with the MPIR Library; see the file COPYING.LIB. If
; not, write to the Free Software Foundation, Inc., 51 Franklin Street,
; Fifth Floor, Boston, MA 02110-1301, USA.

Some assembly routines for AMD64 architecture (Opteron, Athlon64)

Author: P. Gaudry
Date: April 2005 -- March 2006
Copyright: LGPL

Purpose:
========

This is a patch to gmp-4.2 for the AMD64 architecture. The 4.2 version
comes with basic assembly support. This patch gives a substantial
speed-up. Only a few functions have been written:
  add_n         sub_n
  addmul_1      submul_1
  mul_basecase  sqr_basecase

The assembly code is mostly a 64-bit translation of the k7 assembly code
that is available in GMP. The main modifications are:
  * The ABI for function calls is not the same: up to 6 parameters are
    passed in registers, not on the stack.
  * Change movl to movq, eax to rax, etc. That's the easy part.
  * In an unrolled loop, the size of the unrolled code is not the same,
    so the computation of the jump is different.

Changes:
========

There is almost no change compared to the patch for 4.1.4. The
multiplication has been slightly improved (around 3.15 cyc/limb), but
most of the improvement in the gmpbench score comes from modifications
in the C code of GMP between the 2 versions.

Disclaimer:
===========

The code has been reasonably well tested.
I used the program tests/devel/try, which checks for quite a few
possible bugs. Nonetheless, there is no warranty whatsoever.

Bugs:
=====

Please send comments and bugs to gaudry@lix.polytechnique.fr and *not*
to the official GMP developers: they have nothing to do with this code.

Performance:
============

I've got a multiply bench of around 55000 on a 2.4 GHz Opteron (was
41500 with the plain 4.2). The whole gmpbench score is about 10000 (was
8200 before patch).

Install:
========

1) Get the gmp-4.2 archive and unpack it, thus creating a directory
   /path_to_gmp/gmp-4.2/
2) In the directory of mpn_amd64.42, run
   ./install /path_to_gmp/gmp-4.2
3) cd /path_to_gmp/gmp-4.2
4) ./configure with your favorite options
5) make && make check && make install
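
Reference sketch:
=================

For readers unfamiliar with the routines listed under Purpose, the
following plain C sketch shows what add_n and addmul_1 compute on arrays
of 64-bit limbs (least significant limb first). It is not the assembly
code from this patch and makes no attempt at speed; it only follows the
documented behaviour of mpn_add_n and mpn_addmul_1. The names limb_t,
ref_add_n, ref_addmul_1 and mul_64x64 are hypothetical and chosen for
illustration only (GMP itself uses mp_limb_t).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limb type for this sketch; GMP calls it mp_limb_t. */
typedef uint64_t limb_t;

/* rp[0..n-1] = ap[0..n-1] + bp[0..n-1]; returns the carry-out (0 or 1). */
static limb_t ref_add_n(limb_t *rp, const limb_t *ap, const limb_t *bp,
                        size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t a = ap[i];
        limb_t s = a + bp[i];       /* may wrap around */
        limb_t c1 = s < a;          /* carry from a + b */
        limb_t t = s + carry;
        limb_t c2 = t < s;          /* carry from adding the old carry */
        rp[i] = t;
        carry = c1 | c2;            /* at most one of them can be set */
    }
    return carry;
}

/* Portable 64x64 -> 128 bit product, split into high and low limbs. */
static void mul_64x64(limb_t a, limb_t b, limb_t *hi, limb_t *lo)
{
    limb_t a0 = a & 0xFFFFFFFFu, a1 = a >> 32;
    limb_t b0 = b & 0xFFFFFFFFu, b1 = b >> 32;
    limb_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    /* Sum of the three contributions to bits 32..63, with carries kept. */
    limb_t mid = (p00 >> 32) + (p01 & 0xFFFFFFFFu) + (p10 & 0xFFFFFFFFu);
    *lo = (mid << 32) | (p00 & 0xFFFFFFFFu);
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* rp[0..n-1] += ap[0..n-1] * b; returns the limb carried out of the top. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *ap, size_t n, limb_t b)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t hi, lo;
        mul_64x64(ap[i], b, &hi, &lo);
        lo += carry;
        hi += lo < carry;           /* propagate carry from adding old carry */
        limb_t r = rp[i] + lo;
        hi += r < lo;               /* propagate carry from adding rp[i] */
        rp[i] = r;
        carry = hi;                 /* cannot overflow: a*b + c + r < 2^128 */
    }
    return carry;
}
```

sub_n and submul_1 are the same loops with subtraction and a borrow in
place of the carry.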