
Copyright 2003, 2004, 2006 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301, USA.





			AMD64 MPN SUBROUTINES


This directory contains mpn functions for AMD64 chips.  It might also be
useful for 64-bit Pentiums, but those chips' poor carry handling makes that
unlikely.  We will need completely separate code for them (in a
subdirectory).
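
To see why carry handling matters so much for this kind of code, consider
what a multi-limb addition (the job of an add_n routine) has to compute.
The following is a rough portable C sketch, not the code in this directory.
The names limb_t and ref_add_n are made up for illustration, and a 64-bit
limb is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the library's limb type on a 64-bit target. */
typedef uint64_t limb_t;

/* Add the n-limb numbers {ap, n} and {bp, n}, least significant limb
   first.  Store the n-limb sum at rp and return the carry out (0 or 1). */
limb_t ref_add_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    limb_t carry = 0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        limb_t a = ap[i];
        limb_t s = a + bp[i];     /* wraps modulo 2^64 */
        limb_t c1 = s < a;        /* carry out of a + b */
        limb_t r = s + carry;
        limb_t c2 = r < s;        /* carry out of adding the incoming carry */

        rp[i] = r;
        carry = c1 | c2;          /* at most one of the two can be set */
    }
    return carry;
}
```

Each limb of the result depends on the carry out of the previous limb.  An
assembly version would typically keep that carry in the flags and use an
add-with-carry instruction per limb, so a chip that handles the carry flag
slowly pays for it on every limb.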


		     RELEVANT OPTIMIZATION ISSUES

The only AMD64 core as of this writing is the AMD Hammer, sold under the
names Opteron and Athlon64.  The Hammer can sustain up to 3 instructions per
cycle, but in practice that is only possible for integer instructions.
Almost any three integer instructions can issue simultaneously, though,
including any 3 ALU operations, shifts among them.  Up to two memory
operations can issue each cycle.
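
These two limits give a quick lower bound on the cycles a loop iteration
can take.  The sketch below is only back-of-the-envelope arithmetic under
stated assumptions: the function name and the HAMMER_* constants are made
up here, the constants simply restate the figures above, and latency,
decode limits and dependency chains are all ignored.

```c
#include <assert.h>

/* Figures from the text above; the names are hypothetical. */
#define HAMMER_ISSUE_WIDTH 3   /* instructions per cycle */
#define HAMMER_MEM_WIDTH   2   /* memory operations per cycle */

/* Lower bound, in cycles, for one loop iteration that executes `insns`
   instructions of which `mem_ops` access memory.  Real code can be slower;
   this only reflects the two issue limits. */
double cycles_lower_bound(unsigned insns, unsigned mem_ops,
                          unsigned issue_width, unsigned mem_width)
{
    double by_issue, by_memory;

    assert(issue_width > 0 && mem_width > 0);
    by_issue = (double) insns / issue_width;
    by_memory = (double) mem_ops / mem_width;
    return by_issue > by_memory ? by_issue : by_memory;
}
```

For example, a hypothetical addition loop that spends one load, one
add-with-carry from memory and one store per limb has 3 instructions and 3
memory operations per limb.  Under these assumptions,
cycles_lower_bound(3, 3, HAMMER_ISSUE_WIDTH, HAMMER_MEM_WIDTH) gives 1.5
cycles per limb, so memory operations rather than issue width set the
bound.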

Scheduling typically requires that load-use instructions be split into
separate load and use instructions.  On Hammer that costs more decode
resources and is rarely a win, since Hammer is a deep out-of-order core.


REFERENCES

"System V Application Binary Interface AMD64 Architecture Processor
Supplement", draft version 0.90, April 2003.