mpir/mpn/x86_64
2009-02-19 23:22:30 +00:00
amd64 merged k8-branch into trunk , autotools , few handbits , windows bits just copyed over 2009-02-19 23:22:30 +00:00
core2 changed gmp.h to mpir.h for a few odd cases left 2009-02-12 11:23:26 +00:00
fat for file in $(find -name \*.c ) ; do sed -e "s/#include \"gmp\.h\"/#include \"mpir.h\"/g" $file > temp ; mv temp $file ; done 2009-02-12 10:24:24 +00:00
add_n.as
addmul_1.as
gmp-mparam.h
lshift.as
mode1o.as changed libgmp*.* for a few odd cases left 2009-02-12 12:25:23 +00:00
mul_1.as
README
rshift.as
sub_n.as
submul_1.as
x86_64-defs.m4 Rewrote fat.c to work with x86_64 processors. Made fat_entry.asm 2009-01-18 23:21:54 +00:00
yasm_mac.inc Made changes to allow a fat build on x86_64. 2009-01-18 15:15:25 +00:00

Copyright 2003, 2004, 2006 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301, USA.





			AMD64 MPN SUBROUTINES


This directory contains mpn functions for AMD64 chips.  The code might also
be useful for 64-bit Pentiums, but those chips' poor carry handling makes
that unlikely, so we'll need completely separate code for them (in a
subdirectory).
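
To make the remark about carry handling concrete, here is a minimal C sketch
of what an add_n-style routine computes.  It is not the code in add_n.as.
The names limb_t and ref_add_n are made up for this illustration, and 64-bit
limbs stored least significant first are assumed.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;	/* hypothetical name: one 64-bit limb */

/* Add the n-limb numbers {ap,n} and {bp,n}, least significant limb first.
   Store the n-limb sum at rp and return the carry out (0 or 1).
   Illustration only; ref_add_n is an assumed name, not a library entry.  */
static limb_t
ref_add_n (limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
  limb_t cy = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      limb_t a = ap[i];
      limb_t s = a + bp[i];	/* wraps modulo 2^64 */
      limb_t c1 = s < a;	/* carry out of a + b */
      limb_t r = s + cy;	/* add the carry from the previous limb */
      limb_t c2 = r < s;	/* carry out of adding that carry */

      rp[i] = r;
      cy = c1 | c2;		/* the two carries are never both set */
    }
  return cy;
}
```

Each limb depends on the carry out of the limb before it.  An assembly
version would typically keep that carry in the processor's carry flag (adc
on x86_64) rather than recompute it with compares as above, which is why
slow carry handling on a chip hurts this kind of loop.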


		     RELEVANT OPTIMIZATION ISSUES

The only AMD64 core as of this writing is the AMD Hammer, sold under the
names Opteron and Athlon64.  The Hammer can sustain up to 3 instructions per
cycle, but in practice only for integer instructions.  Almost any three
integer instructions can issue simultaneously, including any 3 ALU
operations, shifts included.  Up to two memory operations can issue each
cycle.

Scheduling typically requires that load-use instructions are split into
separate load and use instructions.  That requires more decode resources,
and it is rarely a win on Hammer, since Hammer is a deep out-of-order core.


REFERENCES

"System V Application Binary Interface AMD64 Architecture Processor
Supplement", draft version 0.90, April 2003.