mpir/core2 at 0b249343251c10a5922f5c1f740e0959b3da406d

cheng/mpir


William Hart 8435273a1a Remove sb_div* small implementation (due to bug and due to being a very minor performance improvement).		2015-11-13 14:47:44 +00:00
penryn	Remove sb_div* small implementation (due to bug and due to being a very minor	2015-11-13 14:47:44 +00:00
add_n.as	New asm functions mpn_add_n mpn_sub_n for Core2/penryn/nehalem	2009-05-10 01:26:52 +00:00
addadd_n.asm	Replace lahf and sahf with .byte declarations to support old coreutils such as	2015-06-11 12:51:45 +00:00
addlsh_n.as	convert addlsh from gas to yasm format	2009-11-18 17:43:25 +00:00
addmul_1.asm	Use GMP add/submul_1 on core2 as well.	2014-02-21 15:25:41 +00:00
addmul_2.as	converted addmul_2 to yasm	2009-04-14 17:00:30 +00:00
addsub_n.asm	Replace lahf and sahf with .byte declarations to support old coreutils such as	2015-06-11 12:51:45 +00:00
and_n.as	removed dos crlf from linux asm files , update configure to recognize GLOBAL_FUNC for HAVE_NATIVE_functions	2009-03-05 17:50:57 +00:00
andn_n.as	removed dos crlf from linux asm files , update configure to recognize GLOBAL_FUNC for HAVE_NATIVE_functions	2009-03-05 17:50:57 +00:00
com_n.as	removed dos crlf from linux asm files , update configure to recognize GLOBAL_FUNC for HAVE_NATIVE_functions	2009-03-05 17:50:57 +00:00
copyd.as	New asm functions for mpn_copyi mpn_copyd for k8,k10,core2,penryn,nehalem	2009-05-10 00:20:44 +00:00
copyi.asm	change asm #comment to C comment	2009-10-15 18:13:19 +00:00
divexact_byff.as	replace divebyff.* with divexact_byff.*	2010-08-13 13:23:52 +00:00
divrem_hensel_qr_1_2.asm	change asm #comment to C comment	2009-10-15 18:13:19 +00:00
gmp-mparam.h	Remove sb_div* small implementation (due to bug and due to being a very minor	2015-11-13 14:47:44 +00:00
hamdist.asm	faster core2/penryn mpn_hamdist by using the K8 version	2010-12-05 07:49:17 +00:00
ior_n.as	removed dos crlf from linux asm files , update configure to recognize GLOBAL_FUNC for HAVE_NATIVE_functions	2009-03-05 17:50:57 +00:00
iorn_n.as	removed dos crlf from linux asm files , update configure to recognize GLOBAL_FUNC for HAVE_NATIVE_functions	2009-03-05 17:50:57 +00:00
karaadd.asm	copy k8 and karaadd/sub to the other cpu arches linux and windows	2011-07-05 20:16:41 +00:00
karasub.asm	correct pop order for karasub , add redc_2 , add generic addadd addsub subadd sumdiff	2012-03-10 08:27:37 +00:00
lshift.asm	change asm #comment to C comment	2009-10-15 18:13:19 +00:00
mod_1_1.asm	New mod_1_1 for core2 and a slightly different one for penryn	2011-02-19 10:38:02 +00:00
mod_1_2.asm	New mpn_mod_1_2 for Core2/Penryn	2011-02-20 18:16:50 +00:00
mod_1_3.asm	change asm #comment to C comment	2009-10-15 18:13:19 +00:00
mul_1.asm	New core2/penryn mul_1 asm function	2010-12-17 00:54:02 +00:00
mul_2.as	duplicate x86_64 mul_2.as to overcome fat issues	2009-04-13 20:32:16 +00:00
mul_basecase.as	mul_basecase to yasm	2009-05-20 13:03:53 +00:00
nand_n.as	removed dos crlf from linux asm files , update configure to recognize GLOBAL_FUNC for HAVE_NATIVE_functions	2009-03-05 17:50:57 +00:00
nior_n.as	removed dos crlf from linux asm files , update configure to recognize GLOBAL_FUNC for HAVE_NATIVE_functions	2009-03-05 17:50:57 +00:00
popcount.asm	movq to movd in asm	2011-04-16 16:55:00 +00:00
README	Basic GMP files with a new core2 directory and amd_64 directory with Martin's and Gaudry's patches.	2008-04-17 21:03:07 +00:00
redc_1.as	change calling convention on the asm code in x86_64 for redc_1	2010-12-16 22:39:03 +00:00
rsh1add_n.as	New asm functions mpn_rsh1add_n mpn_rsh1sub_n for K8/K10/Core2/penryn/nehalem	2009-05-10 18:46:48 +00:00
rsh1sub_n.as	New asm functions mpn_rsh1add_n mpn_rsh1sub_n for K8/K10/Core2/penryn/nehalem	2009-05-10 18:46:48 +00:00
rsh_divrem_hensel_qr_1_2.asm	some more masm? movq/movd mixups	2009-10-16 00:45:14 +00:00
rshift.asm	change asm #comment to C comment	2009-10-15 18:13:19 +00:00
store.asm	some more masm? movq/movd mixups	2009-10-16 00:45:14 +00:00
sub_n.as	New asm functions mpn_add_n mpn_sub_n for Core2/penryn/nehalem	2009-05-10 01:26:52 +00:00
subadd_n.asm	Replace lahf and sahf with .byte declarations to support old coreutils such as	2015-06-11 12:51:45 +00:00
sublsh1_n.as	removed dos crlf from linux asm files , update configure to recognize GLOBAL_FUNC for HAVE_NATIVE_functions	2009-03-05 17:50:57 +00:00
submul_1.asm	Use GMP add/submul_1 on core2 as well.	2014-02-21 15:25:41 +00:00
sumdiff_n.asm	Replace lahf and sahf with .byte declarations to support old coreutils such as	2015-06-11 12:51:45 +00:00
xnor_n.as	removed dos crlf from linux asm files , update configure to recognize GLOBAL_FUNC for HAVE_NATIVE_functions	2009-03-05 17:50:57 +00:00
xor_n.as	removed dos crlf from linux asm files , update configure to recognize GLOBAL_FUNC for HAVE_NATIVE_functions	2009-03-05 17:50:57 +00:00

README

This is a patch to solve two problems:

1.  It makes gmp run faster on Intel Core2 CPUs (e.g. Woodcrest, Conroe,
    and friends) under Linux.

2.  It makes gmp work (and run fast) under Mac OS X on Core2 CPU
    machines (e.g. Mac Pro).

As an added bonus, it also gives gmp a small speedup on AMD64
machines.


To Install on a 64-bit Intel Mac (e.g. Mac Pro)
-------------------------------------------------------
1. Download gmp-4.2.1-core2-port.tar.gz


2. Uncompress and untar it.  Let's say that it's in the directory
~/gmp-4.2.1-core2-port


3.  Download GMP version 4.2.1


4.  Uncompress and untar GMP.  Let's say that it's in the directory
~/gmp-4.2.1


5.  Change into the gmp-4.2.1-core2-port directory and run the install
script (if you want to see what it does, just read it; it's a
very simple script).

    > cd ~/gmp-4.2.1-core2-port
    > ./install_gmp_4.2.1_core2_patch.sh ~/gmp-4.2.1


6.  Configure gmp for a 64-bit Intel Mac as follows:

   > cd ~/gmp-4.2.1
   > ./configure --build=x86_64-apple-darwin CFLAGS="-m64 -fast"

(You can, of course, add whatever other configure options you want.  Be
sure to pass the CFLAGS setting given above on the configure command
line.  Otherwise, the default CFLAGS that configure generates will
cause compilation problems.)

7.  Build it!  Execute the following:

   > make


8.  Check it!  Execute the following:

   > make check


9.  Install it.

   > sudo make install




To Install on a Linux machine
-------------------------------------------------------
1. Download gmp-4.2.1-core2-port.tar.gz


2. Uncompress and untar it.  Let's say that it's in the directory
~/gmp-4.2.1-core2-port


3.  Download GMP version 4.2.1


4.  Uncompress and untar GMP.  Let's say that it's in the directory
~/gmp-4.2.1


5.  Change into the gmp-4.2.1-core2-port directory and run the install
script (if you want to see what it does, just read it; it's a
very simple script).

    > cd ~/gmp-4.2.1-core2-port
    > ./install_gmp_4.2.1_core2_patch.sh ~/gmp-4.2.1


6.  Configure gmp as normal.

   > cd ~/gmp-4.2.1
   > ./configure

(You can, of course, add whatever other config options you want.)


7.  Build it!  Execute the following:

   > make


8.  Check it!  Execute the following:

   > make check


9.  Install it.

   > sudo make install






NOTES:

1. Wow!  The GMP code base is really well organized!  It was very easy
for me to find out exactly what files needed changing.  Nice work guys!!

2. In amd64call.asm, all I changed was to make the addressing relative to
the rip register rather than absolute.  The Apple 64-bit ABI doesn't
support 32-bit absolute addressing.  Since Linux can use either
addressing mode, it makes sense to use position-independent code: it's
more portable and there's no real performance difference.

3. In add_n and sub_n I rewrote the code to accommodate the Woodcrest
nuances.  Mainly, I unrolled the main loop and got rid of the "inc"
instruction (which updates the other arithmetic flags but leaves the
carry flag unchanged, causing a false dependency on the flags register
that stalls the pipeline).  Of course, this also meant that I had to
save the carry flag between loop iterations, because the instruction
that now updates the loop counter overwrites it; I used the "lahf" and
"sahf" instructions for this.  These instructions are available on the
Mac Pro using the Apple assembler, but because some early x86_64 CPUs
didn't support them, the GNU assembler doesn't accept those mnemonics
on 64-bit machines (even when the CPU supports them).  So, my assembly
code includes some m4 code which calls the shell script
"lahf_sahf_test.sh" to determine whether the lahf and sahf instructions
are available on the CPU.  If so, it includes some hand-assembled
bytes to get around the GNU as limitation.  Otherwise, it falls back to
using "setc" and "bt", which are slower.
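The contents of lahf_sahf_test.sh are not shown here, so the following is only a hedged illustration of the question it answers, not the script's method.  On x86-64, CPUID extended leaf 0x80000001 reports in bit 0 of ECX whether LAHF and SAHF are usable in 64-bit mode.  The sketch assumes GCC or Clang, whose <cpuid.h> provides __get_cpuid(); the function name lahf_sahf_available is hypothetical.  Unlike the patch, which decides at build time through m4, this check runs at run time.

```c
/* Hypothetical run-time check; NOT the patch's lahf_sahf_test.sh.
 * Assumes GCC or Clang, which ship <cpuid.h> with __get_cpuid(). */
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns 1 if the CPU reports LAHF/SAHF as usable in 64-bit mode,
 * 0 otherwise (always 0 on non-x86 targets). */
int lahf_sahf_available(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 when extended leaf 0x80000001 is not
     * supported by this CPU; treat that as "unavailable". */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;

    /* CPUID.80000001H:ECX bit 0 is the LAHF-SAHF feature flag. */
    return (int)(ecx & 1u);
#else
    return 0;
#endif
}
```

A build-time script could compile and run a check like this, then choose between emitting the raw opcode bytes (0x9F for lahf, 0x9E for sahf) and the setc/bt fallback.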

4.  On my 2.66 GHz Mac Pro, I was able to get a GMPbench score of 8263.

5.  You'll notice a Makefile and a bunch of extraneous files.  These are
used for testing the code outside of the GMP source tree.  The Makefile
will produce a file called mpn_test which just runs the routines through
a bunch of speed and correctness tests and compares them against the
original GMP 4.2.1 assembly files.
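The Makefile and mpn_test sources are not reproduced here.  As a hedged sketch of the same kind of check, the C code below compares two implementations of the mpn_add_n contract (rp = up + vp over n limbs, returning the carry out): a plain one-limb-per-iteration loop and a four-way unrolled loop that keeps the carry in a variable between iterations, a C analogue of the unrolling described in note 3.  All names (limb_t, ref_add_n, unrolled_add_n, rand_limb, check_add_n, time_add_n) are hypothetical, 64-bit limbs are assumed, and timing uses clock(), so it gives only a rough indication, not a benchmark like GMPbench.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef uint64_t limb_t;   /* assumption: 64-bit limbs, as on x86_64 */
typedef limb_t (*add_fn)(limb_t *, const limb_t *, const limb_t *, size_t);
#define MAX_N 64

/* Reference: rp = up + vp over n limbs, one limb per iteration.
 * Returns the carry out (0 or 1). */
static limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + cy;
        limb_t c1 = s < cy;          /* carry out of up[i] + cy */
        limb_t r = s + vp[i];
        cy = c1 | (r < s);           /* carry out of s + vp[i] */
        rp[i] = r;
    }
    return cy;
}

/* One limb of addition with carry in/out through *cy. */
static inline limb_t add_limb(limb_t u, limb_t v, limb_t *cy)
{
    limb_t s = u + *cy;
    limb_t c1 = s < *cy;
    limb_t r = s + v;
    *cy = c1 | (r < s);
    return r;
}

/* Candidate: same contract, main loop unrolled four ways, with the
 * carry kept in a variable between iterations. */
static limb_t unrolled_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        rp[i]     = add_limb(up[i],     vp[i],     &cy);
        rp[i + 1] = add_limb(up[i + 1], vp[i + 1], &cy);
        rp[i + 2] = add_limb(up[i + 2], vp[i + 2], &cy);
        rp[i + 3] = add_limb(up[i + 3], vp[i + 3], &cy);
    }
    for (; i < n; i++)               /* leftover 0..3 limbs */
        rp[i] = add_limb(up[i], vp[i], &cy);
    return cy;
}

/* Deterministic xorshift64; about a quarter of the limbs are forced to
 * all ones so that long carry chains actually occur. */
static uint64_t rng_state = 0x9E3779B97F4A7C15u;
static limb_t rand_limb(void)
{
    uint64_t x = rng_state;
    x ^= x << 13; x ^= x >> 7; x ^= x << 17;
    rng_state = x;
    return (x & 3u) == 0 ? ~(limb_t)0 : x;
}

/* Correctness: number of mismatches (limbs or carry) over `trials`
 * random operand pairs for every size 1..MAX_N. */
int check_add_n(int trials)
{
    limb_t u[MAX_N], v[MAX_N], r1[MAX_N], r2[MAX_N];
    int bad = 0;
    for (size_t n = 1; n <= MAX_N; n++)
        for (int t = 0; t < trials; t++) {
            for (size_t i = 0; i < n; i++) {
                u[i] = rand_limb();
                v[i] = rand_limb();
            }
            limb_t c1 = ref_add_n(r1, u, v, n);
            limb_t c2 = unrolled_add_n(r2, u, v, n);
            if (c1 != c2 || memcmp(r1, r2, n * sizeof(limb_t)) != 0)
                bad++;
        }
    return bad;
}

/* Speed: rough CPU time in seconds for `reps` calls on n-limb operands. */
double time_add_n(add_fn fn, size_t n, long reps)
{
    limb_t u[MAX_N], v[MAX_N], r[MAX_N];
    volatile limb_t sink = 0;        /* helps keep the calls from being elided */
    if (n > MAX_N)
        n = MAX_N;
    for (size_t i = 0; i < n; i++) {
        u[i] = rand_limb();
        v[i] = rand_limb();
    }
    clock_t start = clock();
    for (long k = 0; k < reps; k++)
        sink += fn(r, u, v, n);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

For example, check_add_n(100) should return 0, and comparing time_add_n(ref_add_n, 64, 1000000) with time_add_n(unrolled_add_n, 64, 1000000) gives a rough side-by-side speed figure.  In the patch's mpn_test, the comparison is instead against the original GMP 4.2.1 assembly routines.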

6.  On Mac OS X I haven't yet found a clean way to build dynamic
libraries.  The biggest obstacle is that the Apple "libtool" and the
GNU "libtool" have incompatible syntax.  My guess is that in the near
future GNU libtool will support Apple's libtool for creating
dynamic shared libraries.  In the meantime, I'll be content with
static libraries.  If you find a simple solution, please let me know.

Jason Worth Martin
Asst. Prof. of Mathematics
James Madison Univ.
martinjw@jmu.edu