Alexander Kruppa
bd53d5749e
add_n and sub_n with 8-way unrolling
1.075c/l on Haswell
2016-12-12 17:37:12 +01:00
Alexander Kruppa
cfc589609e
Move to haswell/
This sumdiff_n is much slower on Haswell (2.6c/l) than on Skylake (2c/l),
but it still provides a ~3% speedup for a 1M-limb FFT compared to having
no sumdiff_n at all.
2016-12-08 16:23:48 +01:00
Alexander Kruppa
e3d7be3b31
sublsh1_n by Nurmann, adapted to MPIR
addlsh1_n.as and sublsh1_n.as mostly unified now
2016-12-08 06:07:02 +01:00
Alexander Kruppa
85d53dbc6e
addlsh1_n by Nurmann, adapted to MPIR
2016-12-08 03:39:25 +01:00
Alexander Kruppa
2f172e1dce
mul_1 from GMP 6.1.1
2016-12-07 19:22:28 +01:00
Alexander Kruppa
4c7cdee83c
sqr_basecase from GMP 6.1.1
2016-12-07 19:02:22 +01:00
Alexander Kruppa
6bb39eab79
com_n, adapted from Nurmann's copyi code
2016-12-06 18:08:13 +01:00
Alexander Kruppa
1871f04956
addmul_1 and submul_1, converted from GMP
2016-12-05 22:55:21 +01:00
Alexander Kruppa
17687a2992
Haswell mul_basecase from GMP 6.1.1, converted to Intel syntax
2016-12-01 12:39:26 +01:00
Alexander Kruppa
e508181a75
Version of mpn/x86_64/sandybridge/sub_n.as, super-optimized for Haswell
New speed about 1.20c/l on Haswell, was 1.33c/l
2016-11-28 19:43:46 +01:00
Alexander Kruppa
5d75ebc8bf
Reduce number of registers used and use %defines for register names
2016-11-27 00:51:45 +01:00
Alexander Kruppa
d11c3ca728
Bugfix: operand name macros were wrong
2016-11-25 18:11:38 +01:00
Alexander Kruppa
ea49db539e
Revert "Temporarily removed due to bug"
This reverts commit 38e8585c05.
2016-11-25 18:11:21 +01:00
Alexander Kruppa
38e8585c05
Temporarily removed due to bug
2016-11-25 15:27:21 +01:00
Alexander Kruppa
8100363a85
Version of mpn/x86_64/sandybridge/add_n.as, super-optimized for Haswell
New speed about 1.21c/l on Haswell, was 1.33c/l
2016-11-25 15:25:09 +01:00
Alexander Kruppa
6316e39430
Increasing copy with AVX2 for Haswell
2016-11-25 11:51:54 +01:00
Alexander Kruppa
29577b5109
Decreasing copy with AVX2 for Haswell
2016-11-24 02:01:38 +01:00
Alexander Kruppa
4660be16f6
AVX-based rshift for 4-issue Intel cpus (Haswell and newer)
2016-11-22 23:18:52 +01:00
Alexander Kruppa
105c26c466
AVX-based lshift for 4-issue Intel cpus (Haswell and newer)
2016-11-22 21:58:43 +01:00
Alexander Kruppa
99a1f8d05b
Add vzeroupper to avoid stall on Haswell if SSE2 code follows
2016-11-22 15:03:02 +01:00
Alexander Kruppa
aa75752824
AVX-based lshift1 and rshift1 for 4-issue Intel cpus (Haswell and newer)
2016-11-18 21:54:07 +01:00
William Hart
8435273a1a
Remove sb_div* small implementation (due to bug and due to being a very minor performance improvement).
2015-11-13 14:47:44 +00:00
William Hart
45e7dbc9b4
Added piledriver, ivybridge, haswell to configure and fat build.
2014-03-25 17:32:34 +00:00