Brian Gladman
0b24934325
Merge branch 'master' of https://github.com/akruppa/mpir
...
# Conflicts:
# mpn/x86_64/haswell/add_n.as
# mpn/x86_64/haswell/sub_n.as
# mpn/x86_64/skylake/add_n.as
# mpn/x86_64/skylake/sub_n.as
2017-01-17 09:05:55 +00:00
Alexander Kruppa
f82a093c18
mpn_sub_err1_n for AVX Skylake
...
2.25c/l
2017-01-17 06:47:38 +01:00
Alexander Kruppa
ff493ffc6a
mpn_add_err1_n for AVX Skylake
...
2.25c/l
2017-01-17 06:22:24 +01:00
Alexander Kruppa
e52db5b826
mul_1 by Nurmann
...
1.25c/l for large enough operands within L2
2017-01-16 23:23:00 +01:00
Alexander Kruppa
0c236c583a
add_n and sub_n by Nurmann, now reliably at 1c/l
2017-01-12 16:36:06 +01:00
Alexander Kruppa
366a80ddbc
Faster on Haswell, too
2017-01-11 20:26:09 +01:00
Alexander Kruppa
cf4c153842
Merge branch 'master' of github.com:akruppa/mpir
2017-01-11 20:19:45 +01:00
Alexander Kruppa
189047563e
karaadd that avoids rcl instructions
2017-01-10 01:27:36 +01:00
Alexander Kruppa
29e130da17
karasub that avoids rcl instructions and 3-component addresses
2017-01-10 01:27:22 +01:00
Brian Gladman
8edb5826fb
Merge branch 'master' of https://github.com/akruppa/mpir
2017-01-02 16:16:43 +00:00
Alex Kruppa
5fea3fd389
Use whichever existing function of MPIR or GMP is fastest for Bulldozer
2017-01-02 06:57:03 -08:00
Alex Kruppa
5c647dafc1
Add dummy IFDOS macro
2017-01-02 06:56:43 -08:00
Alex Kruppa
0595d7cc7c
Copied from core2/
...
The files differed only in one whitespace character
2017-01-02 06:32:54 -08:00
Alex Kruppa
8247a638e0
Copy SSE2 com_n from nehalem/
2016-12-31 12:26:35 -08:00
Brian Gladman
45322b6277
add GPL headers to assembler code
2016-12-22 22:23:25 +00:00
Brian Gladman
959308dd5f
add new assembler code to Windows
2016-12-22 17:50:17 +00:00
Brian Gladman
ac5ed04440
add latest assembler code to Windows
2016-12-22 17:31:36 +00:00
Brian Gladman
5167ce8705
Merge branch 'master' of https://github.com/akruppa/mpir
2016-12-21 13:51:52 +00:00
Alexander Kruppa
82b062537b
Merge branch 'master' of github.com:akruppa/mpir
2016-12-21 13:15:51 +01:00
Alexander Kruppa
a781118371
Use local label names
2016-12-21 01:25:24 +01:00
Alexander Kruppa
54816efdfd
Move AVX2-dependent files to avx/ subdirectories
2016-12-20 23:57:20 +01:00
Alexander Kruppa
4f46342830
addmul_1 for Skylake from GMP 6.1.1
2016-12-20 23:45:03 +01:00
Alexander Kruppa
c608c88dca
Improve dummy macros
2016-12-20 23:44:42 +01:00
Alexander Kruppa
72fe382864
sqr_basecase for Skylake from GMP 6.1.1
2016-12-20 06:32:39 +01:00
Alexander Kruppa
3e249beaf2
Add dummy defines for macros used by GMP
2016-12-20 06:32:14 +01:00
Alexander Kruppa
5ae24aef66
mul_basecase for Skylake from GMP 6.1.1
2016-12-20 05:15:54 +01:00
Alexander Kruppa
f28068172d
add_n and sub_n
...
1c/l most of the time, but sometimes gets into a bad "mode" where performance
degrades to as much as 1.2c/l
2016-12-20 05:14:15 +01:00
Brian Gladman
3ef16e3f7c
correct typos in two assembler file names
2016-12-18 16:56:42 +00:00
Brian Gladman
207ba39dc8
minor assembler code changes
2016-12-17 14:37:31 +00:00
Brian Gladman
55752e8061
add the revised add_n/sub_n assembler code to the Windows build
2016-12-13 14:10:48 +00:00
Brian Gladman
df53b304fb
Merge branch 'master' of https://github.com/akruppa/mpir
2016-12-13 13:21:32 +00:00
Alexander Kruppa
4ed54114e5
Add add_nc, sub_nc
2016-12-12 18:29:19 +01:00
Alexander Kruppa
bd53d5749e
add_n and sub_n with 8-way unrolling
...
1.075c/l on Haswell
2016-12-12 17:37:12 +01:00
Brian Gladman
89c11fbdfb
Add the latest haswell and skylake code to the Windows x64 build
2016-12-10 14:15:40 +00:00
Brian Gladman
e2d20ad009
Merge branch 'master' of https://github.com/akruppa/mpir
2016-12-09 17:05:26 +00:00
Alexander Kruppa
cfc589609e
Move to haswell/
...
This sumdiff_n is much slower on Haswell (2.6c/l) than on Skylake (2c/l),
but it still provides a ~3% speedup for a 1M limb FFT compared to having
no sumdiff_n at all.
2016-12-08 16:23:48 +01:00
Alexander Kruppa
e3d7be3b31
sublsh1_n by Nurmann, adapted to MPIR
...
addlsh1_n.as and sublsh1_n.as mostly unified now
2016-12-08 06:07:02 +01:00
Alexander Kruppa
85d53dbc6e
addlsh1_n by Nurmann, adapted to MPIR
2016-12-08 03:39:25 +01:00
Alexander Kruppa
2f172e1dce
mul_1 from GMP 6.1.1
2016-12-07 19:22:28 +01:00
Alexander Kruppa
4c7cdee83c
sqr_basecase from GMP 6.1.1
2016-12-07 19:02:22 +01:00
Alexander Kruppa
ff7c73e955
Use local label names (.L)
2016-12-07 18:09:01 +01:00
Alexander Kruppa
95f95b17c6
Use local label names (.L)
...
Otherwise, profiling shows separate event counts for each jump label rather
than for the respective complete function
2016-12-07 17:46:16 +01:00
Alexander Kruppa
6bb39eab79
com_n, adapted from Nurmann's copyi code
2016-12-06 18:08:13 +01:00
Brian Gladman
e89e09a43d
add latest skylake code to Windows x64
2016-12-06 12:44:17 +00:00
Brian Gladman
4c7fa87118
add assembler code for haswell, skylake and skylake_avx to the Win64 build
2016-12-06 12:01:20 +00:00
Alexander Kruppa
1871f04956
addmul_1 and submul_1, converted from GMP
2016-12-05 22:55:21 +01:00
Brian Gladman
a5193faa89
Merge branch 'master' of https://github.com/akruppa/mpir
2016-12-05 16:04:01 +00:00
Alexander Kruppa
4459641bad
sumdiff_n optimized for Skylake
...
2c/l
2016-12-05 16:40:57 +01:00
Brian Gladman
36983f9049
add Haswell mpn_mul_basecase and mpn_sub_n/nc for Win64; tidy up YASM macros
2016-12-01 16:51:05 +00:00
Alexander Kruppa
17687a2992
Haswell mul_basecase from GMP 6.1.1, converted to Intel syntax
2016-12-01 12:39:26 +01:00