cheng/mpir

Author	SHA1	Message	Date
Brian Gladman	0b24934325	Merge branch 'master' of https://github.com/akruppa/mpir. Conflicts: mpn/x86_64/haswell/add_n.as, mpn/x86_64/haswell/sub_n.as, mpn/x86_64/skylake/add_n.as, mpn/x86_64/skylake/sub_n.as	2017-01-17 09:05:55 +00:00
Alexander Kruppa	f82a093c18	mpn_sub_err1_n for AVX Skylake 2.25c/l	2017-01-17 06:47:38 +01:00
Alexander Kruppa	ff493ffc6a	mpn_add_err1_n for AVX Skylake 2.25c/l	2017-01-17 06:22:24 +01:00
Alexander Kruppa	e52db5b826	mul_1 by Nurmann 1.25c/l for large enough operands within L2	2017-01-16 23:23:00 +01:00
Alexander Kruppa	0c236c583a	add_n and sub_n by Nurmann, now reliably at 1c/l	2017-01-12 16:36:06 +01:00
Alexander Kruppa	366a80ddbc	Faster on Haswell, too	2017-01-11 20:26:09 +01:00
Alexander Kruppa	cf4c153842	Merge branch 'master' of github.com:akruppa/mpir	2017-01-11 20:19:45 +01:00
Alexander Kruppa	189047563e	karaadd that avoids rcl instructions	2017-01-10 01:27:36 +01:00
Alexander Kruppa	29e130da17	karasub that avoids rcl instructions and 3-component addresses	2017-01-10 01:27:22 +01:00
Brian Gladman	8edb5826fb	Merge branch 'master' of https://github.com/akruppa/mpir	2017-01-02 16:16:43 +00:00
Alex Kruppa	5fea3fd389	Use whichever existing function of MPIR or GMP is fastest for Bulldozer	2017-01-02 06:57:03 -08:00
Alex Kruppa	5c647dafc1	Add dummy IFDOS macro	2017-01-02 06:56:43 -08:00
Alex Kruppa	0595d7cc7c	Copied from core2/. The files differed only in one whitespace character	2017-01-02 06:32:54 -08:00
Alex Kruppa	8247a638e0	Copy SSE2 com_n from nehalem/	2016-12-31 12:26:35 -08:00
Brian Gladman	45322b6277	add GPL headers to assembler code	2016-12-22 22:23:25 +00:00
Brian Gladman	959308dd5f	add new assembler code to Windows	2016-12-22 17:50:17 +00:00
Brian Gladman	ac5ed04440	add latest assembler code to Windows	2016-12-22 17:31:36 +00:00
Brian Gladman	5167ce8705	Merge branch 'master' of https://github.com/akruppa/mpir	2016-12-21 13:51:52 +00:00
Alexander Kruppa	82b062537b	Merge branch 'master' of github.com:akruppa/mpir	2016-12-21 13:15:51 +01:00
Alexander Kruppa	a781118371	Use local label names	2016-12-21 01:25:24 +01:00
Alexander Kruppa	54816efdfd	Move AVX2-dependent files to avx/ subdirectories	2016-12-20 23:57:20 +01:00
Alexander Kruppa	4f46342830	addmul_1 for Skylake from GMP 6.1.1	2016-12-20 23:45:03 +01:00
Alexander Kruppa	c608c88dca	Improve dummy macros	2016-12-20 23:44:42 +01:00
Alexander Kruppa	72fe382864	sqr_basecase for Skylake from GMP 6.1.1	2016-12-20 06:32:39 +01:00
Alexander Kruppa	3e249beaf2	Add dummy defines for macros used by GMP	2016-12-20 06:32:14 +01:00
Alexander Kruppa	5ae24aef66	mul_basecase for Skylake from GMP 6.1.1	2016-12-20 05:15:54 +01:00
Alexander Kruppa	f28068172d	add_n and sub_n: 1c/l most of the time, but sometimes get into a bad "mode" where performance degrades to as much as 1.2c/l	2016-12-20 05:14:15 +01:00
Brian Gladman	3ef16e3f7c	correct typos in two assembler file names	2016-12-18 16:56:42 +00:00
Brian Gladman	207ba39dc8	minor assembler code changes	2016-12-17 14:37:31 +00:00
Brian Gladman	55752e8061	add the revised add_n/sub_n assembler code to the Windows build	2016-12-13 14:10:48 +00:00
Brian Gladman	df53b304fb	Merge branch 'master' of https://github.com/akruppa/mpir	2016-12-13 13:21:32 +00:00
Alexander Kruppa	4ed54114e5	Add add_nc, sub_nc	2016-12-12 18:29:19 +01:00
Alexander Kruppa	bd53d5749e	add_n and sub_n with 8-way unrolling 1.075c/l on Haswell	2016-12-12 17:37:12 +01:00
Brian Gladman	89c11fbdfb	Add the latest haswell and skylake code to the Windows x64 build	2016-12-10 14:15:40 +00:00
Brian Gladman	e2d20ad009	Merge branch 'master' of https://github.com/akruppa/mpir	2016-12-09 17:05:26 +00:00
Alexander Kruppa	cfc589609e	Move to haswell/. This sumdiff_n is much slower on Haswell (2.6c/l) than on Skylake (2c/l), but it still provides a ~3% speedup for a 1M limb FFT compared to having no sumdiff_n at all.	2016-12-08 16:23:48 +01:00
Alexander Kruppa	e3d7be3b31	sublsh1_n by Nurmann, adapted to MPIR. addlsh1_n.as and sublsh1_n.as mostly unified now	2016-12-08 06:07:02 +01:00
Alexander Kruppa	85d53dbc6e	addlsh1_n by Nurmann, adapted to MPIR	2016-12-08 03:39:25 +01:00
Alexander Kruppa	2f172e1dce	mul_1 from GMP 6.1.1	2016-12-07 19:22:28 +01:00
Alexander Kruppa	4c7cdee83c	sqr_basecase from GMP 6.1.1	2016-12-07 19:02:22 +01:00
Alexander Kruppa	ff7c73e955	Use local label names (.L)	2016-12-07 18:09:01 +01:00
Alexander Kruppa	95f95b17c6	Use local label names (.L). Otherwise, profiling shows separate event counts for each jump label rather than for the respective complete function	2016-12-07 17:46:16 +01:00
Alexander Kruppa	6bb39eab79	com_n, adapted from Nurmann's copyi code	2016-12-06 18:08:13 +01:00
Brian Gladman	e89e09a43d	add latest skylake code to Windows x64	2016-12-06 12:44:17 +00:00
Brian Gladman	4c7fa87118	add assembler code for haswell, skylake and skylake_avx to the Win64 build	2016-12-06 12:01:20 +00:00
Alexander Kruppa	1871f04956	addmul_1 and submul_1, converted from GMP	2016-12-05 22:55:21 +01:00
Brian Gladman	a5193faa89	Merge branch 'master' of https://github.com/akruppa/mpir	2016-12-05 16:04:01 +00:00
Alexander Kruppa	4459641bad	sumdiff_n optimized for Skylake 2c/l	2016-12-05 16:40:57 +01:00
Brian Gladman	36983f9049	add Haswell mpn_mul_basecase and mpn_sub_n/nc for Win64; tidy up YASM macros	2016-12-01 16:51:05 +00:00
Alexander Kruppa	17687a2992	Haswell mul_basecase from GMP 6.1.1, converted to Intel syntax	2016-12-01 12:39:26 +01:00

1290 Commits