diff --git a/README b/README index 84920c99..74b5b226 100644 --- a/README +++ b/README @@ -1,107 +1,107 @@ -Copyright 1991, 1996, 1999, 2000 Free Software Foundation, Inc. - -This file is part of the GNU MP Library. - -The GNU MP Library is free software; you can redistribute it and/or modify -it under the terms of the GNU Lesser General Public License as published by -the Free Software Foundation; either version 2.1 of the License, or (at your -option) any later version. - -The GNU MP Library is distributed in the hope that it will be useful, but -WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY -or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public -License for more details. - -You should have received a copy of the GNU Lesser General Public License -along with the GNU MP Library; see the file COPYING.LIB. If not, write to -the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA -02110-1301, USA. - - - - - - - THE MPIR LIBRARY - - -MPIR is a library for arbitrary precision arithmetic, operating on signed -integers, rational numbers, and floating point numbers. It has a rich set of -functions, and the functions have a regular interface. - -MPIR is designed to be as fast as possible, both for small operands and huge -operands. The speed is achieved by using fullwords as the basic arithmetic -type, by using fast algorithms, with carefully optimized assembly code for the -most common inner loops for lots of CPUs, and by a general emphasis on speed -(instead of simplicity or elegance). - -MPIR is believed to be faster than any other similar library. Its advantage -increases with operand sizes for certain operations, since MPIR in many -cases has asymptotically faster algorithms. - -MPIR is free software and may be freely copied on the terms contained in the -files COPYING.LIB and COPYING (most of MPIR is under the former, some under -the latter). - - - - OVERVIEW OF MPIR - -There are five classes of functions in MPIR. - - 1. Signed integer arithmetic functions (mpz). These functions are intended - to be easy to use, with their regular interface. The associated type is - `mpz_t'. - - 2. Rational arithmetic functions (mpq). For now, just a small set of - functions necessary for basic rational arithmetics. The associated type - is `mpq_t'. - - 3. Floating-point arithmetic functions (mpf). If the C type `double' - doesn't give enough precision for your application, declare your - variables as `mpf_t' instead, set the precision to any number desired, - and call the functions in the mpf class for the arithmetic operations. - - 4. Positive-integer, hard-to-use, very low overhead functions are in the - mpn class. No memory management is performed. The caller must ensure - enough space is available for the results. The set of functions is not - regular, nor is the calling interface. These functions accept input - arguments in the form of pairs consisting of a pointer to the least - significant word, and an integral size telling how many limbs (= words) - the pointer points to. - - Almost all calculations, in the entire package, are made by calling these - low-level functions. - - 5. Berkeley MP compatible functions. - - To use these functions, include the file "mp.h". You can test if you are - using the GNU version by testing if the symbol __GNU_MP__ is defined. - -For more information on how to use MPIR, please refer to the documentation. -It is composed from the file gmp.texi, and can be displayed on the screen or -printed. How to do that, as well how to build the library, is described in -the INSTALL file in this directory. - - - - REPORTING BUGS - -If you find a bug in the library, please make sure to tell us about it! - -You should first check the MPIR web pages at http://www.swox.com/gmp/, -under "Status of the current release". There will be patches for all known -serious bugs there. - -Report bugs to bug-gmp@gnu.org. What information is needed in a good bug -report is described in the manual. The same address can be used for -suggesting modifications and enhancements. - - - - ----------------- -Local variables: -mode: text -fill-column: 78 -End: +Copyright 1991, 1996, 1999, 2000 Free Software Foundation, Inc. + +This file is part of the GNU MP Library. + +The GNU MP Library is free software; you can redistribute it and/or modify +it under the terms of the GNU Lesser General Public License as published by +the Free Software Foundation; either version 2.1 of the License, or (at your +option) any later version. + +The GNU MP Library is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public +License for more details. + +You should have received a copy of the GNU Lesser General Public License +along with the GNU MP Library; see the file COPYING.LIB. If not, write to +the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +02110-1301, USA. + + + + + + + THE MPIR LIBRARY + + +MPIR is a library for arbitrary precision arithmetic, operating on signed +integers, rational numbers, and floating point numbers. It has a rich set of +functions, and the functions have a regular interface. + +MPIR is designed to be as fast as possible, both for small operands and huge +operands. The speed is achieved by using fullwords as the basic arithmetic +type, by using fast algorithms, with carefully optimized assembly code for the +most common inner loops for lots of CPUs, and by a general emphasis on speed +(instead of simplicity or elegance). + +MPIR is believed to be faster than any other similar library. Its advantage +increases with operand sizes for certain operations, since MPIR in many +cases has asymptotically faster algorithms. + +MPIR is free software and may be freely copied on the terms contained in the +files COPYING.LIB and COPYING (most of MPIR is under the former, some under +the latter). + + + + OVERVIEW OF MPIR + +There are five classes of functions in MPIR. + + 1. Signed integer arithmetic functions (mpz). These functions are intended + to be easy to use, with their regular interface. The associated type is + `mpz_t'. + + 2. Rational arithmetic functions (mpq). For now, just a small set of + functions necessary for basic rational arithmetics. The associated type + is `mpq_t'. + + 3. Floating-point arithmetic functions (mpf). If the C type `double' + doesn't give enough precision for your application, declare your + variables as `mpf_t' instead, set the precision to any number desired, + and call the functions in the mpf class for the arithmetic operations. + + 4. Positive-integer, hard-to-use, very low overhead functions are in the + mpn class. No memory management is performed. The caller must ensure + enough space is available for the results. The set of functions is not + regular, nor is the calling interface. These functions accept input + arguments in the form of pairs consisting of a pointer to the least + significant word, and an integral size telling how many limbs (= words) + the pointer points to. + + Almost all calculations, in the entire package, are made by calling these + low-level functions. + + 5. Berkeley MP compatible functions. + + To use these functions, include the file "mp.h". You can test if you are + using the GNU version by testing if the symbol __GNU_MP__ is defined. + +For more information on how to use MPIR, please refer to the documentation. +It is composed from the file gmp.texi, and can be displayed on the screen or +printed. How to do that, as well how to build the library, is described in +the INSTALL file in this directory. + + + + REPORTING BUGS + +If you find a bug in the library, please make sure to tell us about it! + +You should first check the MPIR web pages at http://www.swox.com/gmp/, +under "Status of the current release". There will be patches for all known +serious bugs there. + +Report bugs to bug-gmp@gnu.org. What information is needed in a good bug +report is described in the manual. The same address can be used for +suggesting modifications and enhancements. + + + + +---------------- +Local variables: +mode: text +fill-column: 78 +End: diff --git a/doc/projects.html b/doc/projects.html index 0e5bd2b6..ca28da18 100644 --- a/doc/projects.html +++ b/doc/projects.html @@ -1,645 +1,645 @@ - - -
-This file lists projects suitable for volunteers. Please see the - tasks file for smaller tasks. - -
If you want to work on any of the projects below, please let us know at - http://groups.google.co.uk/group/mpir-devel/. If you want to help with a project - that already somebody else is working on, we will help you will get in touch. - (There are no email addresses of - volunteers below, due to spamming problems.) - -
The current multiplication code uses Karatsuba, 3-way Toom, and Fermat - FFT. Several new developments are desirable: - -
mpf_mul
and by Newton approximations, a low half partial
- product might be of use in a future sub-quadratic REDC. On small
- sizes a partial product will be faster simply through fewer
- cross-products, similar to the way squaring is faster. But work by
- Thom Mulders shows that for Karatsuba and higher order algorithms the
- advantage is progressively lost, so for large sizes partial products
- turn out to be no faster.
-
- Another possibility would be an optimized cube. In the basecase that - should definitely be able to save cross-products in a similar fashion to - squaring, but some investigation might be needed for how best to adapt - the higher-order algorithms. Not sure whether cubing or further small - powers have any particularly important uses though. - - -
Write new and improve existing assembly routines. The tests/devel - programs and the tune/speed.c and tune/many.pl programs are useful for - testing and timing the routines you write. See the README files in those - directories for more information. - -
Please make sure your new routines are fast for these three situations: -
The most important routines are mpn_addmul_1, mpn_mul_basecase and - mpn_sqr_basecase. The latter two don't exist for all machines, while - mpn_addmul_1 exists for almost all machines. - -
Standard techniques for these routines are unrolling, software - pipelining, and specialization for common operand values. For machines - with poor integer multiplication, it is often possible to improve the - performance using floating-point operations, or SIMD operations such as - MMX or Sun's VIS. - -
Using floating-point operations is interesting but somewhat tricky. - Since IEEE double has 53 bit of mantissa, one has to split the operands - in small prices, so that no result is greater than 2^53. For 32-bit - computers, splitting one operand into 16-bit pieces works. For 64-bit - machines, one operand can be split into 21-bit pieces and the other into - 32-bit pieces. (A 64-bit operand can be split into just three 21-bit - pieces if one allows the split operands to be negative!) - - -
Work on Schönhage GCD and GCDEXT for large numbers is in progress. - Contact Niels Möller if you want to help. - - -
Implement the functions of math.h for the GMP mpf layer! Check the book - "Pi and the AGM" by Borwein and Borwein for ideas how to do this. These - functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2, - cos, cosh, exp, log, log10, pow, sin, sinh, tan, tanh. - - -
The current code uses divisions, which are reasonably fast, but it'd be - possible to use only multiplications by computing 1/sqrt(A) using this - formula: -
- 2 - x = x (3 − A x )/2 - i+1 i i- The square root can then be computed like this: -
- sqrt(A) = A x - n-
That final multiply might be the full size of the input (though it might - only need the high half of that), so there may or may not be any speedup - overall. - - -
Improve mpn_rootrem. The current code is really naive, using full - precision from the first iteration. Also, calling mpn_pow_1 isn't very - clever, as only 1/n of the result bits will be used; truncation after - each multiplication would be better. Avoiding division might also be - possible. - - Work mostly done, see - http://gmplib.org/devel/ - for some new code for divexact. - -
mpn_bdivmod
and the redc
in
- mpz_powm
should use some sort of divide and conquer
- algorithm. This would benefit mpz_divexact
, and
- mpn_gcd
on large unequal size operands. See "Exact Division
- with Karatsuba Complexity" by Jebelean for a (brief) description.
-
-
Failing that, some sort of DIVEXACT_THRESHOLD
could be added
- to control whether mpz_divexact
uses
- mpn_bdivmod
or mpn_tdiv_qr
, since the latter is
- faster on large divisors.
-
-
For the REDC, basically all that's needed is Montgomery's algorithm done - in multi-limb integers. R is multiple limbs, and the inverse and - operands are multi-precision. - -
For exact division the time to calculate a multi-limb inverse is not
- amortized across many modular operations, but instead will probably
- create a threshold below which the current style mpn_bdivmod
- is best. There's also Krandick and Jebelean, "Bidirectional Exact
- Integer Division" to basically use a low to high exact division for the
- low half quotient, and a quotient-only division for the high half.
-
-
It will be noted that low-half and high-half multiplies, and a low-half - square, can be used. These ought to each take as little as half the time - of a full multiply, or square, though work by Thom Mulders shows the - advantage is progressively lost as Karatsuba and higher algorithms are - applied. - - -
Some sort of scheme for exceptions handling would be desirable.
- Presently the only thing documented is that divide by zero in GMP
- functions provokes a deliberate machine divide by zero (on those systems
- where such a thing exists at least). The global gmp_errno
- is not actually documented, except for the old gmp_randinit
- function. Being currently just a plain global means it's not
- thread-safe.
-
-
The basic choices for exceptions are returning an error code or having a
- handler function to be called. The disadvantage of error returns is they
- have to be checked, leading to tedious and rarely executed code, and
- strictly speaking such a scheme wouldn't be source or binary compatible.
- The disadvantage of a handler function is that a longjmp
or
- similar recovery from it may be difficult. A combination would be
- possible, for instance by allowing the handler to return an error code.
-
-
Divide-by-zero, sqrt-of-negative, and similar operand range errors can - normally be detected at the start of functions, so exception handling - would have a clean state. What's worth considering though is that the - GMP function detecting the exception may have been called via some third - party library or self contained application module, and hence have - various bits of state to be cleaned up above it. It'd be highly - desirable for an exceptions scheme to allow for such cleanups. - -
The C++ destructor mechanism could help with cleanups both internally and - externally, but being a plain C library we don't want to depend on that. - -
A C++ throw
might be a good optional extra exceptions
- mechanism, perhaps under a build option. For GCC
- -fexceptions
will add the necessary frame information to
- plain C code, or GMP could be compiled as C++.
-
-
Out-of-memory exceptions are expected to be handled by the
- mp_set_memory_functions
routines, rather than being a
- prospective part of divide-by-zero etc. Some similar considerations
- apply but what differs is that out-of-memory can arise deep within GMP
- internals. Even fundamental routines like mpn_add_n
and
- mpn_addmul_1
can use temporary memory (for instance on Cray
- vector systems). Allowing for an error code return would require an
- awful lot of checking internally. Perhaps it'd still be worthwhile, but
- it'd be a lot of changes and the extra code would probably be rather
- rarely executed in normal usages.
-
-
A longjmp
recovery for out-of-memory will currently, in
- general, lead to memory leaks and may leave GMP variables operated on in
- inconsistent states. Maybe it'd be possible to record recovery
- information for use by the relevant allocate or reallocate function, but
- that too would be a lot of changes.
-
-
One scheme for out-of-memory would be to note that all GMP allocations go
- through the mp_set_memory_functions
routines. So if the
- application has an intended setjmp
recovery point it can
- record memory activity by GMP and abandon space allocated and variables
- initialized after that point. This might be as simple as directing the
- allocation functions to a separate pool, but in general would have the
- disadvantage of needing application-level bookkeeping on top of the
- normal system malloc
. An advantage however is that it needs
- nothing from GMP itself and on that basis doesn't burden applications not
- needing recovery. Note that there's probably some details to be worked
- out here about reallocs of existing variables, and perhaps about copying
- or swapping between "permanent" and "temporary" variables.
-
-
Applications desiring a fine-grained error control, for instance a
- language interpreter, would very possibly not be well served by a scheme
- requiring longjmp
. Wrapping every GMP function call with a
- setjmp
would be very inconvenient.
-
-
Another option would be to let mpz_t
etc hold a sort of NaN,
- a special value indicating an out-of-memory or other failure. This would
- be similar to NaNs in mpfr. Unfortunately such a scheme could only be
- used by programs prepared to handle such special values, since for
- instance a program waiting for some condition to be satisfied could
- become an infinite loop if it wasn't also watching for NaNs. The work to
- implement this would be significant too, lots of checking of inputs and
- intermediate results. And if mpn
routines were to
- participate in this (which they would have to internally) a lot of new
- return values would need to be added, since of course there's no
- mpz_t
etc structure for them to indicate failure in.
-
-
Stack overflow is another possible exception, but perhaps not one that
- can be easily detected in general. On i386 GNU/Linux for instance GCC
- normally doesn't generate stack probes for an alloca
, but
- merely adjusts %esp
. A big enough alloca
can
- miss the stack redzone and hit arbitrary data. GMP stack usage is
- normally a function of operand size, which might be enough for some
- applications to know they'll be safe. Otherwise a fixed maximum usage
- can probably be obtained by building with
- --enable-alloca=malloc-reentrant
(or
- notreentrant
). Arranging the default to be
- alloca
only on blocks up to a certain size and
- malloc
thereafter might be a better approach and would have
- the advantage of not having calculations limited by available stack.
-
-
Actually recovering from stack overflow is of course another problem. It
- might be possible to catch a SIGSEGV
in the stack redzone
- and do something in a sigaltstack
, on systems which have
- that, but recovery might otherwise not be possible. This is worth
- bearing in mind because there's no point worrying about tight and careful
- out-of-memory recovery if an out-of-stack is fatal.
-
-
Operand overflow is another exception to be addressed. It's easy for
- instance to ask mpz_pow_ui
for a result bigger than an
- mpz_t
can possibly represent. Currently overflows in limb
- or byte count calculations will go undetected. Often they'll still end
- up asking the memory functions for blocks bigger than available memory,
- but that's by no means certain and results are unpredictable in general.
- It'd be desirable to tighten up such size calculations. Probably only
- selected routines would need checks, if it's assumed say that no input
- will be more than half of all memory and hence size additions like say
- mpz_mul
won't overflow.
-
-
-
It'd be nice to have some sort of tool for getting an overview of - performance. Clearly a great many things could be done, but some primary - uses would be, - -
A combination of measuring some fundamental routines and some - representative application routines might satisfy these. - -
The tune/time.c routines would be the easiest way to get good accurate
- measurements on lots of different systems. The high level
- speed_measure
may or may not suit, but the basic
- speed_starttime
and speed_endtime
would cover
- lots of portability and accuracy questions.
-
-
-
restrict
-
- There might be some value in judicious use of C99 style
- restrict
on various pointers, but this would need some
- careful thought about what it implies for the various operand overlaps
- permitted in GMP.
-
-
Rumour has it some pre-C99 compilers had restrict
, but
- expressing tighter (or perhaps looser) requirements. Might be worth
- investigating that before using restrict
unconditionally.
-
-
Loops are presumably where the greatest benefit would be had, by allowing
- the compiler to advance reads ahead of writes, perhaps as part of loop
- unrolling. However critical loops are generally coded in assembler, so
- there might not be very much to gain. And on Cray systems the explicit
- use of _Pragma
gives an equivalent effect.
-
-
One thing to note is that Microsoft C headers (on ia64 at least) contain
- __declspec(restrict)
, so a #define
of
- restrict
should be avoided. It might be wisest to setup a
- gmp_restrict
.
-
-
-
The limb-by-limb dependencies in the existing Nx1 division (and - remainder) code means that chips with multiple execution units or - pipelined multipliers are not fully utilized. - -
One possibility is to follow the current preinv method but taking two
- limbs at a time. That means a 2x2->4 and a 2x1->2 multiply for
- each two limbs processed, and because the 2x2 and 2x1 can each be done in
- parallel the latency will be not much more than 2 multiplies for two
- limbs, whereas the single limb method has a 2 multiply latency for just
- one limb. A version of mpn_divrem_1
doing this has been
- written in C, but not yet tested on likely chips. Clearly this scheme
- would extend to 3x3->9 and 3x1->3 etc, though with diminishing
- returns.
-
-
For mpn_mod_1
, Peter L. Montgomery proposes the following
- scheme. For a limb R=2^bits_per_mp_limb
, pre-calculate
- values R mod N, R^2 mod N, R^3 mod N, R^4 mod N. Then take dividend
- limbs and multiply them by those values, thereby reducing them (moving
- them down) by the corresponding factor. The products can be added to
- produce an intermediate remainder of 2 or 3 limbs to be similarly
- included in the next step. The point is that such multiplies can be done
- in parallel, meaning as little as 1 multiply worth of latency for 4
- limbs. If the modulus N is less than R/4 (or is it R/5?) the summed
- products will fit in 2 limbs, otherwise 3 will be required, but with the
- high only being small. Clearly this extends to as many factors of R as a
- chip can efficiently apply.
-
-
The logical conclusion for powers R^i is a whole array "p[i] = R^i mod N" - for i up to k, the size of the dividend. This could then be applied at - multiplier throughput speed like an inner product. If the powers took - roughly k divide steps to calculate then there'd be an advantage any time - the same N was used three or more times. Suggested by Victor Shoup in - connection with chinese-remainder style decompositions, but perhaps with - other uses. - -
mpn_modexact_1_odd
calculates an x in the range 0<=x<d
- satisfying a = q*d + x*b^n, where b=2^bits_per_limb. The factor b^n
- needed to get the true remainder r could be calculated by a powering
- algorithm, allowing mpn_modexact_1_odd
to be pressed into
- service for an mpn_mod_1
. modexact_1
is
- simpler and on some chips can run noticeably faster than plain
- mod_1
, on Athlon for instance 11 cycles/limb instead of 17.
- Such a difference could soon overcome the time to calculate b^n. The
- requirement for an odd divisor in modexact
can be handled by
- some shifting on-the-fly, or perhaps by an extra partial-limb step at the
- end.
-
-
-
The removal of twos in the current code could be extended to factors of 3 - or 5. Taking this to its logical conclusion would be a complete - decomposition into powers of primes. The power for a prime p is of - course floor(n/p)+floor(n/p^2)+... Conrad Curry found this is quite fast - (using simultaneous powering as per Handbook of Applied Cryptography - algorithm 14.88). - -
A difficulty with using all primes is that quite large n can be - calculated on a system with enough memory, larger than we'd probably want - for a table of primes, so some sort of sieving would be wanted. Perhaps - just taking out the factors of 3 and 5 would give most of the speedup - that a prime decomposition can offer. - - -
An obvious improvement to the current code would be to strip factors of 2 - from each multiplier and divisor and count them separately, to be applied - with a bit shift at the end. Factors of 3 and perhaps 5 could even be - handled similarly. - -
Conrad Curry reports a big speedup for binomial coefficients using a - prime powering scheme, at least for k near n/2. Of course this is only - practical for moderate size n since again it requires primes up to n. - -
When k is small the current (n-k+1)...n/1...k will be fastest. Some sort - of rule would be needed for when to use this or when to use prime - powering. Such a rule will be a function of both n and k. Some - investigation is needed to see what sort of shape the crossover line will - have, the usual parameter tuning can of course find machine dependent - constants to fill in where necessary. - -
An easier possibility also reported by Conrad Curry is that it may be
- faster not to divide out the denominator (1...k) one-limb at a time, but
- do one big division at the end. Is this because a big divisor in
- mpn_bdivmod
trades the latency of
- mpn_divexact_1
for the throughput of
- mpn_submul_1
? Overheads must hurt though.
-
-
Another reason a big divisor might help is that
- mpn_divexact_1
won't be getting a full limb in
- mpz_bin_uiui
. It's called when the n accumulator is full
- but the k may be far from full. Perhaps the two could be decoupled so k
- is applied when full. It'd be necessary to delay consideration of k
- terms until the corresponding n terms had been applied though, since
- otherwise the division won't be exact.
-
-
-
mpz_perfect_power_p
could be improved in a number of ways,
- for instance p-adic arithmetic to find possible roots.
-
-
Non-powers can be quickly identified by checking for Nth power residues
- modulo small primes, like mpn_perfect_square_p
does for
- squares. The residues to each power N for a given remainder could be
- grouped into a bit mask, the masks for the remainders to each divisor
- would then be "and"ed together to hopefully leave only a few candidate
- powers. Need to think about how wide to make such masks, ie. how many
- powers to examine in this way.
-
-
Any zero remainders found in residue testing reveal factors which can be
- divided out, with the multiplicity restricting the powers that need to be
- considered, as per the current code. Further prime dividing should be
- grouped into limbs like PP
. Need to think about how much
- dividing to do like that, probably more for bigger inputs, less for
- smaller inputs.
-
-
mpn_gcd_1
would probably be better than the current private
- GCD routine. The use it's put to isn't time-critical, and it might help
- ensure correctness to just use the main GCD routine.
-
-
-
GMP is not really a number theory library and probably shouldn't have - large amounts of code dedicated to sophisticated prime testing - algorithms, but basic things well-implemented would suit. Tests offering - certainty are probably all too big or too slow (or both!) to justify - inclusion in the main library. Demo programs showing some possibilities - would be good though. - -
The present "repetitions" argument to mpz_probab_prime_p
is
- rather specific to the Miller-Rabin tests of the current implementation.
- Better would be some sort of parameter asking perhaps for a maximum
- chance 1/2^x of a probable prime in fact being composite. If
- applications follow the advice that the present reps gives 1/4^reps
- chance then perhaps such a change is unnecessary, but an explicitly
- described 1/2^x would allow for changes in the implementation or even for
- new proofs about the theory.
-
-
mpz_probab_prime_p
always initializes a new
- gmp_randstate_t
for randomized tests, which unfortunately
- means it's not really very random and in particular always runs the same
- tests for a given input. Perhaps a new interface could accept an rstate
- to use, so successive tests could increase confidence in the result.
-
-
mpn_mod_34lsub1
is an obvious and easy improvement to the
- trial divisions. And since the various prime factors are constants, the
- remainder can be tested with something like
-
-#define MP_LIMB_DIVISIBLE_7_P(n) \ - ((n) * MODLIMB_INVERSE_7 <= MP_LIMB_T_MAX/7) -- Which would help compilers that don't know how to optimize divisions by - constants, and is even an improvement on current gcc 3.2 code. This - technique works for any modulus, see Granlund and Montgomery "Division by - Invariant Integers" section 9. - -
The trial divisions are done with primes generated and grouped at
- runtime. This could instead be a table of data, with pre-calculated
- inverses too. Storing deltas, ie. amounts to add, rather than actual
- primes would save space. udiv_qrnnd_preinv
style inverses
- can be made to exist by adding dummy factors of 2 if necessary. Some
- thought needs to be given as to how big such a table should be, based on
- how much dividing would be profitable for what sort of size inputs. The
- data could be shared by the perfect power testing.
-
-
Jason Moxham points out that if a sqrt(-1) mod N exists then any factor - of N must be == 1 mod 4, saving half the work in trial dividing. (If - x^2==-1 mod N then for a prime factor p we have x^2==-1 mod p and so the - jacobi symbol (-1/p)=1. But also (-1/p)=(-1)^((p-1)/2), hence must have - p==1 mod 4.) But knowing whether sqrt(-1) mod N exists is not too easy. - A strong pseudoprime test can reveal one, so perhaps such a test could be - inserted part way though the dividing. - -
Jon Grantham "Frobenius Pseudoprimes" (www.pseudoprime.com) describes a - quadratic pseudoprime test taking about 3x longer than a plain test, but - with only a 1/7710 chance of error (whereas 3 plain Miller-Rabin tests - would offer only (1/4)^3 == 1/64). Such a test needs completely random - parameters to satisfy the theory, though single-limb values would run - faster. It's probably best to do at least one plain Miller-Rabin before - any quadratic tests, since that can identify composites in less total - time. - -
Some thought needs to be given to the structure of which tests (trial - division, Miller-Rabin, quadratic) and how many are done, based on what - sort of inputs we expect, with a view to minimizing average time. - -
It might be a good idea to break out subroutines for the various tests,
- so that an application can combine them in ways it prefers, if sensible
- defaults in mpz_probab_prime_p
don't suit. In particular
- this would let applications skip tests it knew would be unprofitable,
- like trial dividing when an input is already known to have no small
- factors.
-
-
For small inputs, combinations of theory and explicit search make it - relatively easy to offer certainty. For instance numbers up to 2^32 - could be handled with a strong pseudoprime test and table lookup. But - it's rather doubtful whether a smallnum prime test belongs in a bignum - library. Perhaps if it had other internal uses. - -
An mpz_nthprime
might be cute, but is almost certainly
- impractical for anything but small n.
-
-
-
On various systems, calls within libgmp still go through the PLT, TOC or - other mechanism, which makes the code bigger and slower than it needs to - be. - -
The theory would be to have all GMP intra-library calls resolved directly - to the routines in the library. An application wouldn't be able to - replace a routine, the way it can normally, but there seems no good - reason to do that, in normal circumstances. - -
The visibility
attribute in recent gcc is good for this,
- because it lets gcc omit unnecessary GOT pointer setups or whatever if it
- finds all calls are local and there's no global data references.
- Documented entrypoints would be protected
, and purely
- internal things not wanted by test programs or anything can be
- internal
.
-
-
Unfortunately, on i386 it seems protected
ends up causing
- text segment relocations within libgmp.so, meaning the library code can't
- be shared between processes, defeating the purpose of a shared library.
- Perhaps this is just a gremlin in binutils (debian packaged
- 2.13.90.0.16-1).
-
-
The linker can be told directly (with a link script, or options) to do - the same sort of thing. This doesn't change the code emitted by gcc of - course, but it does mean calls are resolved directly to their targets, - avoiding a PLT entry. - -
Keeping symbols private to libgmp.so is probably a good thing in general - too, to stop anyone even attempting to access them. But some - undocumented things will need or want to be kept visible, for use by - mpfr, or the test and tune programs. Libtool has a standard option for - selecting public symbols (used now for libmp). - - -