This is mpir.info, produced by makeinfo version 5.2 from mpir.texi.

This manual describes how to install and use MPIR, the Multiple
Precision Integers and Rationals library, version 2.7.1.

Copyright 1991, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001,
2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013
Free Software Foundation, Inc.

Copyright 2008, 2009, 2010 William Hart

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with the Front-Cover Texts being "A GNU Manual", and
with the Back-Cover Texts being "You have freedom to copy and modify
this GNU Manual, like GNU software". A copy of the license is included
in *note GNU Free Documentation License::.
INFO-DIR-SECTION GNU libraries
START-INFO-DIR-ENTRY
* mpir: (mpir). MPIR Multiple Precision Integers and Rationals Library.
END-INFO-DIR-ENTRY

File: mpir.info, Node: Nth Root Algorithm, Next: Perfect Square Algorithm, Prev: Square Root Algorithm, Up: Root Extraction Algorithms

15.5.2 Nth Root
---------------

Integer Nth roots are taken using Newton's method with the following
iteration, where A is the input and n is the root to be taken.

              1          A
     a[i+1] = - * ( ---------- + (n-1)*a[i] )
              n     a[i]^(n-1)

The initial approximation a[1] is generated bitwise by successively
powering a trial root with or without new 1 bits, aiming to be just
above the true root. The iteration converges quadratically when started
from a good approximation. When n is large more initial bits are needed
to get good convergence. The current implementation is not particularly
well optimized.
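
The iteration can be tried out with a minimal Python sketch. This is
not the MPIR code: the function name is hypothetical, and the starting
value is simply a power of two known to be at or above the true root,
rather than the bitwise approximation described above.

```python
def integer_nth_root(A: int, n: int) -> int:
    """Return floor(A ** (1/n)) for A >= 0 and n >= 1 (illustrative only)."""
    if A < 0 or n < 1:
        raise ValueError("need A >= 0 and n >= 1")
    if A < 2 or n == 1:
        return A
    # Assumed starting point: 2^ceil(bits/n), which is >= the true root.
    a = 1 << -(-A.bit_length() // n)
    while True:
        # a[i+1] = (A / a[i]^(n-1) + (n-1)*a[i]) / n, with integer division
        b = (A // a ** (n - 1) + (n - 1) * a) // n
        if b >= a:
            # Starting above the root, the sequence decreases until it
            # reaches the floor of the root, then stops decreasing.
            return a
        a = b
```

Starting above the root keeps the sequence monotonically decreasing,
which makes the stopping test a single comparison.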

File: mpir.info, Node: Perfect Square Algorithm, Next: Perfect Power Algorithm, Prev: Nth Root Algorithm, Up: Root Extraction Algorithms

15.5.3 Perfect Square
---------------------

A significant fraction of non-squares can be quickly identified by
checking whether the input is a quadratic residue modulo small integers.

'mpz_perfect_square_p' first tests the input mod 256, which means
just examining the low byte. Only 44 different values occur for squares
mod 256, so 82.8% of inputs can be immediately identified as
non-squares.

On a 32-bit system similar tests are done mod 9, 5, 7, 13 and 17, for
a total of 99.25% of inputs identified as non-squares. On a 64-bit
system 97 is tested too, for a total of 99.62%.
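
The percentages quoted above can be reproduced with a short Python
sketch. The helper names are hypothetical. Because the moduli are
pairwise coprime, the fractions of residues that are squares multiply.

```python
from fractions import Fraction

def square_residues(m: int) -> set:
    """Set of values x*x mod m that actually occur."""
    return {(x * x) % m for x in range(m)}

def rejected_fraction(moduli) -> Fraction:
    """Fraction of inputs shown to be non-squares by the given
    pairwise coprime moduli."""
    passing = Fraction(1)
    for m in moduli:
        passing *= Fraction(len(square_residues(m)), m)
    return 1 - passing

low_byte = rejected_fraction([256])
small_32 = rejected_fraction([256, 9, 5, 7, 13, 17])
small_64 = rejected_fraction([256, 9, 5, 7, 13, 17, 97])
```

Rounding 'low_byte', 'small_32' and 'small_64' to percentages gives
the 82.8%, 99.25% and 99.62% figures.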

These moduli are chosen because they're factors of 2^24-1 (or 2^48-1
for 64 bits), and such a remainder can be quickly taken just using
additions (see 'mpn_mod_34lsub1').
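
The reason additions suffice is that 2^24 is congruent to 1 mod 2^24-1,
so a number and the sum of its 24-bit pieces leave the same remainder.
The following Python sketch shows the idea only. It is not a
transcription of 'mpn_mod_34lsub1', and the function name is
hypothetical.

```python
def mod_by_chunk_sum(x: int, chunk_bits: int = 24) -> int:
    """Return a value congruent to x modulo 2**chunk_bits - 1, for x >= 0.

    The result is not fully reduced, it is just much smaller than x.
    """
    mask = (1 << chunk_bits) - 1
    total = 0
    while x:
        total += x & mask     # add the low chunk
        x >>= chunk_bits      # move on to the next chunk
    return total
```

Since 9, 5, 7, 13 and 17 all divide 2^24-1, the small sum can then be
reduced by each of them in turn in place of the full input.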

When nails are in use moduli are instead selected by the 'gen-psqr.c'
program and applied with an 'mpn_mod_1'. The same 2^24-1 or 2^48-1
could be done with nails using some extra bit shifts, but this is not
currently implemented.

In any case each modulus is applied to the 'mpn_mod_34lsub1' or
'mpn_mod_1' remainder and a table lookup identifies non-squares. By
using a "modexact" style calculation, and suitably permuted tables, just
one multiply each is required; see the code for details. Moduli are
also combined to save operations, so long as the lookup tables don't
become too big. 'gen-psqr.c' does all the pre-calculations.

A square root must still be taken for any value that passes these
tests, to verify it's really a square and not one of the small fraction
of non-squares that get through (i.e. a pseudo-square to all the tested
bases).

Clearly more residue tests could be done; 'mpz_perfect_square_p' only
uses a compact and efficient set. Big inputs would probably benefit
from more residue testing, small inputs might be better off with less.
The assumed distribution of squares versus non-squares in the input
would affect such considerations.

File: mpir.info, Node: Perfect Power Algorithm, Prev: Perfect Square Algorithm, Up: Root Extraction Algorithms

15.5.4 Perfect Power
--------------------

Detecting perfect powers is required by some factorization algorithms.
Currently 'mpz_perfect_power_p' is implemented using repeated Nth root
extractions, though naturally only prime roots need to be considered.
(*Note Nth Root Algorithm::.)

If a prime divisor p with multiplicity e can be found, then only
roots which are divisors of e need to be considered, much reducing the
work necessary. To this end divisibility by a set of small primes is
checked.
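
Both ideas can be sketched in Python. Everything here is an
illustration under stated assumptions: the names are hypothetical, only
the prime 2 is used for the multiplicity shortcut, roots are taken by
bisection, and 0 and 1 are counted as perfect powers.

```python
def iroot(a: int, n: int) -> int:
    """floor(a ** (1/n)) for a >= 0, n >= 1, by bisection."""
    lo, hi = 0, 1 << -(-a.bit_length() // n)   # hi is >= the true root
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** n <= a:
            lo = mid
        else:
            hi = mid - 1
    return lo

def small_primes(limit: int) -> list:
    """Primes up to limit by trial division (fine for tiny limits)."""
    return [p for p in range(2, limit + 1)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def is_perfect_power(a: int) -> bool:
    """True if a = r**n for some integers r and n >= 2 (a >= 0)."""
    if a < 0:
        raise ValueError("sketch handles a >= 0 only")
    if a < 2:
        return True
    e = (a & -a).bit_length() - 1          # multiplicity of the prime 2
    # Any exponent n with r >= 2 satisfies n < a.bit_length().
    for p in small_primes(a.bit_length()):
        if e and e % p:
            continue                       # p must divide e, so skip it
        if iroot(a, p) ** p == a:
            return True
    return False
```

When the input is even with multiplicity 1 the loop skips every
candidate root, so such inputs are rejected without any root
extraction.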

File: mpir.info, Node: Radix Conversion Algorithms, Next: Other Algorithms, Prev: Root Extraction Algorithms, Up: Algorithms

15.6 Radix Conversion
=====================

Radix conversions are less important than other algorithms. A program
dominated by conversions should probably use a different data
representation.

* Menu:

* Binary to Radix::
* Radix to Binary::

File: mpir.info, Node: Binary to Radix, Next: Radix to Binary, Prev: Radix Conversion Algorithms, Up: Radix Conversion Algorithms

15.6.1 Binary to Radix
----------------------

Conversions from binary to a power-of-2 radix use a simple and fast O(N)
bit extraction algorithm.

Conversions from binary to other radices use one of two algorithms.
Sizes below 'GET_STR_PRECOMPUTE_THRESHOLD' use a basic O(N^2) method.
Repeated divisions by b^n are made, where b is the radix and n is the
biggest power that fits in a limb. But instead of simply using the
remainder r from such divisions, an extra divide step is done to give a
fractional limb representing r/b^n. The digits of r can then be
extracted using multiplications by b rather than divisions. Special
case code is provided for decimal, allowing multiplications by 10 to
optimize to shifts and adds.
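
The fractional limb trick can be illustrated with a Python sketch,
assuming a 32-bit limb and decimal output, so b^n = 10^9. The names
are hypothetical and this is not the MPIR code. The fraction is rounded
up so that each multiplication by 10 brings the correct digit into the
integer part; this works because 10^9 is less than 2^32.

```python
LIMB_BITS = 32                 # assumed limb size
BASE = 10
DIGITS_PER_LIMB = 9            # 10**9 is the biggest power of 10 below 2**32
BIG_BASE = BASE ** DIGITS_PER_LIMB
LIMB_MASK = (1 << LIMB_BITS) - 1

def chunk_digits(r: int) -> str:
    """The 9 digits of 0 <= r < 10**9, most significant first."""
    # Extra divide step: fractional limb for r / 10**9, rounded up.
    frac = ((r << LIMB_BITS) + BIG_BASE - 1) // BIG_BASE
    out = []
    for _ in range(DIGITS_PER_LIMB):
        frac *= BASE
        out.append(str(frac >> LIMB_BITS))   # integer part is the next digit
        frac &= LIMB_MASK                    # keep only the fraction
    return "".join(out)

def to_decimal_basecase(t: int) -> str:
    """Decimal string of t >= 0 by repeated division by 10**9."""
    chunks = []
    while t:
        t, r = divmod(t, BIG_BASE)
        chunks.append(chunk_digits(r))       # least significant chunk first
    return "".join(reversed(chunks)).lstrip("0") or "0"
```

Only one division per limb-sized chunk remains; the nine digits inside
each chunk come from multiplications.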

Above 'GET_STR_PRECOMPUTE_THRESHOLD' a sub-quadratic algorithm is
used. For an input t, powers b^(n*2^i) of the radix are calculated,
until a power between t and sqrt(t) is reached. t is then divided by
that largest power, giving a quotient which is the digits above that
power, and a remainder which is those below. These two parts are in
turn divided by the second highest power, and so on recursively. When a
piece has been divided down to less than 'GET_STR_DC_THRESHOLD' limbs,
the basecase algorithm described above is used.
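
The recursion can be sketched in Python as follows. For brevity this
sketch takes n = 1, so the powers are b^(2^i), and it recurses all the
way down to single digits instead of switching to a basecase at a
threshold. The names are hypothetical.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_radix(t: int, b: int = 10) -> str:
    """Digits of t >= 0 in radix b (2..36) by divide and conquer."""
    if t < 0 or not 2 <= b <= 36:
        raise ValueError("need t >= 0 and 2 <= b <= 36")
    # powers[i] = b ** (2 ** i); stop once the square would exceed t,
    # so the top power lies between sqrt(t) and t (when t >= b).
    powers = [b]
    while powers[-1] ** 2 <= t:
        powers.append(powers[-1] ** 2)

    def rec(x: int, i: int) -> str:
        # x < powers[i] ** 2, result has exactly 2 ** (i + 1) digits
        if i < 0:
            return DIGITS[x]
        q, r = divmod(x, powers[i])        # high digits, low digits
        return rec(q, i - 1) + rec(r, i - 1)

    return rec(t, len(powers) - 1).lstrip("0") or "0"
```

Each level halves the number of digits per piece, so the big divisions
at the top dominate the cost.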

The advantage of this algorithm is that big divisions can make use of
the sub-quadratic divide and conquer division (*note Divide and Conquer
Division::), and big divisions tend to have less overhead than lots of
separate single limb divisions anyway. But in any case the cost of
calculating the powers b^(n*2^i) must first be overcome.

'GET_STR_PRECOMPUTE_THRESHOLD' and 'GET_STR_DC_THRESHOLD' represent
the same basic thing, the point where it becomes worth doing a big
division to cut the input in half. 'GET_STR_PRECOMPUTE_THRESHOLD'
includes the cost of calculating the radix power required, whereas
'GET_STR_DC_THRESHOLD' assumes that's already available, which is the
case when recursing.

Since the base case produces digits from least to most significant
but they want to be stored from most to least, it's necessary to
calculate in advance how many digits there will be, or at least be sure
not to underestimate that. For MPIR the number of input bits is
multiplied by 'chars_per_bit_exactly' from 'mp_bases', rounding up. The
result is either correct or one too big.
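
The estimate can be illustrated in Python. Here the ratio
log(2)/log(b) is computed in floating point and stands in for
'chars_per_bit_exactly'; how MPIR stores and rounds that constant is
not shown, and the function name is hypothetical.

```python
import math

def digit_estimate(x: int, b: int = 10) -> int:
    """Estimated digit count of x > 0 in radix b, from its bit length only."""
    chars_per_bit = math.log(2) / math.log(b)
    return math.ceil(x.bit_length() * chars_per_bit)
```

For example 9 and 10 are both 4 bits long, so both are estimated at 2
digits, which is one too big for 9 and exact for 10.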

Examining some of the high bits of the input could increase the
chance of getting the exact number of digits, but an exact result every
time would not be practical, since in general the difference between
numbers 100... and 99... is only in the last few bits and the work to
identify 99... might well be almost as much as a full conversion.

'mpf_get_str' doesn't currently use the algorithm described here; it
multiplies or divides by a power of b to move the radix point to
just above the highest non-zero digit (or at worst one above that
location), then multiplies by b^n to bring out digits. This is O(N^2)
and is certainly not optimal.

The r/b^n scheme described above for using multiplications to bring
out digits might be useful for more than a single limb. Some brief
experiments with it on the base case when recursing didn't give a
noticeable improvement, but perhaps that was only due to the
implementation. Something similar would work for the sub-quadratic
divisions too, though there would be the cost of calculating a bigger
radix power.

Another possible improvement for the sub-quadratic part would be to
arrange for radix powers that balanced the sizes of quotient and
remainder produced, i.e. the highest power would be a b^(n*k)
approximately equal to sqrt(t), not restricted to a 2^i factor. That
ought to smooth out a graph of times against sizes, but may or may not
be a net speedup.

File: mpir.info, Node: Radix to Binary, Prev: Binary to Radix, Up: Radix Conversion Algorithms

15.6.2 Radix to Binary
----------------------

This section is out-of-date.

Conversions from a power-of-2 radix into binary use a simple and fast
O(N) bitwise concatenation algorithm.

Conversions from other radices use one of two algorithms. Sizes
below 'SET_STR_THRESHOLD' use a basic O(N^2) method. Groups of n digits
are converted to limbs, where n is the biggest power of the base b which
will fit in a limb, then those groups are accumulated into the result by
multiplying by b^n and adding. This saves multi-precision operations,
as per Knuth section 4.4 part E (*note References::). Some special case
code is provided for decimal, giving the compiler a chance to optimize
multiplications by 10.
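
A minimal Python sketch of the basic method, for decimal input and
assuming groups of 9 digits (as would fit a 32-bit limb). The names
are hypothetical.

```python
def group_to_limb(group: str) -> int:
    """Convert a short run of decimal digits using single-word arithmetic."""
    limb = 0
    for ch in group:
        limb = limb * 10 + (ord(ch) - ord("0"))
    return limb

def from_decimal_basecase(s: str, digits_per_limb: int = 9) -> int:
    """Value of the decimal digit string s."""
    big_base = 10 ** digits_per_limb
    # The first group takes the leftover digits so the rest are full groups.
    head = len(s) % digits_per_limb or digits_per_limb
    result = group_to_limb(s[:head])
    for i in range(head, len(s), digits_per_limb):
        # One multi-precision multiply and add per group, not per digit.
        result = result * big_base + group_to_limb(s[i:i + digits_per_limb])
    return result
```

Grouping cuts the number of multi-precision operations by a factor of
n compared with accumulating one digit at a time.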

Above 'SET_STR_THRESHOLD' a sub-quadratic algorithm is used. First
groups of n digits are converted into limbs. Then adjacent limbs are
combined into limb pairs with x*b^n+y, where x and y are the limbs.
Adjacent limb pairs are combined into quads similarly with x*b^(2n)+y.
This continues until a single block remains, that being the result.
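
The pairwise combining can be sketched in Python as below, again for
decimal and with n = 9 assumed. The built-in 'int' is used only to
convert each short group, and the function name is hypothetical.

```python
def from_decimal_pairwise(s: str, n: int = 9) -> int:
    """Value of the decimal digit string s, combining blocks in pairs."""
    # Blocks of n digits, least significant block first.
    blocks = [int(s[max(0, i - n):i]) for i in range(len(s), 0, -n)]
    power = 10 ** n                 # weight of the high block in each pair
    while len(blocks) > 1:
        if len(blocks) % 2:
            blocks.append(0)        # pad with a zero high block
        # x*power + y, with x the more significant block of each pair
        blocks = [blocks[i + 1] * power + blocks[i]
                  for i in range(0, len(blocks), 2)]
        power *= power              # b^n, b^(2n), b^(4n), ...
    return blocks[0] if blocks else 0
```

Each pass halves the number of blocks and squares the radix power, so
the later passes consist of a few large multiplications.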

The advantage of this method is that the multiplications for each x
are big blocks, allowing Karatsuba and higher algorithms to be used.
But the cost of calculating the powers b^(n*2^i) must be overcome.
'SET_STR_THRESHOLD' usually ends up quite big, around 5000 digits, and
on some processors much bigger still.

'SET_STR_THRESHOLD' is based on the input digits (and tuned for
decimal), though it might be better based on a limb count, so as to be
independent of the base. But that sort of count isn't used by the base
case and so would need some sort of initial calculation or estimate.

The main reason 'SET_STR_THRESHOLD' is so much bigger than the
corresponding 'GET_STR_PRECOMPUTE_THRESHOLD' is that 'mpn_mul_1' is much
faster than 'mpn_divrem_1' (often by a factor of 10, or more).

File: mpir.info, Node: Other Algorithms, Next: Assembler Coding, Prev: Radix Conversion Algorithms, Up: Algorithms

15.7 Other Algorithms
=====================

* Menu:

* Prime Testing Algorithm::
* Factorial Algorithm::
* Binomial Coefficients Algorithm::
* Fibonacci Numbers Algorithm::
* Lucas Numbers Algorithm::
* Random Number Algorithms::

File: mpir.info, Node: Prime Testing Algorithm, Next: Factorial Algorithm, Prev: Other Algorithms, Up: Other Algorithms

15.7.1 Prime Testing
--------------------

This section is somewhat out-of-date.

The primality testing in 'mpz_probab_prime_p' (*note Number Theoretic
Functions::) first does some trial division by small factors and then
uses the Miller-Rabin probabilistic primality testing algorithm, as
described in Knuth section 4.5.4 algorithm P (*note References::).

For an odd input n, and with n = q*2^k+1 where q is odd, this
algorithm selects a random base x and tests whether x^q mod n is 1 or
-1, or an x^(q*2^j) mod n is -1, for 1<=j<k. If so then n is probably
prime, if not then n is definitely composite.
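
One round of the test, in its standard formulation, can be sketched in
Python. This is an illustration with hypothetical names, not the
'mpz_probab_prime_p' code, and the list of trial divisors and the
default number of rounds are arbitrary choices.

```python
import random

def strong_probable_prime(n: int, x: int) -> bool:
    """One Miller-Rabin round for odd n > 2 with base x."""
    q, k = n - 1, 0
    while q % 2 == 0:               # write n - 1 = q * 2**k with q odd
        q //= 2
        k += 1
    y = pow(x, q, n)
    if y == 1 or y == n - 1:
        return True
    for _ in range(k - 1):          # y = x^(q*2^j) for j = 1 .. k-1
        y = y * y % n
        if y == n - 1:
            return True
    return False                    # n is definitely composite

def probably_prime(n: int, reps: int = 25) -> bool:
    """Trial division by a few small primes, then reps random rounds."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    return all(strong_probable_prime(n, random.randrange(2, n - 1))
               for _ in range(reps))
```

As an example, 2047 = 23*89 passes a round with base 2 but fails with
base 3.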

Any prime n will pass the test, but some composites do too. Such
composites are known as strong pseudoprimes to base x. No n is a strong
pseudoprime to more than 1/4 of all bases (see Knuth exercise 22), hence
with x chosen at random there's no more than a 1/4 chance a "probable
prime" will in fact be composite.

In fact strong pseudoprimes are quite rare, making the test much more
powerful than this analysis would suggest, but 1/4 is all that's proven
for an arbitrary n.

File: mpir.info, Node: Factorial Algorithm, Next: Binomial Coefficients Algorithm, Prev: Prime Testing Algorithm, Up: Other Algorithms

15.7.2 Factorial
----------------

This section is out-of-date.

Factorials are calculated by a combination of removal of twos,
powering, and binary splitting. The procedure can be best illustrated
with an example,

     23! = 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23

has factors of two removed,

     23! = 2^19.1.1.3.1.5.3.7.1.9.5.11.3.13.7.15.1.17.9.19.5.21.11.23

and the resulting terms collected up according to their multiplicity,

     23! = 2^19.(3.5)^3.(7.9.11)^2.(13.15.17.19.21.23)

Each sequence such as 13.15.17.19.21.23 is evaluated by splitting
into every second term, as for instance (13.17.21).(15.19.23), and the
same recursively on each half. This is implemented iteratively using
some bit twiddling.
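
The three steps can be sketched in Python. The sketch is recursive and
uses list slicing instead of the iterative bit twiddling mentioned
above, and its names are hypothetical. An odd number m in the range
(n/2^(j+1), n/2^j] appears j+1 times once the twos are removed, which
gives the grouping by multiplicity.

```python
def product_every_second(terms: list) -> int:
    """Product of terms, splitting recursively into every second term."""
    if not terms:
        return 1
    if len(terms) == 1:
        return terms[0]
    return product_every_second(terms[0::2]) * product_every_second(terms[1::2])

def factorial(n: int) -> int:
    """n! for n >= 0 by removal of twos, powering and splitting."""
    if n < 0:
        raise ValueError("need n >= 0")
    twos = n - bin(n).count("1")    # exponent of 2 in n!
    result = 1
    hi, multiplicity = n, 1
    while hi >= 3:
        lo = hi // 2
        odds = [m for m in range(lo + 1, hi + 1) if m % 2]
        result *= product_every_second(odds) ** multiplicity
        hi, multiplicity = lo, multiplicity + 1
    return result << twos
```

For n = 23 the loop visits 13.15.17.19.21.23 with multiplicity 1, then
7.9.11 with multiplicity 2, then 3.5 with multiplicity 3, and the final
shift supplies 2^19.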

Such splitting is more efficient than repeated Nx1 multiplies since
it forms big multiplies, allowing Karatsuba and higher algorithms to be
used. And even below the Karatsuba threshold a big block of work can be
more efficient for the basecase algorithm.

Splitting into subsequences of every second term keeps the resulting
products more nearly equal in size than would the simpler approach of
say taking the first half and second half of the sequence. Nearly equal
products are more efficient for the current multiply implementation.

File: mpir.info, Node: Binomial Coefficients Algorithm, Next: Fibonacci Numbers Algorithm, Prev: Factorial Algorithm, Up: Other Algorithms

15.7.3 Binomial Coefficients
----------------------------

Binomial coefficients C(n,k) are calculated by first arranging k <= n/2
using C(n,k) = C(n,n-k) if necessary, and then evaluating the following
product simply from i=2 to i=k.

                           k  (n-k+i)
     C(n,k) =  (n-k+1) * prod -------
                         i=2     i

It's easy to show that each denominator i will divide the product so
far, so the exact division algorithm is used (*note Exact Division::).
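
A direct Python sketch of the product, with a hypothetical function
name. After the step for a given i the running value equals
C(n-k+i,i), which is why each division is exact. Python's ordinary
integer division stands in for the exact division routine.

```python
def binomial(n: int, k: int) -> int:
    """C(n,k) for integers n >= 0, evaluated term by term."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)               # arrange k <= n/2
    if k == 0:
        return 1
    result = n - k + 1              # this is C(n-k+1, 1)
    for i in range(2, k + 1):
        result *= n - k + i
        result //= i                # exact: result is now C(n-k+i, i)
    return result
```

For instance binomial(10, 3) steps through 8, then 8*9/2 = 36, then
36*10/3 = 120.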

The numerators n-k+i and denominators i are first accumulated into as
many as fit in a limb, to save multi-precision operations, though for
'mpz_bin_ui' this applies only to the divisors, since n is an 'mpz_t'
and n-k+i in general won't fit in a limb at all.

File: mpir.info, Node: Fibonacci Numbers Algorithm, Next: Lucas Numbers Algorithm, Prev: Binomial Coefficients Algorithm, Up: Other Algorithms

15.7.4 Fibonacci Numbers
------------------------

The Fibonacci functions 'mpz_fib_ui' and 'mpz_fib2_ui' are designed for
calculating isolated F[n] or F[n],F[n-1] values efficiently.

For small n, a table of single limb values in '__gmp_fib_table' is
used. On a 32-bit limb this goes up to F[47], or on a 64-bit limb up to
F[93]. For convenience the table starts at F[-1].

Beyond the table, values are generated with a binary powering
algorithm, calculating a pair F[n] and F[n-1] working from high to low
across the bits of n. The formulas used are

     F[2k+1] = 4*F[k]^2 - F[k-1]^2 + 2*(-1)^k
     F[2k-1] =   F[k]^2 + F[k-1]^2

     F[2k] = F[2k+1] - F[2k-1]

At each step, k is the high b bits of n. If the next bit of n is 0
then F[2k],F[2k-1] is used, or if it's a 1 then F[2k+1],F[2k] is used,
and the process repeated until all bits of n are incorporated. Notice
these formulas require just two squares per bit of n.
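
The powering loop can be sketched in Python as follows. The sketch
starts from k = 1 instead of from a table entry, and the function name
is hypothetical.

```python
def fib_pair(n: int) -> tuple:
    """Return (F[n], F[n-1]) for n >= 1, scanning the bits of n high to low."""
    if n < 1:
        raise ValueError("need n >= 1")
    k, f_k, f_km1 = 1, 1, 0                    # F[1], F[0]
    for bit in bin(n)[3:]:                     # bits below the leading 1
        sq_k, sq_km1 = f_k * f_k, f_km1 * f_km1        # the two squares
        f_2kp1 = 4 * sq_k - sq_km1 + (2 if k % 2 == 0 else -2)
        f_2km1 = sq_k + sq_km1
        f_2k = f_2kp1 - f_2km1
        if bit == "1":
            k, f_k, f_km1 = 2 * k + 1, f_2kp1, f_2k
        else:
            k, f_k, f_km1 = 2 * k, f_2k, f_2km1
    return f_k, f_km1
```

For n = 10 (binary 1010) the pairs visited are F[1],F[0], then
F[2],F[1], then F[5],F[4], then F[10],F[9].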

It'd be possible to handle the first few n above the single limb
table with simple additions, using the defining Fibonacci recurrence
F[k+1]=F[k]+F[k-1], but this is not done since it usually turns out to
be faster for only about 10 or 20 values of n, and including a block of
code for just those doesn't seem worthwhile. If they really mattered
it'd be better to extend the data table.

Using a table avoids lots of calculations on small numbers, and makes
small n go fast. A bigger table would make more small n go fast, it's
just a question of balancing size against desired speed. For MPIR the
code is kept compact, with the emphasis primarily on a good powering
algorithm.

'mpz_fib2_ui' returns both F[n] and F[n-1], but 'mpz_fib_ui' is only
interested in F[n]. In this case the last step of the algorithm can
become one multiply instead of two squares. One of the following two
formulas is used, according as n is odd or even.

     F[2k]   = F[k]*(F[k]+2F[k-1])

     F[2k+1] = (2F[k]+F[k-1])*(2F[k]-F[k-1]) + 2*(-1)^k

F[2k+1] here is the same as above, just rearranged to be a multiply.
For interest, the 2*(-1)^k term both here and above can be applied just
to the low limb of the calculation, without a carry or borrow into
further limbs, which saves some code size. See comments with
'mpz_fib_ui' and the internal 'mpn_fib2_ui' for how this is done.
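
The single-multiply last step can be checked with a small Python
sketch. The names are hypothetical, and the pair F[k],F[k-1] is
produced here by plain additions just to keep the example
self-contained.

```python
def naive_pair(k: int) -> tuple:
    """(F[k], F[k-1]) for k >= 0 by repeated addition, with F[-1] = 1."""
    a, b = 0, 1                     # F[0], F[-1]
    for _ in range(k):
        a, b = a + b, a
    return a, b

def fib_last_step(n: int, f_k: int, f_km1: int) -> int:
    """F[n] from F[k] and F[k-1], where k = n // 2, using one multiply."""
    k = n // 2
    if n % 2 == 0:
        return f_k * (f_k + 2 * f_km1)
    return (2 * f_k + f_km1) * (2 * f_k - f_km1) + (2 if k % 2 == 0 else -2)
```

For n = 11 the inputs are F[5] = 5 and F[4] = 3, giving 13*7 - 2 = 89.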

File: mpir.info, Node: Lucas Numbers Algorithm, Next: Random Number Algorithms, Prev: Fibonacci Numbers Algorithm, Up: Other Algorithms

15.7.5 Lucas Numbers
--------------------

'mpz_lucnum2_ui' derives a pair of Lucas numbers from a pair of
Fibonacci numbers with the following simple formulas.

     L[k]   =   F[k] + 2*F[k-1]
     L[k-1] = 2*F[k] -   F[k-1]

'mpz_lucnum_ui' is only interested in L[n], and some work can be
saved. Trailing zero bits on n can be handled with a single square
each.

     L[2k] = L[k]^2 - 2*(-1)^k

And the lowest 1 bit can be handled with one multiply of a pair of
Fibonacci numbers, similar to what 'mpz_fib_ui' does.

     L[2k+1] = 5*F[k-1]*(2*F[k]+F[k-1]) - 4*(-1)^k
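
The Lucas formulas above can be written out as a Python sketch for
checking against small values. The function names are hypothetical.

```python
def sign(k: int) -> int:
    """(-1)^k"""
    return 1 if k % 2 == 0 else -1

def lucas_pair_from_fib(f_k: int, f_km1: int) -> tuple:
    """(L[k], L[k-1]) from F[k] and F[k-1]."""
    return f_k + 2 * f_km1, 2 * f_k - f_km1

def lucas_double(l_k: int, k: int) -> int:
    """L[2k] from L[k], one square."""
    return l_k * l_k - 2 * sign(k)

def lucas_odd(f_k: int, f_km1: int, k: int) -> int:
    """L[2k+1] from F[k] and F[k-1], one multiply of two Fibonacci terms."""
    return 5 * f_km1 * (2 * f_k + f_km1) - 4 * sign(k)
```

With F[5] = 5 and F[4] = 3 the first function gives L[5] = 11 and
L[4] = 7, and squaring 11 then gives L[10] = 121 + 2 = 123.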

File: mpir.info, Node: Random Number Algorithms, Prev: Lucas Numbers Algorithm, Up: Other Algorithms

15.7.6 Random Numbers
---------------------

For the 'urandomb' functions, random numbers are generated simply by
concatenating bits produced by the generator. As long as the generator
has good randomness properties this will produce well-distributed N bit
numbers.

For the 'urandomm' functions, random numbers in a range 0<=R<N are
generated by taking values R of ceil(log2(N)) bits each until one
satisfies R<N. This will normally require only one or two attempts, but
the attempts are limited in case the generator is somehow degenerate and
produces only 1 bits or similar.
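
The rejection loop can be sketched in Python. The function name, the
attempt limit of 80 and the fallback of subtracting N after the last
failed attempt are assumptions of this sketch, not statements about the
MPIR source.

```python
import random

def urandomm(n: int, rng: random.Random, max_attempts: int = 80) -> int:
    """Uniform value in 0 <= r < n, by rejection of ceil(log2(n)) bit values."""
    if n <= 0:
        raise ValueError("need n > 0")
    bits = (n - 1).bit_length()     # ceil(log2(n)) for n >= 1
    r = 0
    for _ in range(max_attempts):
        r = rng.getrandbits(bits) if bits else 0
        if r < n:
            return r
    # Assumed fallback for a degenerate generator: r is below 2**bits,
    # which is less than 2*n, so r - n lies in range (but is not uniform).
    return r - n
```

Since N is more than half of 2^ceil(log2(N)), each attempt succeeds
with probability above 1/2 when the generator is sound.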

The Mersenne Twister generator is by Matsumoto and Nishimura (*note
References::). It has a non-repeating period of 2^19937-1, which is a
Mersenne prime, hence the name of the generator. The state is 624 words
of 32-bits each, which is iterated with one XOR and shift for each
32-bit word generated, making the algorithm very fast. Randomness
properties are also very good and this is the default algorithm used by
MPIR.

Linear congruential generators are described in many textbooks, for
instance Knuth volume 2 (*note References::). With a modulus M and
parameters A and C, an integer state S is iterated by the formula S <-
A*S+C mod M. At each step the new state is a linear function of the
previous, mod M, hence the name of the generator.
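
A minimal Python sketch of the state update follows. The class name
and the tiny parameter values in the usage note are made up for
illustration and are not MPIR defaults.

```python
class LinearCongruential:
    """State S iterated by S <- (A*S + C) mod M."""

    def __init__(self, a: int, c: int, m: int, seed: int = 0):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def step(self) -> int:
        """Advance the state once and return it."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state
```

With A=5, C=3, M=16 and a seed of 7 the first three states are 6, 1
and 8.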

In MPIR only moduli of the form 2^N are supported, and the current
implementation is not as well optimized as it could be. Overheads are
significant when N is small, and when N is large clearly the multiply at
each step will become slow. This is not a big concern, since the
Mersenne Twister generator is better in every respect and is therefore
recommended for all normal applications.

For both generators the current state can be deduced by observing
enough output and applying some linear algebra (over GF(2) in the case
of the Mersenne Twister). This generally means raw output is unsuitable
for cryptographic applications without further hashing or the like.

File: mpir.info, Node: Assembler Coding, Prev: Other Algorithms, Up: Algorithms

15.8 Assembler Coding
=====================

The assembler subroutines in MPIR are the most significant source of
speed at small to moderate sizes. At larger sizes algorithm selection
becomes more important, but of course speedups in low level routines
will still speed up everything proportionally.

Carry handling and widening multiplies that are important for MPIR
can't be easily expressed in C. GCC 'asm' blocks help a lot and are
provided in 'longlong.h', but hand coding low level routines invariably
offers a speedup over generic C by a factor of anything from 2 to 10.

* Menu:

* Assembler Code Organisation::
* Assembler Basics::
* Assembler Carry Propagation::
* Assembler Cache Handling::
* Assembler Functional Units::
* Assembler Floating Point::
* Assembler SIMD Instructions::
* Assembler Software Pipelining::
* Assembler Loop Unrolling::
* Assembler Writing Guide::

File: mpir.info, Node: Assembler Code Organisation, Next: Assembler Basics, Prev: Assembler Coding, Up: Assembler Coding

15.8.1 Code Organisation
------------------------

The various 'mpn' subdirectories contain machine-dependent code, written
in C or assembler. The 'mpn/generic' subdirectory contains default
code, used when there's no machine-specific version of a particular
file.

Each 'mpn' subdirectory is for an ISA family. Generally 32-bit and
64-bit variants in a family cannot share code and have separate
directories. Within a family further subdirectories may exist for CPU
variants.

In each directory a 'nails' subdirectory may exist, holding code with
nails support for that CPU variant. A 'NAILS_SUPPORT' directive in each
file indicates the nails values the code handles. Nails code only
exists where it's faster, or promises to be faster, than plain code.
There's no effort put into nails if they're not going to enhance a given
CPU.

File: mpir.info, Node: Assembler Basics, Next: Assembler Carry Propagation, Prev: Assembler Code Organisation, Up: Assembler Coding

15.8.2 Assembler Basics
-----------------------

'mpn_addmul_1' and 'mpn_submul_1' are the most important routines for
overall MPIR performance. All multiplications and divisions come down
to repeated calls to these. 'mpn_add_n', 'mpn_sub_n', 'mpn_lshift' and
'mpn_rshift' are next most important.

On some CPUs assembler versions of the internal functions
'mpn_mul_basecase' and 'mpn_sqr_basecase' give significant speedups,
mainly through avoiding function call overheads. They can also
potentially make better use of a wide superscalar processor, as can
bigger primitives like 'mpn_addmul_2' or 'mpn_addmul_4'.

The restrictions on overlaps between sources and destinations (*note
Low-level Functions::) are designed to facilitate a variety of
implementations. For example, knowing 'mpn_add_n' won't have partly
overlapping sources and destination means reading can be done far ahead
of writing on superscalar processors, and loops can be vectorized on a
vector processor, depending on the carry handling.

File: mpir.info, Node: Assembler Carry Propagation, Next: Assembler Cache Handling, Prev: Assembler Basics, Up: Assembler Coding
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
15.8.3 Carry Propagation
|
2008-06-16 00:39:47 -04:00
|
|
|
|
------------------------
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
The problem that presents most challenges in MPIR is propagating carries
|
|
|
|
|
from one limb to the next. In functions like 'mpn_addmul_1' and
|
|
|
|
|
'mpn_add_n', carries are the only dependencies between limb operations.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
On processors with carry flags, a straightforward CISC style 'adc' is
|
|
|
|
|
generally best. AMD K6 'mpn_addmul_1' however is an example of an
|
2008-04-17 17:03:07 -04:00
|
|
|
|
unusual set of circumstances where a branch works out better.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
On RISC processors generally an add and compare for overflow is used.
|
|
|
|
|
This sort of thing can be seen in 'mpn/generic/aors_n.c'. Some carry
|
|
|
|
|
propagation schemes require 4 instructions, meaning at least 4 cycles
|
|
|
|
|
per limb, but other schemes may use just 1 or 2. On wide superscalar
|
|
|
|
|
processors performance may be completely determined by the number of
|
|
|
|
|
dependent instructions between carry-in and carry-out for each limb.
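
The add and compare idiom can be sketched in C roughly as follows.  This
is only an illustration in the spirit of the generic code, using a
hypothetical function name 'sketch_add_n' and 'uint64_t' limbs, and it
relies on unsigned arithmetic wrapping modulo 2^64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* rp[] = ap[] + bp[] over n limbs, returning the final carry (0 or 1). */
uint64_t sketch_add_n(uint64_t *rp, const uint64_t *ap, const uint64_t *bp,
                      size_t n)
{
    uint64_t carry = 0;
    assert(n >= 1);
    for (size_t i = 0; i < n; i++) {
        uint64_t s = ap[i] + bp[i];   /* wraps modulo 2^64 */
        uint64_t c1 = s < ap[i];      /* overflow iff sum is below an addend */
        uint64_t r = s + carry;       /* add the carry-in */
        uint64_t c2 = r < s;          /* the carry-in itself overflowed */
        rp[i] = r;
        carry = c1 | c2;              /* c1 and c2 are never both set */
    }
    return carry;
}
```

Only the second add, the second compare and the final OR sit on the
carry-in to carry-out path; the first add and compare are independent
between limbs and can be scheduled ahead.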
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
On vector processors good use can be made of the fact that a carry
|
|
|
|
|
bit only very rarely propagates more than one limb. When adding a
|
|
|
|
|
single bit to a limb, there's only a carry out if that limb was
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'0xFF...FF' which on random data will be only 1 in 2^mp_bits_per_limb.
|
|
|
|
|
'mpn/cray/add_n.c' is an example of this: it adds all limbs in parallel,
|
|
|
|
|
adds one set of carry bits in parallel and then only rarely needs to
|
|
|
|
|
fall through to a loop propagating further carries.
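
The three phases just described might look like the following in plain
C.  This is a scalar sketch of the idea only, not the Cray code; the
function name 'sketch_add_n_vec' and the caller-supplied scratch array
'cy' are assumptions made for the illustration.  The first two loops
have no dependency between iterations, so they are the parts a vector
unit could run in parallel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* rp[] = ap[] + bp[] over n limbs, returning the carry out.
   cy[] is scratch space of n limbs. */
uint64_t sketch_add_n_vec(uint64_t *rp, const uint64_t *ap,
                          const uint64_t *bp, uint64_t *cy, size_t n)
{
    uint64_t carry_out, pending = 0;
    size_t i;
    assert(n >= 1);

    /* Phase 1: add all limbs independently, noting each carry out. */
    for (i = 0; i < n; i++) {
        uint64_t s = ap[i] + bp[i];
        cy[i] = s < ap[i];
        rp[i] = s;
    }
    carry_out = cy[n - 1];

    /* Phase 2: add each carry into the next limb up, again independently.
       cy[i] now records a second-level carry out of limb i, which needs
       the limb to have been all ones. */
    for (i = n - 1; i >= 1; i--) {
        uint64_t in = cy[i - 1];
        rp[i] += in;
        cy[i] = in & (rp[i] == 0);
        pending |= cy[i];
    }

    /* Phase 3: rarely needed sequential propagation of further carries. */
    if (pending) {
        for (i = 1; i + 1 < n; i++) {
            if (cy[i]) {
                rp[i + 1] += 1;
                cy[i + 1] |= (rp[i + 1] == 0);
            }
        }
        carry_out |= cy[n - 1];
    }
    return carry_out;
}
```

On random data 'pending' is almost always zero, so the sequential loop
is skipped.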
|
|
|
|
|
|
|
|
|
|
On the x86s, GCC (as of version 2.95.2) doesn't generate particularly
|
|
|
|
|
good code for the RISC style idioms that are necessary to handle carry
|
|
|
|
|
bits in C. Often conditional jumps are generated where 'adc' or 'sbb'
|
|
|
|
|
forms would be better. And so unfortunately almost any loop involving
|
|
|
|
|
carry bits needs to be coded in assembler for best results.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: Assembler Cache Handling, Next: Assembler Functional Units, Prev: Assembler Carry Propagation, Up: Assembler Coding
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
15.8.4 Cache Handling
|
2008-06-16 00:39:47 -04:00
|
|
|
|
---------------------
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
MPIR aims to perform well both on operands that fit entirely in L1 cache
|
|
|
|
|
and those which don't.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Basic routines like 'mpn_add_n' or 'mpn_lshift' are often used on
|
2008-04-17 17:03:07 -04:00
|
|
|
|
large operands, so L2 and main memory performance is important for them.
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'mpn_mul_1' and 'mpn_addmul_1' are mostly used for multiply and square
|
2008-04-17 17:03:07 -04:00
|
|
|
|
basecases, so L1 performance matters most for them, unless assembler
|
2015-06-09 15:33:32 -04:00
|
|
|
|
versions of 'mpn_mul_basecase' and 'mpn_sqr_basecase' exist, in which
|
2008-04-17 17:03:07 -04:00
|
|
|
|
case the remaining uses are mostly for larger operands.
|
|
|
|
|
|
|
|
|
|
For L2 or main memory operands, memory access times will almost
|
|
|
|
|
certainly be more than the calculation time. The aim therefore is to
|
|
|
|
|
maximize memory throughput, by starting a load of the next cache line
|
2015-06-09 15:33:32 -04:00
|
|
|
|
while processing the contents of the previous one. Clearly this is only
|
|
|
|
|
possible if the chip has a lock-up free cache or some sort of prefetch
|
|
|
|
|
instruction. Most current chips have both these features.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Prefetching sources combines well with loop unrolling, since a
|
|
|
|
|
prefetch can be initiated once per unrolled loop (or more than once if
|
|
|
|
|
the loop covers more than one cache line).
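
A sketch of one prefetch per unrolled pass is shown below.  It assumes a
64-byte cache line (eight 'uint64_t' limbs) and an arbitrary prefetch
distance of 32 limbs; both numbers, and the names 'SKETCH_PREFETCH' and
'sketch_com_n', are illustrative assumptions.  '__builtin_prefetch' is a
GCC and Clang extension, so the macro falls back to a no-op elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__)
#define SKETCH_PREFETCH(p) __builtin_prefetch((p), 0, 0)
#else
#define SKETCH_PREFETCH(p) ((void) 0)
#endif

#define SKETCH_UNROLL ((size_t) 8)   /* 8 limbs x 8 bytes: one assumed line */
#define SKETCH_AHEAD  ((size_t) 32)  /* assumed prefetch distance, in limbs */

/* rp[i] = ~up[i] for n limbs, with one source prefetch per unrolled pass. */
void sketch_com_n(uint64_t *rp, const uint64_t *up, size_t n)
{
    size_t i = 0;
    assert(rp != NULL && up != NULL);
    for (; n - i >= SKETCH_UNROLL; i += SKETCH_UNROLL) {
        /* Only form addresses that lie inside the operand. */
        if (n - i > SKETCH_AHEAD)
            SKETCH_PREFETCH(up + i + SKETCH_AHEAD);
        rp[i + 0] = ~up[i + 0];
        rp[i + 1] = ~up[i + 1];
        rp[i + 2] = ~up[i + 2];
        rp[i + 3] = ~up[i + 3];
        rp[i + 4] = ~up[i + 4];
        rp[i + 5] = ~up[i + 5];
        rp[i + 6] = ~up[i + 6];
        rp[i + 7] = ~up[i + 7];
    }
    for (; i < n; i++)               /* leftover limbs */
        rp[i] = ~up[i];
}
```

The right distance depends on memory latency and throughput on the
target CPU and would need to be measured.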
|
|
|
|
|
|
|
|
|
|
On CPUs without write-allocate caches, prefetching destinations will
|
|
|
|
|
ensure individual stores don't go further down the cache hierarchy,
|
|
|
|
|
limiting bandwidth. Of course for calculations which are slow anyway,
|
2015-06-09 15:33:32 -04:00
|
|
|
|
like 'mpn_divrem_1', write-throughs might be fine.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
The distance ahead to prefetch will be determined by memory latency
|
|
|
|
|
versus throughput. The aim of course is to have data arriving
|
|
|
|
|
continuously, at peak throughput. Some CPUs have limits on the number
|
|
|
|
|
of fetches or prefetches in progress.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
If a special prefetch instruction doesn't exist then a plain load can
|
|
|
|
|
be used, but in that case care must be taken not to attempt to read past
|
|
|
|
|
the end of an operand, since that might produce a segmentation
|
2008-04-17 17:03:07 -04:00
|
|
|
|
violation.
|
|
|
|
|
|
|
|
|
|
Some CPUs or systems have hardware that detects sequential memory
|
|
|
|
|
accesses and initiates suitable cache movements automatically, making
|
|
|
|
|
life easy.
|
|
|
|
|
|
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: Assembler Functional Units, Next: Assembler Floating Point, Prev: Assembler Cache Handling, Up: Assembler Coding
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
15.8.5 Functional Units
|
2008-06-16 00:39:47 -04:00
|
|
|
|
-----------------------
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
When choosing an approach for an assembler loop, consideration is given
|
|
|
|
|
to what operations can execute simultaneously and what throughput can
|
|
|
|
|
thereby be achieved. In some cases an algorithm can be tweaked to
|
|
|
|
|
accommodate available resources.
|
|
|
|
|
|
|
|
|
|
Loop control will generally require a counter and pointer updates,
|
|
|
|
|
costing as much as 5 instructions, plus any delays a branch introduces.
|
|
|
|
|
CPU addressing modes might reduce pointer updates, perhaps by allowing
|
2015-06-09 15:33:32 -04:00
|
|
|
|
just one updating pointer and others expressed as offsets from it, or on
|
|
|
|
|
CISC chips with all addressing done with the loop counter as a scaled
|
|
|
|
|
index.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
The final loop control cost can be amortised by processing several
|
|
|
|
|
limbs in each iteration (*note Assembler Loop Unrolling::). This at
|
|
|
|
|
least ensures loop control isn't a big fraction of the work done.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Memory throughput is always a limit. If perhaps only one load or one
|
|
|
|
|
store can be done per cycle then 3 cycles/limb will be the top speed for
|
|
|
|
|
"binary" operations like 'mpn_add_n', and any code achieving that is
|
|
|
|
|
optimal.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Integer resources can be freed up by having the loop counter in a
|
|
|
|
|
float register, or by pressing the float units into use for some
|
|
|
|
|
multiplying, perhaps doing every second limb on the float side (*note
|
|
|
|
|
Assembler Floating Point::).
|
|
|
|
|
|
|
|
|
|
Float resources can be freed up by doing carry propagation on the
|
|
|
|
|
integer side, or even by doing integer to float conversions in integers
|
|
|
|
|
using bit twiddling.
|
|
|
|
|
|
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: Assembler Floating Point, Next: Assembler SIMD Instructions, Prev: Assembler Functional Units, Up: Assembler Coding
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
15.8.6 Floating Point
|
2008-06-16 00:39:47 -04:00
|
|
|
|
---------------------
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2008-07-05 21:31:28 -04:00
|
|
|
|
Floating point arithmetic is used in MPIR for multiplications on CPUs
|
2015-06-09 15:33:32 -04:00
|
|
|
|
with poor integer multipliers. It's mostly useful for 'mpn_mul_1',
|
|
|
|
|
'mpn_addmul_1' and 'mpn_submul_1' on 64-bit machines, and
|
|
|
|
|
'mpn_mul_basecase' on both 32-bit and 64-bit machines.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
With IEEE 53-bit double precision floats, integer multiplications
|
|
|
|
|
producing up to 53 bits will give exact results. Breaking a 64x64
|
2015-06-09 15:33:32 -04:00
|
|
|
|
multiplication into eight 16x32->48 bit pieces is convenient. With some
|
|
|
|
|
care though six 21x32->53 bit products can be used, if one of the lower
|
|
|
|
|
two 21-bit pieces also uses the sign bit.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
For the 'mpn_mul_1' family of functions on a 64-bit machine, the
|
|
|
|
|
invariant single limb is split at the start, into 3 or 4 pieces. Inside
|
|
|
|
|
the loop, the bignum operand is split into 32-bit pieces. Fast
|
2008-04-17 17:03:07 -04:00
|
|
|
|
conversion of these unsigned 32-bit pieces to floating point is highly
|
|
|
|
|
machine-dependent. In some cases, reading the data into the integer
|
2015-06-09 15:33:32 -04:00
|
|
|
|
unit, zero-extending to 64 bits, then transferring to the floating point
unit by going back through memory is the only option.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Converting partial products back to 64-bit limbs is usually best done
|
|
|
|
|
as a signed conversion. Since all values are smaller than 2^53, signed
|
|
|
|
|
and unsigned are the same, but most processors lack unsigned
|
2008-04-17 17:03:07 -04:00
|
|
|
|
conversions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Here is a diagram showing 16x32 bit products for an 'mpn_mul_1' or
|
|
|
|
|
'mpn_addmul_1' with a 64-bit limb. The single limb operand V is split
|
2008-04-17 17:03:07 -04:00
|
|
|
|
into four 16-bit parts. The multi-limb operand U is split in the loop
|
|
|
|
|
into two 32-bit parts.
|
|
|
|
|
|
|
|
|
|
          +---+---+---+---+
          |v48|v32|v16|v00|    V operand
          +---+---+---+---+

          +-------+-------+
       x  |  u32  |  u00  |    U operand (one limb)
          +-------+-------+

     ---------------------------------

                      +-----------+
                      | u00 x v00 |    p00    48-bit products
                      +-----------+
                  +-----------+
                  | u00 x v16 |        p16
                  +-----------+
              +-----------+
              | u00 x v32 |            p32
              +-----------+
          +-----------+
          | u00 x v48 |                p48
          +-----------+
              +-----------+
              | u32 x v00 |            r32
              +-----------+
          +-----------+
          | u32 x v16 |                r48
          +-----------+
      +-----------+
      | u32 x v32 |                    r64
      +-----------+
  +-----------+
  | u32 x v48 |                        r80
  +-----------+
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
p32 and r32 can be summed using floating-point addition, and likewise
|
|
|
|
|
p48 and r48. p00 and p16 can be summed with r64 and r80 from the
|
|
|
|
|
previous iteration.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
For each loop then, four 49-bit quantities are transferred to the
|
|
|
|
|
integer unit, aligned as follows,
|
|
|
|
|
|
|
|
|
|
          |-----64bits----|-----64bits----|
                             +------------+
                             | p00 + r64' |    i00
                             +------------+
                         +------------+
                         | p16 + r80' |        i16
                         +------------+
                     +------------+
                     | p32 + r32  |            i32
                     +------------+
                 +------------+
                 | p48 + r48  |                i48
                 +------------+
|
|
|
|
|
|
|
|
|
|
The challenge then is to sum these efficiently and add in a carry
|
|
|
|
|
limb, generating a low 64-bit result limb and a high 33-bit carry limb
|
|
|
|
|
(i48 extends 33 bits into the high half).
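
The 16x32 splitting described in this section can be tried out in C
with the sketch below, which forms a single 64x64->128 bit product
using only double precision multiplies.  It is a simplification made
for illustration: there is no loop, so r64 and r80 go straight into the
high limb instead of being held over for the next iteration, and the
names 'sketch_acc' and 'sketch_umul_fp' are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Add x * 2^shift (shift = 0, 16, 32 or 48) into the 128-bit value hi:lo. */
void sketch_acc(uint64_t *hi, uint64_t *lo, uint64_t x, unsigned shift)
{
    uint64_t low = x << shift;
    uint64_t high = shift ? x >> (64 - shift) : 0;
    assert(shift < 64);
    *lo += low;
    *hi += high + (*lo < low);       /* carry out of the low limb */
}

/* hi:lo = u * v, with every multiply done in double precision. */
void sketch_umul_fp(uint64_t *hi, uint64_t *lo, uint64_t u, uint64_t v)
{
    /* U is split into two 32-bit parts, V into four 16-bit parts. */
    double u00 = (double) (uint32_t) u;
    double u32 = (double) (uint32_t) (u >> 32);
    double v00 = (double) (v & 0xFFFF);
    double v16 = (double) ((v >> 16) & 0xFFFF);
    double v32 = (double) ((v >> 32) & 0xFFFF);
    double v48 = (double) (v >> 48);

    /* Each product is below 2^48 and each sum below 2^49, so all are
       exact in a 53-bit mantissa.  Conversions back are signed. */
    int64_t i00 = (int64_t) (u00 * v00);               /* p00 */
    int64_t i16 = (int64_t) (u00 * v16);               /* p16 */
    int64_t i32 = (int64_t) (u00 * v32 + u32 * v00);   /* p32 + r32 */
    int64_t i48 = (int64_t) (u00 * v48 + u32 * v16);   /* p48 + r48 */
    int64_t i64 = (int64_t) (u32 * v32);               /* r64 */
    int64_t i80 = (int64_t) (u32 * v48);               /* r80 */

    *lo = 0;
    *hi = (uint64_t) i64 + ((uint64_t) i80 << 16);
    sketch_acc(hi, lo, (uint64_t) i00, 0);
    sketch_acc(hi, lo, (uint64_t) i16, 16);
    sketch_acc(hi, lo, (uint64_t) i32, 32);
    sketch_acc(hi, lo, (uint64_t) i48, 48);
}
```

The integer summation in 'sketch_acc' is the part a real loop has to
do efficiently; here it is written for clarity rather than speed.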
|
|
|
|
|
|
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: Assembler SIMD Instructions, Next: Assembler Software Pipelining, Prev: Assembler Floating Point, Up: Assembler Coding
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
15.8.7 SIMD Instructions
|
2008-06-16 00:39:47 -04:00
|
|
|
|
------------------------
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
The single-instruction multiple-data support in current microprocessors
|
|
|
|
|
is aimed at signal processing algorithms where each data point can be
|
|
|
|
|
treated more or less independently. There's generally not much support
|
2008-07-05 21:31:28 -04:00
|
|
|
|
for propagating the sort of carries that arise in MPIR.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
SIMD multiplications of say four 16x16 bit multiplies only do as much
|
2008-07-05 21:31:28 -04:00
|
|
|
|
work as one 32x32 from MPIR's point of view, and need some shifts and
|
2015-06-09 15:33:32 -04:00
|
|
|
|
adds besides. But of course if say the SIMD form is fully pipelined and
|
|
|
|
|
uses less instruction decoding then it may still be worthwhile.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
On the x86 chips, MMX has so far found a use in 'mpn_rshift' and
|
|
|
|
|
'mpn_lshift', and is used in a special case for 16-bit multipliers in
|
|
|
|
|
the P55 'mpn_mul_1'. SSE2 is used for Pentium 4 'mpn_mul_1',
|
|
|
|
|
'mpn_addmul_1', and 'mpn_submul_1'.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: Assembler Software Pipelining, Next: Assembler Loop Unrolling, Prev: Assembler SIMD Instructions, Up: Assembler Coding
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
15.8.8 Software Pipelining
|
2008-06-16 00:39:47 -04:00
|
|
|
|
--------------------------
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Software pipelining consists of scheduling instructions around the
|
|
|
|
|
branch point in a loop. For example a loop might issue a load not for
|
2015-06-09 15:33:32 -04:00
|
|
|
|
use in the present iteration but the next, thereby allowing extra cycles
|
|
|
|
|
for the data to arrive from memory.
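
The shape of such a loop can be shown in C, although a C compiler is
free to reschedule it and real software pipelining is done in
assembler.  The sketch below assumes 32-bit limbs and uses a
hypothetical name 'sketch_mul_1_pipelined'.  The load for the next
iteration is issued before the current multiply, with feed-in code
before the loop and wind-down code after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* rp[] = up[] * v over n limbs, returning the carry-out limb. */
uint32_t sketch_mul_1_pipelined(uint32_t *rp, const uint32_t *up,
                                size_t n, uint32_t v)
{
    uint32_t carry = 0;
    uint32_t cur;
    uint64_t t;
    assert(n >= 1);

    cur = up[0];                       /* feed-in: first load */
    for (size_t i = 0; i + 1 < n; i++) {
        uint32_t next = up[i + 1];     /* load for the NEXT iteration */
        t = (uint64_t) cur * v + carry;
        rp[i] = (uint32_t) t;
        carry = (uint32_t) (t >> 32);
        cur = next;
    }
    /* Wind-down: finish the last limb without a further load. */
    t = (uint64_t) cur * v + carry;
    rp[n - 1] = (uint32_t) t;
    return (uint32_t) (t >> 32);
}
```

Note the loop never reads past the end of the operand, since the last
limb is handled outside it.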
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Naturally this is wanted only when doing things like loads or
|
|
|
|
|
multiplies that take several cycles to complete, and only where a CPU
|
|
|
|
|
has multiple functional units so that other work can be done in the
|
|
|
|
|
meantime.
|
|
|
|
|
|
|
|
|
|
A pipeline with several stages will have a data value in progress at
|
|
|
|
|
each stage and each loop iteration moves them along one stage. This is
|
|
|
|
|
like juggling.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
If the latency of some instruction is greater than the loop time then
|
|
|
|
|
it will be necessary to unroll, so one register has a result ready to
|
|
|
|
|
use while another (or multiple others) are still in progress. (*note
|
|
|
|
|
Assembler Loop Unrolling::).
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: Assembler Loop Unrolling, Next: Assembler Writing Guide, Prev: Assembler Software Pipelining, Up: Assembler Coding
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
15.8.9 Loop Unrolling
|
2008-06-16 00:39:47 -04:00
|
|
|
|
---------------------
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Loop unrolling consists of replicating code so that several limbs are
|
|
|
|
|
processed in each loop. At a minimum this reduces loop overheads by a
|
|
|
|
|
corresponding factor, but it can also allow better register usage, for
|
|
|
|
|
example alternately using one register combination and then another.
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Judicious use of 'm4' macros can help avoid lots of duplication in the
|
2008-04-17 17:03:07 -04:00
|
|
|
|
source code.
|
|
|
|
|
|
|
|
|
|
Any amount of unrolling can be handled with a loop counter that's
|
|
|
|
|
decremented by N each time, stopping when the remaining count is less
|
|
|
|
|
than the further N the loop will process. Or by subtracting N at the
|
2015-06-09 15:33:32 -04:00
|
|
|
|
start, the termination condition becomes when the counter C is less than
|
|
|
|
|
0 (and the count of remaining limbs is C+N).
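
The second counter scheme can be sketched in C as below, with N = 4 and
a plain limb copy standing in for the real work.  The name
'sketch_copy_unrolled' is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_N 4   /* unrolling factor N */

/* Copy n limbs from up to rp using a 4-way unrolled loop. */
void sketch_copy_unrolled(uint64_t *rp, const uint64_t *up, long n)
{
    long c = n - SKETCH_N;            /* subtract N once at the start */
    assert(n >= 0);
    for (; c >= 0; c -= SKETCH_N) {   /* terminate when C is below 0 */
        rp[0] = up[0];
        rp[1] = up[1];
        rp[2] = up[2];
        rp[3] = up[3];
        rp += SKETCH_N;
        up += SKETCH_N;
    }
    c += SKETCH_N;                    /* remaining limbs: C+N, 0 to N-1 */
    while (c-- > 0)
        *rp++ = *up++;
}
```

The loop test is then a simple sign check on the counter, which many
CPUs get for free from the subtract.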
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Alternatively, for a power of 2 unroll the loop count and remainder can
|
2015-06-09 15:33:32 -04:00
|
|
|
|
be established with a shift and mask. This is convenient if also making
|
|
|
|
|
a computed jump into the middle of a large loop.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
The limbs not a multiple of the unrolling can be handled in various
|
|
|
|
|
ways, for example
|
|
|
|
|
|
|
|
|
|
* A simple loop at the end (or the start) to process the excess.
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Care will be wanted that it isn't too much slower than the unrolled
|
|
|
|
|
part.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
* A set of binary tests, for example after an 8-limb unrolling, test
|
|
|
|
|
for 4 more limbs to process, then a further 2 more or not, and
|
|
|
|
|
finally 1 more or not. This will probably take more code space
|
|
|
|
|
than a simple loop.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
* A 'switch' statement, providing separate code for each possible
|
2008-04-17 17:03:07 -04:00
|
|
|
|
excess, for example an 8-limb unrolling would have separate code
|
|
|
|
|
for 0 remaining, 1 remaining, etc, up to 7 remaining. This might
|
|
|
|
|
take a lot of code, but may be the best way to optimize all cases
|
|
|
|
|
in combination with a deep pipelined loop.
|
|
|
|
|
|
|
|
|
|
* A computed jump into the middle of the loop, thus making the first
|
|
|
|
|
iteration handle the excess. This should make times smoothly
|
|
|
|
|
increase with size, which is attractive, but setups for the jump
|
|
|
|
|
and adjustments for pointers can be tricky and could become quite
|
|
|
|
|
difficult in combination with deep pipelining.
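
The nearest C equivalent of the computed jump in the last item is the
construct known as Duff's device, where a 'switch' enters the middle of
an unrolled loop body.  The sketch below uses a 4-limb unrolling with
the pass count and excess found by shift and mask; the name
'sketch_copy_jump' is hypothetical and the copy stands in for real
work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n limbs, jumping into a 4-way unrolled loop to absorb the excess. */
void sketch_copy_jump(uint64_t *rp, const uint64_t *up, size_t n)
{
    size_t passes;
    assert(rp != NULL && up != NULL);
    if (n == 0)
        return;
    passes = (n + 3) >> 2;        /* loop passes, the first maybe partial */
    switch (n & 3) {              /* excess limbs select the entry point */
    case 0: do { *rp++ = *up++;   /* fall through */
    case 3:      *rp++ = *up++;   /* fall through */
    case 2:      *rp++ = *up++;   /* fall through */
    case 1:      *rp++ = *up++;
            } while (--passes > 0);
    }
}
```

Even in this small case the entry logic is less obvious than a simple
trailing loop, which hints at the difficulty mentioned above once
pointers need adjusting and a pipeline needs priming.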
|
|
|
|
|
|
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: Assembler Writing Guide, Prev: Assembler Loop Unrolling, Up: Assembler Coding
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
15.8.10 Writing Guide
|
2008-06-16 00:39:47 -04:00
|
|
|
|
---------------------
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
This is a guide to writing software pipelined loops for processing limb
|
|
|
|
|
vectors in assembler.
|
|
|
|
|
|
|
|
|
|
First determine the algorithm and which instructions are needed.
|
|
|
|
|
Code it without unrolling or scheduling, to make sure it works. On a
|
|
|
|
|
3-operand CPU try to write each new value to a new register; this will
|
|
|
|
|
greatly simplify later steps.
|
|
|
|
|
|
|
|
|
|
Then note for each instruction the functional unit and/or issue port
|
2015-06-09 15:33:32 -04:00
|
|
|
|
requirements. If an instruction can use either of two units, like U0 or
|
|
|
|
|
U1 then make a category "U0/U1". Count the total using each unit (or
|
|
|
|
|
combined unit), and count all instructions.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Figure out from those counts the best possible loop time. The goal
|
|
|
|
|
will be to find a perfect schedule where instruction latencies are
|
|
|
|
|
completely hidden. The total instruction count might be the limiting
|
|
|
|
|
factor, or perhaps a particular functional unit. It might be possible
|
|
|
|
|
to tweak the instructions to help the limiting factor.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Suppose the loop time is N, then make N issue buckets, with the final
|
|
|
|
|
loop branch at the end of the last. Now fill the buckets with dummy
|
|
|
|
|
instructions using the functional units desired. Run this to make sure
|
|
|
|
|
the intended speed is reached.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Now replace the dummy instructions with the real instructions from
|
2015-06-09 15:33:32 -04:00
|
|
|
|
the slow but correct loop you started with. The first will typically be
|
|
|
|
|
a load instruction. Then the instruction using that value is placed in
|
|
|
|
|
a bucket an appropriate distance down. Run the loop again, to check it
|
|
|
|
|
still runs at target speed.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Keep placing instructions, frequently measuring the loop. After a
|
2015-06-09 15:33:32 -04:00
|
|
|
|
few you will need to wrap around from the last bucket back to the top of
|
|
|
|
|
the loop. If you used the new-register for new-value strategy above
|
2008-04-17 17:03:07 -04:00
|
|
|
|
then there will be no register conflicts. If not then take care not to
|
|
|
|
|
clobber something already in use. Changing registers at this time is
|
|
|
|
|
very error prone.
|
|
|
|
|
|
|
|
|
|
The loop will overlap two or more of the original loop iterations,
|
|
|
|
|
and the computation of one vector element result will be started in one
|
|
|
|
|
iteration of the new loop, and completed one or several iterations
|
|
|
|
|
later.
|
|
|
|
|
|
|
|
|
|
The final step is to create feed-in and wind-down code for the loop.
|
|
|
|
|
A good way to do this is to make a copy (or copies) of the loop at the
|
|
|
|
|
start and delete those instructions which don't have valid antecedents,
|
|
|
|
|
and at the end replicate and delete those whose results are unwanted
|
|
|
|
|
(including any further loads).
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
The loop will have a minimum number of limbs loaded and processed, so
|
|
|
|
|
the feed-in code must test if the request size is smaller and skip
|
2008-04-17 17:03:07 -04:00
|
|
|
|
either to a suitable part of the wind-down or to special code for small
|
|
|
|
|
sizes.
|
|
|
|
|
|
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: Internals, Next: Contributors, Prev: Algorithms, Up: Top
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
16 Internals
|
2008-06-16 00:39:47 -04:00
|
|
|
|
************
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
*This chapter is provided only for informational purposes and the
|
2008-07-05 21:31:28 -04:00
|
|
|
|
various internals described here may change in future MPIR releases.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
Applications expecting to be compatible with future releases should use
|
|
|
|
|
only the documented interfaces described in previous chapters.*
|
|
|
|
|
|
|
|
|
|
* Menu:
|
|
|
|
|
|
|
|
|
|
* Integer Internals::
|
|
|
|
|
* Rational Internals::
|
|
|
|
|
* Float Internals::
|
|
|
|
|
* Raw Output Internals::
|
|
|
|
|
* C++ Interface Internals::
|
|
|
|
|
|
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: Integer Internals, Next: Rational Internals, Prev: Internals, Up: Internals
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
16.1 Integer Internals
|
2008-06-16 00:39:47 -04:00
|
|
|
|
======================
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'mpz_t' variables represent integers using sign and magnitude, in space
|
2008-04-17 17:03:07 -04:00
|
|
|
|
dynamically allocated and reallocated. The fields are as follows.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'_mp_size'
|
2008-04-17 17:03:07 -04:00
|
|
|
|
The number of limbs, or the negative of that when representing a
|
2015-06-09 15:33:32 -04:00
|
|
|
|
negative integer. Zero is represented by '_mp_size' set to zero,
|
|
|
|
|
in which case the '_mp_d' data is unused.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'_mp_d'
|
2008-04-17 17:03:07 -04:00
|
|
|
|
A pointer to an array of limbs which is the magnitude. These are
|
2015-06-09 15:33:32 -04:00
|
|
|
|
stored "little endian" as per the 'mpn' functions, so '_mp_d[0]' is
|
|
|
|
|
the least significant limb and '_mp_d[ABS(_mp_size)-1]' is the most
|
|
|
|
|
significant. Whenever '_mp_size' is non-zero, the most significant
|
|
|
|
|
limb is non-zero.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Currently there's always at least one limb allocated, so for
|
2015-06-09 15:33:32 -04:00
|
|
|
|
instance 'mpz_set_ui' never needs to reallocate, and 'mpz_get_ui'
|
|
|
|
|
can fetch '_mp_d[0]' unconditionally (though its value is then only
|
|
|
|
|
wanted if '_mp_size' is non-zero).
|
|
|
|
|
|
|
|
|
|
'_mp_alloc'
|
|
|
|
|
'_mp_alloc' is the number of limbs currently allocated at '_mp_d',
|
|
|
|
|
and naturally '_mp_alloc >= ABS(_mp_size)'. When an 'mpz' routine
|
|
|
|
|
is about to (or might be about to) increase '_mp_size', it checks
|
|
|
|
|
'_mp_alloc' to see whether there's enough space, and reallocates if
|
|
|
|
|
not. 'MPZ_REALLOC' is generally used for this.
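
The field rules above can be summarized with a small mock structure.
The sketch below is an illustration only: 'sketch_mpz' and the helper
functions are hypothetical stand-ins and not the real 'mpz_t' or any
MPIR function, though the field names follow the descriptions above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sketch_limb_t;     /* stand-in for a limb */

typedef struct {
    int            _mp_alloc;       /* limbs allocated at _mp_d */
    int            _mp_size;        /* limbs in use, negated if negative */
    sketch_limb_t *_mp_d;           /* magnitude, least significant first */
} sketch_mpz;

/* Sign of the value: -1, 0 or 1, taken from _mp_size alone. */
int sketch_sgn(const sketch_mpz *z)
{
    return (z->_mp_size > 0) - (z->_mp_size < 0);
}

/* Number of limbs in the magnitude, ABS(_mp_size). */
int sketch_abs_size(const sketch_mpz *z)
{
    return z->_mp_size < 0 ? -z->_mp_size : z->_mp_size;
}

/* Low limb of the magnitude.  The fetch is unconditional because at
   least one limb is always allocated; it is only used if non-zero. */
sketch_limb_t sketch_low_limb(const sketch_mpz *z)
{
    sketch_limb_t low;
    assert(z->_mp_alloc >= 1);
    low = z->_mp_d[0];
    return z->_mp_size != 0 ? low : 0;
}

/* Check the invariants: enough space, and a non-zero top limb. */
int sketch_invariants_hold(const sketch_mpz *z)
{
    int n = sketch_abs_size(z);
    if (z->_mp_alloc < 1 || z->_mp_alloc < n)
        return 0;
    return n == 0 || z->_mp_d[n - 1] != 0;
}
```

For example a value of -(7*2^64 + 5) would have '_mp_size' of -2 and
limbs 5 and 7 in that order.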
|
|
|
|
|
|
|
|
|
|
The various bitwise logical functions like 'mpz_and' behave as if
|
2008-04-17 17:03:07 -04:00
|
|
|
|
negative values were twos complement. But sign and magnitude is always
|
|
|
|
|
used internally, and necessary adjustments are made during the
|
|
|
|
|
calculations. Sometimes this isn't pretty, but sign and magnitude are
|
|
|
|
|
best for other routines.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Some internal temporary variables are set up with 'MPZ_TMP_INIT' and
|
|
|
|
|
these have '_mp_d' space obtained from 'TMP_ALLOC' rather than the
|
|
|
|
|
memory allocation functions. Care is taken to ensure that these are big
|
|
|
|
|
enough that no reallocation is necessary (since it would have
|
2008-04-17 17:03:07 -04:00
|
|
|
|
unpredictable consequences).
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'_mp_size' and '_mp_alloc' are 'int', although 'mp_size_t' is usually
|
|
|
|
|
a 'long'. This is done to make the fields just 32 bits on some 64-bit
|
|
|
|
|
systems, thereby saving a few bytes of data space but still providing
|
|
|
|
|
plenty of range.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: Rational Internals, Next: Float Internals, Prev: Integer Internals, Up: Internals
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
16.2 Rational Internals
|
2008-06-16 00:39:47 -04:00
|
|
|
|
=======================
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'mpq_t' variables represent rationals using an 'mpz_t' numerator and
|
2008-04-17 17:03:07 -04:00
|
|
|
|
denominator (*note Integer Internals::).
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
The canonical form adopted is denominator positive (and non-zero), no
|
|
|
|
|
common factors between numerator and denominator, and zero uniquely
|
2008-04-17 17:03:07 -04:00
|
|
|
|
represented as 0/1.
|
|
|
|
|
|
|
|
|
|
It's believed that casting out common factors at each stage of a
|
|
|
|
|
calculation is best in general. A GCD is an O(N^2) operation so it's
|
2015-06-09 15:33:32 -04:00
|
|
|
|
better to do a few small ones immediately than to delay and have to do a
|
|
|
|
|
big one later. Knowing the numerator and denominator have no common
|
|
|
|
|
factors can be used for example in 'mpq_mul' to make only two cross GCDs
|
|
|
|
|
necessary, not four.
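
The two cross GCDs can be seen in a small fixed-size model.  The sketch
below uses 'int64_t' in place of 'mpz_t' and hypothetical names
'sketch_q', 'sketch_gcd' and 'sketch_q_mul'; it assumes both inputs are
already canonical and ignores overflow, so it only illustrates the idea
and is not how 'mpq_mul' is coded.

```c
#include <assert.h>
#include <stdint.h>

/* A rational in canonical form: den > 0 and gcd(num, den) = 1. */
typedef struct { int64_t num; int64_t den; } sketch_q;

/* gcd(|a|, b) for b > 0, by Euclid's algorithm. */
int64_t sketch_gcd(int64_t a, int64_t b)
{
    if (a < 0)
        a = -a;
    while (b != 0) {
        int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* x * y for canonical x and y; the result is canonical too. */
sketch_q sketch_q_mul(sketch_q x, sketch_q y)
{
    sketch_q r;
    int64_t g1, g2;
    assert(x.den > 0 && y.den > 0);
    /* num and den of the same operand share no factor, so only the
       two cross pairs can have anything to cancel. */
    g1 = sketch_gcd(x.num, y.den);
    g2 = sketch_gcd(y.num, x.den);
    r.num = (x.num / g1) * (y.num / g2);
    r.den = (x.den / g2) * (y.den / g1);
    return r;
}
```

For instance 6/35 times -14/9 gives cross GCDs of 3 and 7, and the
result -4/15 needs no further reduction.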
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
This general approach to common factors is badly sub-optimal in the
|
|
|
|
|
presence of simple factorizations or little prospect for cancellation,
|
2010-03-24 11:47:51 -04:00
|
|
|
|
but MPIR has no way to know when this will occur. As per *note
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Efficiency::, that's left to applications. The 'mpq_t' framework might
|
|
|
|
|
still suit, with 'mpq_numref' and 'mpq_denref' for direct access to the
|
|
|
|
|
numerator and denominator, or of course 'mpz_t' variables can be used
|
2008-04-17 17:03:07 -04:00
|
|
|
|
directly.
|
|
|
|
|
|
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: Float Internals, Next: Raw Output Internals, Prev: Rational Internals, Up: Internals
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
16.3 Float Internals
|
2008-06-16 00:39:47 -04:00
|
|
|
|
====================
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2008-07-05 21:31:28 -04:00
|
|
|
|
Efficient calculation is the primary aim of MPIR floats and the use of
|
2008-04-17 17:03:07 -04:00
|
|
|
|
whole limbs and simple rounding facilitates this.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'mpf_t' floats have a variable precision mantissa and a single
|
2008-04-17 17:03:07 -04:00
|
|
|
|
machine word signed exponent. The mantissa is represented using sign
|
|
|
|
|
and magnitude.
|
|
|
|
|
|
|
|
|
|
        most                   least
     significant            significant
        limb                   limb

                                 _mp_d
      |---- _mp_exp --->           |
       _____ _____ _____ _____ _____
      |_____|_____|_____|_____|_____|
                        . <------------ radix point

       <-------- _mp_size --------->
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
|
2008-04-17 17:03:07 -04:00
|
|
|
|
The fields are as follows.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'_mp_size'
|
2008-04-17 17:03:07 -04:00
|
|
|
|
The number of limbs currently in use, or the negative of that when
|
2015-06-09 15:33:32 -04:00
|
|
|
|
representing a negative value. Zero is represented by '_mp_size'
|
|
|
|
|
and '_mp_exp' both set to zero, and in that case the '_mp_d' data
|
|
|
|
|
is unused. (In the future '_mp_exp' might be undefined when
|
2008-04-17 17:03:07 -04:00
|
|
|
|
representing zero.)
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'_mp_prec'
|
2008-04-17 17:03:07 -04:00
|
|
|
|
The precision of the mantissa, in limbs. In any calculation the
|
2015-06-09 15:33:32 -04:00
|
|
|
|
aim is to produce '_mp_prec' limbs of result (the most significant
|
2008-04-17 17:03:07 -04:00
|
|
|
|
being non-zero).
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'_mp_d'
|
2008-04-17 17:03:07 -04:00
|
|
|
|
A pointer to the array of limbs which is the absolute value of the
|
2015-06-09 15:33:32 -04:00
|
|
|
|
mantissa. These are stored "little endian" as per the 'mpn'
|
|
|
|
|
functions, so '_mp_d[0]' is the least significant limb and
|
|
|
|
|
'_mp_d[ABS(_mp_size)-1]' the most significant.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
The most significant limb is always non-zero, but there are no
|
|
|
|
|
other restrictions on its value, in particular the highest 1 bit
|
|
|
|
|
can be anywhere within the limb.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'_mp_prec+1' limbs are allocated to '_mp_d', the extra limb being
|
2008-04-17 17:03:07 -04:00
|
|
|
|
for convenience (see below). There are no reallocations during a
|
2015-06-09 15:33:32 -04:00
|
|
|
|
calculation, only in a change of precision with 'mpf_set_prec'.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'_mp_exp'
|
2008-04-17 17:03:07 -04:00
|
|
|
|
The exponent, in limbs, determining the location of the implied
|
|
|
|
|
radix point. Zero means the radix point is just above the most
|
|
|
|
|
significant limb. Positive values mean a radix point offset
|
|
|
|
|
towards the lower limbs and hence a value >= 1, as for example in
|
|
|
|
|
the diagram above. Negative exponents mean a radix point further
|
|
|
|
|
above the highest limb.
|
|
|
|
|
|
|
|
|
|
Naturally the exponent can be any value; it doesn't have to fall
|
|
|
|
|
within the limbs as the diagram shows, and can be a long way above
|
|
|
|
|
or a long way below. Limbs other than those included in the
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'{_mp_d,_mp_size}' data are treated as zero.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'_mp_size' and '_mp_prec' are 'int', although 'mp_size_t' is usually
|
|
|
|
|
a 'long'. This is done to make the fields just 32 bits on some 64-bit
|
2008-04-17 17:03:07 -04:00
|
|
|
|
systems, thereby saving a few bytes of data space but still providing
|
|
|
|
|
plenty of range.
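As a rough illustration of how the four fields combine into a value, the following sketch models them with a toy structure and converts it to a 'double'. The names 'ToyFloat' and 'to_double' are hypothetical and not part of MPIR, and a fixed 32-bit limb is assumed for simplicity. Limb i is given weight B^(exp - n + i), where B = 2^32 and n = ABS(size), which matches the rule that a zero exponent puts the radix point just above the most significant limb.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Toy model of the fields described above (not the real __mpf_struct).
struct ToyFloat {
    int size;                      // like _mp_size: limbs in use, negative for a negative value
    long exp;                      // like _mp_exp: exponent, counted in limbs
    std::vector<std::uint32_t> d;  // like _mp_d: limbs, least significant first
};

// Approximate the represented value as a double (assumes 32-bit limbs).
double to_double(const ToyFloat &f) {
    const int n = std::abs(f.size);
    assert(static_cast<std::size_t>(n) <= f.d.size());
    double magnitude = 0.0;
    for (int i = 0; i < n; ++i) {
        // Limb i carries weight 2^(32 * (exp - n + i)).
        const int shift = 32 * static_cast<int>(f.exp - n + i);
        magnitude += std::ldexp(static_cast<double>(f.d[i]), shift);
    }
    return f.size < 0 ? -magnitude : magnitude;
}
```

With one limb holding 3 and an exponent of 1 the sketch yields 3.0; with limbs {0x80000000, 1}, a size of -2 and an exponent of 1 it yields -1.5, the low limb supplying the fractional half.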
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The following various points should be noted.
|
|
|
|
|
|
|
|
|
|
Low Zeros
|
2015-06-09 15:33:32 -04:00
|
|
|
|
The least significant limbs '_mp_d[0]' etc can be zero, though such
|
|
|
|
|
low zeros can always be ignored. Routines likely to produce low
|
|
|
|
|
zeros check and avoid them to save time in subsequent calculations,
|
|
|
|
|
but for most routines they're quite unlikely and aren't checked.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Mantissa Size Range
|
2015-06-09 15:33:32 -04:00
|
|
|
|
The '_mp_size' count of limbs in use can be less than '_mp_prec' if
|
2008-04-17 17:03:07 -04:00
|
|
|
|
the value can be represented in fewer. This means low precision
|
2015-06-09 15:33:32 -04:00
|
|
|
|
values or small integers stored in a high precision 'mpf_t' can
|
2008-04-17 17:03:07 -04:00
|
|
|
|
still be operated on efficiently.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'_mp_size' can also be greater than '_mp_prec'. Firstly a value is
|
|
|
|
|
allowed to use all of the '_mp_prec+1' limbs available at '_mp_d',
|
|
|
|
|
and secondly when 'mpf_set_prec_raw' lowers '_mp_prec' it leaves
|
|
|
|
|
'_mp_size' unchanged and so the size can be arbitrarily bigger than
|
|
|
|
|
'_mp_prec'.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Rounding
|
2015-06-09 15:33:32 -04:00
|
|
|
|
All rounding is done on limb boundaries. Calculating '_mp_prec'
|
2008-04-17 17:03:07 -04:00
|
|
|
|
limbs with the high non-zero will ensure the application requested
|
|
|
|
|
minimum precision is obtained.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
The use of simple "trunc" rounding towards zero is efficient, since
|
|
|
|
|
there's no need to examine extra limbs and increment or decrement.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Bit Shifts
|
|
|
|
|
Since the exponent is in limbs, there are no bit shifts in basic
|
2015-06-09 15:33:32 -04:00
|
|
|
|
operations like 'mpf_add' and 'mpf_mul'. When differing exponents
|
2008-04-17 17:03:07 -04:00
|
|
|
|
are encountered all that's needed is to adjust pointers to line up
|
|
|
|
|
the relevant limbs.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Of course 'mpf_mul_2exp' and 'mpf_div_2exp' will require bit
|
2008-04-17 17:03:07 -04:00
|
|
|
|
shifts, but the choice is between an exponent in limbs which
|
|
|
|
|
requires shifts there, or one in bits which requires them almost
|
|
|
|
|
everywhere else.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Use of '_mp_prec+1' Limbs
|
|
|
|
|
The extra limb on '_mp_d' ('_mp_prec+1' rather than just
|
|
|
|
|
'_mp_prec') helps when an 'mpf' routine might get a carry from its
|
|
|
|
|
operation. 'mpf_add' for instance will do an 'mpn_add' of
|
|
|
|
|
'_mp_prec' limbs. If there's no carry then that's the result, but
|
2008-04-17 17:03:07 -04:00
|
|
|
|
if there is a carry then it's stored in the extra limb of space and
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'_mp_size' becomes '_mp_prec+1'.
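The carry case can be pictured with a small stand-alone sketch. It does not call 'mpn_add'; 'add_limbs' is a hypothetical helper using 32-bit toy limbs stored least significant first, and it only shows why one spare limb is enough to hold the result of adding two operands of equal length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Add two equal-length limb arrays (least significant limb first).
// The result has the same length, or one limb more when a carry
// comes out of the top limb.
std::vector<std::uint32_t> add_limbs(const std::vector<std::uint32_t> &a,
                                     const std::vector<std::uint32_t> &b) {
    assert(a.size() == b.size());
    std::vector<std::uint32_t> sum(a.size());
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const std::uint64_t t = static_cast<std::uint64_t>(a[i]) + b[i] + carry;
        sum[i] = static_cast<std::uint32_t>(t);  // low 32 bits stay in this limb
        carry = t >> 32;                         // 0 or 1 moves to the next limb
    }
    if (carry != 0)
        sum.push_back(1);  // the carry occupies the extra limb
    return sum;
}
```

Adding {0xFFFFFFFF} and {1} gives the two limbs {0, 1}, which corresponds to the size growing from '_mp_prec' to '_mp_prec+1'.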
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Whenever '_mp_prec+1' limbs are held in a variable, the low limb is
|
|
|
|
|
not needed for the intended precision, only the '_mp_prec' high
|
2008-04-17 17:03:07 -04:00
|
|
|
|
limbs. But zeroing it out or moving the rest down is unnecessary.
|
|
|
|
|
Subsequent routines reading the value will simply take the high
|
2015-06-09 15:33:32 -04:00
|
|
|
|
limbs they need, and this will be '_mp_prec' if their target has
|
2008-04-17 17:03:07 -04:00
|
|
|
|
that same precision. This is no more than a pointer adjustment,
|
|
|
|
|
and must be checked anyway since the destination precision can be
|
|
|
|
|
different from the sources.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Copy functions like 'mpf_set' will retain a full '_mp_prec+1' limbs
|
|
|
|
|
if available. This ensures that a variable which has '_mp_size'
|
|
|
|
|
equal to '_mp_prec+1' will get its full exact value copied.
|
|
|
|
|
Strictly speaking this is unnecessary since only '_mp_prec' limbs
|
2008-04-17 17:03:07 -04:00
|
|
|
|
are needed for the application's requested precision, but it's
|
2015-06-09 15:33:32 -04:00
|
|
|
|
considered that an 'mpf_set' from one variable into another of the
|
2008-04-17 17:03:07 -04:00
|
|
|
|
same precision ought to produce an exact copy.
|
|
|
|
|
|
|
|
|
|
Application Precisions
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'__GMPF_BITS_TO_PREC' converts an application requested precision
|
|
|
|
|
to an '_mp_prec'. The value in bits is rounded up to a whole limb
|
2008-04-17 17:03:07 -04:00
|
|
|
|
then an extra limb is added since the most significant limb of
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'_mp_d' is only non-zero and therefore might contain only one bit.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'__GMPF_PREC_TO_BITS' does the reverse conversion, and removes the
|
|
|
|
|
extra limb from '_mp_prec' before converting to bits. The net
|
|
|
|
|
effect of reading back with 'mpf_get_prec' is simply the precision
|
|
|
|
|
rounded up to a multiple of 'mp_bits_per_limb'.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Note that the extra limb added here for the high only being
|
2015-06-09 15:33:32 -04:00
|
|
|
|
non-zero is in addition to the extra limb allocated to '_mp_d'.
|
|
|
|
|
For example with a 32-bit limb, an application request for 250 bits
|
|
|
|
|
will be rounded up to 8 limbs, then an extra added for the high
|
|
|
|
|
being only non-zero, giving an '_mp_prec' of 9. '_mp_d' then gets
|
|
|
|
|
10 limbs allocated. Reading back with 'mpf_get_prec' will take
|
|
|
|
|
'_mp_prec', subtract 1 limb and multiply by 32, giving 256 bits.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Strictly speaking, the fact the high limb has at least one bit
|
|
|
|
|
means that a float with, say, 3 limbs of 32-bits each will be
|
2015-06-09 15:33:32 -04:00
|
|
|
|
holding at least 65 bits, but for the purposes of 'mpf_t' it's
|
2008-04-17 17:03:07 -04:00
|
|
|
|
considered simply to be 64 bits, a nice multiple of the limb size.
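The arithmetic described under Application Precisions can be written out as a short sketch. The functions 'bits_to_prec', 'prec_to_bits' and 'limbs_allocated' are hypothetical names that follow the prose only; the real '__GMPF_BITS_TO_PREC' and '__GMPF_PREC_TO_BITS' macros may apply further adjustments (such as a minimum precision) that are omitted here.

```cpp
#include <cassert>

// Requested bits -> _mp_prec: round up to whole limbs, then add one
// limb because the high limb is only guaranteed to be non-zero.
long bits_to_prec(long bits, int bits_per_limb) {
    assert(bits > 0 && bits_per_limb > 0);
    const long whole_limbs = (bits + bits_per_limb - 1) / bits_per_limb;
    return whole_limbs + 1;
}

// _mp_prec -> bits as read back: drop the extra limb, convert to bits.
long prec_to_bits(long prec, int bits_per_limb) {
    assert(prec >= 1 && bits_per_limb > 0);
    return (prec - 1) * bits_per_limb;
}

// Limbs allocated at _mp_d: one more than _mp_prec, to hold a carry.
long limbs_allocated(long prec) {
    return prec + 1;
}
```

For the 32-bit example in the text, 'bits_to_prec(250, 32)' is 9, 'limbs_allocated(9)' is 10 and 'prec_to_bits(9, 32)' is 256.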
|
|
|
|
|
|
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: Raw Output Internals, Next: C++ Interface Internals, Prev: Float Internals, Up: Internals
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
16.4 Raw Output Internals
|
2008-06-16 00:39:47 -04:00
|
|
|
|
=========================
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'mpz_out_raw' uses the following format.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
+------+------------------------+
|
|
|
|
|
| size | data bytes |
|
|
|
|
|
+------+------------------------+
|
|
|
|
|
|
|
|
|
|
The size is 4 bytes written most significant byte first, being the
|
2015-06-09 15:33:32 -04:00
|
|
|
|
number of subsequent data bytes, or the two's complement negative of that
|
|
|
|
|
when a negative integer is represented. The data bytes are the absolute
|
|
|
|
|
value of the integer, written most significant byte first.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
The most significant data byte is always non-zero, so the output is
|
|
|
|
|
the same on all systems, irrespective of limb size.
|
|
|
|
|
|
|
|
|
|
In GMP 1, leading zero bytes were written to pad the data bytes to a
|
2015-06-09 15:33:32 -04:00
|
|
|
|
multiple of the limb size. 'mpz_inp_raw' will still accept this, for
|
2008-04-17 17:03:07 -04:00
|
|
|
|
compatibility.
|
|
|
|
|
|
|
|
|
|
The use of "big endian" for both the size and data fields is
|
|
|
|
|
deliberate; it makes the data easy to read in a hex dump of a file.
|
|
|
|
|
Unfortunately it also means that the limb data must be reversed when
|
2015-06-09 15:33:32 -04:00
|
|
|
|
reading or writing, so neither a big endian nor little endian system can
|
|
|
|
|
just read and write '_mp_d'.
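The byte layout can be illustrated with a sketch that encodes a machine integer in the format described above. 'encode_raw' is a hypothetical helper, not the implementation of 'mpz_out_raw'; it takes a 64-bit value so that no library types are needed, and it assumes that zero is written as a size of 0 with no data bytes, which follows from the rule that the most significant data byte is never zero.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode a 64-bit integer as: 4-byte big-endian size (negated in two's
// complement for negative values), then the magnitude, big endian,
// with no leading zero bytes.
std::vector<std::uint8_t> encode_raw(std::int64_t value) {
    const std::uint64_t mag = value < 0
        ? 0 - static_cast<std::uint64_t>(value)
        : static_cast<std::uint64_t>(value);

    std::vector<std::uint8_t> data;
    for (int shift = 56; shift >= 0; shift -= 8) {
        const std::uint8_t byte = static_cast<std::uint8_t>((mag >> shift) & 0xFF);
        if (byte != 0 || !data.empty())
            data.push_back(byte);  // skip leading zeros only
    }
    assert(data.size() <= 8);

    const std::int32_t count = static_cast<std::int32_t>(data.size());
    const std::uint32_t size_field =
        static_cast<std::uint32_t>(value < 0 ? -count : count);

    std::vector<std::uint8_t> out;
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>((size_field >> shift) & 0xFF));
    out.insert(out.end(), data.begin(), data.end());
    return out;
}
```

Under these assumptions 258 becomes the bytes 00 00 00 02 01 02, and -1 becomes FF FF FF FF 01, which reads naturally in a hex dump.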
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: C++ Interface Internals, Prev: Raw Output Internals, Up: Internals
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
16.5 C++ Interface Internals
|
2008-06-16 00:39:47 -04:00
|
|
|
|
============================
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
A system of expression templates is used to ensure something like
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'a=b+c' turns into a simple call to 'mpz_add' etc. For 'mpf_class' the
|
2008-04-17 17:03:07 -04:00
|
|
|
|
scheme also ensures the precision of the final destination is used for
|
2015-06-09 15:33:32 -04:00
|
|
|
|
any temporaries within a statement like 'f=w*x+y*z'. These are
|
2008-04-17 17:03:07 -04:00
|
|
|
|
important features which a naive implementation cannot provide.
|
|
|
|
|
|
|
|
|
|
A simplified description of the scheme follows. The true scheme is
|
|
|
|
|
complicated by the fact that expressions have different return types.
|
|
|
|
|
For detailed information, refer to the source code.
|
|
|
|
|
|
|
|
|
|
To perform an operation, say, addition, we first define a "function
|
|
|
|
|
object" evaluating it,
|
|
|
|
|
|
|
|
|
|
struct __gmp_binary_plus
|
|
|
|
|
{
|
|
|
|
|
static void eval(mpf_t f, mpf_t g, mpf_t h) { mpf_add(f, g, h); }
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
And an "additive expression" object,
|
|
|
|
|
|
|
|
|
|
__gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >
|
|
|
|
|
operator+(const mpf_class &f, const mpf_class &g)
|
|
|
|
|
{
|
|
|
|
|
return __gmp_expr
|
|
|
|
|
<__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >(f, g);
|
|
|
|
|
}
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
The seemingly redundant '__gmp_expr<__gmp_binary_expr<...>>' is used
|
2008-04-17 17:03:07 -04:00
|
|
|
|
to encapsulate any possible kind of expression into a single template
|
2015-06-09 15:33:32 -04:00
|
|
|
|
type. In fact even 'mpf_class' etc are 'typedef' specializations of
|
|
|
|
|
'__gmp_expr'.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Next we define assignment of '__gmp_expr' to 'mpf_class'.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
template <class T>
|
|
|
|
|
mpf_class & mpf_class::operator=(const __gmp_expr<T> &expr)
|
|
|
|
|
{
|
|
|
|
|
expr.eval(this->get_mpf_t(), this->precision());
|
|
|
|
|
return *this;
|
|
|
|
|
}
|
2008-06-16 00:39:47 -04:00
|
|
|
|
|
2008-04-17 17:03:07 -04:00
|
|
|
|
template <class Op>
|
|
|
|
|
void __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, Op> >::eval
|
2010-04-04 15:03:22 -04:00
|
|
|
|
(mpf_t f, mp_bitcnt_t precision)
|
2008-04-17 17:03:07 -04:00
|
|
|
|
{
|
|
|
|
|
Op::eval(f, expr.val1.get_mpf_t(), expr.val2.get_mpf_t());
|
|
|
|
|
}
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
where 'expr.val1' and 'expr.val2' are references to the expression's
|
|
|
|
|
operands (here 'expr' is the '__gmp_binary_expr' stored within the
|
|
|
|
|
'__gmp_expr').
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
This way, the expression is actually evaluated only at the time of
|
2015-06-09 15:33:32 -04:00
|
|
|
|
assignment, when the required precision (that of 'f') is known.
|
|
|
|
|
Furthermore the target 'mpf_t' is now available, thus we can call
|
|
|
|
|
'mpf_add' directly with 'f' as the output argument.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Compound expressions are handled by defining operators taking
|
|
|
|
|
subexpressions as their arguments, like this:
|
|
|
|
|
|
|
|
|
|
template <class T, class U>
|
|
|
|
|
__gmp_expr
|
|
|
|
|
<__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
|
|
|
|
|
operator+(const __gmp_expr<T> &expr1, const __gmp_expr<U> &expr2)
|
|
|
|
|
{
|
|
|
|
|
return __gmp_expr
|
|
|
|
|
<__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
|
|
|
|
|
(expr1, expr2);
|
|
|
|
|
}
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
And the corresponding specializations of '__gmp_expr::eval':
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
template <class T, class U, class Op>
|
|
|
|
|
void __gmp_expr
|
|
|
|
|
<__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, Op> >::eval
|
2010-04-04 15:03:22 -04:00
|
|
|
|
(mpf_t f, mp_bitcnt_t precision)
|
2008-04-17 17:03:07 -04:00
|
|
|
|
{
|
|
|
|
|
// declare two temporaries
|
|
|
|
|
mpf_class temp1(expr.val1, precision), temp2(expr.val2, precision);
|
|
|
|
|
Op::eval(f, temp1.get_mpf_t(), temp2.get_mpf_t());
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
The expression is thus recursively evaluated to any level of
|
2015-06-09 15:33:32 -04:00
|
|
|
|
complexity and all subexpressions are evaluated to the precision of 'f'.
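The same idea can be tried out in miniature without the library. The sketch below is not the 'gmpxx.h' code: 'Num', 'Expr', 'BinExpr', 'Plus', 'Times' and 'round_to' are hypothetical names, and 'Num' is a toy stand-in for 'mpf_class' that holds a 'double' rounded to a given number of decimal places. It shows the two properties discussed above: nothing is computed until assignment, and every intermediate result is rounded to the precision of the destination.

```cpp
#include <cassert>
#include <cmath>

// Round v to 'digits' decimal places (the toy notion of precision).
inline double round_to(double v, int digits) {
    assert(digits >= 0);
    const double scale = std::pow(10.0, digits);
    return std::round(v * scale) / scale;
}

// Function objects, in the role of __gmp_binary_plus etc.
struct Plus  { static double eval(double x, double y) { return x + y; } };
struct Times { static double eval(double x, double y) { return x * y; } };

// Common base so the operators below accept any expression type.
template <class E>
struct Expr {
    const E &self() const { return static_cast<const E &>(*this); }
};

// Toy stand-in for mpf_class.
struct Num : Expr<Num> {
    double value;
    int digits;
    Num(double v, int d) : value(round_to(v, d)), digits(d) {}
    double eval(int) const { return value; }  // operands are read as stored

    // Evaluation happens here, once the destination precision is known.
    template <class E>
    Num &operator=(const Expr<E> &e) {
        value = e.self().eval(digits);
        return *this;
    }
};

// An unevaluated binary operation on two subexpressions.
template <class L, class R, class Op>
struct BinExpr : Expr<BinExpr<L, R, Op> > {
    L lhs;
    R rhs;
    BinExpr(const L &l, const R &r) : lhs(l), rhs(r) {}
    double eval(int digits) const {
        // Subresults are computed recursively at the destination precision.
        return round_to(Op::eval(lhs.eval(digits), rhs.eval(digits)), digits);
    }
};

template <class L, class R>
BinExpr<L, R, Plus> operator+(const Expr<L> &l, const Expr<R> &r) {
    return BinExpr<L, R, Plus>(l.self(), r.self());
}

template <class L, class R>
BinExpr<L, R, Times> operator*(const Expr<L> &l, const Expr<R> &r) {
    return BinExpr<L, R, Times>(l.self(), r.self());
}
```

With w = y = 1.5 and x = z = 1.0 held to one decimal place, assigning 'w*x + y*z' to a destination with one decimal place gives 3.0, while a destination with zero decimal places gives 4.0, because each product is rounded to 2 before the sum is formed. The operands are stored by value here to keep the toy free of lifetime concerns; that is a simplification of this sketch, not a statement about the real classes.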
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: Contributors, Next: References, Prev: Internals, Up: Top
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2014-02-04 08:24:05 -05:00
|
|
|
|
Appendix A Contributors
|
|
|
|
|
***********************
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Torbjorn Granlund wrote the original GMP library and is still developing
|
|
|
|
|
and maintaining it. Several other individuals and organizations have
|
|
|
|
|
contributed to GMP in various ways. Here is a list in chronological
|
|
|
|
|
order:
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Gunnar Sjoedin and Hans Riesel helped with mathematical problems in
|
|
|
|
|
early versions of the library.
|
|
|
|
|
|
|
|
|
|
Richard Stallman contributed to the interface design and revised the
|
|
|
|
|
first version of this manual.
|
|
|
|
|
|
|
|
|
|
Brian Beuning and Doug Lea helped with testing of early versions of
|
|
|
|
|
the library and made creative suggestions.
|
|
|
|
|
|
|
|
|
|
John Amanatides of York University in Canada contributed the function
|
2015-06-09 15:33:32 -04:00
|
|
|
|
'mpz_probab_prime_p'.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Paul Zimmermann of Inria sparked the development of GMP 2, with his
|
|
|
|
|
comparisons between bignum packages.
|
|
|
|
|
|
|
|
|
|
Ken Weber (Kent State University, Universidade Federal do Rio Grande
|
2015-06-09 15:33:32 -04:00
|
|
|
|
do Sul) contributed 'mpz_gcd', 'mpz_divexact', 'mpn_gcd', and
|
|
|
|
|
'mpn_bdivmod', partially supported by CNPq (Brazil) grant 301314194-2.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Per Bothner of Cygnus Support helped to set up GMP to use Cygnus'
|
|
|
|
|
configure. He has also made valuable suggestions and tested numerous
|
|
|
|
|
intermediary releases.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Joachim Hollman was involved in the design of the 'mpf' interface,
|
|
|
|
|
and in the 'mpz' design revisions for version 2.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Bennet Yee contributed the initial versions of 'mpz_jacobi' and
|
|
|
|
|
'mpz_legendre'.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Andreas Schwab contributed the files 'mpn/m68k/lshift.S' and
|
|
|
|
|
'mpn/m68k/rshift.S' (now in '.asm' form).
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
The development of the floating point functions of GNU MP 2 was
|
|
|
|
|
supported in part by the ESPRIT-BRA (Basic Research Activities) 6846
|
|
|
|
|
project POSSO (POlynomial System SOlving).
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
GNU MP 2 was finished and released by SWOX AB, SWEDEN, in cooperation
|
|
|
|
|
with the IDA Center for Computing Sciences, USA.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Robert Harley of Inria, France and David Seal of ARM, England,
|
|
|
|
|
suggested clever improvements for population count.
|
|
|
|
|
|
|
|
|
|
Robert Harley also wrote highly optimized Karatsuba and 3-way Toom
|
|
|
|
|
multiplication functions for GMP 3. He also contributed the ARM
|
|
|
|
|
assembly code.
|
|
|
|
|
|
|
|
|
|
Torsten Ekedahl of the Mathematical department of Stockholm
|
2015-06-09 15:33:32 -04:00
|
|
|
|
University provided significant inspiration during several phases of the
|
|
|
|
|
GMP development. His mathematical expertise helped improve several
|
2008-04-17 17:03:07 -04:00
|
|
|
|
algorithms.
|
|
|
|
|
|
|
|
|
|
Paul Zimmermann wrote the Divide and Conquer division code, the REDC
|
|
|
|
|
code, the REDC-based mpz_powm code, the FFT multiply code, and the
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Karatsuba square root code. He also rewrote the Toom3 code for GMP 4.2.
|
|
|
|
|
The ECMNET project Paul is organizing was a driving force behind many of
|
|
|
|
|
the optimizations in GMP 3.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Linus Nordberg wrote the new configure system based on autoconf and
|
|
|
|
|
implemented the new random functions.
|
|
|
|
|
|
|
|
|
|
Kent Boortz made the Mac OS 9 port.
|
|
|
|
|
|
|
|
|
|
Kevin Ryde worked on a number of things: optimized x86 code, m4 asm
|
|
|
|
|
macros, parameter tuning, speed measuring, the configure system,
|
|
|
|
|
function inlining, divisibility tests, bit scanning, Jacobi symbols,
|
|
|
|
|
Fibonacci and Lucas number functions, printf and scanf functions, perl
|
2015-06-09 15:33:32 -04:00
|
|
|
|
interface, demo expression parser, the algorithms chapter in the manual,
|
|
|
|
|
'gmpasm-mode.el', and various miscellaneous improvements elsewhere.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Steve Root helped write the optimized alpha 21264 assembly code.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Gerardo Ballabio wrote the 'gmpxx.h' C++ class interface and the C++
|
|
|
|
|
'istream' input routines.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
GNU MP 4 was finished and released by Torbjorn Granlund and Kevin
|
|
|
|
|
Ryde. Torbjorn's work was partially funded by the IDA Center for
|
|
|
|
|
Computing Sciences, USA.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Jason Moxham rewrote 'mpz_fac_ui'.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Pedro Gimeno implemented the Mersenne Twister and made other random
|
|
|
|
|
number improvements.
|
|
|
|
|
|
|
|
|
|
(This list is chronological, not ordered by significance. If you
|
2008-07-05 21:31:28 -04:00
|
|
|
|
have contributed to GMP/MPIR but are not listed above, please tell
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<http://groups.google.com/group/mpir-devel> about the omission!)
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Thanks go to Hans Thorsen for donating an SGI system for the GMP test
|
|
|
|
|
system environment.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2008-07-05 21:31:28 -04:00
|
|
|
|
In 2008 GMP was forked and gave rise to the MPIR (Multiple Precision
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Integers and Rationals) project. In 2010 version 2.0.0 of MPIR switched
|
2010-04-04 23:35:52 -04:00
|
|
|
|
to LGPL v3+ and much code from GMP was again incorporated into MPIR.
|
|
|
|
|
|
|
|
|
|
The MPIR project has largely been a collaboration of William Hart,
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Brian Gladman and Jason Moxham. MPIR code not obtained from GMP and not
|
|
|
|
|
specifically mentioned elsewhere below is likely written by one of these
|
|
|
|
|
three.
|
2010-04-04 23:35:52 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
William Hart did much of the early MPIR coding including build system
|
|
|
|
|
fixes. His contributions also include Toom 4 and 7 code and variants,
|
|
|
|
|
extended GCD based on Niels Moller's ngcd work, and asymptotically fast
|
|
|
|
|
division code. He does much of the release management work.
|
2010-04-04 23:35:52 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Brian Gladman wrote and maintains MSVC project files. He has also
|
|
|
|
|
done much of the conversion of assembly code to yasm format. He rewrote
|
2010-04-04 23:35:52 -04:00
|
|
|
|
the benchmark program and developed MSVC ports of tune, speed, try and
|
|
|
|
|
the benchmark code. He helped with many aspects of the merging of GMP
|
|
|
|
|
code into MPIR after the switch to LGPL v3+.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Jason Moxham has contributed a great deal of x86 assembly code. He
|
|
|
|
|
has also contributed improved root code and mulhi and mullo routines and
|
|
|
|
|
implemented Peter Montgomery's single limb remainder algorithm. He has
|
|
|
|
|
also contributed a command line build system for Windows and numerous
|
|
|
|
|
build system fixes.
|
2010-04-04 23:35:52 -04:00
|
|
|
|
|
|
|
|
|
The following people have either contributed directly to the MPIR
|
|
|
|
|
project, made code available on their websites or contributed code to
|
|
|
|
|
the official GNU project which has been used in MPIR.
|
2008-07-05 21:31:28 -04:00
|
|
|
|
|
|
|
|
|
Pierrick Gaudry wrote some fast assembly support for AMD 64.
|
|
|
|
|
|
|
|
|
|
Jason Martin wrote some fast assembly patches for Core 2 and
|
2015-06-09 15:33:32 -04:00
|
|
|
|
converted them to Intel format. He also did the initial merge of Niels
|
2010-04-04 23:35:52 -04:00
|
|
|
|
Moller's fast GCD patches. He wrote fast addmul functions for Itanium.
|
2008-07-05 21:31:28 -04:00
|
|
|
|
|
|
|
|
|
Gonzalo Tornaria helped patch config.guess and associated files to
|
2015-06-09 15:33:32 -04:00
|
|
|
|
distinguish modern processors. He also patched mpirbench.
|
2008-07-05 21:31:28 -04:00
|
|
|
|
|
|
|
|
|
Michael Abshoff helped resolve some build issues on various
|
2015-06-09 15:33:32 -04:00
|
|
|
|
platforms. He served for a while as release manager for the MPIR
|
2009-10-19 04:15:15 -04:00
|
|
|
|
project.
|
|
|
|
|
|
|
|
|
|
Mariah Lennox contributed patches to mpirbench and various build
|
2015-06-09 15:33:32 -04:00
|
|
|
|
failure reports. She has also reported gcc bugs found during MPIR
|
2010-04-04 23:35:52 -04:00
|
|
|
|
development.
|
2009-10-19 04:15:15 -04:00
|
|
|
|
|
2010-04-04 23:35:52 -04:00
|
|
|
|
Niels Moller wrote the fast ngcd code for computing integer GCD, the
|
2010-04-07 18:38:18 -04:00
|
|
|
|
quadratic Hensel division code and precomputed inverse code for
|
2015-10-03 08:41:18 -04:00
|
|
|
|
Euclidean division, along with fast Jacobi symbol code. He also made
|
|
|
|
|
contributions to the Toom multiply code, especially helper functions to
|
|
|
|
|
simplify Toom evaluations.
|
2009-10-19 04:15:15 -04:00
|
|
|
|
|
|
|
|
|
Pierrick Gaudry provided initial AMD 64 assembly support and revised
|
|
|
|
|
the FFT code.
|
|
|
|
|
|
|
|
|
|
Paul Zimmermann provided an mpz implementation of Toom 4, wrote much
|
2015-06-09 15:33:32 -04:00
|
|
|
|
of the FFT code, wrote some of the rootrem code and contributed invert.c
|
|
|
|
|
for computing precomputed inverses.
|
2009-10-19 04:15:15 -04:00
|
|
|
|
|
|
|
|
|
Alexander Kruppa revised the FFT code.
|
|
|
|
|
|
2010-04-04 23:35:52 -04:00
|
|
|
|
Torbjorn Granlund revised the FFT code and wrote a lot of division
|
2015-06-09 15:33:32 -04:00
|
|
|
|
code, including the quadratic Euclidean division code, many parts of the
|
|
|
|
|
divide and conquer division code, both Hensel and Euclidean, and his
|
|
|
|
|
code was also reused for parts of the asymptotically fast division code.
|
|
|
|
|
He also helped write the root code and wrote much of the Itanium
|
2010-04-04 23:35:52 -04:00
|
|
|
|
assembly code and a couple of Core 2 assembly functions and part of the
|
2015-06-09 15:33:32 -04:00
|
|
|
|
basecase middle product assembly code for x86 64 bit. He also wrote the
|
2010-04-04 23:35:52 -04:00
|
|
|
|
improved string input and output code and made improvements to the GCD
|
2015-10-03 08:41:18 -04:00
|
|
|
|
and extended GCD code. He also contributed the nextprime code and
|
|
|
|
|
coauthored the bin_uiui code. Torbjorn is also responsible for numerous
|
|
|
|
|
other bits and pieces that have been used from the GNU project.
|
2010-04-04 23:35:52 -04:00
|
|
|
|
|
|
|
|
|
Marco Bodrato and Alberto Zanoni suggested the unbalanced multiply
|
|
|
|
|
strategy and found optimal Toom multiplication sequences.
|
2009-10-19 04:15:15 -04:00
|
|
|
|
|
2010-04-04 23:35:52 -04:00
|
|
|
|
Marco Bodrato wrote an mpz implementation of the Toom 7 code and
|
2015-06-09 15:33:32 -04:00
|
|
|
|
wrote most of the Toom 8.5 multiply and squaring code. He also helped
|
2015-10-03 08:41:18 -04:00
|
|
|
|
write the divide and conquer Euclidean division code. He also
|
|
|
|
|
contributed many improved number theoretical functions including
|
|
|
|
|
factorial, multi-factorial, primorial, n-choose-k.
|
2009-10-19 04:15:15 -04:00
|
|
|
|
|
|
|
|
|
Robert Gerbicz contributed fast factorial code.
|
|
|
|
|
|
2015-10-03 08:41:18 -04:00
|
|
|
|
Martin Boij made assorted contributions to the nextprime code.
|
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
David Harvey wrote fast middle product code and divide and conquer
|
2010-04-04 23:35:52 -04:00
|
|
|
|
approximate quotient code for both Euclidean and Hensel division and
|
|
|
|
|
contributed to the quadratic Hensel code.
|
2009-10-19 04:15:15 -04:00
|
|
|
|
|
|
|
|
|
T. R. Nicely wrote primality tests used in the benchmark code.
|
|
|
|
|
|
|
|
|
|
Jeff Gilchrist assisted with the porting of T. R. Nicely's primality
|
2012-10-25 18:17:55 -04:00
|
|
|
|
code to MPIR and helped with tuning.
|
2008-07-05 21:31:28 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
Peter Shrimpton wrote the BPSW primality test used up to
|
|
|
|
|
GMP_LIMB_BITS.
|
|
|
|
|
|
2010-04-04 23:35:52 -04:00
|
|
|
|
Thanks to Microsoft for supporting Jason Moxham to work on a command
|
|
|
|
|
line build system for Windows and some assembly improvements for
|
|
|
|
|
Windows.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Thanks to the Free Software Foundation France for giving us access to
|
|
|
|
|
their build farm.
|
2010-04-04 23:35:52 -04:00
|
|
|
|
|
|
|
|
|
Thanks to William Stein for giving us access to his sage.math
|
|
|
|
|
machines for testing and for hosting the MPIR website, and for
|
|
|
|
|
supporting us in innumerably many other ways.
|
|
|
|
|
|
2010-06-08 18:54:27 -04:00
|
|
|
|
Minh Van Nguyen served as release manager for MPIR 2.1.0.
|
|
|
|
|
|
2012-10-03 12:55:33 -04:00
|
|
|
|
Case Vanhorsen helped with release testing.
|
|
|
|
|
|
|
|
|
|
David Cleaver filed a bug report.
|
|
|
|
|
|
2012-10-25 18:17:55 -04:00
|
|
|
|
Julien Puydt provided tuning values.
|
|
|
|
|
|
|
|
|
|
Leif Lionhardy provided tuning values.
|
|
|
|
|
|
|
|
|
|
Jean-Pierre Flori provided tuning values.
|
|
|
|
|
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: References, Next: GNU Free Documentation License, Prev: Contributors, Up: Top
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2014-02-04 08:24:05 -05:00
|
|
|
|
Appendix B References
|
|
|
|
|
*********************
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2008-06-16 00:39:47 -04:00
|
|
|
|
B.1 Books
|
|
|
|
|
=========
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
* Jonathan M. Borwein and Peter B. Borwein, "Pi and the AGM: A Study
|
|
|
|
|
in Analytic Number Theory and Computational Complexity", Wiley,
|
|
|
|
|
1998.
|
|
|
|
|
|
|
|
|
|
* Henri Cohen, "A Course in Computational Algebraic Number Theory",
|
|
|
|
|
Graduate Texts in Mathematics number 138, Springer-Verlag, 1993.
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<http://www.math.u-bordeaux.fr/~cohen/>
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
* Richard Crandall, Carl Pomerance, "Prime Numbers: A Computational
|
|
|
|
|
Perspective" 2nd edition, Springer, 2005.
|
|
|
|
|
|
2008-04-17 17:03:07 -04:00
|
|
|
|
* Donald E. Knuth, "The Art of Computer Programming", volume 2,
|
|
|
|
|
"Seminumerical Algorithms", 3rd edition, Addison-Wesley, 1998.
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<http://www-cs-faculty.stanford.edu/~knuth/taocp.html>
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
* John D. Lipson, "Elements of Algebra and Algebraic Computing", The
|
|
|
|
|
Benjamin Cummings Publishing Company Inc, 1981.
|
|
|
|
|
|
|
|
|
|
* Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone,
|
|
|
|
|
"Handbook of Applied Cryptography",
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<http://www.cacr.math.uwaterloo.ca/hac/>
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
* Richard M. Stallman, "Using and Porting GCC", Free Software
|
|
|
|
|
Foundation, 1999, available online
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<http://gcc.gnu.org/onlinedocs/>, and in the GCC package
|
|
|
|
|
<ftp://ftp.gnu.org/gnu/gcc/>
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2008-06-16 00:39:47 -04:00
|
|
|
|
B.2 Papers
|
|
|
|
|
==========
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
* Dan Bernstein, "Detecting perfect powers in essentially linear
|
2015-06-09 15:33:32 -04:00
|
|
|
|
time", Math. Comp. (67) pp. 1253-1283, 1998.
|
2009-12-11 13:56:41 -05:00
|
|
|
|
|
2008-04-17 17:03:07 -04:00
|
|
|
|
* Yves Bertot, Nicolas Magaud and Paul Zimmermann, "A Proof of GMP
|
|
|
|
|
Square Root", Journal of Automated Reasoning, volume 29, 2002, pp.
|
2015-06-09 15:33:32 -04:00
|
|
|
|
225-252. Also available online as INRIA Research Report 4475, June
|
|
|
|
|
2001, <http://www.inria.fr/rrrt/rr-4475.html>
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
* Marco Bodrato, Alberto Zanoni, "Integer and Polynomial
|
|
|
|
|
Multiplication: Towards optimal Toom-Cook Matrices", ISSAC 2007
|
|
|
|
|
Proceedings, Ontario, Canada, July 29 - August 1, 2007, ACM Press.
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Available online at <http://ln.bodrato.it/issac2007_pdf>
|
2009-12-11 13:56:41 -05:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
* Marco Bodrato, "High degree Toom'n'half for balanced and unbalanced
|
|
|
|
|
multiplication", E. Antelo, D. Hough and P. Ienne, editors,
|
|
|
|
|
Proceedings of the 20th IEEE Symposium on Computer Arithmetic,
|
|
|
|
|
IEEE, Tubingen, Germany, July 25-27, 2011, pp. 15-22. See
|
|
|
|
|
<http://bodrato.it/papers>
|
2012-10-03 17:22:07 -04:00
|
|
|
|
|
2010-01-16 18:52:29 -05:00
|
|
|
|
* Richard Brent and Paul Zimmermann, "Modern Computer Arithmetic",
|
2010-01-16 18:32:43 -05:00
|
|
|
|
version 0.4, November 2009,
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<http://www.loria.fr/~zimmerma/mca/mca-0.4.pdf>
|
2010-01-16 18:32:43 -05:00
|
|
|
|
|
2008-04-17 17:03:07 -04:00
|
|
|
|
* Christoph Burnikel and Joachim Ziegler, "Fast Recursive Division",
|
|
|
|
|
Max-Planck-Institut fuer Informatik Research Report MPI-I-98-1-022,
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<http://data.mpi-sb.mpg.de/internet/reports.nsf/NumberView/1998-1-022>
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
* Agner Fog, "Software optimization resources", online at
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<http://www.agner.org/optimize/>
|
2009-12-11 13:56:41 -05:00
|
|
|
|
|
|
|
|
|
* Pierrick Gaudry, Alexander Kruppa, Paul Zimmermann, "A GMP-based
|
|
|
|
|
implementation of Schoenhage-Strassen's large integer
|
2015-06-09 15:33:32 -04:00
|
|
|
|
multiplication algorithm", ISSAC 2007 Proceedings, Ontario, Canada,
|
|
|
|
|
July 29 - August 1, 2007, pp. 167-174, ACM Press. Full text
|
|
|
|
|
available at
|
|
|
|
|
<http://hal.inria.fr/docs/00/14/86/20/PDF/fft.final.pdf>
|
2009-12-11 13:56:41 -05:00
|
|
|
|
|
2008-04-17 17:03:07 -04:00
|
|
|
|
* Torbjorn Granlund and Peter L. Montgomery, "Division by Invariant
|
|
|
|
|
Integers using Multiplication", in Proceedings of the SIGPLAN
|
|
|
|
|
PLDI'94 Conference, June 1994. Also available
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<ftp://ftp.cwi.nl/pub/pmontgom/divcnst.psa4.gz> (and .psl.gz).
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2010-04-04 23:53:49 -04:00
|
|
|
|
* Niels Mo"ller and Torbjo"rn Granlund, "Improved division by
|
|
|
|
|
invariant integers", to appear.
|
|
|
|
|
|
|
|
|
|
* Torbjo"rn Granlund and Niels Mo"ller, "Division of integers large
|
|
|
|
|
and small", to appear.
|
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
* David Harvey, "The Karatsuba middle product for integers",
|
2015-06-09 15:33:32 -04:00
|
|
|
|
(preprint), 2009. Available at
|
|
|
|
|
<http://www.cims.nyu.edu/~harvey/mulmid/mulmid.pdf>
|
2009-12-11 13:56:41 -05:00
|
|
|
|
|
2008-04-17 17:03:07 -04:00
|
|
|
|
* Tudor Jebelean, "An algorithm for exact division", Journal of
|
|
|
|
|
Symbolic Computation, volume 15, 1993, pp. 169-180. Research
|
|
|
|
|
report version available
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-35.ps.gz>
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
* Tudor Jebelean, "Exact Division with Karatsuba Complexity -
|
|
|
|
|
Extended Abstract", RISC-Linz technical report 96-31,
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-31.ps.gz>
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
* Tudor Jebelean, "Practical Integer Division with Karatsuba
|
|
|
|
|
Complexity", ISSAC 97, pp. 339-341. Technical report available
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-29.ps.gz>
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
* Tudor Jebelean, "A Generalization of the Binary GCD Algorithm",
|
|
|
|
|
ISSAC 93, pp. 111-116. Technical report version available
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1993/93-01.ps.gz>
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
* Tudor Jebelean, "A Double-Digit Lehmer-Euclid Algorithm for Finding
|
|
|
|
|
the GCD of Long Integers", Journal of Symbolic Computation, volume
|
|
|
|
|
19, 1995, pp. 145-157. Technical report version also available
|
|
|
|
|
<ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-69.ps.gz>
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
* Werner Krandick, Jeremy R. Johnson, "Efficient Multiprecision
|
|
|
|
|
Floating Point Multiplication with Exact Rounding", Technical
|
|
|
|
|
Report, RISC Linz, 1993, available at
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1993/93-76.ps.gz>
|
2009-12-11 13:56:41 -05:00
|
|
|
|
|
2008-04-17 17:03:07 -04:00
|
|
|
|
* Werner Krandick and Tudor Jebelean, "Bidirectional Exact Integer
|
|
|
|
|
Division", Journal of Symbolic Computation, volume 21, 1996, pp.
|
|
|
|
|
441-455. Early technical report version also available
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1994/94-50.ps.gz>
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
* Makoto Matsumoto and Takuji Nishimura, "Mersenne Twister: A
|
|
|
|
|
623-dimensionally equidistributed uniform pseudorandom number
|
|
|
|
|
generator", ACM Transactions on Modelling and Computer Simulation,
|
|
|
|
|
volume 8, January 1998, pp. 3-30. Available online
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<http://www.math.keio.ac.jp/~nisimura/random/doc/mt.ps.gz> (or
|
2008-04-17 17:03:07 -04:00
|
|
|
|
.pdf)
|
|
|
|
|
|
|
|
|
|
* R. Moenck and A. Borodin, "Fast Modular Transforms via Division",
|
|
|
|
|
Proceedings of the 13th Annual IEEE Symposium on Switching and
|
|
|
|
|
Automata Theory, October 1972, pp. 90-96. Reprinted as "Fast
|
|
|
|
|
Modular Transforms", Journal of Computer and System Sciences,
|
|
|
|
|
volume 8, number 3, June 1974, pp. 366-386.
|
|
|
|
|
|
2010-04-04 16:17:21 -04:00
|
|
|
|
* Niels Mo"ller, "On Schoenhage's algorithm and subquadratic integer
|
2015-06-09 15:33:32 -04:00
|
|
|
|
GCD computation", Math. Comp. 2007. Available online at
|
|
|
|
|
<http://www.lysator.liu.se/~nisse/archive/S0025-5718-07-02017-0.pdf>
|
2009-12-11 13:56:41 -05:00
|
|
|
|
|
2008-04-17 17:03:07 -04:00
|
|
|
|
* Peter L. Montgomery, "Modular Multiplication Without Trial
|
|
|
|
|
Division", in Mathematics of Computation, volume 44, number 170,
|
|
|
|
|
April 1985.
|
|
|
|
|
|
2009-12-11 13:56:41 -05:00
|
|
|
|
* Thom Mulders, "On short multiplications and divisions", Appl.
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Algebra Engrg. Comm. Comput. 11 (2000), no. 1, pp. 69-88.
|
|
|
|
|
Tech. report No. 276, Dept. of Comp. Sci., ETH Zurich, Nov
|
|
|
|
|
1997, available online at
|
|
|
|
|
<ftp://ftp.inf.ethz.ch/pub/publications/tech-reports/2xx/276.pdf>
|
2009-12-11 13:56:41 -05:00
|
|
|
|
|
2008-04-17 17:03:07 -04:00
|
|
|
|
* Arnold Scho"nhage and Volker Strassen, "Schnelle Multiplikation
|
|
|
|
|
grosser Zahlen", Computing 7, 1971, pp. 281-292.
|
|
|
|
|
|
2010-04-04 16:17:21 -04:00
|
|
|
|
* A. Scho"nhage, A. F. W. Grotefeld and E. Vetter, "Fast Algorithms,
|
2015-06-09 15:33:32 -04:00
|
|
|
|
A Multitape Turing Machine Implementation", BI Wissenschafts-Verlag,
|
|
|
|
|
Mannheim, 1994.
|
2010-04-04 16:17:21 -04:00
|
|
|
|
|
2008-04-17 17:03:07 -04:00
|
|
|
|
* Kenneth Weber, "The accelerated integer GCD algorithm", ACM
|
|
|
|
|
Transactions on Mathematical Software, volume 21, number 1, March
|
|
|
|
|
1995, pp. 111-122.
|
|
|
|
|
|
|
|
|
|
* Paul Zimmermann, "Karatsuba Square Root", INRIA Research Report
|
2015-06-09 15:33:32 -04:00
|
|
|
|
3805, November 1999, <http://www.inria.fr/rrrt/rr-3805.html>
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
* Paul Zimmermann, "A Proof of GMP Fast Division and Square Root
|
|
|
|
|
Implementations",
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<http://www.loria.fr/~zimmerma/papers/proof-div-sqrt.ps.gz>
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
* Dan Zuras, "On Squaring and Multiplying Large Integers", ARITH-11:
|
|
|
|
|
IEEE Symposium on Computer Arithmetic, 1993, pp. 260 to 271.
|
|
|
|
|
Reprinted as "More on Multiplying and Squaring Large Integers",
|
|
|
|
|
IEEE Transactions on Computers, volume 43, number 8, August 1994,
|
|
|
|
|
pp. 899-908.
|
|
|
|
|
|
|
|
|
|
|
2009-02-12 09:17:32 -05:00
|
|
|
|
File: mpir.info, Node: GNU Free Documentation License, Next: Concept Index, Prev: References, Up: Top
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2014-02-04 08:24:05 -05:00
|
|
|
|
Appendix C GNU Free Documentation License
|
|
|
|
|
*****************************************
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2010-04-04 13:44:45 -04:00
|
|
|
|
Version 1.3, 3 November 2008
|
2008-06-16 00:39:47 -04:00
|
|
|
|
|
2010-04-04 13:44:45 -04:00
|
|
|
|
Copyright (C) 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
|
2015-06-09 15:33:32 -04:00
|
|
|
|
<http://fsf.org/>
|
2008-06-16 00:39:47 -04:00
|
|
|
|
|
2008-04-17 17:03:07 -04:00
|
|
|
|
Everyone is permitted to copy and distribute verbatim copies
|
|
|
|
|
of this license document, but changing it is not allowed.
|
|
|
|
|
|
|
|
|
|
0. PREAMBLE
|
|
|
|
|
|
|
|
|
|
The purpose of this License is to make a manual, textbook, or other
|
|
|
|
|
functional and useful document "free" in the sense of freedom: to
|
|
|
|
|
assure everyone the effective freedom to copy and redistribute it,
|
|
|
|
|
with or without modifying it, either commercially or
|
|
|
|
|
noncommercially. Secondarily, this License preserves for the
|
|
|
|
|
author and publisher a way to get credit for their work, while not
|
|
|
|
|
being considered responsible for modifications made by others.
|
|
|
|
|
|
|
|
|
|
This License is a kind of "copyleft", which means that derivative
|
|
|
|
|
works of the document must themselves be free in the same sense.
|
|
|
|
|
It complements the GNU General Public License, which is a copyleft
|
|
|
|
|
license designed for free software.
|
|
|
|
|
|
|
|
|
|
We have designed this License in order to use it for manuals for
|
|
|
|
|
free software, because free software needs free documentation: a
|
|
|
|
|
free program should come with manuals providing the same freedoms
|
|
|
|
|
that the software does. But this License is not limited to
|
|
|
|
|
software manuals; it can be used for any textual work, regardless
|
2015-06-09 15:33:32 -04:00
|
|
|
|
of subject matter or whether it is published as a printed book. We
|
|
|
|
|
recommend this License principally for works whose purpose is
|
2008-04-17 17:03:07 -04:00
|
|
|
|
instruction or reference.
|
|
|
|
|
|
|
|
|
|
1. APPLICABILITY AND DEFINITIONS
|
|
|
|
|
|
|
|
|
|
This License applies to any manual or other work, in any medium,
|
2015-06-09 15:33:32 -04:00
|
|
|
|
that contains a notice placed by the copyright holder saying it can
|
|
|
|
|
be distributed under the terms of this License. Such a notice
|
2008-04-17 17:03:07 -04:00
|
|
|
|
grants a world-wide, royalty-free license, unlimited in duration,
|
|
|
|
|
to use that work under the conditions stated herein. The
|
|
|
|
|
"Document", below, refers to any such manual or work. Any member
|
2015-06-09 15:33:32 -04:00
|
|
|
|
of the public is a licensee, and is addressed as "you". You accept
|
|
|
|
|
the license if you copy, modify or distribute the work in a way
|
|
|
|
|
requiring permission under copyright law.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
A "Modified Version" of the Document means any work containing the
|
|
|
|
|
Document or a portion of it, either copied verbatim, or with
|
|
|
|
|
modifications and/or translated into another language.
|
|
|
|
|
|
|
|
|
|
A "Secondary Section" is a named appendix or a front-matter section
|
|
|
|
|
of the Document that deals exclusively with the relationship of the
|
|
|
|
|
publishers or authors of the Document to the Document's overall
|
|
|
|
|
subject (or to related matters) and contains nothing that could
|
|
|
|
|
fall directly within that overall subject. (Thus, if the Document
|
|
|
|
|
is in part a textbook of mathematics, a Secondary Section may not
|
|
|
|
|
explain any mathematics.) The relationship could be a matter of
|
|
|
|
|
historical connection with the subject or with related matters, or
|
|
|
|
|
of legal, commercial, philosophical, ethical or political position
|
|
|
|
|
regarding them.
|
|
|
|
|
|
|
|
|
|
The "Invariant Sections" are certain Secondary Sections whose
|
2015-06-09 15:33:32 -04:00
|
|
|
|
titles are designated, as being those of Invariant Sections, in the
|
|
|
|
|
notice that says that the Document is released under this License.
|
|
|
|
|
If a section does not fit the above definition of Secondary then it
|
|
|
|
|
is not allowed to be designated as Invariant. The Document may
|
|
|
|
|
contain zero Invariant Sections. If the Document does not identify
|
|
|
|
|
any Invariant Sections then there are none.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
The "Cover Texts" are certain short passages of text that are
|
|
|
|
|
listed, as Front-Cover Texts or Back-Cover Texts, in the notice
|
|
|
|
|
that says that the Document is released under this License. A
|
|
|
|
|
Front-Cover Text may be at most 5 words, and a Back-Cover Text may
|
|
|
|
|
be at most 25 words.
|
|
|
|
|
|
|
|
|
|
A "Transparent" copy of the Document means a machine-readable copy,
|
|
|
|
|
represented in a format whose specification is available to the
|
|
|
|
|
general public, that is suitable for revising the document
|
2015-06-09 15:33:32 -04:00
|
|
|
|
straightforwardly with generic text editors or (for images composed
|
|
|
|
|
of pixels) generic paint programs or (for drawings) some widely
|
|
|
|
|
available drawing editor, and that is suitable for input to text
|
|
|
|
|
formatters or for automatic translation to a variety of formats
|
|
|
|
|
suitable for input to text formatters. A copy made in an otherwise
|
|
|
|
|
Transparent file format whose markup, or absence of markup, has
|
|
|
|
|
been arranged to thwart or discourage subsequent modification by
|
|
|
|
|
readers is not Transparent. An image format is not Transparent if
|
|
|
|
|
used for any substantial amount of text. A copy that is not
|
|
|
|
|
"Transparent" is called "Opaque".
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
Examples of suitable formats for Transparent copies include plain
|
|
|
|
|
ASCII without markup, Texinfo input format, LaTeX input format,
|
2015-06-09 15:33:32 -04:00
|
|
|
|
SGML or XML using a publicly available DTD, and standard-conforming
|
|
|
|
|
simple HTML, PostScript or PDF designed for human modification.
|
|
|
|
|
Examples of transparent image formats include PNG, XCF and JPG.
|
|
|
|
|
Opaque formats include proprietary formats that can be read and
|
|
|
|
|
edited only by proprietary word processors, SGML or XML for which
|
|
|
|
|
the DTD and/or processing tools are not generally available, and
|
|
|
|
|
the machine-generated HTML, PostScript or PDF produced by some word
|
|
|
|
|
processors for output purposes only.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
The "Title Page" means, for a printed book, the title page itself,
|
|
|
|
|
plus such following pages as are needed to hold, legibly, the
|
|
|
|
|
material this License requires to appear in the title page. For
|
|
|
|
|
works in formats which do not have any title page as such, "Title
|
|
|
|
|
Page" means the text near the most prominent appearance of the
|
|
|
|
|
work's title, preceding the beginning of the body of the text.
|
|
|
|
|
|
2010-04-04 13:44:45 -04:00
|
|
|
|
The "publisher" means any person or entity that distributes copies
|
|
|
|
|
of the Document to the public.
|
|
|
|
|
|
2008-04-17 17:03:07 -04:00
|
|
|
|
A section "Entitled XYZ" means a named subunit of the Document
|
|
|
|
|
whose title either is precisely XYZ or contains XYZ in parentheses
|
|
|
|
|
following text that translates XYZ in another language. (Here XYZ
|
|
|
|
|
stands for a specific section name mentioned below, such as
|
|
|
|
|
"Acknowledgements", "Dedications", "Endorsements", or "History".)
|
|
|
|
|
To "Preserve the Title" of such a section when you modify the
|
|
|
|
|
Document means that it remains a section "Entitled XYZ" according
|
|
|
|
|
to this definition.
|
|
|
|
|
|
|
|
|
|
The Document may include Warranty Disclaimers next to the notice
|
|
|
|
|
which states that this License applies to the Document. These
|
|
|
|
|
Warranty Disclaimers are considered to be included by reference in
|
|
|
|
|
this License, but only as regards disclaiming warranties: any other
|
|
|
|
|
implication that these Warranty Disclaimers may have is void and
|
|
|
|
|
has no effect on the meaning of this License.
|
|
|
|
|
|
|
|
|
|
2. VERBATIM COPYING
|
|
|
|
|
|
|
|
|
|
You may copy and distribute the Document in any medium, either
|
|
|
|
|
commercially or noncommercially, provided that this License, the
|
|
|
|
|
copyright notices, and the license notice saying this License
|
|
|
|
|
applies to the Document are reproduced in all copies, and that you
|
|
|
|
|
add no other conditions whatsoever to those of this License. You
|
|
|
|
|
may not use technical measures to obstruct or control the reading
|
|
|
|
|
or further copying of the copies you make or distribute. However,
|
|
|
|
|
you may accept compensation in exchange for copies. If you
|
2015-06-09 15:33:32 -04:00
|
|
|
|
distribute a large enough number of copies you must also follow the
|
|
|
|
|
conditions in section 3.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
You may also lend copies, under the same conditions stated above,
|
|
|
|
|
and you may publicly display copies.
|
|
|
|
|
|
|
|
|
|
3. COPYING IN QUANTITY
|
|
|
|
|
|
|
|
|
|
If you publish printed copies (or copies in media that commonly
|
|
|
|
|
have printed covers) of the Document, numbering more than 100, and
|
|
|
|
|
the Document's license notice requires Cover Texts, you must
|
|
|
|
|
enclose the copies in covers that carry, clearly and legibly, all
|
|
|
|
|
these Cover Texts: Front-Cover Texts on the front cover, and
|
|
|
|
|
Back-Cover Texts on the back cover. Both covers must also clearly
|
|
|
|
|
and legibly identify you as the publisher of these copies. The
|
2015-06-09 15:33:32 -04:00
|
|
|
|
front cover must present the full title with all words of the title
|
|
|
|
|
equally prominent and visible. You may add other material on the
|
|
|
|
|
covers in addition. Copying with changes limited to the covers, as
|
|
|
|
|
long as they preserve the title of the Document and satisfy these
|
|
|
|
|
conditions, can be treated as verbatim copying in other respects.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
If the required texts for either cover are too voluminous to fit
|
|
|
|
|
legibly, you should put the first ones listed (as many as fit
|
|
|
|
|
reasonably) on the actual cover, and continue the rest onto
|
|
|
|
|
adjacent pages.
|
|
|
|
|
|
|
|
|
|
If you publish or distribute Opaque copies of the Document
|
2015-06-09 15:33:32 -04:00
|
|
|
|
numbering more than 100, you must either include a machine-readable
|
|
|
|
|
Transparent copy along with each Opaque copy, or state in or with
|
|
|
|
|
each Opaque copy a computer-network location from which the general
|
|
|
|
|
network-using public has access to download using public-standard
|
|
|
|
|
network protocols a complete Transparent copy of the Document, free
|
|
|
|
|
of added material. If you use the latter option, you must take
|
|
|
|
|
reasonably prudent steps, when you begin distribution of Opaque
|
|
|
|
|
copies in quantity, to ensure that this Transparent copy will
|
|
|
|
|
remain thus accessible at the stated location until at least one
|
|
|
|
|
year after the last time you distribute an Opaque copy (directly or
|
|
|
|
|
through your agents or retailers) of that edition to the public.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
It is requested, but not required, that you contact the authors of
|
2015-06-09 15:33:32 -04:00
|
|
|
|
the Document well before redistributing any large number of copies,
|
|
|
|
|
to give them a chance to provide you with an updated version of the
|
|
|
|
|
Document.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
4. MODIFICATIONS
|
|
|
|
|
|
|
|
|
|
You may copy and distribute a Modified Version of the Document
|
|
|
|
|
under the conditions of sections 2 and 3 above, provided that you
|
2015-06-09 15:33:32 -04:00
|
|
|
|
release the Modified Version under precisely this License, with the
|
|
|
|
|
Modified Version filling the role of the Document, thus licensing
|
|
|
|
|
distribution and modification of the Modified Version to whoever
|
|
|
|
|
possesses a copy of it. In addition, you must do these things in
|
|
|
|
|
the Modified Version:
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
A. Use in the Title Page (and on the covers, if any) a title
|
2015-06-09 15:33:32 -04:00
|
|
|
|
distinct from that of the Document, and from those of previous
|
|
|
|
|
versions (which should, if there were any, be listed in the
|
|
|
|
|
History section of the Document). You may use the same title
|
|
|
|
|
as a previous version if the original publisher of that
|
|
|
|
|
version gives permission.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
B. List on the Title Page, as authors, one or more persons or
|
|
|
|
|
entities responsible for authorship of the modifications in
|
|
|
|
|
the Modified Version, together with at least five of the
|
|
|
|
|
principal authors of the Document (all of its principal
|
|
|
|
|
authors, if it has fewer than five), unless they release you
|
|
|
|
|
from this requirement.
|
|
|
|
|
|
|
|
|
|
C. State on the Title page the name of the publisher of the
|
|
|
|
|
Modified Version, as the publisher.
|
|
|
|
|
|
|
|
|
|
D. Preserve all the copyright notices of the Document.
|
|
|
|
|
|
|
|
|
|
E. Add an appropriate copyright notice for your modifications
|
|
|
|
|
adjacent to the other copyright notices.
|
|
|
|
|
|
|
|
|
|
F. Include, immediately after the copyright notices, a license
|
|
|
|
|
notice giving the public permission to use the Modified
|
|
|
|
|
Version under the terms of this License, in the form shown in
|
|
|
|
|
the Addendum below.
|
|
|
|
|
|
|
|
|
|
G. Preserve in that license notice the full lists of Invariant
|
|
|
|
|
Sections and required Cover Texts given in the Document's
|
|
|
|
|
license notice.
|
|
|
|
|
|
|
|
|
|
H. Include an unaltered copy of this License.
|
|
|
|
|
|
|
|
|
|
I. Preserve the section Entitled "History", Preserve its Title,
|
|
|
|
|
and add to it an item stating at least the title, year, new
|
2015-06-09 15:33:32 -04:00
|
|
|
|
authors, and publisher of the Modified Version as given on the
|
|
|
|
|
Title Page. If there is no section Entitled "History" in the
|
|
|
|
|
Document, create one stating the title, year, authors, and
|
|
|
|
|
publisher of the Document as given on its Title Page, then add
|
|
|
|
|
an item describing the Modified Version as stated in the
|
|
|
|
|
previous sentence.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
J. Preserve the network location, if any, given in the Document
|
|
|
|
|
for public access to a Transparent copy of the Document, and
|
|
|
|
|
likewise the network locations given in the Document for
|
2015-06-09 15:33:32 -04:00
|
|
|
|
previous versions it was based on. These may be placed in the
|
|
|
|
|
"History" section. You may omit a network location for a work
|
|
|
|
|
that was published at least four years before the Document
|
|
|
|
|
itself, or if the original publisher of the version it refers
|
|
|
|
|
to gives permission.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
K. For any section Entitled "Acknowledgements" or "Dedications",
|
2015-06-09 15:33:32 -04:00
|
|
|
|
Preserve the Title of the section, and preserve in the section
|
|
|
|
|
all the substance and tone of each of the contributor
|
2008-04-17 17:03:07 -04:00
|
|
|
|
acknowledgements and/or dedications given therein.
|
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
L. Preserve all the Invariant Sections of the Document, unaltered
|
|
|
|
|
in their text and in their titles. Section numbers or the
|
|
|
|
|
equivalent are not considered part of the section titles.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
M. Delete any section Entitled "Endorsements". Such a section
|
|
|
|
|
may not be included in the Modified Version.
|
|
|
|
|
|
|
|
|
|
N. Do not retitle any existing section to be Entitled
|
|
|
|
|
"Endorsements" or to conflict in title with any Invariant
|
|
|
|
|
Section.
|
|
|
|
|
|
|
|
|
|
O. Preserve any Warranty Disclaimers.
|
|
|
|
|
|
|
|
|
|
If the Modified Version includes new front-matter sections or
|
|
|
|
|
appendices that qualify as Secondary Sections and contain no
|
2015-06-09 15:33:32 -04:00
|
|
|
|
material copied from the Document, you may at your option designate
|
|
|
|
|
some or all of these sections as invariant. To do this, add their
|
|
|
|
|
titles to the list of Invariant Sections in the Modified Version's
|
|
|
|
|
license notice. These titles must be distinct from any other
|
|
|
|
|
section titles.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
You may add a section Entitled "Endorsements", provided it contains
|
|
|
|
|
nothing but endorsements of your Modified Version by various
|
|
|
|
|
parties--for example, statements of peer review or that the text
|
|
|
|
|
has been approved by an organization as the authoritative
|
|
|
|
|
definition of a standard.
|
|
|
|
|
|
|
|
|
|
You may add a passage of up to five words as a Front-Cover Text,
|
2015-06-09 15:33:32 -04:00
|
|
|
|
and a passage of up to 25 words as a Back-Cover Text, to the end of
|
|
|
|
|
the list of Cover Texts in the Modified Version. Only one passage
|
|
|
|
|
of Front-Cover Text and one of Back-Cover Text may be added by (or
|
|
|
|
|
through arrangements made by) any one entity. If the Document
|
|
|
|
|
already includes a cover text for the same cover, previously added
|
|
|
|
|
by you or by arrangement made by the same entity you are acting on
|
|
|
|
|
behalf of, you may not add another; but you may replace the old
|
|
|
|
|
one, on explicit permission from the previous publisher that added
|
|
|
|
|
the old one.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
The author(s) and publisher(s) of the Document do not by this
|
|
|
|
|
License give permission to use their names for publicity for or to
|
|
|
|
|
assert or imply endorsement of any Modified Version.
|
|
|
|
|
|
|
|
|
|
5. COMBINING DOCUMENTS
|
|
|
|
|
|
|
|
|
|
You may combine the Document with other documents released under
|
|
|
|
|
this License, under the terms defined in section 4 above for
|
2015-06-09 15:33:32 -04:00
|
|
|
|
modified versions, provided that you include in the combination all
|
|
|
|
|
of the Invariant Sections of all of the original documents,
|
2008-04-17 17:03:07 -04:00
|
|
|
|
unmodified, and list them all as Invariant Sections of your
|
|
|
|
|
combined work in its license notice, and that you preserve all
|
|
|
|
|
their Warranty Disclaimers.
|
|
|
|
|
|
|
|
|
|
The combined work need only contain one copy of this License, and
|
|
|
|
|
multiple identical Invariant Sections may be replaced with a single
|
|
|
|
|
copy. If there are multiple Invariant Sections with the same name
|
|
|
|
|
but different contents, make the title of each such section unique
|
|
|
|
|
by adding at the end of it, in parentheses, the name of the
|
|
|
|
|
original author or publisher of that section if known, or else a
|
|
|
|
|
unique number. Make the same adjustment to the section titles in
|
|
|
|
|
the list of Invariant Sections in the license notice of the
|
|
|
|
|
combined work.
|
|
|
|
|
|
|
|
|
|
In the combination, you must combine any sections Entitled
|
|
|
|
|
"History" in the various original documents, forming one section
|
|
|
|
|
Entitled "History"; likewise combine any sections Entitled
|
|
|
|
|
"Acknowledgements", and any sections Entitled "Dedications". You
|
|
|
|
|
must delete all sections Entitled "Endorsements."
|
|
|
|
|
|
|
|
|
|
6. COLLECTIONS OF DOCUMENTS
|
|
|
|
|
|
|
|
|
|
You may make a collection consisting of the Document and other
|
|
|
|
|
documents released under this License, and replace the individual
|
|
|
|
|
copies of this License in the various documents with a single copy
|
|
|
|
|
that is included in the collection, provided that you follow the
|
2015-06-09 15:33:32 -04:00
|
|
|
|
rules of this License for verbatim copying of each of the documents
|
|
|
|
|
in all other respects.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
You may extract a single document from such a collection, and
|
|
|
|
|
distribute it individually under this License, provided you insert
|
2015-06-09 15:33:32 -04:00
|
|
|
|
a copy of this License into the extracted document, and follow this
|
|
|
|
|
License in all other respects regarding verbatim copying of that
|
|
|
|
|
document.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
|
|
|
|
7. AGGREGATION WITH INDEPENDENT WORKS
|
|
|
|
|
|
|
|
|
|
A compilation of the Document or its derivatives with other
|
2015-06-09 15:33:32 -04:00
|
|
|
|
separate and independent documents or works, in or on a volume of a
|
|
|
|
|
storage or distribution medium, is called an "aggregate" if the
|
2008-04-17 17:03:07 -04:00
|
|
|
|
copyright resulting from the compilation is not used to limit the
|
|
|
|
|
legal rights of the compilation's users beyond what the individual
|
|
|
|
|
works permit. When the Document is included in an aggregate, this
|
|
|
|
|
License does not apply to the other works in the aggregate which
|
|
|
|
|
are not themselves derivative works of the Document.
|
|
|
|
|
|
|
|
|
|
If the Cover Text requirement of section 3 is applicable to these
|
|
|
|
|
copies of the Document, then if the Document is less than one half
|
|
|
|
|
of the entire aggregate, the Document's Cover Texts may be placed
|
|
|
|
|
on covers that bracket the Document within the aggregate, or the
|
|
|
|
|
electronic equivalent of covers if the Document is in electronic
|
|
|
|
|
form. Otherwise they must appear on printed covers that bracket
|
|
|
|
|
the whole aggregate.
|
|
|
|
|
|
|
|
|
|
8. TRANSLATION
|
|
|
|
|
|
|
|
|
|
Translation is considered a kind of modification, so you may
|
|
|
|
|
distribute translations of the Document under the terms of section
|
|
|
|
|
4. Replacing Invariant Sections with translations requires special
|
|
|
|
|
permission from their copyright holders, but you may include
|
|
|
|
|
translations of some or all Invariant Sections in addition to the
|
|
|
|
|
original versions of these Invariant Sections. You may include a
|
|
|
|
|
translation of this License, and all the license notices in the
|
|
|
|
|
Document, and any Warranty Disclaimers, provided that you also
|
|
|
|
|
include the original English version of this License and the
|
|
|
|
|
original versions of those notices and disclaimers. In case of a
|
|
|
|
|
disagreement between the translation and the original version of
|
|
|
|
|
this License or a notice or disclaimer, the original version will
|
|
|
|
|
prevail.
|
|
|
|
|
|
|
|
|
|
If a section in the Document is Entitled "Acknowledgements",
|
|
|
|
|
"Dedications", or "History", the requirement (section 4) to
|
|
|
|
|
Preserve its Title (section 1) will typically require changing the
|
|
|
|
|
actual title.
|
|
|
|
|
|
|
|
|
|
9. TERMINATION
|
|
|
|
|
|
|
|
|
|
You may not copy, modify, sublicense, or distribute the Document
|
2010-04-04 13:44:45 -04:00
|
|
|
|
except as expressly provided under this License. Any attempt
|
|
|
|
|
otherwise to copy, modify, sublicense, or distribute it is void,
|
|
|
|
|
and will automatically terminate your rights under this License.
|
|
|
|
|
|
|
|
|
|
However, if you cease all violation of this License, then your
|
|
|
|
|
license from a particular copyright holder is reinstated (a)
|
2015-06-09 15:33:32 -04:00
|
|
|
|
provisionally, unless and until the copyright holder explicitly and
|
|
|
|
|
finally terminates your license, and (b) permanently, if the
|
2010-04-04 13:44:45 -04:00
|
|
|
|
copyright holder fails to notify you of the violation by some
|
|
|
|
|
reasonable means prior to 60 days after the cessation.
|
|
|
|
|
|
|
|
|
|
Moreover, your license from a particular copyright holder is
|
|
|
|
|
reinstated permanently if the copyright holder notifies you of the
|
|
|
|
|
violation by some reasonable means, this is the first time you have
|
|
|
|
|
received notice of violation of this License (for any work) from
|
|
|
|
|
that copyright holder, and you cure the violation prior to 30 days
|
|
|
|
|
after your receipt of the notice.
|
|
|
|
|
|
|
|
|
|
Termination of your rights under this section does not terminate
|
2015-06-09 15:33:32 -04:00
|
|
|
|
the licenses of parties who have received copies or rights from you
|
|
|
|
|
under this License. If your rights have been terminated and not
|
|
|
|
|
permanently reinstated, receipt of a copy of some or all of the
|
|
|
|
|
same material does not give you any rights to use it.
|
2008-04-17 17:03:07 -04:00
|
|
|
|
|
2015-06-09 15:33:32 -04:00
|
|
|
|
10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of
the GNU Free Documentation License from time to time.  Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.  See
<http://www.gnu.org/copyleft/>.

Each version of the License is given a distinguishing version
number.  If the Document specifies that a particular numbered
version of this License "or any later version" applies to it, you
have the option of following the terms and conditions either of
that specified version or of any later version that has been
published (not as a draft) by the Free Software Foundation.  If the
Document does not specify a version number of this License, you may
choose any version ever published (not as a draft) by the Free
Software Foundation.  If the Document specifies that a proxy can
decide which future versions of this License can be used, that
proxy's public statement of acceptance of a version permanently
authorizes you to choose that version for the Document.

11. RELICENSING

"Massive Multiauthor Collaboration Site" (or "MMC Site") means any
World Wide Web server that publishes copyrightable works and also
provides prominent facilities for anybody to edit those works.  A
public wiki that anybody can edit is an example of such a server.
A "Massive Multiauthor Collaboration" (or "MMC") contained in the
site means any set of copyrightable works thus published on the MMC
site.

"CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0
license published by Creative Commons Corporation, a not-for-profit
corporation with a principal place of business in San Francisco,
California, as well as future copyleft versions of that license
published by that same organization.

"Incorporate" means to publish or republish a Document, in whole or
in part, as part of another Document.

An MMC is "eligible for relicensing" if it is licensed under this
License, and if all works that were first published under this
License somewhere other than this MMC, and subsequently
incorporated in whole or in part into the MMC, (1) had no cover
texts or invariant sections, and (2) were thus incorporated prior
to November 1, 2008.

The operator of an MMC Site may republish an MMC contained in the
site under CC-BY-SA on the same site at any time before August 1,
2009, provided the MMC is eligible for relicensing.

ADDENDUM: How to use this License for your documents
====================================================

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and license
notices just after the title page:

Copyright (C) YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts.  A copy of the license is included in the section entitled ``GNU
Free Documentation License''.

If you have Invariant Sections, Front-Cover Texts and Back-Cover
Texts, replace the "with...Texts." line with this:

with the Invariant Sections being LIST THEIR TITLES, with
the Front-Cover Texts being LIST, and with the Back-Cover Texts
being LIST.

If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of free
software license, such as the GNU General Public License, to permit
their use in free software.

File: mpir.info,  Node: Concept Index,  Next: Function Index,  Prev: GNU Free Documentation License,  Up: Top

Concept Index
*************
