This is mpir.info, produced by makeinfo version 4.11 from mpir.texi.

This manual describes how to install and use MPIR, the Multiple
Precision Integers and Rationals library, version 1.3.0.

Copyright 1991, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.

Copyright 2008 William Hart

Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version
1.2 or any later version published by the Free Software Foundation;
with no Invariant Sections, with the Front-Cover Texts being "A GNU
Manual", and with the Back-Cover Texts being "You have freedom to copy
and modify this GNU Manual, like GNU software". A copy of the license
is included in *note GNU Free Documentation License::.

INFO-DIR-SECTION GNU libraries
START-INFO-DIR-ENTRY
* mpir: (mpir). MPIR Multiple Precision Integers and Rationals Library.
END-INFO-DIR-ENTRY

File: mpir.info, Node: Radix to Binary, Prev: Binary to Radix, Up: Radix Conversion Algorithms

15.6.2 Radix to Binary
----------------------

Conversions from a power-of-2 radix into binary use a simple and fast
O(N) bitwise concatenation algorithm.

Conversions from other radices use one of two algorithms. Sizes
below `SET_STR_THRESHOLD' use a basic O(N^2) method. Groups of n
digits are converted to limbs, where n is the largest exponent such
that b^n fits in a limb (b being the base), then those groups are
accumulated into the result by multiplying by b^n and adding. This
saves multi-precision operations, as per Knuth section 4.4 part E
(*note References::). Some special case code is provided for decimal,
giving the compiler a chance to optimize multiplications by 10.
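The basecase method can be sketched in C for decimal input, assuming a hypothetical little-endian array of 32-bit limbs. Here n = 9, since 10^9 is the largest power of 10 below 2^32. The function name and limb layout are illustrative only and are not MPIR's `mpn_set_str' interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch: convert a decimal digit string to little-endian 32-bit limbs
   with the O(N^2) basecase method.  Each group of up to 9 digits becomes
   one limb (10^9 < 2^32) and the result is accumulated as
   r = r * 10^g + group, one mpn_mul_1-style pass per group.
   Assumes s holds only '0'..'9' and rp has room for the result.
   Returns the limb count (0 for the value zero). */
size_t set_str_basecase(uint32_t *rp, const char *s)
{
    size_t len = strlen(s), rn = 0, pos = 0;
    size_t g = len % 9 ? len % 9 : 9;      /* leading short group first */
    while (pos < len) {
        uint32_t group = 0, mult = 1;
        for (size_t i = 0; i < g; i++) {
            group = group * 10 + (uint32_t)(s[pos + i] - '0');
            mult *= 10;                    /* ends as 10^g <= 10^9 */
        }
        pos += g;
        g = 9;
        uint64_t carry = group;            /* r = r * mult + group */
        for (size_t i = 0; i < rn; i++) {
            uint64_t t = (uint64_t)rp[i] * mult + carry;
            rp[i] = (uint32_t)t;
            carry = t >> 32;
        }
        if (carry != 0)
            rp[rn++] = (uint32_t)carry;
    }
    return rn;
}
```

For example "4294967296" (2^32) yields the two limbs 0, 1.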

Above `SET_STR_THRESHOLD' a sub-quadratic algorithm is used. First,
groups of n digits are converted into limbs. Then adjacent limbs are
combined into limb pairs with x*b^n+y, where x and y are the limbs.
Adjacent limb pairs are combined into quads similarly with x*b^(2n)+y.
This continues until a single block remains, that being the result.

The advantage of this method is that the multiplications for each x
are big blocks, allowing Karatsuba and higher algorithms to be used.
But the cost of calculating the powers b^(n*2^i) must be overcome.
`SET_STR_THRESHOLD' usually ends up quite big, around 5000 digits, and
on some processors much bigger still.

`SET_STR_THRESHOLD' is based on the input digits (and tuned for
decimal), though it might be better based on a limb count, so as to be
independent of the base. But that sort of count isn't used by the base
case and so would need some sort of initial calculation or estimate.

The main reason `SET_STR_THRESHOLD' is so much bigger than the
corresponding `GET_STR_PRECOMPUTE_THRESHOLD' is that `mpn_mul_1' is
much faster than `mpn_divrem_1' (often by a factor of 10, or more).

File: mpir.info, Node: Other Algorithms, Next: Assembler Coding, Prev: Radix Conversion Algorithms, Up: Algorithms

15.7 Other Algorithms
=====================

* Menu:

* Prime Testing Algorithm::
* Factorial Algorithm::
* Binomial Coefficients Algorithm::
* Fibonacci Numbers Algorithm::
* Lucas Numbers Algorithm::
* Random Number Algorithms::

File: mpir.info, Node: Prime Testing Algorithm, Next: Factorial Algorithm, Prev: Other Algorithms, Up: Other Algorithms

15.7.1 Prime Testing
--------------------

The primality testing in `mpz_probab_prime_p' (*note Number Theoretic
Functions::) first does some trial division by small factors and then
uses the Miller-Rabin probabilistic primality testing algorithm, as
described in Knuth section 4.5.4 algorithm P (*note References::).

For an odd input n, and with n = q*2^k+1 where q is odd, this
algorithm selects a random base x and tests whether x^q mod n is 1 or
-1 (that is, n-1), or whether x^(q*2^j) mod n is -1 for some 1<=j<k.
If so then n is probably prime; if not then n is definitely composite.

Any prime n will pass the test, but some composites do too. Such
composites are known as strong pseudoprimes to base x. No n is a
strong pseudoprime to more than 1/4 of all bases (see Knuth exercise
22), hence with x chosen at random there's no more than a 1/4 chance a
"probable prime" will in fact be composite.

In fact strong pseudoprimes are quite rare, making the test much more
powerful than this analysis would suggest, but 1/4 is all that's proven
for an arbitrary n.
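A minimal C sketch of the test for 64-bit n follows. It is not MPIR's code: it replaces the random base x with a fixed set of bases (the first twelve primes, which are known to give a deterministic answer for every 64-bit n), and it assumes the GCC/Clang `unsigned __int128' extension for the modular multiply.

```c
#include <stdbool.h>
#include <stdint.h>

/* (a * b) mod m via the unsigned __int128 compiler extension. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    return (uint64_t)((unsigned __int128)a * b % m);
}

static uint64_t powmod(uint64_t x, uint64_t e, uint64_t m)
{
    uint64_t r = 1;
    x %= m;
    while (e) {
        if (e & 1)
            r = mulmod(r, x, m);
        x = mulmod(x, x, m);
        e >>= 1;
    }
    return r;
}

/* One round with base x, where n - 1 = q * 2^k and q is odd. */
static bool strong_probable_prime(uint64_t n, uint64_t x, uint64_t q, int k)
{
    uint64_t y = powmod(x, q, n);
    if (y == 1 || y == n - 1)
        return true;
    for (int j = 1; j < k; j++) {
        y = mulmod(y, y, n);            /* y = x^(q*2^j) mod n */
        if (y == n - 1)
            return true;
        if (y == 1)
            return false;               /* nontrivial square root of 1 */
    }
    return false;
}

bool is_probable_prime_u64(uint64_t n)
{
    static const uint64_t small[12] = {2, 3, 5, 7, 11, 13,
                                       17, 19, 23, 29, 31, 37};
    if (n < 2)
        return false;
    for (int i = 0; i < 12; i++) {      /* trial division first */
        if (n == small[i])
            return true;
        if (n % small[i] == 0)
            return false;
    }
    uint64_t q = n - 1;
    int k = 0;
    while ((q & 1) == 0) {
        q >>= 1;
        k++;
    }
    for (int i = 0; i < 12; i++)        /* fixed bases instead of random x */
        if (!strong_probable_prime(n, small[i], q, k))
            return false;
    return true;
}
```

For instance 3215031751 passes the rounds for bases 2, 3, 5 and 7 but is rejected by base 11.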

File: mpir.info, Node: Factorial Algorithm, Next: Binomial Coefficients Algorithm, Prev: Prime Testing Algorithm, Up: Other Algorithms

15.7.2 Factorial
----------------

Factorials are calculated by a combination of removal of twos,
powering, and binary splitting. The procedure is best illustrated
with an example,

     23! = 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23

has factors of two removed,

     23! = 2^19.1.1.3.1.5.3.7.1.9.5.11.3.13.7.15.1.17.9.19.5.21.11.23

and the resulting terms collected up according to their multiplicity,

     23! = 2^19.(3.5)^3.(7.9.11)^2.(13.15.17.19.21.23)

Each sequence such as 13.15.17.19.21.23 is evaluated by splitting
into every second term, as for instance (13.17.21).(15.19.23), and the
same recursively on each half. This is implemented iteratively using
some bit twiddling.

Such splitting is more efficient than repeated Nx1 multiplies since
it forms big multiplies, allowing Karatsuba and higher algorithms to be
used. And even below the Karatsuba threshold a big block of work can
be more efficient for the basecase algorithm.

Splitting into subsequences of every second term keeps the resulting
products more nearly equal in size than would the simpler approach of
say taking the first half and second half of the sequence. Nearly
equal products are more efficient for the current multiply
implementation.
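The decomposition can be checked with a small C sketch limited to 64-bit results (n <= 20, so the 23! example itself does not fit). It counts the twos with Legendre's formula, collects the odd numbers in each range n/2^(i+1) < m <= n/2^i with multiplicity i+1, and multiplies each range by every-second-term splitting. Recursion stands in for MPIR's iterative bit twiddling, and the function names are hypothetical.

```c
#include <stdint.h>

/* Product of count terms start, start+stride, ..., evaluated by splitting
   into every second term, (t0.t2.t4...).(t1.t3.t5...), recursively. */
static uint64_t split_prod(uint64_t start, uint64_t count, uint64_t stride)
{
    if (count == 0)
        return 1;
    if (count == 1)
        return start;
    return split_prod(start, (count + 1) / 2, 2 * stride)
         * split_prod(start + stride, count / 2, 2 * stride);
}

static uint64_t pow_u64(uint64_t b, unsigned e)
{
    uint64_t r = 1;
    while (e-- > 0)
        r *= b;
    return r;
}

/* n! for n <= 20 (the largest factorial that fits in 64 bits). */
uint64_t factorial_split(unsigned n)
{
    unsigned twos = 0;
    for (unsigned m = n >> 1; m > 0; m >>= 1)
        twos += m;                         /* Legendre: sum of n/2^j */
    uint64_t odd = 1;
    for (unsigned i = 0; (n >> i) > 1; i++) {
        /* odd m with n/2^(i+1) < m <= n/2^i occur i+1 times */
        uint64_t hi = n >> i, lo = n >> (i + 1);
        uint64_t first = (lo + 1) | 1;     /* smallest odd above lo */
        if (first > hi)
            continue;
        uint64_t count = (hi - first) / 2 + 1;
        odd *= pow_u64(split_prod(first, count, 2), i + 1);
    }
    return odd << twos;
}
```

With n = 23 the same loop would produce exactly the ranges 13..23, 7..11 and 3..5 shown above, with exponents 1, 2 and 3.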

File: mpir.info, Node: Binomial Coefficients Algorithm, Next: Fibonacci Numbers Algorithm, Prev: Factorial Algorithm, Up: Other Algorithms

15.7.3 Binomial Coefficients
----------------------------

Binomial coefficients C(n,k) are calculated by first arranging k <= n/2
using C(n,k) = C(n,n-k) if necessary, and then evaluating the following
product simply from i=2 to i=k.

                          k   (n-k+i)
     C(n,k) =  (n-k+1) * prod -------
                         i=2     i

It's easy to show that each denominator i will divide the product so
far, so the exact division algorithm is used (*note Exact Division::).

The numerators n-k+i and denominators i are first accumulated into
as many as will fit in a limb, to save multi-precision operations,
though for `mpz_bin_ui' this applies only to the divisors, since n is
an `mpz_t' and n-k+i in general won't fit in a limb at all.
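A C sketch of the same product with 64-bit words follows; it is an illustration, not MPIR's `mpz_bin_uiui'. After step i the running value is C(n-k+i,i), so each division by i is exact. There is no overflow check: the caller must ensure k*C(n,k) fits in 64 bits (true for every n <= 60). Accumulating several numerators and denominators per word is omitted.

```c
#include <stdint.h>

/* C(n,k) = (n-k+1) * prod_{i=2..k} (n-k+i)/i, dividing exactly at each
   step.  Requires k*C(n,k) < 2^64 after arranging k <= n/2. */
uint64_t binomial_u64(uint64_t n, uint64_t k)
{
    if (k > n)
        return 0;
    if (k > n - k)
        k = n - k;                      /* arrange k <= n/2 */
    if (k == 0)
        return 1;
    uint64_t r = n - k + 1;
    for (uint64_t i = 2; i <= k; i++)
        r = r * (n - k + i) / i;        /* exact: r*(n-k+i) = i*C(n-k+i,i) */
    return r;
}
```

For example `binomial_u64(52, 5)' gives 2598960.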

File: mpir.info, Node: Fibonacci Numbers Algorithm, Next: Lucas Numbers Algorithm, Prev: Binomial Coefficients Algorithm, Up: Other Algorithms

15.7.4 Fibonacci Numbers
------------------------

The Fibonacci functions `mpz_fib_ui' and `mpz_fib2_ui' are designed for
calculating isolated F[n] or F[n],F[n-1] values efficiently.

For small n, a table of single limb values in `__gmp_fib_table' is
used. On a 32-bit limb this goes up to F[47], or on a 64-bit limb up
to F[93]. For convenience the table starts at F[-1].

Beyond the table, values are generated with a binary powering
algorithm, calculating a pair F[n] and F[n-1] working from high to low
across the bits of n. The formulas used are

     F[2k+1] = 4*F[k]^2 - F[k-1]^2 + 2*(-1)^k
     F[2k-1] =   F[k]^2 + F[k-1]^2

     F[2k] = F[2k+1] - F[2k-1]

At each step, k is the value formed by the high bits of n processed so
far. If the next bit of n is 0 then F[2k],F[2k-1] is used, or if it's
a 1 then F[2k+1],F[2k] is used, and the process repeated until all bits
of n are incorporated. Notice these formulas require just two squares
per bit of n.
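The powering loop can be sketched in C with 64-bit words. This is an illustration rather than MPIR's `mpn_fib2_ui': it starts from F[1],F[0] at the top bit of n instead of from the table, and it relies on unsigned arithmetic wrapping mod 2^64, so results are exact up to F[93] even where an intermediate value wraps.

```c
#include <stdint.h>

/* Sets *fn = F[n] and *fn1 = F[n-1] (with F[-1] = 1), working from the
   high bit of n down with
     F[2k+1] = 4*F[k]^2 - F[k-1]^2 + 2*(-1)^k
     F[2k-1] =   F[k]^2 + F[k-1]^2
     F[2k]   = F[2k+1] - F[2k-1]                                      */
void fib2_u64(uint32_t n, uint64_t *fn, uint64_t *fn1)
{
    if (n == 0) {
        *fn = 0;
        *fn1 = 1;
        return;
    }
    int top = 31;
    while (((n >> top) & 1) == 0)
        top--;
    uint64_t f = 1, g = 0;                     /* F[k], F[k-1] with k = 1 */
    uint32_t k = 1;
    for (int b = top - 1; b >= 0; b--) {
        uint64_t fs = f * f, gs = g * g;       /* the two squares per bit */
        uint64_t f2kp1 = 4 * fs - gs;
        if (k & 1)
            f2kp1 -= 2;                        /* + 2*(-1)^k, k odd */
        else
            f2kp1 += 2;                        /* + 2*(-1)^k, k even */
        uint64_t f2km1 = fs + gs;
        uint64_t f2k = f2kp1 - f2km1;
        if ((n >> b) & 1) { f = f2kp1; g = f2k;   k = 2 * k + 1; }
        else              { f = f2k;   g = f2km1; k = 2 * k; }
    }
    *fn = f;
    *fn1 = g;
}
```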

It'd be possible to handle the first few n above the single limb
table with simple additions, using the defining Fibonacci recurrence
F[k+1]=F[k]+F[k-1], but this is not done since it usually turns out to
be faster for only about 10 or 20 values of n, and including a block of
code for just those doesn't seem worthwhile. If they really mattered
it'd be better to extend the data table.

Using a table avoids lots of calculations on small numbers, and
makes small n go fast. A bigger table would make more small n go fast;
it's just a question of balancing size against desired speed. For MPIR
the code is kept compact, with the emphasis primarily on a good
powering algorithm.

`mpz_fib2_ui' returns both F[n] and F[n-1], but `mpz_fib_ui' is only
interested in F[n]. In this case the last step of the algorithm can
become one multiply instead of two squares. One of the following two
formulas is used, according to whether n is even or odd.

     F[2k]   = F[k]*(F[k]+2F[k-1])

     F[2k+1] = (2F[k]+F[k-1])*(2F[k]-F[k-1]) + 2*(-1)^k

F[2k+1] here is the same as above, just rearranged to be a multiply.
For interest, the 2*(-1)^k term both here and above can be applied just
to the low limb of the calculation, without a carry or borrow into
further limbs, which saves some code size. See comments with
`mpz_fib_ui' and the internal `mpn_fib2_ui' for how this is done.
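The single-multiply final step can be sketched in C for n <= 93. To keep the block short, the pair F[k],F[k-1] comes from the hypothetical helper `fib_pair_iter', which uses plain iteration rather than binary powering; only the last step follows the two formulas just given.

```c
#include <stdint.h>

/* Hypothetical helper: F[k] and F[k-1] by plain iteration, F[-1] = 1. */
static void fib_pair_iter(uint32_t k, uint64_t *f, uint64_t *g)
{
    uint64_t a = 0, b = 1;                     /* F[0], F[-1] */
    for (uint32_t i = 0; i < k; i++) {
        uint64_t t = a + b;
        b = a;
        a = t;
    }
    *f = a;
    *g = b;
}

/* F[n] with one multiply in the last step, k = floor(n/2):
     F[2k]   = F[k]*(F[k]+2F[k-1])
     F[2k+1] = (2F[k]+F[k-1])*(2F[k]-F[k-1]) + 2*(-1)^k           */
uint64_t fib_u64(uint32_t n)
{
    if (n < 2)
        return n;
    uint32_t k = n / 2;
    uint64_t f, g;                             /* F[k], F[k-1] */
    fib_pair_iter(k, &f, &g);
    if (n & 1) {
        uint64_t r = (2 * f + g) * (2 * f - g);
        return (k & 1) ? r - 2 : r + 2;
    }
    return f * (f + 2 * g);
}
```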

File: mpir.info, Node: Lucas Numbers Algorithm, Next: Random Number Algorithms, Prev: Fibonacci Numbers Algorithm, Up: Other Algorithms

15.7.5 Lucas Numbers
--------------------

`mpz_lucnum2_ui' derives a pair of Lucas numbers from a pair of
Fibonacci numbers with the following simple formulas.

     L[k]   =   F[k] + 2*F[k-1]
     L[k-1] = 2*F[k] -   F[k-1]

`mpz_lucnum_ui' is only interested in L[n], and some work can be
saved. Trailing zero bits on n can be handled with a single square
each.

     L[2k] = L[k]^2 - 2*(-1)^k

And the lowest 1 bit can be handled with one multiply of a pair of
Fibonacci numbers, similar to what `mpz_fib_ui' does.

     L[2k+1] = 5*F[k-1]*(2*F[k]+F[k-1]) - 4*(-1)^k
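Combining these formulas, L[n] for 64-bit words (n <= 92) can be sketched in C: strip the trailing zero bits of n, apply the L[2k+1] formula for the lowest 1 bit, then square once per stripped bit. The hypothetical helper `fib_pair_iter' uses plain iteration (with F[-1] = 1) in place of MPIR's powering algorithm, and uint64_t arithmetic is assumed to wrap mod 2^64.

```c
#include <stdint.h>

/* Hypothetical helper: F[k] and F[k-1] by plain iteration, F[-1] = 1. */
static void fib_pair_iter(uint32_t k, uint64_t *f, uint64_t *g)
{
    uint64_t a = 0, b = 1;                     /* F[0], F[-1] */
    for (uint32_t i = 0; i < k; i++) {
        uint64_t t = a + b;
        b = a;
        a = t;
    }
    *f = a;
    *g = b;
}

uint64_t lucas_u64(uint32_t n)
{
    if (n == 0)
        return 2;
    unsigned zeros = 0;
    while ((n & 1) == 0) {                     /* strip trailing zero bits */
        n >>= 1;
        zeros++;
    }
    uint32_t k = n >> 1;                       /* now n = 2k+1 */
    uint64_t f, g;                             /* F[k], F[k-1] */
    fib_pair_iter(k, &f, &g);
    /* L[2k+1] = 5*F[k-1]*(2*F[k]+F[k-1]) - 4*(-1)^k */
    uint64_t l = 5 * g * (2 * f + g);
    l = (k & 1) ? l + 4 : l - 4;
    uint32_t j = n;                            /* invariant: l == L[j] */
    while (zeros-- > 0) {
        l = l * l;                             /* L[2j] = L[j]^2 - 2*(-1)^j */
        l = (j & 1) ? l + 2 : l - 2;
        j <<= 1;
    }
    return l;
}
```

For n = 6 this computes L[3] = 4 from F[1],F[0], then one square gives L[6] = 4^2 + 2 = 18.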

File: mpir.info, Node: Random Number Algorithms, Prev: Lucas Numbers Algorithm, Up: Other Algorithms

15.7.6 Random Numbers
---------------------

For the `urandomb' functions, random numbers are generated simply by
concatenating bits produced by the generator. As long as the generator
has good randomness properties this will produce well-distributed N bit
numbers.

For the `urandomm' functions, random numbers in a range 0<=R<N are
generated by taking values R of ceil(log2(N)) bits each until one
satisfies R<N. Since 2^ceil(log2(N)) < 2N, each attempt succeeds with
probability more than 1/2, so this will normally require only one or
two attempts, but the attempts are limited in case the generator is
somehow degenerate and produces only 1 bits or similar.
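A C sketch of this rejection loop for N below 2^64 follows. The bit source is Marsaglia's xorshift64, chosen only to make the example runnable (it is not one of MPIR's generators). The cap of 80 attempts and the fallback of returning R-N, which is in range because R < 2N, are assumptions of the sketch.

```c
#include <stdint.h>

/* Hypothetical bit source: xorshift64.  The state must be nonzero. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Uniform R with 0 <= R < n (n >= 1): draw ceil(log2(n)) bits until
   R < n, giving up after a fixed number of attempts. */
uint64_t urandomm_u64(uint64_t *state, uint64_t n)
{
    unsigned bits = 0;
    while (bits < 64 && ((n - 1) >> bits) != 0)
        bits++;                                /* ceil(log2(n)) */
    uint64_t mask = bits == 64 ? UINT64_MAX : ((uint64_t)1 << bits) - 1;
    uint64_t r = 0;
    for (int attempt = 0; attempt < 80; attempt++) {
        r = xorshift64(state) & mask;
        if (r < n)
            return r;
    }
    return r - n;                              /* r < 2n, so r - n < n */
}
```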

The Mersenne Twister generator is by Matsumoto and Nishimura (*note
References::). It has a non-repeating period of 2^19937-1, which is a
Mersenne prime, hence the name of the generator. The state is 624
words of 32 bits each, which is iterated with one XOR and shift for each
32-bit word generated, making the algorithm very fast. Randomness
properties are also very good and this is the default algorithm used by
MPIR.
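For reference, the standard MT19937 recurrence and output tempering can be written compactly in C as below. The seeding routine is the reference `init_genrand' method, which is an assumption here: MPIR seeds its Mersenne Twister state its own way, so outputs will not in general match `gmp_randinit_mt'. With the reference seed 5489 the 10000th output is 4123659995, the check value the C++ standard gives for `std::mt19937'.

```c
#include <stdint.h>

#define MT_N 624
#define MT_M 397

typedef struct {
    uint32_t mt[MT_N];
    int idx;
} mt19937_state;

/* Reference init_genrand seeding (not MPIR's seeding). */
void mt19937_seed(mt19937_state *s, uint32_t seed)
{
    s->mt[0] = seed;
    for (int i = 1; i < MT_N; i++)
        s->mt[i] = 1812433253u * (s->mt[i - 1] ^ (s->mt[i - 1] >> 30))
                   + (uint32_t)i;
    s->idx = MT_N;
}

uint32_t mt19937_next(mt19937_state *s)
{
    if (s->idx >= MT_N) {                      /* regenerate all 624 words */
        for (int i = 0; i < MT_N; i++) {
            uint32_t y = (s->mt[i] & 0x80000000u)
                       | (s->mt[(i + 1) % MT_N] & 0x7fffffffu);
            uint32_t v = s->mt[(i + MT_M) % MT_N] ^ (y >> 1);
            if (y & 1)
                v ^= 0x9908b0dfu;
            s->mt[i] = v;
        }
        s->idx = 0;
    }
    uint32_t y = s->mt[s->idx++];              /* tempering */
    y ^= y >> 11;
    y ^= (y << 7) & 0x9d2c5680u;
    y ^= (y << 15) & 0xefc60000u;
    y ^= y >> 18;
    return y;
}
```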

Linear congruential generators are described in many textbooks, for
instance Knuth volume 2 (*note References::). With a modulus M and
parameters A and C, an integer state S is iterated by the formula S <-
A*S+C mod M. At each step the new state is a linear function of the
previous, mod M, hence the name of the generator.
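With M = 2^N, one step is a multiply, an add and a mask. A C sketch for N <= 64 follows; the function name and parameters are hypothetical and this is not MPIR's `gmp_randinit_lc_2exp' code.

```c
#include <stdint.h>

/* One step of S <- A*S + C mod 2^N, for 1 <= N <= 64.  uint64_t
   arithmetic already wraps mod 2^64, so only N < 64 needs a mask. */
uint64_t lcg_step(uint64_t s, uint64_t a, uint64_t c, unsigned n_bits)
{
    uint64_t mask = n_bits >= 64 ? UINT64_MAX
                                 : ((uint64_t)1 << n_bits) - 1;
    return (a * s + c) & mask;
}
```

As a toy example, A=5, C=3, N=4 satisfies the Hull-Dobell conditions (C odd, A-1 divisible by 4), so from S=0 the states run 0, 3, 2, 13, 4, 7, ... through all 16 values before returning to 0.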

In MPIR only moduli of the form 2^N are supported, and the current
implementation is not as well optimized as it could be. Overheads are
significant when N is small, and when N is large clearly the multiply
at each step will become slow. This is not a big concern, since the
Mersenne Twister generator is better in every respect and is therefore
recommended for all normal applications.

For both generators the current state can be deduced by observing
enough output and applying some linear algebra (over GF(2) in the case
of the Mersenne Twister). This generally means raw output is
unsuitable for cryptographic applications without further hashing or
the like.

File: mpir.info, Node: Assembler Coding, Prev: Other Algorithms, Up: Algorithms

15.8 Assembler Coding
=====================

The assembler subroutines in MPIR are the most significant source of
speed at small to moderate sizes. At larger sizes algorithm selection
becomes more important, but of course speedups in low level routines
will still speed up everything proportionally.

Carry handling and widening multiplies that are important for MPIR
can't be easily expressed in C. GCC `asm' blocks help a lot and are
provided in `longlong.h', but hand coding low level routines invariably
offers a speedup over generic C by a factor of anything from 2 to 10.

* Menu:

* Assembler Code Organisation::
* Assembler Basics::
* Assembler Carry Propagation::
* Assembler Cache Handling::
* Assembler Functional Units::
* Assembler Floating Point::
* Assembler SIMD Instructions::
* Assembler Software Pipelining::
* Assembler Loop Unrolling::
* Assembler Writing Guide::

File: mpir.info, Node: Assembler Code Organisation, Next: Assembler Basics, Prev: Assembler Coding, Up: Assembler Coding

15.8.1 Code Organisation
------------------------

The various `mpn' subdirectories contain machine-dependent code, written
in C or assembler. The `mpn/generic' subdirectory contains default
code, used when there's no machine-specific version of a particular
file.

Each `mpn' subdirectory is for an ISA family. Generally 32-bit and
64-bit variants in a family cannot share code and have separate
directories. Within a family further subdirectories may exist for CPU
variants.

In each directory a `nails' subdirectory may exist, holding code with
nails support for that CPU variant. A `NAILS_SUPPORT' directive in each
file indicates the nails values the code handles. Nails code only
exists where it's faster, or promises to be faster, than plain code.
There's no effort put into nails if they're not going to enhance a
given CPU.

File: mpir.info, Node: Assembler Basics, Next: Assembler Carry Propagation, Prev: Assembler Code Organisation, Up: Assembler Coding

15.8.2 Assembler Basics
-----------------------

`mpn_addmul_1' and `mpn_submul_1' are the most important routines for
overall MPIR performance. All multiplications and divisions come down
to repeated calls to these. `mpn_add_n', `mpn_sub_n', `mpn_lshift' and
`mpn_rshift' are next most important.
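For reference, the operation `mpn_addmul_1' performs, {rp,n} += {up,n} * v with the high limb returned, can be written in portable C as below. The 32-bit limb type and the function name are assumptions made so that a standard 64-bit product can hold each widening multiply; with real 64-bit limbs this widening step is exactly what is awkward in C.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;    /* hypothetical 32-bit limb for the sketch */

/* {rp,n} += {up,n} * v; returns the carry-out limb. */
limb_t addmul_1_sketch(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* at most (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so no overflow */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}
```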

On some CPUs assembler versions of the internal functions
`mpn_mul_basecase' and `mpn_sqr_basecase' give significant speedups,
mainly through avoiding function call overheads. They can also
potentially make better use of a wide superscalar processor, as can
bigger primitives like `mpn_addmul_2' or `mpn_addmul_4'.

The restrictions on overlaps between sources and destinations (*note
Low-level Functions::) are designed to facilitate a variety of
implementations. For example, knowing `mpn_add_n' won't have partly
overlapping sources and destination means reading can be done far ahead
of writing on superscalar processors, and loops can be vectorized on a
vector processor, depending on the carry handling.

File: mpir.info, Node: Assembler Carry Propagation, Next: Assembler Cache Handling, Prev: Assembler Basics, Up: Assembler Coding

15.8.3 Carry Propagation
------------------------

The problem that presents most challenges in MPIR is propagating
carries from one limb to the next. In functions like `mpn_addmul_1' and
`mpn_add_n', carries are the only dependencies between limb operations.

On processors with carry flags, a straightforward CISC style `adc' is
generally best. AMD K6 `mpn_addmul_1' however is an example of an
unusual set of circumstances where a branch works out better.

On RISC processors generally an add and compare for overflow is
used. This sort of thing can be seen in `mpn/generic/aors_n.c'. Some
carry propagation schemes require 4 instructions, meaning at least 4
cycles per limb, but other schemes may use just 1 or 2. On wide
superscalar processors performance may be completely determined by the
number of dependent instructions between carry-in and carry-out for
each limb.
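The add-and-compare idiom looks like this in C, as a sketch of the `mpn_add_n' contract with a hypothetical 64-bit limb type (not the code in `aors_n.c'). An unsigned sum that wrapped is smaller than the value it started from, so each comparison recovers one carry bit; at most one of the two comparisons can fire per limb, so the carry stays 0 or 1.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;            /* hypothetical 64-bit limb */

/* {rp,n} = {up,n} + {vp,n}; returns the carry out (0 or 1). */
limb_t add_n_sketch(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + cy;
        cy = s < cy;                /* wrapped while adding carry-in */
        limb_t r = s + vp[i];
        cy += r < s;                /* wrapped while adding vp[i] */
        rp[i] = r;
    }
    return cy;
}
```

The add, compare, add, compare and final add form the dependent chain from carry-in to carry-out that the paragraph above describes.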

On vector processors good use can be made of the fact that a carry
bit only very rarely propagates more than one limb. When adding a
single bit to a limb, there's only a carry out if that limb was
`0xFF...FF', which on random data will be only 1 in 2^mp_bits_per_limb.
`mpn/cray/add_n.c' is an example of this; it adds all limbs in
parallel, adds one set of carry bits in parallel and then only rarely
needs to fall through to a loop propagating further carries.
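The same three-phase structure can be sketched in scalar C. The first two loops have no dependence between iterations and so could be vectorized; only the third, rarely taken, ripples carries sequentially. This illustrates the idea and is not a transcription of `mpn/cray/add_n.c'; the caller-supplied scratch array of carry bytes is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;

/* {rp,n} = {up,n} + {vp,n}; returns the carry out.  cy is scratch
   space of n bytes supplied by the caller. */
limb_t add_n_vector_style(limb_t *rp, const limb_t *up, const limb_t *vp,
                          size_t n, unsigned char *cy)
{
    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {           /* pass 1: independent adds */
        rp[i] = up[i] + vp[i];
        cy[i] = rp[i] < up[i];
    }
    limb_t out = cy[n - 1];
    /* pass 2: add each carry bit into the next limb.  Going downward,
       cy[i-1] is still the pass 1 carry when it is read; cy[i] is then
       reused to flag limbs that wrapped from 0xFF...FF to 0. */
    for (size_t i = n - 1; i > 0; i--) {
        rp[i] += cy[i - 1];
        cy[i] = cy[i - 1] && rp[i] == 0;
    }
    cy[0] = 0;
    for (size_t i = 1; i < n; i++) {           /* pass 3: rare ripple */
        if (!cy[i])
            continue;
        size_t j = i + 1;
        while (j < n && ++rp[j] == 0)
            j++;
        if (j == n)
            out = 1;
    }
    return out;
}
```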

On the x86s, GCC (as of version 2.95.2) doesn't generate
particularly good code for the RISC style idioms that are necessary to
handle carry bits in C. Often conditional jumps are generated where
`adc' or `sbb' forms would be better. And so unfortunately almost any
loop involving carry bits needs to be coded in assembler for best
results.

File: mpir.info, Node: Assembler Cache Handling, Next: Assembler Functional Units, Prev: Assembler Carry Propagation, Up: Assembler Coding

15.8.4 Cache Handling
---------------------

MPIR aims to perform well both on operands that fit entirely in L1
cache and those which don't.

Basic routines like `mpn_add_n' or `mpn_lshift' are often used on
large operands, so L2 and main memory performance is important for them.
`mpn_mul_1' and `mpn_addmul_1' are mostly used for multiply and square
basecases, so L1 performance matters most for them, unless assembler
versions of `mpn_mul_basecase' and `mpn_sqr_basecase' exist, in which
case the remaining uses are mostly for larger operands.

For L2 or main memory operands, memory access times will almost
certainly be more than the calculation time. The aim therefore is to
maximize memory throughput, by starting a load of the next cache line
while processing the contents of the previous one. Clearly this is
only possible if the chip has a lock-up free cache or some sort of
prefetch instruction. Most current chips have both these features.

Prefetching sources combines well with loop unrolling, since a
prefetch can be initiated once per unrolled loop (or more than once if
the loop covers more than one cache line).
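A C sketch of this combination: an `mpn_add_n'-style loop that handles one cache line per iteration and issues one prefetch per operand per iteration. It assumes 64-byte lines (8 limbs) and a prefetch distance of 4 lines, both of which would be tuned per CPU, and uses the GCC/Clang `__builtin_prefetch' builtin, which is only a hint. The bounds check keeps every prefetch address inside the operands.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;

#define LINE_LIMBS 8        /* assumed: 64-byte cache lines */
#define AHEAD_LINES 4       /* assumed prefetch distance, tune per CPU */

static inline limb_t add_limb(limb_t *rp, limb_t u, limb_t v, limb_t cy)
{
    limb_t s = u + cy;
    limb_t c = s < cy;
    limb_t r = s + v;
    *rp = r;
    return c + (r < s);
}

/* {rp,n} = {up,n} + {vp,n}; returns the carry out. */
limb_t add_n_prefetch(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    size_t i = 0;
    for (; i + LINE_LIMBS <= n; i += LINE_LIMBS) {
        size_t ahead = i + AHEAD_LINES * LINE_LIMBS;
        if (ahead < n) {                       /* stay inside the operands */
            __builtin_prefetch(up + ahead, 0);
            __builtin_prefetch(vp + ahead, 0);
            __builtin_prefetch(rp + ahead, 1); /* destination: for write */
        }
        for (size_t j = i; j < i + LINE_LIMBS; j++)   /* one cache line */
            cy = add_limb(&rp[j], up[j], vp[j], cy);
    }
    for (; i < n; i++)                         /* n % 8 leftover limbs */
        cy = add_limb(&rp[i], up[i], vp[i], cy);
    return cy;
}
```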

On CPUs without write-allocate caches, prefetching destinations will
ensure individual stores don't go further down the cache hierarchy,
limiting bandwidth. Of course for calculations which are slow anyway,
like `mpn_divrem_1', write-throughs might be fine.

The distance ahead to prefetch will be determined by memory latency
versus throughput (roughly, the latency multiplied by the rate at which
the loop consumes data). The aim of course is to have data arriving
continuously, at peak throughput. Some CPUs have limits on the number
of fetches or prefetches in progress.

If a special prefetch instruction doesn't exist then a plain load
can be used, but in that case care must be taken not to attempt to read
past the end of an operand, since that might produce a segmentation
violation.

Some CPUs or systems have hardware that detects sequential memory
accesses and initiates suitable cache movements automatically, making
life easy.

File: mpir.info, Node: Assembler Functional Units, Next: Assembler Floating Point, Prev: Assembler Cache Handling, Up: Assembler Coding

15.8.5 Functional Units
-----------------------

When choosing an approach for an assembler loop, consideration is given
to what operations can execute simultaneously and what throughput can
thereby be achieved. In some cases an algorithm can be tweaked to
accommodate available resources.

Loop control will generally require a counter and pointer updates,
costing as much as 5 instructions, plus any delays a branch introduces.
CPU addressing modes might reduce pointer updates, perhaps by allowing
just one updating pointer and others expressed as offsets from it, or
on CISC chips with all addressing done with the loop counter as a
scaled index.

The final loop control cost can be amortised by processing several
limbs in each iteration (*note Assembler Loop Unrolling::). This at
least ensures loop control isn't a big fraction of the work done.

Memory throughput is always a limit. If perhaps only one load or
one store can be done per cycle then 3 cycles/limb will be the top speed
for "binary" operations like `mpn_add_n' (two loads and one store per
limb), and any code achieving that is optimal.

Integer resources can be freed up by having the loop counter in a
float register, or by pressing the float units into use for some
multiplying, perhaps doing every second limb on the float side (*note
Assembler Floating Point::).

Float resources can be freed up by doing carry propagation on the
integer side, or even by doing integer to float conversions in integers
using bit twiddling.

File: mpir.info, Node: Assembler Floating Point, Next: Assembler SIMD Instructions, Prev: Assembler Functional Units, Up: Assembler Coding

15.8.6 Floating Point
---------------------

Floating point arithmetic is used in MPIR for multiplications on CPUs
with poor integer multipliers. It's mostly useful for `mpn_mul_1',
`mpn_addmul_1' and `mpn_submul_1' on 64-bit machines, and
`mpn_mul_basecase' on both 32-bit and 64-bit machines.

With IEEE 53-bit double precision floats, integer multiplications
producing up to 53 bits will give exact results. Breaking a 64x64
multiplication into eight 16x32->48 bit pieces is convenient. With
some care though six 21x32->53 bit products can be used, if one of the
lower two 21-bit pieces also uses the sign bit.

For the `mpn_mul_1' family of functions on a 64-bit machine, the
invariant single limb is split at the start, into 3 or 4 pieces.
Inside the loop, the bignum operand is split into 32-bit pieces. Fast
conversion of these unsigned 32-bit pieces to floating point is highly
machine-dependent. In some cases, reading the data into the integer
unit, zero-extending to 64 bits, then transferring it to the floating
point unit via memory is the only option.

Converting partial products back to 64-bit limbs is usually best
done as a signed conversion. Since all values are smaller than 2^53,
signed and unsigned are the same, but most processors lack unsigned
conversions.

Here is a diagram showing 16x32 bit products for an `mpn_mul_1' or
`mpn_addmul_1' with a 64-bit limb. The single limb operand V is split
into four 16-bit parts. The multi-limb operand U is split in the loop
into two 32-bit parts.

                     +---+---+---+---+
                     |v48|v32|v16|v00|    V operand
                     +---+---+---+---+

                     +-------+-------+
                 x   |  u32  |  u00  |    U operand (one limb)
                     +---------------+

     ---------------------------------

                         +-----------+
                         | u00 x v00 |    p00    48-bit products
                         +-----------+
                     +-----------+
                     | u00 x v16 |        p16
                     +-----------+
                 +-----------+
                 | u00 x v32 |            p32
                 +-----------+
             +-----------+
             | u00 x v48 |                p48
             +-----------+
                 +-----------+
                 | u32 x v00 |            r32
                 +-----------+
             +-----------+
             | u32 x v16 |                r48
             +-----------+
         +-----------+
         | u32 x v32 |                    r64
         +-----------+
     +-----------+
     | u32 x v48 |                        r80
     +-----------+

p32 and r32 can be summed using floating-point addition, and
likewise p48 and r48. p00 and p16 can be summed with r64 and r80 from
the previous iteration.

For each loop then, four 49-bit quantities are transferred to the
integer unit, aligned as follows,

     |-----64bits----|-----64bits----|
                        +------------+
                        | p00 + r64' |    i00
                        +------------+
                    +------------+
                    | p16 + r80' |        i16
                    +------------+
                +------------+
                | p32 + r32  |            i32
                +------------+
            +------------+
            | p48 + r48  |                i48
            +------------+

The challenge then is to sum these efficiently and add in a carry
limb, generating a low 64-bit result limb and a high 33-bit carry limb
(i48 extends 33 bits into the high half).
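The exactness argument can be checked with a C sketch that forms all eight 16x32-bit partial products of the diagram in double precision and then sums them in integers. Each product is below 2^48, well inside the 53-bit significand, so every multiply and every conversion back to an integer is exact. Summing p32+r32 and the other pairs in floating point, and carrying r64 and r80 into the next iteration, are left out; the function names are hypothetical.

```c
#include <stdint.h>

/* Add x * 2^s into the 128-bit value (hi,lo); s is a multiple of 16. */
static void add_shifted(uint64_t *hi, uint64_t *lo, uint64_t x, unsigned s)
{
    uint64_t xl, xh;
    if (s == 0)      { xl = x;      xh = 0; }
    else if (s < 64) { xl = x << s; xh = x >> (64 - s); }
    else             { xl = 0;      xh = x << (s - 64); }
    uint64_t t = *lo + xl;
    *hi += xh + (t < xl);
    *lo = t;
}

/* 64x64->128 multiply from eight 16x32-bit products computed in double. */
void mul_64x64_via_double(uint64_t u, uint64_t v, uint64_t *hi, uint64_t *lo)
{
    double u00 = (double)(uint32_t)u;
    double u32 = (double)(uint32_t)(u >> 32);
    uint64_t acc_hi = 0, acc_lo = 0;
    for (unsigned j = 0; j < 4; j++) {
        double vj = (double)(uint16_t)(v >> (16 * j));   /* v00..v48 */
        uint64_t p = (uint64_t)(u00 * vj);   /* p(16j): exact, < 2^48 */
        uint64_t r = (uint64_t)(u32 * vj);   /* r(32+16j): exact, < 2^48 */
        add_shifted(&acc_hi, &acc_lo, p, 16 * j);
        add_shifted(&acc_hi, &acc_lo, r, 32 + 16 * j);
    }
    *hi = acc_hi;
    *lo = acc_lo;
}
```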

File: mpir.info, Node: Assembler SIMD Instructions, Next: Assembler Software Pipelining, Prev: Assembler Floating Point, Up: Assembler Coding

15.8.7 SIMD Instructions
------------------------

The single-instruction multiple-data support in current microprocessors
is aimed at signal processing algorithms where each data point can be
treated more or less independently. There's generally not much support
for propagating the sort of carries that arise in MPIR.

SIMD multiplications of say four 16x16 bit multiplies only do as much
work as one 32x32 from MPIR's point of view (4*16*16 and 32*32 are both
1024 bit products), and need some shifts and adds besides. But of
course if say the SIMD form is fully pipelined and uses less
instruction decoding then it may still be worthwhile.

On the x86 chips, MMX has so far found a use in `mpn_rshift' and
`mpn_lshift', and is used in a special case for 16-bit multipliers in
the P55 `mpn_mul_1'. SSE2 is used for Pentium 4 `mpn_mul_1',
`mpn_addmul_1', and `mpn_submul_1'.

File: mpir.info, Node: Assembler Software Pipelining, Next: Assembler Loop Unrolling, Prev: Assembler SIMD Instructions, Up: Assembler Coding

15.8.8 Software Pipelining
--------------------------

Software pipelining consists of scheduling instructions around the
branch point in a loop. For example a loop might issue a load not for
use in the present iteration but the next, thereby allowing extra
cycles for the data to arrive from memory.

Naturally this is wanted only when doing things like loads or
multiplies that take several cycles to complete, and only where a CPU
has multiple functional units so that other work can be done in the
meantime.

A pipeline with several stages will have a data value in progress at
each stage and each loop iteration moves them along one stage. This is
like juggling.

If the latency of some instruction is greater than the loop time
then it will be necessary to unroll, so one register has a result ready
to use while one or more others are still in progress (*note Assembler
Loop Unrolling::).
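The load-ahead example can be written out in C to show the shape of a two-stage pipeline, with feed-in code before the loop and wind-down code after it. This is a structural sketch only: a C compiler does its own scheduling, so the benefit comes when the same arrangement is written in assembler. The limb type and function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;

/* {rp,n} = {up,n} + {vp,n}; the loads for limb i+1 are issued in the
   same iteration as the add for limb i. */
limb_t add_n_pipelined(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    if (n == 0)
        return 0;
    limb_t cy = 0;
    limb_t u = up[0], v = vp[0];               /* feed-in: first loads */
    for (size_t i = 0; i + 1 < n; i++) {
        limb_t un = up[i + 1], vn = vp[i + 1]; /* stage 1: next loads */
        limb_t s = u + cy;                     /* stage 2: current add */
        cy = s < cy;
        limb_t r = s + v;
        cy += r < s;
        rp[i] = r;
        u = un;
        v = vn;
    }
    limb_t s = u + cy;                         /* wind-down: last add, */
    cy = s < cy;                               /* no further loads */
    limb_t r = s + v;
    cy += r < s;
    rp[n - 1] = r;
    return cy;
}
```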

File: mpir.info, Node: Assembler Loop Unrolling, Next: Assembler Writing Guide, Prev: Assembler Software Pipelining, Up: Assembler Coding

15.8.9 Loop Unrolling
---------------------

Loop unrolling consists of replicating code so that several limbs are
processed in each loop. At a minimum this reduces loop overheads by a
corresponding factor, but it can also allow better register usage, for
example alternately using one register combination and then another.
Judicious use of `m4' macros can help avoid lots of duplication in the
source code.

Any amount of unrolling can be handled with a loop counter that's
decremented by N each time, stopping when the remaining count is less
than the further N the loop will process. Alternatively, by subtracting
N at the start, the loop terminates when the counter C becomes less
than 0 (and the count of remaining limbs is then C+N).
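The second form, with N = 4, might look like this in C. An XOR of two limb vectors stands in for real limb arithmetic to keep the focus on loop control, and the function name is hypothetical. The counter starts at n-4, the unrolled body runs while it is non-negative, and the C+4 leftover limbs (0 to 3) are handled by a simple loop at the end.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;

/* {rp,n} = {up,n} XOR {vp,n}, unrolled by 4 with a signed counter. */
void xor_n_unroll4(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    ptrdiff_t c = (ptrdiff_t)n - 4;            /* subtract N at the start */
    for (; c >= 0; c -= 4) {
        rp[0] = up[0] ^ vp[0];
        rp[1] = up[1] ^ vp[1];
        rp[2] = up[2] ^ vp[2];
        rp[3] = up[3] ^ vp[3];
        rp += 4;
        up += 4;
        vp += 4;
    }
    for (ptrdiff_t k = 0; k < c + 4; k++)      /* C+N leftover limbs */
        rp[k] = up[k] ^ vp[k];
}
```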

Alternatively, for a power-of-2 unroll, the loop count and remainder
can be established with a shift and mask. This is convenient if also
making a computed jump into the middle of a large loop.

The limbs left over when the size is not a multiple of the unrolling
can be handled in various ways, for example

   * A simple loop at the end (or the start) to process the excess.
     Care is needed that it isn't too much slower than the unrolled
     part.

   * A set of binary tests, for example after an 8-limb unrolling, test
     for 4 more limbs to process, then a further 2 more or not, and
     finally 1 more or not. This will probably take more code space
     than a simple loop.

   * A `switch' statement, providing separate code for each possible
     excess, for example an 8-limb unrolling would have separate code
     for 0 remaining, 1 remaining, etc., up to 7 remaining. This might
     take a lot of code, but may be the best way to optimize all cases
     in combination with a deeply pipelined loop.

   * A computed jump into the middle of the loop, thus making the first
     iteration handle the excess. This should make times smoothly
     increase with size, which is attractive, but setups for the jump
     and adjustments for pointers can be tricky and could become quite
     difficult in combination with deep pipelining.
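The computed-jump approach has a well-known C form, Duff's device, sketched below for a 4-way unroll. The `switch' jumps into the middle of the loop body so that the first pass handles the excess n mod 4 limbs, and the pass count and remainder come from a shift and mask as described above. The XOR operation and the function name are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;

/* {rp,n} = {up,n} XOR {vp,n}, 4-way unrolled, entered by a computed
   jump so that the first pass does the n & 3 excess limbs. */
void xor_n_duff(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    if (n == 0)
        return;
    size_t passes = (n + 3) >> 2;              /* shift for the count */
    switch (n & 3) {                           /* mask for the excess */
    case 0: do { *rp++ = *up++ ^ *vp++;
    case 3:      *rp++ = *up++ ^ *vp++;
    case 2:      *rp++ = *up++ ^ *vp++;
    case 1:      *rp++ = *up++ ^ *vp++;
            } while (--passes > 0);
    }
}
```

For n = 5 the first pass enters at `case 1' and processes one limb, then one full pass of four follows.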

File: mpir.info, Node: Assembler Writing Guide, Prev: Assembler Loop Unrolling, Up: Assembler Coding

15.8.10 Writing Guide
---------------------

This is a guide to writing software pipelined loops for processing limb
vectors in assembler.

First determine the algorithm and which instructions are needed.
Code it without unrolling or scheduling, to make sure it works. On a
3-operand CPU try to write each new value to a new register; this will
greatly simplify later steps.

Then note for each instruction the functional unit and/or issue port
requirements. If an instruction can use either of two units, like U0
or U1, then make a category "U0/U1". Count the total using each unit
(or combined unit), and count all instructions.

Figure out from those counts the best possible loop time. The goal
will be to find a perfect schedule where instruction latencies are
completely hidden. The total instruction count might be the limiting
factor, or perhaps a particular functional unit. It might be possible
to tweak the instructions to help the limiting factor.
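The arithmetic of this step can be captured in a small C helper: the loop time can be no less than the total instruction count divided by the issue width, nor less than any category's count divided by the number of such instructions the CPU can start per cycle. This is a sketch with hypothetical names; it treats each category independently, so overlapping categories such as U0, U1 and U0/U1 may need a tighter combined count.

```c
#include <stddef.h>

/* One resource category from the counting step. */
struct unit_count {
    unsigned instructions;  /* loop instructions needing this category */
    unsigned per_cycle;     /* how many the CPU can start per cycle */
};

static unsigned ceil_div(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

/* Lower bound on cycles per loop iteration. */
unsigned loop_time_bound(const struct unit_count *units, size_t nunits,
                         unsigned total_insns, unsigned issue_width)
{
    unsigned best = ceil_div(total_insns, issue_width);
    for (size_t i = 0; i < nunits; i++) {
        unsigned t = ceil_div(units[i].instructions, units[i].per_cycle);
        if (t > best)
            best = t;       /* this unit is the limiting factor */
    }
    return best;
}
```

For example, 14 instructions on a 4-issue CPU with 5 of them needing a single U0 unit give max(ceil(14/4), 5) = 5 cycles per iteration, so U0 is the limiting factor.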
Suppose the loop time is N, then make N issue buckets, with the
|
||
final loop branch at the end of the last. Now fill the buckets with
|
||
dummy instructions using the functional units desired. Run this to
|
||
make sure the intended speed is reached.
|
||
|
||
Now replace the dummy instructions with the real instructions from
|
||
the slow but correct loop you started with. The first will typically
|
||
be a load instruction. Then the instruction using that value is placed
|
||
in a bucket an appropriate distance down. Run the loop again, to check
|
||
it still runs at target speed.
|
||
|
||
Keep placing instructions, frequently measuring the loop. After a
|
||
few you will need to wrap around from the last bucket back to the top
|
||
of the loop. If you used the new-register for new-value strategy above
|
||
then there will be no register conflicts. If not then take care not to
|
||
clobber something already in use. Changing registers at this time is
|
||
very error prone.
|
||
|
||
The loop will overlap two or more of the original loop iterations,
|
||
and the computation of one vector element result will be started in one
|
||
iteration of the new loop, and completed one or several iterations
|
||
later.
|
||
|
||
The final step is to create feed-in and wind-down code for the loop.
|
||
A good way to do this is to make a copy (or copies) of the loop at the
|
||
start and delete those instructions which don't have valid antecedents,
|
||
and at the end replicate and delete those whose results are unwanted
|
||
(including any further loads).
|
||
|
||
The loop will have a minimum number of limbs loaded and processed,
|
||
so the feed-in code must test if the request size is smaller and skip
|
||
either to a suitable part of the wind-down or to special code for small
|
||
sizes.
|
||
|
||
|
||
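The overall shape these steps produce can be sketched in C for an `mpn_add_n'-style loop, although in practice the work is done in assembler and here the final instruction scheduling is left to the compiler. `add_n_simple' plays the part of the slow but correct starting loop. In `add_n_pipelined' the loads for element i+1 are issued in the same iteration as the add for element i, with feed-in code before the loop, wind-down code after it, and a special case for sizes below the loop's minimum. The names and the 64-bit `limb_t' are hypothetical, not MPIR internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical 64-bit limb */

/* One limb of addition: returns u + v + *cy and updates *cy.
   The two partial carries can never both be 1.  */
static limb_t add_step(limb_t u, limb_t v, limb_t *cy)
{
  limb_t s = u + v;
  limb_t c1 = s < u;            /* carry out of u + v */
  limb_t r = s + *cy;
  limb_t c2 = r < s;            /* carry out of adding the carry in */
  *cy = c1 | c2;
  return r;
}

/* Starting point: simple, unscheduled loop.  */
static limb_t add_n_simple(limb_t *rp, const limb_t *up,
                           const limb_t *vp, size_t n)
{
  limb_t cy = 0;
  for (size_t i = 0; i < n; i++)
    rp[i] = add_step(up[i], vp[i], &cy);
  return cy;
}

/* Pipelined shape: each iteration loads element i+1 while adding
   element i, so load latency can overlap the arithmetic.  */
static limb_t add_n_pipelined(limb_t *rp, const limb_t *up,
                              const limb_t *vp, size_t n)
{
  limb_t cy = 0, u, v;
  size_t i;

  if (n == 0)                   /* below this loop's minimum size */
    return 0;
  assert(rp != NULL && up != NULL && vp != NULL);

  u = up[0];                    /* feed-in: first loads only */
  v = vp[0];
  for (i = 0; i + 1 < n; i++) {
    limb_t u_next = up[i + 1];  /* loads for the next iteration */
    limb_t v_next = vp[i + 1];
    rp[i] = add_step(u, v, &cy);/* arithmetic for this iteration */
    u = u_next;
    v = v_next;
  }
  rp[i] = add_step(u, v, &cy);  /* wind-down: last add, no more loads */
  return cy;
}
```

A real assembler loop would normally also be unrolled and overlap more than two iterations, so its minimum size, and hence the small-size special case, would be correspondingly larger.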
File: mpir.info, Node: Internals, Next: Contributors, Prev: Algorithms, Up: Top

16 Internals
************

*This chapter is provided only for informational purposes and the
various internals described here may change in future MPIR releases.
Applications expecting to be compatible with future releases should use
only the documented interfaces described in previous chapters.*

* Menu:

* Integer Internals::
* Rational Internals::
* Float Internals::
* Raw Output Internals::
* C++ Interface Internals::

File: mpir.info, Node: Integer Internals, Next: Rational Internals, Prev: Internals, Up: Internals

16.1 Integer Internals
======================

`mpz_t' variables represent integers using sign and magnitude, in space
dynamically allocated and reallocated. The fields are as follows.

`_mp_size'
The number of limbs, or the negative of that when representing a
negative integer. Zero is represented by `_mp_size' set to zero,
in which case the `_mp_d' data is unused.

`_mp_d'
A pointer to an array of limbs which is the magnitude. These are
stored "little endian" as per the `mpn' functions, so `_mp_d[0]'
is the least significant limb and `_mp_d[ABS(_mp_size)-1]' is the
most significant. Whenever `_mp_size' is non-zero, the most
significant limb is non-zero.

Currently there's always at least one limb allocated, so for
instance `mpz_set_ui' never needs to reallocate, and `mpz_get_ui'
can fetch `_mp_d[0]' unconditionally (though its value is then
only wanted if `_mp_size' is non-zero).

`_mp_alloc'
`_mp_alloc' is the number of limbs currently allocated at `_mp_d',
and naturally `_mp_alloc >= ABS(_mp_size)'. When an `mpz' routine
is about to (or might be about to) increase `_mp_size', it checks
`_mp_alloc' to see whether there's enough space, and reallocates
if not. `MPZ_REALLOC' is generally used for this.

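As an illustration of these invariants, here is a small self-contained C sketch. `toy_mpz' is a hypothetical stand-in mirroring the three fields described above, with an assumed 32-bit limb; it is not the real `mpz_t' (whose limb size depends on the build), and the helper functions are not MPIR routines.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t toy_limb_t;    /* assumed 32-bit limb */

/* Hypothetical mirror of the mpz_t fields.  */
typedef struct {
  int _mp_alloc;                /* limbs allocated at _mp_d (always >= 1) */
  int _mp_size;                 /* limb count, negated for negative values */
  toy_limb_t *_mp_d;            /* magnitude, least significant limb first */
} toy_mpz;

/* Set _mp_size from a provisional limb count n, stripping high zero
   limbs so the most significant limb is non-zero (or the size is 0). */
static void toy_normalize(toy_mpz *z, int n, int negative)
{
  assert(n <= z->_mp_alloc);
  while (n > 0 && z->_mp_d[n - 1] == 0)
    n--;
  z->_mp_size = negative ? -n : n;
}

/* Like mpz_get_ui: _mp_d[0] can be read unconditionally because one
   limb is always allocated, but it only counts if _mp_size != 0.  */
static toy_limb_t toy_get_low(const toy_mpz *z)
{
  toy_limb_t low = z->_mp_d[0];
  return z->_mp_size != 0 ? low : 0;
}

/* Signed value of a toy_mpz with at most 2 limbs, assuming the
   magnitude is below 2^63.  */
static int64_t toy_get_i64(const toy_mpz *z)
{
  int n = abs(z->_mp_size);
  uint64_t mag = 0;

  assert(n <= 2);
  for (int i = n - 1; i >= 0; i--)   /* most significant limb first */
    mag = (mag << 32) | z->_mp_d[i];
  return z->_mp_size < 0 ? -(int64_t) mag : (int64_t) mag;
}
```

Zero needs no special storage: only `_mp_size' is consulted, so whatever happens to be in `_mp_d[0]' is ignored.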
The various bitwise logical functions like `mpz_and' behave as if
negative values were twos complement. But sign and magnitude is always
used internally, and necessary adjustments are made during the
calculations. Sometimes this isn't pretty, but sign and magnitude are
best for other routines.

Some internal temporary variables are set up with `MPZ_TMP_INIT' and
these have `_mp_d' space obtained from `TMP_ALLOC' rather than the
memory allocation functions. Care is taken to ensure that these are
big enough that no reallocation is necessary (reallocating space that
did not come from the memory allocation functions would have
unpredictable consequences).

`_mp_size' and `_mp_alloc' are `int', although `mp_size_t' is
usually a `long'. This is done to make the fields just 32 bits on some
64-bit systems, thereby saving a few bytes of data space but still
providing plenty of range.

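A single-limb C sketch can show what "as if twos complement" means for the bitwise functions, and the kind of adjustment involved: convert each sign and magnitude operand to a twos complement pattern using -x = ~(x-1), combine, and convert back. `toy_sm', `to_twos', `from_twos' and `toy_and' are hypothetical names; MPIR's `mpz_and' works on whole limb vectors and is organized differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-limb sign and magnitude value; magnitudes are
   assumed to be below 2^63 so a 64-bit twos complement word holds them. */
typedef struct { int negative; uint64_t mag; } toy_sm;

/* Sign and magnitude -> twos complement bit pattern: -x == ~(x - 1). */
static uint64_t to_twos(toy_sm a)
{
  return a.negative ? ~(a.mag - 1) : a.mag;
}

/* Twos complement bit pattern -> sign and magnitude.  */
static toy_sm from_twos(uint64_t w)
{
  toy_sm r;
  r.negative = (w >> 63) != 0;
  r.mag = r.negative ? ~w + 1 : w;
  return r;
}

/* AND with mpz_and-like semantics: as if both operands were
   (sign-extended) twos complement values.  */
static toy_sm toy_and(toy_sm a, toy_sm b)
{
  assert(a.mag < ((uint64_t) 1 << 63) && b.mag < ((uint64_t) 1 << 63));
  return from_twos(to_twos(a) & to_twos(b));
}
```

For example, -6 AND -3 gives -8 and -6 AND 7 gives 2, matching twos complement arithmetic, even though operands and results are all held as sign and magnitude.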
File: mpir.info, Node: Rational Internals, Next: Float Internals, Prev: Integer Internals, Up: Internals

16.2 Rational Internals
=======================

`mpq_t' variables represent rationals using an `mpz_t' numerator and
denominator (*note Integer Internals::).

The canonical form adopted is denominator positive (and non-zero),
no common factors between numerator and denominator, and zero uniquely
represented as 0/1.

It's believed that casting out common factors at each stage of a
calculation is best in general. A GCD is an O(N^2) operation so it's
better to do a few small ones immediately than to delay and have to do
a big one later. Knowing the numerator and denominator have no common
factors can be used for example in `mpq_mul' to make only two cross
GCDs necessary, not four.

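The cross-GCD idea can be seen in a small C sketch using 64-bit integers in place of `mpz_t'. For canonical operands a/b and c/d we have gcd(a,b) = gcd(c,d) = 1, so only gcd(a,d) and gcd(c,b) can be non-trivial, and dividing them out leaves a product that is already canonical. `toy_q' and the functions are hypothetical names for this sketch, not MPIR's `mpq_mul', and overflow of the 64-bit values is not handled.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int64_t num, den; } toy_q;   /* hypothetical rational */

static int64_t gcd_i64(int64_t a, int64_t b)
{
  if (a < 0) a = -a;
  if (b < 0) b = -b;
  while (b != 0) {              /* Euclid's algorithm */
    int64_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

/* Canonical form: den > 0, gcd(num, den) = 1, zero as 0/1.  */
static void toy_q_canonicalize(toy_q *q)
{
  assert(q->den != 0);
  if (q->num == 0) {
    q->den = 1;
    return;
  }
  if (q->den < 0) {
    q->num = -q->num;
    q->den = -q->den;
  }
  int64_t g = gcd_i64(q->num, q->den);
  q->num /= g;
  q->den /= g;
}

/* Multiply canonical x = a/b by y = c/d using only the two cross
   GCDs; the result is canonical without a further GCD.  */
static toy_q toy_q_mul(toy_q x, toy_q y)
{
  int64_t g1 = gcd_i64(x.num, y.den);   /* gcd(a, d) */
  int64_t g2 = gcd_i64(y.num, x.den);   /* gcd(c, b) */
  toy_q r = { (x.num / g1) * (y.num / g2),
              (x.den / g2) * (y.den / g1) };
  return r;
}
```

For example 2/3 times 9/4 uses gcd(2,4) = 2 and gcd(9,3) = 3 to give 3/2 directly, instead of forming 18/12 and reducing it.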
This general approach to common factors is badly sub-optimal in the
presence of simple factorizations or little prospect for cancellation,
but MPIR has no way to know when this will occur. As per *note
Efficiency::, that's left to applications. The `mpq_t' framework might
still suit, with `mpq_numref' and `mpq_denref' for direct access to the
numerator and denominator, or of course `mpz_t' variables can be used
directly.

File: mpir.info, Node: Float Internals, Next: Raw Output Internals, Prev: Rational Internals, Up: Internals

16.3 Float Internals
====================

Efficient calculation is the primary aim of MPIR floats, and the use of
whole limbs and simple rounding facilitates this.

`mpf_t' floats have a variable-precision mantissa and a signed exponent
held in a single machine word. The mantissa is represented using sign
and magnitude.

        most                   least
     significant            significant
        limb                   limb

                               _mp_d
      |---- _mp_exp --->         |
       _____ _____ _____ _____ _____
      |_____|_____|_____|_____|_____|
                        . <------------ radix point

       <-------- _mp_size --------->

The fields are as follows.

`_mp_size'
The number of limbs currently in use, or the negative of that when
representing a negative value. Zero is represented by `_mp_size'
and `_mp_exp' both set to zero, and in that case the `_mp_d' data
is unused. (In the future `_mp_exp' might be undefined when
representing zero.)

`_mp_prec'
The precision of the mantissa, in limbs. In any calculation the
aim is to produce `_mp_prec' limbs of result (the most significant
being non-zero).

`_mp_d'
A pointer to the array of limbs which is the absolute value of the
mantissa. These are stored "little endian" as per the `mpn'
functions, so `_mp_d[0]' is the least significant limb and
`_mp_d[ABS(_mp_size)-1]' the most significant.

The most significant limb is always non-zero, but there are no
other restrictions on its value; in particular the highest 1 bit
can be anywhere within the limb.

`_mp_prec+1' limbs are allocated to `_mp_d', the extra limb being
for convenience (see below). There are no reallocations during a
calculation, only in a change of precision with `mpf_set_prec'.

`_mp_exp'
The exponent, in limbs, determining the location of the implied
radix point. Zero means the radix point is just above the most
significant limb. Positive values mean a radix point offset
towards the lower limbs and hence a value >= 1, as for example in
the diagram above. Negative exponents mean a radix point further
above the highest limb.

Naturally the exponent can be any value; it doesn't have to fall
within the limbs as the diagram shows, but can be a long way above
or a long way below. Limbs other than those included in the
`{_mp_d,_mp_size}' data are treated as zero.

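To make the exponent convention concrete, the following C sketch evaluates a float laid out as above: with n = ABS(`_mp_size'), limb `_mp_d[i]' has weight 2^(32*(`_mp_exp' - n + i)), and the value is the sign times the sum of the weighted limbs. `toy_mpf' is a hypothetical mirror of the fields with an assumed 32-bit limb, and the conversion to `double' is only for illustration; it is not taken from MPIR's `mpf_get_d'.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t toy_limb_t;      /* assumed 32-bit limb */

/* Hypothetical mirror of the mpf_t fields.  */
typedef struct {
  int _mp_prec;                   /* precision in limbs */
  int _mp_size;                   /* limbs in use, negated if negative */
  long _mp_exp;                   /* exponent, in limbs */
  toy_limb_t *_mp_d;              /* mantissa, least significant first */
} toy_mpf;

/* 2^(32*e) as a double (exact while within double's range).  */
static double limb_power(long e)
{
  double r = 1.0;
  for (; e > 0; e--)
    r *= 4294967296.0;
  for (; e < 0; e++)
    r /= 4294967296.0;
  return r;
}

/* Limb _mp_d[i] has weight 2^(32*(_mp_exp - n + i)), so with
   _mp_exp == 0 the most significant limb (i = n-1) sits just below
   the radix point.  Limbs outside {_mp_d,_mp_size} count as zero.  */
static double toy_mpf_get_d(const toy_mpf *f)
{
  int n = abs(f->_mp_size);
  double v = 0.0;

  assert(n == 0 || f->_mp_d != NULL);
  for (int i = 0; i < n; i++)
    v += (double) f->_mp_d[i] * limb_power(f->_mp_exp - n + i);
  return f->_mp_size < 0 ? -v : v;
}
```

With `_mp_exp' = 1 and two limbs {0x80000000, 3}, the radix point falls between the limbs, so the high limb 3 is the integer part and the low limb contributes 0.5, giving 3.5.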
`_mp_size' and `_mp_prec' are `int', although `mp_size_t' is usually
a `long'. This is done to make the fields just 32 bits on some 64-bit
systems, thereby saving a few bytes of data space but still providing
plenty of range.

The following points should be noted.

Low Zeros
The least significant limbs `_mp_d[0]' etc. can be zero, though
such low zeros can always be ignored. Routines likely to produce
low zeros check and avoid them to save time in subsequent
calculations, but for most routines they're quite unlikely and
aren't checked.

Mantissa Size Range
The `_mp_size' count of limbs in use can be less than `_mp_prec' if
the value can be represented in fewer limbs. This means low precision
values or small integers stored in a high precision `mpf_t' can
still be operated on efficiently.

`_mp_size' can also be greater than `_mp_prec'. Firstly, a value is
allowed to use all of the `_mp_prec+1' limbs available at `_mp_d',
and secondly, when `mpf_set_prec_raw' lowers `_mp_prec', it leaves
`_mp_size' unchanged and so the size can be arbitrarily bigger than
`_mp_prec'.

Rounding
All rounding is done on limb boundaries. Calculating `_mp_prec'
limbs with the high limb non-zero will ensure the application's
requested minimum precision is obtained.

The use of simple "trunc" rounding towards zero is efficient,
since there's no need to examine extra limbs and increment or
decrement.

Bit Shifts
Since the exponent is in limbs, there are no bit shifts in basic
operations like `mpf_add' and `mpf_mul'. When differing exponents
are encountered, all that's needed is to adjust pointers to line up
the relevant limbs.

Of course `mpf_mul_2exp' and `mpf_div_2exp' will require bit
shifts, but the choice is between an exponent in limbs which
requires shifts there, or one in bits which requires them almost
everywhere else.

Use of `_mp_prec+1' Limbs
The extra limb on `_mp_d' (`_mp_prec+1' rather than just
`_mp_prec') helps when an `mpf' routine might get a carry from its
operation. `mpf_add' for instance will do an `mpn_add' of
`_mp_prec' limbs. If there's no carry then that's the result, but
if there is a carry then it's stored in the extra limb of space and
`_mp_size' becomes `_mp_prec+1'.

Whenever `_mp_prec+1' limbs are held in a variable, the low limb
is not needed for the intended precision, only the `_mp_prec' high
limbs. But zeroing it out or moving the rest down is unnecessary.
Subsequent routines reading the value will simply take the high
limbs they need, and this will be `_mp_prec' if their target has
that same precision. This is no more than a pointer adjustment,
and must be checked anyway since the destination precision can be
different from that of the sources.

Copy functions like `mpf_set' will retain a full `_mp_prec+1' limbs
if available. This ensures that a variable which has `_mp_size'
equal to `_mp_prec+1' will get its full exact value copied.
Strictly speaking this is unnecessary since only `_mp_prec' limbs
are needed for the application's requested precision, but it's
considered that an `mpf_set' from one variable into another of the
same precision ought to produce an exact copy.

Application Precisions
`__GMPF_BITS_TO_PREC' converts an application requested precision
to an `_mp_prec'. The value in bits is rounded up to a whole limb,
then an extra limb is added since the most significant limb of
`_mp_d' is only guaranteed to be non-zero and therefore might
contain only one bit.

`__GMPF_PREC_TO_BITS' does the reverse conversion, and removes the
extra limb from `_mp_prec' before converting to bits. The net
effect of reading back with `mpf_get_prec' is simply the precision
rounded up to a multiple of `mp_bits_per_limb'.

Note that the extra limb added here, because the high limb is only
guaranteed non-zero, is in addition to the extra limb allocated to
`_mp_d'. For example, with a 32-bit limb, an application request for
250 bits will be rounded up to 8 limbs, then an extra limb added
because the high limb is only guaranteed non-zero, giving an
`_mp_prec' of 9. `_mp_d' then gets 10 limbs allocated. Reading back
with `mpf_get_prec' will take `_mp_prec', subtract 1 limb and
multiply by 32, giving 256 bits.

Strictly speaking, the fact the high limb has at least one bit
means that a float with, say, 3 limbs of 32 bits each will be
holding at least 65 bits, but for the purposes of `mpf_t' it's
considered simply to be 64 bits, a nice multiple of the limb size.

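The arithmetic of the worked example can be checked with a small C sketch. These helpers are hypothetical restatements of the rules described above (round up to whole limbs and add one limb; on the way back, drop that limb), not the actual `__GMPF_BITS_TO_PREC' and `__GMPF_PREC_TO_BITS' macros, whose exact definitions may differ in detail.

```c
#include <assert.h>

/* Bits requested by the application -> _mp_prec, in limbs.  */
static long toy_bits_to_prec(unsigned long bits, unsigned limb_bits)
{
  unsigned long limbs = (bits + limb_bits - 1) / limb_bits; /* round up */
  return (long) limbs + 1;   /* high limb may hold as little as 1 bit */
}

/* _mp_prec -> bits reported back (as by mpf_get_prec).  */
static unsigned long toy_prec_to_bits(long prec, unsigned limb_bits)
{
  assert(prec >= 1);
  return (unsigned long) (prec - 1) * limb_bits;   /* drop the extra limb */
}

/* Limbs allocated at _mp_d: one more again, for carries.  */
static long toy_alloc_limbs(long prec)
{
  return prec + 1;
}
```

The round trip 250 -> 9 -> 256 with 32-bit limbs reproduces the example above: the application gets at least the 250 bits it asked for, reported back as the next multiple of the limb size.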
File: mpir.info, Node: Raw Output Internals, Next: C++ Interface Internals, Prev: Float Internals, Up: Internals

16.4 Raw Output Internals
=========================

`mpz_out_raw' uses the following format.

     +------+------------------------+
     | size |       data bytes       |
     +------+------------------------+

The size is 4 bytes written most significant byte first, being the
number of subsequent data bytes, or the twos complement negative of
that when a negative integer is represented. The data bytes are the
absolute value of the integer, written most significant byte first.

The most significant data byte is always non-zero, so the output is
the same on all systems, irrespective of limb size.

In GMP 1, leading zero bytes were written to pad the data bytes to a
multiple of the limb size. `mpz_inp_raw' will still accept this, for
compatibility.

The use of "big endian" for both the size and data fields is
deliberate; it makes the data easy to read in a hex dump of a file.
Unfortunately it also means that the limb data must be reversed when
reading or writing, so neither a big endian nor little endian system
can just read and write `_mp_d'.

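A minimal C sketch of the writer side of this format, for integers whose magnitude fits in 64 bits, may help. `toy_raw_encode' is a hypothetical function, not `mpz_out_raw'; it takes the sign and magnitude separately, mirroring the `mpz_t' representation, and fills a caller-supplied buffer instead of writing to a stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode sign and magnitude in the raw format: a 4-byte big endian
   size (negated for negative values) then the magnitude bytes, most
   significant first, with no leading zero bytes.  buf must hold at
   least 12 bytes.  Returns the number of bytes written.  */
static size_t toy_raw_encode(int negative, uint64_t mag, unsigned char *buf)
{
  unsigned char data[8];
  size_t nbytes = 0;

  assert(buf != NULL);
  while (mag != 0) {            /* least significant byte first */
    data[nbytes++] = (unsigned char) (mag & 0xFF);
    mag >>= 8;
  }
  int32_t size = negative ? -(int32_t) nbytes : (int32_t) nbytes;
  uint32_t usize = (uint32_t) size;   /* twos complement bit pattern */
  buf[0] = (unsigned char) (usize >> 24);
  buf[1] = (unsigned char) (usize >> 16);
  buf[2] = (unsigned char) (usize >> 8);
  buf[3] = (unsigned char) usize;
  for (size_t i = 0; i < nbytes; i++)
    buf[4 + i] = data[nbytes - 1 - i];  /* reverse: most significant first */
  return 4 + nbytes;
}
```

For example, -258 (magnitude 0x0102) comes out as the six bytes FF FF FF FE 01 02, and zero as just the four size bytes 00 00 00 00.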
File: mpir.info, Node: C++ Interface Internals, Prev: Raw Output Internals, Up: Internals

16.5 C++ Interface Internals
============================

A system of expression templates is used to ensure something like
`a=b+c' turns into a simple call to `mpz_add' etc. For `mpf_class' the
scheme also ensures the precision of the final destination is used for
any temporaries within a statement like `f=w*x+y*z'. These are
important features which a naive implementation cannot provide.

A simplified description of the scheme follows. The true scheme is
complicated by the fact that expressions have different return types.
For detailed information, refer to the source code.

To perform an operation, say, addition, we first define a "function
object" evaluating it,

     struct __gmp_binary_plus
     {
       static void eval(mpf_t f, mpf_t g, mpf_t h) { mpf_add(f, g, h); }
     };

And an "additive expression" object,

     __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >
     operator+(const mpf_class &f, const mpf_class &g)
     {
       return __gmp_expr
         <__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >(f, g);
     }

The seemingly redundant `__gmp_expr<__gmp_binary_expr<...>>' is used
to encapsulate any possible kind of expression into a single template
type. In fact even `mpf_class' etc. are `typedef' specializations of
`__gmp_expr'.

Next we define assignment of `__gmp_expr' to `mpf_class'.

     template <class T>
     mpf_class & mpf_class::operator=(const __gmp_expr<T> &expr)
     {
       expr.eval(this->get_mpf_t(), this->precision());
       return *this;
     }

     template <class Op>
     void __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, Op> >::eval
     (mpf_t f, unsigned long int precision)
     {
       Op::eval(f, expr.val1.get_mpf_t(), expr.val2.get_mpf_t());
     }

where `expr.val1' and `expr.val2' are references to the expression's
operands (here `expr' is the `__gmp_binary_expr' stored within the
`__gmp_expr').

This way, the expression is actually evaluated only at the time of
assignment, when the required precision (that of `f') is known.
Furthermore the target `mpf_t' is now available, so we can call
`mpf_add' directly with `f' as the output argument.

Compound expressions are handled by defining operators taking
subexpressions as their arguments, like this:

     template <class T, class U>
     __gmp_expr
     <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
     operator+(const __gmp_expr<T> &expr1, const __gmp_expr<U> &expr2)
     {
       return __gmp_expr
         <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
         (expr1, expr2);
     }

And the corresponding specializations of `__gmp_expr::eval':

     template <class T, class U, class Op>
     void __gmp_expr
     <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, Op> >::eval
     (mpf_t f, unsigned long int precision)
     {
       // declare two temporaries
       mpf_class temp1(expr.val1, precision), temp2(expr.val2, precision);
       Op::eval(f, temp1.get_mpf_t(), temp2.get_mpf_t());
     }

The expression is thus recursively evaluated to any level of
complexity and all subexpressions are evaluated to the precision of `f'.

File: mpir.info, Node: Contributors, Next: References, Prev: Internals, Up: Top

Appendix A Contributors
***********************

Torbjorn Granlund wrote the original GMP library and is still
developing and maintaining it. Several other individuals and
organizations have contributed to GMP in various ways. Here is a list
in chronological order:

Gunnar Sjoedin and Hans Riesel helped with mathematical problems in
early versions of the library.

Richard Stallman contributed to the interface design and revised the
first version of this manual.

Brian Beuning and Doug Lea helped with testing of early versions of
the library and made creative suggestions.

John Amanatides of York University in Canada contributed the function
`mpz_probab_prime_p'.

Paul Zimmermann of Inria sparked the development of GMP 2, with his
comparisons between bignum packages.

Ken Weber (Kent State University, Universidade Federal do Rio Grande
do Sul) contributed `mpz_gcd', `mpz_divexact', `mpn_gcd', and
`mpn_bdivmod', partially supported by CNPq (Brazil) grant 301314194-2.

Per Bothner of Cygnus Support helped to set up GMP to use Cygnus'
configure. He has also made valuable suggestions and tested numerous
intermediary releases.

Joachim Hollman was involved in the design of the `mpf' interface,
and in the `mpz' design revisions for version 2.

Bennet Yee contributed the initial versions of `mpz_jacobi' and
`mpz_legendre'.

Andreas Schwab contributed the files `mpn/m68k/lshift.S' and
`mpn/m68k/rshift.S' (now in `.asm' form).

The development of the floating point functions of GNU MP 2 was
supported in part by the ESPRIT-BRA (Basic Research Activities) 6846
project POSSO (POlynomial System SOlving).

GNU MP 2 was finished and released by SWOX AB, SWEDEN, in
cooperation with the IDA Center for Computing Sciences, USA.

Robert Harley of Inria, France and David Seal of ARM, England,
suggested clever improvements for population count.

Robert Harley also wrote highly optimized Karatsuba and 3-way Toom
multiplication functions for GMP 3. He also contributed the ARM
assembly code.

Torsten Ekedahl of the Mathematical department of Stockholm
University provided significant inspiration during several phases of
the GMP development. His mathematical expertise helped improve several
algorithms.

Paul Zimmermann wrote the Divide and Conquer division code, the REDC
code, the REDC-based mpz_powm code, the FFT multiply code, and the
Karatsuba square root code. He also rewrote the Toom3 code for GMP
4.2. The ECMNET project Paul is organizing was a driving force behind
many of the optimizations in GMP 3.

Linus Nordberg wrote the new configure system based on autoconf and
implemented the new random functions.

Kent Boortz made the Mac OS 9 port.

Kevin Ryde worked on a number of things: optimized x86 code, m4 asm
macros, parameter tuning, speed measuring, the configure system,
function inlining, divisibility tests, bit scanning, Jacobi symbols,
Fibonacci and Lucas number functions, printf and scanf functions, perl
interface, demo expression parser, the algorithms chapter in the
manual, `gmpasm-mode.el', and various miscellaneous improvements
elsewhere.

Steve Root helped write the optimized alpha 21264 assembly code.

Gerardo Ballabio wrote the `gmpxx.h' C++ class interface and the C++
`istream' input routines.

GNU MP 4 was finished and released by Torbjorn Granlund and Kevin
Ryde. Torbjorn's work was partially funded by the IDA Center for
Computing Sciences, USA.

Jason Moxham rewrote `mpz_fac_ui'.

Pedro Gimeno implemented the Mersenne Twister and made other random
number improvements.

(This list is chronological, not ordered by significance. If you
have contributed to GMP/MPIR but are not listed above, please tell
`http://groups.google.com/group/mpir-devel' about the omission!)

Thanks go to Hans Thorsen for donating an SGI system for the GMP
test system environment.

In 2008 GMP was forked and gave rise to the MPIR (Multiple Precision
Integers and Rationals) project. The following people have contributed
to the MPIR project.

William Hart did work on the build system and helped get the first
release working on numerous systems, including adding build support for
new assembly patches that compile using yasm. He provided mpn
implementations of Toom 4 and 7 routines. He also wrote an extended GCD
version of Niels Moller's fast GCD patches and a fast `mpn_tdiv_q'
routine.

Brian Gladman wrote and maintains MSVC project files so the project
can build on MSVC. He also did the initial conversion of Pierrick
Gaudry's and Jason Martin's assembly patches to Intel format. He
rewrote the benchmark program in C and developed MSVC ports of tune,
speed, try and the benchmark.

Pierrick Gaudry wrote some fast assembly support for AMD 64.

Jason Martin wrote some fast assembly patches for Core 2 and
converted them to Intel format. He also did the initial merge of Niels
Moller's fast GCD patches.

Gonzalo Tornaria helped patch config.guess and associated files to
distinguish modern processors. He also patched mpirbench.

Michael Abshoff helped resolve some build issues on various
platforms. He served for a while as release manager for the MPIR
project.

Mariah Lennox contributed patches to mpirbench and various build
failure reports.

Niels Moller wrote the fast ngcd code for computing integer GCD.

Jason Moxham contributed dramatic speed improvements for x86_64
platforms. He refactored the CPU detection code, improved the speed
program and contributed many new assembler functions, including
division functions. He contributed improved root code and mulhi and
mullo routines. He implemented Peter Montgomery's single limb remainder
code.

Pierrick Gaudry provided initial AMD 64 assembly support and revised
the FFT code.

Paul Zimmermann provided an mpz implementation of Toom 4, wrote much
of the FFT code and contributed invert.c for computing precomputed
inverses.

Alexander Kruppa revised the FFT code.

Torbjorn Granlund revised the FFT code.

Marco Bodrato wrote an mpz implementation of the Toom 7 code.

Robert Gerbicz contributed fast factorial code.

David Harvey wrote fast middle product code and divide and conquer
approximate quotient code.

T. R. Nicely wrote primality tests used in the benchmark code.

Jeff Gilchrist assisted with the porting of T. R. Nicely's primality
code to MPIR.

Peter Shrimpton wrote the BPSW primality test used up to
GMP_LIMB_BITS.

File: mpir.info, Node: References, Next: GNU Free Documentation License, Prev: Contributors, Up: Top
|
||
|
||
Appendix B References
|
||
*********************
|
||
|
||
B.1 Books
|
||
=========
|
||
|
||
* Jonathan M. Borwein and Peter B. Borwein, "Pi and the AGM: A Study
|
||
in Analytic Number Theory and Computational Complexity", Wiley,
|
||
1998.
|
||
|
||
* Henri Cohen, "A Course in Computational Algebraic Number Theory",
|
||
Graduate Texts in Mathematics number 138, Springer-Verlag, 1993.
|
||
`http://www.math.u-bordeaux.fr/~cohen/'
|
||
|
||
* Richard Crandall, Carl Pomerance, "Prime Numbers: A Computational
|
||
Perspective" 2nd edition, Springer, 2005.
|
||
|
||
* Donald E. Knuth, "The Art of Computer Programming", volume 2,
|
||
"Seminumerical Algorithms", 3rd edition, Addison-Wesley, 1998.
|
||
`http://www-cs-faculty.stanford.edu/~knuth/taocp.html'
|
||
|
||
* John D. Lipson, "Elements of Algebra and Algebraic Computing", The
|
||
Benjamin Cummings Publishing Company Inc, 1981.
|
||
|
||
* Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone,
|
||
"Handbook of Applied Cryptography",
|
||
`http://www.cacr.math.uwaterloo.ca/hac/'
|
||
|
||
* Richard M. Stallman, "Using and Porting GCC", Free Software
|
||
Foundation, 1999, available online
|
||
`http://gcc.gnu.org/onlinedocs/', and in the GCC package
|
||
`ftp://ftp.gnu.org/gnu/gcc/'
|
||
|
||
B.2 Papers
|
||
==========
|
||
|
||
* Dan Bernstein, "Detecting perfect powers in essentially linear
|
||
time", Math. Comp. (67) pp. 1253-1283, 1998.
|
||
|
||
* Yves Bertot, Nicolas Magaud and Paul Zimmermann, "A Proof of GMP
|
||
Square Root", Journal of Automated Reasoning, volume 29, 2002, pp.
|
||
225-252. Also available online as INRIA Research Report 4475,
|
||
June 2001, `http://www.inria.fr/rrrt/rr-4475.html'
|
||
|
||
* Marco Bodrato, Alberto Zanoni, "Integer and Polynomial
|
||
Multiplication: Towards optimal Toom-Cook Matrices", ISAAC 2007
|
||
Proceedings, Ontario, Canada, July 29 - August 1, 2007, ACM Press.
|
||
Available online at `http://ln.bodrato.it/issac2007_pdf'
|
||
|
||
* Christoph Burnikel and Joachim Ziegler, "Fast Recursive Division",
|
||
Max-Planck-Institut fuer Informatik Research Report MPI-I-98-1-022,
|
||
`http://data.mpi-sb.mpg.de/internet/reports.nsf/NumberView/1998-1-022'
|
||
|
||
* Agner Fog, "Software optimization resources", online at
|
||
`http://www.agner.org/optimize/'
|
||
|
||
* Pierrick Gaudry, Alexander Kruppa, Paul Zimmermann, "A GMP-based
|
||
implementation of Schoenhage-Strassen's large integer
|
||
multiplication algorithm", ISAAC 2007 Proceedings, Ontario,
|
||
Canada, July 29 - August 1, 2007, pp. 167-174, ACM Press. Full
|
||
text available at
|
||
`http://hal.inria.fr/docs/00/14/86/20/PDF/fft.final.pdf'
|
||
|
||
* Torbjorn Granlund and Peter L. Montgomery, "Division by Invariant
|
||
Integers using Multiplication", in Proceedings of the SIGPLAN
|
||
PLDI'94 Conference, June 1994. Also available
|
||
`ftp://ftp.cwi.nl/pub/pmontgom/divcnst.psa4.gz' (and .psl.gz).
|
||
|
||
* David Harvey, "The Karatsuba middle product for integers",
|
||
(preprint), 2009. Available at
|
||
`http://www.cims.nyu.edu/~harvey/mulmid/mulmid.pdf'
|
||
|
||
* Tudor Jebelean, "An algorithm for exact division", Journal of
|
||
Symbolic Computation, volume 15, 1993, pp. 169-180. Research
|
||
report version available
|
||
`ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-35.ps.gz'
|
||
|
||
* Tudor Jebelean, "Exact Division with Karatsuba Complexity -
|
||
Extended Abstract", RISC-Linz technical report 96-31,
|
||
`ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-31.ps.gz'
|
||
|
||
* Tudor Jebelean, "Practical Integer Division with Karatsuba
|
||
Complexity", ISSAC 97, pp. 339-341. Technical report available
|
||
`ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-29.ps.gz'
|
||
|
||
* Tudor Jebelean, "A Generalization of the Binary GCD Algorithm",
|
||
ISSAC 93, pp. 111-116. Technical report version available
|
||
`ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1993/93-01.ps.gz'
|
||
|
||
* Tudor Jebelean, "A Double-Digit Lehmer-Euclid Algorithm for
|
||
Finding the GCD of Long Integers", Journal of Symbolic
|
||
Computation, volume 19, 1995, pp. 145-157. Technical report
|
||
version also available
|
||
`ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-69.ps.gz'
|
||
|
||
* Werner Krandick, Jeremy R. Johnson, "Efficient Multiprecision
|
||
Floating Point Multiplication with Exact Rounding", Technical
|
||
Report, RISC Linz, 1993, available at
|
||
`ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1993/93-76.ps.gz'

* Werner Krandick and Tudor Jebelean, "Bidirectional Exact Integer
Division", Journal of Symbolic Computation, volume 21, 1996, pp.
441-455. Early technical report version also available at
`ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1994/94-50.ps.gz'

* Makoto Matsumoto and Takuji Nishimura, "Mersenne Twister: A
623-dimensionally equidistributed uniform pseudorandom number
generator", ACM Transactions on Modeling and Computer Simulation,
volume 8, January 1998, pp. 3-30. Available online at
`http://www.math.keio.ac.jp/~nisimura/random/doc/mt.ps.gz' (or
.pdf)

* R. Moenck and A. Borodin, "Fast Modular Transforms via Division",
Proceedings of the 13th Annual IEEE Symposium on Switching and
Automata Theory, October 1972, pp. 90-96. Reprinted as "Fast
Modular Transforms", Journal of Computer and System Sciences,
volume 8, number 3, June 1974, pp. 366-386.

* Niels Moller, "On Schoenhage's algorithm and subquadratic integer
GCD computation", Math. Comp. 2007. Available online at
`http://www.lysator.liu.se/~nisse/archive/S0025-5718-07-02017-0.pdf'

* Peter L. Montgomery, "Modular Multiplication Without Trial
Division", in Mathematics of Computation, volume 44, number 170,
April 1985.

* Thom Mulders, "On short multiplications and divisions", Appl.
Algebra Engrg. Comm. Comput. 11 (2000), no. 1, pp. 69-88. Tech.
report No. 276, Dept. of Comp. Sci., ETH Zurich, Nov 1997,
available online at
`ftp://ftp.inf.ethz.ch/pub/publications/tech-reports/2xx/276.pdf'

* Arnold Scho"nhage and Volker Strassen, "Schnelle Multiplikation
grosser Zahlen", Computing 7, 1971, pp. 281-292.

* Kenneth Weber, "The accelerated integer GCD algorithm", ACM
Transactions on Mathematical Software, volume 21, number 1, March
1995, pp. 111-122.

* Paul Zimmermann, "Karatsuba Square Root", INRIA Research Report
3805, November 1999, `http://www.inria.fr/rrrt/rr-3805.html'

* Paul Zimmermann, "A Proof of GMP Fast Division and Square Root
Implementations",
`http://www.loria.fr/~zimmerma/papers/proof-div-sqrt.ps.gz'

* Dan Zuras, "On Squaring and Multiplying Large Integers", ARITH-11:
IEEE Symposium on Computer Arithmetic, 1993, pp. 260-271.
Reprinted as "More on Multiplying and Squaring Large Integers",
IEEE Transactions on Computers, volume 43, number 8, August 1994,
pp. 899-908.

File: mpir.info, Node: GNU Free Documentation License, Next: Concept Index, Prev: References, Up: Top

Appendix C GNU Free Documentation License
*****************************************

Version 1.2, November 2002

Copyright (C) 2000,2001,2002 Free Software Foundation, Inc.
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

0. PREAMBLE

The purpose of this License is to make a manual, textbook, or other
functional and useful document "free" in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or
noncommercially. Secondarily, this License preserves for the
author and publisher a way to get credit for their work, while not
being considered responsible for modifications made by others.

This License is a kind of "copyleft", which means that derivative
works of the document must themselves be free in the same sense.
It complements the GNU General Public License, which is a copyleft
license designed for free software.

We have designed this License in order to use it for manuals for
free software, because free software needs free documentation: a
free program should come with manuals providing the same freedoms
that the software does. But this License is not limited to
software manuals; it can be used for any textual work, regardless
of subject matter or whether it is published as a printed book.
We recommend this License principally for works whose purpose is
instruction or reference.

1. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium,
that contains a notice placed by the copyright holder saying it
can be distributed under the terms of this License. Such a notice
grants a world-wide, royalty-free license, unlimited in duration,
to use that work under the conditions stated herein. The
"Document", below, refers to any such manual or work. Any member
of the public is a licensee, and is addressed as "you". You
accept the license if you copy, modify or distribute the work in a
way requiring permission under copyright law.

A "Modified Version" of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section
of the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall
subject (or to related matters) and contains nothing that could
fall directly within that overall subject. (Thus, if the Document
is in part a textbook of mathematics, a Secondary Section may not
explain any mathematics.) The relationship could be a matter of
historical connection with the subject or with related matters, or
of legal, commercial, philosophical, ethical or political position
regarding them.

The "Invariant Sections" are certain Secondary Sections whose
titles are designated, as being those of Invariant Sections, in
the notice that says that the Document is released under this
License. If a section does not fit the above definition of
Secondary then it is not allowed to be designated as Invariant.
The Document may contain zero Invariant Sections. If the Document
does not identify any Invariant Sections then there are none.

The "Cover Texts" are certain short passages of text that are
listed, as Front-Cover Texts or Back-Cover Texts, in the notice
that says that the Document is released under this License. A
Front-Cover Text may be at most 5 words, and a Back-Cover Text may
be at most 25 words.

A "Transparent" copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images
composed of pixels) generic paint programs or (for drawings) some
widely available drawing editor, and that is suitable for input to
text formatters or for automatic translation to a variety of
formats suitable for input to text formatters. A copy made in an
otherwise Transparent file format whose markup, or absence of
markup, has been arranged to thwart or discourage subsequent
modification by readers is not Transparent. An image format is
not Transparent if used for any substantial amount of text. A
copy that is not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain
ASCII without markup, Texinfo input format, LaTeX input format,
SGML or XML using a publicly available DTD, and
standard-conforming simple HTML, PostScript or PDF designed for
human modification. Examples of transparent image formats include
PNG, XCF and JPG. Opaque formats include proprietary formats that
can be read and edited only by proprietary word processors, SGML or
XML for which the DTD and/or processing tools are not generally
available, and the machine-generated HTML, PostScript or PDF
produced by some word processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the
material this License requires to appear in the title page. For
works in formats which do not have any title page as such, "Title
Page" means the text near the most prominent appearance of the
work's title, preceding the beginning of the body of the text.

A section "Entitled XYZ" means a named subunit of the Document
whose title either is precisely XYZ or contains XYZ in parentheses
following text that translates XYZ in another language. (Here XYZ
stands for a specific section name mentioned below, such as
"Acknowledgements", "Dedications", "Endorsements", or "History".)
To "Preserve the Title" of such a section when you modify the
Document means that it remains a section "Entitled XYZ" according
to this definition.

The Document may include Warranty Disclaimers next to the notice
which states that this License applies to the Document. These
Warranty Disclaimers are considered to be included by reference in
this License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and
has no effect on the meaning of this License.

2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License
applies to the Document are reproduced in all copies, and that you
add no other conditions whatsoever to those of this License. You
may not use technical measures to obstruct or control the reading
or further copying of the copies you make or distribute. However,
you may accept compensation in exchange for copies. If you
distribute a large enough number of copies you must also follow
the conditions in section 3.

You may also lend copies, under the same conditions stated above,
and you may publicly display copies.

3. COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly
have printed covers) of the Document, numbering more than 100, and
the Document's license notice requires Cover Texts, you must
enclose the copies in covers that carry, clearly and legibly, all
these Cover Texts: Front-Cover Texts on the front cover, and
Back-Cover Texts on the back cover. Both covers must also clearly
and legibly identify you as the publisher of these copies. The
front cover must present the full title with all words of the
title equally prominent and visible. You may add other material
on the covers in addition. Copying with changes limited to the
covers, as long as they preserve the title of the Document and
satisfy these conditions, can be treated as verbatim copying in
other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto
adjacent pages.

If you publish or distribute Opaque copies of the Document
numbering more than 100, you must either include a
machine-readable Transparent copy along with each Opaque copy, or
state in or with each Opaque copy a computer-network location from
which the general network-using public has access to download
using public-standard network protocols a complete Transparent
copy of the Document, free of added material. If you use the
latter option, you must take reasonably prudent steps, when you
begin distribution of Opaque copies in quantity, to ensure that
this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you
distribute an Opaque copy (directly or through your agents or
retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of
the Document well before redistributing any large number of
copies, to give them a chance to provide you with an updated
version of the Document.

4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document
under the conditions of sections 2 and 3 above, provided that you
release the Modified Version under precisely this License, with
the Modified Version filling the role of the Document, thus
licensing distribution and modification of the Modified Version to
whoever possesses a copy of it. In addition, you must do these
things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title
distinct from that of the Document, and from those of
previous versions (which should, if there were any, be listed
in the History section of the Document). You may use the
same title as a previous version if the original publisher of
that version gives permission.

B. List on the Title Page, as authors, one or more persons or
entities responsible for authorship of the modifications in
the Modified Version, together with at least five of the
principal authors of the Document (all of its principal
authors, if it has fewer than five), unless they release you
from this requirement.

C. State on the Title page the name of the publisher of the
Modified Version, as the publisher.

D. Preserve all the copyright notices of the Document.

E. Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.

F. Include, immediately after the copyright notices, a license
notice giving the public permission to use the Modified
Version under the terms of this License, in the form shown in
the Addendum below.

G. Preserve in that license notice the full lists of Invariant
Sections and required Cover Texts given in the Document's
license notice.

H. Include an unaltered copy of this License.

I. Preserve the section Entitled "History", Preserve its Title,
and add to it an item stating at least the title, year, new
authors, and publisher of the Modified Version as given on
the Title Page. If there is no section Entitled "History" in
the Document, create one stating the title, year, authors,
and publisher of the Document as given on its Title Page,
then add an item describing the Modified Version as stated in
the previous sentence.

J. Preserve the network location, if any, given in the Document
for public access to a Transparent copy of the Document, and
likewise the network locations given in the Document for
previous versions it was based on. These may be placed in
the "History" section. You may omit a network location for a
work that was published at least four years before the
Document itself, or if the original publisher of the version
it refers to gives permission.

K. For any section Entitled "Acknowledgements" or "Dedications",
Preserve the Title of the section, and preserve in the
section all the substance and tone of each of the contributor
acknowledgements and/or dedications given therein.

L. Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles. Section numbers
or the equivalent are not considered part of the section
titles.

M. Delete any section Entitled "Endorsements". Such a section
may not be included in the Modified Version.

N. Do not retitle any existing section to be Entitled
"Endorsements" or to conflict in title with any Invariant
Section.

O. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no
material copied from the Document, you may at your option
designate some or all of these sections as invariant. To do this,
add their titles to the list of Invariant Sections in the Modified
Version's license notice. These titles must be distinct from any
other section titles.

You may add a section Entitled "Endorsements", provided it contains
nothing but endorsements of your Modified Version by various
parties--for example, statements of peer review or that the text
has been approved by an organization as the authoritative
definition of a standard.

You may add a passage of up to five words as a Front-Cover Text,
and a passage of up to 25 words as a Back-Cover Text, to the end
of the list of Cover Texts in the Modified Version. Only one
passage of Front-Cover Text and one of Back-Cover Text may be
added by (or through arrangements made by) any one entity. If the
Document already includes a cover text for the same cover,
previously added by you or by arrangement made by the same entity
you are acting on behalf of, you may not add another; but you may
replace the old one, on explicit permission from the previous
publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this
License give permission to use their names for publicity for or to
assert or imply endorsement of any Modified Version.

5. COMBINING DOCUMENTS

You may combine the Document with other documents released under
this License, under the terms defined in section 4 above for
modified versions, provided that you include in the combination
all of the Invariant Sections of all of the original documents,
unmodified, and list them all as Invariant Sections of your
combined work in its license notice, and that you preserve all
their Warranty Disclaimers.

The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy. If there are multiple Invariant Sections with the same name
but different contents, make the title of each such section unique
by adding at the end of it, in parentheses, the name of the
original author or publisher of that section if known, or else a
unique number. Make the same adjustment to the section titles in
the list of Invariant Sections in the license notice of the
combined work.

In the combination, you must combine any sections Entitled
"History" in the various original documents, forming one section
Entitled "History"; likewise combine any sections Entitled
"Acknowledgements", and any sections Entitled "Dedications". You
must delete all sections Entitled "Endorsements."

6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other
documents released under this License, and replace the individual
copies of this License in the various documents with a single copy
that is included in the collection, provided that you follow the
rules of this License for verbatim copying of each of the
documents in all other respects.

You may extract a single document from such a collection, and
distribute it individually under this License, provided you insert
a copy of this License into the extracted document, and follow
this License in all other respects regarding verbatim copying of
that document.

7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other
separate and independent documents or works, in or on a volume of
a storage or distribution medium, is called an "aggregate" if the
copyright resulting from the compilation is not used to limit the
legal rights of the compilation's users beyond what the individual
works permit. When the Document is included in an aggregate, this
License does not apply to the other works in the aggregate which
are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half
of the entire aggregate, the Document's Cover Texts may be placed
on covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic
form. Otherwise they must appear on printed covers that bracket
the whole aggregate.

8. TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section
4. Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections. You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also
include the original English version of this License and the
original versions of those notices and disclaimers. In case of a
disagreement between the translation and the original version of
this License or a notice or disclaimer, the original version will
prevail.

If a section in the Document is Entitled "Acknowledgements",
"Dedications", or "History", the requirement (section 4) to
Preserve its Title (section 1) will typically require changing the
actual title.

9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document
except as expressly provided for under this License. Any other
attempt to copy, modify, sublicense or distribute the Document is
void, and will automatically terminate your rights under this
License. However, parties who have received copies, or rights,
from you under this License will not have their licenses
terminated so long as such parties remain in full compliance.

10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of
the GNU Free Documentation License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns. See
`http://www.gnu.org/copyleft/'.

Each version of the License is given a distinguishing version
number. If the Document specifies that a particular numbered
version of this License "or any later version" applies to it, you
have the option of following the terms and conditions either of
that specified version or of any later version that has been
published (not as a draft) by the Free Software Foundation. If
the Document does not specify a version number of this License,
you may choose any version ever published (not as a draft) by the
Free Software Foundation.

C.1 ADDENDUM: How to use this License for your documents
========================================================

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and license
notices just after the title page:

Copyright (C) YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in the section entitled ``GNU
Free Documentation License''.

If you have Invariant Sections, Front-Cover Texts and Back-Cover
Texts, replace the "with...Texts." line with this:

with the Invariant Sections being LIST THEIR TITLES, with
the Front-Cover Texts being LIST, and with the Back-Cover Texts
being LIST.

If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License, to
permit their use in free software.

File: mpir.info, Node: Concept Index, Next: Function Index, Prev: GNU Free Documentation License, Up: Top

Concept Index
*************