\input texinfo @c -*-texinfo-*-
|
|
@c %**start of header
|
|
@setfilename gmp.info
|
|
@include version.texi
|
|
@settitle MPIR @value{VERSION}
|
|
@synindex tp fn
|
|
@iftex
|
|
@afourpaper
|
|
@end iftex
|
|
@comment %**end of header
|
|
|
|
@copying
|
|
This manual describes how to install and use MPIR, the Multiple Precision Integers and Rationals
|
|
library, version @value{VERSION}.
|
|
|
|
Copyright 1991, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
|
|
2003, 2004, 2005, 2006 Free Software Foundation, Inc.
|
|
|
|
Copyright 2008 William Hart
|
|
|
|
Permission is granted to copy, distribute and/or modify this document under
|
|
the terms of the GNU Free Documentation License, Version 1.2 or any later
|
|
version published by the Free Software Foundation; with no Invariant Sections,
|
|
with the Front-Cover Texts being ``A GNU Manual'', and with the Back-Cover
|
|
Texts being ``You have freedom to copy and modify this GNU Manual, like GNU
|
|
software''. A copy of the license is included in
|
|
@ref{GNU Free Documentation License}.
|
|
@end copying
|
|
@c Note the @ref above must be on one line, a line break in an @ref within
|
|
@c @copying will bomb in recent texinfo.tex (eg. 2004-04-07.08 which comes
|
|
@c with texinfo 4.7), with messages about missing @endcsname.
|
|
|
|
|
|
@c Texinfo version 4.2 or up will be needed to process this file.
|
|
@c
|
|
@c The version number and edition number are taken from version.texi provided
|
|
@c by automake (note that it's regenerated only if you configure with
|
|
@c --enable-maintainer-mode).
|
|
@c
|
|
@c Notes discussing the present version number of GMP/MPIR in relation to previous
|
|
@c ones (for instance in the "Compatibility" section) must be updated
|
|
@c manually though.
|
|
@c
|
|
@c @cindex entries have been made for function categories and programming
|
|
@c topics. The "mpn" section is not included in this, because a beginner
|
|
@c looking for "GCD" or something is only going to be confused by pointers to
|
|
@c low level routines.
|
|
@c
|
|
@c @cindex entries are present for processors and systems when there's
|
|
@c particular notes concerning them, but not just for everything MPIR
|
|
@c supports.
|
|
@c
|
|
@c Index entries for files use @code rather than @file, @samp or @option,
|
|
@c since the latter come out with quotes in TeX, which are nice in the text
|
|
@c but don't look so good in index columns.
|
|
@c
|
|
@c Tex:
|
|
@c
|
|
@c A suitable texinfo.tex is supplied, a newer one should work equally well.
|
|
@c
|
|
@c HTML:
|
|
@c
|
|
@c Nothing special is done for links to external manuals, they just come out
|
|
@c in the usual makeinfo style, eg. "../libc/Locales.html". If you have
|
|
@c local copies of such manuals then this is a good thing, if not then you
|
|
@c may want to search-and-replace to some online source.
|
|
@c
|
|
|
|
@dircategory GNU libraries
|
|
@direntry
|
|
* mpir: (mpir). MPIR Multiple Precision Integers and Rationals Library.
|
|
@end direntry
|
|
|
|
@c html <meta name="description" content="...">
|
|
@documentdescription
|
|
How to install and use the MPIR multiple precision arithmetic library, version @value{VERSION}.
|
|
@end documentdescription
|
|
|
|
@c smallbook
|
|
@finalout
|
|
@setchapternewpage on
|
|
|
|
@ifnottex
|
|
@node Top, Copying, (dir), (dir)
|
|
@top MPIR
|
|
@end ifnottex
|
|
|
|
@iftex
|
|
@titlepage
|
|
@title MPIR
|
|
@subtitle The Multiple Precision Integers and Rationals Library
|
|
@subtitle Edition @value{EDITION}
|
|
@subtitle @value{UPDATED}
|
|
|
|
@c @author Original version by Torbj@"orn Granlund, Swox AB - modified by William Hart
|
|
@c @email{goodwillhart@gmail.com}
|
|
|
|
@c Include the Distribution inside the titlepage so
|
|
@c that headings are turned off.
|
|
|
|
@tex
|
|
\global\parindent=0pt
|
|
\global\parskip=8pt
|
|
\global\baselineskip=13pt
|
|
@end tex
|
|
|
|
@page
|
|
@vskip 0pt plus 1filll
|
|
@end iftex
|
|
|
|
@insertcopying
|
|
@ifnottex
|
|
@sp 1
|
|
@end ifnottex
|
|
|
|
@iftex
|
|
@end titlepage
|
|
@headings double
|
|
@end iftex
|
|
|
|
@c Don't bother with contents for html, the menus seem adequate.
|
|
@ifnothtml
|
|
@contents
|
|
@end ifnothtml
|
|
|
|
@menu
* Copying::                    MPIR Copying Conditions (LGPL).
* Introduction to MPIR::       Brief introduction to MPIR.
* Installing MPIR::            How to configure and compile the MPIR library.
* MPIR Basics::                What every MPIR user should know.
* Reporting Bugs::             How to usefully report bugs.
* Integer Functions::          Functions for arithmetic on signed integers.
* Rational Number Functions::  Functions for arithmetic on rational numbers.
* Floating-point Functions::   Functions for arithmetic on floats.
* Low-level Functions::        Fast functions for natural numbers.
* Random Number Functions::    Functions for generating random numbers.
* Formatted Output::           @code{printf} style output.
* Formatted Input::            @code{scanf} style input.
* C++ Class Interface::        Class wrappers around MPIR types.
* BSD Compatible Functions::   All functions found in BSD MP.
* Custom Allocation::          How to customize the internal allocation.
* Language Bindings::          Using MPIR from other languages.
* Algorithms::                 What happens behind the scenes.
* Internals::                  How values are represented behind the scenes.

* Contributors::               Who brings you this library?
* References::                 Some useful papers and books to read.
* GNU Free Documentation License::
* Concept Index::
* Function Index::
@end menu
|
|
|
|
|
|
@c @m{T,N} is $T$ in tex or @math{N} otherwise. This is an easy way to give
|
|
@c different forms for math in tex and info. Commas in N or T don't work,
|
|
@c but @C{} can be used instead. \, works in info but not in tex.
|
|
@iftex
|
|
@macro m {T,N}
|
|
@tex$\T\$@end tex
|
|
@end macro
|
|
@end iftex
|
|
@ifnottex
|
|
@macro m {T,N}
|
|
@math{\N\}
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
@macro C {}
|
|
,
|
|
@end macro
|
|
|
|
@c @ms{V,N} is $V_N$ in tex or just vn otherwise. This suits simple
|
|
@c subscripts like @ms{x,0}.
|
|
@iftex
|
|
@macro ms {V,N}
|
|
@tex$\V\_{\N\}$@end tex
|
|
@end macro
|
|
@end iftex
|
|
@ifnottex
|
|
@macro ms {V,N}
|
|
\V\\N\
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
@c @nicode{S} is plain S in info, or @code{S} elsewhere. This can be used
|
|
@c when the quotes that @code{} gives in info aren't wanted, but the
|
|
@c fontification in tex or html is wanted. Doesn't work as @nicode{'\\0'}
|
|
@c though (gives two backslashes in tex).
|
|
@ifinfo
|
|
@macro nicode {S}
|
|
\S\
|
|
@end macro
|
|
@end ifinfo
|
|
@ifnotinfo
|
|
@macro nicode {S}
|
|
@code{\S\}
|
|
@end macro
|
|
@end ifnotinfo
|
|
|
|
@c @nisamp{S} is plain S in info, or @samp{S} elsewhere. This can be used
|
|
@c when the quotes that @samp{} gives in info aren't wanted, but the
|
|
@c fontification in tex or html is wanted.
|
|
@ifinfo
|
|
@macro nisamp {S}
|
|
\S\
|
|
@end macro
|
|
@end ifinfo
|
|
@ifnotinfo
|
|
@macro nisamp {S}
|
|
@samp{\S\}
|
|
@end macro
|
|
@end ifnotinfo
|
|
|
|
@c Usage: @GMPtimes{}
|
|
@c Give either \times or the word "times".
|
|
@tex
|
|
\gdef\GMPtimes{\times}
|
|
@end tex
|
|
@ifnottex
|
|
@macro GMPtimes
|
|
times
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
@c Usage: @GMPmultiply{}
|
|
@c Give * in info, or nothing in tex.
|
|
@tex
|
|
\gdef\GMPmultiply{}
|
|
@end tex
|
|
@ifnottex
|
|
@macro GMPmultiply
|
|
*
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
@c Usage: @GMPabs{x}
|
|
@c Give either |x| in tex, or abs(x) in info or html.
|
|
@tex
|
|
\gdef\GMPabs#1{|#1|}
|
|
@end tex
|
|
@ifnottex
|
|
@macro GMPabs {X}
|
|
@abs{}(\X\)
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
@c Usage: @GMPfloor{x}
|
|
@c Give either \lfloor x\rfloor in tex, or floor(x) in info or html.
|
|
@tex
|
|
\gdef\GMPfloor#1{\lfloor #1\rfloor}
|
|
@end tex
|
|
@ifnottex
|
|
@macro GMPfloor {X}
|
|
floor(\X\)
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
@c Usage: @GMPceil{x}
|
|
@c Give either \lceil x\rceil in tex, or ceil(x) in info or html.
|
|
@tex
|
|
\gdef\GMPceil#1{\lceil #1 \rceil}
|
|
@end tex
|
|
@ifnottex
|
|
@macro GMPceil {X}
|
|
ceil(\X\)
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
@c Math operators already available in tex, made available in info too.
|
|
@c For example @bmod{} can be used in both tex and info.
|
|
@ifnottex
|
|
@macro bmod
|
|
mod
|
|
@end macro
|
|
@macro gcd
|
|
gcd
|
|
@end macro
|
|
@macro ge
|
|
>=
|
|
@end macro
|
|
@macro le
|
|
<=
|
|
@end macro
|
|
@macro log
|
|
log
|
|
@end macro
|
|
@macro min
|
|
min
|
|
@end macro
|
|
@macro leftarrow
|
|
<-
|
|
@end macro
|
|
@macro rightarrow
|
|
->
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
@c New math operators.
|
|
@c @abs{} can be used in both tex and info, or just \abs in tex.
|
|
@tex
|
|
\gdef\abs{\mathop{\rm abs}}
|
|
@end tex
|
|
@ifnottex
|
|
@macro abs
|
|
abs
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
@c @cross{} is a \times symbol in tex, or an "x" in info. In tex it works
|
|
@c inside or outside $ $.
|
|
@tex
|
|
\gdef\cross{\ifmmode\times\else$\times$\fi}
|
|
@end tex
|
|
@ifnottex
|
|
@macro cross
|
|
x
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
@c @times{} made available as a "*" in info and html (already works in tex).
|
|
@ifnottex
|
|
@macro times
|
|
*
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
@c Usage: @W{text}
|
|
@c Like @w{} but working in math mode too.
|
|
@tex
|
|
\gdef\W#1{\ifmmode{#1}\else\w{#1}\fi}
|
|
@end tex
|
|
@ifnottex
|
|
@macro W {S}
|
|
@w{\S\}
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
@c Usage: \GMPdisplay{text}
|
|
@c Put the given text in an @display style indent, but without turning off
|
|
@c paragraph reflow etc.
|
|
@tex
|
|
\gdef\GMPdisplay#1{%
|
|
\noindent
|
|
\advance\leftskip by \lispnarrowing
|
|
#1\par}
|
|
@end tex
|
|
|
|
@c Usage: \GMPhat
|
|
@c A new \hat that will work in math mode, unlike the texinfo redefined
|
|
@c version.
|
|
@tex
|
|
\gdef\GMPhat{\mathaccent"705E}
|
|
@end tex
|
|
|
|
@c Usage: \GMPraise{text}
|
|
@c For use in a $ $ math expression as an alternative to "^". This is good
|
|
@c for @code{} in an exponent, since there seems to be no superscript font
|
|
@c for that.
|
|
@tex
|
|
\gdef\GMPraise#1{\mskip0.5\thinmuskip\hbox{\raise0.8ex\hbox{#1}}}
|
|
@end tex
|
|
|
|
@c Usage: @texlinebreak{}
|
|
@c A line break as per @*, but only in tex.
|
|
@iftex
|
|
@macro texlinebreak
|
|
@*
|
|
@end macro
|
|
@end iftex
|
|
@ifnottex
|
|
@macro texlinebreak
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
@c Usage: @maybepagebreak
|
|
@c Allow tex to insert a page break, if it feels the urge.
|
|
@c Normally blocks of @deftypefun/funx are kept together, which can lead to
|
|
@c some poor page break positioning if it's a big block, like the sets of
|
|
@c division functions etc.
|
|
@tex
|
|
\gdef\maybepagebreak{\penalty0}
|
|
@end tex
|
|
@ifnottex
|
|
@macro maybepagebreak
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
@c Usage: @GMPreftop{info,title}
|
|
@c Usage: @GMPpxreftop{info,title}
|
|
@c
|
|
@c Like @ref{} and @pxref{}, but designed for a reference to the top of a
|
|
@c document, not a particular section. The TeX output for plain @ref insists
|
|
@c on printing a particular section, GMPreftop gives just the title.
|
|
@c
|
|
@c The texinfo manual recommends putting a likely section name in references
|
|
@c like this, eg. "Introduction", but it seems better to just give the title.
|
|
@c
|
|
@iftex
|
|
@macro GMPreftop{info,title}
|
|
@i{\title\}
|
|
@end macro
|
|
@macro GMPpxreftop{info,title}
|
|
see @i{\title\}
|
|
@end macro
|
|
@end iftex
|
|
@c
|
|
@ifnottex
|
|
@macro GMPreftop{info,title}
|
|
@ref{Top,\title\,\title\,\info\,\title\}
|
|
@end macro
|
|
@macro GMPpxreftop{info,title}
|
|
@pxref{Top,\title\,\title\,\info\,\title\}
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
|
|
@node Copying, Introduction to MPIR, Top, Top
|
|
@comment node-name, next, previous, up
|
|
@unnumbered MPIR Copying Conditions
|
|
@cindex Copying conditions
|
|
@cindex Conditions for copying MPIR
|
|
@cindex License conditions
|
|
|
|
This library is @dfn{free}; this means that everyone is free to use it and
|
|
free to redistribute it on a free basis. The library is not in the public
|
|
domain; it is copyrighted and there are restrictions on its distribution, but
|
|
these restrictions are designed to permit everything that a good cooperating
|
|
citizen would want to do. What is not allowed is to try to prevent others
|
|
from further sharing any version of this library that they might get from
|
|
you.@refill
|
|
|
|
Specifically, we want to make sure that you have the right to give away copies
|
|
of the library, that you receive source code or else can get it if you want
|
|
it, that you can change this library or use pieces of it in new free programs,
|
|
and that you know you can do these things.@refill
|
|
|
|
To make sure that everyone has such rights, we have to forbid you to deprive
|
|
anyone else of these rights. For example, if you distribute copies of the MPIR
|
|
library, you must give the recipients all the rights that you have. You
|
|
must make sure that they, too, receive or can get the source code. And you
|
|
must tell them their rights.@refill
|
|
|
|
Also, for our own protection, we must make certain that everyone finds out
|
|
that there is no warranty for the MPIR library. If it is modified by
|
|
someone else and passed on, we want their recipients to know that what they
|
|
have is not what we distributed, so that any problems introduced by others
|
|
will not reflect on our reputation.@refill
|
|
|
|
The precise conditions of the license for the MPIR library are found in the
|
|
Lesser General Public License version 2.1 that accompanies the source code,
|
|
see @file{COPYING.LIB}. Certain demonstration programs are provided under the
|
|
terms of the plain General Public License version 2, see @file{COPYING}.
|
|
|
|
|
|
@node Introduction to MPIR, Installing MPIR, Copying, Top
|
|
@comment node-name, next, previous, up
|
|
@chapter Introduction to MPIR
|
|
@cindex Introduction
|
|
|
|
MPIR is a portable library written in C for arbitrary precision arithmetic
|
|
on integers, rational numbers, and floating-point numbers. It aims to provide
|
|
the fastest possible arithmetic for all applications that need higher
|
|
precision than is directly supported by the basic C types.
|
|
|
|
Many applications use just a few hundred bits of precision; but some
|
|
applications may need thousands or even millions of bits. MPIR is designed to
|
|
give good performance for both, by choosing algorithms based on the sizes of
|
|
the operands, and by carefully keeping the overhead at a minimum.
|
|
|
|
The speed of MPIR is achieved by using full machine words as the basic arithmetic type,
|
|
by using sophisticated algorithms, by including carefully optimized assembly
|
|
code for the most common inner loops for many different CPUs, and by a general
|
|
emphasis on speed (as opposed to simplicity or elegance).
|
|
|
|
There is assembly code for these CPUs:
@cindex CPU types
ARM,
DEC Alpha 21064, 21164, and 21264,
AMD 29000,
AMD K6, K6-2, Athlon, and AMD64,
Hitachi SuperH and SH-2,
HPPA 1.0, 1.1 and 2.0,
Intel Pentium, Pentium Pro/II/III, Pentium 4, generic x86,
Intel IA-64, i960, Core 2,
Motorola MC68000, MC68020, MC88100, and MC88110,
Motorola/IBM PowerPC 32 and 64,
National NS32000,
IBM POWER,
MIPS R3000, R4000,
SPARCv7, SuperSPARC, generic SPARCv8, UltraSPARC,
DEC VAX,
and
Zilog Z8000.
There are also some optimizations for
Cray vector systems,
Clipper,
IBM ROMP (RT),
and
Pyramid AP/XP.
|
|
|
|
@cindex Home page
|
|
@cindex Web page
|
|
@noindent
|
|
For up-to-date information on MPIR, and for the latest version, please see the MPIR web pages at
|
|
|
|
@display
|
|
@uref{http://www.mpir.org/}
|
|
@end display
|
|
|
|
@cindex Mailing lists
|
|
There are a number of public mailing lists of interest. The development list is
|
|
|
|
@display
|
|
@uref{http://groups.google.com/group/mpir-devel/}
|
|
@end display
|
|
|
|
The proper place for bug reports is @uref{http://groups.google.com/group/mpir-devel}. See
|
|
@ref{Reporting Bugs} for information about reporting bugs.
|
|
|
|
@sp 1
|
|
@section How to use this Manual
|
|
@cindex About this manual
|
|
|
|
Everyone should read @ref{MPIR Basics}. If you need to install the library
|
|
yourself, then read @ref{Installing MPIR}. If you have a system with multiple
|
|
ABIs, then read @ref{ABI and ISA}, for the compiler options that must be used
|
|
on applications.
|
|
|
|
The rest of the manual can be used for later reference, although it is
|
|
probably a good idea to glance through it.
|
|
|
|
|
|
@node Installing MPIR, MPIR Basics, Introduction to MPIR, Top
|
|
@comment node-name, next, previous, up
|
|
@chapter Installing MPIR
|
|
@cindex Installing MPIR
|
|
@cindex Configuring MPIR
|
|
@cindex Building MPIR
|
|
|
|
MPIR has an autoconf/automake/libtool based configuration system. On a
|
|
Unix-like system a basic build can be done with
|
|
|
|
@example
|
|
./configure
|
|
make
|
|
@end example
|
|
|
|
@noindent
|
|
Some self-tests can be run with
|
|
|
|
@example
|
|
make check
|
|
@end example
|
|
|
|
@noindent
|
|
And you can install (under @file{/usr/local} by default) with
|
|
|
|
@example
|
|
make install
|
|
@end example
|
|
|
|
If you experience problems, please report them to @uref{http://groups.google.com/group/mpir-devel}.
|
|
See @ref{Reporting Bugs}, for information on what to include in useful bug
|
|
reports.
|
|
|
|
@menu
* Build Options::
* ABI and ISA::
* Notes for Package Builds::
* Notes for Particular Systems::
* Known Build Problems::
* Performance optimization::
@end menu
|
|
|
|
|
|
@node Build Options, ABI and ISA, Installing MPIR, Installing MPIR
|
|
@section Build Options
|
|
@cindex Build options
|
|
|
|
All the usual autoconf configure options are available; run @samp{./configure
--help} for a summary. The file @file{INSTALL.autoconf} has some generic
installation information too.
|
|
|
|
@table @asis
|
|
@item Tools
|
|
@cindex Non-Unix systems
|
|
@samp{configure} requires various Unix-like tools. See @ref{Notes for
|
|
Particular Systems}, for some options on non-Unix systems.
|
|
|
|
It might be possible to build without the help of @samp{configure}, since
all the code is there, but you will be on your own.
|
|
|
|
@item Build Directory
|
|
@cindex Build directory
|
|
To compile in a separate build directory, @command{cd} to that directory, and
|
|
prefix the configure command with the path to the MPIR source directory. For
|
|
example
|
|
|
|
@example
|
|
cd /my/build/dir
|
|
/my/sources/mpir-@value{VERSION}/configure
|
|
@end example
|
|
|
|
Not all @samp{make} programs have the necessary features (@code{VPATH}) to
|
|
support this. In particular, SunOS and Solaris @command{make} have bugs that
|
|
make them unable to build in a separate directory. Use GNU @command{make}
|
|
instead.
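
As a sketch of the above, assuming GNU @command{make} is installed under the
name @command{gmake} (the name varies between systems, and the directory
names are the same placeholders as in the previous example), a build in a
separate directory might look like

@example
mkdir -p /my/build/dir
cd /my/build/dir
/my/sources/mpir-@value{VERSION}/configure
gmake
gmake check
@end example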
|
|
|
|
@item @option{--prefix} and @option{--exec-prefix}
|
|
@cindex Prefix
|
|
@cindex Exec prefix
|
|
@cindex Install prefix
|
|
@cindex @code{--prefix}
|
|
@cindex @code{--exec-prefix}
|
|
The @option{--prefix} option can be used in the normal way to direct MPIR to
|
|
install under a particular tree. The default is @samp{/usr/local}.
|
|
|
|
@option{--exec-prefix} can be used to direct architecture-dependent files like
|
|
@file{libgmp.a}/@file{libmpir.a} to a different location. This can be used to share
|
|
architecture-independent parts like the documentation, but separate the
|
|
dependent parts. Note however that @file{mpir.h} and @file{mp.h} are
|
|
architecture-dependent since they encode certain aspects of @file{libgmp}/@file{libmpir}, so
|
|
it will be necessary to ensure both @file{$prefix/include} and
|
|
@file{$exec_prefix/include} are available to the compiler.
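
For instance, the following sketch keeps the architecture-dependent files
apart from the shared ones. All paths and the file name @file{foo.c} are
hypothetical, and the link line assumes the library ends up in
@file{$exec_prefix/lib}, which is the usual autoconf default.

@example
./configure --prefix=/opt/mpir --exec-prefix=/opt/mpir/linux-x86
make
make install

cc -I/opt/mpir/include -I/opt/mpir/linux-x86/include foo.c \
   -L/opt/mpir/linux-x86/lib -lmpir
@end example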
|
|
|
|
@item @option{--disable-shared}, @option{--disable-static}
|
|
@cindex @code{--disable-shared}
|
|
@cindex @code{--disable-static}
|
|
By default both shared and static libraries are built (where possible), but
|
|
one or other can be disabled. Shared libraries result in smaller executables
|
|
and permit code sharing between separate running processes, but on some CPUs
|
|
are slightly slower, having a small cost on each function call.
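
For example, a static-only build, which avoids the per-call cost at the
expense of larger executables, can be requested with

@example
./configure --disable-shared
@end example

@noindent
and @option{--disable-static} works the same way for a shared-only build.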
|
|
|
|
@item Native Compilation, @option{--build=CPU-VENDOR-OS}
|
|
@cindex Native compilation
|
|
@cindex Build system
|
|
@cindex @code{--build}
|
|
For normal native compilation, the system can be specified with
|
|
@samp{--build}. By default @samp{./configure} uses the output from running
|
|
@samp{./config.guess}. On some systems @samp{./config.guess} can determine
|
|
the exact CPU type, on others it will be necessary to give it explicitly. For
|
|
example,
|
|
|
|
@example
|
|
./configure --build=ultrasparc-sun-solaris2.7
|
|
@end example
|
|
|
|
In all cases the @samp{OS} part is important, since it controls how libtool
|
|
generates shared libraries. Running @samp{./config.guess} is the simplest way
|
|
to see what it should be, if you don't know already.
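
As an illustration, suppose @samp{./config.guess} reports only a generic CPU
on a machine known to be a Pentium 4. The guessed value shown in the comment
is an assumed example, not output to expect verbatim. The idea is to keep the
vendor and OS parts and replace just the CPU part with one of the supported
CPU names.

@example
./config.guess
# suppose this prints: i686-pc-linux-gnu
./configure --build=pentium4-pc-linux-gnu
@end example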
|
|
|
|
@item Cross Compilation, @option{--host=CPU-VENDOR-OS}
|
|
@cindex Cross compiling
|
|
@cindex Host system
|
|
@cindex @code{--host}
|
|
When cross-compiling, the system used for compiling is given by @samp{--build}
|
|
and the system where the library will run is given by @samp{--host}. For
|
|
example when using a FreeBSD Athlon system to build GNU/Linux m68k binaries,
|
|
|
|
@example
|
|
./configure --build=athlon-pc-freebsd3.5 --host=m68k-mac-linux-gnu
|
|
@end example
|
|
|
|
Compiler tools are sought first with the host system type as a prefix. For
|
|
example @command{m68k-mac-linux-gnu-ranlib} is tried, then plain
|
|
@command{ranlib}. This makes it possible for a set of cross-compiling tools
|
|
to co-exist with native tools. The prefix is the argument to @samp{--host},
|
|
and this can be an alias, such as @samp{m68k-linux}. But note that tools
don't have to be set up this way; it's enough to just have a @env{PATH} with a
suitable cross-compiling @command{cc} etc.
|
|
|
|
Compiling for a different CPU in the same family as the build system is a form
|
|
of cross-compilation, though very possibly this would merely be special
|
|
options on a native compiler. In any case @samp{./configure} avoids depending
|
|
on being able to run code on the build system, which is important when
|
|
creating binaries for a newer CPU since they very possibly won't run on the
|
|
build system.
|
|
|
|
In all cases the compiler must be able to produce an executable (of whatever
|
|
format) from a standard C @code{main}. Although only object files will go to
|
|
make up @file{libgmp}/@file{libmpir}, @samp{./configure} uses linking tests for various
|
|
purposes, such as determining what functions are available on the host system.
|
|
|
|
Currently a warning is given unless an explicit @samp{--build} is used when
|
|
cross-compiling, because it may not be possible to correctly guess the build
|
|
system type if the @env{PATH} has only a cross-compiling @command{cc}.
|
|
|
|
Note that the @samp{--target} option is not appropriate for MPIR@. It's for use
|
|
when building compiler tools, with @samp{--host} being where they will run,
|
|
and @samp{--target} what they'll produce code for. Ordinary programs or
|
|
libraries like MPIR are only interested in the @samp{--host} part, being where
|
|
they'll run.
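
If the cross tools are not named with the host prefix, the compiler can be
given explicitly. The following is only a sketch: the compiler path is a
hypothetical location, and an explicit @samp{--build} is included to avoid the
warning mentioned above.

@example
./configure --build=athlon-pc-freebsd3.5 --host=m68k-mac-linux-gnu \
            CC=/opt/cross/bin/m68k-gcc
@end example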
|
|
|
|
@item CPU types
|
|
@cindex CPU types
|
|
In general, if you want a library that runs as fast as possible, you should
|
|
configure MPIR for the exact CPU type your system uses. However, this may mean
|
|
the binaries won't run on older members of the family, and might run slower on
|
|
other members, older or newer. The best idea is always to build MPIR for the
|
|
exact machine type you intend to run it on.
|
|
|
|
The following CPUs have specific support. See @file{configure.in} for details
|
|
of what code and compiler options they select.
|
|
|
|
@itemize @bullet
|
|
|
|
@c Keep this formatting, it's easy to read and it can be grepped to
|
|
@c automatically test that CPUs listed get through ./config.sub
|
|
|
|
@item
|
|
Alpha:
|
|
@nisamp{alpha},
|
|
@nisamp{alphaev5},
|
|
@nisamp{alphaev56},
|
|
@nisamp{alphapca56},
|
|
@nisamp{alphapca57},
|
|
@nisamp{alphaev6},
|
|
@nisamp{alphaev67},
|
|
@nisamp{alphaev68},
|
|
@nisamp{alphaev7}
|
|
|
|
@item
|
|
Cray:
|
|
@nisamp{c90},
|
|
@nisamp{j90},
|
|
@nisamp{t90},
|
|
@nisamp{sv1}
|
|
|
|
@item
|
|
HPPA:
|
|
@nisamp{hppa1.0},
|
|
@nisamp{hppa1.1},
|
|
@nisamp{hppa2.0},
|
|
@nisamp{hppa2.0n},
|
|
@nisamp{hppa2.0w},
|
|
@nisamp{hppa64}
|
|
|
|
@item
|
|
IA-64:
|
|
@nisamp{ia64},
|
|
@nisamp{itanium},
|
|
@nisamp{itanium2}
|
|
|
|
@item
|
|
MIPS:
|
|
@nisamp{mips},
|
|
@nisamp{mips3},
|
|
@nisamp{mips64}
|
|
|
|
@item
|
|
Motorola:
|
|
@nisamp{m68k},
|
|
@nisamp{m68000},
|
|
@nisamp{m68010},
|
|
@nisamp{m68020},
|
|
@nisamp{m68030},
|
|
@nisamp{m68040},
|
|
@nisamp{m68060},
|
|
@nisamp{m68302},
|
|
@nisamp{m68360},
|
|
@nisamp{m88k},
|
|
@nisamp{m88110}
|
|
|
|
@item
|
|
POWER:
|
|
@nisamp{power},
|
|
@nisamp{power1},
|
|
@nisamp{power2},
|
|
@nisamp{power2sc}
|
|
|
|
@item
|
|
PowerPC:
|
|
@nisamp{powerpc},
|
|
@nisamp{powerpc64},
|
|
@nisamp{powerpc401},
|
|
@nisamp{powerpc403},
|
|
@nisamp{powerpc405},
|
|
@nisamp{powerpc505},
|
|
@nisamp{powerpc601},
|
|
@nisamp{powerpc602},
|
|
@nisamp{powerpc603},
|
|
@nisamp{powerpc603e},
|
|
@nisamp{powerpc604},
|
|
@nisamp{powerpc604e},
|
|
@nisamp{powerpc620},
|
|
@nisamp{powerpc630},
|
|
@nisamp{powerpc740},
|
|
@nisamp{powerpc7400},
|
|
@nisamp{powerpc7450},
|
|
@nisamp{powerpc750},
|
|
@nisamp{powerpc801},
|
|
@nisamp{powerpc821},
|
|
@nisamp{powerpc823},
|
|
@nisamp{powerpc860},
|
|
@nisamp{powerpc970}
|
|
|
|
@item
|
|
SPARC:
|
|
@nisamp{sparc},
|
|
@nisamp{sparcv8},
|
|
@nisamp{microsparc},
|
|
@nisamp{supersparc},
|
|
@nisamp{sparcv9},
|
|
@nisamp{ultrasparc},
|
|
@nisamp{ultrasparc2},
|
|
@nisamp{ultrasparc2i},
|
|
@nisamp{ultrasparc3},
|
|
@nisamp{sparc64}
|
|
|
|
@item
|
|
x86 family:
|
|
@nisamp{i386},
|
|
@nisamp{i486},
|
|
@nisamp{i586},
|
|
@nisamp{core2},
|
|
@nisamp{pentium},
|
|
@nisamp{pentiummmx},
|
|
@nisamp{pentiumpro},
|
|
@nisamp{pentium2},
|
|
@nisamp{pentium3},
|
|
@nisamp{pentium4},
|
|
@nisamp{k6},
|
|
@nisamp{k62},
|
|
@nisamp{k63},
|
|
@nisamp{athlon},
|
|
@nisamp{amd64},
|
|
@nisamp{viac3},
|
|
@nisamp{viac32},
|
|
@nisamp{x86_64}
|
|
|
|
@item
|
|
Other:
|
|
@nisamp{a29k},
|
|
@nisamp{arm},
|
|
@nisamp{clipper},
|
|
@nisamp{i960},
|
|
@nisamp{ns32k},
|
|
@nisamp{pyramid},
|
|
@nisamp{sh},
|
|
@nisamp{sh2},
|
|
@nisamp{vax},
|
|
@nisamp{z8k}
|
|
@end itemize
|
|
|
|
CPUs not listed will use generic C code.
|
|
|
|
@item Generic C Build
|
|
@cindex Generic C
|
|
If some of the assembly code causes problems, or if otherwise desired, the
|
|
generic C code can be selected with CPU @samp{none}. For example,
|
|
|
|
@example
|
|
./configure --host=none-unknown-freebsd3.5
|
|
@end example
|
|
|
|
Note that this will run quite slowly, but it should be portable and should at
|
|
least make it possible to get something running if all else fails.
|
|
|
|
@item Fat binary, @option{--enable-fat}
|
|
@cindex Fat binary
|
|
@cindex @code{--enable-fat}
|
|
Using @option{--enable-fat} selects a ``fat binary'' build on x86, where
|
|
optimized low level subroutines are chosen at runtime according to the CPU
|
|
detected. This means more code, but gives good performance on all x86 chips.
|
|
(This option might become available for more architectures in the future.)
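
For example, a library intended to be distributed to a range of x86 machines
might be configured with

@example
./configure --enable-fat
@end example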
|
|
|
|
@item @option{ABI}
|
|
@cindex ABI
|
|
On some systems MPIR supports multiple ABIs (application binary interfaces),
|
|
meaning data type sizes and calling conventions. By default MPIR chooses the
|
|
best ABI available, but a particular ABI can be selected. For example
|
|
|
|
@example
|
|
./configure --host=mips64-sgi-irix6 ABI=n32
|
|
@end example
|
|
|
|
See @ref{ABI and ISA}, for the available choices on relevant CPUs, and what
|
|
applications need to do.
|
|
|
|
@item @option{CC}, @option{CFLAGS}
|
|
@cindex C compiler
|
|
@cindex @code{CC}
|
|
@cindex @code{CFLAGS}
|
|
By default the C compiler used is chosen from among some likely candidates,
|
|
with @command{gcc} normally preferred if it's present. The usual
|
|
@samp{CC=whatever} can be passed to @samp{./configure} to choose something
|
|
different.
|
|
|
|
For various systems, default compiler flags are set based on the CPU and
|
|
compiler. The usual @samp{CFLAGS="-whatever"} can be passed to
|
|
@samp{./configure} to use something different or to set good flags for systems
|
|
MPIR doesn't otherwise know.
|
|
|
|
The @samp{CC} and @samp{CFLAGS} used are printed during @samp{./configure},
|
|
and can be found in each generated @file{Makefile}. This is the easiest way
|
|
to check the defaults when considering changing or adding something.
|
|
|
|
Note that when @samp{CC} and @samp{CFLAGS} are specified on a system
|
|
supporting multiple ABIs it's important to give an explicit
|
|
@samp{ABI=whatever}, since MPIR can't determine the ABI just from the flags and
|
|
won't be able to select the correct assembler code.
|
|
|
|
If just @samp{CC} is selected then normal default @samp{CFLAGS} for that
|
|
compiler will be used (if MPIR recognises it). For example @samp{CC=gcc} can
|
|
be used to force the use of GCC, with default flags (and default ABI).
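
For example, on a multi-ABI system the compiler, flags and ABI can be given
together. This is a sketch only: the @samp{-mabi=n32} flag is an assumption
about what GCC needs on this system to match @samp{ABI=n32}, so check the
compiler documentation before relying on it. The @command{grep} afterwards
shows what ended up in the generated @file{Makefile}.

@example
./configure --host=mips64-sgi-irix6 CC=gcc CFLAGS="-O2 -mabi=n32" ABI=n32
grep -E '^(CC|CFLAGS) *=' Makefile
@end example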
|
|
|
|
@item @option{CPPFLAGS}
|
|
@cindex @code{CPPFLAGS}
|
|
Any flags like @samp{-D} defines or @samp{-I} includes required by the
|
|
preprocessor should be set in @samp{CPPFLAGS} rather than @samp{CFLAGS}.
|
|
Compiling is done with both @samp{CPPFLAGS} and @samp{CFLAGS}, but
|
|
preprocessing uses just @samp{CPPFLAGS}. This distinction is because most
|
|
preprocessors won't accept all the flags the compiler does. Preprocessing is
|
|
done separately in some configure tests, and in the @samp{ansi2knr} support
|
|
for K&R compilers.
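
For instance, an extra include directory and a define would go in
@samp{CPPFLAGS}, leaving @samp{CFLAGS} at its default. The directory and
macro name here are hypothetical.

@example
./configure CPPFLAGS="-I/opt/extra/include -DMY_DEFINE=1"
@end example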
|
|
|
|
@item @option{CC_FOR_BUILD}
|
|
@cindex @code{CC_FOR_BUILD}
|
|
Some build-time programs are compiled and run to generate host-specific data
|
|
tables. @samp{CC_FOR_BUILD} is the compiler used for this. It doesn't need
|
|
to be in any particular ABI or mode, it merely needs to generate executables
|
|
that can run. The default is to try the selected @samp{CC} and some likely
|
|
candidates such as @samp{cc} and @samp{gcc}, looking for something that works.
|
|
|
|
No flags are used with @samp{CC_FOR_BUILD} because a simple invocation like
|
|
@samp{cc foo.c} should be enough. If some particular options are required
|
|
they can be included as for instance @samp{CC_FOR_BUILD="cc -whatever"}.
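
As a sketch, in a cross-compilation where the native compiler needs an option
in order to produce runnable executables, that option can be folded into
@samp{CC_FOR_BUILD}. The @samp{-static} option here is purely illustrative.

@example
./configure --build=athlon-pc-freebsd3.5 --host=m68k-mac-linux-gnu \
            CC_FOR_BUILD="cc -static"
@end example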
|
|
|
|
@item C++ Support, @option{--enable-cxx}
|
|
@cindex C++ support
|
|
@cindex @code{--enable-cxx}
|
|
C++ support in MPIR can be enabled with @samp{--enable-cxx}, in which case a
|
|
C++ compiler will be required. As a convenience @samp{--enable-cxx=detect}
|
|
can be used to enable C++ support only if a compiler can be found. The C++
|
|
support consists of a library @file{libgmpxx.la}/@file{libmpirxx.la} and header file
|
|
@file{gmpxx.h}/@file{mpirxx.h} (@pxref{Headers and Libraries}).
|
|
|
|
A separate @file{libgmpxx.la}/@file{libmpirxx.la} has been adopted rather than having C++ objects
within @file{libgmp.la}/@file{libmpir.la} in order to ensure dynamically linked C programs aren't
bloated by a dependency on the C++ standard library, and to avoid any chance
that the C++ compiler could be required when linking plain C programs.
|
|
|
|
@file{libgmpxx.la}/@file{libmpirxx.la} will use certain internals from @file{libgmp.la}/@file{libmpir.la} and can
|
|
only be expected to work with @file{libgmp.la}/@file{libmpir.la} from the same MPIR version.
|
|
Future changes to the relevant internals will be accompanied by renaming, so a
|
|
mismatch will cause unresolved symbols rather than perhaps mysterious
|
|
misbehaviour.
|
|
|
|
In general @file{libgmpxx.la}/@file{libmpirxx.la} will be usable only with the C++ compiler that
|
|
built it, since name mangling and runtime support are usually incompatible
|
|
between different compilers.
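
A minimal sketch of a C++ enabled build, followed by linking an application
against the result, might look as follows. The source file name
@file{myprog.cc} is hypothetical, and the library names are assumed to be the
MPIR ones mentioned above.

@example
./configure --enable-cxx
make
make check
g++ myprog.cc -lmpirxx -lmpir
@end example

The C++ library is named before the C library on the link line because it
depends on it.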
|
|
|
|
@item @option{CXX}, @option{CXXFLAGS}
|
|
@cindex C++ compiler
|
|
@cindex @code{CXX}
|
|
@cindex @code{CXXFLAGS}
|
|
When C++ support is enabled, the C++ compiler and its flags can be set with
|
|
variables @samp{CXX} and @samp{CXXFLAGS} in the usual way. The default for
|
|
@samp{CXX} is the first compiler that works from a list of likely candidates,
|
|
with @command{g++} normally preferred when available. The default for
|
|
@samp{CXXFLAGS} is to try @samp{CFLAGS}, @samp{CFLAGS} without @samp{-g}, then
|
|
for @command{g++} either @samp{-g -O2} or @samp{-O2}, or for other compilers
|
|
@samp{-g} or nothing. Trying @samp{CFLAGS} this way is convenient when using
|
|
@samp{gcc} and @samp{g++} together, since the flags for @samp{gcc} will
|
|
usually suit @samp{g++}.
|
|
|
|
It's important that the C and C++ compilers match, meaning their startup and
|
|
runtime support routines are compatible and that they generate code in the
|
|
same ABI (if there's a choice of ABIs on the system). @samp{./configure}
|
|
isn't currently able to check these things very well itself, so for that
|
|
reason @samp{--disable-cxx} is the default, to avoid a build failure due to a
|
|
compiler mismatch. Perhaps this will change in the future.
|
|
|
|
Incidentally, it's normally not good enough to set @samp{CXX} to the same as
|
|
@samp{CC}. Although @command{gcc} for instance recognises @file{foo.cc} as
|
|
C++ code, only @command{g++} will invoke the linker the right way when
|
|
building an executable or shared library from C++ object files.
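
For instance, a matching pair of GNU compilers could be named explicitly as in
the following sketch. The particular compiler names are only an assumption
about what is installed on the system.

@example
./configure --enable-cxx CC=gcc CXX=g++
@end example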
|
|
|
|
@item Temporary Memory, @option{--enable-alloca=<choice>}
|
|
@cindex Temporary memory
|
|
@cindex Stack overflow
|
|
@cindex @code{alloca}
|
|
@cindex @code{--enable-alloca}
|
|
MPIR allocates temporary workspace using one of the following three methods,
|
|
which can be selected with for instance
|
|
@samp{--enable-alloca=malloc-reentrant}.
|
|
|
|
@itemize @bullet
|
|
@item
|
|
@samp{alloca} - C library or compiler builtin.
|
|
@item
|
|
@samp{malloc-reentrant} - the heap, in a re-entrant fashion.
|
|
@item
|
|
@samp{malloc-notreentrant} - the heap, with global variables.
|
|
@end itemize
|
|
|
|
For convenience, the following choices are also available.
|
|
@samp{--disable-alloca} is the same as @samp{no}.
|
|
|
|
@itemize @bullet
|
|
@item
|
|
@samp{yes} - a synonym for @samp{alloca}.
|
|
@item
|
|
@samp{no} - a synonym for @samp{malloc-reentrant}.
|
|
@item
|
|
@samp{reentrant} - @code{alloca} if available, otherwise
|
|
@samp{malloc-reentrant}. This is the default.
|
|
@item
|
|
@samp{notreentrant} - @code{alloca} if available, otherwise
|
|
@samp{malloc-notreentrant}.
|
|
@end itemize
|
|
|
|
@code{alloca} is reentrant and fast, and is recommended. It actually allocates
|
|
just small blocks on the stack; larger ones use @samp{malloc-reentrant}.
|
|
|
|
@samp{malloc-reentrant} is, as the name suggests, reentrant and thread safe,
|
|
but @samp{malloc-notreentrant} is faster and should be used if reentrancy is
|
|
not required.
|
|
|
|
The two malloc methods in fact use the memory allocation functions selected by
|
|
@code{mp_set_memory_functions}, these being @code{malloc} and friends by
|
|
default. @xref{Custom Allocation}.
|
|
|
|
An additional choice @samp{--enable-alloca=debug} is available, to help when
|
|
debugging memory related problems (@pxref{Debugging}).
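
As a sketch, a build intended for chasing memory problems might combine the
debug choice with the assertion checking described below. Whether this
combination suits a given problem is a judgment call, not a recommendation
from this manual.

@example
./configure --enable-alloca=debug --enable-assert
@end example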
|
|
|
|
@item FFT Multiplication, @option{--disable-fft}
|
|
@cindex FFT multiplication
|
|
@cindex @code{--disable-fft}
|
|
By default multiplications are done using Karatsuba, 3-way Toom, and
|
|
Fermat FFT@. The FFT is only used on large to very large operands and can be
|
|
disabled to save code size if desired.
|
|
|
|
@item Berkeley MP, @option{--enable-mpbsd}
|
|
@cindex Berkeley MP compatible functions
|
|
@cindex BSD MP compatible functions
|
|
@cindex @code{--enable-mpbsd}
|
|
The Berkeley MP compatibility library (@file{libmp}) and header file
|
|
(@file{mp.h}) are built and installed only if @option{--enable-mpbsd} is used.
|
|
@xref{BSD Compatible Functions}.
|
|
|
|
@item Assertion Checking, @option{--enable-assert}
|
|
@cindex Assertion checking
|
|
@cindex @code{--enable-assert}
|
|
This option enables some consistency checking within the library. This can be
|
|
of use while debugging, @pxref{Debugging}.
|
|
|
|
@item Execution Profiling, @option{--enable-profiling=prof/gprof/instrument}
|
|
@cindex Execution profiling
|
|
@cindex @code{--enable-profiling}
|
|
Enable profiling support, in one of various styles, @pxref{Profiling}.
|
|
|
|
@item @option{MPN_PATH}
|
|
@cindex @code{MPN_PATH}
|
|
Various assembler versions of each mpn subroutine are provided. For a given
|
|
CPU, a search is made through a path to choose a version of each. For example
|
|
@samp{sparcv8} has
|
|
|
|
@example
|
|
MPN_PATH="sparc32/v8 sparc32 generic"
|
|
@end example
|
|
|
|
which means look first for v8 code, then plain sparc32 (which is v7), and
|
|
finally fall back on generic C@. Knowledgeable users with special requirements
|
|
can specify a different path. Normally this is completely unnecessary.
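
Purely as an illustration, and assuming @samp{MPN_PATH} is given on the
@samp{./configure} command line in the same way as the other variables in
this section, the following would skip the v8 code in the example above.

@example
./configure MPN_PATH="sparc32 generic"
@end example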
|
|
|
|
@item Documentation
|
|
@cindex Documentation formats
|
|
@cindex Texinfo
|
|
The source for the document you're now reading is @file{doc/gmp.texi}, in
|
|
Texinfo format, see @GMPreftop{texinfo, Texinfo}.
|
|
|
|
@cindex Postscript
|
|
@cindex DVI
|
|
@cindex PDF
|
|
Info format @samp{doc/gmp.info} is included in the distribution. The usual
|
|
automake targets are available to make PostScript, DVI, PDF and HTML (these
|
|
will require various @TeX{} and Texinfo tools).
|
|
|
|
@cindex DocBook
|
|
@cindex XML
|
|
DocBook and XML can be generated by the Texinfo @command{makeinfo} program
|
|
too, see @ref{makeinfo options,, Options for @command{makeinfo}, texinfo,
|
|
Texinfo}.
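
A sketch of typical invocations follows, assuming they are run from the
@file{doc} subdirectory of a configured source tree and that the necessary
@TeX{} and Texinfo tools are installed.

@example
cd doc
make pdf
make html
makeinfo --docbook gmp.texi
makeinfo --xml gmp.texi
@end example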
|
|
|
|
Some supplementary notes can also be found in the @file{doc} subdirectory.
|
|
|
|
@end table
|
|
|
|
|
|
@need 2000
|
|
@node ABI and ISA, Notes for Package Builds, Build Options, Installing MPIR
|
|
@section ABI and ISA
|
|
@cindex ABI
|
|
@cindex Application Binary Interface
|
|
@cindex ISA
|
|
@cindex Instruction Set Architecture
|
|
|
|
ABI (Application Binary Interface) refers to the calling conventions between
|
|
functions, meaning what registers are used and what sizes the various C data
|
|
types are. ISA (Instruction Set Architecture) refers to the instructions and
|
|
registers a CPU has available.
|
|
|
|
Some 64-bit ISA CPUs have both a 64-bit ABI and a 32-bit ABI defined, the
|
|
latter for compatibility with older CPUs in the family. MPIR supports some
|
|
CPUs like this in both ABIs. In fact within MPIR @samp{ABI} means a
|
|
combination of chip ABI, plus how MPIR chooses to use it. For example in some
|
|
32-bit ABIs, MPIR may support a limb as either a 32-bit @code{long} or a 64-bit
|
|
@code{long long}.
|
|
|
|
By default MPIR chooses the best ABI available for a given system, and this
|
|
generally gives significantly greater speed. But an ABI can be chosen
|
|
explicitly to make MPIR compatible with other libraries, or particular
|
|
application requirements. For example,
|
|
|
|
@example
|
|
./configure ABI=32
|
|
@end example
|
|
|
|
In all cases it's vital that all object code used in a given program is
|
|
compiled for the same ABI.
|
|
|
|
Usually a limb is implemented as a @code{long}. When a @code{long long} limb
|
|
is used this is encoded in the generated @file{mpir.h}. This is convenient for
|
|
applications, but it does mean that @file{mpir.h} will vary, and can't be just
|
|
copied around. @file{mpir.h} remains compiler independent though, since all
|
|
compilers for a particular ABI will be expected to use the same limb type.
|
|
|
|
Currently no attempt is made to follow whatever conventions a system has for
|
|
installing library or header files built for a particular ABI@. This will
|
|
probably only matter when installing multiple builds of MPIR, and it might be
|
|
as simple as configuring with a special @samp{libdir}, or it might require
|
|
more than that. Note that builds for different ABIs need to be done separately,
|
|
with a fresh @command{./configure} and @command{make} each.
|
|
|
|
@sp 1
|
|
@table @asis
|
|
@need 1000
|
|
@item AMD64 (@samp{x86_64})
|
|
@cindex AMD64
|
|
On AMD64 systems supporting both 32-bit and 64-bit modes for applications, the
|
|
following ABI choices are available.
|
|
|
|
@table @asis
|
|
@item @samp{ABI=64}
|
|
The 64-bit ABI uses 64-bit limbs and pointers and makes full use of the chip
|
|
architecture. This is the default. Applications will usually not need
|
|
special compiler flags, but for reference the option is
|
|
|
|
@example
|
|
gcc -m64
|
|
@end example
|
|
|
|
@item @samp{ABI=32}
|
|
The 32-bit ABI is the usual i386 conventions. This will be slower, and is not
|
|
recommended except for inter-operating with other code not yet 64-bit capable.
|
|
Applications must be compiled with
|
|
|
|
@example
|
|
gcc -m32
|
|
@end example
|
|
|
|
(In GCC 2.95 and earlier there's no @samp{-m32} option; 32-bit is the only mode.)
|
|
@end table
|
|
|
|
@sp 1
|
|
@need 1000
|
|
@item HPPA 2.0 (@samp{hppa2.0*}, @samp{hppa64})
|
|
@cindex HPPA
|
|
@cindex HP-UX
|
|
@table @asis
|
|
@item @samp{ABI=2.0w}
|
|
The 2.0w ABI uses 64-bit limbs and pointers and is available on HP-UX 11 or
|
|
up. Applications must be compiled with
|
|
|
|
@example
|
|
gcc [built for 2.0w]
|
|
cc +DD64
|
|
@end example
|
|
|
|
@item @samp{ABI=2.0n}
|
|
The 2.0n ABI means the 32-bit HPPA 1.0 ABI and all its normal calling
|
|
conventions, but with 64-bit instructions permitted within functions. MPIR
|
|
uses a 64-bit @code{long long} for a limb. This ABI is available on hppa64
|
|
GNU/Linux and on HP-UX 10 or higher. Applications must be compiled with
|
|
|
|
@example
|
|
gcc [built for 2.0n]
|
|
cc +DA2.0 +e
|
|
@end example
|
|
|
|
Note that current versions of GCC (eg.@: 3.2) don't generate 64-bit
|
|
instructions for @code{long long} operations and so may be slower than for
|
|
2.0w. (The MPIR assembler code is the same though.)
|
|
|
|
@item @samp{ABI=1.0}
|
|
HPPA 2.0 CPUs can run all HPPA 1.0 and 1.1 code in the 32-bit HPPA 1.0 ABI@.
|
|
No special compiler options are needed for applications.
|
|
@end table
|
|
|
|
All three ABIs are available for CPU types @samp{hppa2.0w}, @samp{hppa2.0} and
|
|
@samp{hppa64}, but for CPU type @samp{hppa2.0n} only 2.0n or 1.0 are
|
|
considered.
|
|
|
|
Note that GCC on HP-UX has no options to choose between 2.0n and 2.0w modes,
|
|
unlike HP @command{cc}. Instead it must be built for one or the other ABI@.
|
|
MPIR will detect how it was built, and skip to the corresponding @samp{ABI}.
|
|
|
|
@sp 1
|
|
@need 1500
|
|
@item IA-64 under HP-UX (@samp{ia64*-*-hpux*}, @samp{itanium*-*-hpux*})
|
|
@cindex IA-64
|
|
@cindex HP-UX
|
|
HP-UX supports two ABIs for IA-64. MPIR performance is the same in both.
|
|
|
|
@table @asis
|
|
@item @samp{ABI=32}
|
|
In the 32-bit ABI, pointers, @code{int}s and @code{long}s are 32 bits and MPIR
|
|
uses a 64-bit @code{long long} for a limb. Applications can be compiled
|
|
without any special flags since this ABI is the default in both HP C and GCC,
|
|
but for reference the flags are
|
|
|
|
@example
|
|
gcc -milp32
|
|
cc +DD32
|
|
@end example
|
|
|
|
@item @samp{ABI=64}
|
|
In the 64-bit ABI, @code{long}s and pointers are 64 bits and MPIR uses a
|
|
@code{long} for a limb. Applications must be compiled with
|
|
|
|
@example
|
|
gcc -mlp64
|
|
cc +DD64
|
|
@end example
|
|
@end table
|
|
|
|
On other IA-64 systems, GNU/Linux for instance, @samp{ABI=64} is the only
|
|
choice.
|
|
|
|
@sp 1
|
|
@need 1000
|
|
@item MIPS under IRIX 6 (@samp{mips*-*-irix[6789]})
|
|
@cindex MIPS
|
|
@cindex IRIX
|
|
IRIX 6 always has a 64-bit MIPS 3 or better CPU, and supports ABIs o32, n32,
|
|
and 64. n32 or 64 are recommended, and MPIR performance will be the same in
|
|
each. The default is n32.
|
|
|
|
@table @asis
|
|
@item @samp{ABI=o32}
|
|
The o32 ABI is 32-bit pointers and integers, and no 64-bit operations. MPIR
|
|
will be slower than in n32 or 64; this option only exists to support old
|
|
compilers, eg.@: GCC 2.7.2. Applications can be compiled with no special
|
|
flags on an old compiler, or on a newer compiler with
|
|
|
|
@example
|
|
gcc -mabi=32
|
|
cc -32
|
|
@end example
|
|
|
|
@item @samp{ABI=n32}
|
|
The n32 ABI is 32-bit pointers and integers, but with a 64-bit limb using a
|
|
@code{long long}. Applications must be compiled with
|
|
|
|
@example
|
|
gcc -mabi=n32
|
|
cc -n32
|
|
@end example
|
|
|
|
@item @samp{ABI=64}
|
|
The 64-bit ABI is 64-bit pointers and integers. Applications must be compiled
|
|
with
|
|
|
|
@example
|
|
gcc -mabi=64
|
|
cc -64
|
|
@end example
|
|
@end table
|
|
|
|
Note that MIPS GNU/Linux, as of kernel version 2.2, doesn't have the necessary
|
|
support for n32 or 64 and so only gets a 32-bit limb and the MIPS 2 code.
|
|
|
|
@sp 1
|
|
@need 1000
|
|
@item PowerPC 64 (@samp{powerpc64}, @samp{powerpc620}, @samp{powerpc630}, @samp{powerpc970})
|
|
@cindex PowerPC
|
|
@table @asis
|
|
@item @samp{ABI=aix64}
|
|
@cindex AIX
|
|
The AIX 64 ABI uses 64-bit limbs and pointers and is the default on PowerPC 64
|
|
@samp{*-*-aix*} systems. Applications must be compiled with
|
|
|
|
@example
|
|
gcc -maix64
|
|
xlc -q64
|
|
@end example
|
|
|
|
@item @samp{ABI=mode32}
|
|
@cindex AIX
|
|
The @samp{mode32} ABI uses a 64-bit @code{long long} limb but with the chip
|
|
still in 32-bit mode and using 32-bit calling conventions. This is the
|
|
default on PowerPC 64 @samp{*-*-darwin*} systems. No special compiler options
|
|
are needed for applications.
|
|
|
|
@item @samp{ABI=32}
|
|
This is the basic 32-bit PowerPC ABI, with a 32-bit limb. No special compiler
|
|
options are needed for applications.
|
|
@end table
|
|
|
|
MPIR speed is greatest in @samp{aix64} and @samp{mode32}. In @samp{ABI=32}
|
|
only the 32-bit ISA is used and this doesn't make full use of a 64-bit chip.
|
|
On a suitable system we could perhaps use more of the ISA, but there are no
|
|
plans to do so.
|
|
|
|
@sp 1
|
|
@need 1000
|
|
@item Sparc V9 (@samp{sparc64}, @samp{sparcv9}, @samp{ultrasparc*})
|
|
@cindex Sparc V9
|
|
@cindex Solaris
|
|
@cindex Sun
|
|
@table @asis
|
|
@item @samp{ABI=64}
|
|
The 64-bit V9 ABI is available on the various BSD sparc64 ports, recent
|
|
versions of Sparc64 GNU/Linux, and Solaris 2.7 and up (when the kernel is in
|
|
64-bit mode). GCC 3.2 or higher, or Sun @command{cc} is required. On
|
|
GNU/Linux, depending on the default @command{gcc} mode, applications must be
|
|
compiled with
|
|
|
|
@example
|
|
gcc -m64
|
|
@end example
|
|
|
|
On Solaris applications must be compiled with
|
|
|
|
@example
|
|
gcc -m64 -mptr64 -Wa,-xarch=v9 -mcpu=v9
|
|
cc -xarch=v9
|
|
@end example
|
|
|
|
On the BSD sparc64 systems no special options are required, since 64-bits is
|
|
the only ABI available.
|
|
|
|
@item @samp{ABI=32}
|
|
For the basic 32-bit ABI, MPIR still uses as much of the V9 ISA as it can. In
|
|
the Sun documentation this combination is known as ``v8plus''. On GNU/Linux,
|
|
depending on the default @command{gcc} mode, applications may need to be
|
|
compiled with
|
|
|
|
@example
|
|
gcc -m32
|
|
@end example
|
|
|
|
On Solaris, no special compiler options are required for applications, though
|
|
using something like the following is recommended. (@command{gcc} 2.8 and
|
|
earlier only support @samp{-mv8} though.)
|
|
|
|
@example
|
|
gcc -mv8plus
|
|
cc -xarch=v8plus
|
|
@end example
|
|
@end table
|
|
|
|
MPIR speed is greatest in @samp{ABI=64}, so it's the default where available.
|
|
The speed is partly because there are extra registers available and partly
|
|
because 64-bits is considered the more important case and has therefore had
|
|
better code written for it.
|
|
|
|
Don't be confused by the names of the @samp{-m} and @samp{-x} compiler
|
|
options, they're called @samp{arch} but effectively control both ABI and ISA@.
|
|
|
|
On Solaris 2.6 and earlier, only @samp{ABI=32} is available since the kernel
|
|
doesn't save all registers.
|
|
|
|
On Solaris 2.7 with the kernel in 32-bit mode, a normal native build will
|
|
reject @samp{ABI=64} because the resulting executables won't run.
|
|
@samp{ABI=64} can still be built if desired by making it look like a
|
|
cross-compile, for example
|
|
|
|
@example
|
|
./configure --build=none --host=sparcv9-sun-solaris2.7 ABI=64
|
|
@end example
|
|
@end table
|
|
|
|
|
|
@need 2000
|
|
@node Notes for Package Builds, Notes for Particular Systems, ABI and ISA, Installing MPIR
|
|
@section Notes for Package Builds
|
|
@cindex Build notes for binary packaging
|
|
@cindex Packaged builds
|
|
|
|
MPIR should present no great difficulties for packaging in a binary
|
|
distribution.
|
|
|
|
@cindex Libtool versioning
|
|
@cindex Shared library versioning
|
|
Libtool is used to build the library and @samp{-version-info} is set
|
|
appropriately, having started from @samp{3:0:0} in GMP 3.0 (@pxref{Versioning,
|
|
Library interface versions, Library interface versions, libtool, GNU
|
|
Libtool}).
|
|
|
|
The GMP 4 series and MPIR 1 series will be upwardly binary compatible in each
|
|
release and will be upwardly binary compatible with all of the GMP 3 series.
|
|
Additional function interfaces may be added in each release, so on systems where
|
|
libtool versioning is not fully checked by the loader an auxiliary mechanism may be
|
|
needed to express that a dynamic linked application depends on a new enough
|
|
MPIR.
|
|
|
|
An auxiliary mechanism may also be needed to express that @file{libgmpxx.la}/@file{libmpirxx.la}
|
|
(from @option{--enable-cxx}, @pxref{Build Options}) requires @file{libgmp.la}/@file{libmpir.la}
|
|
from the same MPIR version, since this is not done by the libtool versioning,
|
|
nor otherwise. A mismatch will result in unresolved symbols from the linker,
|
|
or perhaps the loader.
|
|
|
|
When building a package for a CPU family, care should be taken to use
|
|
@samp{--host} (or @samp{--build}) to choose the least common denominator among
|
|
the CPUs which might use the package. For example this might mean plain
|
|
@samp{sparc} (meaning V7) for SPARCs.
|
|
|
|
For x86s, @option{--enable-fat} sets things up for a fat binary build, making a
|
|
runtime selection of optimized low level routines. This is a good choice for
|
|
packaging to run on a range of x86 chips.
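
The two approaches might be sketched as follows. The SPARC host triplet is a
placeholder for illustration only.

@example
# least common denominator: plain V7 code for any SPARC
./configure --host=sparc-sun-solaris2.7

# x86: one package, routines selected at runtime
./configure --enable-fat
@end example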
|
|
|
|
Users who care about speed will want MPIR built for their exact CPU type, to
|
|
make best use of the available optimizations. Providing a way to suitably
|
|
rebuild a package may be useful. This could be as simple as making it
|
|
possible for a user to omit @samp{--build} (and @samp{--host}) so
|
|
@samp{./config.guess} will detect the CPU@. But a way to manually specify a
|
|
@samp{--build} will be wanted for systems where @samp{./config.guess} is
|
|
inexact.
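
For example, a rebuild recipe could let a user first see what is detected and
then override it if necessary. The triplet shown is only an illustrative
guess at what a user might supply.

@example
./config.guess
./configure --build=pentium4-pc-linux-gnu
@end example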
|
|
|
|
On systems with multiple ABIs, a packaged build will need to decide which
|
|
among the choices is to be provided, see @ref{ABI and ISA}. A given run of
|
|
@samp{./configure} etc will only build one ABI@. If a second ABI is also
|
|
required then a second run of @samp{./configure} etc must be made, starting
|
|
from a clean directory tree (@samp{make distclean}).
|
|
|
|
As noted under ``ABI and ISA'', currently no attempt is made to follow system
|
|
conventions for install locations that vary with ABI, such as
|
|
@file{/usr/lib/sparcv9} for @samp{ABI=64} as opposed to @file{/usr/lib} for
|
|
@samp{ABI=32}. A package build can override @samp{libdir} and other standard
|
|
variables as necessary.
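
A sketch of two such builds on a Solaris-like layout follows, using the
directories mentioned above purely as an example. It does not address the
header file issue described next, since the second @samp{make install} would
overwrite the @file{mpir.h} installed by the first.

@example
./configure ABI=64 --libdir=/usr/lib/sparcv9
make
make check
make install
make distclean

./configure ABI=32 --libdir=/usr/lib
make
make check
make install
@end example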
|
|
|
|
Note that @file{mpir.h} is a generated file, and will be architecture and ABI
|
|
dependent. When attempting to install two ABIs simultaneously it will be
|
|
important that an application compile gets the correct @file{mpir.h} for its
|
|
desired ABI@. If compiler include paths don't vary with ABI options then it
|
|
might be necessary to create a @file{/usr/include/mpir.h} which tests
|
|
preprocessor symbols and chooses the correct actual @file{mpir.h}.
|
|
|
|
|
|
@need 2000
|
|
@node Notes for Particular Systems, Known Build Problems, Notes for Package Builds, Installing MPIR
|
|
@section Notes for Particular Systems
|
|
@cindex Build notes for particular systems
|
|
@cindex Particular systems
|
|
@cindex Systems
|
|
@table @asis
|
|
|
|
@c This section is more or less meant for notes about performance or about
|
|
@c build problems that have been worked around but might leave a user
|
|
@c scratching their head. Fun with different ABIs on a system belongs in the
|
|
@c above section.
|
|
|
|
@item AIX 3 and 4
|
|
@cindex AIX
|
|
On systems @samp{*-*-aix[34]*} shared libraries are disabled by default, since
|
|
some versions of the native @command{ar} fail on the convenience libraries
|
|
used. A shared build can be attempted with
|
|
|
|
@example
|
|
./configure --enable-shared --disable-static
|
|
@end example
|
|
|
|
Note that the @samp{--disable-static} is necessary because in a shared build
|
|
libtool makes @file{libgmp.a}/@file{libmpir.a} a symlink to @file{libgmp.so}/@file{libmpir.so}, apparently for
|
|
the benefit of old versions of @command{ld} which only recognise @file{.a},
|
|
but unfortunately this is done even if a fully functional @command{ld} is
|
|
available.
|
|
|
|
@item ARM
|
|
@cindex ARM
|
|
On systems @samp{arm*-*-*}, versions of GCC up to and including 2.95.3 have a
|
|
bug in unsigned division, giving wrong results for some operands. MPIR
|
|
@samp{./configure} will demand GCC 2.95.4 or later.
|
|
|
|
@item Compaq C++
|
|
@cindex Compaq C++
|
|
Compaq C++ on OSF 5.1 has two flavours of @code{iostream}, a standard one and
|
|
an old pre-standard one (see @samp{man iostream_intro}). MPIR can only use the
|
|
standard one, which unfortunately is not the default but must be selected by
|
|
defining @code{__USE_STD_IOSTREAM}. Configure with for instance
|
|
|
|
@example
|
|
./configure --enable-cxx CPPFLAGS=-D__USE_STD_IOSTREAM
|
|
@end example
|
|
|
|
@item Floating Point Mode
|
|
@cindex Floating point mode
|
|
@cindex Hardware floating point mode
|
|
@cindex Precision of hardware floating point
|
|
@cindex x87
|
|
On some systems, the hardware floating point has a control mode which can set
|
|
all operations to be done in a particular precision, for instance single,
|
|
double or extended on x86 systems (x87 floating point). The MPIR functions
|
|
involving a @code{double} cannot be expected to operate to their full
|
|
precision when the hardware is in single precision mode. Of course this
|
|
affects all code, including application code, not just MPIR.
|
|
|
|
@item MS-DOS and MS Windows
|
|
@cindex MS-DOS
|
|
@cindex MS Windows
|
|
@cindex Windows
|
|
@cindex Cygwin
|
|
@cindex DJGPP
|
|
@cindex MINGW
|
|
On an MS-DOS system DJGPP can be used to build MPIR, and on an MS Windows
|
|
system Cygwin, DJGPP and MINGW can all be used. All three are excellent ports of
|
|
GCC and the various GNU tools.
|
|
|
|
In addition, project files for MSVC are provided, allowing MPIR to build on Microsoft's
|
|
compiler. This support is provided by Brian Gladman.
|
|
|
|
@display
|
|
@uref{http://www.cygwin.com/}
|
|
@uref{http://www.delorie.com/djgpp/}
|
|
@uref{http://www.mingw.org/}
|
|
@end display
|
|
|
|
@cindex Interix
|
|
@cindex Services for Unix
|
|
Microsoft also publishes an Interix ``Services for Unix'' which can be used to
|
|
build MPIR on Windows (with a normal @samp{./configure}), but it's not free
|
|
software.
|
|
|
|
@item MS Windows DLLs
|
|
@cindex DLLs
|
|
@cindex MS Windows
|
|
@cindex Windows
|
|
On systems @samp{*-*-cygwin*}, @samp{*-*-mingw*} and @samp{*-*-pw32*} by
|
|
default MPIR builds only a static library, but a DLL can be built instead using
|
|
|
|
@example
|
|
./configure --disable-static --enable-shared
|
|
@end example
|
|
|
|
Static and DLL libraries can't both be built, since certain export directives
|
|
in @file{mpir.h} must be different.
|
|
|
|
A MINGW DLL build of MPIR can be used with Microsoft C@. Libtool doesn't
|
|
install a @file{.lib} format import library, but it can be created with MS
|
|
@command{lib} as follows, and copied to the install directory. Similarly for
|
|
@file{libmp} and @file{libgmpxx}/@file{libmpirxx}.
|
|
|
|
@example
|
|
cd .libs
|
|
lib /def:libgmp-3.dll.def /out:libgmp-3.lib
|
|
@end example
|
|
|
|
MINGW uses the C runtime library @samp{msvcrt.dll} for I/O, so applications
|
|
wanting to use the MPIR I/O routines must be compiled with @samp{cl /MD} to do
|
|
the same. If one of the other C runtime library choices provided by MS C is
|
|
desired then the suggestion is to use the MPIR string functions and confine I/O
|
|
to the application.
|
|
|
|
@item Motorola 68k CPU Types
|
|
@cindex 68000
|
|
@samp{m68k} is taken to mean 68000. @samp{m68020} or higher will give a
|
|
performance boost on applicable CPUs. @samp{m68360} can be used for CPU32
|
|
series chips. @samp{m68302} can be used for ``Dragonball'' series chips,
|
|
though this is merely a synonym for @samp{m68000}.
|
|
|
|
@item OpenBSD 2.6
|
|
@cindex OpenBSD
|
|
@command{m4} in this release of OpenBSD has a bug in @code{eval} that makes it
|
|
unsuitable for @file{.asm} file processing. @samp{./configure} will detect
|
|
the problem and either abort or choose another m4 in the @env{PATH}. The bug
|
|
is fixed in OpenBSD 2.7, so either upgrade or use GNU m4.
|
|
|
|
@item Power CPU Types
|
|
@cindex Power/PowerPC
|
|
In MPIR, CPU types @samp{power*} and @samp{powerpc*} will each use instructions
|
|
not available on the other, so it's important to choose the right one for the
|
|
CPU that will be used. Currently MPIR has no assembler code support for using
|
|
just the common instruction subset. To get executables that run on both, the
|
|
current suggestion is to use the generic C code (CPU @samp{none}), possibly
|
|
with appropriate compiler options (like @samp{-mcpu=common} for
|
|
@command{gcc}). CPU @samp{rs6000} (which is not a CPU but a family of
|
|
workstations) is accepted by @file{config.sub}, but is currently equivalent to
|
|
@samp{none}.
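
A sketch of such a generic C build is shown below. The host triplet is a
placeholder, and the compiler flags are only an example of what might be
appropriate.

@example
./configure --host=none-unknown-linux-gnu \
    CC=gcc CFLAGS="-O2 -mcpu=common"
@end example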
|
|
|
|
@item Sparc CPU Types
|
|
@cindex Sparc
|
|
@samp{sparcv8} or @samp{supersparc} on relevant systems will give a
|
|
significant performance increase over the V7 code selected by plain
|
|
@samp{sparc}.
|
|
|
|
@item Sparc App Regs
|
|
@cindex Sparc
|
|
The MPIR assembler code for both 32-bit and 64-bit Sparc clobbers the
|
|
``application registers'' @code{g2}, @code{g3} and @code{g4}, the same way
|
|
that the GCC default @samp{-mapp-regs} does (@pxref{SPARC Options,, SPARC
|
|
Options, gcc, Using the GNU Compiler Collection (GCC)}).
|
|
|
|
This makes that code unsuitable for use with the special V9
|
|
@samp{-mcmodel=embmedany} (which uses @code{g4} as a data segment pointer),
|
|
and for applications wanting to use those registers for special purposes. In
|
|
these cases the only suggestion currently is to build MPIR with CPU @samp{none}
|
|
to avoid the assembler code.
|
|
|
|
@item SunOS 4
|
|
@cindex SunOS
|
|
@command{/usr/bin/m4} lacks various features needed to process @file{.asm}
|
|
files, and instead @samp{./configure} will automatically use
|
|
@command{/usr/5bin/m4}, which we believe is always available (if not then use
|
|
GNU m4).
|
|
|
|
@item x86 CPU Types
|
|
@cindex x86
|
|
@cindex 80x86
|
|
@cindex i386
|
|
@samp{i586}, @samp{pentium} or @samp{pentiummmx} code is good for its intended
|
|
P5 Pentium chips, but quite slow when run on Intel P6 class chips (PPro, P-II,
|
|
P-III)@. @samp{i386} is a better choice when making binaries that must run on
|
|
both.
|
|
|
|
@item x86 MMX and SSE2 Code
|
|
@cindex MMX
|
|
@cindex SSE2
|
|
If the CPU selected has MMX code but the assembler doesn't support it, a
|
|
warning is given and non-MMX code is used instead. This will be an inferior
|
|
build, since the MMX code that's present is there because it's faster than the
|
|
corresponding plain integer code. The same applies to SSE2.
|
|
|
|
Old versions of @samp{gas} don't support MMX instructions, in particular
|
|
version 1.92.3 that comes with FreeBSD 2.2.8 or the more recent OpenBSD 3.1
|
|
doesn't.
|
|
|
|
Solaris 2.6 and 2.7 @command{as} generate incorrect object code for register
|
|
to register @code{movq} instructions, and so can't be used for MMX code.
|
|
Install a recent @command{gas} if MMX code is wanted on these systems.
|
|
@end table
|
|
|
|
|
|
@need 2000
|
|
@node Known Build Problems, Performance optimization, Notes for Particular Systems, Installing MPIR
|
|
@section Known Build Problems
|
|
@cindex Build problems known
|
|
|
|
@c This section is more or less meant for known build problems that are not
|
|
@c otherwise worked around and require some sort of manual intervention.
|
|
|
|
You might find more up-to-date information at @uref{http://www.mpir.org/}.
|
|
|
|
@table @asis
|
|
@item Compiler link options
|
|
The version of libtool currently in use rather aggressively strips compiler
|
|
options when linking a shared library. This will hopefully be relaxed in the
|
|
future, but for now if this is a problem the suggestion is to create a little
|
|
script to hide them, and for instance configure with
|
|
|
|
@example
|
|
./configure CC=gcc-with-my-options
|
|
@end example
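
The script itself might be as small as the following sketch, where the file
name @file{gcc-with-my-options} matches the example above and
@samp{-whatever} stands for the options to be hidden. It is assumed the
script is made executable and placed somewhere in the @env{PATH}.

@example
#!/bin/sh
# pass every argument through to gcc, adding the hidden options
exec gcc -whatever "$@@"
@end example

Since libtool only sees the script name as the compiler, it has no
opportunity to strip the options.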
|
|
|
|
@item DJGPP (@samp{*-*-msdosdjgpp*})
|
|
@cindex DJGPP
|
|
The DJGPP port of @command{bash} 2.03 is unable to run the @samp{configure}
|
|
script; it exits silently, having died writing a preamble to
|
|
@file{config.log}. Use @command{bash} 2.04 or higher.
|
|
|
|
@samp{make all} was found to run out of memory during the final
|
|
@file{libgmp.la} link on one system tested, despite having 64MB available.
|
|
Running @samp{make libgmp.la} directly helped; perhaps recursing into the
|
|
various subdirectories uses up memory.
|
|
|
|
@item GNU binutils @command{strip} prior to 2.12
|
|
@cindex Stripped libraries
|
|
@cindex Binutils @command{strip}
|
|
@cindex GNU @command{strip}
|
|
@command{strip} from GNU binutils 2.11 and earlier should not be used on the
|
|
static libraries @file{libgmp.a}/@file{libmpir.a} and @file{libmp.a} since it will discard all
|
|
but the last of multiple archive members with the same name, like the three
|
|
versions of @file{init.o} in @file{libgmp.a}/@file{libmpir.a}. Binutils 2.12 or higher can be
|
|
used successfully.
|
|
|
|
The shared libraries @file{libgmp.so}/@file{libmpir.so} and @file{libmp.so} are not affected by
|
|
this and any version of @command{strip} can be used on them.
|
|
|
|
@item @command{make} syntax error
|
|
@cindex SCO
|
|
@cindex IRIX
|
|
On certain versions of SCO OpenServer 5 and IRIX 6.5 the native @command{make}
|
|
is unable to handle the long dependencies list for @file{libgmp.la}/@file{libmpir.la}. The
|
|
symptom is a ``syntax error'' on the following line of the top-level
|
|
@file{Makefile}.
|
|
|
|
@example
|
|
libgmp.la: $(libgmp_la_OBJECTS) $(libgmp_la_DEPENDENCIES)
|
|
@end example
|
|
|
|
Either use GNU Make, or as a workaround remove
|
|
@code{$(libgmp_la_DEPENDENCIES)} from that line (which will make the initial
|
|
build work, but if any recompiling is done @file{libgmp.la}/@file{libmpir.la} might not be
|
|
rebuilt).
|
|
|
|
@item MacOS X (@samp{*-*-darwin*})
|
|
@cindex MacOS X
|
|
@cindex Darwin
|
|
Libtool currently only knows how to create shared libraries on MacOS X using
|
|
the native @command{cc} (which is a modified GCC), not a plain GCC@. A
|
|
static-only build should work though (@samp{--disable-shared}).
|
|
|
|
@item NeXT prior to 3.3
|
|
@cindex NeXT
|
|
The system compiler on old versions of NeXT was a massacred and old GCC, even
|
|
if it called itself @file{cc}. This compiler cannot be used to build MPIR; you
|
|
need to get a real GCC, and install that. (NeXT may have fixed this in
|
|
release 3.3 of their system.)
|
|
|
|
@item POWER and PowerPC
|
|
@cindex Power/PowerPC
|
|
Bugs in GCC 2.7.2 (and 2.6.3) mean it can't be used to compile MPIR on POWER or
|
|
PowerPC@. If you want to use GCC for these machines, get GCC 2.7.2.1 (or
|
|
later).
|
|
|
|
@item Sequent Symmetry
|
|
@cindex Sequent Symmetry
|
|
Use the GNU assembler instead of the system assembler, since the latter has
|
|
serious bugs.
|
|
|
|
@item Solaris 2.6
|
|
@cindex Solaris
|
|
The system @command{sed} prints an error ``Output line too long'' when libtool
|
|
builds @file{libgmp.la}/@file{libmpir.la}. This doesn't seem to cause any obvious ill effects,
|
|
but GNU @command{sed} is recommended, to avoid any doubt.
|
|
|
|
@item Sparc Solaris 2.7 with gcc 2.95.2 in @samp{ABI=32}
|
|
@cindex Solaris
|
|
A shared library build of MPIR seems to fail in this combination, it builds but
|
|
then fails the tests, apparently due to some incorrect data relocations within
|
|
@code{gmp_randinit_lc_2exp_size}. The exact cause is unknown,
|
|
@samp{--disable-shared} is recommended.
|
|
@end table
|
|
|
|
|
|
@need 2000
@node Performance optimization, , Known Build Problems, Installing MPIR
@section Performance optimization
@cindex Optimizing performance

@c At some point, this should perhaps move to a separate chapter on optimizing
@c performance.

For optimal performance, build MPIR for the exact CPU type of the target
computer, see @ref{Build Options}.

Unlike the situation for most other programs, the choice of compiler typically
doesn't matter much, since MPIR uses assembly language for the most critical
operations.

In particular for long-running MPIR applications, and applications demanding
extremely large numbers, building and running the @code{tuneup} program in the
@file{tune} subdirectory can be important. For example,

@example
cd tune
make tuneup
./tuneup
@end example

will generate better contents for the @file{gmp-mparam.h} parameter file.

To use the results, put the output in the file indicated in the
@samp{Parameters for ...} header. Then recompile from scratch.

The @code{tuneup} program takes one useful parameter, @samp{-f NNN}, which
instructs the program how long to check FFT multiply parameters. If you're
going to use MPIR for extremely large numbers, you may want to run @code{tuneup}
with a large NNN value.


@node MPIR Basics, Reporting Bugs, Installing MPIR, Top
@comment node-name, next, previous, up
@chapter MPIR Basics
@cindex Basics

@strong{Using functions, macros, data types, etc.@: not documented in this
manual is strongly discouraged. If you do so your application is guaranteed
to be incompatible with future versions of MPIR.}

@menu
* Headers and Libraries::
* Nomenclature and Types::
* Function Classes::
* Variable Conventions::
* Parameter Conventions::
* Memory Management::
* Reentrancy::
* Useful Macros and Constants::
* Compatibility with older versions::
* Demonstration Programs::
* Efficiency::
* Debugging::
* Profiling::
* Autoconf::
* Emacs::
@end menu

@node Headers and Libraries, Nomenclature and Types, MPIR Basics, MPIR Basics
@section Headers and Libraries
@cindex Headers

@cindex @file{mpir.h}
@cindex Include files
@cindex @code{#include}
All declarations needed to use MPIR are collected in the include file
@file{mpir.h}. It is designed to work with both C and C++ compilers.

@example
#include <mpir.h>
@end example

@cindex @code{stdio.h}
Note however that prototypes for MPIR functions with @code{FILE *} parameters
are only provided if @code{<stdio.h>} is included too.

@example
#include <stdio.h>
#include <mpir.h>
@end example

@cindex @code{stdarg.h}
Likewise @code{<stdarg.h>} (or @code{<varargs.h>}) is required for prototypes
with @code{va_list} parameters, such as @code{gmp_vprintf}. And
@code{<obstack.h>} for prototypes with @code{struct obstack} parameters, such
as @code{gmp_obstack_printf}, when available.

@cindex Libraries
@cindex Linking
@cindex @code{libgmp/libmpir}
All programs using MPIR must link against the @file{libgmp} or @file{libmpir} library. On a
typical Unix-like system this can be done with @samp{-lgmp} or @samp{-lmpir} respectively, for example

@example
gcc myprogram.c -lmpir
@end example

@cindex @code{libgmpxx}
MPIR C++ functions are in a separate @file{libgmpxx} or @file{libmpirxx} library. This is built
and installed if C++ support has been enabled (@pxref{Build Options}). For
example,

@example
g++ mycxxprog.cc -lmpirxx -lmpir
@end example

@cindex Libtool
MPIR is built using Libtool and an application can use that to link if desired,
@GMPpxreftop{libtool, GNU Libtool}.

If MPIR has been installed to a non-standard location then it may be necessary
to use @samp{-I} and @samp{-L} compiler options to point to the right
directories, and some sort of run-time path for a shared library.
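
As a sketch only, suppose MPIR was installed under the hypothetical prefix
@file{/opt/mpir} and the system uses GCC with the GNU linker. The
@samp{-Wl,-rpath} spelling of the run-time path is an assumption about that
toolchain and differs on other systems.

@example
gcc -I/opt/mpir/include myprogram.c -L/opt/mpir/lib \
    -Wl,-rpath,/opt/mpir/lib -lmpir
@end example

@noindent
On many Unix-like systems, setting @env{LD_LIBRARY_PATH} when the program is
run is an alternative to embedding a run-time path at link time.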


@node Nomenclature and Types, Function Classes, Headers and Libraries, MPIR Basics
@section Nomenclature and Types
@cindex Nomenclature
@cindex Types

@cindex Integer
@tindex @code{mpz_t}
In this manual, @dfn{integer} usually means a multiple precision integer, as
defined by the MPIR library. The C data type for such integers is @code{mpz_t}.
Here are some examples of how to declare such integers:

@example
mpz_t sum;

struct foo @{ mpz_t x, y; @};

mpz_t vec[20];
@end example

@cindex Rational number
@tindex @code{mpq_t}
@dfn{Rational number} means a multiple precision fraction. The C data type
for these fractions is @code{mpq_t}. For example:

@example
mpq_t quotient;
@end example

@cindex Floating-point number
@tindex @code{mpf_t}
@dfn{Floating point number}, or @dfn{Float} for short, means an arbitrary precision
mantissa with a limited precision exponent. The C data type for such objects
is @code{mpf_t}. For example:

@example
mpf_t fp;
@end example

@tindex @code{mp_exp_t}
The floating point functions accept and return exponents in the C type
@code{mp_exp_t}. Currently this is usually a @code{long}, but on some systems
it's an @code{int} for efficiency.

@cindex Limb
@tindex @code{mp_limb_t}
A @dfn{limb} means the part of a multi-precision number that fits in a single
machine word. (We chose this word because a limb of the human body is
analogous to a digit, only larger, and containing several digits.) Normally a
limb is 32 or 64 bits. The C data type for a limb is @code{mp_limb_t}.

@tindex @code{mp_size_t}
Counts of limbs are represented in the C type @code{mp_size_t}. Currently
this is normally a @code{long}, but on some systems it's an @code{int} for
efficiency.

@cindex Random state
@tindex @code{gmp_randstate_t}
@dfn{Random state} means an algorithm selection and current state data. The C
data type for such objects is @code{gmp_randstate_t}. For example:

@example
gmp_randstate_t rstate;
@end example

Also, in general @code{unsigned long} is used for bit counts and ranges, and
@code{size_t} is used for byte or character counts.


@node Function Classes, Variable Conventions, Nomenclature and Types, MPIR Basics
@section Function Classes
@cindex Function classes

There are six classes of functions in the MPIR library:

@enumerate
@item
Functions for signed integer arithmetic, with names beginning with
@code{mpz_}. The associated type is @code{mpz_t}. There are about 150
functions in this class. (@pxref{Integer Functions})

@item
Functions for rational number arithmetic, with names beginning with
@code{mpq_}. The associated type is @code{mpq_t}. There are about 40
functions in this class, but the integer functions can be used for arithmetic
on the numerator and denominator separately. (@pxref{Rational Number
Functions})

@item
Functions for floating-point arithmetic, with names beginning with
@code{mpf_}. The associated type is @code{mpf_t}. There are about 60
functions in this class. (@pxref{Floating-point Functions})

@item
Functions compatible with Berkeley MP, such as @code{itom}, @code{madd}, and
@code{mult}. The associated type is @code{MINT}. (@pxref{BSD Compatible
Functions})

@item
Fast low-level functions that operate on natural numbers. These are used by
the functions in the preceding groups, and you can also call them directly
from very time-critical user programs. These functions' names begin with
@code{mpn_}. The associated type is an array of @code{mp_limb_t}. There are
about 30 (hard-to-use) functions in this class. (@pxref{Low-level Functions})

@item
Miscellaneous functions. Functions for setting up custom allocation and
functions for generating random numbers. (@pxref{Custom Allocation}, and
@pxref{Random Number Functions})
@end enumerate


@node Variable Conventions, Parameter Conventions, Function Classes, MPIR Basics
@section Variable Conventions
@cindex Variable conventions
@cindex Conventions for variables

MPIR functions generally have output arguments before input arguments. This
notation is by analogy with the assignment operator. The BSD MP compatibility
functions are exceptions, having the output arguments last.

MPIR lets you use the same variable for both input and output in one call. For
example, the main function for integer multiplication, @code{mpz_mul}, can be
used to square @code{x} and put the result back in @code{x} with

@example
mpz_mul (x, x, x);
@end example

Before you can assign to an MPIR variable, you need to initialize it by calling
one of the special initialization functions. When you're done with a
variable, you need to clear it out, using one of the functions for that
purpose. Which function to use depends on the type of variable. See the
chapters on integer functions, rational number functions, and floating-point
functions for details.

A variable should only be initialized once, or at least cleared between each
initialization. After a variable has been initialized, it may be assigned to
any number of times.
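
As a minimal sketch of that life cycle (the variable names and values are
purely illustrative), each variable below is initialized once, assigned to
more than once where needed, and cleared once.

@example
#include <stdio.h>
#include <mpir.h>

int
main (void)
@{
  mpz_t n;
  mpq_t q;
  mpf_t f;

  mpz_init (n);                 /* n = 0 */
  mpq_init (q);                 /* q = 0/1 */
  mpf_init2 (f, 128);           /* f = 0, at least 128 bits of precision */

  mpz_set_ui (n, 10);           /* assign as often as needed */
  mpz_mul (n, n, n);            /* n = 100 */
  mpq_set_ui (q, 3, 4);
  mpf_set_d (f, 1.5);

  gmp_printf ("%Zd %Qd %Ff\n", n, q, f);

  mpz_clear (n);                /* one clear per init */
  mpq_clear (q);
  mpf_clear (f);
  return 0;
@}
@end example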

For efficiency reasons, avoid excessive initializing and clearing. In
general, initialize near the start of a function and clear near the end. For
example,

@example
void
foo (void)
@{
  mpz_t n;
  int i;
  mpz_init (n);
  for (i = 1; i < 100; i++)
    @{
      mpz_mul (n, @dots{});
      mpz_fdiv_q (n, @dots{});
      @dots{}
    @}
  mpz_clear (n);
@}
@end example


@node Parameter Conventions, Memory Management, Variable Conventions, MPIR Basics
@section Parameter Conventions
@cindex Parameter conventions
@cindex Conventions for parameters

When an MPIR variable is used as a function parameter, it's effectively a
call-by-reference, meaning if the function stores a value there it will change
the original in the caller. Parameters which are input-only can be designated
@code{const} to provoke a compiler error or warning on attempting to modify
them.

When a function is going to return an MPIR result, it should designate a
parameter that it sets, like the library functions do. More than one value
can be returned by having more than one output parameter, again like the
library functions. A @code{return} of an @code{mpz_t} etc doesn't return the
object, only a pointer, and this is almost certainly not what's wanted.

Here's an example accepting an @code{mpz_t} parameter, doing a calculation,
and storing the result to the indicated parameter.

@example
void
foo (mpz_t result, const mpz_t param, unsigned long n)
@{
  unsigned long i;
  mpz_mul_ui (result, param, n);
  for (i = 1; i < n; i++)
    mpz_add_ui (result, result, i*7);
@}

int
main (void)
@{
  mpz_t r, n;
  mpz_init (r);
  mpz_init_set_str (n, "123456", 0);
  foo (r, n, 20L);
  gmp_printf ("%Zd\n", r);
  return 0;
@}
@end example

@code{foo} works even if the mainline passes the same variable for
@code{param} and @code{result}, just like the library functions. But
sometimes it's tricky to make that work, and an application might not want to
bother supporting that sort of thing.
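
When an intermediate result would overwrite an input that is still needed, a
temporary is one common way to stay safe with overlapping parameters. The
following sketch uses a hypothetical function @code{square_plus}, which is not
part of MPIR, to illustrate the pattern.

@example
#include <mpir.h>

/* result = a*a + b, safe even if result is the same variable as a or b */
void
square_plus (mpz_t result, const mpz_t a, const mpz_t b)
@{
  mpz_t t;
  mpz_init (t);
  mpz_mul (t, a, a);       /* storing this in result could destroy b */
  mpz_add (result, t, b);
  mpz_clear (t);
@}
@end example

@noindent
The price is one extra initialize and clear per call, so this is only worth
doing when callers really do pass overlapping variables.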

For interest, the MPIR types @code{mpz_t} etc are implemented as one-element
arrays of certain structures. This is why declaring a variable creates an
object with the fields MPIR needs, but then using it as a parameter passes a
pointer to the object. Note that the actual fields in each @code{mpz_t} etc
are for internal use only and should not be accessed directly by code that
expects to be compatible with future MPIR releases.


@need 1000
@node Memory Management, Reentrancy, Parameter Conventions, MPIR Basics
@section Memory Management
@cindex Memory management

The MPIR types like @code{mpz_t} are small, containing only a couple of sizes,
and pointers to allocated data. Once a variable is initialized, MPIR takes
care of all space allocation. Additional space is allocated whenever a
variable doesn't have enough.

@code{mpz_t} and @code{mpq_t} variables never reduce their allocated space.
Normally this is the best policy, since it avoids frequent reallocation.
Applications that need to return memory to the heap at some particular point
can use @code{mpz_realloc2}, or clear variables no longer needed.
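
A minimal sketch of the first option follows. The helper name @code{shrink}
is hypothetical, and whether the freed memory actually goes back to the
operating system depends on the allocator in use.

@example
#include <mpir.h>

/* Keep x initialized and usable, but give back its large allocation.  */
void
shrink (mpz_t x)
@{
  mpz_set_ui (x, 0);       /* drop the large value */
  mpz_realloc2 (x, 1);     /* reduce the allocation to the minimum */
@}
@end example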

@code{mpf_t} variables, in the current implementation, use a fixed amount of
space, determined by the chosen precision and allocated at initialization, so
their size doesn't change.

All memory is allocated using @code{malloc} and friends by default, but this
can be changed, see @ref{Custom Allocation}. Temporary memory on the stack is
also used (via @code{alloca}), but this can be changed at build-time if
desired, see @ref{Build Options}.


@node Reentrancy, Useful Macros and Constants, Memory Management, MPIR Basics
@section Reentrancy
@cindex Reentrancy
@cindex Thread safety
@cindex Multi-threading

@noindent
MPIR is reentrant and thread-safe, with some exceptions:

@itemize @bullet
@item
If configured with @option{--enable-alloca=malloc-notreentrant} (or with
@option{--enable-alloca=notreentrant} when @code{alloca} is not available),
then naturally MPIR is not reentrant.

@item
@code{mpf_set_default_prec} and @code{mpf_init} use a global variable for the
selected precision. @code{mpf_init2} can be used instead, and in the C++
interface an explicit precision to the @code{mpf_class} constructor.

@item
@code{mpz_random} and the other old random number functions use a global
random state and are hence not reentrant. The newer random number functions
that accept a @code{gmp_randstate_t} parameter can be used instead.

@item
@code{gmp_randinit} (obsolete) returns an error indication through a global
variable, which is not thread safe. Applications are advised to use
@code{gmp_randinit_default} or @code{gmp_randinit_lc_2exp} instead.

@item
@code{mp_set_memory_functions} uses global variables to store the selected
memory allocation functions.

@item
If the memory allocation functions set by a call to
@code{mp_set_memory_functions} (or @code{malloc} and friends by default) are
not reentrant, then MPIR will not be reentrant either.

@item
If the standard I/O functions such as @code{fwrite} are not reentrant then the
MPIR I/O functions using them will not be reentrant either.

@item
It's safe for two threads to read from the same MPIR variable simultaneously,
but it's not safe for one to read while another might be writing, nor for
two threads to write simultaneously. It's not safe for two threads to
generate a random number from the same @code{gmp_randstate_t} simultaneously,
since this involves an update of that variable.
@end itemize
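
The points about random state and float precision can be sketched as follows.
The function @code{worker} is a hypothetical per-thread routine, not part of
MPIR. It keeps its random state in a local variable and uses @code{mpf_init2}
so that no global default is consulted. Each thread is assumed to pass its own
@code{result} variable.

@example
#include <mpir.h>

void
worker (mpz_t result, unsigned long seed)
@{
  gmp_randstate_t state;    /* private random state, not shared */
  mpf_t f;

  gmp_randinit_default (state);
  gmp_randseed_ui (state, seed);
  mpz_urandomb (result, state, 256);   /* uniform in 0 .. 2^256-1 */

  mpf_init2 (f, 256);       /* explicit precision, no global default */
  mpf_set_z (f, result);
  mpf_sqrt (f, f);
  mpz_set_f (result, f);    /* truncated square root */

  mpf_clear (f);
  gmp_randclear (state);
@}
@end example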


@need 2000
@node Useful Macros and Constants, Compatibility with older versions, Reentrancy, MPIR Basics
@section Useful Macros and Constants
@cindex Useful macros and constants
@cindex Constants

@deftypevr {Global Constant} {const int} mp_bits_per_limb
@findex mp_bits_per_limb
@cindex Bits per limb
@cindex Limb size
The number of bits per limb.
@end deftypevr

@defmac __GNU_MP_VERSION
@defmacx __GNU_MP_VERSION_MINOR
@defmacx __GNU_MP_VERSION_PATCHLEVEL
@cindex Version number
@cindex MPIR version number
The major and minor GMP version, and patch level, respectively, as integers.
For GMP i.j, these numbers will be i, j, and 0, respectively.
For GMP i.j.k, these numbers will be i, j, and k, respectively.
These numbers represent the version of GMP fully supported by this version of MPIR.
@end defmac

@defmac __MPIR_VERSION
@defmacx __MPIR_VERSION_MINOR
@defmacx __MPIR_VERSION_PATCHLEVEL
@cindex Version number
@cindex MPIR version number
The major and minor MPIR version, and patch level, respectively, as integers.
For MPIR i.j, these numbers will be i, j, and 0, respectively.
For MPIR i.j.k, these numbers will be i, j, and k, respectively.
@end defmac

@deftypevr {Global Constant} {const char * const} gmp_version
@findex gmp_version
The GNU MP version number, as a null-terminated string, in the form ``i.j'' or
``i.j.k''.
@end deftypevr

@deftypevr {Global Constant} {const char * const} mpir_version
@findex mpir_version
The MPIR version number, as a null-terminated string, in the form ``i.j'' or
``i.j.k''. This release is @nicode{"@value{VERSION}"}.
@end deftypevr
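
A short program along the following lines prints all of these. It is only a
sketch, but it shows a useful distinction: the macros come from the
@file{mpir.h} seen at compile time, whereas the two strings come from the
library actually linked.

@example
#include <stdio.h>
#include <mpir.h>

int
main (void)
@{
  printf ("limb size: %d bits\n", mp_bits_per_limb);
  printf ("header: MPIR %d.%d.%d (GMP %d.%d.%d compatible)\n",
          __MPIR_VERSION, __MPIR_VERSION_MINOR, __MPIR_VERSION_PATCHLEVEL,
          __GNU_MP_VERSION, __GNU_MP_VERSION_MINOR,
          __GNU_MP_VERSION_PATCHLEVEL);
  printf ("library: MPIR %s (GMP %s)\n", mpir_version, gmp_version);
  return 0;
@}
@end example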

@node Compatibility with older versions, Demonstration Programs, Useful Macros and Constants, MPIR Basics
@section Compatibility with older versions
@cindex Compatibility with older versions
@cindex Past GMP/MPIR versions
@cindex Upward compatibility

This version of MPIR is upwardly binary compatible with all GMP 4.x and 3.x
versions, and upwardly compatible at the source level with all 2.x versions,
with the following exceptions.

@itemize @bullet
@item
@code{mpn_gcd} had its source arguments swapped as of GMP 3.0, for consistency
with other @code{mpn} functions.

@item
@code{mpf_get_prec} counted precision slightly differently in GMP 3.0 and
3.0.1, but in 3.1 reverted to the 2.x style.
@end itemize

There are a number of compatibility issues between GMP 1 and GMP 2 that of
course also apply when porting applications from GMP 1 to GMP 4 and MPIR 1. Please
see the GMP 2 manual for details.

The Berkeley MP compatibility library (@pxref{BSD Compatible Functions}) is
source and binary compatible with the standard @file{libmp}.

@c @enumerate
@c @item Integer division functions round the result differently. The obsolete
@c functions (@code{mpz_div}, @code{mpz_divmod}, @code{mpz_mdiv},
@c @code{mpz_mdivmod}, etc) now all use floor rounding (i.e., they round the
@c quotient towards
@c @ifinfo
@c @minus{}infinity).
@c @end ifinfo
@c @iftex
@c @tex
@c $-\infty$).
@c @end tex
@c @end iftex
@c There are a lot of functions for integer division, giving the user better
@c control over the rounding.

@c @item The function @code{mpz_mod} now compute the true @strong{mod} function.

@c @item The functions @code{mpz_powm} and @code{mpz_powm_ui} now use
@c @strong{mod} for reduction.

@c @item The assignment functions for rational numbers do no longer canonicalize
@c their results. In the case a non-canonical result could arise from an
@c assignment, the user need to insert an explicit call to
@c @code{mpq_canonicalize}. This change was made for efficiency.

@c @item Output generated by @code{mpz_out_raw} in this release cannot be read
@c by @code{mpz_inp_raw} in previous releases. This change was made for making
@c the file format truly portable between machines with different word sizes.

@c @item Several @code{mpn} functions have changed. But they were intentionally
@c undocumented in previous releases.

@c @item The functions @code{mpz_cmp_ui}, @code{mpz_cmp_si}, and @code{mpq_cmp_ui}
@c are now implemented as macros, and thereby sometimes evaluate their
@c arguments multiple times.

@c @item The functions @code{mpz_pow_ui} and @code{mpz_ui_pow_ui} now yield 1
@c for 0^0. (In version 1, they yielded 0.)

@c In version 1 of the library, @code{mpq_set_den} handled negative
@c denominators by copying the sign to the numerator. That is no longer done.

@c Pure assignment functions do not canonicalize the assigned variable. It is
@c the responsibility of the user to canonicalize the assigned variable before
@c any arithmetic operations are performed on that variable.
@c Note that this is an incompatible change from version 1 of the library.

@c @end enumerate


@need 1000
@node Demonstration Programs, Efficiency, Compatibility with older versions, MPIR Basics
@section Demonstration Programs
@cindex Demonstration programs
@cindex Example programs
@cindex Sample programs
The @file{demos} subdirectory has some sample programs using MPIR@. These
aren't built or installed, but there's a @file{Makefile} with rules for them.
For instance,

@example
make pexpr
./pexpr 68^975+10
@end example

@noindent
The following programs are provided:

@itemize @bullet
@item
@cindex Expression parsing demo
@cindex Parsing expressions demo
@samp{pexpr} is an expression evaluator, the program used on the GMP web page.
@item
@cindex Expression parsing demo
@cindex Parsing expressions demo
The @samp{calc} subdirectory has a similar but simpler evaluator using
@command{lex} and @command{yacc}.
@item
@cindex Expression parsing demo
@cindex Parsing expressions demo
The @samp{expr} subdirectory is yet another expression evaluator, a library
designed for ease of use within a C program. See @file{demos/expr/README} for
more information.
@item
@cindex Factorization demo
@samp{factorize} is a Pollard-Rho factorization program.
@item
@samp{isprime} is a command-line interface to the @code{mpz_probab_prime_p}
function.
@item
@samp{primes} counts or lists primes in an interval, using a sieve.
@item
@samp{qcn} is an example use of @code{mpz_kronecker_ui} to estimate quadratic
class numbers.
@item
@cindex @code{perl}
@cindex MPIR Perl module
@cindex Perl module
The @samp{perl} subdirectory is a comprehensive perl interface to MPIR@. See
@file{demos/perl/INSTALL} for more information. Documentation is in POD
format in @file{demos/perl/GMP.pm}.
@end itemize

As an aside, consideration has been given at various times to some sort of
expression evaluation within the main MPIR library. Going beyond something
minimal quickly leads to matters like user-defined functions, looping, fixnums
for control variables, etc, which are considered outside the scope of MPIR
(much closer to language interpreters or compilers, @pxref{Language Bindings}).
Something simple for program input convenience may yet be a possibility, a
combination of the @file{expr} demo and the @file{pexpr} tree back-end
perhaps. But for now the above evaluators are offered as illustrations.


@need 1000
@node Efficiency, Debugging, Demonstration Programs, MPIR Basics
@section Efficiency
@cindex Efficiency

@table @asis
@item Small Operands
@cindex Small operands
On small operands, the time for function call overheads and memory allocation
can be significant in comparison to actual calculation. This is unavoidable
in a general purpose variable precision library, although MPIR attempts to be
as efficient as it can on both large and small operands.

@item Static Linking
@cindex Static linking
On some CPUs, in particular the x86s, the static @file{libgmp.a}/@file{libmpir.a} should be
used for maximum speed, since the PIC code in the shared @file{libgmp.so}/@file{libmpir.so} will
have a small overhead on each function call and global data address. For many
programs this will be insignificant, but for long calculations there's a gain
to be had.

@item Initializing and Clearing
@cindex Initializing and clearing
Avoid excessive initializing and clearing of variables, since this can be
quite time consuming, especially in comparison to otherwise fast operations
like addition.

A language interpreter might want to keep a free list or stack of
initialized variables ready for use. It should be possible to integrate
something like that with a garbage collector too.

@item Reallocations
@cindex Reallocations
An @code{mpz_t} or @code{mpq_t} variable used to hold successively increasing
values will have its memory repeatedly @code{realloc}ed, which could be quite
slow or could fragment memory, depending on the C library. If an application
can estimate the final size then @code{mpz_init2} or @code{mpz_realloc2} can
be called to allocate the necessary space from the beginning
(@pxref{Initializing Integers}).

It doesn't matter if a size set with @code{mpz_init2} or @code{mpz_realloc2}
is too small, since all functions will do a further reallocation if necessary.
Badly overestimating memory required will waste space though.
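
As an illustration, the hypothetical function @code{pow3} below builds
@math{3^k} by repeated multiplication. Since each factor of 3 adds fewer than
two bits, @math{2k+1} bits is a simple estimate with some slack. This is a
sketch of the idea rather than a recommended way to compute powers.

@example
#include <mpir.h>

/* Initialize r and set it to 3^k.  The caller clears r when done.  */
void
pow3 (mpz_t r, unsigned long k)
@{
  unsigned long i;
  mpz_init2 (r, 2 * k + 1);    /* space for the final value up front */
  mpz_set_ui (r, 1);
  for (i = 0; i < k; i++)
    mpz_mul_ui (r, r, 3);      /* no reallocation expected in the loop */
@}
@end example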

@item @code{2exp} Functions
@cindex @code{2exp} functions
It's up to an application to call functions like @code{mpz_mul_2exp} when
appropriate. General purpose functions like @code{mpz_mul} make no attempt to
identify powers of two or other special forms, because such inputs will
usually be very rare and testing every time would be wasteful.
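
For instance, assuming @code{x}, @code{y} and @code{z} are initialized
@code{mpz_t} variables, the two calls below produce equal results, but the
first is a plain shift and can be expected to be somewhat cheaper.

@example
mpz_mul_2exp (y, x, 10);     /* y = x * 2^10 */
mpz_mul_ui (z, x, 1024);     /* z = x * 1024, through general code */
@end example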

@item @code{ui} and @code{si} Functions
@cindex @code{ui} and @code{si} functions
The @code{ui} functions and the small number of @code{si} functions exist for
convenience and should be used where applicable. But if for example an
@code{mpz_t} contains a value that fits in an @code{unsigned long} there's no
need to extract it and call a @code{ui} function, just use the regular @code{mpz}
function.

@item In-Place Operations
@cindex In-place operations
@code{mpz_abs}, @code{mpq_abs}, @code{mpf_abs}, @code{mpz_neg}, @code{mpq_neg}
and @code{mpf_neg} are fast when used for in-place operations like
@code{mpz_abs(x,x)}, since in the current implementation only a single field
of @code{x} needs changing. On suitable compilers (GCC for instance) this is
inlined too.

@code{mpz_add_ui}, @code{mpz_sub_ui}, @code{mpf_add_ui} and @code{mpf_sub_ui}
benefit from an in-place operation like @code{mpz_add_ui(x,x,y)}, since
usually only one or two limbs of @code{x} will need to be changed. The same
applies to the full precision @code{mpz_add} etc if @code{y} is small. If
@code{y} is big then cache locality may be helped, but that's all.

@code{mpz_mul} is currently the opposite, a separate destination is slightly
better. A call like @code{mpz_mul(x,x,y)} will, unless @code{y} is only one
limb, make a temporary copy of @code{x} before forming the result. Normally
that copying will only be a tiny fraction of the time for the multiply, so
this is not a particularly important consideration.

@code{mpz_set}, @code{mpq_set}, @code{mpq_set_num}, @code{mpf_set}, etc, make
no attempt to recognise a copy of something to itself, so a call like
@code{mpz_set(x,x)} will be wasteful. Naturally that would never be written
deliberately, but if it might arise from two pointers to the same object then
a test to avoid it might be desirable.

@example
if (x != y)
  mpz_set (x, y);
@end example

Note that it's never worth introducing extra @code{mpz_set} calls just to get
in-place operations. If a result should go to a particular variable then just
direct it there and let MPIR take care of data movement.

@item Divisibility Testing (Small Integers)
@cindex Divisibility testing
@code{mpz_divisible_ui_p} and @code{mpz_congruent_ui_p} are the best functions
for testing whether an @code{mpz_t} is divisible by an individual small
integer. They use an algorithm which is faster than @code{mpz_tdiv_ui}, but
which gives no useful information about the actual remainder, only whether
it's zero (or a particular value).

However when testing divisibility by several small integers, it's best to take
a remainder modulo their product, to save multi-precision operations. For
instance to test whether a number is divisible by any of 23, 29 or 31 take a
remainder modulo @math{23@times{}29@times{}31 = 20677} and then test that.
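
That test might be written as below. The function name
@code{has_small_factor} is hypothetical. Only the call to @code{mpz_tdiv_ui}
touches the multi-precision value, and the rest is ordinary machine
arithmetic.

@example
#include <mpir.h>

/* Return non-zero if n is divisible by any of 23, 29 or 31.  */
int
has_small_factor (const mpz_t n)
@{
  /* one multi-precision operation; 20677 = 23*29*31 */
  unsigned long r = mpz_tdiv_ui (n, 20677UL);

  return r % 23 == 0 || r % 29 == 0 || r % 31 == 0;
@}
@end example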

The division functions like @code{mpz_tdiv_q_ui} which give a quotient as well
as a remainder are generally a little slower than the remainder-only functions
like @code{mpz_tdiv_ui}. If the quotient is only rarely wanted then it's
probably best to just take a remainder and then go back and calculate the
quotient if and when it's wanted (@code{mpz_divexact_ui} can be used if the
remainder is zero).

@item Rational Arithmetic
@cindex Rational arithmetic
The @code{mpq} functions operate on @code{mpq_t} values with no common factors
in the numerator and denominator. Common factors are checked for and cast out
as necessary. In general, cancelling factors every time is the best approach
since it minimizes the sizes for subsequent operations.

However, applications that know something about the factorization of the
values they're working with might be able to avoid some of the GCDs used for
canonicalization, or swap them for divisions. For example when multiplying by
a prime it's enough to check for factors of it in the denominator instead of
doing a full GCD@. Or when forming a big product it might be known that very
little cancellation will be possible, and so canonicalization can be left to
the end.
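
The big product case could look like the following sketch, in which the
function name @code{product_q} is hypothetical. It multiplies numerators and
denominators separately and canonicalizes once at the end. It assumes the
inputs are in canonical form and that @code{prod} is not one of the elements
of @code{v}.

@example
#include <stddef.h>
#include <mpir.h>

/* prod = v[0] * v[1] * ... * v[n-1], with a single canonicalization */
void
product_q (mpq_t prod, mpq_t v[], size_t n)
@{
  size_t i;
  mpq_set_ui (prod, 1, 1);
  for (i = 0; i < n; i++)
    @{
      mpz_mul (mpq_numref (prod), mpq_numref (prod), mpq_numref (v[i]));
      mpz_mul (mpq_denref (prod), mpq_denref (prod), mpq_denref (v[i]));
    @}
  mpq_canonicalize (prod);
@}
@end example

@noindent
If a lot of cancellation does occur, the intermediate values grow larger than
necessary and this approach is likely to be slower than calling
@code{mpq_mul} each time.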
|
|
|
|
The @code{mpq_numref} and @code{mpq_denref} macros give access to the
|
|
numerator and denominator to do things outside the scope of the supplied
|
|
@code{mpq} functions. @xref{Applying Integer Functions}.
|
|
|
|
The canonical form for rationals allows mixed-type @code{mpq_t} and integer
|
|
additions or subtractions to be done directly with multiples of the
|
|
denominator. This will be somewhat faster than @code{mpq_add}. For example,
|
|
|
|
@example
|
|
/* mpq increment */
|
|
mpz_add (mpq_numref(q), mpq_numref(q), mpq_denref(q));
|
|
|
|
/* mpq += unsigned long */
|
|
mpz_addmul_ui (mpq_numref(q), mpq_denref(q), 123UL);
|
|
|
|
/* mpq -= mpz */
|
|
mpz_submul (mpq_numref(q), mpq_denref(q), z);
|
|
@end example
|
|
|
|
@item Number Sequences
|
|
@cindex Number sequences
|
|
Functions like @code{mpz_fac_ui}, @code{mpz_fib_ui} and @code{mpz_bin_uiui}
|
|
are designed for calculating isolated values. If a range of values is wanted
|
|
it's probably best to call the function once to get a starting point and
iterate from there.
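
For Fibonacci numbers, one possible sketch uses @code{mpz_fib2_ui} for the
starting pair and then plain additions. The variables @code{n} and
@code{count} are assumed to be @code{unsigned long} values supplied by the
application.

@example
@{
  mpz_t f, f1;
  unsigned long i;

  mpz_init (f);
  mpz_init (f1);
  mpz_fib2_ui (f, f1, n);        /* f = F[n], f1 = F[n-1] */
  for (i = 0; i < count; i++)
    @{
      gmp_printf ("%Zd\n", f);
      mpz_add (f1, f1, f);       /* f1 = F[n+1] */
      mpz_swap (f, f1);          /* f = F[n+1], f1 = F[n] */
    @}
  mpz_clear (f);
  mpz_clear (f1);
@}
@end example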
|
|
|
|
@item Text Input/Output
|
|
@cindex Text input/output
|
|
Hexadecimal or octal are suggested for input or output in text form.
|
|
Power-of-2 bases like these can be converted much more efficiently than other
|
|
bases, like decimal. For big numbers there's usually nothing of particular
|
|
interest to be seen in the digits, so the base doesn't matter much.
|
|
|
|
Maybe we can hope octal will one day become the normal base for everyday use,
|
|
as proposed by King Charles XII of Sweden and later reformers.
|
|
@c Reference: Knuth volume 2 section 4.1, page 184 of second edition. :-)
|
|
@end table
|
|
|
|
|
|
@node Debugging, Profiling, Efficiency, MPIR Basics
|
|
@section Debugging
|
|
@cindex Debugging
|
|
|
|
@table @asis
|
|
@item Stack Overflow
|
|
@cindex Stack overflow
|
|
@cindex Segmentation violation
|
|
@cindex Bus error
|
|
Depending on the system, a segmentation violation or bus error might be the
|
|
only indication of stack overflow. See the @samp{--enable-alloca} choices in
|
|
@ref{Build Options}, for how to address this.
|
|
|
|
In new enough versions of GCC, @samp{-fstack-check} may be able to ensure an
|
|
overflow is recognised by the system before too much damage is done, or
|
|
@samp{-fstack-limit-symbol} or @samp{-fstack-limit-register} may be able to
|
|
add checking if the system itself doesn't do any (@pxref{Code Gen Options,,
|
|
Options for Code Generation, gcc, Using the GNU Compiler Collection (GCC)}).
|
|
These options must be added to the @samp{CFLAGS} used in the MPIR build
|
|
(@pxref{Build Options}); adding them just to an application will have no
|
|
effect. Note also that they cause a slowdown, adding overhead to each function call
|
|
and each stack allocation.
|
|
|
|
@item Heap Problems
|
|
@cindex Heap problems
|
|
@cindex Malloc problems
|
|
The most likely cause of application problems with MPIR is heap corruption.
|
|
Failing to @code{init} MPIR variables will have unpredictable effects, and
|
|
corruption arising elsewhere in a program may well affect MPIR@. Initializing
|
|
MPIR variables more than once or failing to clear them will cause memory leaks.
|
|
|
|
@cindex Malloc debugger
|
|
In all such cases a @code{malloc} debugger is recommended. On a GNU or BSD
|
|
system the standard C library @code{malloc} has some diagnostic facilities,
|
|
see @ref{Allocation Debugging,, Allocation Debugging, libc, The GNU C Library
|
|
Reference Manual}, or @samp{man 3 malloc}. Other possibilities, in no
|
|
particular order, include
|
|
|
|
@display
|
|
@uref{http://www.inf.ethz.ch/personal/biere/projects/ccmalloc/}
|
|
@uref{http://dmalloc.com/}
|
|
@uref{http://www.perens.com/FreeSoftware/} @ (electric fence)
|
|
@uref{http://packages.debian.org/fda}
|
|
@uref{http://www.gnupdate.org/components/leakbug/}
|
|
@uref{http://people.redhat.com/~otaylor/memprof/}
|
|
@uref{http://www.cbmamiga.demon.co.uk/mpatrol/}
|
|
@end display
|
|
|
|
The MPIR default allocation routines in @file{memory.c} also have a simple
|
|
sentinel scheme which can be enabled with @code{#define DEBUG} in that file.
|
|
This is mainly designed for detecting buffer overruns during MPIR development,
|
|
but might find other uses.
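
For reference, the intended discipline of one @code{init} and one
@code{clear} per variable can be sketched as follows. The names @code{a},
@code{b}, @code{sum} and @code{n} are hypothetical application variables,
with @code{sum} and the array elements assumed to be already initialized.

@example
@{
  mpz_t t;
  unsigned long i;

  mpz_init (t);                  /* once, before first use */
  for (i = 0; i < n; i++)
    @{
      mpz_mul (t, a[i], b[i]);   /* reuse t, no further init needed */
      mpz_add (sum, sum, t);
    @}
  mpz_clear (t);                 /* once, when finished */
@}
@end example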
|
|
|
|
@item Stack Backtraces
|
|
@cindex Stack backtrace
|
|
On some systems the compiler options MPIR uses by default can interfere with
|
|
debugging. In particular on x86 and 68k systems @samp{-fomit-frame-pointer}
|
|
is used and this generally inhibits stack backtracing. Recompiling without
|
|
such options may help while debugging, though the usual caveats about it
|
|
potentially moving a memory problem or hiding a compiler bug will apply.
|
|
|
|
@item GDB, the GNU Debugger
|
|
@cindex GDB
|
|
@cindex GNU Debugger
|
|
A sample @file{.gdbinit} is included in the distribution, showing how to call
|
|
some undocumented dump functions to print MPIR variables from within GDB@. Note
|
|
that these functions shouldn't be used in final application code since they're
|
|
undocumented and may be subject to incompatible changes in future versions of
|
|
MPIR.
|
|
|
|
@item Source File Paths
|
|
MPIR has multiple source files with the same name, in different directories.
|
|
For example @file{mpz}, @file{mpq} and @file{mpf} each have an
|
|
@file{init.c}. If the debugger can't already determine the right one it may
|
|
help to build with absolute paths on each C file. One way to do that is to
|
|
use a separate object directory with an absolute path to the source directory.
|
|
|
|
@example
|
|
cd /my/build/dir
|
|
/my/source/dir/gmp-@value{VERSION}/configure
|
|
@end example
|
|
|
|
This works via @code{VPATH}, and might require GNU @command{make}.
|
|
Alternatively it might be possible to change the @code{.c.lo} rules
|
|
appropriately.
|
|
|
|
@item Assertion Checking
|
|
@cindex Assertion checking
|
|
The build option @option{--enable-assert} is available to add some consistency
|
|
checks to the library (see @ref{Build Options}). These are likely to be of
|
|
limited value to most applications. Assertion failures are just as likely to
|
|
indicate memory corruption as a library or compiler bug.
|
|
|
|
Applications using the low-level @code{mpn} functions, however, will benefit
|
|
from @option{--enable-assert} since it adds checks on the parameters of most
|
|
such functions, many of which have subtle restrictions on their usage. Note
|
|
however that only the generic C code has checks, not the assembler code, so
|
|
CPU @samp{none} should be used for maximum checking.
|
|
|
|
@item Temporary Memory Checking
|
|
The build option @option{--enable-alloca=debug} arranges that each block of
|
|
temporary memory in MPIR is allocated with a separate call to @code{malloc} (or
|
|
the allocation function set with @code{mp_set_memory_functions}).
|
|
|
|
This can help a malloc debugger detect accesses outside the intended bounds,
|
|
or detect memory not released. In a normal build, on the other hand,
|
|
temporary memory is allocated in blocks which MPIR divides up for its own use,
|
|
or may be allocated with a compiler builtin @code{alloca} which will go
|
|
nowhere near any malloc debugger hooks.
|
|
|
|
@item Maximum Debuggability
|
|
To summarize the above, an MPIR build for maximum debuggability would be
|
|
|
|
@example
|
|
./configure --disable-shared --enable-assert \
|
|
--enable-alloca=debug --host=none CFLAGS=-g
|
|
@end example
|
|
|
|
For C++, add @samp{--enable-cxx CXXFLAGS=-g}.
|
|
|
|
@item Checker
|
|
@cindex Checker
|
|
@cindex GCC Checker
|
|
The GCC checker (@uref{http://savannah.gnu.org/projects/checker/}) can be used
|
|
with MPIR@. It contains a stub library which means MPIR applications compiled
|
|
with checker can use a normal MPIR build.
|
|
|
|
A build of MPIR with checking within MPIR itself can be made. This will run
|
|
very slowly. On GNU/Linux for example,
|
|
|
|
@cindex @command{checkergcc}
|
|
@example
|
|
./configure --host=none-pc-linux-gnu CC=checkergcc
|
|
@end example
|
|
|
|
@samp{--host=none} must be used, since the MPIR assembler code doesn't support
|
|
the checking scheme. The MPIR C++ features cannot be used, since current
|
|
versions of checker (0.9.9.1) don't yet support the standard C++ library.
|
|
|
|
@item Valgrind
|
|
@cindex Valgrind
|
|
The valgrind program (@uref{http://valgrind.org/}) is a memory
|
|
checker for x86s. It translates and emulates machine instructions to do
|
|
strong checks for uninitialized data (at the level of individual bits), memory
|
|
accesses through bad pointers, and memory leaks.
|
|
|
|
Recent versions of Valgrind are getting support for MMX and SSE/SSE2
|
|
instructions; for past versions MPIR will need to be configured not to use
|
|
those, ie.@: for an x86 without them (for instance plain @samp{i486}).
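
A typical run on an application might look like the following, where
@file{myprog} is only a placeholder name and @samp{--leak-check=full} asks
Valgrind for a detailed report on each leaked block.

@example
valgrind --leak-check=full ./myprog
@end example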
|
|
|
|
@item Other Problems
|
|
Any suspected bug in MPIR itself should be isolated to make sure it's not an
|
|
application problem, see @ref{Reporting Bugs}.
|
|
@end table
|
|
|
|
|
|
@node Profiling, Autoconf, Debugging, MPIR Basics
|
|
@section Profiling
|
|
@cindex Profiling
|
|
@cindex Execution profiling
|
|
@cindex @code{--enable-profiling}
|
|
|
|
Running a program under a profiler is a good way to find where it's spending
|
|
most time and where improvements can be best sought. The profiling choices
|
|
for an MPIR build are as follows.
|
|
|
|
@table @asis
|
|
@item @samp{--disable-profiling}
|
|
The default is to add nothing special for profiling.
|
|
|
|
It should be possible to just compile the mainline of a program with @code{-p}
|
|
and use @command{prof} to get a profile consisting of timer-based sampling of
|
|
the program counter. Most of the MPIR assembler code has the necessary symbol
|
|
information.
|
|
|
|
This approach has the advantage of minimizing interference with normal program
|
|
operation, but on most systems the resolution of the sampling is quite low (10
|
|
milliseconds for instance), requiring long runs to get accurate information.
|
|
|
|
@item @samp{--enable-profiling=prof}
|
|
@cindex @code{prof}
|
|
Build with support for the system @command{prof}, which means @samp{-p} added
|
|
to the @samp{CFLAGS}.
|
|
|
|
This provides call counting in addition to program counter sampling, which
|
|
allows the most frequently called routines to be identified, and an average
|
|
time spent in each routine to be determined.
|
|
|
|
The x86 assembler code has support for this option, but on other processors
|
|
the assembler routines will be as if compiled without @samp{-p} and therefore
|
|
won't appear in the call counts.
|
|
|
|
On some systems, such as GNU/Linux, @samp{-p} in fact means @samp{-pg} and in
|
|
this case @samp{--enable-profiling=gprof} described below should be used
|
|
instead.
|
|
|
|
@item @samp{--enable-profiling=gprof}
|
|
@cindex @code{gprof}
|
|
Build with support for @command{gprof} (@GMPpxreftop{gprof, GNU gprof}), which
|
|
means @samp{-pg} added to the @samp{CFLAGS}.
|
|
|
|
This provides call graph construction in addition to call counting and program
|
|
counter sampling, which makes it possible to count calls coming from different
|
|
locations. For example the number of calls to @code{mpn_mul} from
|
|
@code{mpz_mul} versus the number from @code{mpf_mul}. The program counter
|
|
sampling is still flat though, so only a total time in @code{mpn_mul} would be
|
|
accumulated, not a separate amount for each call site.
|
|
|
|
The x86 assembler code has support for this option, but on other processors
|
|
the assembler routines will be as if compiled without @samp{-pg} and therefore
|
|
not be included in the call counts.
|
|
|
|
On x86 and m68k systems @samp{-pg} and @samp{-fomit-frame-pointer} are
|
|
incompatible, so the latter is omitted from the default flags in that case,
|
|
which might result in poorer code generation.
|
|
|
|
Incidentally, it should be possible to use the @command{gprof} program with a
|
|
plain @samp{--enable-profiling=prof} build. But in that case only the
|
|
@samp{gprof -p} flat profile and call counts can be expected to be valid, not
|
|
the @samp{gprof -q} call graph.
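
A complete session might look roughly like the following sketch. The program
name @file{myprog} is a placeholder, and a static MPIR
(@samp{--disable-shared}) is assumed here because @command{gprof} generally
cannot attribute samples to code inside shared libraries.

@example
./configure --enable-profiling=gprof --disable-shared
make
make install
gcc -pg -o myprog myprog.c -lmpir
./myprog
gprof ./myprog gmon.out
@end example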
|
|
|
|
@item @samp{--enable-profiling=instrument}
|
|
@cindex @code{-finstrument-functions}
|
|
@cindex @code{instrument-functions}
|
|
Build with the GCC option @samp{-finstrument-functions} added to the
|
|
@samp{CFLAGS} (@pxref{Code Gen Options,, Options for Code Generation, gcc,
|
|
Using the GNU Compiler Collection (GCC)}).
|
|
|
|
This inserts special instrumenting calls at the start and end of each
|
|
function, allowing exact timing and full call graph construction.
|
|
|
|
This instrumenting is not normally a standard system feature and will require
|
|
support from an external library, such as
|
|
|
|
@cindex FunctionCheck
|
|
@cindex fnccheck
|
|
@display
|
|
@uref{http://sourceforge.net/projects/fnccheck/}
|
|
@end display
|
|
|
|
This should be included in @samp{LIBS} during the MPIR configure so that test
|
|
programs will link. For example,
|
|
|
|
@example
|
|
./configure --enable-profiling=instrument LIBS=-lfc
|
|
@end example
|
|
|
|
On a GNU system the C library provides dummy instrumenting functions, so
|
|
programs compiled with this option will link. In this case it's only
|
|
necessary to ensure the correct library is added when linking an application.
|
|
|
|
The x86 assembler code supports this option, but on other processors the
|
|
assembler routines will be as if compiled without
|
|
@samp{-finstrument-functions} meaning time spent in them will effectively be
|
|
attributed to their caller.
|
|
@end table
|
|
|
|
|
|
@node Autoconf, Emacs, Profiling, MPIR Basics
|
|
@section Autoconf
|
|
@cindex Autoconf
|
|
|
|
Autoconf based applications can easily check whether MPIR is installed. The
|
|
only thing to be noted is that GMP/MPIR library symbols from version 3 of GMP
|
|
and version 1 of MPIR onwards have prefixes like @code{__gmpz}. The following
|
|
therefore would be a simple test,
|
|
|
|
@cindex @code{AC_CHECK_LIB}
|
|
@example
|
|
AC_CHECK_LIB(mpir, __gmpz_init)
|
|
@end example
|
|
|
|
This just uses the default @code{AC_CHECK_LIB} actions for found or not found,
|
|
but an application that must have MPIR would want to generate an error if not
|
|
found. For example,
|
|
|
|
@example
|
|
AC_CHECK_LIB(mpir, __gmpz_init, ,
|
|
[AC_MSG_ERROR([MPIR not found, see http://www.mpir.org/])])
|
|
@end example
|
|
|
|
If functions added in some particular version of GMP/MPIR are required, then one of
|
|
those can be used when checking. For example @code{mpz_mul_si} was added in
|
|
GMP 3.1,
|
|
|
|
@example
|
|
AC_CHECK_LIB(mpir, __gmpz_mul_si, ,
|
|
[AC_MSG_ERROR(
|
|
[GMP/MPIR not found, or not GMP 3.1 or up or MPIR 1.0 or up, see http://www.mpir.org/])])
|
|
@end example
|
|
|
|
An alternative would be to test the version number in @file{mpir.h} using say
|
|
@code{AC_EGREP_CPP}. That would make it possible to test the exact version,
|
|
if some particular sub-minor release is known to be necessary.
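
Such a test might be sketched as follows. The marker @code{mpir_version_ok}
is an arbitrary name chosen for this example, the 4.1 threshold is purely
illustrative, and @code{__GNU_MP_VERSION} and @code{__GNU_MP_VERSION_MINOR}
are the GMP version macros from the header. A distinctive marker is used
since a short word could match unrelated text in the preprocessed output.

@cindex @code{AC_EGREP_CPP}
@example
AC_EGREP_CPP(mpir_version_ok,
[#include <mpir.h>
#if __GNU_MP_VERSION > 4 \
    || (__GNU_MP_VERSION == 4 && __GNU_MP_VERSION_MINOR >= 1)
mpir_version_ok
#endif
], ,
  [AC_MSG_ERROR([GMP/MPIR too old, see http://www.mpir.org/])])
@end example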
|
|
|
|
In general it's recommended that applications should simply demand a new
|
|
enough MPIR rather than trying to provide supplements for features not
|
|
available in past versions.
|
|
|
|
Occasionally an application will need or want to know the size of a type at
|
|
configuration or preprocessing time, not just with @code{sizeof} in the code.
|
|
This can be done in the normal way with @code{mp_limb_t} etc, but GMP 4.0 or
|
|
up, or MPIR 1.0 or up, is best for this, since prior versions needed certain
|
|
@samp{-D} defines on systems using a @code{long long} limb. The following
|
|
would suit Autoconf 2.50 or up,
|
|
|
|
@example
|
|
AC_CHECK_SIZEOF(mp_limb_t, , [#include <mpir.h>])
|
|
@end example
|
|
|
|
|
|
@node Emacs, , Autoconf, MPIR Basics
|
|
@section Emacs
|
|
@cindex Emacs
|
|
@cindex @code{info-lookup-symbol}
|
|
|
|
@key{C-h C-i} (@code{info-lookup-symbol}) is a good way to find documentation
|
|
on C functions while editing (@pxref{Info Lookup, , Info Documentation Lookup,
|
|
emacs, The Emacs Editor}).
|
|
|
|
The MPIR manual can be included in such lookups by putting the following in
|
|
your @file{.emacs},
|
|
|
|
@c This isn't pretty, but there doesn't seem to be a better way (in emacs
|
|
@c 21.2 at least). info-lookup->mode-value could be used for the "assoc"s,
|
|
@c but that function isn't documented, whereas info-lookup-alist is.
|
|
@c
|
|
@example
|
|
(eval-after-load "info-look"
|
|
'(let ((mode-value (assoc 'c-mode (assoc 'symbol info-lookup-alist))))
|
|
(setcar (nthcdr 3 mode-value)
|
|
(cons '("(gmp)Function Index" nil "^ -.* " "\\>")
|
|
(nth 3 mode-value)))))
|
|
@end example
|
|
|
|
|
|
@node Reporting Bugs, Integer Functions, MPIR Basics, Top
|
|
@comment node-name, next, previous, up
|
|
@chapter Reporting Bugs
|
|
@cindex Reporting bugs
|
|
@cindex Bug reporting
|
|
|
|
If you think you have found a bug in the MPIR library, please investigate it
|
|
and report it. We have made this library available to you, and it is not too
|
|
much to ask you to report the bugs you find.
|
|
|
|
Before you report a bug, check it's not already addressed in @ref{Known Build
|
|
Problems}, or perhaps @ref{Notes for Particular Systems}. You may also want
|
|
to check @uref{http://www.mpir.org/} for patches for this release.
|
|
|
|
Please include the following in any report,
|
|
|
|
@itemize @bullet
|
|
@item
|
|
The MPIR version number, and if pre-packaged or patched then say so.
|
|
|
|
@item
|
|
A test program that makes it possible for us to reproduce the bug. Include
|
|
instructions on how to run the program.
|
|
|
|
@item
|
|
A description of what is wrong. If the results are incorrect, say in what way.
|
|
If you get a crash, say so.
|
|
|
|
@item
|
|
If you get a crash, include a stack backtrace from the debugger if it's
|
|
informative (@samp{where} in @command{gdb}, or @samp{$C} in @command{adb}).
|
|
|
|
@item
|
|
Please do not send core dumps, executables or @command{strace}s.
|
|
|
|
@item
|
|
The configuration options you used when building MPIR, if any.
|
|
|
|
@item
|
|
The name of the compiler and its version. For @command{gcc}, get the version
|
|
with @samp{gcc -v}, otherwise perhaps @samp{what `which cc`}, or similar.
|
|
|
|
@item
|
|
The output from running @samp{uname -a}.
|
|
|
|
@item
|
|
The output from running @samp{./config.guess}, and from running
|
|
@samp{./configfsf.guess} (might be the same).
|
|
|
|
@item
|
|
If the bug is related to @samp{configure}, then the contents of
|
|
@file{config.log}.
|
|
|
|
@item
|
|
If the bug is related to an @file{asm} file not assembling, then the contents
|
|
of @file{config.m4} and the offending line or lines from the temporary
|
|
@file{mpn/tmp-<file>.s}.
|
|
@end itemize
|
|
|
|
Please make an effort to produce a self-contained report, with something
|
|
definite that can be tested or debugged. Vague queries or piecemeal messages
|
|
are difficult to act on and don't help the development effort.
|
|
|
|
It is not uncommon that an observed problem is actually due to a bug in the
|
|
compiler; the MPIR code tends to explore interesting corners in compilers.
|
|
|
|
If your bug report is good, we will do our best to help you get a corrected
|
|
version of the library; if the bug report is poor, we won't do anything about
|
|
it (except maybe ask you to send a better report).
|
|
|
|
Send your report to: @uref{http://groups.google.com/group/mpir-devel}.
|
|
|
|
If you think something in this manual is unclear, or downright incorrect, or if
|
|
the language needs to be improved, please send a note to the same address.
|
|
|
|
|
|
@node Integer Functions, Rational Number Functions, Reporting Bugs, Top
|
|
@comment node-name, next, previous, up
|
|
@chapter Integer Functions
|
|
@cindex Integer functions
|
|
|
|
This chapter describes the MPIR functions for performing integer arithmetic.
|
|
These functions start with the prefix @code{mpz_}.
|
|
|
|
MPIR integers are stored in objects of type @code{mpz_t}.
|
|
|
|
@menu
|
|
* Initializing Integers::
|
|
* Assigning Integers::
|
|
* Simultaneous Integer Init & Assign::
|
|
* Converting Integers::
|
|
* Integer Arithmetic::
|
|
* Integer Division::
|
|
* Integer Exponentiation::
|
|
* Integer Roots::
|
|
* Number Theoretic Functions::
|
|
* Integer Comparisons::
|
|
* Integer Logic and Bit Fiddling::
|
|
* I/O of Integers::
|
|
* Integer Random Numbers::
|
|
* Integer Import and Export::
|
|
* Miscellaneous Integer Functions::
|
|
* Integer Special Functions::
|
|
@end menu
|
|
|
|
@node Initializing Integers, Assigning Integers, Integer Functions, Integer Functions
|
|
@comment node-name, next, previous, up
|
|
@section Initialization Functions
|
|
@cindex Integer initialization functions
|
|
@cindex Initialization functions
|
|
|
|
The functions for integer arithmetic assume that all integer objects are
|
|
initialized. You do that by calling the function @code{mpz_init}. For
|
|
example,
|
|
|
|
@example
|
|
@{
|
|
mpz_t integ;
|
|
mpz_init (integ);
|
|
@dots{}
|
|
mpz_add (integ, @dots{});
|
|
@dots{}
|
|
mpz_sub (integ, @dots{});
|
|
|
|
/* Unless the program is about to exit, do ... */
|
|
mpz_clear (integ);
|
|
@}
|
|
@end example
|
|
|
|
As you can see, you can store new values any number of times, once an
|
|
object is initialized.
|
|
|
|
@deftypefun void mpz_init (mpz_t @var{integer})
|
|
Initialize @var{integer}, and set its value to 0.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_init2 (mpz_t @var{integer}, unsigned long @var{n})
|
|
Initialize @var{integer}, with space for @var{n} bits, and set its value to 0.
|
|
|
|
@var{n} is only the initial space, @var{integer} will grow automatically in
|
|
the normal way, if necessary, for subsequent values stored. @code{mpz_init2}
|
|
makes it possible to avoid such reallocations if a maximum size is known in
|
|
advance.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_clear (mpz_t @var{integer})
|
|
Free the space occupied by @var{integer}. Call this function for all
|
|
@code{mpz_t} variables when you are done with them.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_realloc2 (mpz_t @var{integer}, unsigned long @var{n})
|
|
Change the space allocated for @var{integer} to @var{n} bits. The value in
|
|
@var{integer} is preserved if it fits, or is set to 0 if not.
|
|
|
|
This function can be used to increase the space for a variable in order to
|
|
avoid repeated automatic reallocations, or to decrease it to give memory back
|
|
to the heap.
|
|
@end deftypefun
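
As an illustration of the two sizing functions together, the following sketch
reserves space up front and later trims it. The figure of 4096 bits is an
arbitrary assumption for the example.

@example
@{
  mpz_t x;
  mpz_init2 (x, 4096);           /* room for values up to 4096 bits */
  @dots{}
  /* finished growing, give back any unused space */
  mpz_realloc2 (x, mpz_sizeinbase (x, 2));
  @dots{}
  mpz_clear (x);
@}
@end example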
|
|
|
|
|
|
@node Assigning Integers, Simultaneous Integer Init & Assign, Initializing Integers, Integer Functions
|
|
@comment node-name, next, previous, up
|
|
@section Assignment Functions
|
|
@cindex Integer assignment functions
|
|
@cindex Assignment functions
|
|
|
|
These functions assign new values to already initialized integers
|
|
(@pxref{Initializing Integers}).
|
|
|
|
@deftypefun void mpz_set (mpz_t @var{rop}, mpz_t @var{op})
|
|
@deftypefunx void mpz_set_ui (mpz_t @var{rop}, unsigned long int @var{op})
|
|
@deftypefunx void mpz_set_si (mpz_t @var{rop}, signed long int @var{op})
|
|
@deftypefunx void mpz_set_d (mpz_t @var{rop}, double @var{op})
|
|
@deftypefunx void mpz_set_q (mpz_t @var{rop}, mpq_t @var{op})
|
|
@deftypefunx void mpz_set_f (mpz_t @var{rop}, mpf_t @var{op})
|
|
Set the value of @var{rop} from @var{op}.
|
|
|
|
@code{mpz_set_d}, @code{mpz_set_q} and @code{mpz_set_f} truncate @var{op} to
|
|
make it an integer.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpz_set_str (mpz_t @var{rop}, char *@var{str}, int @var{base})
|
|
Set the value of @var{rop} from @var{str}, a null-terminated C string in base
|
|
@var{base}. White space is allowed in the string, and is simply ignored.
|
|
|
|
The @var{base} may vary from 2 to 62, or if @var{base} is 0, then the leading
|
|
characters are used: @code{0x} and @code{0X} for hexadecimal, @code{0b} and
|
|
@code{0B} for binary, @code{0} for octal, or decimal otherwise.
|
|
|
|
For bases up to 36, case is ignored; upper-case and lower-case letters have
|
|
the same value. For bases 37 to 62, upper-case letters represent the usual
|
|
10..35 while lower-case letters represent 36..61.
|
|
|
|
This function returns 0 if the entire string is a valid number in base
|
|
@var{base}. Otherwise it returns @minus{}1.
|
|
@c
|
|
@c It turns out that it is not entirely true that this function ignores
|
|
@c white-space. It does ignore it between digits, but not after a minus sign
|
|
@c or within or after ``0x''. Some thought was given to disallowing all
|
|
@c whitespace, but that would be an incompatible change, whitespace has been
|
|
@c documented as ignored ever since GMP 1.
|
|
@c
|
|
@end deftypefun
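
Since the return value is the only indication of a bad string, it's worth
checking. A minimal sketch, with an arbitrary input string and assuming
@code{<stdio.h>} has been included for @code{fprintf},

@example
@{
  mpz_t n;
  mpz_init (n);
  /* base 0, so the "0x" prefix selects hexadecimal */
  if (mpz_set_str (n, "0x1fff", 0) != 0)
    fprintf (stderr, "not a valid number\n");
  @dots{}
  mpz_clear (n);
@}
@end example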
|
|
|
|
@deftypefun void mpz_swap (mpz_t @var{rop1}, mpz_t @var{rop2})
|
|
Swap the values @var{rop1} and @var{rop2} efficiently.
|
|
@end deftypefun
|
|
|
|
|
|
@node Simultaneous Integer Init & Assign, Converting Integers, Assigning Integers, Integer Functions
|
|
@comment node-name, next, previous, up
|
|
@section Combined Initialization and Assignment Functions
|
|
@cindex Integer assignment functions
|
|
@cindex Assignment functions
|
|
@cindex Integer initialization functions
|
|
@cindex Initialization functions
|
|
|
|
For convenience, MPIR provides a parallel series of initialize-and-set functions
|
|
which initialize the output and then store the value there. These functions'
|
|
names have the form @code{mpz_init_set@dots{}}
|
|
|
|
Here is an example of using one:
|
|
|
|
@example
|
|
@{
|
|
mpz_t pie;
|
|
mpz_init_set_str (pie, "3141592653589793238462643383279502884", 10);
|
|
@dots{}
|
|
mpz_sub (pie, @dots{});
|
|
@dots{}
|
|
mpz_clear (pie);
|
|
@}
|
|
@end example
|
|
|
|
@noindent
|
|
Once the integer has been initialized by any of the @code{mpz_init_set@dots{}}
|
|
functions, it can be used as the source or destination operand for the ordinary
|
|
integer functions. Don't use an initialize-and-set function on a variable
|
|
already initialized!
|
|
|
|
@deftypefun void mpz_init_set (mpz_t @var{rop}, mpz_t @var{op})
|
|
@deftypefunx void mpz_init_set_ui (mpz_t @var{rop}, unsigned long int @var{op})
|
|
@deftypefunx void mpz_init_set_si (mpz_t @var{rop}, signed long int @var{op})
|
|
@deftypefunx void mpz_init_set_d (mpz_t @var{rop}, double @var{op})
|
|
Initialize @var{rop} with limb space and set the initial numeric value from
|
|
@var{op}.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpz_init_set_str (mpz_t @var{rop}, char *@var{str}, int @var{base})
|
|
Initialize @var{rop} and set its value like @code{mpz_set_str} (see its
|
|
documentation above for details).
|
|
|
|
If the string is a correct base @var{base} number, the function returns 0;
|
|
if an error occurs it returns @minus{}1. @var{rop} is initialized even if
|
|
an error occurs. (I.e., you have to call @code{mpz_clear} for it.)
|
|
@end deftypefun
|
|
|
|
|
|
@node Converting Integers, Integer Arithmetic, Simultaneous Integer Init & Assign, Integer Functions
|
|
@comment node-name, next, previous, up
|
|
@section Conversion Functions
|
|
@cindex Integer conversion functions
|
|
@cindex Conversion functions
|
|
|
|
This section describes functions for converting MPIR integers to standard C
|
|
types. Functions for converting @emph{to} MPIR integers are described in
|
|
@ref{Assigning Integers} and @ref{I/O of Integers}.
|
|
|
|
@deftypefun {unsigned long int} mpz_get_ui (mpz_t @var{op})
|
|
Return the value of @var{op} as an @code{unsigned long}.
|
|
|
|
If @var{op} is too big to fit an @code{unsigned long} then just the least
|
|
significant bits that do fit are returned. The sign of @var{op} is ignored,
|
|
only the absolute value is used.
|
|
@end deftypefun
|
|
|
|
@deftypefun {signed long int} mpz_get_si (mpz_t @var{op})
|
|
If @var{op} fits into a @code{signed long int} return the value of @var{op}.
|
|
Otherwise return the least significant part of @var{op}, with the same sign
|
|
as @var{op}.
|
|
|
|
If @var{op} is too big to fit in a @code{signed long int}, the returned
|
|
result is probably not very useful. To find out if the value will fit, use
|
|
the function @code{mpz_fits_slong_p}.
|
|
@end deftypefun
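
A range check before conversion might be sketched as follows, where @code{z}
is assumed to be an initialized @code{mpz_t}, @code{v} a @code{long}, and the
error handling is just a placeholder.

@example
if (mpz_fits_slong_p (z))
  v = mpz_get_si (z);            /* exact */
else
  fprintf (stderr, "value out of range for long\n");
@end example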
|
|
|
|
@deftypefun double mpz_get_d (mpz_t @var{op})
|
|
Convert @var{op} to a @code{double}, truncating if necessary (ie.@: rounding
|
|
towards zero).
|
|
|
|
If the exponent from the conversion is too big, the result is system
|
|
dependent. An infinity is returned where available. A hardware overflow trap
|
|
may or may not occur.
|
|
@end deftypefun
|
|
|
|
@deftypefun double mpz_get_d_2exp (signed long int *@var{exp}, mpz_t @var{op})
|
|
Convert @var{op} to a @code{double}, truncating if necessary (ie.@: rounding
|
|
towards zero), and returning the exponent separately.
|
|
|
|
The return value is in the range @math{0.5@le{}@GMPabs{@var{d}}<1} and the
|
|
exponent is stored to @code{*@var{exp}}. @m{@var{d} * 2^{exp}, @var{d} *
|
|
2^@var{exp}} is the (truncated) @var{op} value. If @var{op} is zero, the
|
|
return is @math{0.0} and 0 is stored to @code{*@var{exp}}.
|
|
|
|
@cindex @code{frexp}
|
|
This is similar to the standard C @code{frexp} function (@pxref{Normalization
|
|
Functions,,, libc, The GNU C Library Reference Manual}).
|
|
@end deftypefun
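
For instance, in the following sketch the value @math{2^100} comes back as
the pair 0.5 and 101, since @math{0.5 * 2^101 = 2^100}. The variable names
are illustrative only.

@example
@{
  mpz_t z;
  signed long int e;
  double d;

  mpz_init (z);
  mpz_ui_pow_ui (z, 2, 100);     /* z = 2^100 */
  d = mpz_get_d_2exp (&e, z);    /* d = 0.5, e = 101 */
  mpz_clear (z);
@}
@end example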
|
|
|
|
@deftypefun {char *} mpz_get_str (char *@var{str}, int @var{base}, mpz_t @var{op})
|
|
Convert @var{op} to a string of digits in base @var{base}. The base may vary
|
|
from 2 to 36.
|
|
|
|
If @var{str} is @code{NULL}, the result string is allocated using the current
|
|
allocation function (@pxref{Custom Allocation}). The block will be
|
|
@code{strlen(str)+1} bytes, that being exactly enough for the string and
|
|
null-terminator.
|
|
|
|
If @var{str} is not @code{NULL}, it should point to a block of storage large
|
|
enough for the result, that being @code{mpz_sizeinbase (@var{op}, @var{base})
|
|
+ 2}. The two extra bytes are for a possible minus sign, and the
|
|
null-terminator.
|
|
|
|
A pointer to the result string is returned, being either the allocated block,
|
|
or the given @var{str}.
|
|
@end deftypefun
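
A sketch of the caller-supplied buffer case, using the size rule given above.
It assumes @code{z} is an initialized @code{mpz_t} and that
@code{<stdio.h>} and @code{<stdlib.h>} have been included; plain
@code{malloc} is used only for illustration.

@example
@{
  /* digits, plus a possible minus sign, plus the null-terminator */
  char *buf = malloc (mpz_sizeinbase (z, 10) + 2);
  if (buf != NULL)
    @{
      mpz_get_str (buf, 10, z);
      puts (buf);
      free (buf);
    @}
@}
@end example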
|
|
|
|
|
|
@need 2000
|
|
@node Integer Arithmetic, Integer Division, Converting Integers, Integer Functions
|
|
@comment node-name, next, previous, up
|
|
@section Arithmetic Functions
|
|
@cindex Integer arithmetic functions
|
|
@cindex Arithmetic functions
|
|
|
|
@deftypefun void mpz_add (mpz_t @var{rop}, mpz_t @var{op1}, mpz_t @var{op2})
|
|
@deftypefunx void mpz_add_ui (mpz_t @var{rop}, mpz_t @var{op1}, unsigned long int @var{op2})
|
|
Set @var{rop} to @math{@var{op1} + @var{op2}}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_sub (mpz_t @var{rop}, mpz_t @var{op1}, mpz_t @var{op2})
|
|
@deftypefunx void mpz_sub_ui (mpz_t @var{rop}, mpz_t @var{op1}, unsigned long int @var{op2})
|
|
@deftypefunx void mpz_ui_sub (mpz_t @var{rop}, unsigned long int @var{op1}, mpz_t @var{op2})
|
|
Set @var{rop} to @var{op1} @minus{} @var{op2}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_mul (mpz_t @var{rop}, mpz_t @var{op1}, mpz_t @var{op2})
|
|
@deftypefunx void mpz_mul_si (mpz_t @var{rop}, mpz_t @var{op1}, long int @var{op2})
|
|
@deftypefunx void mpz_mul_ui (mpz_t @var{rop}, mpz_t @var{op1}, unsigned long int @var{op2})
|
|
Set @var{rop} to @math{@var{op1} @GMPtimes{} @var{op2}}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_addmul (mpz_t @var{rop}, mpz_t @var{op1}, mpz_t @var{op2})
|
|
@deftypefunx void mpz_addmul_ui (mpz_t @var{rop}, mpz_t @var{op1}, unsigned long int @var{op2})
|
|
Set @var{rop} to @math{@var{rop} + @var{op1} @GMPtimes{} @var{op2}}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_submul (mpz_t @var{rop}, mpz_t @var{op1}, mpz_t @var{op2})
|
|
@deftypefunx void mpz_submul_ui (mpz_t @var{rop}, mpz_t @var{op1}, unsigned long int @var{op2})
|
|
Set @var{rop} to @math{@var{rop} - @var{op1} @GMPtimes{} @var{op2}}.
|
|
@end deftypefun
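
As an illustration of the multiply-and-accumulate functions, a sum of
products can be formed without a temporary. In this sketch @code{acc} is an
initialized @code{mpz_t}, and @code{a} and @code{b} are hypothetical arrays
of @code{n} initialized @code{mpz_t} values.

@example
/* acc = a[0]*b[0] + ... + a[n-1]*b[n-1] */
mpz_set_ui (acc, 0);
for (i = 0; i < n; i++)
  mpz_addmul (acc, a[i], b[i]);
@end example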
|
|
|
|
@deftypefun void mpz_mul_2exp (mpz_t @var{rop}, mpz_t @var{op1}, unsigned long int @var{op2})
|
|
@cindex Bit shift left
|
|
Set @var{rop} to @m{@var{op1} \times 2^{op2}, @var{op1} times 2 raised to
|
|
@var{op2}}. This operation can also be defined as a left shift by @var{op2}
|
|
bits.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_neg (mpz_t @var{rop}, mpz_t @var{op})
|
|
Set @var{rop} to @minus{}@var{op}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_abs (mpz_t @var{rop}, mpz_t @var{op})
|
|
Set @var{rop} to the absolute value of @var{op}.
|
|
@end deftypefun
|
|
|
|
|
|
@need 2000
|
|
@node Integer Division, Integer Exponentiation, Integer Arithmetic, Integer Functions
|
|
@section Division Functions
|
|
@cindex Integer division functions
|
|
@cindex Division functions
|
|
|
|
Division is undefined if the divisor is zero. Passing a zero divisor to the
|
|
division or modulo functions (including the modular powering functions
|
|
@code{mpz_powm} and @code{mpz_powm_ui}) will cause an intentional division by
|
|
zero. This lets a program handle arithmetic exceptions in these functions the
|
|
same way as for normal C @code{int} arithmetic.
|
|
|
|
@c Separate deftypefun groups for cdiv, fdiv and tdiv produce a blank line
|
|
@c between each, and seem to let tex do a better job of page breaks than an
|
|
@c @sp 1 in the middle of one big set.
|
|
|
|
@deftypefun void mpz_cdiv_q (mpz_t @var{q}, mpz_t @var{n}, mpz_t @var{d})
|
|
@deftypefunx void mpz_cdiv_r (mpz_t @var{r}, mpz_t @var{n}, mpz_t @var{d})
|
|
@deftypefunx void mpz_cdiv_qr (mpz_t @var{q}, mpz_t @var{r}, mpz_t @var{n}, mpz_t @var{d})
|
|
@maybepagebreak
|
|
@deftypefunx {unsigned long int} mpz_cdiv_q_ui (mpz_t @var{q}, mpz_t @var{n}, @w{unsigned long int @var{d}})
|
|
@deftypefunx {unsigned long int} mpz_cdiv_r_ui (mpz_t @var{r}, mpz_t @var{n}, @w{unsigned long int @var{d}})
|
|
@deftypefunx {unsigned long int} mpz_cdiv_qr_ui (mpz_t @var{q}, mpz_t @var{r}, @w{mpz_t @var{n}}, @w{unsigned long int @var{d}})
|
|
@deftypefunx {unsigned long int} mpz_cdiv_ui (mpz_t @var{n}, @w{unsigned long int @var{d}})
|
|
@maybepagebreak
|
|
@deftypefunx void mpz_cdiv_q_2exp (mpz_t @var{q}, mpz_t @var{n}, @w{unsigned long int @var{b}})
|
|
@deftypefunx void mpz_cdiv_r_2exp (mpz_t @var{r}, mpz_t @var{n}, @w{unsigned long int @var{b}})
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_fdiv_q (mpz_t @var{q}, mpz_t @var{n}, mpz_t @var{d})
|
|
@deftypefunx void mpz_fdiv_r (mpz_t @var{r}, mpz_t @var{n}, mpz_t @var{d})
|
|
@deftypefunx void mpz_fdiv_qr (mpz_t @var{q}, mpz_t @var{r}, mpz_t @var{n}, mpz_t @var{d})
|
|
@maybepagebreak
|
|
@deftypefunx {unsigned long int} mpz_fdiv_q_ui (mpz_t @var{q}, mpz_t @var{n}, @w{unsigned long int @var{d}})
|
|
@deftypefunx {unsigned long int} mpz_fdiv_r_ui (mpz_t @var{r}, mpz_t @var{n}, @w{unsigned long int @var{d}})
|
|
@deftypefunx {unsigned long int} mpz_fdiv_qr_ui (mpz_t @var{q}, mpz_t @var{r}, @w{mpz_t @var{n}}, @w{unsigned long int @var{d}})
|
|
@deftypefunx {unsigned long int} mpz_fdiv_ui (mpz_t @var{n}, @w{unsigned long int @var{d}})
|
|
@maybepagebreak
|
|
@deftypefunx void mpz_fdiv_q_2exp (mpz_t @var{q}, mpz_t @var{n}, @w{unsigned long int @var{b}})
|
|
@deftypefunx void mpz_fdiv_r_2exp (mpz_t @var{r}, mpz_t @var{n}, @w{unsigned long int @var{b}})
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_tdiv_q (mpz_t @var{q}, mpz_t @var{n}, mpz_t @var{d})
|
|
@deftypefunx void mpz_tdiv_r (mpz_t @var{r}, mpz_t @var{n}, mpz_t @var{d})
|
|
@deftypefunx void mpz_tdiv_qr (mpz_t @var{q}, mpz_t @var{r}, mpz_t @var{n}, mpz_t @var{d})
|
|
@maybepagebreak
|
|
@deftypefunx {unsigned long int} mpz_tdiv_q_ui (mpz_t @var{q}, mpz_t @var{n}, @w{unsigned long int @var{d}})
|
|
@deftypefunx {unsigned long int} mpz_tdiv_r_ui (mpz_t @var{r}, mpz_t @var{n}, @w{unsigned long int @var{d}})
|
|
@deftypefunx {unsigned long int} mpz_tdiv_qr_ui (mpz_t @var{q}, mpz_t @var{r}, @w{mpz_t @var{n}}, @w{unsigned long int @var{d}})
|
|
@deftypefunx {unsigned long int} mpz_tdiv_ui (mpz_t @var{n}, @w{unsigned long int @var{d}})
|
|
@maybepagebreak
|
|
@deftypefunx void mpz_tdiv_q_2exp (mpz_t @var{q}, mpz_t @var{n}, @w{unsigned long int @var{b}})
|
|
@deftypefunx void mpz_tdiv_r_2exp (mpz_t @var{r}, mpz_t @var{n}, @w{unsigned long int @var{b}})
|
|
@cindex Bit shift right
|
|
|
|
@sp 1
|
|
Divide @var{n} by @var{d}, forming a quotient @var{q} and/or remainder
|
|
@var{r}. For the @code{2exp} functions, @m{@var{d}=2^b, @var{d}=2^@var{b}}.
|
|
The rounding is in three styles, each suiting different applications.
|
|
|
|
@itemize @bullet
|
|
@item
|
|
@code{cdiv} rounds @var{q} up towards @m{+\infty, +infinity}, and @var{r} will
|
|
have the opposite sign to @var{d}. The @code{c} stands for ``ceil''.
|
|
|
|
@item
|
|
@code{fdiv} rounds @var{q} down towards @m{-\infty, @minus{}infinity}, and
|
|
@var{r} will have the same sign as @var{d}. The @code{f} stands for
|
|
``floor''.
|
|
|
|
@item
|
|
@code{tdiv} rounds @var{q} towards zero, and @var{r} will have the same sign
|
|
as @var{n}. The @code{t} stands for ``truncate''.
|
|
@end itemize
|
|
|
|
In all cases @var{q} and @var{r} will satisfy
|
|
@m{@var{n}=@var{q}@var{d}+@var{r}, @var{n}=@var{q}*@var{d}+@var{r}}, and
|
|
@var{r} will satisfy @math{0@le{}@GMPabs{@var{r}}<@GMPabs{@var{d}}}.
|
|
|
|
The @code{q} functions calculate only the quotient, the @code{r} functions
|
|
only the remainder, and the @code{qr} functions calculate both. Note that for
|
|
@code{qr} the same variable cannot be passed for both @var{q} and @var{r}, or
|
|
results will be unpredictable.
|
|
|
|
For the @code{ui} variants the return value is the remainder, and in fact
|
|
returning the remainder is all the @code{div_ui} functions do. For
|
|
@code{tdiv} and @code{cdiv} the remainder can be negative, so for those the
|
|
return value is the absolute value of the remainder.
|
|
|
|
For the @code{2exp} variants the divisor is @m{2^b,2^@var{b}}. These
|
|
functions are implemented as right shifts and bit masks, but of course they
|
|
round the same as the other functions.
|
|
|
|
For positive @var{n} both @code{mpz_fdiv_q_2exp} and @code{mpz_tdiv_q_2exp}
|
|
are simple bitwise right shifts. For negative @var{n}, @code{mpz_fdiv_q_2exp}
|
|
is effectively an arithmetic right shift treating @var{n} as twos complement
|
|
the same as the bitwise logical functions do, whereas @code{mpz_tdiv_q_2exp}
|
|
effectively treats @var{n} as sign and magnitude.
|
|
@end deftypefun
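
The three rounding styles can be illustrated on ordinary C integers. The following is only a sketch: it does not call MPIR, the names @code{divres}, @code{model_tdiv}, @code{model_fdiv} and @code{model_cdiv} are hypothetical, and it assumes C99 or later, where @code{/} and @code{%} truncate towards zero. The divisor must be non-zero.

```c
/* Plain C model of the three rounding styles on long operands.
   Illustration only: it does not call MPIR, and d must be non-zero. */
typedef struct { long q; long r; } divres;

/* tdiv: quotient truncated towards zero, r has the sign of n.
   This is what C's own / and % operators do. */
divres model_tdiv(long n, long d)
{
    divres x;
    x.q = n / d;
    x.r = n % d;
    return x;
}

/* fdiv: quotient rounded towards -infinity, r has the sign of d. */
divres model_fdiv(long n, long d)
{
    divres x = model_tdiv(n, d);
    if (x.r != 0 && ((x.r < 0) != (d < 0))) {
        x.q -= 1;
        x.r += d;
    }
    return x;
}

/* cdiv: quotient rounded towards +infinity, r has the opposite sign to d. */
divres model_cdiv(long n, long d)
{
    divres x = model_tdiv(n, d);
    if (x.r != 0 && ((x.r < 0) == (d < 0))) {
        x.q += 1;
        x.r -= d;
    }
    return x;
}
```

In every case the pair satisfies n = q*d + r; only the choice between the two candidate quotients differs when the division is inexact.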
|
|
|
|
@deftypefun void mpz_mod (mpz_t @var{r}, mpz_t @var{n}, mpz_t @var{d})
|
|
@deftypefunx {unsigned long int} mpz_mod_ui (mpz_t @var{r}, mpz_t @var{n}, @w{unsigned long int @var{d}})
|
|
Set @var{r} to @var{n} @code{mod} @var{d}. The sign of the divisor is
|
|
ignored; the result is always non-negative.
|
|
|
|
@code{mpz_mod_ui} is identical to @code{mpz_fdiv_r_ui} above, returning the
|
|
remainder as well as setting @var{r}. See @code{mpz_fdiv_ui} above if only
|
|
the return value is wanted.
|
|
@end deftypefun
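
The rule that the sign of the divisor is ignored can be modelled on ordinary C integers. This is a sketch only; @code{model_mod} is a hypothetical name, it does not call MPIR, and the divisor must be non-zero.

```c
/* Plain C model of "n mod d" as described for mpz_mod: the sign of d is
   ignored and the result lies in the range 0 to |d|-1.  d must be non-zero. */
long model_mod(long n, long d)
{
    long m = d < 0 ? -d : d;    /* ignore the sign of the divisor */
    long r = n % m;             /* C remainder takes the sign of n */
    return r < 0 ? r + m : r;   /* move a negative remainder into range */
}
```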
|
|
|
|
@deftypefun void mpz_divexact (mpz_t @var{q}, mpz_t @var{n}, mpz_t @var{d})
|
|
@deftypefunx void mpz_divexact_ui (mpz_t @var{q}, mpz_t @var{n}, unsigned long @var{d})
|
|
@cindex Exact division functions
|
|
Set @var{q} to @var{n}/@var{d}. These functions produce correct results only
|
|
when it is known in advance that @var{d} divides @var{n}.
|
|
|
|
These routines are much faster than the other division functions, and are the
|
|
best choice when exact division is known to occur, for example reducing a
|
|
rational to lowest terms.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpz_divisible_p (mpz_t @var{n}, mpz_t @var{d})
|
|
@deftypefunx int mpz_divisible_ui_p (mpz_t @var{n}, unsigned long int @var{d})
|
|
@deftypefunx int mpz_divisible_2exp_p (mpz_t @var{n}, unsigned long int @var{b})
|
|
@cindex Divisibility functions
|
|
Return non-zero if @var{n} is exactly divisible by @var{d}, or in the case of
|
|
@code{mpz_divisible_2exp_p} by @m{2^b,2^@var{b}}.
|
|
|
|
@var{n} is divisible by @var{d} if there exists an integer @var{q} satisfying
|
|
@math{@var{n} = @var{q}@GMPmultiply{}@var{d}}. Unlike the other division
|
|
functions, @math{@var{d}=0} is accepted and following the rule it can be seen
|
|
that only 0 is considered divisible by 0.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpz_congruent_p (mpz_t @var{n}, mpz_t @var{c}, mpz_t @var{d})
|
|
@deftypefunx int mpz_congruent_ui_p (mpz_t @var{n}, unsigned long int @var{c}, unsigned long int @var{d})
|
|
@deftypefunx int mpz_congruent_2exp_p (mpz_t @var{n}, mpz_t @var{c}, unsigned long int @var{b})
|
|
@cindex Divisibility functions
|
|
@cindex Congruence functions
|
|
Return non-zero if @var{n} is congruent to @var{c} modulo @var{d}, or in the
|
|
case of @code{mpz_congruent_2exp_p} modulo @m{2^b,2^@var{b}}.
|
|
|
|
@var{n} is congruent to @var{c} mod @var{d} if there exists an integer @var{q}
|
|
satisfying @math{@var{n} = @var{c} + @var{q}@GMPmultiply{}@var{d}}. Unlike
|
|
the other division functions, @math{@var{d}=0} is accepted and following the
|
|
rule it can be seen that @var{n} and @var{c} are considered congruent mod 0
|
|
only when exactly equal.
|
|
@end deftypefun
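
The treatment of a zero modulus follows directly from the defining equations. The sketch below models the rule on ordinary C integers; the names @code{model_congruent_p} and @code{model_divisible_p} are hypothetical and no MPIR function is called.

```c
/* Plain C model of the congruence rule n = c + q*d, including d == 0. */
int model_congruent_p(long n, long c, long d)
{
    if (d == 0)
        return n == c;          /* q*0 is always 0, so n must equal c */
    return (n - c) % d == 0;    /* otherwise d must divide the difference */
}

/* Divisibility is the special case c == 0. */
int model_divisible_p(long n, long d)
{
    return model_congruent_p(n, 0, d);
}
```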
|
|
|
|
|
|
@need 2000
|
|
@node Integer Exponentiation, Integer Roots, Integer Division, Integer Functions
|
|
@section Exponentiation Functions
|
|
@cindex Integer exponentiation functions
|
|
@cindex Exponentiation functions
|
|
@cindex Powering functions
|
|
|
|
@deftypefun void mpz_powm (mpz_t @var{rop}, mpz_t @var{base}, mpz_t @var{exp}, mpz_t @var{mod})
|
|
@deftypefunx void mpz_powm_ui (mpz_t @var{rop}, mpz_t @var{base}, unsigned long int @var{exp}, mpz_t @var{mod})
|
|
Set @var{rop} to @m{base^{exp} \bmod mod, (@var{base} raised to @var{exp})
|
|
modulo @var{mod}}.
|
|
|
|
Negative @var{exp} is supported if an inverse @math{@var{base}^@W{-1} @bmod
|
|
@var{mod}} exists (see @code{mpz_invert} in @ref{Number Theoretic Functions}).
|
|
If an inverse doesn't exist then a divide by zero is raised.
|
|
@end deftypefun
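
For readers unfamiliar with modular powering, the following sketch shows the textbook binary (square-and-multiply) method on small operands. It is an illustration under stated assumptions, not a description of how MPIR computes @code{mpz_powm}: the name @code{model_powm} is hypothetical, the modulus must be non-zero and below 2^32 so that 64-bit products cannot overflow, and negative exponents are not handled.

```c
/* Binary (square-and-multiply) modular exponentiation on small operands.
   Illustration only: mod must be non-zero and below 2^32.  This is not
   MPIR's implementation. */
unsigned long long model_powm(unsigned long long base,
                              unsigned long long exp,
                              unsigned long long mod)
{
    unsigned long long result = 1 % mod;    /* gives 0 when mod == 1 */
    base %= mod;
    while (exp > 0) {
        if (exp & 1)                        /* this bit of exp is set */
            result = (result * base) % mod;
        base = (base * base) % mod;         /* square for the next bit */
        exp >>= 1;
    }
    return result;
}
```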
|
|
|
|
@deftypefun void mpz_pow_ui (mpz_t @var{rop}, mpz_t @var{base}, unsigned long int @var{exp})
|
|
@deftypefunx void mpz_ui_pow_ui (mpz_t @var{rop}, unsigned long int @var{base}, unsigned long int @var{exp})
|
|
Set @var{rop} to @m{base^{exp}, @var{base} raised to @var{exp}}. The case
|
|
@math{0^0} yields 1.
|
|
@end deftypefun
|
|
|
|
|
|
@need 2000
|
|
@node Integer Roots, Number Theoretic Functions, Integer Exponentiation, Integer Functions
|
|
@section Root Extraction Functions
|
|
@cindex Integer root functions
|
|
@cindex Root extraction functions
|
|
|
|
@deftypefun int mpz_root (mpz_t @var{rop}, mpz_t @var{op}, unsigned long int @var{n})
|
|
Set @var{rop} to @m{\lfloor\root n \of {op}\rfloor@C{},} the truncated integer
|
|
part of the @var{n}th root of @var{op}. Return non-zero if the computation
|
|
was exact, i.e., if @var{op} is @var{rop} to the @var{n}th power.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_rootrem (mpz_t @var{root}, mpz_t @var{rem}, mpz_t @var{u}, unsigned long int @var{n})
|
|
Set @var{root} to @m{\lfloor\root n \of {u}\rfloor@C{},} the truncated
|
|
integer part of the @var{n}th root of @var{u}. Set @var{rem} to the
|
|
remainder, @m{(@var{u} - @var{root}^n),
|
|
@var{u}@minus{}@var{root}**@var{n}}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_sqrt (mpz_t @var{rop}, mpz_t @var{op})
|
|
Set @var{rop} to @m{\lfloor\sqrt{@var{op}}\rfloor@C{},} the truncated
|
|
integer part of the square root of @var{op}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_sqrtrem (mpz_t @var{rop1}, mpz_t @var{rop2}, mpz_t @var{op})
|
|
Set @var{rop1} to @m{\lfloor\sqrt{@var{op}}\rfloor, the truncated integer part
|
|
of the square root of @var{op}}, like @code{mpz_sqrt}. Set @var{rop2} to the
|
|
remainder @m{(@var{op} - @var{rop1}^2),
|
|
@var{op}@minus{}@var{rop1}*@var{rop1}}, which will be zero if @var{op} is a
|
|
perfect square.
|
|
|
|
If @var{rop1} and @var{rop2} are the same variable, the results are
|
|
undefined.
|
|
@end deftypefun
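
The relationship between the root and the remainder can be checked on ordinary C integers. This sketch uses a deliberately naive linear search so that the contract is easy to see; @code{model_sqrtrem} is a hypothetical name and the code does not reflect how MPIR computes square roots.

```c
/* Plain C model of the mpz_sqrtrem contract on unsigned long: *root gets
   floor(sqrt(op)) and *rem gets op - root*root.  Illustration only. */
void model_sqrtrem(unsigned long *root, unsigned long *rem, unsigned long op)
{
    unsigned long r = 0;
    /* r+1 <= op/(r+1) holds exactly when (r+1)*(r+1) <= op, and avoids
       overflow in the multiplication. */
    while (r + 1 <= op / (r + 1))
        r++;
    *root = r;
    *rem = op - r * r;
}
```

A zero remainder identifies a perfect square, which is the same condition that @code{mpz_perfect_square_p} reports.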
|
|
|
|
@deftypefun int mpz_perfect_power_p (mpz_t @var{op})
|
|
@cindex Perfect power functions
|
|
@cindex Root testing functions
|
|
Return non-zero if @var{op} is a perfect power, i.e., if there exist integers
|
|
@m{a,@var{a}} and @m{b,@var{b}}, with @m{b>1, @var{b}>1}, such that
|
|
@m{@var{op}=a^b, @var{op} equals @var{a} raised to the power @var{b}}.
|
|
|
|
Under this definition both 0 and 1 are considered to be perfect powers.
|
|
Negative values of @var{op} are accepted, but of course can only be odd
|
|
perfect powers.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpz_perfect_square_p (mpz_t @var{op})
|
|
@cindex Perfect square functions
|
|
@cindex Root testing functions
|
|
Return non-zero if @var{op} is a perfect square, i.e., if the square root of
|
|
@var{op} is an integer. Under this definition both 0 and 1 are considered to
|
|
be perfect squares.
|
|
@end deftypefun
|
|
|
|
|
|
@need 2000
|
|
@node Number Theoretic Functions, Integer Comparisons, Integer Roots, Integer Functions
|
|
@section Number Theoretic Functions
|
|
@cindex Number theoretic functions
|
|
|
|
@deftypefun int mpz_probab_prime_p (mpz_t @var{n}, int @var{reps})
|
|
@cindex Prime testing functions
|
|
@cindex Probable prime testing functions
|
|
Determine whether @var{n} is prime. Return 2 if @var{n} is definitely prime,
|
|
return 1 if @var{n} is probably prime (without being certain), or return 0 if
|
|
@var{n} is definitely composite.
|
|
|
|
This function does some trial divisions, then some Miller-Rabin probabilistic
|
|
primality tests. @var{reps} controls how many such tests are done; 5 to 10 is
a reasonable number, and more will reduce the chance of a composite being
returned as ``probably prime''.
|
|
|
|
Miller-Rabin and similar tests can be more properly called compositeness
|
|
tests. Numbers which fail are known to be composite but those which pass
|
|
might be prime or might be composite. Only a few composites pass, hence those
|
|
which pass are considered probably prime.
|
|
@end deftypefun
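
To make the idea of a compositeness test concrete, the sketch below performs a single textbook Miller-Rabin round on small operands. It is an illustration only: the names @code{model_mulmod} and @code{model_miller_rabin_round} are hypothetical, @var{n} is assumed to be odd, greater than 2 and below 2^32, and nothing here describes MPIR's internal choice of bases or its trial division step.

```c
/* (x * y) mod m for operands below 2^32, so the product fits in 64 bits. */
unsigned long long model_mulmod(unsigned long long x, unsigned long long y,
                                unsigned long long m)
{
    return (x * y) % m;
}

/* One Miller-Rabin round for odd n > 2 against base a, 1 < a < n - 1.
   Returns 0 if n is certainly composite, 1 if n passes this round. */
int model_miller_rabin_round(unsigned long long n, unsigned long long a)
{
    unsigned long long d = n - 1, x = 1, b = a % n, e;
    unsigned s = 0, i;

    while ((d & 1) == 0) {          /* write n - 1 = d * 2^s with d odd */
        d >>= 1;
        s++;
    }
    for (e = d; e > 0; e >>= 1) {   /* x = a^d mod n */
        if (e & 1)
            x = model_mulmod(x, b, n);
        b = model_mulmod(b, b, n);
    }
    if (x == 1 || x == n - 1)
        return 1;
    for (i = 1; i < s; i++) {       /* square up to s - 1 more times */
        x = model_mulmod(x, x, n);
        if (x == n - 1)
            return 1;
    }
    return 0;                       /* a proves that n is composite */
}
```

The composite 2047 = 23*89 passes the round for base 2 but fails for base 3, which is why several rounds with different bases are used.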
|
|
|
|
@deftypefun void mpz_nextprime (mpz_t @var{rop}, mpz_t @var{op})
|
|
@cindex Next prime function
|
|
Set @var{rop} to the next prime greater than @var{op}.
|
|
|
|
This function uses a probabilistic algorithm to identify primes. For
|
|
practical purposes it's adequate; the chance of a composite passing will be
|
|
extremely small.
|
|
@end deftypefun
|
|
|
|
@c mpz_prime_p not implemented as of gmp 3.0.
|
|
|
|
@c @deftypefun int mpz_prime_p (mpz_t @var{n})
|
|
@c Return non-zero if @var{n} is prime and zero if @var{n} is a non-prime.
|
|
@c This function is far slower than @code{mpz_probab_prime_p}, but then it
|
|
@c never returns non-zero for composite numbers.
|
|
|
|
@c (For practical purposes, using @code{mpz_probab_prime_p} is adequate.
|
|
@c The likelihood of a programming error or hardware malfunction is orders
|
|
@c of magnitudes greater than the likelihood for a composite to pass as a
|
|
@c prime, if the @var{reps} argument is in the suggested range.)
|
|
@c @end deftypefun
|
|
|
|
@deftypefun void mpz_gcd (mpz_t @var{rop}, mpz_t @var{op1}, mpz_t @var{op2})
|
|
@cindex Greatest common divisor functions
|
|
@cindex GCD functions
|
|
Set @var{rop} to the greatest common divisor of @var{op1} and @var{op2}.
|
|
The result is always positive even if one or both input operands
|
|
are negative.
|
|
@end deftypefun
|
|
|
|
@deftypefun {unsigned long int} mpz_gcd_ui (mpz_t @var{rop}, mpz_t @var{op1}, unsigned long int @var{op2})
|
|
Compute the greatest common divisor of @var{op1} and @var{op2}. If
|
|
@var{rop} is not @code{NULL}, store the result there.
|
|
|
|
If the result is small enough to fit in an @code{unsigned long int}, it is
|
|
returned. If the result does not fit, 0 is returned, and the result is equal
|
|
to the argument @var{op1}. Note that the result will always fit if @var{op2}
|
|
is non-zero.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_gcdext (mpz_t @var{g}, mpz_t @var{s}, mpz_t @var{t}, mpz_t @var{a}, mpz_t @var{b})
|
|
@cindex Extended GCD
|
|
@cindex GCD extended
|
|
Set @var{g} to the greatest common divisor of @var{a} and @var{b}, and in
|
|
addition set @var{s} and @var{t} to coefficients satisfying
|
|
@math{@var{a}@GMPmultiply{}@var{s} + @var{b}@GMPmultiply{}@var{t} = @var{g}}.
|
|
@var{g} is always positive, even if one or both of @var{a} and @var{b} are
|
|
negative.
|
|
|
|
If @var{t} is @code{NULL} then that value is not computed.
|
|
@end deftypefun
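
The relation between the coefficients and the greatest common divisor can be seen in the classical extended Euclidean algorithm, sketched here on ordinary C integers. The name @code{model_gcdext} is hypothetical, small operands are assumed so that nothing overflows, and the particular @var{s} and @var{t} this sketch produces need not be the ones MPIR returns, since the coefficients are not unique.

```c
/* Extended Euclidean algorithm on long operands, modelling the contract
   a*s + b*t = g with g >= 0.  Illustration only. */
long model_gcdext(long *s, long *t, long a, long b)
{
    long r0 = a, r1 = b;    /* current pair of remainders */
    long s0 = 1, s1 = 0;    /* invariant: r0 = a*s0 + b*t0 */
    long t0 = 0, t1 = 1;    /* invariant: r1 = a*s1 + b*t1 */

    while (r1 != 0) {
        long q = r0 / r1, tmp;
        tmp = r0 - q * r1; r0 = r1; r1 = tmp;
        tmp = s0 - q * s1; s0 = s1; s1 = tmp;
        tmp = t0 - q * t1; t0 = t1; t1 = tmp;
    }
    if (r0 < 0) {           /* make the gcd non-negative */
        r0 = -r0;
        s0 = -s0;
        t0 = -t0;
    }
    *s = s0;
    *t = t0;
    return r0;
}
```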
|
|
|
|
@deftypefun void mpz_lcm (mpz_t @var{rop}, mpz_t @var{op1}, mpz_t @var{op2})
|
|
@deftypefunx void mpz_lcm_ui (mpz_t @var{rop}, mpz_t @var{op1}, unsigned long @var{op2})
|
|
@cindex Least common multiple functions
|
|
@cindex LCM functions
|
|
Set @var{rop} to the least common multiple of @var{op1} and @var{op2}.
|
|
@var{rop} is never negative, irrespective of the signs of @var{op1} and
|
|
@var{op2}. @var{rop} will be zero if either @var{op1} or @var{op2} is zero.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpz_invert (mpz_t @var{rop}, mpz_t @var{op1}, mpz_t @var{op2})
|
|
@cindex Modular inverse functions
|
|
@cindex Inverse modulo functions
|
|
Compute the inverse of @var{op1} modulo @var{op2} and put the result in
|
|
@var{rop}. If the inverse exists, the return value is non-zero and @var{rop}
|
|
will satisfy @math{0 @le{} @var{rop} < @var{op2}}. If an inverse doesn't exist
|
|
the return value is zero and @var{rop} is undefined.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpz_jacobi (mpz_t @var{a}, mpz_t @var{b})
|
|
@cindex Jacobi symbol functions
|
|
Calculate the Jacobi symbol @m{\left(a \over b\right),
|
|
(@var{a}/@var{b})}. This is defined only for @var{b} odd.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpz_legendre (mpz_t @var{a}, mpz_t @var{p})
|
|
@cindex Legendre symbol functions
|
|
Calculate the Legendre symbol @m{\left(a \over p\right),
|
|
(@var{a}/@var{p})}. This is defined only for @var{p} an odd positive
|
|
prime, and for such @var{p} it's identical to the Jacobi symbol.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpz_kronecker (mpz_t @var{a}, mpz_t @var{b})
|
|
@deftypefunx int mpz_kronecker_si (mpz_t @var{a}, long @var{b})
|
|
@deftypefunx int mpz_kronecker_ui (mpz_t @var{a}, unsigned long @var{b})
|
|
@deftypefunx int mpz_si_kronecker (long @var{a}, mpz_t @var{b})
|
|
@deftypefunx int mpz_ui_kronecker (unsigned long @var{a}, mpz_t @var{b})
|
|
@cindex Kronecker symbol functions
|
|
Calculate the Jacobi symbol @m{\left(a \over b\right),
|
|
(@var{a}/@var{b})} with the Kronecker extension @m{\left(a \over
|
|
2\right) = \left(2 \over a\right), (a/2)=(2/a)} when @math{a} is odd, or
@m{\left(a \over 2\right) = 0, (a/2)=0} when @math{a} is even.
|
|
|
|
When @var{b} is odd the Jacobi symbol and Kronecker symbol are
|
|
identical, so @code{mpz_kronecker_ui} etc can be used for mixed
|
|
precision Jacobi symbols too.
|
|
|
|
For more information see Henri Cohen section 1.4.2 (@pxref{References}),
|
|
or any number theory textbook. See also the example program
|
|
@file{demos/qcn.c} which uses @code{mpz_kronecker_ui}.
|
|
@end deftypefun
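
The definitions above can be followed on small operands with the standard reciprocity-based algorithm. The sketch below is an illustration only: @code{model_jacobi} and @code{model_kronecker} are hypothetical names, @code{model_jacobi} assumes an odd positive @var{b}, @code{model_kronecker} assumes a positive @var{b}, and neither is claimed to be the method MPIR uses.

```c
/* Jacobi symbol (a/b) for odd b > 0, using the rule for (2/b) and
   quadratic reciprocity.  Illustration only. */
int model_jacobi(long a, long b)
{
    int result = 1;
    a %= b;
    if (a < 0)
        a += b;
    while (a != 0) {
        while (a % 2 == 0) {                /* pull out factors of two */
            a /= 2;
            if (b % 8 == 3 || b % 8 == 5)   /* (2/b) = -1 in these cases */
                result = -result;
        }
        { long tmp = a; a = b; b = tmp; }   /* swap by reciprocity */
        if (a % 4 == 3 && b % 4 == 3)
            result = -result;
        a %= b;
    }
    return b == 1 ? result : 0;             /* 0 when gcd(a,b) > 1 */
}

/* Kronecker extension for b > 0: each factor 2 of b contributes (a/2),
   which is 0 for even a and (2/a) for odd a. */
int model_kronecker(long a, long b)
{
    int result = 1;
    while (b % 2 == 0) {
        long m;
        if (a % 2 == 0)
            return 0;
        m = ((a % 8) + 8) % 8;              /* a mod 8 in the range 0..7 */
        if (m == 3 || m == 5)
            result = -result;
        b /= 2;
    }
    return result * model_jacobi(a, b);
}
```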
|
|
|
|
@deftypefun {unsigned long int} mpz_remove (mpz_t @var{rop}, mpz_t @var{op}, mpz_t @var{f})
|
|
@cindex Remove factor functions
|
|
@cindex Factor removal functions
|
|
Remove all occurrences of the factor @var{f} from @var{op} and store the
|
|
result in @var{rop}. The return value is how many such occurrences were
|
|
removed.
|
|
@end deftypefun
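
The meaning of the return value can be modelled with a simple loop on ordinary C integers. This is a sketch only; @code{model_remove} is a hypothetical name, and it assumes a non-zero @var{op} and a factor of at least 2 so that the loop terminates.

```c
/* Plain C model of mpz_remove on unsigned long: divide out f for as long
   as it divides op, and count the divisions.  Assumes op != 0 and f >= 2. */
unsigned long model_remove(unsigned long *rop, unsigned long op,
                           unsigned long f)
{
    unsigned long count = 0;
    while (op % f == 0) {
        op /= f;
        count++;
    }
    *rop = op;
    return count;
}
```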
|
|
|
|
@deftypefun void mpz_fac_ui (mpz_t @var{rop}, unsigned long int @var{op})
|
|
@cindex Factorial functions
|
|
Set @var{rop} to @var{op}!, the factorial of @var{op}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_bin_ui (mpz_t @var{rop}, mpz_t @var{n}, unsigned long int @var{k})
|
|
@deftypefunx void mpz_bin_uiui (mpz_t @var{rop}, unsigned long int @var{n}, @w{unsigned long int @var{k}})
|
|
@cindex Binomial coefficient functions
|
|
Compute the binomial coefficient @m{\left({n}\atop{k}\right), @var{n} over
|
|
@var{k}} and store the result in @var{rop}. Negative values of @var{n} are
|
|
supported by @code{mpz_bin_ui}, using the identity
|
|
@m{\left({-n}\atop{k}\right) = (-1)^k \left({n+k-1}\atop{k}\right),
|
|
bin(-n@C{}k) = (-1)^k * bin(n+k-1@C{}k)}, see Knuth volume 1 section 1.2.6
|
|
part G.
|
|
@end deftypefun
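
The identity for negative @var{n} can be tried out on small operands. The sketch below is an illustration only: @code{model_bin} is a hypothetical name, operands are assumed small enough not to overflow a @code{long}, and it is not a description of MPIR's algorithm.

```c
/* Binomial coefficient for signed n, using the identity
   bin(-n,k) = (-1)^k * bin(n+k-1,k) for negative arguments. */
long model_bin(long n, unsigned long k)
{
    long result = 1;
    unsigned long i;

    if (n < 0) {
        long pos = model_bin(-n + (long) k - 1, k);
        return (k & 1) ? -pos : pos;
    }
    if ((unsigned long) n < k)
        return 0;
    /* After step i the value is bin(n-k+i, i), so each division is exact. */
    for (i = 1; i <= k; i++)
        result = result * (n - (long) k + (long) i) / (long) i;
    return result;
}
```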
|
|
|
|
@deftypefun void mpz_fib_ui (mpz_t @var{fn}, unsigned long int @var{n})
|
|
@deftypefunx void mpz_fib2_ui (mpz_t @var{fn}, mpz_t @var{fnsub1}, unsigned long int @var{n})
|
|
@cindex Fibonacci sequence functions
|
|
@code{mpz_fib_ui} sets @var{fn} to @m{F_n,F[n]}, the @var{n}'th Fibonacci
|
|
number. @code{mpz_fib2_ui} sets @var{fn} to @m{F_n,F[n]}, and @var{fnsub1} to
|
|
@m{F_{n-1},F[n-1]}.
|
|
|
|
These functions are designed for calculating isolated Fibonacci numbers. When
|
|
a sequence of values is wanted it's best to start with @code{mpz_fib2_ui} and
|
|
iterate the defining @m{F_{n+1} = F_n + F_{n-1}, F[n+1]=F[n]+F[n-1]} or
|
|
similar.
|
|
@end deftypefun
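
Iterating the defining recurrence from a starting pair can be sketched on ordinary C integers as follows. The name @code{model_fib_run} is hypothetical, and the starting pair stands in for the two values that @code{mpz_fib2_ui} would supply; values are assumed small enough to fit an @code{unsigned long}.

```c
/* Fill out[0..count-1] with F[n], F[n+1], ... given the starting pair
   fn = F[n] and fnsub1 = F[n-1]. */
void model_fib_run(unsigned long *out, unsigned count,
                   unsigned long fn, unsigned long fnsub1)
{
    unsigned i;
    for (i = 0; i < count; i++) {
        unsigned long next = fn + fnsub1;   /* F[n+1] = F[n] + F[n-1] */
        out[i] = fn;
        fnsub1 = fn;
        fn = next;
    }
}
```

Each further term costs a single addition, which is why iterating is preferable to repeated isolated calls when a run of values is needed.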
|
|
|
|
@deftypefun void mpz_lucnum_ui (mpz_t @var{ln}, unsigned long int @var{n})
|
|
@deftypefunx void mpz_lucnum2_ui (mpz_t @var{ln}, mpz_t @var{lnsub1}, unsigned long int @var{n})
|
|
@cindex Lucas number functions
|
|
@code{mpz_lucnum_ui} sets @var{ln} to @m{L_n,L[n]}, the @var{n}'th Lucas
|
|
number. @code{mpz_lucnum2_ui} sets @var{ln} to @m{L_n,L[n]}, and @var{lnsub1}
|
|
to @m{L_{n-1},L[n-1]}.
|
|
|
|
These functions are designed for calculating isolated Lucas numbers. When a
|
|
sequence of values is wanted it's best to start with @code{mpz_lucnum2_ui} and
|
|
iterate the defining @m{L_{n+1} = L_n + L_{n-1}, L[n+1]=L[n]+L[n-1]} or
|
|
similar.
|
|
|
|
The Fibonacci numbers and Lucas numbers are related sequences, so it's never
|
|
necessary to call both @code{mpz_fib2_ui} and @code{mpz_lucnum2_ui}. The
|
|
formulas for going from Fibonacci to Lucas can be found in
@ref{Lucas Numbers Algorithm}, and the reverse is straightforward too.
|
|
@end deftypefun
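
As an illustration of the relationship, a Fibonacci pair can be turned into the matching Lucas pair with the standard identities L[n] = F[n] + 2*F[n-1] and L[n-1] = 2*F[n] - F[n-1]. The sketch below works on ordinary C integers, uses the hypothetical name @code{model_lucas_from_fib}, and assumes values small enough not to overflow.

```c
/* Derive L[n] and L[n-1] from F[n] and F[n-1]. */
void model_lucas_from_fib(unsigned long *ln, unsigned long *lnsub1,
                          unsigned long fn, unsigned long fnsub1)
{
    *ln = fn + 2 * fnsub1;        /* L[n]   = F[n] + 2*F[n-1] */
    *lnsub1 = 2 * fn - fnsub1;    /* L[n-1] = 2*F[n] - F[n-1] */
}
```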
|
|
|
|
|
|
@node Integer Comparisons, Integer Logic and Bit Fiddling, Number Theoretic Functions, Integer Functions
|
|
@comment node-name, next, previous, up
|
|
@section Comparison Functions
|
|
@cindex Integer comparison functions
|
|
@cindex Comparison functions
|
|
|
|
@deftypefn Function int mpz_cmp (mpz_t @var{op1}, mpz_t @var{op2})
|
|
@deftypefnx Function int mpz_cmp_d (mpz_t @var{op1}, double @var{op2})
|
|
@deftypefnx Macro int mpz_cmp_si (mpz_t @var{op1}, signed long int @var{op2})
|
|
@deftypefnx Macro int mpz_cmp_ui (mpz_t @var{op1}, unsigned long int @var{op2})
|
|
Compare @var{op1} and @var{op2}. Return a positive value if @math{@var{op1} >
|
|
@var{op2}}, zero if @math{@var{op1} = @var{op2}}, or a negative value if
|
|
@math{@var{op1} < @var{op2}}.
|
|
|
|
@code{mpz_cmp_ui} and @code{mpz_cmp_si} are macros and will evaluate their
|
|
arguments more than once. @code{mpz_cmp_d} can be called with an infinity,
|
|
but results are undefined for a NaN.
|
|
@end deftypefn
|
|
|
|
@deftypefn Function int mpz_cmpabs (mpz_t @var{op1}, mpz_t @var{op2})
|
|
@deftypefnx Function int mpz_cmpabs_d (mpz_t @var{op1}, double @var{op2})
|
|
@deftypefnx Function int mpz_cmpabs_ui (mpz_t @var{op1}, unsigned long int @var{op2})
|
|
Compare the absolute values of @var{op1} and @var{op2}. Return a positive
|
|
value if @math{@GMPabs{@var{op1}} > @GMPabs{@var{op2}}}, zero if
|
|
@math{@GMPabs{@var{op1}} = @GMPabs{@var{op2}}}, or a negative value if
|
|
@math{@GMPabs{@var{op1}} < @GMPabs{@var{op2}}}.
|
|
|
|
@code{mpz_cmpabs_d} can be called with an infinity, but results are undefined
|
|
for a NaN.
|
|
@end deftypefn
|
|
|
|
@deftypefn Macro int mpz_sgn (mpz_t @var{op})
|
|
@cindex Sign tests
|
|
@cindex Integer sign tests
|
|
Return @math{+1} if @math{@var{op} > 0}, 0 if @math{@var{op} = 0}, and
|
|
@math{-1} if @math{@var{op} < 0}.
|
|
|
|
This function is actually implemented as a macro. It evaluates its argument
|
|
multiple times.
|
|
@end deftypefn
|
|
|
|
|
|
@node Integer Logic and Bit Fiddling, I/O of Integers, Integer Comparisons, Integer Functions
|
|
@comment node-name, next, previous, up
|
|
@section Logical and Bit Manipulation Functions
|
|
@cindex Logical functions
|
|
@cindex Bit manipulation functions
|
|
@cindex Integer logical functions
|
|
@cindex Integer bit manipulation functions
|
|
|
|
These functions behave as if twos complement arithmetic were used (although
|
|
sign-magnitude is the actual implementation). The least significant bit is
|
|
number 0.
|
|
|
|
@deftypefun void mpz_and (mpz_t @var{rop}, mpz_t @var{op1}, mpz_t @var{op2})
|
|
Set @var{rop} to @var{op1} bitwise-and @var{op2}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_ior (mpz_t @var{rop}, mpz_t @var{op1}, mpz_t @var{op2})
|
|
Set @var{rop} to @var{op1} bitwise inclusive-or @var{op2}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_xor (mpz_t @var{rop}, mpz_t @var{op1}, mpz_t @var{op2})
|
|
Set @var{rop} to @var{op1} bitwise exclusive-or @var{op2}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_com (mpz_t @var{rop}, mpz_t @var{op})
|
|
Set @var{rop} to the one's complement of @var{op}.
|
|
@end deftypefun
|
|
|
|
@deftypefun {unsigned long int} mpz_popcount (mpz_t @var{op})
|
|
If @math{@var{op}@ge{}0}, return the population count of @var{op}, which is
|
|
the number of 1 bits in the binary representation. If @math{@var{op}<0}, the
|
|
number of 1s is infinite, and the return value is @code{ULONG_MAX}, the largest
|
|
possible @code{unsigned long}.
|
|
@end deftypefun
|
|
|
|
@deftypefun {unsigned long int} mpz_hamdist (mpz_t @var{op1}, mpz_t @var{op2})
|
|
If @var{op1} and @var{op2} are both @math{@ge{}0} or both @math{<0}, return
|
|
the hamming distance between the two operands, which is the number of bit
|
|
positions where @var{op1} and @var{op2} have different bit values. If one
|
|
operand is @math{@ge{}0} and the other @math{<0} then the number of bits
|
|
different is infinite, and the return value is @code{ULONG_MAX}, the largest
|
|
possible @code{unsigned long}.
|
|
@end deftypefun
|
|
|
|
@deftypefun {unsigned long int} mpz_scan0 (mpz_t @var{op}, unsigned long int @var{starting_bit})
|
|
@deftypefunx {unsigned long int} mpz_scan1 (mpz_t @var{op}, unsigned long int @var{starting_bit})
|
|
@cindex Bit scanning functions
|
|
@cindex Scan bit functions
|
|
Scan @var{op}, starting from bit @var{starting_bit}, towards more significant
|
|
bits, until the first 0 or 1 bit (respectively) is found. Return the index of
|
|
the found bit.
|
|
|
|
If the bit at @var{starting_bit} is already what's sought, then
|
|
@var{starting_bit} is returned.
|
|
|
|
If there's no bit found, then @code{ULONG_MAX} is returned. This will happen
in @code{mpz_scan0} past the end of a negative number, or @code{mpz_scan1}
past the end of a non-negative number.
|
|
@end deftypefun
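
The scanning rules can be modelled on a C @code{long} viewed as an infinitely sign-extended twos complement number. This is a sketch only; @code{model_tstbit} and @code{model_scan} are hypothetical names and no MPIR function is called. In this model a search fails only when every remaining bit is a copy of the sign bit and that bit is not the one sought.

```c
#include <limits.h>

/* Bit k of a long viewed as an infinitely sign-extended twos complement
   number: positions at or beyond the width of long repeat the sign bit. */
int model_tstbit(long op, unsigned long k)
{
    unsigned long width = sizeof(long) * CHAR_BIT;
    if (k >= width)
        return op < 0;
    return (int) (((unsigned long) op >> k) & 1UL);
}

/* Model of mpz_scan0 (want == 0) and mpz_scan1 (want == 1). */
unsigned long model_scan(long op, unsigned long start, int want)
{
    unsigned long width = sizeof(long) * CHAR_BIT;
    unsigned long k;

    for (k = start; k < width; k++)
        if (model_tstbit(op, k) == want)
            return k;
    /* From position k upwards every bit equals the sign bit. */
    if ((op < 0) == (want != 0))
        return k;
    return ULONG_MAX;               /* no such bit at or above start */
}
```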
|
|
|
|
@deftypefun void mpz_setbit (mpz_t @var{rop}, unsigned long int @var{bit_index})
|
|
Set bit @var{bit_index} in @var{rop}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_clrbit (mpz_t @var{rop}, unsigned long int @var{bit_index})
|
|
Clear bit @var{bit_index} in @var{rop}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_combit (mpz_t @var{rop}, unsigned long int @var{bit_index})
|
|
Complement bit @var{bit_index} in @var{rop}.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpz_tstbit (mpz_t @var{op}, unsigned long int @var{bit_index})
|
|
Test bit @var{bit_index} in @var{op} and return 0 or 1 accordingly.
|
|
@end deftypefun
|
|
|
|
@node I/O of Integers, Integer Random Numbers, Integer Logic and Bit Fiddling, Integer Functions
|
|
@comment node-name, next, previous, up
|
|
@section Input and Output Functions
|
|
@cindex Integer input and output functions
|
|
@cindex Input functions
|
|
@cindex Output functions
|
|
@cindex I/O functions
|
|
|
|
The functions in this section perform input from a stdio stream or output to
a stdio stream. Passing a @code{NULL} pointer for the @var{stream} argument makes
the input functions read from @code{stdin} and the output functions write to
@code{stdout}.
|
|
|
|
When using any of these functions, it is a good idea to include @file{stdio.h}
|
|
before @file{mpir.h}, since that will allow @file{mpir.h} to define prototypes
|
|
for these functions.
|
|
|
|
@deftypefun size_t mpz_out_str (FILE *@var{stream}, int @var{base}, mpz_t @var{op})
|
|
Output @var{op} on stdio stream @var{stream}, as a string of digits in base
|
|
@var{base}. The base may vary from 2 to 36.
|
|
|
|
Return the number of bytes written, or if an error occurred, return 0.
|
|
@end deftypefun
|
|
|
|
@deftypefun size_t mpz_inp_str (mpz_t @var{rop}, FILE *@var{stream}, int @var{base})
|
|
Input a possibly white-space preceded string in base @var{base} from stdio
|
|
stream @var{stream}, and put the read integer in @var{rop}.
|
|
|
|
The @var{base} may vary from 2 to 62, or if @var{base} is 0, then the leading
|
|
characters are used: @code{0x} and @code{0X} for hexadecimal, @code{0b} and
|
|
@code{0B} for binary, @code{0} for octal, or decimal otherwise.
|
|
|
|
For bases up to 36, case is ignored; upper-case and lower-case letters have
|
|
the same value. For bases 37 to 62, upper-case letters represent the usual
10..35 while lower-case letters represent 36..61.
|
|
|
|
Return the number of bytes read, or if an error occurred, return 0.
|
|
@end deftypefun
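
The digit and prefix rules can be written down as two small helper functions. This is a sketch only: @code{model_digit_value} and @code{model_detect_base} are hypothetical names, not MPIR functions, and the code assumes a character set such as ASCII in which the letters are contiguous.

```c
/* Value of digit character c in the given base (2 to 62), or -1 if c is
   not a valid digit in that base. */
int model_digit_value(int c, int base)
{
    int v;
    if (c >= '0' && c <= '9')
        v = c - '0';
    else if (c >= 'A' && c <= 'Z')
        v = c - 'A' + 10;                       /* 10..35 in every base */
    else if (c >= 'a' && c <= 'z')
        v = c - 'a' + (base <= 36 ? 10 : 36);   /* case matters above 36 */
    else
        return -1;
    return v < base ? v : -1;
}

/* Base implied by the leading characters when base 0 is requested. */
int model_detect_base(const char *s)
{
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
        return 16;
    if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B'))
        return 2;
    if (s[0] == '0')
        return 8;
    return 10;
}
```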
|
|
|
|
@deftypefun size_t mpz_out_raw (FILE *@var{stream}, mpz_t @var{op})
|
|
Output @var{op} on stdio stream @var{stream}, in raw binary format. The
|
|
integer is written in a portable format, with 4 bytes of size information, and
|
|
that many bytes of limbs. Both the size and the limbs are written in
|
|
decreasing significance order (i.e., in big-endian).
|
|
|
|
The output can be read with @code{mpz_inp_raw}.
|
|
|
|
Return the number of bytes written, or if an error occurred, return 0.
|
|
|
|
The output of this cannot be read by @code{mpz_inp_raw} from GMP 1, because
|
|
of changes necessary for compatibility between 32-bit and 64-bit machines.
|
|
@end deftypefun
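
The layout described above can be pictured with a small encoder for a value held in an @code{unsigned long}. This is a sketch under stated assumptions, not MPIR code: @code{model_out_raw} is a hypothetical name, only non-negative values are covered because the text above does not describe how a sign is stored, and dropping leading zero bytes from the data is an assumption of the sketch.

```c
#include <stddef.h>

/* Encode op as a 4-byte big-endian byte count followed by that many data
   bytes, most significant first.  buf must hold at least
   4 + sizeof(unsigned long) bytes.  Returns the number of bytes written. */
size_t model_out_raw(unsigned char *buf, unsigned long op)
{
    size_t n = 0, i;
    unsigned long t;

    for (t = op; t != 0; t >>= 8)   /* count the significant bytes */
        n++;
    buf[0] = 0;
    buf[1] = 0;
    buf[2] = 0;
    buf[3] = (unsigned char) n;     /* n <= sizeof(unsigned long) */
    for (i = 0; i < n; i++)         /* most significant byte first */
        buf[4 + i] = (unsigned char) ((op >> (8 * (n - 1 - i))) & 0xFF);
    return 4 + n;
}
```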
|
|
|
|
@deftypefun size_t mpz_inp_raw (mpz_t @var{rop}, FILE *@var{stream})
|
|
Input from stdio stream @var{stream} in the format written by
|
|
@code{mpz_out_raw}, and put the result in @var{rop}. Return the number of
|
|
bytes read, or if an error occurred, return 0.
|
|
|
|
This routine can also read the output of @code{mpz_out_raw} from GMP 1, in
|
|
spite of changes necessary for compatibility between 32-bit and 64-bit
|
|
machines.
|
|
@end deftypefun
|
|
|
|
|
|
@need 2000
|
|
@node Integer Random Numbers, Integer Import and Export, I/O of Integers, Integer Functions
|
|
@comment node-name, next, previous, up
|
|
@section Random Number Functions
|
|
@cindex Integer random number functions
|
|
@cindex Random number functions
|
|
|
|
The random number functions of MPIR come in two groups: older functions
that rely on a global state, and newer functions that accept a state
parameter that is read and modified. For more information on how to use,
and how not to use, random number functions, see
@ref{Random Number Functions}.
|
|
|
|
@deftypefun void mpz_urandomb (mpz_t @var{rop}, gmp_randstate_t @var{state}, unsigned long int @var{n})
|
|
Generate a uniformly distributed random integer in the range 0 to @m{2^n-1,
|
|
2^@var{n}@minus{}1}, inclusive.
|
|
|
|
The variable @var{state} must be initialized by calling one of the
|
|
@code{gmp_randinit} functions (@ref{Random State Initialization}) before
|
|
invoking this function.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_urandomm (mpz_t @var{rop}, gmp_randstate_t @var{state}, mpz_t @var{n})
|
|
Generate a uniform random integer in the range 0 to @math{@var{n}-1},
|
|
inclusive.
|
|
|
|
The variable @var{state} must be initialized by calling one of the
|
|
@code{gmp_randinit} functions (@ref{Random State Initialization})
|
|
before invoking this function.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_rrandomb (mpz_t @var{rop}, gmp_randstate_t @var{state}, unsigned long int @var{n})
|
|
Generate a random integer with long strings of zeros and ones in the
|
|
binary representation. Useful for testing functions and algorithms,
|
|
since random numbers of this kind have proven to be more likely to
|
|
trigger corner-case bugs. The random number will be in the range
|
|
0 to @m{2^n-1, 2^@var{n}@minus{}1}, inclusive.
|
|
|
|
The variable @var{state} must be initialized by calling one of the
|
|
@code{gmp_randinit} functions (@ref{Random State Initialization})
|
|
before invoking this function.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_random (mpz_t @var{rop}, mp_size_t @var{max_size})
|
|
Generate a random integer of at most @var{max_size} limbs. The generated
|
|
random number doesn't satisfy any particular requirements of randomness.
|
|
Negative random numbers are generated when @var{max_size} is negative.
|
|
|
|
This function is obsolete. Use @code{mpz_urandomb} or
|
|
@code{mpz_urandomm} instead.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_random2 (mpz_t @var{rop}, mp_size_t @var{max_size})
|
|
Generate a random integer of at most @var{max_size} limbs, with long strings
of zeros and ones in the binary representation. Useful for testing functions
and algorithms, since random numbers of this kind have proven more
likely to trigger corner-case bugs. Negative random numbers are generated
when @var{max_size} is negative.
|
|
|
|
This function is obsolete. Use @code{mpz_rrandomb} instead.
|
|
@end deftypefun
|
|
|
|
|
|
@node Integer Import and Export, Miscellaneous Integer Functions, Integer Random Numbers, Integer Functions
|
|
@section Integer Import and Export
|
|
|
|
@code{mpz_t} variables can be converted to and from arbitrary words of binary
|
|
data with the following functions.
|
|
|
|
@deftypefun void mpz_import (mpz_t @var{rop}, size_t @var{count}, int @var{order}, size_t @var{size}, int @var{endian}, size_t @var{nails}, const void *@var{op})
|
|
@cindex Integer import
|
|
@cindex Import
|
|
Set @var{rop} from an array of word data at @var{op}.
|
|
|
|
The parameters specify the format of the data. @var{count} many words are
|
|
read, each @var{size} bytes. @var{order} can be 1 for most significant word
|
|
first or -1 for least significant first. Within each word @var{endian} can be
|
|
1 for most significant byte first, -1 for least significant first, or 0 for
|
|
the native endianness of the host CPU@. The most significant @var{nails} bits
of each word are skipped; this can be 0 to use the full words.
|
|
|
|
No sign is taken from the data; @var{rop} will simply be a non-negative
integer. An application can handle any sign itself, and apply it for instance
with @code{mpz_neg}.
|
|
|
|
There are no data alignment restrictions on @var{op}; any address is allowed.
|
|
|
|
Here's an example converting an array of @code{unsigned long} data, most
|
|
significant element first, and host byte order within each value.
|
|
|
|
@example
|
|
unsigned long a[20];
mpz_t z;
/* ... fill a[] with the data to convert ... */
mpz_init (z);
mpz_import (z, 20, 1, sizeof(a[0]), 0, 0, a);
|
|
@end example
|
|
|
|
This example assumes the full @code{sizeof} bytes are used for data in the
|
|
given type, which is usually true, and certainly true for @code{unsigned long}
|
|
everywhere we know of. However on Cray vector systems it may be noted that
|
|
@code{short} and @code{int} are always stored in 8 bytes (and with
|
|
@code{sizeof} indicating that) but use only 32 or 46 bits. The @var{nails}
|
|
feature can account for this, by passing for instance
|
|
@code{8*sizeof(int)-INT_BIT}.
|
|
@end deftypefun
|
|
|
|
@deftypefun {void *} mpz_export (void *@var{rop}, size_t *@var{countp}, int @var{order}, size_t @var{size}, int @var{endian}, size_t @var{nails}, mpz_t @var{op})
|
|
@cindex Integer export
|
|
@cindex Export
|
|
Fill @var{rop} with word data from @var{op}.
|
|
|
|
The parameters specify the format of the data produced. Each word will be
|
|
@var{size} bytes and @var{order} can be 1 for most significant word first or
|
|
-1 for least significant first. Within each word @var{endian} can be 1 for
|
|
most significant byte first, -1 for least significant first, or 0 for the
|
|
native endianness of the host CPU@. The most significant @var{nails} bits of
each word are unused and set to zero; this can be 0 to produce full words.
|
|
|
|
The number of words produced is written to @code{*@var{countp}}, or
|
|
@var{countp} can be @code{NULL} to discard the count. @var{rop} must have
|
|
enough space for the data, or if @var{rop} is @code{NULL} then a result array
|
|
of the necessary size is allocated using the current MPIR allocation function
|
|
(@pxref{Custom Allocation}). In either case the return value is the
|
|
destination used, either @var{rop} or the allocated block.
|
|
|
|
If @var{op} is non-zero then the most significant word produced will be
|
|
non-zero. If @var{op} is zero then the count returned will be zero and
|
|
nothing written to @var{rop}. If @var{rop} is @code{NULL} in this case, no
|
|
block is allocated, just @code{NULL} is returned.
|
|
|
|
The sign of @var{op} is ignored; just the absolute value is exported. An
application can use @code{mpz_sgn} to get the sign and handle it as desired
(@pxref{Integer Comparisons}).
|
|
|
|
There are no data alignment restrictions on @var{rop}; any address is allowed.
|
|
|
|
When an application is allocating space itself the required size can be
|
|
determined with a calculation like the following. Since @code{mpz_sizeinbase}
|
|
always returns at least 1, @code{count} here will be at least one, which
|
|
avoids any portability problems with @code{malloc(0)}, though if @code{z} is
|
|
zero no space at all is actually needed (or written).
|
|
|
|
@example
|
|
numb = 8*size - nails;
|
|
count = (mpz_sizeinbase (z, 2) + numb-1) / numb;
|
|
p = malloc (count * size);
|
|
@end example
|
|
@end deftypefun
|
|
|
|
|
|
@need 2000
|
|
@node Miscellaneous Integer Functions, Integer Special Functions, Integer Import and Export, Integer Functions
|
|
@comment node-name, next, previous, up
|
|
@section Miscellaneous Functions
|
|
@cindex Miscellaneous integer functions
|
|
@cindex Integer miscellaneous functions
|
|
|
|
@deftypefun int mpz_fits_ulong_p (mpz_t @var{op})
|
|
@deftypefunx int mpz_fits_slong_p (mpz_t @var{op})
|
|
@deftypefunx int mpz_fits_uint_p (mpz_t @var{op})
|
|
@deftypefunx int mpz_fits_sint_p (mpz_t @var{op})
|
|
@deftypefunx int mpz_fits_ushort_p (mpz_t @var{op})
|
|
@deftypefunx int mpz_fits_sshort_p (mpz_t @var{op})
|
|
Return non-zero iff the value of @var{op} fits in an @code{unsigned long int},
@code{signed long int}, @code{unsigned int}, @code{signed int},
@code{unsigned short int}, or @code{signed short int}, respectively.
Otherwise, return zero.
|
|
@end deftypefun
|
|
|
|
@deftypefn Macro int mpz_odd_p (mpz_t @var{op})
|
|
@deftypefnx Macro int mpz_even_p (mpz_t @var{op})
|
|
Determine whether @var{op} is odd or even, respectively. Return non-zero if
|
|
yes, zero if no. These macros evaluate their argument more than once.
|
|
@end deftypefn
|
|
|
|
@deftypefun size_t mpz_sizeinbase (mpz_t @var{op}, int @var{base})
|
|
@cindex Size in digits
|
|
@cindex Digits in an integer
|
|
Return the size of @var{op} measured in number of digits in the given
|
|
@var{base}. @var{base} can vary from 2 to 36. The sign of @var{op} is
ignored; just the absolute value is used. The result will be either exact or
|
|
1 too big. If @var{base} is a power of 2, the result is always exact. If
|
|
@var{op} is zero the return value is always 1.
|
|
|
|
This function can be used to determine the space required when converting
|
|
@var{op} to a string. The right amount of allocation is normally two more
|
|
than the value returned by @code{mpz_sizeinbase}, one extra for a minus sign
|
|
and one for the null-terminator.
|
|
|
|
@cindex Most significant bit
|
|
It will be noted that @code{mpz_sizeinbase(@var{op},2)} can be used to locate
the most significant 1 bit in @var{op}, counting from 1. (The bitwise
functions, by contrast, count from 0;
@pxref{Integer Logic and Bit Fiddling,, Logical and Bit Manipulation Functions}.)
|
|
@end deftypefun
|
|
|
|
|
|
@node Integer Special Functions, , Miscellaneous Integer Functions, Integer Functions
|
|
@section Special Functions
|
|
@cindex Special integer functions
|
|
@cindex Integer special functions
|
|
|
|
The functions in this section are for various special purposes. Most
|
|
applications will not need them.
|
|
|
|
@deftypefun void mpz_array_init (mpz_t @var{integer_array}, size_t @var{array_size}, @w{mp_size_t @var{fixed_num_bits}})
|
|
This is a special type of initialization. @strong{Fixed} space of
|
|
@var{fixed_num_bits} is allocated to each of the @var{array_size} integers in
|
|
@var{integer_array}. There is no way to free the storage allocated by this
|
|
function. Don't call @code{mpz_clear}!
|
|
|
|
The @var{integer_array} parameter is the first @code{mpz_t} in the array. For
|
|
example,
|
|
|
|
@example
|
|
mpz_t arr[20000];
|
|
mpz_array_init (arr[0], 20000, 512);
|
|
@end example
|
|
|
|
@c In case anyone's wondering, yes this parameter style is a bit anomalous,
|
|
@c it'd probably be nicer if it was "arr" instead of "arr[0]". Obviously the
|
|
@c two differ only in the declaration, not the pointer value, but changing is
|
|
@c not possible since it'd provoke warnings or errors in existing sources.
|
|
|
|
This function is only intended for programs that create a large number
|
|
of integers and need to reduce memory usage by avoiding the overheads of
|
|
allocating and reallocating lots of small blocks. In normal programs this
|
|
function is not recommended.
|
|
|
|
The space allocated to each integer by this function will not be automatically
|
|
increased, unlike the normal @code{mpz_init}, so an application must ensure it
|
|
is sufficient for any value stored. The following space requirements apply to
various routines:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
@code{mpz_abs}, @code{mpz_neg}, @code{mpz_set}, @code{mpz_set_si} and
|
|
@code{mpz_set_ui} need room for the value they store.
|
|
|
|
@item
|
|
@code{mpz_add}, @code{mpz_add_ui}, @code{mpz_sub} and @code{mpz_sub_ui} need
|
|
room for the larger of the two operands, plus an extra
|
|
@code{mp_bits_per_limb}.
|
|
|
|
@item
|
|
@code{mpz_mul}, @code{mpz_mul_ui} and @code{mpz_mul_si} need room for the sum
|
|
of the number of bits in their operands, but each rounded up to a multiple of
|
|
@code{mp_bits_per_limb}.
|
|
|
|
@item
|
|
@code{mpz_swap} can be used between two array variables, but not between an
|
|
array and a normal variable.
|
|
@end itemize
|
|
|
|
For other functions, or if in doubt, the suggestion is to calculate in a
|
|
regular @code{mpz_init} variable and copy the result to an array variable with
|
|
@code{mpz_set}.
|
|
@end deftypefun
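The space requirements listed above for addition and multiplication can be worked out with plain arithmetic. The sketch below is one reading of those rules, using no MPIR calls; the function names are hypothetical, and @code{limb_bits} stands in for @code{mp_bits_per_limb}.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Round a bit count up to a whole number of limbs. */
static size_t
round_up_to_limb (size_t bits, size_t limb_bits)
{
  return (bits + limb_bits - 1) / limb_bits * limb_bits;
}

/* Bits to reserve for a sum or difference: the larger operand plus
   one extra limb. */
size_t
add_bits_needed (size_t bits_a, size_t bits_b, size_t limb_bits)
{
  size_t larger = bits_a > bits_b ? bits_a : bits_b;
  return larger + limb_bits;
}

/* Bits to reserve for a product: the sum of the operand sizes, each
   rounded up to a multiple of the limb size. */
size_t
mul_bits_needed (size_t bits_a, size_t bits_b, size_t limb_bits)
{
  return round_up_to_limb (bits_a, limb_bits)
         + round_up_to_limb (bits_b, limb_bits);
}
```
@end verbatim

For instance, with 64-bit limbs, operands of 100 and 30 bits would call for 164 bits for a sum and 192 bits for a product under this reading.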
|
|
|
|
@deftypefun {void *} _mpz_realloc (mpz_t @var{integer}, mp_size_t @var{new_alloc})
|
|
Change the space for @var{integer} to @var{new_alloc} limbs. The value in
|
|
@var{integer} is preserved if it fits, or is set to 0 if not. The return
|
|
value is not useful to applications and should be ignored.
|
|
|
|
@code{mpz_realloc2} is the preferred way to accomplish allocation changes like
|
|
this. @code{mpz_realloc2} and @code{_mpz_realloc} are the same except that
|
|
@code{_mpz_realloc} takes its size in limbs.
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_limb_t mpz_getlimbn (mpz_t @var{op}, mp_size_t @var{n})
|
|
Return limb number @var{n} from @var{op}. The sign of @var{op} is ignored;
just the absolute value is used. The least significant limb is number 0.
|
|
|
|
@code{mpz_size} can be used to find how many limbs make up @var{op}.
|
|
@code{mpz_getlimbn} returns zero if @var{n} is outside the range 0 to
|
|
@code{mpz_size(@var{op})-1}.
|
|
@end deftypefun
|
|
|
|
@deftypefun size_t mpz_size (mpz_t @var{op})
|
|
Return the size of @var{op} measured in number of limbs. If @var{op} is zero,
|
|
the returned value will be zero.
|
|
@c (@xref{Nomenclature}, for an explanation of the concept @dfn{limb}.)
|
|
@end deftypefun
|
|
|
|
|
|
|
|
@node Rational Number Functions, Floating-point Functions, Integer Functions, Top
|
|
@comment node-name, next, previous, up
|
|
@chapter Rational Number Functions
|
|
@cindex Rational number functions
|
|
|
|
This chapter describes the MPIR functions for performing arithmetic on rational
|
|
numbers. These functions start with the prefix @code{mpq_}.
|
|
|
|
Rational numbers are stored in objects of type @code{mpq_t}.
|
|
|
|
All rational arithmetic functions assume operands have a canonical form, and
|
|
canonicalize their result. The canonical form means that the denominator and
|
|
the numerator have no common factors, and that the denominator is positive.
|
|
Zero has the unique representation 0/1.
|
|
|
|
Pure assignment functions do not canonicalize the assigned variable. It is
|
|
the responsibility of the user to canonicalize the assigned variable before
|
|
any arithmetic operations are performed on that variable.
|
|
|
|
@deftypefun void mpq_canonicalize (mpq_t @var{op})
|
|
Remove any factors that are common to the numerator and denominator of
|
|
@var{op}, and make the denominator positive.
|
|
@end deftypefun
|
|
|
|
@menu
|
|
* Initializing Rationals::
|
|
* Rational Conversions::
|
|
* Rational Arithmetic::
|
|
* Comparing Rationals::
|
|
* Applying Integer Functions::
|
|
* I/O of Rationals::
|
|
@end menu
|
|
|
|
@node Initializing Rationals, Rational Conversions, Rational Number Functions, Rational Number Functions
|
|
@comment node-name, next, previous, up
|
|
@section Initialization and Assignment Functions
|
|
@cindex Rational assignment functions
|
|
@cindex Assignment functions
|
|
@cindex Rational initialization functions
|
|
@cindex Initialization functions
|
|
|
|
@deftypefun void mpq_init (mpq_t @var{dest_rational})
|
|
Initialize @var{dest_rational} and set it to 0/1. Each variable should
|
|
normally only be initialized once, or at least cleared out (using the function
|
|
@code{mpq_clear}) between each initialization.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_clear (mpq_t @var{rational_number})
|
|
Free the space occupied by @var{rational_number}. Make sure to call this
|
|
function for all @code{mpq_t} variables when you are done with them.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_set (mpq_t @var{rop}, mpq_t @var{op})
|
|
@deftypefunx void mpq_set_z (mpq_t @var{rop}, mpz_t @var{op})
|
|
Assign @var{rop} from @var{op}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_set_ui (mpq_t @var{rop}, unsigned long int @var{op1}, unsigned long int @var{op2})
|
|
@deftypefunx void mpq_set_si (mpq_t @var{rop}, signed long int @var{op1}, unsigned long int @var{op2})
|
|
Set the value of @var{rop} to @var{op1}/@var{op2}. Note that if @var{op1} and
|
|
@var{op2} have common factors, @var{rop} has to be passed to
|
|
@code{mpq_canonicalize} before any operations are performed on @var{rop}.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpq_set_str (mpq_t @var{rop}, char *@var{str}, int @var{base})
|
|
Set @var{rop} from a null-terminated string @var{str} in the given @var{base}.
|
|
|
|
The string can be an integer like ``41'' or a fraction like ``41/152''. The
|
|
fraction must be in canonical form (@pxref{Rational Number Functions}), or if
|
|
not then @code{mpq_canonicalize} must be called.
|
|
|
|
The numerator and optional denominator are parsed the same as in
|
|
@code{mpz_set_str} (@pxref{Assigning Integers}). White space is allowed in
|
|
the string, and is simply ignored. The @var{base} can vary from 2 to 62, or
|
|
if @var{base} is 0 then the leading characters are used: @code{0x} or @code{0X} for hex,
|
|
@code{0b} or @code{0B} for binary,
|
|
@code{0} for octal, or decimal otherwise. Note that this is done separately
|
|
for the numerator and denominator, so for instance @code{0xEF/100} is 239/100,
|
|
whereas @code{0xEF/0x100} is 239/256.
|
|
|
|
The return value is 0 if the entire string is a valid number, or @minus{}1 if
|
|
not.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_swap (mpq_t @var{rop1}, mpq_t @var{rop2})
|
|
Swap the values @var{rop1} and @var{rop2} efficiently.
|
|
@end deftypefun
|
|
|
|
|
|
@need 2000
|
|
@node Rational Conversions, Rational Arithmetic, Initializing Rationals, Rational Number Functions
|
|
@comment node-name, next, previous, up
|
|
@section Conversion Functions
|
|
@cindex Rational conversion functions
|
|
@cindex Conversion functions
|
|
|
|
@deftypefun double mpq_get_d (mpq_t @var{op})
|
|
Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding
|
|
towards zero).
|
|
|
|
If the exponent from the conversion is too big or too small to fit a
@code{double} then the result is system dependent. If it is too big, an
infinity is returned when available. If it is too small, @math{0.0} is
normally returned.
|
|
Hardware overflow, underflow and denorm traps may or may not occur.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_set_d (mpq_t @var{rop}, double @var{op})
|
|
@deftypefunx void mpq_set_f (mpq_t @var{rop}, mpf_t @var{op})
|
|
Set @var{rop} to the value of @var{op}. There is no rounding; this conversion
is exact.
|
|
@end deftypefun
|
|
|
|
@deftypefun {char *} mpq_get_str (char *@var{str}, int @var{base}, mpq_t @var{op})
|
|
Convert @var{op} to a string of digits in base @var{base}. The base may vary
|
|
from 2 to 36. The string will be of the form @samp{num/den}, or if the
|
|
denominator is 1 then just @samp{num}.
|
|
|
|
If @var{str} is @code{NULL}, the result string is allocated using the current
|
|
allocation function (@pxref{Custom Allocation}). The block will be
|
|
@code{strlen(str)+1} bytes, that being exactly enough for the string and
|
|
null-terminator.
|
|
|
|
If @var{str} is not @code{NULL}, it should point to a block of storage large
|
|
enough for the result, that being
|
|
|
|
@example
|
|
mpz_sizeinbase (mpq_numref(@var{op}), @var{base})
|
|
+ mpz_sizeinbase (mpq_denref(@var{op}), @var{base}) + 3
|
|
@end example
|
|
|
|
The three extra bytes are for a possible minus sign, possible slash, and the
|
|
null-terminator.
|
|
|
|
A pointer to the result string is returned, being either the allocated block,
|
|
or the given @var{str}.
|
|
@end deftypefun
|
|
|
|
|
|
@node Rational Arithmetic, Comparing Rationals, Rational Conversions, Rational Number Functions
|
|
@comment node-name, next, previous, up
|
|
@section Arithmetic Functions
|
|
@cindex Rational arithmetic functions
|
|
@cindex Arithmetic functions
|
|
|
|
@deftypefun void mpq_add (mpq_t @var{sum}, mpq_t @var{addend1}, mpq_t @var{addend2})
|
|
Set @var{sum} to @var{addend1} + @var{addend2}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_sub (mpq_t @var{difference}, mpq_t @var{minuend}, mpq_t @var{subtrahend})
|
|
Set @var{difference} to @var{minuend} @minus{} @var{subtrahend}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_mul (mpq_t @var{product}, mpq_t @var{multiplier}, mpq_t @var{multiplicand})
|
|
Set @var{product} to @math{@var{multiplier} @GMPtimes{} @var{multiplicand}}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_mul_2exp (mpq_t @var{rop}, mpq_t @var{op1}, unsigned long int @var{op2})
|
|
Set @var{rop} to @m{@var{op1} \times 2^{op2}, @var{op1} times 2 raised to @var{op2}}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_div (mpq_t @var{quotient}, mpq_t @var{dividend}, mpq_t @var{divisor})
|
|
@cindex Division functions
|
|
Set @var{quotient} to @var{dividend}/@var{divisor}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_div_2exp (mpq_t @var{rop}, mpq_t @var{op1}, unsigned long int @var{op2})
|
|
Set @var{rop} to @m{@var{op1}/2^{op2}, @var{op1} divided by 2 raised to @var{op2}}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_neg (mpq_t @var{negated_operand}, mpq_t @var{operand})
|
|
Set @var{negated_operand} to @minus{}@var{operand}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_abs (mpq_t @var{rop}, mpq_t @var{op})
|
|
Set @var{rop} to the absolute value of @var{op}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_inv (mpq_t @var{inverted_number}, mpq_t @var{number})
|
|
Set @var{inverted_number} to 1/@var{number}. If the new denominator is
|
|
zero, this routine will divide by zero.
|
|
@end deftypefun
|
|
|
|
@node Comparing Rationals, Applying Integer Functions, Rational Arithmetic, Rational Number Functions
|
|
@comment node-name, next, previous, up
|
|
@section Comparison Functions
|
|
@cindex Rational comparison functions
|
|
@cindex Comparison functions
|
|
|
|
@deftypefun int mpq_cmp (mpq_t @var{op1}, mpq_t @var{op2})
|
|
Compare @var{op1} and @var{op2}. Return a positive value if
@math{@var{op1} > @var{op2}}, zero if @math{@var{op1} = @var{op2}}, and a
negative value if @math{@var{op1} < @var{op2}}.
|
|
|
|
To determine if two rationals are equal, @code{mpq_equal} is faster than
|
|
@code{mpq_cmp}.
|
|
@end deftypefun
|
|
|
|
@deftypefn Macro int mpq_cmp_ui (mpq_t @var{op1}, unsigned long int @var{num2}, unsigned long int @var{den2})
|
|
@deftypefnx Macro int mpq_cmp_si (mpq_t @var{op1}, long int @var{num2}, unsigned long int @var{den2})
|
|
Compare @var{op1} and @var{num2}/@var{den2}. Return a positive value if
@math{@var{op1} > @var{num2}/@var{den2}}, zero if
@math{@var{op1} = @var{num2}/@var{den2}}, and a negative value if
@math{@var{op1} < @var{num2}/@var{den2}}.
|
|
|
|
@var{num2} and @var{den2} are allowed to have common factors.
|
|
|
|
These functions are implemented as macros and evaluate their arguments
multiple times.
|
|
@end deftypefn
|
|
|
|
@deftypefn Macro int mpq_sgn (mpq_t @var{op})
|
|
@cindex Sign tests
|
|
@cindex Rational sign tests
|
|
Return @math{+1} if @math{@var{op} > 0}, 0 if @math{@var{op} = 0}, and
|
|
@math{-1} if @math{@var{op} < 0}.
|
|
|
|
This function is actually implemented as a macro. It evaluates its
argument multiple times.
|
|
@end deftypefn
|
|
|
|
@deftypefun int mpq_equal (mpq_t @var{op1}, mpq_t @var{op2})
|
|
Return non-zero if @var{op1} and @var{op2} are equal, zero if they are
|
|
non-equal. Although @code{mpq_cmp} can be used for the same purpose, this
|
|
function is much faster.
|
|
@end deftypefun
|
|
|
|
@node Applying Integer Functions, I/O of Rationals, Comparing Rationals, Rational Number Functions
|
|
@comment node-name, next, previous, up
|
|
@section Applying Integer Functions to Rationals
|
|
@cindex Rational numerator and denominator
|
|
@cindex Numerator and denominator
|
|
|
|
The set of @code{mpq} functions is quite small. In particular, there are few
|
|
functions for either input or output. The following functions give direct
|
|
access to the numerator and denominator of an @code{mpq_t}.
|
|
|
|
Note that if an assignment to the numerator and/or denominator could take an
|
|
@code{mpq_t} out of the canonical form described at the start of this chapter
|
|
(@pxref{Rational Number Functions}) then @code{mpq_canonicalize} must be
|
|
called before any other @code{mpq} functions are applied to that @code{mpq_t}.
|
|
|
|
@deftypefn Macro mpz_t mpq_numref (mpq_t @var{op})
|
|
@deftypefnx Macro mpz_t mpq_denref (mpq_t @var{op})
|
|
Return a reference to the numerator and denominator of @var{op}, respectively.
|
|
The @code{mpz} functions can be used on the result of these macros.
|
|
@end deftypefn
|
|
|
|
@deftypefun void mpq_get_num (mpz_t @var{numerator}, mpq_t @var{rational})
|
|
@deftypefunx void mpq_get_den (mpz_t @var{denominator}, mpq_t @var{rational})
|
|
@deftypefunx void mpq_set_num (mpq_t @var{rational}, mpz_t @var{numerator})
|
|
@deftypefunx void mpq_set_den (mpq_t @var{rational}, mpz_t @var{denominator})
|
|
Get or set the numerator or denominator of a rational. These functions are
|
|
equivalent to calling @code{mpz_set} with an appropriate @code{mpq_numref} or
|
|
@code{mpq_denref}. Direct use of @code{mpq_numref} or @code{mpq_denref} is
|
|
recommended instead of these functions.
|
|
@end deftypefun
|
|
|
|
|
|
@need 2000
|
|
@node I/O of Rationals, , Applying Integer Functions, Rational Number Functions
|
|
@comment node-name, next, previous, up
|
|
@section Input and Output Functions
|
|
@cindex Rational input and output functions
|
|
@cindex Input functions
|
|
@cindex Output functions
|
|
@cindex I/O functions
|
|
|
|
When using any of these functions, it's a good idea to include @file{stdio.h}
|
|
before @file{mpir.h}, since that will allow @file{mpir.h} to define prototypes
|
|
for these functions.
|
|
|
|
Passing a @code{NULL} pointer for a @var{stream} argument to any of these
|
|
functions will make them read from @code{stdin} and write to @code{stdout},
|
|
respectively.
|
|
|
|
@deftypefun size_t mpq_out_str (FILE *@var{stream}, int @var{base}, mpq_t @var{op})
|
|
Output @var{op} on stdio stream @var{stream}, as a string of digits in base
|
|
@var{base}. The base may vary from 2 to 36. Output is in the form
|
|
@samp{num/den} or if the denominator is 1 then just @samp{num}.
|
|
|
|
Return the number of bytes written, or if an error occurred, return 0.
|
|
@end deftypefun
|
|
|
|
@deftypefun size_t mpq_inp_str (mpq_t @var{rop}, FILE *@var{stream}, int @var{base})
|
|
Read a string of digits from @var{stream} and convert them to a rational in
|
|
@var{rop}. Any initial white-space characters are read and discarded. Return
|
|
the number of characters read (including white space), or 0 if a rational
|
|
could not be read.
|
|
|
|
The input can be a fraction like @samp{17/63} or just an integer like
|
|
@samp{123}. Reading stops at the first character not in this form, and white
|
|
space is not permitted within the string. If the input might not be in
canonical form, then @code{mpq_canonicalize} must be called
(@pxref{Rational Number Functions}).
|
|
|
|
The @var{base} can be between 2 and 36, or can be 0 in which case the leading
|
|
characters of the string determine the base, @samp{0x} or @samp{0X} for
|
|
hexadecimal, @samp{0} for octal, or decimal otherwise. The leading characters
|
|
are examined separately for the numerator and denominator of a fraction, so
|
|
for instance @samp{0x10/11} is @math{16/11}, whereas @samp{0x10/0x11} is
|
|
@math{16/17}.
|
|
@end deftypefun
|
|
|
|
|
|
@node Floating-point Functions, Low-level Functions, Rational Number Functions, Top
|
|
@comment node-name, next, previous, up
|
|
@chapter Floating-point Functions
|
|
@cindex Floating-point functions
|
|
@cindex Float functions
|
|
@cindex User-defined precision
|
|
@cindex Precision of floats
|
|
|
|
MPIR floating point numbers are stored in objects of type @code{mpf_t} and
|
|
functions operating on them have an @code{mpf_} prefix.
|
|
|
|
The mantissa of each float has a user-selectable precision, limited only by
|
|
available memory. Each variable has its own precision, and that can be
|
|
increased or decreased at any time.
|
|
|
|
The exponent of each float has a fixed precision, one machine word on most
|
|
systems. In the current implementation the exponent is a count of limbs, so
|
|
for example on a 32-bit system this means a range of roughly
|
|
@math{2^@W{-68719476768}} to @math{2^@W{68719476736}}, or on a 64-bit system
|
|
this will be greater. Note however @code{mpf_get_str} can only return an
|
|
exponent which fits an @code{mp_exp_t} and currently @code{mpf_set_str}
|
|
doesn't accept exponents bigger than a @code{long}.
|
|
|
|
Each variable keeps a size for the mantissa data actually in use. This means
|
|
that if a float is exactly represented in only a few bits then only those bits
|
|
will be used in a calculation, even if the selected precision is high.
|
|
|
|
All calculations are performed to the precision of the destination variable.
|
|
Each function is defined to calculate with ``infinite precision'' followed by
|
|
a truncation to the destination precision, but of course the work done is only
|
|
what's needed to determine a result under that definition.
|
|
|
|
The precision selected for a variable is a minimum value; MPIR may increase it
|
|
a little to facilitate efficient calculation. Currently this means rounding
|
|
up to a whole limb, and then sometimes having a further partial limb,
|
|
depending on the high limb of the mantissa. But applications shouldn't be
|
|
concerned by such details.
|
|
|
|
The mantissa is stored in binary, as might be imagined from the fact that
precisions are expressed in bits. One consequence of this is that decimal
|
|
fractions like @math{0.1} cannot be represented exactly. The same is true of
|
|
plain IEEE @code{double} floats. This makes both highly unsuitable for
|
|
calculations involving money or other values that should be exact decimal
|
|
fractions. (Suitably scaled integers, or perhaps rationals, are better
|
|
choices.)
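The point about decimal fractions can be illustrated with plain C, using no MPIR calls. In the sketch below (function names hypothetical), ten additions of @math{0.1} in a @code{double} accumulate binary rounding error and on typical IEEE hardware do not compare equal to @math{1.0}, whereas the same sum held as an integer count of cents is exact.

@verbatim
```c
#include <assert.h>

/* Add 0.1 to itself n times in binary floating point.  Each step may
   round, so the result is only approximately n/10. */
double
sum_tenths_double (int n)
{
  double sum = 0.0;
  int i;

  for (i = 0; i < n; i++)
    sum += 0.1;
  return sum;
}

/* The same sum as a scaled integer: amounts are held in cents, so
   every step is exact (barring overflow of long). */
long
sum_cents (long cents, int n)
{
  long sum = 0;
  int i;

  for (i = 0; i < n; i++)
    sum += cents;
  return sum;
}
```
@end verbatim

Here @code{sum_cents (10, 10)} is exactly 100 cents, while @code{sum_tenths_double (10)} is only guaranteed to be close to @math{1.0}.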
|
|
|
|
@code{mpf} functions and variables have no special notion of infinity or
|
|
not-a-number, and applications must take care not to overflow the exponent or
|
|
results will be unpredictable. This might change in a future release.
|
|
|
|
Note that the @code{mpf} functions are @emph{not} intended as a smooth
|
|
extension to IEEE 754 arithmetic. In particular results obtained on one
|
|
computer often differ from the results on a computer with a different word
|
|
size.
|
|
|
|
@menu
|
|
* Initializing Floats::
|
|
* Assigning Floats::
|
|
* Simultaneous Float Init & Assign::
|
|
* Converting Floats::
|
|
* Float Arithmetic::
|
|
* Float Comparison::
|
|
* I/O of Floats::
|
|
* Miscellaneous Float Functions::
|
|
@end menu
|
|
|
|
@node Initializing Floats, Assigning Floats, Floating-point Functions, Floating-point Functions
|
|
@comment node-name, next, previous, up
|
|
@section Initialization Functions
|
|
@cindex Float initialization functions
|
|
@cindex Initialization functions
|
|
|
|
@deftypefun void mpf_set_default_prec (unsigned long int @var{prec})
|
|
Set the default precision to be @strong{at least} @var{prec} bits. All
|
|
subsequent calls to @code{mpf_init} will use this precision, but previously
|
|
initialized variables are unaffected.
|
|
@end deftypefun
|
|
|
|
@deftypefun {unsigned long int} mpf_get_default_prec (void)
|
|
Return the default precision actually used.
|
|
@end deftypefun
|
|
|
|
An @code{mpf_t} object must be initialized before storing the first value in
|
|
it. The functions @code{mpf_init} and @code{mpf_init2} are used for that
|
|
purpose.
|
|
|
|
@deftypefun void mpf_init (mpf_t @var{x})
|
|
Initialize @var{x} to 0. Normally, a variable should be initialized once only
|
|
or at least be cleared, using @code{mpf_clear}, between initializations. The
|
|
precision of @var{x} is undefined unless a default precision has already been
|
|
established by a call to @code{mpf_set_default_prec}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_init2 (mpf_t @var{x}, unsigned long int @var{prec})
|
|
Initialize @var{x} to 0 and set its precision to be @strong{at least}
|
|
@var{prec} bits. Normally, a variable should be initialized once only or at
|
|
least be cleared, using @code{mpf_clear}, between initializations.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_clear (mpf_t @var{x})
|
|
Free the space occupied by @var{x}. Make sure to call this function for all
|
|
@code{mpf_t} variables when you are done with them.
|
|
@end deftypefun
|
|
|
|
@need 2000
|
|
Here is an example of how to initialize floating-point variables:
|
|
@example
|
|
@{
|
|
mpf_t x, y;
|
|
mpf_init (x); /* use default precision */
|
|
mpf_init2 (y, 256); /* precision @emph{at least} 256 bits */
|
|
@dots{}
|
|
/* Unless the program is about to exit, do ... */
|
|
mpf_clear (x);
|
|
mpf_clear (y);
|
|
@}
|
|
@end example
|
|
|
|
The following three functions are useful for changing the precision during a
|
|
calculation. A typical use would be for adjusting the precision gradually in
|
|
iterative algorithms like Newton-Raphson, making the computation precision
|
|
closely match the actual accurate part of the numbers.
|
|
|
|
@deftypefun {unsigned long int} mpf_get_prec (mpf_t @var{op})
|
|
Return the current precision of @var{op}, in bits.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_set_prec (mpf_t @var{rop}, unsigned long int @var{prec})
|
|
Set the precision of @var{rop} to be @strong{at least} @var{prec} bits. The
|
|
value in @var{rop} will be truncated to the new precision.
|
|
|
|
This function requires a call to @code{realloc}, and so should not be used in
|
|
a tight loop.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_set_prec_raw (mpf_t @var{rop}, unsigned long int @var{prec})
|
|
Set the precision of @var{rop} to be @strong{at least} @var{prec} bits,
|
|
without changing the memory allocated.
|
|
|
|
@var{prec} must be no more than the allocated precision for @var{rop}, that
|
|
being the precision when @var{rop} was initialized, or in the most recent
|
|
@code{mpf_set_prec}.
|
|
|
|
The value in @var{rop} is unchanged, and in particular if it had a higher
|
|
precision than @var{prec} it will retain that higher precision. New values
|
|
written to @var{rop} will use the new @var{prec}.
|
|
|
|
Before calling @code{mpf_clear} or the full @code{mpf_set_prec}, another
|
|
@code{mpf_set_prec_raw} call must be made to restore @var{rop} to its original
|
|
allocated precision. Failing to do so will have unpredictable results.
|
|
|
|
@code{mpf_get_prec} can be used before @code{mpf_set_prec_raw} to get the
|
|
original allocated precision. After @code{mpf_set_prec_raw} it reflects the
|
|
@var{prec} value set.
|
|
|
|
@code{mpf_set_prec_raw} is an efficient way to use an @code{mpf_t} variable at
|
|
different precisions during a calculation, perhaps to gradually increase
|
|
precision in an iteration, or just to use various different precisions for
|
|
different purposes during a calculation.
|
|
@end deftypefun
|
|
|
|
|
|
@need 2000
|
|
@node Assigning Floats, Simultaneous Float Init & Assign, Initializing Floats, Floating-point Functions
|
|
@comment node-name, next, previous, up
|
|
@section Assignment Functions
|
|
@cindex Float assignment functions
|
|
@cindex Assignment functions
|
|
|
|
These functions assign new values to already initialized floats
|
|
(@pxref{Initializing Floats}).
|
|
|
|
@deftypefun void mpf_set (mpf_t @var{rop}, mpf_t @var{op})
|
|
@deftypefunx void mpf_set_ui (mpf_t @var{rop}, unsigned long int @var{op})
|
|
@deftypefunx void mpf_set_si (mpf_t @var{rop}, signed long int @var{op})
|
|
@deftypefunx void mpf_set_d (mpf_t @var{rop}, double @var{op})
|
|
@deftypefunx void mpf_set_z (mpf_t @var{rop}, mpz_t @var{op})
|
|
@deftypefunx void mpf_set_q (mpf_t @var{rop}, mpq_t @var{op})
|
|
Set the value of @var{rop} from @var{op}.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpf_set_str (mpf_t @var{rop}, char *@var{str}, int @var{base})
|
|
Set the value of @var{rop} from the string in @var{str}. The string is of the
|
|
form @samp{M@@N} or, if the base is 10 or less, alternatively @samp{MeN}.
|
|
@samp{M} is the mantissa and @samp{N} is the exponent. The mantissa is always
|
|
in the specified base. The exponent is either in the specified base or, if
|
|
@var{base} is negative, in decimal. The decimal point expected is taken from
|
|
the current locale, on systems providing @code{localeconv}.
|
|
|
|
The argument @var{base} may be in the ranges 2 to 62, or @minus{}62 to
|
|
@minus{}2. Negative values are used to specify that the exponent is in
|
|
decimal.
|
|
|
|
For bases up to 36, case is ignored; upper-case and lower-case letters have
|
|
the same value; for bases 37 to 62, upper-case letters represent the usual
|
|
10..35 while lower-case letters represent 36..61.
|
|
|
|
Unlike the corresponding @code{mpz} function, the base will not be determined
|
|
from the leading characters of the string if @var{base} is 0. This is so that
|
|
numbers like @samp{0.23} are not interpreted as octal.
|
|
|
|
White space is allowed in the string, and is simply ignored. [This is not
|
|
really true; white-space is ignored in the beginning of the string and within
|
|
the mantissa, but not in other places, such as after a minus sign or in the
|
|
exponent. We are considering changing the definition of this function, making
|
|
it fail when there is any white-space in the input, since that makes a lot of
|
|
sense. Please tell us your opinion about this change. Do you really want it
|
|
to accept @nicode{"3 14"} as meaning 314 as it does now?]
|
|
|
|
This function returns 0 if the entire string is a valid number in base
|
|
@var{base}. Otherwise it returns @minus{}1.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_swap (mpf_t @var{rop1}, mpf_t @var{rop2})
|
|
Swap @var{rop1} and @var{rop2} efficiently. Both the values and the
|
|
precisions of the two variables are swapped.
|
|
@end deftypefun
|
|
|
|
|
|
@node Simultaneous Float Init & Assign, Converting Floats, Assigning Floats, Floating-point Functions
|
|
@comment node-name, next, previous, up
|
|
@section Combined Initialization and Assignment Functions
|
|
@cindex Float assignment functions
|
|
@cindex Assignment functions
|
|
@cindex Float initialization functions
|
|
@cindex Initialization functions
|
|
|
|
For convenience, MPIR provides a parallel series of initialize-and-set functions
|
|
which initialize the output and then store the value there. These functions'
|
|
names have the form @code{mpf_init_set@dots{}}.
|
|
|
|
Once the float has been initialized by any of the @code{mpf_init_set@dots{}}
|
|
functions, it can be used as the source or destination operand for the ordinary
|
|
float functions. Don't use an initialize-and-set function on a variable
|
|
already initialized!
|
|
|
|
@deftypefun void mpf_init_set (mpf_t @var{rop}, mpf_t @var{op})
|
|
@deftypefunx void mpf_init_set_ui (mpf_t @var{rop}, unsigned long int @var{op})
|
|
@deftypefunx void mpf_init_set_si (mpf_t @var{rop}, signed long int @var{op})
|
|
@deftypefunx void mpf_init_set_d (mpf_t @var{rop}, double @var{op})
|
|
Initialize @var{rop} and set its value from @var{op}.
|
|
|
|
The precision of @var{rop} will be taken from the active default precision, as
|
|
set by @code{mpf_set_default_prec}.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpf_init_set_str (mpf_t @var{rop}, char *@var{str}, int @var{base})
|
|
Initialize @var{rop} and set its value from the string in @var{str}. See
|
|
@code{mpf_set_str} above for details on the assignment operation.
|
|
|
|
Note that @var{rop} is initialized even if an error occurs. (I.e., you have to
|
|
call @code{mpf_clear} for it.)
|
|
|
|
The precision of @var{rop} will be taken from the active default precision, as
|
|
set by @code{mpf_set_default_prec}.
|
|
@end deftypefun
|
|
|
|
|
|
@node Converting Floats, Float Arithmetic, Simultaneous Float Init & Assign, Floating-point Functions
|
|
@comment node-name, next, previous, up
|
|
@section Conversion Functions
|
|
@cindex Float conversion functions
|
|
@cindex Conversion functions
|
|
|
|
@deftypefun double mpf_get_d (mpf_t @var{op})
|
|
Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding
|
|
towards zero).
|
|
|
|
If the exponent in @var{op} is too big or too small to fit a @code{double}
|
|
then the result is system dependent. If it is too big, an infinity is returned when
|
|
available. If it is too small, @math{0.0} is normally returned. Hardware overflow,
|
|
underflow and denorm traps may or may not occur.
|
|
@end deftypefun
|
|
|
|
@deftypefun double mpf_get_d_2exp (signed long int *@var{exp}, mpf_t @var{op})
|
|
Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding
|
|
towards zero), and with an exponent returned separately.
|
|
|
|
The return value is in the range @math{0.5@le{}@GMPabs{@var{d}}<1} and the
|
|
exponent is stored to @code{*@var{exp}}. @m{@var{d} * 2^{exp}, @var{d} *
|
|
2^@var{exp}} is the (truncated) @var{op} value. If @var{op} is zero, the
|
|
return is @math{0.0} and 0 is stored to @code{*@var{exp}}.
|
|
|
|
@cindex @code{frexp}
|
|
This is similar to the standard C @code{frexp} function (@pxref{Normalization
|
|
Functions,,, libc, The GNU C Library Reference Manual}).
|
|
@end deftypefun
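
The fraction-and-exponent contract shared with @code{frexp} can be checked
using only the standard C library. The sketch below does not call MPIR; the
helper name @code{split_and_rebuild} is hypothetical. Note that
@code{frexp} stores its exponent through an @code{int *}, whereas
@code{mpf_get_d_2exp} uses a @code{signed long int *} and may truncate.

```c
#include <math.h>

/* Hypothetical helper: splits v as frexp does and rebuilds it with
   ldexp, returning the rebuilt value.  The fraction and exponent are
   stored through the pointers. */
double
split_and_rebuild (double v, double *frac, int *expo)
{
  *frac = frexp (v, expo);        /* 0.5 <= |*frac| < 1, or 0 when v is 0 */
  return ldexp (*frac, *expo);    /* *frac times 2 raised to *expo */
}
```

For a nonzero input the stored fraction has magnitude in the range
@math{0.5@le{}@GMPabs{@var{d}}<1}, matching the range documented for
@code{mpf_get_d_2exp}.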
|
|
|
|
@deftypefun long mpf_get_si (mpf_t @var{op})
|
|
@deftypefunx {unsigned long} mpf_get_ui (mpf_t @var{op})
|
|
Convert @var{op} to a @code{long} or @code{unsigned long}, truncating any
|
|
fraction part. If @var{op} is too big for the return type, the result is
|
|
undefined.
|
|
|
|
See also @code{mpf_fits_slong_p} and @code{mpf_fits_ulong_p}
|
|
(@pxref{Miscellaneous Float Functions}).
|
|
@end deftypefun
|
|
|
|
@deftypefun {char *} mpf_get_str (char *@var{str}, mp_exp_t *@var{expptr}, int @var{base}, size_t @var{n_digits}, mpf_t @var{op})
|
|
Convert @var{op} to a string of digits in base @var{base}. @var{base} can be
|
|
2 to 36. Up to @var{n_digits} digits will be generated. Trailing zeros are
|
|
not returned. No more digits than can be accurately represented by @var{op}
|
|
are ever generated. If @var{n_digits} is 0 then that accurate maximum number
|
|
of digits is generated.
|
|
|
|
If @var{str} is @code{NULL}, the result string is allocated using the current
|
|
allocation function (@pxref{Custom Allocation}). The block will be
|
|
@code{strlen(str)+1} bytes, that being exactly enough for the string and
|
|
null-terminator.
|
|
|
|
If @var{str} is not @code{NULL}, it should point to a block of
|
|
@math{@var{n_digits} + 2} bytes, that being enough for the mantissa, a
|
|
possible minus sign, and a null-terminator. When @var{n_digits} is 0 to get
|
|
all significant digits, an application won't be able to know the space
|
|
required, and @var{str} should be @code{NULL} in that case.
|
|
|
|
The generated string is a fraction, with an implicit radix point immediately
|
|
to the left of the first digit. The applicable exponent is written through
|
|
the @var{expptr} pointer. For example, the number 3.1416 would be returned as
|
|
string @nicode{"31416"} and exponent 1.
|
|
|
|
When @var{op} is zero, an empty string is produced and the exponent returned
|
|
is 0.
|
|
|
|
A pointer to the result string is returned, being either the allocated block
|
|
or the given @var{str}.
|
|
@end deftypefun
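
Since the digit string carries an implicit radix point, a caller usually has
to combine it with the exponent before display. The following sketch shows one
possible rendering; it does not call MPIR, the helper name
@code{format_fraction} is hypothetical, and it assumes a base of 10 or less so
that @samp{e} is an unambiguous exponent marker.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: renders a digit string and exponent, in the form
   produced by mpf_get_str, as "0.DIGITSeEXP" (or "-0.DIGITSeEXP").
   An empty digit string, which denotes zero, is rendered as "0".
   Returns whatever snprintf returns. */
int
format_fraction (char *out, size_t outsz, const char *digits, long expo)
{
  const char *sign = "";

  if (digits[0] == '\0')
    return snprintf (out, outsz, "0");

  if (digits[0] == '-')
    {
      sign = "-";
      digits++;                 /* skip the sign, keep the mantissa digits */
    }
  return snprintf (out, outsz, "%s0.%se%ld", sign, digits, expo);
}
```

Given the string @nicode{"31416"} and exponent 1 from the example above, this
helper writes @nicode{"0.31416e1"}.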
|
|
|
|
|
|
@node Float Arithmetic, Float Comparison, Converting Floats, Floating-point Functions
|
|
@comment node-name, next, previous, up
|
|
@section Arithmetic Functions
|
|
@cindex Float arithmetic functions
|
|
@cindex Arithmetic functions
|
|
|
|
@deftypefun void mpf_add (mpf_t @var{rop}, mpf_t @var{op1}, mpf_t @var{op2})
|
|
@deftypefunx void mpf_add_ui (mpf_t @var{rop}, mpf_t @var{op1}, unsigned long int @var{op2})
|
|
Set @var{rop} to @math{@var{op1} + @var{op2}}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_sub (mpf_t @var{rop}, mpf_t @var{op1}, mpf_t @var{op2})
|
|
@deftypefunx void mpf_ui_sub (mpf_t @var{rop}, unsigned long int @var{op1}, mpf_t @var{op2})
|
|
@deftypefunx void mpf_sub_ui (mpf_t @var{rop}, mpf_t @var{op1}, unsigned long int @var{op2})
|
|
Set @var{rop} to @var{op1} @minus{} @var{op2}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_mul (mpf_t @var{rop}, mpf_t @var{op1}, mpf_t @var{op2})
|
|
@deftypefunx void mpf_mul_ui (mpf_t @var{rop}, mpf_t @var{op1}, unsigned long int @var{op2})
|
|
Set @var{rop} to @math{@var{op1} @GMPtimes{} @var{op2}}.
|
|
@end deftypefun
|
|
|
|
Division is undefined if the divisor is zero, and passing a zero divisor to the
|
|
divide functions will make these functions intentionally divide by zero. This
|
|
lets the user handle arithmetic exceptions in these functions in the same
|
|
manner as other arithmetic exceptions.
|
|
|
|
@deftypefun void mpf_div (mpf_t @var{rop}, mpf_t @var{op1}, mpf_t @var{op2})
|
|
@deftypefunx void mpf_ui_div (mpf_t @var{rop}, unsigned long int @var{op1}, mpf_t @var{op2})
|
|
@deftypefunx void mpf_div_ui (mpf_t @var{rop}, mpf_t @var{op1}, unsigned long int @var{op2})
|
|
@cindex Division functions
|
|
Set @var{rop} to @var{op1}/@var{op2}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_sqrt (mpf_t @var{rop}, mpf_t @var{op})
|
|
@deftypefunx void mpf_sqrt_ui (mpf_t @var{rop}, unsigned long int @var{op})
|
|
@cindex Root extraction functions
|
|
Set @var{rop} to @m{\sqrt{@var{op}}, the square root of @var{op}}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_pow_ui (mpf_t @var{rop}, mpf_t @var{op1}, unsigned long int @var{op2})
|
|
@cindex Exponentiation functions
|
|
@cindex Powering functions
|
|
Set @var{rop} to @m{@var{op1}^{op2}, @var{op1} raised to the power @var{op2}}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_neg (mpf_t @var{rop}, mpf_t @var{op})
|
|
Set @var{rop} to @minus{}@var{op}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_abs (mpf_t @var{rop}, mpf_t @var{op})
|
|
Set @var{rop} to the absolute value of @var{op}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_mul_2exp (mpf_t @var{rop}, mpf_t @var{op1}, unsigned long int @var{op2})
|
|
Set @var{rop} to @m{@var{op1} \times 2^{op2}, @var{op1} times 2 raised to
|
|
@var{op2}}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_div_2exp (mpf_t @var{rop}, mpf_t @var{op1}, unsigned long int @var{op2})
|
|
Set @var{rop} to @m{@var{op1}/2^{op2}, @var{op1} divided by 2 raised to
|
|
@var{op2}}.
|
|
@end deftypefun
|
|
|
|
@node Float Comparison, I/O of Floats, Float Arithmetic, Floating-point Functions
|
|
@comment node-name, next, previous, up
|
|
@section Comparison Functions
|
|
@cindex Float comparison functions
|
|
@cindex Comparison functions
|
|
|
|
@deftypefun int mpf_cmp (mpf_t @var{op1}, mpf_t @var{op2})
|
|
@deftypefunx int mpf_cmp_d (mpf_t @var{op1}, double @var{op2})
|
|
@deftypefunx int mpf_cmp_ui (mpf_t @var{op1}, unsigned long int @var{op2})
|
|
@deftypefunx int mpf_cmp_si (mpf_t @var{op1}, signed long int @var{op2})
|
|
Compare @var{op1} and @var{op2}. Return a positive value if @math{@var{op1} >
|
|
@var{op2}}, zero if @math{@var{op1} = @var{op2}}, and a negative value if
|
|
@math{@var{op1} < @var{op2}}.
|
|
|
|
@code{mpf_cmp_d} can be called with an infinity, but results are undefined for
|
|
a NaN.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpf_eq (mpf_t @var{op1}, mpf_t @var{op2}, unsigned long int @var{op3})
|
|
Return non-zero if the first @var{op3} bits of @var{op1} and @var{op2} are
|
|
equal, zero otherwise. I.e., test if @var{op1} and @var{op2} are approximately
|
|
equal.
|
|
|
|
Caution: Currently only whole limbs are compared, and only in an exact
|
|
fashion. In the future values like 1000 and 0111 may be considered the same
|
|
to 3 bits (on the basis that their difference is that small).
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_reldiff (mpf_t @var{rop}, mpf_t @var{op1}, mpf_t @var{op2})
|
|
Compute the relative difference between @var{op1} and @var{op2} and store the
|
|
result in @var{rop}. This is @math{@GMPabs{@var{op1}-@var{op2}}/@var{op1}}.
|
|
@end deftypefun
|
|
|
|
@deftypefn Macro int mpf_sgn (mpf_t @var{op})
|
|
@cindex Sign tests
|
|
@cindex Float sign tests
|
|
Return @math{+1} if @math{@var{op} > 0}, 0 if @math{@var{op} = 0}, and
|
|
@math{-1} if @math{@var{op} < 0}.
|
|
|
|
This function is actually implemented as a macro. It evaluates its argument
|
|
multiple times.
|
|
@end deftypefn
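
The practical consequence of a macro evaluating its argument more than once is
that argument expressions with side effects can misbehave. The sketch below
uses a hypothetical stand-in macro, @code{SIGN_OF}, on plain integers; it is
not MPIR's definition of @code{mpf_sgn} and only illustrates the hazard.

```c
/* Hypothetical stand-in for a sign macro that, like mpf_sgn, mentions
   its argument more than once.  This is not MPIR's definition. */
#define SIGN_OF(x) ((x) < 0 ? -1 : (x) > 0)

/* Returns how many times the argument expression was evaluated when it
   contains a side effect (here, a post-increment). */
int
count_evaluations (int start)
{
  int v = start;
  int before = v;

  (void) SIGN_OF (v++);
  return v - before;
}
```

A safe pattern is to evaluate any such expression into a variable first and
pass only the variable to the macro.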
|
|
|
|
@node I/O of Floats, Miscellaneous Float Functions, Float Comparison, Floating-point Functions
|
|
@comment node-name, next, previous, up
|
|
@section Input and Output Functions
|
|
@cindex Float input and output functions
|
|
@cindex Input functions
|
|
@cindex Output functions
|
|
@cindex I/O functions
|
|
|
|
These functions read from a stdio stream or write to
|
|
a stdio stream. Passing a @code{NULL} pointer for a @var{stream} argument to
|
|
any of these functions will make them read from @code{stdin} and write to
|
|
@code{stdout}, respectively.
|
|
|
|
When using any of these functions, it is a good idea to include @file{stdio.h}
|
|
before @file{mpir.h}, since that will allow @file{mpir.h} to define prototypes
|
|
for these functions.
|
|
|
|
@deftypefun size_t mpf_out_str (FILE *@var{stream}, int @var{base}, size_t @var{n_digits}, mpf_t @var{op})
|
|
Print @var{op} to @var{stream}, as a string of digits. Return the number of
|
|
bytes written, or if an error occurred, return 0.
|
|
|
|
The mantissa is prefixed with a @samp{0.} and is in the given @var{base},
|
|
which may vary from 2 to 36. An exponent is then printed, separated by an
|
|
@samp{e}, or if @var{base} is greater than 10 then by an @samp{@@}. The
|
|
exponent is always in decimal. The decimal point follows the current locale,
|
|
on systems providing @code{localeconv}.
|
|
|
|
Up to @var{n_digits} will be printed from the mantissa, except that no more
|
|
digits than are accurately representable by @var{op} will be printed.
|
|
@var{n_digits} can be 0 to select that accurate maximum.
|
|
@end deftypefun
|
|
|
|
@deftypefun size_t mpf_inp_str (mpf_t @var{rop}, FILE *@var{stream}, int @var{base})
|
|
Read a string in base @var{base} from @var{stream}, and put the read float in
|
|
@var{rop}. The string is of the form @samp{M@@N} or, if the base is 10 or
|
|
less, alternatively @samp{MeN}. @samp{M} is the mantissa and @samp{N} is the
|
|
exponent. The mantissa is always in the specified base. The exponent is
|
|
either in the specified base or, if @var{base} is negative, in decimal. The
|
|
decimal point expected is taken from the current locale, on systems providing
|
|
@code{localeconv}.
|
|
|
|
The argument @var{base} may be in the ranges 2 to 36, or @minus{}36 to
|
|
@minus{}2. Negative values are used to specify that the exponent is in
|
|
decimal.
|
|
|
|
Unlike the corresponding @code{mpz} function, the base will not be determined
|
|
from the leading characters of the string if @var{base} is 0. This is so that
|
|
numbers like @samp{0.23} are not interpreted as octal.
|
|
|
|
Return the number of bytes read, or if an error occurred, return 0.
|
|
@end deftypefun
|
|
|
|
@c @deftypefun void mpf_out_raw (FILE *@var{stream}, mpf_t @var{float})
|
|
@c Output @var{float} on stdio stream @var{stream}, in raw binary
|
|
@c format. The float is written in a portable format, with 4 bytes of
|
|
@c size information, and that many bytes of limbs. Both the size and the
|
|
@c limbs are written in decreasing significance order.
|
|
@c @end deftypefun
|
|
|
|
@c @deftypefun void mpf_inp_raw (mpf_t @var{float}, FILE *@var{stream})
|
|
@c Input from stdio stream @var{stream} in the format written by
|
|
@c @code{mpf_out_raw}, and put the result in @var{float}.
|
|
@c @end deftypefun
|
|
|
|
|
|
@node Miscellaneous Float Functions, , I/O of Floats, Floating-point Functions
|
|
@comment node-name, next, previous, up
|
|
@section Miscellaneous Functions
|
|
@cindex Miscellaneous float functions
|
|
@cindex Float miscellaneous functions
|
|
|
|
@deftypefun void mpf_ceil (mpf_t @var{rop}, mpf_t @var{op})
|
|
@deftypefunx void mpf_floor (mpf_t @var{rop}, mpf_t @var{op})
|
|
@deftypefunx void mpf_trunc (mpf_t @var{rop}, mpf_t @var{op})
|
|
@cindex Rounding functions
|
|
@cindex Float rounding functions
|
|
Set @var{rop} to @var{op} rounded to an integer. @code{mpf_ceil} rounds to the
|
|
next higher integer, @code{mpf_floor} to the next lower, and @code{mpf_trunc}
|
|
to the integer towards zero.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpf_integer_p (mpf_t @var{op})
|
|
Return non-zero if @var{op} is an integer.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mpf_fits_ulong_p (mpf_t @var{op})
|
|
@deftypefunx int mpf_fits_slong_p (mpf_t @var{op})
|
|
@deftypefunx int mpf_fits_uint_p (mpf_t @var{op})
|
|
@deftypefunx int mpf_fits_sint_p (mpf_t @var{op})
|
|
@deftypefunx int mpf_fits_ushort_p (mpf_t @var{op})
|
|
@deftypefunx int mpf_fits_sshort_p (mpf_t @var{op})
|
|
Return non-zero if @var{op} would fit in the respective C data type, when
|
|
truncated to an integer.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_urandomb (mpf_t @var{rop}, gmp_randstate_t @var{state}, unsigned long int @var{nbits})
|
|
@cindex Random number functions
|
|
@cindex Float random number functions
|
|
Generate a uniformly distributed random float in @var{rop}, such that @math{0
|
|
@le{} @var{rop} < 1}, with @var{nbits} significant bits in the mantissa.
|
|
|
|
The variable @var{state} must be initialized by calling one of the
|
|
@code{gmp_randinit} functions (@ref{Random State Initialization}) before
|
|
invoking this function.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpf_random2 (mpf_t @var{rop}, mp_size_t @var{max_size}, mp_exp_t @var{exp})
|
|
Generate a random float of at most @var{max_size} limbs, with long strings of
|
|
zeros and ones in the binary representation. The exponent of the number is in
|
|
the interval @minus{}@var{exp} to @var{exp} (in limbs). This function is
|
|
useful for testing functions and algorithms, since these kinds of random
|
|
numbers have proven to be more likely to trigger corner-case bugs. Negative
|
|
random numbers are generated when @var{max_size} is negative.
|
|
@end deftypefun
|
|
|
|
@c @deftypefun size_t mpf_size (mpf_t @var{op})
|
|
@c Return the size of @var{op} measured in number of limbs. If @var{op} is
|
|
@c zero, the returned value will be zero. (@xref{Nomenclature}, for an
|
|
@c explanation of the concept @dfn{limb}.)
|
|
@c
|
|
@c @strong{This function is obsolete. It will disappear from future MPIR
|
|
@c releases.}
|
|
@c @end deftypefun
|
|
|
|
|
|
@node Low-level Functions, Random Number Functions, Floating-point Functions, Top
|
|
@comment node-name, next, previous, up
|
|
@chapter Low-level Functions
|
|
@cindex Low-level functions
|
|
|
|
This chapter describes low-level MPIR functions, used to implement the
|
|
high-level MPIR functions, but also intended for time-critical user code.
|
|
|
|
These functions start with the prefix @code{mpn_}.
|
|
|
|
@c 1. Some of these function clobber input operands.
|
|
@c
|
|
|
|
The @code{mpn} functions are designed to be as fast as possible, @strong{not}
|
|
to provide a coherent calling interface. The different functions have somewhat
|
|
similar interfaces, but there are variations that make them hard to use. These
|
|
functions do as little as possible apart from the real multiple precision
|
|
computation, so that no time is spent on things that not all callers need.
|
|
|
|
A source operand is specified by a pointer to the least significant limb and a
|
|
limb count. A destination operand is specified by just a pointer. It is the
|
|
responsibility of the caller to ensure that the destination has enough space
|
|
for storing the result.
|
|
|
|
With this way of specifying operands, it is possible to perform computations on
|
|
subranges of an argument, and store the result into a subrange of a
|
|
destination.
|
|
|
|
A common requirement for all functions is that each source area needs at least
|
|
one limb. No size argument may be zero. Unless otherwise stated, in-place
|
|
operations are allowed where source and destination are the same, but not where
|
|
they only partly overlap.
|
|
|
|
The @code{mpn} functions are the base for the implementation of the
|
|
@code{mpz_}, @code{mpf_}, and @code{mpq_} functions.
|
|
|
|
This example adds the number beginning at @var{s1p} and the number beginning at
|
|
@var{s2p} and writes the sum at @var{destp}. All areas have @var{n} limbs.
|
|
|
|
@example
|
|
cy = mpn_add_n (destp, s1p, s2p, n);
|
|
@end example
|
|
|
|
It should be noted that the @code{mpn} functions make no attempt to identify
|
|
high or low zero limbs on their operands, or other special forms. On random
|
|
data such cases will be unlikely and it'd be wasteful for every function to
|
|
check every time. An application knowing something about its data can take
|
|
steps to trim or perhaps split its calculations.
|
|
@c
|
|
@c For reference, within gmp mpz_t operands never have high zero limbs, and
|
|
@c we rate low zero limbs as unlikely too (or something an application should
|
|
@c handle). This is a prime motivation for not stripping zero limbs in say
|
|
@c mpn_mul_n etc.
|
|
@c
|
|
@c Other applications doing variable-length calculations will quite likely do
|
|
@c something similar to mpz. And even if not then it's highly likely zero
|
|
@c limb stripping can be done at just a few judicious points, which will be
|
|
@c more efficient than having lots of mpn functions checking every time.
|
|
|
|
@sp 1
|
|
@noindent
|
|
In the notation used below, a source operand is identified by the pointer to
|
|
the least significant limb, and the limb count in braces. For example,
|
|
@{@var{s1p}, @var{s1n}@}.
|
|
|
|
@deftypefun mp_limb_t mpn_add_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
|
|
Add @{@var{s1p}, @var{n}@} and @{@var{s2p}, @var{n}@}, and write the @var{n}
|
|
least significant limbs of the result to @var{rp}. Return carry, either 0 or
|
|
1.
|
|
|
|
This is the lowest-level function for addition. It is the preferred function
|
|
for addition, since it is written in assembly for most CPUs. For addition of
|
|
a variable to itself (i.e., @var{s1p} equals @var{s2p}), use @code{mpn_lshift}
|
|
with a count of 1 for optimal speed.
|
|
@end deftypefun
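
What @code{mpn_add_n} computes can be modelled in portable C. The sketch
below is an illustrative reference only, not MPIR's implementation: the name
@code{ref_add_n} is hypothetical and limbs are assumed to be 32 bits wide,
whereas the real @code{mp_limb_t} width depends on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative reference only, assuming 32-bit limbs stored least
   significant first.  It is not MPIR's implementation. */
uint32_t
ref_add_n (uint32_t *rp, const uint32_t *s1p, const uint32_t *s2p, size_t n)
{
  uint32_t carry = 0;
  size_t i;

  assert (n >= 1);                       /* no size argument may be zero */
  for (i = 0; i < n; i++)
    {
      uint64_t sum = (uint64_t) s1p[i] + s2p[i] + carry;
      rp[i] = (uint32_t) sum;            /* low 32 bits of the limb sum */
      carry = (uint32_t) (sum >> 32);    /* 0 or 1 */
    }
  return carry;
}
```

Each source limb is read before the corresponding destination limb is
written, so the model also works in place when @var{rp} equals @var{s1p} or
@var{s2p}.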
|
|
|
|
@deftypefun mp_limb_t mpn_add_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb})
|
|
Add @{@var{s1p}, @var{n}@} and @var{s2limb}, and write the @var{n} least
|
|
significant limbs of the result to @var{rp}. Return carry, either 0 or 1.
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_limb_t mpn_add (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, const mp_limb_t *@var{s2p}, mp_size_t @var{s2n})
|
|
Add @{@var{s1p}, @var{s1n}@} and @{@var{s2p}, @var{s2n}@}, and write the
|
|
@var{s1n} least significant limbs of the result to @var{rp}. Return carry,
|
|
either 0 or 1.
|
|
|
|
This function requires that @var{s1n} is greater than or equal to @var{s2n}.
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_limb_t mpn_sub_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
|
|
Subtract @{@var{s2p}, @var{n}@} from @{@var{s1p}, @var{n}@}, and write the
|
|
@var{n} least significant limbs of the result to @var{rp}. Return borrow,
|
|
either 0 or 1.
|
|
|
|
This is the lowest-level function for subtraction. It is the preferred
|
|
function for subtraction, since it is written in assembly for most CPUs.
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_limb_t mpn_sub_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb})
|
|
Subtract @var{s2limb} from @{@var{s1p}, @var{n}@}, and write the @var{n} least
|
|
significant limbs of the result to @var{rp}. Return borrow, either 0 or 1.
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_limb_t mpn_sub (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, const mp_limb_t *@var{s2p}, mp_size_t @var{s2n})
|
|
Subtract @{@var{s2p}, @var{s2n}@} from @{@var{s1p}, @var{s1n}@}, and write the
|
|
@var{s1n} least significant limbs of the result to @var{rp}. Return borrow,
|
|
either 0 or 1.
|
|
|
|
This function requires that @var{s1n} is greater than or equal to
|
|
@var{s2n}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpn_mul_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
|
|
Multiply @{@var{s1p}, @var{n}@} and @{@var{s2p}, @var{n}@}, and write the
|
|
2*@var{n}-limb result to @var{rp}.
|
|
|
|
The destination has to have space for 2*@var{n} limbs, even if the product's
|
|
most significant limb is zero. No overlap is permitted between the
|
|
destination and either source.
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_limb_t mpn_mul_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb})
|
|
Multiply @{@var{s1p}, @var{n}@} by @var{s2limb}, and write the @var{n} least
|
|
significant limbs of the product to @var{rp}. Return the most significant
|
|
limb of the product. @{@var{s1p}, @var{n}@} and @{@var{rp}, @var{n}@} are
|
|
allowed to overlap provided @math{@var{rp} @le{} @var{s1p}}.
|
|
|
|
This is a low-level function that is a building block for general
|
|
multiplication as well as other operations in MPIR@. It is written in assembly
|
|
for most CPUs.
|
|
|
|
Don't call this function if @var{s2limb} is a power of 2; use @code{mpn_lshift}
|
|
with a count equal to the base-2 logarithm of @var{s2limb} instead, for optimal speed.
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_limb_t mpn_addmul_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb})
|
|
Multiply @{@var{s1p}, @var{n}@} and @var{s2limb}, and add the @var{n} least
|
|
significant limbs of the product to @{@var{rp}, @var{n}@} and write the result
|
|
to @var{rp}. Return the most significant limb of the product, plus carry-out
|
|
from the addition.
|
|
|
|
This is a low-level function that is a building block for general
|
|
multiplication as well as other operations in MPIR@. It is written in assembly
|
|
for most CPUs.
|
|
@end deftypefun
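
To show how single-limb multiplies act as building blocks for general
multiplication, the following sketch builds a schoolbook product from
portable models of @code{mpn_mul_1} and @code{mpn_addmul_1}. All three
function names are hypothetical, limbs are assumed to be 32 bits wide, and
none of this is MPIR's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative models assuming 32-bit limbs, least significant first.
   They mirror the documented behaviour of mpn_mul_1 and mpn_addmul_1
   but are not MPIR's implementation. */
static uint32_t
ref_mul_1 (uint32_t *rp, const uint32_t *s1p, size_t n, uint32_t s2limb)
{
  uint32_t high = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      uint64_t t = (uint64_t) s1p[i] * s2limb + high;
      rp[i] = (uint32_t) t;
      high = (uint32_t) (t >> 32);
    }
  return high;                  /* most significant limb of the product */
}

static uint32_t
ref_addmul_1 (uint32_t *rp, const uint32_t *s1p, size_t n, uint32_t s2limb)
{
  uint32_t high = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      /* Cannot overflow: (2^32-1)^2 + 2*(2^32-1) equals 2^64 - 1. */
      uint64_t t = (uint64_t) s1p[i] * s2limb + rp[i] + high;
      rp[i] = (uint32_t) t;
      high = (uint32_t) (t >> 32);
    }
  return high;                  /* high product limb plus carry-out */
}

/* Schoolbook product of {s1p, s1n} and {s2p, s2n} into rp, which must
   have room for s1n + s2n limbs and must not overlap the sources. */
void
ref_mul_basecase (uint32_t *rp, const uint32_t *s1p, size_t s1n,
                  const uint32_t *s2p, size_t s2n)
{
  size_t j;

  assert (s1n >= 1 && s2n >= 1);
  rp[s1n] = ref_mul_1 (rp, s1p, s1n, s2p[0]);
  for (j = 1; j < s2n; j++)
    rp[s1n + j] = ref_addmul_1 (rp + j, s1p, s1n, s2p[j]);
}
```

Each pass multiplies the first operand by one limb of the second and
accumulates it one limb higher, with the returned limb becoming the new top
limb of the partial result.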
|
|
|
|
@deftypefun mp_limb_t mpn_submul_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb})
|
|
Multiply @{@var{s1p}, @var{n}@} and @var{s2limb}, and subtract the @var{n}
|
|
least significant limbs of the product from @{@var{rp}, @var{n}@} and write the
|
|
result to @var{rp}. Return the most significant limb of the product, minus
|
|
borrow-out from the subtraction.
|
|
|
|
This is a low-level function that is a building block for general
|
|
multiplication and division as well as other operations in MPIR@. It is written
|
|
in assembly for most CPUs.
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_limb_t mpn_mul (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, const mp_limb_t *@var{s2p}, mp_size_t @var{s2n})
|
|
Multiply @{@var{s1p}, @var{s1n}@} and @{@var{s2p}, @var{s2n}@}, and write the
|
|
result to @var{rp}. Return the most significant limb of the result.
|
|
|
|
The destination has to have space for @var{s1n} + @var{s2n} limbs, even if the
|
|
result might be one limb smaller.
|
|
|
|
This function requires that @var{s1n} is greater than or equal to
|
|
@var{s2n}. The destination must be distinct from both input operands.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpn_tdiv_qr (mp_limb_t *@var{qp}, mp_limb_t *@var{rp}, mp_size_t @var{qxn}, const mp_limb_t *@var{np}, mp_size_t @var{nn}, const mp_limb_t *@var{dp}, mp_size_t @var{dn})
|
|
Divide @{@var{np}, @var{nn}@} by @{@var{dp}, @var{dn}@} and put the quotient
|
|
at @{@var{qp}, @var{nn}@minus{}@var{dn}+1@} and the remainder at @{@var{rp},
|
|
@var{dn}@}. The quotient is rounded towards 0.
|
|
|
|
No overlap is permitted between arguments. @var{nn} must be greater than or
|
|
equal to @var{dn}. The most significant limb of @var{dp} must be non-zero.
|
|
The @var{qxn} operand must be zero.
|
|
@comment FIXME: Relax overlap requirements!
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_limb_t mpn_divrem (mp_limb_t *@var{r1p}, mp_size_t @var{qxn}, mp_limb_t *@var{rs2p}, mp_size_t @var{rs2n}, const mp_limb_t *@var{s3p}, mp_size_t @var{s3n})
|
|
[This function is obsolete. Please call @code{mpn_tdiv_qr} instead for best
|
|
performance.]
|
|
|
|
Divide @{@var{rs2p}, @var{rs2n}@} by @{@var{s3p}, @var{s3n}@}, and write the
|
|
quotient at @var{r1p}, with the exception of the most significant limb, which
|
|
is returned. The remainder replaces the dividend at @var{rs2p}; it will be
|
|
@var{s3n} limbs long (i.e., as many limbs as the divisor).
|
|
|
|
In addition to an integer quotient, @var{qxn} fraction limbs are developed, and
|
|
stored after the integral limbs. For most usages, @var{qxn} will be zero.
|
|
|
|
It is required that @var{rs2n} is greater than or equal to @var{s3n}. It is
|
|
required that the most significant bit of the divisor is set.
|
|
|
|
If the quotient is not needed, pass @var{rs2p} + @var{s3n} as @var{r1p}. Aside
|
|
from that special case, no overlap between arguments is permitted.
|
|
|
|
Return the most significant limb of the quotient, either 0 or 1.
|
|
|
|
The area at @var{r1p} needs to be @var{rs2n} @minus{} @var{s3n} + @var{qxn}
|
|
limbs large.
|
|
@end deftypefun
|
|
|
|
@deftypefn Function mp_limb_t mpn_divrem_1 (mp_limb_t *@var{r1p}, mp_size_t @var{qxn}, @w{mp_limb_t *@var{s2p}}, mp_size_t @var{s2n}, mp_limb_t @var{s3limb})
|
|
@deftypefnx Macro mp_limb_t mpn_divmod_1 (mp_limb_t *@var{r1p}, mp_limb_t *@var{s2p}, @w{mp_size_t @var{s2n}}, @w{mp_limb_t @var{s3limb}})
|
|
Divide @{@var{s2p}, @var{s2n}@} by @var{s3limb}, and write the quotient at
|
|
@var{r1p}. Return the remainder.
|
|
|
|
The integer quotient is written to @{@var{r1p}+@var{qxn}, @var{s2n}@} and in
|
|
addition @var{qxn} fraction limbs are developed and written to @{@var{r1p},
|
|
@var{qxn}@}. Either or both of @var{s2n} and @var{qxn} can be zero. For most
|
|
usages, @var{qxn} will be zero.
|
|
|
|
@code{mpn_divmod_1} exists for upward source compatibility and is simply a
|
|
macro calling @code{mpn_divrem_1} with a @var{qxn} of 0.
|
|
|
|
The areas at @var{r1p} and @var{s2p} have to be identical or completely
|
|
separate, not partially overlapping.
|
|
@end deftypefn
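The quotient layout described for @code{mpn_divrem_1} can be illustrated with a small standalone model. This is only a sketch, not MPIR's implementation: @code{model_divrem_1} is a hypothetical helper that assumes 32-bit limbs without nails and uses 64-bit arithmetic for each step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the mpn_divrem_1 layout, assuming 32-bit limbs.
   The integer quotient goes to {qp+qxn, nn}, the fraction limbs to
   {qp, qxn}, and the remainder is returned. */
static uint32_t
model_divrem_1 (uint32_t *qp, size_t qxn,
                const uint32_t *np, size_t nn, uint32_t d)
{
  uint64_t r = 0;

  assert (d != 0);

  /* Integer part: schoolbook division, most significant limb first. */
  for (size_t i = nn; i-- > 0;)
    {
      uint64_t cur = (r << 32) | np[i];
      qp[qxn + i] = (uint32_t) (cur / d);
      r = cur % d;
    }

  /* Fraction part: keep dividing with zero limbs appended. */
  for (size_t i = qxn; i-- > 0;)
    {
      uint64_t cur = r << 32;
      qp[i] = (uint32_t) (cur / d);
      r = cur % d;
    }

  return (uint32_t) r;
}
```

For instance, dividing the single limb 7 by 2 with one fraction limb gives the integer quotient 3 in the high limb and the fraction limb @code{0x80000000} (one half) in the low limb, with a zero remainder.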
|
|
|
|
@deftypefun mp_limb_t mpn_divmod (mp_limb_t *@var{r1p}, mp_limb_t *@var{rs2p}, mp_size_t @var{rs2n}, const mp_limb_t *@var{s3p}, mp_size_t @var{s3n})
|
|
[This function is obsolete. Please call @code{mpn_tdiv_qr} instead for best
|
|
performance.]
|
|
@end deftypefun
|
|
|
|
@deftypefn Macro mp_limb_t mpn_divexact_by3 (mp_limb_t *@var{rp}, mp_limb_t *@var{sp}, @w{mp_size_t @var{n}})
|
|
@deftypefnx Function mp_limb_t mpn_divexact_by3c (mp_limb_t *@var{rp}, mp_limb_t *@var{sp}, @w{mp_size_t @var{n}}, mp_limb_t @var{carry})
|
|
Divide @{@var{sp}, @var{n}@} by 3, expecting it to divide exactly, and writing
|
|
the result to @{@var{rp}, @var{n}@}. If 3 divides exactly, the return value is
|
|
zero and the result is the quotient. If not, the return value is non-zero and
|
|
the result won't be anything useful.
|
|
|
|
@code{mpn_divexact_by3c} takes an initial carry parameter, which can be the
|
|
return value from a previous call, so a large calculation can be done piece by
|
|
piece from low to high. @code{mpn_divexact_by3} is simply a macro calling
|
|
@code{mpn_divexact_by3c} with a 0 carry parameter.
|
|
|
|
These routines use a multiply-by-inverse and will be faster than
|
|
@code{mpn_divrem_1} on CPUs with fast multiplication but slow division.
|
|
|
|
The source @math{a}, result @math{q}, size @math{n}, initial carry @math{i},
|
|
and return value @math{c} satisfy @m{cb^n+a-i=3q, c*b^n + a-i = 3*q}, where
|
|
@m{b=2\GMPraise{@code{GMP\_NUMB\_BITS}}, b=2^GMP_NUMB_BITS}. The
|
|
return @math{c} is always 0, 1 or 2, and the initial carry @math{i} must also
|
|
be 0, 1 or 2 (these are both borrows really). When @math{c=0} clearly
|
|
@math{q=(a-i)/3}. When @m{c \neq 0, c!=0}, the remainder @math{(a-i) @bmod{}
|
|
3} is given by @math{3-c}, because @math{b @equiv{} 1 @bmod{} 3} (when
|
|
@code{mp_bits_per_limb} is even, which is always so currently).
|
|
@end deftypefn
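The carry identity given for @code{mpn_divexact_by3c} can be checked with a standalone model. The sketch below is not MPIR's code: @code{model_divexact_by3c} and @code{MODEL_INVERSE_3} are hypothetical names, and 32-bit limbs without nails are assumed. It follows the multiply-by-inverse idea mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, assuming 32-bit limbs and no nails.
   3 * 0xAAAAAAAB == 1 mod 2^32, so multiplying by it divides by 3
   whenever the division is exact modulo 2^32. */
#define MODEL_INVERSE_3 UINT32_C(0xAAAAAAAB)

/* Divide {sp, n} by 3, least significant limb first, starting from the
   given carry (0, 1 or 2).  Returns the final carry c. */
static uint32_t
model_divexact_by3c (uint32_t *rp, const uint32_t *sp, size_t n,
                     uint32_t carry)
{
  assert (carry <= 2);

  for (size_t i = 0; i < n; i++)
    {
      uint32_t s = sp[i];
      uint32_t l = s - carry;             /* apply the incoming borrow */
      uint32_t borrow = (s < carry);
      uint32_t q = l * MODEL_INVERSE_3;   /* 3*q == l mod 2^32 */

      rp[i] = q;

      /* 3*q == l + k*2^32 with k in {0, 1, 2}; k adds to the borrow. */
      carry = borrow
        + (q >= UINT32_C(0x55555556))
        + (q >= UINT32_C(0xAAAAAAAB));
    }

  return carry;
}
```

With this model the single limb 7 returns @math{c=2}, which matches the remainder rule above, since @math{7 @bmod{} 3 = 1 = 3-c}. Calling the model once per limb and passing each return value as the next carry gives the same quotient as one call over the whole operand.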
|
|
|
|
@deftypefun mp_limb_t mpn_mod_1 (mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, mp_limb_t @var{s2limb})
|
|
Divide @{@var{s1p}, @var{s1n}@} by @var{s2limb}, and return the remainder.
|
|
@var{s1n} can be zero.
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_limb_t mpn_bdivmod (mp_limb_t *@var{rp}, mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, const mp_limb_t *@var{s2p}, mp_size_t @var{s2n}, unsigned long int @var{d})
|
|
This function puts the low
|
|
@math{@GMPfloor{@var{d}/@nicode{mp\_bits\_per\_limb}}} limbs of @var{q} =
|
|
@{@var{s1p}, @var{s1n}@}/@{@var{s2p}, @var{s2n}@} mod @m{2^d,2^@var{d}} at
|
|
@var{rp}, and returns the high @var{d} mod @code{mp_bits_per_limb} bits of
|
|
@var{q}.
|
|
|
|
@{@var{s1p}, @var{s1n}@} - @var{q} * @{@var{s2p}, @var{s2n}@} mod @m{2
|
|
\GMPraise{@var{s1n}*@code{mp\_bits\_per\_limb}},
|
|
2^(@var{s1n}*@nicode{mp\_bits\_per\_limb})} is placed at @var{s1p}. Since the
|
|
low @math{@GMPfloor{@var{d}/@nicode{mp\_bits\_per\_limb}}} limbs of this
|
|
difference are zero, it is possible to overwrite the low limbs at @var{s1p}
|
|
with this difference, provided @math{@var{rp} @le{} @var{s1p}}.
|
|
|
|
This function requires that @math{@var{s1n} * @nicode{mp\_bits\_per\_limb}
|
|
@ge{} @var{d}}, and that @{@var{s2p}, @var{s2n}@} is odd.
|
|
|
|
@strong{This interface is preliminary. It might change incompatibly in future
|
|
revisions.}
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_limb_t mpn_lshift (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n}, unsigned int @var{count})
|
|
Shift @{@var{sp}, @var{n}@} left by @var{count} bits, and write the result to
|
|
@{@var{rp}, @var{n}@}. The bits shifted out at the left are returned in the
|
|
least significant @var{count} bits of the return value (the rest of the return
|
|
value is zero).
|
|
|
|
@var{count} must be in the range 1 to @nicode{mp_bits_per_limb}@minus{}1. The
|
|
regions @{@var{sp}, @var{n}@} and @{@var{rp}, @var{n}@} may overlap, provided
|
|
@math{@var{rp} @ge{} @var{sp}}.
|
|
|
|
This function is written in assembly for most CPUs.
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_limb_t mpn_rshift (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n}, unsigned int @var{count})
|
|
Shift @{@var{sp}, @var{n}@} right by @var{count} bits, and write the result to
|
|
@{@var{rp}, @var{n}@}. The bits shifted out at the right are returned in the
|
|
most significant @var{count} bits of the return value (the rest of the return
|
|
value is zero).
|
|
|
|
@var{count} must be in the range 1 to @nicode{mp_bits_per_limb}@minus{}1. The
|
|
regions @{@var{sp}, @var{n}@} and @{@var{rp}, @var{n}@} may overlap, provided
|
|
@math{@var{rp} @le{} @var{sp}}.
|
|
|
|
This function is written in assembly for most CPUs.
|
|
@end deftypefun
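The return values and overlap rules of @code{mpn_lshift} and @code{mpn_rshift} can be seen in a standalone model. This is a sketch under stated assumptions, not the assembly code MPIR uses: @code{model_lshift} and @code{model_rshift} are hypothetical helpers working on 32-bit limbs without nails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical models, assuming 32-bit limbs, n >= 1 and
   1 <= count <= 31. */

/* Shift left; works from the high limb down, so rp >= sp may overlap.
   The bits shifted out are returned in the low count bits. */
static uint32_t
model_lshift (uint32_t *rp, const uint32_t *sp, size_t n, unsigned count)
{
  assert (n >= 1 && count >= 1 && count <= 31);

  unsigned tnc = 32 - count;
  uint32_t high = sp[n - 1];
  uint32_t ret = high >> tnc;

  for (size_t i = n - 1; i > 0; i--)
    {
      uint32_t low = sp[i - 1];
      rp[i] = (high << count) | (low >> tnc);
      high = low;
    }
  rp[0] = high << count;
  return ret;
}

/* Shift right; works from the low limb up, so rp <= sp may overlap.
   The bits shifted out are returned in the high count bits. */
static uint32_t
model_rshift (uint32_t *rp, const uint32_t *sp, size_t n, unsigned count)
{
  assert (n >= 1 && count >= 1 && count <= 31);

  unsigned tnc = 32 - count;
  uint32_t low = sp[0];
  uint32_t ret = low << tnc;

  for (size_t i = 0; i + 1 < n; i++)
    {
      uint32_t high = sp[i + 1];
      rp[i] = (low >> count) | (high << tnc);
      low = high;
    }
  rp[n - 1] = low >> count;
  return ret;
}
```

The direction of each loop is what makes the stated overlap conditions safe: every source limb is read before the destination limb that would overwrite it is stored.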
|
|
|
|
@deftypefun int mpn_cmp (const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
|
|
Compare @{@var{s1p}, @var{n}@} and @{@var{s2p}, @var{n}@} and return a
|
|
positive value if @math{@var{s1} > @var{s2}}, 0 if they are equal, or a
|
|
negative value if @math{@var{s1} < @var{s2}}.
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_size_t mpn_gcd (mp_limb_t *@var{rp}, mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, mp_limb_t *@var{s2p}, mp_size_t @var{s2n})
|
|
Set @{@var{rp}, @var{retval}@} to the greatest common divisor of @{@var{s1p},
|
|
@var{s1n}@} and @{@var{s2p}, @var{s2n}@}. The result can be up to @var{s2n}
|
|
limbs; the return value is the actual number produced. Both source operands
|
|
are destroyed.
|
|
|
|
@{@var{s1p}, @var{s1n}@} must have at least as many bits as @{@var{s2p},
|
|
@var{s2n}@}. @{@var{s2p}, @var{s2n}@} must be odd. Both operands must have
|
|
non-zero most significant limbs. No overlap is permitted between @{@var{s1p},
|
|
@var{s1n}@} and @{@var{s2p}, @var{s2n}@}.
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_limb_t mpn_gcd_1 (const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, mp_limb_t @var{s2limb})
|
|
Return the greatest common divisor of @{@var{s1p}, @var{s1n}@} and
|
|
@var{s2limb}. Both operands must be non-zero.
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_size_t mpn_gcdext (mp_limb_t *@var{r1p}, mp_limb_t *@var{r2p}, mp_size_t *@var{r2n}, mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, mp_limb_t *@var{s2p}, mp_size_t @var{s2n})
|
|
Calculate the greatest common divisor of @{@var{s1p}, @var{s1n}@} and
|
|
@{@var{s2p}, @var{s2n}@}. Store the gcd at @{@var{r1p}, @var{retval}@} and
|
|
the first cofactor at @{@var{r2p}, *@var{r2n}@}, with *@var{r2n} negative if
|
|
the cofactor is negative. @var{r1p} and @var{r2p} should each have room for
|
|
@math{@var{s1n}+1} limbs, but the return value and value stored through
|
|
@var{r2n} indicate the actual number produced.
|
|
|
|
@math{@{@var{s1p}, @var{s1n}@} @ge{} @{@var{s2p}, @var{s2n}@}} is required,
|
|
and both must be non-zero. The regions @{@var{s1p}, @math{@var{s1n}+1}@} and
|
|
@{@var{s2p}, @math{@var{s2n}+1}@} are destroyed (i.e.@: the operands plus an
|
|
extra limb past the end of each).
|
|
|
|
The cofactor @var{r2} will satisfy @m{r_2 s_1 + k s_2 = r_1, @var{r2}*@var{s1}
|
|
+ @var{k}*@var{s2} = @var{r1}}. The second cofactor @var{k} is not calculated
|
|
but can easily be obtained from @m{(r_1 - r_2 s_1) / s_2, (@var{r1} -
|
|
@var{r2}*@var{s1}) / @var{s2}}.
|
|
@end deftypefun
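The relation between the gcd, the first cofactor and the uncomputed second cofactor described for @code{mpn_gcdext} can be checked on small numbers. The sketch below is a plain extended Euclidean algorithm on 64-bit integers, not MPIR's algorithm; @code{model_gcdext} and @code{model_second_cofactor} are hypothetical helpers, and the cofactor MPIR returns for the same inputs may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model on small signed integers, assuming s1 >= s2 > 0.
   Returns g = gcd(s1, s2) and stores a cofactor r2 such that
   r2*s1 + k*s2 == g for some integer k. */
static int64_t
model_gcdext (int64_t s1, int64_t s2, int64_t *r2)
{
  assert (s1 >= s2 && s2 > 0);

  int64_t old_r = s1, r = s2;
  int64_t old_x = 1, x = 0;

  while (r != 0)
    {
      int64_t q = old_r / r;
      int64_t t;

      t = old_r - q * r;  old_r = r;  r = t;
      t = old_x - q * x;  old_x = x;  x = t;
    }

  *r2 = old_x;
  return old_r;
}

/* Recover the second cofactor k = (g - r2*s1) / s2. */
static int64_t
model_second_cofactor (int64_t g, int64_t r2, int64_t s1, int64_t s2)
{
  return (g - r2 * s1) / s2;
}
```

For 240 and 46 this model produces a gcd of 2 with first cofactor @minus{}9, and the division then yields 47 as the second cofactor.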
|
|
|
|
@deftypefun mp_size_t mpn_sqrtrem (mp_limb_t *@var{r1p}, mp_limb_t *@var{r2p}, const mp_limb_t *@var{sp}, mp_size_t @var{n})
|
|
Compute the square root of @{@var{sp}, @var{n}@} and put the result at
|
|
@{@var{r1p}, @math{@GMPceil{@var{n}/2}}@} and the remainder at @{@var{r2p},
|
|
@var{retval}@}. @var{r2p} needs space for @var{n} limbs, but the return value
|
|
indicates how many are produced.
|
|
|
|
The most significant limb of @{@var{sp}, @var{n}@} must be non-zero. The
|
|
areas @{@var{r1p}, @math{@GMPceil{@var{n}/2}}@} and @{@var{sp}, @var{n}@} must
|
|
be completely separate. The areas @{@var{r2p}, @var{n}@} and @{@var{sp},
|
|
@var{n}@} must be either identical or completely separate.
|
|
|
|
If the remainder is not wanted then @var{r2p} can be @code{NULL}, and in this
|
|
case the return value is zero or non-zero according to whether the remainder
|
|
would have been zero or non-zero.
|
|
|
|
A return value of zero indicates a perfect square. See also
|
|
@code{mpz_perfect_square_p}.
|
|
@end deftypefun
|
|
|
|
@deftypefun mp_size_t mpn_get_str (unsigned char *@var{str}, int @var{base}, mp_limb_t *@var{s1p}, mp_size_t @var{s1n})
|
|
Convert @{@var{s1p}, @var{s1n}@} to a raw unsigned char array at @var{str} in
|
|
base @var{base}, and return the number of characters produced. There may be
|
|
leading zeros in the string. The string is not in ASCII; to convert it to
|
|
printable format, add the ASCII codes for @samp{0} or @samp{A}, depending on
|
|
the base and range. @var{base} can vary from 2 to 256.
|
|
|
|
The most significant limb of the input @{@var{s1p}, @var{s1n}@} must be
|
|
non-zero. The input @{@var{s1p}, @var{s1n}@} is clobbered, except when
|
|
@var{base} is a power of 2, in which case it's unchanged.
|
|
|
|
The area at @var{str} has to have space for the largest possible number
|
|
represented by a @var{s1n} long limb array, plus one extra character.
|
|
@end deftypefun
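The conversion from the raw digit values produced by @code{mpn_get_str} to printable characters might look like the following sketch. @code{model_digits_to_ascii} is a hypothetical helper, not part of MPIR, and it assumes a base of at most 36 so that every digit maps to @samp{0}--@samp{9} or @samp{A}--@samp{Z}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: turn n raw digit values (each in 0..35) into a
   null-terminated ASCII string.  out needs room for n+1 characters. */
static void
model_digits_to_ascii (char *out, const unsigned char *digits, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      unsigned char d = digits[i];

      assert (d < 36);
      out[i] = (char) (d < 10 ? '0' + d : 'A' + (d - 10));
    }
  out[n] = '\0';
}
```

The raw values 1, 10 and 15 become the string @samp{1AF}, for example. Bases above 36 have no conventional character mapping, so the raw values would have to be handled some other way.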
|
|
|
|
@deftypefun mp_size_t mpn_set_str (mp_limb_t *@var{rp}, const unsigned char *@var{str}, size_t @var{strsize}, int @var{base})
|
|
Convert bytes @{@var{str},@var{strsize}@} in the given @var{base} to limbs at
|
|
@var{rp}.
|
|
|
|
@math{@var{str}[0]} is the most significant byte and
|
|
@math{@var{str}[@var{strsize}-1]} is the least significant. Each byte should
|
|
be a value in the range 0 to @math{@var{base}-1}, not an ASCII character.
|
|
@var{base} can vary from 2 to 256.
|
|
|
|
The return value is the number of limbs written to @var{rp}. If the most
|
|
significant input byte is non-zero then the high limb at @var{rp} will be
|
|
non-zero, and only that exact number of limbs will be required there.
|
|
|
|
If the most significant input byte is zero then there may be high zero limbs
|
|
written to @var{rp} and included in the return value.
|
|
|
|
@var{strsize} must be at least 1, and no overlap is permitted between
|
|
@{@var{str},@var{strsize}@} and the result at @var{rp}.
|
|
@end deftypefun
|
|
|
|
@deftypefun {unsigned long int} mpn_scan0 (const mp_limb_t *@var{s1p}, unsigned long int @var{bit})
|
|
Scan @var{s1p} from bit position @var{bit} for the next clear bit.
|
|
|
|
It is required that there be a clear bit within the area at @var{s1p} at or
|
|
beyond bit position @var{bit}, so that the function has something to return.
|
|
@end deftypefun
|
|
|
|
@deftypefun {unsigned long int} mpn_scan1 (const mp_limb_t *@var{s1p}, unsigned long int @var{bit})
|
|
Scan @var{s1p} from bit position @var{bit} for the next set bit.
|
|
|
|
It is required that there be a set bit within the area at @var{s1p} at or
|
|
beyond bit position @var{bit}, so that the function has something to return.
|
|
@end deftypefun
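The bit numbering used by @code{mpn_scan0} and @code{mpn_scan1} can be illustrated with a bit-at-a-time model. This is only a sketch assuming 32-bit limbs without nails; @code{model_scan0} and @code{model_scan1} are hypothetical helpers and make no attempt to be fast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical models, assuming 32-bit limbs.  Bit 0 is the least
   significant bit of sp[0].  As with the real functions, the caller
   must guarantee that a matching bit exists at or beyond `bit',
   since there is no size argument to stop the scan. */

static unsigned long
model_scan1 (const uint32_t *sp, unsigned long bit)
{
  assert (sp != NULL);
  while (((sp[bit / 32] >> (bit % 32)) & 1) == 0)
    bit++;
  return bit;
}

static unsigned long
model_scan0 (const uint32_t *sp, unsigned long bit)
{
  assert (sp != NULL);
  while (((sp[bit / 32] >> (bit % 32)) & 1) != 0)
    bit++;
  return bit;
}
```

A scan starting exactly on a matching bit returns that position unchanged, which is why the requirement is phrased as ``at or beyond''.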
|
|
|
|
@deftypefun void mpn_random (mp_limb_t *@var{r1p}, mp_size_t @var{r1n})
|
|
@deftypefunx void mpn_random2 (mp_limb_t *@var{r1p}, mp_size_t @var{r1n})
|
|
Generate a random number of length @var{r1n} and store it at @var{r1p}. The
|
|
most significant limb is always non-zero. @code{mpn_random} generates
|
|
uniformly distributed limb data, @code{mpn_random2} generates long strings of
|
|
zeros and ones in the binary representation.
|
|
|
|
@code{mpn_random2} is intended for testing the correctness of the @code{mpn}
|
|
routines.
|
|
@end deftypefun
|
|
|
|
@deftypefun {unsigned long int} mpn_popcount (const mp_limb_t *@var{s1p}, mp_size_t @var{n})
|
|
Count the number of set bits in @{@var{s1p}, @var{n}@}.
|
|
@end deftypefun
|
|
|
|
@deftypefun {unsigned long int} mpn_hamdist (const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
|
|
Compute the hamming distance between @{@var{s1p}, @var{n}@} and @{@var{s2p},
|
|
@var{n}@}, which is the number of bit positions where the two operands have
|
|
different bit values.
|
|
@end deftypefun
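The hamming distance is the population count of the bitwise exclusive-or of the two operands. The sketch below shows that relationship with hypothetical helpers (@code{model_popcount}, @code{model_hamdist}) on 32-bit limbs; it is not how MPIR implements these functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in one limb by clearing the lowest set bit
   until none remain. */
static unsigned long
model_popcount_limb (uint32_t x)
{
  unsigned long count = 0;

  while (x != 0)
    {
      x &= x - 1;
      count++;
    }
  return count;
}

/* Hypothetical model of mpn_popcount, assuming 32-bit limbs. */
static unsigned long
model_popcount (const uint32_t *sp, size_t n)
{
  unsigned long total = 0;

  assert (sp != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    total += model_popcount_limb (sp[i]);
  return total;
}

/* Hypothetical model of mpn_hamdist: popcount of the limb-wise XOR. */
static unsigned long
model_hamdist (const uint32_t *s1p, const uint32_t *s2p, size_t n)
{
  unsigned long total = 0;

  for (size_t i = 0; i < n; i++)
    total += model_popcount_limb (s1p[i] ^ s2p[i]);
  return total;
}
```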
|
|
|
|
@deftypefun int mpn_perfect_square_p (const mp_limb_t *@var{s1p}, mp_size_t @var{n})
|
|
Return non-zero iff @{@var{s1p}, @var{n}@} is a perfect square.
|
|
@end deftypefun
|
|
|
|
|
|
@sp 1
|
|
@section Nails
|
|
@cindex Nails
|
|
|
|
@strong{Everything in this section is highly experimental and may disappear or
|
|
be subject to incompatible changes in a future version of MPIR.}
|
|
|
|
Note: nails are currently disabled and not supported in MPIR. They may or may not return in a future version of MPIR.
|
|
|
|
Nails are an experimental feature whereby a few bits are left unused at the
|
|
top of each @code{mp_limb_t}. This can significantly improve carry handling
|
|
on some processors.
|
|
|
|
All the @code{mpn} functions accepting limb data will expect the nail bits to
|
|
be zero on entry, and will return data with the nails similarly all zero.
|
|
This applies both to limb vectors and to single limb arguments.
|
|
|
|
Nails can be enabled by configuring with @samp{--enable-nails}. By default
|
|
the number of bits will be chosen according to what suits the host processor,
|
|
but a particular number can be selected with @samp{--enable-nails=N}.
|
|
|
|
At the mpn level, a nail build is neither source nor binary compatible with a
|
|
non-nail build, strictly speaking. But programs acting on limbs only through
|
|
the mpn functions are likely to work equally well with either build, and
|
|
judicious use of the definitions below should make any program compatible with
|
|
either build, at the source level.
|
|
|
|
For the higher level routines, meaning @code{mpz} etc, a nail build should be
|
|
fully source and binary compatible with a non-nail build.
|
|
|
|
@defmac GMP_NAIL_BITS
|
|
@defmacx GMP_NUMB_BITS
|
|
@defmacx GMP_LIMB_BITS
|
|
@code{GMP_NAIL_BITS} is the number of nail bits, or 0 when nails are not in
|
|
use. @code{GMP_NUMB_BITS} is the number of data bits in a limb.
|
|
@code{GMP_LIMB_BITS} is the total number of bits in an @code{mp_limb_t}. In
|
|
all cases
|
|
|
|
@example
|
|
GMP_LIMB_BITS == GMP_NAIL_BITS + GMP_NUMB_BITS
|
|
@end example
|
|
@end defmac
|
|
|
|
@defmac GMP_NAIL_MASK
|
|
@defmacx GMP_NUMB_MASK
|
|
Bit masks for the nail and number parts of a limb. @code{GMP_NAIL_MASK} is 0
|
|
when nails are not in use.
|
|
|
|
@code{GMP_NAIL_MASK} is not often needed, since the nail part can be obtained
|
|
with @code{x >> GMP_NUMB_BITS}, and that means one less large constant, which
|
|
can help various RISC chips.
|
|
@end defmac
|
|
|
|
@defmac GMP_NUMB_MAX
|
|
The maximum value that can be stored in the number part of a limb. This is
|
|
the same as @code{GMP_NUMB_MASK}, but can be used for clarity when doing
|
|
comparisons rather than bit-wise operations.
|
|
@end defmac
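The relationships between these macros, and the reason nails simplify carry handling, can be sketched with stand-in definitions. Since nails are disabled in MPIR, everything below is hypothetical: the @code{DEMO_} macros assume a 32-bit limb with 4 nail bits, and @code{demo_add_limb} is an illustrative helper, not a library function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the GMP_* macros: a 32-bit limb with
   4 nail bits.  These are illustrative values, not MPIR's. */
#define DEMO_LIMB_BITS 32
#define DEMO_NAIL_BITS 4
#define DEMO_NUMB_BITS (DEMO_LIMB_BITS - DEMO_NAIL_BITS)
#define DEMO_NUMB_MASK ((UINT32_C(1) << DEMO_NUMB_BITS) - 1)
#define DEMO_NAIL_MASK ((uint32_t) ~DEMO_NUMB_MASK)
#define DEMO_NUMB_MAX  DEMO_NUMB_MASK

/* Add two limbs whose nail bits are zero.  The sum cannot overflow the
   limb because the nail bits absorb the carry, so the carry is simply
   the nail part of the sum.  Returns the carry (0 or 1). */
static uint32_t
demo_add_limb (uint32_t *rp, uint32_t a, uint32_t b)
{
  assert ((a >> DEMO_NUMB_BITS) == 0 && (b >> DEMO_NUMB_BITS) == 0);

  uint32_t sum = a + b;

  *rp = sum & DEMO_NUMB_MASK;      /* result again has zero nails */
  return sum >> DEMO_NUMB_BITS;    /* nail part, without DEMO_NAIL_MASK */
}
```

The carry is extracted with a shift rather than with @code{DEMO_NAIL_MASK}, in line with the remark above that the nail mask is not often needed.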
|
|
|
|
The term ``nails'' comes from finger or toe nails, which are at the ends of a
|
|
limb (arm or leg). ``numb'' is short for number, but is also how the
|
|
developers felt after trying for a long time to come up with sensible names
|
|
for these things.
|
|
|
|
In the future (the distant future most likely) a non-zero nail might be
|
|
permitted, giving non-unique representations for numbers in a limb vector.
|
|
This would help vector processors since carries would only ever need to
|
|
propagate one or two limbs.
|
|
|
|
|
|
@node Random Number Functions, Formatted Output, Low-level Functions, Top
|
|
@chapter Random Number Functions
|
|
@cindex Random number functions
|
|
|
|
Sequences of pseudo-random numbers in MPIR are generated using a variable of
|
|
type @code{gmp_randstate_t}, which holds an algorithm selection and a current
|
|
state. Such a variable must be initialized by a call to one of the
|
|
@code{gmp_randinit} functions, and can be seeded with one of the
|
|
@code{gmp_randseed} functions.
|
|
|
|
The functions actually generating random numbers are described in @ref{Integer
|
|
Random Numbers}, and @ref{Miscellaneous Float Functions}.
|
|
|
|
The older style random number functions don't accept a @code{gmp_randstate_t}
|
|
parameter but instead share a global variable of that type. They use a
|
|
default algorithm and are currently not seeded (though perhaps that will
|
|
change in the future). The new functions accepting a @code{gmp_randstate_t}
|
|
are recommended for applications that care about randomness.
|
|
|
|
@menu
|
|
* Random State Initialization::
|
|
* Random State Seeding::
|
|
* Random State Miscellaneous::
|
|
@end menu
|
|
|
|
@node Random State Initialization, Random State Seeding, Random Number Functions, Random Number Functions
|
|
@section Random State Initialization
|
|
@cindex Random number state
|
|
@cindex Initialization functions
|
|
|
|
@deftypefun void gmp_randinit_default (gmp_randstate_t @var{state})
|
|
Initialize @var{state} with a default algorithm. This will be a compromise
|
|
between speed and randomness, and is recommended for applications with no
|
|
special requirements. Currently this is @code{gmp_randinit_mt}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void gmp_randinit_mt (gmp_randstate_t @var{state})
|
|
@cindex Mersenne twister random numbers
|
|
Initialize @var{state} for a Mersenne Twister algorithm. This algorithm is
|
|
fast and has good randomness properties.
|
|
@end deftypefun
|
|
|
|
@deftypefun void gmp_randinit_lc_2exp (gmp_randstate_t @var{state}, mpz_t @var{a}, @w{unsigned long @var{c}}, @w{unsigned long @var{m2exp}})
|
|
@cindex Linear congruential random numbers
|
|
Initialize @var{state} with a linear congruential algorithm @m{X = (@var{a}X +
|
|
@var{c}) @bmod 2^{m2exp}, X = (@var{a}*X + @var{c}) mod 2^@var{m2exp}}.
|
|
|
|
The low bits of @math{X} in this algorithm are not very random. The least
|
|
significant bit will have a period no more than 2, and the second bit no more
|
|
than 4, etc. For this reason only the high half of each @math{X} is actually
|
|
used.
|
|
|
|
When a random number of more than @math{@var{m2exp}/2} bits is to be
|
|
generated, multiple iterations of the recurrence are used and the results
|
|
concatenated.
|
|
@end deftypefun
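The weakness of the low bits, and the use of only the high half of each @math{X}, can be seen in a fixed-size model of the recurrence used by @code{gmp_randinit_lc_2exp}. This is a sketch with @math{@var{m2exp}=32} and hypothetical helper names; the multiplier and increment used in the usage note are illustrative values and are not taken from MPIR's tables.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model with m2exp = 32: unsigned 32-bit arithmetic wraps
   modulo 2^32, which performs the "mod 2^m2exp" step implicitly. */
static uint32_t
model_lc_step (uint32_t x, uint32_t a, uint32_t c)
{
  /* A full-period generator modulo a power of 2 needs an odd a and
     an odd c; this model only checks that much. */
  assert ((a & 1) == 1 && (c & 1) == 1);
  return a * x + c;
}

/* Only the high half of each X is used as output. */
static uint16_t
model_lc_output (uint32_t x)
{
  return (uint16_t) (x >> 16);
}
```

With an odd multiplier and an odd increment, such as 1664525 and 1013904223, the least significant bit of successive @math{X} values simply alternates, which is the period of 2 mentioned above. Discarding the low half avoids exposing those bits.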
|
|
|
|
@deftypefun int gmp_randinit_lc_2exp_size (gmp_randstate_t @var{state}, unsigned long @var{size})
|
|
@cindex Linear congruential random numbers
|
|
Initialize @var{state} for a linear congruential algorithm as per
|
|
@code{gmp_randinit_lc_2exp}. @var{a}, @var{c} and @var{m2exp} are selected
|
|
from a table, chosen so that @var{size} bits (or more) of each @math{X} will
|
|
be used, i.e.@: @math{@var{m2exp}/2 @ge{} @var{size}}.
|
|
|
|
If successful the return value is non-zero. If @var{size} is bigger than the
|
|
table data provides then the return value is zero. The maximum @var{size}
|
|
currently supported is 128.
|
|
@end deftypefun
|
|
|
|
@deftypefun void gmp_randinit_set (gmp_randstate_t @var{rop}, gmp_randstate_t @var{op})
|
|
Initialize @var{rop} with a copy of the algorithm and state from @var{op}.
|
|
@end deftypefun
|
|
|
|
@c Although gmp_randinit, gmp_errno and related constants are obsolete, we
|
|
@c still put @findex entries for them, since they're still documented and
|
|
@c someone might be looking them up when perusing old application code.
|
|
|
|
@deftypefun void gmp_randinit (gmp_randstate_t @var{state}, @w{gmp_randalg_t @var{alg}}, @dots{})
|
|
@strong{This function is obsolete.}
|
|
|
|
@findex GMP_RAND_ALG_LC
|
|
@findex GMP_RAND_ALG_DEFAULT
|
|
Initialize @var{state} with an algorithm selected by @var{alg}. The only
|
|
choice is @code{GMP_RAND_ALG_LC}, which is @code{gmp_randinit_lc_2exp_size}
|
|
described above. A third parameter of type @code{unsigned long} is required;
|
|
this is the @var{size} for that function. @code{GMP_RAND_ALG_DEFAULT} or 0
|
|
are the same as @code{GMP_RAND_ALG_LC}.
|
|
|
|
@c For reference, this is the only place gmp_errno has been documented, and
|
|
@c due to being non thread safe we won't be adding to its uses.
|
|
@findex gmp_errno
|
|
@findex GMP_ERROR_UNSUPPORTED_ARGUMENT
|
|
@findex GMP_ERROR_INVALID_ARGUMENT
|
|
@code{gmp_randinit} sets bits in the global variable @code{gmp_errno} to
|
|
indicate an error. @code{GMP_ERROR_UNSUPPORTED_ARGUMENT} if @var{alg} is
|
|
unsupported, or @code{GMP_ERROR_INVALID_ARGUMENT} if the @var{size} parameter
|
|
is too big. It may be noted this error reporting is not thread safe (a good
|
|
reason to use @code{gmp_randinit_lc_2exp_size} instead).
|
|
@end deftypefun
|
|
|
|
@deftypefun void gmp_randclear (gmp_randstate_t @var{state})
|
|
Free all memory occupied by @var{state}.
|
|
@end deftypefun
|
|
|
|
|
|
@node Random State Seeding, Random State Miscellaneous, Random State Initialization, Random Number Functions
|
|
@section Random State Seeding
|
|
@cindex Random number seeding
|
|
@cindex Seeding random numbers
|
|
|
|
@deftypefun void gmp_randseed (gmp_randstate_t @var{state}, mpz_t @var{seed})
|
|
@deftypefunx void gmp_randseed_ui (gmp_randstate_t @var{state}, @w{unsigned long int @var{seed}})
|
|
Set an initial seed value into @var{state}.
|
|
|
|
The size of a seed determines how many different sequences of random numbers
|
|
it's possible to generate. The ``quality'' of the seed is the randomness
|
|
of a given seed compared to the previous seed used, and this affects the
|
|
randomness of separate number sequences. The method for choosing a seed is
|
|
critical if the generated numbers are to be used for important applications,
|
|
such as generating cryptographic keys.
|
|
|
|
Traditionally the system time has been used to seed, but care needs to be
|
|
taken with this. If an application seeds often and the resolution of the
|
|
system clock is low, then the same sequence of numbers might be repeated.
|
|
Also, the system time is quite easy to guess, so if unpredictability is
|
|
required then it should definitely not be the only source for the seed value.
|
|
On some systems there's a special device @file{/dev/random} which provides
|
|
random data better suited for use as a seed.
|
|
@end deftypefun
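Reading seed material from a random device might be sketched as follows. @code{model_read_seed} is a hypothetical helper, not part of MPIR; the device path is passed in by the caller, since which device exists (and whether reading from it can block) depends on the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: fill *seed with bytes read from the device or
   file at `path'.  Returns 0 on success, or -1 if the path cannot be
   opened or does not supply enough bytes, in which case the caller
   must find another seed source. */
static int
model_read_seed (const char *path, unsigned long *seed)
{
  assert (path != NULL && seed != NULL);

  FILE *fp = fopen (path, "rb");
  if (fp == NULL)
    return -1;

  size_t got = fread (seed, 1, sizeof *seed, fp);
  fclose (fp);

  return got == sizeof *seed ? 0 : -1;
}
```

On success the value could be passed to @code{gmp_randseed_ui}. A single @code{unsigned long} limits the number of distinct sequences, so applications needing more would gather more bytes and use @code{gmp_randseed} instead.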
|
|
|
|
|
|
@node Random State Miscellaneous, , Random State Seeding, Random Number Functions
|
|
@section Random State Miscellaneous
|
|
|
|
@deftypefun {unsigned long} gmp_urandomb_ui (gmp_randstate_t @var{state}, unsigned long @var{n})
|
|
Return a uniformly distributed random number of @var{n} bits, i.e.@: in the
|
|
range 0 to @m{2^n-1,2^@var{n}-1} inclusive. @var{n} must be less than or
|
|
equal to the number of bits in an @code{unsigned long}.
|
|
@end deftypefun
|
|
|
|
@deftypefun {unsigned long} gmp_urandomm_ui (gmp_randstate_t @var{state}, unsigned long @var{n})
|
|
Return a uniformly distributed random number in the range 0 to
|
|
@math{@var{n}-1}, inclusive.
|
|
@end deftypefun
|
|
|
|
|
|
@node Formatted Output, Formatted Input, Random Number Functions, Top
|
|
@chapter Formatted Output
|
|
@cindex Formatted output
|
|
@cindex @code{printf} formatted output
|
|
|
|
@menu
|
|
* Formatted Output Strings::
|
|
* Formatted Output Functions::
|
|
* C++ Formatted Output::
|
|
@end menu
|
|
|
|
@node Formatted Output Strings, Formatted Output Functions, Formatted Output, Formatted Output
|
|
@section Format Strings
|
|
|
|
@code{gmp_printf} and friends accept format strings similar to the standard C
|
|
@code{printf} (@pxref{Formatted Output,, Formatted Output, libc, The GNU C
|
|
Library Reference Manual}). A format specification is of the form
|
|
|
|
@example
|
|
% [flags] [width] [.[precision]] [type] conv
|
|
@end example
|
|
|
|
MPIR adds types @samp{Z}, @samp{Q} and @samp{F} for @code{mpz_t}, @code{mpq_t}
|
|
and @code{mpf_t} respectively, @samp{M} for @code{mp_limb_t}, and @samp{N} for
|
|
an @code{mp_limb_t} array. @samp{Z}, @samp{Q}, @samp{M} and @samp{N} behave
|
|
like integers. @samp{Q} will print a @samp{/} and a denominator, if needed.
|
|
@samp{F} behaves like a float. For example,
|
|
|
|
@example
|
|
mpz_t z;
|
|
gmp_printf ("%s is an mpz %Zd\n", "here", z);
|
|
|
|
mpq_t q;
|
|
gmp_printf ("a hex rational: %#40Qx\n", q);
|
|
|
|
mpf_t f;
|
|
int n;
|
|
gmp_printf ("fixed point mpf %.*Ff with %d digits\n", n, f, n);
|
|
|
|
mp_limb_t limb;
|
|
gmp_printf ("limb %Mu\n", limb);
|
|
|
|
const mp_limb_t *ptr;
|
|
mp_size_t size;
|
|
gmp_printf ("limb array %Nx\n", ptr, size);
|
|
@end example
|
|
|
|
For @samp{N} the limbs are expected least significant first, as per the
|
|
@code{mpn} functions (@pxref{Low-level Functions}). A negative size can be
|
|
given to print the value as a negative.
|
|
|
|
All the standard C @code{printf} types behave the same as the C library
|
|
@code{printf}, and can be freely intermixed with the MPIR extensions. In the
|
|
current implementation the standard parts of the format string are simply
|
|
handed to @code{printf} and only the MPIR extensions handled directly.
|
|
|
|
The flags accepted are as follows. GLIBC style @nisamp{'} is only for the
|
|
standard C types (not the MPIR types), and only if the C library supports it.
|
|
|
|
@quotation
|
|
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
|
|
@item @nicode{0} @tab pad with zeros (rather than spaces)
|
|
@item @nicode{#} @tab show the base with @samp{0x}, @samp{0X} or @samp{0}
|
|
@item @nicode{+} @tab always show a sign
|
|
@item (space) @tab show a space or a @samp{-} sign
|
|
@item @nicode{'} @tab group digits, GLIBC style (not MPIR types)
|
|
@end multitable
|
|
@end quotation
|
|
|
|
The optional width and precision can be given as a number within the format
|
|
string, or as a @samp{*} to take an extra parameter of type @code{int}, the
|
|
same as the standard @code{printf}.
|
|
|
|
The standard types accepted are as follows. @samp{h} and @samp{l} are
|
|
portable, the rest will depend on the compiler (or include files) for the type
|
|
and the C library for the output.
|
|
|
|
@quotation
|
|
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
|
|
@item @nicode{h} @tab @nicode{short}
|
|
@item @nicode{hh} @tab @nicode{char}
|
|
@item @nicode{j} @tab @nicode{intmax_t} or @nicode{uintmax_t}
|
|
@item @nicode{l} @tab @nicode{long} or @nicode{wchar_t}
|
|
@item @nicode{ll} @tab @nicode{long long}
|
|
@item @nicode{L} @tab @nicode{long double}
|
|
@item @nicode{q} @tab @nicode{quad_t} or @nicode{u_quad_t}
|
|
@item @nicode{t} @tab @nicode{ptrdiff_t}
|
|
@item @nicode{z} @tab @nicode{size_t}
|
|
@end multitable
|
|
@end quotation
|
|
|
|
@noindent
|
|
The MPIR types are
|
|
|
|
@quotation
|
|
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
|
|
@item @nicode{F} @tab @nicode{mpf_t}, float conversions
|
|
@item @nicode{Q} @tab @nicode{mpq_t}, integer conversions
|
|
@item @nicode{M} @tab @nicode{mp_limb_t}, integer conversions
|
|
@item @nicode{N} @tab @nicode{mp_limb_t} array, integer conversions
|
|
@item @nicode{Z} @tab @nicode{mpz_t}, integer conversions
|
|
@end multitable
|
|
@end quotation
|
|
|
|
The conversions accepted are as follows. @samp{a} and @samp{A} are always
|
|
supported for @code{mpf_t} but depend on the C library for standard C float
|
|
types. @samp{m} and @samp{p} depend on the C library.
|
|
|
|
@quotation
|
|
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
|
|
@item @nicode{a} @nicode{A} @tab hex floats, C99 style
|
|
@item @nicode{c} @tab character
|
|
@item @nicode{d} @tab decimal integer
|
|
@item @nicode{e} @nicode{E} @tab scientific format float
|
|
@item @nicode{f} @tab fixed point float
|
|
@item @nicode{i} @tab same as @nicode{d}
|
|
@item @nicode{g} @nicode{G} @tab fixed or scientific float
|
|
@item @nicode{m} @tab @code{strerror} string, GLIBC style
|
|
@item @nicode{n} @tab store characters written so far
|
|
@item @nicode{o} @tab octal integer
|
|
@item @nicode{p} @tab pointer
|
|
@item @nicode{s} @tab string
|
|
@item @nicode{u} @tab unsigned integer
|
|
@item @nicode{x} @nicode{X} @tab hex integer
|
|
@end multitable
|
|
@end quotation
|
|
|
|
@samp{o}, @samp{x} and @samp{X} are unsigned for the standard C types, but for
|
|
types @samp{Z}, @samp{Q} and @samp{N} they are signed. @samp{u} is not
|
|
meaningful for @samp{Z}, @samp{Q} and @samp{N}.
|
|
|
|
@samp{M} is a proxy for the C library @samp{l} or @samp{L}, according to the
|
|
size of @code{mp_limb_t}. Unsigned conversions will be usual, but a signed
|
|
conversion can be used and will interpret the value as a twos complement
|
|
negative.
|
|
|
|
@samp{n} can be used with any type, even the MPIR types.
|
|
|
|
Other types or conversions that might be accepted by the C library
|
|
@code{printf} cannot be used through @code{gmp_printf}; this includes for
|
|
instance extensions registered with GLIBC @code{register_printf_function}.
|
|
Also currently there's no support for POSIX @samp{$} style numbered arguments
|
|
(perhaps this will be added in the future).
|
|
|
|
The precision field has its usual meaning for integer @samp{Z} and float
|
|
@samp{F} types, but is currently undefined for @samp{Q} and should not be used
|
|
with that.
|
|
|
|
@code{mpf_t} conversions only ever generate as many digits as can be
|
|
accurately represented by the operand, the same as @code{mpf_get_str} does.
|
|
Zeros will be used if necessary to pad to the requested precision. This
|
|
happens even for an @samp{f} conversion of an @code{mpf_t} which is an
|
|
integer, for instance @math{2^@W{1024}} in an @code{mpf_t} of 128 bits
|
|
precision will only produce about 40 digits, then pad with zeros to the
|
|
decimal point. An empty precision field like @samp{%.Fe} or @samp{%.Ff} can
|
|
be used to specifically request just the significant digits.
|
|
|
|
The decimal point character (or string) is taken from the current locale
|
|
settings on systems which provide @code{localeconv} (@pxref{Locales,, Locales
|
|
and Internationalization, libc, The GNU C Library Reference Manual}). The C
|
|
library will normally do the same for standard float output.
|
|
|
|
The format string is only interpreted as plain @code{char}s; multibyte
|
|
characters are not recognised. Perhaps this will change in the future.
|
|
|
|
|
|
@node Formatted Output Functions, C++ Formatted Output, Formatted Output Strings, Formatted Output
|
|
@section Functions
|
|
@cindex Output functions
|
|
|
|
Each of the following functions is similar to the corresponding C library
|
|
function. The basic @code{printf} forms take a variable argument list. The
|
|
@code{vprintf} forms take an argument pointer, see @ref{Variadic Functions,,
|
|
Variadic Functions, libc, The GNU C Library Reference Manual}, or @samp{man 3
|
|
va_start}.
|
|
|
|
It should be emphasised that if a format string is invalid, or the arguments
|
|
don't match what the format specifies, then the behaviour of any of these
|
|
functions will be unpredictable. GCC format string checking is not available,
|
|
since it doesn't recognise the MPIR extensions.
|
|
|
|
The file based functions @code{gmp_printf} and @code{gmp_fprintf} will return
|
|
@math{-1} to indicate a write error. Output is not ``atomic'', so partial
|
|
output may be produced if a write error occurs. All the functions can return
|
|
@math{-1} if the C library @code{printf} variant in use returns @math{-1}, but
|
|
this shouldn't normally occur.
|
|
|
|
@deftypefun int gmp_printf (const char *@var{fmt}, @dots{})
|
|
@deftypefunx int gmp_vprintf (const char *@var{fmt}, va_list @var{ap})
|
|
Print to the standard output @code{stdout}. Return the number of characters
|
|
written, or @math{-1} if an error occurred.
|
|
@end deftypefun
|
|
|
|
@deftypefun int gmp_fprintf (FILE *@var{fp}, const char *@var{fmt}, @dots{})
|
|
@deftypefunx int gmp_vfprintf (FILE *@var{fp}, const char *@var{fmt}, va_list @var{ap})
|
|
Print to the stream @var{fp}. Return the number of characters written, or
|
|
@math{-1} if an error occurred.
|
|
@end deftypefun
|
|
|
|
@deftypefun int gmp_sprintf (char *@var{buf}, const char *@var{fmt}, @dots{})
|
|
@deftypefunx int gmp_vsprintf (char *@var{buf}, const char *@var{fmt}, va_list @var{ap})
|
|
Form a null-terminated string in @var{buf}. Return the number of characters
|
|
written, excluding the terminating null.
|
|
|
|
No overlap is permitted between the space at @var{buf} and the string
|
|
@var{fmt}.
|
|
|
|
These functions are not recommended, since there's no protection against
|
|
exceeding the space available at @var{buf}.
|
|
@end deftypefun
|
|
|
|
@deftypefun int gmp_snprintf (char *@var{buf}, size_t @var{size}, const char *@var{fmt}, @dots{})
|
|
@deftypefunx int gmp_vsnprintf (char *@var{buf}, size_t @var{size}, const char *@var{fmt}, va_list @var{ap})
|
|
Form a null-terminated string in @var{buf}. No more than @var{size} bytes
|
|
will be written. To get the full output, @var{size} must be enough for the
|
|
string and null-terminator.
|
|
|
|
The return value is the total number of characters which ought to have been
|
|
produced, excluding the terminating null. If @math{@var{retval} @ge{}
|
|
@var{size}} then the actual output has been truncated to the first
|
|
@math{@var{size}-1} characters, and a null appended.
|
|
|
|
No overlap is permitted between the region @{@var{buf},@var{size}@} and the
|
|
@var{fmt} string.
|
|
|
|
Notice the return value is in ISO C99 @code{snprintf} style. This is so even
|
|
if the C library @code{vsnprintf} is the older GLIBC 2.0.x style.
|
|
@end deftypefun
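
The C99 return convention makes it possible to detect truncation and retry
with a buffer of exactly the right size. The following is a minimal sketch of
that pattern, written against the standard @code{vsnprintf}, which uses the
same convention. Substituting @code{gmp_vsnprintf} is assumed to work the
same way, going by the description above. The helper name
@code{format_checked} and the 16-byte first-try buffer are illustrative
choices, not part of MPIR.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Format into a std::string, retrying with a larger buffer when the
// first attempt reports truncation (retval >= size).
std::string format_checked(const char *fmt, ...)
{
    char small[16];
    va_list ap;

    // First attempt: a small fixed buffer.
    va_start(ap, fmt);
    int n = std::vsnprintf(small, sizeof small, fmt, ap);
    va_end(ap);
    if (n < 0)
        return std::string();              // formatting error

    // retval < size means the whole string fitted.
    if (static_cast<std::size_t>(n) < sizeof small)
        return std::string(small, static_cast<std::size_t>(n));

    // Otherwise n is the length that was needed; allow for the null.
    std::vector<char> big(static_cast<std::size_t>(n) + 1);
    va_start(ap, fmt);
    std::vsnprintf(big.data(), big.size(), fmt, ap);
    va_end(ap);
    return std::string(big.data(), static_cast<std::size_t>(n));
}
```

The argument list is started afresh with @code{va_start} for the second call,
since a @code{va_list} must not be reused after it has been consumed.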
|
|
|
|
@deftypefun int gmp_asprintf (char **@var{pp}, const char *@var{fmt}, @dots{})
|
|
@deftypefunx int gmp_vasprintf (char **@var{pp}, const char *@var{fmt}, va_list @var{ap})
|
|
Form a null-terminated string in a block of memory obtained from the current
|
|
memory allocation function (@pxref{Custom Allocation}). The block will be the
|
|
size of the string and null-terminator. The address of the block is stored to
|
|
*@var{pp}. The return value is the number of characters produced, excluding
|
|
the null-terminator.
|
|
|
|
Unlike the C library @code{asprintf}, @code{gmp_asprintf} doesn't return
|
|
@math{-1} if there's no more memory available; instead it lets the current allocation
|
|
function handle that.
|
|
@end deftypefun
|
|
|
|
@deftypefun int gmp_obstack_printf (struct obstack *@var{ob}, const char *@var{fmt}, @dots{})
|
|
@deftypefunx int gmp_obstack_vprintf (struct obstack *@var{ob}, const char *@var{fmt}, va_list @var{ap})
|
|
@cindex @code{obstack} output
|
|
Append to the current object in @var{ob}. The return value is the number of
|
|
characters written. A null-terminator is not written.
|
|
|
|
@var{fmt} cannot be within the current object in @var{ob}, since that object
|
|
might move as it grows.
|
|
|
|
These functions are available only when the C library provides the obstack
|
|
feature, which probably means only on GNU systems, see @ref{Obstacks,,
|
|
Obstacks, libc, The GNU C Library Reference Manual}.
|
|
@end deftypefun
|
|
|
|
|
|
@node C++ Formatted Output, , Formatted Output Functions, Formatted Output
|
|
@section C++ Formatted Output
|
|
@cindex C++ @code{ostream} output
|
|
@cindex @code{ostream} output
|
|
|
|
The following functions are provided in @file{libgmpxx}/@file{libmpirxx} (@pxref{Headers and
|
|
Libraries}), which is built if C++ support is enabled (@pxref{Build Options}).
|
|
Prototypes are available from @code{<mpir.h>}.
|
|
|
|
@deftypefun ostream& operator<< (ostream& @var{stream}, mpz_t @var{op})
|
|
Print @var{op} to @var{stream}, using its @code{ios} formatting settings.
|
|
@code{ios::width} is reset to 0 after output, the same as the standard
|
|
@code{ostream operator<<} routines do.
|
|
|
|
In hex or octal, @var{op} is printed as a signed number, the same as for
|
|
decimal. This is unlike the standard @code{operator<<} routines on @code{int}
|
|
etc., which instead give two's complement.
|
|
@end deftypefun
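
The difference between the two hex styles can be seen using only the built-in
types. The sketch below is an illustration, not MPIR code: the hypothetical
helper @code{builtin_hex} shows what the standard @code{operator<<} does with
a negative @code{int}, and the hypothetical helper @code{signed_hex} produces
the sign-and-magnitude form described above for @code{mpz_t}.

```cpp
#include <sstream>
#include <string>

// What the standard operator<< does for a built-in int in hex:
// a negative value is shown as its two's complement bit pattern.
std::string builtin_hex(int v)
{
    std::ostringstream os;
    os << std::hex << v;
    return os.str();
}

// Sign-and-magnitude hex: a minus sign followed by the hex digits of
// the absolute value.
std::string signed_hex(long v)
{
    std::ostringstream os;
    unsigned long mag = static_cast<unsigned long>(v);
    if (v < 0) {
        os << '-';
        mag = 0UL - mag;   // magnitude, computed without signed overflow
    }
    os << std::hex << mag;
    return os.str();
}
```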
|
|
|
|
@deftypefun ostream& operator<< (ostream& @var{stream}, mpq_t @var{op})
|
|
Print @var{op} to @var{stream}, using its @code{ios} formatting settings.
|
|
@code{ios::width} is reset to 0 after output, the same as the standard
|
|
@code{ostream operator<<} routines do.
|
|
|
|
Output will be a fraction like @samp{5/9}, or if the denominator is 1 then
|
|
just a plain integer like @samp{123}.
|
|
|
|
In hex or octal, @var{op} is printed as a signed value, the same as for
|
|
decimal. If @code{ios::showbase} is set then a base indicator is shown on
|
|
both the numerator and denominator (if the denominator is required).
|
|
@end deftypefun
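
Both the width reset and the per-component base indicator mirror how the
standard stream operators already behave. The following sketch uses only
built-in integers and the standard library to show the two effects. The
function names are made up for the illustration, and only non-negative
values are considered.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// The field width applies to the next item only: 42 is padded to six
// columns, then the width is back to 0 and 7 is printed unpadded.
std::string width_demo()
{
    std::ostringstream os;
    os << std::setw(6) << 42 << 7;
    return os.str();
}

// With showbase set, each component printed gets its own base
// indicator.  The denominator is omitted when it is 1.
std::string fraction_hex(unsigned long num, unsigned long den)
{
    std::ostringstream os;
    os << std::hex << std::showbase << num;
    if (den != 1)
        os << '/' << den;
    return os.str();
}
```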
|
|
|
|
@deftypefun ostream& operator<< (ostream& @var{stream}, mpf_t @var{op})
|
|
Print @var{op} to @var{stream}, using its @code{ios} formatting settings.
|
|
@code{ios::width} is reset to 0 after output, the same as the standard
|
|
@code{ostream operator<<} routines do.
|
|
|
|
The decimal point follows the standard library float @code{operator<<}, which
|
|
on recent systems means the @code{std::locale} imbued on @var{stream}.
|
|
|
|
Hex and octal are supported, unlike the standard @code{operator<<} on
|
|
@code{double}. The mantissa will be in hex or octal, the exponent will be in
|
|
decimal. For hex the exponent delimiter is an @samp{@@}. This is as per
|
|
@code{mpf_out_str}.
|
|
|
|
@code{ios::showbase} is supported, and will put a base on the mantissa, for
|
|
example hex @samp{0x1.8} or @samp{0x0.8}, or octal @samp{01.4} or @samp{00.4}.
|
|
This last form is slightly strange, but at least differentiates itself from
|
|
decimal.
|
|
@end deftypefun
|
|
|
|
These operators mean that MPIR types can be printed in the usual C++ way, for
|
|
example,
|
|
|
|
@example
|
|
mpz_t z;
|
|
int n;
|
|
...
|
|
cout << "iteration " << n << " value " << z << "\n";
|
|
@end example
|
|
|
|
But note that @code{ostream} output (and @code{istream} input, @pxref{C++
|
|
Formatted Input}) is the only overloading available for the MPIR types and that
|
|
for instance using @code{+} with an @code{mpz_t} will have unpredictable
|
|
results. For classes with overloading, see @ref{C++ Class Interface}.
|
|
|
|
|
|
@node Formatted Input, C++ Class Interface, Formatted Output, Top
|
|
@chapter Formatted Input
|
|
@cindex Formatted input
|
|
@cindex @code{scanf} formatted input
|
|
|
|
@menu
|
|
* Formatted Input Strings::
|
|
* Formatted Input Functions::
|
|
* C++ Formatted Input::
|
|
@end menu
|
|
|
|
|
|
@node Formatted Input Strings, Formatted Input Functions, Formatted Input, Formatted Input
|
|
@section Formatted Input Strings
|
|
|
|
@code{gmp_scanf} and friends accept format strings similar to the standard C
|
|
@code{scanf} (@pxref{Formatted Input,, Formatted Input, libc, The GNU C
|
|
Library Reference Manual}). A format specification is of the form
|
|
|
|
@example
|
|
% [flags] [width] [type] conv
|
|
@end example
|
|
|
|
MPIR adds types @samp{Z}, @samp{Q} and @samp{F} for @code{mpz_t}, @code{mpq_t}
|
|
and @code{mpf_t} respectively. @samp{Z} and @samp{Q} behave like integers.
|
|
@samp{Q} will read a @samp{/} and a denominator, if present. @samp{F} behaves
|
|
like a float.
|
|
|
|
MPIR variables don't require an @code{&} when passed to @code{gmp_scanf}, since
|
|
they're already ``call-by-reference''. For example,
|
|
|
|
@example
|
|
/* to read say "a(5) = 1234" */
|
|
int n;
|
|
mpz_t z;
|
|
gmp_scanf ("a(%d) = %Zd\n", &n, z);
|
|
|
|
mpq_t q1, q2;
|
|
gmp_sscanf ("0377 + 0x10/0x11", "%Qi + %Qi", q1, q2);
|
|
|
|
/* to read say "topleft (1.55,-2.66)" */
|
|
mpf_t x, y;
|
|
char buf[32];
|
|
gmp_scanf ("%31s (%Ff,%Ff)", buf, x, y);
|
|
@end example
|
|
|
|
All the standard C @code{scanf} types behave the same as in the C library
|
|
@code{scanf}, and can be freely intermixed with the MPIR extensions. In the
|
|
current implementation the standard parts of the format string are simply
|
|
handed to @code{scanf} and only the MPIR extensions handled directly.
|
|
|
|
The flags accepted are as follows. @samp{a} and @samp{'} will depend on
|
|
support from the C library, and @samp{'} cannot be used with MPIR types.
|
|
|
|
@quotation
|
|
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
|
|
@item @nicode{*} @tab read but don't store
|
|
@item @nicode{a} @tab allocate a buffer (string conversions)
|
|
@item @nicode{'} @tab grouped digits, GLIBC style (not MPIR types)
|
|
@end multitable
|
|
@end quotation
|
|
|
|
The standard types accepted are as follows. @samp{h} and @samp{l} are
|
|
portable, the rest will depend on the compiler (or include files) for the type
|
|
and the C library for the input.
|
|
|
|
@quotation
|
|
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
|
|
@item @nicode{h} @tab @nicode{short}
|
|
@item @nicode{hh} @tab @nicode{char}
|
|
@item @nicode{j} @tab @nicode{intmax_t} or @nicode{uintmax_t}
|
|
@item @nicode{l} @tab @nicode{long int}, @nicode{double} or @nicode{wchar_t}
|
|
@item @nicode{ll} @tab @nicode{long long}
|
|
@item @nicode{L} @tab @nicode{long double}
|
|
@item @nicode{q} @tab @nicode{quad_t} or @nicode{u_quad_t}
|
|
@item @nicode{t} @tab @nicode{ptrdiff_t}
|
|
@item @nicode{z} @tab @nicode{size_t}
|
|
@end multitable
|
|
@end quotation
|
|
|
|
@noindent
|
|
The MPIR types are
|
|
|
|
@quotation
|
|
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
|
|
@item @nicode{F} @tab @nicode{mpf_t}, float conversions
|
|
@item @nicode{Q} @tab @nicode{mpq_t}, integer conversions
|
|
@item @nicode{Z} @tab @nicode{mpz_t}, integer conversions
|
|
@end multitable
|
|
@end quotation
|
|
|
|
The conversions accepted are as follows. @samp{p} and @samp{[} will depend on
|
|
support from the C library, the rest are standard.
|
|
|
|
@quotation
|
|
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
|
|
@item @nicode{c} @tab character or characters
|
|
@item @nicode{d} @tab decimal integer
|
|
@item @nicode{e} @nicode{E} @nicode{f} @nicode{g} @nicode{G}
|
|
@tab float
|
|
@item @nicode{i} @tab integer with base indicator
|
|
@item @nicode{n} @tab characters read so far
|
|
@item @nicode{o} @tab octal integer
|
|
@item @nicode{p} @tab pointer
|
|
@item @nicode{s} @tab string of non-whitespace characters
|
|
@item @nicode{u} @tab decimal integer
|
|
@item @nicode{x} @nicode{X} @tab hex integer
|
|
@item @nicode{[} @tab string of characters in a set
|
|
@end multitable
|
|
@end quotation
|
|
|
|
@samp{e}, @samp{E}, @samp{f}, @samp{g} and @samp{G} are identical; they all
|
|
read either fixed point or scientific format, and either upper or lower case
|
|
@samp{e} for the exponent in scientific format.
|
|
|
|
C99 style hex float format (@code{printf %a}, @pxref{Formatted Output
|
|
Strings}) is always accepted for @code{mpf_t}, but for the standard float
|
|
types it will depend on the C library.
|
|
|
|
@samp{x} and @samp{X} are identical; each accepts both upper and lower case
|
|
hexadecimal.
|
|
|
|
@samp{o}, @samp{u}, @samp{x} and @samp{X} all read positive or negative
|
|
values. For the standard C types these are described as ``unsigned''
|
|
conversions, but that merely affects certain overflow handling; negatives are
|
|
still allowed (per @code{strtoul}, @pxref{Parsing of Integers,, Parsing of
|
|
Integers, libc, The GNU C Library Reference Manual}). For MPIR types there are
|
|
no overflows, so @samp{d} and @samp{u} are identical.
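
The @code{strtoul} behaviour referred to above can be checked directly. In
this small sketch (the wrapper name @code{parse_unsigned} is hypothetical) a
leading minus sign is accepted and the value is negated in the unsigned
return type, so it wraps around rather than being rejected.

```cpp
#include <climits>
#include <cstdlib>

// Parse a decimal string with strtoul.  A leading minus sign is
// accepted; the result is the negated value in unsigned arithmetic,
// that is, reduced modulo ULONG_MAX + 1.
unsigned long parse_unsigned(const char *s)
{
    return std::strtoul(s, nullptr, 10);
}
```

With an MPIR type there is no fixed width to wrap around in, which is why the
signed and unsigned conversions coincide there.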
|
|
|
|
@samp{Q} type reads the numerator and (optional) denominator as given. If the
|
|
value might not be in canonical form then @code{mpq_canonicalize} must be
|
|
called before using it in any calculations (@pxref{Rational Number
|
|
Functions}).
|
|
|
|
@samp{Qi} will read a base specification separately for the numerator and
|
|
denominator. For example @samp{0x10/11} would be 16/11, whereas
|
|
@samp{0x10/0x11} would be 16/17.
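
The rule of reading a base indicator separately for each part can be sketched
with the standard @code{strtol} and base 0, which recognises the same
@samp{0} and @samp{0x} prefixes. This is an illustration on built-in
@code{long} values only; the names @code{ParsedFraction} and
@code{parse_fraction} are hypothetical, and it is not how MPIR implements
@samp{%Qi}.

```cpp
#include <cstdlib>

struct ParsedFraction {
    long num;
    long den;
};

// Parse "num" or "num/den", detecting the base of each part on its own.
// Returns false if either part has no digits.  Note that strtol skips
// leading whitespace, which this sketch does not try to forbid.
bool parse_fraction(const char *s, ParsedFraction &out)
{
    char *end = nullptr;
    long n = std::strtol(s, &end, 0);        // base 0: 0x = hex, 0 = octal
    if (end == s)
        return false;

    long d = 1;                              // plain integer: denominator 1
    if (*end == '/') {
        const char *den_start = end + 1;
        d = std::strtol(den_start, &end, 0); // base detected afresh
        if (end == den_start)
            return false;
    }
    out.num = n;
    out.den = d;
    return true;
}
```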
|
|
|
|
@samp{n} can be used with any of the types above, even the MPIR types.
|
|
@samp{*} to suppress assignment is allowed, though in that case it would do
|
|
nothing at all.
|
|
|
|
Other conversions or types that might be accepted by the C library
|
|
@code{scanf} cannot be used through @code{gmp_scanf}.
|
|
|
|
Whitespace is read and discarded before a field, except for @samp{c} and
|
|
@samp{[} conversions.
|
|
|
|
For float conversions, the decimal point character (or string) expected is
|
|
taken from the current locale settings on systems which provide
|
|
@code{localeconv} (@pxref{Locales,, Locales and Internationalization, libc,
|
|
The GNU C Library Reference Manual}). The C library will normally do the same
|
|
for standard float input.
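
On systems providing @code{localeconv}, the decimal point currently in effect
can be inspected as in the following sketch. The wrapper name
@code{current_decimal_point} is made up for the example. A program that has
not called @code{setlocale} runs in the @samp{C} locale, where the decimal
point is @samp{.}.

```cpp
#include <clocale>
#include <string>

// Return the decimal point string of the current C locale.
std::string current_decimal_point()
{
    return std::localeconv()->decimal_point;
}
```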
|
|
|
|
The format string is only interpreted as plain @code{char}s; multibyte
|
|
characters are not recognised. Perhaps this will change in the future.
|
|
|
|
|
|
@node Formatted Input Functions, C++ Formatted Input, Formatted Input Strings, Formatted Input
|
|
@section Formatted Input Functions
|
|
@cindex Input functions
|
|
|
|
Each of the following functions is similar to the corresponding C library
|
|
function. The plain @code{scanf} forms take a variable argument list. The
|
|
@code{vscanf} forms take an argument pointer, see @ref{Variadic Functions,,
|
|
Variadic Functions, libc, The GNU C Library Reference Manual}, or @samp{man 3
|
|
va_start}.
|
|
|
|
It should be emphasised that if a format string is invalid, or the arguments
|
|
don't match what the format specifies, then the behaviour of any of these
|
|
functions will be unpredictable. GCC format string checking is not available,
|
|
since it doesn't recognise the MPIR extensions.
|
|
|
|
No overlap is permitted between the @var{fmt} string and any of the results
|
|
produced.
|
|
|
|
@deftypefun int gmp_scanf (const char *@var{fmt}, @dots{})
|
|
@deftypefunx int gmp_vscanf (const char *@var{fmt}, va_list @var{ap})
|
|
Read from the standard input @code{stdin}.
|
|
@end deftypefun
|
|
|
|
@deftypefun int gmp_fscanf (FILE *@var{fp}, const char *@var{fmt}, @dots{})
|
|
@deftypefunx int gmp_vfscanf (FILE *@var{fp}, const char *@var{fmt}, va_list @var{ap})
|
|
Read from the stream @var{fp}.
|
|
@end deftypefun
|
|
|
|
@deftypefun int gmp_sscanf (const char *@var{s}, const char *@var{fmt}, @dots{})
|
|
@deftypefunx int gmp_vsscanf (const char *@var{s}, const char *@var{fmt}, va_list @var{ap})
|
|
Read from a null-terminated string @var{s}.
|
|
@end deftypefun
|
|
|
|
The return value from each of these functions is the same as the standard C99
|
|
@code{scanf}, namely the number of fields successfully parsed and stored.
|
|
@samp{%n} fields and fields read but suppressed by @samp{*} don't count
|
|
towards the return value.
|
|
|
|
If end of input (or a file error) is reached before a character for a field or
|
|
a literal, and if no previous non-suppressed fields have matched, then the
|
|
return value is @code{EOF} instead of 0. A whitespace character in the format
|
|
string is only an optional match and doesn't induce an @code{EOF} in this
|
|
fashion. Leading whitespace read and discarded for a field doesn't count as
|
|
characters for that field.
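
Since the return value follows the standard @code{scanf}, the rules above can
be tried out with the standard @code{sscanf}. The sketch below uses
hypothetical helper names and plain @code{int} fields. It shows the count of
stored fields, the @code{EOF} result on empty input, and that suppressed
fields and @samp{%n} are not counted.

```cpp
#include <cstdio>

// Number of fields stored when reading two integers.
int count_two_ints(const char *input)
{
    int a = 0, b = 0;
    return std::sscanf(input, "%d %d", &a, &b);
}

// The first integer is read but suppressed with '*', and %n records the
// number of characters consumed.  Neither counts towards the return
// value, so a full match returns 1.
int count_with_suppression(const char *input, int *consumed)
{
    int b = 0;
    *consumed = -1;
    return std::sscanf(input, "%*d %d%n", &b, consumed);
}
```

Input consisting only of whitespace gives @code{EOF} rather than 0, because
the end of input is reached before any character of the first field.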
|
|
|
|
For the MPIR types, input parsing follows C99 rules, namely one character of
|
|
lookahead is used and characters are read while they continue to meet the
|
|
format requirements. If this doesn't provide a complete number then the
|
|
function terminates, with that field not stored nor counted towards the return
|
|
value. For instance with @code{mpf_t} an input @samp{1.23e-XYZ} would be read
|
|
up to the @samp{X} and that character pushed back since it's not a digit. The
|
|
string @samp{1.23e-} would then be considered invalid since an @samp{e} must
|
|
be followed by at least one digit.
|
|
|
|
For the standard C types, in the current implementation MPIR calls the C
|
|
library @code{scanf} functions, which might have looser rules about what
|
|
constitutes a valid input.
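
As a point of comparison only, the standard @code{strtod} is not limited to
one character of lookahead: given the same text @samp{1.23e-XYZ} it accepts
@samp{1.23} and leaves @samp{e-XYZ} unread, where the MPIR rule described
above rejects the field. The sketch below demonstrates this. The helper name
@code{unparsed_tail} is made up, and nothing here is meant to describe what
MPIR calls internally.

```cpp
#include <cstdlib>
#include <string>

// Convert the start of s with strtod and return whatever was left
// unconsumed.  The converted value is stored through value.
std::string unparsed_tail(const char *s, double *value)
{
    char *end = nullptr;
    *value = std::strtod(s, &end);
    return std::string(end);
}
```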
|
|
|
|
Note that @code{gmp_sscanf} is the same as @code{gmp_fscanf} and only does one
|
|
character of lookahead when parsing. Although clearly it could look at its
|
|
entire input, it is deliberately made identical to @code{gmp_fscanf}, the same
|
|
way C99 @code{sscanf} is the same as @code{fscanf}.
|
|
|
|
|
|
@node C++ Formatted Input, , Formatted Input Functions, Formatted Input
|
|
@section C++ Formatted Input
|
|
@cindex C++ @code{istream} input
|
|
@cindex @code{istream} input
|
|
|
|
The following functions are provided in @file{libgmpxx}/@file{libmpirxx} (@pxref{Headers and
|
|
Libraries}), which is built only if C++ support is enabled (@pxref{Build
|
|
Options}). Prototypes are available from @code{<mpir.h>}.
|
|
|
|
@deftypefun istream& operator>> (istream& @var{stream}, mpz_t @var{rop})
|
|
Read @var{rop} from @var{stream}, using its @code{ios} formatting settings.
|
|
@end deftypefun
|
|
|
|
@deftypefun istream& operator>> (istream& @var{stream}, mpq_t @var{rop})
|
|
An integer like @samp{123} will be read, or a fraction like @samp{5/9}. No
|
|
whitespace is allowed around the @samp{/}. If the fraction is not in
|
|
canonical form then @code{mpq_canonicalize} must be called (@pxref{Rational
|
|
Number Functions}) before operating on it.
|
|
|
|
As per integer input, a @samp{0} or @samp{0x} base indicator is read when
|
|
none of @code{ios::dec}, @code{ios::oct} or @code{ios::hex} are set. This is
|
|
done separately for numerator and denominator, so that for instance
|
|
@samp{0x10/11} is @math{16/11} and @samp{0x10/0x11} is @math{16/17}.
|
|
@end deftypefun
|
|
|
|
@deftypefun istream& operator>> (istream& @var{stream}, mpf_t @var{rop})
|
|
Read @var{rop} from @var{stream}, using its @code{ios} formatting settings.
|
|
|
|
Hex or octal floats are not supported. They might be in the future, though it
may be best to accept only what the standard float @code{operator>>} does.
|
|
@end deftypefun
|
|
|
|
Note that digit grouping specified by the @code{istream} locale is currently
|
|
not accepted. Perhaps this will change in the future.
|
|
|
|
@sp 1
|
|
These operators mean that MPIR types can be read in the usual C++ way, for
|
|
example,
|
|
|
|
@example
|
|
mpz_t z;
|
|
...
|
|
cin >> z;
|
|
@end example
|
|
|
|
But note that @code{istream} input (and @code{ostream} output, @pxref{C++
|
|
Formatted Output}) is the only overloading available for the MPIR types and
|
|
that for instance using @code{+} with an @code{mpz_t} will have unpredictable
|
|
results. For classes with overloading, see @ref{C++ Class Interface}.
|
|
|
|
|
|
|
|
@node C++ Class Interface, BSD Compatible Functions, Formatted Input, Top
|
|
@chapter C++ Class Interface
|
|
@cindex C++ interface
|
|
|
|
This chapter describes the C++ class based interface to MPIR.
|
|
|
|
All MPIR C language types and functions can be used in C++ programs, since
|
|
@file{mpir.h} has @code{extern "C"} qualifiers, but the class interface offers
|
|
overloaded functions and operators which may be more convenient.
|
|
|
|
Due to the implementation of this interface, a reasonably recent C++ compiler
|
|
is required, one supporting namespaces, partial specialization of templates
|
|
and member templates. For GCC this means version 2.91 or later.
|
|
|
|
@strong{Everything described in this chapter is to be considered preliminary
|
|
and might be subject to incompatible changes if some unforeseen difficulty
|
|
reveals itself.}
|
|
|
|
@menu
|
|
* C++ Interface General::
|
|
* C++ Interface Integers::
|
|
* C++ Interface Rationals::
|
|
* C++ Interface Floats::
|
|
* C++ Interface Random Numbers::
|
|
* C++ Interface Limitations::
|
|
@end menu
|
|
|
|
|
|
@node C++ Interface General, C++ Interface Integers, C++ Class Interface, C++ Class Interface
|
|
@section C++ Interface General
|
|
|
|
@noindent
|
|
All the C++ classes and functions are available with
|
|
|
|
@cindex @code{gmpxx.h}
|
|
@example
|
|
#include <gmpxx.h>
|
|
@end example
|
|
|
|
Programs should be linked with the @file{libgmpxx}/@file{libmpirxx} and @file{libgmp}/@file{libmpir}
|
|
libraries. For example,
|
|
|
|
@example
|
|
g++ mycxxprog.cc -lmpirxx -lmpir
|
|
@end example
|
|
|
|
@noindent
|
|
The classes defined are
|
|
|
|
@deftp Class mpz_class
|
|
@deftpx Class mpq_class
|
|
@deftpx Class mpf_class
|
|
@end deftp
|
|
|
|
The standard operators and various standard functions are overloaded to allow
|
|
arithmetic with these classes. For example,
|
|
|
|
@example
|
|
int
|
|
main (void)
|
|
@{
|
|
mpz_class a, b, c;
|
|
|
|
a = 1234;
|
|
b = "-5678";
|
|
c = a+b;
|
|
cout << "sum is " << c << "\n";
|
|
cout << "absolute value is " << abs(c) << "\n";
|
|
|
|
return 0;
|
|
@}
|
|
@end example
|
|
|
|
An important feature of the implementation is that an expression like
|
|
@code{a=b+c} results in a single call to the corresponding @code{mpz_add},
|
|
without using a temporary for the @code{b+c} part. Expressions which by their
|
|
nature imply intermediate values, like @code{a=b*c+d*e}, still use temporaries
|
|
though.
|
|
|
|
The classes can be freely intermixed in expressions, as can the classes and
|
|
the standard types @code{long}, @code{unsigned long} and @code{double}.
|
|
Smaller types like @code{int} or @code{float} can also be intermixed, since
|
|
C++ will promote them.
|
|
|
|
Note that @code{bool} is not accepted directly, but must be explicitly cast to
|
|
an @code{int} first. This is because C++ will automatically convert any
|
|
pointer to a @code{bool}, so if MPIR accepted @code{bool} it would make all
|
|
sorts of invalid class and pointer combinations compile but almost certainly
|
|
not do anything sensible.
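
The hazard can be reproduced without MPIR. In the following sketch the two
classes are hypothetical stand-ins: one has a constructor taking @code{bool},
the other only an integer type. The type-trait checks show that the first can
be constructed from a pointer, such as a string literal, while the second
cannot.

```cpp
#include <type_traits>

// A hypothetical class that, unlike the MPIR classes, accepts bool.
struct AcceptsBool {
    bool value;
    AcceptsBool(bool b) : value(b) {}
};

// A hypothetical class that accepts only an integer type.
struct AcceptsLong {
    long value;
    AcceptsLong(long n) : value(n) {}
};

// Any pointer converts implicitly to bool, so the first class can be
// constructed from a const char *; the second cannot.
constexpr bool bool_ctor_takes_pointer =
    std::is_constructible<AcceptsBool, const char *>::value;
constexpr bool long_ctor_takes_pointer =
    std::is_constructible<AcceptsLong, const char *>::value;
```

Constructing @code{AcceptsBool} from the literal @code{"0"} compiles and
stores @code{true}, since the pointer is non-null; the text of the string is
never looked at.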
|
|
|
|
Conversions back from the classes to standard C++ types aren't done
|
|
automatically, instead member functions like @code{get_si} are provided (see
|
|
the following sections for details).
|
|
|
|
Also there are no automatic conversions from the classes to the corresponding
|
|
MPIR C types, instead a reference to the underlying C object can be obtained
|
|
with the following functions,
|
|
|
|
@deftypefun mpz_t mpz_class::get_mpz_t ()
|
|
@deftypefunx mpq_t mpq_class::get_mpq_t ()
|
|
@deftypefunx mpf_t mpf_class::get_mpf_t ()
|
|
@end deftypefun
|
|
|
|
These can be used to call a C function which doesn't have a C++ class
|
|
interface. For example to set @code{a} to the GCD of @code{b} and @code{c},
|
|
|
|
@example
|
|
mpz_class a, b, c;
|
|
...
|
|
mpz_gcd (a.get_mpz_t(), b.get_mpz_t(), c.get_mpz_t());
|
|
@end example
|
|
|
|
In the other direction, a class can be initialized from the corresponding MPIR
|
|
C type, or assigned to if an explicit constructor is used. In both cases this
|
|
makes a copy of the value, it doesn't create any sort of association. For
|
|
example,
|
|
|
|
@example
|
|
mpz_t z;
|
|
// ... init and calculate z ...
|
|
mpz_class x(z);
|
|
mpz_class y;
|
|
y = mpz_class (z);
|
|
@end example
|
|
|
|
There are no namespace setups in @file{gmpxx.h}, all types and functions are
|
|
simply put into the global namespace. This is what @file{mpir.h} has done in
|
|
the past, and continues to do for compatibility. The extras provided by
|
|
@file{gmpxx.h} follow MPIR naming conventions and are unlikely to clash with
|
|
anything.
|
|
|
|
|
|
@node C++ Interface Integers, C++ Interface Rationals, C++ Interface General, C++ Class Interface
|
|
@section C++ Interface Integers
|
|
|
|
@deftypefun void mpz_class::mpz_class (type @var{n})
|
|
Construct an @code{mpz_class}. All the standard C++ types may be used, except
|
|
@code{long long} and @code{long double}, and all the MPIR C++ classes can be
|
|
used. Any necessary conversion follows the corresponding C function, for
|
|
example @code{double} follows @code{mpz_set_d} (@pxref{Assigning Integers}).
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_class::mpz_class (mpz_t @var{z})
|
|
Construct an @code{mpz_class} from an @code{mpz_t}. The value in @var{z} is
|
|
copied into the new @code{mpz_class}, there won't be any permanent association
|
|
between it and @var{z}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpz_class::mpz_class (const char *@var{s})
|
|
@deftypefunx void mpz_class::mpz_class (const char *@var{s}, int @var{base} = 0)
|
|
@deftypefunx void mpz_class::mpz_class (const string& @var{s})
|
|
@deftypefunx void mpz_class::mpz_class (const string& @var{s}, int @var{base} = 0)
|
|
Construct an @code{mpz_class} converted from a string using @code{mpz_set_str}
|
|
(@pxref{Assigning Integers}).
|
|
|
|
If the string is not a valid integer, an @code{std::invalid_argument}
|
|
exception is thrown. The same applies to @code{operator=}.
|
|
@end deftypefun
|
|
|
|
@deftypefun mpz_class operator/ (mpz_class @var{a}, mpz_class @var{d})
|
|
@deftypefunx mpz_class operator% (mpz_class @var{a}, mpz_class @var{d})
|
|
Divisions involving @code{mpz_class} round towards zero, as per the
|
|
@code{mpz_tdiv_q} and @code{mpz_tdiv_r} functions (@pxref{Integer Division}).
|
|
This is the same as the C99 @code{/} and @code{%} operators.
|
|
|
|
The @code{mpz_fdiv@dots{}} or @code{mpz_cdiv@dots{}} functions can always be called
|
|
directly if desired. For example,
|
|
|
|
@example
|
|
mpz_class q, a, d;
|
|
...
|
|
mpz_fdiv_q (q.get_mpz_t(), a.get_mpz_t(), d.get_mpz_t());
|
|
@end example
|
|
@end deftypefun
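
The two rounding conventions differ only when the operands have opposite
signs and the division is inexact. The following sketch shows the difference
on built-in @code{long} values; the helper names are hypothetical and
@var{d} is assumed to be non-zero.

```cpp
// Quotient rounded towards zero, as the built-in / operator does
// (the convention described above for mpz_class).
long trunc_div(long a, long d)
{
    return a / d;
}

// Quotient rounded towards minus infinity, the convention of the
// mpz_fdiv functions, built on top of the truncating operator.
long floor_div(long a, long d)
{
    long q = a / d;
    // Step down by one when there is a remainder and the signs differ.
    if (a % d != 0 && ((a < 0) != (d < 0)))
        --q;
    return q;
}
```

For example @math{-7/2} truncates to @math{-3} with remainder @math{-1}, but
floors to @math{-4}.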
|
|
|
|
@deftypefun mpz_class abs (mpz_class @var{op1})
|
|
@deftypefunx int cmp (mpz_class @var{op1}, type @var{op2})
|
|
@deftypefunx int cmp (type @var{op1}, mpz_class @var{op2})
|
|
@maybepagebreak
|
|
@deftypefunx bool mpz_class::fits_sint_p (void)
|
|
@deftypefunx bool mpz_class::fits_slong_p (void)
|
|
@deftypefunx bool mpz_class::fits_sshort_p (void)
|
|
@maybepagebreak
|
|
@deftypefunx bool mpz_class::fits_uint_p (void)
|
|
@deftypefunx bool mpz_class::fits_ulong_p (void)
|
|
@deftypefunx bool mpz_class::fits_ushort_p (void)
|
|
@maybepagebreak
|
|
@deftypefunx double mpz_class::get_d (void)
|
|
@deftypefunx long mpz_class::get_si (void)
|
|
@deftypefunx string mpz_class::get_str (int @var{base} = 10)
|
|
@deftypefunx {unsigned long} mpz_class::get_ui (void)
|
|
@maybepagebreak
|
|
@deftypefunx int mpz_class::set_str (const char *@var{str}, int @var{base})
|
|
@deftypefunx int mpz_class::set_str (const string& @var{str}, int @var{base})
|
|
@deftypefunx int sgn (mpz_class @var{op})
|
|
@deftypefunx mpz_class sqrt (mpz_class @var{op})
|
|
These functions provide a C++ class interface to the corresponding MPIR C
|
|
routines.
|
|
|
|
@code{cmp} can be used with any of the classes or the standard C++ types,
|
|
except @code{long long} and @code{long double}.
|
|
@end deftypefun
|
|
|
|
@sp 1
|
|
Overloaded operators for combinations of @code{mpz_class} and @code{double}
|
|
are provided for completeness, but it should be noted that if the given
|
|
@code{double} is not an integer then the way any rounding is done is currently
|
|
unspecified. The rounding might take place at the start, in the middle, or at
|
|
the end of the operation, and it might change in the future.
|
|
|
|
Conversions between @code{mpz_class} and @code{double}, however, are defined
|
|
to follow the corresponding C functions @code{mpz_get_d} and @code{mpz_set_d}.
|
|
And comparisons are always made exactly, as per @code{mpz_cmp_d}.
|
|
|
|
|
|
@node C++ Interface Rationals, C++ Interface Floats, C++ Interface Integers, C++ Class Interface
|
|
@section C++ Interface Rationals
|
|
|
|
In all the following constructors, if a fraction is given then it should be in
|
|
canonical form; if it is not, @code{mpq_class::canonicalize} must be called.
|
|
|
|
@deftypefun void mpq_class::mpq_class (type @var{op})
|
|
@deftypefunx void mpq_class::mpq_class (integer @var{num}, integer @var{den})
|
|
Construct an @code{mpq_class}. The initial value can be a single value of any
|
|
type, or a pair of integers (@code{mpz_class} or standard C++ integer types)
|
|
representing a fraction, except that @code{long long} and @code{long double}
|
|
are not supported. For example,
|
|
|
|
@example
|
|
mpq_class q (99);
|
|
mpq_class q (1.75);
|
|
mpq_class q (1, 3);
|
|
@end example
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_class::mpq_class (mpq_t @var{q})
|
|
Construct an @code{mpq_class} from an @code{mpq_t}. The value in @var{q} is
|
|
copied into the new @code{mpq_class}, there won't be any permanent association
|
|
between it and @var{q}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_class::mpq_class (const char *@var{s})
|
|
@deftypefunx void mpq_class::mpq_class (const char *@var{s}, int @var{base} = 0)
|
|
@deftypefunx void mpq_class::mpq_class (const string& @var{s})
|
|
@deftypefunx void mpq_class::mpq_class (const string& @var{s}, int @var{base} = 0)
|
|
Construct an @code{mpq_class} converted from a string using @code{mpq_set_str}
|
|
(@pxref{Initializing Rationals}).
|
|
|
|
If the string is not a valid rational, an @code{std::invalid_argument}
|
|
exception is thrown. The same applies to @code{operator=}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mpq_class::canonicalize ()
|
|
Put an @code{mpq_class} into canonical form, as per @ref{Rational Number
|
|
Functions}. All arithmetic operators require their operands in canonical
|
|
form, and will return results in canonical form.
|
|
@end deftypefun
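
Canonical form means the numerator and denominator have no common factor, the
denominator is positive, and zero is represented as 0/1. The following sketch
applies those rules to a pair of built-in @code{long} values, as an
illustration of what canonicalization achieves. The names @code{Ratio},
@code{gcd_abs} and @code{canonical} are hypothetical, the denominator is
assumed non-zero, and this is not the MPIR implementation.

```cpp
struct Ratio {
    long num;
    long den;
};

// Greatest common divisor of |a| and |b| by Euclid's algorithm.
long gcd_abs(long a, long b)
{
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Remove common factors and make the denominator positive.
Ratio canonical(Ratio r)
{
    long g = gcd_abs(r.num, r.den);
    if (g != 0) {
        r.num /= g;
        r.den /= g;
    }
    if (r.den < 0) {
        r.num = -r.num;
        r.den = -r.den;
    }
    return r;
}
```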
|
|
|
|
@deftypefun mpq_class abs (mpq_class @var{op})
|
|
@deftypefunx int cmp (mpq_class @var{op1}, type @var{op2})
|
|
@deftypefunx int cmp (type @var{op1}, mpq_class @var{op2})
|
|
@maybepagebreak
|
|
@deftypefunx double mpq_class::get_d (void)
|
|
@deftypefunx string mpq_class::get_str (int @var{base} = 10)
|
|
@maybepagebreak
|
|
@deftypefunx int mpq_class::set_str (const char *@var{str}, int @var{base})
|
|
@deftypefunx int mpq_class::set_str (const string& @var{str}, int @var{base})
|
|
@deftypefunx int sgn (mpq_class @var{op})
|
|
These functions provide a C++ class interface to the corresponding MPIR C
|
|
routines.
|
|
|
|
@code{cmp} can be used with any of the classes or the standard C++ types,
|
|
except @code{long long} and @code{long double}.
|
|
@end deftypefun
|
|
|
|
@deftypefun {mpz_class&} mpq_class::get_num ()
|
|
@deftypefunx {mpz_class&} mpq_class::get_den ()
|
|
Get a reference to an @code{mpz_class} which is the numerator or denominator
|
|
of an @code{mpq_class}. This can be used both for read and write access. If
|
|
the object returned is modified, it modifies the original @code{mpq_class}.
|
|
|
|
If direct manipulation might produce a non-canonical value, then
|
|
@code{mpq_class::canonicalize} must be called before further operations.
|
|
@end deftypefun
|
|
|
|
@deftypefun mpz_t mpq_class::get_num_mpz_t ()
|
|
@deftypefunx mpz_t mpq_class::get_den_mpz_t ()
|
|
Get a reference to the underlying @code{mpz_t} numerator or denominator of an
|
|
@code{mpq_class}. This can be passed to C functions expecting an
|
|
@code{mpz_t}. Any modifications made to the @code{mpz_t} will modify the
|
|
original @code{mpq_class}.
|
|
|
|
If direct manipulation might produce a non-canonical value, then
|
|
@code{mpq_class::canonicalize} must be called before further operations.
|
|
@end deftypefun
|
|
|
|
@deftypefun istream& operator>> (istream& @var{stream}, mpq_class& @var{rop});
|
|
Read @var{rop} from @var{stream}, using its @code{ios} formatting settings,
|
|
the same as @code{mpq_t operator>>} (@pxref{C++ Formatted Input}).
|
|
|
|
If the @var{rop} read might not be in canonical form then
|
|
@code{mpq_class::canonicalize} must be called.
|
|
@end deftypefun
|
|
|
|
|
|
@node C++ Interface Floats, C++ Interface Random Numbers, C++ Interface Rationals, C++ Class Interface
|
|
@section C++ Interface Floats
|
|
|
|
When an expression requires the use of temporary intermediate @code{mpf_class}
|
|
values, like @code{f=g*h+x*y}, those temporaries will have the same precision
|
|
as the destination @code{f}. Explicit constructors can be used if this
|
|
doesn't suit.
|
|
|
|
@deftypefun {} mpf_class::mpf_class (type @var{op})
|
|
@deftypefunx {} mpf_class::mpf_class (type @var{op}, unsigned long @var{prec})
|
|
Construct an @code{mpf_class}. Any standard C++ type can be used, except
|
|
@code{long long} and @code{long double}, and any of the MPIR C++ classes can be
|
|
used.
|
|
|
|
If @var{prec} is given, the initial precision is that value, in bits. If
|
|
@var{prec} is not given, then the initial precision is determined by the type
|
|
of @var{op} given. An @code{mpz_class}, @code{mpq_class}, or C++
|
|
builtin type will give the default @code{mpf} precision (@pxref{Initializing
|
|
Floats}). An @code{mpf_class} or expression will give the precision of that
|
|
value. The precision of a binary expression is the higher of the two
|
|
operands.
|
|
|
|
@example
|
|
mpf_class f(1.5); // default precision
|
|
mpf_class f(1.5, 500); // 500 bits (at least)
|
|
mpf_class f(x); // precision of x
|
|
mpf_class f(abs(x)); // precision of x
|
|
mpf_class f(-g, 1000); // 1000 bits (at least)
|
|
mpf_class f(x+y); // greater of precisions of x and y
|
|
@end example
|
|
@end deftypefun
|
|
|
|
@deftypefun {} mpf_class::mpf_class (const char *@var{s})
@deftypefunx {} mpf_class::mpf_class (const char *@var{s}, unsigned long @var{prec}, int @var{base} = 0)
@deftypefunx {} mpf_class::mpf_class (const string& @var{s})
@deftypefunx {} mpf_class::mpf_class (const string& @var{s}, unsigned long @var{prec}, int @var{base} = 0)
|
|
Construct an @code{mpf_class} converted from a string using @code{mpf_set_str}
|
|
(@pxref{Assigning Floats}). If @var{prec} is given, the initial precision is
|
|
that value, in bits. If not, the default @code{mpf} precision
|
|
(@pxref{Initializing Floats}) is used.
|
|
|
|
If the string is not a valid float, an @code{std::invalid_argument} exception
|
|
is thrown. The same applies to @code{operator=}.
|
|
@end deftypefun
|
|
|
|
@deftypefun {mpf_class&} mpf_class::operator= (type @var{op})
|
|
Convert and store the given @var{op} value to an @code{mpf_class} object. The
|
|
same types are accepted as for the constructors above.
|
|
|
|
Note that @code{operator=} only stores a new value; it doesn't copy or change
the precision of the destination. Instead the value is truncated if necessary.
|
|
This is the same as @code{mpf_set} etc. Note in particular this means for
|
|
@code{mpf_class} a copy constructor is not the same as a default constructor
|
|
plus assignment.
|
|
|
|
@example
|
|
mpf_class x (y); // x created with precision of y
|
|
|
|
mpf_class x; // x created with default precision
|
|
x = y; // value truncated to that precision
|
|
@end example
|
|
|
|
Applications using templated code may need to be careful about the assumptions
|
|
the code makes in this area, when working with @code{mpf_class} values of
|
|
various different or non-default precisions. For instance implementations of
|
|
the standard @code{complex} template have been seen in both styles above,
|
|
though of course @code{complex} is normally only actually specified for use
|
|
with the builtin float types.
|
|
@end deftypefun
|
|
|
|
@deftypefun mpf_class abs (mpf_class @var{op})
|
|
@deftypefunx mpf_class ceil (mpf_class @var{op})
|
|
@deftypefunx int cmp (mpf_class @var{op1}, type @var{op2})
|
|
@deftypefunx int cmp (type @var{op1}, mpf_class @var{op2})
|
|
@maybepagebreak
|
|
@deftypefunx bool mpf_class::fits_sint_p (void)
|
|
@deftypefunx bool mpf_class::fits_slong_p (void)
|
|
@deftypefunx bool mpf_class::fits_sshort_p (void)
|
|
@maybepagebreak
|
|
@deftypefunx bool mpf_class::fits_uint_p (void)
|
|
@deftypefunx bool mpf_class::fits_ulong_p (void)
|
|
@deftypefunx bool mpf_class::fits_ushort_p (void)
|
|
@maybepagebreak
|
|
@deftypefunx mpf_class floor (mpf_class @var{op})
|
|
@deftypefunx mpf_class hypot (mpf_class @var{op1}, mpf_class @var{op2})
|
|
@maybepagebreak
|
|
@deftypefunx double mpf_class::get_d (void)
|
|
@deftypefunx long mpf_class::get_si (void)
|
|
@deftypefunx string mpf_class::get_str (mp_exp_t& @var{exp}, int @var{base} = 10, size_t @var{digits} = 0)
|
|
@deftypefunx {unsigned long} mpf_class::get_ui (void)
|
|
@maybepagebreak
|
|
@deftypefunx int mpf_class::set_str (const char *@var{str}, int @var{base})
|
|
@deftypefunx int mpf_class::set_str (const string& @var{str}, int @var{base})
|
|
@deftypefunx int sgn (mpf_class @var{op})
|
|
@deftypefunx mpf_class sqrt (mpf_class @var{op})
|
|
@deftypefunx mpf_class trunc (mpf_class @var{op})
|
|
These functions provide a C++ class interface to the corresponding MPIR C
|
|
routines.
|
|
|
|
@code{cmp} can be used with any of the classes or the standard C++ types,
|
|
except @code{long long} and @code{long double}.
|
|
|
|
The accuracy provided by @code{hypot} is not currently guaranteed.
|
|
@end deftypefun
|
|
|
|
@deftypefun {unsigned long int} mpf_class::get_prec ()
|
|
@deftypefunx void mpf_class::set_prec (unsigned long @var{prec})
|
|
@deftypefunx void mpf_class::set_prec_raw (unsigned long @var{prec})
|
|
Get or set the current precision of an @code{mpf_class}.
|
|
|
|
The restrictions described for @code{mpf_set_prec_raw} (@pxref{Initializing
|
|
Floats}) apply to @code{mpf_class::set_prec_raw}. Note in particular that the
|
|
@code{mpf_class} must be restored to its allocated precision before being
destroyed. This must be done by application code; there's no automatic
|
|
mechanism for it.
|
|
@end deftypefun
|
|
|
|
|
|
@node C++ Interface Random Numbers, C++ Interface Limitations, C++ Interface Floats, C++ Class Interface
|
|
@section C++ Interface Random Numbers
|
|
|
|
@deftp Class gmp_randclass
|
|
The C++ class interface to the MPIR random number functions uses
|
|
@code{gmp_randclass} to hold an algorithm selection and current state, as per
|
|
@code{gmp_randstate_t}.
|
|
@end deftp
|
|
|
|
@deftypefun {} gmp_randclass::gmp_randclass (void (*@var{randinit}) (gmp_randstate_t, @dots{}), @dots{})
|
|
Construct a @code{gmp_randclass}, using a call to the given @var{randinit}
|
|
function (@pxref{Random State Initialization}). The arguments expected are
|
|
the same as @var{randinit}, but with @code{mpz_class} instead of @code{mpz_t}.
|
|
For example,
|
|
|
|
@example
|
|
gmp_randclass r1 (gmp_randinit_default);
|
|
gmp_randclass r2 (gmp_randinit_lc_2exp_size, 32);
|
|
gmp_randclass r3 (gmp_randinit_lc_2exp, a, c, m2exp);
|
|
gmp_randclass r4 (gmp_randinit_mt);
|
|
@end example
|
|
|
|
@code{gmp_randinit_lc_2exp_size} will fail if the size requested is too big;
|
|
an @code{std::length_error} exception is thrown in that case.
|
|
@end deftypefun
|
|
|
|
@deftypefun {} gmp_randclass::gmp_randclass (gmp_randalg_t @var{alg}, @dots{})
|
|
Construct a @code{gmp_randclass} using the same parameters as
|
|
@code{gmp_randinit} (@pxref{Random State Initialization}). This function is
|
|
obsolete and the above @var{randinit} style should be preferred.
|
|
@end deftypefun
|
|
|
|
@deftypefun void gmp_randclass::seed (unsigned long int @var{s})
|
|
@deftypefunx void gmp_randclass::seed (mpz_class @var{s})
|
|
Seed a random number generator. @xref{Random Number Functions}, for how
|
|
to choose a good seed.
|
|
@end deftypefun
|
|
|
|
@deftypefun mpz_class gmp_randclass::get_z_bits (unsigned long @var{bits})
|
|
@deftypefunx mpz_class gmp_randclass::get_z_bits (mpz_class @var{bits})
|
|
Generate a random integer with a specified number of bits.
|
|
@end deftypefun
|
|
|
|
@deftypefun mpz_class gmp_randclass::get_z_range (mpz_class @var{n})
|
|
Generate a random integer in the range 0 to @math{@var{n}-1} inclusive.
|
|
@end deftypefun
|
|
|
|
@deftypefun mpf_class gmp_randclass::get_f ()
|
|
@deftypefunx mpf_class gmp_randclass::get_f (unsigned long @var{prec})
|
|
Generate a random float @var{f} in the range @math{0 <= @var{f} < 1}. @var{f}
|
|
will be to @var{prec} bits precision, or if @var{prec} is not given then to
|
|
the precision of the destination. For example,
|
|
|
|
@example
|
|
gmp_randclass r (gmp_randinit_default);
|
|
...
|
|
mpf_class f (0, 512); // 512 bits precision
|
|
f = r.get_f(); // random number, 512 bits
|
|
@end example
|
|
@end deftypefun
|
|
|
|
|
|
|
|
@node C++ Interface Limitations, , C++ Interface Random Numbers, C++ Class Interface
|
|
@section C++ Interface Limitations
|
|
|
|
@table @asis
|
|
@item @code{mpq_class} and Templated Reading
|
|
A generic piece of template code probably won't know that @code{mpq_class}
|
|
requires a @code{canonicalize} call if inputs read with @code{operator>>}
|
|
might be non-canonical. This can lead to incorrect results.
|
|
|
|
@code{operator>>} behaves as it does for reasons of efficiency. A
|
|
canonicalize can be quite time consuming on large operands, and is best
|
|
avoided if it's not necessary.
|
|
|
|
But this potential difficulty reduces the usefulness of @code{mpq_class}.
|
|
Perhaps a mechanism to tell @code{operator>>} what to do will be adopted in
|
|
the future, maybe a preprocessor define, a global flag, or an @code{ios} flag
|
|
pressed into service. Or maybe, at the risk of inconsistency, the
|
|
@code{mpq_class} @code{operator>>} could canonicalize and leave @code{mpq_t}
|
|
@code{operator>>} not doing so, for use on those occasions when that's
|
|
acceptable. Send feedback or alternate ideas to @uref{http://groups.google.com/group/mpir-devel}.
|
|
|
|
@item Subclassing
|
|
Subclassing the MPIR C++ classes works, but is not currently recommended.
|
|
|
|
Expressions involving subclasses resolve correctly (or seem to), but in normal
|
|
C++ fashion the subclass doesn't inherit constructors and assignments.
|
|
There are many of those in the MPIR classes, and a good way to reestablish them
|
|
in a subclass is not yet provided.
|
|
|
|
@item Templated Expressions
|
|
A subtle difficulty exists when using expressions together with
|
|
application-defined template functions. Consider the following, with @code{T}
|
|
intended to be some numeric type,
|
|
|
|
@example
|
|
template <class T>
|
|
T fun (const T &, const T &);
|
|
@end example
|
|
|
|
@noindent
|
|
When used with, say, plain @code{mpz_class} variables, it works fine: @code{T}
|
|
is resolved as @code{mpz_class}.
|
|
|
|
@example
|
|
mpz_class f(1), g(2);
|
|
fun (f, g); // Good
|
|
@end example
|
|
|
|
@noindent
|
|
But when one of the arguments is an expression, template argument deduction
fails and the call doesn't compile.
|
|
|
|
@example
|
|
mpz_class f(1), g(2), h(3);
|
|
fun (f, g+h); // Bad
|
|
@end example
|
|
|
|
This is because @code{g+h} ends up being a certain expression template type
|
|
internal to @code{gmpxx.h}, which the C++ template resolution rules are unable
|
|
to automatically convert to @code{mpz_class}. The workaround is simply to add
|
|
an explicit cast.
|
|
|
|
@example
|
|
mpz_class f(1), g(2), h(3);
|
|
fun (f, mpz_class(g+h)); // Good
|
|
@end example
|
|
|
|
Similarly, within @code{fun} it may be necessary to cast an expression to type
|
|
@code{T} when calling a templated @code{fun2}.
|
|
|
|
@example
|
|
template <class T>
|
|
void fun (T f, T g)
|
|
@{
|
|
fun2 (f, f+g); // Bad
|
|
@}
|
|
|
|
template <class T>
|
|
void fun (T f, T g)
|
|
@{
|
|
fun2 (f, T(f+g)); // Good
|
|
@}
|
|
@end example
|
|
@end table
|
|
|
|
|
|
@node BSD Compatible Functions, Custom Allocation, C++ Class Interface, Top
|
|
@comment node-name, next, previous, up
|
|
@chapter Berkeley MP Compatible Functions
|
|
@cindex Berkeley MP compatible functions
|
|
@cindex BSD MP compatible functions
|
|
|
|
These functions are intended to be fully compatible with the Berkeley MP
|
|
library which is available on many BSD derived U*ix systems. The
|
|
@samp{--enable-mpbsd} option must be used when building MPIR to make these
|
|
available (@pxref{Installing MPIR}).
|
|
|
|
The original Berkeley MP library has a usage restriction: you cannot use the
|
|
same variable as both source and destination in a single function call. The
|
|
compatible functions in MPIR do not share this restriction---inputs and
|
|
outputs may overlap.
|
|
|
|
It is not recommended that new programs be written using these functions.
|
|
Apart from the incomplete set of functions, the interface for initializing
|
|
@code{MINT} objects is more error prone, and the @code{pow} function collides
|
|
with @code{pow} in @file{libm.a}.
|
|
|
|
@cindex @code{mp.h}
|
|
@tindex MINT
|
|
Include the header @file{mp.h} to get the definition of the necessary types and
|
|
functions. If you are on a BSD derived system, make sure to include GNU
|
|
@file{mp.h} if you are going to link the GNU @file{libmp.a} to your program.
|
|
This means that you probably need to give the @samp{-I<dir>} option to the
|
|
compiler, where @samp{<dir>} is the directory where you have GNU @file{mp.h}.
|
|
|
|
@deftypefun {MINT *} itom (signed short int @var{initial_value})
|
|
Allocate an integer consisting of a @code{MINT} object and dynamic limb space.
|
|
Initialize the integer to @var{initial_value}. Return a pointer to the
|
|
@code{MINT} object.
|
|
@end deftypefun
|
|
|
|
@deftypefun {MINT *} xtom (char *@var{initial_value})
|
|
Allocate an integer consisting of a @code{MINT} object and dynamic limb space.
|
|
Initialize the integer from @var{initial_value}, a hexadecimal,
|
|
null-terminated C string. Return a pointer to the @code{MINT} object.
|
|
@end deftypefun
|
|
|
|
@deftypefun void move (MINT *@var{src}, MINT *@var{dest})
|
|
Set @var{dest} to @var{src} by copying. Both variables must be previously
|
|
initialized.
|
|
@end deftypefun
|
|
|
|
@deftypefun void madd (MINT *@var{src_1}, MINT *@var{src_2}, MINT *@var{destination})
|
|
Add @var{src_1} and @var{src_2} and put the sum in @var{destination}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void msub (MINT *@var{src_1}, MINT *@var{src_2}, MINT *@var{destination})
|
|
Subtract @var{src_2} from @var{src_1} and put the difference in
|
|
@var{destination}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mult (MINT *@var{src_1}, MINT *@var{src_2}, MINT *@var{destination})
|
|
Multiply @var{src_1} and @var{src_2} and put the product in @var{destination}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mdiv (MINT *@var{dividend}, MINT *@var{divisor}, MINT *@var{quotient}, MINT *@var{remainder})
|
|
@deftypefunx void sdiv (MINT *@var{dividend}, signed short int @var{divisor}, MINT *@var{quotient}, signed short int *@var{remainder})
|
|
Set @var{quotient} to @var{dividend}/@var{divisor}, and @var{remainder} to
|
|
@var{dividend} mod @var{divisor}. The quotient is rounded towards zero; the
|
|
remainder has the same sign as the dividend unless it is zero.
|
|
|
|
Some implementations of these functions work differently---or not at all---for
|
|
negative arguments.
|
|
@end deftypefun
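
As a rough illustration of the rounding rule described for @code{mdiv} and
@code{sdiv} (this is not MPIR code), the built-in @code{/} and @code{%}
operators of C99 follow the same convention on machine integers: the quotient
is rounded towards zero and a non-zero remainder takes the sign of the
dividend. The helper name @code{trunc_divmod} is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper, not part of MPIR: divide n by d with the same
   rounding convention documented for mdiv, on plain long values. */
void trunc_divmod (long n, long d, long *q, long *r)
{
  assert (d != 0);   /* division by zero is not handled in this sketch */
  *q = n / d;        /* C99 rounds the quotient towards zero */
  *r = n % d;        /* so the remainder is zero or has the sign of n */
}
```

For instance dividing @minus{}7 by 2 this way gives a quotient of @minus{}3 and
a remainder of @minus{}1, since @minus{}7 = 2*(@minus{}3) + (@minus{}1).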
|
|
|
|
@deftypefun void msqrt (MINT *@var{op}, MINT *@var{root}, MINT *@var{remainder})
|
|
Set @var{root} to @m{\lfloor\sqrt{@var{op}}\rfloor, the truncated integer part
|
|
of the square root of @var{op}}, like @code{mpz_sqrt}. Set @var{remainder} to
|
|
@m{(@var{op} - @var{root}^2), @var{op}@minus{}@var{root}*@var{root}}, i.e.
|
|
zero if @var{op} is a perfect square.
|
|
|
|
If @var{root} and @var{remainder} are the same variable, the results are
|
|
undefined.
|
|
@end deftypefun
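
The relationship between @var{root} and @var{remainder} can be sketched on a
single machine word. The following is only an illustration under that
assumption, using the classical bit-by-bit square root method; it is not
claimed to be how @code{msqrt} is implemented, and the name @code{isqrt_rem}
is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper, not part of MPIR: store floor(sqrt(op)) in *root
   and op - root*root in *rem, for a single unsigned long. */
void isqrt_rem (unsigned long op, unsigned long *root, unsigned long *rem)
{
  unsigned long r = 0;
  unsigned long n = op;
  /* largest power of 4 representable in an unsigned long */
  unsigned long bit = 1UL << (sizeof (unsigned long) * CHAR_BIT - 2);

  assert (root != rem);   /* same restriction as documented for msqrt */

  while (bit > n)         /* start at the largest power of 4 not above op */
    bit >>= 2;

  while (bit != 0)
    {
      if (n >= r + bit)
        {
          n -= r + bit;         /* this bit belongs in the root */
          r = (r >> 1) + bit;
        }
      else
        r >>= 1;
      bit >>= 2;
    }

  *root = r;
  *rem = n;               /* what is left over is op - r*r */
}
```

With @var{op} equal to 10 this gives a root of 3 and a remainder of 1, and
with @var{op} equal to 16 a root of 4 and a remainder of 0.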
|
|
|
|
@deftypefun void pow (MINT *@var{base}, MINT *@var{exp}, MINT *@var{mod}, MINT *@var{dest})
|
|
Set @var{dest} to (@var{base} raised to @var{exp}) modulo @var{mod}.
|
|
|
|
Note that the name @code{pow} clashes with @code{pow} from the standard C math
|
|
library (@pxref{Exponents and Logarithms,, Exponentiation and Logarithms,
|
|
libc, The GNU C Library Reference Manual}). An application will only be able
|
|
to use one or the other.
|
|
@end deftypefun
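
To make the meaning of ``(@var{base} raised to @var{exp}) modulo @var{mod}''
concrete, here is a minimal sketch on machine words using binary
square-and-multiply. It is an illustration only and is not claimed to be the
method used by the library; the name @code{powmod_sketch} is hypothetical, and
the modulus is assumed to fit in 32 bits so that no intermediate product
overflows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of MPIR: (base^exp) mod mod on 64-bit
   words, assuming mod fits in 32 bits. */
uint64_t powmod_sketch (uint64_t base, uint64_t exp, uint64_t mod)
{
  uint64_t result;

  assert (mod != 0 && mod <= UINT32_MAX);  /* keeps products below 2^64 */

  result = 1 % mod;      /* handles mod == 1 */
  base %= mod;
  while (exp != 0)
    {
      if (exp & 1)
        result = (result * base) % mod;    /* multiply in this bit */
      base = (base * base) % mod;          /* square for the next bit */
      exp >>= 1;
    }
  return result;
}
```

For example @code{powmod_sketch (4, 13, 497)} evaluates to 445.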
|
|
|
|
@deftypefun void rpow (MINT *@var{base}, signed short int @var{exp}, MINT *@var{dest})
|
|
Set @var{dest} to @var{base} raised to @var{exp}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void gcd (MINT *@var{op1}, MINT *@var{op2}, MINT *@var{res})
|
|
Set @var{res} to the greatest common divisor of @var{op1} and @var{op2}.
|
|
@end deftypefun
|
|
|
|
@deftypefun int mcmp (MINT *@var{op1}, MINT *@var{op2})
|
|
Compare @var{op1} and @var{op2}. Return a positive value if @var{op1} >
|
|
@var{op2}, zero if @var{op1} = @var{op2}, and a negative value if @var{op1} <
|
|
@var{op2}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void min (MINT *@var{dest})
|
|
Input a decimal string from @code{stdin}, and put the read integer in
|
|
@var{dest}. SPC and TAB are allowed in the number string, and are ignored.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mout (MINT *@var{src})
|
|
Output @var{src} to @code{stdout}, as a decimal string. Also output a newline.
|
|
@end deftypefun
|
|
|
|
@deftypefun {char *} mtox (MINT *@var{op})
|
|
Convert @var{op} to a hexadecimal string, and return a pointer to the string.
|
|
The returned string is allocated using the current memory allocation function
(@pxref{Custom Allocation}),
|
|
@code{malloc} by default. It will be @code{strlen(str)+1} bytes, that being
|
|
exactly enough for the string and null-terminator.
|
|
@end deftypefun
|
|
|
|
@deftypefun void mfree (MINT *@var{op})
|
|
De-allocate the space used by @var{op}. @strong{This function should only be
|
|
passed a value returned by @code{itom} or @code{xtom}.}
|
|
@end deftypefun
|
|
|
|
|
|
@node Custom Allocation, Language Bindings, BSD Compatible Functions, Top
|
|
@comment node-name, next, previous, up
|
|
@chapter Custom Allocation
|
|
@cindex Custom allocation
|
|
@cindex Memory allocation
|
|
@cindex Allocation of memory
|
|
|
|
By default MPIR uses @code{malloc}, @code{realloc} and @code{free} for memory
|
|
allocation, and if they fail MPIR prints a message to the standard error output
|
|
and terminates the program.
|
|
|
|
Alternate functions can be specified, to allocate memory in a different way or
|
|
to have a different error action on running out of memory.
|
|
|
|
This feature is available in the Berkeley compatibility library (@pxref{BSD
|
|
Compatible Functions}) as well as the main MPIR library.
|
|
|
|
@deftypefun void mp_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t, size_t), @* void (*@var{free_func_ptr}) (void *, size_t))
|
|
Replace the current allocation functions with those given by the arguments. If an argument
|
|
is @code{NULL}, the corresponding default function is used.
|
|
|
|
These functions will be used for all memory allocation done by MPIR, apart from
|
|
temporary space from @code{alloca} if that function is available and MPIR is
|
|
configured to use it (@pxref{Build Options}).
|
|
|
|
@strong{Be sure to call @code{mp_set_memory_functions} only when there are no
|
|
active MPIR objects allocated using the previous memory functions! Usually
|
|
that means calling it before any other MPIR function.}
|
|
@end deftypefun
|
|
|
|
The functions supplied should fit the following declarations:
|
|
|
|
@deftypevr Function {void *} allocate_function (size_t @var{alloc_size})
|
|
Return a pointer to newly allocated space with at least @var{alloc_size}
|
|
bytes.
|
|
@end deftypevr
|
|
|
|
@deftypevr Function {void *} reallocate_function (void *@var{ptr}, size_t @var{old_size}, size_t @var{new_size})
|
|
Resize a previously allocated block @var{ptr} of @var{old_size} bytes to be
|
|
@var{new_size} bytes.
|
|
|
|
The block may be moved if necessary or if desired, and in that case the
|
|
smaller of @var{old_size} and @var{new_size} bytes must be copied to the new
|
|
location. The return value is a pointer to the resized block, that being the
|
|
new location if moved or just @var{ptr} if not.
|
|
|
|
@var{ptr} is never @code{NULL}; it's always a previously allocated block.
|
|
@var{new_size} may be bigger or smaller than @var{old_size}.
|
|
@end deftypevr
|
|
|
|
@deftypevr Function void free_function (void *@var{ptr}, size_t @var{size})
|
|
De-allocate the space pointed to by @var{ptr}.
|
|
|
|
@var{ptr} is never @code{NULL}; it's always a previously allocated block of
|
|
@var{size} bytes.
|
|
@end deftypevr
|
|
|
|
A @dfn{byte} here means the unit used by the @code{sizeof} operator.
|
|
|
|
The @var{old_size} parameter to @code{reallocate_function} and the @var{size}
parameter to @code{free_function} are passed for convenience, but of course can be ignored
|
|
if not needed. The default functions using @code{malloc} and friends for
|
|
instance don't use them.
|
|
|
|
No error return is allowed from any of these functions; if they return then
|
|
they must have performed the specified operation. In particular note that
|
|
@code{allocate_function} or @code{reallocate_function} mustn't return
|
|
@code{NULL}.
|
|
|
|
Getting a different fatal error action is a good use for custom allocation
|
|
functions, for example giving a graphical dialog rather than the default print
|
|
to @code{stderr}. How much is possible when genuinely out of memory is
|
|
another question though.
|
|
|
|
There's currently no defined way for the allocation functions to recover from
|
|
an error such as out of memory; they must terminate program execution. A
|
|
@code{longjmp} or throwing a C++ exception will have undefined results. This
|
|
may change in the future.
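
A minimal sketch of three functions fitting the declarations above might look
as follows. The names @code{my_alloc}, @code{my_realloc}, @code{my_free} and
the @code{live_bytes} counter are hypothetical application code, not part of
MPIR. The sketch follows the rules stated above: it never returns
@code{NULL}, and it terminates the program on failure.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical application-side counter of bytes currently allocated. */
size_t live_bytes = 0;

/* Report the failure and terminate; no error return is allowed. */
static void out_of_memory (void)
{
  fprintf (stderr, "application: out of memory\n");
  abort ();
}

void *my_alloc (size_t alloc_size)
{
  /* ask for at least 1 byte so a NULL result always means failure */
  void *p = malloc (alloc_size != 0 ? alloc_size : 1);
  if (p == NULL)
    out_of_memory ();
  live_bytes += alloc_size;
  return p;
}

void *my_realloc (void *ptr, size_t old_size, size_t new_size)
{
  void *p;
  assert (ptr != NULL);        /* always a previously allocated block */
  p = realloc (ptr, new_size != 0 ? new_size : 1);
  if (p == NULL)
    out_of_memory ();
  live_bytes = live_bytes - old_size + new_size;
  return p;
}

void my_free (void *ptr, size_t size)
{
  assert (ptr != NULL);        /* always a previously allocated block */
  free (ptr);
  live_bytes -= size;
}
```

An application would install these with
@code{mp_set_memory_functions (my_alloc, my_realloc, my_free)}, before any
other MPIR function is called. The size arguments are used here only for
bookkeeping, which shows one use for parameters that @code{malloc} and friends
don't need.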
|
|
|
|
MPIR may use allocated blocks to hold pointers to other allocated blocks. This
|
|
will limit the assumptions a conservative garbage collection scheme can make.
|
|
|
|
Since the default MPIR allocation uses @code{malloc} and friends, those
|
|
functions will be linked in even if the first thing a program does is an
|
|
@code{mp_set_memory_functions}. It's necessary to change the MPIR sources if
|
|
this is a problem.
|
|
|
|
@sp 1
|
|
@deftypefun void mp_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t, size_t), @* void (**@var{free_func_ptr}) (void *, size_t))
|
|
Get the current allocation functions, storing function pointers to the
|
|
locations given by the arguments. If an argument is @code{NULL}, that
|
|
function pointer is not stored.
|
|
|
|
@need 1000
|
|
For example, to get just the current free function,
|
|
|
|
@example
|
|
void (*freefunc) (void *, size_t);
|
|
|
|
mp_get_memory_functions (NULL, NULL, &freefunc);
|
|
@end example
|
|
@end deftypefun
|
|
|
|
@node Language Bindings, Algorithms, Custom Allocation, Top
|
|
@chapter Language Bindings
|
|
@cindex Language bindings
|
|
@cindex Other languages
|
|
|
|
The following packages and projects offer access to MPIR from languages other
|
|
than C, though perhaps with varying levels of functionality and efficiency.
|
|
|
|
@c @spaceuref{U} is the same as @uref{U}, but with a couple of extra spaces
|
|
@c in tex, just to separate the URL from the preceding text a bit.
|
|
@iftex
|
|
@macro spaceuref {U}
|
|
@ @ @uref{\U\}
|
|
@end macro
|
|
@end iftex
|
|
@ifnottex
|
|
@macro spaceuref {U}
|
|
@uref{\U\}
|
|
@end macro
|
|
@end ifnottex
|
|
|
|
@sp 1
|
|
@table @asis
|
|
@item C++
|
|
@itemize @bullet
|
|
@item
|
|
MPIR C++ class interface, @pxref{C++ Class Interface} @* Straightforward
|
|
interface, expression templates to eliminate temporaries.
|
|
@item
|
|
ALP @spaceuref{http://www-sop.inria.fr/saga/logiciels/ALP/} @* Linear algebra and
|
|
polynomials using templates.
|
|
@item
|
|
Arithmos @spaceuref{http://www.win.ua.ac.be/~cant/arithmos/} @* Rationals
|
|
with infinities and square roots.
|
|
@item
|
|
CLN @spaceuref{http://www.ginac.de/CLN/} @* High level classes for arithmetic.
|
|
@item
|
|
LiDIA @spaceuref{http://www.informatik.tu-darmstadt.de/TI/LiDIA/} @* A C++
|
|
library for computational number theory.
|
|
@item
|
|
Linbox @spaceuref{http://www.linalg.org/} @* Sparse vectors and matrices.
|
|
@item
|
|
NTL @spaceuref{http://www.shoup.net/ntl/} @* A C++ number theory library.
|
|
@end itemize
|
|
|
|
@item D
|
|
@itemize @bullet
|
|
@item
|
|
gmp-d @spaceuref{http://home.comcast.net/~benhinkle/gmp-d}
|
|
@end itemize
|
|
|
|
@item Fortran
|
|
@itemize @bullet
|
|
@item
|
|
Omni F77 @spaceuref{http://phase.hpcc.jp/Omni/home.html} @* Arbitrary
|
|
precision floats.
|
|
@end itemize
|
|
|
|
@item Haskell
|
|
@itemize @bullet
|
|
@item
|
|
Glasgow Haskell Compiler @spaceuref{http://www.haskell.org/ghc/}
|
|
@end itemize
|
|
|
|
@item Java
|
|
@itemize @bullet
|
|
@item
|
|
Kaffe @spaceuref{http://www.kaffe.org/}
|
|
@item
|
|
Kissme @spaceuref{http://kissme.sourceforge.net/}
|
|
@end itemize
|
|
|
|
@item Lisp
|
|
@itemize @bullet
|
|
@item
|
|
GNU Common Lisp @spaceuref{http://www.gnu.org/software/gcl/gcl.html}
|
|
@item
|
|
Librep @spaceuref{http://librep.sourceforge.net/}
|
|
@item
|
|
@c FIXME: When there's a stable release with gmp support, just refer to it
|
|
@c rather than bothering to talk about betas.
|
|
XEmacs (21.5.18 beta and up) @spaceuref{http://www.xemacs.org} @* Optional
|
|
big integers, rationals and floats using MPIR.
|
|
@end itemize
|
|
|
|
@item M4
|
|
@itemize @bullet
|
|
@item
|
|
@c FIXME: When there's a stable release with gmp support, just refer to it
|
|
@c rather than bothering to talk about betas.
|
|
GNU m4 betas @spaceuref{http://www.seindal.dk/rene/gnu/} @* Optionally provides
|
|
an arbitrary precision @code{mpeval}.
|
|
@end itemize
|
|
|
|
@item ML
|
|
@itemize @bullet
|
|
@item
|
|
MLton compiler @spaceuref{http://mlton.org/}
|
|
@end itemize
|
|
|
|
@item Objective Caml
|
|
@itemize @bullet
|
|
@item
|
|
MLGMP @spaceuref{http://www.di.ens.fr/~monniaux/programmes.html.en}
|
|
@item
|
|
Numerix @spaceuref{http://pauillac.inria.fr/~quercia/} @* Optionally using
|
|
GMP.
|
|
@end itemize
|
|
|
|
@item Oz
|
|
@itemize @bullet
|
|
@item
|
|
Mozart @spaceuref{http://www.mozart-oz.org/}
|
|
@end itemize
|
|
|
|
@item Pascal
|
|
@itemize @bullet
|
|
@item
|
|
GNU Pascal Compiler @spaceuref{http://www.gnu-pascal.de/} @* GMP unit.
|
|
@item
|
|
Numerix @spaceuref{http://pauillac.inria.fr/~quercia/} @* For Free Pascal,
|
|
optionally using GMP.
|
|
@end itemize
|
|
|
|
@item Perl
|
|
@itemize @bullet
|
|
@item
|
|
GMP module, see @file{demos/perl} in the MPIR sources (@pxref{Demonstration
|
|
Programs}).
|
|
@item
|
|
Math::GMP @spaceuref{http://www.cpan.org/} @* Compatible with Math::BigInt, but
|
|
not as many functions as the GMP module above.
|
|
@item
|
|
Math::BigInt::GMP @spaceuref{http://www.cpan.org/} @* Plug Math::GMP into
|
|
normal Math::BigInt operations.
|
|
@end itemize
|
|
|
|
@need 1000
|
|
@item Pike
|
|
@itemize @bullet
|
|
@item
|
|
mpz module in the standard distribution, @uref{http://pike.ida.liu.se/}
|
|
@end itemize
|
|
|
|
@need 500
|
|
@item Prolog
|
|
@itemize @bullet
|
|
@item
|
|
SWI Prolog @spaceuref{http://www.swi-prolog.org/} @*
|
|
Arbitrary precision floats.
|
|
@end itemize
|
|
|
|
@item Python
|
|
@itemize @bullet
|
|
@item
|
|
mpz module in the standard distribution, @uref{http://www.python.org/}
|
|
@item
|
|
GMPY @uref{http://gmpy.sourceforge.net/}
|
|
@end itemize
|
|
|
|
@item Scheme
|
|
@itemize @bullet
|
|
@item
|
|
GNU Guile (upcoming 1.8) @spaceuref{http://www.gnu.org/software/guile/guile.html}
|
|
@item
|
|
RScheme @spaceuref{http://www.rscheme.org/}
|
|
@item
|
|
STklos @spaceuref{http://www.stklos.org/}
|
|
@c
|
|
@c For reference, MzScheme uses some of gmp, but (as of version 205) it only
|
|
@c has copies of some of the generic C code, and we don't consider that a
|
|
@c language binding to gmp.
|
|
@c
|
|
@end itemize
|
|
|
|
@item Smalltalk
|
|
@itemize @bullet
|
|
@item
|
|
GNU Smalltalk @spaceuref{http://www.smalltalk.org/versions/GNUSmalltalk.html}
|
|
@end itemize
|
|
|
|
@item Other
|
|
@itemize @bullet
|
|
@item
|
|
Axiom @uref{http://savannah.nongnu.org/projects/axiom} @* Computer algebra
|
|
using GCL.
|
|
@item
|
|
DrGenius @spaceuref{http://drgenius.seul.org/} @* Geometry system and
|
|
mathematical programming language.
|
|
@item
|
|
GiNaC @spaceuref{http://www.ginac.de/} @* C++ computer algebra using CLN.
|
|
@item
|
|
GOO @spaceuref{http://www.googoogaga.org/} @* Dynamic object oriented
|
|
language.
|
|
@item
|
|
Maxima @uref{http://www.ma.utexas.edu/users/wfs/maxima.html} @* Macsyma
|
|
computer algebra using GCL.
|
|
@item
|
|
Q @spaceuref{http://q-lang.sourceforge.net/} @* Equational programming system.
|
|
@item
|
|
Regina @spaceuref{http://regina.sourceforge.net/} @* Topological calculator.
|
|
@item
|
|
Yacas @spaceuref{http://www.xs4all.nl/~apinkus/yacas.html} @* Yet another
|
|
computer algebra system.
|
|
@end itemize
|
|
|
|
@end table
|
|
|
|
|
|
@node Algorithms, Internals, Language Bindings, Top
|
|
@chapter Algorithms
|
|
@cindex Algorithms
|
|
|
|
This chapter is an introduction to some of the algorithms used for various MPIR
|
|
operations. The code is likely to be hard to understand without knowing
|
|
something about the algorithms.
|
|
|
|
Some MPIR internals are mentioned, but applications that expect to be
|
|
compatible with future MPIR releases should take care to use only the
|
|
documented functions.
|
|
|
|
@menu
|
|
* Multiplication Algorithms::
|
|
* Division Algorithms::
|
|
* Greatest Common Divisor Algorithms::
|
|
* Powering Algorithms::
|
|
* Root Extraction Algorithms::
|
|
* Radix Conversion Algorithms::
|
|
* Other Algorithms::
|
|
* Assembler Coding::
|
|
@end menu
|
|
|
|
|
|
@node Multiplication Algorithms, Division Algorithms, Algorithms, Algorithms
|
|
@section Multiplication
|
|
@cindex Multiplication algorithms
|
|
|
|
N@cross{}N limb multiplications and squares are done using one of four
|
|
algorithms, as the size N increases.
|
|
|
|
@quotation
|
|
@multitable {KaratsubaMMM} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
|
|
@item Algorithm @tab Threshold
|
|
@item Basecase @tab (none)
|
|
@item Karatsuba @tab @code{MUL_KARATSUBA_THRESHOLD}
|
|
@item Toom-3 @tab @code{MUL_TOOM3_THRESHOLD}
|
|
@item FFT @tab @code{MUL_FFT_THRESHOLD}
|
|
@end multitable
|
|
@end quotation
|
|
|
|
Similarly for squaring, with the @code{SQR} thresholds.
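
The way the thresholds in the table select an algorithm can be sketched as
below. The numeric values are placeholders chosen for illustration only (the
real thresholds are tuned per CPU), and the names @code{choose_mul_alg} and
@code{SKETCH_*} are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum mul_alg { ALG_BASECASE, ALG_KARATSUBA, ALG_TOOM3, ALG_FFT };

/* Placeholder values, NOT the real tuned thresholds. */
enum
{
  SKETCH_KARATSUBA_THRESHOLD = 32,
  SKETCH_TOOM3_THRESHOLD = 128,
  SKETCH_FFT_THRESHOLD = 4096
};

/* Pick the algorithm for an NxN limb multiplication: each threshold is
   the size at which the next algorithm takes over. */
enum mul_alg choose_mul_alg (size_t n)
{
  assert (n >= 1);
  if (n < SKETCH_KARATSUBA_THRESHOLD)
    return ALG_BASECASE;
  if (n < SKETCH_TOOM3_THRESHOLD)
    return ALG_KARATSUBA;
  if (n < SKETCH_FFT_THRESHOLD)
    return ALG_TOOM3;
  return ALG_FFT;
}
```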
|
|
|
|
N@cross{}M multiplications of operands with different sizes above
|
|
@code{MUL_KARATSUBA_THRESHOLD} are currently done by splitting into M@cross{}M
|
|
pieces. The Karatsuba and Toom-3 routines then operate only on equal size
|
|
operands. This is not very efficient, and is slated for improvement in the
|
|
future.
|
|
|
|
@menu
|
|
* Basecase Multiplication::
|
|
* Karatsuba Multiplication::
|
|
* Toom 3-Way Multiplication::
|
|
* FFT Multiplication::
|
|
* Other Multiplication::
|
|
@end menu
|
|
|
|
|
|
@node Basecase Multiplication, Karatsuba Multiplication, Multiplication Algorithms, Multiplication Algorithms
|
|
@subsection Basecase Multiplication
|
|
|
|
Basecase N@cross{}M multiplication is a straightforward rectangular set of
|
|
cross-products, the same as long multiplication done by hand and for that
|
|
reason sometimes known as the schoolbook or grammar school method. This is an
|
|
@m{O(NM),O(N*M)} algorithm. See Knuth section 4.3.1 algorithm M
|
|
(@pxref{References}), and the @file{mpn/generic/mul_basecase.c} code.
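
The rectangular set of cross-products can be sketched in portable C as
follows. This is a simplified illustration, not the code in
@file{mpn/generic/mul_basecase.c}: it assumes 32-bit limbs stored least
significant first, a destination that doesn't overlap the inputs, and the
hypothetical name @code{mul_basecase_sketch}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch only: prod[0..un+vn-1] = u[0..un-1] * v[0..vn-1], with 32-bit
   limbs, least significant limb first.  prod must not overlap u or v. */
void mul_basecase_sketch (uint32_t *prod,
                          const uint32_t *u, size_t un,
                          const uint32_t *v, size_t vn)
{
  size_t i, j;

  assert (un >= 1 && vn >= 1);

  for (i = 0; i < un + vn; i++)
    prod[i] = 0;

  /* one row of cross-products per limb of v, added into the result */
  for (j = 0; j < vn; j++)
    {
      uint64_t carry = 0;
      for (i = 0; i < un; i++)
        {
          /* at most (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so no overflow */
          uint64_t t = (uint64_t) u[i] * v[j] + prod[i + j] + carry;
          prod[i + j] = (uint32_t) t;
          carry = t >> 32;
        }
      prod[j + un] = (uint32_t) carry;   /* this limb is still zero */
    }
}
```

The two nested loops perform exactly N*M limb multiplications, which is where
the @m{O(NM),O(N*M)} cost comes from.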
|
|
|
|
Assembler implementations of @code{mpn_mul_basecase} are essentially the same
|
|
as the generic C code, but have all the usual assembler tricks and
|
|
obscurities introduced for speed.
|
|
|
|
A square can be done in roughly half the time of a multiply, by using the fact
|
|
that the cross products above and below the diagonal are the same. A triangle
|
|
of products below the diagonal is formed, doubled (left shift by one bit), and
|
|
then the products on the diagonal added. This can be seen in
|
|
@file{mpn/generic/sqr_basecase.c}. Again the assembler implementations take
|
|
essentially the same approach.
|
|
|
|
@tex
|
|
\def\GMPline#1#2#3#4#5#6{%
|
|
\hbox {%
|
|
\vrule height 2.5ex depth 1ex
|
|
\hbox to 2em {\hfil{#2}\hfil}%
|
|
\vrule \hbox to 2em {\hfil{#3}\hfil}%
|
|
\vrule \hbox to 2em {\hfil{#4}\hfil}%
|
|
\vrule \hbox to 2em {\hfil{#5}\hfil}%
|
|
\vrule \hbox to 2em {\hfil{#6}\hfil}%
|
|
\vrule}}
|
|
\GMPdisplay{
|
|
\hbox{%
|
|
\vbox{%
|
|
\hbox to 1.5em {\vrule height 2.5ex depth 1ex width 0pt}%
|
|
\hbox {\vrule height 2.5ex depth 1ex width 0pt u0\hfil}%
|
|
\hbox {\vrule height 2.5ex depth 1ex width 0pt u1\hfil}%
|
|
\hbox {\vrule height 2.5ex depth 1ex width 0pt u2\hfil}%
|
|
\hbox {\vrule height 2.5ex depth 1ex width 0pt u3\hfil}%
|
|
\hbox {\vrule height 2.5ex depth 1ex width 0pt u4\hfil}%
|
|
\vfill}%
|
|
\vbox{%
|
|
\hbox{%
|
|
\hbox to 2em {\hfil u0\hfil}%
|
|
\hbox to 2em {\hfil u1\hfil}%
|
|
\hbox to 2em {\hfil u2\hfil}%
|
|
\hbox to 2em {\hfil u3\hfil}%
|
|
\hbox to 2em {\hfil u4\hfil}}%
|
|
\vskip 0.7ex
|
|
\hrule
|
|
\GMPline{u0}{d}{}{}{}{}%
|
|
\hrule
|
|
\GMPline{u1}{}{d}{}{}{}%
|
|
\hrule
|
|
\GMPline{u2}{}{}{d}{}{}%
|
|
\hrule
|
|
\GMPline{u3}{}{}{}{d}{}%
|
|
\hrule
|
|
\GMPline{u4}{}{}{}{}{d}%
|
|
\hrule}}}
|
|
@end tex
|
|
@ifnottex
|
|
@example
|
|
@group
|
|
u0 u1 u2 u3 u4
|
|
+---+---+---+---+---+
|
|
u0 | d | | | | |
|
|
+---+---+---+---+---+
|
|
u1 | | d | | | |
|
|
+---+---+---+---+---+
|
|
u2 | | | d | | |
|
|
+---+---+---+---+---+
|
|
u3 | | | | d | |
|
|
+---+---+---+---+---+
|
|
u4 | | | | | d |
|
|
+---+---+---+---+---+
|
|
@end group
|
|
@end example
|
|
@end ifnottex
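
The triangle, doubling and diagonal steps can be sketched in portable C as
follows. This is only an illustration under stated assumptions, not the code
in @file{mpn/generic/sqr_basecase.c}: the name @code{sketch_sqr_basecase} is
hypothetical, limbs are taken to be 32 bits wide and stored least significant
first, and 64-bit arithmetic stands in for a widening multiply.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the MPIR routine: square the n-limb number u
   (32-bit limbs, least significant first) into the 2n-limb result r. */
void sketch_sqr_basecase(uint32_t *r, const uint32_t *u, size_t n)
{
    size_t i, j;
    uint32_t carry;

    assert(n >= 1);
    for (i = 0; i < 2 * n; i++)
        r[i] = 0;

    /* Step 1: triangle of cross products u[i]*u[j] with i < j. */
    for (i = 0; i < n; i++) {
        carry = 0;
        for (j = i + 1; j < n; j++) {
            uint64_t t = (uint64_t)u[i] * u[j] + r[i + j] + carry;
            r[i + j] = (uint32_t)t;
            carry = (uint32_t)(t >> 32);
        }
        r[i + n] = carry;   /* this limb has not been written by earlier rows */
    }

    /* Step 2: double the triangle with a one-bit left shift. */
    carry = 0;
    for (i = 0; i < 2 * n; i++) {
        uint32_t t = r[i];
        r[i] = (uint32_t)(t << 1) | carry;
        carry = t >> 31;
    }

    /* Step 3: add the diagonal squares u[i]*u[i] at limb position 2*i. */
    carry = 0;
    for (i = 0; i < n; i++) {
        uint64_t sq = (uint64_t)u[i] * u[i];
        uint64_t t = (uint64_t)r[2 * i] + (uint32_t)sq + carry;
        r[2 * i] = (uint32_t)t;
        t = (uint64_t)r[2 * i + 1] + (sq >> 32) + (t >> 32);
        r[2 * i + 1] = (uint32_t)t;
        carry = (uint32_t)(t >> 32);
    }
}
```
@end verbatim

With @code{u} holding the limbs 5 and 3 (low limb first) the result limbs are
25, 30, 9 and 0, the 30 being the single cross product 15 doubled.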
|
|
|
|
In practice squaring isn't a full 2@cross{} faster than multiplying; it's
|
|
usually around 1.5@cross{}. Less than 1.5@cross{} probably indicates
|
|
@code{mpn_sqr_basecase} wants improving on that CPU.
|
|
|
|
On some CPUs @code{mpn_mul_basecase} can be faster than the generic C
|
|
@code{mpn_sqr_basecase} on some small sizes. @code{SQR_BASECASE_THRESHOLD} is
|
|
the size at which to use @code{mpn_sqr_basecase}; this will be zero if that
|
|
routine should be used always.
|
|
|
|
|
|
@node Karatsuba Multiplication, Toom 3-Way Multiplication, Basecase Multiplication, Multiplication Algorithms
|
|
@subsection Karatsuba Multiplication
|
|
@cindex Karatsuba multiplication
|
|
|
|
The Karatsuba multiplication algorithm is described in Knuth section 4.3.3
|
|
part A, and various other textbooks. A brief description is given here.
|
|
|
|
The inputs @math{x} and @math{y} are treated as each split into two parts of
|
|
equal length (or the most significant part one limb shorter if N is odd).
|
|
|
|
@tex
|
|
% GMPboxwidth used for all the multiplication pictures
|
|
\global\newdimen\GMPboxwidth \global\GMPboxwidth=5em
|
|
% GMPboxdepth and GMPboxheight are also used for the float pictures
|
|
\global\newdimen\GMPboxdepth \global\GMPboxdepth=1ex
|
|
\global\newdimen\GMPboxheight \global\GMPboxheight=2ex
|
|
\gdef\GMPvrule{\vrule height \GMPboxheight depth \GMPboxdepth}
|
|
\def\GMPbox#1#2{%
|
|
\vbox {%
|
|
\hrule
|
|
\hbox to 2\GMPboxwidth{%
|
|
\GMPvrule \hfil $#1$\hfil \vrule \hfil $#2$\hfil \vrule}%
|
|
\hrule}}
|
|
\GMPdisplay{%
|
|
\vbox{%
|
|
\hbox to 2\GMPboxwidth {high \hfil low}
|
|
\vskip 0.7ex
|
|
\GMPbox{x_1}{x_0}
|
|
\vskip 0.5ex
|
|
\GMPbox{y_1}{y_0}
|
|
}}
|
|
@end tex
|
|
@ifnottex
|
|
@example
|
|
@group
|
|
high low
|
|
+----------+----------+
|
|
| x1 | x0 |
|
|
+----------+----------+
|
|
|
|
+----------+----------+
|
|
| y1 | y0 |
|
|
+----------+----------+
|
|
@end group
|
|
@end example
|
|
@end ifnottex
|
|
|
|
Let @math{b} be the power of 2 where the split occurs, ie.@: if @ms{x,0} is
|
|
@math{k} limbs (@ms{y,0} the same) then
|
|
@m{b=2\GMPraise{$k*$@code{mp\_bits\_per\_limb}}, b=2^(k*mp_bits_per_limb)}.
|
|
With that @m{x=x_1b+x_0,x=x1*b+x0} and @m{y=y_1b+y_0,y=y1*b+y0}, and the
|
|
following holds,
|
|
|
|
@display
|
|
@m{xy = (b^2+b)x_1y_1 - b(x_1-x_0)(y_1-y_0) + (b+1)x_0y_0,
|
|
x*y = (b^2+b)*x1*y1 - b*(x1-x0)*(y1-y0) + (b+1)*x0*y0}
|
|
@end display
|
|
|
|
This formula means doing only three multiplies of (N/2)@cross{}(N/2) limbs,
|
|
whereas a basecase multiply of N@cross{}N limbs is equivalent to four
|
|
multiplies of (N/2)@cross{}(N/2). The factors @math{(b^2+b)} etc represent
|
|
the positions where the three products must be added.
|
|
|
|
@tex
|
|
\def\GMPboxA#1#2{%
|
|
\vbox{%
|
|
\hrule
|
|
\hbox{%
|
|
\GMPvrule
|
|
\hbox to 2\GMPboxwidth {\hfil\hbox{$#1$}\hfil}%
|
|
\vrule
|
|
\hbox to 2\GMPboxwidth {\hfil\hbox{$#2$}\hfil}%
|
|
\vrule}
|
|
\hrule}}
|
|
\def\GMPboxB#1#2{%
|
|
\hbox{%
|
|
\raise \GMPboxdepth \hbox to \GMPboxwidth {\hfil #1\hskip 0.5em}%
|
|
\vbox{%
|
|
\hrule
|
|
\hbox{%
|
|
\GMPvrule
|
|
\hbox to 2\GMPboxwidth {\hfil\hbox{$#2$}\hfil}%
|
|
\vrule}%
|
|
\hrule}}}
|
|
\GMPdisplay{%
|
|
\vbox{%
|
|
\hbox to 4\GMPboxwidth {high \hfil low}
|
|
\vskip 0.7ex
|
|
\GMPboxA{x_1y_1}{x_0y_0}
|
|
\vskip 0.5ex
|
|
\GMPboxB{$+$}{x_1y_1}
|
|
\vskip 0.5ex
|
|
\GMPboxB{$+$}{x_0y_0}
|
|
\vskip 0.5ex
|
|
\GMPboxB{$-$}{(x_1-x_0)(y_1-y_0)}
|
|
}}
|
|
@end tex
|
|
@ifnottex
|
|
@example
|
|
@group
|
|
high low
|
|
+--------+--------+ +--------+--------+
|
|
| x1*y1 | | x0*y0 |
|
|
+--------+--------+ +--------+--------+
|
|
+--------+--------+
|
|
add | x1*y1 |
|
|
+--------+--------+
|
|
+--------+--------+
|
|
add | x0*y0 |
|
|
+--------+--------+
|
|
+--------+--------+
|
|
sub | (x1-x0)*(y1-y0) |
|
|
+--------+--------+
|
|
@end group
|
|
@end example
|
|
@end ifnottex
|
|
|
|
The term @m{(x_1-x_0)(y_1-y_0),(x1-x0)*(y1-y0)} is best calculated as an
|
|
absolute value, and the sign used to choose to add or subtract. Notice the
|
|
sum @m{\mathop{\rm high}(x_0y_0)+\mathop{\rm low}(x_1y_1),
|
|
high(x0*y0)+low(x1*y1)} occurs twice, so it's possible to do @m{5k,5*k} limb
|
|
additions, rather than @m{6k,6*k}, but in MPIR extra function call overheads
|
|
outweigh the saving.
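
As a small illustration of the formula and of the sign handling for the middle
term, here is one level of Karatsuba on 32-bit operands split into 16-bit
halves, so that @m{b=2^{16},b=2^16}. It is a sketch under those assumptions
only: @code{sketch_kara_mul32} is a hypothetical name, and real limb vectors
need carry propagation that a 64-bit accumulator hides here.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: one level of Karatsuba on 32-bit operands split into
   16-bit halves, so b = 2^16 and each of the three products is 16x16 bits. */
uint64_t sketch_kara_mul32(uint32_t x, uint32_t y)
{
    uint32_t x1 = x >> 16, x0 = x & 0xFFFFu;
    uint32_t y1 = y >> 16, y0 = y & 0xFFFFu;

    uint64_t hh = (uint64_t)x1 * y1;    /* x1*y1 */
    uint64_t ll = (uint64_t)x0 * y0;    /* x0*y0 */

    /* |x1-x0| and |y1-y0|, with the sign of their product kept separately */
    uint32_t dx = x1 >= x0 ? x1 - x0 : x0 - x1;
    uint32_t dy = y1 >= y0 ? y1 - y0 : y0 - y1;
    int negative = (x1 >= x0) != (y1 >= y0);
    uint64_t mid = (uint64_t)dx * dy;   /* |(x1-x0)*(y1-y0)| */

    /* (b^2+b)*x1*y1 + (b+1)*x0*y0.  All arithmetic is unsigned modulo 2^64
       and the true product is below 2^64, so the final value is exact. */
    uint64_t r = (hh << 32) + (hh << 16) + (ll << 16) + ll;

    /* subtract b*(x1-x0)*(y1-y0), which means adding when it is negative */
    if (negative)
        r += mid << 16;
    else
        r -= mid << 16;

    assert(r == (uint64_t)x * y);       /* cross-check against a direct multiply */
    return r;
}
```
@end verbatim

For @code{x = 0x0001FFFF} and @code{y = 0xFFFF0001} the two differences have
opposite signs, so the middle product is added instead of subtracted.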
|
|
|
|
Squaring is similar to multiplying, but with @math{x=y} the formula reduces to
|
|
an equivalent with three squares,
|
|
|
|
@display
|
|
@m{x^2 = (b^2+b)x_1^2 - b(x_1-x_0)^2 + (b+1)x_0^2,
|
|
x^2 = (b^2+b)*x1^2 - b*(x1-x0)^2 + (b+1)*x0^2}
|
|
@end display
|
|
|
|
The final result is accumulated from those three squares the same way as for
|
|
the three multiplies above. The middle term @m{(x_1-x_0)^2,(x1-x0)^2} is now
|
|
never negative.
|
|
|
|
A similar formula for both multiplying and squaring can be constructed with a
|
|
middle term @m{(x_1+x_0)(y_1+y_0),(x1+x0)*(y1+y0)}. But those sums can exceed
|
|
@math{k} limbs, leading to more carry handling and additions than the form
|
|
above.
|
|
|
|
Karatsuba multiplication is asymptotically an @math{O(N^@W{1.585})} algorithm,
|
|
the exponent being @m{\log3/\log2,log(3)/log(2)}, representing 3 multiplies
|
|
each @math{1/2} the size of the inputs. This is a big improvement over the
|
|
basecase multiply at @math{O(N^2)} and the advantage soon overcomes the extra
|
|
additions Karatsuba performs. @code{MUL_KARATSUBA_THRESHOLD} can be as little
|
|
as 10 limbs. The @code{SQR} threshold is usually about twice the @code{MUL}.
|
|
|
|
The basecase algorithm will take a time of the form @m{M(N) = aN^2 + bN + c,
|
|
M(N) = a*N^2 + b*N + c} and the Karatsuba algorithm @m{K(N) = 3M(N/2) + dN +
|
|
e, K(N) = 3*M(N/2) + d*N + e}, which expands to @m{K(N) = {3\over4} aN^2 +
|
|
{3\over2} bN + 3c + dN + e, K(N) = 3/4*a*N^2 + 3/2*b*N + 3*c + d*N + e}. The
|
|
factor @m{3\over4, 3/4} for @math{a} means per-crossproduct speedups in the
|
|
basecase code will increase the threshold since they benefit @math{M(N)} more
|
|
than @math{K(N)}. And conversely the @m{3\over2, 3/2} for @math{b} means
|
|
linear style speedups of @math{b} will decrease the threshold since they
|
|
benefit @math{K(N)} more than @math{M(N)}. The latter can be seen for
|
|
instance when adding an optimized @code{mpn_sqr_diagonal} to
|
|
@code{mpn_sqr_basecase}. Of course all speedups reduce total time, and in
|
|
that sense the algorithm thresholds are merely of academic interest.
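
The effect of the two coefficients on the crossover can be tried numerically
with the cost model above. The functions below are a hypothetical sketch and
the coefficient values in the note that follows are invented purely for
illustration; they are not measurements of any CPU.

@verbatim
```c
#include <assert.h>

/* Hypothetical cost model from the text: M(N) = a*N^2 + b*N + c. */
double sketch_basecase_cost(double a, double b, double c, double n)
{
    return a * n * n + b * n + c;
}

/* K(N) = 3*M(N/2) + d*N + e. */
double sketch_karatsuba_cost(double a, double b, double c,
                             double d, double e, double n)
{
    return 3.0 * sketch_basecase_cost(a, b, c, n / 2.0) + d * n + e;
}

/* Smallest N in [2, limit] at which the modelled Karatsuba cost drops below
   the modelled basecase cost, or -1 if there is none up to limit. */
long sketch_crossover(double a, double b, double c, double d, double e,
                      long limit)
{
    long n;
    assert(limit >= 2);
    for (n = 2; n <= limit; n++)
        if (sketch_karatsuba_cost(a, b, c, d, e, (double)n)
            < sketch_basecase_cost(a, b, c, (double)n))
            return n;
    return -1;
}
```
@end verbatim

With the made-up values @math{a=1}, @math{b=4}, @math{c=10}, @math{d=6} and
@math{e=20} the modelled crossover is at 37 limbs. Reducing @math{a} to 0.8
moves it up to 45, while reducing @math{b} to 2 moves it down to 33.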
|
|
|
|
|
|
@node Toom 3-Way Multiplication, FFT Multiplication, Karatsuba Multiplication, Multiplication Algorithms
|
|
@subsection Toom 3-Way Multiplication
|
|
@cindex Toom multiplication
|
|
|
|
The Karatsuba formula is the simplest case of a general approach to splitting
|
|
inputs that leads to both Toom and FFT algorithms. A description of
|
|
Toom can be found in Knuth section 4.3.3, with an example 3-way
|
|
calculation after Theorem A@. The 3-way form used in MPIR is described here.
|
|
|
|
The operands are each considered split into 3 pieces of equal length (or the
|
|
most significant part 1 or 2 limbs shorter than the other two).
|
|
|
|
@tex
|
|
\def\GMPbox#1#2#3{%
|
|
\vbox{%
|
|
\hrule \vfil
|
|
\hbox to 3\GMPboxwidth {%
|
|
\GMPvrule
|
|
\hfil$#1$\hfil
|
|
\vrule
|
|
\hfil$#2$\hfil
|
|
\vrule
|
|
\hfil$#3$\hfil
|
|
\vrule}%
|
|
\vfil \hrule
|
|
}}
|
|
\GMPdisplay{%
|
|
\vbox{%
|
|
\hbox to 3\GMPboxwidth {high \hfil low}
|
|
\vskip 0.7ex
|
|
\GMPbox{x_2}{x_1}{x_0}
|
|
\vskip 0.5ex
|
|
\GMPbox{y_2}{y_1}{y_0}
|
|
\vskip 0.5ex
|
|
}}
|
|
@end tex
|
|
@ifnottex
|
|
@example
|
|
@group
|
|
high low
|
|
+----------+----------+----------+
|
|
| x2 | x1 | x0 |
|
|
+----------+----------+----------+
|
|
|
|
+----------+----------+----------+
|
|
| y2 | y1 | y0 |
|
|
+----------+----------+----------+
|
|
@end group
|
|
@end example
|
|
@end ifnottex
|
|
|
|
@noindent
|
|
These parts are treated as the coefficients of two polynomials
|
|
|
|
@display
|
|
@group
|
|
@m{X(t) = x_2t^2 + x_1t + x_0,
|
|
X(t) = x2*t^2 + x1*t + x0}
|
|
@m{Y(t) = y_2t^2 + y_1t + y_0,
|
|
Y(t) = y2*t^2 + y1*t + y0}
|
|
@end group
|
|
@end display
|
|
|
|
Let @math{b} equal the power of 2 which is the size of the @ms{x,0}, @ms{x,1},
|
|
@ms{y,0} and @ms{y,1} pieces, ie.@: if they're @math{k} limbs each then
|
|
@m{b=2\GMPraise{$k*$@code{mp\_bits\_per\_limb}}, b=2^(k*mp_bits_per_limb)}.
|
|
With this @math{x=X(b)} and @math{y=Y(b)}.
|
|
|
|
Let a polynomial @m{W(t)=X(t)Y(t),W(t)=X(t)*Y(t)} and suppose its coefficients
|
|
are
|
|
|
|
@display
|
|
@m{W(t) = w_4t^4 + w_3t^3 + w_2t^2 + w_1t + w_0,
|
|
W(t) = w4*t^4 + w3*t^3 + w2*t^2 + w1*t + w0}
|
|
@end display
|
|
|
|
The @m{w_i,w[i]} are going to be determined, and when they are they'll give
|
|
the final result using @math{w=W(b)}, since
|
|
@m{xy=X(b)Y(b)=W(b),x*y=X(b)*Y(b)=W(b)}. The coefficients will be roughly
|
|
@math{b^2} each, and the final @math{W(b)} will be an addition like,
|
|
|
|
@tex
|
|
\def\GMPbox#1#2{%
|
|
\moveright #1\GMPboxwidth
|
|
\vbox{%
|
|
\hrule
|
|
\hbox{%
|
|
\GMPvrule
|
|
\hbox to 2\GMPboxwidth {\hfil$#2$\hfil}%
|
|
\vrule}%
|
|
\hrule
|
|
}}
|
|
\GMPdisplay{%
|
|
\vbox{%
|
|
\hbox to 6\GMPboxwidth {high \hfil low}%
|
|
\vskip 0.7ex
|
|
\GMPbox{0}{w_4}
|
|
\vskip 0.5ex
|
|
\GMPbox{1}{w_3}
|
|
\vskip 0.5ex
|
|
\GMPbox{2}{w_2}
|
|
\vskip 0.5ex
|
|
\GMPbox{3}{w_1}
|
|
\vskip 0.5ex
|
|
\GMPbox{4}{w_0}
|
|
}}
|
|
@end tex
|
|
@ifnottex
|
|
@example
|
|
@group
|
|
high low
|
|
+-------+-------+
|
|
| w4 |
|
|
+-------+-------+
|
|
+--------+-------+
|
|
| w3 |
|
|
+--------+-------+
|
|
+--------+-------+
|
|
| w2 |
|
|
+--------+-------+
|
|
+--------+-------+
|
|
| w1 |
|
|
+--------+-------+
|
|
+-------+-------+
|
|
| w0 |
|
|
+-------+-------+
|
|
@end group
|
|
@end example
|
|
@end ifnottex
|
|
|
|
The @m{w_i,w[i]} coefficients could be formed by a simple set of cross
|
|
products, like @m{w_4=x_2y_2,w4=x2*y2}, @m{w_3=x_2y_1+x_1y_2,w3=x2*y1+x1*y2},
|
|
@m{w_2=x_2y_0+x_1y_1+x_0y_2,w2=x2*y0+x1*y1+x0*y2} etc, but this would need all
|
|
nine @m{x_iy_j,x[i]*y[j]} for @math{i,j=0,1,2}, and would be equivalent merely
|
|
to a basecase multiply. Instead the following approach is used.
|
|
|
|
@math{X(t)} and @math{Y(t)} are evaluated and multiplied at 5 points, giving
|
|
values of @math{W(t)} at those points. In MPIR the following points are used,
|
|
|
|
@quotation
|
|
@multitable {@m{t=\infty,t=inf}M} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
|
|
@item Point @tab Value
|
|
@item @math{t=0} @tab @m{x_0y_0,x0 * y0}, which gives @ms{w,0} immediately
|
|
@item @math{t=1} @tab @m{(x_2+x_1+x_0)(y_2+y_1+y_0),(x2+x1+x0) * (y2+y1+y0)}
|
|
@item @math{t=-1} @tab @m{(x_2-x_1+x_0)(y_2-y_1+y_0),(x2-x1+x0) * (y2-y1+y0)}
|
|
@item @math{t=2} @tab @m{(4x_2+2x_1+x_0)(4y_2+2y_1+y_0),(4*x2+2*x1+x0) * (4*y2+2*y1+y0)}
|
|
@item @m{t=\infty,t=inf} @tab @m{x_2y_2,x2 * y2}, which gives @ms{w,4} immediately
|
|
@end multitable
|
|
@end quotation
|
|
|
|
At @math{t=-1} the values can be negative and that's handled using the
|
|
absolute values and tracking the sign separately. At @m{t=\infty,t=inf} the
|
|
value is actually @m{\lim_{t\to\infty} {X(t)Y(t)\over t^4}, X(t)*Y(t)/t^4 in
|
|
the limit as t approaches infinity}, but it's much easier to think of as
|
|
simply @m{x_2y_2,x2*y2} giving @ms{w,4} immediately (much like
|
|
@m{x_0y_0,x0*y0} at @math{t=0} gives @ms{w,0} immediately).
|
|
|
|
Each of the points substituted into
|
|
@m{W(t)=w_4t^4+\cdots+w_0,W(t)=w4*t^4+@dots{}+w0} gives a linear combination
|
|
of the @m{w_i,w[i]} coefficients, and the value of those combinations has just
|
|
been calculated.
|
|
|
|
@tex
|
|
\GMPdisplay{%
|
|
$\matrix{%
|
|
W(0) & = & & & & & & & & & w_0 \cr
|
|
W(1) & = & w_4 & + & w_3 & + & w_2 & + & w_1 & + & w_0 \cr
|
|
W(-1) & = & w_4 & - & w_3 & + & w_2 & - & w_1 & + & w_0 \cr
|
|
W(2) & = & 16w_4 & + & 8w_3 & + & 4w_2 & + & 2w_1 & + & w_0 \cr
|
|
W(\infty) & = & w_4 \cr
|
|
}$}
|
|
@end tex
|
|
@ifnottex
|
|
@example
|
|
@group
|
|
W(0) = w0
|
|
W(1) = w4 + w3 + w2 + w1 + w0
|
|
W(-1) = w4 - w3 + w2 - w1 + w0
|
|
W(2) = 16*w4 + 8*w3 + 4*w2 + 2*w1 + w0
|
|
W(inf) = w4
|
|
@end group
|
|
@end example
|
|
@end ifnottex
|
|
|
|
This is a set of five equations in five unknowns, and some elementary linear
|
|
algebra quickly isolates each @m{w_i,w[i]}. This involves adding or
|
|
subtracting one @math{W(t)} value from another, and a couple of divisions by
|
|
powers of 2 and one division by 3, the latter using the special
|
|
@code{mpn_divexact_by3} (@pxref{Exact Division}).
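
One possible ordering of those interpolation steps is sketched below on
operands small enough for plain 64-bit arithmetic. The assumptions are loud
ones: @code{sketch_toom3_mul30} is a hypothetical name, the pieces are 10 bits
so that @m{b=2^{10},b=2^10}, signed integers replace the absolute value and
sign tracking described above, and the step order is merely one that works, not
necessarily the order used by MPIR.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: Toom-3 on operands below 2^30, split into three
   10-bit pieces, so b = 2^10.  The five pointwise products are small enough
   to use plain 64-bit arithmetic. */
uint64_t sketch_toom3_mul30(uint32_t x, uint32_t y)
{
    int64_t x0, x1, x2, y0, y1, y2;
    int64_t w0, w1, w2, w3, w4, r1, rm1, r2, s, t;

    assert(x < (1u << 30) && y < (1u << 30));
    x0 = x & 1023; x1 = (x >> 10) & 1023; x2 = x >> 20;
    y0 = y & 1023; y1 = (y >> 10) & 1023; y2 = y >> 20;

    /* evaluate and multiply at t = 0, 1, -1, 2, inf */
    w0  = x0 * y0;                                  /* W(0)   */
    r1  = (x2 + x1 + x0) * (y2 + y1 + y0);          /* W(1)   */
    rm1 = (x2 - x1 + x0) * (y2 - y1 + y0);          /* W(-1)  */
    r2  = (4*x2 + 2*x1 + x0) * (4*y2 + 2*y1 + y0);  /* W(2)   */
    w4  = x2 * y2;                                  /* W(inf) */

    /* interpolate: every division below is exact */
    w2 = (r1 + rm1) / 2 - w4 - w0;      /* W(1)+W(-1) = 2*(w4+w2+w0) */
    s  = (r1 - rm1) / 2;                /* s = w3 + w1               */
    t  = (r2 - w0 - 16*w4 - 4*w2) / 2;  /* t = 4*w3 + w1             */
    w3 = (t - s) / 3;                   /* the one division by 3     */
    w1 = s - w3;

    /* recompose W(b) with b = 2^10 */
    return (uint64_t)w0 + ((uint64_t)w1 << 10) + ((uint64_t)w2 << 20)
         + ((uint64_t)w3 << 30) + ((uint64_t)w4 << 40);
}
```
@end verbatim

Only five multiplications of piece-sized values appear, against the nine cross
products of the direct method.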
|
|
|
|
The conversion of @math{W(t)} values to the coefficients is interpolation. A
|
|
polynomial of degree 4 like @math{W(t)} is uniquely determined by values known
|
|
at 5 different points. The points are arbitrary and can be chosen to make the
|
|
linear equations come out with a convenient set of steps for quickly isolating
|
|
the @m{w_i,w[i]}.
|
|
|
|
Squaring follows the same procedure as multiplication, but there's only one
|
|
@math{X(t)} and it's evaluated at the 5 points, and those values squared to
|
|
give values of @math{W(t)}. The interpolation is then identical, and in fact
|
|
the same @code{toom3_interpolate} subroutine is used for both squaring and
|
|
multiplying.
|
|
|
|
Toom-3 is asymptotically @math{O(N^@W{1.465})}, the exponent being
|
|
@m{\log5/\log3,log(5)/log(3)}, representing 5 recursive multiplies of 1/3 the
|
|
original size each. This is an improvement over Karatsuba at
|
|
@math{O(N^@W{1.585})}, though Toom does more work in the evaluation and
|
|
interpolation and so it only realizes its advantage above a certain size.
|
|
|
|
Near the crossover between Toom-3 and Karatsuba there's generally a range of
|
|
sizes where the difference between the two is small.
|
|
@code{MUL_TOOM3_THRESHOLD} is a somewhat arbitrary point in that range and
|
|
successive runs of the tune program can give different values due to small
|
|
variations in measuring. A graph of time versus size for the two shows the
|
|
effect, see @file{tune/README}.
|
|
|
|
At the fairly small sizes where the Toom-3 thresholds occur it's worth
|
|
remembering that the asymptotic behaviour for Karatsuba and Toom-3 can't be
|
|
expected to make accurate predictions, due of course to the big influence of
|
|
all sorts of overheads, and the fact that only a few recursions of each are
|
|
being performed. Even at large sizes there's a good chance machine dependent
|
|
effects like cache architecture will mean actual performance deviates from
|
|
what might be predicted.
|
|
|
|
The formula given for the Karatsuba algorithm (@pxref{Karatsuba
|
|
Multiplication}) has an equivalent for Toom-3 involving only five multiplies,
|
|
but this would be complicated and unenlightening.
|
|
|
|
An alternate view of Toom-3 can be found in Zuras (@pxref{References}), using
|
|
a vector to represent the @math{x} and @math{y} splits and a matrix
|
|
multiplication for the evaluation and interpolation stages. The matrix
|
|
inverses are not meant to be actually used, and they have elements with values
|
|
much greater than in fact arise in the interpolation steps. The diagram shown
|
|
for the 3-way is attractive, but again doesn't have to be implemented that way
|
|
and for example with a bit of rearrangement just one division by 6 can be
|
|
done.
|
|
|
|
|
|
@node FFT Multiplication, Other Multiplication, Toom 3-Way Multiplication, Multiplication Algorithms
|
|
@subsection FFT Multiplication
|
|
@cindex FFT multiplication
|
|
@cindex Fast Fourier Transform
|
|
|
|
At large to very large sizes a Fermat style FFT multiplication is used,
|
|
following Sch@"onhage and Strassen (@pxref{References}). Descriptions of FFTs
|
|
in various forms can be found in many textbooks, for instance Knuth section
|
|
4.3.3 part C or Lipson chapter IX@. A brief description of the form used in
|
|
MPIR is given here.
|
|
|
|
The multiplication done is @m{xy \bmod 2^N+1, x*y mod 2^N+1}, for a given
|
|
@math{N}. A full product @m{xy,x*y} is obtained by choosing @m{N \ge
|
|
\mathop{\rm bits}(x)+\mathop{\rm bits}(y), N>=bits(x)+bits(y)} and padding
|
|
@math{x} and @math{y} with high zero limbs. The modular product is the native
|
|
form for the algorithm, so padding to get a full product is unavoidable.
|
|
|
|
The algorithm follows a split, evaluate, pointwise multiply, interpolate and
|
|
combine similar to that described above for Karatsuba and Toom-3. A @math{k}
|
|
parameter controls the split, with an FFT-@math{k} splitting into @math{2^k}
|
|
pieces of @math{M=N/2^k} bits each. @math{N} must be a multiple of
|
|
@m{2^k\times@code{mp\_bits\_per\_limb}, (2^k)*@nicode{mp_bits_per_limb}} so
|
|
the split falls on limb boundaries, avoiding bit shifts in the split and
|
|
combine stages.
|
|
|
|
The evaluations, pointwise multiplications, and interpolation, are all done
|
|
modulo @m{2^{N'}+1, 2^N'+1} where @math{N'} is @math{2M+k+3} rounded up to a
|
|
multiple of @math{2^k} and of @code{mp_bits_per_limb}. The results of
|
|
interpolation will be the following negacyclic convolution of the input
|
|
pieces, and the choice of @math{N'} ensures these sums aren't truncated.
|
|
@tex
|
|
$$ w_n = \sum_{{i+j = b2^k+n}\atop{b=0,1}} (-1)^b x_i y_j $$
|
|
@end tex
|
|
@ifnottex
|
|
|
|
@example
|
|
---
|
|
\ b
|
|
w[n] = / (-1) * x[i] * y[j]
|
|
---
|
|
i+j==b*2^k+n
|
|
b=0,1
|
|
@end example
|
|
|
|
@end ifnottex
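
To make the convolution concrete, the sketch below forms it directly, in
quadratic time and with no transform at all, for the toy case @math{N=16} and
@math{k=2}. The function name is hypothetical and the parameters are chosen
only so that everything fits in machine words. It shows why the wrapped terms
carry a minus sign: @math{2^N} is congruent to @math{-1} modulo @math{2^N+1}.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch with N = 16 and k = 2: split 16-bit inputs into
   2^k = 4 pieces of M = 4 bits, form the negacyclic convolution directly
   (quadratic time, no FFT), and recombine modulo 2^16+1. */
uint32_t sketch_negacyclic_mulmod(uint32_t x, uint32_t y)
{
    enum { PIECES = 4, PIECE_BITS = 4 };
    const int64_t modulus = 65537;      /* 2^16 + 1 */
    int64_t w[PIECES] = {0, 0, 0, 0};
    int64_t acc = 0;
    int i, j, n;

    assert(x < 65536 && y < 65536);
    for (i = 0; i < PIECES; i++) {
        for (j = 0; j < PIECES; j++) {
            int64_t xi = (x >> (PIECE_BITS * i)) & 15;
            int64_t yj = (y >> (PIECE_BITS * j)) & 15;
            if (i + j < PIECES)
                w[i + j] += xi * yj;            /* b = 0 term */
            else
                w[i + j - PIECES] -= xi * yj;   /* b = 1 term, sign (-1)^1 */
        }
    }

    /* add each w[n] at its offset of n pieces */
    for (n = 0; n < PIECES; n++)
        acc += w[n] * ((int64_t)1 << (PIECE_BITS * n));

    acc %= modulus;                     /* a C remainder may be negative */
    if (acc < 0)
        acc += modulus;
    return (uint32_t)acc;
}
```
@end verbatim

For example @code{0xFFFF} is congruent to @math{-2}, so squaring it this way
gives 4.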
|
|
The points used for the evaluation are @math{g^i} for @math{i=0} to
|
|
@math{2^k-1} where @m{g=2^{2N'/2^k}, g=2^(2N'/2^k)}. @math{g} is a
|
|
@m{2^k,2^k'}th root of unity mod @m{2^{N'}+1,2^N'+1}, which produces necessary
|
|
cancellations at the interpolation stage, and it's also a power of 2 so the
|
|
fast Fourier transforms used for the evaluation and interpolation do only
|
|
shifts, adds and negations.
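
The root of unity property is easy to check for small parameters. The helpers
below are hypothetical and restricted to @math{N'} of at most 30 bits so that
64-bit arithmetic suffices; they verify that @math{g} raised to @math{2^k} is
1 and that @math{g} raised to half that power is @math{-1}.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: base^exp mod m by square and multiply, m below 2^32. */
uint64_t sketch_powmod(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result;
    assert(m != 0 && m < ((uint64_t)1 << 32));
    result = 1 % m;
    base %= m;
    while (exp != 0) {
        if (exp & 1)
            result = result * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}

/* Returns 1 if g = 2^(2*np / 2^k) satisfies g^(2^k) == 1 and
   g^(2^(k-1)) == -1 modulo 2^np + 1, i.e. g is a primitive 2^k-th root
   of unity there. */
int sketch_is_root_of_unity(unsigned np, unsigned k)
{
    uint64_t modulus, g;
    unsigned pieces;

    assert(k >= 1 && k <= 16 && np >= 1 && np <= 30);
    pieces = 1u << k;
    assert((2 * np) % pieces == 0);
    modulus = ((uint64_t)1 << np) + 1;
    g = (uint64_t)1 << (2 * np / pieces);   /* a power of 2 */
    return sketch_powmod(g, pieces, modulus) == 1
        && sketch_powmod(g, pieces / 2, modulus) == modulus - 1;
}
```
@end verbatim

With @math{N'=16} and @math{k=3} this checks @math{g=16}, whose 8th power is
@math{2^32} and hence 1 modulo @math{2^16+1}.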
|
|
|
|
The pointwise multiplications are done modulo @m{2^{N'}+1, 2^N'+1} and either
|
|
recurse into a further FFT or use a plain multiplication (Toom-3, Karatsuba or
|
|
basecase), whichever is optimal at the size @math{N'}. The interpolation is
|
|
an inverse fast Fourier transform. The resulting set of sums of @m{x_iy_j,
|
|
x[i]*y[j]} are added at appropriate offsets to give the final result.
|
|
|
|
Squaring is the same, but @math{x} is the only input so it's one transform at
|
|
the evaluate stage and the pointwise multiplies are squares. The
|
|
interpolation is the same.
|
|
|
|
For a mod @math{2^N+1} product, an FFT-@math{k} is an @m{O(N^{k/(k-1)}),
|
|
O(N^(k/(k-1)))} algorithm, the exponent representing @math{2^k} recursed
|
|
modular multiplies each @m{1/2^{k-1},1/2^(k-1)} the size of the original.
|
|
Each successive @math{k} is an asymptotic improvement, but overheads mean each
|
|
is only faster at bigger and bigger sizes. In the code, @code{MUL_FFT_TABLE}
|
|
and @code{SQR_FFT_TABLE} are the thresholds where each @math{k} is used. Each
|
|
new @math{k} effectively swaps some multiplying for some shifts, adds and
|
|
overheads.
|
|
|
|
A mod @math{2^N+1} product can be formed with a normal
|
|
@math{N@cross{}N@rightarrow{}2N} bit multiply plus a subtraction, so an FFT
|
|
and Toom-3 etc can be compared directly. A @math{k=4} FFT at
|
|
@math{O(N^@W{1.333})} can be expected to be the first faster than Toom-3 at
|
|
@math{O(N^@W{1.465})}. In practice this is what's found, with
|
|
@code{MUL_FFT_MODF_THRESHOLD} and @code{SQR_FFT_MODF_THRESHOLD} being between
|
|
300 and 1000 limbs, depending on the CPU@. So far it's been found that only
|
|
very large FFTs recurse into pointwise multiplies above these sizes.
|
|
|
|
When an FFT is to give a full product, the change of @math{N} to @math{2N}
|
|
doesn't alter the theoretical complexity for a given @math{k}, but for the
|
|
purposes of considering where an FFT might be first used it can be assumed
|
|
that the FFT is recursing into a normal multiply and that on that basis it's
|
|
doing @math{2^k} recursed multiplies each @m{1/2^{k-2},1/2^(k-2)} the size of
|
|
the inputs, making it @m{O(N^{k/(k-2)}), O(N^(k/(k-2)))}. This would mean
|
|
@math{k=7} at @math{O(N^@W{1.4})} would be the first FFT faster than Toom-3.
|
|
In practice @code{MUL_FFT_THRESHOLD} and @code{SQR_FFT_THRESHOLD} have been
|
|
found to be in the @math{k=8} range, somewhere between 3000 and 10000 limbs.
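
For reference, the two exponents discussed above work out as follows for small
@math{k}, rounded to three decimals. The table is only arithmetic on the
formulas already given, to be compared against 1.465 for Toom-3.

@example
@group
 k    k/(k-1)    k/(k-2)
 3     1.500      3.000
 4     1.333      2.000
 5     1.250      1.667
 6     1.200      1.500
 7     1.167      1.400
 8     1.143      1.333
@end group
@end example

The first entry below 1.465 is at @math{k=4} in the modular column and at
@math{k=7} in the full product column.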
|
|
|
|
The way @math{N} is split into @math{2^k} pieces and then @math{2M+k+3} is
|
|
rounded up to a multiple of @math{2^k} and @code{mp_bits_per_limb} means that
|
|
when @math{2^k@ge{}@nicode{mp\_bits\_per\_limb}} the effective @math{N} is a
|
|
multiple of @m{2^{2k-1},2^(2k-1)} bits. The @math{+k+3} means some values of
|
|
@math{N} just under such a multiple will be rounded to the next. The
|
|
complexity calculations above assume that a favourable size is used, meaning
|
|
one which isn't padded through rounding, and it's also assumed that the extra
|
|
@math{+k+3} bits are negligible at typical FFT sizes.
|
|
|
|
The practical effect of the @m{2^{2k-1},2^(2k-1)} constraint is to introduce a
|
|
step-effect into measured speeds. For example @math{k=8} will round @math{N}
|
|
up to a multiple of 32768 bits of product, so for a 32-bit limb there'll be 512 limb
|
|
groups of sizes for which @code{mpn_mul_n} runs at the same speed. Or for
|
|
@math{k=9} groups of 2048 limbs, @math{k=10} groups of 8192 limbs, etc. In
|
|
practice it's been found each @math{k} is used at quite small multiples of its
|
|
size constraint and so the step effect is quite noticeable in a time versus
|
|
size graph.
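
The group sizes quoted above follow from the size constraint by a short
calculation, sketched here with a hypothetical helper. It assumes, as in the
full product case described earlier, that @math{N} counts the bits of the
product and that each operand is half that size.

@verbatim
```c
#include <assert.h>

/* Hypothetical helper: number of consecutive mpn_mul_n operand sizes, in
   limbs, that share one FFT-k size step.  The product size N is rounded to
   a multiple of 2^(2k-1) bits, and the operands are half the product. */
unsigned long sketch_fft_step_limbs(unsigned k, unsigned bits_per_limb)
{
    unsigned long step_bits;
    assert(k >= 1 && 2 * k - 1 < 8 * sizeof(unsigned long));
    assert(bits_per_limb != 0 && (1ul << k) >= bits_per_limb);
    step_bits = 1ul << (2 * k - 1);         /* granularity of N in bits */
    return step_bits / bits_per_limb / 2;   /* operand limbs per step   */
}
```
@end verbatim

For a 32-bit limb this gives 512, 2048 and 8192 limbs at @math{k=8}, 9 and 10.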
|
|
|
|
The threshold determinations currently measure at the mid-points of size
|
|
steps, but this is sub-optimal since at the start of a new step it can happen
|
|
that it's better to go back to the previous @math{k} for a while. Something
|
|
more sophisticated for @code{MUL_FFT_TABLE} and @code{SQR_FFT_TABLE} will be
|
|
needed.
|
|
|
|
|
|
@node Other Multiplication, , FFT Multiplication, Multiplication Algorithms
|
|
@subsection Other Multiplication
|
|
@cindex Toom multiplication
|
|
|
|
The 3-way Toom algorithm described above (@pxref{Toom 3-Way
|
|
Multiplication}) generalizes to split into an arbitrary number of pieces, as
|
|
per Knuth section 4.3.3 algorithm C@. This is not currently used, though it's
|
|
possible a Toom-4 might fit in between Toom-3 and the FFTs. The notes here
|
|
are merely for interest.
|
|
|
|
In general a split into @math{r+1} pieces is made, and evaluations and
|
|
pointwise multiplications done at @m{2r+1,2*r+1} points. A 4-way split does 7
|
|
pointwise multiplies, 5-way does 9, etc. Asymptotically an @math{(r+1)}-way
|
|
algorithm is @m{O(N^{\log(2r+1)/\log(r+1)}), O(N^(log(2*r+1)/log(r+1)))}. Only
|
|
the pointwise multiplications count towards big-@math{O} complexity, but the
|
|
time spent in the evaluate and interpolate stages grows with @math{r} and has
|
|
a significant practical impact, with the asymptotic advantage of each @math{r}
|
|
realized only at bigger and bigger sizes. The overheads grow as
|
|
@m{O(Nr),O(N*r)}, whereas in an @math{r=2^k} FFT they grow only as @m{O(N \log
|
|
r), O(N*log(r))}.
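
Evaluating that exponent for the first few splits gives the following values,
rounded to three decimals. This is simply the formula above worked through.

@example
@group
 pieces   multiplies   exponent
   2          3         1.585    (Karatsuba)
   3          5         1.465    (Toom-3)
   4          7         1.404
   5          9         1.365
@end group
@end example

Each further piece buys a smaller improvement in the exponent than the one
before.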
|
|
|
|
Knuth algorithm C evaluates at points 0,1,2,@dots{},@m{2r,2*r}, but exercise 4
|
|
uses @math{-r},@dots{},0,@dots{},@math{r} and the latter saves some small
|
|
multiplies in the evaluate stage (or rather trades them for additions), and
|
|
has a further saving of nearly half the interpolate steps. The idea is to
|
|
separate odd and even final coefficients and then perform algorithm C steps C7
|
|
and C8 on them separately. The divisors at step C7 become @math{j^2} and the
|
|
multipliers at C8 become @m{2tj-j^2,2*t*j-j^2}.
|
|
|
|
Splitting odd and even parts through positive and negative points can be
|
|
thought of as using @math{-1} as a square root of unity. If a 4th root of
|
|
unity were available then a further split and speedup would be possible, but no
|
|
such root exists for plain integers. Going to complex integers with
|
|
@m{i=\sqrt{-1}, i=sqrt(-1)} doesn't help, essentially because in Cartesian
|
|
form it takes three real multiplies to do a complex multiply. The existence
|
|
of @m{2^k,2^k'}th roots of unity in a suitable ring or field lets the fast
|
|
Fourier transform keep splitting and get to @m{O(N \log r), O(N*log(r))}.
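
The three multiply form of a complex product mentioned above can be written
out as below. The function name is hypothetical and small machine integers
stand in for the multi-limb values that would arise in practice.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: (a + b*i) * (c + d*i) with three real multiplies
   instead of the obvious four. */
void sketch_complex_mul3(int64_t a, int64_t b, int64_t c, int64_t d,
                         int64_t *re, int64_t *im)
{
    int64_t k1, k2, k3;
    assert(re != 0 && im != 0);
    k1 = c * (a + b);
    k2 = a * (d - c);
    k3 = b * (c + d);
    *re = k1 - k3;      /* a*c - b*d */
    *im = k1 + k2;      /* a*d + b*c */
}
```
@end verbatim

The saving of one multiply is paid for with three extra additions or
subtractions.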
|
|
|
|
Floating point FFTs use complex numbers approximating Nth roots of unity.
|
|
Some processors have special support for such FFTs. But these are not used in
|
|
MPIR since it's very difficult to guarantee an exact result (to some number of
|
|
bits). An occasional difference of 1 in the last bit might not matter to a
|
|
typical signal processing algorithm, but is of course of vital importance to
|
|
MPIR.
|
|
|
|
|
|
@node Division Algorithms, Greatest Common Divisor Algorithms, Multiplication Algorithms, Algorithms
|
|
@section Division Algorithms
|
|
@cindex Division algorithms
|
|
|
|
@menu
|
|
* Single Limb Division::
|
|
* Basecase Division::
|
|
* Divide and Conquer Division::
|
|
* Exact Division::
|
|
* Exact Remainder::
|
|
* Small Quotient Division::
|
|
@end menu
|
|
|
|
|
|
@node Single Limb Division, Basecase Division, Division Algorithms, Division Algorithms
|
|
@subsection Single Limb Division
|
|
|
|
N@cross{}1 division is implemented using repeated 2@cross{}1 divisions from
|
|
high to low, either with a hardware divide instruction or a multiplication by
|
|
inverse, whichever is best on a given CPU.
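
The high to low loop of 2@cross{}1 divisions can be sketched as follows, here
using the compiler's 64-bit divide in the role of the hardware instruction.
The name @code{sketch_divrem_1} and the 32-bit limb size are assumptions made
for the example; this is not the MPIR routine.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the MPIR routine: divide the n-limb number a
   (32-bit limbs, least significant first) by the single limb d, storing n
   quotient limbs at q and returning the remainder. */
uint32_t sketch_divrem_1(uint32_t *q, const uint32_t *a, size_t n, uint32_t d)
{
    uint64_t r = 0;
    size_t i;

    assert(d != 0);
    for (i = n; i-- > 0; ) {
        /* 2x1 division: (remainder, next limb) by d, high limb first */
        uint64_t two = (r << 32) | a[i];
        q[i] = (uint32_t)(two / d);     /* fits in a limb because r < d */
        r = two % d;
    }
    return (uint32_t)r;
}
```
@end verbatim

Each step depends on the remainder from the previous one, which is why the
latency of the 2@cross{}1 operation matters so much.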
|
|
|
|
The multiply by inverse follows section 8 of ``Division by Invariant Integers
|
|
using Multiplication'' by Granlund and Montgomery (@pxref{References}) and is
|
|
implemented as @code{udiv_qrnnd_preinv} in @file{gmp-impl.h}. The idea is to
|
|
have a fixed-point approximation to @math{1/d} (see @code{invert_limb}) and
|
|
then multiply by the high limb (plus one bit) of the dividend to get a
|
|
quotient @math{q}. With @math{d} normalized (high bit set), @math{q} is no
|
|
more than 1 too small. Subtracting @m{qd,q*d} from the dividend gives a
|
|
remainder, and reveals whether @math{q} or @math{q-1} is correct.
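
A compact way to see the idea in code is the variant below, which follows the
later formulation usually credited to M@"oller and Granlund instead of the exact
sequence in @file{gmp-impl.h}. That substitution is ours, made because the
variant is short. It starts from a candidate quotient that is adjusted down or,
rarely, up, so its correction steps differ from those just described. All names
are hypothetical and limbs are assumed to be 32 bits.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch with 32-bit limbs.  Fixed-point inverse of a
   normalized d: floor((2^64 - 1) / d) - 2^32, which fits in one limb. */
uint32_t sketch_invert_limb(uint32_t d)
{
    assert(d >> 31);                    /* high bit must be set */
    return (uint32_t)(UINT64_MAX / d - ((uint64_t)1 << 32));
}

/* Divide the two-limb value (u1,u0) by d, using the inverse v instead of a
   divide instruction.  Requires u1 < d.  Returns the quotient and stores the
   remainder at *rem. */
uint32_t sketch_div_2by1_preinv(uint32_t u1, uint32_t u0, uint32_t d,
                                uint32_t v, uint32_t *rem)
{
    uint64_t q;
    uint32_t q1, q0, r;

    assert((d >> 31) && u1 < d);
    q = (uint64_t)v * u1 + (((uint64_t)u1 << 32) | u0);  /* cannot overflow */
    q1 = (uint32_t)(q >> 32) + 1;       /* candidate quotient */
    q0 = (uint32_t)q;
    r = u0 - q1 * d;                    /* candidate remainder, modulo 2^32 */
    if (r > q0) {                       /* candidate was one too big */
        q1--;
        r += d;
    }
    if (r >= d) {                       /* rare: candidate was one too small */
        q1++;
        r -= d;
    }
    *rem = r;
    return q1;
}
```
@end verbatim

The inverse is computed once per divisor and then reused for every limb of the
dividend.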
|
|
|
|
The result is a division done with two multiplications and four or five
|
|
arithmetic operations. On CPUs with low latency multipliers this can be much
|
|
faster than a hardware divide, though the cost of calculating the inverse at
|
|
the start may mean it's only better on inputs bigger than say 4 or 5 limbs.
|
|
|
|
When a divisor must be normalized, either for the generic C
|
|
@code{__udiv_qrnnd_c} or the multiply by inverse, the division performed is
|
|
actually @m{a2^k,a*2^k} by @m{d2^k,d*2^k} where @math{a} is the dividend and
|
|
@math{k} is the power necessary to have the high bit of @m{d2^k,d*2^k} set.
|
|
The bit shifts for the dividend are usually accomplished ``on the fly''
|
|
meaning by extracting the appropriate bits at each step. Done this way the
|
|
quotient limbs come out aligned ready to store. When only the remainder is
|
|
wanted, an alternative is to take the dividend limbs unshifted and calculate
|
|
@m{r = a \bmod d2^k, r = a mod d*2^k} followed by an extra final step @m{r2^k
|
|
\bmod d2^k, r*2^k mod d*2^k}. This can help on CPUs with poor bit shifts or
|
|
few registers.
|
|
|
|
The multiply by inverse can be done two limbs at a time. The calculation is
|
|
basically the same, but the inverse is two limbs and the divisor treated as if
|
|
padded with a low zero limb. This means more work, since the inverse will
|
|
need a 2@cross{}2 multiply, but the four 1@cross{}1s to do that are
|
|
independent and can therefore be done partly or wholly in parallel. Likewise
|
|
for a 2@cross{}1 calculating @m{qd,q*d}. The net effect is to process two
|
|
limbs with roughly the same two multiplies worth of latency that one limb at a
|
|
time gives. This extends to 3 or 4 limbs at a time, though the extra work to
|
|
apply the inverse will almost certainly soon reach the limits of multiplier
|
|
throughput.
|
|
|
|
A similar approach in reverse can be taken to process just half a limb at a
|
|
time if the divisor is only a half limb. In this case the 1@cross{}1 multiply
|
|
for the inverse effectively becomes two @m{{1\over2}\times1, (1/2)x1} for each
|
|
limb, which can be a saving on CPUs with a fast half limb multiply, or in fact
|
|
if the only multiply is a half limb, and especially if it's not pipelined.
|
|
|
|
|
|
@node Basecase Division, Divide and Conquer Division, Single Limb Division, Division Algorithms
|
|
@subsection Basecase Division
|
|
|
|
Basecase N@cross{}M division is like long division done by hand, but in base
|
|
@m{2\GMPraise{@code{mp\_bits\_per\_limb}}, 2^mp_bits_per_limb}. See Knuth
|
|
section 4.3.1 algorithm D, and @file{mpn/generic/sb_divrem_mn.c}.
|
|
|
|
Briefly stated, while the dividend remains larger than the divisor, a high
|
|
quotient limb is formed and the M@cross{}1 product @m{qd,q*d} subtracted at
|
|
the top end of the dividend. With a normalized divisor (most significant bit
|
|
set), each quotient limb can be formed with a 2@cross{}1 division and a
|
|
1@cross{}1 multiplication plus some subtractions. The 2@cross{}1 division is
|
|
by the high limb of the divisor and is done either with a hardware divide or a
|
|
multiply by inverse (the same as in @ref{Single Limb Division}) whichever is
|
|
faster. Such a quotient is sometimes one too big, requiring an addback of the
|
|
divisor, but that happens rarely.
|
|
|
|
With Q=N@minus{}M being the number of quotient limbs, this is an
|
|
@m{O(QM),O(Q*M)} algorithm and will run at a speed similar to a basecase
|
|
Q@cross{}M multiplication, differing in fact only in the extra multiply and
|
|
divide for each of the Q quotient limbs.
|
|
|
|
|
|
@node Divide and Conquer Division, Exact Division, Basecase Division, Division Algorithms
|
|
@subsection Divide and Conquer Division
|
|
|
|
For divisors larger than @code{DIV_DC_THRESHOLD}, division is done by dividing.
|
|
Or to be precise by a recursive divide and conquer algorithm based on work by
|
|
Moenck and Borodin, Jebelean, and Burnikel and Ziegler (@pxref{References}).
|
|
|
|
The algorithm consists essentially of recognising that a 2N@cross{}N division
|
|
can be done with the basecase division algorithm (@pxref{Basecase Division}),
|
|
but using N/2 limbs as a base, not just a single limb. This way the
|
|
multiplications that arise are (N/2)@cross{}(N/2) and can take advantage of
|
|
Karatsuba and higher multiplication algorithms (@pxref{Multiplication
|
|
Algorithms}). The two ``digits'' of the quotient are formed by recursive
|
|
N@cross{}(N/2) divisions.
|
|
|
|
If the (N/2)@cross{}(N/2) multiplies are done with a basecase multiplication
|
|
then the work is about the same as a basecase division, but with more function
|
|
call overheads and with some subtractions separated from the multiplies.
|
|
These overheads mean that it's only when N/2 is above
|
|
@code{MUL_KARATSUBA_THRESHOLD} that divide and conquer is of use.
|
|
|
|
@code{DIV_DC_THRESHOLD} is based on the divisor size N, so it will be somewhere
|
|
above twice @code{MUL_KARATSUBA_THRESHOLD}, but how much above depends on the
|
|
CPU@. An optimized @code{mpn_mul_basecase} can lower @code{DIV_DC_THRESHOLD} a
|
|
little by offering a ready-made advantage over repeated @code{mpn_submul_1}
|
|
calls.
|
|
|
|
Divide and conquer is asymptotically @m{O(M(N)\log N),O(M(N)*log(N))} where
|
|
@math{M(N)} is the time for an N@cross{}N multiplication done with FFTs. The
|
|
actual time is a sum over multiplications of the recursed sizes, as can be
|
|
seen near the end of section 2.2 of Burnikel and Ziegler. For example, within
|
|
the Toom-3 range, divide and conquer is @m{2.63M(N), 2.63*M(N)}. With higher
|
|
algorithms the @math{M(N)} term improves and the multiplier tends to @m{\log
|
|
N, log(N)}. In practice, at moderate to large sizes, a 2N@cross{}N division
|
|
is about 2 to 4 times slower than an N@cross{}N multiplication.
|
|
|
|
Newton's method used for division is asymptotically @math{O(M(N))} and should
|
|
therefore be superior to divide and conquer, but it's believed this would only
|
|
be for large to very large N.
|
|
|
|
|
|
@node Exact Division, Exact Remainder, Divide and Conquer Division, Division Algorithms
|
|
@subsection Exact Division
|
|
|
|
A so-called exact division is when the dividend is known to be an exact
|
|
multiple of the divisor. Jebelean's exact division algorithm uses this
|
|
knowledge to make some significant optimizations (@pxref{References}).
|
|
|
|
The idea can be illustrated in decimal for example with 368154 divided by
|
|
543. Because the low digit of the dividend is 4, the low digit of the
|
|
quotient must be 8. This is arrived at from @m{4 \mathord{\times} 7 \bmod 10,
|
|
4*7 mod 10}, using the fact 7 is the modular inverse of 3 (the low digit of
|
|
the divisor), since @m{3 \mathord{\times} 7 \mathop{\equiv} 1 \bmod 10, 3*7
|
|
@equiv{} 1 mod 10}. So @m{8\mathord{\times}543 = 4344,8*543=4344} can be
|
|
subtracted from the dividend leaving 363810. Notice the low digit has become
|
|
zero.
|
|
|
|
The procedure is repeated at the second digit, with the next quotient digit 7
|
|
(@m{1 \mathord{\times} 7 \bmod 10, 7 @equiv{} 1*7 mod 10}), subtracting
|
|
@m{7\mathord{\times}543 = 3801,7*543=3801}, leaving 325800. And finally at
|
|
the third digit with quotient digit 6 (@m{8 \mathord{\times} 7 \bmod 10, 8*7
|
|
mod 10}), subtracting @m{6\mathord{\times}543 = 3258,6*543=3258} leaving 0.
|
|
So the quotient is 678.
|
|
|
|
Notice however that the multiplies and subtractions don't need to extend past
|
|
the low three digits of the dividend, since that's enough to determine the
|
|
three quotient digits. For the last quotient digit no subtraction is needed
|
|
at all. On a 2N@cross{}N division like this one, only about half the work of
|
|
a normal basecase division is necessary.
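
The decimal example can be reproduced with a short Python sketch. This is an illustration only and not the MPIR code: the name @code{divexact_low_to_high} is hypothetical, Python integers stand in for limb arrays, and a full subtraction is done at each step rather than one restricted to the low digits.

```python
def divexact_low_to_high(a, d, base=10):
    """Return a/d, assuming d divides a exactly and the low digit of d
    is invertible modulo base."""
    # Modular inverse of the low digit of d (three-argument pow with -1
    # needs Python 3.8 or later).
    inv = pow(d % base, -1, base)
    q = 0
    weight = 1                  # base**i for the quotient digit being formed
    while a != 0:
        digit = (a % base) * inv % base   # low digit of a times the inverse
        q += digit * weight
        a = (a - digit * d) // base       # low digit is now zero, shift it out
        weight *= base
    return q
```

With the values from the text, @code{divexact_low_to_high(368154, 543)} forms the quotient digits 8, 7 and 6 in that order. Passing a power of 2 as @code{base} models limbs rather than decimal digits, in which case @code{d} must be odd.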
|
|
|
|
For an N@cross{}M exact division producing Q=N@minus{}M quotient limbs, the
|
|
saving over a normal basecase division is in two parts. Firstly, each of the
|
|
Q quotient limbs needs only one multiply, not a 2@cross{}1 divide and
|
|
multiply. Secondly, the crossproducts are reduced when @math{Q>M} to
|
|
@m{QM-M(M+1)/2,Q*M-M*(M+1)/2}, or when @math{Q@le{}M} to @m{Q(Q-1)/2,
|
|
Q*(Q-1)/2}. Notice the savings are complementary. If Q is big then many
|
|
divisions are saved, or if Q is small then the crossproducts reduce to a small
|
|
number.
|
|
|
|
The modular inverse used is calculated efficiently by @code{modlimb_invert} in
|
|
@file{gmp-impl.h}. This does four multiplies for a 32-bit limb, or six for a
|
|
64-bit limb. @file{tune/modlinv.c} has some alternate implementations that
|
|
might suit processors better at bit twiddling than multiplying.
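
The multiply counts quoted are what a Newton iteration gives when started from 8 correct bits, since each step doubles the number of correct low bits and costs two multiplies. The following Python sketch assumes that scheme; the name @code{limb_inverse} is hypothetical and a call to @code{pow} stands in for an initial table lookup.

```python
def limb_inverse(n, bits=64):
    """Return (inv, multiplies) with n*inv == 1 mod 2**bits, for odd n."""
    assert n % 2 == 1
    mask = (1 << bits) - 1
    inv = pow(n % 256, -1, 256)     # stands in for an 8-bit lookup table
    good = 8                        # number of correct low bits so far
    multiplies = 0
    while good < bits:
        # Newton step: if n*inv == 1 mod 2**k then the new value is an
        # inverse mod 2**(2k).  Two multiplies: inv*inv and then by n.
        inv = (2 * inv - inv * inv * n) & mask
        good *= 2
        multiplies += 2
    return inv, multiplies
```

For a 32-bit limb two steps are needed (8 to 16 to 32 bits), and for a 64-bit limb three.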
|
|
|
|
The sub-quadratic exact division described by Jebelean in ``Exact Division
|
|
with Karatsuba Complexity'' is not currently implemented. It uses a
|
|
rearrangement similar to the divide and conquer for normal division
|
|
(@pxref{Divide and Conquer Division}), but operating from low to high. A
|
|
further possibility not currently implemented is ``Bidirectional Exact Integer
|
|
Division'' by Krandick and Jebelean which forms quotient limbs from both the
|
|
high and low ends of the dividend, and can halve once more the number of
|
|
crossproducts needed in a 2N@cross{}N division.
|
|
|
|
A special case exact division by 3 exists in @code{mpn_divexact_by3},
|
|
supporting Toom-3 multiplication and @code{mpq} canonicalizations. It forms
|
|
quotient digits with a multiply by the modular inverse of 3 (which is
|
|
@code{0xAA..AAB}) and uses two comparisons to determine a borrow for the next
|
|
limb. The multiplications don't need to be on the dependent chain, as long as
|
|
the effect of the borrows is applied, which can help chips with pipelined
|
|
multipliers.
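
A limb-by-limb division by 3 along these lines might look as follows in Python. This is a sketch under stated assumptions, not the MPIR source: the name @code{divexact_by3} here is a stand-in operating on a little-endian list of limbs, and the two comparison constants are derived in the comments.

```python
def divexact_by3(limbs, bits=32):
    """Divide a little-endian limb list by 3.

    Returns (quotient limbs, carry); the carry is 0 when the input is an
    exact multiple of 3."""
    base = 1 << bits
    limb_max = base - 1
    inverse_3 = (limb_max // 3) * 2 + 1    # 0xAA..AAB
    one_third = limb_max // 3 + 1          # smallest q with 3*q >= base
    two_thirds = (limb_max // 3) * 2 + 1   # smallest q with 3*q >= 2*base
    quotient = []
    carry = 0
    for s in limbs:
        borrow = 1 if s < carry else 0
        low = (s - carry) % base
        q = (low * inverse_3) % base       # so that 3*q == low mod base
        quotient.append(q)
        # 3*q exceeds low by 0, 1 or 2 times base; two comparisons find
        # which, and that excess is owed by the next limb.
        carry = borrow + (q >= one_third) + (q >= two_thirds)
    return quotient, carry
```

The multiply for each limb depends on the previous limb only through the small @code{carry} value, which is the property the text refers to.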
|
|
|
|
|
|
@node Exact Remainder, Small Quotient Division, Exact Division, Division Algorithms
|
|
@subsection Exact Remainder
|
|
@cindex Exact remainder
|
|
|
|
If the exact division algorithm is done with a full subtraction at each stage
|
|
and the dividend isn't a multiple of the divisor, then low zero limbs are
|
|
produced but with a remainder in the high limbs. For dividend @math{a},
|
|
divisor @math{d}, quotient @math{q}, and @m{b = 2
|
|
\GMPraise{@code{mp\_bits\_per\_limb}}, b = 2^mp_bits_per_limb}, this remainder
|
|
@math{r} is of the form
|
|
@tex
|
|
$$ a = qd + r b^n $$
|
|
@end tex
|
|
@ifnottex
|
|
|
|
@example
|
|
a = q*d + r*b^n
|
|
@end example
|
|
|
|
@end ifnottex
|
|
@math{n} represents the number of zero limbs produced by the subtractions,
|
|
that being the number of limbs produced for @math{q}. @math{r} will be in the
|
|
range @math{0@le{}r<d} and can be viewed as a remainder, but one shifted up by
|
|
a factor of @math{b^n}.
|
|
|
|
Carrying out full subtractions at each stage means the same number of cross
|
|
products must be done as a normal division, but there's still some single limb
|
|
divisions saved. When @math{d} is a single limb some simplifications arise,
|
|
providing good speedups on a number of processors.
|
|
|
|
@code{mpn_bdivmod}, @code{mpn_divexact_by3}, @code{mpn_modexact_1_odd} and the
|
|
@code{redc} function in @code{mpz_powm} differ subtly in how they return
|
|
@math{r}, leading to some negations in the above formula, but all are
|
|
essentially the same.
|
|
|
|
@cindex Divisibility algorithm
|
|
@cindex Congruence algorithm
|
|
Clearly @math{r} is zero when @math{a} is a multiple of @math{d}, and this
|
|
leads to divisibility or congruence tests which are potentially more efficient
|
|
than a normal division.
|
|
|
|
The factor of @math{b^n} on @math{r} can be ignored in a GCD when @math{d} is
|
|
odd, hence the use of @code{mpn_bdivmod} in @code{mpn_gcd}, and the use of
|
|
@code{mpn_modexact_1_odd} by @code{mpn_gcd_1} and @code{mpz_kronecker_ui} etc
|
|
(@pxref{Greatest Common Divisor Algorithms}).
|
|
|
|
Montgomery's REDC method for modular multiplications uses operands of the form
@m{xb^{-n}, x*b^-n} and @m{yb^{-n}, y*b^-n} and on calculating @m{(xb^{-n})
(yb^{-n}), (x*b^-n)*(y*b^-n)} uses the factor of @math{b^n} in the exact
remainder to reach a product in the same form @m{(xy)b^{-n}, (x*y)*b^-n}
(@pxref{Modular Powering Algorithm}).
|
|
|
|
Notice that @math{r} generally gives no useful information about the ordinary
|
|
remainder @math{a @bmod d} since @math{b^n @bmod d} could be anything. If
|
|
however @math{b^n @equiv{} 1 @bmod d}, then @math{r} is the negative of the
|
|
ordinary remainder. This occurs whenever @math{d} is a factor of
|
|
@math{b^n-1}, as for example with 3 in @code{mpn_divexact_by3}. For a 32 or
|
|
64 bit limb other such factors include 5, 17 and 257, but no particular use
|
|
has been found for this.
|
|
|
|
|
|
@node Small Quotient Division, , Exact Remainder, Division Algorithms
|
|
@subsection Small Quotient Division
|
|
|
|
An N@cross{}M division where the number of quotient limbs Q=N@minus{}M is
|
|
small can be optimized somewhat.
|
|
|
|
An ordinary basecase division normalizes the divisor by shifting it to make
|
|
the high bit set, shifting the dividend accordingly, and shifting the
|
|
remainder back down at the end of the calculation. This is wasteful if only a
|
|
few quotient limbs are to be formed. Instead a division of just the top
|
|
@m{\rm2Q,2*Q} limbs of the dividend by the top Q limbs of the divisor can be
|
|
used to form a trial quotient. This requires only those limbs normalized, not
|
|
the whole of the divisor and dividend.
|
|
|
|
A multiply and subtract then applies the trial quotient to the M@minus{}Q
|
|
unused limbs of the divisor and N@minus{}Q dividend limbs (which includes Q
|
|
limbs remaining from the trial quotient division). The starting trial
|
|
quotient can be 1 or 2 too big, but all cases of 2 too big and most cases of 1
|
|
too big are detected by first comparing the most significant limbs that will
|
|
arise from the subtraction. An addback is done if the quotient still turns
|
|
out to be 1 too big.
|
|
|
|
This whole procedure is essentially the same as one step of the basecase
|
|
algorithm done in a Q limb base, though with the trial quotient test done only
|
|
with the high limbs, not an entire Q limb ``digit'' product. The correctness
|
|
of this weaker test can be established by following the argument of Knuth
|
|
section 4.3.1 exercise 20 but with the @m{v_2 \GMPhat q > b \GMPhat r
|
|
+ u_2, v2*q>b*r+u2} condition appropriately relaxed.
|
|
|
|
|
|
@need 1000
|
|
@node Greatest Common Divisor Algorithms, Powering Algorithms, Division Algorithms, Algorithms
|
|
@section Greatest Common Divisor
|
|
@cindex Greatest common divisor algorithms
|
|
@cindex GCD algorithms
|
|
|
|
@menu
|
|
* Binary GCD::
|
|
* Accelerated GCD::
|
|
* Extended GCD::
|
|
* Jacobi Symbol::
|
|
@end menu
|
|
|
|
|
|
@node Binary GCD, Accelerated GCD, Greatest Common Divisor Algorithms, Greatest Common Divisor Algorithms
|
|
@subsection Binary GCD
|
|
|
|
At small sizes MPIR uses an @math{O(N^2)} binary style GCD@. This is described
|
|
in many textbooks, for example Knuth section 4.5.2 algorithm B@. It simply
|
|
consists of successively reducing odd operands @math{a} and @math{b} using
|
|
|
|
@quotation
|
|
@math{a,b = @abs{}(a-b),@min{}(a,b)} @*
|
|
strip factors of 2 from @math{a}
|
|
@end quotation
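
As a rough illustration, the reduction can be written in a few lines of Python. The name @code{binary_gcd} is hypothetical, Python integers stand in for limb arrays, and the initial removal of common factors of 2 is an assumption added so the function accepts even inputs.

```python
def binary_gcd(a, b):
    """GCD of two positive integers by the subtract-and-shift reduction."""
    assert a > 0 and b > 0
    shift = 0
    while (a | b) % 2 == 0:     # pull out the common power of 2 first
        a >>= 1
        b >>= 1
        shift += 1
    while a % 2 == 0:           # make both operands odd
        a >>= 1
    while b % 2 == 0:
        b >>= 1
    while a != b:
        a, b = abs(a - b), min(a, b)
        while a % 2 == 0:       # a is even and non-zero: strip factors of 2
            a >>= 1
    return a << shift
```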
|
|
|
|
The Euclidean GCD algorithm, as per Knuth algorithms E and A, reduces using
|
|
@math{a @bmod b} but this has so far been found to be slower everywhere. One
|
|
reason the binary method does well is that the implied quotient at each step
|
|
is usually small, so often only one or two subtractions are needed to get the
|
|
same effect as a division. Quotients 1, 2 and 3 for example occur 67.7% of
|
|
the time, see Knuth section 4.5.3 Theorem E.
|
|
|
|
When the implied quotient is large, meaning @math{b} is much smaller than
|
|
@math{a}, then a division is worthwhile. This is the basis for the initial
|
|
@math{a @bmod b} reductions in @code{mpn_gcd} and @code{mpn_gcd_1} (the latter
|
|
for both N@cross{}1 and 1@cross{}1 cases). But after that initial reduction,
|
|
big quotients occur too rarely to make it worth checking for them.
|
|
|
|
@sp 1
|
|
The final @math{1@cross{}1} GCD in @code{mpn_gcd_1} is done in the generic C
|
|
code as described above. For two N-bit operands, the algorithm takes about
|
|
0.68 iterations per bit. For optimum performance some attention needs to be
|
|
paid to the way the factors of 2 are stripped from @math{a}.
|
|
|
|
Firstly it may be noted that in twos complement the number of low zero bits on
|
|
@math{a-b} is the same as @math{b-a}, so counting or testing can begin on
|
|
@math{a-b} without waiting for @math{@abs{}(a-b)} to be determined.
|
|
|
|
A loop stripping low zero bits tends not to branch predict well, since the
|
|
condition is data dependent. But on average there's only a few low zeros, so
|
|
an option is to strip one or two bits arithmetically then loop for more (as
|
|
done for AMD K6). Or use a lookup table to get a count for several bits then
|
|
loop for more (as done for AMD K7). An alternative approach is to keep just
|
|
one of @math{a} or @math{b} odd and iterate
|
|
|
|
@quotation
|
|
@math{a,b = @abs{}(a-b), @min{}(a,b)} @*
|
|
@math{a = a/2} if even @*
|
|
@math{b = b/2} if even
|
|
@end quotation
|
|
|
|
This requires about 1.25 iterations per bit, but stripping of a single bit at
|
|
each step avoids any branching. Repeating the bit strip reduces to about 0.9
|
|
iterations per bit, which may be a worthwhile tradeoff.
|
|
|
|
Generally with the above approaches a speed of perhaps 6 cycles per bit can be
achieved, which is still not terribly fast, with for instance a 64-bit GCD
taking nearly 400 cycles. It's this sort of time which means it's not usually
advantageous to combine a set of divisibility tests into a GCD.
|
|
|
|
|
|
@node Accelerated GCD, Extended GCD, Binary GCD, Greatest Common Divisor Algorithms
|
|
@subsection Accelerated GCD
|
|
|
|
For sizes above @code{GCD_ACCEL_THRESHOLD}, MPIR uses the Accelerated GCD
|
|
algorithm described independently by Weber and Jebelean (the latter as the
|
|
``Generalized Binary'' algorithm), @pxref{References}. This algorithm is
|
|
still @math{O(N^2)}, but is much faster than the binary algorithm since it
|
|
does fewer multi-precision operations. It consists of alternating the
|
|
@math{k}-ary reduction by Sorenson, and a ``dmod'' exact remainder reduction.
|
|
|
|
For operands @math{u} and @math{v} the @math{k}-ary reduction replaces
|
|
@math{u} with @m{nv-du,n*v-d*u} where @math{n} and @math{d} are single limb
|
|
values chosen to give two trailing zero limbs on that value, which can be
|
|
stripped. @math{n} and @math{d} are calculated using an algorithm similar to
|
|
half of a two limb GCD (see @code{find_a} in @file{mpn/generic/gcd.c}).
|
|
|
|
When @math{u} and @math{v} differ in size by more than a certain number of
|
|
bits, a dmod is performed to zero out bits at the low end of the larger. It
|
|
consists of an exact remainder style division applied to an appropriate number
|
|
of bits (@pxref{Exact Division}, and @pxref{Exact Remainder}). This is faster
|
|
than a @math{k}-ary reduction but useful only when the operands differ in
|
|
size. There's a dmod after each @math{k}-ary reduction, and if the dmod
|
|
leaves the operands still differing in size then it's repeated.
|
|
|
|
The @math{k}-ary reduction step can introduce spurious factors into the GCD
|
|
calculated, and these are eliminated at the end by taking GCDs with the
|
|
original inputs @math{@gcd{}(u,@gcd{}(v,g))} using the binary algorithm.
|
|
Since @math{g} is almost always small this takes very little time.
|
|
|
|
At small sizes the algorithm needs a good implementation of @code{find_a}. At
|
|
larger sizes it's dominated by @code{mpn_addmul_1} applying @math{n} and
|
|
@math{d}.
|
|
|
|
|
|
@node Extended GCD, Jacobi Symbol, Accelerated GCD, Greatest Common Divisor Algorithms
|
|
@subsection Extended GCD
|
|
|
|
The extended GCD calculates @math{@gcd{}(a,b)} and also cofactors @math{x} and
|
|
@math{y} satisfying @m{ax+by=\gcd(a@C{}b), a*x+b*y=gcd(a@C{}b)}. Lehmer's
|
|
multi-step improvement of the extended Euclidean algorithm is used. See Knuth
|
|
section 4.5.2 algorithm L, and @file{mpn/generic/gcdext.c}. This is an
|
|
@math{O(N^2)} algorithm.
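
The cofactor relation can be seen in the plain extended Euclidean algorithm, sketched here in Python. This is deliberately not Lehmer's multi-step variant: it takes one full quotient per step, and the name @code{gcdext} is only a stand-in. Lehmer's improvement replaces most of the full-size @code{divmod} calls with single or double limb calculations whose results are applied several steps at a time.

```python
def gcdext(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b), for a, b >= 0."""
    x0, y0 = 1, 0       # invariant: current a == A*x0 + B*y0
    x1, y1 = 0, 1       # invariant: current b == A*x1 + B*y1
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0
```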
|
|
|
|
The multipliers at each step are found using single limb calculations for
|
|
sizes up to @code{GCDEXT_THRESHOLD}, or double limb calculations above that.
|
|
The single limb code is faster but doesn't produce full-limb multipliers,
|
|
hence not making full use of the @code{mpn_addmul_1} calls.
|
|
|
|
When a CPU has a data-dependent multiplier, meaning one which is faster on
|
|
operands with fewer bits, the extra work in the double-limb calculation might
|
|
only save some looping overheads, leading to a large @code{GCDEXT_THRESHOLD}.
|
|
|
|
Currently the single limb calculation doesn't optimize for the small quotients
|
|
that often occur, and this can lead to unusually low values of
|
|
@code{GCDEXT_THRESHOLD}, depending on the CPU.
|
|
|
|
An analysis of double-limb calculations can be found in ``A Double-Digit
|
|
Lehmer-Euclid Algorithm'' by Jebelean (@pxref{References}). The code in MPIR
|
|
was developed independently.
|
|
|
|
It should be noted that when a double limb calculation is used, it's used for
the whole of that GCD; it doesn't fall back to single limb part way through.
This is because as the algorithm proceeds, the inputs @math{a} and @math{b}
are reduced, but the cofactors @math{x} and @math{y} grow, so the multipliers
at each step are applied to a roughly constant total number of limbs.
|
|
|
|
|
|
@node Jacobi Symbol, , Extended GCD, Greatest Common Divisor Algorithms
|
|
@subsection Jacobi Symbol
|
|
@cindex Jacobi symbol algorithm
|
|
|
|
@code{mpz_jacobi} and @code{mpz_kronecker} are currently implemented with a
|
|
simple binary algorithm similar to that described for the GCDs (@pxref{Binary
|
|
GCD}). They're not very fast when both inputs are large. Lehmer's multi-step
|
|
improvement or a binary based multi-step algorithm is likely to be better.
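
A binary style Jacobi symbol calculation can be sketched as follows in Python. The function name @code{jacobi} is hypothetical, and the sign handling uses ordinary conditionals for clarity where a tuned implementation would accumulate sign bits.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n, by a binary style reduction."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):     # (2/n) is -1 when n is 3 or 5 mod 8
                result = -result
        if a < n:
            a, n = n, a             # quadratic reciprocity, both odd
            if a % 4 == 3 and n % 4 == 3:
                result = -result
        a -= n                      # (a/n) == ((a-n)/n), and a-n is even
    return result if n == 1 else 0
```

As in the binary GCD, each step is a subtraction followed by stripping factors of 2, so no division is needed after the initial reduction.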
|
|
|
|
When one operand fits a single limb, and that includes @code{mpz_kronecker_ui}
|
|
and friends, an initial reduction is done with either @code{mpn_mod_1} or
|
|
@code{mpn_modexact_1_odd}, followed by the binary algorithm on a single limb.
|
|
The binary algorithm is well suited to a single limb, and the whole
|
|
calculation in this case is quite efficient.
|
|
|
|
In all the routines sign changes for the result are accumulated using some bit
|
|
twiddling, avoiding table lookups or conditional jumps.
|
|
|
|
|
|
@need 1000
|
|
@node Powering Algorithms, Root Extraction Algorithms, Greatest Common Divisor Algorithms, Algorithms
|
|
@section Powering Algorithms
|
|
@cindex Powering algorithms
|
|
|
|
@menu
|
|
* Normal Powering Algorithm::
|
|
* Modular Powering Algorithm::
|
|
@end menu
|
|
|
|
|
|
@node Normal Powering Algorithm, Modular Powering Algorithm, Powering Algorithms, Powering Algorithms
|
|
@subsection Normal Powering
|
|
|
|
Normal @code{mpz} or @code{mpf} powering uses a simple binary algorithm,
|
|
successively squaring and then multiplying by the base when a 1 bit is seen in
|
|
the exponent, as per Knuth section 4.6.3. The ``left to right''
|
|
variant described there is used rather than algorithm A, since it's just as
|
|
easy and can be done with somewhat less temporary memory.
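
The left to right method is short enough to show in full. The Python below is a sketch with a hypothetical function name; only a single running result is kept, which is where the saving in temporary memory comes from.

```python
def power_left_to_right(base, exponent):
    """Return base**exponent for exponent >= 0, scanning bits high to low."""
    assert exponent >= 0
    result = 1
    for bit in bin(exponent)[2:]:   # most significant bit first
        result *= result            # square for every exponent bit
        if bit == "1":
            result *= base          # multiply when the bit is 1
    return result
```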
|
|
|
|
|
|
@node Modular Powering Algorithm, , Normal Powering Algorithm, Powering Algorithms
|
|
@subsection Modular Powering
|
|
|
|
Modular powering is implemented using a @math{2^k}-ary sliding window
|
|
algorithm, as per ``Handbook of Applied Cryptography'' algorithm 14.85
|
|
(@pxref{References}). @math{k} is chosen according to the size of the
|
|
exponent. Larger exponents use larger values of @math{k}, the choice being
|
|
made to minimize the average number of multiplications that must supplement
|
|
the squaring.
|
|
|
|
The modular multiplies and squares use either a simple division or the REDC
|
|
method by Montgomery (@pxref{References}). REDC is a little faster,
|
|
essentially saving N single limb divisions in a fashion similar to an exact
|
|
remainder (@pxref{Exact Remainder}). The current REDC has some limitations.
|
|
It's only @math{O(N^2)} so above @code{POWM_THRESHOLD} division becomes faster
|
|
and is used. It doesn't attempt to detect small bases, but rather always uses
|
|
a REDC form, which is usually a full size operand. And lastly it's only
|
|
applied to odd moduli.
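
The REDC step can be sketched in Python as below. This is an illustration under stated assumptions rather than the MPIR code: @code{neg_inverse} and @code{redc} are hypothetical names, Python integers stand in for limb arrays, and one quotient limb is formed per pass in the manner of an exact remainder.

```python
def neg_inverse(m, bits=32):
    """Return -1/m mod 2**bits, for odd m (needs Python 3.8 or later)."""
    base = 1 << bits
    return (base - pow(m, -1, base)) % base

def redc(t, m, m_neg_inv, n_limbs, bits=32):
    """Return t * b**(-n_limbs) mod m, where b = 2**bits, m is odd and
    0 <= t < m * b**n_limbs."""
    base = 1 << bits
    for _ in range(n_limbs):
        q = (t % base) * m_neg_inv % base   # makes the low limb of t + q*m zero
        t = (t + q * m) // base             # exact division by the limb base
    return t - m if t >= m else t           # t < 2*m here, one subtract suffices
```

With operands held as @math{x*R} and @math{y*R} modulo @math{m}, where @math{R} is the limb base raised to the number of limbs, applying @code{redc} to their product gives @math{x*y*R} modulo @math{m}, so results stay in the same form throughout a powering.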
|
|
|
|
|
|
@node Root Extraction Algorithms, Radix Conversion Algorithms, Powering Algorithms, Algorithms
|
|
@section Root Extraction Algorithms
|
|
@cindex Root extraction algorithms
|
|
|
|
@menu
|
|
* Square Root Algorithm::
|
|
* Nth Root Algorithm::
|
|
* Perfect Square Algorithm::
|
|
* Perfect Power Algorithm::
|
|
@end menu
|
|
|
|
|
|
@node Square Root Algorithm, Nth Root Algorithm, Root Extraction Algorithms, Root Extraction Algorithms
|
|
@subsection Square Root
|
|
@cindex Square root algorithm
|
|
@cindex Karatsuba square root algorithm
|
|
|
|
Square roots are taken using the ``Karatsuba Square Root'' algorithm by Paul
|
|
Zimmermann (@pxref{References}).
|
|
|
|
An input @math{n} is split into four parts of @math{k} bits each, so with
|
|
@math{b=2^k} we have @m{n = a_3b^3 + a_2b^2 + a_1b + a_0, n = a3*b^3 + a2*b^2
|
|
+ a1*b + a0}. Part @ms{a,3} must be ``normalized'' so that either the high or
|
|
second highest bit is set. In MPIR, @math{k} is kept on a limb boundary and
|
|
the input is left shifted (by an even number of bits) to normalize.
|
|
|
|
The square root of the high two parts is taken, by recursive application of
|
|
the algorithm (bottoming out in a one-limb Newton's method),
|
|
@tex
|
|
$$ s',r' = \mathop{\rm sqrtrem} \> (a_3b + a_2) $$
|
|
@end tex
|
|
@ifnottex
|
|
|
|
@example
|
|
s1,r1 = sqrtrem (a3*b + a2)
|
|
@end example
|
|
|
|
@end ifnottex
|
|
This is an approximation to the desired root and is extended by a division to
|
|
give @math{s},@math{r},
|
|
@tex
|
|
$$\eqalign{
|
|
q,u &= \mathop{\rm divrem} \> (r'b + a_1, 2s') \cr
|
|
s &= s'b + q \cr
|
|
r &= ub + a_0 - q^2
|
|
}$$
|
|
@end tex
|
|
@ifnottex
|
|
|
|
@example
|
|
q,u = divrem (r1*b + a1, 2*s1)
|
|
s = s1*b + q
|
|
r = u*b + a0 - q^2
|
|
@end example
|
|
|
|
@end ifnottex
|
|
The normalization requirement on @ms{a,3} means at this point @math{s} is
|
|
either correct or 1 too big. @math{r} is negative in the latter case, so
|
|
@tex
|
|
$$\eqalign{
|
|
\mathop{\rm if} \; r &< 0 \; \mathop{\rm then} \cr
|
|
r &\leftarrow r + 2s - 1 \cr
|
|
s &\leftarrow s - 1
|
|
}$$
|
|
@end tex
|
|
@ifnottex
|
|
|
|
@example
|
|
if r < 0 then
|
|
r = r + 2*s - 1
|
|
s = s - 1
|
|
@end example
|
|
|
|
@end ifnottex
|
|
The algorithm is expressed in a divide and conquer form, but as noted in the
|
|
paper it can also be viewed as a discrete variant of Newton's method, or as a
|
|
variation on the schoolboy method (no longer taught) for square roots two
|
|
digits at a time.
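
The steps above can be sketched recursively in Python. Several points are assumptions made for the sake of a short example: the name @code{sqrtrem} is a stand-in, @math{k} is counted in bits rather than kept on a limb boundary, @code{math.isqrt} (Python 3.8 or later) stands in for the one-limb base case, and the normalization shift is undone by recomputing the remainder rather than adjusting it.

```python
import math

def sqrtrem(n):
    """Return (s, r) with s*s + r == n and 0 <= r <= 2*s."""
    if n < (1 << 64):
        s = math.isqrt(n)               # stands in for the one-limb base case
        return s, n - s * s
    bits = n.bit_length()
    k = (bits + 3) // 4                 # four parts of k bits each
    shift = (4 * k - bits) & ~1         # even shift, leaves a3 >= b/4
    m = n << shift
    b = 1 << k
    a0 = m & (b - 1)
    a1 = (m >> k) & (b - 1)
    high = m >> (2 * k)                 # a3*b + a2
    s1, r1 = sqrtrem(high)
    q, u = divmod(r1 * b + a1, 2 * s1)
    s = s1 * b + q
    r = u * b + a0 - q * q
    if r < 0:                           # s was 1 too big
        r += 2 * s - 1
        s -= 1
    s >>= shift // 2                    # undo the normalization
    return s, n - s * s
```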
|
|
|
|
If the remainder @math{r} is not required then usually only a few high limbs
|
|
of @math{r} and @math{u} need to be calculated to determine whether an
|
|
adjustment to @math{s} is required. This optimization is not currently
|
|
implemented.
|
|
|
|
In the Karatsuba multiplication range this algorithm is @m{O({3\over2}
|
|
M(N/2)),O(1.5*M(N/2))}, where @math{M(n)} is the time to multiply two numbers
|
|
of @math{n} limbs. In the FFT multiplication range this grows to a bound of
|
|
@m{O(6 M(N/2)),O(6*M(N/2))}. In practice a factor of about 1.5 to 1.8 is
|
|
found in the Karatsuba and Toom-3 ranges, growing to 2 or 3 in the FFT range.
|
|
|
|
The algorithm does all its calculations in integers and the resulting
|
|
@code{mpn_sqrtrem} is used for both @code{mpz_sqrt} and @code{mpf_sqrt}.
|
|
The extended precision given by @code{mpf_sqrt_ui} is obtained by
|
|
padding with zero limbs.
|
|
|
|
|
|
@node Nth Root Algorithm, Perfect Square Algorithm, Square Root Algorithm, Root Extraction Algorithms
|
|
@subsection Nth Root
|
|
@cindex Root extraction algorithm
|
|
@cindex Nth root algorithm
|
|
|
|
Integer Nth roots are taken using Newton's method with the following
|
|
iteration, where @math{A} is the input and @math{n} is the root to be taken.
|
|
@tex
|
|
$$a_{i+1} = {1\over n} \left({A \over a_i^{n-1}} + (n-1)a_i \right)$$
|
|
@end tex
|
|
@ifnottex
|
|
|
|
@example
|
|
1 A
|
|
a[i+1] = - * ( --------- + (n-1)*a[i] )
|
|
n a[i]^(n-1)
|
|
@end example
|
|
|
|
@end ifnottex
|
|
The initial approximation @m{a_1,a[1]} is generated bitwise by successively
|
|
powering a trial root with or without new 1 bits, aiming to be just above the
|
|
true root. The iteration converges quadratically when started from a good
|
|
approximation. When @math{n} is large more initial bits are needed to get
|
|
good convergence. The current implementation is not particularly well
|
|
optimized.
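
The iteration can be tried out with the Python sketch below. The name @code{nth_root} is hypothetical, and the starting value is a crude power of 2 above the true root rather than the bitwise approximation described above, so more iterations are taken than a tuned implementation would need.

```python
def nth_root(A, n):
    """Return the integer part of the n-th root of A, for A >= 0, n >= 1."""
    if A < 2 or n == 1:
        return A
    # Start above the true root: 2**ceil(bits/n) is at least A**(1/n).
    a = 1 << -(-A.bit_length() // n)
    while True:
        new = ((n - 1) * a + A // a ** (n - 1)) // n
        if new >= a:        # no longer decreasing: a is the truncated root
            return a
        a = new
```

Starting above the root matters: from there the integer iteration decreases steadily and the first value that fails to decrease is the truncated root.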
|
|
|
|
|
|
@node Perfect Square Algorithm, Perfect Power Algorithm, Nth Root Algorithm, Root Extraction Algorithms
|
|
@subsection Perfect Square
|
|
@cindex Perfect square algorithm
|
|
|
|
A significant fraction of non-squares can be quickly identified by checking
|
|
whether the input is a quadratic residue modulo small integers.
|
|
|
|
@code{mpz_perfect_square_p} first tests the input mod 256, which means just
|
|
examining the low byte. Only 44 different values occur for squares mod 256,
|
|
so 82.8% of inputs can be immediately identified as non-squares.
|
|
|
|
On a 32-bit system similar tests are done mod 9, 5, 7, 13 and 17, for a total
|
|
99.25% of inputs identified as non-squares. On a 64-bit system 97 is tested
|
|
too, for a total 99.62%.
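
The percentages quoted can be checked by counting quadratic residues directly. The Python sketch below uses hypothetical helper names and assumes the moduli are pairwise coprime, so that the fractions passing each test simply multiply.

```python
from fractions import Fraction

def square_residue_count(m):
    """Number of distinct values taken by x*x mod m."""
    return len(set(x * x % m for x in range(m)))

def fraction_rejected(moduli):
    """Fraction of inputs identified as non-squares by the given moduli,
    assuming they are pairwise coprime."""
    passed = Fraction(1)
    for m in moduli:
        passed *= Fraction(square_residue_count(m), m)
    return 1 - passed
```

For instance @code{fraction_rejected([256, 9, 5, 7, 13, 17])} corresponds to the 32-bit case and adding 97 to the list gives the 64-bit case.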
|
|
|
|
These moduli are chosen because they're factors of @math{2^@W{24}-1} (or
|
|
@math{2^@W{48}-1} for 64-bits), and such a remainder can be quickly taken just
|
|
using additions (see @code{mpn_mod_34lsub1}).
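
The reason additions suffice is that @math{2^24} is congruent to 1 modulo @math{2^24-1}, so 24-bit pieces of the input can simply be summed. The Python sketch below shows the idea on a whole integer; the name @code{fold_24} is hypothetical and the real routine works on limbs.

```python
def fold_24(n):
    """Return a value congruent to n modulo 2**24 - 1, using shifts and adds."""
    total = 0
    while n:
        total += n & 0xFFFFFF   # each 24-bit chunk has weight 1 mod 2**24 - 1
        n >>= 24
    return total
```

The folded value is small, and since 9, 5, 7, 13 and 17 all divide @math{2^24-1}, it has the same remainder as the input for each of them.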
|
|
|
|
When nails are in use moduli are instead selected by the @file{gen-psqr.c}
|
|
program and applied with an @code{mpn_mod_1}. The same @math{2^@W{24}-1} or
|
|
@math{2^@W{48}-1} could be done with nails using some extra bit shifts, but
|
|
this is not currently implemented.
|
|
|
|
In any case each modulus is applied to the @code{mpn_mod_34lsub1} or
|
|
@code{mpn_mod_1} remainder and a table lookup identifies non-squares. By
|
|
using a ``modexact'' style calculation, and suitably permuted tables, just one
|
|
multiply each is required, see the code for details. Moduli are also combined
|
|
to save operations, so long as the lookup tables don't become too big.
|
|
@file{gen-psqr.c} does all the pre-calculations.
|
|
|
|
A square root must still be taken for any value that passes these tests, to
|
|
verify it's really a square and not one of the small fraction of non-squares
|
|
that get through (i.e.@: a pseudo-square to all the tested bases).
|
|
|
|
Clearly more residue tests could be done; @code{mpz_perfect_square_p} only
uses a compact and efficient set. Big inputs would probably benefit from more
residue testing, while small inputs might be better off with less. The assumed
distribution of squares versus non-squares in the input would affect such
considerations.
|
|
|
|
|
|
@node Perfect Power Algorithm, , Perfect Square Algorithm, Root Extraction Algorithms
|
|
@subsection Perfect Power
|
|
@cindex Perfect power algorithm
|
|
|
|
Detecting perfect powers is required by some factorization algorithms.
|
|
Currently @code{mpz_perfect_power_p} is implemented using repeated Nth root
|
|
extractions, though naturally only prime roots need to be considered.
|
|
(@xref{Nth Root Algorithm}.)
|
|
|
|
If a prime divisor @math{p} with multiplicity @math{e} can be found, then only
|
|
roots which are divisors of @math{e} need to be considered, much reducing the
|
|
work necessary. To this end divisibility by a set of small primes is checked.
|
|
|
|
|
|
@node Radix Conversion Algorithms, Other Algorithms, Root Extraction Algorithms, Algorithms
|
|
@section Radix Conversion
|
|
@cindex Radix conversion algorithms
|
|
|
|
Radix conversions are less important than other algorithms. A program
|
|
dominated by conversions should probably use a different data representation.
|
|
|
|
@menu
|
|
* Binary to Radix::
|
|
* Radix to Binary::
|
|
@end menu
|
|
|
|
|
|
@node Binary to Radix, Radix to Binary, Radix Conversion Algorithms, Radix Conversion Algorithms
|
|
@subsection Binary to Radix
|
|
|
|
Conversions from binary to a power-of-2 radix use a simple and fast
|
|
@math{O(N)} bit extraction algorithm.
|
|
|
|
Conversions from binary to other radices use one of two algorithms. Sizes
|
|
below @code{GET_STR_PRECOMPUTE_THRESHOLD} use a basic @math{O(N^2)} method.
|
|
Repeated divisions by @math{b^n} are made, where @math{b} is the radix and
|
|
@math{n} is the biggest power that fits in a limb. But instead of simply
|
|
using the remainder @math{r} from such divisions, an extra divide step is done
|
|
to give a fractional limb representing @math{r/b^n}. The digits of @math{r}
|
|
can then be extracted using multiplications by @math{b} rather than divisions.
|
|
Special case code is provided for decimal, allowing multiplications by 10 to
|
|
optimize to shifts and adds.
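
The fractional limb idea can be sketched in Python as follows. The name @code{digits_by_multiplying} and the rounding choice are assumptions for illustration: the fraction is rounded up by one unit so that truncation cannot lose a digit, which is safe because @math{b^n} is strictly less than the limb base.

```python
def digits_by_multiplying(r, radix=10, n=9, bits=32):
    """Return the n base-radix digits of r, most significant first."""
    limb = 1 << bits
    big = radix ** n
    assert 0 <= r < big < limb
    # Fraction r/big scaled to a limb, rounded up by one unit.
    frac = (r * limb) // big + 1
    digits = []
    for _ in range(n):
        frac *= radix
        digits.append(frac >> bits)     # integer part is the next digit
        frac &= limb - 1                # keep the fractional part
    return digits
```

Only one division is needed per limb's worth of digits; every digit after that costs a multiplication and a shift.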
|
|
|
|
Above @code{GET_STR_PRECOMPUTE_THRESHOLD} a sub-quadratic algorithm is used.
|
|
For an input @math{t}, powers @m{b^{n2^i},b^(n*2^i)} of the radix are
|
|
calculated, until a power between @math{t} and @m{\sqrt{t},sqrt(t)} is
|
|
reached. @math{t} is then divided by that largest power, giving a quotient
|
|
which is the digits above that power, and a remainder which is those below.
|
|
These two parts are in turn divided by the second highest power, and so on
|
|
recursively. When a piece has been divided down to less than
|
|
@code{GET_STR_DC_THRESHOLD} limbs, the basecase algorithm described above is
|
|
used.
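
The recursion can be sketched in Python as below. All names are hypothetical, and for brevity the sketch recurses right down to pieces of @math{n} digits instead of switching to the basecase at a tuned threshold.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def basecase(t, radix, width):
    """Quadratic conversion of t into exactly width digits, zero padded."""
    out = []
    for _ in range(width):
        t, d = divmod(t, radix)
        out.append(DIGITS[d])
    return "".join(reversed(out))

def convert(t, radix, n, powers, i):
    """Zero padded digits of t, where t < powers[i]**2
    (or t < radix**n when i is -1)."""
    if i < 0:
        return basecase(t, radix, n)
    high, low = divmod(t, powers[i])    # one big division splits t in two
    return (convert(high, radix, n, powers, i - 1)
            + convert(low, radix, n, powers, i - 1))

def to_string(t, radix=10, n=9):
    """Digit string of t >= 0; n is the number of digits per limb."""
    assert t >= 0
    powers = []                         # radix**(n*2**i) for i = 0, 1, ...
    value = radix ** n
    while value <= t:
        powers.append(value)
        value *= value
    text = convert(t, radix, n, powers, len(powers) - 1)
    return text.lstrip("0") or "0"
```

The last power stored is at most @math{t} while its square exceeds @math{t}, which is the condition on the largest power described above.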
|
|
|
|
The advantage of this algorithm is that big divisions can make use of the
sub-quadratic divide and conquer division (@pxref{Divide and Conquer
Division}), and big divisions tend to have lower overheads than lots of
separate single limb divisions anyway. But in any case the cost of
calculating the powers @m{b^{n2^i},b^(n*2^i)} must first be overcome.
|
|
|
|
@code{GET_STR_PRECOMPUTE_THRESHOLD} and @code{GET_STR_DC_THRESHOLD} represent
|
|
the same basic thing, the point where it becomes worth doing a big division to
|
|
cut the input in half. @code{GET_STR_PRECOMPUTE_THRESHOLD} includes the cost
|
|
of calculating the radix power required, whereas @code{GET_STR_DC_THRESHOLD}
|
|
assumes that's already available, which is the case when recursing.
|
|
|
|
Since the base case produces digits from least to most significant but they
must be stored from most to least, it's necessary to calculate in advance
how many digits there will be, or at least be sure not to underestimate that.
For MPIR the number of input bits is multiplied by @code{chars_per_bit_exactly}
from @code{mp_bases}, rounding up. The result is either correct or one too
big.
|
|
|
|
Examining some of the high bits of the input could increase the chance of
|
|
getting the exact number of digits, but an exact result every time would not
|
|
be practical, since in general the difference between numbers 100@dots{} and
|
|
99@dots{} is only in the last few bits and the work to identify 99@dots{}
|
|
might well be almost as much as a full conversion.
|
|
|
|
@code{mpf_get_str} doesn't currently use the algorithm described here. It
multiplies or divides by a power of @math{b} to move the radix point to just
above the highest non-zero digit (or at worst one above that location),
then multiplies by @math{b^n} to bring out digits. This is @math{O(N^2)} and
is certainly not optimal.
|
|
|
|
The @math{r/b^n} scheme described above for using multiplications to bring out
|
|
digits might be useful for more than a single limb. Some brief experiments
|
|
with it on the base case when recursing didn't give a noticeable improvement,
|
|
but perhaps that was only due to the implementation. Something similar would
|
|
work for the sub-quadratic divisions too, though there would be the cost of
|
|
calculating a bigger radix power.
|
|
|
|
Another possible improvement for the sub-quadratic part would be to arrange
for radix powers that balanced the sizes of quotient and remainder produced,
i.e.@: the highest power would be a @m{b^{nk},b^(n*k)} approximately equal to
@m{\sqrt{t},sqrt(t)}, not restricted to a @math{2^i} factor. That ought to
smooth out a graph of times against sizes, but may or may not be a net
speedup.
|
|
|
|
|
|
@node Radix to Binary, , Binary to Radix, Radix Conversion Algorithms
|
|
@subsection Radix to Binary
|
|
|
|
Conversions from a power-of-2 radix into binary use a simple and fast
|
|
@math{O(N)} bitwise concatenation algorithm.
|
|
|
|
Conversions from other radices use one of two algorithms. Sizes below
|
|
@code{SET_STR_THRESHOLD} use a basic @math{O(N^2)} method. Groups of @math{n}
|
|
digits are converted to limbs, where @math{n} is the biggest power of the base
|
|
@math{b} which will fit in a limb, then those groups are accumulated into the
|
|
result by multiplying by @math{b^n} and adding. This saves multi-precision
|
|
operations, as per Knuth section 4.4 part E (@pxref{References}). Some
|
|
special case code is provided for decimal, giving the compiler a chance to
|
|
optimize multiplications by 10.
|
|
|
|
Above @code{SET_STR_THRESHOLD} a sub-quadratic algorithm is used. First
|
|
groups of @math{n} digits are converted into limbs. Then adjacent limbs are
|
|
combined into limb pairs with @m{xb^n+y,x*b^n+y}, where @math{x} and @math{y}
|
|
are the limbs. Adjacent limb pairs are combined into quads similarly with
|
|
@m{xb^{2n}+y,x*b^(2n)+y}. This continues until a single block remains, that
|
|
being the result.
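
The pairwise combination might be sketched like this, again in Python with
built-in integers.  The name @code{digits_to_int_subquadratic} is hypothetical
and the default group size of 19 digits assumes decimal input and 64-bit limbs.

@example
```python
def digits_to_int_subquadratic(digits, base=10, n=19):
    """Combine n-digit groups pairwise, doubling the block size each pass."""
    # Stage 0: split into n-digit groups, least significant group first.
    blocks = []
    end = len(digits)
    while end > 0:
        start = max(0, end - n)
        blocks.append(int(digits[start:end], base))
        end = start
    if not blocks:
        return 0
    power = base ** n
    while len(blocks) > 1:
        merged = []
        for i in range(0, len(blocks) - 1, 2):
            y, x = blocks[i], blocks[i + 1]   # y is the low block, x the high
            merged.append(x * power + y)      # one big multiply per pair
        if len(blocks) % 2:
            merged.append(blocks[-1])         # unpaired top block carries over
        blocks = merged
        power *= power                        # b^n, b^(2n), b^(4n), ...
    return blocks[0]
```
@end example

The powers are obtained here by repeated squaring, which is one simple way of
producing the sequence of radix powers that the method needs.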
|
|
|
|
The advantage of this method is that the multiplications for each @math{x} are
|
|
big blocks, allowing Karatsuba and higher algorithms to be used. But the cost
|
|
of calculating the powers @m{b^{n2^i},b^(n*2^i)} must be overcome.
|
|
@code{SET_STR_THRESHOLD} usually ends up quite big, around 5000 digits, and on
|
|
some processors much bigger still.
|
|
|
|
@code{SET_STR_THRESHOLD} is based on the input digits (and tuned for decimal),
|
|
though it might be better based on a limb count, so as to be independent of
|
|
the base. But that sort of count isn't used by the base case and so would
|
|
need some sort of initial calculation or estimate.
|
|
|
|
The main reason @code{SET_STR_THRESHOLD} is so much bigger than the
|
|
corresponding @code{GET_STR_PRECOMPUTE_THRESHOLD} is that @code{mpn_mul_1} is
|
|
much faster than @code{mpn_divrem_1} (often by a factor of 10, or more).
|
|
|
|
|
|
@need 1000
|
|
@node Other Algorithms, Assembler Coding, Radix Conversion Algorithms, Algorithms
|
|
@section Other Algorithms
|
|
|
|
@menu
|
|
* Prime Testing Algorithm::
|
|
* Factorial Algorithm::
|
|
* Binomial Coefficients Algorithm::
|
|
* Fibonacci Numbers Algorithm::
|
|
* Lucas Numbers Algorithm::
|
|
* Random Number Algorithms::
|
|
@end menu
|
|
|
|
|
|
@node Prime Testing Algorithm, Factorial Algorithm, Other Algorithms, Other Algorithms
|
|
@subsection Prime Testing
|
|
@cindex Prime testing algorithms
|
|
|
|
The primality testing in @code{mpz_probab_prime_p} (@pxref{Number Theoretic
|
|
Functions}) first does some trial division by small factors and then uses the
|
|
Miller-Rabin probabilistic primality testing algorithm, as described in Knuth
|
|
section 4.5.4 algorithm P (@pxref{References}).
|
|
|
|
For an odd input @math{n}, and with @math{n = q@GMPmultiply{}2^k+1} where
|
|
@math{q} is odd, this algorithm selects a random base @math{x} and tests
|
|
whether @math{x^q @bmod{} n} is 1 or @math{-1}, or an @m{x^{q2^j} \bmod n,
x^(q*2^j) mod n} is @math{-1}, for @math{1@le{}j<k}.  If so then @math{n}
|
|
is probably prime, if not then @math{n} is definitely composite.
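
One round of the test can be sketched in Python as below.  The function name
@code{is_strong_probable_prime} is hypothetical, and the trial division and
random choice of base are left out.  After the initial check, each repeated
squaring is compared against @math{n-1}, that is @math{-1} mod @math{n}.

@example
```python
def is_strong_probable_prime(n, x):
    """One Miller-Rabin round for odd n > 2 and a base x with 1 < x < n-1."""
    # Write n - 1 = q * 2**k with q odd.
    q, k = n - 1, 0
    while q % 2 == 0:
        q //= 2
        k += 1
    y = pow(x, q, n)
    if y == 1 or y == n - 1:
        return True
    # Square up to k-1 times, looking for -1 mod n.
    for _ in range(1, k):
        y = y * y % n
        if y == n - 1:
            return True
    return False
```
@end example

For example 2047 = 23*89 passes with base 2 but fails with base 3, so it is a
strong pseudoprime to base 2 only.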
|
|
|
|
Any prime @math{n} will pass the test, but some composites do too. Such
|
|
composites are known as strong pseudoprimes to base @math{x}. No @math{n} is
|
|
a strong pseudoprime to more than @math{1/4} of all bases (see Knuth exercise
|
|
22), hence with @math{x} chosen at random there's no more than a @math{1/4}
|
|
chance a ``probable prime'' will in fact be composite.
|
|
|
|
In fact strong pseudoprimes are quite rare, making the test much more
|
|
powerful than this analysis would suggest, but @math{1/4} is all that's proven
|
|
for an arbitrary @math{n}.
|
|
|
|
|
|
@node Factorial Algorithm, Binomial Coefficients Algorithm, Prime Testing Algorithm, Other Algorithms
|
|
@subsection Factorial
|
|
@cindex Factorial algorithm
|
|
|
|
Factorials are calculated by a combination of removal of twos, powering, and
|
|
binary splitting. The procedure can be best illustrated with an example,
|
|
|
|
@quotation
|
|
@math{23! = 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23}
|
|
@end quotation
|
|
|
|
@noindent
|
|
has factors of two removed,
|
|
|
|
@quotation
|
|
@math{23! = 2^{19}.1.1.3.1.5.3.7.1.9.5.11.3.13.7.15.1.17.9.19.5.21.11.23}
|
|
@end quotation
|
|
|
|
@noindent
|
|
and the resulting terms collected up according to their multiplicity,
|
|
|
|
@quotation
|
|
@math{23! = 2^{19}.(3.5)^3.(7.9.11)^2.(13.15.17.19.21.23)}
|
|
@end quotation
|
|
|
|
Each sequence such as @math{13.15.17.19.21.23} is evaluated by splitting into
|
|
every second term, as for instance @math{(13.17.21).(15.19.23)}, and the same
|
|
recursively on each half. This is implemented iteratively using some bit
|
|
twiddling.
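
The three steps might be sketched as follows.  This is a recursive Python
illustration under assumed names (@code{product_alternating},
@code{factorial_odd_parts}), not the iterative bit-twiddling form used in the
library.

@example
```python
def product_alternating(terms):
    """Multiply a list by recursively splitting into every second term."""
    if not terms:
        return 1
    if len(terms) == 1:
        return terms[0]
    return product_alternating(terms[0::2]) * product_alternating(terms[1::2])

def factorial_odd_parts(n):
    """n! from removed twos, powers of odd runs, and binary splitting."""
    twos = n - bin(n).count("1")      # exponent of 2 in n!
    result = 1
    j = 0
    while (n >> j) >= 3:
        hi = n >> j
        lo = n >> (j + 1)
        # Odd m with lo < m <= hi occur as an odd part exactly j+1 times.
        odds = [m for m in range(lo + 1, hi + 1) if m % 2 == 1]
        result *= product_alternating(odds) ** (j + 1)
        j += 1
    return result << twos
```
@end example

For @math{n=23} the loop sees the runs 13 to 23 (multiplicity 1), 7 to 11
(multiplicity 2) and 3 to 5 (multiplicity 3), and the final shift is by 19
bits, matching the example above.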
|
|
|
|
Such splitting is more efficient than repeated N@cross{}1 multiplies since it
|
|
forms big multiplies, allowing Karatsuba and higher algorithms to be used.
|
|
And even below the Karatsuba threshold a big block of work can be more
|
|
efficient for the basecase algorithm.
|
|
|
|
Splitting into subsequences of every second term keeps the resulting products
|
|
more nearly equal in size than would the simpler approach of say taking the
|
|
first half and second half of the sequence. Nearly equal products are more
|
|
efficient for the current multiply implementation.
|
|
|
|
|
|
@node Binomial Coefficients Algorithm, Fibonacci Numbers Algorithm, Factorial Algorithm, Other Algorithms
|
|
@subsection Binomial Coefficients
|
|
@cindex Binomial coefficient algorithm
|
|
|
|
Binomial coefficients @m{\left({n}\atop{k}\right), C(n@C{}k)} are calculated
|
|
by first arranging @math{k @le{} n/2} using @m{\left({n}\atop{k}\right) =
|
|
\left({n}\atop{n-k}\right), C(n@C{}k) = C(n@C{}n-k)} if necessary, and then
|
|
evaluating the following product simply from @math{i=2} to @math{i=k}.
|
|
@tex
|
|
$$ \left({n}\atop{k}\right) = (n-k+1) \prod_{i=2}^{k} {{n-k+i} \over i} $$
|
|
@end tex
|
|
@ifnottex
|
|
|
|
@example
|
|
k (n-k+i)
|
|
C(n,k) = (n-k+1) * prod -------
|
|
i=2 i
|
|
@end example
|
|
|
|
@end ifnottex
|
|
It's easy to show that each denominator @math{i} will divide the product so
|
|
far, so the exact division algorithm is used (@pxref{Exact Division}).
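
A minimal Python sketch of this product follows.  The function name
@code{binomial} is hypothetical, and the accumulation of several factors into
a limb is omitted.  The division is exact because after step @math{i} the
running value is the integer @math{C(n-k+i,i)}.

@example
```python
def binomial(n, k):
    """C(n,k) by the running product with an exact division at each step."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)          # arrange k <= n/2
    if k == 0:
        return 1
    result = n - k + 1
    for i in range(2, k + 1):
        result *= n - k + i
        result //= i           # exact: i divides the product so far
    return result
```
@end example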
|
|
|
|
The numerators @math{n-k+i} and denominators @math{i} are first accumulated
|
|
into as many as fit a limb, to save multi-precision operations, though for
|
|
@code{mpz_bin_ui} this applies only to the divisors, since @math{n} is an
|
|
@code{mpz_t} and @math{n-k+i} in general won't fit in a limb at all.
|
|
|
|
|
|
@node Fibonacci Numbers Algorithm, Lucas Numbers Algorithm, Binomial Coefficients Algorithm, Other Algorithms
|
|
@subsection Fibonacci Numbers
|
|
@cindex Fibonacci number algorithm
|
|
|
|
The Fibonacci functions @code{mpz_fib_ui} and @code{mpz_fib2_ui} are designed
|
|
for calculating isolated @m{F_n,F[n]} or @m{F_n,F[n]},@m{F_{n-1},F[n-1]}
|
|
values efficiently.
|
|
|
|
For small @math{n}, a table of single limb values in @code{__gmp_fib_table} is
|
|
used. On a 32-bit limb this goes up to @m{F_{47},F[47]}, or on a 64-bit limb
|
|
up to @m{F_{93},F[93]}. For convenience the table starts at @m{F_{-1},F[-1]}.
|
|
|
|
Beyond the table, values are generated with a binary powering algorithm,
|
|
calculating a pair @m{F_n,F[n]} and @m{F_{n-1},F[n-1]} working from high to
|
|
low across the bits of @math{n}. The formulas used are
|
|
@tex
|
|
$$\eqalign{
|
|
F_{2k+1} &= 4F_k^2 - F_{k-1}^2 + 2(-1)^k \cr
|
|
F_{2k-1} &= F_k^2 + F_{k-1}^2 \cr
|
|
F_{2k} &= F_{2k+1} - F_{2k-1}
|
|
}$$
|
|
@end tex
|
|
@ifnottex
|
|
|
|
@example
|
|
F[2k+1] = 4*F[k]^2 - F[k-1]^2 + 2*(-1)^k
|
|
F[2k-1] = F[k]^2 + F[k-1]^2
|
|
|
|
F[2k] = F[2k+1] - F[2k-1]
|
|
@end example
|
|
|
|
@end ifnottex
|
|
At each step, @math{k} is the high @math{b} bits of @math{n}. If the next bit
|
|
of @math{n} is 0 then @m{F_{2k},F[2k]},@m{F_{2k-1},F[2k-1]} is used, or if
|
|
it's a 1 then @m{F_{2k+1},F[2k+1]},@m{F_{2k},F[2k]} is used, and the process
|
|
repeated until all bits of @math{n} are incorporated. Notice these formulas
|
|
require just two squares per bit of @math{n}.
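
The powering loop can be sketched in Python as below.  The name
@code{fib_pair} is hypothetical and the sketch starts from @math{k=1} rather
than from a table entry.

@example
```python
def fib_pair(n):
    """Return (F[n], F[n-1]) for n >= 1, working down the bits of n."""
    fk, fk1 = 1, 0            # k = 1: F[1] and F[0]
    k = 1
    for bit in bin(n)[3:]:    # the bits of n below its leading 1
        sq, sq1 = fk * fk, fk1 * fk1          # the two squares per bit
        sign = 2 if k % 2 == 0 else -2        # 2*(-1)^k
        f2k_plus = 4 * sq - sq1 + sign        # F[2k+1]
        f2k_minus = sq + sq1                  # F[2k-1]
        f2k = f2k_plus - f2k_minus            # F[2k]
        if bit == "1":
            fk, fk1 = f2k_plus, f2k
            k = 2 * k + 1
        else:
            fk, fk1 = f2k, f2k_minus
            k = 2 * k
    return fk, fk1
```
@end example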
|
|
|
|
It'd be possible to handle the first few @math{n} above the single limb table
|
|
with simple additions, using the defining Fibonacci recurrence @m{F_{k+1} =
|
|
F_k + F_{k-1}, F[k+1]=F[k]+F[k-1]}, but this is not done since it usually
|
|
turns out to be faster for only about 10 or 20 values of @math{n}, and
|
|
including a block of code for just those doesn't seem worthwhile. If they
|
|
really mattered it'd be better to extend the data table.
|
|
|
|
Using a table avoids lots of calculations on small numbers, and makes small
|
|
@math{n} go fast.  A bigger table would make more small @math{n} go fast; it's
|
|
just a question of balancing size against desired speed. For MPIR the code is
|
|
kept compact, with the emphasis primarily on a good powering algorithm.
|
|
|
|
@code{mpz_fib2_ui} returns both @m{F_n,F[n]} and @m{F_{n-1},F[n-1]}, but
|
|
@code{mpz_fib_ui} is only interested in @m{F_n,F[n]}. In this case the last
|
|
step of the algorithm can become one multiply instead of two squares. One of
|
|
the following two formulas is used, according as @math{n} is odd or even.
|
|
@tex
|
|
$$\eqalign{
|
|
F_{2k} &= F_k (F_k + 2F_{k-1}) \cr
|
|
F_{2k+1} &= (2F_k + F_{k-1}) (2F_k - F_{k-1}) + 2(-1)^k
|
|
}$$
|
|
@end tex
|
|
@ifnottex
|
|
|
|
@example
|
|
F[2k] = F[k]*(F[k]+2F[k-1])
|
|
|
|
F[2k+1] = (2F[k]+F[k-1])*(2F[k]-F[k-1]) + 2*(-1)^k
|
|
@end example
|
|
|
|
@end ifnottex
|
|
@m{F_{2k+1},F[2k+1]} here is the same as above, just rearranged to be a
|
|
multiply. For interest, the @m{2(-1)^k, 2*(-1)^k} term both here and above
|
|
can be applied just to the low limb of the calculation, without a carry or
|
|
borrow into further limbs, which saves some code size. See comments with
|
|
@code{mpz_fib_ui} and the internal @code{mpn_fib2_ui} for how this is done.
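
As an illustration of the single-multiply last step, a Python sketch follows.
The helper name @code{fib_last_step} is hypothetical, and it works on whole
integers rather than applying the correction term to the low limb only.

@example
```python
def fib_last_step(n, fk, fk1):
    """Given F[k] and F[k-1] for k = n // 2, return F[n] with one multiply."""
    k = n // 2
    if n % 2 == 0:
        return fk * (fk + 2 * fk1)                  # F[2k]
    sign = 2 if k % 2 == 0 else -2                  # 2*(-1)^k
    return (2 * fk + fk1) * (2 * fk - fk1) + sign   # F[2k+1]
```
@end example

For instance with @math{k=5}, @math{F[5]=5} and @math{F[4]=3}, the even case
gives @math{5*(5+6)=55} and the odd case gives @math{13*7-2=89}.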
|
|
|
|
|
|
@node Lucas Numbers Algorithm, Random Number Algorithms, Fibonacci Numbers Algorithm, Other Algorithms
|
|
@subsection Lucas Numbers
|
|
@cindex Lucas number algorithm
|
|
|
|
@code{mpz_lucnum2_ui} derives a pair of Lucas numbers from a pair of Fibonacci
|
|
numbers with the following simple formulas.
|
|
@tex
|
|
$$\eqalign{
|
|
L_k &= F_k + 2F_{k-1} \cr
|
|
L_{k-1} &= 2F_k - F_{k-1}
|
|
}$$
|
|
@end tex
|
|
@ifnottex
|
|
|
|
@example
|
|
L[k] = F[k] + 2*F[k-1]
|
|
L[k-1] = 2*F[k] - F[k-1]
|
|
@end example
|
|
|
|
@end ifnottex
|
|
@code{mpz_lucnum_ui} is only interested in @m{L_n,L[n]}, and some work can be
|
|
saved. Trailing zero bits on @math{n} can be handled with a single square
|
|
each.
|
|
@tex
|
|
$$ L_{2k} = L_k^2 - 2(-1)^k $$
|
|
@end tex
|
|
@ifnottex
|
|
|
|
@example
|
|
L[2k] = L[k]^2 - 2*(-1)^k
|
|
@end example
|
|
|
|
@end ifnottex
|
|
And the lowest 1 bit can be handled with one multiply of a pair of Fibonacci
|
|
numbers, similar to what @code{mpz_fib_ui} does.
|
|
@tex
|
|
$$ L_{2k+1} = 5F_{k-1} (2F_k + F_{k-1}) - 4(-1)^k $$
|
|
@end tex
|
|
@ifnottex
|
|
|
|
@example
|
|
L[2k+1] = 5*F[k-1]*(2*F[k]+F[k-1]) - 4*(-1)^k
|
|
@end example
|
|
|
|
@end ifnottex
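
The formulas above can be combined as in the following Python sketch.  All
names are hypothetical, and @code{fib_pair_simple} is only a stand-in that
uses plain additions where the library would use its Fibonacci powering code.

@example
```python
def lucas_pair_from_fib(fk, fk1):
    """Return (L[k], L[k-1]) from (F[k], F[k-1])."""
    return fk + 2 * fk1, 2 * fk - fk1

def fib_pair_simple(k):
    """Return (F[k], F[k-1]) for k >= 0 by additions (stand-in helper)."""
    fk, fk1 = 0, 1            # F[0] and F[-1]
    for _ in range(k):
        fk, fk1 = fk + fk1, fk
    return fk, fk1

def lucas(n):
    """Return L[n] for n >= 1."""
    zeros = 0
    while n % 2 == 0:         # strip trailing zero bits
        n //= 2
        zeros += 1
    k = n // 2                # now n = 2k+1, handled with one multiply
    fk, fk1 = fib_pair_simple(k)
    sign = 1 if k % 2 == 0 else -1
    value = 5 * fk1 * (2 * fk + fk1) - 4 * sign
    index = n
    for _ in range(zeros):    # one square per trailing zero bit
        s = 1 if index % 2 == 0 else -1
        value = value * value - 2 * s
        index *= 2
    return value
```
@end example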
|
|
|
|
|
|
@node Random Number Algorithms, , Lucas Numbers Algorithm, Other Algorithms
|
|
@subsection Random Numbers
|
|
@cindex Random number algorithms
|
|
|
|
For the @code{urandomb} functions, random numbers are generated simply by
|
|
concatenating bits produced by the generator. As long as the generator has
|
|
good randomness properties this will produce well-distributed @math{N} bit
|
|
numbers.
|
|
|
|
For the @code{urandomm} functions, random numbers in a range @math{0@le{}R<N}
|
|
are generated by taking values @math{R} of @m{\lceil \log_2 N \rceil,
|
|
ceil(log2(N))} bits each until one satisfies @math{R<N}. This will normally
|
|
require only one or two attempts, but the attempts are limited in case the
|
|
generator is somehow degenerate and produces only 1 bits or similar.
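
The retry loop might look like the Python sketch below.  The function name,
the @code{getbits} parameter and the limit of 80 attempts are assumptions for
illustration, as is the fallback of subtracting @math{N}, which stays in range
because a candidate is always less than @math{2N}.

@example
```python
import random

def urandomm(n, getbits=random.getrandbits, max_tries=80):
    """Return R with 0 <= R < n, for n >= 1, by rejection sampling."""
    if n == 1:
        return 0
    bits = (n - 1).bit_length()       # ceil(log2(n)) bits per candidate
    for _ in range(max_tries):
        r = getbits(bits)
        if r < n:
            return r
    # Degenerate generator: r is in [n, 2n), so r - n is still in range.
    return r - n
```
@end example

Since at least half of all candidates are below @math{N}, the expected number
of attempts is less than two.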
|
|
|
|
@cindex Mersenne twister algorithm
|
|
The Mersenne Twister generator is by Matsumoto and Nishimura
|
|
(@pxref{References}). It has a non-repeating period of @math{2^@W{19937}-1},
|
|
which is a Mersenne prime, hence the name of the generator. The state is 624
|
|
words of 32 bits each, which is iterated with one XOR and shift for each
|
|
32-bit word generated, making the algorithm very fast. Randomness properties
|
|
are also very good and this is the default algorithm used by MPIR.
|
|
|
|
@cindex Linear congruential algorithm
|
|
Linear congruential generators are described in many text books, for instance
|
|
Knuth volume 2 (@pxref{References}). With a modulus @math{M} and parameters
|
|
@math{A} and @math{C}, an integer state @math{S} is iterated by the formula
|
|
@math{S @leftarrow{} A@GMPmultiply{}S+C @bmod{} M}. At each step the new
|
|
state is a linear function of the previous, mod @math{M}, hence the name of
|
|
the generator.
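
For a modulus of the form @math{2^N} the iteration can be sketched in Python as
follows.  The generator name @code{lcg_2exp} is hypothetical, and which bits of
each state are returned to the caller is not modelled here.

@example
```python
def lcg_2exp(a, c, nbits, seed):
    """Yield successive states S = (A*S + C) mod 2**nbits."""
    mask = (1 << nbits) - 1
    state = seed & mask
    while True:
        state = (a * state + c) & mask   # reduction mod 2**nbits is a mask
        yield state
```
@end example

With @math{A=5}, @math{C=3}, @math{N=4} and a seed of 1, the first three states
are 8, 11 and 10.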
|
|
|
|
In MPIR only moduli of the form @math{2^N} are supported, and the current
|
|
implementation is not as well optimized as it could be. Overheads are
|
|
significant when @math{N} is small, and when @math{N} is large clearly the
|
|
multiply at each step will become slow. This is not a big concern, since the
|
|
Mersenne Twister generator is better in every respect and is therefore
|
|
recommended for all normal applications.
|
|
|
|
For both generators the current state can be deduced by observing enough
|
|
output and applying some linear algebra (over GF(2) in the case of the
|
|
Mersenne Twister). This generally means raw output is unsuitable for
|
|
cryptographic applications without further hashing or the like.
|
|
|
|
|
|
@node Assembler Coding, , Other Algorithms, Algorithms
|
|
@section Assembler Coding
|
|
@cindex Assembler coding
|
|
|
|
The assembler subroutines in MPIR are the most significant source of speed at
|
|
small to moderate sizes. At larger sizes algorithm selection becomes more
|
|
important, but of course speedups in low level routines will still speed up
|
|
everything proportionally.
|
|
|
|
Carry handling and widening multiplies that are important for MPIR can't be
|
|
easily expressed in C@. GCC @code{asm} blocks help a lot and are provided in
|
|
@file{longlong.h}, but hand coding low level routines invariably offers a
|
|
speedup over generic C by a factor of anything from 2 to 10.
|
|
|
|
@menu
|
|
* Assembler Code Organisation::
|
|
* Assembler Basics::
|
|
* Assembler Carry Propagation::
|
|
* Assembler Cache Handling::
|
|
* Assembler Functional Units::
|
|
* Assembler Floating Point::
|
|
* Assembler SIMD Instructions::
|
|
* Assembler Software Pipelining::
|
|
* Assembler Loop Unrolling::
|
|
* Assembler Writing Guide::
|
|
@end menu
|
|
|
|
|
|
@node Assembler Code Organisation, Assembler Basics, Assembler Coding, Assembler Coding
|
|
@subsection Code Organisation
|
|
@cindex Assembler code organisation
|
|
@cindex Code organisation
|
|
|
|
The various @file{mpn} subdirectories contain machine-dependent code, written
|
|
in C or assembler. The @file{mpn/generic} subdirectory contains default code,
|
|
used when there's no machine-specific version of a particular file.
|
|
|
|
Each @file{mpn} subdirectory is for an ISA family. Generally 32-bit and
|
|
64-bit variants in a family cannot share code and have separate directories.
|
|
Within a family further subdirectories may exist for CPU variants.
|
|
|
|
In each directory a @file{nails} subdirectory may exist, holding code with
|
|
nails support for that CPU variant. A @code{NAILS_SUPPORT} directive in each
|
|
file indicates the nails values the code handles. Nails code only exists
|
|
where it's faster, or promises to be faster, than plain code. There's no
|
|
effort put into nails if they're not going to enhance a given CPU.
|
|
|
|
|
|
@node Assembler Basics, Assembler Carry Propagation, Assembler Code Organisation, Assembler Coding
|
|
@subsection Assembler Basics
|
|
|
|
@code{mpn_addmul_1} and @code{mpn_submul_1} are the most important routines
|
|
for overall MPIR performance. All multiplications and divisions come down to
|
|
repeated calls to these. @code{mpn_add_n}, @code{mpn_sub_n},
|
|
@code{mpn_lshift} and @code{mpn_rshift} are next most important.
|
|
|
|
On some CPUs assembler versions of the internal functions
|
|
@code{mpn_mul_basecase} and @code{mpn_sqr_basecase} give significant speedups,
|
|
mainly through avoiding function call overheads. They can also potentially
|
|
make better use of a wide superscalar processor, as can bigger primitives like
|
|
@code{mpn_addmul_2} or @code{mpn_addmul_4}.
|
|
|
|
The restrictions on overlaps between sources and destinations
|
|
(@pxref{Low-level Functions}) are designed to facilitate a variety of
|
|
implementations. For example, knowing @code{mpn_add_n} won't have partly
|
|
overlapping sources and destination means reading can be done far ahead of
|
|
writing on superscalar processors, and loops can be vectorized on a vector
|
|
processor, depending on the carry handling.
|
|
|
|
|
|
@node Assembler Carry Propagation, Assembler Cache Handling, Assembler Basics, Assembler Coding
|
|
@subsection Carry Propagation
|
|
@cindex Assembler carry propagation
|
|
|
|
The problem that presents most challenges in MPIR is propagating carries from
|
|
one limb to the next. In functions like @code{mpn_addmul_1} and
|
|
@code{mpn_add_n}, carries are the only dependencies between limb operations.
|
|
|
|
On processors with carry flags, a straightforward CISC style @code{adc} is
|
|
generally best. AMD K6 @code{mpn_addmul_1} however is an example of an
|
|
unusual set of circumstances where a branch works out better.
|
|
|
|
On RISC processors generally an add and compare for overflow is used. This
|
|
sort of thing can be seen in @file{mpn/generic/aors_n.c}. Some carry
|
|
propagation schemes require 4 instructions, meaning at least 4 cycles per
|
|
limb, but other schemes may use just 1 or 2. On wide superscalar processors
|
|
performance may be completely determined by the number of dependent
|
|
instructions between carry-in and carry-out for each limb.
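
The add and compare idiom can be modelled in Python as below, with 64-bit
limbs simulated by masking.  The names @code{LIMB_MASK} and @code{add_n} are
illustrative only and this is not the code in @file{mpn/generic}.

@example
```python
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1

def add_n(xs, ys, carry=0):
    """Add equal-length limb lists, least significant limb first."""
    out = []
    for x, y in zip(xs, ys):
        s = (x + y) & LIMB_MASK
        c1 = int(s < x)            # wrapped around, so the add overflowed
        r = (s + carry) & LIMB_MASK
        c2 = int(r < s)            # adding the carry-in overflowed
        carry = c1 | c2            # at most one of the two can be set
        out.append(r)
    return out, carry
```
@end example

The chain from carry-in to carry-out here is an add, a compare and an OR,
which is the kind of dependent sequence that bounds the speed per limb.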
|
|
|
|
On vector processors good use can be made of the fact that a carry bit only
|
|
very rarely propagates more than one limb. When adding a single bit to a
|
|
limb, there's only a carry out if that limb was @code{0xFF@dots{}FF} which on
|
|
random data will be only 1 in @m{2\GMPraise{@code{mp\_bits\_per\_limb}},
|
|
2^mp_bits_per_limb}.  @file{mpn/cray/add_n.c} is an example of this: it adds
|
|
all limbs in parallel, adds one set of carry bits in parallel and then only
|
|
rarely needs to fall through to a loop propagating further carries.
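
The idea can be modelled in Python as follows, with list comprehensions
standing in for vector operations.  This is an assumed illustration under the
hypothetical name @code{add_n_parallel}, not a transcription of the Cray code.

@example
```python
LIMB_MASK = (1 << 64) - 1

def add_n_parallel(xs, ys):
    """Add limb lists with all limbs summed at once, carries applied after."""
    sums = [(x + y) & LIMB_MASK for x, y in zip(xs, ys)]
    carries = [int(s < x) for s, x in zip(sums, xs)]   # carry out of each limb
    carry_out = 0
    while any(carries):                  # rarely more than one pass
        carry_out |= carries[-1]
        incoming = [0] + carries[:-1]    # each carry moves up one limb
        sums = [(s + c) & LIMB_MASK for s, c in zip(sums, incoming)]
        # A further carry arises only where adding 1 wrapped a limb to 0.
        carries = [int(c and s == 0) for s, c in zip(sums, incoming)]
    return sums, carry_out
```
@end example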
|
|
|
|
On the x86s, GCC (as of version 2.95.2) doesn't generate particularly good code
|
|
for the RISC style idioms that are necessary to handle carry bits in
|
|
C@. Often conditional jumps are generated where @code{adc} or @code{sbb} forms
|
|
would be better. And so unfortunately almost any loop involving carry bits
|
|
needs to be coded in assembler for best results.
|
|
|
|
|
|
@node Assembler Cache Handling, Assembler Functional Units, Assembler Carry Propagation, Assembler Coding
|
|
@subsection Cache Handling
|
|
@cindex Assembler cache handling
|
|
|
|
MPIR aims to perform well both on operands that fit entirely in L1 cache and
|
|
those which don't.
|
|
|
|
Basic routines like @code{mpn_add_n} or @code{mpn_lshift} are often used on
|
|
large operands, so L2 and main memory performance is important for them.
|
|
@code{mpn_mul_1} and @code{mpn_addmul_1} are mostly used for multiply and
|
|
square basecases, so L1 performance matters most for them, unless assembler
|
|
versions of @code{mpn_mul_basecase} and @code{mpn_sqr_basecase} exist, in
|
|
which case the remaining uses are mostly for larger operands.
|
|
|
|
For L2 or main memory operands, memory access times will almost certainly be
|
|
more than the calculation time. The aim therefore is to maximize memory
|
|
throughput, by starting a load of the next cache line while processing the
|
|
contents of the previous one. Clearly this is only possible if the chip has a
|
|
lock-up free cache or some sort of prefetch instruction. Most current chips
|
|
have both these features.
|
|
|
|
Prefetching sources combines well with loop unrolling, since a prefetch can be
|
|
initiated once per unrolled loop (or more than once if the loop covers more
|
|
than one cache line).
|
|
|
|
On CPUs without write-allocate caches, prefetching destinations will ensure
|
|
individual stores don't go further down the cache hierarchy, which would limit
bandwidth.  Of course for calculations which are slow anyway, like
|
|
@code{mpn_divrem_1}, write-throughs might be fine.
|
|
|
|
The distance ahead to prefetch will be determined by memory latency versus
|
|
throughput. The aim of course is to have data arriving continuously, at peak
|
|
throughput. Some CPUs have limits on the number of fetches or prefetches in
|
|
progress.
|
|
|
|
If a special prefetch instruction doesn't exist then a plain load can be used,
|
|
but in that case care must be taken not to attempt to read past the end of an
|
|
operand, since that might produce a segmentation violation.
|
|
|
|
Some CPUs or systems have hardware that detects sequential memory accesses and
|
|
initiates suitable cache movements automatically, making life easy.
|
|
|
|
|
|
@node Assembler Functional Units, Assembler Floating Point, Assembler Cache Handling, Assembler Coding
|
|
@subsection Functional Units
|
|
|
|
When choosing an approach for an assembler loop, consideration is given to
|
|
what operations can execute simultaneously and what throughput can thereby be
|
|
achieved. In some cases an algorithm can be tweaked to accommodate available
|
|
resources.
|
|
|
|
Loop control will generally require a counter and pointer updates, costing as
|
|
much as 5 instructions, plus any delays a branch introduces. CPU addressing
|
|
modes might reduce pointer updates, perhaps by allowing just one updating
|
|
pointer and others expressed as offsets from it, or on CISC chips with all
|
|
addressing done with the loop counter as a scaled index.
|
|
|
|
The final loop control cost can be amortised by processing several limbs in
|
|
each iteration (@pxref{Assembler Loop Unrolling}). This at least ensures loop
|
|
control isn't a big fraction of the work done.
|
|
|
|
Memory throughput is always a limit. If perhaps only one load or one store
|
|
can be done per cycle then 3 cycles/limb will be the top speed for ``binary''
|
|
operations like @code{mpn_add_n}, and any code achieving that is optimal.
|
|
|
|
Integer resources can be freed up by having the loop counter in a float
|
|
register, or by pressing the float units into use for some multiplying,
|
|
perhaps doing every second limb on the float side (@pxref{Assembler Floating
|
|
Point}).
|
|
|
|
Float resources can be freed up by doing carry propagation on the integer
|
|
side, or even by doing integer to float conversions in integers using bit
|
|
twiddling.
|
|
|
|
|
|
@node Assembler Floating Point, Assembler SIMD Instructions, Assembler Functional Units, Assembler Coding
|
|
@subsection Floating Point
|
|
@cindex Assembler floating point
|
|
|
|
Floating point arithmetic is used in MPIR for multiplications on CPUs with poor
|
|
integer multipliers. It's mostly useful for @code{mpn_mul_1},
|
|
@code{mpn_addmul_1} and @code{mpn_submul_1} on 64-bit machines, and
|
|
@code{mpn_mul_basecase} on both 32-bit and 64-bit machines.
|
|
|
|
With IEEE 53-bit double precision floats, integer multiplications producing up
|
|
to 53 bits will give exact results. Breaking a 64@cross{}64 multiplication
|
|
into eight 16@cross{}@math{32@rightarrow{}48} bit pieces is convenient. With
|
|
some care though six 21@cross{}@math{32@rightarrow{}53} bit products can be
|
|
used, if one of the lower two 21-bit pieces also uses the sign bit.
|
|
|
|
For the @code{mpn_mul_1} family of functions on a 64-bit machine, the
|
|
invariant single limb is split at the start, into 3 or 4 pieces. Inside the
|
|
loop, the bignum operand is split into 32-bit pieces. Fast conversion of
|
|
these unsigned 32-bit pieces to floating point is highly machine-dependent.
|
|
In some cases, reading the data into the integer unit, zero-extending to
|
|
64 bits, then transferring to the floating point unit via memory is the
|
|
only option.
|
|
|
|
Converting partial products back to 64-bit limbs is usually best done as a
|
|
signed conversion. Since all values are smaller than @m{2^{53},2^53}, signed
|
|
and unsigned are the same, but most processors lack unsigned conversions.
|
|
|
|
@sp 2
|
|
|
|
Here is a diagram showing 16@cross{}32 bit products for an @code{mpn_mul_1} or
|
|
@code{mpn_addmul_1} with a 64-bit limb. The single limb operand V is split
|
|
into four 16-bit parts. The multi-limb operand U is split in the loop into
|
|
two 32-bit parts.
|
|
|
|
@tex
|
|
\global\newdimen\GMPbits \global\GMPbits=0.18em
|
|
\def\GMPbox#1#2#3{%
|
|
\hbox{%
|
|
\hbox to 128\GMPbits{\hfil
|
|
\vbox{%
|
|
\hrule
|
|
\hbox to 48\GMPbits {\GMPvrule \hfil$#2$\hfil \vrule}%
|
|
\hrule}%
|
|
\hskip #1\GMPbits}%
|
|
\raise \GMPboxdepth \hbox{\hskip 2em #3}}}
|
|
%
|
|
\GMPdisplay{%
|
|
\vbox{%
|
|
\hbox{%
|
|
\hbox to 128\GMPbits {\hfil
|
|
\vbox{%
|
|
\hrule
|
|
\hbox to 64\GMPbits{%
|
|
\GMPvrule \hfil$v48$\hfil
|
|
\vrule \hfil$v32$\hfil
|
|
\vrule \hfil$v16$\hfil
|
|
\vrule \hfil$v00$\hfil
|
|
\vrule}
|
|
\hrule}}%
|
|
\raise \GMPboxdepth \hbox{\hskip 2em V Operand}}
|
|
\vskip 0.5ex
|
|
\hbox{%
|
|
\hbox to 128\GMPbits {\hfil
|
|
\raise \GMPboxdepth \hbox{$\times$\hskip 1.5em}%
|
|
\vbox{%
|
|
\hrule
|
|
\hbox to 64\GMPbits {%
|
|
\GMPvrule \hfil$u32$\hfil
|
|
\vrule \hfil$u00$\hfil
|
|
\vrule}%
|
|
\hrule}}%
|
|
\raise \GMPboxdepth \hbox{\hskip 2em U Operand (one limb)}}%
|
|
\vskip 0.5ex
|
|
\hbox{\vbox to 2ex{\hrule width 128\GMPbits}}%
|
|
\GMPbox{0}{u00 \times v00}{$p00$\hskip 1.5em 48-bit products}%
|
|
\vskip 0.5ex
|
|
\GMPbox{16}{u00 \times v16}{$p16$}
|
|
\vskip 0.5ex
|
|
\GMPbox{32}{u00 \times v32}{$p32$}
|
|
\vskip 0.5ex
|
|
\GMPbox{48}{u00 \times v48}{$p48$}
|
|
\vskip 0.5ex
|
|
\GMPbox{32}{u32 \times v00}{$r32$}
|
|
\vskip 0.5ex
|
|
\GMPbox{48}{u32 \times v16}{$r48$}
|
|
\vskip 0.5ex
|
|
\GMPbox{64}{u32 \times v32}{$r64$}
|
|
\vskip 0.5ex
|
|
\GMPbox{80}{u32 \times v48}{$r80$}
|
|
}}
|
|
@end tex
|
|
@ifnottex
|
|
@example
|
|
@group
|
|
+---+---+---+---+
|
|
|v48|v32|v16|v00| V operand
|
|
+---+---+---+---+
|
|
|
|
+-------+---+---+
|
|
x | u32 | u00 | U operand (one limb)
|
|
+---------------+
|
|
|
|
---------------------------------
|
|
|
|
+-----------+
|
|
| u00 x v00 | p00 48-bit products
|
|
+-----------+
|
|
+-----------+
|
|
| u00 x v16 | p16
|
|
+-----------+
|
|
+-----------+
|
|
| u00 x v32 | p32
|
|
+-----------+
|
|
+-----------+
|
|
| u00 x v48 | p48
|
|
+-----------+
|
|
+-----------+
|
|
| u32 x v00 | r32
|
|
+-----------+
|
|
+-----------+
|
|
| u32 x v16 | r48
|
|
+-----------+
|
|
+-----------+
|
|
| u32 x v32 | r64
|
|
+-----------+
|
|
+-----------+
|
|
| u32 x v48 | r80
|
|
+-----------+
|
|
@end group
|
|
@end example
|
|
@end ifnottex
|
|
|
|
@math{p32} and @math{r32} can be summed using floating-point addition, and
|
|
likewise @math{p48} and @math{r48}. @math{p00} and @math{p16} can be summed
|
|
with @math{r64} and @math{r80} from the previous iteration.
|
|
|
|
For each loop then, four 49-bit quantities are transferred to the integer unit,
|
|
aligned as follows,
|
|
|
|
@tex
|
|
% GMPbox here should be 49 bits wide, but use 51 to better show p16+r80'
|
|
% crossing into the upper 64 bits.
|
|
\def\GMPbox#1#2#3{%
|
|
\hbox{%
|
|
\hbox to 128\GMPbits {%
|
|
\hfil
|
|
\vbox{%
|
|
\hrule
|
|
\hbox to 51\GMPbits {\GMPvrule \hfil$#2$\hfil \vrule}%
|
|
\hrule}%
|
|
\hskip #1\GMPbits}%
|
|
\raise \GMPboxdepth \hbox{\hskip 1.5em $#3$\hfil}%
|
|
}}
|
|
\newbox\b \setbox\b\hbox{64 bits}%
|
|
\newdimen\bw \bw=\wd\b \advance\bw by 2em
|
|
\newdimen\x \x=128\GMPbits
|
|
\advance\x by -2\bw
|
|
\divide\x by4
|
|
\GMPdisplay{%
|
|
\vbox{%
|
|
\hbox to 128\GMPbits {%
|
|
\GMPvrule
|
|
\raise 0.5ex \vbox{\hrule \hbox to \x {}}%
|
|
\hfil 64 bits\hfil
|
|
\raise 0.5ex \vbox{\hrule \hbox to \x {}}%
|
|
\vrule
|
|
\raise 0.5ex \vbox{\hrule \hbox to \x {}}%
|
|
\hfil 64 bits\hfil
|
|
\raise 0.5ex \vbox{\hrule \hbox to \x {}}%
|
|
\vrule}%
|
|
\vskip 0.7ex
|
|
\GMPbox{0}{p00+r64'}{i00}
|
|
\vskip 0.5ex
|
|
\GMPbox{16}{p16+r80'}{i16}
|
|
\vskip 0.5ex
|
|
\GMPbox{32}{p32+r32}{i32}
|
|
\vskip 0.5ex
|
|
\GMPbox{48}{p48+r48}{i48}
|
|
}}
|
|
@end tex
|
|
@ifnottex
|
|
@example
|
|
@group
|
|
|-----64bits----|-----64bits----|
|
|
+------------+
|
|
| p00 + r64' | i00
|
|
+------------+
|
|
+------------+
|
|
| p16 + r80' | i16
|
|
+------------+
|
|
+------------+
|
|
| p32 + r32 | i32
|
|
+------------+
|
|
+------------+
|
|
| p48 + r48 | i48
|
|
+------------+
|
|
@end group
|
|
@end example
|
|
@end ifnottex
|
|
|
|
The challenge then is to sum these efficiently and add in a carry limb,
|
|
generating a low 64-bit result limb and a high 33-bit carry limb (@math{i48}
|
|
extends 33 bits into the high half).
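
The exactness of the eight 16@cross{}32 bit products can be checked with a
short Python sketch, assuming Python floats are IEEE double precision.  The
function name @code{mul_64x64_via_doubles} is hypothetical, and the final
summation is done with plain integers rather than the aligned 49-bit sums
described above.

@example
```python
def mul_64x64_via_doubles(u, v):
    """Multiply two 64-bit values using only 16x32 bit float products."""
    v_shifts = (0, 16, 32, 48)
    u_shifts = (0, 32)
    v_parts = [(v >> s) & 0xFFFF for s in v_shifts]       # v00 v16 v32 v48
    u_parts = [(u >> s) & 0xFFFFFFFF for s in u_shifts]   # u00 u32
    total = 0
    for ui, us in zip(u_parts, u_shifts):
        for vi, vs in zip(v_parts, v_shifts):
            p = float(ui) * float(vi)     # below 2**48, so exact in a double
            total += int(p) << (us + vs)  # place the product at its offset
    return total
```
@end example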
|
|
|
|
|
|
@node Assembler SIMD Instructions, Assembler Software Pipelining, Assembler Floating Point, Assembler Coding
|
|
@subsection SIMD Instructions
|
|
@cindex Assembler SIMD
|
|
|
|
The single-instruction multiple-data support in current microprocessors is
|
|
aimed at signal processing algorithms where each data point can be treated
|
|
more or less independently. There's generally not much support for
|
|
propagating the sort of carries that arise in MPIR.
|
|
|
|
SIMD multiplications of say four 16@cross{}16 bit multiplies only do as much
|
|
work as one 32@cross{}32 from MPIR's point of view, and need some shifts and
|
|
adds besides. But of course if say the SIMD form is fully pipelined and uses
|
|
less instruction decoding then it may still be worthwhile.
|
|
|
|
On the x86 chips, MMX has so far found a use in @code{mpn_rshift} and
|
|
@code{mpn_lshift}, and is used in a special case for 16-bit multipliers in the
|
|
P55 @code{mpn_mul_1}. SSE2 is used for Pentium 4 @code{mpn_mul_1},
|
|
@code{mpn_addmul_1}, and @code{mpn_submul_1}.
|
|
|
|
|
|
@node Assembler Software Pipelining, Assembler Loop Unrolling, Assembler SIMD Instructions, Assembler Coding
|
|
@subsection Software Pipelining
|
|
@cindex Assembler software pipelining
|
|
|
|
Software pipelining consists of scheduling instructions around the branch
|
|
point in a loop. For example a loop might issue a load not for use in the
|
|
present iteration but the next, thereby allowing extra cycles for the data to
|
|
arrive from memory.
|
|
|
|
Naturally this is wanted only when doing things like loads or multiplies that
|
|
take several cycles to complete, and only where a CPU has multiple functional
|
|
units so that other work can be done in the meantime.
|
|
|
|
A pipeline with several stages will have a data value in progress at each
|
|
stage and each loop iteration moves them along one stage. This is like
|
|
juggling.
|
|
|
|
If the latency of some instruction is greater than the loop time then it will
|
|
be necessary to unroll, so one register has a result ready to use while
|
|
another (or multiple others) are still in progress (@pxref{Assembler Loop Unrolling}).
|
|
|
|
|
|
@node Assembler Loop Unrolling, Assembler Writing Guide, Assembler Software Pipelining, Assembler Coding
|
|
@subsection Loop Unrolling
|
|
@cindex Assembler loop unrolling
|
|
|
|
Loop unrolling consists of replicating code so that several limbs are
|
|
processed in each loop. At a minimum this reduces loop overheads by a
|
|
corresponding factor, but it can also allow better register usage, for example
|
|
alternately using one register combination and then another. Judicious use of
|
|
@command{m4} macros can help avoid lots of duplication in the source code.
|
|
|
|
Any amount of unrolling can be handled with a loop counter that's decremented
|
|
by @math{N} each time, stopping when the remaining count is less than the
|
|
further @math{N} the loop will process. Or by subtracting @math{N} at the
|
|
start, the termination condition becomes when the counter @math{C} is less
|
|
than 0 (and the count of remaining limbs is @math{C+N}).
|
|
|
|
Alternatively, for a power of 2 unroll, the loop count and remainder can be
|
|
established with a shift and mask. This is convenient if also making a
|
|
computed jump into the middle of a large loop.
|
|
|
|
The limbs not a multiple of the unrolling can be handled in various ways, for
|
|
example
|
|
|
|
@itemize @bullet
|
|
@item
|
|
A simple loop at the end (or the start) to process the excess. Care will be
|
|
wanted that it isn't too much slower than the unrolled part.
|
|
|
|
@item
|
|
A set of binary tests, for example after an 8-limb unrolling, test for 4 more
|
|
limbs to process, then a further 2 more or not, and finally 1 more or not.
|
|
This will probably take more code space than a simple loop.
|
|
|
|
@item
|
|
A @code{switch} statement, providing separate code for each possible excess,
|
|
for example an 8-limb unrolling would have separate code for 0 remaining, 1
|
|
remaining, etc, up to 7 remaining. This might take a lot of code, but may be
|
|
the best way to optimize all cases in combination with a deep pipelined loop.
|
|
|
|
@item
|
|
A computed jump into the middle of the loop, thus making the first iteration
|
|
handle the excess. This should make times smoothly increase with size, which
|
|
is attractive, but setups for the jump and adjustments for pointers can be
|
|
tricky and could become quite difficult in combination with deep pipelining.
|
|
@end itemize
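
As a sketch of the shift and mask approach combined with the binary tests
described above, the following C function copies limbs with an 8-limb
unrolling.  The name @code{copy_limbs_unroll8} and the @code{unsigned long}
limb type are assumptions for the example, and @code{n} is assumed to be
non-negative.

@example
/* Hypothetical example: copy n limbs, 8 per iteration, then
   binary tests for the 0 to 7 limbs of excess.  */
void
copy_limbs_unroll8 (unsigned long *dst, const unsigned long *src, long n)
@{
  long blocks = n >> 3;       /* number of full 8-limb blocks */
  long excess = n & 7;        /* limbs left over, 0 to 7 */

  for ( ; blocks > 0; blocks--)
    @{
      dst[0] = src[0];  dst[1] = src[1];
      dst[2] = src[2];  dst[3] = src[3];
      dst[4] = src[4];  dst[5] = src[5];
      dst[6] = src[6];  dst[7] = src[7];
      dst += 8;  src += 8;
    @}

  if (excess & 4)             /* 4 more limbs or not */
    @{
      dst[0] = src[0];  dst[1] = src[1];
      dst[2] = src[2];  dst[3] = src[3];
      dst += 4;  src += 4;
    @}
  if (excess & 2)             /* a further 2 or not */
    @{
      dst[0] = src[0];  dst[1] = src[1];
      dst += 2;  src += 2;
    @}
  if (excess & 1)             /* and finally 1 or not */
    dst[0] = src[0];
@}
@end example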
|
|
|
|
|
|
@node Assembler Writing Guide, , Assembler Loop Unrolling, Assembler Coding
|
|
@subsection Writing Guide
|
|
@cindex Assembler writing guide
|
|
|
|
This is a guide to writing software pipelined loops for processing limb
|
|
vectors in assembler.
|
|
|
|
First determine the algorithm and which instructions are needed. Code it
|
|
without unrolling or scheduling, to make sure it works. On a 3-operand CPU
|
|
try to write each new value to a new register; this will greatly simplify later
|
|
steps.
|
|
|
|
Then note for each instruction the functional unit and/or issue port
|
|
requirements. If an instruction can use either of two units, like U0 or U1,
|
|
then make a category ``U0/U1''. Count the total using each unit (or combined
|
|
unit), and count all instructions.
|
|
|
|
Figure out from those counts the best possible loop time. The goal will be to
|
|
find a perfect schedule where instruction latencies are completely hidden.
|
|
The total instruction count might be the limiting factor, or perhaps a
|
|
particular functional unit. It might be possible to tweak the instructions to
|
|
help the limiting factor.
|
|
|
|
Suppose the loop time is @math{N}; then make @math{N} issue buckets, with the
|
|
final loop branch at the end of the last. Now fill the buckets with dummy
|
|
instructions using the functional units desired. Run this to make sure the
|
|
intended speed is reached.
|
|
|
|
Now replace the dummy instructions with the real instructions from the slow
|
|
but correct loop you started with. The first will typically be a load
|
|
instruction. Then the instruction using that value is placed in a bucket an
|
|
appropriate distance down. Run the loop again, to check it still runs at
|
|
target speed.
|
|
|
|
Keep placing instructions, frequently measuring the loop. After a few you
|
|
will need to wrap around from the last bucket back to the top of the loop. If
|
|
you used the new-register for new-value strategy above then there will be no
|
|
register conflicts. If not then take care not to clobber something already in
|
|
use. Changing registers at this time is very error prone.
|
|
|
|
The loop will overlap two or more of the original loop iterations, and the
|
|
computation of one vector element result will be started in one iteration of
|
|
the new loop, and completed one or several iterations later.
|
|
|
|
The final step is to create feed-in and wind-down code for the loop. A good
|
|
way to do this is to make a copy (or copies) of the loop at the start and
|
|
delete those instructions which don't have valid antecedents, and at the end
|
|
replicate and delete those whose results are unwanted (including any further
|
|
loads).
|
|
|
|
The loop will have a minimum number of limbs loaded and processed, so the
|
|
feed-in code must test if the request size is smaller and skip either to a
|
|
suitable part of the wind-down or to special code for small sizes.
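
The overall shape of such a loop, with its feed-in and wind-down, can be
illustrated in C.  This is not a substitute for scheduled assembler.  It only
shows one load overlapped with the multiply and store of the previous
element.  The function name @code{scale_limbs_pipelined} and the
@code{unsigned long} limb type are assumptions for the example.

@example
/* Hypothetical example: dst[i] = low limb of src[i] * m, with the
   load for one element issued an iteration ahead of its use.  */
void
scale_limbs_pipelined (unsigned long *dst, const unsigned long *src,
                       long n, unsigned long m)
@{
  unsigned long cur, next;
  long i;

  if (n < 1)                  /* smaller than the pipeline minimum */
    return;

  cur = src[0];               /* feed-in: the first load only */
  for (i = 1; i < n; i++)
    @{
      next = src[i];          /* load for the following element */
      dst[i-1] = cur * m;     /* finish the element loaded earlier */
      cur = next;
    @}
  dst[n-1] = cur * m;         /* wind-down: no further load */
@}
@end example

@noindent
Here the feed-in is the loop body with the multiply and store deleted, and the
wind-down is the loop body with the load deleted.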
|
|
|
|
|
|
@node Internals, Contributors, Algorithms, Top
|
|
@chapter Internals
|
|
@cindex Internals
|
|
|
|
@strong{This chapter is provided only for informational purposes and the
|
|
various internals described here may change in future MPIR releases.
|
|
Applications expecting to be compatible with future releases should use only
|
|
the documented interfaces described in previous chapters.}
|
|
|
|
@menu
|
|
* Integer Internals::
|
|
* Rational Internals::
|
|
* Float Internals::
|
|
* Raw Output Internals::
|
|
* C++ Interface Internals::
|
|
@end menu
|
|
|
|
@node Integer Internals, Rational Internals, Internals, Internals
|
|
@section Integer Internals
|
|
@cindex Integer internals
|
|
|
|
@code{mpz_t} variables represent integers using sign and magnitude, in space
|
|
dynamically allocated and reallocated. The fields are as follows.
|
|
|
|
@table @asis
|
|
@item @code{_mp_size}
|
|
The number of limbs, or the negative of that when representing a negative
|
|
integer. Zero is represented by @code{_mp_size} set to zero, in which case
|
|
the @code{_mp_d} data is unused.
|
|
|
|
@item @code{_mp_d}
|
|
A pointer to an array of limbs which is the magnitude. These are stored
|
|
``little endian'' as per the @code{mpn} functions, so @code{_mp_d[0]} is the
|
|
least significant limb and @code{_mp_d[ABS(_mp_size)-1]} is the most
|
|
significant. Whenever @code{_mp_size} is non-zero, the most significant limb
|
|
is non-zero.
|
|
|
|
Currently there's always at least one limb allocated, so for instance
|
|
@code{mpz_set_ui} never needs to reallocate, and @code{mpz_get_ui} can fetch
|
|
@code{_mp_d[0]} unconditionally (though its value is then only wanted if
|
|
@code{_mp_size} is non-zero).
|
|
|
|
@item @code{_mp_alloc}
|
|
@code{_mp_alloc} is the number of limbs currently allocated at @code{_mp_d},
|
|
and naturally @code{_mp_alloc >= ABS(_mp_size)}. When an @code{mpz} routine
|
|
is about to (or might be about to) increase @code{_mp_size}, it checks
|
|
@code{_mp_alloc} to see whether there's enough space, and reallocates if not.
|
|
@code{MPZ_REALLOC} is generally used for this.
|
|
@end table
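
To make the field descriptions concrete, here is a small C model.  The type
@code{model_mpz} and the two functions are hypothetical stand-ins written for
this example, with @code{unsigned long} in place of the real limb type.  They
are not the definitions in the MPIR headers.

@example
/* Hypothetical model of the three fields described above.  */
typedef struct
@{
  int _mp_alloc;              /* limbs allocated at _mp_d */
  int _mp_size;               /* limbs in use, negated for a negative value */
  unsigned long *_mp_d;       /* magnitude, least significant limb first */
@} model_mpz;

/* The sign comes from _mp_size alone.  */
int
model_sgn (const model_mpz *z)
@{
  return z->_mp_size < 0 ? -1 : z->_mp_size > 0;
@}

/* Low limb of the magnitude.  The fetch is unconditional since at
   least one limb is always allocated, but it is only used when
   _mp_size is non-zero.  */
unsigned long
model_get_ui (const model_mpz *z)
@{
  unsigned long low = z->_mp_d[0];
  return z->_mp_size != 0 ? low : 0;
@}
@end example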
|
|
|
|
The various bitwise logical functions like @code{mpz_and} behave as if
|
|
negative values were twos complement. But sign and magnitude is always used
|
|
internally, and necessary adjustments are made during the calculations.
|
|
Sometimes this isn't pretty, but sign and magnitude are best for other
|
|
routines.
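
One adjustment of this kind can be shown for a single limb.  The sketch below
is an illustration of the arithmetic only, under the assumption that both
operands are negative and fit in one @code{unsigned long}.  The function name
is hypothetical and this is not the MPIR implementation.

@example
/* Hypothetical one-limb model: a and b are the magnitudes of two
   negative values (so both are at least 1), and the return value is
   the magnitude of their (negative) bitwise AND.  */
unsigned long
and_neg_neg_magnitude (unsigned long a, unsigned long b)
@{
  /* In twos complement -x is ~(x-1), so
       (-a) & (-b) = ~(a-1) & ~(b-1) = ~((a-1) | (b-1)),
     which is the negative of ((a-1) | (b-1)) + 1.  A real routine
     must also allow for the final +1 carrying into a new limb.  */
  return ((a - 1) | (b - 1)) + 1;
@}
@end example

@noindent
For instance magnitudes 12 and 10 give 12, in agreement with the twos
complement result that @math{-12} AND @math{-10} is @math{-12}.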
|
|
|
|
Some internal temporary variables are set up with @code{MPZ_TMP_INIT} and these
|
|
have @code{_mp_d} space obtained from @code{TMP_ALLOC} rather than the memory
|
|
allocation functions. Care is taken to ensure that these are big enough that
|
|
no reallocation is necessary (since it would have unpredictable consequences).
|
|
|
|
@code{_mp_size} and @code{_mp_alloc} are @code{int}, although @code{mp_size_t}
|
|
is usually a @code{long}. This is done to make the fields just 32 bits on
|
|
some 64-bit systems, thereby saving a few bytes of data space but still
|
|
providing plenty of range.
|
|
|
|
|
|
@node Rational Internals, Float Internals, Integer Internals, Internals
|
|
@section Rational Internals
|
|
@cindex Rational internals
|
|
|
|
@code{mpq_t} variables represent rationals using an @code{mpz_t} numerator and
|
|
denominator (@pxref{Integer Internals}).
|
|
|
|
The canonical form adopted is denominator positive (and non-zero), no common
|
|
factors between numerator and denominator, and zero uniquely represented as
|
|
0/1.
|
|
|
|
It's believed that casting out common factors at each stage of a calculation
|
|
is best in general. A GCD is an @math{O(N^2)} operation so it's better to do
|
|
a few small ones immediately than to delay and have to do a big one later.
|
|
Knowing the numerator and denominator have no common factors can be used for
|
|
example in @code{mpq_mul} to make only two cross GCDs necessary, not four.
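
The two cross GCDs can be illustrated with machine integers.  The following
is a sketch under the assumption that both inputs are already in canonical
form.  The @code{model_q} type and function names are hypothetical, and
@code{long} arithmetic stands in for @code{mpz_t}.

@example
/* Hypothetical model of a canonical rational: den > 0 and
   no common factor between num and den.  */
typedef struct @{ long num, den; @} model_q;

static long
model_gcd (long a, long b)
@{
  long t;
  if (a < 0)
    a = -a;
  while (b != 0)
    @{
      t = a % b;
      a = b;
      b = t;
    @}
  return a;
@}

/* x*y needs only gcd(x.num,y.den) and gcd(y.num,x.den), since
   x.num,x.den and y.num,y.den are already known to be coprime.  */
model_q
model_q_mul (model_q x, model_q y)
@{
  long g1 = model_gcd (x.num, y.den);
  long g2 = model_gcd (y.num, x.den);
  model_q r;
  r.num = (x.num / g1) * (y.num / g2);
  r.den = (x.den / g2) * (y.den / g1);
  return r;
@}
@end example

@noindent
For 2/3 times 9/4 the cross GCDs are 2 and 3, giving 3/2 directly with no
further reduction needed.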
|
|
|
|
This general approach to common factors is badly sub-optimal in the presence
|
|
of simple factorizations or little prospect for cancellation, but MPIR has no
|
|
way to know when this will occur. As per @ref{Efficiency}, that's left to
|
|
applications. The @code{mpq_t} framework might still suit, with
|
|
@code{mpq_numref} and @code{mpq_denref} for direct access to the numerator and
|
|
denominator, or of course @code{mpz_t} variables can be used directly.
|
|
|
|
|
|
@node Float Internals, Raw Output Internals, Rational Internals, Internals
|
|
@section Float Internals
|
|
@cindex Float internals
|
|
|
|
Efficient calculation is the primary aim of MPIR floats and the use of whole
|
|
limbs and simple rounding facilitates this.
|
|
|
|
@code{mpf_t} floats have a variable precision mantissa and a single machine
|
|
word signed exponent. The mantissa is represented using sign and magnitude.
|
|
|
|
@c FIXME: The arrow heads don't join to the lines exactly.
|
|
@tex
|
|
\global\newdimen\GMPboxwidth \GMPboxwidth=5em
|
|
\global\newdimen\GMPboxheight \GMPboxheight=3ex
|
|
\def\centreline{\hbox{\raise 0.8ex \vbox{\hrule \hbox{\hfil}}}}
|
|
\GMPdisplay{%
|
|
\vbox{%
|
|
\hbox to 5\GMPboxwidth {most significant limb \hfil least significant limb}
|
|
\vskip 0.7ex
|
|
\def\GMPcentreline#1{\hbox{\raise 0.5 ex \vbox{\hrule \hbox to #1 {}}}}
|
|
\hbox {
|
|
\hbox to 3\GMPboxwidth {%
|
|
\setbox 0 = \hbox{@code{\_mp\_exp}}%
|
|
\dimen0=3\GMPboxwidth
|
|
\advance\dimen0 by -\wd0
|
|
\divide\dimen0 by 2
|
|
\advance\dimen0 by -1em
|
|
\setbox1 = \hbox{$\rightarrow$}%
|
|
\dimen1=\dimen0
|
|
\advance\dimen1 by -\wd1
|
|
\GMPcentreline{\dimen0}%
|
|
\hfil
|
|
\box0%
|
|
\hfil
|
|
\GMPcentreline{\dimen1{}}%
|
|
\box1}
|
|
\hbox to 2\GMPboxwidth {\hfil @code{\_mp\_d}}}
|
|
\vskip 0.5ex
|
|
\vbox {%
|
|
\hrule
|
|
\hbox{%
|
|
\vrule height 2ex depth 1ex
|
|
\hbox to \GMPboxwidth {}%
|
|
\vrule
|
|
\hbox to \GMPboxwidth {}%
|
|
\vrule
|
|
\hbox to \GMPboxwidth {}%
|
|
\vrule
|
|
\hbox to \GMPboxwidth {}%
|
|
\vrule
|
|
\hbox to \GMPboxwidth {}%
|
|
\vrule}
|
|
\hrule
|
|
}
|
|
\hbox {%
|
|
\hbox to 0.8 pt {}
|
|
\hbox to 3\GMPboxwidth {%
|
|
\hfil $\cdot$} \hbox {$\leftarrow$ radix point\hfil}}
|
|
\hbox to 5\GMPboxwidth{%
|
|
\setbox 0 = \hbox{@code{\_mp\_size}}%
|
|
\dimen0 = 5\GMPboxwidth
|
|
\advance\dimen0 by -\wd0
|
|
\divide\dimen0 by 2
|
|
\advance\dimen0 by -1em
|
|
\dimen1 = \dimen0
|
|
\setbox1 = \hbox{$\leftarrow$}%
|
|
\setbox2 = \hbox{$\rightarrow$}%
|
|
\advance\dimen0 by -\wd1
|
|
\advance\dimen1 by -\wd2
|
|
\hbox to 0.3 em {}%
|
|
\box1
|
|
\GMPcentreline{\dimen0}%
|
|
\hfil
|
|
\box0
|
|
\hfil
|
|
\GMPcentreline{\dimen1}%
|
|
\box2}
|
|
}}
|
|
@end tex
|
|
@ifnottex
|
|
@example
|
|
   most                   least
significant            significant
   limb                   limb

                            _mp_d
 |---- _mp_exp --->           |
  _____ _____ _____ _____ _____
 |_____|_____|_____|_____|_____|
                   . <------------ radix point

  <-------- _mp_size --------->
|
|
@sp 1
|
|
@end example
|
|
@end ifnottex
|
|
|
|
@noindent
|
|
The fields are as follows.
|
|
|
|
@table @asis
|
|
@item @code{_mp_size}
|
|
The number of limbs currently in use, or the negative of that when
|
|
representing a negative value. Zero is represented by @code{_mp_size} and
|
|
@code{_mp_exp} both set to zero, and in that case the @code{_mp_d} data is
|
|
unused. (In the future @code{_mp_exp} might be undefined when representing
|
|
zero.)
|
|
|
|
@item @code{_mp_prec}
|
|
The precision of the mantissa, in limbs. In any calculation the aim is to
|
|
produce @code{_mp_prec} limbs of result (the most significant being non-zero).
|
|
|
|
@item @code{_mp_d}
|
|
A pointer to the array of limbs which is the absolute value of the mantissa.
|
|
These are stored ``little endian'' as per the @code{mpn} functions, so
|
|
@code{_mp_d[0]} is the least significant limb and
|
|
@code{_mp_d[ABS(_mp_size)-1]} the most significant.
|
|
|
|
The most significant limb is always non-zero, but there are no other
|
|
restrictions on its value, in particular the highest 1 bit can be anywhere
|
|
within the limb.
|
|
|
|
@code{_mp_prec+1} limbs are allocated to @code{_mp_d}, the extra limb being
|
|
for convenience (see below). There are no reallocations during a calculation,
|
|
only in a change of precision with @code{mpf_set_prec}.
|
|
|
|
@item @code{_mp_exp}
|
|
The exponent, in limbs, determining the location of the implied radix point.
|
|
Zero means the radix point is just above the most significant limb. Positive
|
|
values mean a radix point offset towards the lower limbs and hence a value
|
|
@math{@ge{} 1}, as for example in the diagram above. Negative exponents mean
|
|
a radix point further above the highest limb.
|
|
|
|
Naturally the exponent can be any value; it doesn't have to fall within the
|
|
limbs as the diagram shows, it can be a long way above or a long way below.
|
|
Limbs other than those included in the @code{@{_mp_d,_mp_size@}} data
|
|
are treated as zero.
|
|
@end table
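
The way these fields combine into a value can be sketched in C.  The value is
the mantissa taken as an integer, scaled by the limb base raised to the power
@code{_mp_exp - ABS(_mp_size)}.  The @code{model_mpf} type below is a
hypothetical stand-in with 32-bit limbs, chosen for the example.  It is not
the MPIR definition, and the conversion to @code{double} is only for
illustration.

@example
#include <stdint.h>

/* Hypothetical model of the fields described above, 32-bit limbs.  */
typedef struct
@{
  int _mp_prec;
  int _mp_size;
  long _mp_exp;
  uint32_t *_mp_d;
@} model_mpf;

double
model_mpf_get_d (const model_mpf *f)
@{
  int n = f->_mp_size < 0 ? -f->_mp_size : f->_mp_size;
  double v = 0.0;
  long e;
  int i;

  /* mantissa as an integer, most significant limb first */
  for (i = n - 1; i >= 0; i--)
    v = v * 4294967296.0 + f->_mp_d[i];

  /* _mp_exp limbs lie above the radix point, so scale by the
     limb base to the power _mp_exp - n */
  for (e = f->_mp_exp; e < n; e++)
    v /= 4294967296.0;
  for (e = f->_mp_exp; e > n; e--)
    v *= 4294967296.0;

  return f->_mp_size < 0 ? -v : v;
@}
@end example

@noindent
With two limbs 1 and 0x80000000 (most significant first) and an exponent of 1,
the radix point falls between the limbs and the value is 1.5.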
|
|
|
|
@code{_mp_size} and @code{_mp_prec} are @code{int}, although @code{mp_size_t}
|
|
is usually a @code{long}. This is done to make the fields just 32 bits on
|
|
some 64-bit systems, thereby saving a few bytes of data space but still
|
|
providing plenty of range.
|
|
|
|
|
|
@sp 1
|
|
@noindent
|
|
The following various points should be noted.
|
|
|
|
@table @asis
|
|
@item Low Zeros
|
|
The least significant limbs @code{_mp_d[0]} etc can be zero, though such low
|
|
zeros can always be ignored. Routines likely to produce low zeros check and
|
|
avoid them to save time in subsequent calculations, but for most routines
|
|
they're quite unlikely and aren't checked.
|
|
|
|
@item Mantissa Size Range
|
|
The @code{_mp_size} count of limbs in use can be less than @code{_mp_prec} if
|
|
the value can be represented in less. This means low precision values or
|
|
small integers stored in a high precision @code{mpf_t} can still be operated
|
|
on efficiently.
|
|
|
|
@code{_mp_size} can also be greater than @code{_mp_prec}. Firstly a value is
|
|
allowed to use all of the @code{_mp_prec+1} limbs available at @code{_mp_d},
|
|
and secondly when @code{mpf_set_prec_raw} lowers @code{_mp_prec} it leaves
|
|
@code{_mp_size} unchanged and so the size can be arbitrarily bigger than
|
|
@code{_mp_prec}.
|
|
|
|
@item Rounding
|
|
All rounding is done on limb boundaries. Calculating @code{_mp_prec} limbs
|
|
with the high limb non-zero will ensure the application-requested minimum precision
|
|
is obtained.
|
|
|
|
The use of simple ``trunc'' rounding towards zero is efficient, since there's
|
|
no need to examine extra limbs and increment or decrement.
|
|
|
|
@item Bit Shifts
|
|
Since the exponent is in limbs, there are no bit shifts in basic operations
|
|
like @code{mpf_add} and @code{mpf_mul}. When differing exponents are
|
|
encountered all that's needed is to adjust pointers to line up the relevant
|
|
limbs.
|
|
|
|
Of course @code{mpf_mul_2exp} and @code{mpf_div_2exp} will require bit shifts,
|
|
but the choice is between an exponent in limbs which requires shifts there, or
|
|
one in bits which requires them almost everywhere else.
|
|
|
|
@item Use of @code{_mp_prec+1} Limbs
|
|
The extra limb on @code{_mp_d} (@code{_mp_prec+1} rather than just
|
|
@code{_mp_prec}) helps when an @code{mpf} routine might get a carry from its
|
|
operation. @code{mpf_add} for instance will do an @code{mpn_add} of
|
|
@code{_mp_prec} limbs. If there's no carry then that's the result, but if
|
|
there is a carry then it's stored in the extra limb of space and
|
|
@code{_mp_size} becomes @code{_mp_prec+1}.
|
|
|
|
Whenever @code{_mp_prec+1} limbs are held in a variable, the low limb is not
|
|
needed for the intended precision, only the @code{_mp_prec} high limbs. But
|
|
zeroing it out or moving the rest down is unnecessary. Subsequent routines
|
|
reading the value will simply take the high limbs they need, and this will be
|
|
@code{_mp_prec} if their target has that same precision. This is no more than
|
|
a pointer adjustment, and must be checked anyway since the destination
|
|
precision can be different from the sources.
|
|
|
|
Copy functions like @code{mpf_set} will retain a full @code{_mp_prec+1} limbs
|
|
if available. This ensures that a variable which has @code{_mp_size} equal to
|
|
@code{_mp_prec+1} will get its full exact value copied. Strictly speaking
|
|
this is unnecessary since only @code{_mp_prec} limbs are needed for the
|
|
application's requested precision, but it's considered that an @code{mpf_set}
|
|
from one variable into another of the same precision ought to produce an exact
|
|
copy.
|
|
|
|
@item Application Precisions
|
|
@code{__GMPF_BITS_TO_PREC} converts an application requested precision to an
|
|
@code{_mp_prec}. The value in bits is rounded up to a whole limb then an
|
|
extra limb is added since the most significant limb of @code{_mp_d} is only guaranteed to be
|
|
non-zero and therefore might contain only one bit.
|
|
|
|
@code{__GMPF_PREC_TO_BITS} does the reverse conversion, and removes the extra
|
|
limb from @code{_mp_prec} before converting to bits. The net effect of
|
|
reading back with @code{mpf_get_prec} is simply the precision rounded up to a
|
|
multiple of @code{mp_bits_per_limb}.
|
|
|
|
Note that the extra limb added here, for the high limb only being non-zero, is in
|
|
addition to the extra limb allocated to @code{_mp_d}. For example with a
|
|
32-bit limb, an application request for 250 bits will be rounded up to 8
|
|
limbs, then an extra added for the high being only non-zero, giving an
|
|
@code{_mp_prec} of 9. @code{_mp_d} then gets 10 limbs allocated. Reading
|
|
back with @code{mpf_get_prec} will take @code{_mp_prec}, subtract 1 limb and
|
|
multiply by 32, giving 256 bits.
|
|
|
|
Strictly speaking, the fact the high limb has at least one bit means that a
|
|
float with, say, 3 limbs of 32-bits each will be holding at least 65 bits, but
|
|
for the purposes of @code{mpf_t} it's considered simply to be 64 bits, a nice
|
|
multiple of the limb size.
|
|
@end table
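
The precision arithmetic under ``Application Precisions'' can be written out
as a short C sketch.  The two functions are hypothetical models of the
conversions as described in the text, not the actual macros, and they ignore
any minimum precision the real macros may apply.

@example
/* Hypothetical model: round bits up to whole limbs, then add one
   limb since the high limb is only guaranteed to be non-zero.  */
long
model_bits_to_prec (long bits, long limb_bits)
@{
  return (bits + limb_bits - 1) / limb_bits + 1;
@}

/* Hypothetical model of the reverse: drop the extra limb and
   convert back to bits.  */
long
model_prec_to_bits (long prec, long limb_bits)
@{
  return (prec - 1) * limb_bits;
@}
@end example

@noindent
With 32-bit limbs a request for 250 bits gives a precision of 9 limbs, and
reading that back gives 256 bits, as in the example above.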
|
|
|
|
|
|
@node Raw Output Internals, C++ Interface Internals, Float Internals, Internals
|
|
@section Raw Output Internals
|
|
@cindex Raw output internals
|
|
|
|
@noindent
|
|
@code{mpz_out_raw} uses the following format.
|
|
|
|
@tex
|
|
\global\newdimen\GMPboxwidth \GMPboxwidth=5em
|
|
\global\newdimen\GMPboxheight \GMPboxheight=3ex
|
|
\def\centreline{\hbox{\raise 0.8ex \vbox{\hrule \hbox{\hfil}}}}
|
|
\GMPdisplay{%
|
|
\vbox{%
|
|
\def\GMPcentreline#1{\hbox{\raise 0.5 ex \vbox{\hrule \hbox to #1 {}}}}
|
|
\vbox {%
|
|
\hrule
|
|
\hbox{%
|
|
\vrule height 2.5ex depth 1.5ex
|
|
\hbox to \GMPboxwidth {\hfil size\hfil}%
|
|
\vrule
|
|
\hbox to 3\GMPboxwidth {\hfil data bytes\hfil}%
|
|
\vrule}
|
|
\hrule}
|
|
}}
|
|
@end tex
|
|
@ifnottex
|
|
@example
|
|
+------+------------------------+
| size |       data bytes       |
+------+------------------------+
|
|
@end example
|
|
@end ifnottex
|
|
|
|
The size is 4 bytes written most significant byte first, being the number of
|
|
subsequent data bytes, or the twos complement negative of that when a negative
|
|
integer is represented. The data bytes are the absolute value of the integer,
|
|
written most significant byte first.
|
|
|
|
The most significant data byte is always non-zero, so the output is the same
|
|
on all systems, irrespective of limb size.
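
The layout can be sketched in C for a value whose magnitude fits in 64 bits.
The function @code{model_out_raw} is a hypothetical illustration of the
format described above, writing into a caller-supplied buffer of at least 12
bytes.  It is not the @code{mpz_out_raw} implementation.

@example
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: encode a sign and 64-bit magnitude in the
   raw format.  Returns the number of bytes written to buf.  */
size_t
model_out_raw (unsigned char *buf, uint64_t mag, int negative)
@{
  uint32_t n = 0, size, i;
  uint64_t t;

  for (t = mag; t != 0; t >>= 8)
    n++;                      /* data bytes, no leading zero bytes */

  /* size field, twos complement negative for a negative value */
  size = negative ? (uint32_t) 0 - n : n;

  buf[0] = (unsigned char) (size >> 24);
  buf[1] = (unsigned char) (size >> 16);
  buf[2] = (unsigned char) (size >> 8);
  buf[3] = (unsigned char) size;

  for (i = 0; i < n; i++)     /* most significant byte first */
    buf[4 + i] = (unsigned char) (mag >> (8 * (n - 1 - i)));

  return 4 + (size_t) n;
@}
@end example

@noindent
For example a magnitude of 0x0102 encodes as the six bytes 00 00 00 02 01 02,
and as FF FF FF FE 01 02 when negative.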
|
|
|
|
In GMP 1, leading zero bytes were written to pad the data bytes to a multiple
|
|
of the limb size. @code{mpz_inp_raw} will still accept this, for
|
|
compatibility.
|
|
|
|
The use of ``big endian'' for both the size and data fields is deliberate; it
|
|
makes the data easy to read in a hex dump of a file. Unfortunately it also
|
|
means that the limb data must be reversed when reading or writing, so neither
|
|
a big endian nor little endian system can just read and write @code{_mp_d}.
|
|
|
|
|
|
@node C++ Interface Internals, , Raw Output Internals, Internals
|
|
@section C++ Interface Internals
|
|
@cindex C++ interface internals
|
|
|
|
A system of expression templates is used to ensure something like @code{a=b+c}
|
|
turns into a simple call to @code{mpz_add} etc. For @code{mpf_class}
|
|
the scheme also ensures the precision of the final
|
|
destination is used for any temporaries within a statement like
|
|
@code{f=w*x+y*z}. These are important features which a naive implementation
|
|
cannot provide.
|
|
|
|
A simplified description of the scheme follows. The true scheme is
|
|
complicated by the fact that expressions have different return types. For
|
|
detailed information, refer to the source code.
|
|
|
|
To perform an operation, say, addition, we first define a ``function object''
|
|
evaluating it,
|
|
|
|
@example
|
|
struct __gmp_binary_plus
@{
  static void eval(mpf_t f, mpf_t g, mpf_t h) @{ mpf_add(f, g, h); @}
@};
|
|
@end example
|
|
|
|
@noindent
|
|
And an ``additive expression'' object,
|
|
|
|
@example
|
|
__gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >
operator+(const mpf_class &f, const mpf_class &g)
@{
  return __gmp_expr
    <__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >(f, g);
@}
|
|
@end example
|
|
|
|
The seemingly redundant @code{__gmp_expr<__gmp_binary_expr<@dots{}>>} is used to
|
|
encapsulate any possible kind of expression into a single template type. In
|
|
fact even @code{mpf_class} etc are @code{typedef} specializations of
|
|
@code{__gmp_expr}.
|
|
|
|
Next we define assignment of @code{__gmp_expr} to @code{mpf_class}.
|
|
|
|
@example
|
|
template <class T>
mpf_class & mpf_class::operator=(const __gmp_expr<T> &expr)
@{
  expr.eval(this->get_mpf_t(), this->precision());
  return *this;
@}

template <class Op>
void __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, Op> >::eval
(mpf_t f, unsigned long int precision)
@{
  Op::eval(f, expr.val1.get_mpf_t(), expr.val2.get_mpf_t());
@}
|
|
@end example
|
|
|
|
where @code{expr.val1} and @code{expr.val2} are references to the expression's
|
|
operands (here @code{expr} is the @code{__gmp_binary_expr} stored within the
|
|
@code{__gmp_expr}).
|
|
|
|
This way, the expression is actually evaluated only at the time of assignment,
|
|
when the required precision (that of @code{f}) is known. Furthermore the
|
|
target @code{mpf_t} is now available, thus we can call @code{mpf_add} directly
|
|
with @code{f} as the output argument.
|
|
|
|
Compound expressions are handled by defining operators taking subexpressions
|
|
as their arguments, like this:
|
|
|
|
@example
|
|
template <class T, class U>
__gmp_expr
<__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
operator+(const __gmp_expr<T> &expr1, const __gmp_expr<U> &expr2)
@{
  return __gmp_expr
    <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
    (expr1, expr2);
@}
|
|
@end example
|
|
|
|
And the corresponding specializations of @code{__gmp_expr::eval}:
|
|
|
|
@example
|
|
template <class T, class U, class Op>
void __gmp_expr
<__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, Op> >::eval
(mpf_t f, unsigned long int precision)
@{
  // declare two temporaries
  mpf_class temp1(expr.val1, precision), temp2(expr.val2, precision);
  Op::eval(f, temp1.get_mpf_t(), temp2.get_mpf_t());
@}
|
|
@end example
|
|
|
|
The expression is thus recursively evaluated to any level of complexity and
|
|
all subexpressions are evaluated to the precision of @code{f}.
|
|
|
|
|
|
@node Contributors, References, Internals, Top
|
|
@comment node-name, next, previous, up
|
|
@appendix Contributors
|
|
@cindex Contributors
|
|
|
|
Torbjorn Granlund wrote the original GMP library and is still developing and
|
|
maintaining it. Several other individuals and organizations have contributed
|
|
to GMP in various ways. Here is a list in chronological order:
|
|
|
|
Gunnar Sjoedin and Hans Riesel helped with mathematical problems in early
|
|
versions of the library.
|
|
|
|
Richard Stallman contributed to the interface design and revised the first
|
|
version of this manual.
|
|
|
|
Brian Beuning and Doug Lea helped with testing of early versions of the
|
|
library and made creative suggestions.
|
|
|
|
John Amanatides of York University in Canada contributed the function
|
|
@code{mpz_probab_prime_p}.
|
|
|
|
Paul Zimmermann of Inria sparked the development of GMP 2, with his
|
|
comparisons between bignum packages.
|
|
|
|
Ken Weber (Kent State University, Universidade Federal do Rio Grande do Sul)
|
|
contributed @code{mpz_gcd}, @code{mpz_divexact}, @code{mpn_gcd}, and
|
|
@code{mpn_bdivmod}, partially supported by CNPq (Brazil) grant 301314194-2.
|
|
|
|
Per Bothner of Cygnus Support helped to set up GMP to use Cygnus' configure.
|
|
He has also made valuable suggestions and tested numerous intermediary
|
|
releases.
|
|
|
|
Joachim Hollman was involved in the design of the @code{mpf} interface, and in
|
|
the @code{mpz} design revisions for version 2.
|
|
|
|
Bennet Yee contributed the initial versions of @code{mpz_jacobi} and
|
|
@code{mpz_legendre}.
|
|
|
|
Andreas Schwab contributed the files @file{mpn/m68k/lshift.S} and
|
|
@file{mpn/m68k/rshift.S} (now in @file{.asm} form).
|
|
|
|
The development of the floating point functions of GNU MP 2 was supported in part
|
|
by the ESPRIT-BRA (Basic Research Activities) 6846 project POSSO (POlynomial
|
|
System SOlving).
|
|
|
|
GNU MP 2 was finished and released by SWOX AB, SWEDEN, in cooperation with the
|
|
IDA Center for Computing Sciences, USA.
|
|
|
|
Robert Harley of Inria, France and David Seal of ARM, England, suggested clever
|
|
improvements for population count.
|
|
|
|
Robert Harley also wrote highly optimized Karatsuba and 3-way Toom
|
|
multiplication functions for GMP 3. He also contributed the ARM assembly
|
|
code.
|
|
|
|
Torsten Ekedahl of the Mathematical department of Stockholm University provided
|
|
significant inspiration during several phases of the GMP development. His
|
|
mathematical expertise helped improve several algorithms.
|
|
|
|
Paul Zimmermann wrote the Divide and Conquer division code, the REDC code, the
|
|
REDC-based @code{mpz_powm} code, the FFT multiply code, and the Karatsuba square root
|
|
code. He also rewrote the Toom3 code for GMP 4.2. The ECMNET project Paul is
|
|
organizing was a driving force behind many of the optimizations in GMP 3.
|
|
|
|
Linus Nordberg wrote the new configure system based on autoconf and
|
|
implemented the new random functions.
|
|
|
|
Kent Boortz made the Mac OS 9 port.
|
|
|
|
Kevin Ryde worked on a number of things: optimized x86 code, m4 asm macros,
|
|
parameter tuning, speed measuring, the configure system, function inlining,
|
|
divisibility tests, bit scanning, Jacobi symbols, Fibonacci and Lucas number
|
|
functions, printf and scanf functions, perl interface, demo expression parser,
|
|
the algorithms chapter in the manual, @file{gmpasm-mode.el}, and various
|
|
miscellaneous improvements elsewhere.
|
|
|
|
Steve Root helped write the optimized alpha 21264 assembly code.
|
|
|
|
Gerardo Ballabio wrote the @file{gmpxx.h} C++ class interface and the C++
|
|
@code{istream} input routines.
|
|
|
|
GNU MP 4 was finished and released by Torbjorn Granlund and Kevin Ryde.
|
|
Torbjorn's work was partially funded by the IDA Center for Computing Sciences,
|
|
USA.
|
|
|
|
Jason Moxham rewrote @code{mpz_fac_ui}.
|
|
|
|
Pedro Gimeno implemented the Mersenne Twister and made other random number
|
|
improvements.
|
|
|
|
(This list is chronological, not ordered by significance. If you have
|
|
contributed to GMP/MPIR but are not listed above, please tell
|
|
@uref{http://groups.google.com/group/mpir-devel} about the omission!)
|
|
|
|
Thanks go to Hans Thorsen for donating an SGI system for the GMP test system
|
|
environment.
|
|
|
|
In 2008 GMP was forked and gave rise to the MPIR (Multiple Precision Integers
|
|
and Rationals) project. The following people have contributed to the MPIR project.
|
|
|
|
William Hart did work on the build system and helped get the first release working
|
|
on numerous systems, including adding build support for new assembly patches
|
|
that compile using yasm.
|
|
|
|
Brian Gladman wrote and maintains MSVC project files so the project can build
|
|
on MSVC. He also did the initial conversion of Pierrick Gaudry's and Jason
|
|
Martin's assembly patches to Intel format.
|
|
|
|
Pierrick Gaudry wrote some fast assembly support for AMD 64.
|
|
|
|
Jason Martin wrote some fast assembly patches for Core 2 and converted them to
|
|
Intel format.
|
|
|
|
Gonzalo Tornaria helped patch config.guess and associated files to distinguish
|
|
modern processors.
|
|
|
|
Michael Abshoff helped resolve some build issues on various platforms. He is the
|
|
release manager for the MPIR project.
|
|
|
|
@node References, GNU Free Documentation License, Contributors, Top
|
|
@comment node-name, next, previous, up
|
|
@appendix References
|
|
@cindex References
|
|
|
|
@c FIXME: In tex, the @uref's are unhyphenated, which is good for clarity,
|
|
@c but being long words they upset paragraph formatting (the preceding line
|
|
@c can get badly stretched). Would like a conditional @* style line break
|
|
@c if the uref is too long to fit on the last line of the paragraph, but it's
|
|
@c not clear how to do that. For now explicit @texlinebreak{}s are used on
|
|
@c paragraphs that come out bad.
|
|
|
|
@section Books
|
|
|
|
@itemize @bullet
|
|
@item
|
|
Jonathan M. Borwein and Peter B. Borwein, ``Pi and the AGM: A Study in
|
|
Analytic Number Theory and Computational Complexity'', Wiley, 1998.
|
|
|
|
@item
|
|
Henri Cohen, ``A Course in Computational Algebraic Number Theory'', Graduate
|
|
Texts in Mathematics number 138, Springer-Verlag, 1993.
|
|
@texlinebreak{} @uref{http://www.math.u-bordeaux.fr/~cohen/}
|
|
|
|
@item
|
|
Donald E. Knuth, ``The Art of Computer Programming'', volume 2,
|
|
``Seminumerical Algorithms'', 3rd edition, Addison-Wesley, 1998.
|
|
@texlinebreak{} @uref{http://www-cs-faculty.stanford.edu/~knuth/taocp.html}
|
|
|
|
@item
|
|
John D. Lipson, ``Elements of Algebra and Algebraic Computing'',
|
|
The Benjamin Cummings Publishing Company Inc, 1981.
|
|
|
|
@item
|
|
Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone, ``Handbook of
|
|
Applied Cryptography'', @uref{http://www.cacr.math.uwaterloo.ca/hac/}
|
|
|
|
@item
|
|
Richard M. Stallman, ``Using and Porting GCC'', Free Software Foundation, 1999,
|
|
available online @uref{http://gcc.gnu.org/onlinedocs/}, and in
|
|
the GCC package @uref{ftp://ftp.gnu.org/gnu/gcc/}
|
|
@end itemize
|
|
|
|
@section Papers
|
|
|
|
@itemize @bullet
|
|
@item
|
|
Yves Bertot, Nicolas Magaud and Paul Zimmermann, ``A Proof of GMP Square
|
|
Root'', Journal of Automated Reasoning, volume 29, 2002, pp.@: 225-252. Also
|
|
available online as INRIA Research Report 4475, June 2001,
|
|
@uref{http://www.inria.fr/rrrt/rr-4475.html}
|
|
|
|
@item
|
|
Christoph Burnikel and Joachim Ziegler, ``Fast Recursive Division'',
|
|
Max-Planck-Institut fuer Informatik Research Report MPI-I-98-1-022,
|
|
@texlinebreak{} @uref{http://data.mpi-sb.mpg.de/internet/reports.nsf/NumberView/1998-1-022}
|
|
|
|
@item
|
|
Torbjorn Granlund and Peter L. Montgomery, ``Division by Invariant Integers
|
|
using Multiplication'', in Proceedings of the SIGPLAN PLDI'94 Conference, June
|
|
1994. Also available @uref{ftp://ftp.cwi.nl/pub/pmontgom/divcnst.psa4.gz}
|
|
(and .psl.gz).
|
|
|
|
@item
|
|
Tudor Jebelean,
|
|
``An algorithm for exact division'',
|
|
Journal of Symbolic Computation,
|
|
volume 15, 1993, pp.@: 169-180.
|
|
Research report version available @texlinebreak{}
|
|
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-35.ps.gz}
|
|
|
|
@item
|
|
Tudor Jebelean, ``Exact Division with Karatsuba Complexity - Extended
|
|
Abstract'', RISC-Linz technical report 96-31, @texlinebreak{}
|
|
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-31.ps.gz}
|
|
|
|
@item
|
|
Tudor Jebelean, ``Practical Integer Division with Karatsuba Complexity'',
|
|
ISSAC 97, pp.@: 339-341. Technical report available @texlinebreak{}
|
|
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-29.ps.gz}
|
|
|
|
@item
Tudor Jebelean, ``A Generalization of the Binary GCD Algorithm'', ISSAC 93,
pp.@: 111-116. Technical report version available @texlinebreak{}
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1993/93-01.ps.gz}

@item
Tudor Jebelean, ``A Double-Digit Lehmer-Euclid Algorithm for Finding the GCD
of Long Integers'', Journal of Symbolic Computation, volume 19, 1995,
pp.@: 145-157. Technical report version also available @texlinebreak{}
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-69.ps.gz}

@item
Werner Krandick and Tudor Jebelean, ``Bidirectional Exact Integer Division'',
Journal of Symbolic Computation, volume 21, 1996, pp.@: 441-455. Early
technical report version also available
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1994/94-50.ps.gz}

@item
Makoto Matsumoto and Takuji Nishimura, ``Mersenne Twister: A 623-dimensionally
equidistributed uniform pseudorandom number generator'', ACM Transactions on
Modeling and Computer Simulation, volume 8, January 1998, pp.@: 3-30.
Available online @texlinebreak{}
@uref{http://www.math.keio.ac.jp/~nisimura/random/doc/mt.ps.gz} (or .pdf)

@item
R. Moenck and A. Borodin, ``Fast Modular Transforms via Division'',
Proceedings of the 13th Annual IEEE Symposium on Switching and Automata
Theory, October 1972, pp.@: 90-96. Reprinted as ``Fast Modular Transforms'',
Journal of Computer and System Sciences, volume 8, number 3, June 1974,
pp.@: 366-386.

@item
Peter L. Montgomery, ``Modular Multiplication Without Trial Division'', in
Mathematics of Computation, volume 44, number 170, April 1985.

@item
Arnold Sch@"onhage and Volker Strassen, ``Schnelle Multiplikation grosser
Zahlen'', Computing 7, 1971, pp.@: 281-292.

@item
Kenneth Weber, ``The accelerated integer GCD algorithm'',
ACM Transactions on Mathematical Software,
volume 21, number 1, March 1995, pp.@: 111-122.

@item
Paul Zimmermann, ``Karatsuba Square Root'', INRIA Research Report 3805,
November 1999, @uref{http://www.inria.fr/rrrt/rr-3805.html}

@item
Paul Zimmermann, ``A Proof of GMP Fast Division and Square Root
Implementations'', @texlinebreak{}
@uref{http://www.loria.fr/~zimmerma/papers/proof-div-sqrt.ps.gz}

@item
Dan Zuras, ``On Squaring and Multiplying Large Integers'', ARITH-11: IEEE
Symposium on Computer Arithmetic, 1993, pp.@: 260-271. Reprinted as ``More
on Multiplying and Squaring Large Integers'', IEEE Transactions on Computers,
volume 43, number 8, August 1994, pp.@: 899-908.
@end itemize


@node GNU Free Documentation License, Concept Index, References, Top
@appendix GNU Free Documentation License
@cindex GNU Free Documentation License
@cindex Free Documentation License
@cindex Documentation license
@include fdl.texi


@node Concept Index, Function Index, GNU Free Documentation License, Top
@comment node-name, next, previous, up
@unnumbered Concept Index
@printindex cp

@node Function Index, , Concept Index, Top
@comment node-name, next, previous, up
@unnumbered Function and Type Index
@printindex fn

@bye

@c Local variables:
@c fill-column: 78
@c compile-command: "make gmp.info"
@c End: