mpir/mpn/ia64
2015-11-13 14:47:44 +00:00
add_n.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00
addlsh1_n.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00
addmul_1.asm
addmul_2.asm
addmul_4.asm
and_n.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00
andn_n.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00
copyd.asm
copyi.asm
divexact_1.asm change dive_1.* to divexact_1.* 2010-08-13 12:54:25 +00:00
divrem_2.asm
divrem_euclidean_qr_1.asm
gcd_1.asm
gmp-mparam.h Remove sb_div* small implementation (due to bug and due to being a very minor 2015-11-13 14:47:44 +00:00
hamdist.asm
ia64-defs.m4
invert_limb.asm
ior_n.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00
iorn_n.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00
longlong_inc.h move asm code in gmp-impl into the arch specific dirs 2011-04-30 07:05:19 +00:00
lshift.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00
modexact_1c_odd.asm replace mode1o.* with modexact_1c_odd.* 2010-08-13 14:09:23 +00:00
mul_1.asm
mul_2.asm
nand_n.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00
nior_n.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00
popcount.asm
README
rsh1add_n.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00
rsh1sub_n.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00
rshift.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00
sqr_diagonal.asm
sub_n.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00
sublsh1_n.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00
submul_1.c
xnor_n.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00
xor_n.asm Properly quote define so that m4 does not fail. 2013-08-01 19:07:47 +02:00

Copyright 2000, 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301, USA.



                      IA-64 MPN SUBROUTINES


This directory contains mpn functions for the IA-64 architecture.


CODE ORGANIZATION

	mpn/ia64          itanium-2, and generic ia64

The code here has been optimized primarily for Itanium 2.  Very few Itanium 1
chips were ever sold, and Itanium 2 is more powerful, so the latter is what
we concentrate on.



CHIP NOTES

The IA-64 ISA packs instructions in groups of three into 128-bit bundles.
Programmers/compilers need to put explicit breaks `;;' when there are WAW or
RAW dependencies, with some notable exceptions.  Such "breaks" are typically
at the end of a bundle, but can be put between operations within some bundle
types too.
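
As a rough illustration of the bundle layout, the sketch below splits a
128-bit bundle into its 5-bit template and three 41-bit instruction slots
(5 + 3*41 = 128).  The names ia64_bundle and ia64_decode_bundle are
hypothetical and not part of this library.  It assumes the bundle has been
read as two little-endian 64-bit words, and it relies on the architecture
manual's template table, in which odd template values carry a break after
the last slot.  Breaks in the middle of a bundle are also encoded in the
template and are not decoded here.

```c
#include <assert.h>
#include <stdint.h>

/* One decoded IA-64 bundle: a 5-bit template plus three 41-bit slots.
   Hypothetical helper type, for illustration only. */
struct ia64_bundle {
    unsigned template_id;   /* bits 0..4 */
    uint64_t slot[3];       /* bits 5..45, 46..86, 87..127 */
    int stop_at_end;        /* nonzero if the template ends in a break */
};

#define IA64_SLOT_MASK ((UINT64_C(1) << 41) - 1)

/* lo holds bits 0..63 of the bundle, hi holds bits 64..127. */
void ia64_decode_bundle(uint64_t lo, uint64_t hi, struct ia64_bundle *out)
{
    assert(out != 0);
    out->template_id = (unsigned)(lo & 0x1f);
    out->slot[0] = (lo >> 5) & IA64_SLOT_MASK;
    /* Slot 1 straddles the two words: 18 bits from lo, 23 bits from hi. */
    out->slot[1] = ((lo >> 46) | (hi << 18)) & IA64_SLOT_MASK;
    out->slot[2] = hi >> 23;
    /* Odd template values carry a break after the last slot. */
    out->stop_at_end = (int)(out->template_id & 1);
}
```

The three slots always travel together, which is why unused slots have to
be filled with nops.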

The Itanium 1 and Itanium 2 implementations can under ideal conditions
execute two bundles per cycle.  The Itanium 1 allows 4 of these instructions
to do integer operations, while the Itanium 2 allows all 6 to be integer
operations.

Taken cloop branches seem to insert a bubble into the pipeline most of the
time on Itanium 1.

Loads to the fp registers bypass the L1 cache and thus get extremely long
latencies, 9 cycles on the Itanium 1 and 6 cycles on the Itanium 2.

Software pipelining with the br.ctop instruction causes delays, since
many issue slots are taken up by instructions with zero predicates, and
since many extra instructions are needed to set things up.  These features
are clearly designed for code density, not speed.

Misc pipeline limitations (Itanium 1):
* The getf.sig instruction can only execute in M0.
* At most four integer instructions/cycle.
* Nops take up resources like any plain instructions.

Misc pipeline limitations (Itanium 2):
* The getf.sig instruction can only execute in M0.
* Nops take up resources like any plain instructions.
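
The issue limits above give a quick lower bound on the cycles needed per
loop iteration.  The following is only a sketch under simplifying
assumptions: it considers just the limits stated in this file (two
three-slot bundles per cycle, nops counted like real instructions, 4 or 6
integer instructions per cycle, one getf.sig per cycle since it is tied to
M0) and ignores latencies, memory ports and branch bubbles.  The names
ia64_chip and min_issue_cycles are hypothetical.

```c
#include <assert.h>

/* Which implementation to model (hypothetical enum, for illustration). */
enum ia64_chip { ITANIUM_1, ITANIUM_2 };

static unsigned ceil_div(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

/* Lower bound on cycles for one loop iteration.
   total    = all instructions in the iteration, nops included
   integer  = how many of them are integer operations
   getf_sig = how many of them are getf.sig */
unsigned min_issue_cycles(enum ia64_chip chip, unsigned total,
                          unsigned integer, unsigned getf_sig)
{
    unsigned int_width = (chip == ITANIUM_1) ? 4 : 6;
    unsigned bound, b;

    assert(integer + getf_sig <= total);

    bound = ceil_div(total, 6);          /* two 3-slot bundles per cycle */
    b = ceil_div(integer, int_width);    /* integer issue limit */
    if (b > bound)
        bound = b;
    if (getf_sig > bound)                /* M0 only: one getf.sig per cycle */
        bound = getf_sig;
    return bound;
}
```

For example, an iteration of 12 instructions that are all integer
operations is bounded below by 3 cycles on Itanium 1 and by 2 cycles on
Itanium 2 under this model.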


ASSEMBLY SYNTAX

.align pads with nops in a text segment, but gas 2.14 and earlier
incorrectly byte-swaps its nop bundle in big endian mode (e.g. hpux), making
it come out as break instructions.  We use the ALIGN() macro in
mpn/ia64/ia64-defs.m4 wherever the padding might be executed.  That macro
suppresses any .align if the problem is detected by configure.  Lack of
alignment might hurt performance but will at least be correct.

foo:: to create a global symbol is not accepted by gas.  Use separate
".global foo" and "foo:" instead.

.global is the standard global directive.  gas accepts .globl, but hpux "as"
doesn't.

.proc / .endp generate the appropriate .type and .size information for ELF,
so the latter directives don't need to be given explicitly.

.pred.rel "mutex"... is standard for annotating predicate register
relationships.  gas also accepts .pred.rel.mutex, but hpux "as" doesn't.

.pred directives can't be put on a line with a label, like
".Lfoo: .pred ...", since the HP assembler on HP-UX 11.23 rejects that.
gas is happy with it, and past versions of the HP assembler had seemed ok.

// is the standard comment sequence, but we prefer "C" since it inhibits m4
macro expansion.  See comments in ia64-defs.m4.


REGISTER USAGE

Special:
   r0: constant 0
   r1: global pointer (gp)
   r8: return value
   r12: stack pointer (sp)
   r13: thread pointer (tp)
Caller-saves: r8-r11 r14-r31 f6-f15 f32-f127
Caller-saves but rotating: r32-
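
When reviewing a routine it can help to check a general register number
against the list above.  The sketch below simply encodes that list; the
names gr_class and classify_gr are hypothetical, and registers the list
does not mention (r2-r7) are reported as such rather than guessed at.

```c
#include <assert.h>

/* Categories taken from the register list above (hypothetical enum). */
enum gr_class {
    GR_SPECIAL,          /* r0, r1 (gp), r12 (sp), r13 (tp) */
    GR_CALLER_SAVED,     /* r8-r11, r14-r31; r8 also holds the return value */
    GR_ROTATING,         /* r32 and up: caller-saves but rotating */
    GR_NOT_LISTED        /* not mentioned in the list above */
};

/* Classify general register rN, 0 <= n < 128. */
enum gr_class classify_gr(unsigned n)
{
    assert(n < 128);
    if (n == 0 || n == 1 || n == 12 || n == 13)
        return GR_SPECIAL;
    if ((n >= 8 && n <= 11) || (n >= 14 && n <= 31))
        return GR_CALLER_SAVED;
    if (n >= 32)
        return GR_ROTATING;
    return GR_NOT_LISTED;
}
```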


REFERENCES

Intel Itanium Architecture Software Developer's Manual, volumes 1 to 3,
Intel document 245317-004, 245318-004, 245319-004 October 2002.  Volume 1
includes an Itanium optimization guide.

Intel Itanium Processor-specific Application Binary Interface (ABI), Intel
document 245370-003, May 2001.  Describes C type sizes, dynamic linking,
etc.

Intel Itanium Architecture Assembly Language Reference Guide, Intel document
248801-004, 2000-2002.  Describes assembly instruction syntax and other
directives.

Itanium Software Conventions and Runtime Architecture Guide, Intel document
245358-003, May 2001.  Describes calling conventions, including stack
unwinding requirements.

Intel Itanium Processor Reference Manual for Software Optimization, Intel
document 245473-003, November 2001.

Intel Itanium-2 Processor Reference Manual for Software Development and
Optimization, Intel document 251110-003, May 2004.

All the above documents can be found online at

    http://developer.intel.com/design/itanium/manuals.htm


----------------
Local variables:
mode: text
fill-column: 76
End: