Copyright 1996 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library; see the file COPYING.LIB. If not, write to
the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301, USA.


This directory contains mpn functions optimized for MIPS3. Examples of
processors that implement MIPS3 are the R4000, R4400, R4600, R4700, and R8000.

RELEVANT OPTIMIZATION ISSUES

1. On the R4000 and R4400, branches, both the plain and the "likely" ones,
   take 3 cycles to execute. (The fastest possible loop will take 4 cycles,
   because of the delay insn.)

   On the R4600, branches take a single cycle.

   On the R8000, branches often take no noticeable cycles, as they are
   executed in a separate functional unit.

2. The R4000 and R4400 have a load latency of 4 cycles.

3. On the R4000 and R4400, multiplies take a data-dependent number of
   cycles, contrary to the SGI documentation. There seem to be 3 or 4
   possible latencies.

4. The R1x000 processors can issue one floating-point operation, two integer
   operations, and one memory operation per cycle. The FPU has very short
   latencies, while the integer multiply unit is non-pipelined. We should
   therefore write FP-based mpn_Xmul_1 functions.

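As an illustration of the FP-based approach suggested in item 4, here is a
minimal C sketch, not code from this directory. It assumes 32-bit limbs to
keep the arithmetic easy to follow, and the name fp_mul_1 is hypothetical.
The multiplier is split into two 16-bit halves so that each partial product
is below 2^48 and therefore exact in an IEEE double (53-bit significand).
The multiplies go through the FPU, and only the recombination and carry
propagation use integer operations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: rp[0..n-1] = up[0..n-1] * v, returning the high
   (carry-out) limb.  32-bit limbs are an assumption made for brevity.  */
uint32_t fp_mul_1(uint32_t *rp, const uint32_t *up, size_t n, uint32_t v)
{
    /* Split the multiplier into 16-bit halves, held as doubles.  */
    double v_lo = (double)(v & 0xFFFFu);
    double v_hi = (double)(v >> 16);
    uint32_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        double u = (double)up[i];

        /* Each product is < 2^32 * 2^16 = 2^48, so it is exact in a
           double and converts back to an integer without rounding.  */
        uint64_t p_lo = (uint64_t)(u * v_lo);
        uint64_t p_hi = (uint64_t)(u * v_hi);

        /* u*v = p_lo + p_hi*2^16.  Adding the incoming carry cannot
           overflow 64 bits, since u*v + carry <= 2^64 - 2^32.  */
        uint64_t t = p_lo + (p_hi << 16) + carry;

        rp[i] = (uint32_t)t;          /* low limb of the result */
        carry = (uint32_t)(t >> 32);  /* high limb carries onward */
    }
    return carry;
}
```

A real implementation for this directory would be written in assembly with
64-bit limbs, which needs a finer split of the operands to keep every
partial product and partial sum below 2^53.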
STATUS

Good...