Fixed MTP to work with TWRP

awab228 2018-06-19 23:16:04 +02:00
commit f6dfaef42e
50820 changed files with 20846062 additions and 0 deletions

481
arch/sparc/lib/COPYING.LIB Normal file
@@ -0,0 +1,481 @@
GNU LIBRARY GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1991 Free Software Foundation, Inc.
675 Mass Ave, Cambridge, MA 02139, USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
[This is the first released version of the library GPL. It is
numbered 2 because it goes with version 2 of the ordinary GPL.]
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
Licenses are intended to guarantee your freedom to share and change
free software--to make sure the software is free for all its users.
This license, the Library General Public License, applies to some
specially designated Free Software Foundation software, and to any
other libraries whose authors decide to use it. You can use it for
your libraries, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if
you distribute copies of the library, or if you modify it.
For example, if you distribute copies of the library, whether gratis
or for a fee, you must give the recipients all the rights that we gave
you. You must make sure that they, too, receive or can get the source
code. If you link a program with the library, you must provide
complete object files to the recipients so that they can relink them
with the library, after making changes to the library and recompiling
it. And you must show them these terms so they know their rights.
Our method of protecting your rights has two steps: (1) copyright
the library, and (2) offer you this license which gives you legal
permission to copy, distribute and/or modify the library.
Also, for each distributor's protection, we want to make certain
that everyone understands that there is no warranty for this free
library. If the library is modified by someone else and passed on, we
want its recipients to know that what they have is not the original
version, so that any problems introduced by others will not reflect on
the original authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that companies distributing free
software will individually obtain patent licenses, thus in effect
transforming the program into proprietary software. To prevent this,
we have made it clear that any patent must be licensed for everyone's
free use or not licensed at all.
Most GNU software, including some libraries, is covered by the ordinary
GNU General Public License, which was designed for utility programs. This
license, the GNU Library General Public License, applies to certain
designated libraries. This license is quite different from the ordinary
one; be sure to read it in full, and don't assume that anything in it is
the same as in the ordinary license.
The reason we have a separate public license for some libraries is that
they blur the distinction we usually make between modifying or adding to a
program and simply using it. Linking a program with a library, without
changing the library, is in some sense simply using the library, and is
analogous to running a utility program or application program. However, in
a textual and legal sense, the linked executable is a combined work, a
derivative of the original library, and the ordinary General Public License
treats it as such.
Because of this blurred distinction, using the ordinary General
Public License for libraries did not effectively promote software
sharing, because most developers did not use the libraries. We
concluded that weaker conditions might promote sharing better.
However, unrestricted linking of non-free programs would deprive the
users of those programs of all benefit from the free status of the
libraries themselves. This Library General Public License is intended to
permit developers of non-free programs to use free libraries, while
preserving your freedom as a user of such programs to change the free
libraries that are incorporated in them. (We have not seen how to achieve
this as regards changes in header files, but we have achieved it as regards
changes in the actual functions of the Library.) The hope is that this
will lead to faster development of free libraries.
The precise terms and conditions for copying, distribution and
modification follow. Pay close attention to the difference between a
"work based on the library" and a "work that uses the library". The
former contains code derived from the library, while the latter only
works together with the library.
Note that it is possible for a library to be covered by the ordinary
General Public License rather than by this special one.
GNU LIBRARY GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License Agreement applies to any software library which
contains a notice placed by the copyright holder or other authorized
party saying it may be distributed under the terms of this Library
General Public License (also called "this License"). Each licensee is
addressed as "you".
A "library" means a collection of software functions and/or data
prepared so as to be conveniently linked with application programs
(which use some of those functions and data) to form executables.
The "Library", below, refers to any such software library or work
which has been distributed under these terms. A "work based on the
Library" means either the Library or any derivative work under
copyright law: that is to say, a work containing the Library or a
portion of it, either verbatim or with modifications and/or translated
straightforwardly into another language. (Hereinafter, translation is
included without limitation in the term "modification".)
"Source code" for a work means the preferred form of the work for
making modifications to it. For a library, complete source code means
all the source code for all modules it contains, plus any associated
interface definition files, plus the scripts used to control compilation
and installation of the library.
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running a program using the Library is not restricted, and output from
such a program is covered only if its contents constitute a work based
on the Library (independent of the use of the Library in a tool for
writing it). Whether that is true depends on what the Library does
and what the program that uses the Library does.
1. You may copy and distribute verbatim copies of the Library's
complete source code as you receive it, in any medium, provided that
you conspicuously and appropriately publish on each copy an
appropriate copyright notice and disclaimer of warranty; keep intact
all the notices that refer to this License and to the absence of any
warranty; and distribute a copy of this License along with the
Library.
You may charge a fee for the physical act of transferring a copy,
and you may at your option offer warranty protection in exchange for a
fee.
2. You may modify your copy or copies of the Library or any portion
of it, thus forming a work based on the Library, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) The modified work must itself be a software library.
b) You must cause the files modified to carry prominent notices
stating that you changed the files and the date of any change.
c) You must cause the whole of the work to be licensed at no
charge to all third parties under the terms of this License.
d) If a facility in the modified Library refers to a function or a
table of data to be supplied by an application program that uses
the facility, other than as an argument passed when the facility
is invoked, then you must make a good faith effort to ensure that,
in the event an application does not supply such function or
table, the facility still operates, and performs whatever part of
its purpose remains meaningful.
(For example, a function in a library to compute square roots has
a purpose that is entirely well-defined independent of the
application. Therefore, Subsection 2d requires that any
application-supplied function or table used by this function must
be optional: if the application does not supply it, the square
root function must still compute square roots.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Library,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Library, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote
it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Library.
In addition, mere aggregation of another work not based on the Library
with the Library (or with a work based on the Library) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may opt to apply the terms of the ordinary GNU General Public
License instead of this License to a given copy of the Library. To do
this, you must alter all the notices that refer to this License, so
that they refer to the ordinary GNU General Public License, version 2,
instead of to this License. (If a newer version than version 2 of the
ordinary GNU General Public License has appeared, then you can specify
that version instead if you wish.) Do not make any other change in
these notices.
Once this change is made in a given copy, it is irreversible for
that copy, so the ordinary GNU General Public License applies to all
subsequent copies and derivative works made from that copy.
This option is useful when you wish to copy part of the code of
the Library into a program that is not a library.
4. You may copy and distribute the Library (or a portion or
derivative of it, under Section 2) in object code or executable form
under the terms of Sections 1 and 2 above provided that you accompany
it with the complete corresponding machine-readable source code, which
must be distributed under the terms of Sections 1 and 2 above on a
medium customarily used for software interchange.
If distribution of object code is made by offering access to copy
from a designated place, then offering equivalent access to copy the
source code from the same place satisfies the requirement to
distribute the source code, even though third parties are not
compelled to copy the source along with the object code.
5. A program that contains no derivative of any portion of the
Library, but is designed to work with the Library by being compiled or
linked with it, is called a "work that uses the Library". Such a
work, in isolation, is not a derivative work of the Library, and
therefore falls outside the scope of this License.
However, linking a "work that uses the Library" with the Library
creates an executable that is a derivative of the Library (because it
contains portions of the Library), rather than a "work that uses the
library". The executable is therefore covered by this License.
Section 6 states terms for distribution of such executables.
When a "work that uses the Library" uses material from a header file
that is part of the Library, the object code for the work may be a
derivative work of the Library even though the source code is not.
Whether this is true is especially significant if the work can be
linked without the Library, or if the work is itself a library. The
threshold for this to be true is not precisely defined by law.
If such an object file uses only numerical parameters, data
structure layouts and accessors, and small macros and small inline
functions (ten lines or less in length), then the use of the object
file is unrestricted, regardless of whether it is legally a derivative
work. (Executables containing this object code plus portions of the
Library will still fall under Section 6.)
Otherwise, if the work is a derivative of the Library, you may
distribute the object code for the work under the terms of Section 6.
Any executables containing that work also fall under Section 6,
whether or not they are linked directly with the Library itself.
6. As an exception to the Sections above, you may also compile or
link a "work that uses the Library" with the Library to produce a
work containing portions of the Library, and distribute that work
under terms of your choice, provided that the terms permit
modification of the work for the customer's own use and reverse
engineering for debugging such modifications.
You must give prominent notice with each copy of the work that the
Library is used in it and that the Library and its use are covered by
this License. You must supply a copy of this License. If the work
during execution displays copyright notices, you must include the
copyright notice for the Library among them, as well as a reference
directing the user to the copy of this License. Also, you must do one
of these things:
a) Accompany the work with the complete corresponding
machine-readable source code for the Library including whatever
changes were used in the work (which must be distributed under
Sections 1 and 2 above); and, if the work is an executable linked
with the Library, with the complete machine-readable "work that
uses the Library", as object code and/or source code, so that the
user can modify the Library and then relink to produce a modified
executable containing the modified Library. (It is understood
that the user who changes the contents of definitions files in the
Library will not necessarily be able to recompile the application
to use the modified definitions.)
b) Accompany the work with a written offer, valid for at
least three years, to give the same user the materials
specified in Subsection 6a, above, for a charge no more
than the cost of performing this distribution.
c) If distribution of the work is made by offering access to copy
from a designated place, offer equivalent access to copy the above
specified materials from the same place.
d) Verify that the user has already received a copy of these
materials or that you have already sent this user a copy.
For an executable, the required form of the "work that uses the
Library" must include any data and utility programs needed for
reproducing the executable from it. However, as a special exception,
the source code distributed need not include anything that is normally
distributed (in either source or binary form) with the major
components (compiler, kernel, and so on) of the operating system on
which the executable runs, unless that component itself accompanies
the executable.
It may happen that this requirement contradicts the license
restrictions of other proprietary libraries that do not normally
accompany the operating system. Such a contradiction means you cannot
use both them and the Library together in an executable that you
distribute.
7. You may place library facilities that are a work based on the
Library side-by-side in a single library together with other library
facilities not covered by this License, and distribute such a combined
library, provided that the separate distribution of the work based on
the Library and of the other library facilities is otherwise
permitted, and provided that you do these two things:
a) Accompany the combined library with a copy of the same work
based on the Library, uncombined with any other library
facilities. This must be distributed under the terms of the
Sections above.
b) Give prominent notice with the combined library of the fact
that part of it is a work based on the Library, and explaining
where to find the accompanying uncombined form of the same work.
8. You may not copy, modify, sublicense, link with, or distribute
the Library except as expressly provided under this License. Any
attempt otherwise to copy, modify, sublicense, link with, or
distribute the Library is void, and will automatically terminate your
rights under this License. However, parties who have received copies,
or rights, from you under this License will not have their licenses
terminated so long as such parties remain in full compliance.
9. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Library or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Library (or any work based on the
Library), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Library or works based on it.
10. Each time you redistribute the Library (or any work based on the
Library), the recipient automatically receives a license from the
original licensor to copy, distribute, link with or modify the Library
subject to these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
11. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Library at all. For example, if a patent
license would not permit royalty-free redistribution of the Library by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Library.
If any portion of this section is held invalid or unenforceable under any
particular circumstance, the balance of the section is intended to apply,
and the section as a whole is intended to apply in other circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
12. If the distribution and/or use of the Library is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Library under this License may add
an explicit geographical distribution limitation excluding those countries,
so that distribution is permitted only in or among countries not thus
excluded. In such case, this License incorporates the limitation as if
written in the body of this License.
13. The Free Software Foundation may publish revised and/or new
versions of the Library General Public License from time to time.
Such new versions will be similar in spirit to the present version,
but may differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Library
specifies a version number of this License which applies to it and
"any later version", you have the option of following the terms and
conditions either of that version or of any later version published by
the Free Software Foundation. If the Library does not specify a
license version number, you may choose any version ever published by
the Free Software Foundation.
14. If you wish to incorporate parts of the Library into other free
programs whose distribution conditions are incompatible with these,
write to the author to ask for permission. For software which is
copyrighted by the Free Software Foundation, write to the Free
Software Foundation; we sometimes make exceptions for this. Our
decision will be guided by the two goals of preserving the free status
of all derivatives of our free software and of promoting the sharing
and reuse of software generally.
NO WARRANTY
15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO
WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR
OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME
THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU
FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
END OF TERMS AND CONDITIONS
Appendix: How to Apply These Terms to Your New Libraries
If you develop a new library, and you want it to be of the greatest
possible use to the public, we recommend making it free software that
everyone can redistribute and change. You can do so by permitting
redistribution under these terms (or, alternatively, under the terms of the
ordinary General Public License).
To apply these terms, attach the following notices to the library. It is
safest to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least the
"copyright" line and a pointer to where the full notice is found.
<one line to give the library's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with this library; if not, write to the Free
Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
Also add information on how to contact you by electronic and paper mail.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the library, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the
library `Frob' (a library for tweaking knobs) written by James Random Hacker.
<signature of Ty Coon>, 1 April 1990
Ty Coon, President of Vice
That's all there is to it!
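As a concrete illustration of the appendix above, the recommended notice block might sit at the top of a hypothetical library source file like this (the file name and function are placeholders, not part of this commit; the year and author fields are left as the template gives them):

/* frob.c - a hypothetical routine for tweaking knobs.
 * Copyright (C) <year> <name of author>
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Library General Public
 * License as published by the Free Software Foundation; either
 * version 2 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Library General Public License for more details.
 *
 * You should have received a copy of the GNU Library General Public
 * License along with this library; if not, write to the Free
 * Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */
int frob_knob(int value)
{
        return value + 1;       /* placeholder implementation */
}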

156
arch/sparc/lib/GENbzero.S Normal file
@@ -0,0 +1,156 @@
/* GENbzero.S: Generic sparc64 memset/clear_user.
*
* Copyright (C) 2007 David S. Miller (davem@davemloft.net)
*/
#include <asm/asi.h>
#define EX_ST(x,y) \
98: x,y; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_o1; \
.text; \
.align 4;
.align 32
.text
.globl GENmemset
.type GENmemset, #function
GENmemset: /* %o0=buf, %o1=pat, %o2=len */
and %o1, 0xff, %o3
mov %o2, %o1
sllx %o3, 8, %g1
or %g1, %o3, %o2
sllx %o2, 16, %g1
or %g1, %o2, %o2
sllx %o2, 32, %g1
ba,pt %xcc, 1f
or %g1, %o2, %o2
.globl GENbzero
.type GENbzero, #function
GENbzero:
clr %o2
1: brz,pn %o1, GENbzero_return
mov %o0, %o3
/* %o5: saved %asi, restored at GENbzero_done
* %o4: store %asi to use
*/
rd %asi, %o5
mov ASI_P, %o4
wr %o4, 0x0, %asi
GENbzero_from_clear_user:
cmp %o1, 15
bl,pn %icc, GENbzero_tiny
andcc %o0, 0x7, %g1
be,pt %xcc, 2f
mov 8, %g2
sub %g2, %g1, %g1
sub %o1, %g1, %o1
1: EX_ST(stba %o2, [%o0 + 0x00] %asi)
subcc %g1, 1, %g1
bne,pt %xcc, 1b
add %o0, 1, %o0
2: cmp %o1, 128
bl,pn %icc, GENbzero_medium
andcc %o0, (64 - 1), %g1
be,pt %xcc, GENbzero_pre_loop
mov 64, %g2
sub %g2, %g1, %g1
sub %o1, %g1, %o1
1: EX_ST(stxa %o2, [%o0 + 0x00] %asi)
subcc %g1, 8, %g1
bne,pt %xcc, 1b
add %o0, 8, %o0
GENbzero_pre_loop:
andn %o1, (64 - 1), %g1
sub %o1, %g1, %o1
GENbzero_loop:
EX_ST(stxa %o2, [%o0 + 0x00] %asi)
EX_ST(stxa %o2, [%o0 + 0x08] %asi)
EX_ST(stxa %o2, [%o0 + 0x10] %asi)
EX_ST(stxa %o2, [%o0 + 0x18] %asi)
EX_ST(stxa %o2, [%o0 + 0x20] %asi)
EX_ST(stxa %o2, [%o0 + 0x28] %asi)
EX_ST(stxa %o2, [%o0 + 0x30] %asi)
EX_ST(stxa %o2, [%o0 + 0x38] %asi)
subcc %g1, 64, %g1
bne,pt %xcc, GENbzero_loop
add %o0, 64, %o0
membar #Sync
wr %o4, 0x0, %asi
brz,pn %o1, GENbzero_done
GENbzero_medium:
andncc %o1, 0x7, %g1
be,pn %xcc, 2f
sub %o1, %g1, %o1
1: EX_ST(stxa %o2, [%o0 + 0x00] %asi)
subcc %g1, 8, %g1
bne,pt %xcc, 1b
add %o0, 8, %o0
2: brz,pt %o1, GENbzero_done
nop
GENbzero_tiny:
1: EX_ST(stba %o2, [%o0 + 0x00] %asi)
subcc %o1, 1, %o1
bne,pt %icc, 1b
add %o0, 1, %o0
/* fallthrough */
GENbzero_done:
wr %o5, 0x0, %asi
GENbzero_return:
retl
mov %o3, %o0
.size GENbzero, .-GENbzero
.size GENmemset, .-GENmemset
.globl GENclear_user
.type GENclear_user, #function
GENclear_user: /* %o0=buf, %o1=len */
rd %asi, %o5
brz,pn %o1, GENbzero_done
clr %o3
cmp %o5, ASI_AIUS
bne,pn %icc, GENbzero
clr %o2
ba,pt %xcc, GENbzero_from_clear_user
mov ASI_AIUS, %o4
.size GENclear_user, .-GENclear_user
#define BRANCH_ALWAYS 0x10680000
#define NOP 0x01000000
#define GEN_DO_PATCH(OLD, NEW) \
sethi %hi(NEW), %g1; \
or %g1, %lo(NEW), %g1; \
sethi %hi(OLD), %g2; \
or %g2, %lo(OLD), %g2; \
sub %g1, %g2, %g1; \
sethi %hi(BRANCH_ALWAYS), %g3; \
sll %g1, 11, %g1; \
srl %g1, 11 + 2, %g1; \
or %g3, %lo(BRANCH_ALWAYS), %g3; \
or %g3, %g1, %g3; \
stw %g3, [%g2]; \
sethi %hi(NOP), %g3; \
or %g3, %lo(NOP), %g3; \
stw %g3, [%g2 + 0x4]; \
flush %g2;
.globl generic_patch_bzero
.type generic_patch_bzero,#function
generic_patch_bzero:
GEN_DO_PATCH(memset, GENmemset)
GEN_DO_PATCH(__bzero, GENbzero)
GEN_DO_PATCH(__clear_user, GENclear_user)
retl
nop
.size generic_patch_bzero,.-generic_patch_bzero
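GENmemset above widens the fill byte into a full 64-bit pattern with three sllx/or pairs before entering the 8-byte store loops. A minimal user-space sketch of that widening step (plain C, not the kernel code itself):

#include <stdint.h>
#include <stdio.h>

static uint64_t spread_byte(uint8_t pat)
{
        uint64_t v = pat;

        v |= v << 8;    /* 0x00000000000000ab -> 0x000000000000abab */
        v |= v << 16;   /*                    -> 0x00000000abababab */
        v |= v << 32;   /*                    -> 0xabababababababab */
        return v;
}

int main(void)
{
        printf("0x%016llx\n", (unsigned long long)spread_byte(0xab));
        return 0;
}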

30
arch/sparc/lib/GENcopy_from_user.S Normal file
@@ -0,0 +1,30 @@
/* GENcopy_from_user.S: Generic sparc64 copy from userspace.
*
* Copyright (C) 2007 David S. Miller (davem@davemloft.net)
*/
#define EX_LD(x) \
98: x; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_one; \
.text; \
.align 4;
#ifndef ASI_AIUS
#define ASI_AIUS 0x11
#endif
#define FUNC_NAME GENcopy_from_user
#define LOAD(type,addr,dest) type##a [addr] ASI_AIUS, dest
#define EX_RETVAL(x) 0
#ifdef __KERNEL__
#define PREAMBLE \
rd %asi, %g1; \
cmp %g1, ASI_AIUS; \
bne,pn %icc, ___copy_in_user; \
nop
#endif
#include "GENmemcpy.S"

34
arch/sparc/lib/GENcopy_to_user.S Normal file
@@ -0,0 +1,34 @@
/* GENcopy_to_user.S: Generic sparc64 copy to userspace.
*
* Copyright (C) 2007 David S. Miller (davem@davemloft.net)
*/
#define EX_ST(x) \
98: x; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_one; \
.text; \
.align 4;
#ifndef ASI_AIUS
#define ASI_AIUS 0x11
#endif
#define FUNC_NAME GENcopy_to_user
#define STORE(type,src,addr) type##a src, [addr] ASI_AIUS
#define EX_RETVAL(x) 0
#ifdef __KERNEL__
/* Writing to %asi is _expensive_ so we hardcode it.
* Reading %asi to check for KERNEL_DS is comparatively
* cheap.
*/
#define PREAMBLE \
rd %asi, %g1; \
cmp %g1, ASI_AIUS; \
bne,pn %icc, ___copy_in_user; \
nop
#endif
#include "GENmemcpy.S"

121
arch/sparc/lib/GENmemcpy.S Normal file
@@ -0,0 +1,121 @@
/* GENmemcpy.S: Generic sparc64 memcpy.
*
* Copyright (C) 2007 David S. Miller (davem@davemloft.net)
*/
#ifdef __KERNEL__
#define GLOBAL_SPARE %g7
#else
#define GLOBAL_SPARE %g5
#endif
#ifndef EX_LD
#define EX_LD(x) x
#endif
#ifndef EX_ST
#define EX_ST(x) x
#endif
#ifndef EX_RETVAL
#define EX_RETVAL(x) x
#endif
#ifndef LOAD
#define LOAD(type,addr,dest) type [addr], dest
#endif
#ifndef STORE
#define STORE(type,src,addr) type src, [addr]
#endif
#ifndef FUNC_NAME
#define FUNC_NAME GENmemcpy
#endif
#ifndef PREAMBLE
#define PREAMBLE
#endif
#ifndef XCC
#define XCC xcc
#endif
.register %g2,#scratch
.register %g3,#scratch
.text
.align 64
.globl FUNC_NAME
.type FUNC_NAME,#function
FUNC_NAME: /* %o0=dst, %o1=src, %o2=len */
srlx %o2, 31, %g2
cmp %g2, 0
tne %XCC, 5
PREAMBLE
mov %o0, GLOBAL_SPARE
cmp %o2, 0
be,pn %XCC, 85f
or %o0, %o1, %o3
cmp %o2, 16
blu,a,pn %XCC, 80f
or %o3, %o2, %o3
xor %o0, %o1, %o4
andcc %o4, 0x7, %g0
bne,a,pn %XCC, 90f
sub %o0, %o1, %o3
and %o0, 0x7, %o4
sub %o4, 0x8, %o4
sub %g0, %o4, %o4
sub %o2, %o4, %o2
1: subcc %o4, 1, %o4
EX_LD(LOAD(ldub, %o1, %g1))
EX_ST(STORE(stb, %g1, %o0))
add %o1, 1, %o1
bne,pt %XCC, 1b
add %o0, 1, %o0
andn %o2, 0x7, %g1
sub %o2, %g1, %o2
1: subcc %g1, 0x8, %g1
EX_LD(LOAD(ldx, %o1, %g2))
EX_ST(STORE(stx, %g2, %o0))
add %o1, 0x8, %o1
bne,pt %XCC, 1b
add %o0, 0x8, %o0
brz,pt %o2, 85f
sub %o0, %o1, %o3
ba,a,pt %XCC, 90f
.align 64
80: /* 0 < len <= 16 */
andcc %o3, 0x3, %g0
bne,pn %XCC, 90f
sub %o0, %o1, %o3
1:
subcc %o2, 4, %o2
EX_LD(LOAD(lduw, %o1, %g1))
EX_ST(STORE(stw, %g1, %o1 + %o3))
bgu,pt %XCC, 1b
add %o1, 4, %o1
85: retl
mov EX_RETVAL(GLOBAL_SPARE), %o0
.align 32
90:
subcc %o2, 1, %o2
EX_LD(LOAD(ldub, %o1, %g1))
EX_ST(STORE(stb, %g1, %o1 + %o3))
bgu,pt %XCC, 90b
add %o1, 1, %o1
retl
mov EX_RETVAL(GLOBAL_SPARE), %o0
.size FUNC_NAME, .-FUNC_NAME
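GENmemcpy.S is written as a template: GENcopy_from_user.S and GENcopy_to_user.S simply redefine EX_LD/EX_ST, LOAD/STORE and FUNC_NAME and then #include this file, so one body yields memcpy and both user-copy variants. A stand-alone C sketch of the same macro-specialization pattern (illustrative only, not the kernel code):

#include <stddef.h>
#include <stdio.h>

#define DEFINE_COPY(FUNC_NAME, LOAD, STORE)                             \
static void *FUNC_NAME(void *dst, const void *src, size_t len)         \
{                                                                       \
        unsigned char *d = dst;                                         \
        const unsigned char *s = src;                                   \
        while (len--)                                                   \
                STORE(d++, LOAD(s++));                                  \
        return dst;                                                     \
}

/* "Kernel-to-kernel" accessors: plain loads and stores. A user-copy
 * variant would swap in accessors that use the user address space. */
#define PLAIN_LOAD(p)     (*(p))
#define PLAIN_STORE(p, v) (*(p) = (v))

DEFINE_COPY(gen_memcpy_sketch, PLAIN_LOAD, PLAIN_STORE)

int main(void)
{
        char a[] = "GENmemcpy", b[sizeof(a)];

        gen_memcpy_sketch(b, a, sizeof(a));
        puts(b);
        return 0;
}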

77
arch/sparc/lib/GENpage.S Normal file
@@ -0,0 +1,77 @@
/* GENpage.S: Generic clear and copy page.
*
* Copyright (C) 2007 (davem@davemloft.net)
*/
#include <asm/page.h>
.text
.align 32
GENcopy_user_page:
set PAGE_SIZE, %g7
1: ldx [%o1 + 0x00], %o2
ldx [%o1 + 0x08], %o3
ldx [%o1 + 0x10], %o4
ldx [%o1 + 0x18], %o5
stx %o2, [%o0 + 0x00]
stx %o3, [%o0 + 0x08]
stx %o4, [%o0 + 0x10]
stx %o5, [%o0 + 0x18]
ldx [%o1 + 0x20], %o2
ldx [%o1 + 0x28], %o3
ldx [%o1 + 0x30], %o4
ldx [%o1 + 0x38], %o5
stx %o2, [%o0 + 0x20]
stx %o3, [%o0 + 0x28]
stx %o4, [%o0 + 0x30]
stx %o5, [%o0 + 0x38]
subcc %g7, 64, %g7
add %o1, 64, %o1
bne,pt %xcc, 1b
add %o0, 64, %o0
retl
nop
GENclear_page:
GENclear_user_page:
set PAGE_SIZE, %g7
1: stx %g0, [%o0 + 0x00]
stx %g0, [%o0 + 0x08]
stx %g0, [%o0 + 0x10]
stx %g0, [%o0 + 0x18]
stx %g0, [%o0 + 0x20]
stx %g0, [%o0 + 0x28]
stx %g0, [%o0 + 0x30]
stx %g0, [%o0 + 0x38]
subcc %g7, 64, %g7
bne,pt %xcc, 1b
add %o0, 64, %o0
#define BRANCH_ALWAYS 0x10680000
#define NOP 0x01000000
#define GEN_DO_PATCH(OLD, NEW) \
sethi %hi(NEW), %g1; \
or %g1, %lo(NEW), %g1; \
sethi %hi(OLD), %g2; \
or %g2, %lo(OLD), %g2; \
sub %g1, %g2, %g1; \
sethi %hi(BRANCH_ALWAYS), %g3; \
sll %g1, 11, %g1; \
srl %g1, 11 + 2, %g1; \
or %g3, %lo(BRANCH_ALWAYS), %g3; \
or %g3, %g1, %g3; \
stw %g3, [%g2]; \
sethi %hi(NOP), %g3; \
or %g3, %lo(NOP), %g3; \
stw %g3, [%g2 + 0x4]; \
flush %g2;
.globl generic_patch_pageops
.type generic_patch_pageops,#function
generic_patch_pageops:
GEN_DO_PATCH(copy_user_page, GENcopy_user_page)
GEN_DO_PATCH(_clear_page, GENclear_page)
GEN_DO_PATCH(clear_user_page, GENclear_user_page)
retl
nop
.size generic_patch_pageops,.-generic_patch_pageops

33
arch/sparc/lib/GENpatch.S Normal file
@@ -0,0 +1,33 @@
/* GENpatch.S: Patch Ultra-I routines with generic variant.
*
* Copyright (C) 2007 David S. Miller <davem@davemloft.net>
*/
#define BRANCH_ALWAYS 0x10680000
#define NOP 0x01000000
#define GEN_DO_PATCH(OLD, NEW) \
sethi %hi(NEW), %g1; \
or %g1, %lo(NEW), %g1; \
sethi %hi(OLD), %g2; \
or %g2, %lo(OLD), %g2; \
sub %g1, %g2, %g1; \
sethi %hi(BRANCH_ALWAYS), %g3; \
sll %g1, 11, %g1; \
srl %g1, 11 + 2, %g1; \
or %g3, %lo(BRANCH_ALWAYS), %g3; \
or %g3, %g1, %g3; \
stw %g3, [%g2]; \
sethi %hi(NOP), %g3; \
or %g3, %lo(NOP), %g3; \
stw %g3, [%g2 + 0x4]; \
flush %g2;
.globl generic_patch_copyops
.type generic_patch_copyops,#function
generic_patch_copyops:
GEN_DO_PATCH(memcpy, GENmemcpy)
GEN_DO_PATCH(___copy_from_user, GENcopy_from_user)
GEN_DO_PATCH(___copy_to_user, GENcopy_to_user)
retl
nop
.size generic_patch_copyops,.-generic_patch_copyops
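GEN_DO_PATCH above rewrites the first two instructions of the old routine in place: a "ba,pt %xcc" whose 19-bit word displacement is derived from NEW - OLD by the sll 11 / srl 13 pair, followed by a nop, with a flush to keep the instruction cache coherent. A user-space sketch of the encoding arithmetic, using hypothetical addresses:

#include <stdint.h>
#include <stdio.h>

#define BRANCH_ALWAYS 0x10680000u   /* "ba,pt %xcc" with zero displacement */
#define NOP           0x01000000u

static uint32_t encode_branch(uint32_t old_addr, uint32_t new_addr)
{
        uint32_t disp = new_addr - old_addr;

        /* sll %g1, 11; srl %g1, 11 + 2  ==  keep bits 20..2 of the byte
         * offset, i.e. a 19-bit word displacement. */
        disp = (disp << 11) >> 13;
        return BRANCH_ALWAYS | disp;
}

int main(void)
{
        /* Hypothetical addresses, just to show the arithmetic. */
        uint32_t old_addr = 0x00402000, new_addr = 0x00403000;

        printf("patched word 0: 0x%08x\n", encode_branch(old_addr, new_addr));
        printf("patched word 1: 0x%08x\n", NOP);
        return 0;
}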

47
arch/sparc/lib/Makefile Normal file
@@ -0,0 +1,47 @@
# Makefile for Sparc library files..
#
asflags-y := -ansi -DST_DIV0=0x02
ccflags-y := -Werror
lib-$(CONFIG_SPARC32) += ashrdi3.o
lib-$(CONFIG_SPARC32) += memcpy.o memset.o
lib-y += strlen.o
lib-y += checksum_$(BITS).o
lib-$(CONFIG_SPARC32) += blockops.o
lib-y += memscan_$(BITS).o memcmp.o strncmp_$(BITS).o
lib-$(CONFIG_SPARC32) += divdi3.o udivdi3.o
lib-$(CONFIG_SPARC32) += copy_user.o locks.o
lib-$(CONFIG_SPARC64) += atomic_64.o
lib-$(CONFIG_SPARC32) += lshrdi3.o ashldi3.o
lib-$(CONFIG_SPARC32) += muldi3.o bitext.o cmpdi2.o
lib-$(CONFIG_SPARC64) += copy_page.o clear_page.o bzero.o
lib-$(CONFIG_SPARC64) += csum_copy.o csum_copy_from_user.o csum_copy_to_user.o
lib-$(CONFIG_SPARC64) += VISsave.o
lib-$(CONFIG_SPARC64) += bitops.o
lib-$(CONFIG_SPARC64) += U1memcpy.o U1copy_from_user.o U1copy_to_user.o
lib-$(CONFIG_SPARC64) += U3memcpy.o U3copy_from_user.o U3copy_to_user.o
lib-$(CONFIG_SPARC64) += U3patch.o
lib-$(CONFIG_SPARC64) += NGmemcpy.o NGcopy_from_user.o NGcopy_to_user.o
lib-$(CONFIG_SPARC64) += NGpatch.o NGpage.o NGbzero.o
lib-$(CONFIG_SPARC64) += NG2memcpy.o NG2copy_from_user.o NG2copy_to_user.o
lib-$(CONFIG_SPARC64) += NG2patch.o
lib-$(CONFIG_SPARC64) += NG4memcpy.o NG4copy_from_user.o NG4copy_to_user.o
lib-$(CONFIG_SPARC64) += NG4patch.o NG4copy_page.o NG4clear_page.o NG4memset.o
lib-$(CONFIG_SPARC64) += GENmemcpy.o GENcopy_from_user.o GENcopy_to_user.o
lib-$(CONFIG_SPARC64) += GENpatch.o GENpage.o GENbzero.o
lib-$(CONFIG_SPARC64) += copy_in_user.o user_fixup.o memmove.o
lib-$(CONFIG_SPARC64) += mcount.o ipcsum.o xor.o hweight.o ffs.o
obj-$(CONFIG_SPARC64) += iomap.o
obj-$(CONFIG_SPARC32) += atomic32.o ucmpdi2.o
obj-y += ksyms.o
obj-$(CONFIG_SPARC64) += PeeCeeI.o

35
arch/sparc/lib/NG2copy_from_user.S Normal file
@@ -0,0 +1,35 @@
/* NG2copy_from_user.S: Niagara-2 optimized copy from userspace.
*
* Copyright (C) 2007 David S. Miller (davem@davemloft.net)
*/
#define EX_LD(x) \
98: x; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_one_asi;\
.text; \
.align 4;
#ifndef ASI_AIUS
#define ASI_AIUS 0x11
#endif
#ifndef ASI_BLK_AIUS_4V
#define ASI_BLK_AIUS_4V 0x17
#endif
#define FUNC_NAME NG2copy_from_user
#define LOAD(type,addr,dest) type##a [addr] %asi, dest
#define LOAD_BLK(addr,dest) ldda [addr] ASI_BLK_AIUS_4V, dest
#define EX_RETVAL(x) 0
#ifdef __KERNEL__
#define PREAMBLE \
rd %asi, %g1; \
cmp %g1, ASI_AIUS; \
bne,pn %icc, ___copy_in_user; \
nop
#endif
#include "NG2memcpy.S"

44
arch/sparc/lib/NG2copy_to_user.S Normal file
@@ -0,0 +1,44 @@
/* NG2copy_to_user.S: Niagara-2 optimized copy to userspace.
*
* Copyright (C) 2007 David S. Miller (davem@davemloft.net)
*/
#define EX_ST(x) \
98: x; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_one_asi;\
.text; \
.align 4;
#ifndef ASI_AIUS
#define ASI_AIUS 0x11
#endif
#ifndef ASI_BLK_AIUS_4V
#define ASI_BLK_AIUS_4V 0x17
#endif
#ifndef ASI_BLK_INIT_QUAD_LDD_AIUS
#define ASI_BLK_INIT_QUAD_LDD_AIUS 0x23
#endif
#define FUNC_NAME NG2copy_to_user
#define STORE(type,src,addr) type##a src, [addr] ASI_AIUS
#define STORE_ASI ASI_BLK_INIT_QUAD_LDD_AIUS
#define STORE_BLK(src,addr) stda src, [addr] ASI_BLK_AIUS_4V
#define EX_RETVAL(x) 0
#ifdef __KERNEL__
/* Writing to %asi is _expensive_ so we hardcode it.
* Reading %asi to check for KERNEL_DS is comparatively
* cheap.
*/
#define PREAMBLE \
rd %asi, %g1; \
cmp %g1, ASI_AIUS; \
bne,pn %icc, ___copy_in_user; \
nop
#endif
#include "NG2memcpy.S"

521
arch/sparc/lib/NG2memcpy.S Normal file
@@ -0,0 +1,521 @@
/* NG2memcpy.S: Niagara-2 optimized memcpy.
*
* Copyright (C) 2007 David S. Miller (davem@davemloft.net)
*/
#ifdef __KERNEL__
#include <asm/visasm.h>
#include <asm/asi.h>
#define GLOBAL_SPARE %g7
#else
#define ASI_PNF 0x82
#define ASI_BLK_P 0xf0
#define ASI_BLK_INIT_QUAD_LDD_P 0xe2
#define FPRS_FEF 0x04
#ifdef MEMCPY_DEBUG
#define VISEntryHalf rd %fprs, %o5; wr %g0, FPRS_FEF, %fprs; \
clr %g1; clr %g2; clr %g3; clr %g5; subcc %g0, %g0, %g0;
#define VISExitHalf and %o5, FPRS_FEF, %o5; wr %o5, 0x0, %fprs
#else
#define VISEntryHalf rd %fprs, %o5; wr %g0, FPRS_FEF, %fprs
#define VISExitHalf and %o5, FPRS_FEF, %o5; wr %o5, 0x0, %fprs
#endif
#define GLOBAL_SPARE %g5
#endif
#ifndef STORE_ASI
#ifndef SIMULATE_NIAGARA_ON_NON_NIAGARA
#define STORE_ASI ASI_BLK_INIT_QUAD_LDD_P
#else
#define STORE_ASI 0x80 /* ASI_P */
#endif
#endif
#ifndef EX_LD
#define EX_LD(x) x
#endif
#ifndef EX_ST
#define EX_ST(x) x
#endif
#ifndef EX_RETVAL
#define EX_RETVAL(x) x
#endif
#ifndef LOAD
#define LOAD(type,addr,dest) type [addr], dest
#endif
#ifndef LOAD_BLK
#define LOAD_BLK(addr,dest) ldda [addr] ASI_BLK_P, dest
#endif
#ifndef STORE
#ifndef MEMCPY_DEBUG
#define STORE(type,src,addr) type src, [addr]
#else
#define STORE(type,src,addr) type##a src, [addr] 0x80
#endif
#endif
#ifndef STORE_BLK
#define STORE_BLK(src,addr) stda src, [addr] ASI_BLK_P
#endif
#ifndef STORE_INIT
#define STORE_INIT(src,addr) stxa src, [addr] STORE_ASI
#endif
#ifndef FUNC_NAME
#define FUNC_NAME NG2memcpy
#endif
#ifndef PREAMBLE
#define PREAMBLE
#endif
#ifndef XCC
#define XCC xcc
#endif
#define FREG_FROB(x0, x1, x2, x3, x4, x5, x6, x7, x8) \
faligndata %x0, %x1, %f0; \
faligndata %x1, %x2, %f2; \
faligndata %x2, %x3, %f4; \
faligndata %x3, %x4, %f6; \
faligndata %x4, %x5, %f8; \
faligndata %x5, %x6, %f10; \
faligndata %x6, %x7, %f12; \
faligndata %x7, %x8, %f14;
#define FREG_MOVE_1(x0) \
fsrc2 %x0, %f0;
#define FREG_MOVE_2(x0, x1) \
fsrc2 %x0, %f0; \
fsrc2 %x1, %f2;
#define FREG_MOVE_3(x0, x1, x2) \
fsrc2 %x0, %f0; \
fsrc2 %x1, %f2; \
fsrc2 %x2, %f4;
#define FREG_MOVE_4(x0, x1, x2, x3) \
fsrc2 %x0, %f0; \
fsrc2 %x1, %f2; \
fsrc2 %x2, %f4; \
fsrc2 %x3, %f6;
#define FREG_MOVE_5(x0, x1, x2, x3, x4) \
fsrc2 %x0, %f0; \
fsrc2 %x1, %f2; \
fsrc2 %x2, %f4; \
fsrc2 %x3, %f6; \
fsrc2 %x4, %f8;
#define FREG_MOVE_6(x0, x1, x2, x3, x4, x5) \
fsrc2 %x0, %f0; \
fsrc2 %x1, %f2; \
fsrc2 %x2, %f4; \
fsrc2 %x3, %f6; \
fsrc2 %x4, %f8; \
fsrc2 %x5, %f10;
#define FREG_MOVE_7(x0, x1, x2, x3, x4, x5, x6) \
fsrc2 %x0, %f0; \
fsrc2 %x1, %f2; \
fsrc2 %x2, %f4; \
fsrc2 %x3, %f6; \
fsrc2 %x4, %f8; \
fsrc2 %x5, %f10; \
fsrc2 %x6, %f12;
#define FREG_MOVE_8(x0, x1, x2, x3, x4, x5, x6, x7) \
fsrc2 %x0, %f0; \
fsrc2 %x1, %f2; \
fsrc2 %x2, %f4; \
fsrc2 %x3, %f6; \
fsrc2 %x4, %f8; \
fsrc2 %x5, %f10; \
fsrc2 %x6, %f12; \
fsrc2 %x7, %f14;
#define FREG_LOAD_1(base, x0) \
EX_LD(LOAD(ldd, base + 0x00, %x0))
#define FREG_LOAD_2(base, x0, x1) \
EX_LD(LOAD(ldd, base + 0x00, %x0)); \
EX_LD(LOAD(ldd, base + 0x08, %x1));
#define FREG_LOAD_3(base, x0, x1, x2) \
EX_LD(LOAD(ldd, base + 0x00, %x0)); \
EX_LD(LOAD(ldd, base + 0x08, %x1)); \
EX_LD(LOAD(ldd, base + 0x10, %x2));
#define FREG_LOAD_4(base, x0, x1, x2, x3) \
EX_LD(LOAD(ldd, base + 0x00, %x0)); \
EX_LD(LOAD(ldd, base + 0x08, %x1)); \
EX_LD(LOAD(ldd, base + 0x10, %x2)); \
EX_LD(LOAD(ldd, base + 0x18, %x3));
#define FREG_LOAD_5(base, x0, x1, x2, x3, x4) \
EX_LD(LOAD(ldd, base + 0x00, %x0)); \
EX_LD(LOAD(ldd, base + 0x08, %x1)); \
EX_LD(LOAD(ldd, base + 0x10, %x2)); \
EX_LD(LOAD(ldd, base + 0x18, %x3)); \
EX_LD(LOAD(ldd, base + 0x20, %x4));
#define FREG_LOAD_6(base, x0, x1, x2, x3, x4, x5) \
EX_LD(LOAD(ldd, base + 0x00, %x0)); \
EX_LD(LOAD(ldd, base + 0x08, %x1)); \
EX_LD(LOAD(ldd, base + 0x10, %x2)); \
EX_LD(LOAD(ldd, base + 0x18, %x3)); \
EX_LD(LOAD(ldd, base + 0x20, %x4)); \
EX_LD(LOAD(ldd, base + 0x28, %x5));
#define FREG_LOAD_7(base, x0, x1, x2, x3, x4, x5, x6) \
EX_LD(LOAD(ldd, base + 0x00, %x0)); \
EX_LD(LOAD(ldd, base + 0x08, %x1)); \
EX_LD(LOAD(ldd, base + 0x10, %x2)); \
EX_LD(LOAD(ldd, base + 0x18, %x3)); \
EX_LD(LOAD(ldd, base + 0x20, %x4)); \
EX_LD(LOAD(ldd, base + 0x28, %x5)); \
EX_LD(LOAD(ldd, base + 0x30, %x6));
.register %g2,#scratch
.register %g3,#scratch
.text
.align 64
.globl FUNC_NAME
.type FUNC_NAME,#function
FUNC_NAME: /* %o0=dst, %o1=src, %o2=len */
srlx %o2, 31, %g2
cmp %g2, 0
tne %xcc, 5
PREAMBLE
mov %o0, %o3
cmp %o2, 0
be,pn %XCC, 85f
or %o0, %o1, GLOBAL_SPARE
cmp %o2, 16
blu,a,pn %XCC, 80f
or GLOBAL_SPARE, %o2, GLOBAL_SPARE
/* 2 blocks (128 bytes) is the minimum we can do the block
* copy with. We need to ensure that we'll iterate at least
* once in the block copy loop. At worst we'll need to align
* the destination to a 64-byte boundary which can chew up
* to (64 - 1) bytes from the length before we perform the
* block copy loop.
*
* However, the cut-off point, performance wise, is around
* 4 64-byte blocks.
*/
cmp %o2, (4 * 64)
blu,pt %XCC, 75f
andcc GLOBAL_SPARE, 0x7, %g0
/* %o0: dst
* %o1: src
* %o2: len (known to be >= 128)
*
* The block copy loops can use %o4, %g2, %g3 as
* temporaries while copying the data. %o5 must
* be preserved between VISEntryHalf and VISExitHalf
*/
LOAD(prefetch, %o1 + 0x000, #one_read)
LOAD(prefetch, %o1 + 0x040, #one_read)
LOAD(prefetch, %o1 + 0x080, #one_read)
/* Align destination on 64-byte boundary. */
andcc %o0, (64 - 1), %o4
be,pt %XCC, 2f
sub %o4, 64, %o4
sub %g0, %o4, %o4 ! bytes to align dst
sub %o2, %o4, %o2
1: subcc %o4, 1, %o4
EX_LD(LOAD(ldub, %o1, %g1))
EX_ST(STORE(stb, %g1, %o0))
add %o1, 1, %o1
bne,pt %XCC, 1b
add %o0, 1, %o0
2:
/* Clobbers o5/g1/g2/g3/g7/icc/xcc. We must preserve
* o5 from here until we hit VISExitHalf.
*/
VISEntryHalf
membar #Sync
alignaddr %o1, %g0, %g0
add %o1, (64 - 1), %o4
andn %o4, (64 - 1), %o4
andn %o2, (64 - 1), %g1
sub %o2, %g1, %o2
and %o1, (64 - 1), %g2
add %o1, %g1, %o1
sub %o0, %o4, %g3
brz,pt %g2, 190f
cmp %g2, 32
blu,a 5f
cmp %g2, 16
cmp %g2, 48
blu,a 4f
cmp %g2, 40
cmp %g2, 56
blu 170f
nop
ba,a,pt %xcc, 180f
4: /* 32 <= low bits < 48 */
blu 150f
nop
ba,a,pt %xcc, 160f
5: /* 0 < low bits < 32 */
blu,a 6f
cmp %g2, 8
cmp %g2, 24
blu 130f
nop
ba,a,pt %xcc, 140f
6: /* 0 < low bits < 16 */
bgeu 120f
nop
/* fall through for 0 < low bits < 8 */
110: sub %o4, 64, %g2
EX_LD(LOAD_BLK(%g2, %f0))
1: EX_ST(STORE_INIT(%g0, %o4 + %g3))
EX_LD(LOAD_BLK(%o4, %f16))
FREG_FROB(f0, f2, f4, f6, f8, f10, f12, f14, f16)
EX_ST(STORE_BLK(%f0, %o4 + %g3))
FREG_MOVE_8(f16, f18, f20, f22, f24, f26, f28, f30)
subcc %g1, 64, %g1
add %o4, 64, %o4
bne,pt %xcc, 1b
LOAD(prefetch, %o4 + 64, #one_read)
ba,pt %xcc, 195f
nop
120: sub %o4, 56, %g2
FREG_LOAD_7(%g2, f0, f2, f4, f6, f8, f10, f12)
1: EX_ST(STORE_INIT(%g0, %o4 + %g3))
EX_LD(LOAD_BLK(%o4, %f16))
FREG_FROB(f0, f2, f4, f6, f8, f10, f12, f16, f18)
EX_ST(STORE_BLK(%f0, %o4 + %g3))
FREG_MOVE_7(f18, f20, f22, f24, f26, f28, f30)
subcc %g1, 64, %g1
add %o4, 64, %o4
bne,pt %xcc, 1b
LOAD(prefetch, %o4 + 64, #one_read)
ba,pt %xcc, 195f
nop
130: sub %o4, 48, %g2
FREG_LOAD_6(%g2, f0, f2, f4, f6, f8, f10)
1: EX_ST(STORE_INIT(%g0, %o4 + %g3))
EX_LD(LOAD_BLK(%o4, %f16))
FREG_FROB(f0, f2, f4, f6, f8, f10, f16, f18, f20)
EX_ST(STORE_BLK(%f0, %o4 + %g3))
FREG_MOVE_6(f20, f22, f24, f26, f28, f30)
subcc %g1, 64, %g1
add %o4, 64, %o4
bne,pt %xcc, 1b
LOAD(prefetch, %o4 + 64, #one_read)
ba,pt %xcc, 195f
nop
140: sub %o4, 40, %g2
FREG_LOAD_5(%g2, f0, f2, f4, f6, f8)
1: EX_ST(STORE_INIT(%g0, %o4 + %g3))
EX_LD(LOAD_BLK(%o4, %f16))
FREG_FROB(f0, f2, f4, f6, f8, f16, f18, f20, f22)
EX_ST(STORE_BLK(%f0, %o4 + %g3))
FREG_MOVE_5(f22, f24, f26, f28, f30)
subcc %g1, 64, %g1
add %o4, 64, %o4
bne,pt %xcc, 1b
LOAD(prefetch, %o4 + 64, #one_read)
ba,pt %xcc, 195f
nop
150: sub %o4, 32, %g2
FREG_LOAD_4(%g2, f0, f2, f4, f6)
1: EX_ST(STORE_INIT(%g0, %o4 + %g3))
EX_LD(LOAD_BLK(%o4, %f16))
FREG_FROB(f0, f2, f4, f6, f16, f18, f20, f22, f24)
EX_ST(STORE_BLK(%f0, %o4 + %g3))
FREG_MOVE_4(f24, f26, f28, f30)
subcc %g1, 64, %g1
add %o4, 64, %o4
bne,pt %xcc, 1b
LOAD(prefetch, %o4 + 64, #one_read)
ba,pt %xcc, 195f
nop
160: sub %o4, 24, %g2
FREG_LOAD_3(%g2, f0, f2, f4)
1: EX_ST(STORE_INIT(%g0, %o4 + %g3))
EX_LD(LOAD_BLK(%o4, %f16))
FREG_FROB(f0, f2, f4, f16, f18, f20, f22, f24, f26)
EX_ST(STORE_BLK(%f0, %o4 + %g3))
FREG_MOVE_3(f26, f28, f30)
subcc %g1, 64, %g1
add %o4, 64, %o4
bne,pt %xcc, 1b
LOAD(prefetch, %o4 + 64, #one_read)
ba,pt %xcc, 195f
nop
170: sub %o4, 16, %g2
FREG_LOAD_2(%g2, f0, f2)
1: EX_ST(STORE_INIT(%g0, %o4 + %g3))
EX_LD(LOAD_BLK(%o4, %f16))
FREG_FROB(f0, f2, f16, f18, f20, f22, f24, f26, f28)
EX_ST(STORE_BLK(%f0, %o4 + %g3))
FREG_MOVE_2(f28, f30)
subcc %g1, 64, %g1
add %o4, 64, %o4
bne,pt %xcc, 1b
LOAD(prefetch, %o4 + 64, #one_read)
ba,pt %xcc, 195f
nop
180: sub %o4, 8, %g2
FREG_LOAD_1(%g2, f0)
1: EX_ST(STORE_INIT(%g0, %o4 + %g3))
EX_LD(LOAD_BLK(%o4, %f16))
FREG_FROB(f0, f16, f18, f20, f22, f24, f26, f28, f30)
EX_ST(STORE_BLK(%f0, %o4 + %g3))
FREG_MOVE_1(f30)
subcc %g1, 64, %g1
add %o4, 64, %o4
bne,pt %xcc, 1b
LOAD(prefetch, %o4 + 64, #one_read)
ba,pt %xcc, 195f
nop
190:
1: EX_ST(STORE_INIT(%g0, %o4 + %g3))
subcc %g1, 64, %g1
EX_LD(LOAD_BLK(%o4, %f0))
EX_ST(STORE_BLK(%f0, %o4 + %g3))
add %o4, 64, %o4
bne,pt %xcc, 1b
LOAD(prefetch, %o4 + 64, #one_read)
195:
add %o4, %g3, %o0
membar #Sync
VISExitHalf
/* %o2 contains any final bytes still needed to be copied
* over. If anything is left, we copy it one byte at a time.
*/
brz,pt %o2, 85f
sub %o0, %o1, GLOBAL_SPARE
ba,a,pt %XCC, 90f
.align 64
75: /* 16 < len <= 64 */
bne,pn %XCC, 75f
sub %o0, %o1, GLOBAL_SPARE
72:
andn %o2, 0xf, %o4
and %o2, 0xf, %o2
1: subcc %o4, 0x10, %o4
EX_LD(LOAD(ldx, %o1, %o5))
add %o1, 0x08, %o1
EX_LD(LOAD(ldx, %o1, %g1))
sub %o1, 0x08, %o1
EX_ST(STORE(stx, %o5, %o1 + GLOBAL_SPARE))
add %o1, 0x8, %o1
EX_ST(STORE(stx, %g1, %o1 + GLOBAL_SPARE))
bgu,pt %XCC, 1b
add %o1, 0x8, %o1
73: andcc %o2, 0x8, %g0
be,pt %XCC, 1f
nop
sub %o2, 0x8, %o2
EX_LD(LOAD(ldx, %o1, %o5))
EX_ST(STORE(stx, %o5, %o1 + GLOBAL_SPARE))
add %o1, 0x8, %o1
1: andcc %o2, 0x4, %g0
be,pt %XCC, 1f
nop
sub %o2, 0x4, %o2
EX_LD(LOAD(lduw, %o1, %o5))
EX_ST(STORE(stw, %o5, %o1 + GLOBAL_SPARE))
add %o1, 0x4, %o1
1: cmp %o2, 0
be,pt %XCC, 85f
nop
ba,pt %xcc, 90f
nop
75:
andcc %o0, 0x7, %g1
sub %g1, 0x8, %g1
be,pn %icc, 2f
sub %g0, %g1, %g1
sub %o2, %g1, %o2
1: subcc %g1, 1, %g1
EX_LD(LOAD(ldub, %o1, %o5))
EX_ST(STORE(stb, %o5, %o1 + GLOBAL_SPARE))
bgu,pt %icc, 1b
add %o1, 1, %o1
2: add %o1, GLOBAL_SPARE, %o0
andcc %o1, 0x7, %g1
bne,pt %icc, 8f
sll %g1, 3, %g1
cmp %o2, 16
bgeu,pt %icc, 72b
nop
ba,a,pt %xcc, 73b
8: mov 64, GLOBAL_SPARE
andn %o1, 0x7, %o1
EX_LD(LOAD(ldx, %o1, %g2))
sub GLOBAL_SPARE, %g1, GLOBAL_SPARE
andn %o2, 0x7, %o4
sllx %g2, %g1, %g2
1: add %o1, 0x8, %o1
EX_LD(LOAD(ldx, %o1, %g3))
subcc %o4, 0x8, %o4
srlx %g3, GLOBAL_SPARE, %o5
or %o5, %g2, %o5
EX_ST(STORE(stx, %o5, %o0))
add %o0, 0x8, %o0
bgu,pt %icc, 1b
sllx %g3, %g1, %g2
srl %g1, 3, %g1
andcc %o2, 0x7, %o2
be,pn %icc, 85f
add %o1, %g1, %o1
ba,pt %xcc, 90f
sub %o0, %o1, GLOBAL_SPARE
.align 64
80: /* 0 < len <= 16 */
andcc GLOBAL_SPARE, 0x3, %g0
bne,pn %XCC, 90f
sub %o0, %o1, GLOBAL_SPARE
1:
subcc %o2, 4, %o2
EX_LD(LOAD(lduw, %o1, %g1))
EX_ST(STORE(stw, %g1, %o1 + GLOBAL_SPARE))
bgu,pt %XCC, 1b
add %o1, 4, %o1
85: retl
mov EX_RETVAL(%o3), %o0
.align 32
90:
subcc %o2, 1, %o2
EX_LD(LOAD(ldub, %o1, %g1))
EX_ST(STORE(stb, %g1, %o1 + GLOBAL_SPARE))
bgu,pt %XCC, 90b
add %o1, 1, %o1
retl
mov EX_RETVAL(%o3), %o0
.size FUNC_NAME, .-FUNC_NAME
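Structurally, NG2memcpy dispatches on length: very short copies take the byte/word paths, copies below 4 * 64 bytes take the non-block path, and larger copies align the destination to a 64-byte boundary and run the VIS block loop, with any leftover bytes (the final %o2 remainder noted in the comment above) copied one at a time. A greatly simplified C skeleton of that shape, omitting the faligndata source alignment and the initializing block stores:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void *ng2_shape_sketch(void *dst, const void *src, size_t len)
{
        unsigned char *d = dst;
        const unsigned char *s = src;

        if (len < 4 * 64) {
                /* The tiny and medium paths are collapsed into one
                 * simple loop for this sketch. */
                while (len--)
                        *d++ = *s++;
                return dst;
        }

        /* Large path: align the destination to 64 bytes first. */
        while (((uintptr_t)d & 63) && len) {
                *d++ = *s++;
                len--;
        }

        /* Main loop: whole 64-byte blocks (the real code uses block
         * loads/stores and faligndata for unaligned sources). */
        while (len >= 64) {
                memcpy(d, s, 64);
                d += 64;
                s += 64;
                len -= 64;
        }

        /* Any final bytes are copied one at a time. */
        while (len--)
                *d++ = *s++;
        return dst;
}

int main(void)
{
        char src[300], dst[300];

        for (int i = 0; i < 300; i++)
                src[i] = (char)i;
        ng2_shape_sketch(dst, src, sizeof(src));
        return memcmp(dst, src, sizeof(src)) != 0;
}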

33
arch/sparc/lib/NG2patch.S Normal file
@@ -0,0 +1,33 @@
/* NG2patch.S: Patch Ultra-I routines with Niagara-2 variant.
*
* Copyright (C) 2007 David S. Miller <davem@davemloft.net>
*/
#define BRANCH_ALWAYS 0x10680000
#define NOP 0x01000000
#define NG_DO_PATCH(OLD, NEW) \
sethi %hi(NEW), %g1; \
or %g1, %lo(NEW), %g1; \
sethi %hi(OLD), %g2; \
or %g2, %lo(OLD), %g2; \
sub %g1, %g2, %g1; \
sethi %hi(BRANCH_ALWAYS), %g3; \
sll %g1, 11, %g1; \
srl %g1, 11 + 2, %g1; \
or %g3, %lo(BRANCH_ALWAYS), %g3; \
or %g3, %g1, %g3; \
stw %g3, [%g2]; \
sethi %hi(NOP), %g3; \
or %g3, %lo(NOP), %g3; \
stw %g3, [%g2 + 0x4]; \
flush %g2;
.globl niagara2_patch_copyops
.type niagara2_patch_copyops,#function
niagara2_patch_copyops:
NG_DO_PATCH(memcpy, NG2memcpy)
NG_DO_PATCH(___copy_from_user, NG2copy_from_user)
NG_DO_PATCH(___copy_to_user, NG2copy_to_user)
retl
nop
.size niagara2_patch_copyops,.-niagara2_patch_copyops

29
arch/sparc/lib/NG4clear_page.S Normal file
@@ -0,0 +1,29 @@
/* NG4copy_page.S: Niagara-4 optimized clear page.
*
* Copyright (C) 2012 (davem@davemloft.net)
*/
#include <asm/asi.h>
#include <asm/page.h>
.text
.register %g3, #scratch
.align 32
.globl NG4clear_page
.globl NG4clear_user_page
NG4clear_page: /* %o0=dest */
NG4clear_user_page: /* %o0=dest, %o1=vaddr */
set PAGE_SIZE, %g7
mov 0x20, %g3
1: stxa %g0, [%o0 + %g0] ASI_ST_BLKINIT_MRU_P
subcc %g7, 0x40, %g7
stxa %g0, [%o0 + %g3] ASI_ST_BLKINIT_MRU_P
bne,pt %xcc, 1b
add %o0, 0x40, %o0
membar #StoreLoad|#StoreStore
retl
nop
.size NG4clear_page,.-NG4clear_page
.size NG4clear_user_page,.-NG4clear_user_page

30
arch/sparc/lib/NG4copy_from_user.S Normal file
@@ -0,0 +1,30 @@
/* NG4copy_from_user.S: Niagara-4 optimized copy from userspace.
*
* Copyright (C) 2012 David S. Miller (davem@davemloft.net)
*/
#define EX_LD(x) \
98: x; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_one_asi;\
.text; \
.align 4;
#ifndef ASI_AIUS
#define ASI_AIUS 0x11
#endif
#define FUNC_NAME NG4copy_from_user
#define LOAD(type,addr,dest) type##a [addr] %asi, dest
#define EX_RETVAL(x) 0
#ifdef __KERNEL__
#define PREAMBLE \
rd %asi, %g1; \
cmp %g1, ASI_AIUS; \
bne,pn %icc, ___copy_in_user; \
nop
#endif
#include "NG4memcpy.S"

57
arch/sparc/lib/NG4copy_page.S Normal file
@@ -0,0 +1,57 @@
/* NG4copy_page.S: Niagara-4 optimized copy page.
*
* Copyright (C) 2012 (davem@davemloft.net)
*/
#include <asm/asi.h>
#include <asm/page.h>
.text
.align 32
.register %g2, #scratch
.register %g3, #scratch
.globl NG4copy_user_page
NG4copy_user_page: /* %o0=dest, %o1=src, %o2=vaddr */
prefetch [%o1 + 0x000], #n_reads_strong
prefetch [%o1 + 0x040], #n_reads_strong
prefetch [%o1 + 0x080], #n_reads_strong
prefetch [%o1 + 0x0c0], #n_reads_strong
set PAGE_SIZE, %g7
prefetch [%o1 + 0x100], #n_reads_strong
prefetch [%o1 + 0x140], #n_reads_strong
prefetch [%o1 + 0x180], #n_reads_strong
prefetch [%o1 + 0x1c0], #n_reads_strong
1:
ldx [%o1 + 0x00], %o2
subcc %g7, 0x40, %g7
ldx [%o1 + 0x08], %o3
ldx [%o1 + 0x10], %o4
ldx [%o1 + 0x18], %o5
ldx [%o1 + 0x20], %g1
stxa %o2, [%o0] ASI_ST_BLKINIT_MRU_P
add %o0, 0x08, %o0
ldx [%o1 + 0x28], %g2
stxa %o3, [%o0] ASI_ST_BLKINIT_MRU_P
add %o0, 0x08, %o0
ldx [%o1 + 0x30], %g3
stxa %o4, [%o0] ASI_ST_BLKINIT_MRU_P
add %o0, 0x08, %o0
ldx [%o1 + 0x38], %o2
add %o1, 0x40, %o1
stxa %o5, [%o0] ASI_ST_BLKINIT_MRU_P
add %o0, 0x08, %o0
stxa %g1, [%o0] ASI_ST_BLKINIT_MRU_P
add %o0, 0x08, %o0
stxa %g2, [%o0] ASI_ST_BLKINIT_MRU_P
add %o0, 0x08, %o0
stxa %g3, [%o0] ASI_ST_BLKINIT_MRU_P
add %o0, 0x08, %o0
stxa %o2, [%o0] ASI_ST_BLKINIT_MRU_P
add %o0, 0x08, %o0
bne,pt %icc, 1b
prefetch [%o1 + 0x200], #n_reads_strong
retl
membar #StoreLoad | #StoreStore
.size NG4copy_user_page,.-NG4copy_user_page

39
arch/sparc/lib/NG4copy_to_user.S Normal file
@@ -0,0 +1,39 @@
/* NG4copy_to_user.S: Niagara-4 optimized copy to userspace.
*
* Copyright (C) 2012 David S. Miller (davem@davemloft.net)
*/
#define EX_ST(x) \
98: x; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_one_asi;\
.text; \
.align 4;
#ifndef ASI_AIUS
#define ASI_AIUS 0x11
#endif
#ifndef ASI_BLK_INIT_QUAD_LDD_AIUS
#define ASI_BLK_INIT_QUAD_LDD_AIUS 0x23
#endif
#define FUNC_NAME NG4copy_to_user
#define STORE(type,src,addr) type##a src, [addr] %asi
#define STORE_ASI ASI_BLK_INIT_QUAD_LDD_AIUS
#define EX_RETVAL(x) 0
#ifdef __KERNEL__
/* Writing to %asi is _expensive_ so we hardcode it.
* Reading %asi to check for KERNEL_DS is comparatively
* cheap.
*/
#define PREAMBLE \
rd %asi, %g1; \
cmp %g1, ASI_AIUS; \
bne,pn %icc, ___copy_in_user; \
nop
#endif
#include "NG4memcpy.S"

372
arch/sparc/lib/NG4memcpy.S Normal file
@@ -0,0 +1,372 @@
/* NG4memcpy.S: Niagara-4 optimized memcpy.
*
* Copyright (C) 2012 David S. Miller (davem@davemloft.net)
*/
#ifdef __KERNEL__
#include <asm/visasm.h>
#include <asm/asi.h>
#define GLOBAL_SPARE %g7
#else
#define ASI_BLK_INIT_QUAD_LDD_P 0xe2
#define FPRS_FEF 0x04
/* On T4 it is very expensive to access ASRs like %fprs and
* %asi, avoiding a read or a write can save ~50 cycles.
*/
#define FPU_ENTER \
rd %fprs, %o5; \
andcc %o5, FPRS_FEF, %g0; \
be,a,pn %icc, 999f; \
wr %g0, FPRS_FEF, %fprs; \
999:
#ifdef MEMCPY_DEBUG
#define VISEntryHalf FPU_ENTER; \
clr %g1; clr %g2; clr %g3; clr %g5; subcc %g0, %g0, %g0;
#define VISExitHalf and %o5, FPRS_FEF, %o5; wr %o5, 0x0, %fprs
#else
#define VISEntryHalf FPU_ENTER
#define VISExitHalf and %o5, FPRS_FEF, %o5; wr %o5, 0x0, %fprs
#endif
#define GLOBAL_SPARE %g5
#endif
#ifndef STORE_ASI
#ifndef SIMULATE_NIAGARA_ON_NON_NIAGARA
#define STORE_ASI ASI_BLK_INIT_QUAD_LDD_P
#else
#define STORE_ASI 0x80 /* ASI_P */
#endif
#endif
#if !defined(EX_LD) && !defined(EX_ST)
#define NON_USER_COPY
#endif
#ifndef EX_LD
#define EX_LD(x) x
#endif
#ifndef EX_ST
#define EX_ST(x) x
#endif
#ifndef EX_RETVAL
#define EX_RETVAL(x) x
#endif
#ifndef LOAD
#define LOAD(type,addr,dest) type [addr], dest
#endif
#ifndef STORE
#ifndef MEMCPY_DEBUG
#define STORE(type,src,addr) type src, [addr]
#else
#define STORE(type,src,addr) type##a src, [addr] %asi
#endif
#endif
#ifndef STORE_INIT
#define STORE_INIT(src,addr) stxa src, [addr] STORE_ASI
#endif
#ifndef FUNC_NAME
#define FUNC_NAME NG4memcpy
#endif
#ifndef PREAMBLE
#define PREAMBLE
#endif
#ifndef XCC
#define XCC xcc
#endif
.register %g2,#scratch
.register %g3,#scratch
.text
.align 64
.globl FUNC_NAME
.type FUNC_NAME,#function
FUNC_NAME: /* %o0=dst, %o1=src, %o2=len */
#ifdef MEMCPY_DEBUG
wr %g0, 0x80, %asi
#endif
srlx %o2, 31, %g2
cmp %g2, 0
tne %XCC, 5
PREAMBLE
mov %o0, %o3
brz,pn %o2, .Lexit
cmp %o2, 3
ble,pn %icc, .Ltiny
cmp %o2, 19
ble,pn %icc, .Lsmall
or %o0, %o1, %g2
cmp %o2, 128
bl,pn %icc, .Lmedium
nop
.Llarge:/* len >= 0x80 */
/* First get dest 8 byte aligned. */
sub %g0, %o0, %g1
and %g1, 0x7, %g1
brz,pt %g1, 51f
sub %o2, %g1, %o2
1: EX_LD(LOAD(ldub, %o1 + 0x00, %g2))
add %o1, 1, %o1
subcc %g1, 1, %g1
add %o0, 1, %o0
bne,pt %icc, 1b
EX_ST(STORE(stb, %g2, %o0 - 0x01))
51: LOAD(prefetch, %o1 + 0x040, #n_reads_strong)
LOAD(prefetch, %o1 + 0x080, #n_reads_strong)
LOAD(prefetch, %o1 + 0x0c0, #n_reads_strong)
LOAD(prefetch, %o1 + 0x100, #n_reads_strong)
LOAD(prefetch, %o1 + 0x140, #n_reads_strong)
LOAD(prefetch, %o1 + 0x180, #n_reads_strong)
LOAD(prefetch, %o1 + 0x1c0, #n_reads_strong)
LOAD(prefetch, %o1 + 0x200, #n_reads_strong)
/* Check if we can use the straight fully aligned
* loop, or we require the alignaddr/faligndata variant.
*/
andcc %o1, 0x7, %o5
bne,pn %icc, .Llarge_src_unaligned
sub %g0, %o0, %g1
/* Legitimize the use of initializing stores by getting dest
* to be 64-byte aligned.
*/
and %g1, 0x3f, %g1
brz,pt %g1, .Llarge_aligned
sub %o2, %g1, %o2
1: EX_LD(LOAD(ldx, %o1 + 0x00, %g2))
add %o1, 8, %o1
subcc %g1, 8, %g1
add %o0, 8, %o0
bne,pt %icc, 1b
EX_ST(STORE(stx, %g2, %o0 - 0x08))
.Llarge_aligned:
/* len >= 0x80 && src 8-byte aligned && dest 8-byte aligned */
andn %o2, 0x3f, %o4
sub %o2, %o4, %o2
1: EX_LD(LOAD(ldx, %o1 + 0x00, %g1))
add %o1, 0x40, %o1
EX_LD(LOAD(ldx, %o1 - 0x38, %g2))
subcc %o4, 0x40, %o4
EX_LD(LOAD(ldx, %o1 - 0x30, %g3))
EX_LD(LOAD(ldx, %o1 - 0x28, GLOBAL_SPARE))
EX_LD(LOAD(ldx, %o1 - 0x20, %o5))
EX_ST(STORE_INIT(%g1, %o0))
add %o0, 0x08, %o0
EX_ST(STORE_INIT(%g2, %o0))
add %o0, 0x08, %o0
EX_LD(LOAD(ldx, %o1 - 0x18, %g2))
EX_ST(STORE_INIT(%g3, %o0))
add %o0, 0x08, %o0
EX_LD(LOAD(ldx, %o1 - 0x10, %g3))
EX_ST(STORE_INIT(GLOBAL_SPARE, %o0))
add %o0, 0x08, %o0
EX_LD(LOAD(ldx, %o1 - 0x08, GLOBAL_SPARE))
EX_ST(STORE_INIT(%o5, %o0))
add %o0, 0x08, %o0
EX_ST(STORE_INIT(%g2, %o0))
add %o0, 0x08, %o0
EX_ST(STORE_INIT(%g3, %o0))
add %o0, 0x08, %o0
EX_ST(STORE_INIT(GLOBAL_SPARE, %o0))
add %o0, 0x08, %o0
bne,pt %icc, 1b
LOAD(prefetch, %o1 + 0x200, #n_reads_strong)
membar #StoreLoad | #StoreStore
brz,pn %o2, .Lexit
cmp %o2, 19
ble,pn %icc, .Lsmall_unaligned
nop
ba,a,pt %icc, .Lmedium_noprefetch
.Lexit: retl
mov EX_RETVAL(%o3), %o0
.Llarge_src_unaligned:
#ifdef NON_USER_COPY
VISEntryHalfFast(.Lmedium_vis_entry_fail)
#else
VISEntryHalf
#endif
andn %o2, 0x3f, %o4
sub %o2, %o4, %o2
alignaddr %o1, %g0, %g1
add %o1, %o4, %o1
EX_LD(LOAD(ldd, %g1 + 0x00, %f0))
1: EX_LD(LOAD(ldd, %g1 + 0x08, %f2))
subcc %o4, 0x40, %o4
EX_LD(LOAD(ldd, %g1 + 0x10, %f4))
EX_LD(LOAD(ldd, %g1 + 0x18, %f6))
EX_LD(LOAD(ldd, %g1 + 0x20, %f8))
EX_LD(LOAD(ldd, %g1 + 0x28, %f10))
EX_LD(LOAD(ldd, %g1 + 0x30, %f12))
EX_LD(LOAD(ldd, %g1 + 0x38, %f14))
faligndata %f0, %f2, %f16
EX_LD(LOAD(ldd, %g1 + 0x40, %f0))
faligndata %f2, %f4, %f18
add %g1, 0x40, %g1
faligndata %f4, %f6, %f20
faligndata %f6, %f8, %f22
faligndata %f8, %f10, %f24
faligndata %f10, %f12, %f26
faligndata %f12, %f14, %f28
faligndata %f14, %f0, %f30
EX_ST(STORE(std, %f16, %o0 + 0x00))
EX_ST(STORE(std, %f18, %o0 + 0x08))
EX_ST(STORE(std, %f20, %o0 + 0x10))
EX_ST(STORE(std, %f22, %o0 + 0x18))
EX_ST(STORE(std, %f24, %o0 + 0x20))
EX_ST(STORE(std, %f26, %o0 + 0x28))
EX_ST(STORE(std, %f28, %o0 + 0x30))
EX_ST(STORE(std, %f30, %o0 + 0x38))
add %o0, 0x40, %o0
bne,pt %icc, 1b
LOAD(prefetch, %g1 + 0x200, #n_reads_strong)
VISExitHalf
brz,pn %o2, .Lexit
cmp %o2, 19
ble,pn %icc, .Lsmall_unaligned
nop
ba,a,pt %icc, .Lmedium_unaligned
#ifdef NON_USER_COPY
.Lmedium_vis_entry_fail:
or %o0, %o1, %g2
#endif
.Lmedium:
LOAD(prefetch, %o1 + 0x40, #n_reads_strong)
andcc %g2, 0x7, %g0
bne,pn %icc, .Lmedium_unaligned
nop
.Lmedium_noprefetch:
andncc %o2, 0x20 - 1, %o5
be,pn %icc, 2f
sub %o2, %o5, %o2
1: EX_LD(LOAD(ldx, %o1 + 0x00, %g1))
EX_LD(LOAD(ldx, %o1 + 0x08, %g2))
EX_LD(LOAD(ldx, %o1 + 0x10, GLOBAL_SPARE))
EX_LD(LOAD(ldx, %o1 + 0x18, %o4))
add %o1, 0x20, %o1
subcc %o5, 0x20, %o5
EX_ST(STORE(stx, %g1, %o0 + 0x00))
EX_ST(STORE(stx, %g2, %o0 + 0x08))
EX_ST(STORE(stx, GLOBAL_SPARE, %o0 + 0x10))
EX_ST(STORE(stx, %o4, %o0 + 0x18))
bne,pt %icc, 1b
add %o0, 0x20, %o0
2: andcc %o2, 0x18, %o5
be,pt %icc, 3f
sub %o2, %o5, %o2
1: EX_LD(LOAD(ldx, %o1 + 0x00, %g1))
add %o1, 0x08, %o1
add %o0, 0x08, %o0
subcc %o5, 0x08, %o5
bne,pt %icc, 1b
EX_ST(STORE(stx, %g1, %o0 - 0x08))
3: brz,pt %o2, .Lexit
cmp %o2, 0x04
bl,pn %icc, .Ltiny
nop
EX_LD(LOAD(lduw, %o1 + 0x00, %g1))
add %o1, 0x04, %o1
add %o0, 0x04, %o0
subcc %o2, 0x04, %o2
bne,pn %icc, .Ltiny
EX_ST(STORE(stw, %g1, %o0 - 0x04))
ba,a,pt %icc, .Lexit
.Lmedium_unaligned:
/* First get dest 8 byte aligned. */
sub %g0, %o0, %g1
and %g1, 0x7, %g1
brz,pt %g1, 2f
sub %o2, %g1, %o2
1: EX_LD(LOAD(ldub, %o1 + 0x00, %g2))
add %o1, 1, %o1
subcc %g1, 1, %g1
add %o0, 1, %o0
bne,pt %icc, 1b
EX_ST(STORE(stb, %g2, %o0 - 0x01))
2:
and %o1, 0x7, %g1
brz,pn %g1, .Lmedium_noprefetch
sll %g1, 3, %g1
mov 64, %g2
sub %g2, %g1, %g2
andn %o1, 0x7, %o1
EX_LD(LOAD(ldx, %o1 + 0x00, %o4))
sllx %o4, %g1, %o4
andn %o2, 0x08 - 1, %o5
sub %o2, %o5, %o2
1: EX_LD(LOAD(ldx, %o1 + 0x08, %g3))
add %o1, 0x08, %o1
subcc %o5, 0x08, %o5
srlx %g3, %g2, GLOBAL_SPARE
or GLOBAL_SPARE, %o4, GLOBAL_SPARE
EX_ST(STORE(stx, GLOBAL_SPARE, %o0 + 0x00))
add %o0, 0x08, %o0
bne,pt %icc, 1b
sllx %g3, %g1, %o4
srl %g1, 3, %g1
add %o1, %g1, %o1
brz,pn %o2, .Lexit
nop
ba,pt %icc, .Lsmall_unaligned
.Ltiny:
EX_LD(LOAD(ldub, %o1 + 0x00, %g1))
subcc %o2, 1, %o2
be,pn %icc, .Lexit
EX_ST(STORE(stb, %g1, %o0 + 0x00))
EX_LD(LOAD(ldub, %o1 + 0x01, %g1))
subcc %o2, 1, %o2
be,pn %icc, .Lexit
EX_ST(STORE(stb, %g1, %o0 + 0x01))
EX_LD(LOAD(ldub, %o1 + 0x02, %g1))
ba,pt %icc, .Lexit
EX_ST(STORE(stb, %g1, %o0 + 0x02))
.Lsmall:
andcc %g2, 0x3, %g0
bne,pn %icc, .Lsmall_unaligned
andn %o2, 0x4 - 1, %o5
sub %o2, %o5, %o2
1:
EX_LD(LOAD(lduw, %o1 + 0x00, %g1))
add %o1, 0x04, %o1
subcc %o5, 0x04, %o5
add %o0, 0x04, %o0
bne,pt %icc, 1b
EX_ST(STORE(stw, %g1, %o0 - 0x04))
brz,pt %o2, .Lexit
nop
ba,a,pt %icc, .Ltiny
.Lsmall_unaligned:
1: EX_LD(LOAD(ldub, %o1 + 0x00, %g1))
add %o1, 1, %o1
add %o0, 1, %o0
subcc %o2, 1, %o2
bne,pt %icc, 1b
EX_ST(STORE(stb, %g1, %o0 - 0x01))
ba,a,pt %icc, .Lexit
.size FUNC_NAME, .-FUNC_NAME
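The .Lmedium_unaligned path above handles a destination that is 8-byte aligned while the source is not: it keeps one partially consumed doubleword in a register and merges it with the next load using sllx/srlx/or. A minimal C sketch of that merge, assuming big-endian byte order and a non-zero source offset (the function and parameter names are invented, not part of the kernel):

        #include <stdint.h>
        #include <stddef.h>

        /* Illustrative sketch of NG4memcpy's .Lmedium_unaligned merge; the real
         * code branches to an aligned loop when off == 0, so off != 0 here. */
        static void merge_copy(uint64_t *dst, const unsigned char *src, size_t ndoubles)
        {
                unsigned int off = (uintptr_t)src & 7;          /* source misalignment  */
                unsigned int lsh = off * 8;                     /* %g1 in the assembly  */
                unsigned int rsh = 64 - lsh;                    /* %g2 in the assembly  */
                const uint64_t *s = (const uint64_t *)((uintptr_t)src - off);
                uint64_t hold = *s++ << lsh;                    /* leading partial word */

                while (ndoubles--) {
                        uint64_t next = *s++;
                        *dst++ = hold | (next >> rsh);          /* big-endian merge     */
                        hold = next << lsh;
                }
        }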

105
arch/sparc/lib/NG4memset.S Normal file

@ -0,0 +1,105 @@
/* NG4memset.S: Niagara-4 optimized memset/bzero.
*
* Copyright (C) 2012 David S. Miller (davem@davemloft.net)
*/
#include <asm/asi.h>
.register %g2, #scratch
.register %g3, #scratch
.text
.align 32
.globl NG4memset
NG4memset:
andcc %o1, 0xff, %o4
be,pt %icc, 1f
mov %o2, %o1
sllx %o4, 8, %g1
or %g1, %o4, %o2
sllx %o2, 16, %g1
or %g1, %o2, %o2
sllx %o2, 32, %g1
ba,pt %icc, 1f
or %g1, %o2, %o4
.size NG4memset,.-NG4memset
.align 32
.globl NG4bzero
NG4bzero:
clr %o4
1: cmp %o1, 16
ble %icc, .Ltiny
mov %o0, %o3
sub %g0, %o0, %g1
and %g1, 0x7, %g1
brz,pt %g1, .Laligned8
sub %o1, %g1, %o1
1: stb %o4, [%o0 + 0x00]
subcc %g1, 1, %g1
bne,pt %icc, 1b
add %o0, 1, %o0
.Laligned8:
cmp %o1, 64 + (64 - 8)
ble .Lmedium
sub %g0, %o0, %g1
andcc %g1, (64 - 1), %g1
brz,pn %g1, .Laligned64
sub %o1, %g1, %o1
1: stx %o4, [%o0 + 0x00]
subcc %g1, 8, %g1
bne,pt %icc, 1b
add %o0, 0x8, %o0
.Laligned64:
andn %o1, 64 - 1, %g1
sub %o1, %g1, %o1
brnz,pn %o4, .Lnon_bzero_loop
mov 0x20, %g2
1: stxa %o4, [%o0 + %g0] ASI_BLK_INIT_QUAD_LDD_P
subcc %g1, 0x40, %g1
stxa %o4, [%o0 + %g2] ASI_BLK_INIT_QUAD_LDD_P
bne,pt %icc, 1b
add %o0, 0x40, %o0
.Lpostloop:
cmp %o1, 8
bl,pn %icc, .Ltiny
membar #StoreStore|#StoreLoad
.Lmedium:
andn %o1, 0x7, %g1
sub %o1, %g1, %o1
1: stx %o4, [%o0 + 0x00]
subcc %g1, 0x8, %g1
bne,pt %icc, 1b
add %o0, 0x08, %o0
andcc %o1, 0x4, %g1
be,pt %icc, .Ltiny
sub %o1, %g1, %o1
stw %o4, [%o0 + 0x00]
add %o0, 0x4, %o0
.Ltiny:
cmp %o1, 0
be,pn %icc, .Lexit
1: subcc %o1, 1, %o1
stb %o4, [%o0 + 0x00]
bne,pt %icc, 1b
add %o0, 1, %o0
.Lexit:
retl
mov %o3, %o0
.Lnon_bzero_loop:
mov 0x08, %g3
mov 0x28, %o5
1: stxa %o4, [%o0 + %g0] ASI_BLK_INIT_QUAD_LDD_P
subcc %g1, 0x40, %g1
stxa %o4, [%o0 + %g2] ASI_BLK_INIT_QUAD_LDD_P
stxa %o4, [%o0 + %g3] ASI_BLK_INIT_QUAD_LDD_P
stxa %o4, [%o0 + %o5] ASI_BLK_INIT_QUAD_LDD_P
add %o0, 0x10, %o0
stxa %o4, [%o0 + %g0] ASI_BLK_INIT_QUAD_LDD_P
stxa %o4, [%o0 + %g2] ASI_BLK_INIT_QUAD_LDD_P
stxa %o4, [%o0 + %g3] ASI_BLK_INIT_QUAD_LDD_P
stxa %o4, [%o0 + %o5] ASI_BLK_INIT_QUAD_LDD_P
bne,pt %icc, 1b
add %o0, 0x30, %o0
ba,a,pt %icc, .Lpostloop
.size NG4bzero,.-NG4bzero
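NG4memset's entry code widens the fill byte into a 64-bit store pattern with three shift/or pairs before falling into the shared NG4bzero body. A small C sketch of that replication step (the function name is invented):

        #include <stdint.h>

        static uint64_t replicate_fill_byte(uint8_t c)
        {
                uint64_t pat = c;

                pat |= pat << 8;        /* 0xcc       -> 0xcccc              */
                pat |= pat << 16;       /* 0xcccc     -> 0xcccccccc          */
                pat |= pat << 32;       /* 0xcccccccc -> 0xcccccccccccccccc  */
                return pat;             /* ends up in %o4 in the assembly    */
        }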

54
arch/sparc/lib/NG4patch.S Normal file

@ -0,0 +1,54 @@
/* NG4patch.S: Patch Ultra-I routines with Niagara-4 variant.
*
* Copyright (C) 2012 David S. Miller <davem@davemloft.net>
*/
#define BRANCH_ALWAYS 0x10680000
#define NOP 0x01000000
#define NG_DO_PATCH(OLD, NEW) \
sethi %hi(NEW), %g1; \
or %g1, %lo(NEW), %g1; \
sethi %hi(OLD), %g2; \
or %g2, %lo(OLD), %g2; \
sub %g1, %g2, %g1; \
sethi %hi(BRANCH_ALWAYS), %g3; \
sll %g1, 11, %g1; \
srl %g1, 11 + 2, %g1; \
or %g3, %lo(BRANCH_ALWAYS), %g3; \
or %g3, %g1, %g3; \
stw %g3, [%g2]; \
sethi %hi(NOP), %g3; \
or %g3, %lo(NOP), %g3; \
stw %g3, [%g2 + 0x4]; \
flush %g2;
.globl niagara4_patch_copyops
.type niagara4_patch_copyops,#function
niagara4_patch_copyops:
NG_DO_PATCH(memcpy, NG4memcpy)
NG_DO_PATCH(___copy_from_user, NG4copy_from_user)
NG_DO_PATCH(___copy_to_user, NG4copy_to_user)
retl
nop
.size niagara4_patch_copyops,.-niagara4_patch_copyops
.globl niagara4_patch_bzero
.type niagara4_patch_bzero,#function
niagara4_patch_bzero:
NG_DO_PATCH(memset, NG4memset)
NG_DO_PATCH(__bzero, NG4bzero)
NG_DO_PATCH(__clear_user, NGclear_user)
NG_DO_PATCH(tsb_init, NGtsb_init)
retl
nop
.size niagara4_patch_bzero,.-niagara4_patch_bzero
.globl niagara4_patch_pageops
.type niagara4_patch_pageops,#function
niagara4_patch_pageops:
NG_DO_PATCH(copy_user_page, NG4copy_user_page)
NG_DO_PATCH(_clear_page, NG4clear_page)
NG_DO_PATCH(clear_user_page, NG4clear_user_page)
retl
nop
.size niagara4_patch_pageops,.-niagara4_patch_pageops
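NG_DO_PATCH overwrites the first two instructions of the generic routine with a "ba,pt %xcc" to the Niagara-4 version plus a nop in the delay slot. A hedged C sketch of the two instruction words it builds (the helper name and types are mine; the real macro stores the words with stw and then flushes the I-cache line):

        #include <stdint.h>

        #define BRANCH_ALWAYS   0x10680000u     /* ba,pt %xcc, <disp19> opcode */
        #define NOP             0x01000000u

        static void build_patch_words(uint32_t insn[2],
                                      unsigned long old_addr, unsigned long new_addr)
        {
                uint32_t delta = (uint32_t)(new_addr - old_addr);

                /* Keep the low 19 bits of the word displacement, exactly what
                 * the sll-by-11 / srl-by-13 pair in the macro computes. */
                insn[0] = BRANCH_ALWAYS | ((delta << 11) >> 13);
                insn[1] = NOP;                  /* delay slot */
        }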

160
arch/sparc/lib/NGbzero.S Normal file

@ -0,0 +1,160 @@
/* NGbzero.S: Niagara optimized memset/clear_user.
*
* Copyright (C) 2006 David S. Miller (davem@davemloft.net)
*/
#include <asm/asi.h>
#define EX_ST(x,y) \
98: x,y; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_o1; \
.text; \
.align 4;
.text
.globl NGmemset
.type NGmemset, #function
NGmemset: /* %o0=buf, %o1=pat, %o2=len */
and %o1, 0xff, %o3
mov %o2, %o1
sllx %o3, 8, %g1
or %g1, %o3, %o2
sllx %o2, 16, %g1
or %g1, %o2, %o2
sllx %o2, 32, %g1
ba,pt %xcc, 1f
or %g1, %o2, %o2
.globl NGbzero
.type NGbzero, #function
NGbzero:
clr %o2
1: brz,pn %o1, NGbzero_return
mov %o0, %o3
/* %o5: saved %asi, restored at NGbzero_done
* %g7: store-init %asi to use
* %o4: non-store-init %asi to use
*/
rd %asi, %o5
mov ASI_BLK_INIT_QUAD_LDD_P, %g7
mov ASI_P, %o4
wr %o4, 0x0, %asi
NGbzero_from_clear_user:
cmp %o1, 15
bl,pn %icc, NGbzero_tiny
andcc %o0, 0x7, %g1
be,pt %xcc, 2f
mov 8, %g2
sub %g2, %g1, %g1
sub %o1, %g1, %o1
1: EX_ST(stba %o2, [%o0 + 0x00] %asi)
subcc %g1, 1, %g1
bne,pt %xcc, 1b
add %o0, 1, %o0
2: cmp %o1, 128
bl,pn %icc, NGbzero_medium
andcc %o0, (64 - 1), %g1
be,pt %xcc, NGbzero_pre_loop
mov 64, %g2
sub %g2, %g1, %g1
sub %o1, %g1, %o1
1: EX_ST(stxa %o2, [%o0 + 0x00] %asi)
subcc %g1, 8, %g1
bne,pt %xcc, 1b
add %o0, 8, %o0
NGbzero_pre_loop:
wr %g7, 0x0, %asi
andn %o1, (64 - 1), %g1
sub %o1, %g1, %o1
NGbzero_loop:
EX_ST(stxa %o2, [%o0 + 0x00] %asi)
EX_ST(stxa %o2, [%o0 + 0x08] %asi)
EX_ST(stxa %o2, [%o0 + 0x10] %asi)
EX_ST(stxa %o2, [%o0 + 0x18] %asi)
EX_ST(stxa %o2, [%o0 + 0x20] %asi)
EX_ST(stxa %o2, [%o0 + 0x28] %asi)
EX_ST(stxa %o2, [%o0 + 0x30] %asi)
EX_ST(stxa %o2, [%o0 + 0x38] %asi)
subcc %g1, 64, %g1
bne,pt %xcc, NGbzero_loop
add %o0, 64, %o0
membar #Sync
wr %o4, 0x0, %asi
brz,pn %o1, NGbzero_done
NGbzero_medium:
andncc %o1, 0x7, %g1
be,pn %xcc, 2f
sub %o1, %g1, %o1
1: EX_ST(stxa %o2, [%o0 + 0x00] %asi)
subcc %g1, 8, %g1
bne,pt %xcc, 1b
add %o0, 8, %o0
2: brz,pt %o1, NGbzero_done
nop
NGbzero_tiny:
1: EX_ST(stba %o2, [%o0 + 0x00] %asi)
subcc %o1, 1, %o1
bne,pt %icc, 1b
add %o0, 1, %o0
/* fallthrough */
NGbzero_done:
wr %o5, 0x0, %asi
NGbzero_return:
retl
mov %o3, %o0
.size NGbzero, .-NGbzero
.size NGmemset, .-NGmemset
.globl NGclear_user
.type NGclear_user, #function
NGclear_user: /* %o0=buf, %o1=len */
rd %asi, %o5
brz,pn %o1, NGbzero_done
clr %o3
cmp %o5, ASI_AIUS
bne,pn %icc, NGbzero
clr %o2
mov ASI_BLK_INIT_QUAD_LDD_AIUS, %g7
ba,pt %xcc, NGbzero_from_clear_user
mov ASI_AIUS, %o4
.size NGclear_user, .-NGclear_user
#define BRANCH_ALWAYS 0x10680000
#define NOP 0x01000000
#define NG_DO_PATCH(OLD, NEW) \
sethi %hi(NEW), %g1; \
or %g1, %lo(NEW), %g1; \
sethi %hi(OLD), %g2; \
or %g2, %lo(OLD), %g2; \
sub %g1, %g2, %g1; \
sethi %hi(BRANCH_ALWAYS), %g3; \
sll %g1, 11, %g1; \
srl %g1, 11 + 2, %g1; \
or %g3, %lo(BRANCH_ALWAYS), %g3; \
or %g3, %g1, %g3; \
stw %g3, [%g2]; \
sethi %hi(NOP), %g3; \
or %g3, %lo(NOP), %g3; \
stw %g3, [%g2 + 0x4]; \
flush %g2;
.globl niagara_patch_bzero
.type niagara_patch_bzero,#function
niagara_patch_bzero:
NG_DO_PATCH(memset, NGmemset)
NG_DO_PATCH(__bzero, NGbzero)
NG_DO_PATCH(__clear_user, NGclear_user)
NG_DO_PATCH(tsb_init, NGtsb_init)
retl
nop
.size niagara_patch_bzero,.-niagara_patch_bzero
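The EX_ST() wrapper used throughout NGbzero/NGclear_user is what makes faults on user addresses recoverable: each guarded store contributes an entry to the __ex_table section pairing the address of the 98: labelled instruction with a fixup stub. Conceptually (a sketch, not the kernel's actual declaration) each record looks like:

        /* One __ex_table record as emitted by EX_ST(): two 32-bit words,
         * the faulting store's address and the fixup's address.  The fixup
         * used here, __retl_o1, presumably returns the byte count still held
         * in %o1 so the caller learns how much was not cleared. */
        struct ng_ex_entry {
                unsigned int insn;      /* address of the 98: labelled store */
                unsigned int fixup;     /* address of the recovery stub      */
        };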

32
arch/sparc/lib/NGcopy_from_user.S Normal file

@ -0,0 +1,32 @@
/* NGcopy_from_user.S: Niagara optimized copy from userspace.
*
* Copyright (C) 2006, 2007 David S. Miller (davem@davemloft.net)
*/
#define EX_LD(x) \
98: x; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __ret_one_asi;\
.text; \
.align 4;
#ifndef ASI_AIUS
#define ASI_AIUS 0x11
#endif
#define FUNC_NAME NGcopy_from_user
#define LOAD(type,addr,dest) type##a [addr] ASI_AIUS, dest
#define LOAD_TWIN(addr_reg,dest0,dest1) \
ldda [addr_reg] ASI_BLK_INIT_QUAD_LDD_AIUS, dest0
#define EX_RETVAL(x) %g0
#ifdef __KERNEL__
#define PREAMBLE \
rd %asi, %g1; \
cmp %g1, ASI_AIUS; \
bne,pn %icc, ___copy_in_user; \
nop
#endif
#include "NGmemcpy.S"

35
arch/sparc/lib/NGcopy_to_user.S Normal file

@ -0,0 +1,35 @@
/* NGcopy_to_user.S: Niagara optimized copy to userspace.
*
* Copyright (C) 2006, 2007 David S. Miller (davem@davemloft.net)
*/
#define EX_ST(x) \
98: x; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __ret_one_asi;\
.text; \
.align 4;
#ifndef ASI_AIUS
#define ASI_AIUS 0x11
#endif
#define FUNC_NAME NGcopy_to_user
#define STORE(type,src,addr) type##a src, [addr] ASI_AIUS
#define STORE_ASI ASI_BLK_INIT_QUAD_LDD_AIUS
#define EX_RETVAL(x) %g0
#ifdef __KERNEL__
/* Writing to %asi is _expensive_ so we hardcode it.
* Reading %asi to check for KERNEL_DS is comparatively
* cheap.
*/
#define PREAMBLE \
rd %asi, %g1; \
cmp %g1, ASI_AIUS; \
bne,pn %icc, ___copy_in_user; \
nop
#endif
#include "NGmemcpy.S"
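The PREAMBLE in both NGcopy_from_user.S and NGcopy_to_user.S implements the policy in the comment above: rather than writing %asi (expensive), it reads the current %asi and bails out to ___copy_in_user when the caller is running with KERNEL_DS. A rough C rendering of that check (everything except ASI_AIUS is an invented name):

        #define ASI_AIUS        0x11    /* secondary "as if user" ASI */

        /* Sketch only: decide whether the hardcoded-user-ASI fast path applies. */
        static int use_user_asi_fast_path(unsigned int cur_asi)
        {
                /* A KERNEL_DS caller has %asi set to a kernel ASI, so the real
                 * routine branches to ___copy_in_user instead. */
                return cur_asi == ASI_AIUS;
        }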

425
arch/sparc/lib/NGmemcpy.S Normal file

@ -0,0 +1,425 @@
/* NGmemcpy.S: Niagara optimized memcpy.
*
* Copyright (C) 2006, 2007 David S. Miller (davem@davemloft.net)
*/
#ifdef __KERNEL__
#include <asm/asi.h>
#include <asm/thread_info.h>
#define GLOBAL_SPARE %g7
#define RESTORE_ASI(TMP) \
ldub [%g6 + TI_CURRENT_DS], TMP; \
wr TMP, 0x0, %asi;
#else
#define GLOBAL_SPARE %g5
#define RESTORE_ASI(TMP) \
wr %g0, ASI_PNF, %asi
#endif
#ifdef __sparc_v9__
#define SAVE_AMOUNT 128
#else
#define SAVE_AMOUNT 64
#endif
#ifndef STORE_ASI
#define STORE_ASI ASI_BLK_INIT_QUAD_LDD_P
#endif
#ifndef EX_LD
#define EX_LD(x) x
#endif
#ifndef EX_ST
#define EX_ST(x) x
#endif
#ifndef EX_RETVAL
#define EX_RETVAL(x) x
#endif
#ifndef LOAD
#ifndef MEMCPY_DEBUG
#define LOAD(type,addr,dest) type [addr], dest
#else
#define LOAD(type,addr,dest) type##a [addr] 0x80, dest
#endif
#endif
#ifndef LOAD_TWIN
#define LOAD_TWIN(addr_reg,dest0,dest1) \
ldda [addr_reg] ASI_BLK_INIT_QUAD_LDD_P, dest0
#endif
#ifndef STORE
#define STORE(type,src,addr) type src, [addr]
#endif
#ifndef STORE_INIT
#ifndef SIMULATE_NIAGARA_ON_NON_NIAGARA
#define STORE_INIT(src,addr) stxa src, [addr] %asi
#else
#define STORE_INIT(src,addr) stx src, [addr + 0x00]
#endif
#endif
#ifndef FUNC_NAME
#define FUNC_NAME NGmemcpy
#endif
#ifndef PREAMBLE
#define PREAMBLE
#endif
#ifndef XCC
#define XCC xcc
#endif
.register %g2,#scratch
.register %g3,#scratch
.text
.align 64
.globl FUNC_NAME
.type FUNC_NAME,#function
FUNC_NAME: /* %i0=dst, %i1=src, %i2=len */
PREAMBLE
save %sp, -SAVE_AMOUNT, %sp
srlx %i2, 31, %g2
cmp %g2, 0
tne %xcc, 5
mov %i0, %o0
cmp %i2, 0
be,pn %XCC, 85f
or %o0, %i1, %i3
cmp %i2, 16
blu,a,pn %XCC, 80f
or %i3, %i2, %i3
/* 2 blocks (128 bytes) is the minimum we can do the block
* copy with. We need to ensure that we'll iterate at least
* once in the block copy loop. At worst we'll need to align
* the destination to a 64-byte boundary which can chew up
* to (64 - 1) bytes from the length before we perform the
* block copy loop.
*/
cmp %i2, (2 * 64)
blu,pt %XCC, 70f
andcc %i3, 0x7, %g0
/* %o0: dst
* %i1: src
* %i2: len (known to be >= 128)
*
* The block copy loops will use %i4/%i5,%g2/%g3 as
* temporaries while copying the data.
*/
LOAD(prefetch, %i1, #one_read)
wr %g0, STORE_ASI, %asi
/* Align destination on 64-byte boundary. */
andcc %o0, (64 - 1), %i4
be,pt %XCC, 2f
sub %i4, 64, %i4
sub %g0, %i4, %i4 ! bytes to align dst
sub %i2, %i4, %i2
1: subcc %i4, 1, %i4
EX_LD(LOAD(ldub, %i1, %g1))
EX_ST(STORE(stb, %g1, %o0))
add %i1, 1, %i1
bne,pt %XCC, 1b
add %o0, 1, %o0
/* If the source is on a 16-byte boundary we can do
* the direct block copy loop. If it is 8-byte aligned
* we can do the 16-byte loads offset by -8 bytes and the
* init stores offset by one register.
*
* If the source is not even 8-byte aligned, we need to do
* shifting and masking (basically integer faligndata).
*
* The careful bit with init stores is that if we store
* to any part of the cache line we have to store the whole
* cache line, else we can end up with corrupt L2 cache line
* contents. Since the loop works on 64 bytes of 64-byte
* aligned store data at a time, this is easy to ensure.
*/
2:
andcc %i1, (16 - 1), %i4
andn %i2, (64 - 1), %g1 ! block copy loop iterator
be,pt %XCC, 50f
sub %i2, %g1, %i2 ! final sub-block copy bytes
cmp %i4, 8
be,pt %XCC, 10f
sub %i1, %i4, %i1
/* Neither 8-byte nor 16-byte aligned, shift and mask. */
and %i4, 0x7, GLOBAL_SPARE
sll GLOBAL_SPARE, 3, GLOBAL_SPARE
mov 64, %i5
EX_LD(LOAD_TWIN(%i1, %g2, %g3))
sub %i5, GLOBAL_SPARE, %i5
mov 16, %o4
mov 32, %o5
mov 48, %o7
mov 64, %i3
bg,pn %XCC, 9f
nop
#define MIX_THREE_WORDS(WORD1, WORD2, WORD3, PRE_SHIFT, POST_SHIFT, TMP) \
sllx WORD1, POST_SHIFT, WORD1; \
srlx WORD2, PRE_SHIFT, TMP; \
sllx WORD2, POST_SHIFT, WORD2; \
or WORD1, TMP, WORD1; \
srlx WORD3, PRE_SHIFT, TMP; \
or WORD2, TMP, WORD2;
8: EX_LD(LOAD_TWIN(%i1 + %o4, %o2, %o3))
MIX_THREE_WORDS(%g2, %g3, %o2, %i5, GLOBAL_SPARE, %o1)
LOAD(prefetch, %i1 + %i3, #one_read)
EX_ST(STORE_INIT(%g2, %o0 + 0x00))
EX_ST(STORE_INIT(%g3, %o0 + 0x08))
EX_LD(LOAD_TWIN(%i1 + %o5, %g2, %g3))
MIX_THREE_WORDS(%o2, %o3, %g2, %i5, GLOBAL_SPARE, %o1)
EX_ST(STORE_INIT(%o2, %o0 + 0x10))
EX_ST(STORE_INIT(%o3, %o0 + 0x18))
EX_LD(LOAD_TWIN(%i1 + %o7, %o2, %o3))
MIX_THREE_WORDS(%g2, %g3, %o2, %i5, GLOBAL_SPARE, %o1)
EX_ST(STORE_INIT(%g2, %o0 + 0x20))
EX_ST(STORE_INIT(%g3, %o0 + 0x28))
EX_LD(LOAD_TWIN(%i1 + %i3, %g2, %g3))
add %i1, 64, %i1
MIX_THREE_WORDS(%o2, %o3, %g2, %i5, GLOBAL_SPARE, %o1)
EX_ST(STORE_INIT(%o2, %o0 + 0x30))
EX_ST(STORE_INIT(%o3, %o0 + 0x38))
subcc %g1, 64, %g1
bne,pt %XCC, 8b
add %o0, 64, %o0
ba,pt %XCC, 60f
add %i1, %i4, %i1
9: EX_LD(LOAD_TWIN(%i1 + %o4, %o2, %o3))
MIX_THREE_WORDS(%g3, %o2, %o3, %i5, GLOBAL_SPARE, %o1)
LOAD(prefetch, %i1 + %i3, #one_read)
EX_ST(STORE_INIT(%g3, %o0 + 0x00))
EX_ST(STORE_INIT(%o2, %o0 + 0x08))
EX_LD(LOAD_TWIN(%i1 + %o5, %g2, %g3))
MIX_THREE_WORDS(%o3, %g2, %g3, %i5, GLOBAL_SPARE, %o1)
EX_ST(STORE_INIT(%o3, %o0 + 0x10))
EX_ST(STORE_INIT(%g2, %o0 + 0x18))
EX_LD(LOAD_TWIN(%i1 + %o7, %o2, %o3))
MIX_THREE_WORDS(%g3, %o2, %o3, %i5, GLOBAL_SPARE, %o1)
EX_ST(STORE_INIT(%g3, %o0 + 0x20))
EX_ST(STORE_INIT(%o2, %o0 + 0x28))
EX_LD(LOAD_TWIN(%i1 + %i3, %g2, %g3))
add %i1, 64, %i1
MIX_THREE_WORDS(%o3, %g2, %g3, %i5, GLOBAL_SPARE, %o1)
EX_ST(STORE_INIT(%o3, %o0 + 0x30))
EX_ST(STORE_INIT(%g2, %o0 + 0x38))
subcc %g1, 64, %g1
bne,pt %XCC, 9b
add %o0, 64, %o0
ba,pt %XCC, 60f
add %i1, %i4, %i1
10: /* Destination is 64-byte aligned, source was only 8-byte
* aligned, but 8 has been subtracted from it and we perform
* one twin load ahead, then add 8 back into the source when
* we finish the loop.
*/
EX_LD(LOAD_TWIN(%i1, %o4, %o5))
mov 16, %o7
mov 32, %g2
mov 48, %g3
mov 64, %o1
1: EX_LD(LOAD_TWIN(%i1 + %o7, %o2, %o3))
LOAD(prefetch, %i1 + %o1, #one_read)
EX_ST(STORE_INIT(%o5, %o0 + 0x00)) ! initializes cache line
EX_ST(STORE_INIT(%o2, %o0 + 0x08))
EX_LD(LOAD_TWIN(%i1 + %g2, %o4, %o5))
EX_ST(STORE_INIT(%o3, %o0 + 0x10))
EX_ST(STORE_INIT(%o4, %o0 + 0x18))
EX_LD(LOAD_TWIN(%i1 + %g3, %o2, %o3))
EX_ST(STORE_INIT(%o5, %o0 + 0x20))
EX_ST(STORE_INIT(%o2, %o0 + 0x28))
EX_LD(LOAD_TWIN(%i1 + %o1, %o4, %o5))
add %i1, 64, %i1
EX_ST(STORE_INIT(%o3, %o0 + 0x30))
EX_ST(STORE_INIT(%o4, %o0 + 0x38))
subcc %g1, 64, %g1
bne,pt %XCC, 1b
add %o0, 64, %o0
ba,pt %XCC, 60f
add %i1, 0x8, %i1
50: /* Destination is 64-byte aligned, and source is 16-byte
* aligned.
*/
mov 16, %o7
mov 32, %g2
mov 48, %g3
mov 64, %o1
1: EX_LD(LOAD_TWIN(%i1 + %g0, %o4, %o5))
EX_LD(LOAD_TWIN(%i1 + %o7, %o2, %o3))
LOAD(prefetch, %i1 + %o1, #one_read)
EX_ST(STORE_INIT(%o4, %o0 + 0x00)) ! initializes cache line
EX_ST(STORE_INIT(%o5, %o0 + 0x08))
EX_LD(LOAD_TWIN(%i1 + %g2, %o4, %o5))
EX_ST(STORE_INIT(%o2, %o0 + 0x10))
EX_ST(STORE_INIT(%o3, %o0 + 0x18))
EX_LD(LOAD_TWIN(%i1 + %g3, %o2, %o3))
add %i1, 64, %i1
EX_ST(STORE_INIT(%o4, %o0 + 0x20))
EX_ST(STORE_INIT(%o5, %o0 + 0x28))
EX_ST(STORE_INIT(%o2, %o0 + 0x30))
EX_ST(STORE_INIT(%o3, %o0 + 0x38))
subcc %g1, 64, %g1
bne,pt %XCC, 1b
add %o0, 64, %o0
/* fall through */
60:
membar #Sync
/* %i2 contains any final bytes that still need to be copied
* over. If anything is left, we copy it one byte at a time.
*/
RESTORE_ASI(%i3)
brz,pt %i2, 85f
sub %o0, %i1, %i3
ba,a,pt %XCC, 90f
.align 64
70: /* 16 < len <= 64 */
bne,pn %XCC, 75f
sub %o0, %i1, %i3
72:
andn %i2, 0xf, %i4
and %i2, 0xf, %i2
1: subcc %i4, 0x10, %i4
EX_LD(LOAD(ldx, %i1, %o4))
add %i1, 0x08, %i1
EX_LD(LOAD(ldx, %i1, %g1))
sub %i1, 0x08, %i1
EX_ST(STORE(stx, %o4, %i1 + %i3))
add %i1, 0x8, %i1
EX_ST(STORE(stx, %g1, %i1 + %i3))
bgu,pt %XCC, 1b
add %i1, 0x8, %i1
73: andcc %i2, 0x8, %g0
be,pt %XCC, 1f
nop
sub %i2, 0x8, %i2
EX_LD(LOAD(ldx, %i1, %o4))
EX_ST(STORE(stx, %o4, %i1 + %i3))
add %i1, 0x8, %i1
1: andcc %i2, 0x4, %g0
be,pt %XCC, 1f
nop
sub %i2, 0x4, %i2
EX_LD(LOAD(lduw, %i1, %i5))
EX_ST(STORE(stw, %i5, %i1 + %i3))
add %i1, 0x4, %i1
1: cmp %i2, 0
be,pt %XCC, 85f
nop
ba,pt %xcc, 90f
nop
75:
andcc %o0, 0x7, %g1
sub %g1, 0x8, %g1
be,pn %icc, 2f
sub %g0, %g1, %g1
sub %i2, %g1, %i2
1: subcc %g1, 1, %g1
EX_LD(LOAD(ldub, %i1, %i5))
EX_ST(STORE(stb, %i5, %i1 + %i3))
bgu,pt %icc, 1b
add %i1, 1, %i1
2: add %i1, %i3, %o0
andcc %i1, 0x7, %g1
bne,pt %icc, 8f
sll %g1, 3, %g1
cmp %i2, 16
bgeu,pt %icc, 72b
nop
ba,a,pt %xcc, 73b
8: mov 64, %i3
andn %i1, 0x7, %i1
EX_LD(LOAD(ldx, %i1, %g2))
sub %i3, %g1, %i3
andn %i2, 0x7, %i4
sllx %g2, %g1, %g2
1: add %i1, 0x8, %i1
EX_LD(LOAD(ldx, %i1, %g3))
subcc %i4, 0x8, %i4
srlx %g3, %i3, %i5
or %i5, %g2, %i5
EX_ST(STORE(stx, %i5, %o0))
add %o0, 0x8, %o0
bgu,pt %icc, 1b
sllx %g3, %g1, %g2
srl %g1, 3, %g1
andcc %i2, 0x7, %i2
be,pn %icc, 85f
add %i1, %g1, %i1
ba,pt %xcc, 90f
sub %o0, %i1, %i3
.align 64
80: /* 0 < len <= 16 */
andcc %i3, 0x3, %g0
bne,pn %XCC, 90f
sub %o0, %i1, %i3
1:
subcc %i2, 4, %i2
EX_LD(LOAD(lduw, %i1, %g1))
EX_ST(STORE(stw, %g1, %i1 + %i3))
bgu,pt %XCC, 1b
add %i1, 4, %i1
85: ret
restore EX_RETVAL(%i0), %g0, %o0
.align 32
90:
subcc %i2, 1, %i2
EX_LD(LOAD(ldub, %i1, %g1))
EX_ST(STORE(stb, %g1, %i1 + %i3))
bgu,pt %XCC, 90b
add %i1, 1, %i1
ret
restore EX_RETVAL(%i0), %g0, %o0
.size FUNC_NAME, .-FUNC_NAME
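MIX_THREE_WORDS above is the integer stand-in for faligndata: with post = (src & 7) * 8 and pre = 64 - post it turns three consecutive source doublewords into two destination-aligned ones. A direct C transcription of the macro (names invented, big-endian byte order, post != 0 as in the code path that uses it):

        #include <stdint.h>

        /* w[0] and w[1] are rewritten to the two aligned output words;
         * w[2] supplies the trailing bytes, mirroring WORD1..WORD3/TMP. */
        static void mix_three_words(uint64_t w[3], unsigned int pre, unsigned int post)
        {
                uint64_t tmp;

                w[0] <<= post;
                tmp = w[1] >> pre;
                w[1] <<= post;
                w[0] |= tmp;
                tmp = w[2] >> pre;
                w[1] |= tmp;
        }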

137
arch/sparc/lib/NGpage.S Normal file

@ -0,0 +1,137 @@
/* NGpage.S: Niagara optimized clear and copy page.
*
* Copyright (C) 2006 David S. Miller (davem@davemloft.net)
*/
#include <asm/asi.h>
#include <asm/page.h>
.text
.align 32
/* This is heavily simplified from the sun4u variants
* because Niagara does not have any D-cache aliasing issues
* and also we don't need to use the FPU in order to implement
* an optimal page copy/clear.
*/
NGcopy_user_page: /* %o0=dest, %o1=src, %o2=vaddr */
save %sp, -192, %sp
rd %asi, %g3
wr %g0, ASI_BLK_INIT_QUAD_LDD_P, %asi
set PAGE_SIZE, %g7
prefetch [%i1 + 0x00], #one_read
prefetch [%i1 + 0x40], #one_read
1: prefetch [%i1 + 0x80], #one_read
prefetch [%i1 + 0xc0], #one_read
ldda [%i1 + 0x00] %asi, %o2
ldda [%i1 + 0x10] %asi, %o4
ldda [%i1 + 0x20] %asi, %l2
ldda [%i1 + 0x30] %asi, %l4
stxa %o2, [%i0 + 0x00] %asi
stxa %o3, [%i0 + 0x08] %asi
stxa %o4, [%i0 + 0x10] %asi
stxa %o5, [%i0 + 0x18] %asi
stxa %l2, [%i0 + 0x20] %asi
stxa %l3, [%i0 + 0x28] %asi
stxa %l4, [%i0 + 0x30] %asi
stxa %l5, [%i0 + 0x38] %asi
ldda [%i1 + 0x40] %asi, %o2
ldda [%i1 + 0x50] %asi, %o4
ldda [%i1 + 0x60] %asi, %l2
ldda [%i1 + 0x70] %asi, %l4
stxa %o2, [%i0 + 0x40] %asi
stxa %o3, [%i0 + 0x48] %asi
stxa %o4, [%i0 + 0x50] %asi
stxa %o5, [%i0 + 0x58] %asi
stxa %l2, [%i0 + 0x60] %asi
stxa %l3, [%i0 + 0x68] %asi
stxa %l4, [%i0 + 0x70] %asi
stxa %l5, [%i0 + 0x78] %asi
add %i1, 128, %i1
subcc %g7, 128, %g7
bne,pt %xcc, 1b
add %i0, 128, %i0
wr %g3, 0x0, %asi
membar #Sync
ret
restore
.align 32
.globl NGclear_page
.globl NGclear_user_page
NGclear_page: /* %o0=dest */
NGclear_user_page: /* %o0=dest, %o1=vaddr */
rd %asi, %g3
wr %g0, ASI_BLK_INIT_QUAD_LDD_P, %asi
set PAGE_SIZE, %g7
1: stxa %g0, [%o0 + 0x00] %asi
stxa %g0, [%o0 + 0x08] %asi
stxa %g0, [%o0 + 0x10] %asi
stxa %g0, [%o0 + 0x18] %asi
stxa %g0, [%o0 + 0x20] %asi
stxa %g0, [%o0 + 0x28] %asi
stxa %g0, [%o0 + 0x30] %asi
stxa %g0, [%o0 + 0x38] %asi
stxa %g0, [%o0 + 0x40] %asi
stxa %g0, [%o0 + 0x48] %asi
stxa %g0, [%o0 + 0x50] %asi
stxa %g0, [%o0 + 0x58] %asi
stxa %g0, [%o0 + 0x60] %asi
stxa %g0, [%o0 + 0x68] %asi
stxa %g0, [%o0 + 0x70] %asi
stxa %g0, [%o0 + 0x78] %asi
stxa %g0, [%o0 + 0x80] %asi
stxa %g0, [%o0 + 0x88] %asi
stxa %g0, [%o0 + 0x90] %asi
stxa %g0, [%o0 + 0x98] %asi
stxa %g0, [%o0 + 0xa0] %asi
stxa %g0, [%o0 + 0xa8] %asi
stxa %g0, [%o0 + 0xb0] %asi
stxa %g0, [%o0 + 0xb8] %asi
stxa %g0, [%o0 + 0xc0] %asi
stxa %g0, [%o0 + 0xc8] %asi
stxa %g0, [%o0 + 0xd0] %asi
stxa %g0, [%o0 + 0xd8] %asi
stxa %g0, [%o0 + 0xe0] %asi
stxa %g0, [%o0 + 0xe8] %asi
stxa %g0, [%o0 + 0xf0] %asi
stxa %g0, [%o0 + 0xf8] %asi
subcc %g7, 256, %g7
bne,pt %xcc, 1b
add %o0, 256, %o0
wr %g3, 0x0, %asi
membar #Sync
retl
nop
#define BRANCH_ALWAYS 0x10680000
#define NOP 0x01000000
#define NG_DO_PATCH(OLD, NEW) \
sethi %hi(NEW), %g1; \
or %g1, %lo(NEW), %g1; \
sethi %hi(OLD), %g2; \
or %g2, %lo(OLD), %g2; \
sub %g1, %g2, %g1; \
sethi %hi(BRANCH_ALWAYS), %g3; \
sll %g1, 11, %g1; \
srl %g1, 11 + 2, %g1; \
or %g3, %lo(BRANCH_ALWAYS), %g3; \
or %g3, %g1, %g3; \
stw %g3, [%g2]; \
sethi %hi(NOP), %g3; \
or %g3, %lo(NOP), %g3; \
stw %g3, [%g2 + 0x4]; \
flush %g2;
.globl niagara_patch_pageops
.type niagara_patch_pageops,#function
niagara_patch_pageops:
NG_DO_PATCH(copy_user_page, NGcopy_user_page)
NG_DO_PATCH(_clear_page, NGclear_page)
NG_DO_PATCH(clear_user_page, NGclear_user_page)
retl
nop
.size niagara_patch_pageops,.-niagara_patch_pageops

33
arch/sparc/lib/NGpatch.S Normal file

@ -0,0 +1,33 @@
/* NGpatch.S: Patch Ultra-I routines with Niagara variant.
*
* Copyright (C) 2006 David S. Miller <davem@davemloft.net>
*/
#define BRANCH_ALWAYS 0x10680000
#define NOP 0x01000000
#define NG_DO_PATCH(OLD, NEW) \
sethi %hi(NEW), %g1; \
or %g1, %lo(NEW), %g1; \
sethi %hi(OLD), %g2; \
or %g2, %lo(OLD), %g2; \
sub %g1, %g2, %g1; \
sethi %hi(BRANCH_ALWAYS), %g3; \
sll %g1, 11, %g1; \
srl %g1, 11 + 2, %g1; \
or %g3, %lo(BRANCH_ALWAYS), %g3; \
or %g3, %g1, %g3; \
stw %g3, [%g2]; \
sethi %hi(NOP), %g3; \
or %g3, %lo(NOP), %g3; \
stw %g3, [%g2 + 0x4]; \
flush %g2;
.globl niagara_patch_copyops
.type niagara_patch_copyops,#function
niagara_patch_copyops:
NG_DO_PATCH(memcpy, NGmemcpy)
NG_DO_PATCH(___copy_from_user, NGcopy_from_user)
NG_DO_PATCH(___copy_to_user, NGcopy_to_user)
retl
nop
.size niagara_patch_copyops,.-niagara_patch_copyops

211
arch/sparc/lib/PeeCeeI.c Normal file

@ -0,0 +1,211 @@
/*
* PeeCeeI.c: The emerging standard...
*
* Copyright (C) 1997 David S. Miller (davem@caip.rutgers.edu)
*/
#include <linux/module.h>
#include <asm/io.h>
#include <asm/byteorder.h>
void outsb(unsigned long __addr, const void *src, unsigned long count)
{
void __iomem *addr = (void __iomem *) __addr;
const u8 *p = src;
while (count--)
__raw_writeb(*p++, addr);
}
EXPORT_SYMBOL(outsb);
void outsw(unsigned long __addr, const void *src, unsigned long count)
{
void __iomem *addr = (void __iomem *) __addr;
while (count--) {
__raw_writew(*(u16 *)src, addr);
src += sizeof(u16);
}
}
EXPORT_SYMBOL(outsw);
void outsl(unsigned long __addr, const void *src, unsigned long count)
{
void __iomem *addr = (void __iomem *) __addr;
u32 l, l2;
if (!count)
return;
switch (((unsigned long)src) & 0x3) {
case 0x0:
/* src is naturally aligned */
while (count--) {
__raw_writel(*(u32 *)src, addr);
src += sizeof(u32);
}
break;
case 0x2:
/* 2-byte alignment */
while (count--) {
l = (*(u16 *)src) << 16;
l |= *(u16 *)(src + sizeof(u16));
__raw_writel(l, addr);
src += sizeof(u32);
}
break;
case 0x1:
/* Hold three bytes in l each time, grab a byte from l2 */
l = (*(u8 *)src) << 24;
l |= (*(u16 *)(src + sizeof(u8))) << 8;
src += sizeof(u8) + sizeof(u16);
while (count--) {
l2 = *(u32 *)src;
l |= (l2 >> 24);
__raw_writel(l, addr);
l = l2 << 8;
src += sizeof(u32);
}
break;
case 0x3:
/* Hold a byte in l each time, grab 3 bytes from l2 */
l = (*(u8 *)src) << 24;
src += sizeof(u8);
while (count--) {
l2 = *(u32 *)src;
l |= (l2 >> 8);
__raw_writel(l, addr);
l = l2 << 24;
src += sizeof(u32);
}
break;
}
}
EXPORT_SYMBOL(outsl);
void insb(unsigned long __addr, void *dst, unsigned long count)
{
void __iomem *addr = (void __iomem *) __addr;
if (count) {
u32 *pi;
u8 *pb = dst;
while ((((unsigned long)pb) & 0x3) && count--)
*pb++ = __raw_readb(addr);
pi = (u32 *)pb;
while (count >= 4) {
u32 w;
w = (__raw_readb(addr) << 24);
w |= (__raw_readb(addr) << 16);
w |= (__raw_readb(addr) << 8);
w |= (__raw_readb(addr) << 0);
*pi++ = w;
count -= 4;
}
pb = (u8 *)pi;
while (count--)
*pb++ = __raw_readb(addr);
}
}
EXPORT_SYMBOL(insb);
void insw(unsigned long __addr, void *dst, unsigned long count)
{
void __iomem *addr = (void __iomem *) __addr;
if (count) {
u16 *ps = dst;
u32 *pi;
if (((unsigned long)ps) & 0x2) {
*ps++ = __raw_readw(addr);
count--;
}
pi = (u32 *)ps;
while (count >= 2) {
u32 w;
w = __raw_readw(addr) << 16;
w |= __raw_readw(addr) << 0;
*pi++ = w;
count -= 2;
}
ps = (u16 *)pi;
if (count)
*ps = __raw_readw(addr);
}
}
EXPORT_SYMBOL(insw);
void insl(unsigned long __addr, void *dst, unsigned long count)
{
void __iomem *addr = (void __iomem *) __addr;
if (count) {
if ((((unsigned long)dst) & 0x3) == 0) {
u32 *pi = dst;
while (count--)
*pi++ = __raw_readl(addr);
} else {
u32 l = 0, l2, *pi;
u16 *ps;
u8 *pb;
switch (((unsigned long)dst) & 3) {
case 0x2:
ps = dst;
count -= 1;
l = __raw_readl(addr);
*ps++ = l;
pi = (u32 *)ps;
while (count--) {
l2 = __raw_readl(addr);
*pi++ = (l << 16) | (l2 >> 16);
l = l2;
}
ps = (u16 *)pi;
*ps = l;
break;
case 0x1:
pb = dst;
count -= 1;
l = __raw_readl(addr);
*pb++ = l >> 24;
ps = (u16 *)pb;
*ps++ = ((l >> 8) & 0xffff);
pi = (u32 *)ps;
while (count--) {
l2 = __raw_readl(addr);
*pi++ = (l << 24) | (l2 >> 8);
l = l2;
}
pb = (u8 *)pi;
*pb = l;
break;
case 0x3:
pb = (u8 *)dst;
count -= 1;
l = __raw_readl(addr);
*pb++ = l >> 24;
pi = (u32 *)pb;
while (count--) {
l2 = __raw_readl(addr);
*pi++ = (l << 8) | (l2 >> 24);
l = l2;
}
ps = (u16 *)pi;
*ps++ = ((l >> 8) & 0xffff);
pb = (u8 *)ps;
*pb = l;
break;
}
}
}
}
EXPORT_SYMBOL(insl);

29
arch/sparc/lib/U1copy_from_user.S Normal file

@ -0,0 +1,29 @@
/* U1copy_from_user.S: UltraSparc-I/II/IIi/IIe optimized copy from userspace.
*
* Copyright (C) 1999, 2000, 2004 David S. Miller (davem@redhat.com)
*/
#define EX_LD(x) \
98: x; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_one; \
.text; \
.align 4;
#define FUNC_NAME ___copy_from_user
#define LOAD(type,addr,dest) type##a [addr] %asi, dest
#define LOAD_BLK(addr,dest) ldda [addr] ASI_BLK_AIUS, dest
#define EX_RETVAL(x) 0
/* Writing to %asi is _expensive_ so we hardcode it.
* Reading %asi to check for KERNEL_DS is comparatively
* cheap.
*/
#define PREAMBLE \
rd %asi, %g1; \
cmp %g1, ASI_AIUS; \
bne,pn %icc, ___copy_in_user; \
nop; \

#include "U1memcpy.S"

29
arch/sparc/lib/U1copy_to_user.S Normal file

@ -0,0 +1,29 @@
/* U1copy_to_user.S: UltraSparc-I/II/IIi/IIe optimized copy to userspace.
*
* Copyright (C) 1999, 2000, 2004 David S. Miller (davem@redhat.com)
*/
#define EX_ST(x) \
98: x; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_one; \
.text; \
.align 4;
#define FUNC_NAME ___copy_to_user
#define STORE(type,src,addr) type##a src, [addr] ASI_AIUS
#define STORE_BLK(src,addr) stda src, [addr] ASI_BLK_AIUS
#define EX_RETVAL(x) 0
/* Writing to %asi is _expensive_ so we hardcode it.
* Reading %asi to check for KERNEL_DS is comparatively
* cheap.
*/
#define PREAMBLE \
rd %asi, %g1; \
cmp %g1, ASI_AIUS; \
bne,pn %icc, ___copy_in_user; \
nop; \

#include "U1memcpy.S"

563
arch/sparc/lib/U1memcpy.S Normal file

@ -0,0 +1,563 @@
/* U1memcpy.S: UltraSPARC-I/II/IIi/IIe optimized memcpy.
*
* Copyright (C) 1997, 2004 David S. Miller (davem@redhat.com)
* Copyright (C) 1996, 1997, 1998, 1999 Jakub Jelinek (jj@ultra.linux.cz)
*/
#ifdef __KERNEL__
#include <asm/visasm.h>
#include <asm/asi.h>
#define GLOBAL_SPARE g7
#else
#define GLOBAL_SPARE g5
#define ASI_BLK_P 0xf0
#define FPRS_FEF 0x04
#ifdef MEMCPY_DEBUG
#define VISEntry rd %fprs, %o5; wr %g0, FPRS_FEF, %fprs; \
clr %g1; clr %g2; clr %g3; subcc %g0, %g0, %g0;
#define VISExit and %o5, FPRS_FEF, %o5; wr %o5, 0x0, %fprs
#else
#define VISEntry rd %fprs, %o5; wr %g0, FPRS_FEF, %fprs
#define VISExit and %o5, FPRS_FEF, %o5; wr %o5, 0x0, %fprs
#endif
#endif
#ifndef EX_LD
#define EX_LD(x) x
#endif
#ifndef EX_ST
#define EX_ST(x) x
#endif
#ifndef EX_RETVAL
#define EX_RETVAL(x) x
#endif
#ifndef LOAD
#define LOAD(type,addr,dest) type [addr], dest
#endif
#ifndef LOAD_BLK
#define LOAD_BLK(addr,dest) ldda [addr] ASI_BLK_P, dest
#endif
#ifndef STORE
#define STORE(type,src,addr) type src, [addr]
#endif
#ifndef STORE_BLK
#define STORE_BLK(src,addr) stda src, [addr] ASI_BLK_P
#endif
#ifndef FUNC_NAME
#define FUNC_NAME memcpy
#endif
#ifndef PREAMBLE
#define PREAMBLE
#endif
#ifndef XCC
#define XCC xcc
#endif
#define FREG_FROB(f1, f2, f3, f4, f5, f6, f7, f8, f9) \
faligndata %f1, %f2, %f48; \
faligndata %f2, %f3, %f50; \
faligndata %f3, %f4, %f52; \
faligndata %f4, %f5, %f54; \
faligndata %f5, %f6, %f56; \
faligndata %f6, %f7, %f58; \
faligndata %f7, %f8, %f60; \
faligndata %f8, %f9, %f62;
#define MAIN_LOOP_CHUNK(src, dest, fdest, fsrc, len, jmptgt) \
EX_LD(LOAD_BLK(%src, %fdest)); \
EX_ST(STORE_BLK(%fsrc, %dest)); \
add %src, 0x40, %src; \
subcc %len, 0x40, %len; \
be,pn %xcc, jmptgt; \
add %dest, 0x40, %dest; \

#define LOOP_CHUNK1(src, dest, len, branch_dest) \
MAIN_LOOP_CHUNK(src, dest, f0, f48, len, branch_dest)
#define LOOP_CHUNK2(src, dest, len, branch_dest) \
MAIN_LOOP_CHUNK(src, dest, f16, f48, len, branch_dest)
#define LOOP_CHUNK3(src, dest, len, branch_dest) \
MAIN_LOOP_CHUNK(src, dest, f32, f48, len, branch_dest)
#define DO_SYNC membar #Sync;
#define STORE_SYNC(dest, fsrc) \
EX_ST(STORE_BLK(%fsrc, %dest)); \
add %dest, 0x40, %dest; \
DO_SYNC
#define STORE_JUMP(dest, fsrc, target) \
EX_ST(STORE_BLK(%fsrc, %dest)); \
add %dest, 0x40, %dest; \
ba,pt %xcc, target; \
nop;
#define FINISH_VISCHUNK(dest, f0, f1, left) \
subcc %left, 8, %left;\
bl,pn %xcc, 95f; \
faligndata %f0, %f1, %f48; \
EX_ST(STORE(std, %f48, %dest)); \
add %dest, 8, %dest;
#define UNEVEN_VISCHUNK_LAST(dest, f0, f1, left) \
subcc %left, 8, %left; \
bl,pn %xcc, 95f; \
fsrc2 %f0, %f1;
#define UNEVEN_VISCHUNK(dest, f0, f1, left) \
UNEVEN_VISCHUNK_LAST(dest, f0, f1, left) \
ba,a,pt %xcc, 93f;
.register %g2,#scratch
.register %g3,#scratch
.text
.align 64
.globl FUNC_NAME
.type FUNC_NAME,#function
FUNC_NAME: /* %o0=dst, %o1=src, %o2=len */
srlx %o2, 31, %g2
cmp %g2, 0
tne %xcc, 5
PREAMBLE
mov %o0, %o4
cmp %o2, 0
be,pn %XCC, 85f
or %o0, %o1, %o3
cmp %o2, 16
blu,a,pn %XCC, 80f
or %o3, %o2, %o3
cmp %o2, (5 * 64)
blu,pt %XCC, 70f
andcc %o3, 0x7, %g0
/* Clobbers o5/g1/g2/g3/g7/icc/xcc. */
VISEntry
/* Is 'dst' already aligned on a 64-byte boundary? */
andcc %o0, 0x3f, %g2
be,pt %XCC, 2f
/* Compute abs((dst & 0x3f) - 0x40) into %g2. This is the number
* of bytes to copy to make 'dst' 64-byte aligned. We pre-
* subtract this from 'len'.
*/
sub %o0, %o1, %GLOBAL_SPARE
sub %g2, 0x40, %g2
sub %g0, %g2, %g2
sub %o2, %g2, %o2
andcc %g2, 0x7, %g1
be,pt %icc, 2f
and %g2, 0x38, %g2
1: subcc %g1, 0x1, %g1
EX_LD(LOAD(ldub, %o1 + 0x00, %o3))
EX_ST(STORE(stb, %o3, %o1 + %GLOBAL_SPARE))
bgu,pt %XCC, 1b
add %o1, 0x1, %o1
add %o1, %GLOBAL_SPARE, %o0
2: cmp %g2, 0x0
and %o1, 0x7, %g1
be,pt %icc, 3f
alignaddr %o1, %g0, %o1
EX_LD(LOAD(ldd, %o1, %f4))
1: EX_LD(LOAD(ldd, %o1 + 0x8, %f6))
add %o1, 0x8, %o1
subcc %g2, 0x8, %g2
faligndata %f4, %f6, %f0
EX_ST(STORE(std, %f0, %o0))
be,pn %icc, 3f
add %o0, 0x8, %o0
EX_LD(LOAD(ldd, %o1 + 0x8, %f4))
add %o1, 0x8, %o1
subcc %g2, 0x8, %g2
faligndata %f6, %f4, %f0
EX_ST(STORE(std, %f0, %o0))
bne,pt %icc, 1b
add %o0, 0x8, %o0
/* Destination is 64-byte aligned. */
3:
membar #LoadStore | #StoreStore | #StoreLoad
subcc %o2, 0x40, %GLOBAL_SPARE
add %o1, %g1, %g1
andncc %GLOBAL_SPARE, (0x40 - 1), %GLOBAL_SPARE
srl %g1, 3, %g2
sub %o2, %GLOBAL_SPARE, %g3
andn %o1, (0x40 - 1), %o1
and %g2, 7, %g2
andncc %g3, 0x7, %g3
fsrc2 %f0, %f2
sub %g3, 0x8, %g3
sub %o2, %GLOBAL_SPARE, %o2
add %g1, %GLOBAL_SPARE, %g1
subcc %o2, %g3, %o2
EX_LD(LOAD_BLK(%o1, %f0))
add %o1, 0x40, %o1
add %g1, %g3, %g1
EX_LD(LOAD_BLK(%o1, %f16))
add %o1, 0x40, %o1
sub %GLOBAL_SPARE, 0x80, %GLOBAL_SPARE
EX_LD(LOAD_BLK(%o1, %f32))
add %o1, 0x40, %o1
/* There are 8 instances of the unrolled loop,
* one for each possible alignment of the
* source buffer. Each loop instance is 452
* bytes.
*/
sll %g2, 3, %o3
sub %o3, %g2, %o3
sllx %o3, 4, %o3
add %o3, %g2, %o3
sllx %o3, 2, %g2
1: rd %pc, %o3
add %o3, %lo(1f - 1b), %o3
jmpl %o3 + %g2, %g0
nop
.align 64
1: FREG_FROB(f0, f2, f4, f6, f8, f10,f12,f14,f16)
LOOP_CHUNK1(o1, o0, GLOBAL_SPARE, 1f)
FREG_FROB(f16,f18,f20,f22,f24,f26,f28,f30,f32)
LOOP_CHUNK2(o1, o0, GLOBAL_SPARE, 2f)
FREG_FROB(f32,f34,f36,f38,f40,f42,f44,f46,f0)
LOOP_CHUNK3(o1, o0, GLOBAL_SPARE, 3f)
ba,pt %xcc, 1b+4
faligndata %f0, %f2, %f48
1: FREG_FROB(f16,f18,f20,f22,f24,f26,f28,f30,f32)
STORE_SYNC(o0, f48)
FREG_FROB(f32,f34,f36,f38,f40,f42,f44,f46,f0)
STORE_JUMP(o0, f48, 40f)
2: FREG_FROB(f32,f34,f36,f38,f40,f42,f44,f46,f0)
STORE_SYNC(o0, f48)
FREG_FROB(f0, f2, f4, f6, f8, f10,f12,f14,f16)
STORE_JUMP(o0, f48, 48f)
3: FREG_FROB(f0, f2, f4, f6, f8, f10,f12,f14,f16)
STORE_SYNC(o0, f48)
FREG_FROB(f16,f18,f20,f22,f24,f26,f28,f30,f32)
STORE_JUMP(o0, f48, 56f)
1: FREG_FROB(f2, f4, f6, f8, f10,f12,f14,f16,f18)
LOOP_CHUNK1(o1, o0, GLOBAL_SPARE, 1f)
FREG_FROB(f18,f20,f22,f24,f26,f28,f30,f32,f34)
LOOP_CHUNK2(o1, o0, GLOBAL_SPARE, 2f)
FREG_FROB(f34,f36,f38,f40,f42,f44,f46,f0, f2)
LOOP_CHUNK3(o1, o0, GLOBAL_SPARE, 3f)
ba,pt %xcc, 1b+4
faligndata %f2, %f4, %f48
1: FREG_FROB(f18,f20,f22,f24,f26,f28,f30,f32,f34)
STORE_SYNC(o0, f48)
FREG_FROB(f34,f36,f38,f40,f42,f44,f46,f0, f2)
STORE_JUMP(o0, f48, 41f)
2: FREG_FROB(f34,f36,f38,f40,f42,f44,f46,f0, f2)
STORE_SYNC(o0, f48)
FREG_FROB(f2, f4, f6, f8, f10,f12,f14,f16,f18)
STORE_JUMP(o0, f48, 49f)
3: FREG_FROB(f2, f4, f6, f8, f10,f12,f14,f16,f18)
STORE_SYNC(o0, f48)
FREG_FROB(f18,f20,f22,f24,f26,f28,f30,f32,f34)
STORE_JUMP(o0, f48, 57f)
1: FREG_FROB(f4, f6, f8, f10,f12,f14,f16,f18,f20)
LOOP_CHUNK1(o1, o0, GLOBAL_SPARE, 1f)
FREG_FROB(f20,f22,f24,f26,f28,f30,f32,f34,f36)
LOOP_CHUNK2(o1, o0, GLOBAL_SPARE, 2f)
FREG_FROB(f36,f38,f40,f42,f44,f46,f0, f2, f4)
LOOP_CHUNK3(o1, o0, GLOBAL_SPARE, 3f)
ba,pt %xcc, 1b+4
faligndata %f4, %f6, %f48
1: FREG_FROB(f20,f22,f24,f26,f28,f30,f32,f34,f36)
STORE_SYNC(o0, f48)
FREG_FROB(f36,f38,f40,f42,f44,f46,f0, f2, f4)
STORE_JUMP(o0, f48, 42f)
2: FREG_FROB(f36,f38,f40,f42,f44,f46,f0, f2, f4)
STORE_SYNC(o0, f48)
FREG_FROB(f4, f6, f8, f10,f12,f14,f16,f18,f20)
STORE_JUMP(o0, f48, 50f)
3: FREG_FROB(f4, f6, f8, f10,f12,f14,f16,f18,f20)
STORE_SYNC(o0, f48)
FREG_FROB(f20,f22,f24,f26,f28,f30,f32,f34,f36)
STORE_JUMP(o0, f48, 58f)
1: FREG_FROB(f6, f8, f10,f12,f14,f16,f18,f20,f22)
LOOP_CHUNK1(o1, o0, GLOBAL_SPARE, 1f)
FREG_FROB(f22,f24,f26,f28,f30,f32,f34,f36,f38)
LOOP_CHUNK2(o1, o0, GLOBAL_SPARE, 2f)
FREG_FROB(f38,f40,f42,f44,f46,f0, f2, f4, f6)
LOOP_CHUNK3(o1, o0, GLOBAL_SPARE, 3f)
ba,pt %xcc, 1b+4
faligndata %f6, %f8, %f48
1: FREG_FROB(f22,f24,f26,f28,f30,f32,f34,f36,f38)
STORE_SYNC(o0, f48)
FREG_FROB(f38,f40,f42,f44,f46,f0, f2, f4, f6)
STORE_JUMP(o0, f48, 43f)
2: FREG_FROB(f38,f40,f42,f44,f46,f0, f2, f4, f6)
STORE_SYNC(o0, f48)
FREG_FROB(f6, f8, f10,f12,f14,f16,f18,f20,f22)
STORE_JUMP(o0, f48, 51f)
3: FREG_FROB(f6, f8, f10,f12,f14,f16,f18,f20,f22)
STORE_SYNC(o0, f48)
FREG_FROB(f22,f24,f26,f28,f30,f32,f34,f36,f38)
STORE_JUMP(o0, f48, 59f)
1: FREG_FROB(f8, f10,f12,f14,f16,f18,f20,f22,f24)
LOOP_CHUNK1(o1, o0, GLOBAL_SPARE, 1f)
FREG_FROB(f24,f26,f28,f30,f32,f34,f36,f38,f40)
LOOP_CHUNK2(o1, o0, GLOBAL_SPARE, 2f)
FREG_FROB(f40,f42,f44,f46,f0, f2, f4, f6, f8)
LOOP_CHUNK3(o1, o0, GLOBAL_SPARE, 3f)
ba,pt %xcc, 1b+4
faligndata %f8, %f10, %f48
1: FREG_FROB(f24,f26,f28,f30,f32,f34,f36,f38,f40)
STORE_SYNC(o0, f48)
FREG_FROB(f40,f42,f44,f46,f0, f2, f4, f6, f8)
STORE_JUMP(o0, f48, 44f)
2: FREG_FROB(f40,f42,f44,f46,f0, f2, f4, f6, f8)
STORE_SYNC(o0, f48)
FREG_FROB(f8, f10,f12,f14,f16,f18,f20,f22,f24)
STORE_JUMP(o0, f48, 52f)
3: FREG_FROB(f8, f10,f12,f14,f16,f18,f20,f22,f24)
STORE_SYNC(o0, f48)
FREG_FROB(f24,f26,f28,f30,f32,f34,f36,f38,f40)
STORE_JUMP(o0, f48, 60f)
1: FREG_FROB(f10,f12,f14,f16,f18,f20,f22,f24,f26)
LOOP_CHUNK1(o1, o0, GLOBAL_SPARE, 1f)
FREG_FROB(f26,f28,f30,f32,f34,f36,f38,f40,f42)
LOOP_CHUNK2(o1, o0, GLOBAL_SPARE, 2f)
FREG_FROB(f42,f44,f46,f0, f2, f4, f6, f8, f10)
LOOP_CHUNK3(o1, o0, GLOBAL_SPARE, 3f)
ba,pt %xcc, 1b+4
faligndata %f10, %f12, %f48
1: FREG_FROB(f26,f28,f30,f32,f34,f36,f38,f40,f42)
STORE_SYNC(o0, f48)
FREG_FROB(f42,f44,f46,f0, f2, f4, f6, f8, f10)
STORE_JUMP(o0, f48, 45f)
2: FREG_FROB(f42,f44,f46,f0, f2, f4, f6, f8, f10)
STORE_SYNC(o0, f48)
FREG_FROB(f10,f12,f14,f16,f18,f20,f22,f24,f26)
STORE_JUMP(o0, f48, 53f)
3: FREG_FROB(f10,f12,f14,f16,f18,f20,f22,f24,f26)
STORE_SYNC(o0, f48)
FREG_FROB(f26,f28,f30,f32,f34,f36,f38,f40,f42)
STORE_JUMP(o0, f48, 61f)
1: FREG_FROB(f12,f14,f16,f18,f20,f22,f24,f26,f28)
LOOP_CHUNK1(o1, o0, GLOBAL_SPARE, 1f)
FREG_FROB(f28,f30,f32,f34,f36,f38,f40,f42,f44)
LOOP_CHUNK2(o1, o0, GLOBAL_SPARE, 2f)
FREG_FROB(f44,f46,f0, f2, f4, f6, f8, f10,f12)
LOOP_CHUNK3(o1, o0, GLOBAL_SPARE, 3f)
ba,pt %xcc, 1b+4
faligndata %f12, %f14, %f48
1: FREG_FROB(f28,f30,f32,f34,f36,f38,f40,f42,f44)
STORE_SYNC(o0, f48)
FREG_FROB(f44,f46,f0, f2, f4, f6, f8, f10,f12)
STORE_JUMP(o0, f48, 46f)
2: FREG_FROB(f44,f46,f0, f2, f4, f6, f8, f10,f12)
STORE_SYNC(o0, f48)
FREG_FROB(f12,f14,f16,f18,f20,f22,f24,f26,f28)
STORE_JUMP(o0, f48, 54f)
3: FREG_FROB(f12,f14,f16,f18,f20,f22,f24,f26,f28)
STORE_SYNC(o0, f48)
FREG_FROB(f28,f30,f32,f34,f36,f38,f40,f42,f44)
STORE_JUMP(o0, f48, 62f)
1: FREG_FROB(f14,f16,f18,f20,f22,f24,f26,f28,f30)
LOOP_CHUNK1(o1, o0, GLOBAL_SPARE, 1f)
FREG_FROB(f30,f32,f34,f36,f38,f40,f42,f44,f46)
LOOP_CHUNK2(o1, o0, GLOBAL_SPARE, 2f)
FREG_FROB(f46,f0, f2, f4, f6, f8, f10,f12,f14)
LOOP_CHUNK3(o1, o0, GLOBAL_SPARE, 3f)
ba,pt %xcc, 1b+4
faligndata %f14, %f16, %f48
1: FREG_FROB(f30,f32,f34,f36,f38,f40,f42,f44,f46)
STORE_SYNC(o0, f48)
FREG_FROB(f46,f0, f2, f4, f6, f8, f10,f12,f14)
STORE_JUMP(o0, f48, 47f)
2: FREG_FROB(f46,f0, f2, f4, f6, f8, f10,f12,f14)
STORE_SYNC(o0, f48)
FREG_FROB(f14,f16,f18,f20,f22,f24,f26,f28,f30)
STORE_JUMP(o0, f48, 55f)
3: FREG_FROB(f14,f16,f18,f20,f22,f24,f26,f28,f30)
STORE_SYNC(o0, f48)
FREG_FROB(f30,f32,f34,f36,f38,f40,f42,f44,f46)
STORE_JUMP(o0, f48, 63f)
40: FINISH_VISCHUNK(o0, f0, f2, g3)
41: FINISH_VISCHUNK(o0, f2, f4, g3)
42: FINISH_VISCHUNK(o0, f4, f6, g3)
43: FINISH_VISCHUNK(o0, f6, f8, g3)
44: FINISH_VISCHUNK(o0, f8, f10, g3)
45: FINISH_VISCHUNK(o0, f10, f12, g3)
46: FINISH_VISCHUNK(o0, f12, f14, g3)
47: UNEVEN_VISCHUNK(o0, f14, f0, g3)
48: FINISH_VISCHUNK(o0, f16, f18, g3)
49: FINISH_VISCHUNK(o0, f18, f20, g3)
50: FINISH_VISCHUNK(o0, f20, f22, g3)
51: FINISH_VISCHUNK(o0, f22, f24, g3)
52: FINISH_VISCHUNK(o0, f24, f26, g3)
53: FINISH_VISCHUNK(o0, f26, f28, g3)
54: FINISH_VISCHUNK(o0, f28, f30, g3)
55: UNEVEN_VISCHUNK(o0, f30, f0, g3)
56: FINISH_VISCHUNK(o0, f32, f34, g3)
57: FINISH_VISCHUNK(o0, f34, f36, g3)
58: FINISH_VISCHUNK(o0, f36, f38, g3)
59: FINISH_VISCHUNK(o0, f38, f40, g3)
60: FINISH_VISCHUNK(o0, f40, f42, g3)
61: FINISH_VISCHUNK(o0, f42, f44, g3)
62: FINISH_VISCHUNK(o0, f44, f46, g3)
63: UNEVEN_VISCHUNK_LAST(o0, f46, f0, g3)
93: EX_LD(LOAD(ldd, %o1, %f2))
add %o1, 8, %o1
subcc %g3, 8, %g3
faligndata %f0, %f2, %f8
EX_ST(STORE(std, %f8, %o0))
bl,pn %xcc, 95f
add %o0, 8, %o0
EX_LD(LOAD(ldd, %o1, %f0))
add %o1, 8, %o1
subcc %g3, 8, %g3
faligndata %f2, %f0, %f8
EX_ST(STORE(std, %f8, %o0))
bge,pt %xcc, 93b
add %o0, 8, %o0
95: brz,pt %o2, 2f
mov %g1, %o1
1: EX_LD(LOAD(ldub, %o1, %o3))
add %o1, 1, %o1
subcc %o2, 1, %o2
EX_ST(STORE(stb, %o3, %o0))
bne,pt %xcc, 1b
add %o0, 1, %o0
2: membar #StoreLoad | #StoreStore
VISExit
retl
mov EX_RETVAL(%o4), %o0
.align 64
70: /* 16 < len <= (5 * 64) */
bne,pn %XCC, 75f
sub %o0, %o1, %o3
72: andn %o2, 0xf, %GLOBAL_SPARE
and %o2, 0xf, %o2
1: EX_LD(LOAD(ldx, %o1 + 0x00, %o5))
EX_LD(LOAD(ldx, %o1 + 0x08, %g1))
subcc %GLOBAL_SPARE, 0x10, %GLOBAL_SPARE
EX_ST(STORE(stx, %o5, %o1 + %o3))
add %o1, 0x8, %o1
EX_ST(STORE(stx, %g1, %o1 + %o3))
bgu,pt %XCC, 1b
add %o1, 0x8, %o1
73: andcc %o2, 0x8, %g0
be,pt %XCC, 1f
nop
EX_LD(LOAD(ldx, %o1, %o5))
sub %o2, 0x8, %o2
EX_ST(STORE(stx, %o5, %o1 + %o3))
add %o1, 0x8, %o1
1: andcc %o2, 0x4, %g0
be,pt %XCC, 1f
nop
EX_LD(LOAD(lduw, %o1, %o5))
sub %o2, 0x4, %o2
EX_ST(STORE(stw, %o5, %o1 + %o3))
add %o1, 0x4, %o1
1: cmp %o2, 0
be,pt %XCC, 85f
nop
ba,pt %xcc, 90f
nop
75: andcc %o0, 0x7, %g1
sub %g1, 0x8, %g1
be,pn %icc, 2f
sub %g0, %g1, %g1
sub %o2, %g1, %o2
1: EX_LD(LOAD(ldub, %o1, %o5))
subcc %g1, 1, %g1
EX_ST(STORE(stb, %o5, %o1 + %o3))
bgu,pt %icc, 1b
add %o1, 1, %o1
2: add %o1, %o3, %o0
andcc %o1, 0x7, %g1
bne,pt %icc, 8f
sll %g1, 3, %g1
cmp %o2, 16
bgeu,pt %icc, 72b
nop
ba,a,pt %xcc, 73b
8: mov 64, %o3
andn %o1, 0x7, %o1
EX_LD(LOAD(ldx, %o1, %g2))
sub %o3, %g1, %o3
andn %o2, 0x7, %GLOBAL_SPARE
sllx %g2, %g1, %g2
1: EX_LD(LOAD(ldx, %o1 + 0x8, %g3))
subcc %GLOBAL_SPARE, 0x8, %GLOBAL_SPARE
add %o1, 0x8, %o1
srlx %g3, %o3, %o5
or %o5, %g2, %o5
EX_ST(STORE(stx, %o5, %o0))
add %o0, 0x8, %o0
bgu,pt %icc, 1b
sllx %g3, %g1, %g2
srl %g1, 3, %g1
andcc %o2, 0x7, %o2
be,pn %icc, 85f
add %o1, %g1, %o1
ba,pt %xcc, 90f
sub %o0, %o1, %o3
.align 64
80: /* 0 < len <= 16 */
andcc %o3, 0x3, %g0
bne,pn %XCC, 90f
sub %o0, %o1, %o3
1: EX_LD(LOAD(lduw, %o1, %g1))
subcc %o2, 4, %o2
EX_ST(STORE(stw, %g1, %o1 + %o3))
bgu,pt %XCC, 1b
add %o1, 4, %o1
85: retl
mov EX_RETVAL(%o4), %o0
.align 32
90: EX_LD(LOAD(ldub, %o1, %g1))
subcc %o2, 1, %o2
EX_ST(STORE(stb, %g1, %o1 + %o3))
bgu,pt %XCC, 90b
add %o1, 1, %o1
retl
mov EX_RETVAL(%o4), %o0
.size FUNC_NAME, .-FUNC_NAME
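The "rd %pc" sequence above jumps into one of the eight unrolled loop instances, and the comment notes each instance is 452 bytes long. The offset is computed without a multiply; a small C sketch of that arithmetic (function name invented, idx is the 3-bit source-alignment index held in %g2):

        static unsigned long unrolled_instance_offset(unsigned long idx)
        {
                unsigned long o3;

                o3 = (idx << 3) - idx;  /*   7 * idx */
                o3 = (o3 << 4) + idx;   /* 113 * idx */
                return o3 << 2;         /* 452 * idx, bytes past the first instance */
        }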

18
arch/sparc/lib/U3copy_from_user.S Normal file

@ -0,0 +1,18 @@
/* U3copy_from_user.S: UltraSparc-III optimized copy from userspace.
*
* Copyright (C) 1999, 2000, 2004 David S. Miller (davem@redhat.com)
*/
#define EX_LD(x) \
98: x; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_one; \
.text; \
.align 4;
#define FUNC_NAME U3copy_from_user
#define LOAD(type,addr,dest) type##a [addr] %asi, dest
#define EX_RETVAL(x) 0
#include "U3memcpy.S"

29
arch/sparc/lib/U3copy_to_user.S Normal file

@ -0,0 +1,29 @@
/* U3copy_to_user.S: UltraSparc-III optimized copy to userspace.
*
* Copyright (C) 1999, 2000, 2004 David S. Miller (davem@redhat.com)
*/
#define EX_ST(x) \
98: x; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_one; \
.text; \
.align 4;
#define FUNC_NAME U3copy_to_user
#define STORE(type,src,addr) type##a src, [addr] ASI_AIUS
#define STORE_BLK(src,addr) stda src, [addr] ASI_BLK_AIUS
#define EX_RETVAL(x) 0
/* Writing to %asi is _expensive_ so we hardcode it.
* Reading %asi to check for KERNEL_DS is comparatively
* cheap.
*/
#define PREAMBLE \
rd %asi, %g1; \
cmp %g1, ASI_AIUS; \
bne,pn %icc, ___copy_in_user; \
nop; \

#include "U3memcpy.S"

422
arch/sparc/lib/U3memcpy.S Normal file

@ -0,0 +1,422 @@
/* U3memcpy.S: UltraSparc-III optimized memcpy.
*
* Copyright (C) 1999, 2000, 2004 David S. Miller (davem@redhat.com)
*/
#ifdef __KERNEL__
#include <asm/visasm.h>
#include <asm/asi.h>
#define GLOBAL_SPARE %g7
#else
#define ASI_BLK_P 0xf0
#define FPRS_FEF 0x04
#ifdef MEMCPY_DEBUG
#define VISEntryHalf rd %fprs, %o5; wr %g0, FPRS_FEF, %fprs; \
clr %g1; clr %g2; clr %g3; subcc %g0, %g0, %g0;
#define VISExitHalf and %o5, FPRS_FEF, %o5; wr %o5, 0x0, %fprs
#else
#define VISEntryHalf rd %fprs, %o5; wr %g0, FPRS_FEF, %fprs
#define VISExitHalf and %o5, FPRS_FEF, %o5; wr %o5, 0x0, %fprs
#endif
#define GLOBAL_SPARE %g5
#endif
#ifndef EX_LD
#define EX_LD(x) x
#endif
#ifndef EX_ST
#define EX_ST(x) x
#endif
#ifndef EX_RETVAL
#define EX_RETVAL(x) x
#endif
#ifndef LOAD
#define LOAD(type,addr,dest) type [addr], dest
#endif
#ifndef STORE
#define STORE(type,src,addr) type src, [addr]
#endif
#ifndef STORE_BLK
#define STORE_BLK(src,addr) stda src, [addr] ASI_BLK_P
#endif
#ifndef FUNC_NAME
#define FUNC_NAME U3memcpy
#endif
#ifndef PREAMBLE
#define PREAMBLE
#endif
#ifndef XCC
#define XCC xcc
#endif
.register %g2,#scratch
.register %g3,#scratch
/* Special/non-trivial issues of this code:
*
* 1) %o5 is preserved from VISEntryHalf to VISExitHalf
* 2) Only low 32 FPU registers are used so that only the
* lower half of the FPU register set is dirtied by this
* code. This is especially important in the kernel.
* 3) This code never prefetches cachelines past the end
* of the source buffer.
*/
.text
.align 64
/* The cheetah's flexible spine, oversized liver, enlarged heart,
* slender muscular body, and claws make it the swiftest hunter
* in Africa and the fastest animal on land. Can reach speeds
* of up to 2.4GB per second.
*/
.globl FUNC_NAME
.type FUNC_NAME,#function
FUNC_NAME: /* %o0=dst, %o1=src, %o2=len */
srlx %o2, 31, %g2
cmp %g2, 0
tne %xcc, 5
PREAMBLE
mov %o0, %o4
cmp %o2, 0
be,pn %XCC, 85f
or %o0, %o1, %o3
cmp %o2, 16
blu,a,pn %XCC, 80f
or %o3, %o2, %o3
cmp %o2, (3 * 64)
blu,pt %XCC, 70f
andcc %o3, 0x7, %g0
/* Clobbers o5/g1/g2/g3/g7/icc/xcc. We must preserve
* o5 from here until we hit VISExitHalf.
*/
VISEntryHalf
/* Is 'dst' already aligned on a 64-byte boundary? */
andcc %o0, 0x3f, %g2
be,pt %XCC, 2f
/* Compute abs((dst & 0x3f) - 0x40) into %g2. This is the number
* of bytes to copy to make 'dst' 64-byte aligned. We pre-
* subtract this from 'len'.
*/
sub %o0, %o1, GLOBAL_SPARE
sub %g2, 0x40, %g2
sub %g0, %g2, %g2
sub %o2, %g2, %o2
andcc %g2, 0x7, %g1
be,pt %icc, 2f
and %g2, 0x38, %g2
1: subcc %g1, 0x1, %g1
EX_LD(LOAD(ldub, %o1 + 0x00, %o3))
EX_ST(STORE(stb, %o3, %o1 + GLOBAL_SPARE))
bgu,pt %XCC, 1b
add %o1, 0x1, %o1
add %o1, GLOBAL_SPARE, %o0
2: cmp %g2, 0x0
and %o1, 0x7, %g1
be,pt %icc, 3f
alignaddr %o1, %g0, %o1
EX_LD(LOAD(ldd, %o1, %f4))
1: EX_LD(LOAD(ldd, %o1 + 0x8, %f6))
add %o1, 0x8, %o1
subcc %g2, 0x8, %g2
faligndata %f4, %f6, %f0
EX_ST(STORE(std, %f0, %o0))
be,pn %icc, 3f
add %o0, 0x8, %o0
EX_LD(LOAD(ldd, %o1 + 0x8, %f4))
add %o1, 0x8, %o1
subcc %g2, 0x8, %g2
faligndata %f6, %f4, %f2
EX_ST(STORE(std, %f2, %o0))
bne,pt %icc, 1b
add %o0, 0x8, %o0
3: LOAD(prefetch, %o1 + 0x000, #one_read)
LOAD(prefetch, %o1 + 0x040, #one_read)
andn %o2, (0x40 - 1), GLOBAL_SPARE
LOAD(prefetch, %o1 + 0x080, #one_read)
LOAD(prefetch, %o1 + 0x0c0, #one_read)
LOAD(prefetch, %o1 + 0x100, #one_read)
EX_LD(LOAD(ldd, %o1 + 0x000, %f0))
LOAD(prefetch, %o1 + 0x140, #one_read)
EX_LD(LOAD(ldd, %o1 + 0x008, %f2))
LOAD(prefetch, %o1 + 0x180, #one_read)
EX_LD(LOAD(ldd, %o1 + 0x010, %f4))
LOAD(prefetch, %o1 + 0x1c0, #one_read)
faligndata %f0, %f2, %f16
EX_LD(LOAD(ldd, %o1 + 0x018, %f6))
faligndata %f2, %f4, %f18
EX_LD(LOAD(ldd, %o1 + 0x020, %f8))
faligndata %f4, %f6, %f20
EX_LD(LOAD(ldd, %o1 + 0x028, %f10))
faligndata %f6, %f8, %f22
EX_LD(LOAD(ldd, %o1 + 0x030, %f12))
faligndata %f8, %f10, %f24
EX_LD(LOAD(ldd, %o1 + 0x038, %f14))
faligndata %f10, %f12, %f26
EX_LD(LOAD(ldd, %o1 + 0x040, %f0))
subcc GLOBAL_SPARE, 0x80, GLOBAL_SPARE
add %o1, 0x40, %o1
bgu,pt %XCC, 1f
srl GLOBAL_SPARE, 6, %o3
ba,pt %xcc, 2f
nop
.align 64
1:
EX_LD(LOAD(ldd, %o1 + 0x008, %f2))
faligndata %f12, %f14, %f28
EX_LD(LOAD(ldd, %o1 + 0x010, %f4))
faligndata %f14, %f0, %f30
EX_ST(STORE_BLK(%f16, %o0))
EX_LD(LOAD(ldd, %o1 + 0x018, %f6))
faligndata %f0, %f2, %f16
add %o0, 0x40, %o0
EX_LD(LOAD(ldd, %o1 + 0x020, %f8))
faligndata %f2, %f4, %f18
EX_LD(LOAD(ldd, %o1 + 0x028, %f10))
faligndata %f4, %f6, %f20
EX_LD(LOAD(ldd, %o1 + 0x030, %f12))
subcc %o3, 0x01, %o3
faligndata %f6, %f8, %f22
EX_LD(LOAD(ldd, %o1 + 0x038, %f14))
faligndata %f8, %f10, %f24
EX_LD(LOAD(ldd, %o1 + 0x040, %f0))
LOAD(prefetch, %o1 + 0x1c0, #one_read)
faligndata %f10, %f12, %f26
bg,pt %XCC, 1b
add %o1, 0x40, %o1
/* Finally we copy the last full 64-byte block. */
2:
EX_LD(LOAD(ldd, %o1 + 0x008, %f2))
faligndata %f12, %f14, %f28
EX_LD(LOAD(ldd, %o1 + 0x010, %f4))
faligndata %f14, %f0, %f30
EX_ST(STORE_BLK(%f16, %o0))
EX_LD(LOAD(ldd, %o1 + 0x018, %f6))
faligndata %f0, %f2, %f16
EX_LD(LOAD(ldd, %o1 + 0x020, %f8))
faligndata %f2, %f4, %f18
EX_LD(LOAD(ldd, %o1 + 0x028, %f10))
faligndata %f4, %f6, %f20
EX_LD(LOAD(ldd, %o1 + 0x030, %f12))
faligndata %f6, %f8, %f22
EX_LD(LOAD(ldd, %o1 + 0x038, %f14))
faligndata %f8, %f10, %f24
cmp %g1, 0
be,pt %XCC, 1f
add %o0, 0x40, %o0
EX_LD(LOAD(ldd, %o1 + 0x040, %f0))
1: faligndata %f10, %f12, %f26
faligndata %f12, %f14, %f28
faligndata %f14, %f0, %f30
EX_ST(STORE_BLK(%f16, %o0))
add %o0, 0x40, %o0
add %o1, 0x40, %o1
membar #Sync
/* Now we copy the (len modulo 64) bytes at the end.
* Note how we borrow the %f0 loaded above.
*
* Also notice how this code is careful not to perform a
* load past the end of the src buffer.
*/
and %o2, 0x3f, %o2
andcc %o2, 0x38, %g2
be,pn %XCC, 2f
subcc %g2, 0x8, %g2
be,pn %XCC, 2f
cmp %g1, 0
sub %o2, %g2, %o2
be,a,pt %XCC, 1f
EX_LD(LOAD(ldd, %o1 + 0x00, %f0))
1: EX_LD(LOAD(ldd, %o1 + 0x08, %f2))
add %o1, 0x8, %o1
subcc %g2, 0x8, %g2
faligndata %f0, %f2, %f8
EX_ST(STORE(std, %f8, %o0))
be,pn %XCC, 2f
add %o0, 0x8, %o0
EX_LD(LOAD(ldd, %o1 + 0x08, %f0))
add %o1, 0x8, %o1
subcc %g2, 0x8, %g2
faligndata %f2, %f0, %f8
EX_ST(STORE(std, %f8, %o0))
bne,pn %XCC, 1b
add %o0, 0x8, %o0
/* If anything is left, we copy it one byte at a time.
* Note that %g1 is (src & 0x3) saved above before the
* alignaddr was performed.
*/
2:
cmp %o2, 0
add %o1, %g1, %o1
VISExitHalf
be,pn %XCC, 85f
sub %o0, %o1, %o3
andcc %g1, 0x7, %g0
bne,pn %icc, 90f
andcc %o2, 0x8, %g0
be,pt %icc, 1f
nop
EX_LD(LOAD(ldx, %o1, %o5))
EX_ST(STORE(stx, %o5, %o1 + %o3))
add %o1, 0x8, %o1
1: andcc %o2, 0x4, %g0
be,pt %icc, 1f
nop
EX_LD(LOAD(lduw, %o1, %o5))
EX_ST(STORE(stw, %o5, %o1 + %o3))
add %o1, 0x4, %o1
1: andcc %o2, 0x2, %g0
be,pt %icc, 1f
nop
EX_LD(LOAD(lduh, %o1, %o5))
EX_ST(STORE(sth, %o5, %o1 + %o3))
add %o1, 0x2, %o1
1: andcc %o2, 0x1, %g0
be,pt %icc, 85f
nop
EX_LD(LOAD(ldub, %o1, %o5))
ba,pt %xcc, 85f
EX_ST(STORE(stb, %o5, %o1 + %o3))
.align 64
70: /* 16 < len <= 64 */
bne,pn %XCC, 75f
sub %o0, %o1, %o3
72:
andn %o2, 0xf, GLOBAL_SPARE
and %o2, 0xf, %o2
1: subcc GLOBAL_SPARE, 0x10, GLOBAL_SPARE
EX_LD(LOAD(ldx, %o1 + 0x00, %o5))
EX_LD(LOAD(ldx, %o1 + 0x08, %g1))
EX_ST(STORE(stx, %o5, %o1 + %o3))
add %o1, 0x8, %o1
EX_ST(STORE(stx, %g1, %o1 + %o3))
bgu,pt %XCC, 1b
add %o1, 0x8, %o1
73: andcc %o2, 0x8, %g0
be,pt %XCC, 1f
nop
sub %o2, 0x8, %o2
EX_LD(LOAD(ldx, %o1, %o5))
EX_ST(STORE(stx, %o5, %o1 + %o3))
add %o1, 0x8, %o1
1: andcc %o2, 0x4, %g0
be,pt %XCC, 1f
nop
sub %o2, 0x4, %o2
EX_LD(LOAD(lduw, %o1, %o5))
EX_ST(STORE(stw, %o5, %o1 + %o3))
add %o1, 0x4, %o1
1: cmp %o2, 0
be,pt %XCC, 85f
nop
ba,pt %xcc, 90f
nop
75:
andcc %o0, 0x7, %g1
sub %g1, 0x8, %g1
be,pn %icc, 2f
sub %g0, %g1, %g1
sub %o2, %g1, %o2
1: subcc %g1, 1, %g1
EX_LD(LOAD(ldub, %o1, %o5))
EX_ST(STORE(stb, %o5, %o1 + %o3))
bgu,pt %icc, 1b
add %o1, 1, %o1
2: add %o1, %o3, %o0
andcc %o1, 0x7, %g1
bne,pt %icc, 8f
sll %g1, 3, %g1
cmp %o2, 16
bgeu,pt %icc, 72b
nop
ba,a,pt %xcc, 73b
8: mov 64, %o3
andn %o1, 0x7, %o1
EX_LD(LOAD(ldx, %o1, %g2))
sub %o3, %g1, %o3
andn %o2, 0x7, GLOBAL_SPARE
sllx %g2, %g1, %g2
1: EX_LD(LOAD(ldx, %o1 + 0x8, %g3))
subcc GLOBAL_SPARE, 0x8, GLOBAL_SPARE
add %o1, 0x8, %o1
srlx %g3, %o3, %o5
or %o5, %g2, %o5
EX_ST(STORE(stx, %o5, %o0))
add %o0, 0x8, %o0
bgu,pt %icc, 1b
sllx %g3, %g1, %g2
srl %g1, 3, %g1
andcc %o2, 0x7, %o2
be,pn %icc, 85f
add %o1, %g1, %o1
ba,pt %xcc, 90f
sub %o0, %o1, %o3
.align 64
80: /* 0 < len <= 16 */
andcc %o3, 0x3, %g0
bne,pn %XCC, 90f
sub %o0, %o1, %o3
1:
subcc %o2, 4, %o2
EX_LD(LOAD(lduw, %o1, %g1))
EX_ST(STORE(stw, %g1, %o1 + %o3))
bgu,pt %XCC, 1b
add %o1, 4, %o1
85: retl
mov EX_RETVAL(%o4), %o0
.align 32
90:
subcc %o2, 1, %o2
EX_LD(LOAD(ldub, %o1, %g1))
EX_ST(STORE(stb, %g1, %o1 + %o3))
bgu,pt %XCC, 90b
add %o1, 1, %o1
retl
mov EX_RETVAL(%o4), %o0
.size FUNC_NAME, .-FUNC_NAME
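Both U1memcpy and U3memcpy carry the comment "Compute abs((dst & 0x3f) - 0x40)": when the destination is not yet 64-byte aligned, that many head bytes are copied first, split into a byte-at-a-time part and an 8-byte part. A hedged C sketch of just that arithmetic (the names are mine):

        static void split_alignment_head(unsigned long dst, unsigned long *head,
                                         unsigned long *byte_part, unsigned long *dword_part)
        {
                unsigned long mis = dst & 0x3f;         /* %g2 before the fixup */

                *head = mis ? 0x40 - mis : 0;           /* bytes up to the 64-byte boundary   */
                *byte_part = *head & 0x7;               /* copied with ldub/stb               */
                *dword_part = *head & 0x38;             /* copied with ldd/std via faligndata */
        }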

33
arch/sparc/lib/U3patch.S Normal file

@ -0,0 +1,33 @@
/* U3patch.S: Patch Ultra-I routines with Ultra-III variant.
*
* Copyright (C) 2004 David S. Miller <davem@redhat.com>
*/
#define BRANCH_ALWAYS 0x10680000
#define NOP 0x01000000
#define ULTRA3_DO_PATCH(OLD, NEW) \
sethi %hi(NEW), %g1; \
or %g1, %lo(NEW), %g1; \
sethi %hi(OLD), %g2; \
or %g2, %lo(OLD), %g2; \
sub %g1, %g2, %g1; \
sethi %hi(BRANCH_ALWAYS), %g3; \
sll %g1, 11, %g1; \
srl %g1, 11 + 2, %g1; \
or %g3, %lo(BRANCH_ALWAYS), %g3; \
or %g3, %g1, %g3; \
stw %g3, [%g2]; \
sethi %hi(NOP), %g3; \
or %g3, %lo(NOP), %g3; \
stw %g3, [%g2 + 0x4]; \
flush %g2;
.globl cheetah_patch_copyops
.type cheetah_patch_copyops,#function
cheetah_patch_copyops:
ULTRA3_DO_PATCH(memcpy, U3memcpy)
ULTRA3_DO_PATCH(___copy_from_user, U3copy_from_user)
ULTRA3_DO_PATCH(___copy_to_user, U3copy_to_user)
retl
nop
.size cheetah_patch_copyops,.-cheetah_patch_copyops

144
arch/sparc/lib/VISsave.S Normal file

@ -0,0 +1,144 @@
/*
* VISsave.S: Code for saving FPU register state for
* VIS routines. One should not call this directly,
* but use macros provided in <asm/visasm.h>.
*
* Copyright (C) 1998 Jakub Jelinek (jj@ultra.linux.cz)
*/
#include <asm/asi.h>
#include <asm/page.h>
#include <asm/ptrace.h>
#include <asm/visasm.h>
#include <asm/thread_info.h>
.text
.globl VISenter, VISenterhalf
/* On entry: %o5=current FPRS value, %g7 is callers address */
/* May clobber %o5, %g1, %g2, %g3, %g7, %icc, %xcc */
/* Nothing special need be done here to handle preemption; this
* FPU save/restore mechanism is already preemption safe.
*/
.align 32
VISenter:
ldub [%g6 + TI_FPDEPTH], %g1
brnz,a,pn %g1, 1f
cmp %g1, 1
stb %g0, [%g6 + TI_FPSAVED]
stx %fsr, [%g6 + TI_XFSR]
9: jmpl %g7 + %g0, %g0
nop
1: bne,pn %icc, 2f
srl %g1, 1, %g1
vis1: ldub [%g6 + TI_FPSAVED], %g3
stx %fsr, [%g6 + TI_XFSR]
or %g3, %o5, %g3
stb %g3, [%g6 + TI_FPSAVED]
rd %gsr, %g3
clr %g1
ba,pt %xcc, 3f
stx %g3, [%g6 + TI_GSR]
2: add %g6, %g1, %g3
cmp %o5, FPRS_DU
be,pn %icc, 6f
sll %g1, 3, %g1
stb %o5, [%g3 + TI_FPSAVED]
rd %gsr, %g2
add %g6, %g1, %g3
stx %g2, [%g3 + TI_GSR]
add %g6, %g1, %g2
stx %fsr, [%g2 + TI_XFSR]
sll %g1, 5, %g1
3: andcc %o5, FPRS_DL|FPRS_DU, %g0
be,pn %icc, 9b
add %g6, TI_FPREGS, %g2
andcc %o5, FPRS_DL, %g0
be,pn %icc, 4f
add %g6, TI_FPREGS+0x40, %g3
membar #Sync
stda %f0, [%g2 + %g1] ASI_BLK_P
stda %f16, [%g3 + %g1] ASI_BLK_P
membar #Sync
andcc %o5, FPRS_DU, %g0
be,pn %icc, 5f
4: add %g1, 128, %g1
membar #Sync
stda %f32, [%g2 + %g1] ASI_BLK_P
stda %f48, [%g3 + %g1] ASI_BLK_P
5: membar #Sync
ba,pt %xcc, 80f
nop
.align 32
80: jmpl %g7 + %g0, %g0
nop
6: ldub [%g3 + TI_FPSAVED], %o5
or %o5, FPRS_DU, %o5
add %g6, TI_FPREGS+0x80, %g2
stb %o5, [%g3 + TI_FPSAVED]
sll %g1, 5, %g1
add %g6, TI_FPREGS+0xc0, %g3
wr %g0, FPRS_FEF, %fprs
membar #Sync
stda %f32, [%g2 + %g1] ASI_BLK_P
stda %f48, [%g3 + %g1] ASI_BLK_P
membar #Sync
ba,pt %xcc, 80f
nop
.align 32
80: jmpl %g7 + %g0, %g0
nop
.align 32
VISenterhalf:
ldub [%g6 + TI_FPDEPTH], %g1
brnz,a,pn %g1, 1f
cmp %g1, 1
stb %g0, [%g6 + TI_FPSAVED]
stx %fsr, [%g6 + TI_XFSR]
clr %o5
jmpl %g7 + %g0, %g0
wr %g0, FPRS_FEF, %fprs
1: bne,pn %icc, 2f
srl %g1, 1, %g1
ba,pt %xcc, vis1
sub %g7, 8, %g7
2: addcc %g6, %g1, %g3
sll %g1, 3, %g1
andn %o5, FPRS_DU, %g2
stb %g2, [%g3 + TI_FPSAVED]
rd %gsr, %g2
add %g6, %g1, %g3
stx %g2, [%g3 + TI_GSR]
add %g6, %g1, %g2
stx %fsr, [%g2 + TI_XFSR]
sll %g1, 5, %g1
3: andcc %o5, FPRS_DL, %g0
be,pn %icc, 4f
add %g6, TI_FPREGS, %g2
add %g6, TI_FPREGS+0x40, %g3
membar #Sync
stda %f0, [%g2 + %g1] ASI_BLK_P
stda %f16, [%g3 + %g1] ASI_BLK_P
membar #Sync
ba,pt %xcc, 4f
nop
.align 32
4: and %o5, FPRS_DU, %o5
jmpl %g7 + %g0, %g0
wr %o5, FPRS_FEF, %fprs

35
arch/sparc/lib/ashldi3.S Normal file
View file

@ -0,0 +1,35 @@
/*
* ashldi3.S: GCC emits these for certain drivers playing
* with long longs.
*
* Copyright (C) 1999 David S. Miller (davem@redhat.com)
*/
#include <linux/linkage.h>
.text
ENTRY(__ashldi3)
cmp %o2, 0
be 9f
mov 0x20, %g2
sub %g2, %o2, %g2
cmp %g2, 0
bg 7f
sll %o0, %o2, %g3
neg %g2
clr %o5
b 8f
sll %o1, %g2, %o4
7:
srl %o1, %g2, %g2
sll %o1, %o2, %o5
or %g3, %g2, %o4
8:
mov %o4, %o0
mov %o5, %o1
9:
retl
nop
ENDPROC(__ashldi3)

37
arch/sparc/lib/ashrdi3.S Normal file
View file

@ -0,0 +1,37 @@
/*
* ashrdi3.S: The filesystem code creates all kinds of references to
* this little routine on the sparc with gcc.
*
* Copyright (C) 1995 David S. Miller (davem@caip.rutgers.edu)
*/
#include <linux/linkage.h>
.text
ENTRY(__ashrdi3)
tst %o2
be 3f
or %g0, 32, %g2
sub %g2, %o2, %g2
tst %g2
bg 1f
sra %o0, %o2, %o4
sra %o0, 31, %o4
sub %g0, %g2, %g2
ba 2f
sra %o0, %g2, %o5
1:
sll %o0, %g2, %g3
srl %o1, %o2, %g2
or %g2, %g3, %o5
2:
or %g0, %o4, %o0
or %g0, %o5, %o1
3:
jmpl %o7 + 8, %g0
nop
ENDPROC(__ashrdi3)

166
arch/sparc/lib/atomic32.c Normal file
View file

@ -0,0 +1,166 @@
/*
* atomic32.c: 32-bit atomic_t implementation
*
* Copyright (C) 2004 Keith M Wesolowski
* Copyright (C) 2007 Kyle McMartin
*
* Based on asm-parisc/atomic.h Copyright (C) 2000 Philipp Rumpf
*/
#include <linux/atomic.h>
#include <linux/spinlock.h>
#include <linux/module.h>
#ifdef CONFIG_SMP
#define ATOMIC_HASH_SIZE 4
#define ATOMIC_HASH(a) (&__atomic_hash[(((unsigned long)a)>>8) & (ATOMIC_HASH_SIZE-1)])
spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] = {
[0 ... (ATOMIC_HASH_SIZE-1)] = __SPIN_LOCK_UNLOCKED(__atomic_hash)
};
#else /* SMP */
static DEFINE_SPINLOCK(dummy);
#define ATOMIC_HASH_SIZE 1
#define ATOMIC_HASH(a) (&dummy)
#endif /* SMP */
#define ATOMIC_OP(op, cop) \
int atomic_##op##_return(int i, atomic_t *v) \
{ \
int ret; \
unsigned long flags; \
spin_lock_irqsave(ATOMIC_HASH(v), flags); \
\
ret = (v->counter cop i); \
\
spin_unlock_irqrestore(ATOMIC_HASH(v), flags); \
return ret; \
} \
EXPORT_SYMBOL(atomic_##op##_return);
ATOMIC_OP(add, +=)
#undef ATOMIC_OP
int atomic_xchg(atomic_t *v, int new)
{
int ret;
unsigned long flags;
spin_lock_irqsave(ATOMIC_HASH(v), flags);
ret = v->counter;
v->counter = new;
spin_unlock_irqrestore(ATOMIC_HASH(v), flags);
return ret;
}
EXPORT_SYMBOL(atomic_xchg);
int atomic_cmpxchg(atomic_t *v, int old, int new)
{
int ret;
unsigned long flags;
spin_lock_irqsave(ATOMIC_HASH(v), flags);
ret = v->counter;
if (likely(ret == old))
v->counter = new;
spin_unlock_irqrestore(ATOMIC_HASH(v), flags);
return ret;
}
EXPORT_SYMBOL(atomic_cmpxchg);
int __atomic_add_unless(atomic_t *v, int a, int u)
{
int ret;
unsigned long flags;
spin_lock_irqsave(ATOMIC_HASH(v), flags);
ret = v->counter;
if (ret != u)
v->counter += a;
spin_unlock_irqrestore(ATOMIC_HASH(v), flags);
return ret;
}
EXPORT_SYMBOL(__atomic_add_unless);
/* Atomic operations are already serializing */
void atomic_set(atomic_t *v, int i)
{
unsigned long flags;
spin_lock_irqsave(ATOMIC_HASH(v), flags);
v->counter = i;
spin_unlock_irqrestore(ATOMIC_HASH(v), flags);
}
EXPORT_SYMBOL(atomic_set);
unsigned long ___set_bit(unsigned long *addr, unsigned long mask)
{
unsigned long old, flags;
spin_lock_irqsave(ATOMIC_HASH(addr), flags);
old = *addr;
*addr = old | mask;
spin_unlock_irqrestore(ATOMIC_HASH(addr), flags);
return old & mask;
}
EXPORT_SYMBOL(___set_bit);
unsigned long ___clear_bit(unsigned long *addr, unsigned long mask)
{
unsigned long old, flags;
spin_lock_irqsave(ATOMIC_HASH(addr), flags);
old = *addr;
*addr = old & ~mask;
spin_unlock_irqrestore(ATOMIC_HASH(addr), flags);
return old & mask;
}
EXPORT_SYMBOL(___clear_bit);
unsigned long ___change_bit(unsigned long *addr, unsigned long mask)
{
unsigned long old, flags;
spin_lock_irqsave(ATOMIC_HASH(addr), flags);
old = *addr;
*addr = old ^ mask;
spin_unlock_irqrestore(ATOMIC_HASH(addr), flags);
return old & mask;
}
EXPORT_SYMBOL(___change_bit);
unsigned long __cmpxchg_u32(volatile u32 *ptr, u32 old, u32 new)
{
unsigned long flags;
u32 prev;
spin_lock_irqsave(ATOMIC_HASH(ptr), flags);
if ((prev = *ptr) == old)
*ptr = new;
spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);
return (unsigned long)prev;
}
EXPORT_SYMBOL(__cmpxchg_u32);
unsigned long __xchg_u32(volatile u32 *ptr, u32 new)
{
unsigned long flags;
u32 prev;
spin_lock_irqsave(ATOMIC_HASH(ptr), flags);
prev = *ptr;
*ptr = new;
spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);
return (unsigned long)prev;
}
EXPORT_SYMBOL(__xchg_u32);
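As a reading aid for the ATOMIC_OP macro above: every 32-bit sparc atomic here is emulated by taking one of the hashed spinlocks with interrupts disabled. This is roughly what the single ATOMIC_OP(add, +=) invocation expands to (a sketch for the SMP case, reusing the file's own ATOMIC_HASH definition, not an extra definition in the file):

/* Sketch of the function generated by ATOMIC_OP(add, +=) above. */
int atomic_add_return(int i, atomic_t *v)
{
        int ret;
        unsigned long flags;

        /* All atomics hash onto ATOMIC_HASH_SIZE spinlocks by address. */
        spin_lock_irqsave(ATOMIC_HASH(v), flags);
        ret = (v->counter += i);
        spin_unlock_irqrestore(ATOMIC_HASH(v), flags);

        return ret;
}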

104
arch/sparc/lib/atomic_64.S Normal file
View file

@ -0,0 +1,104 @@
/* atomic.S: These things are too big to do inline.
*
* Copyright (C) 1999, 2007, 2012 David S. Miller (davem@davemloft.net)
*/
#include <linux/linkage.h>
#include <asm/asi.h>
#include <asm/backoff.h>
.text
/* Two versions of the atomic routines, one that
* does not return a value and does not perform
* memory barriers, and a second which returns
* a value and does the barriers.
*/
#define ATOMIC_OP(op) \
ENTRY(atomic_##op) /* %o0 = increment, %o1 = atomic_ptr */ \
BACKOFF_SETUP(%o2); \
1: lduw [%o1], %g1; \
op %g1, %o0, %g7; \
cas [%o1], %g1, %g7; \
cmp %g1, %g7; \
bne,pn %icc, BACKOFF_LABEL(2f, 1b); \
nop; \
retl; \
nop; \
2: BACKOFF_SPIN(%o2, %o3, 1b); \
ENDPROC(atomic_##op);
#define ATOMIC_OP_RETURN(op) \
ENTRY(atomic_##op##_return) /* %o0 = increment, %o1 = atomic_ptr */ \
BACKOFF_SETUP(%o2); \
1: lduw [%o1], %g1; \
op %g1, %o0, %g7; \
cas [%o1], %g1, %g7; \
cmp %g1, %g7; \
bne,pn %icc, BACKOFF_LABEL(2f, 1b); \
op %g1, %o0, %g1; \
retl; \
sra %g1, 0, %o0; \
2: BACKOFF_SPIN(%o2, %o3, 1b); \
ENDPROC(atomic_##op##_return);
#define ATOMIC_OPS(op) ATOMIC_OP(op) ATOMIC_OP_RETURN(op)
ATOMIC_OPS(add)
ATOMIC_OPS(sub)
#undef ATOMIC_OPS
#undef ATOMIC_OP_RETURN
#undef ATOMIC_OP
#define ATOMIC64_OP(op) \
ENTRY(atomic64_##op) /* %o0 = increment, %o1 = atomic_ptr */ \
BACKOFF_SETUP(%o2); \
1: ldx [%o1], %g1; \
op %g1, %o0, %g7; \
casx [%o1], %g1, %g7; \
cmp %g1, %g7; \
bne,pn %xcc, BACKOFF_LABEL(2f, 1b); \
nop; \
retl; \
nop; \
2: BACKOFF_SPIN(%o2, %o3, 1b); \
ENDPROC(atomic64_##op);
#define ATOMIC64_OP_RETURN(op) \
ENTRY(atomic64_##op##_return) /* %o0 = increment, %o1 = atomic_ptr */ \
BACKOFF_SETUP(%o2); \
1: ldx [%o1], %g1; \
op %g1, %o0, %g7; \
casx [%o1], %g1, %g7; \
cmp %g1, %g7; \
bne,pn %xcc, BACKOFF_LABEL(2f, 1b); \
nop; \
retl; \
op %g1, %o0, %o0; \
2: BACKOFF_SPIN(%o2, %o3, 1b); \
ENDPROC(atomic64_##op##_return);
#define ATOMIC64_OPS(op) ATOMIC64_OP(op) ATOMIC64_OP_RETURN(op)
ATOMIC64_OPS(add)
ATOMIC64_OPS(sub)
#undef ATOMIC64_OPS
#undef ATOMIC64_OP_RETURN
#undef ATOMIC64_OP
ENTRY(atomic64_dec_if_positive) /* %o0 = atomic_ptr */
BACKOFF_SETUP(%o2)
1: ldx [%o0], %g1
brlez,pn %g1, 3f
sub %g1, 1, %g7
casx [%o0], %g1, %g7
cmp %g1, %g7
bne,pn %xcc, BACKOFF_LABEL(2f, 1b)
nop
3: retl
sub %g1, 1, %o0
2: BACKOFF_SPIN(%o2, %o3, 1b)
ENDPROC(atomic64_dec_if_positive)
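The 64-bit routines above all follow the same compare-and-swap retry pattern: load the old value, compute old op increment, cas it back, and retry (with exponential backoff) if another CPU changed the word in between. A hedged C sketch of the loop behind atomic_add_return, where the GCC builtin __sync_val_compare_and_swap stands in for the cas instruction:

/* Sketch of the cas retry loop used by atomic_add_return above.
 * __sync_val_compare_and_swap models "cas [ptr], old, new": it returns
 * the value actually found in memory.
 */
static int atomic_add_return_sketch(int i, int *counter)
{
        int old, seen;

        do {
                old  = *counter;                                  /* lduw */
                seen = __sync_val_compare_and_swap(counter, old,
                                                   old + i);      /* cas  */
        } while (seen != old);          /* cmp; bne,pn -> retry via backoff */

        return old + i;                 /* op %g1, %o0, %g1; sra %g1, 0, %o0 */
}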

127
arch/sparc/lib/bitext.c Normal file
View file

@ -0,0 +1,127 @@
/*
* bitext.c: kernel little helper (of bit shuffling variety).
*
* Copyright (C) 2002 Pete Zaitcev <zaitcev@yahoo.com>
*
* The algorithm to search a zero bit string is geared towards its application.
* We expect a couple of fixed sizes of requests, so a rotating counter, reset
* by align size, should provide fast enough search while maintaining low
* fragmentation.
*/
#include <linux/string.h>
#include <linux/bitmap.h>
#include <asm/bitext.h>
/**
* bit_map_string_get - find and set a bit string in bit map.
* @t: the bit map.
* @len: requested string length
* @align: requested alignment
*
* Returns offset in the map or -1 if out of space.
*
* Not safe to call from an interrupt (uses spin_lock).
*/
int bit_map_string_get(struct bit_map *t, int len, int align)
{
int offset, count; /* siamese twins */
int off_new;
int align1;
int i, color;
if (t->num_colors) {
/* align is overloaded to be the page color */
color = align;
align = t->num_colors;
} else {
color = 0;
if (align == 0)
align = 1;
}
align1 = align - 1;
if ((align & align1) != 0)
BUG();
if (align < 0 || align >= t->size)
BUG();
if (len <= 0 || len > t->size)
BUG();
color &= align1;
spin_lock(&t->lock);
if (len < t->last_size)
offset = t->first_free;
else
offset = t->last_off & ~align1;
count = 0;
for (;;) {
off_new = find_next_zero_bit(t->map, t->size, offset);
off_new = ((off_new + align1) & ~align1) + color;
count += off_new - offset;
offset = off_new;
if (offset >= t->size)
offset = 0;
if (count + len > t->size) {
spin_unlock(&t->lock);
/* P3 */ printk(KERN_ERR
"bitmap out: size %d used %d off %d len %d align %d count %d\n",
t->size, t->used, offset, len, align, count);
return -1;
}
if (offset + len > t->size) {
count += t->size - offset;
offset = 0;
continue;
}
i = 0;
while (test_bit(offset + i, t->map) == 0) {
i++;
if (i == len) {
bitmap_set(t->map, offset, len);
if (offset == t->first_free)
t->first_free = find_next_zero_bit
(t->map, t->size,
t->first_free + len);
if ((t->last_off = offset + len) >= t->size)
t->last_off = 0;
t->used += len;
t->last_size = len;
spin_unlock(&t->lock);
return offset;
}
}
count += i + 1;
if ((offset += i + 1) >= t->size)
offset = 0;
}
}
void bit_map_clear(struct bit_map *t, int offset, int len)
{
int i;
if (t->used < len)
BUG(); /* Much too late to do any good, but alas... */
spin_lock(&t->lock);
for (i = 0; i < len; i++) {
if (test_bit(offset + i, t->map) == 0)
BUG();
__clear_bit(offset + i, t->map);
}
if (offset < t->first_free)
t->first_free = offset;
t->used -= len;
spin_unlock(&t->lock);
}
void bit_map_init(struct bit_map *t, unsigned long *map, int size)
{
bitmap_zero(map, size);
memset(t, 0, sizeof *t);
spin_lock_init(&t->lock);
t->map = map;
t->size = size;
}
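A brief usage sketch for the allocator above may help: the caller supplies the backing bitmap, then requests aligned bit strings and frees them again. The map size and request sizes below are made up purely for illustration.

/* Hypothetical caller of the bit_map API above (sizes are arbitrary). */
static unsigned long backing[BITS_TO_LONGS(1024)];
static struct bit_map demo_map;

static int demo(void)
{
        int off;

        bit_map_init(&demo_map, backing, 1024);

        /* Ask for 16 bits aligned to 8 bits; -1 means the map is full. */
        off = bit_map_string_get(&demo_map, 16, 8);
        if (off < 0)
                return -1;

        /* ... use bits [off, off + 16) ... */

        bit_map_clear(&demo_map, off, 16);
        return 0;
}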

130
arch/sparc/lib/bitops.S Normal file
View file

@ -0,0 +1,130 @@
/* bitops.S: Sparc64 atomic bit operations.
*
* Copyright (C) 2000, 2007 David S. Miller (davem@davemloft.net)
*/
#include <linux/linkage.h>
#include <asm/asi.h>
#include <asm/backoff.h>
.text
ENTRY(test_and_set_bit) /* %o0=nr, %o1=addr */
BACKOFF_SETUP(%o3)
srlx %o0, 6, %g1
mov 1, %o2
sllx %g1, 3, %g3
and %o0, 63, %g2
sllx %o2, %g2, %o2
add %o1, %g3, %o1
1: ldx [%o1], %g7
or %g7, %o2, %g1
casx [%o1], %g7, %g1
cmp %g7, %g1
bne,pn %xcc, BACKOFF_LABEL(2f, 1b)
and %g7, %o2, %g2
clr %o0
movrne %g2, 1, %o0
retl
nop
2: BACKOFF_SPIN(%o3, %o4, 1b)
ENDPROC(test_and_set_bit)
ENTRY(test_and_clear_bit) /* %o0=nr, %o1=addr */
BACKOFF_SETUP(%o3)
srlx %o0, 6, %g1
mov 1, %o2
sllx %g1, 3, %g3
and %o0, 63, %g2
sllx %o2, %g2, %o2
add %o1, %g3, %o1
1: ldx [%o1], %g7
andn %g7, %o2, %g1
casx [%o1], %g7, %g1
cmp %g7, %g1
bne,pn %xcc, BACKOFF_LABEL(2f, 1b)
and %g7, %o2, %g2
clr %o0
movrne %g2, 1, %o0
retl
nop
2: BACKOFF_SPIN(%o3, %o4, 1b)
ENDPROC(test_and_clear_bit)
ENTRY(test_and_change_bit) /* %o0=nr, %o1=addr */
BACKOFF_SETUP(%o3)
srlx %o0, 6, %g1
mov 1, %o2
sllx %g1, 3, %g3
and %o0, 63, %g2
sllx %o2, %g2, %o2
add %o1, %g3, %o1
1: ldx [%o1], %g7
xor %g7, %o2, %g1
casx [%o1], %g7, %g1
cmp %g7, %g1
bne,pn %xcc, BACKOFF_LABEL(2f, 1b)
and %g7, %o2, %g2
clr %o0
movrne %g2, 1, %o0
retl
nop
2: BACKOFF_SPIN(%o3, %o4, 1b)
ENDPROC(test_and_change_bit)
ENTRY(set_bit) /* %o0=nr, %o1=addr */
BACKOFF_SETUP(%o3)
srlx %o0, 6, %g1
mov 1, %o2
sllx %g1, 3, %g3
and %o0, 63, %g2
sllx %o2, %g2, %o2
add %o1, %g3, %o1
1: ldx [%o1], %g7
or %g7, %o2, %g1
casx [%o1], %g7, %g1
cmp %g7, %g1
bne,pn %xcc, BACKOFF_LABEL(2f, 1b)
nop
retl
nop
2: BACKOFF_SPIN(%o3, %o4, 1b)
ENDPROC(set_bit)
ENTRY(clear_bit) /* %o0=nr, %o1=addr */
BACKOFF_SETUP(%o3)
srlx %o0, 6, %g1
mov 1, %o2
sllx %g1, 3, %g3
and %o0, 63, %g2
sllx %o2, %g2, %o2
add %o1, %g3, %o1
1: ldx [%o1], %g7
andn %g7, %o2, %g1
casx [%o1], %g7, %g1
cmp %g7, %g1
bne,pn %xcc, BACKOFF_LABEL(2f, 1b)
nop
retl
nop
2: BACKOFF_SPIN(%o3, %o4, 1b)
ENDPROC(clear_bit)
ENTRY(change_bit) /* %o0=nr, %o1=addr */
BACKOFF_SETUP(%o3)
srlx %o0, 6, %g1
mov 1, %o2
sllx %g1, 3, %g3
and %o0, 63, %g2
sllx %o2, %g2, %o2
add %o1, %g3, %o1
1: ldx [%o1], %g7
xor %g7, %o2, %g1
casx [%o1], %g7, %g1
cmp %g7, %g1
bne,pn %xcc, BACKOFF_LABEL(2f, 1b)
nop
retl
nop
2: BACKOFF_SPIN(%o3, %o4, 1b)
ENDPROC(change_bit)
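All six routines above share the same address arithmetic ahead of the cas loop: the bit number selects a 64-bit word (nr >> 6, scaled to a byte offset) and a mask within that word (1 << (nr & 63)). A small C restatement of just that indexing, as a sketch:

#include <stdint.h>

/* Sketch of the word/mask computation done by the srlx/sllx/and/add
 * sequence at the top of test_and_set_bit and friends.
 */
static void bit_addr(unsigned long nr, uint64_t *addr,
                     uint64_t **word, uint64_t *mask)
{
        *word = addr + (nr >> 6);     /* srlx %o0, 6; sllx %g1, 3; add %o1 */
        *mask = 1ULL << (nr & 63);    /* and %o0, 63; sllx %o2, %g2, %o2   */
}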

89
arch/sparc/lib/blockops.S Normal file
View file

@ -0,0 +1,89 @@
/*
* blockops.S: Common block zero optimized routines.
*
* Copyright (C) 1996 David S. Miller (davem@caip.rutgers.edu)
*/
#include <linux/linkage.h>
#include <asm/page.h>
/* Zero out 64 bytes of memory at (buf + offset).
* Assumes %g1 contains zero.
*/
#define BLAST_BLOCK(buf, offset) \
std %g0, [buf + offset + 0x38]; \
std %g0, [buf + offset + 0x30]; \
std %g0, [buf + offset + 0x28]; \
std %g0, [buf + offset + 0x20]; \
std %g0, [buf + offset + 0x18]; \
std %g0, [buf + offset + 0x10]; \
std %g0, [buf + offset + 0x08]; \
std %g0, [buf + offset + 0x00];
/* Copy 32 bytes of memory at (src + offset) to
* (dst + offset).
*/
#define MIRROR_BLOCK(dst, src, offset, t0, t1, t2, t3, t4, t5, t6, t7) \
ldd [src + offset + 0x18], t0; \
ldd [src + offset + 0x10], t2; \
ldd [src + offset + 0x08], t4; \
ldd [src + offset + 0x00], t6; \
std t0, [dst + offset + 0x18]; \
std t2, [dst + offset + 0x10]; \
std t4, [dst + offset + 0x08]; \
std t6, [dst + offset + 0x00];
/* Profiling evidence indicates that memset() is
* commonly called for blocks of size PAGE_SIZE,
* and (2 * PAGE_SIZE) (for kernel stacks)
* and with a second arg of zero. We assume in
* all of these cases that the buffer is aligned
* on at least an 8 byte boundary.
*
* Therefore we special case them to make them
* as fast as possible.
*/
.text
ENTRY(bzero_1page)
/* NOTE: If you change the number of insns of this routine, please check
* arch/sparc/mm/hypersparc.S */
/* %o0 = buf */
or %g0, %g0, %g1
or %o0, %g0, %o1
or %g0, (PAGE_SIZE >> 8), %g2
1:
BLAST_BLOCK(%o0, 0x00)
BLAST_BLOCK(%o0, 0x40)
BLAST_BLOCK(%o0, 0x80)
BLAST_BLOCK(%o0, 0xc0)
subcc %g2, 1, %g2
bne 1b
add %o0, 0x100, %o0
retl
nop
ENDPROC(bzero_1page)
ENTRY(__copy_1page)
/* NOTE: If you change the number of insns of this routine, please check
* arch/sparc/mm/hypersparc.S */
/* %o0 = dst, %o1 = src */
or %g0, (PAGE_SIZE >> 8), %g1
1:
MIRROR_BLOCK(%o0, %o1, 0x00, %o2, %o3, %o4, %o5, %g2, %g3, %g4, %g5)
MIRROR_BLOCK(%o0, %o1, 0x20, %o2, %o3, %o4, %o5, %g2, %g3, %g4, %g5)
MIRROR_BLOCK(%o0, %o1, 0x40, %o2, %o3, %o4, %o5, %g2, %g3, %g4, %g5)
MIRROR_BLOCK(%o0, %o1, 0x60, %o2, %o3, %o4, %o5, %g2, %g3, %g4, %g5)
MIRROR_BLOCK(%o0, %o1, 0x80, %o2, %o3, %o4, %o5, %g2, %g3, %g4, %g5)
MIRROR_BLOCK(%o0, %o1, 0xa0, %o2, %o3, %o4, %o5, %g2, %g3, %g4, %g5)
MIRROR_BLOCK(%o0, %o1, 0xc0, %o2, %o3, %o4, %o5, %g2, %g3, %g4, %g5)
MIRROR_BLOCK(%o0, %o1, 0xe0, %o2, %o3, %o4, %o5, %g2, %g3, %g4, %g5)
subcc %g1, 1, %g1
add %o0, 0x100, %o0
bne 1b
add %o1, 0x100, %o1
retl
nop
ENDPROC(__copy_1page)
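In plain C, bzero_1page above amounts to clearing the page in 256-byte strides (four 64-byte BLAST_BLOCKs per iteration), PAGE_SIZE >> 8 times; __copy_1page walks the page the same way with 32-byte ldd/std copies. A minimal sketch, purely as a reading aid:

#include <string.h>

/* Reading aid only: what bzero_1page computes, ignoring the register-level
 * scheduling the assembly does (see the hypersparc.S note above).
 */
static void bzero_1page_sketch(void *buf, unsigned long page_size)
{
        unsigned long i;

        for (i = 0; i < (page_size >> 8); i++)   /* PAGE_SIZE >> 8 blocks */
                memset((char *)buf + i * 0x100, 0, 0x100);
}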

145
arch/sparc/lib/bzero.S Normal file
View file

@ -0,0 +1,145 @@
/* bzero.S: Simple prefetching memset, bzero, and clear_user
* implementations.
*
* Copyright (C) 2005 David S. Miller <davem@davemloft.net>
*/
#include <linux/linkage.h>
.text
ENTRY(memset) /* %o0=buf, %o1=pat, %o2=len */
and %o1, 0xff, %o3
mov %o2, %o1
sllx %o3, 8, %g1
or %g1, %o3, %o2
sllx %o2, 16, %g1
or %g1, %o2, %o2
sllx %o2, 32, %g1
ba,pt %xcc, 1f
or %g1, %o2, %o2
ENTRY(__bzero) /* %o0=buf, %o1=len */
clr %o2
1: mov %o0, %o3
brz,pn %o1, __bzero_done
cmp %o1, 16
bl,pn %icc, __bzero_tiny
prefetch [%o0 + 0x000], #n_writes
andcc %o0, 0x3, %g0
be,pt %icc, 2f
1: stb %o2, [%o0 + 0x00]
add %o0, 1, %o0
andcc %o0, 0x3, %g0
bne,pn %icc, 1b
sub %o1, 1, %o1
2: andcc %o0, 0x7, %g0
be,pt %icc, 3f
stw %o2, [%o0 + 0x00]
sub %o1, 4, %o1
add %o0, 4, %o0
3: and %o1, 0x38, %g1
cmp %o1, 0x40
andn %o1, 0x3f, %o4
bl,pn %icc, 5f
and %o1, 0x7, %o1
prefetch [%o0 + 0x040], #n_writes
prefetch [%o0 + 0x080], #n_writes
prefetch [%o0 + 0x0c0], #n_writes
prefetch [%o0 + 0x100], #n_writes
prefetch [%o0 + 0x140], #n_writes
4: prefetch [%o0 + 0x180], #n_writes
stx %o2, [%o0 + 0x00]
stx %o2, [%o0 + 0x08]
stx %o2, [%o0 + 0x10]
stx %o2, [%o0 + 0x18]
stx %o2, [%o0 + 0x20]
stx %o2, [%o0 + 0x28]
stx %o2, [%o0 + 0x30]
stx %o2, [%o0 + 0x38]
subcc %o4, 0x40, %o4
bne,pt %icc, 4b
add %o0, 0x40, %o0
brz,pn %g1, 6f
nop
5: stx %o2, [%o0 + 0x00]
subcc %g1, 8, %g1
bne,pt %icc, 5b
add %o0, 0x8, %o0
6: brz,pt %o1, __bzero_done
nop
__bzero_tiny:
1: stb %o2, [%o0 + 0x00]
subcc %o1, 1, %o1
bne,pt %icc, 1b
add %o0, 1, %o0
__bzero_done:
retl
mov %o3, %o0
ENDPROC(__bzero)
ENDPROC(memset)
#define EX_ST(x,y) \
98: x,y; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_o1; \
.text; \
.align 4;
ENTRY(__clear_user) /* %o0=buf, %o1=len */
brz,pn %o1, __clear_user_done
cmp %o1, 16
bl,pn %icc, __clear_user_tiny
EX_ST(prefetcha [%o0 + 0x00] %asi, #n_writes)
andcc %o0, 0x3, %g0
be,pt %icc, 2f
1: EX_ST(stba %g0, [%o0 + 0x00] %asi)
add %o0, 1, %o0
andcc %o0, 0x3, %g0
bne,pn %icc, 1b
sub %o1, 1, %o1
2: andcc %o0, 0x7, %g0
be,pt %icc, 3f
EX_ST(stwa %g0, [%o0 + 0x00] %asi)
sub %o1, 4, %o1
add %o0, 4, %o0
3: and %o1, 0x38, %g1
cmp %o1, 0x40
andn %o1, 0x3f, %o4
bl,pn %icc, 5f
and %o1, 0x7, %o1
EX_ST(prefetcha [%o0 + 0x040] %asi, #n_writes)
EX_ST(prefetcha [%o0 + 0x080] %asi, #n_writes)
EX_ST(prefetcha [%o0 + 0x0c0] %asi, #n_writes)
EX_ST(prefetcha [%o0 + 0x100] %asi, #n_writes)
EX_ST(prefetcha [%o0 + 0x140] %asi, #n_writes)
4: EX_ST(prefetcha [%o0 + 0x180] %asi, #n_writes)
EX_ST(stxa %g0, [%o0 + 0x00] %asi)
EX_ST(stxa %g0, [%o0 + 0x08] %asi)
EX_ST(stxa %g0, [%o0 + 0x10] %asi)
EX_ST(stxa %g0, [%o0 + 0x18] %asi)
EX_ST(stxa %g0, [%o0 + 0x20] %asi)
EX_ST(stxa %g0, [%o0 + 0x28] %asi)
EX_ST(stxa %g0, [%o0 + 0x30] %asi)
EX_ST(stxa %g0, [%o0 + 0x38] %asi)
subcc %o4, 0x40, %o4
bne,pt %icc, 4b
add %o0, 0x40, %o0
brz,pn %g1, 6f
nop
5: EX_ST(stxa %g0, [%o0 + 0x00] %asi)
subcc %g1, 8, %g1
bne,pt %icc, 5b
add %o0, 0x8, %o0
6: brz,pt %o1, __clear_user_done
nop
__clear_user_tiny:
1: EX_ST(stba %g0, [%o0 + 0x00] %asi)
subcc %o1, 1, %o1
bne,pt %icc, 1b
add %o0, 1, %o0
__clear_user_done:
retl
clr %o0
ENDPROC(__clear_user)
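memset above first replicates the fill byte across all eight bytes of a register (sllx/or steps at 8, 16 and 32 bits) so the unrolled loop can issue 8-byte stx stores. The same replication in C, as a sketch:

#include <stdint.h>

/* The pattern replication done at the top of memset:
 * 0xab -> 0xabababababababab.
 */
static uint64_t spread_byte(uint64_t pat)
{
        pat &= 0xff;                 /* and  %o1, 0xff, %o3 */
        pat |= pat << 8;             /* sllx 8;  or         */
        pat |= pat << 16;            /* sllx 16; or         */
        pat |= pat << 32;            /* sllx 32; or         */
        return pat;
}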

589
arch/sparc/lib/checksum_32.S Normal file
View file

@ -0,0 +1,589 @@
/* checksum.S: Sparc optimized checksum code.
*
* Copyright(C) 1995 Linus Torvalds
* Copyright(C) 1995 Miguel de Icaza
* Copyright(C) 1996 David S. Miller
* Copyright(C) 1997 Jakub Jelinek
*
* derived from:
* Linux/Alpha checksum c-code
* Linux/ix86 inline checksum assembly
* RFC1071 Computing the Internet Checksum (esp. Jacobsons m68k code)
* David Mosberger-Tang for optimized reference c-code
* BSD4.4 portable checksum routine
*/
#include <asm/errno.h>
#define CSUM_BIGCHUNK(buf, offset, sum, t0, t1, t2, t3, t4, t5) \
ldd [buf + offset + 0x00], t0; \
ldd [buf + offset + 0x08], t2; \
addxcc t0, sum, sum; \
addxcc t1, sum, sum; \
ldd [buf + offset + 0x10], t4; \
addxcc t2, sum, sum; \
addxcc t3, sum, sum; \
ldd [buf + offset + 0x18], t0; \
addxcc t4, sum, sum; \
addxcc t5, sum, sum; \
addxcc t0, sum, sum; \
addxcc t1, sum, sum;
#define CSUM_LASTCHUNK(buf, offset, sum, t0, t1, t2, t3) \
ldd [buf - offset - 0x08], t0; \
ldd [buf - offset - 0x00], t2; \
addxcc t0, sum, sum; \
addxcc t1, sum, sum; \
addxcc t2, sum, sum; \
addxcc t3, sum, sum;
/* Do end cruft out of band to get better cache patterns. */
csum_partial_end_cruft:
be 1f ! caller asks %o1 & 0x8
andcc %o1, 4, %g0 ! nope, check for word remaining
ldd [%o0], %g2 ! load two
addcc %g2, %o2, %o2 ! add first word to sum
addxcc %g3, %o2, %o2 ! add second word as well
add %o0, 8, %o0 ! advance buf ptr
addx %g0, %o2, %o2 ! add in final carry
andcc %o1, 4, %g0 ! check again for word remaining
1: be 1f ! nope, skip this code
andcc %o1, 3, %o1 ! check for trailing bytes
ld [%o0], %g2 ! load it
addcc %g2, %o2, %o2 ! add to sum
add %o0, 4, %o0 ! advance buf ptr
addx %g0, %o2, %o2 ! add in final carry
andcc %o1, 3, %g0 ! check again for trailing bytes
1: be 1f ! no trailing bytes, return
addcc %o1, -1, %g0 ! only one byte remains?
bne 2f ! at least two bytes more
subcc %o1, 2, %o1 ! only two bytes more?
b 4f ! only one byte remains
or %g0, %g0, %o4 ! clear fake hword value
2: lduh [%o0], %o4 ! get hword
be 6f ! jmp if only hword remains
add %o0, 2, %o0 ! advance buf ptr either way
sll %o4, 16, %o4 ! create upper hword
4: ldub [%o0], %o5 ! get final byte
sll %o5, 8, %o5 ! put into place
or %o5, %o4, %o4 ! coalesce with hword (if any)
6: addcc %o4, %o2, %o2 ! add to sum
1: retl ! get outta here
addx %g0, %o2, %o0 ! add final carry into retval
/* Also do alignment out of band to get better cache patterns. */
csum_partial_fix_alignment:
cmp %o1, 6
bl cpte - 0x4
andcc %o0, 0x2, %g0
be 1f
andcc %o0, 0x4, %g0
lduh [%o0 + 0x00], %g2
sub %o1, 2, %o1
add %o0, 2, %o0
sll %g2, 16, %g2
addcc %g2, %o2, %o2
srl %o2, 16, %g3
addx %g0, %g3, %g2
sll %o2, 16, %o2
sll %g2, 16, %g3
srl %o2, 16, %o2
andcc %o0, 0x4, %g0
or %g3, %o2, %o2
1: be cpa
andcc %o1, 0xffffff80, %o3
ld [%o0 + 0x00], %g2
sub %o1, 4, %o1
addcc %g2, %o2, %o2
add %o0, 4, %o0
addx %g0, %o2, %o2
b cpa
andcc %o1, 0xffffff80, %o3
/* The common case is to get called with a nicely aligned
* buffer of size 0x20. Follow the code path for that case.
*/
.globl csum_partial
csum_partial: /* %o0=buf, %o1=len, %o2=sum */
andcc %o0, 0x7, %g0 ! alignment problems?
bne csum_partial_fix_alignment ! yep, handle it
sethi %hi(cpte - 8), %g7 ! prepare table jmp ptr
andcc %o1, 0xffffff80, %o3 ! num loop iterations
cpa: be 3f ! none to do
andcc %o1, 0x70, %g1 ! clears carry flag too
5: CSUM_BIGCHUNK(%o0, 0x00, %o2, %o4, %o5, %g2, %g3, %g4, %g5)
CSUM_BIGCHUNK(%o0, 0x20, %o2, %o4, %o5, %g2, %g3, %g4, %g5)
CSUM_BIGCHUNK(%o0, 0x40, %o2, %o4, %o5, %g2, %g3, %g4, %g5)
CSUM_BIGCHUNK(%o0, 0x60, %o2, %o4, %o5, %g2, %g3, %g4, %g5)
addx %g0, %o2, %o2 ! sink in final carry
subcc %o3, 128, %o3 ! detract from loop iters
bne 5b ! more to do
add %o0, 128, %o0 ! advance buf ptr
andcc %o1, 0x70, %g1 ! clears carry flag too
3: be cpte ! nope
andcc %o1, 0xf, %g0 ! anything left at all?
srl %g1, 1, %o4 ! compute offset
sub %g7, %g1, %g7 ! adjust jmp ptr
sub %g7, %o4, %g7 ! final jmp ptr adjust
jmp %g7 + %lo(cpte - 8) ! enter the table
add %o0, %g1, %o0 ! advance buf ptr
cptbl: CSUM_LASTCHUNK(%o0, 0x68, %o2, %g2, %g3, %g4, %g5)
CSUM_LASTCHUNK(%o0, 0x58, %o2, %g2, %g3, %g4, %g5)
CSUM_LASTCHUNK(%o0, 0x48, %o2, %g2, %g3, %g4, %g5)
CSUM_LASTCHUNK(%o0, 0x38, %o2, %g2, %g3, %g4, %g5)
CSUM_LASTCHUNK(%o0, 0x28, %o2, %g2, %g3, %g4, %g5)
CSUM_LASTCHUNK(%o0, 0x18, %o2, %g2, %g3, %g4, %g5)
CSUM_LASTCHUNK(%o0, 0x08, %o2, %g2, %g3, %g4, %g5)
addx %g0, %o2, %o2 ! fetch final carry
andcc %o1, 0xf, %g0 ! anything left at all?
cpte: bne csum_partial_end_cruft ! yep, handle it
andcc %o1, 8, %g0 ! check how much
cpout: retl ! get outta here
mov %o2, %o0 ! return computed csum
.globl __csum_partial_copy_start, __csum_partial_copy_end
__csum_partial_copy_start:
/* Work around cpp -rob */
#define ALLOC #alloc
#define EXECINSTR #execinstr
#define EX(x,y,a,b) \
98: x,y; \
.section .fixup,ALLOC,EXECINSTR; \
.align 4; \
99: ba 30f; \
a, b, %o3; \
.section __ex_table,ALLOC; \
.align 4; \
.word 98b, 99b; \
.text; \
.align 4
#define EX2(x,y) \
98: x,y; \
.section __ex_table,ALLOC; \
.align 4; \
.word 98b, 30f; \
.text; \
.align 4
#define EX3(x,y) \
98: x,y; \
.section __ex_table,ALLOC; \
.align 4; \
.word 98b, 96f; \
.text; \
.align 4
#define EXT(start,end,handler) \
.section __ex_table,ALLOC; \
.align 4; \
.word start, 0, end, handler; \
.text; \
.align 4
/* This aligned version executes typically in 8.5 superscalar cycles, this
* is the best I can do. I say 8.5 because the final add will pair with
* the next ldd in the main unrolled loop. Thus the pipe is always full.
* If you change these macros (including order of instructions),
* please check the fixup code below as well.
*/
#define CSUMCOPY_BIGCHUNK_ALIGNED(src, dst, sum, off, t0, t1, t2, t3, t4, t5, t6, t7) \
ldd [src + off + 0x00], t0; \
ldd [src + off + 0x08], t2; \
addxcc t0, sum, sum; \
ldd [src + off + 0x10], t4; \
addxcc t1, sum, sum; \
ldd [src + off + 0x18], t6; \
addxcc t2, sum, sum; \
std t0, [dst + off + 0x00]; \
addxcc t3, sum, sum; \
std t2, [dst + off + 0x08]; \
addxcc t4, sum, sum; \
std t4, [dst + off + 0x10]; \
addxcc t5, sum, sum; \
std t6, [dst + off + 0x18]; \
addxcc t6, sum, sum; \
addxcc t7, sum, sum;
/* 12 superscalar cycles seems to be the limit for this case,
* because of this we thus do all the ldd's together to get
* Viking MXCC into streaming mode. Ho hum...
*/
#define CSUMCOPY_BIGCHUNK(src, dst, sum, off, t0, t1, t2, t3, t4, t5, t6, t7) \
ldd [src + off + 0x00], t0; \
ldd [src + off + 0x08], t2; \
ldd [src + off + 0x10], t4; \
ldd [src + off + 0x18], t6; \
st t0, [dst + off + 0x00]; \
addxcc t0, sum, sum; \
st t1, [dst + off + 0x04]; \
addxcc t1, sum, sum; \
st t2, [dst + off + 0x08]; \
addxcc t2, sum, sum; \
st t3, [dst + off + 0x0c]; \
addxcc t3, sum, sum; \
st t4, [dst + off + 0x10]; \
addxcc t4, sum, sum; \
st t5, [dst + off + 0x14]; \
addxcc t5, sum, sum; \
st t6, [dst + off + 0x18]; \
addxcc t6, sum, sum; \
st t7, [dst + off + 0x1c]; \
addxcc t7, sum, sum;
/* Yuck, 6 superscalar cycles... */
#define CSUMCOPY_LASTCHUNK(src, dst, sum, off, t0, t1, t2, t3) \
ldd [src - off - 0x08], t0; \
ldd [src - off - 0x00], t2; \
addxcc t0, sum, sum; \
st t0, [dst - off - 0x08]; \
addxcc t1, sum, sum; \
st t1, [dst - off - 0x04]; \
addxcc t2, sum, sum; \
st t2, [dst - off - 0x00]; \
addxcc t3, sum, sum; \
st t3, [dst - off + 0x04];
/* Handle the end cruft code out of band for better cache patterns. */
cc_end_cruft:
be 1f
andcc %o3, 4, %g0
EX(ldd [%o0 + 0x00], %g2, and %o3, 0xf)
add %o1, 8, %o1
addcc %g2, %g7, %g7
add %o0, 8, %o0
addxcc %g3, %g7, %g7
EX2(st %g2, [%o1 - 0x08])
addx %g0, %g7, %g7
andcc %o3, 4, %g0
EX2(st %g3, [%o1 - 0x04])
1: be 1f
andcc %o3, 3, %o3
EX(ld [%o0 + 0x00], %g2, add %o3, 4)
add %o1, 4, %o1
addcc %g2, %g7, %g7
EX2(st %g2, [%o1 - 0x04])
addx %g0, %g7, %g7
andcc %o3, 3, %g0
add %o0, 4, %o0
1: be 1f
addcc %o3, -1, %g0
bne 2f
subcc %o3, 2, %o3
b 4f
or %g0, %g0, %o4
2: EX(lduh [%o0 + 0x00], %o4, add %o3, 2)
add %o0, 2, %o0
EX2(sth %o4, [%o1 + 0x00])
be 6f
add %o1, 2, %o1
sll %o4, 16, %o4
4: EX(ldub [%o0 + 0x00], %o5, add %g0, 1)
EX2(stb %o5, [%o1 + 0x00])
sll %o5, 8, %o5
or %o5, %o4, %o4
6: addcc %o4, %g7, %g7
1: retl
addx %g0, %g7, %o0
/* Also, handle the alignment code out of band. */
cc_dword_align:
cmp %g1, 16
bge 1f
srl %g1, 1, %o3
2: cmp %o3, 0
be,a ccte
andcc %g1, 0xf, %o3
andcc %o3, %o0, %g0 ! Check %o0 only (%o1 has the same last 2 bits)
be,a 2b
srl %o3, 1, %o3
1: andcc %o0, 0x1, %g0
bne ccslow
andcc %o0, 0x2, %g0
be 1f
andcc %o0, 0x4, %g0
EX(lduh [%o0 + 0x00], %g4, add %g1, 0)
sub %g1, 2, %g1
EX2(sth %g4, [%o1 + 0x00])
add %o0, 2, %o0
sll %g4, 16, %g4
addcc %g4, %g7, %g7
add %o1, 2, %o1
srl %g7, 16, %g3
addx %g0, %g3, %g4
sll %g7, 16, %g7
sll %g4, 16, %g3
srl %g7, 16, %g7
andcc %o0, 0x4, %g0
or %g3, %g7, %g7
1: be 3f
andcc %g1, 0xffffff80, %g0
EX(ld [%o0 + 0x00], %g4, add %g1, 0)
sub %g1, 4, %g1
EX2(st %g4, [%o1 + 0x00])
add %o0, 4, %o0
addcc %g4, %g7, %g7
add %o1, 4, %o1
addx %g0, %g7, %g7
b 3f
andcc %g1, 0xffffff80, %g0
/* Sun, you just can't beat me, you just can't. Stop trying,
* give up. I'm serious, I am going to kick the living shit
* out of you, game over, lights out.
*/
.align 8
.globl __csum_partial_copy_sparc_generic
__csum_partial_copy_sparc_generic:
/* %o0=src, %o1=dest, %g1=len, %g7=sum */
xor %o0, %o1, %o4 ! get changing bits
andcc %o4, 3, %g0 ! check for mismatched alignment
bne ccslow ! better this than unaligned/fixups
andcc %o0, 7, %g0 ! need to align things?
bne cc_dword_align ! yes, we check for short lengths there
andcc %g1, 0xffffff80, %g0 ! can we use unrolled loop?
3: be 3f ! nope, less than one loop remains
andcc %o1, 4, %g0 ! dest aligned on 4 or 8 byte boundary?
be ccdbl + 4 ! 8 byte aligned, kick ass
5: CSUMCOPY_BIGCHUNK(%o0,%o1,%g7,0x00,%o4,%o5,%g2,%g3,%g4,%g5,%o2,%o3)
CSUMCOPY_BIGCHUNK(%o0,%o1,%g7,0x20,%o4,%o5,%g2,%g3,%g4,%g5,%o2,%o3)
CSUMCOPY_BIGCHUNK(%o0,%o1,%g7,0x40,%o4,%o5,%g2,%g3,%g4,%g5,%o2,%o3)
CSUMCOPY_BIGCHUNK(%o0,%o1,%g7,0x60,%o4,%o5,%g2,%g3,%g4,%g5,%o2,%o3)
10: EXT(5b, 10b, 20f) ! note for exception handling
sub %g1, 128, %g1 ! detract from length
addx %g0, %g7, %g7 ! add in last carry bit
andcc %g1, 0xffffff80, %g0 ! more to csum?
add %o0, 128, %o0 ! advance src ptr
bne 5b ! we did not go negative, continue looping
add %o1, 128, %o1 ! advance dest ptr
3: andcc %g1, 0x70, %o2 ! can use table?
ccmerge:be ccte ! nope, go and check for end cruft
andcc %g1, 0xf, %o3 ! get low bits of length (clears carry btw)
srl %o2, 1, %o4 ! begin negative offset computation
sethi %hi(12f), %o5 ! set up table ptr end
add %o0, %o2, %o0 ! advance src ptr
sub %o5, %o4, %o5 ! continue table calculation
sll %o2, 1, %g2 ! constant multiplies are fun...
sub %o5, %g2, %o5 ! some more adjustments
jmp %o5 + %lo(12f) ! jump into it, duff style, wheee...
add %o1, %o2, %o1 ! advance dest ptr (carry is clear btw)
cctbl: CSUMCOPY_LASTCHUNK(%o0,%o1,%g7,0x68,%g2,%g3,%g4,%g5)
CSUMCOPY_LASTCHUNK(%o0,%o1,%g7,0x58,%g2,%g3,%g4,%g5)
CSUMCOPY_LASTCHUNK(%o0,%o1,%g7,0x48,%g2,%g3,%g4,%g5)
CSUMCOPY_LASTCHUNK(%o0,%o1,%g7,0x38,%g2,%g3,%g4,%g5)
CSUMCOPY_LASTCHUNK(%o0,%o1,%g7,0x28,%g2,%g3,%g4,%g5)
CSUMCOPY_LASTCHUNK(%o0,%o1,%g7,0x18,%g2,%g3,%g4,%g5)
CSUMCOPY_LASTCHUNK(%o0,%o1,%g7,0x08,%g2,%g3,%g4,%g5)
12: EXT(cctbl, 12b, 22f) ! note for exception table handling
addx %g0, %g7, %g7
andcc %o3, 0xf, %g0 ! check for low bits set
ccte: bne cc_end_cruft ! something left, handle it out of band
andcc %o3, 8, %g0 ! begin checks for that code
retl ! return
mov %g7, %o0 ! give em the computed checksum
ccdbl: CSUMCOPY_BIGCHUNK_ALIGNED(%o0,%o1,%g7,0x00,%o4,%o5,%g2,%g3,%g4,%g5,%o2,%o3)
CSUMCOPY_BIGCHUNK_ALIGNED(%o0,%o1,%g7,0x20,%o4,%o5,%g2,%g3,%g4,%g5,%o2,%o3)
CSUMCOPY_BIGCHUNK_ALIGNED(%o0,%o1,%g7,0x40,%o4,%o5,%g2,%g3,%g4,%g5,%o2,%o3)
CSUMCOPY_BIGCHUNK_ALIGNED(%o0,%o1,%g7,0x60,%o4,%o5,%g2,%g3,%g4,%g5,%o2,%o3)
11: EXT(ccdbl, 11b, 21f) ! note for exception table handling
sub %g1, 128, %g1 ! detract from length
addx %g0, %g7, %g7 ! add in last carry bit
andcc %g1, 0xffffff80, %g0 ! more to csum?
add %o0, 128, %o0 ! advance src ptr
bne ccdbl ! we did not go negative, continue looping
add %o1, 128, %o1 ! advance dest ptr
b ccmerge ! finish it off, above
andcc %g1, 0x70, %o2 ! can use table? (clears carry btw)
ccslow: cmp %g1, 0
mov 0, %g5
bleu 4f
andcc %o0, 1, %o5
be,a 1f
srl %g1, 1, %g4
sub %g1, 1, %g1
EX(ldub [%o0], %g5, add %g1, 1)
add %o0, 1, %o0
EX2(stb %g5, [%o1])
srl %g1, 1, %g4
add %o1, 1, %o1
1: cmp %g4, 0
be,a 3f
andcc %g1, 1, %g0
andcc %o0, 2, %g0
be,a 1f
srl %g4, 1, %g4
EX(lduh [%o0], %o4, add %g1, 0)
sub %g1, 2, %g1
srl %o4, 8, %g2
sub %g4, 1, %g4
EX2(stb %g2, [%o1])
add %o4, %g5, %g5
EX2(stb %o4, [%o1 + 1])
add %o0, 2, %o0
srl %g4, 1, %g4
add %o1, 2, %o1
1: cmp %g4, 0
be,a 2f
andcc %g1, 2, %g0
EX3(ld [%o0], %o4)
5: srl %o4, 24, %g2
srl %o4, 16, %g3
EX2(stb %g2, [%o1])
srl %o4, 8, %g2
EX2(stb %g3, [%o1 + 1])
add %o0, 4, %o0
EX2(stb %g2, [%o1 + 2])
addcc %o4, %g5, %g5
EX2(stb %o4, [%o1 + 3])
addx %g5, %g0, %g5 ! I am now too lazy to optimize this (question if it
add %o1, 4, %o1 ! is worth it). Maybe some day - with the sll/srl
subcc %g4, 1, %g4 ! tricks
bne,a 5b
EX3(ld [%o0], %o4)
sll %g5, 16, %g2
srl %g5, 16, %g5
srl %g2, 16, %g2
andcc %g1, 2, %g0
add %g2, %g5, %g5
2: be,a 3f
andcc %g1, 1, %g0
EX(lduh [%o0], %o4, and %g1, 3)
andcc %g1, 1, %g0
srl %o4, 8, %g2
add %o0, 2, %o0
EX2(stb %g2, [%o1])
add %g5, %o4, %g5
EX2(stb %o4, [%o1 + 1])
add %o1, 2, %o1
3: be,a 1f
sll %g5, 16, %o4
EX(ldub [%o0], %g2, add %g0, 1)
sll %g2, 8, %o4
EX2(stb %g2, [%o1])
add %g5, %o4, %g5
sll %g5, 16, %o4
1: addcc %o4, %g5, %g5
srl %g5, 16, %o4
addx %g0, %o4, %g5
orcc %o5, %g0, %g0
be 4f
srl %g5, 8, %o4
and %g5, 0xff, %g2
and %o4, 0xff, %o4
sll %g2, 8, %g2
or %g2, %o4, %g5
4: addcc %g7, %g5, %g7
retl
addx %g0, %g7, %o0
__csum_partial_copy_end:
/* We do these strange calculations for the csum_*_from_user case only, ie.
* we only bother with faults on loads... */
/* o2 = ((g2%20)&3)*8
* o3 = g1 - (g2/20)*32 - o2 */
20:
cmp %g2, 20
blu,a 1f
and %g2, 3, %o2
sub %g1, 32, %g1
b 20b
sub %g2, 20, %g2
1:
sll %o2, 3, %o2
b 31f
sub %g1, %o2, %o3
/* o2 = (!(g2 & 15) ? 0 : (((g2 & 15) + 1) & ~1)*8)
* o3 = g1 - (g2/16)*32 - o2 */
21:
andcc %g2, 15, %o3
srl %g2, 4, %g2
be,a 1f
clr %o2
add %o3, 1, %o3
and %o3, 14, %o3
sll %o3, 3, %o2
1:
sll %g2, 5, %g2
sub %g1, %g2, %o3
b 31f
sub %o3, %o2, %o3
/* o0 += (g2/10)*16 - 0x70
* o1 += (g2/10)*16 - 0x70
* o2 = (g2 % 10) ? 8 : 0
* o3 += 0x70 - (g2/10)*16 - o2 */
22:
cmp %g2, 10
blu,a 1f
sub %o0, 0x70, %o0
add %o0, 16, %o0
add %o1, 16, %o1
sub %o3, 16, %o3
b 22b
sub %g2, 10, %g2
1:
sub %o1, 0x70, %o1
add %o3, 0x70, %o3
clr %o2
tst %g2
bne,a 1f
mov 8, %o2
1:
b 31f
sub %o3, %o2, %o3
96:
and %g1, 3, %g1
sll %g4, 2, %g4
add %g1, %g4, %o3
30:
/* %o1 is dst
* %o3 is # bytes to zero out
* %o4 is faulting address
* %o5 is %pc where fault occurred */
clr %o2
31:
/* %o0 is src
* %o1 is dst
* %o2 is # of bytes to copy from src to dst
* %o3 is # bytes to zero out
* %o4 is faulting address
* %o5 is %pc where fault occurred */
save %sp, -104, %sp
mov %i5, %o0
mov %i7, %o1
mov %i4, %o2
call lookup_fault
mov %g7, %i4
cmp %o0, 2
bne 1f
add %g0, -EFAULT, %i5
tst %i2
be 2f
mov %i0, %o1
mov %i1, %o0
5:
call memcpy
mov %i2, %o2
tst %o0
bne,a 2f
add %i3, %i2, %i3
add %i1, %i2, %i1
2:
mov %i1, %o0
6:
call __bzero
mov %i3, %o1
1:
ld [%sp + 168], %o2 ! struct_ptr of parent
st %i5, [%o2]
ret
restore
.section __ex_table,#alloc
.align 4
.word 5b,2
.word 6b,2

173
arch/sparc/lib/checksum_64.S Normal file
View file

@ -0,0 +1,173 @@
/* checksum.S: Sparc V9 optimized checksum code.
*
* Copyright(C) 1995 Linus Torvalds
* Copyright(C) 1995 Miguel de Icaza
* Copyright(C) 1996, 2000 David S. Miller
* Copyright(C) 1997 Jakub Jelinek
*
* derived from:
* Linux/Alpha checksum c-code
* Linux/ix86 inline checksum assembly
* RFC1071 Computing the Internet Checksum (esp. Jacobsons m68k code)
* David Mosberger-Tang for optimized reference c-code
* BSD4.4 portable checksum routine
*/
.text
csum_partial_fix_alignment:
/* We checked for zero length already, so there must be
* at least one byte.
*/
be,pt %icc, 1f
nop
ldub [%o0 + 0x00], %o4
add %o0, 1, %o0
sub %o1, 1, %o1
1: andcc %o0, 0x2, %g0
be,pn %icc, csum_partial_post_align
cmp %o1, 2
blu,pn %icc, csum_partial_end_cruft
nop
lduh [%o0 + 0x00], %o5
add %o0, 2, %o0
sub %o1, 2, %o1
ba,pt %xcc, csum_partial_post_align
add %o5, %o4, %o4
.align 32
.globl csum_partial
csum_partial: /* %o0=buff, %o1=len, %o2=sum */
prefetch [%o0 + 0x000], #n_reads
clr %o4
prefetch [%o0 + 0x040], #n_reads
brz,pn %o1, csum_partial_finish
andcc %o0, 0x3, %g0
/* We "remember" whether the lowest bit in the address
* was set in %g7. Because if it is, we have to swap
* upper and lower 8 bit fields of the sum we calculate.
*/
bne,pn %icc, csum_partial_fix_alignment
andcc %o0, 0x1, %g7
csum_partial_post_align:
prefetch [%o0 + 0x080], #n_reads
andncc %o1, 0x3f, %o3
prefetch [%o0 + 0x0c0], #n_reads
sub %o1, %o3, %o1
brz,pn %o3, 2f
prefetch [%o0 + 0x100], #n_reads
/* So that we don't need to use the non-pairing
* add-with-carry instructions we accumulate 32-bit
* values into a 64-bit register. At the end of the
* loop we fold it down to 32-bits and so on.
*/
prefetch [%o0 + 0x140], #n_reads
1: lduw [%o0 + 0x00], %o5
lduw [%o0 + 0x04], %g1
lduw [%o0 + 0x08], %g2
add %o4, %o5, %o4
lduw [%o0 + 0x0c], %g3
add %o4, %g1, %o4
lduw [%o0 + 0x10], %o5
add %o4, %g2, %o4
lduw [%o0 + 0x14], %g1
add %o4, %g3, %o4
lduw [%o0 + 0x18], %g2
add %o4, %o5, %o4
lduw [%o0 + 0x1c], %g3
add %o4, %g1, %o4
lduw [%o0 + 0x20], %o5
add %o4, %g2, %o4
lduw [%o0 + 0x24], %g1
add %o4, %g3, %o4
lduw [%o0 + 0x28], %g2
add %o4, %o5, %o4
lduw [%o0 + 0x2c], %g3
add %o4, %g1, %o4
lduw [%o0 + 0x30], %o5
add %o4, %g2, %o4
lduw [%o0 + 0x34], %g1
add %o4, %g3, %o4
lduw [%o0 + 0x38], %g2
add %o4, %o5, %o4
lduw [%o0 + 0x3c], %g3
add %o4, %g1, %o4
prefetch [%o0 + 0x180], #n_reads
add %o4, %g2, %o4
subcc %o3, 0x40, %o3
add %o0, 0x40, %o0
bne,pt %icc, 1b
add %o4, %g3, %o4
2: and %o1, 0x3c, %o3
brz,pn %o3, 2f
sub %o1, %o3, %o1
1: lduw [%o0 + 0x00], %o5
subcc %o3, 0x4, %o3
add %o0, 0x4, %o0
bne,pt %icc, 1b
add %o4, %o5, %o4
2:
/* fold 64-->32 */
srlx %o4, 32, %o5
srl %o4, 0, %o4
add %o4, %o5, %o4
srlx %o4, 32, %o5
srl %o4, 0, %o4
add %o4, %o5, %o4
/* fold 32-->16 */
sethi %hi(0xffff0000), %g1
srl %o4, 16, %o5
andn %o4, %g1, %g2
add %o5, %g2, %o4
srl %o4, 16, %o5
andn %o4, %g1, %g2
add %o5, %g2, %o4
csum_partial_end_cruft:
/* %o4 has the 16-bit sum we have calculated so-far. */
cmp %o1, 2
blu,pt %icc, 1f
nop
lduh [%o0 + 0x00], %o5
sub %o1, 2, %o1
add %o0, 2, %o0
add %o4, %o5, %o4
1: brz,pt %o1, 1f
nop
ldub [%o0 + 0x00], %o5
sub %o1, 1, %o1
add %o0, 1, %o0
sllx %o5, 8, %o5
add %o4, %o5, %o4
1:
/* fold 32-->16 */
sethi %hi(0xffff0000), %g1
srl %o4, 16, %o5
andn %o4, %g1, %g2
add %o5, %g2, %o4
srl %o4, 16, %o5
andn %o4, %g1, %g2
add %o5, %g2, %o4
1: brz,pt %g7, 1f
nop
/* We started with an odd byte, byte-swap the result. */
srl %o4, 8, %o5
and %o4, 0xff, %g1
sll %g1, 8, %g1
or %o5, %g1, %o4
1: addcc %o2, %o4, %o2
addc %g0, %o2, %o2
csum_partial_finish:
retl
srl %o2, 0, %o0
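The tail of csum_partial above shows the two folding steps of the RFC 1071 one's-complement sum: the 64-bit accumulator is folded to 32 bits, then to 16, and if the buffer started on an odd address (the bit remembered in %g7) the two bytes of the result are swapped. A C sketch of those folds; the function name is illustrative only:

#include <stdint.h>

/* Sketch of the "fold 64-->32" and "fold 32-->16" steps above. */
static uint16_t fold_csum(uint64_t sum, int started_odd)
{
        /* 64 -> 32: add high and low halves twice so the carry settles. */
        sum = (sum >> 32) + (uint32_t)sum;
        sum = (sum >> 32) + (uint32_t)sum;

        /* 32 -> 16: same idea with 16-bit halves. */
        sum = (sum >> 16) + (sum & 0xffff);
        sum = (sum >> 16) + (sum & 0xffff);

        /* Odd starting address: swap the two bytes of the result. */
        if (started_odd)
                sum = ((sum & 0xff) << 8) | ((sum >> 8) & 0xff);

        return (uint16_t)sum;
}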

103
arch/sparc/lib/clear_page.S Normal file
View file

@ -0,0 +1,103 @@
/* clear_page.S: UltraSparc optimized clear page.
*
* Copyright (C) 1996, 1998, 1999, 2000, 2004 David S. Miller (davem@redhat.com)
* Copyright (C) 1997 Jakub Jelinek (jakub@redhat.com)
*/
#include <asm/visasm.h>
#include <asm/thread_info.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/spitfire.h>
#include <asm/head.h>
/* What we used to do was lock a TLB entry into a specific
* TLB slot, clear the page with interrupts disabled, then
* restore the original TLB entry. This was great for
* disturbing the TLB as little as possible, but it meant
* we had to keep interrupts disabled for a long time.
*
* Now, we simply use the normal TLB loading mechanism,
* and this makes the cpu choose a slot all by itself.
* Then we do a normal TLB flush on exit. We need only
* disable preemption during the clear.
*/
.text
.globl _clear_page
_clear_page: /* %o0=dest */
ba,pt %xcc, clear_page_common
clr %o4
/* This thing is pretty important, it shows up
* on the profiles via do_anonymous_page().
*/
.align 32
.globl clear_user_page
clear_user_page: /* %o0=dest, %o1=vaddr */
lduw [%g6 + TI_PRE_COUNT], %o2
sethi %hi(PAGE_OFFSET), %g2
sethi %hi(PAGE_SIZE), %o4
ldx [%g2 + %lo(PAGE_OFFSET)], %g2
sethi %hi(PAGE_KERNEL_LOCKED), %g3
ldx [%g3 + %lo(PAGE_KERNEL_LOCKED)], %g3
sub %o0, %g2, %g1 ! paddr
and %o1, %o4, %o0 ! vaddr D-cache alias bit
or %g1, %g3, %g1 ! TTE data
sethi %hi(TLBTEMP_BASE), %o3
add %o2, 1, %o4
add %o0, %o3, %o0 ! TTE vaddr
/* Disable preemption. */
mov TLB_TAG_ACCESS, %g3
stw %o4, [%g6 + TI_PRE_COUNT]
/* Load TLB entry. */
rdpr %pstate, %o4
wrpr %o4, PSTATE_IE, %pstate
stxa %o0, [%g3] ASI_DMMU
stxa %g1, [%g0] ASI_DTLB_DATA_IN
sethi %hi(KERNBASE), %g1
flush %g1
wrpr %o4, 0x0, %pstate
mov 1, %o4
clear_page_common:
VISEntryHalf
membar #StoreLoad | #StoreStore | #LoadStore
fzero %f0
sethi %hi(PAGE_SIZE/64), %o1
mov %o0, %g1 ! remember vaddr for tlbflush
fzero %f2
or %o1, %lo(PAGE_SIZE/64), %o1
faddd %f0, %f2, %f4
fmuld %f0, %f2, %f6
faddd %f0, %f2, %f8
fmuld %f0, %f2, %f10
faddd %f0, %f2, %f12
fmuld %f0, %f2, %f14
1: stda %f0, [%o0 + %g0] ASI_BLK_P
subcc %o1, 1, %o1
bne,pt %icc, 1b
add %o0, 0x40, %o0
membar #Sync
VISExitHalf
brz,pn %o4, out
nop
stxa %g0, [%g1] ASI_DMMU_DEMAP
membar #Sync
stw %o2, [%g6 + TI_PRE_COUNT]
out: retl
nop

27
arch/sparc/lib/cmpdi2.c Normal file
View file

@ -0,0 +1,27 @@
#include <linux/module.h>
#include "libgcc.h"
word_type __cmpdi2(long long a, long long b)
{
const DWunion au = {
.ll = a
};
const DWunion bu = {
.ll = b
};
if (au.s.high < bu.s.high)
return 0;
else if (au.s.high > bu.s.high)
return 2;
if ((unsigned int) au.s.low < (unsigned int) bu.s.low)
return 0;
else if ((unsigned int) au.s.low > (unsigned int) bu.s.low)
return 2;
return 1;
}
EXPORT_SYMBOL(__cmpdi2);
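Note the return convention of __cmpdi2 above: like the other libgcc comparison helpers, it returns 0, 1, or 2 for less-than, equal, and greater-than rather than a signed difference. A short illustrative caller (the wrapper name is made up):

/* Illustrative caller: map __cmpdi2's 0/1/2 result back onto a comparison. */
extern int __cmpdi2(long long a, long long b);   /* word_type in libgcc.h */

static int ll_less_than(long long a, long long b)
{
        return __cmpdi2(a, b) == 0;   /* 0: a < b, 1: a == b, 2: a > b */
}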

92
arch/sparc/lib/copy_in_user.S Normal file
View file

@ -0,0 +1,92 @@
/* copy_in_user.S: Copy from userspace to userspace.
*
* Copyright (C) 1999, 2000, 2004 David S. Miller (davem@redhat.com)
*/
#include <linux/linkage.h>
#include <asm/asi.h>
#define XCC xcc
#define EX(x,y) \
98: x,y; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_one; \
.text; \
.align 4;
.register %g2,#scratch
.register %g3,#scratch
.text
.align 32
/* Don't try to get too fancy here, just nice and
* simple. This is predominantly used for well aligned
* small copies in the compat layer. It is also used
* to copy register windows around during thread cloning.
*/
ENTRY(___copy_in_user) /* %o0=dst, %o1=src, %o2=len */
cmp %o2, 0
be,pn %XCC, 85f
or %o0, %o1, %o3
cmp %o2, 16
bleu,a,pn %XCC, 80f
or %o3, %o2, %o3
/* 16 < len <= 64 */
andcc %o3, 0x7, %g0
bne,pn %XCC, 90f
nop
andn %o2, 0x7, %o4
and %o2, 0x7, %o2
1: subcc %o4, 0x8, %o4
EX(ldxa [%o1] %asi, %o5)
EX(stxa %o5, [%o0] %asi)
add %o1, 0x8, %o1
bgu,pt %XCC, 1b
add %o0, 0x8, %o0
andcc %o2, 0x4, %g0
be,pt %XCC, 1f
nop
sub %o2, 0x4, %o2
EX(lduwa [%o1] %asi, %o5)
EX(stwa %o5, [%o0] %asi)
add %o1, 0x4, %o1
add %o0, 0x4, %o0
1: cmp %o2, 0
be,pt %XCC, 85f
nop
ba,pt %xcc, 90f
nop
80: /* 0 < len <= 16 */
andcc %o3, 0x3, %g0
bne,pn %XCC, 90f
nop
82:
subcc %o2, 4, %o2
EX(lduwa [%o1] %asi, %g1)
EX(stwa %g1, [%o0] %asi)
add %o1, 4, %o1
bgu,pt %XCC, 82b
add %o0, 4, %o0
85: retl
clr %o0
.align 32
90:
subcc %o2, 1, %o2
EX(lduba [%o1] %asi, %g1)
EX(stba %g1, [%o0] %asi)
add %o1, 1, %o1
bgu,pt %XCC, 90b
add %o0, 1, %o0
retl
clr %o0
ENDPROC(___copy_in_user)
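The EX() wrapper above is the usual sparc64 fault-handling idiom: each wrapped load or store gets a row in the __ex_table section pairing the possibly-faulting instruction's address with a fixup label (here __retl_one), and the page-fault handler searches that table to resume at the fixup instead of oopsing. Conceptually each entry is a pair of addresses; a sketch of the shape, not the exact kernel definition in this tree:

/* Conceptual shape of one __ex_table row emitted by EX() above:
 * ".word 98b, __retl_one" pairs the faulting instruction with its fixup.
 */
struct ex_table_entry_sketch {
        unsigned long insn;    /* address of the ldxa/stxa that may fault */
        unsigned long fixup;   /* where to continue, e.g. __retl_one      */
};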

250
arch/sparc/lib/copy_page.S Normal file
View file

@ -0,0 +1,250 @@
/* copy_page.S: UltraSparc optimized copy page.
*
* Copyright (C) 1996, 1998, 1999, 2000, 2004 David S. Miller (davem@redhat.com)
* Copyright (C) 1997 Jakub Jelinek (jakub@redhat.com)
*/
#include <asm/visasm.h>
#include <asm/thread_info.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/spitfire.h>
#include <asm/head.h>
/* What we used to do was lock a TLB entry into a specific
* TLB slot, clear the page with interrupts disabled, then
* restore the original TLB entry. This was great for
* disturbing the TLB as little as possible, but it meant
* we had to keep interrupts disabled for a long time.
*
* Now, we simply use the normal TLB loading mechanism,
* and this makes the cpu choose a slot all by itself.
* Then we do a normal TLB flush on exit. We need only
* disable preemption during the clear.
*/
#define DCACHE_SIZE (PAGE_SIZE * 2)
#if (PAGE_SHIFT == 13)
#define PAGE_SIZE_REM 0x80
#elif (PAGE_SHIFT == 16)
#define PAGE_SIZE_REM 0x100
#else
#error Wrong PAGE_SHIFT specified
#endif
#define TOUCH(reg0, reg1, reg2, reg3, reg4, reg5, reg6, reg7) \
fsrc2 %reg0, %f48; fsrc2 %reg1, %f50; \
fsrc2 %reg2, %f52; fsrc2 %reg3, %f54; \
fsrc2 %reg4, %f56; fsrc2 %reg5, %f58; \
fsrc2 %reg6, %f60; fsrc2 %reg7, %f62;
.text
.align 32
.globl copy_user_page
.type copy_user_page,#function
copy_user_page: /* %o0=dest, %o1=src, %o2=vaddr */
lduw [%g6 + TI_PRE_COUNT], %o4
sethi %hi(PAGE_OFFSET), %g2
sethi %hi(PAGE_SIZE), %o3
ldx [%g2 + %lo(PAGE_OFFSET)], %g2
sethi %hi(PAGE_KERNEL_LOCKED), %g3
ldx [%g3 + %lo(PAGE_KERNEL_LOCKED)], %g3
sub %o0, %g2, %g1 ! dest paddr
sub %o1, %g2, %g2 ! src paddr
and %o2, %o3, %o0 ! vaddr D-cache alias bit
or %g1, %g3, %g1 ! dest TTE data
or %g2, %g3, %g2 ! src TTE data
sethi %hi(TLBTEMP_BASE), %o3
sethi %hi(DCACHE_SIZE), %o1
add %o0, %o3, %o0 ! dest TTE vaddr
add %o4, 1, %o2
add %o0, %o1, %o1 ! src TTE vaddr
/* Disable preemption. */
mov TLB_TAG_ACCESS, %g3
stw %o2, [%g6 + TI_PRE_COUNT]
/* Load TLB entries. */
rdpr %pstate, %o2
wrpr %o2, PSTATE_IE, %pstate
stxa %o0, [%g3] ASI_DMMU
stxa %g1, [%g0] ASI_DTLB_DATA_IN
membar #Sync
stxa %o1, [%g3] ASI_DMMU
stxa %g2, [%g0] ASI_DTLB_DATA_IN
membar #Sync
wrpr %o2, 0x0, %pstate
cheetah_copy_page_insn:
ba,pt %xcc, 9f
nop
1:
VISEntryHalf
membar #StoreLoad | #StoreStore | #LoadStore
sethi %hi((PAGE_SIZE/64)-2), %o2
mov %o0, %g1
prefetch [%o1 + 0x000], #one_read
or %o2, %lo((PAGE_SIZE/64)-2), %o2
prefetch [%o1 + 0x040], #one_read
prefetch [%o1 + 0x080], #one_read
prefetch [%o1 + 0x0c0], #one_read
ldd [%o1 + 0x000], %f0
prefetch [%o1 + 0x100], #one_read
ldd [%o1 + 0x008], %f2
prefetch [%o1 + 0x140], #one_read
ldd [%o1 + 0x010], %f4
prefetch [%o1 + 0x180], #one_read
fsrc2 %f0, %f16
ldd [%o1 + 0x018], %f6
fsrc2 %f2, %f18
ldd [%o1 + 0x020], %f8
fsrc2 %f4, %f20
ldd [%o1 + 0x028], %f10
fsrc2 %f6, %f22
ldd [%o1 + 0x030], %f12
fsrc2 %f8, %f24
ldd [%o1 + 0x038], %f14
fsrc2 %f10, %f26
ldd [%o1 + 0x040], %f0
1: ldd [%o1 + 0x048], %f2
fsrc2 %f12, %f28
ldd [%o1 + 0x050], %f4
fsrc2 %f14, %f30
stda %f16, [%o0] ASI_BLK_P
ldd [%o1 + 0x058], %f6
fsrc2 %f0, %f16
ldd [%o1 + 0x060], %f8
fsrc2 %f2, %f18
ldd [%o1 + 0x068], %f10
fsrc2 %f4, %f20
ldd [%o1 + 0x070], %f12
fsrc2 %f6, %f22
ldd [%o1 + 0x078], %f14
fsrc2 %f8, %f24
ldd [%o1 + 0x080], %f0
prefetch [%o1 + 0x180], #one_read
fsrc2 %f10, %f26
subcc %o2, 1, %o2
add %o0, 0x40, %o0
bne,pt %xcc, 1b
add %o1, 0x40, %o1
ldd [%o1 + 0x048], %f2
fsrc2 %f12, %f28
ldd [%o1 + 0x050], %f4
fsrc2 %f14, %f30
stda %f16, [%o0] ASI_BLK_P
ldd [%o1 + 0x058], %f6
fsrc2 %f0, %f16
ldd [%o1 + 0x060], %f8
fsrc2 %f2, %f18
ldd [%o1 + 0x068], %f10
fsrc2 %f4, %f20
ldd [%o1 + 0x070], %f12
fsrc2 %f6, %f22
add %o0, 0x40, %o0
ldd [%o1 + 0x078], %f14
fsrc2 %f8, %f24
fsrc2 %f10, %f26
fsrc2 %f12, %f28
fsrc2 %f14, %f30
stda %f16, [%o0] ASI_BLK_P
membar #Sync
VISExitHalf
ba,pt %xcc, 5f
nop
9:
VISEntry
ldub [%g6 + TI_FAULT_CODE], %g3
mov %o0, %g1
cmp %g3, 0
rd %asi, %g3
be,a,pt %icc, 1f
wr %g0, ASI_BLK_P, %asi
wr %g0, ASI_BLK_COMMIT_P, %asi
1: ldda [%o1] ASI_BLK_P, %f0
add %o1, 0x40, %o1
ldda [%o1] ASI_BLK_P, %f16
add %o1, 0x40, %o1
sethi %hi(PAGE_SIZE), %o2
1: TOUCH(f0, f2, f4, f6, f8, f10, f12, f14)
ldda [%o1] ASI_BLK_P, %f32
stda %f48, [%o0] %asi
add %o1, 0x40, %o1
sub %o2, 0x40, %o2
add %o0, 0x40, %o0
TOUCH(f16, f18, f20, f22, f24, f26, f28, f30)
ldda [%o1] ASI_BLK_P, %f0
stda %f48, [%o0] %asi
add %o1, 0x40, %o1
sub %o2, 0x40, %o2
add %o0, 0x40, %o0
TOUCH(f32, f34, f36, f38, f40, f42, f44, f46)
ldda [%o1] ASI_BLK_P, %f16
stda %f48, [%o0] %asi
sub %o2, 0x40, %o2
add %o1, 0x40, %o1
cmp %o2, PAGE_SIZE_REM
bne,pt %xcc, 1b
add %o0, 0x40, %o0
#if (PAGE_SHIFT == 16)
TOUCH(f0, f2, f4, f6, f8, f10, f12, f14)
ldda [%o1] ASI_BLK_P, %f32
stda %f48, [%o0] %asi
add %o1, 0x40, %o1
sub %o2, 0x40, %o2
add %o0, 0x40, %o0
TOUCH(f16, f18, f20, f22, f24, f26, f28, f30)
ldda [%o1] ASI_BLK_P, %f0
stda %f48, [%o0] %asi
add %o1, 0x40, %o1
sub %o2, 0x40, %o2
add %o0, 0x40, %o0
membar #Sync
stda %f32, [%o0] %asi
add %o0, 0x40, %o0
stda %f0, [%o0] %asi
#else
membar #Sync
stda %f0, [%o0] %asi
add %o0, 0x40, %o0
stda %f16, [%o0] %asi
#endif
membar #Sync
wr %g3, 0x0, %asi
VISExit
5:
stxa %g0, [%g1] ASI_DMMU_DEMAP
membar #Sync
sethi %hi(DCACHE_SIZE), %g2
stxa %g0, [%g1 + %g2] ASI_DMMU_DEMAP
membar #Sync
retl
stw %o4, [%g6 + TI_PRE_COUNT]
.size copy_user_page, .-copy_user_page
.globl cheetah_patch_copy_page
cheetah_patch_copy_page:
sethi %hi(0x01000000), %o1 ! NOP
sethi %hi(cheetah_copy_page_insn), %o0
or %o0, %lo(cheetah_copy_page_insn), %o0
stw %o1, [%o0]
membar #StoreStore
flush %o0
retl
nop

496
arch/sparc/lib/copy_user.S Normal file
View file

@ -0,0 +1,496 @@
/* copy_user.S: Sparc optimized copy_from_user and copy_to_user code.
*
* Copyright(C) 1995 Linus Torvalds
* Copyright(C) 1996 David S. Miller
* Copyright(C) 1996 Eddie C. Dost
* Copyright(C) 1996,1998 Jakub Jelinek
*
* derived from:
* e-mail between David and Eddie.
*
* Returns 0 if successful, otherwise count of bytes not copied yet
*/
#include <asm/ptrace.h>
#include <asm/asmmacro.h>
#include <asm/page.h>
#include <asm/thread_info.h>
/* Work around cpp -rob */
#define ALLOC #alloc
#define EXECINSTR #execinstr
#define EX(x,y,a,b) \
98: x,y; \
.section .fixup,ALLOC,EXECINSTR; \
.align 4; \
99: ba fixupretl; \
a, b, %g3; \
.section __ex_table,ALLOC; \
.align 4; \
.word 98b, 99b; \
.text; \
.align 4
#define EX2(x,y,c,d,e,a,b) \
98: x,y; \
.section .fixup,ALLOC,EXECINSTR; \
.align 4; \
99: c, d, e; \
ba fixupretl; \
a, b, %g3; \
.section __ex_table,ALLOC; \
.align 4; \
.word 98b, 99b; \
.text; \
.align 4
#define EXO2(x,y) \
98: x, y; \
.section __ex_table,ALLOC; \
.align 4; \
.word 98b, 97f; \
.text; \
.align 4
#define EXT(start,end,handler) \
.section __ex_table,ALLOC; \
.align 4; \
.word start, 0, end, handler; \
.text; \
.align 4
/* Please do not change following macros unless you change logic used
* in .fixup at the end of this file as well
*/
/* Both these macros have to start with exactly the same insn */
#define MOVE_BIGCHUNK(src, dst, offset, t0, t1, t2, t3, t4, t5, t6, t7) \
ldd [%src + (offset) + 0x00], %t0; \
ldd [%src + (offset) + 0x08], %t2; \
ldd [%src + (offset) + 0x10], %t4; \
ldd [%src + (offset) + 0x18], %t6; \
st %t0, [%dst + (offset) + 0x00]; \
st %t1, [%dst + (offset) + 0x04]; \
st %t2, [%dst + (offset) + 0x08]; \
st %t3, [%dst + (offset) + 0x0c]; \
st %t4, [%dst + (offset) + 0x10]; \
st %t5, [%dst + (offset) + 0x14]; \
st %t6, [%dst + (offset) + 0x18]; \
st %t7, [%dst + (offset) + 0x1c];
#define MOVE_BIGALIGNCHUNK(src, dst, offset, t0, t1, t2, t3, t4, t5, t6, t7) \
ldd [%src + (offset) + 0x00], %t0; \
ldd [%src + (offset) + 0x08], %t2; \
ldd [%src + (offset) + 0x10], %t4; \
ldd [%src + (offset) + 0x18], %t6; \
std %t0, [%dst + (offset) + 0x00]; \
std %t2, [%dst + (offset) + 0x08]; \
std %t4, [%dst + (offset) + 0x10]; \
std %t6, [%dst + (offset) + 0x18];
#define MOVE_LASTCHUNK(src, dst, offset, t0, t1, t2, t3) \
ldd [%src - (offset) - 0x10], %t0; \
ldd [%src - (offset) - 0x08], %t2; \
st %t0, [%dst - (offset) - 0x10]; \
st %t1, [%dst - (offset) - 0x0c]; \
st %t2, [%dst - (offset) - 0x08]; \
st %t3, [%dst - (offset) - 0x04];
#define MOVE_HALFCHUNK(src, dst, offset, t0, t1, t2, t3) \
lduh [%src + (offset) + 0x00], %t0; \
lduh [%src + (offset) + 0x02], %t1; \
lduh [%src + (offset) + 0x04], %t2; \
lduh [%src + (offset) + 0x06], %t3; \
sth %t0, [%dst + (offset) + 0x00]; \
sth %t1, [%dst + (offset) + 0x02]; \
sth %t2, [%dst + (offset) + 0x04]; \
sth %t3, [%dst + (offset) + 0x06];
#define MOVE_SHORTCHUNK(src, dst, offset, t0, t1) \
ldub [%src - (offset) - 0x02], %t0; \
ldub [%src - (offset) - 0x01], %t1; \
stb %t0, [%dst - (offset) - 0x02]; \
stb %t1, [%dst - (offset) - 0x01];
.text
.align 4
.globl __copy_user_begin
__copy_user_begin:
.globl __copy_user
dword_align:
andcc %o1, 1, %g0
be 4f
andcc %o1, 2, %g0
EXO2(ldub [%o1], %g2)
add %o1, 1, %o1
EXO2(stb %g2, [%o0])
sub %o2, 1, %o2
bne 3f
add %o0, 1, %o0
EXO2(lduh [%o1], %g2)
add %o1, 2, %o1
EXO2(sth %g2, [%o0])
sub %o2, 2, %o2
b 3f
add %o0, 2, %o0
4:
EXO2(lduh [%o1], %g2)
add %o1, 2, %o1
EXO2(sth %g2, [%o0])
sub %o2, 2, %o2
b 3f
add %o0, 2, %o0
__copy_user: /* %o0=dst %o1=src %o2=len */
xor %o0, %o1, %o4
1:
andcc %o4, 3, %o5
2:
bne cannot_optimize
cmp %o2, 15
bleu short_aligned_end
andcc %o1, 3, %g0
bne dword_align
3:
andcc %o1, 4, %g0
be 2f
mov %o2, %g1
EXO2(ld [%o1], %o4)
sub %g1, 4, %g1
EXO2(st %o4, [%o0])
add %o1, 4, %o1
add %o0, 4, %o0
2:
andcc %g1, 0xffffff80, %g7
be 3f
andcc %o0, 4, %g0
be ldd_std + 4
5:
MOVE_BIGCHUNK(o1, o0, 0x00, o2, o3, o4, o5, g2, g3, g4, g5)
MOVE_BIGCHUNK(o1, o0, 0x20, o2, o3, o4, o5, g2, g3, g4, g5)
MOVE_BIGCHUNK(o1, o0, 0x40, o2, o3, o4, o5, g2, g3, g4, g5)
MOVE_BIGCHUNK(o1, o0, 0x60, o2, o3, o4, o5, g2, g3, g4, g5)
80:
EXT(5b, 80b, 50f)
subcc %g7, 128, %g7
add %o1, 128, %o1
bne 5b
add %o0, 128, %o0
3:
andcc %g1, 0x70, %g7
be copy_user_table_end
andcc %g1, 8, %g0
sethi %hi(copy_user_table_end), %o5
srl %g7, 1, %o4
add %g7, %o4, %o4
add %o1, %g7, %o1
sub %o5, %o4, %o5
jmpl %o5 + %lo(copy_user_table_end), %g0
add %o0, %g7, %o0
copy_user_table:
MOVE_LASTCHUNK(o1, o0, 0x60, g2, g3, g4, g5)
MOVE_LASTCHUNK(o1, o0, 0x50, g2, g3, g4, g5)
MOVE_LASTCHUNK(o1, o0, 0x40, g2, g3, g4, g5)
MOVE_LASTCHUNK(o1, o0, 0x30, g2, g3, g4, g5)
MOVE_LASTCHUNK(o1, o0, 0x20, g2, g3, g4, g5)
MOVE_LASTCHUNK(o1, o0, 0x10, g2, g3, g4, g5)
MOVE_LASTCHUNK(o1, o0, 0x00, g2, g3, g4, g5)
copy_user_table_end:
EXT(copy_user_table, copy_user_table_end, 51f)
be copy_user_last7
andcc %g1, 4, %g0
EX(ldd [%o1], %g2, and %g1, 0xf)
add %o0, 8, %o0
add %o1, 8, %o1
EX(st %g2, [%o0 - 0x08], and %g1, 0xf)
EX2(st %g3, [%o0 - 0x04], and %g1, 0xf, %g1, sub %g1, 4)
copy_user_last7:
be 1f
andcc %g1, 2, %g0
EX(ld [%o1], %g2, and %g1, 7)
add %o1, 4, %o1
EX(st %g2, [%o0], and %g1, 7)
add %o0, 4, %o0
1:
be 1f
andcc %g1, 1, %g0
EX(lduh [%o1], %g2, and %g1, 3)
add %o1, 2, %o1
EX(sth %g2, [%o0], and %g1, 3)
add %o0, 2, %o0
1:
be 1f
nop
EX(ldub [%o1], %g2, add %g0, 1)
EX(stb %g2, [%o0], add %g0, 1)
1:
retl
clr %o0
ldd_std:
MOVE_BIGALIGNCHUNK(o1, o0, 0x00, o2, o3, o4, o5, g2, g3, g4, g5)
MOVE_BIGALIGNCHUNK(o1, o0, 0x20, o2, o3, o4, o5, g2, g3, g4, g5)
MOVE_BIGALIGNCHUNK(o1, o0, 0x40, o2, o3, o4, o5, g2, g3, g4, g5)
MOVE_BIGALIGNCHUNK(o1, o0, 0x60, o2, o3, o4, o5, g2, g3, g4, g5)
81:
EXT(ldd_std, 81b, 52f)
subcc %g7, 128, %g7
add %o1, 128, %o1
bne ldd_std
add %o0, 128, %o0
andcc %g1, 0x70, %g7
be copy_user_table_end
andcc %g1, 8, %g0
sethi %hi(copy_user_table_end), %o5
srl %g7, 1, %o4
add %g7, %o4, %o4
add %o1, %g7, %o1
sub %o5, %o4, %o5
jmpl %o5 + %lo(copy_user_table_end), %g0
add %o0, %g7, %o0
cannot_optimize:
bleu short_end
cmp %o5, 2
bne byte_chunk
and %o2, 0xfffffff0, %o3
andcc %o1, 1, %g0
be 10f
nop
EXO2(ldub [%o1], %g2)
add %o1, 1, %o1
EXO2(stb %g2, [%o0])
sub %o2, 1, %o2
andcc %o2, 0xfffffff0, %o3
be short_end
add %o0, 1, %o0
10:
MOVE_HALFCHUNK(o1, o0, 0x00, g2, g3, g4, g5)
MOVE_HALFCHUNK(o1, o0, 0x08, g2, g3, g4, g5)
82:
EXT(10b, 82b, 53f)
subcc %o3, 0x10, %o3
add %o1, 0x10, %o1
bne 10b
add %o0, 0x10, %o0
b 2f
and %o2, 0xe, %o3
byte_chunk:
MOVE_SHORTCHUNK(o1, o0, -0x02, g2, g3)
MOVE_SHORTCHUNK(o1, o0, -0x04, g2, g3)
MOVE_SHORTCHUNK(o1, o0, -0x06, g2, g3)
MOVE_SHORTCHUNK(o1, o0, -0x08, g2, g3)
MOVE_SHORTCHUNK(o1, o0, -0x0a, g2, g3)
MOVE_SHORTCHUNK(o1, o0, -0x0c, g2, g3)
MOVE_SHORTCHUNK(o1, o0, -0x0e, g2, g3)
MOVE_SHORTCHUNK(o1, o0, -0x10, g2, g3)
83:
EXT(byte_chunk, 83b, 54f)
subcc %o3, 0x10, %o3
add %o1, 0x10, %o1
bne byte_chunk
add %o0, 0x10, %o0
short_end:
and %o2, 0xe, %o3
2:
sethi %hi(short_table_end), %o5
sll %o3, 3, %o4
add %o0, %o3, %o0
sub %o5, %o4, %o5
add %o1, %o3, %o1
jmpl %o5 + %lo(short_table_end), %g0
andcc %o2, 1, %g0
84:
MOVE_SHORTCHUNK(o1, o0, 0x0c, g2, g3)
MOVE_SHORTCHUNK(o1, o0, 0x0a, g2, g3)
MOVE_SHORTCHUNK(o1, o0, 0x08, g2, g3)
MOVE_SHORTCHUNK(o1, o0, 0x06, g2, g3)
MOVE_SHORTCHUNK(o1, o0, 0x04, g2, g3)
MOVE_SHORTCHUNK(o1, o0, 0x02, g2, g3)
MOVE_SHORTCHUNK(o1, o0, 0x00, g2, g3)
short_table_end:
EXT(84b, short_table_end, 55f)
be 1f
nop
EX(ldub [%o1], %g2, add %g0, 1)
EX(stb %g2, [%o0], add %g0, 1)
1:
retl
clr %o0
short_aligned_end:
bne short_end
andcc %o2, 8, %g0
be 1f
andcc %o2, 4, %g0
EXO2(ld [%o1 + 0x00], %g2)
EXO2(ld [%o1 + 0x04], %g3)
add %o1, 8, %o1
EXO2(st %g2, [%o0 + 0x00])
EX(st %g3, [%o0 + 0x04], sub %o2, 4)
add %o0, 8, %o0
1:
b copy_user_last7
mov %o2, %g1
.section .fixup,#alloc,#execinstr
.align 4
97:
mov %o2, %g3
fixupretl:
sethi %hi(PAGE_OFFSET), %g1
cmp %o0, %g1
blu 1f
cmp %o1, %g1
bgeu 1f
ld [%g6 + TI_PREEMPT], %g1
cmp %g1, 0
bne 1f
nop
save %sp, -64, %sp
mov %i0, %o0
call __bzero
mov %g3, %o1
restore
1: retl
mov %g3, %o0
/* exception routine sets %g2 to (broken_insn - first_insn)>>2 */
50:
/* This magic counts how many bytes are left when a crash in MOVE_BIGCHUNK
* happens. This is derived from the amount ldd reads, st stores, etc.
* x = g2 % 12;
* g3 = g1 + g7 - ((g2 / 12) * 32 + (x < 4) ? 0 : (x - 4) * 4);
* o0 += (g2 / 12) * 32;
*/
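/* (For reference: each MOVE_BIGCHUNK expands to 12 instructions -- 4 ldd
 * plus 8 st -- and copies 32 bytes per chunk (cf. the MOVE_BIGCHUNK macro
 * in memcpy.S), which is where the divide-by-12 and the 32-byte steps
 * below come from.)
 */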
cmp %g2, 12
add %o0, %g7, %o0
bcs 1f
cmp %g2, 24
bcs 2f
cmp %g2, 36
bcs 3f
nop
sub %g2, 12, %g2
sub %g7, 32, %g7
3: sub %g2, 12, %g2
sub %g7, 32, %g7
2: sub %g2, 12, %g2
sub %g7, 32, %g7
1: cmp %g2, 4
bcs,a 60f
clr %g2
sub %g2, 4, %g2
sll %g2, 2, %g2
60: and %g1, 0x7f, %g3
sub %o0, %g7, %o0
add %g3, %g7, %g3
ba fixupretl
sub %g3, %g2, %g3
51:
/* i = 41 - g2; j = i % 6;
 * g3 = (g1 & 15) + (i / 6) * 16 + ((j < 4) ? (j + 1) * 4 : 16);
* o0 -= (i / 6) * 16 + 16;
*/
neg %g2
and %g1, 0xf, %g1
add %g2, 41, %g2
add %o0, %g1, %o0
1: cmp %g2, 6
bcs,a 2f
cmp %g2, 4
add %g1, 16, %g1
b 1b
sub %g2, 6, %g2
2: bcc,a 2f
mov 16, %g2
inc %g2
sll %g2, 2, %g2
2: add %g1, %g2, %g3
ba fixupretl
sub %o0, %g3, %o0
52:
/* g3 = g1 + g7 - ((g2 / 8) * 32 + ((g2 & 4) ? (g2 & 3) * 8 : 0));
   o0 += (g2 / 8) * 32 */
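/* (MOVE_BIGALIGNCHUNK is 8 instructions -- 4 ldd plus 4 std -- per 32 bytes,
 * hence the divide-by-8 and the 32-byte adjustment here.)
 */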
andn %g2, 7, %g4
add %o0, %g7, %o0
andcc %g2, 4, %g0
and %g2, 3, %g2
sll %g4, 2, %g4
sll %g2, 3, %g2
bne 60b
sub %g7, %g4, %g7
ba 60b
clr %g2
53:
/* g3 = o3 + (o2 & 15) - (g2 & 8) - ((g2 & 4) ? (g2 & 3) * 2 : 0);
   o0 += (g2 & 8) */
and %g2, 3, %g4
andcc %g2, 4, %g0
and %g2, 8, %g2
sll %g4, 1, %g4
be 1f
add %o0, %g2, %o0
add %g2, %g4, %g2
1: and %o2, 0xf, %g3
add %g3, %o3, %g3
ba fixupretl
sub %g3, %g2, %g3
54:
/* g3 = o3 + (o2 & 15) - (g2 / 4) * 2 - ((g2 & 2) ? (g2 & 1) : 0);
   o0 += (g2 / 4) * 2 */
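/* (MOVE_SHORTCHUNK is 4 instructions -- 2 ldub plus 2 stb -- per 2 bytes,
 * hence the divide-by-4 and multiply-by-2 here.)
 */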
srl %g2, 2, %o4
and %g2, 1, %o5
srl %g2, 1, %g2
add %o4, %o4, %o4
and %o5, %g2, %o5
and %o2, 0xf, %o2
add %o0, %o4, %o0
sub %o3, %o5, %o3
sub %o2, %o4, %o2
ba fixupretl
add %o2, %o3, %g3
55:
/* i = 27 - g2;
g3 = (o2 & 1) + i / 4 * 2 + !(i & 3);
o0 -= i / 4 * 2 + 1 */
neg %g2
and %o2, 1, %o2
add %g2, 27, %g2
srl %g2, 2, %o5
andcc %g2, 3, %g0
mov 1, %g2
add %o5, %o5, %o5
be,a 1f
clr %g2
1: add %g2, %o5, %g3
sub %o0, %g3, %o0
ba fixupretl
add %g3, %o2, %g3
.globl __copy_user_end
__copy_user_end:

309
arch/sparc/lib/csum_copy.S Normal file
View file

@ -0,0 +1,309 @@
/* csum_copy.S: Checksum+copy code for sparc64
*
* Copyright (C) 2005 David S. Miller <davem@davemloft.net>
*/
#ifdef __KERNEL__
#define GLOBAL_SPARE %g7
#else
#define GLOBAL_SPARE %g5
#endif
#ifndef EX_LD
#define EX_LD(x) x
#endif
#ifndef EX_ST
#define EX_ST(x) x
#endif
#ifndef EX_RETVAL
#define EX_RETVAL(x) x
#endif
#ifndef LOAD
#define LOAD(type,addr,dest) type [addr], dest
#endif
#ifndef STORE
#define STORE(type,src,addr) type src, [addr]
#endif
#ifndef FUNC_NAME
#define FUNC_NAME csum_partial_copy_nocheck
#endif
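/* The EX_LD/EX_ST/LOAD/STORE/FUNC_NAME defaults above only apply when this
 * file is assembled on its own; csum_copy_from_user.S and csum_copy_to_user.S
 * #include it with their own definitions to add fault handling and
 * %asi-based user-space accesses.
 */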
.register %g2, #scratch
.register %g3, #scratch
.text
90:
/* We checked for zero length already, so there must be
* at least one byte.
*/
be,pt %icc, 1f
nop
EX_LD(LOAD(ldub, %o0 + 0x00, %o4))
add %o0, 1, %o0
sub %o2, 1, %o2
EX_ST(STORE(stb, %o4, %o1 + 0x00))
add %o1, 1, %o1
1: andcc %o0, 0x2, %g0
be,pn %icc, 80f
cmp %o2, 2
blu,pn %icc, 60f
nop
EX_LD(LOAD(lduh, %o0 + 0x00, %o5))
add %o0, 2, %o0
sub %o2, 2, %o2
EX_ST(STORE(sth, %o5, %o1 + 0x00))
add %o1, 2, %o1
ba,pt %xcc, 80f
add %o5, %o4, %o4
.globl FUNC_NAME
FUNC_NAME: /* %o0=src, %o1=dst, %o2=len, %o3=sum */
LOAD(prefetch, %o0 + 0x000, #n_reads)
xor %o0, %o1, %g1
clr %o4
andcc %g1, 0x3, %g0
bne,pn %icc, 95f
LOAD(prefetch, %o0 + 0x040, #n_reads)
brz,pn %o2, 70f
andcc %o0, 0x3, %g0
/* We "remember" whether the lowest bit in the address
* was set in GLOBAL_SPARE. Because if it is, we have to swap
* upper and lower 8 bit fields of the sum we calculate.
*/
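/* (Summing the data one byte out of phase yields the byte-swapped value of
 * the properly aligned 16-bit ones' complement sum, which is why the result
 * is swapped near the end when the source began on an odd address.)
 */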
bne,pn %icc, 90b
andcc %o0, 0x1, GLOBAL_SPARE
80:
LOAD(prefetch, %o0 + 0x080, #n_reads)
andncc %o2, 0x3f, %g3
LOAD(prefetch, %o0 + 0x0c0, #n_reads)
sub %o2, %g3, %o2
brz,pn %g3, 2f
LOAD(prefetch, %o0 + 0x100, #n_reads)
/* So that we don't need to use the non-pairing
* add-with-carry instructions we accumulate 32-bit
 * values into a 64-bit register. At the end of the
 * loop we fold it down to 32 bits and then to 16.
*/
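/* Roughly: accumulate 32-bit words into the 64-bit %o4, then fold with
 * sum = (sum >> 32) + (sum & 0xffffffff) (twice, to absorb the carry), and
 * use the same trick to fold 32 bits down to 16 -- see the "fold" sequences
 * below.
 */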
ba,pt %xcc, 1f
LOAD(prefetch, %o0 + 0x140, #n_reads)
.align 32
1: EX_LD(LOAD(lduw, %o0 + 0x00, %o5))
EX_LD(LOAD(lduw, %o0 + 0x04, %g1))
EX_LD(LOAD(lduw, %o0 + 0x08, %g2))
add %o4, %o5, %o4
EX_ST(STORE(stw, %o5, %o1 + 0x00))
EX_LD(LOAD(lduw, %o0 + 0x0c, %o5))
add %o4, %g1, %o4
EX_ST(STORE(stw, %g1, %o1 + 0x04))
EX_LD(LOAD(lduw, %o0 + 0x10, %g1))
add %o4, %g2, %o4
EX_ST(STORE(stw, %g2, %o1 + 0x08))
EX_LD(LOAD(lduw, %o0 + 0x14, %g2))
add %o4, %o5, %o4
EX_ST(STORE(stw, %o5, %o1 + 0x0c))
EX_LD(LOAD(lduw, %o0 + 0x18, %o5))
add %o4, %g1, %o4
EX_ST(STORE(stw, %g1, %o1 + 0x10))
EX_LD(LOAD(lduw, %o0 + 0x1c, %g1))
add %o4, %g2, %o4
EX_ST(STORE(stw, %g2, %o1 + 0x14))
EX_LD(LOAD(lduw, %o0 + 0x20, %g2))
add %o4, %o5, %o4
EX_ST(STORE(stw, %o5, %o1 + 0x18))
EX_LD(LOAD(lduw, %o0 + 0x24, %o5))
add %o4, %g1, %o4
EX_ST(STORE(stw, %g1, %o1 + 0x1c))
EX_LD(LOAD(lduw, %o0 + 0x28, %g1))
add %o4, %g2, %o4
EX_ST(STORE(stw, %g2, %o1 + 0x20))
EX_LD(LOAD(lduw, %o0 + 0x2c, %g2))
add %o4, %o5, %o4
EX_ST(STORE(stw, %o5, %o1 + 0x24))
EX_LD(LOAD(lduw, %o0 + 0x30, %o5))
add %o4, %g1, %o4
EX_ST(STORE(stw, %g1, %o1 + 0x28))
EX_LD(LOAD(lduw, %o0 + 0x34, %g1))
add %o4, %g2, %o4
EX_ST(STORE(stw, %g2, %o1 + 0x2c))
EX_LD(LOAD(lduw, %o0 + 0x38, %g2))
add %o4, %o5, %o4
EX_ST(STORE(stw, %o5, %o1 + 0x30))
EX_LD(LOAD(lduw, %o0 + 0x3c, %o5))
add %o4, %g1, %o4
EX_ST(STORE(stw, %g1, %o1 + 0x34))
LOAD(prefetch, %o0 + 0x180, #n_reads)
add %o4, %g2, %o4
EX_ST(STORE(stw, %g2, %o1 + 0x38))
subcc %g3, 0x40, %g3
add %o0, 0x40, %o0
add %o4, %o5, %o4
EX_ST(STORE(stw, %o5, %o1 + 0x3c))
bne,pt %icc, 1b
add %o1, 0x40, %o1
2: and %o2, 0x3c, %g3
brz,pn %g3, 2f
sub %o2, %g3, %o2
1: EX_LD(LOAD(lduw, %o0 + 0x00, %o5))
subcc %g3, 0x4, %g3
add %o0, 0x4, %o0
add %o4, %o5, %o4
EX_ST(STORE(stw, %o5, %o1 + 0x00))
bne,pt %icc, 1b
add %o1, 0x4, %o1
2:
/* fold 64-->32 */
srlx %o4, 32, %o5
srl %o4, 0, %o4
add %o4, %o5, %o4
srlx %o4, 32, %o5
srl %o4, 0, %o4
add %o4, %o5, %o4
/* fold 32-->16 */
sethi %hi(0xffff0000), %g1
srl %o4, 16, %o5
andn %o4, %g1, %g2
add %o5, %g2, %o4
srl %o4, 16, %o5
andn %o4, %g1, %g2
add %o5, %g2, %o4
60:
/* %o4 has the 16-bit sum we have calculated so far. */
cmp %o2, 2
blu,pt %icc, 1f
nop
EX_LD(LOAD(lduh, %o0 + 0x00, %o5))
sub %o2, 2, %o2
add %o0, 2, %o0
add %o4, %o5, %o4
EX_ST(STORE(sth, %o5, %o1 + 0x00))
add %o1, 0x2, %o1
1: brz,pt %o2, 1f
nop
EX_LD(LOAD(ldub, %o0 + 0x00, %o5))
sub %o2, 1, %o2
add %o0, 1, %o0
EX_ST(STORE(stb, %o5, %o1 + 0x00))
sllx %o5, 8, %o5
add %o1, 1, %o1
add %o4, %o5, %o4
1:
/* fold 32-->16 */
sethi %hi(0xffff0000), %g1
srl %o4, 16, %o5
andn %o4, %g1, %g2
add %o5, %g2, %o4
srl %o4, 16, %o5
andn %o4, %g1, %g2
add %o5, %g2, %o4
1: brz,pt GLOBAL_SPARE, 1f
nop
/* We started with an odd byte, byte-swap the result. */
srl %o4, 8, %o5
and %o4, 0xff, %g1
sll %g1, 8, %g1
or %o5, %g1, %o4
1: addcc %o3, %o4, %o3
addc %g0, %o3, %o3
70:
retl
srl %o3, 0, %o0
95: mov 0, GLOBAL_SPARE
brlez,pn %o2, 4f
andcc %o0, 1, %o5
be,a,pt %icc, 1f
srl %o2, 1, %g1
sub %o2, 1, %o2
EX_LD(LOAD(ldub, %o0, GLOBAL_SPARE))
add %o0, 1, %o0
EX_ST(STORE(stb, GLOBAL_SPARE, %o1))
srl %o2, 1, %g1
add %o1, 1, %o1
1: brz,a,pn %g1, 3f
andcc %o2, 1, %g0
andcc %o0, 2, %g0
be,a,pt %icc, 1f
srl %g1, 1, %g1
EX_LD(LOAD(lduh, %o0, %o4))
sub %o2, 2, %o2
srl %o4, 8, %g2
sub %g1, 1, %g1
EX_ST(STORE(stb, %g2, %o1))
add %o4, GLOBAL_SPARE, GLOBAL_SPARE
EX_ST(STORE(stb, %o4, %o1 + 1))
add %o0, 2, %o0
srl %g1, 1, %g1
add %o1, 2, %o1
1: brz,a,pn %g1, 2f
andcc %o2, 2, %g0
EX_LD(LOAD(lduw, %o0, %o4))
5: srl %o4, 24, %g2
srl %o4, 16, %g3
EX_ST(STORE(stb, %g2, %o1))
srl %o4, 8, %g2
EX_ST(STORE(stb, %g3, %o1 + 1))
add %o0, 4, %o0
EX_ST(STORE(stb, %g2, %o1 + 2))
addcc %o4, GLOBAL_SPARE, GLOBAL_SPARE
EX_ST(STORE(stb, %o4, %o1 + 3))
addc GLOBAL_SPARE, %g0, GLOBAL_SPARE
add %o1, 4, %o1
subcc %g1, 1, %g1
bne,a,pt %icc, 5b
EX_LD(LOAD(lduw, %o0, %o4))
sll GLOBAL_SPARE, 16, %g2
srl GLOBAL_SPARE, 16, GLOBAL_SPARE
srl %g2, 16, %g2
andcc %o2, 2, %g0
add %g2, GLOBAL_SPARE, GLOBAL_SPARE
2: be,a,pt %icc, 3f
andcc %o2, 1, %g0
EX_LD(LOAD(lduh, %o0, %o4))
andcc %o2, 1, %g0
srl %o4, 8, %g2
add %o0, 2, %o0
EX_ST(STORE(stb, %g2, %o1))
add GLOBAL_SPARE, %o4, GLOBAL_SPARE
EX_ST(STORE(stb, %o4, %o1 + 1))
add %o1, 2, %o1
3: be,a,pt %icc, 1f
sll GLOBAL_SPARE, 16, %o4
EX_LD(LOAD(ldub, %o0, %g2))
sll %g2, 8, %o4
EX_ST(STORE(stb, %g2, %o1))
add GLOBAL_SPARE, %o4, GLOBAL_SPARE
sll GLOBAL_SPARE, 16, %o4
1: addcc %o4, GLOBAL_SPARE, GLOBAL_SPARE
srl GLOBAL_SPARE, 16, %o4
addc %g0, %o4, GLOBAL_SPARE
brz,pt %o5, 4f
srl GLOBAL_SPARE, 8, %o4
and GLOBAL_SPARE, 0xff, %g2
and %o4, 0xff, %o4
sll %g2, 8, %g2
or %g2, %o4, GLOBAL_SPARE
4: addcc %o3, GLOBAL_SPARE, %o3
addc %g0, %o3, %o0
retl
srl %o0, 0, %o0
.size FUNC_NAME, .-FUNC_NAME

21
arch/sparc/lib/csum_copy_from_user.S Normal file
View file

@ -0,0 +1,21 @@
/* csum_copy_from_user.S: Checksum+copy from userspace.
*
* Copyright (C) 2005 David S. Miller (davem@davemloft.net)
*/
#define EX_LD(x) \
98: x; \
.section .fixup, "ax"; \
.align 4; \
99: retl; \
mov -1, %o0; \
.section __ex_table,"a";\
.align 4; \
.word 98b, 99b; \
.text; \
.align 4;
#define FUNC_NAME __csum_partial_copy_from_user
#define LOAD(type,addr,dest) type##a [addr] %asi, dest
#include "csum_copy.S"

21
arch/sparc/lib/csum_copy_to_user.S Normal file
View file

@ -0,0 +1,21 @@
/* csum_copy_to_user.S: Checksum+copy to userspace.
*
* Copyright (C) 2005 David S. Miller (davem@davemloft.net)
*/
#define EX_ST(x) \
98: x; \
.section .fixup,"ax"; \
.align 4; \
99: retl; \
mov -1, %o0; \
.section __ex_table,"a";\
.align 4; \
.word 98b, 99b; \
.text; \
.align 4;
#define FUNC_NAME __csum_partial_copy_to_user
#define STORE(type,src,addr) type##a src, [addr] %asi
#include "csum_copy.S"

281
arch/sparc/lib/divdi3.S Normal file
View file

@ -0,0 +1,281 @@
/* Copyright (C) 1989, 1992, 1993, 1994, 1995 Free Software Foundation, Inc.
This file is part of GNU CC.
GNU CC is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.
GNU CC is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with GNU CC; see the file COPYING. If not, write to
the Free Software Foundation, 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA. */
.text
.align 4
.globl __divdi3
__divdi3:
save %sp,-104,%sp
cmp %i0,0
bge .LL40
mov 0,%l4
mov -1,%l4
sub %g0,%i1,%o0
mov %o0,%o5
subcc %g0,%o0,%g0
sub %g0,%i0,%o0
subx %o0,0,%o4
mov %o4,%i0
mov %o5,%i1
.LL40:
cmp %i2,0
bge .LL84
mov %i3,%o4
xnor %g0,%l4,%l4
sub %g0,%i3,%o0
mov %o0,%o3
subcc %g0,%o0,%g0
sub %g0,%i2,%o0
subx %o0,0,%o2
mov %o2,%i2
mov %o3,%i3
mov %i3,%o4
.LL84:
cmp %i2,0
bne .LL45
mov %i1,%i3
cmp %o4,%i0
bleu .LL46
mov %i3,%o1
mov 32,%g1
subcc %i0,%o4,%g0
1: bcs 5f
addxcc %o1,%o1,%o1 ! shift n1n0 and a q-bit in lsb
sub %i0,%o4,%i0 ! this kills msb of n
addx %i0,%i0,%i0 ! so this cannot give carry
subcc %g1,1,%g1
2: bne 1b
subcc %i0,%o4,%g0
bcs 3f
addxcc %o1,%o1,%o1 ! shift n1n0 and a q-bit in lsb
b 3f
sub %i0,%o4,%i0 ! this kills msb of n
4: sub %i0,%o4,%i0
5: addxcc %i0,%i0,%i0
bcc 2b
subcc %g1,1,%g1
! Got carry from n. Subtract next step to cancel this carry.
bne 4b
addcc %o1,%o1,%o1 ! shift n1n0 and a 0-bit in lsb
sub %i0,%o4,%i0
3: xnor %o1,0,%o1
b .LL50
mov 0,%o2
.LL46:
cmp %o4,0
bne .LL85
mov %i0,%o2
mov 1,%o0
mov 0,%o1
wr %g0, 0, %y
udiv %o0, %o1, %o0
mov %o0,%o4
mov %i0,%o2
.LL85:
mov 0,%g3
mov 32,%g1
subcc %g3,%o4,%g0
1: bcs 5f
addxcc %o2,%o2,%o2 ! shift n1n0 and a q-bit in lsb
sub %g3,%o4,%g3 ! this kills msb of n
addx %g3,%g3,%g3 ! so this cannot give carry
subcc %g1,1,%g1
2: bne 1b
subcc %g3,%o4,%g0
bcs 3f
addxcc %o2,%o2,%o2 ! shift n1n0 and a q-bit in lsb
b 3f
sub %g3,%o4,%g3 ! this kills msb of n
4: sub %g3,%o4,%g3
5: addxcc %g3,%g3,%g3
bcc 2b
subcc %g1,1,%g1
! Got carry from n. Subtract next step to cancel this carry.
bne 4b
addcc %o2,%o2,%o2 ! shift n1n0 and a 0-bit in lsb
sub %g3,%o4,%g3
3: xnor %o2,0,%o2
mov %g3,%i0
mov %i3,%o1
mov 32,%g1
subcc %i0,%o4,%g0
1: bcs 5f
addxcc %o1,%o1,%o1 ! shift n1n0 and a q-bit in lsb
sub %i0,%o4,%i0 ! this kills msb of n
addx %i0,%i0,%i0 ! so this cannot give carry
subcc %g1,1,%g1
2: bne 1b
subcc %i0,%o4,%g0
bcs 3f
addxcc %o1,%o1,%o1 ! shift n1n0 and a q-bit in lsb
b 3f
sub %i0,%o4,%i0 ! this kills msb of n
4: sub %i0,%o4,%i0
5: addxcc %i0,%i0,%i0
bcc 2b
subcc %g1,1,%g1
! Got carry from n. Subtract next step to cancel this carry.
bne 4b
addcc %o1,%o1,%o1 ! shift n1n0 and a 0-bit in lsb
sub %i0,%o4,%i0
3: xnor %o1,0,%o1
b .LL86
mov %o1,%l1
.LL45:
cmp %i2,%i0
bleu .LL51
sethi %hi(65535),%o0
b .LL78
mov 0,%o1
.LL51:
or %o0,%lo(65535),%o0
cmp %i2,%o0
bgu .LL58
mov %i2,%o1
cmp %i2,256
addx %g0,-1,%o0
b .LL64
and %o0,8,%o2
.LL58:
sethi %hi(16777215),%o0
or %o0,%lo(16777215),%o0
cmp %i2,%o0
bgu .LL64
mov 24,%o2
mov 16,%o2
.LL64:
srl %o1,%o2,%o0
sethi %hi(__clz_tab),%o1
or %o1,%lo(__clz_tab),%o1
ldub [%o0+%o1],%o0
add %o0,%o2,%o0
mov 32,%o1
subcc %o1,%o0,%o3
bne,a .LL72
sub %o1,%o3,%o1
cmp %i0,%i2
bgu .LL74
cmp %i3,%o4
blu .LL78
mov 0,%o1
.LL74:
b .LL78
mov 1,%o1
.LL72:
sll %i2,%o3,%o2
srl %o4,%o1,%o0
or %o2,%o0,%i2
sll %o4,%o3,%o4
srl %i0,%o1,%o2
sll %i0,%o3,%o0
srl %i3,%o1,%o1
or %o0,%o1,%i0
sll %i3,%o3,%i3
mov %i0,%o1
mov 32,%g1
subcc %o2,%i2,%g0
1: bcs 5f
addxcc %o1,%o1,%o1 ! shift n1n0 and a q-bit in lsb
sub %o2,%i2,%o2 ! this kills msb of n
addx %o2,%o2,%o2 ! so this cannot give carry
subcc %g1,1,%g1
2: bne 1b
subcc %o2,%i2,%g0
bcs 3f
addxcc %o1,%o1,%o1 ! shift n1n0 and a q-bit in lsb
b 3f
sub %o2,%i2,%o2 ! this kills msb of n
4: sub %o2,%i2,%o2
5: addxcc %o2,%o2,%o2
bcc 2b
subcc %g1,1,%g1
! Got carry from n. Subtract next step to cancel this carry.
bne 4b
addcc %o1,%o1,%o1 ! shift n1n0 and a 0-bit in lsb
sub %o2,%i2,%o2
3: xnor %o1,0,%o1
mov %o2,%i0
wr %g0,%o1,%y ! SPARC has 0-3 delay insn after a wr
sra %o4,31,%g2 ! Do not move this insn
and %o1,%g2,%g2 ! Do not move this insn
andcc %g0,0,%g1 ! Do not move this insn
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,%o4,%g1
mulscc %g1,0,%g1
add %g1,%g2,%o0
rd %y,%o2
cmp %o0,%i0
bgu,a .LL78
add %o1,-1,%o1
bne,a .LL50
mov 0,%o2
cmp %o2,%i3
bleu .LL50
mov 0,%o2
add %o1,-1,%o1
.LL78:
mov 0,%o2
.LL50:
mov %o1,%l1
.LL86:
mov %o2,%l0
mov %l0,%i0
mov %l1,%i1
cmp %l4,0
be .LL81
sub %g0,%i1,%o0
mov %o0,%l3
subcc %g0,%o0,%g0
sub %g0,%i0,%o0
subx %o0,0,%l2
mov %l2,%i0
mov %l3,%i1
.LL81:
ret
restore

84
arch/sparc/lib/ffs.S Normal file
View file

@ -0,0 +1,84 @@
#include <linux/linkage.h>
.register %g2,#scratch
.text
.align 32
ENTRY(ffs)
brnz,pt %o0, 1f
mov 1, %o1
retl
clr %o0
nop
nop
ENTRY(__ffs)
sllx %o0, 32, %g1 /* 1 */
srlx %o0, 32, %g2
clr %o1 /* 2 */
movrz %g1, %g2, %o0
movrz %g1, 32, %o1 /* 3 */
1: clr %o2
sllx %o0, (64 - 16), %g1 /* 4 */
srlx %o0, 16, %g2
movrz %g1, %g2, %o0 /* 5 */
clr %o3
movrz %g1, 16, %o2 /* 6 */
clr %o4
and %o0, 0xff, %g1 /* 7 */
srlx %o0, 8, %g2
movrz %g1, %g2, %o0 /* 8 */
clr %o5
movrz %g1, 8, %o3 /* 9 */
add %o2, %o1, %o2
and %o0, 0xf, %g1 /* 10 */
srlx %o0, 4, %g2
movrz %g1, %g2, %o0 /* 11 */
add %o2, %o3, %o2
movrz %g1, 4, %o4 /* 12 */
and %o0, 0x3, %g1 /* 13 */
srlx %o0, 2, %g2
movrz %g1, %g2, %o0 /* 14 */
add %o2, %o4, %o2
movrz %g1, 2, %o5 /* 15 */
and %o0, 0x1, %g1 /* 16 */
add %o2, %o5, %o2 /* 17 */
xor %g1, 0x1, %g1
retl /* 18 */
add %o2, %g1, %o0
ENDPROC(ffs)
ENDPROC(__ffs)
.section .popc_6insn_patch, "ax"
.word ffs
brz,pn %o0, 98f
neg %o0, %g1
xnor %o0, %g1, %o1
popc %o1, %o0
98: retl
nop
.word __ffs
neg %o0, %g1
xnor %o0, %g1, %o1
popc %o1, %o0
retl
sub %o0, 1, %o0
nop
.previous

51
arch/sparc/lib/hweight.S Normal file
View file

@ -0,0 +1,51 @@
#include <linux/linkage.h>
.text
.align 32
ENTRY(__arch_hweight8)
ba,pt %xcc, __sw_hweight8
nop
nop
ENDPROC(__arch_hweight8)
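/* Each .popc_3insn_patch entry below is the address of a three-instruction
 * stub followed by three replacement instructions; on cpus that implement
 * popc the kernel patches the replacements in at boot.
 */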
.section .popc_3insn_patch, "ax"
.word __arch_hweight8
sllx %o0, 64-8, %g1
retl
popc %g1, %o0
.previous
ENTRY(__arch_hweight16)
ba,pt %xcc, __sw_hweight16
nop
nop
ENDPROC(__arch_hweight16)
.section .popc_3insn_patch, "ax"
.word __arch_hweight16
sllx %o0, 64-16, %g1
retl
popc %g1, %o0
.previous
ENTRY(__arch_hweight32)
ba,pt %xcc, __sw_hweight32
nop
nop
ENDPROC(__arch_hweight32)
.section .popc_3insn_patch, "ax"
.word __arch_hweight32
sllx %o0, 64-32, %g1
retl
popc %g1, %o0
.previous
ENTRY(__arch_hweight64)
ba,pt %xcc, __sw_hweight64
nop
nop
ENDPROC(__arch_hweight64)
.section .popc_3insn_patch, "ax"
.word __arch_hweight64
retl
popc %o0, %o0
nop
.previous

25
arch/sparc/lib/iomap.c Normal file
View file

@ -0,0 +1,25 @@
/*
* Implement the sparc iomap interfaces
*/
#include <linux/pci.h>
#include <linux/module.h>
#include <asm/io.h>
/* Create a virtual mapping cookie for an IO port range */
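/* On sparc the PCI I/O space is memory-mapped, so the port number itself can
 * be handed back as the cookie. */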
void __iomem *ioport_map(unsigned long port, unsigned int nr)
{
return (void __iomem *) (unsigned long) port;
}
void ioport_unmap(void __iomem *addr)
{
/* Nothing to do */
}
EXPORT_SYMBOL(ioport_map);
EXPORT_SYMBOL(ioport_unmap);
void pci_iounmap(struct pci_dev *dev, void __iomem * addr)
{
/* nothing to do */
}
EXPORT_SYMBOL(pci_iounmap);

33
arch/sparc/lib/ipcsum.S Normal file
View file

@ -0,0 +1,33 @@
#include <linux/linkage.h>
.text
ENTRY(ip_fast_csum) /* %o0 = iph, %o1 = ihl */
sub %o1, 4, %g7
lduw [%o0 + 0x00], %o2
lduw [%o0 + 0x04], %g2
lduw [%o0 + 0x08], %g3
addcc %g2, %o2, %o2
lduw [%o0 + 0x0c], %g2
addccc %g3, %o2, %o2
lduw [%o0 + 0x10], %g3
addccc %g2, %o2, %o2
addc %o2, %g0, %o2
1: addcc %g3, %o2, %o2
add %o0, 4, %o0
addccc %o2, %g0, %o2
subcc %g7, 1, %g7
be,a,pt %icc, 2f
sll %o2, 16, %g2
lduw [%o0 + 0x10], %g3
ba,pt %xcc, 1b
nop
2: addcc %o2, %g2, %g2
srl %g2, 16, %o2
addc %o2, %g0, %o2
xnor %g0, %o2, %o2
set 0xffff, %o1
retl
and %o2, %o1, %o0
ENDPROC(ip_fast_csum)

166
arch/sparc/lib/ksyms.c Normal file
View file

@ -0,0 +1,166 @@
/*
* Export of symbols defined in assembler
*/
/* Tell string.h we don't want memcpy etc. as cpp defines */
#define EXPORT_SYMTAB_STROPS
#include <linux/module.h>
#include <linux/string.h>
#include <linux/types.h>
#include <asm/checksum.h>
#include <asm/uaccess.h>
#include <asm/ftrace.h>
/* string functions */
EXPORT_SYMBOL(strlen);
EXPORT_SYMBOL(strncmp);
/* mem* functions */
extern void *__memscan_zero(void *, size_t);
extern void *__memscan_generic(void *, int, size_t);
extern void *__bzero(void *, size_t);
EXPORT_SYMBOL(memscan);
EXPORT_SYMBOL(__memscan_zero);
EXPORT_SYMBOL(__memscan_generic);
EXPORT_SYMBOL(memcmp);
EXPORT_SYMBOL(memcpy);
EXPORT_SYMBOL(memset);
EXPORT_SYMBOL(memmove);
EXPORT_SYMBOL(__bzero);
/* Networking helper routines. */
EXPORT_SYMBOL(csum_partial);
#ifdef CONFIG_MCOUNT
EXPORT_SYMBOL(_mcount);
#endif
/*
* sparc
*/
#ifdef CONFIG_SPARC32
extern int __ashrdi3(int, int);
extern int __ashldi3(int, int);
extern int __lshrdi3(int, int);
extern int __muldi3(int, int);
extern int __divdi3(int, int);
extern void (*__copy_1page)(void *, const void *);
extern void (*bzero_1page)(void *);
extern void ___rw_read_enter(void);
extern void ___rw_read_try(void);
extern void ___rw_read_exit(void);
extern void ___rw_write_enter(void);
/* Networking helper routines. */
EXPORT_SYMBOL(__csum_partial_copy_sparc_generic);
/* Special internal versions of library functions. */
EXPORT_SYMBOL(__copy_1page);
EXPORT_SYMBOL(__memmove);
EXPORT_SYMBOL(bzero_1page);
/* Moving data to/from/in userspace. */
EXPORT_SYMBOL(__copy_user);
/* Used by asm/spinlock.h */
#ifdef CONFIG_SMP
EXPORT_SYMBOL(___rw_read_enter);
EXPORT_SYMBOL(___rw_read_try);
EXPORT_SYMBOL(___rw_read_exit);
EXPORT_SYMBOL(___rw_write_enter);
#endif
EXPORT_SYMBOL(__ashrdi3);
EXPORT_SYMBOL(__ashldi3);
EXPORT_SYMBOL(__lshrdi3);
EXPORT_SYMBOL(__muldi3);
EXPORT_SYMBOL(__divdi3);
#endif
/*
* sparc64
*/
#ifdef CONFIG_SPARC64
/* Networking helper routines. */
EXPORT_SYMBOL(csum_partial_copy_nocheck);
EXPORT_SYMBOL(__csum_partial_copy_from_user);
EXPORT_SYMBOL(__csum_partial_copy_to_user);
EXPORT_SYMBOL(ip_fast_csum);
/* Moving data to/from/in userspace. */
EXPORT_SYMBOL(___copy_to_user);
EXPORT_SYMBOL(___copy_from_user);
EXPORT_SYMBOL(___copy_in_user);
EXPORT_SYMBOL(__clear_user);
/* Atomic counter implementation. */
#define ATOMIC_OP(op) \
EXPORT_SYMBOL(atomic_##op); \
EXPORT_SYMBOL(atomic64_##op);
#define ATOMIC_OP_RETURN(op) \
EXPORT_SYMBOL(atomic_##op##_return); \
EXPORT_SYMBOL(atomic64_##op##_return);
#define ATOMIC_OPS(op) ATOMIC_OP(op) ATOMIC_OP_RETURN(op)
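/* For example, ATOMIC_OPS(add) expands to:
 * EXPORT_SYMBOL(atomic_add); EXPORT_SYMBOL(atomic64_add);
 * EXPORT_SYMBOL(atomic_add_return); EXPORT_SYMBOL(atomic64_add_return);
 */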
ATOMIC_OPS(add)
ATOMIC_OPS(sub)
#undef ATOMIC_OPS
#undef ATOMIC_OP_RETURN
#undef ATOMIC_OP
EXPORT_SYMBOL(atomic64_dec_if_positive);
/* Atomic bit operations. */
EXPORT_SYMBOL(test_and_set_bit);
EXPORT_SYMBOL(test_and_clear_bit);
EXPORT_SYMBOL(test_and_change_bit);
EXPORT_SYMBOL(set_bit);
EXPORT_SYMBOL(clear_bit);
EXPORT_SYMBOL(change_bit);
/* Special internal versions of library functions. */
EXPORT_SYMBOL(_clear_page);
EXPORT_SYMBOL(clear_user_page);
EXPORT_SYMBOL(copy_user_page);
/* RAID code needs this */
void VISenter(void);
EXPORT_SYMBOL(VISenter);
/* CRYPTO code needs this */
void VISenterhalf(void);
EXPORT_SYMBOL(VISenterhalf);
extern void xor_vis_2(unsigned long, unsigned long *, unsigned long *);
extern void xor_vis_3(unsigned long, unsigned long *, unsigned long *,
unsigned long *);
extern void xor_vis_4(unsigned long, unsigned long *, unsigned long *,
unsigned long *, unsigned long *);
extern void xor_vis_5(unsigned long, unsigned long *, unsigned long *,
unsigned long *, unsigned long *, unsigned long *);
EXPORT_SYMBOL(xor_vis_2);
EXPORT_SYMBOL(xor_vis_3);
EXPORT_SYMBOL(xor_vis_4);
EXPORT_SYMBOL(xor_vis_5);
extern void xor_niagara_2(unsigned long, unsigned long *, unsigned long *);
extern void xor_niagara_3(unsigned long, unsigned long *, unsigned long *,
unsigned long *);
extern void xor_niagara_4(unsigned long, unsigned long *, unsigned long *,
unsigned long *, unsigned long *);
extern void xor_niagara_5(unsigned long, unsigned long *, unsigned long *,
unsigned long *, unsigned long *, unsigned long *);
EXPORT_SYMBOL(xor_niagara_2);
EXPORT_SYMBOL(xor_niagara_3);
EXPORT_SYMBOL(xor_niagara_4);
EXPORT_SYMBOL(xor_niagara_5);
#endif

18
arch/sparc/lib/libgcc.h Normal file
View file

@ -0,0 +1,18 @@
#ifndef __ASM_LIBGCC_H
#define __ASM_LIBGCC_H
#include <asm/byteorder.h>
typedef int word_type __attribute__ ((mode (__word__)));
struct DWstruct {
int high, low;
};
typedef union
{
struct DWstruct s;
long long ll;
} DWunion;
#endif /* __ASM_LIBGCC_H */

92
arch/sparc/lib/locks.S Normal file
View file

@ -0,0 +1,92 @@
/*
* locks.S: SMP low-level lock primitives on Sparc.
*
* Copyright (C) 1996 David S. Miller (davem@caip.rutgers.edu)
* Copyright (C) 1998 Anton Blanchard (anton@progsoc.uts.edu.au)
* Copyright (C) 1998 Jakub Jelinek (jj@ultra.linux.cz)
*/
#include <asm/ptrace.h>
#include <asm/psr.h>
#include <asm/smp.h>
#include <asm/spinlock.h>
.text
.align 4
/* Reader/writer locks; as usual this is overly clever to make it
* as fast as possible.
*/
/* caches... */
___rw_read_enter_spin_on_wlock:
orcc %g2, 0x0, %g0
be,a ___rw_read_enter
ldstub [%g1 + 3], %g2
b ___rw_read_enter_spin_on_wlock
ldub [%g1 + 3], %g2
___rw_read_try_spin_on_wlock:
andcc %g2, 0xff, %g0
be,a ___rw_read_try
ldstub [%g1 + 3], %g2
xnorcc %g2, 0x0, %o0 /* if g2 is ~0, set o0 to 0 and bugger off */
bne,a ___rw_read_enter_spin_on_wlock
ld [%g1], %g2
retl
mov %g4, %o7
___rw_read_exit_spin_on_wlock:
orcc %g2, 0x0, %g0
be,a ___rw_read_exit
ldstub [%g1 + 3], %g2
b ___rw_read_exit_spin_on_wlock
ldub [%g1 + 3], %g2
___rw_write_enter_spin_on_wlock:
orcc %g2, 0x0, %g0
be,a ___rw_write_enter
ldstub [%g1 + 3], %g2
b ___rw_write_enter_spin_on_wlock
ld [%g1], %g2
.globl ___rw_read_enter
___rw_read_enter:
orcc %g2, 0x0, %g0
bne,a ___rw_read_enter_spin_on_wlock
ldub [%g1 + 3], %g2
ld [%g1], %g2
add %g2, 1, %g2
st %g2, [%g1]
retl
mov %g4, %o7
.globl ___rw_read_exit
___rw_read_exit:
orcc %g2, 0x0, %g0
bne,a ___rw_read_exit_spin_on_wlock
ldub [%g1 + 3], %g2
ld [%g1], %g2
sub %g2, 0x1ff, %g2
st %g2, [%g1]
retl
mov %g4, %o7
.globl ___rw_read_try
___rw_read_try:
orcc %g2, 0x0, %g0
bne ___rw_read_try_spin_on_wlock
ld [%g1], %g2
add %g2, 1, %g2
st %g2, [%g1]
set 1, %o1
retl
mov %g4, %o7
.globl ___rw_write_enter
___rw_write_enter:
orcc %g2, 0x0, %g0
bne ___rw_write_enter_spin_on_wlock
ld [%g1], %g2
andncc %g2, 0xff, %g0
bne,a ___rw_write_enter_spin_on_wlock
stb %g0, [%g1 + 3]
retl
mov %g4, %o7

27
arch/sparc/lib/lshrdi3.S Normal file
View file

@ -0,0 +1,27 @@
#include <linux/linkage.h>
ENTRY(__lshrdi3)
cmp %o2, 0
be 3f
mov 0x20, %g2
sub %g2, %o2, %g2
cmp %g2, 0
bg 1f
srl %o0, %o2, %o4
clr %o4
neg %g2
b 2f
srl %o0, %g2, %o5
1:
sll %o0, %g2, %g3
srl %o1, %o2, %g2
or %g2, %g3, %o5
2:
mov %o4, %o0
mov %o5, %o1
3:
retl
nop
ENDPROC(__lshrdi3)

123
arch/sparc/lib/mcount.S Normal file
View file

@ -0,0 +1,123 @@
/*
* Copyright (C) 2000 Anton Blanchard (anton@linuxcare.com)
*
* This file implements mcount(), which is used to collect profiling data.
* This can also be tweaked for kernel stack overflow detection.
*/
#include <linux/linkage.h>
/*
* This is the main variant and is called by C code. GCC's -pg option
* automatically instruments every C function with a call to this.
*/
.text
.align 32
.globl _mcount
.type _mcount,#function
.globl mcount
.type mcount,#function
_mcount:
mcount:
#ifdef CONFIG_FUNCTION_TRACER
#ifdef CONFIG_DYNAMIC_FTRACE
/* Do nothing, the retl/nop below is all we need. */
#else
sethi %hi(ftrace_trace_function), %g1
sethi %hi(ftrace_stub), %g2
ldx [%g1 + %lo(ftrace_trace_function)], %g1
or %g2, %lo(ftrace_stub), %g2
cmp %g1, %g2
be,pn %icc, 1f
mov %i7, %g3
save %sp, -176, %sp
mov %g3, %o1
jmpl %g1, %o7
mov %i7, %o0
ret
restore
/* not reached */
1:
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
sethi %hi(ftrace_graph_return), %g1
ldx [%g1 + %lo(ftrace_graph_return)], %g3
cmp %g2, %g3
bne,pn %xcc, 5f
sethi %hi(ftrace_graph_entry_stub), %g2
sethi %hi(ftrace_graph_entry), %g1
or %g2, %lo(ftrace_graph_entry_stub), %g2
ldx [%g1 + %lo(ftrace_graph_entry)], %g1
cmp %g1, %g2
be,pt %xcc, 2f
nop
5: mov %i7, %g2
mov %fp, %g3
save %sp, -176, %sp
mov %g2, %l0
ba,pt %xcc, ftrace_graph_caller
mov %g3, %l1
#endif
2:
#endif
#endif
retl
nop
.size _mcount,.-_mcount
.size mcount,.-mcount
#ifdef CONFIG_FUNCTION_TRACER
.globl ftrace_stub
.type ftrace_stub,#function
ftrace_stub:
retl
nop
.size ftrace_stub,.-ftrace_stub
#ifdef CONFIG_DYNAMIC_FTRACE
.globl ftrace_caller
.type ftrace_caller,#function
ftrace_caller:
mov %i7, %g2
mov %fp, %g3
save %sp, -176, %sp
mov %g2, %o1
mov %g2, %l0
mov %g3, %l1
.globl ftrace_call
ftrace_call:
call ftrace_stub
mov %i7, %o0
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
.globl ftrace_graph_call
ftrace_graph_call:
call ftrace_stub
nop
#endif
ret
restore
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
.size ftrace_graph_call,.-ftrace_graph_call
#endif
.size ftrace_call,.-ftrace_call
.size ftrace_caller,.-ftrace_caller
#endif
#endif
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
ENTRY(ftrace_graph_caller)
mov %l0, %o0
mov %i7, %o1
call prepare_ftrace_return
mov %l1, %o2
ret
restore %o0, -8, %i7
END(ftrace_graph_caller)
ENTRY(return_to_handler)
save %sp, -176, %sp
call ftrace_return_to_handler
mov %fp, %o0
jmpl %o0 + 8, %g0
restore
END(return_to_handler)
#endif

27
arch/sparc/lib/memcmp.S Normal file
View file

@ -0,0 +1,27 @@
/* Sparc optimized memcmp code.
*
* Copyright (C) 1997 Jakub Jelinek (jj@sunsite.mff.cuni.cz)
* Copyright (C) 2000, 2008 David S. Miller (davem@davemloft.net)
*/
#include <linux/linkage.h>
#include <asm/asm.h>
.text
ENTRY(memcmp)
cmp %o2, 0
1: BRANCH32(be, pn, 2f)
nop
ldub [%o0], %g7
ldub [%o1], %g3
sub %o2, 1, %o2
add %o0, 1, %o0
add %o1, 1, %o1
subcc %g7, %g3, %g3
BRANCH32(be, pt, 1b)
cmp %o2, 0
retl
mov %g3, %o0
2: retl
mov 0, %o0
ENDPROC(memcmp)

541
arch/sparc/lib/memcpy.S Normal file
View file

@ -0,0 +1,541 @@
/* memcpy.S: Sparc optimized memcpy and memmove code
* Hand optimized from GNU libc's memcpy and memmove
* Copyright (C) 1991,1996 Free Software Foundation
* Copyright (C) 1995 Linus Torvalds (Linus.Torvalds@helsinki.fi)
* Copyright (C) 1996 David S. Miller (davem@caip.rutgers.edu)
* Copyright (C) 1996 Eddie C. Dost (ecd@skynet.be)
* Copyright (C) 1996 Jakub Jelinek (jj@sunsite.mff.cuni.cz)
*/
#define FUNC(x) \
.globl x; \
.type x,@function; \
.align 4; \
x:
/* Both these macros have to start with exactly the same insn */
#define MOVE_BIGCHUNK(src, dst, offset, t0, t1, t2, t3, t4, t5, t6, t7) \
ldd [%src + (offset) + 0x00], %t0; \
ldd [%src + (offset) + 0x08], %t2; \
ldd [%src + (offset) + 0x10], %t4; \
ldd [%src + (offset) + 0x18], %t6; \
st %t0, [%dst + (offset) + 0x00]; \
st %t1, [%dst + (offset) + 0x04]; \
st %t2, [%dst + (offset) + 0x08]; \
st %t3, [%dst + (offset) + 0x0c]; \
st %t4, [%dst + (offset) + 0x10]; \
st %t5, [%dst + (offset) + 0x14]; \
st %t6, [%dst + (offset) + 0x18]; \
st %t7, [%dst + (offset) + 0x1c];
#define MOVE_BIGALIGNCHUNK(src, dst, offset, t0, t1, t2, t3, t4, t5, t6, t7) \
ldd [%src + (offset) + 0x00], %t0; \
ldd [%src + (offset) + 0x08], %t2; \
ldd [%src + (offset) + 0x10], %t4; \
ldd [%src + (offset) + 0x18], %t6; \
std %t0, [%dst + (offset) + 0x00]; \
std %t2, [%dst + (offset) + 0x08]; \
std %t4, [%dst + (offset) + 0x10]; \
std %t6, [%dst + (offset) + 0x18];
#define MOVE_LASTCHUNK(src, dst, offset, t0, t1, t2, t3) \
ldd [%src - (offset) - 0x10], %t0; \
ldd [%src - (offset) - 0x08], %t2; \
st %t0, [%dst - (offset) - 0x10]; \
st %t1, [%dst - (offset) - 0x0c]; \
st %t2, [%dst - (offset) - 0x08]; \
st %t3, [%dst - (offset) - 0x04];
#define MOVE_LASTALIGNCHUNK(src, dst, offset, t0, t1, t2, t3) \
ldd [%src - (offset) - 0x10], %t0; \
ldd [%src - (offset) - 0x08], %t2; \
std %t0, [%dst - (offset) - 0x10]; \
std %t2, [%dst - (offset) - 0x08];
#define MOVE_SHORTCHUNK(src, dst, offset, t0, t1) \
ldub [%src - (offset) - 0x02], %t0; \
ldub [%src - (offset) - 0x01], %t1; \
stb %t0, [%dst - (offset) - 0x02]; \
stb %t1, [%dst - (offset) - 0x01];
/* Both these macros have to start with exactly the same insn */
#define RMOVE_BIGCHUNK(src, dst, offset, t0, t1, t2, t3, t4, t5, t6, t7) \
ldd [%src - (offset) - 0x20], %t0; \
ldd [%src - (offset) - 0x18], %t2; \
ldd [%src - (offset) - 0x10], %t4; \
ldd [%src - (offset) - 0x08], %t6; \
st %t0, [%dst - (offset) - 0x20]; \
st %t1, [%dst - (offset) - 0x1c]; \
st %t2, [%dst - (offset) - 0x18]; \
st %t3, [%dst - (offset) - 0x14]; \
st %t4, [%dst - (offset) - 0x10]; \
st %t5, [%dst - (offset) - 0x0c]; \
st %t6, [%dst - (offset) - 0x08]; \
st %t7, [%dst - (offset) - 0x04];
#define RMOVE_BIGALIGNCHUNK(src, dst, offset, t0, t1, t2, t3, t4, t5, t6, t7) \
ldd [%src - (offset) - 0x20], %t0; \
ldd [%src - (offset) - 0x18], %t2; \
ldd [%src - (offset) - 0x10], %t4; \
ldd [%src - (offset) - 0x08], %t6; \
std %t0, [%dst - (offset) - 0x20]; \
std %t2, [%dst - (offset) - 0x18]; \
std %t4, [%dst - (offset) - 0x10]; \
std %t6, [%dst - (offset) - 0x08];
#define RMOVE_LASTCHUNK(src, dst, offset, t0, t1, t2, t3) \
ldd [%src + (offset) + 0x00], %t0; \
ldd [%src + (offset) + 0x08], %t2; \
st %t0, [%dst + (offset) + 0x00]; \
st %t1, [%dst + (offset) + 0x04]; \
st %t2, [%dst + (offset) + 0x08]; \
st %t3, [%dst + (offset) + 0x0c];
#define RMOVE_SHORTCHUNK(src, dst, offset, t0, t1) \
ldub [%src + (offset) + 0x00], %t0; \
ldub [%src + (offset) + 0x01], %t1; \
stb %t0, [%dst + (offset) + 0x00]; \
stb %t1, [%dst + (offset) + 0x01];
#define SMOVE_CHUNK(src, dst, offset, t0, t1, t2, t3, t4, t5, t6, prev, shil, shir, offset2) \
ldd [%src + (offset) + 0x00], %t0; \
ldd [%src + (offset) + 0x08], %t2; \
srl %t0, shir, %t5; \
srl %t1, shir, %t6; \
sll %t0, shil, %t0; \
or %t5, %prev, %t5; \
sll %t1, shil, %prev; \
or %t6, %t0, %t0; \
srl %t2, shir, %t1; \
srl %t3, shir, %t6; \
sll %t2, shil, %t2; \
or %t1, %prev, %t1; \
std %t4, [%dst + (offset) + (offset2) - 0x04]; \
std %t0, [%dst + (offset) + (offset2) + 0x04]; \
sll %t3, shil, %prev; \
or %t6, %t2, %t4;
#define SMOVE_ALIGNCHUNK(src, dst, offset, t0, t1, t2, t3, t4, t5, t6, prev, shil, shir, offset2) \
ldd [%src + (offset) + 0x00], %t0; \
ldd [%src + (offset) + 0x08], %t2; \
srl %t0, shir, %t4; \
srl %t1, shir, %t5; \
sll %t0, shil, %t6; \
or %t4, %prev, %t0; \
sll %t1, shil, %prev; \
or %t5, %t6, %t1; \
srl %t2, shir, %t4; \
srl %t3, shir, %t5; \
sll %t2, shil, %t6; \
or %t4, %prev, %t2; \
sll %t3, shil, %prev; \
or %t5, %t6, %t3; \
std %t0, [%dst + (offset) + (offset2) + 0x00]; \
std %t2, [%dst + (offset) + (offset2) + 0x08];
.text
.align 4
0:
retl
nop ! Only bcopy returns here and it returns void...
#ifdef __KERNEL__
FUNC(amemmove)
FUNC(__memmove)
#endif
FUNC(memmove)
cmp %o0, %o1
mov %o0, %g7
bleu 9f
sub %o0, %o1, %o4
add %o1, %o2, %o3
cmp %o3, %o0
bleu 0f
andcc %o4, 3, %o5
add %o1, %o2, %o1
add %o0, %o2, %o0
sub %o1, 1, %o1
sub %o0, 1, %o0
1: /* reverse_bytes */
ldub [%o1], %o4
subcc %o2, 1, %o2
stb %o4, [%o0]
sub %o1, 1, %o1
bne 1b
sub %o0, 1, %o0
retl
mov %g7, %o0
/* NOTE: This code is executed just for the cases
         where (%src (=%o1) & 3) != 0.
         We need to align it to 4. So, depending on (%src & 3):
         1: we need to do ldub, lduh
         2: lduh
         3: just ldub
         so even if it looks weird, the branches
         are correct here. -jj
 */
78: /* dword_align */
andcc %o1, 1, %g0
be 4f
andcc %o1, 2, %g0
ldub [%o1], %g2
add %o1, 1, %o1
stb %g2, [%o0]
sub %o2, 1, %o2
bne 3f
add %o0, 1, %o0
4:
lduh [%o1], %g2
add %o1, 2, %o1
sth %g2, [%o0]
sub %o2, 2, %o2
b 3f
add %o0, 2, %o0
FUNC(memcpy) /* %o0=dst %o1=src %o2=len */
sub %o0, %o1, %o4
mov %o0, %g7
9:
andcc %o4, 3, %o5
0:
bne 86f
cmp %o2, 15
bleu 90f
andcc %o1, 3, %g0
bne 78b
3:
andcc %o1, 4, %g0
be 2f
mov %o2, %g1
ld [%o1], %o4
sub %g1, 4, %g1
st %o4, [%o0]
add %o1, 4, %o1
add %o0, 4, %o0
2:
andcc %g1, 0xffffff80, %g0
be 3f
andcc %o0, 4, %g0
be 82f + 4
5:
MOVE_BIGCHUNK(o1, o0, 0x00, o2, o3, o4, o5, g2, g3, g4, g5)
MOVE_BIGCHUNK(o1, o0, 0x20, o2, o3, o4, o5, g2, g3, g4, g5)
MOVE_BIGCHUNK(o1, o0, 0x40, o2, o3, o4, o5, g2, g3, g4, g5)
MOVE_BIGCHUNK(o1, o0, 0x60, o2, o3, o4, o5, g2, g3, g4, g5)
sub %g1, 128, %g1
add %o1, 128, %o1
cmp %g1, 128
bge 5b
add %o0, 128, %o0
3:
andcc %g1, 0x70, %g4
be 80f
andcc %g1, 8, %g0
sethi %hi(80f), %o5
srl %g4, 1, %o4
add %g4, %o4, %o4
add %o1, %g4, %o1
sub %o5, %o4, %o5
jmpl %o5 + %lo(80f), %g0
add %o0, %g4, %o0
79: /* memcpy_table */
MOVE_LASTCHUNK(o1, o0, 0x60, g2, g3, g4, g5)
MOVE_LASTCHUNK(o1, o0, 0x50, g2, g3, g4, g5)
MOVE_LASTCHUNK(o1, o0, 0x40, g2, g3, g4, g5)
MOVE_LASTCHUNK(o1, o0, 0x30, g2, g3, g4, g5)
MOVE_LASTCHUNK(o1, o0, 0x20, g2, g3, g4, g5)
MOVE_LASTCHUNK(o1, o0, 0x10, g2, g3, g4, g5)
MOVE_LASTCHUNK(o1, o0, 0x00, g2, g3, g4, g5)
80: /* memcpy_table_end */
be 81f
andcc %g1, 4, %g0
ldd [%o1], %g2
add %o0, 8, %o0
st %g2, [%o0 - 0x08]
add %o1, 8, %o1
st %g3, [%o0 - 0x04]
81: /* memcpy_last7 */
be 1f
andcc %g1, 2, %g0
ld [%o1], %g2
add %o1, 4, %o1
st %g2, [%o0]
add %o0, 4, %o0
1:
be 1f
andcc %g1, 1, %g0
lduh [%o1], %g2
add %o1, 2, %o1
sth %g2, [%o0]
add %o0, 2, %o0
1:
be 1f
nop
ldub [%o1], %g2
stb %g2, [%o0]
1:
retl
mov %g7, %o0
82: /* ldd_std */
MOVE_BIGALIGNCHUNK(o1, o0, 0x00, o2, o3, o4, o5, g2, g3, g4, g5)
MOVE_BIGALIGNCHUNK(o1, o0, 0x20, o2, o3, o4, o5, g2, g3, g4, g5)
MOVE_BIGALIGNCHUNK(o1, o0, 0x40, o2, o3, o4, o5, g2, g3, g4, g5)
MOVE_BIGALIGNCHUNK(o1, o0, 0x60, o2, o3, o4, o5, g2, g3, g4, g5)
subcc %g1, 128, %g1
add %o1, 128, %o1
cmp %g1, 128
bge 82b
add %o0, 128, %o0
andcc %g1, 0x70, %g4
be 84f
andcc %g1, 8, %g0
sethi %hi(84f), %o5
add %o1, %g4, %o1
sub %o5, %g4, %o5
jmpl %o5 + %lo(84f), %g0
add %o0, %g4, %o0
83: /* amemcpy_table */
MOVE_LASTALIGNCHUNK(o1, o0, 0x60, g2, g3, g4, g5)
MOVE_LASTALIGNCHUNK(o1, o0, 0x50, g2, g3, g4, g5)
MOVE_LASTALIGNCHUNK(o1, o0, 0x40, g2, g3, g4, g5)
MOVE_LASTALIGNCHUNK(o1, o0, 0x30, g2, g3, g4, g5)
MOVE_LASTALIGNCHUNK(o1, o0, 0x20, g2, g3, g4, g5)
MOVE_LASTALIGNCHUNK(o1, o0, 0x10, g2, g3, g4, g5)
MOVE_LASTALIGNCHUNK(o1, o0, 0x00, g2, g3, g4, g5)
84: /* amemcpy_table_end */
be 85f
andcc %g1, 4, %g0
ldd [%o1], %g2
add %o0, 8, %o0
std %g2, [%o0 - 0x08]
add %o1, 8, %o1
85: /* amemcpy_last7 */
be 1f
andcc %g1, 2, %g0
ld [%o1], %g2
add %o1, 4, %o1
st %g2, [%o0]
add %o0, 4, %o0
1:
be 1f
andcc %g1, 1, %g0
lduh [%o1], %g2
add %o1, 2, %o1
sth %g2, [%o0]
add %o0, 2, %o0
1:
be 1f
nop
ldub [%o1], %g2
stb %g2, [%o0]
1:
retl
mov %g7, %o0
86: /* non_aligned */
cmp %o2, 6
bleu 88f
nop
save %sp, -96, %sp
andcc %i0, 3, %g0
be 61f
andcc %i0, 1, %g0
be 60f
andcc %i0, 2, %g0
ldub [%i1], %g5
add %i1, 1, %i1
stb %g5, [%i0]
sub %i2, 1, %i2
bne 61f
add %i0, 1, %i0
60:
ldub [%i1], %g3
add %i1, 2, %i1
stb %g3, [%i0]
sub %i2, 2, %i2
ldub [%i1 - 1], %g3
add %i0, 2, %i0
stb %g3, [%i0 - 1]
61:
and %i1, 3, %g2
and %i2, 0xc, %g3
and %i1, -4, %i1
cmp %g3, 4
sll %g2, 3, %g4
mov 32, %g2
be 4f
sub %g2, %g4, %l0
blu 3f
cmp %g3, 0x8
be 2f
srl %i2, 2, %g3
ld [%i1], %i3
add %i0, -8, %i0
ld [%i1 + 4], %i4
b 8f
add %g3, 1, %g3
2:
ld [%i1], %i4
add %i0, -12, %i0
ld [%i1 + 4], %i5
add %g3, 2, %g3
b 9f
add %i1, -4, %i1
3:
ld [%i1], %g1
add %i0, -4, %i0
ld [%i1 + 4], %i3
srl %i2, 2, %g3
b 7f
add %i1, 4, %i1
4:
ld [%i1], %i5
cmp %i2, 7
ld [%i1 + 4], %g1
srl %i2, 2, %g3
bleu 10f
add %i1, 8, %i1
ld [%i1], %i3
add %g3, -1, %g3
5:
sll %i5, %g4, %g2
srl %g1, %l0, %g5
or %g2, %g5, %g2
st %g2, [%i0]
7:
ld [%i1 + 4], %i4
sll %g1, %g4, %g2
srl %i3, %l0, %g5
or %g2, %g5, %g2
st %g2, [%i0 + 4]
8:
ld [%i1 + 8], %i5
sll %i3, %g4, %g2
srl %i4, %l0, %g5
or %g2, %g5, %g2
st %g2, [%i0 + 8]
9:
ld [%i1 + 12], %g1
sll %i4, %g4, %g2
srl %i5, %l0, %g5
addcc %g3, -4, %g3
or %g2, %g5, %g2
add %i1, 16, %i1
st %g2, [%i0 + 12]
add %i0, 16, %i0
bne,a 5b
ld [%i1], %i3
10:
sll %i5, %g4, %g2
srl %g1, %l0, %g5
srl %l0, 3, %g3
or %g2, %g5, %g2
sub %i1, %g3, %i1
andcc %i2, 2, %g0
st %g2, [%i0]
be 1f
andcc %i2, 1, %g0
ldub [%i1], %g2
add %i1, 2, %i1
stb %g2, [%i0 + 4]
add %i0, 2, %i0
ldub [%i1 - 1], %g2
stb %g2, [%i0 + 3]
1:
be 1f
nop
ldub [%i1], %g2
stb %g2, [%i0 + 4]
1:
ret
restore %g7, %g0, %o0
88: /* short_end */
and %o2, 0xe, %o3
20:
sethi %hi(89f), %o5
sll %o3, 3, %o4
add %o0, %o3, %o0
sub %o5, %o4, %o5
add %o1, %o3, %o1
jmpl %o5 + %lo(89f), %g0
andcc %o2, 1, %g0
MOVE_SHORTCHUNK(o1, o0, 0x0c, g2, g3)
MOVE_SHORTCHUNK(o1, o0, 0x0a, g2, g3)
MOVE_SHORTCHUNK(o1, o0, 0x08, g2, g3)
MOVE_SHORTCHUNK(o1, o0, 0x06, g2, g3)
MOVE_SHORTCHUNK(o1, o0, 0x04, g2, g3)
MOVE_SHORTCHUNK(o1, o0, 0x02, g2, g3)
MOVE_SHORTCHUNK(o1, o0, 0x00, g2, g3)
89: /* short_table_end */
be 1f
nop
ldub [%o1], %g2
stb %g2, [%o0]
1:
retl
mov %g7, %o0
90: /* short_aligned_end */
bne 88b
andcc %o2, 8, %g0
be 1f
andcc %o2, 4, %g0
ld [%o1 + 0x00], %g2
ld [%o1 + 0x04], %g3
add %o1, 8, %o1
st %g2, [%o0 + 0x00]
st %g3, [%o0 + 0x04]
add %o0, 8, %o0
1:
b 81b
mov %o2, %g1

59
arch/sparc/lib/memmove.S Normal file
View file

@ -0,0 +1,59 @@
/* memmove.S: Simple memmove implementation.
*
* Copyright (C) 1997, 2004 David S. Miller (davem@redhat.com)
* Copyright (C) 1996, 1997, 1998, 1999 Jakub Jelinek (jj@ultra.linux.cz)
*/
#include <linux/linkage.h>
.text
ENTRY(memmove) /* o0=dst o1=src o2=len */
brz,pn %o2, 99f
mov %o0, %g1
cmp %o0, %o1
bleu,pt %xcc, 2f
add %o1, %o2, %g7
cmp %g7, %o0
bleu,pt %xcc, memcpy
add %o0, %o2, %o5
sub %g7, 1, %o1
sub %o5, 1, %o0
1: ldub [%o1], %g7
subcc %o2, 1, %o2
sub %o1, 1, %o1
stb %g7, [%o0]
bne,pt %icc, 1b
sub %o0, 1, %o0
99:
retl
mov %g1, %o0
/* We can't just call memcpy for these memmove cases. On some
* chips the memcpy uses cache initializing stores and when dst
* and src are close enough, those can clobber the source data
* before we've loaded it in.
*/
2: or %o0, %o1, %g7
or %o2, %g7, %g7
andcc %g7, 0x7, %g0
bne,pn %xcc, 4f
nop
3: ldx [%o1], %g7
add %o1, 8, %o1
subcc %o2, 8, %o2
add %o0, 8, %o0
bne,pt %icc, 3b
stx %g7, [%o0 - 0x8]
ba,a,pt %xcc, 99b
4: ldub [%o1], %g7
add %o1, 1, %o1
subcc %o2, 1, %o2
add %o0, 1, %o0
bne,pt %icc, 4b
stb %g7, [%o0 - 0x1]
ba,a,pt %xcc, 99b
ENDPROC(memmove)

133
arch/sparc/lib/memscan_32.S Normal file
View file

@ -0,0 +1,133 @@
/*
* memscan.S: Optimized memscan for the Sparc.
*
* Copyright (C) 1996 David S. Miller (davem@caip.rutgers.edu)
*/
/* In essence, this is just a fancy strlen. */
#define LO_MAGIC 0x01010101
#define HI_MAGIC 0x80808080
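/* If a word contains a zero byte, ((word - LO_MAGIC) & HI_MAGIC) is non-zero.
 * The test can also fire for bytes with the high bit set, which is why every
 * hit is followed by a byte-by-byte check below. */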
.text
.align 4
.globl __memscan_zero, __memscan_generic
.globl memscan
__memscan_zero:
/* %o0 = addr, %o1 = size */
cmp %o1, 0
bne,a 1f
andcc %o0, 3, %g0
retl
nop
1:
be mzero_scan_word
sethi %hi(HI_MAGIC), %g2
ldsb [%o0], %g3
mzero_still_not_word_aligned:
cmp %g3, 0
bne 1f
add %o0, 1, %o0
retl
sub %o0, 1, %o0
1:
subcc %o1, 1, %o1
bne,a 1f
andcc %o0, 3, %g0
retl
nop
1:
bne,a mzero_still_not_word_aligned
ldsb [%o0], %g3
sethi %hi(HI_MAGIC), %g2
mzero_scan_word:
or %g2, %lo(HI_MAGIC), %o3
sethi %hi(LO_MAGIC), %g3
or %g3, %lo(LO_MAGIC), %o2
mzero_next_word:
ld [%o0], %g2
mzero_next_word_preloaded:
sub %g2, %o2, %g2
mzero_next_word_preloaded_next:
andcc %g2, %o3, %g0
bne mzero_byte_zero
add %o0, 4, %o0
mzero_check_out_of_fuel:
subcc %o1, 4, %o1
bg,a 1f
ld [%o0], %g2
retl
nop
1:
b mzero_next_word_preloaded_next
sub %g2, %o2, %g2
/* Check every byte. */
mzero_byte_zero:
ldsb [%o0 - 4], %g2
cmp %g2, 0
bne mzero_byte_one
sub %o0, 4, %g3
retl
mov %g3, %o0
mzero_byte_one:
ldsb [%o0 - 3], %g2
cmp %g2, 0
bne,a mzero_byte_two_and_three
ldsb [%o0 - 2], %g2
retl
sub %o0, 3, %o0
mzero_byte_two_and_three:
cmp %g2, 0
bne,a 1f
ldsb [%o0 - 1], %g2
retl
sub %o0, 2, %o0
1:
cmp %g2, 0
bne,a mzero_next_word_preloaded
ld [%o0], %g2
retl
sub %o0, 1, %o0
mzero_found_it:
retl
sub %o0, 2, %o0
memscan:
__memscan_generic:
/* %o0 = addr, %o1 = c, %o2 = size */
cmp %o2, 0
bne,a 0f
ldub [%o0], %g2
b,a 2f
1:
ldub [%o0], %g2
0:
cmp %g2, %o1
be 2f
addcc %o2, -1, %o2
bne 1b
add %o0, 1, %o0
2:
retl
nop

129
arch/sparc/lib/memscan_64.S Normal file
View file

@ -0,0 +1,129 @@
/*
* memscan.S: Optimized memscan for Sparc64.
*
* Copyright (C) 1997,1998 Jakub Jelinek (jj@ultra.linux.cz)
* Copyright (C) 1998 David S. Miller (davem@redhat.com)
*/
#define HI_MAGIC 0x8080808080808080
#define LO_MAGIC 0x0101010101010101
#define ASI_PL 0x88
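/* ASI_PL is the primary address space, little-endian: the lowest-addressed
 * byte of the loaded word ends up in the least significant position, so the
 * byte checks below can walk it with simple shifts. */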
.text
.align 32
.globl __memscan_zero, __memscan_generic
.globl memscan
__memscan_zero:
/* %o0 = bufp, %o1 = size */
brlez,pn %o1, szzero
andcc %o0, 7, %g0
be,pt %icc, we_are_aligned
sethi %hi(HI_MAGIC), %o4
ldub [%o0], %o5
1: subcc %o1, 1, %o1
brz,pn %o5, 10f
add %o0, 1, %o0
be,pn %xcc, szzero
andcc %o0, 7, %g0
bne,a,pn %icc, 1b
ldub [%o0], %o5
we_are_aligned:
ldxa [%o0] ASI_PL, %o5
or %o4, %lo(HI_MAGIC), %o3
sllx %o3, 32, %o4
or %o4, %o3, %o3
srlx %o3, 7, %o2
msloop:
sub %o1, 8, %o1
add %o0, 8, %o0
sub %o5, %o2, %o4
xor %o4, %o5, %o4
andcc %o4, %o3, %g3
bne,pn %xcc, check_bytes
srlx %o4, 32, %g3
brgz,a,pt %o1, msloop
ldxa [%o0] ASI_PL, %o5
check_bytes:
bne,a,pn %icc, 2f
andcc %o5, 0xff, %g0
add %o0, -5, %g2
ba,pt %xcc, 3f
srlx %o5, 32, %g7
2: srlx %o5, 8, %g7
be,pn %icc, 1f
add %o0, -8, %g2
andcc %g7, 0xff, %g0
srlx %g7, 8, %g7
be,pn %icc, 1f
inc %g2
andcc %g7, 0xff, %g0
srlx %g7, 8, %g7
be,pn %icc, 1f
inc %g2
andcc %g7, 0xff, %g0
srlx %g7, 8, %g7
be,pn %icc, 1f
inc %g2
andcc %g3, %o3, %g0
be,a,pn %icc, 2f
mov %o0, %g2
3: andcc %g7, 0xff, %g0
srlx %g7, 8, %g7
be,pn %icc, 1f
inc %g2
andcc %g7, 0xff, %g0
srlx %g7, 8, %g7
be,pn %icc, 1f
inc %g2
andcc %g7, 0xff, %g0
srlx %g7, 8, %g7
be,pn %icc, 1f
inc %g2
andcc %g7, 0xff, %g0
srlx %g7, 8, %g7
be,pn %icc, 1f
inc %g2
2: brgz,a,pt %o1, msloop
ldxa [%o0] ASI_PL, %o5
inc %g2
1: add %o0, %o1, %o0
cmp %g2, %o0
retl
movle %xcc, %g2, %o0
10: retl
sub %o0, 1, %o0
szzero: retl
nop
memscan:
__memscan_generic:
/* %o0 = addr, %o1 = c, %o2 = size */
brz,pn %o2, 3f
add %o0, %o2, %o3
ldub [%o0], %o5
sub %g0, %o2, %o4
1:
cmp %o5, %o1
be,pn %icc, 2f
addcc %o4, 1, %o4
bne,a,pt %xcc, 1b
ldub [%o3 + %o4], %o5
retl
/* The delay slot is the same as the next insn, this is just to make it look more awful */
2:
add %o3, %o4, %o0
retl
sub %o0, 1, %o0
3:
retl
nop

212
arch/sparc/lib/memset.S Normal file
View file

@ -0,0 +1,212 @@
/* linux/arch/sparc/lib/memset.S: Sparc optimized memset, bzero and clear_user code
* Copyright (C) 1991,1996 Free Software Foundation
* Copyright (C) 1996,1997 Jakub Jelinek (jj@sunsite.mff.cuni.cz)
* Copyright (C) 1996 David S. Miller (davem@caip.rutgers.edu)
*
 * Calls to memset return the initial %o0. Calls to bzero return 0 if ok, or
 * the number of bytes not yet set if an exception occurs and we were called
 * as clear_user.
*/
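/* For example, a clear_user() of 100 bytes that faults with 40 bytes still
 * unwritten returns 40. */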
#include <asm/ptrace.h>
/* Work around cpp -rob */
#define ALLOC #alloc
#define EXECINSTR #execinstr
#define EX(x,y,a,b) \
98: x,y; \
.section .fixup,ALLOC,EXECINSTR; \
.align 4; \
99: ba 30f; \
a, b, %o0; \
.section __ex_table,ALLOC; \
.align 4; \
.word 98b, 99b; \
.text; \
.align 4
#define EXT(start,end,handler) \
.section __ex_table,ALLOC; \
.align 4; \
.word start, 0, end, handler; \
.text; \
.align 4
/* Please don't change these macros, unless you change the logic
* in the .fixup section below as well.
* Store 64 bytes at (BASE + OFFSET) using value SOURCE. */
#define ZERO_BIG_BLOCK(base, offset, source) \
std source, [base + offset + 0x00]; \
std source, [base + offset + 0x08]; \
std source, [base + offset + 0x10]; \
std source, [base + offset + 0x18]; \
std source, [base + offset + 0x20]; \
std source, [base + offset + 0x28]; \
std source, [base + offset + 0x30]; \
std source, [base + offset + 0x38];
#define ZERO_LAST_BLOCKS(base, offset, source) \
std source, [base - offset - 0x38]; \
std source, [base - offset - 0x30]; \
std source, [base - offset - 0x28]; \
std source, [base - offset - 0x20]; \
std source, [base - offset - 0x18]; \
std source, [base - offset - 0x10]; \
std source, [base - offset - 0x08]; \
std source, [base - offset - 0x00];
.text
.align 4
.globl __bzero_begin
__bzero_begin:
.globl __bzero
.globl memset
.globl __memset_start, __memset_end
__memset_start:
memset:
mov %o0, %g1
mov 1, %g4
and %o1, 0xff, %g3
sll %g3, 8, %g2
or %g3, %g2, %g3
sll %g3, 16, %g2
or %g3, %g2, %g3
b 1f
mov %o2, %o1
3:
cmp %o2, 3
be 2f
EX(stb %g3, [%o0], sub %o1, 0)
cmp %o2, 2
be 2f
EX(stb %g3, [%o0 + 0x01], sub %o1, 1)
EX(stb %g3, [%o0 + 0x02], sub %o1, 2)
2:
sub %o2, 4, %o2
add %o1, %o2, %o1
b 4f
sub %o0, %o2, %o0
__bzero:
clr %g4
mov %g0, %g3
1:
cmp %o1, 7
bleu 7f
andcc %o0, 3, %o2
bne 3b
4:
andcc %o0, 4, %g0
be 2f
mov %g3, %g2
EX(st %g3, [%o0], sub %o1, 0)
sub %o1, 4, %o1
add %o0, 4, %o0
2:
andcc %o1, 0xffffff80, %o3 ! Now everything is 8-byte aligned and %o1 is the length left to run
be 9f
andcc %o1, 0x78, %o2
10:
ZERO_BIG_BLOCK(%o0, 0x00, %g2)
subcc %o3, 128, %o3
ZERO_BIG_BLOCK(%o0, 0x40, %g2)
11:
EXT(10b, 11b, 20f)
bne 10b
add %o0, 128, %o0
orcc %o2, %g0, %g0
9:
be 13f
andcc %o1, 7, %o1
srl %o2, 1, %o3
set 13f, %o4
sub %o4, %o3, %o4
jmp %o4
add %o0, %o2, %o0
12:
ZERO_LAST_BLOCKS(%o0, 0x48, %g2)
ZERO_LAST_BLOCKS(%o0, 0x08, %g2)
13:
be 8f
andcc %o1, 4, %g0
be 1f
andcc %o1, 2, %g0
EX(st %g3, [%o0], and %o1, 7)
add %o0, 4, %o0
1:
be 1f
andcc %o1, 1, %g0
EX(sth %g3, [%o0], and %o1, 3)
add %o0, 2, %o0
1:
bne,a 8f
EX(stb %g3, [%o0], and %o1, 1)
8:
b 0f
nop
7:
be 13b
orcc %o1, 0, %g0
be 0f
8:
add %o0, 1, %o0
subcc %o1, 1, %o1
bne 8b
EX(stb %g3, [%o0 - 1], add %o1, 1)
0:
andcc %g4, 1, %g0
be 5f
nop
retl
mov %g1, %o0
5:
retl
clr %o0
__memset_end:
.section .fixup,#alloc,#execinstr
.align 4
20:
cmp %g2, 8
bleu 1f
and %o1, 0x7f, %o1
sub %g2, 9, %g2
add %o3, 64, %o3
1:
sll %g2, 3, %g2
add %o3, %o1, %o0
b 30f
sub %o0, %g2, %o0
21:
mov 8, %o0
and %o1, 7, %o1
sub %o0, %g2, %o0
sll %o0, 3, %o0
b 30f
add %o0, %o1, %o0
30:
/* %o4 is faulting address, %o5 is %pc where fault occurred */
save %sp, -104, %sp
mov %i5, %o0
mov %i7, %o1
call lookup_fault
mov %i4, %o2
ret
restore
.globl __bzero_end
__bzero_end:

76
arch/sparc/lib/muldi3.S Normal file
View file

@ -0,0 +1,76 @@
/* Copyright (C) 1989, 1992, 1993, 1994, 1995 Free Software Foundation, Inc.
This file is part of GNU CC.
GNU CC is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.
GNU CC is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with GNU CC; see the file COPYING. If not, write to
the Free Software Foundation, 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA. */
.text
.align 4
.globl __muldi3
__muldi3:
save %sp, -104, %sp
wr %g0, %i1, %y
sra %i3, 0x1f, %g2
and %i1, %g2, %g2
andcc %g0, 0, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, %i3, %g1
mulscc %g1, 0, %g1
add %g1, %g2, %l2
rd %y, %o1
mov %o1, %l3
mov %i1, %o0
mov %i2, %o1
umul %o0, %o1, %o0
mov %o0, %l0
mov %i0, %o0
mov %i3, %o1
umul %o0, %o1, %o0
add %l0, %o0, %l0
mov %l2, %i0
add %l2, %l0, %i0
ret
restore %g0, %l3, %o1

80
arch/sparc/lib/strlen.S Normal file
View file

@ -0,0 +1,80 @@
/* strlen.S: Sparc optimized strlen code
* Hand optimized from GNU libc's strlen
* Copyright (C) 1991,1996 Free Software Foundation
* Copyright (C) 1996,2008 David S. Miller (davem@davemloft.net)
* Copyright (C) 1996, 1997 Jakub Jelinek (jj@sunsite.mff.cuni.cz)
*/
#include <linux/linkage.h>
#include <asm/asm.h>
#define LO_MAGIC 0x01010101
#define HI_MAGIC 0x80808080
.text
ENTRY(strlen)
mov %o0, %o1
andcc %o0, 3, %g0
BRANCH32(be, pt, 9f)
sethi %hi(HI_MAGIC), %o4
ldub [%o0], %o5
BRANCH_REG_ZERO(pn, %o5, 11f)
add %o0, 1, %o0
andcc %o0, 3, %g0
BRANCH32(be, pn, 4f)
or %o4, %lo(HI_MAGIC), %o3
ldub [%o0], %o5
BRANCH_REG_ZERO(pn, %o5, 12f)
add %o0, 1, %o0
andcc %o0, 3, %g0
BRANCH32(be, pt, 5f)
sethi %hi(LO_MAGIC), %o4
ldub [%o0], %o5
BRANCH_REG_ZERO(pn, %o5, 13f)
add %o0, 1, %o0
BRANCH32(ba, pt, 8f)
or %o4, %lo(LO_MAGIC), %o2
9:
or %o4, %lo(HI_MAGIC), %o3
4:
sethi %hi(LO_MAGIC), %o4
5:
or %o4, %lo(LO_MAGIC), %o2
8:
ld [%o0], %o5
2:
sub %o5, %o2, %o4
andcc %o4, %o3, %g0
BRANCH32(be, pt, 8b)
add %o0, 4, %o0
/* Check every byte. */
srl %o5, 24, %g7
andcc %g7, 0xff, %g0
BRANCH32(be, pn, 1f)
add %o0, -4, %o4
srl %o5, 16, %g7
andcc %g7, 0xff, %g0
BRANCH32(be, pn, 1f)
add %o4, 1, %o4
srl %o5, 8, %g7
andcc %g7, 0xff, %g0
BRANCH32(be, pn, 1f)
add %o4, 1, %o4
andcc %o5, 0xff, %g0
BRANCH32_ANNUL(bne, pt, 2b)
ld [%o0], %o5
add %o4, 1, %o4
1:
retl
sub %o4, %o1, %o0
11:
retl
mov 0, %o0
12:
retl
mov 1, %o0
13:
retl
mov 2, %o0
ENDPROC(strlen)

118
arch/sparc/lib/strncmp_32.S Normal file
View file

@ -0,0 +1,118 @@
/*
* strncmp.S: Hand optimized Sparc assembly of GCC output from GNU libc
* generic strncmp routine.
*/
#include <linux/linkage.h>
.text
ENTRY(strncmp)
mov %o0, %g3
mov 0, %o3
cmp %o2, 3
ble 7f
mov 0, %g2
sra %o2, 2, %o4
ldub [%g3], %o3
0:
ldub [%o1], %g2
add %g3, 1, %g3
and %o3, 0xff, %o0
cmp %o0, 0
be 8f
add %o1, 1, %o1
cmp %o0, %g2
be,a 1f
ldub [%g3], %o3
retl
sub %o0, %g2, %o0
1:
ldub [%o1], %g2
add %g3,1, %g3
and %o3, 0xff, %o0
cmp %o0, 0
be 8f
add %o1, 1, %o1
cmp %o0, %g2
be,a 1f
ldub [%g3], %o3
retl
sub %o0, %g2, %o0
1:
ldub [%o1], %g2
add %g3, 1, %g3
and %o3, 0xff, %o0
cmp %o0, 0
be 8f
add %o1, 1, %o1
cmp %o0, %g2
be,a 1f
ldub [%g3], %o3
retl
sub %o0, %g2, %o0
1:
ldub [%o1], %g2
add %g3, 1, %g3
and %o3, 0xff, %o0
cmp %o0, 0
be 8f
add %o1, 1, %o1
cmp %o0, %g2
be 1f
add %o4, -1, %o4
retl
sub %o0, %g2, %o0
1:
cmp %o4, 0
bg,a 0b
ldub [%g3], %o3
b 7f
and %o2, 3, %o2
9:
ldub [%o1], %g2
add %g3, 1, %g3
and %o3, 0xff, %o0
cmp %o0, 0
be 8f
add %o1, 1, %o1
cmp %o0, %g2
be 7f
add %o2, -1, %o2
8:
retl
sub %o0, %g2, %o0
7:
cmp %o2, 0
bg,a 9b
ldub [%g3], %o3
and %g2, 0xff, %o0
retl
sub %o3, %o0, %o0
ENDPROC(strncmp)

30
arch/sparc/lib/strncmp_64.S Normal file
View file

@ -0,0 +1,30 @@
/*
* Sparc64 optimized strncmp code.
*
* Copyright (C) 1997 Jakub Jelinek (jj@sunsite.mff.cuni.cz)
*/
#include <linux/linkage.h>
#include <asm/asi.h>
.text
ENTRY(strncmp)
brlez,pn %o2, 3f
lduba [%o0] (ASI_PNF), %o3
1:
add %o0, 1, %o0
ldub [%o1], %o4
brz,pn %o3, 2f
add %o1, 1, %o1
cmp %o3, %o4
bne,pn %icc, 2f
subcc %o2, 1, %o2
bne,a,pt %xcc, 1b
ldub [%o0], %o3
2:
retl
sub %o3, %o4, %o0
3:
retl
clr %o0
ENDPROC(strncmp)

19
arch/sparc/lib/ucmpdi2.c Normal file
View file

@ -0,0 +1,19 @@
#include <linux/module.h>
#include "libgcc.h"
word_type __ucmpdi2(unsigned long long a, unsigned long long b)
{
const DWunion au = {.ll = a};
const DWunion bu = {.ll = b};
if ((unsigned int) au.s.high < (unsigned int) bu.s.high)
return 0;
else if ((unsigned int) au.s.high > (unsigned int) bu.s.high)
return 2;
if ((unsigned int) au.s.low < (unsigned int) bu.s.low)
return 0;
else if ((unsigned int) au.s.low > (unsigned int) bu.s.low)
return 2;
return 1;
}
EXPORT_SYMBOL(__ucmpdi2);

259
arch/sparc/lib/udivdi3.S Normal file
View file

@ -0,0 +1,259 @@
/* Copyright (C) 1989, 1992, 1993, 1994, 1995 Free Software Foundation, Inc.
This file is part of GNU CC.
GNU CC is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.
GNU CC is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with GNU CC; see the file COPYING. If not, write to
the Free Software Foundation, 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA. */
.text
.align 4
.globl __udivdi3
__udivdi3:
save %sp,-104,%sp
mov %i3,%o3
cmp %i2,0
bne .LL40
mov %i1,%i3
cmp %o3,%i0
bleu .LL41
mov %i3,%o1
! Inlined udiv_qrnnd
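! This is a 32-step shift-and-subtract divide of the two-word value n1:n0
! by a single-word divisor, producing one (complemented) quotient bit per
! step; the final xnor undoes the complement.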
mov 32,%g1
subcc %i0,%o3,%g0
1: bcs 5f
addxcc %o1,%o1,%o1 ! shift n1n0 and a q-bit in lsb
sub %i0,%o3,%i0 ! this kills msb of n
addx %i0,%i0,%i0 ! so this cannot give carry
subcc %g1,1,%g1
2: bne 1b
subcc %i0,%o3,%g0
bcs 3f
addxcc %o1,%o1,%o1 ! shift n1n0 and a q-bit in lsb
b 3f
sub %i0,%o3,%i0 ! this kills msb of n
4: sub %i0,%o3,%i0
5: addxcc %i0,%i0,%i0
bcc 2b
subcc %g1,1,%g1
! Got carry from n. Subtract next step to cancel this carry.
bne 4b
addcc %o1,%o1,%o1 ! shift n1n0 and a 0-bit in lsb
sub %i0,%o3,%i0
3: xnor %o1,0,%o1
! End of inline udiv_qrnnd
b .LL45
mov 0,%o2
.LL41:
cmp %o3,0
bne .LL77
mov %i0,%o2
mov 1,%o0
mov 0,%o1
wr %g0, 0, %y
udiv %o0, %o1, %o0
mov %o0,%o3
mov %i0,%o2
.LL77:
mov 0,%o4
! Inlined udiv_qrnnd
mov 32,%g1
subcc %o4,%o3,%g0
1: bcs 5f
addxcc %o2,%o2,%o2 ! shift n1n0 and a q-bit in lsb
sub %o4,%o3,%o4 ! this kills msb of n
addx %o4,%o4,%o4 ! so this cannot give carry
subcc %g1,1,%g1
2: bne 1b
subcc %o4,%o3,%g0
bcs 3f
addxcc %o2,%o2,%o2 ! shift n1n0 and a q-bit in lsb
b 3f
sub %o4,%o3,%o4 ! this kills msb of n
4: sub %o4,%o3,%o4
5: addxcc %o4,%o4,%o4
bcc 2b
subcc %g1,1,%g1
! Got carry from n. Subtract next step to cancel this carry.
bne 4b
addcc %o2,%o2,%o2 ! shift n1n0 and a 0-bit in lsb
sub %o4,%o3,%o4
3: xnor %o2,0,%o2
! End of inline udiv_qrnnd
mov %o4,%i0
mov %i3,%o1
! Inlined udiv_qrnnd
mov 32,%g1
subcc %i0,%o3,%g0
1: bcs 5f
addxcc %o1,%o1,%o1 ! shift n1n0 and a q-bit in lsb
sub %i0,%o3,%i0 ! this kills msb of n
addx %i0,%i0,%i0 ! so this cannot give carry
subcc %g1,1,%g1
2: bne 1b
subcc %i0,%o3,%g0
bcs 3f
addxcc %o1,%o1,%o1 ! shift n1n0 and a q-bit in lsb
b 3f
sub %i0,%o3,%i0 ! this kills msb of n
4: sub %i0,%o3,%i0
5: addxcc %i0,%i0,%i0
bcc 2b
subcc %g1,1,%g1
! Got carry from n. Subtract next step to cancel this carry.
bne 4b
addcc %o1,%o1,%o1 ! shift n1n0 and a 0-bit in lsb
sub %i0,%o3,%i0
3: xnor %o1,0,%o1
! End of inline udiv_qrnnd
b .LL78
mov %o1,%l1
.LL40:
cmp %i2,%i0
bleu .LL46
sethi %hi(65535),%o0
b .LL73
mov 0,%o1
.LL46:
or %o0,%lo(65535),%o0
cmp %i2,%o0
bgu .LL53
mov %i2,%o1
cmp %i2,256
addx %g0,-1,%o0
b .LL59
and %o0,8,%o2
.LL53:
sethi %hi(16777215),%o0
or %o0,%lo(16777215),%o0
cmp %o1,%o0
bgu .LL59
mov 24,%o2
mov 16,%o2
.LL59:
srl %o1,%o2,%o1
sethi %hi(__clz_tab),%o0
or %o0,%lo(__clz_tab),%o0
ldub [%o1+%o0],%o0
add %o0,%o2,%o0
mov 32,%o1
subcc %o1,%o0,%o2
bne,a .LL67
mov 32,%o0
cmp %i0,%i2
bgu .LL69
cmp %i3,%o3
blu .LL73
mov 0,%o1
.LL69:
b .LL73
mov 1,%o1
.LL67:
sub %o0,%o2,%o0
sll %i2,%o2,%i2
srl %o3,%o0,%o1
or %i2,%o1,%i2
sll %o3,%o2,%o3
srl %i0,%o0,%o1
sll %i0,%o2,%i0
srl %i3,%o0,%o0
or %i0,%o0,%i0
sll %i3,%o2,%i3
mov %i0,%o5
mov %o1,%o4
! Inlined udiv_qrnnd
mov 32,%g1
subcc %o4,%i2,%g0
1: bcs 5f
addxcc %o5,%o5,%o5 ! shift n1n0 and a q-bit in lsb
sub %o4,%i2,%o4 ! this kills msb of n
addx %o4,%o4,%o4 ! so this cannot give carry
subcc %g1,1,%g1
2: bne 1b
subcc %o4,%i2,%g0
bcs 3f
addxcc %o5,%o5,%o5 ! shift n1n0 and a q-bit in lsb
b 3f
sub %o4,%i2,%o4 ! this kills msb of n
4: sub %o4,%i2,%o4
5: addxcc %o4,%o4,%o4
bcc 2b
subcc %g1,1,%g1
! Got carry from n. Subtract next step to cancel this carry.
bne 4b
addcc %o5,%o5,%o5 ! shift n1n0 and a 0-bit in lsb
sub %o4,%i2,%o4
3: xnor %o5,0,%o5
! End of inline udiv_qrnnd
mov %o4,%i0
mov %o5,%o1
! Inlined umul_ppmm
wr %g0,%o1,%y ! SPARC has 0-3 delay insn after a wr
sra %o3,31,%g2 ! Do not move this insn
and %o1,%g2,%g2 ! Do not move this insn
andcc %g0,0,%g1 ! Do not move this insn
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,%o3,%g1
mulscc %g1,0,%g1
add %g1,%g2,%o0
rd %y,%o2
cmp %o0,%i0
bgu,a .LL73
add %o1,-1,%o1
bne,a .LL45
mov 0,%o2
cmp %o2,%i3
bleu .LL45
mov 0,%o2
add %o1,-1,%o1
.LL73:
mov 0,%o2
.LL45:
mov %o1,%l1
.LL78:
mov %o2,%l0
mov %l0,%i0
mov %l1,%i1
ret
restore
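For readers tracing the inlined udiv_qrnnd / umul_ppmm sequences above, here is a standalone C sketch (not the kernel's code, and deliberately ignoring the 32-bit word splitting and delay-slot scheduling) that computes the same quotient with a plain restoring shift-and-subtract loop; the function name is made up for illustration:

#include <stdint.h>

/* Behavioural sketch of __udivdi3: unsigned 64-by-64 division done one
 * quotient bit at a time. Division by zero is left undefined, as it is
 * for the assembly version. */
static uint64_t udivdi3_sketch(uint64_t n, uint64_t d)
{
	uint64_t q = 0, r = 0;
	int i;

	for (i = 63; i >= 0; i--) {
		r = (r << 1) | ((n >> i) & 1);	/* bring down the next bit of n */
		if (r >= d) {			/* the divisor fits: subtract it */
			r -= d;
			q |= 1ULL << i;		/* and record a quotient bit */
		}
	}
	return q;
}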

71
arch/sparc/lib/user_fixup.c Normal file
View file

@ -0,0 +1,71 @@
/* user_fixup.c: Fix up user copy faults.
 *
 * Copyright (C) 2004 David S. Miller <davem@redhat.com>
 */
#include <linux/compiler.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/errno.h>
#include <linux/module.h>

#include <asm/uaccess.h>

/* Calculating the exact fault address when using
 * block loads and stores can be very complicated.
 *
 * Instead of trying to be clever and handling all
 * of the cases, just fix things up simply here.
 */
static unsigned long compute_size(unsigned long start, unsigned long size, unsigned long *offset)
{
	unsigned long fault_addr = current_thread_info()->fault_address;
	unsigned long end = start + size;

	if (fault_addr < start || fault_addr >= end) {
		*offset = 0;
	} else {
		*offset = fault_addr - start;
		size = end - fault_addr;
	}
	return size;
}

unsigned long copy_from_user_fixup(void *to, const void __user *from, unsigned long size)
{
	unsigned long offset;

	size = compute_size((unsigned long) from, size, &offset);
	if (likely(size))
		memset(to + offset, 0, size);

	return size;
}
EXPORT_SYMBOL(copy_from_user_fixup);

unsigned long copy_to_user_fixup(void __user *to, const void *from, unsigned long size)
{
	unsigned long offset;

	return compute_size((unsigned long) to, size, &offset);
}
EXPORT_SYMBOL(copy_to_user_fixup);

unsigned long copy_in_user_fixup(void __user *to, void __user *from, unsigned long size)
{
	unsigned long fault_addr = current_thread_info()->fault_address;
	unsigned long start = (unsigned long) to;
	unsigned long end = start + size;

	if (fault_addr >= start && fault_addr < end)
		return end - fault_addr;

	start = (unsigned long) from;
	end = start + size;
	if (fault_addr >= start && fault_addr < end)
		return end - fault_addr;

	return size;
}
EXPORT_SYMBOL(copy_in_user_fixup);
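As a quick illustration of the arithmetic in compute_size() (a standalone sketch with a made-up fault address, not part of the kernel file): a 256-byte copy that faults 64 bytes into the source reports 64 bytes copied and 192 left over, and copy_from_user_fixup() then zeroes that uncopied tail of the destination.

#include <stdio.h>

/* Same arithmetic as compute_size(), but taking the fault address as a
 * parameter instead of reading it from current_thread_info(). */
static unsigned long compute_size_demo(unsigned long fault_addr,
				       unsigned long start,
				       unsigned long size,
				       unsigned long *offset)
{
	unsigned long end = start + size;

	if (fault_addr < start || fault_addr >= end) {
		*offset = 0;
	} else {
		*offset = fault_addr - start;
		size = end - fault_addr;
	}
	return size;
}

int main(void)
{
	unsigned long offset;
	/* A 256-byte copy from user address 0x1000 faulting at 0x1040. */
	unsigned long left = compute_size_demo(0x1040, 0x1000, 256, &offset);

	printf("%lu bytes copied, %lu left uncopied\n", offset, left);	/* 64, 192 */
	return 0;
}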

636
arch/sparc/lib/xor.S Normal file
View file

@ -0,0 +1,636 @@
/*
* arch/sparc64/lib/xor.S
*
* High speed xor_block operation for RAID4/5 utilizing the
* UltraSparc Visual Instruction Set and Niagara store-init/twin-load.
*
* Copyright (C) 1997, 1999 Jakub Jelinek (jj@ultra.linux.cz)
* Copyright (C) 2006 David S. Miller <davem@davemloft.net>
*/
#include <linux/linkage.h>
#include <asm/visasm.h>
#include <asm/asi.h>
#include <asm/dcu.h>
#include <asm/spitfire.h>
/*
* Requirements:
* !(((long)dest | (long)sourceN) & (64 - 1)) &&
* !(len & 127) && len >= 256
*/
.text
/* VIS versions. */
ENTRY(xor_vis_2)
rd %fprs, %o5
andcc %o5, FPRS_FEF|FPRS_DU, %g0
be,pt %icc, 0f
sethi %hi(VISenter), %g1
jmpl %g1 + %lo(VISenter), %g7
add %g7, 8, %g7
0: wr %g0, FPRS_FEF, %fprs
rd %asi, %g1
wr %g0, ASI_BLK_P, %asi
membar #LoadStore|#StoreLoad|#StoreStore
sub %o0, 128, %o0
ldda [%o1] %asi, %f0
ldda [%o2] %asi, %f16
2: ldda [%o1 + 64] %asi, %f32
fxor %f0, %f16, %f16
fxor %f2, %f18, %f18
fxor %f4, %f20, %f20
fxor %f6, %f22, %f22
fxor %f8, %f24, %f24
fxor %f10, %f26, %f26
fxor %f12, %f28, %f28
fxor %f14, %f30, %f30
stda %f16, [%o1] %asi
ldda [%o2 + 64] %asi, %f48
ldda [%o1 + 128] %asi, %f0
fxor %f32, %f48, %f48
fxor %f34, %f50, %f50
add %o1, 128, %o1
fxor %f36, %f52, %f52
add %o2, 128, %o2
fxor %f38, %f54, %f54
subcc %o0, 128, %o0
fxor %f40, %f56, %f56
fxor %f42, %f58, %f58
fxor %f44, %f60, %f60
fxor %f46, %f62, %f62
stda %f48, [%o1 - 64] %asi
bne,pt %xcc, 2b
ldda [%o2] %asi, %f16
ldda [%o1 + 64] %asi, %f32
fxor %f0, %f16, %f16
fxor %f2, %f18, %f18
fxor %f4, %f20, %f20
fxor %f6, %f22, %f22
fxor %f8, %f24, %f24
fxor %f10, %f26, %f26
fxor %f12, %f28, %f28
fxor %f14, %f30, %f30
stda %f16, [%o1] %asi
ldda [%o2 + 64] %asi, %f48
membar #Sync
fxor %f32, %f48, %f48
fxor %f34, %f50, %f50
fxor %f36, %f52, %f52
fxor %f38, %f54, %f54
fxor %f40, %f56, %f56
fxor %f42, %f58, %f58
fxor %f44, %f60, %f60
fxor %f46, %f62, %f62
stda %f48, [%o1 + 64] %asi
membar #Sync|#StoreStore|#StoreLoad
wr %g1, %g0, %asi
retl
wr %g0, 0, %fprs
ENDPROC(xor_vis_2)
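Stripped of the VIS block loads/stores, the %asi juggling and the membars, xor_vis_2 is just a wide dest ^= src over the whole buffer; a plain-C equivalent (illustrative helper name, not a kernel symbol) is:

#include <stddef.h>

/* What xor_vis_2 computes: XOR `bytes` bytes of src into dest. The
 * assembly handles 128 bytes per loop iteration as two 64-byte
 * register blocks (%f0-%f14 / %f32-%f46 against %f16-%f30 / %f48-%f62). */
static void xor_2_sketch(unsigned long bytes, unsigned long *dest,
			 unsigned long *src)
{
	size_t i, n = bytes / sizeof(unsigned long);

	for (i = 0; i < n; i++)
		dest[i] ^= src[i];
}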
ENTRY(xor_vis_3)
rd %fprs, %o5
andcc %o5, FPRS_FEF|FPRS_DU, %g0
be,pt %icc, 0f
sethi %hi(VISenter), %g1
jmpl %g1 + %lo(VISenter), %g7
add %g7, 8, %g7
0: wr %g0, FPRS_FEF, %fprs
rd %asi, %g1
wr %g0, ASI_BLK_P, %asi
membar #LoadStore|#StoreLoad|#StoreStore
sub %o0, 64, %o0
ldda [%o1] %asi, %f0
ldda [%o2] %asi, %f16
3: ldda [%o3] %asi, %f32
fxor %f0, %f16, %f48
fxor %f2, %f18, %f50
add %o1, 64, %o1
fxor %f4, %f20, %f52
fxor %f6, %f22, %f54
add %o2, 64, %o2
fxor %f8, %f24, %f56
fxor %f10, %f26, %f58
fxor %f12, %f28, %f60
fxor %f14, %f30, %f62
ldda [%o1] %asi, %f0
fxor %f48, %f32, %f48
fxor %f50, %f34, %f50
fxor %f52, %f36, %f52
fxor %f54, %f38, %f54
add %o3, 64, %o3
fxor %f56, %f40, %f56
fxor %f58, %f42, %f58
subcc %o0, 64, %o0
fxor %f60, %f44, %f60
fxor %f62, %f46, %f62
stda %f48, [%o1 - 64] %asi
bne,pt %xcc, 3b
ldda [%o2] %asi, %f16
ldda [%o3] %asi, %f32
fxor %f0, %f16, %f48
fxor %f2, %f18, %f50
fxor %f4, %f20, %f52
fxor %f6, %f22, %f54
fxor %f8, %f24, %f56
fxor %f10, %f26, %f58
fxor %f12, %f28, %f60
fxor %f14, %f30, %f62
membar #Sync
fxor %f48, %f32, %f48
fxor %f50, %f34, %f50
fxor %f52, %f36, %f52
fxor %f54, %f38, %f54
fxor %f56, %f40, %f56
fxor %f58, %f42, %f58
fxor %f60, %f44, %f60
fxor %f62, %f46, %f62
stda %f48, [%o1] %asi
membar #Sync|#StoreStore|#StoreLoad
wr %g1, %g0, %asi
retl
wr %g0, 0, %fprs
ENDPROC(xor_vis_3)
ENTRY(xor_vis_4)
rd %fprs, %o5
andcc %o5, FPRS_FEF|FPRS_DU, %g0
be,pt %icc, 0f
sethi %hi(VISenter), %g1
jmpl %g1 + %lo(VISenter), %g7
add %g7, 8, %g7
0: wr %g0, FPRS_FEF, %fprs
rd %asi, %g1
wr %g0, ASI_BLK_P, %asi
membar #LoadStore|#StoreLoad|#StoreStore
sub %o0, 64, %o0
ldda [%o1] %asi, %f0
ldda [%o2] %asi, %f16
4: ldda [%o3] %asi, %f32
fxor %f0, %f16, %f16
fxor %f2, %f18, %f18
add %o1, 64, %o1
fxor %f4, %f20, %f20
fxor %f6, %f22, %f22
add %o2, 64, %o2
fxor %f8, %f24, %f24
fxor %f10, %f26, %f26
fxor %f12, %f28, %f28
fxor %f14, %f30, %f30
ldda [%o4] %asi, %f48
fxor %f16, %f32, %f32
fxor %f18, %f34, %f34
fxor %f20, %f36, %f36
fxor %f22, %f38, %f38
add %o3, 64, %o3
fxor %f24, %f40, %f40
fxor %f26, %f42, %f42
fxor %f28, %f44, %f44
fxor %f30, %f46, %f46
ldda [%o1] %asi, %f0
fxor %f32, %f48, %f48
fxor %f34, %f50, %f50
fxor %f36, %f52, %f52
add %o4, 64, %o4
fxor %f38, %f54, %f54
fxor %f40, %f56, %f56
fxor %f42, %f58, %f58
subcc %o0, 64, %o0
fxor %f44, %f60, %f60
fxor %f46, %f62, %f62
stda %f48, [%o1 - 64] %asi
bne,pt %xcc, 4b
ldda [%o2] %asi, %f16
ldda [%o3] %asi, %f32
fxor %f0, %f16, %f16
fxor %f2, %f18, %f18
fxor %f4, %f20, %f20
fxor %f6, %f22, %f22
fxor %f8, %f24, %f24
fxor %f10, %f26, %f26
fxor %f12, %f28, %f28
fxor %f14, %f30, %f30
ldda [%o4] %asi, %f48
fxor %f16, %f32, %f32
fxor %f18, %f34, %f34
fxor %f20, %f36, %f36
fxor %f22, %f38, %f38
fxor %f24, %f40, %f40
fxor %f26, %f42, %f42
fxor %f28, %f44, %f44
fxor %f30, %f46, %f46
membar #Sync
fxor %f32, %f48, %f48
fxor %f34, %f50, %f50
fxor %f36, %f52, %f52
fxor %f38, %f54, %f54
fxor %f40, %f56, %f56
fxor %f42, %f58, %f58
fxor %f44, %f60, %f60
fxor %f46, %f62, %f62
stda %f48, [%o1] %asi
membar #Sync|#StoreStore|#StoreLoad
wr %g1, %g0, %asi
retl
wr %g0, 0, %fprs
ENDPROC(xor_vis_4)
ENTRY(xor_vis_5)
save %sp, -192, %sp
rd %fprs, %o5
andcc %o5, FPRS_FEF|FPRS_DU, %g0
be,pt %icc, 0f
sethi %hi(VISenter), %g1
jmpl %g1 + %lo(VISenter), %g7
add %g7, 8, %g7
0: wr %g0, FPRS_FEF, %fprs
rd %asi, %g1
wr %g0, ASI_BLK_P, %asi
membar #LoadStore|#StoreLoad|#StoreStore
sub %i0, 64, %i0
ldda [%i1] %asi, %f0
ldda [%i2] %asi, %f16
5: ldda [%i3] %asi, %f32
fxor %f0, %f16, %f48
fxor %f2, %f18, %f50
add %i1, 64, %i1
fxor %f4, %f20, %f52
fxor %f6, %f22, %f54
add %i2, 64, %i2
fxor %f8, %f24, %f56
fxor %f10, %f26, %f58
fxor %f12, %f28, %f60
fxor %f14, %f30, %f62
ldda [%i4] %asi, %f16
fxor %f48, %f32, %f48
fxor %f50, %f34, %f50
fxor %f52, %f36, %f52
fxor %f54, %f38, %f54
add %i3, 64, %i3
fxor %f56, %f40, %f56
fxor %f58, %f42, %f58
fxor %f60, %f44, %f60
fxor %f62, %f46, %f62
ldda [%i5] %asi, %f32
fxor %f48, %f16, %f48
fxor %f50, %f18, %f50
add %i4, 64, %i4
fxor %f52, %f20, %f52
fxor %f54, %f22, %f54
add %i5, 64, %i5
fxor %f56, %f24, %f56
fxor %f58, %f26, %f58
fxor %f60, %f28, %f60
fxor %f62, %f30, %f62
ldda [%i1] %asi, %f0
fxor %f48, %f32, %f48
fxor %f50, %f34, %f50
fxor %f52, %f36, %f52
fxor %f54, %f38, %f54
fxor %f56, %f40, %f56
fxor %f58, %f42, %f58
subcc %i0, 64, %i0
fxor %f60, %f44, %f60
fxor %f62, %f46, %f62
stda %f48, [%i1 - 64] %asi
bne,pt %xcc, 5b
ldda [%i2] %asi, %f16
ldda [%i3] %asi, %f32
fxor %f0, %f16, %f48
fxor %f2, %f18, %f50
fxor %f4, %f20, %f52
fxor %f6, %f22, %f54
fxor %f8, %f24, %f56
fxor %f10, %f26, %f58
fxor %f12, %f28, %f60
fxor %f14, %f30, %f62
ldda [%i4] %asi, %f16
fxor %f48, %f32, %f48
fxor %f50, %f34, %f50
fxor %f52, %f36, %f52
fxor %f54, %f38, %f54
fxor %f56, %f40, %f56
fxor %f58, %f42, %f58
fxor %f60, %f44, %f60
fxor %f62, %f46, %f62
ldda [%i5] %asi, %f32
fxor %f48, %f16, %f48
fxor %f50, %f18, %f50
fxor %f52, %f20, %f52
fxor %f54, %f22, %f54
fxor %f56, %f24, %f56
fxor %f58, %f26, %f58
fxor %f60, %f28, %f60
fxor %f62, %f30, %f62
membar #Sync
fxor %f48, %f32, %f48
fxor %f50, %f34, %f50
fxor %f52, %f36, %f52
fxor %f54, %f38, %f54
fxor %f56, %f40, %f56
fxor %f58, %f42, %f58
fxor %f60, %f44, %f60
fxor %f62, %f46, %f62
stda %f48, [%i1] %asi
membar #Sync|#StoreStore|#StoreLoad
wr %g1, %g0, %asi
wr %g0, 0, %fprs
ret
restore
ENDPROC(xor_vis_5)
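The 3-, 4- and 5-source variants only change how many source blocks are folded in before the store; in C terms xor_vis_5 reduces to the sketch below (again an illustrative name, with the VIS register pipelining omitted):

/* dest ^= s1 ^ s2 ^ s3 ^ s4, one 64-bit word at a time; the assembly
 * folds the sources into %f48-%f62 in the same left-to-right order. */
static void xor_5_sketch(unsigned long bytes, unsigned long *dest,
			 unsigned long *s1, unsigned long *s2,
			 unsigned long *s3, unsigned long *s4)
{
	unsigned long i, n = bytes / sizeof(unsigned long);

	for (i = 0; i < n; i++)
		dest[i] ^= s1[i] ^ s2[i] ^ s3[i] ^ s4[i];
}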
/* Niagara versions. */
ENTRY(xor_niagara_2) /* %o0=bytes, %o1=dest, %o2=src */
save %sp, -192, %sp
prefetch [%i1], #n_writes
prefetch [%i2], #one_read
rd %asi, %g7
wr %g0, ASI_BLK_INIT_QUAD_LDD_P, %asi
srlx %i0, 6, %g1
mov %i1, %i0
mov %i2, %i1
1: ldda [%i1 + 0x00] %asi, %i2 /* %i2/%i3 = src + 0x00 */
ldda [%i1 + 0x10] %asi, %i4 /* %i4/%i5 = src + 0x10 */
ldda [%i1 + 0x20] %asi, %g2 /* %g2/%g3 = src + 0x20 */
ldda [%i1 + 0x30] %asi, %l0 /* %l0/%l1 = src + 0x30 */
prefetch [%i1 + 0x40], #one_read
ldda [%i0 + 0x00] %asi, %o0 /* %o0/%o1 = dest + 0x00 */
ldda [%i0 + 0x10] %asi, %o2 /* %o2/%o3 = dest + 0x10 */
ldda [%i0 + 0x20] %asi, %o4 /* %o4/%o5 = dest + 0x20 */
ldda [%i0 + 0x30] %asi, %l2 /* %l2/%l3 = dest + 0x30 */
prefetch [%i0 + 0x40], #n_writes
xor %o0, %i2, %o0
xor %o1, %i3, %o1
stxa %o0, [%i0 + 0x00] %asi
stxa %o1, [%i0 + 0x08] %asi
xor %o2, %i4, %o2
xor %o3, %i5, %o3
stxa %o2, [%i0 + 0x10] %asi
stxa %o3, [%i0 + 0x18] %asi
xor %o4, %g2, %o4
xor %o5, %g3, %o5
stxa %o4, [%i0 + 0x20] %asi
stxa %o5, [%i0 + 0x28] %asi
xor %l2, %l0, %l2
xor %l3, %l1, %l3
stxa %l2, [%i0 + 0x30] %asi
stxa %l3, [%i0 + 0x38] %asi
add %i0, 0x40, %i0
subcc %g1, 1, %g1
bne,pt %xcc, 1b
add %i1, 0x40, %i1
membar #Sync
wr %g7, 0x0, %asi
ret
restore
ENDPROC(xor_niagara_2)
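The Niagara paths do the same math in the integer register file instead of the FP registers; the loop shape of xor_niagara_2 corresponds to the sketch below, while the real code additionally uses the ASI_BLK_INIT_QUAD_LDD_P twin-load/store-init ASI, which is intended to avoid reading destination cache lines that are only going to be overwritten (illustrative C, not a kernel function):

/* xor_niagara_2's loop structure: one 64-byte cache line per iteration,
 * eight 64-bit XORs each; srlx %i0, 6, %g1 is the chunk count below. */
static void xor_niagara_2_sketch(unsigned long bytes, unsigned long *dest,
				 unsigned long *src)
{
	unsigned long chunks = bytes >> 6;

	while (chunks--) {
		int i;

		for (i = 0; i < 8; i++)
			dest[i] ^= src[i];
		dest += 8;
		src += 8;
	}
}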
ENTRY(xor_niagara_3) /* %o0=bytes, %o1=dest, %o2=src1, %o3=src2 */
save %sp, -192, %sp
prefetch [%i1], #n_writes
prefetch [%i2], #one_read
prefetch [%i3], #one_read
rd %asi, %g7
wr %g0, ASI_BLK_INIT_QUAD_LDD_P, %asi
srlx %i0, 6, %g1
mov %i1, %i0
mov %i2, %i1
mov %i3, %l7
1: ldda [%i1 + 0x00] %asi, %i2 /* %i2/%i3 = src1 + 0x00 */
ldda [%i1 + 0x10] %asi, %i4 /* %i4/%i5 = src1 + 0x10 */
ldda [%l7 + 0x00] %asi, %g2 /* %g2/%g3 = src2 + 0x00 */
ldda [%l7 + 0x10] %asi, %l0 /* %l0/%l1 = src2 + 0x10 */
ldda [%i0 + 0x00] %asi, %o0 /* %o0/%o1 = dest + 0x00 */
ldda [%i0 + 0x10] %asi, %o2 /* %o2/%o3 = dest + 0x10 */
xor %g2, %i2, %g2
xor %g3, %i3, %g3
xor %o0, %g2, %o0
xor %o1, %g3, %o1
stxa %o0, [%i0 + 0x00] %asi
stxa %o1, [%i0 + 0x08] %asi
ldda [%i1 + 0x20] %asi, %i2 /* %i2/%i3 = src1 + 0x20 */
ldda [%l7 + 0x20] %asi, %g2 /* %g2/%g3 = src2 + 0x20 */
ldda [%i0 + 0x20] %asi, %o0 /* %o0/%o1 = dest + 0x20 */
xor %l0, %i4, %l0
xor %l1, %i5, %l1
xor %o2, %l0, %o2
xor %o3, %l1, %o3
stxa %o2, [%i0 + 0x10] %asi
stxa %o3, [%i0 + 0x18] %asi
ldda [%i1 + 0x30] %asi, %i4 /* %i4/%i5 = src1 + 0x30 */
ldda [%l7 + 0x30] %asi, %l0 /* %l0/%l1 = src2 + 0x30 */
ldda [%i0 + 0x30] %asi, %o2 /* %o2/%o3 = dest + 0x30 */
prefetch [%i1 + 0x40], #one_read
prefetch [%l7 + 0x40], #one_read
prefetch [%i0 + 0x40], #n_writes
xor %g2, %i2, %g2
xor %g3, %i3, %g3
xor %o0, %g2, %o0
xor %o1, %g3, %o1
stxa %o0, [%i0 + 0x20] %asi
stxa %o1, [%i0 + 0x28] %asi
xor %l0, %i4, %l0
xor %l1, %i5, %l1
xor %o2, %l0, %o2
xor %o3, %l1, %o3
stxa %o2, [%i0 + 0x30] %asi
stxa %o3, [%i0 + 0x38] %asi
add %i0, 0x40, %i0
add %i1, 0x40, %i1
subcc %g1, 1, %g1
bne,pt %xcc, 1b
add %l7, 0x40, %l7
membar #Sync
wr %g7, 0x0, %asi
ret
restore
ENDPROC(xor_niagara_3)
ENTRY(xor_niagara_4) /* %o0=bytes, %o1=dest, %o2=src1, %o3=src2, %o4=src3 */
save %sp, -192, %sp
prefetch [%i1], #n_writes
prefetch [%i2], #one_read
prefetch [%i3], #one_read
prefetch [%i4], #one_read
rd %asi, %g7
wr %g0, ASI_BLK_INIT_QUAD_LDD_P, %asi
srlx %i0, 6, %g1
mov %i1, %i0
mov %i2, %i1
mov %i3, %l7
mov %i4, %l6
1: ldda [%i1 + 0x00] %asi, %i2 /* %i2/%i3 = src1 + 0x00 */
ldda [%l7 + 0x00] %asi, %i4 /* %i4/%i5 = src2 + 0x00 */
ldda [%l6 + 0x00] %asi, %g2 /* %g2/%g3 = src3 + 0x00 */
ldda [%i0 + 0x00] %asi, %l0 /* %l0/%l1 = dest + 0x00 */
xor %i4, %i2, %i4
xor %i5, %i3, %i5
ldda [%i1 + 0x10] %asi, %i2 /* %i2/%i3 = src1 + 0x10 */
xor %g2, %i4, %g2
xor %g3, %i5, %g3
ldda [%l7 + 0x10] %asi, %i4 /* %i4/%i5 = src2 + 0x10 */
xor %l0, %g2, %l0
xor %l1, %g3, %l1
stxa %l0, [%i0 + 0x00] %asi
stxa %l1, [%i0 + 0x08] %asi
ldda [%l6 + 0x10] %asi, %g2 /* %g2/%g3 = src3 + 0x10 */
ldda [%i0 + 0x10] %asi, %l0 /* %l0/%l1 = dest + 0x10 */
xor %i4, %i2, %i4
xor %i5, %i3, %i5
ldda [%i1 + 0x20] %asi, %i2 /* %i2/%i3 = src1 + 0x20 */
xor %g2, %i4, %g2
xor %g3, %i5, %g3
ldda [%l7 + 0x20] %asi, %i4 /* %i4/%i5 = src2 + 0x20 */
xor %l0, %g2, %l0
xor %l1, %g3, %l1
stxa %l0, [%i0 + 0x10] %asi
stxa %l1, [%i0 + 0x18] %asi
ldda [%l6 + 0x20] %asi, %g2 /* %g2/%g3 = src3 + 0x20 */
ldda [%i0 + 0x20] %asi, %l0 /* %l0/%l1 = dest + 0x20 */
xor %i4, %i2, %i4
xor %i5, %i3, %i5
ldda [%i1 + 0x30] %asi, %i2 /* %i2/%i3 = src1 + 0x30 */
xor %g2, %i4, %g2
xor %g3, %i5, %g3
ldda [%l7 + 0x30] %asi, %i4 /* %i4/%i5 = src2 + 0x30 */
xor %l0, %g2, %l0
xor %l1, %g3, %l1
stxa %l0, [%i0 + 0x20] %asi
stxa %l1, [%i0 + 0x28] %asi
ldda [%l6 + 0x30] %asi, %g2 /* %g2/%g3 = src3 + 0x30 */
ldda [%i0 + 0x30] %asi, %l0 /* %l0/%l1 = dest + 0x30 */
prefetch [%i1 + 0x40], #one_read
prefetch [%l7 + 0x40], #one_read
prefetch [%l6 + 0x40], #one_read
prefetch [%i0 + 0x40], #n_writes
xor %i4, %i2, %i4
xor %i5, %i3, %i5
xor %g2, %i4, %g2
xor %g3, %i5, %g3
xor %l0, %g2, %l0
xor %l1, %g3, %l1
stxa %l0, [%i0 + 0x30] %asi
stxa %l1, [%i0 + 0x38] %asi
add %i0, 0x40, %i0
add %i1, 0x40, %i1
add %l7, 0x40, %l7
subcc %g1, 1, %g1
bne,pt %xcc, 1b
add %l6, 0x40, %l6
membar #Sync
wr %g7, 0x0, %asi
ret
restore
ENDPROC(xor_niagara_4)
ENTRY(xor_niagara_5) /* %o0=bytes, %o1=dest, %o2=src1, %o3=src2, %o4=src3, %o5=src4 */
save %sp, -192, %sp
prefetch [%i1], #n_writes
prefetch [%i2], #one_read
prefetch [%i3], #one_read
prefetch [%i4], #one_read
prefetch [%i5], #one_read
rd %asi, %g7
wr %g0, ASI_BLK_INIT_QUAD_LDD_P, %asi
srlx %i0, 6, %g1
mov %i1, %i0
mov %i2, %i1
mov %i3, %l7
mov %i4, %l6
mov %i5, %l5
1: ldda [%i1 + 0x00] %asi, %i2 /* %i2/%i3 = src1 + 0x00 */
ldda [%l7 + 0x00] %asi, %i4 /* %i4/%i5 = src2 + 0x00 */
ldda [%l6 + 0x00] %asi, %g2 /* %g2/%g3 = src3 + 0x00 */
ldda [%l5 + 0x00] %asi, %l0 /* %l0/%l1 = src4 + 0x00 */
ldda [%i0 + 0x00] %asi, %l2 /* %l2/%l3 = dest + 0x00 */
xor %i4, %i2, %i4
xor %i5, %i3, %i5
ldda [%i1 + 0x10] %asi, %i2 /* %i2/%i3 = src1 + 0x10 */
xor %g2, %i4, %g2
xor %g3, %i5, %g3
ldda [%l7 + 0x10] %asi, %i4 /* %i4/%i5 = src2 + 0x10 */
xor %l0, %g2, %l0
xor %l1, %g3, %l1
ldda [%l6 + 0x10] %asi, %g2 /* %g2/%g3 = src3 + 0x10 */
xor %l2, %l0, %l2
xor %l3, %l1, %l3
stxa %l2, [%i0 + 0x00] %asi
stxa %l3, [%i0 + 0x08] %asi
ldda [%l5 + 0x10] %asi, %l0 /* %l0/%l1 = src4 + 0x10 */
ldda [%i0 + 0x10] %asi, %l2 /* %l2/%l3 = dest + 0x10 */
xor %i4, %i2, %i4
xor %i5, %i3, %i5
ldda [%i1 + 0x20] %asi, %i2 /* %i2/%i3 = src1 + 0x20 */
xor %g2, %i4, %g2
xor %g3, %i5, %g3
ldda [%l7 + 0x20] %asi, %i4 /* %i4/%i5 = src2 + 0x20 */
xor %l0, %g2, %l0
xor %l1, %g3, %l1
ldda [%l6 + 0x20] %asi, %g2 /* %g2/%g3 = src3 + 0x20 */
xor %l2, %l0, %l2
xor %l3, %l1, %l3
stxa %l2, [%i0 + 0x10] %asi
stxa %l3, [%i0 + 0x18] %asi
ldda [%l5 + 0x20] %asi, %l0 /* %l0/%l1 = src4 + 0x20 */
ldda [%i0 + 0x20] %asi, %l2 /* %l2/%l3 = dest + 0x20 */
xor %i4, %i2, %i4
xor %i5, %i3, %i5
ldda [%i1 + 0x30] %asi, %i2 /* %i2/%i3 = src1 + 0x30 */
xor %g2, %i4, %g2
xor %g3, %i5, %g3
ldda [%l7 + 0x30] %asi, %i4 /* %i4/%i5 = src2 + 0x30 */
xor %l0, %g2, %l0
xor %l1, %g3, %l1
ldda [%l6 + 0x30] %asi, %g2 /* %g2/%g3 = src3 + 0x30 */
xor %l2, %l0, %l2
xor %l3, %l1, %l3
stxa %l2, [%i0 + 0x20] %asi
stxa %l3, [%i0 + 0x28] %asi
ldda [%l5 + 0x30] %asi, %l0 /* %l0/%l1 = src4 + 0x30 */
ldda [%i0 + 0x30] %asi, %l2 /* %l2/%l3 = dest + 0x30 */
prefetch [%i1 + 0x40], #one_read
prefetch [%l7 + 0x40], #one_read
prefetch [%l6 + 0x40], #one_read
prefetch [%l5 + 0x40], #one_read
prefetch [%i0 + 0x40], #n_writes
xor %i4, %i2, %i4
xor %i5, %i3, %i5
xor %g2, %i4, %g2
xor %g3, %i5, %g3
xor %l0, %g2, %l0
xor %l1, %g3, %l1
xor %l2, %l0, %l2
xor %l3, %l1, %l3
stxa %l2, [%i0 + 0x30] %asi
stxa %l3, [%i0 + 0x38] %asi
add %i0, 0x40, %i0
add %i1, 0x40, %i1
add %l7, 0x40, %l7
add %l6, 0x40, %l6
subcc %g1, 1, %g1
bne,pt %xcc, 1b
add %l5, 0x40, %l5
membar #Sync
wr %g7, 0x0, %asi
ret
restore
ENDPROC(xor_niagara_5)
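These entry points are consumed through the generic RAID xor template machinery rather than called directly; the sketch below shows how such a table could be declared, with the field names assumed from the generic RAID xor header (the actual sparc64 template definitions live in the asm headers, not in xor.S itself):

/* Extern prototypes for the assembly entry points above. */
extern void xor_vis_2(unsigned long, unsigned long *, unsigned long *);
extern void xor_vis_3(unsigned long, unsigned long *, unsigned long *,
		      unsigned long *);
extern void xor_vis_4(unsigned long, unsigned long *, unsigned long *,
		      unsigned long *, unsigned long *);
extern void xor_vis_5(unsigned long, unsigned long *, unsigned long *,
		      unsigned long *, unsigned long *, unsigned long *);

/* Assumed shape of the xor block template; field names taken to match
 * the generic RAID xor interface. */
struct xor_block_template {
	struct xor_block_template *next;
	const char *name;
	int speed;
	void (*do_2)(unsigned long, unsigned long *, unsigned long *);
	void (*do_3)(unsigned long, unsigned long *, unsigned long *,
		     unsigned long *);
	void (*do_4)(unsigned long, unsigned long *, unsigned long *,
		     unsigned long *, unsigned long *);
	void (*do_5)(unsigned long, unsigned long *, unsigned long *,
		     unsigned long *, unsigned long *, unsigned long *);
};

static struct xor_block_template xor_block_VIS_sketch = {
	.name = "VIS",
	.do_2 = xor_vis_2,
	.do_3 = xor_vis_3,
	.do_4 = xor_vis_4,
	.do_5 = xor_vis_5,
};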