home:Kaguya:branches:home:Kaguya
tensorflow-oerv
Changes of Revision 4
nat_riscv64.s
Added
@@ -0,0 +1,91 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !purego
+
+#include "textflag.h"
+
+// func addMulVVW1024(z, x *uint, y uint) (c uint)
+TEXT ·addMulVVW1024(SB),$0-32
+	MOV	$16, X30
+	JMP	addMulVVWx(SB)
+
+// func addMulVVW1536(z, x *uint, y uint) (c uint)
+TEXT ·addMulVVW1536(SB),$0-32
+	MOV	$24, X30
+	JMP	addMulVVWx(SB)
+
+// func addMulVVW2048(z, x *uint, y uint) (c uint)
+TEXT ·addMulVVW2048(SB),$0-32
+	MOV	$32, X30
+	JMP	addMulVVWx(SB)
+
+TEXT addMulVVWx(SB),NOFRAME|NOSPLIT,$0
+	MOV	z+0(FP), X5
+	MOV	x+8(FP), X7
+	MOV	y+16(FP), X6
+	MOV	$0, X29
+
+	BEQZ	X30, done
+loop:
+	MOV	0*8(X5), X10	// z0
+	MOV	1*8(X5), X13	// z1
+	MOV	2*8(X5), X16	// z2
+	MOV	3*8(X5), X19	// z3
+
+	MOV	0*8(X7), X8	// x0
+	MOV	1*8(X7), X11	// x1
+	MOV	2*8(X7), X14	// x2
+	MOV	3*8(X7), X17	// x3
+
+	MULHU	X8, X6, X9	// z_hi0 = x0 * y
+	MUL	X8, X6, X8	// z_lo0 = x0 * y
+	ADD	X8, X10, X21	// z_lo0 = x0 * y + z0
+	SLTU	X8, X21, X22
+	ADD	X9, X22, X9	// z_hi0 = x0 * y + z0
+	ADD	X21, X29, X10	// z_lo0 = x0 * y + z0 + c
+	SLTU	X21, X10, X22
+	ADD	X9, X22, X29	// next c
+
+	MULHU	X11, X6, X12	// z_hi1 = x1 * y
+	MUL	X11, X6, X11	// z_lo1 = x1 * y
+	ADD	X11, X13, X21	// z_lo1 = x1 * y + z1
+	SLTU	X11, X21, X22
+	ADD	X12, X22, X12	// z_hi1 = x1 * y + z1
+	ADD	X21, X29, X13	// z_lo1 = x1 * y + z1 + c
+	SLTU	X21, X13, X22
+	ADD	X12, X22, X29	// next c
+
+	MULHU	X14, X6, X15	// z_hi2 = x2 * y
+	MUL	X14, X6, X14	// z_lo2 = x2 * y
+	ADD	X14, X16, X21	// z_lo2 = x2 * y + z2
+	SLTU	X14, X21, X22
+	ADD	X15, X22, X15	// z_hi2 = x2 * y + z2
+	ADD	X21, X29, X16	// z_lo2 = x2 * y + z2 + c
+	SLTU	X21, X16, X22
+	ADD	X15, X22, X29	// next c
+
+	MULHU	X17, X6, X18	// z_hi3 = x3 * y
+	MUL	X17, X6, X17	// z_lo3 = x3 * y
+	ADD	X17, X19, X21	// z_lo3 = x3 * y + z3
+	SLTU	X17, X21, X22
+	ADD	X18, X22, X18	// z_hi3 = x3 * y + z3
+	ADD	X21, X29, X19	// z_lo3 = x3 * y + z3 + c
+	SLTU	X21, X19, X22
+	ADD	X18, X22, X29	// next c
+
+	MOV	X10, 0*8(X5)	// z0
+	MOV	X13, 1*8(X5)	// z1
+	MOV	X16, 2*8(X5)	// z2
+	MOV	X19, 3*8(X5)	// z3
+
+	ADD	$32, X5
+	ADD	$32, X7
+
+	SUB	$4, X30
+	BNEZ	X30, loop
+
+done:
+	MOV	X29, c+24(FP)
+	RET
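RISC-V has no carry flag, so the assembly recovers each carry with SLTU: an unsigned sum that wrapped around is smaller than the addend it started from. The Go program below is a minimal sketch, written for illustration, that models the same per-word step, z[i] = z[i] + x[i]*y + c, with `math/bits`, and checks it against `math/big` at the 16-, 24- and 32-word sizes that `addMulVVW1024`, `addMulVVW1536` and `addMulVVW2048` load into X30. The names `addMulVVWRef`, `toBig` and `matchesBig`, the slice-based signature, and the test data are hypothetical. This is not the package's own Go fallback. Unlike the assembly, the sketch handles one word per iteration rather than four, which gives the same result without requiring the word count to be a multiple of 4.

```go
package main

import (
	"fmt"
	"math/big"
	"math/bits"
)

// addMulVVWRef is a hypothetical pure-Go model of addMulVVWx: for each word
// it computes z[i] + x[i]*y + c, stores the low word back into z[i], and
// carries the high word into the next iteration. It returns the final carry.
func addMulVVWRef(z, x []uint, y uint) (c uint) {
	for i := range z {
		hi, lo := bits.Mul(x[i], y) // MULHU / MUL
		sum := lo + z[i]            // ADD; may wrap around
		if sum < lo {               // SLTU: the sum wrapped iff it is below an addend
			hi++
		}
		out := sum + c // ADD the incoming carry
		if out < sum { // SLTU again for the second addition
			hi++
		}
		// hi cannot overflow: (2^w-1)^2 + 2(2^w-1) = 2^(2w) - 1.
		z[i] = out
		c = hi // next c
	}
	return c
}

// toBig interprets little-endian words as a non-negative big.Int.
func toBig(w []uint) *big.Int {
	words := make([]big.Word, len(w))
	for i, v := range w {
		words[i] = big.Word(v)
	}
	return new(big.Int).SetBits(words)
}

// matchesBig checks addMulVVWRef against math/big on n words of test data.
func matchesBig(n int) bool {
	z := make([]uint, n)
	x := make([]uint, n)
	for i := range z {
		z[i] = ^uint(0) - uint(i)
		x[i] = ^uint(0) >> uint(i%8)
	}
	y := ^uint(0) - 12345

	// Expected result: z + x*y, computed before z is overwritten.
	want := new(big.Int).Mul(toBig(x), toBig([]uint{y}))
	want.Add(want, toBig(z))

	// The result fits in n+1 words: the updated z plus the returned carry.
	c := addMulVVWRef(z, x, y)
	got := toBig(append(append([]uint(nil), z...), c))
	return got.Cmp(want) == 0
}

func main() {
	// 16, 24 and 32 64-bit words are 1024, 1536 and 2048 bits.
	fmt.Println(matchesBig(16), matchesBig(24), matchesBig(32)) // true true true
}
```

The overflow comment is why a single register (X29 in the assembly) is enough to hold the carry: even the worst case, all-ones inputs, produces a high word that still fits.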