File _service:tar_scm:0157-Backport-SME-aarch64-Vector-move-fixes-for-nosimd.patch of Package gcc
From 737d2a5f1c5e725b7e5a20075270016ebf56b44c Mon Sep 17 00:00:00 2001
From: Richard Sandiford <richard.sandiford@arm.com>
Date: Tue, 13 Sep 2022 09:28:49 +0100
Subject: [PATCH 058/157] [Backport][SME] aarch64: Vector move fixes for
 +nosimd

Reference: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=721c0fb3aca31d3bf8ad6e929eab32e29a427e60

This patch fixes various issues around the handling of vectors and
(particularly) vector structures with +nosimd.  Previously, passing
and returning structures would trigger an ICE, since:

* we didn't allow the structure modes to be stored in FPRs

* we didn't provide +nosimd move patterns

* splitting the moves into word-sized pieces (the default strategy
  without move patterns) doesn't work because the registers are
  doubleword sized.

The patch is a bit of a hodge-podge since a lot of the handling of
moves, register costs, and register legitimacy is so interconnected.
It didn't seem feasible to split things further.

Some notes:

* The patch recognises vector and tuple modes based on TARGET_FLOAT
  rather than TARGET_SIMD, and instead adds TARGET_SIMD to places
  that really do need the vector ISA.  This is necessary for the
  modes to be handled correctly in register arguments and returns.

* The 64-bit (DREG) STP peephole required TARGET_SIMD but the
  LDP peephole didn't.  I think the LDP one is right, since DREG
  moves could involve GPRs as well as FPRs.

* The patch keeps the existing choices of instructions for
  TARGET_SIMD, just in case they happen to be better than FMOV
  on some uarches.

* Before the patch, +nosimd Q<->Q moves of 128-bit scalars went via
  a GPR, thanks to a secondary reload pattern.  This approach might
  not be ideal, but there's no reason that 128-bit vectors should
  behave differently from 128-bit scalars.  The patch therefore
  extends the current scalar approach to vectors.

* Multi-vector LD1 and ST1 require TARGET_SIMD, so the TARGET_FLOAT
  structure moves need to use LDP/STP and LDR/STR combinations
  instead.  That's also what we do for big-endian even with
  TARGET_SIMD, so most of the code was already there.  The patterns
  for structures of 64-bit vectors are identical, but the patterns
  for structures of 128-bit vectors need to cope with the lack of
  128-bit Q<->Q moves.

  It isn't feasible to move multi-vector tuples via GPRs, so the
  patch moves them via memory instead.  This contaminates the port
  with its first secondary memory reload.

gcc/
	* config/aarch64/aarch64.cc (aarch64_classify_vector_mode): Use
	TARGET_FLOAT instead of TARGET_SIMD.
	(aarch64_vectorize_related_mode): Restrict ADVSIMD handling to
	TARGET_SIMD.
	(aarch64_hard_regno_mode_ok): Don't allow tuples of 2 64-bit vectors
	in GPRs.
	(aarch64_classify_address): Treat little-endian structure moves
	like big-endian for TARGET_FLOAT && !TARGET_SIMD.
	(aarch64_secondary_memory_needed): New function.
	(aarch64_secondary_reload): Handle 128-bit Advanced SIMD vectors
	in the same way as TF, TI and TD.
	(aarch64_rtx_mult_cost): Restrict ADVSIMD handling to TARGET_SIMD.
	(aarch64_rtx_costs): Likewise.
	(aarch64_register_move_cost): Treat a pair of 64-bit vectors
	separately from a single 128-bit vector.  Handle the cost implied
	by aarch64_secondary_memory_needed.
	(aarch64_simd_valid_immediate): Restrict ADVSIMD handling to
	TARGET_SIMD.
	(aarch64_expand_vec_perm_const_1): Likewise.
	(TARGET_SECONDARY_MEMORY_NEEDED): New macro.
	* config/aarch64/iterators.md (VTX): New iterator.
	* config/aarch64/aarch64.md (arches): Add fp_q as a synonym of simd.
	(arch_enabled): Adjust accordingly.
	(@aarch64_reload_mov<TX:mode>): Extend to...
	(@aarch64_reload_mov<VTX:mode>): ...this.
	* config/aarch64/aarch64-simd.md (mov<mode>): Require TARGET_FLOAT
	rather than TARGET_SIMD.
	(movmisalign<mode>): Likewise.
	(load_pair<DREG:mode><DREG2:mode>): Likewise.
	(vec_store_pair<DREG:mode><DREG2:mode>): Likewise.
	(load_pair<VQ:mode><VQ2:mode>): Likewise.
	(vec_store_pair<VQ:mode><VQ2:mode>): Likewise.
	(@aarch64_split_simd_mov<mode>): Likewise.
	(aarch64_get_low<mode>): Likewise.
	(aarch64_get_high<mode>): Likewise.
	(aarch64_get_half<mode>): Likewise.  Canonicalize to a move for
	lowpart extracts.
	(*aarch64_simd_mov<VDMOV:mode>): Require TARGET_FLOAT rather than
	TARGET_SIMD.  Use different w<-w and r<-w instructions for
	!TARGET_SIMD.  Disable immediate moves for !TARGET_SIMD but add
	an alternative specifically for w<-Z.
	(*aarch64_simd_mov<VQMOV:mode>): Require TARGET_FLOAT rather than
	TARGET_SIMD.  Likewise for the associated define_splits.  Disable
	FPR moves and immediate moves for !TARGET_SIMD but add an
	alternative specifically for w<-Z.
	(aarch64_simd_mov_from_<mode>high): Require TARGET_FLOAT rather
	than TARGET_SIMD.  Restrict the existing alternatives to
	TARGET_SIMD but add a new r<-w one for !TARGET_SIMD.
	(*aarch64_get_high<mode>): New pattern.
	(load_pair_lanes<mode>): Require TARGET_FLOAT rather than
	TARGET_SIMD.
	(store_pair_lanes<mode>): Likewise.
	(*aarch64_combine_internal<mode>): Likewise.  Restrict existing
	w<-w, w<-r and w<-m alternatives to TARGET_SIMD but add a new
	w<-r alternative for !TARGET_SIMD.
	(*aarch64_combine_internal_be<mode>): Likewise.
	(aarch64_combinez<mode>): Require TARGET_FLOAT rather than
	TARGET_SIMD.  Remove bogus arch attribute.
	(*aarch64_combinez_be<mode>): Likewise.
	(@aarch64_vec_concat<mode>): Require TARGET_FLOAT rather than
	TARGET_SIMD.
	(aarch64_combine<mode>): Likewise.
	(aarch64_rev_reglist<mode>): Likewise.
	(mov<mode>): Likewise.
	(*aarch64_be_mov<VSTRUCT_2D:mode>): Extend to TARGET_FLOAT &&
	!TARGET_SIMD, regardless of endianness.  Extend associated
	define_splits in the same way, both for this pattern and the
	ones below.
	(*aarch64_be_mov<VSTRUCT_2Q:mode>): Likewise.  Restrict w<-w
	alternative to TARGET_SIMD.
	(*aarch64_be_movoi): Likewise.
	(*aarch64_be_movci): Likewise.
	(*aarch64_be_movxi): Likewise.
	(*aarch64_be_mov<VSTRUCT_3QD:mode>): Extend to TARGET_FLOAT &&
	!TARGET_SIMD, regardless of endianness.  Restrict w<-w
	alternative to TARGET_SIMD for tuples of 128-bit vectors.
	(*aarch64_be_mov<VSTRUCT_4QD:mode>): Likewise.
	* config/aarch64/aarch64-ldpstp.md: Remove TARGET_SIMD condition
	from DREG STP peephole.  Change TARGET_SIMD to TARGET_FLOAT in
	the VQ and VP_2E LDP and STP peepholes.

gcc/testsuite/
	* gcc.target/aarch64/ldp_stp_20.c: New test.
	* gcc.target/aarch64/ldp_stp_21.c: Likewise.
	* gcc.target/aarch64/ldp_stp_22.c: Likewise.
	* gcc.target/aarch64/ldp_stp_23.c: Likewise.
	* gcc.target/aarch64/ldp_stp_24.c: Likewise.
	* gcc.target/aarch64/movv16qi_1.c (gpr_to_gpr): New function.
	* gcc.target/aarch64/movv8qi_1.c (gpr_to_gpr): Likewise.
	* gcc.target/aarch64/movv16qi_2.c: New test.
	* gcc.target/aarch64/movv16qi_3.c: Likewise.
	* gcc.target/aarch64/movv2di_1.c: Likewise.
	* gcc.target/aarch64/movv2x16qi_1.c: Likewise.
	* gcc.target/aarch64/movv2x8qi_1.c: Likewise.
	* gcc.target/aarch64/movv3x16qi_1.c: Likewise.
	* gcc.target/aarch64/movv3x8qi_1.c: Likewise.
	* gcc.target/aarch64/movv4x16qi_1.c: Likewise.
	* gcc.target/aarch64/movv4x8qi_1.c: Likewise.
	* gcc.target/aarch64/movv8qi_2.c: Likewise.
	* gcc.target/aarch64/movv8qi_3.c: Likewise.
	* gcc.target/aarch64/vect_unary_2.c: Likewise.
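[Editorial note: as a concrete illustration of the ICE scenario the notes
above describe, here is a minimal reproducer sketch.  It is not part of the
patch; the pragmas and the int8x8x2_t tuple type mirror the new
movv2x8qi_1.c test added below, and the function name is illustrative.]

#pragma GCC aarch64 "arm_neon.h"
#pragma GCC target "+nosimd+fp"

/* Passing and returning a structure of two 64-bit vectors forces
   V2x8QI moves between the argument FPRs, memory and the return
   registers.  Before this fix, those tuple modes had no +nosimd
   move patterns, so compiling a function like this ICE'd.  */
int8x8x2_t
copy_tuple (int8x8x2_t x)
{
  return x;
}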
---
 gcc/config/aarch64/aarch64-ldpstp.md          |  11 +-
 gcc/config/aarch64/aarch64-simd.md            | 199 +++++++++++-------
 gcc/config/aarch64/aarch64.cc                 |  94 ++++++---
 gcc/config/aarch64/aarch64.md                 |  11 +-
 gcc/config/aarch64/iterators.md               |   2 +
 gcc/testsuite/gcc.target/aarch64/ldp_stp_20.c |   7 +
 gcc/testsuite/gcc.target/aarch64/ldp_stp_21.c |   7 +
 gcc/testsuite/gcc.target/aarch64/ldp_stp_22.c |  13 ++
 gcc/testsuite/gcc.target/aarch64/ldp_stp_23.c |  16 ++
 gcc/testsuite/gcc.target/aarch64/ldp_stp_24.c |  16 ++
 gcc/testsuite/gcc.target/aarch64/movv16qi_1.c |  21 ++
 gcc/testsuite/gcc.target/aarch64/movv16qi_2.c |  27 +++
 gcc/testsuite/gcc.target/aarch64/movv16qi_3.c |  30 +++
 gcc/testsuite/gcc.target/aarch64/movv2di_1.c  | 103 +++++++++
 .../gcc.target/aarch64/movv2x16qi_1.c         |  40 ++++
 .../gcc.target/aarch64/movv2x8qi_1.c          |  38 ++++
 .../gcc.target/aarch64/movv3x16qi_1.c         |  44 ++++
 .../gcc.target/aarch64/movv3x8qi_1.c          |  41 ++++
 .../gcc.target/aarch64/movv4x16qi_1.c         |  44 ++++
 .../gcc.target/aarch64/movv4x8qi_1.c          |  42 ++++
 gcc/testsuite/gcc.target/aarch64/movv8qi_1.c  |  15 ++
 gcc/testsuite/gcc.target/aarch64/movv8qi_2.c  |  27 +++
 gcc/testsuite/gcc.target/aarch64/movv8qi_3.c  |  30 +++
 .../gcc.target/aarch64/vect_unary_2.c         |   5 +
 24 files changed, 774 insertions(+), 109 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_20.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_21.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_22.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_23.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_24.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv16qi_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv16qi_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv2di_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv2x16qi_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv2x8qi_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv3x16qi_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv3x8qi_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv4x16qi_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv4x8qi_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv8qi_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv8qi_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect_unary_2.c

diff --git a/gcc/config/aarch64/aarch64-ldpstp.md b/gcc/config/aarch64/aarch64-ldpstp.md
index ba76a1b78..f8446e212 100644
--- a/gcc/config/aarch64/aarch64-ldpstp.md
+++ b/gcc/config/aarch64/aarch64-ldpstp.md
@@ -83,8 +83,7 @@
 	(match_operand:DREG 1 "register_operand" ""))
    (set (match_operand:DREG2 2 "memory_operand" "")
 	(match_operand:DREG2 3 "register_operand" ""))]
-  "TARGET_SIMD
-   && aarch64_operands_ok_for_ldpstp (operands, false, <DREG:MODE>mode)"
+  "aarch64_operands_ok_for_ldpstp (operands, false, <DREG:MODE>mode)"
   [(parallel [(set (match_dup 0) (match_dup 1))
 	      (set (match_dup 2) (match_dup 3))])]
 {
@@ -96,7 +95,7 @@
 	(match_operand:VQ 1 "memory_operand" ""))
    (set (match_operand:VQ2 2 "register_operand" "")
 	(match_operand:VQ2 3 "memory_operand" ""))]
-  "TARGET_SIMD
+  "TARGET_FLOAT
    && aarch64_operands_ok_for_ldpstp (operands, true, <VQ:MODE>mode)
    && (aarch64_tune_params.extra_tuning_flags
 	& AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0"
@@ -111,7 +110,7 @@
 	(match_operand:VQ 1 "register_operand" ""))
    (set (match_operand:VQ2 2 "memory_operand" "")
 	(match_operand:VQ2 3 "register_operand" ""))]
-  "TARGET_SIMD
+  "TARGET_FLOAT
    && aarch64_operands_ok_for_ldpstp (operands, false, <VQ:MODE>mode)
    && (aarch64_tune_params.extra_tuning_flags
 	& AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0"
@@ -306,7 +305,7 @@
    (set (match_operand:VP_2E 6 "memory_operand" "")
 	(match_operand:VP_2E 7 "aarch64_reg_or_zero" ""))
    (match_dup 8)]
-  "TARGET_SIMD
+  "TARGET_FLOAT
    && aarch64_operands_adjust_ok_for_ldpstp (operands, false, <MODE>mode)"
   [(const_int 0)]
 {
@@ -327,7 +326,7 @@
    (set (match_operand:VP_2E 6 "register_operand" "")
 	(match_operand:VP_2E 7 "memory_operand" ""))
    (match_dup 8)]
-  "TARGET_SIMD
+  "TARGET_FLOAT
    && aarch64_operands_adjust_ok_for_ldpstp (operands, true, <MODE>mode)"
   [(const_int 0)]
 {
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index a47b39281..ef7fc4ecb 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -21,7 +21,7 @@
 (define_expand "mov<mode>"
   [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
 	(match_operand:VALL_F16 1 "general_operand"))]
-  "TARGET_SIMD"
+  "TARGET_FLOAT"
   "
   /* Force the operand into a register if it is not an
      immediate whose use can be replaced with xzr.
@@ -52,7 +52,7 @@
 (define_expand "movmisalign<mode>"
   [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
 	(match_operand:VALL_F16 1 "general_operand"))]
-  "TARGET_SIMD && !STRICT_ALIGNMENT"
+  "TARGET_FLOAT && !STRICT_ALIGNMENT"
 {
   /* This pattern is not permitted to fail during expansion: if both arguments
      are non-registers (e.g. memory := constant, which can be created by the
@@ -116,10 +116,10 @@
 (define_insn "*aarch64_simd_mov<VDMOV:mode>"
   [(set (match_operand:VDMOV 0 "nonimmediate_operand"
-	  "=w, m,  m,  w, ?r, ?w, ?r, w")
+	  "=w, m,  m,  w, ?r, ?w, ?r, w, w")
 	(match_operand:VDMOV 1 "general_operand"
-	  "m,  Dz, w,  w,  w,  r,  r, Dn"))]
-  "TARGET_SIMD
+	  "m,  Dz, w,  w,  w,  r,  r, Dn, Dz"))]
+  "TARGET_FLOAT
    && (register_operand (operands[0], <MODE>mode)
        || aarch64_simd_reg_or_zero (operands[1], <MODE>mode))"
 {
@@ -128,26 +128,34 @@
    case 0: return "ldr\t%d0, %1";
    case 1: return "str\txzr, %0";
    case 2: return "str\t%d1, %0";
-   case 3: return "mov\t%0.<Vbtype>, %1.<Vbtype>";
-   case 4: return "umov\t%0, %1.d[0]";
+   case 3:
+     if (TARGET_SIMD)
+       return "mov\t%0.<Vbtype>, %1.<Vbtype>";
+     return "fmov\t%d0, %d1";
+   case 4:
+     if (TARGET_SIMD)
+       return "umov\t%0, %1.d[0]";
+     return "fmov\t%x0, %d1";
    case 5: return "fmov\t%d0, %1";
    case 6: return "mov\t%0, %1";
    case 7: return aarch64_output_simd_mov_immediate (operands[1], 64);
+   case 8: return "fmov\t%d0, xzr";
    default: gcc_unreachable ();
    }
 }
   [(set_attr "type" "neon_load1_1reg<q>, store_8, neon_store1_1reg<q>,\
 		     neon_logic<q>, neon_to_gp<q>, f_mcr,\
-		     mov_reg, neon_move<q>")]
+		     mov_reg, neon_move<q>, f_mcr")
+   (set_attr "arch" "*,*,*,*,*,*,*,simd,*")]
 )

 (define_insn "*aarch64_simd_mov<VQMOV:mode>"
   [(set (match_operand:VQMOV 0 "nonimmediate_operand"
-	  "=w, Umn,  m,  w, ?r, ?w, ?r, w")
+	  "=w, Umn,  m,  w, ?r, ?w, ?r, w, w")
 	(match_operand:VQMOV 1 "general_operand"
-	  "m,  Dz, w,  w,  w,  r,  r, Dn"))]
-  "TARGET_SIMD
+	  "m,  Dz, w,  w,  w,  r,  r, Dn, Dz"))]
+  "TARGET_FLOAT
    && (register_operand (operands[0], <MODE>mode)
        || aarch64_simd_reg_or_zero (operands[1], <MODE>mode))"
 {
@@ -167,14 +175,17 @@
     return "#";
   case 7:
     return aarch64_output_simd_mov_immediate (operands[1], 128);
+  case 8:
+    return "fmov\t%d0, xzr";
   default:
     gcc_unreachable ();
   }
 }
   [(set_attr "type" "neon_load1_1reg<q>, store_16, neon_store1_1reg<q>,\
 		     neon_logic<q>, multiple, multiple,\
-		     multiple, neon_move<q>")
-   (set_attr "length" "4,4,4,4,8,8,8,4")]
+		     multiple, neon_move<q>, fmov")
+   (set_attr "length" "4,4,4,4,8,8,8,4,4")
+   (set_attr "arch" "*,*,*,simd,*,*,*,simd,*")]
 )

 ;; When storing lane zero we can use the normal STR and its more permissive
@@ -195,7 +206,7 @@
 	(match_operand:DREG 1 "aarch64_mem_pair_operand" "Ump"))
    (set (match_operand:DREG2 2 "register_operand" "=w")
 	(match_operand:DREG2 3 "memory_operand" "m"))]
-  "TARGET_SIMD
+  "TARGET_FLOAT
    && rtx_equal_p (XEXP (operands[3], 0),
 		   plus_constant (Pmode,
 				  XEXP (operands[1], 0),
@@ -209,7 +220,7 @@
 	(match_operand:DREG 1 "register_operand" "w"))
    (set (match_operand:DREG2 2 "memory_operand" "=m")
 	(match_operand:DREG2 3 "register_operand" "w"))]
-  "TARGET_SIMD
+  "TARGET_FLOAT
    && rtx_equal_p (XEXP (operands[2], 0),
 		   plus_constant (Pmode,
 				  XEXP (operands[0], 0),
@@ -223,7 +234,7 @@
 	(match_operand:VQ 1 "aarch64_mem_pair_operand" "Ump"))
    (set (match_operand:VQ2 2 "register_operand" "=w")
 	(match_operand:VQ2 3 "memory_operand" "m"))]
-  "TARGET_SIMD
+  "TARGET_FLOAT
    && rtx_equal_p (XEXP (operands[3], 0),
 		   plus_constant (Pmode,
 				  XEXP (operands[1], 0),
@@ -237,10 +248,11 @@
 	(match_operand:VQ 1 "register_operand" "w"))
    (set (match_operand:VQ2 2 "memory_operand" "=m")
 	(match_operand:VQ2 3 "register_operand" "w"))]
-  "TARGET_SIMD && rtx_equal_p (XEXP (operands[2], 0),
-			       plus_constant (Pmode,
-					      XEXP (operands[0], 0),
-					      GET_MODE_SIZE (<VQ:MODE>mode)))"
+  "TARGET_FLOAT
+   && rtx_equal_p (XEXP (operands[2], 0),
+		   plus_constant (Pmode,
+				  XEXP (operands[0], 0),
+				  GET_MODE_SIZE (<VQ:MODE>mode)))"
   "stp\\t%q1, %q3, %z0"
   [(set_attr "type" "neon_stp_q")]
 )
@@ -248,8 +260,9 @@
 (define_split
   [(set (match_operand:VQMOV 0 "register_operand" "")
-	(match_operand:VQMOV 1 "register_operand" ""))]
-  "TARGET_SIMD && reload_completed
+	(match_operand:VQMOV 1 "register_operand" ""))]
+  "TARGET_FLOAT
+   && reload_completed
    && GP_REGNUM_P (REGNO (operands[0]))
    && GP_REGNUM_P (REGNO (operands[1]))"
   [(const_int 0)]
@@ -261,7 +274,8 @@
 (define_split
   [(set (match_operand:VQMOV 0 "register_operand" "")
 	(match_operand:VQMOV 1 "register_operand" ""))]
-  "TARGET_SIMD && reload_completed
+  "TARGET_FLOAT
+   && reload_completed
    && ((FP_REGNUM_P (REGNO (operands[0])) && GP_REGNUM_P (REGNO (operands[1])))
        || (GP_REGNUM_P (REGNO (operands[0])) && FP_REGNUM_P (REGNO (operands[1]))))"
   [(const_int 0)]
@@ -273,7 +287,7 @@
 (define_expand "@aarch64_split_simd_mov<mode>"
   [(set (match_operand:VQMOV 0)
 	(match_operand:VQMOV 1))]
-  "TARGET_SIMD"
+  "TARGET_FLOAT"
 {
   rtx dst = operands[0];
   rtx src = operands[1];
@@ -306,13 +320,20 @@
 	(vec_select:<VHALF>
 	  (match_operand:VQMOV 1 "register_operand")
 	  (match_operand 2 "ascending_int_parallel")))]
-  "TARGET_SIMD"
+  "TARGET_FLOAT"
+  {
+    if (vect_par_cnst_lo_half (operands[2], <MODE>mode))
+      {
+	emit_move_insn (operands[0], gen_lowpart (<VHALF>mode, operands[1]));
+	DONE;
+      }
+  }
 )

 (define_expand "aarch64_get_low<mode>"
   [(match_operand:<VHALF> 0 "register_operand")
    (match_operand:VQMOV 1 "register_operand")]
-  "TARGET_SIMD"
+  "TARGET_FLOAT"
 {
   rtx lo = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
   emit_insn (gen_aarch64_get_half<mode> (operands[0], operands[1], lo));
@@ -323,7 +344,7 @@
 (define_expand "aarch64_get_high<mode>"
   [(match_operand:<VHALF> 0 "register_operand")
    (match_operand:VQMOV 1 "register_operand")]
-  "TARGET_SIMD"
+  "TARGET_FLOAT"
 {
   rtx hi = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
   emit_insn (gen_aarch64_get_half<mode> (operands[0], operands[1], hi));
@@ -350,15 +371,17 @@
 (define_insn "aarch64_simd_mov_from_<mode>high"
-  [(set (match_operand:<VHALF> 0 "register_operand" "=w,?r")
+  [(set (match_operand:<VHALF> 0 "register_operand" "=w,?r,?r")
 	(vec_select:<VHALF>
-	  (match_operand:VQMOV_NO2E 1 "register_operand" "w,w")
+	  (match_operand:VQMOV_NO2E 1 "register_operand" "w,w,w")
 	  (match_operand:VQMOV_NO2E 2 "vect_par_cnst_hi_half" "")))]
-  "TARGET_SIMD"
+  "TARGET_FLOAT"
   "@
-   dup\\t%d0, %1.d[1]
-   umov\t%0, %1.d[1]"
-  [(set_attr "type" "neon_dup<q>,neon_to_gp<q>")
+   dup\t%d0, %1.d[1]
+   umov\t%0, %1.d[1]
+   fmov\t%0, %1.d[1]"
+  [(set_attr "type" "neon_dup<q>,neon_to_gp<q>,f_mrc")
+   (set_attr "arch" "simd,simd,*")
    (set_attr "length" "4")]
 )
@@ -4322,12 +4345,22 @@
   [(set_attr "type" "neon_to_gp<q>, neon_dup<q>, neon_store1_one_lane<q>")]
 )

+(define_insn "*aarch64_get_high<mode>"
+  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r")
+	(vec_select:<VEL>
+	  (match_operand:VQ_2E 1 "register_operand" "w")
+	  (parallel [(match_operand:SI 2 "immediate_operand")])))]
+  "TARGET_FLOAT && ENDIAN_LANE_N (<nunits>, INTVAL (operands[2])) == 1"
+  "fmov\t%0, %1.d[1]"
+  [(set_attr "type" "f_mrc")]
+)
+
 (define_insn "load_pair_lanes<mode>"
   [(set (match_operand:<VDBL> 0 "register_operand" "=w")
 	(vec_concat:<VDBL>
 	  (match_operand:VDCSIF 1 "memory_operand" "Utq")
 	  (match_operand:VDCSIF 2 "memory_operand" "m")))]
-  "TARGET_SIMD
+  "TARGET_FLOAT
    && aarch64_mergeable_load_pair_p (<VDBL>mode, operands[1], operands[2])"
   "ldr\\t%<single_dtype>0, %1"
   [(set_attr "type" "neon_load1_1reg<dblq>")]
@@ -4357,7 +4390,7 @@
 	(vec_concat:<VDBL>
 	  (match_operand:VDCSIF 1 "register_operand" "w, r")
 	  (match_operand:VDCSIF 2 "register_operand" "w, r")))]
-  "TARGET_SIMD"
+  "TARGET_FLOAT"
   "@
    stp\t%<single_type>1, %<single_type>2, %y0
    stp\t%<single_wx>1, %<single_wx>2, %y0"
@@ -4372,39 +4405,44 @@
 ;; the register alternatives either don't accept or themselves disparage.

 (define_insn "*aarch64_combine_internal<mode>"
-  [(set (match_operand:<VDBL> 0 "aarch64_reg_or_mem_pair_operand" "=w, w, w, Umn, Umn")
+  [(set (match_operand:<VDBL> 0 "aarch64_reg_or_mem_pair_operand" "=w, w, w, w, Umn, Umn")
 	(vec_concat:<VDBL>
-	  (match_operand:VDCSIF 1 "register_operand" "0, 0, 0, ?w, ?r")
-	  (match_operand:VDCSIF 2 "aarch64_simd_nonimmediate_operand" "w, ?r, Utv, w, ?r")))]
-  "TARGET_SIMD
+	  (match_operand:VDCSIF 1 "register_operand" "0, 0, 0, 0, ?w, ?r")
+	  (match_operand:VDCSIF 2 "aarch64_simd_nonimmediate_operand" "w, ?r, ?r, Utv, w, ?r")))]
+  "TARGET_FLOAT
    && !BYTES_BIG_ENDIAN
    && (register_operand (operands[0], <VDBL>mode)
        || register_operand (operands[2], <MODE>mode))"
   "@
    ins\t%0.<single_type>[1], %2.<single_type>[0]
    ins\t%0.<single_type>[1], %<single_wx>2
+   fmov\t%0.d[1], %2
    ld1\t{%0.<single_type>}[1], %2
    stp\t%<single_type>1, %<single_type>2, %y0
    stp\t%<single_wx>1, %<single_wx>2, %y0"
-  [(set_attr "type" "neon_ins<dblq>, neon_from_gp<dblq>, neon_load1_one_lane<dblq>, neon_stp, store_16")]
+  [(set_attr "type" "neon_ins<dblq>, neon_from_gp<dblq>, f_mcr,
+		     neon_load1_one_lane<dblq>, neon_stp, store_16")
+   (set_attr "arch" "simd,simd,*,simd,*,*")]
 )

 (define_insn "*aarch64_combine_internal_be<mode>"
-  [(set (match_operand:<VDBL> 0 "aarch64_reg_or_mem_pair_operand" "=w, w, w, Umn, Umn")
+  [(set (match_operand:<VDBL> 0 "aarch64_reg_or_mem_pair_operand" "=w, w, w, w, Umn, Umn")
 	(vec_concat:<VDBL>
-	  (match_operand:VDCSIF 2 "aarch64_simd_nonimmediate_operand" "w, ?r, Utv, ?w, ?r")
-	  (match_operand:VDCSIF 1 "register_operand" "0, 0, 0, ?w, ?r")))]
-  "TARGET_SIMD
+	  (match_operand:VDCSIF 2 "aarch64_simd_nonimmediate_operand" "w, ?r, ?r, Utv, ?w, ?r")
+	  (match_operand:VDCSIF 1 "register_operand" "0, 0, 0, 0, ?w, ?r")))]
+  "TARGET_FLOAT
    && BYTES_BIG_ENDIAN
    && (register_operand (operands[0], <VDBL>mode)
        || register_operand (operands[2], <MODE>mode))"
   "@
    ins\t%0.<single_type>[1], %2.<single_type>[0]
    ins\t%0.<single_type>[1], %<single_wx>2
+   fmov\t%0.d[1], %2
    ld1\t{%0.<single_type>}[1], %2
    stp\t%<single_type>2, %<single_type>1, %y0
    stp\t%<single_wx>2, %<single_wx>1, %y0"
-  [(set_attr "type" "neon_ins<dblq>, neon_from_gp<dblq>, neon_load1_one_lane<dblq>, neon_stp, store_16")]
+  [(set_attr "type" "neon_ins<dblq>, neon_from_gp<dblq>, f_mcr, neon_load1_one_lane<dblq>, neon_stp, store_16")
+   (set_attr "arch" "simd,simd,*,simd,*,*")]
 )

 ;; In this insn, operand 1 should be low, and operand 2 the high part of the
@@ -4415,13 +4453,12 @@
 	(vec_concat:<VDBL>
 	  (match_operand:VDCSIF 1 "nonimmediate_operand" "w,?r,m")
 	  (match_operand:VDCSIF 2 "aarch64_simd_or_scalar_imm_zero")))]
-  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+  "TARGET_FLOAT && !BYTES_BIG_ENDIAN"
   "@
    fmov\\t%<single_type>0, %<single_type>1
    fmov\t%<single_type>0, %<single_wx>1
    ldr\\t%<single_type>0, %1"
-  [(set_attr "type" "neon_move<q>, neon_from_gp, neon_load1_1reg")
-   (set_attr "arch" "simd,fp,simd")]
+  [(set_attr "type" "neon_move<q>, neon_from_gp, neon_load1_1reg")]
 )

 (define_insn "*aarch64_combinez_be<mode>"
@@ -4429,13 +4466,12 @@
 	(vec_concat:<VDBL>
 	  (match_operand:VDCSIF 2 "aarch64_simd_or_scalar_imm_zero")
 	  (match_operand:VDCSIF 1 "nonimmediate_operand" "w,?r,m")))]
-  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "TARGET_FLOAT && BYTES_BIG_ENDIAN"
   "@
    fmov\\t%<single_type>0, %<single_type>1
    fmov\t%<single_type>0, %<single_wx>1
    ldr\\t%<single_type>0, %1"
-  [(set_attr "type" "neon_move<q>, neon_from_gp, neon_load1_1reg")
-   (set_attr "arch" "simd,fp,simd")]
+  [(set_attr "type" "neon_move<q>, neon_from_gp, neon_load1_1reg")]
 )

 ;; Form a vector whose first half (in array order) comes from operand 1
@@ -4446,7 +4482,7 @@
 	(vec_concat:<VDBL>
 	  (match_operand:VDCSIF 1 "general_operand")
 	  (match_operand:VDCSIF 2 "general_operand")))]
-  "TARGET_SIMD"
+  "TARGET_FLOAT"
 {
   int lo = BYTES_BIG_ENDIAN ? 2 : 1;
   int hi = BYTES_BIG_ENDIAN ? 1 : 2;
@@ -4464,7 +4500,7 @@
     }
   else
     {
-      /* Use *aarch64_combine_general<mode>.  */
+      /* Use *aarch64_combine_internal<mode>.  */
       operands[lo] = force_reg (<MODE>mode, operands[lo]);
       if (!aarch64_simd_nonimmediate_operand (operands[hi], <MODE>mode))
 	{
@@ -4486,7 +4522,7 @@
   [(match_operand:<VDBL> 0 "register_operand")
    (match_operand:VDC 1 "general_operand")
    (match_operand:VDC 2 "general_operand")]
-  "TARGET_SIMD"
+  "TARGET_FLOAT"
 {
   if (BYTES_BIG_ENDIAN)
     std::swap (operands[1], operands[2]);
@@ -7367,7 +7403,7 @@
 (define_expand "mov<mode>"
   [(set (match_operand:VSTRUCT_QD 0 "nonimmediate_operand")
 	(match_operand:VSTRUCT_QD 1 "general_operand"))]
-  "TARGET_SIMD"
+  "TARGET_FLOAT"
 {
   if (can_create_pseudo_p ())
     {
@@ -7379,7 +7415,7 @@
 (define_expand "mov<mode>"
   [(set (match_operand:VSTRUCT 0 "nonimmediate_operand")
 	(match_operand:VSTRUCT 1 "general_operand"))]
-  "TARGET_SIMD"
+  "TARGET_FLOAT"
 {
   if (can_create_pseudo_p ())
     {
@@ -7559,7 +7595,8 @@
 (define_insn "*aarch64_be_mov<mode>"
   [(set (match_operand:VSTRUCT_2D 0 "nonimmediate_operand" "=w,m,w")
 	(match_operand:VSTRUCT_2D 1 "general_operand" " w,w,m"))]
-  "TARGET_SIMD && BYTES_BIG_ENDIAN
+  "TARGET_FLOAT
+   && (!TARGET_SIMD || BYTES_BIG_ENDIAN)
    && (register_operand (operands[0], <MODE>mode)
        || register_operand (operands[1], <MODE>mode))"
   "@
@@ -7573,7 +7610,8 @@
 (define_insn "*aarch64_be_mov<mode>"
   [(set (match_operand:VSTRUCT_2Q 0 "nonimmediate_operand" "=w,m,w")
 	(match_operand:VSTRUCT_2Q 1 "general_operand" " w,w,m"))]
-  "TARGET_SIMD && BYTES_BIG_ENDIAN
+  "TARGET_FLOAT
+   && (!TARGET_SIMD || BYTES_BIG_ENDIAN)
    && (register_operand (operands[0], <MODE>mode)
        || register_operand (operands[1], <MODE>mode))"
   "@
@@ -7581,13 +7619,15 @@
    stp\\t%q1, %R1, %0
    ldp\\t%q0, %R0, %1"
   [(set_attr "type" "multiple,neon_stp_q,neon_ldp_q")
+   (set_attr "arch" "simd,*,*")
    (set_attr "length" "8,4,4")]
 )

 (define_insn "*aarch64_be_movoi"
   [(set (match_operand:OI 0 "nonimmediate_operand" "=w,m,w")
 	(match_operand:OI 1 "general_operand" " w,w,m"))]
-  "TARGET_SIMD && BYTES_BIG_ENDIAN
+  "TARGET_FLOAT
+   && (!TARGET_SIMD || BYTES_BIG_ENDIAN)
    && (register_operand (operands[0], OImode)
        || register_operand (operands[1], OImode))"
   "@
@@ -7595,57 +7635,66 @@
    stp\\t%q1, %R1, %0
    ldp\\t%q0, %R0, %1"
   [(set_attr "type" "multiple,neon_stp_q,neon_ldp_q")
+   (set_attr "arch" "simd,*,*")
    (set_attr "length" "8,4,4")]
 )

 (define_insn "*aarch64_be_mov<mode>"
   [(set (match_operand:VSTRUCT_3QD 0 "nonimmediate_operand" "=w,o,w")
 	(match_operand:VSTRUCT_3QD 1 "general_operand" " w,w,o"))]
-  "TARGET_SIMD && BYTES_BIG_ENDIAN
+  "TARGET_FLOAT
+   && (!TARGET_SIMD || BYTES_BIG_ENDIAN)
    && (register_operand (operands[0], <MODE>mode)
        || register_operand (operands[1], <MODE>mode))"
   "#"
   [(set_attr "type" "multiple")
+   (set_attr "arch" "fp<q>,*,*")
    (set_attr "length" "12,8,8")]
 )

 (define_insn "*aarch64_be_movci"
   [(set (match_operand:CI 0 "nonimmediate_operand" "=w,o,w")
 	(match_operand:CI 1 "general_operand" " w,w,o"))]
-  "TARGET_SIMD && BYTES_BIG_ENDIAN
+  "TARGET_FLOAT
+   && (!TARGET_SIMD || BYTES_BIG_ENDIAN)
    && (register_operand (operands[0], CImode)
        || register_operand (operands[1], CImode))"
   "#"
   [(set_attr "type" "multiple")
-   (set_attr "length" "12,4,4")]
+   (set_attr "arch" "simd,*,*")
+   (set_attr "length" "12,8,8")]
 )

 (define_insn "*aarch64_be_mov<mode>"
   [(set (match_operand:VSTRUCT_4QD 0 "nonimmediate_operand" "=w,o,w")
 	(match_operand:VSTRUCT_4QD 1 "general_operand" " w,w,o"))]
-  "TARGET_SIMD && BYTES_BIG_ENDIAN
+  "TARGET_FLOAT
+   && (!TARGET_SIMD || BYTES_BIG_ENDIAN)
    && (register_operand (operands[0], <MODE>mode)
        || register_operand (operands[1], <MODE>mode))"
   "#"
   [(set_attr "type" "multiple")
+   (set_attr "arch" "fp<q>,*,*")
    (set_attr "length" "16,8,8")]
 )

 (define_insn "*aarch64_be_movxi"
   [(set (match_operand:XI 0 "nonimmediate_operand" "=w,o,w")
 	(match_operand:XI 1 "general_operand" " w,w,o"))]
-  "TARGET_SIMD && BYTES_BIG_ENDIAN
+  "TARGET_FLOAT
+   && (!TARGET_SIMD || BYTES_BIG_ENDIAN)
    && (register_operand (operands[0], XImode)
        || register_operand (operands[1], XImode))"
   "#"
   [(set_attr "type" "multiple")
-   (set_attr "length" "16,4,4")]
+   (set_attr "arch" "simd,*,*")
+   (set_attr "length" "16,8,8")]
 )

 (define_split
   [(set (match_operand:VSTRUCT_2QD 0 "register_operand")
 	(match_operand:VSTRUCT_2QD 1 "register_operand"))]
-  "TARGET_SIMD && reload_completed"
+  "TARGET_FLOAT && reload_completed"
   [(const_int 0)]
 {
   aarch64_simd_emit_reg_reg_move (operands, <VSTRUCT_ELT>mode, 2);
@@ -7655,7 +7704,7 @@
 (define_split
   [(set (match_operand:OI 0 "register_operand")
 	(match_operand:OI 1 "register_operand"))]
-  "TARGET_SIMD && reload_completed"
+  "TARGET_FLOAT && reload_completed"
   [(const_int 0)]
 {
   aarch64_simd_emit_reg_reg_move (operands, TImode, 2);
@@ -7665,7 +7714,7 @@
 (define_split
   [(set (match_operand:VSTRUCT_3QD 0 "nonimmediate_operand")
 	(match_operand:VSTRUCT_3QD 1 "general_operand"))]
-  "TARGET_SIMD && reload_completed"
+  "TARGET_FLOAT && reload_completed"
   [(const_int 0)]
 {
   if (register_operand (operands[0], <MODE>mode)
@@ -7674,7 +7723,7 @@
       aarch64_simd_emit_reg_reg_move (operands, <VSTRUCT_ELT>mode, 3);
       DONE;
     }
-  else if (BYTES_BIG_ENDIAN)
+  else if (!TARGET_SIMD || BYTES_BIG_ENDIAN)
     {
       int elt_size = GET_MODE_SIZE (<MODE>mode).to_constant () / <nregs>;
       machine_mode pair_mode = elt_size == 16 ? V2x16QImode : V2x8QImode;
@@ -7701,7 +7750,7 @@
 (define_split
   [(set (match_operand:CI 0 "nonimmediate_operand")
 	(match_operand:CI 1 "general_operand"))]
-  "TARGET_SIMD && reload_completed"
+  "TARGET_FLOAT && reload_completed"
   [(const_int 0)]
 {
   if (register_operand (operands[0], CImode)
@@ -7710,7 +7759,7 @@
       aarch64_simd_emit_reg_reg_move (operands, TImode, 3);
       DONE;
     }
-  else if (BYTES_BIG_ENDIAN)
+  else if (!TARGET_SIMD || BYTES_BIG_ENDIAN)
     {
       emit_move_insn (simplify_gen_subreg (OImode, operands[0], CImode, 0),
 		      simplify_gen_subreg (OImode, operands[1], CImode, 0));
@@ -7729,7 +7778,7 @@
 (define_split
   [(set (match_operand:VSTRUCT_4QD 0 "nonimmediate_operand")
 	(match_operand:VSTRUCT_4QD 1 "general_operand"))]
-  "TARGET_SIMD && reload_completed"
+  "TARGET_FLOAT && reload_completed"
   [(const_int 0)]
 {
   if (register_operand (operands[0], <MODE>mode)
@@ -7738,7 +7787,7 @@
       aarch64_simd_emit_reg_reg_move (operands, <VSTRUCT_ELT>mode, 4);
       DONE;
     }
-  else if (BYTES_BIG_ENDIAN)
+  else if (!TARGET_SIMD || BYTES_BIG_ENDIAN)
     {
      int elt_size = GET_MODE_SIZE (<MODE>mode).to_constant () / <nregs>;
      machine_mode pair_mode = elt_size == 16 ? V2x16QImode : V2x8QImode;
@@ -7759,7 +7808,7 @@
 (define_split
   [(set (match_operand:XI 0 "nonimmediate_operand")
 	(match_operand:XI 1 "general_operand"))]
-  "TARGET_SIMD && reload_completed"
+  "TARGET_FLOAT && reload_completed"
   [(const_int 0)]
 {
   if (register_operand (operands[0], XImode)
@@ -7768,7 +7817,7 @@
       aarch64_simd_emit_reg_reg_move (operands, TImode, 4);
       DONE;
     }
-  else if (BYTES_BIG_ENDIAN)
+  else if (!TARGET_SIMD || BYTES_BIG_ENDIAN)
     {
       emit_move_insn (simplify_gen_subreg (OImode, operands[0], XImode, 0),
 		      simplify_gen_subreg (OImode, operands[1], XImode, 0));
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 02210ed13..b4b646fa0 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -3748,7 +3748,7 @@ aarch64_classify_vector_mode (machine_mode mode)
     case E_OImode:
     case E_CImode:
     case E_XImode:
-      return TARGET_SIMD ? VEC_ADVSIMD | VEC_STRUCT : 0;
+      return TARGET_FLOAT ? VEC_ADVSIMD | VEC_STRUCT : 0;

     /* Structures of 64-bit Advanced SIMD vectors.  */
     case E_V2x8QImode:
@@ -3775,7 +3775,7 @@
     case E_V4x4HFmode:
     case E_V4x2SFmode:
     case E_V4x1DFmode:
-      return TARGET_SIMD ? VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL : 0;
+      return TARGET_FLOAT ? VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL : 0;

     /* Structures of 128-bit Advanced SIMD vectors.  */
     case E_V2x16QImode:
@@ -3802,7 +3802,7 @@
     case E_V4x8HFmode:
     case E_V4x4SFmode:
     case E_V4x2DFmode:
-      return TARGET_SIMD ? VEC_ADVSIMD | VEC_STRUCT : 0;
+      return TARGET_FLOAT ? VEC_ADVSIMD | VEC_STRUCT : 0;

     /* 64-bit Advanced SIMD vectors.  */
     case E_V8QImode:
@@ -3822,7 +3822,7 @@
     case E_V8BFmode:
     case E_V4SFmode:
     case E_V2DFmode:
-      return TARGET_SIMD ? VEC_ADVSIMD : 0;
+      return TARGET_FLOAT ? VEC_ADVSIMD : 0;

     default:
       return 0;
@@ -4110,7 +4110,8 @@ aarch64_vectorize_related_mode (machine_mode vector_mode,
     }

   /* Prefer to use 1 128-bit vector instead of 2 64-bit vectors.  */
-  if ((vec_flags & VEC_ADVSIMD)
+  if (TARGET_SIMD
+      && (vec_flags & VEC_ADVSIMD)
       && known_eq (nunits, 0U)
       && known_eq (GET_MODE_BITSIZE (vector_mode), 64U)
       && maybe_ge (GET_MODE_BITSIZE (element_mode)
@@ -4208,7 +4209,7 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)

   if (GP_REGNUM_P (regno))
     {
-      if (vec_flags & VEC_ANY_SVE)
+      if (vec_flags & (VEC_ANY_SVE | VEC_STRUCT))
 	return false;
       if (known_le (GET_MODE_SIZE (mode), 8))
 	return true;
@@ -10884,7 +10885,8 @@ aarch64_classify_address (struct aarch64_address_info *info,
 	    || mode == TImode
 	    || mode == TFmode
 	    || mode == TDmode
-	    || (BYTES_BIG_ENDIAN && advsimd_struct_p));
+	    || ((!TARGET_SIMD || BYTES_BIG_ENDIAN)
+		&& advsimd_struct_p));
   /* If we are dealing with ADDR_QUERY_LDP_STP_N that means the incoming mode
      corresponds to the actual size of the memory being loaded/stored and the
      mode of the corresponding addressing mode is half of that.  */
@@ -10914,6 +10916,7 @@
   /* On LE, for AdvSIMD, don't support anything other than POST_INC or
      REG addressing.  */
   if (advsimd_struct_p
+      && TARGET_SIMD
       && !BYTES_BIG_ENDIAN
       && (code != POST_INC && code != REG))
     return false;
@@ -10976,7 +10979,7 @@
 		&& aarch64_offset_7bit_signed_scaled_p (DImode, offset + 48));

 	  /* A 7bit offset check because OImode will emit a ldp/stp
-	     instruction (only big endian will get here).
+	     instruction (only !TARGET_SIMD or big endian will get here).
 	     For ldp/stp instructions, the offset is scaled for the size of a
 	     single element of the pair.  */
 	  if (aarch64_advsimd_partial_struct_mode_p (mode)
@@ -10987,7 +10990,8 @@
 	    return aarch64_offset_7bit_signed_scaled_p (TImode, offset);

 	  /* Three 9/12 bit offsets checks because CImode will emit three
-	     ldr/str instructions (only big endian will get here).  */
+	     ldr/str instructions (only !TARGET_SIMD or big endian will
+	     get here).  */
 	  if (aarch64_advsimd_partial_struct_mode_p (mode)
 	      && known_eq (GET_MODE_SIZE (mode), 24))
 	    return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
@@ -12716,18 +12720,16 @@ aarch64_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x,
   /* Use aarch64_sve_reload_mem for SVE memory reloads that cannot use
      LDR and STR.  See the comment at the head of aarch64-sve.md for
      more details about the big-endian handling.  */
+  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
   if (reg_class_subset_p (rclass, FP_REGS)
       && !((REG_P (x) && HARD_REGISTER_P (x))
 	   || aarch64_simd_valid_immediate (x, NULL))
-      && mode != VNx16QImode)
+      && mode != VNx16QImode
+      && (vec_flags & VEC_SVE_DATA)
+      && ((vec_flags & VEC_PARTIAL) || BYTES_BIG_ENDIAN))
     {
-      unsigned int vec_flags = aarch64_classify_vector_mode (mode);
-      if ((vec_flags & VEC_SVE_DATA)
-	  && ((vec_flags & VEC_PARTIAL) || BYTES_BIG_ENDIAN))
-	{
-	  sri->icode = CODE_FOR_aarch64_sve_reload_mem;
-	  return NO_REGS;
-	}
+      sri->icode = CODE_FOR_aarch64_sve_reload_mem;
+      return NO_REGS;
     }

   /* If we have to disable direct literal pool loads and stores because the
@@ -12744,9 +12746,13 @@
   /* Without the TARGET_SIMD instructions we cannot move a Q register
      to a Q register directly.  We need a scratch.  */
   if (REG_P (x)
-      && (mode == TFmode || mode == TImode || mode == TDmode)
+      && (mode == TFmode
+	  || mode == TImode
+	  || mode == TDmode
+	  || (vec_flags == VEC_ADVSIMD && known_eq (GET_MODE_SIZE (mode), 16)))
       && mode == GET_MODE (x)
-      && FP_REGNUM_P (REGNO (x)) && !TARGET_SIMD
+      && !TARGET_SIMD
+      && FP_REGNUM_P (REGNO (x))
       && reg_class_subset_p (rclass, FP_REGS))
     {
       sri->icode = code_for_aarch64_reload_mov (mode);
@@ -12768,6 +12774,28 @@
   return NO_REGS;
 }

+/* Implement TARGET_SECONDARY_MEMORY_NEEDED.  */
+
+static bool
+aarch64_secondary_memory_needed (machine_mode mode, reg_class_t class1,
+				 reg_class_t class2)
+{
+  if (!TARGET_SIMD
+      && reg_classes_intersect_p (class1, FP_REGS)
+      && reg_classes_intersect_p (class2, FP_REGS))
+    {
+      /* We can't do a 128-bit FPR-to-FPR move without TARGET_SIMD,
+	 so we can't easily split a move involving tuples of 128-bit
+	 vectors.  Force the copy through memory instead.
+
+	 (Tuples of 64-bit vectors are fine.)  */
+      unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+      if (vec_flags == (VEC_ADVSIMD | VEC_STRUCT))
+	return true;
+    }
+  return false;
+}
+
 static bool
 aarch64_can_eliminate (const int from ATTRIBUTE_UNUSED, const int to)
 {
@@ -13311,7 +13339,7 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_code code, int outer, bool speed)
   if (VECTOR_MODE_P (mode))
     {
       unsigned int vec_flags = aarch64_classify_vector_mode (mode);
-      if (vec_flags & VEC_ADVSIMD)
+      if (TARGET_SIMD && (vec_flags & VEC_ADVSIMD))
 	{
 	  /* The select-operand-high-half versions of the instruction have the
 	     same cost as the three vector version - don't add the costs of the
@@ -14257,7 +14285,7 @@ cost_minus:
 	{
 	  /* SUBL2 and SUBW2.  */
 	  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
-	  if (vec_flags & VEC_ADVSIMD)
+	  if (TARGET_SIMD && (vec_flags & VEC_ADVSIMD))
 	    {
 	      /* The select-operand-high-half versions of the sub instruction
 		 have the same cost as the regular three vector version -
@@ -14359,7 +14387,7 @@ cost_plus:
 	{
 	  /* ADDL2 and ADDW2.  */
 	  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
-	  if (vec_flags & VEC_ADVSIMD)
+	  if (TARGET_SIMD && (vec_flags & VEC_ADVSIMD))
 	    {
 	      /* The select-operand-high-half versions of the add instruction
 		 have the same cost as the regular three vector version -
@@ -15284,7 +15312,9 @@ aarch64_register_move_cost (machine_mode mode,
     return aarch64_register_move_cost (mode, from, GENERAL_REGS)
 	   + aarch64_register_move_cost (mode, GENERAL_REGS, to);

-  if (known_eq (GET_MODE_SIZE (mode), 16))
+  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+  if (vec_flags != (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL)
+      && known_eq (GET_MODE_SIZE (mode), 16))
     {
       /* 128-bit operations on general registers require 2 instructions.  */
       if (from == GENERAL_REGS && to == GENERAL_REGS)
@@ -15312,6 +15342,16 @@
   else if (to == GENERAL_REGS)
     return regmove_cost->FP2GP;

+  if (!TARGET_SIMD && vec_flags == (VEC_ADVSIMD | VEC_STRUCT))
+    {
+      /* Needs a round-trip through memory, which can use LDP/STP for pairs.
+	 The cost must be greater than 2 units to indicate that direct
+	 moves aren't possible.  */
+      auto per_vector = (aarch64_tune_params.memmov_cost.load_fp
+			 + aarch64_tune_params.memmov_cost.store_fp);
+      return MIN (CEIL (per_vector, 2), 4);
+    }
+
   return regmove_cost->FP2FP;
 }
@@ -21504,6 +21544,9 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
   if (vec_flags == 0 || vec_flags == (VEC_ADVSIMD | VEC_STRUCT))
     return false;

+  if ((vec_flags & VEC_ADVSIMD) && !TARGET_SIMD)
+    return false;
+
   if (vec_flags & VEC_SVE_PRED)
     return aarch64_sve_pred_valid_immediate (op, info);
@@ -24430,7 +24473,7 @@ aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
       std::swap (d->op0, d->op1);
     }

-  if ((d->vec_flags == VEC_ADVSIMD
+  if (((d->vec_flags == VEC_ADVSIMD && TARGET_SIMD)
        || d->vec_flags == VEC_SVE_DATA
        || d->vec_flags == (VEC_SVE_DATA | VEC_PARTIAL)
        || d->vec_flags == VEC_SVE_PRED)
@@ -27977,6 +28020,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_SECONDARY_RELOAD
 #define TARGET_SECONDARY_RELOAD aarch64_secondary_reload

+#undef TARGET_SECONDARY_MEMORY_NEEDED
+#define TARGET_SECONDARY_MEMORY_NEEDED aarch64_secondary_memory_needed
+
 #undef TARGET_SHIFT_TRUNCATION_MASK
 #define TARGET_SHIFT_TRUNCATION_MASK aarch64_shift_truncation_mask
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8757a962f..c0cc91756 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -374,8 +374,11 @@
 ;; Attributes of the architecture required to support the instruction (or
 ;; alternative).  This attribute is used to compute attribute "enabled", use type
 ;; "any" to enable an alternative in all cases.
+;;
+;; As a convenience, "fp_q" means "fp" + the ability to move between
+;; Q registers and is equivalent to "simd".

-(define_enum "arches" [ any rcpc8_4 fp simd sve fp16])
+(define_enum "arches" [ any rcpc8_4 fp fp_q simd sve fp16])

 (define_enum_attr "arch" "arches" (const_string "any"))
@@ -403,7 +406,7 @@
 	(and (eq_attr "arch" "fp")
 	     (match_test "TARGET_FLOAT"))

-	(and (eq_attr "arch" "simd")
+	(and (eq_attr "arch" "fp_q, simd")
 	     (match_test "TARGET_SIMD"))

 	(and (eq_attr "arch" "fp16")
@@ -6768,8 +6771,8 @@
 )

 (define_expand "@aarch64_reload_mov<mode>"
-  [(set (match_operand:TX 0 "register_operand" "=w")
-	(match_operand:TX 1 "register_operand" "w"))
+  [(set (match_operand:VTX 0 "register_operand" "=w")
+	(match_operand:VTX 1 "register_operand" "w"))
    (clobber (match_operand:DI 2 "register_operand" "=&r"))
   ]
   "TARGET_FLOAT"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index d0cd1b788..a8a39b65a 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -313,6 +313,8 @@
 (define_mode_iterator TX [TI TF TD])

+(define_mode_iterator VTX [TI TF TD V16QI V8HI V4SI V2DI V8HF V4SF V2DF V8BF])
+
 ;; Advanced SIMD opaque structure modes.
 (define_mode_iterator VSTRUCT [OI CI XI])
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_20.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_20.c
new file mode 100644
index 000000000..7e705e119
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_20.c
@@ -0,0 +1,7 @@
+/* { dg-options "-O2" } */
+
+#pragma GCC target "+nosimd+fp"
+
+#include "ldp_stp_6.c"
+
+/* { dg-final { scan-assembler "stp\td\[0-9\]+, d\[0-9\]+, \\\[x\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_21.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_21.c
new file mode 100644
index 000000000..462e3c9aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_21.c
@@ -0,0 +1,7 @@
+/* { dg-options "-O2" } */
+
+#pragma GCC target "+nosimd+fp"
+
+#include "ldp_stp_8.c"
+
+/* { dg-final { scan-assembler-times "ldp\td\[0-9\], d\[0-9\]+, \\\[x\[0-9\]+\\\]" 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_22.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_22.c
new file mode 100644
index 000000000..283c56dd2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_22.c
@@ -0,0 +1,13 @@
+/* { dg-options "-O2" } */
+
+#pragma GCC target "+nosimd+fp"
+
+void
+foo (__Float32x4_t *ptr)
+{
+  ptr[0] = ptr[2];
+  ptr[1] = ptr[3];
+}
+
+/* { dg-final { scan-assembler {\tldp\tq[0-9]+, q[0-9]+} } } */
+/* { dg-final { scan-assembler {\tstp\tq[0-9]+, q[0-9]+} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_23.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_23.c
new file mode 100644
index 000000000..b14976cfe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_23.c
@@ -0,0 +1,16 @@
+/* { dg-options "-O2" } */
+
+#pragma GCC target "+nosimd+fp"
+
+void
+foo (char *char_ptr)
+{
+  __Float64x2_t *ptr = (__Float64x2_t *)(char_ptr + 1);
+  asm volatile ("" ::
+		"w" (ptr[1]),
+		"w" (ptr[2]),
+		"w" (ptr[3]),
+		"w" (ptr[4]));
+}
+
+/* { dg-final { scan-assembler-times {\tldp\tq[0-9]+, q[0-9]+} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_24.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_24.c
new file mode 100644
index 000000000..a99426eb2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_24.c
@@ -0,0 +1,16 @@
+/* { dg-options "-O2" } */
+
+#pragma GCC target "+nosimd+fp"
+
+void
+foo (char *char_ptr)
+{
+  __Float64x2_t *ptr = (__Float64x2_t *)(char_ptr + 1);
+  asm volatile ("" :
+		"=w" (ptr[1]),
+		"=w" (ptr[2]),
+		"=w" (ptr[3]),
+		"=w" (ptr[4]));
+}
+
+/* { dg-final { scan-assembler-times {\tstp\tq[0-9]+, q[0-9]+} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_1.c b/gcc/testsuite/gcc.target/aarch64/movv16qi_1.c
index 8a6afb13b..cac4241b0 100644
--- a/gcc/testsuite/gcc.target/aarch64/movv16qi_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_1.c
@@ -80,3 +80,24 @@ fpr_to_gpr (v16qi q0)
   x0 = q0;
   asm volatile ("" :: "r" (x0));
 }
+
+/*
+** gpr_to_gpr:
+** (
+**	mov	x0, x2
+**	mov	x1, x3
+** |
+**	mov	x1, x3
+**	mov	x0, x2
+** )
+**	ret
+*/
+void
+gpr_to_gpr ()
+{
+  register v16qi x0 asm ("x0");
+  register v16qi x2 asm ("x2");
+  asm volatile ("" : "=r" (x2));
+  x0 = x2;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_2.c b/gcc/testsuite/gcc.target/aarch64/movv16qi_2.c
new file mode 100644
index 000000000..08a0a19b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_2.c
@@ -0,0 +1,27 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+
+#pragma GCC target "+nosimd+fp"
+
+#define TEST_GENERAL(TYPE) \
+  TYPE mov_##TYPE (TYPE a, TYPE b) { return b; } \
+  TYPE zero_##TYPE () { return (TYPE) {}; } \
+  TYPE load_##TYPE (TYPE *ptr) { return *ptr; } \
+  void store_##TYPE (TYPE *ptr, TYPE a) { *ptr = a; }
+
+TEST_GENERAL (__Int8x16_t)
+TEST_GENERAL (__Int16x8_t)
+TEST_GENERAL (__Int32x4_t)
+TEST_GENERAL (__Int64x2_t)
+TEST_GENERAL (__Bfloat16x8_t)
+TEST_GENERAL (__Float16x8_t)
+TEST_GENERAL (__Float32x4_t)
+TEST_GENERAL (__Float64x2_t)
+
+__Int8x16_t const_s8x8 () { return (__Int8x16_t) { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }; }
+__Int16x8_t const_s16x4 () { return (__Int16x8_t) { 1, 0, 1, 0, 1, 0, 1, 0 }; }
+__Int32x4_t const_s32x2 () { return (__Int32x4_t) { 1, 2, 3, 4 }; }
+__Int64x2_t const_s64x1 () { return (__Int64x2_t) { 100, 100 }; }
+__Float16x8_t const_f16x4 () { return (__Float16x8_t) { 2, 2, 2, 2, 2, 2, 2, 2 }; }
+__Float32x4_t const_f32x2 () { return (__Float32x4_t) { 1, 2, 1, 2 }; }
+__Float64x2_t const_f64x1 () { return (__Float64x2_t) { 32, 32 }; }
diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_3.c b/gcc/testsuite/gcc.target/aarch64/movv16qi_3.c
new file mode 100644
index 000000000..d43b994c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_3.c
@@ -0,0 +1,30 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nosimd+fp"
+
+#define TEST_VECTOR(TYPE) \
+  TYPE \
+  test_##TYPE (void) \
+  { \
+    typedef TYPE v __attribute__((aligned(1))); \
+    register v *ptr asm ("x0"); \
+    asm volatile ("" : "=r" (ptr)); \
+    return *ptr; \
+  }
+
+TEST_VECTOR (__Int8x16_t)
+TEST_VECTOR (__Int16x8_t)
+TEST_VECTOR (__Int32x4_t)
+TEST_VECTOR (__Int64x2_t)
+TEST_VECTOR (__Bfloat16x8_t)
+TEST_VECTOR (__Float16x8_t)
+TEST_VECTOR (__Float32x4_t)
+TEST_VECTOR (__Float64x2_t)
+
+/*
+** test___Int8x16_t:
+**	ldr	q0, \[x0\]
+**	ret
+*/
diff --git a/gcc/testsuite/gcc.target/aarch64/movv2di_1.c b/gcc/testsuite/gcc.target/aarch64/movv2di_1.c
new file mode 100644
index 000000000..e3b55fd52
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv2di_1.c
@@ -0,0 +1,103 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nothing+nosimd+fp"
+
+typedef long long v2di __attribute__((vector_size(16)));
+
+/*
+** fpr_to_fpr:
+**	sub	sp, sp, #16
+**	str	q1, \[sp\]
+**	ldr	q0, \[sp\]
+**	add	sp, sp, #?16
+**	ret
+*/
+v2di
+fpr_to_fpr (v2di q0, v2di q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:	{ target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr:	{ target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+v2di
+gpr_to_fpr ()
+{
+  register v2di x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+v2di
+zero_to_fpr ()
+{
+  return (v2di) {};
+}
+
+/*
+** fpr_to_gpr:	{ target aarch64_little_endian }
+** (
+**	fmov	x0, d0
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	fmov	x0, d0
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr:	{ target aarch64_big_endian }
+** (
+**	fmov	x1, d0
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	fmov	x1, d0
+** )
+**	ret
+*/
+void
+fpr_to_gpr (v2di q0)
+{
+  register v2di x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
+
+/*
+** gpr_to_gpr:
+** (
+**	mov	x0, x2
+**	mov	x1, x3
+** |
+**	mov	x1, x3
+**	mov	x0, x2
+** )
+**	ret
+*/
+void
+gpr_to_gpr ()
+{
+  register v2di x0 asm ("x0");
+  register v2di x2 asm ("x2");
+  asm volatile ("" : "=r" (x2));
+  x0 = x2;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movv2x16qi_1.c b/gcc/testsuite/gcc.target/aarch64/movv2x16qi_1.c
new file mode 100644
index 000000000..90e3b426d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv2x16qi_1.c
@@ -0,0 +1,40 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC aarch64 "arm_neon.h"
+
+#pragma GCC target "+nosimd+fp"
+
+#define TEST_VECTOR(TYPE) \
+  TYPE mov_##TYPE (TYPE a, TYPE b) { return b; } \
+  TYPE load_##TYPE (TYPE *ptr) { return *ptr; } \
+  void store_##TYPE (TYPE *ptr, TYPE a) { *ptr = a; }
+
+TEST_VECTOR (int8x16x2_t)
+TEST_VECTOR (int16x8x2_t)
+TEST_VECTOR (int32x4x2_t)
+TEST_VECTOR (int64x2x2_t)
+TEST_VECTOR (float16x8x2_t)
+TEST_VECTOR (bfloat16x8x2_t)
+TEST_VECTOR (float32x4x2_t)
+TEST_VECTOR (float64x2x2_t)
+
+/*
+** mov_int8x16x2_t:
+**	sub	sp, sp, #32
+**	stp	q2, q3, \[sp\]
+**	ldp	q0, q1, \[sp\]
+**	add	sp, sp, #?32
+**	ret
+*/
+/*
+** load_int8x16x2_t:
+**	ldp	q0, q1, \[x0\]
+**	ret
+*/
+/*
+** store_int8x16x2_t: { xfail *-*-* }
+**	stp	q0, q1, \[x0\]
+**	ret
+*/
diff --git a/gcc/testsuite/gcc.target/aarch64/movv2x8qi_1.c b/gcc/testsuite/gcc.target/aarch64/movv2x8qi_1.c
new file mode 100644
index 000000000..883a0ea71
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv2x8qi_1.c
@@ -0,0 +1,38 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC aarch64 "arm_neon.h"
+
+#pragma GCC target "+nosimd+fp"
+
+#define TEST_VECTOR(TYPE) \
+  TYPE mov_##TYPE (TYPE a, TYPE b) { return b; } \
+  TYPE load_##TYPE (TYPE *ptr) { return *ptr; } \
+  void store_##TYPE (TYPE *ptr, TYPE a) { *ptr = a; }
+
+TEST_VECTOR (int8x8x2_t)
+TEST_VECTOR (int16x4x2_t)
+TEST_VECTOR (int32x2x2_t)
+TEST_VECTOR (int64x1x2_t)
+TEST_VECTOR (float16x4x2_t)
+TEST_VECTOR (bfloat16x4x2_t)
+TEST_VECTOR (float32x2x2_t)
+TEST_VECTOR (float64x1x2_t)
+
+/*
+** mov_int8x8x2_t:
+**	fmov	d0, d2
+**	fmov	d1, d3
+**	ret
+*/
+/*
+** load_int8x8x2_t:
+**	ldp	d0, d1, \[x0\]
+**	ret
+*/
+/*
+** store_int8x8x2_t:
+**	stp	d0, d1, \[x0\]
+**	ret
+*/
diff --git a/gcc/testsuite/gcc.target/aarch64/movv3x16qi_1.c b/gcc/testsuite/gcc.target/aarch64/movv3x16qi_1.c
new file mode 100644
index 000000000..070a596bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv3x16qi_1.c
@@ -0,0 +1,44 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC aarch64 "arm_neon.h"
+
+#pragma GCC target "+nosimd+fp"
+
+#define TEST_VECTOR(TYPE) \
+  TYPE mov_##TYPE (TYPE a, TYPE b) { return b; } \
+  TYPE load_##TYPE (TYPE *ptr) { return *ptr; } \
+  void store_##TYPE (TYPE *ptr, TYPE a) { *ptr = a; }
+
+TEST_VECTOR (int8x16x3_t)
+TEST_VECTOR (int16x8x3_t)
+TEST_VECTOR (int32x4x3_t)
+TEST_VECTOR (int64x2x3_t)
+TEST_VECTOR (float16x8x3_t)
+TEST_VECTOR (bfloat16x8x3_t)
+TEST_VECTOR (float32x4x3_t)
+TEST_VECTOR (float64x2x3_t)
+
+/*
+** mov_int8x16x3_t:
+**	sub	sp, sp, #48
+**	stp	q3, q4, \[sp\]
+**	str	q5, \[sp, #?32\]
+**	ldp	q0, q1, \[sp\]
+**	ldr	q2, \[sp, #?32\]
+**	add	sp, sp, #?48
+**	ret
+*/
+/*
+** load_int8x16x3_t:
+**	ldp	q0, q1, \[x0\]
+**	ldr	q2, \[x0, #?32\]
+**	ret
+*/
+/*
+** store_int8x16x3_t: { xfail *-*-* }
+**	stp	q0, q1, \[x0\]
+**	str	q2, \[x0, #?32\]
+**	ret
+*/
diff --git a/gcc/testsuite/gcc.target/aarch64/movv3x8qi_1.c b/gcc/testsuite/gcc.target/aarch64/movv3x8qi_1.c
new file mode 100644
index 000000000..4b873d749
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv3x8qi_1.c
@@ -0,0 +1,41 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC aarch64 "arm_neon.h"
+
+#pragma GCC target "+nosimd+fp"
+
+#define TEST_VECTOR(TYPE) \
+  TYPE mov_##TYPE (TYPE a, TYPE b) { return b; } \
+  TYPE load_##TYPE (TYPE *ptr) { return *ptr; } \
+  void store_##TYPE (TYPE *ptr, TYPE a) { *ptr = a; }
+
+TEST_VECTOR (int8x8x3_t)
+TEST_VECTOR (int16x4x3_t)
+TEST_VECTOR (int32x2x3_t)
+TEST_VECTOR (int64x1x3_t)
+TEST_VECTOR (float16x4x3_t)
+TEST_VECTOR (bfloat16x4x3_t)
+TEST_VECTOR (float32x2x3_t)
+TEST_VECTOR (float64x1x3_t)
+
+/*
+** mov_int8x8x3_t:
+**	fmov	d0, d3
+**	fmov	d1, d4
+**	fmov	d2, d5
+**	ret
+*/
+/*
+** load_int8x8x3_t:
+**	ldp	d0, d1, \[x0\]
+**	ldr	d2, \[x0, #?16\]
+**	ret
+*/
+/*
+** store_int8x8x3_t:
+**	stp	d0, d1, \[x0\]
+**	str	d2, \[x0, #?16\]
+**	ret
+*/
diff --git a/gcc/testsuite/gcc.target/aarch64/movv4x16qi_1.c b/gcc/testsuite/gcc.target/aarch64/movv4x16qi_1.c
new file mode 100644
index 000000000..6a517b4fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv4x16qi_1.c
@@ -0,0 +1,44 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC aarch64 "arm_neon.h"
+
+#pragma GCC target "+nosimd+fp"
+
+#define TEST_VECTOR(TYPE) \
+  TYPE mov_##TYPE (TYPE a, TYPE b) { return b; } \
+  TYPE load_##TYPE (TYPE *ptr) { return *ptr; } \
+  void store_##TYPE (TYPE *ptr, TYPE a) { *ptr = a; }
+
+TEST_VECTOR (int8x16x4_t)
+TEST_VECTOR (int16x8x4_t)
+TEST_VECTOR (int32x4x4_t)
+TEST_VECTOR (int64x2x4_t)
+TEST_VECTOR (float16x8x4_t)
+TEST_VECTOR (bfloat16x8x4_t)
+TEST_VECTOR (float32x4x4_t)
+TEST_VECTOR (float64x2x4_t)
+
+/*
+** mov_int8x16x4_t:
+**	sub	sp, sp, #64
+**	stp	q4, q5, \[sp\]
+**	stp	q6, q7, \[sp, #?32\]
+**	ldp	q0, q1, \[sp\]
+**	ldp	q2, q3, \[sp, #?32\]
+**	add	sp, sp, #?64
+**	ret
+*/
+/*
+** load_int8x16x4_t:
+**	ldp	q0, q1, \[x0\]
+**	ldp	q2, q3, \[x0, #?32\]
+**	ret
+*/
+/*
+** store_int8x16x4_t: { xfail *-*-* }
+**	stp	q0, q1, \[x0\]
+**	stp	q2, q3, \[x0, #?32\]
+**	ret
+*/
diff --git a/gcc/testsuite/gcc.target/aarch64/movv4x8qi_1.c b/gcc/testsuite/gcc.target/aarch64/movv4x8qi_1.c
new file mode 100644
index 000000000..f096be4a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv4x8qi_1.c
@@ -0,0 +1,42 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC aarch64 "arm_neon.h"
+
+#pragma GCC target "+nosimd+fp"
+
+#define TEST_VECTOR(TYPE) \
+  TYPE mov_##TYPE (TYPE a, TYPE b) { return b; } \
+  TYPE load_##TYPE (TYPE *ptr) { return *ptr; } \
+  void store_##TYPE (TYPE *ptr, TYPE a) { *ptr = a; }
+
+TEST_VECTOR (int8x8x4_t)
+TEST_VECTOR (int16x4x4_t)
+TEST_VECTOR (int32x2x4_t)
+TEST_VECTOR (int64x1x4_t)
+TEST_VECTOR (float16x4x4_t)
+TEST_VECTOR (bfloat16x4x4_t)
+TEST_VECTOR (float32x2x4_t)
+TEST_VECTOR (float64x1x4_t)
+
+/*
+** mov_int8x8x4_t:
+**	fmov	d0, d4
+**	fmov	d1, d5
+**	fmov	d2, d6
+**	fmov	d3, d7
+**	ret
+*/
+/*
+** load_int8x8x4_t:
+**	ldp	d0, d1, \[x0\]
+**	ldp	d2, d3, \[x0, #?16\]
+**	ret
+*/
+/*
+** store_int8x8x4_t:
+**	stp	d0, d1, \[x0\]
+**	stp	d2, d3, \[x0, #?16\]
+**	ret
+*/
diff --git a/gcc/testsuite/gcc.target/aarch64/movv8qi_1.c b/gcc/testsuite/gcc.target/aarch64/movv8qi_1.c
index 4c97e6fbc..d2b5d8025 100644
--- a/gcc/testsuite/gcc.target/aarch64/movv8qi_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/movv8qi_1.c
@@ -53,3 +53,18 @@ fpr_to_gpr (v8qi q0)
   x0 = q0;
   asm volatile ("" :: "r" (x0));
 }
+
+/*
+** gpr_to_gpr:
+**	mov	x0, x1
+**	ret
+*/
+void
+gpr_to_gpr ()
+{
+  register v8qi x0 asm ("x0");
+  register v8qi x1 asm ("x1");
+  asm volatile ("" : "=r" (x1));
+  x0 = x1;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movv8qi_2.c b/gcc/testsuite/gcc.target/aarch64/movv8qi_2.c
new file mode 100644
index 000000000..0d8576ffe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv8qi_2.c
@@ -0,0 +1,27 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+
+#pragma GCC target "+nosimd+fp"
+
+#define TEST_GENERAL(TYPE) \
+  TYPE mov_##TYPE (TYPE a, TYPE b) { return b; } \
+  TYPE zero_##TYPE () { return (TYPE) {}; } \
+  TYPE load_##TYPE (TYPE *ptr) { return *ptr; } \
+  void store_##TYPE (TYPE *ptr, TYPE a) { *ptr = a; }
+
+TEST_GENERAL (__Int8x8_t)
+TEST_GENERAL (__Int16x4_t)
+TEST_GENERAL (__Int32x2_t)
+TEST_GENERAL (__Int64x1_t)
+TEST_GENERAL (__Bfloat16x4_t)
+TEST_GENERAL (__Float16x4_t)
+TEST_GENERAL (__Float32x2_t)
+TEST_GENERAL (__Float64x1_t)
+
+__Int8x8_t const_s8x8 () { return (__Int8x8_t) { 1, 1, 1, 1, 1, 1, 1, 1 }; }
+__Int16x4_t const_s16x4 () { return (__Int16x4_t) { 1, 0, 1, 0 }; }
+__Int32x2_t const_s32x2 () { return (__Int32x2_t) { 1, 2 }; }
+__Int64x1_t const_s64x1 () { return (__Int64x1_t) { 100 }; }
+__Float16x4_t const_f16x4 () { return (__Float16x4_t) { 2, 2, 2, 2 }; }
+__Float32x2_t const_f32x2 () { return (__Float32x2_t) { 1, 2 }; }
+__Float64x1_t const_f64x1 () { return (__Float64x1_t) { 32 }; }
diff --git a/gcc/testsuite/gcc.target/aarch64/movv8qi_3.c b/gcc/testsuite/gcc.target/aarch64/movv8qi_3.c
new file mode 100644
index 000000000..1caa1a788
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv8qi_3.c
@@ -0,0 +1,30 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nosimd+fp"
+
+#define TEST_VECTOR(TYPE) \
+  TYPE \
+  test_##TYPE (void) \
+  { \
+    typedef TYPE v __attribute__((aligned(1))); \
+    register v *ptr asm ("x0"); \
+    asm volatile ("" : "=r" (ptr)); \
+    return *ptr; \
+  }
+
+TEST_VECTOR (__Int8x8_t)
+TEST_VECTOR (__Int16x4_t)
+TEST_VECTOR (__Int32x2_t)
+TEST_VECTOR (__Int64x1_t)
+TEST_VECTOR (__Bfloat16x4_t)
+TEST_VECTOR (__Float16x4_t)
+TEST_VECTOR (__Float32x2_t)
+TEST_VECTOR (__Float64x1_t)
+
+/*
+** test___Int8x8_t:
+**	ldr	d0, \[x0\]
+**	ret
+*/
diff --git a/gcc/testsuite/gcc.target/aarch64/vect_unary_2.c b/gcc/testsuite/gcc.target/aarch64/vect_unary_2.c
new file mode 100644
index 000000000..454ac2771
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect_unary_2.c
@@ -0,0 +1,5 @@
+/* { dg-options "-O3 -fno-math-errno --save-temps" } */
+
+#pragma GCC target "+nosimd+fp"
+
+#include "vect_unary_1.c"
-- 
2.33.0