Commit graph

187 commits

Author SHA1 Message Date
Dzmitry Dubrova b87f269282 core: misc changes (#430)
* core: misc changes

* video_core: add some formats for detiling

* clang format
2024-08-14 20:37:05 +02:00
TheTurtle d5e7180c54 spirv: Simplify shared memory handling (#427)
* spirv: Simplify shared memory handling

* spirv: Ignore clip plane

* spirv: Fix image offsets

* ir_pass: Implement shared memory lowering pass

* NVIDIA doesn't like using shared mem in fragment shader and softlocks driver

* spirv: Add log for ignoring pos1
2024-08-14 19:01:17 +03:00
TheTurtle 705d1e29cf video_core: Various fixes (#423)
* video_core: Various fixes

* clang format
2024-08-13 20:05:10 +03:00
TheTurtle c243ef0c58 video_core: Crucial buffer cache fixes + proper GPU clears (#414)
* translator: Use templates for stronger type guarantees

* spirv: Define buffer offsets upfront

* Saves a lot of shader instructions

* buffer_cache: Use dynamic vertex input when available

* Fixes issues when games like dark souls rebind vertex buffers with different stride

* externals: Update boost

* spirv: Use runtime array for ssbos

* ssbos can be large and typically their size will vary, especially in generic copy/clear cs shaders

* fs: Lock when doing case insensitive search

* Dark Souls does fs lookups from different threads

* texture_cache: More precise invalidation from compute

* Fixes unrelated render targets being cleared

* texture_cache: Use hashes for protect gpu modified images from reupload

* translator: Treat V_CNDMASK as float

* Sometimes it can have input modifiers. Worst this will cause is some extra calls to uintBitsToFloat and opposite. But most often this is used as float anyway

* translator: Small optimization for V_SAD_U32

* Fix review

* clang format
2024-08-13 09:21:48 +03:00
Vinicius Rangel 5b589b8cc8 spirv: fix image sample lod/clamp/offset translation (#402)
* spirv: fix image sample lod/clamp translation

* spirv: fix image sample offsets

* fix ImageSample opcodes & offset emission
2024-08-13 09:12:38 +03:00
psucien 29b76d8a2b Build stabilization (#413)
* shader_recompiler: fix for float convert and debug asserts

* libraries: kernel: correct return code on invalid semaphore

* amdgpu: additional case for cb extents retrieval heuristic

* removed redundant check in assert

* amdgpu: fix for linear tiling mode detection fin color buffers

* texture_cache: fix for unexpected scheduler flushes by detiler

* renderer_vulkan: missing depth barrier

* texture_cache: missed slices in rt view; + detiler format
2024-08-12 17:23:01 +03:00
IndecisiveTurtle a32d8e1c55 vk_graphics_pipeline: Fix regression 2024-08-08 17:01:03 +03:00
TheTurtle 8809e1c226 video_core: Implement guest buffer manager (#373)
* video_core: Introduce buffer cache

* video_core: Use multi level page table for caches

* renderer_vulkan: Remove unused stream buffer

* fix build

* oops forgot optimize off
2024-08-08 15:02:10 +03:00
TheTurtle e91da052c3 video_core: Minor fixes (#366)
* data_share: Fix DS instruction

* vk_graphics_pipeline: Fix unnecessary invalidate

* spirv: Remove subgroup id

* vector_alu: Simplify mbcnt pattern

* shader_recompiler: More instructions

* clang format

* kernel: Fix cond memory leak and reduce spam

* liverpool: Print error on exception

* build fix
2024-08-05 13:45:28 +03:00
TheTurtle bfc845324c shader_recompiler: Small instruction parsing refactor/bugfixes (#340)
* translator: Implemtn f32 to f16 convert

* shader_recompiler: Add bit instructions

* shader_recompiler: More data share instructions

* shader_recompiler: Remove exec contexts, fix S_MOV_B64

* shader_recompiler: Split instruction parsing into categories

* shader_recompiler: Better BFS search

* shader_recompiler: Constant propagation pass for cmp_class_f32

* shader_recompiler: Partial readfirstlane implementation

* shader_recompiler: Stub readlane/writelane only for non-compute

* hack: Fix swizzle on RDR

* Will properly fix this when merging this

* clang format

* address_space: Bump user area size to full

* shader_recompiler: V_INTERP_MOV_F32

* Should work the same as spirv will emit flat decoration on demand

* kernel: Add MAP_OP_MAP_FLEXIBLE

* image_view: Attempt to apply storage swizzle on format

* vk_scheduler: Barrier attachments on renderpass end

* clang format

* liverpool: cs state backup

* shader_recompiler: More instructions and formats

* vector_alu: Proper V_MBCNT_U32_B32

* shader_recompiler: Port some dark souls things

* file_system: Implement sceKernelRename

* more formats

* clang format

* resource_tracking_pass: Back to assert

* translate: Tracedata

* kernel: Remove tracy lock

* Solves random crashes in Dark Souls

* code: Review comments
2024-07-30 23:32:40 +02:00
DanielSvoboda f751f7cc09 log improvement ThrowInvalidType (#330)
* log improvement ThrowInvalidType

* log improvement ThrowInvalidType
2024-07-28 18:42:54 +03:00
Vinicius Rangel 9d8cbdc507 64 bits OP, impl V_ADDC_U32 & V_MAD_U64_U32 (#310)
* impl V_ADDC_U32 & V_MAD_U64_U32

* shader recompiler: add 64 bits version to get register / GetSrc

* fix V_ADDC_U32 carry

* shader recompiler: removed automatic conversion to force_flt in GetSRc

* shader recompiler: auto cast between u32 and u64 during ssa pass

* shader recompiler: fix SetVectorReg64 & standardize switches-case

* shader translate: fix overflow detection in V_ADD_I32

use vcc lo instead of vcc thread bit

* shader recompiler: more 64-bit work

- removed bit_size parameter from Get[Scalar/Vector]Register
- add BitwiseOr64
- add SetDst64 as a replacement for SetScalarReg64 & SetVectorReg64
- add GetSrc64 for 64-bit value

* shader recompiler: add V_MAD_U64_U32 vcc output

- add V_MAD_U64_U32 vcc output
- ILessThan for 64-bits

* shader recompiler: removed unnecessary changes & missing consts

* shader_recompiler: Add s64 type in constant propagation
2024-07-27 17:23:59 +03:00
DanielSvoboda 88cd3172ff BUFFER_STORE_DWORDX2 2024-07-26 00:25:29 -03:00
TheTurtle 0c96f2a030 memory: Cleanups and refactors (#324)
* memory: Various fixes

* Added (Partial) sceKernelBatchMap/sceKernelBatchMap2

* memory: Rename and implement batch unmap

* memory: Remove uneeded assert

* memory: Commonize free search routine

* memory: Contains check is inclusive

* memory: Address some alignment issues

* clang format

---------

Co-authored-by: raziel1000 <ckraziel@gmail.com>
2024-07-25 23:01:12 +03:00
squidbus 38398a2175 Fix one-off bug with user data registers. 2024-07-21 22:36:12 +03:00
squidbus d42a32bbd8 Add initial macOS support. 2024-07-21 22:36:12 +03:00
psucien 2b31ab1e71 Surface management rework (1/3) (#307)
* amdgpu: proper CB and DB sizes calculation; minor refactoring

* texture_cache: separate file for image_info

* texture_cache: image guest address moved into image info

* texture_cache: surface size calculation

* shader_recompiler: fixed sin/cos

Thanks to red_pring and gandalfthewhite0173

* initial preparations for subresources upload

* review comments
2024-07-20 12:51:21 +03:00
TheTurtle e70ce517cc spirv: Address some regressions in buffer loads (#304)
* spirv: Use correct index

* spirv: Fix indices during buffer load

* clang-format fix

* spirv: Index can be const

---------

Co-authored-by: georgemoralis <giorgosmrls@gmail.com>
2024-07-19 19:36:07 +03:00
Vladislav Mikhalin 6c7825f31f Fixed buffer_store_* regression (#302) 2024-07-18 21:04:12 +03:00
georgemoralis 9f1251b643 clang format fix 2024-07-17 17:57:54 +03:00
IndecisiveTurtle 70e74160d7 shader_recompiler: Normal gathers 2024-07-17 16:49:45 +03:00
Vladislav Mikhalin f1d1af2dba Implemented load_buffer_format_* conversions (#295)
* Implemented load_buffer_format_* conversions

* clang-format insists on ugly things
2024-07-16 15:03:07 +03:00
georgemoralis 6202c21106 Merge pull request #293 from shadps4-emu/misc-fixes3
Various linux fixes
2024-07-15 15:25:20 +03:00
georgemoralis 909fcb5b75 clang format fix 2024-07-15 14:18:28 +03:00
IndecisiveTurtle 01f26998f8 ssa_rewrite_pass: Correct phi node type for thread bitmask 2024-07-15 13:34:34 +03:00
georgemoralis 9a7a508b80 Merge pull request #287 from polybiusproxy/dev
gnmdriver: Implement shader functions
2024-07-15 07:47:33 +03:00
IndecisiveTurtle d4e95f7bd3 clang format 2024-07-15 03:47:10 +03:00
psucien 77d0535f9f review comments applied 2024-07-14 23:25:41 +02:00
georgemoralis 8435322d6a Merge pull request #292 from shadps4-emu/games/00144
Missing graphics features for flOw & Flower
2024-07-14 23:07:46 +03:00
psucien b684893aa8 recompiler: added support for discard on export with masked EXEC 2024-07-13 14:57:01 +02:00
Daniel R ba8ae239f8 shader_recompiler/frontend: Implement opcodes (#289)
`S_ASHR_I32` and `BUFFER_LOAD_DWORD`.
2024-07-13 12:37:25 +03:00
psucien c068adda48 recompiler: proper VS inputs initialization 2024-07-13 01:00:24 +02:00
Vladislav Mikhalin 9b06e9ab64 Fixed an issue with number of components of shader attributes 2024-07-11 16:10:23 +03:00
Vinicius Rangel 7dad151f98 impl V_CMP_CLASS_F32 common filter masks (#276) 2024-07-10 02:24:01 +03:00
DanielSvoboda 33c78854e2 add V_MAD_U32_U24 (#262)
* V_MAD_U32_U24

* adjust V_MAD_I32_I24 for bit extraction

* optional bit extraction parameter

* Update vector_alu.cpp

* clang-format

* Update src/shader_recompiler/frontend/translate/vector_alu.cpp

Co-authored-by: TheTurtle <47210458+raphaelthegreat@users.noreply.github.com>

* Update vector_alu.cpp

* Update translate.h

---------

Co-authored-by: TheTurtle <47210458+raphaelthegreat@users.noreply.github.com>
2024-07-09 01:35:01 +03:00
Stolas f9fd1e1dae Added Legacy Min/Max ops (#266)
* Forwarding V_MAX_LEGACY_F32 to V_MAX3_F32. Fixes Translation error in Geometry Wars 3.

* Forwarded to correct op

* Implemented Legacy Max/Min using NMax/NMin

* Added extra argument to Min/Max op codes

* Removed extra translator functions, replaced with bool

* Formatting
2024-07-08 12:24:12 +03:00
psucien a25872031c renderer: a bit more formats to support 2024-07-07 14:34:36 +02:00
psucien 5836a31d76 recompiler: switch instance data to storage buffers 2024-07-07 13:08:39 +02:00
psucien 9d9ebe7a30 recompiler: fix for gather4 components return 2024-07-07 13:00:52 +02:00
psucien 317838122d renderer: added support for instance step rates 2024-07-06 18:03:43 +02:00
TheTurtle 60c63da3fd shader_recompiler: Check usage before enabling capabilities (#245)
* vk_instance: Better feature check

* shader_recompiler: Make most features optional

* vk_instance: Bump extension vector size

* resource_tracking_pass: Perform BFS for sharp tracking

* The Witness triggered this
2024-07-06 02:42:16 +03:00
psucien 5317c45029 Recompiler: sampler patching (#236)
* recompiler: restored bfs in image instruction producers search

* recompiler: added pattern check for s# anisotropy modification

* added check if s# comes from constant load (e.g. EUD)
2024-07-05 00:15:57 +03:00
TheTurtle d9873e30bc shader_recompiler: Implement most integer image atomics, workgroup barriers and shared memory load/store (#231)
* shader_recompiler: Add LDEXP

* shader_recompiler: Add most image integer atomic ops

* shader_recompiler: Implement shared memory load/store

* shader_recompiler: More image atomics

* externals: Update sirit

* clang format

* cmake: Add missing files

* shader_recompiler: Fix some atomic bugs

* shader_recompiler: Vs outputs

* shader_recompiler: Shared mem has side-effects, fix format component order

* shader_recompiler: Inline constant buffer impl

* video_core: Fix regressions

* Work

* Fixup a few things
2024-07-05 00:15:44 +03:00
georgemoralis ec0afbd63c clang format fix 2024-07-01 23:48:30 +03:00
IndecisiveTurtle b72115bed4 shader_recompiler: More instructions 2024-07-01 22:42:45 +03:00
IndecisiveTurtle 0a900115e8 video_core: Fix some regressions 2024-07-01 18:26:22 +03:00
IndecisiveTurtle 4ec742281d clang format 2024-07-01 13:56:14 +03:00
IndecisiveTurtle 028e28e231 shader_recompiler: Apply buffer swizzle on vertex attribs 2024-07-01 13:56:14 +03:00
IndecisiveTurtle 1fba40369d renderer_vulkan: Prefer depth stencil read-only layout when possible
* Persona reads a depth attachment while it is being attached with writes disabled. Now this works without spamming vk validation errors
2024-07-01 13:56:14 +03:00
IndecisiveTurtle 69797f4d5d video_core: Track renderpass scopes properly 2024-07-01 13:56:14 +03:00
IndecisiveTurtle 73c2697ed1 video_core: Fix a few problems 2024-07-01 13:56:14 +03:00
IndecisiveTurtle 333e5ab85f spirv: Add fragdepth and implement image query 2024-07-01 13:56:14 +03:00
psucien 6adcd905b6 shader_recompiler: a simple bfs in image arg producer search 2024-06-30 01:21:39 +03:00
psucien 33939eb470 Metadata support (#223)
* texture_cache: more image usage flags

* texture_cache: metadata registration

* renderer_vulkan: initial CMask support

* renderer_vulkan: skip redundant FCE and FMask decompression passes

* renderer_vulkan: redundant VO surface registration removed

* renderer_vulkan: initial HTile support

* renderer_vulkan: added support for MSAA attachments

* renderer_vulkan: skip unnecessary metadata updates
2024-06-29 16:49:59 +03:00
georgemoralis 0845d8f250 Stabilization8 (#218)
* disable configured flexible memory size (caused issues in some games)

* fixed case S_OR_B64 for blazing chrome

* submodules updates and fixes for latest SDL

* stubbed _sigprocmask (not handled and spams too much)

* added ReplaceOp case in Stencilop

* dummy ajm module added
2024-06-27 16:37:17 +03:00
IndecisiveTurtle c5f2368e52 kernel: Const correctness 2024-06-26 18:24:06 +03:00
IndecisiveTurtle 4595a4cfb2 translator: Merge ANDN2 with AND and impl ORN2 2024-06-26 18:16:01 +03:00
IndecisiveTurtle ee50cbdcb6 shader_recompiler: More instructions and fix for swords of ditto 2024-06-26 18:03:09 +03:00
psucien e6e3aa0080 Initial instancing and asynchronous compute queues (#207)
* gnm_driver: added `sceGnmRegisterOwner` and `sceGnmRegisterResource`

* video_out: `sceVideoOutGetDeviceCapabilityInfo` for sdk runtime

* gnm_driver: correct vqid index range

* amdgpu: indirect buffer, release mem and some additional irq modes

* amdgpu: added ASC commands processor

* shader_recompiler: added support for fetch instance id

* amdgpu: classic bitfields for T# representation (debugging experience)

* renderer_vulkan: skip zero sized VBs from binding

* texture_cache: image upload logic moved into `Image` object

* gnm_driver: `sceGnmDingDong` implementation

* texture_cache: `Image` usage flags moved; correct VO buffer pitch
2024-06-22 19:50:20 +03:00
georgemoralis 1eb182dca7 more clang fix 2024-06-22 18:15:42 +03:00
IndecisiveTurtle f830b08b3f linker: Set rela bits for all symbol types 2024-06-22 18:09:04 +03:00
IndecisiveTurtle fc1d1e6f73 shader_recompiler: Even more instructions 2024-06-22 18:09:04 +03:00
IndecisiveTurtle 49834a0cd2 shader_recompiler: Add more instructions 2024-06-22 18:09:03 +03:00
psucien 42353fd939 final touch: assert instead of log crit to crash earlier 2024-06-17 00:42:26 +02:00
psucien e09b04c492 shader_recompiler: list all missing instructions during translation pass 2024-06-16 23:45:39 +02:00
psucien 5928d74b2e shader_recompiler: added V_TRUNC VOP1/3 (496) 2024-06-16 23:39:45 +02:00
psucien 079073d4c5 shader_recompiler: pretty print for missing shader instructions 2024-06-16 23:11:36 +02:00
psucien 6c534ffa11 shader_recompiler: added V_MAX VOP2 (431, 433) 2024-06-16 21:34:23 +02:00
psucien 78752b008a shader_recompiler: correct format for SSBO store op 2024-06-16 21:21:19 +02:00
psucien 9f605c9bbd shader_recompiler: added MUL_HI VOP2 (896) 2024-06-16 20:39:53 +02:00
psucien 1fb06835b9 shader_recompiler: added SOPK MOVK (45) 2024-06-16 20:26:24 +02:00
TheTurtle 556d88ecb4 core: Many things (#194)
* video_core: Add a few missed things

* libkernel: More proper memory mapped files

* memory: Fix tessellation buffer mapping

* Cuphead work

* sceKernelPollSema fix

* clang format

* fixed ngs2 lle loading and rtc lib

* draft pthreads keys implementation

* fixed return codes

* return error code if sceKernelLoadStartModule module is invalid

* re-enabled system modules and disable debug in libs.h

* Improve linux support

* fix windows build

* kernel: Rework keys

---------

Co-authored-by: georgemoralis <giorgosmrls@gmail.com>
2024-06-15 14:36:07 +03:00
psucien b110523d90 recompiler: trivial missing ops (VALU OR and SALU LE, GE) added 2024-06-10 23:49:23 +02:00
TheTurtle 769eb6a9ee video_core: Preliminary storage image support and more (#188)
* vk_rasterizer: Clear depth buffer when DB_RENDER_CONTROL says so

* video_core: Preliminary storage image support, more opcodes

* renderer_vulkan: a fix for vertex buffers merging

* renderer_vulkan: a heuristic for blend override when alpha out is masked

---------

Co-authored-by: psucien <bad_cast@protonmail.com>
2024-06-10 22:35:14 +03:00
TheTurtle 718ade970f video_core: Add depth buffer support and fix some bugs (#172)
* memory: Avoid crash when alignment is zero

* Also remove unused file

* shader_recompiler: Add more instructions

* Also fix some minor issues with a few existing instructions

* control_flow: Don't emit discard for null exports

* renderer_vulkan: Add depth buffer support

* liverpool: Fix wrong color buffer number type and viewport zscale

* Also add some more formats
2024-06-07 16:26:43 +03:00
raphaelthegreat 6fbd20c2d8 shader: Fix block processing order in dead code elimination pass 2024-06-06 02:46:36 +03:00
raphaelthegreat fc8e4a1f97 shader_recompiler: Add more instructions and fix a few thinhs 2024-06-05 22:22:34 +03:00
raphaelthegreat ead6ef58b5 shader_recompiler: Better branch detection + more opcodes 2024-06-02 03:05:40 +03:00
raphaelthegreat e637f52076 video_core: Moar shader instruction 2024-05-30 18:17:54 +03:00
psucien e89e84aae2 shader_recompiler: redundant IR opcode removed 2024-05-30 11:50:42 +02:00
psucien 6f4a9dd87b shader_recompiler: added NOP and RSQ instructions 2024-05-30 09:43:49 +02:00
raphaelthegreat 99d20d4119 video_core: Implement basic compute shaders and more instructions 2024-05-30 01:39:24 +03:00
raphaelthegreat 05c4542301 video_core: Address some feedback 2024-05-27 22:13:55 +03:00
raphaelthegreat 8bd9bf1a7d video_core: Add image support 2024-05-27 18:25:45 +03:00
TheTurtle 22b7ae4b63 video_core: Add constant buffer support (#147) 2024-05-26 15:51:35 +03:00
TheTurtle 0aa04c60cb video_core: Bringup some basic functionality (#145)
* video_core: Remove hack in rasterizer

* The hack was to skip the first draw as the display buffer had not been created yet and the texture cache couldn't create one itself. With this patch it now can, using the color buffer parameters from registers

* shader_recompiler: Implement attribute loads/stores

* video_core: Add basic vertex, index buffer handling and pipeline caching

* externals: Make xxhash lowercase
2024-05-25 15:33:15 +03:00
TheTurtle 4380066a90 video: Import new shader recompiler + display a triangle (#142) 2024-05-22 01:35:12 +03:00