Speed up LiverpoolToVK::SurfaceFormat (#1982)

* Speed up LiverpoolToVK::SurfaceFormat

In Bloodborne, this function has the highest cumulative "exclusive time" of any function in the profile. This is true both in scenes that perform poorly and in scenes that perform well.

I took samples of approximately 10 s each using an 8 kHz sampling profiler.

In the Nightmare Grand Cathedral (looking towards the stairs, at the rest of the level):
- Reduced total time from 757.34ms to 82.61ms (out of ~10000ms).
- Reduced average frame times by 2ms (though according to the graph, the gap may be as big as 9ms every N frames).

In the Hunter's Dream (in the spawn position):
- Reduced the total time from 486.50ms to 53.83ms (out of ~10000ms).
- Average frame times appear to be roughly the same.

These profiles compare the change against the version currently in the main branch. The change also improves things in the `threading` branch, possibly by even more, but I didn't track my measurements as carefully in that branch. I believe this change will still be useful once that branch is stabilized and merged.

There may be other rendering bottlenecks on this branch that keep this code off the critical path in places like the Hunter's Dream, where performance isn't currently as constrained. That might explain why the reduction in call times isn't resulting in a higher frame rate.

* Implement SurfaceFormat with derived lookup table instead of switch

* Clang format fixes
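
For readers who want the idea without the surrounding code, here is a minimal standalone sketch of the derived lookup table. It is not the code from this commit: the enums, the three-entry `kSupported` list, and the names `TableIndex`, `BuildTable`, and `Lookup` are hypothetical stand-ins for `AmdGpu::DataFormat`, `AmdGpu::NumberFormat`, `vk::Format`, and `SurfaceFormats()`, and it assumes C++17 or later.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for AmdGpu::DataFormat, AmdGpu::NumberFormat and vk::Format.
enum class DataFormat : std::uint32_t { Invalid = 0, Format8 = 1, Format8_8_8_8 = 10 };
enum class NumberFormat : std::uint32_t { Unorm = 0, Snorm = 1, Uint = 4 };
enum class Format : std::uint32_t { Undefined = 0, R8Unorm, R8G8B8A8Unorm, R8G8B8A8Uint };

struct FormatInfo {
    DataFormat data_format;
    NumberFormat number_format;
    Format format;
};

// Illustrative subset only; the real list of supported formats is much longer.
constexpr std::array<FormatInfo, 3> kSupported{{
    {DataFormat::Format8, NumberFormat::Unorm, Format::R8Unorm},
    {DataFormat::Format8_8_8_8, NumberFormat::Unorm, Format::R8G8B8A8Unorm},
    {DataFormat::Format8_8_8_8, NumberFormat::Uint, Format::R8G8B8A8Uint},
}};

constexpr std::size_t kDataBits = 6;   // data format values are under 64
constexpr std::size_t kNumberBits = 4; // number format values are under 16
// Parenthesized explicitly: 64 * 16 = 1024 entries.
constexpr std::size_t kTableSize = (std::size_t{1} << kDataBits) * (std::size_t{1} << kNumberBits);
static_assert(kTableSize == 1024);

// Pack both enums into one index: data format in the high bits, number format in the low bits.
constexpr std::size_t TableIndex(DataFormat data, NumberFormat number) {
    return static_cast<std::size_t>(number) | (static_cast<std::size_t>(data) << kNumberBits);
}

// Derive the table from the supported list; every other slot stays Format::Undefined.
constexpr std::array<Format, kTableSize> BuildTable() {
    std::array<Format, kTableSize> table{};
    for (auto& entry : table) {
        entry = Format::Undefined;
    }
    for (const auto& info : kSupported) {
        table[TableIndex(info.data_format, info.number_format)] = info.format;
    }
    return table;
}

constexpr auto kTable = BuildTable();

// O(1) lookup replacing a linear search over the supported list.
constexpr Format Lookup(DataFormat data, NumberFormat number) {
    return kTable[TableIndex(data, number)];
}
```

With this sketch, `Lookup(DataFormat::Format8_8_8_8, NumberFormat::Uint)` reads a single array slot (index `(10 << 4) | 4`), and an unsupported pair yields `Format::Undefined`. One note on the size expression: the diff spells the array size without parentheses as `1 << amd_gpu_data_format_bit_size * 1 << amd_gpu_number_format_bit_size`. Because `*` binds tighter than `<<`, that parses as `(1 << (6 * 1)) << 4`, which happens to equal the intended 1024; the sketch parenthesizes the product so the intent does not rely on that coincidence.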
hspir404 2025-01-02 13:38:51 +00:00 committed by GitHub
parent 099e685bff
commit 6862c9aad7

@@ -691,16 +691,40 @@ std::span<const SurfaceFormatInfo> SurfaceFormats() {
     return formats;
 }
 
+// Table 8.13 Data and Image Formats [Sea Islands Series Instruction Set Architecture]
+static const size_t amd_gpu_data_format_bit_size = 6;   // All values are under 64
+static const size_t amd_gpu_number_format_bit_size = 4; // All values are under 16
+
+static size_t GetSurfaceFormatTableIndex(AmdGpu::DataFormat data_format,
+                                         AmdGpu::NumberFormat num_format) {
+    DEBUG_ASSERT(data_format < 1 << amd_gpu_data_format_bit_size);
+    DEBUG_ASSERT(num_format < 1 << amd_gpu_number_format_bit_size);
+    size_t result = static_cast<size_t>(num_format) |
+                    (static_cast<size_t>(data_format) << amd_gpu_number_format_bit_size);
+    return result;
+}
+
+static auto surface_format_table = []() constexpr {
+    std::array<vk::Format, 1 << amd_gpu_data_format_bit_size * 1 << amd_gpu_number_format_bit_size>
+        result;
+    for (auto& entry : result) {
+        entry = vk::Format::eUndefined;
+    }
+    for (const auto& supported_format : SurfaceFormats()) {
+        result[GetSurfaceFormatTableIndex(supported_format.data_format,
+                                          supported_format.number_format)] =
+            supported_format.vk_format;
+    }
+    return result;
+}();
+
 vk::Format SurfaceFormat(AmdGpu::DataFormat data_format, AmdGpu::NumberFormat num_format) {
-    const auto& formats = SurfaceFormats();
-    const auto format =
-        std::find_if(formats.begin(), formats.end(), [&](const SurfaceFormatInfo& format_info) {
-            return format_info.data_format == data_format &&
-                   format_info.number_format == num_format;
-        });
-    ASSERT_MSG(format != formats.end(), "Unknown data_format={} and num_format={}",
-               static_cast<u32>(data_format), static_cast<u32>(num_format));
-    return format->vk_format;
+    vk::Format result = surface_format_table[GetSurfaceFormatTableIndex(data_format, num_format)];
+    bool found =
+        result != vk::Format::eUndefined || data_format == AmdGpu::DataFormat::FormatInvalid;
+    ASSERT_MSG(found, "Unknown data_format={} and num_format={}", static_cast<u32>(data_format),
+               static_cast<u32>(num_format));
+    return result;
 }
 
 static constexpr DepthFormatInfo CreateDepthFormatInfo(