Intel already planning XeSS 2.0/3.0
Karthik Vaidyanathan (KV), Intel’s Principal Engineer for XeSS technology, has been interviewed by Usman Pirzada from Wccftech.
The interview covers a wide range of questions regarding Intel’s upcoming supersampling technology, XeSS, including comparisons with NVIDIA DLSS, day-one support for existing GPUs, and why Intel is releasing such a technology in the first place.
Intel XeSS will not require training per game, just like NVIDIA DLSS 2.0
One of the first things revealed by Karthik was that XeSS will not require per-game training. It is a uniform library that will be compatible with various titles at the same time. The approach is somewhat similar to NVIDIA DLSS 2.0, whose libraries can be moved between games without affecting the core supersampling technology.
KV: DLSS 1.0, again I am not aware of the internals of DLSS because it’s not open, but from my understanding, it was not something that generalized across games. DLSS 2.0 plus was neural network based and it generalized very well. For XeSS, from day one, our objective has been a generalized technique. […] And also you don’t want to have a solution that’s fragile, that requires training for every game that someone ships. That’s also been our objective from day one.
He also confirmed that the official demo of XeSS was not used as training material:
[…] You’ve seen the demo and I can say that XeSS has never seen that demo. It was never trained on that demo. All the content in that scene that you saw was not used as a part of our training process.
XeSS will work with GPUs supporting Microsoft Shader Model 6.4 and above
The engineer also explained why XeSS should see broader adoption than NVIDIA DLSS. Intel’s technology will be available in two variants: XMX-accelerated, exclusive to Intel Arc GPUs, and DP4a-based, using a special dot-product instruction supported by GPUs with Microsoft Shader Model 6.4, including NVIDIA Pascal, Turing, and AMD RDNA1/2.
KV: NVIDIA has had this, I think, since Turing and AMD has this now on RDNA2. So even without matrix acceleration you can go quite far. It might not be as fast as matrix acceleration, but it certainly meets the objective. And as I said, the objective is to maintain the fidelity of your render and achieve smooth frame rates. So, when it comes to older models, on older Intel GPUs, we’ve had dot product acceleration (DP4a) for a while now. Microsoft has enabled this through Shader Model 6.4 and above, and on all these platforms XeSS will work.
KV: So, for DP4a, yes, SM 6.4 and beyond. SM 6.6, for example, supports DP4a, and SM 6.6 also supports packing intrinsics for extracting 8-bit data and packing 8-bit data. So we recommend SM 6.6.
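To make the DP4a operation concrete: it computes a dot product of four signed 8-bit values packed into each of two 32-bit words and adds the result to a 32-bit accumulator, all in one instruction. The sketch below emulates that behavior in Python for illustration; the real operation runs on the GPU via the Shader Model intrinsics (or, e.g., CUDA’s `__dp4a`).

```python
def pack(vals):
    """Pack four small signed integers into one 32-bit word, 8 bits per lane."""
    return sum((v & 0xFF) << (8 * i) for i, v in enumerate(vals))

def dp4a(a: int, b: int, acc: int) -> int:
    """Emulate DP4a: dot product of four signed 8-bit lanes of a and b,
    added to a 32-bit accumulator."""
    def lanes(word):
        # Extract four lanes and sign-extend each 8-bit value.
        vals = [(word >> (8 * i)) & 0xFF for i in range(4)]
        return [v - 256 if v >= 128 else v for v in vals]
    return acc + sum(x * y for x, y in zip(lanes(a), lanes(b)))

a = pack([1, -2, 3, 4])
b = pack([5, 6, -7, 8])
print(dp4a(a, b, 10))  # 1*5 + (-2)*6 + 3*(-7) + 4*8 + 10 = 14
```

Inference with 8-bit quantized network weights reduces to long chains of exactly such packed multiply-accumulates, which is why DP4a support is enough to run XeSS without dedicated matrix hardware.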
So far AMD has not publicly confirmed that it wishes to support XeSS, nor has it provided a list of supported GPUs, which is understandable since the technology has not been released yet. However, it should be noted that Intel did not wait for AMD FSR to be released to express interest in its competitor’s technology.
The DP4a version will have a longer frame render time than the XMX version, but it will still be significantly lower than rendering the image at native 4K resolution.
XeSS will have one API for both versions
From the developer’s perspective, there will be no change to the API for XMX and DP4a based versions of XeSS.
KV: […] I also wanted to point out that both the DP4a version and the XMX version are exposed through the same API. So as far as the integration is concerned, it’s actually the same. What the game engine sees is the same interface, and underneath that interface you can select the DP4a or the XMX version depending on the platform. So I wanted to clarify that: it’s not two different interfaces. It’s the same interface and the same library, with two different paths inside of it, which makes it a lot easier for game developers.
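The pattern Karthik describes, one public interface with backend selection hidden inside the library, can be sketched as follows. All class and method names here are hypothetical, since the actual XeSS SDK interface has not been published; this only illustrates the design.

```python
# Hypothetical sketch of a single-API, dual-backend upscaler.
# None of these names come from the real XeSS SDK.

class XMXBackend:
    def execute(self, frame: str) -> str:
        return f"upscaled({frame}) via XMX matrix engines"

class DP4aBackend:
    def execute(self, frame: str) -> str:
        return f"upscaled({frame}) via DP4a dot-product instructions"

class SuperSampler:
    """The game engine sees only this interface; the backend choice
    happens internally, based on the detected hardware."""
    def __init__(self, has_xmx: bool):
        self._backend = XMXBackend() if has_xmx else DP4aBackend()

    def upscale(self, frame: str) -> str:
        return self._backend.execute(frame)

# Same integration code regardless of GPU:
print(SuperSampler(has_xmx=True).upscale("frame0"))
print(SuperSampler(has_xmx=False).upscale("frame0"))
```

The benefit for developers is exactly what the quote claims: the engine integrates one library once, and the platform-specific path is an internal detail.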
No support for NVIDIA Tensor Cores, and no FP16/FP32 fallback at launch
Furthermore, he added that the matrix-accelerated version of XeSS will not take advantage of NVIDIA Tensor Cores.
KV: Ah, no. Until there is standardization around matrix acceleration that is cross-platform, it’s not easy for us to build something that runs on all kinds of matrix acceleration hardware. DP4a has reached a stage where it is supported on all platforms, certainly on all modern platforms. So that makes it much easier for us. But matrix acceleration is not at that same stage. So our matrix implementation that targets XMX is Intel-specific.
Some users may want to know that no FP16/FP32 fallback is planned either. At launch, FSR offered an FP32 fallback for older GPUs, ensuring that a very large number of graphics architectures are supported. However, those instructions are not matrix-based like the operations on Tensor or XMX cores, which are the most common in AI-based algorithms.
KV: [FP16/32 fallback] No, not at the moment. We will look into it, but I cannot commit to anything at this point. Even if you were able to do it, there’s the big question of performance and whether it’s justified.
XeSS will have multiple quality modes
Just like FSR and DLSS, XeSS will feature quality modes, providing more flexibility to gamers and developers who want to get the most out of their high-end or low-end GPUs through either manual or automatic optimization.
KV: We will have the quality modes, as both FSR and DLSS have those at this point. So, you know, we will support the same since users are used to it. But I also wanted to point out that the one thing that sort of gets lost in these different modes, performance, quality, ultra quality, is that what you really want is something like the performance mode producing an image quality that is so close to ultra quality that it doesn’t take away from the visual experience.
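Quality modes boil down to how far below the target resolution the game actually renders before upscaling. XeSS’s scale factors have not been announced, so the sketch below uses FSR 1.0’s published per-axis factors purely as an illustration of the mechanism.

```python
# Illustration only: these per-axis scale factors are FSR 1.0's published
# presets, not XeSS's (Intel has not announced XeSS scale factors).
SCALE = {
    "ultra_quality": 1.3,
    "quality": 1.5,
    "balanced": 1.7,
    "performance": 2.0,
}

def render_resolution(target_w: int, target_h: int, mode: str):
    """Internal render resolution for a given output resolution and mode."""
    s = SCALE[mode]
    return round(target_w / s), round(target_h / s)

# For a 4K output target:
for mode in SCALE:
    print(mode, render_resolution(3840, 2160, mode))
```

A more aggressive mode shades far fewer pixels (performance mode at 4K renders only a quarter of the output pixel count), which is why Karthik stresses that the real goal is for even the fastest mode to look close to the highest-quality one.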
XeSS 2.0 and 3.0 are confirmed, and the technology will eventually go open source
Karthik confirmed that Intel will launch XeSS 2.0 and 3.0 in the future as the technology evolves. The manufacturer will open-source the technology once it matures. An open-source approach to AI-based super resolution might either help boost the popularity of XeSS and push the market toward a true cross-vendor solution, or mark the start of further market segmentation, with only minor changes separating potential XeSS clones. This is likely why Intel is reluctant to open-source its technology at launch.
KV: There will be XeSS 2.0 at some point, XeSS 3.0 at some point. You know at some point maybe graphics just completely changes and it’s all neural networks. […] We have a certain perspective on this. […] if you have a technology that’s open source and runs on multiple platforms, it’s something that you can integrate into your game engine and not have to differentiate for every single platform that you’re running on. So, yes, it’s also been our objective from day one to have a solution that works on other GPUs, is open source and can set the path or establish a path to wider adoption across the industry.
XeSS is trained at 64 samples per pixel
XeSS is trained on reference images rendered at 64 samples per pixel, which, according to Karthik, likely corresponds to NVIDIA’s 16K training imagery.
KV: That’s a very interesting question. Let me put it differently: we train with 64-samples-per-pixel reference images, and I think that makes more sense, because the kind of quality that we are trying to train the network with is 64x SSAA. That’s what we use to train the network. Another way of looking at it is how many samples it ends up being overall. So, when NVIDIA says 16K images, I am assuming it translates to the number of samples it has inside a pixel. So from our standpoint, that’s what I can talk about: we train with reference images that have 64 samples per pixel.
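"64 samples per pixel" means each pixel of the training reference is the average of 64 jittered shading samples taken inside that pixel's footprint, i.e. 64x supersampled antialiasing. A minimal sketch of producing such a ground-truth pixel, with a placeholder standing in for the renderer's shading function:

```python
import random

def shade(x: float, y: float) -> float:
    """Placeholder shading function; a real renderer would evaluate the
    scene at this subpixel position. Returns a value in [0, 1)."""
    return (x + y) % 1.0

def reference_pixel(px: int, py: int, spp: int = 64) -> float:
    """Average spp jittered samples inside pixel (px, py): 64x SSAA."""
    total = 0.0
    for _ in range(spp):
        # Jitter the sample position uniformly within the pixel footprint.
        sx, sy = px + random.random(), py + random.random()
        total += shade(sx, sy)
    return total / spp  # ground-truth target the network is trained against

print(reference_pixel(10, 20))
```

The network then learns to reconstruct something close to this heavily supersampled reference from a single low-resolution, one-sample-per-pixel input frame plus history.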
Intel XeSS technology should launch alongside Arc “Alchemist” GPUs in the first quarter of 2022. Intel will first release a closed-source XeSS SDK to developers based on the XMX version; the DP4a-version SDK will be released by the end of this year.
Resolution-based Performance Improving Technologies

| | FidelityFX Super Resolution | Xe Super Sampling | Deep Learning Super Sampling |
|---|---|---|---|
| Upscaling Method | Spatial upscaling | Neural network upscaling | Neural network upscaling |
| AI Training | No | Trained at 64 samples per pixel | Trained at 64 samples per pixel |
| Implementation | Per game (officially) | Per game | Per game |
| Status | Released (1.0) | Unreleased | Released (2.2) |
| Source | Open (MIT license) | Closed, eventually open source | Closed |
| GPU Support | AMD Navi, Polaris, Vega; NVIDIA 10, 16, 20, 30 series | XMX-based: Intel Arc; DP4a-based: NVIDIA Turing and newer, Intel Xe-LP, AMD Navi 2X and newer | Tensor cores (Volta, Turing, Ampere) |