Optimalization (something like a problem with advanced output)

Ladin
Posts: 52
Joined: Tue Apr 14, 2015 12:07
Location: Prague

Post by Ladin »

Hey there,

Maybe someone can help or share some experience.

It looks like there is a problem with outputting in Reso.


What's wrong with sending data from the primary (rendering) card to an expansion card like a BM Quad, another GTX, etc.? Why is there such a huge performance drop?

There is no bottleneck. The main GPU is running at 45-50%, there is no problem reading data from disk, no problem with the amount of RAM, the processor is running at 10%, and there are enough PCI-E lanes. PCI-E is also very fast: there is no bandwidth problem on x16 gen3 or on x8 gen2 (the case of the BM Quad 2).

So why does the fps drop from 60 to 25 when all of the outputs are activated?

THX..

Edit: Same thing when using just one card and outputting, let's say, 2 x 4K and 1 x 3840 x 1080.
One wide composition and one layer. No effects, just playing one file in DXV3. Same thing. The whole computer is idling and the fps drops to half. The Windows power scheme and the NVIDIA control panel are both set to prefer maximum performance, and PCI-E runs at x16 gen3. So, what's wrong?

I haven't tried it yet, but I believe that if I just use a Mosaic setup in Windows and play one video across all of the screens (let's say 3 x 4K), there will be no problem, no stuttering or dropped frames, the same as connecting 3 x 4K monitors and running a screensaver on all of them.

Edit2: Next strange thing. I have a completely empty deck. No videos in it. Just a wide composition, like 19200x1080 at 60 fps. With the outputs disabled the counter says it's running at 60 fps. When I enable the outputs, everything goes down. No footage, no pictures, nothing in the deck. There is no reason for this huge drop.
So Resolume is delivering 30-35 frames per second of nothing, instead of 60 frames per second of nothing, to the outputs, using 40% of the graphics card (according to GPU-Z).
I can imagine that there is some allocation going on, but why doesn't Reso use 80-100% of the graphics card to deliver maximum fps to the outputs (no matter whether it is delivering footage or nothing)?

Edit3: And finally, there is really little difference between the GTX 1070 and the GTX 1080 Ti with this problem. The fps drop is maybe 5-7 fps bigger on the 1070.

Zoltán
Team Resolume
Posts: 7644
Joined: Thu Jan 09, 2014 13:08
Location: Székesfehérvár, Hungary

Post by Zoltán »

With no layers playing and no outputs enabled, Resolume is idle.
You'll see the FPS of your main display on any machine, if you're not playing anything.

Once you activate an output to a capture card, the black texture needs to travel through your machine.
From your GPU to memory, then from memory to your capture card, as uncompressed data.

Resolume will cap the maximum frame rate to the refresh rate of your slowest output, so that could also be what you see, if you're outputting as 1080p25 for example.
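
The cap described above can be sketched in a few lines. This is only an illustration of the rule as stated in this post, not Resolume's actual code; the function name `effective_fps_cap` and the example refresh rates are made up for the sketch.

```python
def effective_fps_cap(render_fps: float, output_refresh_rates: list[float]) -> float:
    """Return the frame rate after capping to the slowest enabled output.

    Hypothetical helper: models "cap to the refresh rate of the slowest
    output" and nothing else (no transfer cost, no vsync details).
    """
    if not output_refresh_rates:
        # No outputs enabled: nothing caps the renderer in this model.
        return render_fps
    return min(render_fps, min(output_refresh_rates))


# Three outputs, one of them running at 1080p25.
print(effective_fps_cap(60.0, [60.0, 60.0, 25.0]))
# All outputs at 60p: the cap alone would not explain a drop below 60.
print(effective_fps_cap(60.0, [60.0, 60.0, 60.0]))
```

With every output set to 60p, this rule on its own leaves the frame rate at 60, so a lower figure has to come from somewhere else.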

How is the Resolume FPS with no outputs enabled and 1 layer playing in this wide composition?

How many layers do you need to play to get down to 30FPS?

What kind of CPU do you have in your machine?
Software developer, Sound Engineer,
Control Your show with ”Enter” - multiple Resolume servers at once - SMPTE/MTC column launch
try for free: http://programs.palffyzoltan.hu

Post by Ladin »

Zoltán wrote: Thu May 02, 2019 10:34
Once you activate an output to a capture card, the black texture needs to travel through your machine.
From your GPU to memory, then from memory to your capture card, as uncompressed data.
Yes, but GPU usage is about 30%, RAM usage is only 3.6 GB of 64 GB, processor usage is only 9% to 11%, and there is still an FPS drop from 60 fps to 34 fps.
Zoltán wrote: Thu May 02, 2019 10:34 Resolume will cap the maximum frame rate to the refresh rate of your slowest output, so that could also be what you see, if you're outputting as 1080p25 for example.
All of the outputs are set to 60p. Same result when setting all of them to 30p.
Zoltán wrote: Thu May 02, 2019 10:34
How is the Resolume FPS with no outputs enabled and 1 layer playing in this wide composition?
That is the thing: 60 FPS with no problem when all of the outputs are disabled.

Zoltán wrote: Thu May 02, 2019 10:34 How many layers do you need to play to get down to 30FPS?
With the outputs disabled, it takes 10 layers to drop to 30 fps. The content is one 8500 x 1080 loop in DXV3, stretched across the whole composition.

With the outputs enabled, FPS drops to 34 without any content. With one layer and the same loop it drops to 28 fps.


Zoltán wrote: Thu May 02, 2019 10:34
What kind of CPU do you have in your machine?

I have two machines: one with a Xeon E5-1650 v3 and the second with an i7-7820X.

Same behavior on both of them.

With the outputs disabled: 10 layers of the same content, 29.8 fps, CPU usage about 30%, GPU usage jumps to 70% and RAM usage is 3.8 GB.

The strange thing is that when I enable all of the outputs with 10 layers of the same content (still playing, I just enable the outputs), the FPS drops to 18.5, CPU usage drops to 24%, RAM usage rises to 3.9 GB, and worst of all, GPU usage drops from 70% load to 50%.

I don't understand.


PS: I'm talking about whole-system usage. Arena accounts for almost all of the CPU and GPU usage, but its RAM usage is only 830 MB of the 3.9 GB used. (The Edge browser, which I'm using to write this post, takes 400 MB with just two windows open.) There is still more than 60 GB of unused RAM. So there is a huge performance reserve in every aspect of the system, but Arena is running at 18 FPS with 10 layers playing and all outputs activated.
Last edited by Ladin on Thu May 02, 2019 13:40, edited 3 times in total.

Post by Ladin »

The thing is that there shouldn't be a problem with PCI-E throughput, because the BM Quad 2 has 8 outputs. For uncompressed video at, let's say, 24 bits per pixel (Reso uses 8 bit or 10 bit), that's 24 x 1920 x 1080 x 60 = 2.99 Gbit/s per output. With 8 outputs it's about 23.9 Gbit/s.

But PCI-E x8 gen2 (which is what the BM Quad 2 uses) has 4 GB/s (not Gb) of throughput in each direction. 1 byte = 8 bits, so 4 x 8 = 32 Gbit/s.

So there shouldn't be a bottleneck, and I believe it's not a problem with the hardware or with PCI-E throughput, because the fps drops as outputs are added one by one, not only after reaching some critical threshold that exceeds the limit of the PCI-E bus.

Adding one output: fps drop. Adding a second output: the fps drop is bigger. The third output: bigger still. It is not like adding the 1st output is OK, adding the 2nd output is OK, and so on, until the PCI-E limit is reached and there is a massive drop.
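
The bandwidth estimate above can be checked with a short script. It is a sketch using the figures from this post (24 bits per pixel, FHD, 60 fps, 8 outputs); the PCIe gen2 numbers (5 GT/s per lane, 8b/10b encoding) are the standard theoretical link rates and say nothing about what a real system sustains.

```python
# Figures taken from the post: 24 bits per pixel, 1920x1080, 60 fps, 8 outputs.
BITS_PER_PIXEL = 24
WIDTH, HEIGHT, FPS = 1920, 1080, 60
OUTPUTS = 8

# PCIe gen2: 5 GT/s per lane with 8b/10b encoding,
# which leaves 4 Gbit/s (500 MB/s) of payload per lane per direction.
GEN2_LANE_GBIT = 5.0 * 8 / 10
LANES = 8

per_output_gbit = BITS_PER_PIXEL * WIDTH * HEIGHT * FPS / 1e9
total_gbit = per_output_gbit * OUTPUTS
link_gbit = GEN2_LANE_GBIT * LANES

print(f"per output:         {per_output_gbit:.2f} Gbit/s")
print(f"all {OUTPUTS} outputs:      {total_gbit:.2f} Gbit/s")
print(f"x8 gen2 link:       {link_gbit:.0f} Gbit/s (theoretical)")
print(f"link utilisation:   {total_gbit / link_gbit:.0%}")
```

On these numbers the eight streams fit, but they take up about three quarters of the link's theoretical payload rate in one direction.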

Update:
Tested with 2 BM Quad 2 cards (to rule out a PCI-E throughput problem on one slot). Exactly the same thing. It doesn't matter whether I send 8 FHD slices to 8 outputs on one card, or 4 FHD slices to each card. The behavior of the system is exactly the same.
Last edited by Ladin on Thu May 02, 2019 21:27, edited 1 time in total.

Post by Ladin »

After adding and removing outputs I made a table, because I was suspecting RAM usage. At this moment I am completely confused.

Fps drop and allocated RAM (usage) as outputs are added:

Output       FPS drop   RAM usage
No output    0          339700k
1st out      6          394200k
2nd out      7          440008k
3rd out      4          490220k
4th out      4          535944k
5th out      3          585684k
6th out      3          620380k
7th out      2          675596k
8th out      2          709200k
Total: 31 fps drop when playing only one clip in one layer. CPU and GPU usage stay more or less the same as without outputs.

When removing outputs there is exactly the same fps increase in the opposite direction. I mean, disabling the 8th output gives me back just 2 fps, disabling the 7th output 2 fps, the 6th 3 fps, and so on.

But RAM usage stays constant; it does not decrease.

Something strange here.

As I said, I don't believe that it is a hardware or performance issue, or a bottleneck somewhere. Each added output causes a smaller and smaller fps drop. With a bottleneck it should be the other way round (bigger and bigger fps drops). And the thing is that there is still plenty of unused computing power to feed the outputs flawlessly.
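
One way to read the table is to convert the fps figures into time per frame. The sketch below does that under one loud assumption: the table does not state the starting frame rate, so 60 fps is assumed as the baseline (the rate reported earlier in the thread with outputs disabled). Everything else comes from the table.

```python
BASELINE_FPS = 60.0  # ASSUMPTION: starting rate is not stated in the table
fps_drops = [6, 7, 4, 4, 3, 3, 2, 2]  # fps lost per added output, from the table


def frame_time_ms(fps: float) -> float:
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


fps = BASELINE_FPS
previous_ms = frame_time_ms(fps)
added_ms = []  # extra frame time contributed by each added output

for n, drop in enumerate(fps_drops, start=1):
    fps -= drop
    current_ms = frame_time_ms(fps)
    added_ms.append(current_ms - previous_ms)
    print(f"output {n}: {fps:4.0f} fps, {current_ms:5.2f} ms/frame, "
          f"+{added_ms[-1]:.2f} ms")
    previous_ms = current_ms

print(f"average cost per output: {sum(added_ms) / len(added_ms):.2f} ms")
```

Under that assumption, each output adds roughly 2 to 3 ms to every frame. A roughly constant time cost per output produces exactly this pattern of shrinking fps drops, because fps is the reciprocal of frame time.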

Post by Zoltán »

PCI-e speeds you see listed in specs are theoretical maximums, of only the pci-e pipe. There are other components in the chain.
So the gen 2 x8 connection to the BM with its 32 Gb/s, having to transfer 23 Gbit/s, would occupy the CPU / DMA controller roughly 70% of the time on its own.

At the same time you'd need to start to transfer the next frame from disk, then the finished frame from the GPU to the memory. And the system will have to handle everything else it needs to be able to operate.

DMA working won't show up in the CPU stats I think, but all the processes will have to wait for each other.
If they have to wait, the fps will drop, because the GPU can't get the data for the next frame in time...
Does that make sense?
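
To put the waiting argument into numbers, here is a rough per-frame budget. It is a sketch under stated assumptions, not a measurement: 24 bits per pixel and eight FHD outputs as in the estimate earlier in the thread, theoretical PCIe link rates (gen3 x16 for the GPU, gen2 x8 for the capture card), and the two transfers running one after the other. The helper name `pcie_bytes_per_s` is made up for this example.

```python
def pcie_bytes_per_s(lanes: int, gigatransfers: float, efficiency: float) -> float:
    """Theoretical payload bandwidth of one PCIe link direction, in bytes/s."""
    return lanes * gigatransfers * 1e9 * efficiency / 8


FPS = 60
OUTPUTS = 8
FRAME_BYTES = 1920 * 1080 * 3  # one FHD frame at 24 bits per pixel (assumed)

gen3_x16 = pcie_bytes_per_s(16, 8.0, 128 / 130)  # GPU slot, 128b/130b encoding
gen2_x8 = pcie_bytes_per_s(8, 5.0, 8 / 10)       # capture card, 8b/10b encoding

payload = FRAME_BYTES * OUTPUTS                  # bytes to move per rendered frame
budget_ms = 1000 / FPS                           # time available per frame at 60 fps
readback_ms = payload / gen3_x16 * 1000          # GPU -> system memory
upload_ms = payload / gen2_x8 * 1000             # system memory -> capture card

print(f"frame budget:       {budget_ms:.2f} ms")
print(f"GPU -> RAM:         {readback_ms:.2f} ms")
print(f"RAM -> card:        {upload_ms:.2f} ms")
print(f"both, back to back: {readback_ms + upload_ms:.2f} ms")
```

Even at theoretical link speed, the two hops back to back use most of the 16.67 ms available per frame at 60 fps, which leaves little room for rendering or any other waiting.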

You could double check that you have DMA textures enabled in Resolume Video preferences, but that might not help much.

If you need lots of outputs and high FPS, I'd recommend to only use the outputs on your renderer GPU, and use an FX4 to split up the outputs.

Post by Ladin »

Sorry Zoltan, but I don´t agree.

There is enough PCI-E bandwidth, and enough processor, ring bus (mesh bus) and system agent resources.
The Blackmagic card runs at x8 gen2, but it sits in a PCI-E slot that is x16 gen3. The processor has enough PCI-E lanes, and there is no problem with bandwidth or with waiting for something. For the controller (system agent) it's a piece of cake to handle that amount of data.

As you wrote:
At the same time you'd need to start to transfer the next frame from disk, then the finished frame from the GPU to the memory. And the system will have to handle everything else it needs to be able to operate.
What exactly is Resolume reading from disk with a completely empty deck (outputting just black, which is generated by Reso)? There is nothing to read. There is no universal black footage on the drive that Resolume reads when there are no clips in the deck.

Easy proof: run Arena with the outputs enabled on the BM card. FPS drops to 30. At the same time, open the same composition in Resolume Avenue, with the same sources (loops) in the deck, outputting via the main card. And voilà, there is no fps drop in Avenue. Both Resolumes are running together, rendering the scene on one GPU. And adding an M.2 drive on a PCI-E adapter in the next PCI-E slot and copying a huge amount of data from this drive to another has ZERO impact on the FPS in both Resolumes, so there is no problem with the controller.

Menno
Team Resolume
Posts: 147
Joined: Tue Mar 12, 2013 13:56

Post by Menno »

Resolume's output system is single threaded and processes outputs sequentially. That means it's using a single cpu core and only processes a single output at the same time, one after another. This causes these hard to interpret situations when measuring average utilization over a set amount of time, because at one point we need the gpu to do some work, then we need some transfers to happen and then we need the cpu to do some work.

For example, let's say in the first second we need 100% of the gpu and 0% of the cpu. Then in the second second we need 0% of the gpu and 100% of the cpu. If the measurement tool averages these values over that two second period you would see an average utilization of 50% gpu and 50% cpu, even though we're waiting for the hardware to complete the work.
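
The averaging effect in this example is easy to reproduce. The sketch below uses exactly the two one-second phases described above; the variable names are made up for the illustration.

```python
# Each phase is (duration in seconds, GPU load, CPU load), loads from 0.0 to 1.0.
# These are the two phases from the example above.
phases = [
    (1.0, 1.0, 0.0),  # first second: GPU fully busy, CPU idle
    (1.0, 0.0, 1.0),  # second second: GPU idle, CPU fully busy
]

total_s = sum(duration for duration, _, _ in phases)

# What a monitoring tool reports when it averages over the whole window.
avg_gpu = sum(duration * gpu for duration, gpu, _ in phases) / total_s
avg_cpu = sum(duration * cpu for duration, _, cpu in phases) / total_s

# Share of the window in which at least one resource is saturated,
# i.e. the pipeline is waiting on hardware rather than idling.
saturated = sum(d for d, gpu, cpu in phases if max(gpu, cpu) >= 1.0) / total_s

print(f"average GPU load: {avg_gpu:.0%}")
print(f"average CPU load: {avg_cpu:.0%}")
print(f"time with one resource saturated: {saturated:.0%}")
```

Both averages come out at 50%, yet some resource is saturated for the entire window, so the averages alone say little about whether there is headroom.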

We're aware that there are ways for us to achieve higher utilization and as a result obtain a higher framerate. For example, we could process multiple outputs in parallel or asynchronously; however, this almost always results in higher memory usage and sometimes may introduce an extra frame of delay in the output.
Some day when time allows we can make some improvements there.
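
The trade-off described above can be illustrated with a toy timing model. This is not how Resolume is implemented: the function names and the example costs (10 ms to render, 2 ms per output) are hypothetical, and the "pipelined" case models only the asynchronous variant, where sending frame N overlaps with rendering frame N+1.

```python
def sequential_frame_ms(render_ms: float, output_ms: float, outputs: int) -> float:
    """Render, then send each output one after another on the same thread."""
    return render_ms + output_ms * outputs


def pipelined_frame_ms(render_ms: float, output_ms: float, outputs: int) -> float:
    """Overlap rendering of the next frame with sending of the current one.

    Throughput is bound by the slower of the two stages. The cost is that
    a finished frame has to be kept in memory while it is being sent, and
    that it reaches the outputs one frame later.
    """
    return max(render_ms, output_ms * outputs)


RENDER_MS = 10.0  # hypothetical render cost per frame
OUTPUT_MS = 2.0   # hypothetical transfer cost per output
OUTPUTS = 8

for name, model in [("sequential", sequential_frame_ms),
                    ("pipelined", pipelined_frame_ms)]:
    ms = model(RENDER_MS, OUTPUT_MS, OUTPUTS)
    print(f"{name:10s}: {ms:4.1f} ms/frame = {1000 / ms:.1f} fps")
```

With these made-up costs the sequential model lands at 26 ms per frame (about 38 fps) and the pipelined one at 16 ms (about 62 fps), at the price of the extra buffered frame.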

You shouldn't worry about fps dropping when outputting 'nothing' though. There is no such thing as outputting 'nothing'. In the end, when you're playing nothing you have a fully transparent output. Because we cannot actually make your outputs transparent we need to compose this onto black, and then we're outputting black. If you want no output you disable the output; if the output is enabled, the least amount of output you can get is black. This is especially noticeable on capture device outputs, as we still need to download this big area of black over the pci bus from the gpu into ram and then send it back over the pci bus to the capture card. This is a lot of work that's unaffected by the color of the image being sent around: it could be black, purple or pink, it still has to happen :)
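
A small sketch of that last point: an uncompressed frame occupies the same number of bytes whatever it shows. The 24 bits per pixel and the FHD size are assumptions for the illustration, and `solid_frame` is a made-up helper.

```python
def solid_frame(width: int, height: int, rgb: tuple[int, int, int]) -> bytes:
    """Build an uncompressed frame filled with one colour (3 bytes per pixel)."""
    return bytes(rgb) * (width * height)


black = solid_frame(1920, 1080, (0, 0, 0))
pink = solid_frame(1920, 1080, (255, 105, 180))

# Same size in memory, so the same amount of data to move over the bus.
print(len(black), len(pink))
```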

Post by Ladin »

Thank you for the answer.
Menno wrote: Tue May 14, 2019 15:00 Resolume's output system is single threaded and processes outputs sequentially. That means it's using a single cpu core and only processes a single output at the same time, one after another. This causes these hard to interpret situations when measuring average utilization over a set amount of time, because at one point we need the gpu to do some work, then we need some transfers to happen and then we need the cpu to do some work.

We're aware that there's ways for us to achieve higher utilization and as a result obtain a higher framerate. For example we could process multiple outputs in parallel or asynchronously, however this almost always results in higher memory usage and sometimes may introduce an extra frame of delay in the output.
Some day when time allows we can make some improvements there.
Please focus on it. Please. Maybe that day is here. Arena is sold as a mediaserver, which is used for outputting media (videos, loops) over multiple outputs, for edge blending, etc. And it's a pity that it is useless for outputting those videos smoothly using officially supported I/O cards. I love Reso and working with it. I have been looking forward to the new version, hoping for some improvement. To be honest, I would appreciate smooth playback more than fancy new features if we are talking about a mediaserver.

Thank you.

L.

chq
Posts: 36
Joined: Sat Jul 13, 2019 09:05

Post by chq »

100% my opinion too. You should delete the word „mediaserver“ from your website. A mediaserver can smoothly play back multiple UHDp60 files, for example.

Greetz Chris
