Thanks for picking up the challenge.
It's especially interesting to see that just playing a single triplewide clip in a triplewide comp is faster than all the other workarounds. If OP's need for a workaround came from poor performance with this method, I doubt the workarounds will actually improve anything.
It just goes to show that taking a whole bunch of pixels at the same time and doing operations on them is exactly what a GPU is good at, and it knows how to do it best. Often, the more you try to be smart and 'help' it, the harder you actually make its job.