I know of development teams that have used Blue Prism to interface with Citrix Desktops in the past using Surface Automation, so what you are describing should be possible.
Where your mouse pointer sits within the VM when you start interfacing with it is irrelevant, because you have to give focus to any element you want to interact with by sending a Global Mouse Click to it, using Image Search to find exactly where to click. Based on your description, your logic would therefore be something like:
-- Click within the VM to give the VM desktop focus, so that your mouse and keyboard start driving the VM desktop rather than your local desktop
-- Use Image Search - Find Image to find an image of where within the VM you want to click or interface
-- Send a Global Click to the location within the desktop that you want to interact with, based on the x/y result of the image search
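To make the find-then-click logic above concrete, here is a rough sketch in Python. It is not Blue Prism code and does not use any Blue Prism API; in Blue Prism you would build this from the Image Search and Global Mouse Click actions. The function names (find_image, click_point, locate_click_target) are made up for illustration, and the sketch assumes an exact pixel match on small grayscale grids, whereas a real image search normally allows some tolerance.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels, one inner list per row


def find_image(screen: Image, template: Image) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the top-left corner of the first exact match, else None."""
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            # Compare the template row by row against the screen at this offset.
            if all(screen[y + dy][x:x + tw] == template[dy] for dy in range(th)):
                return x, y
    return None


def click_point(match: Tuple[int, int], template: Image) -> Tuple[int, int]:
    """Centre of the matched region, i.e. where the global click should land."""
    x, y = match
    return x + len(template[0]) // 2, y + len(template) // 2


def locate_click_target(screen: Image, template: Image) -> Optional[Tuple[int, int]]:
    """Image search followed by computing the click coordinates."""
    match = find_image(screen, template)
    if match is None:
        return None  # image not on screen: do not click blindly
    return click_point(match, template)


# Hypothetical 6x5 "screenshot" with a 2x2 "button" drawn at x=3, y=2.
button = [[1, 2], [3, 4]]
screen = [[0] * 6 for _ in range(5)]
screen[2][3:5] = button[0]
screen[3][3:5] = button[1]

print(locate_click_target(screen, button))  # (4, 3)
```

The point of returning None when nothing matches is that a global click is sent to raw screen coordinates, so it is worth checking that the image was actually found before clicking.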
One thing to watch out for is that Blue Prism itself will continually regain focus if you are debugging as you build, which will stop your Surface Automation interface from working correctly. To prevent this, the only way to test as you build is to run your flow to breakpoints.