I have a couple of clients that have done just what you suggest. I created the 'Surface Automation of Terminal Emulators' training based upon that experience, and it is worth a read. It is not ideal, but it is certainly possible.
Interfacing with applications running within a virtual machine environment (i.e. running in a VDI like Citrix XenDesktop rather than virtualised as an application like XenApp) is probably the most difficult thing to do. Effectively the entire desktop is a Surface Automation application, and work needs to be done to build interfaces for the simple stuff (switching between apps, starting apps, etc). Everything you do needs to be based upon image locations.
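To give a feel for what "based upon image locations" means, here is a rough Python sketch of the idea. It is not Blue Prism code and not how Blue Prism implements region matching; the names `find_image` and `click_point` and the tiny pixel grids are made up for illustration. It searches a captured screen for a known icon and works out where a click would have to land.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels, stored row by row

def find_image(screen: Image, template: Image) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the top-left corner of the first exact match, or None."""
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            # Compare each template row with the same-sized slice of the screen.
            if all(screen[y + dy][x:x + tw] == template[dy] for dy in range(th)):
                return (x, y)
    return None

def click_point(screen: Image, template: Image) -> Optional[Tuple[int, int]]:
    """Return the centre of the matched region, i.e. where a click would go."""
    origin = find_image(screen, template)
    if origin is None:
        return None
    x, y = origin
    return (x + len(template[0]) // 2, y + len(template) // 2)

# Hypothetical 6x5 "desktop" with a 2x2 "icon" drawn at x=3, y=2.
screen = [[0] * 6 for _ in range(5)]
screen[2][3:5] = [9, 8]
screen[3][3:5] = [7, 6]
icon = [[9, 8], [7, 6]]

print(find_image(screen, icon))
print(click_point(screen, icon))
```

An exact pixel comparison like this is fragile, which is a large part of why this approach takes so much work: any change in resolution, colour depth or theme inside the virtual desktop can stop the image from matching.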
If possible, I also recommend using Blue Prism v6, where there have been some significant improvements to Surface Automation.