22-03-26 01:49 PM
I am looking for practical patterns teams use to handle exceptions before they pile up.
What has worked best for you?
- a dedicated exception queue with SLAs
- tagging by retryable vs permanent failure
- auto-routing to business users for data fixes
- daily review windows vs real-time triage
I am especially curious about what scales once volumes increase and you need consistent handoff between ops and bot owners.
25-03-26 05:43 AM
Hello, I haven't seen a dedicated queue used just to report exceptions. Usually, you capture the exception and mark it as a Business Exception or a System Exception from within the process itself. Then, at the queue level, you can filter items by status to separate the successful ones from the exceptions.
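To make that concrete, here is a rough sketch in plain Python rather than any specific RPA tool's API. Every name in it (`QueueItem`, `run_item`, `filter_items`, the status strings) is made up for illustration; the only idea taken from the above is tagging each failed item as Business or System and then filtering on that tag.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class BusinessException(Exception):
    """Bad or missing data; retrying the same item will not help."""


class SystemException(Exception):
    """Application or environment failure; a retry may succeed."""


@dataclass
class QueueItem:
    # Hypothetical stand-in for a queue item; not a real vendor type.
    reference: str
    status: str = "New"                    # "New", "Successful", or "Failed"
    exception_type: Optional[str] = None   # "Business" or "System" when failed
    reason: str = ""


def run_item(item: QueueItem, work: Callable[[QueueItem], None]) -> QueueItem:
    """Run the work for one item and record the outcome on the item."""
    try:
        work(item)
        item.status = "Successful"
    except BusinessException as exc:
        item.status, item.exception_type, item.reason = "Failed", "Business", str(exc)
    except SystemException as exc:
        item.status, item.exception_type, item.reason = "Failed", "System", str(exc)
    return item


def filter_items(
    items: List[QueueItem],
    status: Optional[str] = None,
    exception_type: Optional[str] = None,
) -> List[QueueItem]:
    """Return the items matching the given status and/or exception type."""
    return [
        i for i in items
        if (status is None or i.status == status)
        and (exception_type is None or i.exception_type == exception_type)
    ]
```

With something like this, `filter_items(items, exception_type="Business")` gives the list that would go to business users for data fixes, while the System ones stay with the bot owners.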
The retry logic can also be built into the main process itself for such exceptions.
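A minimal sketch of what in-process retry could look like, again in generic Python and not tied to any product. It assumes that only System Exceptions are worth retrying and that Business Exceptions should fail immediately; `process_with_retry`, `max_retries`, and `delay_seconds` are hypothetical names chosen for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class BusinessException(Exception):
    """Permanent failure caused by the data; do not retry."""


class SystemException(Exception):
    """Possibly transient failure; eligible for retry."""


def process_with_retry(
    work: Callable[[], T],
    max_retries: int = 2,
    delay_seconds: float = 0.0,
) -> T:
    """Call work(), retrying only on SystemException.

    Makes at most max_retries + 1 attempts in total. The last
    SystemException is re-raised once the retries are used up.
    """
    attempt = 0
    while True:
        try:
            return work()
        except BusinessException:
            # Permanent failure: surface it right away for a data fix.
            raise
        except SystemException:
            if attempt >= max_retries:
                raise
            attempt += 1
            time.sleep(delay_seconds)
```

Keeping the retry count as a parameter makes it easy to tune per process once volumes grow, instead of hard-coding it in each workflow.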
Let me know if you would like to brainstorm this further; I am happy to help.