Hi all,

I just wanted to share a hobby project I've worked on over the last few weeks to see if I could get a local language model to run on runtime resources. The general inference workflow for LLMs I have seen is running a separate service and using API ...
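For context, that usual separate-service workflow looks roughly like the sketch below: a local inference server queried over HTTP. This is just an illustration; the endpoint URL, port, and model name are assumptions (an OpenAI-compatible local server along the lines of llama.cpp's server or Ollama), not part of my project.

```python
# Minimal sketch of the separate-service workflow: the application talks
# to a locally running inference server over an HTTP API.
# The URL, port, and model name are assumptions for illustration.
import json
import urllib.request

def ask_local_llm(prompt: str) -> str:
    payload = {
        "model": "local-model",  # hypothetical model name
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",  # assumed local endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask_local_llm("Hello from a separate-service setup!"))
```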