TinyAgent is a small language model (SLM) agent that shows how the computational footprint of function-calling agents can be reduced enough for effective, privacy-preserving deployment at the edge. Recent large language models such as GPT-4 and Gemini-1.5 have enabled AI agents that can reason about a user's command and orchestrate function calls to execute it. However, their large model size and computational requirements usually force inference to run in the cloud, which raises issues such as privacy concerns and dependence on connectivity. TinyAgent addresses these concerns by running the language model locally at the edge.
Prior to TinyAgent, deploying language models locally was a challenge because these models are typically trained to memorize general world knowledge in their parametric memory, which drives up model size. The primary research question that TinyAgent explores is: can a much smaller language model, with significantly less parametric memory, emulate the emergent function-calling ability of these larger models?
The findings indicate that it is indeed possible: an SLM can be trained to execute specific tasks efficiently without extensive world knowledge. A Siri-like assistant, for instance, does not require large-scale world knowledge, but rather effective reasoning to orchestrate the right functions and tools to accomplish the user's command.
A measure of success was established to evaluate the model. The objective is to generate the right plan, which involves not only selecting the correct set of functions but also orchestrating them in the right order. The success rate is computed as the percentage of user queries for which the model produces a correct plan.
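To make the metric concrete, here is a minimal sketch of how such a success rate could be computed, assuming plans are represented as ordered lists of function calls with their arguments. The `Call` class and the example function names are illustrative assumptions, not TinyAgent's actual evaluation code.

```python
# Minimal sketch of the success-rate metric: a plan counts as correct only if
# the predicted calls match the reference calls exactly and in the same order.
# The plan representation and names below are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    function: str
    arguments: tuple  # e.g. (("recipient", "alice@example.com"),)

def plan_is_correct(predicted: list[Call], reference: list[Call]) -> bool:
    return predicted == reference

def success_rate(predictions: list[list[Call]], references: list[list[Call]]) -> float:
    correct = sum(plan_is_correct(p, r) for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

if __name__ == "__main__":
    ref = [Call("get_email_address", (("name", "Alice"),)),
           Call("compose_email", (("recipient", "<result_0>"),))]
    good = list(ref)                                      # same functions, same order
    bad = [Call("compose_email", (("recipient", "<result_0>"),))]  # missing a step
    print(success_rate([good, bad], [ref, ref]))          # prints 50.0
```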
Synthesized data served as a critical element in building the models' function-calling capability. Data creation involved generating realistic user queries together with their associated function-calling plans and input arguments, yielding 80K training, 1K validation, and 1K test examples at a total cost of about $500.
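For illustration, one synthesized record might look like the following. The field names and the `<result_i>` placeholder convention for chaining outputs are assumptions made for this sketch, not TinyAgent's exact schema.

```python
# Illustrative shape of one synthesized training example: a natural-language
# query paired with its ground-truth function-calling plan and input arguments.
example = {
    "query": "Email the notes from today's meeting to Sarah.",
    "plan": [
        {"function": "get_email_address", "arguments": {"name": "Sarah"}},
        {"function": "get_note", "arguments": {"title": "today's meeting"}},
        {"function": "compose_email", "arguments": {
            "recipient": "<result_0>",  # output of the first call
            "body": "<result_1>",       # output of the second call
        }},
    ],
}
```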
Once the data were in place, the SLMs were fine-tuned, and a Tool RAG method was integrated for efficiency: rather than listing every available tool, the input prompt includes only the descriptions of the tools relevant to the user query. This roughly halved the prompt size (about 2x fewer tokens).
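A minimal sketch of the Tool RAG idea follows: score every tool description against the user query and put only the top-k descriptions into the prompt. The keyword-overlap scoring and the tool list below are stand-ins for illustration; TinyAgent's actual retriever is not reproduced here, and a real system would use a trained retrieval model.

```python
# Tool RAG sketch: keep only the tool descriptions most relevant to the query,
# so the prompt stays short. Scoring here is naive keyword overlap (a stand-in
# for a trained retriever), and the tool list is illustrative.
TOOLS = {
    "compose_email": "Compose a new email with a recipient, subject, and body.",
    "create_calendar_event": "Create a calendar event with a title, time, and invitees.",
    "get_directions": "Get driving directions between two locations in Maps.",
    "create_reminder": "Create a reminder with a title and due date.",
}

def score(query: str, description: str) -> int:
    return len(set(query.lower().split()) & set(description.lower().split()))

def retrieve_tools(query: str, k: int = 2) -> list[str]:
    ranked = sorted(TOOLS, key=lambda name: score(query, TOOLS[name]), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    tool_block = "\n".join(f"- {name}: {TOOLS[name]}" for name in retrieve_tools(query))
    return f"Available tools:\n{tool_block}\n\nUser: {query}\nPlan:"

print(build_prompt("Set up a calendar event with Alice for Friday"))
```

Only the retrieved subset of tool descriptions reaches the model, which is where the roughly 2x prompt-size reduction comes from.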
As the focus shifted towards edge deployment, a good deal of work went into 4-bit quantization. Deploying SLMs on a MacBook with limited computational resources can be challenging: loading the model parameters alone can consume a large portion of the available memory. Quantizing the model to 4-bit precision stores the weights at reduced bit width and shrinks this memory footprint.
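The memory saving is easy to see with a toy round-to-nearest quantizer that stores each group of weights as 4-bit codes plus a per-group scale and offset. This is a generic sketch, not the specific quantization scheme used by TinyAgent.

```python
# Toy illustration of 4-bit weight quantization: each group of weights is mapped
# to 16 levels and stored as 4-bit codes plus a per-group scale/offset, roughly
# quartering memory relative to fp16. Generic round-to-nearest, for illustration only.
import numpy as np

def quantize_4bit(weights: np.ndarray, group_size: int = 32):
    w = weights.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    scale = (w.max(axis=1, keepdims=True) - w_min) / 15.0  # 2**4 - 1 levels
    codes = np.clip(np.round((w - w_min) / scale), 0, 15).astype(np.uint8)
    return codes, scale, w_min

def dequantize_4bit(codes, scale, w_min, shape):
    return (codes * scale + w_min).reshape(shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((1024, 1024)).astype(np.float32)
    codes, scale, w_min = quantize_4bit(w)
    w_hat = dequantize_4bit(codes, scale, w_min, w.shape)
    fp16_bytes = w.size * 2
    int4_bytes = w.size // 2 + scale.size * 4 + w_min.size * 4  # packed codes + metadata
    print(f"max abs error: {np.abs(w - w_hat).max():.4f}")
    print(f"fp16: {fp16_bytes / 1e6:.1f} MB  ->  4-bit: {int4_bytes / 1e6:.1f} MB")
```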
Conclusion: Overall, TinyAgent demonstrates that it is feasible to train a small language model (SLM), deploy it on a MacBook as a Siri-like assistant for the Mac, keep the deployment local and private, and still exceed GPT-4-Turbo's success rate.
Disclaimer: The above article was written with the assistance of AI. The original sources can be found on the Berkeley Artificial Intelligence Research (BAIR) blog.