Generative AI is a potent tool that, used well, can improve efficiency, creativity, and speed. It is not without limitations, however: users of tools like ChatGPT often encounter plausible-sounding but false output, known in the AI field as “hallucinations.”
These models are trained on vast amounts of data, essentially a snapshot of a large part of the public Internet taken at a fixed point in time. When ChatGPT was first released, for instance, its training data stopped well over a year earlier, so it knew nothing about more recent events. Newer Large Language Models (LLMs) may be trained on more recent data, but every model has such a cut-off.
We’ve established that tools like ChatGPT are built on a model. But what if there were several models? Is it feasible to train an LLM on your own data to create a personalised version of “ChatGPT”? The answer is a qualified yes: you can train (or, more commonly, fine-tune) an LLM on your data, and you can create multiple models, each for a specific purpose.
This concept piqued my interest because, in theory, you can sandbox the training data and train a bespoke model for your organisation. Proprietary information then stays within the organisation while still being readily accessible to employees.
However, the practicalities of this approach raise several questions:
- What volume of data is necessary to create a useful LLM for a typical enterprise?
- If you opt for a smaller LLM, how much computing power will it consume? Could it, for example, run on a standard server?
- More intriguingly, could a Small Language Model (SLM) run on a device like a mobile phone, tablet, or even a smartwatch?
The answers matter to anyone in software product management who wants to innovate with proprietary datasets and perhaps integrate AI into applications running on devices in the hands of employees or customers. They matter equally to corporate IT professionals facing the challenge of infusing AI into enterprise applications.
The Potential and Challenges of Small Language Models
While established LLM-based services such as ChatGPT and Copilot offer APIs that give access to their AI models, this comes with certain drawbacks. Firstly, the output these models generate can be biased or simply wrong, which could harm an organisation’s brand if the functionality is embedded in a product or a chatbot on your website.
Moreover, connecting to these LLM platforms requires a cloud connection through the firewall and typically involves a subscription. Many companies may hesitate to embed such functionality into their products or services to the point where it becomes mission-critical. What if the vendor suddenly changes the rules, is acquired, or goes out of business? How do you prevent confidential information from leaking and being used to train models that serve competitors and others? LLMs are also constantly evolving, and while this is generally seen as a positive (they are improving), one person’s improvement could be another’s regression. How acceptable all this is depends on how much risk you are willing to take.
So, let’s return to the question of Small Language Models (SLMs) and the feasibility of training a model on your own data to create a sandboxed model that you control and “own”, to the point where it can be a reliable, mission-critical part of your product or service.
What are Small Language Models?
SLMs are lightweight generative AI models that put the technology within reach of smaller organisations and startups, allowing them to develop and deploy generative AI in their own products or services.
An SLM is characterised by three main factors:
- a smaller size (a smaller neural network with fewer parameters),
- less training data (trained on a smaller dataset),
- and reduced computational requirements (lower processor and memory usage, which makes them suitable for on-device deployment).
I recently learned of Microsoft’s release of Orca 2, an SLM that comes in two sizes: 7 billion or 13 billion parameters. Microsoft claims that this SLM can perform close to much larger LLMs with relatively minor “tuning”, provided the model is targeted at a specialised domain. At the time of writing, I do not believe Orca 2 is quite ready for the applications I have in mind, but the pace of innovation is rapid, and there’s a good chance it will mature enough to include in product roadmaps soon.
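To put the parameter counts above into perspective for the “could it run on a standard server, or a phone?” question, here is a back-of-the-envelope sketch in Python. It assumes that memory is dominated by the model weights themselves (it ignores activations, caches, and runtime overhead), and the precision labels and byte sizes are generic numeric formats, not figures published for Orca 2. Whether a given model is actually shipped or run in quantised (int8/int4) form is likewise an assumption here.

```python
# Back-of-the-envelope estimate of weight memory for a language model.
# Assumption: memory use is dominated by the parameters; activations,
# caches and runtime overhead are ignored, so real usage will be higher.

# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # The two model sizes mentioned for Orca 2.
    for label, params in [("7B", 7e9), ("13B", 13e9)]:
        row = ", ".join(
            f"{p}: {weight_memory_gb(params, p):.1f} GB" for p in BYTES_PER_PARAM
        )
        print(f"{label} -> {row}")
```

Under these assumptions, a 7-billion-parameter model needs roughly 14 GB for its weights at 16-bit precision but about 3.5 GB at 4-bit, which suggests why reducing precision matters when targeting a standard server, let alone a phone or watch.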
Examples of SLM deployments might include a customer support chatbot, a technical troubleshooting guide, a corporate intranet page that summarises important updates on the go, medical or safety diagnosis built into a portable device like a phone or watch, and perhaps a specialised code autocompletion bot that adheres to your corporate software development standards and style.
Generative AI SLMs can run “on the edge”, meaning they use local computing resources and do not rely on cloud services. This opens up the possibility of keeping your AI inside the firewall, where information security is well controlled. Consultants and developers selling software products into the larger enterprise market will understand the importance of this.
There is clearly a lot to learn about Generative AI, and SLMs in particular. My interest stems from the challenge of embedding AI into existing software products. Infusing AI into a new software product will require a host of new skills and a fresh perspective on product development. As I discover more, I will update my blog, so feel free to subscribe if this topic interests you.