Do you agree with the general direction of GenAI/LLMs?
The aspiration is to reach Artificial General Intelligence (AGI). As an aside, the definition of AGI still eludes us: its meaning can range from "God" to "generalized foundational models" built on unique AI architectures. That is a different blog for a different day.
The question is: Is this approach sustainable, and will it lead to yet another "generalized" solution with a limited application scope? Consider these widely reported (not officially confirmed) specifications and metrics for GPT-4:
- Has 1.8+ trillion parameters across 120 layers
- Uses a Mixture of Experts (MoE) design with 16 experts, each with ~111B multi-layer perceptron (MLP) parameters
- Trained on ~13T normalized tokens
- Training cost of over $60 million
- Inference cluster size: 128 GPUs, with 8-way tensor parallelism and 16-way pipeline parallelism
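As a quick sanity check on how these figures fit together, here is a small sketch that multiplies them out. The numbers are simply the ones quoted in the list above; the constant and function names are my own and purely illustrative.

```python
# Figures as quoted in the list above (reported, not official).
NUM_EXPERTS = 16
PARAMS_PER_EXPERT = 111_000_000_000  # ~111B MLP parameters per expert
TENSOR_PARALLEL = 8                  # 8-way tensor parallelism
PIPELINE_PARALLEL = 16               # 16-way pipeline parallelism


def expert_parameters(num_experts: int, params_per_expert: int) -> int:
    """Total parameters held in the expert MLPs alone."""
    return num_experts * params_per_expert


def gpus_required(tensor_parallel: int, pipeline_parallel: int) -> int:
    """Each pipeline stage is split across `tensor_parallel` GPUs."""
    return tensor_parallel * pipeline_parallel


total_expert_params = expert_parameters(NUM_EXPERTS, PARAMS_PER_EXPERT)
cluster_gpus = gpus_required(TENSOR_PARALLEL, PIPELINE_PARALLEL)

print(f"Expert parameters: {total_expert_params / 1e12:.3f} trillion")
print(f"GPUs in inference cluster: {cluster_gpus}")
```

The experts alone come to roughly 1.78 trillion parameters, which is in line with the 1.8+ trillion total quoted above, and 8 x 16 gives the 128 GPUs of the inference cluster.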
At this scale, GPT-4 is trained on a very large corpus (which is still a small subset of the far larger body of available text and images) and is up to date only as of its last training run. To avoid retraining day after day, month after month and year after year, these models are extended with Retrieval-Augmented Generation (RAG), a framework that draws on external sources of knowledge to improve the quality of responses. The technique improves the accuracy of LLMs and reduces the need for continuous training.
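To make the RAG idea concrete, here is a minimal sketch of the retrieval step, assuming a tiny in-memory knowledge base and a simple bag-of-words cosine similarity in place of learned embeddings. The documents, the query and the function names (`retrieve`, `build_prompt`) are all hypothetical, and no actual LLM is called; the sketch stops at the augmented prompt that would be sent to one.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Prepend the retrieved context to the question (the 'augmented' part)."""
    context = "\n".join(retrieve(query, documents, k))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer using only the context above."
    )


# Hypothetical external knowledge the model was never trained on.
docs = [
    "The cafeteria menu changes every Monday.",
    "The Q3 maintenance window for the billing system is on 14 September.",
    "Parking permits are renewed each January.",
]
query = "When is the maintenance window for the billing system?"

print(build_prompt(query, docs))
```

The point of the design is that the knowledge base can be updated at any time without touching the model's weights; only the retrieved context changes.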
Also, at a fundamental level, the process is quite complicated, starting from the quality of the data (corpus) and extending to the generation of accurate, representative embeddings. The approach:
- Requires a very large, costly infrastructure
- Requires a very large corpus for training; hallucinations increase as the corpus size decreases
- Increasingly relies on the use of Mixture of Experts (MoE) architectures
- Does not rely on rules (of grammar); language semantics and ontology are not explicit
- Does not “understand” the meaning of words or their context; hallucinations arise from low-probability vector/sequence matches, incomplete or conflicting training data, and misinformation in the training set
My viewpoint: Human decisions require deterministic outcomes, and outputs that can be trusted. To achieve that, the tokenization algorithms need to be more robust, so that the vectors they generate improve determinism. A recommended approach would be to incorporate a mapping to a grammar, using a modified form of BNF (Backus-Naur Form), to generate business domain ontologies and semantics in parallel as part of the embedding space. For example, the word “Transformer” would be further qualified by business domain (toys, power, AI, etc.) before output sequences are generated. This can be done by the router within the MoE, or more generally by the LLM itself, using the ontological and semantic vectors to narrow the output to the relevant business space. This could also reduce hallucinations even when the corpus is limited.