IBM is cementing its place at the top of the open source AI leaderboard with the new Granite 3.1 series, available today.
The Granite 3.1 large language models (LLMs) offer enterprise users an extended context length of up to 128,000 tokens, new embedding models, integrated hallucination detection and improved performance. According to IBM, the new Granite 8B Instruct model beats open source rivals of the same size, including Meta Llama 3.1, Qwen 2.5 and Google Gemma 2, on a series of academic benchmarks included in the OpenLLM Leaderboard.
The new models are part of an accelerated release cadence for IBM’s open source Granite family. Granite 3.0 was only just released in October, when IBM claimed a $2 billion book of business related to generative AI. With the Granite 3.1 update, IBM is focused on packing more capability into smaller models, the basic idea being that smaller models are easier and more cost-effective for enterprises to run.
“We’ve also boosted all the numbers; the performance of pretty much everything across the board has improved,” David Cox, vice president of AI models at IBM Research, told VentureBeat. “We use Granite for many different use cases. We use it internally at IBM for our products, we use it for consulting, we make it available to our customers and we release it as open source, so we have to be good at pretty much everything.”
Why are performance and small model sizes important for enterprise AI?
There are a number of ways that organizations can evaluate LLM performance, including benchmarks.
The direction IBM is taking is to run its models through a gamut of academic benchmarks as well as real-world tests. Cox emphasized that IBM tests and trains its models to be optimized for enterprise use cases, where performance is not just an abstract measure of speed but a more nuanced measure of how efficiently a model delivers useful results.
One aspect of efficiency that IBM aims to drive is helping users spend less time getting the results they want.
“You should spend less time fiddling with prompts,” Cox said. “The less time you have to spend on prompt engineering, the better.”
Performance also ties into model size. The larger the model, the more compute and GPU resources it generally requires, which in turn means more cost.
“When people do initial prototyping work, they often jump to very large models, so you might go to a 70-billion-parameter model or a 405-billion-parameter model to build your prototype,” Cox said. “But the reality is that many of those are uneconomical, so the other thing we try to do is drive as much capacity as possible into the smallest possible package.”
Context matters for enterprise-grade AI agents
In addition to the promised efficiency and performance improvements, IBM has dramatically expanded Granite’s context length.
In the initial release of Granite 3.0, the context length was limited to 4k tokens. With Granite 3.1, IBM has expanded that to 128k, allowing much longer documents to be processed. The extended context is a significant upgrade for enterprise AI users, both for retrieval-augmented generation (RAG) and for agentic AI.
Agentic AI systems and AI agents often need to process and reason over longer sequences of information, such as large documents, logs or extended conversations. The 128k context length allows these systems to take in more contextual information, so they can better understand and respond to complex queries or tasks.
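As an illustration, here is a minimal sketch of what putting that longer window to work could look like with the Hugging Face Transformers library. The model id and input file below are assumptions for the example, not details confirmed by IBM; substitute whichever Granite 3.1 checkpoint and document apply.

```python
# Minimal sketch: feeding a long document to a Granite 3.1 instruct model.
# The model id and input file are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.1-8b-instruct"  # assumed Hugging Face id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# With a 128k-token window, a long report plus the question can fit in one prompt.
with open("quarterly_report.txt") as f:
    document = f.read()

messages = [{
    "role": "user",
    "content": f"Summarize the key risks discussed in this report:\n\n{document}",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```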
IBM also released a series of embedding models to help accelerate the process of converting data into vectors. The Granite-Embedding-30M-English model can deliver performance of 0.16 seconds per query, which IBM claims is faster than rival options including Snowflake’s Arctic.
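For a rough sense of how such an embedding model slots into a retrieval workflow, here is a sketch using the sentence-transformers library. The Hugging Face model id below is an assumption; check IBM’s published checkpoints for the exact name and supported loaders.

```python
# Minimal sketch: encoding documents and a query for similarity search.
# The model id is an illustrative assumption.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ibm-granite/granite-embedding-30m-english")

docs = [
    "Granite 3.1 extends the context window to 128k tokens.",
    "The Granite embedding models target retrieval workloads.",
]
query = "How long is the Granite 3.1 context window?"

doc_vecs = model.encode(docs)       # one vector per document
query_vec = model.encode(query)     # vector for the query
print(util.cos_sim(query_vec, doc_vecs))  # cosine similarity per document
```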
How has IBM improved Granite 3.1 to meet enterprise AI needs?
How did IBM manage to improve the performance of Granite 3.1? It wasn’t any one specific thing, Cox explained, but rather a series of process and technical innovations.
IBM has developed a more advanced multi-stage training pipeline, he said, which allows the company to extract more performance from its models. Data is also a critical part of any LLM training effort: rather than focusing on increasing the volume of training data, IBM has concentrated on improving the quality of the data used to train the Granite models.
“It’s not a volume game,” Cox said. “It’s not like we can go out and get 10 times more data and that will magically make the model better.”
Reducing hallucinations directly in the model
A common way to reduce the risk of hallucinations and errant outputs from an LLM is to use guardrails, which are generally deployed as external capabilities alongside the LLM.
With Granite 3.1, IBM is integrating hallucination protection directly into the model. The Granite Guardian 3.1 8B and 2B models now include hallucination detection for function calling.
“The model can do its own guardrailing, and that can give developers different opportunities to catch things,” Cox said.
Detecting hallucinations within the model itself optimizes the overall workflow, he explained: internal detection means fewer inference calls, making the process more efficient and accurate.
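To make the idea concrete, here is a hypothetical sketch of a groundedness check, treating a guardian-style model as a judge of whether a response is supported by its source context. The model id and the plain-text prompt format are assumptions for illustration; the actual Granite Guardian risk definitions and prompt template are documented on IBM’s model cards.

```python
# Hypothetical sketch of a groundedness check with a guardian-style model.
# The model id and prompt format are assumptions; consult the Granite
# Guardian model card for the supported risk definitions and template.
from transformers import AutoModelForCausalLM, AutoTokenizer

guardian_id = "ibm-granite/granite-guardian-3.1-2b"  # assumed id
tokenizer = AutoTokenizer.from_pretrained(guardian_id)
model = AutoModelForCausalLM.from_pretrained(guardian_id, device_map="auto")

context = "Granite 3.1 extends the context window to 128k tokens."
answer = "Granite 3.1 supports a 1M-token context window."  # ungrounded claim

prompt = (
    "Context:\n" + context + "\n\n"
    "Response:\n" + answer + "\n\n"
    "Is the response fully supported by the context? Answer Yes or No."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=4)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```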
How can organizations use Granite 3.1 today, and what’s next?
The new Granite models are all now freely available as open source to enterprise users. They are also available via IBM’s Watsonx enterprise AI service and will be integrated into IBM’s commercial products.
The company plans to keep updating the Granite models at a rapid pace. Looking ahead, the plan for Granite 3.2 is to add multimodal functionality, set to debut in early 2025.
“You’ll see us adding different differentiated features over the next few point releases, leading up to the things we’ll be announcing at next year’s IBM Think conference,” Cox said.