Analysis

Who’s the King of LLMs? GPT-4 vs. Claude 3 vs. Gemini 1.5 vs. Mistral Large

Jakob Steinschaden, 17 April 2024, 14:00

King of the bots. © DALL-E / Trending Topics


Hardly a day goes by without a new large language model (LLM) being published. Sometimes it is organizations and companies from the open-source sector that bring new language models to market, sometimes it is the billion-dollar Silicon Valley startups that are upgrading their AI models. One thing is clear: the king that dominated the scene for a long time has now been defeated, because a spin-off from OpenAI, backed by billions from Google and Amazon, has managed to knock GPT-4 off the throne.

Well, at least as far as the technical side is concerned. Anthropic, which was founded by former OpenAI employees Daniela Amodei (ex-OpenAI VP of Safety and Policy), Dario Amodei (ex-OpenAI Vice President of Research), Jack Clark (ex-OpenAI Policy Director) and Jared Kaplan (ex-OpenAI Research Consultant), has dethroned GPT-4. Claude 3, and especially its strongest version “Opus”, performed better in tests than GPT-4 Turbo, OpenAI’s best AI model to date. So far, not even Google has managed that with Gemini. The surprising fourth place, at least according to the ranking of the well-known Chatbot Arena, does not go to the hyped French startup Mistral AI, but to Command R+. Behind it is the AI startup Cohere, headquartered in Toronto, Canada.

Here is an overview of the most important current AI models, ranked according to the Chatbot Arena’s best list:

The better, the more expensive and closed

A comparison also makes it clear: the better an AI model is, the further away it is from open source. And that’s not all: if you want to work with it via API, you often have to pay three to five times more for the best AI models than for weaker but cheaper ones. The business models are almost always the same: fees are charged separately for input and output when an AI model is used via API, and the price depends on the number of tokens processed. Before a model can read a text, the text is split into a sequence of “tokens”. Tokens can be words, partial words, or even individual letters. The more input and output, the higher the bill.
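The pricing logic described above comes down to simple arithmetic, which the following minimal Python sketch illustrates. The prices and the names `Pricing`, `request_cost`, `PREMIUM` and `BUDGET` are hypothetical and chosen only for illustration; they are not the rates of any provider mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in US dollars per one million tokens (hypothetical values)."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one API call: input and output are billed separately."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# Assumed example rates: a top-tier model priced five times above a cheaper one.
PREMIUM = Pricing(input_per_million=10.0, output_per_million=30.0)
BUDGET = Pricing(input_per_million=2.0, output_per_million=6.0)

if __name__ == "__main__":
    # One request with 2,000 input tokens and 500 output tokens.
    for name, pricing in (("premium", PREMIUM), ("budget", BUDGET)):
        print(name, f"${request_cost(pricing, 2_000, 500):.4f}")
```

With these assumed rates, the same request costs five times as much on the premium model, which is the kind of gap the comparison refers to. Real price lists differ by provider and change frequently.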

While in 2023 many who were interested in AI models were still paying close attention to the number of parameters, more attention is now being paid to the context window. This has gone so far that many providers no longer even publish how many billions of parameters their latest creations have, but instead communicate the size of the context window in tokens. Among the available models, Google is now in the lead with a million tokens, followed by Anthropic’s Claude (200,000), and only after that come GPT-4 (128,000), Command R+ (128,000) and Mistral Large (32,000). For average users who quickly enter short questions in ChatGPT, the context window is irrelevant. But in a business environment where large amounts of data need to be fed in for processing, large context windows are important. Anthropic’s Claude, with 200,000 tokens, is said to be able to summarize entire stock market prospectuses that can be hundreds of pages long. The larger the context window, the more business, you could say.
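To make the practical effect of these numbers tangible, here is a rough Python sketch that checks which of the context windows cited above a document would fit into. The window sizes are the figures from this article; the rule of thumb of about four characters per token, the reserved output budget and the function names are assumptions for illustration. Actual token counts depend on each model’s tokenizer.

```python
import math

# Context window sizes in tokens, as cited in this article.
CONTEXT_WINDOWS = {
    "Gemini 1.5 Pro": 1_000_000,
    "Claude 3": 200_000,
    "GPT-4": 128_000,
    "Command R+": 128_000,
    "Mistral Large": 32_000,
}


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough estimate; assumes about four characters per token."""
    return math.ceil(len(text) / chars_per_token)


def models_that_fit(prompt_tokens: int, reserved_output_tokens: int = 1_000) -> list[str]:
    """Return the models whose window holds the prompt plus room for the answer."""
    needed = prompt_tokens + reserved_output_tokens
    return [name for name, window in CONTEXT_WINDOWS.items() if window >= needed]


if __name__ == "__main__":
    # A hypothetical prospectus of roughly 600,000 characters.
    document = "x" * 600_000
    tokens = estimate_tokens(document)
    print(tokens, models_that_fit(tokens))
```

Under these assumptions, a document of around 150,000 tokens would only fit into the two largest windows, which is why long business documents push buyers toward models with large context windows.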

When it comes to context windows, a startup that began in Austria could also play an important role. Magic.dev by Eric Steinberger and Sebastian De Ro, now well financed by Alphabet’s investment arm CapitalG and other Silicon Valley giants, claims to have created an LLM with a context window of five million tokens, i.e. five times more than what the field’s leading model, Gemini 1.5 Pro from Google, offers. However, unlike other AI models, it is not yet freely accessible.

GPT-4 is way ahead in usage

No matter how good new AI models from Anthropic, Cohere, Google, or Mistral AI may be, OpenAI with GPT-4 leads by far in terms of usage. ChatGPT has more than 100 million users, and the paid business version has been licensed by 600,000 companies worldwide. On top of that, GPT-4 is also included in Microsoft’s Copilot, which has been integrated across the IT giant’s product portfolio. Next to GPT-4, even Google’s Gemini, which is currently being pushed into the market by all possible means, is small. According to the analysis service SimilarWeb, ChatGPT is now larger than the second-largest search engine Bing, and four to five times larger than Gemini. Microsoft’s Copilot, Anthropic’s Claude, and Perplexity (based on GPT-4 Turbo and Claude 3) only played small supporting roles, at least at the time of the SimilarWeb measurement in February 2024.

OpenAI and Meta will follow suit soon

For now, it can be said that Anthropic (Claude 3), Google (Gemini), Cohere (Command R), and Mistral AI were able to catch up with the industry king GPT-4 over the last year, at least technically. However, they will probably remain on an equal footing for only a matter of weeks. Meta is expected to bring Llama 3 onto the market as early as May. On the one hand, this is urgently needed, as Llama 2 has long been far behind. On the other hand, it is also exciting because, thanks to the content from Facebook and Instagram, Meta has an almost endless amount of data that could be used to train new AI models. No other company has as much data on human language in words and images as Meta. How important this data is can be seen in two examples: OpenAI is said to have transcribed millions of hours of YouTube videos to obtain large amounts of text, and Google reportedly pays Reddit $60 million per year to be able to tap the social platform via API.

And then of course there is GPT-5. It is an open secret that OpenAI has been working on the successor to the hit GPT-4 for a long time. Rumors suggest that GPT-5 could be released as early as summer. Microsoft is likely preparing for this major event by bringing in a co-founder of Inflection AI and DeepMind to lead the new Microsoft AI division. The task there will be to bring the GPT creations not only to Copilot but also to the Bing search engine and the Edge browser. So above all, one thing is certain: things remain exciting.