Mistral 7B LLM: Run Locally with OllamaMistral 7B LLM


Photo by Chris Ried on Unsplash

Mistral 7B LLM Introduction

Mistral AI currently provides two types of access to Large Language Models:

In this guide, we provide an overview of the Mistral 7B LLM and how to prompt with it. It also includes tips, applications, limitations, papers, and additional reading materials related to Mistral 7B and finetuned models.

Mistral-7B Introduction

Mistral 7B is a 7-billion-parameter language model released by Mistral AI(opens in a new tab). Mistral 7B is a carefully designed language model that provides both efficiency and high performance to enable real-world applications. Due to its efficiency improvements, the model is suitable for real-time applications where quick responses are essential. At the time of its release, Mistral 7B outperformed the best open source 13B model (Llama 2) in all evaluated benchmarks.


Mistral 7B has demonstrated superior performance across various benchmarks, outperforming even models with larger parameter counts. It excels in areas like mathematics, code generation, and reasoning. Below are results on several tasks such as math reasoning, world knowledge and commonsense reasoning

Code Generation

Mistral 7B achieves Code Llama 7B(opens in a new tab) code generation performance while not sacrificing performance on non-code benchmarks. Let’s look at a simple example demonstration Mistral 7B code generation capabilities.

We will be using Fireworks.ai inference platform(opens in a new tab) for Mistral 7B prompt examples. We use the default settings and change the max_length to 250.


Mistral 7B is designed for easy fine-tuning across various tasks. The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. This version of the model is fine-tuned for conversation and question answering.


Like many other LLMs, Mistral 7B can hallucinate and is prone to the common issues such as prompt injections. While Mistral 7B has shown impressive performance in many areas, its limited parameter count also restricts the amount of knowledge it can store, especially when compared to larger models.

Run Locally with Ollama

Ollama is an easy way for you to run large language models locally on macOS or Linux. Simply download Ollama and run one of the following commands in your CLI.

For the default Instruct model:

ollama run mistral

For the text completion model:

ollama run mistral:text

N.B. You will need at least 8GB of RAM. You can find more details on the Ollama Mistral library doc.

Mistral 7B in short

Mistral 7B is a 7.3B parameter model that:

  • Outperforms Llama 2 13B on all benchmarks
  • Outperforms Llama 1 34B on many benchmarks
  • Approaches CodeLlama 7B performance on code, while remaining good at English tasks
  • Uses Grouped-query attention (GQA) for faster inference
  • Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost

We’re releasing Mistral 7B under the Apache 2.0 license, it can be used without restrictions.

Mistral 7B is easy to fine-tune on any task. As a demonstration, we’re providing a model fine-tuned for chat, which outperforms Llama 2 13B chat.