
Exploring Self-Hosted Copilots with Google Gemma and Ollama


Chapter 1: Introduction to LLMs and Gemma

The rapid advancements in the realm of large language models (LLMs) can be overwhelming. On February 21, 2024, Google unveiled a new family of models named Gemma, designed to deliver exceptional performance relative to their size. According to the technical documentation, the 7B model surpasses other open-weight models of comparable or larger sizes, including Llama-2 13B.

According to the accompanying technical report, these models were trained with methodologies similar to those used for Google's flagship Gemini models. I recently published a tutorial on using Ollama to build a Phi-2-based self-hosted copilot, and now I want to update that guide for anyone eager to experiment with this new model.

We will be using the Gemma 2B model, which is lightweight enough to run on a standard consumer laptop. Once the copilot is set up, we will use it for Python programming tasks and compare its performance to that of Phi-2.

Let's begin by reviewing the prerequisites before diving into the coding process!

Section 1.1: Prerequisites

To set up our environment, we need to install the same tools as in the previous Phi-2 tutorial. Please ensure you have the following software:

  • Visual Studio Code
  • Continue extension for Visual Studio Code
  • Windows Subsystem for Linux (WSL), if you're using Windows

In the coding section, I will reference concepts related to copilots, REST APIs, and programming in general without going into extensive detail.

For testing, I used a Dell Latitude 7430 with 16GB of RAM and an Intel Core i5-1250P processor. Ollama ran inside a WSL 2 environment limited to 4GB of RAM. Keep in mind that performance may vary based on your hardware specifications.

Subsection 1.1.1: Installing and Running the Model

To set up Ollama and launch the Gemma model, execute the following commands:

curl -fsSL https://ollama.com/install.sh | sh
ollama run gemma:2b

The first command installs Ollama using the official Linux install script, while the second downloads and runs the 2B instruction-tuned model interactively. If everything is working correctly, you should see an interactive prompt.

You can verify the model's functionality by interacting with it, but if you prefer, you can proceed to the next steps without closing the prompt.

💡 Note: Do not close the interactive prompt; otherwise, subsequent steps will not work. You can run the model in the background or set up a dedicated service if you wish.
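If you prefer to verify the setup programmatically rather than through the chat prompt, the short Python sketch below (my own addition, assuming the default port 11434 and the gemma:2b model pulled above) sends a single request to Ollama's REST API, the same local API that Continue will talk to:

import json
import urllib.request

# Ask the local Ollama server for a single, non-streamed completion
payload = json.dumps({
    "model": "gemma:2b",
    "prompt": "Say hello in one short sentence.",
    "stream": False,
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    print(json.loads(response.read())["response"])

A short reply printed to the terminal confirms that the server is reachable and the model is loaded.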

Section 1.2: Configuring Continue with Gemma

Next, install the Continue extension from the Visual Studio Code marketplace. Open the settings for the extension by clicking its icon in VS Code to configure the Ollama provider.

If you are running Ollama on localhost at port 11434, modify the configuration file as follows:

{
  "model": "gemma:2b",
  "title": "Ollama-Gemma",
  "completionOptions": {},
  "apiBase": "localhost:11434",
  "provider": "ollama"
}

Now, you can select the model from the dropdown menu on the left side of the interface and begin testing the copilot.

Chapter 2: Testing the Copilot

Let's conduct tests with the copilot in the same manner as we did with the Phi-based model, using identical prompts. First, create an empty test.py file and open the Quick Edit menu using the shortcut "ctrl+shift+L."

Provide the same prompt as with Phi-2: "Create a Python application using a framework of your choice that accepts GET requests on an endpoint and returns a list of users. The users must be stored in a local data structure, avoiding any database-related code."

Here’s the generated code:

import flask
import json

app = flask.Flask(__name__)

# Define a list of users
users = []

@app.route("/")
def index():
    return json.dumps(users)

if __name__ == "__main__":
    app.run(debug=True)

Gemma’s output is quite similar to what Phi-2 produced, though Gemma opted for Python's standard library over Flask's JSON implementation for data serialization.
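For comparison, a more Flask-idiomatic variant of the same endpoint would return flask.jsonify(users), which also sets the Content-Type header to application/json; this snippet is my own illustration, not model output:

import flask

app = flask.Flask(__name__)
users = []

@app.route("/")
def index():
    # jsonify serializes the list and sets the JSON content type for us
    return flask.jsonify(users)

if __name__ == "__main__":
    app.run(debug=True)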

Now, let's test the terminal debug feature with Gemma. If we run the application, we will encounter an error due to the missing Flask module.

We can debug the terminal output using "ctrl+shift+R" with Continue and Gemma. Gemma identified the issue and provided a more detailed explanation than Phi-2, which can be advantageous or disadvantageous depending on the user's preferences and the task at hand.

After installing the Flask module and manually populating the users list in the preceding code, we can re-run our test application to confirm it's functioning as expected:

curl http://127.0.0.1:5000

You should receive the following response:

[{"age": 23, "username": "billgates"}, {"age": 29, "username": "stevejobs"}]

Next, we will ask Gemma to implement a simple CRUD operation. Highlight the code, press "ctrl+shift+L," and use the prompt: "Can you please add endpoints to add users, delete users, and retrieve a user by its index in the list?"

Gemma provided the following code:

import flask
import json

app = flask.Flask(__name__)

# Define a list of users
users = [{"age": 23, "username": "billgates"},
         {"age": 29, "username": "stevejobs"}]

# Add a new user
@app.route("/add_user", methods=["POST"])
def add_user():
    data = flask.request.get_json()
    users.append(data)
    return json.dumps(users)

# Delete a user by its index
@app.route("/delete_user/<int:index>")
def delete_user(index):
    users.pop(index)
    return json.dumps(users)

# Get a user by its index
@app.route("/get_user/<int:index>")
def get_user(index):
    return json.dumps(users[index])

if __name__ == "__main__":
    app.run(debug=True)
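To exercise the new endpoints, something like the following works; this is a quick sketch of mine that assumes the app is still running on port 5000 and that the requests package is installed, and the added user is just a placeholder:

import requests

BASE = "http://127.0.0.1:5000"

# Add a third user via the POST endpoint
print(requests.post(f"{BASE}/add_user",
                    json={"age": 42, "username": "newuser"}).json())

# Fetch the first user by index
print(requests.get(f"{BASE}/get_user/0").json())

# Delete the user we just added (index 2) and print the remaining list
print(requests.get(f"{BASE}/delete_user/2").json())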

Gemma interpreted my request accurately, unlike Phi-2, which made a couple of small mistakes: it omitted the __main__ entry point and mishandled the delete-by-index feature. However, Phi-2 did include some limited error-handling code that Gemma left out.
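To close that gap, the index-based endpoints could guard against out-of-range values; the sketch below is my own addition for illustration, not something either model generated:

import flask
import json

app = flask.Flask(__name__)
users = [{"age": 23, "username": "billgates"},
         {"age": 29, "username": "stevejobs"}]

@app.route("/get_user/<int:index>")
def get_user(index):
    # Return a 404 instead of crashing when the index is out of range
    if index >= len(users):
        return json.dumps({"error": "user not found"}), 404
    return json.dumps(users[index])

@app.route("/delete_user/<int:index>")
def delete_user(index):
    if index >= len(users):
        return json.dumps({"error": "user not found"}), 404
    users.pop(index)
    return json.dumps(users)

if __name__ == "__main__":
    app.run(debug=True)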

In summary, while both models performed well, Gemma slightly outperformed Phi-2 in this particular task, although these results are not definitive due to the narrow scope of the tests conducted.

Conclusions

The introduction of the Gemma model family, particularly the smaller Gemma 2B variant, is a significant step forward in the LLM landscape, or rather the SLM (small language model) landscape, as models of this size are frequently called. The accessibility of this model, along with its counterpart Phi-2, makes experimentation easier for researchers and developers alike.

Overall, I am pleased with the performance exhibited by Gemma 2B in code completion tasks. The year 2024 is off to an exciting start, and I anticipate the emergence of even more SLMs in the near future.

Thank you for joining me on this journey—here’s to the enthusiasts!

References

  • Gemma: Google introduces new state-of-the-art open models (blog.google)
  • Gemma Technical Report
  • Phi-2: The surprising power of small language models — Microsoft Research
  • Ollama

