(Note: I wrote this up in a Python notebook because it was an easy way to give you working code along with the explanation. But while Quarto makes it look all professional, it was written with the speed of an email and has the corresponding probability of errors.)
Ollama is an odd program: it runs as a service on the computer and then jobs connect to it. It's designed to be running all the time but does not appear to be intended for multi-user environments. From our perspective, the best way to run it is in a Slurm job that starts up the service, runs your program, and then shuts everything down. If multiple people try to run it on the Linstats and that causes problems, we may need to ask people to run large language models in a different way or query cloud services instead.
You definitely want to use a server with a GPU. For interactive work you can use the T4s in Linstat7 and Linstat8, or better yet slurm015 and slurm016 in OnDemand, since few people are using OnDemand yet. For Slurm batch jobs, note that we've changed our instructions for how to reserve GPUs. Someone tried to use them for the first time on Friday and reported problems, but right now the cluster is so busy (mostly with that person's jobs) that we can't test properly.
Installing Ollama
Dan figured out that you only need sudo to install Ollama if you want it to start automatically on bootup. But I haven’t found a good way to change its settings otherwise, which will create one minor complication.
Create Directories
SSCC home directories are relatively small, so create a folder for Ollama in your project directory; I'll use /project/bbadger/ollama in my example commands. Inside it, create a directory called models. However, Ollama puts models in your home directory by default, and since we can't change that setting, we'll fool it with a symlink.
Under your home directory, create a directory called .ollama. Make that your working directory with cd ~/.ollama. Then make a symlink to your equivalent of /project/bbadger/ollama/models with:
ln -s /project/bbadger/ollama/models models
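Putting the directory setup together (substitute your own project directory for /project/bbadger), the whole sequence looks like this:

mkdir -p /project/bbadger/ollama/models    # model storage in your project space
mkdir -p ~/.ollama                         # where Ollama expects its files
cd ~/.ollama
ln -s /project/bbadger/ollama/models models    # redirect Ollama's model directory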
Get the Program
Now go back to your project directory (/project/bbadger/ollama) and download Ollama itself.
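At the time this was written Ollama distributed a standalone Linux binary, so a download along these lines should work (the exact URL may have changed; check https://ollama.com/download if it fails):

curl -L https://ollama.com/download/ollama-linux-amd64 -o ollama    # download the Linux binary
chmod +x ollama                                                     # make it executable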
Make /project/bbadger/ollama your working directory, then start the Ollama service with:
./ollama start
This will take over your terminal session, so start another one to use it. Again make /project/bbadger/ollama your working directory, then run:
./ollama run llama2
llama2 will then respond to whatever you type next. You can use any model from the Ollama model library.
Note that it will first download the llama2 model to ~/.ollama/models, which (thanks to the symlink) actually lives in /project/bbadger/ollama/models. This is the easy way to make models available for Python code you'll run later.
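If you just want to download a model for later use without chatting with it, ollama pull will do that; for example (the model name here is just an illustration, use whichever model you need):

./ollama pull mistral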
Creating a Conda Environment for Ollama
I presume you're interested in running Python code that uses Ollama, not just running it interactively. I created a conda environment that runs Ollama successfully; you can clone it by downloading this export file, putting it in /project/bbadger/ollama, and running:
conda env create -f ollama_env.yml -p ollama_env
This will create a conda environment in the ollama_env directory, which you can activate (assuming /project/bbadger/ollama is your working directory) with:
conda activate ./ollama_env
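A quick way to confirm the environment is set up correctly is to check that langchain, which the code below relies on, imports:

python -c "import langchain; print(langchain.__version__)"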
Running Python Code that Uses Ollama on an Interactive Server
To run Python code that uses Ollama, the Ollama service must be running. If it isn't already, start it with:
./ollama start
You can then run Python code as usual (in a different terminal). If you want to run this notebook, activate ollama_env first, run sscc-jupyter, and copy the URL it gives you into a local browser. To run a script with just the first prompt in the notebook, activate the environment and run:
python ollama_sscc.py
The following code comes from the Ollama Python tutorial, with minor modifications. It starts with a simple prompt/response, then loads Homer's Odyssey and answers a question based on it. Depending on the chunk_overlap, I've seen it completely make up a poem, Homer's Nekrologia (presumably a prequel to the Necronomicon).
from langchain.llms import Ollama
import time

ollama = Ollama(base_url='http://localhost:11434', model="llama2")

start = time.time()
print(ollama.invoke("why is the sky blue"))
end = time.time()
print('Elapsed time: {:.2f} seconds'.format(end-start))
The sky appears blue to us because of a phenomenon called Rayleigh scattering, which occurs when sunlight passes through the Earth's atmosphere. The atmosphere contains tiny molecules of gases such as nitrogen and oxygen, which scatters the sunlight in all directions.
The shorter wavelengths of light, such as violet and blue, are scattered more than the longer wavelengths, such as red and orange. This is because the smaller wavelengths have a shorter wave length, which means they are easier to scatter. As a result, the blue light is dispersed throughout the atmosphere, giving the sky its blue appearance.
Other factors can also affect the color of the sky, such as the presence of dust, water vapor, and pollen in the atmosphere. These particles can absorb or scatter light in different ways, causing the sky to take on a range of colors. For example, during sunrise and sunset, the sky can take on hues of red, orange, and pink due to the scattering of light by particles in the atmosphere.
It's worth noting that the color of the sky can also vary depending on the observer's location and the time of day. For example, the sky may appear more orange or yellow if you are viewing it from a distance, or if you are viewing it during the night when the Earth is rotating and the position of the sun changes.
In summary, the sky appears blue to us because of Rayleigh scattering, which scatters the shorter wavelengths of light, such as blue and violet, more than the longer wavelengths, such as red and orange. Other factors can also affect the color of the sky, but the main cause is the scattering of light by tiny molecules in the atmosphere.
Elapsed time: 13.11 seconds
from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://www.gutenberg.org/files/1727/1727-h/1727-h.htm")
data = loader.load()
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
all_splits = text_splitter.split_documents(data)
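The next cell searches a vector store built from these splits, so that has to be created first. Here's a minimal sketch of that step, assuming Chroma as the vector store and Ollama's own embeddings (the original tutorial may use a different embedding model, such as GPT4All):

from langchain.embeddings import OllamaEmbeddings
from langchain.vectorstores import Chroma

# Embed the document chunks with the local Ollama service and store them in Chroma
oembed = OllamaEmbeddings(base_url='http://localhost:11434', model="llama2")
vectorstore = Chroma.from_documents(documents=all_splits, embedding=oembed)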
question="Who is Neleus and who is in Neleus' family?"docs = vectorstore.similarity_search(question)len(docs)
4
from langchain.chains import RetrievalQA

qachain = RetrievalQA.from_chain_type(ollama, retriever=vectorstore.as_retriever())
qachain.invoke({"query": question})
{'query': "Who is Neleus and who is in Neleus' family?",
'result': "Based on the context provided, Neleus is the son of Iasus and the king of Minyan Orchomenus. He married Chloris, who was the youngest daughter of Amphion and the queen of Pylos. Neleus' family includes his wife Chloris and their children Nestor, Chromius, Periclymenus, and Pero."}
Submitting An Ollama Job to Slurm
To run a Python script that uses Ollama in Slurm, your job needs to start the Ollama service, activate the conda environment, run your script, and then shut the service down, all from within a single submission script.
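Here is a minimal sketch of such a submission script, assuming it lives in /project/bbadger/ollama; the SBATCH options, the sleep time, and the conda activation details are assumptions you should adjust for your own job:

#!/bin/bash
#SBATCH --gres=gpu:a100:2          # request GPUs (optional; see the note below)

cd /project/bbadger/ollama

# Start the Ollama service in the background and give it a few seconds to come up
./ollama start &
OLLAMA_PID=$!
sleep 10

# Activate the conda environment and run the script
# (if conda activate fails in a batch job, source conda's profile script first)
conda activate ./ollama_env
python ollama_sscc.py

# Shut down the Ollama service
kill $OLLAMA_PID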
Make sure it’s executable (chmod +x) before you submit it.
You can submit this to Slurm without using a GPU (i.e. without --gres=gpu:a100:2). In my testing of just the first prompt, using all 128 cores of a Slurm node took about 100 seconds, while the T4s took 6-8 seconds. I haven’t been able to test with an A100 yet.
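Assuming you saved the submission script above as ollama_job.sh (the file name is just an example), you would submit it with:

sbatch ollama_job.sh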