Summarization with LlamaIndex and a Local Model
The summarization examples on the LlamaIndex website are always based on the OpenAI connection. In this short article we show how to set up a local model and pass it to the summarization task.
Dependencies
First and foremost you need Ollama, the runtime engine that lets you download, load, and query a fairly large number of pre-trained LLMs locally.
Then, of course, you need LlamaIndex itself, which you can install with pip install llama-index.
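As a quick check that the runtime is up before running any LlamaIndex code, you can ping Ollama's local HTTP endpoint. The snippet below is a minimal sketch, assuming Ollama is running on its default port 11434 on localhost.
print("checking ollama ...")
from urllib.request import urlopen
try:
    # Ollama's default local endpoint; the root path replies with a short status message
    with urlopen("http://localhost:11434", timeout=5) as resp:
        print(resp.read().decode())
except OSError as err:
    print(f"Ollama does not seem to be running: {err}")
print("DONE")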
Loading Ollama and LlamaIndex in the code
In the code below we instantiate the LLM via Ollama and the service context that will later be passed to the summarization task. We select Mistral as the LLM, but you can choose any other model that runs on Ollama.
print("loading ollama...")
from llama_index.llms import Ollama
from llama_index import ServiceContext
llm = Ollama(model="mistral", request_timeout=10.0)
service_context = ServiceContext.from_defaults(llm=llm,embed_model="local")
print("DONE")
Loading the document to be summarized
Here we load the document from a local file. However, given the Reader capabilities of the LlamaIndex classes, you can load from a remote URL or any other supported source (a sketch of the URL variant follows the snippet below).
print("loading data ...")
from llama_index import SimpleDirectoryReader
reader = SimpleDirectoryReader(
input_files=["../docs/essay.txt"]
)
docs = reader.load_data()
text = docs[0].text
print("DONE")
Summarizing, finally!
We use TreeSummarize, a response synthesizer that summarizes the text chunks bottom-up, building a tree of intermediate summaries until a single answer to the query remains.
print("summarizing ...")
from llama_index.response_synthesizers import TreeSummarize
summarizer = TreeSummarize(service_context=service_context,verbose=True)
response = summarizer.get_response("what is all about?", [text])
print(response)
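The same summarizer can be reused with different questions over the same text; for example:
# Ask a second, more specific question over the same document
bullets = summarizer.get_response("List the three main points of this essay.", [text])
print(bullets)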