Chat with Backtrader

RAG
backtrader
langchain
LLM
Ollama
Author

im@johnho.ca

Published

Saturday, December 21, 2024

Abstract
building a RAG using langchain and ollama

Chat with Backtrader

backtrader is a well-tested algo-trading tool, but it is not actively maintained and the community has been deactivated…

To be fair, it's been around for ages, so any serious bugs would have been resolved by now.

In 2024, it seems the best way to ask any question about backtrader would be to build a chatbot with an LLM!

This post loosely follows huggingface's RAG cookbook, gets rid of what doesn't work, and proposes what does for running your RAG locally (tested on a 2023 MBP with an M2 chip).

Preparing the Data

while the guide loads up issues from GitHub, the backtrader repo has issues disabled!

The original load would have been:

from langchain.document_loaders import GitHubIssuesLoader
ACCESS_TOKEN = "your_gh_PAT"
loader = GitHubIssuesLoader(repo="mementum/backtrader", access_token=ACCESS_TOKEN, include_prs=True, state="all")
docs = loader.load() # docs is a list of Document objects

But not to worry, backtrader's docs are amazing and there is lots of discussion over at stackoverflow. What's equally impressive is langchain's Document Loaders! We are gonna use the RecursiveUrlLoader (rather than the vanilla WebBaseLoader) and the StackExchangeAPIWrapper offered in the community package

for these custom document loaders we'd need some extra installs: pip install beautifulsoup4 lxml stackapi

loading from the documentation page

%%time
from langchain_community.document_loaders import RecursiveUrlLoader
from bs4 import BeautifulSoup
import re

def bs4_extractor(html: str) -> str:
    soup = BeautifulSoup(html, "lxml")
    return re.sub(r"\n\n+", "\n\n", soup.text).strip()
    
url = "https://www.backtrader.com/"
loader = RecursiveUrlLoader(
    url=url,
    max_depth=5,  # Adjust this value based on how deep you want to crawl
    extractor= bs4_extractor # lambda x: BeautifulSoup(x, "html.parser").text
)

documents = loader.load()
print(f'{len(documents)} documents loaded')
355 documents loaded
CPU times: user 27.2 s, sys: 1.13 s, total: 28.3 s
Wall time: 1min 13s
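
each Document also remembers which page it came from; a quick peek at the metadata (the exact keys depend on the loader version, but source and title are the usual ones) confirms the crawl picked up the right pages:

# peek at where the first crawled document came from (keys may vary by loader version)
print(documents[0].metadata.get("source"))
print(documents[0].metadata.get("title"))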

loading from stackoverflow

the question and answer pairs that come back from the StackExchangeAPIWrapper are \n\n separated

%%time
from langchain_community.utilities import StackExchangeAPIWrapper

# Initialize the wrapper
stackexchange = StackExchangeAPIWrapper( max_results = 500)

# Fetch questions and answers tagged with "backtrader"
results = stackexchange.run("backtrader")
results = results.split('\n\n')
print(f'{len(results)} results found')
422 results found
CPU times: user 49.2 ms, sys: 14.7 ms, total: 63.9 ms
Wall time: 836 ms
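
to sanity-check the split, here is a peek at the first blob (the content will of course vary over time):

# each blob should hold one question together with its answer excerpts
print(results[0][:300])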

splitting the documents

we will split the documents loaded from the documentation page

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=30)

chunked_docs = splitter.split_documents(documents) # list of document objects
print(f'{len(chunked_docs)} document chunks created')
11091 document chunks created
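
a quick sanity check that the splitter respected the settings (RecursiveCharacterTextSplitter counts characters by default):

# no chunk should be longer than the 512-character limit
print(max(len(c.page_content) for c in chunked_docs))
# and here is what a chunk looks like
print(chunked_docs[0].page_content[:120])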

Create embeddings + retriever

in the huggingface cookbook, a vector database using FAISS is created like this:

from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings

db = FAISS.from_documents(chunked_docs, HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5"))

but we are going to use sqlite + sqlite-vec instead!

The only tricky part is making sqlite-vec work on macOS. As Simon Willison (the creator of Datasette) pointed out, you just need to use the brew-installed version of sqlite! Then with pyenv you can build a Python version that's compiled against the brew-installed sqlite like so:

PYTHON_CONFIGURE_OPTS="--enable-loadable-sqlite-extensions --enable-optimizations" \
LDFLAGS="-L/opt/homebrew/opt/sqlite/lib" \
CPPFLAGS="-I/opt/homebrew/opt/sqlite/include" \
pyenv install 3.12.3

The values for LDFLAGS and CPPFLAGS are shown when you run brew info sqlite, and this stackoverflow answer has all the details.
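
a minimal sanity check, assuming the pyenv build above, that the rebuilt interpreter really picked up the Homebrew sqlite and allows loadable extensions:

import sqlite3

# should report the Homebrew sqlite version, not the older system one
print(sqlite3.sqlite_version)

# enable_load_extension is only present when Python was built with
# --enable-loadable-sqlite-extensions, which sqlite-vec needs
conn = sqlite3.connect(":memory:")
conn.enable_load_extension(True)
print("loadable extensions: OK")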

import document embeddings into sqlite vector database

turning the parsed documents into a sqlite vector database is this simple!

%%time
from langchain.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import SQLiteVec

embedding_function = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
db = SQLiteVec.from_texts(
    texts = [d.page_content for d in chunked_docs] + [r for r in results],
    table="state_union", 
    db_file="./chat_with_backtrader.db", 
    embedding=embedding_function,
    # connection = conn
)

while the doc says that for SQLiteVec, "Query by turning into retriever" is "Not supported yet"… the following is tested and works!

retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 4})
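
a quick spot check (the query here is just an example) that the retriever really returns the top 4 similar chunks:

# should print 4 backtrader-related chunks
hits = retriever.invoke("How do I add a data feed to Cerebro?")
for doc in hits:
    print(doc.page_content[:80], "…")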

Loading the LLM

the last thing left to do is to create the LLM & RAG chain so we can start asking questions. But it turns out the biggest challenge is how to run an LLM efficiently!

Loading a Quantized Model (the huggingface way)

A quantized model can't be loaded on an M2 Mac simply because CUDA is required by the bitsandbytes quantization.

also these LLMs are huge! the HuggingFaceH4/zephyr-7b-beta will take up 14GB of disk space (du -h ~/.cache/huggingface/hub/models--HuggingFaceH4--zephyr-7b-beta) and took about 16 minutes to download the first time…

it's also resource intensive… loading onto the M2 GPU causes an out-of-memory error, and this was tested on an M2 MBP with 32GB of RAM!

so yeah, maybe it's worth checking out the Open-source LLM leaderboard first.

Therefore, while the huggingface RAG cookbook suggests loading the model and building the chain as follows (not recommended, as tested on an M2 MacBook Pro), there is a way that works using Ollama, which we'll run after.

%%skip
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

device = torch.device('mps') if torch.backends.mps.is_available() else (
    torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu'))
print(f'using device: {device}')

model_name = "HuggingFaceH4/zephyr-7b-beta"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, 
    bnb_4bit_use_double_quant=True, 
    bnb_4bit_quant_type="nf4", 
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(model_name, 
                                             quantization_config=bnb_config if torch.cuda.is_available() else None,
                                             # device_map = "auto"
                                            )
tokenizer = AutoTokenizer.from_pretrained(model_name)

model = model.to(device)  # the tokenizer does not need to be moved to a device

The LLM Chain (using huggingface)

in plain Python the pipe | is the union operator for joining dictionaries, but LangChain overloads it on Runnables as the LCEL composition operator, which is what lets the prompt, the model and the output parser be chained together below
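
here is a tiny standalone illustration of the two meanings (not part of the chain itself):

# plain Python (3.9+): | merges dictionaries
print({"context": "docs"} | {"question": "?"})

# LangChain overloads | on Runnables to compose them (LCEL),
# which is what `prompt | llm | StrOutputParser()` relies on
from langchain_core.runnables import RunnableLambda
double = RunnableLambda(lambda x: x * 2)
add_one = RunnableLambda(lambda x: x + 1)
print((double | add_one).invoke(3))  # 7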

%%skip
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from transformers import pipeline
from langchain_core.output_parsers import StrOutputParser

text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2,
    do_sample=True,
    repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=400,
)

llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

prompt_template = """
<|system|>
Answer the question based on your knowledge. Use the following context to help:

{context}

</s>
<|user|>
{question}
</s>
<|assistant|>

 """

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
)

llm_chain = prompt | llm | StrOutputParser()
Device set to use mps:0
/var/folders/n9/x0btfm254xbbmffp_3k6lz_h0000gp/T/ipykernel_38355/2304814980.py:17: LangChainDeprecationWarning: The class `HuggingFacePipeline` was deprecated in LangChain 0.0.37 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-huggingface package and should be used instead. To use it run `pip install -U :class:`~langchain-huggingface` and import as `from :class:`~langchain_huggingface import HuggingFacePipeline``.
  llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

The RAG Chain (using huggingface)

this adds the RAG part that supplies the retrieved context to the LLM

%%skip
from langchain_core.runnables import RunnablePassthrough

retriever = db.as_retriever()

rag_chain = {"context": retriever, "question": RunnablePassthrough()} | llm_chain

Chain Invocation (using huggingface)

without CUDA, and without being able to load the model onto the Mac's MPS, the following invocation of the chain simply does not work (it ran on the CPU for over 24 hours before I killed it)

%%skip
question = "Can you use backtrader for Pair Trading?"
llm_chain.invoke({"context":"", "question": question})
rag_chain.invoke(question)

Loading LLM with Ollama

since Ollama is built for local LLM deployment and takes full advantage of the available hardware, we will update langchain to use it instead.

since you'll be installing Ollama, you might as well get macmon to monitor the GPU usage on your Mac (like nvidia-smi):

brew update
brew install ollama
brew install macmon
ollama serve           # might want to run this in a screen
ollama pull llama3     # takes only about 5GB disk space

then importing the model is as simple as…

%%time
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3")
CPU times: user 19.5 ms, sys: 54.8 ms, total: 74.2 ms
Wall time: 155 ms
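
a quick smoke test (optional) that the local Ollama server started with ollama serve is reachable:

# ChatOllama returns an AIMessage; .content holds the generated text
print(llm.invoke("Reply with the single word: ready").content)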

The LLM and RAG Chain (with Ollama)

building the LLM and RAG chain is a lot more straightforward compared to huggingface

%%time
from langchain.chains import LLMChain, RetrievalQA
from langchain.prompts import PromptTemplate

# Create prompt template
prompt_template = """You are a helpful AI assistant. Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}

Question: {question}

Answer:"""

llm_prompt = PromptTemplate(
    input_variables=["question"],
    template= "Answer the following question: {question}"
)

prompt = PromptTemplate(
    input_variables=["context","question"],
    template= prompt_template
)

llm_chain = LLMChain(llm=llm, prompt=llm_prompt)

# Create the RAG chain
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt}
)
CPU times: user 1.12 ms, sys: 4.13 ms, total: 5.24 ms
Wall time: 10.2 ms

LLM Chain invoke

time to test out the LLM's answer without the context of backtrader

%%time
query = "What is Cerebro?"
# r = llm_chain.run(query)
r = llm_chain({"question": query})
print(r['text'])
A question that takes me to the land of superheroes!

Cerebro is a fictional device in the Marvel Comics universe, specifically associated with the X-Men and their leader, Professor Charles Xavier (also known as Professor X). In the comics and various adaptations, Cerebro is a powerful telepathic computer system that allows Xavier to connect with other minds across the globe.

Cerebro's primary function is to scan and monitor the thoughts and intentions of individuals, allowing Xavier to track down mutants with telepathic or telekinetic abilities. This technology enables him to locate and identify potential threats or allies, making it an essential tool in his mission to protect humanity and promote peace among mutants.

In the popular X-Men film franchise, Cerebro is depicted as a futuristic, dome-shaped structure that serves as the headquarters of the X-Mansion and Xavier's School for Gifted Youngsters. The device plays a significant role in several movies, including the original trilogy (2000-2006) and the prequel series (2011-2019).

In short, Cerebro is an advanced telepathic computer system that helps Professor X keep tabs on the global mutant community, making it a vital tool for his heroic endeavors.
CPU times: user 49.8 ms, sys: 29.6 ms, total: 79.3 ms
Wall time: 18.4 s

RAG Chain Invoke

now with the context of the backtrader library

%%time
r = rag_chain({"query": query})
print(r['result'])
# print(f'---> context:\n{r["source_documents"]}')
A nice and simple question!

According to the context, `Cerebro` appears to be a class from the `backtrader` (bt) library. Specifically, it seems to be used to create a Backtrader engine.

So, my answer is:

What is Cerebro? It's a class in the backtrader (bt) library for creating a Backtrader engine.
CPU times: user 44 ms, sys: 70.4 ms, total: 114 ms
Wall time: 2.9 s

Conclusion