Three Different Ways to Load a ChatOpenAI Model in LangChain Without an OpenAI API Key
Intro
When developing LLM applications, most examples use ChatOpenAI as the default. While token costs are rapidly going down, there are actually three options that are completely free (an API key is required, but no credit card is needed at sign-up). More importantly, this offers the ability to switch and test between a vast offering of models that is growing even more rapidly!
All three of the following examples use the same interface, ChatOpenAI, so you are just one `pip install langchain-openai` away from building your own LLM application!
1. Ollama (local)
The first option is to run the model locally! Thanks to Ollama, these large models can actually run pretty fast on any Apple Silicon MacBook. This post covers how to set up Ollama, which literally opens the door to all the latest open-source LLMs in the universe! See the entire offering here
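If you already have Ollama installed, getting a model running takes just two commands (the model tag below matches the example that follows):

```shell
# Download the 1B-parameter Llama 3.2 model (~1.3GB)
ollama pull llama3.2:1b

# Start the local server; it exposes an OpenAI-compatible API
# at http://localhost:11434/v1
ollama serve
```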
For our first example, let’s get a small-language-model to tell some dad jokes!
```python
%%time
import os
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(
    model="llama3.2:1b",
    api_key="ollama",  # any non-empty string works; Ollama ignores it
    base_url="http://localhost:11434/v1",
)

prompt = """
You are a dad in his 40s with a very intelligent six-year-old daughter
and a loving wife with a great sense of humor! Imagine you are at your
daughter's birthday party, please tell 3 of your funniest dad jokes!
"""

r = chat_model.invoke(prompt)
print(r.content)
```
(laughs) Ahahahaha, oh man... I'm glad my little girl is having so much fun today! You know, as a dad, it's great to have someone to share the love and silliness with. Okay, here are three of my favorite dad jokes that might make you groan (and maybe even laugh):
1. Why did the scarecrow win an award? Because he was outstanding in his field! (get it?)
And then I told her, "I'm so proud of you for being an 'ear-resistible' child!" (wink)
2. What do you call a fake noodle? An impasta!
She giggled so hard that she snorted her juice box out her nose. (laughs) Okay, okay, I'll stop now...
3. Why did the mushroom get invited to all the parties? Because he's fun-gi! (heh)
My daughter just rolled her eyes and said, "Daaaad, that one was so cheesy!" And then she proceeded to make a funny face at me, like this... (makes a silly face)
CPU times: user 26.3 ms, sys: 5.58 ms, total: 31.8 ms
Wall time: 2.29 s
This llama3.2:1b model weighs in at only 1.3GB and can tell better jokes faster (<3 seconds) than I ever could!
Before trying out some of the bigger and “better” models with the following two providers, let’s load the API keys from our secrets.env file into environment variables:
```python
from dotenv import load_dotenv

load_dotenv('./secrets.env')
```
2. HuggingFace
HuggingFace’s Serverless Inference API has changed
The HuggingFace Serverless Inference API literally opens up a world of models that do anything from text completion to image generation! While there are some limitations, you still have access to thousands of “warm” models just an API call away (that’s right: no need for GPUs).
Using their Chat Completion API, we can directly create a ChatOpenAI runnable with HuggingFace as the “backend”.
Let’s try the DeepSeek R1 that’s all the rage these days and see how good its jokes are…
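The wiring looks roughly like this. The base URL, model ID, and token variable name below are my assumptions; check HuggingFace’s docs for the current OpenAI-compatible endpoint:

```python
import os
from langchain_openai import ChatOpenAI

# Sketch only: base_url and model ID are assumptions -- verify against
# HuggingFace's current Serverless Inference API documentation.
chat_model = ChatOpenAI(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    api_key=os.getenv("HF_TOKEN"),  # assumes your token is in secrets.env
    base_url="https://api-inference.huggingface.co/v1/",
)

# Then reuse the same dad-joke prompt from the Ollama example:
# r = chat_model.invoke(prompt)
# print(r.content)
```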
<think>
Okay, so I need to come up with three funny dad jokes for a six-year-old birthday party. The dad is in his 40s, very intelligent, and the daughter is six. The wife is loving with a great sense of humor. The setting is a birthday party, so the jokes should be appropriate, maybe a bit playful, and something that a child would find funny.
First, I should think about what makes dad jokes funny. They usually play on words, puns, or simple misunderstandings. They're often corny but in a way that's endearing. Since it's for a six-year-old, the jokes should be simple, easy to understand, and maybe involve some repetition or funny sounds.
I should also consider the context of the birthday party. Maybe incorporating elements like parties, birthdays, or presents could make the jokes more relevant. The jokes should be clean and not too complex, so that the child can get the humor without needing too much background knowledge.
Let me think of some common themes for dad jokes: animals, food, everyday objects, maybe colors. For a birthday, maybe something with cake, presents, or parties.
First joke idea: Maybe something about cake because it's a birthday. Why did the cake go to school? Because it wanted to improve its icing! That's a play on "icing" and "schooling" or "educating." It's simple and relates to the party.
Second joke: Why don't skeletons fight each other? Because they don't have the guts! That's a classic dad joke, using "guts" as both courage and the internal organs. It's a bit morbid, but in a funny way, and appropriate for a child.
Third joke: Maybe something about presents. Why don't eggs fight in the fridge? Because they might end up in a yolky situation! Play on "yolky" and "icky," which is a fun sound. It's related to the party theme and presents, as eggs can be part of gifts or Easter, but maybe not directly. Alternatively, maybe why did the balloon go to school? Because it wanted to blow! That's another simple one, but maybe too similar.
Wait, another idea: Why did the clock go to school? Because it wanted to improve its timing! That's another play on words, and it's about school, which ties into the birthday party's educational aspect or just the idea of learning.
Alternatively, maybe why did the computer go to school? Because it wanted to improve its RAM! That's a bit more techy, but maybe the child would find it funny.
Wait, maybe something with the number six since the daughter is six. Why was six scared of seven? Because seven ate nine! That's a classic, but maybe a bit too abstract for a six-year-old.
Alternatively, why did the math book look so sad? Because it had too many problems! That's another play on words, but again, maybe a bit abstract.
Hmm, perhaps I should stick with the first three ideas: cake, skeletons, and presents. Alternatively, I can think of more related to the party.
Another idea: Why did the party hat go to school? Because it wanted to be a topper! That's a play on "topper" as both the hat and someone who tops in school.
Wait, maybe the first joke is good, then the skeleton, then maybe something with the number six. Or another one about the cake: Why did the cake go to the party? Because it was a hit at the last party, so it came back for more! That's a play on "hit" meaning both the dessert and the success.
Wait, perhaps the first three I thought of are better:
1. Why did the cake go to school? Because it wanted to improve its icing!
2. Why don't skeletons fight each other? Because they don't have the guts!
3. Why don't eggs fight in the fridge? Because they might end up in a yolky situation!
Alternatively, maybe the third joke is a bit more complex, so perhaps a simpler one. Maybe why did the present go to school? Because it wanted to wrap things up! That's a play on "wrap" meaning both the present wrapping and concluding something.
So, maybe the three jokes are:
1. Why did the cake go to school? Because it wanted to improve its icing!
2. Why don't skeletons fight each other? Because they don't have the guts!
3. Why did the present go to school? Because it wanted to wrap things up!
Alternatively, the third one could be about the party hat: Why did the party hat go to school? Because it wanted to be a topper!
I think the first three I thought of are good. They're simple, relate to the party, and use wordplay that a child can grasp. The skeleton joke is a classic, and the others tie into the party theme. The wife with a great sense of humor would appreciate the puns, and the six-year-old would find them funny without needing deep context.
</think>
Here are three dad jokes perfect for a six-year-old's birthday party, crafted to be fun, simple, and relevant to the celebration:
1. **Why did the cake go to school?**
Because it wanted to improve its icing!
2. **Why don't skeletons fight each other?**
Because they don't have the guts!
3. **Why did the present go to school?**
Because it wanted to wrap things up!
These jokes are designed to be light-hearted, using wordplay that a child can enjoy while also being appropriate for the birthday theme.
CPU times: user 34 ms, sys: 5.74 ms, total: 39.8 ms
Wall time: 56.1 s
It’s a little slow (almost a full minute), but that’s a 32-billion-parameter model… and it’s free. For tinkering and development purposes, that’s acceptable IMO!
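One practical note: R1-style models return their chain-of-thought inside `<think>` tags, so if you only want the final answer, you can strip that block first. A minimal sketch (the helper name is mine, not part of LangChain):

```python
import re

def strip_think(text: str) -> str:
    """Drop a leading <think>...</think> reasoning block, keeping only the answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_think("<think>pondering puns...</think>\nWhy did the cake go to school?"))
# -> Why did the cake go to school?
```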
3. OpenRouter
The last contender on the list, OpenRouter, aggregates models from many providers and offers a list of free ones!
They have paid models as well, and in general their inference time is noticeably faster. When using their free models, there is a limit of 20 requests per minute and 200 requests per day.
For comparison, let’s try the new Rogue Rose model, a 103-billion-parameter model trained for roleplaying and storytelling!
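The wiring is the same as before; only the base URL and model ID change. The model slug below is my best guess at OpenRouter’s naming, so verify it on their model page:

```python
import os
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(
    model="sophosympatheia/rogue-rose-103b-v0.2:free",  # verify the exact slug
    api_key=os.getenv("OPENROUTER_API_KEY"),  # assumes your key is in secrets.env
    base_url="https://openrouter.ai/api/v1",
)

# Same prompt as before:
# r = chat_model.invoke(prompt)
# print(r.content)
```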
As the dad in this scenario, I'd be happy to share three of my funniest dad jokes at my daughter's birthday party! Here goes:
1. Why did the tomato turn red?
Answer: Because it saw the salad dressing!
(I know, I know, it's a classic groaner, but kids love puns!)
2. What did the zero say to the eight?
Answer: "Nice belt!"
(This one is a bit more subtle, but I think the adults might appreciate it more. The number 8 looks like a zero wearing a belt around its waist.)
3. Why do we tell actors to "break a leg"?
Answer: Because every play has a cast!
(This is a pun that combines the old theater superstition of wishing an actor to "break a leg" (meaning to have a successful performance) with the idea of a play having a cast of characters. It's a bit more sophisticated, but I think the adults would enjoy the wordplay.)
CPU times: user 78.7 ms, sys: 32 ms, total: 111 ms
Wall time: 4.73 s
Conclusion
There you go: three ways to create the ChatOpenAI runnable in LangChain at exactly zero cost!
With Ollama, everything runs locally, and you can access most (if not all) open-source models.
The HuggingFace Serverless Inference API is a great option if you don’t have decent hardware but still want a vast selection of models.
And lastly, OpenRouter is great if you want faster inference and are okay with a slimmer list of available models.
With all these models now at your disposal, making sure that you choose the right model for the right task becomes the real challenge.
As this blog post illustrated, not all models are trained for agentic work, so consult LLM leaderboards such as this HuggingFace one, this one, or ScaleAI’s (which literally tests LLMs for a living).