Model Provider Support
Open Notebook supports multiple AI model providers to give you flexibility in choosing the AI that best fits your needs.
Provider | Highlights |
---|---|
OpenAI | Great models, covering all necessary features for Open Notebook |
Anthropic | Very capable Claude 3.5 Sonnet for dynamic reasoning
Gemini | Large context (2M tokens) and the best text to speech for podcasts |
Ollama | Running local models for free. Great for transformation tasks |
ElevenLabs | For amazing voice quality |
Open Router | Great option for using many open-source models, as well as Cohere, Mistral, xAI, etc.
Groq | Very fast inference, but limited model availability |
xAI | The powerful Grok model, fewer guardrails, great responses
Vertex AI | If you are running a Google Cloud environment |
All providers are installed out of the box. All you need to do is set up the environment variables (API keys, etc.) for your selected provider and decide which models to use.
Please refer to the .env.example file for instructions on which environment variables are necessary for each provider.
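For example, a setup that uses OpenAI and Gemini, with ElevenLabs for voices, could look like the sketch below. The variable names follow each provider's usual conventions and are only illustrative; the authoritative list is in .env.example.

```bash
# Illustrative sketch; check .env.example for the exact variable names Open Notebook expects
OPENAI_API_KEY="sk-..."        # language, embedding, speech-to-text and text-to-speech models
GEMINI_API_KEY="..."           # podcast content generation and large-context models
ELEVENLABS_API_KEY="..."       # optional, for higher-quality podcast voices
```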
API Key Requirements
The Podcast Generator feature currently requires a Gemini API key for content generation. Additionally, voice generation requires either an OpenAI API key or an ElevenLabs API key. Make sure you have the necessary API keys configured before using these features.
Create models on the Settings page
Go to the settings page and create your different models.
📝 Notice: To use all the features, you need to set up at least 4 models (one of each type).
Model Type | Supported Providers |
---|---|
Language | OpenAI, Anthropic, Open Router, LiteLLM, Vertex AI, Gemini, Ollama, xAI, Groq |
Embedding | OpenAI, Gemini, Vertex AI, Ollama |
Speech to Text | OpenAI, Groq |
Text to Speech | OpenAI, ElevenLabs, Gemini |
If you are not sure which models to set up, the Model Settings page will offer some options to get you started.
After setting up the models, head to the Model Defaults tab to define the default models. There are several defaults to set up.
Model Default | Purpose |
---|---|
Chat Model | Will be used on all chats |
Transformation Model | Will be used for summaries, insights, etc |
Large Context | For content larger than 110k tokens (use Gemini here) |
Speech to Text | For transcribing text from your audio/video uploads |
Text to Speech | For generating podcasts |
Embedding | For creating vector representations of content |
All model types and defaults are required for now. If you are not sure which to pick, go with OpenAI, the only provider that covers all model types.
The reason for this approach is that different LLMs behave better or worse depending on the type of request and the tools offered, so it makes sense to build a more refined system that decides which model should process which task. For instance, you can use an Ollama-based model like gemma2 for summarization and document queries, and OpenAI/Claude for chat. The whole idea is to let you experiment with the cost/performance trade-off.
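If you want to try a split like that, the Ollama side only needs the model pulled locally. A minimal sketch, assuming Ollama is already installed and running:

```bash
# Fetch gemma2 locally so it can be selected as the Transformation model in Open Notebook
ollama pull gemma2
```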
Suggested Configurations
These are some suggested configurations for different use cases and budgets:
Best in Class
Model Default | Model Name |
---|---|
Chat Model | claude-3-5-sonnet-latest |
Transformation Model | gpt-4o-mini |
Large Context | gemini-1.5-pro |
Speech to Text | whisper-1 |
Text to Speech | eleven_turbo_v2_5 (elevenlabs) |
Embedding | text-embedding-3-small |
OpenAI Only Configuration
Model Default | Model Name |
---|---|
Chat Model | gpt-4o-mini |
Transformation Model | gpt-4o-mini |
Large Context | gpt-4o-mini (you will be limited to 128k tokens) |
Speech to Text | whisper-1 |
Text to Speech | tts-1-hd |
Embedding | text-embedding-3-small |
Gemini Only Configuration
Model Default | Model Name |
---|---|
Chat Model | gemini-1.5-flash |
Transformation Model | gemini-1.5-flash |
Large Context | gemini-1.5-pro |
Speech to Text | (not available yet) |
Text to Speech | default |
Embedding | text-embedding-004 |
Open Source Only (using Ollama)
Model Default | Model Name |
---|---|
Chat Model | qwen2.5 or gemma2 or phi3 or llama3.2 |
Transformation Model | qwen2.5 or gemma2 or phi3 or llama3.2 |
Large Context | qwen2.5 or gemma2 or phi3 or llama3.2 (limited to 128k) |
Speech to Text | (not possible yet) |
Text to Speech | (not possible yet) |
Embedding | mxbai-embed-large |
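Before selecting these models in Open Notebook, make sure they are available locally. A minimal sketch, assuming Ollama is installed and running:

```bash
# Pull a chat/transformation model and the embedding model into your local Ollama library
ollama pull qwen2.5
ollama pull mxbai-embed-large
```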
We are working hard to support more providers and model types to give users more flexibility and options.
Testing your models
If you are not sure which model will work best for you, you can try them out in the Playground section and see for yourself how they handle different tasks.
⚠️ Important instructions for Gemini
The new Gemini Text to Speech models are amazing and definitely worth using, but they require a little setup. Please refer to this Podcastfy help page for details. In short, you need to enable the Text to Speech API in your Google Cloud project and add it to the list of APIs your API key is allowed to call.
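If you manage your project with the gcloud CLI, enabling the API can look like the sketch below (the project ID is a placeholder); you can also enable it from the Google Cloud Console, and you still need to allow your API key to call it.

```bash
# Enable the Cloud Text-to-Speech API on your Google Cloud project (placeholder project ID)
gcloud services enable texttospeech.googleapis.com --project YOUR_PROJECT_ID
```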