DAY 0 Support: Gemini 3.1 Flash Lite Preview on LiteLLM

Sameer Kankute
SWE @ LiteLLM (LLM Translation)
Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

LiteLLM now supports gemini-3.1-flash-lite-preview with full day 0 support!

note

If you only want cost tracking, no change to your current LiteLLM version is needed. To use the new features introduced alongside it, such as thinking levels, you will need v1.80.8-stable.1 or above.

Deploy this version​

docker run litellm
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.80.8-stable.1

What's New​

Supports all four thinking levels:

  • MINIMAL: Ultra-fast responses with minimal reasoning
  • LOW: Simple instruction following
  • MEDIUM: Balanced reasoning for complex tasks
  • HIGH: Maximum reasoning depth (dynamic)

Quick Start​

Basic Usage

from litellm import completion

response = completion(
    model="gemini/gemini-3.1-flash-lite-preview",
    messages=[{"role": "user", "content": "Extract key entities from this text: ..."}],
)

print(response.choices[0].message.content)

With Thinking Levels

from litellm import completion

# Use MEDIUM thinking for complex reasoning tasks
response = completion(
    model="gemini/gemini-3.1-flash-lite-preview",
    messages=[{"role": "user", "content": "Analyze this dataset and identify patterns"}],
    reasoning_effort="medium",  # minimal, low, medium, high
)

print(response.choices[0].message.content)

Supported Endpoints​

LiteLLM provides full end-to-end support for Gemini 3.1 Flash Lite Preview on:

  • ✅ /v1/chat/completions - OpenAI-compatible chat completions endpoint
  • ✅ /v1/responses - OpenAI Responses API endpoint (streaming and non-streaming)
  • ✅ /v1/messages - Anthropic-compatible messages endpoint
  • ✅ /v1/generateContent - Google Gemini API compatible endpoint

All endpoints support:

  • Streaming and non-streaming responses
  • Function calling with thought signatures
  • Multi-turn conversations
  • All Gemini 3-specific features (thinking levels, thought signatures)
  • Full multimodal support (text, image, audio, video)
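To illustrate the endpoint list above, here is a sketch of the minimal request-body shapes for three of these endpoints. The shapes follow the standard OpenAI, Anthropic, and Gemini API conventions; nothing below is sent over the network, and the prompt text is a placeholder.

```python
# Minimal request bodies for three of the endpoints above (payload shapes only).

MODEL = "gemini/gemini-3.1-flash-lite-preview"
PROMPT = "Summarize this paragraph: ..."

# /v1/chat/completions (OpenAI-compatible)
chat_body = {
    "model": MODEL,
    "messages": [{"role": "user", "content": PROMPT}],
}

# /v1/messages (Anthropic-compatible; max_tokens is required by that API)
messages_body = {
    "model": MODEL,
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": PROMPT}],
}

# /v1/generateContent (Gemini-compatible)
generate_body = {
    "contents": [{"role": "user", "parts": [{"text": PROMPT}]}],
}
```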

reasoning_effort Mapping for Gemini 3.1​

LiteLLM automatically maps OpenAI's reasoning_effort parameter to Gemini's thinkingLevel:

reasoning_effort   thinking_level   Use Case
minimal            minimal          Ultra-fast responses, simple queries
low                low              Basic instruction following
medium             medium           Balanced reasoning for moderate complexity
high               high             Maximum reasoning depth, complex problems
disable            minimal          Disable extended reasoning
none               minimal          No extended reasoning