The direct module provides low-level methods for making imperative requests to LLMs where the only abstraction is input and output schema translation, enabling you to use all models with the same API.
These methods are thin wrappers around the Model implementations, offering a simpler interface when you don't need the full functionality of an Agent.
The following functions are available:

- `model_request`: Make a non-streamed async request to a model
- `model_request_sync`: Make a non-streamed synchronous request to a model
- `model_request_stream`: Make a streamed async request to a model
- `model_request_stream_sync`: Make a streamed synchronous request to a model
Here's a simple example demonstrating how to use the direct API to make a basic request:
direct_basic.py
```python
from pydantic_ai.direct import model_request_sync
from pydantic_ai.messages import ModelRequest

# Make a synchronous request to the model
model_response = model_request_sync(
    'anthropic:claude-3-5-haiku-latest',
    [ModelRequest.user_text_prompt('What is the capital of France?')],
)
print(model_response.parts[0].content)
#> Paris
print(model_response.usage)
#> Usage(requests=1, request_tokens=56, response_tokens=1, total_tokens=57)
```
(This example is complete and can be run "as is")
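The streaming variants follow the same pattern. Here's a minimal sketch of the streamed async variant, where `model_request_stream` returns an async context manager that yields response events as they arrive (the exact event types you receive may vary by pydantic-ai version):

```python
import asyncio

from pydantic_ai.direct import model_request_stream
from pydantic_ai.messages import ModelRequest


async def main():
    # Stream the response event by event instead of waiting for completion
    async with model_request_stream(
        'anthropic:claude-3-5-haiku-latest',
        [ModelRequest.user_text_prompt('What is the capital of France?')],
    ) as stream:
        async for event in stream:
            print(event)


asyncio.run(main())
```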
## Advanced Example with Tool Calling
You can also use the direct API to work with function/tool calling.
Even here we can use Pydantic to generate the JSON schema for the tool:
```python
from pydantic import BaseModel
from typing_extensions import Literal

from pydantic_ai.direct import model_request
from pydantic_ai.messages import ModelRequest
from pydantic_ai.models import ModelRequestParameters
from pydantic_ai.tools import ToolDefinition


class Divide(BaseModel):
    """Divide two numbers."""

    numerator: float
    denominator: float
    on_inf: Literal['error', 'infinity'] = 'infinity'


async def main():
    # Make a request to the model with tool access
    model_response = await model_request(
        'openai:gpt-4.1-nano',
        [ModelRequest.user_text_prompt('What is 123 / 456?')],
        model_request_parameters=ModelRequestParameters(
            function_tools=[
                ToolDefinition(
                    name=Divide.__name__.lower(),
                    description=Divide.__doc__ or '',
                    parameters_json_schema=Divide.model_json_schema(),
                )
            ],
            allow_text_output=True,  # Allow model to either use tools or respond directly
        ),
    )
    print(model_response)
    """
    ModelResponse(
        parts=[
            ToolCallPart(
                tool_name='divide',
                args={'numerator': '123', 'denominator': '456'},
                tool_call_id='pyd_ai_2e0e396768a14fe482df90a29a78dc7b',
            )
        ],
        usage=Usage(requests=1, request_tokens=55, response_tokens=7, total_tokens=62),
        model_name='gpt-4.1-nano',
        timestamp=datetime.datetime(...),
    )
    """
```
(This example is complete and can be run "as is"; you'll need to add `asyncio.run(main())` to run `main`)
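Because the tool's JSON schema came from a Pydantic model, that same model can validate and execute the tool call the response contains. A minimal sketch continuing from `main()` above; it assumes the first response part is the tool call, and uses `ToolCallPart.args_as_dict()` to normalize arguments that may arrive as a JSON string:

```python
from pydantic_ai.messages import ToolCallPart

# Validate the returned arguments with the same Pydantic model that
# generated the schema; Pydantic coerces '123' -> 123.0 etc.
part = model_response.parts[0]
if isinstance(part, ToolCallPart) and part.tool_name == 'divide':
    call = Divide.model_validate(part.args_as_dict())
    print(call.numerator / call.denominator)
    #> 0.26973684210526316
```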
## When to Use the direct API vs Agent
The direct API is ideal when:

- You need more direct control over model interactions
- You want to implement custom behavior around model requests
- You're building your own abstractions on top of model interactions
For most application use cases, the higher-level Agent API provides a more convenient interface, with additional features such as built-in tool execution, retries, and structured output parsing.
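For contrast, the equivalent of the first example using an Agent takes just a few lines. A minimal sketch (note: on older pydantic-ai versions the result attribute is `data` rather than `output`):

```python
from pydantic_ai import Agent

agent = Agent('anthropic:claude-3-5-haiku-latest')
result = agent.run_sync('What is the capital of France?')
print(result.output)  # `result.data` on older pydantic-ai versions
#> Paris
```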
## OpenTelemetry or Logfire Instrumentation
As with agents, you can enable OpenTelemetry/Logfire instrumentation with just a few extra lines:
direct_instrumented.py
```python
import logfire

from pydantic_ai.direct import model_request_sync
from pydantic_ai.messages import ModelRequest

logfire.configure()
logfire.instrument_pydantic_ai()

# Make a synchronous request to the model
model_response = model_request_sync(
    'anthropic:claude-3-5-haiku-latest',
    [ModelRequest.user_text_prompt('What is the capital of France?')],
)
print(model_response.parts[0].content)
#> Paris
```
(This example is complete and can be run "as is")
You can also enable OpenTelemetry on a per-call basis:
direct_instrumented.py
```python
import logfire

from pydantic_ai.direct import model_request_sync
from pydantic_ai.messages import ModelRequest

logfire.configure()

# Make a synchronous request to the model, instrumenting just this call
model_response = model_request_sync(
    'anthropic:claude-3-5-haiku-latest',
    [ModelRequest.user_text_prompt('What is the capital of France?')],
    instrument=True,
)
print(model_response.parts[0].content)
#> Paris
```
See Debugging and Monitoring for more details, including how to instrument with plain OpenTelemetry without Logfire.
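For reference, here's a minimal sketch of the plain-OpenTelemetry route, assuming the `opentelemetry-sdk` package is installed and that `instrument=True` picks up the globally configured tracer provider:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

from pydantic_ai.direct import model_request_sync
from pydantic_ai.messages import ModelRequest

# Set up a plain OpenTelemetry tracer provider that prints spans to stdout
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

# With instrument=True, the request is traced via the global tracer provider
# (assumption: no Logfire configuration is required for this to work)
model_response = model_request_sync(
    'anthropic:claude-3-5-haiku-latest',
    [ModelRequest.user_text_prompt('What is the capital of France?')],
    instrument=True,
)
print(model_response.parts[0].content)
#> Paris
```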