Data Models¶
The converter uses Pydantic v2 models for strict JSON validation and type safety. This page documents the key data structures.
Conversation Models¶
Conversation¶
The root object for each conversation in conversations-*.json:
class Conversation(BaseModel):
id: str
conversation_id: str
title: str
create_time: float # Unix epoch seconds
update_time: float | None
current_node: str
default_model_slug: str | None
mapping: dict[str, Node]
# ... additional flags
Key fields:
- `mapping`: Dictionary of node IDs → `Node` objects forming the message DAG
- `current_node`: UUID pointing to the active conversation leaf
- `create_time`/`update_time`: Unix epoch floats (seconds since 1970-01-01 UTC)
- `default_model_slug`: Model used (e.g. `"gpt-4o"`, `"auto"`, or `null`)
Node¶
Each node in the conversation DAG:
DAG structure:
- Exactly one root node with `parent: null` and `message: null`
- `current_node` points to the leaf of the primary conversation path
- Branches occur when users edit previous messages
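Because each node stores only its parent link, the primary path can be reconstructed by walking backwards from `current_node` to the root. A minimal sketch over the raw JSON dict (helper name and dict access are illustrative, not the converter's actual API):

```python
def primary_path(conversation: dict) -> list[dict]:
    """Walk parent links from current_node back to the root,
    then reverse to get root-to-leaf order (illustrative helper)."""
    mapping = conversation["mapping"]
    node = mapping[conversation["current_node"]]
    path = [node]
    while node.get("parent"):
        node = mapping[node["parent"]]
        path.append(node)
    return list(reversed(path))
```

Edited branches are simply nodes whose IDs never appear on this path.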
Message¶
Individual message within a node:
class Message(BaseModel):
id: str
author: Author
content: Content # Discriminated union
channel: str | None
recipient: str
create_time: float | None
status: str
weight: float
metadata: dict
Author roles:
| Role | Description |
|---|---|
| `system` | System-generated context messages |
| `user` | User input |
| `assistant` | Model responses |
| `tool` | Tool invocation output |
Channel values:
| Channel | Meaning |
|---|---|
| `null` | Standard message |
| `commentary` | Thinking/reasoning (o-series models) |
| `final` | Resolved assistant response in a multi-step agentic turn |
Weight:
- `1.0`: Visible message
- `0.0`: Hidden/system message (filtered by default)
Content Types¶
Content is a discriminated union based on content_type. The converter supports 12+ types:
TextContent¶
Most common — plain text messages:
Example JSON:
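Both the model definition and the example were elided here; a sketch consistent with the `parts` convention used elsewhere on this page (payload values are illustrative):

```python
from typing import Literal
from pydantic import BaseModel

class TextContent(BaseModel):
    content_type: Literal["text"]
    parts: list[str]  # one string per rendered text segment

# Validate an illustrative payload against the sketch
tc = TextContent.model_validate(
    {"content_type": "text", "parts": ["Hello! Can you summarize this article?"]}
)
```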
MultimodalTextContent¶
Text mixed with embedded images or files:
class MultimodalTextContent(BaseModel):
content_type: Literal["multimodal_text"]
parts: list[str | AssetPointer]
Example JSON:
{
"content_type": "multimodal_text",
"parts": [
{
"asset_pointer": "file-service://file-8Vk2ls8JSO2iOVBq87yJ880Q",
"content_type": "image_asset_pointer",
"height": 765,
"width": 715,
"size_bytes": 129054
},
"Give the prompt to generate this image."
]
}
Parts alternate freely between strings and asset pointers.
CodeContent¶
Tool invocations or code snippets:
ThoughtsContent¶
Internal reasoning from thinking models (o-series):
Example JSON:
{
"content_type": "thoughts",
"thoughts": [
{
"content": "Let me analyze this step by step...",
"summary": "Breaking down the problem",
"finished": true
}
]
}
SonicWebpageContent¶
Web search results from the web.search tool:
class SonicWebpageContent(BaseModel):
content_type: Literal["sonic_webpage"]
domain: str
url: str
title: str
snippet: str
text: str # Full page content
TetherQuoteContent¶
File content extracted by myfiles_browser tool:
class TetherQuoteContent(BaseModel):
content_type: Literal["tether_quote"]
domain: str
url: str
title: str
text: str
ExecutionOutputContent¶
Captured stdout from the code interpreter:
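The model definition was elided here; a minimal sketch (field set is an assumption):

```python
from typing import Literal
from pydantic import BaseModel

class ExecutionOutputContent(BaseModel):
    content_type: Literal["execution_output"]
    text: str  # captured stdout from the interpreter run
```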
SystemErrorContent¶
Tool invocation errors:
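The model definition was elided here; a minimal sketch, assuming an error name plus message text (field names are assumptions):

```python
from typing import Literal
from pydantic import BaseModel

class SystemErrorContent(BaseModel):
    content_type: Literal["system_error"]
    name: str  # e.g. the exception class name (assumed)
    text: str  # error message / traceback
```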
FallbackContent¶
Catchall for unknown content types:
class FallbackContent(BaseModel):
model_config = ConfigDict(extra="allow")
content_type: str # Any string not in known types
Allows forward compatibility with new content types added by OpenAI.
Content Type Summary¶
| `content_type` | Author Role | Description |
|---|---|---|
| `text` | any | Plain text (most common) |
| `multimodal_text` | user, assistant | Text + embedded asset pointers |
| `code` | tool | Tool invocation source |
| `execution_output` | tool | Code interpreter stdout |
| `computer_output` | tool | Computer-use screenshot + browser state |
| `thoughts` | assistant | Internal reasoning chain (o-series) |
| `reasoning_recap` | assistant | UI placeholder: "Thought for Ns" |
| `sonic_webpage` | tool | Web search result |
| `tether_quote` | tool | Extracted file content |
| `tether_browsing_display` | tool | Loading placeholder |
| `user_editable_context` | user | Custom instructions |
| `system_error` | tool | Exception from agent tool |
Asset Models¶
AssetPointer¶
References to files in the export:
class AssetPointer(BaseModel):
asset_pointer: str # URI: file-service://file-<ID>
content_type: Literal["image_asset_pointer"]
height: int | None
width: int | None
size_bytes: int
metadata: dict | None
URI schemes:
- `file-service://file-<ID>` (modern)
- `sediment://file_<HEX>` (legacy, computer-use screenshots)
Attachment¶
File metadata in message.metadata.attachments:
class Attachment(BaseModel):
id: str # Matches file ID in asset_pointer
name: str # Original filename
size: int
height: int | None
width: int | None
Metadata Models¶
ExportManifest¶
File inventory from export_manifest.json:
class ExportManifestFile(BaseModel):
path: str
size_bytes: int
class ExportManifest(BaseModel):
export_files: list[ExportManifestFile]
User¶
Account profile from user.json:
class User(BaseModel):
id: str
email: str
phone_number: str | None
birth_year: int | None
chatgpt_plus_user: bool
PII Redaction
`email`, `phone_number`, and `birth_year` are redacted by default. Use `--no-redact-pii` to preserve them.
UserSettings¶
Feature flags from user_settings.json:
class UserSettings(BaseModel):
model_config = ConfigDict(extra="allow")
user_id: str
settings: dict # Flexible structure
MessageFeedback¶
Thumbs-up/thumbs-down ratings:
class MessageFeedback(BaseModel):
id: str
conversation_id: str
user_id: str
rating: Literal["thumbs_up", "thumbs_down"]
create_time: str # ISO-8601 timestamp
Discriminated Union Pattern¶
The converter uses a custom discriminator for content types to support the FallbackContent catchall:
from typing import Annotated, Any
from pydantic import Discriminator, Tag

def _content_discriminator(value: Any) -> str:
    """Route unknown content_types to the 'fallback' tag."""
    if isinstance(value, dict):
        ct = value.get("content_type", "")
        return ct if ct in _KNOWN_CONTENT_TYPES else "fallback"
    return "fallback"

# With a callable discriminator, each union member needs a matching Tag
Content = Annotated[
    Annotated[TextContent, Tag("text")]
    | Annotated[MultimodalTextContent, Tag("multimodal_text")]
    | Annotated[CodeContent, Tag("code")]
    | Annotated[ThoughtsContent, Tag("thoughts")]
    | Annotated[FallbackContent, Tag("fallback")],
    Discriminator(_content_discriminator),
]
This allows the converter to handle new content types gracefully without failing validation.
Model Configuration¶
Extra Fields¶
Some models allow extra fields for forward compatibility:
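As a sketch of the pattern (mirroring `UserSettings` and `FallbackContent` above; the extra field name is illustrative):

```python
from pydantic import BaseModel, ConfigDict

class UserSettings(BaseModel):
    # Keep unknown keys instead of raising a validation error
    model_config = ConfigDict(extra="allow")
    user_id: str

# An unknown field is accepted and remains accessible as an attribute
s = UserSettings(user_id="u-1", beta_features=True)
```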
This prevents validation errors when OpenAI adds new fields to the export format.
Type Coercion¶
Pydantic automatically coerces compatible types:
- Unix epoch floats → datetime objects (via validators)
- String UUIDs → validated UUID objects
- Null values → `None` in optional fields
Next Steps¶
- See pipeline documentation for how these models are used
- Review export format for raw JSON examples
- Explore output structure for rendered Markdown format