Data Models¶
The converter uses Pydantic v2 models for strict JSON validation and type safety. This page documents the key data structures.
Conversation Models¶
Conversation¶
The root object for each conversation in conversations-*.json:
class Conversation(BaseModel):
id: str
conversation_id: str
title: str
create_time: float # Unix epoch seconds
update_time: float | None
current_node: str
default_model_slug: str | None
mapping: dict[str, Node]
# ... additional flags
Key fields:
- `mapping`: Dictionary of node IDs → `Node` objects forming the message DAG
- `current_node`: UUID pointing to the active conversation leaf
- `create_time`/`update_time`: Unix epoch floats (seconds since 1970-01-01 UTC)
- `default_model_slug`: Model used (e.g. `"gpt-4o"`, `"auto"`, or `null`)
Node¶
Each node in the conversation DAG:
DAG structure:
- Exactly one root node with `parent: null` and `message: null`
- `current_node` points to the leaf of the primary conversation path
- Branches occur when users edit previous messages
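Because each node stores only its parent link, the primary path can be reconstructed by walking backwards from `current_node` to the root. A minimal sketch over the raw JSON dict (helper name and dict access are illustrative, not the converter's actual API):

```python
def primary_path(conversation: dict) -> list[dict]:
    """Walk parent links from current_node back to the root,
    then reverse to get root-to-leaf order (illustrative helper)."""
    mapping = conversation["mapping"]
    node = mapping[conversation["current_node"]]
    path = [node]
    while node.get("parent"):
        node = mapping[node["parent"]]
        path.append(node)
    return list(reversed(path))
```

Edited branches are simply nodes whose IDs never appear on this path.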
Message¶
Individual message within a node:
class Message(BaseModel):
id: str
author: Author
content: Content # Discriminated union
channel: str | None
recipient: str
create_time: float | None
status: str
weight: float
metadata: dict
Author roles:
| Role | Description |
|---|---|
| `system` | System-generated context messages |
| `user` | User input |
| `assistant` | Model responses |
| `tool` | Tool invocation output |
Channel values:
| Channel | Meaning |
|---|---|
| `null` | Standard message |
| `commentary` | Thinking/reasoning (o-series models) |
| `final` | Resolved assistant response in a multi-step agentic turn |
Weight:
- `1.0`: Visible message
- `0.0`: Hidden/system message (filtered by default)
Content Types¶
Content is a discriminated union based on content_type. The converter supports 12+ types:
TextContent¶
Most common — plain text messages:
Example JSON:
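Both the model definition and the example were elided here; a sketch consistent with the `parts` convention used elsewhere on this page (payload values are illustrative):

```python
from typing import Literal
from pydantic import BaseModel

class TextContent(BaseModel):
    content_type: Literal["text"]
    parts: list[str]  # one string per rendered text segment

# Validate an illustrative payload against the sketch
tc = TextContent.model_validate(
    {"content_type": "text", "parts": ["Hello! Can you summarize this article?"]}
)
```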
MultimodalTextContent¶
Text mixed with embedded images or files:
class MultimodalTextContent(BaseModel):
content_type: Literal["multimodal_text"]
parts: list[str | AssetPointer]
Example JSON:
{
"content_type": "multimodal_text",
"parts": [
{
"asset_pointer": "file-service://file-8Vk2ls8JSO2iOVBq87yJ880Q",
"content_type": "image_asset_pointer",
"height": 765,
"width": 715,
"size_bytes": 129054
},
"Give the prompt to generate this image."
]
}
Parts alternate freely between strings and asset pointers.
CodeContent¶
Tool invocations or code snippets:
ThoughtsContent¶
Internal reasoning from thinking models (o-series):
Example JSON:
{
"content_type": "thoughts",
"thoughts": [
{
"content": "Let me analyze this step by step...",
"summary": "Breaking down the problem",
"finished": true
}
]
}
SonicWebpageContent¶
Web search results from the web.search tool:
class SonicWebpageContent(BaseModel):
content_type: Literal["sonic_webpage"]
domain: str
url: str
title: str
snippet: str
text: str # Full page content
TetherQuoteContent¶
File content extracted by myfiles_browser tool:
class TetherQuoteContent(BaseModel):
content_type: Literal["tether_quote"]
domain: str
url: str
title: str
text: str
ExecutionOutputContent¶
Captured stdout from the code interpreter:
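The model definition was elided here; a minimal sketch (field set is an assumption):

```python
from typing import Literal
from pydantic import BaseModel

class ExecutionOutputContent(BaseModel):
    content_type: Literal["execution_output"]
    text: str  # captured stdout from the interpreter run
```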
SystemErrorContent¶
Tool invocation errors:
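The model definition was elided here; a minimal sketch, assuming an error name plus message text (field names are assumptions):

```python
from typing import Literal
from pydantic import BaseModel

class SystemErrorContent(BaseModel):
    content_type: Literal["system_error"]
    name: str  # e.g. the exception class name (assumed)
    text: str  # error message / traceback
```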
FallbackContent¶
Catchall for unknown content types:
class FallbackContent(BaseModel):
model_config = ConfigDict(extra="allow")
content_type: str # Any string not in known types
Allows forward compatibility with new content types added by OpenAI.
Content Type Summary¶
| `content_type` | Author Role | Description |
|---|---|---|
| `text` | any | Plain text (most common) |
| `multimodal_text` | user, assistant | Text + embedded asset pointers |
| `code` | tool | Tool invocation source |
| `execution_output` | tool | Code interpreter stdout |
| `computer_output` | tool | Computer-use screenshot + browser state |
| `thoughts` | assistant | Internal reasoning chain (o-series) |
| `reasoning_recap` | assistant | UI placeholder: "Thought for Ns" |
| `sonic_webpage` | tool | Web search result |
| `tether_quote` | tool | Extracted file content |
| `tether_browsing_display` | tool | Loading placeholder |
| `user_editable_context` | user | Custom instructions |
| `system_error` | tool | Exception from agent tool |
Asset Models¶
AssetPointer¶
References to files in the export:
class AssetPointer(BaseModel):
asset_pointer: str # URI: file-service://file-<ID>
content_type: Literal["image_asset_pointer"]
height: int | None
width: int | None
size_bytes: int
metadata: dict | None
URI schemes:
- `file-service://file-<ID>` (modern)
- `sediment://file_<HEX>` (legacy, computer-use screenshots)
Attachment¶
File metadata in message.metadata.attachments:
class Attachment(BaseModel):
id: str # Matches file ID in asset_pointer
name: str # Original filename
size: int
height: int | None
width: int | None
Metadata Models¶
ExportManifest¶
File inventory from export_manifest.json:
class ExportManifestFile(BaseModel):
path: str
size_bytes: int
class ExportManifest(BaseModel):
export_files: list[ExportManifestFile]
User¶
Account profile from user.json:
class User(BaseModel):
id: str
email: str
phone_number: str | None
birth_year: int | None
chatgpt_plus_user: bool
PII Redaction
`email`, `phone_number`, and `birth_year` are redacted by default. Use `--no-redact-pii` to preserve them.
UserSettings¶
Feature flags from user_settings.json:
class UserSettings(BaseModel):
model_config = ConfigDict(extra="allow")
user_id: str
settings: dict # Flexible structure
MessageFeedback¶
Thumbs-up/thumbs-down ratings:
class MessageFeedback(BaseModel):
id: str
conversation_id: str
user_id: str
rating: Literal["thumbs_up", "thumbs_down"]
create_time: str # ISO-8601 timestamp
Discriminated Union Pattern¶
The converter uses a custom discriminator for content types to support the FallbackContent catchall:
from typing import Annotated, Any
from pydantic import Discriminator, Tag

def _content_discriminator(value: Any) -> str:
    """Route unknown content_types to the 'fallback' tag."""
    if isinstance(value, dict):
        ct = value.get("content_type", "")
        return ct if ct in _KNOWN_CONTENT_TYPES else "fallback"
    return "fallback"

# With a callable discriminator, each union member needs a matching Tag
Content = Annotated[
    Annotated[TextContent, Tag("text")]
    | Annotated[MultimodalTextContent, Tag("multimodal_text")]
    | Annotated[CodeContent, Tag("code")]
    | Annotated[ThoughtsContent, Tag("thoughts")]
    | Annotated[FallbackContent, Tag("fallback")],
    Discriminator(_content_discriminator),
]
This allows the converter to handle new content types gracefully without failing validation.
Model Configuration¶
Extra Fields¶
Some models allow extra fields for forward compatibility:
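As a sketch of the pattern (mirroring `UserSettings` and `FallbackContent` above; the extra field name is illustrative):

```python
from pydantic import BaseModel, ConfigDict

class UserSettings(BaseModel):
    # Keep unknown keys instead of raising a validation error
    model_config = ConfigDict(extra="allow")
    user_id: str

# An unknown field is accepted and remains accessible as an attribute
s = UserSettings(user_id="u-1", beta_features=True)
```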
This prevents validation errors when OpenAI adds new fields to the export format.
Type Coercion¶
Pydantic automatically coerces compatible types:
- Unix epoch floats → datetime objects (via validators)
- String UUIDs → validated UUID objects
- Null values → `None` in optional fields
Next Steps¶
- See pipeline documentation for how these models are used
- Review export format for raw JSON examples
- Explore output structure for rendered Markdown format