Data Models

The converter uses Pydantic v2 models for strict JSON validation and type safety. This page documents the key data structures.

Conversation Models

Conversation

The root object for each conversation in conversations-*.json:

class Conversation(BaseModel):
    id: str
    conversation_id: str
    title: str
    create_time: float  # Unix epoch seconds
    update_time: float | None
    current_node: str
    default_model_slug: str | None
    mapping: dict[str, Node]
    # ... additional flags

Key fields:

  • mapping: Dictionary of node IDs → Node objects forming the message DAG
  • current_node: UUID pointing to the active conversation leaf
  • create_time / update_time: Unix epoch floats (seconds since 1970-01-01 UTC)
  • default_model_slug: Model used (e.g., "gpt-4o", "auto", null)

Node

Each node in the conversation DAG:

class Node(BaseModel):
    id: str
    parent: str | None
    children: list[str]
    message: Message | None

DAG structure:

  • Exactly one root node with parent: null and message: null
  • current_node points to the leaf of the primary conversation path
  • Branches occur when users edit previous messages
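Because every node records its parent, the primary conversation path can be recovered by walking parent links from `current_node` back to the root. A sketch with plain dicts (the helper name `primary_path` is hypothetical):

```python
def primary_path(mapping: dict, current_node: str) -> list[str]:
    """Walk parent links from the active leaf to the root, then reverse."""
    path, node_id = [], current_node
    while node_id is not None:
        path.append(node_id)
        node_id = mapping[node_id]["parent"]
    return list(reversed(path))

mapping = {
    "root": {"parent": None, "children": ["a"]},
    "a": {"parent": "root", "children": ["b"]},
    "b": {"parent": "a", "children": []},
}
print(primary_path(mapping, "b"))  # → ['root', 'a', 'b']
```

Branches not on this path (abandoned edits) are simply never visited by the walk.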

Message

Individual message within a node:

class Message(BaseModel):
    id: str
    author: Author
    content: Content  # Discriminated union
    channel: str | None
    recipient: str
    create_time: float | None
    status: str
    weight: float
    metadata: dict

Author roles:

Role        Description
system      System-generated context messages
user        User input
assistant   Model responses
tool        Tool invocation output

Channel values:

Channel      Meaning
null         Standard message
commentary   Thinking/reasoning (o-series models)
final        Resolved assistant response in a multi-step agentic turn

Weight:

  • 1.0: Visible message
  • 0.0: Hidden/system message (filtered by default)
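The default filtering described above amounts to a weight check. A sketch over plain message dicts (the helper name `visible_messages` is hypothetical):

```python
def visible_messages(messages: list[dict]) -> list[dict]:
    """Keep only user-visible messages; hidden/system messages carry weight 0.0."""
    return [m for m in messages if m.get("weight", 1.0) == 1.0]

msgs = [{"id": "a", "weight": 1.0}, {"id": "b", "weight": 0.0}]
print([m["id"] for m in visible_messages(msgs)])  # → ['a']
```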

Content Types

Content is a discriminated union keyed on the content_type field. The converter recognizes twelve content types, plus a fallback for anything new:

TextContent

Most common — plain text messages:

class TextContent(BaseModel):
    content_type: Literal["text"]
    parts: list[str]

Example JSON:

{
  "content_type": "text",
  "parts": [
    "Hello, how can I help you today?"
  ]
}

MultimodalTextContent

Text mixed with embedded images or files:

class MultimodalTextContent(BaseModel):
    content_type: Literal["multimodal_text"]
    parts: list[str | AssetPointer]

Example JSON:

{
  "content_type": "multimodal_text",
  "parts": [
    {
      "asset_pointer": "file-service://file-8Vk2ls8JSO2iOVBq87yJ880Q",
      "content_type": "image_asset_pointer",
      "height": 765,
      "width": 715,
      "size_bytes": 129054
    },
    "Give the prompt to generate this image."
  ]
}

Parts alternate freely between strings and asset pointers.
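Rendering such a parts list means interleaving text and asset references in order. A sketch that emits a Markdown image placeholder for each pointer (the helper name `render_parts` and the Markdown output format are assumptions, not the converter's actual renderer):

```python
def render_parts(parts: list) -> str:
    """Join string parts and asset pointers into one Markdown string, in order."""
    out = []
    for part in parts:
        if isinstance(part, str):
            out.append(part)
        else:  # asset pointer dict
            out.append(f"![image]({part['asset_pointer']})")
    return "\n\n".join(out)

parts = [
    {"asset_pointer": "file-service://file-8Vk2ls8JSO2iOVBq87yJ880Q",
     "content_type": "image_asset_pointer"},
    "Give the prompt to generate this image.",
]
print(render_parts(parts))
```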

CodeContent

Tool invocations or code snippets:

class CodeContent(BaseModel):
    content_type: Literal["code"]
    language: str | None
    text: str

ThoughtsContent

Internal reasoning from thinking models (o-series):

class ThoughtsContent(BaseModel):
    content_type: Literal["thoughts"]
    thoughts: list[dict]

Example JSON:

{
  "content_type": "thoughts",
  "thoughts": [
    {
      "content": "Let me analyze this step by step...",
      "summary": "Breaking down the problem",
      "finished": true
    }
  ]
}

SonicWebpageContent

Web search results from the web.search tool:

class SonicWebpageContent(BaseModel):
    content_type: Literal["sonic_webpage"]
    domain: str
    url: str
    title: str
    snippet: str
    text: str  # Full page content

TetherQuoteContent

File content extracted by myfiles_browser tool:

class TetherQuoteContent(BaseModel):
    content_type: Literal["tether_quote"]
    domain: str
    url: str
    title: str
    text: str

ExecutionOutputContent

Standard output (stdout) from the code interpreter tool:

class ExecutionOutputContent(BaseModel):
    content_type: Literal["execution_output"]
    text: str

SystemErrorContent

Tool invocation errors:

class SystemErrorContent(BaseModel):
    content_type: Literal["system_error"]
    name: str
    text: str

FallbackContent

Catchall for unknown content types:

class FallbackContent(BaseModel):
    model_config = ConfigDict(extra="allow")
    content_type: str  # Any string not in known types

Allows forward compatibility with new content types added by OpenAI.

Content Type Summary

content_type              Author Role       Description
text                      any               Plain text (most common)
multimodal_text           user, assistant   Text + embedded asset pointers
code                      tool              Tool invocation source
execution_output          tool              Code interpreter stdout
computer_output           tool              Computer-use screenshot + browser state
thoughts                  assistant         Internal reasoning chain (o-series)
reasoning_recap           assistant         UI placeholder: "Thought for Ns"
sonic_webpage             tool              Web search result
tether_quote              tool              Extracted file content
tether_browsing_display   tool              Loading placeholder
user_editable_context     user              Custom instructions
system_error              tool              Exception from agent tool

Asset Models

AssetPointer

References to files in the export:

class AssetPointer(BaseModel):
    asset_pointer: str  # URI: file-service://file-<ID>
    content_type: Literal["image_asset_pointer"]
    height: int | None
    width: int | None
    size_bytes: int
    metadata: dict | None

URI schemes:

  • file-service://file-<ID> (modern)
  • sediment://file_<HEX> (legacy, computer-use screenshots)

Attachment

File metadata in message.metadata.attachments:

class Attachment(BaseModel):
    id: str  # Matches file ID in asset_pointer
    name: str  # Original filename
    size: int
    height: int | None
    width: int | None

Metadata Models

ExportManifest

File inventory from export_manifest.json:

class ExportManifestFile(BaseModel):
    path: str
    size_bytes: int

class ExportManifest(BaseModel):
    export_files: list[ExportManifestFile]

User

Account profile from user.json:

class User(BaseModel):
    id: str
    email: str
    phone_number: str | None
    birth_year: int | None
    chatgpt_plus_user: bool

PII Redaction

email, phone_number, and birth_year are redacted by default. Use --no-redact-pii to preserve them.

UserSettings

Feature flags from user_settings.json:

class UserSettings(BaseModel):
    model_config = ConfigDict(extra="allow")
    user_id: str
    settings: dict  # Flexible structure

MessageFeedback

Thumbs-up/thumbs-down ratings:

class MessageFeedback(BaseModel):
    id: str
    conversation_id: str
    user_id: str
    rating: Literal["thumbs_up", "thumbs_down"]
    create_time: str  # ISO-8601 timestamp

Discriminated Union Pattern

The converter uses a custom discriminator for content types to support the FallbackContent catchall:

from typing import Annotated, Any

from pydantic import Discriminator, Tag

# Abbreviated here; the full set covers every known content type.
_KNOWN_CONTENT_TYPES = {"text", "multimodal_text", "code", "thoughts"}

def _content_discriminator(value: Any) -> str:
    """Route unknown content_types to the 'fallback' tag."""
    if isinstance(value, dict):
        ct = value.get("content_type", "")
        return ct if ct in _KNOWN_CONTENT_TYPES else "fallback"
    return "fallback"

# With a callable Discriminator, each union member must carry a matching Tag.
Content = Annotated[
    Annotated[TextContent, Tag("text")]
    | Annotated[MultimodalTextContent, Tag("multimodal_text")]
    | Annotated[CodeContent, Tag("code")]
    | Annotated[ThoughtsContent, Tag("thoughts")]
    | Annotated[FallbackContent, Tag("fallback")],
    Discriminator(_content_discriminator),
]

This allows the converter to handle new content types gracefully without failing validation.

Model Configuration

Extra Fields

Some models allow extra fields for forward compatibility:

class Conversation(BaseModel):
    model_config = ConfigDict(extra="allow")
    # ...

This prevents validation errors when OpenAI adds new fields to the export format.
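With extra="allow", unmodeled fields are kept and remain accessible as attributes. A quick illustration (the model name `Probe` and the sample field are hypothetical):

```python
from pydantic import BaseModel, ConfigDict

class Probe(BaseModel):
    model_config = ConfigDict(extra="allow")
    id: str

# "new_field" is not declared on the model, yet validation succeeds
# and the value is retained on the instance.
p = Probe.model_validate({"id": "x", "new_field": 123})
print(p.new_field)  # → 123
```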

Type Coercion

Pydantic automatically coerces compatible types:

  • Unix epoch floats → datetime objects (via validators)
  • String UUIDs → validated UUID objects
  • Null values → None in optional fields
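The epoch-to-datetime conversion mentioned above can be sketched with the standard library (the helper name `to_datetime` is hypothetical; timezone-aware UTC is an assumption consistent with the "seconds since 1970-01-01 UTC" definition):

```python
from datetime import datetime, timezone

def to_datetime(epoch: float) -> datetime:
    """Convert Unix epoch seconds (as in create_time) to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)

print(to_datetime(0.0).isoformat())  # → 1970-01-01T00:00:00+00:00
```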

Next Steps