# ChatGPT Export Format
Understanding the structure of a ChatGPT data export is essential for troubleshooting conversion issues or extending the converter.
## Overview
A ChatGPT data export is a ZIP archive produced by OpenAI's "Export data" feature under Settings → Data controls. It contains:
- All conversations (partitioned JSON files)
- User-uploaded files and DALL-E generations
- Canvas/Project artifacts
- User metadata and feedback records
The export is not a relational database. It's a denormalized dump where:
- Conversations are stored as partitioned JSON arrays
- File assets are scattered across multiple directories
- References between conversations and files use internal identifiers (`file-service://` URIs)
- Message graphs within conversations are DAGs (directed acyclic graphs), not flat lists
## Root-Level Files

| File | Format | Purpose |
|---|---|---|
| `conversations-NNN.json` | JSON array | Partitioned conversation data (zero-indexed) |
| `export_manifest.json` | JSON object | Inventory of all exported files with paths and byte sizes |
| `user.json` | JSON object | Account profile (email, phone, subscription status) |
| `user_settings.json` | JSON array | Feature flags, model preferences, onboarding state |
| `message_feedback.json` | JSON array | Thumbs-up/thumbs-down ratings linked to conversations |
| `chat.html` | HTML | Interactive conversation browser (often 100+ MB) |
| `file-<ID>-<name>.<ext>` | Various | Root-level exported artifacts (modern format) |
| `file_<HEX>-<name>.<ext>` | Various | Root-level exported artifacts (legacy format) |
## Directory Structure

| Directory Pattern | Contents |
|---|---|
| `<conversation-UUID>/image/` | PNG images from a specific conversation |
| `dalle-generations/` | DALL-E-generated WebP images |
| `user-<user-ID>/` | Canvas/Project workspace files |
| `user-<user-ID>/<hex-project-ID>/mnt/data/` | Individual project sandbox files (images, code) |
## File Naming Conventions
Two naming generations coexist in exports:
### Modern Format

Pattern: `file-<Base62ID>-<descriptive-name>.<ext>`

Example: `file-1tisCpYYvMfEMvXcf5uNTb-python_guidelines.md`

File ID extraction: take the first two dash-separated tokens → `file-1tisCpYYvMfEMvXcf5uNTb`
### Legacy Format

Pattern: `file_<HexID>-<descriptive-name>.<ext>`

Example: `file_000000006d18720cb0249e36a7f3d2d5-Untitled 1.md`

File ID extraction: take everything before the first hyphen → `file_000000006d18720cb0249e36a7f3d2d5`

> **Both formats in one export.** A single export can contain both modern and legacy file naming formats. The converter handles both automatically.
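The two extraction rules above can be sketched as a single helper. This is an illustrative function (`extract_file_id` is a hypothetical name, not part of the converter's documented API):

```python
import re
from typing import Optional

def extract_file_id(filename: str) -> Optional[str]:
    """Extract the file ID from an exported filename (hypothetical helper).

    Modern: file-<Base62ID>-<name>.<ext> -> keep the first two dash tokens.
    Legacy: file_<HexID>-<name>.<ext>    -> keep everything before the
                                            first hyphen.
    """
    modern = re.match(r"^(file-[A-Za-z0-9]+)-", filename)
    if modern:
        return modern.group(1)
    legacy = re.match(r"^(file_[0-9a-fA-F]+)-", filename)
    if legacy:
        return legacy.group(1)
    return None  # not an exported-file name
```

Both example filenames from this section resolve to the IDs shown above.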
## Key JSON Files

### Export Manifest (`export_manifest.json`)

Authoritative inventory of all files:
```json
{
  "export_files": [
    {
      "path": "conversations-0.json",
      "size_bytes": 1234567
    },
    {
      "path": "file-8Vk2ls8JSO2iOVBq87yJ880Q-example.jpeg",
      "size_bytes": 129054
    }
  ]
}
```
Used by the converter to build the file ID → path lookup index.
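Building that index might look roughly like this. A minimal sketch, assuming the manifest layout shown above; the function name and the inline ID extraction mirror the naming conventions described earlier:

```python
import json
from pathlib import Path

def build_file_index(export_root: Path) -> dict:
    """Map file ID -> full path, using export_manifest.json (sketch)."""
    manifest = json.loads(
        (export_root / "export_manifest.json").read_text(encoding="utf-8")
    )
    index = {}
    for entry in manifest["export_files"]:
        name = Path(entry["path"]).name
        if name.startswith("file-"):
            # Modern format: the ID is the first two dash-separated tokens.
            parts = name.split("-")
            if len(parts) >= 3:
                index["-".join(parts[:2])] = export_root / entry["path"]
        elif name.startswith("file_"):
            # Legacy format: the ID is everything before the first hyphen.
            index[name.split("-", 1)[0]] = export_root / entry["path"]
    return index
```

Non-asset entries such as `conversations-0.json` simply never match either prefix and are skipped.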
### User Profile (`user.json`)

Account information (contains PII):
```json
{
  "id": "user-abc123",
  "email": "user@example.com",
  "phone_number": "+1234567890",
  "birth_year": 1990,
  "chatgpt_plus_user": true
}
```
> **PII Fields.** By default, `email`, `phone_number`, and `birth_year` are redacted during conversion. Use `--no-redact-pii` to preserve them.
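The redaction behavior can be sketched as follows. This is a hypothetical illustration, not the converter's actual implementation; the `"[REDACTED]"` placeholder string is an assumption:

```python
# Fields named in user.json that are masked by default.
PII_FIELDS = ("email", "phone_number", "birth_year")

def redact_pii(profile: dict, redact: bool = True) -> dict:
    """Return a copy of the user profile with PII fields masked (sketch).

    Passing redact=False corresponds to running with --no-redact-pii.
    """
    if not redact:
        return dict(profile)
    return {
        key: ("[REDACTED]" if key in PII_FIELDS else value)
        for key, value in profile.items()
    }
```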
### User Settings (`user_settings.json`)

Feature flags and preferences:
```json
[
  {
    "user_id": "user-abc123",
    "settings": {
      "training_allowed": false,
      "developer_mode": false,
      "voice_name": "glimmer",
      "last_used_model_config": {
        "slugs": {
          "default": "gpt-4o",
          "web": "gpt-4o",
          "ios_app": "gpt-4o-mini"
        }
      }
    }
  }
]
```
### Message Feedback (`message_feedback.json`)

Thumbs-up/thumbs-down ratings:
```json
[
  {
    "id": "fb-uuid",
    "conversation_id": "conv-uuid",
    "rating": "thumbs_up",
    "create_time": "2026-01-19T20:41:07.303611Z"
  }
]
```
The `conversation_id` field links each feedback record to a specific conversation.
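A converter that wants to attach ratings to conversations can group the records by that field. A minimal sketch (the function name is illustrative):

```python
from collections import defaultdict

def feedback_by_conversation(records: list) -> dict:
    """Group message_feedback.json records by conversation_id (sketch)."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["conversation_id"]].append(record)
    return dict(grouped)
```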
### Conversations (`conversations-NNN.json`)

Each file contains a JSON array of conversation objects. Large exports are split into multiple partitioned files (`conversations-0.json`, `conversations-1.json`, etc.).
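Iterating over all conversations therefore means reading every partition in numeric order. A sketch, assuming the partition naming above (a plain lexicographic sort would put `conversations-10.json` before `conversations-2.json`, hence the numeric key):

```python
import json
from pathlib import Path
from typing import Iterator

def iter_conversations(export_root: Path) -> Iterator[dict]:
    """Yield conversation objects from every conversations-NNN.json
    partition, in numeric partition order (illustrative sketch)."""
    parts = sorted(
        export_root.glob("conversations-*.json"),
        key=lambda p: int(p.stem.split("-")[-1]),  # numeric, not lexicographic
    )
    for part in parts:
        yield from json.loads(part.read_text(encoding="utf-8"))
```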
See Data Models for complete conversation structure.
## Asset Reference Resolution

### File Pointer URIs

Asset references use these URI schemes:
| Scheme | Format | Example |
|---|---|---|
| `file-service://` | `file-service://file-<ID>` | `file-service://file-8Vk2ls8JSO2iOVBq87yJ880Q` |
| `sediment://` | `sediment://file_<HEX>` | `sediment://file_000000006d18720cb0249e36a7f3d2d5` |
The converter strips the scheme prefix and looks up the file ID in the pre-built index.
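That two-step lookup can be sketched as follows (hypothetical helper; the real converter's function names may differ):

```python
from typing import Optional

# The two URI schemes used for asset pointers in exports.
SCHEMES = ("file-service://", "sediment://")

def resolve_pointer(pointer: str, index: dict) -> Optional[str]:
    """Strip the URI scheme, then look the bare file ID up in the
    pre-built file ID -> path index (sketch)."""
    for scheme in SCHEMES:
        if pointer.startswith(scheme):
            return index.get(pointer[len(scheme):])
    return None  # unknown scheme
```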
### Resolution Priority
When resolving a file ID, the converter searches locations in this order:
1. Root directory: `file-<ID>-*`
2. Conversation image directories: `<conv-UUID>/image/file*`
3. DALL-E directory: `dalle-generations/file-<ID>-*`
4. User workspace directories: `user-<user-ID>/**/file*`
5. Legacy root format: `file_<HEX>-*`
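The search order above can be approximated with glob patterns. A hypothetical sketch (the real converter may search differently; since the ID prefix already encodes modern `file-` vs. legacy `file_` naming, one root pattern covers both root-level cases):

```python
from pathlib import Path
from typing import Optional

def find_asset(export_root: Path, file_id: str) -> Optional[Path]:
    """Try each location in priority order; return the first match (sketch)."""
    patterns = [
        f"{file_id}-*",                    # root directory (modern & legacy)
        f"*/image/{file_id}*",             # conversation image directories
        f"dalle-generations/{file_id}-*",  # DALL-E generations
        f"user-*/**/{file_id}*",           # user workspace directories
    ]
    for pattern in patterns:
        match = next(export_root.glob(pattern), None)
        if match is not None:
            return match
    return None
```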
### Missing Assets
If a file ID cannot be resolved:
- A warning is logged with the conversation ID, message ID, and unresolved pointer
- The Markdown output includes a placeholder comment: `<!-- MISSING ASSET: file-service://file-<ID> -->`
- Conversion continues (it does not fail)
## Conversation Structure

Each conversation in `conversations-*.json` contains:

- **Metadata**: title, timestamps, model slug, flags
- **Mapping**: dictionary of nodes forming a DAG
- **Current node**: pointer to the active conversation leaf
### Message DAG
Messages are organized as a directed acyclic graph (DAG):
- **Root node**: always has `parent: null` and `message: null`
- **Branches**: created when users edit previous messages
- **Current node**: points to the leaf of the primary conversation path
The linearizer walks from `current_node` back to the root to reconstruct the conversation timeline.
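That walk can be sketched in a few lines. This assumes the node shape described above (each mapping entry has `parent` and `message` keys, with both `null` at the root); function and variable names are illustrative:

```python
def linearize(mapping: dict, current_node: str) -> list:
    """Follow parent links from current_node up to the root, then
    reverse, yielding the primary path in chronological order (sketch).
    Branches not on the current path are simply never visited."""
    path = []
    node_id = current_node
    while node_id is not None:
        node = mapping[node_id]
        if node.get("message") is not None:  # skip the message: null root
            path.append(node["message"])
        node_id = node.get("parent")
    path.reverse()
    return path
```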
## Common Export Sizes
| Account Activity | Export Size | Conversations | Files |
|---|---|---|---|
| Light user | 1-10 MB | 10-50 | 0-10 |
| Moderate user | 10-100 MB | 50-500 | 10-100 |
| Heavy user | 100 MB-1 GB | 500-5,000 | 100+ |
| Power user | 1-10 GB | 5,000+ | 1,000+ |
The `chat.html` file alone can exceed 100 MB for active accounts.
## Next Steps
- Explore the data models for detailed JSON schemas
- Understand the conversion pipeline
- Review output structure conventions