
ChatGPT Export Format

Understanding the structure of a ChatGPT data export is essential for troubleshooting conversion issues or extending the converter.

Overview

A ChatGPT data export is a ZIP archive produced by OpenAI's "Export data" feature under Settings → Data controls. It contains:

  • All conversations (partitioned JSON files)
  • User-uploaded files and DALL-E generations
  • Canvas/Project artifacts
  • User metadata and feedback records

The export is not a relational database. It's a denormalized dump where:

  • Conversations are stored as partitioned JSON arrays
  • File assets are scattered across multiple directories
  • References between conversations and files use internal identifiers (file-service:// URIs)
  • Message graphs within conversations are DAGs (directed acyclic graphs), not flat lists

Root-Level Files

File | Format | Purpose
--- | --- | ---
conversations-NNN.json | JSON array | Partitioned conversation data (zero-indexed)
export_manifest.json | JSON object | Inventory of all exported files with paths and byte sizes
user.json | JSON object | Account profile (email, phone, subscription status)
user_settings.json | JSON array | Feature flags, model preferences, onboarding state
message_feedback.json | JSON array | Thumbs-up/thumbs-down ratings linked to conversations
chat.html | HTML | Interactive conversation browser (can exceed 100 MB for active accounts)
file-<ID>-<name>.<ext> | Various | Root-level exported artifacts (modern naming format)
file_<HEX>-<name>.<ext> | Various | Root-level exported artifacts (legacy naming format)

Directory Structure

Directory Pattern | Contents
--- | ---
<conversation-UUID>/image/ | PNG images from a specific conversation
dalle-generations/ | DALL-E-generated WebP images
user-<user-ID>/ | Canvas/Project workspace files
user-<user-ID>/<hex-project-ID>/mnt/data/ | Individual project sandbox files (images, code)

File Naming Conventions

Two naming generations coexist in exports:

Modern Format

Pattern: file-<Base62ID>-<descriptive-name>.<ext>

Example: file-1tisCpYYvMfEMvXcf5uNTb-python_guidelines.md

File ID extraction: Take the first two dash-separated tokens → file-1tisCpYYvMfEMvXcf5uNTb

Legacy Format

Pattern: file_<HexID>-<descriptive-name>.<ext>

Example: file_000000006d18720cb0249e36a7f3d2d5-Untitled 1.md

File ID extraction: Take everything before the first hyphen → file_000000006d18720cb0249e36a7f3d2d5

Both formats in one export

A single export can contain both modern and legacy file naming formats. The converter handles both automatically.
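Both extraction rules can be expressed as two anchored regular expressions. This is a minimal sketch, assuming filenames follow exactly the two patterns above; the function name extract_file_id is hypothetical and not necessarily the converter's actual API.

```python
import re
from typing import Optional

# Modern IDs: "file-" followed by a Base62 token ([0-9A-Za-z] only), so the
# ID ends at the first hyphen after the prefix.
_MODERN_ID = re.compile(r"^(file-[0-9A-Za-z]+)-")
# Legacy IDs: "file_" followed by a hex token, also ending at the first hyphen.
_LEGACY_ID = re.compile(r"^(file_[0-9a-fA-F]+)-")


def extract_file_id(filename: str) -> Optional[str]:
    """Return the file ID embedded in an exported filename, or None."""
    for pattern in (_MODERN_ID, _LEGACY_ID):
        match = pattern.match(filename)
        if match:
            return match.group(1)
    return None


print(extract_file_id("file-1tisCpYYvMfEMvXcf5uNTb-python_guidelines.md"))
# prints: file-1tisCpYYvMfEMvXcf5uNTb
print(extract_file_id("file_000000006d18720cb0249e36a7f3d2d5-Untitled 1.md"))
# prints: file_000000006d18720cb0249e36a7f3d2d5
```

Because Base62 and hex alphabets never contain a hyphen, the first hyphen after the prefix reliably marks the end of the ID, even when the descriptive name itself contains hyphens or spaces.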

Key JSON Files

Export Manifest (export_manifest.json)

Authoritative inventory of all files:

{
  "export_files": [
    {
      "path": "conversations-0.json",
      "size_bytes": 1234567
    },
    {
      "path": "file-8Vk2ls8JSO2iOVBq87yJ880Q-example.jpeg",
      "size_bytes": 129054
    }
  ]
}

Used by the converter to build the file ID → path lookup index.
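A file ID → path index can be built from the manifest alone, without walking the directory tree. The sketch below assumes every asset filename starts with a modern or legacy file ID; the helper names (file_id_from_name, build_file_index, load_file_index) are hypothetical, and keeping the first path seen for a duplicated ID is an illustrative choice, not documented converter behavior.

```python
import json
import re
from pathlib import Path, PurePosixPath
from typing import Dict, Optional

# Matches either a modern (file-<Base62>) or legacy (file_<hex>) ID prefix.
_FILE_ID = re.compile(r"^(file-[0-9A-Za-z]+|file_[0-9a-fA-F]+)-")


def file_id_from_name(name: str) -> Optional[str]:
    """Return the file ID at the start of a bare filename, or None."""
    match = _FILE_ID.match(name)
    return match.group(1) if match else None


def build_file_index(manifest: dict) -> Dict[str, str]:
    """Map file ID -> relative path using the manifest's export_files list."""
    index: Dict[str, str] = {}
    for entry in manifest.get("export_files", []):
        path = entry["path"]
        # Manifest paths use forward slashes; match only the final component.
        file_id = file_id_from_name(PurePosixPath(path).name)
        # Non-asset files (e.g. conversations-0.json) yield None and are skipped.
        if file_id is not None and file_id not in index:
            index[file_id] = path
    return index


def load_file_index(export_root: Path) -> Dict[str, str]:
    """Read export_manifest.json from an unzipped export and index it."""
    with open(export_root / "export_manifest.json", encoding="utf-8") as fh:
        return build_file_index(json.load(fh))
```

Applied to the manifest excerpt above, build_file_index yields a single entry mapping file-8Vk2ls8JSO2iOVBq87yJ880Q to file-8Vk2ls8JSO2iOVBq87yJ880Q-example.jpeg; conversations-0.json is ignored.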

User Profile (user.json)

Account information (contains PII):

{
  "id": "user-abc123",
  "email": "user@example.com",
  "phone_number": "+1234567890",
  "birth_year": 1990,
  "chatgpt_plus_user": true
}

PII Fields

By default, email, phone_number, and birth_year are redacted during conversion. Use --no-redact-pii to preserve them.
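The default-on redaction and its opt-out flag can be sketched as follows. The field list and the --no-redact-pii flag come from the note above; the "[REDACTED]" placeholder and the function names are assumptions for illustration, since the real converter may remove or mask these fields differently.

```python
import argparse
from typing import Any, Dict, List, Optional

# Fields redacted by default, per the note above.
PII_FIELDS = ("email", "phone_number", "birth_year")
# Hypothetical placeholder value; the actual converter may behave differently.
REDACTED = "[REDACTED]"


def redact_user_profile(profile: Dict[str, Any], redact_pii: bool = True) -> Dict[str, Any]:
    """Return a copy of a user.json profile with PII fields masked."""
    result = dict(profile)  # shallow copy, so the caller's dict is untouched
    if redact_pii:
        for field in PII_FIELDS:
            if field in result:
                result[field] = REDACTED
    return result


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Map the --no-redact-pii flag onto a redact_pii boolean (default True)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--no-redact-pii",
        dest="redact_pii",
        action="store_false",
        help="keep email, phone_number, and birth_year in the output",
    )
    return parser.parse_args(argv)
```

Using argparse's store_false action means redaction stays on unless the user explicitly passes the flag, which matches the "redacted by default" behavior described above.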

User Settings (user_settings.json)

Feature flags and preferences:

[
  {
    "user_id": "user-abc123",
    "settings": {
      "training_allowed": false,
      "developer_mode": false,
      "voice_name": "glimmer",
      "last_used_model_config": {
        "slugs": {
          "default": "gpt-4o",
          "web": "gpt-4o",
          "ios_app": "gpt-4o-mini"
        }
      }
    }
  }
]
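Because settings keys vary between accounts, nested values are safest read defensively. This sketch collects the last-used model slugs shown above, assuming the settings → last_used_model_config → slugs nesting from the example; missing levels simply produce an empty result. The function name is hypothetical.

```python
from typing import Dict, List


def last_used_models(user_settings: List[dict]) -> Dict[str, str]:
    """Collect last_used_model_config.slugs (surface -> model slug)."""
    slugs: Dict[str, str] = {}
    for record in user_settings:
        # Each level may be absent or null, so fall back to an empty dict.
        settings = record.get("settings") or {}
        config = settings.get("last_used_model_config") or {}
        slugs.update(config.get("slugs") or {})
    return slugs
```

For the example above this returns {"default": "gpt-4o", "web": "gpt-4o", "ios_app": "gpt-4o-mini"}.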

Message Feedback (message_feedback.json)

Thumbs-up/thumbs-down ratings:

[
  {
    "id": "fb-uuid",
    "conversation_id": "conv-uuid",
    "rating": "thumbs_up",
    "create_time": "2026-01-19T20:41:07.303611Z"
  }
]

The conversation_id links feedback to specific conversations.
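Joining feedback to conversations amounts to grouping records by conversation_id. A minimal sketch, assuming records without a conversation_id should be skipped; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def feedback_by_conversation(feedback: List[dict]) -> Dict[str, List[dict]]:
    """Group message_feedback.json records by conversation_id, preserving order."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for record in feedback:
        conv_id = record.get("conversation_id")
        if conv_id:  # skip records that cannot be linked to a conversation
            grouped[conv_id].append(record)
    return dict(grouped)
```

The resulting dictionary can then be consulted while rendering each conversation to attach its ratings.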

Conversations (conversations-NNN.json)

Each file contains a JSON array of conversation objects. Large exports are split into multiple partitioned files (conversations-0.json, conversations-1.json, etc.).

See Data Models for complete conversation structure.
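Partition files should be read in numeric order: a plain lexicographic sort would place conversations-10.json before conversations-2.json. The sketch below assumes partitions match conversations-<digits>.json (which also tolerates zero-padded numbers) and loads each partition whole; the helper names are hypothetical.

```python
import json
import re
from pathlib import Path
from typing import Iterator, List, Optional

_PARTITION = re.compile(r"^conversations-(\d+)\.json$")


def partition_index(name: str) -> Optional[int]:
    """Return N for 'conversations-N.json', or None for any other filename."""
    match = _PARTITION.match(name)
    return int(match.group(1)) if match else None


def partition_files(export_root: Path) -> List[Path]:
    """List partition files sorted numerically (0, 1, 2, ..., 10), not lexically."""
    found = [p for p in export_root.iterdir() if partition_index(p.name) is not None]
    return sorted(found, key=lambda p: partition_index(p.name))


def iter_conversations(export_root: Path) -> Iterator[dict]:
    """Yield every conversation object across all partitions, in partition order."""
    for path in partition_files(export_root):
        with open(path, encoding="utf-8") as fh:
            # Each partition is a JSON array of conversation objects.
            yield from json.load(fh)
```

Only one partition is held in memory at a time, which keeps peak usage bounded by the largest partition rather than the whole export.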

Asset Reference Resolution

File Pointer URIs

Asset references use these URI schemes:

Scheme | Format | Example
--- | --- | ---
file-service:// | file-service://file-<ID> | file-service://file-8Vk2ls8JSO2iOVBq87yJ880Q
sediment:// | sediment://file_<HEX> | sediment://file_000000006d18720cb0249e36a7f3d2d5

The converter strips the scheme prefix and looks up the file ID in the pre-built index.
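Stripping the scheme is a prefix check against the two known schemes. In this sketch, returning None for an unrecognized scheme is an assumption (the converter might instead log or pass the value through); the function name is hypothetical.

```python
from typing import Optional

# The two pointer schemes listed in the table above.
_SCHEMES = ("file-service://", "sediment://")


def pointer_to_file_id(pointer: str) -> Optional[str]:
    """Strip a known scheme prefix and return the bare file ID, or None."""
    for scheme in _SCHEMES:
        if pointer.startswith(scheme):
            return pointer[len(scheme):]
    return None
```

The returned ID (for example file-8Vk2ls8JSO2iOVBq87yJ880Q) is the same key format produced when indexing filenames, so it can be looked up directly.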

Resolution Priority

When resolving a file ID, the converter searches locations in this order:

  1. Root directory: file-<ID>-*
  2. Conversation image directories: <conv-UUID>/image/file*
  3. DALL-E directory: dalle-generations/file-<ID>-*
  4. User workspace directories: user-<user-ID>/**/file*
  5. Legacy root format: file_<HEX>-*
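The priority list maps naturally onto an ordered sequence of glob patterns relative to the export root. This sketch assumes the full ID (including its file- or file_ prefix) is passed in, applies steps 1 and 3 only to modern IDs and step 5 only to legacy IDs, and returns the first match; candidate_patterns and resolve_file_id are hypothetical names.

```python
from pathlib import Path
from typing import List, Optional


def candidate_patterns(file_id: str) -> List[str]:
    """Glob patterns for a file ID, in the resolution priority order above."""
    modern = file_id.startswith("file-")
    patterns: List[str] = []
    if modern:
        patterns.append(f"{file_id}-*")                    # 1. root directory
    patterns.append(f"*/image/{file_id}*")                 # 2. <conv-UUID>/image/
    if modern:
        patterns.append(f"dalle-generations/{file_id}-*")  # 3. DALL-E directory
    patterns.append(f"user-*/**/{file_id}*")               # 4. user workspaces (recursive)
    if file_id.startswith("file_"):
        patterns.append(f"{file_id}-*")                    # 5. legacy root format
    return patterns


def resolve_file_id(export_root: Path, file_id: str) -> Optional[Path]:
    """Return the first existing file matching the patterns, or None."""
    for pattern in candidate_patterns(file_id):
        # sorted() keeps the choice deterministic if a pattern matches several files.
        matches = sorted(p for p in export_root.glob(pattern) if p.is_file())
        if matches:
            return matches[0]
    return None
```

Globbing the tree per lookup is simple but slow on large exports; building an index once, as the converter does, avoids repeated directory scans.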

Missing Assets

If a file ID cannot be resolved:

  • A warning is logged with conversation ID, message ID, and unresolved pointer
  • Markdown output includes: <!-- MISSING ASSET: file-service://file-<ID> -->
  • Conversion continues (does not fail)
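The three behaviors above (log a warning, emit a placeholder comment, keep going) fit in a single rendering helper. This is a sketch: the Markdown image syntax for resolved assets, the logger name, and the render_asset signature are assumptions; only the MISSING ASSET comment format comes from the text above.

```python
import logging
from typing import Dict

logger = logging.getLogger(__name__)  # hypothetical logger; the converter's may differ


def render_asset(pointer: str, file_index: Dict[str, str],
                 conversation_id: str, message_id: str) -> str:
    """Return Markdown for an asset pointer, or a MISSING ASSET comment."""
    file_id = pointer.split("://", 1)[-1]  # drop the scheme prefix
    path = file_index.get(file_id)
    if path is None:
        # Warn with enough context to trace the gap, but do not raise.
        logger.warning(
            "Unresolved asset pointer %s (conversation=%s, message=%s)",
            pointer, conversation_id, message_id,
        )
        return f"<!-- MISSING ASSET: {pointer} -->"
    return f"![{file_id}]({path})"
```

Returning a string in both branches is what lets conversion continue: the caller never has to handle an exception for a missing file.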

Conversation Structure

Each conversation in conversations-*.json contains:

  • Metadata: Title, timestamps, model slug, flags
  • Mapping: Dictionary of nodes forming a DAG
  • Current node: Pointer to the active conversation leaf

Message DAG

Messages are organized as a directed acyclic graph (DAG):

  • Root node: Always has parent: null and message: null
  • Branches: Created when users edit a previous message or regenerate a response
  • Current node: Points to the leaf of the primary conversation path

The linearizer walks from current_node back to the root by following each node's parent pointer, then reverses the collected path to reconstruct the conversation timeline in chronological order.
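A minimal linearizer sketch, assuming each entry in mapping carries parent and message fields as described above. Nodes with message: null (such as the root) are skipped, and because the walk visits nodes leaf-first, the collected list is reversed at the end. The seen set and the dangling-parent check are defensive additions, not documented converter behavior; the function name is hypothetical.

```python
from typing import Dict, List, Optional


def linearize(conversation: dict) -> List[dict]:
    """Return the messages on the current_node path, oldest first."""
    mapping: Dict[str, dict] = conversation["mapping"]
    node_id: Optional[str] = conversation.get("current_node")
    path: List[dict] = []
    seen = set()
    while node_id is not None and node_id not in seen:
        seen.add(node_id)          # guards against malformed cycles
        node = mapping.get(node_id)
        if node is None:           # dangling parent reference: stop walking
            break
        if node.get("message") is not None:
            path.append(node["message"])
        node_id = node.get("parent")
    path.reverse()                 # walked leaf -> root; flip to chronological
    return path
```

Sibling branches (for example an edited prompt or a regenerated reply) are never visited, because only the ancestors of current_node lie on the walk.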

Common Export Sizes

Account Activity | Export Size | Conversations | Files
--- | --- | --- | ---
Light user | 1-10 MB | 10-50 | 0-10
Moderate user | 10-100 MB | 50-500 | 10-100
Heavy user | 100 MB-1 GB | 500-5,000 | 100+
Power user | 1-10 GB | 5,000+ | 1,000+

The chat.html file alone can exceed 100 MB for active accounts.

Next Steps