Skip to content

Output Structure

The converter produces a structured Markdown archive with consistent naming conventions and organization.

Directory Layout

archive/
├── index.md                              # Root index with links to all sections
├── conversations/
│   ├── index.md                          # Table of all conversations
│   ├── 2024-08-24_image-creation-prompt_17cd7535/
│   │   ├── index.md                      # Rendered conversation
│   │   ├── media/
│   │   │   ├── 001-a1b2c3d4.jpeg         # Resolved media files
│   │   │   └── 002-e5f6g7h8.png
│   │   └── attachments/
│   │       ├── document.pdf
│   │       └── data.json
│   ├── 2024-09-11_python-best-practices_1ebe07f1/
│   │   ├── index.md
│   │   ├── media/
│   │   └── attachments/
│   └── ...
├── dalle/
│   ├── index.md                          # DALL-E gallery
│   └── a1b2c3d4.webp                     # DALL-E images by hash
└── metadata/
    ├── user_profile.md                   # Account information
    ├── settings.md                       # User settings and preferences
    ├── feedback.md                       # Message feedback (thumbs up/down)
    └── export_stats.md                   # Export statistics

Design rationale:

  • Conversations: Each in its own directory with media and attachments nested together
  • DALL-E: Centralized directory for all generated images
  • Metadata: Human-readable summaries of account data

Naming Conventions

Conversation Directories

Pattern: <YYYY-MM-DD>_<slug>_<short-id>/

Component Rule Example
YYYY-MM-DD From create_time (Unix → UTC date) 2024-08-24
slug Lowercase title; non-alphanumeric → hyphen; max 60 chars image-creation-prompt
short-id First 8 characters of conversation_id 17cd7535

Slug generation:

  • Uses python-slugify with text-unidecode for robust Unicode handling
  • Converts to lowercase
  • Transliterates accents and non-Latin scripts (e.g., cafécafe, Résuméresume, Компьютерkompiuter)
  • Replaces non-alphanumeric characters with hyphens
  • Collapses consecutive hyphens
  • Trims leading/trailing hyphens
  • Strips emoji
  • Truncates to 60 characters

Example:

  • Title: "Image Creation Prompt"
  • Date: 2024-08-24
  • ID: 17cd7535-aa77-4553-8c04-ee082d0d702f
  • Result: 2024-08-24_image-creation-prompt_17cd7535/

Media Files

Pattern: <NNN>-<short-hash>.<ext>

Component Rule
NNN Zero-padded ordinal (order of appearance)
short-hash First 8 hex chars of SHA-256 hash (deduplication key)
ext Original extension, lowercased

Examples:

  • 001-a1b2c3d4.jpeg
  • 002-e5f6g7h8.png
  • 003-9a0b1c2d.webp

Deduplication:

When --deduplicate is enabled (default), identical files share the same hash. The converter creates a single copy and links multiple conversations to it.

Attachment Files

Pattern: Sanitized original filename

Rules:

  • Replace unsafe characters (<>:"/\|?*) → -
  • Collapse consecutive hyphens
  • Lowercase
  • Prepend short-id on collision

Examples:

  • Original: My Document.pdfmy-document.pdf
  • Original: file<name>.jsonfile-name-.json
  • Collision: data.json17cd7535-data.json

Platform Safety

All filenames adhere to cross-platform safety rules:

  • Maximum length: 200 characters
  • No leading/trailing dots or spaces
  • Avoid Windows reserved names: CON, PRN, AUX, NUL, COM1COM9, LPT1LPT9
  • UTF-8 encoding throughout

Conversation Markdown Format

Front Matter

Every conversation starts with YAML front matter:

---
id: 17cd7535-aa77-4553-8c04-ee082d0d702f
title: Image Creation Prompt
created: 2024-08-24T07:33:32Z
updated: 2024-08-24T07:33:40Z
model: gpt-4o
---

Message Structure

Messages are rendered by role with clear separators:

**User:**

![Uploaded image](media/001-a1b2c3d4.jpeg)

Give the prompt to generate this image.

---

**Assistant** *(gpt-4o)*:

To recreate this image using a text prompt, here's a detailed description you can use...

Thinking Blocks

When --include-thinking is used, reasoning chains are included:

**Assistant** *(thinking)*:

Let me analyze this step by step:
1. The image shows...
2. Key elements include...
3. Color palette suggests...

---

**Assistant** *(gpt-4o)*:

To recreate this image using a text prompt, here's a detailed description you can use...

Tool Output

Tool invocations and outputs are rendered as collapsible blocks:

**Tool** (`web.search`):

Searched: "ChatGPT export format"

> **OpenAI Help Center**
> https://help.openai.com/...
>
> You can export your ChatGPT data from Settings → Data controls...

Code Blocks

Code content is rendered with language-specific syntax highlighting:

**Tool** (`python`):

```python
import json

with open("data.json") as f:
    data = json.load(f)
```

Missing Assets

Unresolved file pointers are indicated with HTML comments:

<!-- MISSING ASSET: file-service://file-8Vk2ls8JSO2iOVBq87yJ880Q -->

This allows post-processing to identify and potentially fix broken references.

Index Files

Root Index (index.md)

Overview of the entire archive:

  • Export statistics (conversation count, date range, file count)
  • Links to conversations index
  • Links to DALL-E gallery
  • Links to metadata section

Conversations Index (conversations/index.md)

Table of all conversations sorted by date:

Date Title Model
2024-08-24 Image Creation Prompt gpt-4o
2024-09-11 Python Best Practices gpt-4o-mini

Each title links to the conversation directory.

DALL-E Index (dalle/index.md)

Gallery view of all DALL-E-generated images with thumbnails.

Metadata Files

  • user_profile.md: Account ID, subscription status (PII redacted by default)
  • settings.md: Feature flags, model preferences, voice settings
  • feedback.md: List of thumbs-up/thumbs-down ratings by conversation
  • export_stats.md: Export date, total conversations, total messages, etc.

All links within the archive are relative for portability:

See the [conversation index](conversations/index.md) for all chats.

View [this conversation](conversations/2024-08-24_image-creation-prompt_17cd7535/index.md).

![Image](media/001-a1b2c3d4.jpeg)

This ensures the archive can be:

  • Moved to any directory
  • Hosted on a static web server
  • Zipped and shared
  • Opened in any Markdown viewer

Template Customization

The output format is controlled by Jinja2 templates in src/chatgpt_to_markdown/templates/:

Template Purpose
conversation.md.j2 Individual conversation rendering
root_index.md.j2 Archive root index
conversation_index.md.j2 Conversations table
dalle_index.md.j2 DALL-E gallery
metadata/user_profile.md.j2 User profile page
metadata/settings.md.j2 Settings page
metadata/feedback.md.j2 Feedback page
metadata/export_stats.md.j2 Export statistics page

See Development Guide for customization instructions.

Archive Size

Expected archive sizes:

Export Size Archive Size Notes
10 MB 8-12 MB Minimal overhead; mostly text
100 MB 80-110 MB Deduplication reduces size by ~20%
1 GB 700-900 MB Significant savings from deduplication

Deduplication impact:

  • Identical images in multiple conversations are stored once
  • SHA-256 hashing identifies duplicates
  • Original export may have 5-10% duplicate assets

Next Steps