Output Structure¶
The converter produces a structured Markdown archive with consistent naming conventions and organization.
Directory Layout¶
archive/
├── index.md                                  # Root index with links to all sections
├── conversations/
│   ├── index.md                              # Table of all conversations
│   ├── 2024-08-24_image-creation-prompt_17cd7535/
│   │   ├── index.md                          # Rendered conversation
│   │   ├── media/
│   │   │   ├── 001-a1b2c3d4.jpeg             # Resolved media files
│   │   │   └── 002-e5f6a7b8.png
│   │   └── attachments/
│   │       ├── document.pdf
│   │       └── data.json
│   ├── 2024-09-11_python-best-practices_1ebe07f1/
│   │   ├── index.md
│   │   ├── media/
│   │   └── attachments/
│   └── ...
├── dalle/
│   ├── index.md                              # DALL-E gallery
│   └── a1b2c3d4.webp                         # DALL-E images by hash
└── metadata/
    ├── user_profile.md                       # Account information
    ├── settings.md                           # User settings and preferences
    ├── feedback.md                           # Message feedback (thumbs up/down)
    └── export_stats.md                       # Export statistics
Design rationale:
- Conversations: Each in its own directory with media and attachments nested together
- DALL-E: Centralized directory for all generated images
- Metadata: Human-readable summaries of account data
Naming Conventions¶
Conversation Directories¶
Pattern: <YYYY-MM-DD>_<slug>_<short-id>/
| Component | Rule | Example |
|---|---|---|
| YYYY-MM-DD | From create_time (Unix → UTC date) | 2024-08-24 |
| slug | Lowercase title; non-alphanumeric → hyphen; max 60 chars | image-creation-prompt |
| short-id | First 8 characters of conversation_id | 17cd7535 |
Slug generation:
- Uses python-slugify with text-unidecode for robust Unicode handling
- Converts to lowercase
- Transliterates accents and non-Latin scripts (e.g., café → cafe, Résumé → resume, Компьютер → kompiuter)
- Replaces non-alphanumeric characters with hyphens
- Collapses consecutive hyphens
- Trims leading/trailing hyphens
- Strips emoji
- Truncates to 60 characters
Example:
- Title: "Image Creation Prompt"
- Date: 2024-08-24
- ID: 17cd7535-aa77-4553-8c04-ee082d0d702f
- Result: 2024-08-24_image-creation-prompt_17cd7535/
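The naming rule can be sketched in a few lines of Python. This is a stdlib-only approximation, not the converter's own code: the converter uses python-slugify, whereas the hypothetical helper `approx_slug` below relies on Unicode NFKD decomposition, which handles accents (café → cafe) but drops non-Latin scripts instead of transliterating them. The function name `conversation_dir_name` is also illustrative.

```python
import re
import unicodedata
from datetime import datetime, timezone


def approx_slug(title: str, max_len: int = 60) -> str:
    """Approximate the slug rules with the standard library only."""
    # Decompose accented characters, then drop anything that is not ASCII
    # (this also removes emoji and, unlike python-slugify, non-Latin text).
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, turn each run of non-alphanumerics into one hyphen,
    # and trim hyphens at both ends.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    # Truncate, then trim again in case the cut landed on a hyphen.
    return slug[:max_len].strip("-")


def conversation_dir_name(title: str, create_time: float, conversation_id: str) -> str:
    """Build <YYYY-MM-DD>_<slug>_<short-id>/ from the three inputs."""
    date = datetime.fromtimestamp(create_time, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{date}_{approx_slug(title)}_{conversation_id[:8]}/"


print(conversation_dir_name(
    "Image Creation Prompt",
    1724484812,  # 2024-08-24T07:33:32Z as a Unix timestamp
    "17cd7535-aa77-4553-8c04-ee082d0d702f",
))
# 2024-08-24_image-creation-prompt_17cd7535/
```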
Media Files¶
Pattern: <NNN>-<short-hash>.<ext>
| Component | Rule |
|---|---|
| NNN | Zero-padded ordinal (order of appearance) |
| short-hash | First 8 hex chars of SHA-256 hash (deduplication key) |
| ext | Original extension, lowercased |
Examples:
- 001-a1b2c3d4.jpeg
- 002-e5f6a7b8.png
- 003-9a0b1c2d.webp
Deduplication:
When --deduplicate is enabled (default), identical files share the same hash. The converter creates a single copy and links multiple conversations to it.
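As a rough illustration of the media pattern, the sketch below derives a filename from the file's bytes and its position in the conversation. The function name `media_filename` is hypothetical and the converter's real implementation may differ. The key point is that identical bytes always yield the same short hash, which is what makes it usable as a deduplication key.

```python
import hashlib
from pathlib import PurePosixPath


def media_filename(ordinal: int, data: bytes, original_name: str) -> str:
    """Build <NNN>-<short-hash>.<ext> for one media file."""
    # First 8 hex characters of the SHA-256 digest of the file contents.
    short_hash = hashlib.sha256(data).hexdigest()[:8]
    # Keep the original extension, lowercased (".JPEG" -> ".jpeg").
    ext = PurePosixPath(original_name).suffix.lower()
    # Zero-pad the ordinal to three digits.
    return f"{ordinal:03d}-{short_hash}{ext}"


# SHA-256 of empty input starts with e3b0c442.
print(media_filename(1, b"", "Photo.JPEG"))  # 001-e3b0c442.jpeg

# Same bytes, different position: the hash part is identical.
a = media_filename(1, b"same bytes", "a.png")
b = media_filename(7, b"same bytes", "b.png")
print(a.split("-")[1] == b.split("-")[1])  # True
```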
Attachment Files¶
Pattern: Sanitized original filename
Rules:
- Replace unsafe characters (<>:"/\|?*) → -
- Collapse consecutive hyphens
- Lowercase
- Prepend short-id on collision
Examples:
- Original: My Document.pdf → my-document.pdf
- Original: file<name>.json → file-name-.json
- Collision: data.json → 17cd7535-data.json
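The attachment rules can be sketched as follows. `sanitize_attachment` and the `taken` set are hypothetical names for illustration. The sketch also replaces whitespace with a hyphen, which is an assumption inferred from the My Document.pdf example rather than a stated rule.

```python
import re

# Unsafe characters from the rules above, plus whitespace (assumed from
# the "My Document.pdf" example).
UNSAFE = re.compile(r'[<>:"/\\|?*\s]')


def sanitize_attachment(name: str, short_id: str, taken: set) -> str:
    """Sanitize a filename; prefix the short-id if the result is already used."""
    cleaned = UNSAFE.sub("-", name)            # unsafe characters -> hyphen
    cleaned = re.sub(r"-{2,}", "-", cleaned)   # collapse consecutive hyphens
    cleaned = cleaned.lower()
    if cleaned in taken:                       # collision within this directory
        cleaned = f"{short_id}-{cleaned}"
    taken.add(cleaned)
    return cleaned


used = set()
print(sanitize_attachment("My Document.pdf", "17cd7535", used))  # my-document.pdf
print(sanitize_attachment("file<name>.json", "17cd7535", used))  # file-name-.json
print(sanitize_attachment("data.json", "17cd7535", used))        # data.json
print(sanitize_attachment("data.json", "17cd7535", used))        # 17cd7535-data.json
```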
Platform Safety¶
All filenames adhere to cross-platform safety rules:
- Maximum length: 200 characters
- No leading/trailing dots or spaces
- Avoid Windows reserved names: CON, PRN, AUX, NUL, COM1–COM9, LPT1–LPT9
- UTF-8 encoding throughout
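A minimal checker for these rules might look like the sketch below. The name `is_platform_safe` is hypothetical, and the sketch only validates a name; how the converter repairs an unsafe name is not covered here. Reserved names are compared case-insensitively against the part before the first dot, since Windows treats con.txt the same as CON.

```python
# Windows reserved device names listed in the rules above.
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_platform_safe(name: str, max_len: int = 200) -> bool:
    """Return True if a single filename passes the cross-platform rules."""
    if not name or len(name) > max_len:
        return False
    # No leading or trailing dots or spaces.
    if name[0] in ". " or name[-1] in ". ":
        return False
    # Reserved names are unsafe with or without an extension.
    stem = name.split(".", 1)[0].upper()
    return stem not in RESERVED


print(is_platform_safe("my-document.pdf"))  # True
print(is_platform_safe("con.txt"))          # False
print(is_platform_safe(".hidden"))          # False
```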
Conversation Markdown Format¶
Front Matter¶
Every conversation starts with YAML front matter:
---
id: 17cd7535-aa77-4553-8c04-ee082d0d702f
title: Image Creation Prompt
created: 2024-08-24T07:33:32Z
updated: 2024-08-24T07:33:40Z
model: gpt-4o
---
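For tools that consume the archive, the front matter can be read back with a few lines of Python. The sketch below is a minimal stdlib reader that only understands flat `key: value` pairs like the ones shown; the name `parse_front_matter` is hypothetical, and a full YAML parser (for example PyYAML's `yaml.safe_load`) would be more robust for anything beyond this shape.

```python
def parse_front_matter(markdown_text: str):
    """Split a conversation file into (metadata dict, body text)."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown_text          # no front matter present
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":         # closing delimiter
            return meta, "\n".join(lines[i + 1:])
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, markdown_text              # unterminated block: treat as body


sample = """---
id: 17cd7535-aa77-4553-8c04-ee082d0d702f
title: Image Creation Prompt
created: 2024-08-24T07:33:32Z
updated: 2024-08-24T07:33:40Z
model: gpt-4o
---
**User:**
"""
meta, body = parse_front_matter(sample)
print(meta["created"])  # 2024-08-24T07:33:32Z
print(meta["model"])    # gpt-4o
```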
Message Structure¶
Messages are rendered by role with clear separators:
**User:**

Give the prompt to generate this image.
---
**Assistant** *(gpt-4o)*:
To recreate this image using a text prompt, here's a detailed description you can use...
Thinking Blocks¶
When --include-thinking is used, reasoning chains are included:
**Assistant** *(thinking)*:
Let me analyze this step by step:
1. The image shows...
2. Key elements include...
3. Color palette suggests...
---
**Assistant** *(gpt-4o)*:
To recreate this image using a text prompt, here's a detailed description you can use...
Tool Output¶
Tool invocations and outputs are rendered as collapsible blocks:
**Tool** (`web.search`):
Searched: "ChatGPT export format"
> **OpenAI Help Center**
> https://help.openai.com/...
>
> You can export your ChatGPT data from Settings → Data controls...
Code Blocks¶
Code content is rendered with language-specific syntax highlighting:
Missing Assets¶
Unresolved file pointers are indicated with HTML comments:
This allows post-processing to identify and potentially fix broken references.
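A post-processing pass could start by listing every HTML comment in the archive. The sketch below deliberately does not assume any particular marker text, since the exact comment format is not shown here; it collects all comments with their file and line number so that the converter's markers can be filtered afterwards. The function name `find_html_comments` is hypothetical.

```python
import re
from pathlib import Path

# Matches any HTML comment, including multi-line ones.
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def find_html_comments(archive_root):
    """Return (path, line number, comment text) for every HTML comment found."""
    hits = []
    for md in sorted(Path(archive_root).rglob("*.md")):
        text = md.read_text(encoding="utf-8")
        for match in COMMENT.finditer(text):
            # Line number of the comment's opening "<!--".
            line_no = text.count("\n", 0, match.start()) + 1
            hits.append((md, line_no, match.group(1).strip()))
    return hits


if __name__ == "__main__":
    for path, line_no, comment in find_html_comments("archive"):
        print(f"{path}:{line_no}: {comment}")
```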
Index Files¶
Root Index (index.md)¶
Overview of the entire archive:
- Export statistics (conversation count, date range, file count)
- Links to conversations index
- Links to DALL-E gallery
- Links to metadata section
Conversations Index (conversations/index.md)¶
Table of all conversations sorted by date:
| Date | Title | Model |
|---|---|---|
| 2024-08-24 | Image Creation Prompt | gpt-4o |
| 2024-09-11 | Python Best Practices | gpt-4o-mini |
Each title links to the conversation directory.
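The index table can be pictured as the output of a small rendering function like the one below. This is an illustrative sketch only: the converter renders the index from a Jinja2 template, and the dictionary keys (`date`, `title`, `model`, `dir`) as well as the choice to link to the directory's index.md are assumptions.

```python
def index_table(conversations):
    """Render a date-sorted Markdown table with relative links on the titles."""
    lines = ["| Date | Title | Model |", "|---|---|---|"]
    # ISO dates sort correctly as plain strings.
    for conv in sorted(conversations, key=lambda c: c["date"]):
        link = f"[{conv['title']}]({conv['dir']}/index.md)"
        lines.append(f"| {conv['date']} | {link} | {conv['model']} |")
    return "\n".join(lines)


print(index_table([
    {"date": "2024-09-11", "title": "Python Best Practices",
     "model": "gpt-4o-mini", "dir": "2024-09-11_python-best-practices_1ebe07f1"},
    {"date": "2024-08-24", "title": "Image Creation Prompt",
     "model": "gpt-4o", "dir": "2024-08-24_image-creation-prompt_17cd7535"},
]))
```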
DALL-E Index (dalle/index.md)¶
Gallery view of all DALL-E-generated images with thumbnails.
Metadata Files¶
- user_profile.md: Account ID, subscription status (PII redacted by default)
- settings.md: Feature flags, model preferences, voice settings
- feedback.md: List of thumbs-up/thumbs-down ratings by conversation
- export_stats.md: Export date, total conversations, total messages, etc.
Relative Links¶
All links within the archive are relative for portability:
See the [conversation index](conversations/index.md) for all chats.
View [this conversation](conversations/2024-08-24_image-creation-prompt_17cd7535/index.md).

This ensures the archive can be:
- Moved to any directory
- Hosted on a static web server
- Zipped and shared
- Opened in any Markdown viewer
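One way to confirm that an archive stayed portable after being moved or unzipped is to check that every relative link still resolves. The sketch below is a hypothetical helper, not part of the converter; it uses a simple regular expression for inline Markdown links and images, skips external URLs and in-page anchors, and does not decode percent-escaped paths.

```python
import re
from pathlib import Path

# Inline links and images: [text](target) or ![alt](target)
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Anything starting with a URL scheme such as https: or mailto:
SCHEME = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(archive_root):
    """Return (markdown file, link target) pairs whose target does not exist."""
    broken = []
    for md in sorted(Path(archive_root).rglob("*.md")):
        for target in LINK.findall(md.read_text(encoding="utf-8")):
            if SCHEME.match(target) or target.startswith("#"):
                continue  # external URL or in-page anchor
            # Drop any #fragment, then resolve relative to the linking file.
            path = target.split("#", 1)[0]
            if not (md.parent / path).exists():
                broken.append((md, target))
    return broken


if __name__ == "__main__":
    for md, target in broken_relative_links("archive"):
        print(f"{md}: broken link -> {target}")
```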
Template Customization¶
The output format is controlled by Jinja2 templates in src/chatgpt_to_markdown/templates/:
| Template | Purpose |
|---|---|
| conversation.md.j2 | Individual conversation rendering |
| root_index.md.j2 | Archive root index |
| conversation_index.md.j2 | Conversations table |
| dalle_index.md.j2 | DALL-E gallery |
| metadata/user_profile.md.j2 | User profile page |
| metadata/settings.md.j2 | Settings page |
| metadata/feedback.md.j2 | Feedback page |
| metadata/export_stats.md.j2 | Export statistics page |
See Development Guide for customization instructions.
Archive Size¶
Expected archive sizes:
| Export Size | Archive Size | Notes |
|---|---|---|
| 10 MB | 8-12 MB | Minimal overhead; mostly text |
| 100 MB | 80-110 MB | Deduplication reduces size by ~20% |
| 1 GB | 700-900 MB | Significant savings from deduplication |
Deduplication impact:
- Identical images in multiple conversations are stored once
- SHA-256 hashing identifies duplicates
- Original export may have 5-10% duplicate assets
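To estimate how much deduplication could save for a particular export, one option is to hash every file and total the bytes held by repeated copies. The sketch below is a standalone estimate under that assumption, not the converter's own accounting; `duplicate_bytes` is a hypothetical name, and it reads each file fully into memory, which is fine for a sketch but may need chunked reads for very large assets.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def duplicate_bytes(root):
    """Return (total bytes, bytes occupied by redundant copies) under root."""
    sizes_by_hash = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            data = path.read_bytes()
            sizes_by_hash[hashlib.sha256(data).hexdigest()].append(len(data))
    total = sum(sum(sizes) for sizes in sizes_by_hash.values())
    # Every copy after the first one with the same hash is redundant.
    redundant = sum(sum(sizes[1:]) for sizes in sizes_by_hash.values())
    return total, redundant


if __name__ == "__main__":
    total, redundant = duplicate_bytes("export")
    if total:
        print(f"{redundant / total:.1%} of {total} bytes are duplicate copies")
```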
Next Steps¶
- See CLI reference for conversion options
- Review pipeline documentation for processing details
- Explore data models for JSON structure