Output Structure

The converter produces a structured Markdown archive with consistent naming conventions and organization.

Directory Layout

archive/
├── index.md                              # Root index with links to all sections
├── conversations/
│   ├── index.md                          # Table of all conversations
│   ├── 2024-08-24_image-creation-prompt_17cd7535/
│   │   ├── index.md                      # Rendered conversation
│   │   ├── media/
│   │   │   ├── 001-a1b2c3d4.jpeg         # Resolved media files
│   │   │   └── 002-e5f6a7b8.png
│   │   └── attachments/
│   │       ├── document.pdf
│   │       └── data.json
│   ├── 2024-09-11_python-best-practices_1ebe07f1/
│   │   ├── index.md
│   │   ├── media/
│   │   └── attachments/
│   └── ...
├── dalle/
│   ├── index.md                          # DALL-E gallery
│   └── a1b2c3d4.webp                     # DALL-E images by hash
└── metadata/
    ├── user_profile.md                   # Account information
    ├── settings.md                       # User settings and preferences
    ├── feedback.md                       # Message feedback (thumbs up/down)
    └── export_stats.md                   # Export statistics

Design rationale:

- Conversations: each conversation gets its own directory, with its media and attachments nested inside
- DALL-E: one centralized directory holds all generated images
- Metadata: human-readable summaries of account data

Naming Conventions

Conversation Directories

Pattern: `<YYYY-MM-DD>_<slug>_<short-id>/`

| Component | Rule | Example |
|---|---|---|
| `YYYY-MM-DD` | From `create_time` (Unix timestamp → UTC date) | `2024-08-24` |
| `slug` | Lowercase title; non-alphanumeric → hyphen; max 60 chars | `image-creation-prompt` |
| `short-id` | First 8 characters of `conversation_id` | `17cd7535` |

Slug generation:

- Uses `python-slugify` with `text-unidecode` for robust Unicode handling
- Converts to lowercase
- Transliterates accents and non-Latin scripts (e.g., café → cafe, Résumé → resume, Компьютер → kompiuter)
- Replaces non-alphanumeric characters with hyphens
- Collapses consecutive hyphens
- Trims leading/trailing hyphens
- Strips emoji
- Truncates to 60 characters

Example:

Title: "Image Creation Prompt"
Date: 2024-08-24
ID: 17cd7535-aa77-4553-8c04-ee082d0d702f
Result: 2024-08-24_image-creation-prompt_17cd7535/
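The naming rule above can be sketched in a few lines of Python. This is a minimal illustration, not the converter's own code: `conversation_dir_name` and `simple_slug` are hypothetical names, and `simple_slug` is a standard-library stand-in for python-slugify that strips accents but drops non-Latin scripts instead of transliterating them.

```python
import re
import unicodedata
from datetime import datetime, timezone


def simple_slug(title: str, max_length: int = 60) -> str:
    """Approximate the slug rules with the standard library only (assumption)."""
    # NFKD splits accented letters into base letter + combining mark;
    # encoding to ASCII with "ignore" then drops the marks, emoji, and
    # any non-Latin characters.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Runs of non-alphanumeric characters collapse into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    # Truncate, then trim again so the cut never leaves a trailing hyphen.
    return slug[:max_length].strip("-")


def conversation_dir_name(title: str, create_time: float, conversation_id: str) -> str:
    """Build <YYYY-MM-DD>_<slug>_<short-id>/ from conversation fields."""
    date = datetime.fromtimestamp(create_time, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{date}_{simple_slug(title)}_{conversation_id[:8]}/"


name = conversation_dir_name(
    "Image Creation Prompt",
    1724484812,  # 2024-08-24T07:33:32Z as a Unix timestamp
    "17cd7535-aa77-4553-8c04-ee082d0d702f",
)
print(name)
```

Converting the timestamp with an explicit UTC timezone keeps the date prefix stable regardless of the machine's local timezone.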

Media Files

Pattern: `<NNN>-<short-hash>.<ext>`

| Component | Rule |
|---|---|
| `NNN` | Zero-padded ordinal (order of appearance) |
| `short-hash` | First 8 hex chars of SHA-256 hash (deduplication key) |
| `ext` | Original extension, lowercased |

Examples:

001-a1b2c3d4.jpeg
002-e5f6a7b8.png
003-9a0b1c2d.webp
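As a rough sketch of the pattern, the filename can be derived from the file's bytes and its position in the conversation. The function name `media_filename` and its parameters are illustrative assumptions, not part of the converter's documented API.

```python
import hashlib
from pathlib import PurePath


def media_filename(ordinal: int, content: bytes, original_name: str) -> str:
    """Build <NNN>-<short-hash>.<ext> for one media file (hypothetical helper)."""
    # First 8 hex characters of the SHA-256 digest of the file content.
    short_hash = hashlib.sha256(content).hexdigest()[:8]
    # suffix keeps the leading dot, e.g. ".JPEG" -> ".jpeg".
    ext = PurePath(original_name).suffix.lower()
    return f"{ordinal:03d}-{short_hash}{ext}"


print(media_filename(1, b"", "Photo.JPEG"))
```

Because the hash depends only on content, two uploads of the same image get the same `short-hash` even if their original filenames differ.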

Deduplication:

When `--deduplicate` is enabled (the default), identical files produce the same SHA-256 hash. The converter then stores a single copy and links every conversation that references the file to that copy.
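The idea behind hash-based deduplication can be illustrated with an in-memory sketch. The `deduplicate` function and its return shape are assumptions made for this example; how the converter actually stores and links the shared copy is not specified here.

```python
import hashlib


def deduplicate(files: list[tuple[str, bytes]]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Keep one stored copy per unique content (hypothetical helper).

    Returns (store, references): store maps a SHA-256 digest to the content,
    references maps each source name to the digest of its stored copy.
    """
    store: dict[str, bytes] = {}
    references: dict[str, str] = {}
    for name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        # setdefault only stores the content the first time a digest is seen.
        store.setdefault(digest, content)
        references[name] = digest
    return store, references


store, references = deduplicate([
    ("conv-a/cat.png", b"same bytes"),
    ("conv-b/cat-copy.png", b"same bytes"),
    ("conv-b/dog.png", b"other bytes"),
])
print(len(store))  # number of unique files kept
```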

Attachment Files

Pattern: Sanitized original filename

Rules:

- Replace unsafe characters (`<>:"/\|?*`) and spaces with `-`
- Collapse consecutive hyphens
- Lowercase
- Prepend the conversation `short-id` on collision

Examples:

Original: My Document.pdf → my-document.pdf
Original: file<name>.json → file-name-.json
Collision: data.json → 17cd7535-data.json
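These rules can be sketched as a small helper. The name `sanitize_attachment_name`, the `taken` set used to detect collisions, and the treatment of all whitespace as unsafe (inferred from the `My Document.pdf` example) are assumptions for illustration.

```python
import re

# Unsafe characters from the rule list, plus whitespace (assumption based
# on the "My Document.pdf" example).
UNSAFE = re.compile(r'[<>:"/\\|?*\s]')


def sanitize_attachment_name(name: str, short_id: str, taken: set[str]) -> str:
    """Sanitize an attachment filename and resolve collisions (hypothetical helper)."""
    cleaned = UNSAFE.sub("-", name)
    cleaned = re.sub(r"-{2,}", "-", cleaned).lower()
    if cleaned in taken:
        # On collision, prefix the conversation's short ID.
        cleaned = f"{short_id}-{cleaned}"
    taken.add(cleaned)
    return cleaned


taken: set[str] = {"data.json"}
print(sanitize_attachment_name("My Document.pdf", "17cd7535", taken))
print(sanitize_attachment_name("file<name>.json", "17cd7535", taken))
print(sanitize_attachment_name("data.json", "17cd7535", taken))
```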

Platform Safety

All filenames adhere to cross-platform safety rules:

- Maximum length: 200 characters
- No leading/trailing dots or spaces
- Avoid Windows reserved names: `CON`, `PRN`, `AUX`, `NUL`, `COM1`–`COM9`, `LPT1`–`LPT9`
- UTF-8 encoding throughout
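A check for the first three rules might look like the sketch below. `is_platform_safe` is a hypothetical name, and treating a reserved name followed by an extension (such as `con.txt`) as unsafe is an assumption based on how Windows handles those names, not something this page states.

```python
# Windows reserved device names listed in the rules above.
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_platform_safe(name: str, max_length: int = 200) -> bool:
    """Return True if a filename passes the cross-platform rules (hypothetical helper)."""
    if not name or len(name) > max_length:
        return False
    # No leading or trailing dots or spaces.
    if name[0] in ". " or name[-1] in ". ":
        return False
    # Compare the part before the first dot, case-insensitively.
    stem = name.split(".", 1)[0]
    return stem.upper() not in RESERVED


for candidate in ["notes.md", "con.txt", ".hidden", "report."]:
    print(candidate, is_platform_safe(candidate))
```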

Conversation Markdown Format

Front Matter

Every conversation starts with YAML front matter:

---
id: 17cd7535-aa77-4553-8c04-ee082d0d702f
title: Image Creation Prompt
created: 2024-08-24T07:33:32Z
updated: 2024-08-24T07:33:40Z
model: gpt-4o
---
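For tools that consume the archive, the front matter can be read without a YAML library as long as it stays this simple. The sketch below assumes flat `key: value` lines with no quoting or nesting; `read_front_matter` is a hypothetical helper, and a real YAML parser would be the safer choice for titles containing special characters.

```python
def read_front_matter(text: str) -> dict[str, str]:
    """Parse flat `key: value` front matter between two `---` lines (simplified)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


sample = """---
id: 17cd7535-aa77-4553-8c04-ee082d0d702f
title: Image Creation Prompt
created: 2024-08-24T07:33:32Z
updated: 2024-08-24T07:33:40Z
model: gpt-4o
---

**User:**
"""
meta = read_front_matter(sample)
print(meta["title"], meta["created"])
```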

Message Structure

Each message is rendered under a bold role label, and consecutive messages are separated by a horizontal rule (`---`):

**User:**

![Uploaded image](media/001-a1b2c3d4.jpeg)

Give the prompt to generate this image.

---

**Assistant** *(gpt-4o)*:

To recreate this image using a text prompt, here's a detailed description you can use...

Thinking Blocks

When `--include-thinking` is passed, the model's reasoning is rendered as a separate block before the answer:

**Assistant** *(thinking)*:

Let me analyze this step by step:
1. The image shows...
2. Key elements include...
3. Color palette suggests...

---

**Assistant** *(gpt-4o)*:

To recreate this image using a text prompt, here's a detailed description you can use...

Tool Output

Tool invocations and outputs are rendered as collapsible blocks:

**Tool** (`web.search`):

Searched: "ChatGPT export format"

> **OpenAI Help Center**
> https://help.openai.com/...
>
> You can export your ChatGPT data from Settings → Data controls...

Code Blocks

Code content is rendered in fenced code blocks tagged with the language, so Markdown viewers can apply syntax highlighting:

**Tool** (`python`):

```python
import json

with open("data.json") as f:
    data = json.load(f)
```

Missing Assets

Unresolved file pointers are indicated with HTML comments:

<!-- MISSING ASSET: file-service://file-8Vk2ls8JSO2iOVBq87yJ880Q -->

Because the comment preserves the original pointer, a post-processing step can find broken references and potentially repair them.
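One way such a post-processing step could locate the markers is a regular expression over the rendered Markdown. The helper name `find_missing_assets` is an assumption, and the pattern assumes the pointer itself contains no whitespace, as in the example above.

```python
import re

# Matches <!-- MISSING ASSET: <pointer> --> and captures the pointer.
MISSING_ASSET = re.compile(r"<!--\s*MISSING ASSET:\s*(\S+)\s*-->")


def find_missing_assets(markdown: str) -> list[str]:
    """Return every unresolved file pointer recorded in a rendered conversation."""
    return MISSING_ASSET.findall(markdown)


page = """**User:**

<!-- MISSING ASSET: file-service://file-8Vk2ls8JSO2iOVBq87yJ880Q -->

Give the prompt to generate this image.
"""
print(find_missing_assets(page))
```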

Index Files

Root Index (`index.md`)

Overview of the entire archive:

- Export statistics (conversation count, date range, file count)
- Links to conversations index
- Links to DALL-E gallery
- Links to metadata section

Conversations Index (`conversations/index.md`)

Table of all conversations sorted by date:

| Date | Title | Model |
|---|---|---|
| 2024-08-24 | Image Creation Prompt | gpt-4o |
| 2024-09-11 | Python Best Practices | gpt-4o-mini |

Each title links to the conversation directory.

DALL-E Index (`dalle/index.md`)

Gallery view of all DALL-E-generated images with thumbnails.

Metadata Files

- `user_profile.md`: Account ID, subscription status (PII redacted by default)
- `settings.md`: Feature flags, model preferences, voice settings
- `feedback.md`: List of thumbs-up/thumbs-down ratings by conversation
- `export_stats.md`: Export date, total conversations, total messages, etc.

Relative Links

All links within the archive are relative for portability:

See the [conversation index](conversations/index.md) for all chats.

View [this conversation](conversations/2024-08-24_image-creation-prompt_17cd7535/index.md).

![Image](media/001-a1b2c3d4.jpeg)

This ensures the archive can be:

- Moved to any directory
- Hosted on a static web server
- Zipped and shared
- Opened in any Markdown viewer
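To illustrate how a relative link between two files in the archive can be computed, the sketch below uses `posixpath.relpath` from the standard library. The helper name `relative_link` is hypothetical, and both paths are assumed to be given relative to the archive root with forward slashes.

```python
import posixpath


def relative_link(from_file: str, to_file: str) -> str:
    """Compute the link target to write inside from_file so it points at to_file."""
    # Links resolve relative to the directory containing the linking file.
    start = posixpath.dirname(from_file) or "."
    return posixpath.relpath(to_file, start=start)


conv = "conversations/2024-08-24_image-creation-prompt_17cd7535/index.md"
print(relative_link("index.md", "conversations/index.md"))
print(relative_link("conversations/index.md", conv))
print(relative_link(conv, "dalle/a1b2c3d4.webp"))
```

Because no link ever contains the archive's absolute location, moving or zipping the whole directory leaves every link intact.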

Template Customization

The output format is controlled by Jinja2 templates in `src/chatgpt_to_markdown/templates/`:

| Template | Purpose |
|---|---|
| `conversation.md.j2` | Individual conversation rendering |
| `root_index.md.j2` | Archive root index |
| `conversation_index.md.j2` | Conversations table |
| `dalle_index.md.j2` | DALL-E gallery |
| `metadata/user_profile.md.j2` | User profile page |
| `metadata/settings.md.j2` | Settings page |
| `metadata/feedback.md.j2` | Feedback page |
| `metadata/export_stats.md.j2` | Export statistics page |

See the Development Guide for customization instructions.

Archive Size

Expected archive sizes:

| Export Size | Archive Size | Notes |
|---|---|---|
| 10 MB | 8-12 MB | Minimal overhead; mostly text |
| 100 MB | 80-110 MB | Deduplication reduces size by ~20% |
| 1 GB | 700-900 MB | Significant savings from deduplication |

Deduplication impact:

- Identical images in multiple conversations are stored once
- SHA-256 hashing identifies duplicates
- The original export may contain 5-10% duplicate assets

Next Steps

- See the CLI reference for conversion options
- Review the pipeline documentation for processing details
- Explore the data models for the JSON structure