
Development Guide

This guide covers development workflows, testing, code style, and extending the converter.

Setup

Prerequisites

  • Python 3.14+
  • uv package manager

Installation

# Clone the repository
git clone https://github.com/yourusername/chatgpt-to-markdown.git
cd chatgpt-to-markdown

# Install dependencies and pre-commit hooks
just init

This runs:

  • uv sync --all-groups (installs all dependency groups)
  • uv run pre-commit install (registers git hooks)
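These two steps could be captured in a justfile recipe along these lines (an illustrative sketch; the repository's actual justfile is authoritative):

```just
# Install dependencies and register git hooks
init:
    uv sync --all-groups
    uv run pre-commit install
```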

Development Commands

The project uses just for task running. See justfile for all commands:

| Command | Description |
| --- | --- |
| just init | Install dependencies and pre-commit hooks |
| just lint | Run Ruff linting with auto-fixes |
| just format | Format Python code with Ruff |
| just ty | Run static type checks with ty |
| just test | Run pytest with verbose output |
| just check | Run all quality checks (lint, ty, format, pre-commit) |
| just docs | Serve documentation locally at http://localhost:2026 |

Running Tests

# Run all tests
just test

# Run specific test file
uv run pytest tests/test_models.py -vv

# Run with coverage
uv run pytest --cov=src/chatgpt_to_markdown --cov-report=term-missing

# Run single test
uv run pytest tests/test_naming.py::TestSlugify::test_basic -vv

Running the CLI

# Direct execution
uv run chatgpt-to-markdown ./export ./archive

# With options
uv run chatgpt-to-markdown ./export ./archive --include-thinking --no-redact-pii

Code Style

Ruff Configuration

The project uses Ruff for both linting and formatting (configured in ruff.toml):

  • Line length: 120 characters
  • Target version: Python 3.14
  • Quote style: Double quotes
  • Indentation: 4 spaces

Enabled rule sets:

  • F — Pyflakes (unused imports, undefined names)
  • E, W — pycodestyle (PEP 8 compliance)
  • I — isort (import sorting)
  • N — pep8-naming (naming conventions)
  • UP — pyupgrade (modern Python idioms)
  • ASYNC — flake8-async (async best practices)
  • BLE — flake8-blind-except (avoid bare except)
  • B — flake8-bugbear (common bugs)
  • A — flake8-builtins (avoid shadowing builtins)
  • C4 — flake8-comprehensions (comprehension improvements)
  • T10 — flake8-debugger (no debugger statements)
  • ISC — flake8-implicit-str-concat (string concatenation)
  • ICN — flake8-import-conventions (standard import aliases)
  • PIE — flake8-pie (miscellaneous lints)
  • T20 — flake8-print (no print statements in library code)
  • PT — flake8-pytest-style (pytest best practices)
  • Q — flake8-quotes (consistent quote style)
  • RET — flake8-return (return statement improvements)
  • SIM — flake8-simplify (code simplification)
  • TC — flake8-type-checking (TYPE_CHECKING imports)
  • ARG — flake8-unused-arguments (unused arguments)
  • PTH — flake8-use-pathlib (prefer pathlib over os.path)
  • ERA — eradicate (remove commented-out code)
  • PL — Pylint (extensive checks)
  • PERF — Perflint (performance anti-patterns)
  • RUF — Ruff-specific rules
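Put together, a ruff.toml consistent with the settings above might look roughly like this (a sketch, not the repository's actual file):

```toml
# ruff.toml (sketch; see the repository for the authoritative version)
line-length = 120
target-version = "py314"

[format]
quote-style = "double"
indent-style = "space"

[lint]
select = [
    "F", "E", "W", "I", "N", "UP", "ASYNC", "BLE", "B", "A", "C4",
    "T10", "ISC", "ICN", "PIE", "T20", "PT", "Q", "RET", "SIM",
    "TC", "ARG", "PTH", "ERA", "PL", "PERF", "RUF",
]
```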

Type Checking

Type checking uses ty (configured in ty.toml):

  • Python version: 3.14
  • Strict mode: Enabled
  • Root: ./src
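Based on these settings, ty.toml might look roughly like the following. The key names are assumptions drawn from ty's `[environment]` and `[src]` tables, the strict-mode setting is omitted, and the real file is authoritative:

```toml
# ty.toml (sketch; key names assumed, check the actual file)
[environment]
python-version = "3.14"

[src]
root = "./src"
```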

Type annotation requirements:

  • All public functions must have type annotations
  • Use from __future__ import annotations for forward references
  • Prefer pathlib.Path over str for file paths
  • Use TYPE_CHECKING blocks for import cycles

Example:

from __future__ import annotations

from pathlib import Path
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from chatgpt_to_markdown.models.conversation import Conversation


def process_conversation(conv: Conversation, output_dir: Path) -> Path:
    """Process a single conversation and return the output directory."""
    # Implementation...
    return output_dir / "conversation"

Naming Conventions

| Construct | Convention | Example |
| --- | --- | --- |
| Modules | snake_case | naming.py, loader.py |
| Functions | snake_case | load_manifest() |
| Variables | snake_case | file_index |
| Classes | PascalCase | Conversation, User |
| Constants | UPPER_CASE | MAX_FILENAME_LENGTH |
| Private members | _prefixed | _content_discriminator |

Docstrings

Use Google-style docstrings for public APIs:

def resolve_asset_pointer(file_id: str, file_index: dict[str, str]) -> Path | None:
    """Resolve a file ID to its path in the export.

    Args:
        file_id: The extracted file ID (e.g., "file-8Vk2ls8JSO2iOVBq87yJ880Q").
        file_index: Dictionary mapping file IDs to relative paths.

    Returns:
        The resolved Path object, or None if not found.

    Examples:
        >>> resolve_asset_pointer("file-abc123", {"file-abc123": "image.png"})
        Path("image.png")
    """
    return Path(file_index[file_id]) if file_id in file_index else None

Testing Guidelines

Test Structure

Tests are organized in tests/ mirroring the src/ layout:

tests/
├── conftest.py                    # Shared fixtures
├── test_config.py                 # Config tests
├── test_integration.py            # End-to-end tests
├── test_models.py                 # Pydantic model tests
├── test_naming.py                 # Naming utility tests
├── test_pipeline.py               # Pipeline module tests
└── test_renderer.py               # Rendering tests

Writing Tests

Use pytest with descriptive test names:

from chatgpt_to_markdown.naming import slugify


class TestSlugify:
    def test_basic(self):
        assert slugify("Hello World") == "hello-world"

    def test_special_characters(self):
        assert slugify("Test@#$%File") == "test-file"

    def test_unicode(self):
        assert slugify("Café München") == "cafe-munchen"

    def test_max_length(self):
        long_title = "a" * 100
        result = slugify(long_title, max_length=60)
        assert len(result) == 60
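A minimal implementation consistent with these tests might look like the following. This is an illustrative sketch only; the real slugify in naming.py may handle additional cases:

```python
import re
import unicodedata


def slugify(title: str, max_length: int = 60) -> str:
    """Lowercase, transliterate, and hyphenate a title for use in filenames."""
    # Decompose accented characters, then drop the combining marks
    # so "Café" becomes "Cafe".
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return slug[:max_length]
```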

Fixtures

Common fixtures in conftest.py:

import json
from pathlib import Path

import pytest

# Node is assumed to live alongside Conversation in the same module.
from chatgpt_to_markdown.models.conversation import Conversation, Node


@pytest.fixture
def tmp_export_dir(tmp_path: Path) -> Path:
    """Create a temporary export directory with sample files."""
    export_dir = tmp_path / "export"
    export_dir.mkdir()

    # Create manifest
    manifest = {
        "export_files": [
            {"path": "conversations-0.json", "size_bytes": 1000}
        ]
    }
    (export_dir / "export_manifest.json").write_text(json.dumps(manifest))

    return export_dir


@pytest.fixture
def sample_conversation() -> Conversation:
    """Return a minimal valid Conversation object."""
    return Conversation(
        id="conv-123",
        conversation_id="conv-123",
        title="Test Conversation",
        create_time=1724486012.0,
        current_node="node-root",
        mapping={"node-root": Node(id="node-root", parent=None, children=[])},
    )

Test Coverage

The test suite currently covers:

  • All Pydantic models (11 tests)
  • Naming utilities (19 tests)
  • Pipeline modules (22 tests)
  • Config loading (2 tests)
  • Rendering (4 tests)
  • Integration (2 tests)

Coverage goals:

  • Maintain >80% line coverage
  • Test all public APIs
  • Test edge cases (empty inputs, missing files, malformed JSON)
  • Test error handling paths

Pre-Commit Hooks

The project uses pre-commit to enforce code quality:

| Hook | Purpose |
| --- | --- |
| trailing-whitespace | Remove trailing whitespace |
| end-of-file-fixer | Ensure files end with newline |
| check-yaml | Validate YAML syntax |
| check-json | Validate JSON syntax |
| check-toml | Validate TOML syntax |
| pyupgrade | Upgrade Python syntax |
| ruff check | Lint with Ruff |
| ruff format | Format with Ruff |
| mdformat | Format Markdown files |
| ty | Type check with ty |
| pytest | Run test suite |
| uv-secure | Security check for dependencies |
| bandit | Security check for Python code |
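For orientation, the first few hooks correspond to a .pre-commit-config.yaml fragment like this (the rev is a placeholder; consult the actual config for pinned versions and the remaining hooks):

```yaml
# .pre-commit-config.yaml (fragment; rev is a placeholder)
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0  # placeholder, see the real config
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-json
      - id: check-toml
```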

Hooks run automatically on git commit. To run manually:

# Run on all files
uv run pre-commit run --all-files

# Run specific hook
uv run pre-commit run ruff --all-files

Adding New Content Types

To add support for a new content type:

1. Define the Pydantic Model

Add to src/chatgpt_to_markdown/models/conversation.py:

class NewContentType(BaseModel):
    content_type: Literal["new_content_type"]
    # Add type-specific fields
    data: str
    metadata: dict | None = None

2. Add to Discriminated Union

Update the _KNOWN_CONTENT_TYPES set:

_KNOWN_CONTENT_TYPES = {
    "text",
    "multimodal_text",
    # ... existing types
    "new_content_type",  # Add here
}

Update the Content type alias:

Content = Annotated[
    TextContent
    | MultimodalTextContent
    # ... existing types
    | NewContentType  # Add here
    | FallbackContent,
    Discriminator(_content_discriminator),
]

3. Update the Renderer

Add rendering logic in src/chatgpt_to_markdown/pipeline/renderer.py:

def _render_content(self, content: Content) -> str:
    if isinstance(content, NewContentType):
        return f"**New Content:**\n\n{content.data}"
    # ... existing handlers

4. Add Tests

Create tests in tests/test_models.py:

def test_new_content_type():
    data = {
        "content_type": "new_content_type",
        "data": "example",
        "metadata": {"key": "value"}
    }
    content = NewContentType.model_validate(data)
    assert content.content_type == "new_content_type"
    assert content.data == "example"

5. Update Documentation

Add to data-models.md content type table.

Customizing Templates

Templates live in src/chatgpt_to_markdown/templates/:

Conversation Template

Edit conversation.md.j2 to change conversation rendering:

---
id: {{ conversation.id }}
title: {{ conversation.title }}
created: {{ conversation.created }}
---

# {{ conversation.title }}

{% for message in messages %}
**{{ message.author.role|title }}:**

{{ message.rendered_content }}

---
{% endfor %}

Index Templates

  • root_index.md.j2 — Archive root
  • conversation_index.md.j2 — Conversation table
  • dalle_index.md.j2 — DALL-E gallery

Templates use Jinja2 syntax, with access to the standard Jinja2 filters plus the custom filters defined in renderer.py.

Project Structure

.
├── .github/
│   ├── prompts/
│   │   └── plan-chatgptExportToMarkdown.prompt.md
│   └── workflows/
│       └── docs.yml               # GitHub Pages deployment
├── docs/                          # Documentation source (zensical)
├── src/
│   └── chatgpt_to_markdown/
│       ├── models/                # Pydantic models
│       ├── pipeline/              # Pipeline modules
│       ├── templates/             # Jinja2 templates
│       ├── __init__.py
│       ├── cli.py                 # Cyclopts CLI
│       ├── config.py              # Settings
│       ├── converter.py           # Orchestrator
│       └── naming.py              # Utilities
├── tests/                         # Test suite
├── .pre-commit-config.yaml        # Pre-commit hooks
├── justfile                       # Task runner
├── pyproject.toml                 # Project metadata
├── ruff.toml                      # Ruff configuration
├── ty.toml                        # Type checker configuration
└── zensical.toml                  # Documentation configuration

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/new-feature
  3. Make changes and add tests
  4. Run just check to verify all checks pass
  5. Commit with descriptive message: git commit -m "Add support for X"
  6. Push and open a pull request

Package Publishing Reference

For packaging and release workflows with uv (uv build, uv publish, and version bumping), see:
