# Development Guide

This guide covers development workflows, testing, code style, and extending the converter.
## Setup

### Prerequisites

- Python 3.14+
- `uv` package manager
### Installation

```bash
# Clone the repository
git clone https://github.com/yourusername/chatgpt-to-markdown.git
cd chatgpt-to-markdown

# Install dependencies and pre-commit hooks
just init
```

This runs:

- `uv sync --all-groups` (installs all dependency groups)
- `uv run pre-commit install` (registers git hooks)
## Development Commands

The project uses `just` for task running. See the `justfile` for all commands:

| Command | Description |
|---|---|
| `just init` | Install dependencies and pre-commit hooks |
| `just lint` | Run Ruff linting with auto-fixes |
| `just format` | Format Python code with Ruff |
| `just ty` | Run static type checks with ty |
| `just test` | Run pytest with verbose output |
| `just check` | Run all quality checks (lint, ty, format, pre-commit) |
| `just docs` | Serve documentation locally at http://localhost:2026 |
## Running Tests

```bash
# Run all tests
just test

# Run a specific test file
uv run pytest tests/test_models.py -vv

# Run with coverage
uv run pytest --cov=src/chatgpt_to_markdown --cov-report=term-missing

# Run a single test
uv run pytest tests/test_naming.py::TestSlugify::test_basic -vv
```
## Running the CLI

```bash
# Direct execution
uv run chatgpt-to-markdown ./export ./archive

# With options
uv run chatgpt-to-markdown ./export ./archive --include-thinking --no-redact-pii
```
## Code Style

### Ruff Configuration

The project uses Ruff for both linting and formatting (configured in `ruff.toml`):

- Line length: 120 characters
- Target version: Python 3.14
- Quote style: double quotes
- Indentation: 4 spaces

Enabled rule sets:

- `F` — Pyflakes (unused imports, undefined names)
- `E`, `W` — pycodestyle (PEP 8 compliance)
- `I` — isort (import sorting)
- `N` — pep8-naming (naming conventions)
- `UP` — pyupgrade (modern Python idioms)
- `ASYNC` — flake8-async (async best practices)
- `BLE` — flake8-blind-except (avoid bare `except`)
- `B` — flake8-bugbear (common bugs)
- `A` — flake8-builtins (avoid shadowing builtins)
- `C4` — flake8-comprehensions (comprehension improvements)
- `T10` — flake8-debugger (no debugger statements)
- `ISC` — flake8-implicit-str-concat (string concatenation)
- `ICN` — flake8-import-conventions (standard import aliases)
- `PIE` — flake8-pie (miscellaneous lints)
- `T20` — flake8-print (no print statements in library code)
- `PT` — flake8-pytest-style (pytest best practices)
- `Q` — flake8-quotes (consistent quote style)
- `RET` — flake8-return (return statement improvements)
- `SIM` — flake8-simplify (code simplification)
- `TC` — flake8-type-checking (TYPE_CHECKING imports)
- `ARG` — flake8-unused-arguments (unused arguments)
- `PTH` — flake8-use-pathlib (prefer pathlib over os.path)
- `ERA` — eradicate (remove commented-out code)
- `PL` — Pylint (extensive checks)
- `PERF` — Perflint (performance anti-patterns)
- `RUF` — Ruff-specific rules
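Put together, a `ruff.toml` consistent with the settings listed above might look like the following sketch (the `select` list is abbreviated here; the project's actual `ruff.toml` is authoritative):

```toml
# Sketch of a ruff.toml matching the settings described above.
line-length = 120
target-version = "py314"

[format]
quote-style = "double"
indent-style = "space"

[lint]
# Abbreviated; the full rule-set list appears above.
select = ["F", "E", "W", "I", "N", "UP", "B", "SIM", "PTH", "RUF"]
```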
### Type Checking

Type checking uses ty (configured in `ty.toml`):

- Python version: 3.14
- Strict mode: enabled
- Root: `./src`

Type annotation requirements:

- All public functions must have type annotations
- Use `from __future__ import annotations` for forward references
- Prefer `pathlib.Path` over `str` for file paths
- Use `TYPE_CHECKING` blocks for import cycles
Example:

```python
from __future__ import annotations

from pathlib import Path
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from chatgpt_to_markdown.models.conversation import Conversation


def process_conversation(conv: Conversation, output_dir: Path) -> Path:
    """Process a single conversation and return the output directory."""
    # Implementation...
    return output_dir / "conversation"
```
### Naming Conventions

| Construct | Convention | Example |
|---|---|---|
| Modules | snake_case | `naming.py`, `loader.py` |
| Functions | snake_case | `load_manifest()` |
| Variables | snake_case | `file_index` |
| Classes | PascalCase | `Conversation`, `User` |
| Constants | UPPER_CASE | `MAX_FILENAME_LENGTH` |
| Private members | `_`-prefixed | `_content_discriminator` |
### Docstrings

Use Google-style docstrings for public APIs:

```python
def resolve_asset_pointer(file_id: str, file_index: dict[str, str]) -> Path | None:
    """Resolve a file ID to its path in the export.

    Args:
        file_id: The extracted file ID (e.g., "file-8Vk2ls8JSO2iOVBq87yJ880Q").
        file_index: Dictionary mapping file IDs to relative paths.

    Returns:
        The resolved Path object, or None if not found.

    Examples:
        >>> resolve_asset_pointer("file-abc123", {"file-abc123": "image.png"})
        PosixPath('image.png')
    """
    return Path(file_index[file_id]) if file_id in file_index else None
```
## Testing Guidelines

### Test Structure

Tests are organized in `tests/`, mirroring the `src/` layout:

```text
tests/
├── conftest.py          # Shared fixtures
├── test_config.py       # Config tests
├── test_integration.py  # End-to-end tests
├── test_models.py       # Pydantic model tests
├── test_naming.py       # Naming utility tests
├── test_pipeline.py     # Pipeline module tests
└── test_renderer.py     # Rendering tests
```
### Writing Tests

Use pytest with descriptive test names:

```python
from chatgpt_to_markdown.naming import slugify


class TestSlugify:
    def test_basic(self):
        assert slugify("Hello World") == "hello-world"

    def test_special_characters(self):
        assert slugify("Test@#$%File") == "test-file"

    def test_unicode(self):
        assert slugify("Café München") == "cafe-munchen"

    def test_max_length(self):
        long_title = "a" * 100
        result = slugify(long_title, max_length=60)
        assert len(result) == 60
```
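The behavior these tests exercise could be implemented along the following lines. This is only an illustrative sketch, not the project's actual `naming.py` implementation:

```python
import re
import unicodedata


def slugify(title: str, max_length: int = 60) -> str:
    """Sketch of a slugify consistent with the tests above (not the real code)."""
    # Decompose accented characters and drop the combining marks: "Café" -> "Cafe"
    normalized = unicodedata.normalize("NFKD", title)
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    # Lowercase, collapse runs of non-alphanumerics into single hyphens
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return slug[:max_length]


assert slugify("Hello World") == "hello-world"
assert slugify("Test@#$%File") == "test-file"
assert slugify("Café München") == "cafe-munchen"
assert len(slugify("a" * 100, max_length=60)) == 60
```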
### Fixtures

Common fixtures in `conftest.py`:

```python
import json
from pathlib import Path

import pytest

from chatgpt_to_markdown.models.conversation import Conversation, Node


@pytest.fixture
def tmp_export_dir(tmp_path: Path) -> Path:
    """Create a temporary export directory with sample files."""
    export_dir = tmp_path / "export"
    export_dir.mkdir()

    # Create manifest
    manifest = {
        "export_files": [
            {"path": "conversations-0.json", "size_bytes": 1000}
        ]
    }
    (export_dir / "export_manifest.json").write_text(json.dumps(manifest))
    return export_dir


@pytest.fixture
def sample_conversation() -> Conversation:
    """Return a minimal valid Conversation object."""
    return Conversation(
        id="conv-123",
        conversation_id="conv-123",
        title="Test Conversation",
        create_time=1724486012.0,
        current_node="node-root",
        mapping={"node-root": Node(id="node-root", parent=None, children=[])},
    )
```
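Spelled out without pytest, here is a standalone sketch of what the `tmp_export_dir` fixture sets up, using `tempfile` in place of pytest's `tmp_path`:

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Mirror the fixture: an export/ directory containing a manifest
    export_dir = Path(tmp) / "export"
    export_dir.mkdir()

    manifest = {"export_files": [{"path": "conversations-0.json", "size_bytes": 1000}]}
    (export_dir / "export_manifest.json").write_text(json.dumps(manifest))

    # A test consuming the fixture can now read the manifest back
    loaded = json.loads((export_dir / "export_manifest.json").read_text())
    assert loaded["export_files"][0]["path"] == "conversations-0.json"
```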
### Test Coverage
Current coverage: 66 tests covering:
- All Pydantic models (11 tests)
- Naming utilities (19 tests)
- Pipeline modules (22 tests)
- Config loading (2 tests)
- Rendering (4 tests)
- Integration (2 tests)
Coverage goals:
- Maintain >80% line coverage
- Test all public APIs
- Test edge cases (empty inputs, missing files, malformed JSON)
- Test error handling paths
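As an illustration of the malformed-JSON edge case above, a loader wrapper might be tested like this. The `load_manifest_text` helper is hypothetical, named only for this sketch:

```python
import json


def load_manifest_text(text: str) -> dict:
    """Hypothetical helper: parse manifest JSON, raising ValueError on bad input."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Malformed manifest: {exc}") from exc


# Valid input parses; malformed input raises a clear, typed error
assert load_manifest_text('{"export_files": []}') == {"export_files": []}
try:
    load_manifest_text("{not valid json")
    raised = False
except ValueError:
    raised = True
assert raised
```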
## Pre-Commit Hooks

The project uses pre-commit to enforce code quality:

| Hook | Purpose |
|---|---|
| `trailing-whitespace` | Remove trailing whitespace |
| `end-of-file-fixer` | Ensure files end with a newline |
| `check-yaml` | Validate YAML syntax |
| `check-json` | Validate JSON syntax |
| `check-toml` | Validate TOML syntax |
| `pyupgrade` | Upgrade Python syntax |
| `ruff check` | Lint with Ruff |
| `ruff format` | Format with Ruff |
| `mdformat` | Format Markdown files |
| `ty` | Type check with ty |
| `pytest` | Run test suite |
| `uv-secure` | Security check for dependencies |
| `bandit` | Security check for Python code |
Hooks run automatically on `git commit`. To run them manually:

```bash
# Run on all files
uv run pre-commit run --all-files

# Run a specific hook
uv run pre-commit run ruff --all-files
```
## Adding New Content Types

To add support for a new content type:

### 1. Define the Pydantic Model

Add to `src/chatgpt_to_markdown/models/conversation.py`:

```python
class NewContentType(BaseModel):
    content_type: Literal["new_content_type"]
    # Add type-specific fields
    data: str
    metadata: dict | None = None
```
### 2. Add to the Discriminated Union

Update the `_KNOWN_CONTENT_TYPES` set:

```python
_KNOWN_CONTENT_TYPES = {
    "text",
    "multimodal_text",
    # ... existing types
    "new_content_type",  # Add here
}
```
Update the `Content` type alias:

```python
Content = Annotated[
    TextContent
    | MultimodalTextContent
    # ... existing types
    | NewContentType  # Add here
    | FallbackContent,
    Discriminator(_content_discriminator),
]
```
### 3. Update the Renderer

Add rendering logic in `src/chatgpt_to_markdown/pipeline/renderer.py`:

```python
def _render_content(self, content: Content) -> str:
    if isinstance(content, NewContentType):
        return f"**New Content:**\n\n{content.data}"
    # ... existing handlers
```
### 4. Add Tests

Create tests in `tests/test_models.py`:

```python
def test_new_content_type():
    data = {
        "content_type": "new_content_type",
        "data": "example",
        "metadata": {"key": "value"},
    }
    content = NewContentType.model_validate(data)
    assert content.content_type == "new_content_type"
    assert content.data == "example"
```
### 5. Update Documentation

Add the new type to the content type table in `data-models.md`.
## Customizing Templates

Templates live in `src/chatgpt_to_markdown/templates/`:

### Conversation Template

Edit `conversation.md.j2` to change conversation rendering:

```jinja
---
id: {{ conversation.id }}
title: {{ conversation.title }}
created: {{ conversation.created }}
---

# {{ conversation.title }}

{% for message in messages %}
**{{ message.author.role|title }}:**

{{ message.rendered_content }}

---
{% endfor %}
```
### Index Templates

- `root_index.md.j2` — archive root index
- `conversation_index.md.j2` — conversation table
- `dalle_index.md.j2` — DALL-E gallery

Templates use Jinja2 syntax, with full access to Jinja2's built-in filters and the custom filters defined in `renderer.py`.
## Project Structure

```text
.
├── .github/
│   ├── prompts/
│   │   └── plan-chatgptExportToMarkdown.prompt.md
│   └── workflows/
│       └── docs.yml            # GitHub Pages deployment
├── docs/                       # Documentation source (zensical)
├── src/
│   └── chatgpt_to_markdown/
│       ├── models/             # Pydantic models
│       ├── pipeline/           # Pipeline modules
│       ├── templates/          # Jinja2 templates
│       ├── __init__.py
│       ├── cli.py              # Cyclopts CLI
│       ├── config.py           # Settings
│       ├── converter.py        # Orchestrator
│       └── naming.py           # Utilities
├── tests/                      # Test suite
├── .pre-commit-config.yaml     # Pre-commit hooks
├── justfile                    # Task runner
├── pyproject.toml              # Project metadata
├── ruff.toml                   # Ruff configuration
├── ty.toml                     # Type checker configuration
└── zensical.toml               # Documentation configuration
```
## Contributing

- Fork the repository
- Create a feature branch: `git checkout -b feature/new-feature`
- Make changes and add tests
- Run `just check` to verify all checks pass
- Commit with a descriptive message: `git commit -m "Add support for X"`
- Push and open a pull request
## Package Publishing Reference

For packaging and release workflows with uv (`uv build`, `uv publish`, and version bumping), see:
## Next Steps
- Review data models for extending functionality
- See pipeline documentation for processing architecture
- Consult CLI reference for usage examples