# Development Guide

This guide covers development workflows, testing, code style, and extending the converter.
## Setup

### Prerequisites

- Python 3.14+
- `uv` package manager
### Installation

```bash
# Clone the repository
git clone https://github.com/yourusername/chatgpt-to-markdown.git
cd chatgpt-to-markdown

# Install dependencies and pre-commit hooks
just init
```

This runs:

- `uv sync --all-groups` (installs all dependency groups)
- `uv run pre-commit install` (registers git hooks)
## Development Commands

The project uses `just` for task running. See the `justfile` for all commands:

| Command | Description |
|---|---|
| `just init` | Install dependencies and pre-commit hooks |
| `just lint` | Run Ruff linting with auto-fixes |
| `just format` | Format Python code with Ruff |
| `just ty` | Run static type checks with ty |
| `just test` | Run pytest with verbose output |
| `just check` | Run all quality checks (lint, ty, format, pre-commit) |
| `just docs` | Serve documentation locally at http://localhost:2026 |
## Running Tests

```bash
# Run all tests
just test

# Run a specific test file
uv run pytest tests/test_models.py -vv

# Run with coverage
uv run pytest --cov=src/chatgpt_to_markdown --cov-report=term-missing

# Run a single test
uv run pytest tests/test_naming.py::TestSlugify::test_basic -vv
```
## Running the CLI

```bash
# Direct execution
uv run chatgpt-to-markdown ./export ./archive

# With options
uv run chatgpt-to-markdown ./export ./archive --include-thinking --no-redact-pii
```
## Code Style

### Ruff Configuration

The project uses Ruff for both linting and formatting (configured in `ruff.toml`):

- Line length: 120 characters
- Target version: Python 3.14
- Quote style: double quotes
- Indentation: 4 spaces

Enabled rule sets:

- `F` — Pyflakes (unused imports, undefined names)
- `E`, `W` — pycodestyle (PEP 8 compliance)
- `I` — isort (import sorting)
- `N` — pep8-naming (naming conventions)
- `UP` — pyupgrade (modern Python idioms)
- `ASYNC` — flake8-async (async best practices)
- `BLE` — flake8-blind-except (avoid bare `except`)
- `B` — flake8-bugbear (common bugs)
- `A` — flake8-builtins (avoid shadowing builtins)
- `C4` — flake8-comprehensions (comprehension improvements)
- `T10` — flake8-debugger (no debugger statements)
- `ISC` — flake8-implicit-str-concat (string concatenation)
- `ICN` — flake8-import-conventions (standard import aliases)
- `PIE` — flake8-pie (miscellaneous lints)
- `T20` — flake8-print (no print statements in library code)
- `PT` — flake8-pytest-style (pytest best practices)
- `Q` — flake8-quotes (consistent quote style)
- `RET` — flake8-return (return statement improvements)
- `SIM` — flake8-simplify (code simplification)
- `TC` — flake8-type-checking (TYPE_CHECKING imports)
- `ARG` — flake8-unused-arguments (unused arguments)
- `PTH` — flake8-use-pathlib (prefer pathlib over os.path)
- `ERA` — eradicate (remove commented-out code)
- `PL` — Pylint (extensive checks)
- `PERF` — Perflint (performance anti-patterns)
- `RUF` — Ruff-specific rules
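Put together, a `ruff.toml` consistent with the settings listed above might look like the following sketch (the `select` list is abbreviated here; the project's actual `ruff.toml` is authoritative):

```toml
# Sketch of a ruff.toml matching the settings described above.
line-length = 120
target-version = "py314"

[format]
quote-style = "double"
indent-style = "space"

[lint]
# Abbreviated; the full rule-set list appears above.
select = ["F", "E", "W", "I", "N", "UP", "B", "SIM", "PTH", "RUF"]
```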
### Type Checking

Type checking uses ty (configured in `ty.toml`):

- Python version: 3.14
- Strict mode: enabled
- Root: `./src`

Type annotation requirements:

- All public functions must have type annotations
- Use `from __future__ import annotations` for forward references
- Prefer `pathlib.Path` over `str` for file paths
- Use `TYPE_CHECKING` blocks for import cycles
Example:

```python
from __future__ import annotations

from pathlib import Path
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from chatgpt_to_markdown.models.conversation import Conversation


def process_conversation(conv: Conversation, output_dir: Path) -> Path:
    """Process a single conversation and return the output directory."""
    # Implementation...
    return output_dir / "conversation"
```
### Naming Conventions

| Construct | Convention | Example |
|---|---|---|
| Modules | snake_case | `naming.py`, `loader.py` |
| Functions | snake_case | `load_manifest()` |
| Variables | snake_case | `file_index` |
| Classes | PascalCase | `Conversation`, `User` |
| Constants | UPPER_CASE | `MAX_FILENAME_LENGTH` |
| Private members | `_`-prefixed | `_content_discriminator` |
### Docstrings

Use Google-style docstrings for public APIs:

```python
def resolve_asset_pointer(file_id: str, file_index: dict[str, str]) -> Path | None:
    """Resolve a file ID to its path in the export.

    Args:
        file_id: The extracted file ID (e.g., "file-8Vk2ls8JSO2iOVBq87yJ880Q").
        file_index: Dictionary mapping file IDs to relative paths.

    Returns:
        The resolved Path object, or None if not found.

    Examples:
        >>> resolve_asset_pointer("file-abc123", {"file-abc123": "image.png"})
        PosixPath('image.png')
    """
    return Path(file_index[file_id]) if file_id in file_index else None
```
## Testing Guidelines

### Test Structure

Tests are organized in `tests/`, mirroring the `src/` layout:

```text
tests/
├── conftest.py          # Shared fixtures
├── test_config.py       # Config tests
├── test_integration.py  # End-to-end tests
├── test_models.py       # Pydantic model tests
├── test_naming.py       # Naming utility tests
├── test_pipeline.py     # Pipeline module tests
└── test_renderer.py     # Rendering tests
```
### Writing Tests

Use pytest with descriptive test names:

```python
from chatgpt_to_markdown.naming import slugify


class TestSlugify:
    def test_basic(self):
        assert slugify("Hello World") == "hello-world"

    def test_special_characters(self):
        assert slugify("Test@#$%File") == "test-file"

    def test_unicode(self):
        assert slugify("Café München") == "cafe-munchen"

    def test_max_length(self):
        long_title = "a" * 100
        result = slugify(long_title, max_length=60)
        assert len(result) == 60
```
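The behavior these tests exercise could be implemented along the following lines. This is only an illustrative sketch, not the project's actual `naming.py` implementation:

```python
import re
import unicodedata


def slugify(title: str, max_length: int = 60) -> str:
    """Sketch of a slugify consistent with the tests above (not the real code)."""
    # Decompose accented characters and drop the combining marks: "Café" -> "Cafe"
    normalized = unicodedata.normalize("NFKD", title)
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    # Lowercase, collapse runs of non-alphanumerics into single hyphens
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return slug[:max_length]


assert slugify("Hello World") == "hello-world"
assert slugify("Test@#$%File") == "test-file"
assert slugify("Café München") == "cafe-munchen"
assert len(slugify("a" * 100, max_length=60)) == 60
```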
### Fixtures

Common fixtures in `conftest.py`:

```python
import json
from pathlib import Path

import pytest

from chatgpt_to_markdown.models.conversation import Conversation, Node


@pytest.fixture
def tmp_export_dir(tmp_path: Path) -> Path:
    """Create a temporary export directory with sample files."""
    export_dir = tmp_path / "export"
    export_dir.mkdir()

    # Create manifest
    manifest = {
        "export_files": [
            {"path": "conversations-0.json", "size_bytes": 1000}
        ]
    }
    (export_dir / "export_manifest.json").write_text(json.dumps(manifest))
    return export_dir


@pytest.fixture
def sample_conversation() -> Conversation:
    """Return a minimal valid Conversation object."""
    return Conversation(
        id="conv-123",
        conversation_id="conv-123",
        title="Test Conversation",
        create_time=1724486012.0,
        current_node="node-root",
        mapping={"node-root": Node(id="node-root", parent=None, children=[])},
    )
```
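Spelled out without pytest, here is a standalone sketch of what the `tmp_export_dir` fixture sets up, using `tempfile` in place of pytest's `tmp_path`:

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Mirror the fixture: an export/ directory containing a manifest
    export_dir = Path(tmp) / "export"
    export_dir.mkdir()

    manifest = {"export_files": [{"path": "conversations-0.json", "size_bytes": 1000}]}
    (export_dir / "export_manifest.json").write_text(json.dumps(manifest))

    # A test consuming the fixture can now read the manifest back
    loaded = json.loads((export_dir / "export_manifest.json").read_text())
    assert loaded["export_files"][0]["path"] == "conversations-0.json"
```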
### Test Coverage
Current coverage: 66 tests covering:
- All Pydantic models (11 tests)
- Naming utilities (19 tests)
- Pipeline modules (22 tests)
- Config loading (2 tests)
- Rendering (4 tests)
- Integration (2 tests)
Coverage goals:
- Maintain >80% line coverage
- Test all public APIs
- Test edge cases (empty inputs, missing files, malformed JSON)
- Test error handling paths
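As an illustration of the malformed-JSON edge case above, a loader wrapper might be tested like this. The `load_manifest_text` helper is hypothetical, named only for this sketch:

```python
import json


def load_manifest_text(text: str) -> dict:
    """Hypothetical helper: parse manifest JSON, raising ValueError on bad input."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Malformed manifest: {exc}") from exc


# Valid input parses; malformed input raises a clear, typed error
assert load_manifest_text('{"export_files": []}') == {"export_files": []}
try:
    load_manifest_text("{not valid json")
    raised = False
except ValueError:
    raised = True
assert raised
```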
## Pre-Commit Hooks

The project uses pre-commit to enforce code quality:

| Hook | Purpose |
|---|---|
| `trailing-whitespace` | Remove trailing whitespace |
| `end-of-file-fixer` | Ensure files end with a newline |
| `check-yaml` | Validate YAML syntax |
| `check-json` | Validate JSON syntax |
| `check-toml` | Validate TOML syntax |
| `pyupgrade` | Upgrade Python syntax |
| `ruff check` | Lint with Ruff |
| `ruff format` | Format with Ruff |
| `mdformat` | Format Markdown files |
| `ty` | Type check with ty |
| `pytest` | Run test suite |
| `uv-secure` | Security check for dependencies |
| `bandit` | Security check for Python code |
Hooks run automatically on `git commit`. To run them manually:

```bash
# Run on all files
uv run pre-commit run --all-files

# Run a specific hook
uv run pre-commit run ruff --all-files
```
## Adding New Content Types

To add support for a new content type:

### 1. Define the Pydantic Model

Add to `src/chatgpt_to_markdown/models/conversation.py`:

```python
class NewContentType(BaseModel):
    content_type: Literal["new_content_type"]
    # Add type-specific fields
    data: str
    metadata: dict | None = None
```
### 2. Add to the Discriminated Union

Update the `_KNOWN_CONTENT_TYPES` set:

```python
_KNOWN_CONTENT_TYPES = {
    "text",
    "multimodal_text",
    # ... existing types
    "new_content_type",  # Add here
}
```
Update the `Content` type alias:

```python
Content = Annotated[
    TextContent
    | MultimodalTextContent
    # ... existing types
    | NewContentType  # Add here
    | FallbackContent,
    Discriminator(_content_discriminator),
]
```
### 3. Update the Renderer

Add rendering logic in `src/chatgpt_to_markdown/pipeline/renderer.py`:

```python
def _render_content(self, content: Content) -> str:
    if isinstance(content, NewContentType):
        return f"**New Content:**\n\n{content.data}"
    # ... existing handlers
```
### 4. Add Tests

Create tests in `tests/test_models.py`:

```python
def test_new_content_type():
    data = {
        "content_type": "new_content_type",
        "data": "example",
        "metadata": {"key": "value"},
    }
    content = NewContentType.model_validate(data)
    assert content.content_type == "new_content_type"
    assert content.data == "example"
```
### 5. Update Documentation

Add the new type to the content type table in `data-models.md`.
## Customizing Templates

Templates live in `src/chatgpt_to_markdown/templates/`:

### Conversation Template

Edit `conversation.md.j2` to change conversation rendering:

```jinja
---
id: {{ conversation.id }}
title: {{ conversation.title }}
created: {{ conversation.created }}
---

# {{ conversation.title }}

{% for message in messages %}
**{{ message.author.role|title }}:**

{{ message.rendered_content }}

---
{% endfor %}
```
### Index Templates

- `root_index.md.j2` — archive root index
- `conversation_index.md.j2` — conversation table
- `dalle_index.md.j2` — DALL-E gallery

Templates use Jinja2 syntax, with full access to Jinja2's built-in filters and the custom filters defined in `renderer.py`.
## Project Structure

```text
.
├── .github/
│   ├── prompts/
│   │   └── plan-chatgptExportToMarkdown.prompt.md
│   └── workflows/
│       └── docs.yml            # GitHub Pages deployment
├── docs/                       # Documentation source (zensical)
├── src/
│   └── chatgpt_to_markdown/
│       ├── models/             # Pydantic models
│       ├── pipeline/           # Pipeline modules
│       ├── templates/          # Jinja2 templates
│       ├── __init__.py
│       ├── cli.py              # Cyclopts CLI
│       ├── config.py           # Settings
│       ├── converter.py        # Orchestrator
│       └── naming.py           # Utilities
├── tests/                      # Test suite
├── .pre-commit-config.yaml     # Pre-commit hooks
├── justfile                    # Task runner
├── pyproject.toml              # Project metadata
├── ruff.toml                   # Ruff configuration
├── ty.toml                     # Type checker configuration
└── zensical.toml               # Documentation configuration
```
## Contributing

- Fork the repository
- Create a feature branch: `git checkout -b feature/new-feature`
- Make changes and add tests
- Run `just check` to verify all checks pass
- Commit with a descriptive message: `git commit -m "Add support for X"`
- Push and open a pull request
## Package Publishing Reference

For packaging and release workflows with uv (`uv build`, `uv publish`, and version bumping), see:
## Next Steps
- Review data models for extending functionality
- See pipeline documentation for processing architecture
- Consult CLI reference for usage examples