Quick Start

Installation

Install chatan from PyPI:

pip install chatan

Basic Usage

Chatan uses async/await for concurrent API calls, which speeds up dataset generation significantly.

Create a generator

import chatan

gen = chatan.generator("openai", "YOUR_OPENAI_API_KEY")
# or for Anthropic
# gen = chatan.generator("anthropic", "YOUR_ANTHROPIC_API_KEY")

Define your dataset schema

ds = chatan.dataset({
    "language": chatan.sample.choice(["Python", "JavaScript", "Rust"]),
    "prompt": gen("write a coding question about {language}"),
    "response": gen("answer this question: {prompt}")
})

Generate data (async)

import asyncio

async def main():
    # Generate 100 samples with concurrent API calls
    df = await ds.generate(n=100)

    # Save to file
    ds.save("my_dataset.parquet")
    return df

df = asyncio.run(main())

Basic Evaluation

You can measure quality while you generate data or after rows are produced.

Inline evaluation

import asyncio
from chatan import dataset, eval, sample

async def main():
    ds = dataset({
        "col1": sample.choice(["a", "a", "b"]),
        "col2": "b",
        "exact_match": eval.exact_match("col1", "col2")
    })

    df = await ds.generate(n=100)
    return df

df = asyncio.run(main())

Aggregate evaluation

# After generating data
aggregate = ds.evaluate({
    "exact_match": ds.eval.exact_match("col1", "col2"),
})
print(aggregate)

Next Steps

Check out Datasets and Generators for more complex use cases
Browse the API Reference reference for all available functions