Quick Start

Installation

Install chatan from PyPI:

    pip install chatan

Basic Usage

Chatan uses async/await to issue API calls concurrently, so generating a dataset does not have to wait on one request at a time.

  1. Create a generator

    import chatan
    
    gen = chatan.generator("openai", "YOUR_OPENAI_API_KEY")
    # or for Anthropic
    # gen = chatan.generator("anthropic", "YOUR_ANTHROPIC_API_KEY")
    
  2. Define your dataset schema

    ds = chatan.dataset({
        "language": chatan.sample.choice(["Python", "JavaScript", "Rust"]),
        "prompt": gen("write a coding question about {language}"),
        "response": gen("answer this question: {prompt}")
    })
    
  3. Generate data (async)

    import asyncio
    
    async def main():
        # Generate 100 samples with concurrent API calls
        df = await ds.generate(n=100)
    
        # Save to file
        ds.save("my_dataset.parquet")
        return df
    
    df = asyncio.run(main())
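
The three steps above can be pictured with a standard-library sketch. It does not call chatan or any real API: `fake_generate`, `build_row`, and `build_dataset` are hypothetical names, and the sleep stands in for network latency. It assumes each row resolves its columns in dependency order (`language`, then `prompt`, then `response`) and that rows are independent, so they can run concurrently. Chatan's internals may work differently.

```python
import asyncio
import random
import time

LANGUAGES = ["Python", "JavaScript", "Rust"]


async def fake_generate(prompt: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for a model call: wait, then echo the prompt."""
    await asyncio.sleep(delay)  # simulated network latency, not a real request
    return f"<output for: {prompt}>"


async def build_row() -> dict:
    """Fill one row, resolving each column after the columns it references."""
    row = {"language": random.choice(LANGUAGES)}
    row["prompt"] = await fake_generate(
        "write a coding question about {language}".format(**row)
    )
    row["response"] = await fake_generate(
        "answer this question: {prompt}".format(**row)
    )
    return row


async def build_dataset(n: int) -> list:
    """Build n rows concurrently; rows do not depend on each other."""
    return await asyncio.gather(*(build_row() for _ in range(n)))


if __name__ == "__main__":
    start = time.perf_counter()
    rows = asyncio.run(build_dataset(10))
    elapsed = time.perf_counter() - start
    print(f"{len(rows)} rows in {elapsed:.2f}s")
```

With these simulated delays, ten rows awaited one after another would take about 1 second (10 rows x 2 calls x 0.05 s). Run concurrently, the total is closer to the time for a single row, because the waits overlap.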
    

Basic Evaluation

You can measure quality while you generate data or after rows are produced.

Inline evaluation

    import asyncio
    from chatan import dataset, eval, sample

    async def main():
        ds = dataset({
            "col1": sample.choice(["a", "a", "b"]),
            "col2": "b",
            "exact_match": eval.exact_match("col1", "col2")
        })

        df = await ds.generate(n=100)
        return df

    df = asyncio.run(main())

Aggregate evaluation

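    # After generating data. `ds` must be the generated dataset object, so keep
    # a reference to it in scope (in the inline example above it is local to main()).
    aggregate = ds.evaluate({
        "exact_match": ds.eval.exact_match("col1", "col2"),
    })
    print(aggregate)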
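
To make the two evaluation modes concrete, here is a plain-Python sketch of the idea. It does not use chatan: `exact_match`, `add_exact_match`, and `aggregate_mean` are hypothetical helpers, and treating the aggregate as the mean of the per-row results is an assumption, not a statement about what `ds.evaluate` returns.

```python
import random


def exact_match(left, right) -> bool:
    """Row-level metric: True when the two values are identical."""
    return left == right


def add_exact_match(rows, left_col, right_col, out_col="exact_match"):
    """Inline style: store the metric as an extra column on every row."""
    for row in rows:
        row[out_col] = exact_match(row[left_col], row[right_col])
    return rows


def aggregate_mean(rows, col="exact_match"):
    """Aggregate style (assumed): fraction of rows where the metric is True."""
    return sum(row[col] for row in rows) / len(rows)


if __name__ == "__main__":
    # Same shape as the inline example: col1 is sampled, col2 is constant.
    rows = [
        {"col1": random.choice(["a", "a", "b"]), "col2": "b"}
        for _ in range(100)
    ]
    add_exact_match(rows, "col1", "col2")
    print(aggregate_mean(rows))
```

Because `"a"` appears twice in the choice list, a uniform sampler picks `"b"` about one time in three, so the match rate should land near one third.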

Next Steps