Quick Start
Installation
Install chatan from PyPI:
pip install chatan
Basic Usage
Chatan uses async/await to make API calls concurrently, which can significantly speed up dataset generation.
Create a generator
    import chatan

    gen = chatan.generator("openai", "YOUR_OPENAI_API_KEY")

    # or for Anthropic
    # gen = chatan.generator("anthropic", "YOUR_ANTHROPIC_API_KEY")
Define your dataset schema
    ds = chatan.dataset({
        "language": chatan.sample.choice(["Python", "JavaScript", "Rust"]),
        "prompt": gen("write a coding question about {language}"),
        "response": gen("answer this question: {prompt}")
    })
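The schema above chains columns through placeholders: "prompt" refers to {language}, and "response" refers to {prompt}. The sketch below illustrates one way such references could be resolved, using only the standard library. It is not Chatan's implementation. The names fake_llm, column_order, and build_row are hypothetical, and the model call is replaced by a function that echoes its input.

```python
import random
import re
from graphlib import TopologicalSorter

# Matches placeholders such as {language} inside a template string.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model call; it only echoes the prompt.
    return f"<output for: {prompt}>"


# Assumed shape for this sketch: callables are samplers, strings are templates.
schema = {
    "language": lambda: random.choice(["Python", "JavaScript", "Rust"]),
    "prompt": "write a coding question about {language}",
    "response": "answer this question: {prompt}",
}


def column_order(schema) -> list[str]:
    # Map each column to the columns its template references,
    # then sort so that every column comes after its dependencies.
    deps = {
        name: set(PLACEHOLDER.findall(spec)) if isinstance(spec, str) else set()
        for name, spec in schema.items()
    }
    return list(TopologicalSorter(deps).static_order())


def build_row(schema) -> dict[str, str]:
    row: dict[str, str] = {}
    for name in column_order(schema):
        spec = schema[name]
        if callable(spec):
            row[name] = spec()
        else:
            # Fill placeholders from columns that were already computed.
            row[name] = fake_llm(spec.format(**row))
    return row


row = build_row(schema)
print(row)
```

Because "response" depends on "prompt", which depends on "language", the columns within a single row have to be produced in that order. Separate rows do not depend on each other.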
Generate data (async)
    import asyncio

    async def main():
        # Generate 100 samples with concurrent API calls
        df = await ds.generate(n=100)

        # Save to file
        ds.save("my_dataset.parquet")
        return df

    df = asyncio.run(main())
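To see why concurrent calls help, here is a minimal sketch that compares awaiting requests one at a time with running them together through asyncio.gather. It does not use Chatan or any real API. fake_api_call is a hypothetical stand-in that sleeps for a fixed time, and the helper names are assumptions made for this illustration.

```python
import asyncio
import time


async def fake_api_call(i: int) -> str:
    # Stand-in for a network request; sleeps instead of calling a real API.
    await asyncio.sleep(0.1)
    return f"response {i}"


async def run_sequential(n: int) -> list[str]:
    # Await each call before starting the next one.
    return [await fake_api_call(i) for i in range(n)]


async def run_concurrent(n: int) -> list[str]:
    # Start all calls at once and wait for them together.
    # gather returns results in the order the awaitables were passed in.
    return list(await asyncio.gather(*(fake_api_call(i) for i in range(n))))


def timed(coro_fn, n: int) -> tuple[list[str], float]:
    # Run one of the coroutines above and measure wall-clock time.
    start = time.perf_counter()
    results = asyncio.run(coro_fn(n))
    return results, time.perf_counter() - start


seq_results, seq_time = timed(run_sequential, 5)
con_results, con_time = timed(run_concurrent, 5)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Both versions return the same results. The concurrent one finishes sooner because the waits overlap instead of adding up. Real API calls are also subject to provider rate limits, which this sketch does not model.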
Basic Evaluation
You can measure quality while you generate data or after rows are produced.
Inline evaluation
    import asyncio
    from chatan import dataset, eval, sample

    async def main():
        ds = dataset({
            "col1": sample.choice(["a", "a", "b"]),
            "col2": "b",
            "exact_match": eval.exact_match("col1", "col2")
        })

        df = await ds.generate(n=100)
        return df

    df = asyncio.run(main())
Aggregate evaluation
    # After generating data
    aggregate = ds.evaluate({
        "exact_match": ds.eval.exact_match("col1", "col2"),
    })
    print(aggregate)
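As a rough illustration of the two evaluation modes, the sketch below computes exact match in plain Python: once per row, and once as a single score over all rows. It assumes the aggregate is the fraction of matching rows. Chatan's actual computation and return format may differ. The function names and sample rows are made up for this example.

```python
# Hypothetical rows standing in for generated data.
rows = [
    {"col1": "a", "col2": "b"},
    {"col1": "b", "col2": "b"},
    {"col1": "a", "col2": "b"},
    {"col1": "b", "col2": "b"},
]


def exact_match_flags(rows, left: str, right: str) -> list[bool]:
    # Per-row view: True where the two columns hold identical values.
    return [row[left] == row[right] for row in rows]


def exact_match_rate(rows, left: str, right: str) -> float:
    # Aggregate view (assumed): fraction of rows where the columns match.
    flags = exact_match_flags(rows, left, right)
    return sum(flags) / len(flags) if flags else 0.0


flags = exact_match_flags(rows, "col1", "col2")
rate = exact_match_rate(rows, "col1", "col2")
print(flags, rate)
```

The per-row flags correspond to an evaluation column stored alongside the data, while the rate summarizes the whole dataset in one number.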
Next Steps
Check out Datasets and Generators for more complex use cases
Browse the API Reference for all available functions