Quiz — Module 9: Module 09 — Synthetic Datasets: Generation & Usage

1. Why does generate_traces.py use a log-normal distribution for latency values instead of uniform random?

A. Log-normal is simpler to implement than a uniform distribution
B. Log-normal matches real production latency patterns: always positive, with a long right tail for p99 spikes, matching observed distributions
C. A uniform distribution would generate negative latencies, which are invalid
D. Log-normal is required by the Langfuse ingestion API schema
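
For illustration only, here is a minimal sketch of drawing latencies from a log-normal distribution with Python's standard `random.lognormvariate`. It is not the code of generate_traces.py. The helper names `sample_latency_ms` and `percentile`, the 800 ms median and the sigma of 0.6 are all assumptions chosen for the example.

```python
import math
import random


def sample_latency_ms(median_ms: float, sigma: float, rng: random.Random) -> float:
    """Draw one latency in milliseconds from a log-normal distribution.

    rng.lognormvariate(mu, sigma) returns exp(X) with X ~ Normal(mu, sigma),
    so every value is strictly positive and the median is exp(mu).
    Setting mu = ln(median_ms) centres the draw on the requested median.
    """
    return rng.lognormvariate(math.log(median_ms), sigma)


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (p in 0..100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[rank]


# Assumed parameters: 800 ms median, sigma 0.6 (larger sigma = heavier tail).
rng = random.Random(42)  # fixed seed so the synthetic dataset is reproducible
latencies = [sample_latency_ms(800.0, 0.6, rng) for _ in range(10_000)]

p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
print(f"min={min(latencies):.1f} ms  p50={p50:.1f} ms  p99={p99:.1f} ms")
```

With these parameters the theoretical p99 is about four times the median (exp(2.326 × 0.6) ≈ 4.0). A uniform draw over a fixed range has a hard upper bound and no long tail of this kind.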

2. You export 1,000 traces from a production Langfuse instance to share with students. Before sharing, you run redact_traces.py with --dry-run --report. It shows 47 values redacted across 3 pattern types. What should you do next?

A. Share the original file; 47 redactions out of 1,000 traces is negligible
B. Review the report breakdown, confirm the patterns caught are correct, then run without --dry-run to produce the clean output file
C. The dry run already produced the clean file; it's ready to share
D. Increase the --count flag to reduce the redaction percentage
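
To make the dry-run/report idea concrete, here is a minimal sketch of regex-based redaction that tallies matches per pattern and writes no output in dry-run mode. It is not the code of redact_traces.py. The three patterns (email, `sk-` style API key, IPv4), the placeholder format and the `redact`/`run` helpers are hypothetical. The real script's pattern set and interface may differ.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical patterns for illustration; redact_traces.py's real pattern set may differ.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact(text: str, counts: Counter) -> str:
    """Replace each match with a typed placeholder and tally matches per pattern."""
    for name, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[REDACTED_{name.upper()}]", text)
        counts[name] += n
    return text


def run(traces: list[str], dry_run: bool) -> tuple[Optional[list[str]], Counter]:
    """Redact every trace; in dry-run mode return only the per-pattern report."""
    counts: Counter = Counter()
    redacted = [redact(t, counts) for t in traces]
    return (None if dry_run else redacted), counts


traces = [
    "User alice@example.com asked about billing",
    "Tool called with key sk-abcdefghijklmnop1234 from 10.0.0.12",
    "No sensitive data here",
]

# Dry run: inspect the per-pattern report before anything is produced.
output, report = run(traces, dry_run=True)
print(output)        # None: a dry run produces no clean output
print(dict(report))  # per-pattern counts to review

# Real run: produces the redacted traces.
clean, _ = run(traces, dry_run=False)
print(clean)
```

Counting per pattern, rather than giving a single total, lets a reviewer spot a pattern that matches too much (false positives) or a sensitive field type that never appears in the report at all.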

3. seed_langfuse.py receives a 429 response on the third batch. What does it do?

A. Stops ingestion immediately and reports the error
B. Retries the same batch with exponential backoff (waits 1s, then 2s, then 4s, for up to 3 attempts)
C. Skips that batch and continues with the next one
D. Switches to a different Langfuse endpoint to bypass the rate limit
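
As a generic illustration of retrying on HTTP 429, here is a minimal exponential-backoff sketch. It is not the code of seed_langfuse.py. The `send_batch` callable (assumed to return an HTTP status code), the `RateLimitedError` exception and the default base delay and retry count are hypothetical. `sleep` is injectable so the example runs instantly.

```python
import time
from typing import Callable


class RateLimitedError(Exception):
    """Raised when a batch is still rate-limited (HTTP 429) after all retries."""


def send_with_backoff(
    send_batch: Callable[[list], int],
    batch: list,
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send one batch; on 429, wait base_delay * 2**n and retry the SAME batch.

    Any status other than 429 is returned to the caller unchanged.
    """
    for retry in range(max_retries + 1):
        status = send_batch(batch)
        if status != 429:
            return status
        if retry == max_retries:
            break
        sleep(base_delay * 2 ** retry)  # 1.0, 2.0, 4.0 seconds with the defaults
    raise RateLimitedError(f"still rate-limited after {max_retries} retries")


# Simulated server: two 429 responses, then success.
responses = iter([429, 429, 200])
waits: list[float] = []
status = send_with_backoff(lambda b: next(responses), batch=[{"id": 3}], sleep=waits.append)
print(status, waits)
```

Doubling the delay gives a briefly overloaded server progressively more time to recover. Production clients often also honour a `Retry-After` header and add random jitter so that many clients do not retry in lockstep.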