📝 Module 9 Quiz
Module 09 — Synthetic Datasets: Generation & Usage
Answer all questions. You need 70% to pass.
1. Why does generate_traces.py use a log-normal distribution for latency values instead of uniform random?
Log-normal is simpler to implement than a uniform distribution
Log-normal matches real production latency patterns: values are always positive, with a long right tail that captures p99 spikes
A uniform distribution would generate negative latencies, which are invalid
Log-normal is required by the Langfuse ingestion API schema
2. You export 1,000 traces from a production Langfuse instance to share with students. Before sharing, you run redact_traces.py with --dry-run --report. It shows 47 values redacted across 3 pattern types. What should you do next?
Share the original file — 47 redactions out of 1,000 traces is negligible
Review the report breakdown, confirm the patterns caught are correct, then run without --dry-run to produce the clean output file
The dry run already produced the clean file — it's ready to share
Increase the --count flag to reduce the redaction percentage
3. seed_langfuse.py receives a 429 response on the third batch. What does it do?
Stops ingestion immediately and reports the error
Retries the same batch with exponential backoff, waiting 1s, then 2s, then 4s, for up to 3 attempts
Skips that batch and continues with the next one
Switches to a different Langfuse endpoint to bypass the rate limit