
Simplifying Experiment Setup: Using the Experiment Review Agent in Optimizely
If you've ever launched an A/B test only to discover weeks later that it won't reach statistical significance, you know the frustration of wasted time and resources. Setting up experiments correctly requires checking dozens of configuration details, and even experienced teams miss critical elements that doom tests from the start.
Optimizely's new Experiment Review Agent, part of its Opal AI ecosystem launched in beta this year, acts as an automated quality check for your experiments. Think of it as having an experienced data scientist review your setup before launch, but available instantly and consistently for every test.
This guide walks through implementing the Review Agent in your experimentation workflow, including practical setup steps, common configuration issues it catches, and how to integrate its feedback into your testing process.
Prerequisites
Before using the Experiment Review Agent, you'll need:
- Optimizely Web Experimentation or Feature Experimentation account with access to the beta features (contact your account manager if you don't see the Review Agent option)
- At least one experiment draft created in your project
- Basic understanding of Optimizely's experiment structure (variations, audiences, metrics)
- Team alignment on experimentation goals (in our experience, the Review Agent works best when your team already has clear testing objectives and success metrics defined)
The Review Agent currently supports:
- A/B tests
- Multivariate tests
- Feature flag experiments
- Both client-side and server-side implementations
Note that personalization campaigns and some custom metric types aren't fully supported in the beta version.
Step-by-Step Implementation
Step 1: Create Your Experiment Draft
Start by setting up your experiment in Optimizely as you normally would:
1. Navigate to your project and click "Create New Experiment"
2. Choose your experiment type (A/B test, multivariate, or feature test)
3. Set up your variations with clear, descriptive names
4. Define your audience targeting rules
5. Configure your primary metric (this is critical - the Review Agent will flag experiments without primary metrics)
At this stage, don't worry about perfection. The Review Agent will help identify what needs adjustment.
Step 2: Access the Review Agent
Once your experiment draft is ready:
1. Navigate to your experiment's Summary page
2. Look for the "Review Experiment" button (typically in the upper right corner)
3. Alternatively, access it through: Start Experiment → Pre-launch Review
If you don't see these options, your account may not have beta access enabled yet.
Step 3: Run the Automated Analysis
Click "Review Experiment" to start the analysis. The Review Agent examines:
- Metric configuration and alignment with goals
- Audience targeting rules and potential conflicts
- Traffic allocation settings
- Variation setup and naming conventions
- Sample size requirements for statistical significance
- Overlap with other running experiments
The analysis typically takes 5-10 seconds to complete.
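Some of these checks are simple enough to approximate locally before you even open the Review Agent. The sketch below, using an assumed simplified experiment-config shape (the field names `primaryMetric` and `variations[].allocation` are illustrative, not Optimizely's actual API schema), mirrors two of the checks above: a primary metric is present and traffic allocation sums to 100%.

```javascript
// Minimal local pre-flight check against a simplified experiment config.
// Field names here are illustrative, not Optimizely's actual schema.
function preflightCheck(config) {
  const issues = [];

  // The Review Agent flags experiments without a primary metric; mirror that.
  if (!config.primaryMetric) {
    issues.push({ severity: 'critical', issue: 'Missing primary metric' });
  }

  // Variation traffic percentages should sum to exactly 100.
  const total = (config.variations || []).reduce((sum, v) => sum + v.allocation, 0);
  if (total !== 100) {
    issues.push({
      severity: 'critical',
      issue: `Traffic allocation sums to ${total}%, expected 100%`
    });
  }

  return issues;
}
```

A config with no metric and a 60/50 split, for example, would come back with two critical issues. A local check like this catches the obvious mistakes early; the Review Agent still handles the cross-experiment checks (overlap, sample size) that need account-wide context.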
Step 4: Interpret the Feedback Report
The Review Agent presents feedback in three categories:
Critical Issues (Red): Must fix before launch
- Missing primary metric
- Audience conflicts with running experiments
- Invalid traffic allocation percentages
- Privacy compliance violations
Suggested Improvements (Yellow): Best practices to consider
- Adding secondary metrics for deeper insights
- Refining audience targeting for clearer results
- Adjusting traffic allocation for faster significance
Strengths (Green): What you're doing well
- Clear variation naming
- Proper metric selection
- Good experiment documentation
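If you pull the report programmatically, a small helper can bucket the feedback into these three categories for a team-facing summary. The severity values (`critical`, `suggestion`, `strength`) are assumptions about the response shape, not a documented contract.

```javascript
// Group review feedback items by severity. The severity values are
// illustrative stand-ins for whatever the beta API actually returns.
function groupFeedback(items) {
  const report = { critical: [], suggestion: [], strength: [] };
  for (const item of items) {
    // Tolerate unexpected severities by creating a bucket on the fly.
    (report[item.severity] || (report[item.severity] = [])).push(item);
  }
  return report;
}
```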
Step 5: Apply Recommendations
For each piece of feedback, the Review Agent provides specific action items:
{
  "issue": "Audience overlap detected",
  "severity": "critical",
  "details": "This experiment targets users also included in 'Holiday Promo Test' (ID: 12345)",
  "action": "Add exclusion rule: NOT in_experiment('12345') to your audience configuration"
}

Make the recommended changes directly in your experiment setup. Focus on critical issues first, then consider the suggested improvements based on your testing goals.
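An exclusion action like the one above can be applied mechanically if you manage audience definitions as data. The condition structure below is a simplified stand-in for Optimizely's audience JSON, used only to illustrate the shape of the fix.

```javascript
// Append a "not in experiment" exclusion to a list of audience conditions.
// The ['and', ..., ['not', ...]] structure is a simplified illustration,
// not Optimizely's exact audience-condition schema.
function addExclusionRule(audienceConditions, experimentId) {
  return [
    'and',
    ...audienceConditions,
    ['not', { match: 'in_experiment', value: experimentId }]
  ];
}
```

For example, `addExclusionRule([{ match: 'device', value: 'mobile' }], '12345')` keeps the existing mobile-targeting condition and adds the exclusion for experiment 12345.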
Step 6: Re-run the Review
After making changes:
1. Click "Review Again" to verify your fixes
2. Continue iterating until all critical issues are resolved
3. Document any suggested improvements you're intentionally skipping (with reasoning)
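The review-fix-review cycle above can also be scripted. This is a sketch only: `runReview` and `applyFixes` are placeholders for your own integration code, and the loop is capped so an unfixable issue cannot spin forever.

```javascript
// Iterate review -> fix -> review until no critical issues remain.
// `runReview` and `applyFixes` are caller-supplied placeholders;
// `maxRounds` bounds the loop so it always terminates.
async function reviewUntilClean(runReview, applyFixes, maxRounds = 5) {
  for (let round = 1; round <= maxRounds; round++) {
    const results = await runReview();
    const critical = results.issues.filter(i => i.severity === 'critical');
    if (critical.length === 0) {
      return { clean: true, rounds: round };
    }
    await applyFixes(critical);
  }
  return { clean: false, rounds: maxRounds };
}
```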
Step 7: Launch with Confidence
Once the Review Agent shows no critical issues:
1. Proceed to your normal launch approval process
2. Share the Review Agent report with stakeholders
3. Launch your experiment knowing it's configured for success
Code Examples and Configuration
While the Review Agent operates through the UI, you can also trigger its checks programmatically through Optimizely's REST API. Note that the review endpoint shown below is illustrative; the beta API surface may change, so confirm the exact path and response shape in Optimizely's current API documentation:
// Example: Triggering review via API
const reviewExperiment = async (experimentId) => {
  const response = await fetch(`https://api.optimizely.com/v2/experiments/${experimentId}/review`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${YOUR_API_TOKEN}`,
      'Content-Type': 'application/json'
    }
  });
  return response.json();
};

// Process review feedback
const processReview = (reviewResults) => {
  const criticalIssues = reviewResults.issues.filter(i => i.severity === 'critical');
  if (criticalIssues.length > 0) {
    console.log('Critical issues found:');
    criticalIssues.forEach(issue => {
      console.log(`- ${issue.description}`);
      console.log(`  Action: ${issue.recommendedAction}`);
    });
    return false; // Block launch
  }
  return true; // Ready to launch
};

For teams using automated deployment pipelines, you can integrate these checks into your CI/CD process:
# Example GitHub Actions workflow
name: Experiment Review
on:
  pull_request:
    paths:
      - 'experiments/*.json'
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - name: Review Experiment Configuration
        run: |
          REVIEW_RESULT=$(curl -s -X POST \
            -H "Authorization: Bearer ${{ secrets.OPTIMIZELY_TOKEN }}" \
            https://api.optimizely.com/v2/experiments/${{ env.EXPERIMENT_ID }}/review)
          # Check for critical issues
          CRITICAL_COUNT=$(echo "$REVIEW_RESULT" | jq '.issues | map(select(.severity == "critical")) | length')
          if [ "$CRITICAL_COUNT" -gt 0 ]; then
            echo "Critical issues found in experiment setup"
            exit 1
          fi

Common Mistakes to Avoid
Our experience shows that teams often encounter these issues when first using the Review Agent:
1. Ignoring Yellow Warnings
While yellow warnings aren't blockers, they often indicate problems that will hurt your results. A warning about "audience too broad" might seem minor, but it could mean your test takes months to reach significance instead of weeks.
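The connection between audience breadth and test duration comes down to sample size. A rough per-variation estimate using the standard two-proportion z-test approximation (95% confidence, 80% power) shows why a diluted conversion rate stretches timelines:

```javascript
// Rough per-variation sample size for detecting a relative lift in a
// conversion rate, via the two-proportion z-test approximation.
// z values are hard-coded for 95% confidence (two-sided) and 80% power.
function requiredSampleSize(baselineRate, minDetectableLift) {
  const zAlpha = 1.96; // two-sided 95% confidence
  const zBeta = 0.84;  // 80% power
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + minDetectableLift);
  const pBar = (p1 + p2) / 2;
  const numerator =
    zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
    zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  return Math.ceil(Math.pow(numerator, 2) / Math.pow(p2 - p1, 2));
}
```

With a 5% baseline conversion rate and a 10% relative lift, this lands in the tens of thousands of visitors per variation. If broadening the audience halves your effective conversion rate, the required sample roughly doubles, which is exactly the kind of slow-motion failure a yellow warning is hinting at.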
2. Not Re-reviewing After Changes
The Review Agent's recommendations sometimes interact with each other. Fixing one issue might create another, so always re-run the review after making changes.
3. Skipping Documentation
When you intentionally skip a recommendation, document why. Future team members (or future you) need to understand the reasoning. Add notes directly in the experiment description:
Note: Review Agent suggested narrowing audience to mobile users only, but we need desktop data for Q4 reporting requirements.
4. Over-relying on Automation
The Review Agent catches technical issues but can't understand your business context. It might suggest removing an audience segment that seems small but represents your highest-value customers.
5. Launching Without Human Review
Even with green lights from the Review Agent, have someone familiar with your testing program do a final sanity check. The agent might miss issues specific to your implementation or business rules.
Testing and Verification Steps
After implementing the Review Agent in your workflow, verify it's working correctly:
1. Test with a Known Bad Configuration
Create an experiment with obvious issues:
- No primary metric set
- 110% traffic allocation
- Overlapping audience with another test
Run the Review Agent and confirm it catches all these issues.
2. Compare Historical Data
We've found that teams benefit from reviewing past failed experiments:
1. Take 5-10 experiments that failed to reach significance
2. Recreate them as drafts
3. Run the Review Agent on each
4. Document what issues it would have caught
This helps build confidence in the tool and identifies patterns in your testing approach.
3. Measure Impact Over Time
Track these metrics before and after implementing the Review Agent:
- Percentage of experiments reaching statistical significance
- Average time to significance
- Number of experiments killed due to setup errors
- Team time spent on pre-launch reviews
Most teams see improvements within their first 10-15 experiments using the agent.
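If you log each experiment's outcome, the first two metrics above are straightforward to compute. The record shape here (`reachedSignificance`, `daysToSignificance`) is an assumed format for your own tracking spreadsheet or database, not anything Optimizely exports directly.

```javascript
// Compute significance rate and average days-to-significance from simple
// per-experiment records like { reachedSignificance: true, daysToSignificance: 14 }.
// The record shape is an assumption about your own tracking data.
function experimentHealthMetrics(records) {
  const significant = records.filter(r => r.reachedSignificance);
  const rate = records.length ? significant.length / records.length : 0;
  const avgDays = significant.length
    ? significant.reduce((sum, r) => sum + r.daysToSignificance, 0) / significant.length
    : null;
  return { significanceRate: rate, avgDaysToSignificance: avgDays };
}
```

Run this over your pre-agent and post-agent experiment cohorts to quantify the before/after difference.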
4. Gather Team Feedback
After your team has used the Review Agent for a month:
- Survey experimenters on time saved
- Document any false positives or missed issues
- Share feedback with Optimizely to improve the beta
5. Validate Integration Points
If you've integrated the Review Agent with your deployment pipeline:
- Test with various experiment configurations
- Verify error handling for API failures
- Confirm notifications reach the right team members
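For the API-failure case in particular, a retry wrapper keeps a transient outage from failing an entire pipeline run. This is a minimal sketch that treats every thrown error as retryable; in a real pipeline you would distinguish 4xx responses (don't retry) from 5xx and network errors (retry).

```javascript
// Retry an async call with exponential backoff. Simplified: every error
// is treated as retryable, which a production pipeline should refine.
async function withRetries(fn, attempts = 3, baseDelayMs = 500) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await new Promise(res => setTimeout(res, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```

Wrapping the review call, e.g. `withRetries(() => reviewExperiment(id))`, means a single flaky response no longer blocks a launch check.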
Conclusion
The Experiment Review Agent removes much of the guesswork from experiment setup, catching configuration errors that would otherwise waste weeks of testing time. By automating the technical review process, it frees your team to focus on hypothesis development and results analysis rather than hunting for setup mistakes.
The real value becomes clear after running several experiments with the Review Agent's guidance. Teams report not just fewer failed tests, but also faster learning cycles and more confidence in their experimentation program. The agent essentially codifies best practices from thousands of successful experiments, making that knowledge available instantly for every test you run.
If you're setting up Optimizely's Experiment Review Agent and need help integrating it with your existing testing workflow or building custom validation rules for your specific requirements, we can walk you through the implementation process and share patterns that have worked well for similar companies in your industry.
