Features
- Automated Website Crawling: Systematic analysis of HTML, CSS, JavaScript, and PHP files
- Multi-dimensional Quality Checks: SEO, accessibility, security, performance, and UI evaluation
- Performance Analysis: Integration with Google PageSpeed Insights for mobile and desktop metrics
- Detailed Reporting: Generates structured reports with actionable recommendations
- Real-time Processing: Live streaming of analysis results with progress tracking
How It Works
The QA Agent processes websites through a comprehensive analysis pipeline.
Architecture Overview
Processing Pipeline
- Content Extraction: Scrapes and parses all website resources
- Code Analysis: LLM-powered evaluation against quality checklists
- Performance Testing: Google PageSpeed API integration for speed metrics
- Report Generation: Combines findings into structured Excel reports
- Storage & Delivery: Uploads reports to S3 and provides download links
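The Performance Testing step can be illustrated with a minimal sketch of a request to the public Google PageSpeed Insights v5 API. The helper names `build_pagespeed_url` and `performance_score` are hypothetical and are not the project's own `pagespeed_api_call()`; the sketch only builds the request URL and reads the Lighthouse performance score from an already-decoded JSON response.

```python
from urllib.parse import urlencode

# Public PageSpeed Insights v5 endpoint.
PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_pagespeed_url(page_url, strategy="mobile", api_key=None):
    """Build a PageSpeed Insights request URL for one page and strategy."""
    if strategy not in ("mobile", "desktop"):
        raise ValueError("strategy must be 'mobile' or 'desktop'")
    params = {"url": page_url, "strategy": strategy, "category": "performance"}
    if api_key:
        params["key"] = api_key
    return f"{PSI_ENDPOINT}?{urlencode(params)}"


def performance_score(psi_response):
    """Return the Lighthouse performance score on a 0-100 scale, or None."""
    try:
        score = psi_response["lighthouseResult"]["categories"]["performance"]["score"]
    except (KeyError, TypeError):
        return None
    # The API reports the score as a fraction between 0 and 1.
    return None if score is None else round(score * 100)
```

Calling `build_pagespeed_url` once with `"mobile"` and once with `"desktop"` yields the two requests needed for the mobile and desktop metrics mentioned above.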
Core Components
Component | Purpose |
---|---|
Web Scraper | Extracts HTML, CSS, JS, and PHP files |
Content Filter | Cleans and optimizes code for analysis |
LLM Chain | Processes content against quality checklists |
PageSpeed API | Retrieves performance metrics |
Report Generator | Creates structured Excel reports |
Queue Manager | Handles real-time result streaming |
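One way the Queue Manager's real-time streaming could work is a producer thread that pushes progress events onto a queue while a consumer yields them as they arrive. This is a sketch under assumptions, not the project's implementation: `run_analysis`, `stream_results`, and the event fields (`step`, `progress`, `result`) are illustrative names.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_analysis(steps, out_queue):
    """Producer: run each (name, callable) step and enqueue a progress event."""
    total = len(steps)
    try:
        for index, (name, func) in enumerate(steps, start=1):
            result = func()
            out_queue.put({"step": name, "progress": index / total, "result": result})
    finally:
        # Always signal completion so the consumer never blocks forever.
        out_queue.put(_DONE)


def stream_results(steps):
    """Consumer: yield events as soon as the worker thread produces them."""
    out_queue = queue.Queue()
    worker = threading.Thread(target=run_analysis, args=(steps, out_queue), daemon=True)
    worker.start()
    while True:
        event = out_queue.get()
        if event is _DONE:
            break
        yield event
    worker.join()
```

A caller can iterate over `stream_results(...)` and forward each event to the client, which keeps the analysis work and the delivery of results decoupled.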
Implementation
Analysis Workflow
Key Functions
Function | Purpose |
---|---|
scrape_url_content() | Extracts all website files and resources |
pagespeed_api_call() | Retrieves Google PageSpeed metrics |
batch_checklist_items() | Organizes analysis tasks into processable chunks |
stream_to_queue() | Provides real-time progress updates |
create_excel_report() | Generates formatted analysis reports |
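The chunking that `batch_checklist_items()` is described as doing can be sketched as follows. The real function's signature is not shown in this document, so `batch_items` below is a hypothetical stand-in that splits a checklist into fixed-size groups, one group per LLM call.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batch_items(items: Sequence[T], batch_size: int = 10) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        # The final chunk may be shorter than batch_size.
        yield list(items[start:start + batch_size])
```

With 25 checklist items and the default size of 10, this produces three chunks of 10, 10, and 5 items.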
API Reference
Analysis Endpoint
Streaming Analysis
Configuration
Analysis Parameters
Parameter | Type | Default | Description |
---|---|---|---|
batchSize | integer | 10 | Checklist items per LLM call |
maxPages | integer | 50 | Maximum pages to analyze |
includeSubdomains | boolean | false | Analyze subdomain pages |
performanceThreshold | integer | 70 | Minimum acceptable performance score |
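The parameters above could be modelled on the client side as a small validated options object. This is a sketch: the class name `AnalysisOptions`, the `meets_threshold` helper, and the validation ranges (for example, a threshold between 0 and 100) are assumptions; only the field names and defaults come from the table.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AnalysisOptions:
    batchSize: int = 10               # checklist items per LLM call
    maxPages: int = 50                # maximum pages to analyze
    includeSubdomains: bool = False   # analyze subdomain pages
    performanceThreshold: int = 70    # minimum acceptable performance score

    def __post_init__(self):
        if self.batchSize < 1:
            raise ValueError("batchSize must be at least 1")
        if self.maxPages < 1:
            raise ValueError("maxPages must be at least 1")
        # Assumed range: performance scores are treated as 0-100.
        if not 0 <= self.performanceThreshold <= 100:
            raise ValueError("performanceThreshold must be between 0 and 100")

    def meets_threshold(self, score: int) -> bool:
        """Return True when a performance score is acceptable."""
        return score >= self.performanceThreshold
```

`asdict(AnalysisOptions(maxPages=20))` then gives a plain dictionary with the overridden value and the remaining defaults, ready to serialize as JSON.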
Supported Check Categories
- SEO Analysis: Meta tags, structured data, URL structure
- Performance: Load times, resource optimization, Core Web Vitals
- Accessibility: WCAG compliance, keyboard navigation, screen reader support
- Security: HTTPS usage, security headers, vulnerability scanning
- Code Quality: HTML validation, CSS optimization, JavaScript errors
Usage Examples
Basic Website Analysis
Custom Analysis with Options
Python Implementation
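A minimal client-side sketch, using only the standard library, of how an analysis request with Bearer token authentication might be assembled. The endpoint URL `https://api.example.com/analyze` and the `url` body field are placeholders, since this document does not state the real endpoint or request schema; the function only builds the request and does not send it.

```python
import json
import urllib.request

API_URL = "https://api.example.com/analyze"  # hypothetical endpoint, replace with the real one


def build_analysis_request(api_token, website_url, options=None):
    """Build a POST request carrying a Bearer token and a JSON body."""
    # "url" is an assumed field name; options holds parameters such as maxPages.
    body = {"url": website_url, **(options or {})}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Once the real endpoint is substituted, the request can be sent with `urllib.request.urlopen(request)`.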
Performance & Limits
Processing Metrics
- Average Analysis Time: 30-90 seconds per website
- Concurrent Analyses: Up to 5 websites simultaneously
- Page Limit: 50 pages per analysis (configurable)
Rate Limits
- API Requests: 50 analyses per hour per API key
- Concurrent Jobs: 5 active analyses
- Token Usage: Tracked and reported per analysis
File Constraints
- Website Size: No explicit limit (processed page by page)
- Analysis Depth: Configurable crawl depth (default: 3 levels)
- Report Size: Excel reports typically 1-5MB
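How the crawl depth and page limit interact can be sketched as a breadth-first traversal. This is an illustration, not the agent's crawler: `crawl_order` and `get_links` are hypothetical names, and the sketch assumes the start page is depth 0 and that links are followed for at most `max_depth` hops.

```python
from collections import deque


def crawl_order(start, get_links, max_depth=3, max_pages=50):
    """Return pages in breadth-first order, bounded by depth and page count."""
    seen = {start}
    order = []
    frontier = deque([(start, 0)])
    while frontier and len(order) < max_pages:
        page, depth = frontier.popleft()
        order.append(page)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(page):
            if link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return order
```

Breadth-first order means that when the page limit is reached first, the pages closest to the start URL are the ones that get analyzed.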
Error Handling
Common Error Responses
Status Code | Error Type | Description | Solution |
---|---|---|---|
400 | INVALID_URL | URL format invalid or unreachable | Verify URL format and accessibility |
429 | RATE_LIMIT_EXCEEDED | Hourly request limit or concurrent analysis limit exceeded | Wait before starting a new analysis |
500 | SCRAPING_FAILED | Unable to extract website content | Check website accessibility and robots.txt |
503 | PAGESPEED_UNAVAILABLE | Google PageSpeed API error | Retry analysis or skip performance metrics |
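For the two transient errors in the table (429 and 503), a client might retry with exponential backoff. The sketch below is one possible approach rather than documented behavior: `call_with_retry` is a hypothetical helper, `send` is any callable returning a `(status, payload)` tuple, and the delays are illustrative.

```python
import time

RETRYABLE = {429, 503}  # rate limit exceeded, PageSpeed unavailable


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, payload = send()
        if status not in RETRYABLE or attempt == max_attempts:
            return status, payload
        # Double the wait after each failed attempt: base, 2x base, 4x base, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

Errors such as 400 are returned immediately, since repeating a request with an invalid URL cannot succeed.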
Error Response Format
Integration Guide
Authentication
All API requests require Bearer token authentication.
Webhook Notifications
Configure webhooks for analysis completion.
Report Processing
Excel reports include multiple worksheets:
- Summary: Overall scores and key findings
- SEO Issues: Detailed SEO recommendations
- Performance: PageSpeed metrics and optimization suggestions
- Accessibility: WCAG compliance issues
- Security: Security headers and vulnerability findings