Monitoring, Security & Maintenance
This final article covers monitoring, security best practices, and maintenance strategies for Cloud AI agents.
Monitoring
Cloud Monitoring Setup
Track key metrics:
- Request count and latency
- Error rates
- Token refresh frequency
- MCP call success rates
- Database query performance
Logging
Local Development:
docker-compose logs -f support-agent-adk
docker-compose logs -f mcp-crm
docker-compose logs -f mcp-product-catalogue
Cloud:
# View logs
gcloud logging read "resource.type=cloud_run_revision" --limit 50
# Filter by service
gcloud logging read "resource.labels.service_name=agent-order-receiver"
Health Checks
All services expose /health endpoints:
curl https://agent-order-receiver-xxx.run.app/health
Debug Mode
Set DEBUG_HTTP=true to enable:
- Request/response logging
- Authentication debug logs
- MCP protocol debug logs
Security Best Practices
1. Secrets Management
- Never commit secrets to git
- Use Secret Manager for all sensitive data
- Rotate secrets regularly
- Store service account keys in GCP Secret Manager
2. Authentication
- Always validate Firebase tokens
- Use service account tokens for service-to-service calls
- Implement rate limiting
- Check token expiration
3. Network Security
- Use HTTPS for all endpoints
- Configure CORS properly
- Use VPC connector for private services (optional)
- Implement rate limiting
4. Error Handling
- Don’t leak sensitive information in errors
- Log authentication failures
- Implement proper error codes
- Return generic error messages to clients
Troubleshooting
Common Issues
1. Token Expired
- Service account tokens auto-refresh
- Check
MCP_AUTH_TOKENis set in environment - Verify
FIREBASE_PROJECT_IDis configured
2. MCP Call Fails
- Check MCP server is running
- Verify service account token is valid
- Check network connectivity between services
3. Permission Denied
- Verify user has required permissions
- Check token includes permission claims
- Service accounts have full access by default
4. CORS Errors
- Verify
CORS_ORIGINSincludes frontend URL - Check Cloud Run CORS configuration
- Ensure frontend sends correct headers
Maintenance
Regular Tasks
- Monitor Error Rates: Set up alerts for high error rates
- Review Logs: Check for authentication failures or unusual patterns
- Update Dependencies: Keep packages up to date
- Rotate Secrets: Regularly rotate service account keys
- Backup Data: Regular Firestore backups
Cost Optimization
- Cloud Run: Use min instances = 0 for cost savings
- Firestore: Use on-demand pricing for low traffic
- Cloud Storage: Use standard storage class
- Monitoring: Set up budget alerts
Best Practices Summary
- Architecture: Microservices with clear separation of concerns
- Security: Token exchange pattern, proper authentication
- Protocol: Standardized MCP JSON-RPC 2.0 communication
- Deployment: Cloud-native with auto-scaling
- Monitoring: Comprehensive logging and metrics
- Maintenance: Regular updates and security checks
Conclusion
This series covered building, deploying, and maintaining production-ready AI agents in the cloud. Key takeaways:
- Architecture matters: Microservices enable scalability and maintainability
- Security is critical: Proper authentication and authorization are essential
- Standard protocols: MCP provides consistent agent-to-service communication
- Cloud-native: Leverage managed services for reliability
- Observability: Monitor and log everything for troubleshooting
For more details, refer to the implementation documentation and code examples throughout this series.