In modern cloud applications, observability isn’t optional—it’s essential. Easy Deploy provides comprehensive observability out of the box, giving you complete visibility into your applications without the complexity of traditional monitoring setups.
The Three Pillars of Observability
1. Metrics: What’s Happening
Collection of:
- Application metrics (requests, errors, latency), captured automatically
- Infrastructure metrics (CPU, memory, disk, network), captured automatically
- Business metrics (orders, signups, revenue), reported from your code
- Custom metrics (your specific KPIs), reported from your code
2. Logs: Why It Happened
Centralized logging with:
- Structured JSON logs
- Full-text search
- Real-time streaming
- Long-term retention
3. Traces: How It Happened
Distributed tracing showing:
- Request flow across services
- Performance bottlenecks
- Dependency mapping
- Error propagation
Zero-Configuration Monitoring
Automatic Instrumentation
The moment you deploy, monitoring is live:
# Deploy your app
easy-deploy deploy
# Monitoring is automatically configured:
# ✓ Metrics collection
# ✓ Log aggregation
# ✓ Distributed tracing
# ✓ Health checks
# ✓ Alerting rules
# ✓ Dashboards
No agents to install. No SDKs to configure. No dashboards to build.
What’s Monitored Automatically
Application Layer:
automatic_metrics:
  http:
    - request_count
    - request_duration (p50, p95, p99)
    - status_codes (2xx, 3xx, 4xx, 5xx)
    - active_connections
  performance:
    - throughput (requests/sec)
    - error_rate (percentage)
    - apdex_score
  dependencies:
    - database_query_time
    - cache_hit_rate
    - external_api_latency
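To make the latency percentiles and the Apdex score in the list above concrete, here is a minimal sketch of how they can be derived from raw request durations. It uses the nearest-rank percentile method and the standard Apdex formula; the sample data and the 500 ms threshold are illustrative assumptions, not values Easy Deploy is known to use.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Apdex = (satisfied + tolerating / 2) / total. A request is "satisfied" at or
// below the threshold T and "tolerating" between T and 4T; the rest are frustrated.
function apdex(samples, thresholdMs) {
  if (samples.length === 0) throw new Error('no samples');
  let satisfied = 0;
  let tolerating = 0;
  for (const ms of samples) {
    if (ms <= thresholdMs) satisfied += 1;
    else if (ms <= 4 * thresholdMs) tolerating += 1;
  }
  return (satisfied + tolerating / 2) / samples.length;
}

// Illustrative request durations in milliseconds (made up for this example).
const latenciesMs = [80, 95, 120, 145, 180, 234, 310, 456, 900, 2400];

console.log('p50:', percentile(latenciesMs, 50));   // 180
console.log('p95:', percentile(latenciesMs, 95));   // 2400
console.log('apdex:', apdex(latenciesMs, 500));     // 0.85
```

With only ten samples the p95 lands on the single slowest request, which is why percentiles are usually computed over much larger windows.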
Infrastructure Layer:
system_metrics:
  compute:
    - cpu_utilization
    - memory_usage
    - disk_io
    - network_throughput
  containers:
    - container_count
    - restart_count
    - image_pull_time
    - scheduling_latency
Real-Time Dashboards
Application Dashboard
┌──────────────────────────────────────────────────────┐
│ my-app-production                     Last 5 minutes │
├──────────────────────────────────────────────────────┤
│                                                      │
│ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░  Requests: 45.2K/min (+12%)         │
│ ░░░░░░░░░░░░░░░░  Errors: 0.02% (↓ 0.01%)            │
│ ▓▓▓▓▓▓▓▓░░░░░░░░  Latency p95: 234ms (↓ 45ms)        │
│                                                      │
├──────────────────────────────────────────────────────┤
│ Top Endpoints                 Requests    p95        │
│ GET /api/products             12.3K       145ms      │
│ POST /api/checkout            3.2K        456ms      │
│ GET /api/user/profile         8.1K        89ms       │
├──────────────────────────────────────────────────────┤
│ Errors (Last hour)                                   │
│ 500 Internal Error            12          0.02%      │
│   └─ /api/payment/process                            │
│ 429 Too Many Requests         8           0.01%      │
│   └─ /api/search                                     │
└──────────────────────────────────────────────────────┘
Infrastructure Dashboard
┌──────────────────────────────────────────────────────┐
│ Resource Utilization                                 │
├──────────────────────────────────────────────────────┤
│ CPU:     [████████░░░░░░░░] 42% (8 vCPUs)            │
│ Memory:  [██████████░░░░░░] 58% (16 GB)              │
│ Disk:    [███░░░░░░░░░░░░░] 23% (500 GB)             │
│ Network: [████████████░░░░] 67% (10 Gbps)            │
├──────────────────────────────────────────────────────┤
│ Active Containers: 12/20                             │
│   api-service:    4 healthy                          │
│   web-service:    3 healthy                          │
│   worker-service: 5 healthy                          │
└──────────────────────────────────────────────────────┘
Advanced Metrics Collection
Custom Business Metrics
Track what matters to your business by reporting events through the SDK:
import { metrics } from '@easy-deploy/sdk';
// Track business events
metrics.increment('orders.completed', {
  tags: { payment_method: 'stripe', region: 'us-east' }
});
// Measure business values
metrics.gauge('inventory.stock', 1543, {
  tags: { product_id: 'SKU-12345' }
});
// Time operations
const timer = metrics.startTimer('checkout.duration');
await processCheckout();
timer.end();
// Distribution metrics
metrics.histogram('order.value', 149.99, {
  tags: { currency: 'USD' }
});
Query Custom Metrics
# CLI query
easy-deploy metrics query \
"orders.completed" \
--group-by payment_method \
--time-range 24h
# Output:
stripe: 12,345 orders ($1.2M revenue)
paypal: 3,892 orders ($387K revenue)
apple: 1,234 orders ($123K revenue)
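The group-by query above boils down to summing counter increments per tag value. The sketch below shows that aggregation with an in-memory stand-in; `increment` and `queryGroupBy` are hypothetical helpers written for this example and are not the SDK's or the CLI's actual implementation.

```javascript
// In-memory stand-in for a metrics store (hypothetical, for illustration only).
const events = [];

// Record one counter increment with optional tags.
function increment(name, { tags = {}, value = 1 } = {}) {
  events.push({ name, tags, value });
}

// Sum all increments of one metric, grouped by the value of a single tag.
function queryGroupBy(name, tagKey) {
  const totals = new Map();
  for (const event of events) {
    if (event.name !== name) continue;
    const key = event.tags[tagKey] ?? '(untagged)';
    totals.set(key, (totals.get(key) ?? 0) + event.value);
  }
  return Object.fromEntries(totals);
}

increment('orders.completed', { tags: { payment_method: 'stripe', region: 'us-east' } });
increment('orders.completed', { tags: { payment_method: 'stripe', region: 'eu-west' } });
increment('orders.completed', { tags: { payment_method: 'paypal', region: 'us-east' } });
increment('signups.completed', { tags: { payment_method: 'stripe' } });

console.log(queryGroupBy('orders.completed', 'payment_method'));
// { stripe: 2, paypal: 1 }
```

Keeping tag values low in cardinality (payment method, region) rather than unique per event (order ID) keeps the number of groups small enough to chart.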
Intelligent Alerting
Pre-Configured Alerts
Out of the box, you get alerts for:
default_alerts:
  - name: High Error Rate
    condition: error_rate > 1%
    duration: 5m
    severity: critical
  - name: High Latency
    condition: p95_latency > 2s
    duration: 10m
    severity: warning
  - name: Low Availability
    condition: uptime < 99.9%
    duration: 5m
    severity: critical
  - name: High CPU
    condition: cpu > 80%
    duration: 15m
    severity: warning
  - name: Memory Pressure
    condition: memory > 85%
    duration: 10m
    severity: warning
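Each default alert pairs a condition with a duration, which means a brief spike should not fire it. One way to read that rule is sketched below: the alert fires only when the metric has stayed above the threshold continuously for at least the configured duration. The function names and sample data are assumptions for illustration; the platform's real evaluator is not described here.

```javascript
// samples: [{ t: seconds, value }] in ascending time order.
// Returns how long (in seconds) the most recent samples have continuously
// exceeded the threshold, or 0 if the latest sample is not in breach.
function currentBreachSeconds(samples, threshold) {
  let breachStart = null;
  for (const sample of samples) {
    if (sample.value > threshold) {
      if (breachStart === null) breachStart = sample.t;
    } else {
      breachStart = null; // any healthy sample resets the clock
    }
  }
  if (breachStart === null) return 0;
  return samples[samples.length - 1].t - breachStart;
}

function shouldFire(samples, rule) {
  return currentBreachSeconds(samples, rule.threshold) >= rule.durationSec;
}

// "High Error Rate": error_rate > 1% for 5 minutes.
const highErrorRate = { name: 'High Error Rate', threshold: 1, durationSec: 300 };

// One sample per minute, values are error-rate percentages (made up).
const toSamples = (values) => values.map((value, i) => ({ t: i * 60, value }));
const briefSpikes = toSamples([0.2, 1.5, 0.3, 1.2, 1.4, 1.1]);
const sustained = toSamples([0.2, 1.5, 1.6, 1.2, 1.4, 1.1, 1.3]);

console.log(shouldFire(briefSpikes, highErrorRate)); // false
console.log(shouldFire(sustained, highErrorRate));   // true
```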
Custom Alert Rules
# easy-deploy.yml
alerts:
  - name: Low Conversion Rate
    query: "(checkouts_completed / sessions_started) < 0.02"
    duration: 30m
    severity: warning
    channels:
      - slack: "#revenue-alerts"
      - pagerduty: "oncall-team"
  - name: Payment Failures Spike
    query: "increase(payment_failures[5m]) > 100"
    severity: critical
    channels:
      - pagerduty: "payments-team"
      - slack: "#payments-critical"
  - name: Inventory Low
    query: "inventory_count < 10"
    severity: info
    channels:
      - email: "[email protected]"
Alert Routing
alert_routing:
  # Business hours vs. After hours
  - match:
      time: "09:00 to 17:00"
      days: ["Mon", "Tue", "Wed", "Thu", "Fri"]
    channels:
      - slack: "#alerts"
  - match:
      time: "17:00 to 09:00" # After hours
      severity: critical
    channels:
      - pagerduty: "oncall"
  # Team-specific routing
  - match:
      service: "payment-api"
    channels:
      - slack: "#payments-team"
      - pagerduty: "payments-oncall"
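The after-hours rule uses a window that wraps past midnight ("17:00 to 09:00"), which is easy to get wrong. The sketch below shows one plausible matching scheme. It assumes window starts are inclusive, ends are exclusive, and every matching rule contributes its channels; the article does not say whether Easy Deploy uses first-match or all-match semantics, so treat that as an assumption.

```javascript
function toMinutes(hhmm) {
  const [hours, minutes] = hhmm.split(':').map(Number);
  return hours * 60 + minutes;
}

// window looks like "HH:MM to HH:MM". If the end is not after the start,
// the window wraps past midnight.
function inWindow(window, minutesOfDay) {
  const [start, end] = window.split(' to ').map(toMinutes);
  if (start < end) return minutesOfDay >= start && minutesOfDay < end;
  return minutesOfDay >= start || minutesOfDay < end;
}

// Every field present in the match block must agree with the alert.
function matches(match, alert) {
  if (match.time && !inWindow(match.time, alert.minutesOfDay)) return false;
  if (match.days && !match.days.includes(alert.day)) return false;
  if (match.severity && match.severity !== alert.severity) return false;
  if (match.service && match.service !== alert.service) return false;
  return true;
}

// Assumption: all matching rules apply, so channels are concatenated.
function route(rules, alert) {
  return rules.filter((rule) => matches(rule.match, alert)).flatMap((rule) => rule.channels);
}

const rules = [
  { match: { time: '09:00 to 17:00', days: ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'] }, channels: ['slack:#alerts'] },
  { match: { time: '17:00 to 09:00', severity: 'critical' }, channels: ['pagerduty:oncall'] },
  { match: { service: 'payment-api' }, channels: ['slack:#payments-team', 'pagerduty:payments-oncall'] },
];

const nightAlert = { minutesOfDay: toMinutes('02:30'), day: 'Sat', severity: 'critical', service: 'web' };
console.log(route(rules, nightAlert)); // [ 'pagerduty:oncall' ]
```

Note that a non-critical alert raised after hours matches none of the time-based rules in this configuration, so it would go unrouted unless a service rule catches it.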
Distributed Tracing
Automatic Trace Collection
Every request is traced across your entire stack:
Trace ID: 7f3a4b2c1d5e6789
[web-frontend] GET /checkout
│ Duration: 1,247ms
│ Status: 200
│
├─▶ [api-gateway] POST /api/v1/orders
│ │ Duration: 1,189ms
│ │ Status: 200
│ │
│ ├─▶ [auth-service] Verify JWT
│ │ Duration: 45ms
│ │ Status: 200
│ │
│ ├─▶ [order-service] Create Order
│ │ │ Duration: 892ms
│ │ │ Status: 200
│ │ │
│ │ ├─▶ [database] INSERT order
│ │ │ Duration: 234ms
│ │ │
│ │ ├─▶ [inventory-service] Reserve Items
│ │ │ Duration: 456ms
│ │ │ Status: 200
│ │ │
│ │ └─▶ [payment-service] Process Payment
│ │ Duration: 189ms
│ │ Status: 200
│ │
│ └─▶ [notification-service] Send Confirmation
│ Duration: 123ms
│ Status: 202 (Async)
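A trace like the one above is most useful once you separate the time a span spent in its own code from the time it spent waiting on children. The sketch below computes that "self time" for the example trace. It assumes child spans run one after another; with parallel children, subtracting their summed durations would understate self time.

```javascript
// Span tree mirroring the example trace (durations in milliseconds).
const trace = {
  name: 'web-frontend GET /checkout', durationMs: 1247, children: [
    { name: 'api-gateway POST /api/v1/orders', durationMs: 1189, children: [
      { name: 'auth-service Verify JWT', durationMs: 45, children: [] },
      { name: 'order-service Create Order', durationMs: 892, children: [
        { name: 'database INSERT order', durationMs: 234, children: [] },
        { name: 'inventory-service Reserve Items', durationMs: 456, children: [] },
        { name: 'payment-service Process Payment', durationMs: 189, children: [] },
      ] },
      { name: 'notification-service Send Confirmation', durationMs: 123, children: [] },
    ] },
  ],
};

// Self time = span duration minus the total duration of its direct children.
function selfTimes(span, out = []) {
  const childTotal = span.children.reduce((sum, child) => sum + child.durationMs, 0);
  out.push({ name: span.name, selfMs: Math.max(0, span.durationMs - childTotal) });
  for (const child of span.children) selfTimes(child, out);
  return out;
}

const ranked = selfTimes(trace).sort((a, b) => b.selfMs - a.selfMs);
for (const { name, selfMs } of ranked) {
  console.log(`${String(selfMs).padStart(5)}ms  ${name}`);
}
```

In this trace the inventory reservation (456 ms) dominates, while order-service itself accounts for only 13 ms of its 892 ms span.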
Trace Analysis
# Find slow traces
easy-deploy traces query \
--duration ">2s" \
--service "api-gateway" \
--time-range 1h
# Analyze error traces
easy-deploy traces query \
--status "error" \
--group-by "error.type"
# Export trace for debugging
easy-deploy traces export --trace-id 7f3a4b2c1d5e6789
Service Dependency Map
Automatically generated from traces:
          ┌──────────────┐
          │ web-frontend │
          └───────┬──────┘
                  │
          ┌───────▼──────┐
          │ api-gateway  │
          └───────┬──────┘
                  │
     ┌────────────┼────────────┐
     │            │            │
┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐
│ auth-svc │ │order-svc │ │notif-svc │
└──────────┘ └────┬─────┘ └──────────┘
                  │
     ┌────────────┼────────────┐
     │            │            │
┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐
│ inv-svc  │ │ pay-svc  │ │ database │
└──────────┘ └──────────┘ └──────────┘
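A dependency map of this kind can be derived from trace data by looking at parent and child spans that belong to different services. The following is a small sketch of that idea; the flat span shape (`id`, `parentId`, `service`) is an assumed format for illustration, not Easy Deploy's trace schema.

```javascript
// Flat span list for one trace (assumed shape, matching the example services).
const spans = [
  { id: 1, parentId: null, service: 'web-frontend' },
  { id: 2, parentId: 1, service: 'api-gateway' },
  { id: 3, parentId: 2, service: 'auth-service' },
  { id: 4, parentId: 2, service: 'order-service' },
  { id: 5, parentId: 4, service: 'database' },
  { id: 6, parentId: 4, service: 'inventory-service' },
  { id: 7, parentId: 4, service: 'payment-service' },
  { id: 8, parentId: 2, service: 'notification-service' },
];

// An edge "A -> B" exists when a span in service A is the parent of a span
// in a different service B. A Set removes duplicates across many traces.
function dependencyEdges(allSpans) {
  const serviceById = new Map(allSpans.map((span) => [span.id, span.service]));
  const edges = new Set();
  for (const span of allSpans) {
    const parentService = serviceById.get(span.parentId);
    if (parentService && parentService !== span.service) {
      edges.add(`${parentService} -> ${span.service}`);
    }
  }
  return [...edges].sort();
}

console.log(dependencyEdges(spans).join('\n'));
```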
Log Management
Structured Logging
All logs are JSON-structured automatically:
{
  "timestamp": "2025-01-08T14:32:15.123Z",
  "level": "error",
  "service": "api-gateway",
  "trace_id": "7f3a4b2c1d5e6789",
  "message": "Payment processing failed",
  "error": {
    "type": "PaymentDeclined",
    "code": "insufficient_funds",
    "message": "Card declined"
  },
  "context": {
    "user_id": "usr_abc123",
    "order_id": "ord_xyz789",
    "amount": 149.99
  }
}
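For readers who want to see how an application might produce a record of this shape on its own, here is a minimal sketch of a JSON log formatter. `formatLog` and `serializeError` are hypothetical helpers for this example; they are not part of any Easy Deploy package.

```javascript
// Flatten an Error into the { type, code, message } shape used in the sample record.
function serializeError(err) {
  return { type: err.name, code: err.code, message: err.message };
}

// Build one JSON log line. Caller-supplied fields are merged last, so callers
// should avoid reusing the reserved keys (timestamp, level, message).
function formatLog(level, message, fields = {}, now = new Date()) {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    ...fields,
  });
}

const declined = Object.assign(new Error('Card declined'), {
  name: 'PaymentDeclined',
  code: 'insufficient_funds',
});

console.log(formatLog('error', 'Payment processing failed', {
  service: 'api-gateway',
  trace_id: '7f3a4b2c1d5e6789',
  error: serializeError(declined),
  context: { user_id: 'usr_abc123', order_id: 'ord_xyz789', amount: 149.99 },
}));
```

Carrying the `trace_id` on every line is what makes it possible to jump from a log record to the matching trace.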
Powerful Log Search
# Search logs
easy-deploy logs search \
'level:error AND service:api-gateway' \
--time-range 24h
# Filter by trace
easy-deploy logs search \
--trace-id 7f3a4b2c1d5e6789
# Complex queries
easy-deploy logs search \
'error.type:PaymentDeclined AND amount:>100' \
--group-by error.code \
--time-range 7d
Log Analytics
-- Query logs with SQL
SELECT
DATE_TRUNC('hour', timestamp) as hour,
COUNT(*) as error_count,
error.type as error_type
FROM logs
WHERE level = 'error'
AND service = 'payment-service'
AND timestamp > NOW() - INTERVAL '24 hours'
GROUP BY hour, error_type
ORDER BY error_count DESC;
Performance Profiling
Continuous Profiling
Understand performance in production:
profiling:
  enabled: true
  sampling_rate: 0.01 # 1% of requests
  profiles:
    - type: cpu
      duration: 30s
      interval: 5m
    - type: memory
      duration: 30s
      interval: 15m
    - type: goroutines # For Go apps
      duration: 10s
      interval: 10m
Flame Graphs
Visual performance analysis:
# Generate flame graph
easy-deploy profile cpu --duration 60s
# Output: Interactive flame graph showing:
# - Function call hierarchy
# - CPU time spent in each function
# - Hot paths (code where most CPU time is spent)
Memory Leak Detection
# Analyze memory growth
easy-deploy profile memory --compare \
--baseline "2025-01-07T00:00:00Z" \
--current "2025-01-08T00:00:00Z"
# Output: Objects that grew significantly
Real User Monitoring (RUM)
Frontend Performance
Track actual user experience:
// Automatically injected
import { rum } from '@easy-deploy/rum';
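// Web Vitals tracked automatically:
// - Largest Contentful Paint (LCP)
// - First Input Delay (FID), which Interaction to Next Paint (INP) replaced as a Core Web Vital in March 2024
// - Cumulative Layout Shift (CLS)
// - Time to First Byte (TTFB), a supporting diagnostic metric rather than a Core Web Vital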
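Raw Web Vitals numbers are easier to act on once they are bucketed. The sketch below rates a measurement as good, needs-improvement, or poor using the thresholds published in Google's Web Vitals guidance. The `rateVital` helper is an illustration written for this article and is not part of the injected RUM script.

```javascript
// [upper bound for "good", upper bound for "needs-improvement"].
// LCP, FID, and TTFB are in milliseconds; CLS is a unitless score.
const VITAL_THRESHOLDS = {
  LCP: [2500, 4000],
  FID: [100, 300],
  CLS: [0.1, 0.25],
  TTFB: [800, 1800],
};

function rateVital(metric, value) {
  const thresholds = VITAL_THRESHOLDS[metric];
  if (!thresholds) throw new Error(`unknown metric: ${metric}`);
  const [good, needsImprovement] = thresholds;
  if (value <= good) return 'good';
  if (value <= needsImprovement) return 'needs-improvement';
  return 'poor';
}

console.log(rateVital('LCP', 2100));  // good
console.log(rateVital('CLS', 0.18));  // needs-improvement
console.log(rateVital('TTFB', 2300)); // poor
```

These ratings are normally judged at the 75th percentile of page loads rather than on individual sessions.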
User Session Replay
# View user sessions
easy-deploy rum sessions \
--filter "error:true" \
--time-range 24h
# Replay specific session
easy-deploy rum replay --session-id ses_abc123
Synthetic Monitoring
Health Checks
health_checks:
  - name: Homepage
    url: https://myapp.com
    interval: 60s
    timeout: 5s
    regions:
      - "us-east-1"
      - "eu-west-1"
      - "ap-southeast-1"
  - name: API Health
    url: https://api.myapp.com/health
    interval: 30s
    expect:
      status: 200
      body_contains: "healthy"
  - name: Database Connectivity
    url: https://api.myapp.com/health/db
    interval: 60s
    expect:
      status: 200
      response_time: "less than 100ms"
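The `expect` blocks above describe three kinds of assertion: status code, body content, and response time. Here is a sketch of how a single probe could evaluate them. The field names (`bodyContains`, `maxResponseMs`, `timeoutMs`) are camelCase stand-ins chosen for this example, and `runCheck` relies on the global `fetch` and `AbortSignal.timeout` available in recent Node.js versions and browsers.

```javascript
// Compare one probe result against the configured expectations.
function evaluateCheck(expect = {}, result) {
  const failures = [];
  if (expect.status !== undefined && result.status !== expect.status) {
    failures.push(`status ${result.status}, expected ${expect.status}`);
  }
  if (expect.bodyContains !== undefined && !result.body.includes(expect.bodyContains)) {
    failures.push(`body does not contain "${expect.bodyContains}"`);
  }
  if (expect.maxResponseMs !== undefined && result.elapsedMs >= expect.maxResponseMs) {
    failures.push(`took ${result.elapsedMs}ms, expected less than ${expect.maxResponseMs}ms`);
  }
  return { healthy: failures.length === 0, failures };
}

// Perform the HTTP request with a timeout, then evaluate the response.
// fetchImpl is injectable so the probe can be exercised without a network.
async function runCheck(check, fetchImpl = fetch) {
  const started = Date.now();
  try {
    const response = await fetchImpl(check.url, { signal: AbortSignal.timeout(check.timeoutMs) });
    const body = await response.text();
    return evaluateCheck(check.expect, {
      status: response.status,
      body,
      elapsedMs: Date.now() - started,
    });
  } catch (err) {
    return { healthy: false, failures: [`request failed: ${err.message}`] };
  }
}

// Evaluate two canned results without touching the network.
const dbExpect = { status: 200, maxResponseMs: 100 };
console.log(evaluateCheck(dbExpect, { status: 200, body: 'ok', elapsedMs: 42 }));
console.log(evaluateCheck(dbExpect, { status: 200, body: 'ok', elapsedMs: 250 }));
```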
Uptime Monitoring
Uptime Report (Last 30 days)
Homepage: 99.98% ✓ Excellent
API Endpoints: 99.95% ✓ Good
Database: 99.99% ✓ Excellent
Incidents: 2
1. 2025-01-03 14:23 to 14:47 (24 min)
API rate limit exceeded during traffic spike
2. 2025-01-15 03:12 to 03:15 (3 min)
Database connection timeout
Cost Monitoring
Resource Cost Attribution
# View costs by service
easy-deploy costs show \
--group-by service \
--time-range 30d
# Output:
api-service: $2,345 (38%)
web-service: $1,892 (31%)
database: $1,234 (20%)
cache: $432 ( 7%)
other: $245 ( 4%)
Total: $6,148
Cost Anomaly Detection
# Detect unusual spending
easy-deploy costs anomalies
# Output:
⚠ Anomaly Detected: api-service
Current: $3,124/week
Expected: $2,100/week (+49%)
Likely cause: Instance count increased from 12 → 18
Recommendation: Review auto-scaling policies
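The "+49%" in the sample output is the relative deviation of current spend from the expected baseline. A minimal sketch of that arithmetic follows; the 25% tolerance is a made-up threshold for illustration, since the article does not say how Easy Deploy decides what counts as an anomaly.

```javascript
// Relative deviation of current spend from the expected baseline.
function costDeviation(current, expected) {
  if (expected <= 0) throw new Error('expected cost must be positive');
  return (current - expected) / expected;
}

// Flag spend that exceeds the baseline by more than the tolerance.
// The default tolerance is an assumption chosen for this example.
function isCostAnomaly(current, expected, tolerance = 0.25) {
  return costDeviation(current, expected) > tolerance;
}

const deviation = costDeviation(3124, 2100);
console.log(`${Math.round(deviation * 100)}% over expected`); // 49% over expected
console.log(isCostAnomaly(3124, 2100));                       // true
```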
Collaborative Debugging
Team Annotations
# Add deployment annotation
easy-deploy annotate "Deployed v2.3.1 with performance fixes"
# Annotations show on all dashboards and graphs
Incident Management
# Declare incident
easy-deploy incident create \
--title "High error rate in payment service" \
--severity high \
--assign @payments-team
# Updates automatically tracked:
# - Timeline of events
# - Metrics during incident
# - Actions taken
# - Resolution notes
Postmortem Reports
# Generate postmortem
easy-deploy incident report inc_123
# Auto-generated report includes:
# - Timeline with metrics
# - Logs during incident
# - Traces of failing requests
# - Actions taken
# - Root cause analysis
Integration Ecosystem
Alerting Channels
integrations:
  slack:
    - workspace: mycompany
      channel: "#alerts"
      webhook_url: ${SLACK_WEBHOOK}
  pagerduty:
    - integration_key: ${PAGERDUTY_KEY}
      escalation_policy: "Engineering Oncall"
  email:
    - recipients:
        - [email protected]
        - [email protected]
  microsoft_teams:
    - webhook_url: ${TEAMS_WEBHOOK}
  opsgenie:
    - api_key: ${OPSGENIE_KEY}
      team: "Platform Team"
External Monitoring Tools
# Export to existing tools
exports:
  prometheus:
    enabled: true
    endpoint: /metrics
  datadog:
    enabled: true
    api_key: ${DATADOG_KEY}
  new_relic:
    enabled: true
    license_key: ${NEW_RELIC_KEY}
  grafana:
    enabled: true
    datasource: prometheus
Machine Learning Insights
Anomaly Detection
AI-powered detection of unusual patterns:
ml_insights:
  anomaly_detection:
    enabled: true
    sensitivity: medium # low, medium, high
    metrics:
      - request_rate
      - error_rate
      - latency_p95
      - cpu_utilization
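The article does not describe the model behind this feature, so the sketch below swaps in a deliberately simple technique, a z-score test, to illustrate what a sensitivity setting can mean in practice. The mapping from low/medium/high to z-score cutoffs is invented for this example.

```javascript
// Hypothetical mapping: higher sensitivity means a smaller deviation is flagged.
const Z_CUTOFF = { low: 4, medium: 3, high: 2 };

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Population standard deviation of the history window.
function stddev(values, avg) {
  const variance = values.reduce((sum, v) => sum + (v - avg) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}

// Flag a new value whose distance from the historical mean, measured in
// standard deviations, exceeds the cutoff for the chosen sensitivity.
function isAnomaly(history, value, sensitivity = 'medium') {
  if (history.length === 0) throw new Error('history is empty');
  const avg = mean(history);
  const sd = stddev(history, avg);
  if (sd === 0) return value !== avg;
  return Math.abs(value - avg) / sd > Z_CUTOFF[sensitivity];
}

// Requests per second over recent intervals (illustrative numbers).
const requestRate = [100, 102, 98, 101, 99, 100, 103, 97];

console.log(isAnomaly(requestRate, 104, 'high'));   // true
console.log(isAnomaly(requestRate, 104, 'medium')); // false
console.log(isAnomaly(requestRate, 110, 'low'));    // true
```

A z-score test assumes roughly stable, symmetric data; metrics with daily or weekly seasonality need a baseline that accounts for time of day.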
Predictive Alerts
Get notified before problems occur:
🔮 Predictive Alert: Database Connection Pool
Current: 45% utilization
Predicted (next 2h): 92% utilization
Confidence: 87%
Recommendation: Increase pool size from 20 → 30
Est. cost impact: +$12/month
Capacity Planning
# Forecast resource needs
easy-deploy capacity forecast --days 30
# Output:
Based on growth trends:
In 30 days:
Expected traffic: +45%
Required capacity: 18 → 26 instances
Estimated cost: $2,345 → $3,012 (+28%)
Recommendations:
1. Enable auto-scaling (saves ~$400/month)
2. Consider reserved instances (saves ~$600/month)
3. Optimize hot paths (reduces capacity needs by 20%)
Best Practices
1. Define SLOs
service_level_objectives:
  availability:
    target: 99.9%
    window: 30d
  latency_p95:
    target: "less than 500ms"
    window: 7d
  error_rate:
    target: "less than 0.1%"
    window: 24h
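An availability SLO translates directly into an error budget: the amount of downtime the target tolerates over its window. The short sketch below works through that arithmetic for the 99.9% over 30 days objective; the function names are illustrative.

```javascript
// Minutes of downtime allowed by an availability target over a window.
function errorBudgetMinutes(targetPercent, windowDays) {
  const windowMinutes = windowDays * 24 * 60;
  return windowMinutes * (1 - targetPercent / 100);
}

// Budget left after subtracting the downtime already incurred.
function remainingBudgetMinutes(targetPercent, windowDays, downtimeMinutes) {
  return errorBudgetMinutes(targetPercent, windowDays) - downtimeMinutes;
}

// 99.9% over 30 days: 43,200 minutes * 0.1% = 43.2 minutes of budget.
console.log(errorBudgetMinutes(99.9, 30).toFixed(1));         // 43.2

// A single 24-minute incident leaves 19.2 minutes for the rest of the window.
console.log(remainingBudgetMinutes(99.9, 30, 24).toFixed(1)); // 19.2
```

A negative remaining budget means the objective has already been missed for the window, which is a common trigger for pausing risky releases.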
2. Alert Fatigue Prevention
alert_policies:
  # Group similar alerts
  grouping:
    interval: 5m
    by: [service, severity]
  # Rate limit notifications
  throttling:
    max_alerts_per_hour: 10
  # Auto-resolve stale alerts
  auto_resolve: 1h
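Grouping and throttling can be pictured as two small steps: bucket alerts that share a service and severity within the same interval, then cap how many notifications go out per hour. The sketch below assumes fixed five-minute buckets and a sliding one-hour window; the platform's actual grouping semantics are not specified in this article.

```javascript
// Group alerts that fall in the same fixed time bucket and share the same
// values for the given keys. Each group would produce one notification.
function groupAlerts(alerts, intervalSec, keys) {
  const groups = new Map();
  for (const alert of alerts) {
    const bucket = Math.floor(alert.t / intervalSec);
    const groupKey = [bucket, ...keys.map((key) => alert[key])].join('|');
    if (!groups.has(groupKey)) groups.set(groupKey, []);
    groups.get(groupKey).push(alert);
  }
  return [...groups.values()];
}

// Keep a notification only if fewer than maxPerHour were sent in the
// preceding hour (sliding window). Input must be in ascending time order.
function throttle(notifications, maxPerHour) {
  const sent = [];
  for (const notification of notifications) {
    const recent = sent.filter((s) => notification.t - s.t < 3600).length;
    if (recent < maxPerHour) sent.push(notification);
  }
  return sent;
}

// Times are seconds since an arbitrary start (illustrative data).
const rawAlerts = [
  { t: 0, service: 'api', severity: 'critical' },
  { t: 120, service: 'api', severity: 'critical' },
  { t: 200, service: 'web', severity: 'warning' },
  { t: 400, service: 'api', severity: 'critical' },
];

const grouped = groupAlerts(rawAlerts, 300, ['service', 'severity']);
console.log(grouped.length); // 3 groups instead of 4 separate alerts

const burst = Array.from({ length: 15 }, (_, i) => ({ t: i * 60 }));
console.log(throttle(burst, 10).length); // 10
```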
3. On-Call Runbooks
runbooks:
  - alert: HighErrorRate
    title: "High Error Rate Detected"
    steps:
      - "Check recent deployments"
      - "Review error logs for patterns"
      - "Check dependent service health"
      - "Consider rolling back if needed"
    commands:
      - easy-deploy logs search 'level:error'
      - easy-deploy traces query --status error
      - easy-deploy rollback --if-needed
Getting Started
# Observability is automatic!
easy-deploy deploy
# View dashboards
easy-deploy dashboard
# Query metrics
easy-deploy metrics query "request_rate"
# Search logs
easy-deploy logs search "error"
# View traces
easy-deploy traces list
Conclusion
Complete observability shouldn’t require a team of specialists and months of setup. Easy Deploy provides enterprise-grade monitoring, logging, and tracing automatically—so you can focus on building features instead of debugging in the dark.
Every Easy Deploy application gets:
- Real-time metrics across your entire stack
- Centralized logging with powerful search
- Distributed tracing showing request flow
- Intelligent alerting that prevents alert fatigue
- Cost monitoring to optimize spending
Join thousands of teams who’ve replaced complex monitoring stacks with Easy Deploy’s integrated observability.
Start your free trial and see everything, fix anything.
Next Steps
- Get started: Deploy with observability
- Learn more: Platform Engineering at Scale
- Watch demo: Observability in action
- Read docs: Monitoring guide