ModelingPre Aggregations
Rollup
Create summarized data tables for high-performance queries.
Overview
Rollup pre-aggregations summarize raw data into aggregated tables, grouped by specified dimensions. They're the most effective way to improve query performance.
Example
pre_aggregations:
- name: orders_daily
type: rollup
measures:
- count
- total_revenue
dimensions:
- status
- category
time_dimension: created_at
granularity: day
partition_granularity: monthTime-Based Rollups
Basic Time Rollup
- name: daily_orders
measures:
- count
time_dimension: created_at
granularity: dayPartitioned Rollup
Partition large datasets for efficient refreshes:
- name: orders_monthly_partitioned
measures:
- count
- total_revenue
time_dimension: created_at
granularity: day
partition_granularity: monthGranularity Options
| Granularity | Description |
|---|---|
second | Per-second aggregation |
minute | Per-minute aggregation |
hour | Hourly aggregation |
day | Daily aggregation |
week | Weekly aggregation |
month | Monthly aggregation |
quarter | Quarterly aggregation |
year | Yearly aggregation |
Refresh Strategies
Time-Based Refresh
- name: orders_hourly
measures:
- count
time_dimension: created_at
granularity: hour
refresh_key:
every: 1 hourIncremental Refresh
Only refresh recent partitions:
- name: orders_incremental
measures:
- count
- total_revenue
time_dimension: created_at
granularity: day
partition_granularity: month
refresh_key:
every: 1 day
incremental: true
update_window: 7 daySQL-Based Refresh
Refresh when data changes:
- name: orders_on_change
measures:
- count
refresh_key:
sql: SELECT MAX(updated_at) FROM ordersIndexes
Add indexes for better query performance:
- name: orders_by_status
measures:
- count
dimensions:
- status
- category
indexes:
- name: status_idx
columns:
- statusQuery Matching
A query uses a rollup if:
- All requested measures are in the rollup
- All requested dimensions are in the rollup
- Time dimension granularity is compatible
- Filters match the rollup's data
# This rollup...
- name: orders_daily
measures:
- count
- total_revenue
dimensions:
- status
time_dimension: created_at
granularity: day
# ...matches queries like:
# - orders.count by day
# - orders.total_revenue by status by week (day rolls up to week)
# - orders.count where status = 'completed' by monthBest Practices
- Match your queries — design rollups around common access patterns
- Use incremental refresh — for large time-series data
- Partition wisely — monthly partitions work well for most cases
- Add indexes — for high-cardinality dimension filters
See Also
- pre-aggregations
- cubes.dimensions.time
- cubes.measures