Why these topics belong together
Once you understand:
- object-oriented design
- database schema design
- how domain objects and tables relate
the next real question is:
How do I make persistence work cleanly in an actual system?
That is where these topics matter:
- aggregates
- transaction boundaries
- repositories
- ORMs
- persistence patterns
They are all connected.
You cannot design good repositories if you do not understand aggregates.
You cannot define sound transaction boundaries if you do not understand consistency boundaries.
You cannot use an ORM well if you do not understand what should and should not be loaded, saved, or exposed.
So this note treats them as one connected topic.
The core idea
When building a real system, you need answers to these questions:
- what group of objects should be treated as one unit of consistency?
- what should load and save together?
- what changes must happen atomically?
- how should domain code access persistence?
- what should the ORM do, and what should it not do?
Those questions are at the center of practical system design.
First: what an aggregate is
An aggregate is a cluster of related domain objects that should be treated as one consistency boundary.
Inside an aggregate:
- objects are closely related
- invariants are enforced together
- updates are coordinated through one main object
That main object is called the aggregate root.
Examples:
- Order with OrderItems
- Cart with CartItems
- Invoice with InvoiceLines
- Reservation with its internal state and policy-relevant data
The aggregate root is the object through which outside code interacts with the aggregate.
Example:
- outside code should modify OrderItems through Order
- not by mutating OrderItem independently from anywhere
This is one of the most important ideas in domain-driven and object-oriented design.
Why aggregates matter
Aggregates solve several real problems:
1. They protect invariants
Example:
- order total must match its items
- order item quantity must be positive
- a cancelled order cannot accept new items
If all related state is governed through one aggregate boundary, it becomes much easier to protect these rules.
2. They define transaction boundaries
Usually, what must remain strongly consistent together belongs in one aggregate.
3. They reduce uncontrolled coupling
Without aggregates, any part of the system may mutate any related object directly.
That often produces:
- inconsistent state
- unclear ownership
- fragile update logic
4. They shape repository design
Repositories usually work at the aggregate-root level.
Example:
- OrderRepository
- not usually OrderItemRepository for arbitrary business writes
What belongs inside one aggregate
A practical rule:
Put objects in one aggregate when:
- they are tightly related
- they must stay consistent together
- one root can naturally govern their lifecycle
- they are commonly created, changed, and saved together
Example:
Order aggregate:
- Order
- OrderItem
- maybe shipping address snapshot
Why?
Because:
- order items belong to one order
- they have no strong independent business life outside it
- many core rules apply across them together
What should not be in the same aggregate
Do not put everything related in one aggregate.
That is a common mistake.
Example:
Should Order contain full Customer, PaymentProvider, Warehouse, ProductCatalog, and ShipmentHistory objects directly as one giant aggregate?
Usually no.
Why not?
Because:
- they have separate lifecycles
- they change independently
- loading them all together is expensive
- one huge transaction boundary is hard to scale and reason about
Strong design depends on resisting the urge to make aggregates too large.
A practical aggregate heuristic
When deciding aggregate boundaries, ask:
- What must always be consistent immediately?
- What changes together in one use case?
- What object naturally owns the rule?
- What can be referenced by identity instead of loaded fully?
- What would become too expensive or too coupled if included?
These questions are more useful than memorizing formal definitions.
Aggregate root responsibilities
The aggregate root usually:
- exposes the main behaviors
- protects invariants
- controls child object access
- is the unit loaded and saved through the repository
Example:
Order might expose:
- addItem(productId, qty, price)
- cancel()
- markPaid()
It may internally manage:
- item list
- total recomputation
- valid status transitions
Outside code should not casually reach inside and mutate items or status directly.
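A minimal Python sketch of such a root, under stated assumptions: the method names are snake_case versions of the ones listed above, and the status values (PLACED, PAID, CANCELLED) and exact transition rules are illustrative choices, not a prescribed design.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from enum import Enum


class OrderStatus(Enum):  # hypothetical status set
    PLACED = "placed"
    PAID = "paid"
    CANCELLED = "cancelled"


@dataclass(frozen=True)
class OrderItem:
    product_id: str  # the product is referenced by identity only
    qty: int
    price: Decimal


@dataclass
class Order:
    order_id: str
    status: OrderStatus = OrderStatus.PLACED
    _items: list = field(default_factory=list)

    @property
    def items(self) -> tuple:
        # Read-only copy: callers cannot append to the real list.
        return tuple(self._items)

    @property
    def total(self) -> Decimal:
        # Recomputed from the items, so total and items cannot drift apart.
        return sum((i.price * i.qty for i in self._items), Decimal("0"))

    def add_item(self, product_id: str, qty: int, price: Decimal) -> None:
        if self.status is OrderStatus.CANCELLED:
            raise ValueError("a cancelled order cannot accept new items")
        if qty <= 0:
            raise ValueError("item quantity must be positive")
        self._items.append(OrderItem(product_id, qty, price))

    def mark_paid(self) -> None:
        if self.status is not OrderStatus.PLACED:
            raise ValueError("only a placed order can be marked paid")
        self.status = OrderStatus.PAID

    def cancel(self) -> None:
        if self.status is OrderStatus.CANCELLED:
            raise ValueError("order is already cancelled")
        self.status = OrderStatus.CANCELLED
```

Every rule lives in one place: a caller that wants to change items or status has to go through a method that checks the invariant first.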
Example: order aggregate
Let us make this concrete.
Suppose the system supports:
- creating orders
- adding items
- cancelling before shipment
- recording payment
Good aggregate boundary
Aggregate root:
- Order
Inside aggregate:
- OrderItem
Referenced from outside by identity or service boundary:
- Customer
- Product
- PaymentGateway
Why this is good
Because:
- order and order items must stay consistent together
- product catalog has its own lifecycle
- customer is not owned by order
- payment integration is an external concern
This gives you a manageable boundary.
Example: school enrollment aggregate
Suppose:
- students enroll in courses
- enrollment has status and grade
A naive design may try:
- Student aggregate containing all enrollments and all courses forever
That is often too broad.
A better approach might be:
- Enrollment as an aggregate root
Why?
Because:
- enrollment has its own lifecycle
- status and grade change around that concept
- student and course can be referenced by identity
This example is useful because it shows that aggregate roots are not always the “largest obvious noun.”
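As a rough sketch, Enrollment as a root might look like the following. The status names and the complete/withdraw methods are assumptions made for illustration; the point is only that student and course appear as identifiers, not as loaded objects.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EnrollmentStatus(Enum):  # hypothetical status set
    ACTIVE = "active"
    COMPLETED = "completed"
    WITHDRAWN = "withdrawn"


@dataclass
class Enrollment:
    enrollment_id: str
    student_id: str  # reference by identity, no Student object loaded
    course_id: str   # reference by identity, no Course object loaded
    status: EnrollmentStatus = EnrollmentStatus.ACTIVE
    grade: Optional[str] = None

    def complete(self, grade: str) -> None:
        # A grade is only recorded together with the move to COMPLETED.
        if self.status is not EnrollmentStatus.ACTIVE:
            raise ValueError("only an active enrollment can be completed")
        self.grade = grade
        self.status = EnrollmentStatus.COMPLETED

    def withdraw(self) -> None:
        if self.status is not EnrollmentStatus.ACTIVE:
            raise ValueError("only an active enrollment can be withdrawn")
        self.status = EnrollmentStatus.WITHDRAWN
```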
Transaction boundaries: what they really mean
A transaction boundary defines what changes must succeed or fail together.
In relational databases, this usually means atomic commit or rollback.
Example:
- create order
- create order items
- mark order status as placed
If one fails, should the others remain?
Usually no.
So these should happen in one transaction.
That is a transaction boundary.
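The three steps above can be sketched with Python's built-in sqlite3 module. The table layout and the initial 'draft' status are assumptions for the example; the part that matters is that a failure in any step rolls back all of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE order_items (
        order_id   TEXT NOT NULL REFERENCES orders(id),
        product_id TEXT NOT NULL,
        qty        INTEGER NOT NULL CHECK (qty > 0)
    );
""")


def place_order(conn, order_id, items):
    """Insert the order, its items, and the final status as one unit."""
    # "with conn" commits if the block finishes and rolls back if it raises.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'draft')", (order_id,)
        )
        conn.executemany(
            "INSERT INTO order_items (order_id, product_id, qty) VALUES (?, ?, ?)",
            [(order_id, product_id, qty) for product_id, qty in items],
        )
        conn.execute(
            "UPDATE orders SET status = 'placed' WHERE id = ?", (order_id,)
        )


place_order(conn, "o-1", [("p-1", 2), ("p-2", 1)])

try:
    # The second item violates CHECK (qty > 0), so nothing from o-2 survives.
    place_order(conn, "o-2", [("p-1", 1), ("p-2", 0)])
except sqlite3.IntegrityError:
    pass
```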
How aggregates and transactions relate
This is the key connection:
Aggregates often define natural transaction boundaries.
Why?
Because the aggregate is the consistency boundary.
If:
- order status
- order items
- total
must remain consistent together, then updating them together in one transaction makes sense.
This is one reason small, focused aggregates are powerful.
They make transactional thinking cleaner.
Not every business process is one transaction
This is a crucial practical point.
A whole use case may span multiple steps and systems, but not all of it belongs in one database transaction.
Example checkout flow:
- create order
- reserve inventory
- charge payment
- send email
Should this all be one local database transaction?
Usually no.
Why?
Because:
- external systems are involved
- long-running transactions are risky
- failures may require compensation rather than rollback
So distinguish:
- aggregate transaction
- overall business workflow
This distinction is essential in real systems.
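One way to picture the difference is a workflow function that calls several independent steps and compensates when a later one fails. This is only a sketch: every step is a hypothetical callable supplied by the caller, and PaymentFailed is an invented exception type.

```python
class PaymentFailed(Exception):
    """Raised by the (hypothetical) payment step when the charge is refused."""


def run_checkout(order_id, *, create_order, reserve_inventory, charge_payment,
                 send_email, release_inventory, cancel_order):
    """Coordinate the checkout workflow; each step commits on its own."""
    create_order(order_id)        # local aggregate transaction
    reserve_inventory(order_id)   # separate transaction or separate service
    try:
        charge_payment(order_id)  # external system: our database cannot roll it back
    except PaymentFailed:
        # Compensation: undo the earlier steps explicitly, in reverse order.
        release_inventory(order_id)
        cancel_order(order_id)
        return "cancelled"
    send_email(order_id)          # side effect, can happen after the fact
    return "confirmed"
```

The function contains no database transaction at all. Each step owns its own boundary, and failure handling is written out as business logic rather than delegated to rollback.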
Strong consistency vs eventual consistency
When all related changes must be correct immediately, use strong consistency inside a transaction boundary.
Example:
- order cannot contain negative quantity
- order item must belong to existing order
- reservation check-out must be after check-in
When different parts of the system can converge shortly after, eventual consistency may be acceptable.
Example:
- order placed now, email sent a few seconds later
- order paid now, analytics updated later
- booking confirmed now, recommendation system refreshed later
This is often where aggregates stop and broader workflow coordination begins.
Repositories: what they are for
A repository is a persistence abstraction that gives domain or application code a clean way to load and save aggregates.
Typical responsibilities:
- retrieve aggregate roots
- persist aggregate roots
- hide storage details
- expose meaningful query methods
Examples:
- OrderRepository
- ReservationRepository
- EnrollmentRepository
The repository is not the domain model itself.
It is the bridge between domain logic and persistence infrastructure.
What a good repository looks like
A good repository usually:
- works with aggregate roots
- uses domain language
- exposes a small, meaningful API
- hides ORM and SQL details from domain logic
Example:
OrderRepository
findById(orderId)
save(order)
findPendingOrdersBefore(date)
That is clean because the methods reflect business-relevant access patterns.
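In Python, that interface could be sketched as a Protocol with an in-memory implementation, which is handy for tests. The Order fields, the "pending" status string, and the snake_case method names are assumptions for the example.

```python
import copy
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Protocol


@dataclass
class Order:  # trimmed-down stand-in for the real aggregate root
    order_id: str
    status: str
    placed_on: date


class OrderRepository(Protocol):
    def find_by_id(self, order_id: str) -> Optional[Order]: ...
    def save(self, order: Order) -> None: ...
    def find_pending_orders_before(self, cutoff: date) -> List[Order]: ...


class InMemoryOrderRepository:
    """Satisfies OrderRepository structurally; no database involved."""

    def __init__(self) -> None:
        self._orders: Dict[str, Order] = {}

    def find_by_id(self, order_id: str) -> Optional[Order]:
        stored = self._orders.get(order_id)
        # Hand out copies so changes only persist through save().
        return copy.deepcopy(stored) if stored is not None else None

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = copy.deepcopy(order)

    def find_pending_orders_before(self, cutoff: date) -> List[Order]:
        return [
            copy.deepcopy(o)
            for o in self._orders.values()
            if o.status == "pending" and o.placed_on < cutoff
        ]
```

Application code depends on OrderRepository only, so an ORM-backed or SQL-backed implementation can replace the in-memory one without touching domain logic.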
What a bad repository looks like
A bad repository often:
- mirrors tables mechanically
- exposes too many generic operations
- leaks SQL or ORM concepts everywhere
- exists for every tiny child entity without domain reason
Example smell:
OrderItemRepository
updateQuantityById(...)
setPriceById(...)
setOrderIdById(...)
Why is this suspicious?
Because it bypasses the aggregate root and weakens invariant protection.
Why repositories usually target aggregate roots
This is one of the most important practical rules.
If Order is the aggregate root, then:
- load the Order aggregate
- change it through root behavior
- save it through OrderRepository
This keeps:
- consistency localized
- invariants enforceable
- transaction boundaries clear
If instead random code directly updates OrderItem, OrderStatus, and payment flags separately, the design loses coherence quickly.
Repositories vs DAOs vs query services
These are related, but not identical.
Repository
Focus:
- domain-friendly access to aggregates
DAO or low-level data access object
Focus:
- lower-level database interaction
- row-oriented operations
Query service or read service
Focus:
- reporting
- projections
- read-optimized access
This distinction matters because not every read use case should force loading a full aggregate.
Example:
- admin dashboard showing order counts by status probably should not load full Order objects
That may be better served by:
- direct SQL
- projection query
- read model
This is an important maturity step in persistence design.
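For the dashboard case, a small query service using plain SQL is often enough. The sketch below uses sqlite3 and an assumed orders table; order_counts_by_status is a hypothetical name.

```python
import sqlite3


def order_counts_by_status(conn: sqlite3.Connection) -> dict:
    """Read-side query: returns {status: count} without building Order objects."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM orders GROUP BY status"
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    INSERT INTO orders VALUES
        ('o-1', 'placed'), ('o-2', 'placed'), ('o-3', 'paid');
""")
counts = order_counts_by_status(conn)
```

The database does the grouping, and the result is a plain dictionary shaped for the screen that needs it.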
ORMs: what they are good at
An ORM maps object structures to relational persistence.
It helps with:
- object-table mapping
- loading and saving entities
- handling foreign-key relationships
- reducing repetitive SQL for common cases
Used well, an ORM can improve productivity.
It is especially useful for:
- standard CRUD persistence
- aggregate loading and saving
- transaction integration
- identity tracking in unit-of-work patterns
ORMs: what they are bad at
An ORM is not a substitute for design thinking.
It does not automatically solve:
- aggregate boundaries
- query optimization
- transaction strategy
- domain modeling
- reporting complexity
Bad ORM usage often causes:
- giant object graphs loaded accidentally
- N+1 query problems
- anemic models driven by mapping convenience
- business logic leaking into persistence classes
So the rule is:
use the ORM as a tool, not as the architect
The ORM trap: letting mappings define the design
This is a common beginner mistake.
People start thinking:
- “If the ORM supports this relation, this must be my model.”
That is backward.
The correct order is:
- design domain and aggregate boundaries first
- then map them through the ORM
If you let the ORM drive the design, you often get:
- oversized aggregates
- lazy-loading surprises
- weak boundaries
- persistence-shaped domain objects
Lazy loading, eager loading, and why they matter
ORMs often support:
- lazy loading
- eager loading
Lazy loading
Related data loads only when accessed.
Good:
- can reduce unnecessary loading
Risk:
- hidden queries
- N+1 problems
- unpredictable performance
Eager loading
Related data loads up front.
Good:
- more predictable for known aggregate needs
Risk:
- over-fetching
- heavy object graphs
The right choice depends on:
- aggregate boundaries
- use case
- query patterns
This is why persistence design and performance thinking must be connected.
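The difference can be made visible without any ORM. The sketch below imitates both loading styles with raw sqlite3 queries and counts statements; CountingConnection and the two functions are invented for the illustration, and a real ORM would issue comparable queries behind its relationship accessors.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many statements reach the database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self.queries = 0

    def execute(self, sql: str, params: tuple = ()):
        self.queries += 1
        return self._conn.execute(sql, params)


def quantities_lazy(db) -> dict:
    # One query for the orders, then one more per order: the N+1 shape.
    totals = {}
    for (order_id,) in db.execute("SELECT id FROM orders").fetchall():
        rows = db.execute(
            "SELECT qty FROM order_items WHERE order_id = ?", (order_id,)
        ).fetchall()
        totals[order_id] = sum(qty for (qty,) in rows)
    return totals


def quantities_eager(db) -> dict:
    # One joined query loads orders and items together.
    rows = db.execute(
        "SELECT o.id, i.qty FROM orders o "
        "LEFT JOIN order_items i ON i.order_id = o.id"
    ).fetchall()
    totals = {}
    for order_id, qty in rows:
        # qty is None for an order without items (LEFT JOIN).
        totals[order_id] = totals.get(order_id, 0) + (qty or 0)
    return totals


raw = sqlite3.connect(":memory:")
raw.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY);
    CREATE TABLE order_items (order_id TEXT, qty INTEGER);
    INSERT INTO orders VALUES ('o-1'), ('o-2'), ('o-3');
    INSERT INTO order_items VALUES ('o-1', 2), ('o-1', 1), ('o-2', 5);
""")
lazy_db, eager_db = CountingConnection(raw), CountingConnection(raw)
```

With three orders, the lazy version sends four statements and the eager version sends one. The gap grows linearly with the number of orders.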
Persistence patterns you should know
You do not need every pattern, but these are important.
1. Repository pattern
Purpose:
- give domain/application logic clean persistence access
Best for:
- aggregate roots
2. Unit of work
Purpose:
- track changed objects
- commit them together in one transaction
Often this is partly provided by ORMs.
Useful because:
- one logical business operation may change multiple related objects
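A deliberately small unit-of-work sketch, assuming a single orders table and a hypothetical UnitOfWork class. ORMs typically detect changes automatically; here objects are registered by hand to keep the mechanism visible.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    status: str


class UnitOfWork:
    """Collects new and changed orders, then writes them in one transaction."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._new = {}
        self._dirty = {}

    def register_new(self, order: Order) -> None:
        self._new[order.order_id] = order

    def register_dirty(self, order: Order) -> None:
        # A new object is inserted with its latest state anyway.
        if order.order_id not in self._new:
            self._dirty[order.order_id] = order

    def commit(self) -> None:
        # "with conn" commits on success and rolls back if any write fails.
        with self._conn:
            self._conn.executemany(
                "INSERT INTO orders (id, status) VALUES (?, ?)",
                [(o.order_id, o.status) for o in self._new.values()],
            )
            self._conn.executemany(
                "UPDATE orders SET status = ? WHERE id = ?",
                [(o.status, o.order_id) for o in self._dirty.values()],
            )
        self._new.clear()
        self._dirty.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO orders VALUES ('o-1', 'placed')")
conn.commit()
```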
3. Data mapper
Purpose:
- separate in-memory objects from database mapping logic
This is a key pattern behind many ORMs.
4. Active record
Purpose:
- combine data and persistence methods in the same class
Example style:
- order.save()
- user.delete()
This can be simple and productive in smaller systems.
But for richer domain models it can blur boundaries between:
- domain behavior
- persistence concerns
5. CQRS-style separation for reads and writes
Purpose:
- use one model for transactional writes
- another for optimized reads
This is helpful when:
- read needs are very different from aggregate write needs
- dashboards and reports would otherwise distort domain repositories
Active Record vs Repository/Data Mapper
This is worth understanding clearly.
Active Record
Strengths:
- simple
- easy for CRUD-heavy apps
- low ceremony
Weaknesses:
- domain and persistence are tightly mixed
- scales poorly for rich domain logic
- encourages table-shaped thinking
Repository/Data Mapper style
Strengths:
- better separation of concerns
- stronger fit for rich object-oriented design
- more control over aggregate boundaries
Weaknesses:
- more design discipline required
- more abstraction to manage
Neither is “universally correct.”
For deeper object-oriented systems, repository/data-mapper style is often stronger.
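The two styles side by side, as a compact sketch over sqlite3. OrderRecord, Order, and OrderMapper are hypothetical names, and the module-level connection exists only to keep the example short.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")


# Active record style: the object carries its own persistence methods.
@dataclass
class OrderRecord:
    order_id: str
    status: str

    def save(self) -> None:
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO orders (id, status) VALUES (?, ?)",
                (self.order_id, self.status),
            )


# Data mapper style: the domain object knows nothing about storage.
@dataclass
class Order:
    order_id: str
    status: str

    def cancel(self) -> None:
        self.status = "cancelled"


class OrderMapper:
    """Moves Order objects to and from the orders table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def save(self, order: Order) -> None:
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO orders (id, status) VALUES (?, ?)",
                (order.order_id, order.status),
            )

    def find(self, order_id: str) -> Optional[Order]:
        row = self._conn.execute(
            "SELECT id, status FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        return Order(*row) if row else None
```

The SQL is identical in both halves. What changes is where it lives: inside the object that holds the business state, or in a separate class that can be swapped or tested independently.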
Designing repository methods from use cases
A useful practice:
Do not invent repository methods from tables.
Invent them from actual use cases.
Bad:
- findByCustomerIdAndStatusAndDateAndFlagAndType(...) everywhere by habit
Better:
- findPendingOrdersBefore(date)
- findActiveReservation(reservationId)
- save(order)
This keeps persistence API aligned with the application’s needs.
Example: checkout flow with proper boundaries
Suppose:
- customer checks out cart
- system creates order
- system charges payment
Aggregate
Order aggregate:
- Order
- OrderItem
Repository
OrderRepository
- load/save order aggregate
Transaction boundary
One local transaction may cover:
- insert order
- insert order items
- set initial status
External workflow
Payment charge may happen:
- before final confirmation
- after order creation
- with compensation logic if it fails
Why this matters
This avoids pretending the entire distributed process is one simple database save.
That is realistic system design.
Example: reservation system
Suppose:
- guest makes reservation
- room availability must be protected
- payment is captured later
Possible aggregate
Reservation aggregate:
- reservation details
- status
- reserved room reference
- date range
Possible transaction
One transaction may:
- create reservation
- mark the slot as reserved, or decrement a counter in the availability model
Outside transaction
Later workflow may:
- capture payment
- send confirmation email
- notify reporting system
This is a good example of using transactions for strong consistency where needed, not for every side effect.
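A sketch of the transactional part, assuming the "decrement availability" variant and a simplified schema with one row per room type and night (a real system would handle a date range). The table names, the 'confirmed' status, and NoAvailability are invented for the example.

```python
import sqlite3


class NoAvailability(Exception):
    """Raised when no room of the requested type is left for that night."""


def reserve(conn, reservation_id, room_type, night):
    """Decrement availability and create the reservation atomically."""
    with conn:  # commit on success, roll back if anything raises
        cur = conn.execute(
            "UPDATE availability SET rooms_left = rooms_left - 1 "
            "WHERE room_type = ? AND night = ? AND rooms_left > 0",
            (room_type, night),
        )
        if cur.rowcount == 0:
            raise NoAvailability(f"{room_type} on {night}")
        conn.execute(
            "INSERT INTO reservations (id, room_type, night, status) "
            "VALUES (?, ?, ?, 'confirmed')",
            (reservation_id, room_type, night),
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE availability (
        room_type  TEXT,
        night      TEXT,
        rooms_left INTEGER NOT NULL,
        PRIMARY KEY (room_type, night)
    );
    CREATE TABLE reservations (
        id TEXT PRIMARY KEY, room_type TEXT, night TEXT, status TEXT
    );
    INSERT INTO availability VALUES ('double', '2030-01-01', 1);
""")
```

Payment capture, the confirmation email, and reporting are absent on purpose: they run later, outside this transaction.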
Example: when not to use repository-only thinking for reads
Suppose the business wants:
- monthly revenue dashboard
- orders grouped by region and payment type
- top products by week
Should you satisfy this by loading many Order aggregates through OrderRepository?
Usually no.
That is not aggregate-oriented domain behavior.
This is a read/reporting problem.
Better options:
- SQL query service
- reporting view
- projection table
- analytics pipeline
This is where mature persistence design avoids forcing one abstraction onto every problem.
How to decide what loads together
Ask:
- what does this use case need immediately?
- what data participates in the invariant?
- what data is only reference information?
- what would be too expensive to load routinely?
Example:
For Order:
- load order and items together if order logic depends on them
- do not always load full customer history or product catalog snapshots unless needed
Good loading decisions come from aggregate boundaries plus use-case needs.
Concurrency and transaction thinking
When multiple users or processes may update the same data, persistence design must consider concurrency.
Questions to ask:
- what if two users modify the same order?
- what if two requests try to reserve the last room?
- what if inventory is nearly exhausted?
This affects:
- locking strategy
- optimistic or pessimistic concurrency
- transaction isolation needs
- aggregate boundary design
You do not need advanced database theory for every system, but you should account for consistency under concurrent updates where it matters.
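Optimistic concurrency is often implemented with a version column. The following is a minimal sketch using sqlite3; the version column, StaleOrderError, and the function names are assumptions for the example.

```python
import sqlite3


class StaleOrderError(Exception):
    """Raised when the order changed since the caller last read it."""


def load(conn, order_id):
    """Return (status, version) for the order."""
    return conn.execute(
        "SELECT status, version FROM orders WHERE id = ?", (order_id,)
    ).fetchone()


def change_status(conn, order_id, new_status, expected_version):
    with conn:
        cur = conn.execute(
            "UPDATE orders SET status = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_status, order_id, expected_version),
        )
        # No row matched: someone else bumped the version first.
        if cur.rowcount != 1:
            raise StaleOrderError(order_id)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO orders VALUES ('o-1', 'placed', 1)")
conn.commit()
```

If two users load the order at version 1, the first update succeeds and moves it to version 2. The second update matches no row, so the caller can reload and retry instead of silently overwriting the first change.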
Practical design flow from scratch
Here is a real workflow you can use.
Step 1: design the domain model
Identify:
- entities
- value objects
- behaviors
- invariants
Step 2: identify aggregates
Ask:
- what must stay consistent together?
- what root should govern each cluster?
Step 3: define transaction boundaries
Ask:
- what changes must commit atomically?
- what side effects can happen outside the transaction?
Step 4: define repositories around aggregate roots
Ask:
- what aggregates need loading and saving?
- what repository API matches real use cases?
Step 5: choose ORM usage deliberately
Ask:
- which mappings are straightforward?
- where do I need custom queries?
- where should I avoid loading full aggregates?
Step 6: separate write model from read model when necessary
Ask:
- are reporting and dashboard queries distorting my aggregate design?
If yes, introduce specialized read pathways.
This flow is practical and scalable.
Common mistakes
1. Giant aggregates
This causes:
- slow loading
- broad transactions
- heavy coupling
2. Tiny aggregates that ignore invariants
This causes:
- rules scattered across services
- weak consistency
3. Repositories for every child object
This often bypasses aggregate-root control.
4. Treating the ORM as the architecture
This leads to poor boundaries and persistence-driven design.
5. Using repositories for analytics-style reads
This often loads too much and mixes concerns.
6. Making one entire business workflow a single transaction
This becomes unrealistic once external systems are involved.
7. Ignoring concurrency
This works until real traffic and race conditions appear.
How these concepts fit together
Aggregates define consistency boundaries in the domain, with an aggregate root enforcing invariants over closely related objects. Those boundaries usually shape transaction boundaries, because changes inside one aggregate often need atomic persistence. Repositories should usually load and save aggregate roots rather than arbitrary child objects, so that domain rules stay protected. ORMs are useful for mapping aggregates to relational storage, but they should not define the domain model or repository design. For reads, especially dashboards or reports, separate query models or direct query services may be better than forcing everything through aggregate repositories.
Bottom line
If you are designing from scratch:
- define aggregates around true consistency needs
- keep aggregate roots in control of child state
- use transactions for what must be atomic, not for entire distributed workflows
- design repositories around aggregate roots and real use cases
- use ORMs as mapping tools, not as design engines
- separate read-heavy reporting concerns from write-side domain consistency when needed
When these pieces fit together, persistence stops being a random technical layer and becomes a coherent extension of your object-oriented design.