5. Aggregates, Transactions, Repositories, and ORMs

Why these topics belong together

Once you understand:

object-oriented design
database schema design
how domain objects and tables relate

the next real question is:

How do I make persistence work cleanly in an actual system?

That is where these topics matter:

aggregates
transaction boundaries
repositories
ORMs
persistence patterns

They are all connected.

You cannot design good repositories if you do not understand aggregates.

You cannot define sound transaction boundaries if you do not understand consistency boundaries.

You cannot use an ORM well if you do not understand what should and should not be loaded, saved, or exposed.

So this note treats them as one connected topic.

The core idea

When building a real system, you need answers to these questions:

what group of objects should be treated as one unit of consistency?
what should load and save together?
what changes must happen atomically?
how should domain code access persistence?
what should the ORM do, and what should it not do?

Those questions are at the center of practical system design.

First: what an aggregate is

An aggregate is a cluster of related domain objects that should be treated as one consistency boundary.

Inside an aggregate:

objects are closely related
invariants are enforced together
updates are coordinated through one main object

That main object is called the aggregate root.

Examples:

Order with OrderItems
Cart with CartItems
Invoice with InvoiceLines
Reservation with its status, date range, and the data its booking rules depend on

The aggregate root is the object through which outside code interacts with the aggregate.

Example:

outside code should modify OrderItems through Order
not by mutating OrderItem independently from anywhere

This is one of the most important ideas in domain-driven and object-oriented design.

Why aggregates matter

Aggregates solve several real problems:

1. They protect invariants

Example:

order total must match its items
order item quantity must be positive
a cancelled order cannot accept new items

If all related state is governed through one aggregate boundary, it becomes much easier to protect these rules.

2. They define transaction boundaries

Usually, what must remain strongly consistent together belongs in one aggregate, and a change to that aggregate is committed in one transaction.

3. They reduce uncontrolled coupling

Without aggregates, any part of the system may mutate any related object directly.

That often produces:

inconsistent state
unclear ownership
fragile update logic

4. They shape repository design

Repositories usually work at the aggregate-root level.

Example:

OrderRepository
not usually OrderItemRepository for arbitrary business writes

What belongs inside one aggregate

A practical rule:

Put objects in one aggregate when:

they are tightly related
they must stay consistent together
one root can naturally govern their lifecycle
they are commonly created, changed, and saved together

Example:

Order aggregate:

Order
OrderItem
maybe shipping address snapshot

Why?

Because:

order items belong to one order
they have no strong independent business life outside it
many core rules apply across them together

What should not be in the same aggregate

Do not put everything related in one aggregate.

That is a common mistake.

Example:

Should Order contain full Customer, PaymentProvider, Warehouse, ProductCatalog, and ShipmentHistory objects directly as one giant aggregate?

Usually no.

Why not?

Because:

they have separate lifecycles
they change independently
loading them all together is expensive
one huge transaction boundary is hard to scale and reason about

Strong design depends on resisting the urge to make aggregates too large.

A practical aggregate heuristic

When deciding aggregate boundaries, ask:

What must always be consistent immediately?
What changes together in one use case?
What object naturally owns the rule?
What can be referenced by identity instead of loaded fully?
What would become too expensive or too coupled if included?

These questions are more useful than memorizing formal definitions.

Aggregate root responsibilities

The aggregate root usually:

exposes the main behaviors
protects invariants
controls child object access
is the unit loaded and saved through the repository

Example:

Order might expose:

addItem(productId, qty, price)
cancel()
markPaid()

It may internally manage:

item list
total recomputation
valid status transitions

Outside code should not casually reach inside and mutate items or status directly.
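A minimal sketch of such a root follows, assuming Python and snake_case versions of the method names above. The status values and the exact transition rules (only a placed order can be cancelled or paid) are assumptions made for illustration, not rules stated in this note.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from enum import Enum


class OrderStatus(Enum):
    # Assumed status set; a real system will likely have more states.
    PLACED = "placed"
    PAID = "paid"
    CANCELLED = "cancelled"


@dataclass(frozen=True)
class OrderItem:
    product_id: str  # the product is referenced by identity, not embedded
    qty: int
    price: Decimal


@dataclass
class Order:
    order_id: str
    status: OrderStatus = OrderStatus.PLACED
    _items: list = field(default_factory=list)

    @property
    def items(self):
        # Hand out an immutable copy so callers cannot edit the list directly.
        return tuple(self._items)

    @property
    def total(self):
        # The total is derived from the items, so it cannot drift out of sync.
        return sum((item.price * item.qty for item in self._items), Decimal("0"))

    def add_item(self, product_id, qty, price):
        if self.status is OrderStatus.CANCELLED:
            raise ValueError("a cancelled order cannot accept new items")
        if qty <= 0:
            raise ValueError("order item quantity must be positive")
        self._items.append(OrderItem(product_id, qty, price))

    def cancel(self):
        # Assumed rule: only a placed order may be cancelled.
        if self.status is not OrderStatus.PLACED:
            raise ValueError("only a placed order can be cancelled")
        self.status = OrderStatus.CANCELLED

    def mark_paid(self):
        # Assumed rule: only a placed order may be marked as paid.
        if self.status is not OrderStatus.PLACED:
            raise ValueError("only a placed order can be marked paid")
        self.status = OrderStatus.PAID
```

Every rule lives in one place, so a caller that tries to add an item to a cancelled order gets an error instead of silently corrupting state.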

Example: order aggregate

Let us make this concrete.

Suppose the system supports:

creating orders
adding items
cancelling before shipment
recording payment

Good aggregate boundary

Aggregate root:

Order

Inside aggregate:

OrderItem

Referenced from outside by identity or service boundary:

Customer
Product
PaymentGateway

Why this is good

Because:

order and order items must stay consistent together
product catalog has its own lifecycle
customer is not owned by order
payment integration is an external concern

This gives you a manageable boundary.

Example: school enrollment aggregate

Suppose:

students enroll in courses
enrollment has status and grade

A naive design may try:

Student aggregate containing all enrollments and all courses forever

That is often too broad.

A better approach might be:

Enrollment as an aggregate root

Why?

Because:

enrollment has its own lifecycle
status and grade change around that concept
student and course can be referenced by identity

This example is useful because it shows that aggregate roots are not always the “largest obvious noun.”
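As a rough sketch, an Enrollment root that refers to the student and the course by identity could look like this. The status names and the rule that only an active enrollment can be graded or withdrawn are hypothetical, chosen only to show where such rules would live.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Enrollment:
    enrollment_id: str
    student_id: str  # reference by identity, not a full Student object
    course_id: str   # reference by identity, not a full Course object
    status: str = "active"  # assumed states: active, completed, withdrawn
    grade: Optional[str] = None

    def record_grade(self, grade: str) -> None:
        # Assumed rule: a grade can only be recorded while the enrollment is active.
        if self.status != "active":
            raise ValueError("only an active enrollment can be graded")
        self.grade = grade
        self.status = "completed"

    def withdraw(self) -> None:
        # Assumed rule: a completed or withdrawn enrollment cannot be withdrawn.
        if self.status != "active":
            raise ValueError("only an active enrollment can be withdrawn")
        self.status = "withdrawn"
```

Loading one Enrollment never pulls in the whole Student or Course, which keeps the boundary small.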

Transaction boundaries: what they really mean

A transaction boundary defines what changes must succeed or fail together.

In relational databases, this usually means atomic commit or rollback.

Example:

create order
create order items
mark order status as placed

If one fails, should the others remain?

Usually no.

So these should happen in one transaction.

That is a transaction boundary.
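The three steps above can be sketched with Python's built-in sqlite3 module. The table layout, the column names, and the interim 'new' status are assumptions for illustration; the point is only that all three writes commit together or not at all.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema for the example.
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE order_items ("
        " order_id TEXT NOT NULL REFERENCES orders(id),"
        " product_id TEXT NOT NULL,"
        " qty INTEGER NOT NULL CHECK (qty > 0))"
    )


def place_order(conn, order_id, items):
    """items is a list of (product_id, qty) pairs."""
    # Using the connection as a context manager commits if the block
    # succeeds and rolls back if any statement raises.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'new')", (order_id,))
        conn.executemany(
            "INSERT INTO order_items (order_id, product_id, qty) VALUES (?, ?, ?)",
            [(order_id, product_id, qty) for product_id, qty in items],
        )
        conn.execute("UPDATE orders SET status = 'placed' WHERE id = ?", (order_id,))
```

If one item violates the quantity check, the order row and any items already inserted in the same call are rolled back with it.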

How aggregates and transactions relate

This is the key connection:

Aggregates often define natural transaction boundaries.

Why?

Because the aggregate is the consistency boundary.

If:

order status
order items
total

must remain consistent together, then updating them together in one transaction makes sense.

This is one reason small, focused aggregates are powerful.

They make transactional thinking cleaner.

Not every business process is one transaction

This is a crucial practical point.

A whole use case may span multiple steps and systems, but not all of it belongs in one database transaction.

Example checkout flow:

create order
reserve inventory
charge payment
send email

Should this all be one local database transaction?

Usually no.

Why?

Because:

external systems are involved
long-running transactions are risky
failures may require compensation rather than rollback

So distinguish:

aggregate transaction
overall business workflow

This distinction is essential in real systems.
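One way to picture the difference is a workflow function that commits the order first and then compensates if the external payment step fails. Everything here is hypothetical: the dictionary stands in for the order store, and charge_payment stands in for whatever payment client the system actually uses.

```python
class PaymentDeclined(Exception):
    """Raised by the (hypothetical) payment client when a charge fails."""


def checkout(order_store, charge_payment, order_id, amount):
    # Step 1: the local aggregate transaction. The order is saved as placed.
    order_store[order_id] = "placed"

    # Step 2: the external call. It cannot be rolled back by the database,
    # so failure is handled by a compensating action instead.
    try:
        charge_payment(order_id, amount)
    except PaymentDeclined:
        order_store[order_id] = "cancelled"  # compensation, not rollback
        return "cancelled"

    # Step 3: a second, separate local change once payment succeeded.
    order_store[order_id] = "paid"
    return "paid"
```

The order exists in a visible intermediate state between the steps, which is exactly what a single database transaction would have hidden.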

Strong consistency vs eventual consistency

When all related changes must be correct immediately, use strong consistency inside a transaction boundary.

Example:

order cannot contain negative quantity
order item must belong to existing order
reservation check-out must be after check-in

When different parts of the system can converge shortly after, eventual consistency may be acceptable.

Example:

order placed now, email sent a few seconds later
order paid now, analytics updated later
booking confirmed now, recommendation system refreshed later

This is often where aggregates stop and broader workflow coordination begins.

Repositories: what they are for

A repository is a persistence abstraction that gives domain or application code a clean way to load and save aggregates.

Typical responsibilities:

retrieve aggregate roots
persist aggregate roots
hide storage details
expose meaningful query methods

Examples:

OrderRepository
ReservationRepository
EnrollmentRepository

The repository is not the domain model itself.

It is the bridge between domain logic and persistence infrastructure.

What a good repository looks like

A good repository usually:

works with aggregate roots
uses domain language
exposes a small, meaningful API
hides ORM and SQL details from domain logic

Example:

OrderRepository
  findById(orderId)
  save(order)
  findPendingOrdersBefore(date)

That is clean because the methods reflect business-relevant access patterns.
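A sketch of that interface in Python, with an in-memory implementation that is handy in tests, might look like the following. The Order fields and the "pending" status string are assumptions for the example; a real repository would load the full aggregate.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Protocol


@dataclass
class Order:
    # Deliberately minimal stand-in for the real aggregate root.
    order_id: str
    status: str
    placed_on: date


class OrderRepository(Protocol):
    """The API the domain and application code depend on."""

    def find_by_id(self, order_id: str) -> Optional[Order]: ...
    def save(self, order: Order) -> None: ...
    def find_pending_orders_before(self, cutoff: date) -> list: ...


class InMemoryOrderRepository:
    """One possible implementation; a SQL or ORM-backed one has the same shape."""

    def __init__(self) -> None:
        self._orders = {}

    def find_by_id(self, order_id):
        return self._orders.get(order_id)

    def save(self, order):
        self._orders[order.order_id] = order

    def find_pending_orders_before(self, cutoff):
        return [
            order
            for order in self._orders.values()
            if order.status == "pending" and order.placed_on < cutoff
        ]
```

Callers see only domain terms; whether the data sits in a dictionary, SQL tables, or behind an ORM session is invisible to them.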

What a bad repository looks like

A bad repository often:

mirrors tables mechanically
exposes too many generic operations
leaks SQL or ORM concepts everywhere
exists for every tiny child entity without domain reason

Example smell:

OrderItemRepository
  updateQuantityById(...)
  setPriceById(...)
  setOrderIdById(...)

Why is this suspicious?

Because it bypasses the aggregate root and weakens invariant protection.

Why repositories usually target aggregate roots

This is one of the most important practical rules.

If Order is the aggregate root, then:

load the Order aggregate
change it through root behavior
save it through OrderRepository

This keeps:

consistency localized
invariants enforceable
transaction boundaries clear

If instead random code directly updates OrderItem, OrderStatus, and payment flags separately, the design loses coherence quickly.

Repositories vs DAOs vs query services

These are related, but not identical.

Repository

Focus:

domain-friendly access to aggregates

DAO (data access object)

Focus:

lower-level database interaction
row-oriented operations

Query service or read service

Focus:

reporting
projections
read-optimized access

This distinction matters because not every read use case should force loading a full aggregate.

Example:

admin dashboard showing order counts by status probably should not load full Order objects

That may be better served by:

direct SQL
projection query
read model

This is an important maturity step in persistence design.
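For the dashboard example, a small read-side function can return the counts straight from SQL without building any Order objects. This sketch uses sqlite3 and assumes an orders table with a status column.

```python
import sqlite3


def order_counts_by_status(conn: sqlite3.Connection) -> dict:
    """Read-side query: returns {status: count} without loading aggregates."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM orders GROUP BY status"
    ).fetchall()
    # Each row is a (status, count) pair, so dict() builds the mapping directly.
    return dict(rows)
```

The database does the grouping, and the result is a plain projection rather than a domain object.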

ORMs: what they are good at

An ORM maps object structures to relational persistence.

It helps with:

object-table mapping
loading and saving entities
handling foreign-key relationships
reducing repetitive SQL for common cases

Used well, an ORM can improve productivity.

It is especially useful for:

standard CRUD persistence
aggregate loading and saving
transaction integration
identity tracking in unit-of-work patterns

ORMs: what they are bad at

An ORM is not a substitute for design thinking.

It does not automatically solve:

aggregate boundaries
query optimization
transaction strategy
domain modeling
reporting complexity

Bad ORM usage often causes:

giant object graphs loaded accidentally
N+1 query problems
anemic models driven by mapping convenience
business logic leaking into persistence classes

So the rule is:

use the ORM as a tool, not as the architect

The ORM trap: letting mappings define the design

This is a common beginner mistake.

People start thinking:

“If the ORM supports this relation, this must be my model.”

That is backward.

The correct order is:

design domain and aggregate boundaries first
then map them through the ORM

If you let the ORM drive the design, you often get:

oversized aggregates
lazy-loading surprises
weak boundaries
persistence-shaped domain objects

Lazy loading, eager loading, and why they matter

ORMs often support:

lazy loading
eager loading

Lazy loading

Related data loads only when accessed.

Good:

can reduce unnecessary loading

Risk:

hidden queries
N+1 problems
unpredictable performance

Eager loading

Related data loads up front.

Good:

more predictable for known aggregate needs

Risk:

over-fetching
heavy object graphs

The right choice depends on:

aggregate boundaries
use case
query patterns

This is why persistence design and performance thinking must be connected.
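The N+1 shape is easier to see with the queries written out by hand. The sketch below uses plain sqlite3 rather than any particular ORM, with a hypothetical CountingDb wrapper that counts statements: the first loader mimics what naive lazy loading does under the hood, the second mimics an eager join.

```python
import sqlite3


def make_connection():
    # Hypothetical sample data: three orders with a few items.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY);
        CREATE TABLE order_items (order_id INTEGER, product_id TEXT, qty INTEGER);
        INSERT INTO orders (id) VALUES (1), (2), (3);
        INSERT INTO order_items VALUES (1, 'a', 1), (1, 'b', 2), (2, 'a', 5), (3, 'c', 1);
        """
    )
    return conn


class CountingDb:
    """Wraps a connection and counts how many queries are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def run(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def load_orders_n_plus_one(db):
    # One query for the orders, then one more query per order.
    orders = db.run("SELECT id FROM orders ORDER BY id")
    return {
        order_id: db.run(
            "SELECT product_id, qty FROM order_items"
            " WHERE order_id = ? ORDER BY product_id",
            (order_id,),
        )
        for (order_id,) in orders
    }


def load_orders_joined(db):
    # A single query that fetches orders and their items together.
    rows = db.run(
        "SELECT o.id, i.product_id, i.qty FROM orders o"
        " LEFT JOIN order_items i ON i.order_id = o.id"
        " ORDER BY o.id, i.product_id"
    )
    result = {}
    for order_id, product_id, qty in rows:
        items = result.setdefault(order_id, [])
        if product_id is not None:  # orders without items yield a NULL row
            items.append((product_id, qty))
    return result
```

Both loaders return the same data; with three orders the first issues four queries and the second issues one, and the gap grows with the number of orders.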

Persistence patterns you should know

You do not need every pattern, but these are important.

1. Repository pattern

Purpose:

give domain/application logic clean persistence access

Best for:

aggregate roots

2. Unit of work

Purpose:

track changed objects
commit them together in one transaction

Often this is partly provided by ORMs.

Useful because:

one logical business operation may change multiple related objects
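A deliberately small sketch of the idea, assuming sqlite3 and an orders table with id and status columns: changes are registered in memory and then written in one transaction. Real ORM sessions track far more than this; the class and method names here are hypothetical.

```python
import sqlite3


class UnitOfWork:
    """Collects pending status changes and commits them together."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._dirty = {}  # order_id -> new status

    def register_status_change(self, order_id: str, status: str) -> None:
        self._dirty[order_id] = status

    def commit(self) -> None:
        # All registered changes succeed together or are rolled back together.
        with self._conn:
            for order_id, status in self._dirty.items():
                cursor = self._conn.execute(
                    "UPDATE orders SET status = ? WHERE id = ?",
                    (status, order_id),
                )
                if cursor.rowcount != 1:
                    raise LookupError(f"order {order_id} not found")
        self._dirty.clear()
```

If any registered change cannot be applied, the exception leaves the `with` block and the earlier updates in the same commit are rolled back.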

3. Data mapper

Purpose:

separate in-memory objects from database mapping logic

This is a key pattern behind many ORMs.
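In miniature, the pattern looks like this: the domain class knows nothing about storage, and a separate mapper translates between objects and rows. The sketch assumes sqlite3 and an orders table with id and status columns; OrderMapper is a hypothetical name.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    # Plain domain object: no SQL, no connection, no save() method.
    order_id: str
    status: str


class OrderMapper:
    """Owns all knowledge of how an Order is stored in the orders table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def insert(self, order: Order) -> None:
        with self._conn:
            self._conn.execute(
                "INSERT INTO orders (id, status) VALUES (?, ?)",
                (order.order_id, order.status),
            )

    def find(self, order_id: str) -> Optional[Order]:
        row = self._conn.execute(
            "SELECT id, status FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        return Order(order_id=row[0], status=row[1]) if row else None
```

Changing a column name or the storage engine touches only the mapper, not the domain class.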

4. Active record

Purpose:

combine data and persistence methods in the same class

Example style:

order.save()
user.delete()

This can be simple and productive in smaller systems.

But for richer domain models it can blur boundaries between:

domain behavior
persistence concerns

5. CQRS-style separation for reads and writes

Purpose:

use one model for transactional writes
another for optimized reads

This is helpful when:

read needs are very different from aggregate write needs
dashboards and reports would otherwise distort domain repositories

Active Record vs Repository/Data Mapper

This is worth understanding clearly.

Active Record

Strengths:

simple
easy for CRUD-heavy apps
low ceremony

Weaknesses:

domain and persistence are tightly mixed
scales poorly for rich domain logic
encourages table-shaped thinking

Repository/Data Mapper style

Strengths:

better separation of concerns
stronger fit for rich OOD
more control over aggregate boundaries

Weaknesses:

more design discipline required
more abstraction to manage

Neither is “universally correct.”

For deeper object-oriented systems, repository/data-mapper style is often stronger.

Designing repository methods from use cases

A useful practice:

Do not invent repository methods from tables.

Invent them from actual use cases.

Bad:

findByCustomerIdAndStatusAndDateAndFlagAndType(...), added out of habit because the columns exist

Better:

findPendingOrdersBefore(date)
findActiveReservation(reservationId)
save(order)

This keeps the persistence API aligned with the application’s needs.

Example: checkout flow with proper boundaries

Suppose:

customer checks out cart
system creates order
system charges payment

Aggregate

Order aggregate:

Order
OrderItem

Repository

OrderRepository

load/save order aggregate

Transaction boundary

One local transaction may cover:

insert order
insert order items
set initial status

External workflow

Payment charge may happen:

before final confirmation
after order creation
with compensation logic if it fails

Why this matters

This avoids pretending the entire distributed process is one simple database save.

That is realistic system design.

Example: reservation system

Suppose:

guest makes reservation
room availability must be protected
payment is captured later

Possible aggregate

Reservation aggregate:

reservation details
status
reserved room reference
date range

Possible transaction

One transaction may:

create reservation
mark the slot as reserved, or decrement a count in the availability model

Outside transaction

Later workflow may:

capture payment
send confirmation email
notify reporting system

This is a good example of using transactions for strong consistency where needed, not for every side effect.
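A possible shape for that one transaction, using sqlite3 and an assumed availability table that keeps a rooms_left count per room type and night: the conditional UPDATE protects the last room, and the reservation row is written in the same transaction. The table and column names are hypothetical.

```python
import sqlite3


class NoAvailability(Exception):
    """Raised when no room is left for the requested type and night."""


def create_schema(conn):
    conn.execute(
        "CREATE TABLE availability ("
        " room_type TEXT NOT NULL, night TEXT NOT NULL,"
        " rooms_left INTEGER NOT NULL,"
        " PRIMARY KEY (room_type, night))"
    )
    conn.execute(
        "CREATE TABLE reservations ("
        " id TEXT PRIMARY KEY, room_type TEXT NOT NULL,"
        " night TEXT NOT NULL, status TEXT NOT NULL)"
    )


def reserve(conn, reservation_id, room_type, night):
    with conn:  # commit on success, roll back on any exception
        # The decrement only matches a row while rooms are still left.
        cursor = conn.execute(
            "UPDATE availability SET rooms_left = rooms_left - 1"
            " WHERE room_type = ? AND night = ? AND rooms_left > 0",
            (room_type, night),
        )
        if cursor.rowcount == 0:
            raise NoAvailability(f"no {room_type} room left on {night}")
        conn.execute(
            "INSERT INTO reservations (id, room_type, night, status)"
            " VALUES (?, ?, ?, 'confirmed')",
            (reservation_id, room_type, night),
        )
```

Payment capture, email, and reporting are left out on purpose; they belong to the later workflow, not to this transaction.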

Example: when not to use repository-only thinking for reads

Suppose the business wants:

monthly revenue dashboard
orders grouped by region and payment type
top products by week

Should you satisfy this by loading many Order aggregates through OrderRepository?

Usually no.

That is not aggregate-oriented domain behavior.

This is a read/reporting problem.

Better options:

SQL query service
reporting view
projection table
analytics pipeline

This is where mature persistence design avoids forcing one abstraction onto every problem.

How to decide what loads together

Ask:

what does this use case need immediately?
what data participates in the invariant?
what data is only reference information?
what would be too expensive to load routinely?

Example:

For Order:

load order and items together if order logic depends on them
do not always load full customer history or product catalog snapshots unless needed

Good loading decisions come from aggregate boundaries plus use-case needs.

Concurrency and transaction thinking

When multiple users or processes may update the same data, persistence design must consider concurrency.

Questions to ask:

what if two users modify the same order?
what if two requests try to reserve the last room?
what if inventory is nearly exhausted?

This affects:

locking strategy
optimistic or pessimistic concurrency control
transaction isolation needs
aggregate boundary design

You do not need advanced database theory for every system, but you should account for consistency under concurrent updates where it matters.
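For the "two users modify the same order" case, one common option is optimistic concurrency control with a version column. The sketch below assumes sqlite3 and an orders table with id, status, and version columns; StaleOrderError is a hypothetical name.

```python
import sqlite3


class StaleOrderError(Exception):
    """Raised when the order changed since the caller loaded it."""


def update_status(conn, order_id, expected_version, new_status):
    with conn:
        # The WHERE clause only matches if nobody else bumped the version
        # after this caller read the order.
        cursor = conn.execute(
            "UPDATE orders SET status = ?, version = version + 1"
            " WHERE id = ? AND version = ?",
            (new_status, order_id, expected_version),
        )
        if cursor.rowcount == 0:
            raise StaleOrderError(
                f"order {order_id} is missing or no longer at version {expected_version}"
            )
```

The second of two competing writers fails loudly and can reload and retry, instead of silently overwriting the first writer's change.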

Practical design flow from scratch

Here is a real workflow you can use.

Step 1: design the domain model

Identify:

entities
value objects
behaviors
invariants

Step 2: identify aggregates

Ask:

what must stay consistent together?
what root should govern each cluster?

Step 3: define transaction boundaries

Ask:

what changes must commit atomically?
what side effects can happen outside the transaction?

Step 4: define repositories around aggregate roots

Ask:

what aggregates need loading and saving?
what repository API matches real use cases?

Step 5: choose ORM usage deliberately

Ask:

which mappings are straightforward?
where do I need custom queries?
where should I avoid loading full aggregates?

Step 6: separate write model from read model when necessary

Ask:

are reporting and dashboard queries distorting my aggregate design?

If yes, introduce specialized read pathways.

This flow is practical and scalable.

Common mistakes

1. Giant aggregates

This causes:

slow loading
broad transactions
heavy coupling

2. Tiny aggregates that ignore invariants

This causes:

rules scattered across services
weak consistency

3. Repositories for every child object

This often bypasses aggregate-root control.

4. Treating the ORM as the architecture

This leads to poor boundaries and persistence-driven design.

5. Using repositories for analytics-style reads

This often loads too much and mixes concerns.

6. Making one entire business workflow a single transaction

This becomes unrealistic once external systems are involved.

7. Ignoring concurrency

This works until real traffic exposes race conditions.

How these concepts fit together

Aggregates define consistency boundaries in the domain, with an aggregate root enforcing invariants over closely related objects. Those boundaries usually shape transaction boundaries, because changes inside one aggregate often need atomic persistence. Repositories should usually load and save aggregate roots rather than arbitrary child objects, so that domain rules stay protected. ORMs are useful for mapping aggregates to relational storage, but they should not define the domain model or repository design. For reads, especially dashboards or reports, separate query models or direct query services may be better than forcing everything through aggregate repositories.

Bottom line

If you are designing from scratch:

define aggregates around true consistency needs
keep aggregate roots in control of child state
use transactions for what must be atomic, not for entire distributed workflows
design repositories around aggregate roots and real use cases
use ORMs as mapping tools, not as design engines
separate read-heavy reporting concerns from write-side domain consistency when needed

When these pieces fit together, persistence stops being a random technical layer and becomes a coherent extension of your object-oriented design.
