Simple Means Less, Not Easy
Written by Sean on October 28, 2024
I'm surprised how often developers introduce complexity when they are seemingly attempting to make some tasks easier. It's easy to forget that simple and easy are not the same thing.
Simple is less code. Even if it occasionally requires a little more work.
I recently worked on a project where a developer wanted user ids on every join table.
The purpose was to make certain queries easier, by not needing joins. Here is the relevant portion of the schema
Simple is less code. Even if it occasionally requires a little more work.
I recently worked on a project where a developer wanted user ids on every join table.
The purpose was to make certain queries easier, by not needing joins. Here is the relevant portion of the schema
create table users ( id integer, name text, ); create table owners ( user_id integer, name text ); create table dogs ( user_id integer, name text ); create table dog_owners ( owner_id integer, dog_id integer, is_point_of_contact boolean default false );
The request was to add a user_id to the join table dog_owners, so you could run this.
update dog_owners set is_point_of_contact = true where user_id = 1
On the face of it, this doesn't seem like too much added complexity.
It's not until you realize that every write now needs to align the redundant data between the tables. It introduces data integrity issues, especially since you have to add the ids at the application level code.
To be safe then, you need to add database level integrity checks. Otherwise, bad data creeps in and your star select on the join table isn't valid.
In Rails, it means that something as simple as this goes away.
@dog.owners << @owner
...without adding in module concerns to make it work again.
All of this just to avoid a join?
All of this just to avoid a join?
DogOwner.joins(:owner).where(owners: { id: 1}).update_all(is_point_of_contact: true)
You don't have to add a lot of code, if you just let the relational database do its job.
Complex Context
Now if performance is a concern, due to the presence of hundreds of millions/billions of records in this table requiring partitions, that is a different and complex problem.
In this context, adding a user_id to every join is actually a simpler solution compared to other approaches for scaling your database.. e.g., sharding or non shared tables.
But otherwise it is plain ol' premature optimization.