The current changes to the comment.module in #731724: Convert comment settings into a field to make them work with CMI and non-node entities introduce a database design that is not ACID compliant. While Drupal itself does not have an ACID compliant database design, there has been considerable work done to take steps towards this. The first of which is to make real foreign key constants #911352: Document that foreign keys may not be used by all drivers.

Since foreign keys can only relate to a single table, the design implemented in the changes to comment.module make this impossible.

I propose that these fields are removed the from the schema:

entity_id
entity_type

Instead, it would be better to have a table for each entity that related the comments to that entity.

For instance, for nodes you would have a table called comment_node (or node_comment, whichever makes more sense), the schema would look something like this:

cid
nid

The primary key could simply be cid. If the comment shouldn't be allowed to appear on more than one node, or if it should be able to be on more than one node, it could be a primary key of cid+nid. Regardless a pseudokey of 'id' or 'cid_nid' should not be used.

Since comments will rarely, if ever, be queried across entity types, this change introduces relatively little performance problems.

The goal of this issue is to help make Drupal's database be more stable. When foreign key support is fully implemented into Drupal, this change will prevent the database from containing comments that relate to non-existent entities. This will assist in keeping Drupal's database sanitized.

Comments

davidwbarratt’s picture

Title: Remove entity_id and entity_type from the comment table and replace with relationship table » Remove entity_id and entity_type from the comment table and replace with relationship tables

Fix a typo in the title. :)

stevenlafl’s picture

I completely agree. By separating the comment table into multiple entity-type specific comment tables, you gain the benefit of true foreign key constraints that refer to extant tables. Also, by using the CASCADE option on the DELETE operation, you can let the SQL server automatically clean itself through deletion of rows that refer to entity ids that no longer exist as entities.

Additionally, in my experience, the comment module as a whole is relatively outdated, and needs some extensive modification to be in the same league as other modules such as node.

Thanks, David.

swentel’s picture

Category: bug » task
Status: Active » Postponed

We do this in a lot of places, so this is really a task. And postponed too, as we're not even there yet.

andypost’s picture

Status: Postponed » Active
andypost’s picture

Issue summary: View changes

Add another table name suggestion

davidwbarratt’s picture

Priority: Normal » Critical
Issue summary: View changes

Changing this issue to Critical as we are nearing the end of the alpha cycle without a resolution.

peace & love

andypost’s picture

Category: Task » Feature request
Priority: Critical » Major
Issue tags: +beta target

This is not critical, this is a feature without patch.

I like and support the suggested approach to change storage, but nobody cares to file a initial patch so probably we should move to 9.x
Also the size of the patch could be really big so this issue needs

@davidwbarratt Please explain the profit of the change to get approval from commiters, except architectural. Also without patch this looks really D9 material. Could you try to make initial patch and provide profiling?

xjm’s picture

Priority: Major » Normal
Issue tags: -beta target +beta deadline

Isn't the whole point of having pluggable entity storage so that people can do things like this when they need to, without having to change core?

This is more of a beta deadline rather than beta target.

andypost’s picture

Damien Tournoud’s picture

Status: Active » Closed (won't fix)

I'm going to close this. Yes, not being able to use a foreign key here is not nice, but at the same time the suggested solution would just make things worse.

davidwbarratt’s picture

Status: Closed (won't fix) » Active

Damien,

Can you please explain why it would make it worse? by what standard?

Thanks!

Damien Tournoud’s picture

Status: Active » Closed (won't fix)

Thinking about this more, the suggested solution doesn't even resolve the problem.

Suppose that you introduce a comment_* table for each entity type, like this:

comment <--- 1, n ---> comment_node <--- 1, 1 ---> node

By definition, comment cannot have a foreign key because you can have one comment_* per entity type. comment_node can have two foreign key constraints, one pointing to comment and one pointing to node.

This schema is wrong for at least two reasons:

  • You lose the idea that a comment references one and only one entity, which is a major structural issue.
  • You end up with a foreign key in the wrong direction. As a consequence, the CASCADE is not even going to do what you want it to do: if you delete a node, you are going to be able to cascade the deletion to the comment_node table, but not all the way to the comment table.

The suggested schema results in the same level of data integrity issue as the current one. It also results in a additional data loss as compared to the current situation: you lose the information of the entity type and id the comment used to be attached to.

=> Won't fix.

davidwbarratt’s picture

Status: Closed (won't fix) » Active

Damien,

You are correct in your schema analysis, however let's look at the reasons you proposed.

What is wrong with a comment referencing more than one entity? Shouldn't entities NOT be tied to a single entity? Isn't that the difference between a field and an entity?

I know Drupal itself might not be interested in this, but I can see a lot of uses for it in contrib.

The one that comes to mind first is, let's say you have a "facebook" entity that contains status and a "twitter" entity that contains tweets. What if you referenced them to a node (based on the url in the status/tweet). Then you wanted to enable someone to comment on the nodes, but when they leave a comment, to also post to the status/tweet.

Given the current database design, this would be impossible. The only way you could do this is by making multiple copies of the same comment. Then you'd have to also maintain the copies whenever they are edited/deleted (assuming that the only thing that is unique is the relationship).

Shouldn't comments work similar to how Files work? Multiple file fields (across entities) can reference a single file entity. What is wrong with this? Is this "a major structural issue"? I think we can all say "no". But now that Files are Entities, I'm not sure how and why Comments are different from Files.

I'm not sure I understand your statement "You end up with a foreign key in the wrong direction." I understand what you are saying about the CASCADE, but this is of little consequence. With either database design, you do not have cascade support. I think having it is not the reason why you'd change the design. I don't think most people add foreign key constrains in order to have cascade support. I believe they add them to ensure that the relationships they are making are actually valid.

Here's what we gain by changing the DB design:

  1. More ACID complaint design.
  2. Ability to attach a single comment to more than one entity
  3. Relationships are (when using constraints) guaranteed to exist.
  4. JOINS are much easier as the relationship is clear and unambiguous (i.e. "entity_id" is ambiguous).

Here's what we lose

  1. Relationships all in one table

Either design enables comments to exist without a valid relationship. The only difference is that with the current model, if the relationship is not valid (i.e. the entity no longer exists), then it will still be in the database like that, which I feel is more likely to cause errors (i.e. trying to join a non-existent entity). If a valid relationship does not exist in the proposed design, it's not a big deal because the relationship is not erroneous, it just doesn't exist at all.

Like I mentioned, you don't get CASCADE support either way, which was never my reason for the proposal in the first place. ;)

I feel that we are gaining much more than we are losing.

Thanks!

Damien Tournoud’s picture

Status: Active » Closed (won't fix)

@davidwbarratt: you can set the requirements you want in your head, but none of what you describe is the comment module.

There is a lot of flexibility in Drupal, you can build your own entity types and your own relationships. You even have an Entity reference field that is limited to one single target entity type, and as a consequence satisfies your desire for a strongly typed relationship.

But as far as the comment module is concerned, this is won't fix. Please do not change this status again.

davidwbarratt’s picture

Damien,

I found what you wrote to be extremely illogical and rude.

I'm not setting "the requirements you want in your head".

I'm sure that no one said "none of what you describe is the comment module" when #731724: Convert comment settings into a field to make them work with CMI and non-node entities was being proposed and committed.

Rather than responding logically to my comment, you've simply said that comment module IS comment module and there is no changing it.

While you cannot explain why comment is inconsistent with the logic of File and is also inconsistent with a more ACID compliant database design.

We've even done the same thing with Fields, we've converted them to remove the "entity_type" and "entity_id" fields and simply made them per-entity. This removes the ambiguous relationships in the database. However, Comment is allowed to have these ambiguous relationships... that's fine, it's comment after all.

What you are saying is that Drupal finds it completely acceptable to have ambiguous relationships in a database.

I don't think many Enterprise organizations would consider this acceptable, but what do I know?

peace & love

Damien Tournoud’s picture

@davidwbarratt: My apologies, I wasn't meaning to be rude in any way.

In #11, I explained that your suggestion doesn't actually improve consistency of the comment module in any way, and actually results in a data loss as compared with the current schema. I also noted that your suggestion would break a core requirement of the comment module, which is that a comment points to one (and a single one) entity.

In #13, I added that what you are looking for (a strongly typed relationship field) already exists, it is the Entity Reference field. It would allow you to build anything that you can think of, including the use case that you described in #12, that is not the use case of the comment module.

What you are saying is that Drupal finds it completely acceptable to have ambiguous relationships in a database.

I don't think many Enterprise organizations would consider this acceptable, but what do I know?

This is a fairly debatable proposition. Just consider two things: (1) it is a design pattern of many contemporary database designs to have lesser consistency guarantees, or even to have conflict resolution as a core feature of the database, (2) consistency guarantees provided by standard SQL databases is not a silver bullet, it gives you some guarantees, but some (arguably the most important ones) are going to have to be enforced by the application anyway.

andypost’s picture

The idea to split comment table could be sane but not for 8.x
This will cause a lot of troubles with data migration that will require to create table aka we do for fields but without description like there's for fields and instances. And building a new abstraction makes no sense in current release cycle.

At least we can try to get rid of 'entity_type' field in favor of real ER field 'field_id' see #2149859: Convert the 'field_id' base field on comment entities to an entity reference field
It is not low hanging fruit because comment entities needs bundle but bundle can't have '.' (dot) in it's name

Anyway the most common pattern in core to load comments with this approach would be nightmare because there's no metadata about a table name where comments are stored to particular entity type.
And comment as one of the base entities should live in 'base table', so won't fix

PS: if you going to provide a kind of "sharding" for comments you can override storage controller or make it transparent at storage layer