Profile Picture

Visualizing and Deleting Entity Hierarchies in EF Core

August 16th, 2022

At Vesta, we have a multi-tenant application where tenant data is logically separated in our Postgres instance by a tenant_id column on entity tables. As we’ve grown, we’ve had to delete some tenant data. Although I’ve done it manually a few times in psql, manually deleting rows is dangerous, so I decided to write a script to delete tenants for us.

We use EF Core for all of our database interactions at Vesta, so our entire database schema is represented in our DbContext. Because DbContexts contain entity metadata, I wrote the script against that data.

Naive approach

Naively deleting tables might look something like this:

foreach (var tableName in dbContext.Model.GetEntityTypes().Select(et => e.GetTableName()).ToHashSet())
{
  await dbContext.Database.ExecuteSqlRawAsync($"DELETE FROM {tableName} WHERE tenant_id='{tenant}';");
}

Unfortunately, this doesn’t work. We have many entity relationships defined in our model, most of which aren’t defined with any specific ON DELETE behavior. So, with a table foo that has a foreign key to bar, if you attempt to delete the table bar before foo you’ll encounter foreign key constraint errors:

postgres@localhost:vesta> DELETE FROM bar WHERE tenant_id='test_tenant';
update or delete on table "bar" violates foreign key constraint "fk_foo_bar_bar_id" on table "bar"
DETAIL:  Key (id)=(a15ba931-eb31-412a-aadb-49d699af4435) is still referenced from table "bar".

Of course, you can get around this manually, but this becomes increasingly tedious as your entity relationships scale. A better approach would be to sort our delete operations programatically.

Finding dependencies in EF Core

EF Core’s Model is incredibly detailed—it has to be, so that its own operations work without hitting Postgres errors. We can introspect the model to find all foreign key constraints and the tables they reference.

private record struct TableColumn(string Table, string Column);

private static IDictionary<
  string,
(List<TableColumn> WeakReferences, List<TableColumn> References)
> GetDependencyGraph(PGContext context)
{
  var adjacency =
    new Dictionary<string, (HashSet<TableColumn> WeakReferences, HashSet<TableColumn> References)>();

  foreach (
    var entityType in context.Model
    .GetEntityTypes()
    // IBaseEntity is the type that all our entities inherit from; it guarantees the existence
    // of a `tenant_id` column
    .Where(et => typeof(IBaseEntity).IsAssignableFrom(et.ClrType) && et.GetTableName() is not null)
  )
  {
    var tableName = entityType.GetTableName();
    var table = entityType.GetTableMappings().FirstOrDefault(tm => tm.Table.Name == tableName)?.Table;

    Debug.Assert(tableName is not null);
    Debug.Assert(table is not null);

    foreach (var fkey in table.ForeignKeyConstraints)
    {
      if (!adjacency.ContainsKey(tableName))
      {
        adjacency[tableName] = new() { References = new(), WeakReferences = new() };
      }

      var fkeyTableName = fkey.PrincipalTable.Name;
      var column = fkey.Columns.Single();
      var tc = new TableColumn(fkeyTableName, column.Name);
      if (fkeyTableName != tableName)
      {
        if (column.IsNullable)
        {
          adjacency[tableName].WeakReferences.Add(tc);
        }
        else
        {
          adjacency[tableName].References.Add(tc);
        }
      }
    }
  }

  return adjacency.ToDictionary(
    kvp => kvp.Key,
    kvp => (kvp.Value.WeakReferences.ToList(), kvp.Value.References.ToList())
  );
}

Note that we distinguish between “weak” and normal references. Many of the relationships in our data model are optional, meaning that the foreign key column is nullable. For example, we have a borrowers table with a nullable spouse_id column that points to another borrower. To delete a borrowers with a spouse, then, we need to null out spouse_id before deleting the borrowers:

UPDATE borrowers SET spouse_id=NULL WHERE tenant_id='test_tenant';
DELETE FROM borrowers WHERE tenant_id='test_tenant';

Non-nullable foreign keys are required relationships. Our borrowers table has a non-nullable loan_id column referencing the loans table—this means that we must delete borrowers before loans to avoid hitting foreign key constraint errors.

Displaying Graphviz

Graphviz is an excellent way to visualize graphs with a simple syntax. With the adjacency list we built in GetDependencyGraph above, we can generate a Graphviz graph:

private static void PrintGraphviz(
  IDictionary<string, (List<TableColumn> WeakReferences, List<TableColumn> References)> adjacency
)
{
  Console.WriteLine("digraph G {");
  foreach (var kvp in adjacency)
  {
    // nullable ref
    foreach (var dep in kvp.Value.WeakReferences)
    {
      Console.WriteLine("  {0}->{1} [style=\"dotted\"]", kvp.Key, dep.Table);
    }

    // non-nullable ref
    foreach (var dep in kvp.Value.References)
    {
      Console.WriteLine("  {0}->{1} [style=\"solid\"]", kvp.Key, dep.Table);
    }
  }
  Console.WriteLine("}");
}

edotor.net is a free online Graphviz visualizer; I found that for my purposes the fdp engine produced the best visualizations.

Vesta's obfuscated entity hierarchy

Topologically sorting dependencies for deletion

To delete tables properly, we only care about sorting strong references. Since we have a graph, this is just a topological sort. We’ll do it via BFS:

private static IEnumerable<string> TopologicalSort(
  IDictionary<string, List<string>> adjacency
)
{
  var sorted = new List<string>();
  var inDegree = new Dictionary<string, int>();
  foreach (var kvp in adjacency)
  {
    if (!inDegree.ContainsKey(kvp.Key))
    {
      inDegree[kvp.Key] = 0;
    }

    foreach (var edge in kvp.Value)
    {
      if (!inDegree.ContainsKey(edge))
      {
        inDegree[edge] = 0;
      }

      inDegree[edge] += 1;
    }
  }

  var queue = new Queue<string>(inDegree.Where(kvp => kvp.Value == 0).Select(kvp => kvp.Key));
  while (queue.Count > 0)
  {
    var current = queue.Dequeue();
    sorted.Add(current);

    if (adjacency.TryGetValue(current, out var edges))
    {
      foreach (var edge in edges)
      {
        if (inDegree.TryGetValue(edge, out var degree) && degree > 0)
        {
          inDegree[edge] -= 1;

          if (inDegree[edge] == 0)
          {
            queue.Enqueue(edge);
          }
        }
      }
    }
  }

  if (inDegree.Where(kvp => kvp.Value > 0))
  {
    throw new InvalidOperationException("graph has circular dependencies!");
  }

  return sorted;
}

Data deletion

With our list of references and topologically sorted tables, we’re finally ready to delete some data!

public static async Task DeleteTenant(DbContext context, string tenant)
{
  var adjacency = GetDependencyGraph(context);
  var sorted = TopologicalSort(
    adjacency.ToDictionary(kvp => kvp.Key, kvp => kvp.Value.References.ConvertAll(r => r.Table))
  );

  foreach (var kvp in adjacency)
  {
    foreach (var weakReference in kvp.Value.WeakReferences)
    {
      // This can't be parameterized because Postgres can't use a parameter for the table name
      var sql = $"UPDATE {kvp.Key} SET {weakReference.Column}=NULL WHERE tenant_id='{tenant}';";
      Console.WriteLine($"{sql}");
      await context.Database.ExecuteSqlRawAsync(sql);
    }
  }

  foreach (var tableName in sorted)
  {
    var sql = $"DELETE FROM {tableName} WHERE tenant_id='{tenant}';";
    Console.WriteLine($"{sql}");
    await context.Database.ExecuteSqlRawAsync(sql);
  }
}

If all goes well, you’ll see a bunch of UPDATE then DELETE transactions. I’d recommend that you build dry-run functionality before running anything like this in production.

Future work

This code doesn’t account for non-foreign-key constraints or any other checks that might prevent a deletion from succeeding.

Enjoyed this post? Follow me on Twitter for more content like this. Or, subscribe to my email newsletter to get new articles delivered straight to your inbox!

Related Posts

Monotonic Last Modified Columns in Postgres
Seamless Migration Squashing for EF Core 6 Migration Bundles
Hello, Vesta
Scroll to top