GraphQL is simple: some queries, mutations and models… or isn't it?
TODO
- Quick tips:
  - when and where to use GQL (tRPC, REST, GQL); GQL is not an SQL replacement
  - naming conventions
  - enums, and enums in inputs
  - `[Item]!` vs `[Item!]!`
- GQL is always about the client (ID for cache, caching many structures)
- Proper modelling: thinking in graphs
  - InputObjects vs params
  - interfaces, unions, etc.
  - think about cache updates (return what has been changed; in more complex scenarios refetch the parent)
- Operate on the proper level
  - filtering: in a good GQL schema, the entry point (query) isn't so important
- Type safety
- Versioning
- Resolvers
  - field resolvers > a single root resolver
  - always use dataloaders
- Mutations
  - anaemic updates vs separated mutations vs mixed (batch)
  - return types for automatic cache updates
- Security
  - max query depth
  - cost
  - persisted queries (automated and static)
- Error handling (operational vs unexpected)
- My preferred tooling for GQL
  - Yoga (+extensions)
  - Pothos (+extensions)
  - Apollo Client
TL;DR
- Use GQL only for FE clients' communication.
- Do NOT type GQL by hand.
- Set max query depth and enable automatic persisted queries.
- Don't version your GraphQL. Take advantage of nullability and avoid breaking changes.
- Use resolvers with data loaders.
GraphQL. When and where.
GraphQL was designed for clients' communication with backends, and it fits perfectly there: a stable contract between client and server, resolution of nested structures, a flexible cache, and type generation.
Why not use it for server → server communication?
Because the ROI will be much smaller. You will not benefit from the normalised cache, schema exploration, or the resolution of nested structures on the server side, but you will be stuck with problems like designing schemas, writing resolvers, data loaders, etc.
To be clear: there is nothing wrong with consuming a GQL API from external services on your backend. You don't get the main benefits, but you also don't pay the cost of maintenance. Just don't write GQL for your internal services.
Maybe database → server?
Also no. Why? There are tools like hasura.io and PostGraphile which can generate an API for me, and then I don't need to handle this bloody SQL in my code! Yeah, simple CRUDs could be fine… but what if you need to aggregate something by an enum? Or generate a series of dates and JOIN on it? What about transactions or performance? And the most important thing: GraphQL is not always aligned with the DB schema. The client may require a different shape of data than the DB is storing. That is the tradeoff.
Low-hanging fruits
Let's briefly review the minimum must-haves:
Use the default naming conventions. You can read about them in the Apollo Client docs. Long story short:
- don't use verbs in queries (`users`, not `getUsers`),
- use verbs in mutations (`createUser`),
- add an `Input` suffix to input types (`UserInput`, not `CreateUser`),
- use GQL enums in types and variables too
- validate input with validation libraries; the GQL server will only guarantee the types,
- name queries and use variables:

❌ Don't

```graphql
query {
  user(id: 7) {
    id
    email
  }
}
```

✅ Do

```graphql
query UserById($id: ID!) {
  user(id: $id) {
    id
    email
  }
}
```
- do not expose your schema in production,
- set a max query depth on the server,
Fortunately, I'm not seeing these issues frequently, so I hope we can jump into the next topic.
GraphQL is always for the client
Our schema is a contract. On one side we have the client, on the other the server. The client says which data it needs; the server solves how this data will be delivered. Fetching data through GQL is far easier than writing performant and secure resolvers.
If the client is the main beneficiary of this contract, we should use 100% of its capacity. My general advice is to learn how the cache works in your GQL client; many queries can be avoided just by operating properly on the cache. I'll pick Apollo Client, since it hits the sweet spot between popularity, complexity, and usability.
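To see why the normalised cache matters, here is a toy sketch (not Apollo's actual implementation) of the core idea: every object with an id is flattened under a `Typename:id` key, so different queries returning the same entity share a single cache entry.

```typescript
// Toy normalised cache: entities are stored once, keyed by "Typename:id",
// which is roughly what Apollo Client's InMemoryCache does under the hood.
const cache = new Map<string, Record<string, unknown>>();

function writeIssue(issue: { id: number; title: string }) {
  const key = `Issue:${issue.id}`;
  // Merging keeps fields that were written by other queries.
  cache.set(key, { ...cache.get(key), ...issue });
}

// Two different queries both return Issue 7; the second write updates
// the single shared entry, so every component rendering it stays in sync.
writeIssue({ id: 7, title: "Old title" });
writeIssue({ id: 7, title: "New title" });

console.log(cache.size);                  // 1
console.log(cache.get("Issue:7")?.title); // "New title"
```

One normalised entry serves every query that touches Issue 7, which is exactly how a cache update after a mutation can refresh several views at once.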
Thinking in graphs
Did you know that GraphQL has "Graph" in its name?
GraphQL schema is not your database schema
A common mistake is to model a GraphQL (GQL) schema to reflect your database schema. But in most cases, these two things will start to drift apart. The client sees the data in a different shape than the provider is storing it.
```prisma
// Prisma Schema Language
model Issue {
  id          Int     @id
  title       String
  description String?
}
```

```graphql
# GraphQL
type Issue {
  id: ID!
  title: String!
  description: String
}
```
Both the DB model and the GraphQL schema are simple and match each other. Even generators like Hasura or PostGraphile can handle it easily. However, when you add a new requirement, like "an issue can block other issues and can be blocked by another issue", you need to modify the schema accordingly.
The database schema could look like:

```prisma
model Dependency {
  issueId      Int
  dependencyId Int
}
```
A happy-hurrah developer could model the GQL schema like:

```graphql
type Dependency {
  issueId: Int!
  dependencyId: Int!
}

# updated Issue type
type Issue {
  id: Int!
  title: String!
  description: String
  dependencies: [Dependency!]!
}
```
with an entry point:

```graphql
type Query {
  issues: [Issue!]!
}
```

A side note on `[Item]!` vs `[Item!]!`: with `[Item]!`, GQL guarantees the array itself, but an item inside could still be null, e.g. `[null, { "title": "bar" }, null]`. Use `[Item!]!` when items should never be null.
Resolvers have been written, the code has been tested, now it's time to code the frontend. We need to render two lists, and each list should contain the names of the issues. So the smart developer will parse the array from the issues query, obtain the dependencies, and pass them through some props to components. It works, but we have several problems here:
- What if `issues` misses some issues? (e.g. they are filtered out in the main view)
- What if we want to show more details of dependencies which are not present in the issues returned by the main query? (e.g. some dates that are not queried there)
- We need to maintain the code responsible for parsing the main query (unnecessary complexity, tests).
Let's try to rewrite the schema, respecting the client requirements.
```graphql
type Issue {
  id: Int!
  title: String!
  description: String
  blocks: [Issue!]!
  blockedBy: [Issue!]!
}
```
Now, consuming this data structure on the frontend will be a pleasure. You can just pass the dependencies as properties; no need for any custom parsing. If a new requirement comes up, like showing the assignee of dependent issues, you only need to add one resolver to Issue, and it will be available for the main issue and its dependencies.
Thinking in graphs
We should think about our schemas as graphs that are easy to traverse. This traversal can go in many directions.
Let's expand our project from the previous example. An Issue could have an Assignee, and Issues could be grouped into Buckets. We have to display buckets with issues inside.
However, we may also want to display the Bucket of an Issue, so we want to go from child to parent. A popular blunder in this scenario is adding a field like parentId to the child model, because it's fast and easy to implement (and it matches the database representation). Modelling like that is just doing REST over GraphQL; we're losing one of the main advantages of GQL: flexibility.
Instead, Issue should contain a field like bucket which resolves to a Bucket. There is nothing wrong with circular models in GQL; just remember to set the max query depth in your GQL server.
```prisma
model Bucket {
  id   Int    @id
  name String
}

model Issue {
  id       Int @id
  bucketId Int // 2x ID smells bad
}
```
```graphql
type Bucket {
  id: ID!
  name: String!
  issues: [Issue!]!
}

type Issue {
  id: ID!
  bucket: Bucket!
}
```
Now, if we want some info about the bucket inside an issue, we can just open a curly bracket and get what we need.
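For example, assuming an `issue(id: ID!)` query exists as an entry point (it is not defined in the schema above), the client reaches the parent in a single round trip:

```graphql
query IssueWithBucket($id: ID!) {
  issue(id: $id) {
    id
    bucket {
      id
      name
    }
  }
}
```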
Make use of advanced concepts
Let's make our example even more complex. We have Users, and users can be assigned to multiple projects.

```graphql
type User {
  id: ID!
  name: String!
}

type Project {
  id: ID!
  users: [User!]!
}

type Query {
  users: [User!]!
  project(id: ID!): Project!
}
```
Then we want to have a Role (an enum with the values SALESMAN and MARKETER) for a user in a Project.
Let's see how a naive implementation could look:

```graphql
enum Role {
  SALESMAN
  MARKETER
}

# updated User type
type User {
  id: ID!
  name: String!
  role: Role!
}
```
```graphql
query ProjectById($id: ID!) {
  project(id: $id) {
    id
    users {
      id
      name
      role
    }
  }
}
```
What is wrong here?
The problem is that a User doesn't have a Role in every context. Only when we're fetching a User within a Project can we say which Role is attached to them. But when we want to fetch all users, we cannot determine the Role.
How to solve the issue?
We could just duplicate User and have something like ProjectUser, where only ProjectUser contains Role. The problem is solved, but it produces another one: what if we want to add a field like email to User? We would have to add the property to both types.
The proper solution
The Role really belongs to the relation between a User and a Project, not to the User itself, so we can model that relation explicitly:

```graphql
type ProjectMember {
  user: User!
  role: Role!
}

# updated Project type
type Project {
  id: ID!
  members: [ProjectMember!]!
}
```

User stays context-free, a new field like email is added in one place, and the Role lives exactly where it makes sense: on the edge between a user and a project.
Operate on proper levels
❌ Don't

```graphql
query AuthorsWithPopularPosts {
  author(filter: { createdAt: { gt: "2022-07-15" }, posts: { likes: { gt: 5 } } }) {
    givenName
    familyName
    posts {
      id
      title
    }
  }
}
```

✅ Do

```graphql
query AuthorsWithPopularPosts {
  author(filter: { createdAt: { gt: "2022-07-15" } }) {
    givenName
    familyName
    posts(filter: { likes: { gt: 5 } }) {
      id
      title
    }
  }
}
```
In a well-designed GraphQL API, the entry point (the query) doesn't play a huge role. A query is just an entry point, but all the available operations like filtering, ordering, or pagination should be available per model resolver. With that, you avoid monster-size input types coupled with what will be returned, and your API is even more flexible.
Type safety
GraphQL always runs on top of some other programming language, like TypeScript. But you cannot translate GQL to TS directly; you have to assert the types anyway. You have two choices:
- write your own types,
- generate them.
I would always go with option number 2. It's just the faster and safer way to type your schema. Manual typing is tedious, error-prone, and hard to read, and the mismatch between the types and the real GQL implementation will grow over time. Trust me, you don't want to follow this direction. Always generate types.
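As an example, a minimal GraphQL Code Generator setup, sketched as a `codegen.ts` file (the file paths are assumptions; point them at your own schema and operations):

```typescript
import type { CodegenConfig } from "@graphql-codegen/cli";

const config: CodegenConfig = {
  // Assumed locations of the schema and the client's operations.
  schema: "./schema.graphql",
  documents: ["src/**/*.{ts,tsx}"],
  generates: {
    "./src/gql/": {
      // The "client" preset generates typed documents for the frontend.
      preset: "client",
    },
  },
};

export default config;
```

Running the generator in CI keeps the generated types from drifting away from the real schema.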
Versioning
This one is interesting. We're used to versioning the REST way: `sample.com/api/v2/foo`. In GQL, we have only one endpoint. In theory, we could do something like `/v2/graphql`, but in practice this would be unusable for our GraphQL clients, because they can't simply combine the two versions.
The official GraphQL versioning best practice is… just avoid breaking changes, as they describe there. I mostly agree with them. The one problem is that over time you can end up with a huge graph of legacy models/fields/queries where everything is nullable. To avoid that, you can mark deprecated parts of your graph with the widely supported @deprecated directive, then monitor resolution of those fields over time; you can do it even at the scope of a single field. If you're sure that nobody is using a field, feel free to delete it.
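For example, using the spec-defined `@deprecated` directive (the fields here are illustrative):

```graphql
type User {
  id: ID!
  name: String!
  fullName: String @deprecated(reason: "Use `name` instead.")
}
```

Introspection-aware tools (IDEs, linters, codegen) will surface the reason to client developers before the field is finally removed.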
Resolvers
There are a couple of approaches to resolving the graph on the backend side:
- Resolving the whole graph at once: the most performant, but the lack of flexibility (you have to cover all the paths, always) makes it unusable in real projects.
- Resolving level by level: the most flexible way, but you must implement data loaders. Without them, you will end up with a massive N+1 problem.
- A mixed approach: a combination of 1 and 2.
My practice shows that only the second approach is scalable. Yes, you will have to execute N queries to the DB, where N is the depth of the query, but it shouldn't be higher than 5. And even if this is an issue for you, you can always cache the queries, resolvers, or even single fields. The biggest value is that you don't need to cover other paths of your graph in other places. Also, you're not overfetching, and smaller resolvers are easier to maintain.
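To make the N+1 point concrete, here is a toy batching loader sketching what the `dataloader` package does (this is an illustration, not the real library): every key requested during one tick is collected and fetched in a single batch.

```typescript
// Toy version of DataLoader: collect keys synchronously, flush in one batch.
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

class TinyLoader<K, V> {
  private queue: { key: K; resolve: (v: V) => void }[] = [];
  constructor(private batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    return new Promise((resolve) => {
      this.queue.push({ key, resolve });
      if (this.queue.length === 1) {
        // Flush after all resolvers of this level have enqueued their keys.
        queueMicrotask(() => this.flush());
      }
    });
  }

  private async flush() {
    const batch = this.queue.splice(0);
    const values = await this.batchFn(batch.map((b) => b.key));
    batch.forEach((b, i) => b.resolve(values[i]));
  }
}

// Hypothetical Issue.bucket scenario: three field resolvers ask for buckets,
// but only one "SQL query" is executed instead of three.
let dbCalls = 0;
const bucketLoader = new TinyLoader<number, string>(async (ids) => {
  dbCalls += 1; // pretend: SELECT * FROM buckets WHERE id IN (...ids)
  return ids.map((id) => `bucket-${id}`);
});

async function main() {
  const [a, b, c] = await Promise.all([
    bucketLoader.load(1),
    bucketLoader.load(2),
    bucketLoader.load(1),
  ]);
  console.log(dbCalls, a, b, c); // 1 bucket-1 bucket-2 bucket-1
}
main();
```

In a real server you would use the `dataloader` package, create one loader per request, and call `load` from each field resolver.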
Error handling
Error handling in GraphQL can be tricky. The server can return status `200`, but the response body can look like:
"errors": [
{
"message": "Invalid argument value",
"locations": [
{
"line": 2,
"column": 3
}
],
"path": ["userWithID"],
"extensions": {
"code": "BAD_USER_INPUT",
"argumentName": "id",
"stacktrace": [
"Hope it's not your production code :)"
]
}
}
]
}
That's because GraphQL is protocol agnostic. But over 99% of the time, it's served via HTTP and JSON.
Two types of errors
Generally speaking, we can split all errors into two groups: operational and non-operational. Operational errors are known errors raised by us (e.g., an email is already taken and we throw an error with a 4XX status). Non-operational errors are all the rest that we are not expecting, like a DB connection shutdown.
We can take advantage of GraphQL unions. I don't want to explain the whole concept, because it was well explained here. You can model your operational errors as part of your schema and then handle them in a type-safe way on the client side; there are tools that are handy for shaping data this way. For non-operational errors, I recommend just passing them through some error handling and letting them go to the client, which probably will not handle them anyway (it will likely show something like "an unexpected error occurred").
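As a sketch (the type names here are made up for illustration), the email-taken case modelled as part of the schema:

```graphql
type EmailTakenError {
  message: String!
  suggestedEmail: String
}

type CreateUserSuccess {
  user: User!
}

union CreateUserResult = CreateUserSuccess | EmailTakenError

type Mutation {
  createUser(input: UserInput!): CreateUserResult!
}
```

The client then switches on `__typename` (or uses inline fragments) and handles each case in a type-safe way, instead of parsing generic `errors` entries.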
Security
With great power comes great responsibility
Query Depth
One of the easiest ways to crash a GQL server is to find an infinite resolver. However, the solution is very simple: if possible, find the deepest query in your client and set the limit to this number. If not, just use your intuition.
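To illustrate the idea (real servers plug this into query validation, e.g. via packages like graphql-depth-limit), here is a toy depth check over a simplified selection tree; with the circular Bucket/Issue models from earlier, a hostile client could otherwise nest them forever:

```typescript
// Toy selection-set depth check; a sketch of what a depth-limit rule computes.
type Selection = { name: string; selections?: Selection[] };

function depth(selections: Selection[]): number {
  let max = 0;
  for (const s of selections) {
    max = Math.max(max, 1 + (s.selections ? depth(s.selections) : 0));
  }
  return max;
}

// query { bucket { issues { bucket { issues { id } } } } }
const query: Selection[] = [
  { name: "bucket", selections: [
    { name: "issues", selections: [
      { name: "bucket", selections: [
        { name: "issues", selections: [{ name: "id" }] },
      ] },
    ] },
  ] },
];

const MAX_DEPTH = 4;
if (depth(query) > MAX_DEPTH) {
  // The circular Bucket/Issue loop is caught before any resolver runs.
  console.log("rejected: query too deep");
}
```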
Cost
Query depth is good for simple apps, where everything is fetched from a single database. When the graph is combined from multiple services or some properties must be computed in an expensive way, you can calculate the cost of the query.
I don't want to waste time explaining query analysis in depth since it was done very well here.
Persisted Queries
Persisted queries are more associated with performance than security, but there are some security aspects. We have two types of persisted queries.
Automatic Persisted Queries
With APQ, the client computes a hash of a query and sends it instead of the body. This saves network bandwidth and, in an illusory way, hides the schema. A potential attacker couldn't distinguish Automatic Persisted Queries from Static Persisted Queries when reviewing some random queries. But it's security by obscurity, because every new query will still be executed and its result returned. Anyway, APQ is good to enable: a performance boost and a little security by obscurity for free. You can find more info here.
Static Persisted Queries
If we know the shape of the queries of our client, we can go even further than APQ. During the build process, we can collect all the queries and mutations from the client, compute the hashes, and inject them into the GQL server. Normally, it's a kind of map with the hash as a key and the operation as a value. Now, the GQL server will reject all the other operations that are not sent from our client. The GQL server can even save some time processing the queries since it knows the whole query structure. In theory, we can even omit the query depth limit because we know that only queries prepared by us are processed.
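The hash-to-operation map can be sketched like this (a toy illustration; the operation string and lookup function are invented for the example, and APQ-style setups use SHA-256 of the query text):

```typescript
import { createHash } from "node:crypto";

// Build-time side: collect operations from the client bundle and hash them.
const sha256 = (query: string) =>
  createHash("sha256").update(query).digest("hex");

const operations = ["query Me { me { id } }"]; // collected from the client
const allowList = new Map(operations.map((op) => [sha256(op), op]));

// Server side: only hashes present in the allow-list are executed.
function resolveOperation(hash: string): string {
  const op = allowList.get(hash);
  if (!op) throw new Error("PersistedQueryNotFound");
  return op;
}

console.log(resolveOperation(sha256("query Me { me { id } }")));
// query Me { me { id } }
```

Any operation not registered at build time is rejected before parsing, which is where both the security and the performance win come from.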
Sounds great, but in practice, it's not so easy to achieve, especially if you aren't working in a monorepo and with multiple clients. Obtaining info about all the queries of clients, creating the hash map, delivering it to the server using CI... So as always, you have to consider the pros and cons in your context.
Miscellaneous
Of course we still have to remember about all the others security features like:
- input validation,
- pagination,
- error sanitization,
- authorization & fields access,
- etc.
GQL is just a transport layer for our data.
My preferred tooling
Handling all this stuff is not easy, so I decided to share the tools I'm using for GraphQL. The best server for Node, in my opinion, is Yoga on Fastify. It integrates with Envelop perfectly. Envelop is The Missing GraphQL Plugin System. So we can easily apply useful extensions like useOpenTelemetry or useGraphQLJit.
For building a schema, a good choice could be Pothos. It's built around the concept of "backing models" and has multiple extremely handy plugins like Errors Plugin, Prisma Plugin or Smart Subscriptions. Using schema builders like this one is required to build a complex API in a fast and clean way.
My favourite client is Apollo Client. The normalised cache works very well. Very often, we can avoid state libraries or contexts by reading directly from this cache.
To test GraphQL, you can choose any client like Altair, Apollo Explorer, or just GraphiQL v2, which is delivered with Yoga. They all have almost the same features.
Conclusions
As we can see, writing quality GraphQL takes more than basic knowledge of operations and types. We have to make use of multiple tools, know advanced concepts, and combine them together. There's only one way to achieve that: practice.
This was the first article I have ever written. I tried to be as concrete as possible. If you know any good-to-know tip about GraphQL, or I made a mistake somewhere, let me know and I'll update this article. Thank you for reading, and I hope you learned something!