Why Should I Care About GraphQL?
If you work in identity, sooner or later you’ll need to secure access to some APIs. It’s unavoidable in our hyper-connected world: APIs are the very fabric that ensures interoperability of services across domains and geographies. Besides, a recent poll by Postman shows that 89% of executives will invest more in APIs over the next 12 months (87% of CEOs, 86% of CIOs, and 93% of CTOs).
When we think of APIs, most of us still think of REST (or even good old “SOAP”). The world is changing though, and API clients need more flexibility in how they access data: flexibility that REST (or SOAP) can’t provide. Hence the rise of GraphQL.
The adoption of GraphQL has grown steadily since its initial public release by Facebook in 2015. From only 6% of developers using it in 2019, it is now used by 28% of developers worldwide. That’s more than a quarter of all developers, and more keep adopting it every day. And if you look at the JavaScript community alone, which includes Node.js and all other JS-based runtimes, a whopping 47% of developers used it in 2020. Although REST APIs still constitute the bulk of the APIs out there, their usage is declining (89% usage, down from 92% last year).
In short, APIs are still a growing market, and within that market, GraphQL is the contender to watch. Identity professionals should therefore at least be aware of its existence, and ideally also know the kind of problems it has the potential to create for their organizations.
In order to understand how to secure GraphQL APIs, we must first understand exactly what it is. If you already know, you may skip this next section…
What is GraphQL?
GraphQL is a query language for APIs. It has its own specification and implementation vendors (Apollo dominates that market). In GraphQL, an API exposes one single HTTP endpoint. Client applications then (typically) POST to that endpoint, sending requests written in the GraphQL language in the request body.
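A GraphQL request can be either a Query or a Mutation (the specification also defines Subscriptions, which stream updates to clients and are not covered here). A query is a request to read some data, whereas a mutation is a request to modify data (create, update or delete).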
GraphQL requires a Server that can understand GraphQL requests and interact with a backend data store. It also requires Clients that can prepare and package GraphQL requests and understand the responses from the GraphQL servers. Clients typically run in browsers, but any application running on any platform can implement a GraphQL client.
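To make the client side concrete, here is a minimal sketch of how a client might package a GraphQL request for that single endpoint. It uses only the Python standard library and builds the request without sending it. The URL, the token value and the function name build_graphql_request are assumptions made for illustration; they do not come from any particular GraphQL client library.

```python
import json
import urllib.request

# Hypothetical endpoint: a GraphQL server exposes one URL for every operation.
GRAPHQL_URL = "https://api.example.com/graphql"


def build_graphql_request(query, variables=None, token=None):
    """Package a GraphQL document as an HTTP POST request (the request is not sent)."""
    payload = {"query": query, "variables": variables or {}}
    headers = {"Content-Type": "application/json"}
    if token:
        # The access token travels in a header, exactly as it would for a REST call.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_graphql_request("query { User { Name } }", token="example-token")
```

Whatever the operation, the method and URL stay the same; only the body changes. That detail matters later, when we look at access enforcement.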
A GraphQL server exposes its capabilities through a schema. A GraphQL schema is a definition of types, queries and mutations that it exposes. The schema is usually introspectable – clients can therefore discover the capabilities of any given GraphQL Server. Here is a Schema example:
- We first define a simple User type with a set of properties (note: it looks like JSON, but it isn’t!)
- Notice the “!”? It marks a non-nullable (mandatory) field on that type:
type User {
  ID: ID!
  username: String!
  Email: String
  Name: String
  password: String
}
- Then some mutations – Create, Update and Delete a User. Create and Update return the User object, Delete returns a Boolean:
type Mutation {
  CreateUser(
    Name: String
    Email: String
    password: String
    username: String!
  ): User
  UpdateUser(
    ID: ID!
    Email: String
    Name: String
    password: String
    username: String!
  ): User
  DeleteUser(ID: ID!): Boolean
}
- And finally a Query – which returns all users (an array of Users actually), with a pagination option:
type Query {
  User(options: paginationOptions): [User]
}
Now all that’s missing is to provide actual code to implement the above schema. This is done through Resolvers, which are pieces of code, or functions that do the actual work of interacting with the backend store and implementing the CRUD operations requested.
Resolvers can be implemented in any language, and can interact with any number of backend stores. This fact alone explains some of the popularity of GraphQL: any given schema can be the collation of multiple backend stores. A GraphQL server could for example use SQL and NoSQL data sources at the same time to serve its schema.
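As a rough illustration of what resolvers look like, here is a minimal sketch in Python for part of the schema above. The in-memory dictionary stands in for a real backend store, and the function names and the RESOLVERS mapping are hypothetical: actual GraphQL server libraries each have their own way of registering resolvers.

```python
import uuid

# In-memory stand-in for a backend store (assumption: a real resolver would query a database).
USERS = {}


def resolve_create_user(username, Name=None, Email=None, password=None):
    """Resolver for the CreateUser mutation: stores the new User and returns it."""
    # The password is deliberately not stored here; a real system would keep a salted hash.
    user = {"ID": str(uuid.uuid4()), "username": username, "Name": Name, "Email": Email}
    USERS[user["ID"]] = user
    return user


def resolve_delete_user(ID):
    """Resolver for the DeleteUser mutation: returns True only if a user was removed."""
    return USERS.pop(ID, None) is not None


def resolve_users(limit=None):
    """Resolver for the User query: returns all users, honouring the pagination limit."""
    users = list(USERS.values())
    return users if limit is None else users[:limit]


# Hypothetical wiring: map each schema field to the function that implements it.
RESOLVERS = {
    "Mutation": {"CreateUser": resolve_create_user, "DeleteUser": resolve_delete_user},
    "Query": {"User": resolve_users},
}
```

The schema stays the same whether these functions talk to a dictionary, a SQL database or a graph database; only the resolver bodies change.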
Now for the “Graph” part in GraphQL, which is probably the main driver for the language’s popularity.
GraphQL enables developers to create relationships in the schema between its various types. For example, in an IoT use-case, a user may own one or more devices.
This (User)-[OWNS]->(Device) relationship can be modeled as follows in GraphQL:
type Device {
  ID: ID!
  IP: String
  Name: String
  DeviceID: String
}
type User {
  ID: ID!
  Email: String
  Name: String
  password: String
  username: String!
  Devices: [Device!]! @relationship(type: "OWNS", direction: OUT, properties: "since")
}
So now we’ve just defined a graph layer on top of any kind of backend database(s)! Note that using a graph database as a backend makes the most sense here, as it removes certain issues such as the “Object/Relational Impedance Mismatch” problem. Essentially, GraphQL is a graph layer sitting over any set of data stores.
Given a schema as defined above, a client app could then send a request to get all the Users and the Devices they own in one single call. The GraphQL Query would look something like this:
query getUserDevices {
  User(options: { limit: 10 }) {  # The type we’re querying, with the pagination option: fetch at most 10 users
    ID         # The user properties to return
    username
    Name
    Devices {  # We’re now traversing the OWNS relationship to Devices
      ID       # We’re on the Devices related to the User, fetch some properties
      DeviceID
      Name
    }
  }
}
🛑 Important note: in the above query, we’re accessing two types: the Users and the Devices they own. These are two types of resources: each may have its own access policies and requirements. How would you protect that? (answer in the next section).
Anyway, the response always matches the “shape” of the request. For example, the results of the query above would look like this:
{
  "data": {
    "User": [
      {
        "ID": "4f45a8eb-a674-4d63-8273-2c01796fc7f2",
        "username": "alexb",
        "Name": "Alex Babeanu",
        "Devices": [
          { ← This is a Device entity
            "ID": "5b2ed103-50ce-4f2b-9a73-c53515ab4856",
            "DeviceID": "DABC1",
            "Name": "Flow Sensor 1"
          }
        ]
      },
      {
        "ID": "1f9affc3-4d0b-41c9-b109-979c1d9d9f24",
        "username": "bdoe",
        "Name": "Bob Doe",
        "Devices": [] ← Bob doesn’t own any devices
      }
    ]
  }
}
Queries like the above are hard to implement in REST. An endpoint specific to this query would have to be developed, which takes time. Clients would then be stuck with this exact query: if they ever needed even slightly different data, the REST endpoint would have to change accordingly.
Now compare REST with the GraphQL approach: clients can freely request any property of any type, while traversing any relationship, at any given time, as long as they are requesting something defined in the schema. Nobody needs to rewrite anything at all on the server side. GraphQL is a full-blown query language. Pretty powerful indeed; no wonder it’s so popular! And it’s probably the only viable approach if you have to handle complex, highly connected data such as what is in the Web 3.0 universe, for example.
Problems with GraphQL
But as we know, with great power comes great… headaches.
The very flexibility of GraphQL makes it rather difficult to deploy and operate, so much so that some even recommend never to expose GraphQL APIs to the internet… Among the many challenges, we can cite:
- Poor performance, due to the risk of running n+1 queries
- Security and Access
- Hard to perform File Upload
- Change management
- Object/Relational impedance mismatch (due to using a graph layer on top of a non-graph backend)
- Type explosion
- Vulnerability to deep queries
- And more
Now, most of the above are purely software development problems that can be solved or alleviated through the use of appropriate techniques. For our purposes here, as Identity and Access Management (IAM) professionals, we will focus only on security and access issues. We will also briefly discuss the problem with deep queries, as a Denial of Service (DoS) attack is still a security issue.
Enforcing API Security
Developers notoriously hate having to deal with Authentication (AuthN) and Authorization (AuthZ), which are often done as after-the-fact band-aids, typically resulting in poor API security (to say the least). This has led us to the traditional way of securing APIs by delegating the tediousness of AuthN and AuthZ to external services.
The Traditional Way
The traditional (and textbook) way of enforcing API security is to place some kind of Proxy server in front of the API: a Gateway that acts as a gatekeeper that filters the requests. The requests that reach the APIs have therefore already been vetted by the Access Control system. Figure 1 below depicts this traditional access enforcement architecture.
Figure 1 – Traditional architecture for Access Control
[source: LDAPwiki – https://ldapwiki.com/wiki/Policy%20Based%20Management%20System]
- The Policy Enforcement Point (PEP) is a proxy deployed in front of our traditional REST API. The PEP typically relies on HTTP routes to determine which resource is requested by the client.
- The PEP sends an access request to a Policy Decision Point (PDP).
- The PDP takes care of authentication (and may thus initiate a login flow) and then authorization.
- For authorization, the PDP fetches adequate access policies through a Policy Administration Point (PAP).
- Optionally, the PDP can also fetch some additional data from a Policy Information Point (PIP), which has all the adequate attribute values necessary to make access decisions.
- The PDP replies with an access answer to the PEP, which then lets the request reach the API, or not.
In any case, this architecture is fine as long as your resources to be protected can be modeled as distinct HTTP routes. But as we’ve seen, GraphQL exposes only one single route for a potentially huge number of possible types of requests. We therefore can’t use this architecture at all, not unless our PEP can also parse POST requests and understand GraphQL!
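The limitation can be sketched in a few lines. The snippet below assumes a hypothetical PEP that maps (HTTP method, route) pairs to the permission it asks the PDP to check; the table contents and permission names are invented for illustration.

```python
# Hypothetical route-to-permission table, as a route-based PEP might hold it.
ROUTE_POLICIES = {
    ("GET", "/users"): "users:read",
    ("DELETE", "/users"): "users:delete",
    ("GET", "/devices"): "devices:read",
    ("POST", "/graphql"): "graphql:call",
}


def required_permission(method, path):
    """Return the permission a route-based PEP would ask the PDP to check."""
    return ROUTE_POLICIES.get((method, path))


# With REST, reading and deleting users are distinguishable from the route alone.
rest_read = required_permission("GET", "/users")
rest_delete = required_permission("DELETE", "/users")

# With GraphQL, both operations arrive as POST /graphql; only the body differs.
gql_read = required_permission("POST", "/graphql")    # body: query { User { Name } }
gql_delete = required_permission("POST", "/graphql")  # body: mutation { DeleteUser(...) }
```

Here rest_read and rest_delete differ, while gql_read and gql_delete are identical: a PEP that only looks at routes cannot tell a harmless query from a destructive mutation.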
Now, some recent proxy offerings do support GraphQL. Nevertheless, as we’ll see below, this model goes against GraphQL best practices.
Securing GraphQL APIs
From an IAM perspective, the main worries here are AuthN, AuthZ and Deep Queries.
Architecture
The first consideration is the architecture we need to implement our IAM system. Whereas the traditional way would have us place a PEP proxy in front of our resources, the GraphQL best practices published on graphql.org tell us that the authorization checks should all be made at the Business Layer. Figure 2 below depicts the placement of this Business Layer:
Figure 2 – Placing Authorization in the GraphQL stack
[source: https://graphql.org/learn/thinking-in-graphs/#business-logic-layer]
This is radically different from what we, in IAM, have been doing so far! No more proxies: all the enforcement should be done in the API implementation itself. This actually makes sense, in that the Business Logic Layer is where all the data processing happens. The data is readily available there, so it is the best place to evaluate access policies.
But wait, does this mean that we have to trust the developers with this? Thankfully, not necessarily, as we’ll see…
The two Business Layer Places
The “Business Layer” can mean two things in GraphQL implementations:
- At the database level
Here we’re implementing access control by augmenting the queries that would run normally on the DB to process the incoming requests. This is typically done by dynamically adding WHERE clauses to the query, as appropriate in order to filter the results based on predefined access rules. This is as deep in the “business” layer as it gets: we’re at the DB level.
Pros:
- Full control of the data
- Super-fine grained authorization possible
Cons:
- Definitely developer work
- Access Policy changes require code changes
- Really hard to manage policies.
- Arguably a bit late to enforce AuthZ at the DB level. A lot of processing could have been avoided by checking access earlier.
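The database-level approach described above can be sketched as follows, using SQLite from the Python standard library. The table layout, the subject dictionary and the access rule (non-admins only see the devices they own) are all assumptions made for the example.

```python
import sqlite3


def fetch_devices(conn, subject):
    """Return the devices visible to `subject`, filtering at the database level."""
    sql = "SELECT device_id, name FROM devices"
    params = []
    if subject["role"] != "ADMIN":
        # Assumed access rule: non-admins only see the devices they own.
        # The WHERE clause is added dynamically, with a bound parameter to avoid injection.
        sql += " WHERE owner_id = ?"
        params.append(subject["id"])
    sql += " ORDER BY device_id"
    return conn.execute(sql, params).fetchall()


# Tiny in-memory dataset so the sketch can be exercised.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (device_id TEXT, name TEXT, owner_id TEXT)")
conn.executemany(
    "INSERT INTO devices VALUES (?, ?, ?)",
    [("DABC1", "Flow Sensor 1", "alexb"), ("DABC2", "Valve 2", "carol")],
)
```

Note how the access rule lives inside the data access code: changing the policy means changing and redeploying this function, which is exactly the management problem listed in the cons.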
- At the GraphQL Layer
Now luckily, the GraphQL Specification provides us with a mechanism that we can leverage to enforce access control: directives, and in particular a custom @auth directive…
Directives are annotations that can be added to the types and fields of a GraphQL Schema. The specification itself only builds in a handful of directives (such as @skip, @include and @deprecated); @auth is not one of them, but it is a widely used convention for authorization concerns, supported by several GraphQL server libraries. For example, we can augment our simple schema defined above with an @auth annotation as follows:
# Assumed enum backing the Role type used by the directive
enum Role {
  ADMIN
  USER
}

# Auth directive that takes a “requires” input parameter of type Role, defaulting to ADMIN.
# It can be applied to object types and to field definitions.
directive @auth(requires: Role = ADMIN) on OBJECT | FIELD_DEFINITION

type User @auth(requires: USER) {  # This type requires subjects to have a “USER” role
  ID: ID!
  Email: String
  Name: String
  password: String
  username: String! @auth(requires: ADMIN)  # Access to this property requires an “ADMIN” role
  Devices: [Device!]! @relationship(type: "OWNS", direction: OUT, properties: "since")
}
This directive requires its own implementation function, which can perform any processing. GraphQL server frameworks generally let that implementation wrap the annotated object or field, so that it runs first, before any operation on it. This is perfect, as it can serve, in effect, as a PEP placed directly within the GraphQL schema at the business layer.
⇒ Recommendation: use the @auth annotation to make a call to an external Authorization Service or PDP as needed. This is the closest implementation to the traditional PEP/PDP architecture we can get.
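The recommendation above can be sketched as a wrapper around resolvers. This is a simplified illustration in Python, not the directive API of any specific GraphQL server: the auth decorator mimics what an @auth(requires: ...) implementation would do, and pdp_decide is a hypothetical stand-in for the network call to an external PDP.

```python
import functools


class AccessDenied(Exception):
    """Raised when the PDP refuses the request."""


def pdp_decide(subject, required_role):
    """Stand-in for a call to an external PDP (hypothetical; normally a network call)."""
    return required_role in subject.get("roles", [])


def auth(requires="ADMIN"):
    """Wrap a resolver so the access check runs before it, mirroring @auth(requires: ...)."""
    def decorator(resolver):
        @functools.wraps(resolver)
        def guarded(context, *args, **kwargs):
            # This wrapper plays the PEP role: ask the PDP, then allow or block.
            if not pdp_decide(context["subject"], requires):
                raise AccessDenied(f"{resolver.__name__} requires role {requires}")
            return resolver(context, *args, **kwargs)
        return guarded
    return decorator


@auth(requires="USER")
def resolve_user_name(context, user):
    return user["Name"]


@auth(requires="ADMIN")
def resolve_user_username(context, user):
    return user["username"]
```

A subject holding only the USER role can resolve Name, but gets AccessDenied on username. The policy decision itself stays outside the API code, which is the point of delegating to a PDP.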
Authentication
AuthN is actually “easy”, in that it isn’t much different from securing REST APIs. It could be done through a Proxy, a Gateway or out-of-band, or as the first step in the @auth directive implementation. The trick is to ensure that the access_token provided in the incoming request is valid.
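As an illustration of that validity check, here is a minimal sketch that verifies the signature and expiry of an HMAC-signed (HS256) JWT using only the Python standard library. It assumes the access_token is a JWT signed with a shared secret, which is only one possibility: many deployments use asymmetric signatures or opaque tokens validated through the authorization server, and production code would normally rely on a vetted JWT library. The sign_token helper exists only so the sketch can be exercised.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment):
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_token(claims, secret):
    """Issue an HS256 token (test helper only, not part of the validation path)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url_encode(sig)}"


def validate_token(token, secret, now=None):
    """Return the claims if the signature and expiry check out, otherwise None."""
    try:
        header_b64, body_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(body_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError:
        return None  # malformed token
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":
        return None  # refuse any algorithm other than the one we expect
    expected = hmac.new(secret, f"{header_b64}.{body_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # signature mismatch
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if not isinstance(exp, (int, float)) or exp <= current:
        return None  # missing or expired "exp" claim
    return claims
```

A real deployment would also check claims such as the issuer and audience before trusting the token.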
Authorization
Calls to the PDP can be made from the @auth directive function, right after the authentication check.
Throttling
The last piece of the puzzle is due to the very nature of GraphQL. As we have seen above, it is possible to request the traversal of relationships in any Query. It is therefore also possible to request “deep” queries. Here’s an example of a deep query:
query getUserFriends {
  User {
    Name
    HAS_FRIEND {
      to {
        Name
        HAS_FRIEND {
          to {
            Name
            HAS_FRIEND {
              to {
                Name
                … etc…,
              }
            }
          }
        }
      }
    }
  }
}
If the backend database(s) store several million users, a single query like the one above can easily hang the whole system. Such queries can be sent deliberately as a DoS attack, but also accidentally by unaware clients.
The usual throttling techniques can’t be used either: we’re talking here about one single request, not N requests per second. Instead of relying on the typical time-based request count, we need to implement a new kind of throttling by calculating the cost of any given GraphQL request (in particular, queries).
Various techniques exist for mitigating this risk. One handy method is to calculate the cost of any query before running it. This cost can additionally be used to implement various subscription levels in a SaaS, for example. Several open-source cost analysis libraries exist, for example: https://github.com/pa-bru/graphql-cost-analysis .
Cost analysis is also the most flexible and versatile approach, in that it applies to all queries without having to change the GraphQL schema at all.
⇒ Recommendation: implement cost analysis to protect any GraphQL API. Note that cost analysis can also be done in the @auth directive.
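To give an idea of how cost analysis works, here is a deliberately simplified sketch. It assumes the query has already been parsed into nested dictionaries (field name mapped to its sub-selection, or None for a scalar field). The per-field weights and the budget are invented for illustration, and this is not the algorithm or API of the library linked above.

```python
# Assumed per-field weights: relationship traversals cost more than scalar fields.
FIELD_COSTS = {"User": 10, "Devices": 2, "HAS_FRIEND": 5}
DEFAULT_COST = 1
MAX_COST = 100  # hypothetical budget per request


def query_cost(selection):
    """Sum field weights; nested selections are multiplied by their parent's weight."""
    total = 0
    for field, children in selection.items():
        weight = FIELD_COSTS.get(field, DEFAULT_COST)
        total += weight
        if children:
            # Each parent row triggers its sub-selection, so depth compounds the cost.
            total += weight * query_cost(children)
    return total


def is_allowed(selection):
    """Only run queries whose estimated cost fits within the budget."""
    return query_cost(selection) <= MAX_COST


shallow = {"User": {"ID": None, "Name": None, "Devices": {"ID": None, "DeviceID": None}}}
deep = {"User": {"Name": None, "HAS_FRIEND": {"to": {"Name": None,
        "HAS_FRIEND": {"to": {"Name": None}}}}}}
```

With these weights the shallow query costs 90 and passes, while the two-level friends query already costs 920 and is rejected before it ever reaches the database.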
Conclusion
IAM professionals should be aware of GraphQL as its adoption across the industry is growing. Securing GraphQL requires at least some specialist knowledge: traditional methods don’t apply.
The best and least intrusive method is to use special directive annotations on the GraphQL schema, acting as PEPs from within the API’s business layer, and delegate the authorization tasks, as usual, to an external PDP service.
Additionally, it is imperative to compute the cost of any GraphQL query before running it, because of the risk of Deep Queries and easy DoS attacks. Only run queries whose cost is below a preset limit.
Alex Babeanu
Co-Founder and CTO of 3Edges