Why stateless services?
The primary benefit of stateless services is scalability. In particular, Functions-as-a-Service (AWS Lambda, Azure Functions, Google Cloud Functions, etc.) should only host stateless services, because each function execution might run on a different machine, and multiple executions may run in parallel on different machines.
What is state?
State is just data. But in this context, it means data that is kept in memory between executions and is needed to service requests. A stateless service should not be required to keep anything between executions, but it may depend on other parts of the architecture that have "memory", like databases.
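To make the distinction concrete, here is a minimal Python sketch. The class and function names are hypothetical, and a plain dict stands in for the database. Two instances of the stateful handler, like two FaaS machines, each keep their own count, while the stateless handler keeps nothing between calls.

```python
from typing import MutableMapping


class StatefulCounter:
    """Keeps its count in this instance's memory between requests."""

    def __init__(self) -> None:
        self.count = 0

    def handle(self) -> int:
        self.count += 1
        return self.count


def handle_stateless(store: MutableMapping[str, int]) -> int:
    """Keeps nothing between calls; the count lives in an external store."""
    store["count"] = store.get("count", 0) + 1
    return store["count"]


# Two instances, like two FaaS machines, each keep their own count.
node_a, node_b = StatefulCounter(), StatefulCounter()
print(node_a.handle(), node_b.handle())  # 1 1

# With the state moved out, every call sees the same count.
database = {}  # stand-in for a real database
print(handle_stateless(database), handle_stateless(database))  # 1 2
```

Note that the stateless version's read-then-write is not atomic; under parallel requests it can lose updates, which is the write-conflict gotcha discussed below.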
So this is easy, right? Just shove everything in a database instead of keeping it in memory. Nearly. Below I have outlined some specific gotchas that can unintentionally make a service stateful. Functions-as-a-Service providers may already disallow the first couple of them, but I'll mention them anyway, because statelessness can be useful even if you are not using FaaS.
Gotcha: resource handles
This is pretty well understood nowadays, but it bears mentioning that open files and network connections have state. To be stateless, you cannot assume a resource handle is still open between invocations of the service. You have to assume the service may be shut down and restarted on a different node at any time. In general, it is best to open a resource only when you need it and close it immediately afterward.
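A minimal sketch of open-use-close within a single invocation, assuming a SQLite file at a hypothetical path and a hypothetical `users` table. One Python detail worth knowing: using a `sqlite3` connection as a context manager only commits or rolls back; it does not close the connection, so `contextlib.closing` is used here.

```python
import sqlite3
from contextlib import closing
from typing import Optional

DB_PATH = "app.db"  # hypothetical; substitute your real database location


def handle_request(user_id: int, db_path: str = DB_PATH) -> Optional[str]:
    # Open the connection for this invocation only. A sqlite3 connection's
    # own `with` block commits/rolls back but does NOT close it, so
    # contextlib.closing is used to guarantee the close.
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
    # The connection is closed by now; nothing is left open for a "next
    # invocation" that may run on a different node anyway.
    return row[0] if row else None
```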
Gotcha: session data
Many platforms offer built-in session handling and default to keeping sessions in memory. If you want to use session data, move it out of the service's memory, probably into a database.
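A minimal sketch of keeping session data in a database rather than in service memory. An in-memory SQLite database stands in for a shared database here, and the table and function names are hypothetical. In a real setup the session id would typically travel with each request, for example in a cookie.

```python
import json
import sqlite3
import uuid


def init_sessions(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )


def save_session(conn: sqlite3.Connection, session_id: str, data: dict) -> None:
    # Insert or overwrite: whichever node handles the request can write it.
    conn.execute(
        "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
        (session_id, json.dumps(data)),
    )
    conn.commit()


def load_session(conn: sqlite3.Connection, session_id: str) -> dict:
    # Any node can read it back; nothing is kept in service memory.
    row = conn.execute(
        "SELECT data FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}


conn = sqlite3.connect(":memory:")  # stand-in for a shared database
init_sessions(conn)
session_id = str(uuid.uuid4())  # would normally arrive in a cookie
save_session(conn, session_id, {"cart": [42]})
print(load_session(conn, session_id))  # {'cart': [42]}
```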
Gotcha: request ordering
You cannot assume that two requests issued asynchronously will be executed in the order they were sent. The first request might be delayed while processing on one node, while the second request runs on a different node and finishes first. If the order of processing matters, wait for the first request to finish before issuing the second.
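Here is a small `asyncio` sketch of the ordering problem. The `send` coroutine is hypothetical; its delay simulates one node being slower than another. Firing both requests at once lets the second one finish first, while awaiting the first before issuing the second preserves the order.

```python
import asyncio


async def send(name: str, delay: float, completed: list) -> None:
    # Simulated request; the delay stands in for a slow or fast node.
    await asyncio.sleep(delay)
    completed.append(name)


async def fire_concurrently() -> list:
    completed = []
    # Both requests are in flight at once: no ordering guarantee.
    await asyncio.gather(
        send("first", 0.05, completed), send("second", 0.01, completed)
    )
    return completed


async def fire_in_order() -> list:
    completed = []
    # Wait for the first request to finish before issuing the second.
    await send("first", 0.05, completed)
    await send("second", 0.01, completed)
    return completed


print(asyncio.run(fire_concurrently()))  # ['second', 'first']
print(asyncio.run(fire_in_order()))      # ['first', 'second']
```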
Gotcha: serial identity
The most common form of serial identity is the auto-increment primary key. Since many copies of a service may be executing in parallel, you cannot make assumptions about the next-assigned ID, nor can you guarantee a gapless ID sequence, due to write conflicts (discussed below). In fact, the whole concept of serial identity can become a scalability bottleneck unless you give it due consideration. Other identity strategies fit scalability better, such as randomly generated UUIDs, including some that are (mostly) ordered, such as ULIDs.
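As one example of a (mostly) ordered identity strategy, here is a sketch of a ULID-style generator: a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford base32 characters. Any node can generate IDs without coordinating with the others, and IDs from different milliseconds sort by time. IDs created within the same millisecond are not ordered relative to each other (the ULID spec describes a monotonic mode for that, which is not implemented here). For production, a maintained library is likely the better choice.

```python
import os
import time
from typing import Optional

# Crockford base32: its characters are in ascending ASCII order, so comparing
# two encoded IDs as strings gives the same result as comparing the numbers.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms: Optional[int] = None) -> str:
    """Return a 26-character ULID-style ID: 48-bit ms timestamp + 80 random bits."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (timestamp_ms << 80) | randomness            # one 128-bit integer
    chars = []
    for _ in range(26):  # 26 chars * 5 bits = 130 bits, enough for 128
        chars.append(CROCKFORD[value & 0b11111])
        value >>= 5
    return "".join(reversed(chars))


earlier = new_ulid(timestamp_ms=1_700_000_000_000)
later = new_ulid(timestamp_ms=1_700_000_000_001)
print(earlier < later)  # True: a later millisecond always sorts after
```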
Gotcha: write conflicts
Sometimes two users try to change the same entity at the same time. Classic example: an account has a $100 balance. Two simultaneous requests are issued to withdraw $100. In a naive multi-threaded or multi-node system, neither request knows about the other, so both would likely succeed. You would rather have this situation detected as a write conflict. (In the real world, banks see this as an opportunity for profit instead of a problem. But either way they have to handle it.)
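To see why both withdrawals succeed, here is a deliberately naive sketch that interleaves the two requests by hand: both read the balance before either one writes. The dict and the function names are hypothetical stand-ins for a database and a service.

```python
def read_balance(db: dict, account: str) -> int:
    return db[account]


def withdraw_naive(db: dict, account: str, amount: int, balance_seen: int) -> bool:
    # The business rule is checked against a possibly stale read.
    if balance_seen < amount:
        return False
    db[account] = balance_seen - amount  # blind write: Last Writer Wins
    return True


db = {"acct-1": 100}
# Both requests read before either writes, as if on two separate nodes.
seen_a = read_balance(db, "acct-1")
seen_b = read_balance(db, "acct-1")
ok_a = withdraw_naive(db, "acct-1", 100, seen_a)
ok_b = withdraw_naive(db, "acct-1", 100, seen_b)
print(ok_a, ok_b, db["acct-1"])  # True True 0: $200 paid out of a $100 account
```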
Every API potentially has this problem. But a lot of APIs never take write conflicts into consideration, and might not need to. In low-volume systems, or systems where users can't change shared resources, concurrency conflicts are rare. And the worst case is typically a once-in-a-blue-moon support call: "Hey, I just made changes to this item, but now my changes are missing." Two simultaneous changes to the item resulted in one change overwriting the other (aka Last Writer Wins). Support replies "Have you tried turning it off and on again?" and everybody moves on with their lives.
Write conflicts are where you have to start thinking about concurrency control. The usual suspects are pessimistic and optimistic concurrency.
In general, pessimistic concurrency gets slower as you add more nodes. You can think of it as having to obtain an exclusive lock on the data before you start running business logic. More nodes means more requests competing for the same lock, and therefore longer waits. Since the point is to be scalable, pessimistic concurrency is not a great fit.
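A minimal pessimistic-locking sketch using Python's `sqlite3` module, with a hypothetical `accounts` table. `BEGIN IMMEDIATE` makes SQLite take its write lock before we read, so a second writer has to wait (or fail with "database is locked") until we commit or roll back. Note that SQLite locks the whole database; servers like PostgreSQL typically use row locks such as `SELECT ... FOR UPDATE` instead.

```python
import sqlite3


def withdraw_pessimistic(conn: sqlite3.Connection, account_id: int, amount: int) -> bool:
    """Lock first, then read, decide, and write while holding the lock."""
    # BEGIN IMMEDIATE acquires SQLite's write lock up front. Any other
    # connection that tries to write must wait (or fail with "database is
    # locked") until this transaction commits or rolls back.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        if row is None or row[0] < amount:
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

Open each connection with `isolation_level=None` so the `sqlite3` module leaves transaction control to these explicit statements. The waiting is exactly the cost described above: every extra node is another writer in the queue.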
Optimistic concurrency says: I'll go ahead, execute the request, and prepare all the changes. But before I commit my changes, I'll make sure that the entity is exactly the same as when I started, meaning nobody else changed it in the meantime.
The most common way to implement optimistic concurrency is by adding a "version" to each entity. It could be, for example, an integer or an ETag. When I start executing the request, I can read the entity's version from the database or let the client provide it. (After all, the client decided to change the entity based on the version they had at the time. So why not let the client tell me?) Then, if the version still matches when I'm ready to commit, I bump the version as part of the commit. The check and the bump have to happen atomically, for example in a single conditional update; otherwise another request could slip in between them.
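Here is a minimal sketch of the version-column approach using Python's `sqlite3` module, with a hypothetical `orders` table and hypothetical function names. The key detail is that the version check and the bump happen in one conditional `UPDATE`, and the affected row count tells us whether we won.

```python
import sqlite3
from typing import Optional, Tuple


def create_orders_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, version INTEGER NOT NULL)"
    )


def read_order(conn: sqlite3.Connection, order_id: int) -> Optional[Tuple[str, int]]:
    return conn.execute(
        "SELECT status, version FROM orders WHERE id = ?", (order_id,)
    ).fetchone()


def update_status(conn: sqlite3.Connection, order_id: int, new_status: str,
                  expected_version: int) -> bool:
    # Check and bump in ONE statement: the row changes only if its version is
    # still the one we (or the client) read. Otherwise zero rows match.
    cur = conn.execute(
        "UPDATE orders SET status = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_status, order_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # False means a write conflict


conn = sqlite3.connect(":memory:")
create_orders_table(conn)
conn.execute("INSERT INTO orders VALUES (1, 'open', 1)")
conn.commit()

_, seen = read_order(conn, 1)  # two requests both read version 1
print(update_status(conn, 1, "shipped", seen))    # True: version is now 2
print(update_status(conn, 1, "cancelled", seen))  # False: conflict detected
```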
If the version does not match, then there are decisions to make. Do I stop and return an error? Do I reread the entity and retry the request? Do I commit the changes anyway? Fortunately, these are not my decisions to make. I have to ask my business experts: what should happen if two people try to make changes to the same order at the same time? It might even come down to very specific circumstances, like: Bob is trying to cancel the order, but meanwhile Pete just added $10,000 more in sales to it. They will probably tell you to chuck Bob's request in the trash.
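If the business answer is "reread and retry", the loop might look something like this minimal sketch. `VersionedStore` is a hypothetical in-memory stand-in for a table with a version column (a real store would make the compare-and-set atomic, e.g. with a conditional update), and the limit of three attempts is an arbitrary assumption. Rejecting the request instead, as with Bob's cancellation, would simply mean raising on the first conflict.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


class ConflictError(Exception):
    """Raised when retries are exhausted."""


@dataclass
class VersionedStore:
    # Hypothetical stand-in for a table with a version column:
    # key -> (entity, version). Not thread-safe; a real store would make
    # compare_and_set atomic, e.g. with a conditional UPDATE.
    rows: Dict[str, Tuple[dict, int]]

    def read(self, key: str) -> Tuple[dict, int]:
        return self.rows[key]

    def compare_and_set(self, key: str, entity: dict, expected_version: int) -> bool:
        _, current_version = self.rows[key]
        if current_version != expected_version:
            return False  # someone else committed first
        self.rows[key] = (entity, current_version + 1)
        return True


def update_with_retry(store: VersionedStore, key: str,
                      change: Callable[[dict], dict], max_attempts: int = 3) -> dict:
    """One possible policy: on conflict, reread the entity and re-run the change."""
    for _ in range(max_attempts):
        entity, version = store.read(key)
        updated = change(dict(entity))  # work on a copy of what we read
        if store.compare_and_set(key, updated, version):
            return updated
    raise ConflictError(f"{key}: still conflicting after {max_attempts} attempts")
```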
Yep, scaling will make you think about problems you never considered before, both technical and business-oriented.
Should all services be stateless?
Oh, you want more? Ok, well, sometimes it is not worth the trade-off, for example, internal APIs with a low number of requests per second. Save yourself some time there by skipping the extra-cost scalability features.
Doesn't this just move the scalability problem to the database?
Yes, great observation! With your services stateless, you can deploy as many copies as you need to handle your load. But your storage tech may not be able to handle it. That is a topic for another post.
Addendum - Examining the Scalability of Authentication and Authorization
Authentication means proving yourself to be a valid user. "This Van Gogh painting has been authenticated as genuine." Authorization means checking whether a valid user has permission to perform an operation. "You are authorized to access floors 1-3, but floors 4 and above are off-limits." When referring to both I usually write "auth".
Not that I've done a formal survey, but most auth frameworks should already work with stateless services. Users, permissions, and the links between them are likely stored in a database anyway. Traditional auth frameworks store the list of active logins in the database too. Your service then needs to look up the active login session on every request to make sure it hasn't been terminated (logged off by either the user or an admin) or had its authorizations changed. That means a network round-trip to the database on every request. Even in stateful services, this can cause unwanted latency, so auth frameworks frequently introduce in-memory caching to reduce the number of database reads.
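A minimal sketch of the kind of in-memory cache auth frameworks add, assuming a hypothetical `fetch` function that queries the auth database and a 5-minute TTL. It also shows the catch: the cache lives in one instance's memory, so a fresh instance starts cold, and a revoked session keeps being served from cache until its entry expires.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SessionCache:
    """Per-instance, in-memory TTL cache in front of an auth-session lookup."""

    def __init__(self, fetch: Callable[[str], Optional[dict]],
                 ttl_seconds: float = 300,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # hypothetical auth-database query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Optional[dict]]] = {}

    def get(self, session_id: str) -> Optional[dict]:
        now = self._clock()
        hit = self._entries.get(session_id)
        if hit is not None and hit[0] > now:
            return hit[1]  # cache hit: no database round-trip
        session = self._fetch(session_id)  # cache miss: ask the auth database
        # Remember the result (including "not found") until the TTL runs out.
        self._entries[session_id] = (now + self._ttl, session)
        return session
```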
The problem here is that in-memory caching has limited effectiveness on stateless services, if the FaaS provider even supports it. Because each invocation could run on a different service instance, there will be many more cache misses than in stateful services. Consequently, a much larger percentage of requests will need a round-trip to the auth database, adding latency and database load that may hinder scalability. To make caching effective again, you would need an external cache like Redis. That is another integration point, another system to care for and feed.
The newer alternative is to use a cryptographic signature in lieu of centralized database lookups. The common implementation is JWT (JSON Web Token). I have an ELI5 post that does a decent job comparing JWT with traditional authorization. JWT allows for self-service authorization. That is, the service can check for itself (using CPU instead of a DB round-trip) whether the token is valid. Once validated, the token's data (e.g. roles and permissions) can be trusted.
But JWT has a couple of downsides that come to mind. First, it has a learning curve. Like most things that use cryptography, it is best to let a trusted provider or library take care of most of the gory details. Even so, configuring it can feel otherworldly. But don't panic. Others have done it, and you can too.
Second, permission changes are not immediate. Remember above how I said that authorization is self-service? Without checking some central list of valid logins, I have no way to know that a particular token has been administratively revoked. As far as my service is concerned, the token remains valid, letting the user make requests, until it expires. The common way around this is to issue short-lived tokens along with Refresh Tokens. This boils down to forcing a reauthentication every so often (e.g. every 5 minutes), but the reauthentication happens behind the scenes with no user interaction. That way, at least every 5 minutes (or whatever interval you set), you pick up permission changes. This outcome is eerily similar to traditional auth lookups with 5-minute caching. ;)
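A minimal sketch of the refresh pattern, with no cryptography so the timing is easy to follow. `AccessToken` stands in for a signed JWT, `AuthServer` for the central auth store, and all names plus the 300-second lifetime are assumptions. The API only checks expiry; the central store is consulted only when a token is refreshed, which is why a revocation takes effect no later than the next refresh.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

ACCESS_TTL = 300  # seconds; the "every 5 minutes" from the text


@dataclass(frozen=True)
class AccessToken:
    # Stand-in for a signed JWT (signature omitted in this sketch).
    subject: str
    roles: Tuple[str, ...]
    expires_at: float


@dataclass
class AuthServer:
    # Hypothetical central auth store.
    roles: Dict[str, Tuple[str, ...]]   # subject -> current roles
    revoked: Set[str]                   # revoked refresh tokens
    refresh_owner: Dict[str, str]       # refresh token -> subject

    def refresh(self, refresh_token: str, now: float) -> Optional[AccessToken]:
        # The only place the central store is consulted: once per refresh.
        if refresh_token in self.revoked or refresh_token not in self.refresh_owner:
            return None
        subject = self.refresh_owner[refresh_token]
        return AccessToken(subject, self.roles[subject], now + ACCESS_TTL)


def service_accepts(token: AccessToken, now: float) -> bool:
    # Self-service check inside the API: no database round-trip.
    return now < token.expires_at
```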
Personally, I have been using JWT for newer development. I haven't found that I needed refresh tokens so far. It is a good thing too, because most sources say that refresh tokens should not be exposed to the browser. What fun.