Microsoft Orleans – Implementing grains

Note: this post is part of a series about Microsoft Orleans and the Actor model

I think that Orleans is a beautiful framework and it was really fun to work with, but unfortunately it lacks documentation and really simple examples, which makes it difficult to get started. It is under heavy development, and it is production-ready for projects that fit the Actor pattern, “micro/nano-services”, or “DDD aggregates”. One thing to keep in mind is that the design should be done while Avoiding Microservice Megadisasters (Jimmy Bogard).

A good fit

Orleans can be considered when:

  • It’s a well-known project (not greenfield)
  • You need a fast response time to the client (~2 ms)
  • You are planning to add a cache or a distributed cache (e.g., Redis)
  • You need, or may need, to run on more than one server (for scaling, or to handle peaks like Black Friday)
  • You could need to use a Responsive Cache
  • You need zero downtime (due to SLAs and deployments)

My idea is that Orleans can work very well if you have:

  • a lot of actors with a short lifetime that you don’t need to query against often. Examples: payment transactions, orders, etc.
  • a limited number of actors with a long lifetime that you need to query against often. Examples: products, inventory, warehouses, etc.

If it fits the Actor Model:

  • Entities are small enough to be single-threaded
  • Significant number of loosely coupled entities (hundreds to millions)
  • Workload is interactive: request-response, start/monitor/complete
  • No need for global coordination, only between a few entities at a time

A problematic fit

Orleans can be a problematic fit if:

  • You don’t have enough knowledge of the project (greenfield)
  • Managing multiple versions of grains (or grain states) is complex and time consuming. It can require multiple additional deployments to be able to migrate the state from A to B.
  • Entities need direct access to each other’s memory
  • The relation between entities is complex (like an ERP)
  • You will not have benefit from in-memory indexes
  • Small number of huge entities, multithreaded
  • Global coordination/consistency needed

I think it will not work very well if you have:

  • A classical CRM/ERP application, with big numbers of actors that you need to query against regularly. (*1)
  • The need to change the models/storage/domain, or when it’s unclear what the data representation should be (*2)

(*1) The main limitation is the indexing part, where the in-memory indexes could be bigger than the available RAM. Imagine that you need to index 20 different actor types with 10 properties each, and you could have thousands or millions of actors: this means you will need to keep huge in-memory dictionaries to be able to query them. If you need to query the database to get a search result, then Orleans is not being leveraged.
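A quick back-of-envelope sketch of that index size; the 100 bytes per entry is my own rough assumption (key + grain reference + dictionary overhead), not a measured figure:

```csharp
using System;

class IndexSizeEstimate
{
    static void Main()
    {
        // One in-memory index entry per (actor, indexed property).
        long actorTypes = 20;
        long propertiesPerType = 10;
        long actorsPerType = 1_000_000;
        long bytesPerEntry = 100; // rough guess, not a measured figure

        long totalBytes = actorTypes * propertiesPerType * actorsPerType * bytesPerEntry;
        Console.WriteLine($"{totalBytes / (1024.0 * 1024 * 1024):F1} GB"); // ≈ 18.6 GB of RAM
    }
}
```

Even with generous rounding, the indexes alone land in the tens of gigabytes, which is why querying through the DB often ends up being the pragmatic choice here.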

(*2) The serialization strategy must be understood very well: it’s pretty costly to extend or change a model that persists the actor state, because the default serializer is not version tolerant yet. So you need to use Google Protobuf, protobuf-net, Microsoft Bond, or your own implementation. A more version-tolerant default serializer is a work in progress.
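For example, with protobuf-net (one of the serializers mentioned above) the explicit field numbers are what make the persisted state tolerant to model changes; the ProductState class here is just an illustration:

```csharp
using ProtoBuf;

[ProtoContract]
public class ProductState
{
    [ProtoMember(1)] public string Name { get; set; }
    [ProtoMember(2)] public decimal Price { get; set; }

    // Added in a later version: old persisted records simply
    // deserialize with the default value, nothing breaks.
    [ProtoMember(3)] public string Description { get; set; }
}
```

The version tolerance comes from never reusing or renumbering a field once it has shipped.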

As of today it’s not ready to be a DBMS/multi-tier replacement, but they are working on an AODB-Indexing implementation, hopefully coming soon.

Other success factors include finding the right bounded context and following microservice principles, with separate data stores and “caching” of other services’ data, so we don’t depend on any other service/database when responding to user requests (Data on the Inside vs. Data on the Outside, Pat Helland). This approach also helps when migrating from a monolith.

Implementing Grains

  • Grains can be stateless. When to use [StatelessWorker]
    • Functional operations: decrypt, decompress, before forwarding for processing
    • Multiple activations, always local
    • E.g., good for staged aggregation (locally within silo first)
  • By default grains are non-reentrant
    • Deadlock in case of call cycles, e.g. call itself
    • Deadlocks are automatically broken with timeouts
    • [Reentrant] to make a grain class reentrant
    • Reentrant is still single-threaded but may interleave
    • Dealing with interleaving is error prone
  • Grain Sizing
    • For throughput, it is usually better to use many smaller grains than a few large grains, but overall it is best to choose grain size & types based on the application domain model. Examples: Users, Orders, etc.
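The [StatelessWorker] and [Reentrant] attributes above come from the Orleans API; the grain classes below are made-up examples to show where they go:

```csharp
using System.Threading.Tasks;
using Orleans;
using Orleans.Concurrency;

public interface IDecompressorGrain : IGrainWithIntegerKey
{
    Task<byte[]> Decompress(byte[] payload);
}

public interface IAggregatorGrain : IGrainWithIntegerKey { }

// Stateless worker: activations are always local, and Orleans may create
// several of them, which suits functional work like decompression.
[StatelessWorker]
public class DecompressorGrain : Grain, IDecompressorGrain
{
    public Task<byte[]> Decompress(byte[] payload) =>
        Task.FromResult(payload); // real decompression omitted

}

// Reentrant: still single-threaded, but turns may interleave at awaits,
// which avoids deadlocks on call cycles at the price of trickier reasoning.
[Reentrant]
public class AggregatorGrain : Grain, IAggregatorGrain { }
```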

Grain persistence

You can store your grain state basically wherever you want. Orleans officially supports Azure technologies like:

  • Azure Table Storage (*1)(*2): cheap, but with a 64 KB limit. A good fit for normal grains, not for indexing grains (if an index dictionary can grow beyond 64 KB)
  • Azure Blob Storage (*2): more expensive, but with no storage limits. A good fit to store indexing grains.
  • Microsoft SQL Server (RDBMS): relational storage is possible. This is perfect if you need to make your data available to other systems, or you need to re-use existing data.

You can also use in-memory storage, which is very fast, but real-world applications using this approach are very uncommon.

(*1) In the Orleans cluster configuration you need to specify a ServiceId (e.g., “OrleansTest”). If you change it, you will lose all the states stored using Azure Table Storage (you would need to fix the string IDs manually).

(*2) If you change the namespace or the class name of a model you are persisting, you will lose all states stored using Azure Table Storage and Azure Blob Storage.

To benefit from persistence, we need to define a .NET interface that extends Orleans.IGrainState and contains the fields to be included in the grain’s persisted state.

  • The grain class should extend GrainBase<T>, which adds a strongly typed State property to the grain’s base class.
  • The first State.ReadStateAsync() will occur automatically before ActivateAsync() is called for a grain.
    • The grain can re-read the current state data from the underlying backing storage using State.ReadStateAsync(): this is a good way to force a “resync” with underlying DB changes.
    • Alternatively, the grain can use a timer to re-read data from backing storage periodically, based on suitable “staleness” decisions for the application. Example: a Content Cache grain
  • Grains should call State.WriteStateAsync() whenever they change data in the grain’s state object
    • Grains typically call State.WriteStateAsync() at the very end of a grain method, and return the write promise.
    • The storage provider “could” try to batch writes for efficiency
    • Alternatively, grains might use a timer to only write updates periodically: the application can decide how much “eventual consistency” / staleness it allows – ranging from immediate/none to several minutes.
  • Each grain class can only be associated with one storage provider.
    • The particular provider to use for a grain is defined with the [StorageProvider(ProviderName="name")] attribute.
    • The silo config file needs a <StorageProvider> entry with the corresponding name.
    • If you need more throughput than one storage account can provide, you could use a different storage account for each grain type.
    • Data is stored in binary format in a single (Azure Table) cell using the efficient Orleans serializer.
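Putting the pieces above together, following the (older) GrainBase<T> API this post describes — names changed in later Orleans versions, and “AzureTableStore” is a made-up provider name that must match a <StorageProvider> entry in the silo config:

```csharp
using System.Threading.Tasks;
using Orleans;
using Orleans.Providers;

// Fields declared here end up in the grain's persisted state.
public interface ICounterState : IGrainState
{
    int Value { get; set; }
}

public interface ICounterGrain : IGrainWithGuidKey
{
    Task Increment();
}

[StorageProvider(ProviderName = "AzureTableStore")]
public class CounterGrain : GrainBase<ICounterState>, ICounterGrain
{
    public async Task Increment()
    {
        State.Value += 1;
        // Write at the very end of the method and return the write promise.
        await State.WriteStateAsync();
    }
}
```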

How about dealing with failure of storage operations?

  • Either grains or storage providers can await storage operations and retry any failures if desired.
  • If unhandled, the failure will be propagated back to the caller / client as a broken promise.
  • There is currently no concept of activations getting destroyed automatically if a storage operation fails [except the initial read]
  • Built-in storage providers do not retry failing storage operations by default.
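Since the built-in providers don’t retry, a grain can wrap the write itself; this is a sketch of a method inside a persistent grain, and the attempt count and delays are arbitrary:

```csharp
// Inside a persistent grain class (State comes from the grain base class).
public async Task SaveWithRetry()
{
    const int maxAttempts = 3;
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            await State.WriteStateAsync();
            return;
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            // Back off briefly and try again; on the last attempt the
            // exception propagates to the caller as a broken promise.
            await Task.Delay(TimeSpan.FromMilliseconds(100 * attempt));
        }
    }
}
```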

Splitting grains

If I have a grain with a lot of data (for example named ProductGrain), does it make sense to split it into multiple grains like ProductGrain, ProductCommentsGrain, and ProductReviewsGrain?
It can make sense, but remember that each grain call is effectively a call that may cross silos; this can be minimized with PreferLocal placement.
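A sketch of the split (the grain interfaces and list types are hypothetical): keying all three by the same product id and using the [PreferLocalPlacement] attribute keeps most cross-grain calls inside one silo.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Orleans;
using Orleans.Placement;

public interface IProductGrain : IGrainWithGuidKey { Task<string> GetName(); }
public interface IProductCommentsGrain : IGrainWithGuidKey { Task<List<string>> GetComments(); }
public interface IProductReviewsGrain : IGrainWithGuidKey { Task<List<string>> GetReviews(); }

// Prefer activating next to the caller, so ProductGrain -> comments/reviews
// calls usually stay inside the same silo.
[PreferLocalPlacement]
public class ProductCommentsGrain : Grain, IProductCommentsGrain
{
    private readonly List<string> _comments = new List<string>();
    public Task<List<string>> GetComments() => Task.FromResult(_comments);
}
```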

Grain calling grain

If you have two grains and you want them to stay in sync with each other, then you need some form of coordination. For example, you could use transactions. I think it’s likely that you can solve it without transactions, though.
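For example, a hypothetical OrderGrain coordinating with an InventoryGrain without transactions, by checking the reservation result and failing the call if it can’t proceed:

```csharp
using System;
using System.Threading.Tasks;
using Orleans;

public interface IInventoryGrain : IGrainWithStringKey
{
    Task<bool> TryReserve(Guid itemId, int quantity);
}

public interface IOrderGrain : IGrainWithGuidKey
{
    Task Checkout(Guid itemId, int quantity);
}

public class OrderGrain : Grain, IOrderGrain
{
    public async Task Checkout(Guid itemId, int quantity)
    {
        var inventory = GrainFactory.GetGrain<IInventoryGrain>("warehouse-1");

        // The inventory grain is the single writer for its own stock,
        // so a boolean reply is enough to keep the two grains consistent.
        if (!await inventory.TryReserve(itemId, quantity))
            throw new InvalidOperationException("Item out of stock");
    }
}
```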

Does a grain exist?

What’s the best way to stop someone from using the web API to create a huge amount of grain states in the database? Or to validate/whitelist a reference to a grain (e.g., a Guid)?
Let’s say you have a UserGrain and someone requests a UserGrain that doesn’t have state in the database. How do you make sure its state is not saved to the database, and then return an error from that web API request?

  • You could introduce a UserManagerGrain which keeps track of existing users, either by keeping the existing users in its state, or by querying some external store.
  • Or you could add a boolean to the state of the UserGrain, something like IsCreated, which you set once it is created, and check it in every method of the UserGrain. Or implement it using an IGrainCallFilter which checks the IsCreated property and throws an exception when it is not set. You may find this thread on Stack Overflow interesting.
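A sketch of the second option, using the newer Grain<TState> API; UserGrain and IsCreated come from the question above, while the method shapes are mine:

```csharp
using System;
using System.Threading.Tasks;
using Orleans;

public interface IUserGrain : IGrainWithGuidKey
{
    Task Create(string name);
    Task<string> GetName();
}

public class UserState
{
    public bool IsCreated { get; set; }
    public string Name { get; set; }
}

public class UserGrain : Grain<UserState>, IUserGrain
{
    public async Task Create(string name)
    {
        State.IsCreated = true;
        State.Name = name;
        await WriteStateAsync(); // persist only on explicit creation
    }

    public Task<string> GetName()
    {
        // Guard every method: a made-up id never gets state written for it.
        if (!State.IsCreated)
            throw new InvalidOperationException("User does not exist");
        return Task.FromResult(State.Name);
    }
}
```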

For curiosity’s sake I did some tests and found that the former is slower than the latter. The test was made on a crappy machine, running in a VM with a single silo, with the debugger attached in Visual Studio and a local Azure Emulator.

The test consists of a check over 100 existing items and 100 non-existing items, with 100 concurrent threads. The in-memory-list approach takes 4 times longer (an average of 276 ms vs 67 ms). Don’t treat these numbers as absolute execution times: do remember my very poor (and unrealistic) execution context.

Update 2020.08.20: thanks to the PR “Added ‘RecordExists’ flag to persistent store so that grains can detect if a record already exists”, it is now possible to know whether a grain has a persisted state.
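With that flag, a grain using the IPersistentState<T> pattern can check existence without a marker field; “userState” and “AzureTableStore” are made-up names here, and the exact API surface depends on your Orleans version:

```csharp
using System.Threading.Tasks;
using Orleans;
using Orleans.Runtime;

public class UserState
{
    public string Name { get; set; }
}

public interface IUserGrain : IGrainWithGuidKey
{
    Task<bool> Exists();
}

public class UserGrain : Grain, IUserGrain
{
    private readonly IPersistentState<UserState> _user;

    public UserGrain(
        [PersistentState("userState", "AzureTableStore")] IPersistentState<UserState> user)
        => _user = user;

    // True only if a record was ever written for this grain id.
    public Task<bool> Exists() => Task.FromResult(_user.RecordExists);
}
```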
