ID generation

21 Jun 2024

Identifier tokens (IDs) are useful for referring to objects across systems.

Since separate systems should not be tightly coupled, one system might need to continue to refer to an object that is no longer retained in the other. Therefore generating IDs using a simple counter is not sufficient. Ideally, IDs should be unique across the lifespan of the system.

Many systems are designed to reguarly refresh their working state, perhaps daily. (Even if they retain objects across refreshes, they may implement the functionality by exporting and reimporting those objects. This can limit the ill effect of minor bugs over time.) For the sake of operational simplicity, we should avoid persisting an extra piece of ID related state across refreshes. We can instead use the timestamp of the refresh.

For high performance we can use an integer ID counter internally, but publish it as a string that includes a substring for the timestamp and for identifying the system instance itself, generated at startup. This string ID can also include characters to enable human operators to understand which type of object is being identified. For example, interally an object might be 12345 and externally it might be published as “GLCTS3_ WDGT_2024062118_12345”.

When multiple stateful systems work together, each should do its own ID generation. An ID from one system can be passed along by another system on related messages to make it easier for human operators to support, but this should be done in a separate field. For example, if the “Galactus” system receives widget messages from three different supplier systems, Galactus should have its own widget ID in the WidgetID field and the ID it receives from those systems passed along in the DownstreamWidgetID field. DownstreamWidgetID is not guaranteed to be unique.

References

https://stackoverflow.com/a/16518915 https://en.wikipedia.org/wiki/Snowflake_ID https://en.wikipedia.org/wiki/Universally_unique_identifier

RSS