Dan, your FracPack format introduces some fascinating concepts in binary serialization, particularly around efficiency and compatibility. I’m curious about the challenges you faced in balancing performance optimization with the need for forwards and backwards compatibility in FracPack. Could you share some insights into how you navigated these trade-offs during the development process?
Forward and backward compatibility is largely a function of making as many fields as possible in your data structures "optional". This is the approach used by every format I have seen. The cost of making something optional is the extra data you need to encode to indicate the presence of the optional fields.
The smallest way to indicate an optional field is a single bit, but if a field is present then you need to know where to find it, which requires an offset (4 bytes). FracPack adds no extra overhead for making dynamically sized types optional, because the struct's heap offset pointer can signal absence in the pointer itself. Likewise, empty strings and vectors have no extra overhead and use the same amount of data as an optional.
For speed of zero-copy access to fields, it is best to know a constant offset to the field you are looking for rather than having to first read a variable length bitfield.
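To illustrate the difference, here is a minimal Python sketch comparing the two lookup styles. Both layouts are simplified assumptions made up for this example (a one-byte presence bitmap and plain 4-byte little-endian values), and the function names are hypothetical; neither is the actual FracPack wire format.

```python
import struct

def read_fixed_slot(buf: bytes, index: int) -> int:
    """Fixed layout: every field owns a 4-byte slot, so its position is 4 * index."""
    return struct.unpack_from("<I", buf, 4 * index)[0]

def read_bitmap_slot(buf: bytes, index: int):
    """Bitmap layout (hypothetical): one presence byte, then a u32 per present field."""
    bitmap = buf[0]
    if not bitmap & (1 << index):
        return None  # field is absent and occupies no space
    # The position depends on how many earlier fields are present,
    # so the bitmap must be read and counted before the value can be found.
    earlier = bin(bitmap & ((1 << index) - 1)).count("1")
    return struct.unpack_from("<I", buf, 1 + 4 * earlier)[0]

# Three fields: 10, absent, 30.
fixed = struct.pack("<III", 10, 0, 30)               # 12 bytes, absent slot stored as 0
bitmap = bytes([0b101]) + struct.pack("<II", 10, 30)  # 9 bytes, absent field costs 1 bit
```

The bitmap form is smaller, but every read pays for the extra lookup and bit count, while the fixed form can use an offset computed once ahead of time.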
So the tradeoff we made is that each null optional field uses 4 bytes, all 0, unless every field after it is also null, in which case we can truncate the fixed region so that each non-present trailing optional field uses 0 bytes.
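A rough Python sketch of that rule follows. It is an illustration under stated assumptions, not the real FracPack encoder: the 2-byte size header, the length-prefixed heap entries, the self-relative offsets, and the names `pack_optionals` and `read_optional` are all chosen just for this example.

```python
import struct

ABSENT = 0  # assumption for this sketch: an all-zero 4-byte slot means "not present"

def pack_optionals(fields):
    """Pack a list of optional byte strings into a fixed region of u32 slots plus a heap."""
    n = len(fields)
    while n > 0 and fields[n - 1] is None:
        n -= 1  # trailing absent fields are dropped and cost 0 bytes
    fixed_size = 4 * n
    fixed, heap = bytearray(), bytearray()
    for i in range(n):
        value = fields[i]
        if value is None:
            fixed += struct.pack("<I", ABSENT)  # interior null: 4 zero bytes
        else:
            slot_pos = 4 * i
            target = fixed_size + len(heap)  # where this value will sit in the body
            fixed += struct.pack("<I", target - slot_pos)  # offset relative to the slot
            heap += struct.pack("<I", len(value)) + value
    return struct.pack("<H", fixed_size) + bytes(fixed) + bytes(heap)

def read_optional(buf, index):
    """Read field `index` using only its precomputed slot position."""
    (fixed_size,) = struct.unpack_from("<H", buf, 0)
    slot_pos = 4 * index
    if slot_pos >= fixed_size:
        return None  # slot was truncated away, so the field is absent
    (rel,) = struct.unpack_from("<I", buf, 2 + slot_pos)
    if rel == ABSENT:
        return None
    start = 2 + slot_pos + rel
    (length,) = struct.unpack_from("<I", buf, start)
    return bytes(buf[start + 4 : start + 4 + length])

packed = pack_optionals([b"alice", None, b"hi", None, None])
```

Here the interior null (field 1) costs one 4-byte slot, while the two trailing nulls cost nothing: the buffer is byte-for-byte identical to packing only the first three fields. A reader built against a newer schema simply sees any slot past the end of the fixed region as absent.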
If data size is an issue, then a fast zero-compression algorithm like the one Cap'n Proto uses can all but eliminate the overhead; however, compressing and decompressing can slow down the first access, and decompression introduces potential security issues such as decompression bombs.
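As a sketch of the idea, the Python below implements a simplified zero-packing scheme loosely modelled on Cap'n Proto's packed encoding: each 8-byte word becomes a tag byte whose bits mark the non-zero bytes, followed by only those bytes. It deliberately omits the run-length special cases of the real encoding, so it is not wire-compatible, and the `max_size` guard is an assumption added here to show one way of bounding decompressed output.

```python
def pack_zeros(data: bytes) -> bytes:
    """Replace each 8-byte word with a tag byte plus its non-zero bytes."""
    if len(data) % 8:
        raise ValueError("input must be a multiple of 8 bytes")
    out = bytearray()
    for i in range(0, len(data), 8):
        tag = 0
        nonzero = bytearray()
        for bit, byte in enumerate(data[i : i + 8]):
            if byte:
                tag |= 1 << bit  # bit k set means byte k of the word is non-zero
                nonzero.append(byte)
        out.append(tag)
        out += nonzero
    return bytes(out)

def unpack_zeros(packed: bytes, max_size: int) -> bytes:
    """Inverse of pack_zeros; refuses to produce more than max_size bytes."""
    out = bytearray()
    pos = 0
    while pos < len(packed):
        tag = packed[pos]
        pos += 1
        for bit in range(8):
            if tag & (1 << bit):
                if pos >= len(packed):
                    raise ValueError("truncated input")
                out.append(packed[pos])
                pos += 1
            else:
                out.append(0)
        if len(out) > max_size:
            raise ValueError("decompressed size exceeds limit")
    return bytes(out)

# Two words: eight zero bytes (e.g. two null optional slots), then a sparse word.
raw = bytes(8) + bytes([1, 0, 0, 0, 0, 0, 0, 2])
small = pack_zeros(raw)
```

An all-zero word shrinks to a single tag byte, which is why null slots become nearly free. The cost is that the whole buffer must be unpacked before any field can be read at a fixed offset, and the reader has to cap how much output it is willing to produce.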
So for the sake of speed and constant time access to fields via pre-computed offsets, optional data has an overhead of 4 bytes. This overhead can be mitigated if you know you will never want to remove fields from version 1 of your data types and you always include or exclude all (or most) of your future extended fields.
I’ve noticed in your past blockchain projects, like what became Hive, the use of an ‘extensions’ array at the end of definitions, which provided flexibility for future enhancements, such as the addition of ‘beneficiaries’ in the ‘comment’ object. Has this experience influenced your approach to data structure design in FracPack? Is that what you're referring to by "extended fields," here?