RE: BOINC User XML data serialization comparison

in #gridcoin, 6 years ago

The only downside I can think of is that it's binary so it's more difficult to read off the air.

Do you think that's possible via FlatBuffers or gRPC?

To further compress the binary serialization, we could use the 16-byte binary representation of each CPID instead of its hexadecimal form. I suspect that's where a lot of the storage goes.

Do you have more details on how this can be done in Python? Do you mean compressing the string or just converting the CPID from a string to binary?

The files would be far smaller if the CPID was omitted, relying on userId instead & perhaps constructing a separate index for userId:CPID for quick lookup.

Do you think that's possible via FlatBuffers or gRPC?

Never heard of those :)

Do you have more details on how this can be done in Python? Do you mean compressing the string or just converting the CPID from a string to binary?

Sure. Change User.cpid from string to bytes and assign using hex conversion:

>>> cpid = '5a094d7d93f6d6370e78a2ac8c008407'
>>> len(cpid)
32
>>> cpid.decode('hex')
'Z\tM}\x93\xf6\xd67\x0ex\xa2\xac\x8c\x00\x84\x07'
>>> len(cpid.decode('hex'))
16

It does make the field more tedious to use, but there should be a significant reduction in size: each CPID goes from 32 characters to 16 bytes.
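
The session above is Python 2; in Python 3 `str.decode('hex')` no longer exists. A rough equivalent, assuming the CPID is always a 32-character hex string (the variable names here are only illustrative), might look like this:

```python
# Python 3 sketch of the same conversion; variable names are illustrative.
cpid_hex = "5a094d7d93f6d6370e78a2ac8c008407"

# Hex string -> raw bytes (two hex characters per byte).
cpid_raw = bytes.fromhex(cpid_hex)

# Raw bytes -> hex string, for display or lookups.
cpid_back = cpid_raw.hex()

print(len(cpid_hex), len(cpid_raw))  # lengths before and after conversion
print(cpid_back == cpid_hex)         # the round trip should be lossless
```

Converting back with `.hex()` only at the point where the CPID is displayed or compared against text is what keeps the stored form at 16 bytes.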