I don't think using the publicly available IPFS network on its own is feasible, because there's no guarantee that images will be served. But it might be reasonable to run a semi-private IPFS network operated by Hive enthusiasts, and we're considering that as an option. It's just the beginning of an idea, though; no analysis has been done on it yet.
For me to support this proposal, I'd like to see some way to reduce the centralization risk. Putting images on IPFS and making the currently centralized infrastructure open source would be a good start.
The infrastructure code is already open source (it's in the main Hive repo).
Just putting it on the public IPFS system won't really do anything useful, IMO, since I don't think the data will be served up at the necessary rate for a useful experience. But anyone is welcome to upload the data there.
What may work better, though, is to set up a separate IPFS network that can be supported by Hive enthusiasts (or a similar open overlay network). One of our plans is to investigate which technology is best for such a network.
As mentioned in the post, some of the other most promising technologies are GlusterFS and Ceph. A few other possibilities to investigate were mentioned by commenters.
The key requirement is high availability of the data, which means we need 1) reliable servers and 2) data distributed redundantly in such a way that one or two of the servers going down won't make any of the data inaccessible.
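To make requirement 2) concrete, here is a minimal Python sketch. It is not how GlusterFS, Ceph, or IPFS actually place data; it just assumes each image is copied to a fixed number of servers picked by rendezvous hashing, then brute-force checks whether every image stays readable when any `max_down` servers fail. The server names, image keys, and function names are all made up for illustration.

```python
import hashlib
from itertools import combinations


def place_replicas(key, servers, copies=3):
    """Pick `copies` distinct servers for `key` via rendezvous hashing."""
    def score(server):
        digest = hashlib.sha256(f"{server}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    # Highest-scoring servers for this key hold its replicas.
    return sorted(servers, key=score, reverse=True)[:copies]


def survives_failures(keys, servers, copies, max_down):
    """True if every key stays readable whichever `max_down` servers fail."""
    placement = {k: set(place_replicas(k, servers, copies)) for k in keys}
    for down in combinations(servers, max_down):
        down_set = set(down)
        # A key is lost only if all of its replicas sit on failed servers.
        if any(replicas <= down_set for replicas in placement.values()):
            return False
    return True


servers = [f"node{i}" for i in range(5)]         # hypothetical servers
images = [f"image-{i}.png" for i in range(100)]  # hypothetical image keys

print(survives_failures(images, servers, copies=3, max_down=2))  # True
print(survives_failures(images, servers, copies=2, max_down=2))  # False
```

In this toy model, three copies are the minimum needed to tolerate two simultaneous server failures: with only two copies, there is always some pair of servers whose loss takes an image offline.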
Thanks @blocktrades for the info. I am close to supporting this proposal!
The infrastructure code is already open source
Are you talking about the hive/imagehoster repository? Would be great to have a list of repositories and potentially even the commit range that this proposal covers.
Just putting it on the public IPFS system won't really do anything useful, IMO
My primary long-term concern is to avoid centralization risk. For example, I want to avoid having the availability of all images depend on a single entity. If that entity disappears, it is essential that other actors can quickly mirror the images and that hosting can be migrated with minimal (ideally no) disruption.
What I like about IPFS is twofold:
1) Content addressing: ideally the blockchain could store unlimited content, including the full images. The next best thing is content addressing, since the link guarantees the image's authenticity.
2) Anyone can host: while this doesn't guarantee that someone will host, it does mean hosting no longer depends on a single entity and can transition seamlessly between multiple entities.
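Both properties can be shown in a few lines of Python. This is a simplified sketch, not IPFS itself: it uses a plain SHA-256 hex digest as the address (real IPFS CIDs use multihash encoding and chunking), and it models each host as an in-memory dict. The names `content_address` and `fetch_verified` and the example mirrors are hypothetical.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive the address from the bytes themselves (simplified: SHA-256 hex)."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(address, mirrors):
    """Return the first copy, from any mirror, whose hash matches the address."""
    for mirror in mirrors:
        data = mirror.get(address)
        # Reject missing or altered copies; the address itself is the check.
        if data is not None and content_address(data) == address:
            return data
    raise LookupError(f"no mirror served valid data for {address}")


image = b"\x89PNG...example image bytes"
address = content_address(image)  # this is what a post would link to

offline_host = {}                           # original host has disappeared
bad_mirror = {address: b"tampered bytes"}   # serves altered content
good_mirror = {address: image}              # any third party with a copy

assert fetch_verified(address, [offline_host, bad_mirror, good_mirror]) == image
```

Because the link is the hash, the reader doesn't need to trust whoever serves the bytes, and a new host can take over without any links changing.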
To what extent does the current / proposed implementation achieve these goals?
I'm not tied to IPFS. I just know less about the alternatives.
I see that realistically a single entity likely has to bear the brunt of the hosting demand for now. I just want to make sure it's designed in such a way so that the ecosystem is not forever dependent on that entity for hosting.
Any updates in this regard?
Yes, I was talking about the hive/imagehoster repo.
We have made some changes to the codebase (not yet committed, AFAIK) to account for performance changes required by the move away from Amazon S3 storage, but most of the work so far has been devops rather than coding: server setup and experimentation, e.g. setting up the infrastructure described in the post, such as image caching, haproxy, imagehoster, fileservers, etc.
Our plan is definitely to move to a decentralized storage system, but until more work is done, I don't know what form it will take. Figuring that out is the big remaining task, and it will require a lot more investigation and experimentation. In the long run, we don't want to be responsible for all the load either.