IPFS | Data Persistence For Web Content

in Programming & Dev, 4 years ago

What Is IPFS?

IPFS is a distributed filesystem. It's basically a peer-to-peer storage engine that allows content to be stored and cached on nodes all over the world.

Think of a large website like Amazon that stores copies of their files in multiple locations all over the globe. IPFS does something similar and can be used by anyone, not just one of the largest companies on the planet. This allows your content to be served by a globally distributed network and without having to run a massive amount of your own server infrastructure.

Why IPFS?

IPFS is great for sharing files with people online and for hosting files posted to the web, as well as files posted to blockchains like Hive, where images are links to files hosted on a server somewhere. Due to the distributed nature of IPFS, your files don't have to be hosted on a dedicated server. As long as one node in the network has a copy of the file, it can be accessed.

Even better: the only time you need to have your IPFS node online is if no other nodes on the IPFS network have a copy of the file. You can even run your node on your home computer using the desktop application.

Uploading Data

The various IPFS applications will allow you to upload data (called an Import). Once you've added data to IPFS it will be accessible via the IPFS network as long as your node is online or it has been cached by another node on the network.

IPFS has a concept called Pinning that means data will be persisted long term by your node. You will want to pin the data you've added to IPFS so it's available via your node at a minimum.
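If you prefer the command line over the UI, importing and pinning can be sketched roughly as below. This assumes the ipfs CLI is installed and can reach your node; the function names and the photo.jpg file name are placeholders of mine, not part of IPFS.

```shell
#!/bin/bash
# Sketch: import a file into the local node and confirm it is pinned.
# Assumes the `ipfs` CLI is on PATH; the function names are illustrative only.

# Add a file and print only its CID (-Q prints just the final hash).
# `ipfs add` pins what it adds by default.
ipfs_import() {
    ipfs add -Q "$1"
}

# Pin a CID that is already known to the network (recursive by default).
ipfs_pin() {
    ipfs pin add "$1"
}

# Succeed only if the CID is listed among the node's recursive pins.
ipfs_is_pinned() {
    ipfs pin ls --type=recursive -q | grep -qxF "$1"
}

# Example usage (photo.jpg is a placeholder):
#   cid=$(ipfs_import photo.jpg)
#   ipfs_is_pinned "$cid" && echo "pinned: $cid"
```

Keeping the CID that the import prints is the important part: it is what you pin, share, and later hand to a pinning service.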

Uploading data into IPFS is straightforward. The UI is easy to follow, and the main IPFS site has documentation if you need additional information on how to Import and Pin data.

Long Term Data Persistence

Long term you'll either need to run your own node that's always online or use a pinning service that will pin your files to ensure they are always online. Other nodes in the network will cache your content but will not store it long term unless they pin your data. This is why I say you need an always online node, a pinning service, or both.

The good news is you don't need to use a pinning service: you can run a node at home easily and with minimal cost as long as you have an always online internet connection.

That said, there is nothing wrong with using a pinning service. They may be the best option for keeping your content always online if you don't have reasonable, always online internet available. Pinning services also allow you to use IPFS without having to deal with running a node on your own hardware. Just keep in mind that a lot of internet-based service providers have come and gone over the years. Running a local node is a good way to ensure your data stays in IPFS over time.

A good approach is to store your content in your local node and also use a pinning service, which will store a pinned copy on their servers. This has the benefit of not relying on the pinning service long term while keeping your content always online. The pinning service will likely handle the majority of situations when your content is accessed, while your local node can ensure the content is available if the pinning service goes offline.

By using this approach you still have your data in the IPFS network via your node (even if it's not always online) and you aren't tied to the pinning service long term. You can go to another pinning service at any time or if your current pinning service goes offline. When switching pinning services you'd only need to have your node online while the new pinning service caches the content from your local node.
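For services that support the IPFS Pinning Service API, the local-node-plus-service approach can be sketched from the command line as below. This assumes your node's version ships the ipfs pin remote commands; the service nickname, endpoint and access key are placeholders you would get from your provider, and the function names are my own.

```shell
#!/bin/bash
# Sketch: mirror a locally pinned CID to a remote pinning service.
# Assumes an `ipfs` CLI that includes the `ipfs pin remote` commands.

# Register a service once. Arguments: nickname, API endpoint, access key
# (all three are placeholders supplied by your provider).
pin_service_register() {
    ipfs pin remote service add "$1" "$2" "$3"
}

# Ask the service to pin a CID. Arguments: nickname, a label, the CID.
# Keep your own node online until the service has fetched the data.
pin_service_pin() {
    ipfs pin remote add --service="$1" --name="$2" "$3"
}

# List what the service reports as pinned.
pin_service_list() {
    ipfs pin remote ls --service="$1"
}

# Example usage (all values are placeholders):
#   pin_service_register mysvc https://pinning.example/api "$ACCESS_KEY"
#   pin_service_pin mysvc post-image "$cid"
```

Moving to a different service later is the same two calls with a new nickname and endpoint, which is what keeps you from being tied to one provider.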

Posting IPFS Content To The Web

To use the IPFS content you've uploaded and pinned (locally, via pinning service, or both) you can use the main IPFS public gateway URL for traditional web access and sharing. The URL will take the form https://ipfs.io/ipfs/[content_id] where [content_id] is the CID of the content you've added to IPFS (you can find the CID in the IPFS applications).
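As a small sketch, the URL form above can be built with a helper like the one below. The function name is hypothetical, and the optional second argument is only there so a different gateway can be swapped in.

```shell
#!/bin/bash
# Sketch: build a public gateway URL from a CID.
# gateway_url is an illustrative helper, not an IPFS command.

# Arguments: the CID, and optionally a gateway base URL
# (defaults to the main ipfs.io gateway).
gateway_url() {
    gateway="${2:-https://ipfs.io}"
    # Strip one trailing slash from the gateway so the result has no "//"
    printf '%s/ipfs/%s\n' "${gateway%/}" "$1"
}

# Example usage:
#   gateway_url "$cid"
#   gateway_url "$cid" https://gateway.example/
```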

You can use this URL to add the content to posts on websites, Hive, and more. I strongly recommend verifying that the URL works before posting it online. Verifying the URL will also cache the content on the gateway's IPFS nodes. When the gateway first goes to look up your content it won't have a local copy, so the initial load can be slow. Once the content is cached it should load much faster.
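One way to do that verification from a terminal is a quick curl request, sketched below. It assumes curl is installed; the function name and the 120 second default timeout are my own choices, picked to give a cold gateway time to find the content.

```shell
#!/bin/bash
# Sketch: check that a gateway URL actually serves the content.
# check_gateway_url is an illustrative helper; it assumes curl is installed.

# Arguments: the URL, and optionally a timeout in seconds (default 120).
# Returns success only if the gateway answers without an HTTP error.
check_gateway_url() {
    # -f: fail on HTTP errors, -sS: quiet but still print errors,
    # -L: follow redirects, -o /dev/null: discard the downloaded body
    curl -fsSL --max-time "${2:-120}" -o /dev/null "$1"
}

# Example usage:
#   check_gateway_url "https://ipfs.io/ipfs/$cid" && echo "ready to post"
```

Because the request downloads the content through the gateway, a successful check also warms the gateway's cache.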

This may seem like a single point of failure, but the CID is unique to your content across the entire IPFS network and is not tied to any one gateway. If the public gateway in the URL (like the one above) goes down, users will still be able to find your content via IPFS. All they need is the CID from the URL and an IPFS browser or another gateway.
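To make that concrete, here is a rough sketch that pulls the CID out of a gateway URL and re-points the link at another gateway. Both function names are hypothetical, gateway.example is a placeholder, and the sketch only handles path-style URLs of the form shown above.

```shell
#!/bin/bash
# Sketch: recover the CID from a gateway URL and switch gateways.
# Only handles path-style URLs (https://<gateway>/ipfs/<cid>[/path]).

# Print just the CID from a gateway URL.
cid_from_url() {
    rest="${1#*/ipfs/}"      # drop everything up to and including /ipfs/
    rest="${rest%%[/?#]*}"   # drop any trailing path, query or fragment
    printf '%s\n' "$rest"
}

# Rewrite a gateway URL to use a different gateway, keeping the CID
# and anything that follows it. Arguments: old URL, new gateway base.
switch_gateway() {
    printf '%s/ipfs/%s\n' "${2%/}" "${1#*/ipfs/}"
}

# Example usage:
#   cid_from_url "https://ipfs.io/ipfs/$cid"
#   switch_gateway "https://ipfs.io/ipfs/$cid" https://gateway.example
```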

Please note: there are many different public gateways and I used the main ipfs.io gateway above. You can find other gateways on the main IPFS website if you prefer to not use the gateway I have referenced.

Server Nodes

If you do NOT plan to deploy an IPFS node on a remote server, please skip this section.

If you're like me and have a server connected to the internet, you may want to set up an IPFS node that's not local but is always online. The following subsections outline setting up a server running Docker (Kubernetes should work too) to be an IPFS node.

The CPU and RAM requirements of IPFS are pretty low so this can be deployed even on small servers.

IPFS Daemon Inside A Container

The following bash script will launch an IPFS server container with the necessary public ports mapped globally and the management ports mapped to localhost. You will want to ensure your firewall has the necessary ports open to facilitate discovery of your node and joining the main IPFS network.

A couple of things to note with this script that you may want to change:

  • I run IPFS on a dedicated Docker network called services with an IP range of 172.16.16.0/24. You will likely want to change this
  • I run Traefik on my server and it should be disabled for the IPFS server
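  • I use /var/ to store my containers' persistent data

#!/bin/bash

# Fetch the latest IPFS daemon image
docker pull ipfs/go-ipfs

# Remove any existing container named ipfs so the name can be reused
docker rm -f ipfs

# Port 4001 (swarm) is published on all interfaces so other nodes can connect.
# Ports 5001 (API) and 8080 (gateway) are published on localhost only.
# /var/ipfs on the host holds the node's data and config across restarts.
# Add -d after "docker run" if you want the container to run in the background.
docker run \
    --name ipfs \
    --net services \
    --ip "172.16.16.10" \
    -p 4001:4001/tcp -p 4001:4001/udp -p 127.0.0.1:5001:5001 -p 127.0.0.1:8080:8080 \
    --restart unless-stopped \
    -e TZ=UTC \
    -e DEBUG=1 \
    -l "traefik.enable=false" \
    -v /var/ipfs:/data/ipfs \
    ipfs/go-ipfs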
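
The launch script expects the services network to exist already. If you are starting from scratch, a sketch like the following would create it first; the function name is mine, and the network name and subnet are simply the values used above, so change them to match your setup.

```shell
#!/bin/bash
# Sketch: create the Docker network used by the launch script if it is missing.
# ensure_network is an illustrative helper, not a Docker command.

# Arguments: network name and subnet (default to the values used above).
ensure_network() {
    name="${1:-services}"
    subnet="${2:-172.16.16.0/24}"
    # `docker network inspect` fails when the network does not exist yet
    if ! docker network inspect "$name" >/dev/null 2>&1; then
        docker network create --subnet "$subnet" "$name"
    fi
}

# Example usage:
#   ensure_network
#   ensure_network services 172.16.16.0/24
```

A user-defined network with an explicit subnet is what allows the fixed --ip address in the launch script.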

Server config

Once the container is up and running, you'll want to run ipfs config profile apply server inside the container to apply the server configuration profile. This will set up your server so it does not spam the data center network with discovery traffic. It will also apply some server-specific optimizations that the IPFS developers feel are appropriate. You really do need to apply this profile!
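From the Docker host, running that command inside the container can be sketched with docker exec. This assumes the container is running under the name ipfs, as in the launch script; the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: apply the server profile inside the running container.
# apply_server_profile is an illustrative helper.

# Argument: container name (defaults to "ipfs", as in the launch script).
apply_server_profile() {
    docker exec "${1:-ipfs}" ipfs config profile apply server
}

# Example usage:
#   apply_server_profile
```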

Once the profile has been applied, shut down the daemon and make the following config changes:

Datastore.StorageMax "20GB"
Datastore.GCPeriod "12h"
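
If you would rather not edit the config file by hand, the same two values can be set with ipfs config, sketched below. The function name is hypothetical and the sketch assumes it runs wherever the ipfs CLI can see the node's repo, for example inside the container. As the IPFS docs describe it, GCPeriod only matters when the daemon runs with automatic garbage collection enabled.

```shell
#!/bin/bash
# Sketch: set the datastore limits with the ipfs CLI.
# set_datastore_limits is an illustrative helper; run it where the
# `ipfs` CLI can reach the node's repo (for example inside the container).

# Arguments: storage limit and GC period (default to the values above).
set_datastore_limits() {
    ipfs config Datastore.StorageMax "${1:-20GB}" &&
    ipfs config Datastore.GCPeriod "${2:-12h}"
}

# Example usage:
#   set_datastore_limits
#   ipfs config Datastore.StorageMax   # prints the value now in the config
```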

Once the above configuration is applied, you can restart the container. When it is back online you have a functional IPFS server.

Web Admin

To work with the node and manage the data stored within it, you will need to tunnel ports 5001 and 8080 to your local machine. Once the tunnels are set up you can navigate to http://127.0.0.1:5001/ipfs/bafybeif4zkmu7qdhkpf3pnhwxipylqleof7rl6ojbe7mq3fzogz6m4xk3i/#/ in a browser to work with your node. The server web UI is actually content stored on IPFS, which is why the URL looks strange compared to more traditional URLs.
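One common way to set up those tunnels is SSH local port forwarding, sketched below. It assumes you have SSH access to the server; the function name and the user@server.example address are placeholders.

```shell
#!/bin/bash
# Sketch: forward the node's API and gateway ports to your local machine.
# open_ipfs_tunnels is an illustrative helper; it assumes SSH access to the server.

# Argument: the SSH destination, for example user@server.example.
open_ipfs_tunnels() {
    # -N: do not run a remote command, only forward ports
    # -L: local port 5001/8080 -> 127.0.0.1:5001/8080 on the server
    ssh -N \
        -L 5001:127.0.0.1:5001 \
        -L 8080:127.0.0.1:8080 \
        "$1"
}

# Example usage (stays in the foreground; press Ctrl-C to close the tunnels):
#   open_ipfs_tunnels user@server.example
```

Because the launch script only publishes these two ports on the server's localhost, a tunnel like this is what makes them reachable from your own machine without exposing them publicly.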

The management UI will let you configure the node, add data, delete data and pin data. It's a full UI for working with your node.

At this point you can fully interact with your node as long as the two ports above are tunneled.

Going Further

There are some additional features that IPFS provides beyond basic data storage. It can host a website for you, supports DNS records that point to IPFS content, and more.

This guide is focused on persistent data, content and media type assets. There is more you can do with IPFS beyond this base use case.

Links

Below you'll find all of the links I cataloged while researching IPFS and how to run my own node as well as allowing traditional web users to view the data I've added to IPFS.

Not all of these links will be applicable to you but they should serve as a good starting point for more information.

This was a really nice explanation of IPFS. Pretty cool stuff. I really haven't gotten into IPFS and might explore it some more in the future. I know it's getting more and more popular with time.

Definitely. I spent a bit of time sorting it out and it's proving to be a great way for media and assets attached to posts to be persistent like the containing post.

My biggest worry with posts that have images was making sure I'm not relying on an external hosting provider that could go offline, with no ability to adjust past posts.

IPFS solves that quirk really well

IPFS is just so cool :-)

Agreed, now that I have sorted out some good ways to use it to persist media on my posts I plan on building it into my workflow for anything that benefits from images and media assets.

So that is basically anything :)
