Ok, first thing to understand is that IPFS will not download any data unless you ask for that data. So if you don't want to download some content, don't open the link containing that information. It is similar to loading a website in that case. If you don't want to load a webpage, don't type the url into your browser.
If you load the content, it will stay on your node for a while, until it is removed by garbage collection. If you want to keep content on your node for longer, you should 'pin' it. This will keep it from being removed.
Due to the nature of content addressing, it is hard to know in advance what is 'illicit' material. There are no whitelists or blacklists that I know of.
I can answer your first question though:
- There is a maximum storage size you can set for IPFS. It is set in .ipfs/config under Datastore.StorageMax.
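If you want to check that limit from a script, something like the following Python sketch should work. It assumes the config is the JSON file that go-ipfs writes to ~/.ipfs/config and that the value sits under the Datastore section; the function name is made up for illustration.

```python
import json
from typing import Optional


def storage_max_from_config(config_text: str) -> Optional[str]:
    """Return the Datastore.StorageMax value from IPFS config JSON, or None if unset."""
    config = json.loads(config_text)
    # Assumption: the limit is stored as a string such as "10GB" under "Datastore".
    return config.get("Datastore", {}).get("StorageMax")


# Illustrative config fragment, not a complete IPFS config.
sample = '{"Datastore": {"StorageMax": "10GB"}}'
print(storage_max_from_config(sample))
```

To inspect a real node, pass in the contents of ~/.ipfs/config instead of the sample string. Keep in mind this is the global repository limit, not a per-file limit.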
Thanks, but this didn't really solve my problem. If you are running a service that will display IPFS content (such as steemit) then one cannot simply "avoid downloading a hash", it happens at user request.
The maximum storage size is a global limit. I wish to limit the maximum FILE size: I should be able to store 100 GB of 2 MB images, but not 300 MB movies or 3 MB songs.
Also I would rather not have a web request trigger a 1GB download from IPFS.
It depends on how you run your service. If you run an HTTP gateway of your own, then yes, it is a hard job to filter that content. But you have the same problem with regular HTTP. If users run their own IPFS node, you don't store any content.
You should be able to estimate the size of the content behind a hash by getting the object and counting the number of links in it. The number of links times the maximum data size per block should give a (very) rough size indication, but one that is close enough to distinguish between small and large files.
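The arithmetic behind that estimate can be sketched in Python as below. This is only a rough sketch: it assumes the default chunk size of 256 KiB, and the function names are hypothetical. Very large files may use more than one level of links, in which case counting only the root object's links will undercount.

```python
DEFAULT_CHUNK_SIZE = 256 * 1024  # assumption: default chunk size of 256 KiB


def estimate_size(num_links: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Rough upper bound, in bytes, for a file whose root object has num_links links."""
    if num_links < 0:
        raise ValueError("num_links must be non-negative")
    # An object with no links holds its data in a single block,
    # so it is at most one chunk in size.
    return max(num_links, 1) * chunk_size


def looks_too_large(num_links: int, max_bytes: int,
                    chunk_size: int = DEFAULT_CHUNK_SIZE) -> bool:
    """True if the rough estimate exceeds the per-file limit max_bytes."""
    return estimate_size(num_links, chunk_size) > max_bytes


two_mib = 2 * 1024 * 1024
print(looks_too_large(4, two_mib))   # 4 links, about 1 MiB
print(looks_too_large(12, two_mib))  # 12 links, about 3 MiB
```

The estimate errs on the high side for small files, because the last chunk is usually only partly filled. That is acceptable if the goal is just to separate small files from large ones.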
You can also use
ipfs file ls ipfs/path
to get the size of the individual files.
Try
ipfs object stat <hash>
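A service could use that command to check the size before fetching anything large. The Python sketch below assumes the command prints "Key: value" lines that include a CumulativeSize field, and that an ipfs binary is on the PATH. The function names and the sample output are illustrative, not taken from a real node.

```python
import subprocess
from typing import Dict


def parse_object_stat(output: str) -> Dict[str, int]:
    """Parse 'Key: value' lines from `ipfs object stat` into a dict of ints."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def within_limit(stat_output: str, max_bytes: int) -> bool:
    """True only if the reported CumulativeSize is known and at most max_bytes."""
    size = parse_object_stat(stat_output).get("CumulativeSize")
    # A missing size is treated as too large, so unknown content is rejected.
    return size is not None and size <= max_bytes


def object_stat(ipfs_hash: str, timeout: float = 30.0) -> str:
    """Run `ipfs object stat <hash>` and return its text output."""
    result = subprocess.run(
        ["ipfs", "object", "stat", ipfs_hash],
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return result.stdout


# Made-up sample output, used here so the sketch runs without a node.
sample = "NumLinks: 2\nBlockSize: 108\nLinksSize: 106\nDataSize: 2\nCumulativeSize: 524500\n"
print(within_limit(sample, 2 * 1024 * 1024))
```

In a real service you would call within_limit(object_stat(some_hash), limit) before serving the content, and refuse the request when it returns False.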