As you can see here, some links in my blog posts written before the Hive fork point to steemit.com. It's time to replace them all.
Almost all blog posts written before the fork were written with apps that are not included in hivescript, which leads to problems with canonical URLs:
As can be seen here, my old post "update for beem: first release for HF 21" results in different canonical URLs on different front-ends. Search engines then treat this as duplicate content.
The post was written through palnet. As there is no entry for palnet in hivescript, the front-ends do not know how to build a proper canonical URL:
Fixing the mess
Fixing means:
- replacing all steemit, steempeak, ... links with relative links
- setting canonical_url for each post written before 2020-03-20, to fix canonical URLs.
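The canonical URL the script writes is built as front-end + "/" + category + "/@" + author + "/" + permlink, after any trailing slash is removed from the front-end. Here is a minimal sketch of that construction. build_canonical_url is a hypothetical helper name, not part of beem.

```python
def build_canonical_url(front_end: str, category: str, author: str, permlink: str) -> str:
    """Build a canonical URL with the same pattern the script uses (hypothetical helper)."""
    # The front-end base must not end with "/", otherwise the URL would contain "//".
    base = front_end.rstrip("/")
    return base + "/" + category + "/@" + author + "/" + permlink
```

For example, build_canonical_url("https://hive.blog/", "hive-139531", "holger80", "my-post") gives "https://hive.blog/hive-139531/@holger80/my-post".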
Small update
The script now uses relative links: when it finds a link to steemit.com ..., it replaces it with a relative link. A relative link looks like [holger80](/@holger80)
and [this post](/hive-139531/@holger80/how-to-fix-canonical-urls-and-links-in-your-pre-fork-posts)
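A relative link is simply the path part of the old URL. The sketch below uses Python's urllib.parse to turn an old front-end link into that path. It makes two simplifying assumptions. First, to_relative_link is a hypothetical helper. Second, it skips the on-chain lookup. The script itself asks beem for the post's category, because links such as steemit.com/@author/permlink do not contain one.

```python
from typing import Optional
from urllib.parse import urlparse

# Front-ends whose links the script replaces (taken from the script's check).
OLD_FRONT_ENDS = {"steemit.com", "steempeak.com", "busy.org", "partiko.app"}


def to_relative_link(url: str) -> Optional[str]:
    """Return the path of an old front-end link to a user or post, else None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Only rewrite links that point to an old front-end and contain "@author".
    if host not in OLD_FRONT_ENDS or "@" not in parsed.path:
        return None
    return parsed.path
```

Links to other sites (or to old front-end pages without an @author, such as /trending) are left untouched because the helper returns None for them.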
Small update 2
There are now three boolean parameters that can be used to set the following:
- replace_steemit_links: when True, steemit, ... links will be replaced
- use_relative_links: when True, relative links (starting with /) will be used
- add_canonical_url: when True, a canonical_url is added to the metadata
Small update 3
The same script can now also fix the canonical links on STEEM for all posts written before the fork.
When you want to use the script on STEEM:
- set target_blockchain = "steem"
When you want to use the script on HIVE:
- set target_blockchain = "hive"
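The script validates its header parameters in three ways. At least one action must be enabled, target_blockchain must be known, and canonical_url must not end with a slash. These checks can be sketched as a small dataclass. FixOptions is a hypothetical name. Raising ValueError instead of using assert is a swapped-in choice, made because asserts are skipped when Python runs with -O.

```python
from dataclasses import dataclass


@dataclass
class FixOptions:
    """Hypothetical container for the script's header parameters."""
    canonical_url: str = "https://hive.blog"
    replace_steemit_links: bool = True
    use_relative_links: bool = True
    add_canonical_url: bool = True
    target_blockchain: str = "hive"  # "hive" or "steem"

    def __post_init__(self) -> None:
        # At least one action must be enabled, as in the script.
        if not (self.replace_steemit_links or self.add_canonical_url):
            raise ValueError("enable replace_steemit_links or add_canonical_url")
        if self.target_blockchain not in ("hive", "steem"):
            raise ValueError("target_blockchain must be 'hive' or 'steem'")
        # The canonical url must not end with "/".
        self.canonical_url = self.canonical_url.rstrip("/")
```

For example, FixOptions(canonical_url="https://peakd.com/", target_blockchain="steem") stores "https://peakd.com" as the front-end.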
Python code
The following script uses beem and does exactly this.
beem can be installed with
pip install beem
or
conda install beem
Store the following as fix_canonical_urls_hive.py:
```python
#!/usr/bin/python
from beem import Hive, Steem
from beem.utils import addTzInfo
from beem.account import Account
from beem.comment import Comment
from beem.nodelist import NodeList
import time
from datetime import datetime
import getpass

if __name__ == "__main__":
    # Parameter
    canonical_url = "https://hive.blog"
    replace_steemit_links = True
    use_relative_links = True
    add_canonical_url = True
    target_blockchain = "hive"  # can be hive or steem
    # ----

    # at least one option must be true
    assert replace_steemit_links or add_canonical_url
    assert target_blockchain in ["hive", "steem"]
    # Canonical url must not end with /
    if canonical_url[-1] == "/":
        canonical_url = canonical_url[:-1]

    nodelist = NodeList()
    nodelist.update_nodes()
    test_run_answer = input("Do a test run? [y/n]")
    if test_run_answer in ["y", "Y", "yes"]:
        test_run = True
        print("Doing a test run on %s!" % target_blockchain)
    else:
        test_run = False

    # A test run needs no key; a real run asks for the posting key
    if test_run:
        if target_blockchain == "hive":
            blockchain_instance = Hive(node=nodelist.get_hive_nodes())
        else:
            blockchain_instance = Steem(node="https://api.steemit.com")
    else:
        wif = getpass.getpass(prompt='Enter your posting key for %s.' % target_blockchain)
        if target_blockchain == "hive":
            blockchain_instance = Hive(node=nodelist.get_hive_nodes(), keys=[wif])
        else:
            blockchain_instance = Steem(node="https://api.steemit.com", keys=[wif])
    if target_blockchain == "hive":
        assert blockchain_instance.is_hive
    else:
        assert blockchain_instance.is_steem

    account = input("Account name =")
    account = Account(account, blockchain_instance=blockchain_instance)
    if add_canonical_url:
        print("Start to fix canonical_url on %s for %s" % (target_blockchain, account["name"]))
    if replace_steemit_links:
        print("Start to replace steemit links on %s for %s" % (target_blockchain, account["name"]))

    apps_with_cannonical_url = ["hiveblog", "peakd", "esteem", "steempress", "actifit",
                                "travelfeed", "3speak", "steemstem", "leofinance", "clicktrackprofit",
                                "dtube"]
    hive_fork_date = addTzInfo(datetime(2020, 3, 20, 14, 0, 0))
    blog_count = 0
    expected_count = 100
    # Read the blog in batches of 100 until a batch comes back short
    while expected_count - blog_count == 100:
        for blog in account.get_blog_entries(start_entry_id=blog_count, raw_data=False):
            blog_count += 1
            # Skip comments and reblogs
            if blog["parent_author"] != "":
                continue
            if blog["author"] != account["name"]:
                continue
            # Skip posts that already have the wanted canonical_url
            if "canonical_url" in blog.json_metadata and canonical_url in blog.json_metadata["canonical_url"]:
                continue
            # Skip posts from apps that hivescript already handles (Hive only);
            # "app" is usually a "name/version" string, but some apps store a dict
            app = blog.json_metadata["app"] if "app" in blog.json_metadata else ""
            if isinstance(app, str) and app.split("/")[0] in apps_with_cannonical_url and target_blockchain == "hive":
                continue
            # Only posts written before the fork
            if blog["created"] > hive_fork_date:
                continue

            body = blog.body
            if "links" in blog.json_metadata:
                links = blog.json_metadata["links"]
            else:
                links = None
            if "links" in blog.json_metadata and replace_steemit_links:
                for link in blog.json_metadata["links"]:
                    if "steemit.com" in link or "steempeak.com" in link or "busy.org" in link or "partiko.app" in link:
                        authorperm = link.split("@")
                        acc = None
                        post = None
                        new_link = ""
                        if len(authorperm) == 1:
                            continue
                        authorperm = authorperm[1]
                        if authorperm.find("/") == -1:
                            # Link to an account: check that it exists
                            try:
                                acc = Account(authorperm, blockchain_instance=blockchain_instance)
                                if use_relative_links:
                                    new_link = "/@" + acc["name"]
                                else:
                                    new_link = canonical_url + "/@" + acc["name"]
                            except Exception:
                                continue
                        else:
                            # Link to a post: look it up to get its category
                            try:
                                post = Comment(authorperm, blockchain_instance=blockchain_instance)
                                if use_relative_links:
                                    new_link = "/" + post.category + "/" + post.authorperm
                                else:
                                    new_link = canonical_url + "/" + post.category + "/" + post.authorperm
                            except Exception:
                                continue
                        if new_link != "":
                            for i in range(len(links)):
                                if links[i] == link:
                                    links[i] = new_link
                            body = body.replace(link, new_link)
                            print("Replace %s with %s" % (link, new_link))

            json_metadata = blog.json_metadata or {}
            if links is not None and replace_steemit_links:
                json_metadata["links"] = links
            if add_canonical_url:
                json_metadata["canonical_url"] = canonical_url + "/" + blog["category"] + "/@" + blog["author"] + "/" + blog["permlink"]
            # .get() avoids a KeyError when add_canonical_url is False
            print("Edit post nr %d with canonical_url=%s" % (blog_count, json_metadata.get("canonical_url", "")))
            print("---")
            if not test_run:
                try:
                    blog.edit(body, meta=json_metadata, replace=True)
                except Exception:
                    print("Skipping %s due to error" % blog.authorperm)
                time.sleep(6)
        expected_count += 100
```
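The while/for pair at the end of the script reads the blog in batches of 100 and stops once a batch comes back short. The same idea can be written as a generator. This sketch takes a hypothetical fetch_batch(start, limit) callback that is assumed to treat start as a plain offset. How beem's start_entry_id maps onto such an offset is not covered here.

```python
from typing import Callable, Iterator, List


def iter_blog(fetch_batch: Callable[[int, int], List[dict]], limit: int = 100) -> Iterator[dict]:
    """Yield blog entries batch by batch until a batch is shorter than limit."""
    start = 0
    while True:
        batch = fetch_batch(start, limit)
        yield from batch
        start += len(batch)
        # A short (or empty) batch means the end of the blog has been reached.
        if len(batch) < limit:
            break
```

The generator keeps the stopping rule in one place, so it is not spread over two counters as in the script.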
You can now start the script with:
python fix_canonical_urls_hive.py
If you are on Linux, you should replace pip with pip3 and python with python3.
How does it work
The script goes through all blog posts written before 2020-03-20. Whenever a post was written with an app that is not properly handled by hivescript, a new canonical_url is set.
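The checks that decide whether a post gets edited can be collected into one function. The sketch below works on plain dicts rather than beem Comment objects. needs_fix and the dict layout are assumptions made for illustration. The app list and the fork date are copied from the script. The fork date is taken as UTC, which matches what beem's addTzInfo adds.

```python
from datetime import datetime, timezone

HIVE_FORK_DATE = datetime(2020, 3, 20, 14, 0, 0, tzinfo=timezone.utc)
APPS_WITH_CANONICAL_URL = {"hiveblog", "peakd", "esteem", "steempress", "actifit",
                           "travelfeed", "3speak", "steemstem", "leofinance",
                           "clicktrackprofit", "dtube"}


def needs_fix(post: dict, account: str, canonical_url: str,
              target_blockchain: str = "hive") -> bool:
    """Return True if the script would edit this post (hypothetical helper)."""
    meta = post.get("json_metadata") or {}
    if post["parent_author"] != "":  # a comment, not a top-level post
        return False
    if post["author"] != account:  # a reblog of someone else's post
        return False
    existing = meta.get("canonical_url")
    if isinstance(existing, str) and canonical_url in existing:
        return False  # already points to the chosen front-end
    app = meta.get("app")
    if (isinstance(app, str) and app.split("/")[0] in APPS_WITH_CANONICAL_URL
            and target_blockchain == "hive"):
        return False  # hivescript already knows this app
    return post["created"] <= HIVE_FORK_DATE  # only posts written before the fork
```

The app check only applies on Hive. On Steem every pre-fork post qualifies, which is why setting target_blockchain to "steem" changes which posts get edited.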
You can define your preferred front-end here:
canonical_url = "https://hive.blog"
If you prefer another front-end, you can replace this line with one of:
canonical_url = "https://peakd.com"
canonical_url = "https://leofinance.io"
canonical_url = "https://esteem.app"
In the next step, all links used in the post are checked. Whenever a link points to steemit.com, steempeak.com, busy.org or partiko.app and leads to a valid Hive post or a valid Hive user, it is replaced by a relative URL (or by an absolute URL on your chosen front-end when use_relative_links is False).
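Before any lookup, the script decides whether a link points to a user or to a post by splitting it at "@" and checking for a "/" in the remainder. The sketch below restates that rule on its own. classify_link is a hypothetical name. The script goes one step further: it confirms each candidate with beem's Account or Comment and skips links that fail. Note that the host check is a plain substring test, as in the script.

```python
from typing import Optional, Tuple

OLD_FRONT_ENDS = ("steemit.com", "steempeak.com", "busy.org", "partiko.app")


def classify_link(link: str) -> Optional[Tuple[str, str]]:
    """Return ("account", name) or ("post", "author/permlink") for old front-end links."""
    if not any(host in link for host in OLD_FRONT_ENDS):
        return None
    parts = link.split("@")
    if len(parts) == 1:
        return None  # no @author in the link, nothing to rewrite
    authorperm = parts[1]
    if "/" not in authorperm:
        return ("account", authorperm)  # e.g. https://steemit.com/@holger80
    return ("post", authorperm)  # e.g. .../@holger80/some-permlink
```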
Test run
You can do a test run and check what the script will change:
This now shows the following information:
The canonical URL that will be set is shown, as well as all links that will be replaced.
Fixing your posts
We can now start to fix all old posts:
Results
All changes have been broadcast:
The links have been corrected, as shown here:
There seems to be a bug in hive.blog: steemit.com links are shown as internal links, while hive.blog links are shown as external links.
The canonical url is also fixed:
It seems that esteem.app has not changed its canonical URL yet. As I know that esteem.app should read the canonical_url parameter (it works for steempress), it may correct the canonical URLs later.
After a fix on esteem.app, it now uses the correct canonical URL:
Results on STEEM
Setting canonical_url also works on steemit:
I used seoreviewtools to check the canonical urls.
If you like what I do, consider casting a vote for me as witness on Hivesigner or on PeakD
This changes the canonical link on the Hive blockchain. Shouldn't we do the same for the Steem blockchain? Otherwise two canonical links may be set, one on Hive and one on Steem. Search engines will then either ignore both, treat the domain with the highest authority as the source and penalize the others, or penalize all domains.
I updated the script, it can now be used on Steem to set canonical urls.
Hi @holger80! I tried the script for Steem. It works in the test run, after a minor update to catch unexpected json_metadata fields. For example, the app parley sets a dict for "app" with more details, rather than the standard string.
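One way to tolerate both shapes of the "app" field is sketched below. app_name is a hypothetical helper. The "name" key inside the dict form is an assumption, because the comment does not say which keys parley stores.

```python
def app_name(json_metadata: dict) -> str:
    """Return the app name from json_metadata, accepting a string or a dict."""
    app = json_metadata.get("app", "")
    if isinstance(app, dict):
        # Assumed key: some apps store a dict here instead of "name/version".
        app = app.get("name", "")
    if not isinstance(app, str):
        return ""
    return app.split("/")[0]
```

The check in the script can then compare app_name(blog.json_metadata) against the app list without crashing on unexpected types.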
But none of the actual post edits succeed (I checked on steemd). When printing the error(s), it says:
(<class 'AttributeError'>, AttributeError("'PointJacobi' object has no attribute '_Point__x'"), <traceback object at 0x7fc4b9dc7aa0>)
I'm not experienced with Python, but after a search, this is what I used inside the except block of the post-edit broadcasts to get info about the error:
e = sys.exc_info()
Any idea what could generate this error?
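sys.exc_info() returns the exception as a tuple. The standard library's traceback.format_exc() returns the same information as readable text, including the line where the error was raised. Here is a small wrapper as a sketch. try_edit is a hypothetical name. Inside the script it would be called as try_edit(blog.edit, body, meta=json_metadata, replace=True).

```python
import traceback


def try_edit(edit, *args, **kwargs):
    """Call edit(*args, **kwargs); return None on success or the formatted traceback."""
    try:
        edit(*args, **kwargs)
        return None
    except Exception:
        # format_exc() renders the exception currently being handled as text.
        return traceback.format_exc()
```

Printing the returned string next to blog.authorperm shows both which post failed and why.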
There is a package missing. Which operating system do you use?
Here I'm using Ubuntu 18.04.
EDIT: I also have beem 0.23.9, the latest pip upgraded to.
This is strange. Can you double-check that you are using the newest version?
In the meantime, I will test beem on a freshly installed machine.
Yeah, I was right:
EDITS:
I also have Anaconda 4.8.3, if it matters:
Python version is 3.7.7:
That's true. I will prepare a script that sets the canonical URL on all Steem posts.
Just asked the same question, before I saw yours. I agree we want to update canonical links in as many places as possible.
What are the chances of this becoming an online tool that simple users can use without having to run Python scripts on their own machines?
Don't use canonical URLs. Use relative URLs. If Hive is supposed to be distributed, a Hive link opened in some other front-end should open in that same front-end. Centralizing on one front-end is a problem with our culture.
For example, my blog is here
I agree relative links are better. I will change the script and it will replace steemit.com links with relative links now.
Short test if it does also work for posts:
my post
I think it would be best to separate the functionality of updating in-post links and the canonical link. For example, I want to update the canonical link on all my posts, but not sure I'm ready to update in-post links, since oftentimes I chose certain frontends for certain reasons when linking to things.
Would be good to have an option to do each of the actions.
I added parameters at the top of the script, which can be used to define the behavior of the script.
Good idea, I will make it optional.
This is great to help Hive with SEO rankings... But I have no clue how to use this script.
Do you have some tutorials for learning from scratch how to use python to interact with the blockchain?
Which operating system do you use?
I use Windows 10 on my computer, which is the OS I use the most, but I also have a partition with Ubuntu installed on it.
The easiest way to start on Windows 10 is:
- conda install beem
- store the script as a fix_canonical_urls.py file
- cd to the folder containing the file inside the Anaconda prompt
- python fix_canonical_urls.py inside the Anaconda prompt
Sweet! I'll do this later today (I mostly work on my laptop at night). And what about tutorials on learning Python with a focus on tools for interacting with the blockchain? Have you written some, or do you know somebody who has?
Do you have a specific topic in mind? It is on my todo list to write some tutorials using the beem library.
Well, I think of what you built as an airplane: I want to know how to pull the levers so I can take off and find out where it gets me, lol... More seriously, I've been thinking about developing two projects: a mobile game for Hive and a site where we can bet on sports. I'm not the guy who knows how to code, though. I can only do 3D modeling, and I'm fairly good at project management... Nevertheless, it's never too late to learn, and this lockdown hell pushes me to seek challenges to keep my brain cells from suffering boredom decay, lol.
So far I've been working behind the scenes on the betting site, but a mobile game would be cool as well.
Maybe that's too ambitious, but they say it's easy to learn if you already know what you want to create.
Thanks for the implementation. Looking forward to trying it out once the Steem updates are also figured out.
The code is a bit complex. You might enjoy this talk from PyCon 2019. I learned a lot about code complexity from it!
Thanks for the video, I will watch it :).
Wow ! Thank you this is very very useful.
Thanks very much for this! So this will only replace the canonical links and will not break my blog, right? (Sorry for the questions, just want to be sure :D)
Yes, it will set canonical links and replace steemit links. I tested the script on my posts and it worked :).
So this is updating posts on the Hive blockchain only? Is it possible to also update posts on Steem to set their canonical link to hive.blog?
I improved the script; it is now possible to set parameters in the header. When setting target_blockchain to steem, it adds canonical_urls to blog posts on Steem.
Awesome. Running it now. Here's an example transaction.
I checked on steemit.com and it sets the canonical URL to Hive. However, @steempeak hasn't yet switched the canonical URL to the hive.blog one.
Does the script create a totally new in the transaction? It seems like it shouldn't be necessary to repost the entire text; just editing the json_metadata should be enough. Is that possible?
Lol, I have like 9000 posts, how much fucking Hive Power mana would I need?
I am being punished for having used Steem the most. I have crazy people like pfunk running around asking people in the steemspeak.com Discord to DELETE their Steem blogs, all because he was wronged. It's such irrationally self-centered, spoiled-brat behaviour.
Now everyone expects me to spend all this time erasing Steem from everything after I invested all that time into promoting it? Fuck that, and the shitty anarchist philosophies that got this place nowhere.
Top witnesses literally got scammed and no one will admit that it was always a ponzi? Let's hope Hive can set up some form of actual governance like Telos did... I know that behind the scenes top Telos and Hive devs are all working, especially with scot bot.
If we can get scot bot for Telos and move Hive to Telos, we can have free accounts and just use Hive as a social media dapp inside Telos. The old Hive chain can still function as Hive Classic but will just be used for social networking front-ends etc.
We should migrate to Telos or merge with Telos, and then you can apply for Block.one angel funding from the billions at EOS VC https://eos.vc. I can't do it alone, but if you all teamed up and MERGED HIVE with Telos, it would qualify for EOSIO funding. Millions of dollars for advertising could go in; Telos would have all the governance, support and dapp ecosystem, and Hive would have all the social networking. But come on, let's admit we need help; that's the first step.
If Hive actually merged with Telos, we could stand a chance of having the world care, and Hive could be a front-page top 20 coin. We just need the users of Hive combined with the back-end infrastructure and funding access of Telos/EOSIO and its 2 million+ free wallets. Imagine not having to pay millions of dollars just to sign up a million users.
A huge hug 🤗 and a little bit of !BEER 🍻 from @amico!