
RE: Leeching Arseholes!

in #python, 14 days ago

I suspect the multi-language posts are done via a translator anyway; many people from Latin America are not so fluent in English. I'm not sure what you mean by 'How about two language posts, in which one language is merely a machine translation?'.

Btw are you going to make the bot public for the others to use?

I don't know yet, possibly.


If you count words regardless of the language, people who use machine translators have an advantage: they basically need to write only about 250 words, and the translation supplies the other 250.

Machine translations carry an SEO penalty, and they might be one of the reasons why Hive content does not appear high in search engines unless the query is very specific. From the global point of view, it would be better if authors did not use MT, and we all translated the articles we would like to read in our browsers with the tool of our choice instead. Or with integrated AI, for instance the Peakd one.

Writing actual bilingual posts is a whole different story, although they could also incur an SEO penalty if they happen to puzzle crawlers.

If you count words regardless of the language

As it stands, that's how this function works. My challenge is to figure out whether a post contains multiple versions, using MT or otherwise. Keeps my brain ticking over nicely!


Well, you can check for specific characters that English lacks: Ñ and vowels with accents for Spanish; umlauts (ä, ö, ü) and ß for German, and likely some other Germanic languages; and so on.

If there are English articles AND at least a certain number of non-English characters, then the text is likely in two or more languages.
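A rough sketch of that check, in case it helps. Everything here is an assumption for illustration: the function name `looks_multilingual`, the character set, and both thresholds are made up and would need tuning on real posts. It only counts "the" as the English marker, because "a" is also a Spanish preposition and "an" a German one.

```python
import re

# Characters that ordinary English text lacks (assumed, non-exhaustive set).
NON_ENGLISH_CHARS = set("ñÑáéíóúÁÉÍÓÚäöüÄÖÜß¿¡")

# Only "the" is used as the English marker: "a" doubles as a Spanish
# preposition and "an" as a German one, so they would give false hits.
ENGLISH_MARKERS = {"the"}


def looks_multilingual(text, min_markers=3, min_foreign_chars=5):
    """Return True if text has English markers AND enough non-English characters.

    Both thresholds are arbitrary placeholders, not tested values.
    """
    # Runs of letters only (Unicode-aware), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    marker_count = sum(1 for w in words if w in ENGLISH_MARKERS)
    foreign_count = sum(1 for ch in text if ch in NON_ENGLISH_CHARS)
    return marker_count >= min_markers and foreign_count >= min_foreign_chars
```

You would call it as `looks_multilingual(post_body)`, where `post_body` stands for whatever string your bot already counts words on.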

A more complex option would be counting these English articles; I guess there would be a certain ratio for them in a common English text. Say 0.8 articles per sentence on average or so. If the ratio falls below a certain threshold, the text likely contains other language(s), or perhaps is not fluent natural text, but say a table or something similar.
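The ratio idea could look something like this. Again just a sketch: the function names are hypothetical, the sentence splitting is naive (it splits on `.`, `!` and `?`), and the default threshold of 0.4 is an arbitrary guess set below my 0.8 estimate, which itself is not a measured figure.

```python
import re

# English articles; note "a" and "an" also exist as words in Spanish and
# German, so this count is only a rough signal.
ARTICLES = {"a", "an", "the"}


def articles_per_sentence(text):
    """Average number of English articles per sentence (0.0 for empty text)."""
    # Naive sentence split on runs of ., ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    words = re.findall(r"[^\W\d_]+", text.lower())
    article_count = sum(1 for w in words if w in ARTICLES)
    return article_count / len(sentences)


def below_english_threshold(text, threshold=0.4):
    """True if the article ratio is under the (assumed) threshold."""
    return articles_per_sentence(text) < threshold
```

A low ratio only says "probably not plain English prose", so tables and lists would trip it too, as mentioned.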

This looks promising. Python has a vast array of libraries, so I will give it a trial run. There's more than one that does the same thing... useful!


It should be easy with such libraries, since you'll be detecting languages in entire posts rather than in separate sentences :)

It's nice to see a challenge coupled to a solution that improves a thing.