Yesterday, I needed an image translated. I didn't strictly need the translation; it was more of a nice-to-have to better understand the context. I knew this should be possible with AI, but I hadn't done it before, and since I was busy, I didn't bother researching further.
So, today I remembered that and decided to see how it is done. For some reason I continue to default to ChatGPT for general AI purposes, even though I don't like the direction OpenAI is headed (almost nothing open about it), I don't like Sam Altman, and I think they'll be left behind in the long run anyway by behemoths that have access to tons of data of their own, plus greater compute capabilities.
First, I tried while logged out, which is my usual way of using ChatGPT. But it looks like the upload function is disabled for visitors, so I had to log in. Once logged in, all I had to do was upload the image and ask the AI to translate it. No problem! I got the translation in a few seconds. It really was that simple.
Ideogram helped with this one.
Then I thought... since I'm already here, why don't I see how to get transcripts from videos...
ChatGPT 4o mini (the free version) can't create transcripts from the audio of videos. I then searched online and found some alternatives. Some seemed very good, but they required me either to create an account or to upgrade in order to copy or save the transcript. Others only took a video file as input, not a link, or didn't support YouTube, which is what I wanted in this case (but, funnily enough, supported TikTok and Instagram).
Eventually, I found this one, which may not be the greatest tool, but it did its job, and I could copy the transcript once it had finished.
I don't particularly need to extract transcripts from videos, but I thought it wouldn't hurt to learn about it, since it's so easy nowadays...
Hmm, recently I have been playing around with Grok (on X), and I think it's quite good too.
I tried it out once. It wasn't particularly helpful, although the request was kind of complicated for a language model at this point anyway. I didn't try a different model then to see how they compare on the same request, since I didn't have much time.
I see. I usually pose identical questions to different AIs to sort of cross-check their responses. Haha.
It's interesting to see all those tools available. I did think they were possible, but never looked into it. As usual, I think you can't trust the accuracy, but if you find something interesting, you can just verify it afterwards.
I verified both of them and they were accurate. But I don't think they will be in all cases. For images, if they are unclear, I don't think the text will be recognized well, and for audio/video, if there is background noise, there will be mistakes as well. There will also be mistakes if the AI doesn't recognize certain niche language constructs.
There used to be a time when anyone could easily download the subtitles of a YouTube video. The feature went away around the same time ChatGPT was getting popular. Alphabet would want to keep as much data as possible for themselves. Gemini should easily do better than NoteGPT for YouTube videos.
Thanks! I didn't think of trying Gemini.