Better import/editing for YouTube auto-generated captions
in progress
Darin Weeks
YouTube auto-generated captions, at least in Danish, are not formatted into nice paragraphs in LingQ, so this request is to clean that up somehow. The auto-generated captions have line breaks that are based (I guess) on character count, to keep the lines about the same size. In LingQ this produces short, inconvenient paragraphs. The audio timestamps are also often set too long, and I think they can overlap. These captions are pretty accurate and could be a huge asset for learning, but because of the paragraph breaks they are not very enjoyable to use in LingQ.

If it isn't possible to detect a more appropriate sentence start/end for the paragraphs, another idea would be to combine 2 lines into 1 paragraph. Perhaps the current overlapping audio timestamps could help sort out which paragraphs to combine. The start times seem fairly accurate, so they may be a more reliable indicator of true position, and each start time might also be used as the end time of the previous sentence. Capitalization and translations might also help sort out where to break things.

The highest priority, I think, should be correcting the line breaks/sentences, since it's easier to adjust the audio start/stop on the "edit lesson" screen than to cut, paste, and edit the text.
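To make the suggestion concrete, here is a rough Python sketch of the two ideas above: using each caption's start time as the end of the previous one, and merging short lines into paragraphs. It is only an illustration and not how LingQ actually processes captions. The `Cue` class, the function names, and the `max_lines` cap are all made up for this example, and the sentence detection (closing punctuation or a capital letter at the start of the next line) is a guess that may not hold for every language or caption track.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption line (hypothetical structure, times in seconds)."""
    start: float
    end: float
    text: str


def clamp_ends(cues):
    """Trim each cue's end so it never runs past the next cue's start."""
    fixed = []
    for i, cue in enumerate(cues):
        end = cue.end
        if i + 1 < len(cues):
            # Assumption: start times are more trustworthy than end times.
            end = min(end, cues[i + 1].start)
        fixed.append(Cue(cue.start, end, cue.text))
    return fixed


def merge_cues(cues, max_lines=2):
    """Join consecutive cues into paragraphs.

    A paragraph is closed when it already holds max_lines cues, when the
    current line ends with sentence punctuation, or when the next line
    starts with a capital letter.
    """
    merged = []
    group = []
    for i, cue in enumerate(cues):
        group.append(cue)
        nxt = cues[i + 1] if i + 1 < len(cues) else None
        ends_sentence = cue.text.rstrip().endswith((".", "!", "?"))
        next_capital = nxt is not None and nxt.text.lstrip()[:1].isupper()
        if nxt is None or ends_sentence or next_capital or len(group) >= max_lines:
            text = " ".join(c.text.strip() for c in group)
            merged.append(Cue(group[0].start, group[-1].end, text))
            group = []
    return merged
```

Running `merge_cues(clamp_ends(cues))` on a list of caption lines would remove the overlaps first and then build the paragraphs. The default `max_lines=2` mirrors the "combine 2 lines into 1 paragraph" fallback and could be raised if the sentence detection turns out to be reliable.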
Mark Kaufmann
We have tried doing this with AI, but it seems to create new problems while fixing others. We are still working on optimizing this. At the same time, we are going to try to automatically adjust the YouTube timestamps so they are more accurate. Hopefully, we will have some improvements here over the next few weeks to a month.