2025-01-17
5 min read

Transcribing and Translating Subtitles from a YouTube Video

Introduction

Recently, I worked on a personal project to transcribe and translate subtitles from a YouTube video. I wanted to watch a Chinese video that had no subtitles, and the project doubled as an educational exercise in tools like yt-dlp, Whisper, and ffmpeg. Here's how I did it, step by step.

Step 1: Downloading the Video

First, I used yt-dlp, a command-line tool, to download the video from YouTube.

```bash
yt-dlp <youtube_video_url>
```

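Replace `<youtube_video_url>` with the link to the video you want to process. By default, yt-dlp names the file after the video's title and ID, so rename it (or set the name with yt-dlp's `-o` option) to match the `video.mp4` used in the commands below, adjusting the extension if yt-dlp picked a different container such as WebM or MKV.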

Step 2: Transcribing the Video

To transcribe the audio, I used Whisper. Since the video was in Mandarin, I specified the language and task:

```bash
whisper video.mp4 --language Mandarin --task transcribe
```

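This generates a `video.srt` file with subtitles in Mandarin. Whisper names its output after the input file and, by default, also writes `.txt`, `.vtt`, `.tsv`, and `.json` versions alongside it.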
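An `.srt` file is a sequence of cues. Each cue has a numeric index, a timing line such as `00:00:01,000 --> 00:00:04,000`, one or more text lines, and a blank line. The sketch below parses that structure into objects, which is handy for checking the transcription before translating it. `parseSrt` and `srtTimeToMs` are hypothetical helpers, not part of Whisper. The parser assumes well-formed cues and skips any block without a timing line.

```javascript
// Hypothetical helper: convert an SRT timestamp such as "00:01:02,345" to milliseconds.
function srtTimeToMs(timestamp) {
  const match = /^(\d+):(\d{2}):(\d{2}),(\d{3})$/.exec(timestamp.trim());
  if (!match) {
    throw new Error(`Unrecognized SRT timestamp: ${timestamp}`);
  }
  // match[0] is the whole string; the captured groups are hours, minutes, seconds, millis.
  const [, hours, minutes, seconds, millis] = match.map(Number);
  return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
}

// Hypothetical helper: split SRT text into cues of { index, startMs, endMs, text }.
function parseSrt(srtText) {
  const cues = [];
  // Normalize Windows line endings, then split on the blank lines between cues.
  const blocks = srtText.replace(/\r\n/g, "\n").trim().split(/\n\s*\n/);
  for (const block of blocks) {
    const lines = block.split("\n");
    // A cue needs at least an index line and a timing line.
    if (lines.length < 2 || !lines[1].includes("-->")) continue;
    const [startRaw, endRaw] = lines[1].split("-->");
    cues.push({
      index: Number(lines[0].trim()),
      startMs: srtTimeToMs(startRaw),
      endMs: srtTimeToMs(endRaw),
      // Everything after the timing line is subtitle text (possibly several lines).
      text: lines.slice(2).join("\n"),
    });
  }
  return cues;
}
```

For example, `parseSrt(fs.readFileSync("video.srt", "utf-8"))` returns one object per cue, so you can print the first few to spot-check Whisper's output.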

Step 3: Translating the Subtitles

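Next, I translated the subtitles into English using a Node.js script and a translation API (e.g., Google Translate or DeepL); the script below uses the `google-translate-api-x` package. It reads the original `audio.srt` file and writes a translated copy, `audio_translated.srt`, to the script's directory. Because Whisper names its output after the input video, rename `video.srt` to `audio.srt` first (or change `inputFilePath` in the script). Install the dependencies with `npm install google-translate-api-x cli-progress`, then run: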

```bash
node translateSrt.js
```

You can find the full Node.js script in this repository, which includes detailed instructions for use.

```javascript
const fs = require("fs");
const path = require("path");
const translate = require("google-translate-api-x");
const cliProgress = require("cli-progress");

(async () => {
  try {
    // Read your SRT file
    const inputFilePath = path.join(__dirname, "audio.srt");
    const outputFilePath = path.join(__dirname, "audio_translated.srt");

    const data = fs.readFileSync(inputFilePath, "utf-8");
    const lines = data.split("\n");
    const translatedLines = [];

    // Set up the progress bar
    const progressBar = new cliProgress.SingleBar(
      {
        format: "Translating |{bar}| {value}/{total} lines",
      },
      cliProgress.Presets.shades_classic
    );

    // Start the bar at 0 with the total set to the number of lines
    progressBar.start(lines.length, 0);

    for (let i = 0; i < lines.length; i++) {
      const line = lines[i];
      const trimmedLine = line.trim();

      if (
        trimmedLine.match(/^\d+$/) ||
        trimmedLine.includes("-->") ||
        trimmedLine === ""
      ) {
        // If it's an index line, timing line, or empty line, just keep it
        translatedLines.push(line);
      } else {
        // Attempt to translate the line
        try {
          const res = await translate(trimmedLine, { from: "auto", to: "en" });
          translatedLines.push(res.text);
        } catch (err) {
          console.error("Translation error:", err);
          // Fall back to original text if there's an error
          translatedLines.push(line);
        }
      }

      // Update progress
      progressBar.update(i + 1);
    }

    // Stop the progress bar
    progressBar.stop();

    // Write out the translated file
    fs.writeFileSync(outputFilePath, translatedLines.join("\n"), "utf-8");
    console.log("Translation complete:", outputFilePath);
  } catch (err) {
    console.error("Error:", err);
  }
})();
```
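The script above sends one translation request per text line, so if a cue's text spans several lines, each line is translated without the context of the others. A possible variation translates each cue's text as a single unit and takes the translator as a parameter, so it can be swapped for another service or stubbed out for testing. This is a sketch under assumptions, not the script I ran: `translateSrtText` is a hypothetical name. With `google-translate-api-x` you could pass `async (text) => (await translate(text, { from: "auto", to: "en" })).text` as the translator.

```javascript
// Hypothetical variation: translate SRT text cue by cue instead of line by line.
// `translateFn` is any async function (text) => translatedText.
async function translateSrtText(srtText, translateFn) {
  const blocks = srtText.replace(/\r\n/g, "\n").trim().split(/\n\s*\n/);
  const output = [];

  for (const block of blocks) {
    const lines = block.split("\n");
    const timingIdx = lines.findIndex((l) => l.includes("-->"));
    if (timingIdx === -1) {
      output.push(block); // Not a recognizable cue; keep it unchanged
      continue;
    }

    const header = lines.slice(0, timingIdx + 1); // Index and timing lines
    // Join the cue's text lines with a space so the translator sees the whole cue.
    const text = lines.slice(timingIdx + 1).join(" ").trim();
    if (!text) {
      output.push(header.join("\n"));
      continue;
    }

    let translated = text;
    try {
      translated = await translateFn(text);
    } catch (err) {
      // Fall back to the original text, as the script above does
      console.error("Translation error:", err.message);
    }
    output.push([...header, translated].join("\n"));
  }

  // SRT cues are separated by a blank line; end the file with a newline.
  return output.join("\n\n") + "\n";
}
```

Each translated cue ends up on a single line, which most players wrap on screen anyway. Keeping the translator injectable also means the cue-handling logic can be checked with a fake translator, without network access.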

Step 4: Adding Translated Subtitles to the Video

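I used ffmpeg's `subtitles` filter to burn the translated subtitles into the video. Burned-in subtitles become part of the picture, so ffmpeg re-encodes the video, and the filter requires an ffmpeg build with libass. The subtitle file name matches the output of the translation script:

```bash
ffmpeg -i video.mp4 -vf subtitles=audio_translated.srt output_video.mp4
```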

Conclusion

This process was purely for educational purposes and allowed me to explore various tools. If you attempt this, ensure you respect copyright laws and use videos you have permission to modify.