DVD Ripping to MPEG-4 with subtitles: Mission Impossible?!

After ripping my CD library and seeing the advantages of managing the entire library from iTunes, I asked myself why Apple does not provide a similar solution for ripping DVDs. I mostly blamed the large movie publishers, who would not allow this because of copyright concerns.

Well, if legal and political reasons do not allow a large company like Apple to implement DVD conversion in iTunes, I thought that hackers would certainly find a way to circumvent these problems and provide a solution.

After doing a little research on the subject, ignoring the legal issues and concentrating on the technical ones, I realised that fully automated conversion of DVDs to the iTunes format is currently almost impossible due to… subtitle format differences!

Protection mechanisms can be bypassed, and video and audio streams can be converted from one format to a completely different one, but bitmap subtitle streams cannot be reliably converted to timed text!

Going a bit deeper into the video format details, the DVD VOB (Video OBject, the container format used for DVDs) defines multiple timed bitmap streams, one for each subtitle track. This means the subtitle text was rendered to bitmaps during DVD creation and only the bitmaps were stored; the original text is not included.

On the other hand, iTunes uses the newer MPEG-4 standards: the .mp4 container defined in MPEG-4 Part 14 carries subtitles as timed text, defined in MPEG-4 Part 17, a format derived from 3GPP Timed Text that stores the actual characters along with timing and styling information.

Recreating the original text from the bitmap stream requires good OCR (optical character recognition) software. There are several such programs, but none is perfect, and the generated text may contain many mistakes inherent to OCR processing, such as a capital I in place of a lowercase l.
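To make the capital-I/lowercase-l problem more concrete, here is a minimal Python sketch of the kind of clean-up pass that could be run over OCR output. The function name and both rules are my own assumptions for illustration, not something taken from an existing OCR tool, and they only make sense for English text.

```python
import re

# Assumed heuristic 1: one or more capital "I" right after a lowercase
# letter is rarely valid English, so the run is treated as misread "l"s.
_I_INSIDE_WORD = re.compile(r"(?<=[a-z])I+")

# Assumed heuristic 2: a lowercase "l" standing alone as a word is
# treated as a misread pronoun "I".
_LONE_L = re.compile(r"\bl\b")


def fix_ocr_confusions(line: str) -> str:
    """Apply both heuristics to a single subtitle line."""
    line = _I_INSIDE_WORD.sub(lambda m: "l" * len(m.group()), line)
    return _LONE_L.sub("I", line)


if __name__ == "__main__":
    print(fix_ocr_confusions("l wiII caII you"))
```

Rules like these also produce false corrections (a name such as McIntyre would be damaged), so the result still needs manual proofreading.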

To avoid OCR, Nero Recode took a different approach: encoding the bitmap subtitles in the MPEG-4 container as private streams. Although this is perfectly valid according to MPEG-4, support for this solution is very limited (VLC, of course, supports it), and I doubt Apple will ever implement it in QuickTime/Apple TV.

Another possible compromise would be to superimpose the bitmap subtitles on the video stream. The disadvantages of this method are obvious: the subtitles are burned permanently into the video and cannot be removed later, so the picture is permanently altered. It is also not possible to include subtitles for several languages.

From a personal point of view, neither compromise is acceptable as a long-term solution.

What to do?

If single-step conversion is not possible because the generated text subtitles may contain errors that require manual proofreading, then a multi-step approach can be imagined:

  • convert the DVD video/audio streams to MPEG-4, without burning in any subtitle
  • when text subtitles are available, add them to the .mp4 container (multiple languages can be added)

There are many solutions for ripping and converting DVDs to MPEG-4. My personal favourite is HandBrake. It can read DVDs directly (you will need to have VLC installed in order to read protected discs), or you can use RipIt to copy the DVD to the filesystem and process it later, more conveniently.

As for adding subtitles to an existing .mp4 movie, my favourite is iSubtitle. It is a very nice program that supports all major subtitle file formats, such as SubRip (.srt), SubViewer 1 & 2 (.sub), SubStation Alpha (.ssa/.ass) and MicroDVD.

Getting the proper subtitles for a given DVD is a trickier issue. The first idea would be to check the many public subtitle sites and see if you can find something appropriate for your particular DVD. You must be very careful, since subtitles exist for different movie editions, different frame rates (23.976, 25, 29.97 FPS), different formats (.srt, .sub), etc., and most of them need additional adjustment, such as spelling corrections and time shifting for proper audio/subtitle synchronisation.

The second option would be to rip the subtitle stream from the DVD and process the bitmaps with a specialised OCR package to generate a timed text subtitle file. At the time of writing this post I do not know of any Mac program for doing this. Several Windows programs exist, such as SubRip, AviDemux and DVDSubEdit, but most of them are quite old, not very user friendly, and may even require some training to identify characters correctly.
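The time shifting and frame rate adjustment mentioned above come down to simple arithmetic on the timestamps. Here is a minimal Python sketch for SubRip (.srt) text; the function name retime_srt and its parameters are my own, chosen for illustration. A cue timed for a video running at src_fps lands at t × src_fps / dst_fps when the same frames are played at dst_fps, and a constant offset in milliseconds is added afterwards.

```python
import re

# SRT timestamps look like 00:01:02,345
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _to_ms(match: re.Match) -> int:
    """Convert a matched SRT timestamp to milliseconds."""
    h, m, s, ms = (int(g) for g in match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def _format(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, clamping at zero."""
    ms = max(0, ms)
    h, rest = divmod(ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def retime_srt(text, shift_ms=0, src_fps=None, dst_fps=None):
    """Rescale every timestamp by src_fps/dst_fps, then add shift_ms."""
    scale = src_fps / dst_fps if src_fps and dst_fps else 1.0

    def repl(match):
        return _format(round(_to_ms(match) * scale) + shift_ms)

    # Cue numbers and subtitle text do not match the pattern,
    # so only the timing lines are rewritten.
    return TIMESTAMP.sub(repl, text)


if __name__ == "__main__":
    cue = "1\n00:00:01,000 --> 00:00:02,500\nHello\n"
    print(retime_srt(cue, shift_ms=500))
```

For example, subtitles made for a 25 FPS edition could be adapted to a 23.976 FPS rip with retime_srt(text, src_fps=25, dst_fps=23.976), and a remaining constant delay removed with shift_ms. Different movie editions with added or removed scenes cannot be fixed this way.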

From my personal experience, subtitle processing is a very time-consuming task, and most of the time the processed subtitles are not identical to the original bitmap subtitles.

Conclusion

For DVDs that do not need subtitles (music, movies with audio tracks in your own language, etc.), you can start ripping them right now without problems.

For movies in a foreign language that you know, you can also rip them, and later add subtitles, when available. Personally I would add a subtitle track with the original language and a second track with my own language.

For movies in foreign languages that you will probably never learn, you can compromise on burning in a single subtitle track, but be sure you keep the original DVD in your archive, in case you want a different language.

Or you can simply wait for your favourite titles to be released in Blu-ray format, or to become available on various on-line stores in HD, and buy them once again…

As a final thought, it seems that the old DVD format is already obsolete…

About Liviu Ionescu (ilg)
Hi! My name is Liviu Ionescu (ilg, ilegeul or eunete for colleagues and friends) and I’m a senior IT engineer. Or should I say a real programmer?
