Advances in network technology have made it easy to collect large amounts of video footage, creating a need for a system that efficiently transcribes the speech in that footage ...