Computer scientists develop new tool that generates videos from themed text
A global team of computer scientists from Tsinghua and Beihang Universities in China, Harvard University in the US and IDC Herzliya in Israel has developed “Write-A-Video,” a new tool that generates videos from themed text. As the user writes and edits text, the tool automatically determines which scenes or shots to pull from a repository to illustrate the desired storyline. The tool enables novice users to produce quality video montages in a simple, user-friendly manner that doesn’t require professional video production and editing skills.
The team is set to present their work at ACM SIGGRAPH Asia, held Nov. 17 to 20 in Brisbane, Australia. SIGGRAPH Asia, now in its 12th year, attracts the most respected technical and creative people from around the world in computer graphics, animation, interactivity, gaming, and emerging technologies.
While existing video editing tools still demand knowledge of video processing and editing, the researchers’ new method allows novices to create videos that tell stories more naturally. Write-A-Video, say the researchers, allows users to create a video montage by simply editing the text that accompanies the video. For example, adding or deleting text and moving sentences around are translated into video-editing operations, such as finding corresponding shots, cutting and rearranging shots, and creating the final video montage.
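As a rough illustration of how text edits could be turned into video-editing operations, the sketch below diffs two versions of a script, sentence by sentence, using Python's standard difflib module. It is not the researchers' implementation: the function name and the operation labels "remove_shots" and "find_shots" are hypothetical, chosen only for this example.

```python
from difflib import SequenceMatcher


def text_edits_to_video_ops(old_sentences, new_sentences):
    """Translate sentence-level text edits into hypothetical video operations.

    Returns a list of (operation, sentence) pairs. The operation names are
    illustrative assumptions, not part of Write-A-Video.
    """
    ops = []
    matcher = SequenceMatcher(a=old_sentences, b=new_sentences, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged sentences keep their current shots
        if tag in ("delete", "replace"):
            # Sentences that disappeared: drop the shots illustrating them.
            for sentence in old_sentences[i1:i2]:
                ops.append(("remove_shots", sentence))
        if tag in ("insert", "replace"):
            # Sentences that appeared: look up new shots for them.
            for sentence in new_sentences[j1:j2]:
                ops.append(("find_shots", sentence))
    return ops


if __name__ == "__main__":
    before = ["We arrived at the beach.", "It rained.", "We went home."]
    after = ["We arrived at the beach.", "We went home.", "We had dinner."]
    for op in text_edits_to_video_ops(before, after):
        print(op)
```

In this simple sketch, moving a sentence shows up as a removal at its old position plus a lookup at its new one; a fuller system would presumably recognize the move and reuse the shots already chosen.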
“Write-A-Video uses current advances in automatic video understanding and a unique user interface to allow more natural and simpler video creation,” says Professor Ariel Shamir, Dean of the Efi Arazi School of Computer Science at IDC Herzliya. “With our tool, the user provides input mostly in the form of editing of text. The tool automatically searches for semantically matching candidate shots from a video repository, and then uses an optimization method to assemble the video montage by cutting and reordering the shots automatically.”
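The two steps Shamir describes, retrieving candidate shots and then assembling them, can be sketched in a few lines of Python. This is a toy stand-in under stated assumptions: shots are described by hypothetical keyword tags, semantic matching is approximated by bag-of-words cosine similarity rather than a learned visual-semantic model, and the assembly step is a simple greedy pass rather than the optimization method used by the tool.

```python
import math
import re
from collections import Counter


def bag(text):
    """Lower-case bag of words; a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_shots(sentence, repository, top_k=3):
    """Return up to top_k (score, shot_id) pairs with non-zero similarity."""
    query = bag(sentence)
    scored = [(cosine(query, bag(tags)), shot_id)
              for shot_id, tags in repository.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return scored[:top_k]


def assemble(sentences, repository):
    """Greedy assembly: best unused shot per sentence, in story order."""
    used, montage = set(), []
    for sentence in sentences:
        for _, shot_id in candidate_shots(sentence, repository, len(repository)):
            if shot_id not in used:
                used.add(shot_id)
                montage.append(shot_id)
                break
    return montage


if __name__ == "__main__":
    # Hypothetical repository: shot id -> descriptive tags.
    repo = {
        "s1": "sunset beach ocean",
        "s2": "city street night traffic",
        "s3": "beach waves surf",
    }
    story = ["We walked along the beach at sunset.",
             "Then we drove into the city at night."]
    print(assemble(story, repo))
```

The greedy pass never reuses a shot, which is one simple way to avoid visual repetition; a real optimizer could also weigh shot length, pacing and continuity between neighboring shots.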
“Write-A-Video also allows users to explore visual styles for each scene using cinematographic idioms, generating, for example, faster or slower paced movies, less or more content movements, etc.,” explains Dr. Miao Wang from Beihang University.
When selecting candidate shots from the video repository, the method also considers the aesthetic appeal of the shots, choosing those that are ideally lit, well focused, and neither blurry nor unstable. “At any point, the user can render the movie and preview the video montage result with an accompanying voice-over narration,” says Professor Shi-Min Hu from Tsinghua University.
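One common way to screen out blurry footage, shown here purely as an illustration, is to measure the variance of a Laplacian filter over a grayscale frame: sharp frames have strong edges and therefore a high variance, while blurry or featureless frames score low. The article does not say which aesthetic measures Write-A-Video uses, so the function below and the sample frames are assumptions for this sketch only.

```python
def laplacian_variance(gray):
    """Variance of a 4-neighbor Laplacian over the interior of a frame.

    `gray` is a list of rows of grayscale values and must be at least
    3x3. Higher values indicate more fine detail (a sharper frame).
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


if __name__ == "__main__":
    # Two tiny synthetic frames: a uniform gray patch and a checkerboard.
    flat = [[128] * 4 for _ in range(4)]
    checker = [[255 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]
    print(laplacian_variance(flat) < laplacian_variance(checker))
```

A shot-level score could then be obtained by averaging this measure over sampled frames and combining it with separate checks for lighting and camera stability.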
The team’s research shows that intelligent digital tools that combine the abilities of humans and algorithms can assist users in the creative process. “Our work demonstrates the potential of automatic visual-semantic matching in idiom-based computational editing, offering an intelligent way to make video creation more accessible to non-professionals,” says Shamir.
For the study, the approach was tested on various pieces of themed text and video repositories, with quantitative evaluation and user studies. Users without any video editing experience could produce satisfactory videos using the Write-A-Video tool, sometimes even faster than professionals utilizing frame-based editing software. At SIGGRAPH Asia, the team will demonstrate the Write-A-Video application and showcase a variety of examples of text-to-video productions.
The team includes Miao Wang (State Key Lab of Virtual Reality Technology and Systems/Beihang University and Tsinghua University); Guo-Wei Yang (BNRist/Tsinghua University); Shi-Min Hu (BNRist/Tsinghua University); Shing-Tung Yau (Harvard) and Ariel Shamir (IDC Herzliya, Israel).
A video illustrating the project can be seen here: https://vimeo.com/357657704