Narrative content such as interactive games and animated movies is a major application domain for computer graphics, with implications in entertainment, education, cultural heritage, scientific communication, and professional training. In those applications, the creation of 3-D content cannot be limited to the production of shapes and motions; it should also include the necessary steps to organize shapes and motions into compelling stories, using adequate staging, directing and editing. As a result, it is essential to conduct research in directing virtual worlds.
In this context, my research program focuses on developing computer tools for directing animated movies and interactive games, using virtual sets, actors, cameras and lights. Similar research agendas have been highlighted in the past by Andrew Glassner (Glassner 2004) and Ken Perlin (Perlin 2005) in their investigations of computer storytelling. Building on their analyses, I have identified four scientific challenges and research directions for future work.
The first scientific challenge in directing virtual worlds is to find a common syntax and semantics for sharing the story between the user/director and the digital tools offered to them in the virtual world.
Directing virtual actors and cameras requires a large vocabulary of objects, places, actions and events in the virtual world. Describing 3-D content in the words of the user/director requires intensive work in annotating and recognizing 3-D content in terms of a shared ontology. Furthermore, the story itself should be represented in a machine-readable, symbolic notation.
It should be noted that storytelling techniques can be usefully applied to a variety of specific domains, including cooking recipes, or the teaching of medical anatomy and mechanical design, which I am investigating with my colleagues in the IMAGINE team. I am also starting collaborations with experts in other domains such as geology (IFPEN and TOTAL), cartoon animation (TEAMTO), psychology (UNIV GENEVA), virtual reality theatre (Paris 8) and driving simulations (SYM2B).
In most practical situations, the story is initially drafted in natural language. Given the current state of the art in natural language processing, such input needs to be manually encoded into a structured language to eliminate ambiguities (Mani 2013). This structured language should use terms defined in the story ontology, and have the expressive power to represent stories at all stages of the creation pipeline (screenplay, storyboard, production and post-production).
I have started to address this challenge in recent years with promising results. The prose storyboard language (Ronfard 2013) is a formal language for describing the movements of actors and cameras in movies. The screenwriter and director model (Ronfard and Szilas 2014) redefines the interaction between story engines and computer graphics engines. My most recent work on cinematography and editing (Galvane 2016) extends the discrete event calculus (Mueller 2008) to handle durative events and actions. Future work will be devoted to extending this preliminary work to a general-purpose story ontology compatible with existing frameworks such as MSML (Van Rijsselbergen 2009) and NarrativeML (Mani 2014), together with more specialized ontologies dedicated to specific application domains. Furthermore, my previous work must be extended to more general story graph structures (Elson 2012) that have not previously been applied to the case of visual narratives.
In the next five years, my goal will be to propose a generic model of a story graph representing story events and the relations between them independently of domain and media at all stages in the design cycle of an animated visual story.
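As a purely illustrative sketch of what such a story graph could look like as a data structure, the following Python fragment stores events as nodes and typed relations as edges, and derives one chronological ordering from the "before" relations. The class names, field names, relation labels and the cooking example are all hypothetical and are not taken from any of the cited frameworks.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Event:
    """A story event: an action performed by one or more agents (hypothetical schema)."""
    id: str
    action: str
    agents: tuple


@dataclass
class StoryGraph:
    """Events as nodes, (source, relation, target) triples as typed edges."""
    events: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add_event(self, event):
        self.events[event.id] = event

    def relate(self, source, relation, target):
        # Refuse relations that mention events missing from the graph.
        for event_id in (source, target):
            if event_id not in self.events:
                raise KeyError(f"unknown event: {event_id}")
        self.relations.append((source, relation, target))

    def chronology(self):
        """Return one ordering of event ids consistent with all 'before' relations."""
        predecessors = {event_id: set() for event_id in self.events}
        for source, relation, target in self.relations:
            if relation == "before":
                predecessors[target].add(source)
        return list(TopologicalSorter(predecessors).static_order())


# Hypothetical example: three steps of a recipe told as a story.
graph = StoryGraph()
graph.add_event(Event("crack", "crack eggs", ("chef",)))
graph.add_event(Event("whisk", "whisk eggs", ("chef",)))
graph.add_event(Event("pour", "pour into pan", ("chef",)))
graph.relate("crack", "before", "whisk")
graph.relate("whisk", "before", "pour")
graph.relate("crack", "enables", "whisk")  # non-temporal relation, ignored by chronology()
print(graph.chronology())
```

Keeping relations as labeled triples, rather than hard-coding temporal order, is one way to leave room for causal or rhetorical relations that are independent of domain and media.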
The next stage in directing virtual worlds is creating the animation for all the characters in the story. Previous work has investigated automatic generation of animation in limited domains (Ye 2008). But such approaches are limited in at least two ways. Firstly, many details in the animation still need to be resolved at this stage and require user input, such as the starting positions of actors, the speed at which they perform actions, etc. All of those require directorial control. Secondly, character animation is usually generated by piecing together stored animation samples from existing motion databases. This can work well in limited domains where dedicated motion databases can be collected. But it does not easily generalize to a large vocabulary of expressive actions, as required in storytelling.
My second scientific challenge is therefore to propose algorithmic models for procedural animation with a large space of expressive and stylistic parameters. To confront this challenge, I propose an intermediate step between the story graph and the final animation, consisting of an animation score that places all actions and events in the story graph in time and in space, and can be further refined progressively by adding animation details. This representation plays a similar role to the storyboard and layout phases in traditional animation, a topic that I have worked on previously with INA and France Animation.
The advantage of such an intermediate representation is that it offers possibilities for (1) sketching animation, using storyboard-like drawings that can be used as space-time constraints during the computation of the final animation; and (2) acting animation, where the user/director can play the parts of all actors one by one, and the actual character animation is generated by recognizing and imitating their expressions, and transferring them to virtual actors.
In the real world, directors and actors work towards a successful performance through rehearsals that involve trial and error. In the digital world, some attempts have been made to control virtual actors in real-time using hand-drawn sketches (Thorne 2004), speech (Wang 2006) or gesture (Jung 2006). But very little work has been devoted to the more difficult problem of progressively editing and improving such animation.
In the next five years, my goal will be to automatically create complex 2D and 3D animation using a combination of hand-drawn sketches, acting gestures and high-level story graphs. The approach should be general enough to fit the requirements of specific application domains, e.g. the story-driven simulation of geo-dynamic phenomena from geological sketches; the story-driven simulation of passengers in urban transportation systems; or the story-driven simulation of two interacting characters in cartoon animation.
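To make the idea of an animation score more concrete, here is a minimal sketch in Python, under the assumption that each entry places one story event in time (start and duration) and in space (a 3-D position), and that details can be attached later without changing the placement. The names `ScoreEntry`, `AnimationScore`, `place`, `refine` and `active_at` are hypothetical and only illustrate the progressive-refinement idea.

```python
from dataclasses import dataclass, field


@dataclass
class ScoreEntry:
    """One story event placed in time and space (hypothetical schema)."""
    event_id: str
    actor: str
    start: float      # seconds from the beginning of the scene
    duration: float   # seconds
    position: tuple   # (x, y, z) in scene coordinates
    details: dict = field(default_factory=dict)  # filled in progressively

    @property
    def end(self):
        return self.start + self.duration


class AnimationScore:
    """An ordered collection of placed events that can be refined over time."""

    def __init__(self):
        self.entries = []

    def place(self, entry):
        self.entries.append(entry)

    def refine(self, event_id, **details):
        # Attach extra animation details (speed, style, ...) to an existing entry.
        for entry in self.entries:
            if entry.event_id == event_id:
                entry.details.update(details)

    def active_at(self, t):
        # Events whose half-open interval [start, end) contains time t.
        return [e.event_id for e in self.entries if e.start <= t < e.end]


score = AnimationScore()
score.place(ScoreEntry("whisk", "chef", start=2.0, duration=3.0, position=(0.0, 0.0, 0.0)))
score.place(ScoreEntry("pour", "chef", start=5.0, duration=2.0, position=(1.0, 0.0, 0.0)))
score.refine("whisk", speed="fast")
print(score.active_at(2.5), score.active_at(5.0))
```

In this sketch, a storyboard drawing or an acted performance would simply contribute additional entries or details to the same score, which a later animation stage could treat as space-time constraints.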
Traditional movie-making involves many steps, from screenwriting to storyboarding to animation, cinematography and film editing. One desirable feature for digital movie-making is the ability to automate cinematography and editing, so that the effects of changing one element in the screenplay or storyboard can be immediately visualized and evaluated from the perspective of the audience, in a cinematically correct movie. Automatic cinematography and film editing methods are also needed for interactive games and interactive drama applications, where the story is modified in real-time by the player's choices.
My third scientific challenge is therefore to propose novel methods for automatic cinematography and film editing, i.e. controlling the cameras and the lighting while the animation is playing, and choosing which cameras should be shown to the audience. The recent PhD theses of Vineet Gandhi and Quentin Galvane have proposed efficient optimization frameworks for choosing the framing and the editing of virtual cameras that best convey events and actions in a given story. Their approaches, however, remain limited to linear editing (maintaining the exact chronology of events) with simplistic narrative strategies and cinematographic styles.
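As a simplified illustration of editing posed as an optimization problem, the sketch below chooses one camera per time segment by dynamic programming, trading a per-segment framing cost against a fixed penalty for each cut. This is a generic Viterbi-style formulation written for this example; it is not the formulation used in the theses mentioned above, and the cost values, the `cut_penalty` parameter and the function name `best_edit` are assumptions.

```python
def best_edit(costs, cut_penalty):
    """Pick one camera per segment, minimizing framing cost plus cut penalties.

    costs[t][c] is the (hypothetical) cost of showing camera c during segment t.
    Returns (total_cost, list of chosen camera indices).
    """
    n_cams = len(costs[0])
    best = list(costs[0])   # best[c]: cheapest edit so far that ends on camera c
    back = []               # back[t-1][c]: camera used just before c at segment t

    for t in range(1, len(costs)):
        new_best, pointers = [], []
        for c in range(n_cams):
            # Cheapest predecessor, paying the penalty only when the camera changes.
            prev = min(
                range(n_cams),
                key=lambda p: best[p] + (cut_penalty if p != c else 0.0),
            )
            step = cut_penalty if prev != c else 0.0
            new_best.append(best[prev] + step + costs[t][c])
            pointers.append(prev)
        best = new_best
        back.append(pointers)

    # Trace the optimal sequence back from the cheapest final camera.
    last = min(range(n_cams), key=lambda c: best[c])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


# Two cameras, four segments: camera 0 frames the first half well, camera 1 the second.
costs = [[1, 5], [1, 5], [5, 1], [5, 1]]
total, cameras = best_edit(costs, cut_penalty=2)
print(total, cameras)
```

Raising the cut penalty makes the result hold a single camera for longer, which is one crude way to express a preference for slower editing rhythms.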
In the next five years, I will adapt cinematography and editing methods to other narrative domains, such as scientific training videos, tutorials and how-to guides, and demonstrate the benefits of the proposed approaches in terms of narrative comprehension. I am also planning to extend my previous work to the case of non-linear video editing (including temporal ellipses and flashbacks) with more complex narrative strategies and editing styles. I will address this challenge by data mining cinematic knowledge from existing movies. Modern computer vision techniques are now making it possible to automatically analyze movie scenes and infer cinematography and film editing styles from real examples. As a first step, I am planning to learn semi-Markov models, which have well-understood inference algorithms. Then a promising direction of research will be to move from semi-Markov models to stochastic context-free grammars of film editing, and to learn such grammars from examples using probabilistic grammatical inference.
Directing virtual worlds is especially important in real-time, interactive situations, which raise additional challenges that need to be addressed in future work. The fourth challenge will be to control the storytelling process in real-time with feedback from the audience. This is important for applications such as video games where the players are experiencing the game while at the same time taking part in the story.
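For readers unfamiliar with semi-Markov models, the following toy sketch scores a sequence of shots under such a model: each shot has a type, the next type depends only on the current one, and each type has its own explicit duration distribution. All probabilities below are invented for illustration and are not learned from any movie; the three shot types and the table names are likewise assumptions.

```python
import math

# Hypothetical parameters of a toy semi-Markov model of editing.
INITIAL = {"long": 0.6, "medium": 0.3, "closeup": 0.1}
TRANSITION = {  # no self-transitions: a cut always changes the shot type
    "long": {"medium": 0.7, "closeup": 0.3},
    "medium": {"long": 0.2, "closeup": 0.8},
    "closeup": {"long": 0.3, "medium": 0.7},
}
DURATION = {  # P(duration in seconds | shot type)
    "long": {2: 0.2, 4: 0.5, 6: 0.3},
    "medium": {2: 0.4, 3: 0.4, 5: 0.2},
    "closeup": {1: 0.5, 2: 0.3, 3: 0.2},
}


def log_likelihood(edit):
    """Log-probability of an edit given as a list of (shot_type, duration) pairs."""
    total = 0.0
    previous = None
    for shot, duration in edit:
        if previous is None:
            p_shot = INITIAL.get(shot, 0.0)
        else:
            p_shot = TRANSITION[previous].get(shot, 0.0)
        p_duration = DURATION[shot].get(duration, 0.0)
        if p_shot == 0.0 or p_duration == 0.0:
            return -math.inf  # the model considers this edit impossible
        total += math.log(p_shot) + math.log(p_duration)
        previous = shot
    return total


print(log_likelihood([("long", 4), ("medium", 2)]))
```

Learning would amount to estimating the three tables from annotated movies; the explicit duration table is what distinguishes this from an ordinary Markov chain, where shot lengths would implicitly follow a geometric distribution.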
Within the video game industry, real-time control of non-player characters (NPCs) and cameras is emerging as a new discipline of artificial intelligence often described as "game AI", typically using finite-state machines and behavior trees. This is a topic I teach to my Master 2 students. While those methods have proved useful for representing moderately complex behaviors in combat and sport games, they still require a lot of programming effort and do not easily extend to other types of games, such as story-driven serious games. The challenge in this case consists in reconciling the conflicting goals of directed interactions driven by the story, and autonomy of the player. This difficult problem can best be addressed in specialized application domains where the goals of the story can be explicitly encoded and tracked in real-time. For example, a cooking recipe demonstrated by a chef could be represented as a story and used to drive an interactive game for apprentice cooks to reproduce the recipe in their own way.
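For reference, a behavior tree of the kind mentioned above can be sketched in a few lines of Python. This minimal version has only two composite nodes (sequence and selector) and two return statuses; production game engines usually add a third "running" status and many more node types. The cooking scenario, the state dictionary and all function names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = 1
    FAILURE = 2


class Action:
    """Leaf node wrapping a function that returns True on success."""
    def __init__(self, fn):
        self.fn = fn

    def tick(self, state):
        return Status.SUCCESS if self.fn(state) else Status.FAILURE


class Sequence:
    """Succeeds only if every child succeeds, evaluated left to right."""
    def __init__(self, *children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            if child.tick(state) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


class Selector:
    """Succeeds as soon as one child succeeds, evaluated left to right."""
    def __init__(self, *children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            if child.tick(state) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


# Hypothetical virtual chef: demonstrate the next step if possible, else give a hint.
def ingredients_ready(state):
    return state["ingredients_ready"]

def demonstrate_step(state):
    state["log"].append("demonstrate")
    return True

def give_hint(state):
    state["log"].append("hint")
    return True

chef = Selector(
    Sequence(Action(ingredients_ready), Action(demonstrate_step)),
    Action(give_hint),
)

state = {"ingredients_ready": False, "log": []}
chef.tick(state)
print(state["log"])
```

Note that the story goal (following the recipe) is hard-coded in the tree structure here, which illustrates why such trees take programming effort to adapt to each new story.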
My objective in the next five years will be to extend my previous work in animation and cinematography to the case of real-time interactions and to evaluate the proposed methods with experts in specific application domains.
(Elson 2012) David Elson. Modeling Narrative Discourse. Ph.D. Thesis. Columbia University, New York City, 2012.
(Galvane 2016) Quentin Galvane, Rémi Ronfard, Marc Christie. A Semi-Markov Theory of Continuity Editing for Automatic Movie Generation. Under review, Artificial Intelligence Journal.
(Glassner 2004) Andrew Glassner. Interactive Storytelling: Techniques for 21st Century Fiction. AK Peters/CRC Press, 2004.
(Jung 2006) Jung, Ben Amor, Heumer, Weber. From motion capture to action capture: a review of imitation learning techniques and their application to VR-based character animation. In Proceedings of the ACM symposium on Virtual reality software and technology, 2006.
(Mani 2014) Inderjeet Mani. Computational modeling of narrative. Synthesis Lectures on Human Language Technologies, Morgan & Claypool 2013.
(Mueller 2008) Erik Mueller. Event calculus. Handbook of Knowledge Representation. Elsevier, 2008.
(Perlin 2005) Ken Perlin. Toward interactive narrative. In International Conference on Virtual Storytelling, pages 135-147, 2005.
(Ronfard 2013) Rémi Ronfard, Vineet Gandhi, Laurent Boiron. The Prose Storyboard Language: A Tool for Annotating and Directing Movies. Workshop on Intelligent Cinematography and Editing (WICED), Foundations of Digital Games, May 2013, Chania, Crete, Greece.
(Ronfard and Szilas 2014) Rémi Ronfard, Nicolas Szilas. Where story and media meet: computer generation of narrative discourse. Computational Models of Narrative, Jul 2014, Quebec City, Canada.
(Thorne 2004) Matthew Thorne, David Burke, Michiel van de Panne. Motion Doodles: An Interface for Sketching Character Motion. ACM Transactions on Graphics, 2004.
(Van Rijsselbergen 2009) Van Rijsselbergen et al. Movie script markup language. ACM symposium on Document Engineering, 2009.
(Wang 2006) Z. Wang and M. van de Panne. Walk to here: a voice driven animation system. In ACM SIGGRAPH/ Eurographics symposium on Computer animation, 2006.
(Ye 2008) Ye, Baldwin. Towards automatic animated storyboarding. AAAI, 2008.