PhD Defense 14-1-2020 – Hoai Le Nguyen – Étude des conflits dans l’édition collaborative (A Study of Conflicts in Collaborative Editing)

Collaborative Editing (CE) has long captured the attention of Computer-Supported Cooperative Work (CSCW) researchers. Early research on CE (in the 1990s and early 2000s) focused on describing the characteristics of CE based on interviews with people who had participated in CE projects.
More recent research has begun to analyze logs of CE activity to study how people edit together with the support of modern CE tools such as the Git version control system and Google Docs.

From a general viewpoint, the process of CE is the continuous synchronization of ‘multiple, parallel streams of activity’ of collaborators. If synchronization takes place infrequently, as in the development of a software project with the Git version control system, the work mode is considered ‘asynchronous’. If synchronization takes place at short intervals, as when editing a shared document in ShareLaTeX, the work mode is considered ‘synchronous’.
The longer the divergence lasts, the more conflicts are likely to occur during synchronization.
Resolving conflicts is costly, especially after a long period of divergence. Understanding how often conflicts happen and how users resolve them in real CE projects is important to ensure good performance and user experience in collaborative editing.
In the first part of this thesis, we use the collaboration traces of four large open-source projects managed with the Git version control system to conduct our analysis. We analyze the different types of textual conflicts that arise during development and how developers resolve each type. In particular, for ‘adjacent-line conflicts’, we found that users mostly resolve them by applying the changes from both sides. We also analyze how often users ‘roll back to a previous version’ as a way to resolve merge conflicts.

The process of CE in an online collaborative editor is more specific. It can be split into several editing ‘sessions’, each performed by a single author or by several authors; these are called ‘single-authored sessions’ and ‘co-authored sessions’ respectively. This fragmentation requires a predefined ‘interval’ or ‘maximum time gap’, which has not been well defined in previous studies. In the second part of this thesis, we analyze the logs of CE work by students of an engineering school using ShareLaTeX, which were collected and anonymized for privacy purposes. By examining different ‘maximum time gaps’ from 30 seconds to 15 minutes on these logs, we found that a suitable ‘maximum time gap’ for splitting CE activities into sessions can be determined by evaluating the distribution of the ‘external distance’.
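To illustrate the fragmentation step, the following is a minimal sketch of splitting a log of edits into sessions with a given maximum time gap. The `Edit` record, its fields, and the function names are hypothetical and chosen only for illustration; they do not reflect the actual ShareLaTeX log format or the code used in the thesis. The sketch covers only the splitting and labeling of sessions, not the evaluation of the ‘external distance’.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Edit:
    """Hypothetical log record (not the real ShareLaTeX log format)."""
    timestamp: float  # seconds since the start of the log
    author: str       # anonymized author identifier
    position: int     # character offset of the edit in the document


def split_into_sessions(edits: List[Edit], max_gap: float) -> List[List[Edit]]:
    """Group edits into sessions: a new session starts whenever the time
    elapsed since the previous edit exceeds max_gap (in seconds)."""
    sessions: List[List[Edit]] = []
    for edit in sorted(edits, key=lambda e: e.timestamp):
        if sessions and edit.timestamp - sessions[-1][-1].timestamp <= max_gap:
            sessions[-1].append(edit)   # close enough in time: same session
        else:
            sessions.append([edit])     # gap too large: open a new session
    return sessions


def session_kind(session: List[Edit]) -> str:
    """Label a session by the number of distinct authors it contains."""
    authors = {e.author for e in session}
    return "co-authored" if len(authors) > 1 else "single-authored"


if __name__ == "__main__":
    log = [Edit(0.0, "alice", 0), Edit(20.0, "bob", 5), Edit(500.0, "alice", 3)]
    for gap in (30.0, 60.0, 900.0):  # candidate maximum time gaps, in seconds
        kinds = [session_kind(s) for s in split_into_sessions(log, gap)]
        print(gap, kinds)
```

Running the splitter with several candidate gaps, as in the usage example, shows how the number and kind of sessions depend on the chosen value, which is why the choice of ‘maximum time gap’ matters.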
We also analyzed the editing activities inside each ‘co-authored session’. We use a [30 seconds, 10 characters] time-position window to identify ‘potential conflict’ cases, that is, cases in which authors edit close to each other in both time and position. The results show that people rarely edit closely in both time and position. However, conflicts are more likely to happen in these cases.
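As an illustration of the time-position window, the sketch below flags pairs of edits by different authors that fall within 30 seconds and 10 characters of each other. The `Edit` record and the function name are hypothetical, and the sketch makes a simplifying assumption: positions are compared as raw character offsets, without adjusting for how earlier insertions or deletions shift the text. It is not the procedure used in the thesis.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass
class Edit:
    """Hypothetical log record (not the real ShareLaTeX log format)."""
    timestamp: float  # seconds since the start of the log
    author: str       # anonymized author identifier
    position: int     # character offset of the edit in the document


def potential_conflicts(
    session: List[Edit],
    time_window: float = 30.0,   # seconds
    position_window: int = 10,   # characters
) -> List[Tuple[Edit, Edit]]:
    """Return pairs of edits by different authors that are close in both
    time and position. Offsets are compared as-is (simplifying assumption)."""
    pairs = []
    for a, b in combinations(session, 2):
        if (
            a.author != b.author
            and abs(a.timestamp - b.timestamp) <= time_window
            and abs(a.position - b.position) <= position_window
        ):
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    session = [
        Edit(0.0, "alice", 100),
        Edit(10.0, "bob", 105),    # close to alice's first edit in time and position
        Edit(12.0, "alice", 400),  # close in time, but far away in the document
        Edit(100.0, "bob", 101),   # close in position, but much later
    ]
    for a, b in potential_conflicts(session):
        print(a.author, b.author, a.position, b.position)
```

The pairwise comparison is quadratic in the session length; this is acceptable for a sketch, while a sliding window over time-sorted edits would be a natural choice for large logs.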