Measuring Visual Representation on Screen: A Guideline for Assessment
Computer vision techniques are revolutionising the way we analyse on-screen representation, offering insights beyond mere presence. By utilising advanced models such as multi-branch convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformers, these methods can capture aspects such as facial emotion, gaze direction, gesture dynamics, and contextual display properties.
Multi-scale Feature Extraction and Beyond
Multi-scale feature extraction in CNNs allows for the analysis of fine details, like eye edges, as well as broader context, such as head orientation. This enables a deeper analysis of facial states or medical imaging features.
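To make the intuition concrete, here is a minimal sketch of the multi-scale idea using plain NumPy in place of a trained CNN: the same feature (mean edge energy) is computed on progressively downsampled copies of an image, so early entries reflect fine detail and later entries reflect coarser structure. The functions and the random test image are illustrative assumptions, not part of any production pipeline.

```python
import numpy as np

def downsample(img, factor=2):
    """Average-pool the image by `factor` to get a coarser scale."""
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    return img[:h2 * factor, :w2 * factor].reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def edge_energy(img):
    """Mean gradient magnitude: a crude stand-in for fine-detail features."""
    gy, gx = np.gradient(img)
    return float(np.sqrt(gx ** 2 + gy ** 2).mean())

def multi_scale_features(img, n_scales=3):
    """Compute the same feature at several scales: fine scales capture
    detail (e.g. eye edges), coarse scales capture broader structure
    (e.g. overall head orientation)."""
    feats = []
    for _ in range(n_scales):
        feats.append(edge_energy(img))
        img = downsample(img)
    return feats

rng = np.random.default_rng(0)
img = rng.random((64, 64))
print(multi_scale_features(img))  # one feature value per scale, fine to coarse
```

A real multi-branch CNN learns which features to extract at each scale; this sketch only fixes the feature by hand to show why combining scales is useful.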
Video emotion analysis uses deep learning to track emotional expressions over time, accounting for cultural and individual differences. Facial reconstruction and emulated displays in VR/AR contexts aim to provide realistic facial states, handling the limitations of physical hardware. Behavioural pattern recognition through mouse movements, keystroke dynamics, and window focus data can enrich on-screen activity analysis, offering insights into user engagement or authenticity of interactions.
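As an illustration of the behavioural signals mentioned above, the sketch below derives simple keystroke-dynamics features (dwell and flight times) from a hypothetical log of timestamped key events. The event data and feature names are invented for the example; a real system would capture events from an input hook.

```python
from statistics import mean, pstdev

# Hypothetical log of (key, press_time_ms, release_time_ms) events.
events = [
    ("h", 0, 80), ("e", 140, 210), ("l", 300, 370),
    ("l", 430, 500), ("o", 620, 700),
]

# Dwell time: how long each key is held down.
dwell = [release - press for _, press, release in events]

# Flight time: gap between releasing one key and pressing the next.
flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]

# A tiny behavioural profile; real systems use many more such statistics.
profile = {
    "mean_dwell_ms": mean(dwell),
    "mean_flight_ms": mean(flight),
    "flight_stdev_ms": pstdev(flight),
}
print(profile)
```

Profiles like this are what make keystroke dynamics usable for engagement or authenticity analysis, which is also why the ethical safeguards discussed below matter.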
Ethical Considerations
While computer vision methods offer powerful tools, their deployment must carefully uphold ethical standards. Key considerations include user consent and transparency, privacy and data security, bias and fairness, accuracy and interpretability, avoiding deception or manipulation, and impact on vulnerable users.
Taking these in turn:

- Informed consent and transparency: users must be told what data is collected, how it is processed, and for what purposes.
- Privacy and data security: sensitive biometric data require strict safeguards against misuse or unauthorised access.
- Bias and fairness: models must be carefully designed and validated to avoid cultural, racial, or gender bias.
- Accuracy and interpretability: systems should acknowledge uncertainty, avoid overclaiming, and allow users agency in how representations are generated or used.
- Avoiding deception or manipulation: techniques that synthesise or alter facial representations should respect truthfulness and avoid creating misleading impressions.
- Impact on vulnerable users: behavioural logging can unfairly exclude users with disabilities or atypical interaction patterns, and requires careful consideration and accommodations.
Measuring On-Screen Representation
The proposed framework for measuring on-screen representation focuses on three key questions: what is being measured, how is it being measured, and why is it being measured. The framework is most relevant for measuring Equality Act protected characteristics, such as gender, gender identity, age, ethnicity, sexual orientation, and disability.
Measures of diversity generally fall under one of the '3Ps': presence, prominence, and portrayal. 'Presence' metrics include the make-up of the cast by gender or ethnicity. 'Prominence' metrics include duration of screen time, the likelihood of appearing as a solo face on screen, and how central or influential a character is. 'Portrayal' metrics include the emotion of faces or the words spoken by a character, and the likelihood of appearing next to particular objects, such as weapons or drinks.
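As a toy illustration of how presence and prominence metrics might be computed from detection output, the sketch below uses a hypothetical list of per-frame face detections. The character IDs, group labels, and exact metric definitions are illustrative assumptions, not part of the framework itself.

```python
from collections import Counter

# Hypothetical per-frame face detections: each frame lists the
# character IDs whose faces are visible on screen.
frames = [
    ["alice"], ["alice", "bob"], ["bob"], ["alice"],
    ["alice", "carol"], ["bob"], ["alice"], [],
]
groups = {"alice": "F", "bob": "M", "carol": "F"}  # illustrative labels

# Presence: make-up of the cast by group.
cast = Counter(groups.values())

# Prominence: share of frames in which each character appears, and how
# often their appearances are as the only face on screen.
n = len(frames)
screen_share = {c: sum(c in f for f in frames) / n for c in groups}
solo_rate = {
    c: sum(f == [c] for f in frames) / max(1, sum(c in f for f in frames))
    for c in groups
}
print(cast, screen_share, solo_rate)
```

Portrayal metrics would require additional per-face annotations (emotion, co-occurring objects, dialogue), but they could be aggregated over the same frame structure.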
The framework is aimed at media regulators, broadcasters, researchers, and film/TV fans interested in measuring on-screen representation. It is also important to acknowledge whether a given method can capture intersectionality, with evidence identifying the need for more insight into the intersectional dynamics of underrepresented groups.
In the future, technical recommendations and data standards specific to representation metrics can be developed. Data compiled using different methods (surveys, manual counting, computational approaches) may capture different aspects of diversity. The next blog will demonstrate how computer vision can be used to measure the relative prominence of people on screen. Interdisciplinary efforts are key to deploying computational methods thoughtfully and to generating richer, more regular data about representation. Moving from presence to prominence and portrayal can bring new value and prompt new questions.