Women in Film


The issue of gender inequality within the film industry has existed since its inception. Along with a significant lack of women working as directors, producers, and head writers, women are also excluded from major speaking roles. According to an article from the New York Film Academy, only 30.8% of speaking characters are women, and the average ratio of men to women working on films is 5:1. This lack of women in the production of film reveals the deeply ingrained inequality that still exists within the industry. For my final project I chose to include several components to highlight this issue. I was highly influenced by the article Film Dialogue by The Pudding, as it is the largest analysis of film dialogue by gender ever created. My visualization is composed of two quantitative data sources along with two datasets I created myself. In light of recent events and the movement for the empowerment of women, it is important to analyze the data on the issue so we can strive for improvements in the future.

Process and Rationale

This project consists of four visualizations: sexualized attire by gender, a comparison of dialogue between men and women, the IMDB Top 250 films that pass or fail the Bechdel Test, and the gender of the directors of Academy Award ‘Best Picture’ winners. I used Google Sheets to create my datasets. I used two data sources to compile my datasets: the IMDB Top 250 Films page and the Bechdel Test website, which was my main source of inspiration. The Bechdel Test analyzes films by applying three simple rules: 1.) the film has to include two female characters 2.) who speak to one another about 3.) something other than a man. The number of films that fail to pass this simple test is astounding, and highlights the issues in Hollywood that continue to exist. I also used two datasets from external sources that discuss the position of women in film: one entitled Women-in-Film from data.world, and one entitled “Scripts” from GitHub. I used the software program Tableau Public to build my visualizations and combine them into one dashboard. I will go into the specific rationale for each visualization below.
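The three Bechdel Test rules described above can be expressed as a short check. This is only a rough sketch: the function name, the `speakers` and `about_a_man` fields, and the two example films are all hypothetical, and the Bechdel Test website's results come from its own contributors rather than from code like this.

```python
def passes_bechdel(conversations):
    """Return True if any conversation satisfies the three rules.

    Each conversation is a dict (hypothetical structure) with:
      - "speakers": a list of (name, gender) tuples
      - "about_a_man": True if the conversation is about a man
    """
    for conversation in conversations:
        # Rules 1 and 2: at least two distinct female characters talk to each other.
        women = {name for name, gender in conversation["speakers"] if gender == "female"}
        # Rule 3: the conversation is about something other than a man.
        if len(women) >= 2 and not conversation["about_a_man"]:
            return True
    return False


# Hypothetical films, for illustration only.
film_a = [
    {"speakers": [("Ann", "female"), ("Bea", "female")], "about_a_man": True},
    {"speakers": [("Ann", "female"), ("Bea", "female")], "about_a_man": False},
]
film_b = [
    {"speakers": [("Ann", "female"), ("Carl", "male")], "about_a_man": False},
]

print(passes_bechdel(film_a))  # True
print(passes_bechdel(film_b))  # False
```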

Sexualized Attire in Film

This visualization compares the use of sexualized attire in films between male and female actors. The data for this visualization was taken from data.world, which included several datasets about different components of gender in film. While I feel that this visualization provides an interesting perspective, I do have some concerns about how this data was collected. Due to the subjective nature of the content, I do not know what metric the creator used to establish what was considered “sexualized attire” in film. In order to reduce misinformation, I included a caption below my visualization with a link to the source material, which users can refer to for further information. To create the visualization, I had to reformat the dataset to include the male and female variables within the same column so that they could be compared directly. I created a line graph in Tableau and assigned the “Gender” variable to color to differentiate between men and women. I used colorblind-accessible colors for my visualization (as I did throughout my dashboard) and included a color key at the top for clarity.
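As a rough illustration of the reshaping step described above (not necessarily the exact procedure used for this project), the sketch below turns a table with one column per gender into one with a single “gender” column. The field names `year`, `male`, `female`, and `percent` are assumptions, and the numbers are placeholders rather than values from the data.world dataset.

```python
def wide_to_long(rows, id_field="year", gender_fields=("male", "female")):
    """Convert one row per year (a column per gender) into one row per (year, gender)."""
    long_rows = []
    for row in rows:
        for gender in gender_fields:
            long_rows.append({
                id_field: row[id_field],
                "gender": gender,          # the new shared column
                "percent": row[gender],    # the value that used to sit in its own column
            })
    return long_rows


# Placeholder values for illustration only.
wide = [
    {"year": 2007, "male": 1.0, "female": 2.0},
    {"year": 2008, "male": 3.0, "female": 4.0},
]

long_rows = wide_to_long(wide)
for row in long_rows:
    print(row)
```

With the data in this shape, a single “gender” field can drive the color of each line, which is what makes the direct comparison possible.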

Words Spoken in Film by Gender

The second component in my dashboard is a bar graph which compares the words spoken in films according to gender. This visualization was created using two datasets made available on GitHub. This data source included several datasets which analyzed 2,000 films by breaking down the words spoken by all characters in each.

I used two datasets for this visualization. The first included the film title and accompanying “script id” number. This “script id” was used to connect the film titles to the characters and word counts in the second. I toyed with several different iterations of this visualization. I tried listing each film separately, with a bar graph comparing the words spoken by men and women, and included a scroll bar for users to see each film. While this provided more detailed information for the 2,000 films analyzed, it did not provide a striking overview of the topic and ultimately was not effective. I instead chose to connect the datasets within Tableau and create a bar graph for an easy overview comparison. I used the same color scheme as my previous visualization and ensured that the male and female components had uniform colors across the dashboard. Since this graph depicts the words spoken overall by men and women across all 2,000 films analyzed, I included another caption at the bottom of the graph to help users understand the significance of the results. I also included the source link for users to follow for additional information.
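The connection between the two datasets was made inside Tableau, but the same idea can be sketched in a few lines of Python. The field names (`script_id`, `title`, `gender`, `words`) and all values below are hypothetical placeholders, not the actual columns or figures from the GitHub data.

```python
from collections import defaultdict


def total_words_by_gender(films, characters):
    """Join characters to films on script_id, then sum the words per gender."""
    # Look-up table from script id to film title (the "first" dataset).
    titles = {film["script_id"]: film["title"] for film in films}

    totals = defaultdict(int)
    for character in characters:
        # Skip characters whose script id has no matching film,
        # which behaves like an inner join.
        if character["script_id"] not in titles:
            continue
        totals[character["gender"]] += character["words"]
    return dict(totals)


# Placeholder data for illustration only.
films = [
    {"script_id": 1, "title": "Film A"},
    {"script_id": 2, "title": "Film B"},
]
characters = [
    {"script_id": 1, "gender": "male", "words": 100},
    {"script_id": 1, "gender": "female", "words": 40},
    {"script_id": 2, "gender": "male", "words": 60},
    {"script_id": 3, "gender": "female", "words": 999},  # no matching film
]

totals = total_words_by_gender(films, characters)
print(totals)  # {'male': 160, 'female': 40}
```

Summing across every film in this way gives the two totals that the overview bar graph compares.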

IMDB Top 250 Films that Pass/Fail the Bechdel Test

This visualization used a dataset which I created myself. I wanted to include some information from the Bechdel Test, so I used the information from its website to analyze the films included in IMDB’s Top 250 Films page. Going through each film, I recorded the title, the binary result of the test, and the year in which the film was made.

I tried several different ways of representing this data. I was interested in including the years to allow users to see if there were any trends over time, but found in the end that a straightforward bar graph was the most effective in communicating the information. This is the only visualization that does not represent male vs. female, so the color scheme reflects this difference. I also ensured once again that I was using a colorblind-accessible palette. The graph also includes a hover feature which displays the total number of films that pass or fail. While I feel that this graph is relatively plain, it is simple and effective.
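The totals behind a pass/fail bar graph like this one amount to a simple count over the recorded rows. The sketch below assumes each row has hypothetical `title`, `passes`, and `year` fields; the three example rows are placeholders, not entries from my dataset.

```python
from collections import Counter


def tally_results(records):
    """Count how many films pass and how many fail."""
    counts = Counter("Pass" if record["passes"] else "Fail" for record in records)
    # Return both keys even when one of them has a count of zero.
    return {"Pass": counts["Pass"], "Fail": counts["Fail"]}


# Placeholder rows for illustration only.
records = [
    {"title": "Film A", "passes": True, "year": 1990},
    {"title": "Film B", "passes": False, "year": 2001},
    {"title": "Film C", "passes": False, "year": 2010},
]

tally = tally_results(records)
print(tally)  # {'Pass': 1, 'Fail': 2}
```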

Directors of ‘Best Picture’ Oscars Award 1975-2015 by Gender

The final component of this project is a visualization of the directors of the films that won the Academy Award for “Best Picture” from 1975 to 2015, separated by gender. To create this visualization I created my own dataset. I used a list of past winners compiled by Today and looked up the director for each film individually. I created a Google Sheet that included the year, film title, and director gender.

I felt it was important to have an interactive quality to this visualization. I included the tooltip feature from Tableau to allow users to view the variables (film title, year, director gender) for each specific entry. I again used the same color scheme as the others in the dashboard for the gender variables. This analysis of the presence and success of female directors in the industry is startling, and this simple visualization conveys the progress that is yet to be made.

Dashboard Layout

After completing each individual visualization I began to consider how to display them together in a dashboard. I toyed with the size and placement of each component. I knew that I wanted the line graph to stretch across the top to ensure that the information is clear, as it does not represent binary results like my other visualizations. I also wanted the “Directors of ‘Best Picture’ Oscars Award 1975-2015” to have the full height of the dashboard. Due to the interactive nature of the visualization, I wanted to ensure that there was ample white space. I chose to include the legend and citations along with the corresponding graph. As for the size of the dashboard itself, I chose a height and width suited to a computer screen only. I felt that the visualization would be most effective on this type of device as opposed to a phone or tablet.


User Research & Methods

I chose to test three users of various educational backgrounds and interests to examine the visualization. Since films are considered part of popular culture, I feel that this information is applicable to a wide range of users. User 1 has no formal educational background, but is interested in gender studies. User 2 has a Bachelor’s in Studio Art. Although User 2 is not as interested in the analysis of the subject, their background in digital design should provide a design perspective for the project. User 3 is an avid activist for equality and is also a graduate student in the LIS program. Their understanding of the topic and requirements of the project will provide educated criticisms. Together, these users should represent the spectrum of the user group for this topic.

User testing was conducted in both face-to-face and video chat interviews lasting approximately 20-30 minutes each. Prior to the examination of the actual visualization, I conducted a brief questionnaire to establish the knowledge base and attitudes of each user with regard to the topic. Following this brief quiz, I allowed the users to browse the dashboard. I then asked them several questions regarding the design (e.g. Where is your eye drawn to?) and asked them to complete several tasks (e.g. How many films failed the Bechdel Test?). I also included questions about general impressions (e.g. What is your impression of the relationship between film and gender?). To ensure that users were comfortable and vocal in their impressions, I assured them before and throughout the test that there were no wrong answers and that they were not personally being tested. I wanted to make this clear as users can often feel pressured to provide certain answers that they believe people want to hear.


Results and Findings

The interactive visualization can be found here.


Sexualized Attire in Films According to Gender

The visualization clearly represents the values for male and female actors and the trends over time. The line graph depicts the percentage of sexualized attire among male and female actors in film from 2007 to 2013. The inclusion of the color key made it quick and easy to understand the information. While there are some questions about the accuracy of the data used to form the visualization, the inclusion of the source link will allow users to research its credibility as desired.

Words Spoken in Film by Gender

While there was some struggle over how this information should be represented, the use of a bar graph for the totals of all of the films ensured that the overall trend was obvious for users. The graph compares the total number of words spoken by male and female actors in a sample of 2,000 films. The caption at the bottom, with information on the total films included in the analysis and the link to the source, ensures transparency in the methods of creation. The continuity in color schemes provides easier readability and reduces confusion.

IMDB Films that Pass/Fail the Bechdel Test

This visualization also went through some revisions to find the best way to present the information. The comparison of the totals provides a bold and simple overview of the topic. The bar graph shows the total number of films from IMDB’s Top 250 Films list that pass or fail the Bechdel Test. Since this graph does not differentiate between male and female variables (as the others do), it was important to use a different color scheme. The inclusion of the caption which explains the Bechdel Test criteria is important for users who may be unfamiliar with the initiative.

Directors of ‘Best Picture’ Oscars Award 1975-2015

This graph shows the gender of each director for films that have won the Academy Award for ‘Best Picture’ from 1975 to 2015. While the visualization is very simple, the lack of variation is extremely effective in expressing the exclusion of women directors from the award. Again, the same color scheme is applied to differentiate between male and female variables. The use of the interactive hover tool is also important as it provides additional information about the year and film title.

UX Findings

Overall, the user testing indicated that my visualizations were clear and effective in expressing the information in a user-friendly manner. I received positive feedback on the layout of my dashboard and my choice of color schemes. Users were able to complete the tasks asked of them during the testing phase and were confident in their findings and opinions.

Two of my users pointed out the issue of subjectivity in the Sexualized Attire in Film by Gender graph. They expressed concern about the accuracy of the data, and commented that the inclusion of the data source in the caption was important in promoting transparency. One user commented that they did not know what the Bechdel Test was, which prompted me to include a caption explaining the criteria of the test in my final product. Another user asked how I got the results for the Words Spoken in Film by Gender visualization, but they were satisfied once I pointed out the caption below, which explains the total number of films analyzed and gives the source link. Two users commented on the use of the interactive element in the Directors of ‘Best Picture’ Oscars Award 1975-2015 by Gender graph, and one user spent some time looking at each film included in the visualization.

The users agreed that my visualization was effective in discussing the topic and voiced their surprise at the severity of inequality in the film industry.

Future Directions

The user testing and my own reflection have given me several ideas for how I could improve upon the visualization in the future. Based on the comments made during user testing, I would like to refine my visualizations and exclude any data sources that may be unreliable or subjective. While I feel that I handled the issue of the sexualized attire dataset as well as I could, I think that I would feel more secure using a different data source or focusing on a more specific, measurable topic.

Upon my own reflection, I would like to add other elements to the dashboard to show more specific details about the topic, as my visualizations are very broad overviews of trends. I like that the broad visualization of Directors of ‘Best Picture’ Oscars Award 1975-2015 by Gender includes an interactive tool which allows more detailed information to be viewed, and I would like to add similar tools to my other visualizations.

Overall, I like the visualization I produced, but these components would greatly improve the effectiveness of my project.



