Finding Correlations with Popular Movies

Posted by Nathan Lucero August 25, 2022

GitHub Repository: HERE
Handled with: Jupyter Notebook
Programming Language: Python

What are correlations?

A correlation is a mutual relationship or connection between two or more things. For example, people tend to take an umbrella outside when it's raining, so there is a positive correlation between rain and umbrella usage. The opposite also holds: people tend not to take an umbrella outside on a sunny day, which is an example of a negative correlation between sunshine and umbrella usage. I hope this gives whoever is reading a preview of what this project is trying to find.
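To put numbers on the umbrella example, here is a minimal sketch using numpy. The values are made up purely for illustration (they are not from the movie dataset), and the variable names are my own.

```python
import numpy as np

# Made-up illustrative numbers, not real measurements.
rain_hours = np.array([0, 1, 2, 4, 6, 8])
sun_hours = np.array([10, 9, 7, 5, 3, 1])
umbrellas_seen = np.array([2, 5, 9, 20, 31, 40])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r.
r_rain = np.corrcoef(rain_hours, umbrellas_seen)[0, 1]
r_sun = np.corrcoef(sun_hours, umbrellas_seen)[0, 1]

print(f"rain vs umbrellas: {r_rain:.2f}")  # positive value
print(f"sun vs umbrellas:  {r_sun:.2f}")   # negative value
```

A value near +1 means the two quantities rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is no clear linear relationship.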

The Hypothesis

This project works with a dataset found on Kaggle and focuses on finding correlations within the data. The dataset contains popular movie titles along with their financial data and other movie information (director, release date, main actor, et cetera). My main hypothesis was that there is a correlation between a movie's budget and its box office gross. I assumed that if a company spends a lot of money on a project, then in theory it should produce a profit (not always the case, but I wanted to see whether there is a common trend).

Steps I Took for This Project

Downloaded the CSV file from Kaggle
Imported the Python libraries needed for the analysis (pandas, seaborn, numpy, and matplotlib)
Converted the financial columns from float64 to int64 so the values display cleanly
Generated a scatter plot of budget versus gross (with a title, axis labels, and a trend line)
Generated a heatmap to make the correlations easier to read
Generated additional correlation examples for future reference
Reviewed the results
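
The loading and type-conversion steps can be sketched roughly as follows. This is not the notebook's exact code: the inline CSV text stands in for the Kaggle file, the rows are invented, and the column names (budget, gross, votes, company) are assumptions about how the dataset is laid out. Dropping rows with missing values before the cast is also my assumption, since int64 cannot hold missing values.

```python
import io

import pandas as pd

# Stand-in for the Kaggle CSV: invented rows and assumed column names.
# With the real file this would be pd.read_csv("movies.csv"), where
# "movies.csv" is a hypothetical file name.
csv_text = """name,budget,gross,votes,company
Movie A,19000000.0,46998772.0,927000.0,Studio X
Movie B,4500000.0,58853106.0,65000.0,Studio Y
Movie C,,39846344.0,221000.0,Studio X
Movie D,18000000.0,538375067.0,1200000.0,Studio Z
"""
df = pd.read_csv(io.StringIO(csv_text))

# int64 cannot represent NaN, so drop rows with missing values first.
df = df.dropna(subset=["budget", "gross", "votes"])

# Cast from float64 to int64 to get rid of the trailing ".0".
for column in ["budget", "gross", "votes"]:
    df[column] = df[column].astype("int64")

print(df.dtypes)
```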

What I Found

Budget versus Gross Scatter Plot (positive correlation!):

Axis values are shown in increments of 50 million dollars
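
A plot along these lines could be produced with something like the sketch below. It uses plain matplotlib with a straight-line fit from numpy.polyfit as the trend line, which may differ from what the notebook does (seaborn also offers a regression plot). The sample values are invented, and applying the 50-million tick spacing to the x-axis is my assumption.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import MultipleLocator

# Invented sample values in US dollars, for illustration only.
df = pd.DataFrame({
    "budget": [10_000_000, 50_000_000, 100_000_000, 150_000_000, 200_000_000],
    "gross": [30_000_000, 120_000_000, 310_000_000, 400_000_000, 650_000_000],
})
budget = df["budget"].to_numpy(dtype=float)
gross = df["gross"].to_numpy(dtype=float)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(budget, gross, alpha=0.6)

# Least-squares straight line used as the trend line.
slope, intercept = np.polyfit(budget, gross, deg=1)
xs = np.linspace(budget.min(), budget.max(), 100)
ax.plot(xs, slope * xs + intercept, color="red")

# Put a tick every 50 million dollars on the budget axis.
ax.xaxis.set_major_locator(MultipleLocator(50_000_000))

ax.set_title("Budget versus Gross")
ax.set_xlabel("Budget (USD)")
ax.set_ylabel("Gross (USD)")
```

An upward-sloping trend line (a positive slope) is the visual counterpart of a positive correlation.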

Raw Correlation Matrix (pearson method):
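
For reference, a raw matrix like this can be computed with pandas roughly as sketched here. The data is invented and the column names are assumptions. Selecting only the numeric columns first keeps text columns such as the company name out of the calculation.

```python
import pandas as pd

# Invented sample data with assumed column names.
df = pd.DataFrame({
    "budget": [10_000_000, 50_000_000, 100_000_000, 150_000_000, 200_000_000],
    "gross": [30_000_000, 120_000_000, 310_000_000, 400_000_000, 650_000_000],
    "votes": [40_000, 150_000, 300_000, 520_000, 900_000],
    "company": ["Studio X", "Studio Y", "Studio X", "Studio Z", "Studio Y"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
numeric = df.select_dtypes(include="number")
corr_matrix = numeric.corr(method="pearson")

print(corr_matrix)
```

Each cell holds the correlation between the row's column and the column's column, so the diagonal is always 1 and the matrix is symmetric.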

Correlation Matrix Visual (pearson method):
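
A heatmap of the matrix can be drawn with seaborn along these lines. This is a sketch on invented data with assumed column names. The color map and the fixed -1 to 1 color range are my choices, not necessarily the ones used in the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented sample data with assumed column names.
df = pd.DataFrame({
    "budget": [10_000_000, 50_000_000, 100_000_000, 150_000_000, 200_000_000],
    "gross": [30_000_000, 120_000_000, 310_000_000, 400_000_000, 650_000_000],
    "votes": [40_000, 150_000, 300_000, 520_000, 900_000],
})
corr_matrix = df.corr(method="pearson")

fig, ax = plt.subplots(figsize=(6, 5))
# annot=True writes each correlation value inside its cell.
sns.heatmap(corr_matrix, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=ax)
ax.set_title("Correlation Matrix for Numeric Movie Features")
```

Fixing the color range to -1 and 1 keeps the colors comparable between different heatmaps.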

Conclusion

The data shows that there is, in fact, a positive correlation between a movie's budget and its gross. Companies that spend a lot of money on a movie are more likely to see it succeed at the box office. I also found a positive correlation between votes and gross. This makes sense, since a movie's success is largely determined by the number of tickets sold: if people like a movie, they tend to rate it highly, which in turn draws more people to see it and brings in more profit for the company.

Also (not shown here), I tried converting every company to a unique integer value to see whether the company itself (especially a highly respected one) had any correlation with gross. Unfortunately, that hypothesis failed. However, it did teach me to reframe a problem in creative ways to get more insights in future analyses.
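
One way to do that company-to-integer conversion is with pandas category codes, sketched below on invented data. The column names and the new company_code column are assumptions, and the notebook may have used a different encoding.

```python
import pandas as pd

# Invented sample data with assumed column names.
df = pd.DataFrame({
    "company": ["Studio X", "Studio Y", "Studio X", "Studio Z", "Studio Y"],
    "gross": [30_000_000, 120_000_000, 310_000_000, 400_000_000, 650_000_000],
})

# Each distinct company gets an integer code (assigned in sorted order).
df["company_code"] = df["company"].astype("category").cat.codes

# Pearson correlation between the integer codes and gross.
r = df["company_code"].corr(df["gross"], method="pearson")

print(df[["company", "company_code"]])
print(f"company_code vs gross: {r:.2f}")
```

One thing to keep in mind is that these integers are arbitrary labels: their order follows the alphabet, not a studio's size or reputation, so a weak correlation with gross is not surprising.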