Finding Correlations with Popular Movies
GitHub Repository: HERE
Handled with: Jupyter Notebook
Programming Language: Python
What are correlations?
A correlation is a mutual relationship or connection between two or more things. For example, people tend to take an umbrella outside when it's raining, so there is a positive correlation between rain and umbrella usage. The opposite also holds: people tend not to take an umbrella outside on a sunny day, which is an example of a negative correlation between sunshine and umbrella usage. I hope this gives the reader a preview of what this project is attempting to find.
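To make the idea concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand. The numbers are made up purely for illustration (they are not from the movie dataset), and the variable names are my own. A value near +1 indicates a strong positive correlation, and a value near -1 a strong negative one.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up illustrative observations for five days
rain_mm = [0, 2, 5, 8, 12]      # rainfall
sun_hours = [10, 8, 5, 3, 1]    # hours of sunshine
umbrellas = [1, 3, 6, 9, 14]    # umbrellas counted outside

r_rain = pearson(rain_mm, umbrellas)   # positive: more rain, more umbrellas
r_sun = pearson(sun_hours, umbrellas)  # negative: more sun, fewer umbrellas

print(f"rain vs umbrellas: {r_rain:.2f}")
print(f"sun vs umbrellas:  {r_sun:.2f}")
```

In practice pandas computes the same coefficient with `DataFrame.corr()`, which uses the Pearson method by default.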
The Hypothesis
This project works with a dataset found on Kaggle and focuses on finding correlations within the data. The dataset contains popular movie titles along with their financial data and other movie information (director, release date, main actor, et cetera). My main hypothesis was that there is a correlation between a movie's budget and its box office gross. I assumed that if a company spends a lot of money on a project, then in theory it should produce a profit (this is not always the case, but I wanted to see if there is a common trend).
Steps I Took for this Project
- Downloaded the CSV file from Kaggle
- Imported the Python libraries needed for the analysis (pandas, seaborn, numpy, and matplotlib)
- Changed the data types of the financial columns from float64 to int64 to clean up the financial data
- Generated a scatter plot (with a title, axis labels, and a trend line) of budget versus gross
- Generated a heatmap of the correlations to make the information easier to understand
- Generated more correlation examples for future reference
- Got my results
- Values are incremented by 50 million dollars
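The steps above can be sketched roughly as follows. This is not the notebook's actual code: the file name `movies.csv`, the column names `budget` and `gross`, and the helper names are assumptions for illustration, and dropping rows with missing financial values before the int64 cast is my own choice (the cast fails on missing values). A tiny made-up frame stands in for the Kaggle CSV so the snippet runs on its own.

```python
import numpy as np
import pandas as pd


def clean_financials(df, columns=("budget", "gross")):
    """Drop rows missing financial values, then cast those columns to int64.

    The column names are assumptions; adjust them to match the real CSV.
    """
    cleaned = df.dropna(subset=list(columns)).copy()
    for col in columns:
        cleaned[col] = cleaned[col].astype("int64")
    return cleaned


def plot_budget_vs_gross(df):
    """Draw a scatter plot with a trend line next to a correlation heatmap."""
    # Imported here so the cleaning helper works without the plotting libraries
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, (ax_scatter, ax_heat) = plt.subplots(1, 2, figsize=(14, 5))

    # regplot draws the scatter points plus a fitted linear trend line
    sns.regplot(data=df, x="budget", y="gross", ax=ax_scatter,
                line_kws={"color": "red"})
    ax_scatter.set_title("Budget vs Gross")
    ax_scatter.set_xlabel("Budget (USD)")
    ax_scatter.set_ylabel("Gross (USD)")

    # Heatmap of pairwise Pearson correlations between the numeric columns
    sns.heatmap(df.corr(numeric_only=True), annot=True, ax=ax_heat)
    ax_heat.set_title("Correlation Matrix")

    fig.tight_layout()
    return fig


# Made-up sample standing in for: pd.read_csv("movies.csv")  (hypothetical path)
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "budget": [1.0e7, 5.0e7, np.nan, 1.5e8, 2.0e8],
    "gross": [2.5e7, 9.0e7, 4.0e7, 4.0e8, np.nan],
})

movies = clean_financials(sample)
corr = movies[["budget", "gross"]].corr()  # Pearson by default
print(movies.dtypes)
print(corr)
```

Calling `plot_budget_vs_gross(movies)` in a notebook would then display both figures side by side. It requires seaborn and matplotlib to be installed.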