CMU Movie Summary Corpus
You can find the data here.
This is the main dataset we used for our analysis. It contains 42,306 movie plot summaries extracted from Wikipedia alongside aligned metadata extracted from Freebase, includingmovie box office revenue
,genre
,release date
,runtime
, andlanguage
.
Character names and aligned information about the actors who portray them, including gender and estimated age at the time of the movie’s release
TMDB
You can find the data here.
The TMDb (The Movie Database) is a comprehensive movie database that provides information about movies, including details like titles, ratings, release dates, revenue, genres, and much more.
This dataset contains a collection of 1,000,000 movies from the TMDB database. Among the features available in the dataset areid
,title
,release_date
,vote_average
,vote_count
,status
,release_date
,revenue
,runtime
,adult
,original_language
,popularity
genres
, andposter_path
.
MovieLens (with Reviews)
You can find the data here.
This dataset contains 32 million ratings that we used to perform sentiment analysis and compute a success score metric alongside other features.