The ‘hit’ phenomenon: a mathematical model of human dynamics interactions as a stochastic process

A mathematical model for the ‘hit’ phenomenon in entertainment within a society is presented as a stochastic process of human dynamics interactions. The model uses only the advertisement budget time distribution as an input, and word-of-mouth (WOM), represented by posts on social network systems, is used as data to make a comparison with the calculated results. The unit of time is days. The WOM distribution in time is found to be very close to the revenue distribution in time. Calculations for the Japanese motion picture market based on the mathematical model agree well with the actual revenue distribution in time.


Introduction
Human interaction in 'real' society can be considered using 'many-body' theory. With the popularization of social network systems (SNS) such as blogs, Twitter, Facebook, Google+ and other similar services around the world, interactions between accounts can be stored as digital data. Although SNS society is not the same as real society, we can assume that communication in the two is very similar. Thus, we can use the huge stock of digital data on human communication as observation data for real society [1][2][3][4]. This makes it possible to apply statistical mechanics methods to the social sciences. Since word-of-mouth (WOM) is very significant (for example, in marketing science [5][6][7][8]), analyzing and predicting digital WOM in the sense of statistical physics is important today.
In this paper, as an application of the statistical mechanics of human dynamics, we focus on motion picture entertainment, because the communication logs for each movie in SNS can be distinguished easily, and market competition between movies can be neglected because each movie has its own character; for example, the markets for the Harry Potter series, the Pirates of the Caribbean series and Avatar can be distinguished. Moreover, the motion picture industry in Japan has traditionally kept daily revenue data for each movie.
The theoretical treatment of the motion picture business has a long history in the social sciences, such as marketing. The traditional method of forecasting the revenue R of a motion picture assumes the following simple model:

$$ R = A^{\alpha_1} B^{\alpha_2} C^{\alpha_3} D^{\alpha_4}, \qquad (1) $$

where A, B, C and D represent factors such as the advertisement budget for the movie, strength of WOM, star power, quality of story, quality of music, etc. The formula is then 'linearized' as follows:

$$ \log R = \alpha_1 \log A + \alpha_2 \log B + \alpha_3 \log C + \alpha_4 \log D. \qquad (2) $$
Using the huge stock of market data, the coefficients $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$ are determined statistically [9][10][11][12][13][14][15]. However, before discussing the actual values of the coefficients, physicists must question the model of equation (1) itself: its form should be examined carefully and, ideally, derived. Moreover, the model of equation (1) is too simplified to include the dynamics of human interactions. In actual society, including SNS society, communication between humans is dynamic, so a more realistic model is needed to describe the aggregate behavior of communication in society.
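The statistical determination of the coefficients in equation (2) is usually an ordinary least-squares fit on the logarithms. The following is a minimal sketch, assuming noise-free synthetic data and no intercept term (matching equation (2) as written); the function names and the exponents used in the demo are hypothetical, and this is not the specific procedure of any work cited in [9]-[15].

```python
import math
import random


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_log_linear(factors, revenues):
    """Fit log R = sum_k alpha_k log X_k (equation (2)) by least squares.

    factors: one tuple (A, B, C, D) of positive values per movie.
    revenues: the corresponding positive revenues R.
    Returns the estimated [alpha_1, ..., alpha_K]."""
    rows = [[math.log(x) for x in f] for f in factors]
    y = [math.log(r) for r in revenues]
    k = len(rows[0])
    # Normal equations: (X^T X) alpha = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


if __name__ == "__main__":
    random.seed(0)
    true_alpha = [0.5, 0.3, 0.1, 0.2]  # hypothetical exponents
    movies = [tuple(random.uniform(1.0, 100.0) for _ in range(4)) for _ in range(20)]
    revenue = [math.prod(x ** a for x, a in zip(f, true_alpha)) for f in movies]
    print([round(a, 6) for a in fit_log_linear(movies, revenue)])
```

With real box-office data the residuals are not zero, and the fit only tells us how well the fixed functional form of equation (1) matches; it says nothing about whether that form is right, which is the point raised in the text.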
Approaches from physicists also exist [16][17][18][19]. These approaches address the statistical law and the dynamics of motion picture popularity at the box office. Sinha and co-workers [16] found a long-tail distribution of the popularity of top movies in theaters and discussed in detail the similarities and differences between two types of hits, blockbusters and sleepers, mainly in the US market. They suggested that popularity may be the outcome of a linear multiplicative stochastic process. They found the lognormal nature of the tail of the total income distribution and the bimodal form of the overall gross income distribution. They also discussed how gross income per theater decays with time.
Before discussing their work further, we should point out that the Japanese market is very different from the markets of the US and India. Because of the small land area and the high concentration of people in metropolitan areas, Japanese movie theaters are heavily concentrated in metropolitan areas such as the greater Tokyo area, the greater Osaka area and Nagoya. Moreover, the three major motion picture companies, Toho, Toei and Shochiku, control most theaters in Japan. Because of this concentration of theater locations and the control of the major companies, sleeper-type hits never happen in the Japanese motion picture market: all Japanese hit movies are of the blockbuster type. Therefore, the detailed analysis of sleeper-type hits by Sinha et al [16] cannot be applied to the Japanese motion picture entertainment industry.
For blockbusters, Pan and Sinha [17] pointed out that the opening week is the most critical event in the commercial life of a movie. However, from the analysis of their weekly data, they concluded that advertising may not be a decisive factor in the success of a movie at the box office. Asur and Huberman [18] used Twitter logs for movies, focusing on the week before the opening and the first two weeks after it. However, they did not consider the correlation between daily advertisements and daily weblog (blog) entries. Moreover, they considered only the total advertisement budget of each movie, not its daily advertisement budget.
Ratkiewicz et al [19] discussed online popularity. They proposed a minimal model combining the classic preferential popularity increase mechanism with the occurrence of random popular shifts due to exogenous factors. They analyzed two large-scale networks: Wikipedia and the Chilean Web. Although their analysis is very interesting and useful, the sudden increase in the popularity of a certain movie has too short a duration for their approach.
The dynamics of popularity growth have also been investigated theoretically with mathematical models. A stochastic process has been used to forecast motion picture revenues [20], but the approach is still incomplete and could be made more accurate using data from blogs, Twitter or Facebook posts.
A better approach from the point of view of physicists is the so-called Bass model, which was presented in 1969 as a simple model of aggregation behavior for WOM [21,22]. The key concept of the Bass model is the diffusion of WOM in society. Many modified Bass models have been presented to analyze WOM for motion pictures [23,24]. In the Bass model, we consider the number of adopters at time t, R(t). The number of non-adopters is N − R(t), where N is the number of persons in the market. If advertisements affect the number of people who adopt the product, the rate of adoption can be written as

$$ \frac{dR(t)}{dt} = p\,\bigl(N - R(t)\bigr), $$

where p is the probability per unit time that a non-adopter adopts the product because of the advertisement. People can also be affected by WOM from adopters. If we consider only the WOM effect, we find that

$$ \frac{dR(t)}{dt} = q\,\frac{R(t)}{N}\,\bigl(N - R(t)\bigr), $$

where q is the probability per unit time that a non-adopter adopts the product because of WOM from adopters. Combining both effects, we obtain

$$ \frac{dR(t)}{dt} = \left(p + q\,\frac{R(t)}{N}\right)\bigl(N - R(t)\bigr). $$

This is the equation of the Bass model, in which advertisement enters only through the factor p. Many modified Bass models include the decline of the advertisement effect over time through an exponential decay function of $t - t_0$, where $t_0$ is the release day of the product. However, real marketing actions begin several weeks before release, and such modified Bass models do not include these pre-release advertisement effects. Moreover, neither the Bass model nor the modified Bass models include rumor effects in real society, which cannot be described by person-to-person two-body interactions.
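To see the S-shaped adoption curve the Bass model produces, the equation $dR/dt = (p + qR/N)(N - R)$ can be integrated numerically. The sketch below uses forward-Euler steps and optionally multiplies p by an exponential decay $e^{-\lambda t}$ after release, as in the modified models; the decay form, the parameter values and the function name are assumptions for illustration, not taken from [21]-[24].

```python
import math


def bass_adoption(p, q, market_size, days, dt=0.01, decay=0.0):
    """Integrate dR/dt = (p(t) + q R/N) (N - R) with forward-Euler steps of size dt.

    p(t) = p * exp(-decay * t) mimics an exponentially decaying advertisement
    factor (decay=0.0 recovers the original Bass model); time is measured from
    the release day t0 = 0 and R(0) = 0.
    Returns the cumulative adopters at the end of each day (length days + 1)."""
    steps_per_day = int(round(1.0 / dt))
    r, t = 0.0, 0.0
    daily = [r]
    for _ in range(days):
        for _ in range(steps_per_day):
            p_t = p * math.exp(-decay * t)
            r += dt * (p_t + q * r / market_size) * (market_size - r)
            t += dt
        daily.append(r)
    return daily


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    adopters = bass_adoption(p=0.03, q=0.4, market_size=1_000_000, days=30)
    new_per_day = [b - a for a, b in zip(adopters, adopters[1:])]
    print("peak adoption day:", max(range(len(new_per_day)), key=new_per_day.__getitem__))
```

Because adoption starts only at t = 0 in this sketch, it illustrates the limitation noted above: nothing in the model represents advertising that runs before release.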
From the brief review of previous studies above, we find that the effects of advertisements and WOM are only partially included, and the rumor effect is not included at all. Therefore, from the point of view of statistical physics, we present in this paper a model that includes all three effects: the advertisement effect, the WOM effect and the rumor effect. The model is applied to the motion picture business in the Japanese market, and we compare our calculations with the reported revenue and the observed number of blog posts for each film.

Purchase intention
We start the modeling from the viewpoint of the individual consumer. We define the purchase intention of the individual consumer, labeled i, at time t as I i (t). We assume that the number of products adopted until time t can be written as where N is the maximum number of adopted persons, p is the price of the product and t 0 is the release date of the product. Thus, our problem is to define the equation of the purchase intention of each consumer I i (t). We consider the modeling of the effects of advertisements, WOM and rumor for the purchase intention in the following subsections.

Advertisement effect
The advertisement effect through mass media such as TV, newspapers, magazines, the Web, Facebook and Twitter is modeled as an external force in the equation of the purchase intention of the individual consumer:

$$ \frac{dI_i(t)}{dt} = c_i\, A(t), $$

where A(t) is the time distribution of the effective advertisement effect per unit time and the coefficient $c_i$ describes the impression the advertisement makes on consumer i. The external force A(t) can also represent trends in the world or political pressure on the market. In the application to the motion picture business in the Japanese market, we input the real daily advertisement budget recorded by the largest advertising agency in Japan, Dentsu Inc.

The word-of-mouth effect
Usually, a film's success spreads through WOM, which sometimes has a very significant effect on the success of the movie. Thus, the WOM effect should be included in our theory. We distinguish two types of WOM: direct WOM from friends and indirect WOM in the form of rumors. We call WOM between friends direct communication, because customers obtain the information directly from their friends. In previous marketing theories based on the Bass model [21][22][23][24], usually only communication from adopters to non-adopters is taken into account. Here, we also include communication between non-adopters, which is very significant for movie entertainment, especially before the opening of the movie. Suppose that person i hears information from person j. The probability per unit time that this information affects the purchase intention of person i can be described as $D_{ij} I_j(t)$, where $I_j(t)$ is the purchase intention of person j and $D_{ij}$ is the coefficient of direct communication. A schematic image of the direct communication between persons i and j is shown in figure 1. Thus, we can write the effect of direct communication as follows:

$$ \frac{dI_i(t)}{dt} = \sum_{j \neq i} D_{ij}\, I_j(t), $$

where the summation excludes j = i.

In this paper, the rumor effect is termed indirect communication. In this form of communication, a person hears a rumor while chatting on the street, overhears a conversation at the next table in a restaurant or on a train, or finds the rumor in blogs or on Twitter. The situation is illustrated in figure 2(a), where many conversations are conducted on the streets of a city. To construct the theory mathematically, we focus on one person who listens to a conversation happening around him or her. Suppose that person i overhears the conversation between person j and person k. The strength of the effect of the conversation can be described as $D_{jk} I_j(t) I_k(t)$.
The probability per unit time that the conversation affects the purchase intention of person i is defined as $Q_{ijk} D_{jk} I_j(t) I_k(t)$, where $Q_{ijk}$ is a coefficient. Thus, the indirect communication coefficient can be defined as $P_{ijk} = Q_{ijk} D_{jk}$. The situation is shown in figure 2. Direct communication is therefore a two-body interaction, and indirect communication is a three-body interaction. Our theory for 'hit' phenomena such as hit movies or music can thus be described by an equation for the purchase intention of person i with two-body and three-body interaction terms.
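The difference between the two-body and three-body terms is easiest to see by computing both for a small population. The sketch below evaluates $\sum_{j \neq i} D_{ij} I_j$ and $\sum_{j,k} P_{ijk} I_j I_k$ for one listener i; excluding j = i, k = i and j = k from the three-body sum (person i overhears two other people) is an assumption of this sketch, and the function names and numbers are hypothetical.

```python
def direct_term(i, d, intention):
    """Two-body (direct) communication: sum over j != i of D[i][j] * I_j."""
    return sum(d[i][j] * intention[j] for j in range(len(intention)) if j != i)


def indirect_term(i, p, intention):
    """Three-body (indirect) communication: sum over j, k of P[i][j][k] * I_j * I_k.

    Assumption: person i overhears a conversation between two *other* people,
    so terms with j == i, k == i or j == k are excluded."""
    n = len(intention)
    return sum(
        p[i][j][k] * intention[j] * intention[k]
        for j in range(n)
        for k in range(n)
        if j != i and k != i and j != k
    )


if __name__ == "__main__":
    intention = [0.1, 0.2, 0.3, 0.4]  # hypothetical purchase intentions I_j
    n = len(intention)
    d = [[0.5] * n for _ in range(n)]  # uniform D_ij, hypothetical
    p = [[[1.0] * n for _ in range(n)] for _ in range(n)]  # uniform P_ijk, hypothetical
    print(direct_term(0, d, intention), indirect_term(0, p, intention))
```

The direct term grows linearly with the other people's intentions, while the indirect term grows quadratically, which is why rumors can amplify a hit once many people are already talking about it.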

Decline of audience
A person usually watches a particular movie only once. This means that the potential number of attendees decreases monotonically after the opening day. In figure 3, we show the typical decline of cinema box office returns for the Japanese movie market. It should be pointed out that DVD sales of a movie in Japan begin several months after its opening day at the cinema; this is the business practice in the Japanese entertainment industry. Thus, on the days shown in figure 3, no one in Japan can buy the movie on DVD or online.
Figure 3 shows that the revenues decrease almost exponentially. This is natural, because the audience decreases monotonically: a person who has watched the movie does not watch it again. In figure 3, only film E shows a sudden decrease, which was caused by a sudden scandal involving the actress in the movie. The data show that the decay factors for most movies in figure 3 are similar, with an exponential decay rate of nearly 0.06 per day. This value is similar to the value reported in [17] for Spider-Man in the US market, and it agrees well with the empirical rule of the Japanese movie market that the audience decreases by roughly 6% per day.
This exponential decrease can be explained easily using a simple mathematical model. We denote the number of potential audience members by $N_0$ and the cumulative number of audience members at time t by N(t). The number of people who are interested in the movie but have not yet watched it is then $N_0 - N(t)$. Assuming that the probability of watching the movie per day is a, the number of audience members obeys

$$ \frac{dN(t)}{dt} = a\,\bigl(N_0 - N(t)\bigr). $$

The solution of this equation with N(0) = 0 is

$$ N(t) = N_0\,\bigl(1 - e^{-at}\bigr), $$

so the daily audience, $dN(t)/dt = a N_0 e^{-at}$, decays exponentially. This result roughly explains the exponential decay of the audience shown in figure 3. The purchase intention of person i, $I_i(t)$, can also be considered to decay in a similar manner.
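As a quick check of the numbers above, a decay rate a = 0.06 per day implies a day-to-day audience ratio of $e^{-0.06}$. The sketch below evaluates the closed-form solution $N(t) = N_0(1 - e^{-at})$ and the resulting daily audience; the function names and the value of $N_0$ are hypothetical.

```python
import math


def cumulative_audience(n0, a, t):
    """N(t) = N0 (1 - exp(-a t)): solution of dN/dt = a (N0 - N) with N(0) = 0."""
    return n0 * (1.0 - math.exp(-a * t))


def audience_on_day(n0, a, day):
    """Audience during [day, day + 1): N(day + 1) - N(day) = N0 e^{-a day} (1 - e^{-a})."""
    return cumulative_audience(n0, a, day + 1) - cumulative_audience(n0, a, day)


if __name__ == "__main__":
    a = 0.06  # decay rate per day, as read off figure 3
    ratio = audience_on_day(1.0e6, a, 11) / audience_on_day(1.0e6, a, 10)
    print(f"day-to-day ratio {ratio:.4f}, daily decrease {1 - ratio:.1%}")
```

The ratio is independent of the day and of $N_0$; it evaluates to about 0.9418, a daily decrease of about 5.8%, consistent with the "roughly 6%" empirical rule.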

Equation of purchase intention for the 'hit' phenomenon
Based on the above considerations, we write the equation of purchase intention at the individual level as

$$ \frac{dI_i(t)}{dt} = -a\, I_i(t) + \sum_{j \neq i} d_{ij}\, I_j(t) + \sum_{j}\sum_{k} h_{ijk}\, I_j(t)\, I_k(t) + f_i(t), \qquad (12) $$

where $d_{ij}$, $h_{ijk}$ and $f_i(t)$ are the coefficient of direct communication, the coefficient of indirect communication and the random effect for person i, respectively. We consider this equation for every consumer, i = 1, ..., $N_p$. Equation (12) thus combines direct communication, indirect communication and the decline of the audience into a mathematical model of the hit phenomenon. The advertisement and publicity effect for each person is described by the random effect $f_i(t)$.
Equation (12) holds for every individual, but it is not convenient for analysis. We therefore consider the ensemble average of the purchase intention over individuals:

$$ I(t) = \frac{1}{N_p} \sum_{i=1}^{N_p} I_i(t). $$

Taking the ensemble average of equation (12), we obtain for the left-hand side

$$ \frac{1}{N_p}\sum_{i} \frac{dI_i(t)}{dt} = \frac{dI(t)}{dt}. $$

For the right-hand side, replacing $d_{ij}$ and $h_{ijk}$ by their mean values d and h, the ensemble averages of the first, second and third terms are

$$ \frac{1}{N_p}\sum_i \bigl(-a\, I_i(t)\bigr) = -a\, I(t), \qquad \frac{1}{N_p}\sum_i \sum_{j \neq i} d_{ij}\, I_j(t) \simeq N_p d\, I(t), \qquad \frac{1}{N_p}\sum_i \sum_{j}\sum_{k} h_{ijk}\, I_j(t)\, I_k(t) \simeq N_p^2 h\, I(t)^2. $$

For the fourth term, the random effect term, we divide the random effect into a collective effect and an individual effect:

$$ f_i(t) = f(t) + \tilde{f}_i(t), $$

where $\tilde{f}_i(t)$ is the deviation of the individual external effect from the collective effect f(t). We consider that the collective external effect f(t) corresponds to advertisements and publicity directed at persons in society, while $\tilde{f}_i(t)$ describes how individual persons deviate from this collective advertisement and publicity effect; we can assume that these deviations average out:

$$ \frac{1}{N_p}\sum_i \tilde{f}_i(t) = 0. $$

Therefore, we obtain the equation for the ensemble-averaged purchase intention:

$$ \frac{dI(t)}{dt} = -a\, I(t) + D\, I(t) + P\, I(t)^2 + f(t), \qquad (21) $$

where $D = N_p d$ and $P = N_p^2 h$. Equation (21) can be applied to the purchase intention in the real market. It is the equation we assumed without derivation in our previous works [25][26][27][28][29][30]. In this paper, we apply it to the motion picture business.
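Because the market data are daily, an ensemble-averaged equation of this kind can be stepped forward one day at a time. The sketch below assumes equation (21) has the form $dI/dt = -aI + DI + PI^2 + f(t)$ (decline, direct communication, indirect communication and external input) and applies a forward-Euler step of one day; the function name, the campaign shape and all parameter values are hypothetical.

```python
def integrate_intention(f, a, d, p, i0=0.0):
    """Step the ensemble-averaged purchase intention forward one day at a time.

    Assumed form of equation (21):  dI/dt = -a I + D I + P I^2 + f(t),
    discretized with a forward-Euler step of dt = 1 day:
        I(t + 1) = I(t) + (-a I(t) + D I(t) + P I(t)^2 + f(t)).

    f: daily external input f(t) (e.g. scaled advertisement cost), t = 0, 1, ...
    Returns [I(0), I(1), ..., I(len(f))]."""
    values = [i0]
    for f_t in f:
        i = values[-1]
        values.append(i + (-a * i + d * i + p * i * i + f_t))
    return values


if __name__ == "__main__":
    # Hypothetical campaign: 30 days of advertising, then 60 days without.
    ad_input = [1.0] * 30 + [0.0] * 60
    path = integrate_intention(ad_input, a=0.06, d=0.02, p=1e-5)
    peak_day = max(range(len(path)), key=path.__getitem__)
    print(peak_day, round(path[peak_day], 2))
```

With P > 0 the quadratic term can make the iteration run away if I(t) becomes large, so in a sketch like this the parameters need to keep the decline term dominant.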

Observed data in the market
For the application of equation (21) to the real markets of motion pictures, we observe some market data as inputs and compare the observation with our calculation. The market data we use here are daily advertisement cost, daily revenue and daily number of blog postings on the Internet.

Daily advertisement data
In much of the previous marketing science research [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][20][21][22][23][24], advertising costs are considered only in total. However, the time distribution of the daily advertising cost is very important. In figure 4, we show the advertising cost and the related gross rating point (GRP) for The Da Vinci Code in the Japanese market. GRP is a term used in advertising to measure the size of the audience reached by a specific media vehicle or schedule [31]; it is the product of the percentage of the target audience reached by an advertisement and the frequency with which they see it during a given campaign. Both the daily advertising costs and the GRP for each movie in the Japanese market are obtained from Dentsu Inc., the largest advertising agency in Japan. In Japan, Dentsu handles most advertisements appearing on every TV station except NHK (the national station), and all campaigns for all movies in all Japanese prefectures are organized entirely by Dentsu; thus, the advertisement cost data from Dentsu are exactly equal to the advertisement costs of the movies in the Japanese market. The daily advertising costs do not include the discount given to major movie production companies in Japan, but the discount rate is constant during the campaign for each movie.
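The GRP definition above is simple arithmetic: reach (as a percentage of the target audience) multiplied by the average exposure frequency. A minimal sketch with hypothetical numbers, not Dentsu data:

```python
def gross_rating_points(reach_percent, average_frequency):
    """GRP = percentage of the target audience reached x average exposure frequency."""
    if not 0.0 <= reach_percent <= 100.0:
        raise ValueError("reach must be a percentage between 0 and 100")
    return reach_percent * average_frequency


if __name__ == "__main__":
    # Hypothetical campaign: 60% of the target audience, each seeing the ad 3 times.
    print(gross_rating_points(60.0, 3.0))  # 180.0
```

Because GRP multiplies reach by frequency, it can exceed 100; a GRP of 180 says nothing about whether many people saw the ad once or fewer people saw it many times.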
Figure 4 clearly shows that the advertising costs and GRP are distributed over two months after the opening of the movie. This feature is found in every movie campaign in the Japanese market. Thus, not only the total advertising cost but also its time distribution is very important for the success of the advertising campaign for a motion picture.

Daily audience
The daily number of audience members for each movie is very important for the mathematical study of a motion picture campaign. In Japan, DVD sales and online access to movies are not permitted until several months after the theatrical opening. Thus, the number of audience members is exactly the number of people who watch the movie: during this period, no one can watch the movie without going to a movie theater. We obtain the daily audience data from Box Office Japan (Kogyo Tsushinsha, Tokyo).

Daily blog postings
Daily blog posts about a movie are a very important signal for measuring changes in purchase intention in society. We measure the daily number of posts about each movie using Kizasi, a service for observing blog posts in Japan. We count blog posts for 25 movies in Japan in order to compare them with the box office gross income of each movie. Several results are shown in figures 5-9, where the daily blog post curve is multiplied by a constant so that its scale is similar to that of the revenue.
From figures 5-9, we find that the curve of daily blog posts for each movie is very similar to the curve of daily revenue reported in our previous works [27,28]. We observed 25 movies in the Japanese market and found the same feature for all of them. Since the 25 movies were a random sample of hit movies in Japan, we consider that this similarity holds for most popular motion pictures in the Japanese market. This observation implies that the number of blog posts per audience member is almost constant over the screening period of each movie, so that the daily number of blog posts closely follows the daily revenue.
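The text does not state how the normalizing constant for the blog curves is chosen. One natural choice, assumed here, is the least-squares constant $c = \sum_t b_t r_t / \sum_t b_t^2$ that best maps blog counts $b_t$ onto revenues $r_t$; the Pearson correlation coefficient then gives a single number for how "very similar" the two curves are. The function names and demo series are hypothetical.

```python
import math


def best_scale(blog_counts, revenues):
    """Least-squares constant c minimizing sum_t (revenue_t - c * blog_t)^2."""
    return sum(b * r for b, r in zip(blog_counts, revenues)) / sum(b * b for b in blog_counts)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    blogs = [120, 340, 910, 700, 520, 400, 310]  # hypothetical daily blog counts
    revenue = [1.1e6, 3.5e6, 9.0e6, 7.2e6, 5.1e6, 3.9e6, 3.2e6]  # hypothetical daily revenue
    print(round(best_scale(blogs, revenue)), round(pearson(blogs, revenue), 3))
```

A correlation close to 1 supports the proportionality claim; the scale constant itself differs from movie to movie and carries no meaning beyond normalization.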
Blog posts about each film can be classified into positive, negative and neutral opinions. A positive opinion means that the blogger wants to watch the film or judges the watched film positively. In figure 10, we show that more than half of the blog posts express a positive opinion for several movies. Moreover, we find that the ratio of positive, negative and neutral opinions is almost constant throughout the screening period. Thus, the observed blog post counts can be considered to be proportional to the counts of positive blog posts.
Based on this observation, we propose to use the daily number of blog posts as the daily 'quasi-revenue'. Quasi-revenue is very useful for analysis because it can be defined even before the opening of the movie, so that we can observe the growth of anticipation for a movie.

Calculation
We calculate the daily purchase intention using equation (21) with the daily advertisement cost as the input f(t). The parameters in equation (21) are adjusted to fit the observed quasi-revenue for each movie.

Calculation in detail
For the calculation of each movie, we should derive the actual formulation for the calculation where the adopter and the non-adopter are distinguished. We can write the direct communication term of the non-adopter-to-non-adopter interaction in equation (21) as follows:  where The direct communication term for the adopter to the non-adopter is Similarly, we obtain the indirect communication term due to the communication between non-adopters at time t: where p nn is the factor of the indirect communication between non-adopters at time t.
For the indirect communication, we obtain two more terms corresponding to the indirect communication due to the communication between adopters and that between an adopter and a non-adopter as follows: where p yy is the factor of the indirect communication between adopters and p ny is the factor of the indirect communication due to the communication between adopters and non-adopters at time t.
Finally, we obtain the equation of purchase intention for the actual calculation as follows: Before the opening of the movie, there are no adopters, so the total number of adopters is zero. Thus, for the period before opening, equation (28) reduces to the following form: Equation (28) with (24) is a nonlinear integro-differential equation. However, since the data are handled daily, the time step is 1 day, and we can solve the equation numerically as a difference equation.

Reliability factor
To assess the reliability of our fits, we introduce the so-called 'R-factor' (reliability factor), well known in the field of low-energy electron diffraction (LEED) experiments [32]. In LEED experiments, the experimentally observed current-voltage curve is compared with the corresponding theoretical curve using the R-factor.
For our purpose, we define the R-factor as follows: where the functions f(i) and g(i) are defined in figure 11. The smaller the R-factor, the better the match in functions f and g. We use this R-factor as a guide to obtain the best adjustment of our parameters for each movie.
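The specific R-factor formula is not reproduced here. The sketch below assumes one simple normalized squared-difference form, $R = \sum_i (f(i) - g(i))^2 / (\sum_i f(i)^2 + \sum_i g(i)^2)$, in the spirit of the LEED R-factors [32]; it may differ from the exact definition used in this paper, and the function name is hypothetical.

```python
def r_factor(f, g):
    """Normalized squared-difference R-factor between two curves sampled at the same days.

    Assumed form: R = sum_i (f(i) - g(i))^2 / (sum_i f(i)^2 + sum_i g(i)^2).
    R = 0 for identical curves; for non-negative curves, R <= 1."""
    if len(f) != len(g):
        raise ValueError("curves must have the same length")
    num = sum((x - y) ** 2 for x, y in zip(f, g))
    den = sum(x * x for x in f) + sum(y * y for y in g)
    return num / den


if __name__ == "__main__":
    observed = [5, 40, 90, 70, 50, 35, 25]  # hypothetical quasi-revenue (blog counts)
    calculated = [8, 35, 85, 75, 52, 33, 20]  # hypothetical model output
    print(round(r_factor(calculated, observed), 4))
```

The normalization makes R dimensionless, so values can be compared across movies with very different audience sizes.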

Results
We perform calculations using equation (29) for many movies in the Japanese market; several results are shown in figures 12-16. The solid line shows our calculation with the real daily advertisement costs as the input f(t) of equation (29). The parameters are chosen by trial and error to minimize the R-factor, which is computed over the same time range as shown in each figure. For some parameters, we also perform Metropolis-like random-number sampling to minimize the R-factor; we do not do this for all parameters because it would require too much computation time. The R-factor is shown for each calculation. The results are compared with the number of blog posts (the quasi-revenue), shown as a red histogram in the figures. The agreement between the calculation and the quasi-revenue is very good.
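The details of the Metropolis-like sampling are not given in the text. The sketch below shows one way such a search could work: random parameter perturbations are always accepted when they lower the R-factor and are accepted with probability $e^{-\Delta R/T}$ when they raise it, while the best parameters seen are recorded. The stand-in model (a single exponential decline), the assumed R-factor form, the "temperature" T and the step sizes are all hypothetical, not the paper's equation (29).

```python
import math
import random


def r_factor(f, g):
    """Assumed R-factor: sum (f - g)^2 / (sum f^2 + sum g^2)."""
    num = sum((x - y) ** 2 for x, y in zip(f, g))
    den = sum(x * x for x in f) + sum(y * y for y in g)
    return num / den


def toy_model(params, days):
    """Hypothetical stand-in model: amplitude * exp(-rate * t)."""
    amplitude, rate = params
    return [amplitude * math.exp(-rate * t) for t in range(days)]


def metropolis_fit(observed, start, step, temperature=1e-3, iterations=5000, seed=0):
    """Metropolis-like random search minimizing the R-factor; returns (best, best_r)."""
    rng = random.Random(seed)

    def cost(params):
        if any(x <= 0.0 for x in params):  # reject non-physical parameters
            return math.inf
        return r_factor(toy_model(params, len(observed)), observed)

    current = list(start)
    current_r = cost(current)
    best, best_r = list(current), current_r
    for _ in range(iterations):
        trial = [x + rng.gauss(0.0, s) for x, s in zip(current, step)]
        trial_r = cost(trial)
        # Accept improvements; accept worse trials with probability exp(-dR / T).
        if trial_r <= current_r or rng.random() < math.exp(-(trial_r - current_r) / temperature):
            current, current_r = trial, trial_r
            if current_r < best_r:
                best, best_r = list(current), current_r
    return best, best_r


if __name__ == "__main__":
    observed = toy_model([100.0, 0.06], 60)  # synthetic "data" with a known answer
    print(metropolis_fit(observed, start=[50.0, 0.2], step=[1.0, 0.005]))
```

Occasionally accepting worse trials helps the search escape shallow local minima, which matters more for the many-parameter, nonlinear equations used in the paper than for this two-parameter toy.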
The parameters used in the above calculations are shown in table 1. The parameter values are similar for these movies, although not equal. The values for Avatar will be discussed later.

[Figure: Pirates of the Caribbean: At World's End; axis: posting counts]

Agreement with observations
In figures 12-16, we find that our calculations agree well with the observed daily number of blog posts for each movie. Since the daily number of blog posts for a movie is proportional to its revenue, this agreement means that our calculation can reproduce the revenue of each movie. During the period considered, movies in Japan can only be watched in cinemas, because online downloads and DVDs become available only several months after the opening day. Moreover, cinema entrance fees in the Japanese market are similar for almost all cinemas. Thus, the agreement of our calculation with the daily number of blog posts means that our calculation describes the movement of people in society very well, at least for the Japanese motion picture market.
The agreement here can be considered to mean that our theory can describe the collective motion of people in society, at least for the entertainment market in Japan. Since our theory is considered to be general, we expect that it can be applied to other entertainment markets in the world.

Optimization of time distribution of advertisement cost
As we saw in section 3.1, the advertisement cost is distributed in time. The results calculated using equations (28) and (29) of our theory depend strongly on this time distribution. This means that, even for the same total cost, the total revenue can change depending on how the advertisement cost is distributed in time. Thus, using equations (28) and (29), we can consider how to optimize the time distribution of the advertisement cost to maximize the total revenue under a fixed total advertisement budget.
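The idea of re-timing a fixed budget can be explored with a few lines of code. This sketch assumes the averaged equation takes the form $dI/dt = -aI + DI + PI^2 + f(t)$ with f(t) proportional to the daily advertisement cost, and uses the sum of I(t) over a 90-day horizon as a stand-in for total revenue; the parameters, the horizon, the revenue proxy and the function names are hypothetical, whereas the paper's actual calculation uses equations (28) and (29).

```python
def simulate_intention(f, a, d, p):
    """Daily forward-Euler steps of the assumed equation dI/dt = -aI + DI + PI^2 + f(t)."""
    i, path = 0.0, []
    for f_t in f:
        i += -a * i + d * i + p * i * i + f_t
        path.append(i)
    return path


def total_response(schedule, a=0.06, d=0.02, p=1e-5, days=90):
    """Sum of I(t) over `days` days: a hypothetical proxy for total revenue."""
    f = list(schedule) + [0.0] * (days - len(schedule))
    return sum(simulate_intention(f, a, d, p))


def spread_budget(total, active_days):
    """Spend a fixed total budget evenly over the first `active_days` days."""
    return [total / active_days] * active_days


if __name__ == "__main__":
    budget = 30.0  # hypothetical total, in arbitrary units
    for active_days in (5, 15, 30, 60):
        schedule = spread_budget(budget, active_days)
        print(active_days, round(total_response(schedule), 1))
```

All four schedules spend the same total, so any difference in the printed responses comes only from timing. A real optimization would search over general schedules, for example with a random search like the Metropolis-like method mentioned in the text.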

Indirect communication
One of the original ideas of the present theory is the effect of indirect communication, as explained in section 2.3. To demonstrate this effect, we perform two calculations, one including indirect communication and one without it. The example shown here is Avatar, which was a very big hit in 2010. Both calculations are optimized as far as possible with the Metropolis-like random-number method of parameter optimization to minimize the R-factor. The result is shown in figure 17: the calculation that includes indirect communication agrees better with the data. The parameters for the best fit are shown in

Relation to the Bass model
The Bass model [21][22] is a well-known model of the spread of WOM. In this subsection, we verify that the Bass model can be derived from the equation of the mathematical model of the hit phenomenon. For the sales of a product, we start from the equation of our mathematical model. The total number of sales of the product, N, can be defined using the number of people, m, and the purchase intention. In the Bass model, WOM spreads from adopters to non-adopters, so the sum is taken only over the persons who have not bought the product. For the direct communication term in equation (31), $\sum_i \sum_j D_{ij} I_i I_j$, we assume that the coefficient $D_{ij}$ is nonzero only for pairs consisting of an adopter and a non-adopter. The purchase intention of the adopter is taken to be $I_j = 0$. With these assumptions, the direct communication term can be expressed in a simpler form. Summing over i, we obtain an equation for the total sales from equation (21), and substituting (34) into (28), we obtain

$$ \frac{dN(t)}{dt} = a\,\bigl(m - N(t)\bigr) + b\,\bigl(m - N(t)\bigr)N(t) + \sum_i\sum_j\sum_k P_{ijk}\, I_j(t)\, I_k(t). \qquad (36) $$

If we neglect the indirect communication term by setting $P_{ijk} = 0$, we obtain the well-known Bass model equation

$$ \frac{dN(t)}{dt} = \bigl(a + b\,N(t)\bigr)\bigl(m - N(t)\bigr). $$

Therefore, our mathematical model for the hit phenomenon includes the Bass model as a special case and adds the previously neglected indirect communication; the indirect communication term in equation (21) is the new WOM effect.
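To make the correspondence with the usual parameterization explicit, assume the common textbook form of the Bass model, $dR/dt = (p + qR/M)(M - R)$, with market size M. Identifying R(t) with the sales N(t) and M with the number of people m, the reduced equation obtained with $P_{ijk} = 0$ matches it term by term:

$$
\frac{dN(t)}{dt} = a\bigl(m - N(t)\bigr) + b\bigl(m - N(t)\bigr)N(t)
= \Bigl(a + b\,m\,\frac{N(t)}{m}\Bigr)\bigl(m - N(t)\bigr)
\quad\Longrightarrow\quad p = a, \qquad q = b\,m .
$$

In this reading, a plays the role of the Bass innovation (advertisement) coefficient and bm that of the imitation (WOM) coefficient; the three-body term in equation (36) has no counterpart in the Bass model.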

Statistical physics method for human interactions
In this paper, we calculate the time variation of the averaged behavior of people in real society using our theory. Equation (21), or equations (28) and (29), appear to describe the collective behavior of people with respect to entertainment well. As in the usual physics approach, we calculate predictions and compare them with observations. Thus, if a large quantity of observation data from real society is available, the usual methods of statistical physics or many-body physics can be used to investigate human interaction in real society. For this purpose, of course, the human interaction in each individual problem should be investigated carefully in collaboration with social scientists specializing in communication.

Conclusion
We have presented a mathematical model of the hit phenomenon as an equation of consumer behavior that takes consumer-consumer communication into account, including both direct and indirect communication. We found the daily number of blog posts to be very similar to the revenue of the corresponding movie, so that it can be used as quasi-revenue. The results calculated with the model reproduce the revenue of the corresponding movie very well, and our calculations indicate that indirect communication affects revenue. We expect the conclusions presented in this paper to be applicable to other consumer markets.