Software Size Estimation Using Activity Point

Software size is widely recognized as an important parameter for effort and cost estimation. Many methods currently exist for measuring software size, including Source Lines of Code (SLOC), Function Points (FP), Netherlands Software Metrics Users Association (NESMA), Common Software Measurement International Consortium (COSMIC), and Use Case Points (UCP). SLOC is counted physically after the software has been developed. The other methods compute size from functional, technical, and/or environmental aspects at an early phase of software development. In this research, the activity point approach is proposed as another software size estimation method. Activity point is computed from the activity diagram and adjusted with technical complexity factors (TCF), environment complexity factors (ECF), and people risk factors (PRF). An evaluation of the approach is presented.


External Interface Files (EIF), External Inputs (EI), External Outputs (EO), and External Inquiry (EQ).
The counted elements are multiplied by their numerical ratings and summed to give the Unadjusted Function Point (UFP) count. UFP is then adjusted by the Technical Complexity Factor (TCF) to obtain the size in function point units.
The CFP approach presented in [5,6] is applied to business application, real-time, and industrial software. It is inspired by FPA and focuses mainly on data movements, which are classified into four types: entry, exit, read, and write. Software size is represented by the total number of data movements. G. Karner [7] proposes a model named Use Case Point (UCP) that measures software size from the use case diagram. The UCP approach emphasizes user interaction and the transactions of each use case. UCP consists of two components: unadjusted actor point (UAP) and unadjusted use case weight (UUCW). The components are summed and adjusted with the Technical Complexity Factor (TCF) and the Environment Complexity Factor (ECF).
A. Sellami et al. [8] use a set of procedures for sizing sequence diagrams at a high level of granularity using the COSMIC rules, and for measuring the structural size of sequence diagrams at a finer level of granularity. The approach uses three interaction operators in the sequence diagram, including Alternative (alt), which represents optional behavior that will be compiled in the system, and Optional (op), which represents a behavioral choice that is selected according to a condition.
G. Costagliola et al. [9] present a sizing approach for object-oriented products based on the class diagram, called Class Point (CP). The approach has three steps. First, classes are identified and classified into four types: problem domain, human interaction, data management, and task management. Second, a complexity level and a weight are assigned to each class. Finally, the class point is determined by adjusting the weighted sum from the previous step.
L. Lavazza et al. [10] improve the UCP method by using Path (P) and Transaction (T), both taken from the use case description, to build a measurement model by linear regression. G. Robiolo et al. [11] offer an alternative: an effort model built by linear regression with use case transactions and scenarios as the main factors.
H. Madhavji et al. [12] present people-related quality factors that affect software architecture, including procrastination, missing or late meetings, team member(s) not delivering sufficient work, poor planning, and group strategy.

Activity and use case diagram in two perspectives
The approach divides activities into two perspectives: high level and low level. The high-level activity diagram presents the relationships between use cases that directly affect software size, such as "glue code", and includes activities or incidents. The low-level activity diagram, on the other hand, describes transactions and scenarios, which are computed separately and weighted with an ideal number, and also includes activities or incidents. Fig. 1 shows the activity perspectives derived from a use case diagram. The transaction perspective and the use case relation perspective are presented at different levels of abstraction as follows.
- Fig. 1.a and 1.b show low-level activity diagrams that separately describe the use case descriptions of UC1 and UC2. An Activity/Action represents a transaction, and a Path represents a scenario of the use case.
- Fig. 1.c shows a high-level activity diagram that describes the relations between use cases. An Activity/Action represents each use case in the use case diagram.

Activity point approach
In this section, the Activity Point approach is proposed to estimate software size at an early phase.

Total activity point
TAP is a factor that represents size at the low level and comprises TAW and TPW. TAW and TPW are each classified into three levels: simple, average, and complex. TAP is obtained by summing TAW and TPW (row 1 in Table 1). The number of transactions in the use case diagram is replaced by the number of activities or incidents, and the number of scenarios is indicated by the number of paths in the activity diagram. TAW is calculated from the Number of Activities or incidents (NA) level multiplied by the weighting number (row 2 in Table 1). The NA classification levels and weight values are presented in Table 2. In the same way, TPW is computed from the Number of Paths (NP) level multiplied by the weighting number (row 3 in Table 1). The NP classification levels and weight values are shown in Table 3.
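As a minimal sketch of the low-level computation, the snippet below classifies a path count into a level and sums TAW and TPW into TAP. The function names are hypothetical. The path thresholds (1 to 3, 4 to 7, 8 or more) are an assumption based on the path-level fragments that survive in the text and should be checked against Table 3. The TAW and TPW values in the usage lines are the ones quoted for Fig. 1.a and 1.b.

```python
def classify_paths(num_paths: int) -> str:
    """Map a Number of Paths (NP) count to a level.

    The thresholds are an assumption taken from the surviving
    fragments of the path classification table.
    """
    if num_paths < 1:
        raise ValueError("an activity diagram needs at least one path")
    if num_paths <= 3:
        return "simple"
    if num_paths <= 7:
        return "average"
    return "complex"


def total_activity_point(taw: float, tpw: float) -> float:
    """TAP is the sum of TAW and TPW (row 1 of Table 1)."""
    return taw + tpw


# TAW and TPW values quoted in the text for the two low-level diagrams of Fig. 1.
tap_1a = total_activity_point(taw=2, tpw=3)
tap_1b = total_activity_point(taw=3, tpw=3)
print(classify_paths(2), tap_1a, tap_1b)
```

TAW and TPW are passed in as already weighted values because the weight tables themselves are not available here.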

Total activity complexity weight
TACW is a factor that represents size at the high level of the activity diagram. The activity diagram is transformed into a Control Flow Graph (CFG), and its complexity is determined using the McCabe complexity approach. The total complexity represents size as Activity Complexity (AC). In addition, if the activity diagram contains Fork and Join notations, the following cases are possible.
- The paths of the fork/join are sequences. The paths are combined into a single path, as shown in Fig. 3, because the result is equal to the McCabe complexity; parallel sequential processing does not increase complexity.
- In contrast, Fig. 4 shows fork/join paths that include a decision or a loop in different paths. The notations are connected into a single path; the order in which they are connected, that is, whether the loop or the decision comes first, is insignificant for calculating complexity.
- Fig. 5 shows a synchronisation link between actions in different paths of a fork/join. In this case the paths should be separated and linked together.
TACW is computed from AC and the weighting number (row 4 in Table 1). AC is classified into three levels, simple, average, and complex, as shown in Table 4.
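The McCabe step can be illustrated with the standard cyclomatic complexity formula V(G) = E - N + 2P. The sketch below is not the authors' tool: the function name and both example graphs are hypothetical, and the fork/join example only illustrates why sequential parallel paths are collapsed before counting.

```python
def cyclomatic_complexity(edges, components=1):
    """McCabe complexity V(G) = E - N + 2P for a control flow graph.

    edges: iterable of (source, target) node pairs.
    components: number of connected components P (1 for a single diagram).
    """
    edge_list = list(edges)
    nodes = {node for edge in edge_list for node in edge}
    return len(edge_list) - len(nodes) + 2 * components


# Hypothetical CFG with two decision nodes ("check" and "retry");
# "retry" can loop back to "check".
cfg = [
    ("start", "check"),
    ("check", "process"),
    ("check", "retry"),
    ("retry", "check"),
    ("process", "end"),
    ("retry", "end"),
]
print(cyclomatic_complexity(cfg))  # 6 edges - 5 nodes + 2 = 3

# A fork with two purely sequential branches, counted as drawn,
# looks like a branch to the formula (4 edges - 4 nodes + 2 = 2).
fork_as_drawn = [("fork", "a"), ("fork", "b"), ("a", "join"), ("b", "join")]

# Collapsing the parallel branches into one path removes that extra unit
# (2 edges - 3 nodes + 2 = 1), consistent with the first case above.
fork_collapsed = [("fork", "a_and_b"), ("a_and_b", "join")]
print(cyclomatic_complexity(fork_as_drawn), cyclomatic_complexity(fork_collapsed))
```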

Unadjusted activity point
UAP is the initial software size, computed during software development from TAP and TACW (row 5 in Table 1). The number of TAP values (N) in the formula equals the total number of use cases at the low level. TACW represents the size at the high level.
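Written out, the relation described here could take the following form. This is a reconstruction from the prose and from the Fig. 1 demonstration, in which the TAP values and TACW are summed; it is not a copy of rows 1 and 5 of Table 1, and the index i over the N low-level use cases is assumed notation.

```latex
\begin{align}
TAP_i &= TAW_i + TPW_i \\
UAP &= \sum_{i=1}^{N} TAP_i + TACW
\end{align}
```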

Activity point
Activity point (AP) is the software development size, computed by multiplying UAP by the adjusting factors (row 6 in Table 1).
The adjusting factors are TCF, ECF, and PRF, shown in Tables 5-7 respectively. Each TCF, ECF, and PRF item is scored from 0 (not related) to 5 (highly related). TCF (row 7 in Table 1) is the adjusting factor for technical considerations. ECF (row 8 in Table 1) is the adjusting factor for environment considerations. TCF and ECF are calculated in the same way as in the UCP approach [7]. People risk factor (PRF) is an additional people factor, proposed at row 9 in Table 1, that is used to find the people-related factors that affect AP. The formula at row 9 in Table 1 is derived from the effort estimated without PRF, that is, with PRF assumed to be 1. The ideal number of PRF is obtained from the multiplier applied to the sum of the personal factor (PF) and the ideal number that gives the lowest average MRE over the 7 training projects, as presented at row 10 in Table 1. PF in Table 7 is inspired by [12]. The highest risk is 5.2, reached when the PF score is 35.
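Because TCF and ECF are said to follow the UCP approach, they can be sketched with Karner's standard formulas, TCF = 0.6 + 0.01 x (weighted technical score) and ECF = 1.4 - 0.03 x (weighted environment score). The weights below are the standard UCP weights for T1 to T13 and E1 to E8. It is an assumption that Tables 5 and 6 use the same values, although they do reproduce the 0.74 and 1.265 quoted in the text when every item is scored 1. Function names are hypothetical, and PRF is left out because its formula is not recoverable from the text.

```python
# Standard UCP weights (Karner); assumed to match Tables 5 and 6.
TECHNICAL_WEIGHTS = [2, 1, 1, 1, 1, 0.5, 0.5, 2, 1, 1, 1, 1, 1]
ENVIRONMENT_WEIGHTS = [1.5, 0.5, 1, 0.5, 1, 2, -1, -1]


def weighted_score(weights, scores):
    """Sum of weight * score, with every score in the range 0..5."""
    if len(weights) != len(scores):
        raise ValueError("one score is required per factor")
    if any(not 0 <= score <= 5 for score in scores):
        raise ValueError("scores must be between 0 and 5")
    return sum(w * s for w, s in zip(weights, scores))


def technical_complexity_factor(scores):
    """TCF = 0.6 + 0.01 * weighted technical score."""
    return 0.6 + 0.01 * weighted_score(TECHNICAL_WEIGHTS, scores)


def environment_complexity_factor(scores):
    """ECF = 1.4 - 0.03 * weighted environment score."""
    return 1.4 - 0.03 * weighted_score(ENVIRONMENT_WEIGHTS, scores)


# Every item scored 1.
tcf = technical_complexity_factor([1] * 13)
ecf = environment_complexity_factor([1] * 8)
print(round(tcf, 3), round(ecf, 3))
```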

Effort
An effort estimation model is provided at row 11 in Table 1. It uses AP as the size measure and is built by linear regression, with actual effort as the predicted variable and AP as the independent variable.

Path level classification (table fragment)
Level    Description                                                   Weight
Simple   Alternative paths in the activity between 1 and 3             3
Average  Alternative paths in the activity between 4 and 7
Complex  Alternative paths in the activity greater than or equal to 8  9

Adjusting factor items (table fragments)
Poor planning and group strategy. 1
R5 Customer Orientation 1.5
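The regression-based effort model described in this section can be sketched as an ordinary least squares fit of effort against AP. The function names are hypothetical and the training pairs are made up for illustration; the coefficients at row 11 of Table 1 are not reproduced.

```python
def fit_effort_model(ap_values, efforts):
    """Least squares fit of effort = intercept + slope * AP."""
    n = len(ap_values)
    if n != len(efforts) or n < 2:
        raise ValueError("need at least two (AP, effort) pairs")
    mean_ap = sum(ap_values) / n
    mean_effort = sum(efforts) / n
    sxx = sum((x - mean_ap) ** 2 for x in ap_values)
    if sxx == 0:
        raise ValueError("AP values must not all be equal")
    sxy = sum((x - mean_ap) * (y - mean_effort)
              for x, y in zip(ap_values, efforts))
    slope = sxy / sxx
    intercept = mean_effort - slope * mean_ap
    return slope, intercept


def estimate_effort(ap, slope, intercept):
    """Predict effort for a project of size AP."""
    return intercept + slope * ap


# Hypothetical training data (AP, effort); not taken from the paper.
training_ap = [10.0, 20.0, 30.0]
training_effort = [25.0, 45.0, 65.0]
slope, intercept = fit_effort_model(training_ap, training_effort)
print(slope, intercept, estimate_effort(39.5, slope, intercept))
```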

Defining weight values
Weight values of the factors in the AP approach are calculated using the algorithm shown in Fig. 6. The multipliers of the NA level, NP level, and AC level in (2)-(4) are derived from training data sets of 7 projects: 4 enterprise software systems developed in Java, 2 mobile applications developed in JavaScript, and 1 e-commerce website developed in PHP. The number of activities/actions, the number of paths, and the complexity are each correlated with actual effort. The correlation indicates which factor has more influence on actual effort. The correlations of activities/actions, paths, and complexity with effort are 0.823, 0.844, and 0.665, respectively. This means that NP has the strongest influence on actual effort. Therefore, actual effort is apportioned as 50 percent to TPW, 40 percent to TAW, and 10 percent to TACW. In Tables 2-4, weight values are between 1 and 20. The NA level, NP level, and AC level are weighted with the ideal number from this range that gives UAP the highest correlation with actual effort. UAP is the size before adjustment and does not consider external adjusting factors. Therefore, the adjusting factors are applied as external factors to turn UAP into the final size in AP units.
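The correlation step can be sketched with a plain Pearson coefficient. The function name is hypothetical and the project data are not available, so the usage lines only rank the three correlations reported in the text.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Correlations with actual effort as reported in the text.
reported = {"NA": 0.823, "NP": 0.844, "AC": 0.665}
ranking = sorted(reported, key=reported.get, reverse=True)
print(ranking)  # ['NP', 'NA', 'AC']
```

The ranking matches the order of the 50, 40, and 10 percent shares given to TPW, TAW, and TACW; how those exact percentages were chosen is not stated.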
...
4 MULTIPLY actual effort with the percentage.
...
7 FIND ideal weight value of each point that gives the closest to actual effort of each project with looping.
8 SET standard weight value by average ideal weight result
Figure 6. Algorithm for defining weight values

Figure 1 is a demonstration diagram that shows activity diagrams at the low level and the high level. At the low level, (1.a) has a simple activity level and (1.b) an average activity level. The path level of both (1.a) and (1.b) is simple. TAW is 2 for (1.a) and 3 for (1.b). TPW is 3 for both (1.a) and (1.b). TAP is therefore 5 for (1.a) and 6 for (1.b). At the high level, the McCabe complexity of (1.c) is 3, which is the average level, so TACW is 14. The TAP values of (1.a) and (1.b) and TACW are summed to give a UAP of 25. Assuming that every adjusting factor item is scored 1, TCF, ECF, and PRF are 0.74, 1.265, and 1.7, respectively. AP is then calculated by multiplying UAP by the adjusting factors, which gives 39.50; this value is used as the input for estimating software development effort.
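The Fig. 1 demonstration can be replayed in a few lines. All numbers are the ones quoted in the text. Treating AP as the plain product of UAP and the three adjusting factors is an assumption, since row 6 of Table 1 is not available.

```python
# Values quoted in the text for Fig. 1.
tap_values = [5, 6]            # TAP of (1.a) and (1.b)
tacw = 14                      # high-level weight for (1.c)
tcf, ecf, prf = 0.74, 1.265, 1.7

# UAP is the sum of the low-level TAP values and the high-level TACW.
uap = sum(tap_values) + tacw

# Assumption: AP is the plain product of UAP and the adjusting factors.
ap = uap * tcf * ecf * prf

print(uap)           # 25
print(round(ap, 2))  # 39.78
```

The plain product comes out at about 39.78 rather than the reported 39.50. The small gap may come from rounding of intermediate values or from a detail of the formula that is not visible here.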

Evaluation
The International Conference on Information Technology and Digital Applications, IOP Publishing, IOP Conf. Series: Materials Science and Engineering 185 (2017) 012013, doi:10.1088/1757-899X/185/1/012013
Three projects are used as test data sets to evaluate the AP model: 2 enterprise software systems developed in Java and 1 mobile application developed in JavaScript. The average Magnitude of Relative Error (MRE) and Prediction Quality (PRED) are presented as quality measures of the predicted effort. Table 8 shows the number of paths of each project at the low level. Table 9 presents the number of activities/actions of each project at the low level. Table 10 provides the activity complexity of each project at the high level. Activity point, estimated effort, and actual effort are presented in Table 11, with an average MRE of 0.09. The average MRE is calculated from the absolute relative error between the actual effort and the estimated effort of each project. PRED(0.25) is 1, which means that 100% of the projects have an MRE lower than 25%.
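The two quality measures used in this evaluation can be sketched as follows. The function names are hypothetical and the effort pairs are made up, since Table 11 is not reproduced. The sketch uses the common definition of PRED(l), which counts projects with MRE less than or equal to l.

```python
def mre(actual, estimated):
    """Magnitude of relative error: |actual - estimated| / actual."""
    if actual <= 0:
        raise ValueError("actual effort must be positive")
    return abs(actual - estimated) / actual


def mean_mre(pairs):
    """Average MRE over (actual, estimated) pairs."""
    values = [mre(actual, estimated) for actual, estimated in pairs]
    return sum(values) / len(values)


def pred(pairs, level=0.25):
    """Fraction of projects whose MRE is at most `level`."""
    hits = sum(1 for actual, estimated in pairs
               if mre(actual, estimated) <= level)
    return hits / len(pairs)


# Hypothetical (actual, estimated) effort pairs; not the paper's data.
projects = [(100.0, 90.0), (200.0, 260.0), (150.0, 150.0)]
print(round(mean_mre(projects), 3), round(pred(projects), 3))
```

With these made-up pairs, one of the three projects misses the 25% threshold, so PRED(0.25) falls below 1, unlike the result reported for the three test projects.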

Conclusion
Software size is a factor used by many effort models. This research estimates effort using a software size estimation approach based on the activity diagram, named Activity Point. The approach builds on metrics used in the use case point approach and proposes additional metrics. The additional metrics focus on use case relations, expressed as activities or incidents, paths, and complexity in the activity diagram. Unadjusted activity point is the size before adjustment. A new adjusting factor named people risk factor is proposed. The Technical Complexity Factor (TCF) and Environment Complexity Factor (ECF) are the same as those in the use case point approach. The evaluation is presented with the average MRE and PRED(0.25). The average MRE, calculated from the absolute relative error between the actual and estimated effort of three projects, is 0.09. PRED(0.25) is 1.0, which means that 100% of the projects have an MRE lower than 25%. In future work, the researchers will improve the model by working with more data sets, more suitable weight values, and refined scoring criteria for the adjusting factors. A comparison with other size estimation approaches may be investigated. An automated tool supporting the proposed approach will be developed.