Machine learning for structural design models of continuous beam systems via influence zones

This work develops a machine learned structural design model for continuous beam systems from the inverse problem perspective. After demarcating between forward, optimisation and inverse machine learned operators, the investigation proposes a novel methodology based on the recently developed influence zone concept, which represents a fundamental shift in approach compared to traditional structural design methods. The aim of this approach is to conceptualise a non-iterative structural design model that predicts cross-section requirements for continuous beam systems of arbitrary system size. After generating a dataset of known solutions, an appropriate neural network architecture is identified, trained, and tested against unseen data. The results show a mean absolute percentage testing error of 1.6% for cross-section property predictions, along with a good ability of the neural network to generalise to structural systems of variable size. The CBeamXP dataset generated in this work and an associated Python-based neural network training script are available in an open-source data repository to allow for the reproducibility of results and to encourage further investigations.


Introduction
It was recently argued that structural design is an inverse problem [1], in which one estimates the model parameters (the causal factors) of possible structural solutions from a set of structural utilisations (the observations). This inverse problem perspective, highlighted in Figure 1, is underscored by the ill-posed characteristics that structural design shares with other inverse problems [2]. In civil and structural engineering, such problems include subject areas such as structural health monitoring [3,4], self-sensing smart materials [5,6] and forensic blast engineering [7,8].

[Figure 1 panels: known priors (design brief: loads and spans); observations (utilisation ratios); model parameters; legend: data and operations]
Figure 1: The inverse problem perspective for structural design. It relies on known priors, such as design brief details of loading and span requirements, along with observations of utilisation ratios that represent structural adequacy, to evaluate the model parameters of a solution, such as the size, shape and topology of a viable structure. Structural analysis is treated as the forward problem.
Inverse problems are predominantly solved iteratively [9], and, unsurprisingly, so is structural design [10]. This is often done with the help of structural optimisation, such as size [11,12], shape [13,14], topology [15,16] and layout optimisation [17,18]. Provided that a clear objective function exists, these techniques are the state of the art for solving the structural design inverse problem iteratively.
However, the uptake of iteration-based design approaches in industry faces certain barriers:
- high computational costs [19];
- complex outputs that require additional post-rationalisation [20];
- a demand for particular expertise from practising design engineers that can be absent from engineering curricula [21].

These challenges have encouraged researchers to investigate the use of machine learning (ML) methodologies for structural design [22]. This parallels a similar development in the wider domain of inverse problems [23], where learned components aid or replace the optimisation problem, with exemplary applications in areas such as structural health monitoring [24,25].
The earliest application of such machine learned components for structural design occurred in 1989 [26] with simplified perceptron models. This research was followed in the 1990s by more advanced feed-forward neural networks for simple reinforced concrete beam depth estimations [27] as well as cross-sectional area predictions of trusses [28,29]. Whilst other machine learning modalities such as support vector machines [30] have also been studied, neural networks tend to outperform other ML model archetypes in terms of prediction error [31].
More recently, deep learning techniques have been investigated for structural design. These include convolutional [32,33] and generative adversarial networks [34,35] to accelerate topology optimisation, and variational autoencoders for structural design space exploration [36]. A common limitation across such investigations is the inability of the same machine learned model to generalise to differently sized topologies and structural arrangements. Two challenges have motivated the work presented here: the ill-posedness of design, and the inability of previous machine learning models to generalise to structural arrangements of arbitrary size.
This investigation has two objectives. The first objective is to reconcile the relationship between structural design, inverse problems and machine learning by developing a non-iterative structural design model for continuous beam systems using a multilayer neural network. The authors believe that this perspective could serve as a framework for distinguishing between different types of machine learning applications within structural engineering in the future. The second objective is to address the inherent issue of generalisability with respect to system size. To do so, the investigation takes advantage of a recently developed concept known as a continuous beam's influence zone [37]. This technique could potentially form the basis for a generalisable design model for continuous structural systems of arbitrary topology. It might also complement other techniques that address the generalisability issue, such as graph neural networks [38,39].
The paper is structured as follows:
- Section 2 explores the problem statement from the inverse problem perspective and provides the rationale for machine learned design models.
- Section 3 explains the methodology employed to develop the generalisable structural design model.
- Section 4 presents the step-by-step neural network development process.
- Section 5 discusses the model's generalisability and prediction variability, along with suggestions for further research.

Problem statement
A novel perspective
The inverse problem perspective for structural design, as shown in Figure 1, consists of two operations and three sets of data. The operations are the forward and the inverse problem, shown as the bottom and top ellipses, respectively. The data sets are observations, known priors and causal factors, shown as rectangles from left to right, respectively. One of the underpinning features of the inverse problem perspective is the clear demarcation between structural analysis and structural design. This distinction is often reiterated in engineering philosophy [40,41], yet it has never been linked to the corresponding nature of forward and inverse problems, respectively.
Both the forward problem (structural analysis) and the inverse problem (structural design) rely on known priors, shown centrally in Figure 1. These can be thought of as constraints set by a design brief, such as load and span requirements. During design, they inform and regularise the search space of causal factors (model parameters such as section properties and topologies). During analysis, they allow the evaluation of observations (utilisation ratios such as those for the ultimate (ULS) and serviceability limit states (SLS) [42]). Unlike in traditional inverse problems, the observations are not measured physically. Instead, they are expressed theoretically as the utilisation ratios that would be measured on a compliant design solution described by the corresponding set of causal factors; inverse problems are not defined by the physicality of their observations. Within this context, the application of machine learning in structural engineering can be split into three categories, based on the type of operation the machine learned components replace. These categories help distinguish between fundamentally different types of machine learning applications within structural engineering, and they are identifiable across different decades of the literature:
a) ML forward operators: machine learned components that aid or accelerate solving the forward problem (structural analysis) to inform or validate design decisions. Examples include neural-network-like models used as quick re-analysis tools for optimum design (1991) [43], and machine learning models that determine the buckling behaviour and modal decomposition of thin-walled members required for structural analysis (2023) [44].
b) ML optimisation solvers: machine learned components entirely motivated by the traditional iterative solution process used to arrive at structural designs. Examples include "neural dynamic models" developed as an alternative structural design optimisation technique (1995) [45], and a physics-informed neural energy-force network that replaces both the structural design and analysis steps (2023) [46].
c) ML inverse operators: machine learned components that solve the inverse problem (structural design) by mapping a set of structural utilisations and known priors directly to model parameters. Examples include estimating cross-sectional properties for simple trusses directly from known optimum examples using neural networks (1994) [29], and approximating topologically optimised structures in real time using convolutional neural networks (2022) [47].
These three categories can also be differentiated visually, as shown in Figure 2. ML forward operators have likely received the most research attention, in the form of "surrogate models" [48,49]; machine learned optimisation solvers and inverse operators are less common. Furthermore, the machine learned forward operators and optimisation solvers identified above typically require some form of iteration to achieve a structural design. Machine learned inverse operators, on the other hand, can be non-iterative [29,47]. This ability to provide real-time design feedback is of particular interest for addressing the limitations of current iterative structural design approaches. To this end, and in support of the inverse problem perspective, this paper focuses on developing a non-iterative structural design model for continuous beam systems.

Design problem: continuous beam systems
Continuous beam systems arise in structural engineering when rigid connections between members are required, or are unavoidable due to design or material considerations. The support fixity and structural connectivity render the system statically indeterminate. This poses a challenge from a design perspective. The internal forces of an indeterminate system depend on the relative stiffness of its members, so the compliance of the cross-sectional properties cannot be evaluated without knowing their magnitudes. The result is an iterative design process, especially for complex design scenarios with heterogeneous loading and span conditions [50].
Figure 3 highlights the design problem for continuous beam systems from the inverse problem perspective. The known priors are shown centrally as the design brief. They comprise:
- the number of members $m$ in the system, indexed by $i$;
- the span lengths $L_i$ from vector $L = [L_i]_{0 \le i < m}$;
- the uniformly distributed loads (UDLs) $\omega_i$ from vector $\omega = [\omega_i]_{0 \le i < m}$.

These known priors and the utilisation ratios $u$ of the members, shown on the left in Figure 3, are needed to evaluate the causal factors. These are shown on the right as the cross-section property vector $P = [P_i]_{0 \le i < m}$.
The design problem is complicated by the existence of $c$ potentially critical load arrangements $J$, indexed by $j$ from set $J = [J_j]_{0 \le j < c}$ and shown at the bottom of Figure 3. The size $c$ of $J$ was studied in [37]. Each load arrangement causes different structural responses, such as bending moments $M$. Together, the arrangements give rise to utilisation ratios $u_{i,j}$ that form the matrix $u = [u_{i,j}]_{0 \le i < m,\, 0 \le j < c}$. This matrix can be evaluated with structural analysis to check for structural compliance ($u_{i,j} \le 1.0$). The traditional approach repeatedly assumes cross-section properties $P$ and conducts structural analysis calculations until the matrix of utilisation ratios $u$ is compliant. A machine learned inverse operator instead relies solely on the known priors and the utilisation ratios $u$ to evaluate the cross-section properties $P$ directly.
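The number of candidate load arrangements grows quickly with $m$. As an illustration only, the sketch below enumerates every on/off pattern of the variable UDL over the spans, which gives a brute-force upper bound of $2^m$ arrangements. It assumes a simple pattern-loading model and is not the reduced set $J$ studied in [37]; the function name is hypothetical.

```python
from itertools import product


def load_arrangements(m):
    """Enumerate all on/off patterns of the variable UDL over m spans.

    Each arrangement is a tuple of m flags: 1 if span i carries the variable
    load in addition to the permanent load, 0 if it carries the permanent load only.
    """
    return list(product((0, 1), repeat=m))


arrangements = load_arrangements(3)
print(len(arrangements))  # 8 = 2**3 candidate arrangements for m = 3
```

Only a subset of these patterns governs any given member, which is why the size $c$ of the critical set is of interest.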

[Figure 3 panels: structural design (estimate/update cross-section properties $P$); design brief with $m = 3$ beams with varying $P_i$ under variable UDLs $w_i$; structural analysis (evaluate utilisation ratios for various load arrangements $J$, with bending moments $M_j$)]

For the continuous beam system considered in this work, several assumptions are made:
- members are made of S355 steel;
- members are considered laterally restrained, and hence not susceptible to lateral instability;
- Timoshenko-Ehrenfest beam theory is used to model the system;
- the structure is analysed elastically yet designed against ULS plastic cross-section checks, as allowed by EN 1993-1-1 5.4.2(2) [51].

The cross-sectional properties to be evaluated for each member $i$ are the major axis second moment of area $I$, the major axis shear area $A_z$ and the major axis plastic section modulus $W_{pl}$. Together they form the member-based cross-sectional property vector $P_i$:

$$P_i = [I_i, A_{z,i}, W_{pl,i}] \tag{1}$$

The structural analysis operation in Figure 3 is defined by a forward operator:

$$u = O_{forw}(P; m, \omega, L) \tag{2}$$

and similarly the structural design operation by an inverse operator:

$$P = O_{inv}(u; m, \omega, L) \tag{3}$$

Both $O_{forw}$ and $O_{inv}$ rely on the same known priors: the design brief information $m$, $\omega$ and $L$ that defines the structural system and design problem.

The need for machine learned inverse operators
Defining an explicit non-iterative inverse operator for Equation 3 is challenging because the forward operator is difficult to invert. This difficulty is directly linked to the ill-posed nature common to most inverse problems [2]. A quantitative evaluation of the extent of ill-posedness in structural design is not straightforward. It is, however, possible to describe why the structural design problem shown in Figure 3 is ill-posed: there is an infinite number of viable solutions, and in addition:
a) The physical limitations introduced by yielding, buckling and serviceability arise when the forward model is combined with structural codes. These result in a discontinuous relationship between the observations and the causal factors.
b) The continuous beam system is indeterminate, and its indeterminacy increases with the number of members in the system.
Furthermore, structural analysis forward operators are themselves approximations of the true behaviour of structures, and dealing with the associated uncertainty is a key challenge in design. For example, engineers need to decide whether the assumptions and simplifications of structural analysis models are representative of the structure's true behaviour. Such assumptions include the material response (e.g. perfectly elastic) and the underlying beam theory (e.g. Euler-Bernoulli theory).
The difficulty of inverting a forward operator can be shown mathematically. Typically, the $O_{forw}$ operator contains two steps:
1. $O_{forw,1}$ evaluates the structural response of the system subjected to a set of external forces $\omega$, in terms of deflections and internal forces.
2. $O_{forw,2}$ takes these structural responses and evaluates the utilisation ratios based on design codes.

Consider, for example, building an $O_{forw,1}$ operator using the stiffness matrix method to evaluate the internal force vector $[f_p]_i$ for member $i$, defined as:

$$[f_p]_i = [V_{i,1}, M_{i,1}, V_{i,2}, M_{i,2}]^T \tag{4}$$

where $V$ and $M$ represent the internal shear forces and bending moments within member $i$ at the start (index 1) and end (index 2) of the member. Let us also assume, for simplicity, that the members consist of steel with Young's modulus $E$, shear modulus $G$ and yield stress $\sigma_y$. In this case, the internal forces $[f_p]_i$ for each member could be evaluated using a simplified Timoshenko-Ehrenfest beam theory for a single load arrangement by Equation 5.
To achieve this, three quantities are defined, where the rows and columns of all matrices are indexed by $p$ and $q$, respectively:
- $[k_{pq}]_i$, the local stiffness matrix shown in Equation 6;
- $[K_{pq}]$, the global stiffness matrix in Equation 7;
- $[d_q]$, the nodal displacement vector in Equation 8, with $F_p([\omega_i])$ as the external force vector.

These operations can be written succinctly as a transformation of the cross-section vector $P$, with the help of the known priors $m$, $\omega$ and $L$, into the internal force vector for each member $i$:

$$[f_p]_i = O_{forw,1}(P; m, \omega, L)_i \tag{9}$$

Inverting this equation to yield $O_{inv}$ is difficult, since it would require separating or decomposing the individual cross-section properties $P_i$ out of the stiffness matrices $[k_{pq}]_i$. This cannot be done without, at a minimum, making some assumptions about the relative proportions of the cross-section properties from one member to another.

Inverting the second step of the forward operator, $O_{forw,2}$, poses further challenges. Suppose $O_{forw,2}$ transforms the internal member forces $[f_p]_i$ into the governing (critical) utilisation ratio over $t$ design checks, indexed by $r$, for a single load arrangement $J$. For example, using the steel design code EN 1993-1-1 [51]:

$$u_i = \max_{0 \le r < t} u_{i,r}([f_p]_i, P_i, \sigma_y) \tag{10}$$

where $u_{i,r}$ is the utilisation ratio of member $i$ for design check $r$. Two examples are the plastic bending check $M_{Ed}/M_{pl,Rd}$ with $M_{pl,Rd} = W_{pl}\sigma_y/\gamma_{M0}$, and the plastic shear check $V_{Ed}/V_{pl,Rd}$ with $V_{pl,Rd} = A_z(\sigma_y/\sqrt{3})/\gamma_{M0}$. The difficulty here is that the governing design check can change according to the known priors of the problem statement. An individual equation would therefore need to be derived for each possible critical design check. For example, different design equations exist for the same structural check depending on the cross-section class (e.g. Class 1 vs. Class 4) [51] that a final design solution might contain, which is not known ahead of time. Note also that the equations above do not even consider the serviceability limit state, the multiple load arrangements $J$ that may be critical, or the need to sufficiently discretise individual beam members.
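The two steps can be made concrete with a Python sketch. It assembles a stiffness-method operator in the role of $O_{forw,1}$ for a continuous beam, using the standard two-node Timoshenko beam element. It then evaluates a simplified $O_{forw,2}$ from the plastic bending and shear resistances of EN 1993-1-1 6.2.5 and 6.2.6. This is an illustration under stated assumptions, not the authors' implementation:
- every node is a simple support;
- only a single load arrangement is analysed;
- the member shear area $A_z$ is used directly as the Timoshenko shear area;
- $\gamma_{M0} = 1.0$;
- SI units are used throughout.

All function names are hypothetical.

```python
import numpy as np


def timoshenko_stiffness(E, G, I, A_z, L):
    """Local stiffness matrix for DOFs [v1, theta1, v2, theta2] (up, anticlockwise)."""
    phi = 12.0 * E * I / (G * A_z * L**2)  # shear deformation parameter
    c = E * I / (L**3 * (1.0 + phi))
    return c * np.array([
        [12.0, 6.0 * L, -12.0, 6.0 * L],
        [6.0 * L, (4.0 + phi) * L**2, -6.0 * L, (2.0 - phi) * L**2],
        [-12.0, -6.0 * L, 12.0, -6.0 * L],
        [6.0 * L, (2.0 - phi) * L**2, -6.0 * L, (4.0 + phi) * L**2],
    ])


def fixed_end_forces(w, L):
    """End forces [F1, M1, F2, M2] of a fully fixed span under a downward UDL w."""
    return np.array([w * L / 2.0, w * L**2 / 12.0, w * L / 2.0, -w * L**2 / 12.0])


def analyse(L, w, I, A_z, E=210e9, G=81e9):
    """Member end forces [F1, M1, F2, M2] of a continuous beam pinned at every node."""
    m = len(L)
    n_dof = 2 * (m + 1)  # [v0, theta0, v1, theta1, ...]
    K, F = np.zeros((n_dof, n_dof)), np.zeros(n_dof)
    members = []
    for i in range(m):
        k = timoshenko_stiffness(E, G, I[i], A_z[i], L[i])
        dofs = [2 * i, 2 * i + 1, 2 * i + 2, 2 * i + 3]
        K[np.ix_(dofs, dofs)] += k                # assemble global stiffness
        F[dofs] -= fixed_end_forces(w[i], L[i])   # equivalent nodal loads
        members.append((k, dofs))
    free = np.arange(1, n_dof, 2)  # supports restrain v, so only rotations are free
    d = np.zeros(n_dof)
    d[free] = np.linalg.solve(K[np.ix_(free, free)], F[free])
    return [k @ d[dofs] + fixed_end_forces(w[i], L[i])
            for i, (k, dofs) in enumerate(members)]


def uls_utilisation(f, w, L, W_pl, A_z, f_y=355e6, gamma_M0=1.0):
    """Governing plastic bending or shear utilisation of one member."""
    F1, M1 = f[0], f[1]
    # Sagging-positive internal moment: M(x) = -M1 + F1 * x - w * x**2 / 2
    xs = [0.0, L] + ([F1 / w] if w > 0 and 0.0 < F1 / w < L else [])
    M_Ed = max(abs(-M1 + F1 * x - w * x**2 / 2.0) for x in xs)
    V_Ed = max(abs(F1), abs(F1 - w * L))
    M_pl_Rd = W_pl * f_y / gamma_M0
    V_pl_Rd = A_z * (f_y / np.sqrt(3.0)) / gamma_M0
    return max(M_Ed / M_pl_Rd, V_Ed / V_pl_Rd)
```

Consider two equal spans. The hogging moment over the central support reduces to $wL^2/(2(4 + \Phi))$, where $\Phi$ is the shear deformation parameter. This recovers the Euler-Bernoulli value $wL^2/8$ as $\Phi \to 0$. The sketch also shows why inversion is hard: each $P_i$ enters the response through $\Phi$ and through the stiffness coefficients of every member at once.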
The challenges identified above make machine learned inverse operators particularly appealing, since they can approximate a relationship between variables that may be difficult to encode explicitly [52]. Consider a dataset generated by the $O_{forw}$ operator, which maps a set of cross-section properties $P$ to compliant utilisation ratios $u$. From it, one can train a probabilistic machine learning model $O^\dagger_{inv}$ with parameters $\theta$. The model maps the set of bounded utilisation ratios $u$ back to the cross-sectional properties $P$, given known priors $m$, $\omega$ and $L$:

$$P \approx O^\dagger_{inv}(u; m, \omega, L, \theta) \tag{11}$$

A dataset of valid structural designs can be generated with the help of existing optimisation approaches that contain the forward operator. A supervised machine learning model can then be trained on it to learn the mapping from known priors and utilisations directly to cross-sectional properties. This represents a fundamental shift from traditional approaches in structural design, which rely on engineering expertise and computationally expensive structural analysis or optimisation models at the point of design application. Machine learned inverse operators create non-iterative structural design models for which no explicitly defined equivalents currently exist. Instead of focusing on accelerating forward models, computational resources can be invested in generating a dataset using physically complex yet realistic modelling assumptions. These machine learned structural design models aim to:
- provide significantly greater generalisability than the typical rules of thumb employed in design, whilst still giving real-time feedback;
- benefit non-expert stakeholders whose own decision making relies on structural design outcomes;
- improve the permanence of design knowledge, which can be difficult to attain due to industry turnover.

Choosing an appropriate machine learning model archetype
The aim of the inverse operator $O^\dagger_{inv}$ is to predict the cross-section property vector defined by Equation 1 numerically; therefore, $O^\dagger_{inv}$ is a regression model. This restricts the types of supervised machine learning models of interest. The complexity and size of the design space are likely to demand a large dataset. This discourages the use of instance-based models, such as the k-nearest neighbour algorithm, which keep the training instances in memory and predict from similarity measurements to them [53]. Similarly, support and relevance vector machines become impractical for datasets containing more than 3000 samples [54]. The non-linearity of the design problem rules out linear regression models, and decision trees (including ensemble variants such as random forests) perform better at classification tasks [55].
These reasons motivated the use of neural networks, in particular multilayer neural networks (MLPs). The choice is supported by evidence suggesting that neural networks outperform other data-driven approximation algorithms in structural engineering applications [31]. Various archetypes exist, including convolutional (CNNs), recurrent (RNNs) and graph-based (GNNs) neural networks. MLPs, however, are commonly used in the literature [28,29], and the results of this work could serve as a comparative performance benchmark for more advanced deep learning architectures [32,34,36] in future studies.
Multilayer neural networks have a fixed-dimensional input vector $x_0 \in \mathbb{R}^n$ that maps to an output vector $x_D$. Multiple hidden layers give form to the neural network $f$ through a function composition defined as:

$$x_D = f(x_0) = (f_D \circ f_{D-1} \circ \cdots \circ f_1)(x_0), \qquad x_d = f_d(x_{d-1}) = a_d(W_d x_{d-1} + b_d) \tag{12}$$

The weights $W_d$ and biases $b_d$ of all layers form the model parameters $\theta$. The exact choice of architecture in terms of depth $D$, height $H$ and activation functions $a_d$, as indicated in Equation 12, will require experimentation to achieve acceptable performance with a good bias-variance trade-off [56]. More importantly, the features used for the input vector $x_0$ will require careful consideration to create a generalisable inverse operator $O^\dagger_{inv}$, as set out in Equation 11.
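Equation 12 can be read as a loop over layers. The NumPy sketch below makes several illustrative assumptions, none of which is the architecture selected in this work:
- each layer is the usual affine-plus-activation map $f_d(x) = a_d(W_d x + b_d)$;
- the output layer is counted in the depth $D$;
- ReLU is used for the hidden layers and a sigmoid for the output layer;
- weights use He-style random initialisation.

The function names are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_mlp(n_in, n_out, H, D, seed=0):
    """D layers in total: D - 1 hidden layers of height H, then the output layer."""
    rng = np.random.default_rng(seed)
    sizes = [n_in] + [H] * (D - 1) + [n_out]
    # He-style initialisation: W_d has shape (outputs, inputs), b_d starts at zero
    return [(rng.normal(0.0, np.sqrt(2.0 / n_prev), size=(n_next, n_prev)), np.zeros(n_next))
            for n_prev, n_next in zip(sizes[:-1], sizes[1:])]


def forward(params, x0, a_in=relu, a_out=sigmoid):
    """x_D = f_D(...f_1(x0)), with f_d(x) = a_d(W_d x + b_d)."""
    x = x0
    for d, (W, b) in enumerate(params, start=1):
        a = a_out if d == len(params) else a_in
        x = a(W @ x + b)
    return x


k_max = 2                # hypothetical influence zone
n = 4 * k_max + 2        # spans and UDLs in the zone
params = init_mlp(n, 3, H=64, D=4)       # 3 outputs: I, A_z, W_pl (scaled)
x_D = forward(params, np.full(n, 0.5))   # scaled inputs in [0, 1]
print(x_D.shape)  # (3,)
```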

Selecting appropriate neural network features
Feature selection, the process of choosing appropriate inputs, is essential for a machine learning model to generalise well to unseen data points. Unnecessary or irrelevant features can cause a model to learn a relationship with the target variables that is not representative of the physical behaviour of the system. This leads to worse results when interpolating within, or extrapolating beyond, the training set.
Previous studies of neural-network-based design models selected features relevant to the single topology of the structural system at hand [29,31]. Such approaches expose the largest limitation of multilayer neural networks: the fixed dimensionality of the input vector [38]. These models may perform well for the particular topology they were trained on. The same model, however, tends to perform worse, or may not be applicable at all, for differently sized structural systems, which severely limits its utility.
To address this limitation, this work takes advantage of a recently developed concept known as the influence zone [37]. The influence zone $k_{max}$ measures the extent to which surrounding design information is relevant for evaluating the utilisation of a member. The value of $k_{max}$ differs for each member within a continuous structural system, as shown in Figure 4. For well-defined design constraints and error thresholds, however, the maximum value of $k_{max}$ within continuous beam systems converges towards a non-negative integer. The influence zone of member $g$ is found when the two conditions of Equation 13 are met.

Figure 4: A figurative influence zone of $k_{max} = 2$ for design beam $g = 3$ within an $m = 7$ continuous beam system with an $\epsilon_{max} = 0.02$ limit.
In Equation 13, $\epsilon_{max}$ is the maximum error threshold on the difference between two utilisation ratios of design beam $g$:
- $u_{g,cap}$, the captured utilisation ratio for a given value of $k_{max}$;
- $u_{g,true}$, the true utilisation ratio if the contributions of all members of the continuous beam system had been considered.

$u_{g,i,j}$ is the utilisation ratio contribution function towards design beam $g$ by member $i$, based on the UDLs $\omega$, spans $L$, structural properties $P$ and load arrangements $J$. If the requirement for $\epsilon_{max}$ is sufficiently relaxed, the maximum influence zone $k_{max}$ can be determined for any potential continuous beam system arising under the specified design constraints [37]. This ensures that the relevant inputs are fed to a machine learning model. The influence zone thereby acts as a mechanics-driven feature selection process and provides the basis for generalising to a continuous beam system of arbitrary size $m$.
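The sketch below shows how $k_{max}$ might be found for one design beam under a simplified reading of the two conditions. It assumes the governing utilisation is the maximum over load arrangements of the summed contributions $u_{g,i,j}$. It also assumes $k_{max}$ is the smallest zone half-width whose captured utilisation lies within a relative error $\epsilon_{max}$ of the true value. The contribution array `u_contrib`, this criterion and the function name are all illustrative assumptions; the exact definition is given in [37].

```python
import numpy as np


def influence_zone(u_contrib, g, eps_max):
    """Smallest k whose zone [g - k, g + k] captures u_true within eps_max.

    u_contrib[i, j]: hypothetical contribution of member i to the utilisation
    of design beam g under load arrangement j.
    """
    m = u_contrib.shape[0]
    u_true = np.max(u_contrib.sum(axis=0))  # all members, governing arrangement
    for k in range(m):
        lo, hi = max(0, g - k), min(m, g + k + 1)
        u_cap = np.max(u_contrib[lo:hi].sum(axis=0))  # members inside the zone only
        if abs(u_true - u_cap) / u_true <= eps_max:
            return k
    return m - 1  # the zone then covers the whole system


# Synthetic contributions decaying with distance from design beam g = 3 (m = 7)
u = np.array([[0.5 * 0.1 ** abs(i - 3)] * 2 for i in range(7)])
print(influence_zone(u, g=3, eps_max=0.02))  # 1
```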

Structuring features for arbitrary system size m
Zero-padding, the process of adding zero-valued inputs, arises in the context of convolutional neural networks. It allows trained kernel filters to parse the edges and corners of an input space [56]. The same technique can be applied to continuous beam systems to conceptualise a design model that parses over a structural system and makes localised predictions for each member $i$. Suppose the design information within the influence zone, here the UDLs $\omega$ and spans $L$, is provided as input to the network. The input vector $x_0$ then has size $n = 2(2k_{max} + 1) = 4k_{max} + 2$, one span and one UDL for each of the $2k_{max} + 1$ beams in the zone. This is shown in Figure 5 for members $i = 3$ and $i = 0$. Based on the principle of the influence zone, these inputs should contain the information needed to predict the cross-section properties of the design beam with an accuracy of up to $\epsilon_{max}$.
It is now conceivable that the same neural network could make a prediction for any other member using a fixed-dimensional input vector $x_0$. This requires structuring the inputs relative to the position of the design beam's influence zone. End-span beams are included by using zero-padding, as shown in Figure 5 for member $i = 0$. Zero-padding in this instance is also logically consistent, since it corresponds to a beam that does not in fact exist, that is, a beam of zero length $L$ and zero UDL $\omega$. The inputs are therefore not structured by the absolute position of a beam within the entire continuous beam system (as indexed by $i$). Instead, they are structured relative to the influence zone of a design beam $g$ to predict the cross-section properties of that design beam, $P_g$.

[Figure 5 panels: influence zone for beam $i = 0$ (with zero-padding) and influence zone for beam $i = 3$; beams with varying $P_i$ under variable UDLs $w_i$]

Such an approach requires $m$ forward passes (inferences) to predict the cross-section properties of an $m$-sized system, one prediction per beam. In return, the same neural network can be applied to continuous beam systems of any size $m$ to which the maximum influence zone $k_{max}$ applies; this $k_{max}$ determines the size of the input vector $x_0$. By the principle of influence zones, the network can predict for continuous beam systems with more members than the $2k_{max} + 1$ beams covered by the fixed-dimensional input vector ($m > 2k_{max} + 1$). Any information outside the influence zone should, by definition, not be relevant (for an assumed $\epsilon_{max}$). Conversely, zero-padding allows the same network to make predictions along system edges, as well as for continuous beam systems smaller than the influence zone.
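The input structuring described above can be sketched in Python. The ordering of the entries (all spans in the zone, then all UDLs) and the function names are assumptions for illustration; any fixed ordering works as long as training and inference agree. Each row of the returned matrix corresponds to one of the $m$ forward passes.

```python
import numpy as np


def member_inputs(L, w, g, k_max):
    """Fixed-size input x0 for design beam g: spans, then UDLs, inside its influence zone.

    Positions outside the system are zero-padded (a beam of zero span and zero load),
    so the result always has n = 2 * (2 * k_max + 1) = 4 * k_max + 2 entries.
    """
    m = len(L)
    spans, loads = [], []
    for i in range(g - k_max, g + k_max + 1):
        inside = 0 <= i < m
        spans.append(L[i] if inside else 0.0)
        loads.append(w[i] if inside else 0.0)
    return np.array(spans + loads)


def system_inputs(L, w, k_max):
    """One input row per member: m forward passes, one prediction per beam."""
    return np.stack([member_inputs(L, w, g, k_max) for g in range(len(L))])


X = system_inputs([5.0, 6.0, 7.0], [10.0, 20.0, 30.0], k_max=2)
print(X.shape)  # (3, 10)
```

The same function applies unchanged to a system of any length, for example one with $m = 9$ members, which is the property the influence zone is meant to provide.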

Generating an appropriate dataset
As explained previously, the size of the maximum influence zone $k_{max}$ depends on the design constraints and an assumed error threshold $\epsilon_{max}$. These design constraints can be defined by setting minimum and maximum ranges on the known priors (UDLs $\omega$ and spans $L$), as well as on the cross-section properties within vector $P = [I, A_z, W_{pl}]$:

$$\omega_{min} \le \omega_i \le \omega_{max}, \qquad L_{min} \le L_i \le L_{max}, \qquad P_{min} \le P_i \le P_{max}$$

Constraints for each of these variables were chosen generously to cover the entire range of potential continuous beam systems that arise in structural design, from fixed framed multi-storey buildings to continuous bridge decks. Table 1 lists the ranges chosen for the UDLs and spans, along with the intervals at which these inputs were sampled using a random uniform distribution.
Property    Min.    Interval    Max.
…    0.5    0.5    20.0
Table 1: Ranges and intervals of known priors used for the influence zone evaluation and data generation.
Although arbitrary cross-section property combinations could have been chosen for I, A z and W pl , using cross-section properties from an explicitly defined set ensures the predicted cross-section properties are physically realistic.Initially, the standardised UB cross-sections from BS EN 10365 [57] were considered.However, the minimum and maximum cross-section properties from this set were not sufficient for the lightest and heaviest loading conditions possible under the design constraints set by Table 1.For this reason, a set of custom I-sections were generated and used exclusively for all members.
These custom I-sections were generated by averaging the geometrical ratios between the web depth $d_w$, flange thickness $t_f$, flange breadth $b_f$ and web thickness $t_w$ that arise in BS EN 10365 [57]. This process ensured that the sections share commonalities with the UB sections of BS EN 10365. It also ensured that all sections were at least Class 2 [51], which allows the use of plastic cross-section properties. A total of 1000 individual cross-sections were generated with equal spacing across these ratios. The resulting granularity (1000 sections, as opposed to the 91 within BS EN 10365) meant that the utilisation ratio precision achievable during data generation was significantly higher. The custom I-sections and associated cross-section properties are shown in Table 2.
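The three properties $I$, $A_z$ and $W_{pl}$ can be computed from the plate dimensions of a doubly symmetric I-section. This sketch is not necessarily the procedure used to generate Table 2, and the function names are hypothetical. It makes the following assumptions:
- Sections have no root radii, so the EN 1993-1-1 6.2.6(3)(a) shear area for rolled I-sections reduces to $A - 2b_f t_f + t_w t_f$. Its lower bound $\eta h_w t_w$ is not checked.
- For the Class 2 limits of EN 1993-1-1 Table 5.2, the web is taken in pure bending with $c = d_w$, and the flange outstand as $c = (b_f - t_w)/2$.
- Units are consistent throughout (here mm).

```python
import math


def i_section_properties(d_w, t_w, b_f, t_f):
    """Major-axis I, A_z and W_pl of a doubly symmetric I-section without root radii."""
    h = d_w + 2.0 * t_f                                   # overall depth
    I = (b_f * h**3 - (b_f - t_w) * d_w**3) / 12.0        # second moment of area
    W_pl = b_f * t_f * (h - t_f) + t_w * d_w**2 / 4.0     # plastic section modulus
    A = 2.0 * b_f * t_f + d_w * t_w
    A_z = A - 2.0 * b_f * t_f + t_w * t_f                 # shear area with r = 0
    return I, A_z, W_pl


def is_class_2(d_w, t_w, b_f, t_f, f_y=355.0):
    """Class 2 check: flange outstand c/t <= 10 eps, web in bending c/t <= 83 eps."""
    eps = math.sqrt(235.0 / f_y)
    c_flange = (b_f - t_w) / 2.0
    return c_flange / t_f <= 10.0 * eps and d_w / t_w <= 83.0 * eps


I, A_z, W_pl = i_section_properties(d_w=400.0, t_w=10.0, b_f=200.0, t_f=15.0)
print(I, A_z, W_pl)  # mm^4, mm^2, mm^3
print(is_class_2(400.0, 10.0, 200.0, 15.0))  # True
```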
Together, these efforts ensure that the dataset on which the neural network is trained covers sufficient breadth in the input and output space to generalise to a wide variety of continuous beam systems. The design constraints, the concept of influence zones and the technique of zero-padding used to generate the dataset were chosen to maximise the generalisability of the inverse operator for any system size $m$, UDLs $\omega$ and spans $L$. This leaves the utilisation ratios $u$ as the only remaining input variable in Equation 11. Rather than passing utilisation ratios to the network as explicit inputs, the dataset was generated so that all beams closely correspond to a target utilisation ratio $u_{target}$. The network therefore learns $u_{target}$ implicitly from the data itself.
The dataset was generated by designing continuous beam systems of size $m = 2k_{max} + 1$. Each member's span $L$ and UDL $\omega$ were drawn from a random uniform distribution based on the discretised ranges and intervals specified in Table 1. The influence zone $k_{max}$ was first identified for the design constraints in Table 1 and Table 2. These heterogeneous structural systems were then modelled and optimised using third-party software (Rhino3D©, Grasshopper© and Karamba3D© [58]). The beams were optimised for minimum depth against the ULS cross-section checks of EN 1993-1-1 6.2 [51]. This used a coupled analysis and design procedure [50] with a target utilisation ratio of $u_{target} = 0.99$.
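The sampling of the known priors for one training system can be sketched as below. The ranges in the usage example are placeholders, not the values of Table 1, and the function names are hypothetical. The subsequent optimisation of each system in Karamba3D is not reproduced.

```python
import random


def sample_grid(rng, lo, step, hi):
    """Draw uniformly from the discrete grid lo, lo + step, ..., hi."""
    n = round((hi - lo) / step)
    return lo + step * rng.randint(0, n)  # randint is inclusive of both ends


def sample_system(k_max, span_range, udl_range, seed=None):
    """Random heterogeneous continuous beam with m = 2 * k_max + 1 members.

    span_range and udl_range are (min, interval, max) tuples, as in Table 1.
    """
    rng = random.Random(seed)
    m = 2 * k_max + 1
    L = [sample_grid(rng, *span_range) for _ in range(m)]
    w = [sample_grid(rng, *udl_range) for _ in range(m)]
    return L, w


# Placeholder ranges (min, interval, max), not the values of Table 1
L, w = sample_system(k_max=2, span_range=(1.0, 0.5, 12.0), udl_range=(5.0, 5.0, 100.0), seed=1)
print(len(L), len(w))  # 5 5
```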

Neural network training procedure
The generalised neural network structure developed in this work is shown in Figure 6. Identifying an appropriate architecture in terms of height $H$, depth $D$ and activation functions $a_d$ requires experimentation. The choice of loss function $J$, used to compare the predicted targets $\hat{x}_D$ against the true targets $x_D$, also forms part of the experimentation process.

Loss functions and performance metrics
In this study, four different loss functions were investigated, as shown in Table 3. These include the Mean Absolute Error (MAE) and Mean Square Error (MSE) loss functions commonly used for regression models. Both share a limitation. During back-propagation, their derivatives with respect to the predicted targets update the model parameters $\theta$ without regard to the size of the error relative to the magnitude of the output variables $I$, $A_z$ and $W_{pl}$. This is problematic given the orders of magnitude difference between the largest and smallest section properties of the custom I-sections shown in Table 2. An error of 100 cm⁴ in $I$ would cause the same back-propagation adjustment under MAE or MSE whether the true second moment of area is 305 cm⁴ or 305 × 10⁵ cm⁴. As a consequence, both MAE and MSE prioritise minimising the absolute error, which favours target values of large magnitude at the expense of smaller ones.
To address this issue, percentage-based versions of MAE and MSE were also tested, defined in Table 3 as the Mean Absolute Percentage Error (MAPE) and the Mean Squared Percentage Error (MSPE). Whilst MAPE is commonly used, MSPE is rarely used in practice. Both MAPE and MSPE ensure that during back-propagation the optimiser updates the model parameters in proportion to the relative deviation between the predicted outputs x̂_D and the true outputs x_D, which should be a better performance criterion given the orders-of-magnitude differences in the output space of these continuous beam systems.
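The exact formulas of Table 3 are not reproduced in this text, so the sketch below assumes the standard forms of the four losses (reported as fractions, not multiplied by 100), with a small ϵ guarding the division; the value of `EPS` and the function names are assumptions. It also reproduces the 100 cm⁴ example: under MAE the gradient with respect to each prediction is identical for both targets, whereas under MAPE it is scaled by 1/|x_D|.

```python
import numpy as np

EPS = 1e-7  # small constant avoiding division by zero (assumed value)

def losses(x_pred, x_true):
    """MAE, MSE, MAPE and MSPE over p predicted targets, as fractions."""
    err = x_pred - x_true
    denom = np.maximum(np.abs(x_true), EPS)
    return {
        "MAE": np.mean(np.abs(err)),
        "MSE": np.mean(err ** 2),
        "MAPE": np.mean(np.abs(err) / denom),
        "MSPE": np.mean((err / denom) ** 2),
    }

def grad_wrt_prediction(x_pred, x_true):
    """Gradients of MAE and MAPE with respect to each predicted target."""
    p = x_pred.size
    err = x_pred - x_true
    denom = np.maximum(np.abs(x_true), EPS)
    return {
        "MAE": np.sign(err) / p,            # independent of the target's magnitude
        "MAPE": np.sign(err) / (p * denom),  # inversely proportional to |x_true|
    }

# Same absolute error of 100 cm^4 on a small and a very large second moment of area
x_true = np.array([305.0, 305.0e5])
x_pred = x_true + 100.0
g = grad_wrt_prediction(x_pred, x_true)
print(g["MAE"])                     # [0.5 0.5]
print(g["MAPE"][0] / g["MAPE"][1])  # the small target receives a 1e5 times larger update
```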
Regardless of the choice of loss function, MAPE is used as the comparison metric between different networks. However, to study the dispersion of prediction errors, an accuracy metric M is also evaluated at its minimum, 0.5%, 2.5%, 50% (median), 97.5%, 99.5% and maximum percentile values. This allows the 95% and 99% confidence intervals (CI) to be evaluated and helps identify the range of over- and under-prediction of outputs, which is important in the context of safe structural design.
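The equation defining M is not reproduced in this text. The results reported later (a maximum accuracy of 19 meaning a prediction 19 times the target, and a 99% interval of 91.7% to 113.8%) are consistent with M being the ratio of predicted to true value, and the sketch below assumes exactly that. It computes the percentile profile and both confidence intervals with NumPy; the function name and the data are hypothetical.

```python
import numpy as np

def accuracy_profile(x_pred, x_true):
    """Percentile profile of the accuracy ratio M = x_pred / x_true (assumed definition).

    M = 1 is a perfect prediction, M > 1 an over-prediction and M < 1 an
    under-prediction of the cross-section property.
    """
    m = np.asarray(x_pred, dtype=float) / np.asarray(x_true, dtype=float)
    q = np.percentile(m, [0.0, 0.5, 2.5, 50.0, 97.5, 99.5, 100.0])
    keys = ["min", "p0.5", "p2.5", "median", "p97.5", "p99.5", "max"]
    profile = dict(zip(keys, q))
    profile["CI95"] = (profile["p2.5"], profile["p97.5"])
    profile["CI99"] = (profile["p0.5"], profile["p99.5"])
    return profile

# Hypothetical predictions, for illustration only
x_true = np.array([100.0, 200.0, 400.0, 800.0])
x_pred = np.array([95.0, 210.0, 400.0, 880.0])
prof = accuracy_profile(x_pred, x_true)
print(prof["min"], prof["max"])
```

Unlike MAPE, which averages the magnitude of the error, the tails of this profile show separately how far predictions can fall below (unsafe) or above (uneconomic) the required section.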

Activation functions
The activation functions tested in this work are listed in Table 4 and include the commonly used rectified linear unit (ReLU) function, amongst others [56]. A distinction is drawn between the inner activation functions a_in within the hidden layers and the outer activation function a_out, which evaluates the predicted targets x̂_D. All inputs and outputs were scaled to between 0 and 1 by dividing each feature and target by its maximum magnitude within the training and validation sets. It is therefore important to choose only output activation functions compatible with the scaled target values, as reflected by the ranges of the a_out functions listed in Table 4.
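A minimal sketch of this max-magnitude scaling, assuming NumPy arrays with one column per feature or target; the function names and the numbers are hypothetical. It also shows that unseen targets can exceed the fitted maxima and therefore map to values above 1.

```python
import numpy as np

def fit_max_scale(train, val):
    """Column-wise maximum magnitude over the training and validation sets combined."""
    return np.max(np.abs(np.concatenate([train, val], axis=0)), axis=0)

def scale(data, col_max):
    """Map non-negative features or targets within the fitted sets to [0, 1]."""
    return data / col_max

def unscale(scaled, col_max):
    """Recover physical units (e.g. cm^4 for I) from network outputs."""
    return scaled * col_max

# Hypothetical targets [I, A_z, W_pl]; values are illustrative only
y_train = np.array([[305.0, 2.0, 40.0], [3.05e7, 300.0, 9.0e4]])
y_val = np.array([[1.2e5, 45.0, 2.5e3]])
y_test = np.array([[3.5e7, 320.0, 1.0e5]])

col_max = fit_max_scale(y_train, y_val)
print(scale(y_test, col_max))  # every entry exceeds 1 for this test point
```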

Height and depth analysis
The appropriate size of a neural network in terms of height H and depth D was found by seeking a suitable trade-off between under- and overfitting the model parameter space. The design complexity of continuous beam systems will likely require deeper and wider neural networks than those considered in previous literature [31], owing to the large number of potentially critical load arrangements, the numerous design criteria that govern the design, and the variety of viable cross-sections. For this reason, a wide range of heights and depths was tested. The size of each network is denoted by a simple syntax based on the architecture of its hidden layers. For example, "50-50-50" refers to a neural network with three hidden layers of 50 nodes each.
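The naming syntax maps directly onto layer sizes. The NumPy sketch below assumes the 22 inputs and 3 outputs (I, A_z, W_pl) of the final model described later, He-normal initialisation (an assumption; the initialisers actually tested are listed in Table 5), ReLU hidden activations and an exponential output activation. It is an illustrative forward pass with hypothetical function names, not the authors' training script.

```python
import numpy as np

def layer_sizes(arch, n_inputs, n_outputs):
    """Turn a hidden-layer string such as '600-600-600' into full layer sizes."""
    hidden = [int(h) for h in arch.split("-")]
    return [n_inputs, *hidden, n_outputs]

def init_params(sizes, rng):
    """He-normal weights and zero biases for each layer (assumed initialiser)."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        params.append((w, np.zeros(n_out)))
    return params

def forward(params, x):
    """ReLU on every hidden layer, exponential on the output layer."""
    for w, b in params[:-1]:
        x = np.maximum(0.0, x @ w + b)
    w, b = params[-1]
    return np.exp(x @ w + b)

def n_parameters(sizes):
    """Total number of weights and biases (the model parameters theta)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

sizes = layer_sizes("600-600-600", n_inputs=22, n_outputs=3)
print(sizes)                # [22, 600, 600, 600, 3]
print(n_parameters(sizes))  # 736803
```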

Other neural network parameters and hyperparameters
Given the large dataset size and the computational resources required for training, a simple holdout strategy was deemed appropriate rather than other validation strategies [59]; the final dataset was therefore randomised and split into training, validation and testing sets using a 70%, 15%, 15% split, respectively. The testing set was used only once, after an appropriate neural network architecture had been found experimentally. A robustness check with various initialiser seeds was carried out on the final architecture. Other training aspects, such as optimisers, types of initialisers, learning rates and batch sizes, were chosen empirically based on MAPE performance and a qualitative comparison of learning behaviour. The options and ranges for each of these are summarised in Table 5. All stochastic elements were controlled through explicit initialiser seeds.
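A seeded 70/15/15 holdout split can be sketched as follows, assuming NumPy; the function name and seed value are hypothetical.

```python
import numpy as np

def holdout_split(n, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle indices 0..n-1 once and cut them into train/validation/test blocks."""
    rng = np.random.default_rng(seed)   # explicit seed makes the split reproducible
    idx = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = holdout_split(1_000_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 700000 150000 150000
```

The test indices are simply set aside; in the procedure described here they are evaluated only once, after the architecture has been fixed.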

Summary of methodology
The following procedure was adopted to develop the machine learned inverse operator:
1. Evaluate the maximum influence zone k_max for the continuous beam system using the procedure from [37], based on the design constraints specified in Tables 1 and 2.
2. Design continuous beam systems of size m = 2k_max + 1 using a coupled analysis and design approach [50] with a target utilisation ratio u_target = 0.99. Each beam within the continuous beam system corresponds to one data point, with zero-padding for edge or near-edge beams as shown in Figure 5. Finally, split and normalise the data into training, validation and testing sets as explained in Section 3.5.
3. Develop the neural network model using the following steps:
3.1. Assume a standard 50-50 architecture and test the various combinations of loss and activation functions identified in Table 3 and Table 4, respectively, based on 100k training data points.
3.2. Test various height H and depth D variations as explained in Section 3.5, based on 100k training data points.
3.3. For the best architecture (height, depth and activation functions), test the performance against different training set sizes.
4. Evaluate the performance of the final neural network against the testing dataset and conduct a robustness test using various initialiser seeds for the weights and biases.

Table 5: Options and/or ranges of neural network learning parameters and hyperparameters tested, along with selected parameters for all training runs presented in results.
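The zero-padded input structure of step 2 (Figure 5) can be sketched as follows, assuming NumPy. The feature ordering (all spans of the window, then all UDLs) and the function name are assumptions; the padding logic is what lets the same network be applied to systems of any size m.

```python
import numpy as np

def influence_zone_inputs(spans, udls, k_max):
    """Build one zero-padded input vector per beam of a system of arbitrary size m.

    Each row holds the spans and UDLs of the 2*k_max + 1 beams centred on one
    beam, giving 4*k_max + 2 inputs; positions beyond the system ends are zeros.
    """
    spans = np.asarray(spans, dtype=float)
    udls = np.asarray(udls, dtype=float)
    pad = np.zeros(k_max)
    spans_p = np.concatenate([pad, spans, pad])
    udls_p = np.concatenate([pad, udls, pad])
    rows = []
    for i in range(spans.size):
        window = slice(i, i + 2 * k_max + 1)   # beam i sits at the centre of the window
        rows.append(np.concatenate([spans_p[window], udls_p[window]]))
    return np.vstack(rows)

# Three-beam system with k_max = 2: n = 4*k_max + 2 = 10 inputs per beam, as in Figure 5
X = influence_zone_inputs([6.0, 8.0, 5.0], [20.0, 30.0, 25.0], k_max=2)
print(X.shape)  # (3, 10)
```

The same function applied to a 20-beam system with k_max = 5 yields 22 inputs per beam, matching the input layer of a network trained only on m = 11 systems.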

Influence zone size estimation
The maximum influence zone k_max of continuous beam systems subject to the design constraints specified in Tables 1 and 2 was established. Using the procedure from [37], 25 random UDL and 25 random span distributions were generated for an m = 17 sized system and designed against the ULS checks of EN 1993-1-1 [51], using the custom I-sections specified in Table 2 with a target utilisation ratio u_target = 0.99. This led to the creation of 10,625 continuous beams (25 × 25 × 17). The influence zone of each beam was evaluated using an error threshold of ϵ_max = 0.02, selected based on the MAPE performance expected to be achievable with the multi-layered neural network; the results are shown in Figure 7. They indicate an average influence zone size of 1.75 and a maximum of k_max = 5. The system size required for dataset generation is therefore m = 2k_max + 1 = 11, and the required input layer size is 4k_max + 2 = 22. This influence zone evaluation took 5 hours of computation time.
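The sizing arithmetic can be checked in a few lines; the function name is hypothetical.

```python
def window_sizes(k_max):
    """System size and input-layer size implied by the maximum influence zone."""
    m = 2 * k_max + 1   # k_max beams on either side of the central beam
    n_inputs = 2 * m    # one span and one UDL per beam, i.e. 4*k_max + 2
    return m, n_inputs

print(window_sizes(5))  # (11, 22)
print(25 * 25 * 17)     # 10625 beams in the influence zone study
```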

Data generation, visualisation and pre-processing
Drawing from uniform distributions of the span and UDL values identified in Table 1, two datasets were created. The first consists of 266 unique UDL ω and span L permutations of m = 11 sized continuous beam systems, and the second of 251 unique permutations. A coupled analysis and design optimisation approach [50], with a target utilisation ratio u_target = 0.99 based on ULS cross-section checks [51] and all critical load arrangements [37], was implemented to find the appropriate custom I-section from Table 2 for each beam within the system. This process resulted in 1,471,327 individual data points (11 × (266² + 251²)) that took 3.5 days to generate. The distribution of utilisation ratios achieved for the specified target u_target = 0.99 is shown in Figure 8a). Since the design space is limited to discretised cross-section properties, the target utilisation ratio u_target = 0.99 was rarely met exactly. Therefore, a sub-selection of this dataset took place, discarding all beams that fell outside the utilisation ratio range 0.97 ≤ u < 1.00. Although only 54,322 data points belonged to this discarded set, the dataset was further stripped of all data points whose systems contained a beam belonging to the discarded set, even if those data points themselves fell within the selected utilisation ratio range. These data points are defined as the "Imperfect set" in Figure 8a). This process removed another 415,038 data points and left 1,001,957 data points, each representing a beam within an m = 11 member system together with the surrounding design information from the influence zone of valid structural designs under ULS conditions. This set was randomised and stripped of another 1957 data points to yield a dataset of exactly 1 million (1000k) data points. This 1000k dataset was named CBeamXP: Continuous Beam Cross-section Predictors [60] and represents ULS-compliant beam systems of system size m = 11 with utilisation ratios in the range 0.97 ≤ u < 1.00. The CBeamXP dataset was split into training, validation and testing sets using a 70%, 15%, 15% split, as shown in Figure 8b). Histograms of the spans, UDLs, utilisation ratios and cross-section indices (each index corresponding to one of the 1000 custom I-sections in ascending stiffness order) are shown in Figure 9. For pre-processing, all inputs and outputs were divided by their maximum values within the training and validation sets.
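The two-stage sub-selection (discarded beams, then "imperfect" beams sharing a system with a discarded beam) can be sketched as below, assuming NumPy and one utilisation ratio per beam arranged by system; the function name and toy values are hypothetical.

```python
import numpy as np

def subselect(util, u_min=0.97, u_max=1.00):
    """Apply the utilisation-based sub-selection to a batch of systems.

    util has shape (n_systems, m): one achieved utilisation ratio per beam.
    Returns a mask of systems to keep, the number of discarded beams and the
    number of "imperfect" beams (in range, but sharing a system with a discarded beam).
    """
    in_range = (util >= u_min) & (util < u_max)
    keep_system = in_range.all(axis=1)
    n_discarded = int((~in_range).sum())
    n_imperfect = int((in_range & ~keep_system[:, None]).sum())
    return keep_system, n_discarded, n_imperfect

util = np.array([
    [0.980, 0.990, 0.975],   # all beams in range: system kept
    [0.980, 1.020, 0.990],   # one beam above range: the other two become "imperfect"
    [0.960, 0.985, 0.990],   # one beam below range
])
keep, n_disc, n_imp = subselect(util)
print(keep, n_disc, n_imp)     # [ True False False] 2 4
print(11 * (266**2 + 251**2))  # 1471327 data points before sub-selection
```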

Loss and activation function variations
The neural network development began by evaluating the MAPE performance of 50-50 architectures for different loss functions J, inner a in and outer a out activation functions.These networks were trained for 1000 epochs using 100k datapoints from the training set, yet validated against the entire 150k validation set.The total training time was 5 hours with results shown in Table 6.
The results indicate that the percentage-based loss functions J_MAPE and J_MSPE typically outperform their non-percentage-based counterparts. The performance of J_MAPE and J_MSPE was similar, and all MAPE values below 10% (0.100) are shown in bold. J_MAPE was chosen as the loss function for this investigation because it is more commonly used. The five best inner and outer activation function combinations under J_MAPE in Table 6 (bold values) were subsequently analysed qualitatively.
This qualitative analysis highlighted that using ReLU as the outer activation function allows the prediction of null cross-section properties. This is an invalid prediction, since the model operates on the basis that a beam with some minimum cross-section properties must exist in this structural system. A similar limitation applies to the sigmoid activation function, which asymptotically approaches one at positive infinity. Because the targets were scaled by their maximum values within the training and validation sets, this limits the network's ability to predict cross-sections larger than those found within those sets. For these reasons, the a_out,exp function was selected for this network architecture, since it neither yields zero-valued cross-section properties nor imposes an upper limit on the outputs. Of the remaining viable networks, the a_in,ReLU and a_out,exp combination converged the quickest and was therefore chosen for further development in this study.
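The output-range argument can be illustrated numerically. This NumPy sketch uses the standard forms of the three candidate output activations; Table 4 may parameterise them differently.

```python
import numpy as np

z = np.array([-50.0, -1.0, 0.0, 1.0, 50.0])   # pre-activation values of the output layer

relu = np.maximum(0.0, z)            # exactly 0 for z <= 0: a "null" cross-section property
sigmoid = 1.0 / (1.0 + np.exp(-z))   # never above 1, i.e. the scaled training/validation maximum
expo = np.exp(z)                     # strictly positive and unbounded above

print(relu.min(), sigmoid.max() <= 1.0, expo.min() > 0.0)  # 0.0 True True
```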

Height and depth variations
The architecture of the hidden layers (height H and depth D) needs to be sufficiently expressive to reflect the design complexity of continuous beam systems while avoiding under- and overfitting. A comparison between training and validation performance is therefore needed. The 100k training and 150k validation sets from Section 4.3.1 were reused for this purpose. Figure 10 compares the performance at epoch 1000 of various networks containing two hidden layers of varying heights. The combined training time of these networks was 22 hours. The 600-600 network was identified as the point at which performance transitioned from underfitting to slight overfitting. Figures 10b) and c) show the accuracy profiles for training and validation, respectively. Note that the maximum validation accuracy exceeded 19 (1900%) for all networks, regardless of height, meaning that the network predicted cross-section properties more than 19 times larger than the target value.
The "optimal" number of hidden layers for height 600 was investigated, with results shown in Figure 11.This resulted in a combined training time of 12 hours.Networks with more than three hidden layers showed no major improvements in either training or validation performance except in minimum training accuracy.However, this was not associated with an improvement in minimum validation accuracy as shown in Figure 11c).For these reasons, a depth of three hidden layers was deemed appropriate.

Model performance: testing and robustness
The final neural network model consists of a 600-600-600 architecture with a_in,ReLU and a_out,exp activation functions, trained using the J_MAPE loss function on training and validation datasets of 700k and 150k data points, respectively, with learning graphs shown in Figure 13. The neural network at epoch 1000 was also evaluated against the testing set created in Figure 8 and checked for robustness by re-training the same network a further 9 times using different kernel initialiser seeds, which took 4.5 days. The general model performance results and the standard deviations σ_initialiser due to these different initialiser seeds are summarised in Table 7.
The similar performance of the testing and validation sets in Table 7 strongly suggests that the model is likely to generalise well to new data points. The impact of changing the initialiser seed is minimal, except for the minimum and maximum accuracy values.

Model generalisability
One of the fundamental objectives of this work was to develop a machine learned structural design model capable of generalising beyond the system size m = 11 it was trained on.To achieve this, the influence zone concept was leveraged with zero-padding to theoretically allow the neural network to make localised predictions for continuous beams of arbitrary system size m.To test this, over 1000 additional testing data-points were generated using the same methodology as described in Section 3.4 and sub-selection process as shown in Section 4.2 for each system size 1 ≤ m ≤ 20 (including m = 11).The MAPE and accuracy performance are shown in Figure 14.

Figure 14: Generalisability performance of the final 600-600-600 neural network in terms of MAPE and accuracy for unseen structures of varying system size m.
Figure 14 indicates that the machine learned inverse operator generalises well to continuous beam systems of size m ≥ 5, with MAPE ≈ 2%. For system sizes m < 5, MAPE deteriorated to 5% to 6%. These are encouraging results, given that the neural network was trained only on systems of size m = 11. The greatest variations in performance were typically in the maximum and minimum accuracy values; in fact, the model often performed better in terms of maximum accuracy for system sizes other than m = 11.
These results support the novel use of the influence zone concept [37] as a mechanics-driven feature selector, combined with zero-padding, to build machine learned inverse operators capable of generalising to continuous structural systems of different sizes. This opens the possibility of investigating the applicability of this methodology to two- or three-dimensional frames. These results also offer a way around the limitation of fixed-dimensional input vectors in multi-layer neural networks [38].
In recent years, other researchers have investigated the development of generalisable machine learning models; most of these efforts have focused on machine learned forward operators [38,39,61]. Within the realm of structural design inverse operators, researchers have noted that the question of generalisability typically remains under-investigated [32]. Whilst previous works studied the ability of neural networks to generalise under different boundary conditions [33,34], this work distinguishes itself by generalising across differently sized systems. Combining the underlying techniques of these studies may allow one to train a generalisable model of arbitrary size and arbitrary boundary conditions.

Performance variability
This investigation also differentiated itself from previous works by measuring the variability of predictions in terms of accuracies.Notably, this allowed one to identify the range of over-and under-predictions, which are not captured by average loss function metrics such as MAE or MAPE.Despite gradual improvement within the 95% and 99% confidence intervals, the final performance graph in Figure 13 indicates that the confidence intervals of the validation set lag those of the training set.The same can also be said for the testing set, especially for maximum and minimum accuracies as shown in Table 7.
To identify potential causes of this divergence in performance between the training and testing sets, custom box plots of testing accuracies were generated for a number of variables that describe the dataset, binned by ascending deciles (D_0 to D_10). By evaluating the standard deviation of each decile's accuracy values, and then the standard deviation of those standard deviations σ(σ_Deciles), one can quantify which variable causes the greatest dispersion of the accuracy values. These results are shown in Figure 15. A detailed study of Figure 15 identified that the total load variable ω_0 × L_0 caused the greatest dispersion σ(σ_Deciles), as seen in Figure 15f), which also showed the most identifiable demarcation between low and high accuracy results. The prediction variability of a beam's cross-section properties was greatest when the product of the UDL ω_0 and span L_0 fell in the lowest decile (< D_1). This pattern can also be identified in heat-maps of the average and maximum MAPE at each ω_0 and L_0 combination within the dataset, as shown in Figure 16. Using structural engineering intuition, one infers that the design of short and lightly loaded spans is more likely to be influenced by the UDLs of the surrounding members within a continuous system. Whilst the influence zone concept ensures that the pertinent design information is contained within the inputs, providing that information solely in the form of an input vector may not be sufficient to make accurate predictions under all circumstances. The wide prediction variability for the smallest deciles of both the second moment of area and stiffness values, shown in Figures 15b) and c), suggests that exposing the machine learning model to additional physics knowledge of the structural system (beyond influence zones) may lead to further improvements.
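The σ(σ_Deciles) statistic can be sketched as follows, assuming NumPy, percentile-based decile edges and population standard deviations (ddof = 0); the function name is hypothetical and the data are synthetic, illustrating only the mechanics rather than the study's results.

```python
import numpy as np

def decile_dispersion(variable, accuracy):
    """Std of the per-decile std of accuracy values, binned by a descriptor variable."""
    edges = np.percentile(variable, np.linspace(0.0, 100.0, 11))   # D_0 ... D_10
    # right=True with the 9 interior edges assigns every sample to a bin 0..9
    bins = np.digitize(variable, edges[1:-1], right=True)
    per_decile_std = np.array([accuracy[bins == b].std() for b in range(10)])
    return per_decile_std.std(), per_decile_std

# Synthetic data: prediction scatter grows as the total load w0 * L0 shrinks
rng = np.random.default_rng(1)
total_load = rng.uniform(1.0, 100.0, size=10_000)
accuracy = 1.0 + rng.normal(0.0, 1.0, size=10_000) * (0.5 / total_load)
sigma_of_sigmas, per_decile = decile_dispersion(total_load, accuracy)
print(sigma_of_sigmas, per_decile[0], per_decile[-1])
```

A variable whose deciles have very different spreads (here, a wide first decile) produces a large σ(σ_Deciles), which is how the total load stands out in Figure 15f).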
In one of the early studies, Berke et al. [28] noted that despite achieving relatively low prediction errors on average, neural network predictions can occasionally vary significantly. In more recent work, large-error predictions (over 40%) were noted and manually removed from the final reported average prediction error [33]. The results from this study highlight that this variability issue needs to be addressed further. So far, the authors have identified only a single study that investigated error variability when evaluating machine learning performance for civil and structural engineering applications [31]. The accuracy metric, together with its minimum, maximum, 95% and 99% confidence intervals, could provide a framework to study error variability in more detail.

Other neural network performance observations
Despite using a simple multi-layer neural network for the structural design inverse operator O†_inv, the development procedure lowered the validation error from MAPE values of ≈ 10% in Table 6 to 1.6% in Table 7, a performance that was matched by the testing dataset. This was attributable to numerous factors, a notable one being the use of the percentage-based loss function, which, as anticipated in Section 3.5, was more suitable for the dataset given the orders-of-magnitude differences in the targets. The use of the exponential output activation function a_out,exp may also have helped in dealing with target values that vary greatly in magnitude.
The lack of literature on machine learned structural design models for continuous beam systems means that a direct comparison of the 1.6% MAPE performance is not possible at present.However, one can compare this performance with performance metrics of structural design models developed for different applications.For example, the network developed in this work outperformed previous multi-layer neural network regression models; an early concrete beam prediction model achieved a MAPE value of 10.17% [27], whilst a cross-section predictor of aerospace components averaged out at a MAPE value of 5% [28].The network presented in this study also performed well when compared to more advanced network architectures such as convolutional neural networks for topologically optimised truss structures that achieved voxel value errors of 5.63% [33].Comparison with further works that developed machine learned structural design models was not possible for studies which reported performance with non-percentage based metrics such as MAE [29] or MSE [31,32,47].
This study also differentiates itself by the quantity of data it was trained on (up to 700,000 data points), which, based on Figure 12, helped improve validation performance. Early works from the 1990s had training set sizes smaller than 100 data points [27,28,29], and even more recent literature only trained using 600 [31], 12,000 [32], 28,000 [33] or just under 40,000 [34] data points. Whilst large datasets significantly increase computational cost, the combination of big data and more advanced neural network architectures may improve performance further, both in terms of average error and prediction variability.

Limitations and scope for future works
There are multiple limitations that restrict practical use of the proposed design model. First, the structural systems within the dataset were designed against ULS constraints only and rest on further assumptions about the nature of the design problem, listed in Section 2.2. The generalisability of the model, specifically for system sizes m < 5, also requires further work, and the issue of prediction variability will require additional investigation in terms of either model architectures or larger datasets. Furthermore, there likely exists a wide range of mathematical techniques from inverse problems that could aid in developing and assessing operators for structural design problems, for example by estimating the Lipschitz constant of the mapping. These limitations provide a clear basis for future work.
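As one illustration of such a technique (not part of this study), a sample-based lower bound on the Lipschitz constant of a mapping can be obtained from pairs of known input/output points. The sketch below assumes NumPy, Euclidean norms and a hypothetical function name; any finite sample can only under-estimate the true constant.

```python
import numpy as np

def lipschitz_lower_bound(x, y, n_pairs=10_000, seed=0):
    """Largest observed ||y_i - y_j|| / ||x_i - x_j|| over random pairs.

    x has shape (n, d_in) and y has shape (n, d_out).
    """
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(x), size=n_pairs)
    j = rng.integers(0, len(x), size=n_pairs)
    dx = np.linalg.norm(x[i] - x[j], axis=1)
    dy = np.linalg.norm(y[i] - y[j], axis=1)
    valid = dx > 0.0   # skip identical input pairs
    return float(np.max(dy[valid] / dx[valid]))

# Usage on a known linear map y = 3x, whose Lipschitz constant is 3
x = np.random.default_rng(2).uniform(size=(200, 1))
print(lipschitz_lower_bound(x, 3.0 * x))  # approximately 3.0
```

For a design operator, a large estimate would flag regions where small changes in loads or spans demand large changes in cross-section, which is one way to probe the ill-posedness discussed in the Introduction.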
On another note, Table 8 summarises the total computation time required for the entire results section. Whilst the computation could have been accelerated through parallelisation, improved computational resources and simplified metric evaluation algorithms, the purpose of Table 8 is to indicate the relative proportion of time spent at each stage. Greater computational resources may allow investigations using alternative validation strategies, such as k-fold cross-validation [59], and automated hyperparameter selection procedures [62] that could improve performance. A significant portion of the computational effort was spent simply generating the data points for training, validation and testing. In line with studies encouraging reproducibility [63], and to encourage research that improves the predictive capability of the machine learned structural design model presented here, the CBeamXP dataset and an associated python-based neural network training script are made available at an open-source data repository [60].

Conclusions
This work developed a new neural network based structural design model to predict the cross-section property requirements of continuous beam systems non-iteratively. The major contributions of this investigation include:
• Framing structural design as an inverse problem, and using this novel perspective to identify three distinct types of machine learning applications. One of these types, machine learned inverse operators, was investigated in this work to develop a non-iterative structural design model. This presents a fundamental shift from traditional design approaches.
• Developing a non-iterative structural design model for continuous beam systems of arbitrary system size through the novel use of influence zones [37] as a mechanics-driven feature selection process that enhanced the model's generalisability.
• Achieving a mean absolute percentage error of 1.6%, which is lower than that of machine learned structural design models from comparative literature. This performance was attributable to careful consideration of the network architecture in terms of the height and depth of the hidden layers, the selection of loss and activation functions appropriate to the challenges posed by continuous beam systems, and a training dataset size of 700,000 data points.
• Identifying the importance of measuring and reducing prediction error variability. In this study, the 99% confidence interval for testing accuracy was between 91.7% and 113.8%. Reducing prediction variability is a significant knowledge gap in the literature, especially with regard to machine learning applications within safety-critical systems such as structural design.
The CBeamXP dataset generated in this work containing one million data-points along with an associated python-based neural network training script were published at an open-source data repository [60].Aside from allowing results to be reproduced, sharing this data will hopefully encourage future research towards machine learned structural design models that improve the mean absolute percentage error, generalisability, or prediction variability achieved in this investigation.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

and the Ramboll Foundation. For the purpose of open access and funding stipulations, the authors have applied a Creative Commons Attribution (CC-BY) licence to any Accepted Manuscript (AM) version arising.

Figure 3 :
Figure 3: Design process of a continuous beam system from the inverse problem perspective.
x_D of size o with D layers. In this study, a network has D − 1 hidden layers of height H, each indexed by d and defined by f_d(x_d), which contains a (non-linear) activation function a_d with weight matrix w_d and bias vector b_d. The weight matrices and bias vectors of each layer form the model's parameters θ.

Figure 5 :
Figure 5: An illustration demonstrating the structuring of the neural network inputs using influence zones and zero-padding with k max = 2, leading to n = 4k max + 2 = 10 inputs.

Figure 6 :
Figure 6: Generalised neural network structure with known priors from the influence zone k_max as the input layer x_0 and cross-section properties of the beam as the output layer x_D.

Figure 7 :
Figure 7: Influence zone results for an m = 17 system with ϵ_max = 0.02 based on the design constraints established by Tables 1 and 2 using the methodology from [37].

Figure 8 :
Figure 8: Sub-selection of data points from the initial 1,471,327 dataset based on a ULS utilisation ratio u range of 0.97-1.00.The 1000k CBeamXP dataset was drawn from the green sets.

Figure 9 :
Figure 9: Frequency distributions for various descriptor variables of the CBeamXP dataset.Spans and UDL values are uniformly distributed, whilst the selected cross-section indices of the optimised beam systems follow a normal distribution.

Figure 12
Figure 12 shows the change in performance as a function of the training dataset size, from 25k to 700k data points, with the same 150k validation dataset as in the previous sections. The combined training time was 1.5 days. Except for slight variations in the minimum and maximum accuracy values, the performance of the neural network naturally improved with a larger training set.

Figure 10 :
Figure 10: Loss and accuracy profiles for a in,ReLU and a out,exp networks at epoch 1000 with J MAPE with two hidden layers of equal height.Training set size of 100k and validation set size of 150k.

Figure 11 :
Figure 11: Loss and accuracy profiles for a in,ReLU and a out,exp networks at epoch 1000 with J MAPE with hidden layers of height H = 600.Training set size of 100k and validation set size of 150k.

Figure 12 :
Figure 12: Loss and accuracy profiles for 600-600-600 a in,ReLU and a out,exp network at epoch 1000 with J MAPE for various training dataset sizes, with a validation set size of 150k.

Figure 13 :
Figure 13: Loss and accuracy profiles for 600-600-600 a in,ReLU and a out,exp network at epoch 1000 with J MAPE , 700k training and 150k validation sets.Note use of logarithmic y-axis to show the full range of maximum validation accuracies.

Figure 15 :
Figure 15: Custom box plots of accuracies vs. variables in binned deciles. Total load f) correlates with the greatest dispersion σ(σ_Deciles) = 0.045, which can also be identified visually.

Figure 16 :
Figure 16: Total load heatmaps evaluating a) the average MAPE and b) the maximum MAPE for all UDL ω_0 and span L_0 combinations. The maximum MAPE values occur at low total load combinations (ω_0 × L_0) within the first decile, up to D_1.

Table 3 :
Loss functions to be tested with p predicted targets x̂_D, p true targets x_D and a small ϵ to avoid division-by-zero errors.

Table 4 :
Table of inner a in and outer a out activation functions to be tested with weight vector w, bias vector b and layer vector x.
and Table 4, respectively, based on 100k training data points.
3.2.Test various height H and depth D variations as explained in Section 3.5 based on 100k training data points.3.3.For the best architecture (height, depth and activation function), test the performance against different training set sizes.

Table 6 :
Validation MAPE metrics at epoch 1000 for different combinations of loss J, inner a in and outer a out activation functions for a 50-50 architecture using 100k training and 150k validation data points.MAPE values of less than 0.100 (10%) are in bold.

Table 7
strongly suggests

Table 7 :
Loss and accuracy profiles for 600-600-600 a in,ReLU and a out,exp network at epoch 1000 with J MAPE , 700k training, 150k validation sets and 150k testing set.

Table 8 :
Computation time for each neural network development stage.