CHAPTER 7: REGIONAL ANALYSIS 
"In most natural systems, drainage from the uplands finds its way into rivers and then into the ocean. Ocean disposal
is nature's way of moving dissolved salts out of the landscape."
Jan van Schilfgaarde (1990)

This chapter is divided into three sections. Section 7.1 describes joint probability distributions, including marginal distributions and conditional probability. Section 7.2 describes the techniques of regression analysis. Section 7.3 presents selected techniques for regional analysis of flood and rainfall characteristics.

7.1 JOINT PROBABILITY
In engineering hydrology, regional analysis encompasses the study of hydrologic phenomena with the aim of developing mathematical relations to be used in a regional context.
Generally, mathematical relations are developed so that information from gaged or long-record catchments can be readily transferred to neighboring ungaged or short-record catchments of similar hydrologic characteristics.
Other applications of regional analysis include regression techniques used to develop empirical (i.e., parametric) equations applicable within a broad geographical region.
Regional analysis makes use of statistics and probability, including frequency
analysis (Chapter 6) and joint probability distributions.
Joint Probability Distributions
Probability distributions possessing one random variable (X) were discussed in Chapter 6.
These are called univariate distributions.
Probability distributions with two random variables, X and Y, are called bivariate or joint distributions.
A joint distribution expresses in mathematical terms the probability of occurrence of an outcome consisting of a pair of values of X and Y.
In statistical notation, P(X = x_{i}, Y = y_{j}) is the probability P that the random variables X and Y will take on the outcomes x_{i} and y_{j} simultaneously.
A shorter notation is P(x_{i}, y_{j}).
For x_{i} (i = 1, 2, ... , n) and y_{j} (j = 1, 2, ... , m), the sum of the probabilities of all possible outcomes is equal to unity:
$$ \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) = 1 \tag{7-1} $$
A classical example of joint probability is that of the outcome of the cast of two dice, say A and B.
Intuitively, the probability of getting a 1 for A and a 1 for B is P(A = 1, B = 1) = 1/36; see Fig. 7-1.
In total, there are 6 × 6 = 36 possible outcomes, and each one of them has the same probability: 1/36 (assuming, of course, that the dice are not loaded).
This distribution is referred to as the bivariate uniform distribution because each outcome has a uniform and equal probability of occurrence.
The sum of the probabilities of all possible outcomes is confirmed to be equal to 1.
Figure 7-1 Joint probability: The outcome of two dice.

Joint cumulative probabilities are defined in a similar way as for univariate probabilities:
$$ F(x_k, y_l) = \sum_{i=1}^{k} \sum_{j=1}^{l} P(x_i, y_j) \tag{7-2} $$
in which F(x_{k}, y_{l}) is the joint cumulative probability.
Continuing with the example of the two dice, the probability of A being ≤ 3 and B being ≤ 3 is the sum of all the individual probabilities, for all combinations of i and j, as i varies from 1 to 3, and as j varies from 1 to 3; i.e., 3 × 3 = 9 possible combinations, resulting in a probability equal to 9 × (1/36) = 1/4.
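The two-dice example can be checked numerically. The following is a minimal sketch, assuming fair dice; the names `joint` and `joint_cumulative` are illustrative and do not come from the text.

```python
from fractions import Fraction
from itertools import product

# Joint probability of two fair dice A and B: each (a, b) pair has probability 1/36.
faces = range(1, 7)
joint = {(a, b): Fraction(1, 36) for a, b in product(faces, faces)}

def joint_cumulative(pmf, k, l):
    """Joint cumulative probability: sum of P(a, b) for a <= k and b <= l."""
    return sum(p for (a, b), p in pmf.items() if a <= k and b <= l)

total = sum(joint.values())            # sum over all 36 outcomes
f_33 = joint_cumulative(joint, 3, 3)   # P(A <= 3, B <= 3)
print(total, f_33)                     # prints: 1 1/4
```

Exact fractions are used instead of floating-point numbers so that the results can be compared directly with the hand calculation.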
Marginal Probability Distributions
Marginal probability distributions are obtained by summing up P(x_{i}, y_{j}) over all values of one of the variables, for instance, X.
The resulting (marginal) distribution is the probability distribution of the other variable, in this case Y, without regard to X.
Marginal distributions are univariate distributions obtained from bivariate distributions.
In statistical notation, the marginal probability distribution of X is:
$$ P(x_i) = \sum_{j=1}^{m} P(x_i, y_j) \tag{7-3} $$
Likewise, the marginal distribution of Y is:
$$ P(y_j) = \sum_{i=1}^{n} P(x_i, y_j) \tag{7-4} $$
The example of the two dice A and B may be used to illustrate the concept of marginal probability.
Intuitively, the probability of A being equal to 1, regardless of the value of B, is 6 × (1/36) = 1/6.
Likewise, the probability of B being equal to 4, regardless of the value of A, is also 1/6.
Notice that the joint probabilities (1/36) of each one of all 6 possible outcomes have been summed in order to calculate the marginal probability.
Marginal cumulative probability distributions are obtained by combining the concepts of marginal and cumulative distributions.
In statistical notation, the marginal cumulative probability distribution of X is:
$$ F(x_k) = \sum_{i=1}^{k} \sum_{j=1}^{m} P(x_i, y_j) \tag{7-5} $$
Likewise, the marginal cumulative probability distribution of Y is:
$$ F(y_l) = \sum_{i=1}^{n} \sum_{j=1}^{l} P(x_i, y_j) \tag{7-6} $$
The example of the two dice A and B is again used to illustrate the concept of marginal cumulative probability.
The probability of A ≤ 2, regardless of the value of B, is: 2 × 6 × (1/36) = 1/3.
Likewise, the probability of B ≤ 5, regardless of the value of A, is: 5 × 6 × (1/36) = 5/6.
To calculate the marginal cumulative probabilities, the concepts of marginal and cumulative distributions have been combined.
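The marginal and marginal cumulative probabilities quoted for the two dice can be reproduced with a short sketch. Fair dice are assumed, and the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
joint = {(a, b): Fraction(1, 36) for a, b in product(faces, faces)}

def marginal_a(pmf, a):
    """P(A = a): sum the joint probabilities over all values of B."""
    return sum(p for (ai, _), p in pmf.items() if ai == a)

def marginal_b(pmf, b):
    """P(B = b): sum the joint probabilities over all values of A."""
    return sum(p for (_, bj), p in pmf.items() if bj == b)

def marginal_cumulative_a(pmf, k):
    """P(A <= k), regardless of the value of B."""
    return sum(p for (ai, _), p in pmf.items() if ai <= k)

def marginal_cumulative_b(pmf, l):
    """P(B <= l), regardless of the value of A."""
    return sum(p for (_, bj), p in pmf.items() if bj <= l)

print(marginal_a(joint, 1), marginal_b(joint, 4))                        # prints: 1/6 1/6
print(marginal_cumulative_a(joint, 2), marginal_cumulative_b(joint, 5))  # prints: 1/3 5/6
```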
Conditional Probability
The concept of conditional probability is useful in regression analysis and other hydrologic applications.
The conditional probability is the ratio of joint and marginal probabilities. In statistical notation:
$$ P(x \mid y) = \frac{P(x, y)}{P(y)} \tag{7-7} $$
in which P(x | y) is the conditional probability of x, given y. Likewise, the conditional probability of y, given x, is:
$$ P(y \mid x) = \frac{P(x, y)}{P(x)} \tag{7-8} $$
From Eqs. 7-7 and 7-8, it follows that joint probability is the product of conditional and marginal probabilities.
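As an illustration of the ratio of joint to marginal probability, the sketch below uses the two fair dice of the earlier examples. The function names are hypothetical. Because the dice are independent, the conditional probability equals the marginal probability.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
joint = {(a, b): Fraction(1, 36) for a, b in product(faces, faces)}

def marginal_b(pmf, b):
    """P(B = b), summed over all values of A."""
    return sum(p for (_, bj), p in pmf.items() if bj == b)

def conditional_a_given_b(pmf, a, b):
    """P(A = a | B = b) = P(a, b) / P(b)."""
    return pmf[(a, b)] / marginal_b(pmf, b)

p_cond = conditional_a_given_b(joint, 1, 4)
# Product rule: the joint probability is the conditional times the marginal.
p_joint = p_cond * marginal_b(joint, 4)
print(p_cond, p_joint)  # prints: 1/6 1/36
```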
Joint probability distributions can be expressed as continuous functions.
In this case they are called joint density functions, with the notation f(x,y).
For the conditional density function, the notation is f(x | y), or alternatively, f(y | x).
As with univariate distributions, the moments provide descriptions of the properties of joint distributions.
For continuous functions, the joint moment of order r and s about the origin (indicated with ') is defined as follows:
$$ \mu'_{r,s} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^{r} y^{s} f(x, y) \, dy \, dx \tag{7-9} $$
With r = 1 and s = 0, Eq. 7-9 reduces to the mean of x:
$$ \mu'_{1,0} = \int_{-\infty}^{\infty} x \left[ \int_{-\infty}^{\infty} f(x, y) \, dy \right] dx \tag{7-10} $$
with the expression between brackets being the marginal PDF of x, or f(x).
Therefore, the expression for the mean of x is:
$$ \mu'_{1,0} = \mu_x = \int_{-\infty}^{\infty} x f(x) \, dx \tag{7-11} $$
Similar equations hold for y.
The second moments are usually written about the mean:
$$ \mu_{r,s} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_x)^{r} (y - \mu_y)^{s} f(x, y) \, dy \, dx \tag{7-12} $$
For r = 2 and s = 0, Eq. 7-12 reduces to the variance of x.
Likewise, for r = 0 and s = 2, Eq. 7-12 reduces to the variance of y.
A third type of second moment, i.e., the covariance, arises for r = 1 and s = 1:
$$ \sigma_{x,y} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_x)(y - \mu_y) f(x, y) \, dy \, dx \tag{7-13} $$
in which σ_{x,y} is the covariance.
The correlation coefficient is a dimensionless value relating the covariance σ_{x,y} and standard deviations σ_{x} and σ_{y} :
$$ \rho_{x,y} = \frac{\sigma_{x,y}}{\sigma_x \sigma_y} \tag{7-14} $$
in which ρ_{x,y} is the correlation coefficient based on population data.
The sample correlation coefficient is:
$$ r_{x,y} = \frac{s_{x,y}}{s_x s_y} \tag{7-15} $$
The calculation of the sample correlation coefficient r_{x,y}, including the sample covariance s_{x,y}, is illustrated by Example 7-1.
The correlation coefficient is a measure of the linear dependence between x and y.
It varies in the range of -1 to +1.
A value of ρ (or r) close to or equal to +1 indicates a strong linear dependence between the variables, with large values of x associated with large values of y, and small values of x with small values of y.
A value of ρ (or r) close to or equal to -1 indicates a correlation such that large values of x are associated with small values of y and vice versa.
A value of ρ = 0 (or r = 0), i.e., a zero covariance, indicates the lack of linear dependence between x and y.
Example 7-1.
The monthly flows of the North Fork and South Fork tributaries of a certain stream (see, for example, Fig. 7-2) have the following joint probability distribution f (x, y) (expressed as mean value in each class). Note that 1 hm^{3} = 1 million cubic meters.

| South Fork, y (hm^{3}) | North Fork, x = 100 hm^{3} | x = 200 hm^{3} | x = 300 hm^{3} | x = 400 hm^{3} |
|---|---|---|---|---|
| 100 | 0.14 | 0.03 | 0.00 | 0.00 |
| 200 | 0.02 | 0.18 | 0.11 | 0.00 |
| 300 | 0.00 | 0.09 | 0.23 | 0.02 |
| 400 | 0.00 | 0.00 | 0.03 | 0.15 |

Calculate the marginal distributions, means, variances, standard deviations, covariance, and correlation coefficient for this joint distribution.
The North Fork marginal distribution, f(x), is obtained by summing up the joint probabilities across y.
Therefore:
| x (hm^{3}) | 100 | 200 | 300 | 400 |
|---|---|---|---|---|
| f (x) | 0.16 | 0.30 | 0.37 | 0.17 |

Likewise, the South Fork marginal distribution, f(y), is obtained by summing up the joint probabilities across x:
| y (hm^{3}) | 100 | 200 | 300 | 400 |
|---|---|---|---|---|
| f (y) | 0.17 | 0.31 | 0.34 | 0.18 |

The means are the first moments of the marginal distributions with respect to the origin:
x̄ = (100 × 0.16) + (200 × 0.30) + (300 × 0.37) + (400 × 0.17) = 255 hm^{3}
ȳ = (100 × 0.17) + (200 × 0.31) + (300 × 0.34) + (400 × 0.18) = 253 hm^{3}
The variances are the second moments of the marginal distributions with respect to the means:
s_{x}^{2} = Σ (x - x̄)^{2} f (x)
s_{x}^{2} = (100 - 255)^{2} × 0.16 + (200 - 255)^{2} × 0.30 + (300 - 255)^{2} × 0.37 + (400 - 255)^{2} × 0.17
s_{x}^{2} = 9075 hm^{6}
Therefore:
s_{x} = 95.26 hm^{3}
Likewise, for y:
s_{y}^{2} = 9491 hm^{6}
s_{y} = 97.42 hm^{3}
The covariance is the second moment of the joint distribution:
s_{x,y} = Σ (x - x̄) (y - ȳ) f (x, y) =
+ [(100 - 255) × (100 - 253) × 0.14]
+ [(200 - 255) × (100 - 253) × 0.03]
+ [(100 - 255) × (200 - 253) × 0.02]
+ [(200 - 255) × (200 - 253) × 0.18]
+ [(300 - 255) × (200 - 253) × 0.11]
+ [(200 - 255) × (300 - 253) × 0.09]
+ [(300 - 255) × (300 - 253) × 0.23]
+ [(400 - 255) × (300 - 253) × 0.02]
+ [(300 - 255) × (400 - 253) × 0.03]
+ [(400 - 255) × (400 - 253) × 0.15] = 7785 hm^{6}
The correlation coefficient is r_{x,y} = s_{x,y} / (s_{x} s_{y}) = 7785 / (95.26 × 97.42) = 0.839.
ONLINE CALCULATION. Using ONLINE TWOD CORRELATION, the answer is: Correlation coefficient r_{x,y} = 0.839, confirming the hand calculation.
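The hand calculation of this example can be reproduced with a short script. The sketch below is one possible arrangement and is not the online tool; the variable names and the row/column layout of `joint` are assumptions made for illustration.

```python
import math

x_vals = [100, 200, 300, 400]   # North Fork class means (hm^3)
y_vals = [100, 200, 300, 400]   # South Fork class means (hm^3)
# joint[j][i] = f(x_i, y_j): rows follow y, columns follow x
joint = [
    [0.14, 0.03, 0.00, 0.00],
    [0.02, 0.18, 0.11, 0.00],
    [0.00, 0.09, 0.23, 0.02],
    [0.00, 0.00, 0.03, 0.15],
]

# Marginal distributions: sum across y for f(x), across x for f(y)
fx = [sum(row[i] for row in joint) for i in range(len(x_vals))]
fy = [sum(row) for row in joint]

# First moments about the origin (means)
mean_x = sum(x * p for x, p in zip(x_vals, fx))
mean_y = sum(y * p for y, p in zip(y_vals, fy))

# Second moments about the means (variances and covariance)
var_x = sum((x - mean_x) ** 2 * p for x, p in zip(x_vals, fx))
var_y = sum((y - mean_y) ** 2 * p for y, p in zip(y_vals, fy))
cov_xy = sum(
    (x - mean_x) * (y - mean_y) * joint[j][i]
    for j, y in enumerate(y_vals)
    for i, x in enumerate(x_vals)
)

r_xy = cov_xy / math.sqrt(var_x * var_y)
print(round(mean_x), round(mean_y), round(var_x), round(var_y), round(cov_xy), round(r_xy, 3))
```

The printed values can be compared with the means, variances, covariance, and correlation coefficient obtained by hand.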



Figure 7-2 North Fork and South Fork, Little Butte Creek, Oregon.
Bivariate Normal Distribution
Among the many joint probability distributions, the bivariate normal distribution is important in hydrology because it is the foundation of regression theory.
The bivariate normal probability distribution is [12]:
$$ f(x, y) = K e^{M} \tag{7-16} $$
in which x and y are the random variables, and K and M are coefficient and exponent, respectively, defined as follows:
$$ K = \frac{1}{2 \pi \sigma_x \sigma_y (1 - \rho^2)^{1/2}} \tag{7-17} $$
$$ M = - \frac{1}{2 (1 - \rho^2)} \, [A] \tag{7-18a} $$
in which:
$$ A = \left( \frac{x - \mu_x}{\sigma_x} \right)^2 - 2 \rho \left( \frac{x - \mu_x}{\sigma_x} \right) \left( \frac{y - \mu_y}{\sigma_y} \right) + \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \tag{7-18b} $$
The distribution has five parameters: the means μ_{x} and μ_{y}, the standard deviations σ_{x}
and σ_{y}, and the correlation coefficient ρ.
Following Eq. 7-8, the conditional distribution is obtained by dividing the bivariate normal (Eq. 7-16) by the univariate normal (Eq. 6-7), to yield
$$ f(y \mid x) = \frac{f(x, y)}{f(x)} = K' e^{M'} \tag{7-19} $$
in which K' and M' are coefficient and exponent, respectively, defined as follows:
$$ K' = \frac{1}{\sigma_y [2 \pi (1 - \rho^2)]^{1/2}} \tag{7-20} $$
$$ M' = - \frac{1}{2 \sigma_y^2 (1 - \rho^2)} \left[ (y - \mu_y) - \rho \frac{\sigma_y}{\sigma_x} (x - \mu_x) \right]^2 \tag{7-21} $$
By inspection of Eqs. 7-20 and 7-21, and comparison with Eq. 6-7, it is concluded that the conditional distribution is also normal, with mean and variance:
$$ \mu_{y \mid x} = \mu_y + \rho \frac{\sigma_y}{\sigma_x} (x - \mu_x) \tag{7-22} $$
$$ \sigma_e^2 = \sigma_y^2 (1 - \rho^2) \tag{7-23} $$
Equations 7-22 and 7-23 are useful in regression analysis.
Equation 7-22 expresses the linear dependence between x and y.
The slope of the regression line is [ρ σ_{y}/σ_{x}].
Likewise, ρ^{2} is the fraction of the original variance explained or removed by the regression.
In other words, the variance of the conditional distribution is less than or equal to the variance of y without regard to x, and it depends on the value of the correlation coefficient ρ.
For ρ = ±1, all the variance is removed, and the error of the predictive equation (i.e., the error of the regression) is reduced to zero.
For ρ = 0, none of the original variance is removed, and σ_{e} remains equal to σ_{y}.
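The conditional mean and the standard deviation of the conditional distribution can be evaluated with a few lines of code. The sketch below takes the slope term as positive, which is the sign implied by the squared bracket in the exponent M'. The function name is hypothetical, and the numerical inputs are borrowed from Example 7-1 purely for illustration, treating those flows as if they were bivariate normal.

```python
import math

def conditional_normal(x, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean and standard deviation of y given x for a bivariate normal distribution."""
    slope = rho * sigma_y / sigma_x                  # slope of the regression line
    mean_y_given_x = mu_y + slope * (x - mu_x)       # conditional mean
    sigma_e = sigma_y * math.sqrt(1.0 - rho ** 2)    # square root of conditional variance
    return mean_y_given_x, sigma_e

# Illustrative inputs only (assumption: statistics of Example 7-1, in hm^3)
mean_cond, sigma_e = conditional_normal(
    x=300.0, mu_x=255.0, mu_y=253.0, sigma_x=95.26, sigma_y=97.42, rho=0.839
)
print(round(mean_cond, 1), round(sigma_e, 1))
```

With ρ = 0 the function returns σ_{e} = σ_{y}, and with ρ = ±1 it returns σ_{e} = 0, matching the two limiting cases described in the text.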
7.2 REGRESSION ANALYSIS
A fundamental tool of regional analysis is the equation relating two or more hydrologic variables.
The variable for which values are given is called the predictor variable.
The variable for which values must be estimated is called the criterion variable [7].
The equation relating criterion variable to one or more predictor variables is called the prediction equation.
The objective of regression analysis is to evaluate the parameters of the prediction equation relating the criterion variable to one or more predictor variables.
The predictor variables are those whose variation is believed to cause or agree with variation in the criterion variable.
Correlation provides a measure of the goodness of fit of the regression.
Therefore, while regression provides the parameters of the prediction equation, correlation describes its quality.
The distinction between correlation and regression is necessary because the predictor and criterion variables cannot be switched unless the correlation coefficient is equal to 1.
Stated in other terms, if a criterion variable Y is regressed on a predictor variable X, the regression parameters cannot be used to express X as a function of Y, unless the correlation coefficient is 1.
In hydrologic modeling, regression analysis is useful in model calibration; correlation is useful in model formulation and verification.
The principle of least squares is used in regression analysis as a means of obtaining the best estimates of the parameters of the prediction equation.
The principle is based on the minimization of the sum of the squares of the differences between observed and predicted values.
The procedure can be used to regress one criterion variable on one or more predictor variables.
One-Predictor-Variable Regression
Assume a predictor variable x, a criterion variable y, and a set of n paired observations of x and y.
In the simplest linear case, the line to be fitted has the following form:
$$ y' = \alpha + \beta x \tag{7-24} $$
in which y' is an estimate of y, and α and β are parameters to be determined by regression.
In the least squares procedure, values of the intercept α and slope β are sought such that y' is the best estimate of y.
For this purpose, the sum of the squares of the differences between y and y' is minimized as follows:
$$ \sum (y - y')^2 = \sum [\, y - (\alpha + \beta x) \,]^2 \tag{7-25} $$
in which the symbol Σ indicates the sum of all values from i = 1 to i = n.
Setting the partial derivatives equal to zero:
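$$ \frac{\partial}{\partial \alpha} \left\{ \sum [\, y - (\alpha + \beta x) \,]^2 \right\} = 0 \tag{7-26} $$
$$ \frac{\partial}{\partial \beta} \left\{ \sum [\, y - (\alpha + \beta x) \,]^2 \right\} = 0 \tag{7-27} $$
This leads to the normal equations:
$$ \sum y - n \alpha - \beta \sum x = 0 \tag{7-28} $$
$$ \sum xy - \alpha \sum x - \beta \sum x^2 = 0 \tag{7-29} $$
Solving Eqs. 7-28 and 7-29 simultaneously gives:
$$ \beta = \frac{\sum xy - (\sum x \sum y)/n}{\sum x^2 - (\sum x)^2/n} \tag{7-30} $$
$$ \alpha = \frac{\sum y - \beta \sum x}{n} \tag{7-31} $$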
Since the slope of the regression line is β = ρ σ_{y}/σ_{x}, the estimate from sample data is β = r s_{y}/s_{x}.
Therefore, the correlation coefficient is
$$ r = \beta \, \frac{s_x}{s_y} \tag{7-32} $$
The standard error of estimate is the square root of the variance of the conditional distribution:
$$ s_e = \left[ \frac{1}{n - 2} \sum (y - y')^2 \right]^{1/2} \tag{7-33} $$
in which n - 2 is the number of degrees of freedom, i.e., the sample size minus the number of unknowns.
Alternatively, the standard error of estimate can be estimated from the variance of the conditional distribution, Eq. 7-23.
For calculations based on sample data, the standard error of estimate is:
$$ s_e = s_y \left[ \frac{n - 1}{n - 2} \, (1 - r^2) \right]^{1/2} \tag{7-34} $$
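The link between the two expressions for the standard error of estimate can be sketched as follows. The derivation relies on the standard least squares identity that the residual sum of squares equals (1 - r^2) times the total sum of squares about the mean; this identity is not stated in the text and is introduced here as an assumption.

```latex
\begin{aligned}
\sum (y - y')^2 &= (1 - r^2) \sum (y - \bar{y})^2 = (1 - r^2)\,(n - 1)\, s_y^2 \\
s_e^2 &= \frac{1}{n - 2} \sum (y - y')^2 = \frac{n - 1}{n - 2}\,(1 - r^2)\, s_y^2 \\
s_e &= s_y \left[ \frac{n - 1}{n - 2}\,(1 - r^2) \right]^{1/2}
\end{aligned}
```

For large n the factor (n - 1)/(n - 2) approaches 1, and the sample expression approaches the population form σ_{e}^{2} = σ_{y}^{2} (1 - ρ^{2}).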
Nonlinear Equations.
Equations 7-30 and 7-31 can also be used to fit power functions of the type y = a x^{b}.
First, this equation is linearized by taking the logarithms: log y = log a + b log x.
With u = log x, and v = log y, this equation is: v = log a + bu.
The variables u and v are used in Eqs. 7-30 and 7-31 instead of x and y, respectively.
Then α = log a, and β = b, and the regression equation is: y = 10^{α} x^{β}.
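The logarithmic transformation can be sketched in code as follows. The helper names `linear_fit` and `power_fit` are hypothetical, and the data are synthetic values generated from y = 3 x^{0.7}, so the fit should recover a = 3 and b = 0.7.

```python
import math

def linear_fit(xs, ys):
    """Least squares intercept and slope for y' = alpha + beta * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    beta = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    alpha = (sum_y - beta * sum_x) / n
    return alpha, beta

def power_fit(xs, ys):
    """Fit y = a * x**b by regressing log10(y) on log10(x)."""
    u = [math.log10(x) for x in xs]
    v = [math.log10(y) for y in ys]
    alpha, beta = linear_fit(u, v)
    return 10 ** alpha, beta     # a = 10**alpha, b = beta

# Synthetic data (assumption): exact power function y = 3 * x**0.7
xs = [1.0, 2.0, 5.0, 10.0, 50.0]
ys = [3.0 * x ** 0.7 for x in xs]
a, b = power_fit(xs, ys)
print(round(a, 6), round(b, 6))
```

Note that the least squares criterion is applied to the logarithms, so the fit minimizes relative rather than absolute deviations in y.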
Example 7-2.
Find the regression equation linking the low flows (annual minima series) of streams X and Y shown in Cols. 2 and 3 of Table 7-1.
Calculate the linear regression parameters α and β, the correlation coefficient, and the standard error of estimate.
Summing up the values of Cols. 2 and 3, and dividing by n = 15, the means are obtained: x̄ = 72 m^{3}/s and ȳ = 77 m^{3}/s.
Columns 4 and 5 show the square of the deviations from the means.
Summing up Cols. 4 and 5, dividing the sums by (n - 1) = 14, and taking the square roots, the standard deviations s_{x} = 29.568 m^{3}/s and s_{y} = 26.589 m^{3}/s are obtained.
Column 6 shows the x^{2} values, and Col. 7, the xy values.
The sums of these values are: Σ x^{2} = 90,000 and Σ xy = 93,056.
Using Eq. 7-30: β = [93,056 - (1,080 × 1,155)/15] / [90,000 - (1,080 × 1,080)/15] = 0.80849.
Using Eq. 7-31: α = [1,155 - (0.8085 × 1,080)] / 15 = 18.7882.
Using Eq. 7-32, the correlation coefficient is: r = 0.80849 × 29.568 / 26.589 = 0.899.
Using Eq. 7-34, the standard error of estimate is: s_{e} = 26.589 × [(14/13) (1 - 0.899^{2})]^{1/2} = 12.08 m^{3}/s.
The data and regression line are plotted in Fig. 7-3.
Table 7-1 One-Predictor-Variable Regression: Example 7-2.

| (1) Year | (2) x (m^{3}/s) | (3) y (m^{3}/s) | (4) (x - x̄)^{2} | (5) (y - ȳ)^{2} | (6) x^{2} | (7) xy |
|---|---|---|---|---|---|---|
| 1973 | 110 | 89 | 1,444 | 144 | 12,100 | 9,790 |
| 1974 | 42 | 51 | 900 | 676 | 1,764 | 2,142 |
| 1975 | 75 | 72 | 9 | 25 | 5,625 | 5,400 |
| 1976 | 120 | 112 | 2,304 | 1,225 | 14,400 | 13,440 |
| 1977 | 89 | 70 | 289 | 49 | 7,921 | 6,230 |
| 1978 | 32 | 45 | 1,600 | 1,024 | 1,024 | 1,440 |
| 1979 | 37 | 42 | 1,225 | 1,225 | 1,369 | 1,554 |
| 1980 | 56 | 59 | 256 | 324 | 3,136 | 3,304 |
| 1981 | 82 | 100 | 100 | 529 | 6,724 | 8,200 |
| 1982 | 90 | 92 | 324 | 225 | 8,100 | 8,280 |
| 1983 | 50 | 70 | 484 | 49 | 2,500 | 3,500 |
| 1984 | 30 | 42 | 1,764 | 1,225 | 900 | 1,260 |
| 1985 | 81 | 92 | 81 | 225 | 6,561 | 7,452 |
| 1986 | 110 | 130 | 1,444 | 2,809 | 12,100 | 14,300 |
| 1987 | 76 | 89 | 16 | 144 | 5,776 | 6,764 |
| Sum | 1,080 | 1,155 | 12,240 | 9,898 | 90,000 | 93,056 |

 
ONLINE CALCULATION. Using ONLINEREGRESSION11, the answer is: α = 18.7882; β = 0.80849; standard deviation s_{x} = 29.568; standard deviation s_{y} = 26.589; correlation coefficient r_{x,y} = 0.899; standard error of estimate s_{e} = 12.08. The results of the online calculation confirm the hand calculations.
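The same results can be obtained with a short script that applies the regression formulas directly to the low-flow data of the example. This is a sketch for checking purposes, not the online tool; the variable names are illustrative.

```python
import math

# Annual minima (m^3/s) for streams X and Y, 1973 to 1987
x = [110, 42, 75, 120, 89, 32, 37, 56, 82, 90, 50, 30, 81, 110, 76]
y = [89, 51, 72, 112, 70, 45, 42, 59, 100, 92, 70, 42, 92, 130, 89]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Slope and intercept from the solution of the normal equations
beta = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
alpha = (sum_y - beta * sum_x) / n

# Sample standard deviations, using n - 1
mean_x, mean_y = sum_x / n, sum_y / n
s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

# Correlation coefficient from the slope
r = beta * s_x / s_y

# Standard error of estimate: from the residuals, and from s_y and r
residual_ss = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(residual_ss / (n - 2))
s_e_alt = s_y * math.sqrt((n - 1) / (n - 2) * (1 - r ** 2))

print(round(alpha, 4), round(beta, 5), round(r, 3), round(s_e, 2), round(s_e_alt, 2))
```

Both routes to the standard error of estimate are computed so that their agreement can be verified.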



Figure 7-3 X-Y (one-predictor-variable) regression: Example 7-2.

Multiple Regression
The extension of the least squares technique to more than one predictor variable is referred to as multiple regression.
In the case of two predictor variables, x_{1} and x_{2}, with criterion variable y and a set of n observations of y, x_{1} and x_{2}, the line to be fitted is:
$$ y' = \alpha + \beta_1 x_1 + \beta_2 x_2 \tag{7-35} $$
in which x_{1} and x_{2} are measured values and y' is an estimate of y.
As in the one-predictor-variable case, values of the intercept α and slopes β_{1} and β_{2} are sought such that y' is the best estimate of y.
For this purpose, the sum of the squares of the differences between y and y' is minimized:
$$ \sum (y - y')^2 = \sum [\, y - (\alpha + \beta_1 x_1 + \beta_2 x_2) \,]^2 \tag{7-36} $$
Setting the partial derivatives with respect to α, β_{1} and β_{2} equal to zero leads to the normal equations:
$$ \sum y - n \alpha - \beta_1 \sum x_1 - \beta_2 \sum x_2 = 0 \tag{7-37} $$
$$ \sum y x_1 - \alpha \sum x_1 - \beta_1 \sum x_1^2 - \beta_2 \sum x_1 x_2 = 0 \tag{7-38} $$
$$ \sum y x_2 - \alpha \sum x_2 - \beta_2 \sum x_2^2 - \beta_1 \sum x_1 x_2 = 0 \tag{7-39} $$
Solving Eqs. 7-37 to 7-39 simultaneously:
$$ \beta_1 = \frac{(n \sum y x_2 - \sum y \sum x_2)(n \sum x_1 x_2 - \sum x_1 \sum x_2) - [n \sum x_2^2 - (\sum x_2)^2][n \sum y x_1 - \sum y \sum x_1]}{(n \sum x_1 x_2 - \sum x_1 \sum x_2)^2 - [n \sum x_1^2 - (\sum x_1)^2][n \sum x_2^2 - (\sum x_2)^2]} \tag{7-40} $$
$$ \beta_2 = \frac{(n \sum y x_1 - \sum y \sum x_1) - \beta_1 [n \sum x_1^2 - (\sum x_1)^2]}{n \sum x_1 x_2 - \sum x_1 \sum x_2} \tag{7-41} $$
$$ \alpha = \frac{\sum y - \beta_1 \sum x_1 - \beta_2 \sum x_2}{n} \tag{7-42} $$
As in the case of the one-predictor-variable regression, the standard error of estimate is the square root of the variance of the conditional distribution:
$$ s_e = \left[ \frac{1}{n - 3} \sum (y - y')^2 \right]^{1/2} \tag{7-43} $$
in which n - 3 is the number of degrees of freedom.
Alternatively, the standard error of estimate can be estimated from the variance of the conditional distribution.
For calculations based on sample data, the standard error of estimate is:
$$ s_e = s_y \left[ \frac{n - 1}{n - 3} \, (1 - R^2) \right]^{1/2} \tag{7-44} $$
in which R is the multiple correlation coefficient and R^{2} is the coefficient of multiple determination, calculated as follows [8]:
$$ R^2 = 1 - \frac{SSE}{SSTO} \tag{7-45} $$
in which SSE = error sum of squares, defined as
$$ SSE = \sum (y - y')^2 \tag{7-46} $$
and SSTO = total sum of squares, defined as
$$ SSTO = \sum (y - \bar{y})^2 \tag{7-47} $$
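The closed-form solution for two predictor variables, together with R^{2} and the standard error of estimate, can be sketched as follows. The function names and the data are hypothetical and serve only to show the mechanics. The expression used for β_{2} divides by n Σx_{1}x_{2} - Σx_{1} Σx_{2}, so this sketch assumes that quantity is not zero.

```python
import math

def two_predictor_fit(x1, x2, y):
    """Least squares fit of y' = alpha + beta1*x1 + beta2*x2 (closed-form solution)."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    # Sums of squares and cross products, scaled by n, as they appear in the solution
    s11 = n * sum(a * a for a in x1) - s1 ** 2
    s22 = n * sum(b * b for b in x2) - s2 ** 2
    s12 = n * sum(a * b for a, b in zip(x1, x2)) - s1 * s2
    s1y = n * sum(a * c for a, c in zip(x1, y)) - s1 * sy
    s2y = n * sum(b * c for b, c in zip(x2, y)) - s2 * sy
    beta1 = (s2y * s12 - s22 * s1y) / (s12 ** 2 - s11 * s22)
    beta2 = (s1y - beta1 * s11) / s12          # requires s12 != 0
    alpha = (sy - beta1 * s1 - beta2 * s2) / n
    return alpha, beta1, beta2

def fit_quality(x1, x2, y, alpha, beta1, beta2):
    """Return R^2 and the standard error of estimate computed two ways."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((c - (alpha + beta1 * a + beta2 * b)) ** 2 for a, b, c in zip(x1, x2, y))
    ssto = sum((c - mean_y) ** 2 for c in y)
    r2 = 1.0 - sse / ssto
    s_e = math.sqrt(sse / (n - 3))                       # from the residuals
    s_y = math.sqrt(ssto / (n - 1))
    s_e_alt = s_y * math.sqrt((n - 1) / (n - 3) * (1.0 - r2))  # from s_y and R^2
    return r2, s_e, s_e_alt

# Hypothetical observations (not from the text)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 8.0]
y = [5.1, 6.4, 10.2, 11.3, 15.1, 18.0]

alpha, beta1, beta2 = two_predictor_fit(x1, x2, y)
r2, s_e, s_e_alt = fit_quality(x1, x2, y, alpha, beta1, beta2)
print(round(alpha, 3), round(beta1, 3), round(beta2, 3), round(r2, 4), round(s_e, 3))
```

For more than two predictor variables, a matrix formulation solved by a library routine is usually preferable to extending these closed-form expressions.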
Nonlinear Multiple Regression
Equations 7-40 to 7-42 can also be used to fit equations of the type:
$$ y = a \, x_1^{b_1} x_2^{b_2} \tag{7-48} $$
First, this equation is linearized by taking the logarithms:
$$ \log y = \log a + b_1 \log x_1 + b_2 \log x_2 \tag{7-49} $$
With u = log x_{1}, v = log x_{2}, and w = log y, this equation is: w = log a + b_{1}u + b_{2}v.
The variables u, v, and w are used in Eqs. 7-40 to 7-42 instead of x_{1}, x_{2}, and y, respectively.
Then α = log a, β_{1} = b_{1}, β_{2} = b_{2}, and the regression equation is:
$$ y = 10^{\alpha} x_1^{\beta_1} x_2^{\beta_2} \tag{7-50} $$
Multiple regression analysis involving more than two predictor variables is based on the same least squares principle as in the cases shown here.
Library programs are usually available to perform the large amount of computations involved.
7.3 REGIONAL ANALYSIS
Peak Flow Based on Catchment Area
The earliest approach to regionalization of hydrologic properties was to assume that peak flow is related to catchment area and to perform a regression to determine the parameters.
The equation is of the following form:
$$ Q_p = c A^{m} \tag{7-51} $$
in which Q_{p} = peak flow; A = catchment area; and c and m are regression parameters.
In nature, as catchment area increases, the spatially averaged rainfall intensity decreases, and consequently peak flow does not increase as fast as catchment area.
Therefore, the exponent m in Eq. 7-51 is always less than 1, usually in the range 0.4 to 0.9 [5, 10].
Practical examples of the use of this method are given in Section 14.6.
Other formulas relating peak flow to catchment area are the following:
$$ Q_p = c A^{\, n A^{-m}} \tag{7-52} $$
$$ Q_p = c A^{\, a - b \log A} \tag{7-53} $$
$$ Q_p = \frac{c A}{(a + b A)^{m}} + d A \tag{7-54} $$
in which a, b, c, d, m, and n are parameters determined from statistical analysis of measured data and are applicable on a regional basis, i.e., for neighboring watersheds of similar physiographic, vegetative, and land use patterns.
The Creager curves (Fig. 2-73) are an example of Eq. 7-52 [3]. Equation 7-53 has been used in regional flood studies in the Southwest [2, 6, 9], whereas Eq. 7-54 appears to be typical of European practice [5].
In principle, none of these equations accounts explicitly for flood frequency, being limited to providing a maximum flow.
The effect of flood frequency, however, can be accounted for by varying the parameters (Section 14.6).
Index-Flood Method
The index-flood method is used to determine the magnitude and frequency of peak flows for catchments of any size, whether gaged or ungaged, located within a hydrologically homogeneous region, i.e., a region with similar hydrologic characteristics [1, 4].
The application of the index-flood method consists of developing two curves.
The first curve depicts the mean annual flood (i.e., that corresponding to the 2.33-y frequency) versus catchment area.
The second curve shows peak flow ratio versus frequency.
The peak flow ratio is the ratio of peak flow for a given frequency to the mean annual flood.
Using these two curves, a flood-frequency curve may be developed for any catchment in the region.
The procedure consists of the following steps:

1. Measuring the catchment area,
2. Using the first curve to obtain the mean annual flood,
3. Using the second curve to obtain peak flow ratios for selected frequencies,
4. Calculating the peak flows for each frequency, and
5. Plotting peak flows versus frequencies.
Mean Annual Flood
The magnitude of the mean annual flood is a function of several physiographic and meteorologic factors.
The physiographic factors that may influence the mean annual flood are the following:

Drainage area,

Channel storage,

Artificial or natural storage in lakes and ponds,

Catchment slope,

Land slope,

Stream density and pattern,

Mean elevation,

Catchment shape,

Orographic position,

Underlying geology,

Soil cover, and

Vegetative and land use patterns.
The meteorologic factors include:

Regional climatic characteristics,

Rainfall intensities,

Storm direction, pattern and volume,

Effect of snowmelt.
Of the above factors, drainage area is the most important and the one most readily available.
Measuring the other factors is usually more difficult.
For instance, channel storage has an important effect but cannot be measured directly.
For practical use, a regression of mean annual flood on catchment area is usually sufficient.
Alternatively, equations relating mean annual flood to catchment characteristics other than area can be determined by using multiple regression techniques.
Regional Frequency Curve.
The procedure to develop a regional frequency curve by the index-flood method consists of the following steps:
1. Assemble the records (annual exceedence or annual maxima series) of several stations (usually 10 to 15), each having more than 5 y of record.
2. Select a time base common to all the stations (common base period of analysis) in order to eliminate the effect of variability with time.
3. For each i th station, rank the records in descending order and compute return periods using a plotting position formula such as Weibull's (Eq. 6-26).
4. For each i th station, plot the annual flows versus return periods on extreme value probability paper and fit a line visually to determine the frequency curve.
5. For each i th station, determine the mean annual flood, that is, the peak flow corresponding to the 2.33-y frequency.
6. Choose several frequencies, and for each i th station and j th frequency calculate the peak flow ratio, i.e., the ratio of peak flow for the j th frequency to the mean annual flood.
7. For each j th frequency, determine the median value of peak flow ratios for all stations, that is, the median peak flow ratio.
8. Plot median peak flow ratios versus frequencies on extreme value probability paper and draw a line of best fit to obtain a regional flood frequency curve for the given data.
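The ranking step of this procedure can be sketched as follows, assuming the Weibull plotting position in its usual form, T = (n + 1)/m, in which m is the rank in descending order and n is the number of years of record. The function name and the annual peak flows are hypothetical.

```python
def weibull_return_periods(annual_peaks):
    """Rank flows in descending order and assign T = (n + 1) / m (m = 1 for the largest)."""
    n = len(annual_peaks)
    ranked = sorted(annual_peaks, reverse=True)
    return [(q, (n + 1) / m) for m, q in enumerate(ranked, start=1)]

# Hypothetical annual maxima (m^3/s) for one station, 9 years of record
peaks = [320.0, 510.0, 275.0, 640.0, 430.0, 390.0, 720.0, 300.0, 455.0]
table = weibull_return_periods(peaks)
for flow, T in table:
    print(f"{flow:7.1f}  {T:6.2f}")
```

The resulting pairs of flow and return period are the points that would be plotted on extreme value probability paper for that station.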
Test of Hydrologic Homogeneity.
The index-flood method includes a test of regional hydrologic homogeneity.
Any station not passing this test should be excluded from the set.
The test procedure consists of the following steps [4]:
1. For each i th station, use its frequency curve to determine the 2.33-y and the 10-y floods.
2. For each i th station, calculate the 10-y peak flow ratio, i.e., the ratio of the 10-y flood to the 2.33-y flood.
3. Calculate the average of the 10-y peak flow ratios for all stations.
4. For each i th station, multiply the 2.33-y flood by the average 10-y peak flow ratio to obtain an adjusted 10-y peak flow.
5. For each i th station, use its frequency curve to determine the return period T_{i} for the adjusted 10-y peak flow.
6. For each i th station, plot the return period T_{i} versus the length of record n, in years, in Fig. 7-4.
Points located within the confidence limits (solid lines) are considered to be hydrologically homogeneous.
Points lying outside of the solid lines should not be used in the calculation of the median peak flow ratio (step 7 of the index-flood method).
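The arithmetic of the homogeneity test can be sketched as follows. Two assumptions are made that are not in the text: the station data are hypothetical, and each station's frequency curve is taken as a straight line in the Gumbel reduced variate passing through its 2.33-y and 10-y floods, which stands in for the line fitted visually on extreme value probability paper. The confidence limits of the homogeneity chart are not reproduced here; the resulting points would still have to be compared with the chart.

```python
import math

def gumbel_variate(T):
    """Gumbel reduced variate for a return period T (years, T > 1)."""
    return -math.log(-math.log(1.0 - 1.0 / T))

def return_period(variate):
    """Inverse of gumbel_variate: return period for a given reduced variate."""
    return 1.0 / (1.0 - math.exp(-math.exp(-variate)))

# Hypothetical stations: (2.33-y flood, 10-y flood, record length in years)
stations = {"A": (120.0, 215.0, 24), "B": (300.0, 560.0, 31), "C": (75.0, 128.0, 18)}

# Steps 2 and 3: 10-y peak flow ratio per station, and the average ratio
ratios = {name: q10 / q233 for name, (q233, q10, _) in stations.items()}
avg_ratio = sum(ratios.values()) / len(ratios)

y233, y10 = gumbel_variate(2.33), gumbel_variate(10.0)
results = {}
for name, (q233, q10, n_years) in stations.items():
    q_adj = q233 * avg_ratio                           # step 4: adjusted 10-y flow
    # Step 5: invert the assumed straight-line frequency curve of the station
    variate = y233 + (q_adj - q233) / (q10 - q233) * (y10 - y233)
    results[name] = (return_period(variate), n_years)  # step 6: point (T_i, n)

for name, (T_i, n_years) in results.items():
    print(name, round(T_i, 1), n_years)
```

A station whose own 10-y ratio equals the regional average returns T_{i} = 10 y; stations with lower ratios plot above 10 y and those with higher ratios plot below it.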
Figure 7-4 Homogeneity test chart for index-flood method [4].

Limitations of the Index-Flood Method.
Benson [1] has noted the following limitations of the index-flood method:
The mean annual flood for stations with short periods of record may not be typical, which means that the peak flow ratios of different return periods may vary widely among stations.
The homogeneity test is used to determine whether the differences in the frequency curves are greater than those that could be attributed to chance alone.
The index-flood test uses the 10-y flow ratio because of the lack of sufficient data to define the frequency curve adequately at longer return periods.
Studies have shown that although homogeneity may be assumed on the basis of the 10-y peak flow ratio, the individual frequency curves may show wide and sometimes systematic differences at longer return periods.
The method combines frequency curves for all catchment sizes, excluding only the largest.
At the 10-y peak flow ratio level, the effect of catchment size is small and can be neglected. Studies have shown that the peak flow ratios tend to vary inversely with catchment size.
In general, the larger the catchment, the flatter the frequency curve and the lower the peak flow ratios.
The effect of catchment size is particularly marked for floods of long return period.
Example 7-3.
Use the Q_{i}/Q_{2.33} data for the five stations shown in Table 7-2 to develop a regional flood frequency curve by the index-flood method.
Assuming Q_{2.33} = 2.5A^{0.6}, in which Q_{2.33} is in cubic meters per second and catchment area A is in square kilometers, calculate the 50-y flood for a 150-km^{2} catchment based on the regionally developed curve.
The median values are shown at the bottom of each column.
These values are plotted against the return period, as shown in Fig. 7-5.
The fitted line is the regional flood-frequency curve.
For a 150-km^{2} catchment, the mean annual flood is: Q_{2.33} = 2.5 × 150^{0.6} = 50.5 m^{3}/s.
From Fig. 7-5, the peak flow ratio for the 50-y return period is 2.62.
Therefore, the 50-y flood for this catchment is 2.62 × 50.5 = 132 m^{3}/s.

Table 7-2 Index-Flood Method: Example 7-3.
Values are Q_{i}/Q_{2.33} for the j th return period (years).

| (1) Station i | (2) 1.11 | (3) 1.25 | (4) 2 | (5) 5 | (6) 10 | (7) 25 | (8) 50 |
|---|---|---|---|---|---|---|---|
| 1 | 0.32 | 0.49 | 0.90 | 1.45 | 1.82 | 2.28 | 2.62 |
| 2 | 0.35 | 0.51 | 0.92 | 1.44 | 1.79 | 2.23 | 2.56 |
| 3 | 0.39 | 0.55 | 0.92 | 1.40 | 1.73 | 2.14 | 2.44 |
| 4 | 0.27 | 0.45 | 0.90 | 1.50 | 1.88 | 2.38 | 2.74 |
| 5 | 0.31 | 0.50 | 0.91 | 1.46 | 1.84 | 2.32 | 2.68 |
| Median | 0.32 | 0.50 | 0.91 | 1.45 | 1.82 | 2.28 | 2.62 |
Figure 7-5 Index-flood method: Example 7-3.
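The median peak flow ratios and the 50-y flood of this example can be reproduced with a short script. One simplification is assumed: the 50-y ratio is taken directly as the median of the station values instead of being read from the fitted line of the figure; in this example the two coincide at 2.62. The variable names are illustrative.

```python
from statistics import median

return_periods = [1.11, 1.25, 2, 5, 10, 25, 50]   # years
# Peak flow ratios Q_i / Q_2.33 for each station, one value per return period
ratios_by_station = {
    1: [0.32, 0.49, 0.90, 1.45, 1.82, 2.28, 2.62],
    2: [0.35, 0.51, 0.92, 1.44, 1.79, 2.23, 2.56],
    3: [0.39, 0.55, 0.92, 1.40, 1.73, 2.14, 2.44],
    4: [0.27, 0.45, 0.90, 1.50, 1.88, 2.38, 2.74],
    5: [0.31, 0.50, 0.91, 1.46, 1.84, 2.32, 2.68],
}

# Median ratio for each return period (column-wise median across stations)
median_ratios = [median(col) for col in zip(*ratios_by_station.values())]

# Mean annual flood from the regional relation Q_2.33 = 2.5 * A**0.6
area_km2 = 150.0
q_mean_annual = 2.5 * area_km2 ** 0.6

# 50-y flood = median 50-y ratio times the mean annual flood
ratio_50 = median_ratios[return_periods.index(50)]
q_50 = ratio_50 * q_mean_annual
print(median_ratios, round(q_mean_annual, 1), round(q_50))
```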

Rainfall Intensity-Duration-Frequency
Curves showing the relationship between intensity, duration, and frequency of rainfall (IDF curves) are required for peak flow computations in small catchments (see rational method, Chapter 4).
These curves can be developed using either: (a) depth-duration-frequency data provided by the National Weather Service, or (b) regional or local rainfall intensity-duration data.
The latter procedure is illustrated by the following example.
Example 7-4.
Determine the equation relating rainfall intensity and duration for the following 10-y frequency rainfall data.

| Rainfall duration t_{r} (min) | 5 | 10 | 15 | 30 | 60 | 120 | 180 |
|---|---|---|---|---|---|---|---|
| Rainfall intensity i (cm/h) | 8 | 5 | 4 | 2.5 | 1.5 | 1.0 | 0.8 |

The data suggest that the relation is of hyperbolic type, with greater intensities associated with shorter durations.
Therefore, an equation of the type of Eq. 2-6 is applicable:
$$ i = \frac{a}{t_r + b} \tag{7-55} $$
in which a and b are constants to be determined by regression analysis.
This equation can be linearized in the following way:
$$ \frac{1}{i} = \frac{t_r}{a} + \frac{b}{a} \tag{7-56} $$
With y = 1/i, x = t_{r}, α = b/a, and β = 1/a, the application of the regression formulas (Eqs. 7-30 and 7-31) to the data leads to: 1/i = 0.006422 t_{r} + 0.1706, in which α = 0.1706 and β = 0.006422.
Therefore: a = 1/β = 155.7 and b = α/β = 26.56.
The regression equation is: i = 155.7 / (t_{r} + 26.56).
The data and regression line are shown in Fig. 7-6.
 
ONLINE CALCULATION. Using ONLINE REGRESSION15, the answer is: a = 155.702; b = 26.5632, which confirms the hand calculations.




Figure 7-6 Fitting intensity-duration-frequency curve: Example 7-4.
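
The hand calculation of Example 7-4 can also be reproduced in a few lines of code. The following Python sketch is an illustration, not part of the original example: it linearizes the hyperbolic equation as in Eq. 7-56 and applies the ordinary least-squares formulas for a single predictor, which are assumed here to be equivalent to Eqs. 7-30 and 7-31. The function name fit_hyperbolic_idf and its variable names are hypothetical.

```python
def fit_hyperbolic_idf(durations, intensities):
    """Fit i = a / (t_r + b) by linear regression of 1/i on t_r.

    Returns (a, b, alpha, beta), where 1/i = alpha + beta * t_r,
    alpha = b/a and beta = 1/a.
    """
    n = len(durations)
    x = list(durations)
    y = [1.0 / value for value in intensities]  # linearizing transform

    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(v * v for v in x)
    sum_xy = sum(u * v for u, v in zip(x, y))

    # Ordinary least-squares slope and intercept (assumed to match
    # the regression formulas referenced in the text).
    beta = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    alpha = (sum_y - beta * sum_x) / n

    a = 1.0 / beta
    b = alpha / beta
    return a, b, alpha, beta


# 10-y frequency data of Example 7-4.
t_r = [5, 10, 15, 30, 60, 120, 180]         # duration (min)
i = [8, 5, 4, 2.5, 1.5, 1.0, 0.8]           # intensity (cm/h)

a, b, alpha, beta = fit_hyperbolic_idf(t_r, i)
print(f"alpha = {alpha:.4f}, beta = {beta:.6f}")
print(f"a = {a:.1f}, b = {b:.2f}")
```

Because the regression is performed on 1/i, the fit minimizes the squared errors of the reciprocal intensities, not of the intensities themselves; this is inherent in the linearization of Eq. 7-56.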

State Equations for Regional Flood Frequency
The U.S. Geological Survey has developed a comprehensive methodology for regional analysis of flood frequency [11].
Details of this method are given in Section 14.6.
QUESTIONS
What is a joint probability? What is a marginal probability?
What is a joint density function? Give an example.
What is a conditional probability? How is it used in regression analysis?
Define covariance.
What is a correlation coefficient?
What is the difference between correlation and regression?
Describe briefly the index-flood method for regional analysis of flood frequency.
PROBLEMS
Using ONLINE TWOD CORRELATION, calculate the correlation coefficient of the following joint distribution of quarterly flows (expressed as mean values in each class) in streams A and B:

                    Stream A (ac-ft)
Stream B (ac-ft)    1000    2000    3000    4000    5000
1000                0.07    0.03    0.02    0.00    0.00
2000                0.03    0.08    0.04    0.03    0.00
3000                0.02    0.04    0.08    0.05    0.02
4000                0.00    0.04    0.08    0.11    0.06
5000                0.00    0.00    0.03    0.08    0.09

Develop a spreadsheet to calculate the regression constants, correlation coefficient, and standard error of estimate of a series of paired flow values X and Y. Test your program using the data of Example 7-2 in the text.
Using the spreadsheet developed in Problem 7-2, calculate the regression constants, correlation coefficient, and standard error of estimate for the following paired low-flow series (annual minima):

Stream X (m^{3}/s)    Stream Y (m^{3}/s)
50                    65
66                    76
32                    45
78                    95
12                    18
34                    50
23                    31
50                    64
43                    67
89                    99
76                    89
22                    33

Verify with ONLINE REGRESSION11.
Modify the spreadsheet developed in Problem 7-2 to calculate the regression constants to fit a power function of the following form (Eq. 7-51):
Q_{p} = c A^{m}
in which Q_{p} = peak discharge; A = drainage area; c and m are coefficient and exponent, respectively.
Using the spreadsheet developed in Problem 7-4, fit a power function to the following data:

Peak Discharge (m^{3}/s)    Drainage Area (km^{2})
124                         25
254                         46
378                         78
101                         22
678                         99
540                         89
490                         83
267                         52
350                         73

Verify with ONLINE REGRESSION12.
ONLINE REGRESSION13 solves the two-predictor-variable linear regression problem (Eq. 7-35). Use this program to determine the regression constants for the following data set:

Y Time of Concentration (min)    X_{1} Hydraulic Length (m)    X_{2} Catchment Slope (m/m)
89                               3245                          0.008
75                               2567                          0.011
57                               2783                          0.009
34                               1234                          0.015
101                              5345                          0.006
121                              5329                          0.007
68                               3002                          0.008
79                               2976                          0.010
25                               1034                          0.018
59                               2984                          0.010
96                               3892                          0.007
12                               534                           0.020

Use ONLINE REGRESSION14 to solve the two-predictor-variable nonlinear regression problem of Eq. 7-48, for the data of Problem 7-6.
The median Q_{i}/Q_{2.33} ratios (i = frequency) for 10 stations have been found to be 1.95 for the 10-y frequency and 2.45 for the 50-y frequency.
Use the index-flood method to calculate the 25-y flood for a point in a stream having a 340-km^{2} catchment and a mean annual flood given by the following formula:
Q_{2.33} = 3.93 A^{0.75}
in which Q = flood discharge in cubic meters per second, and A = drainage area in square kilometers.
Modify the spreadsheet developed in Problem 7-2 to calculate the regression constants and correlation coefficient to fit intensity-duration-frequency rainfall data. Test your spreadsheet using the data of Example 7-4 in the text.
Using ONLINE REGRESSION15 for a hyperbolic regression, calculate the regression constants a and b (Eq. 7-55) for the following 25-y frequency rainfall data:

Duration (min)      5       10     15     30     60     120    180
Intensity (mm/h)    15.5    7.5    6.5    4.5    3.5    2.5    1.5

REFERENCES
Benson, M. A. (1962). "Evolution of Methods for Evaluating the Occurrence of Floods," U.S. Geological Survey Water Supply Paper No. 1580-A.
Boughton, W. C., and K. G. Renard. (1984). "Flood Frequency Characteristics of Some Arizona Watersheds," Water Resources Bulletin, Vol. 20, No. 5, October, pp. 761-769.
Creager, W. P., J. D. Justin, and J. Hinds. (1945). Engineering for Dams. Vol. 1. New York: John Wiley.
Dalrymple, T. (1960). "Flood Frequency Analyses," U.S. Geological Survey Water Supply Paper No. 1543-A.
Hall, M. J. (1984). Urban Hydrology. London: Elsevier Applied Science Publishers.
Malvick, A. J. (1980). "A Magnitude-Frequency-Area Relation for Floods in Arizona," Research Report No. 2, College of Engineering, University of Arizona, Tucson.
McCuen, R. H. (1985). Statistical Methods for Engineers. Englewood Cliffs, N.J.: Prentice-Hall.
Neter, J., W. Wasserman, and M. H. Kutner. (1989). Applied Linear Regression Models, Second Edition, Irwin, Homewood, Illinois.
Reich, B. M., H. B. Osborn, and M. C. Baker. (1979). "Tests on Arizona New Flood Estimates," in Hydrology and Water Resources in Arizona and the Southwest, University of Arizona, Tucson, Vol. 9.
Roeske, R. H. (1978). "Methods for Estimating the Magnitude and Frequency of Floods in Arizona," Final Report, ADOT-RS-15-121, U.S. Geological Survey, Tucson, Arizona.
U.S. Geological Survey. (1994). "Nationwide Summary of U.S. Geological Survey Regional Regression Equations for Estimating Magnitude and Frequency of Floods for Ungaged Sites, 1993," Compiled by M. E. Jennings, W. O. Thomas, and H. C. Riggs, Water-Resources Investigations Report 94-4002, Reston, Virginia.
Viessman, W., Jr., J. W. Knapp, G. L. Lewis, and T. E. Harbaugh. Introduction to Hydrology, 2d ed. New York: Harper & Row.
http://engineeringhydrology.sdsu.edu 

