The term frequency analysis refers to the techniques whose objective is to analyze the occurrence of hydrologic variables within a statistical framework, i.e., by using measured data and basing predictions on statistical laws.
These techniques are applicable to the study of statistical properties of either rainfall or runoff (flow) series.
In engineering hydrology, however, frequency analysis is commonly used to calculate flood discharges.
In principle, techniques of frequency analysis are applicable to gaged catchments with long periods of streamflow record.
In practice, these techniques are primarily used for large catchments, because these are more likely to be gaged and have longer record periods.
Frequency analysis is also applicable to midsize catchments, provided the record length is adequate.
For ungaged catchments (either midsize or large), frequency analysis can be used in a regional context to develop flow characteristics applicable to hydrologically homogeneous regions. These techniques comprise what is referred to as regional analysis (Chapter 7).
The question to be answered by flow frequency analysis can be stated as follows: Given n years of daily streamflow records for stream S, what is the maximum (or minimum) flow Q that is likely to recur with a frequency of once in T years on the average?
Or, what is the maximum flow Q associated with a T-year return period? Alternatively, frequency analysis seeks to answer the inverse question: What is the return period T associated with a maximum (or minimum) flow Q?
In more general terms, the preceding questions can be stated as follows:
Given n years of streamflow data for stream S and L years of design life of a certain structure, what is the probability P of a discharge Q being exceeded at least once during the design life L?
Alternatively, what is the discharge Q which has the probability P of being exceeded during the design life L?
This chapter is divided into three sections. Section 6.1 contains a review of statistics and probability concepts useful in engineering hydrology.
Section 6.2 describes techniques of flood frequency analysis. Section 6.3 discusses low-flow frequency and droughts.
6.1 CONCEPTS OF STATISTICS AND PROBABILITY
Frequency analysis uses random variables and probability distributions.
A random variable follows a certain probability distribution.
A probability distribution is a function that expresses in mathematical terms the relative chance of occurrence of each of all possible outcomes of the random variable.
In statistical notation, $P(X = x_1)$ is the probability P that the random variable X takes on the outcome $x_1$. A shorter notation is $P(x_1)$.
An example of a random variable and probability distribution is shown in Fig. 6-1.
This is a discrete probability distribution because the possible outcomes have been arranged into groups (or classes).
The random variable is discharge Q; the possible outcomes are seven discharge classes, from 0-100 m³/s to 600-700 m³/s.
In Fig. 6-1, the probability that Q is in the class 100-200 m³/s is 0.25.
The sum of probabilities of all possible outcomes is equal to 1.
A cumulative discrete distribution, corresponding to the discrete probability distribution of Fig. 6-1, is shown in Fig. 6-2.
In this figure, the probability that Q is in a class less than or equal to the 100-200 class is 0.40.
The maximum value of probability of the cumulative distribution is 1.
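The step from a discrete distribution to its cumulative form can be sketched in a few lines of Python. Only two numbers below come from the text: the probability 0.25 for the 100-200 class and the cumulative value 0.40 up to that class (which implies 0.15 for the first class). The remaining class probabilities are hypothetical values chosen so that the total is 1; they are not read from Fig. 6-1.

```python
from itertools import accumulate

# Seven discharge classes in m3/s, as described in the text.
class_limits = [(0, 100), (100, 200), (200, 300), (300, 400),
                (400, 500), (500, 600), (600, 700)]

# Class probabilities. The first two agree with the text; the other five
# are ASSUMED for illustration only.
probabilities = [0.15, 0.25, 0.25, 0.15, 0.10, 0.07, 0.03]

# The cumulative distribution is the running sum of the class probabilities.
cumulative = list(accumulate(probabilities))

for (low, high), p, c in zip(class_limits, probabilities, cumulative):
    print(f"{low:3d}-{high:3d} m3/s  P = {p:.2f}  cumulative P = {c:.2f}")
```

The last cumulative value equals the sum of all class probabilities, which is why the cumulative distribution tops out at 1.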
Properties of Statistical Distributions
The properties of statistical distributions are described by the following measures:
(1) central tendency,
(2) variability, and
(3) skewness.
Statistical distributions are described in terms of moments.
The first moment describes central tendency, the second moment describes variability, and the third moment describes skewness. Higher order moments are possible but are seldom used in practical applications.
The first moment about the origin is the arithmetic mean, or mean.
It expresses the distance from the origin to the centroid of the distribution (Fig. 6-3):
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{6-1}$$
in which $\bar{x}$ is the mean, $x_i$ is the random variable, and n is the number of values.
The geometric mean, denoted here as $\bar{x}_g$, is the nth root of the product of n terms:
$$\bar{x}_g = (x_1 x_2 x_3 \cdots x_n)^{1/n} \tag{6-2}$$
The logarithm of the geometric mean is the mean of the logarithms of the individual values.
The geometric mean is to the lognormal probability distribution what the arithmetic mean is to the normal probability distribution.
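The relation between the geometric mean and logarithms can be checked numerically. The following Python sketch computes the geometric mean both directly (nth root of the product) and through the mean of the logarithms. The function names and the three sample values are illustrative assumptions, not data from the text.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values."""
    n = len(values)
    return math.prod(values) ** (1.0 / n)

def geometric_mean_from_logs(values):
    """Exponential of the arithmetic mean of the natural logarithms."""
    n = len(values)
    return math.exp(sum(math.log(v) for v in values) / n)

# Hypothetical sample, chosen so that the product (64) has an exact cube root.
sample = [2.0, 8.0, 4.0]

g_direct = geometric_mean(sample)
g_logs = geometric_mean_from_logs(sample)
print(g_direct, g_logs)
```

Both routes give the same value up to rounding; the logarithmic route is usually preferred for long series because the raw product can overflow.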
The median is the value of the variable that divides the probability distribution into two equal portions (or areas) (Fig. 6-3(b)).
For certain skewed distributions (i.e., those with a third moment other than zero), the median is a better indication of central tendency than the mean.
Another measure of central tendency is the mode, defined as the value of the variable that occurs most frequently (Fig. 6-3(c)).
Statistical moments can be defined about axes other than the origin.
The second moment about the mean is the variance, defined as
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{6-3}$$
in which $s^2$ is the variance.
The square root of the variance, s, is the standard deviation.
The variance coefficient (or coefficient of variation) is defined as
$$C_v = \frac{s}{\bar{x}} \tag{6-4}$$
The standard deviation and variance coefficient are useful in comparing relative variability among distributions.
The larger the standard deviation and variance coefficient, the larger the spread of the distribution (Fig. 6-3).
The third moment about the mean is the skewness, defined as follows:
$$a = \frac{n}{(n - 1)(n - 2)} \sum_{i=1}^{n} (x_i - \bar{x})^3 \tag{6-5}$$
in which a is the skewness.
The skew coefficient is defined as
$$C_s = \frac{a}{s^3} \tag{6-6}$$
For symmetrical distributions, the skewness is 0 and $C_s = 0$.
For right skewness (distributions with the long tail to the right), $C_s > 0$; for left skewness (long tail to the left), $C_s < 0$ (Fig. 6-3(e)).
Another measure of skewness is Pearson's skewness, defined as the ratio of the difference between mean and mode to the standard deviation.
Example 6-1.
Calculate the mean, standard deviation, and skew coefficient for the following flood series: 4580, 3490, 7260, 9350, 2510, 3720, 4070, 5400, 6220, 4350, and 5930 m³/s.
The calculations are shown in Table 6-1.
Column 1 shows the year and Col. 2 shows the annual maximum flows.
The mean (Eq. 6-1) is calculated by summing up Col. 2 and dividing the sum by n = 11.
This results in $\bar{x}$ = 5171 m³/s.
Column 3 shows the flow deviations from the mean, $x_i - \bar{x}$.
Column 4 shows the square of the flow deviations, $(x_i - \bar{x})^2$.
The variance (Eq. 6-3) is calculated by summing up Col. 4 and dividing the sum by (n - 1) = 10.
This results in $s^2$ = 3,780,449 m⁶/s².
The square root of the variance is the standard deviation: s = 1944 m³/s.
The variance coefficient (Eq. 6-4) is $C_v$ = 0.376.
Column 5 shows the cube of the flow deviations, $(x_i - \bar{x})^3$.
The skewness (Eq. 6-5) is calculated by summing up Col. 5 and multiplying the sum by n/[(n - 1)(n - 2)] = 11/90.
This results in a = 6,717,359,675 m⁹/s³.
The skew coefficient (Eq. 6-6) is equal to the skewness divided by the cube of the standard deviation.
This results in $C_s$ = 0.914.
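The arithmetic of Example 6-1 can be reproduced with a short Python sketch. It follows the steps stated in the example: the sum of squared deviations is divided by (n - 1), the sum of cubed deviations is multiplied by n/[(n - 1)(n - 2)], and the skew coefficient is the skewness divided by the cube of the standard deviation. The function name `sample_statistics` is an illustrative choice, not part of the text.

```python
import math

def sample_statistics(x):
    """Return mean, variance, standard deviation, variance coefficient,
    skewness, and skew coefficient of the sample x."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((xi - mean) ** 2 for xi in x) / (n - 1)
    std = math.sqrt(variance)
    cv = std / mean
    skewness = n / ((n - 1) * (n - 2)) * sum((xi - mean) ** 3 for xi in x)
    cs = skewness / std ** 3
    return mean, variance, std, cv, skewness, cs

# Flood series of Example 6-1, in m3/s.
flows = [4580, 3490, 7260, 9350, 2510, 3720, 4070, 5400, 6220, 4350, 5930]

mean, variance, std, cv, skewness, cs = sample_statistics(flows)
print(f"mean = {mean:.0f} m3/s")
print(f"standard deviation = {std:.0f} m3/s")
print(f"variance coefficient = {cv:.3f}")
print(f"skew coefficient = {cs:.3f}")
```

Small differences in the last digits relative to Table 6-1 are possible, since hand calculations often round the mean before computing deviations.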
Continuous Probability Distributions
A continuous probability distribution is referred to as a probability density function (PDF).
A PDF is an equation relating probability, random variable, and parameters of the distribution.
Selected PDFs useful in engineering hydrology are described in this section.
Normal Distribution.
The normal distribution is a symmetrical, bell-shaped PDF also known as the Gaussian distribution, or the natural law of errors.
It has two parameters: the mean, μ, and the standard deviation, σ, of the population.
In practical applications, the mean $\bar{x}$ and the standard deviation s derived from sample data are substituted for μ and σ. The PDF of the normal distribution is
$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^2 / (2\sigma^2)} \tag{6-7}$$
in which x is the random variable and f(x) is the continuous probability. By means of the transformation
$$z = \frac{x - \mu}{\sigma} \tag{6-8}$$
the normal distribution can be converted into a one-parameter distribution, as follows:
$$f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^2/2} \tag{6-9}$$
in which z is the standard unit, which is normally distributed with zero mean and unit standard deviation.
From Eq. 6-8,
$$x = \mu + z\sigma \tag{6-10}$$
in which z, the standard unit, is the frequency factor of the normal distribution. In general, the frequency factor of a statistical distribution is referred to as K.
A cumulative density function (CDF) can be derived by integrating the probability density function. From Eq. 6-9, integration leads to
$$F(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^2/2} \, du \tag{6-11}$$
in which F(z) denotes cumulative probability and u is a dummy variable of integration. The distribution is symmetrical with respect to the origin; therefore, only half of the distribution needs to be evaluated. Table A-5 (Appendix A) shows values of F(z) versus z, in which F(z) is integrated from the origin to z.
Example 6-2.
The annual maximum flows of a certain stream have been found to be normally distributed, with mean 90 m³/s and standard deviation 30 m³/s. Calculate the probability that a flow larger than 150 m³/s will occur.
To enter Table A-5, it is necessary to calculate the standard unit. For a flow of 150 m³/s, the standard unit (Eq. 6-8) is: z = (150 - 90)/30 = 2. This means that the flow of 150 m³/s is located two standard deviations to the right of the mean (had z been negative, the flow would have been located to the left of the mean). In Table A-5, for z = 2, F(z) = 0.4772. This value is the cumulative probability measured from z = 0 to z = 2, i.e., from the mean (90 m³/s) to the value being considered (150 m³/s). Because the normal distribution is symmetrical with respect to the origin, the cumulative probability measured from z = -∞ to z = 0 is 0.5. Therefore, the cumulative probability measured from z = -∞ to z = 2 is F(z) = 0.5 + 0.4772 = 0.9772. This is the probability that the flow is less than 150 m³/s. To find the probability that the flow is larger than 150 m³/s, the complementary cumulative probability is calculated: G(z) = 1 - F(z) = 0.0228. Therefore, there is a (0.0228 × 100) = 2.28% chance that the annual maximum flow for the given stream will be larger than 150 m³/s.
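The table lookup of Example 6-2 can be cross-checked without a table. The Python sketch below evaluates the standard normal cumulative probability from z = -∞ through the error function in the standard library; the relation Φ(z) = 0.5 [1 + erf(z/√2)] is a standard identity rather than something stated in the text, and the function name is an illustrative choice.

```python
import math

def standard_normal_cdf(z):
    """Cumulative probability of the standard normal from -infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean_flow = 90.0   # m3/s, from Example 6-2
std_flow = 30.0    # m3/s, from Example 6-2
q = 150.0          # m3/s, flow of interest

z = (q - mean_flow) / std_flow        # standard unit
p_less = standard_normal_cdf(z)       # probability of a flow below q
p_exceed = 1.0 - p_less               # complementary probability

print(f"z = {z:.1f}")
print(f"P(flow < {q:.0f}) = {p_less:.4f}")
print(f"P(flow > {q:.0f}) = {p_exceed:.4f}")
```

Subtracting 0.5 from `p_less` gives the half-distribution value that a table integrated from the origin would list for the same z.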
Lognormal Distribution. For certain natural phenomena, values of random variables do not follow a normal distribution, but their logarithms do. In this case, a suitable PDF can be obtained by substituting y for x in the equation for the normal distribution, Eq. 6-7, in which y = ln x. The parameters of the lognormal distribution are the mean and standard deviation of y: $\mu_y$ and $\sigma_y$.
Gamma Distribution. The gamma distribution is used in many applications of engineering hydrology. The PDF of the gamma distribution is the following:
$$f(x) = \frac{x^{\gamma - 1} e^{-x/\beta}}{\beta^{\gamma} \, \Gamma(\gamma)} \tag{6-12}$$
for 0 < x < ∞, β > 0, and γ > 0. The parameter γ is known as the shape parameter, since it most influences the peakedness of the distribution, while the parameter β is called the scale parameter, since most of its influence is on the spread of the distribution [4].
The mean of the gamma distribution is βγ, the variance is β²γ, and the skewness is $2/\gamma^{1/2}$. The term Γ(γ) = (γ - 1)!, where γ is a positive integer, is an important definite integral referred to as the gamma function, defined as follows:
$$\Gamma(\gamma) = \int_0^{\infty} x^{\gamma - 1} e^{-x} \, dx \tag{6-13}$$
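The stated properties of the gamma distribution can be verified numerically. The Python sketch below assumes the usual two-parameter form of the PDF, f(x) = x^(γ-1) e^(-x/β) / [β^γ Γ(γ)], with scale β and shape γ; the parameter values β = 2 and γ = 3 are hypothetical. It checks that Γ(γ) = (γ - 1)! for small positive integers and that a simple midpoint-rule integration recovers a mean of βγ and a variance of β²γ.

```python
import math

def gamma_pdf(x, beta, gam):
    """Gamma PDF with scale beta and shape gam (assumed standard form)."""
    if x <= 0.0:
        return 0.0
    return x ** (gam - 1.0) * math.exp(-x / beta) / (beta ** gam * math.gamma(gam))

def numerical_moments(beta, gam, upper=100.0, steps=20000):
    """Area, mean, and variance of the PDF by midpoint-rule integration."""
    dx = upper / steps
    area = first = second = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        w = gamma_pdf(x, beta, gam) * dx
        area += w
        first += x * w
        second += x * x * w
    return area, first, second - first ** 2

# Gamma function versus factorial for positive integers 1..5.
factorial_pairs = [(g, math.gamma(g), math.factorial(g - 1)) for g in range(1, 6)]

beta, gam = 2.0, 3.0   # hypothetical scale and shape
area, mean, variance = numerical_moments(beta, gam)
print(factorial_pairs)
print(area, mean, variance)
```

The upper integration limit of 100 is far enough into the tail for these parameter values; a heavier-tailed choice of β would need a larger limit.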
Pearson Distributions. Pearson has derived a series of probability functions to fit virtually any distribution. These functions have been widely used in practical statistics to define the shape of many distribution curves. The general PDF of the Pearson distributions is the following [5]:
$$f(x) = \exp\left[ \int_{-\infty}^{x} \frac{a + x}{b_0 + b_1 x + b_2 x^2} \, dx \right] \tag{6-14}$$
in which a, $b_0$, $b_1$, and $b_2$ are constants. The criterion for determining the type of distribution is κ, defined as follows:
$$\kappa = \frac{\beta_1 (\beta_2 + 3)^2}{4 (4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)} \tag{6-15}$$
in which $\beta_1 = \mu_3^2 / \mu_2^3$ and $\beta_2 = \mu_4 / \mu_2^2$, with $\mu_2$, $\mu_3$, and $\mu_4$ being the second, third, and fourth moments about the mean. With $\mu_3 = 0$ (i.e., zero skewness), $\beta_1 = 0$, κ = 0, and the Pearson distribution reduces to the normal distribution.
The Pearson Type III distribution has been widely used in flood frequency analysis.
In the Pearson Type III distribution, κ = ∞, which implies that $2\beta_2 = 3\beta_1 + 6$.
This is a three-parameter skewed distribution with the following PDF:
$$f(x) = \frac{(x - x_0)^{\gamma - 1} e^{-(x - x_0)/\beta}}{\beta^{\gamma} \, \Gamma(\gamma)} \tag{6-16}$$
and parameters β, γ, and $x_0$. For $x_0 = 0$, the Pearson Type III distribution reduces to the gamma distribution (Eq. 6-12). For γ = 1, the Pearson Type III distribution reduces to the exponential distribution, with the following PDF:
$$f(x) = \frac{1}{\beta} \, e^{-(x - x_0)/\beta} \tag{6-17}$$
The mean of the Pearson Type III distribution is $x_0 + \beta\gamma$, the variance is $\beta^2 \gamma$, and the skewness is $2/\gamma^{1/2}$.
Extreme Value Distributions. The extreme value distributions Types I, II, and III are based on the theory of extreme values. Frechet (on Type II) in 1927 and Fisher and Tippett (on Types I and III) in 1928 [7] independently studied the statistical distribution of extreme values. Extreme value theory implies that if a random variable Q is the maximum in a sample of size n from some population of x values, then, provided n is sufficiently large, the distribution of Q is one of three asymptotic types (I, II, or III), depending on the distribution of x.
The extreme value distributions can be combined into one and expressed as a general extreme value (GEV) distribution. The cumulative density function of the GEV distribution is:
$$F(x) = \exp\left\{ -\left[ 1 - \frac{k (x - u)}{\alpha} \right]^{1/k} \right\} \tag{6-18}$$
in which k, u, and α are parameters. The parameter k defines the type of distribution, u is a location parameter, and α is a scale parameter. For k = 0, the GEV distribution reduces to the extreme value Type I (EV1), or Gumbel, distribution. For k < 0, the GEV distribution is the extreme value Type II (EV2), or Frechet, distribution. For k > 0, the GEV distribution is the extreme value Type III (EV3), or Weibull, distribution. The GEV distribution is useful in applications where an extreme value distribution is being considered but its type is not known a priori.
Gumbel has fitted the extreme value Type I distribution to long records of river flows from many countries. The cumulative density function (CDF) of the Gumbel distribution is the following double exponential function:
$$F(x) = e^{-e^{-y}} \tag{6-19}$$
in which y = (x - u)/α is the Gumbel (reduced) variate.
The mean $\bar{y}_n$ and standard deviation $\sigma_n$ of the Gumbel variate are functions of record length n. Values of $\bar{y}_n$ and $\sigma_n$ as a function of n are given in Table A-8 (Appendix A). When the record length approaches ∞, the mean $\bar{y}_n$ approaches the value of the Euler constant (0.5772), and the standard deviation $\sigma_n$ approaches the value $\pi/\sqrt{6}$. The skew coefficient of the Gumbel distribution is 1.14.
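A brief Python sketch can make the Gumbel variate concrete. It assumes the double exponential CDF F = exp(-exp(-y)) with y = (x - u)/α, inverts it to y = -ln(-ln F), and evaluates the flow with a nonexceedence probability of 1 - 1/T for an annual maximum series. The location and scale values are hypothetical, and the sketch uses the CDF parameters directly rather than the record-length-dependent values of Table A-8.

```python
import math

def gumbel_cdf(x, u, alpha):
    """Gumbel (EV1) cumulative probability for location u and scale alpha."""
    y = (x - u) / alpha
    return math.exp(-math.exp(-y))

def gumbel_reduced_variate(F):
    """Invert F = exp(-exp(-y)) for the reduced variate y."""
    return -math.log(-math.log(F))

def gumbel_quantile(F, u, alpha):
    """Flow with cumulative (nonexceedence) probability F."""
    return u + alpha * gumbel_reduced_variate(F)

u, alpha = 1000.0, 300.0      # hypothetical location and scale, m3/s
T = 100.0                     # return period, years
F = 1.0 - 1.0 / T             # nonexceedence probability

y_T = gumbel_reduced_variate(F)
q_T = gumbel_quantile(F, u, alpha)

# Limiting standard deviation of the reduced variate for very long records.
limit_std = math.pi / math.sqrt(6.0)

print(f"y_T = {y_T:.3f}, Q_T = {q_T:.0f} m3/s, pi/sqrt(6) = {limit_std:.4f}")
```

Feeding `q_T` back into `gumbel_cdf` returns the original nonexceedence probability, which is a quick consistency check on the inversion.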
The extreme value Type II distribution is also known as the log Gumbel. Its cumulative density function is
$$F(x) = \exp\left\{ -\left[ 1 - \frac{k (x - u)}{\alpha} \right]^{1/k} \right\} \tag{6-20}$$
for k < 0.
The extreme value Type III distribution has the same CDF as the Type II, but in this case k > 0. As k approaches 0, the EV2 and EV3 distributions converge to the EV1 distribution.
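The convergence of the three types as k approaches 0 can be illustrated numerically. The Python sketch below assumes the GEV form F(x) = exp{-[1 - k(x - u)/α]^(1/k)}, with the Gumbel double exponential used when k = 0. The parameter values and the handling of points outside the support are illustrative assumptions.

```python
import math

def gev_cdf(x, k, u, alpha):
    """GEV cumulative probability (assumed form; k = 0 gives Gumbel)."""
    if k == 0.0:
        return math.exp(-math.exp(-(x - u) / alpha))
    t = 1.0 - k * (x - u) / alpha
    if t <= 0.0:
        # Outside the support: above the upper bound when k > 0 (EV3),
        # below the lower bound when k < 0 (EV2).
        return 1.0 if k > 0.0 else 0.0
    return math.exp(-t ** (1.0 / k))

u, alpha = 1000.0, 300.0    # hypothetical location and scale, m3/s
x = 1500.0                  # hypothetical flow, m3/s

f_ev1 = gev_cdf(x, 0.0, u, alpha)       # Gumbel
f_ev2 = gev_cdf(x, -0.2, u, alpha)      # k < 0
f_ev3 = gev_cdf(x, 0.2, u, alpha)       # k > 0
f_near_neg = gev_cdf(x, -1e-6, u, alpha)
f_near_pos = gev_cdf(x, 1e-6, u, alpha)

print(f_ev2, f_ev1, f_ev3)
print(f_near_neg, f_near_pos)
```

For a flow above the location parameter, the k < 0 case gives the lowest cumulative probability of the three, reflecting its heavier upper tail, while the values for k = ±1e-6 are practically indistinguishable from the Gumbel result.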
6.2 FLOOD FREQUENCY ANALYSIS
Flood frequency analysis refers to the application of frequency analysis to study the occurrence of floods. Historically, many probability distributions have been used for this purpose. The normal distribution was first used by Horton in 1913, and shortly thereafter by Fuller. Hazen used the lognormal distribution to reduce skewness, whereas Foster preferred to use the skewed Pearson distributions.
The logarithmic version of the Pearson Type III distribution, i.e., the log Pearson III, has been endorsed by the U.S. Interagency Advisory Committee on Water Data for general use in the United States. The Gumbel distribution (extreme value Type I, or EV1) is also widely used in the United States and throughout the world. The log Pearson III and Gumbel methods are described in this section.
Selection of Data Series
The complete record of streamflows at a given gaging station is called the complete duration series. To perform a flood frequency analysis, it is necessary to select a flood series, i.e., a sample of flood events extracted from the complete duration series.
There are two types of flood series: (1) the partial duration series and (2) the extreme value series. The partial duration (or peaks-over-a-threshold (POT)) series consists of floods whose magnitude is greater than a certain base value. When the base value is such that the number of events in the series is equal to the number of years of record, the series is called an annual exceedence series.
In the extreme value series, every year of record contributes one value to the extreme value series, either the maximum value (as in the case of flood frequency analysis) or the minimum value (as in the case of low-flow frequency analysis). The former is the annual maxima series; the latter is the annual minima series.
The annual exceedence series takes into account all extreme events above a certain base value, regardless of when they occurred. However, the annual maxima series considers only one extreme event per yearly period. The difference between the two series is likely to be more marked for short records in which the second largest annual events may strongly influence the character of the annual exceedence series. In practice, the annual exceedence series is used for frequency analyses involving short return periods, ranging from 2 to 10 y. For longer return periods the difference between annual exceedence and annual maxima series is small. The annual maxima series is used for return periods ranging from 10 to 100 y and more.
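The difference between the two series can be seen in a small Python sketch. The peak flows below are hypothetical and are assumed to be independent flood events; the function names are illustrative. The annual maxima series keeps the largest peak of each year, while the annual exceedence series keeps the largest peaks overall, as many as there are years of record.

```python
def annual_maxima(peaks):
    """Largest peak of each year; peaks is a list of (year, flow) pairs."""
    best = {}
    for year, flow in peaks:
        if year not in best or flow > best[year]:
            best[year] = flow
    return [best[year] for year in sorted(best)]

def annual_exceedance(peaks):
    """The n largest peaks regardless of year, n being the number of years."""
    n_years = len({year for year, _ in peaks})
    flows = sorted((flow for _, flow in peaks), reverse=True)
    return flows[:n_years]

# Hypothetical independent flood peaks, in m3/s, over three years of record.
peaks = [(2001, 500), (2001, 450),
         (2002, 200),
         (2003, 700), (2003, 650), (2003, 300)]

maxima_series = annual_maxima(peaks)
exceedance_series = annual_exceedance(peaks)
print(maxima_series)
print(exceedance_series)
```

In this made-up record, the second-largest peak of 2003 enters the annual exceedence series and displaces the small 2002 maximum, which is the effect the text describes for short records.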
Return Period, Frequency, and Risk
The time elapsed between successive peak flows exceeding a certain flow Q is a random variable whose mean value is called the return period T (or recurrence interval) of the flow Q. The relationship between probability and return period is the following:
$$P(Q) = \frac{1}{T} \tag{6-21}$$
in which P(Q) is the probability of exceedence of Q, or frequency. The terms frequency and return period are often used interchangeably, although strictly speaking, frequency is the reciprocal of return period. A frequency of 1/T, or one in T years, corresponds to a return period of T years.
The probability of nonexceedence $\bar{P}(Q)$ is the complementary probability of the probability of exceedence P(Q), defined as
$$\bar{P}(Q) = 1 - P(Q) = 1 - \frac{1}{T} \tag{6-22}$$
The probability of nonexceedence in n successive years is
$$[\bar{P}(Q)]^n = \left( 1 - \frac{1}{T} \right)^n \tag{6-23}$$
Therefore, the probability, or risk, R that Q will occur at least once in n successive years is
$$R = 1 - \left( 1 - \frac{1}{T} \right)^n \tag{6-24}$$
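The relations between return period, frequency, and risk lend themselves to a short calculation. The Python sketch below assumes that the exceedence probability in any one year is 1/T and that years are independent, so that the risk over n years is 1 - (1 - 1/T)^n. The function names and the numerical values (a 100-year flood and a 50-year design life) are illustrative assumptions.

```python
def exceedance_probability(T):
    """Probability of exceedence in any one year for return period T."""
    return 1.0 / T

def risk(T, n):
    """Probability of at least one exceedence in n successive years."""
    return 1.0 - (1.0 - exceedance_probability(T)) ** n

def return_period_for_risk(R, n):
    """Return period T that yields risk R over n successive years."""
    return 1.0 / (1.0 - (1.0 - R) ** (1.0 / n))

T = 100.0   # return period, years (hypothetical)
n = 50      # design life, years (hypothetical)

r = risk(T, n)
T_back = return_period_for_risk(r, n)
print(f"risk over {n} years = {r:.3f}")
print(f"return period recovered from that risk = {T_back:.1f} years")
```

The inverse function answers the design question posed at the start of the chapter: given an acceptable risk over the design life, it returns the return period of the flow to design for.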
Plotting Positions
Frequency distributions are plotted using probability papers. One of the scales on a probability paper is a probability scale; the other is either an arithmetic or logarithmic scale. Normal and extreme value probability distributions are most often used in probability papers.
An arithmetic probability paper has a normal probability scale and an arithmetic scale. This type of paper is used for plotting normal and Pearson distributions. A log probability paper has a normal probability scale and a logarithmic scale and is used for plotting lognormal and log Pearson distributions. An extreme value probability paper has an extreme value scale and an arithmetic scale and is used for plotting extreme value distributions.
Data fitting a normal distribution plot as a straight line on arithmetic probability paper. Likewise, data fitting a lognormal distribution plot as a straight line on log probability paper, and data fitting the Gumbel distribution plot as a straight line on extreme value probability paper.
For plotting purposes, the probability of an individual event can be obtained directly from the flood series. For a series of n annual maxima, the following ratio holds: