6.1 CONCEPTS OF STATISTICS
Introduction
The term frequency analysis refers to the techniques whose objective is to analyze the occurrence of hydrologic variables within a statistical framework, i.e., by using measured data and basing predictions on statistical laws. These techniques are applicable to the study of statistical properties of either rainfall or runoff (flow) series. In engineering hydrology, however, frequency analysis is commonly used to calculate flood discharges.
In principle, techniques of frequency analysis are applicable to gaged catchments with long periods of streamflow record. In practice, these techniques are primarily used for large catchments, because these are more likely to be gaged and have longer record periods. Frequency analysis is also applicable to midsize catchments, provided the record length is adequate. For ungaged catchments (either midsize or large), frequency analysis can be used in a regional context to develop flow characteristics applicable to hydrologically homogeneous regions. These techniques comprise what is referred to as regional analysis (Chapter 7).
The question to be answered by flow frequency analysis can be stated as follows: Given n years of daily streamflow records for stream S, what is the maximum (or minimum) flow Q that is likely to recur with a frequency of once in T years on the average? Or, what is the maximum flow Q associated with a T-year return period? Alternatively, frequency analysis seeks to answer the inverse question: What is the return period T associated with a maximum (or minimum) flow Q?
In more general terms, the preceding questions can be stated as follows: Given n years of streamflow data for stream S and L years of design life of a certain structure, what is the probability P of a discharge Q being exceeded at least once during the design life L? Alternatively, what is the discharge Q which has the probability P of being exceeded during the design life L?
Random Variables
Frequency analysis uses random variables and probability distributions. A random variable follows a certain probability distribution. A probability distribution is a function that expresses in mathematical terms the relative chance of occurrence of each of all possible outcomes of the random variable. In statistical notation, P (X = x_{1}) is the probability P that the random variable X takes on the outcome x_{1}. A shorter notation is P (x_{1}).
An example of random variable and probability distribution is shown in Fig. 6-1. This is a discrete probability distribution because the possible outcomes have been arranged into groups (or classes). The random variable is discharge Q; the possible outcomes are seven discharge classes, from 0-100 m^{3}/s to 600-700 m^{3}/s. In Fig. 6-1, the probability that Q is in the class 100-200 m^{3}/s is 0.25. The sum of probabilities of all possible outcomes is equal to 1.
A cumulative discrete distribution, corresponding to the discrete probability distribution of Fig. 6-1, is shown in Fig. 6-2. In this figure, the probability that Q is in a class less than or equal to the 100-200 m^{3}/s class is 0.40. The maximum value of probability of the cumulative distribution is 1.
Properties of Statistical Distributions
The properties of statistical distributions are described by the following measures: (1) central tendency, (2) variability, and (3) skewness.
Statistical distributions are described in terms of moments. The first moment describes central tendency, the second moment describes variability, and the third moment describes skewness. Higher order moments are possible but are seldom used in practical applications. The first moment about the origin is the arithmetic mean, or mean. It expresses the distance from the origin to the centroid of the distribution, as shown in Fig. 6-3 (a):
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i} \tag{6-1}$$
in which x̄ is the mean, x_{i} is the ith value of the random variable, and n is the number of values. The geometric mean is the nth root of the product of n terms:
$$\bar{x}_{g} = (x_{1} \, x_{2} \, x_{3} \cdots x_{n})^{1/n} \tag{6-2}$$
The logarithm of the geometric mean is the mean of the logarithms of the individual values. The geometric mean is to the lognormal probability distribution what the arithmetic mean is to the normal probability distribution. The median is the value of the variable that divides the probability distribution into two equal portions (or areas); see Fig. 6-3 (b). For certain skewed distributions (i.e., those with third moment other than zero), the median is a better indication of central tendency than the mean. Another measure of central tendency is the mode, defined as the value of the variable that occurs most frequently; see Fig. 6-3 (c).
Statistical moments can be defined about axes other than the origin. The second moment about the mean is the variance, defined as
$$s^{2} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_{i} - \bar{x})^{2} \tag{6-3}$$
in which s^{2} is the variance. The square root of the variance, s, is the standard deviation. The variance coefficient (or coefficient of variation) is defined as
$$C_{v} = \frac{s}{\bar{x}} \tag{6-4}$$
The standard deviation and variance coefficient are useful in comparing relative variability among distributions. The larger the standard deviation and variance coefficient, the larger the spread of the distribution; see Fig. 6-3 (d). The third moment about the mean is the skewness, defined as follows:
$$a = \frac{n}{(n - 1)(n - 2)} \sum_{i=1}^{n} (x_{i} - \bar{x})^{3} \tag{6-5}$$
in which a is the skewness. The skew coefficient is defined as
$$C_{s} = \frac{a}{s^{3}} \tag{6-6}$$
For symmetrical distributions, the skewness is 0 and C_{s} = 0. For right skewness (distributions with the long tail to the right), C_{s} > 0; for left skewness (long tail to the left), C_{s} < 0; see Fig. 6-3 (e). Another measure of skewness is Pearson's skewness, defined as the ratio of the difference between mean and mode to the standard deviation.
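The moment-based measures described above can be computed directly from a sample. The following is a minimal sketch, assuming the usual sample estimators (divisor n - 1 for the variance and n/[(n - 1)(n - 2)] for the skewness); the function names and the dictionary keys are illustrative choices, not part of the text.

```python
import math

def sample_moments(values):
    """Return mean, variance, standard deviation, C_v, skewness and C_s of a sample."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are needed to estimate skewness")
    mean = sum(values) / n
    # Second moment about the mean, with the unbiased divisor n - 1.
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    std = math.sqrt(variance)
    # Third moment about the mean, with the small-sample factor n / ((n - 1)(n - 2)).
    skewness = n * sum((x - mean) ** 3 for x in values) / ((n - 1) * (n - 2))
    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "variance_coefficient": std / mean,
        "skewness": skewness,
        "skew_coefficient": skewness / std ** 3,
    }

def geometric_mean(values):
    """Geometric mean, computed as the antilog of the mean of the logarithms."""
    return math.exp(sum(math.log(x) for x in values) / len(values))
```

For a symmetrical sample such as [1, 2, 3, 4, 5] the skew coefficient returned is zero, while a sample with a long right tail gives a positive value, in line with the sign convention stated above.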
Continuous Probability Distributions
A continuous probability distribution is referred to as a probability density function (PDF). A PDF is an equation relating probability, random variable, and parameters of the distribution. Selected PDFs useful in engineering hydrology are described in this section.
Normal Distribution
The normal distribution is a symmetrical, bell-shaped PDF also known as the Gaussian distribution, or the natural law of errors. It has two parameters: the mean μ, and the standard deviation σ, of the population. In practical applications, the mean x̄ and the standard deviation s derived from sample data are substituted for μ and σ, respectively. The PDF of the normal distribution is:
$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^{2} / (2\sigma^{2})} \tag{6-7}$$
in which x is the random variable and f (x) is the probability density. By means of the transformation
$$z = \frac{x - \mu}{\sigma} \tag{6-8}$$
the normal distribution can be converted into a one-parameter distribution, as follows:
$$f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^{2}/2} \tag{6-9}$$
in which z is the standard unit, which is normally distributed with zero mean and unit standard deviation. From Eq. 6-8:
$$x = \mu + z \sigma \tag{6-10}$$
in which z, the standard unit, is the frequency factor of the normal distribution. In general, the frequency factor of a statistical distribution is referred to as K. A cumulative density function (CDF) can be derived by integrating the probability density function. From Eq. 6-9, integration leads to
$$F(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^{2}/2} \, du \tag{6-11}$$
in which F(z) denotes cumulative probability and u is a dummy variable of integration. The distribution is symmetrical with respect to the origin; therefore, only half of the distribution needs to be evaluated. Table A-5 (Appendix A) shows values of F(z) versus z, in which the tabulated F(z) is integrated from the origin to z, i.e., it is the area under the curve between 0 and z.
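The cumulative probability of the standard normal distribution can also be evaluated numerically rather than read from a table. The sketch below assumes the standard identity between the normal CDF and the error function; the function names are illustrative, and the half-distribution area is included only to mirror the way a table integrated from the origin is laid out.

```python
import math
from statistics import NormalDist

def normal_cdf(z):
    """Cumulative probability of the standard normal, integrated from -infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_from_origin(z):
    """Area under the standard normal curve between 0 and |z| (one half of the distribution)."""
    return normal_cdf(abs(z)) - 0.5

def normal_quantile(mean, std, return_period):
    """Value x = mean + z * std whose exceedance probability is 1 / return_period."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    return mean + z * std
```

Because of symmetry, `normal_cdf(-z)` equals `1 - normal_cdf(z)`, which is why evaluating one half of the distribution is sufficient.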
Lognormal Distribution
For certain natural phenomena, values of random variables do not follow a normal distribution, but their logarithms do. In this case, a suitable PDF can be obtained by substituting y for x in the equation for the normal distribution, Eq. 6-7, in which y = ln (x). The parameters of the lognormal distribution are the mean and standard deviation of y : μ_{y} and σ_{y}.
Gamma Distribution
The gamma distribution is used in many applications of engineering hydrology. The PDF of the gamma distribution is the following:
$$f(x) = \frac{x^{\gamma - 1} \, e^{-x/\beta}}{\beta^{\gamma} \, \Gamma(\gamma)} \tag{6-12}$$
for 0 < x < ∞, β > 0, and γ > 0. The parameter γ is known as the shape parameter, since it most influences the peakedness of the distribution, while the parameter β is called the scale parameter, since most of its influence is on the spread of the distribution [4]. The mean of the gamma distribution is βγ, the variance is β^{2}γ, and the skewness is 2/γ^{1/2}. The term Γ(γ), which is equal to (γ - 1)! when γ is a positive integer, is an important definite integral referred to as the gamma function, defined as follows:
$$\Gamma(\gamma) = \int_{0}^{\infty} x^{\gamma - 1} \, e^{-x} \, dx \tag{6-13}$$
Pearson Distributions
Pearson [24] has derived a series of probability functions to fit virtually any distribution. These functions have been widely used in practical statistics to define the shape of many distribution curves. The general PDF of the Pearson distributions is the following [6]:
$$f(x) = e^{\int_{-\infty}^{x} \frac{a + x}{b_{0} + b_{1} x + b_{2} x^{2}} \, dx} \tag{6-14}$$
in which a, b_{0}, b_{1}, b_{2} are constants. The criterion for determining the type of distribution is κ, defined as follows:
$$\kappa = \frac{\beta_{1} (\beta_{2} + 3)^{2}}{4 \, (4\beta_{2} - 3\beta_{1}) (2\beta_{2} - 3\beta_{1} - 6)} \tag{6-15}$$
in which β_{1} = μ_{3}^{2}/μ_{2}^{3} and β_{2} = μ_{4}/μ_{2}^{2}, with μ_{2}, μ_{3}, and μ_{4} being the second, third, and fourth moments about the mean. With μ_{3} = 0 (i.e., zero skewness), β_{1} = 0, κ = 0, and the Pearson distribution reduces to the normal distribution. The Pearson Type III distribution has been widely used in flood frequency analysis. In the Pearson Type III distribution, κ = ∞, which implies that 2β_{2} = (3β_{1} + 6). This is a three-parameter skewed distribution with the following PDF:
$$f(x) = \frac{(x - x_{o})^{\gamma - 1} \, e^{-(x - x_{o})/\beta}}{\beta^{\gamma} \, \Gamma(\gamma)} \tag{6-16}$$
and parameters β, γ, and x_{o}. For x_{o} = 0, the Pearson Type III distribution reduces to the gamma distribution (Eq. 6-12). For γ = 1, the Pearson Type III distribution reduces to the exponential distribution, with the following PDF:
$$f(x) = \frac{1}{\beta} \, e^{-(x - x_{o})/\beta} \tag{6-17}$$
The mean of the Pearson Type III distribution is x_{o} + βγ; the variance is β^{2}γ; and the skewness is 2/γ^{1/2}.
Extreme Value Distributions
The extreme value distributions Types I, II, and III are based on the theory of extreme values. Frechet (on Type II) in 1927 [8] and Fisher and Tippett (on Types I and III) in 1928 [8] independently studied the statistical distribution of extreme values. Extreme value theory implies that if a random variable Q is the maximum in a sample of size n from some population of x values, then, provided n is sufficiently large, the distribution of Q is one of three asymptotic types (I, II, or III), depending on the distribution of x. The extreme value distributions can be combined into one and expressed as a general extreme value (GEV) distribution [23]. The cumulative density function of the GEV distribution is:
$$F(x) = e^{-[1 - k (x - u)/\alpha]^{1/k}} \tag{6-18}$$
in which k, u, and α are parameters. The parameter k defines the type of distribution, u is a location parameter, and α is a scale parameter. For k = 0, the GEV distribution reduces to the extreme value Type I (EV1), or Gumbel distribution. For k < 0, the GEV distribution is the extreme value Type II (EV2), or Frechet distribution. For k > 0, the GEV distribution is the extreme value Type III (EV3), or Weibull distribution. The GEV distribution is useful in applications where an extreme value distribution is being considered but its type is not known a priori. Gumbel [13, 14, 15, 16] has fitted the extreme value Type I distribution to long records of river flows from many countries. The cumulative density function (CDF) of the Gumbel distribution is the following double exponential function:
$$F(x) = e^{-e^{-y}} \tag{6-19}$$
in which y = (x - u)/α is the Gumbel (reduced) variate. The mean ȳ_{n} and standard deviation σ_{n} of the Gumbel variate are functions of record length n. Values of ȳ_{n} and σ_{n} as a function of n are given in Table A-8 (Appendix A). When the record length approaches ∞, the mean ȳ_{n} approaches the value of the Euler constant (0.5772) [29], and the standard deviation σ_{n} approaches the value π/6^{1/2}. The skew coefficient of the Gumbel distribution is 1.14. The extreme value Type II distribution is also known as the log Gumbel. Its cumulative density function is:
$$F(x) = e^{-(1 - k y)^{1/k}} \tag{6-20}$$
for k < 0. The extreme value Type III distribution has the same CDF as the Type II, but in this case k > 0. As k approaches 0, the EV2 and EV3 distributions converge to the EV1 distribution.
6.2 FREQUENCY ANALYSIS
Flood frequency analysis refers to the application of frequency analysis to study the occurrence of floods. Historically, many probability distributions have been used for this purpose. The normal distribution was first used by Horton [19] in 1913, and shortly thereafter by Fuller [11]. Hazen [17] used the lognormal distribution to reduce skewness, whereas Foster [9] preferred to use the skewed Pearson distributions. The logarithmic version of the Pearson Type III distribution, i.e., the log Pearson III, has been endorsed by the U.S. Interagency Advisory Committee on Water Data for general use in the United States [31]. The Gumbel distribution (extreme value Type I, or EV1) is also widely used in the United States and throughout the world. The log Pearson III and Gumbel methods are described in this section.
Selection of Data Series
The complete record of streamflows at a given gaging station is called the complete duration series. To perform a flood frequency analysis, it is necessary to select a flood series, i.e., a sample of flood events extracted from the complete duration series. There are two types of flood series: (1) the partial duration series and (2) the extreme value series. The partial duration (or peaks-over-a-threshold, POT) series [23] consists of floods whose magnitude is greater than a certain base value. When the base value is such that the number of events in the series is equal to the number of years of record, the series is called an annual exceedance series. In the extreme value series, every year of record contributes one value to the series, either the maximum value (as in the case of flood frequency analysis) or the minimum value (as in the case of low-flow frequency analysis). The former is the annual maxima series; the latter is the annual minima series.
The annual exceedance series takes into account all extreme events above a certain base value, regardless of when they occurred. However, the annual maxima series considers only one extreme event per yearly period. The difference between the two series is likely to be more marked for short records in which the second largest annual events may strongly influence the character of the annual exceedance series. In practice, the annual exceedance series is used for frequency analyses involving short return periods, ranging from 2 to 10 y. For longer return periods the difference between annual exceedance and annual maxima series is small. The annual maxima series is used for return periods ranging from 10 to 100 y and more.
Return Period, Frequency, and Risk
The time elapsed between successive peak flows exceeding a certain flow Q is a random variable whose mean value is called the return period T (or recurrence interval) of the flow Q. The relationship between probability and return period is the following:
$$P(Q) = \frac{1}{T} \tag{6-21}$$
in which P(Q) is the probability of exceedance of Q, or frequency. The terms frequency and return period are often used interchangeably, although strictly speaking, frequency is the reciprocal of return period. A frequency of 1/T, or one in T years, corresponds to a return period of T years. The probability of nonexceedance P(Q̄) is the complementary probability of the probability of exceedance P(Q), defined as
$$P(\bar{Q}) = 1 - P(Q) = 1 - \frac{1}{T} \tag{6-22}$$
The probability of nonexceedance in n successive years is
$$[P(\bar{Q})]^{n} = \left( 1 - \frac{1}{T} \right)^{n} \tag{6-23}$$
Therefore, the probability, or risk R, that Q will be exceeded at least once in n successive years is
$$R = 1 - \left( 1 - \frac{1}{T} \right)^{n} \tag{6-24}$$
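The relation between return period, design life, and risk lends itself to a short numerical check. The following sketch assumes independent years with a constant annual exceedance probability of 1/T; the function names are illustrative. The second function inverts the risk relation to give the return period required for a chosen risk over a design life.

```python
def risk(return_period, years):
    """Probability that a flow with the given return period is exceeded at least once in `years`."""
    annual_nonexceedance = 1.0 - 1.0 / return_period
    return 1.0 - annual_nonexceedance ** years

def return_period_for_risk(accepted_risk, years):
    """Return period T for which the risk over `years` equals `accepted_risk` (0 < risk < 1)."""
    annual_exceedance = 1.0 - (1.0 - accepted_risk) ** (1.0 / years)
    return 1.0 / annual_exceedance
```

For instance, `risk(100, 100)` is about 0.634: a 100-y flood has roughly a 63 percent chance of being exceeded at least once in a 100-y period, not a 100 percent chance.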
Plotting Positions
Frequency distributions are plotted using probability papers. One of the scales on a probability paper is a probability scale; the other is either an arithmetic or logarithmic scale. Normal and extreme value probability distributions are most often used in probability papers. An arithmetic probability paper has a normal probability scale and an arithmetic scale. This type of paper is used for plotting normal and Pearson distributions. A log probability paper has a normal probability scale and a logarithmic scale and is used for plotting lognormal and log Pearson distributions. An extreme value probability paper has an extreme value scale and an arithmetic scale and is used for plotting extreme value distributions.
Data fitting a normal distribution plot as a straight line on arithmetic probability paper. Likewise, data fitting a lognormal distribution plot as a straight line on log probability paper, and data fitting the Gumbel distribution plot as a straight line on extreme value probability paper. For plotting purposes, the probability of an individual event can be obtained directly from the flood series. For a series of n annual maxima, the following ratio holds:
$$\frac{\bar{x}}{N} = \frac{m}{n + 1} \tag{6-25}$$
in which x̄ = mean number of exceedances; N = number of trials; n = number of values in the series; and m = the rank of descending values, with largest equal to 1. For example, if n = 79, the second largest value in the series (m = 2) will be exceeded twice on the average (x̄ = 2) in 80 trials (N = 80). Likewise, the largest value in the series (m = 1) will be exceeded once on the average (x̄ = 1) after 80 trials (N = 80). Since return period T is associated with x̄ = 1, Eq. 6-25 can be expressed as follows:
$$\frac{1}{T} = P = \frac{m}{n + 1} \tag{6-26}$$
in which P = exceedance probability. Equation 6-26 is known as the Weibull plotting position formula. This equation is commonly used in hydrologic applications, particularly for computing plotting positions for unspecified distributions [1]. A general plotting position formula is of the following form [12]:
$$\frac{1}{T} = P = \frac{m - a}{n + 1 - 2a} \tag{6-27}$$
in which a = parameter. Cunnane [7] performed a detailed study of the accuracy of different plotting position formulas and concluded that the Blom formula [3], with a = 0.375 in Eq. 6-27, is most appropriate for the normal distribution, whereas the Gringorten formula, with a = 0.44, should be used in connection with the Gumbel distribution. According to Cunnane, the Weibull formula, for which a = 0, is most appropriate for a uniform distribution. In computing plotting positions, when the ranking of values is in descending order (from highest to lowest), P is the probability of exceedance, or the probability of a value being greater than or equal to the ranked value. When the ranking of values is in ascending order (from lowest to highest), P is the probability of nonexceedance, or the probability of a value being less than or equal to the ranked value. The computation of plotting positions is illustrated by the following example.
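A small computational sketch of the general plotting position formula, P = (m - a)/(n + 1 - 2a), is given here. It ranks the series in descending order, so P is a probability of exceedance. The function name and the sample flows are hypothetical and serve only as illustration.

```python
def plotting_positions(flows, a=0.0):
    """Return (rank, flow, exceedance probability, return period) for each value.

    a = 0 gives the Weibull formula, a = 0.375 the Blom formula,
    and a = 0.44 the Gringorten formula.
    """
    n = len(flows)
    ranked = sorted(flows, reverse=True)  # rank m = 1 is the largest value
    rows = []
    for m, q in enumerate(ranked, start=1):
        p = (m - a) / (n + 1 - 2 * a)
        rows.append((m, q, p, 1.0 / p))
    return rows

# Hypothetical annual maxima in m3/s, used only to exercise the function.
sample_flows = [1200, 850, 2300, 640, 1500, 980, 3100, 720, 1100]
for m, q, p, t in plotting_positions(sample_flows, a=0.0):
    print(f"{m:2d}  {q:6.0f}  P = {p:.3f}  T = {t:.2f} y")
```

With nine values, the Weibull formula assigns the largest flood P = 1/10 and T = 10 y, whereas the Gringorten formula assigns it a longer return period, which shifts the upper points of the plot to the right on extreme value paper.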
Curve Fitting Once the data have been plotted on probability paper, the next step is to fit a curve through the plotted points. Curve fitting can be accomplished by any of the following methods: (1) graphical, (2) least square, (3) moments, and (4) maximum likelihood. The graphical method consists of fitting a function visually to the data. This method, however, has the disadvantage that the results are highly dependent on the skills of the person doing the fitting. A more consistent procedure is to use either the least square, moments, or maximum likelihood methods. In the least square method, the sum of the squares of the differences between observed data and fitted values is minimized. The minimization condition leads to a set of m normal equations, where m is the number of parameters to be estimated. The simultaneous solution of the normal equations leads to the parameters describing the fitting (Chapter 7). To apply the method of moments, it is first necessary to select a distribution; then, the moments of the distribution are calculated based on the data. The method provides an exact theoretical fitting, but the accuracy is substantially affected by errors in the tail of the distribution (i.e., events of long return period). A disadvantage of the method is the uncertainty regarding the adequacy of the chosen probability distribution. In the method of maximum likelihood, the distribution parameters are estimated in such a way that the product of probabilities (i.e., the joint probability, or likelihood) is maximized. This is obtained in a similar manner to the least square method by partially differentiating the likelihood with respect to each of the parameters and equating the result to zero. The four fitting methods can be rated in ascending order of effectiveness: graphical, least square, moments, and maximum likelihood. The latter, however, is somewhat more difficult to apply [6, 21]. 
In practice, the method of moments is the most commonly used curve fitting method (see, for instance, the log Pearson III and Gumbel methods described later in this section).
Frequency Factors
Any value of a random variable may be represented in the following form:
$$x = \bar{x} + \Delta x \tag{6-28}$$
in which x = value of random variable; x̄ = mean of the distribution, and Δx = departure from the mean, a function of return period and statistical properties of the distribution. This departure from the mean can be expressed in terms of the product of the standard deviation s and a frequency factor K such that Δx = K s. The frequency factor is a function of return period and probability distribution to be used in the analysis. Therefore, Eq. 6-28 can be written in the following form:
$$x = \bar{x} + K s \tag{6-29}$$
or, alternatively,
$$\frac{x}{\bar{x}} = 1 + K C_{v} \tag{6-30}$$
in which C_{v} = variance coefficient. Equation 6-29 was proposed by Chow [5] as a general equation for hydrologic frequency analysis. For any probability distribution, a relationship can be determined between frequency factor and return period. This relationship can be expressed in analytical terms, in the form of tables, or by K-T curves. In using the procedure, the statistical parameters are first determined from the analysis of the flood series. For a given return period, the frequency factor is determined from the curves or tables and the flood magnitude computed by Eq. 6-29.
Log Pearson III Method
The log Pearson III method of flood frequency analysis is described in Bulletin 17B: Guidelines for Determining Flood Flow Frequency, published by the U.S. Interagency Advisory Committee on Water Data, Reston, Virginia [31].
Methodology
To apply the method, the following steps are necessary:
The procedure is illustrated by the following example.
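As a rough illustration of how the frequency factor formula is applied to the logarithms of a flood series, the sketch below takes base-10 logarithms of the annual maxima, computes their mean, standard deviation, and skew coefficient, and evaluates log Q = ȳ + K s_{y}. Two assumptions should be noted: the frequency factor K is estimated with the Wilson-Hilferty approximation instead of being read from tables of Pearson Type III frequency factors, and no weighted skew, outlier, or historical adjustment is made. The function names and sample flows are hypothetical.

```python
import math
from statistics import NormalDist

def frequency_factor_wh(skew, return_period):
    """Approximate Pearson Type III frequency factor (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)  # standard normal variate
    if abs(skew) < 1e-6:
        return z  # zero skew: the distribution of the logarithms is normal
    k = skew / 6.0
    return (2.0 / skew) * ((1.0 + k * z - k * k) ** 3 - 1.0)

def log_pearson3_flood(flows, return_period):
    """Flood estimate for a return period from the moments of the base-10 logarithms."""
    y = [math.log10(q) for q in flows]
    n = len(y)
    mean = sum(y) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))
    skew = n * sum((v - mean) ** 3 for v in y) / ((n - 1) * (n - 2) * std ** 3)
    k = frequency_factor_wh(skew, return_period)
    return 10.0 ** (mean + k * std)

# Hypothetical annual maxima in m3/s.
floods = [1200, 850, 2300, 640, 1500, 980, 3100, 720, 1100, 1750]
for t in (2, 10, 50, 100):
    print(t, round(log_pearson3_flood(floods, t)))
```

The approximation is reasonable for moderate skew; for large absolute skew or for design work, tabulated frequency factors should be preferred.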
Regional Skew Characteristics
The skew coefficient of the flood series (i.e., the station skew) is sensitive to extreme events. The overall accuracy of the method is improved by using a weighted value of skew in lieu of the station skew. First, a value of regional skew is obtained, and the weighted skew is calculated by weighting station and regional skews in inverse proportion to their mean square errors (MSE). The formula for weighted skew is the following:
$$C_{sw} = \frac{(MSE)_{sr} \, C_{sy} + (MSE)_{sy} \, C_{sr}}{(MSE)_{sr} + (MSE)_{sy}} \tag{6-33}$$
in which C_{sw} = weighted skew; C_{sy} = station skew; C_{sr} = regional skew; (MSE)_{sy} = mean square error of the station skew; and (MSE)_{sr} = mean square error of the regional skew. To develop a value of regional skew, it is necessary to assemble data from at least 40 stations or, alternatively, all stations within a 160-km radius. The stations should have at least 25 y of record. In certain cases, the paucity of data may require a relaxation of these criteria. The procedure includes analysis by three methods: (1) skew isolines map, (2) skew prediction equation, and (3) statistics of station skews.
To develop a skew isolines map, each station skew is plotted on a map at the centroid of its catchment area, and the plotted data are examined to identify any geographic or topographic trends. If a pattern is evident, isolines (lines of equal skew) are drawn and the MSE is computed. The MSE is the mean of the square of the differences between observed skews and isoline skews. If no pattern is evident, an isoline map cannot be developed, and this method is not considered further. In the second method, a prediction equation is used to relate station skew to catchment properties and climatological variables. The MSE is the mean of the square of the differences between observed and predicted skews. In the third method, the mean and variance of the station skews are calculated. In some cases, the variability of runoff may be such that all the stations may not be hydrologically homogeneous. If this is the case, the values of about 20 stations can be used to calculate the mean and variance of the data.
Of the three methods, the one providing the most accurate estimate of skew coefficient is selected. First a comparison of the MSEs from the isolines map and prediction equations is made. Then the smaller MSE is compared to the variance of the data. If the smaller MSE is significantly smaller than the variance, it should be used in Eq. 6-33 as (MSE)_{sr}.
If this is not the case, the variance should be used as (MSE)_{sr}, with the mean of the station skews used as regional skew (C_{sr}). In the absence of regional skew studies, generalized values of regional skew for use in Eq. 6-33 can be obtained from Fig. 6-5. When regional skew is obtained from this figure, the mean square error of the regional skew is (MSE)_{sr} = 0.302. The mean square error of the station skew is approximated by the following formula:
$$(MSE)_{sy} = 10^{\, A - B \log (n/10)} \tag{6-34}$$
in which
$$A = -0.33 + 0.08 \, G \quad \text{for } G \le 0.90; \qquad A = -0.52 + 0.30 \, G \quad \text{for } G > 0.90$$
$$B = 0.94 - 0.26 \, G \quad \text{for } G \le 1.50; \qquad B = 0.55 \quad \text{for } G > 1.50$$
with G = absolute value of the station skew, and n = record length in years.
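The weighting of station and regional skews can be sketched in a few lines. The code below assumes the weighted skew formula and the approximation for the mean square error of the station skew as given in Bulletin 17B [31], with base-10 logarithms; the constants are reproduced from that bulletin and should be checked against it before use. The function names are illustrative, and the default regional MSE of 0.302 applies only when the regional skew is read from the generalized skew map.

```python
import math

def mse_station_skew(station_skew, n):
    """Approximate mean square error of the station skew for a record of n years."""
    g = abs(station_skew)
    a = -0.33 + 0.08 * g if g <= 0.90 else -0.52 + 0.30 * g
    b = 0.94 - 0.26 * g if g <= 1.50 else 0.55
    return 10.0 ** (a - b * math.log10(n / 10.0))

def weighted_skew(station_skew, regional_skew, n, mse_regional=0.302):
    """Weight station and regional skews in inverse proportion to their MSEs."""
    mse_station = mse_station_skew(station_skew, n)
    return (mse_regional * station_skew + mse_station * regional_skew) / (
        mse_regional + mse_station
    )
```

Because the MSE of the station skew decreases as the record length increases, a long record pulls the weighted skew toward the station value, while a short record pulls it toward the regional value.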
Treatment of Outliers
Outliers are data points that depart significantly from the overall trend of the data. The treatment of these outliers (i.e., their retention, modification, or deletion) may have a significant effect on the value of the statistical parameters computed from the data, particularly for small samples. Procedures for treatment of outliers invariably require judgment involving mathematical and hydrologic considerations. The detection and treatment of high and low outliers in the log Pearson III method is performed in the following way [31]. For station skew greater than +0.4, tests for high outliers are considered first. For station skew less than -0.4, tests for low outliers are considered first. For station skew in the range -0.4 to +0.4, tests for high and low outliers are considered simultaneously, without eliminating any outliers from the data. The following equation is used to detect high outliers:
$$y_{H} = \bar{y} + K_{n} \, s_{y} \tag{6-35}$$
in which y_{H} = high outlier threshold (in log units); ȳ and s_{y} = mean and standard deviation of the logarithms of the flood series; and K_{n} = outlier frequency factor, a function of record length n. Values of K_{n} are given in Table A-7 (Appendix A). Values of y_{i} (logarithms of the flood series) greater than y_{H} are considered to be high outliers. If there is sufficient evidence to indicate that a high outlier is a maximum in an extended period of time, it is treated as historical data. Otherwise, it is retained as part of the flood series.
Historical data refers to flood information outside of the flood series, which may be used to extend the record to a period much longer than that of the flood series. Historical knowledge is used to define the historical period H, which is longer than the record period n. The number z of events that are known to be the largest in the historical period are given a weight of 1. The remaining n events from the flood series are given a weight of (H - z)/n. For instance, for a record length n = 44 y, a historical period H = 77 y, and a number of peaks in the historical period z = 3, the weight applied to the three historical peaks would be 1, and the weight applied to the remaining flood series would be (77 - 3)/44 = 1.68. In other words, the record is extended to 77 y, and the 44 y of flood series (excluding outliers that have been considered part of the historical data) represent 74 y of data in the historical period of 77 y [31]. The following equation is used to detect low outliers:
$$y_{L} = \bar{y} - K_{n} \, s_{y} \tag{6-36}$$
in which y_{L} = low outlier threshold (in log units) and other terms are as defined previously. If an adjustment for historical data has been previously made, the values on the right-hand side of Eq. 6-36 are those previously used in the historically weighted computation. Values of y_{i} smaller than y_{L} are considered to be low outliers and deleted from the flood series [31].
Complements to Flood Frequency Estimates
The accuracy of flood estimates based on frequency analysis deteriorates for return periods much longer than the record length. This is due to sampling error and to the fact that the underlying distribution is not known with certainty. Alternative procedures that complement the information provided by flood frequency analysis are recommended. These procedures include flood estimates from precipitation data (e.g., unit hydrograph, Chapter 5) and comparison with catchments of similar hydrologic characteristics (regional analysis, Chapter 7). Table 6-4 shows the relationship between the various types of analysis used in flood frequency studies.
Gumbel's Extreme Value Type I Method
The extreme value Type I distribution, also known as the Gumbel method [16], or EV1, has been widely used in the United States and other countries. The method is a special case of the three-parameter GEV distribution described in the British Flood Studies Report [23]. The cumulative density function F(x) of the Gumbel method is the double exponential, Eq. 6-19, repeated here for convenience:
$$F(x) = e^{-e^{-y}} \tag{6-19}$$
in which F(x) is the probability of nonexceedance. In flood frequency analysis, the probability of interest is the probability of exceedance, i.e., the complementary probability to F(x):
$$1 - F(x) = 1 - e^{-e^{-y}} \tag{6-37}$$
The return period T is the reciprocal of the probability of exceedance. Therefore,
$$\frac{1}{T} = 1 - e^{-e^{-y}} \tag{6-38}$$
From Eq. 6-38:
$$y = -\ln \ln \frac{T}{T - 1} \tag{6-39}$$
In the Gumbel method, values of flood discharge are obtained from the frequency formula, Eq. 6-29, repeated here for convenience:
$$x = \bar{x} + K s \tag{6-29}$$
The frequency factor K is evaluated with the frequency formula:
$$K = \frac{y - \bar{y}_{n}}{\sigma_{n}} \tag{6-40}$$
in which y = Gumbel (reduced) variate, a function of return period (Eq. 6-39); and ȳ_{n} and σ_{n} are the mean and standard deviation of the Gumbel variate, respectively. These values are a function of record length n (see Table A-8, Appendix A). In Eq. 6-29, for K = 0, x is equal to the mean annual flood x̄. Likewise, in Eq. 6-40, for K = 0, the Gumbel variate y is equal to its mean ȳ_{n}. The limiting value of ȳ_{n}, for n → ∞, is the Euler constant, 0.5772 [28]. In Eq. 6-38, for y = 0.5772: T = 2.33 years. Therefore, the return period of 2.33 y is taken as the return period of the mean annual flood. From Eqs. 6-29 and 6-40:
$$x = \bar{x} + \frac{y - \bar{y}_{n}}{\sigma_{n}} \, s \tag{6-41}$$
and with Eq. 6-39:
$$x = \bar{x} - \frac{\ln \ln \dfrac{T}{T - 1} + \bar{y}_{n}}{\sigma_{n}} \, s \tag{6-42}$$
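The link between return period and Gumbel variate is easy to verify numerically. The sketch below assumes the double exponential CDF, F = exp[-exp(-y)], with T taken as the reciprocal of 1 - F; the function names are illustrative. The mean and standard deviation of the variate are passed as arguments, since for a finite record they come from a table.

```python
import math

def gumbel_variate(return_period):
    """Gumbel (reduced) variate y for a return period T > 1."""
    return -math.log(math.log(return_period / (return_period - 1.0)))

def gumbel_return_period(y):
    """Return period T associated with a Gumbel variate y."""
    return 1.0 / (1.0 - math.exp(-math.exp(-y)))

def gumbel_frequency_factor(return_period, y_mean, y_std):
    """Frequency factor K = (y - y_mean) / y_std for tabulated y_mean and y_std."""
    return (gumbel_variate(return_period) - y_mean) / y_std
```

Evaluating `gumbel_return_period(0.5772)` gives about 2.33 y, the return period associated with the mean annual flood, and `gumbel_variate(100)` gives about 4.60.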
The following steps are necessary to apply the Gumbel method:
Values of Q are plotted against y or T (or P) on Gumbel probability paper, and a straight line is drawn through the points. Gumbel probability paper has an arithmetic scale of Gumbel variate y in the abscissas and an arithmetic scale of flood discharge Q in the ordinates. To facilitate the reading of frequencies and probabilities, Eq. 6-38 may be used to superimpose a scale of return period T (or probability P) on the arithmetic scale of Gumbel variate y.
Modifications to the Gumbel Method
Since its inception in the 1940s, several modifications to the Gumbel method have been proposed. Gringorten [12] has shown that the Gumbel distribution does not follow the Weibull plotting rule, Eq. 6-26 (or Eq. 6-27 with a = 0). He recommended a = 0.44, which led to the Gringorten plotting position formula:
$$\frac{1}{T} = P = \frac{m - 0.44}{n + 0.12} \tag{6-43}$$
Lettenmaier and Burges [22] have suggested that better flood estimates are obtained by using the limiting values of mean and standard deviation of the Gumbel variate (i.e., those corresponding to n = ∞). In this case, ȳ_{n} = 0.5772, and σ_{n} = π/6^{1/2} = 1.2825. Therefore, Eq. 6-41 reduces to
$$x = \bar{x} + (0.78 \, y - 0.45) \, s \tag{6-44}$$
and Eq. 6-42 reduces to
$$x = \bar{x} - \left( 0.78 \ln \ln \frac{T}{T - 1} + 0.45 \right) s \tag{6-45}$$
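A minimal sketch of a Gumbel flood estimate using the limiting values of the mean and standard deviation of the Gumbel variate follows. The defaults (0.5772 and π/6^{1/2}) are the limiting values quoted above; for a finite record, tabulated values can be passed instead. The function name and the sample statistics are hypothetical.

```python
import math

EULER_CONSTANT = 0.5772                    # limiting mean of the Gumbel variate
LIMITING_STD = math.pi / math.sqrt(6.0)    # limiting standard deviation, about 1.2825

def gumbel_flood(mean, std, return_period,
                 y_mean=EULER_CONSTANT, y_std=LIMITING_STD):
    """Flood estimate x = mean + K * std, with K = (y - y_mean) / y_std."""
    y = -math.log(math.log(return_period / (return_period - 1.0)))
    return mean + (y - y_mean) / y_std * std

# Hypothetical flood series statistics: mean 1000 m3/s, standard deviation 300 m3/s.
for t in (2.33, 10, 50, 100):
    print(t, round(gumbel_flood(1000.0, 300.0, t)))
```

With the limiting values, the frequency factor is approximately 0.78 y - 0.45, so the estimate for T = 2.33 y falls essentially on the mean of the series.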
Lettenmaier and Burges [22] have also suggested that a biased variance estimate, using n as the divisor in Eq. 6-3, yields better estimates of extreme events than the usual unbiased estimate, that is, the divisor n - 1.
Comparison Between Flood Frequency Methods
In 1966, the Hydrology Subcommittee of the U.S. Water Resources Council began work on selecting a suitable method of flood frequency analysis that could be recommended for general use in the United States. The committee tested the goodness of fit of six distributions: (1) lognormal, (2) log Pearson III, (3) Hazen, (4) gamma, (5) Gumbel (EV1), and (6) log Gumbel (EV2). The study included ten sets of records, the shortest of which was 40 y. The findings showed that the first three distributions had smaller average deviations than the last three. Since the Hazen distribution is a type of lognormal distribution and the lognormal is a special case of the log Pearson III, the committee concluded that the latter was the most appropriate of the three, and hence recommended it for general use.
The same type of analysis was repeated for six sets of records in the United Kingdom, the shortest of which was 32 y [2]. The methods were: (1) gamma, (2) log gamma, (3) lognormal, (4) Gumbel (EV1), (5) GEV, (6) Pearson Type III, and (7) log Pearson III. At low return periods (from 2 to 5 y), the GEV and Pearson Type III showed the smallest average deviations, whereas for return periods exceeding 10 y the log Pearson III method had the smallest average deviations.
Similar comparative studies were reported in the British Flood Studies Report [23]. The study concluded that the three-parameter distributions (GEV, Pearson Type III, and log Pearson III) provided a better fit than the two-parameter distributions (Gumbel, lognormal, gamma, log gamma). Based on mean absolute deviation criteria, the study rated the log Pearson III method better than the GEV and the latter better than the Pearson Type III. However, based on root mean square deviation, it rated the Pearson Type III better than both the log Pearson III and GEV distributions.
Although in general the three-parameter methods seemed to fare better than the two-parameter methods, the latter should not be completely discarded. The British Flood Studies Report [23] observed that their use in connection with short record lengths often leads to results which are more sensible than those obtained by fitting three-parameter distributions. A three-parameter distribution fitted to a small sample may in some cases imply that there is an upper bound to the flood discharge equal to about twice the mean annual flood. While there may be an upper limit to flood magnitude, it is certainly higher than twice the mean annual flood.
6.3 LOW-FLOW FREQUENCY
Whereas high flows lead to floods, sustained low flows can lead to droughts. A drought is defined as a lack of rainfall so great and continuing so long as to affect the plant and animal life of a region adversely and to deplete domestic and industrial water supplies, especially in those regions where rainfall is normally sufficient for such purposes [18]. In practice, a drought refers to a period of unusually low water supplies, regardless of the water demand.
The regions most subject to droughts are those with the greatest variability in annual rainfall. Studies have shown that regions where the variance coefficient of annual rainfall exceeds 0.35 are more likely to have frequent droughts [6]. Low annual rainfall and high annual rainfall variability are typical of arid and semiarid regions. Therefore, these regions are more likely to be prone to droughts.
Studies of tree rings, which document long-term trends of rainfall, show clear patterns of periods of wet and dry weather [30]. While there is no apparent explanation for the cycles of wet and dry weather, the dry years must be considered in planning water resource projects. Analysis of long records has shown that there is a tendency for dry years to group together. This indicates that the sequence of dry years is not random, with dry years tending to follow other dry years. It is therefore necessary to consider both the severity and duration of a drought period. The severity of droughts can be established by measuring:
Alternatively, low-flow-frequency analysis can be used in the assessment of the probability of occurrence of droughts of different durations.
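The rainfall-variability criterion cited above (a variance coefficient of annual rainfall exceeding 0.35) can be checked against a record of annual totals. The following is a minimal sketch, assuming the variance coefficient is the ratio of the sample standard deviation to the mean; the function names and the sample data in the usage note are hypothetical.

```python
import statistics


def variance_coefficient(annual_rainfall):
    """Sample standard deviation divided by the mean (assumed definition)."""
    mean = statistics.mean(annual_rainfall)
    if mean <= 0:
        raise ValueError("mean annual rainfall must be positive")
    return statistics.stdev(annual_rainfall) / mean


def is_drought_prone(annual_rainfall, threshold=0.35):
    """True if the variance coefficient exceeds the threshold quoted in the text."""
    return variance_coefficient(annual_rainfall) > threshold
```

For instance, annual totals of 100, 200, and 300 mm have a variance coefficient of 0.5 and would be flagged, whereas totals of 90, 100, and 110 mm (variance coefficient 0.1) would not.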
Methods of low-flow frequency analysis are based on an assumption of invariance of meteorological conditions. The absence of long records, however, imposes a stringent limitation on low-flow frequency analysis. When records of sufficient length are available, analysis begins with the identification of the low-flow series. Either the annual minima or the annual exceedance series are used. In a monthly analysis, the annual minima series is formed by the lowest monthly flow volumes in each year of record. If the annual exceedance method is chosen, the lowest monthly flow volumes in the record are selected, regardless of when they occurred. In the latter method, the number of values in the series need not be equal to the number of years of record.
A flow duration curve can be used to give an indication of the severity of low flows. Such a curve, however, does not contain information on the sequence of low flows or the duration of possible droughts. The analysis is made more meaningful by abstracting the minimum flows over a period of several consecutive days. For instance, for each year, the 7-day period with minimum flow volume is abstracted, and the minimum flow is the average flow rate for that period. A frequency analysis on the low-flow series, using the Gumbel method, for instance, results in a function describing the probability of occurrence of low flows of a certain duration. The same analysis repeated for other durations leads to a family of curves depicting low-flow frequency, as shown in Fig. 6-7 [28].
In reservoir design, the assessment of low flows is aided by a flow-mass curve. The technique involves the determination of storage volumes required for all low-flow periods. Although it is practically impossible to provide sufficient storage to meet hydrologic risks of great rarity, common practice is to provide for a stated risk (i.e., a drought probability) and to add a suitable percent of the computed storage volume as reserve storage allowance.
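The abstraction of the 7-day minimum flow for each year, described above, can be sketched as follows. This is an illustrative sketch only: it assumes that daily flows are already grouped by year and that the averaging window does not cross from one year into the next; the function names are hypothetical.

```python
def annual_min_k_day_flow(daily_flows, k=7):
    """Lowest k-day average flow rate within one year of daily flows."""
    if len(daily_flows) < k:
        raise ValueError("need at least k daily values")
    # Rolling sum over a window of k consecutive days.
    window_sum = sum(daily_flows[:k])
    min_sum = window_sum
    for i in range(k, len(daily_flows)):
        window_sum += daily_flows[i] - daily_flows[i - k]
        min_sum = min(min_sum, window_sum)
    return min_sum / k


def low_flow_series(flows_by_year, k=7):
    """Annual minima series: {year: [daily flows]} -> {year: minimum k-day mean}."""
    return {year: annual_min_k_day_flow(q, k) for year, q in flows_by_year.items()}
```

The resulting series has one value per year of record and can then be passed to a frequency analysis. Repeating the computation with other values of k gives the series for other durations.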
The variance coefficient of annual flows is used in determining the risk and storage allowance levels. Extraordinary drought levels are then met by cutting draft rates.
Regulated rivers may alter natural flow conditions to provide a minimum downstream flow for specific purposes. In this case, the reservoirs serve as the mechanism to diffuse the natural flow variability into downstream flows which can be made to be nearly constant in time. Regulation is necessary for downstream low-flow maintenance, usually for the purpose of meeting agricultural, municipal and industrial water demands, minimum instream flows, navigation draft, and water pollution control regulations.
6.4 DROUGHTS
Drought is a weather-related natural phenomenon, affecting regions of the Earth for months or years. It has an impact on food production, reducing life expectancy and the economic performance of large geographic regions or entire countries. Drought is a recurrent feature of the climate; it occurs in virtually all climatic zones, with its characteristics varying significantly among regions. Drought differs from aridity in that drought is temporary; aridity is a permanent characteristic of regions with low rainfall.
Drought is related to a deficiency of precipitation over an extended period of time, usually for a season or more (Fig. 6-8). This deficiency results in a water shortage for some activity, group, or environmental sector. Drought is also related to the timing of precipitation. Other climatic factors such as high temperature, high wind, and low relative humidity are often associated with drought.
Drought is more than a physical phenomenon or natural event. Its impact results from the relation between a natural event and the demands on the water supply, and it is often exacerbated by human activities. The experience from droughts has underscored the vulnerability of human societies to this natural hazard.
Definition of drought
Drought definitions are of two types: (1) conceptual, and (2) operational. Conceptual definitions help understand the meaning of drought and its effects. For example, drought is a protracted period of precipitation deficiency which causes extensive damage to crops, resulting in loss of yield.
Operational definitions help identify the drought's beginning, end, and degree of severity. To determine the beginning of drought, operational definitions specify the degree of departure from the precipitation average over some time period. This is usually accomplished by comparing the current situation (the study period) with the historical average. The threshold identified as the beginning of a drought (e.g., 75% of average precipitation over a specified time period) is usually established somewhat arbitrarily. An operational definition for agriculture may compare daily precipitation to evapotranspiration to determine the rate of soil-moisture depletion, and express these relationships in terms of drought effects on plant behavior.
Operational definitions are used to analyze drought frequency, severity, and duration for a given historical period. Such definitions, however, require weather data on hourly, daily, monthly, or other time scales and, possibly, impact data (e.g., crop yield). A climatology of drought for a given region provides a greater understanding of its characteristics and the probability of recurrence at various levels of severity. Information of this type is beneficial in the formulation of mitigation strategies.
Types of droughts
The following types of drought have been identified: (1) meteorological, (2) agricultural, (3) hydrological, and (4) socioeconomic.
Meteorological drought is defined on the basis of the degree of dryness, in comparison to a normal or average amount, and the duration of the dry period. Definitions of meteorological drought must be region-specific, since the atmospheric conditions that result in deficiencies of precipitation are highly variable. The variety of meteorological definitions in different countries illustrates why it is not possible to apply a definition of drought developed in one part of the world to another. For instance, the following definitions of drought have been reported:
Data sets required to assess meteorological drought are: (1) daily rainfall, (2) temperature, (3) humidity, (4) wind velocity, and (5) evaporation.
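An operational threshold of the kind mentioned earlier (for example, 75% of average precipitation over a specified time period) can be applied to an annual precipitation record as sketched below. The sketch assumes that the historical average is the mean of the supplied record and that the time period is one year; both the function name and these choices are illustrative, not a prescribed method.

```python
def drought_years(precip_by_year, threshold_fraction=0.75):
    """Years whose precipitation falls below a fraction of the record mean."""
    if not precip_by_year:
        raise ValueError("precipitation record is empty")
    mean = sum(precip_by_year.values()) / len(precip_by_year)
    cutoff = threshold_fraction * mean
    return sorted(year for year, p in precip_by_year.items() if p < cutoff)
```

Because the threshold is set somewhat arbitrarily, it is kept as a parameter so that the sensitivity of the result to this choice can be examined.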
Agricultural drought links various characteristics of meteorological drought to agricultural impacts, focusing on precipitation shortages, differences between actual and potential evapotranspiration, soil-water deficits, reduced groundwater or reservoir levels, and so on. Plant water demand depends on prevailing weather conditions, biological characteristics of the specific plant, its stage of growth, and the physical and biological properties of the soil. A good definition of agricultural drought should account for the susceptibility of crops during different stages of crop development. Deficient topsoil moisture at planting may hinder germination, leading to low plant populations per hectare and a reduction of yield. Data sets required to assess agricultural drought are: (1) soil texture, (2) soil fertility, (3) soil moisture, (4) crop type and area, (5) crop water requirements, (6) pests, and (7) climate.
Hydrological drought refers to a persistently low discharge and/or volume of water in streams and reservoirs, lasting months or years. Hydrological drought is a natural phenomenon, but it may be exacerbated by human activities. Hydrological droughts are usually related to meteorological droughts, and their recurrence interval varies accordingly. Changes in land use and land degradation can affect the magnitude and frequency of hydrological droughts. Data sets required to assess hydrological drought are: (1) surface-water area and volume, (2) surface runoff, (3) streamflow measurements, (4) infiltration, (5) water-table fluctuations, and (6) aquifer properties.
Socioeconomic drought associates the supply and demand of some economic good with elements of meteorological, hydrological, and agricultural drought. It differs from the other types of drought in that its occurrence depends on the processes of supply and demand. The supply of many economic goods, such as water, forage, food grains, fish, and hydroelectric power, depends on the weather. Due to the natural variability of climate, water supply is ample in some years but insufficient to meet human and environmental needs in other years. Socioeconomic drought occurs when the demand for an economic good exceeds the supply as a result of a weather-related shortfall in water supply. For example, a drought may significantly reduce hydroelectric power production where power plants depend on streamflow rather than storage for power generation. Reduced hydroelectric power production may require the government to convert to more expensive petroleum alternatives and to commit to stringent energy conservation measures to meet its power needs.
The demand for economic goods is increasing as a result of population growth and economic development. The supply may also increase because of improved production efficiency, technology, or the construction of reservoirs. When both supply and demand increase, the critical factor is their relative rate of change. Socioeconomic drought is promoted when the demand for water for economic activities far exceeds the supply. Data sets required to assess socioeconomic drought are: (1) human and animal population, (2) growth rate, (3) water and fodder requirements, (4) severity of crop failure, and (5) industry type and water requirements.
Intensity-Duration-Frequency Relations
The relations between intensity, duration, and frequency of droughts may be analyzed by the conceptual model described in Table 6-6 [27]. The conceptual approach is applicable to meteorological droughts lasting at least one year, in midlatitudinal regions where the prevailing climate may be primarily characterized by precipitation. The climate types, from superarid to superhumid, are defined in terms of mean annual precipitation P_{ma} (mm) as shown in Table 6-6, Line 1:
The (mean) annual global terrestrial precipitation is P_{agt} = 800 mm [27]. At the extremes of the climatic spectrum, mean annual precipitation is less than 100 mm (superarid), or greater than 6400 mm (superhumid). The superarid example is the Atacama desert, in northern Chile, with mean annual precipitation P_{ma} = 0.5 mm, which is hardly measurable. The superhumid example is Cherrapunji, in Meghalaya, Eastern India, with mean annual precipitation P_{ma} = 11,777 mm, long considered by many as the wettest spot on Earth. However, Mawsynram, near Cherrapunji, now boasts P_{ma} = 11,873 mm, effectively edging Cherrapunji out of the distinction. Climate types may also be defined in terms of the ratio of mean annual precipitation P_{ma} to (mean) annual global terrestrial precipitation P_{agt} (Line 2). The ratio P_{ma}/P_{agt} = 1 depicts the middle of the climatic spectrum. The conceptual model is also defined in terms of the annual potential evaporation (evapotranspiration) E_{ap} (Line 3) and of the ratio of annual potential evaporation to mean annual precipitation E_{ap}/P_{ma} (Line 4). The ratio E_{ap}/P_{ma} = 2 describes the middle of the climatic spectrum. To complement the description, the length of the rainy season L_{rs} is also indicated (Line 5).
For any year for which P is the annual precipitation, drought intensity I is defined as the ratio of the deficit (P_{ma} - P) to the mean (P_{ma}). For any year, an intensity I = 0.25 is classified as moderate;
For drought durations lasting two years or more, intensity is the summation of the individual annual intensities (Lines 6-8, Table 6-6). Therefore, the longer the drought duration, the greater the drought intensity. Extreme intensities are generally associated with droughts of long duration. Experience has shown that the longer droughts generally occur around the middle of the climatic spectrum (800 mm of mean annual precipitation). Drought duration varies between 1 yr (or less) at the extremes of the climatic spectrum and (about) 6 yr around the middle (Line 9) [26]. Droughts lasting more than 6 yr are uncommon; they are more likely to be driven by anthropogenic pressures, for instance, deforestation or overgrazing [25]. A classical example of an anthropogenically derived drought is that of the Sahel, in Northern Africa (Fig. 6-9), where, in the past 40 years, droughts have had a tendency to persist for durations much longer than normal. Figure 6-10 shows values of standardized annual seasonal rainfall (June-October) in the Sahel for the period 1898-2004. The standardized annual rainfall has zero mean and unit standard deviation. Note that through the 1980s, drought in the Sahel persisted for more than 10 years.
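The intensity definition given earlier, I = (P_{ma} - P) / P_{ma} for a single year, and its summation over a multi-year drought can be sketched as follows. The function names and the sample precipitation values in the usage note are hypothetical and serve only to illustrate the arithmetic.

```python
def annual_intensity(p, p_ma):
    """Drought intensity for one year: deficit (P_ma - P) divided by P_ma."""
    if p_ma <= 0:
        raise ValueError("mean annual precipitation must be positive")
    return (p_ma - p) / p_ma


def drought_intensity(annual_precip, p_ma):
    """Intensity of a multi-year drought: sum of the annual intensities."""
    return sum(annual_intensity(p, p_ma) for p in annual_precip)
```

For P_{ma} = 800 mm, a single year with P = 600 mm gives I = 0.25. A two-year drought with 600 mm followed by 400 mm gives I = 0.25 + 0.5 = 0.75, which illustrates why longer droughts accumulate greater intensities.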
Figure 6-9 Mean annual precipitation in the Sahel, North Africa.
In general, the dry periods (drought events) are followed by corresponding wet periods. Therefore, the drought recurrence interval (i.e., the reciprocal of the frequency) is always greater than the drought duration. Drought recurrence intervals increase from 2 years on the extreme dry side of the climatic spectrum (superarid) to (more than) 100 years on the extreme wet side (superhumid) (Line 10, Table 6-6).
QUESTIONS
PROBLEMS
REFERENCES
Suggested Readings
