Abstract
Climate change is increasing the risk of atmospheric perils, which is driving broader uptake of catastrophe models. Historically niche products used by finance experts and actuaries, catastrophe models are now utilised across sectors and by the public to understand location-specific risk. We analyse the performance of seven catastrophe flood models: Verisk, KatRisk, Moody’s RMS, Karen Clark & Company, Aon, Florida Public Flood Loss Model, and First Street Technology, Inc. We find that the three catastrophe models used in National Flood Insurance Program (NFIP) pricing (Verisk, KatRisk, and Moody’s RMS) accurately represent historical flood losses, with a 4% differential compared to NFIP claims data. Of the five catastrophe flood models compared for Florida, four report similar damage estimates, while one, Karen Clark & Company, shows flood damage approximately 13x lower than the others. In contrast, First Street Technology, Inc., reports flood damage estimates roughly twice as high across the United States. The wide range of loss estimates from the state level to the asset level indicates that methodologies vary significantly across modelers. Further transparency of catastrophe model outputs and methodologies is necessary so that these models accurately price climate risk, prevent maladaptation, and support effective risk reduction investments.
Key Points
- Catastrophe models are becoming widely used, but public validation of these models is limited.
- Verisk, Moody’s RMS, and KatRisk loss rates are within 4% of historical losses from the National Flood Insurance Program.
- First Street Technology, Inc. reports higher flood losses than other models nationally, while Karen Clark & Company reports lower flood losses in Florida.
1. Introduction
Catastrophe modelers and climate risk data providers have proliferated over the past 10 years as companies, governments, and communities look to prepare for the impacts of climate change. Yet, there is little disclosure of methodology and validation of these models (Condon, 2023; Pollack et al., 2023). Publication of model validation analyses is vital as these models have a large influence on society through the insurance, financial, and real estate systems (Condon, 2023). Catastrophe modelers provide data to insurance firms, reinsurance firms (i.e., insurance for insurers), and financial institutions. First Street Technology, Inc. (First Street) provides climate risk information generated from their catastrophe models to more than 30 US federal agencies, state and local governments, real estate portals such as Realtor.com, Redfin.com and Zillow.com, and financial companies (Eby, 2023). Validating these models is crucial not only to ensure risk is not over- or underpriced, but also because these financial signals motivate market behaviour that determines risk exposure. Just as importantly, many public users with minimal training in assessing risk now have access to catastrophe model and climate risk data. Without public validation, many of these users are unaware of the models’ limitations.
Disclosure of methodology and publication of model output is not a common occurrence in the catastrophe modeling industry. Occasionally, modelers such as Verisk and RMS release a white paper detailing a validation exercise, but with minimal information on methodology and model output (RMS, 2012; Wojtkiewicz and Ramanathan, 2020). Flow of information between academia and government agencies has largely been one directional: catastrophe modelers often rely on scientific literature to build models, but do not typically share data or methodologies with researchers (Guin et al., 2024). An exception is First Street, which has published detailed methodologies and reports detailing aggregated results, although geospatial and property-specific data are hidden behind a paywall. In February 2021, First Street published a white paper estimating the average annual loss (AAL) from flooding for the continental United States (CONUS), disaggregated by state. AAL is the average loss per year over a specified time period. Insurers use AAL to estimate how much they should charge for policies, and it is a common output of catastrophe models used by risk management practitioners.
Since catastrophe modelers provide so little data to the public, there are very few published studies that compare catastrophe models. Schubert et al. (2024) analysed the results of two flood models, First Street and PRIMo-Drain, in Los Angeles, CA and found the two models agreed on which properties were at risk roughly 1 out of every 4 times. Chegwidden et al. (2024) looked at how flood and wildfire risk data from XDI and Jupiter Intelligence, two climate risk data providers, (dis)agree and found similarities at the state level but “systematic differences” at the property level. Both studies demonstrate that models vary widely in their assessment of asset-level risk. A comparison of eight global flood models, including catastrophe modelers KatRisk, Fathom, and JBA, for China revealed variability up to a factor of 4 in inundation area and gross domestic product at risk (Aerts et al., 2020). Validation of the Fathom flood model (First Street uses the same flood hazard model as Fathom but different vulnerability and exposure models) showed strong agreement with historical flood losses in the United Kingdom (Bates et al., 2023). The Fathom flood catastrophe model has also been validated previously using National Flood Insurance Program (NFIP) loss data (Wing et al., 2022), and we attempt to reproduce components of that validation exercise in this paper.
Additionally, the only government in the U.S. that regulates flood catastrophe models is the state of Florida. The Florida Commission on Hurricane Loss Projection Methodology (FCHLPM) reviews any catastrophe model that in-state insurers are allowed to use for pricing premiums. Three flood model submissions to the FCHLPM have been released as of this writing: Karen Clark & Company, Aon, and the Florida Public Flood Loss Model (FPFLM). Only the hurricane wind models submitted to the FCHLPM have been compared previously in the literature: Weinkle and Pielke Jr. (2017) found that, across five catastrophe models, the 95% confidence band range was $33 billion to $192 billion for the 100-year hurricane wind event. No comparison of FCHLPM flood models has been conducted. Little information on catastrophe model validation is therefore available in the public domain.
In this paper, we aim to address this data gap by comparing AAL flood results across seven catastrophe modelers: Verisk, KatRisk, Moody’s RMS (RMS), Karen Clark & Company, Aon, FPFLM, and First Street. Catastrophe modelers occasionally publish aggregate AAL estimates, but the Federal Emergency Management Agency (FEMA) does so regularly. These vendors submit model results to support NFIP (re)insurance pricing activities. This reinsurance placement data will be used as a comparison to the loss estimates from First Street (First Street, 2021). In addition to comparing results across catastrophe models, we will also validate the catastrophe models’ AAL using historical data from the NFIP. We focus on comparing AAL between catastrophe models because it is the only publicly available metric that can be used to compare models. Whether the AAL is computed from aggregate exceedance probabilities or from event tables, it is mathematically the same, regardless of correlations between properties. This is a crucial point since First Street released a new financial loss methodology in 2024 that incorporates correlations across properties. Therefore, the results presented in this paper apply similarly to the AAL of the new First Street correlated flood risk model, assuming vulnerability functions and exposure datasets are unchanged. Additionally, if the AAL of a model is historically inconsistent, it would raise concerns that other risk metrics produced by the model, such as tail value at risk, are also historically inconsistent. This is because the AAL and tail risk metrics are based on the same exceedance probability curve: if the AAL is shown to be unreliable, then the underlying exceedance probability curve, and the tail risk metrics derived from it, may also be unreliable.
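The claim that AAL does not depend on correlations between properties follows from linearity of expectation. The sketch below illustrates it with a hypothetical two-property portfolio; the loss amount and flood probability are made-up values, not figures from any model discussed here.

```python
from itertools import product

LOSS = 100.0  # hypothetical loss per flooded property
P = 0.01      # hypothetical annual flood probability per property


def independent_portfolio():
    """(probability, total loss) outcomes for two independent properties."""
    outcomes = []
    for a, b in product([0, 1], repeat=2):
        prob = (P if a else 1 - P) * (P if b else 1 - P)
        outcomes.append((prob, LOSS * (a + b)))
    return outcomes


def correlated_portfolio():
    """Outcomes when the two properties always flood together."""
    return [(1 - P, 0.0), (P, 2 * LOSS)]


def aal(outcomes):
    """Average annual loss: probability-weighted sum of annual losses."""
    return sum(prob * loss for prob, loss in outcomes)


def prob_loss_at_least(outcomes, threshold):
    """Probability that the annual portfolio loss reaches the threshold."""
    return sum(prob for prob, loss in outcomes if loss >= threshold)


aal_independent = aal(independent_portfolio())
aal_correlated = aal(correlated_portfolio())
tail_independent = prob_loss_at_least(independent_portfolio(), 2 * LOSS)
tail_correlated = prob_loss_at_least(correlated_portfolio(), 2 * LOSS)
```

In this toy case the two AAL values agree, while the probability of both properties flooding in the same year differs by two orders of magnitude, which is why tail metrics are sensitive to correlation even though AAL is not.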
2. Methods
2.1. Comparison of Catastrophe Models and FEMA Claims Data
This section is divided into two separate comparisons: 1) NFIP historical losses compared to catastrophe modeled losses; 2) Comparison between NFIP and First Street average relative number of properties flooded. Due to the limitations of data that is publicly available from the NFIP and catastrophe models, we limit the comparisons to these two analyses. In order to appropriately compare the AAL of catastrophe models to the historical NFIP AAL, both datasets require normalisation. Simply comparing the AAL values across datasets would be uninformative since the samples of properties differ and thus the exposure levels differ. For an appropriate comparison, we divide the AAL by the total coverage amount in thousands. This results in the AAL per US$1,000 of insurance coverage (also known as the burn rate or loss cost). The burn rate is a common metric in the insurance industry and, while commonly applied in property and casualty insurance, can also be used in health insurance (Clark, 1996).
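As a minimal illustration of this normalisation, the sketch below computes AAL per US$1,000 of coverage. The function name is hypothetical; the inputs are the NFIP AAL and coverage totals reported in Table 1.

```python
def burn_rate(aal_usd, coverage_usd):
    """Return AAL per US$1,000 of coverage (burn rate or loss cost)."""
    if coverage_usd <= 0:
        raise ValueError("coverage must be positive")
    return aal_usd / (coverage_usd / 1_000)


# NFIP values from Table 1: AAL of US$2.27 billion, coverage of US$1,306 billion.
nfip_rate = burn_rate(2.27e9, 1.306e12)
print(round(nfip_rate, 2))
```

Because both numerator and denominator scale with portfolio size, the resulting rate can be compared across datasets with different exposure totals.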
2.1.1. NFIP Loss Rate and Catastrophe Model Burn Rate Comparison
The loss rate of the NFIP is defined as the AAL in 2022 USD (building and contents) per US$1,000 of coverage in 2022 USD, as shown in Equation 1. We use the year 2022 because that is the year the catastrophe model data was generated. The payment amounts are taken from the NFIP claims database for 1978-2024, while the total coverage for 2022 was taken from the NFIP policies database. To represent present-day (i.e., 2022) exposure and losses, we modify NFIP annual losses by accounting for inflation and an increase in the amount of assets in a location, referred to as loss trending. Trending losses is commonly done when analysing a time series of historical damages from climate extremes (Grinsted et al., 2019; Muller et al., 2025; Pielke Jr et al., 2008). We convert the losses for each year to 2022 values by multiplying the annual losses by a trend factor that accounts for inflation and for change in exposure over time by zip code, as shown in Equation 2. The trend factor is indexed by zip code, year, and the state of the respective zip code, and uses the Consumer Price Index (CPI), following the methodology of Wojtkiewicz and Ramanathan (2020). Since NFIP policy data at the zip code level is only available since 2009, we use the total number of NFIP policies to trend the pre-2009 losses, using NFIP policy counts from the National Research Council for 2005 to 2008 and the Insurance Information Institute for 1978 to 2004 (NRC, 2015; Hartwig and Wilkinson, 2005).
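The loss trending step can be sketched as follows. The exact form of Equation 2 is not reproduced here, so the trend factor below is an assumption: the product of a CPI ratio (inflation) and a policy count ratio (exposure growth). All function names and input values are hypothetical and are not NFIP or CPI data.

```python
def trend_factor(cpi_target, cpi_year, exposure_target, exposure_year):
    """Assumed form: inflation ratio multiplied by exposure (policy count) ratio."""
    return (cpi_target / cpi_year) * (exposure_target / exposure_year)


def trend_losses(losses_by_year, cpi, policies, target_year=2022):
    """Scale each year's nominal loss to target-year price and exposure levels."""
    return {
        year: loss * trend_factor(
            cpi[target_year], cpi[year], policies[target_year], policies[year]
        )
        for year, loss in losses_by_year.items()
    }


# Hypothetical inputs for a single zip code.
cpi = {2010: 200.0, 2022: 300.0}
policies = {2010: 1_000, 2022: 1_200}
losses = {2010: 1_000_000.0}

trended = trend_losses(losses, cpi, policies)
```

With these made-up inputs, a 2010 loss is scaled by 1.5 for inflation and 1.2 for exposure growth before entering the 2022 loss rate.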
The burn rates for KatRisk, Verisk, and RMS were calculated as the total AAL per US$1,000 of coverage limit as provided in the NFIP reinsurance placement data (FEMA, 2023). We use the 2023 reinsurance placement data from FEMA for the three catastrophe modelers which is based on data from May 31, 2022 (FEMA, 2023). All three modelers include levees and large-scale flood barriers, but it is less clear if smaller forms of flood mitigation infrastructure such as pumps are included. Additionally, the losses from these three modelers represent gross losses (i.e., actual cash value, deductibles, and coinsurance are accounted for) as well as including demand surge. Gross losses are used rather than ground up losses because the NFIP payouts also represent gross losses. The catastrophe model annual exceedance probability losses published by FEMA are split into two perils: inland and coastal flooding. Only KatRisk and Verisk provide inland flood (i.e., pluvial and fluvial) values while all three modelers provide coastal (i.e., surge) flood values. The catastrophe model burn rate is expressed in Equation 4. We calculate the AAL using the entire loss distribution in addition to using Equation 3 where we limit the AAL calculation up to the 150-year event which corresponds to the highest return period of the annual losses (see Supplementary Materials for methods). We calculate the average burn rate by first adding the average inland losses from KatRisk and Verisk and the average coastal losses from KatRisk, Verisk, and RMS. The average coverage equals the average from all three modelers. Besides the ability to include RMS in the model average, model ensembles are often used by practitioners rather than just one model so we include the model average for this reason as well.
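In Equation 3, $L$ is the loss, $P$ is the probability of loss for the return period, and $i$ is the return period.

$$\text{Burn rate} = \frac{\text{AAL}}{\text{Coverage limit} / \text{US\$}1{,}000} \qquad (4)$$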
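A capped AAL calculation of this kind can be sketched as a numerical integration of loss over exceedance probability. The version below uses a trapezoidal rule between published return periods, which is one common convention and not necessarily the exact summation used in this paper; the exceedance probability curve is hypothetical.

```python
def aal_from_ep_curve(return_periods, losses, max_return_period=None):
    """Integrate loss over annual exceedance probability (trapezoidal rule).

    Only the range between the smallest and largest retained return periods
    is integrated; probability mass outside that range is ignored.
    """
    points = sorted(zip(return_periods, losses))
    if max_return_period is not None:
        points = [(rp, loss) for rp, loss in points if rp <= max_return_period]
    total = 0.0
    for (rp_lo, loss_lo), (rp_hi, loss_hi) in zip(points, points[1:]):
        width = 1.0 / rp_lo - 1.0 / rp_hi  # difference in exceedance probability
        total += 0.5 * (loss_lo + loss_hi) * width
    return total


# Hypothetical curve: return periods in years, losses in US$ billions.
rps = [5, 20, 100, 150, 500]
curve = [1.0, 4.0, 10.0, 12.0, 20.0]

aal_capped = aal_from_ep_curve(rps, curve, max_return_period=150)
aal_uncapped = aal_from_ep_curve(rps, curve)
```

Capping at the 150-year event drops the contribution of rarer, larger losses, so the capped AAL is always less than or equal to the uncapped value for a curve with non-negative losses.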
2.1.2. NFIP and First Street Average Relative Number of Properties Flooded Comparison
To understand how well First Street estimates the total number of properties flooded, we compare the average relative number of properties flooded in the NFIP compared to the average relative number of properties flooded as estimated by First Street. The NFIP average relative number of properties flooded is calculated by dividing the average annual number of properties flooded by the total number of policies in force in 2022. The average annual number of properties flooded was estimated by using all claims that were not denied, except those that were denied due to the damage amount being less than the deductible, and if the property was damaged before the inception of the policy. Each year’s total of properties flooded was loss trended to the 2022 exposure levels as described in the previous section. The number of policies in force for 2022 was taken from the NFIP policies database.
The First Street average relative number of properties flooded was calculated by first estimating the properties flooded probability curve from the First Street report, The First National Flood Risk Assessment: Defining America’s Growing Risk (First Street, 2020). All data from First Street used in this analysis represents ground-up physical risk, omits demand surge, includes pluvial, fluvial, and surge losses, and accounts for not only large scale defense infrastructure, such as levees, but smaller scale municipal pumps and green infrastructure. First Street provides the number of properties flooded for the five-year, 100-year, 500-year events on page 8 of the First Street report which are used to estimate the properties flooded probability curve by estimating a power law fit using the three provided data points which resulted in an R2 value of 1.0 (Figure 1). We used a power law curve because it provided a better fit to the data than a logarithmic fit and flood and hurricane losses have been shown to follow the power law (Blackwell, 2014; De Michele et al., 2002). The properties flooded probability curve is used to estimate the 1000-year, 250-year, 150-year, 20-year, and five-year events. These specific return periods were selected to get a complete range of the distribution. The annual average number of properties flooded (AAPF) is calculated using a Riemann sum approximation of AAPF which is expressed in Equation 5. The relative average annual number of properties flooded is then calculated by dividing the AAPF by the total number of properties analysed (approximately 142 million).
In Equation 5, $L$ is the number of properties flooded, $P$ is the probability of properties flooded for the return period, and $i$ is the return period.

Figure 1: The First Street properties flooded probability curve. The black data points are provided by First Street, the orange line is the fitted logarithmic curve, and the blue line is the fitted power curve. The fitted curve equations and R2 values are also shown.
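The power law fit can be sketched as an ordinary least-squares regression in log-log space. The three input counts below are hypothetical values generated from a known power law, not the counts published by First Street, and the function names are illustrative.

```python
import math


def fit_power_law(return_periods, counts):
    """Least-squares fit of counts = a * rp**b using log-transformed values."""
    xs = [math.log(rp) for rp in return_periods]
    ys = [math.log(c) for c in counts]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope_num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope_den = sum((x - x_mean) ** 2 for x in xs)
    b = slope_num / slope_den
    a = math.exp(y_mean - b * x_mean)
    return a, b


def predict(a, b, return_period):
    """Estimated number of properties flooded for a given return period."""
    return a * return_period ** b


# Hypothetical counts at the three published return periods.
known_rps = [5, 100, 500]
known_counts = [1.0e6 * rp ** 0.3 for rp in known_rps]

a, b = fit_power_law(known_rps, known_counts)
estimates = {rp: predict(a, b, rp) for rp in (5, 20, 150, 250, 1000)}
```

With only three data points, a near-perfect fit statistic is expected for almost any two-parameter curve, so extrapolated values such as the 1,000-year estimate should be read with that limitation in mind.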
Even though the risk profile of the NFIP vs the First Street exposure dataset is different, there is still value in this comparison. Most of the NFIP locations are inside the Special Flood Hazard Area (SFHA) (i.e., considered the 100-year floodplain) – 57% of residential contracts are in the SFHA whereas the majority of properties in the United States, 96%, are outside the SFHA (FEMA, 2025). Because of this difference, the NFIP exposure is inherently riskier than the First Street exposure. We mitigate some of this discrepancy through normalisation, but we also expect the NFIP annual average relative flooded properties metric to be higher than First Street’s.
2.2. Intercomparison of Catastrophe Model AAL
2.2.1. State-Level Burn Rate
We continue the use of the burn rate metric to compare catastrophe model output across data providers. The First Street state and CONUS values of AAL in this paper are taken from First Street’s 2021 report, The Cost of Climate (First Street, 2021). We will use the 2023 reinsurance placement data published by FEMA for KatRisk, Verisk, and RMS which is based on data from May 31, 2022. We use data from 2022 rather than 2021, which is when the First Street data was published, because we believe it is important to use estimates from the most advanced flood models to date. One example is the improvements in Verisk’s coastal flooding model. The 2021 documentation indicates that the AAL only includes inundation from storm surge and not hurricane precipitation-induced flooding which is included in the 2022 data.
Since the NFIP risk profile is likely different than the entire population of properties in the United States, we limit the analysis to properties in the SFHA. This allows for a direct comparison between First Street and catastrophe models because the sample of properties is similar, and therefore the risk profile is equivalent. For the state-level burn rate comparisons, the First Street AAL is limited to the 100-year event since First Street did not publish state-level AAL estimates for events with greater return periods (e.g., 500-year) in the SFHA. However, the aggregate CONUS AAL comparison does include the 500-year event.
The burn rate calculation is as follows. The First Street burn rates for each state (and CONUS) were calculated using only the properties located in the SFHA. The coverage amount used in the denominator of the burn rate is US$250,000 multiplied by the total number of properties in the SFHA according to the NSI and NFHL. The coverage value of US$250,000 is used since First Street assumes each property has the maximum possible coverage for a policy in the NFIP. Therefore, as shown in Equation 6, the First Street burn rate is the AAL for properties in the SFHA and First Street’s 100-year floodplain divided by the product of US$250 (US$250,000 divided by US$1,000) and the number of properties located in the SFHA.
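$$\text{Burn rate}_{\text{First Street}} = \frac{\text{AAL}_{\text{SFHA}}}{\text{US\$}250 \times N_{\text{SFHA}}} \qquad (6)$$

where $N_{\text{SFHA}}$ is the number of properties located in the SFHA.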
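The First Street burn rate calculation can be checked with a short sketch. The function and constant names are hypothetical; the AAL and property count are the CONUS values reported in Table 3.

```python
ASSUMED_COVERAGE_PER_PROPERTY_USD = 250_000  # maximum NFIP coverage assumed by First Street


def first_street_burn_rate(aal_usd, n_properties):
    """AAL per US$1,000 of assumed coverage for properties in the SFHA."""
    coverage_in_thousands = ASSUMED_COVERAGE_PER_PROPERTY_USD / 1_000 * n_properties
    return aal_usd / coverage_in_thousands


# CONUS values from Table 3: AAL of US$12.03 billion across 4,118,154 properties.
conus_rate = first_street_burn_rate(12.03e9, 4_118_154)
```

The result is about US$11.7 per US$1,000 of coverage; small differences from the tabulated value come from rounding of the published inputs.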
The burn rate for KatRisk, Verisk, and RMS is calculated in a similar fashion but with several differences. The AAL for each state (and CONUS) is not modified even though the AAL contains commercial properties, residential (>4 units) properties, and contents policies. It is not possible to remove those properties from the AAL as the data is not disaggregated in that manner. The coverage amount used in the denominator of the burn rate is the coverage amount of residential (1-4 units) policies in a SFHA that were in effect at the time of the catastrophe model calculation, May 31, 2022. We use the total residential (1-4 units) coverage to maintain consistency with the First Street analysis. While using only residential coverage for the burn rate will result in a larger burn rate than the true burn rate, this methodological decision was made to ensure a conservative estimate of the burn rate for comparison to the First Street burn rate. The catastrophe model burn rate is expressed in Equation 7.
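$$\text{Burn rate}_{\text{model}} = \frac{\text{AAL}}{\text{Coverage}_{\text{residential (1-4 units), SFHA}} / \text{US\$}1{,}000} \qquad (7)$$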
2.2.2. Literature Comparison
Published estimates of AAL from flooding are limited, but two reports using loss estimates from KatRisk can be used as an additional benchmark for comparison with First Street. Evans and Baeder (2022) estimated flood AAL for all single family homes (not just the NFIP portfolio) in North Carolina, New York, and New Jersey using the KatRisk model. They provide total AAL (by multiplying the number of properties by AAL per property) and AAL per property for damaged properties only. The second report, Evans et al. (2020), estimates the flood AAL for residential properties in the United States, also using the KatRisk model. Together, these two reports provide both a state-level and a country-wide comparison and represent ground-up losses that include pluvial, fluvial, and surge losses. The reports do not state whether demand surge and flood defenses are accounted for, but we assume that flood defenses are included since KatRisk has included flood defenses in other simulations, such as for the NFIP. Finally, three catastrophe modelers (i.e., Karen Clark & Company, Aon, and the Florida Public Flood Loss Model (FPFLM)) provide the burn rate of ground-up losses for properties in the state of Florida (a distinct exposure dataset from the NFIP portfolio) within their submissions to the FCHLPM for 2021 (KCC, 2024; Aon, 2024; FIU, 2024). All models consider demand surge, include pluvial, fluvial, and coastal flooding, and incorporate flood defenses, albeit to different degrees. All models account for large-scale flood control measures, such as levees, but only Aon considers pluvial flood mitigation such as stormwater systems. The burn rates are provided by construction class and are averaged together for this paper. We compare the burn rates from these three models to the Florida burn rates of KatRisk, Verisk, and RMS, as reported by the NFIP in their reinsurance placement publications, and to the NFIP loss rate for Florida.
3. Results
3.1. Comparison of Catastrophe Model and NFIP Claims Data
3.1.1. NFIP Loss Rate and Catastrophe Model Burn Rate Comparison
The mean catastrophe model burn rate and the historical NFIP loss rate are in close agreement, with less than 4% difference. The mean catastrophe model burn rate is US$1.80 while the historical NFIP loss rate is US$1.74 (Table 1). The high similarity between the modeled burn rate and actual loss rate indicates that the catastrophe models accurately represent average losses from real-world flood events. As a sensitivity test, we also provide the NFIP loss rate for the 2000-2024 period, which is calculated to be US$2.37. This is 36% higher than the loss rate calculated across the lifetime of the NFIP because prior to 2005 (i.e., Hurricane Katrina), the annual losses were relatively stable. While we account for changes in exposure for flood events in the NFIP record, we are unable to account for flood losses in locations where exposure exists today but none existed during a past flood event. The historical loss rate will therefore inherently be an underestimate. We also explore the uncertainty around the AAL of the catastrophe models, specifically KatRisk and Verisk (see Supplementary Materials for methods). We find the standard deviation to be US$5.39 billion for Verisk and US$5.46 billion for KatRisk. The NFIP AAL is well inside the one standard deviation range of these models’ AAL.
| | NFIP | Model Mean (150-Year) | Model Mean (No Cap) | KatRisk (150-Year) | KatRisk (No Cap) | Verisk (150-Year) | Verisk (No Cap) |
|---|---|---|---|---|---|---|---|
| AAL (billions) | $2.27 | $2.22 | $3.24 | $1.92 | $2.95 | $2.31 | $3.53 |
| Coverage (billions) | $1,306 | $1,231 | $1,231 | $1,280 | $1,280 | $1,280 | $1,280 |
| Loss Rate / Burn Rate | $1.74 | $1.80 | $2.63 | $1.50 | $2.31 | $1.80 | $2.75 |
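The percentage differences quoted for Table 1 can be reproduced with a small sketch. The helper name is hypothetical; the inputs are the rates reported in the text and table.

```python
def pct_diff(value, reference):
    """Percentage difference of value relative to reference."""
    return 100.0 * (value - reference) / reference


# Mean model burn rate (150-year cap) versus the 1978-2024 NFIP loss rate.
model_vs_nfip = pct_diff(1.80, 1.74)

# 2000-2024 NFIP loss rate versus the 1978-2024 NFIP loss rate.
recent_vs_full_record = pct_diff(2.37, 1.74)
```

The first value falls between 3% and 4%, and the second is roughly 36%, matching the figures stated in the text.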
3.1.2. NFIP and First Street Average Relative Number of Properties Flooded Comparison
We compare two different measures of flood frequency. First, we calculate the historical claim rate from the NFIP by dividing the number of past claims, adjusted for current risk levels, by the total number of policies in force. We then compare this to the flood rate from First Street's data, which is the number of properties flooded divided by the total number of properties analysed by First Street. Given available data, we are unable to include other catastrophe models in this analysis. Again, we only use annual exceedance probability values from First Street up to the 150-year event because the highest annual loss in the NFIP data was estimated to be a 152-year event. We find that First Street estimates 1.38x as many properties flooded as the NFIP data (Table 2). These results indicate that First Street appears to be overestimating the number of properties flooded. The relative number of properties flooded calculated from the NFIP data, 0.79%, is 0.67x the value reported by Wing et al. (2022) because we have loss trended claims to present-day exposure levels while Wing et al. (2022) did not. Without loss trending, we arrive at the same value as Wing et al. (2022), 1.18%.
| | NFIP Claims (up to 150-year) | First Street (up to 150-year) | First Street (up to 1,000-year) |
|---|---|---|---|
| Properties Flooded | 0.79% | 1.09% | 1.14% |
3.2. Intercomparison of Catastrophe Model AAL
3.2.1. State-Level Burn Rate
The state burn rate comparison reveals that, overall, First Street estimates substantially more flood damage than Verisk, RMS, and KatRisk. For the continental United States (CONUS), the First Street burn rate is 1.5x the mean burn rate across Verisk, RMS, and KatRisk (Table 3). When disaggregated by state, the First Street burn rate is greater for just 17 of the Lower-48 states, as shown in Figure 2. Therefore, the difference between the CONUS burn rates is driven by a few states with large exposures and large burn rates estimated by First Street: Delaware (US$46.80), South Carolina (US$44.01), Washington (US$22.91), Florida (US$20.62), and Georgia (US$15.75). One clear pattern across state burn rate differences is that First Street estimates greater burn rates than the catastrophe models for coastal states and lower burn rates for inland states. The two main exceptions to the pattern are Louisiana and Texas. It is not clear why the First Street burn rates for Delaware and South Carolina are significantly higher than the KatRisk and Verisk burn rates. Additionally, the reason for the large discrepancy for Maine and Louisiana is not evident, except that levees in Louisiana might be represented differently across models and Louisiana contains a large portion of overall NFIP policies, which could amplify model differences. Determining the sources of the discrepancies would require investigation into each model’s software, which was not possible for this work.

Figure 2: Burn rate (AAL per US$1,000 of coverage) of three catastrophe models (First Street, Verisk, and KatRisk) by state. The First Street AAL is limited to the intersection of the SFHA and First Street 100-year event.
| | First Street | NFIP Catastrophe Models Mean | KatRisk | Verisk |
|---|---|---|---|---|
| AAL (billions) | $12.03 | $3.24 | $2.96 | $3.53 |
| Coverage (billions) | $1,029 | $413 | $413 | $413 |
| Properties | 4,118,154 | 1,627,549 | 1,627,549 | 1,627,549 |
| Burn rate | $11.69 | $7.82 | $7.14 | $8.52 |
3.2.2. Literature Comparison for Florida, New Jersey, New York, and North Carolina
The First Street total AAL is roughly 2x greater than the KatRisk AAL from Evans and Baeder (2022) when aggregating all single family homes (not just in the NFIP portfolio) across New Jersey, New York, and North Carolina. North Carolina has the biggest difference, where the First Street AAL is 2.8x greater than the Evans and Baeder (2022) AAL (Table 4). AAL per damaged property shows similar discrepancies between the two datasets, except for New York, where the First Street AAL per damaged property is only 1.1x greater than the Evans and Baeder (2022) estimate. Evans et al. (2020) estimate the total flood AAL for single-family properties in the United States (90 million properties) to be US$7.1 billion using the KatRisk model. The First Street estimate of US$20.3 billion is 2.9x greater than the Evans et al. (2020) value.
| State | Source | Total AAL (millions) | Total Properties | AAL per damaged property |
|---|---|---|---|---|
| New Jersey | First Street | $415 | 94,146 | $4,412 |
| New Jersey | EB (2022) | $213 | 120,306 | $1,679 |
| New York | First Street | $557 | 161,489 | $3,447 |
| New York | EB (2022) | $322 | 135,216 | $3,126 |
| North Carolina | First Street | $487 | 151,331 | $3,219 |
| North Carolina | EB (2022) | $172 | 290,542 | $1,211 |
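The state-level and combined AAL ratios can be reproduced from the total AAL column of Table 4. The sketch below is illustrative only; the variable and function names are hypothetical.

```python
# Total AAL in millions of US$, as reported in Table 4.
table4_total_aal = {
    "New Jersey": {"First Street": 415, "EB (2022)": 213},
    "New York": {"First Street": 557, "EB (2022)": 322},
    "North Carolina": {"First Street": 487, "EB (2022)": 172},
}


def aal_ratio(first_street, benchmark):
    """Ratio of First Street AAL to the benchmark AAL."""
    return first_street / benchmark


state_ratios = {
    state: aal_ratio(values["First Street"], values["EB (2022)"])
    for state, values in table4_total_aal.items()
}

combined_ratio = aal_ratio(
    sum(v["First Street"] for v in table4_total_aal.values()),
    sum(v["EB (2022)"] for v in table4_total_aal.values()),
)
```

The combined ratio across the three states is close to 2, and North Carolina shows the largest state-level ratio, at about 2.8.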
In their flood model submissions to the FCHLPM, for which each modeler uses a distinct exposure dataset and separate from the NFIP portfolio, Karen Clark & Company reports a burn rate of US$0.17, the Florida Public Flood Loss Model (FPFLM) reports a burn rate of US$1.81, and Aon reports a burn rate of US$2.20 (KCC, 2024; FIU, 2024; Aon, 2024) (Table 5). The burn rates from the FCHLPM model submissions are lower than those reported by KatRisk and Verisk for the NFIP portfolio in Florida, $2.75 and $2.26, respectively. However, the burn rate of Aon is highly similar, 3% less than the Verisk burn rate. The burn rate reported by Karen Clark & Company is a significant outlier from the other catastrophe models – more than 13x less than the Verisk burn rate. While each modeler created their own exposure dataset, all use NFIP policy data which underwrites more than 90% of the residential flood insurance market. Given the same source for the different exposure datasets and the normalisation of expected losses with respect to the exposure dataset, the comparison between modelers is valid.
| | FPFLM | KCC | Aon | KatRisk | Verisk |
|---|---|---|---|---|---|
| Florida Burn Rate | $1.81 | $0.17 | $2.20 | $2.75 | $2.26 |
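The relative differences between the Florida burn rates can be checked against the Verisk value using the figures in Table 5. The helper below is a hypothetical sketch for that arithmetic.

```python
# Florida burn rates (US$ per US$1,000 of coverage) from Table 5.
florida_burn_rates = {
    "FPFLM": 1.81,
    "KCC": 0.17,
    "Aon": 2.20,
    "KatRisk": 2.75,
    "Verisk": 2.26,
}


def relative_to(rates, reference):
    """Express every burn rate as a multiple of the reference model's rate."""
    ref = rates[reference]
    return {name: rate / ref for name, rate in rates.items()}


ratios = relative_to(florida_burn_rates, "Verisk")
kcc_factor_lower = 1 / ratios["KCC"]       # how many times lower KCC is than Verisk
aon_pct_lower = 100 * (1 - ratios["Aon"])  # percent by which Aon is below Verisk
```

The Karen Clark & Company rate is a little over 13 times lower than the Verisk rate, while the Aon rate is within about 3% of it.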
4. Discussion and Conclusion
We are unable to compare the First Street, FPFLM, Aon, and Karen Clark & Company burn rates to historical data, but we establish that the FPFLM and Aon burn rates are highly similar to the burn rates of KatRisk, Verisk, and RMS. The Karen Clark & Company burn rate is much lower than peer catastrophe models – 13x lower than Verisk – while the First Street burn rate is higher than the KatRisk, Verisk, and RMS burn rates by at least 1.5x in the SFHA and in all areas by at least 2x when compared to the KatRisk model according to Evans et al. (2020) and Evans and Baeder (2022). Verisk and KatRisk burn rates are highly similar to the NFIP historical loss rate which leads us to conclude that the First Street burn rate appears to be inflated while Karen Clark & Company’s burn rate appears to be an underestimate.
One clear reason for the discrepancy between the loss rate of Karen Clark & Company and other models is the depth-damage functions utilised by each model. The functions used by Karen Clark & Company associate less damage for various water depths compared to Aon and FPFLM, as shown in Figure S.2 in the Supplementary Information. For inland depth-damage, Karen Clark & Company assumes damages are limited to 56.8% of the exposure value starting at 15 ft of water depth while FPFLM assumes 72% and Aon assumes 73.7% of the exposure value for 15 ft of water depth. A similar pattern is seen for the coastal depth-damage functions. Another possible source of discrepancy is the flood modeling methods. Karen Clark & Company does not use a hydrodynamic model to simulate storm surge but rather a connected components “bathtub” approach with statistically generated peak storm surge heights (KCC, 2024). This significant methodological difference between KCC and Aon/FPFLM likely contributes to the difference in loss rates.
The First Street estimated number of flooded properties is 1.38x higher than what is reflected in the NFIP claims data which likely contributes to the burn rate discrepancies between First Street and other modelers. In fact, because of the difference in risk profile of the exposure datasets between First Street and the NFIP, we expect the NFIP to report a higher annual average relative number of properties flooded. This is because the NFIP policies are concentrated in the SFHA – 57% of residential contracts are in the SFHA whereas the majority of properties in the United States, 96%, are outside the SFHA (FEMA, 2025). Therefore, we would expect the policies in the NFIP to be flooded at a higher rate than the properties examined by First Street. Since we show the opposite, the data suggests the First Street overestimate is higher than presented here.
Other possible explanations for the apparent First Street overestimate of AAL include overestimated flood depths at inundated properties, inflated replacement cost values of structures, unrealistic depth-damage functions, or a combination of these. There are five caveats in the comparison between First Street and the catastrophe models used by the NFIP (i.e., KatRisk, Verisk, and RMS). These caveats involve the use of actual cash value, demand surge, subsetting for residential properties, gross vs ground-up losses, and coverage limit, and are discussed in detail in the Supplementary Information. Three of these caveats (demand surge, subsetting for residential properties, and coverage limit) would cause the First Street burn rates to be lower than the catastrophe models’ metrics, while two (actual cash value and gross vs ground-up losses) would cause the inverse. The subsetting for residential properties and coverage limit methodological choices were made to push the KatRisk, Verisk, and RMS burn rates closer to the First Street burn rate, and yet the First Street burn rate is still considerably higher than those of its catastrophe modeling peers. This methodology therefore gives us high confidence in our result that the First Street CONUS burn rate appears to be overestimated.
A 2024 publication by First Street provides a window into their most recently calculated loss metrics. First Street states that there are 17.9 million properties (residential and commercial) within the 100-year flood zone, a 23% increase from their 2020 report, The First National Flood Risk Assessment (First Street, 2020; First Street, 2024a). Additionally, they state that AAL from flooding is estimated at US$61.5 billion. Even though this value includes commercial properties and inflation, the 2024 AAL is an increase of more than 200% compared to the 2021 residential estimate. Based on the findings in this paper, we believe that the current First Street AAL is also an overestimate relative to historical data and other catastrophe models. First Street has expanded their risk model to incorporate correlated risk across properties and perils (First Street, 2024b), but because AAL is a correlation-independent risk metric, this expansion has no bearing on the results shown in this paper.
It is important to mention that the historical loss data used in this paper are a single sample from a distribution, and the sample may not be representative of the long-term average. We use the historical sample as it is the best available way to validate catastrophe models, but practitioners should note the uncertainty around such an exercise. Additionally, changes in the built environment and flood defenses add uncertainty to the results and are not accounted for by the historical loss normalisation method. Because of these flood defenses, a historical flood event would not necessarily be of the same magnitude if it happened today. Conversely, climate change has led to an increase in the hazard component of risk. A flood event that happened later in the historical record would likely have been less intense had it occurred at the beginning of the historical record.
While five of the seven models analysed here show similar burn/loss rates and compare well to observed loss rates at the state and national level, this does not signify that loss rates are similar across models or accurate at higher spatial resolutions. As Chegwidden et al. (2024) showed in their analysis of two climate risk data providers, regional agreement does not translate into property-level agreement. We find similar disagreements at the zip-code level between Aon, Karen Clark & Company, and FPFLM. The highest r-squared value among the three model pairs is between Aon and FPFLM (0.47) and the lowest is between Aon and Karen Clark & Company (0.10), as shown in Figure S.3. Scatter plots of loss rates between models for zip codes across Florida show substantial disagreement, with the loss rate for the same zip code close to US$0 in one model and almost US$100 in another. Evidently, models can vary widely at the zip-code level while having similar aggregate damage estimates. This might be appropriate for insurance pricing within a risk pool, but not for individuals or asset managers looking for property-level risk information.
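The point that aggregate agreement can coexist with weak zip-code agreement can be illustrated with a small sketch. The six loss rates per model below are invented for illustration and are not taken from Aon, Karen Clark & Company, or FPFLM. We also assume that r-squared is the squared Pearson correlation between the two models' zip-code loss rates, which may differ from how the values in Figure S.3 were computed.

```python
from math import sqrt


def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    r = cov / sqrt(var_x * var_y)
    return r * r


# Hypothetical loss rates for the same six zip codes from two models.
model_a = [0.0, 100.0, 20.0, 80.0, 40.0, 60.0]
model_b = [80.0, 60.0, 0.0, 40.0, 20.0, 100.0]

total_a = sum(model_a)
total_b = sum(model_b)
r2 = r_squared(model_a, model_b)

print(f"Aggregate, model A: {total_a:.0f}")
print(f"Aggregate, model B: {total_b:.0f}")
print(f"Zip-code r-squared: {r2:.3f}")
```

In this constructed case both models sum to the same aggregate, yet the zip-code r-squared is close to zero, so the aggregate comparison alone gives no indication of how well the models agree at any single location.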
Not all climate risk products are created equal. As their usage increases across sectors and within the general public, these products require rigorous verification. First Street has been a leader in disclosing their methodologies and disseminating climate risk information. Still, our findings strengthen our argument that climate risk products require public validation to be a beneficial service. We recognise that this analysis is incomplete due to the lack of publicly available data from catastrophe modelers. Since catastrophe models have such a large impact on home prices, local government revenues, and the financial system, and drive household-level risk decisions, catastrophe model outputs should be more publicly accessible to foster robust validation of these models.
Data and Code Availability
- The National Structures Inventory can be accessed at https://www.hec.usace.army.mil/confluence/nsi.
- The National Flood Hazard Layer can be downloaded by state at https://msc.fema.gov/portal/advanceSearch.
- The NFIP policies data can be downloaded here: https://www.fema.gov/openfema-data-page/fima-nfip-redacted-policies-v2.
- The NFIP claims data can be downloaded here: https://www.fema.gov/openfema-data-page/fima-nfip-redacted-claims-v2.
- The NFIP reinsurance placement data can be accessed from: https://www.fema.gov/about/openfema/data-sets/national-flood-insurance-program-nfip-reinsurance-placement-information.
- The disclosures from the Florida Commission on Hurricane Loss Projection Methodology can be found here: https://fchlpm.sbafla.com/model-submissions/flood-model-submissions/.
- The code used for the analysis and creation of figures in this paper can be found at this GitHub repository: https://github.com/ddusseau/CatModel-Validation.
Supplementary Figures
Available as a separate PDF; download here.
References
Declarations
Handling Editor: Cameron Rye, Director of Natural Catastrophe Analytics, Willis Re
The Journal of Catastrophe Risk and Resilience would like to thank Cameron Rye for his role as Handling Editor throughout the peer-review process for this article. We would also like to extend our thanks to the chosen academic reviewers and to Guy Carpenter for sharing their expertise and time while undertaking the peer review of this article.
Received: 17th June 2025
Accepted: 15th January 2026
Published: 20th February 2026
Rights and Permissions
Access: This article is Diamond Open Access.
Licensing: Attribution 4.0 International (CC BY 4.0)
DOI: 10.63024/k6yh-a2v6
Article Number: 04.01
ISSN: 3049-7604
Copyright: Copyright remains with the author, and not with the Journal of Catastrophe Risk and Resilience.
Article Citation Details
Dusseau, D., et al., 2026. Validation and Comparison of U.S. Loss Estimates from Catastrophe Flood Models, Journal of Catastrophe Risk and Resilience, (2026). https://doi.org/10.63024/k6yh-a2v6