Stop using zip codes for geospatial analysis (2019) by voxadam

By HackTech, February 7, 2025

The last time you used your zip code, you were most likely entering your address into a website to make a purchase, finding a store near your home or office, or filling out some other online form. You likely found the answer you were looking for and didn’t stop to think further about that five-digit code you’d just typed out.

However, lots of companies, marketers and data analysts spend hours looking at zip codes. They are deciding how to use data tied to those zip codes to understand trends, run their businesses, and find new ways to reach you – all using that same five-digit code.

Even though other place associations probably mean more to you as an individual, such as your neighborhood, your street, or the block you live on, the zip code is, in many organizations, the geographic unit of choice. It is used to make major decisions about marketing, opening or closing stores, and providing services, decisions that can have a massive financial impact.

The problem is that zip codes are not a good representation of real human behavior, and when used in data analysis, often mask real, underlying insights, and may ultimately lead to bad outcomes. To understand why this is, we first need to understand a little more about the zip code itself.

The Zip Code: A Brief History

The predecessor to the ZIP code was the postal zone, a number that identified a local post office district within a specific city. For example:

Mr. John Smith
3256 Epiphenomenal Avenue
Minneapolis 16 Minnesota

“16” represents the postal zone within Minneapolis. But as mail volume kept growing, in 1963 the Post Office Department (the predecessor of today's Postal Service) rolled out the Zone Improvement Plan, which transformed addresses to look like this:

Mr. John Smith
3256 Epiphenomenal Avenue
Minneapolis MN 55416

The five-digit code identifies a broad region of the country (5 _ _ _ _), a sectional center facility (_ 5 4 _ _), and the associated post office or delivery area (_ _ _ 1 6).

The first digit of the ZIP codes in each state of the contiguous United States
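To make that structure concrete, here is a minimal Python sketch that splits a five-digit ZIP code into the parts described above. The function name `split_zip` and its field names are illustrative, not taken from any library. The code is handled as a string throughout, because ZIP codes in the Northeast begin with 0 and would lose that digit if stored as a number.

```python
import re


def split_zip(zip5: str) -> dict[str, str]:
    """Split a five-digit ZIP code into the parts described in the article.

    Field names are illustrative. In practice the sectional center facility
    is commonly referred to by the full three-digit prefix (e.g. "554").
    """
    if not re.fullmatch(r"[0-9]{5}", zip5):
        raise ValueError(f"expected exactly five ASCII digits, got {zip5!r}")
    return {
        "national_area": zip5[0],       # broad region:            5 _ _ _ _
        "sectional_center": zip5[1:3],  # sectional center digits: _ 5 4 _ _
        "scf_prefix": zip5[:3],         # three-digit prefix naming the facility
        "delivery_area": zip5[3:],      # post office / delivery:  _ _ _ 1 6
    }


print(split_zip("55416"))
# {'national_area': '5', 'sectional_center': '54', 'scf_prefix': '554', 'delivery_area': '16'}
```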

By 1967, ZIP codes were mandatory for bulk mailers, and they went on to be adopted by almost everyone sending mail in the US. Over time, ZIP+4 added more granularity, identifying specific delivery locations, even individual buildings, for postal workers. The Post Office Department even created a character, Mr. ZIP, who appeared on stamps, in commercials, and in songs to promote the use of ZIP codes.
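Because ZIP+4 is just a four-digit add-on to the five-digit code, a common first step in a data pipeline is to parse both forms into separate text fields. The sketch below is a minimal illustration; the accepted input formats ("55416", "55416-1234", "554161234") and the `ZipCode` type are assumptions, not a USPS specification.

```python
import re
from typing import NamedTuple, Optional

# Assumed input formats: "55416", "55416-1234" or "554161234".
_ZIP_PATTERN = re.compile(r"([0-9]{5})(?:-?([0-9]{4}))?")


class ZipCode(NamedTuple):
    zip5: str
    plus4: Optional[str]  # None when no +4 add-on is present


def parse_zip(text: str) -> ZipCode:
    """Parse a ZIP or ZIP+4 string, keeping every part as text."""
    match = _ZIP_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a ZIP or ZIP+4 code: {text!r}")
    return ZipCode(zip5=match.group(1), plus4=match.group(2))


print(parse_zip("55416-1234"))  # ZipCode(zip5='55416', plus4='1234')
print(parse_zip("55416"))       # ZipCode(zip5='55416', plus4=None)
```

Keeping both parts as strings avoids the leading-zero problem for codes such as 02134.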

ZIP codes themselves do not actually represent areas, but rather collections of delivery routes:

Despite the geographic derivation of most ZIP Codes, the codes themselves do not represent geographic regions; in general, they correspond to address groups or delivery routes. As a consequence, ZIP Code “areas” can overlap, be subsets of each other, or be artificial constructs with no geographic area (such as 095 for mail to the Navy, which is not geographically fixed). In similar fashion, in areas without regular postal routes (rural route areas) or no mail delivery (undeveloped areas), ZIP Codes are not assigned or are based on sparse delivery routes, and hence the boundary between ZIP Code areas is undefined.

The US Census Bureau provides geographic files for ZIP Code Tabulation Areas, which it describes as follows:

ZIP Code Tabulation Areas (ZCTAs) are generalized areal representations of United States Postal Service (USPS) ZIP Code service areas. The USPS ZIP Codes identify the individual post office or metropolitan area delivery station associated with mailing addresses. USPS ZIP Codes are not areal features but a collection of mail delivery routes.
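The idea behind a tabulation area can be sketched in a few lines: start from small census blocks, look at the ZIP codes of the addresses inside each block, and assign the whole block to the most common one. This is a simplified illustration that assumes a plain majority rule; the Census Bureau's actual ZCTA procedure has more rules than this, and the block IDs below are hypothetical.

```python
from collections import Counter, defaultdict


def build_zctas(block_zips: dict[str, list[str]]) -> dict[str, set[str]]:
    """Group census blocks into ZIP-like areas by majority vote.

    block_zips maps a (hypothetical) block ID to the ZIP codes of the
    addresses located in that block.
    """
    zctas: dict[str, set[str]] = defaultdict(set)
    for block_id, zips in block_zips.items():
        if not zips:
            continue  # no addresses: this sketch leaves the block unassigned
        # most_common(1) breaks ties by first occurrence in the list
        majority_zip, _count = Counter(zips).most_common(1)[0]
        zctas[majority_zip].add(block_id)
    return dict(zctas)


blocks = {
    "block-1": ["55416", "55416", "55405"],
    "block-2": ["55405"],
    "block-3": [],  # e.g. a park with no addresses
}
print(build_zctas(blocks))  # {'55416': {'block-1'}, '55405': {'block-2'}}
```

Note how the single 55405 address in `block-1` disappears into 55416. That loss is why a ZCTA is only a generalized representation of the underlying delivery routes.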

Here we find our first problem with ZIP codes: they do not represent an actual area on a map, but rather a collection of routes that help postal workers deliver mail efficiently. They aren’t designed to measure sociodemographic trends, which is generally what a business wants to do. You can even look up individual delivery routes, like the one below:

One zip code route in New York’s East Village

We are only scratching the surface of the issue here. Similar issues exist around the world, with postal codes following strange boundaries.

Postal Codes in London, Toronto, and Sydney.

Using ZIP codes for data analysis

Fast forward to today, where many companies can easily look into their database and find a dataset with a zip_code column in it, which allows them to group and aggregate data to see trends and business performance metrics. As stated earlier, the problem with ZIP Codes is that:

They don’t represent real boundaries but rather routes
They don’t represent how humans behave
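For concreteness, the aggregation described above usually amounts to something like the following minimal Python sketch. The `zip_code` and `revenue` field names and the sample rows are hypothetical. Nothing in it knows where a ZIP code is, how large it is, or how many people live there, which is the root of both problems listed above.

```python
from collections import defaultdict


def total_by_zip(rows: list[dict]) -> dict[str, float]:
    """Sum a metric for each distinct value of the zip_code column."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        # Group on the raw zip_code value; no geography is involved at all
        totals[row["zip_code"]] += row["revenue"]
    return dict(totals)


sales = [
    {"zip_code": "55416", "revenue": 120.0},
    {"zip_code": "55405", "revenue": 80.0},
    {"zip_code": "55416", "revenue": 30.0},
]
print(total_by_zip(sales))  # {'55416': 150.0, '55405': 80.0}
```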

The latter reflects two specific issues in using spatial data: spatial scale of observations and spatial scale support (you can learn more about this in this lecture from UChicago’s Luc Anselin, here). The first is that humans don’t behave according to administrative units such as zip codes, or even census units. Their behavior is influenced much more by their neighbors, or by areas such as neighborhoods or high-activity areas (such as central business districts). The second is that spatial data is provided a

33 Comments

hammock

Posted February 7, 2025 at 5:24 pm

Great article. Zip codes can be super expedient. But you have to be aware that for many use cases they function WORSE than a random grid, because they have built-in aggregation of a central post office (and its surroundings) with a certain radius of rural/less dense surroundings.

So for example, if you are sorting “rural zips” vs “urban zips” it will only take you so far, and may actually be harmful.

Same goes with MSAs/DMAs (media markets). These have to be used for buying media, but for geospatial analysis they are suboptimal for the same reasons.

Easiest way to dip your toe into the water of something better is to start with A-D census counties.

jihadjihad

Posted February 7, 2025 at 5:29 pm

To put it in plain mathematical language, ZIP codes are not defined as polygons [0]. The consequence is that performing any analysis with an assumption that ZIP codes are polygons is bound to be error-prone.

0: https://manifold.net/doc/mfd8/zip_codes_are_not_areas.htm

mattforrest

Posted February 7, 2025 at 5:34 pm

Funny to see this one pop up today (I wrote this one way back when) but I just refreshed it into a video on my channel: https://www.youtube.com/watch?v=x-opv4REEic

nancyp

Posted February 7, 2025 at 5:34 pm

Instead of zip use the following?

Use Addresses
Use Census Units
Use your own Spatial Index

Why not lat, long?
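Raw latitude and longitude still have to be aggregated before areas can be compared. A minimal Python sketch of the simplest option: snap each point to a square cell of a regular latitude/longitude grid and count points per cell. The 0.01-degree cell size and the sample coordinates are arbitrary assumptions. Such cells also get narrower east-west toward the poles; dedicated spatial indexes such as H3 use cells of more consistent size.

```python
import math
from collections import Counter


def grid_cell(lat: float, lon: float, cell_deg: float = 0.01) -> tuple[int, int]:
    """Return the (row, col) index of the grid cell containing a point.

    Uses math.floor rather than int() so that negative coordinates
    (e.g. western-hemisphere longitudes) land in the correct cell.
    """
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def count_by_cell(points: list[tuple[float, float]], cell_deg: float = 0.01) -> Counter:
    """Count how many points fall into each grid cell."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)


# Hypothetical customer locations
points = [(44.9455, -93.3255), (44.9462, -93.3251), (44.9781, -93.2650)]
print(count_by_cell(points))
```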

ajfriend

Posted February 7, 2025 at 5:36 pm

…and use H3 instead! https://h3geo.org/

jpjoi

Posted February 7, 2025 at 5:38 pm

Zip codes are just weird to use for anything other than mail in general because they’re set up based off infrastructure.

CGP Grey has a great video on this: https://m.youtube.com/watch?v=1K5oDtVAYzk

paulddraper

Posted February 7, 2025 at 5:38 pm

People use ZIP codes because they have ZIP codes.

No one has census blocks.

And coordinates can work but lack some inherent advantages, such as human readability and a semblance of pop density normalization.

funkaster

Posted February 7, 2025 at 5:48 pm

If you want to learn a bit more, there was a recent, really good Planet Money episode[1] about this exact same topic. They focus on the problems that you might face when using zip code for demographic analysis.

[1]: https://www.npr.org/2025/01/08/1223466587/zip-code-history

throw0101c

Posted February 7, 2025 at 5:52 pm

CGP Grey recently posted a video on Zip codes, "The Hidden Pattern in Post Codes":

* https://www.youtube.com/watch?v=1K5oDtVAYzk

jonas21

Posted February 7, 2025 at 6:05 pm

ZIP codes are an emergent property of the mail delivery system. While the author might consider this a bad thing, this makes them "good enough" on multiple axes in practice. They tend to be:

– Well-known (everybody knows their zip code)

– Easily extracted (they're part of every address, no geocoding required)

– Uniform-enough (not perfect, but in most cases close)

– Granular-enough

– Contiguous-enough by travel time

Notably, the alternatives the author proposes all fail on one or more of these:

– Census units: almost nobody knows what census tract they live in, and it can be non-trivial to map from address to tract

– Spatial cells: uneven distribution of population, and arbitrary division of space (boundaries pass right through buildings), and definitely nobody knows what S2 or H3 cell they live in.

– Address: this option doesn't even make sense. Yes, you can geocode addresses, but you still need to aggregate by something.

trgn

Posted February 7, 2025 at 6:28 pm

First the mercator projection, now they're coming after the zip codes.

serjester

Posted February 7, 2025 at 6:30 pm

H3 is awesome here! What I don't think many people realize is that H3 cells and normal geographic data (like zips) are not mutually exclusive. You can take zip outlines, and find all the h3 cells within them and allocate your metric accordingly (population, income, etc).

This makes joining disparate data sources quite easy. And this also lets you do all sorts of cool stuff like aggregations, smoothing, flow modeling, etc.

We do some geospatial stuff and I wrote a polars plugin to help with this a while back [1].

[1] https://github.com/Filimoa/polars-h3
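The allocation described in the comment above is a form of area-weighted (areal) interpolation. A minimal Python sketch, assuming the overlap area between each ZIP outline and each cell has already been computed with a geometry library (not shown), and that the metric is spread evenly across each ZIP's area, which real population rarely is. The ZIP and cell IDs are hypothetical, and this does not use the polars-h3 plugin's API.

```python
from collections import defaultdict


def allocate_to_cells(
    zip_totals: dict[str, float],
    overlaps: dict[str, dict[str, float]],
) -> dict[str, float]:
    """Split each ZIP-level total across cells in proportion to overlap area.

    zip_totals: ZIP code -> metric total (e.g. population)
    overlaps:   ZIP code -> {cell ID -> area shared by the ZIP outline and cell}
    """
    cell_totals: dict[str, float] = defaultdict(float)
    for zip_code, total in zip_totals.items():
        shares = overlaps.get(zip_code, {})
        zip_area = sum(shares.values())
        if zip_area <= 0:
            continue  # no geometry for this ZIP: nothing to allocate
        for cell_id, area in shares.items():
            cell_totals[cell_id] += total * area / zip_area
    return dict(cell_totals)


populations = {"A": 1000.0, "B": 600.0}
overlaps = {"A": {"c1": 3.0, "c2": 1.0}, "B": {"c2": 1.0, "c3": 2.0}}
print(allocate_to_cells(populations, overlaps))
# {'c1': 750.0, 'c2': 450.0, 'c3': 400.0}
```

Because each ZIP's shares sum to one, the allocated cell totals still add up to the original 1,600, and cell c2 combines contributions from both ZIPs.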

agtech_andy

Posted February 7, 2025 at 6:39 pm

Zip codes are great for anything with delivery logistics.

Anything else is a loose correlation at best, that will likely change over time.

PLenz

Posted February 7, 2025 at 6:55 pm

I gave a talk at DataEngConf many years ago: https://www.datacouncil.ai/talks/zip-codes-and-other-lies-yo…

zuhayeer

Posted February 7, 2025 at 7:25 pm

This is interesting since zip codes came up in consideration for how we built out our pay choropleth map in the US: https://levels.fyi/heatmap

Though ultimately it was far too granular (for example the Bay Area would be so many different zip codes). Instead we went with Nielsen's DMA (Designated Market Area) mappings within the US to abstract aggregated data a bit better. And of course this DMA dataset also had a different original use case. It was used for TV / media market surveys so it has some weird vestiges. Some regions are grouped very far and wide (you'll notice there's a bit of Denver within Nevada and it's just a remnant of how it used to be categorized), but it still provides a bit of a broader level grouping than something acute like zip code.

I do like this map from the article though and the granularity you can get with zip code when zooming: https://clausa.app.carto.com/map/29fd0873-64cb-42a6-a90d-c83…

We've also been considering using Combined Statistical Areas using population instead. This is something that is under way, and in the interim we've considered charting styles that don't necessarily need borders (for example this bubble map: https://www.levels.fyi/bubble-plot/europe/). The benefit with DMAs is that it offers full border coverage of the entire US whereas some hubs can still be missing from CSAs if relying on a population threshold. But the plan is to create some of our own regional definitions and borders using our own submissions combined with population. Will be an interesting project.

GeoJSON data for the map borders: https://github.com/PublicaMundi/MappingAPI/blob/master/data/…

Nielsen DMA regions: https://blocks.roadtolarissa.com/simzou/6459889

dhunter_mn

Posted February 7, 2025 at 7:27 pm

I used to work for a company that basically merged USPS and Census Bureau data on a monthly basis. The output would be a roadbase that was optimized for address ranges on road segments. ZIP Codes were extra fun to work with.

eterevsky

Posted February 7, 2025 at 7:31 pm

ZIP codes are a simple approximation that does its job well enough in most cases.

The alternatives that the author suggests are much more complicated, both in terms of the implementation and in terms of convincing the user to give you their full address.

Zamicol

Posted February 7, 2025 at 7:39 pm

I wrote the blackout system for Comcast TV scheduling. My understanding was that blackouts were used mostly for sports where games need to be available in one area and not others. Contractually, they were required to use zip codes, so I used the US Post office's zip code data to enforce blackouts.

lacoolj

Posted February 7, 2025 at 7:41 pm

For anyone curious, here is the official US Gov list of ZIP codes in CSV with lots of helpful related data (longitude, latitude, etc.)

http://federalgovernmentzipcodes.us/free-zipcode-database-Pr…

ivell

Posted February 7, 2025 at 7:43 pm

India is experimenting with Digipin https://www.indiapost.gov.in/Navigation_Documents/Static_Nav…

which is derived from longitude and latitude.

mannyv

Posted February 7, 2025 at 7:43 pm

Zip codes, zctas, and tiger/line are good enough for what most people need. Maybe you can find an edge by using something more granular…but I'm not sure what edge you'd be looking to get with geodata. Maybe for real estate trends and/or market analysis?

mmmlinux

Posted February 7, 2025 at 7:59 pm

Can anyone tell me why I have to enter both my city/state and a zip code? Shouldn't one or the other, plus my street address, be enough information?

ubermonkey

Posted February 7, 2025 at 8:00 pm

I'm reminded of this:

https://www.npr.org/2004/04/01/1805651/post-office-calls-for…

freyfogle

Posted February 7, 2025 at 8:24 pm

There are many problems with zip codes / postal codes but the biggest two we see are:

a. Excel treats them as numbers instead of strings of digits and thus drops the leading 0

b. Developers make assumptions about postal codes based on how they work (or more usually how the developer incorrectly thinks they work) in their own country and these assumptions absolutely do NOT hold in other countries.

A relevant guide to geocoding and postal codes: https://opencagedata.com/guides/how-to-think-about-postcodes…
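Point (a) is at least easy to repair for US data. A minimal sketch, assuming the column holds US five-digit ZIP codes that a spreadsheet has turned into numbers; in keeping with point (b), it must not be applied to other countries' postal codes. The function name is hypothetical.

```python
import re


def restore_us_zip(value) -> str:
    """Turn a US ZIP code mangled into a number (2134 or 2134.0) back into "02134"."""
    if isinstance(value, float):
        if not value.is_integer():
            raise ValueError(f"not a whole number: {value!r}")
        value = int(value)
    text = str(value).strip()
    if not re.fullmatch(r"[0-9]{1,5}", text):
        raise ValueError(f"not a US five-digit ZIP code: {value!r}")
    return text.zfill(5)  # left-pad with the zeros the spreadsheet dropped


print(restore_us_zip(2134))     # 02134
print(restore_us_zip(501.0))    # 00501
print(restore_us_zip("55416"))  # 55416
```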

0xbadcafebee

Posted February 7, 2025 at 8:30 pm

Here's a recent podcast about why ZIP codes are not great for analysis: https://www.npr.org/2025/01/08/1223466587/zip-code-history

spankalee

Posted February 7, 2025 at 8:32 pm

Unironically, what a great sales blog post!

It's so well written and informative that I completely didn't mind the "and here's how to do it in Carto" bit in the middle. Instead I thought they earned it.

Anon84

Posted February 7, 2025 at 8:36 pm

This is an example of the well known Modifiable Areal Unit problem: https://en.wikipedia.org/wiki/Modifiable_areal_unit_problem In general, your statistics depend on how you define your areas and you will get different pictures with different definitions.
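A toy example makes this concrete. The Python sketch below averages the same eight measurements under two different zonings: one makes the values look uniform, the other reveals a hotspot. The data and zone boundaries are invented for illustration.

```python
def zone_means(values: list[float], zones: list[tuple[int, int]]) -> list[float]:
    """Average the values in each zone; zones are (start, end) index ranges, end exclusive."""
    return [sum(values[start:end]) / (end - start) for start, end in zones]


values = [1, 1, 1, 9, 9, 1, 1, 1]    # hypothetical measurements along a street
zoning_a = [(0, 4), (4, 8)]          # two equal halves
zoning_b = [(0, 2), (2, 6), (6, 8)]  # same street, boundaries shifted

print(zone_means(values, zoning_a))  # [3.0, 3.0]
print(zone_means(values, zoning_b))  # [1.0, 5.0, 1.0]
```

The underlying data never changes; only the boundaries do, and with them the conclusion.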

flappyeagle

Posted February 7, 2025 at 8:37 pm

No

JackFr

Posted February 7, 2025 at 9:07 pm

When doing your first ML project, zip codes are unsurpassed in providing a set of hand written digits to train on.

OriPekelman

Posted February 7, 2025 at 10:12 pm

Well, funny story: some twenty-something years ago I actually worked on an election-cycle volunteer infrastructure project in France. Living in Paris, which is department 75 and therefore 750xx, with the prefecture being 75000, I assumed it was neatly hierarchical and that 75004 wouldn't be far away from 75003 (true)… French things being orderly and rational.

I didn't need much precision so truncating seemed an easy way to group stuff.

Oh the surprise. I never again made such assumptions, let's just say I should have gotten a clue from Corsica being 2A and 2B.
