Stop using zip codes for geospatial analysis (2019) by voxadam

33 Comments

  • Post Author
    hammock
    Posted February 7, 2025 at 5:24 pm

    Great article. Zip codes can be super expedient, but you have to be aware that for many use cases they function WORSE than a random grid, because they have a built-in aggregation: a central post office (and its surroundings) plus a certain radius of rural/less dense area around it.

    So for example, if you are sorting “rural zips” vs “urban zips” it will only take you so far, and may actually be harmful.

    Same goes with MSAs/DMAs (media markets). These have to be used for buying media, but for geospatial analysis they are suboptimal for the same reasons.

    Easiest way to dip your toe into the water of something better is to start with A-D census counties.

  • Post Author
    jihadjihad
    Posted February 7, 2025 at 5:29 pm

    To put it in plain mathematical language, ZIP codes are not defined as polygons [0]. The consequence is that performing any analysis with an assumption that ZIP codes are polygons is bound to be error-prone.

    0: https://manifold.net/doc/mfd8/zip_codes_are_not_areas.htm

  • Post Author
    mattforrest
    Posted February 7, 2025 at 5:34 pm

    Funny to see this one pop up today (I wrote this one way back when) but I just refreshed it into a video on my channel: https://www.youtube.com/watch?v=x-opv4REEic

  • Post Author
    nancyp
    Posted February 7, 2025 at 5:34 pm

    Instead of zip use the following?

    Use Addresses
    Use Census Units
    Use your own Spatial Index

    Why not lat, long?

  • Post Author
    ajfriend
    Posted February 7, 2025 at 5:36 pm

    …and use H3 instead! https://h3geo.org/

  • Post Author
    jpjoi
    Posted February 7, 2025 at 5:38 pm

    Zip codes are just weird to use for anything other than mail in general because they’re set up based off infrastructure.

    CGP Grey has a great video on this: https://m.youtube.com/watch?v=1K5oDtVAYzk

  • Post Author
    paulddraper
    Posted February 7, 2025 at 5:38 pm

    People use ZIP codes because they have ZIP codes.

    No one has census blocks.

    And coordinates can work but lack some inherent advantages, such as human readability and a semblance of pop density normalization.

  • Post Author
    funkaster
    Posted February 7, 2025 at 5:48 pm

    If you want to learn a bit more, there was a recent, really good Planet Money episode[1] about this exact same topic. They focus on the problems that you might face when using zip code for demographic analysis.

    [1]: https://www.npr.org/2025/01/08/1223466587/zip-code-history

  • Post Author
    throw0101c
    Posted February 7, 2025 at 5:52 pm

    CGP Grey recently posted a video on Zip codes, "The Hidden Pattern in Post Codes":

    * https://www.youtube.com/watch?v=1K5oDtVAYzk

  • Post Author
    jonas21
    Posted February 7, 2025 at 6:05 pm

    ZIP codes are an emergent property of the mail delivery system. While the author might consider this a bad thing, this makes them "good enough" on multiple axes in practice. They tend to be:

    – Well-known (everybody knows their zip code)

    – Easily extracted (they're part of every address, no geocoding required)

    – Uniform-enough (not perfect, but in most cases close)

    – Granular-enough

    – Contiguous-enough by travel time

    Notably, the alternatives the author proposes all fail on one or more of these:

    – Census units: almost nobody knows what census tract they live in, and it can be non-trivial to map from address to tract

    – Spatial cells: uneven distribution of population, and arbitrary division of space (boundaries pass right through buildings), and definitely nobody knows what S2 or H3 cell they live in.

    – Address: this option doesn't even make sense. Yes, you can geocode addresses, but you still need to aggregate by something.
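
    To make the "you still need to aggregate by something" point concrete, here is a minimal sketch that bins geocoded points into fixed-size latitude/longitude cells and sums a value per cell. The plain degree grid is only a stand-in for S2 or H3 cells, and the function names, cell size, and sample orders are all hypothetical.

```python
import math
from collections import defaultdict


def grid_cell(lat, lon, cell_deg=0.01):
    """Return the (row, col) index of the grid cell containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def aggregate_by_cell(points, cell_deg=0.01):
    """Sum a value per grid cell. points: iterable of (lat, lon, value)."""
    totals = defaultdict(float)
    for lat, lon, value in points:
        totals[grid_cell(lat, lon, cell_deg)] += value
    return dict(totals)


# Hypothetical geocoded orders: (lat, lon, order_total)
orders = [
    (40.7128, -74.0060, 25.0),
    (40.7130, -74.0055, 10.0),
    (40.7306, -73.9866, 40.0),
]
totals = aggregate_by_cell(orders)
for cell, total in sorted(totals.items()):
    print(cell, total)
```

    Note that fixed-degree cells are not equal in area (a degree of longitude shrinks toward the poles), which is one of the things purpose-built spatial indexes are designed to handle better.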

  • Post Author
    trgn
    Posted February 7, 2025 at 6:28 pm

    First the mercator projection, now they're coming after the zip codes.

  • Post Author
    serjester
    Posted February 7, 2025 at 6:30 pm

    H3 is awesome here! What I don't think many people realize is that H3 cells and normal geographic data (like zips) are not mutually exclusive. You can take zip outlines, and find all the h3 cells within them and allocate your metric accordingly (population, income, etc).

    This makes joining disparate data sources quite easy. And this also lets you do all sorts of cool stuff like aggregations, smoothing, flow modeling, etc.

    We do some geospatial stuff and I wrote a polars plugin to help with this a while back [1].

    [1] https://github.com/Filimoa/polars-h3
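
    The "find the cells inside an outline and allocate the metric" idea can be sketched without any dependencies. This sketch assumes a plain square grid in planar units instead of real H3 hexagons, a hand-made L-shaped outline instead of a real ZIP boundary, and an equal split of the metric across cells; all names and numbers are hypothetical, and this is not the polars-h3 API.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def cells_in_polygon(polygon, cell_size):
    """Return grid cells (col, row) whose centers fall inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    col_min, col_max = math.floor(min(xs) / cell_size), math.floor(max(xs) / cell_size)
    row_min, row_max = math.floor(min(ys) / cell_size), math.floor(max(ys) / cell_size)
    cells = []
    for col in range(col_min, col_max + 1):
        for row in range(row_min, row_max + 1):
            cx, cy = (col + 0.5) * cell_size, (row + 0.5) * cell_size
            if point_in_polygon(cx, cy, polygon):
                cells.append((col, row))
    return cells


def allocate(metric, polygon, cell_size):
    """Split a polygon-level metric equally across the cells it covers."""
    cells = cells_in_polygon(polygon, cell_size)
    if not cells:
        return {}
    share = metric / len(cells)
    return {cell: share for cell in cells}


# Hypothetical L-shaped outline in arbitrary planar units, population 1200
zip_outline = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
allocation = allocate(1200.0, zip_outline, cell_size=1.0)
```

    Once two datasets are expressed on the same cells, joining them is a plain key match. An equal split is the crudest possible weighting; in practice you would likely weight each cell by something like land use or known population.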

  • Post Author
    agtech_andy
    Posted February 7, 2025 at 6:39 pm

    Zip codes are great for anything with delivery logistics.

    Anything else is a loose correlation at best, that will likely change over time.

  • Post Author
    PLenz
    Posted February 7, 2025 at 6:55 pm

    I gave a talk at DataEngConf many years ago: https://www.datacouncil.ai/talks/zip-codes-and-other-lies-yo…

  • Post Author
    zuhayeer
    Posted February 7, 2025 at 7:25 pm

    This is interesting since zip codes came up in consideration for how we built out our pay choropleth map in the US: https://levels.fyi/heatmap

    Though ultimately it was far too granular (for example the Bay Area would be so many different zip codes). Instead we went with Nielsen's DMA (Designated Market Area) mappings within the US to abstract aggregated data a bit better. And of course this DMA dataset also had a different original use case. It was used for TV / media market surveys so it has some weird vestiges. Some regions are grouped very far and wide (you'll notice there's a bit of Denver within Nevada and it's just a remnant of how it used to be categorized), but it still provides a bit of a broader level grouping than something acute like zip code.

    I do like this map from the article though and the granularity you can get with zip code when zooming: https://clausa.app.carto.com/map/29fd0873-64cb-42a6-a90d-c83…

    We've also been considering using Combined Statistical Areas using population instead. This is something that is under way, and in the interim we've considered charting styles that don't necessarily need borders (for example this bubble map: https://www.levels.fyi/bubble-plot/europe/). The benefit with DMAs is that it offers full border coverage of the entire US whereas some hubs can still be missing from CSAs if relying on a population threshold. But the plan is to create some of our own regional definitions and borders using our own submissions combined with population. Will be an interesting project.

    GeoJSON data for the map borders: https://github.com/PublicaMundi/MappingAPI/blob/master/data/…

    Nielsen DMA regions: https://blocks.roadtolarissa.com/simzou/6459889

  • Post Author
    dhunter_mn
    Posted February 7, 2025 at 7:27 pm

    I used to work for a company that basically merged USPS and Census Bureau data on a monthly basis. The output would be a roadbase that was optimized for address ranges on road segments. ZIP Codes were extra fun to work with.

  • Post Author
    eterevsky
    Posted February 7, 2025 at 7:31 pm

    ZIP codes are a simple approximation, which does its job well enough in most cases.

    The alternatives that the author suggests are much more complicated, both in terms of the implementation and in terms of convincing the user to give you their full address.

  • Post Author
    ej1
    Posted February 7, 2025 at 7:38 pm

    [flagged]

  • Post Author
    Zamicol
    Posted February 7, 2025 at 7:39 pm

    I wrote the blackout system for Comcast TV scheduling. My understanding was that blackouts were used mostly for sports where games need to be available in one area and not others. Contractually, they were required to use zip codes, so I used the US Post office's zip code data to enforce blackouts.

  • Post Author
    ej1
    Posted February 7, 2025 at 7:40 pm

    [flagged]

  • Post Author
    lacoolj
    Posted February 7, 2025 at 7:41 pm

    For anyone curious, here is the official US Gov list of ZIP codes in CSV with lots of helpful related data (longitude, latitude, etc.)

    http://federalgovernmentzipcodes.us/free-zipcode-database-Pr…

  • Post Author
    ivell
    Posted February 7, 2025 at 7:43 pm

    India is experimenting with Digipin https://www.indiapost.gov.in/Navigation_Documents/Static_Nav…

    Which is derived from longitude and latitude..

  • Post Author
    mannyv
    Posted February 7, 2025 at 7:43 pm

    Zip codes, ZCTAs, and TIGER/Line are good enough for what most people need. Maybe you can find an edge by using something more granular…but I'm not sure what edge you'd be looking to get with geodata. Maybe for real estate trends and/or market analysis?

  • Post Author
    honestSysAdmin
    Posted February 7, 2025 at 7:57 pm

    [dead]

  • Post Author
    mmmlinux
    Posted February 7, 2025 at 7:59 pm

    Can anyone tell me why I have to enter both my city / state and a zip code? Shouldn't one or the other of those, plus my street address, be enough information?

  • Post Author
    ubermonkey
    Posted February 7, 2025 at 8:00 pm
  • Post Author
    freyfogle
    Posted February 7, 2025 at 8:24 pm

    There are many problems with zip codes / postal codes but the biggest two we see are:

    a. Excel treats them as numbers instead of strings of digits and thus drops the leading 0

    b. Developers make assumptions about postal codes based on how they work (or more usually how the developer incorrectly thinks they work) in their own country and these assumptions absolutely do NOT hold in other countries.

    A relevant guide to geocoding and postal codes: https://opencagedata.com/guides/how-to-think-about-postcodes…
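
    For problem (a), one defensive habit is to treat the column as text from the moment it is read and to restore any zeros a spreadsheet already dropped. A minimal sketch, assuming US 5-digit ZIP codes only (per problem (b), this padding rule must not be applied to other countries' postal codes); the helper name and sample data are hypothetical.

```python
import csv
import io


def normalize_us_zip(raw):
    """Return a 5-character US ZIP string, restoring dropped leading zeros."""
    text = str(raw).strip()
    # Keep only the 5-digit part of a ZIP+4 such as "02134-1000".
    text = text.split("-")[0]
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a US ZIP code: {raw!r}")
    return text.zfill(5)


# Hypothetical export where a spreadsheet already turned "02134" into 2134
sample = "name,zip\nBoston office,2134\nNYC office,10001\n"
rows = list(csv.DictReader(io.StringIO(sample)))  # csv yields strings, never numbers
zips = [normalize_us_zip(row["zip"]) for row in rows]
print(zips)
```

    Rejecting anything that is not all digits is deliberate: a code like "SW1A 1AA" raises an error instead of being silently mangled.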

  • Post Author
    0xbadcafebee
    Posted February 7, 2025 at 8:30 pm

    Here's a recent podcast about why ZIP codes are not great for analysis: https://www.npr.org/2025/01/08/1223466587/zip-code-history

  • Post Author
    spankalee
    Posted February 7, 2025 at 8:32 pm

    Unironically, what a great sales blog post!

    It's so well written and informative that I completely didn't mind the "and here's how to do it in Carto" bit in the middle. Instead I thought they earned it.

  • Post Author
    Anon84
    Posted February 7, 2025 at 8:36 pm

    This is an example of the well known Modifiable Areal Unit problem: https://en.wikipedia.org/wiki/Modifiable_areal_unit_problem In general, your statistics depend on how you define your areas and you will get different pictures with different definitions.
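
    A toy illustration of the effect, using made-up numbers: the same six households along a street are averaged under two different zonings, and a high-income pocket that is obvious under one set of boundaries disappears under the other. The data and function name are hypothetical.

```python
from statistics import mean

# Hypothetical households along a street: (position_km, income in $1000s)
households = [(0.5, 30), (1.5, 30), (2.5, 90), (3.5, 90), (4.5, 30), (5.5, 30)]


def zone_means(data, boundaries):
    """Average income per zone; zone i spans [boundaries[i], boundaries[i+1])."""
    result = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        incomes = [income for pos, income in data if lo <= pos < hi]
        result.append(mean(incomes))
    return result


zoning_a = zone_means(households, [0, 2, 4, 6])  # three 2 km zones
zoning_b = zone_means(households, [0, 3, 6])     # two 3 km zones
print(zoning_a)  # [30, 90, 30]
print(zoning_b)  # [50, 50]
```

    Nothing about the households changed between the two runs; only the boundaries did.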

  • Post Author
    flappyeagle
    Posted February 7, 2025 at 8:37 pm

    No

  • Post Author
    JackFr
    Posted February 7, 2025 at 9:07 pm

    When doing your first ML project, zip codes are unsurpassed in providing a set of hand written digits to train on.

  • Post Author
    OriPekelman
    Posted February 7, 2025 at 10:12 pm

    Well, funny story: some twenty-something years ago I worked on an election-cycle volunteer infrastructure project in France. Living in Paris, which is department 75 and therefore 750xx, with the prefecture being 75000, I assumed the codes were neatly hierarchical: 75004 won't be far away from 75003 (true)… the French thing being orderly and rational.

    I didn't need much precision so truncating seemed an easy way to group stuff.

    Oh, the surprise. I never again made such assumptions; let's just say I should have gotten a clue from Corsica being 2A and 2B.

© 2025 HackTech.info. All Rights Reserved.
