I work with locations and addresses on an international system. They are nowhere near standardized enough to allow this. Also, at scale, you'll get things that seem like they shouldn't be addresses but are.
One of our addresses that caused trouble is literally: "The yellow sign across the street from the Seven-Eleven at <reasonable address>".
We have one address that's legally in two countries at the same time.
One address is just a whole city. Like the entirety of the city, but also it still needs to be considered a separate place from the city.
It of course depends on your use cases, etc. But I find addresses can be like storing names in a lot of contexts, i.e. just take the bytes the user gives you and alert them if some service downstream complains, but don't require they change it to meet your requirements.
There are systems to help offer standardized addresses and you can display them as suggestions to the user. But sometimes you get a multi-billion-dollar company telling you "Maybe the address is legally X, but the bus stops at the yellow sign across the street, and we get 30 customers calling for refunds every week because they didn't get on the bus. So either accommodate this change or we'll need to find a new partner."
I've come around to this view, too. Also like names, the best solution is to avoid using them for analysis. Run them through an address geocoding service and store the coordinates next to the address. Use the original address for sending mail or filling out forms, and coordinates for analysis.
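A minimal sketch of that split, assuming a geocoder is available as a plain function. The names `StoredAddress`, `store_address` and `fake_geocode` are made up for illustration, and the stand-in geocoder returns invented coordinates; a real one would call an external service.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


@dataclass(frozen=True)
class StoredAddress:
    raw: str                       # exactly what the user typed; use for mail and forms
    coords: Optional[Coordinates]  # geocoder result, or None; use for analysis only


def store_address(
    raw: str, geocode: Callable[[str], Optional[Coordinates]]
) -> StoredAddress:
    """Keep the raw text untouched and attach coordinates if the geocoder finds any."""
    return StoredAddress(raw=raw, coords=geocode(raw))


def fake_geocode(text: str) -> Optional[Coordinates]:
    """Hypothetical stand-in for a geocoding service, with invented data."""
    known = {"1 Example Street, Exampleville": (10.0, 20.0)}
    return known.get(text)
```

An address the geocoder can't place still gets stored; it just has no coordinates and drops out of the analysis.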
Unless cleaning, parsing, and geocoding addresses is one of your core business values, let somebody else do it. It's a lot of work that's never really finished. Find a good service, hand them your garbage addresses, and feel confident they'll do better than you could.
This is why two successive Deliveroo drivers went to the wrong street this week. The first 'tried to deliver' and gave up, and I had to run after the replacement one.
Despite the address and postcode being both correct and unambiguous, the driver followed a pin to a badly geocoded coordinate, and didn't even look at the street name.
I'd encourage anyone (especially Deliveroo) to make more effort to model addresses in the correct locale (UK in my case) rather than taking these shortcuts.
I agree. We’ve built automation that ingests geocoding data from multiple services, and keeps the system up to date. The models for an “Address” get complex: a single entity can have many different types of addresses.
I understand these issues. That's precisely why a custom type seems appropriate -- it should be able to cover all the alternatives while not burdening you with the problem of storing a discriminated union as a set of disparate relations. Or, at the very least, if this approach still has some issues remaining, it should still have fewer issues than any other approach I can think of, because there's nothing you can't do with a custom type that you can do with a JSON blob. It's just that the custom type is likely to be much more efficient for the task, and it also keeps integrity checks as part of the type's implementation. And considering how often one needs to manipulate addresses in business settings, it seems a bit of a no-brainer to me that there should be some kind of first-class support for addresses, just like there's for example first-class support for datetimes with time zones these days.
I seriously doubt you do. The closest you can get to “standard address format” is:
Address Line 1
Address Line 2
Address Line n
Postal code (which can be blank)
Postal Area (which can be blank)
Country
There is no way to build “first-class” support for addresses, because there’s no such thing as a valid or invalid address, only whether or not someone can find the correct location by reading the address. Of course, that person should be a local and intimately familiar with local address conventions. Conventions that will change how you describe flat numbers, street name, and address line ordering.
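As a rough sketch, that loose format could be held in something like the following. `PostalAddress` and `label` are hypothetical names, and the order in which the trailing fields are printed is itself an assumption, since that varies by country too.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PostalAddress:
    lines: List[str]        # free-form address lines, in the order the user gave them
    country: str
    postal_code: str = ""   # may be blank
    postal_area: str = ""   # may be blank

    def label(self) -> str:
        """Render for printing: the user's lines first, then the trailing
        fields, skipping any blank entries."""
        parts = list(self.lines) + [self.postal_code, self.postal_area, self.country]
        return "\n".join(p for p in parts if p)
```

The type never looks inside the lines, so it can't reorder a flat number and a street name into the wrong convention.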
> And considering how often one needs to manipulate addresses in business settings,
Anyone who’s ever had to deal with real addresses would know this is the one thing you avoid doing with addresses. It’s pretty much impossible to correctly “manipulate” an address, because again, there’s no standard, it’s entirely dictated by local conventions, which can change street-to-street, city-to-city.
The best you can hope for when you’re forced to mutilate an address is to make the mutilation simple and obvious enough that the human actually delivering the post can un-mutilate it when they read it.
Is the only way to do this, btw. Any other options are going to be broken in other locations. I’m thinking specifically of apartment/unit number, e.g.
Street address
Apartment number
This seems equivalent, but due to country specific differences, street and apartment is impossible to do correctly, or at least way harder than using line 1, line 2. This is because sometimes apartment number should come first, other times street needs to come first so the user has to second-guess your system in order to get mail delivered, while numbered lines are (hopefully) less ambiguous.
Yeah this is one of my big frustrations with many address systems.
Street address
Apartment number
Is (I think) an American convention. Here in the UK you would have
Flat Number, Building Name
Street Name (or possibly Street Number Street Name)
Having addresses printed with the lines in the opposite order looks wrong to me. Additionally the American convention seems to assume all addresses are pretty simple, with there being an XOR relationship between street number and building name.
> Additionally the American convention seems to assume all addresses are pretty simple, with there being an XOR relationship between street number and building name.
It doesn't assume anything of the sort.
American addresses are a reference tag to direct the postal service where to send your mail. That's it.
I don't see how this changes anything. You can accommodate any address format, or any finite union of multiple address formats, including any computed or materialized views of the address with a custom type (AND including an "I give up" default for when everything else fails).
> there’s no standard, it’s entirely dictated by local conventions, which can change street-to-street, city-to-city
Well, that may be an international issue. In my country's case, it's quite clearly defined by law. A type for an international case might by necessity be a union of unions.
> In my country's case, it's quite clearly defined by law.
In theory the same is true in my country. But clearly someone forgot to inform the populace that not using the official standard is criminal, because I’ve seen plenty of “valid” addresses that don’t follow the standard. These addresses are clearly encoding local conventions, which makes decoding using the official standard not only impossible, but nonsensical, because manipulations that assume the standard will produce unusable addresses.
Best part is, I know this because I was responsible for the system that then had to munge these addresses into the “official” format for tax reasons. That code is 99% edge cases, 0.09% “well shit, just shove it all in the last line and hope for the best” and 0.01% “official” standard.
The type should always include provisions for manual fixes. IMO such an address type should even include a provision for manual geocoding for cases where automated geocoding fails ("This is how I write it, and this is where it's located").
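A rough sketch of what such a type might look like, assuming a single national format plus an as-given fallback. All class and field names here are hypothetical, and the structured variant is only a placeholder for whatever a given country's rules actually require.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class StructuredAddress:
    # Placeholder fields for one (assumed) national format.
    street: str
    house_number: str
    postal_code: str
    city: str


@dataclass(frozen=True)
class AsGivenAddress:
    text: str  # the "I give up" case: store what was written, unparsed


@dataclass(frozen=True)
class Address:
    value: Union[StructuredAddress, AsGivenAddress]
    # "This is where it's located": hand-entered (lat, lon) for when
    # automated geocoding fails.
    manual_coords: Optional[Tuple[float, float]] = None


def display(addr: Address) -> str:
    """Format the structured variant; pass the as-given variant through untouched."""
    v = addr.value
    if isinstance(v, StructuredAddress):
        return f"{v.street} {v.house_number}, {v.postal_code} {v.city}"
    return v.text
```

Whether the structured variant earns its keep depends on how many real addresses end up in the fallback.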
99% of the time an address is just a “unique” opaque identifier. You ingest the address, then you print it on the parcel. You avoid manipulating or trying to interpret it too much.
A system that attempts to codify addresses to a standard that can’t express all addresses (including their nuance) is useless. You can’t trust any analytics created from it, because by their nature, addresses that can’t be interpreted will appear in geographic clusters, and thus skew all your stats.
You can however collect addresses as opaque strings, and optionally request extra data of a known format (like zipcode or postal code) which is generally considered part of the address. You can then produce stats only on those well-known identifiers, and ignore the rest.
But doing that doesn’t require a complicated address type, or supporting address manipulations or any other crap like that. It just requires a free text box, and a separate postal code box.
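For illustration, stats over the postal code box alone could look roughly like this. `count_by_postal_code` is a made-up helper, and uppercasing plus trimming is an assumed normalisation that won't suit every country's codes.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_by_postal_code(records: Iterable[Tuple[str, str]]) -> Counter:
    """Count records per postal code.

    Each record is a (free_text_address, postal_code) pair. The free text is
    never inspected, and records with a blank postal code are skipped.
    """
    counts: Counter = Counter()
    for _free_text, postal_code in records:
        code = postal_code.strip().upper()
        if code:
            counts[code] += 1
    return counts
```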
The most important thing to recognise is that any arbitrary address will fit many different address conventions, but each of those conventions will result in a different location. It’s practically impossible to definitively interpret an arbitrary address correctly without significant amounts of additional local context. So it’s best not to bother, and let the postal workers figure it out using their local knowledge.
Well, in my case, it's analytics for planning and scheduling of operations. I need to figure out which service points to cluster together, and I specifically don't deal with parcels. Occasional individual outliers can be dealt with (some of these points DO need GPS coordinates, because occasionally there's an item like "the side of a shed on a parking lot", where the parking lot doesn't have a postal address), but having, say, 99.5-99.9% ("best effort") systematization is very useful, especially when looking at how to migrate former units of work (which might involve fighting some organizational structures in a large national company). Without this it's impossible for me to estimate, for example, the objective function difference between a system that routes operations completely arbitrarily with optimal route length and a system that uses somewhat sub-optimal routes but with vastly lower "human complexity" (for example, with routes spanning a small set of roads) and that doesn't need that much automation (a paper list of locations in order, the status quo of the former system).
> The most important thing to recognise is that any arbitrary address will fit many different address conventions, but each of those conventions will result in a different location.
In my case it definitely should not do that, even with different conventions. Maybe that's one of those international things.
That might work if you have infinite resources and can create special handling for individual cities in the world or maybe even individual neighborhoods.
Mailing works in a very decentralized way and has a lot of local variations when you go outside of places that have put a lot of effort into standardizing addresses. Most post offices won't be looking at your whole address. They just care about understanding enough to forward it to a post office one step closer to the final destination. Understanding the final local address might not even use written data. It might just be tacit knowledge that's shared between a few local postal workers.
IMO that just makes it a very interesting problem to work on.
> Understanding the final local address might not even use written data. It might just be tacit knowledge that's shared between a few local postal workers.
To me there seems to be a contradiction in those two statements. By definition, an address is written. You can't decide where to deliver an item in any other way. If one and the same written text of the address could imply two or more "final local addresses", and somehow the delivery worker decided where the item is actually supposed to arrive, how would the sender indicate the alternatives if not by including them in the text of the address? Or did you mean something different by this?
> You can't decide where to deliver an item in any other way.
Only if you’re a robot, which postal workers aren't. They can use local context, such as no one lives at address X so they must have meant address Y. Or even, person at address X has a birthday this week, so this envelope that looks like a birthday card, and has their misspelled name on it, is obviously for address X not address Y.
I’ve had our friendly postal worker deliver post correctly to me, despite having a thoroughly munged and incorrect address, because she recognised my name, and knew someone with a similar name didn’t live at the more obvious interpretation of the incorrect address.
So address parsing and mail delivery is an extremely human and imprecise process. Full of nuance and edge cases that can’t even be observed, unless you actually follow the humans making deliveries and see what they’re doing.
> Or even, person at address X has a birthday this week, so this envelope that looks like a birthday card, and has their misspelled name on it, is obviously for address X not address Y.
That seems awfully contextual and ad-hoc. Surely this mechanism won't work in many instances unless you only receive mail on your birthday. It's a nice thing if it sometimes succeeds even when it shouldn't, but that's not something you can rely on. And should you get a different mail worker who doesn't know you, poof, your mail is gone.
> I’ve had our friendly postal worker deliver post correctly to me, despite having a thoroughly munged and incorrect address, because she recognised my name, and knew someone with a similar name didn’t live at the more obvious interpretation of the incorrect address.
Considering that this was presumably a problem with an address written on a physical item as a linear text, that's not quite in the purview of the problems that I'm trying to solve for my own application which needs to process physical addresses of objects (sometimes not even involving people in any way). So I can't comment on mail delivery specifically, sadly.
Yup, but that’s never stopped someone from relying on a method in the past. The vast majority of addresses aren't written by engineers. If you sent a letter using an address once, and it worked, then most people will just assume it'll always work. How would they know any better?
> It's a nice thing if it sometimes succeeds even when it shouldn't, but that's not something you can rely on.
Have you seen the internet? Or even just HTML? The entire world relies on things working when they shouldn’t. We can talk all day about the merits of that approach, but it won’t change reality.
> Considering that this was presumably a problem with an address written on a physical item as a linear text
The text was printed perfectly, if that’s what you’re saying. It was just wrong. Some system somewhere had attempted to manipulate it, and ended up misinterpreting the original address, and produced something completely wrong as a result.
> I'm trying to solve for my own application which needs to process physical addresses of objects
That’s slightly different, and presumably you own far more of the process that’s producing and interpreting these addresses. I’ve worked in systems that had to deal with addresses created by normal people, and let me tell you, normal people have very diverse views on how to write addresses.
> That seems awfully contextual and ad-hoc. Surely this mechanism won't work in many instances unless you only receive mail on your birthday.
No, you're only more likely to receive stuff that looks like birthday cards on or around your birthday.
But, forget the birthday -- your earlier statement:
>>> To me there seems to be a contradiction in those two statements. By definition, an address is written
...is already contradicted if the postal worker just recognizes your name, and knows that you live at your actual address and not the one the written address more closely resembles. Your name isn't written in the address itself.
I know what to write on my envelopes so they get to me. You do not. Your job is to make it possible for things you send to get to me, not the other 3 units in the apartment.
Yes, I said that the complete type would have to have components developed by others as well, since I can't provide input for other countries. Doesn't mean that a minimum complexity formalization is impossible (in fact, mathematically, by enumeration one such formalization must exist). An "as-given" component is obviously always going to work for you.
Addresses are first and foremost a social convention. Simple mathematics will not help you understand something with so much implicit complexity and contradiction.
This sounds to me like a "just draw the rest of the owl" solution.
You're right in that there's nothing you can do with JSONB that you can't do with a custom type, but the larger problem is "do I even know how to make a custom type that might fit all of the various inconsistencies that I might face with this data?"
The answer, for me, is no. I have no idea, and I'm not going to pretend. I can throw it all into a giant chunk of JSON and leave an explicit note that address handling isn't, and at least that way no one actually thinks that it is, which is probably safer for everyone involved.
In my case, I actually do know that; there's even a legal definition of what an address is and there's a national registry of all addresses. The problem is that I can't solve this for every country myself since I only have national knowledge (which is sufficient for my needs, fortunately).
Sounds like, for your purposes, you can define an address type that you can map to columns. But I think you're the outlier.
We have fairly regular addresses in the UK. You can enforce a country and a postcode. You can enforce at least two lines of the local address. That's where it ends.
Right now I live in a house that's described as one village everywhere except by the local council, who address us by the other village. We're in a lane between the two. But the lane is impassable in the middle except with an off-road vehicle. Google Maps doesn't know that. Visitors get lost even with GPS and precise co-ordinates!
Before moving out here, I've frequently had to just enter "London" twice due to aggressive validation. I've even entered "London, London, Greater London". It works, but I know there's a senior engineer somewhere who's obstinate and wrong. Hell, I lived somewhere where flats B, C and D for our house number were a different door to A, because properties can be converted.
What is the value of structuring this data? Store a country, a postal code, and a multiline local field, and don't try to validate that field except maybe to clean up surplus whitespace.
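A small sketch of how little that cleanup needs to do. `tidy_local_address` is a hypothetical helper, and what counts as "surplus" whitespace (here: leading and trailing spaces, repeated spaces or tabs, empty lines) is an assumption.

```python
import re


def tidy_local_address(text: str) -> str:
    """Collapse runs of spaces/tabs, trim each line, and drop empty lines.

    Nothing else is changed: no reordering, no case changes, no validation.
    """
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Entering "London" twice would pass through unchanged, which is the point.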
When validation and parsing fails, there should be room for local fixes. There's an interesting question as to what extent should corner cases be code-driven and to what extent they should be data-driven. Your case definitely sounds like one of those things that would need to be fixed by hand and then associatively recalled (so that they'd need to be fixed only once).
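One way the data-driven side of that could be sketched is a table of manual fixes that is consulted before any automated lookup. `CorrectingGeocoder` and its methods are hypothetical, and matching on the whitespace-normalised, case-folded text is an assumption about what "the same address" means.

```python
from typing import Callable, Dict, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


class CorrectingGeocoder:
    """Wraps an automated geocoder with a table of manual fixes."""

    def __init__(self, geocode: Callable[[str], Optional[Coordinates]]) -> None:
        self._geocode = geocode
        self._overrides: Dict[str, Coordinates] = {}

    @staticmethod
    def _key(raw: str) -> str:
        # Treat addresses as equal if they differ only in whitespace or case.
        return " ".join(raw.split()).casefold()

    def fix(self, raw: str, coords: Coordinates) -> None:
        """Record a hand-made correction so it only has to be made once."""
        self._overrides[self._key(raw)] = coords

    def locate(self, raw: str) -> Optional[Coordinates]:
        """Prefer a recorded manual fix; otherwise ask the automated geocoder."""
        key = self._key(raw)
        if key in self._overrides:
            return self._overrides[key]
        return self._geocode(raw)
```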
> What is the value of structuring this data?
Well, for example I definitely need it for analytical purposes, so I have to try.
> Well, for example I definitely need it for analytical purposes, so I have to try.
Try accepting the inherent limitations of trying to analyze something so unruly and take your analysis with a larger grain of salt instead of forcing order on something so inherently ad hoc.
You have highlighted the problem I described in my original reply to the post, and although you have been pretty active in this thread trying to come up with solutions, you seem to summarise with "I don't know the answer" with this reply.
Try imagining your application(s) that requires addresses moving into new markets, like Norway, Sweden, Ghana or whatever.