37

What's a correct format of a geographical address/location which is a good fit for any address on the Earth? At the moment I have:

  • country
  • city
  • street
  • number
  • text data (for simplicity)
  • zip
  • lat/lng

But I believe I can improve it: there might be a state/region of a country or something like area. Or no area/region/state, say, in Singapore or Hong Kong.

There might be no street, but road or boulevard or something else. A number of a building might be compound. There might be a floor. A room number. Etc....

Xwaro
  • 411

8 Answers8

54

Google has developed a library that helps validate postal addresses for every country in the world, which you can use to design a schema to store this data.

Look for the most common required fields across addresses from your targeted customer base to get started, and as you identify further countries with different requirements you can continue to adjust your schema.

mitchdav
  • 735
42

The universal way to store a geographical address/location in a database is this one:

[Address] nvarchar(max) not null

This requires the least amount of programming code (and so cuts maintenance costs) and is fully compatible with any address. It has, however, three big issues:

  • The lack of data validation means that the field can be used for the purposes other than storing the address. One of the purposes is a DOS attack intended to fill the space of your database by entering 2 GB of data in the address field.

  • The data stored this way makes it impossible to process it for business intelligence and data mining purposes. For instance, how many users are from India? There is no easy way to tell, since those addresses won't be normalized.

  • The users may mistakenly enter an incomplete or plainly wrong address.

In order to mitigate the first issue, limit the field to what you think to be a reasonable limit. Personally, I would start with 1000 characters, and then reduce it based on the length of the addresses entered by the first users once you get a data set large enough.

In order to mitigate the other two issues, you can use a third-party API which parses addresses and presents you with the data containing the country, city, postal code, etc. If possible, the API should be able to display the address on a map back to the user to reduce the risk for the user of entering an incomplete or wrong address: most users know where they live, and seeing a different position on a map would immediately give them a clue that they should check their input.

Note that whatever API you use, it won't be perfect. It will find most addresses, but not all of them. This means that if the API tells that the address doesn't exist, but the user insists that it does, you should a priori trust the user, even if he might be wrong.

This also means that you still should store the original user's input, side-by-side with the result of the API. This means that the schema becomes:

[RawAddress] nvarchar(max) not null
[ParsedAddress] xml null
37

There isn't one.

Every country has different address formats. If you are lucky, and they have a format at all!

Obviously latitude/longitude will give you a point on the globe, but it's not really useful for identifying individual houses. Just consider a tower block for example.

Your best bet is to check each countries postal service for an official format. This can be great for your backend database. But you will probably have to simplify it for end users as it will contain many more fields than most people are used to.

The UK one for example includes things like 'double dependant locality', but no one would know what that meant if you asked them.

Ewan
  • 83,178
21

The only universal format is to have a single text field which may have multiple lines of text. This will allow any possible address on earth.

JacquesB
  • 61,955
  • 21
  • 135
  • 189
14

I have been developing software solutions to be used in many countries. We address this issue by starting with the larger entity first, i.e. country then have fields down to the least common or smallest. It works well for all countries we have experimented with so far. We also have a smart duplicate prevention system, and merger for those that have somehow get in the system since users are very 'creative'. In the admin section we have an address field order per country setting. i.e. Japan has the Post/Zip-code first where as UK/US last.

In general, we use:

  • Country
  • Post/Zip-code
  • State/Province/Prefecture/County
  • City/Town/Village
  • Street/Road/Block
  • Building Name/Number
  • Specific/Custom Information

Once entered and saved, a conjugated version can be displayed leaving out fields not necessary.

As I said, this works for all those countries we have software had software in and is the result of developing since 1989.

Hope this helps somehow or at least provides another insight.

0

As already stated, the most universal (but impractical to validate and perhaps least useful) is a single big unicode field.

You could separate country from the rest of the address and store it as the ISO country code. It would normalize the country and offer some utility in validating the remainder of the address.

You could also separate postal code aka zip code from the rest of the address. This would also have some utility in validating the remainder of the address, and could be helpful (though imprecise) in geolocation. For example: in Canada you can uniquely identify any address specifying only postal code and street number (aka house number); this may not be true in all countries.

Dedicating fields to states/provinces or cities starts to get more problematic because of the variations in the way each country formulates an address. I've set up address tables having such fields because the initial audience is focused on North America, knowing that an international audiences would pose a problem fitting in. In most cases, they can be "shoe-horned" in, but it is an awkward and potentially failure-prone compromise - definitely not universal.

Zenilogix
  • 347
0

Contrary to Mitchdav's answer, I would advise against using Google's library. I searched the repository for various international places with unorthodox addressing schemes hoping to find unit test data, but worryingly I found zero hits in the entire repository.

I think your best bet is to treat an address as free-form multi-line text. It sucks that you maybe cannot validate all addresses, but some addressing formats are really weird and possibly unanticipated and in the end the responsibility of filling in the correct address rests on the user and in most applications the user bears any negative consequences of filling in an invalid address.

You might, maybe, use a validator to provide a warning, but nothing more than that. But don't reject addresses that don't validate, because otherwise you might lose some customers. Which leads to the question of how to communicate the warning to the user in such a way that it will communicate that, if the user lives in an area with a weird address format, it's safe to ignore the warning...

0

As you say any address on earth there's only lat long or...

https://what3words.com

What 3 words, is an algorithm (so not a database so can be embedded into anything) that can define a 3x3 metre patch of anywhere on Earth.

Tonga and a few other states have adopted it as their postcode system, whilst it'll not replace it as an overlay its pretty cool, and very well built and thought out.