There are a handful of problems that keep recurring across my decades of working in information technology. Some problems are simple to address, some are always annoying, and a few of them seem doomed to need re-solving each time.
The one on my mind today is that of addresses and locations. The crux of the problem is this: addresses and locations are two very different things, and yet information system designers and implementers persistently mix the two up, and fail to grasp the subtle complexities of either.
I’ve struggled with the consequences of this in many different arenas — validation of addresses, parsing addresses to try to identify a location, going from locations to addresses, and all sorts of misguided attempts to perform geographical analysis of data without having any locations.
So what do I mean by the assertion that addresses and locations are different things?
I will be blunt:
- locations are points (or polygons) that can be indicated on a map;
- addresses are instructions to humans for how to deliver something to a location, or how to navigate to a location.
Locations are quite straightforward. Here’s one, somewhere near Salisbury at 51.184150, -1.831776:

The thing about a location is that it can be described unambiguously with reference to some coordinate system. The most common and obvious one is latitude and longitude — here 51.184150,-1.831776 — but that’s not the only coordinate system that we can use.
A fascinating project is what3words, which has divided the world into 3 m squares and assigned each square a unique name consisting of three words, in this case “feasted, paraded, solder”. As with a latitude/longitude pair, the label is unambiguously associated with a point on a map. A similar effort is being made by Google’s Plus Codes project, which labels our location as “55M9+MG, Salisbury”.
A quick aside about those schemes, by the way: they still do not necessarily identify a precise location. Each has a margin of error, and most data sets are accurate only to within a few meters. More importantly, they do not include elevation. I used to deal with data sets that contained locations with the same latitude and longitude, but some were below ground (coal mines) and some were above ground (structures built over the top of railway lines). Our habit of mapping the world onto two-dimensional maps tricks us into forgetting that locations are three-dimensional, a fact which is rather important for air traffic…
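To make the point about elevation and accuracy concrete, here is a minimal Python sketch of a location record that carries both. The `Location` class, its field names and the sample elevations are illustrative assumptions, not taken from any real data set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    """A point on, above, or below the Earth's surface."""
    latitude: float                      # decimal degrees
    longitude: float                     # decimal degrees
    elevation_m: Optional[float] = None  # meters relative to ground level; None if unknown
    accuracy_m: Optional[float] = None   # horizontal margin of error in meters

    def __post_init__(self) -> None:
        # Reject coordinates that cannot exist on a latitude/longitude grid.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")


# Invented elevations: same latitude and longitude, two distinct locations.
below_ground = Location(51.184150, -1.831776, elevation_m=-300.0, accuracy_m=5.0)
above_ground = Location(51.184150, -1.831776, elevation_m=8.0, accuracy_m=5.0)
print(below_ground == above_ground)  # False: the elevations differ
```

A record that stored only latitude and longitude would treat these two as the same place.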
Now, the nice thing about locations is that they can be considered to be inside other polygons on a map (or spaces, if we want to remember the third dimension!). So for our example above, we can determine that it’s inside various areas as described by OpenStreetMap:
- Cursus Ridge Barrows
- Larkhill
- Wiltshire
- South West England
- England
- postal code SP3 4DX
- United Kingdom
This starts to give us the handle we need for grouping and analysing data by location, as long as our locations are somewhere as well organised and subdivided as rural England. 48.867765,-20.769324 might give us more trouble, being somewhere in the rather large area we just call the North Atlantic Ocean.
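The "which areas is this point inside?" question can be sketched with a simple ray-casting test. This is a rough illustration only: it treats latitude and longitude as flat x/y coordinates (reasonable for small areas away from the poles and the antimeridian), and the rectangles below are invented stand-ins, not real administrative boundaries.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as flat (x, y)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count the polygon edges crossed by a ray running east from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangles for illustration; real boundaries are far more complex.
areas = {
    "small box around the point": [(-1.84, 51.18), (-1.82, 51.18), (-1.82, 51.19), (-1.84, 51.19)],
    "larger box": [(-2.4, 50.9), (-1.5, 50.9), (-1.5, 51.7), (-2.4, 51.7)],
    "box elsewhere": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
}

here = (-1.831776, 51.184150)
containing = [name for name, polygon in areas.items() if point_in_polygon(here, polygon)]
print(containing)  # ['small box around the point', 'larger box']
```

In practice a GIS library or a spatial database would do this against real boundary data; the sketch only shows why a point nests naturally inside a hierarchy of areas.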
The alternative approach, which is common in GIS systems, is to group and analyse locations based on geographical proximity to each other, or to some fixed point.
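Grouping by proximity to a fixed point can be sketched with the haversine formula, which gives the great-circle distance between two latitude/longitude pairs. This assumes a spherical Earth with a mean radius of 6371 km, so it is an approximation; the function names and the 50 km radius are illustrative choices.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean radius; treats the Earth as a sphere

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to guard against floating-point values a hair above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def within_radius(centre: LatLon, points: List[LatLon], radius_km: float) -> List[LatLon]:
    """Keep only the points that lie within radius_km of the centre."""
    return [p for p in points if haversine_km(centre[0], centre[1], p[0], p[1]) <= radius_km]


centre = (51.184150, -1.831776)
points = [(51.2, -1.8), (48.867765, -20.769324)]  # one nearby, one in the North Atlantic
print(within_radius(centre, points, radius_km=50.0))  # [(51.2, -1.8)]
```

Note that this works equally well for the mid-ocean point, which belongs to no convenient named area.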

Let’s turn toward addresses now. As I said above, we really cannot think of addresses as anything other than instructions to humans on how to deliver something — usually mail.
To start with, addresses are really complicated. The wonderful collection of falsehoods programmers believe in includes references to several explainers of just how messy addresses are in different parts of the world, although it does miss a rather pithy one related mainly to UK addresses. I particularly like an example that the latter points out, where the DVLA Swansea office has five different postcodes for five different departments at the same location. Rather than identifying any sort of location, the postcode component of the address is used specifically to route mail to different parts of the building.
If you are ever building a user interface, or a database schema, that breaks an address down into something like “street number, street name, street type, town, state, country”… just stop and go read some of the explainers about how addresses work. To begin with, there is no internationally accepted standard for address formatting. Wikipedia points to 60+ conventions for address formatting from around the world, and provides references to national standards where they exist. Google have suggested a standard, mainly for their own purposes, related to their Geocoding API, and OpenStreetMap have been struggling to define a format which is flexible enough to cover a lot of cases.
The key question when designing or implementing any information system dealing with addresses is this: why do you need or want the address, and what are you going to do with it?
To begin with, if you are collecting postal addresses, and you are only using them to send your customers mail — let your customers decide how to format the address, and store it as an arbitrary set of free text lines. If you really need to do analysis based on some broad geographic area like country, state or postal code, then break those out as specific fields that are validated against some reliable authoritative set. If you do this, be prepared to keep your validation set updated, as even countries come and go — what happens if you have broken up your address into separate fields, and a decade later the region name has changed?
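As a rough sketch of that advice, an address record can keep the customer's own lines as free text and break out only the fields needed for analysis. The `PostalAddress` class, its field names and the tiny `KNOWN_COUNTRY_CODES` set are hypothetical; a real validation set would come from an authoritative source and be kept up to date. The sample address is fictional.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical validation set; in practice, load this from an authoritative,
# regularly refreshed source rather than hard-coding it.
KNOWN_COUNTRY_CODES = {"GB", "FR", "AU"}


@dataclass
class PostalAddress:
    lines: List[str]                    # free text, exactly as the customer wrote it
    country_code: Optional[str] = None  # broken out only because we analyse by country
    postal_code: Optional[str] = None   # broken out only because we analyse by postal code

    def __post_init__(self) -> None:
        if not any(line.strip() for line in self.lines):
            raise ValueError("address needs at least one non-empty line")
        if self.country_code is not None and self.country_code not in KNOWN_COUNTRY_CODES:
            raise ValueError(f"unknown country code: {self.country_code}")

    def label(self) -> str:
        """The text to print on an envelope: the customer's lines, untouched."""
        return "\n".join(self.lines)


address = PostalAddress(
    lines=["Flat 2, The Old Bakery", "1 Example Street", "Exampletown"],
    country_code="GB",
)
print(address.label())
```

The design choice is that nothing in `lines` is ever parsed; the separate fields exist for grouping and reporting, and they are the only part that needs validating.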
Oh, by the way: don’t make the mistake of assuming that people live at their postal addresses. I think I’ve seen that mistake at least four times.
For the UK, you might think you have a nicely unambiguous mapping between postal address and a reasonably precise location… and you do, for long established urban areas. Even here though, it’s estimated that about 1.25 million postal addresses change their post code in the UK each year. For analytical purposes, that’s a real headache: if you analyse time-series data over years or decades based on post code, there is a significant risk of gross inaccuracies in the analysis.
If you’re collecting addresses to identify locations… you need to think very carefully about how you do that mapping, and how the quality of that data might degrade over time. Also, you almost certainly don’t need an address. At best you might be able to make an argument for retaining a post code, region name, or town name for analytical purposes. “Do customers in Paris drink more chablis than customers in Marseille?” is a good question, but comparing customers based on their street is not likely to be much use.
Mapping an address to a location — geolocation of that address — is an extremely non-trivial problem. It’s somewhat reliable in some countries in urban areas. Mostly. In general terms, urban areas in many countries have well defined addressing schemes that can be easily transformed back to a location.
Google Maps, OpenStreetMap, Apple Maps and services from GPS providers like Garmin are seductively good at putting a dot on the map when you query them with an address. But even that can be a problem. I worked on a project many years ago dealing with a dataset of properties in Outback Australia. The problem was that the legal, formal address for many properties might indicate access from a certain road, while in reality access might be from another side of the property, on a different road, an hour or more’s drive away. Emergency services like fire and ambulance were very keen to understand how to reach a property in a hurry, as you may imagine! Dropping a dot in a polygon slightly larger than Belgium is not much help.
The bad news is that there is no straightforward, reliable way to geolocate all addresses. There’s simply too much variation between countries, too much ambiguity outside urban areas, and too much potential difference between a formal, legal address and where the residents think they live. Even in urban contexts the two can differ. For example, the postal address for a property on a corner is frequently different to the street address: many countries use the convention that the legal address is based on where your front door (or driveway) is, while the postal address is where your letter box is.
The only truly reliable way to deal with locational data is to use a dedicated GIS or digital mapping solution, to associate data with polygons or points on a map. It’s feasible to use APIs from places like Google or OpenStreetMap to attempt to resolve an address back to a point or a polygon, but you must be prepared for there to be addresses that cannot be mapped, and addresses that are inaccurately mapped.
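Being "prepared" for unmapped and badly mapped addresses might look something like the following sketch. Everything here is hypothetical: the `GeocodeResult` shape, the `confidence` score and the 0.8 threshold are assumptions for illustration, and `fake_geocoder` stands in for whichever real service you wrap; no real API is being described.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass
class GeocodeResult:
    latitude: float
    longitude: float
    confidence: float  # hypothetical score from 0.0 (a guess) to 1.0 (certain)


# Any function that takes an address string and returns a result, or None on failure.
Geocoder = Callable[[str], Optional[GeocodeResult]]


def triage(
    addresses: Iterable[str], geocode: Geocoder, min_confidence: float = 0.8
) -> Tuple[Dict[str, GeocodeResult], Dict[str, GeocodeResult], List[str]]:
    """Split addresses into mapped, doubtful (low confidence) and unmapped."""
    mapped: Dict[str, GeocodeResult] = {}
    doubtful: Dict[str, GeocodeResult] = {}
    unmapped: List[str] = []
    for address in addresses:
        result = geocode(address)
        if result is None:
            unmapped.append(address)        # nothing came back at all
        elif result.confidence < min_confidence:
            doubtful[address] = result      # a dot on the map, but not one to trust
        else:
            mapped[address] = result
    return mapped, doubtful, unmapped


# Stand-in geocoder backed by a lookup table, for demonstration only.
_FAKE_DATA = {
    "urban address": GeocodeResult(51.184150, -1.831776, confidence=0.95),
    "remote property": GeocodeResult(-25.0, 134.0, confidence=0.30),
}


def fake_geocoder(address: str) -> Optional[GeocodeResult]:
    return _FAKE_DATA.get(address)


mapped, doubtful, unmapped = triage(
    ["urban address", "remote property", "no such place"], fake_geocoder
)
print(sorted(mapped), sorted(doubtful), unmapped)
```

The point of the three buckets is that "doubtful" and "unmapped" are expected outcomes to be reviewed, not exceptions to be swallowed.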
As always, the key to intelligent design and implementation of an information system dealing with addresses and locations is to properly consider what you want to do with the information, and to properly understand the complexity of the data you are collecting or analysing. And remember — addresses are not locations; some locations have no addresses; and addresses may not map back to locations.
Before signing off, I’d like to recommend a book: The Address Book: What Street Addresses Reveal About Identity, Race, Wealth and Power by Deirdre Mask. It’s a tremendous introduction to some of the complicated history of addresses, a revealing look at the way that addresses are not value-free, and a survey of some of the projects going on around the world to bring addresses to places that have never had them.
Translated from: https://medium.com/the-innovation/addresses-are-not-locations-ee99fae4170a