Most of your data will be transferred via stream or batch process into your Data Warehouse or Data Lake. You will choose stable ETL or ELT processes to bring the majority and most relevant data from your legacy systems, CRM, ERP, IOT, etc.
您的大多数数据将通过流或批处理过程传输到Data Warehouse或Data Lake中。 您将选择稳定的ETL或ELT流程,以从旧系统,CRM,ERP,IOT等中获取大多数和最相关的数据。
However, in the world of Data Science and Self Service BI your Data Analyst/Scientist and BI users will also need data from external systems, open API’s, data from sheets, websites and so on. Therefore, tools like Google Sheets could be a great solution to implement a data flow easily. This article should give you a first idea how you can combine Google’s powerful BigQuery with easy to use Googles Sheets and its’ endless add-ons.
但是,在数据科学和自助服务BI的世界中,您的数据分析师/科学家和BI用户还将需要来自外部系统的数据,开放API,表,网站的数据等等。 因此,诸如Google表格之类的工具可能是轻松实现数据流的绝佳解决方案。 本文应该让您初步了解如何将Google强大的BigQuery与易于使用的Googles表格及其无尽的附加组件结合在一起。
谷歌云 (Google Cloud)
Normally you would connect your Google Sheets to BigQuery and load table data into your sheets to visualize it in bar charts, pie charts, etc. But you can also use sheets as a source for BigQuery and vice versa. You don’t have to worry about connectors etc. There are already build in and ready to use.
通常,您会将Google表格与BigQuery关联,然后将表格数据加载到表格中,以条形图,饼图等形式对其进行可视化。但是您也可以将表格用作BigQuery的来源,反之亦然。 您不必担心连接器等。已经内置并且可以使用。

一些启示 (Some Inspiration)
During my working experiences, I came across Google Sheets and its’ great add-ons that provide me with possibilities for automated data processes for BigQuery:
在工作过程中,我遇到了Google表格及其强大的附加功能,这些功能为我提供了BigQuery自动化数据处理的可能性:
Get data from a website or a file
从网站或文件获取数据
With build-in-functions you can fetch data from any website (e.g. HTML tables) or linked files like CSV or TSV.
使用内置功能,您可以从任何网站(例如HTML表格)或CSV或TSV等链接文件中获取数据。

Transform your data
转换数据
For geocoding my address data I once had to merge three columns into one, separated by commas. Therefore, I used the Sheets’ add-on Power-Tools:
为了对地址数据进行地址解析,我曾经不得不将三列合并为一列,并以逗号分隔。 因此,我使用了Sheets的附加电动工具 :

With this great add-on, you can easily merge and combine data, find duplicates, randomize data and much more without any coding. This could be a great advantage for users without a deeper knowledge of SQL. So you don’t have to transform the data within BigQuery — which costs per Query — you can do this step beforehand, load the data to BigQuery and visualize it e.g. via Data Studio.
有了这个出色的插件,您可以轻松地合并和合并数据,查找重复项,将数据随机化等等,而无需任何编码。 对于没有更深入SQL知识的用户来说,这可能是一个巨大的优势。 因此,您不必在BigQuery中转换数据(每个查询的成本),您可以预先执行此步骤,将数据加载到BigQuery并通过Data Studio可视化。
Geocode data
地理编码数据
After transforming my data, I wanted to geocode it. Another great add-on came along: Geocode by Awesome Table
转换数据后,我想对其进行地理编码。 另一个很棒的附加组件出现了: Awesome Table提供的地址解析

So this add-on provided me a service to get the latitude and longitude of an address. This is a great feature because for map visualizations you will often need coordinates.
因此,此附加组件为我提供了获取地址纬度和经度的服务。 这是一个很棒的功能,因为对于地图可视化来说,您通常需要坐标。
Get data from an API
从API获取数据
There are a lot of open API’s you can use to get data you don’t have in your Data Lake yet. I searched for an API to get synonyms for certain words — to combine them with ERP data in BigQuery and to find suspicious bookings. I also used Google Sheets to call the API via HTTP request. You can do it, for example, with App Script on your own or also use and add-on, here I liked API Connector.
您可以使用许多开放的API来获取Data Lake中还没有的数据。 我搜索了一个API,以获取某些单词的同义词-将它们与BigQuery中的ERP数据结合起来并查找可疑的预订。 我还使用Google表格通过HTTP请求来调用API。 例如,您可以自己使用App Script进行操作,也可以使用附加功能,这里我喜欢API Connector。

Statistics and Machine Learning
统计与机器学习
Another use case could be statistics and machine learning. Could Google Sheets be a suitable solution for these areas? Yes, it can be. The following two tools can be very helpful:
另一个用例可能是统计和机器学习。 Google表格可以适合这些领域吗? 是的,可以。 以下两个工具可能非常有用:
Solver: for advanced statistics operations
求解器:用于高级统计操作
BigML: for machine learning via drag and drop
BigML:通过拖放进行机器学习
Results could also be loaded into BigQuery and be enriched with data from others sources.
结果也可以加载到BigQuery中,并用其他来源的数据进行充实。
局限性和注意事项 (Limitations and Notes)
- Keep in mind, that Google Sheets is a spreadsheet program and should be used mainly for that reason — it’s not suitable for heavy data loading processes. 请记住,Google表格是一个电子表格程序,主要应出于此原因使用-它不适用于繁重的数据加载过程。
- Also consider to implement an app-script-based (if not provided by add-on) and automated refresh for your data, so that the data flow from sheets to BigQuery always provide up-to-date data. 还可以考虑为您的数据实施基于应用脚本的(如果附加组件未提供)和自动刷新,以便从工作表到BigQuery的数据流始终提供最新数据。
- For their own usage, it is okay to let also business users get their desired data by the ways described above. However, when considering to spread the data via Data Lake to other consumers, the data integrations via Google Sheets should be monitored or even implemented by the IT department or at least technical advanced staff. 对于他们自己的用法,还可以通过上述方法让业务用户也获得他们所需的数据。 但是,当考虑通过Data Lake将数据分发给其他用户时,应该由IT部门或至少由技术高级人员来监视甚至实施通过Google Sheets进行的数据集成。
- The mentioned add-ons are pay-to-use services with certain prize ranges. 提到的附加组件是具有一定奖赏范围的按使用付费服务。
资料来源和进一步阅读 (Sources and Further Readings)
https://support.google.com/docs/answer/3093335?hl=de
https://support.google.com/docs/answer/3093335?hl=de
https://support.google.com/docs/answer/9702507?hl=en
https://support.google.com/docs/answer/9702507?hl=zh_CN
https://bigml.com/tools/bigml-gas
https://bigml.com/tools/bigml-gas
https://gsuite.google.com/marketplace/app/api_connector/95804724197
https://gsuite.google.com/marketplace/app/api_connector/95804724197
https://gsuite.google.com/marketplace/app/geocode_by_awesome_table/904124517349
https://gsuite.google.com/marketplace/app/geocode_by_awesome_table/904124517349