Alternative Data for Trading

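This article discusses the use of alternative data in trading. It explains how individuals, business processes, and sensors produce alternative data, and provides a framework for evaluating the growing supply of alternative data for investment purposes. It demonstrates the workflow of acquiring, preprocessing, and storing data scraped from the web with Python, in preparation for applying machine learning. It closes with examples of alternative data sources, providers, and applications.

The article covers the alternative data revolution, alternative data resources, sources of alternative data, criteria for evaluating alternative datasets, the market for alternative data, usage, workflow, and code examples.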

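In algorithmic trading, new data sources offer an informational advantage if they provide information unavailable from traditional sources, or provide it sooner. Following global trends, the investment industry is rapidly expanding beyond market and fundamental data to alternative sources to capture alpha through an informational edge. Annual spending on data, technological capabilities, and related talent is expected to grow from the current $3 billion by 12.8% annually through 2020.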

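Today, investors can access macro or company-specific data in real time that historically was available only at a much lower frequency. Use cases for new data sources include the following:
 - Online price data on a representative set of goods and services can be used to measure inflation
 - The number of store visits or purchases permits real-time estimates of company or industry-specific sales or economic activity
 - Satellite images can reveal agricultural yields, or activity at mines or oil wells, before this information becomes available elsewhere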

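Alternative datasets are generated by many sources but can be classified at a high level as predominantly produced by:
 - Individuals who post on social media, review products, or use search engines
 - Businesses that record commercial transactions, in particular credit card payments, or capture supply-chain activity as intermediaries
 - Sensors that, among other things, capture economic activity through images (such as satellites or security cameras) or movement patterns (such as cell phone towers)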

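Alternative data sources differ in crucial respects that determine their value or signal content for algorithmic trading strategies.

The ultimate objective of alternative data is to provide an informational advantage in the competitive search for trading signals that produce alpha. In practice, the signals extracted from alternative datasets can be used on a standalone basis or combined with other signals as part of a quantitative strategy.

The article also provides examples of acquiring alternative data through web scraping, with code examples that use Selenium to scrape OpenTable data and Seeking Alpha earnings call transcripts.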

This chapter explains how individuals, business processes, and sensors produce alternative data. It also provides a framework to navigate and evaluate the proliferating supply of alternative data for investment purposes.

It demonstrates the workflow, from acquisition to preprocessing and storage using Python for data obtained through web scraping to set the stage for the application of ML. It concludes by providing examples of sources, providers, and applications.

Content

  1. The Alternative Data Revolution
  2. Sources of alternative data
  3. Criteria for evaluating alternative datasets
  4. The Market for Alternative Data
  5. Working with Alternative Data

The Alternative Data Revolution

For algorithmic trading, new data sources offer an informational advantage if they provide access to information unavailable from traditional sources, or provide access sooner. Following global trends, the investment industry is rapidly expanding beyond market and fundamental data to alternative sources to reap alpha through an informational edge. Annual spending on data, technological capabilities, and related talent is expected to increase from the current $3 billion by 12.8% annually through 2020.

Today, investors can access macro or company-specific data in real time that historically has been available only at a much lower frequency. Use cases for new data sources include the following:

  • Online price data on a representative set of goods and services can be used to measure inflation
  • The number of store visits or purchases permits real-time estimates of company or industry-specific sales or economic activity
  • Satellite images can reveal agricultural yields, or activity at mines or on oil rigs before this information is available elsewhere
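To make the first use case concrete, the following is a minimal sketch of how scraped prices might be turned into an inflation estimate. It assumes an unweighted geometric mean of price relatives (a Jevons-type index), which is only one of several possible index formulas and is not taken from the source. The function name `price_index` and the item prices are hypothetical.

```python
from statistics import geometric_mean


def price_index(base_prices, current_prices):
    """Return a Jevons-type price index (base period = 100).

    Both arguments map item names to prices. Only items observed in
    both periods contribute, and every item gets equal weight.
    """
    common = sorted(set(base_prices) & set(current_prices))
    if not common:
        raise ValueError("no items observed in both periods")
    relatives = [current_prices[item] / base_prices[item] for item in common]
    return 100 * geometric_mean(relatives)


# Hypothetical prices collected online in two periods (illustrative only).
base = {"milk": 1.00, "bread": 2.00, "eggs": 3.00}
current = {"milk": 1.10, "bread": 2.00, "eggs": 3.30}

index = price_index(base, current)
inflation_pct = index - 100  # percentage change versus the base period
print(f"index: {index:.2f}, implied inflation: {inflation_pct:.2f}%")
```

An official statistic would weight items by their share of consumer spending; equal weighting keeps the sketch short.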

Sources of alternative data

Alternative datasets are generated by many sources but can be classified at a high level as predominantly produced by:

  • Individuals who post on social media, review products, or use search engines
  • Businesses that record commercial transactions, in particular, credit card payments, or capture supply-chain activity as intermediaries
  • Sensors that, among many other things, capture economic activity through images such as satellites or security cameras, or through movement patterns such as cell phone towers

The nature of alternative data continues to evolve rapidly as new data sources become available and sources previously labeled “alternative” become part of the mainstream. The Baltic Dry Index (BDI), for instance, assembles data from several hundred shipping companies to approximate the supply/demand of dry bulk carriers and is now available on the Bloomberg Terminal.

Alternative data sources differ in crucial respects that determine their value or signal content for algorithmic trading strategies.

Criteria for evaluating alternative datasets

The ultimate objective of alternative data is to provide an informational advantage in the competitive search for trading signals that produce alpha, namely positive, uncorrelated investment returns. In practice, the signals extracted from alternative datasets can be used on a standalone basis or combined with other signals as part of a quantitative strategy.
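As an illustration of combining signals, the sketch below standardizes each signal across assets and takes a weighted average. This is one simple convention chosen for illustration, not a method described in the source. The signal names, values, and the helper functions `zscore` and `combine_signals` are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combine_signals(signals, weights=None):
    """Combine per-asset signals into one score per asset.

    `signals` maps a signal name to a list with one value per asset.
    Each signal is z-scored so that different units become comparable,
    then the signals are averaged using `weights` (equal by default).
    """
    names = list(signals)
    if weights is None:
        weights = {name: 1 / len(names) for name in names}
    standardized = {name: zscore(signals[name]) for name in names}
    n_assets = len(standardized[names[0]])
    return [
        sum(weights[name] * standardized[name][i] for name in names)
        for i in range(n_assets)
    ]


# Hypothetical signal values for three assets (illustrative only).
signals = {
    "foot_traffic": [10, 20, 30],
    "sentiment": [0.2, 0.1, 0.6],
}
combined = combine_signals(signals)
best_asset = max(range(len(combined)), key=combined.__getitem__)
print(combined, best_asset)
```

Passing a single entry in `signals` reproduces the standalone case, since the combined score is then just that signal's z-score.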

The Market for Alternative Data

The investment industry is going to spend an estimated $2bn-3bn on data services in 2018, and this number is expected to grow at a double-digit annual rate, in line with other industries. This expenditure includes the acquisition of alternative data, investments in related technology, and the hiring of qualified talent.

Working with Alternative Data

This section illustrates the acquisition of alternative data using web scraping, targeting first OpenTable restaurant data and then moving to earnings call transcripts hosted by Seeking Alpha.

Code Example: OpenTable Web Scraping

Note: unlike all other examples, the code that uses Selenium is written to run on a host rather than using the Docker image because it relies on a browser. The code has been tested on Ubuntu and Mac only.

The subfolder 01_opentable contains the script opentable_selenium, which scrapes OpenTable data using Scrapy and Selenium.
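The script itself is not reproduced here. As a rough sketch of the general pattern, the code below renders a page in a browser with Selenium and then extracts fields from the resulting HTML. It is not the opentable_selenium script: it swaps Scrapy for the standard-library `html.parser` to stay self-contained, and the CSS class names (`rest-name`, `booking`), the sample markup, and the helper names are assumptions made for illustration rather than OpenTable's actual page structure.

```python
from html.parser import HTMLParser


class RestaurantParser(HTMLParser):
    """Collect text from elements carrying the (hypothetical) classes in FIELDS."""

    FIELDS = {"rest-name": "name", "booking": "bookings"}

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # field that the next text node should fill

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for css_class, field in self.FIELDS.items():
            if css_class in classes:
                self._field = field
                if field == "name":  # a name element starts a new record
                    self.rows.append({})

    def handle_data(self, data):
        if self._field and self.rows and data.strip():
            self.rows[-1][self._field] = data.strip()
            self._field = None


def parse_restaurants(html):
    """Return one dict per restaurant found in the HTML string."""
    parser = RestaurantParser()
    parser.feed(html)
    return parser.rows


def fetch_rendered_html(url):
    """Load a JavaScript-driven page in a real browser and return its HTML."""
    from selenium import webdriver  # third-party; needs a local browser and driver

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


# Hypothetical markup standing in for a rendered results page.
sample = """
<div class="rest-row"><span class="rest-name">Cafe A</span>
  <div class="booking">Booked 12 times today</div></div>
<div class="rest-row"><span class="rest-name">Bistro B</span>
  <div class="booking">Booked 3 times today</div></div>
"""
rows = parse_restaurants(sample)
print(rows)
```

In practice, `fetch_rendered_html` would supply the input to `parse_restaurants`. It is left uncalled here because it requires a browser on the host, which is the reason the note above excludes the Docker image.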

Code Example: SeekingAlpha Earnings Transcripts

Update: unfortunately, Seeking Alpha has updated its website to use a CAPTCHA, so automatic downloads are no longer possible in the way described here.

Note: unlike all other examples, the code is written to run on a host rather than using the Docker image because it relies on a browser. The code has been tested on Ubuntu and Mac only.

The subfolder 02_earnings_calls contains the script sa_selenium to scrape earnings call transcripts from the SeekingAlpha website.
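Once a transcript has been obtained, it still has to be preprocessed and stored. The sketch below illustrates that step under a strong assumption: that the transcript is plain text in which each speaker turn starts with `Name: `. This format, the regular expression, the sample text, and the helper names are hypothetical and are not taken from the sa_selenium script or from Seeking Alpha's pages.

```python
import csv
import io
import re

# Assumed turn format: a capitalized speaker name followed by a colon.
SPEAKER_LINE = re.compile(r"^(?P<speaker>[A-Z][\w .'-]+):\s*(?P<text>.*)$")


def split_by_speaker(transcript):
    """Split raw transcript text into a list of {'speaker', 'text'} dicts."""
    turns = []
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append({"speaker": match["speaker"], "text": match["text"]})
        elif turns:
            # A line without a speaker prefix continues the previous turn.
            turns[-1]["text"] += " " + line
    return turns


def to_csv(turns):
    """Serialize speaker turns to CSV text for storage."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["speaker", "text"])
    writer.writeheader()
    writer.writerows(turns)
    return buffer.getvalue()


# Hypothetical transcript excerpt (illustrative only).
sample = """
Operator: Good day, and welcome to the call.
Jane Doe: Thank you. Revenue grew this quarter.
We also opened new stores.
Analyst: Can you comment on margins?
"""
turns = split_by_speaker(sample)
csv_text = to_csv(turns)
print(csv_text)
```

A continuation line that itself begins with a capitalized phrase and a colon would be misread as a new speaker, so real transcripts would need a stricter rule, such as matching against a known participant list.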
