MySQL: Extracting timestamp and MAC address from UUIDs

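This article shows how to extract sensitive information such as the timestamp and MAC address from values generated by MySQL's UUID functions, and discusses the leakiness of UUIDs used as surrogate keys.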


In MySQL, you can extract the current time in milliseconds from a value generated by the UUID() function.

select conv( 
            concat(
                   substring(uid,16,3), 
                   substring(uid,10,4), 
                   substring(uid,1,8))
                   ,16,10) 
            div 10000 
            - (141427 * 24 * 60 * 60 * 1000) as current_mills
from (select uuid() uid) as alias;

Result:

+---------------+
| current_mills |
+---------------+
| 1410954031133 |
+---------------+

It also works in older mysql versions!

Thank you to this page: http://rpbouman.blogspot.com.es/2014/06/mysql-extracting-timstamp-and-mac.html



To whom it may concern.

Surrogate keys: auto-increment or UUID?

I recently overheard a statement about whether to use auto-incrementing ids (i.e., a sequence managed by the RDBMS) or universally unique identifiers (UUIDs) as the method for generating surrogate key values.

Leakiness

Much has been written about this subject with regard to storage space, query performance and so on, but in this particular case the main consideration was leakiness. Leakiness in this case means that key values convey information about the state of the system that we didn't intend to disclose.

Auto-incrementing ids are leaky

For example, suppose you sign up for a new social media site, and you get assigned a personal profile page which looks like this:
http://social.media.site/user/67638
Suppose that 67638 is the auto-incrementing key value that was uniquely assigned to the profile. If that were the case, then we could wait a day and create a new profile. We could then compare the two key values and use the difference to estimate how many new profiles were created during that day. This might not necessarily be very sensitive information, but the point here is that by exposing the key values, the system exposes information that it didn't intend to disclose (or at least not in that way).

Are UUIDs leaky?

So clearly, auto-incrementing keys are leaky. The question is, are UUIDs less leaky? Because if that's the case, then that might weigh in on your decision to choose a UUID surrogate key. As it turns out, this question can be answered with the universal but always unsatisfactory answer that "it depends". Not all UUIDs are created equal, and Wikipedia lists 5 different versions. This is not an exhaustive list, since vendors can (and so, probably will) invent their own variations.

MySQL UUIDs

In this article I want to focus on MySQL's implementation. MySQL has two different functions that generate UUIDs: UUID() and UUID_SHORT().

Are MySQL UUIDs leaky?

If you follow the links and read the documentation, then we can easily give a definitive answer, which is: yes, MySQL UUIDs are leaky. UUID() values embed a timestamp and, on some platforms, the MAC address of the server; UUID_SHORT() values are built from the server id and the server startup time. It is not my role to judge whether this leakiness is better or worse than the leakiness of auto-incrementing keys; I'm just providing the information so you can decide whether it affects you or not.

Hacking MySQL UUID values

Now, on to the fun bit. Let's hack MySQL UUIDs and extract meaningful information. Because we can.

Credit where credit's due: Although the documentation and MySQL source code contain all the information, I benefited a lot from the inconspicuous-looking but otherwise excellent website from the Kruithof family. It provides a neat recipe for extracting the timestamp and MAC address from type 1 UUIDs. Thanks!

Here's a graphical representation of the recipe: 

Without further ado, here come the hacks:

Extracting the timestamp from a MySQL UUID

Here's how:
select  uid                           AS uid
,       from_unixtime(
          (conv(                      
            concat(                   -- step 1: reconstruct hexadecimal timestamp
              substring(uid, 16, 3)
            , substring(uid, 10, 4)
            , substring(uid, 1, 8)
            ), 16, 10)                -- step 2: convert hexadecimal to decimal
            div 10 div 1000 div 1000  -- step 3: go from 100-nanosecond units to seconds
          ) - (141427 * 24 * 60 * 60) -- step 4: subtract timestamp offset (October 15, 1582)
        )                             AS uuid_to_timestamp
,       current_timestamp()           AS timestamp
from    (select uuid() uid)           AS alias;
Here's an example result:
+--------------------------------------+---------------------+---------------------+
| uid                                  | uuid_to_timestamp   | timestamp           |
+--------------------------------------+---------------------+---------------------+
| a89e6d7b-f2ec-11e3-bcfb-5c514fe65f28 | 2014-06-13 13:20:00 | 2014-06-13 13:20:00 |
+--------------------------------------+---------------------+---------------------+
1 row in set (0.01 sec)
The query works by first obtaining the value from UUID(). I use a subquery in the from clause for that, which aliases the UUID() function call to uid. This allows other expressions to manipulate the same uid value. You cannot simply call UUID() multiple times, since it generates a new unique value each time you call it. The raw value of uid is shown as well, which is a89e6d7b-f2ec-11e3-bcfb-5c514fe65f28. Most people will recognize this as 5 hexadecimal fields, separated by dashes. The first step is to extract and re-order parts of the uid to reconstruct a valid timestamp:
  • Characters 16-18 form the most significant part of the timestamp. In our example that's 1e3; the last 3 characters of the third field in the uid (the leading 1 of that field is the UUID version number).
  • Characters 10-13 form the middle part of the timestamp. In our example that's f2ec; this corresponds to the second field of the uid.
  • Characters 1-8 form the least significant part of the timestamp. In our example that's a89e6d7b; this is the first field of the uid.

Extracting the parts is easy enough with SUBSTRING(), and we can use CONCAT() to glue the parts into the right order; that is, putting the most to least significant parts of the timestamp in a left-to-right order. The hexadecimal timestamp is now 1e3f2eca89e6d7b.

The second step is to convert the hexadecimal timestamp to a decimal value. We can do that using CONV(hextimestamp, 16, 10), where 16 represents the number base of the hexadecimal input timestamp, and 10 represents the number base of output value.

We now have a timestamp, but it is in a 100-nanosecond resolution. So the third step is to divide so that we get back to seconds resolution. We can safely use a DIV integer division. First we divide by 10 to go from 100-nanosecond resolution to microseconds; then by 1000 to go to milliseconds, and then again by 1000 to go from milliseconds to seconds.

We now have a timestamp expressed as the number of seconds since the date of Gregorian reform to the Christian calendar, which is set at October 15, 1582. We can easily convert this to unix time by subtracting the number of seconds between that date and January 1, 1970 (i.e. the start date for unix time). I suppose there are nicer ways to express that, but 141427 * 24 * 60 * 60 is the value we need to do the conversion.

We now have a unix timestamp, and MySQL offers the FROM_UNIXTIME() function to go from unix time to a MySQL timestamp value.
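As a cross-check outside MySQL, the four steps can be sketched in Python and applied to the example uid shown earlier. This is only an illustration under my own naming: uuid1_to_unix_seconds and UUID_EPOCH_OFFSET_DAYS are hypothetical names, not part of MySQL or of the recipe, and the result is compared against the time attribute of Python's standard uuid module.

```python
import uuid
from datetime import datetime, timezone

# Days between 1582-10-15 (UUID epoch) and 1970-01-01 (unix epoch), as in the query.
UUID_EPOCH_OFFSET_DAYS = 141427


def uuid1_to_unix_seconds(uid: str) -> int:
    """Mirror the four steps of the SQL query for a type 1 UUID string."""
    # Step 1: reorder the fields (Python slices are 0-based, SUBSTRING() is 1-based).
    hex_timestamp = uid[15:18] + uid[9:13] + uid[0:8]
    # Step 2: hexadecimal to decimal (a count of 100-nanosecond intervals).
    ticks = int(hex_timestamp, 16)
    # Step 3: from 100-nanosecond intervals to whole seconds.
    seconds = ticks // 10 // 1000 // 1000
    # Step 4: shift from the 1582 epoch to the unix epoch.
    return seconds - UUID_EPOCH_OFFSET_DAYS * 24 * 60 * 60


example = "a89e6d7b-f2ec-11e3-bcfb-5c514fe65f28"
unix_seconds = uuid1_to_unix_seconds(example)

# The standard library exposes the same 60-bit timestamp as UUID.time.
assert unix_seconds == (
    uuid.UUID(example).time // 10_000_000 - UUID_EPOCH_OFFSET_DAYS * 86400
)

print(unix_seconds)                                           # 1402658400
print(datetime.fromtimestamp(unix_seconds, tz=timezone.utc))  # 2014-06-13 11:20:00+00:00
```

The sketch prints the time in UTC, whereas FROM_UNIXTIME() renders it in the session time zone. That would account for the two-hour difference from the example result above, assuming the server was running at UTC+2.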

Extracting the MAC address from a MySQL UUID

The last field of type 1 UUIDs is the so-called node id. On BSD and Linux platforms, MySQL uses the MAC address to create the node id. The following query extracts the MAC address in the familiar colon-separated representation:
select  uid                           AS uid
,       concat(
                substring(uid, 25,2)
        , ':',  substring(uid, 27,2)
        , ':',  substring(uid, 29,2)
        , ':',  substring(uid, 31,2)
        , ':',  substring(uid, 33,2)
        , ':',  substring(uid, 35,2)
        )                             AS uuid_to_mac
from    (select uuid() uid)           AS alias;
Here's the result:
+--------------------------------------+-------------------+
| uid                                  | uuid_to_mac       |
+--------------------------------------+-------------------+
| 78e5e7c0-f2f5-11e3-bcfb-5c514fe65f28 | 5c:51:4f:e6:5f:28 |
+--------------------------------------+-------------------+
1 row in set (0.01 sec)
I checked on Ubuntu with ifconfig and found that the extracted value does match the MAC address of the machine.
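The same extraction can be sketched in Python for the uid from the result above. The helper name uuid1_to_mac is hypothetical; the cross-check relies on the node attribute of Python's standard uuid module, which holds the last field as an integer.

```python
import uuid


def uuid1_to_mac(uid: str) -> str:
    """Format the node id (last field) of a UUID string as a colon-separated MAC."""
    # Characters 25-36 in SUBSTRING()'s 1-based numbering.
    node_hex = uid[24:36]
    return ":".join(node_hex[i:i + 2] for i in range(0, 12, 2))


example = "78e5e7c0-f2f5-11e3-bcfb-5c514fe65f28"
mac = uuid1_to_mac(example)

# The standard library parses the same 48 bits as UUID.node.
assert int(mac.replace(":", ""), 16) == uuid.UUID(example).node

print(mac)  # 5c:51:4f:e6:5f:28
```

Whether the value is a real hardware address still depends on the platform, as noted above; the sketch only reformats the last field.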

What about UUID_SHORT()?

The UUID_SHORT() function is implemented thus:
(server_id & 255) << 56
+ (server_startup_time_in_seconds << 24)
+ incremented_variable++;
This indicates we could try and apply right bitshifting to extract server id and start time.

Since the server_id can be larger (much larger) than 255, we cannot reliably extract it. However, you can give it a try: many MySQL replication clusters probably have fewer than 256 nodes, and admins often use a simple incrementing numbering scheme for the server id, so the low byte may well be the whole server id.

The start time is also easy to extract with bitshift. Feel free to post queries for that in the comments.
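A minimal sketch of that bit-shifting, written in Python rather than SQL and based only on the layout quoted above. The function name split_uuid_short and the input values are hypothetical; the value is composed by hand here instead of being taken from a real server.

```python
from datetime import datetime, timezone


def split_uuid_short(value: int) -> dict:
    """Undo the quoted bit layout with right shifts and masks."""
    return {
        # Only (server_id & 255) is stored, so higher bits of the id are lost.
        "server_id_low_byte": (value >> 56) & 0xFF,
        # 32 bits of server startup time, in seconds.
        "startup_unix_seconds": (value >> 24) & 0xFFFFFFFF,
        # Low 24 bits: the incrementing counter.
        "counter": value & 0xFFFFFF,
    }


# Hypothetical inputs, chosen only to exercise the round trip.
server_id, startup, counter = 3, 1402658400, 42
value = ((server_id & 255) << 56) + (startup << 24) + counter

parts = split_uuid_short(value)
print(parts)
print(datetime.fromtimestamp(parts["startup_unix_seconds"], tz=timezone.utc))
```

One caveat follows from the layout itself: the three parts are added, not masked, so once the counter grows past 24 bits it carries into the start-time bits and the recovered start time would be later than the true one.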

Conclusions

I do not pretend to present any novel insights here; this is just a summary of well-known principles. The most important take-away is that you should strive to not expose system implementation details. Surrogate key values are implementation details, so they should never have been exposed in the first place. If you cannot meet that requirement (or you need to compromise because of some other requirement) then you, as system or application designer, should be aware of the leakiness of your keys. In order to achieve that awareness, you must have implementation-level insight into how the keys are generated. Then you should be able to explain, in simple human language, to other engineers, product managers and users, which bits of information are leaking, and what would be the worst possible scenario of abuse of that information. Without that analysis, you cannot responsibly decide to expose the keys and hope for the best.
