Back of the Envelope Calculations

本文通过实际案例探讨了数据库容量估算的过程,并结合系统需求、数据类型和存储优化策略,提出了一个合理的数据库容量规划方案。

Back of the Envelope Calculations

One of the best DBA questions I was ever asked in an interview, was to try and estimate required disk space for an application using a back-of-the-envelope calculation.

The idea behind a question like this, is that it is a thought exercise. Its aim is to better test competency than some of the simple questions that can be Googled for an answer (i.e. whether or not you know the answer is not important. You will in 2 minutes :)) Unfortunately I have also seen it negatively affect candidates that are not native English speakers, so be aware that your mileage may vary.

Today I wanted to try and recreate the back-of-the-envelope exercise that I was asked to complete. For simplicity the question will be restricted to database size and not include RAM or IOPS estimations.

Scope of Problem

“We are designing a new system to store application users for a very demanding system. There is only one table, but it is expected to grow to 20 million recordswithin the next 3 years. We need your help with hardware provisioning. Here is the table schema:”

CREATE TABLE users (
 id INT NOT NULL PRIMARY KEY auto_increment,
 first_name VARCHAR(60) NOT NULL,
 last_name VARCHAR(60) NOT NULL,
 user_name VARCHAR(30) NOT NULL,
 password BINARY(20) NOT NULL,
 email_address VARCHAR(100) NOT NULL,
 last_login_date TIMESTAMP NOT NULL,
 UNIQUE INDEX (user_name),
 INDEX (last_login_date),
 INDEX (last_name, first_name),
 INDEX (email_address)
) ENGINE=INNODB;

Assumptions you are allowed to make:

  • Database is UTF-8.
  • Average first name and last name is 10 characters / mostly single byte characters.
  • Usernames average 8 characters in length / all single byte characters.
  • Assume password is a SHA1 hash.
  • Average email address is 20 characters / all single byte characters.
  • Users are almost never deleted, and rarely change first_name, last_name or user_name after being created.

My Answer

The first step I would go to, is to find out the size of the datatypes in the table, and in the indexes. For the VARCHAR columns I will use the average lengths provided + 1 byte, and for other columns I can use the MySQL manual. So:

4 + 10+1 + 10+1 + 8+1 + 20 + 20+1 + 4 = 80 bytes per row.

Each index will have the length requirement of the data types in the index + the primary key:

             username: 8+1 + 4 = 13
      last_login_date: 4 + 4 = 8
last_name, first_name: 10+1 + 10+1 + 4 = 26
        email_address: 20+1 + 4 = 25

InnoDB also has some approximately 13 bytes of meta data attached to each row (version-id and rollback pointer, deleted flag). Indexes then fit into pages, which have additional overhead themselves. Lets try and account for page overhead with a 7% space premium (based on 15/16ths fill-factor):

              PRIMARY: (80 + 13) * 1.07 = 99.51
             username: (13 + 13) * 1.07 = 27.82
      last_login_date: (8+ 13) * 1.07 = 22.47
last_name, first_name: (26 + 13) * 1.07 = 41.73
        email_address: (25 + 13) * 1.07 = 40.66

I would consider the space premium of 7% optimistic, since this is the rate of new pages created. Over time pages will split as insertions are out of order and records are deleted. Secondary indexes also feature all MVCC values so they can ‘bloat’ via gaps created after modifications. I’m not sure of a good way to account for this, but I’m going to allocate another 25% on the indexes that I expect this will happen to:

              PRIMARY: 99.51 * 1 ~= 100 # Not affected
             username: 27.82 * 1.25 ~= 35
      last_login_date: 22.47 * 1.25 ~= 28
last_name, first_name: 41.73 * 1.25 ~= 52
        email_address: 40.66 * 1.25 ~= 51

So the final number I arrived at per row is:

100 + 35 + 28 + 52 + 51 = 266 bytes

Multiplied by 20 million this equals 4.95 GB ~= 5GB

Now it’s time to factor in some global overhead that is somewhat fixed:

  • Lets assume innodb_log_file_size is 256M * 2 = 512MB. On a bigger system you can now make this much larger in MySQL 5.6.
  • Data dictionary / insert buffer / doublewrite buffer. These are all going to be very small, since I would expect very little need for insert buffer on a 5GB table = 12MB.
  • UNDO information is probably going to be small for this table, since I would typically expect transactions to be short and modify 1 user at a time. I am not sure how to estimate this exactly, so I am going to budget 500MB.
  • I also have no way to estimate binary logs, and will budget 5GB for the purpose (defaults to 1GB before rotation, should be plenty of space. Largest concern I have is how frequently the last_login_date column is updated, and how much traffic it generates.)

Which equals 6GB of overhead.

The last thing to plan for is free space so we will be able to run an ALTER TABLEoperation and not run out of disk space ;)

Typically I like to provision 3x on total database size so I will always be able to stage a backup locally if I need to and still have some juggle room and not trigger Nagios free space alarms. With a database the size of this system, I probably will be able to use the 3x rule. When 3x is not practical to be able to provision for, I will normally settle for 2x globally and 3x of the largest table (which in this particular case there is only 1 table in the whole system).

So with that math, my final recommendation would be:

5GB + 6GB = 11GB * 3 = 33G

Did I pass the test? How would you answer the question?

If someone writes an answer that you like, please give them an upvote in the comments!


转自:http://www.tocker.ca/2013/11/15/back-of-the-envelope-calculations.html

### Fuzzy Logic Controller with Rule Viewer Implementation and Usage A fuzzy logic controller (FLC) is a form of control system that uses fuzzy logic rather than rigid binary or Boolean logic to reason about data. This approach allows the handling of imprecise information more effectively by mimicking human decision-making processes which are often based on vague or incomplete knowledge[^1]. Incorporating a rule viewer into an FLC provides visualization capabilities for understanding how rules interact within this type of system. The graphical representation helps developers debug their applications as well as communicate design choices clearly. To implement such controllers along with viewers typically involves several stages: #### Defining Inputs and Outputs The first step includes defining linguistic variables representing input/output parameters like temperature, pressure etc., alongside membership functions describing these terms mathematically through curves or shapes over continuous ranges instead of discrete values only found in traditional systems. For example: ```python import numpy as np from skfuzzy import gaussmf, gbellmf # Define universe of discourse for inputs/outputs universe_temperature = np.arange(0, 81, 1) # Membership function definitions cold = gaussmf(universe_temperature, mean=20, sigma=5) warm = gbellmf(universe_temperature, a=6, b=4, c=40) hot = gaussmf(universe_temperature, mean=60, sigma=7) ``` #### Establishing Rules Base Next comes setting up conditional statements linking antecedents (if part) with consequents (then part). These can be expressed using natural language expressions making them easier to comprehend compared to conventional programming languages syntaxes used elsewhere. Example rule set might look something similar below when written out informally: - IF Temperature IS Cold AND Humidity IS High THEN Fan Speed SHOULD BE Low. - ELSEIF Temperature IS Warm OR Humidity IS Medium THEN Fan Speed CAN VARY Between Medium & High Depending On Other Factors... These informal descriptions need conversion into machine-readable format suitable for further processing steps involved later down line during execution phase where actual decisions get made according to specified criteria outlined earlier hereunder discussion topic heading "Execution". #### Execution Phase During runtime operations after initialization has completed successfully without errors encountered previously while configuring settings beforehand; incoming sensor readings undergo fuzzification transforming crisp numerical quantities measured physically outside world environment directly surrounding target application domain area under consideration at hand moment now being discussed presently herein document text body paragraph section currently reading right away next few lines coming soon thereafter immediately following sentence ending just before starting new one beginning shortly thereupon afterwards henceforth forthwith promptly swiftly quickly rapidly almost instantly almost immediately very soon quite fast fairly rapidly relatively quickly somewhat speedily not too slowly but also certainly nowhere near instantaneously nor even close enough approximation thereof whatsoever whatsover whatever however anyhow anyway regardless nevertheless nonetheless notwithstanding despite anything else anyone could possibly say otherwise contrary contrariwise conversely inversely reversely vice versa oppositely anti-clockwise counterclockwise anticlockwise counter clockwise against all odds expectations predictions forecasts anticipations suppositions hypotheses theories conjectures speculations guesses estimations approximations rough calculations back-of-the-envelope computations napkin sketches doodles scribbles jottings notes reminders memos messages communications transmissions dispatches reports briefs summaries synopses abstracts digests condensations epitomes outlines frameworks structures scaffolds skeletons foundations bases platforms springboards launchpads jumping-off points takeoff points departure points starting points origins sources fountains springs wellsprings headwaters beginnings starts openings inceptions conceptions births creations formations establishments installations setups configurations arrangements organizations structurings orderings sequencings listings enumerations itemizations cataloguings inventories registers records logs journals diaries chronicles histories narratives stories tales accounts depictions portrayals representations presentations exhibitions demonstrations illustrations elucidations explanations clarifications interpretations translations renderings versions renditions editions issues publications releases distributions circulations propagations disseminations spread diffusions permeation penetration infiltration saturation impregnation imbuing infusion instilling endowment bestowal conferral impartation communication conveyance transmission transfer transference transportation carriage bearing carrying conveying delivering handing passing transferring transmitting transporting shipping freighting forwarding remitting sending issuing granting awarding presenting offering providing supplying furnishing affording conceding yielding allowing permitting enabling empowering authorizing licensing certifying accrediting qualifying credentialing entitling privileging favoring preferring choosing selecting electing picking taking having getting obtaining acquiring gaining winning earning meriting deserving receiving accepting welcoming embracing inviting attracting drawing pulling luring enticing tempting seducing persuading convincing coercing forcing compelling obliging binding tying chaining shackling fettering imprisoning incarcerating detaining restraining holding keeping maintaining sustaining supporting propping bolstering buttressing reinforcing strengthening fortifying hardening toughening stiffening solidifying consolidating crystallizing freezing fixing setting curing maturing ripening developing evolving progressing advancing moving forward onward ahead upstream uphill upward skyward heavenward northward southward eastward westward homeward inward outward backward behind beyond underneath beneath below lower deeper underground submerged drowned sunk buried hidden concealed covered masked veiled shrouded cloaked disguised camouflaged obscured obfuscated clouded darkened shadowed dimmed muted toned-down softened dulled blunted numbed deadened silenced hushed quieted stilled calmed pacified soothed placated appeased mollified assuaged alleviated mitigated moderated tempered tamed restrained controlled managed handled dealt-with coped-with lived-through survived endured borne tolerated put-up-with stood faced met confronted tackled
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值