Authorization

The question “what problems will I have when migrating to the cloud” gets asked often enough. If by cloud you mean Amazon EC2, then from a technical perspective there isn’t much that changes. The biggest thing that changes is just how you pay your bill.
Having said that, there’s still a few potential gotchas:
If you can live with these three things, then hopefully your migration should work smoothly. If you can’t free yourself from these limitations, then perhaps you should either look at a cloud hosting provider that can host non-virtualized servers for you in the same data center, or hold tight for the moment.
[Morgan via www.mysqlperformanceblog.com]
One of the most typical reasons for performance and scalability problems I encounter is simply failing to do the math. And these are typically bad one because it often leads to implementing architectures which are not up for job they are intended to solve.
Let me start with example to make it clear. Lets say you’re doing some reports from your apache log files – how many distinct visitors hit the page and stuff like that. You picked full logs because they are great in flexibility – you can run any adhoc queries and drill down as much as you like. Initially traffic was small and young and with 10000 page views a day you few days of history the queries there instant which gave you a confidence this approach will work.
As the time passes and you get 1.000.000 events per day and looking to do reporting for up to the whole year worth of data you find things not working any more with response times for individual queries taking half an hour or more when it previously took seconds.
So what math would be in the case like this ? Say you have query like “SELECT page,count(*) cnt FROM logs GROUP BY page ORDER BY cnt DESC LIMIT 100″ to find the most visited pages as a simple example.
Before you can do the math (or say apply mathematical model) you really need to understand how things work. I like to call it having X-Ray vision. If you do not understand what is happening and you see MySQL Server (or any system really) as a black box magically providing you with results you can’t model its behavior which means you have little or no ability to predict it without running benchmarks.
For one our project I needed proxy solution, and mysql-proxy is one of well-known, so it was logical first choice.
The obvious question which come in mind is what performance penalty we have using mysql-proxy, version 0.7.1.
So it is easy to test. (By the way sysbench recently was pushed to Launchpad, see lp:sysbench, and Percona is going to be active developer of this project and scripting benchmarks).
I took lp:sysbench with LUA script oltp_complex_ro, and tested for couple connections, here are results (in transactions per second, more is better)
| Threads | MySQL-5.0.77 | MySQL-proxy+MySQL-5.0.77 |
| 1 | 660.02 | 349.86 |
| 2 | 1158.66 | 477.77 |
| 4 | 1223.84 | 485.21 |
| 8 | 1224.22 | 455.69 |
| 16 | 1109.72 | 441.55 |
| 32 | 1059.23 | 419.05 |
| 64 | 909.98 | 414.30 |
| 128 | 882.46 | 406.28 |
ok, and let’s look on response time (for 1 thread).
MySQL
per-request statistics:
min: 1.31ms
avg: 1.51ms
max: 5.30ms
approx. 95 percentile: 1.56ms
Proxy+MySQL
per-request statistics:
min: 2.04ms
avg: 2.86ms
max: 6.44ms
approx. 95 percentile: 4.30ms
Well, I expected some penalty of using proxy… but 2-3x times, that’s overkill. Worth to consider if you want to run Query Analyzer with MySQL-proxy on your MySQL Enterprise setup.
There is alternative – Dormando-proxy, which I want to try also, but the problem is it crashed under sysbench load, so I was not able to get any results yet.
[Vadim - via www.mysqlperformanceblog.com]
We finally managed to get Intel X25-E SSD drive into our lab. I attached it to our Dell PowerEdge R900. The story making it running is worth separate mentioning - along with Intel X25-E I got HighPoint 2300 controller and CentOS 5.2 just could not start with two RAID controllers (Perc/6i and HighPoint 2300). The problem was solved by installing Ubuntu 8.10 which is currently running all this system. Originally I wanted to publish some nice benchmarks where InnoDB on SSD outperforms RAID 10, but recently I faced issue which can make previous results inconsistent.
In short words using Intel SSD X25-E card with enabled write-cache (which is default and most performance mode) does not warranty storing all InnoDB transactions on permanent storage.
I am having some déjà vu here, as Peter was rolling this 5 years ago http://lkml.org/lkml/2004/3/17/188 regarding regular IDE disks, and I did not expect this question poping up again.
Long story is:
I started with puting XFS on SSD and running very primitive test with INSERT INTO fs VALUES(0) into auto-increment field into InnoDB table. InnoDB parameters are
- innodb_buffer_pool_size=3G
- innodb_data_file_path=ibdata1:10M:autoextend
- innodb_file_per_table=1
- innodb_log_buffer_size=8M
- innodb_log_files_in_group=2
- innodb_log_file_size=256M
- innodb_thread_concurrency=0
- innodb_flush_log_at_trx_commit=1
- innodb_flush_method = O_DIRECT
Actually most interesting one are innodb_flush_log_at_trx_commit=1 and innodb_flush_method = O_DIRECT (I tried also default innodb_flush_method, with the same result), using innodb_flush_log_at_trx_commit=1 I expect to have all committed transactions even in case of system failure.
Browsers cannot start rendering till all Stylesheets loaded. Visitors see blank page (or previous page) during loading of CSS so it’s especially important to load these components faster.
If you have many Stylesheet components, then visitors will have to wait more because of HTTP requests overhead.
Let’s look what can be done to reduce time to render.
1. First, we want to move CSS components to the HEAD section of HTML document so the Stylesheet will be loaded with first priority.
Worse case is when CSS included on the bottom – in this case browser waits till whole document including all components loaded before start rendering. The same thing may happen when CSS included with @import directive, so avoid using @import.
2. If we have several Stylesheets we join them into one bigger CSS file. As described in the previous post about Image Sprites, we want to reduce amount of downloaded components because of HTTP request overhead. And if in case of images there is just a delay before loading next image, in case of Stylesheet it’s a worse case: visitors see blank page until all Stylesheets loaded. One single file loads faster because there will be minimal overhead and usually it compresses better too.
3. We can go even further: if user comes with empty cache we can avoid additional request by including Stylesheet required for the page into HTML. So there will be no requests for CSS at all during loading of the page.
We can preload Stylesheet after the page loaded so for primed cache page views user already has the Stylesheet cached. This way we reduce download size and avoid HTTP requests.
More detailed:
Set a cookie when visitor comes for first time. Check the cookie on every request to determine whether the visitor comes first time or not. This way we can guess cache state of the visitor. Don’t inline Stylesheet component if visitor comes with primed cache, use an external definition instead.
Postload the external Stylesheet component when visitor comes first time (without the cookie). See example below.
Cookie named “jsl” will be set when user sees this page for first time and next requests will use external Stylesheet component.
You can set such cookie in PHP:
$is_first = !isset($_COOKIE['jsl']);
if ($is_first)
setcookie('jsl',1,time()+3600*24,'/','.site.com',false,true);
The block above will output inline Stylesheet in case user comes for first time otherwise it will output link to the static Stylesheet components. You should place this code inside document head.
It will work correctly even in case user comes with empty cache (for example if our cache got forced out because cache size limit).
Style.css – global CSS file. Probably in real application instead of readfile(’style.css’) you will include only part of the Stylesheet that’s needed to render current page.
To find out CSS that’s really required for current page, you can use a Firefox extension: Dust-Me Selectors (note that this tool may not count CSS rules for elements you assign dynamically in Javascript).
Replace existing opening tag
with
>What if we cannot predict lifetime of page content? If we have a page with info that changes unpredictably we still can use browser cache to avoid unneeded traffic.
Using validation mechanism browser sends HTTP request with info about cache entry and server can respond that the content wasn’t changed.
There are two validation methods: one is based on Last-Modified and the other is based on Etag.
Last-Modified
Server sends Last-Modified header with datetime value that means the time when content was changed last time.
Cache-Control: must-revalidate
Last-Modified: 15 Sep 2008 17:43:00 GMT
The first header Cache-Control: must-revalidate means that browser must send validation request every time even if there is already cache entry exists for this object.
Browser receives the content and stores it in the cache along with the last modified value.
Next time browser will send additional header:
If-Modified-Since: 15 Sep 2008 17:43:00 GMT
This header means that browser has cache entry that was last changed 17:43.
Then server will compare the time with last modified time of actual content and if it was changed server will send the whole updated object along with new Last-Modified value.
If there were no changes since the previous request then there will be short empty-body answer:
HTTP/1.x 304 Not Modified
And browser will use the cached content.
In benchmarks passion (see my two previous posts) I managed to setup all three devices (RAID was on board; Intel X25-E SSD connected to HighPoint controller, FusionIO card) on our working horse Dell PowerEdge R900 (btw, to do that I had to switch from CentOS 5.2 to Ubuntu 8.10, as CentOS was not able to start with attached SSD card to HighPoint controller) and along with other tests I ran tpcc-like IO-bound workload on all devices.
For tests I used MySQL 5.4/InnoDB, and all other parameters are the same from previous posts (100W, buffer_pool 3GB). Filesystem – XFS mounted with nobarrier option.
Graphical results are here
and average results:
RAID10 – 7439.850 TPM
SSD – 10681.050 TPM
FusionIO – 17372.250 TPM
However what should be noted – both SSD and FusionIO are run in “non-durable” mode, that is you may lose some transactions in case of power outage (see my post http://itezer.com/blog/mysql-performance/161-SSD-XFS-LVM-fsync-write-cache-barrier-and-lost-transactions.html).
While results for SSD (note it is single device, in comparison to RAID 10 on 8 disks) and FusionIO are impressive, it is worth to consider price/performance parameter.
Here is my very rough calculation:
For RAID 10 we use 8 73GB SAS 2.5″ 15K RPM disks, with price 190$ per disks it gives us 1520$ for 292GB useful space, or ~ 5.2$ per GB.
For SSD I can get 32GB card for 390$, which is ~12.1$ per GB
For FusionIO I really not sure what is price (it was given as only for tests), but quick googling gave me 30$ per GB, so for 160GB card gives 4800$.
Now simple dividing TPM on price of IO system, we have
RAID 10 – 4.8 TPM / $
SSD – 27 TPM / $
FusionIO – 3.6 TPM / $
Please note that price of transaction is not the main criteria to consider, as total TCO for systems with SSD may be much cheaper (considering you need less servers, less space, less power). Also worth to consider that SSD is only 32GB space and to have the same space as FusionIO we need 4 cards (but it still will be cheaper than FusionIO), but it also may improve performance as such setup will be able to handle IO requests in parallel.
Recently I found another interesting service: DURIS (Data URI Sprites). This is an automated solution for building Data:URI CSS Sprites for background images.
Basically you provide a page URL, then the service loads all styles of the page (both internal and external), finds all background images and encodes them into base64 (for data:URI format). As output, it provides a tar.gz file with HTML file and CSS files with encoded images in textual format.
The HTML contains the original pages with built-in code that checks browser version and plugs the styles with encoded images. There are several version of the styles for different browsers:
The advantages are:
This is a new tool that’s in beta stage so I would not use it in production.
The tool written in Java and the authors are going to opensource it as soon as release-candidate is ready.
(via Habr article in Russian)
I remind that besides Data:URI Sprites we have classic Image Sprites that involve joining images into another big image (opposite to base64 supposed by Data:URI). There are number of tools exist for Image Sprites and most advanced that I know is SmartSprites.
The advantage of Image Sprites over Data:URI Sprites is that it’s supported by almost all browsers including very old versions. The disadvantage is that Image Sprites composition cannot be fully automated and every time should be controlled by web developer.
Even fast connections suffer from multiple HTTP request overhead because of network latency and HTTP request processing overhead.
MSIE7 opens no more than 2 connections per domain, Firefox3 – 4-6 connections, Google Chrome – 6 connections.
Let’s suppose that time to first byte is 70ms (typical response time of http://google.com via 1.5 Mbps ADSL connection from Dulles)
If a page view involves loading of 20 images from the same domain, then total overhead in MSIE7 is 20/2*70ms = 700ms. The more images site contains, the more time visitor will have to wait.
The popular solution is to use CSS Image Sprites:
Join design/logo images into one bigger image. Use spriting technique (css background) to show the images on page. An image can be defined via css property background-position.
This approach is compatible with modern browsers (and even with IE4).
Let’s review the process step by step:
Step1.
Replace image entries with
It is a known fact that ext3 is not the most efficient file system out there and for example file removals can be
painfully slow and cause a lot of random I/O. However, as it turns out, it can sometimes have a much more severe impact on the MySQL performance that it would seem. When or why?
When you run DROP TABLE there are several things that need to happen – write lock on a table so it cannot be used by any other thread, the data file(s) removal by the storage engine and of course in the end MySQL has to destroy the definition file (.frm). That's not all that happens, there is one other thing:
CODE:
- VOID(pthread_mutex_lock(&LOCK_open));
- error= mysql_rm_table_part2(thd, tables, if_exists, drop_temporary, 0, 0);
- pthread_mutex_unlock(&LOCK_open);