<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Madhax.org</title>
	<atom:link href="http://madhax.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://madhax.org</link>
	<description>code, caffeine and culture</description>
	<lastBuildDate>Wed, 07 Apr 2010 19:34:12 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Non-programmer Programmer</title>
		<link>http://madhax.org/2010/04/programmer-politics-the-non-programmer-programmer/</link>
		<comments>http://madhax.org/2010/04/programmer-politics-the-non-programmer-programmer/#comments</comments>
		<pubDate>Wed, 07 Apr 2010 19:33:29 +0000</pubDate>
		<dc:creator>Madhax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://madhax.org/?p=131</guid>
		<description><![CDATA[One of the most frustrating things with working in a team is the arguments that inevitably show up in design meetings. I am blogging about the frustration that typically stems from ignorant managers or sales executives. 
Scenario 1:
Manager: we need to add twitter-like functionality
Programmer: We need to integrate micro-blogging in our product?
Manager: It’ll make our [...]]]></description>
			<content:encoded><![CDATA[<p>One of the most frustrating things with working in a team is the arguments that inevitably show up in design meetings. I am blogging about the frustration that typically stems from ignorant managers or sales executives. </p>
<p><b>Scenario 1:</b><br />
<em>Manager:</em> we need to add twitter-like functionality<br />
<em>Programmer:</em> We need to integrate micro-blogging in our product?<br />
<em>Manager:</em> It’ll make our product better.<br />
<em>Programmer:</em>  You want me to make twitter from scratch?<br />
<em>Manager:</em> Yes, just save texts to a database and output it. Stop being lazy.</p>
<p>This scenario goes beyond simple feature creep: it is a direct result of managerial inexperience or ignorance, and it could be avoided if inexperienced managers were properly trained. </p>
<p>I became very aware of this when I was involved in a startup with a friend who wasn’t a very good programmer, but was a better visual designer/business person than I. At first I thought it was just the growing pains of working with someone new, but some people NEVER learn. My perspective only solidified when I became a part of the corporate culture.</p>
<p>I’ve learned that the best way to deal with a situation like this (if being civil doesn’t work) is to be complicit and get them into the role of architecting their idea. They will either: give up right away, as they don’t want to do the work themselves; give up eventually, as they will realize the idea is unrealistic; or eventually fail, which will kill their credibility and their ego.</p>
<p>Programming should be left up to people who are experienced programmers… not “tech-savvy” teach yourself C++ programming in 21 days MBAs.</p>
]]></content:encoded>
			<wfw:commentRss>http://madhax.org/2010/04/programmer-politics-the-non-programmer-programmer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Delete Duplicate Rows From Database</title>
		<link>http://madhax.org/2010/01/delete-duplicate-rows-from-database/</link>
		<comments>http://madhax.org/2010/01/delete-duplicate-rows-from-database/#comments</comments>
		<pubDate>Wed, 27 Jan 2010 13:28:52 +0000</pubDate>
		<dc:creator>Madhax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://madhax.org/?p=108</guid>
		<description><![CDATA[I investigated various ways of deleting duplicate rows in a MySQL table. My problem required that the method should be quick, fault tolerant, verbose and be able to handle large datasets. Following is a list of common ways to either select distinct rows or delete duplicate rows from MySQL tables, along with how long it [...]]]></description>
			<content:encoded><![CDATA[<p>I investigated various ways of deleting duplicate rows in a MySQL table. My problem required that the method be quick, fault tolerant, verbose and able to handle large datasets. Following is a list of common ways to either select distinct rows or delete duplicate rows from MySQL tables, along with how long each took (or how long it had been running when I checked). I would like to preface this by saying that indexing the data column isn’t a viable solution because it significantly increases insertion time when the table is large. </p>
<p><b>select distinct method</b></p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #cc66cc;">1</span> 000 000 rows<span style="color: #66cc66;">.</span> Duplicates unlikely<span style="color: #66cc66;">.</span> Datalen <span style="color: #cc66cc;">12</span> bytes
mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #993333; font-weight: bold;">DISTINCT</span> <span style="color: #993333; font-weight: bold;">DATA</span> <span style="color: #993333; font-weight: bold;">FROM</span> test;
<span style="color: #cc66cc;">959058</span> rows <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #993333; font-weight: bold;">SET</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span> min <span style="color: #cc66cc;">37.36</span> sec<span style="color: #66cc66;">&#41;</span>
&nbsp;
<span style="color: #cc66cc;">1</span> 000 000 rows<span style="color: #66cc66;">.</span> Duplicates unlikely<span style="color: #66cc66;">.</span> Datalen <span style="color: #cc66cc;">255</span> bytes
mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">SHOW</span> processlist;
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+------+----------------+--------+---------+------+------------------------------+--------------------------------+</span>
<span style="color: #66cc66;">|</span> Id <span style="color: #66cc66;">|</span> User <span style="color: #66cc66;">|</span> Host           <span style="color: #66cc66;">|</span> db     <span style="color: #66cc66;">|</span> Command <span style="color: #66cc66;">|</span> Time <span style="color: #66cc66;">|</span> State                        <span style="color: #66cc66;">|</span> Info                           <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+------+----------------+--------+---------+------+------------------------------+--------------------------------+</span>
<span style="color: #66cc66;">|</span>  <span style="color: #cc66cc;">9</span> <span style="color: #66cc66;">|</span> root <span style="color: #66cc66;">|</span> localhost:<span style="color: #cc66cc;">2728</span> <span style="color: #66cc66;">|</span> testdb <span style="color: #66cc66;">|</span> Query   <span style="color: #66cc66;">|</span>    <span style="color: #cc66cc;">0</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>                         <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">SHOW</span> processlist               <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">17</span> <span style="color: #66cc66;">|</span> root <span style="color: #66cc66;">|</span> localhost:<span style="color: #cc66cc;">3592</span> <span style="color: #66cc66;">|</span> testdb <span style="color: #66cc66;">|</span> Query   <span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">1243</span> <span style="color: #66cc66;">|</span> Copying <span style="color: #993333; font-weight: bold;">TO</span> tmp <span style="color: #993333; font-weight: bold;">TABLE</span> <span style="color: #993333; font-weight: bold;">ON</span> disk <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #993333; font-weight: bold;">DISTINCT</span> <span style="color: #993333; font-weight: bold;">DATA</span> <span style="color: #993333; font-weight: bold;">FROM</span> test <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+------+----------------+--------+---------+------+------------------------------+--------------------------------+</span>
&nbsp;
<span style="color: #cc66cc;">1</span> 000 000 rows<span style="color: #66cc66;">.</span> Duplicates unlikely<span style="color: #66cc66;">.</span> Datalen <span style="color: #cc66cc;">12</span> bytes
mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> tmp<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">DATA</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #993333; font-weight: bold;">DISTINCT</span> <span style="color: #993333; font-weight: bold;">DATA</span> <span style="color: #993333; font-weight: bold;">FROM</span> test;
Query OK<span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">959044</span> rows affected <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">3</span> min <span style="color: #cc66cc;">4.89</span> sec<span style="color: #66cc66;">&#41;</span>
Records: <span style="color: #cc66cc;">959044</span>  Duplicates: <span style="color: #cc66cc;">0</span>  Warnings: <span style="color: #cc66cc;">0</span>
&nbsp;
<span style="color: #cc66cc;">1</span> 000 000 rows<span style="color: #66cc66;">.</span> Duplicates unlikely<span style="color: #66cc66;">.</span> Datalen <span style="color: #cc66cc;">255</span> bytes
mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> tmp<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">DATA</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #993333; font-weight: bold;">DISTINCT</span> <span style="color: #993333; font-weight: bold;">DATA</span> <span style="color: #993333; font-weight: bold;">FROM</span> test;
mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">SHOW</span> processlist;
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+------+----------------+--------+---------+------+------------------------------+------------------------------------------------------+</span>
<span style="color: #66cc66;">|</span> Id <span style="color: #66cc66;">|</span> User <span style="color: #66cc66;">|</span> Host           <span style="color: #66cc66;">|</span> db     <span style="color: #66cc66;">|</span> Command <span style="color: #66cc66;">|</span> Time <span style="color: #66cc66;">|</span> State                        <span style="color: #66cc66;">|</span> Info                                                 <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+------+----------------+--------+---------+------+------------------------------+------------------------------------------------------+</span>
<span style="color: #66cc66;">|</span>  <span style="color: #cc66cc;">8</span> <span style="color: #66cc66;">|</span> root <span style="color: #66cc66;">|</span> localhost:<span style="color: #cc66cc;">2725</span> <span style="color: #66cc66;">|</span> testdb <span style="color: #66cc66;">|</span> Query   <span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">7591</span> <span style="color: #66cc66;">|</span> Copying <span style="color: #993333; font-weight: bold;">TO</span> tmp <span style="color: #993333; font-weight: bold;">TABLE</span> <span style="color: #993333; font-weight: bold;">ON</span> disk <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> tmp<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">DATA</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #993333; font-weight: bold;">DISTINCT</span> <span style="color: #993333; font-weight: bold;">DATA</span> <span style="color: #993333; font-weight: bold;">FROM</span> test <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">|</span>  <span style="color: #cc66cc;">9</span> <span style="color: #66cc66;">|</span> root <span style="color: #66cc66;">|</span> localhost:<span style="color: #cc66cc;">2728</span> <span style="color: #66cc66;">|</span> testdb <span style="color: #66cc66;">|</span> Query   <span style="color: #66cc66;">|</span>    <span style="color: #cc66cc;">0</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>                         <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">SHOW</span> processlist                                     <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+------+----------------+--------+---------+------+------------------------------+------------------------------------------------------+</span></pre></div></div>

<p><b>equal data different key method</b></p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #cc66cc;">10</span> 000 rows<span style="color: #66cc66;">.</span> Duplicates unlikely<span style="color: #66cc66;">.</span> Datalen <span style="color: #cc66cc;">12</span> bytes<span style="color: #66cc66;">.</span> 
<span style="color: #993333; font-weight: bold;">DELETE</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`test`</span> <span style="color: #993333; font-weight: bold;">USING</span> <span style="color: #ff0000;">`test`</span><span style="color: #66cc66;">,</span> <span style="color: #ff0000;">`test`</span> <span style="color: #993333; font-weight: bold;">AS</span> vtable <span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #ff0000;">`test`</span><span style="color: #66cc66;">.</span>id <span style="color: #66cc66;">=</span> <span style="color: #ff0000;">`vtable`</span><span style="color: #66cc66;">.</span>id<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AND</span> <span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">`test`</span><span style="color: #66cc66;">.</span><span style="color: #993333; font-weight: bold;">DATA</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">`vtable`</span><span style="color: #66cc66;">.</span><span style="color: #993333; font-weight: bold;">DATA</span><span style="color: #66cc66;">&#41;</span>;
<span style="color: #66cc66;">&gt;</span><span style="color: #cc66cc;">5</span> mins</pre></div></div>

<p><b>equal data different key improved method</b></p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #cc66cc;">10</span> 000 rows<span style="color: #66cc66;">.</span> Duplicates unlikely<span style="color: #66cc66;">.</span> Datalen <span style="color: #cc66cc;">12</span> bytes<span style="color: #66cc66;">.</span> 
mysql<span style="color: #66cc66;">&gt;</span>delete <span style="color: #993333; font-weight: bold;">FROM</span> test <span style="color: #993333; font-weight: bold;">USING</span> test<span style="color: #66cc66;">,</span> test <span style="color: #993333; font-weight: bold;">AS</span> vtable <span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #66cc66;">&#40;</span>test<span style="color: #66cc66;">.</span>id <span style="color: #66cc66;">&gt;</span>; vtable<span style="color: #66cc66;">.</span>id<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AND</span> <span style="color: #66cc66;">&#40;</span>test<span style="color: #66cc66;">.</span><span style="color: #993333; font-weight: bold;">DATA</span><span style="color: #66cc66;">=</span>vtable<span style="color: #66cc66;">.</span><span style="color: #993333; font-weight: bold;">DATA</span><span style="color: #66cc66;">&#41;</span>;
<span style="color: #66cc66;">&gt;</span><span style="color: #cc66cc;">5</span> mins</pre></div></div>

<p><b>add unique index method</b></p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #cc66cc;">1</span> 000 000 rows<span style="color: #66cc66;">.</span> Duplicates unlikely<span style="color: #66cc66;">.</span> Datalen <span style="color: #cc66cc;">12</span> bytes
mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">ALTER</span> <span style="color: #993333; font-weight: bold;">IGNORE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> <span style="color: #ff0000;">`test`</span> <span style="color: #993333; font-weight: bold;">ADD</span> <span style="color: #993333; font-weight: bold;">UNIQUE</span> <span style="color: #993333; font-weight: bold;">INDEX</span><span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">DATA</span><span style="color: #66cc66;">&#41;</span>;
Query OK<span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1000000</span> rows affected <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span> min <span style="color: #cc66cc;">51.14</span> sec<span style="color: #66cc66;">&#41;</span>
Records: <span style="color: #cc66cc;">1000000</span>  Duplicates: <span style="color: #cc66cc;">4</span>  Warnings: <span style="color: #cc66cc;">0</span>
&nbsp;
<span style="color: #cc66cc;">1</span> 000 000 rows<span style="color: #66cc66;">.</span> Duplicates unlikely<span style="color: #66cc66;">.</span> Datalen <span style="color: #cc66cc;">255</span> bytes
mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">ALTER</span> <span style="color: #993333; font-weight: bold;">IGNORE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> <span style="color: #ff0000;">`test`</span> <span style="color: #993333; font-weight: bold;">ADD</span> <span style="color: #993333; font-weight: bold;">UNIQUE</span> <span style="color: #993333; font-weight: bold;">INDEX</span><span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">DATA</span><span style="color: #66cc66;">&#41;</span>;
Query OK<span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1000000</span> rows affected <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">6</span> hours <span style="color: #cc66cc;">37</span> min <span style="color: #cc66cc;">49.35</span> sec<span style="color: #66cc66;">&#41;</span>
Records: <span style="color: #cc66cc;">1000000</span>  Duplicates: <span style="color: #cc66cc;">0</span>  Warnings: <span style="color: #cc66cc;">0</span></pre></div></div>

<p><b>group by data method</b></p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #cc66cc;">1</span> 000 000 rows<span style="color: #66cc66;">.</span> Duplicates unlikely<span style="color: #66cc66;">.</span> Datalen <span style="color: #cc66cc;">12</span> bytes
mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> temp_table <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> test <span style="color: #993333; font-weight: bold;">GROUP</span> <span style="color: #993333; font-weight: bold;">BY</span> <span style="color: #993333; font-weight: bold;">DATA</span>;
Query OK<span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">959175</span> rows affected <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">10</span> min <span style="color: #cc66cc;">50.14</span> sec<span style="color: #66cc66;">&#41;</span>
Records: <span style="color: #cc66cc;">959175</span>  Duplicates: <span style="color: #cc66cc;">0</span>  Warnings: <span style="color: #cc66cc;">0</span>
&nbsp;
<span style="color: #cc66cc;">1</span> 000 000 rows<span style="color: #66cc66;">.</span> Duplicates unlikely<span style="color: #66cc66;">.</span> Datalen <span style="color: #cc66cc;">255</span> bytes
mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">SHOW</span> processlist;
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+------+----------------+--------+---------+------+------------------------------+-------------------------------------------------------------+</span>
<span style="color: #66cc66;">|</span> Id <span style="color: #66cc66;">|</span> User <span style="color: #66cc66;">|</span> Host           <span style="color: #66cc66;">|</span> db     <span style="color: #66cc66;">|</span> Command <span style="color: #66cc66;">|</span> Time <span style="color: #66cc66;">|</span> State                        <span style="color: #66cc66;">|</span> Info                                                        <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+------+----------------+--------+---------+------+------------------------------+-------------------------------------------------------------+</span>
<span style="color: #66cc66;">|</span>  <span style="color: #cc66cc;">9</span> <span style="color: #66cc66;">|</span> root <span style="color: #66cc66;">|</span> localhost:<span style="color: #cc66cc;">2728</span> <span style="color: #66cc66;">|</span> testdb <span style="color: #66cc66;">|</span> Query   <span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">5013</span> <span style="color: #66cc66;">|</span> Copying <span style="color: #993333; font-weight: bold;">TO</span> tmp <span style="color: #993333; font-weight: bold;">TABLE</span> <span style="color: #993333; font-weight: bold;">ON</span> disk <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> temp_table <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> test <span style="color: #993333; font-weight: bold;">GROUP</span> <span style="color: #993333; font-weight: bold;">BY</span> <span style="color: #993333; font-weight: bold;">DATA</span> <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">19</span> <span style="color: #66cc66;">|</span> root <span style="color: #66cc66;">|</span> localhost:<span style="color: #cc66cc;">2185</span> <span style="color: #66cc66;">|</span> testdb <span style="color: #66cc66;">|</span> Query   <span style="color: #66cc66;">|</span>    <span style="color: #cc66cc;">0</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>                         <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">SHOW</span> processlist                                            <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+------+----------------+--------+---------+------+------------------------------+-------------------------------------------------------------+</span>
<span style="color: #cc66cc;">2</span> rows <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #993333; font-weight: bold;">SET</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0.09</span> sec<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>None of the above solutions is suitable, for several reasons. Adding a unique index makes every subsequent INSERT query slower. Query time seems to grow exponentially as the data gets larger. There is no way to see how long a query will take to complete. And should the machine fail while a week-long (or longer) query is executing, the table is likely to be left corrupted, with none of the work completed. </p>
<p>	A better solution is to write a Python script (or use any other language, for that matter) that uses the MySQL API to delete duplicates. The design is as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">-N in-memory hash tables, each storing the file offsets of the distinct column values in its file
&nbsp;
-N stream-formatted files (each value followed by \0) that store the distinct column values
&nbsp;
-N queues that hold the hashed rows waiting to be checked against their respective hash table
&nbsp;
-A uniformly distributed hashing function applied to the column being de-duplicated</pre></div></div>

<p>	Multiple files help reduce (or eliminate) the overall seek time on the hard drive. The result of the hashing function on a column value selects both a hash table and an index into that hash table, i.e.</p>
<pre>hash_table = fn(data)%num_hash_tables, index = fn(data)%size_of_hash_table</pre>
<p> Each hash table slot stores the file offset of a distinct column value. </p>
<p>	As each row is hashed it is placed into a queue for later processing. Once a fixed number of rows (about 10 000) has been distributed among the queues, each queue is processed in sequence to reduce hard drive seek time.</p>
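<p>The routing formula above can be sketched in a few lines of Python. This is only an illustration under assumptions: the constants and the names <code>stable_hash</code> and <code>route</code> are hypothetical, and <code>hashlib.blake2b</code> is swapped in for the unspecified hashing function because Python's built-in <code>hash()</code> is salted per process for strings and bytes. The sketch also takes the slot index from the quotient rather than from the same remainder; with a plain <code>fn(data)%size_of_hash_table</code>, a table count that divides the table size would leave most slots in each table unused.</p>

```python
import hashlib

NUM_HASH_TABLES = 4        # hypothetical N
SIZE_OF_HASH_TABLE = 1024  # hypothetical number of slots per table


def stable_hash(data):
    """Deterministic 64-bit hash of a bytes value (same result on every run)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def route(data):
    """Return (table number, slot index) for one column value."""
    h = stable_hash(data)
    table = h % NUM_HASH_TABLES
    # Assumption: use the quotient so the slot is not tied to the table number.
    index = (h // NUM_HASH_TABLES) % SIZE_OF_HASH_TABLE
    return table, index


print(route(b"example value"))
```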
<p><b>Python-like pseudo code with brevity in mind</b></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">row_cnt = <span style="color: #ff4500;">0</span>
<span style="color: #ff7700;font-weight:bold;">for</span> row <span style="color: #ff7700;font-weight:bold;">in</span> database:
	ntable = <span style="color: #008000;">hash</span><span style="color: black;">&#40;</span>data<span style="color: black;">&#41;</span><span style="color: #66cc66;">%</span>num_hash_tables
	index = <span style="color: #008000;">hash</span><span style="color: black;">&#40;</span>data<span style="color: black;">&#41;</span><span style="color: #66cc66;">%</span>size_of_hash_table
	queues<span style="color: black;">&#91;</span>ntable<span style="color: black;">&#93;</span>.<span style="color: black;">push</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>index, data, primary_key<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
	row_cnt += <span style="color: #ff4500;">1</span>
	<span style="color: #ff7700;font-weight:bold;">if</span> row_cnt == <span style="color: #ff4500;">10000</span>:
		<span style="color: #808080; font-style: italic;"># drain the queues one table at a time to keep disk access sequential</span>
		<span style="color: #ff7700;font-weight:bold;">for</span> each entry <span style="color: #ff7700;font-weight:bold;">in</span> each queue:
			fileoffset  = hash_tables<span style="color: black;">&#91;</span>ntable<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span>index<span style="color: black;">&#93;</span>
			<span style="color: #808080; font-style: italic;"># value not stored yet: first occurrence, so record it</span>
			<span style="color: #ff7700;font-weight:bold;">if</span> data <span style="color: #66cc66;">!</span>= read_data<span style="color: black;">&#40;</span>ntable + "_file", fileoffset<span style="color: black;">&#41;</span>:
				add_new_entry_to_file_and_hash_table<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
			<span style="color: #808080; font-style: italic;"># value already stored: this row is a duplicate</span>
			<span style="color: #ff7700;font-weight:bold;">else</span>:
				delete_row<span style="color: black;">&#40;</span>primary_key<span style="color: black;">&#41;</span>
		row_cnt = <span style="color: #ff4500;">0</span></pre></div></div>
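<p>For readers who want something they can run, here is a small self-contained sketch of the same single-pass idea. It is not the script I used: <code>io.BytesIO</code> objects stand in for the on-disk files, rows arrive as <code>(primary_key, data)</code> pairs instead of from MySQL, each slot keeps a list of offsets so that two different values landing in the same slot are not mistaken for duplicates (the pseudo code skips this for brevity), and values are assumed to contain no <code>\0</code> bytes. All names and constants are hypothetical.</p>

```python
import hashlib
import io

NUM_TABLES = 4       # hypothetical N
TABLE_SIZE = 65536   # hypothetical number of slots per table
BATCH = 10000        # rows queued before the queues are drained


def stable_hash(data):
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def read_value(f, offset):
    """Read one NUL-terminated value starting at the given offset."""
    f.seek(offset)
    out = bytearray()
    while True:
        c = f.read(1)
        if c in (b"", b"\0"):
            return bytes(out)
        out += c


def find_duplicates(rows):
    """Yield the primary keys of rows whose data was already seen."""
    tables = [{} for _ in range(NUM_TABLES)]            # slot -> list of offsets
    files = [io.BytesIO() for _ in range(NUM_TABLES)]   # stand-ins for disk files
    queues = [[] for _ in range(NUM_TABLES)]

    def drain():
        # Process one queue (and therefore one file) at a time.
        for n, queue in enumerate(queues):
            for index, data, pk in queue:
                offsets = tables[n].setdefault(index, [])
                if any(read_value(files[n], off) == data for off in offsets):
                    yield pk                              # duplicate
                else:
                    offsets.append(files[n].seek(0, io.SEEK_END))
                    files[n].write(data + b"\0")          # first occurrence
            queue.clear()

    pending = 0
    for pk, data in rows:
        h = stable_hash(data)
        queues[h % NUM_TABLES].append(((h // NUM_TABLES) % TABLE_SIZE, data, pk))
        pending += 1
        if pending == BATCH:
            yield from drain()
            pending = 0
    yield from drain()  # handle the final partial batch


rows = [(1, b"a"), (2, b"b"), (3, b"a"), (4, b"c"), (5, b"b")]
print(sorted(find_duplicates(rows)))
```

<p>The keys come out grouped by table rather than in row order, hence the <code>sorted()</code> call. In a real script each yielded key would be passed to a DELETE statement.</p>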

<p>	Depending on the complexity of the hash function, the size of the table, and the size of the column, this method could be slower than the queries above. For any significantly large non-indexed table it is better than any of the above solutions. It is also easy to estimate how long the script will take to complete, and easy to make it fault tolerant (it can be safely interrupted). </p>
<p>	For <b>large databases</b>, where the number of rows exceeds the number of offsets that can be stored in memory, one would use a variation of the algorithm. Store N distinct rows in the hash tables, and record the primary key of the first row that no longer fits once N values have been stored. Iterate through the remaining rows, deleting any duplicates. When finished iterating, empty the hash tables and start storing new hashes and deleting duplicates again, beginning from the primary key that was recorded. </p>
<p><b>more python-like pseudo code with brevity in mind</b></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">start_value = <span style="color: #ff4500;">0</span>
new_start_value = <span style="color: #ff4500;">0</span>
<span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: #008000;">True</span>:
	row_cnt = <span style="color: #ff4500;">0</span>
	<span style="color: #ff7700;font-weight:bold;">for</span> row <span style="color: #ff7700;font-weight:bold;">in</span> database starting at primary key start_value:
		ntable = <span style="color: #008000;">hash</span><span style="color: black;">&#40;</span>data<span style="color: black;">&#41;</span><span style="color: #66cc66;">%</span>num_hash_tables
		index = <span style="color: #008000;">hash</span><span style="color: black;">&#40;</span>data<span style="color: black;">&#41;</span><span style="color: #66cc66;">%</span>size_of_hash_table
		queues<span style="color: black;">&#91;</span>ntable<span style="color: black;">&#93;</span>.<span style="color: black;">push</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>index, data, primary_key<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
		row_cnt++
		<span style="color: #ff7700;font-weight:bold;">if</span> row_cnt == <span style="color: #ff4500;">10000</span>:
			<span style="color: #ff7700;font-weight:bold;">for</span> each entry <span style="color: #ff7700;font-weight:bold;">in</span> each queue:
				fileoffset  = hash_tables<span style="color: black;">&#91;</span>num<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span>index<span style="color: black;">&#93;</span>
				<span style="color: #ff7700;font-weight:bold;">if</span> data <span style="color: #66cc66;">!</span>= read_data<span style="color: black;">&#40;</span>num + “_file”, fileoffset<span style="color: black;">&#41;</span>:
					add_new_entry_to_file_and_hash_table_if_not_full<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
					<span style="color: #ff7700;font-weight:bold;">if</span> hash_table_full<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">and</span> start_value == new_start_value:
						new_start_value = primary_key
				<span style="color: #ff7700;font-weight:bold;">else</span>:
					delete_row<span style="color: black;">&#40;</span>primary_key<span style="color: black;">&#41;</span>
			row_cnt=<span style="color: #ff4500;">0</span>
	empty_hash_tables<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
	<span style="color: #ff7700;font-weight:bold;">if</span> start_value == new_start_value:
		<span style="color: #ff7700;font-weight:bold;">break</span>
	<span style="color: #ff7700;font-weight:bold;">else</span>:
		start_value = new_start_value</pre></div></div>

<p>	To put the performance increase into perspective: on a table of 1 000 000 rows with a data length of 255 and duplicates unlikely, it handled each batch of 10 000 rows in an average of 4 seconds. All duplicates were removed in roughly 7 minutes, whereas all the other solutions took hours. The speedup grows as the dataset gets larger.</p>
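<p>For readers who want something runnable, here is a simplified in-memory sketch of the multi-pass idea in the pseudocode above. It is not the implementation described in this post: the file-backed hash tables are swapped for a single Python set with a fixed capacity, and the function name <code>find_duplicate_keys</code>, the <code>capacity</code> parameter and the sample rows are all made up for illustration.</p>

```python
def find_duplicate_keys(rows, capacity):
    """Return the primary keys of rows whose data repeats an earlier row.

    rows     -- list of (primary_key, data) tuples in primary key order
    capacity -- hypothetical limit on how many distinct values fit in the
                table during one pass (stands in for a full hash table)
    """
    if not capacity >= 1:
        raise ValueError("capacity must be at least 1")

    deleted = set()
    start = 0  # position of the first row the current pass looks at
    while True:
        seen = set()       # emptied at the start of every pass
        next_start = None  # first row that did not fit into the table
        for pos in range(start, len(rows)):
            key, data = rows[pos]
            if key in deleted:
                continue  # already removed by an earlier pass
            if data in seen:
                deleted.add(key)  # duplicate of a row kept in this pass
            elif len(seen) == capacity:
                # Table is full: remember where the next pass must restart.
                if next_start is None:
                    next_start = pos
            else:
                seen.add(data)
        if next_start is None:
            return deleted  # every remaining row fit, so we are done
        start = next_start


sample = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b"), (6, "c"), (7, "d")]
print(sorted(find_duplicate_keys(sample, capacity=2)))  # [3, 5, 6]
```

<p>Each pass keeps at most <code>capacity</code> distinct values, deletes every later row that matches one of them, and restarts from the first row that did not fit. The first row of a pass always fits into the freshly emptied table, so the restart position moves forward every time and the loop terminates.</p>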
]]></content:encoded>
			<wfw:commentRss>http://madhax.org/2010/01/delete-duplicate-rows-from-database/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>JavaFX Won’t Stream Media Anytime Soon</title>
		<link>http://madhax.org/2009/11/javafx-won%e2%80%99t-stream-media-anytime-soon/</link>
		<comments>http://madhax.org/2009/11/javafx-won%e2%80%99t-stream-media-anytime-soon/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 18:49:35 +0000</pubDate>
		<dc:creator>Madhax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://madhax.org/?p=100</guid>
		<description><![CDATA[A couple weeks ago I had been working on something I thought would be pretty cool. The idea was a video player that could stream videos from external servers. The way I had planned on accomplishing this was via JavaFX and a signed applet. Needless to say (if you read the title) it didn’t really [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of weeks ago I was working on something I thought would be pretty cool. The idea was a video player that could stream videos from external servers. The way I had planned on accomplishing this was via JavaFX and a signed applet. Needless to say (if you read the title) it didn’t really work out. I managed to stream traditional FLV video files… but a large portion of video sites – at least the more popular ones that I’ve tried – make use of H.264 video files. JavaFX supposedly supports all media that is supported by Windows Media Player [1].</p>
<p><strong>Problem 1: <em>I couldn’t stream H.264 files. </em><br />
</strong> <strong>Fine print: <em>I COULD play a local H.264 file</em></strong></p>
<p>I tried working around the problem of not being able to stream H.264 files by downloading portions of the video – luckily the bleeding edge version of JavaFX (1.2) allowed the output of HTTP requests to be redirected to a file – and playing those portions in sequence.</p>
<p><strong>Problem 2: <em>javafx.io.http.HttpRequest locks the file while downloading… which makes using a media file impossible until the download is complete.</em><br />
</strong> <strong>Fine Print: <em>I imagine that HttpRequest wasn’t meant to download media files&#8230;  but it would be incredibly useful to download and play a media file.</em></strong></p>
<p>I then tried to build a library in native Java that would release the file lock and allow JavaFX to load the media. This worked for the first bit that had been downloaded (and flushed to a file), but then I had to update the file. I deleted the media object that made use of the local file and hoped that it would release the lock it had put on the file while reading it.</p>
<p><strong>Problem 3: <em>It didn’t release the file lock… and there IS NO WAY to force the release of system resources held by JavaFX.</em></strong></p>
<p>There is no fine print for this… this is completely inexcusable.  It makes it impossible to update any resource (I am assuming this problem exists beyond the video files I tried to play.) Whoever designed the language and omitted an interface – or a means – to delete a language native object had a very narrow-minded view of how the language should be used. I could’ve written each bit to a new file and then loaded the new file, but I would have had to keep track of FLV offsets and correctly build a new FLV header for each new file, and even then it would end up looking like a series of clips. At this point I gave up and moved on to a new idea that had caught my interest. These are some more problems I’ve encountered that relate to JavaFX.</p>
<p><strong>Problem 4: <em>JavaFX had a problem handling the fileoffsets meta tag in the FLV header for some videos. To play those videos I had to filter the tag out.  (Yes, it meant correctly rebuilding an FLV stream… yes… I did it =\)</em></strong></p>
<p><strong>Problem 5: <em>The JavaFX forums were inactive. Not one person had replied to any thread I had started.</em></strong></p>
<p><strong>Problem 6: <em>No one wants to wait 20 seconds for the JVM to fire up just to play a quick game or music video.</em></strong></p>
<p><strong>Problem 7: <em>Video support doesn’t come out of the box. So if you want your users to be able to watch videos… they will not only have to download Java and JavaFX (both are bundled), but also a third party codec pack.</em></strong></p>
<p><em> </em>In retrospect, I put a lot of work into hacking away at the problem. I don’t think I failed… I got it to work… it’s just that the effort I – and a user – would have to go through makes using JavaFX impractical. At least it was fun learning about FLVs [2].</p>
<p>[1] <a href="http://www.javafx.com/docs/articles/media/format.jsp">http://www.javafx.com/docs/articles/media/format.jsp</a><br />
[2] <a href="http://www.adobe.com/devnet/flv/pdf/video_file_format_spec_v10.pdf">http://www.adobe.com/devnet/flv/pdf/video_file_format_spec_v10.pdf</a></p>
]]></content:encoded>
			<wfw:commentRss>http://madhax.org/2009/11/javafx-won%e2%80%99t-stream-media-anytime-soon/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Facebook Platform</title>
		<link>http://madhax.org/2009/07/facebook-platform/</link>
		<comments>http://madhax.org/2009/07/facebook-platform/#comments</comments>
		<pubDate>Fri, 17 Jul 2009 02:40:05 +0000</pubDate>
		<dc:creator>Madhax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://madhax.org/?p=87</guid>
		<description><![CDATA[	I recently had an interesting experience with Facebook. I made an application that let a user search for another user’s public albums even if they weren’t on each other’s friends list. This application didn’t violate any clause of the TOS list at the time. The application made use of a design decision on Facebook’s part [...]]]></description>
			<content:encoded><![CDATA[<p>	I recently had an interesting experience with Facebook. I made an application that let a user search for another user’s public albums even if they weren’t on each other’s friends list. This application didn’t violate any clause of the TOS at the time. The application made use of a design decision on Facebook’s part to allow world readable albums. When you create an album on Facebook, you can set its privacy setting. One of the choices (the one selected by default) makes the album world readable. I made use of the Facebook API (PHP, SQL for caching, FQL for speed, all that jazz) to create the application. I even made a slick interface that looked very facebook-ish. For lols I created a tutorial that centered around viewing Mark Zuckerberg’s photos.  In the first couple of days after the app launched, I got several hundred users and several 5-star reviews (all the reviews were 5-star).  </p>
<p>	I decided to submit the application to the Facebook public application directory (stupid me&#8230; I know) – which would make it visible to any user who is specifically looking for a super awesome application that lets them view the public albums of any user on Facebook. The application was submitted for approval and I waited. One day passed, then two  &#8211; I thought that the approval process would only take one business day, since it just involves a person adding an application and trying it out.  A couple of days after submitting the application I noticed that I could no longer view Mark Zuckerberg’s photos. Not that I was stalking Mark Zuckerberg, but I like to run through things I make – daily &#8211; to make sure they aren’t broken. Viewing Mark Zuckerberg’s photos had become routine from a testing standpoint.  Testing my app on other users showed that my application _still_ worked. </p>
<p>	Doing a regex search in my Apache logs showed that my application had been added (and removed) by _many_ IPs in the Facebook subnet over 250 times. The regex count returned 255 (WHAT A ROUND NUMBER :D ). Each IP interacted with my application in some way, which suggests that it wasn’t a bot that kept on adding and removing my application. I got really excited because I thought that many Facebook employees had seen my app and that Mark Zuckerberg, embarrassed, had changed the privacy setting of his albums (thought: MARK ZUCKERBERG HIMSELF SAW MY APP, WOW :D). <br /><img src="http://madhax.org/wp-content/uploads/bearbear.jpg" alt="facebook" /><br /><b>(photo from one of Mark Zuckerberg&#8217;s albums)</b></p>
<p>	Today I tried my app; it barely returns any results for any search. All the searches I tried previously no longer work. Facebook hurriedly fixed a problem that didn’t exist… and it broke my app. What I mean by a problem that doesn’t exist is that I was taking advantage of a design decision Facebook made. Albums had a privacy setting that allowed everyone to view them. I could view what albums a user had using FQL; now I can’t (it still works for some people… but every day I’m getting fewer and fewer results… so I assume Facebook is fixing each account sequentially or something.) The world readable attribute still exists and it’s still the default choice when creating a new album.</p>
<p>	What bothers me most about this experience is not the hours I put in developing the app. Not that I shared this application publicly and won’t be able to use it privately. What bothers me most about this is that Facebook delayed the approval of my application for the public directory listing until it didn’t work.  It’s their platform. They can change whatever functionality they want… send no notification to you…and if they break your app there’s nothing you can do about it.</p>
<p>	Facebook did what they had to do to accomplish their goals (avoid embarrassment, fix an initial design error, w/e.) I now view them as an evil company, but that&#8217;s just my opinion. Luckily, I cached (for optimization purposes) the album IDs of Mark Zuckerberg&#8217;s albums :D.</p>
<p>ref:<br />
[1] My application: <a href='http://apps.facebook.com/publicphotos/'>http://apps.facebook.com/publicphotos/</a><br />
[2] <a href='http://apps.facebook.com/publicphotos/album.php?a=17182073944'>http://apps.facebook.com/publicphotos/album.php?a=17182073944</a><br />
[3] <a href='http://apps.facebook.com/publicphotos/album.php?a=17181871868'>http://apps.facebook.com/publicphotos/album.php?a=17181871868</a><br />
[4] <a href='http://apps.facebook.com/publicphotos/album.php?a=17181903264'>http://apps.facebook.com/publicphotos/album.php?a=17181903264</a><br />
[5] <a href='http://apps.facebook.com/publicphotos/album.php?a=17181916415'>http://apps.facebook.com/publicphotos/album.php?a=17181916415</a></p>
]]></content:encoded>
			<wfw:commentRss>http://madhax.org/2009/07/facebook-platform/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>MySQL ALTER TABLE is Slow</title>
		<link>http://madhax.org/2009/06/mysql-alter-table-is-slow/</link>
		<comments>http://madhax.org/2009/06/mysql-alter-table-is-slow/#comments</comments>
		<pubDate>Wed, 24 Jun 2009 21:57:09 +0000</pubDate>
		<dc:creator>Madhax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://madhax.org/?p=65</guid>
		<description><![CDATA[Recently I discovered that the recommended way of changing a column type of a table in MySQL is intolerably slow.  By recommended I mean that the first result of searching for “how to change column type mysql” (excluding quotes) is a link to the MySQL developer documentation for an ALTER TABLE query.
The circumstances started [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I discovered that the recommended way of changing a column type of a table in MySQL is intolerably slow.  By recommended I mean that the first result of searching for “how to change column type mysql” (excluding quotes) is a link to the MySQL developer documentation for an ALTER TABLE query.</p>
<p>The circumstances started with me needing a table to store MD5 hashes. The table definition I had come up with at the time was</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">DESCRIBE</span> mytable;
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">-------+------------+------+-----+------------------+-------+</span>
<span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">FIELD</span> <span style="color: #66cc66;">|</span> Type       <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">KEY</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">DEFAULT</span>          <span style="color: #66cc66;">|</span> Extra <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">-------+------------+------+-----+------------------+-------+</span>
<span style="color: #66cc66;">|</span> HASH  <span style="color: #66cc66;">|</span> CHAR  <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">16</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">|</span> NO   <span style="color: #66cc66;">|</span> PRI <span style="color: #66cc66;">|</span>                  <span style="color: #66cc66;">|</span>       <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">-------+------------+------+-----+------------------+-------+</span>
<span style="color: #cc66cc;">1</span> row <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #993333; font-weight: bold;">SET</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0.03</span> sec<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>The design error is that I had defined `HASH` as char(16), a text type, even though an MD5 digest is 16 bytes of raw binary data. A properly escaped MD5 digest would be stored correctly in the table and I was able to fetch any digest – via MySQL command line applications – as it was stored in the table (data after a \0 was still returned.) There wasn’t any problem until I chose to make optimizations to the project and make use of functionality provided by a MySQL API.</p>
<p>When selecting any char(x) column from a table the API would read up to a \0 (NULL, int(0).) So I would get varying column lengths depending on the digest in the current row. It’s reasonable for the APIs to assume that because I have defined a column as a sequence of chars then a null terminated string would be stored in it … especially since there is another data type that is specifically used for binary data.</p>
<p>This realization came AFTER I had already populated the table with a lot of data. The table had grown to a fairly large size.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">SELECT</span> COUNT<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">*</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`mytable`</span>;
&nbsp;
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----------+</span>
<span style="color: #66cc66;">|</span> COUNT<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">*</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----------+</span>
<span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">38744395</span> <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----------+</span>
<span style="color: #cc66cc;">1</span> row <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #993333; font-weight: bold;">SET</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0.33</span> sec<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>I was looking to fix my mistake. The ideal solution would be to change the column type. Research done via Google and looking through a book (High Performance MySQL) led me to believe that</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">ALTER</span> <span style="color: #993333; font-weight: bold;">TABLE</span> <span style="color: #ff0000;">`mytable`</span> <span style="color: #993333; font-weight: bold;">MODIFY</span> HASH <span style="color: #993333; font-weight: bold;">BINARY</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">16</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">DEFAULT</span> <span style="color: #ff0000;">''</span>;</pre></div></div>

<p>would be the best way to go about changing a column type. I was wrong. The query ran for days. This is the result of a “SHOW PROCESSLIST” a few days into the query.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">SHOW</span> processlist;
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">------+------+----------------+----------+---------+--------+-------------------+----------------------------------------------------------------------+</span>
<span style="color: #66cc66;">|</span> Id   <span style="color: #66cc66;">|</span> User <span style="color: #66cc66;">|</span> Host           <span style="color: #66cc66;">|</span> db       <span style="color: #66cc66;">|</span> Command <span style="color: #66cc66;">|</span> Time   <span style="color: #66cc66;">|</span> State             <span style="color: #66cc66;">|</span> Info                                                                 <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">------+------+----------------+----------+---------+--------+-------------------+----------------------------------------------------------------------+</span>
<span style="color: #66cc66;">|</span>   <span style="color: #cc66cc;">82</span> <span style="color: #66cc66;">|</span> root <span style="color: #66cc66;">|</span> localhost:<span style="color: #cc66cc;">4442</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>     <span style="color: #66cc66;">|</span> Sleep   <span style="color: #66cc66;">|</span>     <span style="color: #cc66cc;">52</span> <span style="color: #66cc66;">|</span>                   <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>                                                                 <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">|</span>  <span style="color: #cc66cc;">912</span> <span style="color: #66cc66;">|</span> root <span style="color: #66cc66;">|</span> localhost:<span style="color: #cc66cc;">3677</span> <span style="color: #66cc66;">|</span> mydb     <span style="color: #66cc66;">|</span> Query   <span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">275084</span> <span style="color: #66cc66;">|</span> copy <span style="color: #993333; font-weight: bold;">TO</span> tmp <span style="color: #993333; font-weight: bold;">TABLE</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">ALTER</span> <span style="color: #993333; font-weight: bold;">TABLE</span> <span style="color: #ff0000;">`mytable`</span> <span style="color: #993333; font-weight: bold;">MODIFY</span> HASH <span style="color: #993333; font-weight: bold;">BINARY</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">16</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">DEFAULT</span> <span style="color: #ff0000;">''</span>     <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">2672</span> <span style="color: #66cc66;">|</span> root <span style="color: #66cc66;">|</span> localhost:<span style="color: #cc66cc;">4527</span> <span style="color: #66cc66;">|</span> <span style="color: #66cc66;">********</span> <span style="color: #66cc66;">|</span> Query   <span style="color: #66cc66;">|</span>      <span style="color: #cc66cc;">0</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>              <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">SHOW</span> processlist                                                     <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">------+------+----------------+----------+---------+--------+-------------------+----------------------------------------------------------------------+</span></pre></div></div>

<p>A day later the query stopped executing because `mytable` was corrupted. It didn’t occur to me that my experimenting might have corrupted the table – so I didn’t bother running a consistency check. The MySQL client should probably have verified the tables before executing any query that is expected to run for more than a day. Regardless, I would have thought that the column modification would be near instantaneous. The column data wouldn’t need to be changed or converted (I used a single byte character set). All that would be required is for the particular column to be recognized as a different type. An ALTER TABLE query shouldn’t be this slow.  </p>
<p>In contrast, a full table scan – which involves reading all the data from the table off of the hard drive – is completed within 8 minutes.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`mytable`</span> <span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #ff0000;">`HASH`</span> <span style="color: #993333; font-weight: bold;">LIKE</span> <span style="color: #ff0000;">&quot;%dsdfsdfsdfsd%&quot;</span> <span style="color: #993333; font-weight: bold;">LIMIT</span> <span style="color: #cc66cc;">1</span>;
Empty <span style="color: #993333; font-weight: bold;">SET</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">7</span> min <span style="color: #cc66cc;">1.45</span> sec<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>So even if it needed to modify all the rows, and assuming that writing to disk takes as much time as reading from disk (about 7 minutes each, so roughly 14 minutes in total), it should finish within 20 minutes. (The columns would have the same width so no data re-arrangement would need to take place.)</p>
<p>Without digging through the source code I can only provide speculation based on behavior and on information provided by third party sources. An excerpt from High Performance MySQL reads:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">MySQL performs most alterations by making an empty table with the desired new
structure, inserting all the data from the old table into the new one, and deleting the
old table.
....
mysql&gt; ALTER TABLE sakila.film
-&gt; MODIFY COLUMN rental_duration TINYINT(3) NOT NULL DEFAULT 5;
&nbsp;
Profiling that statement with SHOW STATUS shows that it does 1,000 handler reads and
1,000 inserts.</pre></div></div>

<p>My understanding is that it creates a temporary table, fetches one row at a time from the original table and inserts one row at a time into the temporary table. The problem with this method is that it incurs a lot of overhead performing queries one row at a time. This overhead could be avoided by handling a lot of rows in a lot fewer queries. All this occurred to me while the ALTER TABLE had been executing, but I didn’t kill it because I wasn’t completely confident that this was the problem or that I had a better solution.</p>
<p>When the ALTER TABLE failed I tried an alternate solution that involves bulk SELECTs and INSERTs. I did the following steps:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">1.	Create an SQL dump of mytable using the mysqldump command.
2.	Open the SQL dump in an editor that supports large files (I used UEStudio).
3.	Do a string replace of char(16) with binary(16). (I used a string replace because UEStudio, when opening large files, writes whatever you type directly to disk, so for very large files a single keystroke lags for about a minute.)
4.	Import the edited SQL file via "mysql -uuser -p mydb &lt; edited.sql".</pre></div></div>
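<p>Steps 2 and 3 can also be done without an editor by streaming the dump through a small script. The following is a minimal sketch, not what I actually ran: the function name <code>rewrite_dump</code> and the file names are made up, and it assumes that mysqldump writes the column definition as lowercase <code>char(16)</code> and that every data line starts with <code>INSERT INTO</code>. Skipping those lines keeps the replacement away from the stored digests, which is stricter than the plain string replace described above.</p>

```python
def rewrite_dump(src_path, dst_path, old=b"char(16)", new=b"binary(16)"):
    """Copy an SQL dump line by line, changing the column type on the way.

    Works on bytes because the dump contains escaped binary digests.
    Lines that start with INSERT INTO are copied untouched (assumption:
    that is how mysqldump begins every data line).
    Returns the number of lines that were changed.
    """
    changed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            if not line.startswith(b"INSERT INTO") and old in line:
                line = line.replace(old, new)
                changed += 1
            dst.write(line)
    return changed


# Hypothetical usage; both file names are placeholders:
# rewrite_dump("mytable.sql", "edited.sql")
```

<p>Only one line is held in memory at a time, so the size of the dump doesn’t matter. It is still worth checking the return value (it should be 1 for a single column) before importing the result.</p>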

<p>This entire process was completed within an hour. Hopefully this post will help someone save a few days of their life.</p>
<p>RE: <a href="http://oreilly.com/catalog/9780596003067/">http://oreilly.com/catalog/9780596003067/</a><br />
RE: <a href="http://dev.mysql.com/doc/refman/5.1/en/alter-table.html">http://dev.mysql.com/doc/refman/5.1/en/alter-table.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://madhax.org/2009/06/mysql-alter-table-is-slow/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Efficient Way to Fetch Random Rows From a MySQL Table</title>
		<link>http://madhax.org/2009/06/efficient-way-to-fetch-random-rows-from-a-mysql-table/</link>
		<comments>http://madhax.org/2009/06/efficient-way-to-fetch-random-rows-from-a-mysql-table/#comments</comments>
		<pubDate>Tue, 16 Jun 2009 10:00:43 +0000</pubDate>
		<dc:creator>Madhax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://madhax.org/?p=36</guid>
		<description><![CDATA[I’m currently working on a small project that led me to look for the best way to fetch a random row from a MySQL database. Online research showed that a common solution to this problem is something similar to
[1]

SELECT * FROM `TABLE` ORDER BY RAND&#40;&#41; LIMIT 1;

The problem with this solution is that it assigns [...]]]></description>
			<content:encoded><![CDATA[<p>I’m currently working on a small project that led me to look for the best way to fetch a random row from a MySQL database. Online research showed that a common solution to this problem is something similar to</p>
<p>[1]</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`TABLE`</span> <span style="color: #993333; font-weight: bold;">ORDER</span> <span style="color: #993333; font-weight: bold;">BY</span> RAND<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">LIMIT</span> <span style="color: #cc66cc;">1</span>;</pre></div></div>

<p>The problem with this solution is that it assigns a random value to every row and then sorts the entire table by that value, just to return a single row. I list here a variety of solutions I have found on the web.</p>
<p>[2]</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$range_result</span> <span style="color: #339933;">=</span> <span style="color: #990000;">mysql_query</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">&quot;SELECT MAX(`id`) AS max_id , MIN(`id`) AS min_id FROM `table` &quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$range_row</span> <span style="color: #339933;">=</span> <span style="color: #990000;">mysql_fetch_object</span><span style="color: #009900;">&#40;</span> <span style="color: #000088;">$range_result</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> 
<span style="color: #000088;">$random</span> <span style="color: #339933;">=</span> <span style="color: #990000;">mt_rand</span><span style="color: #009900;">&#40;</span> <span style="color: #000088;">$range_row</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">min_id</span> <span style="color: #339933;">,</span> <span style="color: #000088;">$range_row</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">max_id</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$result</span> <span style="color: #339933;">=</span> <span style="color: #990000;">mysql_query</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">&quot; SELECT * FROM `table` WHERE `id` &gt;= <span style="color: #006699; font-weight: bold;">$random</span> LIMIT 0,1 &quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Solution [2] scans the entire table up to $random.  In the worst case this requires a full table scan.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">EXPLAIN</span> <span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`to_visit`</span> <span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #ff0000;">`ID`</span> <span style="color: #66cc66;">&gt;=</span> <span style="color: #cc66cc;">250</span>;
&nbsp;
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+-------------+----------+------+---------------+------+---------+------+----------+-------------+</span>
<span style="color: #66cc66;">|</span> id <span style="color: #66cc66;">|</span> select_type <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">TABLE</span>    <span style="color: #66cc66;">|</span> type <span style="color: #66cc66;">|</span> possible_keys <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">KEY</span>  <span style="color: #66cc66;">|</span> key_len <span style="color: #66cc66;">|</span> ref  <span style="color: #66cc66;">|</span> rows     <span style="color: #66cc66;">|</span> Extra       <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+-------------+----------+------+---------------+------+---------+------+----------+-------------+</span>
<span style="color: #66cc66;">|</span>  <span style="color: #cc66cc;">1</span> <span style="color: #66cc66;">|</span> SIMPLE      <span style="color: #66cc66;">|</span> to_visit <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">ALL</span>  <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span>       <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>    <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">20345551</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">USING</span> <span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+-------------+----------+------+---------------+------+---------+------+----------+-------------+</span>
&nbsp;
&nbsp;
<span style="color: #cc66cc;">1</span> row <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #993333; font-weight: bold;">SET</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0.00</span> sec<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>[3]</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$offset_result</span> <span style="color: #339933;">=</span> <span style="color: #990000;">mysql_query</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">&quot; SELECT FLOOR(RAND() * COUNT(*)) AS `offset` FROM `table` &quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$offset_row</span> <span style="color: #339933;">=</span> <span style="color: #990000;">mysql_fetch_object</span><span style="color: #009900;">&#40;</span> <span style="color: #000088;">$offset_result</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> 
<span style="color: #000088;">$offset</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$offset_row</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">offset</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$result</span> <span style="color: #339933;">=</span> <span style="color: #990000;">mysql_query</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">&quot; SELECT * FROM `table` LIMIT <span style="color: #006699; font-weight: bold;">$offset</span>, 1 &quot;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Solution [3] has to read and discard every row before $offset, because LIMIT with an offset cannot jump straight to a row. Worst case scenario &#8211; a full table scan is required.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">EXPLAIN</span> <span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`to_visit`</span> <span style="color: #993333; font-weight: bold;">LIMIT</span> <span style="color: #cc66cc;">20</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1</span>;
&nbsp;
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+-------------+----------+------+---------------+------+---------+------+----------+-------+</span>
<span style="color: #66cc66;">|</span> id <span style="color: #66cc66;">|</span> select_type <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">TABLE</span>    <span style="color: #66cc66;">|</span> type <span style="color: #66cc66;">|</span> possible_keys <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">KEY</span>  <span style="color: #66cc66;">|</span> key_len <span style="color: #66cc66;">|</span> ref  <span style="color: #66cc66;">|</span> rows     <span style="color: #66cc66;">|</span> Extra <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+-------------+----------+------+---------------+------+---------+------+----------+-------+</span>
<span style="color: #66cc66;">|</span>  <span style="color: #cc66cc;">1</span> <span style="color: #66cc66;">|</span> SIMPLE      <span style="color: #66cc66;">|</span> to_visit <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">ALL</span>  <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>          <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>    <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">20345551</span> <span style="color: #66cc66;">|</span>       <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+-------------+----------+------+---------------+------+---------+------+----------+-------+</span>
&nbsp;
&nbsp;
<span style="color: #cc66cc;">1</span> row <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #993333; font-weight: bold;">SET</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0.00</span> sec<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>[4]</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`table`</span> <span style="color: #993333; font-weight: bold;">WHERE</span> id <span style="color: #66cc66;">&gt;=</span> <span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> FLOOR<span style="color: #66cc66;">&#40;</span> MAX<span style="color: #66cc66;">&#40;</span>id<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">*</span> RAND<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`table`</span> <span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">ORDER</span> <span style="color: #993333; font-weight: bold;">BY</span> id <span style="color: #993333; font-weight: bold;">LIMIT</span> <span style="color: #cc66cc;">1</span>;</pre></div></div>

<p>Solution [4] is equivalent to solution [2]. I’m not sure why the “ORDER BY id” was required at all. Worst case scenario &#8211; a full table scan is required.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">EXPLAIN</span> <span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`to_visit`</span> <span style="color: #993333; font-weight: bold;">WHERE</span> ID <span style="color: #66cc66;">&gt;=</span> <span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> FLOOR<span style="color: #66cc66;">&#40;</span>MAX<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">`ID`</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">*</span> RAND<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`to_visit`</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">LIMIT</span> <span style="color: #cc66cc;">1</span>;
&nbsp;
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+----------------------+----------+-------+---------------+---------+---------+------+----------+-------------+</span>
<span style="color: #66cc66;">|</span> id <span style="color: #66cc66;">|</span> select_type          <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">TABLE</span>    <span style="color: #66cc66;">|</span> type  <span style="color: #66cc66;">|</span> possible_keys <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">KEY</span>     <span style="color: #66cc66;">|</span> key_len <span style="color: #66cc66;">|</span> ref  <span style="color: #66cc66;">|</span> rows     <span style="color: #66cc66;">|</span> Extra       <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+----------------------+----------+-------+---------------+---------+---------+------+----------+-------------+</span>
<span style="color: #66cc66;">|</span>  <span style="color: #cc66cc;">1</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span>              <span style="color: #66cc66;">|</span> to_visit <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">ALL</span>   <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>          <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>    <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>    <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">20345551</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">USING</span> <span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">|</span>  <span style="color: #cc66cc;">2</span> <span style="color: #66cc66;">|</span> UNCACHEABLE SUBQUERY <span style="color: #66cc66;">|</span> to_visit <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">INDEX</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>          <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">4</span>       <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">20345551</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">USING</span> <span style="color: #993333; font-weight: bold;">INDEX</span> <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+----------------------+----------+-------+---------------+---------+---------+------+----------+-------------+</span>
&nbsp;
&nbsp;
<span style="color: #cc66cc;">2</span> rows <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #993333; font-weight: bold;">SET</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0.00</span> sec<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>[5]</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">@rand_id :<span style="color: #66cc66;">=</span> FLOOR<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span> <span style="color: #66cc66;">+</span> <span style="color: #66cc66;">&#40;</span>RAND<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">*</span> <span style="color: #66cc66;">&#40;</span>@max_id <span style="color: #66cc66;">-</span> <span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
<span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #ff0000;">`column`</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`table`</span> <span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #ff0000;">`id`</span> <span style="color: #66cc66;">&gt;=</span> @rand_id <span style="color: #993333; font-weight: bold;">ORDER</span> <span style="color: #993333; font-weight: bold;">BY</span> <span style="color: #ff0000;">`id`</span> <span style="color: #993333; font-weight: bold;">ASC</span> <span style="color: #993333; font-weight: bold;">LIMIT</span> <span style="color: #cc66cc;">1</span></pre></div></div>

<p>Solution [5] is equivalent to solution [4]. However, if `column` <em>is</em> the primary key, only the index needs to be read. Worst case scenario (`column` is not the primary key): a full table scan may be done. Best case scenario: `column` is indexed and the query is significantly faster.</p>
<p>The main problem in all these solutions is the assumption that MySQL will do some magical handling for any reference to a column that has a primary key. MySQL will not use an indexed column if you are fetching anything other than indexed columns (not exactly true – see <a href="http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html">http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html</a> .)</p>
<p>My solution makes use of a table index. The following code is in Python (which means it should be very easy to read :D .)</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">cursor.<span style="color: black;">execute</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;SELECT MAX(`ID`), MIN(`ID`) FROM `to_visit`&quot;</span><span style="color: black;">&#41;</span>
result = cursor.<span style="color: black;">fetchall</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
maxid = result<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
minid = result<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
<span style="color: #008000;">id</span> = <span style="color: #dc143c;">random</span>.<span style="color: black;">randint</span><span style="color: black;">&#40;</span>minid, maxid<span style="color: black;">&#41;</span>
cursor.<span style="color: black;">execute</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;SELECT * FROM `to_visit` as t1 JOIN (SELECT ID FROM `to_visit` WHERE `ID`&gt;=&quot;</span>+<span style="color: #008000;">str</span><span style="color: black;">&#40;</span><span style="color: #008000;">id</span><span style="color: black;">&#41;</span>+<span style="color: #483d8b;">&quot; limit 1) as t2 where t1.id = t2.id&quot;</span><span style="color: black;">&#41;</span>
result = cursor.<span style="color: black;">fetchall</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Selecting MAX(`ID`) and MIN(`ID`) should be constant time (it is for the MyISAM storage engine). The second query uses a random ID generated in Python. Since the inner query fetches only the `ID` &#8211; which is the primary key &#8211; it can be answered from the index alone. The outer join is performed in constant time because it references one specific row by its primary key.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">mysql<span style="color: #66cc66;">&gt;</span> <span style="color: #993333; font-weight: bold;">EXPLAIN</span> <span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`to_visit`</span> <span style="color: #993333; font-weight: bold;">AS</span> t1 <span style="color: #993333; font-weight: bold;">JOIN</span> <span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> ID <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`to_visit`</span> <span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #ff0000;">`ID`</span><span style="color: #66cc66;">&gt;=</span><span style="color: #cc66cc;">1</span> <span style="color: #993333; font-weight: bold;">LIMIT</span> <span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> t2 <span style="color: #993333; font-weight: bold;">ON</span> t1<span style="color: #66cc66;">.</span>id <span style="color: #66cc66;">=</span> t2<span style="color: #66cc66;">.</span>id;
&nbsp;
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+-------------+------------+--------+---------------+---------+---------+-------+----------+--------------------------+</span>
<span style="color: #66cc66;">|</span> id <span style="color: #66cc66;">|</span> select_type <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">TABLE</span>      <span style="color: #66cc66;">|</span> type   <span style="color: #66cc66;">|</span> possible_keys <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">KEY</span>     <span style="color: #66cc66;">|</span> key_len <span style="color: #66cc66;">|</span> ref   <span style="color: #66cc66;">|</span> rows     <span style="color: #66cc66;">|</span> Extra                    <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+-------------+------------+--------+---------------+---------+---------+-------+----------+--------------------------+</span>
<span style="color: #66cc66;">|</span>  <span style="color: #cc66cc;">1</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span>     <span style="color: #66cc66;">|</span> <span style="color: #66cc66;">&lt;</span>derived2<span style="color: #66cc66;">&gt;</span> <span style="color: #66cc66;">|</span> system <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>          <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>    <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>    <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>  <span style="color: #66cc66;">|</span>        <span style="color: #cc66cc;">1</span> <span style="color: #66cc66;">|</span>                          <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">|</span>  <span style="color: #cc66cc;">1</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span>     <span style="color: #66cc66;">|</span> t1         <span style="color: #66cc66;">|</span> const  <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span>       <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">4</span>       <span style="color: #66cc66;">|</span> const <span style="color: #66cc66;">|</span>        <span style="color: #cc66cc;">1</span> <span style="color: #66cc66;">|</span>                          <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">|</span>  <span style="color: #cc66cc;">2</span> <span style="color: #66cc66;">|</span> DERIVED     <span style="color: #66cc66;">|</span> to_visit   <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">INDEX</span>  <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span>       <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">4</span>       <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">NULL</span>  <span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">21254018</span> <span style="color: #66cc66;">|</span> <span style="color: #993333; font-weight: bold;">USING</span> <span style="color: #993333; font-weight: bold;">WHERE</span>; <span style="color: #993333; font-weight: bold;">USING</span> <span style="color: #993333; font-weight: bold;">INDEX</span> <span style="color: #66cc66;">|</span>
<span style="color: #66cc66;">+</span><span style="color: #808080; font-style: italic;">----+-------------+------------+--------+---------------+---------+---------+-------+----------+--------------------------+</span>
&nbsp;
&nbsp;
<span style="color: #cc66cc;">3</span> rows <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #993333; font-weight: bold;">SET</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0.00</span> sec<span style="color: #66cc66;">&#41;</span></pre></div></div>
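
<p>The two-step lookup can be sketched end to end as below. This is only a minimal sketch under a few assumptions: it uses an in-memory <code>sqlite3</code> database as a stand-in for MySQL so that it runs anywhere, the <code>url</code> column and the helper name <code>fetch_random_row</code> are made up for illustration, the random ID is bound as a query parameter instead of being concatenated into the string, and the inner query adds an explicit <code>ORDER BY id</code> so that the first ID at or above the random point is the one returned.</p>

```python
import random
import sqlite3

# Stand-in database: sqlite3 in memory instead of MySQL (an assumption made
# so the sketch runs anywhere). The table name follows the post; the `url`
# column is hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE to_visit (id INTEGER PRIMARY KEY, url TEXT)")
# Leave gaps in the ids (every third id is missing) to mimic deleted rows.
db.executemany(
    "INSERT INTO to_visit (id, url) VALUES (?, ?)",
    [(i, "http://example.com/%d" % i) for i in range(1, 1001) if i % 3 != 0],
)


def fetch_random_row(conn):
    """Return the row with the first id >= a random point in [min, max]."""
    min_id, max_id = conn.execute(
        "SELECT MIN(id), MAX(id) FROM to_visit"
    ).fetchone()
    target = random.randint(min_id, max_id)
    # The inner query touches only the primary key; the outer join then
    # fetches exactly one row by that key. The value is bound as a
    # parameter rather than concatenated into the SQL string.
    return conn.execute(
        "SELECT t1.* FROM to_visit AS t1 "
        "JOIN (SELECT id FROM to_visit WHERE id >= ? ORDER BY id LIMIT 1) AS t2 "
        "ON t1.id = t2.id",
        (target,),
    ).fetchone()


row = fetch_random_row(db)
print(row)
```

<p>Because the largest ID always exists, the inner query always finds a match, so the function returns a row as long as the table is not empty.</p>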

<p>I used this small snippet of code to benchmark my solution.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> fetch_random_row_test<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
	<span style="color: #ff7700;font-weight:bold;">global</span> db
	start = <span style="color: #dc143c;">time</span>.<span style="color: #dc143c;">time</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
	cursor = db.<span style="color: black;">cursor</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
	cursor.<span style="color: black;">execute</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;SELECT MAX(`ID`), MIN(`ID`) FROM `to_visit`&quot;</span><span style="color: black;">&#41;</span>
	result = cursor.<span style="color: black;">fetchall</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
	maxid = result<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
	minid = result<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
	<span style="color: #ff7700;font-weight:bold;">for</span> x <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">1000</span><span style="color: black;">&#41;</span>:
		<span style="color: #008000;">id</span> = <span style="color: #dc143c;">random</span>.<span style="color: black;">randint</span><span style="color: black;">&#40;</span>minid, maxid<span style="color: black;">&#41;</span>
		cursor.<span style="color: black;">execute</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;SELECT * FROM `to_visit` as t1 JOIN (SELECT ID FROM `to_visit` WHERE `ID`&gt;=&quot;</span>+<span style="color: #008000;">str</span><span style="color: black;">&#40;</span><span style="color: #008000;">id</span><span style="color: black;">&#41;</span>+<span style="color: #483d8b;">&quot; limit 1) as t2 where t1.id = t2.id&quot;</span><span style="color: black;">&#41;</span>
		result = cursor.<span style="color: black;">fetchall</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
	<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #dc143c;">time</span>.<span style="color: #dc143c;">time</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> - start</pre></div></div>

<p>Even though the code doesn’t use a high-resolution performance timer, it should still give fairly accurate and practical results. On my home computer it completed in 21 seconds; that is, it fetched 1000 random rows in 21 seconds, or about 21 ms per row. There are other issues that need to be addressed when selecting random rows from a MySQL table, such as fragmentation, which would make the “randomness” not so random :P. I may discuss fragmentation in a future blog post.</p>
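<p>To see why the “randomness” can suffer, here is a small sketch that assumes fragmentation means gaps in the ID sequence (for example after rows are deleted). It computes the exact chance that each ID is returned by the “first ID at or above a random point” lookup. The helper name <code>selection_probabilities</code> and the example IDs are made up for illustration.</p>

```python
from collections import Counter
from fractions import Fraction


def selection_probabilities(ids):
    """Exact chance that each id is returned by 'first id >= random point'."""
    ids = sorted(ids)
    low, high = ids[0], ids[-1]
    hits = Counter()
    # Walk every integer the random generator could produce and record
    # which existing id the ">= point LIMIT 1" lookup would land on.
    for point in range(low, high + 1):
        chosen = next(i for i in ids if i >= point)
        hits[chosen] += 1
    total = high - low + 1
    return {i: Fraction(hits[i], total) for i in ids}


# Hypothetical table with ids 1, 2 and 10 (ids 3..9 were deleted).
probs = selection_probabilities([1, 2, 10])
for row_id, p in probs.items():
    print(row_id, p)
```

<p>With these IDs, row 10 is picked for 8 of the 10 possible random points, while rows 1 and 2 are each picked for only one. A row that follows a large gap is favoured in proportion to the size of that gap.</p>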
<p>RE: <a href="http://dev.mysql.com/doc/refman/5.0/en/using-explain.html">http://dev.mysql.com/doc/refman/5.0/en/using-explain.html</a><br />
RE: <a href="http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html">http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://madhax.org/2009/06/efficient-way-to-fetch-random-rows-from-a-mysql-table/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DMCA in Canada – Google Filters Search Results According to US Law</title>
		<link>http://madhax.org/2009/05/dmca-in-canada-%e2%80%93-google-filters-search-results-according-to-us-law/</link>
		<comments>http://madhax.org/2009/05/dmca-in-canada-%e2%80%93-google-filters-search-results-according-to-us-law/#comments</comments>
		<pubDate>Sun, 10 May 2009 07:01:48 +0000</pubDate>
		<dc:creator>Madhax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[DMCA]]></category>
		<category><![CDATA[filters]]></category>
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://madhax.org/?p=34</guid>
		<description><![CDATA[A few days ago I had been searching for information about a very cool IDE I’ve begun using called ‘UEStudio’. The bottom of the page read
“In response to a complaint we received under the US Digital Millennium Copyright Act, we have removed 1 result(s) from this page. If you wish, you may read the DMCA [...]]]></description>
			<content:encoded><![CDATA[<p>A few days ago I had been searching for information about a very cool IDE I’ve begun using called ‘UEStudio’. The bottom of the page read</p>
<blockquote><p><em>“In response to a complaint we received under the US Digital Millennium Copyright Act, we have removed 1 result(s) from this page. If you wish, you may read the DMCA complaint that caused the removal(s) at ChillingEffects.org.”</em></p></blockquote>
<p>I don’t live in the United States &#8211; I live in Canada. I contacted Google via their help desk to ask why my search results were being filtered according to the US Digital Millennium Copyright Act when I’m not even in the US. The next day &#8211; after no reply from Google &#8211; I searched for ‘UEStudio’ once again and the warning was gone. I don’t know if the missing result was displayed this time or if they just omitted the warning for that particular query. Searching for ‘torrent’ still yields the warning with 2 results being filtered. Google obviously didn’t fix the problem.</p>
<p>I can only speculate about what kind of design Google has in its data centers where a fix can only be pushed down to one query. Thousands of servers assemble the final result to display. Complex algorithms prioritize Canadian-based websites (on google.ca) while still taking into account every other website in the entire world. No doubt Google knows what they’re doing.</p>
<p>PROTIP: on the machine that assembles and/or pumps out the final result</p>
<blockquote><p><em>/*The server is probably tied to the hostname, so you’d only have to check if there is a DMCA warning*/</em></p>
<p><em> If(HOSTNAME == “google.ca”) omit_US_DMCA_WARNING();</em></p></blockquote>
<p>This solution – although inane – would be a more acceptable outcome than the one query fix that may or may not have given me the missing result.</p>
<p>Another solution would be to find out approximately where the computer behind a particular IP address is located. Geo IP address lookup is a very simple technology that only requires an indexed lookup (QUICK) from an IP address to the approximate area it belongs to. Google has made use of this in previous scenarios *recalls not being able to watch the official video of Bleeding Love on YouTube due to location restrictions*.</p>
<p>To Google’s credit, the server I connect to while accessing google.ca is located in Mountain View, California. So even though I am outside of the US, the server isn’t. Google should just build another data center in Canada.</p>
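<p>As a rough illustration of why such a lookup is quick, here is a minimal Python sketch of an indexed lookup: a binary search over address ranges sorted by their first address. Everything in it is an assumption for illustration. The ranges are reserved documentation networks, the regions attached to them are made up, and <code>lookup_region</code> is a hypothetical helper, not anything Google is known to use.</p>

```python
import bisect
import ipaddress

# Hypothetical range table: (first address, last address, region). The
# networks are reserved documentation ranges and the regions are made up.
RANGES = sorted(
    (int(net.network_address), int(net.broadcast_address), region)
    for net, region in [
        (ipaddress.ip_network("192.0.2.0/24"), "CA"),
        (ipaddress.ip_network("198.51.100.0/24"), "US"),
        (ipaddress.ip_network("203.0.113.0/24"), "GB"),
    ]
)
STARTS = [start for start, _end, _region in RANGES]


def lookup_region(ip):
    """Return the region for an IPv4 address, or None if no range covers it."""
    value = int(ipaddress.ip_address(ip))
    # Binary search for the last range that starts at or before the address.
    pos = bisect.bisect_right(STARTS, value) - 1
    if pos < 0:
        return None
    start, end, region = RANGES[pos]
    return region if start <= value <= end else None


print(lookup_region("192.0.2.55"))
print(lookup_region("8.8.8.8"))
```

<p>The search takes a number of steps proportional to the logarithm of the number of ranges, so even a table with millions of ranges needs only a couple of dozen comparisons per request.</p>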
<p>Ideally, everyone would be able to share and access all information. Hopefully, all limitations imposed on us are ones signed by people we put in office. Sadly, my search results are being filtered and I get a fucking US DMCA warning message. Google is still better than Yahoo.</p>
<p><strong>Update:</strong> Searching for &#8216;torrent&#8217; doesn&#8217;t show the error anymore. Searching for &#8216;torrent download&#8217; does.</p>
<p><strong>Update 2:</strong></p>
<p>Google has finally replied &#8211; after 9 days &#8211; to my initial query about DMCA in Canada. This is their reply:</p>
<blockquote><p>I agree that it is unfortunate that <strong>DMCA</strong> complaints affect our users worldwide.</p>
<p><em>Google.com is the index from which all other domain specific indexes are drawn, which is why items that are removed from Google.com will also be removed from any sub-index. We will keep you informed anytime we remove content due to a <strong>DMCA</strong> complaint by inserting a notice at the bottom of the search results page that links to the original complaint. While this is not an ideal solution, following the link will allow you to view the URLs you would have otherwise seen in the search results.</em></p>
<p><em>If the Chilling Effects notice is no longer showing up for the same search this usually means one of two things:</em></p>
<p><em>1. That the site that had its page blocked filed a counter notification with us that resulted in the page being re-included into our search results.</em></p>
<p><em>or</em></p>
<p><em>2. The ranking of the blocked page altered after our last crawl.</em></p>
<p><em>Hopefully this helps explain why you&#8217;re seeing the notice in some searches and not others; however, I know that this can be a bit confusing so don&#8217;t hesitate to ask if you would like further clarification.</em></p>
<p><em>Hope that helps!</em></p></blockquote>
<p><span>The main point is that all other indices around the world are derived from the main index at Google.com and are therefore at the mercy of whatever laws that index is bound by. I am suspicious of the reasons they give for the error not showing up anymore. The first seems unlikely: two counter notifications would have had to be processed right after I complained about them. The second also seems unlikely, because a site with a very low PageRank would be less prone to DMCA complaints in the first place. That said, the Google employee who replied to my query had joined the help desk the same day they answered it, and mine is the only question they answered. I imagine my query went up the support chain &#8211; which is why it took so long to answer &#8211; and they eventually did get back to me with an answer. I still like Google better than Yahoo or Live.<br />
</span></p>
]]></content:encoded>
			<wfw:commentRss>http://madhax.org/2009/05/dmca-in-canada-%e2%80%93-google-filters-search-results-according-to-us-law/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Madhax Begins</title>
		<link>http://madhax.org/2009/02/hello-world-2/</link>
		<comments>http://madhax.org/2009/02/hello-world-2/#comments</comments>
		<pubDate>Wed, 04 Feb 2009 12:27:19 +0000</pubDate>
		<dc:creator>Madhax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://madhax.org/?p=1</guid>
		<description><![CDATA[Hello world!
]]></description>
			<content:encoded><![CDATA[<p>Hello world!</p>
]]></content:encoded>
			<wfw:commentRss>http://madhax.org/2009/02/hello-world-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
