Web Design & Development Tips Tutorials

APT Repository for Old Ubuntu Releases

July 31, 2011 by Josh Hartman

Recently I installed Ubuntu Server 6.06 (Dapper) on an old Sun Blade 100 which has a SPARC processor. The installation on a Sun Blade 100 has its own quirks, but after getting it up and running I of course wanted to make sure it had the latest software versions and security updates that it could support.

Problem

After support for an old Ubuntu releases has been dropped the folks at Canonical remove the repositories from their main repository servers such as archive.ubuntu.com and security.ubuntu.com. So, when you run sudo apt-get update you are met with a list of errors telling you that the your configured repositories were not found.

Solution

To get back on the repository train, edit your /etc/apt/sources.list and replace all instances of archive.ubuntu.com and security.ubuntu.com with the very fitting old-releases.ubuntu.com. After that, run sudo apt-get update and watch the repository indexes roll in!

Here is an example of an updated /etc/apt/sources.list for Ubuntu 6.06 (Dapper):


## Uncomment the following two lines to fetch updated software from the network
deb http://old-releases.ubuntu.com/ubuntu dapper main restricted
deb-src http://old-releases.ubuntu.com/ubuntu dapper main restricted

## Uncomment the following two lines to fetch major bug fix updates produced
## after the final release of the distribution.
deb http://old-releases.ubuntu.com/ubuntu dapper-updates main restricted
deb-src http://old-releases.ubuntu.com/ubuntu dapper-updates main restricted

## Uncomment the following two lines to add software from the 'universe'
## repository.
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in
## universe WILL NOT receive any review or updates from the Ubuntu security
## team.
deb http://old-releases.ubuntu.com/ubuntu dapper universe
deb-src http://old-releases.ubuntu.com/ubuntu dapper universe

deb http://old-releases.ubuntu.com/ubuntu dapper-security main restricted
deb-src http://old-releases.ubuntu.com/ubuntu dapper-security main restricted

deb http://old-releases.ubuntu.com/ubuntu dapper-security universe
deb-src http://old-releases.ubuntu.com/ubuntu dapper-security universe

deb http://old-releases.ubuntu.com/ubuntu dapper multiverse
deb-src http://old-releases.ubuntu.com/ubuntu dapper multiverse

deb http://old-releases.ubuntu.com/ubuntu dapper-backports main restricted universe multiverse

That’s it for today, I hope this has helped you get the software you need onto an old Ubuntu installation.

Setup WordPress to Respond to Multiple Site URLs

July 23, 2011 by Josh Hartman

Have you ever encountered a case where you want to access your WordPress site from multiple site URLs (for example a local URL and an external URL, or multiple external URLs)?

Solution

Change the Site URL and Home URL to a relative URL, making the domain name irrelevant to loading the site. Oh, but it can’t be as simple as updating that on the Settings->General admin page, no…WordPress requires a valid URL. Well don’t give up, you’ll just have to hack the option directly in the database.

Connect to your WordPress database using a web-based utility like phpMyAdmin or another SQL client such as HeidiSQL.
Open your wp_options table for editing (if applicable, use your table prefix instead of wp_).
Find the rows with an option_name of siteurl or home and change the option_value of each to / if WordPress is installed in the root of your website or /path/to/wordpress if your WordPress site is installed in a sub-directory. Do NOT add a trailing slash.

For those wanting to execute a simple SQL query, here is an example:


UPDATE wp_options SET option_value = '/' WHERE option_name IN('siteurl', 'home');

Test your site out by going to your site URLs (for example http://192.168.0.1, http://mywebserver.local, http://www.example.com).

Important Note: This method is a hack and therefore isn’t a behavior WordPress developers intended. I wouldn’t recommend it for a production site. I have noticed that after making this change on a site that the admin login page lacks CSS, the toolbar on the Visual Editor is missing, and the front-end admin bar sometimes fails to be styled and shows at the bottom on pages. I haven’t seen any issues with the normal front-end site that visitors see. So be aware of some admin CSS problems.

Hope that tip can help some of you out, I know it was useful to me.

Selecting Random Rows Using MySQL + Benchmark Results

March 23, 2011 by Josh Hartman

A topic of contention that has been around for many years is that of using ORDER BY RAND() to select random rows. This method is said to be very slow, doesn’t scale well and should not be used. Is this the case? Does it have its uses? Well, they say the proof is in the pudding, so let’s get to the pudding.

Contenders

Order by Rand() Method

SELECT * FROM myTable ORDER BY RAND() LIMIT 1;

This method works by generating a random value for each row of the table, sorting the table according to these random values, and then returning one row.

Advantage: Easy to remember and use in complicated SQL queries and you don’t need to have a uniquely assigned or auto-incrementing ID field in your table. Easily select multiple random rows by simply increasing the LIMIT.

Disadvantage: This method’s speed is directly related to how long it takes to generate a random value for each row. The more rows a table has the longer it takes.

Order By Rand() Alternative Method

SELECT * FROM myTable WHERE RAND()<(SELECT ((1/COUNT(*))*10) FROM myTable) ORDER BY RAND() LIMIT 1;

This method uses ORDER BY RAND(), but only on a small percentage of the table rows; this percentage is based upon how many rows you want, LIMIT 1, divided by how many rows the table has, COUNT(*), and then multiply that figure by 10 to avoid returning less rows than you request. I developed this method after reading How to select random rows in MySQL @ rndblog.com.

Advantage: Easy to use in complicated SQL queries and you don't need to have a sequentially numbered field in your table. Easily select multiple random rows by simply increasing the LIMIT and adjusting the WHERE statement to match. For example, to get 30 random rows you would use:

SELECT * FROM myTable WHERE RAND()<(SELECT ((30/COUNT(*))*10) FROM myTable) ORDER BY RAND() LIMIT 30;

Disadvantage: This method's speed is directly related to how long it takes to generate a random value for each row you query. The more random rows you want, the longer it takes.

Inner Join Method

SELECT t.* FROM myTable AS t JOIN (SELECT ROUND(RAND() * (SELECT MAX(id) FROM myTable)) AS id) AS x WHERE t.id >= x.id LIMIT 1;

This method smartly uses the primary key (id column) to generate a random integer based upon the highest number in your key field and assigns a table alias so that you can use it to select one row where the key matches the previously generated random integer.

Note that you many also express this method as a sub-query like so:

SELECT t.* FROM myTable AS t, (SELECT ROUND(RAND() * (SELECT MAX(id) FROM myTable)) AS id) AS x WHERE t.id >= x.id LIMIT 1;

Depending on your SQL query one way may be easier to integrate than the other.

Advantage: Doesn't spend time generating a random value for each row. See benchmark results.

Disadvantage: Requires that your table have an auto-incrementing primary key field or other sequentially numbered field and it must have no holes (must be 1, 2, 3, 4... not 3, 8, 11, 34...) in the sequence of integers. Must run query multiple times to get more than one random row, increasing the limit will only give you sequential rows starting from the randomly selected row. May be difficult for a beginner to integrate into an already complicated SQL query.

User Variable Method

SELECT t.* FROM (SELECT ROUND(RAND() * (SELECT MAX(id) FROM myTable)) num, @num:=@num+1 FROM (SELECT @num:=0) AS a, myTable LIMIT 1) AS b, myTable AS t WHERE b.num = t.id;

This method uses user-defined variables to generate a number of random integers defined by the LIMIT and uses those integers to match rows with the table's id field. For a little more information about this method check out How to produce random rows from a table @ MySQL Diary.

Advantage: Doesn't spend time generating a random value for each row. Can be used to get multiple random rows (on tables with no numbering holes) by increasing the LIMIT within the inner SELECT. See benchmark results.

Disadvantage: User-defined variables are connection specific, so you can't use this method with connection pooling or persistent connections. Requires that your table have an auto-incrementing primary key field or other sequentially numbered field and it must have no holes (must be 1, 2, 3, 4... not 3, 8, 11, 34...) in the sequence of integers. May be time consuming to integrate into an existing SQL query.

Setup

With the help of GenerateData I was able to generate a fictional customer data table with 1,250,000 rows (MyISAM engine, latin1_swedish_ci collation). I then used HeidiSQL to run each of the random row methods 10 times, recorded the query execution time of each run and extracted an average run time for each method. You can see a sample of the generated customer data below:

Click image to enlarge

Also, using the main customer table, I created intermediate tables with 10, 100, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 10,000, 25,000, 50,000, 100,000, 250,000, 500,000, and 1,000,000 rows and tested these the same way.

Results: Part 1

Below you can see the overall performance chart of all three methods and a partial chart showing just the performance on tables with 10-50,000 rows. View Detailed Results on a Google Docs Spreadsheet. A value of zero means the query time was too fast to be measured by HeidiSQL (less than one-thousandth of a second).

Full Chart

Click image to enlarge

Partial Chart

Click image to enlarge

The Inner Join and User Variable methods certainly show themselves to be optimized and speedy; getting a random row from 1.25 million records in a little over one thousandth of a second. Either of these methods would be great to use on a table with a sequentially numbered field with no numbering holes. If you do have numbering holes in your primary key and would like to avoid using ORDER BY RAND() then Jan Kneschke has a method for you involving building a mapping table to fill the numbering holes and triggers to keep it updated.

We can see that the Order By Rand() method is not efficient and doesn't scale well, no question about that, but it is the easiest method to remember and use, and you can get multiple random rows without worrying about numbering holes in your primary key. Calling the Order By Rand() method slow would be a relative statement because in smaller tables it is still relatively fast; under 1/100th of a second when querying up to 2000 rows. I personally wouldn't hesitate to use it in tables having as many as 5,000 rows.

We certainly can't overlook the Order By Rand() Alternative method; it is a great option that gives you an easy solution for multiple random rows and it performs up to 16x better than the regular Order By Rand() method.

What about multiple random rows?

Yes, this is a very important question. Let's have a fictional customer appreciation day and pick 100 random customers out of the 1.25 million customers to send gift certificates to. How would I go about accomplishing that? To give all of the methods ideal conditions we'll assume that the customer table were working with has a primary key with no numbering holes, so for the Order By Rand() and Order By Rand() Alternative methods you will just increase the LIMIT to 100. Note that in the alternative method you'll also need to adjust the WHERE statement to match the LIMIT like so: RAND()<(SELECT ((100/COUNT(*))*10) FROM myTable)

For the Inner Join method you'll have to run it 100 times to get 100 random rows, and for the User Variable you can just increase the LIMIT within the inner SELECT because there are no numbering holes. If there were numbering holes you would have to run the User Variable method 100 times just like the Inner Join method.

To benchmark the selection of multiple rows I setup a PHP script to run the queries and print out the execution time of those commands. View the Multiple Random Rows MySQL PHP Benchmark Script.

100 Random Rows MySQL Benchmark
Order By Rand(): 9.761297 seconds
Order By Rand() Alternative: 0.613716 seconds
Inner Join: 0.015041 seconds
User Variable: 0.001578 seconds

We can see here that the methods using ORDER BY RAND() simply can't keep up with awesome performance of the Inner Join and User Variable methods, in fact they don't even show up on the chart because the value is so small. It is important to note that if the customer table we were using had numbering holes, which I think about 90+% of tables do, then the Inner Join and User Variable methods could not be used.

Winner

From my perspective the Order By Rand() Alternative method is the overall winner here because it is easy to use and gives you quick results on small tables and very large tables whether you want a single random row or multiple random rows, and for selecting multiple random rows from large tables with numbering holes there is no better method.

SELECT * FROM myTable WHERE RAND()<(SELECT ((1/COUNT(*))*10) FROM myTable) ORDER BY RAND() LIMIT 1;

Conclusion

Always choose your method depending on your table structure, number of table rows you are querying, how many random rows you need, and always weigh the ease of use against performance.

Thanks for reading!

Have another method you'd like to share? Leave a comment and let us know.

Get File Extension with PHP + Benchmark Results

February 27, 2011 by Josh Hartman

I know you’ve probably seen this topic a hundred times, and so have I, but this one has a different twist. We already know that there are many ways to get the extension of a file, but which one is the fastest? That’s what I’m going to address right now.

Contenders

In each of the code examples $file is set to c:\\xampplite\\htdocs\\index.php.

String-to-Array Method


<?php
$ext = end(explode('.', $file));
echo $ext; // outputs 'php'
?>

Sub-String Method


<?php
$ext = substr($file, strrpos($file, '.')+1);
echo $ext; // outputs 'php'
?>

Path Info Method


<?php
$ext = pathinfo($file, PATHINFO_EXTENSION);
echo $ext; // outputs 'php'
?>

Setup

To test each of the contenders I put together a script that timed the execution of 1,000,000 iterations of each command. If you would like the script you can download it here.

Results

Get File Extension PHP Benchmark
Sub-String Method: 0.778156 seconds
String-to-Array Method: 1.889744 seconds
Path Info Method: 2.020036 seconds

Winner

Our winner? The Sub-String Method! Next time you reach for that line of code to get a file’s extension, go for gold, and choose the Sub-String Method.


<?php
$ext = substr($file, strrpos($file, '.')+1);
echo $ext; // outputs 'php'
?>

« Previous Page
1
…
5
6
7
8
9
…
12
Next Page »