Posted: May 21st, 2010 | Author: jgeiger | Filed under: ruby, web | Tags: dreamhost, logrotate, rails, ruby | 1 Comment »
While Dreamhost may rotate the Apache logs for you, there is nothing automatic to rotate the Rails production logs. This may not be an issue since you have “unlimited” disk space, but it’s a good idea anyway.
You need to install logrotate since it doesn’t exist by default on the server and then place it in a location where it can be run. You also need to create the configuration file and status files. Once that is set up, you can install a cron job via the Dreamhost panel.
Now it should be rotating your application logs nightly. You can add more sites to the conf file as needed.
Posted: March 16th, 2010 | Author: jgeiger | Filed under: ruby, web | Tags: cloud crowd, gminer, mcw, rabbitmq, rails, redis, resque, ruby | No Comments »
I’ve been working on a project that requires the processing of a series of jobs. I had originally written my own system for doing this because I wanted to know more about how job systems work. After a time, I decided to modify it, and found that I had broken it. Instead of trying to fix it, I decided to see if there was anything out there that someone else had done that would work better for me.
Resque
My first attempt was to use Resque. It worked, but as I started to scale things up, I ran into some issues that I didn’t like.
It polled the DB a lot. Even though the data was in memory, there were a lot of “do I have a message?” checks, which seemed messy.
It wasn’t fast. There was a lot of overhead, and things “felt” slow as they were running.
It was memory based. Redis will store data to disk, but it’s meant as an in-memory system, which is what gives it its speed.
What I did like is that it worked. The jobs finished and there was a nice web interface to see what was going on.
Cloud Crowd
I had looked at Cloud Crowd before and it seemed interesting. I liked the web dashboard, but it was also one of the biggest problems with Cloud Crowd. According to the authors, it was created to handle a small number of very expensive jobs. I have no doubt based on my experience that it would excel in that environment. My problem consists of a very large number of small, fast jobs. Cloud Crowd ground to a halt pretty quickly. The dashboard was taking too much time to render, which began to multiply the render time, and eventually it needed to be turned off.
The other big issue I ran into was with how the workers processed their jobs. It wouldn’t start another job until all of the other jobs it had created finished. If you have a scheduling job that launches 10 processing jobs, the system gets stuck waiting for the 10 processing jobs to finish before it can start another scheduling job. Again, it works very well. Everything gets done and you get a result string but it was slow.
RabbitMQ
I decided to figure out what was wrong with my system since it was working at one point. I’m using RabbitMQ as a message broker to pass the jobs back and forth between daemons running on Linux machines. I believe my issue was caused by using a topic exchange with a key per worker. I was running into issues where some processors were picking up messages from the topic that were not assigned to their key. Once I realized this was happening, I decided to go back to a queue per worker. I had wanted to get away from that, since originally I had been creating multiple queues in RabbitMQ that never disappeared. This time I changed the queues to be exclusive. Exclusive means that only one client (processor) can read from that queue. It also makes the queue self-delete when that consumer’s connection disappears.
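To make the two properties of an exclusive queue concrete, here is a toy in-memory model in plain Ruby. It is not a RabbitMQ client and not the code from the gminer repositories; ToyBroker, its methods and the queue and connection names are all made up for illustration.

```ruby
# Toy in-memory broker (hypothetical, not a RabbitMQ client). It imitates
# only two things: one owner per queue, and deletion when the owner leaves.
class ToyBroker
  class AccessRefused < StandardError; end

  def initialize
    @queues = {} # queue name => { owner: connection id, messages: [] }
  end

  # Declare a queue that only `owner` may read from.
  def declare_exclusive(name, owner)
    queue = (@queues[name] ||= { owner: owner, messages: [] })
    raise AccessRefused, "#{name} belongs to #{queue[:owner]}" unless queue[:owner] == owner

    name
  end

  # Messages for a queue that no longer exists are simply dropped here.
  def publish(name, message)
    queue = @queues[name]
    queue[:messages] << message if queue
  end

  # Only the owning connection may take messages off the queue.
  def pop(name, owner)
    queue = @queues.fetch(name)
    raise AccessRefused, "#{name} belongs to #{queue[:owner]}" unless queue[:owner] == owner

    queue[:messages].shift
  end

  # When the owning connection goes away, its queues go with it.
  def disconnect(owner)
    @queues.delete_if { |_, queue| queue[:owner] == owner }
  end

  def queue?(name)
    @queues.key?(name)
  end
end

broker = ToyBroker.new
broker.declare_exclusive("worker-1", :conn_a)
broker.publish("worker-1", "job 42")
broker.pop("worker-1", :conn_a) # only :conn_a is allowed to read this queue
broker.disconnect(:conn_a)      # the queue is deleted along with its owner
```

The self-delete part is what keeps abandoned per-worker queues from piling up on the broker.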
I’m attaching the code for my system below. I’ll post more about how each of the parts works later. I hope to add a bit more control into the system, but as of now it’s pretty self healing and very fast.
http://github.com/mcwbbc/gminer_scheduler
http://github.com/mcwbbc/gminer_node
http://github.com/mcwbbc/gminer_processor
http://github.com/mcwbbc/gminer_databaser
Posted: March 2nd, 2010 | Author: jgeiger | Filed under: ruby, web | Tags: google, rails, ruby | No Comments »
I just migrated a site that had a bunch of links that have been in the search engines for a while. Oddly, it seems that the only things hitting those links are the crawlers themselves. I needed a way to invalidate those links, since I couldn’t create a proper redirect because of changing IDs.
/records/show/12345 used to be valid, but has been replaced with the RESTful version /records/00123. The ID is now also meaningful instead of a MySQL generated id.
My first attempt was to just redirect to the 404 page.
begin
  record = Record.find(params[:id])
rescue ActiveRecord::RecordNotFound
  redirect_to("/404.html") # answers 302, then the static page answers 200
end
But as I watched the logs, I noticed that this really wasn’t right, since it was still returning a 302 (redirect) and then a 200 (OK) code for those links. The crawlers were being told that the link was fine and that they should just display the 404 page for it. That might seem OK, but really I wanted them to get the 404 immediately and remove the page from their databases.
begin
  record = Record.find(params[:id])
rescue ActiveRecord::RecordNotFound
  # Serve the static error page in place, with a real 404 status
  render(:file => "#{RAILS_ROOT}/public/404.html", :layout => false, :status => 404)
end
By rendering the 404.html directly and including the 404 status code, it should help to fix the situation.
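To see the difference from the crawler’s side without Rails in the way, here is a small framework-free sketch. The RECORDS hash and both method names are hypothetical; the return values are Rack-style [status, headers, body] triples, used here only as a convenient way to show the status code.

```ruby
# Hypothetical stand-in for the records table; only the new-style ID exists.
RECORDS = { "00123" => "record body" }.freeze

# First attempt: a missing record redirects to the static error page.
# The crawler sees 302 here and then 200 when it fetches /404.html.
def respond_with_redirect(id)
  return [200, { "Content-Type" => "text/plain" }, [RECORDS[id]]] if RECORDS.key?(id)

  [302, { "Location" => "/404.html" }, []]
end

# Second attempt: the error page is served in place with a 404 status,
# so the crawler learns immediately that the URL is gone.
def respond_with_not_found(id)
  return [200, { "Content-Type" => "text/plain" }, [RECORDS[id]]] if RECORDS.key?(id)

  [404, { "Content-Type" => "text/html" }, ["<h1>Page not found</h1>"]]
end

old_link = "12345"
respond_with_redirect(old_link).first  # status of the first attempt
respond_with_not_found(old_link).first # status of the second attempt
```

The body can be the same error page in both cases; only the status code tells the crawler whether to keep the URL.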
Posted: February 24th, 2010 | Author: jgeiger | Filed under: ruby, web | Tags: mongodb, rails, ruby, vps | No Comments »
I started looking into moving off of Dreamhost because I’ve had some issues with responsiveness on my applications. For $20 a year, I could put up with it. Now that I’m paying $100, it’s a bit more annoying since there are other options out there at that price point.
I’m considering slicehost.com, linode.com and webfaction.com.
I guess the other big reason is that I want to play with MongoDB and each of these gives me that option.
Posted: February 24th, 2010 | Author: jgeiger | Filed under: ruby | Tags: bundler, rails, rails 3, ruby | No Comments »
I wish they had made a bigger deal about this, but it seems that bundler now has two different gems.
bundler08 is for bundler 0.8.4 and such, which plays really well with Rails 2.3.5
bundler is for bundler 0.9.x and beyond, which plays well with Rails 3 (and Rails 2.3.5 if you can get it to work…)
This is a really good thing because you can now install both of them at the same time, and the warning that you must uninstall all previous versions of bundler is now moot. Really helpful if you’re running on Dreamhost with mixed Rails 2/3 sites.
Posted: February 9th, 2010 | Author: jgeiger | Filed under: ruby, web | Tags: bundler, rails | No Comments »
Seems I was missing a few things.
Take a look at this gist and see if it can help you.
Edit: Updated the gist to a better one.
Posted: September 21st, 2009 | Author: jgeiger | Filed under: ruby, web | Tags: rails, ruby 1.9.1 | 2 Comments »
I’ve spent the better part of the last two weeks dealing with upgrading my operating system to Snow Leopard. Honestly, I haven’t seen much difference, but I believe that there is an improvement. Things just feel faster.
One of the big changes I was going to make was moving all my development to ruby 1.9.1, since it’s now the preferred ruby as stated by ruby-lang.org.
I was able to install it, add gems and such based on various tutorials I read on the net. My problem started as soon as I tried to deal with Rails 2.3.4 and TextMate. Time after time, I tried to run tests, and ended up with the same issue.
invalid multibyte character
Here’s a test for you. Fire up an irb shell using ruby 1.9.1.
irb
control = %Q|\x00-\x1f\x7f-\xff|
CONTROL_CHAR = /[#{control}]/n
The next thing you should see is:
ArgumentError: invalid multibyte character
from (irb):2
from /usr/local/bin/irb:12:in `<main>'
Those two lines are taken from actionmailer-2.3.4/lib/action_mailer/vendor/tmail-1.2.3/tmail/utils.rb:115-117
That’s the tmail-1.2.3 that’s vendored in the gems used for rails. A few searches on the net and you find:
http://github.com/mikel/tmail
The first line of text in the readme?
Note… as of 1.2.5, TMail is not compatible with Ruby 1.9.1.
Huh.
So, my guess is, anyone who’s using rails with ruby 1.9.1 has just been getting lucky up to this point. I was not so lucky, and that’s why I’ve moved back to ruby 1.8.7. I’m a lot happier right now.
Posted: February 11th, 2009 | Author: jgeiger | Filed under: Uncategorized | Tags: mcw, rails, ruby, vipdac | No Comments »
I’m building a web application to analyze large data sets. The simplified process is: upload a data set, split it into multiple chunks, process the chunks and then zip the results together.
While you can kill a job in progress right now, all it’s really doing is removing it from the database. The queue is still clogged full of tasks that need to complete for a job that doesn’t exist anymore. I’m referring to this as a push model, since I’ve pushed all the tasks onto the queue and the workers consume them as fast as they can. The problem lies in the fact that to remove the job messages from the queue, you need to kill the queue. (Using beanstalkd right now) This is fine if you have a single job on the queue, and you can ssh into the server, but it’s still a pain.
After some thought, I’m going to try to implement a pull model. Each worker will announce it’s available to the head node when it starts up. The head node will note its existence in a ‘workers’ table, with the status of available. When a job gets submitted, the head node looks to see if any workers are available. If so, it drops the message onto the worker queue. It doesn’t matter if the worker that was available gets the job, just that there was one available. When the worker pulls the task off the queue, it sends a message back to the head queue saying that it’s now busy. The process continues once we have a series of tasks backing up on the head node, where the head will see if we have available workers, and if so, drop a task onto the worker queue.
What we gain from this is the ability to kill the job, and all associated tasks, on the head node before they’re put into the worker queue. The tasks that are in process will still complete since we can’t go in and stop them, but that’s OK. Once they’ve all finished, we clean up the working files and remove the job, and other valid jobs can continue on without any issues.
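Here is a rough sketch of that pull model as a plain Ruby object, just to pin down the bookkeeping. Nothing in it talks to beanstalkd, and the HeadNode class, its method names and the job and worker names are all hypothetical; in the real system the calls would be messages on the head and worker queues.

```ruby
# Hypothetical in-memory model of the head node described above.
class HeadNode
  attr_reader :worker_queue

  def initialize
    @workers = {}      # worker name => :available or :busy (the 'workers' table)
    @pending = []      # [job_id, task] pairs held back on the head node
    @worker_queue = [] # tasks that have been released to the workers
  end

  # A worker announces itself at startup, or again after finishing a task.
  def worker_available(name)
    @workers[name] = :available
    dispatch
  end

  # A worker pulls the next task and reports itself as busy.
  def pull(name)
    task = @worker_queue.shift
    @workers[name] = :busy if task
    task
  end

  def submit(job_id, tasks)
    tasks.each { |task| @pending << [job_id, task] }
    dispatch
  end

  # Killing a job only has to drop the tasks that were never released.
  def kill(job_id)
    @pending.reject! { |id, _task| id == job_id }
  end

  def pending_count
    @pending.size
  end

  private

  # Release at most one waiting task per available worker.
  def dispatch
    free = @workers.count { |_name, status| status == :available } - @worker_queue.size
    free.times do
      break if @pending.empty?

      @worker_queue << @pending.shift
    end
  end
end

head = HeadNode.new
head.worker_available("worker-1")
head.submit(:job_1, %w[chunk-a chunk-b chunk-c]) # one task released, two held back
head.kill(:job_1)                                # the two held-back tasks are dropped
head.pull("worker-1")                            # the released task still runs
```

Because only one task per available worker ever leaves the head node, a kill leaves at most that many tasks still running.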
Another gain is the ability to pause jobs, or better assign priorities. We don’t want a job that’s 95% done to be trumped by a higher priority job, since the system would think it’s stuck.
Posted: January 23rd, 2009 | Author: jgeiger | Filed under: Uncategorized | Tags: rails | 3 Comments »
I had an issue with adding my own before_destroy hook to remove some files from S3 after deleting a record. Paperclip has a before_destroy hook that removes the attachments, and I was using the filename it provides to delete the remote files. It seems that the order of the before_destroy lines is important (which makes sense), so you need to know whether a plugin you’re using declares one of its own.
In the case of Paperclip, I needed to put my custom hook before the has_attached_file declaration.
This comment helped me out http://apidock.com/rails/ActiveRecord/Callbacks/before_destroy#43-Where-you-declare-before-destroy-matters
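The ordering effect is easy to reproduce with a tiny stand-in for the callback chain. This is not ActiveRecord or Paperclip code: FakeModel, the Upload class and both hook names are invented for the sketch, and the only behavior it imitates is that hooks run in the order they are declared.

```ruby
# Minimal stand-in for a callback chain: hooks run in declaration order.
class FakeModel
  def self.hooks
    @hooks ||= []
  end

  def self.before_destroy(name)
    hooks << name
  end

  # Stand-in for a plugin macro that registers its own before_destroy hook.
  def self.has_attached_file(_name)
    before_destroy(:plugin_clear_attachment) # hypothetical hook name
  end

  # Returns the hooks in the order they would run.
  def destroy
    self.class.hooks.dup
  end
end

class Upload < FakeModel
  before_destroy :remove_remote_files # declared first, so the filename is still there
  has_attached_file :data
end

Upload.new.destroy # the custom hook is listed ahead of the plugin's hook
```

Swap the two declarations in Upload and the plugin’s hook runs first, which is the situation where the filename is already gone by the time the custom hook needs it.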