

ActiveRecord Callbacks aka How to Keep Data you Don’t Control Fresh

Have you ever been frustrated by having to query data that you don’t control? Especially if that data isn’t available in the format you want, you have probably “locally cached” it. But what happens when the data changes at the source? There are a few approaches:

  • Rebuild the cache at a set time - This approach lets your code work without caring much about how the data is cached. You do your thing, a cron/scheduled job does its thing, and everyone is happy. Well, mostly. The problem with this approach is choosing how often to rebuild the cache. The shorter the interval, the more accurate the data, but the more load the rebuilds put on the application. The longer the interval, the lighter the load, but the staler your data becomes. In either scenario, you will probably also need a mechanism to rebuild the cache manually.
  • Rebuild the cache on-the-fly - This approach keeps your data as up-to-date as possible while preserving the local cache and not hurting performance too much. A typical scenario is to insert records into your local cache the first time you pull them from the native source. This removes the need to pre-cache objects, since caching happens at request time, but it comes with a performance penalty: the first request is the slowest, and subsequent requests are quick. You still have to decide when to refresh the cache and how to allow refreshing it manually. This approach also complicates your code; alongside your logic, you now have relatively meaningless cache logic sitting side-by-side with your meaningful logic.
  • Don’t cache - Just take the performance hit, optimize as much as possible, and hope that no one minds the operation taking some extra time to complete. The problem with this approach is efficiency. Computers are fast, and people expect them to be. People may stop using your code altogether if the performance impact is severe enough to outweigh its usefulness.
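To make the on-the-fly approach concrete outside of Rails, here is a minimal plain-Ruby sketch of a read-through cache with an expiry. ReadThroughCache, its ttl:/clock: parameters, and the lookup lambda are hypothetical names for illustration, not part of ActiveRecord; the block passed to fetch stands in for the query against the source you don’t control.

```ruby
# Minimal, framework-free sketch of "rebuild the cache on-the-fly":
# a value is fetched from the (slow) source the first time it is requested,
# then served locally until it is older than `ttl` seconds.
class ReadThroughCache
  Entry = Struct.new(:value, :fetched_at)

  # ttl   - number of seconds an entry stays fresh
  # clock - callable returning the current Time (injectable for testing)
  def initialize(ttl:, clock: -> { Time.now })
    @ttl = ttl
    @clock = clock
    @entries = {}
  end

  # Returns the cached value for key. The block (the "source") is only
  # called when the key is missing or its entry has expired.
  def fetch(key)
    now = @clock.call
    entry = @entries[key]
    if entry.nil? || now - entry.fetched_at > @ttl
      entry = Entry.new(yield(key), now)
      @entries[key] = entry
    end
    entry.value
  end
end

# Usage: count how often the (hypothetical) source is hit.
source_calls = 0
now = Time.at(0)
cache = ReadThroughCache.new(ttl: 600, clock: -> { now })
lookup = lambda do |username|
  source_calls += 1
  "#{username} (from source)"
end

cache.fetch("kristin", &lookup) # first request: hits the source
cache.fetch("kristin", &lookup) # served from the cache
now += 601                      # pretend 10 minutes have passed
cache.fetch("kristin", &lookup) # expired: hits the source again
puts source_calls               # prints 2
```

Injecting the clock keeps the expiry testable without actually waiting 10 minutes; the cache logic still lives next to the caller, which is exactly the clutter the callback approach below tries to hide.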

So what is a programmer to do? Of the approaches above, I have opted to cache on-the-fly, with a twist. That twist takes advantage of ActiveRecord’s callbacks. What is a callback? Think of callbacks as “in-between” steps you can hook into while ActiveRecord does its thing. Callbacks are an API that lets you do this without any ugly hacks or modifications to the framework itself. Callbacks are also known as hooks. From the official Ruby on Rails website:

“Callbacks are methods that get called at certain moments of an object’s lifecycle. With callbacks it’s possible to write code that will run whenever an Active Record object is created, saved, updated, deleted, validated, or loaded from the database.”

Simply put, you can define methods with certain names in a model derived from ActiveRecord::Base and put your cache logic there. For example, if we had a User model, we could query a user like this:

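# app/models/user.rb
class User < ActiveRecord::Base
  # first_name, last_name and username are columns in the users table, so
  # ActiveRecord generates their accessors; an attr_accessor here would
  # shadow the database-backed attributes.
end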
 
# pull the first user
User.find_by_username('kristin')
=> #<User id:45242134, first_name: 'Kristin', last_name: 'Graham', ...>

This code returns the first user whose username matches, with their attributes loaded. Now, if this information was pulled from our local cache, it may differ from the original source. For instance, perhaps since the cache was built, this person got married and changed their name. Your cache is now out of sync with the original source, and this needs to be resolved. So let’s implement some cache refreshing via ActiveRecord’s after_find callback:

# app/models/user.rb
class User < ActiveRecord::Base
  # first_name, last_name and username are database columns (no attr_accessor)

  # ActiveRecord callback: runs after every record loaded by a find
  def after_find
    puts "refreshing cache"
    user = OtherDatabase::User.find_by_username(self.username)
    return unless user # nothing to refresh if the source has no such user
    self.first_name = user.first_name
    self.last_name = user.last_name # Last name is now "Simpson"
    self.save
  end
end
 
# pull the first user
User.find_by_username('kristin')
refreshing cache
=> #<User id:45242134, first_name: 'Kristin', last_name: 'Simpson', ...>

A few things to note. The name “after_find” means this method is executed immediately after an ActiveRecord find operation completes. This includes first, last, find_by_xxx, all, etc. The method then updates the User instance (the local cache) with the data from the other database. ActiveRecord is smart enough not to issue a save unless the data has actually changed, so don’t worry about efficiency here. You can drop the “self” prefix when reading attributes (though not when assigning them, or Ruby will create a local variable instead), but it helps me keep track of what is what. Also note that I am using “puts” just to show when this is executed: you can see that this code runs after I call find_by_username. If there are any changes, they are reflected in the result, transparently to the rest of your application’s logic. This keeps the cache logic out of your “real” logic.

This will execute every time we issue a find on the User class, so it isn’t very efficient yet. Basically, the cache is always immediately expired. For performance reasons, let’s only check the other database every 10 minutes for each user:

# app/models/user.rb
class User < ActiveRecord::Base
  # first_name, last_name and username are database columns (no attr_accessor)

  # ActiveRecord callback
  def after_find
    if self.updated_at.blank? || self.updated_at < 10.minutes.ago
      puts "refreshing cache"
      user = OtherDatabase::User.find_by_username(self.username)
      return unless user # nothing to refresh if the source has no such user
      self.first_name = user.first_name
      self.last_name = user.last_name # Last name is now "Simpson"
      self.updated_at = Time.now # Force the updated time to reflect now
      self.save
    end
  end
end
 
# pull the first user
User.find_by_username('kristin')
refreshing cache
=> #<User id:45242134, first_name: 'Kristin', last_name: 'Simpson', ...>
User.find_by_username('kristin')
=> #<User id:45242134, first_name: 'Kristin', last_name: 'Simpson', ...>
# 10 minutes elapse... (use your imagination)
User.find_by_username('kristin')
refreshing cache
=> #<User id:45242134, first_name: 'Kristin', last_name: 'Simpson', ...>

Now we can see the cache working. Every 10 minutes, the local cache is checked against the original source; for all other requests, the method skips the conditional and exits. You can obviously change the 10-minute expiration to anything you like. Better still, put this value in a YAML config file and reference it, so that the setting can be customized.
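A minimal sketch of that idea in plain Ruby, assuming a hypothetical config/user_cache.yml with an expire_after_minutes key (both names are made up for illustration). The YAML is inlined as a string so the snippet runs on its own; in a Rails app you would more likely read the file with YAML.load_file. The stale? helper mirrors the condition in the after_find callback above.

```ruby
require 'yaml'

# Hypothetical contents of config/user_cache.yml (file name and key are assumptions).
CONFIG_YAML = <<~YAML
  user_cache:
    expire_after_minutes: 10
YAML

config = YAML.safe_load(CONFIG_YAML)
EXPIRE_AFTER = config.fetch('user_cache').fetch('expire_after_minutes') * 60 # seconds

# Same test as the after_find callback: a record is stale when it has never
# been stamped, or was last updated before the cutoff.
def stale?(updated_at, now: Time.now, expire_after: EXPIRE_AFTER)
  updated_at.nil? || updated_at < now - expire_after
end

now = Time.now
puts stale?(nil, now: now)           # true
puts stale?(now - 5 * 60, now: now)  # false
puts stale?(now - 11 * 60, now: now) # true
```

Using fetch rather than [] makes a missing key fail loudly at load time instead of silently turning the expiry into nil.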

There are many other callbacks you can use, and they can work together to form a very powerful tool. Check out the following code:

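# app/models/user.rb
class User < ActiveRecord::Base
  # first_name, last_name and username are database columns (no attr_accessor)

  # ActiveRecord callback: expire the cached names every 10 minutes
  def after_find
    if self.updated_at.blank? || self.updated_at < 10.minutes.ago
      puts "refreshing cache"
      self.last_name = nil
      self.first_name = nil
      self.updated_at = Time.now # restart the 10-minute window even if the names come back unchanged
      self.save # triggers before_save, which refills the names
    end
  end

  # ActiveRecord callback: fill in missing names from the source before writing
  def before_save
    if self.last_name.blank? || self.first_name.blank?
      puts "last_name or first_name is blank"
      user = OtherDatabase::User.find_by_username(self.username)
      if user
        self.first_name = user.first_name
        self.last_name = user.last_name # Last name is now "Simpson"
      end
    end
  end
end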
 
# pull the first user
User.find_or_create_by_username('kristin')
last_name or first_name is blank
=> #<User id:45242134, first_name: 'Kristin', last_name: 'Simpson', ...>
User.find_by_username('kristin')
=> #<User id:45242134, first_name: 'Kristin', last_name: 'Simpson', ...>
# 10 minutes elapse... (use your imagination)
User.find_by_username('kristin')
refreshing cache
last_name or first_name is blank
=> #<User id:45242134, first_name: 'Kristin', last_name: 'Simpson', ...>

This lets me use “find_or_create_by” to create records with incomplete information. The missing information is filled in at creation time thanks to the before_save method. One warning: do NOT call “save” from within save callbacks such as before_save, or you will create an infinite loop. Think about it: before_save calls save, which calls before_save, which calls save, and so on. Be careful.

Creating a record this way carries a performance penalty, and it would be much better to get all of this information in one query. For example:

# pull the user from the source once, then create the local copy fully populated
source_user = OtherDatabase::User.find_by_username('kristin')
user = User.find_or_create_by_username(:username   => source_user.username,
                                       :last_name  => source_user.last_name,
                                       :first_name => source_user.first_name)
user
=> #<User id:45242134, first_name: 'Kristin', last_name: 'Simpson', ...>
# 10 minutes elapse... (use your imagination)
User.find_by_username('kristin')
refreshing cache
=> #<User id:45242134, first_name: 'Kristin', last_name: 'Simpson', ...>

The before_save callback would have taken care of any missing information (as we saw above); however, that comes at the cost of a second query, and can quickly mean you have needlessly doubled your queries.

Posted in Open-source, Ruby, Software, Thoughts, Web.



ActiveRecord’s Secret find_by_sql Results

Well, it’s not exactly a secret, but it sure isn’t well documented. Recently, I wanted to return a query that spanned multiple database tables. I decided to go with find_by_sql because of the mind-blowing idiocy with which this legacy database was structured. I will use a watered-down version of what I was attempting to demonstrate how we can expose some “hidden” functionality of ActiveRecord’s find_by_sql method.

 
Channel table:
-------------------------------------------
id | title | description       | user_id
-------------------------------------------
1  | first | the first channel | 1

User table:
-----------
id | name
-----------
1  | ben

After I constructed my find_by_sql query, it looked something like this:

Channel.find_by_sql("SELECT a.*, b.name
FROM channel a, user b
WHERE a.user_id = b.id")

This query selects all columns from table a (channel) and a single column from table b (user). This is pretty standard, as many queries need to gather columns from multiple tables in a single SELECT.

Running this query, you will receive an array of Channel instances with all of the channel attributes filled in. Missing, however, will be the attributes from any table other than “channel”:

Channel.find_by_sql("SELECT a.*, b.name
FROM channel a, user b
WHERE a.user_id = b.id")
=> [#<Channel id: 1, title: "first", description: "the first channel", user_id: 1>]

Notice how the “name” column from table b (user) is not present in the output? Querying this attribute directly doesn’t work either:

c = Channel.find_by_sql("SELECT a.*, b.name
FROM channel a, user b
WHERE a.user_id = b.id")
=> [#<Channel id: 1, title: "first", description: "the first channel", user_id: 1>]
c[0].name
=> NoMethodError: undefined method 'name' for #<Channel:0xb4ac0274>

We could create an attr_accessor for the Channel class, and this would resolve the NoMethodError, but it still won’t be populated for our Channel instance after a find_by_sql.

After some digging around in the source code and online, I came across this posting, which made the brilliant suggestion of looking at the record’s attributes. This method returns a hash of the attributes that ActiveRecord knows about. Take a look at the keys for our first channel:

c[0].attributes.keys
=> ["id", "title", "description", "user_id", "name"]

There it is! Our “missing” name attribute from the SELECT query has been located. Accessing the value for this attribute is trivial:

c[0].attributes["name"]
=> "ben"

We can do this with as many “extra” columns as we want. If two column names conflict (say channels had a column “name”, and users also had a column “name”), the results may come back as “name” and “name_1” depending on the database adapter, so it is safer to alias one of them in the SELECT (for example, b.name AS user_name). This is a really powerful feature of ActiveRecord that will encourage people to stick with the ORM, since they can still write SQL in a pinch.

Bonus: Customizing .to_json to include find_by_sql attributes

In the preceding example, the attribute “name” would not be included in the output of a “.to_json” call, as in the following example:

c[0].attributes.keys
=> ["id", "title", "description", "user_id", "name"]
c[0].to_json
=> "{"channel":{"id":1,"title":"first","description":"the first channel"}}"

This is where we can customize what is included in our JSON output. This article showed me that you can use the :methods argument with to_json to explicitly include any custom attributes, such as attr_accessor attributes defined in your class. When passing in the :methods argument, I must specify which attributes to include:

c[0].to_json
=> "{"channel":{"id":1,"title":"first","description":"the first channel"}}"
c[0].to_json :methods => :name
=> "{"channel":{"id":1,"title":"first","description":"the first channel","name":"ben"}}"

Good job Rails team! No ugly hacks, or overrides needed today.
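To show the idea behind :methods without Rails, here is a framework-free sketch using Ruby’s standard json library: serialize the table’s own columns, then merge in any extra names you explicitly ask for. It mimics the behaviour described above rather than reproducing ActiveRecord’s serializer; ChannelRow, COLUMNS and channel_json are hypothetical names.

```ruby
require 'json'

# Hypothetical stand-in for the Channel record returned by find_by_sql:
# four real columns plus the extra "name" value from the joined table.
ChannelRow = Struct.new(:id, :title, :description, :user_id, :name)

# Columns that belong to the channel table itself.
COLUMNS = %i[id title description user_id].freeze

# Serialize the table's columns, then add each explicitly requested extra.
def channel_json(row, methods: [])
  payload = COLUMNS.to_h { |col| [col.to_s, row[col]] }
  Array(methods).each { |m| payload[m.to_s] = row.public_send(m) }
  JSON.generate('channel' => payload)
end

row = ChannelRow.new(1, 'first', 'the first channel', 1, 'ben')
puts channel_json(row)
puts channel_json(row, methods: :name)
```

Array(methods) lets callers pass either a single symbol or a list, similar to how :methods is commonly written.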

Posted in Computers.



SWAN Manager 3.0

It’s back to programming at my job lately. I have taken it upon myself to reinvent SWAN Manager (yet again). This is its third iteration, and it has come a LONG way since 1.0. I love the satisfaction of figuring something out and then implementing the logic in code. It is creating something from nothing. Today I finished working on the audience builder for our Portal. The layout on the Portal side is terrible, spanning an LDAP branch and five different database tables. The logic cannot be inferred from the tables, and of course no source code or documentation is ever provided for our Portal platform. The beauty is, now that it is done (100% mapped to our functions), I can determine in Ruby whether any user can access any part of a channel or announcement.

In a (sparse) 150 lines of code, I have done what I imagine has taken thousands of lines of fragmented thoughts from many programmers, all written in Java. There is a simplicity and a poetry that I enjoy when writing in Ruby. Maybe someday, folks will see the light.

For this version of our management website, I had the idea, while driving home from work, of making the Portal interface totally web service driven, using sparse templates, and running the whole damn thing in a few “administrative” channels inside the Portal. It seems like a perfect fit. To start with, how can I expect other departments and other applications to integrate with the Portal if I, the administrator, don’t even do it? Besides, it is a management application for the Portal, so what better place? I can also verify that the web services are operating as expected by having a living proof-of-concept.

Also new in this version, I have reimplemented the way Targeted Announcements are sent. Before, we had a Java class (that I threw a fit about until a certain company gave me the source code), which we modified to accept its parameters as command-line switches. This class and its 10MB of dependencies were “jarred up” (fuck you, Java) and placed in our management website. When someone filled out the nice announcement form, I would take all the parameters, build the switches on the fly, scp the jar files over to the server, and run a Java command from the bash shell. Needless to say, this sucked. Ruby and Java should never mingle. If I wanted to change the way the class was implemented, it was back to Java and mucking around with an API I didn’t understand (once again because the company is TERRIBLE at documentation).

I finally got smart and decided to use Wireshark (thanks James!) to “listen” to the mysterious SOAP traffic flowing from my machine to the server and back. After a few minutes of isolating the traffic, the mystery was revealed as little more than a few dozen lines of XML. A lightbulb went off in my head, and I decided to use Ruby’s REXML library to construct this XML procedurally from the form (from earlier) that the user fills out and submits. The end result is a cleaner interface: no Java, no scp, no Bash environment, and best of all, 9.99MB less space. Hold your applause.
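Here is a minimal sketch of building a SOAP request with REXML (which ships with Ruby) from a hash of form fields. The real element names came from the captured traffic, which isn’t shown here, so everything inside the SOAP Body (sendAnnouncement and the field elements) is a hypothetical placeholder; only the SOAP 1.1 envelope namespace is standard.

```ruby
require 'rexml/document'

# Build a SOAP 1.1 envelope from submitted form parameters.
# "sendAnnouncement" and the child element names are hypothetical placeholders.
def build_announcement_envelope(params)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')

  envelope = doc.add_element('soapenv:Envelope',
                             'xmlns:soapenv' => 'http://schemas.xmlsoap.org/soap/envelope/')
  body = envelope.add_element('soapenv:Body')
  announcement = body.add_element('sendAnnouncement')

  # One child element per form field; REXML escapes &, < and > in text for us.
  params.each do |name, value|
    announcement.add_element(name.to_s).add_text(value.to_s)
  end

  xml = +''
  doc.write(xml)
  xml
end

puts build_announcement_envelope(title: 'Outage', message: 'Portal down at 5 & back at 6')
```

Building the document through REXML rather than string interpolation means user input from the form can’t produce malformed XML.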

I also decided to really take a good, hard look at all of my models and associations and made the startling discovery that the first significant portion of your project should be ironing out these associations. If you skimp here, your entire application will suffer. Badly. That is because you are laying the foundation here, and if you do it wrong, or half-assed you have really missed the power of Rails.

After a few wonderful hours at home, self-medicated on NyQuil, I managed to get a “user” to be created with just a username, for instance “User.create(:username => ‘bsimpson’)”. This, in turn, spawned a frenzy of associated activities, including building the roles the user belongs to, checking the community groups the user is a member of, and building a list of announcements the user has authored. It is nice to build an application thinking about the associations between models first and foremost. The semantics pay off quickly, with calls such as “user.groups”, “user.announcement_authorships”, “user.channels”, etc.

Hopefully in a few more weeks, we will have the version 3.0 in production, and in use by a few of our channels.

Posted in Open-source, Ruby, Software, Thoughts.



VirtualBox NAT Tunneling

At work, we have a database that you cannot connect to without 802.1x authentication or VPN. That is a problem for my virtual machines, where I can’t use Bridged networking because of those requirements. Instead, I set up my host machine to use 802.1x and pass the connection through to the virtual machine using NAT networking.

You can issue commands to set up tunneling from host to guest, as outlined on various websites. I suppose I am old fashioned, and don’t like to blindly fire off commands; I want to see what is being changed. I dug around and discovered the following file (Windows host):

C:\Documents and Settings\<username>\.VirtualBox\Machines\<VM>\<VM>.xml

Substitute in your username, and Virtual Machine name. If the VM name has a space, or special character, you can quote the name in double-quotes.

This is where settings made with “VBoxManage” are saved. In the following example, <VM> stands for the name of the virtual machine, and <Name> is an arbitrary label for your mapping. You can do this as many times as needed, provided that each name is unique. For example, if you wanted to reach SSH in the guest from your host, you might choose “ssh” as the name of the configuration. Additionally, the guest and host ports don’t have to match. You can see the result of the following VBoxManage commands:

cd "c:\Program Files\Sun\xVM VirtualBox" (Windows only)
VBoxManage setextradata <VM> "VBoxInternal/Devices/e1000/0/LUN#0/Config/<Name>/Protocol" TCP
VBoxManage setextradata <VM> "VBoxInternal/Devices/e1000/0/LUN#0/Config/<Name>/GuestPort" 22
VBoxManage setextradata <VM> "VBoxInternal/Devices/e1000/0/LUN#0/Config/<Name>/HostPort" 2222

Here is the resulting section of the XML file:

<VirtualBox>
  <Machine>
    <ExtraData>
      <ExtraDataItem name="VBoxInternal/Devices/e1000/0/LUN#0/Config/Name/Protocol" value="TCP"/>
      <ExtraDataItem name="VBoxInternal/Devices/e1000/0/LUN#0/Config/Name/GuestPort" value="22"/>
      <ExtraDataItem name="VBoxInternal/Devices/e1000/0/LUN#0/Config/Name/HostPort" value="2222"/>
    </ExtraData>
  </Machine>
</VirtualBox>

Now you can modify these settings and watch the results appear in the file. Note that editing the file directly doesn’t affect the machine until the guest VM is restarted.

If you make a mistake creating a setting, you can delete it by running “VBoxManage setextradata <VM> <key>” without supplying a value. The blank value removes the XML line from the configuration file.
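If you set up several mappings, a small Ruby helper can generate the three setextradata calls (and the matching removals) for you. This is a sketch: nat_forward_commands and nat_remove_commands are hypothetical helpers, and the key prefix assumes the e1000 adapter at device 0, exactly as in the commands above. Each command is returned as an argument array, so passing it to system(*argv) lets Ruby handle quoting and VM names with spaces work without extra escaping.

```ruby
# Key prefix used in the commands above (assumes the e1000 adapter, device 0).
KEY_PREFIX = 'VBoxInternal/Devices/e1000/0/LUN#0/Config'

# Hypothetical helper: the three setextradata calls for one named NAT mapping.
def nat_forward_commands(vm, name, guest_port:, host_port:, protocol: 'TCP')
  { 'Protocol' => protocol, 'GuestPort' => guest_port, 'HostPort' => host_port }.map do |setting, value|
    ['VBoxManage', 'setextradata', vm, "#{KEY_PREFIX}/#{name}/#{setting}", value.to_s]
  end
end

# Omitting the value deletes the key, which removes the mapping again.
def nat_remove_commands(vm, name)
  %w[Protocol GuestPort HostPort].map do |setting|
    ['VBoxManage', 'setextradata', vm, "#{KEY_PREFIX}/#{name}/#{setting}"]
  end
end

cmds = nat_forward_commands('My VM', 'ssh', guest_port: 22, host_port: 2222)
cmds.each { |argv| puts argv.inspect }
# To actually apply them (VBoxManage must be on the PATH):
#   cmds.each { |argv| system(*argv) or abort("failed: #{argv.join(' ')}") }
```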

Posted in Computers, Linux, Open-source, Software, Windows.



RDoc for Rails Projects

For those who don’t know, RDoc is the infamous substitute Ruby developers throw out there in place of a “getting started” guide. RDoc reads code from your Ruby files and builds meaningful output in the form of HTML, CHM (Windows help files), RI, and XML. The most common appearance of RDoc for me has been the HTML output. An example of this format can be found on RDoc’s page.

I was recently trying to generate documentation for a Rails application that I wrote, but I couldn’t find the documentation for the built-in “rake doc:app” task. My complaint with the default options is that they document the entire Rails framework, including any gems you have included, etc. This is typically WAY TOO MUCH INFORMATION. Considering that Rails is a framework with its own documentation, combining its documentation with yours doesn’t seem like a good idea. If I want to see how ActiveRecord works, I would go to http://api.rubyonrails.org/classes/ActiveRecord/Base.html, not to my generated RDoc collection.

I decided not to use the rake task, and went back to RDoc itself (since it has a man page) to see what my available options are. “doc:app” just calls RDoc anyway. I found that RDoc will let you build documentation for a directory, instead of the whole shebang. From the man page:

“rdoc [options] [names...]

If a name is a directory, it is traversed. If no names are specified, all Ruby files in the current directory (and subdirectories) are processed.”

This means that we can issue a command to look at just the “app/” folder of our project (where the majority of our code resides):

# (From inside the Rails root directory)
rdoc -o doc/ app/*

This will create our output in doc/. Inside this directory, you will find an index.html file that lets you browse your classes.

If you want to exclude files or directories, use the “-x” argument, which takes a pattern. The “-i” argument adds directories that RDoc searches when resolving :include: directives in your comments.
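If you run this often, a tiny Ruby helper can assemble the argument list so the directories and exclusions live in one place. rdoc_args is a hypothetical helper; it only uses the -o and -x options discussed above, and the commented lines show two ways you might then run it.

```ruby
# Hypothetical helper: build the rdoc argument list used above.
# Documents only the given directories (default: app/), writes to doc/,
# and skips anything matching the -x exclude patterns.
def rdoc_args(dirs: ['app'], output: 'doc', exclude: [])
  args = ['-o', output]
  exclude.each { |pattern| args.push('-x', pattern) }
  args + dirs
end

args = rdoc_args(exclude: ['app/views'])
p args # prints ["-o", "doc", "-x", "app/views", "app"]

# From the Rails root, either shell out to the rdoc executable...
#   system('rdoc', *args)
# ...or drive RDoc in-process:
#   require 'rdoc'
#   RDoc::RDoc.new.document(args)
```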

Here are a few other interesting things to try with RDoc:

  • --all - This will generate documentation for public AND private methods. By default, RDoc skips private methods. This can be used to generate internal and external documentation, exposing more or less information.
  • --diagram - This requires some additional libraries, but it promises to generate graphics that show your classes and modules. You will need to install Graphviz first.
  • --line-numbers - Helpful for internal documentation, and for tasks like debugging.
  • --style - You can specify your own CSS file for custom styles.

As for formatting your comments, take a look at the RDoc official docs (created with RDoc!). There is a section on Markup syntax. It is very similar to the Wiki markup syntax.

Posted in Computers, Open-source, Ruby, Software.
