read

Mongoid is an excellent Ruby ORM for mongodb, building a simple query is clear Article.where(status: "published") but when the context grows the query becomes larger and its intentions are lost. Recognizing when to switch to a structured approach is important to preserve the query logic and avoid premature optimization.

Let’s start from a simple query:

Article.where(status: "published", legacy: true, title: /#{keyword}/)

You can create criterias inside the Article class to promote reusability, ie.

# article.rb
def self.published
  where(status: 'published')
end

def self.imported_from_legacy
  where(legacy: true)
end

# refactored query
Article.imported_from_legacy.published.where(title: /#{keyword}/)

In a growing application the query conditions are likely to increase in number and complexity and probably need to be driven by some user input.

Following the approach above we’d have:

if filter[:published] = 'published'
  criteria = Article.published
elsif filter[:published] = 'unpublished'
  criteria = Article.unpublished
else
  criteria = Mongoid::Criteria.new(Article)
end
criteria.imported_from_legacy.where(title: /#{keyword}/)

So far so good. The code deciding which filter to apply is becoming a transaction script, regardless if it lives in a Ruby on Rails controller private method or in a Mongoid Document or in a separate plain Ruby object. A transaction script as defined by Martin Fowler in PEAA:

Organizes business logic by procedures where each procedure handles a single request from the presentation.

Let’s extend our over simplified “toy” problem with an example adapted from an actual project:

the keyword filter would need to match not only the title but also the article tags.
let’s say we only store the tags ids in the database so we will need to contact the tags API to translate the keyword to an id.
the keyword search should be on tags only when a specific string is provided (say “tag:’black’”) and ignore titles.

The increase in complexity means the above script will be start looking like this:

criteria = Mongoid::Criteria.new(Article)
if filter[:published_state] == 'true'
  criteria = Article.published
elsif filter[:published_state] == 'false'
  criteria = Article.unpublished
end
if filter[:keyword].present? && filter[:keyword].match(/^tag:'.+'$/)
  searcher = TermsApi::Searcher.new
  filter[:keyword].match(/tag:'(.+)'/)
  searcher.match_exactly($1)
  criteria.in( term_ids: searcher.term_ids )
elsif filtering_by_keyword?
  searcher = TermsApi::Searcher.new
  criteria.or([ { term_ids: { "$in" => searcher.match(filter[:keyword]) } }, { headline: /#{filter[:keyword]}/i } ])
end

When to stop using a transaction script

It’s hard to tell when to stop on that approach. In general here’s a guide line of symptoms you should pay attention to:

adding new filters or changing existing ones becomes hard to estimate
nested conditions or multiple inline conditions
test driving is cumbersome, you tend to have one long integration test with a setup populating all the possible combinations
the code is fragile and breaks often
the script code becomes a knowledge silos where only one guy approximately knows what’s going on and nobody wants to work on it

To me a line count growing over 10/15 is also a signs that we should explore another direction.

Another approach

What I’ve been doing when dealing with large more complex queries in the last couple of years is test drive their criterias in separate objects and having a query object orchestrating those filters. I think this as an application of the query object pattern explained by Martin Fowler in PEAA:

A Query Object is an application of the Interpreter pattern geared to represent a SQL query. Its primary roles are to allow a client to form queries of various kinds and to turn those object structures into the appropriate SQL string.

The query is broken up in to criteria objects, and the query object responsibility is in this case just instantiating each criteria with the right parameters and merging them.

When TDD this you’d have a high level integration test testing the first criteria you’re adding, then a unit test for the first criteria on the query object, then build that criteria’s unit test. At this point your high level integration test should be passing.

How to TDD is not the focus of this post so I omitted the integration test and I am adding the full spec code and class code in one go:

describe ContentFilter do

    describe ".apply" do
      let(:criteria) { Mongoid::Criteria.new(Article) }
      let(:keywords) { double }
      let(:published) { double }
      let(:legacy_links) { double }
      let(:piece_type) { double }
      let(:filters) { { keyword: keywords, published: published,
                        legacy_links: legacy_links, piece_type: piece_type } }

      before do
        allow(Criteria::KeywordFilter).to receive(:new).and_return( double(value: criteria) )
        allow(Criteria::PublishedStateFilter).to receive(:new).and_return( double(value: criteria) )
        allow(Criteria::LegacyLinkFilter).to receive(:new).and_return( double(value: criteria) )
        allow(Criteria::PieceTypeFilter).to receive(:new).and_return( double(value: criteria) )
      end

      it "should trigger keyword criteria" do
        keyword_filter = double('KeywordFilter', value: criteria)
        expect(Criteria::KeywordFilter).to receive(:new).with(keywords).and_return(keyword_filter)

        filter = ContentFilter.new(filters)
        filter.apply
      end

      it "should trigger published state criteria" do
        published_filter = double('PublishedStateFilter', value: criteria)
        expect(Criteria::PublishedStateFilter).to receive(:new).with(published).and_return(published_filter)

        filter = ContentFilter.new(filters)
        filter.apply
      end

      it "should trigger legacy link criteria" do
        legacy_links_filter = double('LegacyLinkFilter', value: criteria)
        expect(Criteria::LegacyLinkFilter).to receive(:new).with(legacy_links).and_return(legacy_links_filter)

        filter = ContentFilter.new(filters)
        filter.apply
      end

      it "should trigger piece type criteria" do
        piece_type_filter = double('PieceTypeFilter', value: criteria)
        expect(Criteria::PieceTypeFilter).to receive(:new).with(piece_type).and_return(piece_type_filter)

        filter = ContentFilter.new(filters)
        filter.apply
      end
    end

  end
end

This test ensures the correct attribute is passed from the query object (ContentFilter) to each criteria. Each criteria will return a fallback criteria when not used.

And here’s the code for it:

class ContentFilter

  def initialize(filter_parameters)
    @filter_parameters = filter_parameters || {}
  end

  # Generate a filter on keyword and published state.
  # @return [Mongoid::Criteria]
  def apply
    keyword_criteria.merge(legacy_link_criteria).merge(piece_type_criteria).merge(published_state_criteria)
  end


  private

    def keyword_criteria
      criteria = Criteria::KeywordFilter.new(@filter_parameters[:keyword])
      criteria.value
    end

    def published_state_criteria
      criteria = Criteria::PublishedStateFilter.new(@filter_parameters[:published])
      criteria.value
    end

    def legacy_link_criteria
      criteria = Criteria::LegacyLinkFilter.new(@filter_parameters[:legacy_links])
      criteria.value
    end

    def piece_type_criteria
      criteria = Criteria::PieceTypeFilter.new(@filter_parameters[:piece_type])
      criteria.value
    end
end

The logic of the query is broken in small easy to understand parts that when assembled together deliver the full query.

Now let’s switch to a unit test for the published filter:

describe Criteria::PublishedStateFilter do

  context 'filtering published pieces' do
    let(:published_state) { 'true' }

    it 'should return a Mongoid::Criteria' do
      criteria = Criteria::PublishedStateFilter.new(published_state)
      expect(criteria.value).to be_a_kind_of(Mongoid::Criteria)
    end

    it "should retrieve published pieces" do
      expect(Article).to receive(:published)
      
      filter = Criteria::PublishedStateFilter.new(published_state)
      filter.value
    end
  end

  context 'filtering unpublished pieces' do
    let(:published_state) { 'false' }

    it "should retrieve unpublished pieces" do
      expect(Article).to receive(:unpublished)
      
      filter = Criteria::PublishedStateFilter.new(published_state)
      filter.value
    end
  end

  context 'without a published state filter' do
    let(:published_state) { nil }

    it "should apply no filter" do
      expect(Article).to_not receive(:published)
      expect(Article).to_not receive(:unpublished)
      
      filter = Criteria::PublishedStateFilter.new(published_state)
      filter.value
    end
  end

end

Our unit test is ensuring that based on user input the correct mongoid API are made. This object’s value will always be a Mongoid::Criteria to allow the ContentFilter class to merge it with other criterias.

Here’s the code:

module Criteria

  class PublishedStateFilter

    def initialize(state_filter_parameter)
      @published_state = state_filter_parameter
    end

    # Generate a filter on published state.
    # @return [Mongoid::Criteria]
    def value
      if filtering_published?
        Article.published
      elsif filtering_unpublished?
        Article.unpublished
      else
        Mongoid::Criteria.new(Article)
      end
    end


    private

      def filtering_published?
        @published_state == 'true'
      end

      def filtering_unpublished?
        @published_state == 'false'
      end

  end

end

Now let’s look at the keyword filter, it has to work in two scenarios: a tag specific filter ie. "tag:'president day'" or "tag:'public holiday'" (to find any article tagged with the text between quotes) as well as: “president” to find articles titled and tagged “president”. That logic will live inside the Criteria::KeywordFilter.

Now we start building the keyword criteria test:

describe Criteria::KeywordFilter do

  context "with a keyword filter" do
    let(:keyword) { 'president' }
    let(:president) { "ed723f60-afce-11e4-ab7d-12e3f512a338"  }
    let(:president_obama) { "ed7241f4-afce-11e4-ab7d-12e3f512a338" }
    before do
      # Here we are stubbing our Api interface to say when I search for 'president'
      # return two terms: president and president obama
      allow_any_instance_of(TermsApi::Searcher).to receive(:match).with(keyword).and_return([president, president_obama])
    end

    it 'should generate a criteria' do
      criteria = Criteria::KeywordFilter.new(keyword)
      expect(criteria.value).to be_kind_of(Mongoid::Criteria)
    end

    it 'should match headline and tags' do
      expect(Article).to receive(:or).with( [{ term_ids: { "$in" => [president, president_obama] } },
                                             { headline: /#{keyword}/i },
                                            ])

      criteria = Criteria::KeywordFilter.new(keyword)
      criteria.value
    end
  end


  context "with an exact keyword filter on tag" do
    let(:tag) { "Music Ceremony" }
    let(:keyword) { %Q{tag:'#{tag}'} }
    let(:president) { "ed723f60-afce-11e4-ab7d-12e3f512a338"  }
    let(:president_obama) { "ed7241f4-afce-11e4-ab7d-12e3f512a338" }
    before do
      # Here we are stubbing our Api interface to say when I search for the exact term 'Music Ceremony'
      # return only that term.
      expect_any_instance_of(TermsApi::Searcher).to receive(:match_exactly).with(tag).and_return([president])
    end

    it 'should match only for the exact tag' do
      expect(Article).to receive(:in).with(term_ids: [president])

      criteria = Criteria::KeywordFilter.new(keyword)
      criteria.value
    end
  end

end

Our unit test is ensuring that based on keyword filters the correct mongoid API as well as 3rd party calls for terms are made. Like for PublishedFilter this object’s value will always be an instance of Mongoid::Criteria to allow the ContentFilter class to merge it.

I once again pasted the entire spec but while TDD-ing you’d add one example at the time. Based on how critical the feature is you might start from a specific integration test.

And here’s the code:

module Criteria

  class KeywordFilter

    # Generate a filter on keyword matching headline, term or exact term.
    def initialize(keyword)
      @keyword = keyword
    end

    # @return [Mongoid::Criteria]
    def value
      if filtering_by_exact_term?
        Article.in( term_ids: exact_term_to_ids )
      elsif filtering_by_keyword?
        Article.or(matching_headline_or_terms)
      else 
        Mongoid::Criteria.new(Article)
      end
    end


    private
      
      def filtering_by_exact_term?
        filtering_by_keyword? &&
        @keyword.match(/^tag:'.+'$/)
      end
      
      def exact_term_to_ids
        searcher = TermsApi::Searcher.new
        searcher.match_exactly(extract_exact_term_from_keyword)
      end
      
      def extract_exact_term_from_keyword
        @keyword.match(/tag:'(.+)'/)
        $1
      end
      
      def filtering_by_keyword?
        @keyword.present?
      end
      
      def matching_headline_or_terms
        [ { term_ids: { "$in" => term_to_ids } },
          { headline: /#{@keyword}/i } ]
      end

      def term_to_ids
        searcher = TermsApi::Searcher.new
        searcher.match(@keyword)
      end

  end
end

We’re using a higher level language to represent the query composition which helps to better understand what the code does.

Conclusion

I find this approach useful to organize complex query code and to steadily expand queries as an application grows.

Failing to see that growth and continuing to extend a transaction script will lead to your domain model complexity be tangled in code making it hard to maintain.

You can add sorting capabilities with a ContentSorter class that given the ContentFilter applies the necessary sorting logic.

Create maintainable mongodb queries in Ruby with query object and mongoid criterias

When to stop using a transaction script

Another approach

Conclusion

Next up:

Reduce memory footprint requiring portions of your component based Rails application with Bundler

More reads about work, ruby, mongodb, maintainability, other topics

Enrico Teotti

agile coach, (visual) facilitator with a background in software development and product management since 2001 in Europe, Australia and the US.