Some web security tools for Rails

November 9, 2017

We have recently improved some of our tools and capabilities around security from the perspective of a Rails web application. This article is a mix of concepts, tools and the design choices I found adequate while working on this.

Password complexity

You have probably come across one of those forms lately where, for the sake of security, you are forced to set up a password following what is called LUDS: "lower and uppercase letters, digits and symbols".

Whether you like it or not, passwords are still a key component of most online authentication systems and therefore a common source of problems. The idea behind LUDS is that the bigger the guessing space, the safer the password. These basic probabilistic concepts are well known to most computer scientists and programmers.

This metric works well for passwords generated by machines, but it doesn’t play as well with human habits. We do not make unpredictable choices when creating our passwords; quite the opposite. We make similar choices, trying to pass LUDS in similar ways and consequently reducing the guessing space (P@ssword, sequences, keyboard patterns, etc.). LUDS is also sometimes hard to follow for users, disallowing perfectly safe passwords. zxcvbn takes a different approach, trying to match the password against already known common passwords, patterns, names and surnames, and popular English words.

The library, maintained by Dropbox, makes things quite easy. We have client and server validations for passwords in place; we just needed to make sure that the versions were the same. We pin it in both the frontend and the backend:

{
  "name": "app",
  "dependencies": {
    "zxcvbn": "4.4.1"
  }
}

Typical package.json file

gem 'zxcvbn-js', '4.4.1', require: 'zxcvbn'

Typical Gemfile
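
With both versions pinned, the server-side check can be a plain model validation. Here is a minimal sketch, not taken from the gem's documentation: the minimum score and the error message are our own choices, with Zxcvbn.test as the entry point exposed by the gem.

# A sketch of a server-side strength validation.
# MINIMUM_SCORE is a policy choice of ours, not part of the gem.
class User < ApplicationRecord
  MINIMUM_SCORE = 3 # zxcvbn scores passwords from 0 (weakest) to 4

  validate :password_strength, if: -> { password.present? }

  private

  def password_strength
    errors.add(:password, 'is too weak') if Zxcvbn.test(password).score < MINIMUM_SCORE
  end
end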

Connection geographical location

You have probably already seen it in many online applications: a feature that records the location of your connections so fraud can be detected should a connection from a strange location appear. Geocoder is a fairly standard tool in the Rails world that handles most of the details for us.

As contacting the geolocation provider means, by default, a remote connection that will block the Rails process, we had to do it separately using, again, the default choice for a lot of Rails applications: Delayed::Job.

def delayed_geocode
  geocode # Geocoder's remote lookup; requires geocoded_by on the model
end

# Delayed::Job macro: calls to delayed_geocode are enqueued, not run inline
handle_asynchronously :delayed_geocode, queue: 'my_queue'

Password expiration

Security

Again, a fairly typical security measure. After a determined period, your password gets marked as expired and you have to renew it. It is nevertheless a controversial technique, as its usefulness has decreased over time and the literature clearly shows different criteria and approaches.

This technique minimizes the risk associated with losing backups that could somehow be accessed by an attacker. If the backup held the same password you still have in production and a weak hash function was in place, an attacker could break your password and gain access to the system. With a good security policy, backups tend to be secured and encrypted, but as organizations grow they can sometimes become laxer about them, even more so with older ones. With a password expiration policy of 90 days and a hash secure enough to hold off the attacker, your account would be safer.

The current version of the PCI DSS (3.2 when this article was written) indicates that users must change their password every 90 days and that it must not be the same as any of the four previous passwords. Other parts of the document suggest that the standard is not closely based on the current state of the art in password security guidelines. But even if the standard is not good enough, that does not mean organizations are free to ignore it.

ISO 27001 provides more generic guidelines on password management: 90 days, and remembering the last two passwords to prevent re-use.
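
In practice, expiration boils down to a timestamp comparison plus a small password history. A minimal sketch, assuming a password_changed_at column and a password_histories table that we would define ourselves (neither comes from a gem), with bcrypt digests for the old passwords:

class User < ApplicationRecord
  EXPIRATION_PERIOD = 90.days
  REMEMBERED_PASSWORDS = 4 # PCI DSS: different from the last four

  has_many :password_histories

  def password_expired?
    password_changed_at <= EXPIRATION_PERIOD.ago
  end

  # Compare the candidate against the stored digests of previous passwords
  def reused_password?(candidate)
    password_histories.order(created_at: :desc)
                      .limit(REMEMBERED_PASSWORDS)
                      .any? { |history| BCrypt::Password.new(history.digest) == candidate }
  end
end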

Notifications

One of the challenges is to notify users that the expiration is coming so they can change the password. Although different notification methods can add more complexity to the final solution (email, SMS, Slack, etc.), these are usually easy to abstract. Building the infrastructure needed to keep track of successfully sent notifications, errors, the retry policy, etc. is more complicated.

Delayed::Job comes with a nice API that makes it extremely easy to schedule new jobs. If we need to notify customers, say, 15, 7 and 1 day before expiration, it is trivial to think of something like this:

# Almost extracted from the documentation. The mailer action is named
# expiration_warning for illustration; calling it `send`, as the docs-style
# one-liner might tempt you to, would collide with Ruby's Object#send.
Notifier.delay(run_at: @user.expires_at - 1.day).expiration_warning(@user)
Notifier.delay(run_at: @user.expires_at - 7.days).expiration_warning(@user)
Notifier.delay(run_at: @user.expires_at - 15.days).expiration_warning(@user)

If we define "state" as the present condition of a system or entity (hence the need to remember it), it is easy to see how this schema generates a big piece of it. With a big base of customers, we will store the fact that we need to send three notifications for each one of them. The number of things that we need to remember and keep in sync grows quickly, and maintenance problems appear soon. What happens if we need to change the notifications from [15, 7, 1] to [30, 15]? This would imply removing a large number of rows and changing others.

A different approach is to set up a single job that is responsible for scheduling the actual sending operation and nothing else. This job can run, for example, every day. Once it checks all the users that should be notified, it schedules atomic jobs for all of them, but with a much smaller life span: almost fire and forget, living at most 1 day, compared with state that, following our example, had to be remembered for the expiration period minus one day in the worst case. Each of these new jobs is just in charge of sending the email and managing the retry policy (something that Delayed::Job basically does for us). A change in the configuration of the scheduler job does not imply any extra maintenance cost or modification of previously stored state.
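
A minimal sketch of that scheduler, assuming an expires_at column and reusing the Notifier from above; the class name and the way it is triggered (cron, a clock process) are our own choices:

# Run once a day, e.g. from cron: ExpirationNotificationScheduler.new.perform
class ExpirationNotificationScheduler
  NOTIFICATION_DAYS = [15, 7, 1] # changing this requires no state migration

  def perform
    NOTIFICATION_DAYS.each do |days|
      # Users whose password expires exactly `days` days from now
      User.where(expires_at: days.days.from_now.all_day).find_each do |user|
        # Short-lived, fire-and-forget job: just the sending and its retries
        Notifier.delay.expiration_warning(user)
      end
    end
  end
end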

Blocklisting

We built most of our infrastructure for this around Rack::Attack. Not much needs to be added about the tool itself: it is an incredibly widely used middleware for blocking and throttling requests. The definition of blocklisting, as we understood it, is left somewhat open for consumers.

The Rack::Attack features that need to store information save it in a configurable cache, usually Memcached or Redis, both of them in-memory databases. As our definition of a block is persistent, we used our own implementation of the store.
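
As an illustration (not the exact store we built), a blocklist rule can consult a regular database-backed model so that blocks survive restarts; BlockedClient is a hypothetical ActiveRecord model:

# config/initializers/rack_attack.rb
Rack::Attack.blocklist('persistently blocked clients') do |req|
  # A database lookup keeps the block across process restarts and deploys
  BlockedClient.exists?(ip: req.ip)
end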

A few hints about the concepts themselves:

What is the difference between throttling and blocklisting?

  • Throttling: makes sure that clients operate under a limit of requests within a window of time.
  • Blocklisting: blocks clients once bad behavior has been detected.

Both narrow the window for brute-force attacks, but blocklisting tries to drive the client to a locked state as soon as possible, so that it remains locked and cannot operate under any limit. It is used when the client is clearly misbehaving; in our case, on security exceptions around login operations.
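
For contrast, this is what a throttle looks like; the path, limit and period below are illustrative, not our production values:

# Allow at most 5 login attempts per IP per minute; beyond that,
# requests are rejected until the window resets.
Rack::Attack.throttle('logins per IP', limit: 5, period: 60) do |req|
  req.ip if req.path == '/login' && req.post?
end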

You definitely want to leave some operations safelisted; logging in with a cookie would be one of them, as it does not operate under the blocklisting pipeline.

Locking accounts has a user-experience impact, so some thought must be put into it. It is always important to keep in mind the cases we are trying to optimize for, and to work with data as much as possible, both before and after the release to production.

Why would you want to lock an account?

When an activity is detected that has no explanation other than an attempt to harm security, there is no other solution than marking this circumstance as explicitly as possible. Throttling does not help as much to achieve this; blocking makes the detection of abuse much more likely.

How does the account get unlocked?

Via an administrator or, if we are talking about login, the reset-password feature. Once the password is reset via a method outside the application (usually email), the client can be trusted again.

What is the definition of client?

You need to define how you are going to aggregate the metrics used to decide whether a request gets blocked or not. Usually, IP + account.
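
In Rack::Attack terms, that aggregation is the discriminator returned by the rule. A minimal sketch; the login path and the username parameter are assumptions about the application, not anything the library defines:

# Aggregate by IP + account, so a single source cannot lock everyone out
# from a shared IP, and a distributed attacker cannot focus on one account.
Rack::Attack.throttle('logins per IP and account', limit: 5, period: 60) do |req|
  if req.path == '/login' && req.post?
    "#{req.ip}:#{req.params['username']}" # 'username' is our assumed field
  end
end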