Nerd wannabe

lazy weblog

SQL Dependency Injection

with 2 comments

“All problems in computer science can be solved by another level of indirection, but that usually will create another problem.” — David Wheeler, inventor of the subroutine

This is the story of a “framework” that doesn’t exist: SqlDi

Most database frameworks’ authors consider that it is wrong to write SQL in your source code and provide many kinds of wrappers you need to deal with, instead of pure query strings.

Their reasons are along the lines of:

Portability: SQL is not database independent

Sadly, this is true. Although there are ANSI standards – ANSI SQL 99, 2003 etc. – no RDBMS supports all the features from ANSI and most vendors force you to use some of their custom extensions to accomplish things from the standard.

(For example, SQL 2003 introduces ‘window functions’ that could be used instead of the vendor specific ROWNUM, LIMIT or TOP keywords – and the window functions allow you to do even more. Few database systems support SQL 2003 though..)

Abstraction: SQL queries contain details about the storage mechanism

The relational model is a good way of assuring the integrity and non-redundancy of data. JOINs allow you to reverse the normalization process, while keeping the “Single Point of Truth” – storage model. INDEXes allow you to lookup faster.

Having SQL (in your source) that displays all these storage details is bad in practice. Even if your attributes (columns) are not subject to change, the way of retrieving them may be.

For example, if you want to display a (paginated) list of books on a web page, with their authors, a minimal query would be:


SELECT books.title as title, books.date_published as date_published, authors.name as author
FROM
  books
INNER JOIN
  authors
ON
  books.author_id = autors.id
WHERE
  ROW_NUMBER() OVER (ORDER BY books.title) BETWEEN 1 AND 10

This is both an example of SQL that reveals the storage internals and that is not portable (because ROW_NUMBER() is supported by very few ‘chosen’ ones)

Correctness: Query strings are validated only when run against the database

Even if you’re using an editor that knows how to highlight syntax from the SQL strings, you are still prone to mistakes, syntactical or not (e.g. wrong column name).

The usual solution is to write objects that construct your query – here’s an example for Hibernate’s Criteria API


Criteria crit = session.createCriteria(Cat.class);
crit.add( Expression.eq( "color", eg.Color.BLACK ) );
crit.setMaxResults(10);
List cats = crit.list();

This still fails if the “color” string is mistyped, for example.

SqlDi

The proposed SqlDi solution emphasizes some of the positive things about SQL, while working around the above-mentioned issues.

First let’s talk about the “Domain Specific Language” well known meme – SQL is one of the best examples of DSL around, but suddenly everybody wants to write in lolcode like this:

include StarbucksDSL

order = latte venti, half_caf, non_fat, no_foam, no_whip
print order.prepare

As opposed to this ruby Starbucks language I found on the web, SQL has proper syntax rules, and everybody knows at least its basics.

So SqlDI will allow you to write a subset of SQL called SqlDi.

But how about the aforementioned Abstraction ?

In order to decouple the storage model from the application query, SqlDi uses a kind of remote-process inversion of control (RP/IoC) pattern where the actual storage-specific part of the query gets injected in your code by the RDBMS process (instead of your code depending on the storage layout).

Here’s an example of the previous query, re-written in SqlDi:

SELECT title, date_published, author
FROM books_with_authors
LIMIT 1,10

You can already guess the syntax of SqlDi, but I will paste it here anyway:

SELECT attribute_expr [, attribute_expr]* FROM thing [WHERE boolean_expr] [LIMIT n [,m]]

Now how does the ‘thing’ called ‘books_with_authors’ got injected?

Simple, it’s part of the database “configuration file” (the script that creates the schema):

CREATE TABLE authors
( id INTEGER PRIMARY KEY, name VARCHAR(80), birthday DATE);

CREATE TABLE books
( id INTEGER PRIMARY KEY, date_published DATE, author_id INTEGER FOREIGN KEY
  REFERENCES authors(id) );

CREATE VIEW books_with_authors AS
SELECT books.title as title, books.date_published as date_published, authors.name as author
FROM books INNER JOIN authors ON books.author_id = autors.id;

Now, if some other time you discover that one book can have multiple authors, all you have to do is to re-write the storage-dependent part of the query.

But how does SqlDi provide Portability ?

SqlDi is a parser for a subset of SQL – you have already seen the SELECT statement syntax. The various functions dealing with numbers, dates and strings, even the operators (such as string concatenation) are defined one way or another in SqlDi and translated into the native dialect.

For example, the LIMIT keyword is translated into ROWNUM, LIMIT, TOP or ROW_NUMBER() depending on the used database.

This translation is accomplished in two steps:

  • First, the query is tokenized and parsed, and an object which represents the root of the Abstract Syntax Tree is constructed.
  • Second, the AST spills out the native dialect – based on some form of run-time pluggable dialect writers (plus ‘drivers’)

In this way you’ll never need to change the “SELECT x FROM y WHERE …” from your source, even if you change the used database. (You do, however, need another schema creation script).

Yeah, but how about the compile-time Correctness ?

This is the most interesting part.

SqlDi is implemented in D (because I don’t know LISP), and the first step (parsing) is accomplished at compile-time, through the use of mixin() statement. The output of parsing is code, like this:

SqlDi.Select ( "books_with_authors", null /*the where clause*/, 1, 10 );

For example, if you have WHERE name LIKE ‘Richard %’, instead of a null you will have SqlDi.Like ( SqlDi.Column(“name”), “Richard %” ) – where “SqlDi” is the package name.

Of course you could write this yourself if you are a hardcore SqlDi user;)

Conclusion

To summarize what we achieved:

  • you can write SQL in your source code,
  • it will be database independent and
  • you will get compile-time syntax checking
  • bonus points: the DBA will become your best friend

What we didn’t do:

  • no object mapping was even considered
  • we didn’t enter into details of INSERTs and UPDATEs
  • SqlDi does not exist
About these ads

Written by vlad

April 24, 2008 at 8:28 pm

Posted in ideas, Uncategorized

2 Responses

Subscribe to comments with RSS.

  1. This topic is quite trendy on the Internet at the moment. What do you pay the most attention to while choosing what to write ?

    How to Get Six Pack Fast

    April 15, 2009 at 5:44 pm

  2. nice info thx man

    sql injection

    July 6, 2010 at 9:23 pm


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: