
1. Foreword and Introduction

1.1 Copyright

Copyright (c) 1998,1999,2000,2001,2002,2003 Carlo Strozzi.

Large parts of this manual come directly from the RDB documentation, written by W. Hobbs, and are included here with his permission.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

1.2 Preface

This working draft describes and provides instructions for the use of NoSQL (I personally like to pronounce it noseequel), a derivative of the RDB DataBase system. The RDB system was (and still is) developed at the RAND Corporation by Walter W. Hobbs. Part of the NoSQL code, as well as large parts of the text of this document, has been taken directly from RDB, so a good share of the credit goes to the original author.

Other major contributors to the original RDB system, besides W. Hobbs, were:

Chuck Bush

Don Emerson

Judy Lender

Roy Gates

Rae Starr

People who helped with turning RDB into NoSQL:

David Frey

Maurizio (Masar) Sartori

Vincenzo (Vicky) Belloli

Giuseppe Paternò

Paul Lussier

Seth LaForge

The NoSQL.png logo has been kindly provided by Kyle Hart.

NoSQL tends to be biased in favour of Linux. This means that, wherever it matters, NoSQL makes use of the GNU versions of the various UNIX commands, as those are the ones that are usually packaged together with the GNU/Linux operating system. Among the many Linux "distributions" available, NoSQL has been developed entirely on Debian, although NoSQL should run just as well on any other common Linux distribution. NoSQL is Free Software, released under the terms of the GNU General Public License. As such, NoSQL qualifies fully as Open Source Software.

1.3 Introduction

A good question one could ask is, "With all the relational database management systems available today, why do we need another one?" The main reasons are:

  1. Several times I have found myself writing applications that needed to rely upon simple database management tools. Commercial database products are often too costly and too feature-packed to encourage casual use. There are also plenty of good free databases around, but they too tend to provide far more than I need most of the time, and most commercial and free databases lack the shell-level approach of NoSQL. By contrast, NoSQL takes a very simple approach (even simplistic, as some may argue :-), but that is exactly its distinguishing feature. Admittedly, having been written mostly in interpreted languages (Shell, Perl, AWK), NoSQL is not the fastest DBMS, at least not always. A lot depends on the application.
  2. NoSQL is easy to use for people who are not computer specialists. The concept is straightforward and logical. To select rows of data, the 'row' operator is used; to select columns of data, the 'column' operator is used.
  3. The data is highly portable to and from many types of machines, like Macintoshes or DOS computers.
  4. NoSQL should run on any UNIX machine that has perl(1) and mawk(1) installed.
  5. NoSQL has very few arbitrary limits, and can work where other products can't. For example, there is no limit on data field size, the number of columns, or file size. The one caveat is that some implementations of the AWK interpreter (including mawk, I think) may limit the number of columns in a table to 32,768.

Note: NoSQL has only been tested with mawk(1). Mawk is Mike Brennan's implementation of the AWK programming language. NoSQL will most likely not work out-of-the-box with any other AWK, including gawk(1). While getting NoSQL to work with gawk(1) as well should not be difficult, making it work with other AWKs may prove hard, if possible at all.

As its name implies, NoSQL is not an SQL database. The rationale behind this is well explained in the accompanying paper 4gl.ps (Postscript), or 4gl.txt (ASCII).

NoSQL data is contained in regular UNIX ASCII files, and so can be manipulated by regular UNIX utilities, e.g. ls, wc, mv, cp, cat, head, more, less, editors like 'vi,' etc., as well as by powerful versioning systems, such as RCS and CVS.
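As a small illustration of this point, the sketch below builds a throwaway table and inspects it with nothing but standard utilities. The file name people.tbl and the layout used here (one header line of column names, fields separated by a single TAB) are assumptions made for the example, and are not necessarily the exact NoSQL table format.

```shell
# Hypothetical table: a header line of column names, then one row per line,
# fields separated by a single TAB (an assumption for this sketch).
tbl=people.tbl
printf 'name\tcity\tage\n'   >  "$tbl"
printf 'anna\tMilan\t34\n'   >> "$tbl"
printf 'bob\tRome\t27\n'     >> "$tbl"
printf 'carla\tMilan\t41\n'  >> "$tbl"

# Count the data rows: every line except the header.
rows=$(($(wc -l < "$tbl") - 1))

# List the column names, one per line.
cols=$(head -n 1 "$tbl" | tr '\t' '\n')

# Keep the header on top and sort the data rows numerically on field 3 (age).
sorted=$(head -n 1 "$tbl"; tail -n +2 "$tbl" | sort -t "$(printf '\t')" -k3,3n)

echo "rows: $rows"
echo "$cols"
echo "$sorted"
```

Because the table is an ordinary text file, the same file could just as well be checked into RCS or CVS, copied with cp, or opened in an editor.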

The form of each file of data is that of a relation, or table, with rows and columns of information.

To extract information, a file of data is fed to one or more "operators" via the UNIX Input/Output redirection mechanism.
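To make the operator/stream idea concrete, here is a minimal sketch of two stand-in operators written as shell functions around awk. The names select_rows and select_cols, the sample file staff.tbl, and the TAB-separated layout with a single header line are all assumptions made for this illustration; they are not the NoSQL operators themselves.

```shell
# Hypothetical sample table: header line plus TAB-separated rows.
printf 'name\tdept\tsalary\n'  >  staff.tbl
printf 'anna\tsales\t3100\n'   >> staff.tbl
printf 'bob\tit\t2800\n'       >> staff.tbl
printf 'carla\tsales\t3500\n'  >> staff.tbl

# select_rows COLUMN VALUE: pass the header through, then keep only the
# rows whose COLUMN equals VALUE. Reads STDIN, writes STDOUT.
select_rows() {
    awk -F '\t' -v col="$1" -v val="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; print; next }
        c && $c == val
    '
}

# select_cols COLUMN...: keep only the named columns, in the order given.
# Assumes every requested name appears in the header line.
select_cols() {
    awk -F '\t' -v want="$*" '
        BEGIN { OFS = "\t"; n = split(want, w, " ") }
        NR == 1 { for (i = 1; i <= NF; i++) pos[$i] = i }
        {
            line = ""
            for (j = 1; j <= n; j++)
                line = (j > 1 ? line OFS : "") $(pos[w[j]])
            print line
        }
    '
}

# The table is fed to the first operator by input redirection, and each
# operator hands its output to the next one through a pipe.
result=$(select_rows dept sales < staff.tbl | select_cols name salary)
echo "$result"
```

Each stage reads a table on STDIN and writes a table on STDOUT, so stages can be chained in any order and mixed freely with ordinary utilities such as sort or head.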

There are also programs to generate, modify, and validate the data. A thorough discussion of why this type of relational database structure makes sense is found in the book "UNIX Relational Database Management," Reference #2.

It is assumed that the reader has at least a basic familiarity with the UNIX Operating System, including knowledge of Input/Output redirection (e.g., STDIN, STDOUT, pipes).

Again, the key feature of NoSQL (and other similar packages mentioned in this manual) is its close integration with UNIX. Real-world problems are typically more complex than the data models provided by many DBMS. Actual applications (and Web-based applications are no exception) are complex puzzles made up of many small pieces, several of which are data-related. Unlike other fourth generation systems, NoSQL is an extension of the UNIX environment, making available the full power of UNIX during application development and usage.

NoSQL was designed with the UNIX shell language as its user interface. This level of integration removes the need to learn yet another set of commands to use and administer the database system. A database is just a file, and can be maintained like all other files that the user owns or has access to. Because NoSQL commands are executable programs, the UNIX shell is inherited as the primary command language of the database; no other proprietary database scripting language, to my knowledge, is as powerful and flexible as the UNIX shell. The shell-level nature of NoSQL encourages casual use of the system, and successful casual use leads to familiarity and successful formal use. This concept is explained much more thoroughly in the paper "The UNIX Shell As a Fourth Generation Language," which shows why the UNIX shell is an excellent tool for scripting database access. The paper is included in the NoSQL documentation tree with the file name 4gl.ps (Postscript) or 4gl.txt (ASCII).

NoSQL certainly cannot be seen as a "Big Name Database." But NoSQL's extreme generality and absence of restrictions make it more convenient and more effective for many everyday data management tasks than supposedly more powerful products.

1.4 Perl and the Operator/Stream Paradigm

As stated in the Abstract, NoSQL uses the Operator/Stream DBMS Paradigm. The main reason why I decided to turn the original RDB system into NoSQL is that the former is entirely written in Perl. Perl is a good programming language for writing self-contained programs, but the cost of its pre-compilation phase and long start-up time is worth paying only if the program, once loaded, can do everything in one go. This contrasts sharply with the Operator/Stream model, where operators are chained together in pipelines of two, three or more programs. The overhead associated with initializing Perl at every stage of the pipeline makes pipelining Perl inefficient. A better way of manipulating structured ASCII files is to use the AWK programming language, which is much smaller than Perl, is more specialized for this task, and is very fast at startup. On my Pentium II Linux machine, /usr/bin/mawk (POSIX AWK) is just 99K, while Perl 5 is almost 500K. You get the point.

1.5 Bug reports

There is a mailing list for discussions related to NoSQL. The address is noseequel(at)strozzi.it. To subscribe simply send a message to minimalist(at)strozzi.it with the word "subscribe" (without the quotes) in the message subject.

Please send bug reports (fixes are most welcome) to the same list noseequel(at)strozzi.it. Always include as much information as possible, especially the content of file nosql.version, which is created in the NoSQL installation directory during install. By "bug reports" I mean not just errors in the code, but also grammatical mistakes, typos, and bad English constructions in the documentation, as English isn't my native language.

