Wednesday, July 8, 2015

Whether you like it or not, no one should ever claim to be a data analyst until he or she has done string manipulation.

I am reading Gaston Sanchez' book Handling and Processing Strings in R (pdf).

In the preface, I found the following quote, with which I wholeheartedly agree:

Perhaps even worse is the not so uncommon belief that string manipulation is a secondary non-relevant task. People will be impressed and will admire you for any kind of fancy model, sophisticated algorithms, and black-box methods that you get to apply. Everybody loves the haute cuisine of data analysis and the top notch analytics. But when it comes to processing and manipulating strings, many will think of it as washing the dishes or peeling and cutting potatoes. If you want to be perceived as a data chef, you may be tempted to think that you shouldn’t waste your time in those boring tasks of manipulating strings. Yes, it is true that you won’t get a Michelin star for processing character data. But you would hardly become a good data cook if you don’t get your hands dirty with string manipulation. And to be honest, it’s not always that boring. Whether you like it or not, no one should ever claim to be a data analyst until he or she has done string manipulation.

Saturday, June 20, 2015

Little things that make life easier #9: Using data.entry in R

With data.entry(), it's easy to fill (small) matrices in R visually.

Let's see it in action. First, I create a 4x3 matrix:

mx <- matrix(nrow=4, ncol=3)
show(mx)

     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
[3,]   NA   NA   NA
[4,]   NA   NA   NA

The matrix is created with all cells set to NA. Now, in order to assign values to these cells, I use

data.entry(mx)

This opens a small window where I can enter the data.
This is how the cells looked before I edited them:

And here's how they looked after I edited them, just before I used File > Close:

Back in the shell, the matrix has indeed changed its values:

show(mx)

      var1 var2 var3
[1,]     2    4    2
[2,] 12345    8   42
[3,]     5    6  489
[4,]     9   22   11

Pretty cool, imho.

Thursday, March 5, 2015

Is a sequence incremented in a failed insert?

Here's a sequence:
create sequence tq84_failed_insert_seq start with 1 increment by 1;
And an insert statement:
insert into tq84_failed_insert values( tq84_failed_insert_seq.nextval, lpad('-', i, '-') );

If the insert statement fails, is the sequence still incremented?

Let's try it with a test. The table:

create table tq84_failed_insert( i number primary key, j varchar2(20) );
Some insert statements:
insert into tq84_failed_insert values (5, lpad('-', 5, '-'));
insert into tq84_failed_insert values (9, lpad('-', 9, '-'));
and an anonymous block:
begin
  for i in 1 .. 10 loop
    begin
      insert into tq84_failed_insert values(
        tq84_failed_insert_seq.nextval,
        lpad('-', i, '-')
      );
    exception
      when dup_val_on_index then null;
    end;
  end loop;
end;
/

After running this anonymous block, the table contains:

select * from tq84_failed_insert order by i;

Returning:
         I J
---------- --------------------
         1 -
         2 --
         3 ---
         4 ----
         5 -----
         6 ------
         7 -------
         8 --------
         9 ---------
        10 ----------
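So, I ascends in steps of 1 and every row's J has exactly I dashes. Rows 5 and 9 come from the two initial inserts, and every other row comes from the loop iteration with the same number. If a failed insert had given its value back to the sequence, iteration 6 would have received 5 again and failed, and so would every following iteration. This shows that the value of nextval is "wasted" if the insert statement fails.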
Source code on GitHub
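The argument can be replayed with a small Python model of the anonymous block. This is an in-memory simulation under stated assumptions, not Oracle itself. The function run_block and its consume_on_failure switch are hypothetical. The switch exists only to contrast the observed behavior with a sequence that would hand a failed value out again.

```python
def run_block(existing, consume_on_failure=True, iterations=10):
    """Simulate the PL/SQL loop against a table keyed by I (hypothetical model)."""
    table = dict(existing)        # I -> J, rows inserted before the block runs
    next_value = 1                # "start with 1"
    for i in range(1, iterations + 1):
        value = next_value        # what tq84_failed_insert_seq.nextval returns
        if consume_on_failure:
            next_value += 1       # observed behavior: advance regardless of the outcome
        if value in table:        # primary key violation, i.e. dup_val_on_index ...
            continue              # ... which the exception handler swallows
        table[value] = "-" * i    # lpad('-', i, '-')
        if not consume_on_failure:
            next_value += 1       # hypothetical: advance only after a successful insert
    return table


existing = {5: "-" * 5, 9: "-" * 9}
print(sorted(run_block(existing)))
print(sorted(run_block(existing, consume_on_failure=False)))
```

With consume_on_failure=True, the model reproduces the ten rows returned by the query. With False, it stalls at 5, and the table would contain only 1, 2, 3, 4, 5 and 9, which is not what Oracle returned.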

Monday, February 16, 2015

The most important wget command line options (flags)

Note to self: remember those wget flags and you'll be fine:
-r,  --recursive          specify recursive download.
-H,  --span-hosts         go to foreign hosts when recursive.
-l,  --level=NUMBER       maximum recursion depth (inf or 0 for infinite).
-np, --no-parent          don't ascend to the parent directory.
-nd, --no-directories     don't create directories.
-x,  --force-directories  force creation of directories.
-nc, --no-clobber         skip downloads that would download to existing files.
-k,  --convert-links      make links in downloaded HTML point to local files.
-p,  --page-requisites    get all images, etc. needed to display HTML page.
-A,  --accept=LIST        comma-separated list of accepted extensions.
-R,  --reject=LIST        comma-separated list of rejected extensions.
-w,  --wait=SECONDS       wait SECONDS between retrievals.

Inserting and selecting CLOBs with DBD::Oracle

Here's a table with a CLOB:
create table tq84_lob ( id number primary key, c clob )

With Perl and DBD::Oracle, the CLOB in the table can be filled like so:

my $sth = $dbh -> prepare(q{
  insert into tq84_lob values (
    1,
    empty_clob()
  )
});
$sth -> execute;

# Setting ora_auto_lob to false:
# fetch the «LOB Locator» instead of the CLOB
# (or BLOB) content:
my $c = $dbh -> selectrow_array(
  "select c from tq84_lob where id = 1 for update",
  {ora_auto_lob => 0}
);

$dbh -> ora_lob_write(
  $c,
  1,    # offset, starts with 1!
  join '-', (1 .. 10000)
);

A CLOB can be selected like so:

my $c = $dbh -> selectrow_array(
  "select c from tq84_lob where id = 1",
  {ora_auto_lob => 0}
);

my $count = 0;
while (my $buf = $dbh -> ora_lob_read($c, 1 + $count*1000, 1000)) {
  print $buf;
  $count++;
}
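The only subtle part of the read loop is the offset arithmetic. ora_lob_read counts from 1, so the chunk read in iteration n (starting at n = 0) begins at 1 + n*1000. The following Python sketch applies the same arithmetic to an ordinary string, with the same content that ora_lob_write stored above. It is a simulation, not DBD::Oracle, and read_clob_in_chunks is a hypothetical name. It shows that the chunks reassemble into the complete content.

```python
def read_clob_in_chunks(clob, chunk_size=1000):
    """Mimic the ora_lob_read loop: 1-based offsets, stop at an empty chunk."""
    chunks = []
    count = 0
    while True:
        offset = 1 + count * chunk_size     # same formula as in the Perl loop
        start = offset - 1                  # Python slices are 0-based
        buf = clob[start:start + chunk_size]
        if not buf:                         # past the end: nothing left to read
            break
        chunks.append(buf)
        count += 1
    return chunks


# The content written with ora_lob_write: join '-', (1 .. 10000)
content = "-".join(str(n) for n in range(1, 10001))
chunks = read_clob_in_chunks(content)
print(len(content), len(chunks))  # prints: 48893 49
```

The written CLOB is 48893 characters long, so the Perl loop prints 48 full chunks of 1000 characters and a final chunk of 893 before ora_lob_read runs out of data.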

Tuesday, February 10, 2015

A Perl wrapper for Oracle's UTL_FILE package

I finally found time to write a simple wrapper for Oracle's UTL_FILE package that allows reading a file on the database server with Perl.

Here's a simple perl script that demonstrates its use:

use warnings;
use strict;

use OracleTool qw(connect_db);
use OracleTool::UtlFile;

my $dbh = connect_db('username/password@database') or die;

my $f = OracleTool::UtlFile -> fopen($dbh, 'DIR', 'file.txt', 'r');

my $line;
while ($f -> get_line($line)) {
  print "$line\n";
}

The code is on GitHub: OracleTool.pm and UtlFile.pm.

Checking the value of NLS_LANG in SQL*Plus on Windows

Oracle Support Note *179113.1* offers a, ahem, clever way to display the value of NLS_LANG that SQL*Plus uses on Windows.

First, check whether the environment variable NLS_LANG is set:

SQL> host echo %NLS_LANG%

SQL*Plus will answer with either something similar to

AMERICAN_AMERICA.WE8MSWIN1252
or with
%NLS_LANG%

In the first case, the environment variable is set, and its value, as displayed by the echo command, is the value of NLS_LANG.

If the variable is not set (the second case), the following trick still reveals its value:

SQL> @.[%NLS_LANG%].

Again, SQL*Plus reacts in one of two ways. Either

SP2-0310: unable to open file ".[AMERICAN_AMERICA.WE8ISO8859P1]..sql"
or
SP2-0310: unable to open file ".[%NLS_LANG%]."

In the first case, the value for NLS_LANG is set in the Windows registry (to the value between [ and ]). In the second case, NLS_LANG is not even set in the Windows registry.
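Reading the value off the error message is plain text processing. Take what is between [ and ]. If it is the literal %NLS_LANG%, there is no value. Otherwise, split it into the components of the NLS_LANG format, language_territory.charset. The helper below is a hypothetical Python sketch. It only processes the message text and does not interact with SQL*Plus or the registry.

```python
import re


def nls_lang_from_sp2_0310(message):
    """Extract NLS_LANG from an SP2-0310 message (hypothetical helper).

    Returns (language, territory, charset), or None when SQL*Plus left
    the literal %NLS_LANG% in the file name, i.e. the registry has no value.
    """
    match = re.search(r"\[([^\]]*)\]", message)   # text between [ and ]
    if match is None or match.group(1) == "%NLS_LANG%":
        return None
    language_territory, _, charset = match.group(1).partition(".")
    language, _, territory = language_territory.partition("_")
    return language, territory, charset


print(nls_lang_from_sp2_0310('SP2-0310: unable to open file ".[AMERICAN_AMERICA.WE8ISO8859P1]..sql"'))
print(nls_lang_from_sp2_0310('SP2-0310: unable to open file ".[%NLS_LANG%]."'))
```

For the two messages above, this yields ('AMERICAN', 'AMERICA', 'WE8ISO8859P1') and None respectively.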

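Incidentally, much the same information seems to be available more easily with the following query. Note, however, that according to the Oracle documentation, the character set part of its result is the database character set rather than the client's: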

SQL> select sys_context('userenv', 'language') from dual;