The Oracle Multimedia Blog

The goal of this blog is to list tips and techniques when using Oracle Multimedia on Oracle 11G. The blog will also include personal views and cover a lot of minor topics that the iTunes podcasts do not cover.

Saturday, April 4, 2009

Topics to Annoy #4 - Standards in the World

Its interesting that we view competition as healthy but in the IT world competition slows down progress. The point I am trying to start out with is that standards allow IT to progress. Competition slows it down. The trick is to work out when competition is good and healthy and when its dangerous.

Imagine if there were competing standards for HTML. Imagine also if these standards used different network protocols and photo display. You could have one standard supporting HTML, TCP/IP and JPG, but the competing standard involves a markup language running only on named pipes and Bitmaps (which for those that are interested Microsoft has tried to do many times in the 90's). The internet would not be around now if there were competing standards.

The use of standards involves a group of experts getting together and agreeing on a common format and usage for a particular IT area. Its agreed the standard is not perfect and will evolve over time based on feedback and usage in the market place. There are arguments and temper tantrums thrown in the standards forums. There is politics and the attendees putting together the standards will have various agenda's, but you have to be honest, in the most part the various standards committees have done a good job in getting the standards out there.

We have standards agreed to for HTML, SQL, XML and Networking. There are standards for software development, design and implementation. The concept of standards isn't just limited to the IT world, it does exist in other fields like the building industry and in shipping (containers). In the IT world they reign supreme.

And yet we still have competition. There was beta and vhs, DVD-R, DVD+R and recently with HD and BlueRay. Even when the different players in the market agreed that they needed one standard and only one could survive, they couldn't decide on which one. It was left to the market to decide. In the case of Blue Ray, it was more political maneuvering than market forces that drove the decision in the end. The consumers didn't care, they just wanted one - unless they bought the wrong one.

I guess its easier to define standards for software than for hardware. When you look at Mp3 players, its hard to find two different brands that:
- support the same charging device
- support the same plug-in to load music
- support the same controls
And yet, the force that drives the companies to standards is cost. How much cheaper would it be if all mobile phones could use the same battery? Better still use the same charger? The world would change if this happened. Its in the consumers best interest for this to happen and its in the manufacturers best interest also, so lets hope the initiative to do this happens quickly.

Still there are places in software that we need standards. A good example is for internet messaging and communication. Imagine being able to use Skype to call Yahoo. We have standards for email, why not messaging (or maybe we do, but the players do not want to use them just yet). I would really like an agreed standard on Video and Audio. MP3 seems to be winning for Audio, but Apple has introduced additional standards, and Microsoft would like to put their own in. For video there are so many standards its a minefield dealing with them all.

When it comes to spoken language there is debate whether a universal language is better than a mixed set. Esperanto was proposed as the international language for communication and was designed with standards in mind. It looked good on paper, but never made it. We are now looking at English as becoming the de facto language for communication because of the internet (which I know the French do not like). But English as the universal language is not guaranteed and even Josh Whedon in his TV show Firefly thought the end result might be a mix of English and Chinese. I know linguistics abhor the concept of universal speaking language and are doing their best to keep alive dying languages of the world. I would say that a universal speaking language is worthy of more debate, I personally would like to hear more about why keeping alive all the languages is good.

Wouldn't it be great if there was an agreed standard for Remote Controllers?
What about Solar Power Cells. If they could make them to plug-and-play between A3, A4 and A5 size, then consumers could buy the cells they need and mix and match as required. And it would be great if the solar cells had a USB adaptor. Now here is a standard that is simplifying computing, the humble USB. But it competes with firewire, which is losing out. Now wouldn't it be great if more appliances in the house that were low powered, used USB interfaces? Why do we need AC for every low powered device when DC is sufficient?

Some other areas that standards are needed include:
- Power. Different countries of the world use different voltages and device shapes to plug into the walls. Europe is different to America which is different to Australia. The Australian one is the least used (because of population size), but the safest. If there was an agreed standard it would be easy for manufacturers to make devices that work universally.
- Driving on the road. Left or right? Road rules. Different countries, different parts of the country, all different rules. In the global world people travel and drive in different countries. Switching to drive on the other side of the road requires expertise and is fraught with danger. Unfortunately I doubt we will ever get a standard in this area.

Now onto my topic to annoy. The one major standard in the world that everyone has agreed to work with and use is Metric. Its easy to use, understand and work with. So which countries in the world do not use it? Even though the USA has agreed to use Metric and was one of the first countries in the world to attempt to adopt its usage, its still not being used. Only three countries in the world do not use it : See Wiki and Metric

Why is it annoying? I travel a lot to the US. I am inundated by TV shows from America, and in all cases I have to translate units from imperial to metric to understand it, and likely get it wrong doing so. Even shows like Mythbusters try to say units in Metric but are always caught out. Try watching a documentary and they express weight in bounds, size in feet and speed in miles per hour. Annoying and frustrating, makes me want to get out my Tandy Model 3 and Betamax and use it to access MSN. (my understanding is that the only time feet is OK to use the expression feet, is when on an aeroplace and you are expressing the height you are travelling at).

Metric is a powerful standard. All countries have had to do the switch to it, most recently the UK. And it takes time to do the switch, but its worth it. My guess is that as the US sells more goods to the rest of the world, and as more overseas goods come in to the US, the packaging, weighting and sizing will have to conform to the agreed metric system. Slowly the US will use the metric system, but they will not start using it because a lone Australian is having a temper tantrum on the other side of the world.

Before I close this blog, my personal wish is for the banks of the world to come up with a standard for communicating merchant payments, because in the last 3 years I have had to build 12 different interfaces for 12 different banks, to handle merchant (credit card) - ecommerce payments. Each interface was different and each time I built it, I thought they couldn't come up with a new different way, but they always did. Its not efficient, its a pain to support and it costs the IT industry time and money to integrate. Well it costs me patience and sanity, but you get the point.

Experiences #8 - on National Languages and Indexes and Diacritics

Ok, here is the issue. A column in a table needs to be searched on. It needs to be case insensitive and it needs to be able to deal with Diacritic characters (not to be confused with Diuretic which always get me). These are latin based characters that have inflections or accents on them (see Wiki). The aim is to allow searching that ignores these characters.
e.g. If the column contained: de Niró, then searching on DE NIro, dë Nirõ, DE níro will all return the same results. Also wild card searching will also work.

How to achieve it? There are two methods, the first involves using features for NLS provided by Oracle. The second is to build it yourself.

To start, you should ensure your database has a character set that is UTF-8, Unicode or some equivalent of better character set. By now, if you are on Oracle 10 or greater you should be using Unicode, because of XML (if you aren't then look to migrating to it).

The hard part involves getting these characters in. Don't try using SQL*Plus and cutting and pasting. Odds are, the SQL*Plus session will corrupt them in the buffer before sending them to the database. Using a HTML page seems to work, and I have had good success migrating from non-Oracle databases into Oracle using database links.

Once in, if you want to use the Oracle features to support this type of query, it involves modifying your session. With the tests I did, I used these statements:

ALTER SESSION SET NLS_COMP=LINGUISTIC;
ALTER SESSION SET NLS_SORT=generic_m_ai;

Then I did my search as per normal (no SQL change).
I recreated my indexes like this:
create index i_myindex_n1 on mytable(mypk,NLSSORT(word_upper, 'NLS_SORT=generic_m_ai'));

Check the manual for what happens behind the scenes. Really what happens is the equivalent setup of materialised views and indexing. On Oracle 10 I could search correctly, but wild card searches were slow because they couldn't use the index. This has been addressed in Oracle 11.

The downside was Mod PL/SQL. By changing the session the Mod PL/SQL output was corrupted. The workaround was to do an execute immediate and alter the session in between my search calls and set it back after. Slightly messy but it worked. Didn't solve the 10g wild card issue - making it a good reason to migrate to 11g.

The manual solution involves setting up a trigger (or equivalent) and on insert/update just taking a copy of the data and running this function put it into the replicated column:

function nls_conv_val(vtxt varchar2)
return varchar2
as
rtxt varchar2(1000);
begin
rtxt := upper(vtxt);
rtxt := translate(rtxt,'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝØ','AAAAAAACEEEEIIIIDNOOOOOUUUUYO');
rtxt := replace(rtxt,'ß','SS');
return( rtxt );
exception
when others then return( rtxt );
end nls_conv_val;
I can then index this column. Its simple and the only downside is the management of this replicated column. I then have to make sure all searches use the column containing the converted data and not the original one. But it does work with wildcards in Oracle10, I don't need to create extra indexes, and I can control which characters are converted.

So these are two methods available for dealing with NLS. One is transparent, one is manual but controllable. Each is as good as the other and its up to your business to decide which one is right for you. I expect there are additional methods available and am keen to hear from anyone who might have them.

Tip #6 - Oracle XE Version

The free version of Oracle is called Oracle XE. It has limitations including:
- Database size is 4Gb
- No Java
- Platforms and # CPUs

When using Oracle Multimedia, the size limit of 4Gb is an issue and is very restrictive (which encourages you to buy a licence). But with 4Gb of storage (really assume 3Gb once you factor in loss due to database use, block sizing, indexes, inefficient use of relational storage structures), then how much can be stored?

Lets say we wanted to store images in the database. If a thumbnail is around 2k, a web quality around 50K, and an original 400k (OK, so most cameras today are 5mega pixel or greater meaning the images are 2Mb - 10Mb in size. But 6 yrs ago, this wasn't the case and most original images were lucky to make it to 140K). If your original images are for viewing on a computer screen, then 400K is better than what most screens can handle.

Ok, back to the sizing. Lets keep it simple and round up, and assume Thumbnail + Web + Original = 500K. That means 2 images per Mb or 2000 images per Gigabyte, allowing 6000 images to be stored. That's a pretty healthy figure. Now if you only store the web quality + tnail (lets assume 50k for both), then we get 20 images per Mb, or 20,000 images per Gb, or 60,000 images that can be stored in XE. Now that is a good figure.

Back to the other limitation. Oracle XE doesn't support Java, and Oracle Multimedia uses Java. Bit of a show stopper that. There is nothing stopping you from running the scripts to define the Oracle Multimedia types in XE. The methods will not be available, but if you are willing to write your own, you can still have a database that has type support with Oracle Multimedia (so when you do migrate to Standard/Enterprise its an easy migrate).

So without Java, how do you shell out? For those not in the know, if your Oracle database is running Java, its fairly easy to shell out to operating system and run a script. This script can then call a tool like Image Magick to modify the images and create thumbnails/web quality. These can then be loaded back into the database using standard dbms_lob packages.

But if XE doesn't support Java how do you shell out? Oracle XE supports the use of Scheduler, and with Scheduler you can shell out. Problem is with Unix, there is a nasty bug in 10.2.0.1 relating to permissions and shelling out (fixed in 10.2.0.2). There is a manual work around to fix this (check metalink).

Following are examples of how to use scheduler for Windows and Unix:

Windows:
SHELL := myjob;
jobidx := 1;
jaction := 'c:\winnt\system32\cmd.exe';
dbms_scheduler.create_job
(
job_name => SHELL || jobidx,
job_type => 'EXECUTABLE',
job_action => jaction,
enabled => false,
number_of_arguments => 3,
comments => 'Run shell-script'
);
dbms_scheduler.set_job_argument_value(SHELL || jobidx,1,'/q');
dbms_scheduler.set_job_argument_value(SHELL || jobidx,2,'/c');
dbms_scheduler.set_job_argument_value(SHELL || jobidx,3,cmd);
dbms_scheduler.enable(SHELL || jobidx);

Unix:
jaction := '/bin/sh';
dbms_scheduler.create_job
(
job_name => SHELL || jobidx,
job_type => 'EXECUTABLE',
job_action => jaction,
enabled => false,
number_of_arguments => 2,
comments => 'Run shell-script'
);
dbms_scheduler.set_job_argument_value(SHELL || jobidx,1,'-c');
dbms_scheduler.set_job_argument_value(SHELL || jobidx,2,cmd);
dbms_scheduler.enable(SHELL || jobidx);

Where cmd is a call to a .bat script or .sh script. Don't try and get it to run a command, it will likely fail. Big issue on Unix you will find is the script run has no parent knowledge. So you need to define the PATH and all other Environment variables. Assume it knows nothing and you will be OK.

Big Limitation on Windows for XE. I have noticed without finding a workaround, that if the script shelling out does not finish and terminate, it will hang and consume resources, and the max number of scheduled jobs that can run is around 6. So if the shell script has issues you will get these scheduler jobs hanging around, that "cannot be killed" when using scheduler commands (I have tried) and require a database bounce to fix it. My guess is its a bug in 10.2.0.1.

Also, if you want to try and trap the output then grind your teeth, its easier. Redirecting job output to a file, STDOUT, STDERR doesn't work consistently, if at all. If you can embed in your script the ability to dump information to a file, and then from PL/SQL read that file, that is a better method.

So don't think you can't use XE for managing multimedia. You can, and you can build some fancy applications in it. With Piction I have managed to port our application to it with only minor limitations and functionality loss. I have even ported it to Ubuntu. On Windows I managed to get it running from an 8Gb flash drive plugged into the computer (that's the Windows o/s, Oracle software, database all in 8Gb total). For Linux, couldn't do it in 8Gb, but achieved it in 16Gb. With the right will it can be done (and at times I wish I had a sledgehammer).

Experiences #5 - integrating with other Image Managment tools

There are hundreds of image formats, video codecs and audio codecs out there and Oracle Multimedia only supports the most common ones.
So what do you do if you want to store a non supported one in the Oracle database?

This is where I have spent the last 8 years trying out software from different vendors and looked at a variety of methods for manipulating and controlling these images.

The challenge is that it is hitting a moving target, especially with video. Each 6 months new codecs appear and based on user popularity (versus what is best) determines what is required to be supported. A long time ago AVI was popular, but AVI isn't AVI, there are 30+ codecs supported within it, including Divx, so just because the video has an avi extension, doesn't mean you can deal with it. And it gets worse when you can mix video and audio codecs together. The result is you might be able to decode the video but not the audio. Quicktime was popular, and so was MPEG, but the current popular video format is Flash Video (flv) because of YouTube. There is a huge amount of legacy video out there and trying to convert or manage them all is very challenging.

From my perspective, I look for tools that satisfy the following conditions:
- can run on Windows/Linux/Solaris
- can be called from a command line
- can be enhanced
- optional - good licencing rates

For Images, the preferred image management tool is Image Magick. It's open source and has an active community supporting it. It is also fast and has a lot of powerful options in it for watermarking and converting images. It can also deal with some video types and can handle postscript and pdf. It doesn't have memory issues and the largest file processed with it has been over 1Gb.

By being able to call the tool from the command line means an image can be managed from a shell script (or Windows Bat). Meaning, you can shell out from Java inside the database to manipulate the image and once manipulated, load it into the database.

For Video, the preferred tool is ffmpeg. This is supported across all platforms and supports a large number of codecs. It works well converting video and for extracting video snippets in other formats.

Both Image Magick and ffmpeg are open source. I will cover in later blogs techniques and lessons learnt when using these tools.

Now its just not enough to convert an image or video without thinking about scalability, both in terms of the huge amount of CPU required to convert, but also the large size of multimedia objects. But this is a topic for another conversation.

On Blog Entries

If anyone has wondered why I haven't posted anything recently on Multimedia its because for the last 3 months I have been head down programming and doing a major rewrite of our e-commerce engine. I will be doing papers, podcasts and blogs on everything to do with e-commerce and multimedia soon (especially on selling audio). Its been rewritten in PL/SQL and uses database objects extensively (lots of lessons learnt). All done whilst providing a complete webservice front end to all interfaces allowing dual support for usage.
I also find it easier to blog in bulk rather than a bit at a time, and provided I don't get distracted I should have a whole new set of blogs on multimedia out today. So whilst it rains outside for the first time in 2 months, I feel the motivation to put out some new blogs.

Thursday, September 25, 2008

Oracle Multimedia Basics: Directories

A goal when dealing with multimedia is to load images into the multimedia types, and those images exist on a filesystem.

To load them in, an Oracle directory needs to be defined. This directory allows you to assign a name to a physical location. Access can then be given to users using the granted command. Directories are used by a number of tools including datapump.

Following are examples of creating a directory:

create or replace directory win_mydir as 'c:\temp';
create or replace directory unix_mydir as '/u01/imageloc';

You can then grant read access to a user like:

grant read on directory win_mydir to websys;

Note 1: The directory does not have to exist. Existance is only checked at runtime usage.
Note 2: Access is recursive. You can access subdirectories that exist in this top level directory.
e.g.
If there is a file called c:\temp\subdir\abc.jpg, then you can access it using directory win_mydir and file name called subdir\abc.jpg

The following highlights this concept:

create table myaudio(pk integer, myaud ordsys.ordaudio);
insert into myudio values(1,ordsys.ordaudio.init());
commit;

declare
cursor c1 is select * from myaudio for update;
c1rec c1%rowtype;
ctx raw(4000);
begin
open c1;
fetch c1 into c1rec;
close c1;
c1rec.myaud.importfrom(ctx,'FILE','MYDIR','subdir\bwf_short.mp3');
c1rec.myaud.setproperties(ctx);
update myaudio set myaud = c1rec.myaud;
commit;
end;
/

SQL> select m.myaud.audioduration from myaudio m;

MYAUD.AUDIODURATION
-------------------
12

Wednesday, September 24, 2008

You Tube Interview on 11G

You Tube interview on Oracle 11G