Friday, November 9, 2007

ETL: CamptoCamp, FDO and Open Source - What about OLAP?

CamptoCamp & Talend


CamptoCamp is introducing a Spatial ETL tool (not yet released, but anticipated) that works in conjunction with Talend's Open Source ETL product, Open Studio.


I'll begin working with the product once it is released. In the meantime, CamptoCamp presented their solution at FOSS4G2007 in Victoria.


More information on their presentation can be found here.


Open Source ETL - Without Spatial


For Open Source ETL without the spatial component, there are various solutions, Talend being one of them.


You can take a look at:


1) Pentaho - URL: http://www.pentaho.com/

2) Clover - URL: http://www.cloveretl.org/

3) KETL - URL: http://www.ketl.org/


Is there another Open Source Spatial ETL Tool?

We do have another option on the spatial side, now that Autodesk is working in the Open Source community and has released FDO (Feature Data Objects), which is similar to FME, but is not FME.


FDO is a Data Access Technology that was developed to manipulate, define and analyze geospatial data regardless of where it was stored.

FDO was originally developed and included in the Autodesk Map 3D 2005 product during the spring of 2004. In this initial implementation, it was capable of working with the following geospatial types:

  • Oracle
  • SDF

The following version introduced ArcSDE.

The third version implemented more sources, and added providers for MySQL, SQL Server, ODBC, SHP, Raster, OGC WFS, and OGC WMS.

It was then that they decided to take FDO Open Source, but without releasing the Oracle provider. Being Open Source, though, there is always an option out there, and some ingenious minds came up with solutions.

Quoting the OSGeo FDO History site:

"The release of FDO as open source coincided with the release of MapGuide as open source in 2006. It included the SDF, SHP, MySQL, ArcSDE, ODBC, OGC WFS, and OGC WMS providers. "

FDO was now out to the Open Source World - right on schedule with their release of MapGuide Open Source.

But what about Oracle and FDO?


Much of the work for the Oracle side has been developed by SL-King in Slovenia.

King.Oracle is an Open Source FDO provider for Oracle.

Through this Open Source product, SL-King is providing a tool that supports Oracle Locator and Oracle Spatial. It is designed specifically for Oracle, and Oracle alone, in such a way that it will support full Oracle Spatial functionality.


Currently the latest version (Version 0.7.3) provides:

  • Support for Oracle 10G, Oracle XE and Oracle 9i
  • Optimized for Oracle
  • Using plain Oracle tables and views
  • Can be used inside AutoCAD MAP 3D to edit and query Oracle data

For a Flash Movie on this, please look here.

Can we convert between different FDO sources?

Yes, of course! This is Open Source.

Coming from SL-King, again, we have another ingenious tool, called FDO2FDO.

FDO2FDO is an Open Source FDO client application which uses the above mentioned Open Source FDO library to manipulate, create, and define geospatial data.

Currently, the software is capable of the following:


  • Copy data from SHP files to SDF
  • From SHP to Oracle
  • Oracle to SDF...

In the end, FDO2FDO allows the user to copy and modify any data from any FDO Data Store to any FDO Data Store.


There are three main parts in FDO2FDO and they are:

  1. Fdo2Fdo Api library
  2. F2Fcmd Command line utility
  3. Fdo2Fdo GUI

An introduction to FDO2FDO can be found here.

There are always solutions out there.


Can Geospatial move towards ETL and Data Warehousing?


Yes.

Currently, in order to do web-mapping, you require the following:


  • database
  • web server
  • data

That is all. You can have a web-map up and running with MapServer or MapGuide quite easily, in less than a day, depending on how complicated and stylish you want to get.

But look deeper and what is it we are after? The data.

Where is the data stored? In a database.

So why don't we work on bringing OLAP into the Internet mapping and GIS worlds?

Aggregations of data can occur anywhere.

You can group data by postal/post/zip codes, populations, etc. - this is prime data for OLAP.

Take industries such as oil and gas, where well production is recorded on an hourly basis. This can be summed and aggregated into data marts (OLAP) for daily, monthly and yearly totals.
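To make the roll-up idea concrete, here is a minimal sketch in Python. The well names, timestamps and volumes are made up for illustration, and a real data mart would do this in the database or an OLAP engine rather than in a script.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly readings: (well id, timestamp, volume produced in that hour).
hourly = [
    ("WELL-A", datetime(2007, 11, 1, 0), 10.0),
    ("WELL-A", datetime(2007, 11, 1, 1), 12.0),
    ("WELL-A", datetime(2007, 11, 2, 0), 11.0),
    ("WELL-B", datetime(2007, 12, 1, 0), 5.0),
]

def roll_up(readings, period_format):
    """Sum volumes per (well, period); period_format is a strftime pattern."""
    totals = defaultdict(float)
    for well, timestamp, volume in readings:
        totals[(well, timestamp.strftime(period_format))] += volume
    return dict(totals)

# The same hourly facts, aggregated at three levels of the time dimension.
daily = roll_up(hourly, "%Y-%m-%d")
monthly = roll_up(hourly, "%Y-%m")
yearly = roll_up(hourly, "%Y")
```

Swap the time key for a postal code or a grid cell and the same pattern gives you a spatial aggregation, which is exactly where the map comes in.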

The maps are a starting point, but there should be no disconnect between the data, databases, GIS, internet mapping, as we are only working with data and transforming it into much more usable and valuable information.

By bringing OLAP into GIS and Internet mapping, you can add more value to your client's data, and this data can be fed into other applications for reporting and more.

The internet map acts as a portal to a whole other world of information.

Just a few thoughts, as I'm involved in both worlds presently.

What do you think?

Feel free to write and let me know.

Saturday, November 3, 2007

Axis Order Confusion: Software, Geodesy & Transformations



Axis Order Confusion



Not simply X, Y, Z or N, E, S, W, but how we interpret them and use them in software, geodesy and navigation - it varies and has led to confusion among many people. This is an overview of the systems, the problem and points out things to take note of.



From the early days of mathematics through to software development, GIS and internet mapping, axes have played an important role, whether in a datum transformation or simply in displaying a map.


Is it X,Y,Z or Z,Y,X?


As we know in mathematics, a coordinate system is a system for assigning an n-tuple of numbers or scalars to each point in n-dimensional space.



We are familiar with the following coordinate systems involved in GIS/Mapping and Geodesy:


  • Cartesian coordinate system, which may be called "rectangular", where for 3D space, it uses three numbers representing some distance

  • Polar coordinate systems

  • Curvilinear coordinates, which are based on an intersection of curves


Delving further into Polar coordinate systems, we see the following subtypes:



  • Circular coordinate systems, which represent a point in a plane by an angle and a distance from the origin

  • Cylindrical coordinate systems, which represent a point in space by an angle, a distance from an origin and a height

  • Spherical coordinate systems, which represent a point by two angles and a distance from an origin


In mapping and geodesy, we deal quite often with Spherical coordinate systems and often refer to them as Geographic Coordinate Systems.
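As a quick illustration of "two angles and a distance", here is a small Python sketch that converts a latitude, longitude and radius to Cartesian X, Y, Z. It assumes a perfect sphere (not a spheroid), latitude measured from the equator, and longitude measured from the X axis; the function name is my own.

```python
import math

def spherical_to_cartesian(lat_deg, lon_deg, radius):
    """Convert latitude/longitude in degrees plus a radius to (x, y, z).

    Assumes a sphere, with latitude measured north from the equator and
    longitude measured east from the X axis.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * math.cos(lat) * math.cos(lon)
    y = radius * math.cos(lat) * math.sin(lon)
    z = radius * math.sin(lat)
    return x, y, z
```

A real geographic coordinate system sits on a spheroid, so the production formulas also involve the ellipsoid's semi-major axis and eccentricity.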


Ordered Pairs & Coordinate Systems




In GIS software and mapping software, we have three different perspectives on an Ordered Pair (2-tuple). They come from computer science, mathematics and, of course, Geographical Coordinate Systems.



Let's take a quick look at how these distinct perspectives see their world:





In computer science and computer graphics, the axis order is (X,Y), where unsigned values increase to the bottom and to the right.



Mathematics sees this world differently for the same axis order (X,Y), where signed values increase to the right and upwards.



In the world where we are most involved, Geographical Coordinate Systems, the axis order varies, sometimes being (X,Y) and sometimes (Y,X). The signed values increase upwards and to the right, based on a spheroid, hence the bounds of -180 to 180 for longitude and -90 to 90 for latitude.
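The three perspectives can be sketched in a few lines of Python. This is only an illustration: the function names and the "lon,lat" / "lat,lon" labels are my own, and the screen mapping is a plain linear stretch rather than a real map projection.

```python
def to_lon_lat(pair, axis_order):
    """Return (lon, lat) from a pair whose axis order is stated explicitly."""
    if axis_order == "lon,lat":
        lon, lat = pair
    elif axis_order == "lat,lon":
        lat, lon = pair
    else:
        raise ValueError("axis_order must be 'lon,lat' or 'lat,lon'")
    # Range checks catch many, but not all, swapped pairs.
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range: %r" % lon)
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range: %r" % lat)
    return lon, lat

def to_screen(lon, lat, width, height):
    """Map (lon, lat) to a pixel (column, row) with the origin at top-left.

    Columns increase to the right and rows increase downward, which is why
    the latitude term is flipped.
    """
    column = (lon + 180.0) / 360.0 * width
    row = (90.0 - lat) / 180.0 * height
    return column, row
```

Note that a swapped pair such as (48.43, 12.5) passes both range checks in either order, so validation is no substitute for documenting the axis order.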



Rotation Confusion as Well - Which Sign is positive?



There are 2 different conventions in use in the survey and mapping industry for defining rotations.



This too has led to considerable confusion in the GIS and mapping world.



Both are valid when used properly.



The two conventions can be referred to as:



1) Position Vector rotation (Commonly referred to as the Bursa-Wolf)


2) Coordinate Frame rotation


This essentially comes down to the left-handed vs. right-handed rotations (see image above) for the various transforms.

But what does this mean with left vs. right? Well, this is one way of determining orientation of axes and direction of rotations.

  • Thumb = Positive X
  • Index up = Positive Y
  • Middle out = Positive Z


Clifford's Point of View

Clifford Mugnier of LSU, whom I met when I lived in Houston, gives a good explanation and way of handling rotations and coordinate systems. His reply can be found here.

My quote follows:

"Probably the best way to document a rotation method is:

use the accepted terms "coordinate frame" and "position vector".

These terms are also used in other disciplines like kinematics (robotics).

But there are more difficulties. One datum transformation method is laid down in the ISO 19111 standard.

This is an approximated 7-parameter Helmert transformation with position vector rotation. See also ISO/IEC 18026 - Annex B.

PROJ uses about the same method, only the scaling method differs. PROJ does a scalar multiplication, ISO a matrix multiplication.

With the commonly used parameter ranges, the differences between the scaling methods are less than microns, so not important.

As far as I know, the Bursa-Wolfe transform is an approximation to the Helmert transform. The Helmert transform has sines and cosines in the rotation matrices, whereas Bursa-Wolfe (and ISO 19111) use the angles themselves (since sin(a) ~ a, sin(a)*sin(b) ~ 0, and cos(a) ~ 1 for small angles).

If you read section B.6 of ISO/IEC 18026, then you'll notice that a Bursa-Wolfe transform can be done with a position vector rotation model OR with a coordinate frame rotation model.

Just what one likes the best; be sure to use the correct sign of the rotation angles.

Therefore:

Bursa-Wolfe is NOT equivalent with this or that rotation model.

A well known expert repeatedly states that the Australians use the same datum transform rotation model as the Americans.

This is NOT true!

The order of the rotations differs (XYZ vs. ZYX).

See the Australian GDA Technical Manual.

By the way, if the rotations are approximated, then the order is not important.

Again, the differences in rotation order for real life numbers are literally microscopic."

He clearly points out, and I agree, that it is "silly to refer to a datum transformation method as an American, Australian, European, whatever regional model. If you want to document the way for instance an application transforms, give the complete formulae, not just an ill-defined name. Why not referring to the EPSG coordinate transformation method numbers? These clearly define the most used datum transformation methods and projection methods."

A simple solution to a complex problem of rotations.
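The claims in the quote about rotation order and small-angle approximation are easy to check numerically. The sketch below is my own illustration, not code from PROJ or ISO: it assumes the position vector sign convention, uses made-up rotations of 0.5, 1.0 and 1.5 arc seconds, and compares the exact rotation matrix built in XYZ order, the exact matrix built in ZYX order, and the small-angle approximation.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # one arc second in radians

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def small_angle(rx, ry, rz):
    """Approximate rotation matrix using sin(a) ~ a and cos(a) ~ 1."""
    return [[1.0, -rz, ry], [rz, 1.0, -rx], [-ry, rx, 1.0]]

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

# Hypothetical rotation parameters, typical in size for datum shifts.
rx, ry, rz = 0.5 * ARCSEC, 1.0 * ARCSEC, 1.5 * ARCSEC

xyz_order = matmul(rot_z(rz), matmul(rot_y(ry), rot_x(rx)))  # X applied first
zyx_order = matmul(rot_x(rx), matmul(rot_y(ry), rot_z(rz)))  # Z applied first
approximated = small_angle(rx, ry, rz)

order_gap = max_abs_diff(xyz_order, zyx_order)
approx_gap = max_abs_diff(xyz_order, approximated)

# Scale the matrix differences by an Earth-sized radius to get metres.
EARTH_RADIUS_M = 6378137.0
print("order difference (m):", order_gap * EARTH_RADIUS_M)
print("approximation difference (m):", approx_gap * EARTH_RADIUS_M)
```

The matrices do differ, so the order is a real property of the method, but for rotations of this size the effect on a geocentric coordinate is far below survey accuracy. That is the argument for documenting the formula rather than a regional nickname.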

The key is to document, document again, and be sure to specify what is being done and which axes are being used, and to let your users know. Do not assume, ask questions, and your life will be easier when dealing with coordinate systems and rotations.

Or even simpler, as Clifford points out, refer to the coordinate transformation method numbers referenced in the EPSG data and data model for defined coordinate systems.

Friday, November 2, 2007

CS-Map, Open Source & FDO - Autodesk Speaks

This article was recently published by the Australian PC World magazine. In it they interview Autodesk's Liam Speden about CS-Map and how it will be integrated into OSGeo.

CS-Map currently supports many projections and over 3000 coordinate systems.

It is an interesting read. It shows how Autodesk believes that, with so many coordinate systems out there, Open Source can benefit their customers, and how the rapid development and implementation that takes place in the Open Source world produces a sound and stable product used by real users.

Take a look at some of the books I've listed on the side bar. They explain Open Source quite well and how innovation can happen elsewhere.

EPSG & UKOOA - Defining Co-ordinates in Digital Data Exchange Formats



Introduction to the Offshore



UKOOA and the EPSG have been working together for many years. Through their collaboration, they have developed many standards and UKOOA has been open to listening to the industry about positioning in the North Sea.





When seismic data is acquired, whether it be 2-D or 3-D seismic surveys, the shotpoints (energy source, common mid-point, etc.) need to be positioned or referenced on surface.



Over the years UKOOA has developed various formats, named via a version and a year. This is a short introduction to how these files describe positions in the oil and gas industry.



UKOOA P1/90



Information is described in the Header for the file. The Header records follow the convention listed below:



Item                          Column(s)   Format
Record Identifier "H"         1           A1
Header Record Type            2-3         I2
Header Record Type Modifier   4-5         I2
Parameter Description         6-32        A27
Parameter Data                33-80       Varies
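The fixed column layout above translates directly into string slices. Here is a minimal Python sketch of a header parser; the function name, the dictionary keys and the sample record are my own inventions for illustration, and a real reader would also need to handle the per-record layout of the Parameter Data field.

```python
def parse_header(line):
    """Split one 80-column P1/90 header record into its fixed-width fields."""
    record = line.rstrip("\r\n").ljust(80)  # pad short lines to 80 columns
    if record[0] != "H":
        raise ValueError("not a header record")
    return {
        "record_type": int(record[1:3]),       # columns 2-3, I2
        "modifier": int(record[3:5]),          # columns 4-5, I2
        "description": record[5:32].strip(),   # columns 6-32, A27
        "data": record[32:80].strip(),         # columns 33-80, varies
    }

# Hypothetical record, built to match the column layout.
sample = "H" + "17" + "00" + "VERTICAL DATUM".ljust(27) + "MSL"
fields = parse_header(sample)
```

Record type 17 with modifier 00 is what the text refers to as H1700.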



Using the above as a basis, let us take a look at how Datum and Spheroid information is described in this file.



Header records H1600 and H1601 are required for Datum Transformation parameters used by the Bursa-Wolf Transformation.



Reviewing the Bursa-Wolf Transformation (as vectors), we see the following:



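\[
\begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix}
=
\begin{pmatrix} \mathrm{DX} \\ \mathrm{DY} \\ \mathrm{DZ} \end{pmatrix}
+ \mathrm{SCALE} \cdot
\begin{pmatrix}
1 & -\mathrm{RZ} & +\mathrm{RY} \\
+\mathrm{RZ} & 1 & -\mathrm{RX} \\
-\mathrm{RY} & +\mathrm{RX} & 1
\end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
\]



where

X,Y,Z are the input geocentric cartesian coordinates defined in metres, and X',Y',Z' are the transformed coordinates
DX,DY,DZ are the translation parameters defined in metres
RX,RY,RZ are clockwise rotations defined in arc seconds, but are converted to radians for use in the formula


SCALE = 1 + (S × 10^-6), where S is in parts per million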
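The formula above can be written out term by term in Python. This is a sketch of the matrix expression as given in this post, not code taken from the P1/90 specification; the function and parameter names are mine, and you should confirm the rotation sign convention of your parameters before using anything like it.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)

def bursa_wolf(xyz, dx, dy, dz, rx_sec, ry_sec, rz_sec, s_ppm):
    """Apply the 7-parameter formula above to one geocentric point.

    xyz and dx, dy, dz are in metres, rotations in arc seconds,
    and s_ppm is the scale difference in parts per million.
    """
    x, y, z = xyz
    rx = rx_sec * ARCSEC_TO_RAD
    ry = ry_sec * ARCSEC_TO_RAD
    rz = rz_sec * ARCSEC_TO_RAD
    scale = 1.0 + s_ppm * 1e-6
    # Each line is one row of the matrix multiplied out.
    x_out = dx + scale * (x - rz * y + ry * z)
    y_out = dy + scale * (rz * x + y - rx * z)
    z_out = dz + scale * (-ry * x + rx * y + z)
    return x_out, y_out, z_out
```

With all rotations and the scale set to zero this reduces to a plain three-parameter shift, which is a handy sanity check when reading parameters out of an H1600/H1601 record.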



The Vertical Datum is identified by Header record H1700. Some examples of the vertical datum, in relation to offshore work, are:



LAT - Lowest Astronomic Tide
MSL - Mean Sea Level
SL - Sea Level
ES - Echo Sounder



The units of measurement are specified in H2001. These should be consistent with the position data. The height unit code will be 1 for metres, 2 for any other unit of measure. Header H2002 specifies the Angular unit code to 1 for degrees, 2 for grads.



Projection Data is specified in Header records H1800 to H2509.

In this older format, the following projection codes were defined and used:



001 - UTM Northern Hemisphere
002 - UTM Southern Hemisphere
003 - Transverse Mercator (North Oriented)
004 - Transverse Mercator (South Oriented)
005 - Lambert Conic Conformal with one standard parallel
006 - Lambert Conic Conformal with two standard parallels
007 - Mercator
008 - Cassini-Soldner
009 - Skew Orthomorphic
010 - Stereographic
011 - New Zealand Map Grid
999 - Any other projection or non-standard variation of the 11 listed above



Since this initial positioning file was developed with the help of surveyors, they planned ahead and answered the question: What happens when we cross the Equator?



When a survey crosses from the South to the North, and the whole survey is shot on a Southern Hemisphere UTM Zone, the coordinates will possibly exceed 9,999,999.9. This is not acceptable in the P1/90 format, so Header record H2600 must indicate that 10,000,000 must be added to the co-ordinates.


More detail about this specification can be found here.





As the industry matured, new versions were released, the next in 1994. This was called P2/94 and was designed for raw marine positioning data.


UKOOA P2/94


During this time, differential positioning with GPS was just being implemented and the industry was beginning to rely on this technique more and more for offshore surveying.


This format is based on UKOOA P2/91 and has extended many of the definitions needed for differential GPS.


As operators maintain and store data in these formats, P2/94 also acts as an archiving format and records information such as the satellite ephemerides, ionospheric conditions and weather/meteorological conditions of the survey.


With the move to P2/94, Geodetic information moved to new headers, which are described as follows:


H0100 Magnetic Variation - General Information
H0101 Magnetic Variation - Grid Data
H011# Datum and Spheroid Definitions


where # = 1..9 and is the datum & spheroid number


H0120 Seven Parameter Cartesian Datum Shifts
H0130 Other Datum Shift Parameters
H0140 Projection Type
H0150 (Universal) Transverse Mercator Projection
H0160 Mercator Projection
H0170 Lambert Projection
H0180 Skew Orthomorphic & Oblique Mercator Projection
H0181 Skew Orthomorphic & Oblique Mercator Projection cont.
H0190 Stereographic Projection
H0199 Any other Projection

Satellite System Definitions

H600# Satellite System Description
H610# Definition of Differential Reference Stations
H620# Satellite Receiver Definition
H6300 GPS parameter recording strategy
H6301 DGPS differential recording strategy
H631# GPS clock and ephemerides parameters
H632# GPS ionospheric model & UTC parameters
H6330 Meteorological parameters
H65## DGPS differential correction source defn
H66## DGPS differential correction source defn
H67@0 GPS ellipsoid height estimate


Rotation Conventions


Note that 2 different conventions are in use in the survey industry for defining rotations. This has led to considerable confusion in the GIS and mapping world. Both are valid when used properly.


The two conventions can be referred to as:



1) Position Vector rotation (Commonly used in Europe and referred to as the Bursa-Wolf)
2) Coordinate Frame rotation (Commonly used in North America)

I will talk more about these on a later blog, as it comes into play with a lot of GIS and mapping software.

More detailed information about the P2 format can be found here.


UKOOA P5/94


This version came along to facilitate the exchange of position data for pipelines, flowlines, umbilicals and power cables offshore.


In these cases, the data required for pipeline positions are the Latitude, Longitude, Easting, Northing, Depth, and Kilometre Point (KP), along with the standard datum and map projection parameters.


Without wanting to bore my readers with more H records, you can find out more about how the pipelines are stored in this format here.


UKOOA P6/98



In 1998, a new version was developed for 3D seismic surveys and binning.



This is quite complex and would make this short blog even longer, so I'll write about this format at a later date.


The main emphasis of this blog, though, is to show how formats can change over time as technology and data sharing increase. It also points out the importance of knowing the format of your data, especially if you are doing historical work over a region: do not assume a specific data format. This is partly why the OGP has started the Joint Industry Project I mentioned in an earlier post.


UKOOA P7/2000


Wells deviate. With the advent of horizontal wells and sidetracks, and relating to seismic surveys, we enter a whole other story again.


In this story, as well (bad pun!), we have to consider height measures (such as Kelly-Bushing), and the 4 Norths (which I will explain on a later post).


As this file type would make this blog even longer, I'm going to jump ahead to what the EPSG and UKOOA are doing now in defining the Header records for this specific part of the oil and gas industry.


How the EPSG comes into Play


Turning our eyes to the EPSG in P formatted files: to enable integrity checking of co-ordinate system definitions in the UKOOA P1, P2, P5 and P6 formats, provision is made to describe a co-ordinate system by reference to the European Petroleum Survey Group (EPSG) database of geodetic parameters. This is the group of codes we see in use throughout the GIS field and in products such as those from ESRI and PROJ.4.


This allows an industry-standard name to be quoted where the geodetic co-ordinate system used is a common one. The defining parameters and units are then as given by EPSG and do not strictly need to be given explicitly in the P-format records.


As an integrity check, it is considered good practice also to include the explicit definition. The new records, which can be used as extensions within the P1/90, P2/94, P5/94 and P6/98 formats, are:



H8000 EPSG Geographic CS Name
H8001 EPSG Geographic CS Code
H8002 EPSG Projected CS Name
H8003 EPSG Projected CS Code
H8004 EPSG Vertical CS Name
H8005 EPSG Vertical CS Code
H8006 EPSG Database Version



As we know, co-ordinate systems may be two- or three- dimensional.


A vertical co-ordinate system is one-dimensional.


For the P1, P2 and P5 formats:



the H8002, H8003 and H8006 records are required when latitude, longitude, easting and northing but no height or depth are given;


the H8002, H8003, H8004, H8005 and H8006 records are required when latitude, longitude, easting, northing and gravity related height or depth are given;


the H8000, H8001, H8002, H8003 and H8006 records are required when latitude, longitude, easting, northing and ellipsoidal height or depth are given.



For the P6 format, the H8002, H8003 and H8006 records are required.
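The rules above amount to a small lookup. Here is a Python sketch that returns the required H80xx records for a given format and kind of height; the function name and the "gravity" / "ellipsoidal" labels are my own shorthand for the three cases listed.

```python
def required_epsg_records(p_format, height=None):
    """Return the H80xx records required, following the rules listed above.

    p_format is "P1", "P2", "P5" or "P6". height is None when no height or
    depth is given, "gravity" for gravity related heights or depths, and
    "ellipsoidal" for ellipsoidal heights or depths.
    """
    projected = ["H8002", "H8003"]
    version = ["H8006"]
    if p_format == "P6":
        return projected + version
    if p_format not in ("P1", "P2", "P5"):
        raise ValueError("unknown format: %r" % p_format)
    if height is None:
        return projected + version
    if height == "gravity":
        return projected + ["H8004", "H8005"] + version
    if height == "ellipsoidal":
        return ["H8000", "H8001"] + projected + version
    raise ValueError("unknown height type: %r" % height)
```

A check like this is a natural first step in a file validator, before comparing the quoted EPSG codes against the explicit parameters in the other header records.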



That is the way UKOOA and the EPSG see the offshore world when it comes to positioning and exploration in the North Sea and elsewhere.


Exploration & Production Blog


If you are interested, I also write another blog on the oil and gas industry, mainly describing where exploration is occurring, the technology being used, the history of a region, some geology, and some aspects of the UN Convention on the Law of the Sea.


The blog is located here.

Enjoy!

Thursday, November 1, 2007

Quarter Degree Grid Cells: Another way of Mapping Africa


Recently, Ragnvald Larsen of the Norwegian University of Science and Technology (NTNU) in Trondheim released some news to the Spatial Data Infrastructure for Africa (SDI-Africa) mailing list.

He has been working on a project to create Quarter Degree Grid Cells for national-level mapping of the African countries.

But what are Quarter Degree Grid Cells?

Quarter Degree Grid Cells (QDGC) are a way of dividing longitude and latitude degree square cells into smaller squares, forming in effect a system of geocodes. This is similar to the NTS system in Canada (for mapping the Northern areas of Canada) and to the way the North Sea is mapped when determining leases by the various countries involved (Norway, Denmark, UK, Germany, the Netherlands). An example of how the North Sea is subdivided by country follows:

The respective sectors are divided by median lines agreed in the late 1960s.

In the United Kingdom, the UKCS (United Kingdom Continental Shelf) is divided into quadrants of 1 degree latitude by 1 degree longitude. Each quadrant is divided into 30 blocks measuring 10 minutes of latitude by 12 minutes of longitude.

Norway has a similar model and is divided into quadrants of 1 degree by 1 degree. Norwegian licence blocks are larger than British blocks, being 15 minutes of latitude by 20 minutes of longitude (12 blocks in a quad).

In Denmark, the Danish sector of the North Sea is divided into 1 degree by 1 degree quadrants, and their blocks are 10 minutes latitude by 15 minutes longitude.

Germany and the Netherlands share a quadrant and block grid, in which quadrants are given letters rather than numbers. The blocks are 10 minutes latitude by 20 minutes longitude. The Dutch sector is located in the Southern Gas Basin.
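The block counts quoted above follow from simple arithmetic on a 1 degree by 1 degree quadrant, which a few lines of Python can confirm. The function name is my own, and the sketch assumes the block sizes divide the quadrant evenly.

```python
def blocks_per_quadrant(lat_minutes, lon_minutes):
    """Count blocks of the given size in a 1 degree by 1 degree quadrant."""
    rows, row_remainder = divmod(60, lat_minutes)
    columns, column_remainder = divmod(60, lon_minutes)
    if row_remainder or column_remainder:
        raise ValueError("block size does not divide a degree evenly")
    return rows * columns

uk_blocks = blocks_per_quadrant(10, 12)      # 6 rows x 5 columns
norway_blocks = blocks_per_quadrant(15, 20)  # 4 rows x 3 columns
denmark_blocks = blocks_per_quadrant(10, 15) # 6 rows x 4 columns
```

The UK and Norwegian results match the 30 and 12 blocks mentioned above. The Danish figure is my own derivation from the stated block size.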

So the theory of using grid squares has been around for quite some time.

Almost Equal Areas

QDGC represents a way of making (almost) equal area squares covering a specific area to represent specific qualities of the area covered. The squares themselves are based on the degree squares covering earth.

We know that around the equator there are 360 longitudinal lines. For latitude, i.e. from the north to the south pole, we have 180 latitudinal lines. Multiplying, we determine that this gives 64,800 segments or tiles that can cover the earth. The squares become more rectangular the further north or south we move. At the poles they are not square or even rectangular at all, but end up as elongated triangles.

Each degree square is designated by a full reference to the main degree square.

An Example using Tanzania

Taking from the project web-site, I'll use their example for Tanzania.

S01E010 is a reference to a square in Tanzania. S means the square is south of the equator, and E means it is east of the zero meridian.

The numbers refer to the degrees of latitude (01) and longitude (010).

A square with no sublevel reference is also called QDGC level 0.

This is a square based on a full degree of longitude by a full degree of latitude. The QDGC level 0 squares are themselves divided into four.

A grid at this level is shown as:

AB
CD


Smaller squares are determined by dividing the above squares into 4 again.

So if we divide one of the four squares of S01E010 into four again, a resulting square would be referenced as, for example, S01E010AD.

The number of squares for each QDGC level can be calculated using the following formula:

number of squares = (2^d)^2 where d is QDGC level
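Here is a small Python sketch of both the counting formula and the lettering scheme. It is not the project's code. It assumes north is up in the AB/CD grid (A is north-west, B north-east, C south-west, D south-east), and it assumes the degree square is named after its edge nearest the equator and the zero meridian, so check both against the project documentation before relying on it.

```python
import math

def qdgc_square_count(level):
    """Number of squares per degree square at a QDGC level: (2^d)^2."""
    return (2 ** level) ** 2

def qdgc_code(lat, lon, level):
    """Build a QDGC-style reference for a point (assumed conventions)."""
    south = math.floor(lat)  # southern edge of the degree square
    west = math.floor(lon)   # western edge of the degree square
    # Assumption: the square is named after its edge nearest zero.
    lat_name = south if south >= 0 else -(south + 1)
    lon_name = west if west >= 0 else -(west + 1)
    code = "%s%02d%s%03d" % (
        "N" if south >= 0 else "S", lat_name,
        "E" if west >= 0 else "W", lon_name,
    )
    fy = lat - south  # fraction of the way north within the square
    fx = lon - west   # fraction of the way east within the square
    letters = {(True, False): "A", (True, True): "B",
               (False, False): "C", (False, True): "D"}
    for _ in range(level):
        north_half = fy >= 0.5
        east_half = fx >= 0.5
        code += letters[(north_half, east_half)]
        # Rescale the fractions to the chosen quarter for the next level.
        fy = (fy - 0.5) * 2 if north_half else fy * 2
        fx = (fx - 0.5) * 2 if east_half else fx * 2
    return code
```

Under these assumptions, qdgc_code(-1.375, 10.375, 2) gives S01E010AD, and each extra level halves the cell side, so a level 2 cell is a quarter of a degree across.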

To put all of the above theory into practice, there is code available that will compute a Quarter Degree Grid Cell. Please follow the link here.

Project Information

For more information on this project and the work done by Ragnvald Larsen, please take a look at QDGC.

The attached image shows QDGC being used for mappings of the fires in Africa in the year 2000.