Thursday, April 16, 2009

Best Practices Manual for optimized Modeling using Framework Manager


One Model Versus Too Many Models
It is a common tendency to create one large model containing every possible reporting element, leaving the modeler with a single object to maintain.
Clients often clutter the model with multiple data sources and group the metadata for all business units together to avoid maintaining multiple models; this design approach is unsustainable in the long run.

This can cause more problems than anticipated at first because, as the model grows, so does the dependency on it. Modeling more does not mean modeling better; it is preferable to create smaller, more manageable models rather than one large model with everything in it.

It is recommended to design multiple models based on reporting business units or data sources. For example, an HR project should have a model of its own rather than deriving its packages from an enterprise performance model.

Size of the Model
While there are currently no documented limits to the size of a model, at some point model designers need to evaluate the breadth of the information being put into the model. They need to keep in mind that the model will eventually be used to create a package that is, in turn, loaded into one of the studios.

A good rule of thumb: if the model is too cluttered to view in the diagram viewer in Framework Manager, you are likely on your way to a complex model. Package size affects users both in the UI, where it takes time to navigate the structure to find the desired data elements, and in the length of time it takes to load the package into the studio. Before the package can be consumed by a studio, it is retrieved from the content store and run through a SAX parser, and an RTM file is created. The initial creation and loading of this RTM file takes some time, so the first user accessing the package will see a delay when running a report or loading the package into a studio while the RTM file is created.

Once created, the RTM file is reused for all users accessing the same package; however, in a multi-server configuration the file is created on each dispatcher box as requests are routed to it. Limiting the size of the package limits the access time for the initial user and, in turn, improves the user experience.

Structure
The best practice is to switch to a three-tier architecture, which offers flexibility in designing and maintaining the model.
The three tiers are:
Database Layer: The database layer should contain a base layer that matches the physical database tables and elements. By keeping the database layer as close as possible to the actual database, it is easier to propagate changes in the future. Auto joins should not be enabled at this level. This layer should provide any new or existing developer with an accurate view of the existing database. Adding or manipulating joins, changing business names, or creating aliases and tables (using SQL) at this level will lead to confusion in terms of maintenance, because over time modelers will lose the distinction between tables in the database and the virtual tables and views created by modelers.
Business Layer: This is the sandbox for the modeler, where joins, alias tables, additional filters, calculations, and default prompts can be created. This sandbox allows the modeler to group and organize data and to pick the best way to model it, dimensional or relational. This layer also allows the modeler to maintain business names, providing the flexibility to support future changes to naming conventions. Changes to calculations can be made once and are reflected across multiple reports.

Presentation Layer: This is the view offered to users and report consumers. In this layer, you design query subjects with all the required fields from multiple tables for each reporting business group. You can provide team-specific business names for certain calculations. For example, an amount field may need to be displayed as a credit on investor accounting reports and as a debit on corporate accounting reports; the same amount field may be labeled differently for each business reporting unit.
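As a rough sketch of how the three layers relate, assuming a hypothetical GL_TRANSACTION table and an investor accounting package (none of these names come from an actual model): the database layer mirrors the physical table, the business layer defines joins and business names once, and the presentation layer relabels the same item per reporting unit.

    -- Database layer: query subject that mirrors the physical table as-is
    SELECT txn_id, account_id, txn_date, txn_amount
    FROM   gl_transaction

    -- Business layer: joins, calculations, and business names defined once
    SELECT t.txn_id,
           t.txn_date     AS transaction_date,
           a.account_name,
           t.txn_amount   AS amount
    FROM   gl_transaction t
    JOIN   account a ON a.account_id = t.account_id

    -- Presentation layer: the same amount relabeled per reporting unit,
    -- e.g. "Credit Amount" for investor accounting, "Debit Amount" for corporate
    SELECT transaction_date,
           account_name,
           amount AS credit_amount
    FROM   business_gl_transaction   -- stands in for the business-layer query subject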

Package Design
The model size affects the modeler; the package size affects user interaction in the studios. In particular, fully loaded packages take a very long time to load in Query Studio.
It is good practice to hide or exclude any tables or views that are not required by a particular package. This reduces load times in Query Studio and refresh and validate times in Report Studio.

- Always maintain distinct packages for relational and dimensional (cube-based) data sources.
- Do not import cubes into an existing relational model to publish as a package. Always maintain distinct cube models, for the following reasons:

The categories stored in the cube metadata change and therefore require you to republish the cube package, which in turn overwrites your drill-through definitions. This applies only up to Cognos 8.2; Cognos 8.3 allows you to publish a cube directly (with the necessary domain privileges) to Cognos Connection without having to use Framework Manager.

When you open a package that contains both a relational and a cube data source, an RTM file must be created for each data source, regardless of whether the user's report accesses both. Opening a package with multiple data sources (relational or cubes) can therefore be slow, as an RTM file has to be created for each data source.

Use of Namespaces
It is recommended that you create multiple namespaces in the business layer, one for each of your reporting subdivisions.
The advantages of namespaces are that you limit the number of required tables, you can create report-specific joins, and you reduce the number of alias tables created. Namespaces ensure that only the joins represented within the namespace are used, even if other joins exist between the tables.

Pareto principle
Pareto's principle applies to reporting as well: 80% of reporting is generated from 20% of the tables and fields. It is therefore very important to identify, fine-tune, and constantly monitor this critical 20% of tables or views.
It is also good practice to place the most frequently used fields at the top of a query subject. Always place the identifiers at the top; this subtly encourages users to start with key fields, then dimensions, and then decide on the facts that need to be aggregated.
This also helps in development, where you end up running multiple test queries against fact tables. It encourages ad hoc report developers to start from indexed key fields and add dimensions to confirm they have the required fields and calculations, rather than initiating queries on non-indexed fields that result in expensive full outer joins.
SQL
Use the With clause for better query performance if the request is restricted to functionality supported by the underlying data source software. When a request uses functionality that is not supported by the data source, using the With clause may cause additional decomposition of the query, which can lead to degraded performance. In this case, not using the With clause may generate a better set of queries to the underlying data source.
With pass-through SQL, each query subject is sent to the data source as a separate statement rather than being optimized by Framework Manager, so performance is slower. Therefore, in choosing between native SQL and pass-through SQL, you must decide which is more important: performance, or using SQL that is not permitted in a subquery.
Generally, you should use pass-through SQL only if you must create a query subject that contains constructs that are specific to a data source and that cannot be used inside a derived table, such as in a With or OrderBy clause.
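For illustration, a generated query that uses the With clause might look like the following sketch, assuming hypothetical ORDERS and CUSTOMERS tables:

    WITH order_totals AS (
        SELECT customer_id,
               SUM(order_amount) AS total_amount
        FROM   orders
        GROUP  BY customer_id
    )
    SELECT c.customer_name,
           t.total_amount
    FROM   customers c
    JOIN   order_totals t ON t.customer_id = c.customer_id

When the underlying database supports the With clause, a statement like this can be pushed down as a single request; if it does not, the query is decomposed and more of the work is done locally.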
The SQL Generation type of a query subject can be set to either As View or Minimized. By default, it is set to Minimized.
When the generation type is set to Minimized, the generated SQL contains only the minimal set of tables and joins needed to obtain values for the selected query items.
When the generation type is set to As View, Framework Manager generates queries that contain the full SQL statement that defined the query subject. Use As View when you want to ensure that the query is run as a block; the SQL is treated as a view, for example when you want the query to return the same number of rows each time it is run.
The SQL Generation setting has no effect on a query subject with determinants specified for it.
Using minimized SQL improves performance, resulting in a query that runs significantly faster. Generating minimized SQL is especially beneficial for query subjects that represent dimension tables. By using a single model query subject to model a dimension, you can benefit from small SQL queries that run significantly faster.
For example, a query subject whose SQL Generation Type is As View might contain a nested select statement, as in the sketch below.
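The original example is not reproduced here, but a hypothetical query subject of this kind, built on an assumed PRODUCT_DIM table, might look like this; with As View the whole nested statement is sent as a block, whereas Minimized would reduce it to only the tables and columns actually requested:

    -- Query subject defined with a nested select (SQL Generation Type: As View)
    SELECT product_line,
           product_type,
           product_name
    FROM  (SELECT product_line,
                  product_type,
                  product_name,
                  introduction_date
           FROM   product_dim) product_view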
Usage property
It is very important to define the usage property of each query item after importing it from the data source. In Cognos, all numeric data types are defined as facts by default, which is not always appropriate.
As a best practice, a modeler should always check the usage property and set it correctly as needed, both on query items and on calculated query items. For example, year(date()) results in a numeric data type with a rollup aggregate property of Sum, which gives incorrect results when grouped in a report.
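A sketch of the effect, using an assumed ORDERS table: when the year calculation is left with Sum as its rollup aggregate, a grouped report effectively adds the year values together instead of displaying them.

    -- What is effectively generated when the year item is treated as a fact:
    -- each region shows something like 2009 multiplied by its row count
    SELECT region,
           SUM(EXTRACT(YEAR FROM order_date)) AS order_year
    FROM   orders
    GROUP  BY region

    -- What was intended once the usage property is set to attribute/identifier
    SELECT region,
           EXTRACT(YEAR FROM order_date) AS order_year
    FROM   orders
    GROUP  BY region,
              EXTRACT(YEAR FROM order_date)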
Joins
Join paths are best established based on reporting needs, reporting frequency, and report retrieval limitations. There is no hard and fast rule for resolving loop joins. There are four basic resolutions:
- Break the join
- Create alias tables with different join strategies
- Use the join expression editor to specify the join
- Modify SQL
Each resolution is applied for a different reason and may have issues associated with it. Determine the best resolution for your situation by analyzing the data with regard to the results required from the join structure; the sketch below shows the alias-table approach.
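As an illustration of the alias-table resolution, assume a hypothetical ORDERS table that references CUSTOMERS twice (ship-to and bill-to); joining both roles through a single CUSTOMERS query subject would create a loop, so two aliases are created, each with its own join:

    -- Two aliases of the same CUSTOMERS table, each carrying a distinct join path
    SELECT o.order_id,
           ship_to.customer_name AS ship_to_customer,
           bill_to.customer_name AS bill_to_customer
    FROM   orders o
    JOIN   customers ship_to ON ship_to.customer_id = o.ship_to_customer_id
    JOIN   customers bill_to ON bill_to.customer_id = o.bill_to_customer_id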

To do or not to do Multiple Data source connections
Multiple data source connections allow users to execute reports against different databases that share identical metadata.
They facilitate the migration from one environment to another and maintain the integrity of the project.
This is a very flexible option provided by Cognos. In practice, however, there are many bugs associated with multiple data source connections. To begin with, it takes longer to cache metadata, since the metadata of multiple databases must now be cached. Switching across different databases or cubes takes longer to process, and you may experience report hang-ups and metadata refresh hang-ups in Report Studio design mode.
Multiple data source connections have also been used to access one of two databases whose metadata is not identical; in that case you can expect many issues while generating reports.

Quick Tips
- Move towards a 3 tier architecture
- Organize your model using folders and namespaces
- Create Namespaces instead of Alias tables
- Keep metadata simple by importing only required fields
- Upgrade to Cognos 8.3 or higher
- Occasionally disable the "Allow usage of local cache" property and publish your package
- Run your queries against the plan table and monitor index usage, join paths, and query cost (see the sketch below)
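For example, on an Oracle data source the generated SQL captured from a report can be checked against the plan table like this (the query itself is only a placeholder):

    EXPLAIN PLAN FOR
    SELECT region,
           SUM(order_amount)
    FROM   orders
    GROUP  BY region;

    -- Review index usage, join paths, and cost from the populated plan table
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);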