Creating a basic Mondrian OLAP Cube

最新推荐文章于 2025-07-29 21:17:35 发布

Jerrysbest

最新推荐文章于 2025-07-29 21:17:35 发布

阅读量1.3k

点赞数

分类专栏： BI 文章标签： basic transformation database properties excel schema

BI 专栏收录该内容

8 篇文章

订阅专栏

本文介绍如何使用Mondrian OLAP引擎创建分析立方体，并通过实例演示了从Excel文件转换为Mondrian立方体的过程。文章还详细解释了事实表、维度和度量的概念。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

This post is a hands on tutorial on how to create an analysis cube for the Mondrian OLAP engine. It is an introductory post for an audience with no OLAP experience at all. I will assume some experience with relational databases and the Kettle ETL tool. If you’d like to follow the examples you will need access to a database, a copy of Pentaho Kettle and a Mondrian installation. I will be using MySQL as RDBMS and JasperServer 3.7. CE for the Mondrian installation. Other possibilities include Pentaho BI-Server and a bare bones Mondrian/JPivot and PAT installation.

The example data

For the example data I would like to use a small and simple dataset, so it is easy to share and easy to understand. Therefore I decided to use an Excel extract from the public issue tracker for the Kettle tool . It has one thousand lines of issues, bugs and feature requests that I would like to put into an OLAP cube for analysis. Each row contains the issue type, a summary, the assignee, a priority, a status and a resolution. I took the liberty to replace the real names of the assignees with figures from Sesame Street in the input data. And no, I will not reveal who is who

Download the Excel file that contains the example data.

Transforming the Excel file to a Mondrian cube

A Mondrian cube is stored in a relational database. The central table for an OLAP cube is called the fact table. It contains entities (think: rows) whose properties are analyzed in the OLAP cube. For the example cube an issue will be a fact. It will have the properties type, assignee, priority, status and resolution. I’d like to drop the summary field, because it contains free text information that is of little use in aggregations or breakdowns.

The simplest path from the Excel sheet to the database is to create the table in the database and to import the Excel sheet. I will be using Kettle for that. I’ll be using a counter to create a key for each issue. Other than that the import transformation is just your garden variety database import.

As a result the facts table looks like this:

        CREATE 
        TABLE 
        fact_issue 
       
        ( 
       
        id  
        INT 
        PRIMARY 
        KEY 
        , 
       
        issue_type TINYTEXT, 
       
        assignee TINYTEXT, 
       
        priority TINYTEXT, 
       
        status TINYTEXT, 
       
        resolution TINYTEXT 
       
        );

Get the import transformation if you want to follow along.

Now that the data has made it to the database Mondrian needs to be configured to use it.

Telling Mondrian about the cube

Mondrian cubes are configured using schema files . The schema is going to hold a single cube. We must give it a name and define dimensions for it. A dimension is an OLAP concept that I will not bother to define with academic precision at this point. For the time being I’d like to think of a Mondrian dimension as a property of a fact (i.e. issue) that we can break down along. As such the issue type, assignee, priority, status and resolution are each dimensions of an issue. OLAP dimensions are always hierarchical in nature. The issue dimensions only have one hierarchy level each. In real world applications dimensions may have multiple hierarchies with multiple levels each. If you’re wondering how this can possibly be useful, imagine a date dimension that breaks down by year/quarter/month/day or by year/week/weekday. That’s two hierarchies with four levels and three levels respectively. An OLAP cube also needs at least one measure. Again, academic definitions are probably not useful at this point. Therefore I’d like to think of a measure as something describing a quantitative value. In the cube at hand an obvious measure seems to be the amount of issues in any given breakdown. It is interesting how many open issues are not assigned yet, for instance. In real world cubes measures usually address questions beginning with “how many?” and “how much?”

The discussion above leads to the following schema file :

 
  
   
    ?


   

   
       
        1
        
       
        2
        
       
        3
        
       
        4
        
       
        5
        
       
        6
        
       
        7
        
       
        8
        
       
        9
        
       
        10
        
       
        11
        
       
        12
        
       
        13
        
       
        14
        
       
        15
        
       
        16
        
       
        17
        
       
        18
        
       
        19
        
       
        20
        
       
        21
        
       
        22
        
       
        23
        
       
        24
        
       
        25
        
       
        26
        
       
        27
        
       
        28
        
       
        29
        
       
        30
        
       
        31
        
       
        32
       
 
       
        
         <? 
         xml 
         version 
         = 
         "1.0" 
         ?> 
         
        
         < 
         Schema 
         name 
         = 
         "IssueSchema" 
         > 
         
        
              
         < 
         Cube 
         name 
         = 
         "Issue" 
         > 
         
        
                  
         < 
         Table 
         name 
         = 
         "fact_issue" 
         /> 
         
        
                  
         < 
         Dimension 
         name 
         = 
         "Type" 
         > 
         
        
                      
         < 
         Hierarchy 
         hasAll 
         = 
         "true" 
         allMemberName 
         = 
         "All Types" 
         > 
         
        
                          
         < 
         Level 
         name 
         = 
         "Type" 
         column 
         = 
         "issue_type" 
         uniqueMembers 
         = 
         "true" 
         /> 
         
        
                      
         </ 
         Hierarchy 
         > 
         
        
                  
         </ 
         Dimension 
         > 
         
        
                  
         < 
         Dimension 
         name 
         = 
         "Assignee" 
         > 
         
        
                      
         < 
         Hierarchy 
         hasAll 
         = 
         "true" 
         allMemberName 
         = 
         "All Assignees" 
         > 
         
        
                          
         < 
         Level 
         name 
         = 
         "Assignee" 
         column 
         = 
         "assignee" 
         uniqueMembers 
         = 
         "true" 
         /> 
         
        
                      
         </ 
         Hierarchy 
         > 
         
        
                  
         </ 
         Dimension 
         > 
         
        
                  
         < 
         Dimension 
         name 
         = 
         "Priority" 
         > 
         
        
                      
         < 
         Hierarchy 
         hasAll 
         = 
         "true" 
         allMemberName 
         = 
         "All Priorities" 
         > 
         
        
                          
         < 
         Level 
         name 
         = 
         "Priority" 
         column 
         = 
         "priority" 
         uniqueMembers 
         = 
         "true" 
         /> 
         
        
                      
         </ 
         Hierarchy 
         > 
         
        
                  
         </ 
         Dimension 
         > 
         
        
                  
         < 
         Dimension 
         name 
         = 
         "Status" 
         > 
         
        
                      
         < 
         Hierarchy 
         hasAll 
         = 
         "true" 
         allMemberName 
         = 
         "All Status" 
         > 
         
        
                          
         < 
         Level 
         name 
         = 
         "Status" 
         column 
         = 
         "status" 
         uniqueMembers 
         = 
         "true" 
         /> 
         
        
                      
         </ 
         Hierarchy 
         > 
         
        
                  
         </ 
         Dimension 
         > 
         
        
                  
         < 
         Dimension 
         name 
         = 
         "Resolution" 
         > 
         
        
                      
         < 
         Hierarchy 
         hasAll 
         = 
         "true" 
         allMemberName 
         = 
         "All Resolution" 
         > 
         
        
                          
         < 
         Level 
         name 
         = 
         "Resolution" 
         column 
         = 
         "resolution" 
         uniqueMembers 
         = 
         "true" 
         /> 
         
        
                      
         </ 
         Hierarchy 
         > 
         
        
                  
         </ 
         Dimension 
         > 
         
        
                  
         < 
         Measure 
         name 
         = 
         "Issue Count" 
         column 
         = 
         "id" 
         aggregator 
         = 
         "count" 
         formatString 
         = 
         "Standard" 
         /> 
         
        
              
         </ 
         Cube 
         > 
         
        
         </ 
         Schema 
         >

The schema file must be presented to the Mondrian engine. I can not give a complete account on how to do this, since the necessary steps depend on the server software (and version) you are using. The screencast shows the steps necessary on JasperServer 3.7 CE. You’ll have to consult the manual of your server solution to find out how to introduce a new OLAP schema.

Regardless of which server you use, at some point you’ll likely need to provide an MDX query for a view of the new cube. The initial query usually is not very important in interactive environments, since you’ll be able to use the UI to navigate the cube. And since this post is not dealing with MDX as such, I’d like to present a short MDX query that will work as a kickstart for the cube without comment.

        SELECT 
       
        Type.Children  
        ON 
        COLUMNS, 
       
        Assignee.Children  
        ON 
        ROWS 
       
        FROM 
        Issue

Conclusion

This post gives a practical introduction to Mondrian cubes. It shows how to create a minimal, but fully operational Mondrian cube. It also explains that an OLAP cube always revolves around a fact table and tries to give an intuitive understanding of dimensions and measures. Dimensions are essentially hierarchical properties attributed to facts that can be used to examine facts with certain characteristics at various levels of detail. Measures are describing facts in terms of numbers. They answer quantitative questions.

I hope that this little introduction enables Mondrian beginners to get up to speed faster. There is much more to know, of course. The follow up article deals with dimension tables . Check it out if you want the cube to be stored more efficiently.

If you would like to suggest a Mondrian topic, feel free to leave a comment.

Cheers

Slawo