DATASTAGE
Decision support systems are usually built on Data Warehouse (DW)
infrastructures. A data warehouse architecture has two major areas: the staging
area and the presentation area.
This post presents the staging area. First, the sources from which data will be
systematically extracted and loaded into the DW are determined. The database
schema documentation of these sources is then reviewed in order to design the
data extraction logic. The quality of the documentation of these data
structures influences how difficult the extraction logic is to design.
Extracted data are loaded into the staging area, either as simple files or as
updates to database tables. The staging area may have several stages.
Extracting data from the sources, transforming them into new structures and
loading them into the DW, a process known as ETL, takes place in the staging
area.
The extraction process requires determining the source relational tables and
fields from which data will be extracted (as mentioned above, documentation of
these structures is crucial for the design). The design of the extraction
process determines:
- the frequency of data extraction
- the extraction method (e.g. changes only) and technology (e.g. partial database replication)
- the database instance or the file into which data are initially loaded in the staging area.
Various types of raw data processing take place in the staging area:
- Data standardization: transformation of data to a standard format, if needed
- Sorting of records
- Matching and merging of records of the same entity that are derived from different sources (e.g. order records of the same customer from different order handling systems), after standardization
- Processing of calculated facts (facts derived from detailed data, e.g. the total monetary value of an order)
- Management of surrogate keys, which replace operational system keys
- Enrichment of records with default values, if required
- Production of aggregate data, if needed
- Data conversion according to the technological platform used by the DW (DBMS, operating system).
The ETL process is automated by software and executed periodically to update
the DW.
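To make the staging-area processing types more concrete, here is a minimal
Python sketch that applies several of them to a handful of order records. It is
only an illustration, not how DataStage implements these steps: the sample
records, the field names (customer, order_id, qty, unit_price, country,
customer_sk, total_value) and all function names are hypothetical, and record
matching is simplified to exact equality on the standardized customer name.

```python
from collections import defaultdict
from itertools import count

# Hypothetical order records extracted from two order handling systems.
SOURCE_A = [
    {"customer": " Acme Corp ", "order_id": "A-1", "qty": 2, "unit_price": 10.0, "country": "GR"},
    {"customer": "Beta Ltd", "order_id": "A-2", "qty": 1, "unit_price": 99.5, "country": None},
]
SOURCE_B = [
    {"customer": "ACME CORP", "order_id": "B-7", "qty": 5, "unit_price": 3.0, "country": "GR"},
]


def standardize(record):
    """Data standardization: bring the customer name to one format."""
    out = dict(record)
    out["customer"] = record["customer"].strip().upper()
    return out


def enrich(record, default_country="UNKNOWN"):
    """Enrichment: fill a missing country with a default value."""
    out = dict(record)
    if out.get("country") is None:
        out["country"] = default_country
    return out


def add_calculated_facts(record):
    """Calculated fact: total monetary value of the order."""
    out = dict(record)
    out["total_value"] = record["qty"] * record["unit_price"]
    return out


def assign_surrogate_keys(records, natural_key="customer"):
    """Give each distinct customer an integer surrogate key.

    Records from different sources that share the same standardized name
    receive the same key, which is the (simplified) matching step.
    """
    next_key = count(1)
    key_map = {}
    keyed = []
    for rec in records:
        name = rec[natural_key]
        if name not in key_map:
            key_map[name] = next(next_key)
        keyed.append({**rec, "customer_sk": key_map[name]})
    return keyed, key_map


def aggregate_by_customer(records):
    """Aggregate data: total order value per customer surrogate key."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["customer_sk"]] += rec["total_value"]
    return dict(totals)


def run_staging(*sources):
    """Merge all sources, then apply the processing steps in order."""
    merged = [
        add_calculated_facts(enrich(standardize(rec)))
        for source in sources
        for rec in source
    ]
    merged.sort(key=lambda r: (r["customer"], r["order_id"]))  # sorting of records
    keyed, key_map = assign_surrogate_keys(merged)
    return keyed, key_map, aggregate_by_customer(keyed)


if __name__ == "__main__":
    rows, key_map, totals = run_staging(SOURCE_A, SOURCE_B)
    for row in rows:
        print(row)
    print(key_map)
    print(totals)
```

In this sketch the two "Acme Corp" spellings only match because standardization
runs before key assignment, which mirrors the "after standardization" condition
in the list above. A real matching step would typically need fuzzier rules.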
The volume of data to be extracted is also estimated, in order to plan for
computational and storage capacity. Estimation sheets known as 'volumetric
sheets' are developed with the following information per source field:
- extraction frequency
- estimated volume
- standardization and transformation rules applied (if any)
- DW database field to which data will be loaded.
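As a rough illustration of how such a volumetric sheet could feed capacity
planning, the sketch below models one row per source field and sums the
estimated raw volume over a planning period. Everything in it is an assumption
made for the example: the class and field names, the sample source and DW
fields, the figures, and the simple "extractions x rows x bytes" estimate.

```python
from dataclasses import dataclass


@dataclass
class VolumetricRow:
    """One line of a hypothetical volumetric sheet, per source field."""
    source_field: str
    extractions_per_day: int      # extraction frequency
    rows_per_extraction: int      # estimated volume, in rows
    avg_bytes_per_value: int      # assumed average size of one value
    transformation_rule: str      # standardization/transformation, if any
    dw_field: str                 # DW database field to be loaded

    def bytes_per_day(self) -> int:
        return (self.extractions_per_day
                * self.rows_per_extraction
                * self.avg_bytes_per_value)


# Illustrative figures only; real sheets come from source system analysis.
SHEET = [
    VolumetricRow("orders.customer_name", 1, 50_000, 40,
                  "trim, upper-case", "dim_customer.name"),
    VolumetricRow("orders.amount", 24, 2_000, 8,
                  "", "fact_order.total_value"),
]


def total_bytes(sheet, days):
    """Estimated raw volume extracted over the given number of days."""
    return sum(row.bytes_per_day() for row in sheet) * days


if __name__ == "__main__":
    for row in SHEET:
        print(f"{row.source_field} -> {row.dw_field}: {row.bytes_per_day()} bytes/day")
    print("30-day estimate:", total_bytes(SHEET, 30), "bytes")
```

The estimate covers raw extracted values only. Indexes, staging copies and DBMS
overhead would have to be added on top when sizing actual storage.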
In many cases, data quality assessment and data cleansing steps also take
place in the staging area. Designing and implementing the automated ETL process
often represents a major part of the effort required to develop a DW
(international statistics estimate that it exceeds 70% of the total effort).
The DW staging area is often implemented on a separate physical server (a
staging server), which adds complexity and cost. However, this approach has
certain advantages.
