Flowman Project 'weather' version 1.0

Description: This is a simple yet comprehensive example project for Flowman that uses publicly available weather data. It demonstrates many Flowman features, such as reading and writing data, transforming, joining, filtering, and aggregating. The project also generates meaningful documentation that includes data quality tests.
Generated at 2022-10-13T08:43:08.6

Index

Mappings

Relations

Targets

Mappings

Mapping 'weather/aggregates'

Description: This mapping calculates the aggregated metrics per year and per country

Inputs

[mapping_output] weather/facts:main

Outputs

Output 'main'

No Column Name Data Type Constraints Description Source Columns Quality Checks
1 country STRING The country the weather station belongs to [weather/stations].country
IS NOT NULL ERROR
HAS UNIQUE VALUES ERROR
2 min_wind_speed FLOAT Minimum wind speed [weather/measurements].wind_speed [weather/measurements].wind_speed_qual
min_wind_speed >= 0 ERROR
3 max_wind_speed FLOAT Maximum wind speed [weather/measurements].wind_speed [weather/measurements].wind_speed_qual
max_wind_speed <= 60 ERROR
4 avg_wind_speed DOUBLE The wind speed in m/s (AVG) [weather/measurements].wind_speed [weather/measurements].wind_speed_qual
5 min_temperature FLOAT The air temperature in degree Celsius (MIN) [weather/measurements].air_temperature_qual [weather/measurements].air_temperature
min_temperature >= -100 ERROR
6 max_temperature FLOAT The air temperature in degree Celsius (MAX) [weather/measurements].air_temperature_qual [weather/measurements].air_temperature
max_temperature <= 100 ERROR
7 avg_temperature DOUBLE The air temperature in degree Celsius (AVG) [weather/measurements].air_temperature_qual [weather/measurements].air_temperature
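
The output above amounts to a group-by on country with MIN/MAX/AVG over wind speed and air temperature. The following is a minimal plain-Python sketch of that shape, not the project's actual mapping definition. The function names and the dict-based row format are hypothetical. The lineage lists the `_qual` columns as sources, which suggests the real mapping takes the quality flags into account; how it does so is not shown in this report, so the sketch ignores them.

```python
from collections import defaultdict
from statistics import fmean


def _stats(values):
    """Return (min, max, avg) over the non-null values, or three Nones if there are none."""
    present = [v for v in values if v is not None]
    if not present:
        return None, None, None
    return min(present), max(present), fmean(present)


def aggregate_by_country(facts):
    """Group fact rows (dicts) by country and compute the aggregate columns.

    Assumption: each row has the keys 'country', 'wind_speed' and
    'air_temperature', mirroring the column names of 'weather/facts'.
    """
    groups = defaultdict(list)
    for row in facts:
        groups[row["country"]].append(row)

    result = {}
    for country, rows in groups.items():
        min_w, max_w, avg_w = _stats(r["wind_speed"] for r in rows)
        min_t, max_t, avg_t = _stats(r["air_temperature"] for r in rows)
        result[country] = {
            "min_wind_speed": min_w,
            "max_wind_speed": max_w,
            "avg_wind_speed": avg_w,
            "min_temperature": min_t,
            "max_temperature": max_t,
            "avg_temperature": avg_t,
        }
    return result
```

Null measurements are skipped, which matches how SQL aggregate functions usually treat NULL; a country without any non-null value gets None for the affected metrics.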

Mapping 'weather/facts'

Description:

Inputs

[mapping_output] weather/measurements_joined:main

Outputs

Output 'main'

No Column Name Data Type Constraints Description Source Columns Quality Checks
1 usaf INT The USAF (US Air Force) id of the weather station [weather/measurements].usaf
2 wban INT The WBAN id of the weather station [weather/measurements].wban
3 date DATE The date when the measurement was made [weather/measurements].date
4 time STRING The time when the measurement was made [weather/measurements].time
5 report_type STRING The report type of the measurement [weather/measurements].report_type
6 wind_direction INT The direction from where the wind blows in degrees [weather/measurements].wind_direction [weather/measurements].wind_direction_qual
7 wind_direction_qual STRING The quality indicator of the wind direction. 1 means trustworthy quality. [weather/measurements].wind_direction_qual
8 wind_observation STRING [weather/measurements].wind_observation
9 wind_speed FLOAT The wind speed in m/s [weather/measurements].wind_speed [weather/measurements].wind_speed_qual
10 wind_speed_qual STRING The quality indicator of the wind speed. 1 means trustworthy quality. [weather/measurements].wind_speed_qual
11 air_temperature FLOAT The air temperature in degree Celsius [weather/measurements].air_temperature_qual [weather/measurements].air_temperature
12 air_temperature_qual STRING The quality indicator of the air temperature. 1 means trustworthy quality. [weather/measurements].air_temperature_qual
13 year INT NOT NULL The year of the measurement, used for partitioning the data [weather/measurements].year
14 name STRING An optional name for the weather station [weather/stations].name
15 country STRING The country the weather station belongs to [weather/stations].country
16 state STRING Optional state within the country the weather station belongs to [weather/stations].state
17 icao STRING [weather/stations].icao
18 latitude FLOAT The latitude of the geo location of the weather station [weather/stations].latitude
19 longitude FLOAT The longitude of the geo location of the weather station [weather/stations].longitude
20 elevation FLOAT The elevation above sea level in meters of the weather station [weather/stations].elevation
21 date_begin DATE The date when the weather station went into service [weather/stations].date_begin
22 date_end DATE The date when the weather station went out of service [weather/stations].date_end

Mapping 'weather/measurements'

Description:

Inputs

[relation] weather/measurements

Outputs

Output 'main'

No Column Name Data Type Constraints Description Source Columns Quality Checks
1 usaf INT The USAF (US Air Force) id of the weather station [weather/measurements].usaf
2 wban INT The WBAN id of the weather station [weather/measurements].wban
3 date DATE The date when the measurement was made [weather/measurements].date
4 time STRING The time when the measurement was made [weather/measurements].time
5 report_type STRING The report type of the measurement [weather/measurements].report_type
6 wind_direction INT The direction from where the wind blows in degrees [weather/measurements].wind_direction
7 wind_direction_qual STRING The quality indicator of the wind direction. 1 means trustworthy quality. [weather/measurements].wind_direction_qual
8 wind_observation STRING [weather/measurements].wind_observation
9 wind_speed FLOAT The wind speed in m/s [weather/measurements].wind_speed
10 wind_speed_qual STRING The quality indicator of the wind speed. 1 means trustworthy quality. [weather/measurements].wind_speed_qual
11 air_temperature FLOAT The air temperature in degree Celsius [weather/measurements].air_temperature
12 air_temperature_qual STRING The quality indicator of the air temperature. 1 means trustworthy quality. [weather/measurements].air_temperature_qual
13 year INT NOT NULL The year of the measurement, used for partitioning the data [weather/measurements].year

Mapping 'weather/measurements_extracted'

Description:

Inputs

[mapping_output] weather/measurements_raw:main

Outputs

Output 'main'

No Column Name Data Type Constraints Description Source Columns Quality Checks
1 usaf INT The USAF (US Air Force) id of the weather station [weather/measurements_raw].raw_data
2 wban INT The WBAN id of the weather station [weather/measurements_raw].raw_data
3 date DATE The date when the measurement was made [weather/measurements_raw].raw_data
4 time STRING The time when the measurement was made [weather/measurements_raw].raw_data
5 report_type STRING The report type of the measurement [weather/measurements_raw].raw_data
6 wind_direction INT The direction from where the wind blows in degrees [weather/measurements_raw].raw_data
7 wind_direction_qual STRING The quality indicator of the wind direction. 1 means trustworthy quality. [weather/measurements_raw].raw_data
8 wind_observation STRING [weather/measurements_raw].raw_data
9 wind_speed FLOAT The wind speed in m/s [weather/measurements_raw].raw_data
10 wind_speed_qual STRING The quality indicator of the wind speed. 1 means trustworthy quality. [weather/measurements_raw].raw_data
11 air_temperature FLOAT The air temperature in degree Celsius [weather/measurements_raw].raw_data
12 air_temperature_qual STRING The quality indicator of the air temperature. 1 means trustworthy quality. [weather/measurements_raw].raw_data

Mapping 'weather/measurements_joined'

Description:

Inputs

[mapping_output] weather/measurements:main
[mapping_output] weather/stations:main

Outputs

Output 'main'

No Column Name Data Type Constraints Description Source Columns Quality Checks
1 usaf INT The USAF (US Air Force) id of the weather station [weather/measurements].usaf
2 wban INT The WBAN id of the weather station [weather/measurements].wban
3 date DATE The date when the measurement was made [weather/measurements].date
4 time STRING The time when the measurement was made [weather/measurements].time
5 report_type STRING The report type of the measurement [weather/measurements].report_type
6 wind_direction INT The direction from where the wind blows in degrees [weather/measurements].wind_direction
7 wind_direction_qual STRING The quality indicator of the wind direction. 1 means trustworthy quality. [weather/measurements].wind_direction_qual
8 wind_observation STRING [weather/measurements].wind_observation
9 wind_speed FLOAT The wind speed in m/s [weather/measurements].wind_speed
10 wind_speed_qual STRING The quality indicator of the wind speed. 1 means trustworthy quality. [weather/measurements].wind_speed_qual
11 air_temperature FLOAT The air temperature in degree Celsius [weather/measurements].air_temperature
12 air_temperature_qual STRING The quality indicator of the air temperature. 1 means trustworthy quality. [weather/measurements].air_temperature_qual
13 year INT NOT NULL The year of the measurement, used for partitioning the data [weather/measurements].year
14 name STRING An optional name for the weather station [weather/stations].name
15 country STRING The country the weather station belongs to [weather/stations].country
16 state STRING Optional state within the country the weather station belongs to [weather/stations].state
17 icao STRING [weather/stations].icao
18 latitude FLOAT The latitude of the geo location of the weather station [weather/stations].latitude
19 longitude FLOAT The longitude of the geo location of the weather station [weather/stations].longitude
20 elevation FLOAT The elevation above sea level in meters of the weather station [weather/stations].elevation
21 date_begin DATE The date when the weather station went into service [weather/stations].date_begin
22 date_end DATE The date when the weather station went out of service [weather/stations].date_end

Mapping 'weather/measurements_raw'

Description:

Inputs

[relation] weather/measurements_raw

Outputs

Output 'main'

No Column Name Data Type Constraints Description Source Columns Quality Checks
1 raw_data STRING Raw measurement data [weather/measurements_raw].raw_data
2 year INT NOT NULL The year when the measurement was made [weather/measurements_raw].year

Mapping 'weather/stations'

Description:

Inputs

[relation] weather/stations

Outputs

Output 'main'

No Column Name Data Type Constraints Description Source Columns Quality Checks
1 usaf STRING NOT NULL USAF station id [weather/stations].usaf
2 wban STRING NOT NULL WBAN station id [weather/stations].wban
3 name STRING An optional name for the weather station [weather/stations].name
4 country STRING The country the weather station belongs to [weather/stations].country
5 state STRING Optional state within the country the weather station belongs to [weather/stations].state
6 icao STRING [weather/stations].icao
7 latitude FLOAT The latitude of the geo location of the weather station [weather/stations].latitude
8 longitude FLOAT The longitude of the geo location of the weather station [weather/stations].longitude
9 elevation FLOAT The elevation above sea level in meters of the weather station [weather/stations].elevation
10 date_begin DATE The date when the weather station went into service [weather/stations].date_begin
11 date_end DATE The date when the weather station went out of service [weather/stations].date_end

Mapping 'weather/stations_raw'

Description:

Inputs

[relation] weather/stations_raw

Outputs

Output 'main'

No Column Name Data Type Constraints Description Source Columns Quality Checks
1 usaf STRING NOT NULL USAF station id [weather/stations_raw].usaf
2 wban STRING NOT NULL WBAN station id [weather/stations_raw].wban
3 name STRING An optional name for the weather station [weather/stations_raw].name
4 country STRING The country the weather station belongs to [weather/stations_raw].country
5 state STRING Optional state within the country the weather station belongs to [weather/stations_raw].state
6 icao STRING [weather/stations_raw].icao
7 latitude FLOAT The latitude of the geo location of the weather station [weather/stations_raw].latitude
8 longitude FLOAT The longitude of the geo location of the weather station [weather/stations_raw].longitude
9 elevation FLOAT The elevation above sea level in meters of the weather station [weather/stations_raw].elevation
10 date_begin DATE The date when the weather station went into service [weather/stations_raw].date_begin
11 date_end DATE The date when the weather station went out of service [weather/stations_raw].date_end
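
The "Inputs" entries of the mappings above form a dependency graph. As an illustration only (Flowman resolves these dependencies itself), the sketch below copies the `[mapping_output]` inputs from this report into a dictionary and derives one valid evaluation order with the standard library's `graphlib` (Python 3.9 or later). Mappings that read directly from a relation are listed without upstream mappings. The names `MAPPING_INPUTS` and `evaluation_order` are hypothetical.

```python
from graphlib import TopologicalSorter

# Mapping -> upstream mappings, taken from the "Inputs" sections above.
# Mappings fed by a [relation] have no upstream mapping.
MAPPING_INPUTS = {
    "weather/aggregates": ["weather/facts"],
    "weather/facts": ["weather/measurements_joined"],
    "weather/measurements_joined": ["weather/measurements", "weather/stations"],
    "weather/measurements_extracted": ["weather/measurements_raw"],
    "weather/measurements": [],
    "weather/measurements_raw": [],
    "weather/stations": [],
    "weather/stations_raw": [],
}


def evaluation_order(inputs):
    """Return the mappings in an order where every input precedes its consumer."""
    return list(TopologicalSorter(inputs).static_order())


if __name__ == "__main__":
    for name in evaluation_order(MAPPING_INPUTS):
        print(name)
```

Several orders are valid for this graph; the only guarantee is that an upstream mapping appears before any mapping that consumes it.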

Relations

Relation 'weather/aggregates'

Description: The aggregate table contains min/max temperature values per year and country

Physical Resources

[file] file:/tmp/weather/aggregates

Sources

[file] file:/tmp/weather/measurements/year=2013
[file] file:/tmp/weather/stations

Direct Inputs

[mapping_output] weather/aggregates:main

Schema

No Column Name Data Type Constraints Description Source Columns Quality Checks
1 country STRING The country the weather station belongs to [weather/stations].country
IS NOT NULL ERROR
2 min_wind_speed FLOAT Minimum wind speed [weather/measurements].wind_speed [weather/measurements].wind_speed_qual
All wind speeds must be non-negative: min_wind_speed >= 0 ERROR
3 max_wind_speed FLOAT Maximum wind speed [weather/measurements].wind_speed [weather/measurements].wind_speed_qual
4 avg_wind_speed FLOAT The wind speed in m/s (AVG) [weather/measurements].wind_speed [weather/measurements].wind_speed_qual
5 min_temperature FLOAT The air temperature in degree Celsius (MIN) [weather/measurements].air_temperature_qual [weather/measurements].air_temperature
min_temperature >= -100 ERROR
6 max_temperature FLOAT The air temperature in degree Celsius (MAX) [weather/measurements].air_temperature_qual [weather/measurements].air_temperature
max_temperature <= 100 WHERE min_temperature >= 10 ERROR
7 avg_temperature FLOAT The air temperature in degree Celsius (AVG) [weather/measurements].air_temperature_qual [weather/measurements].air_temperature
8 year INT NOT NULL
IS NOT NULL ERROR
Quality Check Result Remarks
There must be only one entry per country and year: PRIMARY KEY (country,year) ERROR
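
A composite primary key check such as the one on (country, year) boils down to finding key combinations that occur more than once. The sketch below shows that idea in plain Python over dict rows; it is an illustration under that assumption, not how Flowman executes the check. The function name `duplicate_keys` is hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return every key tuple that appears in more than one row."""
    counts = Counter(tuple(row[column] for column in key_columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]
```

An empty result means the key is unique; for example, `duplicate_keys(rows, ("country", "year"))` lists the offending (country, year) pairs otherwise.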

Relation 'weather/measurements'

Description: This model contains all individual measurements

Physical Resources

[file] file:/tmp/weather/measurements

Sources

[file] s3a://dimajix-training/data/weather/2013

Direct Inputs

[mapping_output] weather/measurements_extracted:main

Schema

No Column Name Data Type Constraints Description Source Columns Quality Checks
1 usaf INT The USAF (US Air Force) id of the weather station [weather/measurements_raw].raw_data
IS NOT NULL ERROR
2 wban INT The WBAN id of the weather station [weather/measurements_raw].raw_data
IS NOT NULL ERROR
3 date DATE The date when the measurement was made [weather/measurements_raw].raw_data
IS NOT NULL ERROR
4 time STRING The time when the measurement was made [weather/measurements_raw].raw_data
IS NOT NULL ERROR
5 report_type STRING The report type of the measurement [weather/measurements_raw].raw_data
6 wind_direction INT The direction from where the wind blows in degrees [weather/measurements_raw].raw_data
IS NOT NULL ERROR
(wind_direction >= 0 AND wind_direction <= 360) OR wind_direction_qual <> 1 ERROR
7 wind_direction_qual STRING The quality indicator of the wind direction. 1 means trustworthy quality. [weather/measurements_raw].raw_data
IS NOT NULL ERROR
8 wind_observation STRING [weather/measurements_raw].raw_data
9 wind_speed FLOAT The wind speed in m/s [weather/measurements_raw].raw_data
10 wind_speed_qual STRING The quality indicator of the wind speed. 1 means trustworthy quality. [weather/measurements_raw].raw_data
11 air_temperature FLOAT The air temperature in degree Celsius [weather/measurements_raw].raw_data
12 air_temperature_qual STRING The quality indicator of the air temperature. 1 means trustworthy quality. [weather/measurements_raw].raw_data
IS NOT NULL ERROR
IS IN (0,1,2,3,4,5,6,7,8,9) ERROR
13 year INT NOT NULL The year of the measurement, used for partitioning the data
IS NOT NULL ERROR
IS BETWEEN 1901 AND 2022 ERROR
Quality Check Result Remarks
The measurement has to refer to an existing station: FOREIGN KEY (usaf,wban) REFERENCES stations(usaf,wban) ERROR
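
Two of the checks above carry more logic than a plain NOT NULL test: the conditional range check on wind_direction and the foreign key to the stations. The following plain-Python sketch illustrates both, with several assumptions. Rows are dicts, the function names are hypothetical, and SQL NULL semantics are simplified (a missing direction counts as out of range). The quality flag is compared with the string "1" because the column is documented as STRING. The foreign key helper also assumes both sides store usaf and wban with the same type, whereas this report lists them as INT in the measurements and STRING in the stations.

```python
def wind_direction_ok(row):
    """Pass if the direction lies in [0, 360] or the value is not flagged as trustworthy.

    Mirrors: (wind_direction >= 0 AND wind_direction <= 360) OR wind_direction_qual <> 1
    """
    direction = row["wind_direction"]
    in_range = direction is not None and 0 <= direction <= 360
    return in_range or row["wind_direction_qual"] != "1"


def missing_references(measurements, stations, key_columns=("usaf", "wban")):
    """Return the measurements whose key has no matching station row."""
    known = {tuple(station[c] for c in key_columns) for station in stations}
    return [m for m in measurements if tuple(m[c] for c in key_columns) not in known]
```

The range check only constrains rows whose quality flag marks them as trustworthy, so an out-of-range direction with any other flag still passes.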

Relation 'weather/measurements_raw'

Description:

Physical Resources

[file] s3a://dimajix-training/data/weather

Schema

No Column Name Data Type Constraints Description Source Columns Quality Checks
1 raw_data STRING Raw measurement data
2 year INT NOT NULL The year when the measurement was made

Relation 'weather/stations'

Description: The 'stations' table contains metadata on all weather stations

Physical Resources

[file] file:/tmp/weather/stations

Sources

[file] s3a://dimajix-training/data/weather/isd-history

Direct Inputs

[mapping_output] weather/stations_raw:main

Schema

No Column Name Data Type Constraints Description Source Columns Quality Checks
1 usaf STRING NOT NULL USAF station id [weather/stations_raw].usaf
2 wban STRING NOT NULL WBAN station id [weather/stations_raw].wban
3 name STRING An optional name for the weather station [weather/stations_raw].name
4 country STRING The country the weather station belongs to [weather/stations_raw].country
5 state STRING Optional state within the country the weather station belongs to [weather/stations_raw].state
6 icao STRING [weather/stations_raw].icao
7 latitude FLOAT The latitude of the geo location of the weather station [weather/stations_raw].latitude
8 longitude FLOAT The longitude of the geo location of the weather station [weather/stations_raw].longitude
9 elevation FLOAT The elevation above sea level in meters of the weather station [weather/stations_raw].elevation
10 date_begin DATE The date when the weather station went into service [weather/stations_raw].date_begin
11 date_end DATE The date when the weather station went out of service [weather/stations_raw].date_end
Quality Check Result Remarks
PRIMARY KEY (usaf,wban) ERROR

Relation 'weather/stations_raw'

Description:

Physical Resources

[file] s3a://dimajix-training/data/weather/isd-history

Schema

No Column Name Data Type Constraints Description Source Columns Quality Checks
1 usaf STRING NOT NULL USAF station id
2 wban STRING NOT NULL WBAN station id
3 name STRING An optional name for the weather station
4 country STRING The country the weather station belongs to
5 state STRING Optional state within the country the weather station belongs to
6 icao STRING
7 latitude FLOAT The latitude of the geo location of the weather station
8 longitude FLOAT The longitude of the geo location of the weather station
9 elevation FLOAT The elevation above sea level in meters of the weather station
10 date_begin DATE The date when the weather station went into service
11 date_end DATE The date when the weather station went out of service

Targets

Target 'weather/aggregates'

Description: Write aggregated measurements per year

Inputs

[mapping_output] weather/aggregates:main

Outputs

[relation] weather/aggregates

Phases

CREATE BUILD TRUNCATE VERIFY DESTROY

Target 'weather/documentation'

Description:

Inputs

Outputs

Phases

VERIFY

Target 'weather/measurements'

Description: Write extracted measurements per year

Inputs

[mapping_output] weather/measurements_extracted:main

Outputs

[relation] weather/measurements

Phases

CREATE BUILD TRUNCATE VERIFY DESTROY

Target 'weather/metrics'

Description: Collect relevant metrics from measurements, to be published to a metrics collector

Inputs

Outputs

Phases

VERIFY

Target 'weather/stations'

Description: This build target is used to write the weather stations

Inputs

[mapping_output] weather/stations_raw:main

Outputs

[relation] weather/stations

Phases

CREATE BUILD TRUNCATE VERIFY DESTROY

Target 'weather/validate_stations_raw'

Description:

Inputs

Outputs

Phases

VALIDATE