Getting started

This “Getting started” tutorial is a brief introduction to Pymicra. It is by no means a complete account of everything that can be done with Pymicra.

In this tutorial we use some example data and refer to some example Python scripts that can be downloaded here. These data and scripts are from a measurement campaign on a very small island (about 20 meters across) in a large artificial lake. At the time of these measurements the island was almost completely covered by about 5 cm of water. Please feel free to explore both the example data and the example programs, and to modify the programs as part of your own learning process!

Notation

Pymicra uses a specific notation to name each one of its columns. This notation is extremely important, because it is by these labels that Pymicra knows which variable is in each column. You can check the default notation with

In [1]: %%capture
   ...: import pymicra as pm
   ...: print(pm.notation)
   ...: 

The output is too long to be reproduced here, but on the left you’ll see the full name of the variables (which corresponds to a notation namespace/attribute) and on the right you’ll see the default notation for that variable.

We recommend using the default notation for the sake of simplicity. However, you can change Pymicra’s notation at any time by altering the attributes of pm.notation. For example, by default the notation for the mean is '%s_mean', and every variable follows this base notation:

In [2]: pm.notation.mean
Out[2]: '%s_mean'

In [3]: pm.notation.mean_u
Out[3]: 'u_mean'

In [4]: pm.notation.mean_h2o_mass_concentration
Out[4]: 'conc_h2o_mean'

To change this, you have to change the mean notation and then re-build the whole notation with the build method:

In [5]: pm.notation.mean = 'm_%s'

In [6]: pm.notation.build()

In [7]: pm.notation.mean_u
Out[7]: 'm_u'

In [8]: pm.notation.mean_h2o_mass_concentration
Out[8]: 'm_conc_h2o'

In [9]: pm.notation.h2o='v'

In [10]: pm.notation.build()

In [11]: pm.notation.mean_h2o_mass_concentration
Out[11]: 'm_conc_v'
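
The base notation strings above look like ordinary Python %-format templates. The snippet below is only an illustration of that templating idea using plain strings. It does not use Pymicra, it makes no claim about how pm.notation.build() works internally, and the helper name apply_template is hypothetical.

```python
# Illustration only: plain Python string templates, not Pymicra's internals.

def apply_template(template, name):
    """Fill a '%s'-style template with a variable name (hypothetical helper)."""
    return template % name

default_mean = '%s_mean'  # default mean notation shown above
custom_mean = 'm_%s'      # custom mean notation set in In [5]

mean_u_default = apply_template(default_mean, 'u')
mean_u_custom = apply_template(custom_mean, 'u')
mean_h2o_custom = apply_template(custom_mean, 'conc_h2o')

print(mean_u_default, mean_u_custom, mean_h2o_custom)
```

The three results match the labels shown in Out[3], Out[7] and Out[8], which is why changing one base template and re-building changes every label derived from it.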

If you just want to change the notation of one variable, but not the full notation, just don’t re-build. For example:

In [12]: pm.notation.mean_co2_mass_concentration = 'c_m'

In [13]: pm.notation.mean_co2_mass_concentration
Out[13]: 'c_m'

In [14]: pm.notation.mean_h2o_mass_concentration
Out[14]: 'm_conc_v'

It is important to note that this changes the notation used throughout every Pymicra function. If, however, you want to use a different notation in a specific part of the program (in one specific function for example) you can create a Notation object and pass it to the function, such as

In [15]: mynotation = pm.Notation()

In [16]: mynotation.co2='c'

In [17]: mynotation.build()

In [18]: fluxes = pm.eddyCovariance(data, units, notation=mynotation) # For example

In the example above the default Pymicra notation is left untouched, and a separate notation is defined which is then used in a Pymicra function separately.

Creating a file configuration file

The easiest way to read data files is using a fileConfig object. This object holds the configuration of the data files so you can just pass it along when reading these files. To make it easier, Pymicra prefers to read these configurations from a file. That way you can write the configuration for some data files once, store it in a configuration file and then use it every time you want to read those data files. That is what Pymicra calls a “file configuration file”, or “config file” for short. From that file, Pymicra can create a pymicra.fileConfig object. Consider, for example, the config file below

description='datalogger configuration file for a lake. Located at examples/lake.config'

variables={
0:'%Y-%m-%d',
1:'%H:%M:%S.%f',
2:'u',
3:'v',
4:'w',
5:'theta_v',
6:'mrho_h2o',
7:'mrho_co2',
8:'p',
9:'theta'}

units={
'u':'m/s',
'v':'m/s',
'w':'m/s',
'theta_v':'celsius',
'mrho_co2':'mmol/m**3',
'mrho_h2o':'mmol/m**3',
'p':'kPa',
'theta':'celsius'
}

columns_separator=','
frequency=20
header_lines=None

filename_format='%Y%m%d-%H%M.csv'
date_cols = [0, 1]

First of all, note that the .config file is written in Python syntax, so it must be valid Python code that can actually be run. This has to be true for all .config files.

Furthermore, the extension of the file does not matter. We adopt the .config extension for clarity, but it could be anything else.
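
To see why the file has to be valid Python, here is a minimal sketch that executes a shortened, made-up config text and collects the names it defines. This is not Pymicra's loader (the tutorial does not show how pymicra.fileConfig reads the file); it only illustrates that a config is a series of Python assignments, and that a syntax error makes the whole file unusable.

```python
# Sketch only: NOT Pymicra's loader. The config text below is a shortened,
# made-up variant of the lake config used in this tutorial.

config_text = """
description = 'toy config'
variables = {0: '%Y-%m-%d', 1: '%H:%M:%S.%f', 2: 'u', 3: 'v'}
units = {'u': 'm/s', 'v': 'm/s'}
columns_separator = ','
frequency = 20
header_lines = None
"""

# Run the assignments and collect the resulting names in a dictionary.
settings = {}
exec(config_text, {}, settings)
print(sorted(settings))

# A config with a syntax error (here, a missing closing brace) cannot be run.
try:
    compile("variables = {0: 'u'", 'broken.config', 'exec')
    broken_is_valid = True
except SyntaxError:
    broken_is_valid = False
print(broken_is_valid)
```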

The previous config file describes the data files in the directory ../examples/ex_data/. Here’s an example of one such file for comparison:

2013-11-08,10:00:00.000000,2.375,-5.206,-0.103,27.06,1238.0,14.675,99.19,30.43,-0.303,-0.274,-0.269,-0.261
2013-11-08,10:00:00.050000,2.4930000000000003,-5.098,-0.018000000000000002,27.12,1196.0,14.409,99.199,30.43,-0.308,-0.275,-0.271,-0.263
2013-11-08,10:00:00.100000,2.263,-5.114,0.014,27.11,1220.0,14.636,99.102,30.43,-0.306,-0.277,-0.273,-0.263
2013-11-08,10:00:00.150000,2.21,-5.235,-0.012,27.11,1238.0,14.688,99.154,30.43,-0.308,-0.277,-0.273,-0.264
2013-11-08,10:00:00.200000,2.158,-5.174,-0.112,27.12,1174.0,14.476,99.154,30.44,-0.31,-0.277,-0.273,-0.264
2013-11-08,10:00:00.250000,2.334,-5.279,-0.092,27.1,1195.0,14.671,99.154,30.43,-0.308,-0.278,-0.273,-0.265
2013-11-08,10:00:00.300000,2.396,-5.2970000000000015,0.005,27.15,1198.0,14.669,99.154,30.43,-0.309,-0.279,-0.272,-0.264
2013-11-08,10:00:00.350000,2.494,-5.246,0.039,27.13,1197.0,14.722,99.154,30.44,-0.311,-0.279,-0.273,-0.264
2013-11-08,10:00:00.400000,2.263,-5.317,-0.079,27.12,1202.0,14.709,99.154,30.43,-0.311,-0.279,-0.275,-0.265
2013-11-08,10:00:00.450000,2.135,-5.176,-0.036000000000000004,27.08,1202.0,14.731,99.154,30.44,-0.314,-0.279,-0.275,-0.267

Note that not all columns of this file are described. Columns that are not described are also read but are discarded by default. You can change that using only_named_columns=False in the timeSeries function.
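
As a rough illustration of what “described” and “not described” columns mean, the sketch below splits the first line of the example file with the standard csv module and pairs each field with its name from variables. It does not use Pymicra; it only shows which fields have a name and which ones do not.

```python
import csv
import io

# Column description copied from the config file above.
variables = {0: '%Y-%m-%d', 1: '%H:%M:%S.%f', 2: 'u', 3: 'v', 4: 'w',
             5: 'theta_v', 6: 'mrho_h2o', 7: 'mrho_co2', 8: 'p', 9: 'theta'}

# First line of the example data file shown above.
first_line = ('2013-11-08,10:00:00.000000,2.375,-5.206,-0.103,27.06,'
              '1238.0,14.675,99.19,30.43,-0.303,-0.274,-0.269,-0.261')

fields = next(csv.reader(io.StringIO(first_line)))

# Fields whose position appears in `variables` get a name; the rest do not.
named = {variables[i]: value for i, value in enumerate(fields) if i in variables}
unnamed = [value for i, value in enumerate(fields) if i not in variables]

print(len(fields), len(named), unnamed)
```

The line has 14 fields, of which the first 10 are described by the config file; the last four are the ones that are discarded by default.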

We obtain the config object with

In [19]: fconfig = pm.fileConfig('../examples/lake.config')

In [20]: print(fconfig)
<pymicra.fileConfig>
datalogger configuration file for a lake. Located at examples/lake.config

Each variable defined in this file works as a keyword, since it can also be input manually when calling pymicra.fileConfig(). Thus, for more information, you can also use help(pymicra.fileConfig). Now we explain the keywords one by one. In the next section we will explain how to use this object for reading a data file.

description

The description is optional. It’s a string that serves only to better identify the config file you’re dealing with. It might be useful for storage purposes and when printing the config object.

variables

The most important keyword is variables. This is a python dictionary where each key is a column and its corresponding value is the variable in that column. Note that we are using here the default notation to indicate which variable is in which column. If a different notation is to be used here, then you will have to define a new notation in your program (refer back to Notation for that).

Note

From this point on, for simplicity, we will assume that the default notation is used.

It is imperative that the columns be named accordingly. For example, measuring H2O contents in mmol/m^3 is different from measuring it in g/m^3 or mg/g. The first is a molar density (moles per volume), the second is a mass density (mass per volume) and the third is a mass concentration (mass per mass). In the default notation these are indicated by the names 'mrho_h2o', 'rho_h2o' and 'conc_h2o', respectively, and Pymicra needs to know which one is which.

Columns that contain parts of the timestamp must be named with the matching Python date format directives (such as '%Y-%m-%d' in the config file above). These directives follow the 1989 C standard, which is supported on most platforms.

Timestamps are only needed if you want to index your data by time, which is a huge advantage in some cases (check out what Pandas can do with timestamp-indexed data), but Pymicra also works well without them. If you don’t wish to work with timestamps and want to work only by line number in each file, you can ignore these columns and indicate that you don’t want to parse dates. In fact, parsing dates makes Pymicra a lot slower: reading a file while parsing its dates is several times slower than reading the same file without parsing any dates (about 4 times in the timing shown in the Reading data section)!
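
The date format directives used as column names can be tried out directly with Python's standard datetime module. The sketch below parses the date and time fields of the second line of the example file, using the names of columns 0 and 1 as the format. It is independent of Pymicra and only shows what the directives mean.

```python
from datetime import datetime

# Names of columns 0 and 1 in the config file, used here as format strings.
date_format = '%Y-%m-%d'
time_format = '%H:%M:%S.%f'

# Date and time fields of the second line of the example data file.
date_text, time_text = '2013-11-08', '10:00:00.050000'

timestamp = datetime.strptime(date_text + ' ' + time_text,
                              date_format + ' ' + time_format)
print(timestamp)
```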

units

The units keyword is also very important. It tells Pymicra in which units each variable is being measured. Units are handled by Pint, so for more details on how to define the units please refer to their documentation. Suffice it to say here that the format of the units is pretty intuitive. Some quick remarks are

  • prefer to define units unambiguously ('g/(m*(s**2))' is generally preferred to 'g/m/s**2', although both will work).
  • to define that a unit is dimensionless, '1' will not work. Define it as 'dimensionless' or 'g/g' and so on.
  • if one variable does not have a unit (such as a sensor flag), you don’t have to include that variable.
  • the keys of units should exactly match the values of variables.
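
The last remark above can be checked with a few lines of plain Python. The sketch below compares the keys of units with the non-date values of variables from the lake config; the variable names missing_units and unknown_units are just illustrative. A non-empty unknown_units usually points to a typo, while a non-empty missing_units is not necessarily an error, since variables without a unit may be left out.

```python
# Dictionaries copied from the config file shown earlier.
variables = {0: '%Y-%m-%d', 1: '%H:%M:%S.%f', 2: 'u', 3: 'v', 4: 'w',
             5: 'theta_v', 6: 'mrho_h2o', 7: 'mrho_co2', 8: 'p', 9: 'theta'}
units = {'u': 'm/s', 'v': 'm/s', 'w': 'm/s', 'theta_v': 'celsius',
         'mrho_co2': 'mmol/m**3', 'mrho_h2o': 'mmol/m**3',
         'p': 'kPa', 'theta': 'celsius'}

# Measured variables are the columns that are not part of the timestamp.
measured = {name for name in variables.values() if '%' not in name}

missing_units = measured - set(units)   # variables without a unit (may be fine)
unknown_units = set(units) - measured   # units for names that are not columns

print(missing_units, unknown_units)

# The same check with a deliberate typo in one key ('thetav').
units_typo = dict(units)
units_typo['thetav'] = units_typo.pop('theta_v')
typo_unknown = set(units_typo) - measured
print(typo_unknown)
```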

columns_separator

The columns_separator keyword is what it sounds like: the string that separates one column from the next. Generally it is one character, such as a comma. A special case is when the columns are separated by whitespace of varying length, or by tabs. In that case it should be "whitespace".

frequency

The frequency keyword is the frequency of the data collection in Hertz.
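
As a quick sanity check on what the frequency implies, the arithmetic below relates the 20 Hz of the lake config to the spacing between timestamps and to the number of rows in one hour of data. It is plain arithmetic and does not use Pymicra.

```python
frequency = 20  # Hz, as in the lake config file

sampling_interval = 1.0 / frequency   # seconds between consecutive rows
rows_per_hour = frequency * 60 * 60   # rows in a one-hour file

print(sampling_interval, rows_per_hour)
```

These numbers agree with the example data: consecutive timestamps are 0.05 s apart, and the one-hour file read later in this tutorial has 72000 rows.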

header_lines

The keyword header_lines tells us which of the first lines are part of the file header. If there is no header then it should be None. If there are header lines then it should be a list or int. For example, if the first two lines of the file are part of a header, it should be [0, 1]. If it were the first 4 lines, [0, 1, 2, 3] (range(4) would also be acceptable).

Header lines are not used by Pymicra and are therefore skipped.
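
The effect of skipping header lines can be sketched in plain Python. The helper drop_header and the header text below are made up for illustration, and only the None and list forms described above are handled; this is not how Pymicra itself reads files.

```python
def drop_header(lines, header_lines):
    """Return the lines whose index is not listed in header_lines.

    Hypothetical helper: handles None (no header) or a list of line indexes.
    """
    if header_lines is None:
        return list(lines)
    skip = set(header_lines)
    return [line for i, line in enumerate(lines) if i not in skip]

# Made-up file content: two header lines followed by two data lines.
raw_lines = ['station: lake',
             'columns: date,time,u',
             '2013-11-08,10:00:00.000000,2.375',
             '2013-11-08,10:00:00.050000,2.493']

data_lines = drop_header(raw_lines, [0, 1])
all_lines = drop_header(raw_lines, None)
print(data_lines)
```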

filename_format

The filename_format keyword tells Pymicra how the data files are named.
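
The filename_format in the lake config uses the same date format directives as the timestamp columns. As an illustration with the standard datetime module (not with Pymicra), the sketch below matches the format against the name of the example file used later in this tutorial, and builds a file name from a date. The 11:00 date in the second step is just an example value.

```python
from datetime import datetime

filename_format = '%Y%m%d-%H%M.csv'  # as in the lake config file

# Recover the date and time encoded in an example file name.
file_time = datetime.strptime('20131108-1000.csv', filename_format)
print(file_time)

# Build the name that a file for 11:00 on the same day would have.
next_name = datetime(2013, 11, 8, 11, 0).strftime(filename_format)
print(next_name)
```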

date_cols

The date_cols keyword is optional. It is a list of integers that indicates which of the columns are a part of the timestamp. If it’s not provided, then Pymicra will assume that columns whose names have the character “%” in them are part of the date and will try to parse them. If the default notation is used, this should always be true.
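
The rule described above (columns whose names contain “%” are treated as part of the date) is easy to reproduce in plain Python. The sketch below applies it to the variables dictionary of the lake config; it only mirrors the rule as stated and is not Pymicra's own code.

```python
# Column description copied from the config file shown earlier.
variables = {0: '%Y-%m-%d', 1: '%H:%M:%S.%f', 2: 'u', 3: 'v', 4: 'w',
             5: 'theta_v', 6: 'mrho_h2o', 7: 'mrho_co2', 8: 'p', 9: 'theta'}

# Columns whose name contains '%' are assumed to be parts of the timestamp.
inferred_date_cols = sorted(col for col, name in variables.items() if '%' in name)
print(inferred_date_cols)
```

For this config the inferred list is the same as the date_cols = [0, 1] given explicitly in the file.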

Reading data

To read a data file or a list of data files we use the function timeSeries along with a config file. Let us use the config file defined in the previous subsection with one of the data files it describes:

In [21]: fname = '../examples/ex_data/20131108-1000.csv'

In [22]: fconfig = pm.fileConfig('../examples/lake.config')

In [23]: data, units = pm.timeSeries(fname, fconfig, parse_dates=True)

In [24]: print(data)
                             u      v      w  theta_v  mrho_h2o  mrho_co2       p  theta
Timestamp                                                                               
2013-11-08 10:00:00.000  2.375 -5.206 -0.103    27.06    1238.0    14.675  99.190  30.43
2013-11-08 10:00:00.050  2.493 -5.098 -0.018    27.12    1196.0    14.409  99.199  30.43
2013-11-08 10:00:00.100  2.263 -5.114  0.014    27.11    1220.0    14.636  99.102  30.43
2013-11-08 10:00:00.150  2.210 -5.235 -0.012    27.11    1238.0    14.688  99.154  30.43
2013-11-08 10:00:00.200  2.158 -5.174 -0.112    27.12    1174.0    14.476  99.154  30.44
2013-11-08 10:00:00.250  2.334 -5.279 -0.092    27.10    1195.0    14.671  99.154  30.43
2013-11-08 10:00:00.300  2.396 -5.297  0.005    27.15    1198.0    14.669  99.154  30.43
2013-11-08 10:00:00.350  2.494 -5.246  0.039    27.13    1197.0    14.722  99.154  30.44
2013-11-08 10:00:00.400  2.263 -5.317 -0.079    27.12    1202.0    14.709  99.154  30.43
2013-11-08 10:00:00.450  2.135 -5.176 -0.036    27.08    1202.0    14.731  99.154  30.44
...                        ...    ...    ...      ...       ...       ...     ...    ...
2013-11-08 10:59:59.500  4.951 -4.584  0.420    28.03    1261.0    14.772  99.102  32.08
2013-11-08 10:59:59.550  5.057 -4.436  0.492    28.00    1181.0    14.718  99.138  32.07
2013-11-08 10:59:59.600  5.145 -4.424  0.409    28.10    1216.0    14.889  99.112  32.08
2013-11-08 10:59:59.650  5.282 -4.038  0.448    28.03    1198.0    14.485  99.112  32.06
2013-11-08 10:59:59.700  5.065 -4.453  0.424    28.11    1184.0    14.578  99.138  32.07
2013-11-08 10:59:59.750  5.262 -4.703  0.126    27.98    1264.0    14.929  99.138  32.08
2013-11-08 10:59:59.800  5.323 -4.882  0.242    27.95    1229.0    14.258  99.138  32.07
2013-11-08 10:59:59.850  5.344 -5.119  0.457    27.96    1198.0    14.962  99.102  32.07
2013-11-08 10:59:59.900  5.281 -5.261  0.599    28.09    1231.0    14.615  99.112  32.07
2013-11-08 10:59:59.950  5.235 -4.801  0.362    28.02    1211.0    14.682  99.164  32.08

[72000 rows x 8 columns]

Note that data is a pandas.DataFrame object which contains all the data available in the data file, with each column being a variable. Since we indicated that we wanted to parse the dates with the option parse_dates=True, each row has its respective timestamp. If, instead, we were to ignore the dates, the result would be an integer-indexed dataset:

In [25]: data2, units = pm.timeSeries(fname, fconfig, parse_dates=False)

In [26]: print(data2)
           u      v      w  theta_v  mrho_h2o  mrho_co2       p  theta
0      2.375 -5.206 -0.103    27.06    1238.0    14.675  99.190  30.43
1      2.493 -5.098 -0.018    27.12    1196.0    14.409  99.199  30.43
2      2.263 -5.114  0.014    27.11    1220.0    14.636  99.102  30.43
3      2.210 -5.235 -0.012    27.11    1238.0    14.688  99.154  30.43
4      2.158 -5.174 -0.112    27.12    1174.0    14.476  99.154  30.44
5      2.334 -5.279 -0.092    27.10    1195.0    14.671  99.154  30.43
6      2.396 -5.297  0.005    27.15    1198.0    14.669  99.154  30.43
7      2.494 -5.246  0.039    27.13    1197.0    14.722  99.154  30.44
8      2.263 -5.317 -0.079    27.12    1202.0    14.709  99.154  30.43
9      2.135 -5.176 -0.036    27.08    1202.0    14.731  99.154  30.44
...      ...    ...    ...      ...       ...       ...     ...    ...
71990  4.951 -4.584  0.420    28.03    1261.0    14.772  99.102  32.08
71991  5.057 -4.436  0.492    28.00    1181.0    14.718  99.138  32.07
71992  5.145 -4.424  0.409    28.10    1216.0    14.889  99.112  32.08
71993  5.282 -4.038  0.448    28.03    1198.0    14.485  99.112  32.06
71994  5.065 -4.453  0.424    28.11    1184.0    14.578  99.138  32.07
71995  5.262 -4.703  0.126    27.98    1264.0    14.929  99.138  32.08
71996  5.323 -4.882  0.242    27.95    1229.0    14.258  99.138  32.07
71997  5.344 -5.119  0.457    27.96    1198.0    14.962  99.102  32.07
71998  5.281 -5.261  0.599    28.09    1231.0    14.615  99.112  32.07
71999  5.235 -4.801  0.362    28.02    1211.0    14.682  99.164  32.08

[72000 rows x 8 columns]

And, as mentioned, the latter way is a lot faster:

In [27]: %timeit pm.timeSeries(fname, fconfig, parse_dates=False)
   ....: %timeit pm.timeSeries(fname, fconfig, parse_dates=True)
   ....: 
1 loop, best of 3: 216 ms per loop
1 loop, best of 3: 933 ms per loop

Viewing and manipulating data

To view and manipulate data, you mostly have to follow the rules of Pandas DataFrames. For that we suggest that you go through a Pandas tutorial. However, we explain the main ideas here for the sake of completeness and introduce a few ideas specific to Pymicra that don’t exist for general Pandas DataFrames.

Printing and plotting

First, for viewing raw data on screen there’s printing. Slicing and indexing are supported by Pandas, but without support for units:

In [29]: print(data['theta_v'])
Timestamp
2013-11-08 10:00:00.000    27.06
2013-11-08 10:00:00.050    27.12
2013-11-08 10:00:00.100    27.11
2013-11-08 10:00:00.150    27.11
2013-11-08 10:00:00.200    27.12
2013-11-08 10:00:00.250    27.10
                           ...  
2013-11-08 10:59:59.700    28.11
2013-11-08 10:59:59.750    27.98
2013-11-08 10:59:59.800    27.95
2013-11-08 10:59:59.850    27.96
2013-11-08 10:59:59.900    28.09
2013-11-08 10:59:59.950    28.02
Name: theta_v, Length: 72000, dtype: float64

In [30]: print(data[['u', 'v', 'w']])
                             u      v      w
Timestamp                                   
2013-11-08 10:00:00.000  2.375 -5.206 -0.103
2013-11-08 10:00:00.050  2.493 -5.098 -0.018
2013-11-08 10:00:00.100  2.263 -5.114  0.014
2013-11-08 10:00:00.150  2.210 -5.235 -0.012
2013-11-08 10:00:00.200  2.158 -5.174 -0.112
2013-11-08 10:00:00.250  2.334 -5.279 -0.092
...                        ...    ...    ...
2013-11-08 10:59:59.700  5.065 -4.453  0.424
2013-11-08 10:59:59.750  5.262 -4.703  0.126
2013-11-08 10:59:59.800  5.323 -4.882  0.242
2013-11-08 10:59:59.850  5.344 -5.119  0.457
2013-11-08 10:59:59.900  5.281 -5.261  0.599
2013-11-08 10:59:59.950  5.235 -4.801  0.362

[72000 rows x 3 columns]

In [31]: print(data['20131108 10:15:00.000':'20131108 10:17:00.000'])
                             u      v      w  theta_v  mrho_h2o  mrho_co2       p  theta
Timestamp                                                                               
2013-11-08 10:15:00.000  2.634 -4.351  0.107    27.30    1229.0    15.002  99.128  30.80
2013-11-08 10:15:00.050  2.869 -4.249  0.040    27.44    1175.0    14.751  99.164  30.80
2013-11-08 10:15:00.100  3.320 -4.326 -0.079    27.26    1159.0    14.689  99.138  30.80
2013-11-08 10:15:00.150  2.759 -4.339 -0.007    27.24    1170.0    14.715  99.190  30.80
2013-11-08 10:15:00.200  2.748 -4.128 -0.038    27.21    1174.0    14.681  99.190  30.80
2013-11-08 10:15:00.250  3.149 -4.074 -0.387    27.20    1190.0    14.662  99.173  30.80
...                        ...    ...    ...      ...       ...       ...     ...    ...
2013-11-08 10:17:00.700  3.910 -4.698 -0.366    27.27    1170.0    14.592  99.128  30.85
2013-11-08 10:17:00.750  3.824 -4.535 -0.313    27.33    1165.0    14.492  99.164  30.85
2013-11-08 10:17:00.800  3.758 -4.353 -0.116    27.28    1103.0    14.495  99.164  30.85
2013-11-08 10:17:00.850  3.761 -4.454 -0.010    27.28    1128.0    14.611  99.164  30.85
2013-11-08 10:17:00.900  3.546 -4.766 -0.433    27.28    1131.0    14.709  99.147  30.85
2013-11-08 10:17:00.950  3.238 -4.601 -0.378    27.29    1130.0    14.809  99.147  30.85

[2420 rows x 8 columns]

Note that Pandas “guesses” if the argument you pass ('theta_v' or '2013-11-08 10:15:00' etc.) is a column indexer or a row indexer. To select rows and columns unambiguously, use the .loc indexer as

In [32]: print(data.loc['2013-11-08 10:15:00':'2013-11-08 10:17:00', ['u','v','w']])
                             u      v      w
Timestamp                                   
2013-11-08 10:15:00.000  2.634 -4.351  0.107
2013-11-08 10:15:00.050  2.869 -4.249  0.040
2013-11-08 10:15:00.100  3.320 -4.326 -0.079
2013-11-08 10:15:00.150  2.759 -4.339 -0.007
2013-11-08 10:15:00.200  2.748 -4.128 -0.038
2013-11-08 10:15:00.250  3.149 -4.074 -0.387
...                        ...    ...    ...
2013-11-08 10:17:00.700  3.910 -4.698 -0.366
2013-11-08 10:17:00.750  3.824 -4.535 -0.313
2013-11-08 10:17:00.800  3.758 -4.353 -0.116
2013-11-08 10:17:00.850  3.761 -4.454 -0.010
2013-11-08 10:17:00.900  3.546 -4.766 -0.433
2013-11-08 10:17:00.950  3.238 -4.601 -0.378

[2420 rows x 3 columns]

This method is actually preferred and you can find more information on this topic here.

To view these data with units, you can use the .with_units() method. The previous output would look like this using units:

In [33]: print(data.with_units(units)['theta_v'])
                         <degC>
Timestamp                      
2013-11-08 10:00:00.000   27.06
2013-11-08 10:00:00.050   27.12
2013-11-08 10:00:00.100   27.11
2013-11-08 10:00:00.150   27.11
2013-11-08 10:00:00.200   27.12
2013-11-08 10:00:00.250   27.10
...                         ...
2013-11-08 10:59:59.700   28.11
2013-11-08 10:59:59.750   27.98
2013-11-08 10:59:59.800   27.95
2013-11-08 10:59:59.850   27.96
2013-11-08 10:59:59.900   28.09
2013-11-08 10:59:59.950   28.02

[72000 rows x 1 columns]

In [34]: print(data.with_units(units)[['u', 'v', 'w']])
                                       u                v                w
                        <meter / second> <meter / second> <meter / second>
Timestamp                                                                 
2013-11-08 10:00:00.000            2.375           -5.206           -0.103
2013-11-08 10:00:00.050            2.493           -5.098           -0.018
2013-11-08 10:00:00.100            2.263           -5.114            0.014
2013-11-08 10:00:00.150            2.210           -5.235           -0.012
2013-11-08 10:00:00.200            2.158           -5.174           -0.112
2013-11-08 10:00:00.250            2.334           -5.279           -0.092
...                                  ...              ...              ...
2013-11-08 10:59:59.700            5.065           -4.453            0.424
2013-11-08 10:59:59.750            5.262           -4.703            0.126
2013-11-08 10:59:59.800            5.323           -4.882            0.242
2013-11-08 10:59:59.850            5.344           -5.119            0.457
2013-11-08 10:59:59.900            5.281           -5.261            0.599
2013-11-08 10:59:59.950            5.235           -4.801            0.362

[72000 rows x 3 columns]

In [35]: print(data.with_units(units)['2013-11-08 10:15:00'])
                                       u                v                w theta_v  \
                        <meter / second> <meter / second> <meter / second>  <degC>   
Timestamp                                                                            
2013-11-08 10:15:00.000            2.634           -4.351            0.107   27.30   
2013-11-08 10:15:00.050            2.869           -4.249            0.040   27.44   
2013-11-08 10:15:00.100            3.320           -4.326           -0.079   27.26   
2013-11-08 10:15:00.150            2.759           -4.339           -0.007   27.24   
2013-11-08 10:15:00.200            2.748           -4.128           -0.038   27.21   
2013-11-08 10:15:00.250            3.149           -4.074           -0.387   27.20   
...                                  ...              ...              ...     ...   
2013-11-08 10:15:00.700            3.057           -4.090           -0.230   27.28   
2013-11-08 10:15:00.750            3.386           -4.169           -0.082   27.21   
2013-11-08 10:15:00.800            3.731           -4.180            0.291   27.42   
2013-11-08 10:15:00.850            3.676           -4.100            0.021   27.29   
2013-11-08 10:15:00.900            3.796           -4.390            0.170   27.24   
2013-11-08 10:15:00.950            3.294           -3.322            0.560   27.50   

                                        mrho_h2o                 mrho_co2            p  theta  
                        <millimole / meter ** 3> <millimole / meter ** 3> <kilopascal> <degC>  
Timestamp                                                                                      
2013-11-08 10:15:00.000                   1229.0                   15.002       99.128   30.8  
2013-11-08 10:15:00.050                   1175.0                   14.751       99.164   30.8  
2013-11-08 10:15:00.100                   1159.0                   14.689       99.138   30.8  
2013-11-08 10:15:00.150                   1170.0                   14.715       99.190   30.8  
2013-11-08 10:15:00.200                   1174.0                   14.681       99.190   30.8  
2013-11-08 10:15:00.250                   1190.0                   14.662       99.173   30.8  
...                                          ...                      ...          ...    ...  
2013-11-08 10:15:00.700                   1151.0                   14.758       99.164   30.8  
2013-11-08 10:15:00.750                   1195.0                   14.318       99.164   30.8  
2013-11-08 10:15:00.800                   1172.0                   14.369       99.164   30.8  
2013-11-08 10:15:00.850                   1173.0                   14.687       99.164   30.8  
2013-11-08 10:15:00.900                   1153.0                   14.442       99.164   30.8  
2013-11-08 10:15:00.950                   1176.0                   14.724       99.208   30.8  

[20 rows x 8 columns]

Warning

Note that, although this method returns a Pandas DataFrame, it is not meant for calculations. Currently the DataFrame it returns is meant for visualization purposes only!

We can also plot the data on screen so we can view it interactively. This can be done directly from the DataFrame with

In [36]: from matplotlib import pyplot as plt

In [37]: data[['u', 'v', 'w']].plot()
Out[37]: <matplotlib.axes._subplots.AxesSubplot at 0x7f17c7956950>

In [38]: plt.show()
[Figure: plot of the u, v and w time series (uvw_plot_basics.png)]

With the plt.show() command, the plot above is displayed interactively. If we had used plt.savefig('figure.png') instead, the figure would have been saved as a PNG file. For more on plotting, you can check out Pandas’s visualization guide and find out ways to make this plot look nicer, how to render it with LaTeX and some more tricks.

Pymicra also has an .xplot method, which adds a few more options to Pandas’s .plot() method.

Todo

give xplot examples

Converting units

You can convert between units manually using the techniques described under Manipulating together with the Pint package. However, Pymicra has a very useful method for this called .convert_cols (others exist, but let’s focus on this one).

Let’s, for example, convert some units:

In [39]: conversions = {'p':'pascal', 'mrho_h2o':'mole/m^3', 'theta_v':'kelvin'}

In [40]: print(data.convert_cols(conversions, units, inplace_units=False))
(                             u      v      w  theta_v  mrho_h2o  mrho_co2        p  theta
Timestamp                                                                                
2013-11-08 10:00:00.000  2.375 -5.206 -0.103   300.21     1.238    14.675  99190.0  30.43
2013-11-08 10:00:00.050  2.493 -5.098 -0.018   300.27     1.196    14.409  99199.0  30.43
2013-11-08 10:00:00.100  2.263 -5.114  0.014   300.26     1.220    14.636  99102.0  30.43
2013-11-08 10:00:00.150  2.210 -5.235 -0.012   300.26     1.238    14.688  99154.0  30.43
2013-11-08 10:00:00.200  2.158 -5.174 -0.112   300.27     1.174    14.476  99154.0  30.44
2013-11-08 10:00:00.250  2.334 -5.279 -0.092   300.25     1.195    14.671  99154.0  30.43
...                        ...    ...    ...      ...       ...       ...      ...    ...
2013-11-08 10:59:59.700  5.065 -4.453  0.424   301.26     1.184    14.578  99138.0  32.07
2013-11-08 10:59:59.750  5.262 -4.703  0.126   301.13     1.264    14.929  99138.0  32.08
2013-11-08 10:59:59.800  5.323 -4.882  0.242   301.10     1.229    14.258  99138.0  32.07
2013-11-08 10:59:59.850  5.344 -5.119  0.457   301.11     1.198    14.962  99102.0  32.07
2013-11-08 10:59:59.900  5.281 -5.261  0.599   301.24     1.231    14.615  99112.0  32.07
2013-11-08 10:59:59.950  5.235 -4.801  0.362   301.17     1.211    14.682  99164.0  32.08

[72000 rows x 8 columns], {'theta_v': <Unit('kelvin')>, 'p': <Unit('pascal')>, 'mrho_h2o': <Unit('mole / meter ** 3')>})

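With inplace_units=False, as above, the method returns both the converted data and a dictionary with the new units. If the inplace_units keyword is True, the units dictionary is updated automatically instead. The default is False for safety reasons, but passing True makes the code simpler and more compact: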

In [41]: conversions = {'theta':'kelvin', 'theta_v':'kelvin'}

In [42]: data = data.convert_cols(conversions, units, inplace_units=True)

In [43]: print(data.with_units(units))
                                       u                v                w  theta_v  \
                        <meter / second> <meter / second> <meter / second> <kelvin>   
Timestamp                                                                             
2013-11-08 10:00:00.000            2.375           -5.206           -0.103   300.21   
2013-11-08 10:00:00.050            2.493           -5.098           -0.018   300.27   
2013-11-08 10:00:00.100            2.263           -5.114            0.014   300.26   
2013-11-08 10:00:00.150            2.210           -5.235           -0.012   300.26   
2013-11-08 10:00:00.200            2.158           -5.174           -0.112   300.27   
2013-11-08 10:00:00.250            2.334           -5.279           -0.092   300.25   
...                                  ...              ...              ...      ...   
2013-11-08 10:59:59.700            5.065           -4.453            0.424   301.26   
2013-11-08 10:59:59.750            5.262           -4.703            0.126   301.13   
2013-11-08 10:59:59.800            5.323           -4.882            0.242   301.10   
2013-11-08 10:59:59.850            5.344           -5.119            0.457   301.11   
2013-11-08 10:59:59.900            5.281           -5.261            0.599   301.24   
2013-11-08 10:59:59.950            5.235           -4.801            0.362   301.17   

                                        mrho_h2o                 mrho_co2            p    theta  
                        <millimole / meter ** 3> <millimole / meter ** 3> <kilopascal> <kelvin>  
Timestamp                                                                                        
2013-11-08 10:00:00.000                   1238.0                   14.675       99.190   303.58  
2013-11-08 10:00:00.050                   1196.0                   14.409       99.199   303.58  
2013-11-08 10:00:00.100                   1220.0                   14.636       99.102   303.58  
2013-11-08 10:00:00.150                   1238.0                   14.688       99.154   303.58  
2013-11-08 10:00:00.200                   1174.0                   14.476       99.154   303.59  
2013-11-08 10:00:00.250                   1195.0                   14.671       99.154   303.58  
...                                          ...                      ...          ...      ...  
2013-11-08 10:59:59.700                   1184.0                   14.578       99.138   305.22  
2013-11-08 10:59:59.750                   1264.0                   14.929       99.138   305.23  
2013-11-08 10:59:59.800                   1229.0                   14.258       99.138   305.22  
2013-11-08 10:59:59.850                   1198.0                   14.962       99.102   305.22  
2013-11-08 10:59:59.900                   1231.0                   14.615       99.112   305.22  
2013-11-08 10:59:59.950                   1211.0                   14.682       99.164   305.23  

[72000 rows x 8 columns]
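
For reference, the conversions used in this section amount to simple arithmetic. The sketch below repeats them by hand for the first row of the data, without Pymicra or Pint, so that the converted values printed above can be checked.

```python
# First row of the original data (before any conversion).
theta_v_celsius = 27.06   # degrees Celsius
p_kilopascal = 99.19      # kPa
mrho_h2o_mmol = 1238.0    # mmol/m**3

theta_v_kelvin = theta_v_celsius + 273.15  # Celsius to kelvin
p_pascal = p_kilopascal * 1000.0           # kPa to Pa
mrho_h2o_mol = mrho_h2o_mmol / 1000.0      # mmol/m**3 to mol/m**3

print(round(theta_v_kelvin, 2), round(p_pascal, 1), round(mrho_h2o_mol, 3))
```

The rounded results agree with the first row of the convert_cols output: 300.21 for theta_v, 99190.0 for p and 1.238 for mrho_h2o.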

Manipulating

Manipulating data is pretty intuitive with Pandas. For example

In [44]: data['rho_air'] = data['p']/(287.058*data['theta_v'])

In [45]: print(data['rho_air'])
Timestamp
2013-11-08 10:00:00.000    0.001151
2013-11-08 10:00:00.050    0.001151
2013-11-08 10:00:00.100    0.001150
2013-11-08 10:00:00.150    0.001150
2013-11-08 10:00:00.200    0.001150
2013-11-08 10:00:00.250    0.001150
                             ...   
2013-11-08 10:59:59.700    0.001146
2013-11-08 10:59:59.750    0.001147
2013-11-08 10:59:59.800    0.001147
2013-11-08 10:59:59.850    0.001147
2013-11-08 10:59:59.900    0.001146
2013-11-08 10:59:59.950    0.001147
Name: rho_air, Length: 72000, dtype: float64

If, however, you’re not familiar with Pandas and prefer to just stick with what you know, you can get NumPy arrays from columns using the .values attribute:

In [46]: P = data['p'].values

In [47]: Tv = data['theta_v'].values

In [48]: print(type(Tv))
<type 'numpy.ndarray'>

In [49]: rho_air = P/(287.058*Tv)

In [50]: print(rho_air)
[0.00115099 0.00115087 0.00114978 ... 0.00114654 0.00114616 0.00114702]

In [51]: print(type(rho_air))
<type 'numpy.ndarray'>

Doing that, you can step out of Pandas and do your own calculations using your own Python or NumPy code. This is quite advantageous if you already have a lot of routines written in your own way.
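
A word of caution on units when calculating by hand: in the rho_air example above, p is still in kPa (only theta and theta_v were converted to kelvin in place), so the result is not in kg/m**3. The sketch below redoes the calculation for the first row in plain Python. It assumes that 287.058 is the specific gas constant of dry air in J/(kg K), which is its usual value, and shows that converting the pressure to pascal first gives a density in kg/m**3.

```python
R_DRY_AIR = 287.058        # J/(kg K), specific gas constant of dry air (assumed meaning)

p_kpa = 99.190             # pressure of the first row, in kPa
theta_v_kelvin = 300.21    # virtual temperature of the first row, in kelvin

# Same expression as in the tutorial: pressure left in kPa.
rho_as_in_tutorial = p_kpa / (R_DRY_AIR * theta_v_kelvin)

# Pressure converted to Pa first, so the result is in kg/m**3.
rho_kg_m3 = (p_kpa * 1000.0) / (R_DRY_AIR * theta_v_kelvin)

print(rho_as_in_tutorial, rho_kg_m3)
```

The first value reproduces the 0.00115099 printed above, and the second is the same quantity expressed in kg/m**3, about 1.151.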