Hands-on tutorial¶

Here we give some examples of generally-used steps to obtain fluxes, spectra and Ogives. This is just the commonly done procedures, but more can (and sometimes should) be done depending on the data.

Calculating auxiliary variables and units¶

After reading the data it’s easy to rotate the coordinates. Currently, only the 2D rotation is implemented. But in the future more will come (you can contribute with another one yourself!). To rotate, use the pm.rotateCoor function:

In [1]: import pymicra as pm

ImportErrorTraceback (most recent call last)
<ipython-input-1-9b11e6f7f607> in <module>()
----> 1 import pymicra as pm

ImportError: No module named pymicra

In [2]: fname = '../examples/ex_data/20131108-1000.csv'

In [3]: fconfig = pm.fileConfig('../examples/lake.config')

NameErrorTraceback (most recent call last)
<ipython-input-3-2002138cada7> in <module>()
----> 1 fconfig = pm.fileConfig('../examples/lake.config')

NameError: name 'pm' is not defined

In [4]: data, units = pm.timeSeries(fname, fconfig, parse_dates=True)

NameErrorTraceback (most recent call last)
<ipython-input-4-93e03627fecc> in <module>()
----> 1 data, units = pm.timeSeries(fname, fconfig, parse_dates=True)

NameError: name 'pm' is not defined

In [5]: data = data.rotateCoor(how='2d')

NameErrorTraceback (most recent call last)
<ipython-input-5-e72713014a5d> in <module>()
----> 1 data = data.rotateCoor(how='2d')

NameError: name 'data' is not defined

In [6]: print(data.with_units(units).mean())

NameErrorTraceback (most recent call last)
<ipython-input-6-00e79628f730> in <module>()
----> 1 print(data.with_units(units).mean())

NameError: name 'data' is not defined

As you can see, the u, v, w components have been rotated and v and w mean are zero.

Pymicra has a very useful function called preProcess which “expands” the variables in the dataset by using the original variables to calculate new ones, such as mixing ratios, moist and dry air densities etc. In the process some units are also converted, such as temperature units, which are converted to Kelvin, if they are in Celsius. The preProcess function has some options to calculate some variables in determined ways and is very “verbose”, so as to show the user exactly what it is doing:

In [7]: data = pm.preProcess(data, units, expand_temperature=True,
   ...:     use_means=False, rho_air_from_theta_v=True, solutes=['co2'])
   ...: 

NameErrorTraceback (most recent call last)
<ipython-input-7-0a90c42effdf> in <module>()
----> 1 data = pm.preProcess(data, units, expand_temperature=True,
      2     use_means=False, rho_air_from_theta_v=True, solutes=['co2'])

NameError: name 'pm' is not defined

In [8]: print(data.with_units(units).mean())

NameErrorTraceback (most recent call last)
<ipython-input-8-00e79628f730> in <module>()
----> 1 print(data.with_units(units).mean())

NameError: name 'data' is not defined

Note that our dataset now has many other variables and that the temperatures are now in Kelvin. Note also that you must, at this point, specify which solutes you wish to consider. The only solute that Pymicra looks for by default is water vapor.

It is recommended that you carefully observe the output of preProcess to see if it’s doing what you think it’s doing and to analyse its calculation logic. By doing that you can have a better idea of what it does without having to check the code, and how to prevent it from doing some calculations you don’t want. For example, if you’re not happy with the way it calculates moist air density, you can calculate it yourself by extracting Numpy_ arrays using the .values method:

T = data['theta'].values
P = data['p'].values
water_density = data['mrho_h2o'].values
data['rho_air'] = my_rho_air_calculation(T, P, water_density)

Next we calculate the fluctuations. The way to do that is with the detrend() function/method.

In [9]: ddata = data.detrend(how='linear', units=units, ignore=['p', 'theta'])

NameErrorTraceback (most recent call last)
<ipython-input-9-4bfe3d4e5669> in <module>()
----> 1 ddata = data.detrend(how='linear', units=units, ignore=['p', 'theta'])

NameError: name 'data' is not defined

In [10]: print(ddata.with_units(units).mean())

NameErrorTraceback (most recent call last)
<ipython-input-10-b1913e2bc6e7> in <module>()
----> 1 print(ddata.with_units(units).mean())

NameError: name 'ddata' is not defined

Note the ignore keyword with which you can pass which variables you don’t want fluctuations of. In our case both the pressure fluctuations and the thermodynamic temperature (measured with the LI7500 and the datalogger’s internal thermometer) can’t be properly measured with the sensors used. Passing these variables as ignored will prevent Pymicra from calculating these fluctuations, which later on prevents it from inadvertently using the fluctuations of these variables with these sensors when calculating fluxes. theta fluctuations will instead be calculated with virtual temperature fluctuations and pressure fluctuations are generally not used.

The how keyword supports 'linear', 'movingmean', 'movingmedian', 'block' and 'poly'. Each of those (with the exception of 'block' and 'linear') are passed to Pandas or Numpy (pandas.rolling_mean, pandas.rolling_median and numpy.polyfit) and support their respective keywords, such as data.detrend(how='movingmedian', units=units, ignore=['theta', 'p']), min_periods=1) to determine the minimum number of observations in window required to have a value.

Now we must expand our dataset with the fluctuations by joining DataFrames:

In [11]: data = data.join(ddata)

NameErrorTraceback (most recent call last)
<ipython-input-11-8886e3b98fbe> in <module>()
----> 1 data = data.join(ddata)

NameError: name 'data' is not defined

In [12]: print(data.with_units(units).mean())

NameErrorTraceback (most recent call last)
<ipython-input-12-00e79628f730> in <module>()
----> 1 print(data.with_units(units).mean())

NameError: name 'data' is not defined

Creating a site configuration file¶

In order to use some micrometeorological function, we need a site configuration object, or site object. This object tells Pymicra how the instruments are organized and how is experimental site (vegetation, roughness length etc.). This object is created with the siteConfig() call and the easiest way is to use it with a site config file. This is like a fileConfig file, but simpler and for the micrometeorological aspects of the site. Consider this example of a site config file, saved at ../examples/lake.site:

description         = "Site configurations for the lake island"

measurement_height  = 2.75  # m
canopy_height       = 0.1   # m
displacement_height = 0.05  # m
roughness_length    = .01   # m

As in the case of fileConfig, the description is optional, but is handy for organization purposes and printing on screen.

The measurement_height keyword is the general measurement height to be used in the calculation, generally taken as the sonic anemometer height (in meters).

The canopy_height keyword is what it sounds like. Should be the mean canopy height in meters. The displacement_height is the zero-plane displacement height. If it is not given, it’ll be calculated as 2/3 of the mean canopy height.

The roughness_length is the roughness length in meters.

The creation of the site config object is done as

In [13]: siteconf = pm.siteConfig('../examples/lake.site')

NameErrorTraceback (most recent call last)
<ipython-input-13-cdf486c6076f> in <module>()
----> 1 siteconf = pm.siteConfig('../examples/lake.site')

NameError: name 'pm' is not defined

In [14]: print(siteconf)

NameErrorTraceback (most recent call last)
<ipython-input-14-e77236641ec0> in <module>()
----> 1 print(siteconf)

NameError: name 'siteconf' is not defined

Extracting fluxes¶

Finally, we are able to calculate the fluxes and turbulent scales. For that we use the eddyCovariance function along with the get_turbulent_scales keyword (which is true by default). Again, it is recommended that you carefully check the output of this function before moving on:

In [15]: results = pm.eddyCovariance(data, units, site_config=siteconf,
   ....:     get_turbulent_scales=True, wpl=True, solutes=['co2'])
   ....: 

NameErrorTraceback (most recent call last)
<ipython-input-15-8440e806e5b7> in <module>()
----> 1 results = pm.eddyCovariance(data, units, site_config=siteconf,
      2     get_turbulent_scales=True, wpl=True, solutes=['co2'])

NameError: name 'pm' is not defined

In [16]: print(results.with_units(units).mean())

NameErrorTraceback (most recent call last)
<ipython-input-16-47068f36201d> in <module>()
----> 1 print(results.with_units(units).mean())

NameError: name 'results' is not defined

Note that once more we must list the solutes available.

Note also that the units dictionary is automatically updated at with every function and method with the new variables created and their units! That way if you’re in doubt of which unit the outputs are coming, just check units directly or with the .with_units() method.

Check out the example to get fluxes of many files here and download the example data here.

The output of this file is

                                                 tau                   H                  Hv                                 E                  LE  \
                    <kilogram / meter / second ** 2> <watt / meter ** 2> <watt / meter ** 2> <millimole / meter ** 2 / second> <watt / meter ** 2>
2013-11-08 10:00:00                         0.195731           50.837195           74.537609                          7.007486          306.392623
2013-11-08 12:00:00                         0.070182           -7.116972           14.737946                          6.647450          290.331014
2013-11-08 13:00:00                         0.059115           -3.325774           12.743194                          4.875838          212.906974
2013-11-08 16:00:00                         0.017075          -13.847040           -4.131275                          2.961474          128.982980
2013-11-08 17:00:00                         0.003284           -5.557620           -2.525949                          0.931213           40.576805
2013-11-08 18:00:00                         0.026564           -6.632330           -2.782190                          1.184226           51.654208
2013-11-08 19:00:00                         0.044550          -14.860873          -11.991508                          0.925637           40.476446

                              u_star theta_v_star theta_star            mrho_h2o_star          Lo            zeta          q_star
                    <meter / second>     <kelvin>   <kelvin> <millimole / meter ** 3>     <meter> <dimensionless> <dimensionless>
2013-11-08 10:00:00         0.412886     0.156686   0.106865                16.971952  -83.429641       -0.032363        0.000266
2013-11-08 12:00:00         0.247697     0.051834  -0.025031                26.837060  -90.992476       -0.029673        0.000423
2013-11-08 13:00:00         0.227652     0.048903  -0.012763                21.417958  -81.621514       -0.033080        0.000338
2013-11-08 16:00:00         0.122980    -0.029651  -0.099383                24.080946   39.583979        0.068209        0.000384
2013-11-08 17:00:00         0.053951    -0.041353  -0.090986                17.260443    5.463569        0.494182        0.000276
2013-11-08 18:00:00         0.153328    -0.016003  -0.038149                 7.723494  113.854892        0.023714        0.000123
2013-11-08 19:00:00         0.198083    -0.053132  -0.065846                 4.672974   56.961515        0.047400        0.000074

Note that the date parsing when reading the file comes is handy now, although there are computationally faster ways to do it other than setting parse_dates=True in the timeSeries() call.

Obtaining the spectra¶

Using Numpy’s fast Fourier transform implementation, Pymicra is also able to extract spectra, co-spectra and quadratures.

We begin normally with out data:

In [17]: import pymicra as pm

ImportErrorTraceback (most recent call last)
<ipython-input-17-9b11e6f7f607> in <module>()
----> 1 import pymicra as pm

ImportError: No module named pymicra

In [18]: fname = '../examples/ex_data/20131108-1000.csv'

In [19]: fconfig = pm.fileConfig('../examples/lake.config')

NameErrorTraceback (most recent call last)
<ipython-input-19-2002138cada7> in <module>()
----> 1 fconfig = pm.fileConfig('../examples/lake.config')

NameError: name 'pm' is not defined

In [20]: data, units = pm.timeSeries(fname, fconfig, parse_dates=True)

NameErrorTraceback (most recent call last)
<ipython-input-20-93e03627fecc> in <module>()
----> 1 data, units = pm.timeSeries(fname, fconfig, parse_dates=True)

NameError: name 'pm' is not defined

In [21]: data = data.rotateCoor(how='2d')

NameErrorTraceback (most recent call last)
<ipython-input-21-e72713014a5d> in <module>()
----> 1 data = data.rotateCoor(how='2d')

NameError: name 'data' is not defined

In [22]: data = pm.preProcess(data, units, expand_temperature=True,
   ....:     use_means=False, rho_air_from_theta_v=True, solutes=['co2'])
   ....: 

NameErrorTraceback (most recent call last)
<ipython-input-22-0a90c42effdf> in <module>()
----> 1 data = pm.preProcess(data, units, expand_temperature=True,
      2     use_means=False, rho_air_from_theta_v=True, solutes=['co2'])

NameError: name 'pm' is not defined

In [23]: ddata = data.detrend(how='linear', units=units, ignore=['p', 'theta'])

NameErrorTraceback (most recent call last)
<ipython-input-23-4bfe3d4e5669> in <module>()
----> 1 ddata = data.detrend(how='linear', units=units, ignore=['p', 'theta'])

NameError: name 'data' is not defined

In [24]: spectra = pm.spectra(ddata[["q'", "theta_v'"]], frequency=20, anti_aliasing=True)

NameErrorTraceback (most recent call last)
<ipython-input-24-b4682cc3e5e6> in <module>()
----> 1 spectra = pm.spectra(ddata[["q'", "theta_v'"]], frequency=20, anti_aliasing=True)

NameError: name 'pm' is not defined

In [25]: print(spectra)

NameErrorTraceback (most recent call last)
<ipython-input-25-ea92637c3c3c> in <module>()
----> 1 print(spectra)

NameError: name 'spectra' is not defined

We can plot it with the raw points, but it’s hard to see anything

In [26]: spectra.plot(loglog=True, style='o')

NameErrorTraceback (most recent call last)
<ipython-input-26-3209f40b2847> in <module>()
----> 1 spectra.plot(loglog=True, style='o')

NameError: name 'spectra' is not defined

In [27]: plt.show()

The best option it to apply a binning procedure before plotting it:

In [28]: spectra.binned(bins_number=100).plot(loglog=True, style='o')

NameErrorTraceback (most recent call last)
<ipython-input-28-6dfea9cdcb8e> in <module>()
----> 1 spectra.binned(bins_number=100).plot(loglog=True, style='o')

NameError: name 'spectra' is not defined

In [29]: plt.show()

We can also calculate the cross-spectra

In [30]: crspectra = pm.crossSpectra(ddata[["q'", "theta_v'", "w'"]], frequency=20, anti_aliasing=True)

NameErrorTraceback (most recent call last)
<ipython-input-30-56e3cca37107> in <module>()
----> 1 crspectra = pm.crossSpectra(ddata[["q'", "theta_v'", "w'"]], frequency=20, anti_aliasing=True)

NameError: name 'pm' is not defined

In [31]: print(crspectra)

NameErrorTraceback (most recent call last)
<ipython-input-31-e1220752c416> in <module>()
----> 1 print(crspectra)

NameError: name 'crspectra' is not defined

We can then get the cospectra and plot it’s binned version (the same can be done with the .quadrature() method). It’s important to note that, although to pass from cross-spectra to cospectra one can merely do data.apply(np.real), it’s recommended to use the .cospectra() method or the pm.spectral.cospectra() function, since this way the notation on the resulting DataFrame will be correctly passed on, as you can see by the legend in the resulting plot.

In [32]: cospectra = crspectra[[r"X_q'_w'", r"X_theta_v'_w'"]].cospectra()

NameErrorTraceback (most recent call last)
<ipython-input-32-c593c7884705> in <module>()
----> 1 cospectra = crspectra[[r"X_q'_w'", r"X_theta_v'_w'"]].cospectra()

NameError: name 'crspectra' is not defined

In [33]: cospectra.binned(bins_number=100).plot(loglog=True, style='o')

NameErrorTraceback (most recent call last)
<ipython-input-33-69235c4682be> in <module>()
----> 1 cospectra.binned(bins_number=100).plot(loglog=True, style='o')

NameError: name 'cospectra' is not defined

In [34]: plt.show()

Now we can finally obtain an Ogive with the pm.spectral.Ogive() function. Note that we plot Og/Og.sum() (i.e. we normalize the ogive) instead of Og merely for visualization purposes, as the scale of both ogives are too different.

In [35]: Og = pm.spectral.Ogive(cospectra)

NameErrorTraceback (most recent call last)
<ipython-input-35-601571afe6a0> in <module>()
----> 1 Og = pm.spectral.Ogive(cospectra)

NameError: name 'pm' is not defined

In [36]: (Og/Og.sum()).plot(logx=True)

NameErrorTraceback (most recent call last)
<ipython-input-36-88336701827c> in <module>()
----> 1 (Og/Og.sum()).plot(logx=True)

NameError: name 'Og' is not defined

In [37]: plt.show()

Note

Include some examples for spectral corrections.

Table Of Contents

This Page