Hands-on tutorial¶
Here we give some examples of generally-used steps to obtain fluxes, spectra and Ogives. This is just the commonly done procedures, but more can (and sometimes should) be done depending on the data.
Calculating auxiliary variables and units¶
After reading the data it’s easy to rotate the coordinates. Currently, only the
2D rotation is implemented. But in the future more will come (you can
contribute with another one yourself!). To rotate, use the pm.rotateCoor
function:
In [1]: import pymicra as pm
ImportErrorTraceback (most recent call last)
<ipython-input-1-9b11e6f7f607> in <module>()
----> 1 import pymicra as pm
ImportError: No module named pymicra
In [2]: fname = '../examples/ex_data/20131108-1000.csv'
In [3]: fconfig = pm.fileConfig('../examples/lake.config')
NameErrorTraceback (most recent call last)
<ipython-input-3-2002138cada7> in <module>()
----> 1 fconfig = pm.fileConfig('../examples/lake.config')
NameError: name 'pm' is not defined
In [4]: data, units = pm.timeSeries(fname, fconfig, parse_dates=True)
NameErrorTraceback (most recent call last)
<ipython-input-4-93e03627fecc> in <module>()
----> 1 data, units = pm.timeSeries(fname, fconfig, parse_dates=True)
NameError: name 'pm' is not defined
In [5]: data = data.rotateCoor(how='2d')
NameErrorTraceback (most recent call last)
<ipython-input-5-e72713014a5d> in <module>()
----> 1 data = data.rotateCoor(how='2d')
NameError: name 'data' is not defined
In [6]: print(data.with_units(units).mean())
NameErrorTraceback (most recent call last)
<ipython-input-6-00e79628f730> in <module>()
----> 1 print(data.with_units(units).mean())
NameError: name 'data' is not defined
As you can see, the u, v, w components have been rotated and v and w
mean are zero.
Pymicra has a very useful function called preProcess which “expands” the
variables in the dataset by using the original variables to calculate new ones,
such as mixing ratios, moist and dry air densities etc. In the process some
units are also converted, such as temperature units, which are converted to
Kelvin, if they are in Celsius. The preProcess function has some options to
calculate some variables in determined ways and is very “verbose”, so as to
show the user exactly what it is doing:
In [7]: data = pm.preProcess(data, units, expand_temperature=True,
...: use_means=False, rho_air_from_theta_v=True, solutes=['co2'])
...:
NameErrorTraceback (most recent call last)
<ipython-input-7-0a90c42effdf> in <module>()
----> 1 data = pm.preProcess(data, units, expand_temperature=True,
2 use_means=False, rho_air_from_theta_v=True, solutes=['co2'])
NameError: name 'pm' is not defined
In [8]: print(data.with_units(units).mean())
NameErrorTraceback (most recent call last)
<ipython-input-8-00e79628f730> in <module>()
----> 1 print(data.with_units(units).mean())
NameError: name 'data' is not defined
Note that our dataset now has many other variables and that the temperatures are now in Kelvin. Note also that you must, at this point, specify which solutes you wish to consider. The only solute that Pymicra looks for by default is water vapor.
It is recommended that you carefully observe the output of preProcess to
see if it’s doing what you think it’s doing and to analyse its calculation
logic. By doing that you can have a better idea of what it does without having
to check the code, and how to prevent it from doing some calculations you don’t
want. For example, if you’re not happy with the way it calculates moist air
density, you can calculate it yourself by extracting Numpy_ arrays using the
.values method:
T = data['theta'].values
P = data['p'].values
water_density = data['mrho_h2o'].values
data['rho_air'] = my_rho_air_calculation(T, P, water_density)
Next we calculate the fluctuations. The way to do that is with the detrend()
function/method.
In [9]: ddata = data.detrend(how='linear', units=units, ignore=['p', 'theta'])
NameErrorTraceback (most recent call last)
<ipython-input-9-4bfe3d4e5669> in <module>()
----> 1 ddata = data.detrend(how='linear', units=units, ignore=['p', 'theta'])
NameError: name 'data' is not defined
In [10]: print(ddata.with_units(units).mean())
NameErrorTraceback (most recent call last)
<ipython-input-10-b1913e2bc6e7> in <module>()
----> 1 print(ddata.with_units(units).mean())
NameError: name 'ddata' is not defined
Note the ignore keyword with which you can pass which variables you don’t
want fluctuations of. In our case both the pressure fluctuations and the
thermodynamic temperature (measured with the LI7500 and the datalogger’s
internal thermometer) can’t be properly measured with the sensors used. Passing
these variables as ignored will prevent Pymicra from calculating these
fluctuations, which later on prevents it from inadvertently using the
fluctuations of these variables with these sensors when calculating fluxes.
theta fluctuations will instead be calculated with virtual temperature
fluctuations and pressure fluctuations are generally not used.
The how keyword supports 'linear', 'movingmean',
'movingmedian', 'block' and 'poly'. Each of those (with the
exception of 'block' and 'linear') are passed to Pandas or Numpy
(pandas.rolling_mean, pandas.rolling_median and numpy.polyfit) and
support their respective keywords, such as data.detrend(how='movingmedian',
units=units, ignore=['theta', 'p']), min_periods=1) to determine the minimum
number of observations in window required to have a value.
Now we must expand our dataset with the fluctuations by joining DataFrames:
In [11]: data = data.join(ddata)
NameErrorTraceback (most recent call last)
<ipython-input-11-8886e3b98fbe> in <module>()
----> 1 data = data.join(ddata)
NameError: name 'data' is not defined
In [12]: print(data.with_units(units).mean())
NameErrorTraceback (most recent call last)
<ipython-input-12-00e79628f730> in <module>()
----> 1 print(data.with_units(units).mean())
NameError: name 'data' is not defined
Creating a site configuration file¶
In order to use some micrometeorological function, we need a site configuration
object, or site object. This object tells Pymicra how the instruments are organized
and how is experimental site (vegetation, roughness length etc.). This object is created
with the siteConfig() call and the easiest way is to use it with a site config file. This
is like a fileConfig file, but simpler and for the micrometeorological aspects of the site.
Consider this example of a site config file, saved at ../examples/lake.site:
description = "Site configurations for the lake island"
measurement_height = 2.75 # m
canopy_height = 0.1 # m
displacement_height = 0.05 # m
roughness_length = .01 # m
As in the case of fileConfig, the description is optional, but is handy for
organization purposes and printing on screen.
The measurement_height keyword is the general measurement height to be used
in the calculation, generally taken as the sonic anemometer height (in meters).
The canopy_height keyword is what it sounds like. Should be the mean canopy
height in meters. The displacement_height is the zero-plane displacement
height. If it is not given, it’ll be calculated as 2/3 of the mean canopy
height.
The roughness_length is the roughness length in meters.
The creation of the site config object is done as
In [13]: siteconf = pm.siteConfig('../examples/lake.site')
NameErrorTraceback (most recent call last)
<ipython-input-13-cdf486c6076f> in <module>()
----> 1 siteconf = pm.siteConfig('../examples/lake.site')
NameError: name 'pm' is not defined
In [14]: print(siteconf)
NameErrorTraceback (most recent call last)
<ipython-input-14-e77236641ec0> in <module>()
----> 1 print(siteconf)
NameError: name 'siteconf' is not defined
Extracting fluxes¶
Finally, we are able to calculate the fluxes and turbulent scales. For that we
use the eddyCovariance function along with the get_turbulent_scales
keyword (which is true by default). Again, it is recommended that you carefully
check the output of this function before moving on:
In [15]: results = pm.eddyCovariance(data, units, site_config=siteconf,
....: get_turbulent_scales=True, wpl=True, solutes=['co2'])
....:
NameErrorTraceback (most recent call last)
<ipython-input-15-8440e806e5b7> in <module>()
----> 1 results = pm.eddyCovariance(data, units, site_config=siteconf,
2 get_turbulent_scales=True, wpl=True, solutes=['co2'])
NameError: name 'pm' is not defined
In [16]: print(results.with_units(units).mean())
NameErrorTraceback (most recent call last)
<ipython-input-16-47068f36201d> in <module>()
----> 1 print(results.with_units(units).mean())
NameError: name 'results' is not defined
Note that once more we must list the solutes available.
Note also that the units dictionary is automatically updated at with every
function and method with the new variables created and their units! That way if
you’re in doubt of which unit the outputs are coming, just check units
directly or with the .with_units() method.
Check out the example to get fluxes of many files here and download the example data here.
The output of this file is
tau H Hv E LE \
<kilogram / meter / second ** 2> <watt / meter ** 2> <watt / meter ** 2> <millimole / meter ** 2 / second> <watt / meter ** 2>
2013-11-08 10:00:00 0.195731 50.837195 74.537609 7.007486 306.392623
2013-11-08 12:00:00 0.070182 -7.116972 14.737946 6.647450 290.331014
2013-11-08 13:00:00 0.059115 -3.325774 12.743194 4.875838 212.906974
2013-11-08 16:00:00 0.017075 -13.847040 -4.131275 2.961474 128.982980
2013-11-08 17:00:00 0.003284 -5.557620 -2.525949 0.931213 40.576805
2013-11-08 18:00:00 0.026564 -6.632330 -2.782190 1.184226 51.654208
2013-11-08 19:00:00 0.044550 -14.860873 -11.991508 0.925637 40.476446
u_star theta_v_star theta_star mrho_h2o_star Lo zeta q_star
<meter / second> <kelvin> <kelvin> <millimole / meter ** 3> <meter> <dimensionless> <dimensionless>
2013-11-08 10:00:00 0.412886 0.156686 0.106865 16.971952 -83.429641 -0.032363 0.000266
2013-11-08 12:00:00 0.247697 0.051834 -0.025031 26.837060 -90.992476 -0.029673 0.000423
2013-11-08 13:00:00 0.227652 0.048903 -0.012763 21.417958 -81.621514 -0.033080 0.000338
2013-11-08 16:00:00 0.122980 -0.029651 -0.099383 24.080946 39.583979 0.068209 0.000384
2013-11-08 17:00:00 0.053951 -0.041353 -0.090986 17.260443 5.463569 0.494182 0.000276
2013-11-08 18:00:00 0.153328 -0.016003 -0.038149 7.723494 113.854892 0.023714 0.000123
2013-11-08 19:00:00 0.198083 -0.053132 -0.065846 4.672974 56.961515 0.047400 0.000074
Note that the date parsing when reading the file comes is handy now, although
there are computationally faster ways to do it other than setting
parse_dates=True in the timeSeries() call.
Obtaining the spectra¶
Using Numpy’s fast Fourier transform implementation, Pymicra is also able to extract spectra, co-spectra and quadratures.
We begin normally with out data:
In [17]: import pymicra as pm
ImportErrorTraceback (most recent call last)
<ipython-input-17-9b11e6f7f607> in <module>()
----> 1 import pymicra as pm
ImportError: No module named pymicra
In [18]: fname = '../examples/ex_data/20131108-1000.csv'
In [19]: fconfig = pm.fileConfig('../examples/lake.config')
NameErrorTraceback (most recent call last)
<ipython-input-19-2002138cada7> in <module>()
----> 1 fconfig = pm.fileConfig('../examples/lake.config')
NameError: name 'pm' is not defined
In [20]: data, units = pm.timeSeries(fname, fconfig, parse_dates=True)
NameErrorTraceback (most recent call last)
<ipython-input-20-93e03627fecc> in <module>()
----> 1 data, units = pm.timeSeries(fname, fconfig, parse_dates=True)
NameError: name 'pm' is not defined
In [21]: data = data.rotateCoor(how='2d')
NameErrorTraceback (most recent call last)
<ipython-input-21-e72713014a5d> in <module>()
----> 1 data = data.rotateCoor(how='2d')
NameError: name 'data' is not defined
In [22]: data = pm.preProcess(data, units, expand_temperature=True,
....: use_means=False, rho_air_from_theta_v=True, solutes=['co2'])
....:
NameErrorTraceback (most recent call last)
<ipython-input-22-0a90c42effdf> in <module>()
----> 1 data = pm.preProcess(data, units, expand_temperature=True,
2 use_means=False, rho_air_from_theta_v=True, solutes=['co2'])
NameError: name 'pm' is not defined
In [23]: ddata = data.detrend(how='linear', units=units, ignore=['p', 'theta'])
NameErrorTraceback (most recent call last)
<ipython-input-23-4bfe3d4e5669> in <module>()
----> 1 ddata = data.detrend(how='linear', units=units, ignore=['p', 'theta'])
NameError: name 'data' is not defined
In [24]: spectra = pm.spectra(ddata[["q'", "theta_v'"]], frequency=20, anti_aliasing=True)
NameErrorTraceback (most recent call last)
<ipython-input-24-b4682cc3e5e6> in <module>()
----> 1 spectra = pm.spectra(ddata[["q'", "theta_v'"]], frequency=20, anti_aliasing=True)
NameError: name 'pm' is not defined
In [25]: print(spectra)
NameErrorTraceback (most recent call last)
<ipython-input-25-ea92637c3c3c> in <module>()
----> 1 print(spectra)
NameError: name 'spectra' is not defined
We can plot it with the raw points, but it’s hard to see anything
In [26]: spectra.plot(loglog=True, style='o')
NameErrorTraceback (most recent call last)
<ipython-input-26-3209f40b2847> in <module>()
----> 1 spectra.plot(loglog=True, style='o')
NameError: name 'spectra' is not defined
In [27]: plt.show()
The best option it to apply a binning procedure before plotting it:
In [28]: spectra.binned(bins_number=100).plot(loglog=True, style='o')
NameErrorTraceback (most recent call last)
<ipython-input-28-6dfea9cdcb8e> in <module>()
----> 1 spectra.binned(bins_number=100).plot(loglog=True, style='o')
NameError: name 'spectra' is not defined
In [29]: plt.show()
We can also calculate the cross-spectra
In [30]: crspectra = pm.crossSpectra(ddata[["q'", "theta_v'", "w'"]], frequency=20, anti_aliasing=True)
NameErrorTraceback (most recent call last)
<ipython-input-30-56e3cca37107> in <module>()
----> 1 crspectra = pm.crossSpectra(ddata[["q'", "theta_v'", "w'"]], frequency=20, anti_aliasing=True)
NameError: name 'pm' is not defined
In [31]: print(crspectra)
NameErrorTraceback (most recent call last)
<ipython-input-31-e1220752c416> in <module>()
----> 1 print(crspectra)
NameError: name 'crspectra' is not defined
We can then get the cospectra and plot it’s binned version (the same can be
done with the .quadrature() method). It’s important to note that, although
to pass from cross-spectra to cospectra one can merely do
data.apply(np.real), it’s recommended to use the .cospectra() method or
the pm.spectral.cospectra() function, since this way the notation on the
resulting DataFrame will be correctly passed on, as you can see by the legend
in the resulting plot.
In [32]: cospectra = crspectra[[r"X_q'_w'", r"X_theta_v'_w'"]].cospectra()
NameErrorTraceback (most recent call last)
<ipython-input-32-c593c7884705> in <module>()
----> 1 cospectra = crspectra[[r"X_q'_w'", r"X_theta_v'_w'"]].cospectra()
NameError: name 'crspectra' is not defined
In [33]: cospectra.binned(bins_number=100).plot(loglog=True, style='o')
NameErrorTraceback (most recent call last)
<ipython-input-33-69235c4682be> in <module>()
----> 1 cospectra.binned(bins_number=100).plot(loglog=True, style='o')
NameError: name 'cospectra' is not defined
In [34]: plt.show()
Now we can finally obtain an Ogive with the pm.spectral.Ogive() function.
Note that we plot Og/Og.sum() (i.e. we normalize the ogive) instead of
Og merely for visualization purposes, as the scale of both ogives are too
different.
In [35]: Og = pm.spectral.Ogive(cospectra)
NameErrorTraceback (most recent call last)
<ipython-input-35-601571afe6a0> in <module>()
----> 1 Og = pm.spectral.Ogive(cospectra)
NameError: name 'pm' is not defined
In [36]: (Og/Og.sum()).plot(logx=True)
NameErrorTraceback (most recent call last)
<ipython-input-36-88336701827c> in <module>()
----> 1 (Og/Og.sum()).plot(logx=True)
NameError: name 'Og' is not defined
In [37]: plt.show()
Note
Include some examples for spectral corrections.