pymicra.algs package

Submodules

pymicra.algs.auxiliar

pymicra.algs.auxiliar.applyResult(result, failed, df, control=None, testname=None, filename=None, failshow=False, index_n=None)

Auxiliary function to be used with util.qcontrol

Parameters:
  • result (bool) – whether the test failed or succeeded
  • failed (list) – list of failed variables. None if the test was successful
  • control (dictionary) – dictionary whose keys are the names of the tests and whose values are lists
  • testname (string) – name of the test (has to match control dict)
  • filename (string) – name or path or identifier of the file tested
  • failshow (bool) – whether to show the failed variables or not
pymicra.algs.auxiliar.first_last(fname)

Returns first and last lines of a file
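
A minimal sketch of the idea, not pymicra's implementation: read the first line, then keep iterating so that the last line is whatever was read most recently. The name first_and_last_lines is hypothetical.

```python
import os
import tempfile

def first_and_last_lines(fname):
    """Return (first, last) lines of a text file, without trailing newlines."""
    with open(fname) as f:
        first = last = f.readline()
        for last in f:  # after the loop, `last` holds the final line
            pass
    return first.rstrip("\n"), last.rstrip("\n")

# Usage with a throwaway file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("header\nrow 1\nrow 2\n")
print(first_and_last_lines(tmp.name))
os.remove(tmp.name)
```

Only one line is held in memory at a time, which matters for long datalogger files.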

pymicra.algs.auxiliar.lenYear(year)

Calculates the length of a year in days. Useful to figure out whether a certain year is a leap year.
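
The same result can be sketched with the standard library alone. The helper name len_year is hypothetical and this is not necessarily how lenYear is implemented.

```python
import calendar

def len_year(year):
    """Number of days in a Gregorian calendar year."""
    return 366 if calendar.isleap(year) else 365

print(len_year(2015), len_year(2016))  # 365 366
```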

pymicra.algs.auxiliar.stripDown(str, final='', args=['_', '-'])

Auxiliary function to strip down keywords from symbols

pymicra.algs.auxiliar.testValid(df_valid, testname='', failverbose=True, passverbose=True, filepath=None)

Tests a boolean DataFrame obtained from the test and prints the outcome to standard output

Parameters:
  • df_valid (pandas.Series) – series containing only True or False values for each of the variables, which should be the indexes
  • testname (string) – the name of the test that generated the True/False values
  • failverbose (bool) – whether to return which variables caused a false result
  • passverbose (bool) – whether to print something for successful cases
Returns:

  • result (bool) – True if the run passed the test
  • failed (list) – list of failed variables if result==False. None otherwise.
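
The return convention above can be sketched as follows. This is an illustration only: a plain dict stands in for the boolean pandas.Series, and the name summarize_validity is hypothetical.

```python
def summarize_validity(valid, testname=""):
    """valid maps variable name -> bool; returns (result, failed)."""
    failed = [name for name, ok in valid.items() if not ok]
    if failed:
        print("{}: failed for {}".format(testname, failed))
        return False, failed
    print("{}: passed".format(testname))
    return True, None

result, failed = summarize_validity({"u": True, "v": False}, testname="spikes")
```

The (result, failed) pair matches the first two arguments documented for applyResult.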

pymicra.algs.general

pymicra.algs.general.classbin(x, y, bins_number=100, function=<function mean>, xfunction=<function mean>, logscale=True)

Separates x and y inputs into bins based on the x array. x and y do not have to be ordered.

Parameters:
  • x (np.array) – independent variable
  • y (np.array) – dependent variable
  • bins_number (int) – number of classes (or bins) desired
  • function (callable) – function to be applied to both x- and y-bins in order to smooth the data
  • logscale (boolean) – whether or not to use a log-spaced scale to set the bins
Returns:

  • np.array – x binned
  • np.array – y binned
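
A rough sketch of binning by x with plain lists, assuming the mean is applied to each bin and that empty bins are dropped (how classbin treats empty bins is not stated here). The name bin_by_x is hypothetical, and x must span a nonzero range.

```python
import math
from statistics import mean

def bin_by_x(x, y, bins_number=4, logscale=False):
    """Group (x, y) pairs into bins of x and average each non-empty bin."""
    # Log-spaced bins are equally spaced bins in log10(x); requires x > 0.
    key = [math.log10(v) for v in x] if logscale else list(x)
    lo, hi = min(key), max(key)
    width = (hi - lo) / bins_number
    groups = {}
    for k, xv, yv in zip(key, x, y):
        # Clamp so that the maximum value falls in the last bin.
        i = min(int((k - lo) / width), bins_number - 1)
        groups.setdefault(i, []).append((xv, yv))
    xb = [mean([p[0] for p in groups[i]]) for i in sorted(groups)]
    yb = [mean([p[1] for p in groups[i]]) for i in sorted(groups)]
    return xb, yb

x = [8, 1, 6, 3, 2, 7, 4, 5]  # unordered on purpose
y = [10 * v for v in x]
print(bin_by_x(x, y, bins_number=4))
```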

pymicra.algs.general.diff_central(x, y)

Applies the central finite difference scheme

Parameters:
  • x (array) – independent variable
  • y (array) – dependent variable
Returns:

dydx – the dependent variable differentiated

Return type:

array
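
For reference, a central difference estimates the derivative at point i as (y[i+1] - y[i-1]) / (x[i+1] - x[i-1]). The sketch below returns interior points only; how diff_central treats the two end points is not documented here, so that choice is an assumption, as is the name central_difference.

```python
def central_difference(x, y):
    """dy/dx at interior points: (y[i+1] - y[i-1]) / (x[i+1] - x[i-1])."""
    return [(y[i + 1] - y[i - 1]) / (x[i + 1] - x[i - 1])
            for i in range(1, len(x) - 1)]

x = [0.0, 1.0, 2.0, 3.0]
y = [v ** 2 for v in x]          # y = x^2, so dy/dx = 2x
print(central_difference(x, y))  # [2.0, 4.0]
```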

pymicra.algs.general.file_len(fname)

Returns the length of a file by piping to the shell command wc

Parameters: fname (string) – path of the file
pymicra.algs.general.find_nearest(array, value)

Small function that finds the index of the element of an array that is nearest to a given value

Parameters:
  • array (array) – list or array
  • value (float) – value to look for in the array
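
A minimal sketch of the lookup with a plain list (hypothetical name find_nearest_index). Ties resolve to the first index, which may differ from what find_nearest does.

```python
def find_nearest_index(array, value):
    """Index of the element with the smallest absolute distance to value."""
    return min(range(len(array)), key=lambda i: abs(array[i] - value))

print(find_nearest_index([0.0, 2.5, 5.0, 7.5], 5.9))  # 2
```
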
pymicra.algs.general.fitByDate(data, degree=1, rule=None)

Given a pandas DataFrame with a datetime index, this routine fits an n-degree polynomial to the dataset

Parameters:
  • data (pd.DataFrame, pd.Series) – dataframe whose columns have to be fitted
  • degree (int) – degree of the polynomial. Default is 1.
  • rule (str) – pandas offset string. Ex.: “10min”.
pymicra.algs.general.fitWrap(x, y, degree=1)

A wrapper to numpy.polyfit and numpy.polyval that fits data given x and y arrays. This is specifically designed to be used with the pandas.DataFrame.apply method

Parameters:
  • x (array, list) –
  • y (array, list) –
  • degree (int) –
pymicra.algs.general.get_index(x, to_look_for)

Just like the .index method of lists, except it works for multiple values

Parameters:
  • x (list or array) – the main array
  • to_look_for (list or array) – the subset of the main whose indexes are desired
Returns:

indexes – array with the indexes of each element of to_look_for

Return type:

np.array
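
The behavior can be sketched with plain lists as below. The name get_indexes is hypothetical; the first occurrence is used for repeated values and a missing value raises KeyError, both of which are assumptions about details not stated here.

```python
def get_indexes(x, to_look_for):
    """Index in x of each element of to_look_for (first occurrence)."""
    position = {}
    for i, item in enumerate(x):
        position.setdefault(item, i)  # keep the first occurrence only
    return [position[item] for item in to_look_for]

print(get_indexes(["u", "v", "w", "theta"], ["w", "u"]))  # [2, 0]
```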

pymicra.algs.general.get_notation(notation_def)

Auxiliary function to retrieve notation

pymicra.algs.general.inverse_normal_cdf(mu, sigma)

Applies the inverse normal cumulative distribution

Parameters:
  • mu – mean
  • sigma – standard deviation
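
For orientation, the inverse normal CDF (quantile function) maps a probability p to the value below which that fraction of a normal distribution lies. The documented signature takes only mu and sigma, so how the probability is supplied is not stated here; the sketch below takes it as an explicit argument p, which is an assumption, and uses the standard library rather than pymicra.

```python
from statistics import NormalDist

def normal_quantile(p, mu=0.0, sigma=1.0):
    """Value x such that P(X <= x) = p for X ~ Normal(mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(p)

print(normal_quantile(0.5, mu=10.0, sigma=2.0))  # the median equals the mean
print(normal_quantile(0.975))                    # about 1.96 for the standard normal
```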

pymicra.algs.general.latexify(variables, math_mode=True)
pymicra.algs.general.limitedSubs(data, max_interp=3, func=<function <lambda>>)

Substitutes NaNs for the elements that meet the condition given by func, provided the condition is met at most max_interp times in a row. Runs longer than max_interp are not substituted.

Parameters:
  • data (pandas.DataFrame) – data to be interpolated
  • max_interp (int) – maximum number of NaNs in a row to interpolate
  • func (function) – function of x only that determines which elements become NaNs. Should return only True or False.
Returns:

df – dataframe with the elements substituted

Return type:

pandas.DataFrame
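
The run-length rule can be sketched on a plain list as below. This is an illustration, not pymicra's code: the name limited_nan_substitution is hypothetical and func must be passed explicitly.

```python
import math
from itertools import groupby

def limited_nan_substitution(values, func, max_interp=3):
    """Replace flagged values by NaN, but only in runs of at most max_interp."""
    out = []
    # groupby yields consecutive runs of flagged / unflagged values
    for flagged, run in groupby(values, key=lambda v: bool(func(v))):
        run = list(run)
        if flagged and len(run) <= max_interp:
            out.extend([math.nan] * len(run))
        else:
            out.extend(run)  # unflagged values and long runs are kept as they are
    return out

spiky = [1, 999, 2, 999, 999, 999, 3]
print(limited_nan_substitution(spiky, func=lambda v: v > 100, max_interp=2))
```

In this example the isolated 999 becomes NaN, while the run of three is longer than max_interp=2 and is left untouched.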

pymicra.algs.general.limited_interpolation(data, maxcount=3)

Interpolates linearly, but only if the gap is smaller than or equal to maxcount

Parameters:
  • data (pandas.DataFrame) – dataset to interpolate
  • maxcount (int) – maximum number of consecutive NaNs to interpolate. If a gap is longer than that, nothing is done with its points.
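
A minimal sketch of gap-limited linear interpolation on a plain list, assuming equally spaced samples and leaving leading or trailing NaNs untouched (both are assumptions; the name interpolate_short_gaps is hypothetical).

```python
import math

def interpolate_short_gaps(values, maxcount=3):
    """Linearly fill interior NaN gaps of length <= maxcount."""
    out = list(values)
    i, n = 0, len(out)
    while i < n:
        if not math.isnan(out[i]):
            i += 1
            continue
        start = i                      # first NaN of the gap
        while i < n and math.isnan(out[i]):
            i += 1
        gap = i - start                # i is now the first valid index after the gap
        if start > 0 and i < n and gap <= maxcount:
            left, right = out[start - 1], out[i]
            for k in range(gap):
                out[start + k] = left + (right - left) * (k + 1) / (gap + 1)
    return out

nan = math.nan
print(interpolate_short_gaps([1.0, nan, 3.0, nan, nan, nan, 7.0], maxcount=2))
```

With maxcount=2 the single missing point is filled with 2.0 and the three-point gap is left as NaN.
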
pymicra.algs.general.line2date(line, dlconfig)

Gets a date from a line of a file according to a dataloggerConfig object.

Parameters:
  • line (string) – line of file with date inside
  • dlconfig (pymicra.dataloggerConfig) – configuration of the datalogger
Returns:

timestamp

Return type:

datetime object

pymicra.algs.general.mad(data, axis=None)
pymicra.algs.general.name2date(filename, dlconfig)

Gets a date from the name of the file according to a datalogger config object

Parameters:
  • filename (string) – the (base) name of the file
  • dlconfig (pymicra.dataloggerConfig) – configuration of the datalogger
Returns:

  • cdate (datetime object)

Warning: Needs to be optimized in order to read question markers also after the date

pymicra.algs.general.parseDates(data, dataloggerConfig=None, date_col_names=None, clean=True, verbose=False, connector='')

Author: Tomas Chor. Date: 2015-08-10.

This routine parses the date from a pandas DataFrame when it is divided into several columns

Parameters:
  • data (pandas DataFrame) – dataFrame whose dates have to be parsed
  • date_col_names (list) – a list of the names of the columns into which the date is divided. The names of the date columns must follow the datetime directives, so if the first column holds only the year, its name must be %Y, and so forth. See https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
  • connector (string) – should be used only when the default connector causes some conflict
  • first_time_skip (int) – the offset (mostly because of the bad conversion done by LBA)
  • clean (bool) – remove the date columns from data after the date is introduced as index
Returns:

data indexed by timestamp

Return type:

pandas.DataFrame
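
The naming rule for date_col_names can be illustrated on a single row. In this sketch (not pymicra's code) the column names are joined into a strptime format and the column values into the matching string. The name parse_date_columns is hypothetical, a dict stands in for a DataFrame row, and "-" is used as connector for readability, whereas the documented default is an empty string.

```python
from datetime import datetime

def parse_date_columns(row, date_col_names, connector="-"):
    """Build a datetime from the fields of `row` named by strptime directives."""
    fmt = connector.join(date_col_names)
    text = connector.join(str(row[name]) for name in date_col_names)
    return datetime.strptime(text, fmt)

# Year, day of year and a combined hour-minute column, plus one data column
row = {"%Y": 2015, "%j": 222, "%H%M": "1430", "u": 1.2}
print(parse_date_columns(row, ["%Y", "%j", "%H%M"]))  # 2015-08-10 14:30:00
```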

pymicra.algs.general.resample(df, rule, how=None, **kwargs)

Extends the pandas resample methods to indexes made of integers

pymicra.algs.general.splitData(data, rule='30min', return_index=False, **kwargs)

Splits a given pandas DataFrame into a series of “rule”-spaced DataFrames

Parameters:
  • data (pandas dataframe) – data to be split
  • rule (str or int) –
    If it is a string, it should be a pandas offset string.
    Some possible values (which can be preceded by an integer multiplier, as in the default '30min') are:
      D    calendar day frequency
      W    weekly frequency
      M    month end frequency
      MS   month start frequency
      Q    quarter end frequency
      BQ   business quarter end frequency
      QS   quarter start frequency
      A    year end frequency
      AS   year start frequency
      H    hourly frequency
      T    minutely frequency
      Min  minutely frequency
      S    secondly frequency
      L    milliseconds
      U    microseconds

    If it is an int, it should be the number of lines desired in each separated piece.

    If it is None, then the dataframe isn’t separated and a list containing only the full dataframe is returned.

    See the complete list at http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
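
The int and None cases of rule can be sketched on a plain list as below (string offsets require a datetime index and are not covered). The name split_rows is hypothetical, and keeping a shorter final piece is an assumption about what splitData does.

```python
def split_rows(rows, rule=None):
    """Split a sequence into consecutive pieces of `rule` rows each."""
    if rule is None:
        return [rows]  # not separated: a list holding only the full data
    return [rows[i:i + rule] for i in range(0, len(rows), rule)]

print(split_rows(list(range(7)), 3))  # [[0, 1, 2], [3, 4, 5], [6]]
```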

pymicra.algs.numeric

pymicra.algs.units

pymicra.algs.units.add(elems, units, inplace_units=False, unitdict=None, key=None)

Add elements considering their units

pymicra.algs.units.convert_cols(data, guide, units, inplace_units=False)

Converts data from one unit to the other

Parameters:
  • data (pandas.DataFrame) – to be changed from one unit to the other
  • guide (dict) – {names of columns: units to convert to}
  • units (dict) – units dictionary
  • inplace_units (bool) – if inunit is a dict, the dict is updated in place. “key” keyword must be provided
pymicra.algs.units.convert_indexes(data, guide, units, inplace_units=False)

Converts data from one unit to the other

Parameters:
  • data (pandas.Series) – to be changed from one unit to the other
  • guide (dict) – {names of columns: units to convert to}
  • units (dict) – units dictionary
  • inplace_units (bool) – if inunit is a dict, the dict is updated in place. “key” keyword must be provided
pymicra.algs.units.convert_to(data, inunit, outunit, inplace_units=False, key=None)

Converts data from one unit to the other

Parameters:
  • data (pandas.Series) – to be changed from one unit to the other
  • inunit (pint.quantity or dict) – unit(s) that the data is in
  • outunit (str) – convert to this unit
  • inplace_units (bool) – if inunit is a dict, the dict is updated in place. “key” keyword must be provided
  • key (str) – if inunit is a dict, it is the name of the variable to be changed
pymicra.algs.units.divide(elems, units, inplace_units=False, unitdict=None, key=None)

Divide elements considering their units

pymicra.algs.units.multiply(elems, units, inplace_units=False, unitdict=None, key=None)

Multiply elements considering their units

pymicra.algs.units.operate(elems, units, inplace_units=False, unitdict=None, key=None, operation='+')

Operate on elements considering their units

Parameters:
  • elems (list, tuple) – list of pandas.Series
  • units (list, tuple) – list of pint.units ordered as the elems list
  • inplace_units (bool) – whether to set the units dictionary in place
  • unitdict (dict) – dict to be set inplace
  • key (str) – name of variables to be set inplace as dict key
pymicra.algs.units.parseUnits(unitstr)

Gets unit from string, list of strings, or dict’s values, using the UnitRegistry defined in __init__.py

pymicra.algs.units.with_units(data, units)

Wrapper around toUnitsCsv to create a method to print the contents of a dataframe plus its units into a unitsCsv file.

Parameters:
  • data (pandas.DataFrame, pandas.Series) – dataframe or series to which units belong
  • units (dict) – dictionary with the names of each column and their unit

Module contents