In [485]: rng_hourly.tz_localize("US/Eastern", ambiguous="infer")
Out[485]:
DatetimeIndex(['2011-11-06 00:00:00-04:00', '2011-11-06 01:00:00-04:00',
               '2011-11-06 01:00:00-05:00', '2011-11-06 02:00:00-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq=None)

In [486]: rng_hourly.tz_localize("US/Eastern", ambiguous="NaT")
Out[486]:
DatetimeIndex(['2011-11-06 00:00:00-04:00', 'NaT', 'NaT',
               '2011-11-06 02:00:00-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq=None)

In [487]: rng_hourly.tz_localize("US/Eastern", ambiguous=[True, True, False, False])
Out[487]:
DatetimeIndex(['2011-11-06 00:00:00-04:00', '2011-11-06 01:00:00-04:00',
               '2011-11-06 01:00:00-05:00', '2011-11-06 02:00:00-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq=None)

Nonexistent times when localizing

>>> ts = pd.Timestamp(2022, 12, 9, 15)
>>> ts + pd.offsets.BusinessDay(normalize=True)
Timestamp('2022-12-12 00:00:00')

TypeError: unbound method holidays() must be called with TradingCalendar instance as first argument (got datetime instance instead)

from secondly to every 250 milliseconds

In [306]: ts[:2].resample("250L").asfreq()
Out[306]:
2012-01-01 00:00:00.000    308.0
2012-01-01 00:00:00.250      NaN
2012-01-01 00:00:00.500      NaN
2012-01-01 00:00:00.750      NaN
2012-01-01 00:00:01.000    204.0
Freq: 250L, dtype: float64

In [307]: ts[:2].resample("250L").ffill()
Out[307]:
2012-01-01 00:00:00.000    308
2012-01-01 00:00:00.250    308
2012-01-01 00:00:00.500    308
2012-01-01 00:00:00.750    308
2012-01-01 00:00:01.000    204
Freq: 250L, dtype: int64

In [308]: ts[:2].resample("250L").ffill(limit=2)
Out[308]:
2012-01-01 00:00:00.000    308.0
2012-01-01 00:00:00.250    308.0
2012-01-01 00:00:00.500    308.0
2012-01-01 00:00:00.750      NaN
2012-01-01 00:00:01.000    204.0
Freq: 250L, dtype: float64

Sparse resampling

Holiday: Dr. Martin Luther King Jr. (month=1, day=1, offset=
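The `ambiguous` options in the `tz_localize` session above can be reproduced with a small self-contained sketch. The session never shows how `rng_hourly` is built, so the construction here is an assumption: four naive hourly stamps across the US/Eastern fall-back transition on 2011-11-06, with 01:00 appearing twice.

```python
import pandas as pd

# Assumed construction of rng_hourly: naive wall-clock times on the day
# US/Eastern falls back, so 01:00 occurs twice (once in DST, once in standard time).
rng_hourly = pd.DatetimeIndex(
    [
        "2011-11-06 00:00",
        "2011-11-06 01:00",
        "2011-11-06 01:00",
        "2011-11-06 02:00",
    ]
)

# "infer" uses the order of the stamps to decide which 01:00 is DST.
inferred = rng_hourly.tz_localize("US/Eastern", ambiguous="infer")

# "NaT" gives up on the ambiguous stamps instead of guessing.
as_nat = rng_hourly.tz_localize("US/Eastern", ambiguous="NaT")

# A boolean array states explicitly which stamps are DST (True) or not (False).
explicit = rng_hourly.tz_localize(
    "US/Eastern", ambiguous=[True, True, False, False]
)

print(inferred)
print(as_nat)
print(explicit)
```

In this case the inferred and explicit results agree, and `ambiguous="NaT"` blanks out only the two 01:00 stamps.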
In [234]: bhour_mon = pd.offsets.CustomBusinessHour(start="10:00", weekmask="Tue Wed Thu Fri")

# Monday is skipped because it's a holiday, business hour starts from 10:00
In [235]: dt + bhour_mon * 2
Out[235]: Timestamp('2014-01-21 10:00:00')

Offset aliases

There are several time/date properties that one can access from Timestamp or a collection of timestamps like a DatetimeIndex.

In certain countries, such as the United States, laws such as the Uniform Monday Holiday Act of 1968 determine when public holidays are observed. Pandas includes these rules; print(USFederalHolidayCalendar.rules) lists them, and they can serve as an example for developing other calendars.

whenever the dob is greater than now. You may want to subtract a few years from now in the condition df['dob'] < now, since it may be slightly more likely to have a 101-year-old worker than a 1-year-old worker...
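The `dob` remark above is cut off at its start. One plausible reading is that two-digit years were parsed into the future and should be moved back a century whenever `dob` is later than `now`. The sketch below assumes that reading; the column names, sample values, and fixed `now` are hypothetical.

```python
import pandas as pd

# Hypothetical data: birth dates stored with two-digit years.
df = pd.DataFrame({"dob_raw": ["01/01/45", "06/15/10"]})

# %y maps 00-68 to 2000-2068, so "45" is parsed as 2045.
df["dob"] = pd.to_datetime(df["dob_raw"], format="%m/%d/%y")

# Fixed reference date so the example is reproducible (an assumption;
# real code would more likely use pd.Timestamp.now()).
now = pd.Timestamp("2024-01-01")

# Any date of birth later than `now` cannot be right: shift it back 100 years.
in_future = df["dob"] > now
df.loc[in_future, "dob"] = df.loc[in_future, "dob"] - pd.DateOffset(years=100)

print(df)
```

Following the remark's own suggestion, replacing `now` in the comparison with an earlier cutoff would also treat implausibly young workers as having been born a century earlier.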
In [260]: from pandas.tseries.holiday import (
   .....:     Holiday,
   .....:     USMemorialDay,
   .....:     AbstractHolidayCalendar,
   .....:     nearest_workday,
   .....:     MO,
   .....: )
   .....:

In [261]: class ExampleCalendar(AbstractHolidayCalendar):
   .....:     rules = [
   .....:         USMemorialDay,
   .....:         Holiday("July 4th", month=7, day=4, observance=nearest_workday),
   .....:         Holiday(
   .....:             "Columbus Day",
   .....:             month=10,
   .....:             day=1,
   .....:             offset=pd.DateOffset(weekday=MO(2)),
   .....:         ),
   .....:     ]
   .....:

In [262]: cal = ExampleCalendar()

In [263]: cal.holidays(datetime.datetime(2012, 1, 1), datetime.datetime(2012, 12, 31))
Out[263]: DatetimeIndex(['2012-05-28', '2012-07-04', '2012-10-08'], dtype='datetime64[ns]', freq=None)

In [34]: dates = [
   ....:     pd.Timestamp("2012-05-01"),
   ....:     pd.Timestamp("2012-05-02"),
   ....:     pd.Timestamp("2012-05-03"),
   ....: ]
   ....:

In [35]: ts = pd.Series(np.random.randn(3), dates)

In [36]: type(ts.index)
Out[36]: pandas.core.indexes.datetimes.DatetimeIndex

In [37]: ts.index
Out[37]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)

In [38]: ts
Out[38]:
2012-05-01    0.469112
2012-05-02   -0.282863
2012-05-03   -1.509059
dtype: float64

In [39]: periods = [pd.Period("2012-01"), pd.Period("2012-02"), pd.Period("2012-03")]

In [40]: ts = pd.Series(np.random.randn(3), periods)

In [41]: type(ts.index)
Out[41]: pandas.core.indexes.period.PeriodIndex

In [42]: ts.index
Out[42]: PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]')

In [43]: ts
Out[43]:
2012-01   -1.135632
2012-02    1.212112
2012-03   -0.173215
Freq: M, dtype: float64

However, if we want more accuracy, we must consider bank holidays (for example, if we calculate costs that depend on the exact number of days between one coupon and the next, being 1 day off in 20 is a 5% error).
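To illustrate how much a single bank holiday changes a day count, the following sketch reuses the `ExampleCalendar` rules from the session above and compares a plain weekday count with a holiday-aware one. The date range (the week of 4 July 2012) is chosen only for illustration.

```python
import pandas as pd
from pandas.tseries.holiday import (
    AbstractHolidayCalendar,
    Holiday,
    USMemorialDay,
    nearest_workday,
    MO,
)
from pandas.tseries.offsets import CustomBusinessDay


class ExampleCalendar(AbstractHolidayCalendar):
    # Same rules as the session above.
    rules = [
        USMemorialDay,
        Holiday("July 4th", month=7, day=4, observance=nearest_workday),
        Holiday(
            "Columbus Day",
            month=10,
            day=1,
            offset=pd.DateOffset(weekday=MO(2)),
        ),
    ]


# Business-day offset that skips weekends and the calendar's holidays.
bday = CustomBusinessDay(calendar=ExampleCalendar())

# Monday 2012-07-02 through Friday 2012-07-06.
plain = pd.bdate_range("2012-07-02", "2012-07-06")  # weekdays only
adjusted = pd.date_range("2012-07-02", "2012-07-06", freq=bday)  # minus holidays

print(len(plain), len(adjusted))
```

Within that one week, ignoring the holiday overstates the number of working days by one in five.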
Knowing the holidays within a period is especially useful when estimating human habits and behaviors (medical care, travel, etc.). In the insurance sector, these patterns can directly affect accounting reserves; for example, when calculating costs incurred but not reported (IBNR). In fact, in some insurance companies it's common to slightly increase the claims ratio in leap years, because they have one more calendar day than other years.

Date times: A specific date and time with timezone support. Similar to datetime.datetime from the standard library.

Return a fixed frequency DatetimeIndex with business day as the default.
Parameters:
start : str or datetime-like, default None

is similar to a Timedelta that represents a duration of time but follows specific calendar duration rules.

- passing `format` if your strings have a consistent format;
- passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
- passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
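The three `format` options listed above can be tried side by side. This is a minimal sketch with made-up input strings; `format='ISO8601'` and `format='mixed'` need a pandas version that supports them (2.0 or later, as far as I know).

```python
import pandas as pd

# 1. Every string follows the same explicit format.
consistent = pd.to_datetime(["2012-05-01", "2012-05-02"], format="%Y-%m-%d")

# 2. All strings are ISO 8601, but not all at the same resolution.
iso = pd.to_datetime(["2012-05-01", "2012-05-02T10:30:00"], format="ISO8601")

# 3. Formats differ per element; dayfirst=True reads "01/05/2012" as 1 May.
mixed = pd.to_datetime(
    ["01/05/2012", "May 2, 2012"], format="mixed", dayfirst=True
)

print(consistent)
print(iso)
print(mixed)
```

Inferring the format per element is the most permissive option, so an explicit `format` is usually the safer choice when the input is known to be uniform.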
Finally, there is the option of writing rules for calculating public holidays (as these are usually carried over to the next working day).

In [330]: small = pd.Series(
   .....:     range(6),
   .....:     index=pd.to_datetime(
   .....:         [
   .....:             "2017-01-01T00:00:00",
   .....:             "2017-01-01T00:30:00",
   .....:             "2017-01-01T00:31:00",
   .....:             "2017-01-01T01:00:00",
   .....:             "2017-01-01T03:00:00",
   .....:             "2017-01-01T03:05:00",
   .....:         ]
   .....:     ),
   .....: )
   .....:

In [331]: resampled = small.resample("H")

In [332]: for name, group in resampled:
   .....:     print("Group: ", name)
   .....:     print("-" * 27)
   .....:     print(group, end="\n\n")
   .....:
Group:  2017-01-01 00:00:00
---------------------------
2017-01-01 00:00:00    0
2017-01-01 00:30:00    1
2017-01-01 00:31:00    2
dtype: int64

In [82]: pd.date_range(start, periods=1000, freq="M")
Out[82]:
DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-30',
               '2011-05-31', '2011-06-30', '2011-07-31', '2011-08-31',
               '2011-09-30', '2011-10-31',
               ...
               '2093-07-31', '2093-08-31', '2093-09-30', '2093-10-31',
               '2093-11-30', '2093-12-31', '2094-01-31', '2094-02-28',
               '2094-03-31', '2094-04-30'],
              dtype='datetime64[ns]', length=1000, freq='M')

In [83]: pd.bdate_range(start, periods=250, freq="BQS")
Out[83]:
DatetimeIndex(['2011-01-03', '2011-04-01', '2011-07-01', '2011-10-03',
               '2012-01-02', '2012-04-02', '2012-07-02', '2012-10-01',
               '2013-01-01', '2013-04-01',
               ...
               '2071-01-01', '2071-04-01', '2071-07-01', '2071-10-01',
               '2072-01-01', '2072-04-01', '2072-07-01', '2072-10-03',
               '2073-01-02', '2073-04-03'],
              dtype='datetime64[ns]', length=250, freq='BQS-JAN')

will increment datetimes to the same time the next day whether a day represents 23, 24 or 25 hours due to daylight

Holiday: New Years Day (month=1, day=1, observance=
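The idea of carrying a public holiday over to a working day, mentioned at the start of this passage, can be sketched with the `observance` argument of `Holiday`. The two calendar classes below are hypothetical. They compare `nearest_workday` and `next_monday` for 4 July 2015, which fell on a Saturday.

```python
import pandas as pd
from pandas.tseries.holiday import (
    AbstractHolidayCalendar,
    Holiday,
    nearest_workday,
    next_monday,
)


class NearestWorkdayCalendar(AbstractHolidayCalendar):
    # Saturday holidays move to Friday, Sunday holidays to Monday.
    rules = [Holiday("July 4th", month=7, day=4, observance=nearest_workday)]


class NextMondayCalendar(AbstractHolidayCalendar):
    # Weekend holidays always move to the following Monday.
    rules = [Holiday("July 4th", month=7, day=4, observance=next_monday)]


start, end = pd.Timestamp("2015-01-01"), pd.Timestamp("2015-12-31")

nearest = NearestWorkdayCalendar().holidays(start, end)
following = NextMondayCalendar().holidays(start, end)

print(nearest)    # observed on Friday 2015-07-03
print(following)  # observed on Monday 2015-07-06
```

Which rule applies depends on the jurisdiction, so it is worth checking a calendar's observed dates against an official list before relying on it.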
command refers to 20 days INCLUDING weekends, but I want it to refer to 20 WEEKDAYS; e.g. something like this:

df["window"].loc[beg: beg + pd.to_timedelta(20, "Weekdays_only")] = 2

In [218]: bh = pd.offsets.BusinessHour(start="17:00", end="09:00")

In [219]: bh
Out[219]:
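For the 20-weekday window asked about above, one possible approach is a business-day offset instead of a timedelta, since `pd.to_timedelta` only measures fixed durations and cannot skip weekends. The frame, its daily index, and the value of `beg` below are made up for illustration.

```python
import pandas as pd

# Hypothetical daily frame; 2024-01-01 is a Monday.
idx = pd.date_range("2024-01-01", periods=60, freq="D")
df = pd.DataFrame({"window": 0}, index=idx)

beg = pd.Timestamp("2024-01-01")

# Advance 20 business days (Mon-Fri) rather than 20 calendar days.
end = beg + pd.offsets.BDay(20)

# Label-based slicing on a DatetimeIndex includes both endpoints.
df.loc[beg:end, "window"] = 2

print(end)
print(int((df["window"] == 2).sum()))
```

`BDay` skips only Saturdays and Sundays. If public holidays should be skipped as well, `pd.offsets.CustomBusinessDay` with a holiday calendar could replace it.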