
Coffee boss day 10 – Neat board and RFID

Been a while! Coffee Boss has had a great upgrade recently: the current sensor. This, combined with new firmware that counts the time since the current last went low (signifying that a boil cycle has completed and a new pot is ready), means that this is suddenly actually useful.
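The "time since the current last went low" idea comes down to spotting the moment the reading falls back below a threshold. Here's a minimal Python sketch of that logic. It illustrates the idea and is not the actual Arduino firmware. The 40,000 threshold is borrowed from the day 9 readings, and the function name and sample data are hypothetical.

```python
from datetime import datetime, timedelta

# Threshold borrowed from the day 9 readings: above this, the heater is on
HEATER_THRESHOLD = 40_000


def minutes_since_last_boil(samples, now):
    """Minutes since the current last dropped below the heater threshold.

    samples: (timestamp, reading) tuples in time order.
    Returns None if no on -> off transition has been seen yet.
    """
    last_off = None
    previously_on = False
    for ts, reading in samples:
        on = reading > HEATER_THRESHOLD
        if previously_on and not on:
            # Falling edge: the boil cycle just finished, so a fresh pot is ready
            last_off = ts
        previously_on = on
    if last_off is None:
        return None
    return (now - last_off).total_seconds() / 60


# Hypothetical usage: heater on at 09:01, off at 09:06, checked at 09:16
t0 = datetime(2019, 10, 1, 9, 0, 0)
samples = [(t0, 2500), (t0 + timedelta(minutes=1), 52000), (t0 + timedelta(minutes=6), 2500)]
print(minutes_since_last_boil(samples, t0 + timedelta(minutes=16)))
```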

Pictured here are the various circuits combined on one piece of board. The back is a bit of a rat's nest, but this should all fit in a project box reasonably neatly. I've made up a power cable from twisted mains cable, where the individual conductors can be separated out without having to cut any insulation, but I need to get that signed off by the estates people. I would still prefer a sealed moulded plug on the part that plugs into the coffee machine itself.

The reason I put the current sensor on this part (near the machine, where the water is) is that the other option is down on the floor, and the wire going to the sensor isn't very long. That's still a decent option for a more permanent installation.

Next to it is a new RFID reader, a board made by Elechouse based on a PN532 chip. This is a neat little beast but I’ve had a right battle getting it working. Elechouse have some libraries for Arduino but there are some improved ones here:

This module has a serial, an SPI and an I2C interface, with a little set of DIP switches to control which one to use. I tried all kinds of ways and could. not. get. it. working. on the ESP32, only to have it work first time with a regular ATmega-based Arduino board. Something about the ESP32 perhaps? My circuit? Likely.

To cut a long story short, I have to do a full power cycle of the ESP32 to get the PN532 to initialise properly. Reset won't do it. I presume the reset button only resets the microcontroller rather than actually interrupting the power, and that makes a difference.

Only just got this working this evening, and made up some neat cables. I’ll take it for a test drive tomorrow.

What’s the RFID reader for?

Oh right, yes: it's so that coffee drinkers can bump their work ID when they have a cup of coffee, and the machine will repeat back how many cups they've had recently, so they can think about how much they owe. It won't ever know who they are, but the cards that our IDs are printed onto have unique identifiers, so it can track you across sessions.

The Connectors Saga

  1. I decided to use Molex Mini-fit jr sockets on this because I had a few sets left over from Polargraph drawing machines. I never actually used them on the Polargraph, and they've been burning a hole in my pocket ever since because they are so adorable.
  2. I dismantled the Coffee Boss machine and stripped down the cable ends… And realised that the metal crimp contacts I had to go into the Mini-fit jr plugs were the wrong kind! I’ve got a reel of contacts for “Microlock Plus” system. As soon as I noticed, I remembered kicking myself for making that mistake the first time around. At least one reason why I never used these plugs and sockets.
  3. I ordered some crimp contacts for Mini-fit jr, along with a couple of sets of backshells for the plugs, so they look super pro. Expensive! But worth it for the pro-ness.
  4. When they arrived after the weekend, I crimped a couple onto the current sensor… And realised they wouldn't fit into the plugs I had. I hadn't wanted to test them in the plug before crimping, in case they got stuck and I couldn't get them out.
  5. I realised that my plugs and sockets weren’t Mini-fit jr at all… They were Micro-fit 3.0.
  6. Always had been Micro-fit 3.0.
  7. And a cursory glance at the bags they were stored in confirmed that.
  8. So I ordered some crimp contacts for Micro-fit 3.0. Got them, fitted them tonight, and now we’re cooking.
  9. I wish there were some backshells to be had for the Micro-fit system; they are super pro.

Coffee boss day 9 – A simpler current arrangement and combining sensors

So I made myself a cable like this, with a section of the outer insulation cut away so that I could separate the insulated conductors inside and put the clamp around just one of them.

I tested it with the plain, unmodified coffee machine and found that it was quite easy to tell when the heater was running and when the PTC hotplates were on. Because of that, I'm not going to bother with any interventions inside the machine at all.

I used emonlib (https://github.com/openenergymonitor/EmonLib) and the plain current_only.ino example. Worked great, and I noticed a few interesting things. The clamp can be calibrated to give actual representative current values but I don’t care about that, I just need any old numbers so I went with the standard calibration numbers that were in the code.

  • Bottom PTC – peaked at 20,000 for less than a second when it turns on, then settles down to 2,500.
  • Top PTC – peaked at 19,000 and settled at 4,000.
  • Heater – immediately goes to about 52,000

So I think I can simply set a threshold around 40,000 and expect that breaking this threshold means that the heater is on. There’s no ambiguity about the hotplates and the heater. I’m going to make a cable with separate conductors that won’t terrify the electricians at work, and just use that.
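Here's a minimal sketch of that threshold, written in Python for illustration. The 40,000 cut-off comes from the readings above. The 1,000 "idle" floor and the classify() name are hypothetical choices of mine. The PTC inrush spikes top out around 20,000, so they never cross the heater threshold and don't need any extra debouncing.

```python
HEATER_THRESHOLD = 40_000   # from the readings above: heater ~52,000, PTC peaks <= 20,000
IDLE_FLOOR = 1_000          # hypothetical: below the ~2,500 bottom-PTC level


def classify(reading):
    """Map a raw emonlib-style reading to a rough machine state."""
    if reading > HEATER_THRESHOLD:
        return "heater"
    if reading > IDLE_FLOOR:
        return "hotplate"
    return "idle"


# Bottom PTC inrush, settled PTC, heater, top PTC inrush, settled PTC, off
readings = [20000, 2500, 52000, 19000, 4000, 0]
print([classify(r) for r in readings])
```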

I’ve just committed https://github.com/euphy/coffee_boss/blob/master/coffee_boss/coffee_boss.ino which is the code to read all these things:

  • DS3231 realtime clock on the I2C bus (SDA is pin 21, SCL is pin 22)
  • load cells with the HX711 ADC on pins 4 and 15
  • VCNL4010 proximity sensor on the I2C bus
  • current sensor on pin 14

And the breadboard looks like this:

Breadboard with sensors on it

Next up is to make a little mount for the proximity sensor that holds it near the carafe.

Coffee boss day 8 – How to sense the current

Doing a bit more digging about the current clamp tells me something that was obvious in retrospect: the clamp needs to be around a live wire on its own. Clamping it over the multi-core mains cable won't work, because the magnetic fields from the current flowing out through the live wire and back through the neutral wire cancel each other out.
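Here's a quick numerical illustration of the cancellation. It assumes an ideal clamp that responds only to the net current through its core, a 50 Hz supply, and the ~9A heater current worked out on day 7. The live and neutral currents are equal and opposite, so their sum is zero at every instant.

```python
import numpy as np

# One 50 Hz mains cycle, sampled 1000 times
t = np.linspace(0, 0.02, 1000, endpoint=False)
i_live = 9 * np.sqrt(2) * np.sin(2 * np.pi * 50 * t)  # 9 A RMS heater current
i_neutral = -i_live                                    # return current, opposite direction


def rms(x):
    """Root-mean-square of a sampled waveform."""
    return np.sqrt(np.mean(x ** 2))


# An ideal CT clamp sees the *net* current passing through its core
single_wire = rms(i_live)
whole_cable = rms(i_live + i_neutral)
print(f"clamp on live only:   {single_wire:.2f} A")
print(f"clamp on whole cable: {whole_cable:.2f} A")
```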

That's why the clamp last night (and today, with the proper-value components around it) shows no change whether the heater is on or not. OK. There are some high-end sensors that can sense current in multi-conductor cables, but the only accessible one I can see is this Modern Device one (https://moderndevice.com/product/current-sensor/). It looks like it'll work great and isn't expensive ($14), but it's not a nice tidy clamp.

Options:

  1. Build a device that can be placed inline with the power cable and gives the sensor access to only one of the wires in the cable. This could just be a customised power cable: quite easy to make, but liable to raise an eyebrow from the estates people at work. It also means having to tell the difference between the heater running and the hotplates being on.
  2. Mount the current clamp inside the coffee machine, over the wire that goes to the heater. This is non-invasive-ish, but I’m a little nervous about proximity to the heater and also having trailing wires hanging out of the coffee machine. This would give a nice unambiguous and accurate signal of when the heater is running though.
  3. Tap or sense the wire that the float switch is on. The float switch is in the water reservoir and is what triggers the heater to run. This may well be low voltage, and DC so I think a current sense clamp won’t make anything of it. The cable is terminated with plain plugs and sockets that look fairly standard so I could just make a special cable that has a sensor on it to pull a pin high or low when it’s closed or opened.
    I still don't like the idea of having cables trailing out of the coffee machine. Perhaps I can make something discreet in the base: there is plenty of space, and there's a route for a cable out of the bottom (the cable for the hotplate uses it). Maybe even some pogo pins on the bottom that would mate with a little board mounted on the scales that the whole machine sits on.

I think option 3 is my favourite. It's likely to avoid working with high voltage or current, and that cable assembly looks easy to do something with in a reversible way. I need to see how to use a pin to sense whether that switch is open or closed. I feel like that should be easy.

Some machine parts

Docs for parts: https://doczz.pl/doc/430837/bravilor-bonamat. Here's an excerpt that mentions the parts for the Novo:

  • PCB is 402650 or 6.101.153.000 – keypad PCB L 82mm W 65mm countries GB/IRL buttons: 2
  • Float switch is 347388 or 6.101.071.000 – magnetic switch NO connection plug-in connection cable length 160mm L 45mm mounting ø 6mm – what’s that connector on the end?
  • Heater is 417420 or 6.101.061.000 – flow heater 2000W 230V ø 58mm H 120mm connection male faston 6.3mm

I can’t find any description of the connector on the float switch. It looks like a KK-396 housing https://uk.rs-online.com/web/p/pcb-connector-housings/6795066/ except measuring the pins looks like the pitch is 3.6 or 3.7mm rather than 3.96mm.

Coffee boss day 7 – Moar sensorz

Those last graphs were nice but I don’t think they tell me enough. The graphs and the analyses make intuitive sense to the eye but they’re still disconnected enough from concrete reality to make it hard to put the numbers into context. I’ve run up against the limit for what my simple-minded analytics can do for me. I could learn more analytic skills, you know like nunchuck skills, bowhunting skills, computer hacking skills. But I won’t, I’m just going to brute force it with more sensors.

I’ve got an AC current sensor (SCT-013-000 Non-invasive AC Current Sensor Clamp Sensor 100A High QUALITY!!!) and an Adafruit VCNL4010 proximity sensor.

Inside the coffee machine

The coffee machine has a float switch in the water reservoir that is triggered as long as there's water left in it. I don't really know how the heater/pump works, or even whether there is a pump there or it's some kind of expansion thing powered by the heater on its own. I looked inside:

I'm not sure what to make of that. It doesn't look like a mechanical pump, but water goes in at the bottom (which is discoloured from heat) and comes out of the top tube going up to the sprinkler head.

There are two wires running from the controller mechanism on the left to the heater capsule. They look the same as the ones leading from the main 240V input connector (bottom left) and share the same connector, so I'm going to surmise that this is a 240V AC heater.

Searching more, I find this: https://www.gastroparts.com/en/part-113792 which is a “flow heater”, 2160W, 240V. Power divided by voltage gives current so 2160W / 240V = 9A.

The current sensing clamp

  • https://learn.openenergymonitor.org/electricity-monitoring/ct-sensors/interface-with-arduino
  • https://olimex.wordpress.com/2015/09/29/energy-monitoring-with-arduino-and-current-clamp-sensor/

Ideally I’ll put the current clamp on those two wires so it only senses when the heater is running. I’d prefer not to have something mounted inside the machine since I’ll get the blame if the office burns down.

So I’ll test this system with the clamp fitted on the main power cable instead, and see how hard it is to recognise the “heater running” signal from the “hotplate” on signal. If it’s obvious, then that’s ideal. If it is indistinguishable then I’ll try it inside.

Looking at the link above, I should do some sums to figure out how to wire up the current sensor. I don’t really understand this so will be Just Doing As I’m Told.

Primary peak-current = RMS current × √2 = 13A × 1.414 = 18.38A

I picked 13A because I wanted to leave a little headroom over the current that the heater would draw, in case the hotplate is on too. I can't find any spares for that bit yet, so I don't know what that'll draw.

Secondary peak-current = Primary peak-current / no. of turns = 18.38A / 2000 = 0.009191A

Ideal burden resistance = (AREF / 2) / Secondary peak-current = 1.65V / 0.009191A = 179.52Ω (AREF is 3.3V here, so AREF / 2 = 1.65V)

I need a 180ish ohm burden resistor and a 10uF capacitor. I’ll get those from the workshop tomorrow.
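Here are the same sums as a small Python sketch, which is handy for trying other headroom values. The 3.3V AREF is an assumption on my part, because that's what makes AREF/2 equal 1.65V. Using math.sqrt(2) instead of 1.414 gives about 179.5Ω rather than 179.52Ω, which makes no practical difference when picking a resistor.

```python
import math

heater_power_w = 2160   # flow heater rating from the gastroparts listing
mains_voltage_v = 240
rms_current_a = 13      # chosen with headroom above the heater's draw
ct_turns = 2000         # SCT-013-000 secondary turns
aref_v = 3.3            # assumed ADC reference voltage

heater_current_a = heater_power_w / mains_voltage_v     # I = P / V
primary_peak_a = rms_current_a * math.sqrt(2)           # RMS -> peak
secondary_peak_a = primary_peak_a / ct_turns            # CT turns ratio
burden_ohms = (aref_v / 2) / secondary_peak_a           # peak swing fills half of AREF

print(f"heater current:  {heater_current_a:.1f} A")
print(f"primary peak:    {primary_peak_a:.2f} A")
print(f"secondary peak:  {secondary_peak_a:.6f} A")
print(f"burden resistor: {burden_ohms:.1f} ohms")
```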

I wired this up and found that there was a constant stream of numbers coming out, between 200 and 400, but turning the load on or off didn’t make any difference. In fact it’s the same whether it’s clamped over a cable or not, so I think this is just electrical noise. Boo.

Update! With the cap and burden resistor… exact same response as last night, i.e. a fairly stable number coming out of this sensor, but no relation to whether the device is on or not. So… I was testing it with the clamp on the mains lead of a heater. It needs to be on just one wire, because the currents in the two wires cancel each other out. I'm going to have to build an intercept box that mounts externally on the power cable, or mount the clamp inside the coffee machine, on one of the cables near the heater. OK, that's some more useful information. I wonder if there's a way to get the signal out of the machine without trailing wires out of it? I don't want it to fail its PAT test and inspection.

VCNL4010 Proximity sensor

This is a short-range IR-based proximity sensor chip mounted on a neat little breakout board. It uses i2c but there’s also an adafruit library for it so I don’t need to cry about the bus.

I’m planning to mount this on a little plate directly on the coffee machine behind the collar of the carafe so that it can be used to sense the presence or absence of the coffee jug.

Adafruit libraries

Adafruit have info about this module here https://learn.adafruit.com/using-vcnl4010-proximity-sensor/overview and docs for the Arduino library here: https://adafruit.github.io/Adafruit_VCNL4010/class_adafruit___v_c_n_l4010.html. I tried the standard vcnl4010test.ino example that came with the Adafruit_VCNL4010 library and found that the maximum measurement distance was about 45mm, which is _just_ too short for my application. It is sensitive up close (5-20mm) but tails off the further away you get.

I noticed from the docs, though, that I could turn up the current to the IR LED in the proximity sensor (it works by measuring how much IR light is reflected back), so I turned that up to maximum (200mA). Made no difference.

I could still mount this sensor on a little tower so it is closer to the carafe ring. It would be fairly well protected from bumps, but I did want to keep this quite compact and unobtrusive. Is one of those ultrasonic detectors the answer? I think I have one in that kit that Euan gave me that time. I'll check tomorrow.

Coffee Boss day 6.5: Combining a scatter with a line

So based on what I spotted in the source code of matplotlib’s _axes.py (https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes/_axes.py#L1469-L1493):

    def plot(self, *args, scalex=True, scaley=True, data=None, **kwargs):
        """
        Plot y versus x as lines and/or markers.

        Call signatures::

            plot([x], y, [fmt], *, data=None, **kwargs)
            plot([x], y, [fmt], [x2], y2, [fmt2], ..., **kwargs)

        The coordinates of the points or line nodes are given by *x*, *y*.

        The optional parameter *fmt* is a convenient way for defining basic
        formatting like color, marker and linestyle. It's a shortcut string
        notation described in the *Notes* section below.

        >>> plot(x, y)        # plot x and y using default line style and color
        >>> plot(x, y, 'bo')  # plot x and y using blue circle markers
        >>> plot(y)           # plot y using x as index array 0..N-1
        >>> plot(y, 'r+')     # ditto, but with red plusses

I saw that I could use my ax to plot the positions of the highlights as markers in two dimensions, on top of the line. I got this:

Which is pretty much perfectly what I want right now. I did some fairly dirty mucking around with the data to get it to do this, essentially looking for where the row-to-row weight difference crosses a threshold from low-to-high.

# median filter with a rolling window: low pass filter
df['rolling4'] = df['weight'].rolling(4).median()

# difference between each sample and the one 8 samples later
# (a negative period), so it is positive when the weight drops
df['diff'] = df['rolling4'].diff(periods=-8)

# Tag with True where the weight drops by more than 300g
threshold = 300.0
df['thresholded'] = (df['diff'] > threshold)

# Produce 'highlight' boolean where the threshold is True, AND
# the threshold for the previous row was False. This feels pretty clunky.
df['highlight'] = (df['thresholded'] == True) & (df['thresholded'].shift(1) == False)

# Now create a new dataframe with just the highlights in, and only the interesting columns
highlights = df[df['highlight']][['datetime', 'rolling4']]

That’s good isn’t it?
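For reference, here is the whole thing as a self-contained sketch on a made-up day, so it can be run without the log files. The made-up day has three plateaus, with two 350g cups taken. There are two changes from the code above, and both are my own choices. First, the rising edge is found with ~ and shift(1, fill_value=False) instead of == True / == False. Second, the line and the highlight markers are drawn with two ax.plot() calls, and the second call uses the 'ro' fmt string from the docstring. The Agg backend line only makes it run off-screen.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical day: full pot, then two cups of ~350 g poured
weights = [2500.0] * 20 + [2150.0] * 20 + [1800.0] * 20
df = pd.DataFrame({
    "datetime": pd.date_range("2019-09-23 09:00", periods=len(weights), freq="2s"),
    "weight": weights,
})

df["rolling4"] = df["weight"].rolling(4).median()
# Positive when the weight 8 samples later is lower, i.e. something was removed
df["diff"] = df["rolling4"].diff(periods=-8)
df["thresholded"] = df["diff"] > 300.0
# Rising edge: True on this row but not on the previous one
df["highlight"] = df["thresholded"] & ~df["thresholded"].shift(1, fill_value=False)
highlights = df.loc[df["highlight"], ["datetime", "rolling4"]]

fig, ax = plt.subplots()
ax.plot(df["datetime"], df["rolling4"])                        # weight trace as a line
ax.plot(highlights["datetime"], highlights["rolling4"], "ro")  # events as red circles
```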

Coffee Boss day 6: Horizontal line

I’ve been trying to get a horizontal line to show a threshold. It never worked. It gave me a mean-spirited error message that I couldn’t understand. I spent the last few days trying. I got this one:

Traceback (most recent call last):
  File "C:/Users/sandy_000/PycharmProjects/coffee_boss/viz/viz.py", line 65, in <module>
    df.plot(y=['diff'], ax=ax3)
  File "C:\Users\sandy_000\venv\coffee_boss\lib\site-packages\pandas\plotting\_core.py", line 794, in __call__
    return plot_backend.plot(data, kind=kind, **kwargs)
  File "C:\Users\sandy_000\venv\coffee_boss\lib\site-packages\pandas\plotting\_matplotlib\__init__.py", line 62, in plot
    plot_obj.generate()
  File "C:\Users\sandy_000\venv\coffee_boss\lib\site-packages\pandas\plotting\_matplotlib\core.py", line 284, in generate
    self._adorn_subplots()
  File "C:\Users\sandy_000\venv\coffee_boss\lib\site-packages\pandas\plotting\_matplotlib\core.py", line 472, in _adorn_subplots
    sharey=self.sharey,
  File "C:\Users\sandy_000\venv\coffee_boss\lib\site-packages\pandas\plotting\_matplotlib\tools.py", line 316, in _handle_shared_axes
    _remove_labels_from_axis(ax.xaxis)
  File "C:\Users\sandy_000\venv\coffee_boss\lib\site-packages\pandas\plotting\_matplotlib\tools.py", line 281, in _remove_labels_from_axis
    for t in axis.get_majorticklabels():
  File "C:\Users\sandy_000\venv\coffee_boss\lib\site-packages\matplotlib\axis.py", line 1252, in get_majorticklabels
    ticks = self.get_major_ticks()
  File "C:\Users\sandy_000\venv\coffee_boss\lib\site-packages\matplotlib\axis.py", line 1407, in get_major_ticks
    numticks = len(self.get_majorticklocs())
  File "C:\Users\sandy_000\venv\coffee_boss\lib\site-packages\matplotlib\axis.py", line 1324, in get_majorticklocs
    return self.major.locator()
  File "C:\Users\sandy_000\venv\coffee_boss\lib\site-packages\matplotlib\dates.py", line 1431, in __call__
    self.refresh()
  File "C:\Users\sandy_000\venv\coffee_boss\lib\site-packages\matplotlib\dates.py", line 1451, in refresh
    dmin, dmax = self.viewlim_to_dt()
  File "C:\Users\sandy_000\venv\coffee_boss\lib\site-packages\matplotlib\dates.py", line 1202, in viewlim_to_dt
    .format(vmin))
ValueError: view limit minimum 0.0 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units

So what I did was change

ax3.axhline(threshold, linewidth=1, color='r')

df.plot(y=[small_window['name'], 'thresholded'], secondary_y=['thresholded'], ax=ax1)
df.plot(y=[small_window['name']], ax=ax2)
df.plot(y=['diff'], ax=ax3)

To

df.plot(y=[small_window['name'], 'thresholded'], secondary_y=['thresholded'], ax=ax1)
df.plot(y=[small_window['name']], ax=ax2)
df.plot(y=['diff'], ax=ax3)

ax3.axhline(threshold, linewidth=1, color='r')

Yes. Same, but the hline happens after the plot. OK, I can make the intuitive leap for why this works and not be cross about it, but I wish I’d tried this a week ago.
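Here's a minimal, self-contained sketch of the order that works, using a hypothetical, steadily rising diff series on a datetime index. My unconfirmed guess is that the reference line has to wait until pandas has configured the date axis, which is why axhline() goes last.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

threshold = 300.0
# Hypothetical diff values on a 2-second datetime index
idx = pd.date_range("2019-09-23", periods=100, freq="2s")
df = pd.DataFrame({"diff": np.linspace(0, 600, 100)}, index=idx)

fig, (ax1, ax3) = plt.subplots(nrows=2, sharex="all")
df.plot(y=["diff"], ax=ax1)
df.plot(y=["diff"], ax=ax3)
# Add the reference line only after pandas has drawn onto the date axes
ax3.axhline(threshold, linewidth=1, color="r")
```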

Coffee Boss day 5: Further adventures

I’ve been doing a bit more work which is about:

Analysis

Finding an algorithm or treatment (I don’t know what the right word is… an analysis?) that will isolate significant changes to the weight of the machine,

  1. applying a high-cut filter using a median filter in a rolling window,
  2. then producing a diff to create something normalised,
  3. then thresholding that to produce some binary output

import pandas as pd
from pandas import read_csv

column_names = ['datestamp', 'date', 'time', 'weight']
days = ['20190911', '20190912', '20190913', '20190914']
# DataFrame.append() was removed in pandas 2.0; read every day, then concatenate once
df = pd.concat([read_csv(f'../output/datr{day}.csv', names=column_names) for day in days],
               ignore_index=True)

# parse_dates=True only parses the index, so convert the datestamp column explicitly.
# 'datetime' stays a column because the plots below use x='datetime'.
df['datetime'] = pd.to_datetime(df['datestamp'])
del df['datestamp']
del df['time']
del df['date']

small_window = {'name': 'rolling4', 'size': 4}
df[small_window['name']] = df['weight'].rolling(small_window['size']).median()

df['diff'] = df[small_window['name']].diff(periods=8)

threshold = 600.0
df['thresholded'] = (df['diff'] > threshold) * 1

Displaying it to check the analysis

Find how to display the charts in a way that lets me see what I’m actually doing. This has evolved into what’s below.

import matplotlib.pyplot as plt

fig1, (ax1, ax2, ax3) = plt.subplots(nrows=3, ncols=1, sharex='all', figsize=(8, 8))

# Same grid styling on all three axes. True is passed positionally because the
# keyword was renamed from 'b' to 'visible' in Matplotlib 3.5.
for ax in (ax1, ax2, ax3):
    ax.grid(True, which='major', color='#666666', linestyle='-')
    ax.minorticks_on()
    ax.grid(True, which='minor', color='#999999', linestyle='-', alpha=0.2)

df.plot(x='datetime', y=[small_window['name'], 'thresholded'], secondary_y=['thresholded'], ax=ax1)
df.plot(x='datetime', y=[small_window['name']], ax=ax2)
df.plot(x='datetime', y=['diff'], ax=ax3)

plt.tight_layout()
plt.show()

Coffee Boss day 4: What am I actually trying to do

Matplotlib and pandas have a couple of fundamental principles that I'm not getting. There seems to be an odd mix of global and specific commands that go into producing a graph, and I'm not seeing the link.

Naturally this is causing me to bump into some awkward questions, the main one being “what am I actually trying to do?”. I thought this was simple, but it’s not quite. I sketched the following manually as capturing what I’d like:

This chart shows the features I think I need to gather:

  1. Weight of each cup of coffee. The height of the grey boxes show this. This can be recognised by seeing a rapid drop in weight where the size of the drop is greater than can be explained by evaporation. I want to know this so that I can see the variance between the biggest and the smallest cups. Everyone pays the same.
  2. Freshness of the pot of coffee (time since last pot). The first vertical line shows the start of a new pot. I can intuitively recognise this point as being where there is a sudden increase in weight of about 2kg. This is obvious in some cases (like the end of the figure below) where the weight is low and rapidly increases.
    It is less obvious in the refill at the beginning of the figure below, where the weight beforehand was high too, so there isn't that clear jump from very low to very high. I assume in this case there was already a spare pot of water on top of the machine waiting to be used, and so the weight drop (that is visible) only lasts the time between picking the pot up and pouring it into the machine.
  3. Number of cups in each pot. This is a simple count of the number of events recognised in 1. I can’t see any way to determine if the last small drop before a refill is a cupful (ie someone’s taking it) or if it’s just waste. A combination of age of coffee and size of cup may form a heuristic for that but I don’t know how to gather the data from the scales alone. I might add a button on the touchscreen for “discarded waste/reset pot”.

The width (or length) of the grey boxes is interesting (indicating the time between cups), but I’ve got no direct need for that data yet.

Make it time-series

Right now, the data is arranged in time sequence and has a fixed sampling frequency, so it is a complete time-series. However, pandas doesn't know that yet: the labels for time are just strings. I'll make it into a true time-series, because pandas has a bunch of specific tools for working with time-series data (including resampling, and specifying the size of windows in seconds rather than samples), AND I want to be able to combine multiple days into one stream of data.

Remember that the first tutorials assumed I was converting to datetimes during import, using parse_dates=True in read_csv(...). That never worked for me, and I got errors I didn't understand. I used df.info() to check whether the conversion had worked properly. The code now looks like:

import pandas as pd
from pandas import read_csv

column_names = ['datestamp', 'date', 'time', 'weight']
df = read_csv('../output/datr20190923.csv', names=column_names, parse_dates=True, infer_datetime_format=True)
df['datetime'] = pd.to_datetime(df['datestamp'])
df.index = df['datetime']
del df['datestamp']
del df['time']
del df['date']
df.info()  # info() prints its own report and returns None, so no print() needed

and gives me:

[41141 rows x 5 columns]

RangeIndex: 41141 entries, 0 to 41140
Data columns (total 5 columns):
weight        41141 non-null float64
datetime      41141 non-null datetime64[ns]
rolling4      41138 non-null float64
rolling36     41106 non-null float64
pct_change    41105 non-null float64
dtypes: datetime64[ns](1), float64(4)
memory usage: 1.6 MB

There's a datetime64 column in there, which is good! I wonder why it didn't work last week? Furthermore,

data = pd.Series(df['pct_change'])
print(data)

Now gives me a time-indexed series:

datetime
2019-09-23 00:00:01         NaN
2019-09-23 00:00:03         NaN
2019-09-23 00:00:05         NaN
2019-09-23 00:00:07         NaN
2019-09-23 00:00:09         NaN
                          …
2019-09-23 23:59:51    0.000046
2019-09-23 23:59:53    0.000000
2019-09-23 23:59:55    0.000043
2019-09-23 23:59:57   -0.000043
2019-09-23 23:59:59    0.000000
Name: pct_change, Length: 41141, dtype: float64
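With a DatetimeIndex in place, windows and resampling can be written in time units instead of sample counts. Here's a sketch on ten hypothetical readings that include one spike. The '8s' window and the '10s' bins are arbitrary choices. A rolling() window specified as a time offset defaults to min_periods=1, so the result has no leading NaNs.

```python
import pandas as pd

# Hypothetical raw readings every 2 seconds, with one noisy spike
idx = pd.date_range("2019-09-23 00:00:01", periods=10, freq="2s")
weight = pd.Series([2200.0, 2201, 2199, 9000, 2200, 2202, 2198, 2200, 2201, 2199], index=idx)

# Window defined in time rather than samples: everything in the last 8 seconds
smoothed = weight.rolling("8s").median()

# Downsample to one median value per 10-second bin
per_10s = weight.resample("10s").median()

print(smoothed)
print(per_10s)
```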

Coffee Boss day 3: Looking for events

I don't know how to do this bit. It's not that I don't know it technically; I mean I have no awareness of the nature of the tools and practice for looking for events in a data stream, categorising them, and presenting them.

My opening gambit is:

  1. Look through each weight sample, comparing it to the last (or the last few).
  2. If the current value is higher or lower (over a certain threshold) than it was, then:
  3. Record this as a significant event by putting it into another list with the same timestamp (events)
  4. Combine the events stream with the main data frame
  5. Present the raw weights data in a graph, and:
  6. Show the events overlaid

I can iterate through each row just using iterators and python loops, but that feels like a pandas anti-pattern. From reading around (how do I even describe this problem for google?), it seems like it’s best to do things in pandas en masse rather than by examining each record individually. I think that’s what pandas does.

df['pct_change'] = df[large_window['name']].pct_change()
df.plot(x='time', y=[large_window['name'], 'pct_change'], secondary_y=['pct_change'])

That’s a bit like what I’m looking for. The pct_change (percent change: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.pct_change.html?highlight=pct_change#pandas.Series.pct_change) will detect the scale of changes. It’ll hover around 0, but you can see where the big jumps are, then the percent change is also big.

This also uses the secondary_y kwarg, which seems barely documented, and most guides suggest a different approach. Here is something: https://stackoverflow.com/questions/29685887/secondary-y-true-changes-x-axis-in-pandas.

A negative percent change means the coffee machine is lighter (ie the pot is lifted or a cup is taken). A positive percent change means the machine got heavier (ie pot replaced or water refilled).

I can look through those percent changes and spot ones bigger than [a certain value], and mark those cases on the plot or save them out somehow for further analysis.
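This is the opening gambit done en masse rather than row by row. It computes pct_change() once, keeps the rows whose magnitude passes a threshold, and reads the direction from the sign. Here's a sketch on hypothetical smoothed weights. The 10% cut-off is a placeholder for "[a certain value]".

```python
import pandas as pd

# Hypothetical smoothed weights: pot lifted, pot replaced, then a refill
weights = pd.Series([2000.0, 2000, 1000, 1000, 2000, 2000, 2000, 3500, 3500])

pct = weights.pct_change()

THRESHOLD = 0.10  # placeholder: a 10% move in either direction counts as an event
events = pct[pct.abs() > THRESHOLD]  # the leading NaN compares as False, so it drops out

# Negative change: machine got lighter; positive: machine got heavier
labels = events.apply(lambda p: "lighter" if p < 0 else "heavier")
print(labels)
```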

Coffee Boss day 2: Doing some work.

After PUBG last night, I spent a few hours trying to understand pandas and matplotlib. At 1:30am I called it a day. 10 hours into this task.

Looking at: https://machinelearningmastery.com/time-series-data-visualization-with-python/ And Julia Evans’ https://github.com/jvns/pandas-cookbook.

Data description

The machine logs into two files:

  • datr<date>.csv which is a regular log of raw measurements from the scale. The frequency of this logging is set using the regularLogInterval variable in coffee_boss.ino. These values are unfiltered, and contain all of the weird noise that this circuit collects. They are, however, a true time-series.
    The R in datr stands for Regular. But Raw works too. These files end up big (~3 MB).
  • datc<date>.csv which is a log of changes greater than a particular threshold which is designed to be just about the smallest thing that can happen with the machine. That threshold is specified in the changeThreshold variable in coffee_boss.ino. It’s currently 30, so this file will only log changes greater than 30 grams. This stream of measurements is also filtered, being a running median of the last 8 raw measurements. This filters out almost all of the noise. It uses the RunningMedian library for this. This is not a true time series, since the logging frequency is not constant.
    The C in datc stands for Change. I wish I'd thought of better names. These files are only a couple of kB in size.

Both log files have the same format. CSVs with four columns:

  • datetime (%Y-%m-%dT%H:%M:%S)
  • date (%Y-%m-%d)
  • time (%H:%M:%S)
  • weight (float with 2 decimal places)

You can find some examples of these files in https://github.com/euphy/coffee_boss/tree/master/output. They look like this:

2019-09-26T02:57:53,2019-09-26,02:57:53,2249.48
2019-09-26T03:07:55,2019-09-26,03:07:55,2221.33
2019-09-26T03:08:51,2019-09-26,03:08:51,2119.35
2019-09-26T03:08:51,2019-09-26,03:08:51,1886.68
2019-09-26T03:08:51,2019-09-26,03:08:51,1895.43
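This may explain the parse_dates trouble from day 4. As I read the pandas read_csv docs, parse_dates=True only tries to parse the index, so with no index_col set there is nothing for it to convert. Naming the column explicitly does work. Here's a sketch on the five rows above that also drops the redundant date and time columns.

```python
from io import StringIO

import pandas as pd

# The five example rows above
sample = """2019-09-26T02:57:53,2019-09-26,02:57:53,2249.48
2019-09-26T03:07:55,2019-09-26,03:07:55,2221.33
2019-09-26T03:08:51,2019-09-26,03:08:51,2119.35
2019-09-26T03:08:51,2019-09-26,03:08:51,1886.68
2019-09-26T03:08:51,2019-09-26,03:08:51,1895.43
"""

column_names = ["datestamp", "date", "time", "weight"]
df = pd.read_csv(
    StringIO(sample),
    names=column_names,
    usecols=["datestamp", "weight"],  # date and time only repeat datestamp
    parse_dates=["datestamp"],        # name the column to convert
    index_col="datestamp",            # and make it the index
)
df.info()
```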

I want to see this as a line graph with time along the X axis stretching from left to right, and weight on the Y-axis, bottom to top. Doing this with the datR files would be easiest, because they are naturally already time-series data, but they give a pretty awful output because they are so noisy. I’d have to do some filtering on it in python. That’s not such a bad idea.

The datC files are already filtered, but they are not in time series, so I have to either:

  • use the datR files and figure out how to filter the noise out of them. This seems like pandas work. I don’t know how to use pandas.
    Or
  • use the datC files and figure out how to present them as time series – interpolation of missing points perhaps, or some other built-in way to do this in matplotlib. I don’t know how to use matplotlib.
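For the datC option, one way to get a regular series is to resample onto a fixed grid and carry the last logged weight forward. That is reasonable because a missing log entry means the weight didn't move by more than changeThreshold. Here's a sketch using three of the rows above. The one-minute grid is an arbitrary choice, and interpolate(method='time') would be the linear alternative to ffill().

```python
import pandas as pd

# Three change-log (datc) rows: only written when the weight moves
changes = pd.Series(
    [2249.48, 2221.33, 2119.35],
    index=pd.to_datetime(["2019-09-26 02:57:53", "2019-09-26 03:07:55", "2019-09-26 03:08:51"]),
)

# Last value in each minute; empty minutes become NaN, then carry the previous weight forward
regular = changes.resample("1min").last().ffill()
print(regular)
```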

Simple plot with pandas and matplotlib

I’ve started with the datC approach:

from pandas import read_csv
from matplotlib import pyplot

column_names = ['datestamp', 'date','time','weight']
series = read_csv('../output/datc20190926.csv', names=column_names)
series.plot()
pyplot.show()

Which renders a nice graph:

This shows what I expect, in a way. Two pots of coffee made, with about six cups being taken from each one.

There's something wrong though. The first third is all a bit messy. I'm not sure what's happening there, so let's look at the time, and hm, there isn't even a time axis. The x-axis is a count of samples, not a time series. I can't tell if that first disorganised section is an hour or nine hours. I can't tell if the first pot of coffee was drunk in an hour or in twelve. Given that this data covers a full day (from midnight to midnight), it looks like one pot of coffee lasted all day, and another one was made late at night (7pm maybe?).

Simple plot of raw, time-series data

Let’s try with the raw data (output/datr20190926.csv):

OK, that’s no better. A few bad samples in there stretch the scale and obscure the good ones. Can I filter them out? It still hasn’t got the right x-axis either: it counts samples rather than time.

Filter and remove out-of-range samples

Filtering out values that I _know_ are bad will help and is easy. See https://stackoverflow.com/questions/29594841/how-to-filter-out-values-by-values-in-pandas-dataframe.

from pandas import read_csv
from matplotlib import pyplot

column_names = ['datestamp', 'date', 'time', 'weight']
series = read_csv('../output/datr20190926.csv', names=column_names)
# Keep only the rows whose weight (in grams) is in a plausible range
series = series[(series['weight'] > -2000) & (series['weight'] < 10000)]
series.plot()
pyplot.show()

That filters out weights below -2000g and above 10000g. This is better because I can see the overall shape of the values and where they sit across the whole day’s samples. I can see that the cups being taken from each pot are not regularly spaced.

But there’s still a lot of noise that doesn’t hit those thresholds, and, importantly, this approach simply throws away the samples that are outside the bounds. That leaves a time gap at each of those points, and if enough of them happen (I can only see two here), the time series becomes discontinuous.
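One way to keep the timeline intact is to blank out the bad values instead of dropping their rows, using pandas’ `Series.where`. This is a sketch with made-up readings, reusing the same -2000g and 10000g bounds; filling the blanks with `interpolate()` is an optional extra, not something the firmware does:

```python
import pandas as pd

# Hypothetical raw readings every 2 seconds, with two obviously bad samples.
weights = pd.Series(
    [2249.0, 2250.0, -9999.0, 2248.0, 55000.0, 2247.0],
    index=pd.date_range("2019-09-26 03:00:00", periods=6, freq="2s"),
)

# Same bounds as the filter above, but out-of-range values become NaN
# while their rows (and timestamps) stay in place.
in_range = (weights > -2000) & (weights < 10000)
masked = weights.where(in_range)

# Optionally fill each blank from its neighbouring good samples.
filled = masked.interpolate()
```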

So I think the approach is not to filter out and discard bad values, but to use the source log data to produce an entirely new stream of weight values using something like a moving averaging window. That’s how the firmware does it, and it gives a decent result.

Use a rolling() sample window to remove noise

There is a rolling_mean() function in pandas: https://www.programcreek.com/python/example/101378/pandas.rolling_mean. Oh, it’s deprecated. And I don’t want a mean anyway; means are rubbish, I want the median.
https://stackoverflow.com/questions/43437657/rolling-mean-on-pandas-on-a-specific-column describes calling .rolling().mean() on a column instead, which is closer. I assume there’s a .median() too.

from pandas import read_csv
from matplotlib import pyplot

column_names = ['datestamp', 'date', 'time', 'weight']
series = read_csv('../output/datr20190926.csv', names=column_names)
series['rolling'] = series['weight'].rolling(8).mean()
series.plot()
pyplot.show()

That’s a bit more like it. It also proves that mean() isn’t the right one (mean averages are strongly affected by outliers). So now I need to figure out how to plot only the rolling column rather than both rolling and the raw weight. I think this is a matplotlib issue. Well, it is and it isn’t: http://jonathansoma.com/lede/algorithms-2017/classes/fuzziness-matplotlib/understand-df-plot-in-pandas/
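To see why the median copes better, here is a small made-up example: steady readings around 2000g with a single wild sample, smoothed with a 3-sample window both ways:

```python
import pandas as pd

# Hypothetical steady readings with one wild sample in the middle.
weights = pd.Series([2000.0, 2001.0, 2000.0, 9000.0, 2001.0, 2000.0, 2001.0])

window = 3
rolling_mean = weights.rolling(window).mean()
rolling_median = weights.rolling(window).median()

# Every mean window that contains the spike is dragged up past 4300g;
# the median windows stay at 2000 or 2001g. The first window-1 values are NaN.
print(pd.DataFrame({'raw': weights, 'mean': rolling_mean, 'median': rolling_median}))
```

One bad sample pollutes three mean values in a row, while the median simply ignores it as long as fewer than half the samples in the window are bad.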

Using rolling().median()

from pandas import read_csv
from matplotlib import pyplot

column_names = ['datestamp', 'date', 'time', 'weight']
series = read_csv('../output/datr20190926.csv', names=column_names)
series['rolling'] = series['weight'].rolling(8).median()
series.plot(x='time', y='rolling')

# Rotate x ticks and tight_layout fits it all on the page
pyplot.xticks(rotation='vertical')
pyplot.tight_layout()
pyplot.show()

That’s more like it. It uses .median() instead of .mean() to deal better with outliers. I also figured out that series.plot(x='time', y='rolling') specifies which columns go on which axis, rotated the time ticks so they don’t overlap, and used tight_layout() so they don’t fall off the bottom of the page.

This calculates the median over a rolling window of 8 samples, which is about sixteen seconds. I’d like to filter out some more of the jaggies, and to see the results of a few different window sizes plotted together. Doing it like below was a guess and it worked. It makes me think I’m starting to get a clue about how to use this toolset. Four hours in.
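The “8 samples is about sixteen seconds” conversion can be checked from the data itself by taking the median gap between consecutive timestamps. A sketch with hypothetical timestamps; with the real file you would use the parsed 'datestamp' column instead:

```python
import pandas as pd

# Hypothetical timestamps; in practice use the parsed 'datestamp' column.
stamps = pd.Series(pd.to_datetime([
    "2019-09-26T08:45:00", "2019-09-26T08:45:02", "2019-09-26T08:45:04",
    "2019-09-26T08:45:06", "2019-09-26T08:45:09", "2019-09-26T08:45:11",
]))

# Median gap between consecutive samples: robust to the odd long gap
# or duplicated timestamp (the first diff is NaT and is skipped).
sample_period = stamps.diff().median()

window = 8
print(f"{window} samples ~ {(window * sample_period).total_seconds():.0f} s")
```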

See the effects of different sized rolling windows

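from pandas import read_csv
from matplotlib import pyplot
column_names = ['datestamp', 'date', 'time', 'weight']
series = read_csv('../output/datr20190926.csv', names=column_names)
series['rolling4'] = series['weight'].rolling(4).median()
series['rolling8'] = series['weight'].rolling(8).median()
series['rolling12'] = series['weight'].rolling(12).median()
series['rolling24'] = series['weight'].rolling(24).median()
series.plot(x='time', y=['rolling4', 'rolling8', 'rolling12', 'rolling24'])
pyplot.show()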

What was that weird blip at 3am? So now I know that a 24-sample window covers 48 seconds of activity and filters out almost all the variance. It shows the gulp-gulp-gulp of the coffee going down cup by cup (8:45 to 11:41), which is really cool and really clear. But I realised that’s not quite what I thought I was looking for.
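Since the window is really about time, pandas can also take it as a duration when the index is a DatetimeIndex, so it stays 48 seconds long even if samples go missing. A sketch with made-up readings every 2 seconds, including a pot lifted out for about 6 seconds; '48s' mirrors the 24-sample window:

```python
import pandas as pd

# Hypothetical readings every 2 seconds.
index = pd.date_range("2019-09-26 08:45:00", periods=40, freq="2s")
weights = pd.Series(2000.0, index=index)
weights.iloc[10:13] = 300.0   # pot lifted off the scale for ~6 s
weights.iloc[13:] = 1800.0    # pot back, one cup lighter

# Count-based window (24 samples) vs time-based window (48 seconds).
# The time-based one produces values from the first sample onwards,
# because its min_periods defaults to 1.
by_count = weights.rolling(24).median()
by_time = weights.rolling("48s").median()
```

In both versions the short lift-out dip never reaches the output: only the 2000g to 1800g step survives.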

This tells a story, and at first I thought it filtered out too much. It doesn’t show the sequence of lifting the coffeepot out and then returning it a bit lighter. Because the pot is out for only a short time, that dip just disappears. I thought this little signature would be important for recognising a cup of coffee being taken.

In fact, I think the heavily filtered plot, with its simple chunky downward steps, tells the story more clearly. If I can spot a drop of about one cup’s weight, then that’s simple.
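Spotting a cup could then be as simple as comparing the smoothed weight with its value a little earlier and looking for cup-sized drops. This is a hypothetical sketch: `find_cups`, the 150g to 350g band for one cup and the 30-sample look-back are all guesses, not measured values:

```python
import pandas as pd

CUP_MIN_G = 150.0   # hypothetical lower bound for one cup, in grams
CUP_MAX_G = 350.0   # hypothetical upper bound for one cup, in grams


def find_cups(smoothed: pd.Series, gap_samples: int = 30) -> pd.Index:
    """Return the index labels where the smoothed weight fell by about one cup.

    Compares each reading with the one `gap_samples` earlier and reports
    only the first sample of each run of cup-sized drops.
    """
    drop = smoothed.shift(gap_samples) - smoothed
    is_cup = drop.between(CUP_MIN_G, CUP_MAX_G)  # NaN at the start counts as False
    starts = is_cup & ~is_cup.shift(1, fill_value=False)
    return starts[starts].index
```

Two cups poured within one look-back period would show up as one double-sized drop and be missed, so the band and gap would need tuning against real data.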

I’ve just realised that the matplotlib viewer that pops up when I call pyplot.show() is really good. It can zoom in on a section! That’s exactly what I wanted Excel to do for me. Excel really is the wrong tool for this job.

Adding more ticks to the x-axis

It doesn’t give me very good ticks on the x-axis, though; there aren’t enough. I’ll need to add some more.

from pandas import read_csv
from matplotlib import pyplot

column_names = ['datestamp', 'date', 'time', 'weight']
series = read_csv('../output/datr20190926.csv', names=column_names)
small_window = ('rolling4', 4)
large_window = ('rolling36', 36)

series[small_window[0]] = series['weight'].rolling(small_window[1]).median()
series[large_window[0]] = series['weight'].rolling(large_window[1]).median()

# With a text 'time' column, pandas plots the rows at x = 0, 1, 2, ...,
# so the tick positions below are row positions.
series.plot(x='time', y=[small_window[0], large_window[0]])

count = series['time'].count()
no_of_ticks = 24
size_of_segment = int(count / no_of_ticks)

# Two lists: tick positions spread regularly across the series,
# and the matching time strings to use as labels.
indices = list()
tick_labels = list()
for i in range(0, no_of_ticks):
    position = (size_of_segment * i) + 42 // 2  # integer offset, so it can index a row
    indices.append(position)
    tick_labels.append(series['time'].iloc[position])

pyplot.grid(True, which='major')
pyplot.xticks(labels=tick_labels, ticks=indices, rotation='vertical')
pyplot.tight_layout()
pyplot.show()

That seems like a clunky way to do it. There must be a better one! That’s enough for today: six hours’ work. Bit of Overwatch before bed.
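A probably tidier route, sketched here rather than tried on the real logs: give the data a DatetimeIndex and let matplotlib’s date locators pick the ticks. `HourLocator` and `DateFormatter` come from `matplotlib.dates`; the day of synthetic readings and the one-tick-per-hour choice are assumptions. It plots with `ax.plot` rather than `series.plot()` because pandas’ own time-series axis handling can conflict with matplotlib’s date locators (pandas’ `x_compat=True` option is the usual way around that):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to get the interactive viewer
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
from matplotlib import pyplot

# Stand-in for a day of raw readings every 2 seconds; in practice use
# read_csv(..., parse_dates=['datestamp'], index_col='datestamp').
index = pd.date_range("2019-09-26 00:00:00", "2019-09-26 23:59:58", freq="2s")
noise = np.random.default_rng(0).normal(0, 20, len(index))
weights = pd.Series(2000.0 + noise, index=index)

smoothed = weights.rolling(36).median()

fig, ax = pyplot.subplots()
ax.plot(smoothed.index, smoothed.values)

# One tick per hour at real clock times, labelled HH:MM.
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax.grid(True, which='major')
pyplot.setp(ax.get_xticklabels(), rotation='vertical')
fig.tight_layout()
pyplot.show()
```

No tick arithmetic or hand-built label lists: zooming in the viewer should also keep the hour labels correct, because the locator works from the axis limits rather than row numbers.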