Ticket #1098 (new)

Opened 12 years ago

Last modified 12 years ago

Shell script for lightning telemetry

Reported by: Musti Owned by: Musti
Priority: normal Milestone:
Component: telemetry Version:
Keywords: Cc: DustWolf
Related nodes: Realization state:
Blocking: 893 Effort: normal
Blocked by: Security sensitive: no

Description

A shell script with two functions is required to process data from a lightning detector; it is to be called by cron.

  • Data acquisition function

Calling readsensor -d /dev/ttyATH0 -i 3 -t 1000 returns the local sensor time since the local epoch. Correlate it to actual time.
Calling readsensor -d /dev/ttyATH0 -i 1 -t 1000 returns the database of lightning strikes, entries separated by commas, with each entry structured as follows: <time> <type> <distance> <power>

Sample input string:

32824 1 31,32828 2 24,32870 1 24,32873 1 24,33229 1 24,

A CSV log of events is to be made in /tmp/

  • Statistics function

Returns comma-separated data formed from the log since the last call of this function.

  • Number of events for types 1 and 2
  • Average distance
  • Average power

Attachments

try1.sh (1.1 KB) - added by DustWolf 12 years ago.
Code for the bash script
telemetry-stats.sh (892 bytes) - added by ziga.z 12 years ago.
log.csv (219 bytes) - added by ziga.z 12 years ago.

Change History

comment:1 Changed 12 years ago by DustWolf

For correlating the sensor time to actual time, assuming sensor seconds are still real seconds and the epoch is all that differs (and that the shell knows the proper epoch), you would want to use:

readsensor -d /dev/ttyATH0 -i 3 -t 1000 | expr $(date +%s) - $(cat -) > /tmp/time_offset

Again assuming I got the math the right way around, you would add this time_offset to the sensor time to get time since the proper epoch. You will want to write this value to a file and reuse it, because this calculation produces a small error depending on execution time, and the last thing you want is a small variable error. Writing the value to a file, assuming the sensor epoch doesn't change, produces a small constant error you can probably ignore.
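
A minimal sketch of the other direction, applying the stored offset to a raw sensor timestamp (hypothetical; assumes, as above, that sensor time is in whole seconds):

SENSOR_TIME=32824                      # raw value from readsensor
OFFSET=$(cat /tmp/time_offset)         # written by the one-liner above
REAL_TIME=$((SENSOR_TIME + OFFSET))    # seconds since the proper epoch
echo "$REAL_TIME"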

comment:2 Changed 12 years ago by DustWolf

  • Cc DustWolf added

comment:3 Changed 12 years ago by DustWolf

Haven't finished adding the statistics code, ran out of time. :)

This produces the CSV log. Can be run multiple times to update the log.

DATA="32824 1 31,32828 2 24,32870 1 24,32873 1 24,33229 1 24,"
#DATA=$(readsensor -d /dev/ttyATH0 -i 1 -t 1000)
echo "$DATA" | tr ',' '\n' | while read row; do

 #FOR EACH COMMA-SEPARATED BLOCK, EXCEPT THE LAST BLANK ONE
 if [ -n "$row" ]; then

  #PARSE DATA (sensor time is shifted by the saved offset)
  TIME=$(($(echo "$row" | cut -d ' ' -f1) + $(cat /tmp/time_offset)))
  TYPE=$(echo "$row" | cut -d ' ' -f2)
  DISTANCE=$(echo "$row" | cut -d ' ' -f3)
  POWER=$(echo "$row" | cut -d ' ' -f4)

  #DEBUGGING OUTPUT
  echo "$TIME,$TYPE,$DISTANCE,$POWER"

  #WRITING CSV LOG (appending)
  echo "$TIME,$TYPE,$DISTANCE,$POWER" >> /tmp/log.csv
 fi
done

Please note that this code does not output the Power variable correctly, because that field appears to be missing from the example data.

comment:4 Changed 12 years ago by DustWolf

What is this for, by the way? Why not use RRDtool instead of making your own statistics, etc?

comment:5 follow-up: ↓ 6 Changed 12 years ago by Musti

Cool, nice work. I will give it a try tomorrow.

This is to be a part of the nodewatcher module. While the design is oversimplified, it will be effective. Nodewatcher polls all the nodes in the network for information. Assuming there is only a single system polling all the nodes, these statistics represent the data returned between the last poll and the present. Far from perfect, but suitable at least for the proof of concept and basic sensor testing. RRDtool is overkill for this purpose.

Sorry, I defined some things poorly in the description:

  • Time is in 100 ms intervals and wraps around at the 16-bit limit, thus 65535
  • Power has not been implemented in the example, but assume the value is there. I will modify the system to return a 0 value for it for now.

comment:6 in reply to: ↑ 5 Changed 12 years ago by mitar

Replying to Musti:

Assuming there is only a single system polling all the nodes,

This is a dangerous assumption. We definitely do not want this in the long term (we want redundancy and multiple nodewatchers), and it means that every time somebody manually opens the nodewatcher output (for example, to debug) it will confuse the results.

We should not assume this. Please don't. Make the script so that it does not assume this.

RRDtool is overkill for this purpose.

We do not want to run RRDtool on the nodes. :-)

comment:7 Changed 12 years ago by Musti

Let's go with the dangerous assumption for now; I need this urgently. Then we can implement a better solution.

comment:8 follow-up: ↓ 9 Changed 12 years ago by mitar

I do not really understand where this assumption is used. And why does calling the script multiple times return an error?

comment:9 in reply to: ↑ 8 ; follow-up: ↓ 13 Changed 12 years ago by DustWolf

Replying to mitar:

I do not really understand where this assumption is used. And why does calling the script multiple times return an error?

If I understand this correctly: since the statistics script "Returns comma-separated data formed from the log since the last call of this function", the script must record an offset or clear the log; either way, running it twice will not, and should not, return the same data. Unless we are talking about something else.
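
One hypothetical way to get the "since the last call" behaviour is to rotate the log when the statistics run, so each call only sees rows appended since the previous one (this works with the append-only writer in comment 3, which reopens the file on every write):

# Hypothetical rotate-on-read: claim the rows logged since the last call.
if [ -f /tmp/log.csv ]; then
 mv /tmp/log.csv /tmp/log.stats.csv
 # ... compute the statistics over /tmp/log.stats.csv ...
 rm /tmp/log.stats.csv
fi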

Replying to Musti:

  • Time is in 100 ms intervals and wraps around at the 16-bit limit, thus 65535

This will be fun to implement, since it also means the counter can wrap around in the middle of a single reported sequence, if I understand this correctly? I don't think I can make it handle more than one wraparound, unless some recursion wizard can make it work like that. In Bash.

Last edited 12 years ago by DustWolf (previous) (diff)

comment:10 follow-up: ↓ 11 Changed 12 years ago by Musti

Time wraparound is about 109 min (65536 × 100 ms), which is much more than the 5 min interval at which the script is run by cron. So at most one wraparound per single reported sequence.

Changed 12 years ago by DustWolf

Code for the bash script

comment:11 in reply to: ↑ 10 Changed 12 years ago by DustWolf

Replying to Musti:

Time wraparound is about 109 min (65536 × 100 ms), which is much more than the 5 min interval at which the script is run by cron. So at most one wraparound per single reported sequence.

I've managed to make it so that it can handle any number of wraparounds in a single sequence; always better not to have potential bugs in your code, you never know.

The file attached is the script, not yet finished again; I still need to make it work with 100 ms units instead of 1 s.
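
Not the attached script itself, but a minimal sketch of how such cumulative unwrapping could look (hypothetical; assumes rows arrive in sensor-time order and runs inside the same while-read loop as the parser in comment 3, so PREV and WRAPS persist between rows):

# Before the loop:
WRAPS=0
PREV=-1

# Inside the loop, once per parsed row:
RAW=$(echo "$row" | cut -d ' ' -f1)
if [ "$RAW" -lt "$PREV" ]; then
 WRAPS=$((WRAPS + 1))   # the 16-bit counter passed 65535 and started over
fi
PREV=$RAW
# 100 ms ticks to whole seconds, then shift by the saved offset:
TIME=$(((RAW + WRAPS * 65536) / 10 + $(cat /tmp/time_offset)))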

Should the statistics code be independent of this code, or should it produce an average every time the sensor data is downloaded? Doing it together is much easier, but if you want to download sensor data every 5 min and then do the statistics hourly, that will have to be done separately.

comment:12 Changed 12 years ago by Musti

Cool. Statistics needs to be a separate function, as downloading data from the sensor and requesting the statistics are time-independent.

comment:13 in reply to: ↑ 9 Changed 12 years ago by mitar

Replying to DustWolf:

The script must record an offset or clear the log,

Why don't we have a cron script which does this and stores the output to a file, and we just read the file? This is a much better approach. Then you really know that you are getting correct data.

Replying to DustWolf:
In Bash.

It is not Bash, but Ash. There are quite a few differences. Be careful when writing scripts.
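
For example, a few common bashisms and their portable equivalents (exact support depends on the BusyBox build, so treat the "fails" lines as typical rather than guaranteed):

x=foobar
[[ $x == foo* ]] && echo match          # bash-only test syntax, usually absent in ash
case "$x" in foo*) echo match ;; esac   # POSIX equivalent, works in both
arr=(1 2 3)                             # bash arrays: not available in ash
set -- 1 2 3; echo "$2"                 # POSIX alternative: positional parameters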

Changed 12 years ago by ziga.z

Changed 12 years ago by ziga.z

comment:14 in reply to: ↑ description Changed 12 years ago by ziga.z

telemetry-stats.sh: statistics for log.csv

comment:15 Changed 12 years ago by mitar

Floating point on BusyBox: the shell arithmetic is integer-only. They suggest awk; it has dc as well.
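
A minimal awk sketch of the statistics output (number of events for types 1 and 2, average distance, average power), reading the rotated log from the sketch in comment 9; field order time,type,distance,power as in the CSV above:

awk -F, '
 $2 == 1 { n1++ }                  # count type-1 events
 $2 == 2 { n2++ }                  # count type-2 events
 { dsum += $3; psum += $4; n++ }   # accumulate distance and power
 END {
  if (n == 0) n = 1                # avoid division by zero on an empty log
  printf "%d,%d,%.1f,%.1f\n", n1, n2, dsum / n, psum / n
 }' /tmp/log.stats.csv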

comment:16 Changed 12 years ago by mitar

I have no idea what you are doing here, I just want to clarify some general concepts. We use a cgi-bin script running on the node to output various information from the node. This script is run every time somebody (or our data collection server) accesses the URL of the script. So outputting this data should be a light operation, because otherwise it can open a door to DoS attacks on the nodes.

Preferably, data should be cached on the node (/tmp is a good place for that) and the cgi-bin script should just output the data. You can make a cron script which runs the expensive computation every 5 minutes and stores it in a file, and then the cgi-bin script just outputs this file whenever it is accessed.
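
A minimal sketch of that split (hypothetical paths, and compute_stats is a placeholder for the expensive computation):

# cron side (e.g. */5 * * * *): do the expensive work, publish atomically
compute_stats > /tmp/stats.csv.new && mv /tmp/stats.csv.new /tmp/stats.csv

# cgi-bin side: just serve the cached result, no computation
echo "Content-Type: text/plain"
echo ""
cat /tmp/stats.csv 2>/dev/null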

Kostko, we might create some API function for nodewatcher scripts which would do this automatically? So that you could call something like throttle(original function) and it would make sure that the original function is not called more than X times in Y minutes. The issue here is that the first request will still take a long time.
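
A hypothetical shell sketch of such a throttle, running the wrapped command at most once per PERIOD seconds and serving the cached output in between (compute_stats is again a placeholder):

PERIOD=300
throttle() {
 CACHE="/tmp/throttle.$1.out"
 STAMP="/tmp/throttle.$1.stamp"
 NOW=$(date +%s)
 LAST=$(cat "$STAMP" 2>/dev/null || echo 0)
 if [ $((NOW - LAST)) -ge "$PERIOD" ]; then
  "$@" > "$CACHE" && echo "$NOW" > "$STAMP"   # refresh the cache
 fi
 cat "$CACHE" 2>/dev/null                     # serve the cached result
}

throttle compute_stats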

I think that maybe we should not be computing anything at cgi-bin request time, because this also makes the whole data fetching from nodes longer; the cgi-bin script should really be fast. Maybe we could even make that URL point to a static file which is regenerated every X minutes by a cron script? I am not sure if we want this though: some entries in the cgi-bin output we might want to be recent, and for some we might not care if they are 5 minutes old.
