
Need a quickstart! #10

Closed
elementc opened this issue Oct 4, 2020 · 3 comments
elementc commented Oct 4, 2020

Hi Bennett! I see on your dev branch that you have a header for a quick start, but no content there. Want some help writing something here? What does a minimal usage of this package look like? :)

bmeares (Owner) commented Oct 7, 2020

Hey Element, thanks for opening the issue, sorry I just now noticed! I've been hesitant to write the README but it's about time I clean it up as we're nearing a beta release.

I'm having trouble clearly conveying Meerschaum's usage in a single README, but a minimal session would look something like the following:

### install Meerschaum with its full-feature dependencies
$ pip install --upgrade meerschaum[full]

### launch the Meerschaum stack
###     (`mrsm stack` is an alias of docker-compose,
###     arguments in [brackets] are passed to the subprocess and not Meerschaum itself)
$ mrsm stack [-d]

### launch the Meerschaum shell, where you can manage connectors, pipes, etc.
###     (actions done in the shell can be done on the command-line instead)
$ mrsm
Meerschaum vX.X.X
mrsm ➤ 

I think I need an FAQ for the available actions. Usage and options can be seen in the CLI with `help [action]`.

I'm planning on having a dashboard that interacts with the API like the CLI does, but frontend development is not my specialty.

Maybe I should make tutorials? Documentation can be overwhelming.

elementc (Author) commented Oct 8, 2020

No worries! Feel free to call me Casey; we spoke this past Sunday at the LUP LUG.

I think a minimal example from zero to your first pipe would be useful.
I tried to follow your instructions and what's in the README, and I was unable to get past starting the stack:

(mschm) casey@IRONFIRE:~$ mrsm stack [-d]
NOTE: Configuration file is missing. Falling back to default configuration.
You can edit the configuration with `edit config` or replace the file /home/casey/.config/meerschaum/config.yaml
Missing file /home/casey/.config/meerschaum/stack/resources/docker-compose.yaml.

Bootstrap stack configuration?

NOTE: The following files will be overwritten: [PosixPath('/home/casey/.config/meerschaum/stack/resources/docker-compose.yaml'), PosixPath('/home/casey/.config/meerschaum/stack/grafana/resources/provisioning/datasources/datasource.yaml'), PosixPath('/home/casey/.config/meerschaum/stack/grafana/resources/provisioning/dashboards/dashboard.yaml')] [Y/n] Y
ERROR: 
        Can't find a suitable configuration file in this directory or any
        parent. Are you in the right directory?

        Supported filenames: docker-compose.yml, docker-compose.yaml
        
(mschm) casey@IRONFIRE:~$ mrsm stack [-d]
Missing file /home/casey/.config/meerschaum/stack/resources/docker-compose.yaml.

Bootstrap stack configuration?

NOTE: The following files will be overwritten: [PosixPath('/home/casey/.config/meerschaum/stack/resources/docker-compose.yaml'), PosixPath('/home/casey/.config/meerschaum/stack/grafana/resources/provisioning/datasources/datasource.yaml'), PosixPath('/home/casey/.config/meerschaum/stack/grafana/resources/provisioning/dashboards/dashboard.yaml')] [Y/n] Y
ERROR: Version in "./docker-compose.yaml" is unsupported. You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see https://docs.docker.com/compose/compose-file/
(mschm) casey@IRONFIRE:~$ mrsm stack [-d]
ERROR: Version in "./docker-compose.yaml" is unsupported. You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see https://docs.docker.com/compose/compose-file/
(mschm) casey@IRONFIRE:~$ docker-compose --version
docker-compose version 1.25.0, build unknown

Is there any info I can provide to help debug this failure? I'm using the docker-compose and docker install that came with Pop OS 20.04. Your README suggests installing docker-compose from pip, but I have not done so: if you have a strict dependency on a particular docker-compose version, maybe you should pin it as a pip dependency, no?
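
For instance, something like this in setup.py would surface the requirement at install time (just a sketch; the exact version floor is my guess, since the 3.8 file format needs a newer Compose than the 1.25.0 my distro ships):

### Hypothetical setup.py excerpt: pin the pip-installable docker-compose
### so `mrsm stack` always gets a Compose that understands the bundled file.
from setuptools import setup

setup(
    name='meerschaum',
    # ...
    install_requires=[
        'docker-compose>=1.26',  # assumed floor; 1.25.0 rejects format 3.8
    ],
)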

Moreover, I think it's really important that the README give us a clear example of the usefulness of this tool. As written, it says the following:

Meerschaum is a platform for quickly creating and managing time-series data streams called Pipes.

Cool. Where can these time-series data streams come from? Where can they go to? I see some code for a SQL connector and you mentioned this was an ETL tool in our talk on Sunday, but are there already other existing sources and sinks?

I think the really critical thing is that, cool shell UI or cool web UI aside (I'd hold off on the web UI), once I have your package installed I have zero idea how to use it to extract, transform, or load data. Even the toyest of the toy examples would go a long way to showing how this tool is unique or useful.

Let's suppose I have... a sqlite3 db with a table full of student names, submission datetimes, and grades, and I'd like to extract it, apply some hashing function to the student name column, and dump out a CSV of "anonymized" data. How?
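
To make the ask concrete, here's the plain-stdlib version of what I'm describing (table and column names are made up for the example):

### Plain-stdlib version of the scenario above (made-up table/column names):
### read grades from sqlite, hash the student name, dump an "anonymized" CSV.
import csv
import hashlib
import sqlite3

conn = sqlite3.connect('students.sqlite')
rows = conn.execute(
    'SELECT student_name, submission_datetime, grade FROM assignments'
)

with open('anonymized.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['student_hash', 'submission_datetime', 'grade'])
    for name, submitted, grade in rows:
        student_hash = hashlib.sha256(name.encode()).hexdigest()[:12]
        writer.writerow([student_hash, submitted, grade])

conn.close()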

(Is my supposition dumb? Is there a better scenario where this tool shines?)

bmeares (Owner) commented Oct 9, 2020

Hey Casey, thanks for the detailed post. I can't believe it took this long to find out, but the code for taking input to yes/no questions was just broken. I hadn't tested it properly, so answering Y still returned False 🤦. I guess this is a good example of why unit tests are important, and I need to get around to implementing them. It's definitely still an alpha release.
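
For the curious, the fix amounts to something like this (a simplified sketch, not the actual Meerschaum code):

### Simplified sketch of the yes/no prompt (not the actual Meerschaum code).
### The bug was that an affirmative answer still fell through to False.
def yes_no(question: str, default: str = 'y') -> bool:
    """Ask a [Y/n] question and return True for an affirmative answer."""
    answer = input(f"{question} [Y/n] ").strip().lower()
    if not answer:
        answer = default
    return answer in ('y', 'yes')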

I've also changed the default version in the docker-compose.yaml file from 3.8 to 3 for better backwards compatibility.

Try upgrading to version 0.0.39, which should address these issues:

$ pip install --upgrade meerschaum
$ mrsm bootstrap config   ### if this doesn't work, just rm -rf ~/.config/meerschaum
$ mrsm stack

To address the README question, the current use case for Meerschaum is helping non-system engineers spin up a pre-configured Grafana/TimescaleDB stack and migrate outside data into TimescaleDB (particularly in the case of utilities data).

The plan is to implement the `bootstrap pipes` action so that all parameters can be easily set from one command, but for the time being, a few manual steps are needed.

For your example students DB (say a table `assignments` with columns `submission_datetime`, `student_id`, and `grade`), you would take the following steps to migrate into the TimescaleDB / Grafana stack:

  1. Register a SQL connector with `mrsm edit config`. For SQLite, your config would look something like this:
meerschaum:
  connectors:
    sql:
      studentdb:   ### our label for the connection
        flavor: sqlite
        database: /path/to/students.sqlite
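
If you want to follow along, a throwaway students.sqlite like this would do (plain stdlib sqlite3; the table and columns are just the ones from your example):

### Create a throwaway students.sqlite to follow along with (stdlib sqlite3;
### table and column names match the example above).
import sqlite3

conn = sqlite3.connect('/path/to/students.sqlite')
conn.execute("""
    CREATE TABLE IF NOT EXISTS assignments (
        submission_datetime TEXT,
        student_id INTEGER,
        grade REAL
    )
""")
conn.executemany(
    'INSERT INTO assignments VALUES (?, ?, ?)',
    [
        ('2020-10-01 09:00:00', 1, 95.0),
        ('2020-10-02 09:30:00', 2, 88.5),
    ],
)
conn.commit()
conn.close()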
  2. Register the Pipe. Pipes are identified by three primary keys: (1) the connector (`connector_keys`), (2) the metric (`metric_key`), and (3) the location (`location_key`). The location may be omitted, however, and it will be for this example.

The `connector_keys` (`-C`) are the type and label of the connector defined in step 1, so in our case the connector keys will be `sql:studentdb`.

The metric (`-M`) is a label identifying the contents of the Pipe (think power, energy, CO2, temperature, etc.). In this case, I'll use the original table name, `assignments`.

The location is a label describing a Pipe's location; if omitted, it's None / NULL. It's a way to further partition data streams, among buildings for example: the parent database may contain an entire campus's worth of data, but we want to partition at the building level. The plan is to derive Pipes with a location as TimescaleDB continuous / real-time views of the parent metric Pipe.
E.g. `sql_studentdb_assignments` would be the parent of `sql_studentdb_assignments_thirdblock` (the `:` in the connector keys is converted to `_` for convenience).

$ mrsm register pipes -C sql:studentdb -M assignments
  3. Set parameters. For Pipes to be automatically indexed and efficiently cached, some metadata are needed. You can set the metadata with `mrsm edit pipes`:
$ mrsm edit pipes -C sql:studentdb -M assignments

This will open your editor and allow you to edit the parameters of the Pipe. Your parameters would look something like this:

columns:
  datetime: submission_datetime
  id: student_id
fetch:
  ### This is the query which is executed on the remote host
  definition: SELECT * FROM assignments
  ### How many minutes into the past to look for backlogged data (optional)
  backtrack_minutes: 0

Here's one way to register a Pipe from a script instead (does the same as above):

>>> import meerschaum as mrsm
>>> pipe = mrsm.Pipe('sql:studentdb', 'assignments')
>>> pipe.parameters = {
...     'columns': {
...         'datetime': 'submission_datetime',
...         'id': 'student_id',
...     },
...     'fetch': {
...         'definition': 'SELECT * FROM assignments',
...     },
... }
>>> pipe.register()
  4. Sync the Pipe (WIP). Once a Pipe is registered, it may be synced with its remote source. Think of this like materializing a view between hosts without linking servers.
    I'm still implementing the sync logic, and you can see how the syncing works by inspecting the `pipe.fetch()` method. In essence, it builds a SQL query to grab the latest data from the remote host, diffs it against its own recent data, and applies the difference.
### this will sync all Pipes from the sql:studentdb connection
$ mrsm sync pipes -C sql:studentdb
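
Conceptually, the sync step does something like the following sketch (hypothetical names and a three-column table; the real logic lives in `pipe.fetch()` and is still in flux):

### Conceptual sketch of the fetch-and-diff sync described above.
### Hypothetical names; the real logic lives in pipe.fetch().
import sqlite3

def sync_sketch(remote: sqlite3.Connection, local: sqlite3.Connection):
    """Pull rows newer than what we already have and insert the difference."""
    # 1. Find the newest datetime we've already synced.
    newest = local.execute(
        'SELECT MAX(submission_datetime) FROM assignments'
    ).fetchone()[0]

    # 2. Query only newer rows from the remote host.
    if newest is None:
        rows = remote.execute('SELECT * FROM assignments').fetchall()
    else:
        rows = remote.execute(
            'SELECT * FROM assignments WHERE submission_datetime > ?',
            (newest,),
        ).fetchall()

    # 3. Apply the difference to our own table.
    local.executemany('INSERT INTO assignments VALUES (?, ?, ?)', rows)
    local.commit()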

I hope this wasn't too complicated. I think of it as a framework for non-system engineers who need to build visualizations for large sets of time-series data. There are a lot of pieces to get into, like how the API works and how Connectors work (SQLConnector vs APIConnector vs other types to come), but overall Pipes are a way to simplify many different connection types and organize data.

Thank you for taking interest in Meerschaum! It's exciting to answer GitHub issues and I hope I get more feedback in the future. If you have any further issues (and be warned, you'll probably run into a few bugs), feel free to reach out here or at my email bennett.meares@gmail.com or on Discord (bennett#0708). I'm still getting used to Mumble and Matrix, but I hope to join in on more LUP LUGs. Everyone was so nice, and I look forward to meeting more Linux nerds like me.

bmeares self-assigned this Oct 9, 2020
bmeares added the documentation label Oct 20, 2020
bmeares closed this as completed Oct 20, 2020