1. What is Amorphic Data Analytics?
Amorphic Data Analytics is a fully managed service provided by the platform that enables data scientists and developers to quickly and easily build, train, and deploy machine learning models.
2. What can I do with Amorphic Data Analytics?
You can create a new machine learning model, or create a new notebook to train a model on a dataset created in the Datasets tab.
3. How do I get started with Amorphic Data Analytics?
To get started with Amorphic Data Analytics, log in to Amorphic Data and choose one of two approaches: ML Models or Notebooks. ML Models provide a way to bring your existing models into the Amorphic Data platform and run them on a dataset created in Amorphic Data. With Notebooks, you can launch a notebook instance with an example notebook, modify it to connect to your data sources, and then build, train, validate, and deploy the resulting model into production with just a few inputs.
4. What if I have my own training environment?
Amorphic Data provides a full end-to-end workflow in which you can perform ETL as well as train and host an ML model, but you can continue to use your existing tools. You can easily transfer the results of each stage in and out of Amorphic Data as your business requirements dictate.
5. What is the ML Models component of Amorphic Data?
ML Models provide a way to bring your existing models into Amorphic Data and run them on a dataset created there.
6. How do I get started with the ML Models component of Amorphic Data?
You can get started by adding a new model from the ML Model page of Amorphic Data.
7. What are the ways in which you can create a model with the ML Models component of Amorphic Data?
There are three ways to create a model from the ML Model page of Amorphic Data: use a model subscribed from AWS Marketplace, point to an S3 location containing your model artifacts, or upload an AWS SageMaker model tar file.
8. How can I use an AWS Marketplace model in Amorphic Data?
After subscribing to an ML model from AWS Marketplace using the AWS credentials provided by Amorphic Data, the model resource becomes available in the “Existing Model Resource” dropdown on the “Register New Model” page. You can use this model resource to create a new model after providing the remaining model information.
9. How can I bring my own AWS SageMaker model into Amorphic Data?
On the “Register New Model” page of Amorphic Data, you can import your existing AWS SageMaker model tar files to create the model.
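SageMaker models are packaged as a gzipped tar archive (conventionally `model.tar.gz`) with the serialized artifacts at the root. As a rough sketch (the artifact file name below is illustrative, not an Amorphic Data requirement), such an archive can be built with Python's standard library:

```python
import tarfile
from pathlib import Path

# Illustrative artifact; a real model file would come from training
# (e.g. an "xgboost-model" file produced by the AWS XGBoost algorithm).
Path("xgboost-model").write_bytes(b"serialized model bytes")

# SageMaker expects the artifacts at the root of a gzipped tar archive.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("xgboost-model", arcname="xgboost-model")

# Inspect the archive to confirm its layout.
with tarfile.open("model.tar.gz") as tar:
    members = tar.getnames()
```

The resulting `model.tar.gz` is the kind of file the “Register New Model” page expects when you upload a SageMaker model tar file.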
10. What other information do I need to create a model?
You need to decide whether the output of running the analytics model on a dataset will be metadata only or Dataset data; most of the time it will be Dataset data. You also need to provide information about the algorithm. Currently, Amorphic Data supports only three algorithms: AWS XGBoost, Seq2Seq, and DeepAR. You need to specify the supported file format of the datasets on which the model will run, and provide the preprocessing and post-processing ETL jobs created in Amorphic Data ETL.
11. How is Amorphic Data ETL connected to the Amorphic Data ML?
Amorphic Data ETL provides the capability to run preprocessing and post-processing ETL jobs in the ML pipeline. With preprocessing, you can perform the ETL required to convert the original dataset into the format the ML model expects. With post-processing, you can perform the ETL operations required to convert the model's prediction results into an output dataset format.
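To illustrate the kind of transformation a preprocessing job performs (the column names, CSV layout, and expected column order below are hypothetical, not Amorphic Data APIs), consider reshaping a raw CSV into the feature order a model expects:

```python
import csv
import io

# Hypothetical raw rows, as they might arrive from a source Dataset.
raw = "Vin,CheckoutLocation,CustomerId\nV1,NYC,C1\nV2,SFO,C2\n"

# The model (hypothetically) expects its input columns in this order.
model_columns = ["Vin", "CustomerId", "CheckoutLocation"]

reader = csv.DictReader(io.StringIO(raw))
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=model_columns)
writer.writeheader()
for row in reader:
    # Reorder each row to match the model's expected column order.
    writer.writerow({c: row[c] for c in model_columns})

preprocessed = out.getvalue()
```

A post-processing job would do the inverse: reshape the model's prediction output into the schema of the target dataset.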
12. What does output type Dataset data mean in the “Register New Model” page of the Amorphic Data?
Dataset data lets you specify the input and output schemas for the preprocessing and post-processing ETL jobs. This means that Datasets from the Dataset listing page of Amorphic Data are used both as the ML input and to ingest the ML output data.
13. What is the format of input and output schema in the “Register New Model” page of Amorphic Data?
The input and output schemas are arrays of JSON objects (key-value pairs), each containing the column type (String, Integer, Double, or Date), the column name, and a description.
Following is a sample format:
[{"type": "String", "name": "Vin", "Description": "a"}, {"type": "String", "name": "CustomerId", "Description": "a"}, {"type": "String", "name": "CheckoutLocation", "Description": "a"}]
The input schema should match the schema of the dataset (from Amorphic Data Datasets) on which the preprocessing ETL job will run. The output schema should match the schema of the dataset that will ingest the results of the post-processing ETL job run on the ML model output. You create the preprocessing and post-processing ETL jobs using Amorphic Data ETL.
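Before registering a model, it can help to check locally that a schema string is well-formed. A minimal sketch (treating "Description" as optional is an assumption; the required keys follow the sample format above):

```python
import json

# Column types listed in the sample format above.
ALLOWED_TYPES = {"String", "Integer", "Double", "Date"}

def validate_schema(text):
    """Check that a schema string is a JSON array of column objects."""
    schema = json.loads(text)
    assert isinstance(schema, list), "schema must be a JSON array"
    for col in schema:
        assert col["type"] in ALLOWED_TYPES, f"unsupported type: {col['type']}"
        assert isinstance(col["name"], str), "column name must be a string"
    return schema

sample = '[{"type": "String", "name": "Vin", "Description": "a"}]'
schema = validate_schema(sample)
```

A schema that uses an unsupported type (say, "Float") would fail this check before you paste it into the “Register New Model” page.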
14. How do I run the model created in the Amorphic Data ML?
You can run the model by selecting a dataset from the dataset listing page of Amorphic Data Datasets. Open the relevant file in the dataset and click “Run Analytics” in the top right corner of the file.
“Run Analytics” lets you select the ML model you want to run on the dataset. Note that only certain ML models will be available to run: those whose preprocessing input schema, as specified during model creation, matches the schema of the dataset.
You will also be asked to specify the target dataset, which is the dataset that will ingest the model output after the post-processing ETL job. Only target datasets whose schema matches the output schema specified during model creation are available for selection.
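The filtering described above can be pictured as a simple schema comparison. The model names and structures below are hypothetical and only illustrate the matching logic; the comparison rule (matching on column name and type, ignoring order and descriptions) is an assumption, not documented Amorphic Data behavior:

```python
def columns(schema):
    # Reduce a schema to (name, type) pairs, ignoring descriptions and order.
    return sorted((c["name"], c["type"]) for c in schema)

# Hypothetical dataset selected on the Dataset listing page.
dataset_schema = [{"type": "String", "name": "Vin"},
                  {"type": "String", "name": "CustomerId"}]

# Hypothetical registered models and their preprocessing input schemas.
models = [
    {"name": "churn-model",
     "input_schema": [{"type": "String", "name": "Vin"},
                      {"type": "String", "name": "CustomerId"}]},
    {"name": "forecast-model",
     "input_schema": [{"type": "Date", "name": "Day"},
                      {"type": "Double", "name": "Sales"}]},
]

# Only models whose input schema matches the dataset would be offered
# in the "Run Analytics" dialog; target datasets are filtered the same
# way against the model's output schema.
available = [m["name"] for m in models
             if columns(m["input_schema"]) == columns(dataset_schema)]
```

Here only the first model matches the selected dataset, so it would be the only one offered.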