PLGrid Forge
PLGrid Forge is a service for inference, management of credits and API keys, and access to models within a specific grant using PLGrid supercomputers. The service is hosted at https://llmlab.plgrid.pl and is integrated with the PLGrid Portal.
What is inference?
Inference in the field of artificial intelligence (AI) is the stage in which a trained model takes new input data and generates predictions or decisions. For example, chatbots are services that run inference on large language models (LLMs): they analyze queries and produce responses. Inference can be performed on local resources or by sending requests to an external server. PLGrid Forge is an example of the latter approach: model instances run on a supercomputer.
simple inference scheme
Why use PLGrid Forge?
A service for everyone
The PLGrid Forge service was designed with a wide range of users in mind. It provides popular language models to academic users and others interested in experimenting with AI technology.
Wide range of functionalities
PLGrid Forge is able to handle more than just large language models. The platform was designed to support a broad spectrum of artificial intelligence applications, including embedding models and vision-language models (VLMs). Thanks to this, users can conduct diverse research projects in a single environment.
Access to these resources is regulated by a credit-based billing system within the PLGrid infrastructure. Applying for credits mirrors the process of requesting access to regular computational resources through the PLGrid Portal. Tokens generated by a given model during inference are converted into credits, and the number of credits charged per token depends on the model size.
The most important capabilities of this service are:
- listing available models, their tags, and the assigned credit cost of one million tokens,
- generating and managing API keys,
- measuring credit usage in a grant, both total and per team member,
- imposing a monthly credit usage limit for team members,
- restricting the number of credits available from a specific key, both total and monthly.
Ease of use
PLGrid Forge offers full compatibility with the OpenAI API, which makes the service easier to use. Thanks to that, users can rely on libraries and syntax that are standard in the AI field to write code that uses the computing power of supercomputers without reinventing the wheel. Moreover, the platform offers public API endpoints that significantly simplify integration with external services and tools, such as Jupyter Notebook.
More information about the OpenAI API standard: https://platform.openai.com/docs/api-reference/chat
Privacy and security
A key aspect of the service is a strong focus on data privacy protection - PLGrid administrators do not have access to the content of user requests or responses generated by the models. Furthermore, no data leaves the inference servers and is neither collected nor processed by PLGrid. This opens the door to using artificial intelligence in a trusted, domestic environment without sharing data with foreign companies.
Credits allocation
Before starting inference, users must fulfill a few formal requirements, such as creating a team and a grant in PLGrid. Then, they can apply for credits that allow the use of resources in the PLGrid Forge service. Gaining access takes four steps in total:
Step 1. Create an account, obtain affiliation, and create a team - https://guide.plgrid.pl/pl/home/for_beginners
Step 2. Receive a new grant and apply for credits for the LLM service in the Resources tab - https://guide.plgrid.pl/pl/grants/plgrid
Alternatively, credits can be added to an existing grant by renegotiating resources in the PLGrid Portal - https://guide.plgrid.pl/pl/grants/plgrid/proper/renegotiation
credits in the PLGrid Portal, on the right is the button for resource renegotiation
Step 3. Activate the service in the PLGrid Portal at https://portal.plgrid.pl/services/111
activated PLGrid Forge service in the Portal
Step 4. Go to the address of the service, generate an API key, and start working with PLGrid Forge!
Available models
The list of supported models can be found in the Models tab. Each model has a name, a cost, and tags. The model name is used, among other things, in the OpenAI API to indicate which model should be queried. The number located right under the name is the credit cost of one million tokens processed by that model.
example list of models available to the user
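As a quick check of the arithmetic: if credits are charged in proportion to the number of tokens, the cost of a request is the token count divided by one million and multiplied by the model's listed price. The sketch below assumes exactly that linear rule; the function name estimate_credits and the price of 2.0 credits per million tokens are hypothetical, so take real prices from the Models tab.

```python
def estimate_credits(tokens: int, credits_per_million: float) -> float:
    """Estimate the credit cost of a number of tokens.

    Assumes (not confirmed by the service) that cost scales linearly with tokens.
    """
    if tokens < 0 or credits_per_million < 0:
        raise ValueError("tokens and price must be non-negative")
    return tokens * credits_per_million / 1_000_000


# Hypothetical example: 250,000 tokens on a model priced at 2.0 credits per million tokens
print(estimate_credits(250_000, 2.0))  # 0.5
```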
New models can be suggested through the Helpdesk.
Tags
The table presents the most important model tags:
| Tag | Description |
|---|---|
| Active | the model is available, queries can be sent to it |
| Inactive | the model is temporarily suspended, usually for technical reasons, e.g., node image update |
| Accessible | the user has at least one grant that allows using this model |
| Inaccessible | the opposite of Accessible; access can be requested through the Helpdesk |
| Non-commercial | users with commercial affiliations cannot use this model |
| FC | the model supports function calling |
| EMB | the model is used for embedding instead of text prediction |
API Keys
An API key in inference serves to authenticate the connection with the server where the model runs.
Anyone who knows the content of the key will be able to send requests to the model, regardless of whether they have a PLGrid account or not. Therefore, the key should be treated like a password for accessing your own resources.
Generating a key
To generate a key, go to the Grants tab and click Generate API Key next to the selected grant. Alternatively, in the same tab, click the tile of the selected grant and enter Actions. Then, in the Generate API Key field, click Generate.
in the Actions tab of a selected grant, a key can be generated
In the window that opens, enter basic information about the key:
- name - should be unique among other keys in the given grant; this is the only mandatory field,
- expiration date - after this date, the key will not allow authorizing a connection with the server; this date cannot be changed after the key is created,
- monthly credit limit - how many credits the key can cumulatively use from the grant's credit pool within one month,
- total credit limit - how many credits the key can cumulatively use from the grant's credit pool.
We encourage setting monthly limits, as this minimizes damage in case the key is made public.
basic information necessary to generate a key
Next, click Create, and the content of the key will appear as Your API Token. The key must be copied and saved.
field with the generated key, on the right is the copy button
For security reasons, the content of keys is not stored in the PLGrid Forge system. This means that the content of the key can only be viewed immediately after its creation.
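Because the key content cannot be retrieved again and works like a password, one common practice is to keep it out of source code, for example in an environment variable. A minimal sketch follows; the variable name PLGRID_FORGE_API_KEY and the helper load_api_key are hypothetical conventions, not part of PLGrid Forge.

```python
import os


def load_api_key(var_name: str = "PLGRID_FORGE_API_KEY") -> str:
    """Read an API key from an environment variable instead of hardcoding it.

    The default variable name is a hypothetical convention chosen for this sketch.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key
```

Set the variable in the shell before running a script (e.g. export PLGRID_FORGE_API_KEY='...'), then pass load_api_key() wherever the examples on this page use <API_key>.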
Managing keys
The API Keys tab contains a list of the user's keys along with basic information, status, and available actions. Each team member can create and manage only their own keys.
statuses and available actions for API keys
A key can have the following status:
- active - the key can be used,
- expired - the key's expiration date has passed,
- revoked - the user revoked the key using the Revoke action.
Other actions that can be performed with a key include changing limits - monthly and total - and archiving. After archiving, the key will be revoked and removed from this view, and API access will be immediately blocked. It can still be found in the Archived Keys tab. When creating new keys, the name must be unique only among non-archived keys. This means a new key can be named with the name of an archived key.
Management Keys
Management keys are used to generate API keys via requests to a specially prepared endpoint. The main advantage of this solution is that it does not require using the browser service. However, creating a management key is possible exclusively through the Management Keys section in the PLGrid Forge service. Click Create management key. A window will appear where you need to enter the name and expiration date of the key.
window for creating a management key
The generated key can be used to create new API keys, but it cannot itself be used for inference. Management keys can authorize connections only to the endpoint intended for key generation.
Please note that a management key has no usage limits of its own and can generate API keys restricted only by the limits of the given user. Therefore, exercise extreme caution not to make the management key public. The key can be revoked at any time in the Management Keys section.
API Endpoints
Once credits are allocated to the grant, a model is selected, and an API key is generated, all that remains is to start inference. In PLGrid Forge, connections are established via endpoints, which are public URL addresses where models await requests. Some endpoints serve other purposes, e.g., obtaining information about models, rather than inference.
Regardless of the type, a request sent to a specific endpoint must contain:
- the URL address of the given endpoint,
- the header accept: application/json,
- the Authorization header, containing the content of the API key.
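The required headers can be collected in one small helper so that every request sends them consistently. This is a sketch: build_headers is a hypothetical name, and the Content-Type header is added only for requests that carry a JSON body, as in the POST examples on this page.

```python
def build_headers(api_key: str, with_json_body: bool = False) -> dict:
    """Build the headers that PLGrid Forge endpoints expect (hypothetical helper)."""
    headers = {
        "accept": "application/json",
        # The API key is sent as a Bearer token in the Authorization header
        "Authorization": f"Bearer {api_key}",
    }
    if with_json_body:
        # POST requests that carry a JSON body also declare its type
        headers["Content-Type"] = "application/json"
    return headers
```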
Text Prediction
Endpoint: https://llmlab.plgrid.pl/api/v1/chat/completions
Typical use of the PLGrid Forge service is text prediction. Send a POST request with a message in JSON format. Replace <API_key> with the content of your key and <model_name> with the name of the selected model, and enter the text of the query in the content field.
curl - bash command
Example request using the curl command:
```bash
curl -X 'POST' \
  'https://llmlab.plgrid.pl/api/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <API_key>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "<model_name>",
    "messages": [
      {
        "role": "user",
        "content": "What is ACK Cyfronet AGH in Krakow? Make sure your answer is short, max 1 sentence."
      }
    ],
    "max_tokens": 100,
    "top_p": 1,
    "temperature": 1,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "stream": false
  }'
```
The response will be sent in JSON format with the text located in the content field.
In the request, you can also change the stream field to true to receive each token in a separate message. The response will be a list of JSON messages with a few characters in the content field.
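With "stream": true, OpenAI-compatible servers usually send the answer as server-sent events: lines of the form data: {json}, each chunk carrying a text fragment in choices[0].delta.content, with a final data: [DONE] line. The sketch below assumes PLGrid Forge follows this OpenAI convention; parse_sse_line and stream_chat are hypothetical names, and only the Python standard library is used so nothing extra needs installing.

```python
import json
import urllib.request
from typing import Optional

ENDPOINT = "https://llmlab.plgrid.pl/api/v1/chat/completions"


def parse_sse_line(line: str) -> Optional[str]:
    """Return the text fragment carried by one streamed line, or None.

    Assumes the OpenAI format: 'data: {json}' lines, terminated by 'data: [DONE]'.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None  # blank separator lines or other SSE fields
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return None  # end-of-stream marker
    chunk = json.loads(data)
    choices = chunk.get("choices") or []
    if not choices:
        return None
    return (choices[0].get("delta") or {}).get("content")


def stream_chat(api_key: str, model: str, prompt: str) -> str:
    """Send a streaming chat request, print fragments as they arrive, return the full text."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,
        "stream": True,
    }).encode("utf-8")
    request = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    pieces = []
    with urllib.request.urlopen(request, timeout=120) as response:
        for raw_line in response:  # HTTPResponse yields the body line by line
            fragment = parse_sse_line(raw_line.decode("utf-8"))
            if fragment:
                print(fragment, end="", flush=True)
                pieces.append(fragment)
    print()
    return "".join(pieces)
```

Usage: stream_chat("<API_key>", "<model_name>", "What is ACK Cyfronet AGH in Krakow?") prints the answer piece by piece and returns the joined text.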
requests - Python library
As an alternative to the curl command, you can use the requests library in Python after installing it. With Python 3.11.5 and requests 2.33.1, the same request looks as follows:
```python
import json

import requests

apikey = "<API_key>"      # replace with the content of your API key
model = "<model_name>"    # replace with a model name from the Models tab
endpoint = "https://llmlab.plgrid.pl/api/v1/chat/completions"

headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {apikey}",
    "Content-Type": "application/json",
}
payload = {
    "model": model,
    "messages": [
        {
            "role": "user",
            "content": "What is ACK Cyfronet AGH in Krakow? Make sure your answer is short, max 1 sentence.",
        }
    ],
    "max_tokens": 100,
    "top_p": 1,
    "temperature": 1,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "stream": False,  # a Python bool, serialized to JSON false
}

# json=payload serializes the dictionary and sends it as the request body
response = requests.post(endpoint, json=payload, headers=headers, timeout=120)
if response.ok:
    data = response.json()
    print(json.dumps(data, indent=4))
else:
    print(response.status_code, response.text)
```
The requests library can be installed from https://pypi.org/project/requests/
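The JSON response contains more than the answer itself. OpenAI-compatible APIs normally return the generated text under choices[0].message.content and token counts under usage, which helps relate a request to the credits it consumed. The sketch below assumes PLGrid Forge follows that layout and treats usage as optional; summarize_completion and the sample response are hypothetical.

```python
from typing import Optional, Tuple


def summarize_completion(data: dict) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (answer, prompt_tokens, completion_tokens) from a chat completion response.

    Assumes the OpenAI response layout; the usage block is treated as optional.
    """
    answer = data["choices"][0]["message"]["content"]
    usage = data.get("usage") or {}
    return answer, usage.get("prompt_tokens"), usage.get("completion_tokens")


# Hypothetical response, trimmed to the fields used above
sample = {
    "choices": [{"message": {"role": "assistant", "content": "Example answer."}}],
    "usage": {"prompt_tokens": 30, "completion_tokens": 25, "total_tokens": 55},
}
print(summarize_completion(sample))  # ('Example answer.', 30, 25)
```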
openai - Python library
The PLGrid Forge service is fully compatible with the syntax used by the OpenAI API. Therefore, the openai library in Python can also be used for text prediction. The same query will look as follows (version Python/3.11.5 and openai/2.30.0):
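```python
import openai

apikey = "<API_key>"      # replace with the content of your API key
model = "<model_name>"    # replace with a model name from the Models tab

# base_url points at the API root; the library appends /chat/completions itself
client = openai.OpenAI(
    api_key=apikey,
    base_url="https://llmlab.plgrid.pl/api/v1",
)

response = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "What is ACK Cyfronet AGH in Krakow? Make sure your answer is short, max 1 sentence.",
        }
    ],
)
print(response.choices[0].message.content)
```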
The main difference is that the library hides some of the request details, such as headers, which simplifies the inference interface, especially for users already accustomed to this library. Also note how the endpoint is specified: the general address is stored in base_url, and the chat/completions path is appended by the library when client.chat.completions.create() is called.
The openai library can be installed from https://pypi.org/project/openai/
Listing models
PLGrid Forge also offers the ability to list available models from the terminal. Depending on the endpoint, the response will have a different format:
Endpoint:
- https://llmlab.plgrid.pl/api/v1/models - returns a response in a format compatible with the OpenAI API,
- https://llmlab.plgrid.pl/api/v1/models-plgrid-format - returns a response containing more information beyond the model name, including status, credit cost, and tags.
To list models using this method, send a GET request to one of the above endpoints. For example:
```bash
curl -X 'GET' \
  'https://llmlab.plgrid.pl/api/v1/models' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <API_key>'
```
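The OpenAI-compatible variant is expected to return a list object whose data field holds one entry per model, with the model name in id. A standard-library sketch that fetches and extracts those names, assuming this OpenAI layout (extract_model_ids and list_models are hypothetical helpers):

```python
import json
import urllib.request


def extract_model_ids(payload: dict) -> list:
    """Return model names from an OpenAI-style /models response."""
    # Assumed layout: {"object": "list", "data": [{"id": ...}, ...]}
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(api_key: str) -> list:
    """Fetch the model list from PLGrid Forge and return the model names."""
    request = urllib.request.Request(
        "https://llmlab.plgrid.pl/api/v1/models",
        headers={"accept": "application/json", "Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_model_ids(json.load(response))
```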
Model Status
Endpoint: https://llmlab.plgrid.pl/api/health
For each model, you can check its status by sending a GET request to the /health endpoint. To select a specific model, append ?model=<model_name> to the URL. If no model is specified, the response will contain the statuses of all available models.
Example script using the requests library:
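```python
import json

import requests

apikey = "<API_key>"      # replace with the content of your API key
model = "<model_name>"    # omit params= below to get the statuses of all models
endpoint = "https://llmlab.plgrid.pl/api/health"

headers = {"accept": "application/json", "Authorization": f"Bearer {apikey}"}
# params= appends ?model=<model_name> to the URL and escapes it properly
response = requests.get(endpoint, params={"model": model}, headers=headers, timeout=30)
if response.ok:
    data = response.json()
    print(json.dumps(data, indent=4))
```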
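Model names may contain characters such as /, so it can be safer to let a URL library encode the query string than to paste the name in by hand. A small sketch of building the /health address; health_url is a hypothetical helper, not part of any PLGrid library.

```python
from typing import Optional
from urllib.parse import urlencode

HEALTH_ENDPOINT = "https://llmlab.plgrid.pl/api/health"


def health_url(model: Optional[str] = None) -> str:
    """Build the /health URL, optionally restricted to one model."""
    if model is None:
        return HEALTH_ENDPOINT  # statuses of all available models
    # urlencode percent-encodes characters such as '/', '&' or spaces in the name
    return f"{HEALTH_ENDPOINT}?{urlencode({'model': model})}"
```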
Embedding
Endpoint: https://llmlab.plgrid.pl/api/v1/embeddings
Another type of inference in the PLGrid Forge service is embedding. The model maps input data, e.g., text or a list of words, into a vector of numerical values. Embedding has many applications, including improving the performance of AI models, as well as text analysis or database operations.
Creating an embedding for your data requires sending a POST request in JSON format and selecting a model with the EMB tag. These models often also have the word "embedding" in their name. You must also select a numerical format for the generated vector. Example request:
```python
import json

import requests

apikey = "<API_key>"      # replace with the content of your API key
model = "Qwen/Qwen3-Embedding-0.6B"
endpoint = "https://llmlab.plgrid.pl/api/v1/embeddings"

headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {apikey}",
    "Content-Type": "application/json",
}
payload = {
    "input": "string",            # the text to embed
    "model": model,               # a model with the EMB tag
    "encoding_format": "float",   # numerical format of the returned vector
}

response = requests.post(endpoint, json=payload, headers=headers, timeout=60)
print(response.status_code)
if response.ok:
    data = response.json()
    print(json.dumps(data, indent=4))
```
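In the OpenAI embeddings API, the input field also accepts a list of strings, in which case the response holds one vector per input, each with an index. A common next step is comparing vectors with cosine similarity. The sketch assumes PLGrid Forge returns the OpenAI layout (data[i].embedding, data[i].index); both helper names are hypothetical.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def vectors_from_response(data: dict) -> list:
    """Collect embeddings from an OpenAI-style /embeddings response, in input order."""
    items = sorted(data["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

With data = response.json() from the request above, cosine_similarity(*vectors_from_response(data)[:2]) compares the first two inputs of a request whose input was a list of strings.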
Creating a key with the management key
Endpoint: https://llmlab.plgrid.pl/api/v1/management/api_keys/create
PLGrid Forge provides the ability to create API keys bypassing the browser service via management keys. To do this, you must send all basic information about the key and additionally specify which grant the new key will be assigned to using the grant_id field.
Currently, the grant number can only be read from the site address in the PLGrid Forge service. Go to the Grants tab, click the tile of the selected grant, and note the number at the end of the page address in your browser, which should have a format like https://llmlab.plgrid.pl/grants/29. In this case, the grant number is 29.
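When scripting key creation, the number can also be pulled out of a copied page address programmatically. A sketch assuming the address format described above; grant_id_from_url is a hypothetical helper.

```python
import re
from urllib.parse import urlparse


def grant_id_from_url(url: str) -> int:
    """Extract the grant number from an address like https://llmlab.plgrid.pl/grants/29."""
    path = urlparse(url).path  # e.g. "/grants/29"
    match = re.match(r"/grants/(\d+)(?:/|$)", path)
    if match is None:
        raise ValueError(f"not a grant page address: {url}")
    return int(match.group(1))
```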
Example POST request to create an API key:
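```python
import json

import requests

mgmkey = "<management_key>"   # a management key, not a regular API key
grant_id = 29                 # replace with your grant number (29 is the example above)
endpoint = "https://llmlab.plgrid.pl/api/v1/management/api_keys/create"

headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {mgmkey}",
    "Content-Type": "application/json",
}
payload = {
    "name": "testkey",                     # must be unique among non-archived keys
    "grant_id": grant_id,
    "total_credits_quota": 1,              # total credit limit for the new key
    "monthly_credits_quota": 1,            # monthly credit limit for the new key
    "expires_at": "2026-04-16T17:00:00",   # expiration date; choose a date in the future
}

response = requests.post(endpoint, json=payload, headers=headers, timeout=30)
print(response.status_code)
if response.ok:
    data = response.json()
    print(json.dumps(data, indent=4))
```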
In this query, the management key <management_key> was used for authorization.
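The expires_at value in the example is a date and time without a time zone. To avoid hardcoding a date that may already lie in the past, it can be computed relative to the current time. This sketch assumes the server accepts the same YYYY-MM-DDTHH:MM:SS format as the example; the documentation does not state which time zone is assumed, and the helper name expires_at is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def expires_at(days: int, now: Optional[datetime] = None) -> str:
    """Return a timestamp `days` from now, in the format used in the example above."""
    now = now or datetime.now()
    # Dropping microseconds keeps the output in the YYYY-MM-DDTHH:MM:SS shape
    return (now + timedelta(days=days)).replace(microsecond=0).isoformat()


print(expires_at(30, datetime(2026, 4, 28, 17, 0, 0)))  # 2026-05-28T17:00:00
```

In the payload above, "expires_at": expires_at(30) would then produce a key valid for roughly 30 days.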
Browser service for endpoints
The public endpoints currently available in PLGrid Forge are documented in an additional service at https://llmlab.plgrid.pl/api/docs. This service also allows testing endpoints directly, doing part of the work for the user.
endpoints along with example requests
In the top right corner, there is an Authorize button for entering a key.
authorization button, needs to be clicked and the key content entered
After authorization, the specified key is automatically added to every request sent from this page. When you expand the selected endpoint, a Try it out button appears on the right; after clicking it, you can enter the content of your query and send it using Execute.
the Try it out button allows editing the query content
Below, the full curl command used to send the query, the endpoint URL address, and the server response in JSON format are displayed.
At the very bottom of the page, the structures of the JSON messages returned by the above endpoints are listed.
Monitoring credit consumption
per grant
In the view of each grant, detailed information regarding credit consumption is listed.
information about the grant
per key
In the API Keys tab, in the list of keys, clicking on the key name allows viewing detailed information about it. This includes statistics on credit consumption by the selected key.
in key details, you can see credit consumption
This view does not contain the key content, as it is not stored anywhere in the PLGrid Forge service.
per team member
The number of credits consumed by each team member individually can be viewed in the Members section in the view of the selected grant. The statistic is listed both for the given month and cumulatively over the duration of the grant.
Usage Limits
Anyone who possesses the content of an API key can use inference as long as credit consumption stays within the lowest of the following limits:
- number of available credits in the grant,
- total limit for the key,
- monthly limit for the key,
- monthly limit for the team member to whom the key belongs.
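In other words, the credits a key can still spend are the minimum of the remaining amounts under each applicable limit. A sketch of that rule follows; the function name and the convention of passing None for a key or member limit that was not set are assumptions made for illustration.

```python
from typing import Optional


def effective_remaining(grant_available: float,
                        key_total_left: Optional[float] = None,
                        key_monthly_left: Optional[float] = None,
                        member_monthly_left: Optional[float] = None) -> float:
    """Credits a key can still spend: the tightest of the applicable limits.

    None means the limit was not set (an assumption about how unset limits behave).
    """
    limits = [grant_available, key_total_left, key_monthly_left, member_monthly_left]
    return max(0.0, min(limit for limit in limits if limit is not None))


print(effective_remaining(1000, 50, 20, 100))  # 20
```

For instance, with 1000 credits left in the grant, 50 left under the key's total limit, 20 under its monthly limit, and 100 under the member's monthly limit, the key can spend at most 20 more credits this month.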
The number of available credits in the grant is assigned in the PLGrid Portal during the allocation of grant resources. Limits for the key are provided during generation, but can be changed later in the API Keys tab.
In the view of a specific grant, under Actions, it is possible to set a limit for all team members collectively. A limit can also be assigned to an individual member by entering the Members section and clicking Change limit. Every change of limit, whether collective or individual, overwrites the previous value; therefore, only the latest number is taken into account.
The limit for a team member is set monthly, and there is no possibility to set total usage limits for each member.
in the member list, individual limits can be set
It must be emphasized that the individual limit is an additional restriction imposed cumulatively on all keys of a given user, not on their account. Keys generated by a member who has reached the monthly credit usage limit will not allow access to the API. The user can still use keys from other members of the team and other PLGrid users. This happens because anyone, even without an account, can use inference as long as they know the key content. The credit cost will be assigned to the user who generated the key.
For example, if an instructor distributes their API key to training participants, the usage generated by everyone in the room is counted against the instructor's limits in their grant.
Last update: April 28, 2026