- What are the trained models? are they algorithms or a collection of parameters in a file?

"Model" could refer to the algorithm with or without a set of trained parameters.
If you specify "trained model", the focus is on the parameters, but the algorithm is implicitly part of that, since without the algorithm, the parameters are just an arbitrary set of numbers.

- What do they look like? e.g. file extensions

That very much depends on both the algorithm you're using *and* the specific implementation. A few simple examples might help clarify matters. Let's suppose that the problem we're trying to learn is the exclusive or (XOR) function:

```
a | b | a XOR b
--+---+---------
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
```

First, let's use a 2-layer neural net to learn it. We'll define our activation function to be a simple step function:

$ f(x) = \begin{cases}
1 & \text{if } x > 0.5 \\
0 & \text{if } x \le 0.5
\end{cases} $

(This is actually a terrible activation function for real neural nets since it's non-differentiable, but it makes the example clearer.)

Our model is:

$h_0 = f(1\cdot a+1\cdot b + 0)\\
h_1 = f(0.5\cdot a + 0.5\cdot b + 0)\\
\,\;y = f(1\cdot h_0 - 1\cdot h_1 + 0)$

Each step of this essentially draws a hyperplane and evaluates to 1 if the input is on one side of it and 0 otherwise. In this particular case, $h_0$ tells us whether at least one of $a$ and $b$ is true, $h_1$ tells us whether both are true, and $y$ tells us whether exactly one of them is true, which is precisely the XOR function.
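To make this concrete, here's a minimal sketch of the same network in Python. The weights and biases are exactly the ones from the equations above; the function name `xor_net` is just my choice for the example:

```python
def step(x):
    """Step activation: 1 if x > 0.5, else 0 (the f above)."""
    return 1 if x > 0.5 else 0

def xor_net(a, b):
    # Hidden layer: h0 fires if at least one input is set (a OR b),
    # h1 fires only if both inputs are set (a AND b).
    h0 = step(1.0 * a + 1.0 * b + 0)
    h1 = step(0.5 * a + 0.5 * b + 0)
    # Output layer: fires when h0 is on and h1 is off, i.e. exactly one input set.
    return step(1 * h0 - 1 * h1 + 0)

for a in (0, 1):
    for b in (0, 1):
        print(f"{a} XOR {b} = {xor_net(a, b)}")
```

Running this reproduces the truth table at the top: only the (0, 1) and (1, 0) rows produce 1.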

Our parameters are the coefficients and biases (the offset added at the end of each expression):

$ \begin{bmatrix}
1 & 1 & 0 \\
0.5 & 0.5 & 0 \\
1 & -1 & 0 \\
\end{bmatrix}$

They can be stored in a file in any way we want; all that matters is that the code that stores them and the code that reads them back agree on the format.
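For instance, one perfectly workable (if naive) format is a JSON file holding the matrix above, one row per neuron as `[w_a, w_b, bias]`. The filename here is just an example:

```python
import json

# The trained parameters from the matrix above: rows for h0, h1, and y.
params = [[1.0, 1.0, 0.0],
          [0.5, 0.5, 0.0],
          [1.0, -1.0, 0.0]]

# Write them out...
with open("xor_params.json", "w") as f:
    json.dump(params, f)

# ...and read them back. The only requirement is that writer and
# reader agree on this layout.
with open("xor_params.json") as f:
    loaded = json.load(f)

print(loaded == params)  # the round trip preserves the parameters
```

Real frameworks use binary formats (e.g. `.npy`, `.pt`, `.h5`) for efficiency, but the principle is the same: the file is just numbers plus an agreed-upon layout.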

Now let's solve the same problem using a decision tree. For these, we traverse a tree, and at every node, ask a question about the input to decide which child to visit next. Ideally, each question will divide the space of possibilities exactly in half. Once we reach a leaf node, we know our answer.

In this diagram, we visit the right child iff the expression is true.

```
        a+b=2
       /     \
   a+b=0      0
   /    \
  1      0
```

In this case, the model and parameters are harder to separate. The only part of the model that isn't learned is "It's a tree". The expressions in each interior node, the structure of the tree, and the value of the leaf nodes are all learned parameters. As with the weights from the neural network, we can store these in any format we want to.
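A learned tree like this can be represented directly as data. Here's one hedged sketch: each internal node is a dict holding a test and two children, each leaf is a bare answer, and (as in the diagram) we take the right/"true" child iff the test holds. The structure of `tree` is itself a "parameter" in the sense described above:

```python
# Leaves are plain values; internal nodes hold a test and two children.
tree = {"test": lambda a, b: a + b == 2,
        "true": 0,                                  # both set -> XOR is 0
        "false": {"test": lambda a, b: a + b == 0,
                  "true": 0,                        # neither set -> XOR is 0
                  "false": 1}}                      # exactly one set -> XOR is 1

def classify(node, a, b):
    if not isinstance(node, dict):       # reached a leaf: return its answer
        return node
    branch = "true" if node["test"](a, b) else "false"
    return classify(node[branch], a, b)

for a in (0, 1):
    for b in (0, 1):
        print(f"{a} XOR {b} = {classify(tree, a, b)}")
```

To store this tree in a file, we'd serialize the structure and the node tests in whatever format our reading code expects, just as with the weight matrix.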

Both methods are learning the same problem, and in fact find essentially the same solution: a XOR b = (a OR b) AND NOT (a AND b). But the nature of the mathematical model depends on the method we choose, the parameters depend on what we train it on, the file format depends on the code we use to do it, and the line between model and parameters is fairly arbitrary; the math works out the same regardless of how we split it up. We could even write a program that tries different methods and outputs a *program* that classifies inputs using whichever method performed best. In that case, the model and parameters aren't separate at all.

- Especially, I want to find the trained models for detecting birds (the bird types do not matter). Are there any platforms for open-source/free online trained AI models??

I don't know of any pretrained models that specifically recognize birds, but I don't work in image recognition, so that doesn't mean much. If you're not averse to training your own model (using existing code), I believe the ImageNet dataset includes birds. AlexNet and LeNet would probably be good starting points for the model. Most if not all of the state-of-the-art image recognition models are based on convolutional networks, so you'll need a decent GPU to run them.